Redefining cancer subtypes using multi-omics and deep learning

Cancer is a disease of the genome characterized by abnormal cell growth and invasion of other parts of the body. It affected 19 million people in 2020 and caused 9.5 million deaths in that year alone.

For centuries, the conventional organ-based classification of cancer (breast cancer, lung cancer, colon cancer, prostate cancer, and so on) has served clinical practice and research well, and it is used globally in cancer registries. More recently, molecular pathology and diagnostics have become essential components of clinical decision-making in the management of cancer patients. To advance oncology towards personalized medicine, we need informative biomarkers that can classify tumors and patients into therapeutic and prognostic subtypes.

Recent advances in high-throughput sequencing have enabled deep profiling of cancer biopsies with multiple molecular assays such as RNA-seq, exome-seq, and methyl-seq. These efforts produce related but non-identical multi-omic datasets from the same biopsy, so we can now look at the genome, gene expression, and epigenome of the same tumor at the same time. Combining such datasets provides a better molecular understanding of cancer than looking only at mutations in single genes or in a pre-defined panel of genes. However, the diversity and dimensionality of multi-omic datasets necessitate the use of modern machine learning algorithms. Only in this way can we make sense of the complexity of cancer's molecular features and discover composite biomarkers that are more informative than biomarkers relying on mutations in single genes. True precision medicine can be achieved only by tackling this complexity with proper computational methods coupled with multi-omics.
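To make the idea of combining assays from the same biopsy more concrete, here is a minimal sketch of one simple form of integration: keep only the samples profiled in every assay, standardize each feature so that assays measured on different scales contribute comparably, and join the features per sample. This is an illustration under stated assumptions, not the method used in our research; the function names, sample IDs, and values are all hypothetical, and a real analysis would feed far larger matrices into a machine learning model.

```python
from statistics import mean, pstdev


def zscore_columns(matrix):
    """Standardize each feature (column) to zero mean and unit variance."""
    scaled_columns = []
    for col in zip(*matrix):
        mu = mean(col)
        sigma = pstdev(col)
        # A constant feature carries no signal; map it to zeros
        # instead of dividing by zero.
        scaled_columns.append(
            [(v - mu) / sigma if sigma > 0 else 0.0 for v in col]
        )
    return [list(row) for row in zip(*scaled_columns)]


def integrate_layers(layers):
    """Join several omics layers into one feature vector per sample.

    `layers` maps a layer name to {sample_id: [feature values]}.
    Only samples present in every layer are kept.
    """
    shared = sorted(set.intersection(*(set(layer) for layer in layers.values())))
    blocks = []
    for name in sorted(layers):
        matrix = [layers[name][sample] for sample in shared]
        blocks.append(zscore_columns(matrix))
    # Concatenate the standardized blocks sample by sample.
    combined = [
        [value for block in blocks for value in block[i]]
        for i in range(len(shared))
    ]
    return shared, combined


# Toy data (made up): tumor T4 lacks expression data, so it is dropped.
layers = {
    "expression": {"T1": [5.0, 100.0], "T2": [7.0, 300.0], "T3": [9.0, 200.0]},
    "methylation": {"T1": [0.1], "T2": [0.9], "T3": [0.5], "T4": [0.3]},
}
samples, features = integrate_layers(layers)
print(samples)
for sample, row in zip(samples, features):
    print(sample, [round(v, 2) for v in row])
```

Each row of `features` now describes one tumor across both assays at once, which is the kind of input a clustering or deep learning model could use to look for subtypes.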

This year at the AACR2021 and EACR2021 annual meetings, we are presenting how large multi-modal datasets (multi-omics) can be integrated to refine and, in some cases, redefine molecular subtypes of cancers. In our research, we show that the patterns we derive from multi-omic datasets are biologically relevant and improve on known subtypes. You can view the high-resolution version of the poster here.