Single-Cell Multiomics of Cancer And Machine Learning Approaches

Paramita Mishra

INTRODUCTION

Cancer is a quintessential complex disease, hallmarked by biomarkers acquired through molecular modifications at all levels (genomic, transcriptomic, proteomic, metabolomic, epigenomic). With the advent of next generation sequencing (NGS) technologies, efforts were made to understand cancer mechanisms using single-omics data. For instance, epigenetic changes could be found using differential ATAC-Seq landscapes¹, mRNA-Seq profiles could be used for differential expression analysis, and DNA-bound proteins and their target genes could be studied using ChIP-Seq and RNA-Seq data.² These techniques facilitate gene identification, epigenetic marker detection, and classification of tumors based on proteomic profiles. However, a piece of the puzzle is missing: the relationship between the molecular state of cancer cells and their phenotypes has remained unclear.³

Single-cell multiomics (scMO) is a recent technology that generates omic data at multiple layers of individual cells, so that cell identity and cellular function can be studied in tandem. It brings hope for understanding the genomic-level causation, heterogeneity, and branched evolution of cancer cells. In single-cell cancer epigenomics, integration of scMO data gives insight into tumor heterogeneity, the role of the tumor microenvironment (TME) and corresponding precision therapies, and cancer cell evolution, resistance and survival. The integrative analysis of single-cell transcriptomics and single-cell epigenomics in cancer has already answered some questions about molecular function and phenotypes.⁴ Single-cell omic and multi-omic analyses are boosted by new techniques for sample collection and analysis, such as liquid biopsy and scMO-based machine learning (ML). Our understanding of epigenetic abnormalities in cancer, which are often responsible for dysregulation, aberrant function and altered expression of genes, has grown rapidly with the arrival of these techniques. In the process, scMO data has also grown rapidly in both volume and dimensionality as new integrative scMO sequencing techniques appear⁵ ⁶. This points to the need for feature selection, data science, sequencing data optimization and data integration in multi-omics. Single-cell multiomics therefore involves an interdisciplinary integration of omic data analysis, big data, machine learning, precision medicine, and prior knowledge of carcinogenic genome and epigenome patterns, to gain sensitive and precise insights.

In this review, I aim to discuss cancer genomics, the growing presence of scMO in oncology research, processing and interpretation of big multiomics data, and machine learning for scMO. Specifically, I highlight the pathology, causation, genomics and epigenomics of cancer, single-cell sequencing techniques and their role in oncology, the importance of multi-omics, and the role of scMO in precision medicine. The latter half reviews current trends in single-cell oncology-based ML for superior detection and treatment, highlighting basic probabilistic and ML models used in multi-omics and future ML-based genomic medicine. Finally, I discuss an underrepresented challenge: the presence of big data in cancer as a result of rapidly improving single-cell omics technologies, and the subsequent “curse of dimensionality”⁷, highlighting data- and algorithm-based solutions.

I. THE COMPLEXITY OF CANCER

The definition of cancer is often extended beyond that of a targeted disease - it is considered a complex biological system, governed by genetic alterations. Cancer is highly heterogeneous, both in its phenotypic manifestation and in its genome. Despite this variability, a set of genomic properties is common to cancers: (1) uncontrolled growth and division of cells in the body, (2) damage to DNA and the presence of alterations, deletions, rearrangements, and genetic dysregulation, and (3) temporal genetic and epigenetic changes, causing loss of control at the single-cell gene level⁸.

Tumor Progression

Studies have generated a so-called “cancer blueprint”⁹, showing how a single cell with altered gene expression can give rise to a myriad of manifestations. The first manifestation is the pre-malignant lesion. An example is the autosomal dominant disease familial adenomatous polyposis (FAP), which is characterized by colon polyps. The gene APC is an inhibitor of the WNT signalling pathway, and cells without functional APC grow unchecked¹⁰. In practice, lesions often disappear¹¹; if they do not, the next stage of cancer presents as a primary tumor. The local malignant tumor, unlike a lesion, rarely disappears and often reaches the next stage: a lethal tumor that may spread in the body, resist immune response or treatment, and cause invasion and metastatic disease - the two key drivers of cancer mortality¹².

Carcinogenic Mutations

High-throughput sequencing technologies have enabled the identification of three classes of genes that shape our understanding of the blueprint: oncogenes (e.g. genes encoding growth factors (GFs), GF receptors, signaling molecules, and nuclear transcription regulators¹³), which control cell growth and cell lethality; tumor suppressor genes, which control DNA repair and growth¹⁴ ¹⁵; and, most interestingly, epigenetic modifiers such as enzymes regulating transcription or proteins involved in chromatin formation. 10% of cancers are inherited through germline mutations with high penetrance (40-90% or more of the people inheriting these mutations develop one or more cancers¹⁶). Many such mutations were identified using bulk sequencing and are often oncogene or tumor suppressor mutations.

Tumor genome sequencing has revealed a prevalent class of proteins affecting epigenetic regulation, called chromatin remodeling factors (CRFs). CRFs act by changing DNA methylation, histone protein levels, or nucleosome position. The function of these complexes is unclear, yet almost all cancers have one or more of these proteins frequently mutated¹⁷. The holistic study of often-unknown mutations like those in CRFs is important. One reason is Knudson’s two-hit model (KTHM)¹⁸, an early theoretical model used to explain a majority of cancers. KTHM states that at least two mutations are needed in a cell to start tumor development. Therefore, markers of many forms of mutation must be measured to capture the holistic single-cell profile of a cancer cell.

Epigenomic Modification Of The Cancer Cell

The presence of unique cellular combinations of mutations is an important trait in cancer. Some of the least understood aspects of these combinations involve the epigenome. Mutations in epigenetic modifiers cause the reprogramming of gene expression. The epigenome changes actively in cancers, yet it is one of the less understood domains within single-cell genomics because single-cell sequencing of the epigenome is less reliable. Technologies for sc-Seq of the epigenome are promising but still in their early stages; despite this, breakthroughs in oncology have been made through single-cell epigenomic data¹⁹.

Recent sc-Seq techniques have begun to address this problem and have shown that epigenetics is far more relevant in cancer cells than previously appreciated. Many epigenomic changes are now known to be mutually dependent on other omic activity²⁰. For instance, silencing of repair genes through promoter CpG island hypermethylation may occur in a “loop”²¹: the silencing can cause genetic changes, and the resulting translocations and mutations can then cause further epigenetic disruption. This is a good example of the sophisticated epigenomic insight that multi-omics makes possible, in this case by combining epigenomic and transcriptomic profiles. Furthermore, the integration of epigenomic data in omic and multi-omic analysis has made us aware of the phenomena below.

Cancer lethality, resistance and metastasis: Epigenomic changes are a key form of adaptation and survival, generating late-stage mutations and heterogeneity in cancer. In general, mutations in histone-modifying enzymes allow the cancer cell to regulate its genes differentially, which heightens the adaptation of tumor cells to changing conditions such as chemotherapy and ultimately lets them develop resistance and take on properties such as invasion and metastasis²². Most cancer cells carry epigenetic mutations that modify proteins regulating transcription. If a cancer were to stop at the formation of a local, primary malignant tumor, we could simply use surgery, radiotherapy or chemotherapy to remove or destroy it. Instead, it becomes a rapidly spreading cancer because of cancer epigenomics. Alterations of the epigenome allow the tumor to “hack” the genetic code, and with further mutations and evolutionary pressure this eventually leads to therapy resistance and the initiation or acceleration of metastasis.

Tumor microenvironment (TME): The TME is a diverse molecular system composed of immune, stromal, endothelial, and tumor cells²³. It also includes non-cellular components such as the extracellular matrix and secreted signaling molecules. The area is very epigenetically active, showing heterogeneity, plasticity, and complex molecular cross-interactions²⁴. Immune checkpoint blockade²⁵ therapies, one of the greatest paradigm shifts in cancer immunotherapy, rely on T-cell mediated anti-tumor immunity within the TME²⁶. High-dimensional multimodal datasets can enable cancer evolutionary lineage tracing and epigenetic profiling of TME immune cells. This would give insight into the mechanisms driving the functional diversity of TME immune cells and tumor cells, a diversity driven to a large degree by epigenomic changes.

Cancer evolution: Tumor diversity is subject to Darwinian selection pressures, which shape cancer genomics. TME-based environmental selection pressures include the immune system, deprivation of nutrients, oxygen or water, pH changes, temperature, chemotherapy, radiotherapy, and exposure to mutagens. Epigenetic changes are under the same Darwinian pressure, assuming variation and perfect cellular competition, and are often heritable. This can be used to create computational frameworks for the single-cell epigenome. Single-cell tumor phylogenetic trees can reveal “driver mutations” (initial mutations that occurred before mutagenesis) to guide therapies against resistant cells.²⁷ Clonal cell populations can be used alongside spatial tags and serial sampling to understand the TME better.

Heterogeneity:²⁸ Tumors contain heterogeneous cells with distinct genetic and phenotypic properties that can differentially promote metastasis and drug resistance. Inter-tumor heterogeneity refers to differences between the tumors of different patients, whereas intra-tumor or spatial heterogeneity refers to differences within a single tumor mass. Precision medicine often aims to decipher and target this heterogeneity so that treatment can be based on the precise molecular makeup of a tumor. Single-cell techniques provide a way to profile individual cells within tumors and learn about differential function, resistance or metastasis.²⁹

II. SEQUENCING CANCER

Next Generation Sequencing (NGS)

NGS made it possible to sequence DNA more economically, sensitively and efficiently than Sanger sequencing. Its parallel design makes it a staple of sequencing, since several samples and genomic regions can be processed quickly at once. Additionally, it works much better than Sanger sequencing on low-quantity input, detecting mutations more accurately. In one study, mutations in tumor tissues from the five most common cancer types were analyzed using NGS.³⁰ A “Shannon entropy level” (used as a measure of analytical utility) was calculated for each tumor. The aim was to see whether NGS reveals new information, with higher entropy indicating more. Even among the most common cancers, scores differed widely, which suggests that for some major cancer types NGS may have analytic utility and the sensitivity needed to support cancer diagnosis.
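
As a rough illustration of the entropy score described above, the sketch below computes the Shannon entropy of a distribution of mutated genes. The mutation lists are invented for illustration, and the cited study's exact definition of its entropy level may differ from this textbook formula.

```python
import math
from collections import Counter

def shannon_entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy

# Hypothetical mutation calls: one entry per mutated gene observed in a tumor type.
concentrated = ["KRAS"] * 8                       # one gene dominates
diverse = ["KRAS", "TP53", "APC", "PIK3CA"] * 2   # four genes, equally frequent

h_concentrated = shannon_entropy(Counter(concentrated).values())
h_diverse = shannon_entropy(Counter(diverse).values())
```

Under this definition the concentrated profile scores 0 bits and the evenly spread profile scores 2 bits, so a higher score corresponds to a less predictable mutation profile.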

Since the advent of NGS, whole-genome and exome DNA sequencing have been key drivers of the scientific understanding of the disease. In 2013, somatic mutations within tumors were comprehensively characterized³¹, along with their mutation load and how that load differs between cancers. For the well-characterized oncogenes and tumor suppressor genes, there is evidence documenting their roles in cellular growth. However, NGS has some limitations for cancer: in cases of aneuploidy, tumor heterogeneity and contamination with normal tissue, all of which are common in cancer, bulk NGS may not give high-quality results. In general, bulk analysis techniques average the signals from mixed cells, which can mask tumor clones so that they are never detected when measuring cell diversity.³² Finally, tumor genome sequencing with NGS may reveal mutations, but for many of them there is no decisive data defining gene function.³³
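
The masking effect of bulk averaging can be seen in a toy calculation. The sketch below uses invented expression values for a single gene in 100 cells; the clone sizes, values and cutoff are assumptions chosen only to make the effect visible.

```python
from statistics import mean

# Hypothetical expression of one resistance gene in 100 cells:
# 95 cells of the dominant clone (low) and 5 cells of a rare subclone (high).
dominant_clone = [1.0] * 95
rare_subclone = [20.0] * 5
cells = dominant_clone + rare_subclone

# Bulk sequencing effectively reports one averaged value for the whole sample.
bulk_signal = mean(cells)

# Single-cell data keeps one value per cell, so outlying cells stay visible.
threshold = 10.0  # illustrative cutoff, not a recommended value
high_cells = [i for i, value in enumerate(cells) if value > threshold]
```

The averaged signal stays close to the dominant clone's level, while the per-cell view still identifies the five high-expressing cells.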

Single-Cell Sequencing

Single-cell sequencing (sc-Seq) technologies have been developed rapidly over the past decade with the aim of observing the “multilayered status” of cells. Single-cell technologies (SCT) have the power to elucidate genomic, epigenomic, and transcriptomic heterogeneity in cellular populations, and the changes at these levels.

Sample preparation for cancer: An advantage of scSeq is that a sample can be prepared from almost any tissue, although data quality is fragile and the right techniques must be used to avoid amplification bias or allele dropout. An average cancer cell contains ∼6–12 pg of total DNA and 10–50 pg of total RNA, depending on ploidy and type. Only 1 to 5% of this RNA is mRNA, which makes amplification (such as whole-genome amplification, WGA) very important. Solid tumor tissue is difficult to obtain by surgery or solid biopsy at very late metastatic stages of a cancer, at very early stages, and especially for a pre-cancer screen. Also, isolating cells from solid tissues may disaggregate the tissue in a biased way, skewing the omic data. An efficient alternative is to use another cancer biomarker, circulating tumor cells (CTCs), which are cells shed by solid tumors during metastasis, with liquid biopsy as the sampling method. Liquid biopsies³⁴ avoid the problem of non-ideal tissue disaggregation, and the circulating tumor DNA (ctDNA) released by tumors can also be collected directly as a biomarker.³⁵ Micro-manipulation devices or special pipettes are common isolation techniques when low throughput is acceptable. Fluorescence-activated cell sorting (FACS) and laser capture microdissection were developed to improve isolation, and microfluidic techniques later brought a significant increase in throughput while requiring little material. Barcoded library preparation may also be required for some protocols.

Drawbacks To Single-Cell Sequencing

Drawbacks include the loss of tumor characteristics such as spatial information, intratumor heterogeneity, and important cell-to-cell interactions. Single-cell preparation by nature requires cells to be dissociated, which discards spatial information. When only a small biopsy is taken, single-cell sequencing may not accurately represent the genome of the whole tumor, because intratumor heterogeneity is not fully captured. Dissociating single cells from tissues may itself alter the cells and their gene expression. Microfluidic devices also involve a tradeoff: allelic dropout is reduced, but entire cell populations can be lost. There may be a bias toward certain cell sizes, which can skew results, much as uneven amplification does. Some single-cell sequencing techniques require complex dissociation protocols to obtain individualized fresh cells, which adds manipulation between sample collection and processing. To avoid this, researchers often work with cell lines or organoids, which are an imperfect substitute for the co-existing system of the TME and forfeit the single-cell insights that its interactions could have provided.

Single-Cell Multiomics And Analysing Cancer scMO Data

Multi-omics describes a set of multi-dimensional tools for wrangling high-throughput sequencing data from various domains and techniques. To study all types of cells and omics layers, we should consider single-cell sequencing methods from both laboratory and clinical views. Single-cell sequencing can resolve the heterogeneity of bulk tissue at the genotypic and phenotypic level. Multi-omics studies the characteristics of single cells, but also the combined regulatory mechanisms that become evident only in reduced-dimension representations. The most distinctive aspects of scMO are its ability to interpret correlations between separate omics readouts, its suitability for machine learning and dimensionality processing, and its applications to systems biology (through networks and correlations) and precision medicine (through ML-based regressions and classifications). scMO frequently covers the following at the single-cell level: transcriptomics, genomics, epigenomics, proteomics, and temporal and spatial multi-omics.
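
One simple way to read a correlation between two omics layers is a Pearson coefficient computed across cells in which both layers were measured. The sketch below is a minimal illustration with invented values; the pairing of promoter accessibility with expression of the same gene is an assumption made for the example, not a method taken from the cited literature.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical paired measurements from the same six cells:
# promoter accessibility (e.g. from scATAC-seq) and expression (e.g. from scRNA-seq).
accessibility = [0.1, 0.2, 0.4, 0.5, 0.8, 0.9]
expression = [1.0, 1.5, 2.1, 2.4, 4.0, 4.2]

r = pearson(accessibility, expression)
```

A coefficient near 1 would suggest that the two layers move together across these cells. In real data, sparsity and technical noise usually call for more robust statistics.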

Single-cell transcriptomics: Smart-seq is often the preferred method for scMO transcriptomics. It uses full-length cDNA amplification alongside oligo-dT priming and template switching. RamDA-seq detects RNAs with no poly-A tail, such as enhancer RNAs. scRNA-Seq is difficult because of the small quantity of RNA in each cell. Microdroplet technologies are therefore used to optimize reverse transcription by carrying it out inside individually barcoded oil droplets. Microwells can also be used, since they can handle thousands of cells, which adds to sensitivity by reducing dropouts. In cancer, the transcriptomic landscape was first studied in 2016, using scRNA-seq alone on CD4 cells in melanoma patients.³⁶
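
Droplet barcoding means that each read carries a tag identifying its cell of origin, so reads can be grouped into a per-cell count table after sequencing. The sketch below is a simplified illustration with invented reads. The unique molecular identifier (UMI) field is an assumption added here to show how amplified duplicates of one molecule can be collapsed; real pipelines also handle sequencing errors in barcodes, which this sketch ignores.

```python
from collections import defaultdict

# Hypothetical aligned reads: (cell barcode, UMI, gene).
reads = [
    ("AAAC", "u1", "CD4"),
    ("AAAC", "u1", "CD4"),    # amplification duplicate of the read above
    ("AAAC", "u2", "CD4"),
    ("AAAC", "u3", "GAPDH"),
    ("TTTG", "u1", "GAPDH"),
    ("TTTG", "u4", "GAPDH"),
]

def count_matrix(reads):
    """Return {barcode: {gene: number of distinct UMIs}}."""
    molecules = defaultdict(set)
    for barcode, umi, gene in reads:
        molecules[(barcode, gene)].add(umi)
    counts = defaultdict(dict)
    for (barcode, gene), umis in molecules.items():
        counts[barcode][gene] = len(umis)
    return dict(counts)

counts = count_matrix(reads)
```

Each barcode becomes one row of the cells-by-genes matrix that downstream scRNA-seq analysis works with.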

Single-cell genomics: Uniform amplification of DNA is difficult at the single-cell level, since only two copies of the DNA are present in a single cell, quite unlike the case for transcripts. Amplification methods like degenerate oligonucleotide-primed PCR (DOP-PCR) must be used.³⁷ Allelic dropout and amplification bias affect data quality and sequencing depth. Nuc-Seq and single-nucleus exome-seq are good alternatives when SNVs and indels need to be identified, since allelic dropout and bias make traditional sc-DNA-Seq techniques insufficiently sensitive for these mutations.

Single-cell epigenomics: DNA methylation and histone profiles can be used for single-cell epigenomics. Single-cell bisulfite sequencing (scBS-seq) is the most reliable technique for methylation. scATAC-seq and sc-Hi-C quantify open chromatin patterns and chromatin structure, respectively, using small numbers of cells. For histone modifications, Drop-ChIP and scChIC-seq can be used; Drop-ChIP is a droplet microfluidics approach that allows ChIP-Seq to be conducted at the single-cell level, although ChIP-seq has yet to be fully adapted for single cells. In 2019, single-cell chromatin immunoprecipitation was used to study breast cancer patients. Histone landscapes of patient-derived xenografts showed a contrast between cells that would respond to chemotherapy and cells that would resist it, confirming an epigenetic component of tumor resistance.³⁸

Single-cell proteomics, temporal and spatial multi-omics: For proteomics at the single-cell level, mass spectrometry or flow cytometry is often preferred over sequencing. However, techniques like mass spectrometry raise concerns about large sample size requirements and the ability to measure only a few proteins.³⁹ In such cases, CyTOF has recently been used. A form of mass cytometry, it can analyze a large number of proteins using labelled antibody tags. The integration of spatial data in multiomics is lacking: single-cell sequencing loses spatial data by nature, because the tissue is dissociated into single cells before sequencing. However, some new transcriptomics-based spatial techniques tag spatial information through molecular barcoding. Recently, the spatial transcriptome techniques Slide-seq and Visium have been used for gene expression analysis in tissue sections. Finally, for temporal data, the closest available approach is Monocle and Monocle 2, algorithms for reconstructing complex single-cell trajectories. Monocle uses a machine learning technique called “reversed graph embedding” to learn a principal curve passing through the central tendencies of a dataset, and then generates a tree that serves as a temporal map.⁴⁰ Monocle learns the output graph from single-cell RNA-Seq data.

III. MACHINE LEARNING

Machine Learning In Multi-Omics

Machine learning refers to algorithms that mathematically fit a predictive model to observed (“training”) data. This model can then be applied to predict properties or “labels” of as-yet unencountered (“testing”) data. During training, the algorithm progressively improves its performance on the assigned task with each iteration or “epoch” of data it is fed. Machine learning is a branch of artificial intelligence with an “ability to interpret large, cryptic cancer datasets” and to make predictions from them. Deep learning (DL) is a subset of machine learning that has risen to prominence in recent years, in which neural networks are used to process and find complex representations of multi-omic data. When given large-scale datasets with high dimensionality, DL tends to outperform classical ML in oncology.⁴¹
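
The training and testing split described above can be made concrete with a very small classifier. The sketch below uses a nearest-centroid rule, chosen here only because it fits in a few lines; the two-gene expression profiles and their labels are invented for illustration.

```python
def fit_centroids(samples, labels):
    """Training: compute the mean feature vector (centroid) of each label."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, features):
    """Prediction: assign the label of the closest centroid (squared distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, features))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Hypothetical two-gene expression profiles with known labels ("training" data).
train_x = [[5.0, 1.0], [6.0, 1.5], [1.0, 4.0], [0.5, 5.0]]
train_y = ["tumor", "tumor", "normal", "normal"]
model = fit_centroids(train_x, train_y)

# Held-out profiles the model has not seen ("testing" data).
test_x = [[5.5, 0.8], [0.8, 4.5]]
test_y = ["tumor", "normal"]
predictions = [predict(model, x) for x in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
```

Evaluating on held-out profiles rather than on the training profiles is what gives an honest estimate of how the model behaves on unencountered data.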

Until recently, interest in DL for multi-omics analysis has been rather limited⁴². However, DL algorithms have shown promise on omics data across applications, from classification for cancer detection to precision risk stratification for cancer patients. Current single-cell sequencing technologies produce profiles for millions of single cells very quickly, opening the door to powerful deep learning approaches. On top of that, multiomic data, once integrated, can readily be fed to these algorithms, with many choices of neural network architecture depending on the kind of information supplied. For instance, convolutional neural networks (CNNs), which originate in computer vision, process positional and spatial information very well, interpreting data through a “moving window”.
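
The “moving window” idea can be shown with a one-dimensional convolution over an ordered signal. The sketch below is an illustration only: the signal values and the hand-picked kernel are assumptions, whereas a trained CNN would learn its kernels from data. As in most deep learning libraries, the operation is technically a cross-correlation, since the kernel is not flipped.

```python
def convolve1d(signal, kernel):
    """Slide `kernel` along `signal` and return the dot product at each
    position (no padding, stride 1)."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]

# Hypothetical accessibility signal along consecutive genomic bins.
signal = [0, 0, 1, 3, 1, 0, 0]
# A hand-picked kernel that responds to a local peak.
kernel = [-1, 2, -1]

response = convolve1d(signal, kernel)
# Map the strongest response back to the bin at the center of its window.
peak_position = response.index(max(response)) + len(kernel) // 2
```

Because the same kernel is reused at every position, the layer detects a pattern wherever it occurs, which is why this design suits positional data.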

Another often appropriate choice for single-cell multi-omics learning is the autoencoder, a type of neural network with three parts (encoder, bottleneck, decoder) that learns a compressed representation of raw data. The dimension of the bottleneck layer is normally lower than that of the input layer, which mitigates the curse of dimensionality that accompanies single-cell data. The encoder learns as much information about the input as possible while ignoring the noise that is commonplace in this form of genetic data. Autoencoders therefore act as a dimensionality reduction algorithm, storing a low-dimensional, optimized view of complex data that can be analysed and visualized. They also have a flexible architecture, increasing the possibilities for integrating gene and protein expression data. Other DL algorithms commonplace in this domain include deep neural networks (DNNs), artificial neural networks (ANNs), and generative adversarial networks (GANs).
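
The encoder, bottleneck and decoder structure can be sketched with a deliberately tiny model. The example below is a linear autoencoder with three inputs and a single bottleneck unit, trained by plain gradient descent on synthetic data. It is a simplification for illustration: practical autoencoders for single-cell data use nonlinear layers, many more units, and a deep learning framework, and every value here (data, weights, learning rate) is an assumption.

```python
import random

rng = random.Random(0)

# Hypothetical data: 3 features per cell that all follow one hidden factor t,
# plus a little noise, so a 1-unit bottleneck can describe most of it.
direction = [1.0, 2.0, -1.0]
data = []
for k in range(21):
    t = -1.0 + 0.1 * k
    data.append([t * d + rng.gauss(0.0, 0.05) for d in direction])

def encode(w, x):
    """Encoder: project the input onto a single bottleneck value."""
    return sum(wi * xi for wi, xi in zip(w, x))

def decode(v, z):
    """Decoder: expand the bottleneck value back to input space."""
    return [vj * z for vj in v]

def loss(w, v):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        x_hat = decode(v, encode(w, x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)

w = [0.1, 0.1, 0.1]  # encoder weights
v = [0.1, 0.1, 0.1]  # decoder weights
initial_loss = loss(w, v)

learning_rate = 0.05
for _ in range(500):
    grad_w = [0.0] * 3
    grad_v = [0.0] * 3
    for x in data:
        z = encode(w, x)
        err = [a - b for a, b in zip(decode(v, z), x)]
        back = sum(e * vj for e, vj in zip(err, v))  # error passed back to z
        for j in range(3):
            grad_v[j] += 2 * err[j] * z / len(data)
            grad_w[j] += 2 * back * x[j] / len(data)
    w = [wi - learning_rate * g for wi, g in zip(w, grad_w)]
    v = [vi - learning_rate * g for vi, g in zip(v, grad_v)]

final_loss = loss(w, v)
```

After training, `encode(w, x)` gives a one-number summary of each three-feature cell, and the reconstruction error that remains is mostly the injected noise, which the bottleneck has no capacity to store.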

Figure 1: Interpretation of tumor stages and key modifications.

Figure 2: Autoencoder use in single-cell data (adapted from “Deep Learning in Single-Cell Genomics – Laboratory for Statistical and Translational Genomics”, University of Pennsylvania 2019)

Single-Cell Sequencing, Multiomics And Big Data

“Big data” refers to data that does not fit into memory and is therefore too big to be processed by conventional means⁴³. Unlike bulk sequencing, where sequencing libraries are generated from thousands of cells, single-cell technologies must generate cell-specific sequencing libraries⁴⁴. As a result, single-cell technology has for the first time enabled genomic measurement at the scale of millions of cells. These technologies, specifically in the context of single-cell multi-omics, can run in parallel, generating very large amounts of data in very short periods of time.⁴⁵ On top of this, multi-omic data can take various forms, with no relational consistency between data types. The complexity of the data and the massive sequencing volumes make scMO a big data problem.

Single-cell technologies profile an ever-increasing number of cells, while the amount of raw data per experiment does not grow nearly as fast. Computational needs for data preprocessing and storage therefore stay relatively constant, while the need for analytics that can handle a large number of cells becomes critical. For analytics, a few problems with single-cell data must be dealt with. Firstly, there is a “curse of dimensionality”: the number of features characterizing the data (e.g. genomic factors) is far larger than the number of samples, which makes the data sparse and results in model overfitting and computational inefficiency⁴⁶. There are algorithms tailored to reducing the noise and dimensionality of such data. For effective use of analytics platforms, model selection must take dimensionality and noise into account, or a preprocessing pipeline must be built. Finally, imputation of missing values is important for preserving data quality, especially if the data is passed to a deep learning algorithm, because high precision requires that the columns to which the neural network gives greater weight hold valid data in every row. It is important to note that downstream machine learning layers cannot distinguish imputed from non-imputed data, so a deep learning algorithm that imputes values carefully may be considered.⁴⁷
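
One way to keep imputed values distinguishable downstream is to carry a mask alongside the filled matrix. The sketch below uses column-mean imputation as a simple stand-in for the deep learning imputation discussed above; the matrix and the use of `None` for missing entries are assumptions made for the example.

```python
def impute_with_mask(matrix):
    """Fill missing entries (None) with the column mean and return a parallel
    0/1 mask marking which entries were imputed."""
    n_cols = len(matrix[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] is not None]
        col_means.append(sum(observed) / len(observed) if observed else 0.0)
    filled = [
        [col_means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in matrix
    ]
    mask = [[1 if row[j] is None else 0 for j in range(n_cols)] for row in matrix]
    return filled, mask

# Hypothetical cells-by-features matrix; None marks a missing value.
matrix = [
    [2.0, None, 1.0],
    [4.0, 3.0, None],
    [None, 5.0, 3.0],
]
filled, mask = impute_with_mask(matrix)
```

The mask can be passed to later stages as extra input features or used to down-weight imputed entries, so that the model is not forced to treat estimated and measured values alike.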

Figure 3: Precision Medicine Pipeline

CONCLUSION

Ultimately, cancer is anything but a singular disease or symptom - it is a self-contained system. Multiomic analysis of single-cell cancer data makes it possible to use the unprecedented power of current high-throughput molecular and computational tools to draw a more complete picture of the different players in tumorigenesis and tumor establishment. At the same time, it may provide new instruments and strategies useful in basic and clinical research laboratories and in translational medicine and therapeutic endeavors. As our understanding of cancer has grown, so has its apparent complexity at the level of data, mutation, and incidence. For a long time, research into cancer has posed more questions than answers, and efforts to understand the basis of cancer have overlooked traits such as the evolution of its genome, intra-tumor variability, functional genomics, microenvironment response, the temporal and spatial axes, and correlations between omic layers. While the introduction of huge datasets has been promising for cancer research and has made it more amenable to deep learning, it has also brought ambiguity about processing, feature selection, and data integration.

Cancer has a genomic and epigenomic basis, and it is ever-evolving. The only way for precision medicine to win its race against cancer is to integrate all that we know in order to target therapy, track the disease’s evolution, and ultimately improve detection and treatment. The careful yet generous use of new machine learning algorithms from areas such as data engineering, NLP, and computer vision could prove decisive against cancer, and the creation of specialized ML techniques for single-cell multi-omics, optionally alongside secondary omics data, combined with a better understanding of ML in the research community, can facilitate great change. The treatment of cancer has popularly been called “the greatest challenge humans have ever tackled”⁴⁸.

The development of single-cell multi-omics, facilitated by sophisticated computational techniques, is a harbinger of coming changes and, potentially, of cures for cancer. Under a systems biology and precision medicine framework, multi-omic analysis combines the power of high-throughput genomic and computational biology tools, creating a holistic model of the different players in tumorigenesis. Additionally, this may aid the prevention of cancer in the future, given a heightened interest in personalized medicine and leaps in biomarker detection, both through sequencing and through algorithms. The research community must not overlook the enormity of the data that comes with this technique, and should instead adapt to interpret profiles in a specific, collaborative, and interdisciplinary way. A multi-omic approach, combined with technological advancements, may be our best bet for ending the suffering caused by cancer.

Bibliography

1. Lutsik, Pavlo, Annika Baude, Daniela Mancarella, Simin Öz, Alexander Kühn, Reka Toth, Joschka Hey, et al. 2020. “Globally Altered Epigenetic Landscape and Delayed Osteogenic Differentiation in H3.3-G34W-Mutant Giant Cell Tumor of Bone.” Nature Communications 11 (1). https://doi.org/10.1038/s41467-020-18955-y.

2. Bossel Ben-Moshe, Noa, Shlomit Gilad, Gili Perry, Sima Benjamin, Nora Balint-Lahat, Anya Pavlovsky, Sharon Halperin, et al. 2018. “MRNA-Seq Whole Transcriptome Profiling of Fresh Frozen versus Archived Fixed Tissues.” BMC Genomics 19 (1). https://doi.org/10.1186/s12864-018-4761-3.

3. Hamamoto, Ryuji, Masaaki Komatsu, Ken Takasawa, Ken Asada, and Syuzo Kaneko. 2019. “Epigenetics Analysis and Integrated Analysis of Multiomics Data, Including Epigenetic Data, Using Artificial Intelligence in the Era of Precision Medicine.” Biomolecules 10 (1): 62. https://doi.org/10.3390/biom10010062.

4. Lee, Jeongwoo, Do Young Hyeon, and Daehee Hwang. 2020. “Single-Cell Multiomics: Technologies and Data Analysis Methods.” Experimental & Molecular Medicine 52 (9): 1428–42. https://doi.org/10.1038/s12276-020-0420-2.

5. Lähnemann, David, Johannes Köster, Ewa Szczurek, Davis J. McCarthy, Stephanie C. Hicks, Mark D. Robinson, Catalina A. Vallejos, et al. 2020. “Eleven Grand Challenges in Single-Cell Data Science.” Genome Biology 21 (1). https://doi.org/10.1186/s13059-020-1926-6.

6. Lee, Jeongwoo, Do Young Hyeon, and Daehee Hwang. 2020. “Single-Cell Multiomics: Technologies and Data Analysis Methods.” Experimental & Molecular Medicine 52 (9): 1428–42. https://doi.org/10.1038/s12276-020-0420-2.

7. Mirza, Bilal, Wei Wang, Jie Wang, Howard Choi, Neo Christopher Chung, and Peipei Ping. 2019. “Machine Learning and Integrative Analysis of Biomedical Big Data.” Genes 10 (2): 87. https://doi.org/10.3390/genes10020087.

8. Dean, Michael, and Karobi Moitra. 2018. “Cancer: Blueprint of a Tumor.” Colloquium Series on The Genetic Basis of Human Disease 6 (1): i–76. https://doi.org/10.4199/C00169ED1V01Y201809GBD009.

9. Velculescu, V. E. 2008. “Defining the Blueprint of the Cancer Genome.” Carcinogenesis 29 (6): 1087–91. https://doi.org/10.1093/carcin/bgn096.

10. Kinzler, Kenneth W, and Bert Vogelstein. 1996. “Lessons from Hereditary Colorectal Cancer.” Cell 87 (2): 159–70. https://doi.org/10.1016/s0092-8674(00)81333-1.

11. Esteller, Manel. 2008. “Epigenetics in Cancer.” New England Journal of Medicine 358 (11): 1148–59. https://doi.org/10.1056/nejmra072067.

12. Valastyan, Scott, and Robert A. Weinberg. 2011. “Tumor Metastasis: Molecular Insights and Evolving Paradigms.” Cell 147 (2): 275–92. https://doi.org/10.1016/j.cell.2011.09.024.

13. Weinberg, Robert A. 1996. “How Cancer Arises.” Scientific American 275 (3): 62–70. https://doi.org/10.1038/scientificamerican0996-62.

14. Vogel, F. 1979. “Genetics of Retinoblastoma.” Human Genetics 52 (1). https://doi.org/10.1007/bf00284597.

15. Day, Mark L., Rosalinda G. Foster, Kathleen C. Day, Xin Zhao, Peter Humphrey, Paul Swanson, Antonio A. Postigo, Steven H. Zhang, and Douglas C. Dean. 1997. “Cell Anchorage Regulates Apoptosis through the Retinoblastoma Tumor Suppressor/E2F Pathway.” Journal of Biological Chemistry 272 (13): 8125–28. https://doi.org/10.1074/jbc.272.13.8125.

16. Xu, Xue, Yuan Zhou, Xiaowen Feng, Xiong Li, Mohammad Asad, Derek Li, Bo Liao, Jianqiang Li, Qinghua Cui, and Edwin Wang. 2020. “Germline Genomic Patterns Are Associated with Cancer Risk, Oncogenic Pathways, and Clinical Outcomes.” Science Advances 6 (48): eaba4905. https://doi.org/10.1126/sciadv.aba4905.

17. Esteller, M. 2006. “Epigenetics Provides a New Generation of Oncogenes and Tumour-Suppressor Genes.” British Journal of Cancer 94 (2): 179–83. https://doi.org/10.1038/sj.bjc.6602918.

18. Knudson, A. G. 1971. “Mutation and Cancer: Statistical Study of Retinoblastoma.” Proceedings of the National Academy of Sciences 68 (4): 820–23. https://doi.org/10.1073/pnas.68.4.820.

19. Chen, Xingqi. 2020. “Single-Cell Epigenomics: Methods and Translation.” Epigenetics Methods, 525–35. https://doi.org/10.1016/b978-0-12-819414-0.00026-4.

20. Esteller, Manel. 2007. “Cancer Epigenomics: DNA Methylomes and Histone-Modification Maps.” Nature Reviews Genetics 8 (4): 286–98. https://doi.org/10.1038/nrg2005.

21. Romero-Garcia, Susana, Heriberto Prado-Garcia, and Angeles Carlos-Reyes. 2020. “Role of DNA Methylation in the Resistance to Therapy in Solid Tumors.” Frontiers in Oncology 10 (August). https://doi.org/10.3389/fonc.2020.01152.

22. Baylin, Stephen B., and Peter A. Jones. 2016. “Epigenetic Determinants of Cancer.” Cold Spring Harbor Perspectives in Biology 8 (9): a019505. https://doi.org/10.1101/cshperspect.a019505.

23. Binnewies et al. 2018.

24. Fridman et al. 2017.

25. Hodi et al. 2010; Robert et al. 2014; Garon et al. 2015; Ribas and Wolchok 2018.

26. Sharma and Allison 2015; Wei et al. 2018.

27. Stratton, Michael R., Peter J. Campbell, and P. Andrew Futreal. 2009. “The Cancer Genome.” Nature 458 (7239): 719–24. https://doi.org/10.1038/nature07943.

28. Litzenburger, Ulrike M., Jason D. Buenrostro, Beijing Wu, Ying Shen, Nathan C. Sheffield, Arwa Kathiria, William J. Greenleaf, and Howard Y. Chang. 2017. “Single-Cell Epigenomic Variability Reveals Functional Cancer Heterogeneity.” Genome Biology 18 (1). https://doi.org/10.1186/s13059-016-1133-7.

29. Lawson, Devon A., Kai Kessenbrock, Ryan T. Davis, Nicholas Pervolarakis, and Zena Werb. 2018. “Tumour Heterogeneity and Metastasis at Single-Cell Resolution.” Nature Cell Biology 20 (12): 1349–60. https://doi.org/10.1038/s41556-018-0236-7.

30. Hagemann, I. S., P. K. O'Neill, I. Erill, and J. D. Pfeifer. 2015. “Diagnostic Yield of Targeted Next Generation Sequencing in Various Cancer Types: An Information-Theoretic Approach.” Cancer Genetics 208: 441–47.

31. Vogelstein et al.

32. Navin, N. E. 2015. “The First Five Years of Single-Cell Cancer Genomics and Beyond.” Genome Research.

33. Tokheim et al. 2016.

34. Crowley, Emily, Federica Di Nicolantonio, Fotios Loupakis, and Alberto Bardelli. 2013. “Liquid Biopsy: Monitoring Cancer-Genetics in the Blood.” Nature Reviews Clinical Oncology 10 (8): 472–84. https://doi.org/10.1038/nrclinonc.2013.110.

35. Stroun, M., J. Lyautey, C. Lederrey, A. Olson-Sand, and P. Anker. 2001. “About the Possible Origin and Mechanism of Circulating DNA: Apoptosis and Active DNA Release.” Clinica Chimica Acta 313 (1–2): 139–42.

36. Tirosh, I., B. Izar, S. M. Prakadan, M. H. Wadsworth, D. Treacy, J. J. Trombetta, A. Rotem, et al. 2016. “Dissecting the Multicellular Ecosystem of Metastatic Melanoma by Single-Cell RNA-Seq.” Science 352 (6282): 189–96. https://doi.org/10.1126/science.aad0501.

37. Fu, Yusi, Fangli Zhang, Xiannian Zhang, Junlong Yin, Meijie Du, Mengcheng Jiang, Lu Liu, Jie Li, Yanyi Huang, and Jianbin Wang. 2019. “High-Throughput Single-Cell Whole-Genome Amplification through Centrifugal Emulsification and EMDA.” Communications Biology 2 (1). https://doi.org/10.1038/s42003-019-0401-y.

38. Grosselin, Kevin, Adeline Durand, Justine Marsolier, Adeline Poitou, Elisabetta Marangoni, Fariba Nemati, Ahmed Dahmani, et al. 2019. “High-Throughput Single-Cell ChIP-Seq Identifies Heterogeneity of Chromatin States in Breast Cancer.” Nature Genetics 51 (6): 1060–66. https://doi.org/10.1038/s41588-019-0424-9.

39. “Reversed Graph Embedding Resolves Complex Single-Cell Trajectories.”

40. “Monocle.” 2014. Github.io. https://cole-trapnell-lab.github.io/monocle-release/docs/.

41. Yee, Nelson S. 2021. “Machine Intelligence for Precision Oncology.” World Journal of Translational Medicine 9 (1): 1–10. https://doi.org/10.5528/wjtm.v9.i1.1.

42. Tan et al. 2020b.

43. Cox, M., and D. Ellsworth.

44. Xin, Y., J. Kim, M. Ni, Y. Wei, H. Okamoto, J. Lee, C. Adler, K. Cavino, A. J. Murphy, G. D. Yancopoulos, H. C. Lin, and J. Gromada.

45. Zheng, G. X. Y., J. M. Terry, P. Belgrader, P. Ryvkin, Z. W. Bent, R. Wilson, S. B. Ziraldo, T. D. Wheeler, G. P. McDermott, J. Zhu, M. T. Gregory, J. Shuga, L. Montesclaros, J. G. Underwood, D. A. Masquelier, S. Y. Nishimura, M. Schnall-Levin, P. W. Wyatt, C. M. Hindson, R. Bharadwaj, A. Wong, K. D. Ness, L. W. Beppu, H. J. Deeg, C. McFarland, K. R. Loeb, W. J. Valente, N. G. Ericson, E. A. Stevens, J. P. Radich, T. S. Mikkelsen, B. J. Hindson, and J. H. Bielas.

46. Wu, Q.; Boueiz, A.; Bozkurt, A.; Masoomi, A.; Wang, A.; DeMeo, D.L.; Weiss, S.T.; Qiu, W. Deep Learning

47. Arisdakessian, Cédric, Olivier Poirion, Breck Yunits, Xun Zhu, and Lana X. Garmire. 2019. “DeepImpute: An Accurate, Fast, and Scalable Deep Neural Network Method to Impute Single-Cell RNA-Seq Data.” Genome Biology 20 (1). https://doi.org/10.1186/s13059-019-1837-6.

48. “Cancer: Blueprint of a Tumor.”