Identifying Genetic Drivers of Cancer Morphology
Cancer is a leading cause of mortality worldwide, claiming the lives of nearly 8 million people in 2008 alone. To treat cancer effectively, we need a holistic understanding of how aberrations in key cellular pathways drive tumor formation. Current research, however, remains predominantly focused on molecular data, even though clinical diagnosis and prognostication rely primarily on the morphologic analysis of histologic data. In this thesis, we develop (1) an image processing pipeline capable of extracting clinically relevant morphological features from whole-slide tissue samples, and (2) a system of multi-task regressions that robustly and efficiently associates gene expression levels with changes in specific morphological traits. Together, these allow us to distill massive amounts of histological and molecular data into a set of unbiased, testable hypotheses about the effect of specific genes on particular clinically relevant aspects of tumor morphology. We demonstrate our system on matched histological and molecular data from a total of 574 breast cancer patients in two independent cohorts: 248 from the Netherlands Cancer Institute and 326 from The Cancer Genome Atlas. Our results corroborate many associations between known oncogenes and tumor-suppressor genes and tumor morphology, including the recently discovered role of CDC6 in epithelial-mesenchymal transition. We also identify several putative, previously unknown key genes in breast carcinoma, together with their purported roles in tumor morphology, e.g., the role of VIPR2 in promoting the stromal environment. These promising results pave the way for future investigative work on these genes and show the viability of our integrative analysis of morphological and molecular data.