Machine Learning and Computer Vision Approaches for Phenotypic Profiling in Yeast

Discovering the functions of genes has long been, and remains, a central goal of genetics research. The overarching goal of my thesis was to develop tools that help researchers discover gene function. One powerful way to understand a gene's function is to perturb the gene and identify the phenotypic consequences for the cell by imaging. When specific compartments of the cell are imaged, the perturbed function reveals itself through altered subcellular morphology. An automated pipeline that detects abnormal morphological changes caused by any perturbation would therefore be useful to the community.

I compared different combinations of image analysis and outlier detection methods for automatically detecting cells with abnormal morphologies. This detection allowed me to quantify the percentage of cells with an abnormal phenotype in a population carrying a perturbed gene, that is, the penetrance. By combining outlier detection with neural network-based image analysis of single cells, I quantified the phenotypic variability within a population of cells. As a model system, I first focused on genes that influence the architecture of four subcellular compartments of the endocytic pathway. I identified 17 distinct abnormal phenotypes, which are associated with many genes. Nearly half of the perturbed populations displayed multiple phenotypes, suggesting that morphological pleiotropy is prevalent.
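
As a minimal illustration of how penetrance follows from per-cell outlier calls, the sketch below scores each cell by its distance from a wild-type reference population and reports the fraction of mutant cells beyond a cutoff. It is not the method used in the thesis: the function name `penetrance`, the per-feature z-score distance, and the 95th-percentile cutoff are assumptions chosen for illustration, and the feature vectors stand in for whatever the image analysis step produces.

```python
import numpy as np


def penetrance(wild_type, mutant, percentile=95.0):
    """Return (penetrance, outlier_mask) for a mutant population.

    wild_type, mutant: arrays of shape (n_cells, n_features) holding
    per-cell feature vectors (hypothetical output of an image analysis step).
    percentile: wild-type score percentile used as the outlier cutoff
    (an illustrative choice, not a value from the thesis).
    """
    wild_type = np.asarray(wild_type, dtype=float)
    mutant = np.asarray(mutant, dtype=float)

    # Standardize every feature against the wild-type reference population.
    mean = wild_type.mean(axis=0)
    std = wild_type.std(axis=0) + 1e-12  # guard against constant features

    def score(cells):
        # Euclidean distance from the wild-type mean in z-score units.
        return np.sqrt((((cells - mean) / std) ** 2).sum(axis=1))

    # Cells scoring above this wild-type percentile are called abnormal.
    threshold = np.percentile(score(wild_type), percentile)
    is_outlier = score(mutant) > threshold

    # Penetrance is the fraction of mutant cells called abnormal.
    return float(is_outlier.mean()), is_outlier
```

With a 95th-percentile cutoff, roughly 5% of normal-looking cells are expected to be flagged by construction, so penetrance values near that baseline should not be read as a phenotype.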

To understand the global morphology of the yeast cell, I automated the clustering of cells with abnormal morphology into distinct abnormal phenotypes across a comprehensive set of 18 subcellular compartments. While the central players of many important biological processes have been discovered, numerous gaps remain in our understanding of how cellular morphology is regulated. I aimed to address these gaps by developing a computational pipeline that systematically and quantitatively assesses abnormal phenotypes at the genome-wide level. These analyses allowed me to identify connections between discrete biological processes, predict novel gene functions, and gain a clearer understanding of basic eukaryotic cell biology.