Ontology-based representation of cancer driver genes for predictive purposes
Stratifying cancer patients by their genomic signature makes it possible to predict disease progression and to deliver targeted treatments to the patients most likely to respond. Many computational approaches have therefore been used to identify gene expression signatures that enable customised prognosis and therapy for individuals. However, this goal can only be achieved by relating genes to phenotypes, to the biological processes and molecular functions in which they can be involved, and to the cellular locations at which their products are active. The aim of this project is to find a meaningful embedding for genes by interlinking multiple ontologies using machine learning. The embedding will then be used as input to deep learning algorithms that predict patients’ survival-related characteristics.
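To make the idea of a gene embedding built from several ontologies more concrete, here is a minimal sketch. Everything in it is an assumption for illustration only: the gene names, term identifiers and ontology labels are placeholders rather than real annotations, and a truncated SVD (PCA) is used as a simple linear stand-in for the autoencoder the project would more likely explore.

```python
import numpy as np

# Hypothetical toy annotations (placeholder genes and term IDs, not real data).
# One dictionary per ontology, mapping gene -> set of annotated terms.
ANNOTATIONS = {
    "biological_process": {
        "GENE_A": {"BP:1", "BP:2"}, "GENE_B": {"BP:1"},
        "GENE_C": {"BP:3"}, "GENE_D": {"BP:2", "BP:3"},
    },
    "molecular_function": {
        "GENE_A": {"MF:1"}, "GENE_B": {"MF:1", "MF:2"},
        "GENE_C": {"MF:2"}, "GENE_D": {"MF:3"},
    },
    "cellular_component": {
        "GENE_A": {"CC:1"}, "GENE_B": {"CC:1"},
        "GENE_C": {"CC:2"}, "GENE_D": {"CC:2"},
    },
    "phenotype": {
        "GENE_A": {"PH:1"}, "GENE_B": {"PH:1"},
        "GENE_C": {"PH:2"}, "GENE_D": {"PH:1", "PH:2"},
    },
}


def build_annotation_matrix(annotations):
    """Concatenate all ontologies into one binary gene-by-term matrix."""
    genes = sorted({g for onto in annotations.values() for g in onto})
    columns = sorted(
        (name, term)
        for name, onto in annotations.items()
        for term in set().union(*onto.values())
    )
    row = {g: i for i, g in enumerate(genes)}
    col = {c: j for j, c in enumerate(columns)}
    matrix = np.zeros((len(genes), len(columns)))
    for name, onto in annotations.items():
        for gene, terms in onto.items():
            for term in terms:
                matrix[row[gene], col[(name, term)]] = 1.0
    return genes, columns, matrix


def embed_genes(matrix, dim):
    """Project genes onto the top `dim` principal directions (linear stand-in
    for an autoencoder bottleneck)."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :dim] * s[:dim]


genes, columns, matrix = build_annotation_matrix(ANNOTATIONS)
embedding = embed_genes(matrix, dim=2)
for gene, vector in zip(genes, embedding):
    print(gene, np.round(vector, 3))
```

In a real setting, each row of `embedding` would replace or augment the raw gene identifier in the input to the downstream survival model; how best to combine the ontologies is precisely the open question of the project.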
The ideal candidate for this project will have strong machine learning programming skills in Python. Good knowledge of ontologies is also required. Familiarity with autoencoders and biomedical data is a big plus.
Symbolic Rule Extraction
As machine learning algorithms increasingly make decisions previously entrusted to humans, it is of paramount importance to explain those decisions, as evidenced by recent regulations such as the European Union’s “Right to Explanation”. Deep neural networks have had great success in both supervised and unsupervised machine learning; however, because of the large number of elementary operations they use to derive a decision, they are in general not explainable. Attempts to solve this problem are thus focused on finding ways of reducing the complexity of these operations. Automatic rule extraction is an approach that aims to extract rules which, in combination, act as a proxy model that summarises and explains the network’s decisions. The aim of this project is to extract rules from a deep learning model that takes cancer patients’ gene expression as input and classifies the patients as likely or unlikely to relapse.
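The proxy-model idea can be sketched as follows. This is only one possible flavour of rule extraction (treating the network as a black box and fitting a shallow decision tree to its outputs), not necessarily the method the project will adopt. The tiny fixed-weight network, its weights, the feature names and the random inputs are all made up for illustration.

```python
import numpy as np

FEATURES = ["gene_0", "gene_1", "gene_2"]  # hypothetical gene names


def network_predict(x):
    """Stand-in black box: a tiny MLP with made-up weights (1 = relapse)."""
    w1 = np.array([[2.0, -1.0], [-1.5, 2.0], [0.1, 0.1]])
    b1 = np.array([-0.5, -0.5])
    w2 = np.array([1.5, -1.5])
    hidden = np.maximum(0.0, x @ w1 + b1)
    return (hidden @ w2 > 0.0).astype(int)


def best_split(x, y):
    """Find the (feature, threshold) split with the fewest misclassifications."""
    best = None
    for j in range(x.shape[1]):
        for t in np.quantile(x[:, j], np.linspace(0.05, 0.95, 19)):
            mask = x[:, j] <= t
            if mask.all() or not mask.any():
                continue  # skip splits that leave one side empty
            errors = sum(min(side.sum(), len(side) - side.sum())
                         for side in (y[mask], y[~mask]))
            if best is None or errors < best[2]:
                best = (j, t, errors)
    return best


def extract_rules(x, y, depth, conditions=()):
    """Greedily grow a shallow tree on the network's outputs; each leaf is a rule."""
    majority = int(y.mean() >= 0.5)
    if depth == 0 or y.min() == y.max():
        return [(conditions, majority)]
    split = best_split(x, y)
    if split is None:
        return [(conditions, majority)]
    j, t, _ = split
    mask = x[:, j] <= t
    return (extract_rules(x[mask], y[mask], depth - 1, conditions + ((j, "<=", t),))
            + extract_rules(x[~mask], y[~mask], depth - 1, conditions + ((j, ">", t),)))


def apply_rules(rules, x):
    """Classify samples with the extracted rules (the proxy model)."""
    pred = np.zeros(len(x), dtype=int)
    for conditions, label in rules:
        mask = np.ones(len(x), dtype=bool)
        for j, op, t in conditions:
            mask &= (x[:, j] <= t) if op == "<=" else (x[:, j] > t)
        pred[mask] = label
    return pred


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(500, 3))   # synthetic "expression" values
y_net = network_predict(x)                 # labels come from the network, not ground truth
rules = extract_rules(x, y_net, depth=2)
fidelity = float(np.mean(apply_rules(rules, x) == y_net))

for conditions, label in rules:
    body = " AND ".join(f"{FEATURES[j]} {op} {t:.2f}" for j, op, t in conditions) or "TRUE"
    print(f"IF {body} THEN {'relapse' if label else 'no relapse'}")
print(f"fidelity to network: {fidelity:.2f}")
```

The quantity to watch is fidelity, the fraction of inputs on which the rules agree with the network: a proxy is only a useful explanation if it is both short and faithful, and trading those two off is the core difficulty.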
The ideal candidate for this project will have strong machine learning programming skills in Python. A strong mathematical background is also required. Familiarity with biomedical data is a big plus.