Gene prioritization is the process of assigning similarity or confidence scores to genes and ranking them by the probability of their association with a disease of interest. Several bioinformatics tools for gene prioritization already exist, but they are time-consuming and costly to use. Agam and his students are developing novel computational approaches that identify high-confidence candidate genes causative for disease phenotypes from the large lists of variations produced by high-throughput genomics. This work includes an approach that more quickly and accurately ranks the genes most likely to cause autism. The approach is an algorithm based on a modified conditional random field model that uses gene annotations and gene interactions simultaneously while preserving their original representation. The algorithm draws on multidimensional biological information from a database that integrates more than 35 public databases and private collections, molecular pathways, phenotypic databases, and ontologies: in short, everything known about a gene.
Above: Symptom-disease network image generated by Ph.D. student Haithum Elhadi.
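
To make the idea of ranking genes from both annotations and interactions concrete, here is a minimal sketch. It is not the modified conditional random field model described above; it swaps in a much simpler score-propagation scheme in which each gene's score blends its own annotation-based score with the average score of its interaction partners. The function name `prioritize`, the parameters `alpha` and `iterations`, the gene names, and all scores are hypothetical and chosen only for illustration.

```python
def prioritize(annotation_score, interactions, alpha=0.5, iterations=20):
    """Rank genes by blending annotation scores with neighbor scores.

    annotation_score: dict mapping gene name -> prior score from annotations.
    interactions: iterable of (gene, gene) pairs; both genes must be keys
                  of annotation_score (an assumption of this sketch).
    alpha: weight given to a gene's own annotation score (hypothetical knob).
    """
    # Build an undirected adjacency list from the interaction pairs.
    neighbors = {gene: set() for gene in annotation_score}
    for a, b in interactions:
        neighbors[a].add(b)
        neighbors[b].add(a)

    scores = dict(annotation_score)
    for _ in range(iterations):
        updated = {}
        for gene, prior in annotation_score.items():
            partners = neighbors[gene]
            # Genes without partners receive no support from the network.
            if partners:
                neighbor_mean = sum(scores[p] for p in partners) / len(partners)
            else:
                neighbor_mean = 0.0
            updated[gene] = alpha * prior + (1 - alpha) * neighbor_mean
        scores = updated

    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up example: GENE_B and GENE_C have the same annotation score,
# but only GENE_B interacts with the strongly annotated GENE_A.
annotation = {"GENE_A": 0.9, "GENE_B": 0.1, "GENE_C": 0.1, "GENE_D": 0.0}
interactions = [("GENE_A", "GENE_B")]

ranking = prioritize(annotation, interactions)
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

In this toy input, GENE_B ends up ranked above GENE_C even though their annotation scores are equal, because it interacts with a high-scoring gene. That is the kind of effect that combining annotations with interactions is meant to capture.
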
- Disease gene prioritization using network and feature (Journal of Computational Biology, vol. 22, no. 4, 2015, pp. 313-323) http://www.ncbi.nlm.nih.gov/pubmed/25844670
- Minority oversampling to help train classifiers when few examples are available from one of the classes, which makes the classifiers more difficult to train. The paper was accepted for publication at CIKM 2016.
- Learning from Synthetic Data Using a Stacked Multichannel Autoencoder. The idea is to use synthetic data to improve classification results on actual data. This work was published at ICMLA 2015.
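
As a rough illustration of what minority oversampling means in general, the sketch below creates synthetic minority-class examples by interpolating between an existing minority point and one of its nearest minority neighbors, in the style of SMOTE. This is a generic technique shown for orientation only and is not necessarily the method of the CIKM 2016 paper listed above. The function name `oversample_minority`, its parameters, and the sample points are hypothetical.

```python
import random


def oversample_minority(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from a list of minority-class points.

    minority: list of equal-length numeric tuples (at least two points).
    k: number of nearest minority neighbors to choose from (hypothetical knob).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples to interpolate")

    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((x - y) ** 2 for x, y in zip(base, p)))
        neighbor = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbor.
        t = rng.random()
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Made-up minority class: four points at the corners of the unit square.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = oversample_minority(minority_points, n_new=5)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority points, the new examples stay inside the region the minority class already occupies, which gives a classifier more minority examples to learn from without copying existing ones verbatim.
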