Large-scale transcriptome and epigenome association analysis across multiple traits
James J Peters VA Medical Center, Bronx, NY
Abstract
PROJECT SUMMARY Precision Psychiatry is an emerging approach that considers patients' characteristics to customize prevention and treatment for serious mental illness. The Million Veteran Program (MVP) is the largest and most comprehensive biobank in the world, currently involving multi-ancestry genetic data from more than 650,000 Veterans and highly dense electronic health record information that fully captures the clinical characteristics of each participant. Given the high prevalence of serious mental illness among our Veterans, MVP provides a unique opportunity to perform large-scale genetic discovery that will further our understanding of the pathophysiology of serious mental illness and promote Precision Psychiatry. While well-powered genome-wide association studies (GWAS) have identified multiple risk variants across serious mental illnesses, there have been few conclusive findings on the functional relevance of most discovered loci because of their small effect sizes, their overlap with non-coding regions of the genome, and the unclear mechanisms through which they act. Our group and others have shown that a large portion of phenotypic variability in disease risk can be explained by regulatory variants with cell type specificity, i.e., genetic variants that affect epigenetic mechanisms and the expression levels of genes. Studying gene expression and epigenome changes directly in MVP samples is not feasible because such data are not available. To overcome this limitation, we propose to take advantage of large-scale datasets with genotyping and multiscale molecular profiling that our group and others have generated in human brain tissue, and to apply machine learning approaches to directly impute genome-wide transcriptomes, epigenomes and proteomes in MVP samples using the existing MVP genotypes.
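As a rough illustration of what genotype-based imputation of a molecular trait can look like, the sketch below predicts per-individual gene expression as a weighted sum of allele dosages. It assumes a simple linear model with pre-trained SNP-to-gene weights; the function name, the weights and the dosages are hypothetical toy values, and the machine learning models proposed here may differ substantially.

```python
def impute_expression(dosages, weights):
    """Predict expression for each individual and gene from genotypes.

    dosages: list of rows, one per individual, each holding allele
             dosages (0 to 2) for the same ordered set of SNPs.
    weights: list of rows, one per SNP, each holding that SNP's weight
             for every gene (hypothetical pre-trained values).
    Returns a list of rows (individuals) by columns (genes).
    """
    n_snps = len(weights)
    n_genes = len(weights[0])
    for row in dosages:
        if len(row) != n_snps:
            raise ValueError("dosage row does not match number of SNPs")
    # Linear predictor: expression[i][g] = sum over SNPs of dosage * weight
    return [
        [sum(row[s] * weights[s][g] for s in range(n_snps)) for g in range(n_genes)]
        for row in dosages
    ]


# Toy data (assumed for illustration): 2 individuals, 3 SNPs, 2 genes
dosages = [[0, 1, 2], [2, 0, 1]]
weights = [[0.5, 0.0], [-0.2, 0.3], [0.1, 0.4]]
predicted = impute_expression(dosages, weights)
```

In practice the weights would be learned in a reference panel that has both genotypes and measured molecular profiles, then applied to cohorts that have genotypes only.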
The primary goals of our project are threefold. First, imputed MVP transcriptomes, epigenomes and proteomes will be meta-analyzed into a single tissue-specific gene dysregulation score for each individual via a novel method, called PolyXcan, which leverages a data-driven, correlation-aware meta-analytical framework and performs joint multi-omics-wide association studies. For each serious mental illness, key gene drivers and molecular pathways will be identified with a structured, interpretable deep learning approach, and gene-gene interaction effects will be identified by leveraging patient subtypes derived from semi-supervised, graph-based clustering methods; both of these approaches are only possible with well-powered individual-level (genotypic and phenotypic) data of the scale that exists in MVP, and we expect them to enhance efforts for gene target prioritization and drug discovery. Second, imputed gene dysregulation for each individual in MVP will be integrated with perturbagen reference libraries (describing the effect of therapeutic compounds on gene expression) to identify the extent to which compounds could be therapeutic by antagonizing the predicted gene dysregulation. We have validated this approach on summary-level data (from GWAS) in a wide range of disorders (autoimmune, neuropsychiatric and COVID-19). Here we propose to use the same approach at the individual level to determine whether genetics can be utilized to rank potential treatments and predict the ones that achieve better outcomes. Third, the scale of data generation and its integration into predictive models will provide a wealth of data that will be made available to the MVP scientific community for other diseases beyond the immediate goals of this proposal, with the potential to increase our understanding of Precision Psychiatry.
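The antagonism idea in the second goal can be sketched as follows: a compound is a stronger candidate when its expression signature is anti-correlated with an individual's predicted gene dysregulation. This is a minimal sketch that assumes Pearson correlation as the scoring function; the compound names, signatures and dysregulation values are hypothetical, and the scoring used in the project itself is not specified here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no directional information
    return cov / sqrt(var_x * var_y)


def rank_compounds(dysregulation, signatures):
    """Sort compounds from most antagonistic (most negative) to least."""
    scores = {name: pearson(dysregulation, sig) for name, sig in signatures.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Toy values (assumed): dysregulation scores for four genes in one person
dysregulation = [1.5, -0.8, 0.3, -1.2]
# Hypothetical compound signatures over the same four genes
signatures = {
    "compound_A": [-1.4, 0.9, -0.2, 1.0],  # roughly opposes the dysregulation
    "compound_B": [1.2, -0.5, 0.4, -1.0],  # roughly mimics the dysregulation
    "compound_C": [0.1, 0.2, -0.1, 0.0],   # weak, mostly unrelated effect
}
ranking = rank_compounds(dysregulation, signatures)
```

With these toy inputs, the compound whose signature opposes the dysregulation is ranked first and the one that mimics it is ranked last.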
Successful completion of our study would have an enormous impact on our Veterans since, in addition to the tremendous burden of suffering and economic costs, serious mental illness increases the mortality rate among Veterans.