Statistical Genomics and the Analysis of High Dimensional Data

Sylvia Richardson

0 Collaborator(s)

Funding source

Medical Research Council (MRC)
Motivated by important biomedical collaborations, the broad aim of this programme is to enrich the statistical toolkit that is commonly used in genomics analyses by developing and implementing new analytical strategies for integrative and translational genomics, as well as for the analysis of data-rich genetic epidemiological studies. The core of the proposed models and inference will be embedded within state-of-the-art statistical developments in Bayesian modelling and computation, although analogous penalised approaches will also be investigated and compared with Bayesian implementations. We will focus on modelling and computational strategies where the multidimensional (e.g. complex multiple-phenotype) and multivariate (many explanatory features) aspects of complex data sets are exploited at key steps of the analysis strategy. To perform dimension reduction, we will investigate sparse regression approaches, i.e. finding well-supported models involving only a small number of important features, as well as the formulation of flexible clustering structures that can uncover major patterns of variability in large sets of genomic or epidemiological biomarkers. One important focus will be to build structural links between different sources of data using context-specific (e.g. epidemiological, biological) information, hierarchical relationships and informative priors based on substantive knowledge. Model uncertainty will be taken into account as part of the Bayesian modelling process. We will use the methods to search for new associations and structures in a range of exemplar case studies demonstrating multiple facets of the applicability of the novel methods that will be developed.
In particular, we will investigate gene expression profiling for autoimmune diseases, biomarkers for predicting small-for-gestational-age infants, prognostic scoring for progression-free survival in breast cancer, gene-environment interactions for Type 2 diabetes and integrative genomics of coronary heart disease. The algorithms that are developed will be made efficient by integrating new parallel computing techniques and novel software architectures that substantially reduce computing time. The computer programs implemented will be open source and made publicly available.
