
Developing Informatics Technologies to Model Cancer Gene Regulation

Xiaole Shirley Liu

0 Collaborator(s)

Funding source

National Institutes of Health (NIH)
Genome-wide studies have demonstrated that trans-acting factors, including transcription factors, chromatin regulators and other chromatin-associated factors, are frequently mutated in cancer, reaffirming that aberrant gene regulation is a key mechanism in oncogenesis. How these trans-acting factors regulate transcription across the genome is poorly understood, motivating an ever-increasing number of ChIP-seq and DNase-seq experiments to map genome-wide transcription factor binding (the cistrome) and chromatin status (the epigenome). Novel and significant biological insights have been gained by integrating new ChIP-seq and DNase-seq data with published ChIP-seq and DNase-seq data sets and with expression profiles. Most cancer biologists, however, find the computational analysis and integration of cistrome and epigenome data to be the major bottleneck in such studies, owing to a lack of informatics expertise and infrastructure. The objective of this proposal is to develop informatics technologies that improve the acquisition, analysis, integration and reuse of ChIP-seq and DNase-seq data, so that experimental cancer biologists can model transcriptional and epigenetic gene regulation in cancer research. Specifically, we propose to develop informatics technologies to address three critical aspects of epigenome and cistrome data analysis. First, we will implement software to automate data collection, processing and quality control, enabling diverse types of unpublished and public ChIP-seq and DNase-seq data to be analyzed and converted into statistics and formats that can be readily used for integrative analysis. Second, we will develop systems that allow gene expression data to be interpreted together with cistrome and epigenome data in order to elucidate regulatory mechanisms. Third, we will develop tools to quickly and accurately identify informative public datasets and to infer combinatorial rules of regulation and interactions.
Finally, we will develop the infrastructure and interface to host the algorithms and tools developed in the first three aims, and provide experimental cancer biologists with a flexible and intuitive user experience. We will design our software to interact easily with complementary software systems and databases. The software developed in this proposal will be freely available as open source, and we will work with our collaborators and users to improve its functions and user interface.
