March 12, 2015


(Ernst and Kellis 2015)

Main Idea

Exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets

Matrix of Observed and Imputed Data


1. Leverage other marks in same sample


2. Leverage same mark in different sample

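The two feature sources above can be illustrated with a small sketch. This is not ChromImpute's actual predictor: the sample names, mark names, signal values, and the helper names `build_features` and `naive_impute` are all assumptions made up for illustration, and the "prediction" is only a plain average over other samples, standing in for a trained model that would combine both feature types.

```python
# Toy signal matrix: signal[sample][mark] is a list of per-bin signal values.
# All names and numbers here are hypothetical, chosen only for illustration.
signal = {
    "sampleA": {"H3K4me3": [0.1, 2.0, 0.3, 1.8], "H3K27ac": [0.2, 1.5, 0.4, 1.6]},
    "sampleB": {"H3K4me3": [0.0, 1.8, 0.2, 2.2], "H3K27ac": [0.1, 1.7, 0.3, 1.9]},
    "sampleC": {"H3K4me3": [0.2, 2.2, 0.4, 2.0]},  # H3K27ac not observed here
}


def build_features(signal, target_sample, target_mark, bin_index):
    """Collect both feature types for one (sample, mark, bin) target.

    Returns a pair of dicts:
      1. other marks observed in the same sample, keyed by mark name
      2. the same mark observed in other samples, keyed by sample name
    """
    same_sample = {
        mark: track[bin_index]
        for mark, track in signal[target_sample].items()
        if mark != target_mark
    }
    same_mark = {
        sample: marks[target_mark][bin_index]
        for sample, marks in signal.items()
        if sample != target_sample and target_mark in marks
    }
    return same_sample, same_mark


def naive_impute(signal, target_sample, target_mark):
    """Placeholder predictor: per-bin mean of the target mark in other samples.

    A real model would learn how to weight both feature types; this baseline
    ignores the same-sample features and exists only to show the data flow.
    """
    n_bins = len(next(iter(signal[target_sample].values())))
    imputed = []
    for b in range(n_bins):
        _, same_mark = build_features(signal, target_sample, target_mark, b)
        if not same_mark:
            raise ValueError("target mark not observed in any other sample")
        imputed.append(sum(same_mark.values()) / len(same_mark))
    return imputed


if __name__ == "__main__":
    print(build_features(signal, "sampleC", "H3K27ac", 1))
    print(naive_impute(signal, "sampleC", "H3K27ac"))
```

Keeping the two feature dictionaries separate mirrors the two numbered ideas: one is indexed by mark within the target sample, the other by sample for the target mark.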

Types of data used to impute


Advantages of Imputation

  • Beneficial even if observed data is available
    • Combining information -> robust to experimental noise and confounders
    • Achieve a higher effective sequencing depth -> higher signal-to-noise ratio
  • Improve GWAS enrichments -> epigenomic maps as an unbiased approach for discovering disease-relevant tissues and cell types
  • Quality control -> are there discrepancies between imputed and observed datasets?
  • Feature importance
  • Chromatin state annotation
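The quality-control use listed above can be sketched as a comparison of each observed track with its imputed counterpart. This is only one possible check, not the procedure from the paper: the Pearson correlation as the agreement measure, the `min_r` threshold of 0.5, the function names, and the toy tracks are all assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length signal tracks."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("tracks must have the same length (at least 2 bins)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant track")
    return cov / math.sqrt(var_x * var_y)


def flag_discrepant(observed, imputed, min_r=0.5):
    """Return names of datasets whose observed/imputed correlation is below min_r.

    min_r is a hypothetical cutoff; a sensible value depends on the mark,
    the resolution, and the data at hand.
    """
    return [
        name
        for name in observed
        if pearson(observed[name], imputed[name]) < min_r
    ]


if __name__ == "__main__":
    # Made-up tracks: one dataset agrees with its imputation, one does not.
    observed = {"good": [0, 1, 2, 3], "bad": [0, 1, 2, 3]}
    imputed = {"good": [0.1, 1.1, 1.9, 3.2], "bad": [3, 0, 2, 1]}
    print(flag_discrepant(observed, imputed))
```

A flagged dataset is not necessarily a failed experiment; low agreement can also mean the imputation is poor for that mark, which is why the result is best treated as a prompt for closer inspection.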

Limitations

  • If a mark's signal is highly specific to one or a few samples, and it either does not correlate with other marks mapped in that sample or has a different correlation structure than in the samples used for training, the mark cannot be imputed accurately at those locations
  • When the target mark has been mapped in only a few samples, features derived from the same mark in other samples may be less informative or more biased, e.g. for transcription factor binding sites (TFBS)
  • For tissue samples that are mixtures of multiple cell types, the imputed maps will most likely reflect the same mixture as the observed data; deconvolution of mixed samples is a potentially important direction for future work

ChromImpute Software