August 2010

EPIGENETICS:

Genome-Scale Epigenetic Marker Detection Across Populations


UPDATE AUGUST 25:
MetMap, the software under discussion in this blog post, is free, open-access, and available at:
http://www.cs.berkeley.edu/~meromit/MetMap


Introductory biology students are still commonly taught that all inherited biochemical information is passed down through the pattern of adenine, guanine, cytosine, and thymine subunits in DNA. This is not true.

Most notably, other DNA alterations can also be inherited. A commonly studied example is methylation patterns, i.e. CH3 units affixed to (typically) cytosine units near the beginning of a gene.

This is known as epigenetic inheritance. Figuring out how it works has implications in stem cell specialization, cancer development and progression, and many other aspects of human health and medicine.

It's generally impractical, or at least costly, to elucidate DNA methylation patterns across an entire genome, much less across a large population of people. An ability to accurately read epigenetic information cheaply, quickly, and on a large scale would clearly increase our understanding of epigenetics in relation to human health and medicine.

Lior Pachter (University of California at Berkeley, United States) and coworkers have developed computer software, "MetMap," which extracts more information from the common "MethylSeq" experimental technique for genome-scale methylation analysis. They have applied it to the epigenetic interrogation of neutrophils, a type of white blood cell, from four separate people.

MethylSeq and MetMap: A brief introduction.

This next section is sort of heavy. If you want to skip it, the take-home message is that MetMap software incorporates MethylSeq experimental epigenetic data, bias correction, and genetic sequence information all into one model.

MetMap is a mathematical complement to, not a substitute for, the MethylSeq experimental approach for epigenetic interrogation. MethylSeq uses an enzyme that breaks apart DNA at sequences possessing unmethylated "CpG sites" (put simply, locations in the DNA of epigenetic significance).

Later, the resulting DNA fragments can be read out to determine the location of the unmethylated CpG sites. A limitation of MethylSeq experiments is that their efficacy depends upon CpG site density, and the extent of methylation may vary even within a single cell type.
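
To make "CpG site density" a little more concrete, here is a minimal Python sketch that finds CpG sites (a C immediately followed by a G) in a DNA string and counts them per window. This is only an illustration and not part of MetMap or MethylSeq; the function names, the toy sequence, and the window size are all assumptions made up for this example.

```python
def cpg_positions(seq):
    """Return 0-based indices where a C is immediately followed by a G."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cpg_density(seq, window=10):
    """Count CpG sites in consecutive, non-overlapping windows.

    The window size is an arbitrary choice for illustration.
    """
    n_windows = (len(seq) + window - 1) // window  # round up
    counts = [0] * n_windows
    for pos in cpg_positions(seq):
        counts[pos // window] += 1
    return counts


# Toy sequence, invented for this example (not real genomic data).
toy = "ACGCGTTTTTAAAAATTTTTCGAAA"
print(cpg_positions(toy))
print(cpg_density(toy, window=10))
```

In the toy sequence the first window is CpG-dense and the middle window has no CpG sites at all, which is the kind of unevenness that makes raw MethylSeq signal hard to compare from one region to the next.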

In other words, MethylSeq hides potentially useful epigenetic data. Furthermore, methods aimed at removing questionable data invariably end up throwing away perfectly good data as well.

The utility of MetMap is its determination, via statistical probability, of the extent of CpG site methylation that is ambiguous via MethylSeq alone. In other words, MetMap software uncovers the epigenetic data hidden by MethylSeq experiments.

MetMap classifies DNA cleavage sites as methylated, unmethylated, or differentially methylated, and according to whether they are (or are not) present in an unmethylated CpG site. They are further classified according to the relative extent to which they are detected via MethylSeq experiments, such that MetMap can apply a normalization protocol to correct for the limitations of MethylSeq experiments.

Relations between all of these variables are calculated via three probability functions. One involves the distance between CpG sites as a function of unmethylation, the second involves the probability of methylation extent given whether or not a DNA cleavage site is in an unmethylated CpG site, and the third involves a correction for MethylSeq experimental ambiguity.
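
If it helps to see how probabilities can turn an ambiguous readout into a call, here is a toy Bayes' rule calculation in Python. To be clear, this is not the model from the paper: the Poisson read-count assumption, the two rates, and the prior are all hypothetical stand-ins chosen for illustration. The idea is only that the same number of sequencing reads at a site can lead to a different conclusion depending on what the surrounding context suggests.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def posterior_unmethylated(reads, prior_unmeth, rate_unmeth=8.0, rate_meth=0.5):
    """Posterior probability that a site is unmethylated, given its read count.

    Assumptions (hypothetical, not taken from MetMap):
      - unmethylated sites are cleaved and produce reads at rate_unmeth
      - methylated sites produce only background reads at rate_meth
      - prior_unmeth summarises what the surrounding region suggests
    """
    like_unmeth = poisson_pmf(reads, rate_unmeth)
    like_meth = poisson_pmf(reads, rate_meth)
    numerator = prior_unmeth * like_unmeth
    return numerator / (numerator + (1.0 - prior_unmeth) * like_meth)


# Same ambiguous read count, two different contexts.
for prior in (0.1, 0.9):
    print(prior, round(posterior_unmethylated(reads=2, prior_unmeth=prior), 3))
```

With only two reads, the site looks probably methylated when the prior is low and closer to a coin flip when the prior is high. Combining the experimental signal with context in this general way is what lets a statistical model make use of sites that would otherwise be discarded as ambiguous.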

A possible problem is that a high density of CpG sites may inherently report a high extent of unmethylation. MetMap corrects for this too, taking into account how much of the entire region is unmethylated.

MetMap evaluation.

The scientists tested out their MetMap software on neutrophil (a type of white blood cell) DNA from four human males. The DNA was cleaved into fragments according to the conventional MethylSeq experimental protocol.

MetMap software doubled the amount of epigenetic data yielded by MethylSeq experiments. It was able to accurately identify epigenetically relevant methylation patterns.

The scientists further found that while there is high epigenetic variation at specific sites in DNA across individuals, there is less epigenetic variation across larger DNA segments. This suggests that, contrary to convention, a comparison across individuals should consider larger DNA sequences rather than specific sites of variance.
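
Here is a small Python sketch of why site-level and region-level comparisons can disagree. The methylation calls below are invented purely for illustration (they are not data from the study), and the function names are my own. Each row is one hypothetical individual, each column one CpG site in the same region.

```python
from statistics import mean, pvariance

# Rows: four hypothetical individuals. Columns: six neighbouring CpG sites.
# 1 = unmethylated, 0 = methylated. All values are made up for this example.
calls = [
    [1, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 1],
]


def per_site_variance(matrix):
    """Variance across individuals, computed separately for each site (column)."""
    return [pvariance(column) for column in zip(*matrix)]


def region_variance(matrix):
    """Variance across individuals of the region-wide unmethylated fraction."""
    return pvariance([mean(row) for row in matrix])


print(per_site_variance(calls))
print(region_variance(calls))
```

In this toy example every individual site differs between people, yet each person has the same fraction of unmethylated sites across the region, so the region-level variance is zero. Real data will not be this tidy, but it shows how a region-wide summary can be stable even when single sites are not.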

Perhaps most importantly, MetMap provided the scientists with previously unknown epigenetic markers, i.e. DNA sequences relevant to epigenetic inheritance. Further studies should independently evaluate their biological function and their relevance to human health and medicine.

Implications.

MetMap software could substantially extend the use of experimental MethylSeq screening in the search for epigenetic similarity and variance among humans. This should be very useful for probing the role of epigenetics in human health and disease.

NOTE: The scientists' research was funded by the National Institutes of Health.

ResearchBlogging.org for more information:
Singer, M., Boffelli, D., Dhahbi, J., Schoenhuth, A., Schroth, G. P., Martin, D. I. K., & Pachter, L. (2010). MetMap Enables Genome-Scale Methyltyping for Determining Methylation States in Populations. PLoS Computational Biology, 6(8). DOI: 10.1371/journal.pcbi.1000888