Introduction

DNAmArray is a comprehensive and modular workflow for the pre-processing, quality control, and analysis of DNA methylation (DNAm) array data, tailored for large-scale epigenome-wide association studies (EWAS). It integrates best practices in the field with tools developed in-house, and has been informed by almost a decade of research using data from the Biobank-based Integrative Omics Study (BIOS) consortium. The BIOS dataset combines blood-based DNAm and gene expression data from around 4,000 individuals across six Dutch biobanks, and has been used extensively to advance our understanding of epigenetic regulation and health [1-8].

The workflow combines a series of convenient DNAmArray functions with Bioconductor packages, including:

  • minfi for reading in IDAT files and normalization [9],
  • MethylAid for sample-level quality control [10],
  • bacon for controlling bias and inflation in EWAS test statistics [11], and
  • omicsPrint for identifying and resolving sample mismatches [12].

DNAmArray has been thoroughly validated on DNAm data profiled using the Illumina Infinium HumanMethylation450 and EPIC arrays, and notes throughout this documentation outline the changes needed to apply it to Infinium MethylationEPIC v2.0 BeadChip data. Dependencies should be installed automatically; if one is not, please refer to the relevant package's documentation (and let us know by opening a GitHub issue!).

In short, DNAmArray provides a scalable, reproducible, and EWAS-ready framework for DNA methylation data analysis that is compatible with evolving array technologies and suitable for integration with downstream follow-up analyses.


Example Data

The example data [13] used in this workflow is available from the NCBI Gene Expression Omnibus (GEO), a public repository of microarray data. It contains genome-wide DNA methylation data from whole blood, measured using the Illumina Infinium MethylationEPIC BeadChip microarray. The participants are 679 children exposed to polybrominated biphenyl (PBB), an endocrine-disrupting compound that was accidentally added to the food supply in Michigan in the 1970s.


References

1. Dekkers, K.F., van Iterson, M., Slieker, R.C., et al. Blood lipids influence DNA methylation in circulating cells. Genome Biol. 17, 138 (2016).

2. Slieker, R.C., van Iterson, M., Luijk, R., et al. Age-related accrual of methylomic variability is linked to fundamental ageing mechanisms. Genome Biol. 17, 191 (2016).

3. Bonder, M., Luijk, R., Zhernakova, D., et al. Disease variants alter transcription factor levels and methylation of their binding sites. Nat Genet. 49, 131-138 (2017).

4. Luijk, R., Wu, H., Ward-Caviness, C.K., et al. Autosomal genetic variation is associated with DNA methylation in regions variably escaping X-chromosome inactivation. Nat Commun. 9, 3738 (2018).

5. van Rooij, J., Mandaviya, P.R., Claringbould, A., et al. Evaluation of commonly used analysis strategies for epigenome- and transcriptome-wide association studies through replication of large-scale population studies. Genome Biol. 20(1), 235 (2019).

6. Hop, P.J., Luijk, R., Daxinger, L., et al. Genome-wide identification of genes regulating DNA methylation using genetic anchors for causal inference. Genome Biol. 21, 220 (2020).

7. Dekkers, K.F., Slieker, R.C., Ioan-Facsinay, A., et al. Lipid-induced transcriptomic changes in blood link to lipid metabolism and allergic response. Nat Commun. 14, 544 (2023).

8. Liu, Y., Sinke, L., Jonkman, T.H., et al. The inactive X chromosome accumulates widespread epigenetic variability with age. Clin Epigenetics. 15, 135 (2023).

9. Aryee, M.J., Jaffe, A.E., Corrada-Bravo, H., et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics. 30(10) (2014).

10. van Iterson, M., Tobi, E.W., Slieker, R.C., et al. MethylAid: visual and interactive quality control of large Illumina 450k datasets. Bioinformatics. 30(23) (2014).

11. van Iterson, M., van Zwet, E.W., BIOS consortium, et al. Controlling bias and inflation in epigenome- and transcriptome-wide association studies using the empirical null distribution. Genome Biol. 18, 19 (2017).

12. van Iterson, M., Cats, D., Hop, P., et al. omicsPrint: detection of data linkage errors in multiple omics studies. Bioinformatics. 34(12) (2018).

13. Curtis, S.W., Cobb, D.O., Kilaru, V., et al. Exposure to polybrominated biphenyl (PBB) associates with genome-wide DNA methylation differences in peripheral blood. Epigenetics. 14(1), 52-66 (2019).