

Introduction

DNAmArray is a comprehensive and modular workflow for the pre-processing, quality control, and analysis of DNA methylation (DNAm) array data, tailored for large-scale epigenome-wide association studies (EWAS). It integrates best practices in the field with in-house developed tools, and has been informed by almost a decade of research using the Biobank-based Integrative Omics Study (BIOS) consortium data. The BIOS dataset combines blood-based DNAm and gene expression data from around 4,000 individuals across six Dutch biobanks, and has been used extensively to advance our understanding of epigenetic regulation and health [1-8].

The workflow combines a series of convenient DNAmArray functions with Bioconductor packages, including:

  • minfi for reading in IDAT files and normalization [9],
  • MethylAid for sample-level quality control [10],
  • bacon for controlling bias and inflation in EWAS test statistics [11], and
  • omicsPrint for identifying and resolving sample mismatches [12].

DNAmArray has been thoroughly validated on DNAm data profiled using the Illumina Infinium HumanMethylation450 and EPIC arrays, and notes throughout this documentation outline the changes needed to apply it to Infinium MethylationEPIC v2.0 BeadChip data. Dependencies should be installed automatically; if one is not, please refer to the relevant package’s documentation (and let us know by opening a GitHub issue!).
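If a dependency does turn out to be missing, a quick check along the following lines can show which one. This is a minimal sketch, not part of DNAmArray itself: the variable names (`workflow_pkgs`, `is_available`, `missing_pkgs`) are illustrative, and it covers only the four Bioconductor packages named above. The installation step is left commented out because it needs network access and assumes the standard BiocManager route.

```r
# Packages named in this introduction; all four are distributed via Bioconductor.
workflow_pkgs <- c("minfi", "MethylAid", "bacon", "omicsPrint")

# requireNamespace() returns TRUE if a package can be loaded, without attaching it.
is_available <- vapply(
  workflow_pkgs,
  function(pkg) requireNamespace(pkg, quietly = TRUE),
  logical(1)
)
missing_pkgs <- workflow_pkgs[!is_available]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # Assumed installation route (requires network access); uncomment to run:
  # if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
  # BiocManager::install(missing_pkgs)
} else {
  message("All workflow packages are installed.")
}
```

The check loads nothing into the search path, so it is safe to run before starting the workflow proper.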

In summary, DNAmArray provides a scalable, reproducible, and EWAS-ready framework for DNA methylation data analysis, compatible with evolving technologies and suitable for integration with downstream follow-up analyses.


Example Data

The example data [13] used in this workflow are available from the NCBI Gene Expression Omnibus (GEO), a public repository of functional genomics data. They comprise genome-wide DNA methylation data from whole blood, obtained using the Illumina Infinium MethylationEPIC BeadChip microarray. The participants are 679 children exposed to polybrominated biphenyl (PBB), an endocrine-disrupting compound that was accidentally added to the food supply in Michigan in the 1970s.