Introduction

DNAmArray is a comprehensive and modular workflow for the pre-processing, quality control, and analysis of DNA methylation (DNAm) array data, tailored for large-scale epigenome-wide association studies (EWAS). It integrates best practices in the field with in-house developed tools, and has been informed by almost a decade of research using data from the Biobank-based Integrative Omics Study (BIOS) consortium. The BIOS dataset combines blood-based DNAm and gene expression data from around 4,000 individuals across six Dutch biobanks, and has been used extensively to advance our understanding of epigenetic regulation and health¹⁻⁸.

The workflow combines a series of convenient DNAmArray functions with Bioconductor packages, including:

minfi for reading in IDAT files and normalization⁹,
MethylAid for sample-level quality control¹⁰,
bacon for controlling bias and inflation in EWAS test statistics¹¹, and
omicsPrint for identifying and resolving sample mismatches¹².

DNAmArray has been thoroughly validated on DNAm data profiled using the Illumina Infinium HumanMethylation450 and MethylationEPIC arrays, and notes throughout this documentation outline any changes needed to apply it to Infinium MethylationEPIC v2.0 BeadChip data. Dependencies should be installed automatically; if they are not, please refer to the relevant package’s documentation (and let us know by opening a GitHub issue!).

In summary, DNAmArray provides a scalable, reproducible, and EWAS-ready framework for DNA methylation data analysis that keeps pace with new array versions and is suitable for integration with downstream follow-up analyses.

Example Data

The example data¹³ used in this workflow is available from the NCBI Gene Expression Omnibus (GEO), a public repository of microarray and other functional genomics data. It contains genome-wide DNA methylation data from whole blood, profiled using the Illumina Infinium MethylationEPIC BeadChip. The participants are 679 children exposed to polybrominated biphenyl (PBB), an endocrine-disrupting compound that was accidentally added to the food supply in Michigan in the 1970s.

Streamlined workflow for the quality control, normalization, and analysis of Illumina methylation array data

Lucy Sinke, Maarten van Iterson, Davy Cats, BIOS Consortium, Tom Kuipers, and Bas Heijmans
Molecular Epidemiology, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands