Installation

The DNAmArray package can be installed in several ways and has been tested with R >= 4.4.3 on various Linux builds and on Windows.


Install using devtools

First, install devtools, and then use the install_github() function to fetch the DNAmArray package.
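
A minimal sketch of this route follows. The repository path `molepi/DNAmArray` is an assumption, since the text does not name it; substitute the owner and repository you actually install from.

```r
# Install devtools from CRAN if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Fetch and install DNAmArray from GitHub.
# NOTE: "molepi/DNAmArray" is an assumed repository path; adjust as needed.
devtools::install_github("molepi/DNAmArray")
```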


Install from source using git/R

Using git, you can clone our repository and then install the package, replacing x.y.z with the relevant version.
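
One way to run these steps from within R is sketched below. The repository URL is an assumption (the text does not give it), and `x.y.z` is left as the placeholder from the text, so the last line only works once it is replaced with the version that was built.

```r
# Assumed repository URL; replace with the address of the repository you use
repo_url <- "https://github.com/molepi/DNAmArray.git"

# Clone the repository into ./DNAmArray (requires git on the PATH)
system2("git", c("clone", repo_url))

# Build a source tarball with the R binary of the current session;
# this writes DNAmArray_x.y.z.tar.gz to the working directory
system2(file.path(R.home("bin"), "R"), c("CMD", "build", "DNAmArray"))

# Install the tarball, replacing x.y.z with the version that was just built
install.packages("DNAmArray_x.y.z.tar.gz", repos = NULL, type = "source")
```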


Loading packages

First, load the packages that are required for this workflow:

  • DNAmArray - the main package, containing functions built in-house for preprocessing DNAm data
  • MethylAid [10] - for sample-level quality control
  • omicsPrint [12] - in-house tool used to detect and resolve sample linkage errors
  • bacon [11] - in-house tool for reducing bias and inflation in EWAS test statistics
  • GEOquery [14] - bridges the gap between Bioconductor tools and GEO
  • tidyverse - for data wrangling
  • reshape2 - for data wrangling
  • ggrepel - for data visualization
  • ComplexHeatmap - for creating heatmaps
  • circlize - designed to allow circular plots, but also used to create custom colour palettes
  • BiocParallel - to parallelize processing of genomic data
  • IlluminaHumanMethylationEPICmanifest - containing EPIC probe annotation for the example data (other packages are available for 450k or EPICv2)
  • minfi [9] - functions to preprocess DNAm data
  • wateRmelon [30] - functions to preprocess DNAm data
  • snow - required by methyLImp2 for parallelization
  • limma [27] - for EWAS analysis
  • EpiDISH [21] - for cell count predictions
  • sva [23-25] - for estimating latent factors
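  • methyLImp2 [20] - for imputing missing values
  • SummarizedExperiment - for storing assay data together with sample and feature annotation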
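
As a hedged sketch, the list above can be checked and loaded in one pass. The variable names (`pkgs`, `installed`, `missing_pkgs`) are illustrative; the snippet only reports missing packages and does not install anything.

```r
# Packages listed above, in the order given
pkgs <- c(
  "DNAmArray", "MethylAid", "omicsPrint", "bacon", "GEOquery",
  "tidyverse", "reshape2", "ggrepel", "ComplexHeatmap", "circlize",
  "BiocParallel", "IlluminaHumanMethylationEPICmanifest", "minfi",
  "wateRmelon", "snow", "limma", "EpiDISH", "sva", "methyLImp2",
  "SummarizedExperiment"
)

# Which of them can be found in the current library?
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}

# Attach the packages that are available, silencing start-up messages
for (pkg in pkgs[installed]) {
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
}
```

Reporting missing packages first avoids a workflow that stops halfway through a long series of `library()` calls.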

Importing IDATs

The first step in analysing microarray data is importing raw intensity files into your software program. In this example, we show how to import raw IDAT files from GEO into R, but similar strategies can be employed for all Illumina DNAm array files when using R.

The getGEOSuppFiles() function downloads the supplementary data to the current working directory. These files consist of the raw IDAT files alongside relevant documentation.

The data can then be efficiently unpacked using the gunzip() function.

getGEO() can then be used to import the SOFT-format microarray data into R as a large GSE-class list, from which the metadata of interest can be extracted.
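
The three steps above might be strung together as follows. This is a sketch under stated assumptions: the series accession `GSE116339` is inferred from the sample accessions shown in this section and should be verified, the archive name `<accession>_RAW.tar` is assumed, and only two metadata fields are extracted for illustration.

```r
library(GEOquery)

# Assumed series accession, inferred from the sample IDs; verify before use
gse_id <- "GSE116339"

# Download the supplementary files into ./<gse_id>/ under the working directory
getGEOSuppFiles(gse_id)

# Assumption: the IDATs arrive bundled as <gse_id>_RAW.tar; unpack that archive
idat_dir <- file.path(gse_id, "IDATs")
untar(file.path(gse_id, paste0(gse_id, "_RAW.tar")), exdir = idat_dir)

# Decompress each .idat.gz file in place
gz_files <- list.files(idat_dir, pattern = "\\.idat\\.gz$", full.names = TRUE)
for (f in gz_files) gunzip(f, overwrite = TRUE)

# Import the SOFT-format record as a GSE object
gse <- getGEO(gse_id, GSEMatrix = FALSE)

# Collect a few per-sample metadata fields into a data frame
targets <- do.call(rbind, lapply(GSMList(gse), function(gsm) {
  meta <- Meta(gsm)
  data.frame(
    geo_accession = meta$geo_accession,
    supplementary_file = meta$supplementary_file[1],
    stringsAsFactors = FALSE
  )
}))
```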


Preparing targets

Before progressing further, take some time to get familiar with the targets data frame, removing duplicate information and converting variables to relevant classes.
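
The clean-up described here could look like the following on a toy table. The values are illustrative, and the column `sample_title` is hypothetical, added only to demonstrate removing a column that duplicates another.

```r
# Toy stand-in for the raw targets table (illustrative values only)
targets <- data.frame(
  sample_ID = c("3141584", "10120245", "99990005"),
  sample_title = c("3141584", "10120245", "99990005"),  # hypothetical duplicate
  geo_accession = c("GSM3228562", "GSM3228563", "GSM3228564"),
  sex = c("Female", "Female", "Male"),
  age = c("75", "31.5", "46.7"),  # numbers stored as text
  stringsAsFactors = FALSE
)

# Drop columns whose contents exactly repeat an earlier column
dup_cols <- duplicated(as.list(targets))
targets <- targets[, !dup_cols, drop = FALSE]

# Convert variables to classes suited to their content
targets$age <- as.numeric(targets$age)
```

Calling `str(targets)` on the full, cleaned table prints a summary like the following.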

## 'data.frame':    679 obs. of  12 variables:
##  $ sample_ID         : chr  "3141584" "10120245" "99990005" "12130802" ...
##  $ geo_accession     : chr  "GSM3228562" "GSM3228563" "GSM3228564" "GSM3228565" ...
##  $ sex               : chr  "Female" "Female" "Male" "Male" ...
##  $ age               : num  75 31.5 46.7 46.4 53.6 ...
##  $ log_total_pbb     : num  -2.542 -2.754 -0.371 0.115 -1.548 ...
##  $ pbb_153           : num  0.069 0.054 0.68 1.107 0.203 ...
##  $ pbb_77            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ pbb_101           : num  0 0 0 0.008 0 0 0 0 0 0 ...
##  $ pbb_180           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ supplementary_file: chr  "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3228nnn/GSM3228562/suppl/GSM3228562_200550980002_R01C01_Grn.idat.gz" "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3228nnn/GSM3228563/suppl/GSM3228563_200550980002_R02C01_Grn.idat.gz" "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3228nnn/GSM3228564/suppl/GSM3228564_200550980002_R03C01_Grn.idat.gz" "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3228nnn/GSM3228565/suppl/GSM3228565_200550980002_R04C01_Grn.idat.gz" ...
##  $ plate             : chr  "200550980002" "200550980002" "200550980002" "200550980002" ...
##  $ row               : num  1 2 3 4 5 6 7 8 1 3 ...

This data frame consists of 679 observations and, after cleaning, stores phenotypic information for 12 variables. These are:

  • sample_ID - the ID of the sample
  • geo_accession - the GEO accession number of the sample
  • sex - male or female
  • age - age of the individual
  • log_total_pbb, pbb_153, pbb_77, pbb_101, pbb_180 - exposure levels
  • supplementary_file - location of IDAT file
  • plate - EPIC array number
  • row - row number on the array (continuous)
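
The plate and row values match the file names in supplementary_file, so one plausible way to derive them is sketched below. This is an assumption about how the columns could be obtained, not a statement of how they were; the names `pattern`, `array_row` and `idat_basename` are illustrative.

```r
# File URLs for the first two samples, copied from the structure printed above
supplementary_file <- c(
  "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3228nnn/GSM3228562/suppl/GSM3228562_200550980002_R01C01_Grn.idat.gz",
  "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3228nnn/GSM3228563/suppl/GSM3228563_200550980002_R02C01_Grn.idat.gz"
)

# Expected file name layout: <GSM accession>_<plate>_R<row>C<column>_Grn.idat.gz
pattern <- "^(GSM[0-9]+)_([0-9]+)_R([0-9]{2})C([0-9]{2})_Grn\\.idat\\.gz$"
file_name <- basename(supplementary_file)

# Extract the plate (kept as text) and the row (converted to a number)
plate <- sub(pattern, "\\2", file_name)
array_row <- as.numeric(sub(pattern, "\\3", file_name))

# Name shared by the green and red IDAT files of a sample
idat_basename <- sub("_Grn\\.idat\\.gz$", "", file_name)
```

Keeping plate as a character vector preserves the full 12-digit identifier without any numeric formatting.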