Normalization

Motivation

Our workflow outlines the use of two normalization methods: functional normalization [16], which exploits internal control probes designed to detect technical variation without assaying biological differences, and dasen as implemented in wateRmelon [30]. Both have been adapted to use the interpolatedXY method [31].

Functional normalization has been shown to perform favourably when compared to other approaches [17]. Using the internal control probes avoids a problem associated with global normalization methods, where biological variation can be mistaken for a technical effect and removed. This is especially important in studies where groups are expected to have different methylation signatures, such as multi-tissue studies [18].

Conversations on the best approaches for normalization in DNAm data pipelines are ongoing [19].


Principal Components

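Functional normalization regresses out the leading principal components of the control-probe summaries. The default of two principal components is often too low for this type of data. The proportion of variance explained typically drops off after a certain number of components, and the point just before that drop-off is a reasonable choice for the number of components to retain.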
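
As a rough illustration of how the drop-off can be inspected, the sketch below runs a PCA in base R on a simulated matrix and prints the proportion of variance explained by each component. The matrix `ctrl` is a hypothetical stand-in for a samples-by-summaries control-probe matrix, and the three latent factors are an assumption of the simulation, not a property of real arrays.

```r
set.seed(42)

n_samples  <- 40
n_features <- 20

# Simulated data (assumption): three strong latent factors plus unit noise.
scores   <- matrix(rnorm(n_samples * 3), n_samples, 3) %*% diag(c(6, 4, 3))
loadings <- matrix(rnorm(3 * n_features), 3, n_features)
noise    <- matrix(rnorm(n_samples * n_features), n_samples, n_features)
ctrl     <- scores %*% loadings + noise

# PCA on the centred matrix; sdev^2 are the component variances.
pca      <- prcomp(ctrl, center = TRUE, scale. = FALSE)
prop_var <- pca$sdev^2 / sum(pca$sdev^2)

# Proportion and cumulative proportion of variance for the first components.
print(round(prop_var[1:6], 3))
print(round(cumsum(prop_var)[1:6], 3))
```

In this simulation the proportion of variance should fall sharply after the third component, which is the kind of elbow to look for. On real data the position of the elbow has to be read from the actual values.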


Running Normalization

Before normalization can be run on EPIC arrays, the annotation of the RGset must be updated.

We use the adjustedFunnorm function from wateRmelon, which uses the interpolatedXY method [31]. By default, functional normalization also returns normalized copy number data, which makes the returned GenomicRatioSet twice as large as necessary when only beta-values or M-values are required. Therefore, we set keepCN to FALSE.
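
A call along the following lines would produce a GenomicRatioSet like the one printed below. It is a sketch under assumptions: `RGset` is taken to be an existing RGChannelSet of EPIC data, the annotation values are those reported in the printed object, and `n_pcs` is a placeholder that should be replaced by the number of components chosen for the data at hand.

```r
# Placeholder (assumption): number of control-probe principal components.
n_pcs <- 4

# Guarded so that the sketch does nothing when the data or package are absent.
if (exists("RGset") && requireNamespace("wateRmelon", quietly = TRUE)) {
  # Update the annotation for EPIC arrays (values as shown in the output).
  RGset@annotation <- c(array = "IlluminaHumanMethylationEPIC",
                        annotation = "ilm10b4.hg19")

  # keepCN = FALSE drops the copy number assay from the returned object.
  GRset <- wateRmelon::adjustedFunnorm(RGset, nPCs = n_pcs, keepCN = FALSE)
  GRset
}
```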

## [adjustedFunnorm] Background and dye bias correction with noob
## [adjustedFunnorm] Mapping to genome
## [adjustedFunnorm] Quantile extraction
## [adjustedFunnorm] Normalization
## class: GenomicRatioSet 
## dim: 865859 496 
## metadata(0):
## assays(1): Beta
## rownames(865859): cg14817997 cg26928153 ... cg07587934 cg16855331
## rowData names(0):
## colnames(496): GSM3228585_200550980034_R01C01
##   GSM3228586_200550980034_R02C01 ... GSM3229237_200598350015_R06C01
##   GSM3229239_200598350080_R03C01
## colData names(25): sample_ID geo_accession ... Neu Mono
## Annotation
##   array: IlluminaHumanMethylationEPIC
##   annotation: ilm10b4.hg19
## Preprocessing
##   Method: NA
##   minfi version: NA
##   Manifest version: NA

It is also possible to use adjustedDasen, which applies dasen normalization to the autosomal CpGs and infers the values of sex-chromosome CpGs by linear interpolation on the corrected autosomal CpGs. Instead of a GRset, this function returns the normalized beta values directly. Using this normalization therefore removes the need for the DNAmArray reduce function in the next steps.
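
The interpolation idea can be sketched in a few lines of base R. This is a simplified illustration, not the wateRmelon implementation: `interpolate_xy` is a hypothetical helper, the numbers are made up, and only a single sample is shown. The mapping from raw to corrected autosomal values is treated as a piecewise-linear curve, and each sex-chromosome value is read off that curve.

```r
# Hypothetical helper: map raw sex-chromosome values onto the corrected scale
# using the raw -> corrected relationship observed on the autosomal probes.
interpolate_xy <- function(raw_auto, norm_auto, raw_xy) {
  # rule = 2 (an assumption of this sketch) clamps values outside the
  # autosomal range to the nearest end of the curve.
  approx(x = raw_auto, y = norm_auto, xout = raw_xy,
         rule = 2, ties = mean)$y
}

# Toy values for one sample: raw and corrected autosomal intensities.
raw_auto  <- c(100, 200, 400, 800)
norm_auto <- c(120, 220, 380, 700)

# Raw sex-chromosome intensities, two of them outside the autosomal range.
raw_xy <- c(150, 600, 50, 1000)

norm_xy <- interpolate_xy(raw_auto, norm_auto, raw_xy)
print(norm_xy)  # 170 540 120 700
```

Because the sex-chromosome probes never enter the autosomal normalization itself, sex differences on those chromosomes do not distort the correction applied to the autosomes.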