# Introduction

Our workflow uses the Functional Normalization approach (Hansen, 2014), which exploits internal control probes designed to detect technical variations without assaying biological differences. It has been shown to perform favourably when compared to other approaches (Heiss, 2015; Liu, 2016).

Using the internal control probes avoids the problems associated with global normalization methods, where biological variation can be mistaken for a technical effect and removed. This is especially important in studies where groups are expected to have differential methylomes, such as multiple tissue studies (Min, 2018).

# Principal Components Selection

The default of selecting only two principal components is often too low for this type of data, so our screeplot() function allows visualisation of control probe eigenvalues to help select the optimal number. Often you will see a drop-off in proportion of variance explained after a certain number of principal components, and this can indicate an efficient cut-off.

par(mar=c(5,5,4,2), mgp=c(2.5,1,0), cex.main=1.5, font.main="1", fg="#6b6b6b", col.main="#4b4b4b")
pc <- screeplot(RGset)

This plot shows that only a small amount of variance is explained after the sixth principal component. For this reason, we choose to carry out normalization using 6 principal components.

# Functional Normalization

By default, functional normalization returns normalized copy number data making the returned GenomicRatioSet twice the size necessary when only beta-values or M-values are required. Therefore, we developed preprocessFunnorm.DNAmArray(), which adds an option not to return these to the preprocessFunnorm() function from minfi (Feinberg, 2014).

GRset <- preprocessFunnorm.DNAmArray(RGset, nPCs=6, keepCN=FALSE)
## [preprocessFunnorm] Modified version not returning CN!
## [preprocessFunnorm] Background and dye bias correction with noob
## [preprocessFunnorm] Mapping to genome
## [preprocessFunnorm] Quantile extraction
## [preprocessFunnorm] Normalization
GRset
## class: GenomicRatioSet
## dim: 485512 138
## assays(1): Beta
## rownames(485512): cg13869341 cg14008030 ... cg08265308 cg14273923
## rowData names(0):
## colnames(138): GSM3092700_9985178096_R01C01
##   GSM3092701_9985178127_R03C02 ... GSM3093566_9020331152_R05C01
##   GSM3093567_9020331152_R06C01
## colData names(13): geo_accession cohort ... yMed predictedSex
## Annotation
##   array: IlluminaHumanMethylation450k
##   annotation: ilmn12.hg19
## Preprocessing
##   Method: NA
##   minfi version: NA
##   Manifest version: NA

This function returns beta-values calculated with an offset of 100, similar to specifying getBeta() with option type='Illumina'. If required, these can be transformed to M-values using minfi’s getM() which applies a logit2-transformation to the beta-values.

From inspecting the GenomicRatioSet object, you can see that there is one Beta assay, with methylation values for 485,512 CpG sites.