HumMeth27QCReport

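HumMeth27QCReport is a quality-control tool for Illumina Infinium BeadChip methylation arrays. It is an R package developed jointly by the CRG Genotyping Unit and the CRG Bioinformatics Core.

CRAN download page for the HumMeth27QCReport package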

Installing HumMeth27QCReport

Package download: HumMeth27QCReport

Several dependencies must be installed before the package itself:

# Install BiocManager if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

# Install the R packages that HumMeth27QCReport depends on
install.packages(c("amap", "tcltk2"))
BiocManager::install(c("IlluminaHumanMethylation27k.db", "FDb.InfiniumMethylation.hg18", "FDb.InfiniumMethylation.hg19"))

# Install HumMeth27QCReport from the downloaded source archive
install.packages("path/to/HumMeth27QCReport_1.2.15.tar.gz", repos = NULL, type = "source")

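Functions and arguments of the HumMeth27QCReport package

ImportData()

ImportData(Dir)

Arguments

Dir Folder containing the input files; it is also used as the output folder.

Return value

A list of three or four data frames, plus one PDF file per sample, named after the sample.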

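QCCheck()

QCCheck(ImportDataR, pval)

Arguments

ImportDataR Result of the ImportData() function.
pval p-value threshold used to select the samples that go on to normalization and downstream analysis.

Return value

Three quality-control plots:

The first is the intensity plot drawn by the "plotSampleIntensities" function of the methylumi package.

The second is a histogram of the percentage of non-detected CpGs (CpGs with a detection p-value greater than 0.05 or 0.01).

The third is a histogram of the average p-value of each sample.

A file named QualityCheck.pdf containing all the plots.

A list of three data frames: 1. a summary of the analysis; 2. the CpGs with a detection p-value greater than 0.05; 3. the CpGs with a detection p-value greater than 0.01.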
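
The per-sample quantities behind the second and third plots can be illustrated with base R alone. The following is a minimal sketch on a made-up detection p-value matrix, not the package's own code; the matrix `pvals` and the helper `pct_not_detected()` are hypothetical names introduced for this example.

```r
# Toy detection p-value matrix: rows are CpGs, columns are samples.
# The values are invented for illustration only.
pvals <- matrix(
  c(1e-10, 0.20, 0.03, 1e-5,
    0.008, 0.60, 0.02, 1e-8),
  nrow = 4,
  dimnames = list(paste0("cg", 1:4), c("S1", "S2"))
)

# Hypothetical helper: percentage of CpGs per sample whose detection
# p-value exceeds the given cutoff (i.e. "non-detected" CpGs).
pct_not_detected <- function(p, cutoff) {
  100 * colMeans(p > cutoff, na.rm = TRUE)
}

pct_05 <- pct_not_detected(pvals, 0.05)
pct_01 <- pct_not_detected(pvals, 0.01)

# Average p-value per sample.
mean_p <- colMeans(pvals, na.rm = TRUE)
```

A sample with a high percentage of non-detected CpGs or a high average p-value would be a candidate for exclusion at the chosen `pval` threshold.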

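NormCheck()

NormCheck(ImportDataR, platform, pval, ChrX, ClustMethod, normMethod)

Arguments

ImportDataR Result of the ImportData() function.
platform Array platform, either "Hum27" (Infinium HumanMethylation27 BeadChip) or "Hum450" (Infinium HumanMethylation450 BeadChip).
pval p-value threshold.
ChrX Whether CpGs on chromosome X are included in the analysis: FALSE (default) excludes them, TRUE includes them.
ClustMethod Distance measure used for clustering, one of "euclidean", "maximum", "manhattan", "canberra", "binary", "pearson", "correlation", "spearman", "kendall".
normMethod Normalization method, either "quantile" or "ssn" (see the documentation of lumiMethyN() in the lumi package); the default is "quantile".

Return value

Plots:

A PCA plot of the normalized Beta values.

A cluster plot of the normalized Beta values.

A PDF file named ExplorativeAnalysis.pdf containing all the plots.

Data frame: a data.frame of the normalized M-values.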
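
To make the two exploratory plots more concrete, here is a minimal base-R sketch of a PCA and a hierarchical clustering of samples on a simulated matrix. It is an illustration under stated assumptions rather than the package's implementation: the data are random, and base R's dist() is used, which supports only some of the measures listed for ClustMethod (for example not "pearson").

```r
set.seed(1)

# Simulated matrix: 50 CpGs (rows) x 6 samples (columns).
m <- matrix(rnorm(50 * 6), nrow = 50,
            dimnames = list(paste0("cg", 1:50), paste0("S", 1:6)))
# Shift three samples so that two groups of samples exist.
m[, 4:6] <- m[, 4:6] + 3

# Drop CpGs with missing values, then run PCA with samples as observations.
m_complete <- m[complete.cases(m), , drop = FALSE]
pca <- prcomp(t(m_complete), center = TRUE, scale. = TRUE)

# Hierarchical clustering of the samples on Euclidean distance.
hc <- hclust(dist(t(m_complete), method = "euclidean"))
groups <- cutree(hc, k = 2)

# plot(pca$x[, 1:2]); plot(hc)  # uncomment to draw the two plots
```

Samples that fall far from their replicates in either plot deserve a closer look before downstream analysis.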

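HumMeth27QCReport()

HumMeth27QCReport(ImportDataR, platform, pval, ChrX, ClustMethod, quoteOutput, normMethod)

Arguments

ImportDataR Result of the ImportData() function.
platform Array platform, either "Hum27" (Infinium HumanMethylation27 BeadChip) or "Hum450" (Infinium HumanMethylation450 BeadChip).
pval p-value threshold.
ChrX Whether CpGs on chromosome X are included in the analysis: FALSE (default) excludes them, TRUE includes them.
ClustMethod Distance measure used for clustering, one of "euclidean", "maximum", "manhattan", "canberra", "binary", "pearson", "correlation", "spearman", "kendall".
quoteOutput Whether non-numeric entries in the normalized output are quoted: TRUE (default) quotes them, FALSE does not.
normMethod Normalization method, either "quantile" or "ssn" (see the documentation of lumiMethyN() in the lumi package); the default is "quantile".

Return value

The quality-control plots and the matrix of normalized Beta values.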
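
Because NormCheck() returns M-values while this function returns Beta values, it can help to see how the two scales relate. The sketch below uses the standard logit relationship M = log2(Beta / (1 - Beta)); the helper names `beta_to_m()` and `m_to_beta()` and the clamping constant `eps` are assumptions made for this example and are not part of the package.

```r
# Hypothetical helper: convert Beta values to M-values.
# Values are clamped to [eps, 1 - eps] so that 0 and 1 do not give +/-Inf.
beta_to_m <- function(beta, eps = 1e-6) {
  beta <- pmin(pmax(beta, eps), 1 - eps)
  log2(beta / (1 - beta))
}

# Hypothetical helper: convert M-values back to Beta values.
m_to_beta <- function(m) {
  2^m / (2^m + 1)
}

# Three Beta values taken from the AvgBeta example on this page, plus 0.5.
beta <- c(0.02954, 0.5, 0.823062, 0.962965)
m <- beta_to_m(beta)
beta_back <- m_to_beta(m)
```

A Beta value of 0.5 maps to an M-value of 0, and strongly methylated or unmethylated CpGs map to large positive or negative M-values.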

getAssayControls()

getAssayControls(ImportDataR, platform)

Arguments

ImportDataR Result of the ImportData() function.
platform Array platform, either "Hum27" (Infinium HumanMethylation27 BeadChip) or "Hum450" (Infinium HumanMethylation450 BeadChip).

Return value

Eight quality-control plots.

getFileSepChar()

getFileSepChar(File)

Arguments

File Name of a readable text file.

Return value

The field separator character of the file.
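
The idea of detecting a field separator can be sketched in a few lines of base R. This is not the implementation of getFileSepChar(); the helper `guess_sep()` and its candidate list are hypothetical, and it simply picks the candidate character that occurs most often in the header line.

```r
# Hypothetical helper: guess the field separator of a text file by counting
# how often each candidate character occurs in the first line.
guess_sep <- function(file, candidates = c("\t", ",", ";")) {
  header <- readLines(file, n = 1)
  chars <- strsplit(header, "", fixed = TRUE)[[1]]
  counts <- vapply(candidates, function(s) sum(chars == s), integer(1))
  candidates[which.max(counts)]
}

# Small demonstration file written to a temporary location.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Index\tTargetID\tHela_1.AVG_Beta",
             "3\tcg00003994\t0.02954"), tmp)
sep <- guess_sep(tmp)
```

The result can then be passed as the `sep` argument of read.table() when reading the file manually.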

Preparing the input files

Example sample table file: SampleTable.txt

Index Sample ID Sample Group Sentrix Barcode Sample Section Detected Genes (0.01) Detected Genes (0.05) Signal Average GRN Signal Average RED Signal P05 GRN Signal P05 RED Signal P25 GRN Signal P25 RED Signal P50 GRN Signal P50 RED Signal P75 GRN Signal P75 RED Signal P95 GRN Signal P95 RED Sample_Plate Sample_Well
1 Hela_1 Hela Hela 1 27571 27572 6032.731 3968.032 0 0 343 330 941 617 8394 2075 27229 23051 DemoData A01
2 Hela_2 Hela Hela 2 27573 27574 6732.99 3863.289 0 0 359 327 1002 586 9288 1899 30716 22657 DemoData B01
3 Raji_1 Raji Raji 1 27528 27548 6082.553 3778.055 0 0 614 569 1228 911 7691 2150 27401 21009 DemoData C01
4 Raji_2 Raji Raji 2 27568 27577 6632.926 3756.684 0 0 510 463 1179 750 8460 1944 30422 21667 DemoData D01
5 Jurkat_1 Jurkat Jurkat 1 27534 27545 6828.716 3911.746 0 0 519 458 1227 766 8717 2081 30918 22437 DemoData E01
6 Jurkat_2 Jurkat Jurkat 2 27514 27529 7014.263 3903.508 0 0 449 411 1175 695 9040 2026 31989 22589 DemoData F01
7 A431_1 A431 A431 1 27575 27576 6763.988 3758.975 0 0 426 357 1088 616 10578 2192 29410 20948 DemoData G01
8 A431_2 A431 A431 2 27575 27576 6633.289 3840.406 0 0 409 382 1063 645 10428 2284 28832 21356 DemoData H01
9 K562_1 K562 K562 1 27530 27538 6425.563 3720.365 0 0 400 379 995 639 9058 1775 29100 21597 DemoData A02
10 K562_2 K562 K562 2 27535 27543 6547.497 3736.386 0 0 405 413 1015 675 9282 1819 29531 21463 DemoData B02

Example control table file: ControlProbeProfile.txt

Index TargetID ProbeID <Sn>.Signal_Grn <Sn>.Signal_Red <Sn>.Detection Pval ... ... ...

<Sn> = sample name

Index TargetID ProbeID Hela_1.Signal_Grn Hela_1.Signal_Red Hela_1.Detection Pval Hela_2.Signal_Grn Hela_2.Signal_Red Hela_2.Detection Pval
1 BISULFITE CONVERSION 4670278 13997 661 3.68E-38 14496 738 3.68E-38
2 BISULFITE CONVERSION 4670484 506 513 1.98E-05 613 502 3.56E-09
3 BISULFITE CONVERSION 5270706 13007 578 3.68E-38 14583 575 3.68E-38
4 BISULFITE CONVERSION 5290048 346 467 0.012058 337 353 0.050596
5 EXTENSION 360446 1262 48739 3.68E-38 1226 49133 3.68E-38
6 EXTENSION 520537 1588 65535 3.68E-38 1498 65535 3.68E-38
7 EXTENSION 1190050 39316 2572 3.68E-38 46292 2545 3.68E-38
8 EXTENSION 2630184 65500 1656 3.68E-38 65535 1355 3.68E-38
9 HYBRIDIZATION 2450040 2637 371 3.68E-38 2821 321 3.68E-38
10 HYBRIDIZATION 5690072 23628 527 3.68E-38 28406 635 3.68E-38
11 HYBRIDIZATION 5690110 10321 247 3.68E-38 11464 286 3.68E-38
12 NEGATIVE 50110 230 295 0.63193 289 427 0.029181
13 NEGATIVE 360079 258 304 0.501571 135 293 0.820958
14 NEGATIVE 430114 161 353 0.668574 169 245 0.854451
15 NEGATIVE 460494 145 359 0.700552 281 191 0.687725
16 NEGATIVE 540577 163 322 0.7571 121 281 0.879578
17 NEGATIVE 610692 110 356 0.807308 159 360 0.512171
18 NEGATIVE 610706 152 383 0.597531 141 336 0.670246
19 NEGATIVE 670750 184 260 0.856797 132 290 0.835865
20 NEGATIVE 1190458 162 389 0.540998 173 368 0.426892
21 NEGATIVE 1500059 299 434 0.062366 226 440 0.080044
22 NEGATIVE 1500167 182 545 0.069276 287 390 0.065245
23 NEGATIVE 1500398 165 258 0.895271 194 341 0.449981
24 NEGATIVE 1660097 163 336 0.715998 158 241 0.885352
25 NEGATIVE 1770019 264 362 0.283621 250 265 0.527731
26 NEGATIVE 1940364 152 439 0.398556 168 388 0.370417
27 NEGATIVE 1990692 220 594 0.011778 208 407 0.182252
28 NON-POLYMORPHIC 110184 21912 773 3.68E-38 23847 525 3.68E-38
29 NON-POLYMORPHIC 1740025 1242 12619 3.68E-38 1267 11792 3.68E-38
30 NON-POLYMORPHIC 2480348 1690 22906 3.68E-38 2112 24975 3.68E-38
31 NON-POLYMORPHIC 2810035 16531 451 3.68E-38 18677 395 3.68E-38
32 SPECIFICITY 3800086 249 463 0.089121 330 389 0.027287
33 SPECIFICITY 3800154 13051 1019 3.68E-38 12701 937 3.68E-38
34 SPECIFICITY 4610400 356 461 0.010974 299 523 0.001706
35 SPECIFICITY 4610725 1207 23986 3.68E-38 1347 23608 3.68E-38
36 STAINING 4200736 1167 33466 3.68E-38 1294 27325 3.68E-38
37 STAINING 4570020 22107 429 3.68E-38 23868 413 3.68E-38
38 STAINING 5050601 778 732 7.48E-18 856 1077 3.68E-38
39 STAINING 5340168 971 469 1.42E-15 983 527 2.62E-22
40 TARGET REMOVAL 580035 331 317 0.22061 337 402 0.017107

Example BetaAverage table file: AvgBeta.txt

Index TargetID <Sn>.AVG_Beta <Sn>.Intensity <Sn>.Signal_A <Sn>.Signal_B <Sn>.BEAD_STDERR_A <Sn>.BEA

<Sn> = sample name

Index TargetID Hela_1.AVG_Beta Hela_1.Intensity Hela_1.Signal_A Hela_1.Signal_B Hela_1.BEAD_STDERR_A Hela_1.BEAD_STDERR_B Hela_1.Avg_NBEADS_A Hela_1.Avg_NBEADS_B Hela_1.Detection Pval SYMBOL
3 cg00003994 0.02954 12967 12581 386 511.197 38.65517 28 18 3.68E-38 MEOX2
4 cg00005847 0.823062 10966 1858 9108 69.19808 533.973 24 20 3.68E-38 HOXD3
6 cg00007981 0.027668 19345 18807 538 1055.73 58.04218 24 19 3.68E-38 PANX1
7 cg00008493 0.962965 18450 587 17863 36.4771 746.25 19 16 3.68E-38 COX8C
8 cg00008713 0.033844 30245 29218 1027 1039.59 90.22325 21 17 3.68E-38 IMPA2
10 cg00010193 0.553988 55690 24783 30907 1014.98 1716.425 15 17 3.68E-38 FLJ35816
11 cg00011459 0.938229 10018 525 9493 56.12486 454.7543 14 17 3.68E-38 PMM2
13 cg00012386 0.020579 33284 32597 687 2174.17 58.45416 14 18 3.68E-38 C1orf142
14 cg00012792 0.040012 36889 35409 1480 1097.253 136.5769 32 23 3.68E-38 TXNDC5
17 cg00014837 0.884958 8097 843 7254 77.45966 507.9264 15 19 3.68E-38 ACRBP
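
The `<Sn>.` column prefix makes it possible to recover the sample names from the header. The following base-R sketch assumes a tab-separated file and uses a two-sample excerpt in which the Hela_2 Beta value is invented; it is meant to illustrate the column naming convention, not how ImportData() parses the file.

```r
# Excerpt in the AvgBeta layout, assumed to be tab-separated.
# The Hela_2 Beta value is made up for this example.
txt <- paste(
  "Index\tTargetID\tHela_1.AVG_Beta\tHela_1.Detection Pval\tHela_2.AVG_Beta\tHela_2.Detection Pval",
  "3\tcg00003994\t0.02954\t3.68E-38\t0.03100\t3.68E-38",
  sep = "\n"
)

# check.names = FALSE keeps column names that contain a space,
# such as "Hela_1.Detection Pval".
avg <- read.delim(text = txt, check.names = FALSE, stringsAsFactors = FALSE)

# Sample names are the part of the column name before ".AVG_Beta".
beta_cols <- grep("\\.AVG_Beta$", names(avg), value = TRUE)
samples <- sub("\\.AVG_Beta$", "", beta_cols)

# Beta matrix with CpGs as rows and samples as columns.
beta <- as.matrix(avg[, beta_cols, drop = FALSE])
dimnames(beta) <- list(avg$TargetID, samples)
```

The same pattern with `".Detection Pval"` in place of `".AVG_Beta"` would extract the detection p-values.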

Discard.txt

The samples to discard, one sample name per line.
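
Such a file can be written directly from R. In this sketch the folder `run_dir` and the choice of samples are arbitrary, and placing Discard.txt alongside the other input files is an assumption based on the Dir argument described above.

```r
# Hypothetical input folder; in practice this would be the folder passed as Dir.
run_dir <- file.path(tempdir(), "meth27_run")
dir.create(run_dir, showWarnings = FALSE)

# Sample names to exclude, chosen arbitrarily for illustration.
to_discard <- c("Raji_1", "K562_2")

# One sample name per line.
discard_file <- file.path(run_dir, "Discard.txt")
writeLines(to_discard, discard_file)

# Read the file back to confirm its content.
discarded <- readLines(discard_file)
```

The names must match the sample names used in the other input tables exactly.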

Running HumMeth27QCReport

Loading the R package

library(HumMeth27QCReport)
Loading required package: methylumi
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: BiocGenerics

The following objects are masked from package:parallel:

    clusterApply, clusterApplyLB, clusterCall,
    clusterEvalQ, clusterExport, clusterMap,
    parApply, parCapply, parLapply, parLapplyLB,
    parRapply, parSapply, parSapplyLB

The following objects are masked from package:stats:

    IQR, mad, sd, var, xtabs

The following objects are masked from package:base:

    anyDuplicated, append, as.data.frame, basename,
    cbind, colnames, dirname, do.call, duplicated,
    eval, evalq, Filter, Find, get, grep, grepl,
    intersect, is.unsorted, lapply, Map, mapply,
    match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, Position, rank, rbind, Reduce,
    rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Welcome to Bioconductor

    Vignettes contain introductory material; view
    with 'browseVignettes()'. To cite Bioconductor,
    see 'citation("Biobase")', and for packages
    'citation("pkgname")'.

Loading required package: scales
Loading required package: reshape2
Loading required package: ggplot2
Loading required package: matrixStats

Attaching package: matrixStats

The following objects are masked from package:Biobase:

    anyMissing, rowMedians

Loading required package: FDb.InfiniumMethylation.hg19
Loading required package: GenomicFeatures
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: S4Vectors

The following object is masked from package:base:

    expand.grid

Loading required package: IRanges

Attaching package: IRanges

The following object is masked from package:grDevices:

    windows

Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: AnnotationDbi
Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
Loading required package: org.Hs.eg.db

Loading required package: minfi
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics

Attaching package: MatrixGenerics

The following objects are masked from package:matrixStats:

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet,
    colCollapse, colCounts, colCummaxs, colCummins,
    colCumprods, colCumsums, colDiffs, colIQRDiffs,
    colIQRs, colLogSumExps, colMadDiffs, colMads,
    colMaxs, colMeans2, colMedians, colMins,
    colOrderStats, colProds, colQuantiles,
    colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars,
    colWeightedMads, colWeightedMeans,
    colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys,
    rowAvgsPerColSet, rowCollapse, rowCounts,
    rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs,
    rowLogSumExps, rowMadDiffs, rowMads, rowMaxs,
    rowMeans2, rowMedians, rowMins, rowOrderStats,
    rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates,
    rowVarDiffs, rowVars, rowWeightedMads,
    rowWeightedMeans, rowWeightedMedians,
    rowWeightedSds, rowWeightedVars

The following object is masked from package:Biobase:

    rowMedians

Loading required package: Biostrings
Loading required package: XVector

Attaching package: Biostrings

The following object is masked from package:base:

    strsplit

Loading required package: bumphunter
Loading required package: foreach
Loading required package: iterators
Loading required package: locfit
locfit 1.5-9.4 	 2020-03-24
Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)
Loading required package: lumi
No methods found in package RSQLite for request: dbListFields when loading lumi

Attaching package: lumi

The following objects are masked from package:methylumi:

    estimateM, getHistory

Loading required package: IlluminaHumanMethylation27k.db

Loading required package: amap
Loading required package: Hmisc
Loading required package: lattice
Loading required package: survival
Loading required package: Formula

Attaching package: Hmisc

The following objects are masked from package:Biostrings:

    mask, translate

The following object is masked from package:AnnotationDbi:

    contents

The following object is masked from package:Biobase:

    contents

The following objects are masked from package:base:

    format.pval, units

Loading required package: gplots

Attaching package: gplots

The following object is masked from package:IRanges:

    space

The following object is masked from package:S4Vectors:

    space

The following object is masked from package:stats:

    lowess

Loading required package: plotrix

Attaching package: plotrix

The following object is masked from package:gplots:

    plotCI

The following object is masked from package:scales:

    rescale

Loading required package: WriteXLS
Loading required package: tcltk2
Loading required package: tcltk

Attaching package: tcltk2

The following objects are masked from package:Hmisc:

    label, label<-

The following objects are masked from package:SummarizedExperiment:

    values, values<-

The following objects are masked from package:GenomicRanges:

    values, values<-

The following objects are masked from package:IRanges:

    values, values<-

The following objects are masked from package:S4Vectors:

    values, values<-

Warning messages:
1: replacing previous import FDb.InfiniumMethylation.hg18::get27k by FDb.InfiniumMethylation.hg19::get27k when loading HumMeth27QCReport 
2: replacing previous import FDb.InfiniumMethylation.hg18::get450k by FDb.InfiniumMethylation.hg19::get450k when loading HumMeth27QCReport 
3: replacing previous import FDb.InfiniumMethylation.hg18::getNearestTSS by FDb.InfiniumMethylation.hg19::getNearestTSS when loading HumMeth27QCReport 
4: replacing previous import FDb.InfiniumMethylation.hg18::getNearest by FDb.InfiniumMethylation.hg19::getNearest when loading HumMeth27QCReport 
5: replacing previous import FDb.InfiniumMethylation.hg18::getNearestGene by FDb.InfiniumMethylation.hg19::getNearestGene when loading HumMeth27QCReport 
6: replacing previous import FDb.InfiniumMethylation.hg18::getNearestTranscript by FDb.InfiniumMethylation.hg19::getNearestTranscript when loading HumMeth27QCReport 
7: replacing previous import FDb.InfiniumMethylation.hg18::getPlatform by FDb.InfiniumMethylation.hg19::getPlatform when loading HumMeth27QCReport 
8: replacing previous import Hmisc::label<- by tcltk2::label<- when loading HumMeth27QCReport 
9: replacing previous import Hmisc::label by tcltk2::label when loading HumMeth27QCReport
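
The startup log above is long because every dependency announces itself. One way to keep a script's output readable is to wrap the call in suppressPackageStartupMessages() from base R, as in this sketch. The availability check is an addition for this example so that the snippet also runs where the package is not installed; the import warnings may still be shown, since they are warnings rather than startup messages.

```r
# Check whether the package is installed before trying to attach it.
pkg_available <- requireNamespace("HumMeth27QCReport", quietly = TRUE)

if (pkg_available) {
  # Attach the package without printing the "Loading required package"
  # and "Attaching package" messages.
  suppressPackageStartupMessages(library(HumMeth27QCReport))
}
```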

NormCheck() example run

Dir <- system.file("extdata/", package = "HumMeth27QCReport")
ImportDataR <- ImportData(Dir)
normMvalues <- NormCheck(ImportDataR, platform = "Hum27", pval = 0.05, ChrX = FALSE, ClustMethod = "euclidean")
Perform quantile color balance adjustment ...
Processing sample Hela_1 ...
Processing sample Hela_2 ...
Processing sample Raji_1 ...
Processing sample Raji_2 ...
Processing sample Jurkat_1 ...
Processing sample Jurkat_2 ...
Processing sample A431_1 ...
Processing sample A431_2 ...
Processing sample K562_1 ...
Processing sample K562_2 ...
Perform quantile normalization ...
Warning message:
In prcomp.default(t(data.nona), tol = 0.1, na.action = na.omit, 
    center = T, scale = T) :
 extra argument na.action will be disregarded

QCCheck() example run

ControlResults <- getAssayControls(ImportDataR, platform = "Hum27")
QCresults <- QCCheck(ImportDataR, pval = 0.05)
normMvalues <- NormCheck(ImportDataR, platform = "Hum27", pval = 0.05, ChrX = FALSE, ClustMethod = "euclidean")  # same output as above
The purpose of this method is better served by diagnostics()

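Interpreting the results

Interpreting the output

Interpreting the QC plots

Downstream analysis

Same as the downstream analysis for the 450k array.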