Title: | Annotated Copy-Number Regions |
---|---|
Description: | This data package provides SNP array data from different types of copy-number regions. These regions were identified manually by the authors of the package and may be used to generate realistic data sets with known truth. |
Authors: | Morgane Pierre-Jean and Pierre Neuvial |
Maintainer: | Morgane Pierre-Jean <[email protected]> |
License: | LGPL (>= 2.1) |
Version: | 0.3.2 |
Built: | 2024-11-14 05:14:32 UTC |
Source: | https://github.com/mpierrejean/acnr |
This data package contains SNP array data from different types of copy-number regions. These regions were identified manually by the authors of the package and may be used to generate realistic data sets with known truth.
Package: | acnr |
Type: | Package |
Title: | Annotated Copy-Number Regions |
Version: | 0.2.2 |
Date: | 2014-09-08 |
Author: | Morgane Pierre-Jean and Pierre Neuvial |
Maintainer: | Morgane Pierre-Jean <[email protected]> |
License: | LGPL (>= 2.1) |
Depends: | R (>= 2.10), R.utils |
Suggests: | RUnit, BiocGenerics |
biocViews: | ExperimentData |
Morgane Pierre-Jean and Pierre Neuvial
Get minor and major copy number labels from region annotation labels
getMinorMajorCopyNumbers(region)
region |
A character value, the annotation label for a copy number region. Should be encoded as "(C1,C2)", where C1 denotes the minor copy number and C2 denotes the major copy number. |
A matrix with length(region) rows and two columns, C1 and C2, as described above.
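To make the label encoding concrete, here is a minimal base-R sketch of how "(C1,C2)" labels can be turned into a two-column matrix. The helper name `parseRegionLabels` is hypothetical and is not part of acnr; the package's own `getMinorMajorCopyNumbers` may be implemented differently.

```r
# Hypothetical helper (not part of acnr): parse "(C1,C2)" labels into a
# two-column integer matrix with one row per label.
parseRegionLabels <- function(region) {
  stopifnot(is.character(region))
  # Strip the parentheses, then split each label on the comma.
  inner <- gsub("[()]", "", region)
  parts <- strsplit(inner, ",", fixed = TRUE)
  # Every label must yield exactly two fields.
  stopifnot(all(lengths(parts) == 2L))
  mat <- matrix(as.integer(unlist(parts)), ncol = 2L, byrow = TRUE)
  dimnames(mat) <- list(region, c("C1", "C2"))
  mat
}

labels <- c("(1,1)", "(0,2)", "(1,3)")
cn <- parseRegionLabels(labels)
print(cn)
```

Row names are set to the input labels here purely for readability; that choice is an assumption of this sketch.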
Bengtsson, H., Neuvial, P. and Speed, T. P. (2010). TumorBoost: normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays. BMC Bioinformatics 11, p. 245.
Neuvial, P., Bengtsson, H. and Speed, T. P. (2011). Statistical analysis of Single Nucleotide Polymorphism microarrays in cancer studies. Chapter 11 in *Handbook of Statistical Bioinformatics*, Springer.
dat <- loadCnRegionData(dataSet="GSE29172_H1395", tumorFraction=1)
regions <- unique(dat$region)
getMinorMajorCopyNumbers(regions)
The GEO GSE11976 data set is a dilution series from the Illumina HumanCNV370v1 chip type (Staaf et al, 2008).
A data frame with 770668 observations of 7 variables:
* total copy number (not log-scaled)
* allelic ratios in the diluted tumor sample (after TumorBoost)
* germline genotypes
* a character value, the annotation label for the region. Should be encoded as "(C1,C2)", where C1 denotes the minor copy number and C2 denotes the major copy number. For example:
  * "(1,1)": Normal
  * "(0,1)": Hemizygous deletion
  * "(0,0)": Homozygous deletion
  * "(1,2)": Single copy gain
  * "(0,2)": Copy-neutral LOH
  * "(2,2)": Balanced two-copy gain
  * "(1,3)": Unbalanced two-copy gain
  * "(0,3)": Single-copy gain with LOH
* a numeric value between 0 and 1, the fraction of tumor cells in the sample.
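As a rough illustration of what a tumor fraction below 1 implies for a region label, the sketch below uses a simple two-population mixture: tumor cells with copy numbers (C1, C2) at fraction p, and normal diploid cells at fraction 1 - p. This model and the function names `expectedTotalCn` and `expectedBaf` are assumptions made for illustration; they are not taken from the package, whose data are real measurements and will deviate from these idealized values.

```r
# Assumed model (not from acnr): the sample mixes tumor cells (fraction p)
# with normal diploid cells (fraction 1 - p).
expectedTotalCn <- function(c1, c2, p) {
  p * (c1 + c2) + (1 - p) * 2
}

# Expected allelic ratio at a germline-heterozygous SNP, assuming the B allele
# is the major allele in the tumor. Undefined (NaN) when the total is zero.
expectedBaf <- function(c1, c2, p) {
  (p * c2 + (1 - p)) / expectedTotalCn(c1, c2, p)
}

# Hemizygous deletion "(0,1)" observed at 50% tumor cells.
tcn <- expectedTotalCn(0, 1, 0.5)
baf <- expectedBaf(0, 1, 0.5)
print(c(totalCn = tcn, baf = baf))
```

Under this model a normal region "(1,1)" keeps an expected total copy number of 2 at any tumor fraction, which is why dilution mainly reduces the contrast between altered and normal regions.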
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11976
Staaf, J., Lindgren, D., Vallon-Christersson, J., Isaksson, A., Goransson, H., Juliusson, G., ... & Ringnér, M. (2008). Segmentation-based detection of allelic imbalance and loss-of-heterozygosity in cancer cells using whole genome SNP arrays. Genome Biol, 9(9), R136.
These data have been processed from the files available at http://cbbp.thep.lu.se/~markus/software/BAFsegmentation/ using scripts that are included in the 'inst/preprocessing/GSE11976' directory of this package.
dat <- loadCnRegionData("GSE11976_CRL2324")
unique(dat$region)
The GEO GSE13372 data set is from the Affymetrix GenomeWideSNP_6 chip type. We have extracted one tumor/normal pair corresponding to the breast cancer cell line HCC1143. For consistency with the other data sets in the package, the tumor and normal samples are labeled according to their tumor cellularity, that is, 100% for the tumor sample and 0% for the normal sample.
A data frame with 205842 observations of 7 variables:
* total copy number (not log-scaled)
* allelic ratios in the diluted tumor sample (after TumorBoost)
* germline genotypes
* allelic ratios in the diluted tumor sample (before TumorBoost)
* allelic ratios in the matched normal sample
* a character value, the annotation label for the region. Should be encoded as "(C1,C2)", where C1 denotes the minor copy number and C2 denotes the major copy number. For example:
  * "(1,1)": Normal
  * "(0,1)": Hemizygous deletion
  * "(0,0)": Homozygous deletion
  * "(1,2)": Single copy gain
  * "(0,2)": Copy-neutral LOH
  * "(2,2)": Balanced two-copy gain
  * "(1,3)": Unbalanced two-copy gain
  * "(0,3)": Single-copy gain with LOH
* the (germline) genotype of SNPs. By definition, rows with missing genotypes are interpreted as non-polymorphic loci (a.k.a. copy number probes).
* a numeric value between 0 and 1, the fraction of tumor cells in the sample.
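The convention that a missing genotype marks a non-polymorphic locus can be applied with a plain `is.na()` test. The sketch below runs on a small made-up data frame; the column names `c`, `b` and `genotype` and the genotype values are assumptions for illustration, so check `str()` on the loaded data for the actual names.

```r
# Toy data frame standing in for a loaded data set. Column names and
# genotype values are assumptions of this sketch.
toy <- data.frame(
  c = c(2.1, 1.9, 2.0, 3.1),
  b = c(0.02, 0.51, NA, 0.98),
  genotype = c(0, 0.5, NA, 1)
)

# Rows with a genotype are SNPs; rows without are copy number probes.
isSnp <- !is.na(toy$genotype)
snps <- toy[isSnp, ]
cnProbes <- toy[!isSnp, ]

print(nrow(snps))
print(nrow(cnProbes))
```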
These data have been processed from the files available from GEO using scripts that are included in the 'inst/preprocessing/GSE13372' directory of this package. This processing includes normalization of the raw CEL files using the CRMAv2 method implemented in the aroma.affymetrix package.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE13372
Chiang DY, Getz G, Jaffe DB, O'Kelly MJ et al. High-resolution mapping of copy-number alterations with massively parallel sequencing. Nat Methods 2009 Jan;6(1):99-103. PMID: 19043412
Bengtsson, H., Wirapati, P. & Speed, T. P. (2009). A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6. Bioinformatics 25(17), pp. 2149-56.
Bengtsson, H., Neuvial, P. and Speed, T. P. (2010). TumorBoost: normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays. BMC Bioinformatics 11, p. 245.
dat <- loadCnRegionData("GSE13372_HCC1143")
unique(dat$region)
The GEO GSE29172 data set is a dilution series from the Affymetrix GenomeWideSNP_6 chip type. The GEO GSE26302 data set contains the experiment corresponding to the matched normal (i.e. 0% dilution).
A data frame with 770668 observations of 7 variables:
* total copy number (not log-scaled)
* allelic ratios in the diluted tumor sample (after TumorBoost)
* germline genotypes
* allelic ratios in the diluted tumor sample (before TumorBoost)
* allelic ratios in the matched normal sample
* a character value, the annotation label for the region. Should be encoded as "(C1,C2)", where C1 denotes the minor copy number and C2 denotes the major copy number. For example:
  * "(1,1)": Normal
  * "(0,1)": Hemizygous deletion
  * "(0,0)": Homozygous deletion
  * "(1,2)": Single copy gain
  * "(0,2)": Copy-neutral LOH
  * "(2,2)": Balanced two-copy gain
  * "(1,3)": Unbalanced two-copy gain
  * "(0,3)": Single-copy gain with LOH
* the (germline) genotype of SNPs. By definition, rows with missing genotypes are interpreted as non-polymorphic loci (a.k.a. copy number probes).
* a numeric value between 0 and 1, the fraction of tumor cells in the sample.
These data have been processed from the files available from GEO using scripts that are included in the 'inst/preprocessing/GSE29172' directory of this package. This processing includes normalization of the raw CEL files using the CRMAv2 method implemented in the aroma.affymetrix package.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE29172 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE26302
Rasmussen, M., Sundström, M., Kultima, H. G., Botling, J., Micke, P., Birgisson, H., Glimelius, B. & Isaksson, A. (2011). Allele-specific copy number analysis of tumor samples with aneuploidy and tumor heterogeneity. Genome Biology, 12(10), R108.
Bengtsson, H., Wirapati, P. & Speed, T. P. (2009). A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6. Bioinformatics 25(17), pp. 2149-56.
Bengtsson, H., Neuvial, P. and Speed, T. P. (2010). TumorBoost: normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays. BMC Bioinformatics 11, p. 245.
dat <- loadCnRegionData("GSE29172_H1395")
unique(dat$region)
List available data sets
listDataSets()
name of one of the data sets of the package, see listDataSets
listDataSets()
List of available tumor fractions for a data set
listTumorFractions(dataSet)
dataSet |
The name of a data set from the package, see listDataSets |
A numeric vector, the available tumor fractions for a data set
dataSets <- listDataSets()
fracs <- listTumorFractions(dataSets[1])
Load real, annotated copy number data
loadCnRegionData(dataSet, tumorFraction = 1)
dataSet |
name of one of the data sets of the package, see listDataSets |
tumorFraction |
proportion of tumor cells in the "tumor" sample (a.k.a. tumor cellularity). See listTumorFractions |
This function is a wrapper to load real genotyping array data taken from:
* a dilution series from the Affymetrix GenomeWideSNP_6 chip type (Rasmussen et al, 2011), see GSE29172_H1395
* a dilution series from the Illumina HumanCNV370v1 chip type (Staaf et al, 2008), see GSE11976_CRL2324
* a tumor/normal pair from the Affymetrix GenomeWideSNP_6 chip type (Chiang et al, 2009), see GSE13372_HCC1143
A data.frame containing copy number data for different types of copy number regions. Columns:
* Total copy number
* Allele B fraction (a.k.a. BAF)
* a character value, the annotation label for the region. Should be encoded as "(C1,C2)", where C1 denotes the minor copy number and C2 denotes the major copy number. For example:
  * "(1,1)": Normal
  * "(0,1)": Hemizygous deletion
  * "(0,0)": Homozygous deletion
  * "(1,2)": Single copy gain
  * "(0,2)": Copy-neutral LOH
  * "(2,2)": Balanced two-copy gain
  * "(1,3)": Unbalanced two-copy gain
  * "(0,3)": Single-copy gain with LOH
* the (germline) genotype of SNPs. By definition, rows with missing genotypes are interpreted as non-polymorphic loci (a.k.a. copy number probes).
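One way such annotated data can be used to build a data set with known truth is to resample rows region by region and concatenate them. The sketch below is only an illustration under stated assumptions: it runs on a simulated stand-in rather than the package data, the column names `c` and `region` are assumed, and `sampleRegion` is a hypothetical helper that is not part of acnr.

```r
set.seed(1)

# Simulated stand-in for the output of loadCnRegionData(); the column
# names 'c' and 'region' are assumptions of this sketch.
toy <- data.frame(
  c = c(rnorm(50, mean = 2, sd = 0.2), rnorm(50, mean = 3, sd = 0.2)),
  region = rep(c("(1,1)", "(1,2)"), each = 50),
  stringsAsFactors = FALSE
)

# Hypothetical helper: draw n rows with replacement from one region type.
sampleRegion <- function(dat, label, n) {
  idx <- which(dat$region == label)
  stopifnot(length(idx) > 0L)
  dat[idx[sample.int(length(idx), n, replace = TRUE)], , drop = FALSE]
}

# Profile with a known breakpoint after locus 30: normal, then single gain.
profile <- rbind(
  sampleRegion(toy, "(1,1)", 30),
  sampleRegion(toy, "(1,2)", 20)
)
rownames(profile) <- NULL
truth <- profile$region
print(table(truth))
```

Because every resampled row keeps its region label, the `truth` vector records the true state at each locus and can be compared with the output of a segmentation method.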
Morgane Pierre-Jean and Pierre Neuvial
affyDat <- loadCnRegionData(dataSet="GSE29172_H1395", tumorFraction=1)
str(affyDat)
illuDat <- loadCnRegionData(dataSet="GSE11976_CRL2324", tumorFraction=.79)
str(illuDat)
affyDat2 <- loadCnRegionData(dataSet="GSE13372_HCC1143", tumorFraction=1)
str(affyDat2)