Title: | Joint Segmentation of Multivariate (Copy Number) Signals |
---|---|
Description: | Methods for fast segmentation of multivariate signals into piecewise constant profiles and for generating realistic copy-number profiles. A typical application is the joint segmentation of total DNA copy numbers and allelic ratios obtained from Single Nucleotide Polymorphism (SNP) microarrays in cancer studies. The methods are described in Pierre-Jean, Rigaill and Neuvial (2015) <doi:10.1093/bib/bbu026>. |
Authors: | Morgane Pierre-Jean [aut, cre], Pierre Neuvial [aut], Guillem Rigaill [aut] |
Maintainer: | Morgane Pierre-Jean <[email protected]> |
License: | LGPL (>= 2.1) |
Version: | 1.0.2 |
Built: | 2025-01-06 04:48:35 UTC |
Source: | https://github.com/mpierrejean/jointseg |
Perform binning in order to remove missing values
binMissingValues(Y, verbose = FALSE)
binMissingValues(Y, verbose = FALSE)
Y |
A numeric matrix |
verbose |
A |
Some segmentation methods (in particular, GFLars) do not natively handle the
situation when some observations have missing values in one or more
dimensions. In order to avoid dropping the corresponding observations
entirely, binMissingValues
bins the signal values of the last complete
observation before a (range of) observations with missing entries using the
binMeans
function.
In the specific case when the first row has NA values, the first non-missing entry is replicated in order to make smoothing possible. This choice is arbitrary but some arbitrary choice is needed in that case.
Currently this function is only used by doGFLars
in order
to make it possible to run GFLars segmentation on SNP array data where most
markers (on the order of 2/3 to 5/6) have missing values, because of
uninformative or missing allelic ratio signals.
The binMissingValues
function may be used for other segmentation
methods suffering from the same limitation. However, we emphasize that
handling missing values natively in the segmentation method would be a
better solution.
Currently this function is only used by doGFLars
in order
to make it possible to run GFLars segmentation on SNP array data where most
markers (on the order of 2/3 to 5/6) have missing values, because of
uninformative or missing allelic ratio signals. The binMissingValues
function may be used for other segmentation methods suffering from the same
limitation. However, we emphasize that handling missing values natively in
the segmentation method would be a better solution.
Morgane Pierre-Jean and Pierre Neuvial
Bleakley, K., & Vert, J. P. (2011). The group fused lasso for multiple change-point detection. arXiv preprint arXiv:1106.4199.
Vert, J. P., & Bleakley, K. (2010). Fast detection of multiple change-points shared by many signals using group LARS. Advances in Neural Information Processing Systems, 23, 2343-2351.
sim <- randomProfile(10, 1, 0.1, 3) Y <- sim$profile Y[c(4, 8), 2] <- NA Y[c(7, 8), 3] <- NA res <- binMissingValues(Y) Y <- sim$profile Y[1:5, 2] <- NA Yb <- binMissingValues(Y) Y <- sim$profile Y[3:5, 2] <- NA Yb <- binMissingValues(Y)
sim <- randomProfile(10, 1, 0.1, 3) Y <- sim$profile Y[c(4, 8), 2] <- NA Y[c(7, 8), 3] <- NA res <- binMissingValues(Y) Y <- sim$profile Y[1:5, 2] <- NA Yb <- binMissingValues(Y) Y <- sim$profile Y[3:5, 2] <- NA Yb <- binMissingValues(Y)
This function is a wrapper for convenient use of the CBS
segmentation
method by PSSeg
. It applies the
segment
function and reshapes the results
doCBS(y, ..., verbose = FALSE)
doCBS(y, ..., verbose = FALSE)
y |
A numeric vector, the signal to be segmented |
... |
Arguments to be passed to |
verbose |
A |
A list with a single element:
breakpoint positions
Morgane Pierre-Jean and Pierre Neuvial
## load known real copy number regions affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=1) ## generate a synthetic CN profile K <- 10 len <- 1e4 sim <- getCopyNumberDataByResampling(len, K, minLength=100, regData=affyDat) datS <- sim$profile ## run CBS segmentation res <- doCBS(datS[["c"]]) getTpFp(res$bkp, sim$bkp, tol=5, relax = -1) ## true and false positives plotSeg(datS, breakpoints=list(sim$bkp, res$bkp))
## load known real copy number regions affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=1) ## generate a synthetic CN profile K <- 10 len <- 1e4 sim <- getCopyNumberDataByResampling(len, K, minLength=100, regData=affyDat) datS <- sim$profile ## run CBS segmentation res <- doCBS(datS[["c"]]) getTpFp(res$bkp, sim$bkp, tol=5, relax = -1) ## true and false positives plotSeg(datS, breakpoints=list(sim$bkp, res$bkp))
High-level function for univariate or multivariate segmentation by dynamic programming
doDynamicProgramming(Y, K, stat = NULL, verbose = FALSE)
doDynamicProgramming(Y, K, stat = NULL, verbose = FALSE)
Y |
A numeric vector or a matrix, the signal to be segmented |
K |
The number of change points to find |
stat |
A vector containing the names or indices of the columns of
|
verbose |
A |
If the signal is uni-dimensional, this function simply uses the segmentation
method provided in the cghseg
package reshapes the results.
If the signal is multi-dimensional, this function applies the
pruneByDP
function and reshapes the results.
bkp |
A vector of |
dpseg |
A list of two elements
|
This is essentially a wrapper for convenient segmentation by dynamic
programming using the PSSeg
function.
Morgane Pierre-Jean and Pierre Neuvial
Rigaill, G. (2015). A pruned dynamic programming algorithm to recover the best segmentations with 1 to K_max change-points. Journal de la Societe Francaise de Statistique, 156(4), 180-205.
## load known real copy number regions affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=1) ## generate a synthetic CN profile K <- 10 len <- 1e4 sim <- getCopyNumberDataByResampling(len, K, minLength=100, regData=affyDat) datS <- sim$profile ## run pruned DPA segmentation resDP <- doDynamicProgramming(datS[["c"]], K=K) getTpFp(resDP$bkp, sim$bkp, tol=5, relax = -1) ## true and false positives plotSeg(datS, breakpoints=list(sim$bkp, resDP$bkp)) ## run 2d dynamic programming segmentation K <- 2 len <- 1e3 sim <- getCopyNumberDataByResampling(len, K, minLength=100, regData=affyDat) datS <- sim$profile datS$d <- 2*abs(datS$b-1/2) datS[which(datS$genotype!=0.5),"d"] <- NA Y = cbind(datS$c,datS$d) resDP2d <- doDynamicProgramming(Y, K = K)
## load known real copy number regions affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=1) ## generate a synthetic CN profile K <- 10 len <- 1e4 sim <- getCopyNumberDataByResampling(len, K, minLength=100, regData=affyDat) datS <- sim$profile ## run pruned DPA segmentation resDP <- doDynamicProgramming(datS[["c"]], K=K) getTpFp(resDP$bkp, sim$bkp, tol=5, relax = -1) ## true and false positives plotSeg(datS, breakpoints=list(sim$bkp, resDP$bkp)) ## run 2d dynamic programming segmentation K <- 2 len <- 1e3 sim <- getCopyNumberDataByResampling(len, K, minLength=100, regData=affyDat) datS <- sim$profile datS$d <- 2*abs(datS$b-1/2) datS[which(datS$genotype!=0.5),"d"] <- NA Y = cbind(datS$c,datS$d) resDP2d <- doDynamicProgramming(Y, K = K)
High-level function for multivariate fused Lars (GFLars) segmentation
doGFLars(Y, K, stat = NULL, ..., verbose = FALSE)
doGFLars(Y, K, stat = NULL, ..., verbose = FALSE)
Y |
A |
K |
The number of change points to find |
stat |
A vector containing the names or indices of the columns of
|
... |
Further arguments to be passed to 'segmentByGFLars' |
verbose |
A |
This function is a wrapper around the lower-level segmentation function
segmentByGFLars
. It can be run on p-dimensional,
piecewise-constant data in order to defined a set of candidate change
points. It is recommended to prune this list of candidates using dynamic
programming (pruneByDP
), combined with a selection of the best
number of change points. The jointSeg
function provides a
convenient wrapper for performing segmentation, pruning and model selection.
For the specific case of DNA copy number data segmentation, see the
dedicated wrapper PSSeg
.
The default weights are calibrated as suggested by
Bleakley and Vert (2011). Using this calibration, the first breakpoint
maximizes the likelihood ratio test (LRT) statistic.
An object of the same structure as the output of
segmentByGFLars
This implementation is derived from the MATLAB code by Vert and Bleakley: http://cbio.ensmp.fr/GFLseg.
Morgane Pierre-Jean and Pierre Neuvial
Bleakley, K., & Vert, J. P. (2011). The group fused lasso for multiple change-point detection. arXiv preprint arXiv:1106.4199.
Vert, J. P., & Bleakley, K. (2010). Fast detection of multiple change-points shared by many signals using group LARS. Advances in Neural Information Processing Systems, 23, 2343-2351.
p <- 2 trueK <- 10 sim <- randomProfile(1e4, trueK, 1, p) Y <- sim$profile K <- 2*trueK res <- doGFLars(Y, K) print(res$bkp) print(sim$bkp) plotSeg(Y, res$bkp) ## a toy example with missing values sim <- randomProfile(1e2, 1, 0.1, 2) Y <- sim$profile Y[3:50, 2] <- NA res <- doGFLars(Y, 10, 2, verbose=TRUE) print(res$bkp) print(sim$bkp) plotSeg(Y, res$bkp)
p <- 2 trueK <- 10 sim <- randomProfile(1e4, trueK, 1, p) Y <- sim$profile K <- 2*trueK res <- doGFLars(Y, K) print(res$bkp) print(sim$bkp) plotSeg(Y, res$bkp) ## a toy example with missing values sim <- randomProfile(1e2, 1, 0.1, 2) Y <- sim$profile Y[3:50, 2] <- NA res <- doGFLars(Y, 10, 2, verbose=TRUE) print(res$bkp) print(sim$bkp) plotSeg(Y, res$bkp)
This function is a wrapper for convenient use of the PSCBS
segmentation method by PSSeg
. It applies the
segmentByPairedPSCBS
function and reshapes the results
doPSCBS(Y, ..., verbose = FALSE)
doPSCBS(Y, ..., verbose = FALSE)
Y |
A matrix of signals to be segmented, containing the following columns
|
... |
Arguments to be passed to
|
verbose |
A |
A list with a single element:
breakpoint positions
Morgane Pierre-Jean and Pierre Neuvial
## Not run: ## load known real copy number regions affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=1) ## generate a synthetic CN profile K <- 10 len <- 1e4 sim <- getCopyNumberDataByResampling(len, K, minLength=100, regData=affyDat) datS <- sim$profile ## run PSCBS segmentation Y <- as.matrix(subset(datS, select=c(c, b, genotype))) res <- doPSCBS(Y) getTpFp(res$bkp, sim$bkp, tol=5, relax = -1) ## true and false positives plotSeg(datS, breakpoints=list(sim$bkp, res$bkp)) ## End(Not run)
## Not run: ## load known real copy number regions affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=1) ## generate a synthetic CN profile K <- 10 len <- 1e4 sim <- getCopyNumberDataByResampling(len, K, minLength=100, regData=affyDat) datS <- sim$profile ## run PSCBS segmentation Y <- as.matrix(subset(datS, select=c(c, b, genotype))) res <- doPSCBS(Y) getTpFp(res$bkp, sim$bkp, tol=5, relax = -1) ## true and false positives plotSeg(datS, breakpoints=list(sim$bkp, res$bkp)) ## End(Not run)
The 'PSCN' package is not maintained anymore and it is not available for R >= 3.0.0. The original 'doPSCN' function has been moved to the directory 'zzz.defunct'. The skeleton of that function is kept for backward compatibility.
doPSCN(Y, alpha = 0.01, platform = c("Illumina", "Affymetrix"), verbose = FALSE, ...)
doPSCN(Y, alpha = 0.01, platform = c("Illumina", "Affymetrix"), verbose = FALSE, ...)
Y |
The signal to be segmented, a matrix containing the following columns:
|
alpha |
sensitivity level in [0,1] to be passed to
|
platform |
Specifies form which array platform 'Y' was generated: Illumina or Affymetrix |
verbose |
A |
... |
Further arguments to be passed to |
Morgane Pierre-Jean and Pierre Neuvial
Chen, H., Xing, H., & Zhang, N. R. (2011). Estimation of parent specific DNA copy number in tumors using high-density genotyping arrays. PLoS computational biology, 7(1), e1001060.
print("The 'PSCN' package is not available for R >= 3.0.0.") print("See http://cran.r-project.org/web/packages/PSCN/index.html")
print("The 'PSCN' package is not available for R >= 3.0.0.") print("See http://cran.r-project.org/web/packages/PSCN/index.html")
High-level function for multivariate recursive binary (RBS) segmentation
doRBS(Y, K, stat = NULL, ..., verbose = FALSE)
doRBS(Y, K, stat = NULL, ..., verbose = FALSE)
Y |
A |
K |
The number of change points to find |
stat |
A vector containing the names or indices of the columns of
|
... |
Further arguments to be passed to 'segmentByRBS' |
verbose |
A |
This function is a wrapper around the lower-level segmentation function
segmentByRBS
. It can be run on p-dimensional,
piecewise-constant data in order to defined a set of candidate change
points. It is recommended to prune this list of candidates using dynamic
programming (pruneByDP
), combined with a selection of the best
number of change points. The jointSeg
function provides a
convenient wrapper for performing segmentation, pruning and model selection.
An object of the same structure as the output of
segmentByRBS
Morgane Pierre-Jean and Pierre Neuvial
Gey, S., & Lebarbier, E. (2008). Using CART to Detect Multiple Change Points in the Mean for Large Sample. http://hal.archives-ouvertes.fr/hal-00327146/
p <- 2 trueK <- 10 len <- 5e4 sim <- randomProfile(len, trueK, 1, p) Y <- sim$profile K <- 2*trueK res <- doRBS(Y, K) getTpFp(res$bkp, sim$bkp, tol=10, relax = -1) ## true and false positives cols <- rep(2, K) cols[1:trueK] <- 3 par(mfrow=c(p,1)) for (ii in 1:p) { plot(Y[, ii], pch=19, cex=0.2) abline(v=res$bkp[1:trueK], col= cols) abline(v=sim$bkp, col=8, lty=2) } ## NA:s in one dimension at a true breakpoint jj <- sim$bkp[1] Y[jj-seq(-10, 10), p] <- NA res2 <- doRBS(Y, K) getTpFp(res2$bkp, sim$bkp, tol=10, relax = -1) ## true and false positives ## NA:s in both dimensions at a true breakpoint Y[jj-seq(-10, 10), ] <- NA res3 <- doRBS(Y, K) getTpFp(res3$bkp, sim$bkp, tol=10, relax = -1) ## true and false positives
p <- 2 trueK <- 10 len <- 5e4 sim <- randomProfile(len, trueK, 1, p) Y <- sim$profile K <- 2*trueK res <- doRBS(Y, K) getTpFp(res$bkp, sim$bkp, tol=10, relax = -1) ## true and false positives cols <- rep(2, K) cols[1:trueK] <- 3 par(mfrow=c(p,1)) for (ii in 1:p) { plot(Y[, ii], pch=19, cex=0.2) abline(v=res$bkp[1:trueK], col= cols) abline(v=sim$bkp, col=8, lty=2) } ## NA:s in one dimension at a true breakpoint jj <- sim$bkp[1] Y[jj-seq(-10, 10), p] <- NA res2 <- doRBS(Y, K) getTpFp(res2$bkp, sim$bkp, tol=10, relax = -1) ## true and false positives ## NA:s in both dimensions at a true breakpoint Y[jj-seq(-10, 10), ] <- NA res3 <- doRBS(Y, K) getTpFp(res3$bkp, sim$bkp, tol=10, relax = -1) ## true and false positives
Estimate standard deviation of an unimodal signal with possible changes in mean
estimateSd(y, method = c("Hall", "von Neumann"))
estimateSd(y, method = c("Hall", "von Neumann"))
y |
A numeric vector |
method |
Method used to estimate standard deviation |
von Neumann's estimator is proportional to the mean absolute deviation
(mad
) of the first-order differences of the original signals:
mad(diff(y)
. By construction this estimator is robust to 1) changes
in the mean of the signal (through the use of differences) and 2) outliers
(through the use of mad
instead of mean
).
The proportionality constant ensures
that the resulting estimator is consistent for the estimation of the
standard deviation in the case of Gaussian signals.
Hall's estimator is a weigthed sum of squared elements of y. Let m=3.
Morgane Pierre-Jean and Pierre Neuvial
Von Neumann, J., Kent, R. H., Bellinson, H. R., & Hart, B. T. (1941). The mean square successive difference. The Annals of Mathematical Statistics, 153-162.
Peter Hall, J. W. Kay and D. M. Titterington (1990). Asymptotically Optimal Difference-Based Estimation of Variance in Nonparametric Regression Biometrika,521-528
n <- 1e4 y <- rnorm(n) ## a signal with no change in mean estimateSd(y) estimateSd(y, method="von Neumann") sd(y) mad(y) z <- y + rep(c(0,2), each=n/2) ## a signal with *a single* change in mean estimateSd(z) estimateSd(z, method="von Neumann") sd(z) mad(z) z <- y + rep(c(0,2), each=100) ## a signal with many changes in mean estimateSd(z) estimateSd(z, method="von Neumann") sd(z) mad(z)
n <- 1e4 y <- rnorm(n) ## a signal with no change in mean estimateSd(y) estimateSd(y, method="von Neumann") sd(y) mad(y) z <- y + rep(c(0,2), each=n/2) ## a signal with *a single* change in mean estimateSd(z) estimateSd(z, method="von Neumann") sd(z) mad(z) z <- y + rep(c(0,2), each=100) ## a signal with many changes in mean estimateSd(z) estimateSd(z, method="von Neumann") sd(z) mad(z)
Low-level API for the pruned dynamic programming algorithm (pDPA)
Fpsn(x, Kmax, mini = min(x), maxi = max(x))
Fpsn(x, Kmax, mini = min(x), maxi = max(x))
x |
A vector of double : the signal to be segmented |
Kmax |
Max number of segments |
mini |
Min value for the mean parameter of the segment |
maxi |
Max value for the mean parameter of the segment |
This implementation uses functional pruning and segment neighborhood, and the L2-loss function
A list with a vector containing the position of the change-points
Guillem Rigaill
Rigaill, G. (2015). A pruned dynamic programming algorithm to recover the best segmentations with 1 to K_max change-points. Journal de la Societe Francaise de Statistique, 156(4), 180-205.
doDynamicProgramming
for a higher-level function
## load known real copy number regions affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=1) ## generate a synthetic CN profile K <- 10 len <- 1e4 sim <- getCopyNumberDataByResampling(len, K, minLength=100, regData=affyDat) datS <- sim$profile ## run pruned DPA segmentation res <- Fpsn(datS[["c"]], Kmax=2*K+1) ## plot segmentation results for the true number of breakpoints bkp <- res$t.est[K+1, 1:K] plotSeg(datS, breakpoints=bkp)
## load known real copy number regions affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=1) ## generate a synthetic CN profile K <- 10 len <- 1e4 sim <- getCopyNumberDataByResampling(len, K, minLength=100, regData=affyDat) datS <- sim$profile ## run pruned DPA segmentation res <- Fpsn(datS[["c"]], Kmax=2*K+1) ## plot segmentation results for the true number of breakpoints bkp <- res$t.est[K+1, 1:K] plotSeg(datS, breakpoints=bkp)
Generate a copy number profile by resampling input data
getCopyNumberDataByResampling(length, nBkp = NA, bkp = NULL, regData = NULL, regions = NULL, regAnnot = NULL, minLength = 0, regionSize = 0, connex = TRUE)
getCopyNumberDataByResampling(length, nBkp = NA, bkp = NULL, regData = NULL, regions = NULL, regAnnot = NULL, minLength = 0, regionSize = 0, connex = TRUE)
length |
length of the profile |
nBkp |
number of breakpoints. If |
bkp |
a numeric vector of breakpoint positions that may be used to
bypass the breakpoint generation step. Defaults to |
regData |
a data.frame containing copy number data for different types of copy number regions. Columns:
|
regions |
a character vector of region labels that may be used to
bypass the region label generation step. Defaults to |
regAnnot |
a data.frame containing annotation data for each copy number region. Columns:
If |
minLength |
minimum length of region between breakpoints. Defaults to 0. |
regionSize |
If |
connex |
If |
This function generates a random copy number profile of length 'length',
with 'nBkp' breakpoints randomly chosen. Between two breakpoints, the
profile is constant and taken among the different types of regions in
regData
.
Elements of regData[["region"]]
must be of the form "(C1,C2)"
,
where C1
denotes the minor copy number and C2
denotes the
major copy number. For example,
Normal
Hemizygous deletion
Homozygous deletion
Single copy gain
Copy-neutral LOH
Balanced two-copy gain
Unbalanced two-copy gain
Single-copy gain with LOH
If 'connex' is set to TRUE (the default), transitions between copy number regions are constrained in such a way that for any breakpoint, one of the minor and the major copy number does not change. Equivalently, this means that all breakpoints can be seen in both total copy numbers and allelic ratios.
A list with elements
the profile (a length
by 2
data.frame
containing the same fields as the input data regData
.
a vector of bkp positions (the last row index before a breakpoint)
regions |
a character vector of region labels |
Morgane Pierre-Jean and Pierre Neuvial
Pierre-Jean, M, Rigaill, G. J. and Neuvial, P. (2015). "Performance Evaluation of DNA Copy Number Segmentation Methods." *Briefings in Bioinformatics*, no. 4: 600-615.
affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=1) sim <- getCopyNumberDataByResampling(len=1e4, nBkp=5, minLength=100, regData=affyDat) plotSeg(sim$profile, sim$bkp) ## another run with identical parameters bkp <- sim$bkp regions <- sim$regions sim2 <- getCopyNumberDataByResampling(len=1e4, bkp=bkp, regData=affyDat, regions=regions) plotSeg(sim2$profile, bkp) ## change tumor fraction but keep same "truth" affyDatC <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=0.5) simC <- getCopyNumberDataByResampling(len=1e4, bkp=bkp, regData=affyDatC, regions=regions) plotSeg(simC$profile, bkp) ## restrict to only normal, single copy gain, and copy-neutral LOH ## with the same bkp affyDatR <- subset(affyDat, region %in% c("(1,1)", "(0,2)", "(1,2)")) simR <- getCopyNumberDataByResampling(len=1e4, bkp=bkp, regData=affyDatR) plotSeg(simR$profile, bkp) ## Same 'truth', on another dataSet regions <- simR$regions illuDat <- acnr::loadCnRegionData(dataSet="GSE11976", tumorFraction=1) sim <- getCopyNumberDataByResampling(len=1e4, bkp=bkp, regData=illuDat, regions=regions) plotSeg(sim$profile, sim$bkp)
affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=1) sim <- getCopyNumberDataByResampling(len=1e4, nBkp=5, minLength=100, regData=affyDat) plotSeg(sim$profile, sim$bkp) ## another run with identical parameters bkp <- sim$bkp regions <- sim$regions sim2 <- getCopyNumberDataByResampling(len=1e4, bkp=bkp, regData=affyDat, regions=regions) plotSeg(sim2$profile, bkp) ## change tumor fraction but keep same "truth" affyDatC <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=0.5) simC <- getCopyNumberDataByResampling(len=1e4, bkp=bkp, regData=affyDatC, regions=regions) plotSeg(simC$profile, bkp) ## restrict to only normal, single copy gain, and copy-neutral LOH ## with the same bkp affyDatR <- subset(affyDat, region %in% c("(1,1)", "(0,2)", "(1,2)")) simR <- getCopyNumberDataByResampling(len=1e4, bkp=bkp, regData=affyDatR) plotSeg(simR$profile, bkp) ## Same 'truth', on another dataSet regions <- simR$regions illuDat <- acnr::loadCnRegionData(dataSet="GSE11976", tumorFraction=1) sim <- getCopyNumberDataByResampling(len=1e4, bkp=bkp, regData=illuDat, regions=regions) plotSeg(sim$profile, sim$bkp)
Calculate the number of true positives and false positives among candidate breakpoints
getTpFp(candidates, trueBkp, tol, relax = -1)
getTpFp(candidates, trueBkp, tol, relax = -1)
candidates |
Breakpoints found by the methods |
trueBkp |
True breakpoints |
tol |
Tolerance on the position of candidate breakpoints called true |
relax |
Controls the way multiple breapoints within tolerance area are recorded.
|
A list with elements:
The number of true positives
The number of false positives
Morgane Pierre-Jean and Pierre Neuvial
## load known real copy number regions affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=0.7) ## generate a synthetic CN profile K <- 10 len <- 2e4 sim <- getCopyNumberDataByResampling(len, K, minLength=100, regData=affyDat) datS <- sim$profile ## (group-)fused Lasso segmentation res <- PSSeg(data=datS, K=2*K, method="GFLars", stat="c", profile=TRUE) ## results of the initial (group-)fused lasso segmentation getTpFp(res$initBkp, sim$bkp, tol=10, relax=-1) getTpFp(res$initBkp, sim$bkp, tol=10, relax=0) getTpFp(res$initBkp, sim$bkp, tol=10, relax=1) plotSeg(datS, breakpoints=list(sim$bkp, res$initBkp)) ## results after pruning (group-)fused Lasso candidates by dynamic programming) getTpFp(res$bestBkp, sim$bkp, tol=10, relax=-1) getTpFp(res$bestBkp, sim$bkp, tol=10, relax=0) getTpFp(res$bestBkp, sim$bkp, tol=10, relax=1) plotSeg(datS, breakpoints=list(sim$bkp, res$bestBkp))
## load known real copy number regions affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=0.7) ## generate a synthetic CN profile K <- 10 len <- 2e4 sim <- getCopyNumberDataByResampling(len, K, minLength=100, regData=affyDat) datS <- sim$profile ## (group-)fused Lasso segmentation res <- PSSeg(data=datS, K=2*K, method="GFLars", stat="c", profile=TRUE) ## results of the initial (group-)fused lasso segmentation getTpFp(res$initBkp, sim$bkp, tol=10, relax=-1) getTpFp(res$initBkp, sim$bkp, tol=10, relax=0) getTpFp(res$initBkp, sim$bkp, tol=10, relax=1) plotSeg(datS, breakpoints=list(sim$bkp, res$initBkp)) ## results after pruning (group-)fused Lasso candidates by dynamic programming) getTpFp(res$bestBkp, sim$bkp, tol=10, relax=-1) getTpFp(res$bestBkp, sim$bkp, tol=10, relax=0) getTpFp(res$bestBkp, sim$bkp, tol=10, relax=1) plotSeg(datS, breakpoints=list(sim$bkp, res$bestBkp))
Joint segmentation of multivariate signals in two steps:
first-pass segmentation. By default, a fast, greedy approach is used
(see method
).
pruning of the candidate change points obtained by dynamic programming
jointSeg(Y, method = "RBS", stat = NULL, dpStat = stat, segFUN = NULL, jitter = NULL, modelSelectionMethod = ifelse(match(method, c("DynamicProgramming", "RBS", "GFLars"), nomatch = 0) > 0, "Lebarbier", "none"), modelSelectionOnDP = (match(method, c("DynamicProgramming", "RBS", "GFLars"), nomatch = 0) > 0), ..., profile = FALSE, verbose = FALSE)
jointSeg(Y, method = "RBS", stat = NULL, dpStat = stat, segFUN = NULL, jitter = NULL, modelSelectionMethod = ifelse(match(method, c("DynamicProgramming", "RBS", "GFLars"), nomatch = 0) > 0, "Lebarbier", "none"), modelSelectionOnDP = (match(method, c("DynamicProgramming", "RBS", "GFLars"), nomatch = 0) > 0), ..., profile = FALSE, verbose = FALSE)
Y |
The signal to be segmented (a matrix or a numeric vector) |
method |
A
|
stat |
A vector containing the names or indices of the columns of
|
dpStat |
A vector containing the names or indices of the columns of
|
segFUN |
The segmentation function to be used when |
jitter |
Uncertainty on breakpoint position after initial segmentation.
Defaults to |
modelSelectionMethod |
A character value, the name of the method used to perform model selection. |
modelSelectionOnDP |
A logical value. If |
... |
Further arguments to be passed to the lower-level segmentation
method determined by argument |
profile |
A |
verbose |
A |
If modelSelectionOnDP
is set to FALSE
, then model selection is
run on the sets of the form bkp[1:k]
for , where
bkp
is the set of breakpoints identified by the
initial segmentation. In particular, this implies that the candidate
breakpoints in bkp
are sorted by order of appearance and not by
position.
If jitter
is not NULL
, it should be a vector of integer
indices. The set of candidate breakpoints passed on to dynamic programming is
augmented by all indices distant from an element of jitter
from one of
the candidates. For example, if jitter==c(-1, 0, 1)
and the initial
set of breakpoints is c(1,5)
then dynamic programming is run on
c(1,2,4,5,6)
.
If the return value of the initial segmentation has an element named
dpseg
, then initial segmentation results are not pruned by dynamic
programming.
Bleakley, K., & Vert, J. P. (2011). The group fused lasso for multiple change-point detection. arXiv preprint arXiv:1106.4199.
Vert, J. P., & Bleakley, K. (2010). Fast detection of multiple change-points shared by many signals using group LARS. Advances in Neural Information Processing Systems, 23, 2343-2351.
Gey, S., & Lebarbier, E. (2008). Using CART to Detect Multiple Change Points in the Mean for Large Sample. http://hal.archives-ouvertes.fr/hal-00327146/
Rigaill, G. (2010). Pruned dynamic programming for optimal multiple change-point detection. arXiv preprint arXiv:1004.0887.
## A two-dimensional signal p <- 2 trueK <- 10 len <- 1e4 sim <- randomProfile(len, trueK, 1, p) Y <- sim$profile K <- 2*trueK res <- jointSeg(Y, method="RBS", K=K) bkp <- res$bestBkp getTpFp(bkp, sim$bkp, tol=5, relax = -1) ## true and false positives plotSeg(Y, list(sim$bkp, res$bestBkp), col=1) ## Now we add some NA:s in one dimension jj <- sim$bkp[1] Y[jj-seq(-10,10), p] <- NA res2 <- jointSeg(Y, method="RBS", K=K, verbose=TRUE) bkp <- res2$bestBkp getTpFp(res2$bestBkp, sim$bkp, tol=5, relax = -1) ## true and false positives
## A two-dimensional signal p <- 2 trueK <- 10 len <- 1e4 sim <- randomProfile(len, trueK, 1, p) Y <- sim$profile K <- 2*trueK res <- jointSeg(Y, method="RBS", K=K) bkp <- res$bestBkp getTpFp(bkp, sim$bkp, tol=5, relax = -1) ## true and false positives plotSeg(Y, list(sim$bkp, res$bestBkp), col=1) ## Now we add some NA:s in one dimension jj <- sim$bkp[1] Y[jj-seq(-10,10), p] <- NA res2 <- jointSeg(Y, method="RBS", K=K, verbose=TRUE) bkp <- res2$bestBkp getTpFp(res2$bestBkp, sim$bkp, tol=5, relax = -1) ## true and false positives
Map breakpoint positions back to original space
mapPositionsBack(pos)
mapPositionsBack(pos)
pos |
A sorted list of position (or indices) |
Morgane Pierre-Jean and Pierre Neuvial
Select the best number of breakpoints
modelSelection(rse, n, c = 2.5, lambdas = NULL, method = c("Birge", "Lebarbier"))
modelSelection(rse, n, c = 2.5, lambdas = NULL, method = c("Birge", "Lebarbier"))
rse |
RSE as output by |
n |
Length of the profile |
c |
Parameter for the model selection |
lambdas |
A list of candidate values for the calibration of the penalty |
method |
Method to calibrate the constant in the penalty for model selection |
This function is not intended to be called directly, but implicitly through
jointSeg
or PSSeg
.
A list with elements
the best number of breakpoints
A numerical value, the result of an internal model selection function
Morgane Pierre-Jean and Pierre Neuvial
Lebarbier, E. (2005). Detecting multiple change-points in the mean of Gaussian process by model selection. Signal processing, 85(4), 717-736
Birg\'e, L. (2001). Gaussian model selection. J.Eur Math. Soc, 3(3):203-268
## load known real copy number regions affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=1) sim <- getCopyNumberDataByResampling(1e4, 5, minLength=100, regData=affyDat) Y <- as.matrix(sim$profile[, "c"]) ## Find candidate breakpoints K <- 50 resRBS <- segmentByRBS(Y, K=K) ## Prune candidate breakpoints resDP <- pruneByDP(Y, candCP=resRBS$bkp) selectedModel <- modelSelection(rse=resDP$rse, n=nrow(Y), method="Lebarbier") str(selectedModel) ## breakpoints of the best model bestBkp <- resDP$bkp[[selectedModel$kbest]] print(bestBkp) ## truth print(sim$bkp) ## Note that all of the above can be done directly using 'PSSeg' res <- PSSeg(sim$profile, method="RBS", stat="c", K=K) ## stopifnot(identical(res$bestBkp, bestBkp))
## load known real copy number regions affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=1) sim <- getCopyNumberDataByResampling(1e4, 5, minLength=100, regData=affyDat) Y <- as.matrix(sim$profile[, "c"]) ## Find candidate breakpoints K <- 50 resRBS <- segmentByRBS(Y, K=K) ## Prune candidate breakpoints resDP <- pruneByDP(Y, candCP=resRBS$bkp) selectedModel <- modelSelection(rse=resDP$rse, n=nrow(Y), method="Lebarbier") str(selectedModel) ## breakpoints of the best model bestBkp <- resDP$bkp[[selectedModel$kbest]] print(bestBkp) ## truth print(sim$bkp) ## Note that all of the above can be done directly using 'PSSeg' res <- PSSeg(sim$profile, method="RBS", stat="c", K=K) ## stopifnot(identical(res$bestBkp, bestBkp))
Plot signal and breakpoints with segment-level signal estimates
plotSeg(dat, breakpoints = NULL, regNames = NULL, exclNames = c("genotype", "region", "bT", "bN", "cellularity"), ylabs = colnames(dat), ylims = NULL, binExclPattern = "^b[N|T]*$", col = "#33333333", pch = 19, cex = 0.3)
plotSeg(dat, breakpoints = NULL, regNames = NULL, exclNames = c("genotype", "region", "bT", "bN", "cellularity"), ylabs = colnames(dat), ylims = NULL, binExclPattern = "^b[N|T]*$", col = "#33333333", pch = 19, cex = 0.3)
dat |
A |
breakpoints |
A vector of breakpoints positions, or a |
regNames |
Region labels, a vector of length
|
exclNames |
A vector of column names corresponding to columns that should not be plotted. |
ylabs |
A vector of 'y' labels (column names or indices) that should be plotted. |
ylims |
An optional |
binExclPattern |
A vector of column names or indices in
|
col |
Color of plotting symbol, see |
pch |
Plotting symbol, see |
cex |
Magnification factor for plotting symbol, see |
Argument 'binCols' is mainly used to avoid calculating mean levels for allelic ratios, which would not make sense as they are typically multimodal.
Morgane Pierre-Jean and Pierre Neuvial
affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=1) sim <- getCopyNumberDataByResampling(1e4, 5, minLength=100, regData=affyDat) dat <- sim$profile res <- PSSeg(dat, method="RBS", stat=c("c", "d"), K=50) bkpList <- list(true=sim$bkp, est=res$bestSeg) plotSeg(dat, breakpoints=bkpList)
affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=1) sim <- getCopyNumberDataByResampling(1e4, 5, minLength=100, regData=affyDat) dat <- sim$profile res <- PSSeg(dat, method="RBS", stat=c("c", "d"), K=50) bkpList <- list(true=sim$bkp, est=res$bestSeg) plotSeg(dat, breakpoints=bkpList)
Exact segmentation of a multivariate signal using dynamic programming.
pruneByDP(Y, candCP = 1:(nrow(Y) - 1), K = length(candCP), allowNA = TRUE, verbose = FALSE)
pruneByDP(Y, candCP = 1:(nrow(Y) - 1), K = length(candCP), allowNA = TRUE, verbose = FALSE)
Y |
A |
candCP |
A vector of candidate change point positions (defaults to 1:(n-1)) |
K |
The maximum number of change points to find |
allowNA |
A boolean value specifying whether missing values should be allowed or not. |
verbose |
A |
This function retrieves the maximum likelihood solution of the gaussian
homoscedastic change model into segments, for . The dynamic programming algorithm used is quadratic in
time. For signals containing more than 1000 points, we recommend using a
first pass segmentation (see
segmentByRBS
) to find a smaller
number of candidates, and to run pruneByDP
on these candidates only,
as initially suggested by Gey and Lebarbier (2008). These two steps can be
performed using jointSeg
for generic multivariate signals, and
using PSSeg
for copy number signals from SNP array data.
if allowNA
, the calulation of the cost of removing candidate
breakpoints between i and j for i<j tolerates missing values. Method
!allowNA
is maintained in order to check consistency with the
original dynamic programming in the absence of NA:s.
A list with elements:
bkpList |
A list of vectors of change point positions for the best model with k change points, for k=1, 2, ... K |
rse |
A vector of K+1 residual squared errors |
V |
V[i,j] is the best RSE for segmenting intervals 1 to j |
This implementation is derived from the MATLAB code by Vert and Bleakley: http://cbio.ensmp.fr/GFLseg.
Morgane Pierre-Jean and Pierre Neuvial
Bellman, R. (1961). On the approximation of curves by line segments using dynamic programming. Communications of the ACM, 4(6), 284.
Gey, S., & Lebarbier, E. (2008). Using CART to Detect Multiple Change Points in the Mean for Large Sample. http://hal.archives-ouvertes.fr/hal-00327146/
p <- 2 trueK <- 10 sim <- randomProfile(1e4, trueK, 1, p) Y <- sim$profile K <- 2*trueK res <- segmentByRBS(Y, K) resP <- pruneByDP(Y, res$bkp) ## Note that all of the above can be dmethod=="other"one directly using 'jointSeg' resJ <- jointSeg(sim$profile, method="RBS", K=K) stopifnot(identical(resP$bkpList, resJ$dpBkp)) ## check consistency when no NA resP2 <- pruneByDP(Y, res$bkp, allowNA=FALSE) max(abs(resP$rse-resP2$rse)) plotSeg(Y, list(resP$bkp[[trueK]], sim$bkp), col=1)
p <- 2 trueK <- 10 sim <- randomProfile(1e4, trueK, 1, p) Y <- sim$profile K <- 2*trueK res <- segmentByRBS(Y, K) resP <- pruneByDP(Y, res$bkp) ## Note that all of the above can be dmethod=="other"one directly using 'jointSeg' resJ <- jointSeg(sim$profile, method="RBS", K=K) stopifnot(identical(resP$bkpList, resJ$dpBkp)) ## check consistency when no NA resP2 <- pruneByDP(Y, res$bkp, allowNA=FALSE) max(abs(resP$rse-resP2$rse)) plotSeg(Y, list(resP$bkp[[trueK]], sim$bkp), col=1)
This function splits (bivariate) copy number signals into parent-specific (PS) segments using recursive binary segmentation
PSSeg(data, method, stat = NULL, dropOutliers = TRUE, rankTransform = FALSE, ..., profile = FALSE, verbose = FALSE)
PSSeg(data, method, stat = NULL, dropOutliers = TRUE, rankTransform = FALSE, ..., profile = FALSE, verbose = FALSE)
data |
Data frame containing the following columns:
These data are assumed to be ordered by genome position. |
method |
|
stat |
A vector containing the names or indices of the columns of
|
dropOutliers |
If TRUE, outliers are droped by using DNAcopy package |
rankTransform |
If TRUE, data are replaced by their ranks before segmentation |
... |
Further arguments to be passed to |
profile |
Trace time and memory usage ? |
verbose |
A |
Before segmentation, the decrease in heterozygosity d=2|b-1/2|
defined
in Bengtsson et al, 2010 is calculated from the input data. d
is only
defined for heterozygous SNPs, that is, SNPs for which
data$genotype==1/2
. d
may be seen as a "mirrored" version of
allelic ratios (b
): it converts them to a piecewise-constant signals
by taking advantage of the bimodality of b
for heterozygous SNPs. The
rationale for this transformation is that allelic ratios (b
) are only
informative for heterozygous SNPs (see e.g. Staaf et al, 2008).
Before segmentation, the outliers in the copy number signal are droped according the method explained by Venkatraman, E. S. and Olshen, A. B., 2007.
The resulting data are then segmented using the jointSeg
function, which combines an initial segmentation according to argument
method
and pruning of candidate change points by dynamic programming
(skipped when the initial segmentation *is* dynamic programming).
If argument stat
is not provided, then dynamic programming is run on
the two dimensional statistic "(c,d)"
.
If argument stat
is provided, then dynamic programming is run on
stat
; in this case we implicitly assume that stat
is a
piecewise-constant signal.
A list with elements
Best set of breakpoints after dynamic programming
Results of the
initial segmentation, using 'doNnn', where 'Nnn' corresponds to argument
method
Results of dynamic programming, a list of
vectors of breakpoint positions for the best model with k breakpoints for
k=1, 2, ... K where K=length(initBkp)
a matrix
providing time usage (in seconds) and memory usage (in Mb) for the main
steps of the program. Only defined if argument profile
is set to
TRUE
Morgane Pierre-Jean and Pierre Neuvial
Bengtsson, H., Neuvial, P., & Speed, T. P. (2010). TumorBoost: Normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays. BMC bioinformatics, 11(1), 245.
Staaf, J., Lindgren, D., Vallon-Christersson, et al. (2008). Segmentation-based detection of allelic imbalance and loss-of-heterozygosity in cancer cells using whole genome SNP arrays. Genome Biol, 9(9), R136.
Pierre-Jean, M, Rigaill, G. J. and Neuvial, P. (2015). "Performance Evaluation of DNA Copy Number Segmentation Methods." *Briefings in Bioinformatics*, no. 4: 600-615.
## load known real copy number regions affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=0.5) ## generate a synthetic CN profile K <- 10 len <- 1e4 sim <- getCopyNumberDataByResampling(len, K, regData=affyDat) datS <- sim$profile ## run binary segmentation (+ dynamic programming) resRBS <- PSSeg(data=datS, method="RBS", stat=c("c", "d"), K=2*K, profile=TRUE) resRBS$prof getTpFp(resRBS$bestBkp, sim$bkp, tol=5) plotSeg(datS, breakpoints=list(sim$bkp, resRBS$bestBkp))
## load known real copy number regions affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=0.5) ## generate a synthetic CN profile K <- 10 len <- 1e4 sim <- getCopyNumberDataByResampling(len, K, regData=affyDat) datS <- sim$profile ## run binary segmentation (+ dynamic programming) resRBS <- PSSeg(data=datS, method="RBS", stat=c("c", "d"), K=2*K, profile=TRUE) resRBS$prof getTpFp(resRBS$bestBkp, sim$bkp, tol=5) plotSeg(datS, breakpoints=list(sim$bkp, resRBS$bestBkp))
Generate a random multi-dimensional profile with breakpoints and noise
randomProfile(length, nBkp, noiseLevel, dim, minLength = 0)
randomProfile(length, nBkp, noiseLevel, dim, minLength = 0)
length |
length of the profile |
nBkp |
number of breakpoints |
noiseLevel |
variance of the signal between two breakpoints |
dim |
dimension of the profile |
minLength |
minimum length of region between breakpoints by default minLength = 0 |
Generate a random profile (vector) of length length
, with nBkp
breakpoints randomly chosen. Between two breakpoints, the profile is
constant, uniformly chosen between 0 and 1, and a Gaussian noice of variance
noiseLevel
is added.
a list
with elements
the profile (a length
by dim
matrix)
the list of breakpoints positions (the last position at the left of a breakpoint)
This implementation is derived from the MATLAB code by Vert and Bleakley: http://cbio.ensmp.fr/GFLseg.
Morgane Pierre-Jean and Pierre Neuvial
len <- 1e4 nBkp <- 10 noiseLevel <- 1 dim <- 2 sim <- randomProfile(len, nBkp, noiseLevel, dim) res <- doGFLars(sim$profile, K=5*nBkp) str(res)
len <- 1e4 nBkp <- 10 noiseLevel <- 1 dim <- 2 sim <- randomProfile(len, nBkp, noiseLevel, dim) res <- doGFLars(sim$profile, K=5*nBkp) str(res)
Extract endpoint matrix from DP result
retour_sn(path)
retour_sn(path)
path |
the path vector of the "colibri_sn_R_c C" function |
Low-level function for multivariate fused Lars segmentation (GFLars)
segmentByGFLars(Y, K, weights = defaultWeights(nrow(Y)), epsilon = 1e-09, verbose = FALSE)
segmentByGFLars(Y, K, weights = defaultWeights(nrow(Y)), epsilon = 1e-09, verbose = FALSE)
Y |
A |
K |
The number of change points to find |
weights |
A |
epsilon |
Values smaller than epsilon are considered null. Defaults to
|
verbose |
A |
This function recrusively looks for the best candidate change point
according to group-fused LARS. This is a low-level function. It is generally
advised to use the wrapper doGFLars
which also works on data
frames, has a convenient argument stat
, and includes a basic
workaround for handling missing values.
See also jointSeg
for combining group fused LARS segmentation
with pruning by dynamic programming (pruneByDP
).
See PSSeg
for segmenting genomic signals from SNP arrays.
The default weights are calibrated as suggested by
Bleakley and Vert (2011). Using this calibration, the first breakpoint
maximizes the likelihood ratio test (LRT) statistic.
A list with elements:
A vector of k
candidate
change-point positions
The estimated lambda values for each change-point
A vector of length p
, the mean signal per
column
A i x p
matrix of change-point values for the
first i change-points
, a
n-1 x K
matrix
This implementation is derived from the MATLAB code by Vert and Bleakley: http://cbio.ensmp.fr/GFLseg.
Morgane Pierre-Jean and Pierre Neuvial
Bleakley, K., & Vert, J. P. (2011). The group fused lasso for multiple change-point detection. arXiv preprint arXiv:1106.4199.
Vert, J. P., & Bleakley, K. (2010). Fast detection of multiple change-points shared by many signals using group LARS. Advances in Neural Information Processing Systems, 23, 2343-2351.
PSSeg
, jointSeg
,
doGFLars
, pruneByDP
p <- 2 trueK <- 10 sim <- randomProfile(1e4, trueK, 1, p) Y <- sim$profile K <- 2*trueK res <- segmentByGFLars(Y, K) print(res$bkp) print(sim$bkp) plotSeg(Y, res$bkp)
p <- 2 trueK <- 10 sim <- randomProfile(1e4, trueK, 1, p) Y <- sim$profile K <- 2*trueK res <- segmentByGFLars(Y, K) print(res$bkp) print(sim$bkp) plotSeg(Y, res$bkp)
Low-level function for multivariate Recursive Binary Segmentation (RBS)
segmentByRBS(Y, K, minRegionSize = 2, verbose = FALSE)
segmentByRBS(Y, K, minRegionSize = 2, verbose = FALSE)
Y |
A |
K |
The number of change points to find |
minRegionSize |
Regions with less than |
verbose |
A |
This function recrusively looks for the best candidate change point
according to binary segmentation. This is the low-level function. It is
generally advised to use the wrapper doRBS
which also works on
data frames and has a convenient argument stat
.
See jointSeg
for combining recursive binary segmentation with
pruning by dynamic programming (pruneByDP
).
See PSSeg
for segmenting genomic signals from SNP arrays.
Each dimension of the original signal is scaled before segmentation, using
estimateSd
.
A list with elements:
bkp |
A |
rse |
the residual squared error (RSE) for the successive segmentations |
gain |
The gain provided by each breakpoints in terms of difference between RSE |
Morgane Pierre-Jean and Pierre Neuvial
Gey, S., & Lebarbier, E. (2008). Using CART to Detect Multiple Change Points in the Mean for Large Sample. http://hal.archives-ouvertes.fr/hal-00327146/
PSSeg
, jointSeg
, doRBS
,
pruneByDP
p <- 2 trueK <- 10 len <- 1e4 sim <- randomProfile(len, trueK, 1, p) Y <- sim$profile K <- 2*trueK res <- segmentByRBS(Y, K) getTpFp(res$bkp, sim$bkp, tol=10, relax = -1) ## true and false positives cols <- rep(2, K) cols[1:trueK] <- 3 par(mfrow=c(p,1)) for (ii in 1:p) { plot(Y[, ii], pch=19, cex=0.2) abline(v=res$bkp[1:trueK], col= cols) abline(v=sim$bkp, col=8, lty=2) } ## NA:s in one dimension at a true breakpoint jj <- sim$bkp[1] Y[jj-seq(-10, 10), p] <- NA res2 <- segmentByRBS(Y, K) getTpFp(res2$bkp, sim$bkp, tol=10, relax = -1) ## true and false positives ## NA:s in both dimensions at a true breakpoint Y[jj-seq(-10, 10), ] <- NA res3 <- segmentByRBS(Y, K) getTpFp(res3$bkp, sim$bkp, tol=10, relax = -1) ## true and false positives
p <- 2 trueK <- 10 len <- 1e4 sim <- randomProfile(len, trueK, 1, p) Y <- sim$profile K <- 2*trueK res <- segmentByRBS(Y, K) getTpFp(res$bkp, sim$bkp, tol=10, relax = -1) ## true and false positives cols <- rep(2, K) cols[1:trueK] <- 3 par(mfrow=c(p,1)) for (ii in 1:p) { plot(Y[, ii], pch=19, cex=0.2) abline(v=res$bkp[1:trueK], col= cols) abline(v=sim$bkp, col=8, lty=2) } ## NA:s in one dimension at a true breakpoint jj <- sim$bkp[1] Y[jj-seq(-10, 10), p] <- NA res2 <- segmentByRBS(Y, K) getTpFp(res2$bkp, sim$bkp, tol=10, relax = -1) ## true and false positives ## NA:s in both dimensions at a true breakpoint Y[jj-seq(-10, 10), ] <- NA res3 <- segmentByRBS(Y, K) getTpFp(res3$bkp, sim$bkp, tol=10, relax = -1) ## true and false positives