Compare two VCF/BCF files reporting various statistics
Usage
vcfcomp(
test,
truth,
formats = c("DS", "GT"),
stats = "r2",
by.sample = FALSE,
by.variant = FALSE,
flip = FALSE,
names = NULL,
bins = NULL,
af = NULL,
out = NULL,
choose_random_start = FALSE,
return_pse_sites = FALSE,
...
)
Arguments
- test
path to the comparison file (test), which can be a VCF/BCF file, a vcftable object, or a saved RDS file.
- truth
path to the baseline file (truth), which can be a VCF/BCF file, a vcftable object, or a saved RDS file.
- formats
character vector. The FORMAT tags to extract from the test and truth files, respectively. The default c("DS", "GT") extracts 'DS' from the test and 'GT' from the truth.
- stats
the statistics to be calculated. The following are supported.
"r2": the squared Pearson correlation coefficient.
"f1": the F1-score, the harmonic mean of sensitivity and precision.
"nrc": the Non-Reference Concordance rate.
"pse": the Phasing Switch Error rate.
- by.sample
logical. calculate sample-wise concordance, which can be stratified by MAF bin.
- by.variant
logical. calculate variant-wise concordance, which can be stratified by MAF bin. If by.sample is TRUE, only the sample-wise calculation is done, regardless of the value of by.variant. If both by.sample and by.variant are FALSE, the calculation is done for all samples and variants together in a bin.
- flip
logical. flip the REF and ALT alleles.
- names
character vector. reset samples' names in the test VCF.
- bins
numeric vector. break statistics into allele frequency bins.
- af
file path to allele frequency text file or saved RDS file.
- out
output prefix for saving objects into RDS file
- choose_random_start
logical. choose a random start for stats="pse".
- return_pse_sites
logical. return the phasing switch error sites.
- ...
options passed to vcftable
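The statistics named under stats can be illustrated on toy data in base R. This is a minimal sketch, not the code used by vcfcomp: the genotype vectors are made up, the NRC formula is one commonly used definition, and the F1 calculation assumes a simplified rule where a genotype counts as positive if it carries at least one ALT allele. The package may define these quantities differently.

```r
# Toy data: truth genotypes coded as ALT allele counts (0, 1, 2),
# hard-called test genotypes, and test dosages (DS). All values are made up.
truth   <- c(0, 1, 2, 0, 1, 0, 2, 1)
test_gt <- c(0, 1, 1, 0, 1, 1, 2, 0)
test_ds <- c(0.1, 0.9, 1.4, 0.0, 1.1, 0.6, 1.9, 0.3)

# r2: squared Pearson correlation between test dosages and truth genotypes
r2 <- cor(test_ds, truth)^2

# NRC (one common definition): concordance that ignores genotypes where
# test and truth agree on homozygous reference
mismatch     <- sum(test_gt != truth)
match_nonref <- sum(test_gt == truth & truth > 0)
nrc <- 1 - mismatch / (mismatch + match_nonref)   # 0.5

# F1 (simplifying assumption): positive = carries at least one ALT allele
tp <- sum(truth > 0 & test_gt > 0)    # true positives
fp <- sum(truth == 0 & test_gt > 0)   # false positives
fn <- sum(truth > 0 & test_gt == 0)   # false negatives
f1 <- 2 * tp / (2 * tp + fp + fn)     # 0.8

print(c(r2 = r2, nrc = nrc, f1 = f1))
```

Using dosages for r2 and hard calls for the count-based metrics mirrors the default formats = c("DS", "GT") pairing described above.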
Details
vcfcomp implements various statistics to compare two VCF/BCF files, e.g. reporting genotype concordance and correlation stratified by allele frequency.
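Stratifying a statistic by allele frequency can be sketched in base R as follows. The matrices, allele frequencies, and bin breaks are all hypothetical, and the interval convention (right-closed bins from cut) is an assumption that may not match how vcfcomp assigns variants to bins.

```r
# Hypothetical data: 6 variants (rows) x 4 samples (columns)
truth <- rbind(c(0, 0, 1, 0),
               c(0, 1, 0, 0),
               c(1, 0, 1, 0),
               c(0, 1, 1, 2),
               c(1, 2, 1, 0),
               c(2, 1, 1, 2))
ds <- rbind(c(0.1, 0.0, 0.8, 0.2),
            c(0.0, 0.7, 0.1, 0.1),
            c(0.9, 0.2, 1.1, 0.0),
            c(0.1, 1.2, 0.8, 1.7),
            c(1.0, 1.8, 0.6, 0.3),
            c(1.9, 1.2, 0.9, 1.6))

# One allele frequency per variant, and the bin breaks (both made up)
af   <- c(0.02, 0.04, 0.12, 0.20, 0.30, 0.45)
bins <- c(0, 0.05, 0.25, 0.5)

# Assign each variant to a frequency bin
bin <- cut(af, breaks = bins, include.lowest = TRUE)

# Pool all samples and variants within a bin, then compute r2 per bin
r2_by_bin <- sapply(split(seq_along(af), bin), function(i) {
  cor(as.vector(ds[i, ]), as.vector(truth[i, ]))^2
})
print(r2_by_bin)
```

Pooling within a bin corresponds to the case where both by.sample and by.variant are FALSE; a per-sample variant would instead compute the correlation column by column within each bin.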
Author
Zilong Li zilong.dk@gmail.com
Examples
library('vcfppR')
test <- system.file("extdata", "imputed.gt.vcf.gz", package="vcfppR")
truth <- system.file("extdata", "imputed.gt.vcf.gz", package="vcfppR")
samples <- "HG00133,HG00143,HG00262"
res <- vcfcomp(test, truth, stats="f1", samples=samples, setid=TRUE)
str(res)