
Compare two VCF/BCF files reporting various statistics

Usage

vcfcomp(
  test,
  truth,
  formats = c("DS", "GT"),
  stats = "r2",
  by.sample = FALSE,
  by.variant = FALSE,
  flip = FALSE,
  names = NULL,
  bins = NULL,
  af = NULL,
  out = NULL,
  choose_random_start = FALSE,
  return_pse_sites = FALSE,
  ...
)

Arguments

test

path to the comparison file (test), which can be a VCF/BCF file, a vcftable object, or a saved RDS file.

truth

path to the baseline file (truth), which can be a VCF/BCF file, a vcftable object, or a saved RDS file.

formats

character vector. The FORMAT tags to extract for the test and the truth, respectively. The default c("DS", "GT") extracts 'DS' from the test and 'GT' from the truth.

stats

the statistics to be calculated. The following are supported:
"r2": the squared Pearson correlation coefficient.
"f1": the F1-score, which balances sensitivity and precision.
"nrc": the Non-Reference Concordance rate.
"pse": the Phasing Switch Error rate.

by.sample

logical. calculate sample-wise concordance, which can be stratified by MAF bin.

by.variant

logical. calculate variant-wise concordance, which can be stratified by MAF bin. If by.sample is TRUE, only the sample-wise calculation is done, regardless of the value of by.variant. If both by.sample and by.variant are FALSE, the calculation is done for all samples and variants together within each bin.

flip

logical. flip the REF and ALT alleles.

names

character vector. reset samples' names in the test VCF.

bins

numeric vector. break statistics into allele frequency bins.

af

file path to allele frequency text file or saved RDS file.

out

output prefix for saving objects into an RDS file.

choose_random_start

choose random start for stats="pse"

return_pse_sites

logical. return phasing switch error sites.

...

options passed to vcftable
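
The statistics listed under stats can be illustrated on toy data in base R. The sketch below is not vcfcomp's implementation: the genotype coding (ALT-allele counts 0/1/2), the rounding of dosages to hard calls, the exclusion of matching hom-ref pairs for NRC, and the choice of "carries an ALT allele" as the positive class for F1 are all assumptions made for illustration, and all values are made up.

```r
# Toy truth genotypes coded as ALT-allele counts (0, 1, 2); made-up values.
truth_gt <- c(0, 1, 2, 0, 1, 0, 2, 1)
# Toy imputed dosages (like a 'DS' tag) for the same sites; made-up values.
test_ds  <- c(0.1, 0.9, 1.8, 0.0, 1.2, 0.3, 2.0, 0.4)
# Hard calls obtained by rounding the dosages (assumption for this sketch).
test_gt  <- round(test_ds)

# r2: squared Pearson correlation between test dosages and truth genotypes.
r2 <- cor(test_ds, truth_gt)^2

# NRC (assumed definition): concordance over genotype pairs, ignoring pairs
# where both test and truth are homozygous reference.
keep <- !(truth_gt == 0 & test_gt == 0)
nrc  <- sum(test_gt[keep] == truth_gt[keep]) / sum(keep)

# F1 (assumed positive class): a genotype carrying at least one ALT allele.
tp <- sum(test_gt > 0 & truth_gt > 0)    # ALT carrier in both
fp <- sum(test_gt > 0 & truth_gt == 0)   # ALT carrier in test only
fn <- sum(test_gt == 0 & truth_gt > 0)   # ALT carrier in truth only
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1 <- 2 * precision * recall / (precision + recall)

print(c(r2 = r2, nrc = nrc, f1 = f1))
```

Excluding matching hom-ref pairs keeps the abundant reference genotypes from dominating the concordance, which is why NRC is usually more informative than raw concordance at rare variants.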

Value

a list of various statistics

Details

vcfcomp implements various statistics to compare two VCF/BCF files, e.g. reporting genotype concordance and correlation stratified by allele frequency.
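
To make the idea of stratification concrete, here is a minimal base R sketch of r2 computed per allele frequency bin and per sample. It only mirrors the concepts behind bins and by.sample; the simulated data, the variable names, and the bin boundaries and interval closure used by cut() are assumptions for illustration and may differ from what vcfcomp does internally.

```r
set.seed(1)
n_var  <- 300   # number of variants (rows)
n_samp <- 10    # number of samples (columns)

# Simulated allele frequencies and truth genotypes (0/1/2 ALT-allele counts).
af    <- runif(n_var, min = 0.02, max = 0.5)
truth <- matrix(rbinom(n_var * n_samp, size = 2, prob = rep(af, n_samp)),
                nrow = n_var)

# Simulated test dosages: truth plus noise, clamped to the range [0, 2].
noise <- matrix(rnorm(n_var * n_samp, sd = 0.3), nrow = n_var)
test  <- pmin(pmax(truth + noise, 0), 2)

# Assign each variant to an allele frequency bin (hypothetical breaks).
bins   <- c(0, 0.05, 0.2, 0.5)
af_bin <- cut(af, breaks = bins, include.lowest = TRUE)

# Pooled r2 per bin: all samples and variants in a bin taken together.
r2_by_bin <- sapply(split(seq_len(n_var), af_bin), function(idx) {
  cor(as.vector(test[idx, ]), as.vector(truth[idx, ]))^2
})

# Sample-wise r2: one value per column, across all variants.
r2_by_sample <- sapply(seq_len(n_samp), function(j) {
  cor(test[, j], truth[, j])^2
})

print(r2_by_bin)
print(r2_by_sample)
```

A variant-wise version would apply the same correlation over rows instead of columns; note that cor() returns NA for a variant whose truth genotypes are identical in every sample.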

Author

Zilong Li zilong.dk@gmail.com

Examples

library('vcfppR')
test <- system.file("extdata", "imputed.gt.vcf.gz", package="vcfppR")
truth <- system.file("extdata", "imputed.gt.vcf.gz", package="vcfppR")
samples <- "HG00133,HG00143,HG00262"
res <- vcfcomp(test, truth, stats="f1", samples=samples, setid=TRUE)
str(res)
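
The example above uses stats="f1". For stats="pse", the notion of a phasing switch error can be sketched on toy data in base R. This is a hedged illustration of a common definition (switches between consecutive heterozygous sites divided by the number of such site pairs), not the code vcfcomp runs; the genotype strings and the helper functions first_allele and is_het are hypothetical.

```r
# Toy phased genotypes for one sample at consecutive sites; made-up values.
truth_ph <- c("0|1", "1|0", "0|1", "1|1", "0|1", "1|0")
test_ph  <- c("0|1", "1|0", "1|0", "1|1", "1|0", "1|0")

# Hypothetical helpers: first allele of "a|b" and a heterozygosity check.
first_allele <- function(x) as.integer(substr(x, 1, 1))
is_het       <- function(x) substr(x, 1, 1) != substr(x, 3, 3)

# Only sites that are heterozygous in both files carry phase information.
het <- is_het(truth_ph) & is_het(test_ph)

# TRUE where the test haplotype order agrees with the truth at a het site.
same <- first_allele(test_ph[het]) == first_allele(truth_ph[het])

# A switch occurs whenever the agreement changes between consecutive het sites.
switches <- sum(diff(same) != 0)
pse <- switches / (sum(het) - 1)

print(c(switches = switches, pse = pse))
```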