
fix: writeVcf preserves fileDate and suppresses bare contig lines (#78) #101

Open
jmg421 wants to merge 1 commit into Bioconductor:devel from jmg421:fix/78-writeVcf-roundtrip


@jmg421 jmg421 commented Jul 1, 2026


Summary

Fixes #78.

Achieves faithful writeVcf() + readVcf() round-trips by addressing both sources of metadata discrepancy reported in the issue.

Changes

1. Preserve existing fileDate

Previously, writeVcf() always replaced the fileDate header with a value derived from Sys.time(). It now preserves any existing fileDate value and adds a new one only when none exists. This retains provenance information (e.g., the original VCF creation date).
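
As a rough illustration of the rule described above, here is a minimal base-R sketch. It is not the code in this PR: `resolve_file_date` is a hypothetical helper, the metadata is modelled as a plain named list rather than a VCFHeader, and the `%Y%m%d` fallback format is an assumption.

```r
# Hypothetical helper illustrating the "preserve if present" rule.
# `meta` is assumed to be a plain named list of header values.
resolve_file_date <- function(meta, now = Sys.time()) {
  existing <- meta[["fileDate"]]
  # Keep a usable existing value exactly as it was read.
  if (!is.null(existing) && length(existing) == 1L &&
      !is.na(existing) && nzchar(existing)) {
    return(existing)
  }
  # Otherwise stamp the file with the current date (assumed format).
  format(now, "%Y%m%d")
}

resolve_file_date(list(fileDate = "20090805"))   # existing value is kept
resolve_file_date(list(fileformat = "VCFv4.1"))  # no fileDate: uses today's date
```

Passing `now` as an argument keeps the fallback branch deterministic in tests.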

2. Suppress uninformative contig lines

When seqinfo() contains only NA seqlengths and NA genomes (common for structural variant VCFs), writeVcf() no longer emits bare ##contig=<ID=X> lines. These lines carry no useful information and were the primary cause of round-trip mismatches: on re-read, they added a contig slot to the header that did not exist in the original.
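
The suppression condition can be sketched in base R as follows. This is only an illustration under stated assumptions: `contig_header_lines` is a hypothetical function that takes plain vectors rather than a Seqinfo object, and the exact field layout (`length=`, unquoted `assembly=`) may differ from what writeVcf() actually writes.

```r
# Hypothetical sketch: build ##contig lines only when at least one contig
# carries a length or a genome; otherwise emit nothing.
contig_header_lines <- function(seqnames, seqlengths, genome) {
  informative <- !is.na(seqlengths) | !is.na(genome)
  if (!any(informative)) {
    return(character(0))  # all-NA seqinfo: suppress bare ##contig=<ID=X> lines
  }
  fields <- paste0("ID=", seqnames)
  # Append each attribute only where it is known.
  fields <- ifelse(is.na(seqlengths), fields,
                   paste0(fields, ",length=", seqlengths))
  fields <- ifelse(is.na(genome), fields,
                   paste0(fields, ",assembly=", genome))
  paste0("##contig=<", fields, ">")
}

# All-NA case, as with structural.vcf: no lines are produced.
contig_header_lines(c("1", "2"), c(NA_integer_, NA_integer_),
                    c(NA_character_, NA_character_))

# Informative case: a full contig line is produced.
contig_header_lines("20", 62435964L, "B36")
```

In this sketch, a mix of informative and uninformative contigs still emits a line for every contig. Only the all-NA case is suppressed, which matches the condition stated above.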

Result

library(VariantAnnotation)

fl <- system.file("extdata", "structural.vcf", package="VariantAnnotation")
out <- tempfile(fileext = ".vcf")  # destination for the rewritten VCF
first <- readVcf(fl)
writeVcf(first, out)
roundtrip <- readVcf(out)
all.equal(roundtrip, first)  # TRUE

Testing

  • Added test_writeVcf_roundtrip_issue78 verifying perfect round-trips for both structural.vcf (all-NA seqinfo) and ex2.vcf (real seqinfo with contig lines).
  • Confirms fileDate is preserved verbatim.
  • All existing writeVcf tests continue to pass.

Agentic tooling disclosure

This PR was produced with the assistance of Kiro CLI (Amazon's AI coding agent). All changes were reviewed and verified by the author.

…oconductor#78)

Two changes to improve writeVcf + readVcf round-trip fidelity:

1. Preserve existing fileDate: writeVcf no longer unconditionally
   replaces the fileDate header with the current date. If a fileDate
   exists in the VCF metadata, it is written as-is. A new fileDate is
   only added when none exists.

2. Suppress uninformative contig lines: when seqinfo has all-NA
   seqlengths and all-NA genomes (e.g., structural variants with
   unknown assembly), writeVcf no longer emits bare ##contig=<ID=X>
   lines that carry no useful information and break round-trip equality.

Together these fixes allow:
  all.equal(readVcf(writeVcf(x, out)), x) == TRUE
for all package example VCFs (structural.vcf, ex2.vcf).

Closes Bioconductor#78


Development

Successfully merging this pull request may close these issues.

Faithfully recapitulate metadata via a round-trip through writeVcf + readVcf.
