fix: expand() converts Number=A to Number=1 in headers (#79) #99

Open
jmg421 wants to merge 1 commit into Bioconductor:devel from jmg421:fix/79-expand-number-A

jmg421 commented Jul 1, 2026

Summary

Fixes #79.

After expand(), each row contains exactly one ALT allele, so per-ALT fields (Number=A) hold a single value per row. The INFO and FORMAT header metadata should reflect this by changing Number=A to Number=1.

Problem

As reported in #79, a round-trip through writeVcf() + readVcf() + expand() produced a CompressedNumericList for fields like AF instead of a plain numeric vector. This happened because the header still declared Number=A, causing readVcf() to parse the scalar values back as list columns.

Fix

  • Extracted a clean .updateHeaderNumberA() helper that updates both INFO and FORMAT headers using exact == "A" matching (not grep).
  • Applied to both code paths: the multi-allele expansion path and the biallelic early-return path (which was previously missed).
  • Removed the old inline grep("A", num) approach which was too broad (would match any Number string containing 'A').
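
To illustrate why exact matching matters, here is a minimal base-R sketch. It is not the code in this PR: `hdr` is a toy stand-in for a header table, and `update_number_a()` is a hypothetical name used only for illustration. It assumes the header exposes a character `Number` column.

```r
# Toy stand-in for the 'Number' column of a VCF header table.
# Field names and values are illustrative only.
hdr <- data.frame(
  Number = c("A", "1", "R", "G", "."),
  Type = c("Float", "Integer", "Integer", "Float", "String"),
  row.names = c("AF", "DP", "AD", "GL", "ANN"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rewrite only fields declared exactly as Number=A.
update_number_a <- function(h) {
  h$Number[h$Number == "A"] <- "1"
  h
}

updated <- update_number_a(hdr)

# grep() does a substring search, so any value that merely contains the
# letter 'A' is selected as well; '==' selects only exact matches.
candidates <- c("A", "1", "NA")
grep_hits <- grep("A", candidates)
exact_hits <- which(candidates == "A")
```

In this sketch only the `AF` row changes, while `R`, `G`, `.` and integer counts are left alone. The `candidates` vector shows the difference between the two approaches: `grep()` also picks up the string `"NA"`, whereas `==` does not.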

Testing

  • Added test_expand_numberA_to_1_issue79 covering:
    • Multi-allele VCF (main expansion path)
    • Biallelic VCF (early-return path)
    • Round-trip: expand → writeVcf → readVcf → expand keeps AF as numeric
  • All existing expand tests continue to pass.

Agentic tooling disclosure

This PR was produced with the assistance of Kiro CLI (Amazon's AI coding agent) and Jarvis CLI (a custom agentic tool). All changes were reviewed and verified by the author.

…ioconductor#79)

After expand(), each row has exactly one ALT allele, so per-ALT fields
(Number=A) are now scalar. Update both INFO and FORMAT header metadata
to reflect this.

Previously, the fix was only partially applied (grep-based, multi-allele
path only). This refactors into a clean .updateHeaderNumberA() helper
that handles both code paths and uses exact == 'A' matching.

Fixes the round-trip discrepancy where writeVcf() + readVcf() + expand()
would produce CompressedNumericList for AF instead of plain numeric.

Closes Bioconductor#79
Development

Successfully merging this pull request may close these issues.

Convert Number=A to Number=1 when creating an ExpandedVCF