
fix: expand() handles NULL geno data in sites-only VCFs (#72) #100

Open
jmg421 wants to merge 1 commit into Bioconductor:devel from jmg421:fix/72-expand-null-geno

jmg421 commented Jul 1, 2026

Summary

Fixes #72.

expand() failed with the error "'data' must be of a vector type, was 'NULL'" when called on sites-only VCFs (e.g., the GATK gnomAD allele frequency resource), which declare FORMAT fields in the header but have no sample columns.

Root Cause

Sites-only VCFs have 0 sample columns, but readVcf() still creates geno entries for the declared FORMAT fields, stored as list-matrices with dimensions nrow × 0. When expand() processes Number=R fields (such as AD), it passes this empty list-matrix to .expandAD(). Inside .expandAD():

  1. is.list(AD) → TRUE (a list-matrix is still a list)
  2. length(AD) → 0 (no elements since 0 columns)
  3. unlist(AD) → NULL
  4. array(NULL, c(idxlen, xcols, 2L)) → ERROR
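The four steps above can be reproduced in base R without VariantAnnotation. This is a minimal sketch that assumes a 3-variant, 0-sample AD field shaped like the list-matrix described above; `idxlen` is a hypothetical stand-in for the row count .expandAD() would use.

```r
# Build a list-matrix with 3 rows and 0 columns, mimicking the AD
# geno entry of a sites-only VCF (assumed shape, per the description).
AD <- list()
dim(AD) <- c(3L, 0L)

idxlen <- 3L          # hypothetical row count for the reshaped array
xcols  <- ncol(AD)    # 0 samples

is.list(AD)           # step 1: TRUE, a list-matrix is still a list
length(AD)            # step 2: 0, no cells when there are 0 columns
vals <- unlist(AD)    # step 3: unlist() of an empty list gives NULL
is.null(vals)

# Step 4: array() rejects NULL data; capture the message instead of failing
msg <- tryCatch(array(vals, c(idxlen, xcols, 2L)),
                error = conditionMessage)
msg
```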

Fix

Added an early-return guard in .expandAD(): when the input list has length 0 or xcols (the number of samples) is 0, it returns an empty integer array of the expected shape, array(integer(0L), c(idxlen, xcols, 2L)), instead of attempting to reshape allele depth data that does not exist.
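The guard can be illustrated with a small stand-alone function. This is a sketch, not the actual .expandAD() body: `expand_ad_guarded` is a hypothetical name, the non-empty reshaping path is deliberately not reproduced, and `idxlen = 6L` is an assumed expanded row count chosen to echo the 3 → 6 expansion in the unit test.

```r
# Hypothetical helper illustrating only the early-return guard.
expand_ad_guarded <- function(AD, idxlen, xcols) {
  # No AD cells or no samples: nothing to reshape, so return an empty
  # integer array that keeps the (rows x samples x 2) shape.
  if (length(AD) == 0L || xcols == 0L)
    return(array(integer(0L), c(idxlen, xcols, 2L)))
  # The real function's handling of non-empty AD is not reproduced here.
  stop("non-empty AD reshaping is outside the scope of this sketch")
}

AD <- list()
dim(AD) <- c(3L, 0L)   # 3 variants, 0 samples
res <- expand_ad_guarded(AD, idxlen = 6L, xcols = ncol(AD))
dim(res)
typeof(res)
```

Returning a zero-length integer array rather than NULL means downstream code that indexes or binds the three-dimensional result still sees consistent dimensions.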

Testing

  • Added inst/unitTests/cases/sites_only.vcf — a minimal sites-only VCF with multi-allelic variants and FORMAT headers but no sample columns.
  • Added test_expand_sitesOnly_issue72 unit test verifying:
    • expand() succeeds on a 0-sample VCF
    • Multi-allelic rows are correctly expanded (3 rows → 6)
    • INFO Number=A fields are correctly flattened
  • All existing expand tests continue to pass.

Agentic tooling disclosure

This PR was produced with the assistance of Kiro CLI (Amazon's AI coding agent). All changes were reviewed and verified by the author.


Sites-only VCFs (e.g., gnomAD allele frequency files) declare FORMAT
fields in the header but have no sample columns. This results in geno
data stored as empty list-matrices (nrow × 0). When expand() attempted
to process Number=R fields (like AD), .expandAD() would call
array(NULL, ...) because unlist() on an empty list returns NULL.

Fix: add an early return in .expandAD() when the input list is empty
or xcols (number of samples) is 0, returning an appropriately-shaped
empty array instead.

Also adds a sites-only test VCF and corresponding unit test.

Closes Bioconductor#72

Development

Successfully merging this pull request may close these issues.

expand Error data is NULL

1 participant