fix: expand() handles NULL geno data in sites-only VCFs (#72) #100
Open
jmg421 wants to merge 1 commit into
Sites-only VCFs (e.g., gnomAD allele frequency files) declare FORMAT fields in the header but have no sample columns. As a result, geno data is stored as empty list-matrices (nrow × 0). When expand() attempted to process Number=R fields (like AD), .expandAD() would call array(NULL, ...) because unlist() on an empty list returns NULL.

Fix: add an early return in .expandAD() when the input list is empty or xcols (number of samples) is 0, returning an appropriately shaped empty array instead. Also adds a sites-only test VCF and a corresponding unit test.

Closes Bioconductor#72
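The failure and the guard can be illustrated in base R alone, without loading VariantAnnotation. This is a minimal sketch under stated assumptions: `AD` is a hand-built stand-in for the empty geno slot, `idxlen` is an arbitrary value, and `ad_guard_sketch()` is a hypothetical helper that mirrors only the early-return condition described here, not the package's actual `.expandAD()` body.

```r
# Base-R sketch only; nothing here calls VariantAnnotation.
# Stand-in for the AD geno slot of a sites-only VCF: a list-matrix
# with 3 rows (variants) and 0 columns (samples).
AD <- structure(list(), dim = c(3L, 0L))
idxlen <- 5L        # hypothetical number of expanded rows
xcols  <- ncol(AD)  # number of samples, 0 here

# The three observations behind the failure.
stopifnot(is.list(AD), length(AD) == 0L, is.null(unlist(AD)))

# Unguarded path: array() cannot build an array from NULL data,
# so the call raises an error, which is captured here.
unguarded <- tryCatch(
    array(unlist(AD), c(idxlen, xcols, 2L)),
    error = function(e) e
)

# Hypothetical guard: return a correctly shaped empty integer array
# when there is nothing to process.
ad_guard_sketch <- function(AD, idxlen, xcols) {
    if (length(AD) == 0L || xcols == 0L)
        return(array(integer(0L), c(idxlen, xcols, 2L)))
    NULL  # non-empty input: the real processing would continue here
}

guarded <- ad_guard_sketch(AD, idxlen, xcols)
```

Returning an array with a zero-length sample dimension, rather than `NULL`, keeps the result shaped like the non-empty case, so downstream code that inspects `dim()` does not need a special case.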
Summary
Fixes #72.
`expand()` errored with `'data' must be of a vector type, was 'NULL'` when called on sites-only VCFs (e.g., the GATK gnomAD allele frequency resource) that declare FORMAT fields in the header but have no sample columns.

Root Cause
Sites-only VCFs have 0 sample columns, but `readVcf()` still creates geno slots as list-matrices with dimensions (nrow × 0). When `expand()` processes `Number=R` fields (like AD), it calls `.expandAD()` with this empty list-matrix. Inside `.expandAD()`:
- `is.list(AD)` → TRUE (a list-matrix is still a list)
- `length(AD)` → 0 (no elements, since there are 0 columns)
- `unlist(AD)` → NULL
- `array(NULL, c(idxlen, xcols, 2L))` → ERROR

Fix
Added an early-return guard in `.expandAD()`: when the input list has length 0 or `xcols` (number of samples) is 0, return an appropriately shaped empty `array(integer(0L), c(idxlen, xcols, 2L))` instead of attempting to process non-existent allele depth data.

Testing
- `inst/unitTests/cases/sites_only.vcf`: a minimal sites-only VCF with multi-allelic variants and FORMAT headers but no sample columns.
- `test_expand_sitesOnly_issue72`: unit test verifying:
  - `expand()` succeeds on a 0-sample VCF

Agentic tooling disclosure
This PR was produced with the assistance of Kiro CLI (Amazon's AI coding agent). All changes were reviewed and verified by the author.