fix: strand-aware PRECEDEID/FOLLOWID for intergenic variants (#55) #102
Open
jmg421 wants to merge 1 commit into
…ants (Bioconductor#55)

Previously, locateVariants() classified flanking genes into PRECEDEID and FOLLOWID based solely on genomic position (left/right of variant) without considering gene strand. This meant:
- A '-' strand gene to the RIGHT of the variant was incorrectly placed in PRECEDEID (should be FOLLOWID, since the variant is downstream of that gene's TSS)
- A '-' strand gene to the LEFT of the variant was incorrectly placed in FOLLOWID (should be PRECEDEID, since the variant is upstream of that gene's TSS)

Fix: after finding genes in the upstream/downstream positional windows, partition them by strand and assign to the correct ID list:
- PRECEDEID: '+' genes in downstream window + '-' genes in upstream window
- FOLLOWID: '+' genes in upstream window + '-' genes in downstream window

This matches Ensembl VEP's upstream_gene_variant / downstream_gene_variant classification. For genomes with all '+' or '*' strand annotations, the behavior is unchanged (backward compatible).

Closes Bioconductor#55
Summary
Fixes #55.
locateVariants() with IntergenicVariants now correctly classifies flanking genes into PRECEDEID and FOLLOWID based on gene strand, matching Ensembl VEP's upstream_gene_variant / downstream_gene_variant semantics.
Problem
As reported in #55, PRECEDEID and FOLLOWID were assigned based purely on genomic position (right/left of variant) without considering the gene's transcriptional orientation. This meant:
- A '-' strand gene to the RIGHT of the variant was placed in PRECEDEID, but the variant is actually downstream of that gene (should be FOLLOWID)
- A '-' strand gene to the LEFT of the variant was placed in FOLLOWID, but the variant is actually upstream of that gene (should be PRECEDEID)
The reporter confirmed this by comparing with Ensembl VEP output, which correctly accounts for strand.
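The discrepancy can be illustrated with a minimal Python sketch. This is not the package's R code: the function names and the coordinates in the usage note are hypothetical, and '*' is treated like '+' on the assumption that unstranded genes keep their previous classification.

```python
def gene_is_right_of_variant(variant_pos, gene_start, gene_end):
    """Return True if the gene lies at higher coordinates than the variant."""
    if gene_start <= variant_pos <= gene_end:
        raise ValueError("variant overlaps the gene; not intergenic")
    return gene_start > variant_pos


def position_only_label(variant_pos, gene_start, gene_end):
    """Position-only rule described above: strand is ignored."""
    if gene_is_right_of_variant(variant_pos, gene_start, gene_end):
        return "PRECEDEID"
    return "FOLLOWID"


def strand_aware_label(variant_pos, gene_start, gene_end, strand):
    """Strand-aware rule: the label is flipped for '-' strand genes."""
    right = gene_is_right_of_variant(variant_pos, gene_start, gene_end)
    if strand == "-":
        # A '-' gene is transcribed toward lower coordinates, so a variant
        # to its left is downstream of it and a variant to its right is upstream.
        return "FOLLOWID" if right else "PRECEDEID"
    # '+' (and, by assumption, '*') keeps the positional classification.
    return "PRECEDEID" if right else "FOLLOWID"
```

For a '-' strand gene spanning 2000-3000 and a variant at position 1000, `position_only_label` returns PRECEDEID while `strand_aware_label` returns FOLLOWID, which is the mismatch described in the first bullet.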
Fix
After finding genes in the upstream/downstream positional windows, partition them by strand:
- PRECEDEID: '+' strand genes in downstream window + '-' strand genes in upstream window
- FOLLOWID: '+' strand genes in upstream window + '-' strand genes in downstream window
For genomes with all '+' or '*' strand annotations, the behavior is identical to before (backward compatible). The fix uses ignore.strand=TRUE for the initial overlap finding to ensure genes on both strands are captured.
Testing
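As a rough illustration of what a 2 strands × 2 positions scenario exercises, the partition rule from the Fix section can be sketched in Python. This is a sketch under stated assumptions, not the R implementation in this PR: the function name and gene IDs are hypothetical, the upstream window is assumed to mean genes to the left of the variant and the downstream window genes to its right, and '*' is assumed to behave like '+'.

```python
def partition_flanking_genes(upstream_window, downstream_window):
    """Split window hits into (PRECEDEID, FOLLOWID) lists by strand.

    Each window is a list of (gene_id, strand) pairs that were found
    without regard to strand.
    """
    precede_ids, follow_ids = [], []
    for gene_id, strand in downstream_window:
        # Gene to the right of the variant: '+' is preceded by the variant,
        # '-' is followed by it.
        (follow_ids if strand == "-" else precede_ids).append(gene_id)
    for gene_id, strand in upstream_window:
        # Gene to the left of the variant: the assignment is mirrored.
        (precede_ids if strand == "-" else follow_ids).append(gene_id)
    return precede_ids, follow_ids


# Hypothetical 4-gene layout: one gene per strand on each side of the variant.
upstream = [("geneA", "+"), ("geneB", "-")]
downstream = [("geneC", "+"), ("geneD", "-")]
precede_ids, follow_ids = partition_flanking_genes(upstream, downstream)
```

With this layout, geneB and geneC end up in the PRECEDEID list and geneA and geneD in the FOLLOWID list. If every gene is '+' or '*', the result reduces to the purely positional split.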
Added test_locateVariants_intergenic_strand_issue55 with a synthetic 4-gene scenario (2 strands × 2 positions) verifying correct strand-aware classification.
Agentic tooling disclosure
This PR was produced with the assistance of Kiro CLI (Amazon's AI coding agent). All changes were reviewed and verified by the author.