fix: strand-aware PRECEDEID/FOLLOWID for intergenic variants (#55) #102

Open
jmg421 wants to merge 1 commit into Bioconductor:devel from jmg421:fix/55-intergenic-strand

jmg421 commented Jul 1, 2026

Summary

Fixes #55.

locateVariants() with IntergenicVariants now correctly classifies flanking genes into PRECEDEID and FOLLOWID based on gene strand, matching Ensembl VEP's upstream_gene_variant / downstream_gene_variant semantics.

Problem

As reported in #55, PRECEDEID and FOLLOWID were assigned based purely on genomic position (right/left of variant) without considering the gene's transcriptional orientation. This meant:

  • A '-' strand gene to the RIGHT of the variant was placed in PRECEDEID, but the variant is actually downstream of that gene (should be FOLLOWID)
  • A '-' strand gene to the LEFT of the variant was placed in FOLLOWID, but the variant is actually upstream of that gene (should be PRECEDEID)
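
To make the two cases concrete, here is a minimal Python sketch (an illustration only, not the package's R code) of how the relation between an intergenic variant and a flanking gene depends on the gene's strand. The helper name `variant_relation` and all coordinates are hypothetical, and treating '*' like '+' is an assumption.

```python
def variant_relation(variant_pos, gene_start, gene_end, strand):
    """Say whether an intergenic variant is 'upstream' or 'downstream' of a gene.

    Hypothetical helper for illustration; coordinates are 1-based and inclusive.
    """
    if gene_start <= variant_pos <= gene_end:
        raise ValueError("variant overlaps the gene, so it is not intergenic")
    gene_is_right = gene_start > variant_pos
    if strand == "-":
        # Transcription runs right to left: a gene on the right shows its
        # 3' end to the variant, a gene on the left shows its 5' end.
        return "downstream" if gene_is_right else "upstream"
    # '+' strand (and, by assumption, '*'): transcription runs left to right.
    return "upstream" if gene_is_right else "downstream"


variant = 1000
print(variant_relation(variant, 1500, 2000, "-"))  # downstream -> FOLLOWID
print(variant_relation(variant, 100, 500, "-"))    # upstream   -> PRECEDEID
print(variant_relation(variant, 1500, 2000, "+"))  # upstream   -> PRECEDEID
print(variant_relation(variant, 100, 500, "+"))    # downstream -> FOLLOWID
```

A position-only rule gives the same answer as this function for '+' strand genes and the opposite answer for '-' strand genes, which is the misclassification described above.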

The reporter confirmed this by comparing with Ensembl VEP output, which correctly accounts for strand.

Fix

After finding genes in the upstream/downstream positional windows (to the left and right of the variant, respectively), partition them by strand:

  • PRECEDEID (variant is upstream of gene): '+' strand genes in the downstream window plus '-' strand genes in the upstream window
  • FOLLOWID (variant is downstream of gene): '+' strand genes in the upstream window plus '-' strand genes in the downstream window

For annotations in which every gene is on the '+' or '*' strand, the behavior is identical to before (backward compatible). The initial overlap search uses ignore.strand=TRUE so that genes on both strands are captured.
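
The partition rule can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the R code in this PR: the function name `partition_by_strand`, the gene IDs, and the `(gene_id, strand)` tuple layout are hypothetical, the "upstream window" is taken to be the left of the variant and the "downstream window" the right, and '*' is assumed to be handled like '+'.

```python
def partition_by_strand(upstream_window, downstream_window):
    """Split flanking genes into (precede_ids, follow_ids) by strand.

    Each window is a list of (gene_id, strand) pairs collected with strand
    ignored, so genes on both strands are present. Hypothetical layout.
    """
    # Variant is upstream of: non-minus genes on the right, minus genes on the left.
    precede_ids = [g for g, s in downstream_window if s != "-"]
    precede_ids += [g for g, s in upstream_window if s == "-"]
    # Variant is downstream of: non-minus genes on the left, minus genes on the right.
    follow_ids = [g for g, s in upstream_window if s != "-"]
    follow_ids += [g for g, s in downstream_window if s == "-"]
    return precede_ids, follow_ids


# Synthetic 2 strands x 2 positions layout (hypothetical IDs).
left = [("left_plus", "+"), ("left_minus", "-")]
right = [("right_plus", "+"), ("right_minus", "-")]
precede, follow = partition_by_strand(left, right)
print(precede)  # ['right_plus', 'left_minus']
print(follow)   # ['left_plus', 'right_minus']

# With only '+' or '*' genes the result equals the position-only assignment.
precede_old, follow_old = partition_by_strand([("a", "+")], [("b", "*")])
print(precede_old, follow_old)  # ['b'] ['a']
```

The last call shows the backward-compatible case: when no gene is on the '-' strand, every gene in the right-hand window lands in PRECEDEID and every gene in the left-hand window lands in FOLLOWID, exactly as under the position-only rule.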

Testing

Added test_locateVariants_intergenic_strand_issue55 with a synthetic 4-gene scenario (2 strands × 2 positions) verifying correct strand-aware classification.

Agentic tooling disclosure

This PR was produced with the assistance of Kiro CLI (Amazon's AI coding agent). All changes were reviewed and verified by the author.

…ants (Bioconductor#55)

Previously, locateVariants() classified flanking genes into PRECEDEID
and FOLLOWID based solely on genomic position (left/right of variant)
without considering gene strand. This meant:
- A '-' strand gene to the RIGHT of the variant was incorrectly placed
  in PRECEDEID (should be FOLLOWID, since the variant is downstream of
  that gene's TSS)
- A '-' strand gene to the LEFT of the variant was incorrectly placed
  in FOLLOWID (should be PRECEDEID, since the variant is upstream of
  that gene's TSS)

Fix: after finding genes in the upstream/downstream positional windows,
partition them by strand and assign to the correct ID list:
- PRECEDEID: '+' genes in downstream window + '-' genes in upstream window
- FOLLOWID: '+' genes in upstream window + '-' genes in downstream window

This matches Ensembl VEP's upstream_gene_variant / downstream_gene_variant
classification. For genomes with all '+' or '*' strand annotations, the
behavior is unchanged (backward compatible).

Closes Bioconductor#55

Development

Successfully merging this pull request may close these issues.

locateVariants returns genes from both Forward and Reverse strands in PRECEDEID and FOLLOWID