feat: add Pinecone skill for scientific RAG persistence layer #173
Open
immuhammadfurqan wants to merge 3 commits into
Adds a Pinecone skill positioned as the retrieval persistence layer complementing the existing database-lookup, paper-lookup, and biopython skills. Includes:
- Serverless and pod-based index management
- Namespace strategies for organism/study isolation
- MongoDB-style metadata filtering
- Hybrid dense + sparse BM25 search
- Multimodal retrieval with voyage-multimodal-3
- Working scripts for PubMed and radiology indexing
- Three reference docs (index types, embedding models, hybrid search)

Tested with pinecone>=6.0.0.
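For reviewers unfamiliar with hybrid dense + sparse search, the weighting idea can be sketched in a few lines. This is a minimal illustration and not code taken from this PR: the function name `blend_hybrid`, the `alpha` parameter, and the `{"indices": ..., "values": ...}` sparse layout are assumptions chosen for the example. It shows one common convex-combination approach in which `alpha = 1.0` means pure dense (semantic) search and `alpha = 0.0` means pure sparse (keyword) search.

```python
from typing import Dict, List, Tuple

# Assumed sparse layout for this sketch: parallel lists of token indices and weights.
SparseVector = Dict[str, list]


def blend_hybrid(
    dense: List[float], sparse: SparseVector, alpha: float
) -> Tuple[List[float], SparseVector]:
    """Scale a dense and a sparse query vector by a convex weight.

    alpha weights the dense vector; (1 - alpha) weights the sparse values.
    The inputs are not modified.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [value * alpha for value in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [value * (1.0 - alpha) for value in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


if __name__ == "__main__":
    # Hypothetical toy vectors; real ones would come from an embedding model
    # and a BM25 encoder.
    dense_q = [1.0, 2.0]
    sparse_q = {"indices": [3, 7], "values": [4.0, 8.0]}
    print(blend_hybrid(dense_q, sparse_q, alpha=0.25))
```

The scaled pair would then be passed to whatever query call the skill uses. Scaling the query vectors, rather than re-ranking results afterwards, keeps the blend inside a single query.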
Summary
Adds a Pinecone skill positioned as the retrieval persistence layer that complements the existing database-lookup, paper-lookup, and biopython skills. While those skills handle direct API access to public databases, Pinecone enables persistent semantic retrieval over embedded scientific data. This is necessary when repeated API queries are too slow, when working with proprietary research data, or when building multimodal RAG pipelines.
What's Included
- SKILL.md (403 lines) covering index management, batch upserting, namespaces, metadata filtering, hybrid search, and multimodal retrieval
- references/index_types.md: serverless vs pod tradeoffs, dimension selection table for 12 scientific embedding models
- references/scientific_embedding_models.md: domain-to-model mapping with usage patterns
- references/hybrid_search.md: dense + sparse BM25 setup with alpha-blending tuning guide
- scripts/index_pubmed.py: working PubMed abstract ingestion pipeline
- scripts/multimodal_radiology.py: working multimodal (text + image) ingestion pipeline
Scientific Workflows Covered
Integration Points
Explicitly cross-references and integrates with:
paper-lookup, database-lookup, biopython, pysam, scanpy, scientific-writing, pyhealth.
Scanner Results
Validation
Checklist