HR Buddy is an application which is inspired by HR Chatbot portals. Uses combination of Llama (LLM) + Nomic (Embedding) models. Uses Retrieval Augumented Generation for identify the context strictly from the HR Policies PDF.
flowchart TD
UI["Streamlit UI"]
SESS["Session State<br/>(history, session_id)"]
MDB[("MongoDB<br/>Auth & Chat History")]
subgraph RAG [RAG Engine]
direction LR
subgraph Hybrid [Hybrid Retrieval]
SEM["Semantic Search<br/>ChromaDB + OllamaEmbeddings<br/>(MMR, fetch_k=18)"]
BM25["Keyword Search<br/>BM25 (heading-enriched<br/>chunks, top_k=6)"]
FUSION["Weighted Fusion<br/>(semantic=0.7, bm25=0.3)"]
SEM --> FUSION
BM25 --> FUSION
end
CTX["Context Assembly<br/>(top_k=6 documents)"]
LLM["Ollama Llama 3.2<br/>3B params<br/>(temp=0.1, ctx=4096)"]
Hybrid --> CTX --> LLM
end
UI --> SESS
SESS --> MDB
UI -->|"user input + history"| RAG
LLM -->|"response stream"| UI
Before running the application, ensure your system has the following:
- Docker & Docker Compose installed.
- Hardware: Minimum 8GB RAM (16GB+ recommended) to run the Llama 3.2 model smoothly.
- OS: Linux or macOS (Windows users should use WSL2).
If you are on MacOS / Linux, simply make the shell script executable
chmod +x run.sh
Then, just run the shell script.
./run.sh
Note: If you have any other shell instead of bash, open the first line of run.sh and replace the first line with the shell of your choice.
This script will handle all the setup of Ollama Package and the model and as well as builds the docker container.
- Frontend: Streamlit
- AI/LLM: Ollama (Llama 3.2 3B)
- Embeddings: Nomic Embed Text
- Vector Store: ChromaDB
- Retrieval: Hybrid search (semantic + BM25 keyword)
- Security: Prompt injection defense (system message isolation, input sanitization)
- Database: MongoDB (for user authentication and chat history)
- Orchestration: Langchain & Docker
The app uses hybrid search combining semantic (vector) and keyword (BM25) retrieval:
| Parameter | Default | Description |
|---|---|---|
enabled |
true |
Toggle hybrid search on/off |
semantic_weight |
0.7 |
Weight for semantic (embedding) similarity scores |
bm25_weight |
0.3 |
Weight for BM25 keyword matching scores |
top_k |
6 |
Number of final documents returned to the LLM |
fetch_k |
18 |
Documents fetched per retriever before fusion |
Disable hybrid search ("enabled": false) to fall back to semantic-only retrieval.
The RAG engine implements defense-in-depth against prompt injection via three layers:
1. System Message Isolation
Instructions and retrieved context are sent as a role: system message (authoritative in Llama's chat template), separate from the user message. This prevents user input from overriding behavior rules.
2. Input Sanitization Known injection patterns are stripped from all user-facing text before it reaches the LLM:
- Bracket-based overrides (
[SYSTEM UPDATE],[OVERRIDE],[TASK]) - "Ignore previous instructions" variants
- Mode-switching attacks ("you are now in developer mode")
- Prompt extraction attempts ("output your system prompt")
- Session ID is sanitized to alphanumeric characters only
3. Structured History
Conversation history is reconstructed as real user / assistant message pairs rather than flat text, preventing re-injection of successful attacks across conversation turns.
By default, the application uses the provided 2016 HR Manual. To use your own data:
- Delete the existing PDF in the
rag_source/directory. - Place your company's HR policy PDF into
rag_source/. - Update the
PDF_PATHvariable inmain.pyif the filename changes. - Restart the containers to trigger a fresh vector embedding.
Ollama Connection Refused inside Docker: If the Streamlit app cannot reach Ollama, you need to configure Ollama to listen to the Docker bridge network.
- Run
sudo systemctl edit ollama.service - Add the following under the
[Service]block:Environment="OLLAMA_HOST=0.0.0.0" - Save, then run
sudo systemctl daemon-reloadandsudo systemctl restart ollama.
ToastCoder * GitHub: @ToastCoder