PDF Analyzer with Local Ollama

A tool for analyzing PDF documents and answering questions about them using Retrieval-Augmented Generation (RAG) with your local Ollama installation. Answers are built only from text retrieved from the PDF you provide; no external knowledge source is queried.

Features

PDF Text Extraction: Uses pdfplumber for robust text extraction, including Arabic text
Semantic Search: Creates embeddings and uses FAISS for fast similarity search
Local AI: Uses your local Ollama installation for answering questions
Multi-language Support: Works with Arabic, English, and other languages
Two Interfaces: Both web UI (Streamlit) and command-line interface
Source Citation: Shows which parts of the PDF were used to generate answers

Installation

Install Python dependencies:

pip install -r requirements.txt

Install and set up Ollama:
- Download Ollama from https://ollama.ai
- Install and start the Ollama service
- Pull a model (e.g., ollama pull llama3.2)

Verify Ollama is running:

ollama list

Usage

Web Interface (Recommended)

Start the Streamlit app:

streamlit run pdf_analyzer.py

Open your browser to http://localhost:8501
Upload your PDF and click "Process PDF"
Ask questions about the PDF content

Command Line Interface

Run the CLI:

python cli.py "path/to/your/pdf/file.pdf"

Ask questions interactively

Advanced Options

python cli.py "document.pdf" --model llama3.2 --chunk-size 500 --overlap 100 --top-k 5
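For readers who want to see how these flags fit together, here is a minimal argparse sketch. It is not the project's actual cli.py: the names build_parser and parse_args, the default values, and the overlap check are assumptions made for illustration. Only the flag names and the positional PDF path come from the command above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the command above."""
    parser = argparse.ArgumentParser(description="Ask questions about a PDF")
    parser.add_argument("pdf_path", help="path to the PDF file")
    # Defaults below are assumed, not taken from the project.
    parser.add_argument("--model", default="llama3.2", help="Ollama model name")
    parser.add_argument("--chunk-size", type=int, default=500, help="size of each text chunk")
    parser.add_argument("--overlap", type=int, default=100, help="overlap between neighbouring chunks")
    parser.add_argument("--top-k", type=int, default=5, help="number of chunks to retrieve")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse arguments and reject an overlap that would stall chunking."""
    args = build_parser().parse_args(argv)
    if not 0 <= args.overlap < args.chunk_size:
        raise SystemExit("--overlap must be >= 0 and smaller than --chunk-size")
    return args


# Example: parse an explicit argument list instead of sys.argv.
example = parse_args(["document.pdf", "--model", "llama3.2", "--top-k", "5"])
```

argparse converts the dashes in flag names to underscores, so --chunk-size is read back as args.chunk_size and --top-k as args.top_k.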

Configuration

Edit config.py to customize:

Ollama model name
Chunk size and overlap
Embedding model
Number of relevant chunks to retrieve
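The four settings listed above could be grouped as in the sketch below. This is only an illustration of the idea, not the contents of the real config.py: the Settings class, its field names, and every default value are assumptions, and the embedding model name is a placeholder for whichever SentenceTransformers model you choose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the four settings listed above."""
    ollama_model: str = "llama3.2"          # assumed default
    chunk_size: int = 500                   # assumed default
    chunk_overlap: int = 100                # assumed default
    embedding_model: str = "paraphrase-multilingual-MiniLM-L12-v2"  # placeholder choice
    top_k: int = 5                          # assumed default

    def __post_init__(self) -> None:
        # Fail early on values that would break chunking or retrieval.
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")
        if self.top_k <= 0:
            raise ValueError("top_k must be positive")


SETTINGS = Settings()
```

Validating in one place means a bad combination, such as an overlap as large as the chunk size, is reported at startup rather than in the middle of processing a PDF.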

How It Works

Text Extraction: Extracts text from PDF using pdfplumber
Text Chunking: Splits text into overlapping chunks for better context
Embedding Creation: Creates vector embeddings using SentenceTransformers
Vector Storage: Stores embeddings in FAISS index for fast similarity search
Question Processing:
- Converts question to embedding
- Finds most similar text chunks
- Sends relevant context to Ollama
- Returns AI-generated answer based only on PDF content
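The steps above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the project's code: word-count vectors stand in for SentenceTransformers embeddings, a linear scan with cosine similarity stands in for the FAISS index, chunking is done by characters, and all function names and the prompt wording are hypothetical.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into character windows; neighbours share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_chunks(question: str, chunks: list[str], k: int = 5) -> list[str]:
    """Rank chunks by similarity to the question (linear scan instead of FAISS)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a prompt that asks the model to answer from the context only."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In the real pipeline the resulting prompt would be sent to the local Ollama server, and the retrieved chunks would also be shown to the user as the cited sources.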

Supported Models

Any Ollama model can be used. Popular choices:

llama3.2 (recommended for general use)
mistral
codellama (for code-related documents)
qwen2.5 (good for multilingual content)

Pull the model before first use: ollama pull <model-name>

Troubleshooting

Ollama Connection Issues

Ensure Ollama is running; if it is not, start it with: ollama serve
Check that the model is available: ollama list
Verify that the model name in the configuration matches a name shown by ollama list
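The same checks can be scripted against Ollama's HTTP API, which lists installed models at /api/tags. The sketch below assumes the server listens on the default address http://localhost:11434; the helper names are hypothetical and are not part of this project.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default address; adjust if yours differs


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0):
    """Return the parsed /api/tags response, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None


def installed_models(tags_payload: dict) -> list[str]:
    """Extract model names such as 'llama3.2:latest' from a /api/tags payload."""
    return [m["name"] for m in tags_payload.get("models", [])]


def model_available(wanted: str, names: list[str]) -> bool:
    """A name without a tag is treated as '<name>:latest', as Ollama does."""
    target = wanted if ":" in wanted else f"{wanted}:latest"
    return target in names


def diagnose(model: str) -> str:
    """Return a short, human-readable status for the configured model."""
    payload = fetch_tags()
    if payload is None:
        return "Ollama is not reachable; try starting it with: ollama serve"
    if not model_available(model, installed_models(payload)):
        return f"Model '{model}' is not installed; try: ollama pull {model}"
    return f"Model '{model}' is ready"
```

Calling diagnose("llama3.2") separates the two common failure modes: the server not running, and the configured model name not matching an installed model.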

PDF Processing Issues

Ensure the PDF contains extractable text (not just images)
Try with a different PDF to isolate the issue
Check file permissions
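To check whether a PDF actually contains extractable text, you can inspect what pdfplumber returns page by page. This is a rough diagnostic sketch, not part of the project: the function names and the 20-character threshold are assumptions, and pdfplumber is imported lazily so the helper that does the counting has no third-party dependency.

```python
def page_texts(pdf_path: str) -> list[str]:
    """Extract text from every page; pages with no text yield an empty string."""
    import pdfplumber  # third-party; imported here so the module loads without it

    with pdfplumber.open(pdf_path) as pdf:
        return [page.extract_text() or "" for page in pdf.pages]


def pages_without_text(texts: list[str], min_chars: int = 20) -> list[int]:
    """Return 1-based page numbers whose extracted text is shorter than min_chars."""
    return [i for i, t in enumerate(texts, start=1) if len(t.strip()) < min_chars]


def looks_scanned(texts: list[str], min_chars: int = 20) -> bool:
    """Heuristic: True when no page yields a meaningful amount of text."""
    return bool(texts) and len(pages_without_text(texts, min_chars)) == len(texts)
```

If looks_scanned(page_texts("document.pdf")) returns True, the file is probably made of page images, and it would need OCR before this tool can answer questions about it.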

Memory Issues

Reduce chunk size in configuration
Use a smaller embedding model
Process smaller PDFs

Example Questions

For an Arabic legal document:

"ما هو موضوع هذا النظام؟" (What is the subject of this system?)
"ما هي المواد المتعلقة بالحكم؟" (What are the articles related to governance?)

For English documents:

"What is the main topic of this document?"
"Summarize the key points"
"What are the requirements mentioned?"

File Structure

pdf_analyzer/
├── pdf_analyzer.py      # Main Streamlit application
├── cli.py               # Command-line interface
├── config.py            # Configuration settings
├── requirements.txt     # Python dependencies
├── README.md            # This file
└── cache/               # Cached indexes (created automatically)

License

This project is open source and available under the MIT License.
