Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Updated Jun 16, 2026 - Python
OpenOCR: An Open-Source Toolkit for General-OCR Research and Applications. It integrates a unified training and evaluation benchmark, commercial-grade OCR and document parsing systems, and faithful reproductions of the core implementations from a wide range of academic papers.
A program for extracting hard-coded (burned-in) subtitles from a video and generating an external subtitle file.
Boosting Document Intelligence
This repository offers a simple OCR library that leverages system APIs like VisionKit and Media OCR for accurate text recognition. Check out the examples and start integrating with ease! 🐙✨
Open Models For Document Intelligence