Wikisource:OCR
This observatory of OCR systems lists known optical character recognition (OCR) systems that could be useful to Wikimedians. All systems, whether open, free or paid, are relevant and may be listed and documented below. If you have used an effective OCR system, please list it below (optionally with some comments).
Commons.js
- Wikisource:Google OCR (old)
- Wikisource:Tesseract OCR (new)
Extension
- Section to expand.
- mw:Help:Extension:Wikisource/Wikimedia OCR. Based on Wikimedia's Google OCR and Tesseract OCR cited above.
Free
Online and free
- Section to expand.
- Wikimedia (https://ocr.wmcloud.org), based on Wikimedia's Google OCR and Tesseract OCR cited above. Image input only (no PDF).
Other free systems
- Kraken https://kraken.re/master/ ("optimized for historical and non-Latin script material")
- models for 17th-century French, see [1]
- catalog of several training sets for various languages and types of documents: https://htr-united.github.io/catalog.html
Paid systems
- Section to expand.