OCR on Linux

Optical character recognition (OCR) is available on Linux through several tools, such as Tesseract and gImageReader.

The latter is a "user-friendly" graphical interface for Tesseract, which is normally used only from the command line.

To install these programs, run the following commands in a terminal:

sudo add-apt-repository ppa:sandromani/gimagereader
sudo apt-get update
sudo apt-get install gimagereader-gtk tesseract-ocr tesseract-ocr-fra tesseract-ocr-eng

This procedure works on Ubuntu 14.04 to 16.04 and, of course, on Linux Mint 17 and 18.
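
Once the packages are installed, Tesseract can also be used directly from the command line. The following is a minimal sketch, assuming a scanned image named scan.png in the current directory (a hypothetical file name) and a mixed French and English text. The -l option takes one or more installed language codes joined with "+".

```shell
# Hypothetical file names: scan.png is the input image and "scan" is the
# output base name (tesseract appends .txt to it).
image="scan.png"
outbase="scan"

# Language codes installed above, joined with "+" for tesseract's -l option.
langs="fra+eng"

if command -v tesseract >/dev/null 2>&1 && [ -f "$image" ]; then
    # Recognize the text and write it to scan.txt
    tesseract "$image" "$outbase" -l "$langs"
    cat "$outbase.txt"
else
    echo "tesseract or $image not found; nothing to do" >&2
fi
```

The recognized text ends up in scan.txt. Recognition quality depends heavily on the scan resolution and on choosing the right languages.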

A gImageReader shortcut is created in the Graphics section of the menu. The last two arguments of the install command add the French and English language packs. Many other languages are available:

tesseract-ocr-afr - tesseract-ocr language files for Afrikaans
tesseract-ocr-all - Tesseract OCR with all language packages
tesseract-ocr-amh - tesseract-ocr language files for Amharic
tesseract-ocr-ara - tesseract-ocr language files for Arabic
tesseract-ocr-asm - tesseract-ocr language files for Assamese
tesseract-ocr-aze - tesseract-ocr language files for Azerbaijani
tesseract-ocr-aze-cyrl - tesseract-ocr language files for Azerbaijani (Cyrillic)
tesseract-ocr-bel - tesseract-ocr language files for Belarusian
tesseract-ocr-ben - tesseract-ocr language files for Bengali
tesseract-ocr-bod - tesseract-ocr language files for Tibetan Standard
tesseract-ocr-bos - tesseract-ocr language files for Bosnian
tesseract-ocr-bul - tesseract-ocr language files for Bulgarian
tesseract-ocr-cat - tesseract-ocr language files for Catalan
tesseract-ocr-ceb - tesseract-ocr language files for Cebuano
tesseract-ocr-ces - tesseract-ocr language files for Czech
tesseract-ocr-chi-sim - tesseract-ocr language files for Simplified Chinese
tesseract-ocr-chi-tra - tesseract-ocr language files for Traditional Chinese
tesseract-ocr-chr - tesseract-ocr language files for Cherokee
tesseract-ocr-cym - tesseract-ocr language files for Welsh
tesseract-ocr-dan - tesseract-ocr language files for Danish
tesseract-ocr-dan-frak - tesseract-ocr language files for Danish (Fraktur)
tesseract-ocr-deu - tesseract-ocr language files for German
tesseract-ocr-deu-frak - tesseract-ocr language files for German Fraktur
tesseract-ocr-dev - transitional dummy package
tesseract-ocr-dzo - tesseract-ocr language files for Dzongkha
tesseract-ocr-ell - tesseract-ocr language files for Greek
tesseract-ocr-enm - tesseract-ocr language files for Middle English
tesseract-ocr-epo - tesseract-ocr language files for Esperanto
tesseract-ocr-equ - tesseract-ocr language files for equations
tesseract-ocr-est - tesseract-ocr language files for Estonian
tesseract-ocr-eus - tesseract-ocr language files for Basque
tesseract-ocr-fas - tesseract-ocr language files for Persian
tesseract-ocr-fin - tesseract-ocr language files for Finnish
tesseract-ocr-frk - tesseract-ocr language files for Frankish
tesseract-ocr-frm - tesseract-ocr language files for Middle French
tesseract-ocr-gle - tesseract-ocr language files for Irish
tesseract-ocr-gle-uncial - tesseract-ocr language files for Irish (Uncial)
tesseract-ocr-glg - tesseract-ocr language files for Galician
tesseract-ocr-grc - tesseract-ocr language files for Ancient Greek
tesseract-ocr-guj - tesseract-ocr language files for Gujarati
tesseract-ocr-hat - tesseract-ocr language files for Hatian
tesseract-ocr-heb - tesseract-ocr language files for Hebrew
tesseract-ocr-hin - tesseract-ocr language files for Hindi
tesseract-ocr-hrv - tesseract-ocr language files for Croatian
tesseract-ocr-hun - tesseract-ocr language files for Hungarian
tesseract-ocr-iku - tesseract-ocr language files for Inuktitut
tesseract-ocr-ind - tesseract-ocr language files for Indonesian
tesseract-ocr-isl - tesseract-ocr language files for Icelandic
tesseract-ocr-ita - tesseract-ocr language files for Italian
tesseract-ocr-ita-old - tesseract-ocr language files for Old Italian
tesseract-ocr-jav - tesseract-ocr language files for Javanese
tesseract-ocr-jpn - tesseract-ocr language files for Japanese
tesseract-ocr-kan - tesseract-ocr language files for Kannada
tesseract-ocr-kat - tesseract-ocr language files for Georgian
tesseract-ocr-kat-old - tesseract-ocr language files for Old Georgian
tesseract-ocr-kaz - tesseract-ocr language files for Kazakh
tesseract-ocr-khm - tesseract-ocr language files for Khmer
tesseract-ocr-kir - tesseract-ocr language files for Kyrgyz
tesseract-ocr-kor - tesseract-ocr language files for Korean
tesseract-ocr-kur - tesseract-ocr language files for Kurdish
tesseract-ocr-lao - tesseract-ocr language files for Lao
tesseract-ocr-lat - tesseract-ocr language files for Latin
tesseract-ocr-lav - tesseract-ocr language files for Latvian
tesseract-ocr-lit - tesseract-ocr language files for Lithuanian
tesseract-ocr-mal - tesseract-ocr language files for Malayalam
tesseract-ocr-mar - tesseract-ocr language files for Marathi
tesseract-ocr-mkd - tesseract-ocr language files for Macedonian
tesseract-ocr-mlt - tesseract-ocr language files for Maltese
tesseract-ocr-msa - tesseract-ocr language files for Malay
tesseract-ocr-mya - tesseract-ocr language files for Burmese
tesseract-ocr-nep - tesseract-ocr language files for Nepali
tesseract-ocr-nld - tesseract-ocr language files for Dutch
tesseract-ocr-nor - tesseract-ocr language files for Norwegian
tesseract-ocr-ori - tesseract-ocr language files for Oriya
tesseract-ocr-osd - tesseract-ocr language files for script and orientation
tesseract-ocr-pan - tesseract-ocr language files for Punjabi
tesseract-ocr-pol - tesseract-ocr language files for Polish
tesseract-ocr-por - tesseract-ocr language files for Portuguese
tesseract-ocr-pus - tesseract-ocr language files for Pashto
tesseract-ocr-ron - tesseract-ocr language files for Romanian
tesseract-ocr-rus - tesseract-ocr language files for Russian
tesseract-ocr-san - tesseract-ocr language files for Sanskrit
tesseract-ocr-sin - tesseract-ocr language files for Sinhala
tesseract-ocr-slk - tesseract-ocr language files for Slovak
tesseract-ocr-slk-frak - tesseract-ocr language files for Slovak Fractur
tesseract-ocr-slv - tesseract-ocr language files for Slovenian
tesseract-ocr-spa - tesseract-ocr language files for Spanish
tesseract-ocr-spa-old - tesseract-ocr language files for Old Spanish
tesseract-ocr-sqi - tesseract-ocr language files for Albanian
tesseract-ocr-srp - tesseract-ocr language files for Serbian
tesseract-ocr-srp-latn - tesseract-ocr language files for Serbian (Latin)
tesseract-ocr-swa - tesseract-ocr language files for Swahili
tesseract-ocr-swe - tesseract-ocr language files for Swedish
tesseract-ocr-syr - tesseract-ocr language files for Syriac
tesseract-ocr-tam - tesseract-ocr language files for Tamil
tesseract-ocr-tel - tesseract-ocr language files for Telugu
tesseract-ocr-tgk - tesseract-ocr language files for Tajik
tesseract-ocr-tgl - tesseract-ocr language files for Tagalog
tesseract-ocr-tha - tesseract-ocr language files for Thai
tesseract-ocr-tir - tesseract-ocr language files for Tigrinya
tesseract-ocr-tur - tesseract-ocr language files for Turkish
tesseract-ocr-uig - tesseract-ocr language files for Uyghur
tesseract-ocr-ukr - tesseract-ocr language files for Ukranian
tesseract-ocr-urd - tesseract-ocr language files for Urdu
tesseract-ocr-uzb - tesseract-ocr language files for Uzbek
tesseract-ocr-uzb-cyrl - tesseract-ocr language files for Uzbek (Cyrillic)
tesseract-ocr-vie - tesseract-ocr language files for Vietnamese
tesseract-ocr-yid - tesseract-ocr language files for Yiddish
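
A list like the one above can be regenerated on your own system, and further languages added in the same way as French and English. The sketch below assumes an apt-based distribution. lang_package is a hypothetical helper introduced here for illustration: it maps a Tesseract language code (such as chi_sim) to the matching package name, which uses hyphens instead of underscores.

```shell
# Hypothetical helper: turn a tesseract language code into its package name,
# e.g. "deu" -> "tesseract-ocr-deu", "chi_sim" -> "tesseract-ocr-chi-sim".
lang_package() {
    printf 'tesseract-ocr-%s\n' "$1" | tr '_' '-'
}

# Language packs offered by the configured repositories.
if command -v apt-cache >/dev/null 2>&1; then
    apt-cache search tesseract-ocr-
fi

# Languages that the installed tesseract can use right now.
if command -v tesseract >/dev/null 2>&1; then
    tesseract --list-langs
fi

# Example (not run here): install the German and Spanish language packs.
# sudo apt-get install "$(lang_package deu)" "$(lang_package spa)"
```

After installing an extra pack, its code should appear in the output of tesseract --list-langs and can be passed to the -l option.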