PublicSoftTools
Tools · 16 min read · PublicSoftTools Team · May 2026

OCR Online Free — Extract Text from Images

The free OCR tool uses optical character recognition to extract text from any image — scanned documents, photos of signs, screenshots, and more. It supports 15+ languages and runs entirely in your browser using Tesseract.js, so no image is ever uploaded to a server.

What Is OCR and How Does It Work?

Optical Character Recognition (OCR) is the process of identifying text within an image and converting it into editable, searchable characters. When you scan a printed page or photograph a document, the result is a raster image — a grid of pixels with no concept of letters or words, just colored dots.

Modern OCR engines analyze that pixel grid using trained neural networks. The recognition pipeline typically involves several stages:

  1. Pre-processing: Converting to grayscale, adjusting contrast, and removing noise
  2. Binarization: Converting the grayscale image to pure black and white based on a threshold, making text/background contrast absolute
  3. Layout analysis: Detecting text regions, columns, paragraphs, and lines
  4. Character segmentation: Identifying individual character boundaries within each text line
  5. Recognition: Matching each segmented character against trained character models to determine the most likely character
  6. Post-processing: Using dictionary and language model information to correct recognition errors
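
Stages 1 and 2 of the list above can be sketched in a few lines. This is an illustration only, not Tesseract's internal code: the function names are hypothetical, the input is assumed to be RGBA pixel data such as a canvas provides, and Otsu's method is used here as one common way to pick the binarization threshold automatically.

```typescript
// Convert RGBA pixel data (4 bytes per pixel) to 8-bit grayscale.
function toGrayscale(rgba: Uint8ClampedArray): Uint8Array {
  const gray = new Uint8Array(rgba.length / 4);
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[4 * i];
    const g = rgba[4 * i + 1];
    const b = rgba[4 * i + 2];
    // Weighted sum using the Rec. 601 luma coefficients.
    gray[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
  }
  return gray;
}

// Otsu's method: pick the threshold that maximizes the variance
// between the "dark" class (<= t) and the "light" class (> t).
function otsuThreshold(gray: Uint8Array): number {
  const hist = new Uint32Array(256);
  for (let i = 0; i < gray.length; i++) hist[gray[i]]++;

  let sumAll = 0;
  for (let t = 0; t < 256; t++) sumAll += t * hist[t];

  let sumDark = 0;
  let weightDark = 0;
  let bestVariance = -1;
  let best = 0;
  for (let t = 0; t < 256; t++) {
    weightDark += hist[t];
    if (weightDark === 0) continue;
    const weightLight = gray.length - weightDark;
    if (weightLight === 0) break;
    sumDark += t * hist[t];
    const meanDark = sumDark / weightDark;
    const meanLight = (sumAll - sumDark) / weightLight;
    const between = weightDark * weightLight * (meanDark - meanLight) ** 2;
    if (between > bestVariance) {
      bestVariance = between;
      best = t;
    }
  }
  return best;
}

// Map every pixel to pure black (0) or pure white (255).
function binarize(gray: Uint8Array, threshold: number): Uint8Array {
  return gray.map(v => (v > threshold ? 255 : 0));
}
```

A fixed threshold such as 128 would also work on clean scans. An automatic threshold tends to cope better with pages that are uniformly too dark or too light.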

About Tesseract.js

This tool uses Tesseract.js, a JavaScript library that runs the Tesseract OCR engine in the browser through WebAssembly. Tesseract was originally developed at Hewlett-Packard Laboratories starting in the 1980s, was released as open source in 2005, and was sponsored by Google from 2006; it is now maintained as a community open-source project. It is one of the most accurate open-source OCR engines available.

The full Tesseract engine supports over 100 languages. This tool offers 15+ of the most widely used ones, enough for the vast majority of real-world document recognition tasks. The recognition engine runs as a WebAssembly module directly in your browser, so no image data is ever transmitted to any server.

How to Extract Text from an Image

  1. Open the OCR — Image to Text tool
  2. Click Choose File or drag an image onto the upload zone. Supported formats: PNG, JPG, GIF, BMP, TIFF, WebP
  3. Select the language of the text in the image from the language dropdown
  4. Click Recognise Text
  5. The extracted text appears in the output panel. Click Copy to copy it to your clipboard

Recognition time depends on image size and complexity. A simple typed page typically takes 3–10 seconds. Dense multi-column layouts or images with small text may take longer.

Supported Languages

| Language | Script | Notes |
| --- | --- | --- |
| English | Latin | Best accuracy; most training data available |
| Arabic | Arabic (RTL) | Right-to-left text; clear print performs best |
| Chinese (Simplified) | Han | Printed text; handwriting not supported |
| French, German, Spanish, Italian | Latin | High accuracy on typed text; diacritics (é, ü, ñ) recognised |
| Hindi | Devanagari | Good on clear, printed text |
| Japanese | Hiragana, Katakana, Kanji (mixed) | Printed text; vertical layouts may vary in accuracy |
| Korean | Hangul | High accuracy on printed text |
| Russian, Ukrainian | Cyrillic | High accuracy on standard printed documents |
| Portuguese | Latin | Full diacritic support (ã, ç, ê) |
| Dutch, Polish, Swedish | Latin | Good accuracy on typed text |
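
Under the hood, Tesseract identifies each language by a short code that names its trained-data file. The sketch below maps the languages in the table to the standard Tesseract codes. The lookup table and the `toTesseractLangs` helper are hypothetical and only illustrate how a dropdown choice might be translated; how this tool does it internally is not documented here.

```typescript
// Standard Tesseract language codes for the languages listed above.
const LANGUAGE_CODES: Record<string, string> = {
  "English": "eng",
  "Arabic": "ara",
  "Chinese (Simplified)": "chi_sim",
  "French": "fra",
  "German": "deu",
  "Spanish": "spa",
  "Italian": "ita",
  "Hindi": "hin",
  "Japanese": "jpn",
  "Korean": "kor",
  "Russian": "rus",
  "Ukrainian": "ukr",
  "Portuguese": "por",
  "Dutch": "nld",
  "Polish": "pol",
  "Swedish": "swe",
};

// Build a Tesseract language string; several languages are joined with "+".
function toTesseractLangs(names: string[]): string {
  const codes = names.map(name => {
    const code = LANGUAGE_CODES[name];
    if (code === undefined) {
      throw new Error(`Unsupported language: ${name}`);
    }
    return code;
  });
  return codes.join("+");
}

// toTesseractLangs(["English", "French"]) returns "eng+fra"
```

Recent versions of Tesseract.js accept a string of this form when creating a worker, but check the documentation for the version you use before relying on it.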

Tips for Better OCR Accuracy

Use high-resolution images — 300 DPI minimum

OCR accuracy improves significantly with higher resolution. At low resolution, characters that look distinct to human eyes become ambiguous blobs of pixels. Aim for at least 300 DPI for scanned documents — the same standard used by professional document management systems. If photographing text with a phone, ensure the image is in focus, well-lit, and held parallel to the document surface. Blurry or low-light photos are the most common cause of poor OCR results.

Low resolution is also behind the most common OCR character confusions, where similar-shaped characters collapse into near-identical groups of pixels. At 300+ DPI, these distinctions become clear to the recognition engine and accuracy improves dramatically.
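
The 300 DPI guideline translates into pixel dimensions through simple arithmetic: pixels = inches × DPI. The helpers below are a hypothetical sketch for checking whether an image of a page of known physical size meets that target. The function names and the default of 300 are assumptions taken from the guideline above, not part of the tool.

```typescript
// Pixel dimensions needed to capture a page at a given DPI.
function pixelsForDpi(
  widthInches: number,
  heightInches: number,
  dpi: number
): { width: number; height: number } {
  return {
    width: Math.round(widthInches * dpi),
    height: Math.round(heightInches * dpi),
  };
}

// Effective DPI of an existing image, given the physical page width it shows.
function effectiveDpi(pixelWidth: number, pageWidthInches: number): number {
  return pixelWidth / pageWidthInches;
}

// True when the image reaches the recommended minimum resolution.
function meetsMinimumDpi(
  pixelWidth: number,
  pageWidthInches: number,
  minimumDpi: number = 300
): boolean {
  return effectiveDpi(pixelWidth, pageWidthInches) >= minimumDpi;
}

// A US Letter page (8.5 x 11 in) at 300 DPI needs 2550 x 3300 pixels.
// A 1275-pixel-wide image of the same page is only 150 DPI.
```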

Straighten the image before processing

Tesseract handles moderate skew — pages photographed at a slight angle — but results are better on straight, flat scans. If your image is rotated more than 10–15 degrees, use your phone's crop and straighten tool or an image editor before uploading. The OCR engine reads lines of text horizontally; significant rotation forces it to re-estimate line orientation, which reduces accuracy.

Choose the correct language

Selecting the wrong language is a common and easily overlooked mistake. If a document is in French but you leave the language set to English, the OCR engine will misrecognise accented characters like é, à, and ç — substituting similar-looking ASCII characters. Always match the language setting to the language of the text in the image.

Typed text significantly outperforms handwriting

Tesseract is trained on printed and typed text. Handwriting varies far more from writer to writer, so recognising it requires a model trained specifically on handwriting samples. Expect partial results at best with handwriting. If you need to digitise handwritten notes, either type them manually or use a service specifically trained for handwriting recognition.

Improve contrast and remove shadows before scanning

OCR accuracy depends heavily on the contrast between text and background. A page with coffee stains, yellow aging, or uneven lighting reduces contrast and increases recognition errors. Before scanning old or stained documents, photocopy them on a modern copier with contrast enhancement, then scan the copy. For digital images, increase contrast in any photo editor before running OCR.
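
For digital images, "increasing contrast" often amounts to a linear stretch: the darkest pixel is mapped to black, the lightest to white, and everything in between is scaled proportionally. The sketch below shows that idea on 8-bit grayscale data. The `stretchContrast` name is hypothetical, and photo editors may use more elaborate methods.

```typescript
// Linearly rescale grayscale values so they span the full 0-255 range.
function stretchContrast(gray: Uint8Array): Uint8Array {
  let min = 255;
  let max = 0;
  for (let i = 0; i < gray.length; i++) {
    if (gray[i] < min) min = gray[i];
    if (gray[i] > max) max = gray[i];
  }
  // A flat (or empty) image has no contrast to stretch; return a copy.
  if (max <= min) return gray.slice();

  const scale = 255 / (max - min);
  const out = new Uint8Array(gray.length);
  for (let i = 0; i < gray.length; i++) {
    out[i] = Math.round((gray[i] - min) * scale);
  }
  return out;
}

// A washed-out scan with values 80, 120, 160 becomes 0, 128, 255.
```

A single very dark or very bright stray pixel limits how much a plain min/max stretch can do, which is one reason editors often clip a small percentage of extreme values first.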

Common OCR Use Cases

| Use case | Input type | Workflow |
| --- | --- | --- |
| Editing a received PDF | Scanned PDF (image-based) | Screenshot each page → OCR → copy text → paste into Word |
| Digitising receipts | Phone photo of paper receipt | Photo → OCR → copy amounts into spreadsheet |
| Extracting data from forms | Printed form scan | Scan → OCR → copy field values for data entry |
| Archiving printed books | Book page scan | High-res scan → OCR → text file → searchable digital copy |
| Reading text in a photo | Photo of sign, menu, label | Photo → OCR → readable text |
| Making a PDF searchable | Image-only PDF | Export pages as images → OCR each → combine extracted text |
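
As an illustration of the receipt workflow, the step of copying amounts out of the recognised text can be partly automated. The sketch below is a hypothetical helper, not a feature of the tool. It assumes amounts use a dot as the decimal separator with exactly two decimal places, optionally with commas as thousands separators.

```typescript
// Pull monetary amounts such as "7.25" or "1,234.56" out of OCR output.
function extractAmounts(ocrText: string): number[] {
  const matches = ocrText.match(/\d+(?:,\d{3})*\.\d{2}/g) ?? [];
  // Strip thousands separators before converting to a number.
  return matches.map(m => Number(m.replace(/,/g, "")));
}

// extractAmounts("Coffee 3.50\nSandwich 7.25\nTOTAL 10.75")
// returns [3.5, 7.25, 10.75]
```

OCR can misread digits, so extracted amounts are worth checking against the receipt before they go into a spreadsheet.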

OCR and Scanned PDFs

If you have a scanned PDF, meaning a PDF where the pages are images rather than selectable text, you can take a screenshot or export a page as an image and run it through the OCR tool. Some PDF applications can export individual pages as PNG or JPG; where that option is missing, a screenshot taken at high zoom works too.

Once you have the extracted text, you can paste it into a Word document or use the PDF ↔ Word Converter to work with the content further. For selectable-text PDFs (not scanned), the PDF → Word converter extracts the text directly without needing OCR.

Privacy: Your Images Never Leave Your Device

Tesseract.js runs entirely in your browser as a WebAssembly module. Your image is processed in local memory — no pixel data is sent to any server. The first time you use the tool, the language model files (typically 1–5 MB per language) are downloaded and cached in your browser. Subsequent uses of the same language work offline without re-downloading. This makes the tool appropriate for confidential documents, ID scans, medical records, or any sensitive material.
