Why is my OCR result inaccurate?

The most common causes: low resolution (use at least 300 DPI), blurry or poorly lit photos, significant page skew, wrong language selected, or handwritten text (Tesseract is trained on printed text, not handwriting). Higher resolution and better lighting dramatically improve accuracy.

Are my images uploaded to a server?

No. Tesseract.js runs entirely in your browser as a WebAssembly module. Your images are processed in local memory and never sent to any server. This makes the tool safe for confidential documents, medical records, and sensitive materials.

Can OCR read handwritten text?

Tesseract is optimised for printed and typed text. Recognising handwriting requires a model trained on handwriting, so expect partial, error-prone output if you use this tool for it. For important handwritten documents, manual transcription or a dedicated handwriting recognition app is more reliable.

How do I extract text from a scanned PDF?

Take a screenshot of each page, or export the pages as PNG or JPG images if your PDF application supports image export. Then upload each page image to the OCR tool. For PDFs with selectable text (not scanned), use the PDF to Word converter instead; it extracts text directly without needing OCR.

Tools · 16 min read · PublicSoftTools Team · May 2026

OCR Online Free — Extract Text from Images

Q: What is OCR and how does it work?

OCR (Optical Character Recognition) converts image pixels into editable text. The process involves pre-processing, binarization, layout analysis, character segmentation, neural network recognition, and post-processing with dictionary correction. Modern OCR engines, including Tesseract 4 and later, use LSTM neural networks trained on large collections of text images.

This free OCR tool extracts text from images such as scanned documents, photos of signs, and screenshots. It supports 15+ languages and runs entirely in your browser using Tesseract.js, so no image is ever uploaded to a server.

What Is OCR and How Does It Work?

Optical Character Recognition (OCR) is the process of identifying text within an image and converting it into editable, searchable characters. When you scan a printed page or photograph a document, the result is a raster image — a grid of pixels with no concept of letters or words, just colored dots.

Modern OCR engines analyze that pixel grid using trained neural networks. The recognition pipeline typically involves several stages:

Pre-processing: Converting to grayscale, adjusting contrast, and removing noise
Binarization: Converting the grayscale image to pure black and white based on a threshold, making text/background contrast absolute
Layout analysis: Detecting text regions, columns, paragraphs, and lines
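Character segmentation: Identifying individual character boundaries within each text line (LSTM-based engines, including Tesseract 4 and later, can instead recognise a whole line at once)
Recognition: Matching each segmented character, or each whole line in LSTM-based engines, against trained models to determine the most likely text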
Post-processing: Using dictionary and language model information to correct recognition errors
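
To make the binarization stage concrete, here is a minimal sketch in Python. It uses Otsu's method to choose the threshold automatically. That is one common choice, not necessarily what Tesseract does internally, and the function names are made up for this illustration. The image is assumed to be a list of rows of grayscale values from 0 to 255.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that best separates dark from light pixels."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += histogram[level]
        sum_dark += level * histogram[level]
        if weight_dark == 0:
            continue  # no pixels at or below this level yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance: large when the two groups are well separated.
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(rows):
    """Map every pixel to pure black (0) or pure white (255)."""
    threshold = otsu_threshold([value for row in rows for value in row])
    return [[0 if value <= threshold else 255 for value in row] for row in rows]


# Hypothetical 2x3 image: dark text pixels (30-40) on a light page (200-220).
page = [[30, 40, 200], [210, 35, 220]]
print(binarize(page))
```

Choosing the threshold from the image itself, rather than hard-coding a value such as 128, helps when a scan is uniformly too dark or too light.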

About Tesseract.js

This tool uses Tesseract.js, a JavaScript library that runs the Tesseract OCR engine in the browser through WebAssembly. Tesseract was originally developed at Hewlett-Packard between 1985 and 1994 and released as open source in 2005. Google sponsored its development for many years from 2006, and it continues today as a community-maintained open-source project. It is one of the most accurate open-source OCR engines available.

The Tesseract engine supports over 100 languages, and Tesseract.js can load the same trained language data. This tool offers 15+ of the most widely used languages, enough for the vast majority of real-world document recognition tasks. The recognition engine runs as a WebAssembly module directly in your browser, meaning no image data is ever transmitted to any server.

How to Extract Text from an Image

Open the OCR — Image to Text tool
Click Choose File or drag an image onto the upload zone. Supported formats: PNG, JPG, GIF, BMP, TIFF, WebP
Select the language of the text in the image from the language dropdown
Click Recognise Text
The extracted text appears in the output panel. Click Copy to copy it to your clipboard

Recognition time depends on image size and complexity. A simple typed page typically takes 3–10 seconds. Dense multi-column layouts or images with small text may take longer.

Supported Languages

Language	Script	Notes
English	Latin	Best accuracy; most training data available
Arabic	Arabic (RTL)	Right-to-left text; clear print performs best
Chinese (Simplified)	Han	Printed text; handwriting not supported
French, German, Spanish, Italian	Latin	High accuracy on typed text; diacritics (é, ü, ñ) recognised
Hindi	Devanagari	Good on clear, printed text
Japanese	Hiragana, Katakana, Kanji (mixed)	Printed text; vertical layouts may vary in accuracy
Korean	Hangul	High accuracy on printed text
Russian, Ukrainian	Cyrillic	High accuracy on standard printed documents
Portuguese	Latin	Full diacritic support (ã, ç, ê)
Dutch, Polish, Swedish	Latin	Good accuracy on typed text

Tips for Better OCR Accuracy

Use high-resolution images — 300 DPI minimum

OCR accuracy improves significantly with higher resolution. At low resolution, characters that look distinct to human eyes become ambiguous blobs of pixels. Aim for at least 300 DPI for scanned documents — the same standard used by professional document management systems. If photographing text with a phone, ensure the image is in focus, well-lit, and held parallel to the document surface. Blurry or low-light photos are the most common cause of poor OCR results.
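
If you are unsure whether an image meets the 300 DPI guideline, you can estimate its effective resolution from the pixel width and the physical width of the page. The sketch below is illustrative only: the function names are made up, and it assumes the page fills the full width of the image.

```python
import math

US_LETTER_WIDTH_INCHES = 8.5


def effective_dpi(pixel_width, physical_width_inches):
    """Pixels per inch actually captured across the page width."""
    return pixel_width / physical_width_inches


def min_pixel_width(target_dpi, physical_width_inches):
    """Smallest image width in pixels that reaches the target DPI."""
    return math.ceil(target_dpi * physical_width_inches)


# A 2550-pixel-wide scan of a US Letter page is exactly 300 DPI.
print(effective_dpi(2550, US_LETTER_WIDTH_INCHES))   # 300.0
# A 1275-pixel-wide image of the same page is only 150 DPI.
print(effective_dpi(1275, US_LETTER_WIDTH_INCHES))   # 150.0
print(min_pixel_width(300, US_LETTER_WIDTH_INCHES))  # 2550
```

For phone photos the same arithmetic applies, but margins around the page reduce the effective DPI, so crop to the document before estimating.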

The most common OCR character confusions caused by low resolution:

1 (one) and l (lowercase L) and I (uppercase I)
0 (zero) and O (letter O)
rn and m
cl and d
8 and B

At 300 DPI or higher, each character spans enough pixels for the recognition engine to tell these shapes apart, so such substitutions become much less frequent.
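
When rescanning is not possible, some of the confusions listed above can be repaired after recognition. The following Python sketch is a simple heuristic written for this article, not a feature of the tool or of Tesseract: it rewrites look-alike letters as digits only inside tokens that already contain a real digit and would be fully numeric after the swap. It does not attempt the rn/m or cl/d cases, which need a dictionary.

```python
import re

# Letters that are often produced where a digit was printed.
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "B": "8"})


def fix_numeric_token(token):
    """Swap look-alike letters for digits if the token is clearly a number."""
    candidate = token.translate(LOOKALIKES)
    has_real_digit = any(ch.isdigit() for ch in token)
    looks_numeric = re.fullmatch(r"[0-9.,:/-]+", candidate) is not None
    return candidate if has_real_digit and looks_numeric else token


def fix_numeric_confusions(text):
    """Apply the token-level fix to every space-separated token."""
    return " ".join(fix_numeric_token(token) for token in text.split(" "))


print(fix_numeric_confusions("Total 1O5.OO due by 2O24-O1-l5"))
# Total 105.00 due by 2024-01-15
```

Ordinary words such as "Total" are left alone because they contain no digit. Short identifiers that mix letters and digits (for example "B12") would still be rewritten, so review the output when the text contains product codes or similar values.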

Straighten the image before processing

Tesseract handles moderate skew — pages photographed at a slight angle — but results are better on straight, flat scans. If your image is rotated more than 10–15 degrees, use your phone's crop and straighten tool or an image editor before uploading. The OCR engine reads lines of text horizontally; significant rotation forces it to re-estimate line orientation, which reduces accuracy.
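
To judge whether a photo needs straightening, you can measure the angle of a single line of text from two points on its baseline. This is a rough sketch with hypothetical function names. The 10-degree default is taken from the lower end of the range mentioned above and is a rule of thumb, not a Tesseract setting.

```python
import math


def skew_angle_degrees(x1, y1, x2, y2):
    """Angle of the line through two baseline points; 0 means perfectly horizontal."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def needs_straightening(angle_degrees, limit_degrees=10.0):
    """True when the skew exceeds the chosen limit in either direction."""
    return abs(angle_degrees) > limit_degrees


# Baseline that drops 50 pixels over a 1000-pixel run: a slight tilt.
slight = skew_angle_degrees(0, 0, 1000, 50)
print(needs_straightening(slight))  # False

# Baseline that drops 300 pixels over the same run: worth rotating first.
steep = skew_angle_degrees(0, 0, 1000, 300)
print(needs_straightening(steep))   # True
```

In image coordinates the y axis usually points down, so a positive angle here means the text slopes downward to the right. Rotate the image by the opposite amount to level it.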

Choose the correct language

Selecting the wrong language is a common and easily overlooked mistake. If a document is in French but you leave the language set to English, the OCR engine will misrecognise accented characters like é, à, and ç — substituting similar-looking ASCII characters. Always match the language setting to the language of the text in the image.
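
Behind a language dropdown, Tesseract identifies each language by a short code, and several codes can be joined with a plus sign for mixed-language documents. The Python sketch below builds that parameter from the languages in the table above. The codes are Tesseract's standard ones, but the helper name and the idea that this tool's dropdown maps to them in exactly this way are assumptions.

```python
# Standard Tesseract language codes for the languages listed in this article.
TESSERACT_LANG_CODES = {
    "English": "eng",
    "Arabic": "ara",
    "Chinese (Simplified)": "chi_sim",
    "French": "fra",
    "German": "deu",
    "Spanish": "spa",
    "Italian": "ita",
    "Hindi": "hin",
    "Japanese": "jpn",
    "Korean": "kor",
    "Russian": "rus",
    "Ukrainian": "ukr",
    "Portuguese": "por",
    "Dutch": "nld",
    "Polish": "pol",
    "Swedish": "swe",
}


def lang_parameter(*language_names):
    """Build a Tesseract language string such as 'eng+fra'."""
    unknown = [name for name in language_names if name not in TESSERACT_LANG_CODES]
    if unknown:
        raise ValueError(f"No language code known for: {', '.join(unknown)}")
    return "+".join(TESSERACT_LANG_CODES[name] for name in language_names)


print(lang_parameter("French"))             # fra
print(lang_parameter("English", "French"))  # eng+fra
```

Failing loudly on an unknown name is deliberate: silently falling back to English would reproduce the accented-character errors described above.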

Typed text significantly outperforms handwriting

Tesseract is trained on printed and typed text. Handwriting varies far more from writer to writer than printed fonts do, so recognising it reliably requires a model trained specifically on handwriting samples. Expect partial results at best with handwriting. If you need to digitise handwritten notes, either type them manually or use a service specifically trained for handwriting recognition.

Improve contrast and remove shadows before scanning

OCR accuracy depends heavily on the contrast between text and background. A page with coffee stains, yellow aging, or uneven lighting reduces contrast and increases recognition errors. Before scanning old or stained documents, photocopy them on a modern copier with contrast enhancement, then scan the copy. For digital images, increase contrast in any photo editor before running OCR.
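
The "increase contrast" step in a photo editor is often a linear stretch: the darkest pixel is mapped to black, the lightest to white, and everything else is scaled in between. Here is a minimal Python sketch of that idea. The function name is made up, and the image is assumed to be a list of rows of grayscale values from 0 to 255.

```python
def stretch_contrast(rows):
    """Linearly rescale grayscale values so they span the full 0-255 range."""
    flat = [value for row in rows for value in row]
    low, high = min(flat), max(flat)
    if high == low:
        # A perfectly uniform image has no contrast to stretch.
        return [row[:] for row in rows]
    return [[round((value - low) * 255 / (high - low)) for value in row] for row in rows]


# Hypothetical faded scan: "black" ink is only 100 and "white" paper only 150.
faded = [[100, 120], [140, 150]]
print(stretch_contrast(faded))  # [[0, 102], [204, 255]]
```

A linear stretch does not remove shadows or stains that cover only part of the page. Those need local correction or a cleaner source image.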

Common OCR Use Cases

Use case	Input type	Workflow
Editing a received PDF	Scanned PDF (image-based)	Screenshot each page → OCR → copy text → paste into Word
Digitising receipts	Phone photo of paper receipt	Photo → OCR → copy amounts into spreadsheet
Extracting data from forms	Printed form scan	Scan → OCR → copy field values for data entry
Archiving printed books	Book page scan	High-res scan → OCR → text file → searchable digital copy
Reading text in a photo	Photo of sign, menu, label	Photo → OCR → readable text
Getting searchable text from a PDF	Image-only PDF	Export pages as images → OCR each → combine extracted text into one searchable text file
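
For the receipt workflow in the table above, the step from OCR text to spreadsheet values can be partly automated. The Python sketch below pulls out amounts with two decimal places using a regular expression. The pattern and function name are illustrative assumptions: they expect a period as the decimal separator and optional commas for thousands, so adapt them for other number formats.

```python
import re
from decimal import Decimal

# Matches 3.50, 12345.67 and 1,234.56, but not dates or longer decimals.
AMOUNT_PATTERN = re.compile(r"(?<![\d.,])(?:\d{1,3}(?:,\d{3})+|\d+)\.\d{2}(?!\d)")


def extract_amounts(ocr_text):
    """Return every money-like amount found in the text, in order of appearance."""
    return [Decimal(match.replace(",", "")) for match in AMOUNT_PATTERN.findall(ocr_text)]


receipt_text = "Coffee 3.50\nBagel 2.25\nTOTAL 5.75\nDate 2024-01-15"
print(extract_amounts(receipt_text))
# [Decimal('3.50'), Decimal('2.25'), Decimal('5.75')]
```

Using Decimal rather than float keeps the amounts exact when they are later summed in a spreadsheet or script.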

OCR and Scanned PDFs

If you have a scanned PDF (a PDF where the pages are images rather than selectable text), you can take a screenshot of a page or export it as an image and run it through the OCR tool. Screenshots work in any PDF viewer, including the ones built into Firefox and Chrome. Exporting pages as PNG or JPG depends on the application; macOS Preview and Adobe Acrobat Pro offer it, for example.

Once you have the extracted text, you can paste it into a Word document or use the PDF ↔ Word Converter to work with the content further. For selectable-text PDFs (not scanned), the PDF → Word converter extracts the text directly without needing OCR.

Privacy: Your Images Never Leave Your Device

Tesseract.js runs entirely in your browser as a WebAssembly module. Your image is processed in local memory — no pixel data is sent to any server. The first time you use the tool, the language model files (typically 1–5 MB per language) are downloaded and cached in your browser. Subsequent uses of the same language work offline without re-downloading. This makes the tool appropriate for confidential documents, ID scans, medical records, or any sensitive material.
