OCR Online Free — Extract Text from Images
This free OCR tool uses optical character recognition to extract text from any image: scanned documents, photos of signs, screenshots, and more. It supports 15+ languages and runs entirely in your browser using Tesseract.js, so no image is ever uploaded to a server.
What Is OCR and How Does It Work?
Optical Character Recognition (OCR) is the process of identifying text within an image and converting it into editable, searchable characters. When you scan a printed page or photograph a document, the result is a raster image — a grid of pixels with no concept of letters or words, just colored dots.
Modern OCR engines analyze that pixel grid using trained neural networks. The recognition pipeline typically involves several stages:
- Pre-processing: Converting to grayscale, adjusting contrast, and removing noise
- Binarization: Converting the grayscale image to pure black and white based on a threshold, making text/background contrast absolute
- Layout analysis: Detecting text regions, columns, paragraphs, and lines
- Character segmentation: Identifying individual character boundaries within each text line
- Recognition: Matching each segmented character against trained character models to determine the most likely character
- Post-processing: Using dictionary and language model information to correct recognition errors
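The first two stages above can be sketched in a few lines. This is a minimal illustration, not the code any particular engine runs: the function names are hypothetical, the input is assumed to be RGBA pixel data in the layout a canvas `ImageData.data` array uses, and the fixed threshold of 128 is an assumption (production engines generally choose the threshold adaptively).

```typescript
// Convert RGBA pixels (4 bytes per pixel) to one luminance value per pixel,
// using the Rec. 601 luma weights. The alpha channel is ignored.
function toGrayscale(rgba: Uint8ClampedArray): Uint8ClampedArray {
  const gray = new Uint8ClampedArray(Math.floor(rgba.length / 4));
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[i * 4];
    const g = rgba[i * 4 + 1];
    const b = rgba[i * 4 + 2];
    gray[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
  }
  return gray;
}

// Binarize: pixels darker than the threshold become 0 (ink),
// everything else becomes 255 (paper).
function binarize(gray: Uint8ClampedArray, threshold: number = 128): Uint8ClampedArray {
  const out = new Uint8ClampedArray(gray.length);
  for (let i = 0; i < gray.length; i++) {
    out[i] = gray[i] < threshold ? 0 : 255;
  }
  return out;
}
```

Calling `binarize(toGrayscale(pixels))` on a scanned page leaves only pure black and pure white pixels, which is the form the later layout and recognition stages work from.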
About Tesseract.js
This tool uses Tesseract.js, a WebAssembly port of the Tesseract OCR engine. Tesseract was originally developed at Hewlett-Packard in the 1980s and early 1990s and released as open source in 2005. Google sponsored its development from 2006, and it continues today as a community-maintained open-source project. It is one of the most accurate open-source OCR engines available.
The full Tesseract engine supports over 100 languages. The browser-based version (Tesseract.js) supports 15+ of the most widely used languages — enough for the vast majority of real-world document recognition tasks. The recognition engine runs as a WebAssembly module directly in your browser, meaning no image data is ever transmitted to any server.
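For readers curious what calling the engine looks like from page code, here is a hedged sketch. It does not import Tesseract.js; instead it defines `OcrWorkerLike`, a hypothetical interface covering only the two worker methods used here (`recognize` and `terminate`), so the helper can be exercised with any object of that shape. The `extractText` name is also an assumption, not part of the library or of this tool.

```typescript
// Minimal shape of the parts of a Tesseract.js worker used in this sketch.
// Hypothetical interface for illustration, not a type exported by the library.
interface OcrWorkerLike {
  recognize(image: unknown): Promise<{ data: { text: string } }>;
  terminate(): Promise<unknown>;
}

// Run recognition once and always release the worker, even if recognition fails.
async function extractText(worker: OcrWorkerLike, image: unknown): Promise<string> {
  try {
    const result = await worker.recognize(image);
    return result.data.text.trim();
  } finally {
    await worker.terminate();
  }
}

// In a page that loads Tesseract.js, the worker would come from the library,
// for example in recent versions: const worker = await createWorker("eng");
```

Terminating the worker in `finally` matters in a browser because each worker holds the WebAssembly module and language data in memory until it is released.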
How to Extract Text from an Image
- Open the OCR — Image to Text tool
- Click Choose File or drag an image onto the upload zone. Supported formats: PNG, JPG, GIF, BMP, TIFF, WebP
- Select the language of the text in the image from the language dropdown
- Click Recognise Text
- The extracted text appears in the output panel. Click Copy to copy it to your clipboard
Recognition time depends on image size and complexity. A simple typed page typically takes 3–10 seconds. Dense multi-column layouts or images with small text may take longer.
Supported Languages
| Language | Script | Notes |
|---|---|---|
| English | Latin | Best accuracy; most training data available |
| Arabic | Arabic (RTL) | Right-to-left text; clear print performs best |
| Chinese (Simplified) | Han | Printed text; handwriting not supported |
| French, German, Spanish, Italian | Latin | High accuracy on typed text; diacritics (é, ü, ñ) recognised |
| Hindi | Devanagari | Good on clear, printed text |
| Japanese | Hiragana, Katakana, Kanji (mixed) | Printed text; vertical layouts may vary in accuracy |
| Korean | Hangul | High accuracy on printed text |
| Russian, Ukrainian | Cyrillic | High accuracy on standard printed documents |
| Portuguese | Latin | Full diacritic support (ã, ç, ê) |
| Dutch, Polish, Swedish | Latin | Good accuracy on typed text |
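Under the hood, Tesseract identifies each language by a short code, and several codes can be joined with `+` for documents that mix languages. The sketch below maps the languages in the table above to their standard Tesseract codes. The `LANG_CODES` and `toLanguageString` names are hypothetical, and the values this tool's dropdown actually uses are an assumption.

```typescript
// Standard Tesseract language codes for the languages listed in the table.
const LANG_CODES: Record<string, string> = {
  "English": "eng",
  "Arabic": "ara",
  "Chinese (Simplified)": "chi_sim",
  "French": "fra",
  "German": "deu",
  "Spanish": "spa",
  "Italian": "ita",
  "Hindi": "hin",
  "Japanese": "jpn",
  "Korean": "kor",
  "Russian": "rus",
  "Ukrainian": "ukr",
  "Portuguese": "por",
  "Dutch": "nld",
  "Polish": "pol",
  "Swedish": "swe",
};

// Build the "+"-joined language string for a document that mixes languages.
// Duplicate names are collapsed; unknown names raise an error.
function toLanguageString(names: string[]): string {
  const codes = names.map((name) => {
    const code = LANG_CODES[name];
    if (code === undefined) {
      throw new Error("Unsupported language: " + name);
    }
    return code;
  });
  return codes.filter((code, index) => codes.indexOf(code) === index).join("+");
}
```

For example, a bilingual English and French document would use `toLanguageString(["English", "French"])`, which yields `"eng+fra"`.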
Tips for Better OCR Accuracy
Use high-resolution images — 300 DPI minimum
OCR accuracy improves significantly with higher resolution. At low resolution, characters that look distinct to human eyes become ambiguous blobs of pixels. Aim for at least 300 DPI for scanned documents — the same standard used by professional document management systems. If photographing text with a phone, ensure the image is in focus, well-lit, and held parallel to the document surface. Blurry or low-light photos are the most common cause of poor OCR results.
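To check whether an image meets the 300 DPI guideline, divide its width in pixels by the physical width of the page in inches. The helper names below are hypothetical, and the sketch assumes the page fills the full width of the image.

```typescript
const MIN_OCR_DPI = 300;

// Effective resolution = pixels across the page / physical page width in inches.
function effectiveDpi(pixelWidth: number, physicalWidthInches: number): number {
  if (pixelWidth <= 0 || physicalWidthInches <= 0) {
    throw new RangeError("Widths must be positive");
  }
  return pixelWidth / physicalWidthInches;
}

// True when the image meets the 300 DPI guideline for OCR.
function meetsMinimumDpi(pixelWidth: number, physicalWidthInches: number): boolean {
  return effectiveDpi(pixelWidth, physicalWidthInches) >= MIN_OCR_DPI;
}
```

A US Letter page is 8.5 inches wide, so a scan 2550 pixels wide is exactly 300 DPI, while a 1275-pixel-wide image of the same page is only 150 DPI and is likely to produce more recognition errors.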
The most common OCR character confusions caused by low resolution:
- 1 (one), l (lowercase L), and I (uppercase I)
- 0 (zero) and O (letter O)
- rn and m
- cl and d
- 8 and B
At 300+ DPI, these distinctions become clear to the recognition engine and accuracy improves dramatically.
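When a better scan is not available and a value is known to be numeric (a receipt total, for instance), some of these confusions can be repaired after recognition. The sketch below is one possible approach, not something this tool is known to do: `repairNumericToken` is a hypothetical helper that swaps look-alike letters for digits and rejects the result if it still is not a number.

```typescript
// Letters that are commonly misread in place of digits.
const DIGIT_LOOKALIKES: Record<string, string> = {
  l: "1",
  I: "1",
  O: "0",
  B: "8",
};

// Replace look-alike letters with digits in a token that is expected to be
// numeric. Returns null when the repaired token is still not a plain number,
// so ordinary words are never silently rewritten.
function repairNumericToken(token: string): string | null {
  const repaired = token
    .split("")
    .map((ch) => DIGIT_LOOKALIKES[ch] ?? ch)
    .join("");
  return /^\d+([.,]\d+)?$/.test(repaired) ? repaired : null;
}
```

Applying this only to fields that should contain numbers is the important design choice: run on ordinary prose, the same substitutions would corrupt correctly recognised words.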
Straighten the image before processing
Tesseract handles moderate skew — pages photographed at a slight angle — but results are better on straight, flat scans. If your image is rotated more than 10–15 degrees, use your phone's crop and straighten tool or an image editor before uploading. The OCR engine reads lines of text horizontally; significant rotation forces it to re-estimate line orientation, which reduces accuracy.
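A quick way to judge whether an image needs straightening is to pick two points along one line of text and measure the angle between them. This is a rough manual check under stated assumptions, not how Tesseract estimates skew internally. The function names are hypothetical, and the 10-degree default is taken from the lower end of the range mentioned above.

```typescript
// Angle of a text baseline in degrees, from two points picked along one line
// of text. Uses image coordinates (y grows downward), so a positive result
// means the line slopes down toward the right.
function skewAngleDegrees(x1: number, y1: number, x2: number, y2: number): number {
  return (Math.atan2(y2 - y1, x2 - x1) * 180) / Math.PI;
}

// True when the measured skew exceeds the limit and the image should be
// straightened in an editor before running OCR.
function needsManualStraightening(angleDeg: number, limitDeg: number = 10): boolean {
  return Math.abs(angleDeg) > limitDeg;
}
```

For instance, a baseline that drops 20 pixels over a run of 100 pixels is skewed by roughly 11 degrees, which is past the point where straightening first is worthwhile.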
Choose the correct language
Selecting the wrong language is a common and easily overlooked mistake. If a document is in French but you leave the language set to English, the OCR engine will misrecognise accented characters like é, à, and ç — substituting similar-looking ASCII characters. Always match the language setting to the language of the text in the image.
Typed text significantly outperforms handwriting
Tesseract's language models are trained on printed and typed text. Recognising handwriting requires models trained specifically on handwriting samples, which this tool does not include. Expect partial results at best with handwriting. If you need to digitise handwritten notes, either type them manually or use a service specifically trained for handwriting recognition.
Improve contrast and remove shadows before scanning
OCR accuracy depends heavily on the contrast between text and background. A page with coffee stains, yellow aging, or uneven lighting reduces contrast and increases recognition errors. Before scanning old or stained documents, photocopy them on a modern copier with contrast enhancement, then scan the copy. For digital images, increase contrast in any photo editor before running OCR.
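The simplest form of contrast enhancement is a linear stretch, which remaps the darkest pixel to black and the lightest to white. The sketch below assumes the image has already been converted to one grayscale value per pixel; `stretchContrast` is a hypothetical name. Note that a global stretch like this helps with faded or yellowed pages but does not correct shadows or uneven lighting, which need local adjustment.

```typescript
// Linear contrast stretch on grayscale pixels: the darkest value maps to 0,
// the lightest to 255, and everything in between is scaled proportionally.
function stretchContrast(gray: Uint8ClampedArray): Uint8ClampedArray {
  const out = new Uint8ClampedArray(gray.length);
  if (gray.length === 0) return out;

  let min = 255;
  let max = 0;
  for (let i = 0; i < gray.length; i++) {
    const v = gray[i];
    if (v < min) min = v;
    if (v > max) max = v;
  }

  // A completely flat image has no contrast to stretch; return a copy.
  if (max === min) {
    out.set(gray);
    return out;
  }

  const range = max - min;
  for (let i = 0; i < gray.length; i++) {
    out[i] = Math.round(((gray[i] - min) * 255) / range);
  }
  return out;
}
```

On a faded page whose pixel values only span 50 to 250, the stretch widens the gap between ink and paper before the engine binarizes the image.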
Common OCR Use Cases
| Use case | Input type | Workflow |
|---|---|---|
| Editing a received PDF | Scanned PDF (image-based) | Screenshot each page → OCR → copy text → paste into Word |
| Digitising receipts | Phone photo of paper receipt | Photo → OCR → copy amounts into spreadsheet |
| Extracting data from forms | Printed form scan | Scan → OCR → copy field values for data entry |
| Archiving printed books | Book page scan | High-res scan → OCR → text file → searchable digital copy |
| Reading text in a photo | Photo of sign, menu, label | Photo → OCR → readable text |
| Making a PDF searchable | Image-only PDF | Export pages as images → OCR each → combine extracted text |
OCR and Scanned PDFs
If you have a scanned PDF (a PDF where the pages are images rather than selectable text), you can take a screenshot or export a page as an image and run it through the OCR tool. Some PDF editors can export individual pages as PNG or JPG; in viewers that cannot, such as the built-in viewers in Firefox and Chrome, zoom in and take a screenshot of the page instead.
Once you have the extracted text, you can paste it into a Word document or use the PDF ↔ Word Converter to work with the content further. For selectable-text PDFs (not scanned), the PDF → Word converter extracts the text directly without needing OCR.
Privacy: Your Images Never Leave Your Device
Tesseract.js runs entirely in your browser as a WebAssembly module. Your image is processed in local memory — no pixel data is sent to any server. The first time you use the tool, the language model files (typically 1–5 MB per language) are downloaded and cached in your browser. Subsequent uses of the same language work offline without re-downloading. This makes the tool appropriate for confidential documents, ID scans, medical records, or any sensitive material.