
How to OCR a scanned PDF — turning images of pages into real text with the PDF Pro OCR tool.


A scanned PDF looks like a document, but to a computer it is just a stack of pictures: you can't select a name, search for an invoice number, or let a screen reader read it. OCR (optical character recognition) is the step that pulls real, selectable text back out of those pictures. This guide walks through the whole job in five steps, all run inside your browser tab.

What you'll need

A modern browser (Chrome, Edge, Firefox, or Safari from the last two years)
The scanned or image-only PDF you want to OCR, on your device
The language the document is written in, because the recognition language you pick drives accuracy
A few minutes — image-only pages take a slower recognition pass, and the first use of a language downloads a small pack

The five steps

Open the OCR tool

Head to the PDF Pro OCR tool. The page loads with the Tesseract recognition engine bundled as WebAssembly, ready to run on your CPU. There is no signup, no email-confirm wall, no daily page counter — and no upload endpoint to send your scan to.

Choose your scanned PDF

Drag the file onto the drop zone or click to browse. The tool reads it straight from your disk and renders a thumbnail grid of every page. This is also where the tool quietly sorts your pages into two groups: pages that already carry a real text layer, and image-only pages that will need the full recognition pass.
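If you are curious how that two-group sort could work, here is a minimal sketch in Python. It is not the tool's actual code: the `PageInfo` type, the `split_pages` function, and the `min_chars` threshold are all hypothetical names and values chosen for illustration. The idea is simply that a page whose stored text is empty (or close to it) gets treated as image-only.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    number: int          # 1-based page number
    embedded_text: str   # text already stored in the PDF for this page ("" if none)


def split_pages(pages, min_chars=20):
    """Return (text_pages, image_pages) as lists of page numbers.

    min_chars is an assumed threshold: pages with fewer stored
    characters than this are treated as image-only.
    """
    text_pages, image_pages = [], []
    for page in pages:
        if len(page.embedded_text.strip()) >= min_chars:
            text_pages.append(page.number)
        else:
            image_pages.append(page.number)
    return text_pages, image_pages


pages = [
    PageInfo(1, "Invoice 2024-0117, issued to Example Ltd, total due 420.00"),
    PageInfo(2, ""),       # pure scan: no embedded text
    PageInfo(3, "  \n"),   # whitespace only, so it counts as image-only
]
text_pages, image_pages = split_pages(pages)
print(text_pages, image_pages)  # [1] [2, 3]
```

A small threshold rather than a plain "is it empty" check guards against pages that carry only a stray page number or watermark as text.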

Pick the recognition language

Choose the language that matches your document. The engine recognizes Latin-script languages plus Cyrillic, Greek and more — and picking the right one is the single biggest accuracy lever you have. The first time you use a given language, a small data file (a few MB) downloads and is then cached, so the next run in that language starts immediately.
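The download-once, reuse-afterwards behavior is an ordinary cache. The sketch below shows the pattern in Python under stated assumptions: `LanguagePackCache` and `fake_fetch` are hypothetical, and the real tool presumably keeps its packs in browser storage rather than in a dictionary. The codes `eng` and `rus` are the language codes Tesseract uses for English and Russian.

```python
class LanguagePackCache:
    """Hypothetical cache: fetch a language pack once, reuse it afterwards."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable(lang_code) -> bytes, e.g. a download
        self._packs = {}      # lang_code -> pack bytes already fetched
        self.downloads = 0    # how many real fetches have happened

    def get(self, lang_code):
        if lang_code not in self._packs:
            # First use of this language: pay the download cost once.
            self._packs[lang_code] = self._fetch(lang_code)
            self.downloads += 1
        return self._packs[lang_code]


def fake_fetch(lang_code):
    # Stand-in for the real download; returns placeholder bytes.
    return f"pack:{lang_code}".encode()


cache = LanguagePackCache(fake_fetch)
cache.get("eng")   # first use: fetched
cache.get("eng")   # second use: served from the cache
cache.get("rus")   # a new language triggers one more fetch
print(cache.downloads)  # 2
```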

Run OCR

Click Run OCR. The tool moves through your pages at two speeds: any page that already has a real text layer is extracted instantly and exactly, while image-only pages go through the slower recognition pass on your CPU. A progress indicator shows which page is being read. A long scan of photographed pages is the slowest case, so give it a moment.
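The two-speed run can be pictured as a loop with a fast branch and a slow branch. This Python sketch is an illustration only: `run_pages`, the page dictionaries, and `fake_recognize` are hypothetical, and the recognizer is passed in as a plain function so the example runs without an OCR engine installed.

```python
def run_pages(pages, recognize, on_progress=None):
    """Process pages in order and return {page_number: text}.

    pages: list of dicts with keys "number", "needs_ocr",
           "embedded_text", and "image".
    recognize: callable(image) -> str, a stand-in for the OCR engine.
    on_progress: optional callable(done, total, page_number, path).
    """
    results = {}
    total = len(pages)
    for done, page in enumerate(pages, start=1):
        if page["needs_ocr"]:
            # Slow path: run recognition on the page image.
            results[page["number"]] = recognize(page["image"])
            path = "ocr"
        else:
            # Fast path: the PDF already stores this page's text.
            results[page["number"]] = page["embedded_text"]
            path = "extract"
        if on_progress is not None:
            on_progress(done, total, page["number"], path)
    return results


def fake_recognize(image):
    # Placeholder for a real engine call; it just labels its input.
    return f"<recognized {image}>"


log = []
pages = [
    {"number": 1, "needs_ocr": False, "embedded_text": "Typed cover letter", "image": "p1.png"},
    {"number": 2, "needs_ocr": True, "embedded_text": "", "image": "p2.png"},
]
results = run_pages(
    pages,
    fake_recognize,
    on_progress=lambda done, total, number, path: log.append((done, total, number, path)),
)
print(results)
print(log)
```

Reporting progress after every page, whichever branch it took, is what makes a mixed PDF appear to speed up and slow down as it works.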

Copy or save the extracted text

When the pass finishes, the result is real, selectable text — not another picture of the page. Select it, copy it to the clipboard, or save it out, then paste it into a document, search it, or feed it to a translator or summarizer. Nothing is locked behind a signup or an upgrade; the recognized text is yours the moment it appears.


Common mistakes & gotchas

Expecting perfection from a blurry photo. OCR accuracy is bounded by scan quality. A clean, straight, ~300 DPI scan of printed text recognizes very well; a phone snapshot taken at an angle in poor light will not. Re-scan before you blame the tool.
Picking the wrong recognition language. Running an English pass on a Cyrillic document produces confident nonsense. Match the language to the document — it is the cheapest accuracy win available.
Trying to OCR handwriting. The engine is tuned for printed text. Handwritten notes, signatures, and cursive will be unreliable no matter how clean the scan.
Assuming the first run is broken because it's slow. The first time you use a language, a few-megabyte data pack downloads. That is a one-time cost — it is cached, and later runs in that language start immediately.
Feeding it a loose image file. The tool takes PDF files. If you only have a photo, put it into a PDF first — the JPG to PDF converter does that in your browser — then run OCR on the resulting PDF.

Troubleshooting

Why did some pages finish instantly and others take much longer?

Because they were handled differently. Pages that already contain a real text layer skip OCR entirely and go through fast, exact extraction. Only true image-only pages get the slower recognition pass on your CPU — so a mixed PDF will visibly speed up and slow down as it works.

The recognized text has errors. How do I improve accuracy?

Accuracy depends almost entirely on the scan. Re-scan so the page is sharp, straight, and well lit, at around 300 DPI; make sure the recognition language matches the document; and de-skew tilted pages before you start. Printed text on a clean scan recognizes very well; low contrast and blur are what hurt.
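To check whether an image actually reaches the 300 DPI mark, divide its width in pixels by the width of the paper in inches. The helper names below are hypothetical, and the numbers assume a US Letter page (8.5 inches wide) that fills the whole image.

```python
def effective_dpi(pixel_width, page_width_inches):
    """Dots per inch the image really delivers across the page width."""
    return pixel_width / page_width_inches


def meets_target(pixel_width, page_width_inches, target_dpi=300):
    """True if the image resolves at least target_dpi across the page."""
    return effective_dpi(pixel_width, page_width_inches) >= target_dpi


# A Letter page captured 2550 pixels wide is exactly 300 DPI.
print(effective_dpi(2550, 8.5))   # 300.0
# The same page at 1275 pixels wide is only 150 DPI: worth re-scanning.
print(effective_dpi(1275, 8.5))   # 150.0
print(meets_target(1275, 8.5))    # False
```

If the page only fills part of a photo, the effective figure is lower still, because fewer pixels land on the text.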

Does my scanned file get uploaded to a server?

No. The Tesseract engine runs inside your browser, so the scan is read straight from your device and never leaves it. If you want to confirm it, open DevTools, switch to the Network tab, and run OCR — you'll see zero file uploads.

My document is in two languages. Which one should I pick?

Select the document's dominant language and add the optional English pass to catch the secondary one. For a page that is genuinely half-and-half, that combination usually beats running either language alone.
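For context, the Tesseract engine itself accepts several languages at once, written as codes joined with a plus sign (for example `rus+eng`). Whether the PDF Pro tool builds its optional English pass this way is an assumption; the sketch below only shows how such a combined setting could be assembled, with the dominant language first. `build_language_spec` is a hypothetical helper.

```python
def build_language_spec(dominant, *secondary):
    """Join language codes Tesseract-style, dominant language first.

    Duplicates are dropped so "eng" plus an English pass stays "eng".
    """
    codes = [dominant]
    for code in secondary:
        if code not in codes:
            codes.append(code)
    return "+".join(codes)


print(build_language_spec("rus", "eng"))  # rus+eng
print(build_language_spec("eng", "eng"))  # eng
```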

Can the browser handle a big multi-page scan?

Yes. There is no artificial page cap, because recognition costs your CPU time, not a server bill. The real ceiling is the memory your browser gives the tab: roughly 500 MB on a modern laptop, though the exact figure varies by browser and device. A few-hundred-page scan simply takes longer; on a phone, stick to shorter documents.
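A rough back-of-the-envelope calculation shows why memory, not page count, is the limit. The sketch assumes each page is rendered as an uncompressed bitmap at 300 DPI with 4 bytes per pixel (a common layout for browser image data); the function names and the 500 MB budget are illustrative, not measurements of the tool.

```python
def page_bitmap_bytes(width_in, height_in, dpi=300, bytes_per_pixel=4):
    """Bytes needed to hold one rendered page as an uncompressed bitmap."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


def pages_in_budget(budget_bytes, per_page_bytes):
    """How many such bitmaps fit in the budget at the same time."""
    return budget_bytes // per_page_bytes


letter = page_bitmap_bytes(8.5, 11)      # US Letter at 300 DPI
budget = 500 * 1000 * 1000               # the rough 500 MB figure
print(letter)                            # 33660000 (about 34 MB per page)
print(pages_in_budget(budget, letter))   # 14
```

Only about 14 full-resolution pages fit in that budget at once, so a tool that wants to handle hundreds of pages has to render and recognize them one at a time and release each bitmap afterwards. That would be consistent with long scans costing time rather than failing outright.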

Ready to OCR a scan?

Open the browser OCR tool and run your scanned PDF through the five steps above.
