Question 1

What does this do exactly?

Accepted Answer

It takes a scanned (image-only) PDF, OCRs each page with Tesseract.js, then writes a new PDF that looks identical to the scan but has an invisible text layer over each page. The result is searchable in Acrobat, Preview, and browsers, and its text can be selected and copied.
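To illustrate how an invisible text layer can line up with the scan, here is a minimal sketch of the coordinate mapping involved. It assumes the OCR step reports each word's bounding box in image pixels with the origin at the top-left, while PDF user space places the origin at the bottom-left and measures in points. The `PixelBox` and `PdfTextPlacement` types and the `toPdfPlacement` function are hypothetical names for illustration, not part of Tesseract.js or of this tool's actual code.

```typescript
// Hypothetical shapes for illustration; not taken from Tesseract.js or any PDF library.
interface PixelBox {
  x0: number; // left edge, image pixels
  y0: number; // top edge, image pixels (origin is top-left)
  x1: number; // right edge, image pixels
  y1: number; // bottom edge, image pixels
}

interface PdfTextPlacement {
  x: number;        // left edge, PDF points
  y: number;        // bottom edge, PDF points (origin is bottom-left)
  width: number;    // box width, PDF points
  fontSize: number; // approximated by the box height, PDF points
}

// Map an OCR word box from image pixel space into PDF page space.
function toPdfPlacement(
  box: PixelBox,
  imageWidthPx: number,
  imageHeightPx: number,
  pageWidthPt: number,
  pageHeightPt: number
): PdfTextPlacement {
  const scaleX = pageWidthPt / imageWidthPx;
  const scaleY = pageHeightPt / imageHeightPx;
  return {
    x: box.x0 * scaleX,
    // Flip the vertical axis: the bottom of the pixel box becomes the text baseline area.
    y: pageHeightPt - box.y1 * scaleY,
    width: (box.x1 - box.x0) * scaleX,
    fontSize: (box.y1 - box.y0) * scaleY,
  };
}

// Example: a 1000x2000 px page image placed on a 500x1000 pt PDF page.
const placement = toPdfPlacement({ x0: 100, y0: 200, x1: 300, y1: 240 }, 1000, 2000, 500, 1000);
console.log(placement); // → { x: 50, y: 880, width: 100, fontSize: 20 }
```

A PDF writer would then draw each recognized word at that position in a non-visible style, so the scan image stays the only thing you see while search and copy-paste hit the text underneath.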

Question 2

How long does it take?

Accepted Answer

Tesseract runs in your browser via WebAssembly, so speed depends on your machine: roughly 5-15 seconds per page depending on your CPU and image complexity. A 10-page document takes around 1-2 minutes. The first run also downloads the language data (~10-30 MB per language).
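For a quick back-of-the-envelope estimate, the per-page range can simply be multiplied by the page count. The sketch below does only that; the function name `estimateOcrSeconds` is hypothetical, the 5 and 15 second bounds are the rough figures quoted in this answer, and the one-time language data download is not included.

```typescript
// Rough per-page bounds from the answer above; real timings vary by CPU and image complexity.
const MIN_SECONDS_PER_PAGE = 5;
const MAX_SECONDS_PER_PAGE = 15;

interface TimeEstimate {
  minSeconds: number;
  maxSeconds: number;
}

// Estimate OCR time for a document, excluding the first-run language data download.
function estimateOcrSeconds(pageCount: number): TimeEstimate {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  return {
    minSeconds: pageCount * MIN_SECONDS_PER_PAGE,
    maxSeconds: pageCount * MAX_SECONDS_PER_PAGE,
  };
}

console.log(estimateOcrSeconds(10)); // → { minSeconds: 50, maxSeconds: 150 }
```

For 10 pages this gives 50 to 150 seconds, which is in line with the rough figure of a minute or two.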

Question 3

Is anything uploaded?

Accepted Answer

No. Tesseract runs in your browser via WebAssembly. The PDF, OCR'd text, and output never leave the page.

PDF OCR — Make Scans Searchable

What "searchable PDF" means

When this works well

When it struggles

Privacy