PDF to Markdown
Extract PDF text with heading detection. Best for text-heavy documents.
How heading detection works
pdf.js exposes the font size of every text run. The tool clusters sizes:
- Find the most common (modal) size — that's body text
- Sizes ~1.8× body size → H1, ~1.4× → H2, ~1.2× → H3
- Anything below body size → footnotes / small text
This works well for documents with a consistent typographic hierarchy. Magazines, decorative layouts, and PDFs exported from inconsistent sources may need manual cleanup.
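The heading-detection steps above can be sketched in TypeScript. This is a minimal sketch under stated assumptions, not the tool's source: `TextRun`, `fontSize`, `modalSize`, `classify` and `headingRoles` are hypothetical names. The run shape mirrors the `str` and 6-element `transform` fields of a pdf.js text-content item, size is read as the length of the transform's vertical axis, and the cutoffs (1.6 / 1.3 / 1.1, plus a 5% tolerance for "below body size") are assumed midpoints between the approximate ratios listed above.

```typescript
// Hypothetical run shape: mirrors the `str` and 6-element `transform`
// ([a, b, c, d, e, f]) fields of a pdf.js text-content item.
interface TextRun {
  str: string;
  transform: number[];
}

type Role = "h1" | "h2" | "h3" | "body" | "small";

// Font size = length of the text's vertical axis (c, d), so rotated runs
// still report their size. Rounded to 0.1 so near-equal sizes share a bucket.
function fontSize(run: TextRun): number {
  const [, , c, d] = run.transform;
  return Math.round(Math.hypot(c, d) * 10) / 10;
}

// Most common size across runs: treated as body text.
function modalSize(sizes: number[]): number {
  const counts = new Map<number, number>();
  for (const s of sizes) counts.set(s, (counts.get(s) ?? 0) + 1);
  let best = sizes[0];
  let bestCount = 0;
  counts.forEach((count, size) => {
    if (count > bestCount) {
      best = size;
      bestCount = count;
    }
  });
  return best;
}

// Assumed cutoffs: midpoints between the ~1.8 / ~1.4 / ~1.2 / 1.0 targets,
// plus a 5% tolerance before text counts as "below body size".
function classify(size: number, body: number): Role {
  const ratio = size / body;
  if (ratio >= 1.6) return "h1";
  if (ratio >= 1.3) return "h2";
  if (ratio >= 1.1) return "h3";
  if (ratio < 0.95) return "small";
  return "body";
}

function headingRoles(runs: TextRun[]): Role[] {
  if (runs.length === 0) return [];
  const sizes = runs.map(fontSize);
  const body = modalSize(sizes);
  return sizes.map((s) => classify(s, body));
}

// Usage: three 10pt runs, then 18pt, 14pt, 12pt and 8pt runs.
const sample = [10, 10, 10, 18, 14, 12, 8].map((s) => ({
  str: "text",
  transform: [s, 0, 0, s, 72, 700],
}));
console.log(headingRoles(sample).join(", "));
// body, body, body, h1, h2, h3, small
```

Comparing each size to the modal size, rather than using absolute point sizes, is what lets the same thresholds work for a 9pt paper and a 12pt report.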
What's preserved
- Paragraph breaks
- Heading hierarchy (H1-H3)
- Inline bold (when it's encoded in the font name, e.g. Helvetica-Bold)
- Bullet-list indentation
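A minimal sketch of how classified blocks might be turned into Markdown, covering the preserved features listed above: paragraph breaks, H1–H3 prefixes, bold inferred from font names, and bullet indentation. Everything here is an assumption rather than the tool's code: the `Block` and `Span` shapes, the `isBold` keyword list, the bullet glyph set, and the 20-unit `indentStep` per nesting level are hypothetical, and `fontName` is assumed to hold the PDF's base font name (such as `Helvetica-Bold`) rather than whatever internal identifier a parser assigns.

```typescript
type Role = "h1" | "h2" | "h3" | "body" | "small";

// A run of text in one font. `fontName` is assumed to be the PDF's base
// font name (e.g. "ABCDEF+Helvetica-Bold").
interface Span {
  text: string;
  fontName: string;
}

// Hypothetical output of an earlier grouping step: one heading, paragraph,
// or list item, with its left edge `x` in PDF units.
interface Block {
  role: Role;
  x: number;
  spans: Span[];
}

const HEADING_PREFIX: Partial<Record<Role, string>> = { h1: "# ", h2: "## ", h3: "### " };
const BULLET = /^[•◦▪*-]\s+/; // assumed set of bullet glyphs

// Bold is inferred from the font name only; the keyword list is an assumption.
function isBold(fontName: string): boolean {
  return /bold|black|heavy/i.test(fontName);
}

// Wrap bold spans in ** while keeping their surrounding whitespace outside.
function renderSpans(spans: Span[]): string {
  return spans
    .map((s) => {
      if (!isBold(s.fontName) || s.text.trim() === "") return s.text;
      const [, lead, core, trail] = /^(\s*)([\s\S]*?)(\s*)$/.exec(s.text)!;
      return `${lead}**${core}**${trail}`;
    })
    .join("");
}

function plainText(b: Block): string {
  return b.spans.map((s) => s.text).join("");
}

function toMarkdown(blocks: Block[], indentStep = 20): string {
  // Nesting depth is measured from the leftmost bullet in the document.
  const bulletXs = blocks.filter((b) => BULLET.test(plainText(b))).map((b) => b.x);
  const baseX = bulletXs.length > 0 ? Math.min(...bulletXs) : 0;
  const out: string[] = [];
  let prevWasItem = false;
  for (const b of blocks) {
    const plain = plainText(b);
    const prefix = HEADING_PREFIX[b.role];
    const isItem = !prefix && BULLET.test(plain);
    let line: string;
    if (prefix) {
      line = prefix + plain.trim(); // headings are emitted without bold markup
    } else if (isItem) {
      const level = Math.max(0, Math.round((b.x - baseX) / indentStep));
      line = "  ".repeat(level) + "- " + renderSpans(b.spans).replace(BULLET, "");
    } else {
      line = renderSpans(b.spans);
    }
    // Consecutive list items stay tight; other blocks get a blank line (paragraph break).
    if (out.length > 0) out.push(isItem && prevWasItem ? "\n" : "\n\n");
    out.push(line);
    prevWasItem = isItem;
  }
  return out.join("");
}

console.log(
  toMarkdown([
    { role: "h1", x: 72, spans: [{ text: "Intro", fontName: "Helvetica-Bold" }] },
    {
      role: "body",
      x: 72,
      spans: [
        { text: "Plain ", fontName: "Helvetica" },
        { text: "bold", fontName: "Helvetica-Bold" },
        { text: " text.", fontName: "Helvetica" },
      ],
    },
    { role: "body", x: 72, spans: [{ text: "• First", fontName: "Helvetica" }] },
    { role: "body", x: 92, spans: [{ text: "• Nested", fontName: "Helvetica" }] },
  ]),
);
// # Intro
//
// Plain **bold** text.
//
// - First
//   - Nested
```

The sketch indents nested items by two spaces per level, which is enough for a CommonMark parser to nest an item under a "- " parent. If a PDF marks bold with a synthetic stroke instead of a bold font, a name-based check like this one will miss it, which is why the list above says "when it's encoded in the font name".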
What's not
- Images (replaced with [image])
- Complex tables (flattened into plain text)
- Two-column layouts (text may interleave)
- Footnotes (mixed into body text)
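The two-column interleaving can be reproduced with a short sketch, assuming (this page does not confirm it) that lines are put into reading order purely by position: top-to-bottom, then left-to-right. `PlacedLine` and `naiveOrder` are hypothetical names. When both columns have lines on the same baselines, that order alternates between the columns.

```typescript
// Hypothetical positioned line; y is the baseline in PDF space (grows upward).
interface PlacedLine {
  text: string;
  x: number;
  y: number;
}

// Naive reading order: top-to-bottom by baseline, then left-to-right.
function naiveOrder(lines: PlacedLine[]): string[] {
  return lines
    .slice()
    .sort((a, b) => b.y - a.y || a.x - b.x)
    .map((l) => l.text);
}

// Two columns whose lines share baselines (left at x=72, right at x=320).
const page: PlacedLine[] = [
  { text: "Left 1", x: 72, y: 700 },
  { text: "Left 2", x: 72, y: 686 },
  { text: "Right 1", x: 320, y: 700 },
  { text: "Right 2", x: 320, y: 686 },
];
console.log(naiveOrder(page).join(" | "));
// Left 1 | Right 1 | Left 2 | Right 2
```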
Privacy
The PDF is parsed in your browser via pdf.js. Nothing is uploaded.