A PDF content summarizer that keeps the outline — section by section, not flattened into a blob.
Most summarizers concatenate everything and hand back one paragraph that loses the document's shape. This one detects Abstract, Methods, Results, clauses, and chapters individually — then writes a TL;DR per section so the original hierarchy survives.
Structure preserved, not flattened.
A 40-page PDF isn't 40 pages of one thing — it's an outline. The summarizer should return an outline too.
Most LLM summarizers chunk a PDF, summarize each chunk, and concatenate the results into one prose paragraph. That output is convenient for tweets but useless for documents that have shape: research papers, contracts, board reports, multi-chapter handbooks.
A structure-aware summarizer instead detects the document's actual hierarchy first — Abstract, Methods, Results, Discussion, or Clause 1, Clause 2, Clause 3 — and writes one TL;DR per detected section. The output is itself an outline, mirroring the source.
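To make "the output is itself an outline" concrete, here is a minimal sketch of what such a result could look like as a data structure. The `Section` class, its field names, and the sample text are hypothetical illustrations, not the product's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One detected section plus its TL;DR (hypothetical shape)."""
    title: str
    tldr: str
    children: list["Section"] = field(default_factory=list)


def outline(section: Section, depth: int = 0) -> list[str]:
    """Flatten the tree into indented 'title: tldr' lines, parents first."""
    lines = [f"{'  ' * depth}{section.title}: {section.tldr}"]
    for child in section.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Illustrative content only; a real summary would come from the model.
paper = Section("Paper", "Overall finding in one line.", [
    Section("Abstract", "What was studied."),
    Section("Methods", "How it was studied."),
    Section("Results", "What was found."),
])

print("\n".join(outline(paper)))
```

Because every TL;DR stays attached to its own node, the summary keeps the same parent-child shape as the source instead of collapsing into one paragraph.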
The difference matters when you need to find something. With a flat blob you re-read the whole summary to locate the part about pricing. With per-section TL;DRs you jump straight to "Clause 4 · Pricing" and find a 2-line answer with a link back to the source paragraph.
Built for documents with shape.
If your PDF has chapters, clauses, line items, or agenda blocks, a per-section summary preserves what a flat one destroys.
How section detection works.
Heading detection is a typography problem before it's a language problem. The pipeline reads the page like a designer would, then summarizes like an editor would.
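The typography-first idea can be sketched roughly as follows. This is a simplified illustration, not the actual pipeline: the `TextRun` class, the 1.15 size ratio, the 600 weight cutoff, and the decimal-only numbering regex are all assumptions chosen for the example.

```python
import re
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class TextRun:
    """A piece of text with layout metadata (hypothetical shape)."""
    text: str
    x: float
    y: float
    font_size: float
    weight: int  # assumed CSS-like scale: 400 regular, 700 bold
    page: int


# Matches decimal numbering such as "2.", "2.1", "1.1.2" at the start of a line.
NUMBERING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+\S")


def heading_depth(run: TextRun, body_size: float) -> Optional[int]:
    """Return a 1-based heading depth, or None if the run looks like body text."""
    # Assumed thresholds: noticeably larger than body text, or bold.
    looks_like_heading = run.font_size > body_size * 1.15 or run.weight >= 600
    if not looks_like_heading:
        return None
    match = NUMBERING.match(run.text)
    if match:
        # "1.1.2" has two dots, so it sits at depth 3.
        return match.group(1).count(".") + 1
    return 1


def detect_headings(runs: list[TextRun]) -> list[tuple[str, int]]:
    """Use the median font size as the body-text baseline."""
    body_size = median(r.font_size for r in runs)
    return [(r.text, d) for r in runs
            if (d := heading_depth(r, body_size)) is not None]


runs = [
    TextRun("2. Methods", 72, 700, 16, 700, 3),
    TextRun("We sampled 40 documents from the archive.", 72, 680, 11, 400, 3),
    TextRun("2.1 Sampling", 72, 640, 13, 700, 3),
    TextRun("Documents were drawn at random.", 72, 620, 11, 400, 3),
    TextRun("Each was parsed twice.", 72, 600, 11, 400, 3),
]
print(detect_headings(runs))
```

The layout check runs first and the numbering pattern only refines the depth, so a body sentence that happens to start with a number is not promoted to a heading.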
- Each text run is read with its x, y, fontSize, weight, and page. Scanned PDFs are OCR'd first so the same metadata exists.
- Numbering patterns (1.1.2, I.A) confirm hierarchy depth.
Output formats: pick the shape you need.
Same hierarchical extraction, three rendering modes. Switch between them without re-summarizing.
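One way to switch modes without re-summarizing is to keep the extracted tree as plain data and treat each format as a pure rendering function over it. The sketch below assumes three modes named `markdown`, `outline`, and `json`; those names, the dictionary layout, and the sample text are illustrative guesses, not the product's actual modes.

```python
import json

# Hypothetical summary tree: produced once, rendered many times.
doc = {
    "title": "Service Agreement",
    "tldr": "Two-year services contract with annual renewal.",
    "children": [
        {"title": "Clause 4 · Pricing",
         "tldr": "Fixed monthly fee, reviewed yearly.",
         "children": []},
    ],
}


def to_markdown(node, level=1):
    """Render headings with '#' depth matching the tree depth (capped at 6)."""
    parts = [f"{'#' * min(level, 6)} {node['title']}\n\n{node['tldr']}\n"]
    parts += [to_markdown(c, level + 1) for c in node.get("children", [])]
    return "\n".join(parts)


def to_outline(node, depth=0):
    """Render an indented bullet outline, one line per section."""
    lines = [f"{'  ' * depth}- {node['title']}: {node['tldr']}"]
    lines += [to_outline(c, depth + 1) for c in node.get("children", [])]
    return "\n".join(lines)


def to_json(node):
    """Render the tree as JSON for downstream tools."""
    return json.dumps(node, indent=2, ensure_ascii=False)


RENDERERS = {"markdown": to_markdown, "outline": to_outline, "json": to_json}


def render(node, mode):
    """Pick a renderer by name; the summaries themselves are never recomputed."""
    if mode not in RENDERERS:
        raise ValueError(f"unknown mode: {mode}")
    return RENDERERS[mode](node)


print(render(doc, "markdown"))
```

Changing format is then only a call to `render` with a different mode name, so no section is summarized a second time.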
What you get vs a flat summary.
Both produce text. Only one preserves the document.
- ✗ Loses the outline. Methods and Discussion get blurred into the same prose stream.
- ✗ Cross-section citations. A claim from Results may be attributed to a passage in Methods.
- ✗ No navigation. You re-read the summary to find a topic.
- ✗ Length collapses meaning. A 40-page contract becomes 200 words; clauses disappear.
- ✗ Hard to export structurally. The Word doc has no headings.
- ✓ Outline preserved. Each Abstract, Method, clause, or chapter has its own block.
- ✓ Section-scoped citations. A bullet in Methods cites only Methods passages.
- ✓ Jump to topic. Click "Clause 4" and read 60 words instead of re-scanning the whole summary.
- ✓ Length adapts to depth. Long sections get longer summaries automatically.
- ✓ Structural export. DOCX with H1/H2 styles, Markdown with proper heading levels.
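Section-scoped citation can be read as a simple filter: a summary bullet may only cite passages that belong to the section it summarizes. The sketch below shows that check under stated assumptions; the `Passage` class, the ID scheme, and the `scoped_citations` helper are hypothetical, not the tool's real interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    """A source paragraph tagged with the section it came from (hypothetical)."""
    passage_id: str
    section: str
    text: str


def scoped_citations(section, cited_ids, passages):
    """Split cited IDs into (kept, rejected).

    An ID is kept only if the passage exists and lives in `section`;
    unknown IDs and passages from other sections are rejected.
    """
    by_id = {p.passage_id: p for p in passages}
    kept, rejected = [], []
    for pid in cited_ids:
        passage = by_id.get(pid)
        if passage is not None and passage.section == section:
            kept.append(pid)
        else:
            rejected.append(pid)
    return kept, rejected


passages = [
    Passage("m1", "Methods", "Participants were sampled at random."),
    Passage("m2", "Methods", "Each document was parsed twice."),
    Passage("r1", "Results", "Accuracy improved on long documents."),
]

# A Methods bullet that tries to cite a Results passage and an unknown ID.
print(scoped_citations("Methods", ["m1", "r1", "zz"], passages))
```

Rejecting out-of-section IDs is what prevents a Results claim from being attributed to a Methods passage.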
When section-aware actually matters.
A two-page memo doesn't need this. A forty-page contract does.
Pair it with the rest of the privacy stack.
Summarization is one piece — the other tools handle the document around it.
Frequently asked questions
How does the summarizer detect sections in a PDF?
Can I get one summary per chapter instead of one for the whole document?
What if my PDF doesn't have explicit headings?
Can I export the section summaries as a Word doc?
Does each section summary include its own source citations?
Stop reading forty pages. Start reading forty TL;DRs — one per section.
Drop a PDF, watch the outline appear, get a per-section TL;DR with section-scoped citations. Export to Word, Markdown, or back to PDF — structure intact.
Open the summarizer