Structure-aware summarization

A PDF content summarizer that keeps the outline — section by section, not flattened into a blob.

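Most summarizers stitch everything together and return a single paragraph that loses the document's structure. This tool detects abstracts, methods, results, clauses, and chapters separately, then produces a short summary for each section while preserving the original hierarchy.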

Hierarchical output · Per-section TL;DR · Section-scoped citations · DOCX / MD / PDF export

Open the summarizer · How section detection works

Abstract · TL;DR

Study tests retrieval-grounded summarization on 4k clinical PDFs.

Methods · TL;DR

Two-stage pipeline: heading detection, then per-section abstractive pass.

Results · TL;DR

+18 ROUGE-L over flat baselines; section attribution 96% accurate.

Discussion · TL;DR

Outline-preserving output cuts the time reviewers spend on long PDFs by about 40%.

Structure preserved, not flattened.

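A 40-page PDF is not 40 pages of the same thing; it is an outline. A summarizer should return an outline too.

Most LLM summarizers chunk the PDF, summarize each chunk, and stitch the results into a single block of prose. That output is handy for a tweet but useless for documents that have shape: research papers, contracts, board reports, multi-chapter handbooks.

A structure-aware summarizer first detects the document's actual hierarchy (Abstract, Methods, Results, Discussion, or Clause 1, Clause 2, Clause 3) and then produces one TL;DR per detected section. The output is itself an outline that mirrors the source document.

The difference shows when you need to find something. With a flat blob, you re-read the whole summary to locate the part about pricing. With per-section summaries, you jump straight to "Clause 4 · Pricing" and find a two-line answer with a link back to the source passage.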

Flat blob output

Section-aware: Abstract · Methods · Results · Discussion

Built for documents with shape.

If your PDF has chapters, clauses, line items, or agenda blocks, per-section summarization preserves what a flat summary destroys.

Research papers

IMRAD structure preserved — Abstract, Introduction, Methods, Results, Discussion each get their own TL;DR with section-scoped citations.

IMRAD

Contracts

Each clause is summarized on its own (term, pricing, liability, termination), so you can review contractual obligations clause by clause.

Per-clause

Legal briefs

Statement of facts, Argument I, Argument II, and conclusion stay as separate blocks rather than being merged into a single narrative.

Sectioned

Financial reports

Revenue, operating expenses, cash flow, and risk factors are each summarized alongside the original figures.

Line items

Meeting minutes

Agenda items become sections — each gets a decision-and-action TL;DR, so attendees see what was concluded per topic.

Per-agenda

How section detection works.

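Heading detection is a typography problem first and a language problem second. The pipeline reads the page like a designer, then summarizes like an editor.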

PDF parsing

Extract the text layer with positional metadata: each text span gets x, y, fontSize, weight, and page. Scanned PDFs are run through OCR first so that they carry the same metadata.

Heading detection

Cluster spans by typography: bigger font + bolder weight + leading whitespace = heading candidate. Numbering patterns (1.1.2, I.A) confirm hierarchy depth.
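The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the `Span` fields, the thresholds (`size_ratio`, `gap_ratio`), and the "two of three signals" rule are all assumptions made for the example, and only decimal numbering (1, 1.1, 1.1.2) is handled, not Roman or lettered schemes such as I.A.

```python
import re
from dataclasses import dataclass
from statistics import median
from typing import List, Optional, Tuple


@dataclass
class Span:
    # Hypothetical span record; field names are illustrative only.
    text: str
    font_size: float
    bold: bool
    space_above: float  # vertical gap to the previous span, in points


# Decimal numbering such as "1.", "1.1", "1.1.2" followed by heading text.
NUMBERING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+\S")


def numbering_depth(text: str) -> Optional[int]:
    """Return the hierarchy depth implied by a numeric prefix, or None."""
    match = NUMBERING.match(text)
    if match is None:
        return None
    return match.group(1).count(".") + 1


def heading_candidates(
    spans: List[Span], size_ratio: float = 1.15, gap_ratio: float = 1.5
) -> List[Tuple[Span, Optional[int]]]:
    """Flag spans showing at least two of: larger font, bold weight, extra leading space."""
    body_size = median(s.font_size for s in spans)
    body_gap = median(s.space_above for s in spans)
    found = []
    for span in spans:
        signals = 0
        if span.font_size >= body_size * size_ratio:
            signals += 1
        if span.bold:
            signals += 1
        if span.space_above >= body_gap * gap_ratio:
            signals += 1
        if signals >= 2:
            found.append((span, numbering_depth(span.text)))
    return found
```

Using the median as the body-text baseline keeps the comparison relative to each document, so a report set in 9 pt and a handbook set in 12 pt can share the same ratios.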

Semantic block grouping

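Body paragraphs are assigned to the nearest preceding heading. For PDFs without explicit headings, vector embeddings detect topic shifts and generate block labels.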
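As a rough sketch of the "nearest preceding heading" rule, the following Python walks the page content in reading order and attaches each paragraph to the most recent heading. The `Section` class, the `(kind, depth, text)` tuple shape, and the "Front matter" label for text that appears before any heading are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Section:
    # Hypothetical container for one detected section.
    title: str
    depth: int
    paragraphs: List[str] = field(default_factory=list)


def group_blocks(items: List[Tuple[str, Optional[int], str]]) -> List[Section]:
    """Group ordered ("heading", depth, text) / ("body", None, text) items into sections."""
    # Text before the first heading lands in a placeholder section (assumed label).
    sections = [Section("Front matter", 0)]
    for kind, depth, text in items:
        if kind == "heading":
            sections.append(Section(text, depth or 1))
        else:
            sections[-1].paragraphs.append(text)
    # Drop the placeholder if nothing preceded the first heading.
    if not sections[0].paragraphs:
        sections.pop(0)
    return sections
```

Because the input is processed strictly in reading order, a paragraph can never be attached to a heading that comes after it, which is what keeps Methods text out of the Results block.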

Per-section abstractive summary

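Each block is summarized independently within its own section-scoped context, with no interference between blocks. Citations resolve to the paragraph level within the block.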

Output formats — pick the shape you need.

Same hierarchical extraction, three rendering modes. Switch between them without re-summarizing.

Bullet TL;DR

Three to five bullets per section, suited to quick scanning, briefing docs, and email digests skimmed by topic.

Methods

Two-stage retrieval pipeline

N=412 clinical PDFs sampled

ROUGE-L primary metric

Executive paragraph

One tight paragraph per section, for readers who prefer continuous text. It keeps the logical links between findings and suits memos and reports.

Results

The section-aware version scores 18 ROUGE-L points higher than flat baselines on held-out test documents, with 96% section-attribution accuracy.

Outline / mind-map

A collapsible tree of sections and subsections, suited to long PDFs that you navigate before you read.

Paper
  Abstract
  Methods
    Sampling
    Pipeline
  Results

What you get vs a flat summary.

Both produce text. Only one preserves the document.

Flat blob · Typical summarizer

One summary for the whole document

Loses the outline. Methods and Discussion blend into one stream of text and are hard to tell apart.
Cross-section citations. A conclusion from Results may be wrongly attributed to a passage in Methods.
No navigation. You have to re-read the summary to find a topic.
Length collapses meaning. A 40-page contract becomes 200 words; clauses disappear.
Hard to export structurally. The Word doc has no headings.

Section-aware · This tool

One TL;DR per detected section, hierarchy intact

Outline preserved. Each Abstract, Method, clause, or chapter has its own block.
Section-scoped citations. A bullet in Methods cites only Methods passages.
Jump to topic. Click "Clause 4" and read 60 words instead of re-scanning the whole summary.
Length adapts to depth. Long sections get longer summaries automatically.
Structural export. DOCX with H1/H2 styles, Markdown with correct heading levels.

When section-aware actually matters.

A two-page memo doesn't need this. A forty-page contract does.

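Long technical PDFs

When a document runs past 40 pages and has distinct phases (background, design, evaluation), a flat summary compresses them into one undifferentiated paragraph, and you lose the ability to browse by topic.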

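Multi-author papers

Each author writes a different section in a different style and vocabulary. Per-section summarization respects those boundaries instead of forcing them into a false unified narrative.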

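Contracts where each clause counts

In a 30-clause master services agreement, every clause is its own negotiation surface. Lumping pricing and termination together hides exactly what you need to flag.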

Works alongside the other tools in the privacy stack.

Summarization is only one piece; the other tools handle the rest of the document.

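Frequently asked questions

How does the summarizer detect sections in a PDF?

Section detection combines typography analysis (font size jumps, weight changes, all-caps usage) with positional cues (vertical spacing, indentation, numbering patterns like 1., 1.1, I., A.). The parser extracts a heading tree from the PDF's text layer, validates it against page geometry, and groups paragraphs into the section they belong to. The result is a hierarchical outline that drives per-section summarization. See the technical flow for the four-stage pipeline.

Can I get one summary per chapter instead of one for the whole document?

Yes, that's the default behavior. The summarizer treats each detected section (chapter, clause, IMRAD block, agenda item) as its own unit and produces an independent TL;DR for it. You also get a roll-up executive paragraph at the top, but the per-section breakdown is the primary output and can be exported on its own. Open the tool at /summarize-pdf-ai to try it.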

What if my PDF doesn't have explicit headings?

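For documents without typographic headings (plain prose, scanned articles, transcripts), the tool falls back to semantic chunking: it clusters paragraphs by detecting topic shifts with vector embeddings, then assigns synthetic section labels. The output is still hierarchical: you get summaries grouped by topic, not by arbitrary chunk.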
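One common way to detect topic shifts from embeddings is to compare each paragraph's vector with the previous one and start a new group when their cosine similarity drops below a threshold. The sketch below shows that idea under stated assumptions: the vectors are supplied by whatever embedding model is in use (none is called here), and the `threshold` value of 0.5 is an arbitrary illustration, not a setting taken from the tool.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_boundaries(vectors: List[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Return indices i where paragraph i is dissimilar to paragraph i - 1."""
    return [
        i
        for i in range(1, len(vectors))
        if cosine(vectors[i - 1], vectors[i]) < threshold
    ]


def group_by_topic(
    paragraphs: List[str], vectors: List[Sequence[float]], threshold: float = 0.5
) -> List[List[str]]:
    """Split paragraphs into consecutive groups at each detected topic boundary."""
    cuts = set(topic_boundaries(vectors, threshold))
    groups: List[List[str]] = []
    for index, paragraph in enumerate(paragraphs):
        if index == 0 or index in cuts:
            groups.append([])
        groups[-1].append(paragraph)
    return groups
```

Comparing only adjacent paragraphs keeps the groups contiguous, so each synthetic section still maps to a continuous range of the source document.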

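Can I export the section summaries as a Word doc?

Yes. Export options include Word (.docx) with proper heading styles applied, Markdown with H1/H2 hierarchy intact, plain text, and PDF. The Word export keeps the section structure so you can drop it into a report or briefing template without re-formatting. Use PDF to Word (local) if you need the original PDF in editable form alongside the summary.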
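To illustrate what "hierarchy intact" means for the Markdown export, here is a small sketch that maps each section's depth to a heading level. The `(depth, title, summary)` tuple shape and the `to_markdown` name are hypothetical; the tool's real export code is not shown in this page.

```python
from typing import List, Tuple


def to_markdown(sections: List[Tuple[int, str, str]]) -> str:
    """Render ordered (depth, title, summary) tuples as Markdown headings plus text."""
    lines: List[str] = []
    for depth, title, summary in sections:
        # Markdown supports heading levels 1 through 6, so clamp the depth.
        level = min(max(depth, 1), 6)
        lines.append("#" * level + " " + title)
        lines.append("")
        lines.append(summary)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

For example, `to_markdown([(1, "Methods", "Two-stage pipeline."), (2, "Sampling", "Details.")])` puts Methods under an H1 and Sampling under an H2, so the outline survives when the file is opened in any Markdown viewer.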

Does each section summary include its own source citations?

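Yes. Each section summary carries page and paragraph anchors back to the source PDF, so a bullet in the Methods summary cites the exact Methods passage (not one from Results). Click any bullet to jump to the highlighted source span in the embedded viewer. Citations are scoped to their own section, which avoids the cross-section misattribution common in flat summarizers. To dig deeper into a section, switch to chat mode and ask follow-ups.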

Stop reading forty pages. Start reading forty TL;DRs — one per section.

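Upload a PDF, watch the outline appear, and get per-section summaries with section-scoped citations. Export to Word, Markdown, or PDF with the structure fully intact.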

Open the summarizer