Doc2X API v3 JSON Output Guide
Overview
v3 JSON is not a separate API. It is the extended output of the same parsing flow when using the v3-2026 model:
- Create a task with `POST /api/v2/parse/preupload` and pass `model: "v3-2026"`
- Poll the task with `GET /api/v2/parse/status`
- When `status=success`, each page returns `page.layout.blocks` in addition to the original `page.md`
Compared with the older result that mainly exposed md, v3 JSON also provides:
- structured block-level content
- parent-child relations between blocks
- per-page reading order
- cropped image URLs for figure blocks
- HTML content for table blocks
How To Enable It
You can enable the v3-2026 model with the following complete request:
curl --location --request POST 'https://v2.doc2x.noedgeai.com/api/v2/parse/preupload' \
--header 'Authorization: Bearer sk-xxx' \
--header 'Content-Type: application/json' \
--data-raw '{
"model": "v3-2026"
}'
After receiving the returned uid and upload URL, upload the file with PUT, then poll `GET /api/v2/parse/status?uid=...` for the final result.
Response Overview
Top-level fields
| Field | Type | Description |
|---|---|---|
| `code` | string | Request result, `success` on success |
| `data.status` | string | Task status, usually `processing`, `failed`, or `success` |
| `data.progress` | int | Task progress in the range 0 to 100 |
| `data.result.version` | string | Version field; currently an empty string in the sample and should be treated as reserved |
| `data.result.pages` | array | Per-page parsing results |
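The polling step can be sketched around the top-level fields above. This is a minimal sketch, not an official client: `wait_for_result` and its `fetch_status` argument are hypothetical names, and `fetch_status` stands for whatever function in your code performs `GET /api/v2/parse/status?uid=...` and returns the decoded JSON body. Treating any `code` other than `success` as an error is also an assumption.

```python
import time
from typing import Any, Callable, Dict


def wait_for_result(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_seconds: float = 2.0,
    max_attempts: int = 150,
) -> Dict[str, Any]:
    """Poll until data.status is "success", then return data.result."""
    for _ in range(max_attempts):
        body = fetch_status()
        # Assumption: any code other than "success" is a request-level error.
        if body.get("code") != "success":
            raise RuntimeError(f"request failed: {body.get('code')!r}")
        data = body.get("data") or {}
        status = data.get("status")
        if status == "success":
            return data.get("result") or {}
        if status == "failed":
            raise RuntimeError("parse task failed")
        # Any other status (for example "processing") means: keep waiting.
        time.sleep(interval_seconds)
    raise TimeoutError("task did not reach a final status in time")
```

Passing the HTTP call in as a function keeps the loop independent of the HTTP library and makes it easy to exercise with canned responses.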
page fields
| Field | Type | Description |
|---|---|---|
| `page.url` | string | Page-level image URL; empty in the current sample |
| `page.page_idx` | int | Page index starting from 0 |
| `page.page_width` | int | Page width in pixels |
| `page.page_height` | int | Page height in pixels |
| `page.md` | string | Markdown content for the page |
| `page.score` | int | Page quality score in the range 0 to 100 |
| `page.layout.blocks` | array | Structured block list for the page |
Response Example
The following example is shortened from a real v3 response and keeps the key structures for md, blocks, attributes, and linking.
{
"code": "success",
"data": {
"status": "success",
"progress": 100,
"result": {
"version": "",
"pages": [
{
"url": "",
"page_idx": 0,
"page_width": 1753,
"page_height": 2338,
"md": "<!-- Meanless: Water Research 264 (2024) 122255... -->\n\n# Non-radical oxidation driven by iron-based materials...",
"score": 85,
"layout": {
"blocks": [
{
"id": "blk_p0_0",
"type": "Text",
"text": "Water Research 264 (2024) 122255",
"bbox": [699, 98, 1046, 132],
"reading_order": 0,
"parent_id": "",
"src": "",
"attributes": {
"is_boilerplate": true
}
},
{
"id": "blk_p0_7",
"type": "Title",
"text": "Non-radical oxidation driven by iron-based materials without energy assistance in wastewater treatment",
"bbox": [101, 480, 1343, 585],
"reading_order": 7,
"parent_id": "",
"src": ""
},
{
"id": "blk_p0_21",
"type": "Text",
"text": "The persistence of organic pollutants in aquatic environments presents a significant risk to human health...",
"bbox": [103, 1486, 858, 1924],
"reading_order": 21,
"parent_id": "",
"src": "",
"linking": {
"prev_block_id": "",
"next_block_id": "blk_p0_22"
}
},
{
"id": "blk_p0_24",
"type": "FootnoteGroup",
"text": "",
"bbox": [101, 1964, 1242, 2042],
"reading_order": 24,
"parent_id": "",
"src": ""
}
]
}
}
]
}
}
}
The following is a table block example, showing a TableGroup and its child Table:
{
"id": "blk_p2_2",
"type": "TableGroup",
"text": "",
"bbox": [101, 156, 1649, 907],
"reading_order": 2,
"parent_id": "",
"src": ""
}
{
"id": "blk_p2_5",
"type": "Table",
"text": "",
"bbox": [101, 217, 1649, 819],
"reading_order": 4,
"parent_id": "blk_p2_2",
"src": "",
"table_data": {
"html": "<table><tr><td rowspan=\"2\">Review</td><td rowspan=\"2\">Iron-based materials</td><td colspan=\"5\">Oxidation systems</td>...</tr></table>"
}
}
Common block fields
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique block id; the current sample uses values like `blk_p2_5` |
| `type` | string | Block type such as `Text`, `Title`, or `Table` |
| `text` | string | Text payload of the block; in the current sample, group, figure, and table blocks usually return an empty string |
| `bbox` | int[4] | Absolute coordinates in the form `[x1, y1, x2, y2]` |
| `reading_order` | int | Reading order within the page, starting from 0; group blocks always share the same value as their first child block |
| `parent_id` | string | Parent group id; blocks without a parent currently return an empty string |
| `src` | string | Image resource URL; non-image blocks usually return an empty string |
| `linking` | object | Linking metadata used when one logical piece of content is split across multiple blocks, columns, or pages |
| `attributes` | object | Extra block attributes, only present when needed |
| `table_data` | object | Only present on `Table` blocks; the current sample contains `html` |
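The `parent_id` and `reading_order` fields are enough to rebuild the group hierarchy of a page. The following is a minimal sketch under that assumption; `build_block_tree` is a hypothetical helper name, and the `children` key it adds is not part of the API response.

```python
from collections import defaultdict
from typing import Any, Dict, List


def build_block_tree(blocks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return top-level blocks, each with a "children" list, in reading order."""
    children_of: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for block in blocks:
        # An empty or missing parent_id marks a top-level block.
        children_of[block.get("parent_id") or ""].append(block)

    def attach(parent_id: str) -> List[Dict[str, Any]]:
        ordered = sorted(
            children_of.get(parent_id, []),
            key=lambda b: b.get("reading_order", 0),
        )
        # Copy each block and add its (recursively built) children.
        return [{**b, "children": attach(b["id"])} for b in ordered]

    return attach("")
```

Because `sorted` is stable, a group block and a sibling that share a `reading_order` keep their original relative order.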
linking
linking is used to indicate that multiple blocks should be read as one logical unit. Its basic structure is:
{
"prev_block_id": "blk_p0_21",
"next_block_id": "blk_p0_22"
}
In the current sample, missing `prev_block_id` or `next_block_id` is represented by an empty string.
Common linking scenarios include:
- Cross-column or cross-page paragraph continuation
  - The typical target is `Text`.
  - When a long paragraph is split by multi-column layout or continues onto the next page, `prev_block_id` and `next_block_id` can be used to chain the related text blocks together.
- Cross-column or cross-page table continuation
  - The typical target is `Table`.
  - In general, only `Table` carries `linking`; `TableGroup` usually does not.
  - When one logical table is split across different columns or pages, `linking` can connect the preceding and following table blocks.
- Linking a table-of-contents entry to its page number
  - The typical target is a TOC-related block, for example text blocks inside `TOCGroup`.
  - When the TOC entry text and its page number are detected as separate blocks, `linking` can indicate that they belong to the same TOC item.
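Following a `linking` chain to reassemble one logical paragraph could look like the sketch below. `merge_linked_text` is a hypothetical helper; it assumes block ids are unique across the blocks passed in (collect blocks from all pages first if links cross pages) and joins fragments with a single space, which may not suit every language or hyphenation case.

```python
from typing import Any, Dict, List


def merge_linked_text(blocks: List[Dict[str, Any]]) -> List[str]:
    """Join the text of each linking chain; unlinked blocks pass through as-is."""
    by_id = {b["id"]: b for b in blocks}

    def link(block: Dict[str, Any], key: str) -> str:
        # A missing linking object and an empty string both mean "no neighbour".
        return (block.get("linking") or {}).get(key) or ""

    merged: List[str] = []
    for block in blocks:
        if link(block, "prev_block_id") in by_id:
            continue  # not the head of a chain; it is reached from its head
        parts: List[str] = []
        seen = set()
        current = block
        # Walk next_block_id links; the seen set guards against cycles.
        while current is not None and current["id"] not in seen:
            seen.add(current["id"])
            parts.append(current.get("text") or "")
            current = by_id.get(link(current, "next_block_id"))
        text = " ".join(p for p in parts if p)
        if text:  # skip blocks with no text, such as group containers
            merged.append(text)
    return merged
```

A block whose `prev_block_id` points at a block that was not passed in is treated as the head of its own chain, so partial inputs still produce output.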
Block types
The block types are split into two groups below: general types and group types. Some already appear in the current sample, while others are reserved in the v3-json design and may appear in other documents.
General types
| Type | Meaning | Typical characteristics |
|---|---|---|
| `Text` | Text paragraph blocks, including normal body paragraphs, OCR text paragraphs, headers, footers, and similar text paragraphs | May include `attributes` or `linking` |
| `Title` | Title block, including document titles and section titles | Usually a standalone heading block |
| `Caption` | Figure or table caption text | Often a child of `FigureGroup` or `TableGroup` |
| `Figure` | Figure/image body | `src` is usually non-empty and `parent_id` points to `FigureGroup` |
| `Table` | Table body | `table_data.html` contains table HTML |
| `Equation` | Formula content | Usually a child of `EquationGroup` |
| `EquationNumber` | Equation numbering | Examples: (1), (2) |
| `Underline` | Standalone underline block | Often used for independently detected underline elements |
| `Code` | Code or pseudocode body block | Often used for the code region itself |
Group types
| Type | Meaning | Typical characteristics |
|---|---|---|
| `FigureGroup` | Figure group container | Usually contains `Figure` plus one or more `Caption` blocks |
| `TableGroup` | Table group container | Usually contains `Table` plus one or more `Caption` blocks |
| `EquationGroup` | Equation group container | Usually contains `Equation`, and may contain `EquationNumber` |
| `FootnoteGroup` | Footnote container | Container for the footnote area near the page bottom |
| `ReferenceGroup` | Reference list container | Container for the references section |
| `TOCGroup` | Table-of-contents container | Often used to organize TOC entries and page numbers |
| `CodeGroup` | Parent container for code plus caption/description | Often used to organize code with its descriptive text |
| `SideGroup` | Sidebar, aside, or side-note container | Often used to organize side content |
Detailed notes for each type:
Text
- Text paragraph block.
- Normal body paragraphs, header/footer text paragraphs, and page-number text paragraphs may all be returned as `Text`.
- Boilerplate text such as headers, footers, and page numbers is also returned as `Text`, usually with `attributes.is_boilerplate=true`.
- `linking` may appear when the paragraph is split across columns, blocks, or pages.
Title
- Title block.
- Commonly used for document titles and section titles.
Caption
- Descriptive text for figures, tables, and similar media objects.
- Commonly appears as a child of `FigureGroup` or `TableGroup`.
Figure
- The figure/image itself.
- `src` is the cropped image URL.
- `text` is an empty string in the current sample.
Table
- The table itself.
- The table is returned through `table_data.html`.
- `linking` may appear when one logical table is split across columns or pages.
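Since the table arrives as an HTML string, a client that wants cell text has to parse it. The sketch below uses only the standard library's `html.parser`; `TableRows` and `table_html_to_rows` are hypothetical names, and the sketch deliberately ignores `rowspan` and `colspan`, so merged cells appear once, in the row where they start.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None  # text pieces of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:  # tolerate a cell that appears without a <tr>
                self.rows.append([])
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_html_to_rows(html: str) -> List[List[str]]:
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

For layouts that depend on merged cells, a grid-aware expansion of `rowspan` and `colspan` would be needed on top of this.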
Equation
- Formula body.
- In the current sample, the formula content is stored in `text`.
EquationNumber
- Formula numbering block.
- Usually appears in the same `EquationGroup` as the corresponding `Equation`.
FigureGroup
- Container block for a figure group.
- Usually used to group the figure and its caption(s).
TableGroup
- Container block for a table group.
- Usually used to group the table and its caption(s).
TOCGroup
- Container block for a table-of-contents region.
- If a TOC entry and its page number are detected as separate blocks, `linking` can be used to associate them.
EquationGroup
- Container block for an equation group.
- Usually used for multi-line equations or equations with numbering.
FootnoteGroup
- Container block for the footnote area.
- In the current sample, `text` is an empty string.
ReferenceGroup
- Container block for the references section.
- Usually used to organize continuous reference entries.
attributes field
| Attribute | Type | Meaning |
|---|---|---|
| `is_boilerplate` | bool | Boilerplate text such as headers, footers, page numbers, and repeated publication info; it is generally recommended to remove it |
| `writing_mode` | string | Writing mode such as horizontal or vertical text |
| `rotation` | int | Clockwise rotation angle relative to the default text 0 degree orientation, commonly 90, 180, or 270 |
is_boilerplate
Marks repeated, standardized, or non-body text, for example:
- headers
- footers
- page numbers
- DOI and publication metadata
Notes:
- This flag is not always perfectly accurate in practice. Clients usually treat flagged content as discardable.
- In `md`, this kind of content is usually wrapped in an HTML comment like `<!-- Meanless: ... -->`.
Example:
{
"type": "Text",
"text": "Received 23 May 2024; Received in revised form 22 July 2024; Accepted 11 August 2024",
"attributes": {
"is_boilerplate": true
}
}
writing_mode
Used to represent writing direction, mainly reserved for vertical writing. Integrations should treat it as an extensible string enum. Common design values include:
- `horizontal-tb`: default horizontal writing
- `vertical-rl`: generally used for Classical Chinese, Japanese, and similar writing that flows right-to-left and top-to-bottom
rotation
Describes the visual clockwise rotation angle of a block relative to the default text 0 degree orientation, mainly useful for Text and Caption. Integrations should handle it as an integer angle:
- `0`: default orientation
- `90`
- `180`
- `270`
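Because `attributes` is only present when needed, client code usually wants one place that fills in defaults. The following is a minimal sketch; `read_attributes` and `drop_boilerplate` are hypothetical helpers, and the default values (`False`, `horizontal-tb`, `0`) are an assumption of this sketch based on the descriptions above, not documented API behavior.

```python
from typing import Any, Dict, List

# Assumed defaults for blocks that omit attributes or individual keys.
DEFAULT_ATTRIBUTES: Dict[str, Any] = {
    "is_boilerplate": False,
    "writing_mode": "horizontal-tb",
    "rotation": 0,
}


def read_attributes(block: Dict[str, Any]) -> Dict[str, Any]:
    """Return the block's attributes with defaults filled in."""
    merged = dict(DEFAULT_ATTRIBUTES)
    # Unknown attribute keys are kept, since the field is extensible.
    merged.update(block.get("attributes") or {})
    return merged


def drop_boilerplate(blocks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only blocks that are not flagged as boilerplate."""
    return [b for b in blocks if not read_attributes(b)["is_boilerplate"]]
```

Since the flag is not always accurate, a client may prefer to log or hide flagged blocks instead of discarding them outright.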
table_data field
table_data only appears on Table blocks.
Field description
| Field | Type | Description |
|---|---|---|
html | string | HTML content of the current table |
Example:
{
"table_data": {
"html": "<table><tr><td>Quencher</td><td>...</td></tr></table>"
}
}
Extract figure and table images from the PDF using the JSON
If you want to turn the figures and tables detected in v3 JSON into standalone images, you can use each block's `bbox` together with the page's `page_width` / `page_height` to render the source PDF and crop those regions from it.
pdfdeal already provides helper utilities for this:
- `extract_v3_figure_images`: extract images for `Figure` blocks
- `extract_v3_table_images`: extract images for `Table` blocks
- `scripts/extract_v3_figures.py`: CLI helper for figure extraction
- `scripts/extract_v3_tables.py`: CLI helper for table extraction
How it works
The flow is:
- Read the `v3 JSON` and get `pages`
- Filter `Figure` or `Table` blocks from each page's `layout.blocks`
- Use each block's `bbox` together with the page's `page_width` / `page_height` to map page coordinates into rendered pixel coordinates
- Render only the PDF pages that contain target blocks at the requested `dpi`
- Crop each target region into a standalone PNG and write a `manifest.json`
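The coordinate-mapping step in the flow above comes down to a proportional scale. The sketch below illustrates the idea and is not pdfdeal's actual implementation: `bbox_to_pixels` is a hypothetical function, and it assumes the page was rendered to an image of known size (`rendered_width` by `rendered_height`) covering the whole page.

```python
import math
from typing import Sequence, Tuple


def bbox_to_pixels(
    bbox: Sequence[float],
    page_width: int,
    page_height: int,
    rendered_width: int,
    rendered_height: int,
) -> Tuple[int, int, int, int]:
    """Scale [x1, y1, x2, y2] from page coordinates to rendered-image pixels."""
    scale_x = rendered_width / page_width
    scale_y = rendered_height / page_height
    x1, y1, x2, y2 = bbox
    # Round outward so the crop never cuts into the block, then clamp to the image.
    left = max(0, math.floor(x1 * scale_x))
    top = max(0, math.floor(y1 * scale_y))
    right = min(rendered_width, math.ceil(x2 * scale_x))
    bottom = min(rendered_height, math.ceil(y2 * scale_y))
    return left, top, right, bottom
```

The returned tuple uses the (left, top, right, bottom) order that most imaging libraries accept for a crop box.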
This is useful when you want to:
- extract the original figure region for each `Figure` block from the source PDF
- generate table screenshots for each `Table` block
- save figure and table images locally instead of working only with the JSON structure
CLI examples
The following scripts come from the scripts/ directory in the pdfdeal repository:
- `pdfdeal/scripts/extract_v3_figures.py`
- `pdfdeal/scripts/extract_v3_tables.py`
Extract figure images:
python scripts/extract_v3_figures.py \
--pdf /path/to/input.pdf \
--v3-json /path/to/input_v3.json \
--dpi 200 \
--output-dir ./Output/figures
Extract table images:
python scripts/extract_v3_tables.py \
--pdf /path/to/input.pdf \
--v3-json /path/to/input_v3.json \
--dpi 200 \
--output-dir ./Output/tables
Python example
from pdfdeal import extract_v3_figure_images, extract_v3_table_images
figure_summary = extract_v3_figure_images(
pdf_path="/path/to/input.pdf",
v3_json_path="/path/to/input_v3.json",
dpi=200,
output_dir="./Output/figures",
)
table_summary = extract_v3_table_images(
pdf_path="/path/to/input.pdf",
v3_json_path="/path/to/input_v3.json",
dpi=200,
output_dir="./Output/tables",
)
print(figure_summary["crop_count"], figure_summary["manifest_path"])
print(table_summary["crop_count"], table_summary["manifest_path"])
Output layout
The output directory usually contains:
- `_pages/`: rendered full-page PNGs
- cropped `figure` or `table` PNG files
- `manifest.json`: metadata for each crop, including `page_idx`, `block_id`, `block_xyxy`, `crop_box_pixels`, and output path
manifest.json is useful for:
- mapping each block to a local image file
- downstream multimodal indexing or figure/table post-processing
- debugging whether the crop location matches the `v3 JSON`
Compatibility recommendations
Recommended integration behavior for v3 JSON:
- Keep using `page.md` as the directly displayable or exportable text result.
- Treat `page.layout.blocks` as a structured enhancement layer rather than a replacement for `md`.
- Treat `type`, `attributes`, and `table_data` as extensible fields; do not fail on unknown values.
- Accept empty strings: in the current sample, `parent_id`, `src`, `linking.prev_block_id`, and `linking.next_block_id` may all be empty strings.
- Accept `text: ""` on group blocks: the current sample returns an empty string rather than `null`.
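These recommendations can be folded into one tolerant normalization step. The sketch below is one possible shape, not a prescribed one: `normalize_block` and the keys of the dictionary it returns are hypothetical, and `KNOWN_TYPES` simply lists the block types named in this guide.

```python
from typing import Any, Dict, Optional

# Block types named in this guide; anything else is accepted but flagged.
KNOWN_TYPES = {
    "Text", "Title", "Caption", "Figure", "Table", "Equation",
    "EquationNumber", "Underline", "Code", "FigureGroup", "TableGroup",
    "EquationGroup", "FootnoteGroup", "ReferenceGroup", "TOCGroup",
    "CodeGroup", "SideGroup",
}


def _none_if_empty(value: Any) -> Optional[str]:
    """Map "", None, and missing values to None; keep real strings."""
    return value if value else None


def normalize_block(block: Dict[str, Any]) -> Dict[str, Any]:
    """Return a client-side view that tolerates empty, null, and unknown values."""
    linking = block.get("linking") or {}
    table_data = block.get("table_data") or {}
    return {
        "id": block.get("id") or "",
        "type": block.get("type") or "",
        # Unknown types are reported, never rejected.
        "is_known_type": block.get("type") in KNOWN_TYPES,
        "text": block.get("text") or "",  # "" and null are treated alike
        "parent_id": _none_if_empty(block.get("parent_id")),
        "src": _none_if_empty(block.get("src")),
        "prev_block_id": _none_if_empty(linking.get("prev_block_id")),
        "next_block_id": _none_if_empty(linking.get("next_block_id")),
        "table_html": _none_if_empty(table_data.get("html")),
    }
```

Converting empty strings to `None` at the boundary means the rest of the client only has to handle one "absent" value.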