Doc2X API v3 JSON Output Guide

Overview

v3 JSON is not a separate API. It is the extended output of the same parsing flow when using the v3-2026 model:

Create a task with POST /api/v2/parse/preupload and pass model: "v3-2026"
Poll the task with GET /api/v2/parse/status
When status=success, each page returns page.layout.blocks in addition to the original page.md

Compared with the older result that mainly exposed md, v3 JSON also provides:

structured block-level content
parent-child relations between blocks
per-page reading order
cropped image URLs for figure blocks
HTML content for table blocks

How To Enable It

You can enable the v3-2026 model with the following complete request:

bash

curl --location --request POST 'https://v2.doc2x.noedgeai.com/api/v2/parse/preupload' \
  --header 'Authorization: Bearer sk-xxx' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "model": "v3-2026"
  }'

After receiving the returned uid and upload url, upload the file with PUT, then poll GET /api/v2/parse/status?uid=... for the final result.
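
The request and the polling step can be sketched in Python as follows. This is a minimal sketch, not an official client: `build_preupload_request` and `wait_for_result` are hypothetical helper names, and `fetch_status` is an assumed callable that performs `GET /api/v2/parse/status?uid=...` and returns the decoded JSON body. Only the status values listed in this guide (`processing`, `failed`, `success`) are handled, and the interval and attempt limit are arbitrary choices.

```python
import json
import time
import urllib.request

BASE_URL = "https://v2.doc2x.noedgeai.com"


def build_preupload_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the preupload request that selects v3-2026."""
    body = json.dumps({"model": "v3-2026"}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/api/v2/parse/preupload",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def wait_for_result(fetch_status, interval_s: float = 2.0, max_attempts: int = 150):
    """Poll until the task succeeds or fails, then return data.result.

    fetch_status is a hypothetical callable that performs the status request
    and returns the decoded JSON body as a dict.
    """
    for _ in range(max_attempts):
        data = fetch_status().get("data") or {}
        status = data.get("status")
        if status == "success":
            return data["result"]
        if status == "failed":
            raise RuntimeError("Doc2X task failed")
        # Any other status (usually "processing") means: wait and retry.
        time.sleep(interval_s)
    raise TimeoutError("Doc2X task did not finish in time")
```

Passing the status call in as a function keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.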

Response Overview

Top-level fields

Field	Type	Description
`code`	string	Request result, `success` on success
`data.status`	string	Task status, usually `processing`, `failed`, or `success`
`data.progress`	int	Task progress in the range `0~100`
`data.result.version`	string	Version field; currently an empty string in the sample and should be treated as reserved
`data.result.pages`	array	Per-page parsing results

page fields

Field	Type	Description
`page.url`	string	Page-level image URL; empty in the current sample
`page.page_idx`	int	Page index starting from `0`
`page.page_width`	int	Page width in pixels
`page.page_height`	int	Page height in pixels
`page.md`	string	Markdown content for the page
`page.score`	int	Page quality score in the range `0~100`
`page.layout.blocks`	array	Structured block list for the page

Response Example

The following example is shortened from a real v3 response and keeps the key structures for md, blocks, attributes, and linking.

json

{
  "code": "success",
  "data": {
    "status": "success",
    "progress": 100,
    "result": {
      "version": "",
      "pages": [
        {
          "url": "",
          "page_idx": 0,
          "page_width": 1753,
          "page_height": 2338,
          "md": "<!-- Meanless: Water Research 264 (2024) 122255... -->\n\n# Non-radical oxidation driven by iron-based materials...",
          "score": 85,
          "layout": {
            "blocks": [
              {
                "id": "blk_p0_0",
                "type": "Text",
                "text": "Water Research 264 (2024) 122255",
                "bbox": [699, 98, 1046, 132],
                "reading_order": 0,
                "parent_id": "",
                "src": "",
                "attributes": {
                  "is_boilerplate": true
                }
              },
              {
                "id": "blk_p0_7",
                "type": "Title",
                "text": "Non-radical oxidation driven by iron-based materials without energy assistance in wastewater treatment",
                "bbox": [101, 480, 1343, 585],
                "reading_order": 7,
                "parent_id": "",
                "src": ""
              },
              {
                "id": "blk_p0_21",
                "type": "Text",
                "text": "The persistence of organic pollutants in aquatic environments presents a significant risk to human health...",
                "bbox": [103, 1486, 858, 1924],
                "reading_order": 21,
                "parent_id": "",
                "src": "",
                "linking": {
                  "prev_block_id": "",
                  "next_block_id": "blk_p0_22"
                }
              },
              {
                "id": "blk_p0_24",
                "type": "FootnoteGroup",
                "text": "",
                "bbox": [101, 1964, 1242, 2042],
                "reading_order": 24,
                "parent_id": "",
                "src": ""
              }
            ]
          }
        }
      ]
    }
  }
}
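
A response shaped like the example above can be walked page by page. The sketch below is one possible way to do it; `iter_blocks` and `body_text` are hypothetical helper names, and the choice to skip blocks flagged with `is_boilerplate` is an assumption about what a client wants, not a requirement of the API.

```python
from typing import Iterator


def iter_blocks(response: dict) -> Iterator[tuple[int, dict]]:
    """Yield (page_idx, block) pairs, ordered by reading_order within each page."""
    pages = response.get("data", {}).get("result", {}).get("pages", [])
    for page in pages:
        blocks = (page.get("layout") or {}).get("blocks") or []
        for block in sorted(blocks, key=lambda b: b.get("reading_order", 0)):
            yield page.get("page_idx", 0), block


def body_text(response: dict) -> list[str]:
    """Collect non-empty text from blocks that are not flagged as boilerplate."""
    lines = []
    for _, block in iter_blocks(response):
        if (block.get("attributes") or {}).get("is_boilerplate"):
            continue
        if block.get("text"):
            lines.append(block["text"])
    return lines
```

Group blocks such as `FootnoteGroup` carry an empty `text` in the sample, so they drop out of `body_text` without special handling.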

The following example shows a TableGroup container and its child Table block; the Table points back to the group through parent_id:

json

{
  "id": "blk_p2_2",
  "type": "TableGroup",
  "text": "",
  "bbox": [101, 156, 1649, 907],
  "reading_order": 2,
  "parent_id": "",
  "src": ""
}

json

{
  "id": "blk_p2_5",
  "type": "Table",
  "text": "",
  "bbox": [101, 217, 1649, 819],
  "reading_order": 4,
  "parent_id": "blk_p2_2",
  "src": "",
  "table_data": {
    "html": "<table><tr><td rowspan=\"2\">Review</td><td rowspan=\"2\">Iron-based materials</td><td colspan=\"5\">Oxidation systems</td>...</tr></table>"
  }
}

Common block fields

Field	Type	Description
`id`	string	Unique block id; the current sample uses values like `blk_p2_5`
`type`	string	Block type such as `Text`, `Title`, or `Table`
`text`	string	Text payload of the block; in the current sample, group, figure, and table blocks usually return an empty string
`bbox`	int[4]	Absolute coordinates in the form `[x1, y1, x2, y2]`
`reading_order`	int	Reading order within the page, starting from `0`; group blocks always share the same value as their first child block
`parent_id`	string	Parent group id; blocks without a parent currently return an empty string
`src`	string	Image resource URL; non-image blocks usually return an empty string
`linking`	object	Linking metadata used when one logical piece of content is split across multiple blocks, columns, or pages
`attributes`	object	Extra block attributes, only present when needed
`table_data`	object	Only present on `Table` blocks; the current sample contains `html`
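
The `parent_id` field is enough to rebuild the group structure of a page. The following is a minimal sketch, assuming that top-level blocks carry an empty or missing `parent_id` as in the current sample; `group_children` is a hypothetical helper name.

```python
from collections import defaultdict


def group_children(blocks: list[dict]) -> dict[str, list[dict]]:
    """Map each parent block id to its child blocks, in reading order.

    Top-level blocks (empty or missing parent_id) are stored under the key "".
    """
    children = defaultdict(list)
    for block in sorted(blocks, key=lambda b: b.get("reading_order", 0)):
        children[block.get("parent_id") or ""].append(block)
    return dict(children)
```

For the table example above, `group_children(blocks)["blk_p2_2"]` would contain the `Table` block `blk_p2_5` along with any other children of that group.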

linking

linking is used to indicate that multiple blocks should be read as one logical unit. Its basic structure is:

json

{
  "prev_block_id": "blk_p0_21",
  "next_block_id": "blk_p0_22"
}

In the current sample, missing prev_block_id or next_block_id is represented by an empty string.

Common linking scenarios include:

Cross-column or cross-page paragraph continuation

The typical target is Text.
When a long paragraph is split by multi-column layout or continues onto the next page, prev_block_id and next_block_id can be used to chain the related text blocks together.

Cross-column or cross-page table continuation

The typical target is Table.
In general, only Table carries linking; TableGroup usually does not.
When one logical table is split across different columns or pages, linking can connect the preceding and following table blocks.

Linking a table-of-contents entry to its page number

The typical target is a TOC-related block, for example text blocks inside TOCGroup.
When the TOC entry text and its page number are detected as separate blocks, linking can indicate that they belong to the same TOC item.
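
The linking scenarios above all reduce to following `next_block_id` from one block to the next. The sketch below shows one way to collect such chains; `linked_chains` is a hypothetical helper, and it assumes block ids are unique across all pages passed in (the sample ids embed the page number, but this guide does not state it as a guarantee).

```python
def linked_chains(blocks: list[dict]) -> list[list[dict]]:
    """Return chains of blocks connected through linking.next_block_id.

    blocks may come from several pages; ids are assumed unique across them.
    """
    by_id = {b["id"]: b for b in blocks}
    chains = []
    for block in blocks:
        link = block.get("linking")
        if not link:
            continue
        # A chain starts at a block whose predecessor is empty or not present.
        if link.get("prev_block_id") in by_id:
            continue
        chain, seen = [], set()
        current = block
        # The seen set guards against accidental cycles in the data.
        while current is not None and current["id"] not in seen:
            seen.add(current["id"])
            chain.append(current)
            next_id = (current.get("linking") or {}).get("next_block_id") or ""
            current = by_id.get(next_id)
        if len(chain) > 1:
            chains.append(chain)
    return chains
```

For a paragraph split across columns, joining `text` over one chain gives the logical paragraph; for tables, each chain lists the `Table` fragments in order.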

Block types

The block types are split into two groups below: general types and group types. Some already appear in the current sample, while others are reserved in the v3-json design and may appear in other documents.

General types

Type	Meaning	Typical characteristics
`Text`	Text paragraph blocks, including normal body paragraphs, OCR text paragraphs, headers, footers, and similar text paragraphs	May include `attributes` or `linking`
`Title`	Title block, including document titles and section titles	Usually a standalone heading block
`Caption`	Figure or table caption text	Often a child of `FigureGroup` or `TableGroup`
`Figure`	Figure/image body	`src` is usually non-empty and `parent_id` points to `FigureGroup`
`Table`	Table body	`table_data.html` contains table HTML
`Equation`	Formula content	Usually a child of `EquationGroup`
`EquationNumber`	Equation numbering	Examples: `(1)`, `(2)`
`Underline`	Standalone underline block	Often used for independently detected underline elements
`Code`	Code or pseudocode body block	Often used for the code region itself

Group types

Type	Meaning	Typical characteristics
`FigureGroup`	Figure group container	Usually contains `Figure` plus one or more `Caption` blocks
`TableGroup`	Table group container	Usually contains `Table` plus one or more `Caption` blocks
`EquationGroup`	Equation group container	Usually contains `Equation`, and may contain `EquationNumber`
`FootnoteGroup`	Footnote container	Container for the footnote area near the page bottom
`ReferenceGroup`	Reference list container	Container for the references section
`TOCGroup`	Table-of-contents container	Often used to organize TOC entries and page numbers
`CodeGroup`	Parent container for code plus caption/description	Often used to organize code with its descriptive text
`SideGroup`	Sidebar, aside, or side-note container	Often used to organize side content

Detailed notes for each type:

`Text`

Text paragraph block.
Normal body paragraphs, header and footer text, and page-number text may all be returned as Text.
Boilerplate among these (headers, footers, page numbers) usually carries attributes.is_boilerplate=true.
linking may appear when the paragraph is split across columns, blocks, or pages.

`Title`

Title block.
Commonly used for document titles and section titles.

`Caption`

Descriptive text for figures, tables, and similar media objects.
Commonly appears as a child of FigureGroup or TableGroup.

`Figure`

The figure/image itself.
src is the cropped image URL.
text is an empty string in the current sample.

`Table`

The table itself.
The table is returned through table_data.html.
linking may appear when one logical table is split across columns or pages.

`Equation`

Formula body.
In the current sample, the formula content is stored in text.

`EquationNumber`

Formula numbering block.
Usually appears in the same EquationGroup as the corresponding Equation.

`FigureGroup`

Container block for a figure group.
Usually used to group the figure and its caption(s).

`TableGroup`

Container block for a table group.
Usually used to group the table and its caption(s).

`TOCGroup`

Container block for a table-of-contents region.
If a TOC entry and its page number are detected as separate blocks, linking can be used to associate them.

`EquationGroup`

Container block for an equation group.
Usually used for multi-line equations or equations with numbering.

`FootnoteGroup`

Container block for the footnote area.
In the current sample, text is an empty string.

`ReferenceGroup`

Container block for the references section.
Usually used to organize continuous reference entries.

`attributes` field

Attribute	Type	Meaning
`is_boilerplate`	bool	Boilerplate text such as headers, footers, page numbers, and repeated publication info; it is generally recommended to remove it
`writing_mode`	string	Writing mode such as horizontal or vertical text
`rotation`	int	Clockwise rotation angle relative to the default 0-degree text orientation, commonly `90`, `180`, or `270`

`is_boilerplate`

Marks repeated, standardized, or non-body text, for example:

headers
footers
page numbers
DOI and publication metadata

Clients usually treat flagged content as discardable, but the flag is not always accurate in practice.
In md, this kind of content is usually wrapped in an HTML comment such as `<!-- Meanless: ... -->`.

Example:

json

{
  "type": "Text",
  "text": "Received 23 May 2024; Received in revised form 22 July 2024; Accepted 11 August 2024",
  "attributes": {
    "is_boilerplate": true
  }
}
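
Because boilerplate in `page.md` is usually wrapped in an HTML comment, a client that only wants body text can strip comments before display. The following is a rough sketch: `strip_md_comments` is a hypothetical helper, and it removes every HTML comment, not only those produced for boilerplate, which is an assumption that may be too aggressive for some documents.

```python
import re

# Matches any HTML comment, including comments that span several lines.
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_md_comments(md: str) -> str:
    """Remove HTML comments from page.md and collapse the blank lines left behind."""
    without_comments = _HTML_COMMENT.sub("", md)
    return re.sub(r"\n{3,}", "\n\n", without_comments).strip()
```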

`writing_mode`

Used to represent writing direction, mainly reserved for vertical writing. Integrations should treat it as an extensible string enum. Common design values include:

horizontal-tb: default horizontal writing
vertical-rl: generally used for Classical Chinese, Japanese, and similar writing that flows right-to-left and top-to-bottom

`rotation`

Describes the visual clockwise rotation angle of a block relative to the default 0-degree text orientation, mainly useful for Text and Caption. Integrations should handle it as an integer angle:

0: default orientation
90
180
270
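
Both attributes can be read defensively. The sketch below assumes that a missing `writing_mode` means `horizontal-tb` and a missing `rotation` means `0`; this guide describes those values as the defaults but does not say what an absent attribute means, so treat that as an assumption. The helper names are hypothetical.

```python
def text_orientation(block: dict) -> tuple[str, int]:
    """Return (writing_mode, rotation) for a block, falling back to assumed defaults."""
    attrs = block.get("attributes") or {}
    # Assumption: an absent attribute is equivalent to the default value.
    writing_mode = attrs.get("writing_mode") or "horizontal-tb"
    rotation = int(attrs.get("rotation") or 0)
    return writing_mode, rotation


def upright_rotation(block: dict) -> int:
    """Clockwise degrees to rotate a crop of this block so its text reads upright."""
    _, rotation = text_orientation(block)
    return (360 - rotation) % 360
```

Unknown `writing_mode` strings pass through unchanged, which keeps the reader compatible with values added later.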

`table_data` field

table_data only appears on Table blocks.

Field description

Field	Type	Description
`html`	string	HTML content of the current table

Example:

json

{
  "table_data": {
    "html": "<table><tr><td>Quencher</td><td>...</td></tr></table>"
  }
}
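
The HTML in `table_data.html` can be read with the Python standard library. The sketch below collects cell text row by row using `html.parser`; `table_rows` is a hypothetical helper, and it deliberately does not expand `rowspan` or `colspan`, so merged cells appear only once in the row where they start.

```python
from html.parser import HTMLParser


class _TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = []      # list of rows, each a list of cell strings
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:
                self.rows.append([])
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def table_rows(block: dict) -> list[list[str]]:
    """Return the table of a Table block as a list of rows of cell text."""
    parser = _TableRows()
    parser.feed((block.get("table_data") or {}).get("html", ""))
    parser.close()
    return parser.rows
```

Blocks without `table_data` yield an empty list, so the helper can be called on any block type.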

Extract figure and table images from the PDF using the JSON

If you want to turn the figures and tables detected in v3 JSON into standalone images, you can use each block's bbox together with the page's page_width / page_height to render and crop them again from the source PDF.

pdfdeal already provides helper utilities for this:

extract_v3_figure_images: extract images for Figure blocks
extract_v3_table_images: extract images for Table blocks
scripts/extract_v3_figures.py: CLI helper for figure extraction
scripts/extract_v3_tables.py: CLI helper for table extraction

How it works

The flow is:

Read the v3 JSON and get pages
Filter Figure or Table blocks from each page's layout.blocks
Use each block's bbox together with the page's page_width / page_height to map page coordinates into rendered pixel coordinates
Render only the PDF pages that contain target blocks at the requested dpi
Crop each target region into a standalone PNG and write a manifest.json

This is useful when you want to:

extract the original figure region for each Figure block from the source PDF
generate table screenshots for each Table block
save figure and table images locally instead of working only with the JSON structure
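
The coordinate mapping step in the flow above comes down to a proportional scale. The following sketch illustrates the idea only; it is not the pdfdeal implementation. `bbox_to_pixels` is a hypothetical helper, and it assumes that `bbox` shares the coordinate space of `page_width` / `page_height` with the origin at the top-left corner.

```python
import math


def bbox_to_pixels(bbox, page_width, page_height, image_width, image_height):
    """Scale [x1, y1, x2, y2] from page coordinates to rendered-image pixels.

    Returns (left, top, right, bottom), rounded outward and clamped to the image.
    """
    sx = image_width / page_width
    sy = image_height / page_height
    x1, y1, x2, y2 = bbox
    # Floor the top-left and ceil the bottom-right so the crop never cuts content.
    left = max(0, math.floor(x1 * sx))
    top = max(0, math.floor(y1 * sy))
    right = min(image_width, math.ceil(x2 * sx))
    bottom = min(image_height, math.ceil(y2 * sy))
    return left, top, right, bottom
```

The resulting box can be passed to any image library's crop call once the page has been rendered at the requested dpi.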

CLI examples

The following scripts come from the scripts/ directory in the pdfdeal repository:

pdfdeal/scripts/extract_v3_figures.py
pdfdeal/scripts/extract_v3_tables.py

Extract figure images:

bash

python scripts/extract_v3_figures.py \
  --pdf /path/to/input.pdf \
  --v3-json /path/to/input_v3.json \
  --dpi 200 \
  --output-dir ./Output/figures

Extract table images:

bash

python scripts/extract_v3_tables.py \
  --pdf /path/to/input.pdf \
  --v3-json /path/to/input_v3.json \
  --dpi 200 \
  --output-dir ./Output/tables

Python example

python

from pdfdeal import extract_v3_figure_images, extract_v3_table_images

figure_summary = extract_v3_figure_images(
    pdf_path="/path/to/input.pdf",
    v3_json_path="/path/to/input_v3.json",
    dpi=200,
    output_dir="./Output/figures",
)

table_summary = extract_v3_table_images(
    pdf_path="/path/to/input.pdf",
    v3_json_path="/path/to/input_v3.json",
    dpi=200,
    output_dir="./Output/tables",
)

print(figure_summary["crop_count"], figure_summary["manifest_path"])
print(table_summary["crop_count"], table_summary["manifest_path"])

Output layout

The output directory usually contains:

_pages/: rendered full-page PNGs
cropped figure or table PNG files
manifest.json: metadata for each crop, including page_idx, block_id, block_xyxy, crop_box_pixels, and output path

manifest.json is useful for:

mapping each block to a local image file
downstream multimodal indexing or figure/table post-processing
debugging whether the crop location matches the v3 JSON

Compatibility recommendations

Recommended integration behavior for v3 JSON:

Keep using page.md as the directly displayable or exportable text result.
Treat page.layout.blocks as a structured enhancement layer rather than a replacement for md.
Treat type, attributes, and table_data as extensible fields; do not fail on unknown values.
Tolerate empty strings: in the current sample, parent_id, src, linking.prev_block_id, and linking.next_block_id may all be empty strings.
Expect group blocks to return text: "" rather than null, as they do in the current sample.
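
These recommendations can be folded into a single normalization step. The sketch below is one possible approach under stated assumptions: `normalize_block` and the output key names are hypothetical, `KNOWN_TYPES` simply mirrors the type tables in this guide, and mapping empty strings to `None` is a client-side convention, not something the API does.

```python
KNOWN_TYPES = {
    "Text", "Title", "Caption", "Figure", "Table", "Equation",
    "EquationNumber", "Underline", "Code",
    "FigureGroup", "TableGroup", "EquationGroup", "FootnoteGroup",
    "ReferenceGroup", "TOCGroup", "CodeGroup", "SideGroup",
}


def normalize_block(block: dict) -> dict:
    """Return a simplified view of a block that tolerates missing or new fields."""
    linking = block.get("linking") or {}
    return {
        "id": block.get("id", ""),
        "type": block.get("type", ""),
        # Unknown types are kept and flagged instead of raising an error.
        "known_type": block.get("type") in KNOWN_TYPES,
        "text": block.get("text") or "",
        # Empty strings mean "no value" here, so they are mapped to None.
        "parent_id": block.get("parent_id") or None,
        "src": block.get("src") or None,
        "prev_block_id": linking.get("prev_block_id") or None,
        "next_block_id": linking.get("next_block_id") or None,
        # Attributes are copied as-is so that new keys survive.
        "attributes": dict(block.get("attributes") or {}),
        "table_html": (block.get("table_data") or {}).get("html"),
    }
```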
