
Doc2X API v3 JSON Output Guide

Overview

v3 JSON is not a separate API. It is the extended output of the same parsing flow when using the v3-2026 model:

  1. Create a task with POST /api/v2/parse/preupload and pass model: "v3-2026"
  2. Poll the task with GET /api/v2/parse/status
  3. When status=success, each page returns page.layout.blocks in addition to the original page.md

Compared with the older result that mainly exposed md, v3 JSON also provides:

  • structured block-level content
  • parent-child relations between blocks
  • per-page reading order
  • cropped image URLs for figure blocks
  • HTML content for table blocks

How To Enable It

You can enable the v3-2026 model with the following complete request:

bash
curl --location --request POST 'https://v2.doc2x.noedgeai.com/api/v2/parse/preupload' \
  --header 'Authorization: Bearer sk-xxx' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "model": "v3-2026"
  }'

After receiving the returned uid and upload url, upload the file with PUT, then poll GET /api/v2/parse/status?uid=... for the final result.
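The polling step can be sketched as below. This is a minimal illustration rather than an official client: wait_for_result and its fetch_status callable are hypothetical names, and in a real integration fetch_status would issue GET /api/v2/parse/status?uid=... with the Authorization header and return the decoded JSON body. The status values are the ones listed for data.status in this guide; the polling interval and timeout are arbitrary assumptions.

```python
import time
from typing import Any, Callable, Dict


def wait_for_result(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until data.status is "success" and return data.result.

    fetch_status is a hypothetical callable that returns the decoded JSON
    body of one status request.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch_status()
        data = body.get("data") or {}
        status = data.get("status")
        if status == "success":
            return data["result"]
        if status == "failed":
            raise RuntimeError(f"parse task failed: {body}")
        if time.monotonic() >= deadline:
            raise TimeoutError("parse task did not finish in time")
        sleep(interval_s)


# Offline demonstration with canned responses instead of real HTTP calls.
responses = iter([
    {"code": "success", "data": {"status": "processing", "progress": 40}},
    {"code": "success", "data": {"status": "success", "progress": 100,
                                 "result": {"version": "", "pages": []}}},
])
result = wait_for_result(lambda: next(responses), sleep=lambda _seconds: None)
print(result)
```

Passing the request function in as a parameter keeps the loop independent of any particular HTTP library and makes it easy to exercise without network access.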

Response Overview

Top-level fields

| Field | Type | Description |
| --- | --- | --- |
| code | string | Request result, success on success |
| data.status | string | Task status, usually processing, failed, or success |
| data.progress | int | Task progress in the range 0~100 |
| data.result.version | string | Version field; currently an empty string in the sample and should be treated as reserved |
| data.result.pages | array | Per-page parsing results |

page fields

| Field | Type | Description |
| --- | --- | --- |
| page.url | string | Page-level image URL; empty in the current sample |
| page.page_idx | int | Page index starting from 0 |
| page.page_width | int | Page width in pixels |
| page.page_height | int | Page height in pixels |
| page.md | string | Markdown content for the page |
| page.score | int | Page quality score in the range 0~100 |
| page.layout.blocks | array | Structured block list for the page |

Response Example

The following example is shortened from a real v3 response and keeps the key structures for md, blocks, attributes, and linking.

json
{
  "code": "success",
  "data": {
    "status": "success",
    "progress": 100,
    "result": {
      "version": "",
      "pages": [
        {
          "url": "",
          "page_idx": 0,
          "page_width": 1753,
          "page_height": 2338,
          "md": "<!-- Meanless: Water Research 264 (2024) 122255... -->\n\n# Non-radical oxidation driven by iron-based materials...",
          "score": 85,
          "layout": {
            "blocks": [
              {
                "id": "blk_p0_0",
                "type": "Text",
                "text": "Water Research 264 (2024) 122255",
                "bbox": [699, 98, 1046, 132],
                "reading_order": 0,
                "parent_id": "",
                "src": "",
                "attributes": {
                  "is_boilerplate": true
                }
              },
              {
                "id": "blk_p0_7",
                "type": "Title",
                "text": "Non-radical oxidation driven by iron-based materials without energy assistance in wastewater treatment",
                "bbox": [101, 480, 1343, 585],
                "reading_order": 7,
                "parent_id": "",
                "src": ""
              },
              {
                "id": "blk_p0_21",
                "type": "Text",
                "text": "The persistence of organic pollutants in aquatic environments presents a significant risk to human health...",
                "bbox": [103, 1486, 858, 1924],
                "reading_order": 21,
                "parent_id": "",
                "src": "",
                "linking": {
                  "prev_block_id": "",
                  "next_block_id": "blk_p0_22"
                }
              },
              {
                "id": "blk_p0_24",
                "type": "FootnoteGroup",
                "text": "",
                "bbox": [101, 1964, 1242, 2042],
                "reading_order": 24,
                "parent_id": "",
                "src": ""
              }
            ]
          }
        }
      ]
    }
  }
}
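
Walking this structure is straightforward. The sketch below assumes the response has already been decoded into a Python dict; iter_blocks is a hypothetical helper name, and the sample is a trimmed copy of the response above.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_blocks(response: Dict[str, Any]) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (page_idx, block) for every block in a status response."""
    result = (response.get("data") or {}).get("result") or {}
    for page in result.get("pages") or []:
        layout = page.get("layout") or {}
        for block in layout.get("blocks") or []:
            yield page.get("page_idx", 0), block


# Trimmed copy of the response example (two blocks only).
sample = {
    "code": "success",
    "data": {
        "status": "success",
        "progress": 100,
        "result": {
            "version": "",
            "pages": [
                {
                    "page_idx": 0,
                    "page_width": 1753,
                    "page_height": 2338,
                    "layout": {
                        "blocks": [
                            {"id": "blk_p0_0", "type": "Text",
                             "text": "Water Research 264 (2024) 122255",
                             "attributes": {"is_boilerplate": True}},
                            {"id": "blk_p0_7", "type": "Title",
                             "text": "Non-radical oxidation driven by iron-based materials"},
                        ]
                    },
                }
            ],
        },
    },
}

summary = [(page_idx, block["id"], block["type"]) for page_idx, block in iter_blocks(sample)]
print(summary)
```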

The following is a table block example:

json
{
  "id": "blk_p2_2",
  "type": "TableGroup",
  "text": "",
  "bbox": [101, 156, 1649, 907],
  "reading_order": 2,
  "parent_id": "",
  "src": ""
}

The Table block inside that group points back to it through parent_id:

json
{
  "id": "blk_p2_5",
  "type": "Table",
  "text": "",
  "bbox": [101, 217, 1649, 819],
  "reading_order": 4,
  "parent_id": "blk_p2_2",
  "src": "",
  "table_data": {
    "html": "<table><tr><td rowspan=\"2\">Review</td><td rowspan=\"2\">Iron-based materials</td><td colspan=\"5\">Oxidation systems</td>...</tr></table>"
  }
}

Common block fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique block id; the current sample uses values like blk_p2_5 |
| type | string | Block type such as Text, Title, or Table |
| text | string | Text payload of the block; in the current sample, group, figure, and table blocks usually return an empty string |
| bbox | int[4] | Absolute coordinates in the form [x1, y1, x2, y2] |
| reading_order | int | Reading order within the page, starting from 0; group blocks always share the same value as their first child block |
| parent_id | string | Parent group id; blocks without a parent currently return an empty string |
| src | string | Image resource URL; non-image blocks usually return an empty string |
| linking | object | Linking metadata used when one logical piece of content is split across multiple blocks, columns, or pages |
| attributes | object | Extra block attributes, only present when needed |
| table_data | object | Only present on Table blocks; the current sample contains html |
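
The parent_id and reading_order fields are enough to rebuild the group hierarchy of a page. The sketch below is one possible approach, assuming blocks is the decoded page.layout.blocks list; children_by_parent is a hypothetical helper name, and the two sample blocks are trimmed from the table example in this guide.

```python
from collections import defaultdict
from typing import Any, Dict, List


def children_by_parent(blocks: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Map each parent id to its child blocks, sorted by reading_order.

    Blocks without a parent are collected under the empty string, which is
    how the current sample represents a missing parent_id.
    """
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for block in blocks:
        groups[block.get("parent_id") or ""].append(block)
    for children in groups.values():
        children.sort(key=lambda b: b.get("reading_order", 0))
    return dict(groups)


blocks = [
    {"id": "blk_p2_5", "type": "Table", "reading_order": 4, "parent_id": "blk_p2_2"},
    {"id": "blk_p2_2", "type": "TableGroup", "reading_order": 2, "parent_id": ""},
]
groups = children_by_parent(blocks)
for top in groups[""]:
    child_ids = [child["id"] for child in groups.get(top["id"], [])]
    print(top["type"], top["id"], "->", child_ids)
```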

linking

linking is used to indicate that multiple blocks should be read as one logical unit. Its basic structure is:

json
{
  "prev_block_id": "blk_p0_21",
  "next_block_id": "blk_p0_22"
}

In the current sample, missing prev_block_id or next_block_id is represented by an empty string.

Common linking scenarios include:

  1. Cross-column or cross-page paragraph continuation
     • The typical target is Text.
     • When a long paragraph is split by multi-column layout or continues onto the next page, prev_block_id and next_block_id can be used to chain the related text blocks together.
  2. Cross-column or cross-page table continuation
     • The typical target is Table.
     • In general, only Table carries linking; TableGroup usually does not.
     • When one logical table is split across different columns or pages, linking can connect the preceding and following table blocks.
  3. Linking a table-of-contents entry to its page number
     • The typical target is a TOC-related block, for example text blocks inside TOCGroup.
     • When the TOC entry text and its page number are detected as separate blocks, linking can indicate that they belong to the same TOC item.
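
Following a linking chain can be sketched as below. The helper name linked_chain is hypothetical, and the sketch assumes blocks_by_id indexes blocks from every page by id, since a chain may cross page boundaries. The text of blk_p0_22 is invented for the illustration.

```python
from typing import Any, Dict, List


def _neighbor(block: Dict[str, Any], key: str) -> str:
    """Return the linked id, or "" when linking or the id is absent."""
    return (block.get("linking") or {}).get(key) or ""


def linked_chain(blocks_by_id: Dict[str, Dict[str, Any]], start_id: str) -> List[str]:
    """Return the ids of all blocks linked to start_id, first to last."""
    # Walk backwards to the head of the chain; `seen` guards against cycles.
    head, seen = start_id, {start_id}
    while True:
        prev_id = _neighbor(blocks_by_id[head], "prev_block_id")
        if not prev_id or prev_id not in blocks_by_id or prev_id in seen:
            break
        seen.add(prev_id)
        head = prev_id
    # Walk forwards from the head, collecting ids in reading sequence.
    chain, seen = [head], {head}
    while True:
        next_id = _neighbor(blocks_by_id[chain[-1]], "next_block_id")
        if not next_id or next_id not in blocks_by_id or next_id in seen:
            break
        seen.add(next_id)
        chain.append(next_id)
    return chain


blocks_by_id = {
    "blk_p0_21": {"id": "blk_p0_21", "type": "Text",
                  "text": "The persistence of organic pollutants",
                  "linking": {"prev_block_id": "", "next_block_id": "blk_p0_22"}},
    "blk_p0_22": {"id": "blk_p0_22", "type": "Text",
                  "text": "continues in the next column.",  # hypothetical text
                  "linking": {"prev_block_id": "blk_p0_21", "next_block_id": ""}},
}
chain = linked_chain(blocks_by_id, "blk_p0_22")
paragraph = " ".join(blocks_by_id[block_id]["text"] for block_id in chain)
print(chain, paragraph)
```

Joining with a single space is a simplification; hyphenated line breaks or languages without word spacing need different join rules.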

Block types

The block types are split into two groups below: general types and group types. Some already appear in the current sample, while others are reserved in the v3-json design and may appear in other documents.

General types

| Type | Meaning | Typical characteristics |
| --- | --- | --- |
| Text | Text paragraph blocks, including normal body paragraphs, OCR text paragraphs, headers, footers, and similar text paragraphs | May include attributes or linking |
| Title | Title block, including document titles and section titles | Usually a standalone heading block |
| Caption | Figure or table caption text | Often a child of FigureGroup or TableGroup |
| Figure | Figure/image body | src is usually non-empty and parent_id points to FigureGroup |
| Table | Table body | table_data.html contains table HTML |
| Equation | Formula content | Usually a child of EquationGroup |
| EquationNumber | Equation numbering | Examples: (1), (2) |
| Underline | Standalone underline block | Often used for independently detected underline elements |
| Code | Code or pseudocode body block | Often used for the code region itself |

Group types

| Type | Meaning | Typical characteristics |
| --- | --- | --- |
| FigureGroup | Figure group container | Usually contains Figure plus one or more Caption blocks |
| TableGroup | Table group container | Usually contains Table plus one or more Caption blocks |
| EquationGroup | Equation group container | Usually contains Equation, and may contain EquationNumber |
| FootnoteGroup | Footnote container | Container for the footnote area near the page bottom |
| ReferenceGroup | Reference list container | Container for the references section |
| TOCGroup | Table-of-contents container | Often used to organize TOC entries and page numbers |
| CodeGroup | Parent container for code plus caption/description | Often used to organize code with its descriptive text |
| SideGroup | Sidebar, aside, or side-note container | Often used to organize side content |

Detailed notes for selected types:

Text

  • Text paragraph block.
  • Normal body paragraphs, header/footer text paragraphs, and page-number text paragraphs may all be returned as Text.
  • Boilerplate text such as headers, footers, and page numbers is also returned as Text, usually with attributes.is_boilerplate=true.
  • linking may appear when the paragraph is split across columns, blocks, or pages.

Title

  • Title block.
  • Commonly used for document titles and section titles.

Caption

  • Descriptive text for figures, tables, and similar media objects.
  • Commonly appears as a child of FigureGroup or TableGroup.

Figure

  • The figure/image itself.
  • src is the cropped image URL.
  • text is an empty string in the current sample.

Table

  • The table itself.
  • The table is returned through table_data.html.
  • linking may appear when one logical table is split across columns or pages.

Equation

  • Formula body.
  • In the current sample, the formula content is stored in text.

EquationNumber

  • Formula numbering block.
  • Usually appears in the same EquationGroup as the corresponding Equation.

FigureGroup

  • Container block for a figure group.
  • Usually used to group the figure and its caption(s).

TableGroup

  • Container block for a table group.
  • Usually used to group the table and its caption(s).

TOCGroup

  • Container block for a table-of-contents region.
  • If a TOC entry and its page number are detected as separate blocks, linking can be used to associate them.

EquationGroup

  • Container block for an equation group.
  • Usually used for multi-line equations or equations with numbering.

FootnoteGroup

  • Container block for the footnote area.
  • In the current sample, text is an empty string.

ReferenceGroup

  • Container block for the references section.
  • Usually used to organize continuous reference entries.

attributes field

| Attribute | Type | Meaning |
| --- | --- | --- |
| is_boilerplate | bool | Boilerplate text such as headers, footers, page numbers, and repeated publication info; removing it is generally recommended |
| writing_mode | string | Writing mode such as horizontal or vertical text |
| rotation | int | Clockwise rotation angle relative to the default text orientation (0 degrees), commonly 90, 180, or 270 |

is_boilerplate

Marks repeated, standardized, or non-body text, for example:

  • headers
  • footers
  • page numbers
  • DOI and publication metadata

Notes:

  • The flag is not always accurate in practice. Clients usually treat flagged content as discardable.
  • In md, this kind of content is usually wrapped in an HTML comment like <!-- Meanless: ... -->.

Example:

json
{
  "type": "Text",
  "text": "Received 23 May 2024; Received in revised form 22 July 2024; Accepted 11 August 2024",
  "attributes": {
    "is_boilerplate": true
  }
}
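
Removing boilerplate on the client side might look like the sketch below. Both helper names are hypothetical, and the regular expression only assumes the <!-- Meanless: ... --> comment form described above; other comment forms are left untouched.

```python
import re
from typing import Any, Dict, List

# Matches the HTML comment that wraps boilerplate in page.md, plus any
# whitespace that directly follows it.
MEANLESS_COMMENT = re.compile(r"<!--\s*Meanless:.*?-->\s*", re.DOTALL)


def drop_boilerplate_blocks(blocks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only blocks whose attributes.is_boilerplate is not true."""
    return [
        block for block in blocks
        if not (block.get("attributes") or {}).get("is_boilerplate", False)
    ]


def strip_meanless_comments(md: str) -> str:
    """Remove boilerplate comments from a page's Markdown."""
    return MEANLESS_COMMENT.sub("", md)


blocks = [
    {"id": "blk_p0_0", "type": "Text", "text": "Water Research 264 (2024) 122255",
     "attributes": {"is_boilerplate": True}},
    {"id": "blk_p0_7", "type": "Title", "text": "Non-radical oxidation"},
]
body_blocks = drop_boilerplate_blocks(blocks)
clean_md = strip_meanless_comments(
    "<!-- Meanless: Water Research 264 (2024) 122255 -->\n\n# Non-radical oxidation"
)
print([block["id"] for block in body_blocks], repr(clean_md))
```

Because the flag is not always accurate, an integration that cannot afford to lose content may prefer to keep flagged blocks and only hide them by default.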

writing_mode

Represents the writing direction and is mainly reserved for vertical text. Integrations should treat it as an extensible string enum. Values in the current design include:

  • horizontal-tb: default horizontal writing
  • vertical-rl: vertical writing in which characters run top to bottom and lines progress from right to left, as commonly used for Classical Chinese, Japanese, and similar scripts

rotation

Describes the clockwise visual rotation of a block relative to the default text orientation (0 degrees). It is mainly useful for Text and Caption blocks. Integrations should handle it as an integer angle:

  • 0: default orientation
  • 90
  • 180
  • 270

table_data field

table_data only appears on Table blocks.

Field description

| Field | Type | Description |
| --- | --- | --- |
| html | string | HTML content of the current table |

Example:

json
{
  "table_data": {
    "html": "<table><tr><td>Quencher</td><td>...</td></tr></table>"
  }
}
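
Turning table_data.html into rows of cell text can be done with the Python standard library. The sketch below is a simplified illustration: the class and function names are hypothetical, the sample cell values are invented, and rowspan / colspan are not expanded, so merged cells appear only once in the row where they start.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None  # text parts of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def table_rows(html: str) -> List[List[str]]:
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical table content, used only to exercise the parser.
html = "<table><tr><td>Quencher</td><td>Dose</td></tr><tr><td>A</td><td>1</td></tr></table>"
rows = table_rows(html)
print(rows)
```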

Extract figure and table images from the PDF using the JSON

If you want to turn the figures and tables detected in v3 JSON into standalone images, you can use each block's bbox together with the page's page_width / page_height to render and crop them again from the source PDF.

pdfdeal already provides helper utilities for this:

  • extract_v3_figure_images: extract images for Figure blocks
  • extract_v3_table_images: extract images for Table blocks
  • scripts/extract_v3_figures.py: CLI helper for figure extraction
  • scripts/extract_v3_tables.py: CLI helper for table extraction

How it works

The flow is:

  1. Read the v3 JSON and get pages
  2. Filter Figure or Table blocks from each page's layout.blocks
  3. Use each block's bbox together with the page's page_width / page_height to map page coordinates into rendered pixel coordinates
  4. Render only the PDF pages that contain target blocks at the requested dpi
  5. Crop each target region into a standalone PNG and write a manifest.json
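
The coordinate mapping in step 3 can be sketched as below. This is an illustration of the idea and not pdfdeal's actual implementation: bbox_to_pixels is a hypothetical helper, and the rendered image size (3506 x 4676, twice the page size of the response example) is an assumed value. The bbox is taken from the Table block example and the page is assumed to share the 1753 x 2338 size of the sample page.

```python
import math
from typing import Sequence, Tuple


def bbox_to_pixels(
    bbox: Sequence[float],
    page_width: int,
    page_height: int,
    image_width: int,
    image_height: int,
) -> Tuple[int, int, int, int]:
    """Scale [x1, y1, x2, y2] from page coordinates to rendered pixels.

    The box is rounded outwards and clamped to the image bounds so the
    crop never cuts into the block.
    """
    scale_x = image_width / page_width
    scale_y = image_height / page_height
    x1, y1, x2, y2 = bbox
    left = max(0, math.floor(x1 * scale_x))
    top = max(0, math.floor(y1 * scale_y))
    right = min(image_width, math.ceil(x2 * scale_x))
    bottom = min(image_height, math.ceil(y2 * scale_y))
    return left, top, right, bottom


crop_box = bbox_to_pixels([101, 217, 1649, 819], 1753, 2338, 3506, 4676)
print(crop_box)  # (202, 434, 3298, 1638)
```

The resulting tuple uses the (left, top, right, bottom) order that most image libraries accept for cropping.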

This is useful when you want to:

  • extract the original figure region for each Figure block from the source PDF
  • generate table screenshots for each Table block
  • save figure and table images locally instead of working only with the JSON structure

CLI examples

The following scripts come from the scripts/ directory in the pdfdeal repository:

  • pdfdeal/scripts/extract_v3_figures.py
  • pdfdeal/scripts/extract_v3_tables.py

Extract figure images:

bash
python scripts/extract_v3_figures.py \
  --pdf /path/to/input.pdf \
  --v3-json /path/to/input_v3.json \
  --dpi 200 \
  --output-dir ./Output/figures

Extract table images:

bash
python scripts/extract_v3_tables.py \
  --pdf /path/to/input.pdf \
  --v3-json /path/to/input_v3.json \
  --dpi 200 \
  --output-dir ./Output/tables

Python example

python
from pdfdeal import extract_v3_figure_images, extract_v3_table_images

figure_summary = extract_v3_figure_images(
    pdf_path="/path/to/input.pdf",
    v3_json_path="/path/to/input_v3.json",
    dpi=200,
    output_dir="./Output/figures",
)

table_summary = extract_v3_table_images(
    pdf_path="/path/to/input.pdf",
    v3_json_path="/path/to/input_v3.json",
    dpi=200,
    output_dir="./Output/tables",
)

print(figure_summary["crop_count"], figure_summary["manifest_path"])
print(table_summary["crop_count"], table_summary["manifest_path"])

Output layout

The output directory usually contains:

  • _pages/: rendered full-page PNGs
  • cropped figure or table PNG files
  • manifest.json: metadata for each crop, including page_idx, block_id, block_xyxy, crop_box_pixels, and output path

manifest.json is useful for:

  • mapping each block to a local image file
  • downstream multimodal indexing or figure/table post-processing
  • debugging whether the crop location matches the v3 JSON

Compatibility recommendations

Recommended integration behavior for v3 JSON:

  1. Keep using page.md as the directly displayable or exportable text result.
  2. Treat page.layout.blocks as a structured enhancement layer rather than a replacement for md.
  3. Treat type, attributes, and table_data as extensible fields; do not fail on unknown values.
  4. Accept empty strings: in the current sample, parent_id, src, linking.prev_block_id, and linking.next_block_id may all be empty strings.
  5. Expect group blocks to return text: "" rather than null, as in the current sample.
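
These recommendations can be folded into a small normalization step. The sketch below is one possible shape for it: normalize_block and the output keys are hypothetical, KNOWN_TYPES simply lists the block types named in this guide, and the Stamp type in the demonstration is invented to show that unknown values pass through without an error.

```python
from typing import Any, Dict

# Block types named in this guide; anything else is kept but flagged.
KNOWN_TYPES = {
    "Text", "Title", "Caption", "Figure", "Table", "Equation",
    "EquationNumber", "Underline", "Code", "FigureGroup", "TableGroup",
    "EquationGroup", "FootnoteGroup", "ReferenceGroup", "TOCGroup",
    "CodeGroup", "SideGroup",
}


def normalize_block(block: Dict[str, Any]) -> Dict[str, Any]:
    """Return a tolerant view of one block.

    Empty strings become None for id-like and URL fields, missing optional
    objects become empty, and unknown types are preserved rather than rejected.
    """
    linking = block.get("linking") or {}
    block_type = block.get("type", "")
    return {
        "id": block.get("id", ""),
        "type": block_type,
        "known_type": block_type in KNOWN_TYPES,
        "text": block.get("text") or "",
        "parent_id": block.get("parent_id") or None,
        "src": block.get("src") or None,
        "prev_block_id": linking.get("prev_block_id") or None,
        "next_block_id": linking.get("next_block_id") or None,
        "attributes": dict(block.get("attributes") or {}),
        "table_html": (block.get("table_data") or {}).get("html"),
    }


group = normalize_block({"id": "blk_p0_24", "type": "FootnoteGroup", "text": "",
                         "parent_id": "", "src": ""})
unknown = normalize_block({"id": "blk_p0_30", "type": "Stamp", "text": None})
print(group["parent_id"], group["known_type"], unknown["known_type"], repr(unknown["text"]))
```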