Doc2X API v2 PDF Interface Documentation

Basic Information

Base URL

https://v2.doc2x.noedgeai.com

Important Reminders

Please access the API interface directly. Regions outside mainland China may experience significant network fluctuations, leading to severe file upload interruptions
Not recommended for large-scale online service integration. Due to limited computing resources, queuing may occur (manifested as progress showing 0 during polling). More suitable for batch data processing
After obtaining results through status, if you need to save images, please download manually or obtain images locally through the export interface as soon as possible. The server only temporarily retains results for 24 hours

Authorization

First, you need to obtain an API Key (similar to sk-xxx). Get API website: open.noedgeai.com

Add to HTTP request headers:

bash

Authorization: Bearer sk-xxx

Authorization

POST /api/v2/parse/preupload File Pre-upload

Recommended to use this interface for faster upload speeds

Large file upload interface, file size <= 1GB

Request Parameters

None

Request Example

bash

curl -X POST 'https://v2.doc2x.noedgeai.com/api/v2/parse/preupload' \
--header 'Authorization: Bearer sk-xxx'

Response Example

json

{
  "code": "success",
  "data": {
    "uid": "0192d745-5776-7261-abbd-814df3af3449",
    "url": "https://doc2x-pdf.oss-cn-beijing.aliyuncs.com/tmp/0192d745-5776-7261-abbd-814df3af3449.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256..."
  }
}

After obtaining the url, use HTTP PUT method to upload the file to the url field in the returned result
After upload completion, use the /api/v2/parse/status interface to poll for results. Uses Alibaba Cloud OSS, specific speed depends on your network speed (overseas users may experience upload failures).

Interface Description

Flow chart as follows:

Interface Description

Exception errors (such as processing limit/processing page limit restrictions) will be returned in the status interface

Python Example

python

import json
import time
import requests as rq

base_url = "https://v2.doc2x.noedgeai.com"
secret = "sk-xxx"

def preupload():
    url = f"{base_url}/api/v2/parse/preupload"
    headers = {
        "Authorization": f"Bearer {secret}"
    }
    res = rq.post(url, headers=headers)
    if res.status_code == 200:
        data = res.json()
        if data["code"] == "success":
            return data["data"]
        else:
            raise Exception(f"get preupload url failed: {data}")
    else:
        raise Exception(f"get preupload url failed: {res.text}")

def put_file(path: str, url: str):
    with open(path, "rb") as f:
        res = rq.put(url, data=f) # body is file binary stream
        if res.status_code != 200:
            raise Exception(f"put file failed: {res.text}")

def get_status(uid: str):
    url = f"{base_url}/api/v2/parse/status?uid={uid}"
    headers = {
        "Authorization": f"Bearer {secret}"
    }
    res = rq.get(url, headers=headers)
    if res.status_code == 200:
        data = res.json()
        if data["code"] == "success":
            return data["data"]
        else:
            raise Exception(f"get status failed: {data}")
    else:
        raise Exception(f"get status failed: {res.text}")

upload_data = preupload()
print(upload_data)
url = upload_data["url"]
uid = upload_data["uid"]

put_file("test.pdf", url)

while True:
    status_data = get_status(uid)
    print(status_data)
    if status_data["status"] == "success":
        result = status_data["result"]
        with open("result.json", "w") as f:
            json.dump(result, f)
        break
    elif status_data["status"] == "failed":
        detail = status_data["detail"]
        raise Exception(f"parse failed: {detail}")
    elif status_data["status"] == "processing":
        # processing
        progress = status_data["progress"]
        print(f"progress: {progress}")
        time.sleep(3)

Notes

Since there is a certain delay for the server to fetch after users upload to OSS, the status will not immediately update to "task in progress" after uploading files, you need to wait (<20s)
The link is valid for 5 minutes after obtaining it, pay attention to timing
URL links cannot be reused: if HTTP PUT fails (i.e., status_code!=200), you can retry. If PUT gets a 200 return, the link cannot be reused
Since the number of pages cannot be known before uploading the file, rate limit triggers (parse_concurrency_limit, parse_task_limit_exceeded) will only be triggered in the status interface

GET /api/v2/parse/status View Asynchronous Status

After using the above asynchronous call, use this interface to poll status. Recommended polling frequency is 1~3 seconds per time

Cloud status (including images on CDN) can only be queried for results within 24 hours. Please export and save as soon as possible

View Asynchronous Status Request Parameters

Request Headers

Name	Description	Example Value
Authorization	Api key	Bearer sk-usui9lodl89p7r51suvo0awdawd

Request Body

Name	Location	Type	Required	Description
uid	query	string	Yes	Asynchronous task id

View Asynchronous Status Request Example

bash

curl --request GET 'https://v2.doc2x.noedgeai.com/api/v2/parse/status?uid=01920000-0000-0000-0000-000000000000' \
--header 'Authorization: Bearer sk-xxx'

python

import requests

url = 'https://v2.doc2x.noedgeai.com/api/v2/parse/status?uid=01920000-0000-0000-0000-000000000000'
headers = {'Authorization': 'Bearer sk-xxx'}

response = requests.get(url, headers=headers)

print(response.text)

View Asynchronous Status Response Example

json

{
  "code": "success",
  "data": {
    "progress": 100,
    "status": "success",
    "detail": "",
    "result": {
      "version": "v2",
      "pages": [
        {
          "url": "",
          "page_idx": 0,
          "page_width": 1802,
          "page_height": 2332,
          "md": ""
        }
      ]
    }
  }
}

Failed Case

json

{
  "code": "parse_error",
  "msg": "Parse error"
}

Field Explanations

Field	Meaning	Example
data.progress	Task progress, integer from 0~100	100
data.status	processing, failed, success	In progress, failed, success
data.detail	Detailed error information when status=failed	Parse failed, file too large
result.pages	Results
page.url	Page URL, not empty if small images exist on page, otherwise empty	https://cdn.noedgeai.com/xxx.jpg
page.page_idx	Page id, starting from 0
page.page_width/height	Page width/height, unit: pixels
page.md	Markdown format text for this page

POST /api/v2/convert/parse Request Export File (Asynchronous)

Export File Request Parameters

Name	Location	Type	Required	Description
uid	body	json	Yes	Parse task id
to	body	json	Yes	Export format, supports: md\|tex\|docx
formula_mode	body	json	Yes	Export mode, fill in: normal; change to: dollar when exporting md files with $ formula markers
filename	body	json	No	Exported md/tex filename (without extension), default output.md/output.tex, only valid for md and tex
merge_cross_page_forms	body	bool	No	Merge cross-page tables

Export File Request Example

bash

curl --location --request POST 'https://v2.doc2x.noedgeai.com/api/v2/convert/parse' \
--header 'Authorization: Bearer sk-xxx' \
--header 'Content-Type: application/json' \
--data-raw '{
    "uid": "01920000-0000-0000-0000-000000000000",
    "to": "md",
    "formula_mode": "normal",
    "filename": "my_markdown.md",
    "merge_cross_page_forms": false
}'

python

import requests
import json

url = "https://v2.doc2x.noedgeai.com/api/v2/convert/parse"
headers = {
    "Authorization": "Bearer sk-xxx",
    "Content-Type": "application/json",
}

data = {
    "uid": "01920000-0000-0000-0000-000000000000",
    "to": "md",
    "formula_mode": "normal",
    "filename": "my_markdown.md",
}

response = requests.post(url, headers=headers, data=json.dumps(data))

print(response.text)

Export File Response Example

json

{
  "code": "success",
  "data": {
    "status": "processing",
    "url": ""
  }
}

Note: The interface /api/v2/convert/parse is used to trigger export file tasks. Subsequently, you need to use the /api/v2/convert/parse/result interface to poll export task status. Do not repeatedly poll the /convert/parse interface

GET /api/v2/convert/parse/result Export Get Results

Export Get Results Request Parameters

Export Get Results Request Headers

Name	Description	Example Value
Authorization	Api key	Bearer sk-usui9lodl89p7r51suvo0awdawd

Export Get Results Request Body

Name	Location	Type	Required	Description
uid	query	string	Yes	Asynchronous task id

Export Get Results Request Example

bash

curl --location --request GET 'https://v2.doc2x.noedgeai.com/api/v2/convert/parse/result?uid=01920000-0000-0000-0000-000000000000' \
--header 'Authorization: Bearer sk-xxx'

python

import requests

url = 'https://v2.doc2x.noedgeai.com/api/v2/convert/parse/result?uid=01920000-0000-0000-0000-000000000000'
headers = {'Authorization': 'Bearer sk-xxx'}

response = requests.get(url, headers=headers)

print(response.text)

Export Get Results Response Example

json

{
  "code": "success",
  "data": {
    "status": "success",
    "url": "https://doc2x-backend.s3.cn-north-1.amazonaws.com.cn/objects/xxx/convert_tex_none.zip?..."
  }
}

Same as /api/v2/convert/parse export file return results, then you need to use the URL in it to download the file

Download File from URL

After getting successful return examples from /api/v2/convert/parse/result or /api/v2/convert/parse interfaces, you can use HTTP GET method to request the url to download the file:

Note: In some scenarios, the returned url will represent & with \u0026, which needs to be actively replaced with &

Download File from URL Request Example

bash

curl -L -o downloaded_file.zip "https://doc2x-backend.s3.cn-north-1.amazonaws.com.cn/objects/xxx/convert_tex_none.zip?..."

python

import requests

response = requests.get("https://doc2x-backend.s3.cn-north-1.amazonaws.com.cn/objects/xxx/convert_tex_none.zip?...")

with open('downloaded_file.zip', 'wb') as f:
    f.write(response.content)

Error Codes

HTTP Status Codes

When httpcode is 429, it's an API rate limit exceeded error, wait for previously submitted tasks to complete
When httpcode is 200, it's a business-related error

Error Code Descriptions

Error Code	Reason	Solution
parse_task_limit_exceeded	Task number limit exceeded	Number of tasks being processed has reached the limit, wait for previously submitted tasks to complete
parse_concurrency_limit	Task file page limit exceeded	Pages being processed have reached the limit, wait for previously submitted tasks to complete
parse_quota_limit	Insufficient parsing page quota	Current available pages are insufficient
parse_error	Parse error	Wait briefly and retry, if error persists contact the person in charge
parse_create_task_error	Task creation failed	Wait briefly and retry, if error persists contact the person in charge
parse_status_not_found	Status expired or uid error	Wait briefly and retry, if error persists contact the person in charge
parse_file_too_large	Single file size exceeds limit	Currently allows single file size <= 300M, please split PDF
parse_page_limit_exceeded	Single file page count exceeds limit	Currently allows single file pages <= 2000 pages, please split PDF
parse_file_lock	File parsing failed	To prevent repeated parsing, temporarily locked for one day. Consider PDF compatibility issues, try reprinting and try again. If still fails, feedback request_id to person in charge
parse_file_not_pdf	Uploaded file is not a PDF	Please parse files with .pdf extension
parse_file_invalid	Parse file error or invalid	We cannot parse this PDF, usually PDF format issues or non-standard PDF
parse_timeout	Processing time exceeds 15min	Usually caused by content being too long to process completely within 15min, try splitting PDF for recognition

Useful Integrations

Packaged Python Library - pdfdeal

Source code: https://github.com/NoEdgeAI/pdfdeal-docs
Documentation: https://noedgeai.github.io/pdfdeal-docs/zh/guide/

Coze Plugin - PDF Recognition from URL

Plugin address: https://www.coze.cn/store/plugin/7398010704374153253

Other References

Example reference for multi-level title hierarchy enhancement using LLM

Doc2X API v2 PDF Interface Documentation ​

Basic Information ​

Base URL ​

Important Reminders ​

Authorization ​

POST /api/v2/parse/preupload File Pre-upload ​

Recommended to use this interface for faster upload speeds ​

Request Parameters ​

Request Example ​

Response Example ​

Interface Description ​

Python Example ​

Notes ​

GET /api/v2/parse/status View Asynchronous Status ​

View Asynchronous Status Request Parameters ​

Request Headers ​

Request Body ​

View Asynchronous Status Request Example ​

View Asynchronous Status Response Example ​

Failed Case ​

Field Explanations ​

POST /api/v2/convert/parse Request Export File (Asynchronous) ​

Export File Request Parameters ​

Export File Request Example ​

Export File Response Example ​

GET /api/v2/convert/parse/result Export Get Results ​

Export Get Results Request Parameters ​

Export Get Results Request Headers ​

Export Get Results Request Body ​

Export Get Results Request Example ​

Export Get Results Response Example ​

Download File from URL ​

Download File from URL Request Example ​

Error Codes ​

HTTP Status Codes ​

Error Code Descriptions ​

Useful Integrations ​

Packaged Python Library - pdfdeal ​

Coze Plugin - PDF Recognition from URL ​

Other References ​

Doc2X API v2 PDF Interface Documentation

Basic Information

Base URL

Important Reminders

Authorization

POST /api/v2/parse/preupload File Pre-upload

Recommended to use this interface for faster upload speeds

Request Parameters

Request Example

Response Example

Interface Description

Python Example

Notes

GET /api/v2/parse/status View Asynchronous Status

View Asynchronous Status Request Parameters

Request Headers

Request Body

View Asynchronous Status Request Example

View Asynchronous Status Response Example

Failed Case

Field Explanations

POST /api/v2/convert/parse Request Export File (Asynchronous)

Export File Request Parameters

Export File Request Example

Export File Response Example

GET /api/v2/convert/parse/result Export Get Results

Export Get Results Request Parameters

Export Get Results Request Headers

Export Get Results Request Body

Export Get Results Request Example

Export Get Results Response Example

Download File from URL

Download File from URL Request Example

Error Codes

HTTP Status Codes

Error Code Descriptions

Useful Integrations

Packaged Python Library - pdfdeal

Coze Plugin - PDF Recognition from URL

Other References