Doc2X MCP Guide
This guide explains how to set up Doc2X MCP in different clients and use MCP to call Doc2X services (for example, parsing PDFs).
Note: MCP (stdio) means the client starts a local process (here npx ...doc2x-mcp) and communicates with it via stdin/stdout. Network/proxy settings, the Node.js installation, and how the client injects environment variables all directly affect whether setup succeeds.
Prerequisites
- A Doc2X API Key (example: sk-xxx). Get it from: open.noedgeai.com
- Node.js installed, with npx available
- Network access to Doc2X services (users outside mainland China may need a proxy)
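The tooling prerequisite above can be checked before touching any client configuration. The following is a minimal sketch, not part of doc2x-mcp: the helper name missing_tools is hypothetical, and it only looks for node and npx on PATH (it does not test network access or the API key).

```python
import shutil


def missing_tools(tools=("node", "npx"), which=shutil.which):
    """Return the names of required executables that are not on PATH."""
    # `which` is injectable so the check can be exercised without Node.js installed.
    return [name for name in tools if which(name) is None]


# Prints an empty list when both executables are found.
print("missing tools:", missing_tools())
```

An empty list means the client should at least be able to launch npx; a non-empty list points to the "npx not found" entry in the FAQ.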
Generic MCP Configuration
In any client that supports MCP (stdio), add a configuration like this:
{
  "command": "npx",
  "args": ["-y", "@noedgeai-org/doc2x-mcp@latest"],
  "env": {
    "DOC2X_API_KEY": "sk-xxx"
  }
}
Environment Variables
| Variable | Required | Default |
|---|---|---|
| DOC2X_API_KEY | Yes | - |
| DOC2X_BASE_URL | No | https://v2.doc2x.noedgeai.com |
| DOC2X_HTTP_TIMEOUT_MS | No | 60000 |
| DOC2X_POLL_INTERVAL_MS | No | 2000 |
| DOC2X_MAX_WAIT_MS | No | 600000 |
| DOC2X_PARSE_PDF_MAX_OUTPUT_CHARS | No | 5000 |
| DOC2X_PARSE_PDF_MAX_OUTPUT_PAGES | No | 10 |
| DOC2X_DOWNLOAD_URL_ALLOWLIST | No | .amazonaws.com.cn,.aliyuncs.com,.noedgeai.com |
Notes:
- DOC2X_API_KEY: Doc2X API Key (looks like sk-xxx)
- DOC2X_HTTP_TIMEOUT_MS / DOC2X_POLL_INTERVAL_MS / DOC2X_MAX_WAIT_MS: in milliseconds
- DOC2X_PARSE_PDF_MAX_OUTPUT_CHARS: limits the max characters returned by doc2x_parse_pdf_wait_text (0 means unlimited)
- DOC2X_PARSE_PDF_MAX_OUTPUT_PAGES: limits the max pages merged by doc2x_parse_pdf_wait_text (0 means unlimited)
- DOC2X_DOWNLOAD_URL_ALLOWLIST: allowlisted hosts for doc2x_download_url_to_file (comma-separated; * allows any host, not recommended)
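If you maintain the configuration for several clients, it can help to generate the block rather than edit it by hand. The sketch below assumes only the generic configuration shape shown earlier; the helper name build_mcp_config and the override values (300000 and 5) are illustrative, and where the resulting JSON belongs in a given client's config file is client-specific.

```python
import json


def build_mcp_config(api_key, **overrides):
    """Build the generic stdio MCP entry with optional DOC2X_* overrides."""
    env = {"DOC2X_API_KEY": api_key}
    # Environment variable values are passed to the process as strings.
    env.update({name: str(value) for name, value in overrides.items()})
    return {
        "command": "npx",
        "args": ["-y", "@noedgeai-org/doc2x-mcp@latest"],
        "env": env,
    }


# Illustrative values: a shorter maximum wait and a smaller page cap.
config = build_mcp_config(
    "sk-xxx",
    DOC2X_MAX_WAIT_MS=300000,
    DOC2X_PARSE_PDF_MAX_OUTPUT_PAGES=5,
)
print(json.dumps(config, indent=2))
```

Variables you do not pass are simply omitted, so the defaults from the table apply.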
FAQ
npx not found
- Make sure Node.js is installed (LTS recommended)
- Reopen your terminal or check your PATH
Dependency installation is slow or fails
- Check network/proxy settings
- Try switching the npm registry mirror and retry
API Key not taking effect
- Ensure DOC2X_API_KEY is injected when starting the MCP server
- Check client logs to confirm environment variables are loaded
Why formula_level does not take effect
- formula_level only works for parse tasks created with v3-2026
- If the source parse task uses the default v2, changing formula_level may appear to make no difference
- To compare formula_level=0/1/2, first submit the PDF parse task with model: "v3-2026"
Why three exports look the same
- For the same uid + to, later export outputs may overwrite earlier ones
- When comparing formula_level=0/1/2, download immediately after each export succeeds
- Do not wait and download everything at the end; otherwise you may only get the last export result
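The ordering that matters here (export, then download, before starting the next export) can be sketched as a small loop. This is only an illustration of the sequencing: export and download are hypothetical callables standing in for whatever triggers the export and saves the file in your setup, not doc2x-mcp APIs.

```python
def export_and_download_each(levels, export, download):
    """For each level, run export(level) and download(level) back to back."""
    saved = []
    for level in levels:
        export(level)                  # hypothetical: start and await the export
        saved.append(download(level))  # hypothetical: save the result right away
    return saved


# Stand-in callables that only record the call order.
log = []


def fake_export(level):
    log.append(("export", level))


def fake_download(level):
    log.append(("download", level))
    return f"paper_formula_level_{level}.md"


paths = export_and_download_each((0, 1, 2), fake_export, fake_download)
print(log)
print(paths)
```

Because each download happens before the next export starts, a later export cannot overwrite a result that has not been saved yet.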
Why is the model input truncated / output incomplete
- doc2x_parse_pdf_wait_text merges the parsed result into a single text response; if it’s too long, it may exceed the model/session context limit and get truncated on input
- Use DOC2X_PARSE_PDF_MAX_OUTPUT_PAGES / DOC2X_PARSE_PDF_MAX_OUTPUT_CHARS to cap merged pages and returned characters (recommended defaults: 10 / 5000; if it still truncates, reduce further, e.g. 5 / 3000)
- If you need the full content, prefer exporting to a local file (md/tex/docx) and then do summarization/translation/extraction based on the exported files, instead of pasting the entire text into chat
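To make the two caps concrete, here is a rough sketch of how a page cap and a character cap with "0 means unlimited" semantics interact. It is an illustration only, not the server's implementation: the function name merge_and_cap, the blank-line page separator, and the order of the two steps are assumptions.

```python
def merge_and_cap(pages, max_pages=10, max_chars=5000):
    """Merge page texts, keeping at most max_pages pages and max_chars characters.

    A value of 0 disables the corresponding limit.
    """
    kept = pages if max_pages == 0 else pages[:max_pages]
    text = "\n\n".join(kept)  # assumed separator between pages
    return text if max_chars == 0 else text[:max_chars]


pages = [f"Page {n} text" for n in range(1, 21)]
print(len(merge_and_cap(pages)))                            # both caps applied
print(len(merge_and_cap(pages, max_pages=0, max_chars=0)))  # no caps
```

Lowering either value shrinks what reaches the model, which is why reducing the caps is the first thing to try when the response is still truncated.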
Downloaded file looks garbled
- First check whether the downloaded file is actually an archive (for example .zip); some export results must be extracted before viewing
- Try unzip your_file.zip -d output_dir, then open the extracted file
- If it is a plain text file like .md or .tex, open it with a UTF-8 compatible editor
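Where unzip is not available (for example on a stock Windows setup), the same check can be done with Python's standard zipfile module. This is a minimal sketch; the helper name extract_if_zip is hypothetical, and it decides by file content rather than by extension.

```python
import zipfile
from pathlib import Path


def extract_if_zip(path, output_dir):
    """Extract path into output_dir if it is a ZIP archive.

    Returns the archive's member names, or an empty list when the file
    is not a ZIP archive (for example a plain .md or .tex file).
    """
    path = Path(path)
    if not zipfile.is_zipfile(path):
        return []
    with zipfile.ZipFile(path) as archive:
        archive.extractall(output_dir)
        return archive.namelist()


# Usage (paths are placeholders):
# extract_if_zip("your_file.zip", "output_dir")
```

If the function returns an empty list, the download is not an archive and can be opened directly in a UTF-8 compatible editor.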
Not sure how to prompt / the model didn’t call Doc2X MCP
In your prompt, explicitly say to use doc2x-mcp (or Doc2X MCP), and clearly include:
- A local file path (prefer an absolute path, or a path relative to the current working directory)
- Whether to export
- Export format (md / tex / docx)
- (If needed) output path or follow-up requirements
Recommended prompt templates (replace the path with yours; prefer absolute paths):
Parse a PDF and export to Markdown / LaTeX / Word
    Use doc2x-mcp to parse /abs/path/paper.pdf, and export to md|tex|docx (e.g., md).
Parse → export to Markdown → summarize key points
    First use doc2x-mcp to parse /abs/path/paper.pdf and export to md; then give me 10 key takeaways + 1 short summary based on the exported content.
Parse → export to Markdown → translate into Chinese (keep formulas/code)
    Use doc2x-mcp to parse /abs/path/paper.pdf and export to md; then translate the main text into Chinese, but keep code blocks and math formulas unchanged.
Parse → export to Markdown → generate a table of contents
    Use doc2x-mcp to parse /abs/path/paper.pdf and export to md; then generate a table of contents based on heading levels.
Parse → export to Markdown → split output by H1
    Use doc2x-mcp to parse /abs/path/paper.pdf and export to md; if the content is long, split the output by H1 headings and label each section with its title.
Client Guides
Install the Skill (Optional)
This installs the local Skill shipped in the doc2x-mcp repo for Codex CLI / Claude Code. It packages a repeatable workflow for using doc2x-mcp tools (parse → export → download / troubleshooting), so you don’t have to restate the steps in every chat.
Note: the one-liner runs a remote script (curl | sh / irm | iex). In enterprise/production environments, review the script first. See also: Claude Code Skills docs.
One-liner
macOS / Linux:
curl -fsSL https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.sh | sh
Windows PowerShell:
irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.ps1 | iex
By default this installs to both Codex CLI and Claude Code. After installation, check that these directories exist (restart the client if needed):
- Codex CLI: ~/.codex/skills/public/doc2x-mcp (override with CODEX_HOME)
- Claude Code: ~/.claude/skills/doc2x-mcp (override with CLAUDE_HOME)
Project Links
- Repo: NoEdgeAI/doc2x-mcp
- Issues: NoEdgeAI/doc2x-mcp/issues (suggestions / bug reports)