Extract - Textin Intelligent Document Parsing

curl --request POST \
  --url https://api.textin.com/api/xparse/extract/sync \
  --header 'Content-Type: multipart/form-data' \
  --header 'x-ti-app-id: <api-key>' \
  --header 'x-ti-secret-code: <api-key>' \
  --form file='@example-file' \
  --form 'extract_config={"schema": {"type": "object", "properties": {"商品": {"type": ["string","null"], "description": ""}}, "required": ["商品"]}, "generate_citations": false, "stamp": false}' \
  --form 'parse_config={"provider": "textin", "parse_mode": "auto"}'

{
  "code": 200,
  "msg": "success",
  "data": {
    "file_id": "xxx",
    "status": "completed",
    "result": {
      "success_count": 1,
      "extracted_schema": {
        "商品": "童装 Looney Tunes UT（短袖T恤）女装SUPIMA COTTON圆领T恤（短袖）"
      },
      "citations": {
        "商品": {
          "value": "童装 Looney Tunes UT（短袖T恤）女装SUPIMA COTTON圆领T恤（短袖）",
          "bounding_regions": [
            {
              "page_number": 1,
              "position": [137, 599, 1129, 599, 1129, 625, 182, 625],
              "text": "童装 Looney Tunes UT（短袖T恤）"
            }
          ]
        }
      },
      "pages": [
        {
          "page_number": 1,
          "image_id": "62bfe3c3a8e9c9cf.jpg",
          "height": 1824,
          "width": 600,
          "angle": 0,
          "status": "Success",
          "durations": 930.178466796875
        }
      ],
      "stamps": []
    }
  }
}
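A minimal sketch of reading a response shaped like the sample above, assuming the body has already been decoded from JSON into a Python dict. The helper names `extracted_fields` and `citation_polygons` are hypothetical. Treating `position` as a flat list of four (x, y) corner points is an assumption inferred from the sample; this page does not state it.

```python
import json

# Trimmed copy of the sample response (only the fields used below).
SAMPLE = json.loads("""
{
  "code": 200,
  "msg": "success",
  "data": {
    "file_id": "xxx",
    "status": "completed",
    "result": {
      "extracted_schema": {
        "商品": "童装 Looney Tunes UT（短袖T恤）女装SUPIMA COTTON圆领T恤（短袖）"
      },
      "citations": {
        "商品": {
          "bounding_regions": [
            {
              "page_number": 1,
              "position": [137, 599, 1129, 599, 1129, 625, 182, 625],
              "text": "童装 Looney Tunes UT（短袖T恤）"
            }
          ]
        }
      }
    }
  }
}
""")


def extracted_fields(response: dict) -> dict:
    """Return data.result.extracted_schema, raising if code is not 200."""
    if response.get("code") != 200:
        raise RuntimeError(f"extract failed: {response.get('code')} {response.get('msg')}")
    return response["data"]["result"]["extracted_schema"]


def citation_polygons(response: dict, field: str) -> list:
    """Return (page_number, [(x, y), ...]) for each citation region of a field.

    Assumption: `position` is a flat list of alternating x and y values.
    """
    citations = response["data"]["result"].get("citations") or {}
    regions = citations.get(field, {}).get("bounding_regions", [])
    result = []
    for region in regions:
        pos = region["position"]
        points = list(zip(pos[0::2], pos[1::2]))  # pair up x and y values
        result.append((region["page_number"], points))
    return result
```

Both helpers tolerate a missing `citations` object, which may be the case when `generate_citations` is false.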

Authorization

x-ti-app-id

string

header

Required

Log in to Textin and go to "Workbench - Account Settings - Developer Information" to find your x-ti-app-id.

x-ti-secret-code

string

header

Required

Log in to Textin and go to "Workbench - Account Settings - Developer Information" to find your x-ti-secret-code.
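A minimal sketch of building the two authentication headers in Python. The environment variable names `TEXTIN_APP_ID` and `TEXTIN_SECRET_CODE` and the helper `auth_headers` are hypothetical, chosen only to keep credentials out of source code.

```python
import os


def auth_headers(env=os.environ) -> dict:
    """Build the x-ti-app-id and x-ti-secret-code headers.

    Reads from hypothetical environment variables TEXTIN_APP_ID and
    TEXTIN_SECRET_CODE; raises if either one is missing or blank.
    """
    app_id = env.get("TEXTIN_APP_ID", "").strip()
    secret_code = env.get("TEXTIN_SECRET_CODE", "").strip()
    if not app_id or not secret_code:
        raise ValueError("TEXTIN_APP_ID and TEXTIN_SECRET_CODE must both be set")
    return {"x-ti-app-id": app_id, "x-ti-secret-code": secret_code}
```

Checking for blank values locally avoids a round trip that would otherwise end in error 40101.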

Request body

multipart/form-data

file

Required

The document file to process (supports PDF, Word, Excel, PPT, images, and other formats).

extract_config

string

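Required

JSON string containing the extraction configuration (required); it matches the configuration of the Pipeline Extract node.

It must contain the following fields:

schema: JSON Schema definition that specifies the structure of the fields to extract
generate_citations: whether to generate citation information (coordinate positions); defaults to false
stamp: whether to run stamp (seal) recognition; defaults to false

For the configuration format, see Information Extraction - Extract.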
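The three fields above can be assembled into the required JSON string as sketched here. The helper `build_extract_config` is hypothetical, and giving every field the type `["string", "null"]` simply mirrors the example on this page; real schemas may use other JSON Schema types.

```python
import json


def build_extract_config(fields: dict, generate_citations: bool = False,
                         stamp: bool = False) -> str:
    """Serialize an extract_config string.

    `fields` maps each field name to its description. Every field is
    declared as a nullable string and marked required (an assumption that
    copies the page's example).
    """
    schema = {
        "type": "object",
        "properties": {
            name: {"type": ["string", "null"], "description": description}
            for name, description in fields.items()
        },
        "required": list(fields),
    }
    config = {
        "schema": schema,
        "generate_citations": generate_citations,
        "stamp": stamp,
    }
    # ensure_ascii=False keeps non-ASCII field names readable in the string.
    return json.dumps(config, ensure_ascii=False)
```

Calling `build_extract_config({"商品": ""})` yields a string that decodes to the same object as the page's own example.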

Example:

"{\"schema\": {\"type\": \"object\", \"properties\": {\"商品\": {\"type\": [\"string\",\"null\"], \"description\": \"\"}}, \"required\": [\"商品\"]}, \"generate_citations\": false, \"stamp\": false}"

parse_config

string

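JSON string containing the Parse configuration (optional); it matches the configuration of the Pipeline Parse node.

If omitted, the default Parse configuration (provider: "textin") is used.

For the configuration format, see Document Parsing - Parse.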
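A sketch of how the form fields `file`, `extract_config`, and the optional `parse_config` could be encoded as multipart/form-data using only the Python standard library. The helpers `encode_multipart` and `build_request` are hypothetical; the URL is the one from the curl example. The file part is labeled `application/octet-stream` as an assumption, since this page does not say which part content type the service expects.

```python
import json
import urllib.request
import uuid

EXTRACT_URL = "https://api.textin.com/api/xparse/extract/sync"


def encode_multipart(file_name, file_bytes, extract_config, parse_config=None):
    """Return (content_type, body) for the request.

    `extract_config` and `parse_config` are dicts serialized to JSON strings.
    Assumes `file_name` contains no double quotes or line breaks.
    """
    boundary = uuid.uuid4().hex

    def text_part(name, value):
        head = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        return head.encode("utf-8") + value.encode("utf-8") + b"\r\n"

    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    chunks = [
        file_head.encode("utf-8") + file_bytes + b"\r\n",
        text_part("extract_config", json.dumps(extract_config, ensure_ascii=False)),
    ]
    if parse_config is not None:  # optional: omit to use the default Parse config
        chunks.append(text_part("parse_config", json.dumps(parse_config, ensure_ascii=False)))
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)


def build_request(app_id, secret_code, content_type, body):
    """Build (but do not send) the POST request."""
    headers = {
        "Content-Type": content_type,
        "x-ti-app-id": app_id,
        "x-ti-secret-code": secret_code,
    }
    return urllib.request.Request(EXTRACT_URL, data=body, headers=headers, method="POST")
```

The request can then be sent with `urllib.request.urlopen` and the response body decoded with `json.load`.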

Example:

"{\"provider\": \"textin\", \"parse_mode\": \"auto\"}"

Response

200 - application/json

Extraction result

code

enum<integer>

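Default: 200

Required

Status code

200: Success
40101: x-ti-app-id or x-ti-secret-code is empty
40102: x-ti-app-id or x-ti-secret-code is invalid; authentication failed
40103: Client IP is not on the whitelist
40003: Insufficient balance; top up before continuing
40004: Parameter error (check the request parameters)
40007: Robot does not exist or has not been published
40008: Robot has not been activated; activate it in the marketplace and retry
40302: Uploaded file size is invalid; the file must not exceed 50 MB
40303: File type not supported; the API returns the file type it actually detected, e.g. "the current file type is .gif"
40304: Image dimensions are invalid; images with an aspect ratio below 2 must have width and height between 20 and 20000 pixels, and all other images between 20 and 10000 pixels
40305: File not uploaded (the file to recognize was not uploaded)
40306: QPS limit exceeded
40400: Invalid request URL; check that the URL is correct
40422: The file is corrupted
40423: Password required or incorrect password (wrong PDF password)
40424: Page number out of range (the page setting exceeds the file's page range)
40425: The input file format is not supported
40428: Process office file failed (Word or PPT conversion to PDF failed or timed out)
500: Engine failed (internal server error)
50011: LLM Connection Failed (timed out while accessing the large language model)
50012: LLM Engine Failed (large language model engine error)
50207: Partial failed (some pages failed to parse)

For more detailed error information, see the error code reference.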

Available options:

200, 40101, 40102, 40103, 40003, 40004, 40007, 40008, 40302, 40303, 40304, 40305, 40306, 40400, 40422, 40423, 40424, 40425, 40428, 500, 50011, 50012, 50207
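One way a client might act on these status codes is sketched below. The grouping is an assumption made for illustration, not guidance from this page: credential and whitelist errors are treated as configuration problems, rate-limit and server-side errors as candidates for retry, and 50207 as a partial result. The names `classify`, `AUTH_CODES`, `RETRYABLE_CODES`, and `PARTIAL_CODES` are hypothetical.

```python
# Assumed grouping of the status codes listed above (illustrative only).
AUTH_CODES = {40101, 40102, 40103}            # credentials or IP whitelist
RETRYABLE_CODES = {40306, 500, 50011, 50012}  # rate limit, engine or LLM errors
PARTIAL_CODES = {50207}                       # some pages failed to parse


def classify(code: int) -> str:
    """Map a response `code` to a coarse handling category."""
    if code == 200:
        return "ok"
    if code in PARTIAL_CODES:
        return "partial"
    if code in AUTH_CODES:
        return "auth"
    if code in RETRYABLE_CODES:
        return "retry"
    # Everything else (bad parameters, unsupported or corrupted files,
    # insufficient balance, ...) needs a change to the request or account.
    return "fail"
```

A caller could, for instance, back off and resend on `"retry"` while surfacing `"auth"` and `"fail"` to the user together with the `msg` field.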

msg

string

Required

Error message

Example:

"success"

data

object

Required
