Intelligent Extraction - Textin Intelligent Document Parsing

General information extraction

curl --request POST \
  --url https://api.textin.com/ai/service/v3/entity_extraction \
  --header 'Content-Type: application/json' \
  --header 'x-ti-app-id: <app-id>' \
  --header 'x-ti-secret-code: <secret-code>' \
  --data '
{
  "file": {
    "file_base64": "/9j/4AAQSk...",
    "file_url": "https://example.com/document.pdf",
    "file_name": "document.pdf"
  },
  "schema": {
    "type": "object",
    "properties": {
      "商品": {
        "type": "string",
        "description": "商品名称"
      }
    },
    "required": [
      "商品"
    ]
  },
  "parse_options": {
    "page_start": 1,
    "page_count": 10,
    "get_image": "objects",
    "crop_dewarp": 0,
    "remove_watermark": 0,
    "parse_mode": "scan",
    "formula_level": 0,
    "table_flavor": "html",
    "pdf_pwd": "<string>"
  },
  "extract_options": {
    "generate_citations": true,
    "stamp": true
  }
}
'

{
  "code": 200,
  "message": "Success",
  "version": "v3.0.29_20250819",
  "duration": 8267,
  "x_request_id": "7596b8c9d2ddbc9924b66651e9efc174",
  "status": "finished",
  "result": {
    "success_count": 1,
    "extracted_schema": {
      "商品": "童装 Looney Tunes UT（短袖T恤）女装SUPIMA COTTON圆领T恤（短袖）"
    },
    "citations": {
      "商品": {
        "value": "童装 Looney Tunes UT（短袖T恤）女装SUPIMA COTTON圆领T恤（短袖）",
        "bounding_regions": [
          {
            "page_number": 1,
            "position": [
              137,
              599,
              1129,
              599,
              1129,
              625,
              182,
              625
            ],
            "text": "童装 Looney Tunes UT（短袖T恤）女装SUPIMA COTTON圆领T恤（短袖）"
          }
        ]
      }
    },
    "pages": [
      {
        "page_number": 1,
        "status": "Success",
        "durations": 930.178466796875,
        "image_id": "62bfe3c3a8e9c9cf.jpg",
        "height": 1824,
        "width": 600,
        "angle": 0
      }
    ],
    "stamps": [
      {
        "color": "红色",
        "position": [
          1223,
          995,
          1642,
          1007,
          1630,
          1689,
          1621,
          1677
        ],
        "stamp_shape": "圆章",
        "type": "公章",
        "value": "电力公司专用章"
      }
    ]
  },
  "part_durations": {
    "parse_duration": 1080,
    "retrieve_duration": 0,
    "prompt_duration": 1,
    "llm_duration": 7114,
    "format_duration": 51
  }
}
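
The sample response above can be read programmatically. The following is a minimal sketch, not an official client: it assumes that each `position` array holds four corner points flattened as x1, y1, x2, y2, x3, y3, x4, y4, which the sample values suggest but the source does not state. The helper names `position_to_points` and `citation_boxes` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Point = Tuple[int, int]


def position_to_points(position: List[int]) -> List[Point]:
    """Group a flat [x1, y1, x2, y2, ...] list into (x, y) pairs.

    Assumption: the API flattens corner points in this order.
    """
    if len(position) % 2 != 0:
        raise ValueError("position must hold an even number of values")
    return [(position[i], position[i + 1]) for i in range(0, len(position), 2)]


def citation_boxes(response: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Map each extracted field name to the page regions cited for it."""
    citations = response.get("result", {}).get("citations", {})
    boxes: Dict[str, List[Dict[str, Any]]] = {}
    for field, citation in citations.items():
        boxes[field] = [
            {
                "page_number": region.get("page_number"),
                "points": position_to_points(region.get("position", [])),
                "text": region.get("text"),
            }
            for region in citation.get("bounding_regions", [])
        ]
    return boxes
```

Calling `citation_boxes(response)` on a parsed response returns one list of regions per field, which is convenient when drawing highlight boxes over the page image.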

POST https://api.textin.com/ai/service/v3/entity_extraction

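Authorization

x-ti-app-id (string, header, required)

Log in to Textin and go to "Workbench - Account Settings - Developer Information" to find your x-ti-app-id.

x-ti-secret-code (string, header, required)

Log in to Textin and go to "Workbench - Account Settings - Developer Information" to find your x-ti-secret-code.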
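
As an illustration of the two required headers, here is a minimal sketch that assembles them from environment variables. The variable names `TEXTIN_APP_ID` and `TEXTIN_SECRET_CODE` and the function `build_auth_headers` are assumptions made for this example; only the header names come from the reference above.

```python
import os
from typing import Dict, Mapping


def build_auth_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the request headers, failing early if a credential is missing.

    TEXTIN_APP_ID and TEXTIN_SECRET_CODE are hypothetical variable names.
    """
    app_id = env.get("TEXTIN_APP_ID", "")
    secret_code = env.get("TEXTIN_SECRET_CODE", "")
    if not app_id or not secret_code:
        raise ValueError("both TEXTIN_APP_ID and TEXTIN_SECRET_CODE must be set")
    return {
        "Content-Type": "application/json",
        "x-ti-app-id": app_id,
        "x-ti-secret-code": secret_code,
    }
```

Failing locally when a credential is empty avoids a round trip that would otherwise end in a 40101 response.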

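Request body

application/json

Supported file formats: png, jpg, jpeg, pdf, bmp, tiff, webp, doc, docx, html, mhtml, xls, xlsx, csv, ppt, pptx, txt, ofd.

Schema-mode structured information extraction is supported: you define the field structure, and the service extracts precisely those fields.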

file (object, required)

File information. The sample request sets the sub-properties file_base64, file_url, and file_name.

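schema (object, required)

The data structure to extract; see the JSON Schema description. In the example, the field name "商品" means "product" and its description "商品名称" means "product name".

Example: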

{
  "type": "object",
  "properties": {
    "商品": { "type": "string", "description": "商品名称" }
  },
  "required": ["商品"]
}
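
A schema of this shape can also be generated from a plain mapping of field names to descriptions. The sketch below is one way to do that; `build_schema` is a hypothetical helper, and it assumes every field is a string, as in the example above.

```python
from typing import Any, Dict, Iterable, Optional


def build_schema(
    fields: Dict[str, str], required: Optional[Iterable[str]] = None
) -> Dict[str, Any]:
    """Build an object schema whose properties are all string fields.

    `fields` maps a field name to its description. When `required` is
    omitted, every field is marked as required.
    """
    properties = {
        name: {"type": "string", "description": description}
        for name, description in fields.items()
    }
    required_names = list(fields) if required is None else list(required)
    unknown = [name for name in required_names if name not in fields]
    if unknown:
        raise ValueError(f"required fields not defined: {unknown}")
    return {"type": "object", "properties": properties, "required": required_names}
```

`build_schema({"商品": "商品名称"})` reproduces the example schema shown above.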

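parse_options (object)

Parameters for the parsing stage. The sample request sets page_start, page_count, get_image, crop_dewarp, remove_watermark, parse_mode, formula_level, table_flavor, and pdf_pwd.

extract_options (object)

Advanced extraction controls. The sample request sets generate_citations and stamp.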
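
Putting the request body together, the following sketch builds the JSON payload and posts it with the Python standard library. It is not an official client. It assumes that supplying either `file_base64` or `file_url` is enough, which the source does not confirm (the sample request shows both). The function names `build_payload` and `post_extraction` are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, Optional

ENDPOINT = "https://api.textin.com/ai/service/v3/entity_extraction"


def build_payload(
    schema: Dict[str, Any],
    file_name: str,
    file_bytes: Optional[bytes] = None,
    file_url: Optional[str] = None,
    parse_options: Optional[Dict[str, Any]] = None,
    extract_options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the request body from a schema and one file source."""
    # Assumption: exactly one of file_bytes / file_url is supplied.
    if (file_bytes is None) == (file_url is None):
        raise ValueError("pass exactly one of file_bytes or file_url")
    file_info: Dict[str, Any] = {"file_name": file_name}
    if file_bytes is not None:
        file_info["file_base64"] = base64.b64encode(file_bytes).decode("ascii")
    else:
        file_info["file_url"] = file_url
    payload: Dict[str, Any] = {"file": file_info, "schema": schema}
    if parse_options:
        payload["parse_options"] = parse_options
    if extract_options:
        payload["extract_options"] = extract_options
    return payload


def post_extraction(
    payload: Dict[str, Any], headers: Dict[str, str], timeout: float = 60.0
) -> Dict[str, Any]:
    """POST the payload and return the decoded JSON response."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    request = urllib.request.Request(ENDPOINT, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

`headers` must carry `Content-Type: application/json` together with the `x-ti-app-id` and `x-ti-secret-code` values described under Authorization.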

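Response

200 - application/json

Returned result

code (enum<integer>, required)

Status code

200: Success
40101: x-ti-app-id or x-ti-secret-code is empty
40102: x-ti-app-id or x-ti-secret-code is invalid; verification failed
40103: Client IP is not on the whitelist
40003: Insufficient balance; top up before using the service again
40004: Parameter error (check the input parameters)
40007: Robot does not exist or has not been published
40008: Robot has not been activated; activate it in the marketplace and retry
40302: Uploaded file size is not acceptable; the file must not exceed 50 MB
40303: Unsupported file type; the API returns the file type it actually detected, for example "the current file type is .gif"
40304: Image dimensions are not acceptable; images with an aspect ratio below 2 must have width and height within 20 to 20000 pixels, and all other images within 20 to 10000 pixels
40305: File not uploaded (the file to recognize was not uploaded)
40306: QPS limit exceeded
40400: Invalid request URL; check that the URL is correct
40422: The file is corrupted
40423: Password required or incorrect password (wrong PDF password)
40424: Page number out of range (the page settings exceed the range of the file)
40425: The input file format is not supported
40428: Process office file failed (converting Word or PPT to PDF failed or timed out)
500: Engine failed (internal server error)
50011: LLM Connection Failed (timed out accessing the large language model)
50012: LLM Engine Failed (large language model engine error)
50207: Partial failed (some pages failed to parse)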

Available options: 200, 40101, 40102, 40103, 40003, 40004, 40007, 40008, 40302, 40303, 40304, 40305, 40306, 40400, 40422, 40423, 40424, 40425, 40428, 500, 50011, 50012, 50207

Example: 200
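
One way to act on the `code` field is to turn every non-200 value into an exception. The sketch below does that. The set of retryable codes is an assumption based only on the descriptions above (QPS limit, LLM timeout); the source does not say which codes are safe to retry. It also treats 50207 (partial failure) as an error, which may be stricter than you need. The names `TextinApiError` and `ensure_success` are hypothetical.

```python
from typing import Any, Dict, Optional

# Assumption: these codes look transient; the source does not confirm it.
RETRYABLE_CODES = frozenset({40306, 50011})


class TextinApiError(Exception):
    """Raised when the response body carries a non-200 status code."""

    def __init__(self, code: Optional[int], message: str) -> None:
        super().__init__(f"Textin API error {code}: {message}")
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        """True when the code is in the assumed transient set."""
        return self.code in RETRYABLE_CODES


def ensure_success(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the response unchanged when code is 200, otherwise raise."""
    code = response.get("code")
    if code == 200:
        return response
    raise TextinApiError(code, str(response.get("message", "")))
```

A caller can catch `TextinApiError` and inspect `retryable` to decide whether to back off and try again or to surface the error.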

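message (string, required)

Success or error message

Example: "Success"

version (string, required)

Version number

Example: "v3.0.29_20250819"

duration (integer)

Total elapsed time (ms)

Example: 8267

x_request_id (string)

Request ID

Example: "7596b8c9d2ddbc9924b66651e9efc174"

status (string)

Processing status

Example: "finished"

result (object)

In the sample response, result contains success_count, extracted_schema, citations, pages, and stamps.

part_durations (object)

Per-stage timing statistics. In the sample response it contains parse_duration, retrieve_duration, prompt_duration, llm_duration, and format_duration.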
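
To see where a request spends its time, the stage timings can be ranked and compared with the total. This is a small sketch with a hypothetical helper name, `stage_breakdown`. It assumes `duration` and the `part_durations` values share the same unit (ms), and it reports whatever part of `duration` the stages do not cover without interpreting it.

```python
from typing import Any, Dict, List, Tuple


def stage_breakdown(response: Dict[str, Any]) -> Tuple[List[Tuple[str, float]], float]:
    """Rank stages by elapsed time, slowest first.

    Returns (ranked_stages, unaccounted), where unaccounted is the total
    duration minus the sum of the per-stage values.
    """
    stages = response.get("part_durations", {})
    ranked = sorted(stages.items(), key=lambda item: item[1], reverse=True)
    unaccounted = response.get("duration", 0) - sum(stages.values())
    return ranked, unaccounted
```

For the sample response at the top of this page, `llm_duration` (7114 ms) ranks first and accounts for most of the 8267 ms total.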
