POST /ai/service/v3/entity_extraction
General information extraction
curl --request POST \
  --url https://api.textin.com/ai/service/v3/entity_extraction \
  --header 'Content-Type: application/json' \
  --header 'x-ti-app-id: <app-id>' \
  --header 'x-ti-secret-code: <secret-code>' \
  --data '{
  "file": {
    "file_base64": "/9j/4AAQSk...",
    "file_url": "https://example.com/document.pdf",
    "file_name": "document.pdf"
  },
  "schema": {
    "type": "object",
    "properties": {
      "商品": {
        "type": "string",
        "description": "商品名称"
      }
    },
    "required": [
      "商品"
    ]
  },
  "parse_options": {
    "page_start": 1,
    "page_count": 10,
    "get_image": "objects",
    "crop_dewarp": 0,
    "remove_watermark": 0,
    "parse_mode": "scan",
    "formula_level": 0,
    "table_flavor": "html",
    "pdf_pwd": "<string>"
  },
  "extract_options": {
    "generate_citations": true,
    "stamp": true
  }
}'
{
  "code": 200,
  "message": "Success",
  "version": "v3.0.29_20250819",
  "duration": 8267,
  "x_request_id": "7596b8c9d2ddbc9924b66651e9efc174",
  "status": "finished",
  "result": {
    "success_count": 1,
    "extracted_schema": {
      "商品": "童装 Looney Tunes UT(短袖T恤)女装SUPIMA COTTON圆领T恤(短袖)"
    },
    "citations": {
      "商品": {
        "value": "童装 Looney Tunes UT(短袖T恤)女装SUPIMA COTTON圆领T恤(短袖)",
        "bounding_regions": [
          {
            "page_number": 1,
            "position": [
              137,
              599,
              1129,
              599,
              1129,
              625,
              182,
              625
            ],
            "text": "童装 Looney Tunes UT(短袖T恤)女装SUPIMA COTTON圆领T恤(短袖)"
          }
        ]
      }
    },
    "pages": [
      {
        "page_number": 1,
        "image_id": "62bfe3c3a8e9c9cf.jpg",
        "height": 1824,
        "width": 600,
        "angle": 0,
        "status": "Success",
        "durations": 930.178466796875
      }
    ],
    "stamps": [
      {
        "color": "红色",
        "position": [
          1223,
          995,
          1642,
          1007,
          1630,
          1689,
          1621,
          1677
        ],
        "stamp_shape": "圆章",
        "type": "公章",
        "value": "电力公司专用章"
      }
    ]
  },
  "part_durations": {
    "parse_duration": 1080,
    "retrieve_duration": 0,
    "prompt_duration": 1,
    "llm_duration": 7114,
    "format_duration": 51
  }
}
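
The citations in the response above can be turned into page-level boxes for highlighting. The following is a minimal sketch, assuming each `position` array holds four (x, y) corner pairs in order; the source does not state the layout, so treat that reading as an assumption. The helper names are hypothetical.

```python
from typing import Any


def quad_to_points(position: list[int]) -> list[tuple[int, int]]:
    """Split a flat 8-number position into four (x, y) points.

    Assumption: the array is x1, y1, x2, y2, x3, y3, x4, y4.
    """
    if len(position) != 8:
        raise ValueError("expected 8 numbers (four x, y pairs)")
    return [(position[i], position[i + 1]) for i in range(0, 8, 2)]


def axis_aligned_box(position: list[int]) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) enclosing the quadrilateral."""
    points = quad_to_points(position)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def citation_boxes(response: dict[str, Any]) -> dict[str, list[tuple[int, tuple[int, int, int, int]]]]:
    """Map each extracted field to a list of (page_number, box) tuples."""
    citations = response.get("result", {}).get("citations", {})
    boxes = {}
    for field, citation in citations.items():
        boxes[field] = [
            (region["page_number"], axis_aligned_box(region["position"]))
            for region in citation.get("bounding_regions", [])
        ]
    return boxes
```

Missing `result` or `citations` keys yield an empty mapping rather than an error, which may be convenient when `generate_citations` is off.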

Authorizations

x-ti-app-id
string
header
required

Log in to Textin and go to "Workbench > Account Settings > Developer Information" to find your x-ti-app-id.

x-ti-secret-code
string
header
required

Log in to Textin and go to "Workbench > Account Settings > Developer Information" to find your x-ti-secret-code.
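
Both credentials travel as request headers. One way to keep them out of source code is to read them from the environment; this is a sketch only, and the variable names `TEXTIN_APP_ID` and `TEXTIN_SECRET_CODE` are hypothetical, not something Textin defines.

```python
import os
from collections.abc import Mapping


def textin_headers(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build the two authorization headers plus Content-Type.

    TEXTIN_APP_ID and TEXTIN_SECRET_CODE are assumed variable names.
    """
    app_id = env.get("TEXTIN_APP_ID", "")
    secret_code = env.get("TEXTIN_SECRET_CODE", "")
    if not app_id or not secret_code:
        # An empty value would be rejected by the API with code 40101.
        raise RuntimeError("TEXTIN_APP_ID and TEXTIN_SECRET_CODE must both be set")
    return {
        "Content-Type": "application/json",
        "x-ti-app-id": app_id,
        "x-ti-secret-code": secret_code,
    }
```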

Body

application/json

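Supported file formats: png, jpg, jpeg, pdf, bmp, tiff, webp, doc, docx, html, mhtml, xls, xlsx, csv, ppt, pptx, txt, ofd.

Supports schema-based structured information extraction: define the field structure to extract exactly the fields you need.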

file
object
required

File information
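
The request example shows `file_base64`, `file_url`, and `file_name` inside `file`. The sketch below builds that object from either a local path or a URL. It assumes that supplying one of `file_base64` or `file_url` is sufficient, and that the "50M" limit in error 40302 means 50 MiB; neither is confirmed by this page. The function names are hypothetical.

```python
import base64
from pathlib import Path

# Assumption: "50M" in error 40302 means 50 MiB.
MAX_FILE_BYTES = 50 * 1024 * 1024


def file_from_path(path: str) -> dict[str, str]:
    """Build a `file` object from a local file, base64-encoding its bytes."""
    p = Path(path)
    data = p.read_bytes()
    if len(data) > MAX_FILE_BYTES:
        raise ValueError(f"{p.name} exceeds 50M and would likely return 40302")
    return {
        "file_base64": base64.b64encode(data).decode("ascii"),
        "file_name": p.name,
    }


def file_from_url(url: str, file_name: str) -> dict[str, str]:
    """Build a `file` object that points at a remote document."""
    return {"file_url": url, "file_name": file_name}
```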

schema
object
required

Structure of the data to extract; see the JSON Schema notes.

Example:
{
  "type": "object",
  "properties": {
    "商品": { "type": "string", "description": "商品名称" }
  },
  "required": ["商品"]
}
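
For flat schemas like the example above, the structure can be generated from a mapping of field names to descriptions. This is an illustrative helper (`build_schema` is a hypothetical name) that only covers string fields; nested objects or arrays would need to be written by hand.

```python
from typing import Optional


def build_schema(fields: dict[str, str], required: Optional[list[str]] = None) -> dict:
    """Build a flat object schema where every field is a described string.

    `fields` maps field name to description. If `required` is omitted,
    every field is marked as required.
    """
    properties = {
        name: {"type": "string", "description": description}
        for name, description in fields.items()
    }
    required_names = list(fields) if required is None else list(required)
    unknown = [name for name in required_names if name not in fields]
    if unknown:
        raise ValueError(f"required fields not defined in schema: {unknown}")
    return {"type": "object", "properties": properties, "required": required_names}
```

Calling `build_schema({"商品": "商品名称"})` reproduces the example schema.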
parse_options
object

Parameters for the parsing stage

extract_options
object

Advanced extraction controls
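
Putting the four body fields together, a request can be assembled with the Python standard library. This sketch mirrors the curl example; `build_request` is a hypothetical helper, and it only constructs the request object without sending it.

```python
import json
import urllib.request
from typing import Optional

ENDPOINT = "https://api.textin.com/ai/service/v3/entity_extraction"


def build_request(
    app_id: str,
    secret_code: str,
    file: dict,
    schema: dict,
    parse_options: Optional[dict] = None,
    extract_options: Optional[dict] = None,
) -> urllib.request.Request:
    """Assemble the POST request; optional sections are omitted when unset."""
    body = {"file": file, "schema": schema}
    if parse_options:
        body["parse_options"] = parse_options
    if extract_options:
        body["extract_options"] = extract_options
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-ti-app-id": app_id,
            "x-ti-secret-code": secret_code,
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(request, timeout=60)` and decode the body with `json.load`; the timeout value is a guess based on the multi-second `duration` in the sample response.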

Response

200 - application/json

Response result

code
enum<integer>
required

Status code

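  • 200: Success
  • 40101: x-ti-app-id or x-ti-secret-code is empty
  • 40102: x-ti-app-id or x-ti-secret-code is invalid; authentication failed
  • 40103: Client IP is not on the whitelist
  • 40003: Insufficient balance; top up before using the service again
  • 40004: Parameter error (check the request parameters)
  • 40007: Robot does not exist or has not been published
  • 40008: Robot has not been activated; activate it in the marketplace and retry
  • 40302: Invalid upload size; the file must not exceed 50M
  • 40303: Unsupported file type; the API returns the file type it actually detected, e.g. "当前文件类型为.gif" (the current file type is .gif)
  • 40304: Invalid image dimensions; for images with an aspect ratio below 2, width and height must be within 20~20000 pixels; for all other images, within 20~10000 pixels
  • 40305: File not uploaded
  • 40306: QPS limit exceeded
  • 40400: Invalid request URL; check that the URL is correct
  • 40422: The file is corrupted
  • 40423: Password required or incorrect password (PDF password error)
  • 40424: Page number out of range (the page settings exceed the file's page range)
  • 40425: The input file format is not supported
  • 40428: Process office file failed (Word or PPT to PDF conversion failed or timed out)
  • 500: Engine failed (internal server error)
  • 50011: LLM Connection Failed (the request to the LLM timed out)
  • 50012: LLM Engine Failed (LLM engine error)
  • 50207: Partial failed (some pages failed to parse)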
Available options: 200, 40101, 40102, 40103, 40003, 40004, 40007, 40008, 40302, 40303, 40304, 40305, 40306, 40400, 40422, 40423, 40424, 40425, 40428, 500, 50011, 50012, 50207
Example:

200
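
Client code usually needs to decide what to do with each `code`. The grouping below is one possible policy, not something this page prescribes: it assumes that 40306 (QPS limit) and 50011 (LLM timeout) are worth retrying, that 50207 still carries usable partial results, and that 4010x codes indicate a credential or whitelist problem. The category names are hypothetical.

```python
# Assumed groupings; the API documentation does not define these categories.
AUTH_CODES = {40101, 40102, 40103}
RETRYABLE_CODES = {40306, 50011}
PARTIAL_CODES = {50207}


def classify(code: int) -> str:
    """Return 'ok', 'partial', 'retry', 'auth', or 'fail' for a status code."""
    if code == 200:
        return "ok"
    if code in PARTIAL_CODES:
        return "partial"
    if code in RETRYABLE_CODES:
        return "retry"
    if code in AUTH_CODES:
        return "auth"
    return "fail"


def result_or_raise(body: dict) -> dict:
    """Return `result` for ok/partial responses, otherwise raise with the message."""
    code = body.get("code")
    category = classify(code)
    if category in ("ok", "partial"):
        return body.get("result", {})
    raise RuntimeError(f"{category}: code={code} message={body.get('message')!r}")
```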

message
string
required

Success or error message

Example:

"Success"

version
string
required

Version number

Example:

"v3.0.29_20250819"

duration
integer

Total elapsed time (ms)

Example:

8267

x_request_id
string

Request ID

Example:

"7596b8c9d2ddbc9924b66651e9efc174"

status
string

Processing status

Example:

"finished"

result
object
part_durations
object

Time spent in each processing stage
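
To see where time goes, the stage timings can be converted to percentages. This sketch assumes the `part_durations` values are milliseconds, like the top-level `duration`; the page does not state the unit for the individual stages. The function names are hypothetical.

```python
def stage_shares(part_durations: dict[str, float]) -> dict[str, float]:
    """Return each stage's share of the summed stage time, in percent."""
    total = sum(part_durations.values())
    if total == 0:
        return {name: 0.0 for name in part_durations}
    return {name: 100.0 * value / total for name, value in part_durations.items()}


def slowest_stage(part_durations: dict[str, float]) -> str:
    """Return the name of the stage with the largest duration."""
    if not part_durations:
        raise ValueError("part_durations is empty")
    return max(part_durations, key=part_durations.get)
```

With the sample response's values, `llm_duration` accounts for most of the time, so the LLM call dominates latency for that request. Note that the stage values in the sample sum to 8246 while `duration` is 8267, so the shares are relative to the stage total rather than to the full request time.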