Overview
This endpoint provides image-to-text functionality using vision-enabled models in response mode. provide image URLs to extract text content, describe scenes, or analyze visual information.Authentication
This endpoint requires authentication using a Bearer token.Your API key in the format:
sk-*****Request Body
The model to use for the request. Must be a vision-enabled model.
Array of message objects with role and content. Each message contains:
role: “user” or “assistant”content: Array of content objects supporting the following types:- input_text: Text prompt with
textfield - input_image: Image input with
image_urlfield, supports:- Base64 encoded images:
data:image/jpeg;base64,{base64_string} - Supported formats: JPG, PNG
- Base64 encoded images:
- input_text: Text prompt with
Whether to stream the response
Request Example
Response
Unique identifier for the response
Object type, always “response”
Unix timestamp of when the response was created
The model used for generating the response
The generated text output (the extracted text or image analysis)
Token usage statistics

