gemini-2.5-pro (Image To Text) - GPTProto API Documentation

Authentication

1. Sign up for a GPTProto account at https://gptproto.com.
2. Navigate to the API Keys section in your dashboard.
3. Generate a new API key (sk-xxxxx).
4. Copy and securely store your API key.

For authentication details, please refer to the Authentication section.

Initiate Request

curl --location 'https://gptproto.com/v1beta/models/gemini-2.5-pro:generateContent' \
--header 'Authorization: GPTPROTO_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "What is shown in this PNG image?"
        },
        {
          "file_data": {
            "mime_type": "image/png",
            "file_uri": "https://tos.gptproto.com/resource/cat.png"
          }
        }
      ]
    }
  ],
  "generationConfig": {
    "thinkingConfig": {
      "includeThoughts": true,
      "thinkingBudget": 1000
    }
  }
}'
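
The same request can be assembled with Python's standard library. This is a minimal sketch, not an official client: `build_request` is a hypothetical helper name, and the `Authorization` value is passed through exactly as the curl example shows it. Nothing is sent until the request is handed to `urllib.request.urlopen`.

```python
import json
import urllib.request

ENDPOINT = "https://gptproto.com/v1beta/models/gemini-2.5-pro:generateContent"

def build_request(api_key: str, prompt: str, image_url: str) -> urllib.request.Request:
    """Assemble (but do not send) the image-to-text request shown above."""
    body = {
        "contents": [{
            "role": "user",
            "parts": [
                {"text": prompt},
                {"file_data": {"mime_type": "image/png", "file_uri": image_url}},
            ],
        }],
        "generationConfig": {
            "thinkingConfig": {"includeThoughts": True, "thinkingBudget": 1000}
        },
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = build_request(
    "sk-xxxxx",
    "What is shown in this PNG image?",
    "https://tos.gptproto.com/resource/cat.png",
)
# To send it: urllib.request.urlopen(req).read() yields the raw response bytes.
```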

Response Example

{
    "candidates": [
        {
            "content": {
                "role": "model",
                "parts": [
                    {
                        "text": "This image depicts a cozy, serene indoor scene, likely generated by AI or created as digital art, characterized by warm lighting and a peaceful atmosphere. Here are the key elements shown:\n\n**The Subject:**\n*   **The Cat:** The central focus is a fluffy, orange (ginger) tabby cat sitting upright on a wooden window sill. The cat has striking green eyes, long white whiskers, and a very bushy tail that curls slightly over the edge of the sill. It is looking forward with a calm expression.\n\n**The Setting:**\n*   **Window:** The cat is positioned in front of a large window with a wooden frame. Through the glass, you can see large trees with green foliage, suggesting a garden or park outside. The background is blurred (bokeh effect), keeping the focus on the interior.\n*   **Lighting:** Soft, golden sunlight is streaming in from the left, illuminating the cat's fur and creating a halo effect. There are visible dust motes or magical sparkles floating in the sunbeams, adding to the dreamy quality.\n\n**Surrounding Objects:**\n*   **Book:** To the left of the cat on the window sill, there is a thick, closed book lying flat.\n*   **Tea:** In the lower right corner, sitting on a small round wooden side table, is a teacup on a saucer. Steam is rising from the cup, indicating a hot beverage. A tea bag tag hangs over the side.\n*   **Interior Details:** On the right side of the image, there is a dark wooden bookshelf filled with books. Partially visible is a brown leather armchair with a chunky, cream-colored knit blanket draped over it, enhancing the \"hygge\" or cozy feel.\n*   **Wall Art:** A framed picture hangs on the wall near the bookshelf, though the image inside the frame is indistinct.\n\nOverall, the image conveys a sense of comfort, quiet contemplation, and the warmth of a relaxing afternoon.",
                        "thoughtSignature": "Cv0oAY89a19fZ/pV0io*****"
                    }
                ]
            },
            "finishReason": "STOP"
        }
    ],
    "usageMetadata": {
        "promptTokenCount": 1097,
        "candidatesTokenCount": 403,
        "totalTokenCount": 2779,
        "trafficType": "ON_DEMAND",
        "promptTokensDetails": [
            {
                "modality": "IMAGE",
                "tokenCount": 1089
            },
            {
                "modality": "TEXT",
                "tokenCount": 8
            }
        ],
        "candidatesTokensDetails": [
            {
                "modality": "TEXT",
                "tokenCount": 403
            }
        ],
        "thoughtsTokenCount": 1279
    },
    "modelVersion": "gemini-2.5-pro",
    "createTime": "2025-12-22T13:20:13.753813Z",
    "responseId": "jUVJaZWBLvzN998P1tDo4Q0"
}
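
Reading the response usually comes down to joining the text parts and checking token usage. The sketch below assumes the response has already been decoded into a Python dict; `summarize_response` is a hypothetical helper, and the sample is a trimmed copy of the response above with the answer text shortened.

```python
def summarize_response(response: dict) -> dict:
    """Collect the answer text and token counts from a generateContent response."""
    candidate = response["candidates"][0]
    # A part may carry other fields (such as thoughtSignature); only "text" is joined.
    text = "".join(part.get("text", "") for part in candidate["content"]["parts"])
    usage = response.get("usageMetadata", {})
    return {
        "text": text,
        "finish_reason": candidate.get("finishReason"),
        "prompt_tokens": usage.get("promptTokenCount", 0),
        "answer_tokens": usage.get("candidatesTokenCount", 0),
        "thought_tokens": usage.get("thoughtsTokenCount", 0),
        "total_tokens": usage.get("totalTokenCount", 0),
    }

# Trimmed copy of the response example (answer text shortened).
sample = {
    "candidates": [{
        "content": {
            "role": "model",
            "parts": [{
                "text": "This image depicts a cozy, serene indoor scene.",
                "thoughtSignature": "Cv0oAY89a19fZ/pV0io*****",
            }],
        },
        "finishReason": "STOP",
    }],
    "usageMetadata": {
        "promptTokenCount": 1097,
        "candidatesTokenCount": 403,
        "thoughtsTokenCount": 1279,
        "totalTokenCount": 2779,
    },
}
summary = summarize_response(sample)
```

In this example the total is the sum of the prompt, answer, and thought counts (1097 + 403 + 1279 = 2779).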

Parameters

Path Parameters

Endpoint: https://gptproto.com/v1beta/models/{model}:{method}

The Google Gemini API provides two methods for content generation, distinguished by whether they return the response incrementally (streaming) or all at once (non-streaming).

Method	Example	Description
`generateContent`	`https://gptproto.com/v1beta/models/gemini-2.5-pro:generateContent`	Generates a complete response all at once. Best for applications where you need the full response before processing.
`streamGenerateContent`	`https://gptproto.com/v1beta/models/gemini-2.5-pro:streamGenerateContent`	Streams the response incrementally as it’s generated. Ideal for chat interfaces and real-time applications where latency is important.
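
The endpoint pattern can be filled in programmatically. A small sketch, where `endpoint` is a hypothetical helper that only accepts the two methods listed in the table:

```python
BASE_URL = "https://gptproto.com/v1beta/models"
METHODS = ("generateContent", "streamGenerateContent")

def endpoint(model: str, method: str = "generateContent") -> str:
    """Fill in the {model}:{method} path pattern."""
    if method not in METHODS:
        raise ValueError(f"unknown method {method!r}; expected one of {METHODS}")
    return f"{BASE_URL}/{model}:{method}"

full_url = endpoint("gemini-2.5-pro")
stream_url = endpoint("gemini-2.5-pro", "streamGenerateContent")
```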

Core Parameters

Parameter	Type	Required	Default	Range	Description
`contents`	array	✅ Yes	-	-	Content of the current conversation with the model. For single-turn queries, this contains one instance. For multi-turn queries (e.g., chat), this contains the conversation history and the latest request.
>`contents.role`	string	✅ Yes	-	`user` `model`	The role of the message sender.
>`contents.parts`	array	✅ Yes	-	-	The content parts of the message, which can contain different types of content (text, inlineData, etc.).
>>`contents.parts.text`	string	✅ Yes	-	-	Text content of the part. For multimodal input details, see Multimodal Input.
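
For a multi-turn query, `contents` carries the earlier turns followed by the latest request. A minimal sketch of that shape, assuming text-only turns; `text_message` is a hypothetical helper and the conversation text is made up for illustration:

```python
ROLES = ("user", "model")

def text_message(role: str, text: str) -> dict:
    """Build one entry of `contents` holding a single text part."""
    if role not in ROLES:
        raise ValueError(f"role must be one of {ROLES}, got {role!r}")
    return {"role": role, "parts": [{"text": text}]}

# Conversation history first, latest user request last.
payload = {
    "contents": [
        text_message("user", "What animal is in the picture I described?"),
        text_message("model", "It sounds like an orange tabby cat."),
        text_message("user", "What breed could it be?"),
    ]
}
```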

Advanced Parameters

Parameter	Type	Required	Default	Range	Description
`tools`	array	❌ No	-	-	List of tools the model may use to generate the next response. Supported tools include Function and codeExecution.
`toolConfig`	object	❌ No	-	-	Configuration for any tools specified in the request.
`safetySettings`	array	❌ No	-	-	List of unique SafetySetting instances for filtering unsafe content. Each SafetyCategory should have at most one setting. See Safety Settings.
`generationConfig`	object	❌ No	-	-	Configuration options for content generation.
>`generationConfig.temperature`	number	❌ No	-	0.0-1.0	Controls the randomness of the output. Lower values produce more deterministic results.
>`generationConfig.topP`	number	❌ No	-	0.0-1.0	Nucleus sampling probability threshold.
>`generationConfig.topK`	integer	❌ No	-	-	Top-k sampling parameter.
>`generationConfig.maxOutputTokens`	integer	❌ No	-	-	Maximum number of tokens to generate.
>`generationConfig.thinkingConfig`	object	❌ No	-	-	Configuration for thinking functionality. If set for models that don’t support thinking, the system will return an error. See Thinking Config for details.
>`generationConfig.imageConfig`	object	❌ No	-	-	Configuration for image generation. If set for models that don’t support these configuration options, the system will return an error. See Image Config for details.
>`generationConfig.mediaResolution`	enum	❌ No	-	`MEDIA_RESOLUTION_UNSPECIFIED` `MEDIA_RESOLUTION_LOW` `MEDIA_RESOLUTION_MEDIUM` `MEDIA_RESOLUTION_HIGH`	If specified, uses the specified media resolution. Note: This field describes the resolution of input media. To control the resolution of output images, use the `imageConfig` field instead.

Multimodal Input

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "inlineData": {
            "mimeType": "image/jpeg",
            "data": "base64-encoded-image-data"
          }
        },
        {
          "text": "Describe this image."
        }
      ]
    }
  ]
}

Parameter	Type	Required	Default	Range / Example	Description
`contents.parts`	array	✅ Yes	-	`text` `inlineData` `fileData`	The content parts of the message, which can contain different types of content.
>`contents.parts.inlineData`	object	❌ No	-	-	Inline media content. If used, `data` must be base64-encoded.
>>`contents.parts.inlineData.mimeType`	string	✅ Yes (if `inlineData` is used)	-	`application/pdf` `image/jpeg`	The IANA-standard MIME type of the source data. If the provided MIME type is not supported, the system will return an error.
>>`contents.parts.inlineData.data`	string	✅ Yes (if `inlineData` is used)	-	-	Base64-encoded media data.
>`contents.parts.fileData`	object	❌ No	-	-	File media content. If used, `fileUri` must be provided.
>>`contents.parts.fileData.mimeType`	string	✅ Yes (if `fileData` is used)	-	`application/pdf` `image/jpeg`	The IANA-standard MIME type of the source data. If the provided MIME type is not supported, the system will return an error.
>>`contents.parts.fileData.fileUri`	string	✅ Yes (if `fileData` is used)	-	-	The URI of the file to be processed.
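
The two media part shapes differ only in how the bytes are supplied. The sketch below builds both; `inline_part` and `file_part` are hypothetical helper names, and the byte string stands in for real image data.

```python
import base64

def inline_part(raw: bytes, mime_type: str = "image/jpeg") -> dict:
    """Embed media bytes directly in the request as base64 text."""
    encoded = base64.b64encode(raw).decode("ascii")
    return {"inlineData": {"mimeType": mime_type, "data": encoded}}

def file_part(file_uri: str, mime_type: str) -> dict:
    """Reference media by URI instead of embedding it."""
    return {"fileData": {"mimeType": mime_type, "fileUri": file_uri}}

raw_bytes = b"\xff\xd8\xff\xe0 placeholder bytes, not a real image"
parts = [
    inline_part(raw_bytes),
    {"text": "Describe this image."},
]
```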

Safety Settings

{
  "safetySettings": [
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_ONLY_HIGH"
    }
  ]
}

Parameter	Type	Required	Default	Range / Example	Description
`category`	string	✅ Yes	-	`HARM_CATEGORY_HATE_SPEECH` `HARM_CATEGORY_SEXUALLY_EXPLICIT` `HARM_CATEGORY_DANGEROUS_CONTENT` `HARM_CATEGORY_HARASSMENT` `HARM_CATEGORY_CIVIC_INTEGRITY`	The harm category to apply the safety setting to.
`threshold`	string	✅ Yes	-	`BLOCK_ONLY_HIGH` `BLOCK_MEDIUM_AND_ABOVE` `BLOCK_LOW_AND_ABOVE` `BLOCK_NONE`	The threshold for blocking content.
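
Because each category should appear at most once, a mapping from category to threshold is a convenient starting point. A sketch under that assumption; `safety_settings` is a hypothetical helper, and the allowed values are copied from the table above:

```python
CATEGORIES = {
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_CIVIC_INTEGRITY",
}
THRESHOLDS = {
    "BLOCK_ONLY_HIGH",
    "BLOCK_MEDIUM_AND_ABOVE",
    "BLOCK_LOW_AND_ABOVE",
    "BLOCK_NONE",
}

def safety_settings(levels: dict) -> list:
    """Turn {category: threshold} into a safetySettings array.

    Dict keys are unique, so no category can appear twice.
    """
    settings = []
    for category, threshold in levels.items():
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        if threshold not in THRESHOLDS:
            raise ValueError(f"unknown threshold: {threshold}")
        settings.append({"category": category, "threshold": threshold})
    return settings

settings = safety_settings({
    "HARM_CATEGORY_HATE_SPEECH": "BLOCK_MEDIUM_AND_ABOVE",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT": "BLOCK_ONLY_HIGH",
})
```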

Generation Config

{
  "generationConfig": {
    "temperature": 0.7,
    "topP": 0.95,
    "topK": 40,
    "maxOutputTokens": 1024,
    "stopSequences": ["Human:"],
    "responseMimeType": "text/plain"
  }
}

Parameter	Type	Required	Default	Range / Example	Description
`temperature`	number	❌ No	-	0.0-1.0	Controls the randomness of the output.
`topP`	number	❌ No	-	0.0-1.0	Nucleus sampling threshold.
`topK`	integer	❌ No	-	-	Top-k sampling parameter.
`maxOutputTokens`	integer	❌ No	-	-	Maximum number of tokens to generate.
`stopSequences`	array	❌ No	-	-	Sequences at which to stop generation.
`responseMimeType`	string	❌ No	-	`text/plain` `application/json`	MIME type of the response.
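
Checking values on the client side before sending can catch simple mistakes early. The sketch below applies only the ranges listed in the table above; `generation_config` is a hypothetical helper, and unset options are left out of the payload.

```python
def generation_config(temperature=None, top_p=None, top_k=None,
                      max_output_tokens=None, stop_sequences=None,
                      response_mime_type=None) -> dict:
    """Build a generationConfig object, checking the documented ranges."""
    for name, value in (("temperature", temperature), ("topP", top_p)):
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within 0.0-1.0, got {value}")
    if response_mime_type not in (None, "text/plain", "application/json"):
        raise ValueError(f"unsupported responseMimeType: {response_mime_type}")
    fields = {
        "temperature": temperature,
        "topP": top_p,
        "topK": top_k,
        "maxOutputTokens": max_output_tokens,
        "stopSequences": stop_sequences,
        "responseMimeType": response_mime_type,
    }
    # Drop unset options so defaults apply on the server side.
    return {"generationConfig": {k: v for k, v in fields.items() if v is not None}}

config = generation_config(temperature=0.7, top_p=0.95, top_k=40,
                           max_output_tokens=1024, stop_sequences=["Human:"])
```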

Thinking Config

{
  "generationConfig": {
    "thinkingConfig": {
      "includeThoughts": true,
      "thinkingBudget": 1000
    }
  }
}

Note: `thinkingLevel` is only supported on Gemini 3.0 and above. It cannot be used together with `thinkingBudget`; doing so will return an error.

Parameter	Type	Required	Default	Range / Example	Description
`thinkingConfig`	object	❌ No	-	-	Configuration for thinking functionality.
>`thinkingConfig.includeThoughts`	boolean	❌ No	-	-	Indicates whether to include thoughts in the response. If true, thoughts are only returned when thinking is enabled.
>`thinkingConfig.thinkingBudget`	integer	❌ No	-	0-24576	Specifies the maximum number of tokens for generated thoughts.
>`thinkingConfig.thinkingLevel`	enum	❌ No	`high`	`low` `high` `medium` `minimal`	Controls the maximum depth of the model’s internal reasoning process before generating a response. If not specified, the default is HIGH. Recommended for Gemini 3 or newer models. Using it with older models may cause errors.
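
Since a thinking budget and a thinking level cannot be combined, it may help to reject that combination before the request is sent. A minimal sketch using the range and level values from the table; `thinking_config` is a hypothetical helper:

```python
THINKING_LEVELS = ("minimal", "low", "medium", "high")
MAX_THINKING_BUDGET = 24576

def thinking_config(include_thoughts: bool = False, budget=None, level=None) -> dict:
    """Build a thinkingConfig, refusing budget and level together."""
    if budget is not None and level is not None:
        raise ValueError("thinkingBudget and thinkingLevel cannot be combined")
    config = {"includeThoughts": include_thoughts}
    if budget is not None:
        if not 0 <= budget <= MAX_THINKING_BUDGET:
            raise ValueError(f"thinkingBudget must be within 0-{MAX_THINKING_BUDGET}")
        config["thinkingBudget"] = budget
    if level is not None:
        if level not in THINKING_LEVELS:
            raise ValueError(f"thinkingLevel must be one of {THINKING_LEVELS}")
        config["thinkingLevel"] = level
    return {"generationConfig": {"thinkingConfig": config}}

with_budget = thinking_config(include_thoughts=True, budget=1000)
```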

Image Config

{
  "generationConfig": {
    "imageConfig": {
      "aspectRatio": "1:1",
      "imageSize": "1k"
    }
  }
}

Parameter	Type	Required	Default	Range / Example	Description
`imageConfig`	object	❌ No	-	-	Configuration for image generation.
>`imageConfig.aspectRatio`	string	❌ No	-	`1:1` `2:3` `3:2` `3:4` `4:3` `9:16` `16:9` `21:9`	Aspect ratio of the generated image. If not specified, the model will select the appropriate aspect ratio based on the specified content.
>`imageConfig.imageSize`	string	❌ No	-	`1k` `2k` `4k`	Approximate size of the generated image. If not specified, the model will use the default value of 1k.
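
When the desired output dimensions are known, one option is to pick the closest supported aspect ratio on the client. This is an illustrative sketch only: `nearest_aspect_ratio` and `image_config` are hypothetical helpers, and comparing ratios on a log scale is a design choice made here, not something the API prescribes.

```python
import math

ASPECT_RATIOS = ("1:1", "2:3", "3:2", "3:4", "4:3", "9:16", "16:9", "21:9")
IMAGE_SIZES = ("1k", "2k", "4k")

def nearest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported ratio closest to width:height (log-scale distance)."""
    target = math.log(width / height)

    def distance(label: str) -> float:
        w, h = label.split(":")
        return abs(math.log(int(w) / int(h)) - target)

    return min(ASPECT_RATIOS, key=distance)

def image_config(width: int, height: int, image_size: str = "1k") -> dict:
    """Build an imageConfig for the given target dimensions."""
    if image_size not in IMAGE_SIZES:
        raise ValueError(f"imageSize must be one of {IMAGE_SIZES}")
    return {"generationConfig": {"imageConfig": {
        "aspectRatio": nearest_aspect_ratio(width, height),
        "imageSize": image_size,
    }}}

landscape = image_config(1920, 1080)
```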

Web Search

Web Search Grounding allows Gemini models to connect with real-time web content, providing more accurate answers with verifiable sources.

Basic Usage

{
  "tools": [
    {
      "google_search": {}
    }
  ]
}

Parameter	Type	Required	Default	Range / Example	Description
`tools`	array	❌ No	-	-	Array of tools to use for the request.
>`tools.googleSearch`	object	❌ No	-	-	Configuration for Google Search Grounding.
>>`tools.googleSearch.timeRangeFilter`	object	❌ No	-	-	Optional time range filter for the search.
>>>`tools.googleSearch.timeRangeFilter.startTime`	string	❌ No	-	`2024-01-01T00:00:00Z` or timestamp	Optional start time for the search in ISO 8601 format or timestamp.
>>>`tools.googleSearch.timeRangeFilter.endTime`	string	❌ No	-	`2024-01-01T00:00:00Z` or timestamp	Optional end time for the search in ISO 8601 format or timestamp. Note: For Gemini 3 models, the time span cannot exceed 24 hours.
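
A time range filter can be generated from `datetime` values. The sketch below is an assumption-laden illustration: `google_search_tool` is a hypothetical helper, it uses the camelCase `googleSearch` key from the table (the Basic Usage example spells it `google_search`), and it emits ISO 8601 timestamps in UTC.

```python
from datetime import datetime, timezone

def _iso_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 in UTC."""
    if moment.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def google_search_tool(start: datetime = None, end: datetime = None) -> dict:
    """Build one entry for the `tools` array, with an optional time range."""
    search = {}
    if start is not None and end is not None and end <= start:
        raise ValueError("endTime must be later than startTime")
    time_filter = {}
    if start is not None:
        time_filter["startTime"] = _iso_utc(start)
    if end is not None:
        time_filter["endTime"] = _iso_utc(end)
    if time_filter:
        search["timeRangeFilter"] = time_filter
    return {"googleSearch": search}

tools = [google_search_tool(
    start=datetime(2024, 1, 1, tzinfo=timezone.utc),
    end=datetime(2024, 1, 1, 12, tzinfo=timezone.utc),
)]
```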

Error Codes

Common Error Codes

Error Code	Error Name	Description
401	Unauthorized	API key is missing or invalid
403	Forbidden	Your API key doesn’t have permission to access this resource, or insufficient balance for the requested operation
429	Too Many Requests	You’ve exceeded your rate limit
500	Internal server error	An internal server error occurred
503	Content policy violation	Content blocked due to safety concerns (actual status code is 400)
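
Clients often treat these codes differently when deciding whether to retry. The sketch below encodes one possible policy and is an assumption, not documented behavior: it retries only 429 and 500 with exponential backoff, and treats the other codes as errors that need a change to the key, account, or request. `should_retry` and `backoff_delay` are hypothetical helpers.

```python
# Names taken from the table above; 400 is the actual status for content policy blocks.
ERROR_NAMES = {
    400: "Content policy violation",
    401: "Unauthorized",
    403: "Forbidden",
    429: "Too Many Requests",
    500: "Internal server error",
}

# Assumed policy: only rate limits and server errors are worth retrying.
RETRYABLE = {429, 500}

def should_retry(status: int) -> bool:
    """Return True when repeating the same request might succeed."""
    return status in RETRYABLE

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based), doubling each time."""
    return min(cap, base * 2 ** attempt)

delays = [backoff_delay(n) for n in range(4)]
```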
