Skip to main content
POST
/
v1beta
/
models
/
gemini-2.5-flash-image:generateContent
Image to Image
curl --request POST \
  --url https://gptproto.com/v1beta/models/gemini-2.5-flash-image:generateContent \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: application/json' \
  --data '{
  "contents": [
    {
      "parts": [
        {
          "text": "<string>",
          "inline_data": {
            "mime_type": "<string>",
            "data": "<string>"
          }
        }
      ]
    }
  ],
  "generationConfig": {
    "responseModalities": [
      "<string>"
    ],
    "imageConfig": {
      "aspectRatio": "<string>"
    }
  }
}'
{
  "error": {
    "message": "Invalid signature",
    "type": "401"
  }
}

Overview

Google Gemini 2.5 Flash Image, a powerful new image generation and editing model with advanced features and creative control.
  • Image + Text-to-Image (Editing): Provide an image and use text prompts to add, remove, or modify elements, change the style, or adjust the color grading.
  • Multi-Image to Image (Composition & Style Transfer): Use multiple input images to compose a new scene or transfer the style from one image to another.
  • Iterative Refinement: Engage in a conversation to progressively refine your image over multiple turns, making small adjustments until it’s perfect.
  • High-Fidelity Text Rendering: Accurately generate images that contain legible and well-placed text, ideal for logos, diagrams, and posters.

Supported inputs & outputs :

Inputs: Text and Images Outputs: Text and image

Authentication

This endpoint requires authentication using a Bearer token.
Authorization
string
default:"sk-***********"
required
Your API key in the format: YOUR_API_KEY

Request Body

contents
array
required
generationConfig
object

Image to Image

curl -X POST "https://gptproto.com/v1beta/models/gemini-2.5-flash-image:generateContent" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "contents": [
    {
      "parts": [
        {
          "text": "Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off."
        },
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "iVBORw0KGgoAAAANSUhEUgAAANQAAAFPCA...."
          }
        }
      ]
    }
  ],
  "generationConfig": {
    "responseModalities": [
      "TEXT",
      "IMAGE"
    ]
  }
}'
Technical specifications
  • Maximum images per prompt: 3
  • Maximum image size: 7 MB
  • Supported aspect ratios: 1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
  • Supported MIME types: image/png, image/jpeg, image/webp

Generate Content

You can optionally configure the response modalities and aspect ratio of the model’s output in the config field of generate_content calls.

Output types

The model defaults to returning text and image responses (i.e. response_modalities=['Text', 'Image']). You can configure the response to return only images without text using response_modalities=['Image'].

Aspect ratios

The model defaults to matching the output image size to that of your input image, or otherwise generates 1:1 squares. You can control the aspect ratio of the output image using the aspect_ratio field under image_config in the response request: The different ratios available and the size of the image generated are listed in this table:
Aspect ratioResolution
1:11024x1024
2:3832x1248
3:21248x832
3:4864x1184
4:31184x864
4:5896x1152
5:41152x896
9:16768x1344
16:91344x768
21:91536x672

Response

{
    "candidates": [
        {
            "content": {
                "role": "model",
                "parts": [
                    {
                        "inlineData": {
                            "mimeType": "image/png",
                            "data": "image base64"
                        }
                    }
                ]
            },
            "finishReason": "STOP"
        }
    ],
    "usageMetadata": {
        "promptTokenCount": 1302,
        "candidatesTokenCount": 1290,
        "totalTokenCount": 2592,
        "thoughtsTokenCount": 0,
        "promptTokensDetails": [
            {
                "modality": "IMAGE",
                "tokenCount": 1290
            },
            {
                "modality": "TEXT",
                "tokenCount": 12
            }
        ]
    },
    "modelVersion": "gemini-2.5-flash-image"
}
{
  "error": {
    "message": "Invalid signature",
    "type": "401"
  }
}

Request Example

Adding and removing elements

Provide an image and describe your change. The model will match the original image’s style, lighting, and perspective.
curl --location 'https://gptproto.com/v1beta/models/gemini-2.5-flash-image:generateContent' \
--header 'Authorization: sk-xxxx' \
--header 'Content-Type: application/json' \
--data '{
      "contents": [{
        "parts":[
            {"text": "Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off."},
            {
              "inline_data": {
                "mime_type":"image/png",
                "data": "iVBORw0KGgoAAAANSUhEUgAAANQAAAFPCA...."
              }
            }
        ]
      }]
    }'
InputOutput
InputOutput
A photorealistic picture of a fluffy ginger cat…Using the provided image of my cat, please add a small, knitted wizard hat…

Advanced composition: Combining multiple images

Provide multiple images as context to create a new, composite scene. This is perfect for product mockups or creative collages.
curl --location 'https://gptproto.com/v1beta/models/gemini-2.5-flash-image:generateContent' \
--header 'Authorization: sk-xxxx' \
--header 'Content-Type: application/json' \
--data '{
      "contents": [{
        "parts":[
            {
              "inline_data": {
                "mime_type":"image/png",
                "data": "iVBORw0KGgoAAAANSUhEUgAAANQAAAFPCA...."
              }
            },
            {
              "inline_data": {
                "mime_type":"image/png",
                "data": "{{gemini_png_base64_2}}"
              }
            },
            {"text": "Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment."}
        ]
      }]
    }'
Input1Input2Output
InputInputOutput
A professionally shot photo of a blue floral summer dress…Full-body shot of a woman with her hair in a bun…Create a professional e-commerce fashion photo…

Best Practices

To elevate your results from good to great, incorporate these professional strategies into your workflow.
  • Be Hyper-Specific: The more detail you provide, the more control you have. Instead of “fantasy armor,” describe it: “ornate elven plate armor, etched with silver leaf patterns, with a high collar and pauldrons shaped like falcon wings.”
  • Provide Context and Intent: Explain the purpose of the image. The model’s understanding of context will influence the final output. For example, “Create a logo for a high-end, minimalist skincare brand” will yield better results than just “Create a logo.”
  • Iterate and Refine: Don’t expect a perfect image on the first try. Use the conversational nature of the model to make small changes. Follow up with prompts like, “That’s great, but can you make the lighting a bit warmer?” or “Keep everything the same, but change the character’s expression to be more serious.”
  • Use Step-by-Step Instructions: For complex scenes with many elements, break your prompt into steps. “First, create a background of a serene, misty forest at dawn. Then, in the foreground, add a moss-covered ancient stone altar. Finally, place a single, glowing sword on top of the altar.”
  • Use “Semantic Negative Prompts”: Instead of saying “no cars,” describe the desired scene positively: “an empty, deserted street with no signs of traffic.”
  • Control the Camera: Use photographic and cinematic language to control the composition. Terms like wide-angle shot, macro shot, low-angle perspective.

Limitations

  • For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN.
  • Image generation does not support audio or video inputs.
  • The model won’t always follow the exact number of image outputs that the user explicitly asks for.
  • The model works best with up to 3 images as an input.
  • When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text.
  • Uploading images of children is not currently supported in EEA, CH, and UK.