Image to Image

Overview

Google Gemini 2.5 Flash Image is an image generation and editing model with advanced features and creative control. It supports the following workflows:

Image + Text-to-Image (Editing): Provide an image and use text prompts to add, remove, or modify elements, change the style, or adjust the color grading.
Multi-Image to Image (Composition & Style Transfer): Use multiple input images to compose a new scene or transfer the style from one image to another.
Iterative Refinement: Engage in a conversation to progressively refine your image over multiple turns, making small adjustments until it’s perfect.
High-Fidelity Text Rendering: Accurately generate images that contain legible and well-placed text, ideal for logos, diagrams, and posters.

Supported inputs & outputs

Inputs: Text and images
Outputs: Text and image

Authentication

This endpoint requires authentication using a Bearer token.

Authorization (string, required; default: "sk-***********")
Your API key in the format: YOUR_API_KEY

Request Body

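messages (array, required)
Array of message objects for the conversation. Each message object has the following properties:

  role (string, required)
  The role of the message sender, for example user.

  content (string, required)
  The prompt for the generation.

  experimental_attachments (array, required)
  The input images. Each attachment has the following properties:

    contentType (string, required)
    Supported MIME types: image/png, image/jpeg, image/webp

    url (string, required)
    URL of the input image.

model (string, required; default: "gemini-2.5-flash-image")
The default output size for chat mode is 1:1 (1024x1024). Note that the request samples on this page set model inside the message object.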
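
As a rough illustration of the fields above, the following Python sketch assembles a request body. The helper names `make_attachment` and `build_payload` are hypothetical, and inferring `contentType` from the file extension in the URL is a convenience assumption, not something the API is documented to do. The `model` field is placed inside the message object only because the request samples on this page do so.

```python
import posixpath
from urllib.parse import urlparse

# MIME types listed as supported for experimental_attachments.
SUPPORTED_MIME_TYPES = {"image/png", "image/jpeg", "image/webp"}

# Assumption: map common file extensions to the supported MIME types.
EXTENSION_TO_MIME = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
}


def make_attachment(url, content_type=None):
    """Return one attachment object, rejecting unsupported MIME types."""
    if content_type is None:
        extension = posixpath.splitext(urlparse(url).path)[1].lower()
        content_type = EXTENSION_TO_MIME.get(extension)
    if content_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported or unknown MIME type for {url!r}: {content_type}")
    return {"contentType": content_type, "url": url}


def build_payload(prompt, image_urls, model="gemini-2.5-flash-image"):
    """Build a single-message request body with one attachment per image URL."""
    return {
        "messages": [
            {
                "role": "user",
                "content": prompt,
                "experimental_attachments": [make_attachment(u) for u in image_urls],
                # Mirrors the request samples, which set model per message.
                "model": model,
            }
        ]
    }
```

Passing `content_type` explicitly skips the extension lookup, which helps when an image URL has no file extension.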

curl -X POST "https://gptproto.com/v1/chat/completions" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "messages": [
    {
      "role": "user",
      "content": "Two people standing together",
      "experimental_attachments": [
        {
          "contentType": "image/png",
          "url": "https://mgszhhytxjyitifbalro.supabase.co/storage/v1/object/public/chat-attachments/uploads/guakmauqtr.png"
        },
        {
          "contentType": "image/jpeg",
          "url": "https://mgszhhytxjyitifbalro.supabase.co/storage/v1/object/public/chat-attachments/uploads/3ox9ikxbcxc.jpeg"
        }
      ],
      "model": "gemini-2.5-flash-image"
    }
  ]
}'
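
For readers calling the endpoint from code, here is a minimal Python sketch of the same request using only the standard library. The function names and the 120-second timeout are assumptions. The `Authorization` header carries the raw key because that is what the curl sample does; whether a `Bearer ` prefix is also accepted or required is not confirmed here.

```python
import json
import urllib.request

API_URL = "https://gptproto.com/v1/chat/completions"


def build_request(api_key, payload):
    """Build (but do not send) the POST request shown in the curl sample."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: raw key as the header value, as in the curl sample.
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(api_key, payload, timeout=120):
    """Send the request and return the decoded JSON response body."""
    request = build_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx status codes, so error bodies have to be read from the exception rather than from the return value of `send`.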

Response

{
  "candidates": [
      {
          "content": {
              "role": "model",
              "parts": [
                  {
                      "inlineData": {
                          "mimeType": "image/png",
                          "data": "base64"
                      }
                  }
              ]
          },
          "finishReason": "STOP"
      }
  ],
  "usageMetadata": {
      "promptTokenCount": 1302,
      "candidatesTokenCount": 1290,
      "totalTokenCount": 2592,
      "thoughtsTokenCount": 0,
      "promptTokensDetails": [
          {
              "modality": "IMAGE",
              "tokenCount": 1290
          },
          {
              "modality": "TEXT",
              "tokenCount": 12
          }
      ]
  },
  "modelVersion": "gemini-2.5-flash-image"
}
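
The generated image arrives as base64 text in `inlineData.data`. The sketch below pulls every image out of a decoded response shaped like the sample above; `extract_images` is a hypothetical helper name, and parts without `inlineData` (for example text parts) are simply skipped.

```python
import base64


def extract_images(response):
    """Return a list of (mime_type, raw_bytes) for every inlineData part."""
    images = []
    for candidate in response.get("candidates", []):
        parts = candidate.get("content", {}).get("parts", [])
        for part in parts:
            inline = part.get("inlineData")
            if inline:
                # The data field is base64 text; decode it to raw image bytes.
                images.append((inline["mimeType"], base64.b64decode(inline["data"])))
    return images
```

Each returned byte string can be written to disk in binary mode, for example `open("output.png", "wb").write(raw_bytes)` for an `image/png` part.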

Error response:

{
  "error": {
    "message": "Invalid signature",
    "type": "401"
  }
}
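
A client can check for the `error` object before reading `candidates`. The following Python sketch assumes that error bodies always follow the shape shown above (an `error` object with `message` and `type`); the `ApiError` class and `raise_for_error` function are hypothetical names.

```python
class ApiError(Exception):
    """Raised when a decoded response body contains an "error" object."""

    def __init__(self, error_type, message):
        super().__init__(f"{error_type}: {message}")
        self.error_type = error_type
        self.message = message


def raise_for_error(body):
    """Raise ApiError if the body has an "error" object; otherwise return the body."""
    error = body.get("error")
    if error is not None:
        raise ApiError(error.get("type", "unknown"), error.get("message", ""))
    return body
```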

Request Example

Adding and removing elements

Provide an image and describe your change. The model will match the original image’s style, lighting, and perspective.

curl --location 'https://gptproto.com/v1/chat/completions' \
--header 'Authorization: sk-xxxx' \
--header 'Content-Type: application/json' \
--data '{
  "messages": [
    {
      "role": "user",
      "content": "Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it is sitting comfortably and not falling off.",
      "experimental_attachments": [
        {
          "contentType": "image/png",
          "url": "image_url"
        }
      ],
      "model": "gemini-2.5-flash-image"
    }
  ]
}'

Input	Output

A photorealistic picture of a fluffy ginger cat…	Using the provided image of my cat, please add a small, knitted wizard hat…

Advanced composition: Combining multiple images

Provide multiple images as context to create a new, composite scene. This is perfect for product mockups or creative collages.

curl --location 'https://gptproto.com/v1/chat/completions' \
--header 'Authorization: sk-xxxx' \
--header 'Content-Type: application/json' \
--data '{
  "messages": [
    {
      "role": "user",
      "content": "Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment.",
      "experimental_attachments": [
        {
          "contentType": "image/png",
          "url": "image_url_1"
        },
        {
          "contentType": "image/png",
          "url": "image_url_2"
        }
      ],
      "model": "gemini-2.5-flash-image"
    }
  ]
}'

Input1	Input2	Output

A professionally shot photo of a blue floral summer dress…	Full-body shot of a woman with her hair in a bun…	Create a professional e-commerce fashion photo…

Best Practices

To elevate your results from good to great, incorporate these professional strategies into your workflow.

Be Hyper-Specific: The more detail you provide, the more control you have. Instead of “fantasy armor,” describe it: “ornate elven plate armor, etched with silver leaf patterns, with a high collar and pauldrons shaped like falcon wings.”
Provide Context and Intent: Explain the purpose of the image. The model’s understanding of context will influence the final output. For example, “Create a logo for a high-end, minimalist skincare brand” will yield better results than just “Create a logo.”
Iterate and Refine: Don’t expect a perfect image on the first try. Use the conversational nature of the model to make small changes. Follow up with prompts like, “That’s great, but can you make the lighting a bit warmer?” or “Keep everything the same, but change the character’s expression to be more serious.”
Use Step-by-Step Instructions: For complex scenes with many elements, break your prompt into steps. “First, create a background of a serene, misty forest at dawn. Then, in the foreground, add a moss-covered ancient stone altar. Finally, place a single, glowing sword on top of the altar.”
Use “Semantic Negative Prompts”: Instead of saying “no cars,” describe the desired scene positively: “an empty, deserted street with no signs of traffic.”
Control the Camera: Use photographic and cinematic language to control the composition, with terms such as wide-angle shot, macro shot, or low-angle perspective.
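
The step-by-step practice above can be applied programmatically when prompts are assembled from parts. This is only an illustrative Python sketch: `build_stepwise_prompt` is a hypothetical helper, and the "First / Then / Finally" wording simply follows the example in the list above.

```python
def build_stepwise_prompt(steps):
    """Join scene-building steps into one prompt using First / Then / Finally."""
    # Drop empty entries and trailing periods so sentences are punctuated once.
    cleaned = [s.strip().rstrip(".") for s in steps if s.strip()]
    if not cleaned:
        raise ValueError("at least one step is required")
    if len(cleaned) == 1:
        return cleaned[0] + "."
    sentences = [f"First, {cleaned[0]}."]
    sentences += [f"Then, {step}." for step in cleaned[1:-1]]
    sentences.append(f"Finally, {cleaned[-1]}.")
    return " ".join(sentences)
```

The returned string is intended for the `content` field of a message.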

Limitations

For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN.
Image generation does not support audio or video inputs.
The model won’t always follow the exact number of image outputs that the user explicitly asks for.
The model works best with up to 3 images as an input.
When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text.
Uploading images of children is not currently supported in EEA, CH, and UK.
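
Because the model works best with up to 3 input images, a client may want a soft pre-flight check before sending a request. The Python sketch below warns instead of failing, since the limit is described as a recommendation and not a hard cap; `check_attachments` and `RECOMMENDED_MAX_IMAGES` are hypothetical names.

```python
import warnings

# Taken from the limitation above: best results with up to 3 input images.
RECOMMENDED_MAX_IMAGES = 3


def check_attachments(attachments):
    """Warn (without failing) when more input images are supplied than recommended.

    Returns True when the count is within the recommended limit.
    """
    count = len(attachments)
    if count > RECOMMENDED_MAX_IMAGES:
        warnings.warn(
            f"{count} input images supplied; the model works best with up to "
            f"{RECOMMENDED_MAX_IMAGES}.",
            UserWarning,
            stacklevel=2,
        )
    return count <= RECOMMENDED_MAX_IMAGES
```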
