Image to Image

Overview

Image + Text-to-Image (Editing): Provide an image and use text prompts to add, remove, or modify elements, change the style, or adjust the color grading.
Multi-Image to Image (Composition & Style Transfer): Use multiple input images to compose a new scene or transfer the style from one image to another.
Iterative Refinement: Engage in a conversation to progressively refine your image over multiple turns, making small adjustments until it’s perfect.
High-Fidelity Text Rendering: Accurately generate images that contain legible and well-placed text, ideal for logos, diagrams, and posters.

Supported inputs & outputs

Inputs: Text and images
Outputs: Text and image

The default output size for chat mode is 1024x1024 (1:1 aspect ratio).

curl -X POST "https://gptproto/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "gemini-2.5-flash-image-preview",
  "messages": [
    {
      "role": "user",
      "content": "Two people standing together",
      "experimental_attachments": [
        {
          "contentType": "image/png",
          "url": "https://mgszhhytxjyitifbalro.supabase.co/storage/v1/object/public/chat-attachments/uploads/guakmauqtr.png"
        },
        {
          "contentType": "image/jpeg",
          "url": "https://mgszhhytxjyitifbalro.supabase.co/storage/v1/object/public/chat-attachments/uploads/3ox9ikxbcxc.jpeg"
        }
      ]
    }
  ]
}'
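The same request body can be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper name `build_request` and the `example.com` URLs are hypothetical, content types are guessed from each URL's file extension (with PNG as an assumed fallback), and `model` is placed at the top level of the body, as the Request Body section lists it.

```python
import json
import mimetypes
from urllib.parse import urlparse

DEFAULT_MODEL = "gemini-2.5-flash-image-preview"


def build_request(prompt, image_urls, model=DEFAULT_MODEL):
    """Build a request body with one user message and its image attachments."""
    attachments = []
    for url in image_urls:
        # Guess the MIME type from the path extension; fall back to PNG (assumption).
        content_type, _ = mimetypes.guess_type(urlparse(url).path)
        attachments.append({"contentType": content_type or "image/png", "url": url})
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt,
                "experimental_attachments": attachments,
            }
        ],
    }


body = build_request(
    "Two people standing together",
    ["https://example.com/person_a.png", "https://example.com/person_b.jpeg"],
)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with `-d @file.json` or sent with any HTTP client.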

Authentication

This endpoint requires authentication using a Bearer token.

Authorization (string, required, default: "sk-***********")

Your API key in the format: Bearer YOUR_API_KEY
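As an illustration of the header format, here is a small Python sketch that builds the request headers from a key. The function name `auth_headers` is hypothetical, and tolerating a key that already carries the "Bearer " prefix is a convenience assumption, not documented behavior.

```python
def auth_headers(api_key):
    """Return request headers that carry the key as a Bearer token."""
    key = api_key.strip()
    # Accept keys pasted with or without the "Bearer " prefix (assumption).
    if key.lower().startswith("bearer "):
        key = key[len("bearer "):].strip()
    if not key:
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


print(auth_headers("sk-xxxx"))
```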

Request Body

messages (array, required)

Array of message objects for the conversation.

model (string, required, default: "gemini-2.5-flash-image-preview")

The model to use for the request.

Response

Success (200)

{
  "candidates": [
      {
          "content": {
              "role": "model",
              "parts": [
                  {
                      "inlineData": {
                          "mimeType": "image/png",
                          "data": "image base64"
                      }
                  }
              ]
          },
          "finishReason": "STOP"
      }
  ],
  "usageMetadata": {
      "promptTokenCount": 1302,
      "candidatesTokenCount": 1290,
      "totalTokenCount": 2592,
      "thoughtsTokenCount": 0,
      "promptTokensDetails": [
          {
              "modality": "IMAGE",
              "tokenCount": 1290
          },
          {
              "modality": "TEXT",
              "tokenCount": 12
          }
      ]
  },
  "modelVersion": "gemini-2.5-flash-image"
}
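The generated image arrives base64-encoded in `inlineData.data`. The sketch below shows one way to pull the image bytes out of a response shaped like the one above. The helper names, the extension table, and the output file naming are hypothetical, and the sample uses placeholder bytes rather than real image data.

```python
import base64

# Assumed mapping from MIME type to file extension; extend as needed.
MIME_EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def extract_images(response):
    """Return a list of (mime_type, raw_bytes) for every inline image part."""
    images = []
    for candidate in response.get("candidates", []):
        parts = candidate.get("content", {}).get("parts", [])
        for part in parts:
            inline = part.get("inlineData")
            if inline:  # parts without inlineData (e.g. text) are skipped
                images.append((inline["mimeType"], base64.b64decode(inline["data"])))
    return images


def save_images(response, stem="output"):
    """Write each image to <stem>_<n><ext> and return the file names."""
    names = []
    for index, (mime, data) in enumerate(extract_images(response)):
        name = f"{stem}_{index}{MIME_EXTENSIONS.get(mime, '.bin')}"
        with open(name, "wb") as handle:
            handle.write(data)
        names.append(name)
    return names


# Stand-in response carrying placeholder bytes instead of a real image.
sample = {
    "candidates": [
        {
            "content": {
                "role": "model",
                "parts": [
                    {
                        "inlineData": {
                            "mimeType": "image/png",
                            "data": base64.b64encode(b"fake-png-bytes").decode(),
                        }
                    }
                ],
            },
            "finishReason": "STOP",
        }
    ]
}
print(extract_images(sample)[0][0])
```

Calling `save_images(response)` on a real response would write files such as `output_0.png` to the current directory.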

Error Responses

{
  "error": {
    "message": "Invalid signature",
    "type": "401"
  }
}
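A client can check for the `error` object before reading `candidates`. This is a minimal Python sketch assuming every failure uses the shape shown above; `ApiError` and `raise_for_error` are hypothetical names, not part of any SDK.

```python
class ApiError(Exception):
    """Raised when a response body carries an "error" object."""

    def __init__(self, message, error_type):
        super().__init__(f"{error_type}: {message}")
        self.message = message
        self.error_type = error_type


def raise_for_error(body):
    """Return the body unchanged, or raise ApiError if it describes a failure."""
    error = body.get("error") if isinstance(body, dict) else None
    if error:
        raise ApiError(error.get("message", "unknown error"), error.get("type", "unknown"))
    return body


try:
    raise_for_error({"error": {"message": "Invalid signature", "type": "401"}})
except ApiError as exc:
    print(exc)  # prints "401: Invalid signature"
```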

Request Example

Adding and removing elements

Provide an image and describe your change. The model will match the original image’s style, lighting, and perspective.

curl --location 'https://gptproto/v1/chat/completions' \
--header 'Authorization: Bearer sk-xxxx' \
--header 'Content-Type: application/json' \
--data '{
  "model": "gemini-2.5-flash-image-preview",
  "messages": [
    {
      "role": "user",
      "content": "Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it\u0027s sitting comfortably and not falling off.",
      "experimental_attachments": [
        {
          "contentType": "image/png",
          "url": "image_url"
        }
      ]
    }
  ]
}'

Input: A photorealistic picture of a fluffy ginger cat…
Output: Using the provided image of my cat, please add a small, knitted wizard hat…

Inpainting (Semantic masking)

Conversationally define a “mask” to edit a specific part of an image while leaving the rest untouched.

curl --location 'https://gptproto/v1/chat/completions' \
--header 'Authorization: Bearer sk-xxxx' \
--header 'Content-Type: application/json' \
--data '{
  "model": "gemini-2.5-flash-image-preview",
  "messages": [
    {
      "role": "user",
      "content": "Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged.",
      "experimental_attachments": [
        {
          "contentType": "image/png",
          "url": "image_url"
        }
      ]
    }
  ]
}'

Input: A wide shot of a modern, well-lit living room…
Output: Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa…
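The inpainting prompt follows a reusable pattern: name the scene, name the single element to change, describe its replacement, and list what must stay the same. The Python sketch below turns that pattern into a template. The function name and the exact wording are illustrative assumptions, not a required prompt syntax.

```python
def semantic_mask_prompt(scene, target, replacement, keep=()):
    """Compose an edit prompt that changes one element and pins the others."""
    keep = list(keep)
    if len(keep) > 1:
        preserved = ", ".join(keep[:-1]) + " and " + keep[-1]
    else:
        preserved = "".join(keep)
    prompt = f"Using the provided image of {scene}, change only {target} to be {replacement}."
    if preserved:
        prompt += f" Keep the rest of the image, including {preserved}, unchanged."
    else:
        prompt += " Keep the rest of the image unchanged."
    return prompt


prompt = semantic_mask_prompt(
    scene="a living room",
    target="the blue sofa",
    replacement="a vintage, brown leather chesterfield sofa",
    keep=["the pillows on the sofa", "the lighting"],
)
print(prompt)
```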

Style transfer

Provide an image and ask the model to recreate its content in a different artistic style.

curl --location 'https://gptproto/v1/chat/completions' \
--header 'Authorization: Bearer sk-xxxx' \
--header 'Content-Type: application/json' \
--data '{
  "model": "gemini-2.5-flash-image-preview",
  "messages": [
    {
      "role": "user",
      "content": "Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh\u0027s \u0027Starry Night\u0027. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows.",
      "experimental_attachments": [
        {
          "contentType": "image/png",
          "url": "image_url"
        }
      ]
    }
  ]
}'

Input: A photorealistic, high-resolution photograph of a busy city street…
Output: Transform the provided photograph of a modern city street at night…

Advanced composition: Combining multiple images

Provide multiple images as context to create a new, composite scene. This is perfect for product mockups or creative collages.

curl --location 'https://gptproto/v1/chat/completions' \
--header 'Authorization: Bearer sk-xxxx' \
--header 'Content-Type: application/json' \
--data '{
  "model": "gemini-2.5-flash-image-preview",
  "messages": [
    {
      "role": "user",
      "content": "Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment.",
      "experimental_attachments": [
        {
          "contentType": "image/png",
          "url": "image_url_1"
        },
        {
          "contentType": "image/png",
          "url": "image_url_2"
        }
      ]
    }
  ]
}'

Input 1: A professionally shot photo of a blue floral summer dress…
Input 2: Full-body shot of a woman with her hair in a bun…
Output: Create a professional e-commerce fashion photo…

High-fidelity detail preservation

To ensure critical details (like a face or logo) are preserved during an edit, describe them in great detail along with your edit request.

curl --location 'https://gptproto/v1/chat/completions' \
--header 'Authorization: Bearer sk-xxxx' \
--header 'Content-Type: application/json' \
--data '{
  "model": "gemini-2.5-flash-image-preview",
  "messages": [
    {
      "role": "user",
      "content": "Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman\u0027s face and features remain completely unchanged. The logo should look like it\u0027s naturally printed on the fabric, following the folds of the shirt.",
      "experimental_attachments": [
        {
          "contentType": "image/png",
          "url": "image_url_1"
        },
        {
          "contentType": "image/png",
          "url": "image_url_2"
        }
      ]
    }
  ]
}'

Input 1: A professional headshot of a woman with brown hair and blue eyes…
Input 2: A simple, modern logo with the letters ‘G’ and ‘A’…
Output: Take the first image of the woman with brown hair, blue eyes, and a neutral expression…

Best Practices

To elevate your results from good to great, incorporate these professional strategies into your workflow.

Be Hyper-Specific: The more detail you provide, the more control you have. Instead of “fantasy armor,” describe it: “ornate elven plate armor, etched with silver leaf patterns, with a high collar and pauldrons shaped like falcon wings.”
Provide Context and Intent: Explain the purpose of the image. The model’s understanding of context will influence the final output. For example, “Create a logo for a high-end, minimalist skincare brand” will yield better results than just “Create a logo.”
Iterate and Refine: Don’t expect a perfect image on the first try. Use the conversational nature of the model to make small changes. Follow up with prompts like, “That’s great, but can you make the lighting a bit warmer?” or “Keep everything the same, but change the character’s expression to be more serious.”
Use Step-by-Step Instructions: For complex scenes with many elements, break your prompt into steps. “First, create a background of a serene, misty forest at dawn. Then, in the foreground, add a moss-covered ancient stone altar. Finally, place a single, glowing sword on top of the altar.”
Use “Semantic Negative Prompts”: Instead of saying “no cars,” describe the desired scene positively: “an empty, deserted street with no signs of traffic.”
Control the Camera: Use photographic and cinematic language to control the composition, with terms such as wide-angle shot, macro shot, and low-angle perspective.
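The step-by-step practice above can be mechanized when prompts are built from structured input. The following Python sketch joins a list of steps with ordering words. The function name `compose_steps` is hypothetical, and the "First / Then / Finally" wording simply mirrors the example prompt given in this section.

```python
def compose_steps(steps):
    """Join step descriptions into one prompt: First, ... Then, ... Finally, ..."""
    steps = [s.strip().rstrip(".") for s in steps if s.strip()]
    if not steps:
        raise ValueError("at least one step is required")
    if len(steps) == 1:
        return steps[0] + "."
    sentences = []
    for index, step in enumerate(steps):
        if index == 0:
            lead = "First"
        elif index == len(steps) - 1:
            lead = "Finally"
        else:
            lead = "Then"
        sentences.append(f"{lead}, {step}.")
    return " ".join(sentences)


prompt = compose_steps([
    "create a background of a serene, misty forest at dawn",
    "in the foreground, add a moss-covered ancient stone altar",
    "place a single, glowing sword on top of the altar",
])
print(prompt)
```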

Limitations

For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN.
Image generation does not support audio or video inputs.
The model won’t always follow the exact number of image outputs that the user explicitly asks for.
The model works best with up to 3 input images.
When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text.
Uploading images of children is not currently supported in EEA, CH, and UK.
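Two of these limitations can be checked on the client before a request is sent. The Python sketch below rejects audio and video attachments and warns when more than three images are attached. The function name is hypothetical, and treating the three-image guidance as a warning rather than an error is a design assumption, since the text describes it as a recommendation.

```python
import warnings

RECOMMENDED_MAX_IMAGES = 3  # from "works best with up to 3 images"
UNSUPPORTED_PREFIXES = ("audio/", "video/")


def check_attachments(attachments):
    """Reject audio/video attachments and warn when more than three are sent."""
    for item in attachments:
        content_type = item.get("contentType", "")
        if content_type.startswith(UNSUPPORTED_PREFIXES):
            raise ValueError(f"unsupported input type: {content_type}")
    if len(attachments) > RECOMMENDED_MAX_IMAGES:
        warnings.warn(
            f"{len(attachments)} images attached; results are best with "
            f"{RECOMMENDED_MAX_IMAGES} or fewer",
            stacklevel=2,
        )
    return attachments


check_attachments([{"contentType": "image/png", "url": "image_url"}])
```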
