Image to Image

Overview

Google Gemini 2.5 Flash Image is an image generation and editing model that offers fine-grained creative control. Its main capabilities are:

Image + Text-to-Image (Editing): Provide an image and use text prompts to add, remove, or modify elements, change the style, or adjust the color grading.
Multi-Image to Image (Composition & Style Transfer): Use multiple input images to compose a new scene or transfer the style from one image to another.
Iterative Refinement: Engage in a conversation to progressively refine your image over multiple turns, making small adjustments until it’s perfect.
High-Fidelity Text Rendering: Accurately generate images that contain legible and well-placed text, ideal for logos, diagrams, and posters.

Supported inputs & outputs

Inputs: Text and images
Outputs: Text and images

Authentication

This endpoint requires authentication using a Bearer token.

Authorization

string

default:"sk-***********"

required

Your API key in the format: YOUR_API_KEY

Request Body

model

string

default:"gemini-2.5-flash-image"

required

The model to use for the request

prompt

string

required

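The text instruction for the model. To supply input images, put their URLs at the start of the prompt, separated by commas and followed by a period, as in the request examples.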

size

string

default:"16:9"

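Aspect ratio of the output image. Supported values: 1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9. If omitted, the output matches the input image's size, or falls back to 1:1.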

curl -X POST "https://gptproto.com/v1/images/generations" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "gemini-2.5-flash-image",
  "prompt": "image_url1 , image_url2. Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment.",
  "size": "16:9"
}'
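The same request can also be assembled in Python. The following is a minimal sketch using only the standard library, not an official client: the helper name `build_generation_request` is illustrative, and the Authorization header carries the raw key exactly as the curl example does (adjust it if your account expects a `Bearer ` prefix). The network call is left commented out so the sketch runs offline.

```python
import json
import urllib.request

# Endpoint taken from the curl example.
API_URL = "https://gptproto.com/v1/images/generations"


def build_generation_request(api_key: str, prompt: str, size: str = "1:1") -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "model": "gemini-2.5-flash-image",
        "prompt": prompt,
        "size": size,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": api_key,  # raw key, as in the curl example (assumption)
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_generation_request("YOUR_API_KEY", "image_url1. Add a hat to the cat.", "1:1")
    # To actually send the request, uncomment:
    # with urllib.request.urlopen(req) as resp:
    #     body = json.load(resp)
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the payload before spending tokens on a call.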

Size

By default, the model matches the output image size to that of your input image, or otherwise generates a 1:1 square. The resolution generated for each aspect ratio is listed in this table:

Aspect ratio	Resolution
1:1	1024x1024
2:3	832x1248
3:2	1248x832
3:4	864x1184
4:3	1184x864
4:5	896x1152
5:4	1152x896
9:16	768x1344
16:9	1344x768
21:9	1536x672
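If you prefer to set `size` explicitly rather than rely on the default, you can pick the supported ratio closest to your input image on the client side. The sketch below is a hypothetical convenience helper, not part of the API; `RESOLUTIONS` is copied from the table above and `closest_aspect_ratio` is an illustrative name.

```python
import math

# Aspect ratio label -> (width, height), copied from the table above.
RESOLUTIONS = {
    "1:1": (1024, 1024),
    "2:3": (832, 1248),
    "3:2": (1248, 832),
    "3:4": (864, 1184),
    "4:3": (1184, 864),
    "4:5": (896, 1152),
    "5:4": (1152, 896),
    "9:16": (768, 1344),
    "16:9": (1344, 768),
    "21:9": (1536, 672),
}


def closest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported ratio label closest to width:height."""
    # Compare in log space so that, for example, 2:1 and 1:2 are
    # treated as equally far from 1:1.
    target = math.log(width / height)

    def distance(label: str) -> float:
        w, h = (int(part) for part in label.split(":"))
        return abs(math.log(w / h) - target)

    return min(RESOLUTIONS, key=distance)


if __name__ == "__main__":
    label = closest_aspect_ratio(1920, 1080)
    print(label, RESOLUTIONS[label])
```

The comparison uses the nominal ratio in the label rather than the pixel dimensions, because the listed resolutions only approximate their labels (1344x768 is 1.75, not exactly 16/9).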

Response

  {
      "created": 1762156444807,
      "data": [
          {
              "b64_json": "image_base64"
          }
      ],
      "output_format": "png",
      "quality": "high",
      "size": "16:9",
      "usage": {
          "input_tokens": 535,
          "input_tokens_details": {
              "image_tokens": 516,
              "text_tokens": 19
          },
          "output_tokens": 1291,
          "total_tokens": 1826
      }
  }

Error response:

{
  "error": {
    "message": "Invalid signature",
    "type": "401"
  }
}
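Both response shapes shown above can be handled with a small helper. This is a sketch that assumes the parsed JSON body is available as a Python dict and that errors always arrive under an `error` key, as in the sample; `extract_images` is an illustrative name, not an SDK function.

```python
import base64
from typing import List


def extract_images(response: dict) -> List[bytes]:
    """Decode every b64_json entry in a response; raise on an error body."""
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"API error {err.get('type')}: {err.get('message')}")
    # Each item in "data" carries one base64-encoded image.
    return [base64.b64decode(item["b64_json"]) for item in response.get("data", [])]


if __name__ == "__main__":
    # Stand-in payload; a real response holds actual image data.
    sample = {
        "data": [{"b64_json": base64.b64encode(b"not-a-real-image").decode("ascii")}],
        "output_format": "png",
    }
    for index, image_bytes in enumerate(extract_images(sample)):
        print(f"image {index}: {len(image_bytes)} bytes, format {sample['output_format']}")
```

The decoded bytes can be written to a file opened in binary mode, using `output_format` from the response as the file extension.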

Request Example

Adding and removing elements

Provide an image and describe your change. The model will match the original image’s style, lighting, and perspective.

curl --location 'https://gptproto.com/v1/images/generations' \
--header 'Authorization: sk-xxxx' \
--header 'Content-Type: application/json' \
--data '{
  "model": "gemini-2.5-flash-image",
  "prompt": "image_url1. Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it is sitting comfortably and not falling off.",
  "size": "1:1"
}'

Input	Output
Image: A photorealistic picture of a fluffy ginger cat…	Image generated from the prompt: Using the provided image of my cat, please add a small, knitted wizard hat…

Advanced composition: Combining multiple images

Provide multiple images as context to create a new, composite scene. This is perfect for product mockups or creative collages.

curl --location 'https://gptproto.com/v1/images/generations' \
--header 'Authorization: sk-xxxx' \
--header 'Content-Type: application/json' \
--data '{
  "model": "gemini-2.5-flash-image",
  "prompt": "image_url1,image_url2. Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment.",
  "size": "1:1"
}'

Input1	Input2	Output
Image: A professionally shot photo of a blue floral summer dress…	Image: Full-body shot of a woman with her hair in a bun…	Image generated from the prompt: Create a professional e-commerce fashion photo…
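The prompts in these examples follow one pattern: the image URLs come first, separated by commas, followed by a period and the instruction. A minimal sketch of that pattern is shown below. The exact separator rules are not specified here, so treat the format as an assumption inferred from the examples; `compose_prompt` and the example.com URLs are hypothetical.

```python
from typing import Sequence


def compose_prompt(image_urls: Sequence[str], instruction: str) -> str:
    """Prefix the instruction with comma-separated image URLs, as in the examples."""
    if not image_urls:
        raise ValueError("at least one image URL is required for image-to-image")
    # Pattern inferred from the examples: "url1,url2. instruction"
    return f"{','.join(image_urls)}. {instruction.strip()}"


if __name__ == "__main__":
    prompt = compose_prompt(
        ["https://example.com/dress.png", "https://example.com/model.png"],
        "Take the dress from the first image and let the woman from the second image wear it.",
    )
    print(prompt)
```

Referring to images by position in the instruction ("the first image", "the second image") relies on the URLs being listed in that same order.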

Best Practices

To elevate your results from good to great, incorporate these professional strategies into your workflow.

Be Hyper-Specific: The more detail you provide, the more control you have. Instead of “fantasy armor,” describe it: “ornate elven plate armor, etched with silver leaf patterns, with a high collar and pauldrons shaped like falcon wings.”
Provide Context and Intent: Explain the purpose of the image. The model’s understanding of context will influence the final output. For example, “Create a logo for a high-end, minimalist skincare brand” will yield better results than just “Create a logo.”
Iterate and Refine: Don’t expect a perfect image on the first try. Use the conversational nature of the model to make small changes. Follow up with prompts like, “That’s great, but can you make the lighting a bit warmer?” or “Keep everything the same, but change the character’s expression to be more serious.”
Use Step-by-Step Instructions: For complex scenes with many elements, break your prompt into steps. “First, create a background of a serene, misty forest at dawn. Then, in the foreground, add a moss-covered ancient stone altar. Finally, place a single, glowing sword on top of the altar.”
Use “Semantic Negative Prompts”: Instead of saying “no cars,” describe the desired scene positively: “an empty, deserted street with no signs of traffic.”
Control the Camera: Use photographic and cinematic language to control the composition. Useful terms include wide-angle shot, macro shot, and low-angle perspective.
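The step-by-step practice above can be applied programmatically when prompts are assembled from parts. The following sketch joins an ordered list of steps with "First", "Then", and "Finally" cues; `stepwise_prompt` is an illustrative helper, and the cue words are simply the ones used in the example above rather than anything the model requires.

```python
from typing import Sequence


def stepwise_prompt(steps: Sequence[str]) -> str:
    """Join scene-building steps into one prompt with First/Then/Finally cues."""
    # Drop empty entries and trailing periods so punctuation stays consistent.
    cleaned = [s.strip().rstrip(".") for s in steps if s.strip()]
    if not cleaned:
        raise ValueError("at least one step is required")
    if len(cleaned) == 1:
        return cleaned[0] + "."
    parts = []
    for index, step in enumerate(cleaned):
        if index == 0:
            cue = "First"
        elif index == len(cleaned) - 1:
            cue = "Finally"
        else:
            cue = "Then"
        parts.append(f"{cue}, {step}.")
    return " ".join(parts)


if __name__ == "__main__":
    print(stepwise_prompt([
        "create a background of a serene, misty forest at dawn",
        "in the foreground, add a moss-covered ancient stone altar",
        "place a single, glowing sword on top of the altar",
    ]))
```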

Limitations

For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN.
Image generation does not support audio or video inputs.
The model won’t always follow the exact number of image outputs that the user explicitly asks for.
The model works best with up to 3 input images.
When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text.
Uploading images of children is not currently supported in EEA, CH, and UK.
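Some of these limitations can be checked on the client before a request is sent. The sketch below is a hypothetical pre-flight helper, not an API feature: it warns when more than 3 images are supplied and when a URL's file extension suggests audio or video. The extension check is a heuristic based on Python's `mimetypes` module and will miss URLs without a telling extension.

```python
import mimetypes
from typing import List, Sequence
from urllib.parse import urlparse

MAX_RECOMMENDED_IMAGES = 3  # "works best with up to 3 images"


def preflight_warnings(image_urls: Sequence[str]) -> List[str]:
    """Return human-readable warnings for inputs that conflict with the limitations."""
    warnings = []
    if len(image_urls) > MAX_RECOMMENDED_IMAGES:
        warnings.append(
            f"{len(image_urls)} images supplied; the model works best with up to "
            f"{MAX_RECOMMENDED_IMAGES}"
        )
    for url in image_urls:
        # Guess the media type from the URL path only, ignoring any query string.
        guessed, _ = mimetypes.guess_type(urlparse(url).path)
        if guessed and guessed.split("/")[0] in ("audio", "video"):
            warnings.append(f"{url} looks like {guessed}; audio and video inputs are not supported")
    return warnings


if __name__ == "__main__":
    for warning in preflight_warnings(["https://example.com/clip.mp4", "https://example.com/cat.png"]):
        print("warning:", warning)
```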
