Overview
Google Gemini 2.5 Flash Image is an image generation and editing model with advanced features and creative control. Its capabilities include:

- Image + Text-to-Image (Editing): Provide an image and use text prompts to add, remove, or modify elements, change the style, or adjust the color grading.
- Multi-Image to Image (Composition & Style Transfer): Use multiple input images to compose a new scene or transfer the style from one image to another.
- Iterative Refinement: Engage in a conversation to progressively refine your image over multiple turns, making small adjustments until it’s perfect.
- High-Fidelity Text Rendering: Accurately generate images that contain legible and well-placed text, ideal for logos, diagrams, and posters.
Supported inputs & outputs:
Inputs: Text and images
Outputs: Text and image

Authentication
This endpoint requires authentication using a Bearer token. Supply your API key in the format:
YOUR_API_KEY

Request Body
The model to use for the request.
The prompt, which can include an image URL.
The aspect ratio: one of 1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, or 21:9. Defaults to 1:1.
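
As a rough sketch of how the Bearer authentication and the request body described above might fit together, the Python below builds a request without sending it. The endpoint URL, the field names `model`, `prompt`, and `aspect_ratio`, the model identifier string, and the `Authorization: Bearer` header layout are all assumptions made for illustration. This page does not name them, so confirm each against the provider's reference before use.

```python
import json
import urllib.request

# Hypothetical values: this page names neither the endpoint nor the field names.
API_URL = "https://api.example.com/v1/images/generations"
ALLOWED_ASPECT_RATIOS = {
    "1:1", "3:2", "2:3", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
}


def build_request(api_key, prompt, aspect_ratio="1:1", model="gemini-2.5-flash-image"):
    """Build an authenticated POST request object without sending it."""
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    # Assumed JSON body layout; the real field names may differ.
    payload = {"model": model, "prompt": prompt, "aspect_ratio": aspect_ratio}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("YOUR_API_KEY", "A fluffy ginger cat wearing a knitted wizard hat")
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Validating the aspect ratio on the client keeps an obviously invalid value from costing a round trip. Passing the returned object to `urllib.request.urlopen` would actually send it.
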
Image to Image
Size
The model defaults to matching the output image size to that of your input image, or otherwise generates 1:1 squares. The sizes of the generated images are listed in this table:

| Aspect ratio | Resolution |
|---|---|
| 1:1 | 1024x1024 |
| 2:3 | 832x1248 |
| 3:2 | 1248x832 |
| 3:4 | 864x1184 |
| 4:3 | 1184x864 |
| 4:5 | 896x1152 |
| 5:4 | 1152x896 |
| 9:16 | 768x1344 |
| 16:9 | 1344x768 |
| 21:9 | 1536x672 |
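
When preparing inputs on the client side, it can help to know which supported ratio an existing image is closest to and what output resolution to expect. The sketch below copies the table into a dictionary. The helper `closest_aspect_ratio` is a hypothetical convenience function, not part of the API, and it compares against the nominal ratio (for example 16/9) rather than the exact pixel dimensions.

```python
# Resolutions copied from the table above, as (width, height) in pixels.
RESOLUTIONS = {
    "1:1": (1024, 1024),
    "2:3": (832, 1248),
    "3:2": (1248, 832),
    "3:4": (864, 1184),
    "4:3": (1184, 864),
    "4:5": (896, 1152),
    "5:4": (1152, 896),
    "9:16": (768, 1344),
    "16:9": (1344, 768),
    "21:9": (1536, 672),
}


def nominal_ratio(label):
    """Convert a label such as '16:9' into the float 16 / 9."""
    w, h = label.split(":")
    return int(w) / int(h)


def closest_aspect_ratio(width, height):
    """Return the supported label whose nominal ratio is nearest width / height."""
    target = width / height
    return min(RESOLUTIONS, key=lambda label: abs(nominal_ratio(label) - target))


label = closest_aspect_ratio(1920, 1080)
print(label, RESOLUTIONS[label])
```
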
Response
Request Example
Adding and removing elements
Provide an image and describe your change. The model will match the original image’s style, lighting, and perspective.

| Input | Output |
|---|---|
| A photorealistic picture of a fluffy ginger cat… | Using the provided image of my cat, please add a small, knitted wizard hat… |
Advanced composition: Combining multiple images
Provide multiple images as context to create a new, composite scene. This is perfect for product mockups or creative collages.

| Input1 | Input2 | Output |
|---|---|---|
| A professionally shot photo of a blue floral summer dress… | Full-body shot of a woman with her hair in a bun… | Create a professional e-commerce fashion photo… |
Best Practices
To elevate your results from good to great, incorporate these professional strategies into your workflow.

- Be Hyper-Specific: The more detail you provide, the more control you have. Instead of “fantasy armor,” describe it: “ornate elven plate armor, etched with silver leaf patterns, with a high collar and pauldrons shaped like falcon wings.”
- Provide Context and Intent: Explain the purpose of the image. The model’s understanding of context will influence the final output. For example, “Create a logo for a high-end, minimalist skincare brand” will yield better results than just “Create a logo.”
- Iterate and Refine: Don’t expect a perfect image on the first try. Use the conversational nature of the model to make small changes. Follow up with prompts like, “That’s great, but can you make the lighting a bit warmer?” or “Keep everything the same, but change the character’s expression to be more serious.”
- Use Step-by-Step Instructions: For complex scenes with many elements, break your prompt into steps. “First, create a background of a serene, misty forest at dawn. Then, in the foreground, add a moss-covered ancient stone altar. Finally, place a single, glowing sword on top of the altar.”
- Use “Semantic Negative Prompts”: Instead of saying “no cars,” describe the desired scene positively: “an empty, deserted street with no signs of traffic.”
- Control the Camera: Use photographic and cinematic language to control the composition, with terms like wide-angle shot, macro shot, and low-angle perspective.
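
The step-by-step and camera-control tips above can be applied mechanically when prompts are assembled in code. The following is a minimal sketch, assuming prompts are plain strings. The function name `step_by_step_prompt` and the "Camera:" suffix are illustrative choices, not anything the model requires.

```python
def step_by_step_prompt(steps, camera=None):
    """Join imperative steps into one prompt: First, ... Then, ... Finally, ..."""
    # Drop empty entries and trailing periods so sentences are punctuated once.
    cleaned = [s.strip().rstrip(".") for s in steps if s.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty step is required")
    sentences = []
    for i, step in enumerate(cleaned):
        if len(cleaned) == 1:
            lead = ""
        elif i == 0:
            lead = "First, "
        elif i == len(cleaned) - 1:
            lead = "Finally, "
        else:
            lead = "Then, "
        text = lead + step
        sentences.append(text[0].upper() + text[1:] + ".")
    if camera:
        # Optional photographic term such as "wide-angle shot".
        sentences.append(f"Camera: {camera}.")
    return " ".join(sentences)


prompt = step_by_step_prompt(
    [
        "create a background of a serene, misty forest at dawn",
        "in the foreground, add a moss-covered ancient stone altar",
        "place a single, glowing sword on top of the altar",
    ],
    camera="low-angle perspective",
)
print(prompt)
```
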
Limitations
- For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN.
- Image generation does not support audio or video inputs.
- The model won’t always follow the exact number of image outputs that the user explicitly asks for.
- The model works best with up to 3 images as an input.
- When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text.
- Uploading images of children is not currently supported in EEA, CH, and UK.
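
Two of the limitations above (no audio or video inputs, and best results with up to 3 input images) can be checked before a request is made. The sketch below is a hypothetical pre-flight helper, not part of the API. It guesses the media type from the file extension using Python's standard `mimetypes` module, which is only a heuristic and does not inspect file contents.

```python
import mimetypes

RECOMMENDED_MAX_IMAGES = 3  # from "works best with up to 3 images as an input"


def check_inputs(paths):
    """Return (errors, warnings) for a list of input file names or URLs."""
    errors, warnings = [], []
    image_count = 0
    for path in paths:
        kind, _ = mimetypes.guess_type(path)
        family = (kind or "").split("/")[0]
        if family in ("audio", "video"):
            errors.append(f"{path}: {family} inputs are not supported")
        elif family == "image":
            image_count += 1
        else:
            warnings.append(f"{path}: could not confirm this is an image")
    if image_count > RECOMMENDED_MAX_IMAGES:
        warnings.append(
            f"{image_count} images supplied; results are best with up to "
            f"{RECOMMENDED_MAX_IMAGES}"
        )
    return errors, warnings


errors, warnings = check_inputs(["a.png", "b.jpg", "c.png", "d.png", "clip.mp4"])
print(errors)
print(warnings)
```

Exceeding three images is reported as a warning rather than an error because the limitation is phrased as a recommendation, not a hard cap.
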






