Google Gemini 2.5 Flash Image is a powerful new image generation and editing model with advanced features and creative control.
Image + Text-to-Image (Editing): Provide an image and use text prompts to add, remove, or modify elements, change the style, or adjust the color grading.
Multi-Image to Image (Composition & Style Transfer): Use multiple input images to compose a new scene or transfer the style from one image to another.
Iterative Refinement: Engage in a conversation to progressively refine your image over multiple turns, making small adjustments until it’s perfect.
High-Fidelity Text Rendering: Accurately generate images that contain legible and well-placed text, ideal for logos, diagrams, and posters.
The model defaults to returning text and image responses (i.e. response_modalities=['Text', 'Image']). You can configure the response to return only images without text using response_modalities=['Image'].
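As a rough sketch of where this setting might sit in a raw request body, the Python below builds a payload that asks for image-only output. The `generationConfig` and `responseModalities` field names follow Google's REST naming and are an assumption for this endpoint, which may expect a different spelling; `build_payload` is a hypothetical helper, not part of any SDK.

```python
import json


def build_payload(prompt, modalities=("Text", "Image")):
    """Return a generateContent request body for a plain text prompt.

    Assumption: the modalities belong under generationConfig.responseModalities.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"responseModalities": list(modalities)},
    }


# Image-only output: leave "Text" out of the list.
payload = build_payload("A watercolor fox in a snowy forest", modalities=["Image"])
print(json.dumps(payload, indent=2))
```

Calling `build_payload(prompt)` without the second argument keeps the default text-and-image behavior described above.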
By default, the model matches the output image size to that of your input image, or otherwise generates 1:1 squares. You can control the aspect ratio of the output image with the aspect_ratio field under image_config in the request. The available ratios and the size of the image generated for each are listed in the accompanying table.
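The snippet below is a minimal sketch of attaching an aspect ratio to a request body. It assumes the REST spelling `generationConfig.imageConfig.aspectRatio`, the camelCase counterpart of the `image_config` / `aspect_ratio` names above; that spelling is not confirmed for this endpoint. `with_aspect_ratio` is a hypothetical helper, and it only checks that the value looks like `W:H`, not that the ratio is one of the supported ones.

```python
import re


def with_aspect_ratio(payload, ratio):
    """Return a copy of payload with the output aspect ratio set.

    Assumption: the field lives at generationConfig.imageConfig.aspectRatio.
    Only the "W:H" shape is validated here, not the list of supported ratios.
    """
    if not re.fullmatch(r"[1-9]\d*:[1-9]\d*", ratio):
        raise ValueError(f"expected a ratio like '16:9', got {ratio!r}")
    config = dict(payload.get("generationConfig", {}))
    config["imageConfig"] = {"aspectRatio": ratio}
    return {**payload, "generationConfig": config}


base = {"contents": [{"parts": [{"text": "A lighthouse at dusk"}]}]}
request_body = with_aspect_ratio(base, "16:9")
```

The helper returns a new dictionary rather than mutating its input, so one base payload can be reused for several ratios.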
Provide an image and describe your change. The model will match the original image’s style, lighting, and perspective.
curl --location 'https://gptproto.com/v1beta/models/gemini-2.5-flash-image:generateContent' \
--header 'Authorization: sk-xxxx' \
--header 'Content-Type: application/json' \
--data '{
  "contents": [{
    "parts": [
      {"text": "Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it'\''s sitting comfortably and not falling off."},
      {
        "inline_data": {
          "mime_type": "image/png",
          "data": "iVBORw0KGgoAAAANSUhEUgAAANQAAAFPCA...."
        }
      }
    ]
  }]
}'
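For readers who prefer a script to curl, here is a hedged Python sketch of the same editing request using only the standard library. The URL, header names, and `inline_data` / `mime_type` fields are copied from the curl example; `build_edit_request` is a hypothetical helper, and the image bytes are placeholders rather than a real PNG. The request is built but not sent.

```python
import base64
import json
import urllib.request

API_URL = "https://gptproto.com/v1beta/models/gemini-2.5-flash-image:generateContent"


def build_edit_request(image_bytes, instruction, api_key, mime_type="image/png"):
    """Build (but do not send) the POST request for an image edit."""
    body = {
        "contents": [{
            "parts": [
                {"text": instruction},
                {"inline_data": {
                    "mime_type": mime_type,
                    # The API expects the image as a base64 string.
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ]
        }]
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder bytes stand in for a real file read with open(path, "rb").
req = build_edit_request(
    b"\x89PNG placeholder",
    "Add a small, knitted wizard hat on the cat's head.",
    "sk-xxxx",
)
# To send it: urllib.request.urlopen(req), then json.load() on the response.
```

Serializing the body with `json.dumps` also sidesteps shell quoting problems with apostrophes in the prompt.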
Example images: Input (A photorealistic picture of a fluffy ginger cat…), Output (Using the provided image of my cat, please add a small, knitted wizard hat…).
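To get the output image out of a response, something like the following sketch may help. The response layout (`candidates` → `content` → `parts`, with images under `inlineData` or `inline_data`) is an assumption based on Google's REST conventions and is not shown on this page; `extract_images` is a hypothetical helper, and the sample response is hand-made for illustration only.

```python
import base64


def extract_images(response):
    """Return a list of (mime_type, raw_bytes) for every image part found.

    Assumption: images arrive base64-encoded under inlineData (or inline_data).
    """
    images = []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            blob = part.get("inlineData") or part.get("inline_data")
            if blob:
                mime = blob.get("mimeType") or blob.get("mime_type")
                images.append((mime, base64.b64decode(blob["data"])))
    return images


# A hand-made response used only to exercise the parser; not real API output.
sample = {
    "candidates": [{"content": {"parts": [
        {"text": "Here is your edited image."},
        {"inlineData": {
            "mimeType": "image/png",
            "data": base64.b64encode(b"fake-png").decode("ascii"),
        }},
    ]}}]
}
images = extract_images(sample)
```

Each returned byte string can then be written to disk with `open(path, "wb")`.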
Provide multiple images as context to create a new, composite scene. This is perfect for product mockups or creative collages.
curl --location 'https://gptproto.com/v1beta/models/gemini-2.5-flash-image:generateContent' \
--header 'Authorization: sk-xxxx' \
--header 'Content-Type: application/json' \
--data '{
  "contents": [{
    "parts": [
      {
        "inline_data": {
          "mime_type": "image/png",
          "data": "iVBORw0KGgoAAAANSUhEUgAAANQAAAFPCA...."
        }
      },
      {
        "inline_data": {
          "mime_type": "image/png",
          "data": "{{gemini_png_base64_2}}"
        }
      },
      {"text": "Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment."}
    ]
  }]
}'
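The request body above can also be assembled programmatically. The sketch below mirrors the curl example's layout, with image parts first in order and the text instruction last, so that a prompt can refer to "the first image" and "the second image". `build_composition_body` is a hypothetical helper and the byte strings are placeholders, not real images.

```python
import base64


def build_composition_body(images, instruction):
    """Build a request body from several images plus one text instruction.

    images is a list of (mime_type, raw_bytes) pairs; their order is preserved
    so the prompt can refer to them by position.
    """
    parts = [
        {"inline_data": {
            "mime_type": mime,
            "data": base64.b64encode(raw).decode("ascii"),
        }}
        for mime, raw in images
    ]
    parts.append({"text": instruction})
    return {"contents": [{"parts": parts}]}


# Placeholder bytes stand in for the dress photo and the model photo.
body = build_composition_body(
    [("image/png", b"dress-bytes"), ("image/png", b"model-bytes")],
    "Take the dress from the first image and let the woman from the second image wear it.",
)
```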
Example images: Input 1 (A professionally shot photo of a blue floral summer dress…), Input 2, Output.
To elevate your results from good to great, incorporate these professional strategies into your workflow.
Be Hyper-Specific: The more detail you provide, the more control you have. Instead of “fantasy armor,” describe it: “ornate elven plate armor, etched with silver leaf patterns, with a high collar and pauldrons shaped like falcon wings.”
Provide Context and Intent: Explain the purpose of the image. The model’s understanding of context will influence the final output. For example, “Create a logo for a high-end, minimalist skincare brand” will yield better results than just “Create a logo.”
Iterate and Refine: Don’t expect a perfect image on the first try. Use the conversational nature of the model to make small changes. Follow up with prompts like, “That’s great, but can you make the lighting a bit warmer?” or “Keep everything the same, but change the character’s expression to be more serious.”
Use Step-by-Step Instructions: For complex scenes with many elements, break your prompt into steps. “First, create a background of a serene, misty forest at dawn. Then, in the foreground, add a moss-covered ancient stone altar. Finally, place a single, glowing sword on top of the altar.”
Use “Semantic Negative Prompts”: Instead of saying “no cars,” describe the desired scene positively: “an empty, deserted street with no signs of traffic.”
Control the Camera: Use photographic and cinematic language to control the composition, with terms like wide-angle shot, macro shot, and low-angle perspective.
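The "Iterate and Refine" tip can be sketched as a multi-turn request in which earlier turns are sent back along with the follow-up prompt. The `role` values `user` and `model` follow Google's REST conventions and are an assumption for this endpoint; `add_turn` is a hypothetical helper, and the model turn holds a placeholder where the base64 image from the previous response would go.

```python
def add_turn(history, role, parts):
    """Return a new history list with one more conversation turn appended."""
    if role not in ("user", "model"):
        raise ValueError("role must be 'user' or 'model'")
    return history + [{"role": role, "parts": parts}]


history = []
history = add_turn(history, "user", [
    {"text": "Create a logo for a high-end, minimalist skincare brand."},
])
# In practice this turn would carry the image part returned by the API.
history = add_turn(history, "model", [
    {"inline_data": {"mime_type": "image/png",
                     "data": "<base64 from the previous response>"}},
])
history = add_turn(history, "user", [
    {"text": "That's great, but can you make the lighting a bit warmer?"},
])

# Assumed body for the follow-up request: the whole conversation so far.
body = {"contents": history}
```

Keeping each follow-up small, as the tips above suggest, makes it easier to see which instruction caused which change.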