
Overview

Image editing allows you to modify existing images through various techniques:
  • Inpainting: Add or remove content within specific regions
  • Outpainting: Expand images beyond their original boundaries
  • Background Editing: Replace backgrounds while preserving subjects
  • Mask-free Editing: Natural language edits without masks
  • Conversational Editing: Iterative refinement through chat (Gemini)

Imagen 3 Editing

Setup

from google import genai
from google.genai.types import (
    EditImageConfig,
    GenerateImagesConfig,
    Image,
    MaskReferenceConfig,
    MaskReferenceImage,
    RawReferenceImage,
)

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
edit_model = "imagen-3.0-capability-001"
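The client above assumes PROJECT_ID and LOCATION are already defined. A common pattern is to read them from the environment; the variable names and the fallback values below are illustrative, not required by the SDK:

```python
import os

# Illustrative: resolve project and region from the environment,
# with placeholder fallbacks you should replace with your own values.
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "your-project-id")
LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")
```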

Inpainting

Inpainting Insert

Add new content to specific regions of an image using masks.
Automatically detect and replace foreground objects:
# Generate a starting image
image_prompt = """
a small wooden bowl with grapes and apples on a marble 
kitchen counter, light brown cabinets blurred in the background
"""

generated_image = client.models.generate_images(
    model="imagen-3.0-generate-002",
    prompt=image_prompt,
    config=GenerateImagesConfig(
        number_of_images=1,
        aspect_ratio="1:1",
    ),
)

# Edit the foreground objects
edit_prompt = "a small white ceramic bowl with lemons and limes"

raw_ref_image = RawReferenceImage(
    reference_image=generated_image.generated_images[0].image,
    reference_id=0
)

mask_ref_image = MaskReferenceImage(
    reference_id=1,
    reference_image=None,  # Auto-detect
    config=MaskReferenceConfig(
        mask_mode="MASK_MODE_FOREGROUND",
        mask_dilation=0.1,
    ),
)

edited_image = client.models.edit_image(
    model=edit_model,
    prompt=edit_prompt,
    reference_images=[raw_ref_image, mask_ref_image],
    config=EditImageConfig(
        edit_mode="EDIT_MODE_INPAINT_INSERTION",
        number_of_images=1,
    ),
)

Inpainting Remove

Remove unwanted objects from images:
starting_image = Image(
    gcs_uri="gs://cloud-samples-data/generative-ai/image/mirror.png"
)

raw_ref_image = RawReferenceImage(
    reference_image=starting_image,
    reference_id=0
)

mask_ref_image = MaskReferenceImage(
    reference_id=1,
    reference_image=None,
    config=MaskReferenceConfig(
        mask_mode="MASK_MODE_SEMANTIC",
        segmentation_classes=[85],  # Mirror class
    ),
)

remove_image = client.models.edit_image(
    model=edit_model,
    prompt="",  # Empty prompt for removal
    reference_images=[raw_ref_image, mask_ref_image],
    config=EditImageConfig(
        edit_mode="EDIT_MODE_INPAINT_REMOVAL",
        number_of_images=1,
    ),
)

Background Editing

Replace backgrounds while preserving product or subject:
product_image = Image(
    gcs_uri="gs://cloud-samples-data/generative-ai/image/suitcase.png"
)

raw_ref_image = RawReferenceImage(
    reference_image=product_image,
    reference_id=0
)

mask_ref_image = MaskReferenceImage(
    reference_id=1,
    reference_image=None,
    config=MaskReferenceConfig(
        mask_mode="MASK_MODE_BACKGROUND",
    ),
)

prompt = """
a light blue suitcase in front of a window in an airport, 
lots of bright natural lighting, planes taking off in the distance
"""

edited_image = client.models.edit_image(
    model=edit_model,
    prompt=prompt,
    reference_images=[raw_ref_image, mask_ref_image],
    config=EditImageConfig(
        edit_mode="EDIT_MODE_BGSWAP",
        number_of_images=1,
    ),
)
Background swap is perfect for product photography, e-commerce listings, and marketing materials.

Outpainting

Expand images beyond their original boundaries:
from PIL import Image as PIL_Image
import io

# Load the original image and an all-black mask (0 = preserve) matching its size
initial_image = Image.from_file(location="living-room.png")
mask = PIL_Image.new("L", initial_image._pil_image.size, 0)

# Helper function to pad image and mask
def pad_image_and_mask(image, mask, target_size):
    image.thumbnail(target_size)
    mask.thumbnail(target_size)
    
    # Add padding around image
    padded_image = PIL_Image.new("RGB", target_size, (0, 0, 0))
    padded_mask = PIL_Image.new("L", target_size, 255)
    
    # Center the original image
    x = (target_size[0] - image.width) // 2
    y = (target_size[1] - image.height) // 2
    padded_image.paste(image, (x, y))
    padded_mask.paste(mask, (x, y))
    
    return padded_image, padded_mask

# Prepare padded versions
target_size = (1875, 2500)  # 3:4 aspect ratio
image_padded, mask_padded = pad_image_and_mask(
    initial_image._pil_image,
    mask,
    target_size
)

# Convert to Image objects
def get_bytes_from_pil(pil_image):
    byte_io = io.BytesIO()
    pil_image.save(byte_io, "PNG")
    return byte_io.getvalue()

image_padded_obj = Image(image_bytes=get_bytes_from_pil(image_padded))
mask_padded_obj = Image(image_bytes=get_bytes_from_pil(mask_padded))

raw_ref_image = RawReferenceImage(
    reference_image=image_padded_obj,
    reference_id=0
)

mask_ref_image = MaskReferenceImage(
    reference_id=1,
    reference_image=mask_padded_obj,
    config=MaskReferenceConfig(
        mask_mode="MASK_MODE_USER_PROVIDED",
        mask_dilation=0.03,
    ),
)

prompt = "a chandelier hanging from the ceiling"

edited_image = client.models.edit_image(
    model=edit_model,
    prompt=prompt,
    reference_images=[raw_ref_image, mask_ref_image],
    config=EditImageConfig(
        edit_mode="EDIT_MODE_OUTPAINT",
        number_of_images=1,
    ),
)
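The geometry of the padding step is easy to check on its own. The sketch below repeats pad_image_and_mask with a synthetic image (no API calls; the sizes and colors are illustrative): the original pixels land centered on the target canvas, and the mask is 0 (preserve) over the original area and 255 (generate) in the border that will be outpainted:

```python
from PIL import Image as PIL_Image

def pad_image_and_mask(image, mask, target_size):
    # Same helper as above: shrink to fit, then center on a padded canvas
    image.thumbnail(target_size)
    mask.thumbnail(target_size)
    padded_image = PIL_Image.new("RGB", target_size, (0, 0, 0))
    padded_mask = PIL_Image.new("L", target_size, 255)
    x = (target_size[0] - image.width) // 2
    y = (target_size[1] - image.height) // 2
    padded_image.paste(image, (x, y))
    padded_mask.paste(mask, (x, y))
    return padded_image, padded_mask

# Synthetic 400x300 red "photo" and its all-black (preserve) mask
src = PIL_Image.new("RGB", (400, 300), (255, 0, 0))
src_mask = PIL_Image.new("L", (400, 300), 0)

padded, padded_mask = pad_image_and_mask(src, src_mask, (1875, 2500))

print(padded.size)                        # (1875, 2500)
print(padded.getpixel((937, 1250)))       # (255, 0, 0): original pixels, centered
print(padded_mask.getpixel((937, 1250)))  # 0: center is preserved
print(padded_mask.getpixel((10, 10)))     # 255: border will be outpainted
```

Note that PIL's thumbnail only shrinks, so a source smaller than the target is padded without being enlarged.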

Mask-Free Editing

Edit images using natural language without masks:
original_image = Image(
    gcs_uri="gs://cloud-samples-data/generative-ai/image/latte.jpg"
)

raw_ref_image = RawReferenceImage(
    reference_image=original_image,
    reference_id=0
)

prompt = """
swan latte art in the coffee cup and an assortment of red velvet 
cupcakes in gold wrappers on the white plate
"""

edited_image = client.models.edit_image(
    model=edit_model,
    prompt=prompt,
    reference_images=[raw_ref_image],  # Only reference image
    config=EditImageConfig(
        edit_mode="EDIT_MODE_DEFAULT",
        number_of_images=1,
    ),
)
Mask-free editing is ideal for subtle changes, style adjustments, and when precise masking isn’t needed.

Gemini Conversational Editing

Subject Customization

Apply subjects from reference images to new contexts:
from google.genai.types import Part, GenerateContentConfig, ImageConfig

MODEL_ID = "gemini-2.5-flash-image"

# Load reference image
with open("dog-1.jpg", "rb") as f:
    image = f.read()

response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        Part.from_bytes(
            data=image,
            mime_type="image/jpeg",
        ),
        "Create a pencil sketch image of this dog wearing a cowboy hat in a western-themed setting.",
    ],
    config=GenerateContentConfig(
        response_modalities=["IMAGE"],
        image_config=ImageConfig(
            aspect_ratio="1:1",
        ),
    ),
)

# Display returned images (IPython's display helper and Image class,
# not the google.genai.types Image imported earlier)
from IPython.display import Image as IPythonImage, display

for part in response.candidates[0].content.parts:
    if part.inline_data:
        display(IPythonImage(data=part.inline_data.data))

Style Transfer

Transfer style from one image to another:
# Load style reference (living room)
with open("living-room.png", "rb") as f:
    style_image = f.read()

response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        Part.from_bytes(
            data=style_image,
            mime_type="image/png",
        ),
        "Using the concepts, colors, and themes from this living room generate a kitchen with the same aesthetic.",
    ],
    config=GenerateContentConfig(
        response_modalities=["IMAGE"],
        image_config=ImageConfig(
            aspect_ratio="21:9",
        ),
    ),
)

Multi-Turn Iterative Editing

Refine images through conversation:
# Start a chat session
chat = client.chats.create(model=MODEL_ID)

# First edit: Change color
response = chat.send_message(
    message=[
        Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/image/perfume.jpg",
            mime_type="image/jpeg",
        ),
        "change the perfume color to a light purple",
    ],
    config=GenerateContentConfig(
        response_modalities=["IMAGE"],
        image_config=ImageConfig(
            aspect_ratio="3:2",
        ),
    ),
)

# Save the generated image data
data = response.candidates[0].content.parts[0].inline_data.data

# Second edit: Add text (continuing the conversation)
response = chat.send_message(
    message=[
        Part.from_bytes(
            data=data,
            mime_type="image/jpeg",
        ),
        "inscribe the word flowers in French on the perfume bottle in a delicate white cursive font",
    ],
    config=GenerateContentConfig(
        response_modalities=["IMAGE"],
        image_config=ImageConfig(
            aspect_ratio="3:2",
        ),
    ),
)
Multi-turn editing is well suited to iterative design workflows where you refine details progressively.

Multiple Reference Images

Combine elements from multiple images:
response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/image/suitcase.png",
            mime_type="image/png",
        ),
        Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/image/woman.jpg",
            mime_type="image/jpeg",
        ),
        "Generate an image of the woman pulling the suitcase in an airport.",
    ],
    config=GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config=ImageConfig(
            aspect_ratio="9:16",
        ),
    ),
)

Configuration Options

Mask Dilation

Control how far the mask extends beyond detected boundaries:
config=MaskReferenceConfig(
    mask_dilation=0.1,  # 10% expansion (0.0 to 1.0)
)
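As a rough intuition (the exact kernel the service applies is not documented here), a dilation of 0.1 grows the mask by about 10% of the image dimension. Assuming the fraction is taken against image width, the expansion in pixels can be sketched as:

```python
def dilation_px(mask_dilation: float, image_width: int) -> int:
    """Approximate mask growth in pixels for a given dilation fraction.

    Assumption: the fraction is applied to image width; the service's
    exact behavior may differ.
    """
    if not 0.0 <= mask_dilation <= 1.0:
        raise ValueError("mask_dilation must be in [0.0, 1.0]")
    return round(mask_dilation * image_width)

print(dilation_px(0.1, 1024))   # 102
print(dilation_px(0.03, 1875))  # 56
```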

Safety Controls

config=EditImageConfig(
    safety_filter_level="BLOCK_MEDIUM_AND_ABOVE",
    person_generation="ALLOW_ADULT",
)

Semantic Segmentation Classes

For MASK_MODE_SEMANTIC, use these class IDs:
Class ID  Object Type      Class ID  Object Type       Class ID  Object Type
0         backpack         50        carrot            100       sidewalk_pavement
1         umbrella         51        hot_dog           101       runway
2         bag              52        pizza             102       terrain
3         tie              53        donut             103       book
4         suitcase         54        cake              104       box
5         case             55        fruit_other       105       clock
6         bird             56        food_other        106       vase
7         cat              57        chair_other       107       scissors
8         dog              58        armchair          108       plaything_other
9         horse            59        swivel_chair      109       teddy_bear
10        sheep            60        stool             110       hair_dryer
11        cow              61        seat              111       toothbrush
12        elephant         62        couch             112       painting
13        bear             63        trash_can         113       poster
14        zebra            64        potted_plant      114       bulletin_board
15        giraffe          65        nightstand        115       bottle
16        animal_other     66        bed               116       cup
17        microwave        67        table             117       wine_glass
28        toilet           78        bathroom_counter  128       motorcyclist
35        building         85        mirror            135       traffic_sign
42        television       92        stove             142       sky
125       person           175       bicycle           176       car
180       bus              181       train             182       truck
Common use cases:
  • 8 - dog
  • 7 - cat
  • 125 - person
  • 85 - mirror
  • 42 - television
  • 176 - car
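In code it can help to keep the IDs you use behind readable names. The dictionary below covers only the common classes listed above (a small hand-picked subset, not the full class list):

```python
# Hand-picked subset of semantic segmentation class IDs from the table above
SEGMENTATION_CLASSES = {
    "dog": 8,
    "cat": 7,
    "person": 125,
    "mirror": 85,
    "television": 42,
    "car": 176,
}

def class_ids(*names: str) -> list[int]:
    """Map readable names to segmentation class IDs (KeyError if unknown)."""
    return [SEGMENTATION_CLASSES[name] for name in names]

# e.g. MaskReferenceConfig(mask_mode="MASK_MODE_SEMANTIC",
#                          segmentation_classes=class_ids("mirror"))
print(class_ids("mirror"))      # [85]
print(class_ids("dog", "cat"))  # [8, 7]
```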

Best Practices

1. Start with clear reference images: use high-quality, well-lit images with clear subject boundaries.
2. Write descriptive edit prompts: be specific about what you want to change and how the result should look.
3. Experiment with mask dilation: adjust mask_dilation if edges appear too sharp or too blurry.
4. Use appropriate edit modes. Choose the right mode for your task:
  • Inpainting for targeted changes
  • Background swap for product images
  • Mask-free for subtle adjustments
  • Conversational for iterative refinement

Next Steps

Image Generation

Learn about text-to-image generation

Visual Q&A

Ask questions about images with Gemini
