Fine-tune Gemini models using Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) to adapt models to your specific tasks and preferences
Gemini supports two powerful tuning approaches to customize model behavior for your specific use cases: Supervised Fine-Tuning (SFT) for task-specific adaptation and Direct Preference Optimization (DPO) for aligning models with human preferences.
Supervised fine-tuning uses labeled training data to refine the base model’s capabilities toward your specific tasks. Each training example demonstrates the desired output for a given input.
Training data should be in JSONL format with input-output pairs:
```json
{
  "contents": [
    {
      "role": "user",
      "parts": [{"text": "Context: The Normans were an ethnic group...\nQuestion: In what country is Normandy located?"}]
    }
  ],
  "completion": {
    "role": "model",
    "parts": [{"text": "France"}]
  }
}
```

(Shown pretty-printed for readability; in the actual JSONL file, each record occupies a single line.)
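If you are assembling this file in code, a minimal sketch can help; note that the `make_sft_example` helper and the `sft_train.jsonl` filename are illustrative, not part of any SDK:

```python
import json

# Hypothetical helper: build one SFT training record in the format above.
def make_sft_example(user_text: str, model_text: str) -> dict:
    return {
        "contents": [
            {"role": "user", "parts": [{"text": user_text}]}
        ],
        "completion": {"role": "model", "parts": [{"text": model_text}]},
    }

examples = [
    make_sft_example(
        "Context: The Normans were an ethnic group...\n"
        "Question: In what country is Normandy located?",
        "France",
    ),
]

# One JSON object per line, as JSONL requires.
with open("sft_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```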
Ensure your training data is high-quality, well-labeled, and directly relevant to your target task. Low-quality data can adversely affect performance and introduce bias.
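One way to catch structural problems before launching a job is a quick validation pass over the file. A sketch assuming the record shape shown earlier (the `validate_jsonl` helper and its checks are illustrative):

```python
import json

# Sanity-check a JSONL training file: every line must parse as JSON and
# carry the top-level fields the SFT format above expects.
def validate_jsonl(path):
    problems = []
    with open(path) as f:
        for i, line in enumerate(f, 1):
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                problems.append((i, "invalid JSON"))
                continue
            for field in ("contents", "completion"):
                if field not in rec:
                    problems.append((i, f"missing '{field}'"))
    return problems
```

Running this before uploading costs seconds and avoids a failed tuning job later.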
```python
# Poll for completion
job_status = client.tuning_jobs.get(tuning_job.name)
print(f"Status: {job_status.state}")
print(f"Progress: {job_status.tuning_progress}")
```
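Tuning jobs are long-running, so in practice the status check sits in a loop. A minimal sketch; the terminal state names and the `get_job` callable are stand-ins, so check your SDK for the actual enum values:

```python
import time

# Hypothetical terminal states; verify against your SDK's job-state enum.
TERMINAL_STATES = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}

def wait_for_tuning(get_job, interval_s=60):
    """Poll get_job() until it returns a job in a terminal state."""
    while True:
        job = get_job()
        if job["state"] in TERMINAL_STATES:
            return job
        time.sleep(interval_s)

# Demo with a fake job source that succeeds on the third poll.
states = iter(["JOB_STATE_PENDING", "JOB_STATE_RUNNING", "JOB_STATE_SUCCEEDED"])
final = wait_for_tuning(lambda: {"state": next(states)}, interval_s=0)
print(final["state"])  # JOB_STATE_SUCCEEDED
```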
5. Deploy and Use
Once training completes, use your fine-tuned model:
```python
# Use the tuned model
response = client.models.generate_content(
    model=tuning_job.tuned_model_endpoint,
    contents="Context: ...\nQuestion: What is the capital of France?",
)
print(response.text)
```
DPO teaches Gemini to generate better responses by learning from human preferences. Instead of labeled “correct” answers, you provide pairs of responses where humans preferred one over the other.
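A preference record therefore pairs one prompt with a preferred and a rejected response. A hypothetical record shape follows; the `chosen`/`rejected` field names are illustrative, so consult the official tuning docs for the exact schema:

```python
import json

# Illustrative shape of one DPO preference record: the same prompt with a
# chosen (preferred) and a rejected response. Field names are assumptions.
def make_preference_example(prompt, chosen, rejected):
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "chosen": {"role": "model", "parts": [{"text": chosen}]},
        "rejected": {"role": "model", "parts": [{"text": rejected}]},
    }

pair = make_preference_example(
    "Explain photosynthesis in one sentence.",
    "Plants convert light, water, and CO2 into sugar and oxygen.",
    "Photosynthesis is a thing plants do.",
)
print(json.dumps(pair, indent=2))
```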
```python
import json

from datasets import load_dataset

# Load preference dataset (e.g., UltraFeedback)
dataset = load_dataset("zhengr/ultrafeedback_binarized")

# Transform to Gemini format; transform_to_gemini_format maps one dataset
# row to a preference record, returning None for rows it cannot convert.
train_transformed = []
for example in dataset["train_prefs"].select(range(1000)):
    result = transform_to_gemini_format(example)
    if result:
        train_transformed.append(result)

# Save as JSONL
with open("dpo_train.jsonl", "w") as f:
    for item in train_transformed:
        f.write(json.dumps(item) + "\n")
```

Note that slicing a Hugging Face dataset with `[:1000]` returns a dict of columns rather than rows, so `.select(range(1000))` is used to iterate over examples.
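The snippet above calls `transform_to_gemini_format` without defining it. A hypothetical sketch for the binarized UltraFeedback schema (`prompt`, `chosen`, `rejected`, where `chosen` and `rejected` are chat-message lists); adjust the output field names to whatever your tuning API actually expects:

```python
# Hypothetical transform for binarized-UltraFeedback-style rows; the output
# field names are illustrative, not an official schema.
def transform_to_gemini_format(example):
    prompt = example.get("prompt")
    chosen = example.get("chosen")
    rejected = example.get("rejected")
    if not (prompt and chosen and rejected):
        return None  # skip incomplete rows
    # chosen/rejected are chat-message lists in this dataset; take the text
    # of the final (assistant) turn.
    chosen_text = chosen[-1]["content"] if isinstance(chosen, list) else chosen
    rejected_text = rejected[-1]["content"] if isinstance(rejected, list) else rejected
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "chosen": {"role": "model", "parts": [{"text": chosen_text}]},
        "rejected": {"role": "model", "parts": [{"text": rejected_text}]},
    }

row = {
    "prompt": "Hi",
    "chosen": [{"role": "assistant", "content": "Hello!"}],
    "rejected": [{"role": "assistant", "content": "hey"}],
}
print(transform_to_gemini_format(row)["chosen"]["parts"][0]["text"])  # Hello!
```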
```python
# Test with the base model
base_response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="Explain quantum computing",
)

# Test with the DPO-tuned model
tuned_response = client.models.generate_content(
    model=dpo_job.tuned_model_endpoint,
    contents="Explain quantum computing",
)

print("Base model:", base_response.text)
print("DPO-tuned:", tuned_response.text)
```
Consider using a Gemini Flash model for faster, cheaper tuning runs.
Fine-tuning can introduce or amplify biases present in your training data. Always evaluate outputs for fairness, safety, and alignment with your values before deploying to production.