The Phoenix Playground provides an interactive environment for rapidly iterating on prompts, testing different models, and tuning generation parameters—all without writing code. It’s designed for prompt engineering, quick experimentation, and debugging LLM behavior.

What is the Playground?

The Playground is a web-based interface for:
  • Prompt Engineering: Craft and refine prompts with live feedback
  • Model Comparison: Test the same prompt across multiple models side-by-side
  • Parameter Tuning: Adjust temperature, top-p, max tokens, and other settings
  • Trace Replay: Load production traces and rerun them with different configurations
  • Iteration Speed: Get immediate feedback without writing or deploying code

Accessing the Playground

The Playground is available in the Phoenix UI:
1. Launch Phoenix

Start Phoenix and navigate to the UI:
phoenix serve
# Open http://localhost:6006
2. Open Playground

Click “Playground” in the navigation menu or navigate to a specific project and click “Open in Playground”.
3. Configure your LLM

Select your model provider (OpenAI, Anthropic, etc.) and enter API credentials if needed.

Key Features

Prompt Editor

The Playground provides a rich editor for crafting prompts with:
  • System/User/Assistant Messages: Structure conversational prompts
  • Template Variables: Use {{variable}} syntax for dynamic content
  • Multi-turn Conversations: Build complex conversation flows
  • Syntax Highlighting: Clear visual formatting

Example Prompt

System: You are a helpful customer support agent for {{company_name}}.
You should be professional but friendly.

User: {{customer_query}}

Assistant:
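The {{variable}} placeholders above are filled in at run time. As a rough sketch of the idea (this is an illustration, not Phoenix's actual template engine), substitution can be done with a small regex:

```python
import re

def render_template(template: str, variables: dict) -> str:
    """Substitute {{variable}} placeholders with provided values.

    Unknown variables are left in place so missing values are easy to spot.
    """
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )

system = "You are a helpful customer support agent for {{company_name}}."
print(render_template(system, {"company_name": "Acme Corp"}))
# → You are a helpful customer support agent for Acme Corp.
```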

Model Selection

Choose from supported LLM providers:

OpenAI

GPT-4, GPT-4 Turbo, GPT-3.5 Turbo

Anthropic

Claude 3 Opus, Sonnet, Haiku

Azure OpenAI

Azure-hosted OpenAI models

Custom Providers

Configure custom API endpoints

Parameter Tuning

Adjust generation parameters interactively:

Temperature (0.0 - 2.0)
  • Controls randomness in outputs
  • Lower = more deterministic
  • Higher = more creative/varied
Top P (0.0 - 1.0)
  • Nucleus sampling threshold
  • Lower = more focused on likely tokens
  • Higher = broader token selection
Max Tokens
  • Maximum length of generated response
  • Prevents runaway generation
Frequency/Presence Penalty (OpenAI)
  • Reduce repetition in outputs
  • Frequency: penalize based on token frequency
  • Presence: penalize based on token presence
Stop Sequences
  • Define custom stopping points
  • Useful for structured outputs
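The controls above map onto the keyword arguments most provider SDKs accept. As a sketch (the values here are illustrative, not recommendations):

```python
# Illustrative generation parameters, mirroring the Playground controls.
generation_params = {
    "temperature": 0.7,        # randomness: lower = more deterministic
    "top_p": 0.9,              # nucleus sampling threshold
    "max_tokens": 500,         # cap on response length
    "frequency_penalty": 0.2,  # penalize tokens by how often they appear (OpenAI)
    "presence_penalty": 0.1,   # penalize tokens that have appeared at all (OpenAI)
    "stop": ["\n\n###"],       # custom stop sequence for structured output
}

# These would be forwarded to the provider call, e.g. for OpenAI:
# client.chat.completions.create(model="gpt-4", messages=..., **generation_params)
```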

Side-by-Side Comparison

Compare multiple model/parameter combinations simultaneously:
1. Add a comparison column

Click “Add Comparison” to create a new configuration column.
2. Configure each variant

Set different models or parameters for each column:
  • Column 1: GPT-4 with temp 0.7
  • Column 2: Claude 3 Sonnet with temp 0.7
  • Column 3: GPT-4 with temp 0.2
3. Run all variants

Click “Run All” to execute the same prompt across all configurations.
4. Compare outputs

Review outputs side-by-side to identify quality, cost, and latency differences.
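Conceptually, "Run All" fans the same prompt out over each column's configuration. A minimal sketch of that fan-out (the model names and fields are illustrative, not the Playground's internal format):

```python
# Hypothetical comparison columns: same prompt, different model/parameters.
variants = [
    {"name": "Column 1", "model": "gpt-4", "temperature": 0.7},
    {"name": "Column 2", "model": "claude-3-sonnet", "temperature": 0.7},
    {"name": "Column 3", "model": "gpt-4", "temperature": 0.2},
]

def build_requests(prompt: str, variants: list) -> list:
    """Pair the shared prompt with each model/parameter variant."""
    return [
        {"prompt": prompt, "model": v["model"], "temperature": v["temperature"]}
        for v in variants
    ]

requests = build_requests("What is Phoenix?", variants)
# Each request would then be sent to its provider and the outputs
# displayed side-by-side for quality, cost, and latency comparison.
```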

Trace Replay

One of the most powerful features is replaying production traces in the Playground:
1. Find a trace

Navigate to a trace in your project that you want to replay or debug.
2. Open in Playground

Click “Replay in Playground” from the trace detail view.
3. Modify configuration

The Playground loads with the exact prompt and inputs from the trace. Now you can:
  • Edit the prompt
  • Change the model
  • Adjust parameters
  • Modify input variables
4. Rerun and compare

Execute the modified configuration and compare against the original trace output.

Use Cases for Trace Replay:
  • Debug problematic production outputs
  • Test prompt improvements on real user queries
  • Evaluate model upgrades (e.g., GPT-3.5 → GPT-4)
  • Investigate why certain inputs failed

Playground Configuration

API Keys

Configure API keys for model providers:
# Set via environment variable
export OPENAI_API_KEY="sk-..."
Or enter directly in the Playground UI settings.

Custom Providers

Add custom LLM providers through the Phoenix configuration:
# In Phoenix configuration (helper code in src/phoenix/server/api/helpers/playground_clients.py)
# Custom providers can be registered for use in the Playground

Saving and Sharing

Save Prompt Configurations

Prompt configurations from the Playground can be saved for reuse:
1. Name your configuration

Give your prompt + parameters a descriptive name.
2. Save as template

Click “Save Template” to store in Phoenix.
3. Load later

Access saved templates from the Playground sidebar.

Export to Code

Convert Playground configurations to production code:
# Example: Export Playground config to OpenAI Python code
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    temperature=0.7,
    max_tokens=500,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Phoenix?"}
    ]
)
The Playground provides export options for common frameworks:
  • OpenAI Python SDK
  • Anthropic Python SDK
  • LangChain
  • LlamaIndex
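For comparison with the OpenAI export above, the same configuration exported for the Anthropic Python SDK might look like the following sketch (the model name is illustrative; note that Anthropic's Messages API takes the system prompt as a separate `system` argument rather than a message):

```python
# Hypothetical export of the same configuration for the Anthropic SDK.
anthropic_request = {
    "model": "claude-3-sonnet-20240229",  # illustrative model identifier
    "max_tokens": 500,
    "temperature": 0.7,
    "system": "You are a helpful assistant.",
    "messages": [{"role": "user", "content": "What is Phoenix?"}],
}

# With the SDK installed and an API key configured, this would be sent as:
# from anthropic import Anthropic
# client = Anthropic()
# response = client.messages.create(**anthropic_request)
```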

Integration with Prompt Management

The Playground integrates with Phoenix’s Prompt Management system:

Save to Prompt Registry

Prompts created in the Playground can be saved as versioned prompts:
1. Finalize prompt

Test and refine your prompt in the Playground.
2. Save as versioned prompt

Click “Save to Prompt Registry” and provide:
  • Prompt name
  • Version tag (e.g., “v1.0”, “production”)
  • Description
3. Use in production

Load the saved prompt in your application code:
import phoenix as px

client = px.Client()
prompt = client.get_prompt(
    name="customer_support_greeting",
    tag="production"
)

Load from Prompt Registry

Bring existing versioned prompts into the Playground for testing:
  1. Click “Load Prompt” in the Playground
  2. Select from your saved prompts
  3. Choose a specific version or tag
  4. Test with different models or parameters

Playground for Experiments

Use the Playground to rapidly prototype before running formal experiments:
1. Prototype in Playground

Test your task logic interactively with various inputs.
2. Validate across examples

Manually try multiple dataset examples to verify behavior.
3. Export to code

Convert your Playground configuration to a task function.
4. Run the full experiment

Execute the task systematically across your dataset:
from phoenix.experiments import run_experiment

def task(input):
    # Apply the logic you refined in the Playground
    # (replace `process` with your own implementation)
    return process(input)

result = run_experiment(
    dataset=dataset,
    task=task,
    experiment_name="playground-prototype"
)

Best Practices

  • Iterate Quickly: Use the Playground for fast iteration before committing to code or experiments.
  • Test Edge Cases: Try unusual inputs, very long queries, and adversarial examples.
  • Compare Models: Don't assume one model is always better; test on your specific use case.
  • Document Findings: Save configurations that work well and note parameter settings.
  • Use Trace Replay: When debugging production issues, always replay the trace in the Playground first.
  • Version Prompts: Once you find a good prompt, save it to the Prompt Registry with proper versioning.

Keyboard Shortcuts

Speed up your workflow with keyboard shortcuts:
  • Cmd/Ctrl + Enter: Run current configuration
  • Cmd/Ctrl + S: Save configuration
  • Cmd/Ctrl + K: Clear output
  • Tab: Navigate between fields

Next Steps

Prompt Management

Version and manage prompts systematically

Experiments

Run systematic experiments on datasets

Tracing

Understand trace replay capabilities

Evaluation

Evaluate Playground outputs systematically
