## What is the Playground?

The Playground is a web-based interface for:

- Prompt Engineering: Craft and refine prompts with live feedback
- Model Comparison: Test the same prompt across multiple models side-by-side
- Parameter Tuning: Adjust temperature, top-p, max tokens, and other settings
- Trace Replay: Load production traces and rerun them with different configurations
- Iteration Speed: Get immediate feedback without writing or deploying code
## Accessing the Playground

The Playground is available in the Phoenix UI. Click “Playground” in the navigation menu, or navigate to a specific project and click “Open in Playground”.
## Key Features
### Prompt Editor

The Playground provides a rich editor for crafting prompts with:

- System/User/Assistant Messages: Structure conversational prompts
- Template Variables: Use `{{variable}}` syntax for dynamic content
- Multi-turn Conversations: Build complex conversation flows
- Syntax Highlighting: Clear visual formatting
#### Example Prompt
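The original example prompt was not preserved in this copy. As a minimal illustrative sketch (the reviewer scenario and the variable names `language` and `code` are invented for this example), here is a templated two-message prompt and how its `{{variable}}` placeholders resolve:

```python
import re

# A hypothetical templated prompt as it might be written in the Playground
# editor. "language" and "code" are invented variable names.
messages = [
    {"role": "system", "content": "You are an expert {{language}} code reviewer."},
    {"role": "user", "content": "Review the following code:\n\n{{code}}"},
]

def render(messages, variables):
    """Replace each {{name}} placeholder with its value from `variables`."""
    def render_text(text):
        return re.sub(r"\{\{(\w+)\}\}", lambda m: variables[m.group(1)], text)
    return [{**m, "content": render_text(m["content"])} for m in messages]

rendered = render(messages, {"language": "Python", "code": "print('hi')"})
print(rendered[0]["content"])  # → You are an expert Python code reviewer.
```

In the Playground itself you would supply the variable values in the template-variables panel rather than in code; the substitution semantics are the same.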
### Model Selection

Choose from supported LLM providers:

- OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5 Turbo
- Anthropic: Claude 3 Opus, Sonnet, Haiku
- Azure OpenAI: Azure-hosted OpenAI models
- Custom Providers: Configure custom API endpoints
### Parameter Tuning

Adjust generation parameters interactively:

- Temperature (0.0 - 2.0): Controls randomness in outputs
  - Lower = more deterministic
  - Higher = more creative/varied
- Top-p: Nucleus sampling threshold
  - Lower = more focused on likely tokens
  - Higher = broader token selection
- Max Tokens: Maximum length of generated response
  - Prevents runaway generation
- Frequency/Presence Penalties: Reduce repetition in outputs
  - Frequency: penalize based on token frequency
  - Presence: penalize based on token presence
- Stop Sequences: Define custom stopping points
  - Useful for structured outputs
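These knobs map directly onto the request parameters of most chat-completion APIs. As a sketch (field names follow the OpenAI-style convention; the values are arbitrary examples, not recommendations), a Playground configuration corresponds to a set of request parameters like:

```python
# Generation parameters as they would appear in an OpenAI-style
# chat-completions request body. Values are arbitrary examples.
generation_params = {
    "temperature": 0.7,        # randomness: 0.0 (deterministic) .. 2.0 (varied)
    "top_p": 0.9,              # nucleus sampling threshold
    "max_tokens": 512,         # cap on response length; prevents runaway generation
    "frequency_penalty": 0.2,  # penalize tokens in proportion to how often they appeared
    "presence_penalty": 0.1,   # penalize tokens that have appeared at all
    "stop": ["\n\n###"],       # custom stop sequence, e.g. for structured outputs
}

# In application code these are typically passed through unchanged, e.g.:
#   client.chat.completions.create(model="gpt-4", messages=messages, **generation_params)
```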
### Side-by-Side Comparison

Compare multiple model/parameter combinations simultaneously. Configure each variant by setting different models or parameters for each column:
- Column 1: GPT-4 with temp 0.7
- Column 2: Claude 3 Sonnet with temp 0.7
- Column 3: GPT-4 with temp 0.2
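Conceptually, the three columns above form a small grid of configurations run against the same prompt. A sketch of that grid (the actual provider calls are elided, since they require credentials and the respective SDKs):

```python
# The three comparison columns from the example, as configuration dicts.
variants = [
    {"column": 1, "model": "gpt-4", "temperature": 0.7},
    {"column": 2, "model": "claude-3-sonnet", "temperature": 0.7},
    {"column": 3, "model": "gpt-4", "temperature": 0.2},
]

for v in variants:
    # In the Playground each column runs the same prompt; in code you would
    # call the corresponding provider SDK with v["model"] and v["temperature"].
    print(f"Column {v['column']}: {v['model']} @ temperature {v['temperature']}")
```

Holding the prompt fixed while varying one dimension at a time (model in columns 1 vs. 2, temperature in columns 1 vs. 3) makes it easy to attribute output differences to a single change.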
### Trace Replay

One of the most powerful features is replaying production traces in the Playground. The Playground loads with the exact prompt and inputs from the trace, and you can then modify the configuration:
- Edit the prompt
- Change the model
- Adjust parameters
- Modify input variables

Use trace replay to:

- Debug problematic production outputs
- Test prompt improvements on real user queries
- Evaluate model upgrades (e.g., GPT-3.5 → GPT-4)
- Investigate why certain inputs failed
## Playground Configuration

### API Keys

Configure API keys for each model provider:

- OpenAI
- Anthropic
- Azure OpenAI
### Custom Providers

Add custom LLM providers through the Phoenix configuration.

## Saving and Sharing
### Save Prompt Configurations

Prompt configurations from the Playground can be saved for reuse.

### Export to Code

Convert Playground configurations to production code for:

- OpenAI Python SDK
- Anthropic Python SDK
- LangChain
- LlamaIndex
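As a rough illustration of what an OpenAI-targeted export amounts to (this is a hand-written sketch, not actual Playground output; the prompt, model, and parameter values are placeholders), the exported code reproduces the same request the Playground sent:

```python
import json
import os
import urllib.request

# The same request the Playground issued, expressed against the OpenAI
# chat-completions REST endpoint. All content here is a placeholder.
payload = {
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this document in one sentence."},
    ],
    "temperature": 0.7,
    "max_tokens": 256,
}

api_key = os.environ.get("OPENAI_API_KEY")
if api_key:  # only send the request when credentials are available
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

SDK-based exports (OpenAI or Anthropic Python SDK, LangChain, LlamaIndex) wrap this same payload in the respective client library's call instead of a raw HTTP request.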
## Integration with Prompt Management

The Playground integrates with Phoenix’s Prompt Management system.

### Save to Prompt Registry

Prompts created in the Playground can be saved as versioned prompts. Click “Save to Prompt Registry” and provide:
- Prompt name
- Version tag (e.g., “v1.0”, “production”)
- Description
### Load from Prompt Registry

Bring existing versioned prompts into the Playground for testing:

- Click “Load Prompt” in the Playground
- Select from your saved prompts
- Choose a specific version or tag
- Test with different models or parameters
## Playground for Experiments

Use the Playground to rapidly prototype before running formal experiments.

## Best Practices
Iterate Quickly: Use the Playground for fast iteration before committing to code or experiments.
## Keyboard Shortcuts

Speed up your workflow with keyboard shortcuts:

- Cmd/Ctrl + Enter: Run current configuration
- Cmd/Ctrl + S: Save configuration
- Cmd/Ctrl + K: Clear output
- Tab: Navigate between fields
## Next Steps

- Prompt Management: Version and manage prompts systematically
- Experiments: Run systematic experiments on datasets
- Tracing: Understand trace replay capabilities
- Evaluation: Evaluate Playground outputs systematically