Key Metrics Categories
Usage Metrics
- Request volume over time
- Active users (daily, weekly, monthly)
- Requests per user
- Model usage distribution
- Provider distribution
Performance Metrics
- Latency percentiles (p50, p95, p99)
- Time to first token (TTFT)
- Throughput (requests/second)
- Error rates by model and provider
- Cache hit rates
Cost Metrics
- Total spend over time
- Cost per user
- Cost per feature/workflow
- Cost by model and provider
- Token usage and costs
Quality Metrics
- Success rate (2xx vs errors)
- User feedback scores
- Retry rates
- Session completion rates
- Average response length
Dashboard Overview
The Helicone dashboard provides real-time metrics visualization at helicone.ai/dashboard.
High-Level Metrics
At the top of your dashboard, see your key metrics at a glance:
- Total Requests: Request count for the selected time period
- Total Cost: Cumulative cost across all requests
- Average Latency: Mean latency across all requests
- Error Rate: Percentage of failed requests (4xx/5xx)
- Active Users: Unique users making requests
Time-Series Graphs
Visualize trends over time:
- Requests Over Time: See usage patterns and identify spikes
- Cost Over Time: Track spending trends and budget
- Latency Over Time: Monitor performance degradation
- Errors Over Time: Identify reliability issues
Breakdowns
Understand your usage composition:
- By Model: Which models are used most
- By Provider: OpenAI, Anthropic, Google, etc.
- By User: Top users by request count or cost
- By Property: Custom property breakdowns (environment, feature, etc.)
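Property breakdowns depend on tagging each request at send time. Helicone reads custom properties from `Helicone-Property-<Name>` request headers; a minimal sketch of a header builder (the property names here are illustrative):

```python
def helicone_property_headers(props: dict) -> dict:
    """Build Helicone custom-property headers from a dict of tags."""
    return {f"Helicone-Property-{name}": str(value) for name, value in props.items()}

# Example tags: attach these as extra headers on each LLM request routed
# through Helicone; the dashboard can then break down cost and latency
# by Environment or Feature.
headers = helicone_property_headers({"Environment": "production", "Feature": "chat"})
```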
Request Metrics
Volume & Distribution
Track how many requests you’re making.
Latency Analysis
Understand request performance.
Percentiles explained:
- p50 (median): Half of requests are faster, half are slower
- p95: 95% of requests are faster - identifies slow outliers
- p99: 99% of requests are faster - catches worst-case scenarios
Warning signs to watch for:
- p50 increasing: Overall performance degrading
- p95/p99 spikes: Some requests becoming very slow
- Large p99-p50 gap: Inconsistent performance
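The percentiles above can be computed directly from raw request latencies; a minimal nearest-rank sketch using only the standard library:

```python
def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

latencies_ms = [120, 130, 125, 140, 135, 150, 145, 900, 160, 155]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
# A large p95 - p50 gap (here driven by the single 900 ms outlier)
# is the "inconsistent performance" signal described above.
```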
Time to First Token (TTFT)
For streaming requests, TTFT measures perceived responsiveness:
- Lower TTFT = faster perceived response
- Critical for chat interfaces
- Varies significantly by model
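TTFT can be measured by timing the gap between issuing a streaming request and receiving the first chunk. A sketch against a stand-in stream (any real client's streaming iterator works the same way):

```python
import time

def measure_ttft(stream):
    """Return (ttft_seconds, chunks): time until the first chunk arrives, plus all chunks."""
    start = time.monotonic()
    chunks = []
    ttft = None
    for chunk in stream:
        if ttft is None:
            ttft = time.monotonic() - start
        chunks.append(chunk)
    return ttft, chunks

def fake_stream():
    # Stand-in for a streaming LLM response; the delay models provider latency.
    time.sleep(0.05)
    yield "Hello"
    yield ", world"

ttft, chunks = measure_ttft(fake_stream())
```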
Error Rates
Track request failures:
- 429 (Rate Limit): Hitting provider rate limits
- 400 (Bad Request): Invalid request parameters
- 500 (Server Error): Provider outages
- 503 (Service Unavailable): Provider capacity issues
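Error rates can be derived from request logs; a minimal sketch over (status, provider) records:

```python
from collections import Counter

def error_breakdown(requests):
    """Given (status_code, provider) pairs, return the overall error rate
    and a count per error status code."""
    total = len(requests)
    errors = [status for status, _provider in requests if status >= 400]
    rate = len(errors) / total if total else 0.0
    return rate, Counter(errors)

log = [(200, "openai"), (200, "openai"), (429, "openai"), (500, "anthropic"), (200, "anthropic")]
rate, by_code = error_breakdown(log)
# rate is 0.4 here; the per-code counts show whether 429s (rate limits)
# or 5xx (provider outages) dominate.
```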
Session Metrics
For workflows using sessions, track aggregate session metrics.
Session Performance
Session Cost Analysis
Understand the cost of complete workflows:
- Total session cost: Sum of all requests in the session
- Cost distribution: Which parts of the workflow are most expensive
- Cost per success: Total cost divided by successful sessions
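The cost-per-success figure above accounts for the money spent on failed sessions; a sketch of the computation (the session record shape is an assumption):

```python
def session_cost_metrics(sessions):
    """sessions: list of dicts with 'costs' (per-request USD) and a 'success' flag.
    Returns total spend and cost per successful session."""
    total = sum(sum(s["costs"]) for s in sessions)
    successes = sum(1 for s in sessions if s["success"])
    cost_per_success = total / successes if successes else float("inf")
    return total, cost_per_success

sessions = [
    {"costs": [0.002, 0.010, 0.003], "success": True},
    {"costs": [0.002, 0.011], "success": False},
    {"costs": [0.002, 0.009, 0.004], "success": True},
]
total, cps = session_cost_metrics(sessions)
# Failed sessions still cost money, so cost per success always exceeds
# the average cost of a successful session.
```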
User Metrics
Analyze per-user behavior and costs.
User Activity
User Segmentation
Group users by behavior:
- Power Users: Top 10% by request volume
- Active Users: Made requests in last 7 days
- New Users: First request in last 30 days
- At-Risk Users: Declining usage patterns
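The segments above can be derived from per-user activity records; a sketch (the record shape and thresholds mirror the list, but are assumptions about your data model):

```python
from datetime import datetime, timedelta

def segment_users(users, now):
    """users: {user_id: {'requests': int, 'first_seen': datetime, 'last_seen': datetime}}.
    Thresholds (top 10%, 7 days, 30 days) follow the segment definitions above."""
    by_volume = sorted(users, key=lambda u: users[u]["requests"], reverse=True)
    power_cutoff = max(1, len(users) // 10)
    return {
        "power": set(by_volume[:power_cutoff]),
        "active": {u for u, d in users.items() if now - d["last_seen"] <= timedelta(days=7)},
        "new": {u for u, d in users.items() if now - d["first_seen"] <= timedelta(days=30)},
    }

now = datetime(2024, 6, 30)
users = {
    "alice": {"requests": 900, "first_seen": datetime(2024, 1, 5), "last_seen": datetime(2024, 6, 29)},
    "bob": {"requests": 40, "first_seen": datetime(2024, 6, 20), "last_seen": datetime(2024, 6, 21)},
}
segments = segment_users(users, now)
```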
User Costs
Track spending per user.
Cost Metrics
Total Spend
Track your LLM spending over time.
Cost by Model
Understand which models drive costs.
Cost by Custom Property
Segment costs by any dimension.
Token Usage
Track token consumption.
Performance Optimization
Identifying Slow Requests
Use metrics to find performance bottlenecks:
- Sort by latency: Find slowest requests
- Check patterns: Do slow requests share characteristics?
- Analyze prompts: Are slow requests using longer prompts?
- Compare models: Are certain models consistently slower?
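The checks above can be run over exported request records; a sketch that surfaces the slowest requests and per-model averages (the record fields are assumptions):

```python
def slowest_by_model(requests, top_n=3):
    """requests: list of dicts with 'model', 'latency_ms', 'prompt_tokens'.
    Returns the slowest requests plus average latency per model for comparison."""
    slowest = sorted(requests, key=lambda r: r["latency_ms"], reverse=True)[:top_n]
    by_model = {}
    for r in requests:
        by_model.setdefault(r["model"], []).append(r["latency_ms"])
    avg = {m: sum(v) / len(v) for m, v in by_model.items()}
    return slowest, avg

reqs = [
    {"model": "gpt-4o", "latency_ms": 2100, "prompt_tokens": 3500},
    {"model": "gpt-4o-mini", "latency_ms": 400, "prompt_tokens": 300},
    {"model": "gpt-4o", "latency_ms": 1900, "prompt_tokens": 3200},
]
slowest, avg_by_model = slowest_by_model(reqs, top_n=2)
# Here the slow requests share a model and long prompts - exactly the kind
# of shared characteristic worth investigating.
```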
Cache Hit Rate
Track cache effectiveness:
- High hit rate (>30%): Cache working well
- Low hit rate (<10%): Review cache strategy
To improve a low hit rate:
- Consider increasing cache bucket size
- Check cache TTL settings
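Hit rate itself is a simple ratio over logged requests; a sketch (the `cache_hit` field name is an assumption about your export format):

```python
def cache_hit_rate(requests):
    """requests: list of dicts with a boolean 'cache_hit' field."""
    if not requests:
        return 0.0
    return sum(1 for r in requests if r["cache_hit"]) / len(requests)

reqs = [{"cache_hit": True}, {"cache_hit": False}, {"cache_hit": True},
        {"cache_hit": False}, {"cache_hit": False}]
rate = cache_hit_rate(reqs)
# 0.4 here - above the ~30% rule of thumb mentioned above.
```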
Cost Optimization
Model Selection
Compare costs across models:
- Use cheaper models for simple tasks
- Reserve expensive models for complex tasks
- A/B test model quality vs cost
- Implement model fallbacks
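The routing and fallback ideas above can be sketched as follows; the model names, the 0-1 complexity scale, and the error type are illustrative assumptions, not a prescribed implementation:

```python
def pick_model(task_complexity, budget_sensitive=True):
    """Routing sketch: cheap model for simple tasks, expensive model for complex ones."""
    if task_complexity < 0.7 and budget_sensitive:
        return "gpt-4o-mini"
    return "gpt-4o"

def with_fallback(primary, fallbacks, call):
    """Try the primary model, then each fallback, returning the first success."""
    for model in [primary, *fallbacks]:
        try:
            return model, call(model)
        except RuntimeError:  # stand-in for provider/rate-limit errors
            continue
    raise RuntimeError("all models failed")

def flaky_call(model):
    # Stand-in: the primary provider errors, the fallback succeeds.
    if model == "gpt-4o":
        raise RuntimeError("rate limited")
    return "ok"

model, result = with_fallback("gpt-4o", ["gpt-4o-mini"], flaky_call)
```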
Prompt Optimization
Reduce token usage:
- Shorten system prompts
- Remove redundant instructions
- Use fewer examples in few-shot prompts
- Implement prompt compression
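A quick way to sanity-check these savings is a rough token estimate; the ~4-characters-per-token heuristic below is an approximation for English text (use a real tokenizer such as tiktoken for accurate counts):

```python
def approx_tokens(text):
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

verbose = ("You are a helpful assistant. Always be helpful. Please make sure to always "
           "answer helpfully and be as helpful as possible when answering.")
concise = "You are a helpful assistant."
saved = approx_tokens(verbose) - approx_tokens(concise)
# System-prompt savings apply to every request, so they compound with volume.
```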
Feature Cost Analysis
Identify expensive features.
Custom Metric Tracking
Add custom properties to enable rich analytics.
Alerts & Monitoring
Set up alerts based on metrics:
Cost Alerts
- Daily spend exceeds threshold
- User spend exceeds limit
- Unusual cost spike detected
Performance Alerts
- Latency p95 exceeds threshold
- Error rate exceeds threshold
- TTFT degradation detected
Usage Alerts
- Request rate spike
- Unusual traffic pattern
- Provider rate limit approaching
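All three alert categories reduce to comparing current metrics against thresholds; a minimal sketch (the threshold values are illustrative and should be tuned to your own baseline):

```python
def check_alerts(metrics, thresholds):
    """Return the names of metrics whose current value exceeds its threshold."""
    return [name for name, limit in thresholds.items() if metrics.get(name, 0) > limit]

metrics = {"daily_spend_usd": 142.0, "p95_latency_ms": 1800, "error_rate": 0.012}
thresholds = {"daily_spend_usd": 100.0, "p95_latency_ms": 2000, "error_rate": 0.05}
fired = check_alerts(metrics, thresholds)
# Only the cost alert fires in this example.
```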
Exporting Metrics
API Export
Export metrics for external analysis.
Data Warehouse Integration
Integrate with your data warehouse:
- Export data via API
- Load into your warehouse (Snowflake, BigQuery, etc.)
- Join with your business data
- Build custom dashboards
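The export-via-API step above can be sketched as an authenticated request to Helicone's request-query endpoint; the endpoint path and body shape here are assumptions based on a generic REST pattern, so confirm them against the API reference before use:

```python
import json
import urllib.request

def build_export_request(api_key, limit=100, offset=0):
    """Build a request to Helicone's request-query endpoint.
    The path and body shape are assumptions - check the API reference."""
    body = json.dumps({"filter": "all", "limit": limit, "offset": offset}).encode()
    return urllib.request.Request(
        "https://api.helicone.ai/v1/request/query",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )

# To run the export (requires a real key):
#   req = build_export_request(os.environ["HELICONE_API_KEY"])
#   with urllib.request.urlopen(req) as resp:
#       rows = json.loads(resp.read())
# Then load `rows` into your warehouse (Snowflake, BigQuery, etc.).
```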
Best Practices
Metric Collection
✅ Do:
- Tag all requests with custom properties for rich segmentation
- Use consistent property names across your application
- Track both business and technical metrics
- Set up alerts for critical metrics
❌ Don't:
- Collect metrics without acting on them
- Use inconsistent property names
- Ignore low-level metrics (they reveal patterns)
- Wait for issues to become critical
Metric Analysis
✅ Do:
- Review metrics weekly to identify trends
- Compare across time periods (week-over-week, month-over-month)
- Segment by user cohorts and features
- Look for correlations between metrics
❌ Don't:
- Look at metrics in isolation
- Ignore gradual degradation
- Focus only on averages (check percentiles too)
- Optimize prematurely without data
Related Features
Requests
Drill down into individual requests from metrics
Custom Properties
Add dimensions for richer metric analysis
User Metrics
Deep dive into user-level analytics
Alerts
Set up alerts based on metric thresholds
Questions?
Need help or have questions? We’re here to help:
- Discord Community: Join our Discord server for quick help
- GitHub Issues: Report bugs or request features on GitHub
- Documentation: Check our full documentation for more guides
