## Overview
Gorkie implements comprehensive observability through three layers:

- **OpenTelemetry** - Distributed tracing
- **Langfuse** - AI-specific observability
- **Pino** - Structured logging
## OpenTelemetry Setup
OpenTelemetry is initialized at application startup in `server/index.ts`. The setup provides:
- Automatic trace propagation across async operations
- Spans are exported to Langfuse for analysis
- Graceful shutdown on process exit
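The bullets above can be sketched as follows. `TelemetrySdk` is a stand-in for `@opentelemetry/sdk-node`'s `NodeSDK` (which exposes `start()` and `shutdown()`); none of this is Gorkie's actual startup code:

```typescript
// Illustrative wiring only: TelemetrySdk mirrors the start()/shutdown()
// surface of @opentelemetry/sdk-node's NodeSDK.
interface TelemetrySdk {
  start(): void;
  shutdown(): Promise<void>;
}

function wireTelemetry(sdk: TelemetrySdk): void {
  sdk.start(); // begin recording and exporting spans (e.g. to Langfuse)

  // Graceful shutdown: flush any buffered spans before the process exits.
  for (const signal of ["SIGINT", "SIGTERM"] as const) {
    process.once(signal, () => {
      void sdk.shutdown().finally(() => process.exit(0));
    });
  }
}
```

Calling `shutdown()` before exit is what guarantees the last spans of a conversation are not lost in the exporter's buffer.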
### Error Handling
Unhandled errors are captured and telemetry is flushed before the process exits (`server/index.ts`).
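The pattern can be sketched like this; `telemetry` is any object with a `shutdown()` flush, and the handler shape is an assumption rather than the actual `server/index.ts` code:

```typescript
// Hedged sketch: flush telemetry before exiting on a fatal error.
interface Flushable {
  shutdown(): Promise<void>;
}

function installCrashHandler(telemetry: Flushable): void {
  process.once("uncaughtException", (err) => {
    console.error("fatal error:", err);
    // Flush buffered spans/logs so the failure is visible upstream,
    // then exit with a non-zero status.
    void telemetry.shutdown().finally(() => process.exit(1));
  });
}
```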
## Langfuse Integration
Langfuse provides AI-specific observability.

### Automatic Tracing

All AI SDK operations are automatically traced (`server/lib/ai/agents/orchestrator.ts`), including:
- Model calls (prompt, completion, tokens)
- Tool executions (input, output, duration)
- Agent reasoning steps
- Error traces
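With the Vercel AI SDK, tracing of this kind is enabled per call through the `experimental_telemetry` option. A hedged sketch of that options object follows; the `functionId` value and metadata keys are illustrative, not taken from the orchestrator:

```typescript
// Options object in the shape accepted by the AI SDK's
// experimental_telemetry parameter on generateText/streamText.
const telemetryOptions = {
  isEnabled: true,            // emit OpenTelemetry spans for this call
  functionId: "orchestrator", // hypothetical span name for grouping
  metadata: {
    sessionId: "C042:1700000000.000100", // Slack-thread-derived session (assumed format)
    userId: "U012345",                   // Slack user id
  },
};
```

Passed as `experimental_telemetry: telemetryOptions` on a model call, this is what lets Langfuse attach prompts, completions, and token counts to each span.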
### Environment Variables

Configure Langfuse with the `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` environment variables.

### Viewing Traces
- Navigate to the Langfuse Dashboard
- Select your project
- View traces grouped by:
  - **Session** - Full conversation thread
  - **User** - Specific Slack user
  - **Trace** - Individual message handling
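A hypothetical helper showing how a Slack-thread-derived session key could look; the real derivation lives in Gorkie's code, and the `channel:thread_ts` format is an assumption:

```typescript
// Hypothetical: combine the Slack channel id and thread timestamp into a
// stable key so every message in one thread shares a Langfuse session.
function toSessionId(channel: string, threadTs: string): string {
  return `${channel}:${threadTs}`;
}

// All replies in the same thread map to the same session.
const sessionId = toSessionId("C042", "1700000000.000100");
```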
Langfuse automatically groups traces by `sessionId` (derived from the Slack thread), making it easy to debug full conversations.

## Structured Logging with Pino
Gorkie uses Pino for high-performance structured logging (`server/lib/logger.ts`).
### Log Outputs

Logs are written to multiple destinations:

**Production:**
- `logs/app.log` - file output
- `stdout` - console output (for container logs)

**Development:**
- `logs/app.log` - file output
- `pino-pretty` - pretty-printed console output
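The destination split can be expressed with pino's transport targets. This is a sketch of the shape such a config takes, not the actual `server/lib/logger.ts`:

```typescript
// Sketch of a pino transport config selecting targets by environment.
// "pino/file" writes NDJSON to a path (or fd 1 = stdout);
// "pino-pretty" renders human-readable lines in development.
const isProd = process.env.NODE_ENV === "production";

const transportConfig = {
  targets: isProd
    ? [
        { target: "pino/file", options: { destination: "logs/app.log" } },
        { target: "pino/file", options: { destination: 1 } }, // stdout
      ]
    : [
        { target: "pino/file", options: { destination: "logs/app.log" } },
        { target: "pino-pretty", options: { colorize: true } },
      ],
};
```

The object would be passed to `pino.transport(...)`, or as the `transport` option when constructing the logger with `pino(...)`.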
### Log Levels

Configure the log level via the `LOG_LEVEL` environment variable.

### Structured Context
Always include relevant context in logs:

- Use structured fields (objects) instead of string interpolation
- Include `ctxId` or `threadId` for correlation
- Add an `error` field for exceptions (automatically serialized)
- Keep messages concise and action-oriented
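The first rule can be illustrated with a stub logger; the stub and the field names here are illustrative, though the call shape matches pino's `logger.info(fields, msg)`:

```typescript
// Stub logger that captures entries so the structured/unstructured
// contrast is visible without pulling in pino itself.
type LogEntry = { fields: Record<string, unknown>; msg: string };
const entries: LogEntry[] = [];
const logger = {
  info: (fields: Record<string, unknown>, msg: string) => entries.push({ fields, msg }),
};

const ctxId = "C042:1700000000.000100";

// Good: structured fields stay queryable in log tooling.
logger.info({ ctxId, sandboxId: "sb-1" }, "sandbox created");

// Bad (avoid): interpolation buries ctxId inside an opaque string.
// logger.info({}, `sandbox sb-1 created for ${ctxId}`);
```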
### Example Log Output

In development, `pino-pretty` renders logs as human-readable lines; in production they are emitted as single-line JSON.

## Error Handling Patterns

Error handling is consistent across the codebase. The `toLogError` utility safely extracts error information from unknown thrown values.
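A hedged sketch of what a `toLogError`-style helper does; the actual implementation lives in the codebase, and this version only shows the safe-extraction idea:

```typescript
// Safely extract loggable fields from an unknown thrown value.
function toLogError(err: unknown): { name: string; message: string; stack?: string } {
  if (err instanceof Error) {
    return { name: err.name, message: err.message, stack: err.stack };
  }
  // Non-Error throws (strings, objects) are stringified rather than trusted.
  return { name: "UnknownError", message: String(err) };
}
```

A call site would then log it as a structured field, e.g. `logger.error({ error: toLogError(err) }, "tool failed")`.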
## Monitoring Best Practices
### 1. Context Propagation
Always pass `ctxId` through the call stack. This enables:
- Filtering logs by conversation thread
- Correlating events across async operations
- Debugging specific user issues
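A sketch of the propagation pattern. `getContextId` is named later in this doc, but its implementation and the handler names here are assumptions:

```typescript
// Illustrative: derive a correlation id once, then thread it through every call.
function getContextId(context: { channel: string; threadTs: string }): string {
  return `${context.channel}:${context.threadTs}`; // assumed format
}

const trace: string[] = [];

async function handleMessage(context: { channel: string; threadTs: string }) {
  const ctxId = getContextId(context); // derive once, at the entry point
  await runAgent(ctxId);
}

async function runAgent(ctxId: string) {
  // Every child function (and logger.child({ ctxId })) sees the same id.
  trace.push(ctxId);
}
```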
### 2. Trace Important Operations

Log key lifecycle events.

### 3. Monitor Resource Usage

Log the sandbox lifecycle for cost monitoring.

### 4. Alert on Critical Errors
Set up alerts for:

- Unhandled exceptions
- Sandbox creation failures
- Database connection errors
- Rate limit exhaustion
- E2B API errors
## Debugging Tips
### Find All Logs for a Thread

### Filter by Log Level

### Track Sandbox Lifecycle

### Monitor Tool Execution
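Each of the tips above reduces to a field match on the NDJSON log stream (one JSON object per line). A sketch, with assumed field names (`ctxId`, `level`, `sandboxId`, `tool`):

```typescript
// Parse NDJSON log text and keep entries matching a predicate.
function filterLogs(ndjson: string, predicate: (entry: any) => boolean): any[] {
  return ndjson
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line))
    .filter(predicate);
}

const sample = [
  '{"level":30,"ctxId":"C042:1.2","msg":"agent started"}',
  '{"level":50,"ctxId":"C042:1.2","tool":"bash","msg":"tool failed"}',
  '{"level":30,"ctxId":"C999:9.9","sandboxId":"sb-1","msg":"sandbox resumed"}',
].join("\n");

// Find all logs for a thread:
const threadLogs = filterLogs(sample, (e) => e.ctxId === "C042:1.2");
// Filter by log level (pino numeric levels: 50 = error):
const errors = filterLogs(sample, (e) => e.level >= 50);
// Track sandbox lifecycle / monitor tool execution:
const sandboxEvents = filterLogs(sample, (e) => e.sandboxId !== undefined);
const toolCalls = filterLogs(sample, (e) => e.tool !== undefined);
```

The same predicates translate directly to `jq` filters or log-aggregator queries over the file output.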
## Performance Metrics
Key metrics to track in production:

| Metric | Description | Target |
|---|---|---|
| Response Time | Time from message to reply | < 5s |
| Sandbox Creation | Time to create new sandbox | < 30s |
| Sandbox Resume | Time to resume paused sandbox | < 5s |
| Tool Execution | Time per tool call | < 3s |
| Memory Usage | Application memory footprint | < 512MB |
| Active Sandboxes | Number of running sandboxes | < 50 |
Use Langfuse’s analytics dashboard to track AI-specific metrics like token usage, cost per conversation, and tool success rates.
## Troubleshooting
### Missing Traces

If traces aren't appearing in Langfuse:

- Verify `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` are set
- Check that the SDK is initialized before the first AI call
- Ensure SDK shutdown is called on exit
- Look for SDK errors in logs
### Log File Size

If log files grow too large:

- Implement log rotation (use `pino-roll` or an external tool)
- Lower `LOG_LEVEL` to `warn` or `error`
- Filter verbose libraries (Slack SDK, Drizzle)
- Set up external log aggregation (CloudWatch, Datadog)
### Context Missing

If logs are missing `ctxId`:

- Ensure `getContextId(context)` is called early
- Pass `ctxId` to all child functions
- Add `ctxId` to child loggers: `logger.child({ ctxId })`