Debugging LLM applications differs from traditional software debugging. Issues are often subtle: wrong responses, inconsistent behavior, or silent failures that degrade quality without breaking functionality. Helicone provides comprehensive debugging tools to identify, diagnose, and resolve issues in your LLM applications.

Common LLM Issues

Errors & Timeouts

API failures, rate limits, timeouts, and provider outages

Quality Issues

Wrong answers, inconsistent outputs, hallucinations, and context loss

Performance Problems

Slow responses, high latency, and token inefficiency

Cost Overruns

Unexpected spending, inefficient prompts, and model selection

Debugging Workflow

1. Filter by Status Codes

Start by identifying failed requests using status code filters:
Helicone request page showing status code filter for error identification
Common status codes:
  • 200 - Success
  • 400 - Bad request (malformed input)
  • 401 - Authentication failed
  • 429 - Rate limit exceeded
  • 500 - Provider error
  • 503 - Provider unavailable
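As a sketch, these status codes can drive your error-handling logic. The helper below is hypothetical (not part of any SDK) and simply classifies the codes listed above into actions:

```typescript
// Hypothetical helper: map provider/proxy status codes to a handling strategy.
type StatusAction = "ok" | "fix-request" | "fix-auth" | "retry";

function classifyStatus(status: number): StatusAction {
  if (status === 200) return "ok";
  if (status === 400) return "fix-request"; // malformed input: retrying won't help
  if (status === 401) return "fix-auth"; // bad or missing API key
  if (status === 429 || status >= 500) return "retry"; // transient: back off and retry
  return "fix-request";
}
```

Treating 429/5xx as retryable and 4xx as caller errors is a common convention; adjust to your provider's semantics.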
// Add request IDs for easier debugging
const requestId = `req-${Date.now()}`;

const response = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [...],
  },
  {
    headers: {
      "Helicone-Request-Id": requestId,
      "Helicone-Property-Feature": "document-processing",
    },
  }
);
2. Inspect Request Details

Click on any request to see complete details:
Helicone request detail page with full request and response data
Key information available:
  • Full request body - Exact prompt and parameters sent
  • Complete response - What the model returned
  • Timing breakdown - Where latency occurred
  • Token usage - Input/output token counts
  • Cost - Exact cost of this request
  • Custom properties - Your metadata for filtering
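To correlate dashboard entries with your own application logs, you can record the same key fields locally. A minimal sketch, assuming the OpenAI SDK's response shape with a `usage` object:

```typescript
// Sketch: summarize a chat completion for local logs, mirroring the
// fields Helicone shows on the request detail page. The interface below
// is a simplified assumption of the OpenAI SDK response shape.
interface CompletionLike {
  model: string;
  usage?: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
}

function summarizeResponse(res: CompletionLike, latencyMs: number) {
  return {
    model: res.model,
    inputTokens: res.usage?.prompt_tokens ?? 0,
    outputTokens: res.usage?.completion_tokens ?? 0,
    latencyMs,
  };
}
```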
3. Use Playground for Testing

Test fixes immediately without redeploying code:
Playground button on request detail page
The Playground allows you to:
  • Modify the prompt and see new results
  • Change model parameters (temperature, max tokens)
  • Switch models to compare outputs
  • Test different approaches quickly
Helicone playground interface for testing prompts
Currently, only OpenAI models are supported in the Playground
4. Track Sessions for Context

Debug issues in multi-turn conversations by viewing complete sessions:
const sessionId = `session-${userId}-${Date.now()}`;

// First request in conversation
await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  },
  {
    headers: {
      "Helicone-Session-Id": sessionId,
      "Helicone-Session-Name": "Customer Chat",
      "Helicone-Session-Path": "/greeting",
    },
  }
);

// Follow-up request (same session)
await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: conversationHistory,
  },
  {
    headers: {
      "Helicone-Session-Id": sessionId,
      "Helicone-Session-Path": "/follow-up",
    },
  }
);
Sessions help you:
  • See the full conversation context
  • Identify where context was lost
  • Track how costs accumulate
  • Understand user interaction patterns

Debugging Specific Issues

API Errors & Rate Limits

When you see 429 or 500 errors:
async function makeRequestWithRetry(
  client: OpenAI,
  params: any,
  maxRetries = 3
) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await client.chat.completions.create(
        params,
        {
          headers: {
            "Helicone-Property-Retry-Attempt": String(i),
          },
        }
      );
    } catch (error: any) {
      if (error?.status === 429 && i < maxRetries - 1) {
        // Exponential backoff
        const delay = Math.pow(2, i) * 1000;
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }
      throw error;
    }
  }
}

Quality Issues

When responses are wrong or inconsistent:
Filter requests by custom properties to identify patterns:
headers: {
  "Helicone-Property-Query-Type": "technical-support",
  "Helicone-Property-User-Type": "premium",
}
Then filter in the dashboard to see:
  • Do technical queries fail more often?
  • Are premium users having different issues?
  • Which features have the most quality problems?
Tag requests with prompt versions to compare quality:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    extra_headers={
        "Helicone-Property-Prompt-Version": "v2.1",
        "Helicone-Property-System-Prompt": "technical-assistant"
    }
)
This helps you:
  • A/B test prompt changes
  • Track quality regressions
  • Identify which version works best
Add quality scores to track improvements:
// After getting user feedback
await fetch(`https://api.helicone.ai/v1/request/${requestId}/score`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${HELICONE_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    scores: {
      "user-satisfaction": 5,
      "accuracy": 0.9,
      "helpfulness": 4,
    },
  }),
});

Performance Problems

When responses are slow:
Check the request details for timing breakdown:
  • Queue time - How long before processing started
  • Processing time - Model inference time
  • Network time - Transfer latency
// Add timing metadata
const startTime = Date.now();

const response = await client.chat.completions.create(
  params,
  {
    headers: {
      "Helicone-Property-Client-Start-Time": String(startTime),
    },
  }
);

const endTime = Date.now();
console.log(`Total latency: ${endTime - startTime}ms`);

Cost Overruns

When costs are higher than expected:
// Add cost tracking properties
headers: {
  "Helicone-Property-Feature": "document-analysis",
  "Helicone-Property-Document-Length": String(docLength),
  "Helicone-Session-Id": sessionId,
}
Then analyze in the dashboard:
  1. Filter by feature to find expensive operations
  2. Check session costs to see complete workflows
  3. Review token usage to identify inefficient prompts
  4. Compare model costs to find cheaper alternatives
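If you export your request data (for example via Helicone's API or a CSV export), the same analysis can be scripted. A sketch; the record shape below is an assumption for illustration, not Helicone's actual export schema:

```typescript
// Sketch: group exported request records by a custom property and total their cost.
interface RequestRecord {
  feature: string; // value of Helicone-Property-Feature
  cost: number; // USD
}

function costByFeature(records: RequestRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    totals.set(r.feature, (totals.get(r.feature) ?? 0) + r.cost);
  }
  return totals;
}
```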
See the Cost Tracking guide for detailed optimization strategies.

Advanced Debugging Techniques

Custom Request IDs

Use predictable IDs to correlate with your own logs:
const requestId = `${userId}-${feature}-${timestamp}`;

headers: {
  "Helicone-Request-Id": requestId,
}
Then search for this ID in both Helicone and your application logs.
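One way to make that search reliable is to emit the same request ID in structured application logs. A hypothetical helper (the log shape is an assumption, not a Helicone requirement):

```typescript
// Sketch: emit a structured log line carrying the same request ID you send
// in the Helicone-Request-Id header, so you can grep for it in both places.
function logWithRequestId(
  requestId: string,
  event: string,
  data: Record<string, unknown> = {}
): string {
  return JSON.stringify({
    requestId,
    event,
    ...data,
    ts: new Date().toISOString(),
  });
}
```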

Property-Based Filtering

Tag requests with rich metadata for powerful filtering:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    extra_headers={
        "Helicone-Property-Environment": os.getenv("ENV"),
        "Helicone-Property-User-Tier": user.tier,
        "Helicone-Property-Feature": "search",
        "Helicone-Property-Version": "v2.3",
        "Helicone-Property-AB-Test": "prompt-variant-B",
    }
)
Filter combinations like:
  • “Show me production errors for premium users”
  • “Compare v2.3 vs v2.2 response times”
  • “Which A/B test variant has better quality?”

Session Replay

Replay entire sessions to reproduce issues:
  1. Find the problematic session in the dashboard
  2. Click “Replay Session”
  3. View the exact sequence of requests
  4. Test fixes against the same inputs
Session replay is especially useful for debugging multi-turn conversations where context matters.

Debugging Checklist

When investigating an issue:
  • Check status codes for obvious errors
  • Review request/response in detail
  • Test fixes in Playground
  • Look at session context if multi-turn
  • Filter by custom properties to find patterns
  • Compare with working requests
  • Check timing breakdown for performance
  • Review token usage for cost issues
  • Add more logging for future debugging

Proactive Debugging

Prevent issues before they happen:

Set Up Alerts

// Configure in Helicone dashboard:
// 1. Error rate > 5%
// 2. Average latency > 2 seconds
// 3. Daily cost > $100
// 4. Any 500 errors

Add Comprehensive Logging

function makeTrackedRequest(feature: string, userId: string, params: any) {
  return client.chat.completions.create(
    params,
    {
      headers: {
        "Helicone-Session-Id": `${userId}-${Date.now()}`,
        "Helicone-Property-Feature": feature,
        "Helicone-Property-Environment": process.env.NODE_ENV,
        "Helicone-Property-Version": APP_VERSION,
        "Helicone-User-Id": userId,
      },
    }
  );
}

Monitor Key Metrics

Track these metrics weekly:
  • Error rate - Should stay below 2%
  • P95 latency - Should be under 3 seconds
  • Average cost per session - Watch for increases
  • Cache hit rate - Should be above 50% for cacheable content
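The first two metrics are straightforward to compute from logged requests. A sketch, assuming each record carries a status code and latency (the record shape is an assumption for illustration):

```typescript
// Sketch: weekly health metrics from request logs.
interface LoggedRequest {
  status: number;
  latencyMs: number;
}

// Fraction of requests with an error status (4xx/5xx).
function errorRate(reqs: LoggedRequest[]): number {
  if (reqs.length === 0) return 0;
  return reqs.filter((r) => r.status >= 400).length / reqs.length;
}

// 95th-percentile latency via the nearest-rank method.
function p95Latency(reqs: LoggedRequest[]): number {
  if (reqs.length === 0) return 0;
  const sorted = reqs.map((r) => r.latencyMs).sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.ceil(0.95 * sorted.length) - 1);
  return sorted[idx];
}
```

Run these over a rolling window and compare against the thresholds above to catch regressions early.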

Debugging Tools Reference

Request Filters

Filter by status, model, properties, and more

Sessions

Track multi-turn conversations and workflows

Custom Properties

Add metadata for powerful filtering

Alerts

Get notified of issues immediately

Next Steps

Agent Tracing

Debug complex agent workflows with tool calls

Cost Tracking

Identify and optimize expensive operations

Experiments

A/B test fixes before deploying to production
