Trigger testing validates that skills activate correctly in response to user prompts. Skill Lab defines 4 trigger types based on OpenAI’s methodology for testing tool selection behavior.

Overview

From core/models.py:27-33 (with the Enum import added so the excerpt runs standalone):
from enum import Enum

class TriggerType(str, Enum):
    """Types of trigger tests based on OpenAI methodology."""
    
    EXPLICIT = "explicit"      # Skill named directly with $ prefix
    IMPLICIT = "implicit"      # Describes exact scenario without naming skill
    CONTEXTUAL = "contextual"  # Realistic noisy prompt with domain context
    NEGATIVE = "negative"      # Should NOT trigger (catches false positives)

These trigger types test different levels of prompt clarity and help ensure skills activate reliably across realistic usage patterns.
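
Because TriggerType subclasses str, its members compare equal to the plain strings used in test files, which is convenient when parsing YAML. A quick sketch (the import path is an assumption based on the core/models.py layout shown above):
from core.models import TriggerType  # assumed import path

trigger = TriggerType("negative")  # parse the raw string from YAML
assert trigger is TriggerType.NEGATIVE
assert trigger == "negative"       # str subclass: equal to the plain string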

1. Explicit Triggers

Purpose: Verify that directly naming the skill activates it. Explicit triggers use the $skill-name syntax to request a specific skill by name. This is the clearest form of activation and should always work.

Example Test Case

- id: explicit-1
  name: "Explicit trigger with $ prefix"
  skill_name: creating-reports
  trigger_type: explicit
  prompt: "$creating-reports for the sales data from last quarter"
  expected:
    skill_triggered: true

Characteristics

  • Clarity: Highest - skill name is explicit in the prompt
  • Real-world usage: Common when users know which skill they want
  • Expected behavior: Should always trigger the named skill

The $ prefix is a convention used by some agent runtimes (like Claude CLI) to explicitly invoke skills.
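
As a rough illustration of that convention, here is how a runtime might detect an explicit invocation. The pattern and helper are hypothetical, not part of Skill Lab's API:
import re

# Hypothetical detector for the $skill-name convention described above.
# Assumes skill names are lowercase words separated by hyphens, as in the
# examples on this page.
EXPLICIT_PATTERN = re.compile(r"\$([a-z][a-z0-9]*(?:-[a-z0-9]+)*)")

def extract_explicit_skill(prompt: str) -> str | None:
    """Return the explicitly named skill, or None if no $ prefix is present."""
    match = EXPLICIT_PATTERN.search(prompt)
    return match.group(1) if match else None

assert extract_explicit_skill("$creating-reports for the sales data") == "creating-reports"
assert extract_explicit_skill("Summarize this file for me") is None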

2. Implicit Triggers

Purpose: Test that the skill activates when the scenario matches perfectly without naming it. Implicit triggers describe exactly what the skill does without mentioning the skill name. The agent must match the prompt to the skill based on the description alone.

Example Test Case

- id: implicit-1
  name: "Implicit trigger describing exact scenario"
  skill_name: creating-reports
  trigger_type: implicit
  prompt: "I need to generate a comprehensive report from my sales data"
  expected:
    skill_triggered: true

Characteristics

  • Clarity: High - clear intent but no skill name
  • Real-world usage: Very common - users describe what they want
  • Expected behavior: Should trigger if description matches well

Best Practices for Implicit Tests

  1. Use phrasing that closely mirrors the skill’s description
  2. Include key terms and concepts from the skill documentation
  3. Avoid ambiguity - make the intent clear
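
A rough way to sanity-check practices 1 and 2 is a keyword-overlap score between a candidate prompt and the skill's description. This heuristic is illustrative only, not part of Skill Lab, and a real check would also stem words:
# Illustrative heuristic: fraction of the description's content words
# that also appear in the candidate implicit prompt.
STOPWORDS = {"a", "an", "the", "to", "from", "for", "my", "i", "need", "of"}

def keyword_overlap(prompt: str, description: str) -> float:
    def words(text: str) -> set[str]:
        return {w.strip(".,!?").lower() for w in text.split()} - STOPWORDS

    desc_words = words(description)
    if not desc_words:
        return 0.0
    return len(words(prompt) & desc_words) / len(desc_words)

# A prompt that mirrors the description scores high; an unrelated one scores low.
description = "Generate comprehensive reports from sales data"
print(keyword_overlap("I need to generate a comprehensive report from my sales data", description))  # 0.8
print(keyword_overlap("What's the weather like today?", description))  # 0.0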

3. Contextual Triggers

Purpose: Validate activation in realistic, noisy prompts with extra context. Contextual triggers simulate real-world scenarios where the user provides additional context, background information, or constraints alongside the core request. These test the skill’s robustness to noise.

Example Test Case

- id: contextual-1
  name: "Contextual trigger with domain context"
  skill_name: creating-reports
  trigger_type: contextual
  prompt: |
    I'm preparing for the Q4 board meeting next week. We have sales data
    from three regions (APAC, EMEA, Americas) and I need to present the
    results. Can you help me create a detailed report showing trends,
    comparisons, and key insights? The data is in CSV format.
  expected:
    skill_triggered: true

Characteristics

  • Clarity: Medium - intent is clear but buried in context
  • Real-world usage: Very common - users provide background and constraints
  • Expected behavior: Should still trigger despite extra context

Best Practices for Contextual Tests

  1. Include realistic domain-specific details
  2. Add constraints, preferences, or requirements
  3. Mix the core request with background information
  4. Test with different levels of “noise” to find activation thresholds

Contextual triggers are the most realistic but also the hardest to get right. If your skill fails contextual tests but passes implicit tests, consider improving the description or adding more examples.
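
To probe activation thresholds (practice 4), it helps to run the same core request at increasing noise levels. A minimal sketch with hypothetical filler sentences:
# Illustrative sketch: wrap one core request in increasing amounts of
# realistic filler to produce contextual prompts at several noise levels.
CORE_REQUEST = "Can you help me create a detailed report from this sales data?"

FILLER = [
    "I'm preparing for the Q4 board meeting next week.",
    "We have sales data from three regions (APAC, EMEA, Americas).",
    "The data is in CSV format and was exported from our CRM.",
    "Leadership wants trends, comparisons, and key insights highlighted.",
]

def contextual_prompts(core: str, filler: list[str]) -> list[str]:
    """Return prompts with 0..len(filler) sentences of surrounding context."""
    return [" ".join(filler[:n] + [core]) for n in range(len(filler) + 1)]

for level, prompt in enumerate(contextual_prompts(CORE_REQUEST, FILLER)):
    print(f"noise level {level}: {prompt}")
Each generated prompt can become its own contextual test case, making it visible at which noise level activation starts to fail.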

4. Negative Triggers

Purpose: Ensure the skill does NOT activate for unrelated tasks (catch false positives). Negative triggers test that the skill correctly stays inactive when the user’s request doesn’t match. This prevents over-eager activation and ensures skill specificity.

Example Test Case

- id: negative-1
  name: "Should not trigger for data analysis request"
  skill_name: creating-reports
  trigger_type: negative
  prompt: "Can you help me analyze the statistical significance of these results?"
  expected:
    skill_triggered: false

Characteristics

  • Clarity: Varies - prompt is clear but for a different task
  • Real-world usage: Common - users have diverse needs
  • Expected behavior: Skill should remain inactive

Best Practices for Negative Tests

  1. Test with tasks that are similar but distinct from the skill’s purpose
  2. Include prompts that might share keywords but have different intent
  3. Cover edge cases where activation would be incorrect
  4. Test with tasks that other skills should handle instead

Negative tests are crucial for preventing false positives in multi-skill environments where multiple skills might seem relevant.

Test Case Structure

Trigger tests are defined in .skill-lab/tests/triggers.yaml:
tests:
  - id: unique-test-id
    name: Human-readable test name
    skill_name: skill-directory-name
    trigger_type: explicit|implicit|contextual|negative
    prompt: |
      The user prompt to test
    expected:
      skill_triggered: true|false
      # Optional assertions:
      exit_code: 0
      commands_include:
        - "python scripts/generate.py"
      files_created:
        - "output/report.pdf"
      no_loops: true
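
As a sanity check on that structure, a short sketch that loads and validates a triggers.yaml with PyYAML. The validation here is illustrative; Skill Lab's own parsing lives in core/models.py:
import yaml  # pip install pyyaml

VALID_TYPES = {"explicit", "implicit", "contextual", "negative"}

def load_trigger_tests(path: str = ".skill-lab/tests/triggers.yaml") -> list[dict]:
    """Load trigger tests and check the required fields shown above."""
    with open(path) as f:
        data = yaml.safe_load(f)

    tests = data["tests"]
    for test in tests:
        assert test["trigger_type"] in VALID_TYPES, f"bad trigger_type in {test['id']}"
        assert isinstance(test["expected"]["skill_triggered"], bool), test["id"]
    return tests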

Expectation Fields

From core/models.py:156-163:
| Field | Type | Description |
| --- | --- | --- |
| skill_triggered | boolean | Whether the skill should activate |
| exit_code | int | Expected process exit code (optional) |
| commands_include | list[string] | Commands that should appear in trace (optional) |
| files_created | list[string] | Files that should be created (optional) |
| no_loops | boolean | Verify no retry loops occurred (optional) |
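
To make the optional assertions concrete, here is a hypothetical evaluator for these fields. The field names match the table above; the class and function are a sketch, not the actual models from core/models.py:
from dataclasses import dataclass, field

@dataclass
class Expected:
    skill_triggered: bool
    exit_code: int | None = None
    commands_include: list[str] = field(default_factory=list)
    files_created: list[str] = field(default_factory=list)
    no_loops: bool | None = None

def failures(exp: Expected, triggered: bool, exit_code: int,
             commands: list[str], files: list[str], looped: bool) -> list[str]:
    """Return one message per unmet expectation; an empty list means pass."""
    msgs = []
    if triggered != exp.skill_triggered:
        msgs.append(f"skill_triggered: expected {exp.skill_triggered}, got {triggered}")
    if exp.exit_code is not None and exit_code != exp.exit_code:
        msgs.append(f"exit_code: expected {exp.exit_code}, got {exit_code}")
    msgs += [f"command missing from trace: {c}"
             for c in exp.commands_include if not any(c in cmd for cmd in commands)]
    msgs += [f"file not created: {p}" for p in exp.files_created if p not in files]
    if exp.no_loops and looped:
        msgs.append("retry loop detected")
    return msgs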

Running Trigger Tests

To run trigger tests for a skill:
# Run all trigger tests
sklab trigger ./my-skill

# Run with a specific runtime
sklab trigger ./my-skill --runtime claude

Trigger testing requires the Claude CLI to be installed and configured. See the Trigger Testing guide for setup instructions.
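
If you want to script a run (for example in CI), a minimal subprocess sketch. It assumes sklab exits nonzero when any test fails; verify that behavior against your installed version:
import subprocess

# Assumption: sklab returns a nonzero exit code when any trigger test fails.
result = subprocess.run(
    ["sklab", "trigger", "./my-skill", "--runtime", "claude"],
    capture_output=True, text=True,
)
print(result.stdout)
if result.returncode != 0:
    raise SystemExit(f"trigger tests failed (exit code {result.returncode})")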

Test Generation

Skill Lab can automatically generate trigger test cases using an LLM:
# Generate tests for all 4 trigger types
sklab generate ./my-skill

# Specify the model to use
sklab generate ./my-skill --model claude-sonnet-4-5-20250929

# Generate only specific trigger types
sklab generate ./my-skill --types explicit,implicit

The generator analyzes the skill's description and body to create realistic test cases for each trigger type.
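
For intuition, here is the general shape of that idea as a standalone sketch using the Anthropic Python SDK. This is not how sklab generate is implemented; the prompt text and skill description are hypothetical:
import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY from the environment

skill_description = "Generate comprehensive reports from sales data"  # hypothetical

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-5-20250929",  # model name from the example above
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Write one implicit and one negative trigger-test prompt "
                   f"for a skill described as: {skill_description}",
    }],
)
print(message.content[0].text)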

Trigger Test Reports

After running trigger tests, you’ll get a summary by type:
Trigger Test Results:
  explicit:    2/2 passed (100%)
  implicit:    3/3 passed (100%)
  contextual:  2/3 passed (67%)
  negative:    4/4 passed (100%)

Overall: 11/12 tests passed (91.7%)

From core/models.py:237-269, the TriggerReport includes:
  • Overall pass rate
  • Breakdown by trigger type
  • Individual test results with trace paths
  • Detailed failure messages
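
The per-type breakdown is straightforward to reproduce from individual results. An illustrative aggregation (not the actual TriggerReport code), using the numbers from the sample output above:
from collections import Counter

# (trigger_type, passed) pairs matching the sample summary above.
results = (
    [("explicit", True)] * 2
    + [("implicit", True)] * 3
    + [("contextual", True), ("contextual", False), ("contextual", True)]
    + [("negative", True)] * 4
)

total = Counter(t for t, _ in results)
passed = Counter(t for t, ok in results if ok)

for trigger_type in ("explicit", "implicit", "contextual", "negative"):
    p, n = passed[trigger_type], total[trigger_type]
    print(f"{trigger_type}: {p}/{n} passed ({p / n:.0%})")
print(f"Overall: {sum(passed.values())}/{sum(total.values())} passed")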

Best Practices

Coverage Guidelines

A well-tested skill should include:
| Trigger Type | Recommended Count | Why |
| --- | --- | --- |
| Explicit | 1-2 | Verify basic name-based activation |
| Implicit | 2-3 | Test core scenario matching |
| Contextual | 2-4 | Validate robustness to real-world noise |
| Negative | 2-4 | Prevent false positives |
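
These minimums are easy to enforce mechanically. A hypothetical coverage check (the thresholds come from the table above; the function is not part of Skill Lab):
RECOMMENDED_MIN = {"explicit": 1, "implicit": 2, "contextual": 2, "negative": 2}

def coverage_gaps(tests: list[dict]) -> list[str]:
    """Return a warning for each trigger type below its recommended minimum."""
    counts = {t: 0 for t in RECOMMENDED_MIN}
    for test in tests:
        counts[test["trigger_type"]] = counts.get(test["trigger_type"], 0) + 1
    return [f"{t}: have {counts[t]}, want at least {minimum}"
            for t, minimum in RECOMMENDED_MIN.items() if counts[t] < minimum]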

Writing Effective Tests

  1. Start with explicit - Ensure basic activation works
  2. Add implicit tests - Cover the main use cases from your description
  3. Include contextual tests - Simulate realistic user prompts
  4. Don’t skip negative tests - Prevent false positives early

Common Pitfalls

Over-specific contextual tests: Don’t make contextual tests so specific that they’re essentially implicit tests with extra words. Add genuine domain context and realistic noise.
Weak negative tests: Don’t test with completely unrelated prompts (e.g., “What’s the weather?” for a report generation skill). Test with prompts that are similar enough to be interesting.
