
What It Does

Instantly searches the web for real-time answers using the Perplexity AI API (Sonar Pro). Just ask anything — “What’s the weather in Tokyo?” or “Latest news on AI?” — and it returns a concise, spoken-friendly answer without symbols or jargon.

Suggested Trigger Words

  • “search”
  • “tell me about”
  • “what is”
  • “who is”
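As a minimal sketch, trigger matching could look like the following; the `TRIGGER_WORDS` tuple and `matches_trigger` helper are illustrative names, not part of the ability's actual code:

```python
TRIGGER_WORDS = ("search", "tell me about", "what is", "who is")

def matches_trigger(utterance: str) -> bool:
    """Return True if the utterance starts with any trigger phrase."""
    text = utterance.lower().strip()
    return any(text.startswith(trigger) for trigger in TRIGGER_WORDS)
```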

Key Features

Real-Time Search

Access current information from the web, not just LLM training data

Clean Answers

Automatically removes citations and symbols for natural speech

Fast Response

Optimized for quick answers with max_tokens=150

Sonar Pro Model

Uses Perplexity’s most advanced search model

How It Works

  1. User Query: The user asks a question using a trigger word.
  2. Immediate Acknowledgment: “Let me check that for you real quick”
  3. API Request: Sends the query to Perplexity Sonar Pro with search capabilities enabled.
  4. Response Cleaning: Strips citation markers [1], [2], etc. and extra whitespace.
  5. Spoken Answer: Delivers a clean, conversational answer via TTS.
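The five steps can be sketched as one pure function; `send_request` and `speak` are injected stand-ins for the real API call and TTS layer, so this illustrates the flow rather than the ability's actual code:

```python
import re

def answer_query(query, send_request, speak):
    # Step 2: acknowledge immediately so the user isn't left waiting.
    speak("Let me check that for you real quick")
    # Step 3: hand the query to the search backend.
    raw = send_request(query)
    # Step 4: strip citation markers like [1] and tidy whitespace.
    cleaned = re.sub(r"\[\d+\]", "", raw)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Step 5: deliver the cleaned answer via TTS.
    speak(f"Here's what I found: {cleaned}")
    return cleaned
```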

API Requirements

Perplexity AI API key required. Get one at perplexity.ai/settings/api

Setup Instructions

  1. Visit perplexity.ai/settings/api
  2. Create an API key
  3. Replace YOUR_API_KEY in main.py with your actual key:
api_key = "pplx-abc123..."  # Your real key here
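Hard-coding keys is fragile; as an optional alternative (the `PPLX_API_KEY` variable name and `load_api_key` helper are suggestions, not something the ability expects), you could read the key from the environment:

```python
import os

def load_api_key(env_var: str = "PPLX_API_KEY", fallback: str = "YOUR_API_KEY") -> str:
    """Prefer an environment variable over a hard-coded key."""
    return os.environ.get(env_var, fallback)

api_key = load_api_key()
```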

Code Walkthrough

API Configuration

Optimized for quick, spoken-friendly responses:
payload = {
    "model": "sonar-pro",  # sonar or sonar-pro both work
    "temperature": 0.2,
    "disable_search": False,  # keep live web search enabled
    "top_p": 0.9,
    "max_tokens": 150,
    "messages": [
        {
            "role": "system",
            "content": (
                """
                Give a short, clear answer in simple spoken language.
                Do not use symbols, citations, or abbreviations.
                """
            )
        },
        {
            "role": "user",
            "content": msg
        }
    ]
}
Model Choice: sonar-pro provides more accurate results. Use sonar for faster/cheaper queries.

API Request with Timing

Logs response time for debugging:
start_time = time.time()

# Send the request to Perplexity
response = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json=payload
)

end_time = time.time()
response_time = round(end_time - start_time, 3)

self.worker.editor_logging_handler.info(f"⏱️ API Response Time: {response_time} seconds")
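If you end up logging timings in several places, the start/end pattern can be factored into a small context manager; `timed` and its `log` parameter are illustrative additions, not part of the original code:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str, log=print):
    """Log how long the wrapped block took, even if it raises."""
    start = time.time()
    try:
        yield
    finally:
        log(f"{label}: {round(time.time() - start, 3)} seconds")
```

With this helper, the request above could be wrapped as `with timed("⏱️ API Response Time", self.worker.editor_logging_handler.info): response = requests.post(...)`.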

Response Extraction and Cleaning

Removes citations and cleans up formatting:
try:
    result = response.json()
    self.worker.editor_logging_handler.info(f"✅ Parsed JSON Response:\n{json.dumps(result, indent=2)}")
except Exception as e:
    self.worker.editor_logging_handler.info(f"❌ Failed to parse JSON: {e}")
    result = {}

# Extract the assistant's message (final summary)
search_result = result.get("choices", [{}])[0].get("message", {}).get("content", "Sorry, I couldn't find anything.")

# Remove citation markers like [1], [2], etc.
search_result = re.sub(r"\[\d+\]", "", search_result)
search_result = re.sub(r"\s+", " ", search_result).strip()  # collapse leftover whitespace
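A quick demonstration of the cleaning step on a sample string (the `raw` value is made up for illustration):

```python
import re

raw = "Canberra is the capital of Australia.[1][2]  It was chosen in 1908."
cleaned = re.sub(r"\[\d+\]", "", raw)           # drop citation markers
cleaned = re.sub(r"\s+", " ", cleaned).strip()  # collapse leftover whitespace
print(cleaned)  # Canberra is the capital of Australia. It was chosen in 1908.
```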

Delivery

# Speak the final summarized result
await self.capability_worker.speak("Here's what I found:")
await self.capability_worker.speak(search_result)

# Resume the normal workflow
self.capability_worker.resume_normal_flow()

Example Conversations

User: What's the capital of Australia?

AI: Let me check that for you real quick...

AI: Here's what I found: The capital of Australia is Canberra, not Sydney as many people assume.

Advanced Configuration

Adjusting Response Length

"max_tokens": 150,  # Shorter, punchier answers
"max_tokens": 300,  # More detailed explanations

Temperature Control

"temperature": 0.2,  # More factual, deterministic
"temperature": 0.7,  # More creative, varied responses

Model Selection

  • Best for: Accurate, detailed answers
  • Cost: Higher per query
  • Use when: The user needs reliable, in-depth information
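One way to act on this trade-off is to route queries between the two models; `pick_model` and its `detail_cues` list are hypothetical and should be tuned to your own traffic:

```python
def pick_model(query: str) -> str:
    """Use sonar-pro only when the query asks for depth; default to the cheaper sonar."""
    detail_cues = ("explain", "compare", "in depth", "detailed", "why")
    q = query.lower()
    return "sonar-pro" if any(cue in q for cue in detail_cues) else "sonar"
```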

Extending This Ability

Follow-up Questions

Add conversation loop to allow multi-turn research sessions

Source Citations

Parse and speak source URLs for fact-checking

Category Detection

Route different query types to different models or parameters

Search History

Store and recall previous searches across sessions
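A minimal in-memory version of the search-history idea (`SearchHistory` is a hypothetical helper; persisting across sessions would need a file or database instead):

```python
from collections import deque

class SearchHistory:
    """Keep the most recent query/answer pairs and recall them by keyword."""

    def __init__(self, max_items: int = 20):
        self._items = deque(maxlen=max_items)

    def add(self, query: str, answer: str) -> None:
        self._items.append((query, answer))

    def recall(self, keyword: str):
        kw = keyword.lower()
        return [(q, a) for q, a in self._items if kw in q.lower()]
```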

Why Use Perplexity Instead of LLM?

Perplexity searches the web for current information, while base LLMs only know information up to their training cutoff date. Example: “What’s the current price of Bitcoin?” needs live data.
Perplexity cites and verifies sources, reducing hallucinations for factual queries. Example: “When was the last SpaceX launch?” requires verified data.
It is perfect for queries about recent news, sports scores, stock prices, or weather. Example: “Who won the game last night?” needs current information.
Perplexity is ideal for factual, time-sensitive queries. For creative tasks or advice, the base LLM is often better.

Troubleshooting

  • Invalid API key: Your API key is invalid or expired. Check that you’ve correctly replaced YOUR_API_KEY in the code.
  • Quota exceeded: You’ve exceeded your API quota. Check your usage at perplexity.ai/settings/api.
  • Answers cut off: Try increasing max_tokens from 150 to 250 for longer responses.
  • Citations read aloud: The cleaning regex should remove [1], [2] markers. Check that this line is present:
search_result = re.sub(r"\[\d+\]", "", search_result)
