Aurora’s chat provides a natural-language interface to your cloud infrastructure, with real-time streaming responses, intelligent context management, and multi-session support.
Architecture Overview
WebSocket Connection
The chat uses a persistent WebSocket connection for bidirectional real-time communication:
Client: Next.js frontend with the useWebSocket hook
Server: Python WebSocket server (main_chatbot.py) running on port 5006
Protocol: JSON messages with types init, message, status, tool_call, usage_info, and control
Client WebSocket Hook
Server WebSocket Handler
// client/src/hooks/useWebSocket.ts
const useWebSocket = ({ url, userId, onMessage }) => {
  const wsRef = useRef<WebSocket | null>(null);
  const [isConnected, setIsConnected] = useState(false);

  useEffect(() => {
    const ws = new WebSocket(url);

    ws.onopen = () => {
      setIsConnected(true);
      // Send init message to register user
      ws.send(JSON.stringify({
        type: 'init',
        user_id: userId
      }));
    };

    ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      onMessage(data);
    };

    wsRef.current = ws;
    return () => ws.close();
  }, [url, userId]);

  return { wsRef, isConnected, send: (data) => wsRef.current?.send(data) };
};
Message Flow
1. User sends message: the frontend sends JSON with query, session_id, user_id, mode, model, and provider_preference.
2. Agent processes the request: the LangGraph workflow executes with streaming enabled via the astream_events API.
3. Tokens stream back: the server sends a token event for each chunk of the AI response (real-time streaming).
4. Tool calls execute: when the AI needs to run a tool, the server sends tool_call events with status updates.
5. Final status: the server sends an END status when the workflow completes.
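The sequence above can be sketched as a minimal server-side dispatch routine. Everything here is illustrative: `run_workflow` stands in for the LangGraph workflow, and the real handler in main_chatbot.py also manages sessions, cancellation, and error reporting.

```python
import json

# Illustrative dispatch for one chat message. run_workflow() is a stand-in
# for the LangGraph workflow's event stream, not Aurora's actual code.
async def handle_chat_message(msg: dict, send) -> None:
    """Relay workflow events for a single 'message' frame."""
    assert msg["type"] == "message"

    async def run_workflow(query: str):
        # Fake event stream standing in for the real workflow
        yield ("token", "Listing ")
        yield ("token", "pods...")
        yield ("tool_call", {"tool": "kubectl_tool", "status": "running"})

    # Steps 3-4: stream token and tool_call events as they are produced
    async for kind, payload in run_workflow(msg["query"]):
        await send(json.dumps(
            {"type": kind, "data": payload, "session_id": msg["session_id"]}
        ))

    # Step 5: signal completion with an END status
    await send(json.dumps({
        "type": "status",
        "data": {"status": "END"},
        "isComplete": True,
        "session_id": msg["session_id"],
    }))
```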
Chat Modes
Aurora supports two chat modes controlled by the mode selector:
Ask Mode
Read-only mode for safe exploration:
Cannot execute write operations
No infrastructure changes
Safe for production queries
Tools blocked: cloud_tool (write), kubectl_tool (apply/delete), github_commit, iac_tool (apply)
Agent Mode
Full execution mode for automation:
Can deploy infrastructure
Execute kubectl commands
Commit to GitHub
Modify cloud resources
Requires confirmation for high-risk actions
# server/chat/backend/agent/access.py
class ModeAccessController:
    @staticmethod
    def is_tool_allowed(mode: str, tool_name: str) -> bool:
        """Check if a tool is allowed in the current mode."""
        if mode == "ask":
            # Block write operations in ask mode
            write_tools = [
                "cloud_tool", "kubectl_tool", "github_commit",
                "iac_tool", "github_apply_fix"
            ]
            return tool_name not in write_tools
        return True  # Agent mode allows all tools
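In practice the gate is applied to the tool set before binding it to the LLM. The sketch below reproduces the class and filters a hypothetical registry (ALL_TOOLS is illustrative, not Aurora's actual tool list):

```python
# Reproduction of the mode gate above, plus the filtering step it enables.
class ModeAccessController:
    @staticmethod
    def is_tool_allowed(mode: str, tool_name: str) -> bool:
        """Check if a tool is allowed in the current mode."""
        if mode == "ask":
            # Block write operations in ask mode
            write_tools = [
                "cloud_tool", "kubectl_tool", "github_commit",
                "iac_tool", "github_apply_fix",
            ]
            return tool_name not in write_tools
        return True  # Agent mode allows all tools

# Hypothetical tool registry, filtered before binding tools to the LLM
ALL_TOOLS = ["search_tool", "cloud_tool", "kubectl_tool", "github_commit"]
ask_tools = [t for t in ALL_TOOLS if ModeAccessController.is_tool_allowed("ask", t)]
# Ask mode keeps only the read-only tool: ["search_tool"]
```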
Real-Time Streaming
Token-Level Streaming
The chat uses LangGraph’s astream_events API for token-level streaming:
# server/chat/backend/agent/workflow.py
async def stream(self, input_state: State):
    """Stream the workflow with token-level streaming."""
    async for event in self.app.astream_events(input_state, self.config, version="v2"):
        event_type = event.get("event")
        if event_type == "on_chat_model_stream":
            # Stream individual tokens from the LLM
            chunk_data = event.get("data", {})
            chunk_obj = chunk_data.get("chunk")
            if chunk_obj and hasattr(chunk_obj, 'content') and chunk_obj.content:
                content = _extract_text_from_content(chunk_obj.content)
                if content:
                    yield ("token", content)
Gemini thinking models return content as a list of blocks with types thinking and text. Aurora extracts both for a complete thought stream.
Message Consolidation
During streaming, AIMessageChunk objects are accumulated and consolidated into complete AIMessage objects:
def _consolidate_message_chunks(self, messages):
    """Consolidate AIMessageChunk objects into proper AIMessage objects."""
    chunk_builders = {}
    consolidated_messages = []
    for msg in messages:
        if isinstance(msg, AIMessageChunk):
            builder = chunk_builders.get(msg.id)
            if builder is None:
                builder = {
                    "content": "",
                    "tool_calls": [],
                    "id": msg.id
                }
                chunk_builders[msg.id] = builder
            # Accumulate content
            builder["content"] += msg.content
            # Accumulate tool calls
            if hasattr(msg, 'tool_calls') and msg.tool_calls:
                builder["tool_calls"].extend(msg.tool_calls)
            # Finalize when the model reports a finish reason
            if msg.response_metadata.get("finish_reason") in ("tool_calls", "stop"):
                cleaned_tool_calls = self._clean_tool_calls(builder["tool_calls"])
                ai_msg = AIMessage(
                    content=builder["content"],
                    tool_calls=cleaned_tool_calls,
                    id=builder["id"]
                )
                consolidated_messages.append(ai_msg)
                del chunk_builders[msg.id]
    return consolidated_messages
Session Management
Creating a New Session
Chat sessions are created lazily on first message:
// client/src/app/chat/components/useChatSendHandlers.ts
const handleSend = async (text: string, websocket: WebSocket) => {
  let effectiveSessionId = currentSessionId;

  if (!effectiveSessionId) {
    // Create new session
    const newSession = await createSession({
      title: text.substring(0, 50) + (text.length > 50 ? '...' : ''),
      userId: userId
    });
    effectiveSessionId = newSession.id;
    setCurrentSessionId(newSession.id);
    justCreatedSessionRef.current = newSession.id;
  }

  // Send message via WebSocket
  websocket.send(JSON.stringify({
    type: 'message',
    query: text,
    session_id: effectiveSessionId,
    user_id: userId,
    mode: selectedMode,
    model: selectedModel,
    provider_preference: selectedProviders
  }));
};
Context Loading
When resuming a session, Aurora loads the full context:
# server/chat/backend/agent/workflow.py
if input_state.session_id and input_state.user_id:
existing_context = LLMContextManager.load_context_history(
input_state.session_id,
input_state.user_id
)
if existing_context:
# Combine existing context with new messages
combined_messages = []
combined_messages.extend(existing_context)
combined_messages.extend(input_state.messages)
input_state.messages = combined_messages
Two-Tier Context Storage:
LLM Context (llm_context_history): Complete conversation for AI (includes tool messages)
UI Context (chat_sessions.messages): Formatted messages for display (excludes tool details)
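The split can be illustrated with plain dicts. This is a shape-only sketch: real messages are LangChain objects, and persistence goes through LLMContextManager and chat_sessions.

```python
# Derive both storage tiers from one turn of messages (illustrative only).
def split_context(messages):
    llm_tier = list(messages)  # complete history, tool traffic included
    ui_tier = [
        m for m in messages
        if m["role"] in ("user", "assistant") and not m.get("tool_calls")
    ]
    return llm_tier, ui_tier

turn = [
    {"role": "user", "content": "List pods"},
    {"role": "assistant", "content": "", "tool_calls": [{"name": "kubectl_tool"}]},
    {"role": "tool", "content": "pod-a Running"},
    {"role": "assistant", "content": "One pod: pod-a (Running)."},
]
llm_ctx, ui_ctx = split_context(turn)  # 4 messages vs. 2 messages
```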
Context Compression
For long conversations, Aurora uses LangChain’s context compression:
# server/chat/backend/agent/utils/chat_context_manager.py
compressed_messages, was_compressed = ChatContextManager.compress_context_if_needed(
    session_id, user_id, messages_full, model
)
if was_compressed:
    # Use the compressed version for the LLM context
    messages_for_context = compressed_messages
    logger.info(f"Context compression: {len(compressed_messages)} messages preserved")
User Workflows
Starting a New Chat
Navigate to /chat or click “New Chat” in sidebar
The UI shows the empty state with prompt suggestions
Type a question or click a suggested prompt
Session is created on first message send
Empty chat state with dynamic prompt suggestions based on connected providers
Text Input
Rich text editor with markdown support
Multi-line input with Shift+Enter; Enter to send
Character limit: 2000 characters per message
File Attachments
Upload images (PNG, JPG, GIF, WebP)
Upload documents (PDF)
Upload code archives (ZIP, extracted server-side)
Max file size: 10MB per file
Max attachments: 5 per message
Mode & Model Selection
Mode: Toggle between Ask (read-only) and Agent (execution)
Model: Select from Claude, GPT-4, Gemini, or local models
Providers: Multi-select cloud providers to scope commands
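The attachment limits above could be enforced with a small validation helper. The function and field names below are hypothetical; only the constraints themselves come from the documentation:

```python
# Hypothetical server-side attachment validation using Aurora's stated limits.
MAX_FILE_BYTES = 10 * 1024 * 1024   # 10MB per file
MAX_ATTACHMENTS = 5                 # per message
ALLOWED_TYPES = {"png", "jpg", "gif", "webp", "pdf", "zip"}

def validate_attachments(attachments):
    """Return (ok, reason) for a list of {'filename', 'file_type', 'size'} dicts."""
    if len(attachments) > MAX_ATTACHMENTS:
        return False, "too many attachments"
    for a in attachments:
        if a["file_type"].lower() not in ALLOWED_TYPES:
            return False, f"unsupported type: {a['file_type']}"
        if a["size"] > MAX_FILE_BYTES:
            return False, f"file too large: {a['filename']}"
    return True, "ok"
```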
Message Types in UI
User Message
Assistant Message
Tool Call Message
{
  id: 1,
  role: 'user',
  content: 'List all pods in production namespace',
  timestamp: '2026-03-03T10:30:00Z'
}
Cancelling In-Progress Messages
Click the Stop button while a message is generating:
// client/src/hooks/useChatCancellation.ts
const cancelCurrentMessage = async () => {
  if (!userId || !sessionId || !webSocket.isConnected) return;

  // Send cancel control message
  webSocket.send(JSON.stringify({
    type: 'control',
    action: 'cancel',
    session_id: sessionId,
    user_id: userId
  }));
  // Backend cancels the Celery task and consolidates partial messages
};
The server:
Cancels the running workflow task
Waits for ongoing tool calls to complete
Consolidates partial message chunks
Saves the partial conversation state
Adds a cancellation notice to the context
Cancelling does not undo already-executed commands. It only stops further processing.
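The five server-side steps can be sketched as a cancel handler. All names here are hypothetical, and the real backend revokes a Celery task rather than an asyncio task:

```python
import asyncio

# Hedged sketch of the server-side cancel sequence; the real implementation
# reuses the chunk-consolidation logic shown earlier and persists via the
# context manager, not a bare save_context callback.
async def handle_cancel(session_state: dict, save_context) -> None:
    # 1. Cancel the running workflow task
    session_state["task"].cancel()
    try:
        await session_state["task"]
    except asyncio.CancelledError:
        pass
    # 2. Wait for any in-flight tool calls to complete
    if session_state["pending_tools"]:
        await asyncio.gather(*session_state["pending_tools"])
    # 3. Consolidate partial message chunks into one partial reply
    partial = "".join(session_state["chunks"])
    # 4-5. Save the partial state plus a cancellation notice
    save_context(partial)
    save_context("[Message generation was cancelled by the user]")
```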
WebSocket Message Types
Client → Server
| Field | Description |
| --- | --- |
| type | Message type: init, message, control, confirmation_response |
| query | The user's question (for message type) |
| model | LLM model name: claude-3-5-sonnet-20241022, gpt-4o, gemini-2.0-flash-thinking-exp-01-21, etc. |
| provider_preference | Cloud providers to scope commands: ["gcp", "aws", "azure"] |
| attachments | File attachments with filename, file_type, file_data (base64), or server_path for server-side files |
Server → Client
| Field | Description |
| --- | --- |
| type | Message type: status, message, token, tool_call, usage_info |
| data | Message payload (varies by type) |
Status Message
Token Stream
Tool Call
{
  "type": "status",
  "data": {
    "status": "END"
  },
  "isComplete": true,
  "session_id": "550e8400-e29b-41d4-a716-446655440000"
}
Rate Limiting
WebSocket connections are rate-limited per client:
Limit: 5 messages per 60 seconds
Scope: Per WebSocket connection ID
Exceeded: Returns {"type": "error", "data": {"text": "Rate limit exceeded"}}
import time
from collections import defaultdict

class RateLimiter:
    """Token-bucket limiter: `rate` messages per `per` seconds, per client."""
    def __init__(self, rate, per):
        self.rate = rate
        self.per = per
        self.tokens = defaultdict(lambda: rate)     # each client starts with a full bucket
        self.last_checked = defaultdict(time.time)  # last refill time per client

    def is_allowed(self, client_id):
        now = time.time()
        elapsed = now - self.last_checked[client_id]
        self.last_checked[client_id] = now
        # Refill proportionally to elapsed time, capped at the bucket size
        self.tokens[client_id] = min(
            self.rate, self.tokens[client_id] + elapsed * (self.rate / self.per)
        )
        if self.tokens[client_id] >= 1:
            self.tokens[client_id] -= 1
            return True
        return False

rate_limiter = RateLimiter(rate=5, per=60)
API Cost Tracking
Aurora tracks LLM API usage per user:
# After workflow completion
if user_id:
    asyncio.create_task(update_api_cost_cache_async(user_id))
    is_cached, total_cost = get_cached_api_cost(user_id)
    # Send usage info to client
    usage_info_response = {
        "type": "usage_info",
        "data": {
            "total_cost": round(total_cost, 2)
        }
    }
    await websocket.send(json.dumps(usage_info_response))
Costs are displayed in the chat UI footer in real-time.
Incident Investigation: Background RCA investigations use the same chat backend.
Cloud Integrations: Execute commands across GCP, AWS, and Azure via chat.