Aurora’s chat provides a natural-language interface to your cloud infrastructure, with real-time streaming responses, intelligent context management, and multi-session support.
Architecture Overview
WebSocket Connection
The chat uses a persistent WebSocket connection for bidirectional real-time communication:
Client: Next.js frontend with the useWebSocket hook
Server: Python WebSocket server (main_chatbot.py) running on port 5006
Protocol: JSON messages with types init, message, status, tool_call, usage_info, and control
Client WebSocket Hook
Server WebSocket Handler
// client/src/hooks/useWebSocket.ts
const useWebSocket = ({ url, userId, onMessage }) => {
  const wsRef = useRef<WebSocket | null>(null);
  const [isConnected, setIsConnected] = useState(false);

  useEffect(() => {
    const ws = new WebSocket(url);

    ws.onopen = () => {
      setIsConnected(true);
      // Send init message to register user
      ws.send(JSON.stringify({
        type: 'init',
        user_id: userId
      }));
    };

    ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      onMessage(data);
    };

    wsRef.current = ws;
    return () => ws.close();
  }, [url, userId]);

  return { wsRef, isConnected, send: (data) => wsRef.current?.send(data) };
};
Message Flow
1. User sends message: the frontend sends JSON with query, session_id, user_id, mode, model, and provider_preference.
2. Agent processes the request: the LangGraph workflow executes with streaming enabled via the astream_events API.
3. Tokens stream back: the server sends a token event for each chunk of the AI response (real-time streaming).
4. Tool calls execute: when the AI needs to run a tool, the server sends tool_call events with status updates.
5. Final status: the server sends an END status when the workflow completes.
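The sequence above can be sketched as a minimal server-side dispatch routine. Everything here is illustrative: `run_workflow` stands in for the LangGraph workflow, and the real handler in main_chatbot.py also manages sessions, cancellation, and error reporting.

```python
import json

# Illustrative dispatch for one chat message. run_workflow() is a stand-in
# for the LangGraph workflow's event stream, not Aurora's actual code.
async def handle_chat_message(msg: dict, send) -> None:
    """Relay workflow events for a single 'message' frame."""
    assert msg["type"] == "message"

    async def run_workflow(query: str):
        # Fake event stream standing in for the real workflow
        yield ("token", "Listing ")
        yield ("token", "pods...")
        yield ("tool_call", {"tool": "kubectl_tool", "status": "running"})

    # Steps 3-4: stream token and tool_call events as they are produced
    async for kind, payload in run_workflow(msg["query"]):
        await send(json.dumps(
            {"type": kind, "data": payload, "session_id": msg["session_id"]}
        ))

    # Step 5: signal completion with an END status
    await send(json.dumps({
        "type": "status",
        "data": {"status": "END"},
        "isComplete": True,
        "session_id": msg["session_id"],
    }))
```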
Chat Modes
Aurora supports two chat modes controlled by the mode selector:
Ask Mode
Read-only mode for safe exploration:
Cannot execute write operations
No infrastructure changes
Safe for production queries
Tools blocked: cloud_tool (write), kubectl_tool (apply/delete), github_commit, iac_tool (apply)
Agent Mode
Full execution mode for automation:
Can deploy infrastructure
Execute kubectl commands
Commit to GitHub
Modify cloud resources
Requires confirmation for high-risk actions
# server/chat/backend/agent/access.py
class ModeAccessController:
    @staticmethod
    def is_tool_allowed(mode: str, tool_name: str) -> bool:
        """Check if a tool is allowed in the current mode."""
        if mode == "ask":
            # Block write operations in ask mode
            write_tools = [
                "cloud_tool", "kubectl_tool", "github_commit",
                "iac_tool", "github_apply_fix"
            ]
            return tool_name not in write_tools
        return True  # Agent mode allows all tools
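In practice the gate is applied to the tool set before binding it to the LLM. The sketch below reproduces the class and filters a hypothetical registry (ALL_TOOLS is illustrative, not Aurora's actual tool list):

```python
# Reproduction of the mode gate above, plus the filtering step it enables.
class ModeAccessController:
    @staticmethod
    def is_tool_allowed(mode: str, tool_name: str) -> bool:
        """Check if a tool is allowed in the current mode."""
        if mode == "ask":
            # Block write operations in ask mode
            write_tools = [
                "cloud_tool", "kubectl_tool", "github_commit",
                "iac_tool", "github_apply_fix",
            ]
            return tool_name not in write_tools
        return True  # Agent mode allows all tools

# Hypothetical tool registry, filtered before binding tools to the LLM
ALL_TOOLS = ["search_tool", "cloud_tool", "kubectl_tool", "github_commit"]
ask_tools = [t for t in ALL_TOOLS if ModeAccessController.is_tool_allowed("ask", t)]
# Ask mode keeps only the read-only tool: ["search_tool"]
```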
Real-Time Streaming
Token-Level Streaming
The chat uses LangGraph’s astream_events API for token-level streaming:
# server/chat/backend/agent/workflow.py
async def stream(self, input_state: State):
    """Stream the workflow with token-level streaming."""
    async for event in self.app.astream_events(input_state, self.config, version="v2"):
        event_type = event.get("event")
        if event_type == "on_chat_model_stream":
            # Stream individual tokens from the LLM
            chunk_data = event.get("data", {})
            chunk_obj = chunk_data.get("chunk")
            if chunk_obj and hasattr(chunk_obj, 'content') and chunk_obj.content:
                content = _extract_text_from_content(chunk_obj.content)
                if content:
                    yield ("token", content)
Gemini thinking models return content as a list of blocks with types thinking and text. Aurora extracts both for a complete thought stream.
Message Consolidation
During streaming, AIMessageChunk objects are accumulated and consolidated into complete AIMessage objects:
def _consolidate_message_chunks(self, messages):
    """Consolidate AIMessageChunk objects into proper AIMessage objects."""
    chunk_builders = {}
    consolidated_messages = []
    for msg in messages:
        if isinstance(msg, AIMessageChunk):
            builder = chunk_builders.get(msg.id)
            if builder is None:
                builder = {
                    "content": "",
                    "tool_calls": [],
                    "id": msg.id
                }
                chunk_builders[msg.id] = builder
            # Accumulate content
            builder["content"] += msg.content
            # Accumulate tool calls
            if hasattr(msg, 'tool_calls') and msg.tool_calls:
                builder["tool_calls"].extend(msg.tool_calls)
            # Finalize when the model reports a finish reason
            if msg.response_metadata.get("finish_reason") in ("tool_calls", "stop"):
                cleaned_tool_calls = self._clean_tool_calls(builder["tool_calls"])
                ai_msg = AIMessage(
                    content=builder["content"],
                    tool_calls=cleaned_tool_calls,
                    id=builder["id"]
                )
                consolidated_messages.append(ai_msg)
                del chunk_builders[msg.id]
    return consolidated_messages
Session Management
Creating a New Session
Chat sessions are created lazily on first message:
// client/src/app/chat/components/useChatSendHandlers.ts
const handleSend = async (text: string, websocket: WebSocket) => {
  let effectiveSessionId = currentSessionId;

  if (!effectiveSessionId) {
    // Create new session
    const newSession = await createSession({
      title: text.substring(0, 50) + (text.length > 50 ? '...' : ''),
      userId: userId
    });
    effectiveSessionId = newSession.id;
    setCurrentSessionId(newSession.id);
    justCreatedSessionRef.current = newSession.id;
  }

  // Send message via WebSocket
  websocket.send(JSON.stringify({
    type: 'message',
    query: text,
    session_id: effectiveSessionId,
    user_id: userId,
    mode: selectedMode,
    model: selectedModel,
    provider_preference: selectedProviders
  }));
};
Context Loading
When resuming a session, Aurora loads the full context:
# server/chat/backend/agent/workflow.py
if input_state.session_id and input_state.user_id:
existing_context = LLMContextManager.load_context_history(
input_state.session_id,
input_state.user_id
)
if existing_context:
# Combine existing context with new messages
combined_messages = []
combined_messages.extend(existing_context)
combined_messages.extend(input_state.messages)
input_state.messages = combined_messages
Two-Tier Context Storage:
LLM Context (llm_context_history): Complete conversation for AI (includes tool messages)
UI Context (chat_sessions.messages): Formatted messages for display (excludes tool details)
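The split can be illustrated with plain dicts. This is a shape-only sketch: real messages are LangChain objects, and persistence goes through LLMContextManager and chat_sessions.

```python
# Derive both storage tiers from one turn of messages (illustrative only).
def split_context(messages):
    llm_tier = list(messages)  # complete history, tool traffic included
    ui_tier = [
        m for m in messages
        if m["role"] in ("user", "assistant") and not m.get("tool_calls")
    ]
    return llm_tier, ui_tier

turn = [
    {"role": "user", "content": "List pods"},
    {"role": "assistant", "content": "", "tool_calls": [{"name": "kubectl_tool"}]},
    {"role": "tool", "content": "pod-a Running"},
    {"role": "assistant", "content": "One pod: pod-a (Running)."},
]
llm_ctx, ui_ctx = split_context(turn)  # 4 messages vs. 2 messages
```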
Context Compression
For long conversations, Aurora uses LangChain’s context compression:
# server/chat/backend/agent/utils/chat_context_manager.py
compressed_messages, was_compressed = ChatContextManager.compress_context_if_needed(
    session_id, user_id, messages_full, model
)
if was_compressed:
    # Use the compressed version for the LLM context
    messages_for_context = compressed_messages
    logger.info(f"Context compression: {len(compressed_messages)} messages preserved")
User Workflows
Starting a New Chat
Navigate to /chat or click “New Chat” in sidebar
The UI shows the empty state with prompt suggestions
Type a question or click a suggested prompt
Session is created on first message send
Empty chat state with dynamic prompt suggestions based on connected providers
Text Input
Rich text editor with markdown support
Multi-line input with Shift+Enter; Enter to send
Character limit: 2000 characters per message
File Attachments
Upload images (PNG, JPG, GIF, WebP)
Upload documents (PDF)
Upload code archives (ZIP, extracted server-side)
Max file size: 10MB per file
Max attachments: 5 per message
Mode & Model Selection
Mode: Toggle between Ask (read-only) and Agent (execution)
Model: Select from Claude, GPT-4, Gemini, or local models
Providers: Multi-select cloud providers to scope commands
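The attachment limits above could be enforced with a small validation helper. The function and field names below are hypothetical; only the constraints themselves come from the documentation:

```python
# Hypothetical server-side attachment validation using Aurora's stated limits.
MAX_FILE_BYTES = 10 * 1024 * 1024   # 10MB per file
MAX_ATTACHMENTS = 5                 # per message
ALLOWED_TYPES = {"png", "jpg", "gif", "webp", "pdf", "zip"}

def validate_attachments(attachments):
    """Return (ok, reason) for a list of {'filename', 'file_type', 'size'} dicts."""
    if len(attachments) > MAX_ATTACHMENTS:
        return False, "too many attachments"
    for a in attachments:
        if a["file_type"].lower() not in ALLOWED_TYPES:
            return False, f"unsupported type: {a['file_type']}"
        if a["size"] > MAX_FILE_BYTES:
            return False, f"file too large: {a['filename']}"
    return True, "ok"
```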
Message Types in UI
User Message
Assistant Message
Tool Call Message
{
  id: 1,
  role: 'user',
  content: 'List all pods in production namespace',
  timestamp: '2026-03-03T10:30:00Z'
}
Cancelling In-Progress Messages
Click the Stop button while a message is generating:
// client/src/hooks/useChatCancellation.ts
const cancelCurrentMessage = async () => {
  if (!userId || !sessionId || !webSocket.isConnected) return;

  // Send cancel control message
  webSocket.send(JSON.stringify({
    type: 'control',
    action: 'cancel',
    session_id: sessionId,
    user_id: userId
  }));
  // Backend cancels the Celery task and consolidates partial messages
};
The server:
Cancels the running workflow task
Waits for ongoing tool calls to complete
Consolidates partial message chunks
Saves the partial conversation state
Adds a cancellation notice to the context
Cancelling does not undo already-executed commands. It only stops further processing.
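The five server-side steps can be sketched as a cancel handler. All names here are hypothetical, and the real backend revokes a Celery task rather than an asyncio task:

```python
import asyncio

# Hedged sketch of the server-side cancel sequence; the real implementation
# reuses the chunk-consolidation logic shown earlier and persists via the
# context manager, not a bare save_context callback.
async def handle_cancel(session_state: dict, save_context) -> None:
    # 1. Cancel the running workflow task
    session_state["task"].cancel()
    try:
        await session_state["task"]
    except asyncio.CancelledError:
        pass
    # 2. Wait for any in-flight tool calls to complete
    if session_state["pending_tools"]:
        await asyncio.gather(*session_state["pending_tools"])
    # 3. Consolidate partial message chunks into one partial reply
    partial = "".join(session_state["chunks"])
    # 4-5. Save the partial state plus a cancellation notice
    save_context(partial)
    save_context("[Message generation was cancelled by the user]")
```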
WebSocket Message Types
Client → Server
| Field | Description |
| --- | --- |
| type | Message type: init, message, control, confirmation_response |
| query | The user's question (for message type) |
| model | LLM model name: claude-3-5-sonnet-20241022, gpt-4o, gemini-2.0-flash-thinking-exp-01-21, etc. |
| provider_preference | Cloud providers to scope commands: ["gcp", "aws", "azure"] |
| attachments | File attachments with filename, file_type, file_data (base64), or server_path for server-side files |
Server → Client
| Field | Description |
| --- | --- |
| type | Message type: status, message, token, tool_call, usage_info |
| data | Message payload (varies by type) |
Status Message
Token Stream
Tool Call
{
  "type": "status",
  "data": {
    "status": "END"
  },
  "isComplete": true,
  "session_id": "550e8400-e29b-41d4-a716-446655440000"
}
Rate Limiting
WebSocket connections are rate-limited per client:
Limit: 5 messages per 60 seconds
Scope: Per WebSocket connection ID
Exceeded: Returns {"type": "error", "data": {"text": "Rate limit exceeded"}}
import time
from collections import defaultdict

class RateLimiter:
    """Token-bucket limiter: `rate` messages per `per` seconds, per client."""
    def __init__(self, rate, per):
        self.rate = rate
        self.per = per
        self.tokens = defaultdict(lambda: rate)     # each client starts with a full bucket
        self.last_checked = defaultdict(time.time)  # last refill time per client

    def is_allowed(self, client_id):
        now = time.time()
        elapsed = now - self.last_checked[client_id]
        self.last_checked[client_id] = now
        # Refill proportionally to elapsed time, capped at the bucket size
        self.tokens[client_id] = min(
            self.rate, self.tokens[client_id] + elapsed * (self.rate / self.per)
        )
        if self.tokens[client_id] >= 1:
            self.tokens[client_id] -= 1
            return True
        return False

rate_limiter = RateLimiter(rate=5, per=60)
API Cost Tracking
Aurora tracks LLM API usage per user:
# After workflow completion
if user_id:
    asyncio.create_task(update_api_cost_cache_async(user_id))
    is_cached, total_cost = get_cached_api_cost(user_id)
    # Send usage info to client
    usage_info_response = {
        "type": "usage_info",
        "data": {
            "total_cost": round(total_cost, 2)
        }
    }
    await websocket.send(json.dumps(usage_info_response))
Costs are displayed in the chat UI footer in real-time.
Incident Investigation: Background RCA investigations use the same chat backend.
Cloud Integrations: Execute commands across GCP, AWS, and Azure via chat.