Overview
The /siaa/fragmento endpoint shows exactly which text fragments (chunks) would be extracted from a document and sent to the AI model for a given query. It is the most detailed diagnostic tool for understanding what context the model receives.
Endpoint
Parameters
doc: Document filename (case-insensitive, lowercase recommended). Example: acuerdo_no._psaa16-10476.md
q: The query/question used for fragment extraction. The extractor selects the most relevant chunks based on this query. Example: ¿Cuáles son los funcionarios responsables? (“Who are the responsible officials?”)
Response
The document filename that was searched
The query that was used for extraction
The extracted text that would be sent to the AI model as context. This includes:
- Document header with name and section markers
- Up to MAX_CHUNKS_CONTEXTO (default: 3) selected chunks
- Each chunk includes its section heading
- Total size limited by CHUNK_SIZE × MAX_CHUNKS_CONTEXTO
chars: Total character count of the extracted fragment
Error Response
If required parameters are missing:
Error message: “Parámetros ‘doc’ y ‘q’ requeridos” (“Parameters ‘doc’ and ‘q’ are required”)
Status code: 400
Example
Request
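A minimal sketch of how a request URL for this endpoint might be built. It assumes that doc and q are passed as URL query parameters and that the service is reachable at a hypothetical base URL (http://localhost:5000); this page does not state the host or the HTTP method, so adjust both to your deployment.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical base URL (an assumption, not taken from this page).
BASE_URL = "http://localhost:5000"


def build_fragmento_url(doc: str, q: str, base_url: str = BASE_URL) -> str:
    """Return a /siaa/fragmento URL with doc and q percent-encoded."""
    # The filename is lowercased, as recommended for the doc parameter.
    params = {"doc": doc.lower(), "q": q}
    return f"{base_url}/siaa/fragmento?{urlencode(params)}"


url = build_fragmento_url(
    "acuerdo_no._psaa16-10476.md",
    "¿Cuáles son los funcionarios responsables?",
)
print(url)

# Round-trip check: decoding the query string recovers both parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```

Percent-encoding matters here because the query contains spaces, accented characters, and the inverted question mark; urlencode encodes them as UTF-8.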
Response
Use Cases
Debugging “No encontré información” Responses
When the AI model responds that it couldn’t find information, check what context it actually received.
Validating Chunk Selection
Verify that the most relevant chunks are being selected.
Testing Query Expansion
Compare fragments for queries with and without expanded terms.
Checking Article Bonus Scoring
When querying for specific articles, verify the correct article is extracted.
Analyzing Context Size
Check if the extracted context is within optimal size limits:
If chars is too low (<500), the extraction might be too selective.
If chars is at maximum (2400), consider if all chunks are equally relevant.
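The two thresholds above can be expressed as a small check. This is a sketch only: the 500-character floor and the 2400-character ceiling come from this page (2400 is CHUNK_SIZE × MAX_CHUNKS_CONTEXTO with the defaults 800 and 3), while the function name and the returned labels are illustrative and not part of the service.

```python
CHUNK_SIZE = 800          # default stated in the Notes
MAX_CHUNKS_CONTEXTO = 3   # default stated under Response
MIN_USEFUL_CHARS = 500    # "too low" threshold quoted above

# Upper bound on the context size: 800 * 3 = 2400 characters.
MAX_CONTEXT_CHARS = CHUNK_SIZE * MAX_CHUNKS_CONTEXTO


def classify_context_size(chars: int) -> str:
    """Classify the chars value reported for an extracted fragment."""
    if chars < MIN_USEFUL_CHARS:
        return "too_selective"   # extraction may have dropped useful text
    if chars >= MAX_CONTEXT_CHARS:
        return "at_maximum"      # check that every chunk is really relevant
    return "ok"


for value in (320, 1450, 2400):
    print(value, classify_context_size(value))
```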
Testing Overlap Effectiveness
Consecutive chunks overlap by CHUNK_OVERLAP (default: 300 chars) so that an article is less likely to be split across a chunk boundary.
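To see what the overlap means in practice, here is a sketch of a plain sliding-window splitter using the documented defaults. It is an assumption that chunking works as a fixed-size window; the real chunker may also respect section or article boundaries, and split_with_overlap is a hypothetical name.

```python
CHUNK_SIZE = 800      # default chunk length in characters
CHUNK_OVERLAP = 300   # default overlap between consecutive chunks


def split_with_overlap(text: str, size: int = CHUNK_SIZE,
                       overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Split text into windows of `size` chars that share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # each new chunk starts 500 chars after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2000))
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])
# The last 300 chars of one chunk reappear at the start of the next one.
print(chunks[0][-CHUNK_OVERLAP:] == chunks[1][:CHUNK_OVERLAP])
```

With this scheme, any passage of up to 300 characters is guaranteed to appear whole in at least one chunk, which is the property to look for when comparing adjacent fragments.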
Notes
- Chunks are pre-calculated during document loading with CHUNK_SIZE (default: 800) and CHUNK_OVERLAP (default: 300)
- The extraction algorithm uses a sophisticated scoring system:
  - Base scoring: TF-IDF weighted by term frequency in chunk
  - Query match bonus: +15 if the full query appears in the chunk
  - Article bonus: +10 for “artículo N°”, +5 for “artículo N”
  - Numbered list bonus: +4 for chunks with procedural steps
  - Proximity bonus: Up to +20 for high keyword density in 150-char windows
- Query expansion automatically adds related terms:
  - Temporal queries (“cuándo”) → adds “periodicidad”, “plazo”, “hábil”
  - Definition queries (“qué es”) → adds “sistema”, “herramienta”, “objeto”
  - Responsibility queries (“quién”) → adds “responsable”, “funcionario”
  - Sanction queries (“qué pasa”) → adds “sanción”, “disciplinario”, “incumplimiento”
- Listing questions (“cuáles son”, “enumera”) force minimum 2 chunks to avoid truncation
- The actual AI model receives this fragment wrapped in [DOC:NAME] markers
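The query expansion and listing rules in the notes above can be sketched as a lookup table. The trigger phrases and added terms are taken from this page; everything else is an assumption made for illustration, including the function names, the case-insensitive substring matching, and the minimum of 1 chunk for non-listing queries.

```python
# Trigger phrase -> related terms added to the query (from the notes above).
EXPANSIONS = {
    "cuándo": ["periodicidad", "plazo", "hábil"],                    # temporal
    "qué es": ["sistema", "herramienta", "objeto"],                  # definition
    "quién": ["responsable", "funcionario"],                         # responsibility
    "qué pasa": ["sanción", "disciplinario", "incumplimiento"],      # sanction
}

# Phrases that mark a listing question (from the notes above).
LISTING_TRIGGERS = ("cuáles son", "enumera")


def expand_query(query: str) -> list[str]:
    """Return the extra terms triggered by the query.

    Matching is a case-insensitive substring test, which is an assumption.
    """
    lowered = query.lower()
    extra = []
    for trigger, terms in EXPANSIONS.items():
        if trigger in lowered:
            extra.extend(terms)
    return extra


def min_chunks(query: str) -> int:
    """Return 2 for listing questions; the fallback of 1 is an assumption."""
    lowered = query.lower()
    return 2 if any(t in lowered for t in LISTING_TRIGGERS) else 1


print(expand_query("¿Cuándo se debe reportar?"))
print(min_chunks("¿Cuáles son los funcionarios responsables?"))
```

When comparing fragments for a query with and without a trigger word, a table like this helps predict which extra terms could have pulled a chunk into the result.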