Overview
XMLParser extends Parser to extract structured data from XML-tagged completions. It’s useful for prompts that request specific output formats like <reasoning>...</reasoning><answer>...</answer>.
Constructor
XMLParser(
fields: list[str | tuple[str, ...]],
answer_field: str = "answer",
extract_fn: Callable[[str], str] = lambda x: x,
)
fields
list[str | tuple[str, ...]]
List of field definitions. Each can be:
- A string: Fixed XML tag name (e.g., "reasoning")
- A tuple: Multiple allowed tag names with the first as canonical (e.g., ("code", "answer"))
answer_field
str, default "answer"
Which field contains the final answer for parse_answer().
extract_fn
Callable[[str], str], default lambda x: x
Additional transformation applied to extracted field values.
Do NOT include "think" in fields for models such as Qwen3 or DeepSeek-R1 whose thinking tags are parsed automatically. Listing it causes parsing failures.
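To make the two kinds of field definition concrete, here is a minimal sketch of how a fields list could be normalized into (canonical name, allowed names) pairs. The helper normalize_fields is hypothetical and written for illustration only; it is not part of the verifiers API and may differ from what XMLParser does internally.

```python
def normalize_fields(fields):
    """Map each field definition to (canonical_name, allowed_names).

    Hypothetical helper for illustration; not a verifiers function.
    """
    normalized = []
    for field in fields:
        if isinstance(field, str):
            # A plain string is its own canonical name and only allowed tag.
            normalized.append((field, [field]))
        elif isinstance(field, tuple) and field:
            # The first element of a tuple is treated as the canonical name.
            normalized.append((field[0], list(field)))
        else:
            raise TypeError("each field must be a string or a non-empty tuple of strings")
    return normalized


schema = normalize_fields(["reasoning", ("code", "answer")])
print(schema)  # [('reasoning', ['reasoning']), ('code', ['code', 'answer'])]
```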
Methods
parse
def parse(self, text: str, strip: bool = True, last: bool = False) -> Any
Parse XML tags from text and return a namespace object with attributes for each field.
text
str
The text containing XML tags.
strip
bool, default True
Whether to strip whitespace from field values.
last
bool, default False
If True, extract the last occurrence of each tag. If False, extract the first.
Returns: SimpleNamespace with attributes for each allowed field name. Missing fields are set to None.
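The behavior described above (one attribute per allowed tag name, None for missing tags, first or last occurrence) can be approximated with a regular expression. The sketch below is a stand-in written with the standard library only; parse_tags is a hypothetical name, and the real parse() may use a different pattern and handle edge cases differently.

```python
import re
from types import SimpleNamespace


def parse_tags(text, allowed_names, strip=True, last=False):
    """Return a namespace with one attribute per allowed tag name.

    Hypothetical stand-in for illustration; not the verifiers implementation.
    """
    values = {}
    for name in allowed_names:
        pattern = rf"<{re.escape(name)}>(.*?)</{re.escape(name)}>"
        # DOTALL lets the tag content span multiple lines.
        matches = re.findall(pattern, text, flags=re.DOTALL)
        if not matches:
            values[name] = None  # missing tags become None
            continue
        value = matches[-1] if last else matches[0]
        values[name] = value.strip() if strip else value
    return SimpleNamespace(**values)


text = "<answer> 3 </answer> then <answer> 4 </answer>"
first = parse_tags(text, ["reasoning", "answer"])
final = parse_tags(text, ["reasoning", "answer"], last=True)
print(first.answer, final.answer, first.reasoning)  # 3 4 None
```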
parse_answer
def parse_answer(self, completion: Messages) -> str | None
Extract the answer field from the last assistant message containing it.
completion
Messages
String or list of messages.
Returns: Content of the answer field, or None if not found.
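As a rough illustration of "the last assistant message containing it", the sketch below walks a message list in reverse and returns the first answer tag it finds in an assistant message. last_assistant_answer is a hypothetical helper that assumes messages are dicts with "role" and "content" keys; it is not how parse_answer is necessarily implemented.

```python
import re


def last_assistant_answer(completion, tag="answer"):
    """Return the tag content from the most recent assistant message that has it.

    Hypothetical helper for illustration; not a verifiers function.
    """
    if isinstance(completion, str):
        # Treat a bare string as a single assistant message.
        completion = [{"role": "assistant", "content": completion}]
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    for message in reversed(completion):
        if message.get("role") != "assistant":
            continue
        match = re.search(pattern, message.get("content", ""), flags=re.DOTALL)
        if match:
            return match.group(1).strip()
    return None


messages = [
    {"role": "assistant", "content": "<answer>7</answer>"},
    {"role": "user", "content": "Are you sure?"},
    {"role": "assistant", "content": "Let me double-check."},
]
print(last_assistant_answer(messages))  # 7
```

The final assistant message has no answer tag, so the search falls back to the earlier one.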
format
def format(self, **kwargs) -> str
Format keyword arguments into an XML string using canonical field names.
**kwargs
Field values to format. Keys must match canonical or alternative field names.
Returns: XML-formatted string.
Raises: ValueError if a required field is missing.
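The formatting rule (accept a value under any allowed name, emit it under the canonical tag, fail on a missing field) can be sketched as follows. format_fields and its schema argument, a list of (canonical, allowed names) pairs, are assumptions made for this example and not the library's internals; the error text mirrors the message shown under Error Handling.

```python
def format_fields(schema, **values):
    """Render values as XML blocks; schema is a list of (canonical, allowed_names).

    Hypothetical helper for illustration; not a verifiers function.
    """
    parts = []
    for canonical, allowed in schema:
        # Accept the value under the canonical name or any alternative name.
        value = next((values[name] for name in allowed if name in values), None)
        if value is None:
            raise ValueError(f"Missing value for field '{canonical}' (allowed: {allowed})")
        # Always emit the canonical tag, whichever name the caller used.
        parts.append(f"<{canonical}>\n{value}\n</{canonical}>")
    return "\n".join(parts)


schema = [("reasoning", ["reasoning"]), ("code", ["code", "answer"])]
print(format_fields(schema, reasoning="Add the numbers.", answer="4"))
# <reasoning>
# Add the numbers.
# </reasoning>
# <code>
# 4
# </code>
```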
get_format_str
def get_format_str(self) -> str
Get a description of the expected XML format.
Returns: String showing the XML structure with field names.
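A format description of this kind could be built along the following lines, with alternatives rendered as a bracketed choice. format_description is a hypothetical helper, and the exact layout returned by get_format_str may differ.

```python
def format_description(schema):
    """Describe the expected XML layout; schema is a list of (canonical, allowed_names).

    Hypothetical helper for illustration; not a verifiers function.
    """
    parts = []
    for _canonical, allowed in schema:
        # A single name is shown as-is; alternatives are shown as a choice.
        tag = allowed[0] if len(allowed) == 1 else "[ " + " | ".join(allowed) + " ]"
        parts.append(f"<{tag}>\n...\n</{tag}>")
    return "\n".join(parts)


description = format_description(
    [("reasoning", ["reasoning"]), ("code", ["code", "solution"])]
)
print(description)
# <reasoning>
# ...
# </reasoning>
# <[ code | solution ]>
# ...
# </[ code | solution ]>
```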
get_fields
def get_fields(self) -> list[str]
Get the list of canonical field names.
Returns: List of field names in order.
get_format_reward_func
def get_format_reward_func(self) -> Callable
Create a reward function that checks format compliance.
Returns: Function that scores completions based on:
- Presence of expected fields (40% weight)
- Proper spacing (20%)
- Starting with first field (20%)
- Ending with last field (20%)
Attributes
answer_field
str
The field name used for extracting answers.
All attributes from Parser are also available.
Example Usage
Basic XML Parsing
import verifiers as vf
# Define parser expecting reasoning and answer
parser = vf.XMLParser(
    fields=["reasoning", "answer"],
    answer_field="answer"
)
text = """<reasoning>
Let's solve step by step.
2 + 2 = 4
</reasoning>
<answer>
4
</answer>"""
result = parser.parse(text)
print(result.reasoning) # "Let's solve step by step.\n2 + 2 = 4"
print(result.answer) # "4"
Alternative Field Names
# Allow either "code" or "answer" for the final field
parser = vf.XMLParser(
    fields=["reasoning", ("code", "answer")],
    answer_field="answer"  # Attribute that parse_answer() reads
)
# This works with <code>
text1 = "<reasoning>...</reasoning><code>print('hi')</code>"
result1 = parser.parse(text1)
print(result1.code) # "print('hi')"
print(result1.answer) # None (each allowed tag name is its own attribute; no <answer> tag here)
# This works with <answer>
text2 = "<reasoning>...</reasoning><answer>42</answer>"
result2 = parser.parse(text2)
print(result2.answer) # "42"
print(result2.code) # None (no <code> tag here)
Parsing Answers from Completions
parser = vf.XMLParser(fields=["reasoning", "answer"])
completion = [
    {"role": "user", "content": "What is 5*5?"},
    {"role": "assistant", "content": "<reasoning>5*5</reasoning><answer>25</answer>"}
]
answer = parser.parse_answer(completion)
print(answer) # "25"
Formatting Output
parser = vf.XMLParser(fields=["reasoning", "answer"])
xml = parser.format(
    reasoning="First, we calculate...",
    answer="42"
)
print(xml)
# <reasoning>
# First, we calculate...
# </reasoning>
# <answer>
# 42
# </answer>
Format Reward Function
parser = vf.XMLParser(fields=["reasoning", "answer"])
# Get format checking function
format_checker = parser.get_format_reward_func()
# Well-formatted completion
good = [{"role": "assistant", "content": "<reasoning>...</reasoning><answer>5</answer>"}]
print(format_checker(good)) # 1.0
# Missing reasoning tag
bad = [{"role": "assistant", "content": "<answer>5</answer>"}]
print(format_checker(bad)) # < 1.0 (partial credit)
# No tags at all
worse = [{"role": "assistant", "content": "just text"}]
print(format_checker(worse)) # 0.0
Using with Rubric
parser = vf.XMLParser(fields=["reasoning", "answer"])
# Create rubric with format checking
rubric = vf.Rubric(parser=parser)
# Add format reward
rubric.add_reward_func(parser.get_format_reward_func(), weight=0.2)
# Add correctness reward
def correctness(completion, answer, parser, **kwargs):
    parsed = parser.parse_answer(completion)
    return 1.0 if parsed == answer else 0.0
rubric.add_reward_func(correctness, weight=1.0)
# Now: reward = 1.0 * correctness + 0.2 * format_compliance
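The weighted sum in the comment above can be worked through with plain arithmetic. The sketch below does not call Rubric; combined_reward and the example scores are hypothetical and only show how the two weights trade off.

```python
def combined_reward(scores, weights):
    """Weighted sum of per-function scores; both dicts share the same keys.

    Hypothetical helper for illustration; not a verifiers function.
    """
    return sum(weights[name] * score for name, score in scores.items())


weights = {"correctness": 1.0, "format": 0.2}
# Correct answer with half-compliant formatting.
right_but_messy = combined_reward({"correctness": 1.0, "format": 0.5}, weights)
# Wrong answer with perfect formatting.
wrong_but_tidy = combined_reward({"correctness": 0.0, "format": 1.0}, weights)
print(round(right_but_messy, 2), round(wrong_but_tidy, 2))  # 1.1 0.2
```

With these weights, a correct answer always outscores a wrong one, and formatting only separates completions that are equally correct.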
Extracting the Last Occurrence
parser = vf.XMLParser(fields=["answer"])
# Multiple answer tags - get the last one
text = "<answer>wrong</answer> <answer>correct</answer>"
first = parser.parse(text, last=False)
print(first.answer) # "wrong"
last = parser.parse(text, last=True)
print(last.answer) # "correct"
Complex Multi-Field Example
parser = vf.XMLParser(
    fields=[
        "problem_understanding",
        "approach",
        ("code", "solution"), # Either code or solution
        "explanation",
        "answer"
    ],
    answer_field="answer"
)
text = """<problem_understanding>
We need to find the sum.
</problem_understanding>
<approach>
Use a loop.
</approach>
<code>
result = sum(range(10))
</code>
<explanation>
This calculates 0+1+2+...+9
</explanation>
<answer>
45
</answer>"""
result = parser.parse(text)
print(result.problem_understanding) # "We need to find the sum."
print(result.code) # "result = sum(range(10))"
print(result.answer) # "45"
# Get format description
print(parser.get_format_str())
# <problem_understanding>
# ...
# </problem_understanding>
# <approach>
# ...
# </approach>
# <[ code | solution ]>
# ...
# </[ code | solution ]>
# ...
Error Handling
parser = vf.XMLParser(fields=["reasoning", "answer"])
# Malformed XML - unclosed tags
text = "<reasoning>text</reasoning><answer>42"
result = parser.parse(text)
print(result.reasoning) # "text"
print(result.answer) # None (unclosed tag)
# Missing required field when formatting
try:
    parser.format(reasoning="only this")
except ValueError as e:
    print(e) # "Missing value for field 'answer' (allowed: ['answer'])"
No Whitespace Stripping
parser = vf.XMLParser(fields=["code"])
text = "<code>\n def foo():\n pass\n</code>"
# With stripping (default)
stripped = parser.parse(text, strip=True)
print(repr(stripped.code)) # "def foo():\n pass"
# Without stripping
unstripped = parser.parse(text, strip=False)
print(repr(unstripped.code)) # "\n def foo():\n pass\n"
Multi-turn Parsing
parser = vf.XMLParser(fields=["reasoning", "answer"])
completion = [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "<reasoning>Let me think</reasoning>"},
    {"role": "user", "content": "Continue."},
    {"role": "assistant", "content": "<answer>4</answer>"}
]
# parse_answer searches backwards through assistant messages
answer = parser.parse_answer(completion)
print(answer) # "4" (from last assistant message with answer field)
Format Reward Details
The format reward function scores completions based on:
- Field presence (40%): Proportion of expected field sets present
- Proper spacing (20%): Tags have content between them (not just whitespace)
- Starts correctly (20%): Begins with first field’s opening tag
- Ends correctly (20%): Ends with last field’s closing tag
Partial credit is given for partial compliance.
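The weights listed above combine as a simple weighted sum. The sketch below takes the four component results as inputs and does not inspect any text; format_score is a hypothetical name, and how the library derives each component (or treats completions with no tags at all) is not modeled here.

```python
def format_score(field_ratio, spacing_ok, starts_ok, ends_ok):
    """Combine the four format components using the documented weights.

    Hypothetical helper for illustration; not a verifiers function.
    field_ratio is the proportion of expected field sets present (0.0 to 1.0);
    the other three arguments are booleans.
    """
    return (
        0.4 * field_ratio
        + 0.2 * float(spacing_ok)
        + 0.2 * float(starts_ok)
        + 0.2 * float(ends_ok)
    )


# All fields present and well placed.
full = format_score(1.0, True, True, True)
# One of two fields present, good spacing, wrong start, correct end.
partial = format_score(0.5, True, False, True)
print(round(full, 2), round(partial, 2))  # 1.0 0.6
```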
See Also