What is simpE?

simpE is a lightweight benchmarking tool designed to evaluate small language models on fundamental cognitive tasks. Whether you're testing models locally with LM-Studio or probing a model's reasoning capabilities, simpE provides quick, reliable metrics on model performance.

Quick Start

Get up and running with simpE in minutes

Installation Guide

Detailed setup instructions and configuration

Benchmark Types

Learn about the three core benchmark areas

Analyzing Results

Understand your benchmark data

Benchmark Types

simpE evaluates language models across three fundamental capability areas:

1. String Reversal

Evaluates basic pattern manipulation by asking the model to reverse strings of varying lengths (2-30 characters). This tests:
  • Character-level attention
  • Sequential processing
  • Ability to follow simple transformations
# Example from the source code (main.py:147-152)
stringlenth = random.randint(2, 30)
text = ''.join(random.choice(string.ascii_uppercase + string.digits + string.ascii_lowercase) for _ in range(stringlenth))

prompt = f"Provide the following text in reverse order. Don't output anything else. Only output the reversed string without anything additional, not even quotes: \"{text}\""
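A response to this task can be graded by comparing it against the slice-reversed string. The sketch below is illustrative (the `grade_reversal` helper is an assumption, not simpE's actual grading code):

```python
import random
import string

def grade_reversal(model_output: str, original: str) -> bool:
    # Python's slice reversal gives the reference answer; stray whitespace is stripped
    return model_output.strip() == original[::-1]

# Mirror the generation shown above (string.ascii_letters covers upper and lower case)
length = random.randint(2, 30)
text = ''.join(random.choice(string.ascii_letters + string.digits) for _ in range(length))
```

Because the check is an exact string comparison, any extra commentary or quoting by the model counts as a failure, which is exactly what the prompt warns against.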

2. Big Integer Addition

Challenges the model with arithmetic operations on large integers (2-30 digits each). This benchmark reveals:
  • Mathematical reasoning capabilities
  • Handling of large numbers
  • Ability to perform calculations without explanation
# Example from the source code (main.py:214-223)
int1_length = random.randint(2, 30)
int2_length = random.randint(2, 30)

int1 = int(''.join(random.choice(string.digits) for _ in range(int1_length)))
int2 = int(''.join(random.choice(string.digits) for _ in range(int2_length)))

prompt = f"Provide the sum of the two numbers. Don't output anything else. Only output the sum of the two numbers without anything additional. Only output the final number, no calculation, no explanation, just the final number without any text.: \"{int1}\" \"{int2}\""
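Since Python integers have arbitrary precision, the reference sum for 30-digit operands is exact. A hedged sketch of how such a response could be checked (the parsing shown here is illustrative, not simpE's actual implementation):

```python
def grade_addition(model_output: str, a: int, b: int) -> bool:
    """Return True if the model's output parses to the exact sum."""
    try:
        # Tolerate surrounding whitespace and thousands separators
        return int(model_output.strip().replace(",", "")) == a + b
    except ValueError:
        # Non-numeric output (explanations, refusals) counts as a failure
        return False
```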

3. String Rehearsal

Tests the model’s ability to reproduce longer strings (10-500 characters) exactly as provided. This measures:
  • Context retention
  • Exact replication capabilities
  • Attention to detail
# Example from the source code (main.py:292-297)
stringlenth = random.randint(10, 500)
text = ''.join(random.choice(string.ascii_uppercase + string.digits + string.ascii_lowercase) for _ in range(stringlenth))

prompt = f"Repeat the following string exactly without modifying it. Don't output anything else. Only output the string without anything additional, not even quotes: \"{text}\""
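Grading here reduces to an exact, character-for-character comparison. A minimal sketch (the `grade_rehearsal` helper and its trailing-newline allowance are assumptions):

```python
def grade_rehearsal(model_output: str, original: str) -> bool:
    # Exact character-for-character match; only a trailing newline is forgiven,
    # since most chat APIs terminate completions with one
    return model_output.rstrip("\n") == original
```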

Key Features

Real-time Progress Tracking

simpE provides live console output showing:
  • Current benchmark progress (e.g., “String Reversal 45/100”)
  • Success rate percentage updated in real-time
  • Thinking time for reasoning models
  • Completion status for each benchmark
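A progress line of this shape can be produced with simple string formatting; writing it with a carriage return (`end="\r"`) rewrites the same console line in place. This is a sketch of the idea, not simpE's actual output code:

```python
def format_progress(name: str, done: int, total: int, passed: int) -> str:
    # Success rate over the attempts completed so far, guarded against division by zero
    rate = 100.0 * passed / done if done else 0.0
    return f"{name} {done}/{total}  success: {rate:.1f}%"

# print(format_progress("String Reversal", 45, 100, 36), end="\r")
```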

Reasoning Model Support

Built-in support for models with reasoning capabilities:
  • Configurable reasoning effort levels (low, medium, high)
  • Automatic capture of reasoning traces
  • Detailed reasoning statistics in analysis

Comprehensive Logging

All benchmark runs generate detailed logs:
  • Timestamped execution logs in logs/ directory
  • JSON results with full response data in results/ directory
  • Recent log file for quick access to latest run

Flexible Configuration

Easily adjust benchmark parameters in main.py:
tries = 100  # Number of tests per benchmark
timeout_time = 400  # Timeout in seconds
max_tokens = 512  # Maximum output tokens
reasoning_effort = "low"  # Reasoning level: low, medium, high
baseurl = "http://127.0.0.1:1234/v1"  # LM-Studio API endpoint

Analyzing Results

After running benchmarks, use the built-in analysis tool:
uv run analyze
The analyzer provides:
  • Accuracy metrics - Success percentage for each benchmark
  • Reasoning pattern analysis - Frequency of key phrases like “wait”, “actually”, “hold on”
  • Statistical insights - Average/median/min/max for reasoning trace lengths and word counts
Results are saved as JSON files in the results/ directory with timestamps and model information for easy comparison across runs.
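The phrase-frequency part of the analysis can be sketched as a case-insensitive whole-phrase count over each reasoning trace (the counting logic below is an assumption about how such metrics are typically computed, not the analyzer's actual code):

```python
import re

HEDGE_PHRASES = ("wait", "actually", "hold on")

def count_hedges(trace: str) -> dict:
    # Case-insensitive, whole-word counts of self-correction phrases in a trace
    lowered = trace.lower()
    return {p: len(re.findall(r"\b" + re.escape(p) + r"\b", lowered))
            for p in HEDGE_PHRASES}
```

High counts of these phrases tend to indicate a model that backtracks during reasoning, which is why the analyzer surfaces them.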

Why simpE?

Simple Setup

No complex configuration - just install and run

Local-First

Works with LM-Studio for complete privacy

Fast Iteration

Quick benchmarks help you iterate on model selection

Detailed Insights

Rich logging and analysis for deep dives

Next Steps

1. Install simpE: Follow the installation guide to set up simpE and configure your API endpoint.

2. Run Your First Benchmark: Check out the quick start guide to run your first benchmark suite.

3. Analyze Results: Learn how to interpret and compare benchmark results.
