Dataset Overview
The DeepSense Scenario 23 dataset is specifically designed for drone detection in THz communication scenarios.
Dataset Statistics
| Metric | Value |
|---|---|
| Total images | 11,387 |
| Training set | 7,970 (70%) |
| Validation set | 1,708 (15%) |
| Test set | 1,709 (15%) |
| Image resolution | 960×540 pixels (16:9 aspect ratio) |
| Dataset size | ~650 MB |
| Annotation format | YOLO (normalized bounding boxes) |
| Classes | 1 (drone) |
| Capture sessions | 51 different conditions |
The dataset includes images from 51 capture sessions with uneven distribution (some sessions have 1 image, others 1000+). The train/val/test split is shuffled to ensure a reasonable mix of conditions across all splits.
Directory Structure
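As a sketch, the layout below follows the standard Ultralytics YOLO convention; the `dataset/` root and directory names are illustrative:

```
dataset/
├── images/
│   ├── train/   # 7,970 .jpg files
│   ├── val/     # 1,708 .jpg files
│   └── test/    # 1,709 .jpg files
├── labels/
│   ├── train/   # 7,970 .txt files
│   ├── val/     # 1,708 .txt files
│   └── test/    # 1,709 .txt files
└── data.yaml
```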
The dataset must be organized in the standard YOLO directory format.
Dataset Configuration File
The `data.yaml` file defines dataset paths and classes:
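A minimal `data.yaml` consistent with the fields described below; the paths are illustrative and assume the dataset lives under `dataset/` relative to the project root:

```yaml
# Illustrative data.yaml -- adjust paths to your setup
path: dataset            # root directory, relative to the project root
train: images/train      # training images, relative to path
val: images/val          # validation images
test: images/test        # test images
nc: 1                    # single class
names: ["drone"]         # index 0 = "drone"
```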
Configuration Fields
| Field | Description |
|---|---|
| `path` | Root directory for the dataset (relative to the project root) |
| `train` | Training images subdirectory (relative to `path`) |
| `val` | Validation images subdirectory |
| `test` | Test images subdirectory |
| `nc` | Number of classes (1 for single-class drone detection) |
| `names` | List of class names (index 0 = "drone") |
Annotation Format
Each `.txt` label file contains bounding box annotations in YOLO format: one line per object, `class x_center y_center width height`, with all coordinates normalized to [0, 1].
Example Annotation
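The breakdown below corresponds to a label line like this one:

```
0 0.541511 0.609623 0.046874 0.075995
```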
- Class: `0` (drone)
- Center: (0.541511, 0.609623) → 54.15% from left, 60.96% from top
- Size: 0.046874 × 0.075995 → 4.69% of image width, 7.60% of image height
Converting to Pixel Coordinates
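A minimal sketch of the conversion, using the example annotation above (the function name is illustrative):

```python
def yolo_to_pixels(xc, yc, w, h, img_w=960, img_h=540):
    """Convert a normalized YOLO box to pixel-space center and size."""
    return xc * img_w, yc * img_h, w * img_w, h * img_h

# Example annotation from this page:
cx, cy, bw, bh = yolo_to_pixels(0.541511, 0.609623, 0.046874, 0.075995)
# center ≈ (519.9, 329.2) px, box ≈ 45 × 41 px
```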
For 960×540 images, multiply x_center and width by 960, and y_center and height by 540.
Multi-Object Images
If an image contains multiple drones, each gets its own line in the label file (e.g., image_with_2_drones.txt).
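For instance, a two-drone label file might look like this; the first line is the example annotation above, and the second line's values are purely illustrative:

```
0 0.541511 0.609623 0.046874 0.075995
0 0.312500 0.450000 0.052000 0.081000
```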
Obtaining the Dataset
The DeepSense Scenario 23 dataset is publicly available.
Download from DeepSense
Visit the DeepSense 6G website and download the Scenario 23 dataset. The download includes both images and YOLO-format annotation files.
Extract the dataset
Extract the downloaded archive. You should find:
- A directory of images (`.jpg` files)
- A directory of labels (`.txt` files)
Organize into train/val/test splits
The raw dataset needs to be split into training, validation, and test sets. Use a 70/15/15 split with shuffling to ensure diverse conditions in each set.
Python script for splitting
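A sketch of such a script, assuming the raw images and labels were extracted to `raw/images` and `raw/labels`; the paths and output layout are assumptions to adjust to your setup:

```python
import random
import shutil
from pathlib import Path

# Illustrative paths -- adjust to where you extracted the download.
SRC_IMAGES = Path("raw/images")
SRC_LABELS = Path("raw/labels")
OUT = Path("dataset")
SPLITS = {"train": 0.70, "val": 0.15, "test": 0.15}

def split_counts(n: int) -> tuple[int, int, int]:
    """Return (train, val, test) sizes for a 70/15/15 split of n items."""
    n_train = int(n * SPLITS["train"])
    n_val = int(n * SPLITS["val"])
    return n_train, n_val, n - n_train - n_val

def split_dataset(seed: int = 42) -> None:
    images = sorted(SRC_IMAGES.glob("*.jpg"))
    random.Random(seed).shuffle(images)  # mix capture sessions across splits

    n_train, n_val, _ = split_counts(len(images))
    groups = {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],
    }

    for split, files in groups.items():
        img_dir = OUT / "images" / split
        lbl_dir = OUT / "labels" / split
        img_dir.mkdir(parents=True, exist_ok=True)
        lbl_dir.mkdir(parents=True, exist_ok=True)
        for img in files:
            shutil.copy2(img, img_dir / img.name)
            label = SRC_LABELS / (img.stem + ".txt")
            if label.exists():  # every image should have a matching label
                shutil.copy2(label, lbl_dir / label.name)

if __name__ == "__main__":
    split_dataset()
```

Note that `int()` truncation on 11,387 images yields exactly the 7,970/1,708/1,709 counts listed above.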
Verifying Dataset Integrity
Check file counts
Verify that each split has matching image and label counts.
Expected: 7970 train, 1708 validation, 1709 test.
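A quick check along these lines; the `dataset/` root and `.jpg` extension are assumptions from the layout used here:

```python
from pathlib import Path

def count_split(root: str = "dataset") -> dict:
    """Count image and label files per split under an assumed layout."""
    counts = {}
    for split in ("train", "val", "test"):
        imgs = len(list(Path(root, "images", split).glob("*.jpg")))
        lbls = len(list(Path(root, "labels", split).glob("*.txt")))
        counts[split] = (imgs, lbls)
    return counts

for split, (imgs, lbls) in count_split().items():
    status = "OK" if imgs == lbls else "MISMATCH"
    print(f"{split}: {imgs} images, {lbls} labels [{status}]")
```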
Image Characteristics
Aspect Ratio
All images are 960×540 pixels (16:9 aspect ratio). This matches standard HD video format and is ideal for drone detection in wide outdoor scenes.
The training script uses `rect=True` to preserve the 16:9 aspect ratio during training. Without this, YOLO would pad images to square (960×960), wasting 44% of pixels on black padding.
Capture Conditions
The dataset includes diverse conditions:
- Different times of day (various lighting conditions)
- Multiple drone positions and angles
- Varying backgrounds (sky, buildings, trees)
- Different drone sizes (near/far from camera)
Memory Requirements
When training with `cache="ram"`, the dataset is loaded into system memory:
| Resource | Required |
|---|---|
| Disk space | ~650 MB (images only) |
| System RAM (with `cache="ram"`) | ~4 GB |
| GPU VRAM (training, batch=0.90) | ~3.5 GB (varies by model) |
Known Issues
Finding the bounding box annotations
Issue: Initially, the dataset appeared not to include bbox labels.
Resolution: The 11,387 YOLO-format `.txt` files are included in the original DeepSense download; they just need to be paired with images and organized into the YOLO directory structure. See the Known Issues page for more details.
Uneven distribution across capture sessions
Issue: The 51 capture sessions have very uneven counts (some have 1 image, others 1,000+).
Mitigation: Shuffling before splitting ensures the train/val/test sets have a reasonable mix of conditions. This hasn't caused problems in practice. See the Known Issues page for more details.
Next Steps
Once your dataset is set up:
- Verify the directory structure matches the expected format
- Confirm `data.yaml` points to the correct paths
- Start training using the Training Guide