Overview
The DecisionTreeRegressor class implements a decision tree algorithm for regression problems. It builds a tree by recursively splitting the data on the feature and threshold that most reduce the variance or error of the target values.
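The recursive splitting described above rests on one repeated step: for a given feature, find the threshold that leaves the least squared error in the two resulting groups. The following is a minimal sketch of that step for a single feature, not the library's implementation. `SplitResult`, `sse`, and `best_split` are hypothetical names introduced only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical helper type, not part of the library.
struct SplitResult {
    double threshold;     // samples with x <= threshold go to the left child
    double children_sse;  // summed squared error of both children (infinity if no split exists)
};

// Sum of squared deviations from the mean over v[begin, end).
double sse(const std::vector<double>& v, std::size_t begin, std::size_t end) {
    if (begin >= end) return 0.0;
    const double n = static_cast<double>(end - begin);
    const double mean = std::accumulate(v.begin() + begin, v.begin() + end, 0.0) / n;
    double total = 0.0;
    for (std::size_t i = begin; i < end; ++i) total += (v[i] - mean) * (v[i] - mean);
    return total;
}

// Exhaustive search over midpoints between consecutive distinct feature values.
SplitResult best_split(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);

    // Sort sample indices by feature value so each candidate split is a prefix/suffix cut.
    std::vector<std::size_t> order(x.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return x[a] < x[b]; });

    std::vector<double> xs, ys;
    for (std::size_t i : order) {
        xs.push_back(x[i]);
        ys.push_back(y[i]);
    }

    SplitResult best{xs.front(), std::numeric_limits<double>::infinity()};
    for (std::size_t i = 1; i < xs.size(); ++i) {
        if (xs[i] == xs[i - 1]) continue;  // identical values cannot be separated
        const double cost = sse(ys, 0, i) + sse(ys, i, ys.size());
        if (cost < best.children_sse) {
            best = SplitResult{0.5 * (xs[i - 1] + xs[i]), cost};
        }
    }
    return best;
}
```

A full tree builder would repeat this search over every feature, keep the best result, and recurse into the two children. This version recomputes the error from scratch for each candidate, which is quadratic in the number of samples; running sums would bring that down, at the cost of a less readable sketch.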
Constructor
The function to measure the quality of a split. Supported criteria are:
- Criterion::mse: Mean squared error (default for regression)
- Criterion::friedman_mse: Friedman’s improvement on MSE
- Criterion::mae: Mean absolute error
The maximum depth of the tree. If not set, nodes are expanded until all leaves contain fewer than min_samples_split samples.
The minimum number of samples required to split an internal node.
The minimum number of samples required to be at a leaf node.
A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
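The constructor parameters above act together as a gate on every candidate split. The sketch below shows one plausible way to combine them; it is an assumption, not the library's code. `StoppingParams`, its field names other than `min_samples_split`, the "negative means not set" convention, and the depth comparison are all hypothetical. The weighted form of the impurity decrease follows the convention documented for scikit-learn, and the source does not say whether this library weights the decrease the same way.

```cpp
#include <cassert>

// Hypothetical parameter bundle mirroring the descriptions above (not a confirmed API).
struct StoppingParams {
    int max_depth;                 // negative is used here to mean "not set"
    int min_samples_split;         // minimum samples in a node before it may be split
    int min_samples_leaf;          // minimum samples each child must keep
    double min_impurity_decrease;  // required impurity decrease for a split
};

// Impurity decrease weighted by the share of all training samples that reach the node
// (assumed convention, see the lead-in).
double weighted_impurity_decrease(int n_total, int n_node, int n_left, int n_right,
                                  double imp_node, double imp_left, double imp_right) {
    assert(n_total > 0 && n_node > 0 && n_left + n_right == n_node);
    const double node_share = static_cast<double>(n_node) / n_total;
    const double left_share = static_cast<double>(n_left) / n_node;
    const double right_share = static_cast<double>(n_right) / n_node;
    return node_share * (imp_node - left_share * imp_left - right_share * imp_right);
}

// Returns true only if the candidate split passes every constraint.
bool accept_split(const StoppingParams& p, int depth, int n_total, int n_node,
                  int n_left, int n_right,
                  double imp_node, double imp_left, double imp_right) {
    if (p.max_depth >= 0 && depth >= p.max_depth) return false;
    if (n_node < p.min_samples_split) return false;
    if (n_left < p.min_samples_leaf || n_right < p.min_samples_leaf) return false;
    const double decrease = weighted_impurity_decrease(
        n_total, n_node, n_left, n_right, imp_node, imp_left, imp_right);
    return decrease >= p.min_impurity_decrease;  // "greater than or equal to", as stated above
}
```

For instance, a node holding 4 of 8 training samples with impurity 3.0, split into two children of 2 samples with impurity 1.0 each, yields a weighted decrease of 0.5 * (3.0 - 0.5 - 0.5) = 1.0, so it passes a threshold of 0.5 and fails a threshold of 1.5.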
Methods
fit (primary method)
Train the decision tree regressor on continuous target data.
Training feature matrix where each inner vector represents a sample.
Target values (continuous, real-valued).
fit (string labels - not recommended)
Overridden method for compatibility with the base class. Not typically used for regression.
Training feature matrix where each inner vector represents a sample.
Target values as strings (will be converted to numeric internally).
predict (single sample)
Predict the target value for a single sample.
A single feature vector to predict.
The predicted continuous value.
predict (multiple samples)
Predict target values for multiple samples.
Feature matrix where each inner vector represents a sample.
Vector of predicted continuous values.
root
Get a pointer to the root node of the decision tree.
Pointer to the root TreeNode, or nullptr if the tree hasn’t been fitted.
classes
Get the class labels (not typically meaningful for regression).
Vector of label strings (empty or minimally populated for regression tasks).
Enumerations
Criterion
Criteria for measuring split quality in regression trees.
Criterion::mse, Criterion::friedman_mse, or Criterion::mae.
Mean Squared Error (MSE): Minimizes the average squared difference between predictions and actual values. Best for normally distributed errors.
Friedman MSE: Mean squared error combined with Friedman’s improvement score for ranking candidate splits. It can lead to better splits than plain MSE in some cases.
Mean Absolute Error (MAE): Minimizes the average absolute difference. More robust to outliers than MSE.
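To make the robustness claim concrete, the sketch below computes both error measures for a small set of target values containing one outlier. It only illustrates the two formulas; it does not show how the library evaluates a criterion internally. All function names here are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean: the constant prediction that minimizes squared error.
double mean_of(const std::vector<double>& v) {
    assert(!v.empty());
    return std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
}

// Median: a constant prediction that minimizes absolute error.
double median_of(std::vector<double> v) {  // taken by value because sorting reorders it
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t mid = v.size() / 2;
    return (v.size() % 2 == 1) ? v[mid] : 0.5 * (v[mid - 1] + v[mid]);
}

// Mean squared error of predicting the same constant for every value.
double mse(const std::vector<double>& v, double prediction) {
    assert(!v.empty());
    double total = 0.0;
    for (double y : v) total += (y - prediction) * (y - prediction);
    return total / static_cast<double>(v.size());
}

// Mean absolute error of predicting the same constant for every value.
double mae(const std::vector<double>& v, double prediction) {
    assert(!v.empty());
    double total = 0.0;
    for (double y : v) total += std::fabs(y - prediction);
    return total / static_cast<double>(v.size());
}
```

For the targets {1, 2, 3, 4, 100}, the mean is 22 and the median is 3. The single outlier pulls the mean far from the bulk of the data and contributes 6084 of the 7610 total squared error, whereas under absolute error it counts only linearly, which is the sense in which MAE is more robust.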
Data structures
TreeNode
Represents a node in the decision tree.
Whether this node is a leaf node.
Index of the feature used for splitting at this node (for internal nodes).
Threshold value for the split (for internal nodes).
The predicted value stored at this node (mean of training samples for regression).
Not used for regression tasks.
Not used for regression tasks.
Pointer to the left child node (samples where feature <= threshold).
Pointer to the right child node (samples where feature > threshold).
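The field descriptions above imply a simple prediction walk: start at the root, compare the sample's feature against the threshold, descend left or right, and return the stored value on reaching a leaf. The sketch below illustrates that walk on a stand-in struct. The struct name `Node` and every field name in it are hypothetical, since the actual TreeNode member names are not given here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for TreeNode with assumed field names (illustration only).
struct Node {
    bool is_leaf;               // whether this node is a leaf
    std::size_t feature_index;  // feature used for the split (internal nodes)
    double threshold;           // split threshold (internal nodes)
    double value;               // predicted value stored at this node
    const Node* left;           // child for samples where feature <= threshold
    const Node* right;          // child for samples where feature > threshold
};

// Walk from the root to a leaf and return the leaf's stored value.
double predict_one(const Node* node, const std::vector<double>& sample) {
    assert(node != nullptr);
    while (!node->is_leaf) {
        assert(node->feature_index < sample.size());
        node = (sample[node->feature_index] <= node->threshold) ? node->left
                                                                : node->right;
        assert(node != nullptr);  // internal nodes are expected to have both children
    }
    return node->value;
}

// Depth of the tree below a node: 0 for a leaf or a null pointer.
int depth_of(const Node* node) {
    if (node == nullptr || node->is_leaf) return 0;
    const int l = depth_of(node->left);
    const int r = depth_of(node->right);
    return 1 + (l > r ? l : r);
}
```

With a root that splits feature 0 at 6.5 and two leaves storing 1.0 and 5.0, a sample with feature value 6.5 lands in the left leaf, because the left branch takes values less than or equal to the threshold.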
Example usage
Comparison with DecisionTreeClassifier
| Feature | DecisionTreeRegressor | DecisionTreeClassifier |
|---|---|---|
| Task | Regression (continuous values) | Classification (discrete classes) |
| Default criterion | Criterion::mse | Criterion::gini |
| Output type | double (continuous) | double (class code) or std::string (class label) |
| Additional methods | None | predict_class(), predict_proba() |
| Leaf node value | Mean of training samples | Most common class |