@@ -50,12 +50,12 @@ For example, if the training step of a random forest model `ranger::ranger()` ta
When the same model takes 1 second to train, the overhead introduced by mlr3 is only 1%.
Instead of using real models, we simulate the training and prediction time for models by sleeping for 1, 10, 100, and 1000 ms.

- We start by measuring the runtime of the `$train()` methods of the learner.
+ We start by measuring the runtime of the `$train()` method of the learner.
For models with a training time of 1000 and 100 ms, the overhead introduced by mlr3 is minimal.
Models with a training time of 10 ms take 2 times longer to train in mlr3.
For models with a training time of 1 ms, the overhead is approximately 10 times larger than the actual model training time.
The overhead of `$predict()` is similar to `$train()` and the size of the dataset being predicted plays only a minor role.
- The `$predict_newdata()` methods converts the data to a task and then predicts on it which doubles the overhead of the `$predict()` method.
+ The `$predict_newdata()` method converts the data to a task and then predicts on it which doubles the overhead of the `$predict()` method.
The recently introduced `$predict_newdata_fast()` method is much faster than `$predict_newdata()`.
For models with a prediction time of 10 ms, the overhead is around 10%.
For models with a prediction time of 1 ms, the overhead is around 50%.
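To make the sleep-based setup concrete, here is a minimal sketch of how such a learner could look and how the overhead of `$train()` could be measured. This is an illustrative assumption, not the code used in the benchmark: the class name `LearnerClassifSleep`, the `sleep_time` field, and the use of `bench::mark()` are hypothetical choices; only the idea of replacing model fitting with `Sys.sleep()` comes from the text.

```r
library(mlr3)
library(R6)

# Hypothetical learner whose train and predict steps only sleep:
# anything measured beyond the sleep time is mlr3 overhead.
LearnerClassifSleep = R6Class("LearnerClassifSleep",
  inherit = LearnerClassif,
  public = list(
    sleep_time = NULL,

    initialize = function(sleep_time = 0.1) {
      self$sleep_time = sleep_time
      super$initialize(
        id = "classif.sleep",
        feature_types = c("logical", "integer", "numeric", "factor"),
        predict_types = "response"
      )
    }
  ),
  private = list(
    .train = function(task) {
      Sys.sleep(self$sleep_time)         # simulated model fitting
      list(label = task$class_names[1])  # trivial "model"
    },
    .predict = function(task) {
      Sys.sleep(self$sleep_time)         # simulated prediction
      list(response = rep(self$model$label, task$nrow))
    }
  )
)

task = tsk("spam")
learner = LearnerClassifSleep$new(sleep_time = 0.01)  # 10 ms "model"

# Median runtime minus the 10 ms sleep approximates the mlr3 overhead.
bench::mark(learner$train(task), iterations = 10, check = FALSE)
```

The dashed `model_time` reference line in the `plot_runtime()` figures plays the same role: any runtime above that line is overhead.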
@@ -115,7 +115,8 @@ create_table = function(data) {
task = "Task Size",
median_runtime = "Median Runtime [ms]",
k = "K") %>%
- fmt_number(columns = c("k", "median_runtime"), n_sigfig = 2, sep_mark = "") %>%
+ fmt_number(columns = c("median_runtime"), decimals = 0, sep_mark = "") %>%
+ fmt_number(columns = c("k"), n_sigfig = 2, sep_mark = "") %>%
tab_style(
style = list(
cell_fill(color = "crimson"),
@@ -156,7 +157,7 @@ data_runtime = data_runtime[, -c("renv_project")]
```

The runtime and memory usage of the `$train()` method is measured for different mlr3 versions.
- The train step is performed for different amounts of time (1 ms, 10 ms, 100 ms, and 1000 ms) on the spam dataset with 1000 instances.
+ The train step is performed for different amounts of time (1 ms, 10 ms, 100 ms, and 1000 ms) on the spam dataset with 1000 observations.

```{r}
#| eval: false
@@ -208,7 +209,7 @@ plot_runtime = function(data) {
geom_col(group = 1, fill = "#008080") +
geom_errorbar(aes(ymin = pmax(median_runtime - mad_runtime, 0), ymax = median_runtime + mad_runtime), width = 0.5, position = position_dodge(0.9)) +
geom_hline(aes(yintercept = model_time), linetype = "dashed") +
- facet_wrap(~task, scales = "free_y", labeller = labeller(task = function(value) sprintf("%s Instances", value))) +
+ facet_wrap(~task, scales = "free_y", labeller = labeller(task = function(value) sprintf("%s Observations", value))) +
labs(x = "mlr3Version", y = "Runtime [ms]") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
@@ -228,7 +229,8 @@ create_table = function(data) {
task = "Task Size",
median_runtime = "Median Runtime [ms]",
k = "K") %>%
- fmt_number(columns = c("k", "median_runtime"), n_sigfig = 2) %>%
+ fmt_number(columns = c("median_runtime"), decimals = 0, sep_mark = "") %>%
+ fmt_number(columns = c("k"), n_sigfig = 2, sep_mark = "") %>%
tab_style(
style = list(
cell_fill(color = "crimson"),
@@ -529,7 +531,7 @@ data_runtime = merge(data_runtime, data_memory, by = c("task", "evals", "mlr3",
```

The runtime and memory usage of the `resample()` function is measured for different mlr3 versions.
- The models are trained for different amounts of time (1 ms, 10 ms, 100 ms, and 1000 ms) on the spam dataset with 1000 and 10,000 instances.
+ The models are trained for different amounts of time (1 ms, 10 ms, 100 ms, and 1000 ms) on the spam dataset with 1000 and 10,000 observations.
The resampling iterations (`evals`) are set to 1000, 100, and 10.

```{r}
@@ -607,7 +609,7 @@ data_runtime = merge(data_runtime, data_memory, by = c("task", "evals", "mlr3",
```

The runtime and memory usage of the `benchmark()` function is measured for different mlr3 versions.
- The models are trained for different amounts of time (1 ms, 10 ms, 100 ms, and 1000 ms) on the spam dataset with 1000 and 10,000 instances.
+ The models are trained for different amounts of time (1 ms, 10 ms, 100 ms, and 1000 ms) on the spam dataset with 1000 and 10,000 observations.
The resampling iterations (`evals`) are set to 1000, 100, and 10.

```{r}
@@ -762,7 +764,7 @@ data_runtime = merge(data_runtime, data_runtime_2, by = c("task", "evals", "mlr3

The runtime and memory usage of the `resample()` function with `future::multisession` parallelization is measured for different mlr3 versions.
The parallelization is conducted on 10 cores.
- The models are trained for different amounts of time (1 ms, 10 ms, 100 ms, and 1000 ms) on the spam dataset with 1000 and 10,000 instances.
+ The models are trained for different amounts of time (1 ms, 10 ms, 100 ms, and 1000 ms) on the spam dataset with 1000 and 10,000 observations.
The resampling iterations (`evals`) are set to 1000, 100, and 10.

```{r}
@@ -858,7 +860,7 @@ data_runtime = merge(data_runtime, data_runtime_2, by = c("task", "evals", "mlr3

The runtime and memory usage of the `benchmark()` function with `future::multisession` parallelization is measured for different mlr3 versions.
The parallelization is conducted on 10 cores.
- The models are trained for different amounts of time (1 ms, 10 ms, 100 ms, and 1000 ms) on the spam dataset with 1000 and 10,000 instances.
+ The models are trained for different amounts of time (1 ms, 10 ms, 100 ms, and 1000 ms) on the spam dataset with 1000 and 10,000 observations.
The resampling iterations (`evals`) are set to 1000, 100, and 10.

```{r}
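For the parallelized measurements, a rough sketch of the setup described above could look as follows. Only `future::multisession`, the 10 workers, and `resample()` itself are taken from the text; the sleeping learner (see the earlier sketch), the `subsampling` resampling with 100 iterations, and the measure are illustrative assumptions.

```r
library(mlr3)
library(future)

# 10 workers, as stated in the text.
plan(multisession, workers = 10)

task = tsk("spam")
learner = LearnerClassifSleep$new(sleep_time = 0.1)  # 100 ms "model" (sketch above)
resampling = rsmp("subsampling", repeats = 100)      # 100 resampling iterations

# mlr3 dispatches the resampling iterations to the future workers.
rr = resample(task, learner, resampling)
rr$aggregate(msr("classif.ce"))

plan(sequential)  # restore sequential execution
```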