NEWS.md
- data.frame filter values are now returned in a long (tidy) tibble. This makes it easier to apply post-processing methods (like group_by(), etc.) (@pat-s, #2456)
- benchmark() does not store the tuning results ($extract slot) anymore by default. If you want to keep this slot (e.g. for post-tuning analysis), set keep.extract = TRUE. This change was made because BenchmarkResult objects with extensive tuning got very large (~ GB), which can cause memory problems during runtime if multiple benchmark() calls are executed on HPCs.
- benchmark() does not store the created models ($models slot) anymore by default. The reason is the same as for the $extract slot above. Storing can be enabled using models = TRUE.
- generateFeatureImportanceData() gains argument show.info, which shows the name of the current feature being calculated, its index in the queue and the elapsed time for each feature (@pat-s, #26222)
- classif.liquidSVM and regr.liquidSVM have been removed because liquidSVM has been removed from CRAN.
- data.tables default in rbindlist(). See #2578 for more information. (@mllg, #2579)
- regr.randomForest gains three new methods to estimate the standard error (see ?regr.randomForest for more details):
  - se.method = "jackknife"
  - se.method = "bootstrap"
  - se.method = "sd"
- regr.ranger relies on the functions provided by the package (“jackknife” and “infjackknife” (default))
- regr.gbm now supports quantile distribution (@bthieurmel, #2603)
- classif.plsdaCaret now supports multiclass classification (@GegznaV, #2621)
- getClassWeightParam() now also works for Wrapper* Models and ensemble models (@ja-thomas, #891)
- getLearnerNote() to query the “Note” slot of a learner (@alona-sydorova, #2086)
- e1071::svm() now only uses the formula interface if factors are present. This change is supposed to prevent the “stack overflow” issues some users encountered when using large datasets. See #1738 for more information. (@mb706, #1740)
- cluster.MiniBatchKmeans from package ClusterR (@Prasiddhi, #2554)
- plotHyperParsEffect() now supports facet visualization of hyperparameter effects for nested CV (@MasonGallo, #1653)
- options(on.learner.error) was not respected in benchmark(). This caused benchmark() to stop even if it should have continued including FailureModels in the result (@dagola, #1984)
- praznik_mrmr also supports regr and surv tasks
- plotFilterValues() is now a bit “smarter” and easier to use regarding the ordering of multiple facets. (@pat-s, #2456)
- filterFeatures(), generateFilterValuesData() and makeFilterWrapper() gained new examples. (@pat-s, #2456)
- makeResampleDesc(fixed = TRUE)) (@pat-s, #2412).
- Task help pages are now split into separate ones, e.g. RegrTask, ClassifTask (@pat-s, #2564)
- deleteCacheDir(): Clear the default mlr cache directory (@pat-s, #2463)
- getCacheDir(): Return the default mlr cache directory (@pat-s, #2463)
- getResamplingIndices(inner = TRUE) now correctly returns the inner indices (previously, inner indices referred to the subset of the respective outer level train set) (@pat-s, #2413).
- fw.perc, fw.abs or fw.threshold. It can be triggered with the new cache argument in makeFilterWrapper() or filterFeatures() (@pat-s, #2463).
- FSelectorRcpp_gain.ratio, FSelectorRcpp_information.gain and FSelectorRcpp_symmetrical.uncertainty from package FSelectorRcpp. These filters are ~ 100 times faster than the implementation of the FSelector pkg. Please note that the two implementations work slightly differently internally and the FSelectorRcpp methods should not be seen as a direct replacement for the FSelector pkg.

Additionally, filter names have been harmonized using the following scheme:

- information.gain -> FSelector_information.gain
- gain.ratio -> FSelector_gain.ratio
- symmetrical.uncertainty -> FSelector_symmetrical.uncertainty
- chi.squared -> FSelector_chi.squared
- relief -> FSelector_relief
- oneR -> FSelector_oneR
- randomForestSRC.rfsrc -> randomForestSRC_importance
- randomForestSRC.var.select -> randomForestSRC_var.select
- randomForest.importance -> randomForest_importance
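
For scripts that still refer to filters by their old names, the renaming scheme listed above can be applied programmatically. The following is a minimal sketch in base R, not part of mlr: the lookup vector `old_to_new` and the helper `translate_filter_name()` are hypothetical names introduced here for illustration. Names that do not appear in the table are returned unchanged.

```r
# Lookup table built from the old -> new filter names listed above.
# `old_to_new` and `translate_filter_name()` are illustrative, not mlr API.
old_to_new <- c(
  "information.gain"           = "FSelector_information.gain",
  "gain.ratio"                 = "FSelector_gain.ratio",
  "symmetrical.uncertainty"    = "FSelector_symmetrical.uncertainty",
  "chi.squared"                = "FSelector_chi.squared",
  "relief"                     = "FSelector_relief",
  "oneR"                       = "FSelector_oneR",
  "randomForestSRC.rfsrc"      = "randomForestSRC_importance",
  "randomForestSRC.var.select" = "randomForestSRC_var.select",
  "randomForest.importance"    = "randomForest_importance"
)

# Translate a character vector of filter names; unknown names pass through.
translate_filter_name <- function(x) {
  hit <- x %in% names(old_to_new)
  x[hit] <- unname(old_to_new[x[hit]])
  x
}

# Example: two old names and one name that is not in the table
translate_filter_name(c("gain.ratio", "oneR", "praznik_mrmr"))
```

A plain named vector keeps the mapping easy to audit against the list above; the translated names could then be passed on to whatever filter argument the calling code uses.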
"h2o.use.data.table" = TRUE is now the default (@j-hartshorn, #2508)x.bit.names that stores the optimal bitsx now always contains the real feature names and not the bit.namesmakeFeatSelWrapper usable with custom bit.names.sffs crashed in some cases (@bmihaljevic, #2486)resample.fun to specify a custom resampling function to use.