HIVE-COTE 2.0 unpublished results

Introduction

This page includes results we did not include in the paper and experiments we have run since the paper's publication. When displaying these results, we default to the tsml implementation of HC2 but may use the aeon implementation if it is more relevant, i.e., if we are testing a new component which is implemented in aeon but not tsml, such as variations on ROCKET or TSFresh.

tsml vs aeon

We provide two implementations of the HIVE-COTE 2.0 algorithm: a Java implementation in the tsml package and a Python implementation in the aeon package. We compare both implementations of HC2 and its components (as of 18/03/2022). As the aeon package develops, its implementations may become more efficient.

tsml results:

Metric                        HC2       DrCIF     TDE       STC       Arsenal
Accuracy                      0.8912    0.8637    0.8589    0.8585    0.8659
Average Train Time (Minutes)  187.1848  24.3228   40.3987   66.823    14.9539
Total Train Time (Hours)      349.4112  45.4025   75.4109   124.7362  27.9139

aeon results:

Metric                        HC2       DrCIF     TDE       STC       Arsenal
Accuracy                      0.8926    0.8637    0.8609    0.8555    0.8657
Average Train Time (Minutes)  275.856   85.3483   45.2662   125.9338  5.0892
Total Train Time (Hours)      514.9313  159.3168  84.497    235.0764  9.4999

None of the classifiers are significantly different from their counterparts in the other language in terms of accuracy. The Python implementation is slower overall, with Arsenal the one component that trains faster in aeon, and its STC is contracted for 2 hours instead of 1. Both HC2 implementations are capable of contracting the full classifier.
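To make the size of the gap concrete, the short sketch below divides the aeon average train times by the tsml ones. The numbers are copied from the two tables above; the function and variable names are illustrative only and are not part of either package.

```python
# Average train times in minutes, copied from the tsml and aeon tables above.
tsml_minutes = {"HC2": 187.1848, "DrCIF": 24.3228, "TDE": 40.3987,
                "STC": 66.823, "Arsenal": 14.9539}
aeon_minutes = {"HC2": 275.856, "DrCIF": 85.3483, "TDE": 45.2662,
                "STC": 125.9338, "Arsenal": 5.0892}


def train_time_ratio(tsml, aeon):
    """Return aeon train time divided by tsml train time per classifier.

    A ratio above 1 means the aeon version took longer on average.
    """
    return {name: aeon[name] / tsml[name] for name in tsml}


ratios = train_time_ratio(tsml_minutes, aeon_minutes)
for name, ratio in ratios.items():
    print(f"{name:8s} aeon/tsml = {ratio:.2f}")
```

A ratio below 1 for Arsenal and above 1 for the other classifiers matches the reading of the tables given above.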

Cross-validation vs out-of-bag train accuracy estimates

In HIVE-COTE 2.0 we replace the cross-validation train accuracy estimates with out-of-bag estimates, which require only a single model. Here we compare both estimates for the HC2 components. Except for TDE, out-of-bag estimates are noticeably faster. TDE evaluates all members of its ensemble using leave-one-out cross-validation as part of the building process; by retaining those estimates, the full ensemble estimate is essentially free.
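As a rough illustration of why an out-of-bag estimate needs no extra model builds, the sketch below bags a toy nearest-centroid classifier and scores each train case using only the ensemble members that did not see it. This is a generic bagging example under our own assumptions, not the tsml or aeon code: the HC2 components each have their own estimation procedure, and all function names here are hypothetical.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Return the mean vector of each class (a toy nearest-centroid model)."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Predict the label of the closest centroid (squared Euclidean distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=distance)


def oob_accuracy(X, y, n_estimators=50, seed=0):
    """Estimate train accuracy from the out-of-bag votes of a bagged ensemble."""
    rng = random.Random(seed)
    n = len(X)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_estimators):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(bag)
        model = fit_centroids([X[i] for i in bag], [y[i] for i in bag])
        # Each member only votes on the cases it did not train on.
        for i in range(n):
            if i not in in_bag:
                votes[i][predict(model, X[i])] += 1
    scored = [i for i in range(n) if votes[i]]
    if not scored:
        raise ValueError("no case was ever out of bag; use more estimators")
    correct = sum(votes[i].most_common(1)[0][0] == y[i] for i in scored)
    return correct / len(scored)


# Toy two-class data: two well separated Gaussian blobs.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(30)]
X += [[data_rng.gauss(4, 1), data_rng.gauss(4, 1)] for _ in range(30)]
y = [0] * 30 + [1] * 30
estimate = oob_accuracy(X, y)
print(f"out-of-bag train accuracy estimate: {estimate:.3f}")
```

The estimate falls out of the same ensemble that is used for prediction, whereas a k-fold cross-validation estimate would need k further builds on top of the final model.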

Out-of-bag accuracy estimates:

Metric                        Arsenal-oob  TDE-oob      DrCIF-oob    STC-oob
Train Accuracy                0.8542       0.8475       0.8422       0.8709
Difference to Test Accuracy   -0.0117      -0.0114      -0.0215      0.0124
Average Train Time (Minutes)  8.9383       3.564        15.6845      11.4425
Total Train Time (Hours)      8.7794       2.4126       4.6385       5.1934

Cross-validation accuracy estimates:

Metric                        Arsenal-cv   TDE-cv       DrCIF-cv     STC-cv
Train Accuracy                0.8621       0.8866       0.8546       0.8823
Difference to Test Accuracy   -0.0038      0.0277       -0.0091      0.0238
Average Train Time (Minutes)  67.2417      0.0002       31.2117      102.5649
Total Train Time (Hours)      65.0597      0.00004      7.1529       42.5626

We compare four versions of HIVE-COTE 2.0. HC2-oob is the version used in our publication, where each component uses out-of-bag error. For HC2-cv, all components use cross-validation. HC2-fastest uses the fastest train estimate method for each component, which is out-of-bag error for all components except TDE. Lastly, HC2-closest uses, for each component, the train accuracy estimate closest to the actual test accuracy: out-of-bag estimates for TDE and STC, and cross-validation estimates for DrCIF and Arsenal.
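The make-up of HC2-fastest and HC2-closest can be checked directly against the two tables above. The sketch below copies the "Difference to Test Accuracy" and "Average Train Time (Minutes)" rows and picks, per component, the method with the smallest train time and the smallest absolute difference. The helper name `pick` is ours and purely illustrative.

```python
# Per component and method: (difference to test accuracy, average train minutes),
# copied from the out-of-bag and cross-validation tables above.
estimates = {
    "Arsenal": {"oob": (-0.0117, 8.9383), "cv": (-0.0038, 67.2417)},
    "TDE": {"oob": (-0.0114, 3.564), "cv": (0.0277, 0.0002)},
    "DrCIF": {"oob": (-0.0215, 15.6845), "cv": (-0.0091, 31.2117)},
    "STC": {"oob": (0.0124, 11.4425), "cv": (0.0238, 102.5649)},
}


def pick(estimates, score):
    """For each component, return the method that minimises score(values)."""
    return {
        name: min(methods, key=lambda method: score(methods[method]))
        for name, methods in estimates.items()
    }


# HC2-fastest: smallest average train time per component.
fastest = pick(estimates, score=lambda values: values[1])
# HC2-closest: smallest absolute difference to test accuracy per component.
closest = pick(estimates, score=lambda values: abs(values[0]))

print("fastest:", fastest)
print("closest:", closest)
```

Both selections agree with the description above: cross-validation is only the fastest option for TDE, and it is the closest option for DrCIF and Arsenal.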

Of the HC2 variants, HC2-fastest is unsurprisingly the fastest. Unexpectedly, it is also significantly better than HC2-oob, the out-of-bag estimate version.

Metric                        HC2-fastest  HC2-cv       HC2-oob      HC2-closest
Accuracy                      0.8917       0.892        0.8912       0.8912
Average Train Time (Minutes)  183.4415     348.9466     187.1848     261.1385
Total Train Time (Hours)      342.4237     651.3661     349.4112     487.458

Upgraded STC for multivariate data

With the exception of TDE, which did not have multivariate capabilities prior to HC2, all component classifiers use the multivariate versions used in the 2021 multivariate time series bake off. STC created an ensemble of classifiers, one built on each dimension of the dataset. Given that each classifier is also contracted for an hour, train time can get out of hand quite quickly. DuckDuckGeese has 1345 dimensions, so in theory it would take 56 days to train if each ensemble member was trained sequentially, not including the required train accuracy estimate.
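The 56 day figure follows from simple arithmetic, sketched below under the assumption stated above that every per-dimension ensemble member uses its full one hour contract and that members are trained one after another. The function name is illustrative only.

```python
HOURS_PER_DAY = 24


def sequential_train_days(n_dimensions, contract_hours_per_member=1.0):
    """Worst-case train time in days when one contracted classifier is built
    per dimension and the builds run sequentially."""
    return n_dimensions * contract_hours_per_member / HOURS_PER_DAY


# DuckDuckGeese has 1345 dimensions.
days = sequential_train_days(1345)
print(f"{days:.1f} days")  # 1345 / 24 is roughly 56.0 days
```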

We test a faster solution, where the dimension for each shapelet extracted in the shapelet transform portion of the classifier is randomly selected. We find no significant difference in terms of accuracy between these methods, but the new multivariate STC is significantly faster.

Metric                        HC2-NewSTC  HC2         NewSTC      STC
Accuracy                      0.7445      0.7448      0.7123      0.7032
Average Train Time (Hours)    21.6148     439.1926    2.3552      269.988
Total Train Time (Days)       23.4161     475.792     2.3957      292.487
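The idea of choosing a random dimension per shapelet, described above, can be sketched as follows. This is a minimal illustration under our own assumptions and not the tsml or aeon implementation: it only shows candidate sampling, with a hypothetical function name and parameters, and leaves out shapelet quality assessment, the transform itself and the final classifier.

```python
import random


def sample_shapelet_candidates(series, n_shapelets, min_length=3, seed=0):
    """Draw shapelet candidates from one multivariate series.

    series is a list of dimensions, each a list of floats of equal length.
    Every candidate comes from a single randomly selected dimension, so one
    transform covers all dimensions instead of one classifier per dimension.
    Returns a list of (dimension, start, values) tuples.
    """
    rng = random.Random(seed)
    n_dims = len(series)
    series_length = len(series[0])
    candidates = []
    for _ in range(n_shapelets):
        dim = rng.randrange(n_dims)  # random dimension per shapelet
        length = rng.randint(min_length, series_length)
        start = rng.randint(0, series_length - length)
        candidates.append((dim, start, series[dim][start:start + length]))
    return candidates


# Toy series with 5 dimensions and 8 time points.
series = [[float(d * 10 + t) for t in range(8)] for d in range(5)]
candidates = sample_shapelet_candidates(series, n_shapelets=20)
for dim, start, values in candidates[:3]:
    print(dim, start, values)
```

With this scheme the number of shapelets, rather than the number of dimensions, drives the cost of the search, which is consistent with the train time reduction shown in the table above.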

ROCKET variants for Arsenal

Feature based representation