HIVE-COTE 2.0 unpublished results

Introduction

This page includes results we did not include in the paper and experiments we have run since the paper's publication. When displaying these results, we default to the tsml implementation of HC2 but may use the aeon implementation if it is more relevant, i.e., if we are testing a new component that is implemented in aeon but not tsml, such as variations on ROCKET or TSFresh.

tsml vs aeon

We provide two implementations of the HIVE-COTE 2.0 algorithm: a Java implementation in the tsml package and a Python implementation in the aeon package. Here we compare both implementations of HC2 and its components (as of 18/03/2022). As the aeon package develops, its implementations may become more efficient.

tsml results:

	HC2	DrCIF	TDE	STC	Arsenal
Accuracy	0.8912	0.8637	0.8589	0.8585	0.8659
Average Train Time (Minutes)	187.1848	24.3228	40.3987	66.823	14.9539
Total Train Time (Hours)	349.4112	45.4025	75.4109	124.7362	27.9139

aeon results:

	HC2	DrCIF	TDE	STC	Arsenal
Accuracy	0.8926	0.8637	0.8609	0.8555	0.8657
Average Train Time (Minutes)	275.856	85.3483	45.2662	125.9338	5.0892
Total Train Time (Hours)	514.9313	159.3168	84.497	235.0764	9.4999

None of the classifiers are significantly different from their counterparts in the other language in terms of accuracy. The Python implementation is slower overall, although Arsenal trains faster in aeon than in tsml, and its STC is contracted for 2 hours instead of 1. Both HC2 implementations are capable of contracting the full classifier.
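To make the timing comparison concrete, the short sketch below divides the aeon total train time by the tsml total train time for each classifier. The figures are copied from the two tables above; the function name `slowdown` is only an illustrative choice.

```python
# Total train times in hours, copied from the tsml and aeon tables above.
tsml_hours = {"HC2": 349.4112, "DrCIF": 45.4025, "TDE": 75.4109,
              "STC": 124.7362, "Arsenal": 27.9139}
aeon_hours = {"HC2": 514.9313, "DrCIF": 159.3168, "TDE": 84.497,
              "STC": 235.0764, "Arsenal": 9.4999}


def slowdown(tsml, aeon):
    """Return aeon train time divided by tsml train time per classifier."""
    return {name: aeon[name] / tsml[name] for name in tsml}


ratios = slowdown(tsml_hours, aeon_hours)
for name, ratio in ratios.items():
    # A ratio above 1 means the aeon version took longer to train.
    print(f"{name}: {ratio:.2f}x")
```

A ratio below 1 appears only for Arsenal; every other classifier, and HC2 as a whole, takes longer in aeon.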

Cross-validation vs out-of-bag train accuracy estimates

In HIVE-COTE 2.0 we replace the cross-validation train accuracy estimates with out-of-bag estimates, which require only a single model. Here we compare both estimates for the HC2 components. Except for TDE, out-of-bag estimates are noticeably faster. TDE evaluates all members of its ensemble using leave-one-out cross-validation as part of the building process, so by retaining those estimates the full ensemble estimate is essentially free.
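As a rough illustration of why an out-of-bag estimate needs only a single ensemble build, the sketch below bags a toy nearest-centroid learner and scores each training case using only the members that did not see it. This is a generic bagging sketch, not the tsml or aeon code: the base learner, the function names and the synthetic data are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def fit_centroids(X, y):
    """Toy base learner: one mean vector per class (stand-in for a real component)."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Predict the class whose centroid is nearest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))


def oob_accuracy(X, y, n_estimators=50, seed=0):
    """Build a bagged ensemble once and estimate train accuracy from out-of-bag votes."""
    rng = random.Random(seed)
    n = len(X)
    votes = defaultdict(Counter)  # case index -> votes from members that never saw it
    for _ in range(n_estimators):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        in_bag = set(bag)
        model = fit_centroids([X[i] for i in bag], [y[i] for i in bag])
        for i in range(n):
            if i not in in_bag:
                votes[i][predict(model, X[i])] += 1
    scored = [i for i in range(n) if votes[i]]  # cases with at least one OOB vote
    correct = sum(votes[i].most_common(1)[0][0] == y[i] for i in scored)
    return correct / len(scored), len(scored)


# Synthetic two-class data, purely for demonstration.
data_rng = random.Random(1)
X = ([[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(30)]
     + [[data_rng.gauss(4, 1), data_rng.gauss(4, 1)] for _ in range(30)])
y = [0] * 30 + [1] * 30
acc, n_scored = oob_accuracy(X, y)
print(f"OOB accuracy {acc:.3f} over {n_scored} cases")
```

A k-fold cross-validation estimate would instead refit the whole ensemble k times on top of the final build, which is where the extra train time comes from.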

Out-of-bag accuracy estimates:

	Arsenal-oob	TDE-oob	DrCIF-oob	STC-oob
Train Accuracy	0.8542	0.8475	0.8422	0.8709
Difference to Test Accuracy	-0.0117	-0.0114	-0.0215	0.0124
Average Train Time (Minutes)	8.9383	3.564	15.6845	11.4425
Total Train Time (Hours)	8.7794	2.4126	4.6385	5.1934

Cross-validation accuracy estimates:

	Arsenal-cv	TDE-cv	DrCIF-cv	STC-cv
Train Accuracy	0.8621	0.8866	0.8546	0.8823
Difference to Test Accuracy	-0.0038	0.0277	-0.0091	0.0238
Average Train Time (Minutes)	67.2417	0.0002	31.2117	102.5649
Total Train Time (Hours)	65.0597	0.00004	7.1529	42.5626

We compare four versions of HIVE-COTE 2.0. HC2-oob is the version used in our publication, where each component uses out-of-bag error. For HC2-cv, all components use cross-validation. HC2-fastest uses the fastest train estimate method for each component, with all except TDE using out-of-bag error. Lastly, HC2-closest uses, for each component, the method whose train accuracy estimate is closest to the actual test accuracy, giving out-of-bag estimates for TDE and STC and cross-validation estimates for DrCIF and Arsenal.
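The HC2-fastest and HC2-closest configurations can be reproduced from the component tables above. The sketch below is only a bookkeeping aid: the figures are copied from those tables, and the helper name `pick` is hypothetical.

```python
# Per-component figures copied from the out-of-bag and cross-validation tables:
# (difference to test accuracy, average train time in minutes).
estimates = {
    "Arsenal": {"oob": (-0.0117, 8.9383), "cv": (-0.0038, 67.2417)},
    "TDE": {"oob": (-0.0114, 3.564), "cv": (0.0277, 0.0002)},
    "DrCIF": {"oob": (-0.0215, 15.6845), "cv": (-0.0091, 31.2117)},
    "STC": {"oob": (0.0124, 11.4425), "cv": (0.0238, 102.5649)},
}


def pick(estimates, key):
    """For each component, return the estimate method that minimises `key`."""
    return {name: min(options, key=lambda method: key(options[method]))
            for name, options in estimates.items()}


# HC2-closest: smallest absolute gap between train estimate and test accuracy.
closest = pick(estimates, key=lambda v: abs(v[0]))
# HC2-fastest: smallest average train time.
fastest = pick(estimates, key=lambda v: v[1])

print(closest)  # {'Arsenal': 'cv', 'TDE': 'oob', 'DrCIF': 'cv', 'STC': 'oob'}
print(fastest)  # {'Arsenal': 'oob', 'TDE': 'cv', 'DrCIF': 'oob', 'STC': 'oob'}
```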

Of the HC2 variants, HC2-fastest is unsurprisingly the fastest. Unexpectedly, it is also significantly better than the out-of-bag estimate version.

	HC2-fastest	HC2-cv	HC2-oob	HC2-closest
Accuracy	0.8917	0.892	0.8912	0.8912
Average Train Time (Minutes)	183.4415	348.9466	187.1848	261.1385
Total Train Time (Hours)	342.4237	651.3661	349.4112	487.458

Upgraded STC for multivariate data

With the exception of TDE, which did not have multivariate capabilities prior to HC2, all component classifiers use the multivariate versions used in the 2021 multivariate time series bake off. STC created an ensemble of classifiers, one built on each dimension of the dataset. Given that each classifier is also contracted for an hour, this can get out of hand quite quickly. DuckDuckGeese has 1345 dimensions, so in theory it would take 56 days to train if each ensemble member was trained sequentially, not including the required train accuracy estimate.
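The 56-day figure follows directly from one contracted hour per dimension. A minimal check, with a hypothetical helper name and the one-hour contract taken from the text above:

```python
def sequential_train_days(n_dimensions, contract_hours_per_member=1.0):
    """Days needed to train one contracted ensemble member per dimension, in sequence."""
    total_hours = n_dimensions * contract_hours_per_member
    return total_hours / 24


days = sequential_train_days(1345)  # DuckDuckGeese has 1345 dimensions
print(f"{days:.1f} days")  # 56.0 days
```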

We test a faster solution, where the dimension for each shapelet extracted in the shapelet transform portion of the classifier is randomly selected. We find no significant difference in accuracy between these methods, but the new multivariate STC is significantly faster.

	HC2-NewSTC	HC2	NewSTC	STC
Accuracy	0.7445	0.7448	0.7123	0.7032
Average Train Time (Hours)	21.6148	439.1926	2.3552	269.988
Total Train Time (Days)	23.4161	475.792	2.3957	292.487
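The random dimension selection described above can be sketched as follows. This is a simplified illustration rather than the tsml or aeon shapelet transform: it only shows the sampling step, omits shapelet quality evaluation entirely, and the function name, data layout (`X[case][dimension][time]`) and parameters are assumptions.

```python
import random


def sample_shapelet_candidates(X, n_shapelets, min_length=3, seed=0):
    """Sample candidate shapelets, choosing a random dimension for each one.

    X is a list of cases; each case is a list of dimensions; each dimension
    is a list of floats. All dimensions are assumed to have equal length.
    """
    rng = random.Random(seed)
    n_dims = len(X[0])
    series_length = len(X[0][0])
    candidates = []
    for _ in range(n_shapelets):
        case = rng.randrange(len(X))
        dim = rng.randrange(n_dims)  # one randomly selected dimension per shapelet
        length = rng.randint(min_length, series_length)
        start = rng.randint(0, series_length - length)
        values = X[case][dim][start:start + length]
        candidates.append({"case": case, "dim": dim, "start": start, "values": values})
    return candidates


# Tiny synthetic dataset: 4 cases, 5 dimensions, 20 time points each.
data_rng = random.Random(42)
X = [[[data_rng.random() for _ in range(20)] for _ in range(5)] for _ in range(4)]
shapelets = sample_shapelet_candidates(X, n_shapelets=10)
for s in shapelets[:3]:
    print(s["case"], s["dim"], s["start"], len(s["values"]))
```

The number of candidates is set by `n_shapelets` alone, so the work no longer grows with the number of dimensions in the way a per-dimension ensemble does.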
