Running HIVE-COTE 2.0 with our code

tsml:

tsml is a Java-based toolbox for time series classification research. tsml is the primary source for our time series classification algorithms and contains the version of HC2 used in our publication. For users wishing to run HC2 (or other TSC algorithms) using tsml, we provide three main options:

The latest version of the tsml repository containing the most recent versions of HC2 and other TSC algorithms.
The tsml repository HC2 branch containing the version of tsml used at the time of publication.
A jar file of the tsml repository at the time of publication, zipped together with a batch script and a bash script to run HC2 using it.

Below we provide instructions on how to use the tsml jar to run HC2 from the command line. For integrating HC2 or other tsml algorithms into Java applications, the toolkit is Weka-compatible and uses the Weka classifier API. Feel free to contact us if further help is required in integrating tsml into other codebases.

Instructions:

We provide both a batch script and a bash script to run HC2 using the tsml jar. Both scripts require five input arguments and accept an optional sixth. These are, in order:

The path to the dataset you want to run HC2 on.
The path you want to write the HC2 results file to.
The name of the dataset.
The number of threads you want to use.
The memory allowance in gigabytes.
(optional) A seed for the HC2 classifier; defaults to 0 if not provided.
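If you launch the scripts from another program, the ordered arguments can be assembled programmatically. The sketch below is a hypothetical Python helper (build_script_args is our own name, not part of tsml) that assumes only what is listed above: the arguments are positional, in that order, and the seed defaults to 0.

```python
def build_script_args(data_path, results_path, dataset, threads, memory_gb, seed=0):
    """Hypothetical helper (not part of tsml): ordered argument list for HC2.sh.

    Order: data path, results path, dataset name, threads, memory in GB, seed.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if memory_gb < 1:
        raise ValueError("memory_gb must be at least 1")
    return [
        "sh",
        "HC2.sh",
        str(data_path),
        str(results_path),
        dataset,
        str(threads),
        str(memory_gb),
        str(seed),  # optional sixth argument; 0 is the documented default
    ]


if __name__ == "__main__":
    args = build_script_args("D:/HC2/data/", "D:/HC2/results/", "dataset", 2, 10)
    print(" ".join(args))  # sh HC2.sh D:/HC2/data/ D:/HC2/results/ dataset 2 10 0
```

The returned list can be passed to subprocess.run(args, check=True); passing a list rather than a single string avoids shell quoting problems with paths.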

The dataset files must be in ARFF or TS format. More details on the ARFF format can be found on the Weka help page and details on the TS format can be found in the aeon loading data notebook. Examples of both formats can be found on the datasets page.

The chosen dataset path must contain a directory with the same name as the dataset. This directory should contain the ARFF/TS files, each named with the dataset name followed by _TRAIN for the training data or _TEST for the testing data. Alternatively, a single file without the _TRAIN/_TEST suffix can be provided, in which case a 50/50 train/test split will be created.
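The naming rules above can be checked before starting a long run. This is a minimal sketch using a hypothetical helper (find_dataset_files is our own name, not part of tsml); it assumes the .ts/.arff extensions and looks for a _TRAIN/_TEST pair first, falling back to a single unsuffixed file.

```python
from pathlib import Path


def find_dataset_files(data_path, dataset, extensions=(".ts", ".arff")):
    """Hypothetical helper (not part of tsml): locate a dataset's files.

    Returns ("split", train_file, test_file) if <dataset>_TRAIN and
    <dataset>_TEST exist, or ("single", file, None) if only an unsuffixed
    <dataset> file exists (the case where a 50/50 split is created).
    Preferring .ts over .arff is an arbitrary choice of this sketch.
    """
    dataset_dir = Path(data_path) / dataset
    for ext in extensions:
        train = dataset_dir / f"{dataset}_TRAIN{ext}"
        test = dataset_dir / f"{dataset}_TEST{ext}"
        if train.is_file() and test.is_file():
            return "split", train, test
    for ext in extensions:
        single = dataset_dir / f"{dataset}{ext}"
        if single.is_file():
            return "single", single, None
    raise FileNotFoundError(f"no ARFF/TS files for {dataset!r} in {dataset_dir}")
```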

After HC2 has finished the training and testing process, a results file will be output in the 'HIVE-COTEv2' series of directories at the chosen results path. Information about our results file format can be found on this page.

The input data and the results output should be structured similarly to the following:


├── data
│   └── dataset
│       ├── dataset_TEST.(arff/ts)
│       └── dataset_TRAIN.(arff/ts)
├── results
│   └── HIVE-COTEv2
│       └── Predictions
│           └── dataset
│               └── testFold0.csv
└── tsml.jar
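The layout above implies where each predictions file ends up. The sketch below is a hypothetical helper (expected_results_file is our own name, not part of tsml) that assumes the path pattern shown in the tree: results directory, classifier name, Predictions, dataset name, then testFold<N>.csv.

```python
from pathlib import Path


def expected_results_file(results_path, dataset, fold=0, classifier="HIVE-COTEv2"):
    """Hypothetical helper (not part of tsml): predictions file path.

    Assumes the pattern from the directory tree:
    <results>/<classifier>/Predictions/<dataset>/testFold<fold>.csv
    """
    return (
        Path(results_path) / classifier / "Predictions" / dataset / f"testFold{fold}.csv"
    )


if __name__ == "__main__":
    # On POSIX this prints: results/HIVE-COTEv2/Predictions/dataset/testFold0.csv
    print(expected_results_file("results", "dataset"))
```

Checking whether this file already exists is one simple way to skip dataset/fold combinations that have already been run.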

Both scripts use the same ordered input arguments, and can be run like this:

D:\HC2\HC2.bat D:\HC2\data\ D:\HC2\results\ dataset 2 10 0

sh HC2.sh D:/HC2/data/ D:/HC2/results/ dataset 2 10 0

Experiments can also be run directly from the jar file. The input arguments and their descriptions can be found in the ExperimentalArguments class in Experiments.java.

java -jar -Xmx10G tsml.jar -dp=D:\HC2\data\ -rp=D:\HC2\results\ -dn=dataset -cn="HIVE-COTEv2" -f=1 -nt=2 -s=0
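The same command can be built from another program. The sketch below is a hypothetical Python helper (build_tsml_command is our own name, not part of tsml) that uses only the flags appearing in the example above (-dp, -rp, -dn, -cn, -f, -nt, -s) with the example's values as defaults; ExperimentalArguments remains the authoritative list. It places -Xmx before -jar, the conventional position for JVM options.

```python
def build_tsml_command(data_path, results_path, dataset, classifier="HIVE-COTEv2",
                       fold=1, threads=2, seed=0, memory_gb=10, jar="tsml.jar"):
    """Hypothetical helper (not part of tsml): argument list for tsml.jar."""
    return [
        "java",
        f"-Xmx{memory_gb}G",  # maximum JVM heap size
        "-jar",
        jar,
        f"-dp={data_path}",   # dataset path
        f"-rp={results_path}",  # results path
        f"-dn={dataset}",     # dataset name
        f"-cn={classifier}",  # classifier name, as in ClassifierLists.java
        f"-f={fold}",
        f"-nt={threads}",
        f"-s={seed}",
    ]


if __name__ == "__main__":
    print(" ".join(build_tsml_command("D:/HC2/data/", "D:/HC2/results/", "dataset")))
```

Passing the list to subprocess.run involves no shell, so -cn=HIVE-COTEv2 needs no surrounding quotes here.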

The classification algorithm being run can be changed using the -cn argument. The classifiers available in tsml can be found in the switch statements in ClassifierLists.java, where the name of each case is the -cn value that runs that classifier.

Additional notes:

tsml uses the nd4j library in its ROCKET/Arsenal implementation for efficiency reasons. To change the number of threads nd4j uses, the OMP_NUM_THREADS environment variable must be set. If OMP_NUM_THREADS does not match the set number of threads, warnings will be displayed.
tsml requires a 64-bit version of Java. If your default Java installation is 32-bit, you may have to adjust the Java command in the scripts accordingly.
tsml has extensive tools for evaluating multiple classifiers using the output file format. See the evaluation example class and the MultipleClassifierEvaluation class.
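Since the nd4j thread count is controlled by OMP_NUM_THREADS, a launcher can set it alongside the tsml thread argument. The sketch below uses hypothetical helpers (tsml_env and run_tsml are our own names, not part of tsml) that copy the environment and set OMP_NUM_THREADS to the thread count you also pass to -nt.

```python
import os
import subprocess


def tsml_env(threads, base_env=None):
    """Hypothetical helper: copy of the environment with OMP_NUM_THREADS set.

    Keeping OMP_NUM_THREADS equal to the -nt value avoids the mismatch
    warnings described above.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(threads)
    return env


def run_tsml(cmd, threads):
    """Hypothetical helper: run a tsml command (list of strings) with that env."""
    return subprocess.run(cmd, env=tsml_env(threads), check=True)
```

Pass the full java command as a list of strings to run_tsml, using the same value for threads as for -nt.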

aeon:

aeon is a Python-based toolbox for time series analysis with a well-developed classification module. The implementation of HIVE-COTE 2.0 is available in aeon 0.1.0 and subsequent releases, as well as on the GitHub main branch. For releases after aeon 0.1.0, some of these examples may be outdated.

We provide a link to the latest aeon release here and the aeon GitHub repository here.

Instructions:

The aeon toolkit is compatible with scikit-learn. A number of example notebooks to help get started with building and evaluating classifiers are available here.

aeon can load files in the ARFF or TS format into the required data structure. Examples of both formats can be found on the datasets page. Below is a simple example of loading a dataset in the TS format, building HIVE-COTE 2.0 and calculating its accuracy using aeon.


from sklearn.metrics import accuracy_score

from aeon.classification.hybrid import HIVECOTEV2
from aeon.utils.data_io import load_from_tsfile_to_dataframe as load_ts

if __name__ == "__main__":
    # Load data
    X_train, y_train = load_ts("data_TRAIN.ts")
    X_test, y_test = load_ts("data_TEST.ts")

    # Fit HC2
    hc2 = HIVECOTEV2()
    hc2.fit(X_train, y_train)

    # Predict and print accuracy
    predictions = hc2.predict(X_test)
    print(accuracy_score(y_test, predictions))

Like tsml, aeon has code for running experiments and outputting results files in our format. The same file structure requirements as tsml apply, although currently only TS files can be loaded. Valid classifier names are listed in the set_classifier function in the experiments.py file.


from aeon.benchmarking.experiments import load_and_run_classification_experiment

if __name__ == "__main__":
    load_and_run_classification_experiment(
        "D:/HC2/data/",  # Dataset dir (forward slashes avoid backslash escapes)
        "D:/HC2/results/",  # Results dir
        "HIVECOTEV2",  # Classifier name
        "dataset",  # Dataset name
        resample_id=0,  # seed for dataset resample and classifier. No resample if 0.
    )
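Because the aeon experiment runner currently loads only TS files, a quick pre-flight check can prevent a failed run. The sketch below is a hypothetical helper (ts_files_for_aeon is our own name, not part of aeon) that assumes the tsml naming of <dataset>_TRAIN.ts and <dataset>_TEST.ts and reports when only ARFF files are present.

```python
from pathlib import Path


def ts_files_for_aeon(data_path, dataset):
    """Hypothetical pre-flight check (not part of aeon).

    Assumes <data_path>/<dataset>/<dataset>_TRAIN.ts and <dataset>_TEST.ts.
    Returns both paths, or raises FileNotFoundError that mentions any ARFF
    files found, since only TS files can currently be loaded.
    """
    dataset_dir = Path(data_path) / dataset
    train = dataset_dir / f"{dataset}_TRAIN.ts"
    test = dataset_dir / f"{dataset}_TEST.ts"
    missing = [p.name for p in (train, test) if not p.is_file()]
    if missing:
        arff = sorted(p.name for p in dataset_dir.glob("*.arff")) if dataset_dir.is_dir() else []
        hint = f"; found ARFF files {arff}, which must be converted to TS" if arff else ""
        raise FileNotFoundError(f"missing {missing} in {dataset_dir}{hint}")
    return train, test
```

For example, calling ts_files_for_aeon("D:/HC2/data/", "dataset") before load_and_run_classification_experiment confirms that the TS files are in place.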