Time Series cross-validator
Provides train/test indices to split time series data samples, observed at fixed time intervals, into train/test sets. In each split, the test indices must be higher than those in the previous split, so shuffling in this cross-validator is inappropriate.
This cross-validation object is a variation of :class:`KFold`. In the kth split, it returns the first k folds as the train set and the (k+1)th fold as the test set.
Note that unlike standard cross-validation methods, successive training sets are supersets of those that come before them.
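The expanding-window behaviour described above can be sketched without scikit-learn. The helper ``expanding_window_splits`` below is a hypothetical illustration, not the library's implementation. It assumes the samples are divided into ``n_splits + 1`` folds of equal size, with any leftover samples added to the first training set.

```python
def expanding_window_splits(n_samples, n_splits=5):
    """Yield (train, test) index lists for an expanding-window split.

    Hypothetical helper for illustration only; requires n_samples > n_splits.
    """
    n_folds = n_splits + 1
    if n_folds > n_samples:
        raise ValueError("need more samples than splits")
    test_size = n_samples // n_folds
    # Samples that do not fill a whole fold go to the first training set.
    first_test_start = test_size + n_samples % n_folds
    for test_start in range(first_test_start, n_samples, test_size):
        train = list(range(test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


splits = list(expanding_window_splits(n_samples=6, n_splits=5))
for train, test in splits:
    print("TRAIN:", train, "TEST:", test)

# Each training set strictly contains the previous one, and every test
# index comes after every training index in the same split.
for (prev_train, _), (train, test) in zip(splits, splits[1:]):
    assert set(prev_train) < set(train)
    assert max(train) < min(test)
```

The two assertions at the end check the properties this paragraph relies on: training sets only grow, and no test sample precedes a training sample.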
Read more in the :ref:`User Guide <cross_validation>`.

Parameters
----------
n_splits : int, default=5
    Number of splits. Must be at least 2.

    .. versionchanged:: 0.22
       ``n_splits`` default value changed from 3 to 5.

max_train_size : int, optional
    Maximum size for a single training set.

Examples
--------
>>> import numpy as np
>>> from sklearn.model_selection import TimeSeriesSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> tscv = TimeSeriesSplit()
>>> print(tscv)
TimeSeriesSplit(max_train_size=None, n_splits=5)
>>> for train_index, test_index in tscv.split(X):
...     print('TRAIN:', train_index, 'TEST:', test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [0] TEST: [1]
TRAIN: [0 1] TEST: [2]
TRAIN: [0 1 2] TEST: [3]
TRAIN: [0 1 2 3] TEST: [4]
TRAIN: [0 1 2 3 4] TEST: [5]
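
When ``max_train_size`` is set, the training set stops growing once it reaches that size, so the expanding window becomes a sliding one. The sketch below illustrates that effect on the training sets from the example above. The helper ``cap_training_window`` is hypothetical, and it assumes the cap keeps the most recent samples, that is, those immediately before the test fold.

```python
def cap_training_window(train_index, max_train_size=None):
    """Keep at most max_train_size of the most recent training indices.

    Hypothetical helper for illustration; None means no cap.
    """
    if max_train_size is None or len(train_index) <= max_train_size:
        return list(train_index)
    return list(train_index[-max_train_size:])


# Training sets from the example above (6 samples, 5 splits), capped at 3.
uncapped = [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4]]
capped = [cap_training_window(train, max_train_size=3) for train in uncapped]
for train, test in zip(capped, range(1, 6)):
    print("TRAIN:", train, "TEST:", [test])
```

The first three splits are unchanged; from the fourth split on, the oldest indices are dropped so that each training set holds exactly three samples.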

Notes
-----
The training set has size
``i * (n_samples // (n_splits + 1)) + n_samples % (n_splits + 1)``
in the ``i`` th split, with a test set of size
``n_samples // (n_splits + 1)``, where ``n_samples`` is the number of
samples.
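
As a quick arithmetic check of these sizes, the sketch below computes them for a case where ``n_samples`` is not a multiple of ``n_splits + 1``. The function name ``split_sizes`` is hypothetical, and the sketch reads the training-size expression as ``i * (n_samples // (n_splits + 1)) + n_samples % (n_splits + 1)``.

```python
def split_sizes(n_samples, n_splits=5):
    """Return [(train_size, test_size), ...] for splits i = 1..n_splits.

    Hypothetical helper for illustration only.
    """
    n_folds = n_splits + 1
    test_size = n_samples // n_folds
    remainder = n_samples % n_folds
    return [(i * test_size + remainder, test_size)
            for i in range(1, n_splits + 1)]


sizes = split_sizes(n_samples=7, n_splits=5)
print(sizes)  # [(2, 1), (3, 1), (4, 1), (5, 1), (6, 1)]

# The last split uses every sample: train size + test size == n_samples.
assert sum(sizes[-1]) == 7
```

With 7 samples and 6 folds, the one leftover sample joins the first training set, which is why the first split trains on 2 samples rather than 1.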