StationarySignals
Overview
The StationarySignals class is designed to transform a set of time-series signals into stationary signals.
This is an essential pre-processing step in many time-series analysis tasks.
Attributes
df: (pd.DataFrame) The input DataFrame containing the signals.
signal_id: (str, optional) Column name in df containing the signal IDs. Default is 'signal_id'.
timestamp: (str, optional) Column name in df containing the timestamps. Default is 'timestamp'.
value_col: (str, optional) Column name in df containing the values. Default is 'value'.
method: (str, optional) The method to use for making signals stationary. Default is 'difference'. Choices are 'difference' and 'detrend'.
detrend_type: (str, optional) The type of detrending to use if method is 'detrend'. Default is 'gp'. Choices are 'gp' and 'lr'.
alpha: (float, optional) Significance level for the Augmented Dickey-Fuller test. Default is 0.05.
random_seed: (int, optional) Seed for the random number generator. Default is None.
ls_range: (tuple, optional) Tuple specifying the range of length-scales for the Gaussian process. Default is (10.0, 100.0).
ls_values: (np.ndarray, optional) Array of specific length scale values. Default is None.
n_searches: (int, optional) Number of searches for Gaussian process hyperparameters. Default is 10.
n_splits: (int, optional) Number of splits for cross-validation. Default is 5.
eps: (float, optional) Tolerance for the difference. Default is 1e-6.
gp_implementation: (str, optional) Implementation to use for the Gaussian process. Default is 'numba'. Choices are 'sklearn' and 'numba'.
sklearn_scoring: (str, optional) Scoring method when using 'sklearn' for Gaussian process. Default is 'neg_mean_squared_error'.
normalize_signals: (bool, optional) Whether to normalize signals to zero mean and unit variance. Default is True.
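To illustrate what normalize_signals describes, the following is a minimal sketch of scaling each signal to zero mean and unit variance with pandas. The helper normalize_per_signal is hypothetical and not part of the class, and the use of the sample standard deviation (ddof=1) is an assumption; the class may normalize differently.

```python
import numpy as np
import pandas as pd


def normalize_per_signal(df, signal_id="signal_id", value_col="value"):
    """Hypothetical helper: z-score the values within each signal."""
    out = df.copy()
    grouped = out.groupby(signal_id)[value_col]
    # transform() broadcasts each group's statistic back to its rows.
    # "std" here is the sample standard deviation (ddof=1), an assumption.
    out[value_col] = (out[value_col] - grouped.transform("mean")) / grouped.transform("std")
    return out


df = pd.DataFrame({
    "signal_id": np.repeat(["abc", "def"], 4),
    "timestamp": np.tile(np.arange(4), 2),
    "value": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 50.0],
})
normalized = normalize_per_signal(df)
```

After this step every signal has the same scale, so signals with very different amplitudes can be compared directly.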
Methods
make_stationary_signals()
Makes the signals stationary at the specified significance level, \(\alpha\) (the alpha attribute used for the Augmented Dickey-Fuller test).
Notes
We have implemented GP detrending with an RBF kernel in Numba, using a
standard Cholesky factorization to solve the GP linear system. On average,
this implementation is faster than the Scikit-Learn one, and the benefit
becomes more noticeable as the number of unique signals increases.
However, it is less numerically stable and less well-tested
than the Scikit-Learn version. You can choose which version to use
via the gp_implementation argument.
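For readers unfamiliar with the technique, here is a minimal NumPy sketch of GP detrending with an RBF kernel solved through a Cholesky factorization. It is not the Numba implementation described above: the function names, the fixed length scale, and the noise variance are all assumptions chosen for illustration, and no hyperparameter search is performed.

```python
import numpy as np


def rbf_kernel(x1, x2, length_scale):
    """RBF (squared-exponential) kernel with unit signal variance."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_trend(t, y, length_scale=20.0, noise=0.25):
    """Hypothetical helper: GP posterior mean at the training inputs."""
    K = rbf_kernel(t, t, length_scale)
    # Factor (K + noise * I) = L L^T; the noise term keeps the matrix well conditioned.
    L = np.linalg.cholesky(K + noise * np.eye(len(t)))
    # Solve (K + noise * I) a = y in two steps using the factor.
    # np.linalg.solve is a general solver; a triangular solver would be faster.
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return K @ a


t = np.arange(100, dtype=float)
rng = np.random.default_rng(0)
y = 0.05 * t + rng.normal(scale=0.5, size=100)  # linear drift plus noise
y_centered = y - y.mean()

trend = gp_trend(t, y_centered)
residual = y_centered - trend  # the detrended signal
```

The residual is what a detrending step would pass on; a larger length scale yields a smoother trend and leaves more of the short-term variation in the residual.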
Example
>>> import numpy as np
>>> import pandas as pd
>>> signal_ids = np.repeat(["abc", "def"], 100)
>>> timestamps = np.tile(np.arange(100), 2)
>>> rng = np.random.default_rng(17)
>>> abc_values = rng.uniform(-5, 5, size=(100,))
>>> def_values = rng.uniform(-5, 5, size=(100,))
>>> values = np.concatenate((abc_values, def_values))
>>> df = pd.DataFrame({
... "signal_id": signal_ids,
... "timestamp": timestamps,
... "value": values
... })
>>> signals = StationarySignals(df, method='difference', normalize_signals=False)
>>> signals.make_stationary_signals()
     signal_id  timestamp     value
0          abc          1 -6.841017
1          abc          2  3.967715
2          abc          3 -1.896646
3          abc          4 -1.531380
4          abc          5  1.708821
..         ...        ...       ...
193        def         95  0.653840
194        def         96  0.846767
195        def         97  5.441443
196        def         98 -8.955780
197        def         99  5.397502
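The result has 198 rows for 200 input rows, and each signal's timestamps start at 1, which is consistent with first-order differencing within each signal. A minimal sketch of that operation follows, assuming this is what method='difference' does; difference_signals is a hypothetical helper, not part of the class.

```python
import numpy as np
import pandas as pd


def difference_signals(df, signal_id="signal_id", timestamp="timestamp", value_col="value"):
    """Hypothetical helper: first-order difference within each signal."""
    out = df.sort_values([signal_id, timestamp]).copy()
    # diff() within each group, so values never mix across signals.
    out[value_col] = out.groupby(signal_id)[value_col].diff()
    # The first row of each signal has no predecessor and is dropped.
    return out.dropna(subset=[value_col]).reset_index(drop=True)


df = pd.DataFrame({
    "signal_id": np.repeat(["abc", "def"], 5),
    "timestamp": np.tile(np.arange(5), 2),
    "value": [1.0, 3.0, 6.0, 10.0, 15.0, 2.0, 2.0, 5.0, 4.0, 4.0],
})
result = difference_signals(df)
```

Each signal loses exactly one row, which matches the two missing rows in the example output.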