StationarySignals
Overview
The StationarySignals class is designed to transform a set of time-series signals into stationary signals. This is an essential pre-processing step in many time-series analysis tasks.
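For intuition about why such a transformation matters, here is a standalone sketch, independent of this class, that applies first-order differencing to a random walk and checks stationarity with statsmodels' adfuller (a standard Augmented Dickey-Fuller implementation; this is only an illustration of the idea, not the class internals):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=500))  # random walk: non-stationary by construction

p_raw = adfuller(walk)[1]            # large p-value: cannot reject a unit root
p_diff = adfuller(np.diff(walk))[1]  # tiny p-value: the first difference is stationary
print(f"ADF p-value, raw: {p_raw:.3f}; differenced: {p_diff:.2e}")
```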
Attributes
df : pd.DataFrame
    The input DataFrame containing the signals.
signal_id : str, optional
    Column name in df containing the signal IDs. Default is 'signal_id'.
timestamp : str, optional
    Column name in df containing the timestamps. Default is 'timestamp'.
value_col : str, optional
    Column name in df containing the values. Default is 'value'.
method : str, optional
    The method to use for making signals stationary. Default is 'difference'. Choices are 'difference' and 'detrend'.
detrend_type : str, optional
    The type of detrending to use if method is 'detrend'. Default is 'gp'. Choices are 'gp' and 'lr'.
alpha : float, optional
    Significance level for the Augmented Dickey-Fuller test. Default is 0.05.
random_seed : int, optional
    Seed for the random number generator. Default is None.
ls_range : tuple, optional
    Range of length scales for the Gaussian process. Default is (10.0, 100.0).
ls_values : np.ndarray, optional
    Array of specific length-scale values. Default is None.
n_searches : int, optional
    Number of searches for Gaussian process hyperparameters. Default is 10.
n_splits : int, optional
    Number of splits for cross-validation. Default is 5.
eps : float, optional
    Tolerance for the difference. Default is 1e-6.
gp_implementation : str, optional
    Implementation to use for the Gaussian process. Default is 'numba'. Choices are 'sklearn' and 'numba'.
sklearn_scoring : str, optional
    Scoring method when using 'sklearn' for the Gaussian process. Default is 'neg_mean_squared_error'.
normalize_signals : bool, optional
    Whether to normalize signals to zero mean and unit variance. Default is True.
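To make the attribute interactions concrete, a hypothetical GP-detrending configuration might look like the following (parameter values are chosen only for illustration, `df` is shaped as in the Example below, and ls_values is assumed to take precedence over ls_range when both are given):

```python
import numpy as np

# Hypothetical configuration for GP-based detrending.
signals = StationarySignals(
    df,
    method="detrend",
    detrend_type="gp",
    ls_values=np.linspace(10.0, 100.0, num=10),  # explicit length-scale grid
    n_splits=5,
    gp_implementation="numba",
    random_seed=17,
)
stationary_df = signals.make_stationary_signals()
```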
Methods
make_stationary_signals()
Creates stationary signals at the specified significance level, \(\alpha\).
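Conceptually, with the default method='difference', the transformation reduces to roughly the following sketch (simplified, using the default column names from the Attributes above; the actual method also covers detrending, normalization, and the eps tolerance):

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def sketch_difference(df: pd.DataFrame, alpha: float = 0.05) -> pd.DataFrame:
    """Simplified sketch: first-difference each signal, then check each
    result with the ADF test at significance level alpha."""
    out = df.copy()
    out["value"] = out.groupby("signal_id")["value"].diff()
    out = out.dropna(subset=["value"]).reset_index(drop=True)
    for sid, grp in out.groupby("signal_id"):
        p_value = adfuller(grp["value"].to_numpy())[1]
        assert p_value < alpha, f"signal {sid!r} still non-stationary (p={p_value:.3f})"
    return out
```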
Notes
We have implemented an RBF-based GP detrending in Numba using a
standard Cholesky-factorization solve. This method is, on average,
faster than the Scikit-Learn implementation, and the benefit becomes
more noticeable as the number of unique signals increases. However,
this implementation is less numerically stable and less well-tested
than the Scikit-Learn version. You can control which implementation
is used via the gp_implementation argument.
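For reference, the core computation amounts to GP regression with an RBF kernel solved by a Cholesky factorization, then subtracting the fitted trend. The sketch below uses plain NumPy/SciPy rather than Numba, and `noise` is an assumed jitter term, so it outlines the idea rather than reproducing the package's implementation:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def rbf_gp_detrend(t: np.ndarray, y: np.ndarray,
                   length_scale: float, noise: float = 1e-2) -> np.ndarray:
    """Remove a smooth RBF-GP trend from y(t) via a Cholesky solve."""
    d = t[:, None] - t[None, :]
    K = np.exp(-0.5 * (d / length_scale) ** 2)   # RBF (squared-exponential) kernel
    L = cho_factor(K + noise * np.eye(t.size))   # Cholesky of (K + noise * I)
    trend = K @ cho_solve(L, y)                  # GP posterior mean at training points
    return y - trend                             # residual, ideally stationary
```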
Example
>>> import numpy as np
>>> import pandas as pd
>>> signal_ids = np.repeat(["abc", "def"], 100)
>>> timestamps = np.tile(np.arange(100), 2)
>>> rng = np.random.default_rng(17)
>>> abc_values = rng.uniform(-5, 5, size=(100,))
>>> def_values = rng.uniform(-5, 5, size=(100,))
>>> values = np.concatenate((abc_values, def_values))
>>> df = pd.DataFrame({
... "signal_id": signal_ids,
... "timestamp": timestamps,
... "value": values
... })
>>> signals = StationarySignals(df, method='difference', normalize_signals=False)
>>> signals.make_stationary_signals()
signal_id timestamp value
0 abc 1 -6.841017
1 abc 2 3.967715
2 abc 3 -1.896646
3 abc 4 -1.531380
4 abc 5 1.708821
.. ... ... ...
193 def 95 0.653840
194 def 96 0.846767
195 def 97 5.441443
196 def 98 -8.955780
197 def 99 5.397502

[198 rows x 3 columns]