Echo State Network Training

Overview

The primary interface to the echo state network (ESN) in this code base is the ESN() class. This class lets one define an ESN and fit it to data with a given set of hyper-parameters.

Defining the Network

An ESN reservoir is conventionally defined as a sparse random matrix with a specified spectral radius, \(\rho\), whose weights follow either a uniform or a normal distribution. We have instead opted to define the reservoir of the ESN as a small-world network, because recent research suggests that a more fine-grained specification of the reservoir topology can improve the model's out-of-sample performance [KTPA17].

Thus, instead of specifying the sparsity of the reservoir, the user gains finer control over the network by providing the number of nearest neighbors and the re-wiring probability for the graph.
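To make this concrete, here is a minimal numpy-only sketch of how a small-world (Watts-Strogatz) reservoir matrix might be built from k and p and rescaled to a target spectral radius. The function name and the directed rewiring variant are illustrative assumptions; the library's internal construction may differ.

```python
import numpy as np

def small_world_reservoir(n, k, p, spectral_radius=0.9, rng=None):
    # Illustrative sketch: Watts-Strogatz ring lattice with rewiring,
    # uniform random weights, rescaled to the target spectral radius.
    rng = np.random.default_rng(rng)
    A = np.zeros((n, n), dtype=bool)
    idx = np.arange(n)
    # Ring lattice: connect each node to its k nearest neighbours.
    for j in range(1, k // 2 + 1):
        A[idx, (idx + j) % n] = True
        A[idx, (idx - j) % n] = True
    # Rewire each outgoing edge with probability p (directed variant).
    for i in range(n):
        for j in np.flatnonzero(A[i]):
            if rng.random() < p:
                A[i, j] = False
                new = int(rng.integers(n))
                while new == i or A[i, new]:
                    new = int(rng.integers(n))
                A[i, new] = True
    # Uniform weights on the surviving edges, then rescale so that
    # the spectral radius of W equals the requested value.
    W = np.where(A, rng.uniform(-1.0, 1.0, (n, n)), 0.0)
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W
```

Rescaling after weight assignment is what guarantees the requested \(\rho\) regardless of the sampled topology.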

Function Documentation

class parallel_esn.esn.ESN(input_dim, hidden_dim, output_dim, k, spectral_radius=0.9, p=0.1, beta=0.001, alpha=0.7, random_state=None, weight_distn='uniform', use_cython=False, use_sparse=False)[source]

Sequential Echo State Network class

Parameters
  • input_dim (int) – Size of input dimension, N_u

  • hidden_dim (int) – Number of hidden units in the W matrix, N_x

  • output_dim (int) – Dimensionality of the output, N_y

  • k (int) – k-nearest neighbors each node is connected to in the small-world network for the hidden layer

  • spectral_radius (float, optional) – Spectral radius of the reservoir

  • p (float, optional) – Re-wiring probability for small-world network

  • beta (float, optional) – Regularization parameter for L2 regression

  • alpha (float, optional) – ESN leaking rate

  • random_state (int or np.random.RandomState, optional) – Random state initializer

  • weight_distn ({"uniform", "normal"}, optional) – Distribution of reservoir weights

  • use_cython (bool, optional) – Whether to use the Cython compiled code when computing the X matrix. Not compatible with use_sparse.

  • use_sparse (bool, optional) – Whether to use a sparse matrix for the reservoir W

clear_state()[source]

Clears X0, the reservoir’s memory of previous inputs and neural state

predict(U)[source]

Predicts Yhat, output observations, given time series of inputs U

Parameters

U (np.ndarray) – Input data array, columns u(n) concatenated horizontally. Dimensions - N_u x T

Returns

Yhat – Prediction of observations. Returns feature vectors as columns stacked horizontally in time. Take the transpose of this output to obtain feature vectors as rows stacked vertically in time.

Return type

np.ndarray
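The readout stage behind predict is a linear map applied to the concatenated vectors [1; u(n); x(n)]. The sketch below only illustrates the shape conventions used throughout these docs; the name W_out and the random values are assumptions for illustration, not the library's internals.

```python
import numpy as np

# Toy shapes: N_u = 3 inputs, N_x = 20 reservoir units, N_y = 2 outputs, T = 5 steps.
N_u, N_x, N_y, T = 3, 20, 2, 5
rng = np.random.default_rng(0)
X = rng.standard_normal((1 + N_u + N_x, T))        # columns are [1; u(n); x(n)]
W_out = rng.standard_normal((N_y, 1 + N_u + N_x))  # learned readout (random here)

Yhat = W_out @ X   # predictions as columns stacked horizontally in time
Y_rows = Yhat.T    # transpose: feature vectors as rows stacked vertically
```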

predict_with_X(X)[source]

Predicts Yhat, output observations, given X already generated from input data U. Useful if X has already been computed and a prediction is desired without affecting the current state of the reservoir.

Parameters

X (np.ndarray) – X is [1;u(n);x(n)] concatenated horizontally (n is time), generated from the input data U. Dimensions of (1+ N_u + N_x) x T

Returns

Yhat – Prediction of observations.

Return type

np.ndarray

recursive_predict(U, iterations, cold_start=False)[source]

Predicts Yhat, the output observations following the given time series of inputs U. This method assumes that the network has been trained for one-step forecasting, where the output y(t) corresponds to u(t+1), i.e. what the next input would have been. Currently this method only supports predicting all features provided as input; the dimensions of the vector y(t) must match the dimensions of u(t).

For the first step, the observed values u(t) for t = 0..T-1 are used to produce the first predicted value \(y(t) = \hat{u}(t+1)\), which is then fed back to the network as an input in order to produce y(t+1). This recursion continues for the specified number of iterations.

Parameters
  • U (np.ndarray) – Input data array, columns u(n) concatenated horizontally. Dimensions - N_u x T

  • iterations (int) – How many future times to predict.

  • cold_start (boolean, optional, default=False) – Whether to clear reservoir state before driving the reservoir with input data U. If the input data follows directly after training data, a warm start is sensible. However, if the provided data is temporally disconnected from the training data, a cold start could be useful for reproducibility if this method will be called multiple times, on the same data or on other inputs.

Returns

Yhat – Prediction of observations. Returns feature vectors as columns stacked horizontally in time. Take the transpose of this output to obtain feature vectors as rows stacked vertically in time.

Return type

np.ndarray
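The feedback loop described above can be sketched generically. The helper names step and predict_next are hypothetical stand-ins for driving the reservoir and applying the readout; this is not the library's implementation.

```python
import numpy as np

def recursive_predict(u_seq, step, predict_next, iterations):
    # Warm phase: drive the model with the observed inputs u(0)..u(T-1).
    state = None
    for u in u_seq:
        state = step(state, u)
    # Recursive phase: each one-step forecast becomes the next input.
    preds = []
    u = predict_next(state)          # first forecast, hat{u}(T)
    for _ in range(iterations):
        preds.append(u)
        state = step(state, u)       # feed the forecast back in
        u = predict_next(state)
    return np.array(preds)

# Toy demo: state is just the last input; the "model" forecasts u + 1.
out = recursive_predict([0, 1, 2],
                        step=lambda state, u: u,
                        predict_next=lambda state: state + 1,
                        iterations=4)
print(out)  # [3 4 5 6]
```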

recursive_score(U, Y_true, input_len, pred_len)[source]

Computes the loss in recursive one-step prediction; intended for use on a validation set.

Parameters
  • U (np.ndarray) – Input data array, columns u(n) concatenated horizontally. Dimensions - N_u x T

  • Y_true (np.ndarray) – Target output array, y(n) concatenated horizontally in time. Dimensions - N_y x T

  • input_len (int) – The input length to be fed to the ESN before recursive single-step prediction.

  • pred_len (int) – The number of predictions desired.

Returns

error – Normalized mean square error (NMSE) of the prediction. Each feature’s NMSE is computed separately and averaged together at the end.

Return type

float

recursive_train_validate(trainU, trainY, valU, valY, input_len, pred_len, warmup=10, verbose=1, compute_loss_freq=-1, warmup_each_batch=False)[source]

Train on provided training data, and immediately validate and return validation loss.

Parameters
  • trainU (array_like of np.ndarray) – Batch of training input data arrays, columns u(n) concatenated horizontally. Dimensions - Batch_size x N_u x T_i

  • trainY (array_like of np.ndarray) – Batch of training true output data arrays. Dimensions - Batch_size x N_y x T_i

  • valU (array_like of np.ndarray) – Batch of validation input data arrays, columns u(n) concatenated horizontally. Dimensions - Batch_size x N_u x T_i

  • valY (array_like of np.ndarray) – Batch of validation true output data arrays. Dimensions - Batch_size x N_y x T_i

  • input_len (int) – The input length to be fed to the ESN before recursive single-step prediction.

  • pred_len (int) – The number of predictions desired.

  • warmup (int, optional, default=10) – The number of states to discard at the beginning of each train/validation batch, before initial transients in the reservoir have died out. The amount to discard depends on the memory of the network and typically ranges from 10s to 100s.

  • verbose (int, optional, default=1) – Whether to print status of training

  • compute_loss_freq (int, optional, default=-1) – How often to compute training loss. Only for information, not necessary for training. Negative value disables computing training loss.

  • warmup_each_batch (bool, optional, default=False) – Whether to use a warmup period on each batch before training. If the batches are not consecutive time series, setting this to True is usually desirable.

Returns

loss – Returns the sum of the losses computed on each sequence in validation set.

Return type

float

recursive_validate(batchU, batchY_true, input_len, pred_len, verbose=1)[source]

Get loss on validation set for recursive one-step prediction. Uses recursive_score to compute the total error.

Parameters
  • batchU (array_like of np.ndarray) – Batch of input data arrays, columns u(n) concatenated horizontally. Dimensions - Batch_size x N_u x T_i

  • batchY_true (array_like of np.ndarray) – Batch of true output data arrays. Dimensions - Batch_size x N_y x T_i

  • input_len (int) – The input length to be fed to the ESN before recursive single-step prediction.

  • pred_len (int) – The number of predictions desired.

  • verbose (int, optional, default=1) – Whether to print status of training

Returns

loss – Returns the average of the NMSE losses computed on each sequence

Return type

float

reset()[source]

Reset output layer training and clear state

score(U, Y_true)[source]

Computes loss

Parameters
  • U (np.ndarray) – Input data array, columns u(n) concatenated horizontally. Dimensions - N_u x T

  • Y_true (np.ndarray) – Target output array, y(n) concatenated horizontally in time. Dimensions - N_y x T

Returns

error – Normalized mean square error (NMSE). Each feature’s NMSE is computed separately and averaged together at the end.

Return type

float
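A minimal sketch of the NMSE described above, assuming each feature's mean squared error is normalized by that feature's variance in the target (the exact normalizer used by the library is an assumption here):

```python
import numpy as np

def nmse(Y_true, Y_hat):
    # Per-feature (per-row) mean squared error, normalized by the
    # variance of that feature in the target, then averaged.
    mse = np.mean((Y_true - Y_hat) ** 2, axis=1)
    var = np.var(Y_true, axis=1)
    return float(np.mean(mse / var))
```

Under this convention a perfect prediction scores 0, and always predicting the target's mean scores 1.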

score_with_X(X, Y_true)[source]

Computes loss given input data X already processed with reservoir activations. Functions identically to running:

self.score(self._compute_X(U), Y_true)

provided X corresponds to U, and if starting from the same reservoir initial state.

Parameters
  • X (np.ndarray) – X is [1;u(n);x(n)] concatenated horizontally (n is time), generated from the input data U. Dimensions of (1+ N_u + N_x) x T

  • Y_true (np.ndarray) – Target output array, y(n) concatenated horizontally in time. Dimensions - N_y x T

Returns

error – Normalized mean square error (NMSE). Each feature’s NMSE is computed separately and averaged together at the end.

Return type

float

train(batchU, batchY_true, clear_state=False, warmup=10, verbose=1, compute_loss_freq=-1, warmup_each_batch=False)[source]

Trains on U’s and corresponding Y_true’s, batched in first index.

Parameters
  • batchU (array_like of np.ndarray) – Batch of input data arrays, columns u(n) concatenated horizontally. Dimensions - Batch_size x N_u x T_i

  • batchY_true (array_like of np.ndarray) – Batch of true output data arrays. Dimensions - Batch_size x N_y x T_i

  • clear_state (boolean, optional, default=False) – Whether to clear the reservoir memory between batches. If False, training on the batches is equivalent to training on all of the batches concatenated into a single time series.

  • warmup (int, optional, default=10) – The number of states to discard at the beginning of training, before initial transients in the reservoir have died out. The amount to discard depends on the memory of the network and typically ranges from 10s to 100s. If batches are to be treated as independent, with clear_state=True, warmups can typically be shorter since the zeroed reservoir initialization would be the normal operating mode of the ESN.

  • verbose (int, optional, default=1) – Whether to print status of training

  • compute_loss_freq (int, default=-1) – How often to compute training loss. Only for information, not necessary for training. Negative value disables computing training loss.

  • warmup_each_batch (bool, optional, default=False) – Whether to use a warmup period on each batch before training. If the batches are not consecutive time series, setting this to True is usually desirable.

Returns

loss – The loss computed on each sequence where loss was evaluated, an array of length ((Batch_size-1) // compute_loss_freq) + 1. Returns None if compute_loss_freq is less than or equal to 0.

Return type

np.ndarray
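The L2 regression controlled by beta has a standard ridge closed form, sketched below together with the warmup discard described above. The function name fit_readout is hypothetical; the library may assemble and solve the system differently.

```python
import numpy as np

def fit_readout(X, Y, beta=1e-3, warmup=10):
    # Ridge regression: W_out = Y X^T (X X^T + beta I)^(-1),
    # after discarding the first `warmup` columns (initial transients).
    Xw, Yw = X[:, warmup:], Y[:, warmup:]
    d = Xw.shape[0]
    A = Xw @ Xw.T + beta * np.eye(d)
    # A is symmetric, so solve A W_out^T = X Y^T rather than
    # forming the explicit inverse (cheaper and better conditioned).
    return np.linalg.solve(A, Xw @ Yw.T).T
```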

train_validate(trainU, trainY, valU, valY, warmup=10, verbose=1, compute_loss_freq=-1, warmup_each_batch=False)[source]

Train on provided training data, and immediately validate and return validation loss.

Parameters
  • trainU (array_like of np.ndarray) – Batch of training input data arrays, columns u(n) concatenated horizontally. Dimensions - Batch_size x N_u x T_i

  • trainY (array_like of np.ndarray) – Batch of training true output data arrays. Dimensions - Batch_size x N_y x T_i

  • valU (array_like of np.ndarray) – Batch of validation input data arrays, columns u(n) concatenated horizontally. Dimensions - Batch_size x N_u x T_i

  • valY (array_like of np.ndarray) – Batch of validation true output data arrays. Dimensions - Batch_size x N_y x T_i

  • warmup (int, optional, default=10) – The number of states to discard at the beginning of each train/validation batch, before initial transients in the reservoir have died out. The amount to discard depends on the memory of the network and typically ranges from 10s to 100s.

  • verbose (int, optional, default=1) – Whether to print status of training

  • compute_loss_freq (int, optional, default=-1) – How often to compute training loss. Only for information, not necessary for training. Negative value disables computing training loss.

  • warmup_each_batch (bool, optional, default=False) – Whether to use a warmup period on each batch before training. If the batches are not consecutive time series, setting this to True is usually desirable.

Returns

loss – Returns the sum of the losses computed on each sequence in validation set.

Return type

float

validate(batchU, batchY_true, warmup=10, verbose=1)[source]

Get loss on validation set, given past sequences in batchU and observed outcomes batchY_true

Parameters
  • batchU (array_like of np.ndarray) – Batch of input data arrays, columns u(n) concatenated horizontally. Dimensions - Batch_size x N_u x T_i

  • batchY_true (array_like of np.ndarray) – Batch of true output data arrays. Dimensions - Batch_size x N_y x T_i

  • warmup (int, optional, default=10) – The number of states to discard at the beginning of each validation batch, before initial transients in the reservoir have died out. The amount to discard depends on the memory of the network and typically ranges from 10s to 100s.

  • verbose (int, optional, default=1) – Whether to print status of training

Returns

loss – Returns the average of the NMSE losses computed on each sequence

Return type

float

KTPA17

Yuji Kawai, Tatsuya Tokuno, Jihoon Park, and Minoru Asada. Echo in a small-world reservoir: time-series prediction using an economical recurrent neural network. In 2017 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), 126–131. IEEE, 2017.