RNN

class continual.RNN(input_size, hidden_size, num_layers=1, nonlinearity='tanh', bias=True, dropout=0.0, device=None, dtype=None, *args, **kwargs)[source]

Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.

For each element in the input sequence, each layer computes the following function:

h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh})

where h_t is the hidden state at time t, x_t is the input at time t, and h_{(t-1)} is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0. If nonlinearity is 'relu', then ReLU is used instead of tanh.
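
As a sanity check of the recurrence above, the sketch below reproduces a single step by hand. It uses torch.nn.RNNCell rather than continual.RNN purely to expose the weight matrices directly; the sizes (input_size=10, hidden_size=20) are arbitrary.

import torch

cell = torch.nn.RNNCell(input_size=10, hidden_size=20, nonlinearity="tanh")
x_t = torch.randn(1, 10)     # input at time t
h_prev = torch.randn(1, 20)  # hidden state at time t-1

# h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
h_manual = torch.tanh(
    x_t @ cell.weight_ih.T + cell.bias_ih
    + h_prev @ cell.weight_hh.T + cell.bias_hh
)
assert torch.allclose(cell(x_t, h_prev), h_manual, atol=1e-6)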

Parameters:
  • input_size (int) – The number of expected features in the input x

  • hidden_size (int) – The number of features in the hidden state h

  • num_layers (int) – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1

  • nonlinearity – The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'

  • bias (bool) – If False, then the layer does not use bias weights b_ih and b_hh. Default: True

  • dropout (float) – If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
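
For illustration, the constructor arguments above could be combined as in the following sketch (the argument values are arbitrary):

import continual as co

# Two stacked layers with ReLU activation and dropout between layers
rnn = co.RNN(
    input_size=10,
    hidden_size=20,
    num_layers=2,
    nonlinearity="relu",
    bias=True,
    dropout=0.1,
)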

Inputs: input, h_0
  • input: tensor of shape (N, H_in, L) containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence() or torch.nn.utils.rnn.pack_sequence() for details.

  • h_0: tensor of shape (num_layers, N, H_out) containing the initial hidden state for each element in the batch. Defaults to zeros if not provided.

where:

N = batch size
L = sequence length
H_in = input_size
H_out = hidden_size

Outputs: output, h_n
  • output: tensor of shape (N, H_out, L) containing the output features (h_t) from the last layer of the RNN, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.

  • h_n: tensor of shape (num_layers, N, H_out) containing the final hidden state for each element in the batch.
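
To make the shapes concrete, a minimal sketch based on the shapes listed above (with N=1, H_in=10, H_out=20, L=16, num_layers=2):

import torch
import continual as co

rnn = co.RNN(input_size=10, hidden_size=20, num_layers=2)
x = torch.randn(1, 10, 16)   # (N, H_in, L)
h0 = torch.zeros(2, 1, 20)   # (num_layers, N, H_out)

output, h_n = rnn(x, h0)
assert output.shape == (1, 20, 16)  # (N, H_out, L)
assert h_n.shape == (2, 1, 20)      # (num_layers, N, H_out)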

Variables:
  • weight_ih_l[k] – the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, hidden_size), since bidirectional RNNs are not supported.

  • weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)

  • bias_ih_l[k] – the learnable input-hidden bias of the k-th layer, of shape (hidden_size)

  • bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
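
The variables are exposed with the layer index appended to the name, e.g. weight_ih_l0. A minimal sketch of the expected shapes, assuming the parameters are reachable as direct attributes as in torch.nn.RNN (with input_size=10, hidden_size=20, num_layers=2):

import continual as co

rnn = co.RNN(input_size=10, hidden_size=20, num_layers=2)
assert rnn.weight_ih_l0.shape == (20, 10)  # (hidden_size, input_size) for k = 0
assert rnn.weight_ih_l1.shape == (20, 20)  # (hidden_size, hidden_size) for k > 0
assert rnn.weight_hh_l0.shape == (20, 20)  # (hidden_size, hidden_size)
assert rnn.bias_ih_l0.shape == (20,)       # (hidden_size,)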

Note

All the weights and biases are initialized from \mathcal{U}(-\sqrt{k}, \sqrt{k}), where k = \frac{1}{\text{hidden\_size}}.

Note

Bidirectional RNNs are not supported.

Note

Contrary to the module version found in torch.nn, this module assumes batch first, channel next, and temporal dimension last.
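
If data is already laid out in the torch.nn batch_first convention (N, L, H_in), it presumably needs to be permuted before being passed to this module; a minimal sketch:

import torch

x_torch = torch.randn(1, 16, 10)   # (N, L, H_in), torch.nn batch_first layout
x_cont = x_torch.permute(0, 2, 1)  # (N, H_in, L), layout assumed by this module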

Examples:

import torch
import continual as co

rnn = co.RNN(input_size=10, hidden_size=20, num_layers=2)
#               B, C,  T
x = torch.randn(1, 10, 16)

# torch API
h0 = torch.randn(2, 1, 20)
output, hn = rnn(x, h0)

# continual inference API
rnn.set_state(h0)
firsts = rnn.forward_steps(x[:, :, :-1])
last = rnn.forward_step(x[:, :, -1])

assert torch.allclose(firsts, output[:, :, :-1])
assert torch.allclose(last, output[:, :, -1])
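
In a purely online setting, forward_step can also be called once per incoming frame. A sketch continuing from the example above (same rnn, x, h0, and output):

rnn.set_state(h0)
outputs = []
for t in range(x.shape[2]):
    # process one time step at a time
    outputs.append(rnn.forward_step(x[:, :, t]))

assert torch.allclose(torch.stack(outputs, dim=2), output)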