LSTM

class continual.LSTM(input_size, hidden_size, num_layers=1, bias=True, dropout=0.0, proj_size=0, device=None, dtype=None, *args, **kwargs)[source]

Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.

For each element in the input sequence, each layer computes the following function:

\begin{array}{ll}
i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\
f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\
g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\
o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\
c_t = f_t \odot c_{t-1} + i_t \odot g_t \\
h_t = o_t \odot \tanh(c_t) \\
\end{array}

where h_t is the hidden state at time t, c_t is the cell state at time t, x_t is the input at time t, h_{t-1} is the hidden state of the layer at time t-1 or the initial hidden state at time 0, and i_t, f_t, g_t, o_t are the input, forget, cell, and output gates, respectively. σ is the sigmoid function, and ⊙ is the Hadamard product.
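As a sanity check, a single time step of these equations can be written directly in PyTorch. This is a minimal illustrative sketch, not this module's implementation; the stacked weight layout (W_ii | W_if | W_ig | W_io) follows the Variables section below, and all names and sizes are assumptions for the example.

import torch

def lstm_step(x_t, h_prev, c_prev, W_ih, W_hh, b_ih, b_hh, hidden_size):
    # Gate pre-activations; W_ih stacks (W_ii | W_if | W_ig | W_io),
    # W_hh stacks (W_hi | W_hf | W_hg | W_ho).
    gates = x_t @ W_ih.T + b_ih + h_prev @ W_hh.T + b_hh
    i, f, g, o = gates.split(hidden_size, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # i_t, f_t, o_t
    g = torch.tanh(g)                                               # g_t
    c_t = f * c_prev + i * g                                        # c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
    h_t = o * torch.tanh(c_t)                                       # h_t = o_t ⊙ tanh(c_t)
    return h_t, c_t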

In a multilayer LSTM, the input x_t^(l) of the l-th layer (l >= 2) is the hidden state h_t^(l-1) of the previous layer multiplied by dropout δ_t^(l-1), where each δ_t^(l-1) is a Bernoulli random variable which is 0 with probability dropout.

If proj_size > 0 is specified, LSTM with projections will be used. This changes the LSTM cell in the following way. First, the dimension of h_t will be changed from hidden_size to proj_size (dimensions of W_hi will be changed accordingly). Second, the output hidden state of each layer will be multiplied by a learnable projection matrix: h_t = W_hr h_t. Note that as a consequence, the output of the LSTM network will have a different shape as well. See the Inputs/Outputs sections below for exact dimensions of all variables. You can find more details in https://arxiv.org/abs/1402.1128.
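The effect of projections on the output dimensions can be illustrated as follows. This is a sketch assuming the shapes documented in the Inputs/Outputs sections below; the concrete sizes are illustrative.

import torch
import continual as co

# With proj_size > 0, H_out = proj_size instead of hidden_size.
lstm = co.LSTM(input_size=10, hidden_size=20, proj_size=5)
x = torch.randn(1, 10, 16)                  # (N, H_in, L)
output, (h_n, c_n) = lstm(x)                # initial state defaults to zeros
print(output.shape)                         # expected (1, 5, 16)  -> (N, proj_size, L)
print(h_n.shape, c_n.shape)                 # expected (1, 1, 5) and (1, 1, 20)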

Parameters:
  • input_size (int) – The number of expected features in the input x

  • hidden_size (int) – The number of features in the hidden state h

  • num_layers (int) – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1

  • bias (bool) – If False, then the layer does not use bias weights b_ih and b_hh. Default: True

  • dropout (float) – If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to dropout. Default: 0

  • proj_size (int) – If > 0, will use LSTM with projections of corresponding size. Default: 0

Inputs: input, (h_0, c_0)
  • input: tensor of shape (N, H_in, L) containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence() or torch.nn.utils.rnn.pack_sequence() for details.

  • h_0: tensor of shape (num_layers, N, H_out) containing the initial hidden state for each element in the batch. Defaults to zeros if (h_0, c_0) is not provided.

  • c_0: tensor of shape (num_layers, N, H_cell) containing the initial cell state for each element in the batch. Defaults to zeros if (h_0, c_0) is not provided.

where:

\begin{aligned}
N ={} & \text{batch size} \\
L ={} & \text{sequence length} \\
H_{in} ={} & \text{input\_size} \\
H_{cell} ={} & \text{hidden\_size} \\
H_{out} ={} & \text{proj\_size if proj\_size} > 0 \text{ otherwise hidden\_size} \\
\end{aligned}
Outputs: output, (h_n, c_n)
  • output: tensor of shape (N, H_out, L) containing the output features (h_t) from the last layer of the LSTM, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.

  • h_n: tensor of shape (num_layers, N, H_out) containing the final hidden state for each element in the batch.

  • c_n: tensor of shape (num_layers, N, H_cell) containing the final cell state for each element in the batch.
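A minimal shape check under the dimensions listed above; the concrete sizes are illustrative, and the commented shapes are the ones documented in this section.

import torch
import continual as co

lstm = co.LSTM(input_size=10, hidden_size=20, num_layers=2)
x = torch.randn(1, 10, 16)            # (N, H_in, L)
h_0 = torch.zeros(2, 1, 20)           # (num_layers, N, H_out)
c_0 = torch.zeros(2, 1, 20)           # (num_layers, N, H_cell)
output, (h_n, c_n) = lstm(x, (h_0, c_0))
print(output.shape)                   # expected (1, 20, 16) = (N, H_out, L)
print(h_n.shape, c_n.shape)           # expected (2, 1, 20) and (2, 1, 20)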

Variables:
  • weight_ih_l[k] – the learnable input-hidden weights of the k-th layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size, input_size) for k = 0. Otherwise, the shape is (4*hidden_size, num_directions * hidden_size). If proj_size > 0 was specified, the shape will be (4*hidden_size, num_directions * proj_size) for k > 0.

  • weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size, hidden_size). If proj_size > 0 was specified, the shape will be (4*hidden_size, proj_size).

  • bias_ih_l[k] – the learnable input-hidden bias of the k-th layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)

  • bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)

  • weight_hr_l[k] – the learnable projection weights of the k-th layer of shape (proj_size, hidden_size). Only present when proj_size > 0 was specified.

  • weight_ih_l[k]_reverse – Analogous to weight_ih_l[k] for the reverse direction. Only present when bidirectional=True.

  • weight_hh_l[k]_reverse – Analogous to weight_hh_l[k] for the reverse direction. Only present when bidirectional=True.

  • bias_ih_l[k]_reverse – Analogous to bias_ih_l[k] for the reverse direction. Only present when bidirectional=True.

  • bias_hh_l[k]_reverse – Analogous to bias_hh_l[k] for the reverse direction. Only present when bidirectional=True.

  • weight_hr_l[k]_reverse – Analogous to weight_hr_l[k] for the reverse direction. Only present when bidirectional=True and proj_size > 0 was specified.
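The learnable tensors listed above can be inspected generically through the nn.Module API. The sketch below assumes nothing about the exact attribute names used by continual.LSTM internally; it simply lists whatever parameters are registered.

import continual as co

lstm = co.LSTM(input_size=10, hidden_size=20, num_layers=2)
for name, param in lstm.named_parameters():
    print(name, tuple(param.shape))
# For layer k = 0 one expects 4*hidden_size = 80 rows, e.g. an input-hidden
# weight of shape (80, 10) and a hidden-hidden weight of shape (80, 20).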

Note

All the weights and biases are initialized from U(-√k, √k) where k = 1/hidden_size.
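For instance, with hidden_size = 20 the bound is √(1/20) ≈ 0.224. A small sketch of the corresponding draw (sizes illustrative):

import math
import torch

k = 1.0 / 20                                          # 1 / hidden_size
bound = math.sqrt(k)                                  # ≈ 0.2236
w = torch.empty(4 * 20, 10).uniform_(-bound, bound)   # e.g. a weight_ih_l0-shaped tensor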

Note

Bidirectional LSTMs are not supported.

Note

Contrary to the module version found in torch.nn, this module assumes batch first, channel next, and temporal dimension last.
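Concretely, a sequence-first tensor laid out for torch.nn.LSTM (with batch_first=False) can be rearranged for this module with a permute; the sizes below are illustrative.

import torch

x_seq_first = torch.randn(16, 1, 10)           # (L, N, H_in), torch.nn-style layout
x_channel_time = x_seq_first.permute(1, 2, 0)  # (N, H_in, L), layout expected by this module
print(x_channel_time.shape)                    # torch.Size([1, 10, 16])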

Examples:

import torch
import continual as co

lstm = co.LSTM(input_size=10, hidden_size=20, num_layers=2)
#               B, C,  T
x = torch.randn(1, 10, 16)

# torch API
h0 = (torch.randn(2, 1, 20), torch.randn(2, 1, 20))
output, hn = lstm(x, h0)

# continual inference API
lstm.set_state(h0)
firsts = lstm.forward_steps(x[:, :, :-1])
last = lstm.forward_step(x[:, :, -1])

assert torch.allclose(firsts, output[:, :, :-1])
assert torch.allclose(last, output[:, :, -1])
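The same result can be obtained fully step by step, e.g. when frames arrive one at a time in an online setting. This is a sketch continuing the example above, under the same assumptions.

# fully step-wise (online) processing, continuing the example above
lstm.set_state(h0)
stepwise = torch.stack(
    [lstm.forward_step(x[:, :, t]) for t in range(x.shape[2])], dim=2
)
assert torch.allclose(stepwise, output)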