TransformerEncoder

class continual.TransformerEncoder(encoder_layer, num_layers, norm=None)[source]

Continual Transformer Encoder is a stack of N encoder layers.

The continual formulation of the Transformer Encoder was proposed by Hedegaard et al. in “Continual Transformers: Redundancy-Free Attention for Online Inference” (paper: https://arxiv.org/abs/2201.06268, video: https://www.youtube.com/watch?v=gy802Tlp-eQ).

Note

This class deviates from the PyTorch implementation in the following ways: 1) the encoder_layer parameter takes a factory functor, TransformerEncoderLayerFactory; 2) mask and src_key_padding_mask are not currently supported.

Note

The efficiency gains of forward_step compared to forward are highly dependent on the chosen num_layers: a lower num_layers is most efficient. Accordingly, if larger models are desired, we recommend increasing d_model, nhead, and dim_feedforward of the TransformerEncoderLayerFactory rather than increasing num_layers. Keeping the parameter count equal, this was found to work well for regular Transformer Encoders as well (https://arxiv.org/pdf/2210.00640.pdf).
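To make the "wider instead of deeper" trade-off concrete, here is a back-of-the-envelope parameter-count comparison. It counts only the dominant weight matrices of a standard Transformer encoder layer (four attention projections plus two feed-forward matrices; biases and layer norms are ignored), and the wider configuration shown is an illustrative choice, not one prescribed by the library:

```python
def encoder_layer_params(d_model: int, dim_feedforward: int) -> int:
    """Approximate weight count of one standard Transformer encoder layer:
    4 attention projection matrices (q, k, v, out) of shape (d_model, d_model)
    plus 2 feed-forward matrices of shape (d_model, dim_feedforward)."""
    return 4 * d_model**2 + 2 * d_model * dim_feedforward

# Two stacked layers at the default-ish width ...
deep = 2 * encoder_layer_params(d_model=512, dim_feedforward=2048)

# ... versus a single, wider layer with a matching budget.
wide = 1 * encoder_layer_params(d_model=768, dim_feedforward=2560)

print(deep, wide)  # both configurations land on 6291456 weights
```

Under this approximation the two configurations have identical weight budgets, but the single-layer variant benefits more from forward_step.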

Note

In order to handle positional encoding correctly for continual input streams, the RecyclingPositionalEncoding should be used together with this module.

Parameters:

- encoder_layer – an instance of TransformerEncoderLayerFactory used to construct each of the stacked layers (required).
- num_layers – the number of sub-encoder layers in the encoder (required).
- norm – optional layer normalization applied to the encoder output (default=None).

Examples:

import torch
import continual as co

encoder_layer = co.TransformerEncoderLayerFactory(d_model=512, nhead=8, sequence_len=32)
transformer_encoder = co.TransformerEncoder(encoder_layer, num_layers=2)
src = torch.rand(10, 512, 32)  # (batch, d_model, sequence_len)
out = transformer_encoder(src)
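The forward / forward_step duality that makes the continual formulation efficient can be illustrated with a self-contained toy (this is not the continual API; a sliding-window mean stands in for the encoder computation): forward processes a whole sequence at once, while forward_step consumes one token at a time, caching the last sequence_len inputs so each new call only does incremental work.

```python
from collections import deque


class ToyStreamingEncoder:
    """Toy illustration of batch (forward) vs streaming (forward_step)
    processing, using a sliding-window mean in place of a real encoder."""

    def __init__(self, sequence_len: int):
        self.sequence_len = sequence_len
        # Cache of the most recent inputs, reused across steps.
        self.buffer = deque(maxlen=sequence_len)

    def forward(self, seq):
        # Batch mode: compute every sliding-window mean from scratch.
        n = self.sequence_len
        return [sum(seq[i - n + 1:i + 1]) / n for i in range(n - 1, len(seq))]

    def forward_step(self, x):
        # Streaming mode: push one token, reuse the cached window.
        self.buffer.append(x)
        if len(self.buffer) < self.sequence_len:
            return None  # warm-up: not enough context yet
        return sum(self.buffer) / self.sequence_len
```

Feeding the same sequence token by token through forward_step reproduces the forward outputs after a warm-up of sequence_len - 1 steps, mirroring how the continual modules trade a one-off warm-up for redundancy-free per-step inference.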