TransformerEncoder
- class continual.TransformerEncoder(encoder_layer, num_layers, norm=None)
Continual Transformer Encoder is a stack of N encoder layers.
The continual formulation of the Transformer Encoder was proposed by Hedegaard et al. in “Continual Transformers: Redundancy-Free Attention for Online Inference” (paper: https://arxiv.org/abs/2201.06268; video: https://www.youtube.com/watch?v=gy802Tlp-eQ).
Note
This class deviates from the PyTorch implementation in the following ways: 1) the encoder_layer parameter takes a factory functor, TransformerEncoderLayerFactory; 2) mask and src_key_padding_mask are currently not supported.
Note
The efficiency gains of forward_step compared to forward are highly dependent on the chosen num_layers; a lower num_layers is most efficient. Accordingly, we recommend increasing d_model, nhead, and dim_feedforward of the TransformerEncoderLayerFactory rather than increasing num_layers if larger models are desired. With the parameter count kept equal, this was found to work well for regular Transformer Encoders as well (https://arxiv.org/pdf/2210.00640.pdf).
Note
In order to handle positional encoding correctly for continual input streams, the RecyclingPositionalEncoding should be used together with this module (see the sketch after the example below).
- Parameters:
encoder_layer (Callable[[MhaType, Optional[bool]], Sequential]) – An instance of TransformerEncoderLayerFactory.
num_layers (int) – the number of sub-encoder-layers in the encoder (required).
norm (Module) – the layer normalization component (optional).
Examples:
import torch
import continual as co

encoder_layer = co.TransformerEncoderLayerFactory(d_model=512, nhead=8, sequence_len=32)
transformer_encoder = co.TransformerEncoder(encoder_layer, num_layers=2)
src = torch.rand(10, 512, 32)  # (batch, d_model, sequence_len)
out = transformer_encoder(src)
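To pair the encoder with positional encoding for a continual stream, as the note above recommends, one possible composition is sketched below. The RecyclingPositionalEncoding arguments (embed_dim, num_embeds) and the co.Sequential wrapping are assumptions based on the library's general patterns, not taken from this page:

import torch
import continual as co

# Hedged sketch: recycle positional indices so that step-wise inference stays
# consistent with the clip-wise forward pass. Argument names are assumed;
# num_embeds is set to sequence_len here and may need tuning.
net = co.Sequential(
    co.RecyclingPositionalEncoding(embed_dim=512, num_embeds=32),
    co.TransformerEncoder(
        co.TransformerEncoderLayerFactory(d_model=512, nhead=8, sequence_len=32),
        num_layers=2,
    ),
)

src = torch.rand(10, 512, 32)  # (batch, d_model, sequence_len)
out = net(src)  # clip-wise forward

for t in range(src.shape[2]):  # continual, step-wise inference
    out_t = net.forward_step(src[:, :, t])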