TransformerEncoder
- class continual.TransformerEncoder(encoder_layer, num_layers, norm=None)
Continual Transformer Encoder is a stack of N encoder layers.
The continual formulation of the Transformer Encoder was proposed by Hedegaard et al. in “Continual Transformers: Redundancy-Free Attention for Online Inference” (paper: https://arxiv.org/abs/2201.06268; video: https://www.youtube.com/watch?v=gy802Tlp-eQ).
Note
This class deviates from the PyTorch implementation in the following ways: 1) the encoder_layer parameter takes a factory functor, TransformerEncoderLayerFactory; 2) mask and src_key_padding_mask are currently not supported.
Note
The efficiency gains of forward_step compared to forward are highly dependent on the chosen num_layers; a lower num_layers is most efficient. Accordingly, we recommend increasing d_model, nhead, and dim_feedforward of the TransformerEncoderLayerFactory rather than increasing num_layers if larger models are desired. With the parameter count kept equal, this was found to work well for regular Transformer Encoders as well (https://arxiv.org/pdf/2210.00640.pdf).
Note
In order to handle positional encoding correctly for continual input streams, the RecyclingPositionalEncoding should be used together with this module (see the sketch after the example below).
- Parameters:
encoder_layer (Callable[[MhaType, Optional[bool]], Sequential]) – An instance of TransformerEncoderLayerFactory.
num_layers (int) – the number of sub-encoder-layers in the encoder (required).
norm (Module) – the layer normalization component (optional).
Examples:
import torch
import continual as co

encoder_layer = co.TransformerEncoderLayerFactory(d_model=512, nhead=8, sequence_len=32)
transformer_encoder = co.TransformerEncoder(encoder_layer, num_layers=2)
src = torch.rand(10, 512, 32)  # (batch, d_model, sequence_len)
out = transformer_encoder(src)
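To pair the encoder with positional encoding for a continual stream, as the note above recommends, one possible composition is sketched below. The RecyclingPositionalEncoding arguments (embed_dim, num_embeds) and the co.Sequential wrapping are assumptions based on the library's general patterns, not taken from this page:

import torch
import continual as co

# Hedged sketch: recycle positional indices so that step-wise inference stays
# consistent with the clip-wise forward pass. Argument names are assumed;
# num_embeds is set to sequence_len here and may need tuning.
net = co.Sequential(
    co.RecyclingPositionalEncoding(embed_dim=512, num_embeds=32),
    co.TransformerEncoder(
        co.TransformerEncoderLayerFactory(d_model=512, nhead=8, sequence_len=32),
        num_layers=2,
    ),
)

src = torch.rand(10, 512, 32)  # (batch, d_model, sequence_len)
out = net(src)  # clip-wise forward

for t in range(src.shape[2]):  # continual, step-wise inference
    out_t = net.forward_step(src[:, :, t])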