
Misconception of the transformer_encoder in notebook https://github.com/keras-team/keras-io/blob/master/examples/timeseries/timeseries_classification_transformer.py #1256

@YacineKaci

Description


The layer normalization should be applied after the residual connection, not before. At least, that is the post-norm formulation, LayerNorm(x + Sublayer(x)), used by the original Transformer and generally followed in the Transformer community.
So the code:
...
x = layers.LayerNormalization(epsilon=1e-6)(x)
res = x + inputs
...
x = layers.LayerNormalization(epsilon=1e-6)(x)
return x + res

Should be replaced by:
...
x = x + inputs
res = layers.LayerNormalization(epsilon=1e-6)(x)
...
x = x + res
return layers.LayerNormalization(epsilon=1e-6)(x)
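
For context, here is a minimal sketch of the whole transformer_encoder with the post-norm ordering applied. It assumes the MultiHeadAttention / Conv1D feed-forward structure of the linked keras-io example; it is an illustration of the proposed ordering, not the exact code from the repository.

from tensorflow.keras import layers

def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0):
    # Self-attention block: add the residual connection first, then normalize (post-norm).
    x = layers.MultiHeadAttention(
        key_dim=head_size, num_heads=num_heads, dropout=dropout
    )(inputs, inputs)
    x = layers.Dropout(dropout)(x)
    x = x + inputs
    res = layers.LayerNormalization(epsilon=1e-6)(x)

    # Feed-forward block: again add the residual before normalizing.
    x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(res)
    x = layers.Dropout(dropout)(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    x = x + res
    return layers.LayerNormalization(epsilon=1e-6)(x)

It is called from a functional model the same way as in the example, e.g. x = transformer_encoder(x, head_size=256, num_heads=4, ff_dim=4, dropout=0.25) (values here are illustrative).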

Kind regards
