Description
In _normalize_token, when a stop_token or forbidden_token string maps to multiple token IDs, the code raises a ValueError. The error message uses {token!r} in a regular string instead of an f-string, so users see the literal {token!r} instead of the actual invalid token value. This makes it harder to debug misconfigured stop/forbidden tokens.
Example of the error currently shown:
ValueError: Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must map to single token ids in the vocab.
Instead of the expected:
ValueError: Invalid token: 'hello world'. `stop_token`s and `forbidden_token`s must map to single token ids in the vocab.
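The root cause can be reproduced without any tokenizer: Python only interpolates `{token!r}` when the string literal carries the `f` prefix. This is a minimal sketch, using `token = 'hello world'` as an example value:

```python
token = 'hello world'

# Plain string literal: the braces are kept verbatim (the current behavior).
buggy = ('Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must'
         ' map to single token ids in the vocab.')

# f-string: `{token!r}` is replaced with repr(token), i.e. 'hello world' in quotes.
fixed = (f'Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must'
         ' map to single token ids in the vocab.')

print(buggy)
print(fixed)
```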
Locations
- gemma/gm/text/_sampler.py
- gemma/research/t5gemma/sampling.py
Steps to reproduce
- Create a tokenizer (e.g. Tokenizer.from_version(3)).
- Call _normalize_token(tokenizer, "hello world") (or any string that tokenizes to more than one token).
- The ValueError message contains the literal {token!r} instead of the actual token string.
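A minimal sketch of the fix, applied to a `_normalize_token`-style helper. `WhitespaceTokenizer`, its `encode` method, and `normalize_token` are hypothetical stand-ins for illustration, not the gemma API or the actual body of `_normalize_token`; the only change that matters is the `f` prefix on the error message.

```python
from __future__ import annotations


class WhitespaceTokenizer:
    """Hypothetical stand-in tokenizer: maps each whitespace-separated word to one id."""

    def __init__(self, vocab: dict[str, int]) -> None:
        self.vocab = vocab

    def encode(self, text: str) -> list[int]:
        return [self.vocab[word] for word in text.split()]


def normalize_token(tokenizer: WhitespaceTokenizer, token: str | int) -> int:
    """Hypothetical helper: resolve a stop/forbidden token to a single token id."""
    if isinstance(token, int):
        return token  # Already a token id, nothing to resolve.
    token_ids = tokenizer.encode(token)
    if len(token_ids) != 1:
        # The `f` prefix is the fix: without it, `{token!r}` is printed literally.
        raise ValueError(
            f'Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must'
            ' map to single token ids in the vocab.'
        )
    return token_ids[0]


tokenizer = WhitespaceTokenizer({'hello': 1, 'world': 2})
print(normalize_token(tokenizer, 'hello'))  # prints 1
try:
    normalize_token(tokenizer, 'hello world')
except ValueError as e:
    # prints: Invalid token: 'hello world'. `stop_token`s and `forbidden_token`s
    # must map to single token ids in the vocab.
    print(e)
```

Running the same snippet with the `f` prefix removed reproduces the reported message containing the literal `{token!r}`.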
