In some cases (usually involving runs of multiple whitespace characters), the tokenizer can produce sentences with zero tokens. These empty sentences cause errors later in the pipeline, specifically the following crash in the parser:
```
File "/usr/local/lib/python3.6/dist-packages/cube/api.py" line 194 in __call__
sequences = self._parser.parse_sequences(sequences)
File "/usr/local/lib/python3.6/dist-packages/cube/generic_networks/parsers.py" line 496 in parse_sequences
predicted_tags = self.tag(new_sequence)
File "/usr/local/lib/python3.6/dist-packages/cube/generic_networks/parsers.py" line 226 in tag
arc_matrix, aux_arc_matrix, proj_labels, softmax_morphology = self._predict_arc(seq)
File "/usr/local/lib/python3.6/dist-packages/cube/generic_networks/parsers.py" line 470 in _predict_arc
s_max = dy.softmax(dy.concatenate(s_max))
File "_dynet.pyx" line 4605 in _dynet.concatenate
File "_dynet.pyx" line 4618 in _dynet.concatenate
AssertionError: List is empty, nothing to concatenate.
```
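For reference, a minimal way to hit this code path, assuming the standard NLP-Cube usage from the README (the exact whitespace pattern that produces an empty sentence may vary by tokenizer model):

```python
# Hypothetical reproduction; whether a given model emits a zero-token
# sentence depends on the tokenizer, so the whitespace is illustrative.
from cube.api import Cube

cube = Cube(verbose=True)
cube.load("en")  # load the pretrained English model

# Runs of whitespace like these can be tokenized into a sentence with
# zero tokens, which the parser later fails to concatenate.
text = "First sentence.   \n\n \t   Second sentence."
sentences = cube(text)
```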
This change removes empty sequences from the tokenization output.
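A minimal sketch of the filtering (the variable name is illustrative, not necessarily the one in the patch): empty sentences are dropped before the output is handed to downstream components.

```python
# Sketch only: drop sentences with zero tokens so dy.concatenate never
# receives an empty list downstream. `sequences` stands in for the
# tokenizer's list of tokenized sentences.
sequences = [sequence for sequence in sequences if len(sequence) > 0]
```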