This repository was archived by the owner on Sep 25, 2025. It is now read-only.
It is important that these be actual sentences
for the "next sentence prediction" task
and in the example sample_text.txt each line does end with either . or ;.
The BERT paper, however, says:
... we sample two spans of text from the corpus, which we refer to as "sentences"
even though they are typically much longer than single sentences
(but can be shorter also)
So it is unclear whether this implementation expects an actual sentence on each line, or whether documents can simply be broken into lines arbitrarily.
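
To make the question concrete, here is a minimal sketch of one reading that would reconcile the two statements: each line holds a real sentence, and the paper's "sentences" are spans built by concatenating consecutive lines. This is only an illustration, not code from this repository. The function name `sample_span_pair`, the `max_tokens` budget, and the whitespace tokenization are all assumptions made for the example.

```python
import random


def sample_span_pair(lines, max_tokens, rng):
    """Pack consecutive lines into a chunk, then split it into two spans.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    chunk, length = [], 0
    for line in lines:
        tokens = line.split()  # whitespace tokenization, for illustration only
        # Stop once adding the next line would exceed the token budget.
        if chunk and length + len(tokens) > max_tokens:
            break
        chunk.append(tokens)
        length += len(tokens)
    if len(chunk) < 2:
        raise ValueError("need at least two lines to form a pair")
    # Split at a line boundary: lines before it form span A, the rest span B.
    split = rng.randint(1, len(chunk) - 1)
    span_a = [tok for line in chunk[:split] for tok in line]
    span_b = [tok for line in chunk[split:] for tok in line]
    return span_a, span_b


lines = ["a b .", "c d .", "e f ."]
span_a, span_b = sample_span_pair(lines, max_tokens=6, rng=random.Random(0))
print(span_a, span_b)
```

Under this reading, a span can cover several lines, so where the line breaks fall decides where span A can end and span B can begin. If lines were cut arbitrarily, that boundary could land mid-sentence.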