From 617e9b82c593f72ae2821e6aecb0d3fba728a651 Mon Sep 17 00:00:00 2001
From: Utkarsh Mahajan <89238212+Utkarsh352@users.noreply.github.com>
Date: Thu, 23 Jun 2022 17:39:58 +0530
Subject: [PATCH] Update README.md

---
 README.md | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index da8f792a0..569361f60 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,4 @@
 # BERT
-
 **\*\*\*\*\* New March 11th, 2020: Smaller BERT Models \*\*\*\*\***
 
 This is a release of 24 smaller BERT models (English only, uncased, trained with WordPiece masking) referenced in [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962).
@@ -79,15 +78,15 @@ the pre-processing code.
 In the original pre-processing code, we randomly select WordPiece tokens to
 mask. For example:
 
-`Input Text: the man jumped up , put his basket on phil ##am ##mon ' s head`
-`Original Masked Input: [MASK] man [MASK] up , put his [MASK] on phil
+`Input Text: the man jumped up, put his basket on Phil ##am ##mon ' s head`
+`Original Masked Input: [MASK] man [MASK] up, put his [MASK] on Phil
 [MASK] ##mon ' s head`
 
 The new technique is called Whole Word Masking. In this case, we always mask
-*all* of the the tokens corresponding to a word at once. The overall masking
+*all* of the tokens corresponding to a word at once. The overall masking
 rate remains the same.
 
-`Whole Word Masked Input: the man [MASK] up , put his basket on [MASK] [MASK]
+`Whole Word Masked Input: the man [MASK] up, put his basket on [MASK] [MASK]
 [MASK] ' s head`
 
 The training is identical -- we still predict each masked WordPiece token
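For reference, the Whole Word Masking behavior described in the edited passage (a word and all of its `##` continuation WordPieces are masked together, never partially) can be sketched roughly as below. This is a simplified illustration, not the repository's `create_pretraining_data.py` logic: the function name, the per-word `mask_prob` coin flip, and the whitespace-split input are assumptions made for brevity, whereas the real pre-processing targets a fixed fraction of masked positions so that the overall masking rate stays the same.

```python
import random

def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Illustrative whole word masking: a word and all of its '##'
    continuation WordPieces are masked together or left intact together."""
    # Group token indices into word spans; '##' marks a continuation piece.
    spans = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and spans:
            spans[-1].append(i)
        else:
            spans.append([i])

    # Decide masking per word span rather than per token, so partially
    # masked words such as "phil [MASK] ##mon" can no longer occur.
    output = list(tokens)
    for span in spans:
        if random.random() < mask_prob:
            for i in span:
                output[i] = mask_token
    return output

tokens = "the man jumped up , put his basket on phil ##am ##mon ' s head".split()
print(" ".join(whole_word_mask(tokens)))
```

Run on the README's example sentence, the sketch either masks `phil ##am ##mon` in full (as in the `Whole Word Masked Input` line) or leaves it untouched, matching the contrast the patched passage draws with the original per-token masking.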