`ParallelSentencesDataset` is used for multilingual training. For details, see [multilingual training](../../examples/training/multilingual/README.md).
They return a score 0...1 indicating the semantic similarity of the given sentence pair.
- **cross-encoder/stsb-TinyBERT-L-4** - STSbenchmark test performance: 85.50
- **cross-encoder/stsb-distilroberta-base** - STSbenchmark test performance: 87.92
- **cross-encoder/stsb-roberta-base** - STSbenchmark test performance: 90.17
- **cross-encoder/stsb-roberta-large** - STSbenchmark test performance: 91.47
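
The similarity scores can be compared directly. A minimal sketch of ranking candidate pairs by score, using illustrative values in place of actual `model.predict` output:

```python
# Sentence pairs and hypothetical similarity scores (illustrative values,
# not actual model output; in practice: scores = model.predict(pairs))
pairs = [('A man is eating food.', 'A man eats something.'),
         ('A plane is taking off.', 'A cat plays the piano.')]
scores = [0.93, 0.02]

# Sort pairs by descending similarity
ranked = sorted(zip(pairs, scores), key=lambda x: x[1], reverse=True)
for (sent_a, sent_b), score in ranked:
    print(f'{score:.2f}\t{sent_a} | {sent_b}')
```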
## Quora Duplicate Questions
These models have been trained on the [Quora duplicate questions dataset](https://www.quora.com/q/quoradata/First-Quora-Dataset-Release-Question-Pairs). They can be used like the STSb models and give a score 0...1 indicating the probability that two questions are duplicates.
- **cross-encoder/quora-distilroberta-base** - Average Precision dev set: 87.48
- **cross-encoder/quora-roberta-base** - Average Precision dev set: 87.80
- **cross-encoder/quora-roberta-large** - Average Precision dev set: 87.91
Note: The models don't work for question similarity. The questions *How to learn Java* and *How to learn Python* will get a low score, as these questions are not duplicates. For question similarity, the respective bi-encoder trained on the Quora dataset yields much more meaningful results.
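
Since the output is a duplicate probability, a common pattern is to apply a decision threshold. A small sketch with illustrative scores (the 0.5 cutoff is an assumption and should be tuned on a dev set):

```python
# Hypothetical duplicate probabilities for three question pairs
# (illustrative values; in practice: scores = model.predict(question_pairs))
scores = [0.97, 0.12, 0.55]
threshold = 0.5  # assumed cutoff, tune on a dev set

is_duplicate = [score >= threshold for score in scores]
print(is_duplicate)  # [True, False, True]
```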
## Information Retrieval
The following models are trained for Information Retrieval: given a query (like keywords or a question) and a paragraph, can the query be answered by the paragraph? The models have been trained on MS MARCO, a large dataset with real user queries from the Bing search engine.
The models can be used like this:

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder('model_name', max_length=512)
scores = model.predict([('How many people live in Berlin?', 'Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.'),
                        ('What is the size of New York?', 'New York City is famous for the Metropolitan Museum of Art.')])
```
This returns a score 0...1 indicating whether the paragraph is relevant for a given query.
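
These relevance scores are typically used to re-rank a set of candidate passages for a query. A minimal sketch with illustrative scores in place of actual `model.predict` output:

```python
query = 'How many people live in Berlin?'
passages = ['Berlin had a population of 3,520,031 registered inhabitants.',
            'New York City is famous for the Metropolitan Museum of Art.',
            'Berlin is the capital of Germany.']
# Hypothetical relevance scores (illustrative values; in practice:
# scores = model.predict([(query, p) for p in passages]))
scores = [0.98, 0.01, 0.35]

# Re-rank passages by descending relevance and keep the top hit
reranked = sorted(zip(passages, scores), key=lambda x: x[1], reverse=True)
top_passage, top_score = reranked[0]
print(top_score, top_passage)
```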

For details on the usage, see [Applications - Information Retrieval](../examples/applications/information-retrieval/README.md)
### MS MARCO
[MS MARCO Passage Retrieval](https://github.com/microsoft/MSMARCO-Passage-Ranking) is a large dataset with real user queries from the Bing search engine and annotated relevant text passages.
- **cross-encoder/ms-marco-TinyBERT-L-2** - MRR@10 on MS Marco Dev Set: 30.15
- **cross-encoder/ms-marco-TinyBERT-L-4** - MRR@10 on MS Marco Dev Set: 34.50
- **cross-encoder/ms-marco-TinyBERT-L-6** - MRR@10 on MS Marco Dev Set: 36.13
- **cross-encoder/ms-marco-electra-base** - MRR@10 on MS Marco Dev Set: 36.41
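
The MS MARCO models are compared by MRR@10: for each query, the reciprocal rank of the first relevant passage among the top 10 results, averaged over all queries. A small sketch of that computation (my own illustration, not code from this repository):

```python
def mrr_at_10(relevance_lists):
    """relevance_lists: per query, 0/1 relevance flags of the ranked passages."""
    total = 0.0
    for flags in relevance_lists:
        for rank, relevant in enumerate(flags[:10], start=1):
            if relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(relevance_lists)

# Query 1: first relevant hit at rank 2; query 2: rank 1; query 3: no hit in top 10
print(mrr_at_10([[0, 1, 0], [1, 0], [0] * 10]))  # 0.5
```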
### SQuAD (QNLI)
QNLI is based on the [SQuAD dataset](https://rajpurkar.github.io/SQuAD-explorer/) and was introduced by the [GLUE Benchmark](https://arxiv.org/abs/1804.07461). Given a passage from Wikipedia, annotators created questions that are answerable by that passage.
- **cross-encoder/qnli-distilroberta-base** - Accuracy on QNLI dev set: 90.96
- **cross-encoder/qnli-electra-base** - Accuracy on QNLI dev set: 93.21
## NLI
Given two sentences, do they contradict each other, does one entail the other, or are they neutral? The following models were trained on the [SNLI](https://nlp.stanford.edu/projects/snli/) and [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) datasets.
- **cross-encoder/nli-distilroberta-base** - Accuracy on MNLI mismatched set: 83.98
- **cross-encoder/nli-roberta-base** - Accuracy on MNLI mismatched set: 87.47
- **cross-encoder/nli-deberta-base** - Accuracy on MNLI mismatched set: 88.08
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder('model_name')
scores = model.predict([('A man is eating pizza', 'A man eats something'), ('A black race car starts up in front of a crowd of people.', 'A man is driving down a lonely road.')])
```
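
The NLI models output one score per class rather than a single similarity value. A hedged sketch of mapping raw scores to labels, assuming the class order `['contradiction', 'entailment', 'neutral']` (verify against the concrete model's card):

```python
# Hypothetical per-class scores for two sentence pairs (illustrative values;
# in practice, scores = model.predict(pairs) returns one row per pair)
scores = [[-3.1, 5.2, -1.0],
          [4.0, -2.2, 0.3]]

# Assumed class order; check the model card before relying on it
label_mapping = ['contradiction', 'entailment', 'neutral']
labels = [label_mapping[max(range(len(row)), key=row.__getitem__)] for row in scores]
print(labels)  # ['entailment', 'contradiction']
```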
To represent our training data, we use the `InputExample` class to store training examples. As parameters, it accepts texts, which is a list of strings representing our pairs (or triplets). Further, we can also pass a label (either float or int). The following shows a simple example, where we pass text pairs to `InputExample` together with a label indicating the semantic similarity.

```python
from sentence_transformers import SentenceTransformer, InputExample
from torch.utils.data import DataLoader

model = SentenceTransformer('distilbert-base-nli-mean-tokens')
train_examples = [InputExample(texts=['My first sentence', 'My second sentence'], label=0.8),
    InputExample(texts=['Another pair', 'Unrelated sentence'], label=0.3)]
```

We wrap our `train_examples` with the standard PyTorch `DataLoader`, which shuffles our data and produces batches of certain sizes.
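
Conceptually, the `DataLoader` shuffles the examples and groups them into batches. A pure-Python sketch of that behavior (an illustration only, not the actual PyTorch implementation):

```python
import random

def batched(examples, batch_size, shuffle=True, seed=42):
    """Yield lists of at most batch_size examples, optionally shuffled."""
    order = list(range(len(examples)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [examples[i] for i in order[start:start + batch_size]]

batches = list(batched(list(range(10)), batch_size=4))
print([len(batch) for batch in batches])  # [4, 4, 2]
```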
A minimal example with `CosineSimilarityLoss` is the following:
```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Define the model. Either from scratch or by loading a pre-trained model
model = SentenceTransformer('distilbert-base-nli-mean-tokens')
```
[training_stsbenchmark_continue_training.py](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/sts/training_stsbenchmark_continue_training.py) shows an example where training on a fine-tuned model is continued. In that example, we use a sentence transformer model that was first fine-tuned on the NLI dataset and then continue training on the training data from the STS benchmark.
First, we load a pre-trained model from the server:
```python
model = SentenceTransformer('bert-base-nli-mean-tokens')
```
The next steps are as before. We specify training and dev data: