Commit de558ab

Merge branch 'dev-0.4.1'
2 parents 10ed2a8 + 195784b commit de558ab

81 files changed

+1031
-766
lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 .idea
+.vscode
 *.pyc
 *.gz
 *.tsv

docs/package_reference/cross_encoder.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ For an introduction to Cross-Encoders, see [Cross-Encoders](../usage/cross-encod
 CrossEncoder have their own evaluation classes, that are in `sentence_transformers.cross_encoder.evaluation`.

 ```eval_rst
+.. autoclass:: sentence_transformers.cross_encoder.evaluation.CEBinaryAccuracyEvaluator
 .. autoclass:: sentence_transformers.cross_encoder.evaluation.CEBinaryClassificationEvaluator
 .. autoclass:: sentence_transformers.cross_encoder.evaluation.CECorrelationEvaluator
 .. autoclass:: sentence_transformers.cross_encoder.evaluation.CESoftmaxAccuracyEvaluator

docs/package_reference/datasets.md

Lines changed: 0 additions & 5 deletions
@@ -2,11 +2,6 @@
 `sentence_transformers.datasets` contains classes to organize your training input examples.


-## SentencesDataset
-`SentencesDataset` is the main class to store training classes for training. For details, see [training overview](../training/overview.md).
-```eval_rst
-.. autoclass:: sentence_transformers.datasets.SentencesDataset
-```

 ## ParallelSentencesDataset
 `ParallelSentencesDataset` is used for multilingual training. For details, see [multilingual training](../../examples/training/multilingual/README.md).

docs/package_reference/models.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@

 ## Further Classes
 ```eval_rst
+.. autoclass:: sentence_transformers.models.Asym
 .. autoclass:: sentence_transformers.models.BoW
 .. autoclass:: sentence_transformers.models.CNN
 .. autoclass:: sentence_transformers.models.LSTM

docs/pretrained_cross-encoders.md

Lines changed: 50 additions & 14 deletions
@@ -7,41 +7,77 @@ This page lists available **pretrained Cross-Encoders**. Cross-Encoders require

 ## STSbenchmark
 The following models can be used like this:
-```
+```python
 from sentence_transformers import CrossEncoder
 model = CrossEncoder('model_name')
 scores = model.predict([('Sent A1', 'Sent B1'), ('Sent A2', 'Sent B2')])
 ```

 They return a score 0...1 indicating the semantic similarity of the given sentence pair.
-- **sentence-transformers/ce-distilroberta-base-stsb** - STSbenchmark test performance: 87.92
-- **sentence-transformers/ce-roberta-base-stsb** - STSbenchmark test performance: 90.17
-- **sentence-transformers/ce-roberta-large-stsb** - STSbenchmark test performance: 91.47
+- **cross-encoder/stsb-TinyBERT-L-4** - STSbenchmark test performance: 85.50
+- **cross-encoder/stsb-distilroberta-base** - STSbenchmark test performance: 87.92
+- **cross-encoder/stsb-roberta-base** - STSbenchmark test performance: 90.17
+- **cross-encoder/stsb-roberta-large** - STSbenchmark test performance: 91.47

 ## Quora Duplicate Questions
 These models have been trained on the [Quora duplicate questions dataset](https://www.quora.com/q/quoradata/First-Quora-Dataset-Release-Question-Pairs). They can used like the STSb models and give a score 0...1 indicating the probability that two questions are duplicate questions.

-- **sentence-transformers/ce-distilroberta-base-quora** - Average Precision dev set: 87.48
-- **sentence-transformers/ce-roberta-base-quora** - Average Precision dev set: 87.80
-- **sentence-transformers/ce-roberta-large-quora** - Average Precision dev set: 87.91
+- **cross-encoder/quora-distilroberta-base** - Average Precision dev set: 87.48
+- **cross-encoder/quora-roberta-base** - Average Precision dev set: 87.80
+- **cross-encoder/quora-roberta-large** - Average Precision dev set: 87.91

+Note: The model don't work for question similarity. The question *How to learn Java* and *How to learn Python* will get a low score, as these questions are not duplicates. For question similarity, the respective bi-encoder trained on the Quora dataset yields much more meaningful results.

 ## Information Retrieval

 The following models are trained for Information Retrieval: Given a query (like key-words or a question), and a paragraph, can the query be answered by the paragraph? The models have beend trained on MS Marco, a large dataset with real-user queries from Bing search engine.

 The models can be used like this:
-```
+```python
 from sentence_transformers import CrossEncoder
 model = CrossEncoder('model_name', max_length=512)
-scores = model.predict([('Query', 'Paragraph1'), ('Query', 'Paragraph2')])
+scores = model.predict([('Query1', 'Paragraph1'), ('Query2', 'Paragraph2')])
+
+#For Example
+scores = model.predict([('How many people live in Berlin?', 'Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.'),
+('What is the size of New York?', 'New York City is famous for the Metropolitan Museum of Art.')])
 ```

 This returns a score 0...1 indicating if the paragraph is relevant for a given query.

-- **sentence-transformers/ce-ms-marco-TinyBERT-L-2** - MRR@10 on MS Marco Dev Set: 30.15
-- **sentence-transformers/ce-ms-marco-TinyBERT-L-4** - MRR@10 on MS Marco Dev Set: 34.50
-- **sentence-transformers/ce-ms-marco-TinyBERT-L-6** - MRR@10 on MS Marco Dev Set: 36.13
-- **sentence-transformers/ce-ms-marco-electra-base** - MRR@10 on MS Marco Dev Set: 36.41

-For details on the usage, see [Applications - Information Retrieval](../examples/applications/information-retrieval/README.md)
+For details on the usage, see [Applications - Information Retrieval](../examples/applications/information-retrieval/README.md)
+
+
+### MS MARCO
+[MS MARCO Passage Retrieval](https://github.com/microsoft/MSMARCO-Passage-Ranking) is a large dataset with real user queries from Bing search engine with annotated relevant text passages.
+- **cross-encoder/ms-marco-TinyBERT-L-2** - MRR@10 on MS Marco Dev Set: 30.15
+- **cross-encoder/ms-marco-TinyBERT-L-4** - MRR@10 on MS Marco Dev Set: 34.50
+- **cross-encoder/ms-marco-TinyBERT-L-6** - MRR@10 on MS Marco Dev Set: 36.13
+- **cross-encoder/ms-marco-electra-base** - MRR@10 on MS Marco Dev Set: 36.41
+
+### SQuAD (QNLI)
+
+QNLI is based on the [SQuAD dataset](https://rajpurkar.github.io/SQuAD-explorer/) and was introduced by the [GLUE Benchmar](https://arxiv.org/abs/1804.07461). Given a passage from Wikipedia, annotators created questions that are answerable by that passage.
+
+- **cross-encoder/qnli-distilroberta-base** - Accuracy on QNLI dev set: 90.96
+- **cross-encoder/qnli-electra-base** - Accuracy on QNLI dev set: 93.21
+
+
+
+## NLI
+Given two sentences, are these contradicting each other, entailing one the other or are these netural? The following models were trained on the [SNLI](https://nlp.stanford.edu/projects/snli/) and [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) datasets.
+- **cross-encoder/nli-distilroberta-base** - Accuracy on MNLI mismatched set: 83.98
+- **cross-encoder/nli-roberta-base** - Accuracy on MNLI mismatched set: 87.47
+- **cross-encoder/nli-deberta-base** - Accuracy on MNLI mismatched set: 88.08
+
+```python
+from sentence_transformers import CrossEncoder
+model = CrossEncoder('model_name')
+scores = model.predict([('A man is eating pizza', 'A man eats something'), ('A black race car starts up in front of a crowd of people.', 'A man is driving down a lonely road.')])
+
+#Convert scores to labels
+label_mapping = ['contradiction', 'entailment', 'neutral']
+labels = [label_mapping[score_max] for score_max in scores.argmax(axis=1)]
+```
+
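
The label-conversion step added in the NLI snippet can be tried without downloading a model. The following is a minimal sketch, not part of the commit: the `scores` values are invented for illustration, and a small pure-Python `argmax` helper (hypothetical) stands in for `scores.argmax(axis=1)`. Only `label_mapping` is copied from the docs.

```python
# Label order copied from the docs snippet in this diff.
label_mapping = ['contradiction', 'entailment', 'neutral']

# Hypothetical scores for two sentence pairs:
# one row per pair, one column per label (values are made up).
scores = [
    [-2.1, 4.3, 0.2],
    [3.7, -1.5, 0.9],
]


def argmax(row):
    """Return the index of the largest value in a row."""
    return max(range(len(row)), key=lambda i: row[i])


# Pick the highest-scoring label for every pair.
labels = [label_mapping[argmax(row)] for row in scores]
print(labels)
```

With a real model, `scores` would presumably be the array returned by `model.predict`, and the row-wise argmax plays the same role.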

docs/training/overview.md

Lines changed: 7 additions & 13 deletions
@@ -57,19 +57,16 @@ For all available building blocks see [» Models Package Reference](../package_r
 To represent our training data, we use the `InputExample` class to store training examples. As parameters, it accepts texts, which is a list of strings representing our pairs (or triplets). Further, we can also pass a label (either float or int). The following shows a simple example, where we pass text pairs to `InputExample` together with a label indicating the semantic similarity.

 ```python
-from sentence_transformers import SentenceTransformer, SentencesDataset, InputExample
+from sentence_transformers import SentenceTransformer, InputExample
 from torch.utils.data import DataLoader

 model = SentenceTransformer('distilbert-base-nli-mean-tokens')
 train_examples = [InputExample(texts=['My first sentence', 'My second sentence'], label=0.8),
 InputExample(texts=['Another pair', 'Unrelated sentence'], label=0.3)]
-train_dataset = SentencesDataset(train_examples, model)
-train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=16)
+train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
 ```

-To prepare the examples for training, we provide a custom `SentencesDataset`, which is a [custom PyTorch dataset](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html). It accepts as parameters the list with `InputExamples` and the `SentenceTransformer` model.
-
-We can wrap `SentencesDataset` with the standard PyTorch `DataLoader`, which produces for example batches and allows us to shuffle the data for training.
+We wrap our `train_examples` with the standard PyTorch `DataLoader`, which shuffles our data and produces batches of certain sizes.



@@ -92,7 +89,7 @@ For each sentence pair, we pass sentence A and sentence B through our network wh

 A minimal example with `CosineSimilarityLoss` is the following:
 ```python
-from sentence_transformers import SentenceTransformer, SentencesDataset, InputExample, losses
+from sentence_transformers import SentenceTransformer, InputExample, losses
 from torch.utils.data import DataLoader

 #Define the model. Either from scratch of by loading a pre-trained model
@@ -103,8 +100,7 @@ train_examples = [InputExample(texts=['My first sentence', 'My second sentence']
 InputExample(texts=['Another pair', 'Unrelated sentence'], label=0.3)]

 #Define your train dataset, the dataloader and the train loss
-train_dataset = SentencesDataset(train_examples, model)
-train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=16)
+train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
 train_loss = losses.CosineSimilarityLoss(model)

 #Tune the model
@@ -142,7 +138,7 @@ model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_st


 ### Continue Training on Other Data
-[training_stsbenchmark_continue_training.py](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training_transformers/training_stsbenchmark_continue_training.py) shows an example where training on a fine-tuned model is continued. In that example, we use a sentence transformer model that was first fine-tuned on the NLI dataset and then continue training on the training data from the STS benchmark.
+[training_stsbenchmark_continue_training.py](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/sts/training_stsbenchmark_continue_training.py) shows an example where training on a fine-tuned model is continued. In that example, we use a sentence transformer model that was first fine-tuned on the NLI dataset and then continue training on the training data from the STS benchmark.

 First, we load a pre-trained model from the server:
 ```python
@@ -152,9 +148,7 @@ model = SentenceTransformer('bert-base-nli-mean-tokens')

 The next steps are as before. We specify training and dev data:
 ```python
-sts_reader = STSBenchmarkDataReader('datasets/stsbenchmark', normalize_scores=True)
-train_data = SentencesDataset(sts_reader.get_examples('sts-train.csv'), model)
-train_dataloader = DataLoader(train_data, shuffle=True, batch_size=train_batch_size)
+train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=train_batch_size)
 train_loss = losses.CosineSimilarityLoss(model=model)

 evaluator = EmbeddingSimilarityEvaluator.from_input_examples(sts_reader.get_examples('sts-dev.csv'))
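
The updated docs pass the plain list of examples straight to `DataLoader`, which "shuffles our data and produces batches of certain sizes". As a rough illustration of that behaviour only, here is a pure-Python sketch. It is not the PyTorch implementation; `iter_batches` is a hypothetical helper, and plain tuples stand in for `InputExample` objects.

```python
import random


def iter_batches(examples, batch_size, shuffle=True, seed=None):
    """Yield lists of at most `batch_size` examples, optionally in shuffled order."""
    indices = list(range(len(examples)))
    if shuffle:
        # A local Random instance keeps the shuffle reproducible for a given seed.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [examples[i] for i in indices[start:start + batch_size]]


# Stand-ins for InputExample(texts=[...], label=...): (texts, label) tuples.
train_examples = [
    (['My first sentence', 'My second sentence'], 0.8),
    (['Another pair', 'Unrelated sentence'], 0.3),
    (['A third pair', 'Its partner'], 0.5),
]

batches = list(iter_batches(train_examples, batch_size=2, seed=0))
for batch in batches:
    print(len(batch), [label for _, label in batch])
```

Every example appears exactly once per pass, and the last batch may be smaller than `batch_size`.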

examples/applications/computing-embeddings/computing_embeddings.py

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@


 # Load pre-trained Sentence Transformer Model (based on DistilBERT). It will be downloaded automatically
-model = SentenceTransformer('paraphrase-distilroberta-base-v1')
+model = SentenceTransformer('average_word_embeddings_glove.6B.300d')

 # Embed a list of sentences
 sentences = ['This framework generates embeddings for each input sentence',

examples/applications/cross-encoder/cross-encoder_reranking.py

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@

 # To refine the results, we use a CrossEncoder. A CrossEncoder gets both inputs (input_question, retrieved_question)
 # and outputs a score 0...1 indicating the similarity.
-cross_encoder_model = CrossEncoder('sentence-transformers/ce-roberta-base-stsb')
+cross_encoder_model = CrossEncoder('cross-encoder/roberta-base-stsb')

 # Dataset we want to use
 url = "http://qim.fs.quoracdn.net/quora_duplicate_questions.tsv"

examples/applications/cross-encoder/cross-encoder_usage.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 import numpy as np

 # Pre-trained cross encoder
-model = CrossEncoder('sentence-transformers/ce-distilroberta-base-stsb')
+model = CrossEncoder('cross-encoder/distilroberta-base-stsb')

 # We want to compute the similarity between the query sentence
 query = 'A man is eating pasta.'

examples/applications/information-retrieval/README.md

Lines changed: 4 additions & 4 deletions
@@ -78,10 +78,10 @@ In the following table, we provide various pre-trained Cross-Encoders together w

 | Model-Name | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev) | Docs / Sec (BertTokenizerFast) | Docs / Sec |
 | ------------- |:-------------| -----| --- | --- |
-| sentence-transformers/ce-ms-marco-TinyBERT-L-2 | 67.43 | 30.15 | 9000 | 780
-| sentence-transformers/ce-ms-marco-TinyBERT-L-4 | 68.09 | 34.50 | 2900 | 760
-| sentence-transformers/ce-ms-marco-TinyBERT-L-6 | 69.57 | 36.13 | 680 | 660
-| sentence-transformers/ce-ms-marco-electra-base | 71.99 | 36.41 | 340 | 340
+| cross-encoder/ms-marco-TinyBERT-L-2 | 67.43 | 30.15 | 9000 | 780
+| cross-encoder/ms-marco-TinyBERT-L-4 | 68.09 | 34.50 | 2900 | 760
+| cross-encoder/ms-marco-TinyBERT-L-6 | 69.57 | 36.13 | 680 | 660
+| cross-encoder/ms-marco-electra-base | 71.99 | 36.41 | 340 | 340
 | *Other models* | | | |
 | nboost/pt-tinybert-msmarco | 63.63 | 28.80 | 2900 | 760
 | nboost/pt-bert-base-uncased-msmarco | 70.94 | 34.75 | 340 | 340|
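
The table reports MRR@10 without spelling out the metric. As a hedged sketch of the usual definition (mean reciprocal rank of the first relevant hit within the top 10 results, zero if none), the following pure-Python function may help when reading the numbers. `mrr_at_k` and the sample rankings are illustrative assumptions, not code from the repository, and the table values appear to be this quantity multiplied by 100.

```python
def mrr_at_k(rankings, k=10):
    """Mean reciprocal rank at cutoff k.

    `rankings` holds one list per query with relevance flags in ranked order
    (True means the passage at that rank is relevant).
    """
    total = 0.0
    for flags in rankings:
        for rank, relevant in enumerate(flags[:k], start=1):
            if relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


# Hypothetical results for three queries.
rankings = [
    [False, True, False],   # first relevant hit at rank 2 -> 1/2
    [True, False],          # first relevant hit at rank 1 -> 1
    [False] * 12,           # no relevant hit in the top 10 -> 0
]
score = mrr_at_k(rankings)
print(score)
```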
