Hi,
Thank you for the excellent paper and for providing the code! I have been trying to reproduce the results from Figure 8 of the paper using LLaMA-7B and LLaMA-13B on the TriviaQA dataset, which I downloaded using the command in the README.
However, I get the following values:
7B:
0 docs: 50.8, 1 doc: 54.1, 2 docs: 55.9, 3 docs: 56.4
13B:
0 docs: 57.8, 1 doc: 58.8, 2 docs: 59.8, 3 docs: 60.4
The numbers for 1-3 documents are close to those in the paper, but there is a ~3% gap for 0 documents. Can you please provide any insight into what might explain this discrepancy?
