README.md: 2 additions & 2 deletions
@@ -120,7 +120,7 @@ You can increase `--num-examples` and `--num-candidate-solutions` to run on more
### Running on more examples.
-There are 500 examples total in SWE-bench Verified. Note that this can take a while, so there are a few levels of parallelism this repository supports.
+There are 500 examples total in SWE-bench Verified. Note that this can take a while, so there are a few levels of parallelism this repository supports.
- Firstly, we suggest running 8 processes. This is the `--num-processes` flag. Beyond this, Docker hits issues.
- Secondly, we support a notion of breaking up the dataset into shards. This is the `--shard-ct` and `--shard-id` flags. This makes it relatively easy to split up the work across multiple machines, which circumvents the issues with scaling Docker beyond 8 processes.
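The sharding idea described above can be illustrated with a minimal Python sketch. It assumes a round-robin split in which shard `shard_id` out of `shard_ct` takes every `shard_ct`-th example; the repository's actual partitioning scheme is not shown here, and the function name `shard_indices` is hypothetical.

```python
def shard_indices(num_examples: int, shard_ct: int, shard_id: int) -> list[int]:
    """Return the example indices assigned to one shard.

    Assumption: round-robin assignment. The real tool may split differently.
    """
    if shard_ct <= 0:
        raise ValueError("shard_ct must be positive")
    if not 0 <= shard_id < shard_ct:
        raise ValueError("shard_id must satisfy 0 <= shard_id < shard_ct")
    # Shard k takes indices k, k + shard_ct, k + 2 * shard_ct, ...
    return list(range(shard_id, num_examples, shard_ct))


if __name__ == "__main__":
    # Small illustration: 10 examples split across 3 shards.
    print(shard_indices(10, 3, 1))  # → [1, 4, 7]
```

Under this assumption, splitting the 500 examples across 4 machines gives each machine 125 examples, and every example lands in exactly one shard.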
-The input JSONL file should contain a list of problem objects, each with the following structure:
+The input JSONL file should contain a list of problem objects, each with the following structure. The `diffs` are the candidate solutions generated by the agent. The `eval_outcomes` are the results of running the eval harness on each candidate solution, where the index corresponds to the index in the `diffs` array.
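The index correspondence between `diffs` and `eval_outcomes` can be sketched in Python as follows. Only those two field names come from the text above; the helper names `load_problems` and `pair_candidates` are hypothetical, and the shape of each outcome value is not specified here, so the sketch treats it as opaque.

```python
import json
from typing import Any


def load_problems(path: str) -> list[dict[str, Any]]:
    """Read one JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def pair_candidates(problem: dict[str, Any]) -> list[tuple[str, Any]]:
    """Pair each candidate diff with the eval outcome at the same index."""
    diffs = problem["diffs"]
    outcomes = problem["eval_outcomes"]
    if len(diffs) != len(outcomes):
        raise ValueError("diffs and eval_outcomes must have the same length")
    return list(zip(diffs, outcomes))


if __name__ == "__main__":
    # Illustrative problem object; boolean outcomes are an assumption.
    example = {"diffs": ["diff A", "diff B"], "eval_outcomes": [True, False]}
    for diff, outcome in pair_candidates(example):
        print(outcome, diff)
```

Checking the lengths up front makes a misaligned input fail loudly instead of silently dropping trailing candidates, which `zip` alone would do.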
example_ensembler_results.json: 1 addition & 1 deletion
@@ -23,4 +23,4 @@
"selected_diff": "@@ -45,3 +45,12 @@ def is_palindrome(text):\n cleaned_text = ''.join(c.lower() for c in text if c.isalnum())\n return cleaned_text == cleaned_text[::-1]\n\n+def is_valid_email(email):\n+ \"\"\"\n+ Check if a string is a valid email address.\n+ \"\"\"\n+ import re\n+ \n+ # Simple regex pattern for email validation\n+ pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'\n+ return bool(re.match(pattern, email))\n",