Draft
135 commits
1e5f474
Add multi-agent support: Actor, Protocol, MultiAgentEnv, MultiAgentRu…
wjh581 Jan 25, 2026
62ab5f8
Simplify multi-agent: remove orchestrator, streamline env and rubric
wjh581 Jan 27, 2026
f097902
cleaning up
wjh581 Jan 27, 2026
7246cc9
cleaning up
wjh581 Jan 27, 2026
929c90f
cleaning up
wjh581 Jan 27, 2026
c4d5613
cleaning up: added some hooks into multiturn so i dont have to overri…
wjh581 Jan 27, 2026
fdf4071
cleaning up: added some hooks into multiturn to add extras for tracje…
wjh581 Jan 27, 2026
177b228
cleaning up some more
wjh581 Jan 27, 2026
b62afdc
trying to unify it better with multiturn and multiagent with the game…
wjh581 Jan 27, 2026
2a663cc
Add proposer_solver and poker environments, update multiagent support
wjh581 Jan 29, 2026
828f80a
Add model and client to actor so you can use different endpoints to p…
wjh581 Jan 29, 2026
c3af35c
adding poker_multi which is poker extension that allows more players.…
wjh581 Jan 29, 2026
dda05da
adjust protocol spawn rubric.
wjh581 Jan 29, 2026
5b1141e
added error handling and working on making sure timing is per actor
wjh581 Jan 30, 2026
842cb9d
adding documentation
wjh581 Jan 30, 2026
86bfe6f
ARC-AGI-2
wjh581 Feb 8, 2026
604903b
Merge branch 'main' into Bhoy1/Multiagent
Bhoy1 Feb 8, 2026
75b634e
updated arc-agi-2 as it still needed a few judges
wjh581 Feb 11, 2026
8d95d58
Merge branch 'Bhoy1/Multiagent' of https://github.com/Bhoy1/verifiers…
wjh581 Feb 11, 2026
cae6005
add ARC multiagent overview doc
wjh581 Feb 11, 2026
ed70983
Merge remote-tracking branch 'origin/main' into Bhoy1/Multiagent
wjh581 Feb 12, 2026
fb9751f
Fix multiagent compatibility with upstream: align run_group/score_gro…
wjh581 Feb 12, 2026
05a088d
Add 2nd pick is_correct to pipeline debug output for pass@2 visibility
wjh581 Feb 12, 2026
ed72edb
merge upstream main into Bhoy1/Multiagent
wjh581 Feb 18, 2026
6821879
multi-agent v2: Agent, TaskSet, Registry abstractions
wjh581 Feb 26, 2026
7b31b0f
multi-agent training: per-actor splitting in run_group()
wjh581 Feb 27, 2026
28dbfde
fix prime-rl compat and Windows support
wjh581 Feb 27, 2026
34952be
disable thinking in guesser prompt, increase max_tokens to 100
wjh581 Feb 27, 2026
ed508da
set twenty questions endpoints to None for training
wjh581 Feb 27, 2026
7fa9cd9
fix token client breaking multi-agent multi-turn
wjh581 Feb 27, 2026
4b3f53d
attach full game conversation to actor trajectory for logging
wjh581 Feb 27, 2026
808a0d8
fix RPS endpoints to None for server mode
wjh581 Feb 27, 2026
4f05e87
add proposer-solver-v2 env with None endpoints for server mode
wjh581 Feb 27, 2026
f2a4802
add asymmetric strategy hints to RPS prompts
wjh581 Feb 27, 2026
57d4b33
token client: find same-actor previous turn for multi-agent token reuse
wjh581 Feb 27, 2026
d9a90c1
Bump RPS agent max_tokens to 20
wjh581 Feb 27, 2026
1bc71a2
Clean up defensive extras access and remove unused import
wjh581 Feb 28, 2026
9c4cbcb
Remove unused outputs_per_input property
wjh581 Feb 28, 2026
7745579
Remove dead code, outdated docs, and old-style environments
wjh581 Feb 28, 2026
7a44917
Revert unnecessary changes to multiturn_env and eval_utils
wjh581 Feb 28, 2026
079e046
Remove unused multiagent_stateful_tool_env and tool_utils alias
wjh581 Feb 28, 2026
1efbfde
Restore poker_multi environment for eval
wjh581 Feb 28, 2026
48405c3
Add Prisoner's Dilemma env with masked actions and asymmetric payoffs
wjh581 Feb 28, 2026
ff98b9b
PD: strip think tags in extractor, bump max_tokens to 50, handle ambi…
wjh581 Feb 28, 2026
30adc1e
Move agent.py and taskset.py into envs/ for organization
wjh581 Feb 28, 2026
00ee035
Strip tool loop, add rollouts_per_example divisibility check, remove …
wjh581 Feb 28, 2026
cf0152e
Simplify MultiAgentRubric: remove group funcs, global reward path, an…
wjh581 Feb 28, 2026
41e33d8
Clean up environment docstrings
wjh581 Feb 28, 2026
afed1f6
Pass advantage through state_to_output for per-actor GRPO
wjh581 Feb 28, 2026
ff5b4a6
Add no-op env_response stub to satisfy abstract method
wjh581 Feb 28, 2026
4d4b2cc
Debug log per-actor advantages
wjh581 Mar 1, 2026
98fe8a6
Use print for advantage debug logging
wjh581 Mar 1, 2026
24026e5
Add debug print to trace advantage passthrough in state_to_output
wjh581 Mar 1, 2026
afd129c
Unconditional debug prints to trace advantage loss between score_grou…
wjh581 Mar 1, 2026
7128c2a
Debug: print which run_group path calls state_to_output
wjh581 Mar 1, 2026
2f9e295
Debug: print inside attempt() right after score_group with object IDs
wjh581 Mar 1, 2026
175b359
Debug: disable timing block to test if it resets advantage
wjh581 Mar 1, 2026
de7922e
debug: trap advantage reset with stack trace
wjh581 Mar 1, 2026
d4c446a
fix: skip advantage overwrite in metrics-only rubrics
wjh581 Mar 1, 2026
7942b47
proposer-solver: conflicting incentives for per-actor GRPO demo
wjh581 Mar 1, 2026
d631e8c
debug prints for reward/metrics pipeline tracing
wjh581 Mar 1, 2026
df7a65f
both agents 2048 max tokens
wjh581 Mar 1, 2026
4276d74
fix metrics dilution: only include per-actor metrics for relevant rol…
wjh581 Mar 1, 2026
5878a6d
remove all debug prints from advantage/metrics debugging
wjh581 Mar 2, 2026
a20fc4d
twenty questions: use default model, add token reuse print
wjh581 Mar 2, 2026
f18ab07
add twenty_questions_v2 pyproject.toml
wjh581 Mar 2, 2026
63f09e8
no_think + lower max_tokens for twenty questions
wjh581 Mar 2, 2026
0c0ee7d
add ultimatum game env
wjh581 Mar 3, 2026
4d1d05a
adversarial prompts for ultimatum game
wjh581 Mar 4, 2026
92d95fd
Propagate actor_id through RolloutOutput for per-actor LoRA routing
wjh581 Mar 4, 2026
b1061e3
Add actor_models parameter to load_environment for per-actor LoRA rou…
wjh581 Mar 4, 2026
c92a5e5
ultimatum game: penalize garbled output, neutral responder prompt
wjh581 Mar 6, 2026
197dc65
pass actor_models through rollout request for per-actor model routing
wjh581 Mar 6, 2026
a36cd83
Add shared-model and 4-player ultimatum game env variants
wjh581 Mar 7, 2026
4373773
Merge upstream/main into Bhoy1/Multiagent_v2
wjh581 Mar 7, 2026
9e7bb9a
Add actor_models to run_rollout for multi-agent routing
wjh581 Mar 7, 2026
575f12d
Add actor_models to run_rollout across full call chain
wjh581 Mar 7, 2026
79e828b
Override run_rollout in MultiAgentEnv for per-actor splitting
wjh581 Mar 8, 2026
0def04e
Support list returns in run_rollout RPC for multi-agent
wjh581 Mar 8, 2026
3444c6c
debug: add trajectory length print in run_rollout
wjh581 Mar 8, 2026
f0d1e95
debug: dump trajectory step extras in create_actor_states
wjh581 Mar 8, 2026
8f5300b
debug: trace rollout loop flow
wjh581 Mar 8, 2026
1318821
debug: show full error type and traceback for responder
wjh581 Mar 8, 2026
89cb625
Fix ultimatum game prompts to use user role for base model compatibility
wjh581 Mar 8, 2026
a923e53
Fix system role in ultimatum_game_shared for base model compatibility
wjh581 Mar 8, 2026
82c2554
Add debug prints for responder ModelError cause
wjh581 Mar 8, 2026
542388b
Normalize trajectory step messages before to_native_prompt in token c…
wjh581 Mar 8, 2026
349b52e
restore system/user role split for instruct model
wjh581 Mar 8, 2026
7c654c3
Remove unused Agent.respond() and re-enable scoring timing
wjh581 Mar 9, 2026
9e0054a
Add ARC codegen training environment
wjh581 Mar 10, 2026
5e53f1d
Fix imports: Actor->Agent, Protocol->Registry
wjh581 Mar 10, 2026
1bf2c75
Make sandbox evaluation async for parallel execution
wjh581 Mar 10, 2026
b803fcf
Add print debugging and sandbox concurrency cap (32)
wjh581 Mar 10, 2026
28576dd
Fix duplicate system prompt, make prompt more direct
wjh581 Mar 10, 2026
13b1e74
skip GRPO advantage for single-state actor groups
wjh581 Mar 10, 2026
58fb988
add needs_group_scoring flag to MultiAgentRubric
wjh581 Mar 10, 2026
f39437f
Add arc_multistrategy env + sort_by_size curriculum for codegen
wjh581 Mar 11, 2026
71e3d7c
Set model on Agent objects instead of TOML actor_models
wjh581 Mar 11, 2026
51fa35c
Add debug print for image solver grid extraction failures
wjh581 Mar 11, 2026
21230b5
Make all 3 actors top-level so all get trained
wjh581 Mar 11, 2026
17925d5
Remove image actor, codegen-only for 2-GPU testing
wjh581 Mar 15, 2026
0ef1872
Skip advantage computation for single-rollout actor groups
wjh581 Mar 15, 2026
aa19f81
add arc_multistrategy_vision env with image actor (Qwen3-VL-4B)
wjh581 Mar 16, 2026
d94e53e
accept **kwargs in load_environment for multi_model actor_endpoints
wjh581 Mar 16, 2026
df597e4
use actor_endpoints for per-actor client routing in vision env
wjh581 Mar 17, 2026
8700cd8
add arc_1d_multistrategy environment and dataset generation script
wjh581 Mar 18, 2026
ee68321
add arc_synthetic_multistrategy environment and dataset generator
wjh581 Mar 24, 2026
50eef72
add arc_synthetic_multistrategy_vision environment
wjh581 Mar 24, 2026
ee7a171
Add 18 Level 2 synthetic ARC generators
wjh581 Mar 24, 2026
c1a83ce
Add LoRA extraction script and test split generation
wjh581 Mar 25, 2026
3f40f71
Fix extract_lora.py: clone tensors before saving to safetensors
wjh581 Mar 25, 2026
9cd251d
Fix extract_lora.py: reconstruct tensors with empty_like+copy_ to fix…
wjh581 Mar 25, 2026
3348ad0
Fix extract_lora.py: save as pytorch bin instead of safetensors
wjh581 Mar 25, 2026
07568bd
Fix extract_lora.py: convert DTensors to plain tensors for vLLM compa…
wjh581 Mar 25, 2026
d604110
Add workspace source for verifiers in arc_synthetic_multistrategy
wjh581 Mar 25, 2026
8627d84
Handle SyntaxError in codegen eval instead of crashing
wjh581 Mar 25, 2026
3dad944
Support separate v4_model for per-actor LoRA routing in eval
wjh581 Mar 25, 2026
f6ab6c0
Remove workspace source - breaks when installed from prime-rl
wjh581 Mar 25, 2026
6d74e12
Add per-actor reward and pass@k metrics to eval display
wjh581 Mar 26, 2026
9eb5385
Restore workspace source for verifiers env
wjh581 Mar 26, 2026
cf79b66
Debug per-actor pass@k computation
wjh581 Mar 26, 2026
6beeac2
fix per-actor pass@k: divide rollouts by num_actors
wjh581 Mar 26, 2026
ec52879
pass pass_threshold through load_environment via kwargs
wjh581 Mar 26, 2026
d4a3aec
add v4_model support to arc_multistrategy for per-actor LoRA routing
wjh581 Mar 26, 2026
4d83f6b
fix LoRA extraction: keep model. prefix in keys, save as safetensors
wjh581 Mar 26, 2026
3a64fe0
add arc_curriculum 2-LoRA self-play environment
wjh581 Apr 2, 2026
d223eb0
simplify curriculum: L1 ops only, no post_ops
wjh581 Apr 2, 2026
69241c1
adjust curriculum grid sizes: 2-5, 5-8, 8-12
wjh581 Apr 2, 2026
142d19a
add arc_curriculum_filtered env with dynamic op skipping
wjh581 Apr 2, 2026
99ba55b
Remove few-shot examples from generator prompt to prevent collapse
wjh581 Apr 3, 2026
33e1871
Generator only picks level (1-3), seed fixed at 1337
wjh581 Apr 3, 2026
581c5bf
Use opaque A/B/C labels instead of 1/2/3 for generator levels
wjh581 Apr 3, 2026
3030730
Mastery-based curriculum: 8 seeds, 5-8 grids, level progression
wjh581 Apr 4, 2026
ae8435a
per-(op, level) mastery tracking for curriculum generator
wjh581 Apr 4, 2026
2a7ccb6
match fixed dataset seeds: 20 per op using same formula
wjh581 Apr 5, 2026
892 changes: 892 additions & 0 deletions environments/arc_1d_multistrategy/arc_1d_multistrategy.py

Large diffs are not rendered by default.

17 changes: 17 additions & 0 deletions environments/arc_1d_multistrategy/pyproject.toml
@@ -0,0 +1,17 @@
[project]
name = "arc-1d-multistrategy"
description = "Multi-strategy ARC-1D solver with two codegen strategies"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = ["verifiers>=0.1.9", "datasets"]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build]
include = ["arc_1d_multistrategy.py", "pyproject.toml"]

[tool.verifiers.eval]
num_examples = 10
rollouts_per_example = 2
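
The eval defaults above set `rollouts_per_example = 2`, and the commit history mentions a "rollouts_per_example divisibility check" (00ee035) and dividing rollouts by `num_actors` for per-actor pass@k (6beeac2). The following is a minimal sketch of what such a check could look like, not the code from this PR. The function name `rollouts_per_actor` is hypothetical, and `num_actors = 2` is an assumption that reads the "two codegen strategies" in the project description as one actor per strategy.

```python
def rollouts_per_actor(rollouts_per_example: int, num_actors: int) -> int:
    """Return how many rollouts each actor gets per example.

    Hypothetical helper: raises ValueError when the rollouts cannot be
    split evenly across actors.
    """
    if num_actors <= 0:
        raise ValueError("num_actors must be positive")
    if rollouts_per_example % num_actors != 0:
        raise ValueError(
            f"rollouts_per_example ({rollouts_per_example}) must be "
            f"divisible by num_actors ({num_actors})"
        )
    return rollouts_per_example // num_actors


# Values copied from [tool.verifiers.eval] in the pyproject.toml above.
eval_config = {"num_examples": 10, "rollouts_per_example": 2}

# Assumption: one actor per codegen strategy, so two actors in total.
per_actor = rollouts_per_actor(eval_config["rollouts_per_example"], num_actors=2)
print(per_actor)
```

Under these assumptions each actor would get one rollout per example. An odd `rollouts_per_example` with two actors would be rejected up front instead of silently skewing per-actor metrics.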