qwen35-9b · team-no-protocol · CooperData v1

Generated 2026-05-29 · follow-up to the team→coop placeholder report
First end-to-end run of the team→coop converter (CooperData PR #101) on a fresh model + dataset combination: Qwen3.5-9B (mini_swe_agent_v2, --setting team --team-no-protocol) against CooperData v1 (70 tasks, 345 conflict pairs). Adds a third run to CooperBench/team-coop alongside the two existing codex/gpt-5.5-hao runs.

Headline

| field | value |
| --- | --- |
| pairs evaluated | 345 / 345 |
| both-agent pass | 21 (6.09%) |
| fail | 323 |
| eval error | 1 |
| cost | $0.00 (self-hosted vLLM) |
| HF subdir | qwen35-cooperdata-team-noproto |
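
The headline rate can be rechecked from the counts above. This is a minimal sketch, not part of the CooperBench tooling; the variable and function names are illustrative. It assumes the eval-error pair stays in the denominator, which matches the reported 6.09% (dividing by 344 would round to 6.10%).

```python
# Headline counts copied from the table above.
pairs_evaluated = 345
both_pass = 21
fail = 323
eval_error = 1


def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage rounded to two decimals."""
    return round(100.0 * passed / total, 2)


# Every evaluated pair lands in exactly one outcome bucket.
assert both_pass + fail + eval_error == pairs_evaluated

# Assumption: the eval-error pair is counted in the denominator.
headline_rate = pass_rate(both_pass, pairs_evaluated)
print(f"{both_pass}/{pairs_evaluated} = {headline_rate:.2f}%")
```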

Pass rate by repo

Ten of the 26 repos saw at least one both-agent pass. arrow, cobra, and pyparsing contribute the largest absolute counts (6, 3, and 3); cobra, pyparsing, env, and typeguard reach a per-repo rate of at least 18%, though each has 16 or fewer pairs.

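| repo | n | pass | fail | err | rate |
| --- | --- | --- | --- | --- | --- |
| arrow_task | 63 | 6 | 57 | 0 | 9.5% |
| cobra_task | 16 | 3 | 13 | 0 | 18.8% |
| pyparsing_task | 15 | 3 | 12 | 0 | 20.0% |
| flask_task | 29 | 2 | 27 | 0 | 6.9% |
| env_task | 10 | 2 | 8 | 0 | 20.0% |
| axios_task | 74 | 1 | 73 | 0 | 1.4% |
| gin_task | 18 | 1 | 17 | 0 | 5.6% |
| pygments_task | 9 | 1 | 8 | 0 | 11.1% |
| oauthlib_task | 7 | 1 | 6 | 0 | 14.3% |
| typeguard_task | 5 | 1 | 4 | 0 | 20.0% |
| astroid_task | 23 | 0 | 23 | 0 | 0.0% |
| flake8_task | 18 | 0 | 18 | 0 | 0.0% |
| jinja_task | 9 | 0 | 9 | 0 | 0.0% |
| anyhow_task | 8 | 0 | 8 | 0 | 0.0% |
| sqlfluff_task | 6 | 0 | 6 | 0 | 0.0% |
| sqlglot_task | 6 | 0 | 6 | 0 | 0.0% |
| starlette_task | 6 | 0 | 6 | 0 | 0.0% |
| sqlparse_task | 4 | 0 | 4 | 0 | 0.0% |
| cantools_task | 3 | 0 | 3 | 0 | 0.0% |
| indicatif_task | 3 | 0 | 3 | 0 | 0.0% |
| roaring_task | 3 | 0 | 3 | 0 | 0.0% |
| trio_task | 3 | 0 | 3 | 0 | 0.0% |
| tweepy_task | 3 | 0 | 3 | 0 | 0.0% |
| click_task | 2 | 0 | 2 | 0 | 0.0% |
| avro_task | 1 | 0 | 1 | 0 | 0.0% |
| xpath_task | 1 | 0 | 1 | 0 | 0.0% |
| total | 345 | 21 | 323 | 1 | 6.09% |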
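
The per-repo claims can be recomputed from the n and pass columns. The sketch below is illustrative only: `REPO_COUNTS`, `rate_pct`, and the 18% threshold constant are names chosen here, not identifiers from the CooperBench codebase, and the counts are copied by hand from the table.

```python
# (n, both-agent passes) per repo, copied from the table above.
REPO_COUNTS = {
    "arrow_task": (63, 6), "cobra_task": (16, 3), "pyparsing_task": (15, 3),
    "flask_task": (29, 2), "env_task": (10, 2), "axios_task": (74, 1),
    "gin_task": (18, 1), "pygments_task": (9, 1), "oauthlib_task": (7, 1),
    "typeguard_task": (5, 1), "astroid_task": (23, 0), "flake8_task": (18, 0),
    "jinja_task": (9, 0), "anyhow_task": (8, 0), "sqlfluff_task": (6, 0),
    "sqlglot_task": (6, 0), "starlette_task": (6, 0), "sqlparse_task": (4, 0),
    "cantools_task": (3, 0), "indicatif_task": (3, 0), "roaring_task": (3, 0),
    "trio_task": (3, 0), "tweepy_task": (3, 0), "click_task": (2, 0),
    "avro_task": (1, 0), "xpath_task": (1, 0),
}

HIGH_RATE_THRESHOLD_PCT = 18.0  # threshold quoted in the text above


def rate_pct(n: int, passed: int) -> float:
    """Per-repo both-agent pass rate as a percentage."""
    return 100.0 * passed / n


# Repos with at least one both-agent pass.
repos_with_pass = [repo for repo, (n, p) in REPO_COUNTS.items() if p > 0]

# Repos whose own pass rate meets the threshold, in alphabetical order.
high_rate_repos = sorted(
    repo for repo, (n, p) in REPO_COUNTS.items()
    if rate_pct(n, p) >= HIGH_RATE_THRESHOLD_PCT
)

total_n = sum(n for n, _ in REPO_COUNTS.values())
total_pass = sum(p for _, p in REPO_COUNTS.values())

print(len(REPO_COUNTS), len(repos_with_pass), high_rate_repos)
print(f"{total_pass}/{total_n} = {rate_pct(total_n, total_pass):.2f}%")
```

The high-rate repos all have small n, so a single extra pass or fail would move their rates by several points.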

Run setup

| field | value |
| --- | --- |
| model | Qwen/Qwen3.5-9B served on Modal vLLM (cooperbench--qwen35-9b-128k-serve.modal.run) |
| agent_framework | mini_swe_agent_v2 |
| setting | team --team-no-protocol (lead + member, Redis-backed task list, shared scratchpad volume, MCP) |
| dataset | CooperData v1: 70 tasks, 345 has_conflict pairs across 26 repos |
| concurrency | 3 pairs in flight (lowered from 5 after a host OOM mid-sweep) |
| backend | docker (task images: akhatua/cooperbench-<repo>:task<id>) |
| run wall-time | ~7h spread over ~30h (two host OOMs from pathological generated tests; sweep resumed each time) |

Data registry

| field | value |
| --- | --- |
| data_id | team-coop-qwen35-cooperdata-v1 |
| storage_path | huggingface.co/datasets/CooperBench/team-coop/tree/main/qwen35-cooperdata-team-noproto |
| generation_method | cooperbench --setting team --team-no-protocol, then convert_team (CooperData PR #101) |
| n_trajectories | 345 |
| model | Qwen/Qwen3.5-9B |
| agent_framework | mini_swe_agent_v2 |
| pass_rate | 6.09% (21/345) |
| owner | ProKil |
| date | 2026-05-29 |

Regenerate

cd /path/to/CooperBench
# (1) trajectories
export OPENAI_API_KEY=dummy
export OPENAI_BASE_URL=https://cooperbench--qwen35-9b-128k-serve.modal.run/v1
uv run cooperbench run \
    -n qwen35-cooperdata-team-noproto \
    -s cooperdata-all \
    -a mini_swe_agent_v2 \
    -m openai/Qwen/Qwen3.5-9B \
    --setting team --team-no-protocol \
    --backend docker --concurrency 3 --eval-concurrency 3 \
    --dataset-dir /path/to/CooperData/dataset
# (2) team-harness logs → coop layout
uv run python scripts/convert_team_to_coop.py logs/qwen35-cooperdata-team-noproto --out data
# (3) upload
hf upload CooperBench/team-coop \
    data/qwen35-cooperdata-team-noproto qwen35-cooperdata-team-noproto \
    --repo-type dataset
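
Because the sweep runs for hours, it may be worth confirming that the model endpoint responds before step (1). The following is a hedged sketch, not part of the CooperBench CLI: it assumes the Modal deployment exposes the standard OpenAI-compatible model-listing route under `/v1`, as vLLM's OpenAI-compatible server does, and the helper names `models_url` and `list_models` are hypothetical.

```python
import json
import os
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL from an OpenAI-style base URL ending in /v1."""
    return base_url.rstrip("/") + "/models"


def list_models(base_url: str, api_key: str, timeout: float = 30.0) -> list[str]:
    """Return the model IDs advertised by an OpenAI-compatible server."""
    request = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return [entry["id"] for entry in payload.get("data", [])]


if __name__ == "__main__":
    # Reuses the environment variables exported in step (1).
    base_url = os.environ.get("OPENAI_BASE_URL")
    if base_url is None:
        print("OPENAI_BASE_URL is not set; skipping endpoint check")
    else:
        api_key = os.environ.get("OPENAI_API_KEY", "dummy")
        print(list_models(base_url, api_key))
```

The returned IDs can then be compared by eye with the model name passed to `-m` in step (1).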

Related links