Test case names can result in invalid eval directory names #21

@valerie-autumn-skye

The test cases below produce invalid directory names on Windows, which causes metacoder to fail when it reaches them. This is costly when the failure occurs toward the end of a large test run. In these specific cases, the colon (':') is not a valid character in a Windows file or directory name. Linux and macOS will have their own failure cases.

Suggestions:

  • Test name requirements should be added to the config file specification.
  • The YAML config file should be validated before the full test suite starts, so that test cases whose names do not conform to the specification are rejected up front.

Test cases:

- name: https://www.ncbi.nlm.nih.gov/books/NBK1256/_HTML
  metrics:
  - CorrectnessMetric
  input: What are the last two rows of table 2
  expected_output: "Behavior disorder/\nPsychosis        10%        \nAltered mentation\n\
    Impaired reality testing\nCone-rod\ndystrophy        70%        \nLoss of central\
    \ vision & color vision\nAbnormal fundoscopic exam"
  threshold: 0.9

- name: PMID:40307501_Figure_Legend
  metrics:
  - CorrectnessMetric
  input: What is the first sentence of figure 1 legend
  expected_output: Proposed system for bio-accelerated weathering of ultramafic materials
    for carbon mineralization
  threshold: 0.9
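
The up-front validation suggested above could look roughly like the sketch below. It is not metacoder code: `find_invalid_case_names` is a hypothetical helper, and it assumes the YAML config has already been loaded into a list of dicts with a `name` key. The character set covers the characters Windows rejects in file and directory names plus both path separators. It does not check reserved device names such as `CON`, or names ending in a dot or space.

```python
import re

# Characters Windows rejects in file and directory names, plus both path
# separators, which would split one test name into nested directories.
INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def find_invalid_case_names(cases):
    """Return (name, offending_chars) pairs for names unsafe as directory names.

    Hypothetical helper: `cases` is assumed to be a list of dicts with a
    "name" key, as obtained by loading the YAML config file.
    """
    problems = []
    for case in cases:
        name = case.get("name", "")
        bad = sorted(set(INVALID_CHARS.findall(name)))
        if bad:
            problems.append((name, bad))
    return problems


cases = [
    {"name": "https://www.ncbi.nlm.nih.gov/books/NBK1256/_HTML"},
    {"name": "PMID:40307501_Figure_Legend"},
    {"name": "PMID_40307501_Figure_Legend"},  # underscore variant, passes
]

for name, bad in find_invalid_case_names(cases):
    print(f"invalid test name {name!r}: contains {bad}")
```

Running a check like this before the first test case would report both names from this issue immediately, instead of failing at case 6 of 25.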

Stack trace (for first case):

Progress: 6/25 - goose/claude-4-sonnet/https://www.ncbi.nlm.nih.gov/books/NBK1256/_HTML with servers: artl, simple-pubmed, ols
Running goose with claude-4-sonnet on case 'https://www.ncbi.nlm.nih.gov/books/NBK1256/_HTML'
📁 Preparing workdir: eval_workdir\claude-4-sonnet_goose_https:\www.ncbi.nlm.nih.gov\books\NBK1256\_HTML_artl_simple-pubmed_ols\claude-4-sonnet_goose_https:\www.ncbi.nlm.nih.gov\books\NBK1256\_HTML      
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\CTParker\PycharmProjects\mcp_literature_eval\.venv\Scripts\metacoder.exe\__main__.py", line 10, in <module>
    sys.exit(main())
             ~~~~^^
  File "C:\Users\CTParker\PycharmProjects\mcp_literature_eval\.venv\Lib\site-packages\click\core.py", line 1442, in __call__
    return self.main(*args, **kwargs)
           ~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "C:\Users\CTParker\PycharmProjects\mcp_literature_eval\.venv\Lib\site-packages\click\core.py", line 1363, in main
    rv = self.invoke(ctx)
  File "C:\Users\CTParker\PycharmProjects\mcp_literature_eval\.venv\Lib\site-packages\click\core.py", line 1830, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
  File "C:\Users\CTParker\PycharmProjects\mcp_literature_eval\.venv\Lib\site-packages\click\core.py", line 1226, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\CTParker\PycharmProjects\mcp_literature_eval\.venv\Lib\site-packages\click\core.py", line 794, in invoke
    return callback(*args, **kwargs)
  File "C:\Users\CTParker\PycharmProjects\mcp_literature_eval\.venv\Lib\site-packages\metacoder\metacoder.py", line 587, in eval_command
    results = runner.run_all_evals(dataset, workdir_path, coders_list)
  File "C:\Users\CTParker\PycharmProjects\mcp_literature_eval\.venv\Lib\site-packages\metacoder\evals\runner.py", line 502, in run_all_evals
    results = self.run_single_eval(
        model_name,
    ...<4 lines>...
        coder_config,
    )
  File "C:\Users\CTParker\PycharmProjects\mcp_literature_eval\.venv\Lib\site-packages\metacoder\evals\runner.py", line 279, in run_single_eval
    output: CoderOutput = coder.run(case.input)
                          ~~~~~~~~~^^^^^^^^^^^^
  File "C:\Users\CTParker\PycharmProjects\mcp_literature_eval\.venv\Lib\site-packages\metacoder\coders\goose.py", line 146, in run
    self.prepare_workdir()
    ~~~~~~~~~~~~~~~~~~~~^^
  File "C:\Users\CTParker\PycharmProjects\mcp_literature_eval\.venv\Lib\site-packages\metacoder\coders\base_coder.py", line 356, in prepare_workdir
    with change_directory(self.workdir):
         ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "C:\Users\CTParker\AppData\Roaming\uv\python\cpython-3.13.2-windows-x86_64-none\Lib\contextlib.py", line 141, in __enter__
    return next(self.gen)
  File "C:\Users\CTParker\PycharmProjects\mcp_literature_eval\.venv\Lib\site-packages\metacoder\coders\base_coder.py", line 66, in change_directory
    Path(path).mkdir(parents=True, exist_ok=True)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\CTParker\AppData\Roaming\uv\python\cpython-3.13.2-windows-x86_64-none\Lib\pathlib\_local.py", line 722, in mkdir
    os.mkdir(self, mode)
    ~~~~~~~~^^^^^^^^^^^^
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'eval_workdir\\claude-4-sonnet_goose_https:\\www.ncbi.nlm.nih.gov\\books\\NBK1256\\_HTML_artl_simple-pubmed_ols\\claude-4-sonnet_goose_https:\\www.ncbi.nlm.nih.gov\\books\\NBK1256\\_HTML'
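
An alternative to rejecting such names is to map each test name to a directory-safe form before building the workdir path. The following is only a sketch of that idea, not how metacoder currently derives its paths: `safe_dir_name` is a hypothetical function, and the 8-character hash suffix is an assumed way to keep two different names (for example `a:b` and `a/b`) from collapsing into the same directory.

```python
import hashlib
import re

# Characters Windows rejects in names, plus both path separators.
UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_dir_name(name: str) -> str:
    """Map a test case name to a single directory-safe path component.

    Hypothetical helper. Unsafe characters become underscores, and trailing
    dots and spaces are stripped. If anything changed, a short hash of the
    original name is appended so that distinct names stay distinct.
    """
    cleaned = UNSAFE.sub("_", name).rstrip(". ")
    if cleaned == name:
        return name
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:8]
    return f"{cleaned}_{digest}"


for name in (
    "https://www.ncbi.nlm.nih.gov/books/NBK1256/_HTML",
    "PMID:40307501_Figure_Legend",
):
    print(safe_dir_name(name))
```

The trade-off is that directory names no longer match the test names exactly, so results would need to record the original name alongside the sanitized one.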
