Conversation

penfever
Contributor

@penfever penfever commented Aug 22, 2025

Description

This PR refactors and stabilizes the inference test suite, extends coverage to more engines (VLLM and LlamaCPP), makes VLM GPU inference tests more efficient by using smaller models, and standardizes the testing format for inference engines via an abstract base class.

Each inference engine now inherits 7+ standardized test methods covering basic inference, batch processing, file I/O, parameter validation, deterministic generation, and error handling scenarios. Abstract methods allow each engine to specify performance thresholds, model configurations, and hardware requirements while maintaining consistent test behavior.

Improved GPU VRAM cache release triggering, allowing longer test suites to run without CUDA out-of-memory (OOM) errors.

Related issues

Fixes # (issue)

Before submitting

  • This PR only changes documentation. (You can ignore the following checks in that case)
  • Did you read the contributor guideline Pull Request guidelines?
  • Did you link the issue(s) related to this PR in the section above?
  • Did you add / update tests where needed?

Reviewers

At least one review from a member of oumi-ai/oumi-staff is required.

@penfever penfever marked this pull request as ready for review August 25, 2025 23:10
@penfever penfever requested review from a team, oelachqar and wizeng23 and removed request for a team August 25, 2025 23:11
@penfever penfever changed the title [WIP] Penfever/inference tests Standardized Inference Engine Integration Testing Aug 26, 2025