Add Gemini Vertex AI integration with thinking budget support #83

NewcomerAI · 2025-08-06T12:31:24Z

Add GeminiVertexLanguageModel class for Vertex AI authentication
Extend extract() function with project, location, thinking_budget parameters
Maintain full backward compatibility with existing API key approach
Add comprehensive test suite and example code
Include detailed documentation and migration guide

Features:

Vertex AI authentication using project + location instead of API keys
Thinking budget control for configurable reasoning capabilities
Full feature parity with existing GeminiLanguageModel
Schema constraints support for structured outputs
Parallel processing and safety settings support
Comprehensive error handling and validation
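
A minimal sketch of how the two authentication paths and the thinking budget could be told apart. It is not the code in this PR: `build_client_kwargs` and `build_thinking_config` are hypothetical helpers, and rejecting an API key combined with project/location is an assumption. With the google-genai SDK, the first dict would typically be passed as `genai.Client(**kwargs)`.

```python
from typing import Any, Optional


def build_client_kwargs(
    api_key: Optional[str] = None,
    project: Optional[str] = None,
    location: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: pick Vertex AI or API-key client settings."""
    if project or location:
        # Vertex AI authenticates with a project and a location together.
        if not (project and location):
            raise ValueError("Vertex AI needs both project and location.")
        # Assumption: mixing both credential styles is treated as an error.
        if api_key:
            raise ValueError("Pass api_key or project/location, not both.")
        return {"vertexai": True, "project": project, "location": location}
    if not api_key:
        raise ValueError("Provide api_key, or project and location.")
    # Existing behaviour: plain API key, unchanged for current callers.
    return {"api_key": api_key}


def build_thinking_config(thinking_budget: Optional[int] = None) -> dict[str, Any]:
    """Hypothetical helper: only emit a thinking config when a budget is set."""
    if thinking_budget is None:
        return {}
    return {"thinking_config": {"thinking_budget": thinking_budget}}
```

Leaving `thinking_budget` as `None` yields an empty config, which is one way to keep existing API-key callers unaffected.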

Files changed:

langextract/inference.py: New GeminiVertexLanguageModel class
langextract/__init__.py: Extended extract() function with new parameters
examples/vertex_ai_example.py: Complete usage examples
test_vertex_integration.py: Integration test suite
VERTEX_AI_INTEGRATION.md: Comprehensive documentation

- Switch from badge.fury.io to shields.io for working PyPI badge
- Convert relative paths to absolute GitHub URLs for PyPI compatibility
- Bump version to 0.1.3

- Add GitHub Actions workflow for automated PyPI publishing via OIDC
- Configure trusted publishing environment for verified releases
- Update project metadata with proper URLs and license format
- Prepare for v1.0.0 stable release with production-ready automation

- Add pylibmagic>=0.5.0 dependency for bundled libraries
- Add [full] install option and pre-import handling
- Update README with troubleshooting and Docker sections
- Bump version to 1.0.1

Fixes google#6

Deleted an inline comment referencing the output directory in the save_annotated_documents.

…ples.md docs: clarify output_dir behavior in medication_examples.md

Prevents confusion from default `test_output/...` by explicitly saving to current directory.

docs: add output_dir="." to all save_annotated_documents examples

feat: add code formatting and linting pipeline

Introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause.

Add LangExtractError base exception for centralized error handling
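
To illustrate the single-except pattern this commit describes, here is a small sketch. Only the name `LangExtractError` comes from the commit message; the two subclasses and `describe_failure` are hypothetical and exist purely for the example.

```python
class LangExtractError(Exception):
    """Base class that library-specific exceptions inherit from."""


class ExampleConfigError(LangExtractError):
    """Hypothetical subclass, not taken from the library."""


class ExampleParseError(LangExtractError):
    """Hypothetical subclass, not taken from the library."""


def describe_failure(step) -> str:
    """Run a callable and report any library error through one except clause."""
    try:
        step()
    except LangExtractError as err:
        # Catches ExampleConfigError, ExampleParseError and any future subclass.
        return f"LangExtract error: {err}"
    return "ok"
```

Exceptions that do not inherit from the base class, such as a plain `ValueError`, still propagate to the caller.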

Fixes google#25 - Windows installation failure due to pylibmagic build requirements

Breaking change: LangFunLanguageModel removed. Use GeminiLanguageModel or OllamaLanguageModel instead.

fix: Remove LangFun and pylibmagic dependencies to fix Windows installation and OpenAI SDK v1.x compatibility

- Modified save_annotated_documents to accept both pathlib.Path and string paths
- Convert string paths to Path objects before calling mkdir()
- This fixes the error when using output_dir='.' as shown in the README example

…-mkdir Fix save_annotated_documents to handle string paths
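
The fix described above amounts to normalizing the argument before touching the filesystem. A minimal sketch, assuming a hypothetical helper named `ensure_output_dir` (the real change lives inside save_annotated_documents and may differ in detail):

```python
import pathlib
from typing import Union


def ensure_output_dir(output_dir: Union[str, pathlib.Path]) -> pathlib.Path:
    """Hypothetical helper: accept str or Path and make sure the directory exists."""
    # A plain str has no mkdir() method, so convert before calling it.
    path = pathlib.Path(output_dir)
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Calling `ensure_output_dir(".")` and `ensure_output_dir(pathlib.Path("."))` then behave the same way.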

feat: Add OpenAI language model support

…s: (google#10)

* docs: clarify output_dir behavior in medication_examples.md
* Removed inline comment in medication example
  Deleted an inline comment referencing the output directory in the save_annotated_documents.
* docs: add output_dir="." to all save_annotated_documents examples
  Prevents confusion from default `test_output/...` by explicitly saving to current directory.
* build: add formatting & linting pipeline with pre-commit integration
* style: apply pyink, isort, and pre-commit formatting
* ci: enable format and lint checks in tox
* Add LangExtractError base exception for centralized error handling
  Introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause.
* fix(ui): prevent current highlight border from being obscured

---------

Co-authored-by: Leena Kamran <[email protected]>
Co-authored-by: Akshay Goel <[email protected]>

- Gemini & OpenAI test suites with retry on transient errors
- CI: Separate job, Python 3.11 only, skips for forks
- Validates char_interval for all extractions
- Multilingual test xfail (issue google#13)

TODO: Remove xfail from multilingual test after tokenizer fix
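
Retrying on transient errors could look roughly like the decorator below. This is a sketch under assumptions, not the test suite's actual code: `TransientError` is a hypothetical stand-in for whatever rate-limit or timeout exceptions the live API tests treat as retryable, and the attempt count and linear backoff are illustrative.

```python
import functools
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable API failure."""


def retry_on_transient(max_attempts: int = 3, delay_seconds: float = 1.0):
    """Retry the wrapped callable when it raises TransientError."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except TransientError:
                    # Out of attempts: let the last error reach the caller.
                    if attempt == max_attempts:
                        raise
                    # Linear backoff: wait longer after each failed attempt.
                    time.sleep(delay_seconds * attempt)

        return wrapper

    return decorator
```

Any other exception type is not caught, so genuine test failures still surface on the first attempt.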

…oogle#57) Fixes google#27

…e#62)

- Add quickstart example and documentation for local LLM usage
- Include Docker setup with health checks and docker-compose
- Add integration tests and update CI pipeline
- Secure setup: localhost-only binding, containerized deployment

Signed-off-by: Akshay Goel <[email protected]>

- Ollama integration with Docker examples
- Fixed OllamaLanguageModel parameter name (model -> model_id)
- Added CI/CD tests for Ollama
- Updated documentation with consistent API examples

Bumps the github_actions group with 1 update in the /.github/workflows directory: [tj-actions/changed-files](https://github.com/tj-actions/changed-files).

Updates `tj-actions/changed-files` from 44 to 46
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](tj-actions/changed-files@v44...v46)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-version: '46'
  dependency-type: direct:production
  dependency-group: github_actions
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…e#74)

- Add check-linked-issue.yml: Enforces that PRs reference issues with 5+ community reactions
- Add check-pr-size.yml: Labels PRs by size and enforces 1000 line limit
- Update CONTRIBUTING.md: Document new PR requirements and size guidelines
- Include helpful error messages with links to contribution guidelines
- Create a scalable system for maintaining code quality and review efficiency

- Use specific version v1.2.7 of nearform action
- Ensures custom-body parameter is properly recognized

* Add infrastructure file protection workflow
  - Prevents non-maintainers from modifying .github/ directory
  - Provides helpful guidance to use autoformat.sh for formatting
  - Updates CONTRIBUTING.md to document this restriction
* Expand infrastructure protection to build config and core docs
  - Protect pyproject.toml, tox.ini, .pre-commit-config.yaml, .pylintrc, Dockerfile
  - Protect autoformat.sh, .gitignore, CONTRIBUTING.md, LICENSE, CITATION.cff
  - Update workflow name to 'Protect Infrastructure Files'
  - Simplify CONTRIBUTING.md language to be more general
* Add note about special circumstances for build config changes
  Clarify that build configuration updates may be considered in special circumstances with proper discussion, testing evidence, and community support.

- Add maintainer permission check to linked issue workflow
- Maintainers (admin/maintain) can bypass 5+ thumbs up requirement
- Provides informational logging when maintainer bypass is used

* Add workflow_dispatch trigger to validation workflows
  - Enable manual triggering for check-linked-issue, check-pr-size, and validate_pr_template
  - Add conditional logic to ensure PR-specific steps only run on PR events
  - Allows maintainers to manually trigger workflows when needed
* Add manual trigger to infrastructure protection workflow
  - Add workflow_dispatch trigger
  - Add conditional logic for PR-specific checks
  - Ensures consistency across all validation workflows

- Change from pull_request to pull_request_target in all validation workflows
- This gives workflows proper permissions to add labels and comments on PRs from forks
- Fixes 'Resource not accessible by integration' error (HTTP 403)
- Safe because workflows only read PR metadata and don't execute PR code

- Fix pyink formatting issues in __init__.py and inference.py
- Remove trailing whitespace
- Move google.genai.types import to top level to fix pylint warning
- Remove duplicate import from method

Enables manual triggering of CI workflow including live API tests. This allows maintainers to run live API tests for PRs from forks where the tests would normally be skipped for security reasons.

github-actions · 2025-08-06T23:11:12Z

Manual validation results:

Size: 825 lines
Template: ✗
Linked issue: ✗

Run ID: 16790802536

github-actions · 2025-08-06T23:15:54Z

Manual validation results:

Size: 825 lines
Template: ✗
Linked issue: ✗

Run ID: 16790874427

github-actions · 2025-08-06T23:36:53Z

Manual validation results:

Size: 825 lines
Template: ✗
Linked issue: ✗

Run ID: 16791195286

github-actions · 2025-08-06T23:40:27Z

Manual Validation Results

Status: ❌ Failed

| Check | Status | Details |
|---|---|---|
| PR Size | ✅ | 825 lines |
| Template | ❌ | Missing required sections |
| Linked Issue | ❌ | Missing Fixes/Closes #XXX |

Errors:

❌ Missing PR template sections: # Description, Fixes #, # How Has This Been Tested?, # Checklist
❌ No linked issue found
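
For contributors who want to pre-check a PR body locally, the three checks reported above can be approximated as follows. This is a rough sketch, not the repository's workflow code: `validate_pr` is a hypothetical function, the section names are copied from the error message, the 1000 line limit comes from the check-pr-size commit, and treating exactly 1000 lines as passing is an assumption. The real linked-issue check also requires 5+ community reactions on the issue, which is not modelled here.

```python
import re

# Section markers copied from the bot's "Missing PR template sections" error.
REQUIRED_SECTIONS = (
    "# Description",
    "Fixes #",
    "# How Has This Been Tested?",
    "# Checklist",
)
MAX_LINES = 1000  # limit named in the check-pr-size commit message
LINKED_ISSUE = re.compile(r"\b(?:fixes|closes)\s+#\d+", re.IGNORECASE)


def validate_pr(body: str, lines_changed: int) -> list[str]:
    """Hypothetical local approximation of the size, template and issue checks."""
    errors = []
    if lines_changed > MAX_LINES:
        errors.append(f"PR too large: {lines_changed} lines (limit {MAX_LINES})")
    missing = [section for section in REQUIRED_SECTIONS if section not in body]
    if missing:
        errors.append("Missing PR template sections: " + ", ".join(missing))
    if not LINKED_ISSUE.search(body):
        errors.append("No linked issue found")
    return errors
```

Run against a body with no template headings and 825 changed lines, this returns the same two errors listed above and no size error.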

github-actions · 2025-08-07T05:29:40Z

⚠️ Branch Update Required

Your branch is 9 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push