
Conversation

jerichosiahaya

Add clearer validation to the AzureOpenAITextVectorizer config for azure_deployment, so people aren't confused by the fact that it doesn't state that azure_deployment is a required field.

@jerichosiahaya changed the title from "feat: add validation checking for azure deployment config" to "add validation checking for azure deployment config" Nov 12, 2024
@tylerhutcherson tylerhutcherson added the enhancement New feature or request label Nov 12, 2024
@tylerhutcherson tylerhutcherson self-requested a review November 12, 2024 15:51
@tylerhutcherson
Collaborator

tylerhutcherson commented Nov 19, 2024

@jerichosiahaya can you use the poetry env to run the formatter/linter? `poetry run format`

other details outlined here: https://github.com/redis/redis-vl-python/blob/main/CONTRIBUTING.md#linting-and-tests

@tylerhutcherson
Collaborator

We want to bring this in, but more updates are needed. I will port your commits to another branch and use that. Stay tuned.

@jerichosiahaya
Author

> We want to bring this in, but more updates are needed. I will port your commits to another branch and use that. Stay tuned.

Alright, what do I need to update?

@bsbodden
Collaborator

@tylerhutcherson did this get implemented in another branch?

@bsbodden
Collaborator

Hi @jerichosiahaya,

Thank you so much for taking the time to contribute to RedisVL! We really appreciate community members who notice areas where our documentation could be clearer and take the initiative to help improve the user experience.

Why we can't merge this PR

After careful review, rebasing against main, and extensive testing, I've identified several fundamental issues with this approach that prevent us from merging it:

  1. The parameter is extracted but never used

The core problem is that the PR extracts azure_deployment from api_config and validates its presence, but then completely discards the value. Looking at the code:

  azure_deployment = (
      api_config.pop("azure_deployment")
      if api_config
      else os.getenv("AZURE_OPENAI_DEPLOYMENT")
  )

  if not azure_deployment:
      raise ValueError(...)

After this validation, azure_deployment is never passed to any client constructor, never stored as an instance variable, and never used anywhere. The variable just disappears.

  2. Architectural misunderstanding of Azure OpenAI

The Azure OpenAI deployment name isn't a parameter to the AzureOpenAI client constructor - it's passed as the model parameter to the embeddings.create() method calls. Our current implementation already handles this correctly:

  • Users provide the deployment name via the model parameter when creating the vectorizer
  • This is documented in the docstring (line 91-92): "Deployment to use for embedding. Must be the 'Deployment name' not the 'Model name'"
  • The model parameter is then used in all _embed(), _embed_many(), _aembed(), and _aembed_many() calls

The deployment name is already a required parameter - just not where this PR looks for it.
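To make the flow concrete, here is a minimal sketch using a hypothetical stand-in class (not the real openai SDK, so it runs without credentials) showing where the deployment name belongs: not in the constructor, but as the `model` argument of each embeddings.create() call.

```python
# Hypothetical stand-in for an Azure OpenAI client, illustrating that the
# constructor only takes connection-level settings, while the deployment
# name travels with each embeddings.create() call as `model`.

class FakeAzureOpenAI:
    def __init__(self, azure_endpoint, api_key, api_version):
        # No deployment name here -- connection-level settings only.
        self.azure_endpoint = azure_endpoint
        self.api_key = api_key
        self.api_version = api_version
        self.embeddings = self._Embeddings()

    class _Embeddings:
        def create(self, model, input):
            # On Azure, `model` is the *deployment name*, e.g. "my-embed-deploy".
            return {"model": model, "data": [[0.0] * 3 for _ in input]}

client = FakeAzureOpenAI(
    azure_endpoint="https://example.openai.azure.com",
    api_key="fake-key",
    api_version="2024-02-01",
)
resp = client.embeddings.create(model="my-embed-deploy", input=["hello"])
print(resp["model"])  # the deployment name rides along with every call
```

The real AzureOpenAI client follows the same shape, which is why validating a separate constructor-level `azure_deployment` adds nothing.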

  3. Breaking change that creates confusion

This change would require users to provide the deployment name in two places:

  1. As the model parameter (which is actually used)
  2. As azure_deployment in api_config (which would be validated but discarded)

This would break all existing code using the vectorizer correctly, including our own test fixtures:

  # From tests/integration/test_vectorizers.py:66-69
  AzureOpenAITextVectorizer(
      model=os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME", "text-embedding-ada-002")
  )

This would fail with the new validation if AZURE_OPENAI_DEPLOYMENT isn't set, even though the deployment name is correctly provided via model.
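A small self-contained sketch of why that happens (the validation logic is paraphrased from the PR; the function name is illustrative, not RedisVL API): the check only inspects api_config and the AZURE_OPENAI_DEPLOYMENT env var, so it raises even when the deployment name was correctly supplied via `model`.

```python
import os

# Paraphrase of the PR's validation: it never looks at `model`, only at
# api_config / AZURE_OPENAI_DEPLOYMENT, so correct existing callers break.

def validate_azure_deployment(api_config):
    azure_deployment = (
        api_config.pop("azure_deployment", None)
        if api_config
        else os.getenv("AZURE_OPENAI_DEPLOYMENT")
    )
    if not azure_deployment:
        raise ValueError("azure_deployment is required")
    return azure_deployment

# The fixture above passes model=... and no api_config; with the env var
# unset, the new check raises anyway.
os.environ.pop("AZURE_OPENAI_DEPLOYMENT", None)
try:
    validate_azure_deployment(None)
    outcome = "ok"
except ValueError:
    outcome = "raises"
print(outcome)  # → raises
```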

  4. False confidence from passing tests

All tests pass locally (and likely in your environment), but this is misleading. The Azure OpenAI vectorizer tests are marked with @pytest.mark.requires_api_keys and are automatically skipped when API keys aren't available. This means the breaking change slips through undetected in standard test runs.
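For anyone unfamiliar with the mechanism, here is a hedged sketch (not the actual RedisVL test setup) of how a requires_api_keys-style marker works: pytest's skipif marker skips the test when the key is absent rather than failing it, so the gated code path never executes in runs without credentials.

```python
import os
import pytest

# Illustrative marker: when AZURE_OPENAI_API_KEY is unset, tests carrying
# this marker are reported as skipped, not failed.
requires_api_keys = pytest.mark.skipif(
    not os.getenv("AZURE_OPENAI_API_KEY"),
    reason="Azure OpenAI API keys not configured",
)

@requires_api_keys
def test_azure_vectorizer_roundtrip():
    # Would exercise the vectorizer against the live API; skipped otherwise.
    ...
```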

  5. Documentation inconsistency

The PR updates the docstring to mention "Azure deployment" in api_config, but this creates confusion because:

  • The deployment name should be passed via the model parameter (as currently documented)
  • Adding it to api_config doesn't align with how the OpenAI SDK works
  • It contradicts the existing (correct) documentation about the model parameter

What the real problem might be

If users are getting confused about the Azure deployment configuration, the root cause is likely:

  1. Unclear naming: The parameter is called model but expects a deployment name (though this is documented)
  2. Error messages: When configuration is wrong, error messages from the OpenAI SDK might not be clear
  3. Examples: We might need more prominent examples showing Azure-specific setup

Better alternatives to consider

If you'd like to continue contributing in this area, here are some ideas that would genuinely help users:

  1. Improve documentation examples - Add clearer examples showing Azure deployment configuration
  2. Better error handling - Add validation that provides helpful error messages when embeddings fail due to deployment issues
  3. Add Azure-specific documentation - Create a dedicated guide for Azure OpenAI users
  4. Environment variable clarity - Better document which environment variables are needed for Azure vs standard OpenAI
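As a sketch of idea 2 above, a wrapper could catch SDK failures and re-raise with an Azure-specific hint. Everything here is hypothetical (embed_with_hint is not a RedisVL API, and the fake client stands in for a real SDK error), but it shows the shape of a genuinely helpful error message:

```python
# Hypothetical sketch of "better error handling": re-raise embedding
# failures with a hint about Azure deployment names. The fake client
# below simulates an SDK error so the sketch runs without credentials.

class FailingEmbeddings:
    def create(self, model, input):
        raise RuntimeError("404: resource not found")

class FailingClient:
    embeddings = FailingEmbeddings()

def embed_with_hint(client, deployment, texts):
    try:
        return client.embeddings.create(model=deployment, input=texts)
    except Exception as exc:  # the real SDK raises typed errors here
        raise ValueError(
            f"Embedding call failed for deployment {deployment!r}. "
            "On Azure OpenAI, `model` must be the deployment name "
            "configured on your Azure resource, not the model name."
        ) from exc

try:
    embed_with_hint(FailingClient(), "wrong-name", ["hello"])
except ValueError as err:
    message = str(err)
print(message)
```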

I'd be happy to discuss any of these approaches with you if you're interested in pursuing them!

Summary

The issue isn't the intent (making deployment configuration clearer) but the implementation (validating an unused parameter that creates breaking changes). The current code already requires and uses the deployment name correctly - it just expects it via the model parameter rather than api_config['azure_deployment'].

Thanks again for your contribution and for caring about user experience. We hope you'll continue contributing to RedisVL!

@bsbodden bsbodden closed this Sep 30, 2025