-
Notifications
You must be signed in to change notification settings - Fork 30
tests: Add parametrized validation test for manifest-only connectors #706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 1 commit
9465284
473f237
617c64f
ec5a6b1
8a3a248
104aac2
dc2416e
fcaf435
6aeb97b
5841083
79e9118
9282e0b
0a4b66e
a87f360
7f92efa
0ecc73d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,226 @@ | ||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||
| Unit tests for validating manifest.yaml files from the connector registry against the CDK schema. | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| This test suite fetches all manifest-only connectors from the Airbyte connector registry, | ||||||||||||||||||||||||||||||
| downloads their manifest.yaml files from public endpoints, and validates them against | ||||||||||||||||||||||||||||||
| the current declarative component schema defined in the CDK. | ||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| import json | ||||||||||||||||||||||||||||||
| import logging | ||||||||||||||||||||||||||||||
| from pathlib import Path | ||||||||||||||||||||||||||||||
| from typing import Any, Dict, List, Tuple | ||||||||||||||||||||||||||||||
| from unittest.mock import patch | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| import pytest | ||||||||||||||||||||||||||||||
| import requests | ||||||||||||||||||||||||||||||
| import yaml | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| from airbyte_cdk.sources.declarative.validators.validate_adheres_to_schema import ( | ||||||||||||||||||||||||||||||
| ValidateAdheresToSchema, | ||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| logger = logging.getLogger(__name__) | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
aaronsteers marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||||||||||
| EXCLUDED_CONNECTORS = [ | ||||||||||||||||||||||||||||||
| ] | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| CONNECTOR_REGISTRY_URL = "https://connectors.airbyte.com/files/registries/v0/oss_registry.json" | ||||||||||||||||||||||||||||||
| MANIFEST_URL_TEMPLATE = "https://connectors.airbyte.com/files/metadata/airbyte/{connector_name}/latest/manifest.yaml" | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| VALIDATION_SUCCESSES = [] | ||||||||||||||||||||||||||||||
| VALIDATION_FAILURES = [] | ||||||||||||||||||||||||||||||
| DOWNLOAD_FAILURES = [] | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
coderabbitai[bot] marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||
| # Use pytest fixtures to provide thread-safe lists for test results. | |
| @pytest.fixture | |
| def validation_successes(): | |
| return [] | |
| @pytest.fixture | |
| def validation_failures(): | |
| return [] | |
| @pytest.fixture | |
| def download_failures(): | |
| return [] |
Copilot uses AI. Check for mistakes.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider more robust schema file loading
The current path navigation (going up 4 parent directories) is brittle and assumes a specific directory structure. What if we used a more robust approach like importlib.resources or tried multiple potential locations?
Also, should we add error handling for when the schema file doesn't exist? This would provide a clearer error message than a generic FileNotFoundError.
Example:
def load_declarative_component_schema() -> Dict[str, Any]:
"""Load the declarative component schema from the CDK."""
schema_path = (
Path(__file__).resolve().parent.parent.parent.parent
/ "airbyte_cdk/sources/declarative/declarative_component_schema.yaml"
)
if not schema_path.exists():
raise FileNotFoundError(
f"Declarative component schema not found at {schema_path}. "
"Please ensure the CDK is properly installed."
)
with open(schema_path, "r") as file:
return yaml.safe_load(file)wdyt?
🤖 Prompt for AI Agents
In unit_tests/sources/declarative/test_manifest_registry_validation.py around
lines 37 to 44, replace the brittle Path(...).parent.parent.parent.parent
approach with a more robust loader: attempt to load
"declarative_component_schema.yaml" via importlib.resources (or
importlib.resources.files().joinpath/read_text in py3.9+) from the
airbyte_cdk.sources.declarative package first, fall back to the existing
relative path only if the resource API fails, and if neither location yields the
file raise a clear FileNotFoundError that includes attempted locations and a
short instruction to ensure the CDK is installed; ensure the function returns
the parsed yaml as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Type annotation and error handling improvements
A few suggestions for this function:
-
The return type
List[Tuple[str, str]]is misleading since the second element is alwaysNone. Should this beList[Tuple[str, None]]orList[Tuple[str, Optional[str]]]? -
Using
pytest.fail()in a helper function makes it less reusable outside of pytest context. Would it be better to raise an exception and let the caller decide how to handle it? -
Given the network dependency, should we add retry logic for transient failures? The PR objectives mention considering "network dependency flakiness (e.g., retries or circuit breakers)".
Example with retries:
from typing import Optional
import time
def get_manifest_only_connectors() -> List[Tuple[str, Optional[str]]]:
"""Fetch manifest-only connectors from the registry."""
max_retries = 3
for attempt in range(max_retries):
try:
response = requests.get(CONNECTOR_REGISTRY_URL, timeout=30)
response.raise_for_status()
# ... rest of the logic
except Exception as e:
if attempt == max_retries - 1:
raise RuntimeError(f"Failed to fetch connector registry after {max_retries} attempts: {e}")
time.sleep(2 ** attempt) # Exponential backoffwdyt?
🤖 Prompt for AI Agents
In unit_tests/sources/declarative/test_manifest_registry_validation.py around
lines 47-70, the helper get_manifest_only_connectors has an inaccurate return
type, calls pytest.fail inside a utility (reducing reusability), and lacks
retry/backoff for transient network errors; change the signature to return
List[Tuple[str, Optional[str]]], import Optional from typing, replace
pytest.fail with raising a clear exception (e.g., RuntimeError) so callers
decide handling, and add retry logic with a small max_retries and exponential
backoff (sleep between attempts) before raising the final error.
Copilot
AI
Aug 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function calls get_manifest_only_connectors() which makes a network request to the registry. When used in @pytest.mark.parametrize, this network call happens during test collection, potentially slowing down test discovery. Consider caching the result or using a different approach.
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider adding CI-friendly connector limits
The PR objectives mention "considering a CI-friendly limit on processed connectors" since there are ~475 connectors. Running all of them might cause CI timeouts or excessive resource usage.
Would you consider adding an optional limit mechanism? For example:
import os
def get_manifest_only_connector_names() -> List[str]:
"""Get manifest-only connector names with optional limit for CI."""
connectors = get_manifest_only_connectors()
connector_names = [name for name, _ in connectors]
# Allow limiting connectors for CI runs
max_connectors = os.getenv("MAX_CONNECTORS_TO_TEST")
if max_connectors:
limit = int(max_connectors)
if limit > 0:
connector_names = connector_names[:limit]
logger.info(f"Limited to testing {limit} connectors (CI mode)")
return connector_namesThis would allow CI to run with MAX_CONNECTORS_TO_TEST=50 while still allowing full runs locally. wdyt?
🤖 Prompt for AI Agents
In unit_tests/sources/declarative/test_manifest_registry_validation.py around
lines 95 to 104, add an optional CI-friendly limit to
get_manifest_only_connector_names by reading an environment variable (e.g.,
MAX_CONNECTORS_TO_TEST) and slicing the returned list of connector names if the
variable is set to a positive integer; import os at the top, defensively parse
the env var to int (ignore or log and treat as unset on parse errors or
non-positive values), log when a limit is applied, and keep the default behavior
unchanged when the env var is absent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid network during test collection: move parametrization to pytest_generate_tests
Calling get_manifest_only_connector_names() at import time makes test collection flaky/slow and can fail entirely if the registry is down. Shall we move parametrization to pytest_generate_tests and keep collection side-effect free, wdyt?
Apply this diff to remove import-time network access:
-@pytest.mark.parametrize("connector_name", get_manifest_only_connector_names())
+# Parametrization moved to pytest_generate_tests to avoid network at collection time.Add this hook at the end of the file to perform parametrization safely at runtime:
def pytest_generate_tests(metafunc):
if "connector_name" in metafunc.fixturenames:
try:
names = get_manifest_only_connector_names()
except Exception as e:
names = []
tr = metafunc.config.pluginmanager.get_plugin("terminalreporter")
if tr:
tr.write_line(f"[manifest-validator] Skipping connector parametrization due to registry error: {e}")
metafunc.parametrize("connector_name", names)🤖 Prompt for AI Agents
In unit_tests/sources/declarative/test_manifest_registry_validation.py around
line 281, the test currently calls get_manifest_only_connector_names() at import
time via @pytest.mark.parametrize which causes network access during collection;
remove that decorator usage and instead add a pytest_generate_tests hook at the
end of the file that checks if "connector_name" is in metafunc.fixturenames,
calls get_manifest_only_connector_names() inside a try/except (falling back to
an empty list on error), logs the exception to the terminal reporter if present,
and calls metafunc.parametrize("connector_name", names) so parametrization
happens at collection/runtime without import-time network access.
Outdated
Copilot
AI
Aug 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The schema is loaded and validator is created for every test iteration. Since the schema is static, consider loading it once and reusing it across all test iterations using a module-level fixture or cached function.
| validator.validate(manifest_dict) | |
| # Use the module-level validator to validate the manifest | |
| try: | |
| VALIDATOR.validate(manifest_dict) |
Copilot uses AI. Check for mistakes.
Uh oh!
There was an error while loading. Please reload this page.