Merged
Changes from 12 commits
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,11 @@ The format is (loosely) based on [Keep a Changelog](http://keepachangelog.com/)

### Added

+- If a validation error occurs in recursive mode, only show the invalid items unless verbose mode is on. [#243](https://github.com/stac-utils/stac-validator/pull/243)
+- Added ability to validate extensions of Collections [#243](https://github.com/stac-utils/stac-validator/pull/243)
+- Improve error reporting through use of [best_match](https://python-jsonschema.readthedocs.io/en/stable/errors/#best-match-and-relevance) [#243](https://github.com/stac-utils/stac-validator/pull/243)
+- Add `schema-map` option similar to [stac-node-validator SchemaMap](https://github.com/stac-utils/stac-node-validator?tab=readme-ov-file#usage) to allow validation against local copies of schemas. [#243](https://github.com/stac-utils/stac-validator/pull/243)
+
## [v3.5.0] - 2025-01-10

### Added
100 changes: 73 additions & 27 deletions README.md
@@ -91,33 +91,35 @@ stac-validator --help
 Usage: stac-validator [OPTIONS] STAC_FILE

 Options:
-  --core                   Validate core stac object only without extensions.
-  --extensions             Validate extensions only.
-  --links                  Additionally validate links. Only works with
-                           default mode.
-  --assets                 Additionally validate assets. Only works with
-                           default mode.
-  -c, --custom TEXT        Validate against a custom schema (local filepath or
-                           remote schema).
-  -r, --recursive          Recursively validate all related stac objects.
-  -m, --max-depth INTEGER  Maximum depth to traverse when recursing. Omit this
-                           argument to get full recursion. Ignored if
-                           `recursive == False`.
-  --collections            Validate /collections response.
-  --item-collection        Validate item collection response. Can be combined
-                           with --pages. Defaults to one page.
-  --no-assets-urls         Disables the opening of href links when validating
-                           assets (enabled by default).
-  --header KEY VALUE       HTTP header to include in the requests. Can be used
-                           multiple times.
-  -p, --pages INTEGER      Maximum number of pages to validate via --item-
-                           collection. Defaults to one page.
-  -v, --verbose            Enables verbose output for recursive mode.
-  --no_output              Do not print output to console.
-  --log_file TEXT          Save full recursive output to log file (local
-                           filepath).
-  --version                Show the version and exit.
-  --help                   Show this message and exit.
+  --core                       Validate core stac object only without
+                               extensions.
+  --extensions                 Validate extensions only.
+  --links                      Additionally validate links. Only works with
+                               default mode.
+  --assets                     Additionally validate assets. Only works with
+                               default mode.
+  -c, --custom TEXT            Validate against a custom schema (local
+                               filepath or remote schema).
+  --schema-map <TEXT TEXT>...  Schema path to be replaced by (local) schema path

Collaborator: Maybe there should be a -s option, for example, to make it a little easier.

Contributor Author: Done.

+                               during validation. Can be used multiple times.
+  -r, --recursive              Recursively validate all related stac objects.
+  -m, --max-depth INTEGER      Maximum depth to traverse when recursing. Omit
+                               this argument to get full recursion. Ignored if
+                               `recursive == False`.
+  --collections                Validate /collections response.
+  --item-collection            Validate item collection response. Can be
+                               combined with --pages. Defaults to one page.
+  --no-assets-urls             Disables the opening of href links when
+                               validating assets (enabled by default).
+  --header <TEXT TEXT>...      HTTP header to include in the requests. Can be
+                               used multiple times.
+  -p, --pages INTEGER          Maximum number of pages to validate via --item-
+                               collection. Defaults to one page.
+  -v, --verbose                Enables verbose output for recursive mode.
+  --no_output                  Do not print output to console.
+  --log_file TEXT              Save full recursive output to log file (local
+                               filepath).
+  --help                       Show this message and exit.
```

---
@@ -340,3 +342,47 @@ stac-validator https://earth-search.aws.element84.com/v0/collections/sentinel-s2
```bash
stac-validator https://stac-catalog.eu/collections/sentinel-s2-l2a/items --header x-api-key $MY_API_KEY --header foo bar
```

+**--schema-map**
+`--schema-map` lets stac-validator replace a schema referenced in a STAC JSON document with a schema from another URL or a local schema file.
+This is especially useful when developing a schema and testing validation against your local copy of the schema.
+
+```bash
+stac-validator https://raw.githubusercontent.com/radiantearth/stac-spec/v1.0.0/examples/extended-item.json --extensions --schema-map https://stac-extensions.github.io/projection/v1.0.0/schema.json "tests/test_data/schema/v1.0.0/projection.json"
+[
+    {
+        "version": "1.0.0",
+        "path": "https://raw.githubusercontent.com/radiantearth/stac-spec/v1.0.0/examples/extended-item.json",
+        "schema": [
+            "https://stac-extensions.github.io/eo/v1.0.0/schema.json",
+            "tests/test_data/schema/v1.0.0/projection.json",
+            "https://stac-extensions.github.io/scientific/v1.0.0/schema.json",
+            "https://stac-extensions.github.io/view/v1.0.0/schema.json",
+            "https://stac-extensions.github.io/remote-data/v1.0.0/schema.json"
+        ],
+        "valid_stac": true,
+        "asset_type": "ITEM",
+        "validation_method": "extensions"
+    }
+]
+```
+
+This option can also replace URLs that point to subschemas:
+
+```bash
+stac-validator tests/test_data/v100/extended-item-local.json --custom tests/test_data/schema/v1.0.0/item_with_unreachable_url.json --schema-map https://geojson-wrong-url.org/schema/Feature.json https://geojson.org/schema/Feature.json --schema-map https://geojson-wrong-url.org/schema/Geometry.json https://geojson.org/schema/Geometry.json
+[
+    {
+        "version": "1.0.0",
+        "path": "tests/test_data/v100/extended-item-local.json",
+        "schema": [
+            "tests/test_data/schema/v1.0.0/item_with_unreachable_url.json"
+        ],
+        "valid_stac": true,
+        "asset_type": "ITEM",
+        "validation_method": "custom"
+    }
+]
+```
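
The substitution described in the README text above can be pictured as a plain dictionary lookup applied before a schema is loaded. The following is a minimal sketch under that assumption, not stac-validator's implementation: `resolve_schema_location` is a hypothetical helper introduced only for illustration, and the mapping reuses the URLs and path from the example command.

```python
from typing import Dict, Optional


def resolve_schema_location(
    schema_path: str, schema_map: Optional[Dict[str, str]] = None
) -> str:
    """Return the mapped location for schema_path, or schema_path unchanged.

    Hypothetical helper for illustration only; it mirrors the idea behind
    --schema-map but is not part of stac-validator.
    """
    if schema_map and schema_path in schema_map:
        return schema_map[schema_path]
    return schema_path


# Illustrative mapping: one remote extension schema is redirected to a local copy.
schema_map = {
    "https://stac-extensions.github.io/projection/v1.0.0/schema.json":
        "tests/test_data/schema/v1.0.0/projection.json",
}

for url in (
    "https://stac-extensions.github.io/projection/v1.0.0/schema.json",
    "https://stac-extensions.github.io/eo/v1.0.0/schema.json",
):
    # Only the first URL has an entry in the map; the second passes through.
    print(url, "->", resolve_schema_location(url, schema_map))
```

In this sketch the match is on the exact string, so a URL that differs only by a trailing slash or by scheme would pass through unreplaced.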


15 changes: 14 additions & 1 deletion stac_validator/stac_validator.py
@@ -1,6 +1,6 @@
 import json
 import sys
-from typing import Any, Dict, List
+from typing import Any, Dict, List, Optional, Tuple

 import click  # type: ignore

@@ -87,6 +87,12 @@ def collections_summary(message: List[Dict[str, Any]]) -> None:
     default="",
     help="Validate against a custom schema (local filepath or remote schema).",
 )
+@click.option(
+    "--schema-map",
+    type=(str, str),
+    multiple=True,
+    help="Schema path to be replaced by (local) schema path during validation. Can be used multiple times.",
+)
 @click.option(
     "--recursive",
     "-r",
@@ -149,6 +155,7 @@ def main(
     links: bool,
     assets: bool,
     custom: str,
+    schema_map: List[Tuple],
     verbose: bool,
     no_output: bool,
     log_file: str,
@@ -170,6 +177,7 @@ def main(
         links (bool): Whether to additionally validate links. Only works with default mode.
         assets (bool): Whether to additionally validate assets. Only works with default mode.
         custom (str): Path to a custom schema file to validate against.
+        schema_map (list(tuple)): List of tuples, each having two elements. The first element is the schema path to be replaced by the path in the second element.
         verbose (bool): Whether to enable verbose output for recursive mode.
         no_output (bool): Whether to print output to console.
         log_file (str): Path to a log file to save full recursive output.
@@ -182,6 +190,10 @@ def main(
             or 1 if it is invalid.
     """
     valid = True
+    if schema_map == ():
+        schema_map_dict: Optional[Dict[str, str]] = None
+    else:
+        schema_map_dict = dict(schema_map)
     stac = StacValidate(
         stac_file=stac_file,
         collections=collections,
@@ -196,6 +208,7 @@ def main(
         headers=dict(header),
         extensions=extensions,
         custom=custom,
+        schema_map=schema_map_dict,
         verbose=verbose,
         log=log_file,
     )
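
The conversion in `main()` above turns the repeated `--schema-map` pairs into a dictionary, or `None` when the option was not given. A minimal, library-free sketch of that step follows; `pairs_to_schema_map` is a hypothetical name used only here, and the sample pairs are made up for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple


def pairs_to_schema_map(
    pairs: Sequence[Tuple[str, str]],
) -> Optional[Dict[str, str]]:
    """Turn (source, replacement) pairs into a dict, or None when empty.

    Hypothetical helper; it restates the inline logic from the diff so the
    behavior can be exercised without the CLI.
    """
    if not pairs:
        return None
    return dict(pairs)


# No --schema-map given: the option value is an empty tuple.
print(pairs_to_schema_map(()))  # prints None

# Two --schema-map occurrences: one (source, replacement) pair each.
print(
    pairs_to_schema_map(
        (
            ("https://example.com/a.json", "local/a.json"),
            ("https://example.com/b.json", "local/b.json"),
        )
    )
)
```

If the same source path is passed twice, ordinary `dict` semantics apply and the last pair wins.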
93 changes: 53 additions & 40 deletions stac_validator/utilities.py
@@ -5,12 +5,10 @@
 from urllib.parse import urlparse
 from urllib.request import Request, urlopen

-import jsonschema
 import requests  # type: ignore
 from jsonschema import Draft202012Validator
 from referencing import Registry, Resource
 from referencing.jsonschema import DRAFT202012
-from referencing.retrieval import to_cached_resource
 from referencing.typing import URI

 NEW_VERSIONS = [
@@ -192,88 +190,103 @@ def link_request(
initial_message["format_invalid"].append(link["href"])


-def fetch_remote_schema(uri: str) -> dict:
+def fetch_remote_schema(uri: str, timeout: int = 10) -> Dict:
     """
     Fetch a remote schema from a URI.

     Args:
         uri (str): The URI of the schema to fetch.
+        timeout (int): Timeout in seconds for the HTTP request. Defaults to 10.

     Returns:
         dict: The fetched schema content as a dictionary.

     Raises:
         requests.RequestException: If the request to fetch the schema fails.
     """
-    response = requests.get(uri)
-    response.raise_for_status()
-    return response.json()
+    try:
+        response = requests.get(uri, timeout=timeout)
+        response.raise_for_status()
+        return response.json()
+    except requests.exceptions.RequestException as e:
+        raise requests.RequestException(
+            f"Failed to fetch schema from {uri}: {str(e)}"
+        ) from e
+    except Exception as e:
+        raise Exception(
+            f"Unexpected error while retrieving schema from {uri}: {str(e)}"
+        ) from e


-@to_cached_resource()  # type: ignore
-def cached_retrieve(uri: URI) -> str:
+def cached_retrieve(uri: URI, schema_map: Optional[Dict] = None) -> Resource[Dict]:
     """
     Retrieve and cache a remote schema.

     Args:
         uri (str): The URI of the schema.
+        schema_map (dict): Override schema location to validate against local versions of a schema

     Returns:
-        str: The raw JSON string of the schema.
+        Resource[Dict]: The parsed JSON schema wrapped in a `Resource`.

     Raises:
         requests.RequestException: If the request to fetch the schema fails.
         Exception: For any other unexpected errors.
     """
-    try:
-        response = requests.get(uri, timeout=10)  # Set a timeout for robustness
-        response.raise_for_status()  # Raise an error for HTTP response codes >= 400
-        return response.text
-    except requests.exceptions.RequestException as e:
-        raise requests.RequestException(
-            f"Failed to fetch schema from {uri}: {str(e)}"
-        ) from e
-    except Exception as e:
-        raise Exception(
-            f"Unexpected error while retrieving schema from {uri}: {str(e)}"
-        ) from e
+    return Resource.from_contents(
+        fetch_schema_with_override(uri, schema_map=schema_map)
+    )


-def validate_with_ref_resolver(schema_path: str, content: dict) -> None:
+def fetch_schema_with_override(
+    schema_path: str, schema_map: Optional[Dict] = None
+) -> Dict:
+    """
+    Fetch a schema, first applying any override from the schema map.
+
+    Args:
+        schema_path (str): Path or URI of the schema.
+        schema_map (dict): Override schema location to validate against local versions of a schema
+
+    Returns:
+        dict: The parsed JSON dict of the schema.
+    """
+
+    if schema_map:
+        if schema_path in schema_map:
+            schema_path = schema_map[schema_path]
+
+    # Load the schema
+    return fetch_and_parse_schema(schema_path)


+def validate_with_ref_resolver(
+    schema_path: str, content: Dict, schema_map: Optional[Dict] = None
+) -> None:
     """
     Validate a JSON document against a JSON Schema with dynamic reference resolution.

     Args:
         schema_path (str): Path or URI of the JSON Schema.
         content (dict): JSON content to validate.
+        schema_map (dict): Override schema location to validate against local versions of a schema

     Raises:
         jsonschema.exceptions.ValidationError: If validation fails.
         requests.RequestException: If fetching a remote schema fails.
         FileNotFoundError: If a local schema file is not found.
         Exception: If any other error occurs during validation.
     """
-    # Load the schema
-    if schema_path.startswith("http"):
-        schema = fetch_remote_schema(schema_path)
-    else:
-        try:
-            with open(schema_path, "r") as f:
-                schema = json.load(f)
-        except FileNotFoundError as e:
-            raise FileNotFoundError(f"Schema file not found: {schema_path}") from e
-
+    schema = fetch_schema_with_override(schema_path, schema_map=schema_map)
     # Set up the resource and registry for schema resolution
+    cached_retrieve_with_schema_map = functools.partial(
+        cached_retrieve, schema_map=schema_map
+    )
     resource: Resource = Resource(contents=schema, specification=DRAFT202012)  # type: ignore
-    registry: Registry = Registry(retrieve=cached_retrieve).with_resource(  # type: ignore
+    registry: Registry = Registry(retrieve=cached_retrieve_with_schema_map).with_resource(  # type: ignore
         uri=schema_path, resource=resource
     )  # type: ignore

     # Validate the content against the schema
-    try:
-        validator = Draft202012Validator(schema, registry=registry)
-        validator.validate(content)
-    except jsonschema.exceptions.ValidationError as e:
-        raise jsonschema.exceptions.ValidationError(f"{e.message}") from e
-    except Exception as e:
-        raise Exception(f"Unexpected error during validation: {str(e)}") from e
+    validator = Draft202012Validator(schema, registry=registry)
+    validator.validate(content)
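
The change above uses `functools.partial` to bind `schema_map` onto the retrieval callback, since the registry invokes that callback with a URI only. The pattern can be sketched in isolation as below. Everything here is a stand-in chosen for illustration: `FAKE_STORE`, `retrieve`, and `resolve_all` are hypothetical, and an in-memory dict replaces the network and file access that real retrieval would involve.

```python
import functools
from typing import Callable, Dict, List, Optional

# Stand-in for remote and local schema storage; a real retriever would do I/O.
FAKE_STORE: Dict[str, dict] = {
    "https://example.com/schema.json": {"title": "remote"},
    "local/schema.json": {"title": "local"},
}


def retrieve(uri: str, schema_map: Optional[Dict[str, str]] = None) -> dict:
    """Look up uri after applying an optional override from schema_map."""
    if schema_map and uri in schema_map:
        uri = schema_map[uri]
    return FAKE_STORE[uri]


def resolve_all(uris: List[str], retriever: Callable[[str], dict]) -> List[dict]:
    """Simulate a caller that only knows how to pass a single URI."""
    return [retriever(u) for u in uris]


# Bind the map once; the result accepts just a URI, as the caller expects.
retrieve_with_map = functools.partial(
    retrieve, schema_map={"https://example.com/schema.json": "local/schema.json"}
)

print(resolve_all(["https://example.com/schema.json"], retrieve_with_map))
# prints [{'title': 'local'}]
```

Binding the map up front keeps the callback's one-argument shape intact, so the caller needs no knowledge of the override.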