Skip to content

Conversation

shults
Copy link
Contributor

@shults shults commented Sep 23, 2025

User description

TT-10073
Summary Support Tyk OAS with Open Policy Agent
Type Story Story
Status In Dev
Points N/A
Labels 5.3PreCF, codilime_refined, jira_escalated

Description

Related Issue

Motivation and Context

How This Has Been Tested

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Refactoring or add test (improvements in base code or adds test coverage to functionality)

Checklist

  • I ensured that the documentation is up to date
  • I explained why this PR updates go.mod in detail with reasoning why it's required
  • I would like a code coverage CI quality gate exception and have explained why

PR Type

Enhancement, Tests


Description

  • Extend OAS validation with options

  • Add AllowRootFields schema modifier

  • Introduce failable options builder

  • Add tests for new behavior


Diagram Walkthrough

flowchart LR
  validate["ValidateOASObject opts"] -- "applyOptions" --> apply["applyOptions(schema, opts)"]
  apply -- "FailableOptions.Build" --> optionBuild["Build(copy of schema)"]
  optionBuild -- "AllowRootFields" --> schemaMod["Modify schema properties"]
  schemaMod -- "validateJSON" --> result["JSON Schema validation"]
Loading

File Walkthrough

Relevant files
Enhancement
validator.go
Pluggable OAS schema modifiers and root field allowance   

apidef/oas/validator.go

  • Add failable options to validation flow
  • Implement schema copy and applyOptions
  • Introduce AllowRootFields to extend root properties
  • Wire option handling into ValidateOASObject
+50/-1   
option.go
Failable options support with error-aware builder               

common/option/option.go

  • Add FailableOptions type and constructor
  • Implement Build with error propagation
+16/-0   
Tests
validator_test.go
Tests for optional root field allowances in OAS                   

apidef/oas/validator_test.go

  • Add tests for AllowRootFields behavior
  • Add helper to patch default valid OAS
  • Import json-patch and json for test utilities
+41/-0   
Dependencies
go.mod
Add json-patch dependency for tests                                           

go.mod

  • Add dependency on github.com/evanphx/json-patch
+1/-0     

@buger
Copy link
Member

buger commented Sep 23, 2025

I'm a bot and I 👍 this PR title. 🤖

Copy link
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

In AllowRootFields, the condition uses both jsonparser.NotExist and errors.Is(err, jsonparser.KeyPathNotFoundError). jsonparser.Get returns typ==NotExist with err==KeyPathNotFoundError; combining both may miss or mis-handle other error cases (e.g., malformed JSON). Consider explicitly checking for KeyPathNotFoundError first and treating any other non-nil error as fatal to avoid silently allowing fields on invalid schemas.

for _, field := range fields {
	_, typ, _, err := jsonparser.Get(schema, keyProperties, field)

	if typ == jsonparser.NotExist && errors.Is(err, jsonparser.KeyPathNotFoundError) {
		if schema, err = jsonparser.Set(schema, []byte(`{}`), keyProperties, field); err != nil {
			return err
		}
	} else if err != nil {
		return err
	}
}
Naming/Comment

The comment above copying the schema says "Coping prevents mutation schema" which appears to be a typo and slightly unclear. Consider "Copying prevents mutating the original schema" for clarity.

// Should be copied because schema is allocated on heap.
// Coping prevents mutation schema.
var copiedSchema = make([]byte, len(schema))
copy(copiedSchema, schema)
API Consistency

NewFailable mirrors New, but Options.Build returns *O while FailableOptions.Build returns (*O, error). Consider documenting the intended usage and thread-safety expectations, and ensure nil-safety when callers ignore returned pointer. Also consider accepting variadic opts for parity with common option patterns.

func NewFailable[O any](opts []FailableOption[O]) FailableOptions[O] {
	return opts
}

func (o FailableOptions[O]) Build(baseVal O) (*O, error) {
	for _, apply := range o {
		if err := apply(&baseVal); err != nil {
			return nil, err
		}
	}

	return &baseVal, nil
}

Copy link
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Correct missing-field detection logic

Tighten the condition to only treat a missing field as addable when the error is
specifically KeyPathNotFound, regardless of type value. This avoids silently
succeeding if typ is NotExist but the error is different (e.g., malformed schema
path).

apidef/oas/validator.go [232-252]

 func AllowRootFields(fields ...string) option.FailableOption[jsonSchema] {
 	return func(s *jsonSchema) error {
-		var schema = *s
+		schema := *s
 
 		for _, field := range fields {
-			_, typ, _, err := jsonparser.Get(schema, keyProperties, field)
-
-			if typ == jsonparser.NotExist && errors.Is(err, jsonparser.KeyPathNotFoundError) {
-				if schema, err = jsonparser.Set(schema, []byte(`{}`), keyProperties, field); err != nil {
-					return err
+			_, _, _, err := jsonparser.Get(schema, keyProperties, field)
+			if errors.Is(err, jsonparser.KeyPathNotFoundError) {
+				var setErr error
+				schema, setErr = jsonparser.Set(schema, []byte(`{}`), keyProperties, field)
+				if setErr != nil {
+					return setErr
 				}
-			} else if err != nil {
+				continue
+			}
+			if err != nil {
 				return err
 			}
 		}
 
 		*s = schema
 		return nil
 	}
 }
Suggestion importance[1-10]: 6

__

Why: Tightening the condition to rely solely on KeyPathNotFoundError avoids false positives and is consistent with intended logic, offering a modest robustness improvement.

Low
General
Prevent schema aliasing and clarify comment

Fix the typo in the comment and ensure we never return a nil slice on error paths by
preserving the defensive copy. Also, simplify by returning the built value directly
to avoid potential accidental aliasing issues.

apidef/oas/validator.go [147-164]

 func applyOptions(schema []byte, opts []option.FailableOption[jsonSchema]) ([]byte, error) {
 	if len(opts) == 0 {
 		return schema, nil
 	}
 
-	// Should be copied because schema is allocated on heap.
-	// Coping prevents mutation schema.
-	var copiedSchema = make([]byte, len(schema))
+	// Should be copied because schema may be shared; copying prevents mutation of the original.
+	copiedSchema := make([]byte, len(schema))
 	copy(copiedSchema, schema)
 
 	modifiedSchema, err := option.NewFailable(opts).Build(copiedSchema)
-
 	if err != nil {
 		return nil, err
 	}
-
 	return *modifiedSchema, nil
 }
Suggestion importance[1-10]: 4

__

Why: The change mainly fixes a comment typo and retains the existing safe-copy/build pattern without functional alteration; it's correct but low impact.

Low
Correct test case wording

Fix the repeated “filed” typo in test names to “field” to prevent confusion and
improve test readability. Clear naming helps future maintenance and triaging.

apidef/oas/validator_test.go [301-318]

-t.Run("fails if addition filed exist but not allowed", func(t *testing.T) {
+t.Run("fails if additional field exists but not allowed", func(t *testing.T) {
 	spec := extendDefaultValidOas(t, `{"additional": true}`)
 	err := ValidateOASObject(spec, oasVersion)
 	require.Error(t, err)
 })
 
-t.Run("does not fail if addition filed is allowed", func(t *testing.T) {
+t.Run("does not fail if additional field is allowed", func(t *testing.T) {
 	spec := extendDefaultValidOas(t, `{"additional": true}`)
 	err := ValidateOASObject(spec, oasVersion, AllowRootFields("additional"))
 	require.NoError(t, err)
 })
 
 t.Run("does not allow to override default validation rules", func(t *testing.T) {
 	spec := extendDefaultValidOas(t, `{"paths": true}`)
 	err := ValidateOASObject(spec, oasVersion, AllowRootFields("paths"))
 	require.Error(t, err)
 })
Suggestion importance[1-10]: 3

__

Why: Renaming the test descriptions improves readability but does not affect functionality; it's accurate yet minor in impact.

Low

Copy link
Contributor

github-actions bot commented Sep 23, 2025

API Changes

--- prev.txt	2025-09-24 09:40:44.227026989 +0000
+++ current.txt	2025-09-24 09:40:34.803947110 +0000
@@ -5664,8 +5664,6 @@
 
 TYPES
 
-type FailableOption[O any] func(*O) error
-
 type Option[O any] func(*O)
 
 type Options[O any] []Option[O]
@@ -12552,6 +12550,16 @@
 	// Has unexported fields.
 }
 
+# Package: ./pkg/utils
+
+package utils // import "github.com/TykTechnologies/tyk/pkg/utils"
+
+
+FUNCTIONS
+
+func Must[T any](val T, err error) T
+    Must panics if errors is not nil
+
 # Package: ./regexp
 
 package regexp // import "github.com/TykTechnologies/tyk/regexp"

Copy link

probelabs bot commented Sep 23, 2025

🔍 Code Analysis Results

An analysis of the pull request is provided below.

Change Impact Analysis

What this PR accomplishes

This pull request introduces an extensible validation mechanism for Tyk OAS (OpenAPI Specification) API definitions. The primary goal is to allow custom, non-standard fields at the root of an API definition—such as configurations for Open Policy Agent (OPA)—without causing validation failures.

Previously, the OAS validator strictly enforced the official schema, rejecting any definition with unknown root-level properties. This change makes the validator flexible by allowing the schema to be dynamically and temporarily modified before validation occurs, effectively "whitelisting" specific custom fields.

Key technical changes introduced

  1. Extensible Validator with Options: A new function, ValidateOASObjectWithOptions, is introduced in apidef/oas/validator.go. It uses the functional options pattern to accept a variable number of Option arguments. These options configure a schemaModifier that alters the JSON schema used for validation.

  2. AllowRootFields Schema Modifier: A new option, AllowRootFields, is provided. This function takes a list of field names and modifies the validation schema to permit these fields at the root level. It does this by injecting empty object definitions ({}) for the specified fields into the schema's properties, ensuring they pass validation without overriding existing rules for standard fields.

  3. Performance-Optimized Schema Caching: To avoid the performance cost of regenerating modified schemas for every validation call, an LRU (Least Recently Used) cache, LruSchemaCache, has been implemented using github.com/hashicorp/golang-lru. The cache stores modified schemas using a key derived from the OAS version and the sorted list of allowed custom fields, ensuring that schema generation only happens once for a given configuration.

  4. New Utility Function: A generic helper function, utils.Must, has been added to pkg/utils/utils.go to panic on errors during initialization. This is used to simplify the setup of the global defaultSchemaCache, ensuring the application fails fast if the cache cannot be created.

Affected system components

  • API Definition Validation Logic: This is the most directly impacted component. The core logic for validating OAS definitions within the Tyk Gateway is now more flexible, affecting how APIs are loaded on startup and during hot reloads.
  • API Definition Lifecycle Management: Any system that loads, validates, and saves API definitions is affected. This includes the Tyk Gateway, the Tyk Dashboard when importing or editing APIs, and any CI/CD tooling that interacts with the management API.
  • Downstream Ecosystem Tooling (Critical Impact): The change alters the fundamental structure of a valid Tyk OAS API definition. As a result, downstream tools that rely on a strict schema will be impacted:
    • tyk-operator: The Kubernetes operator's Custom Resource Definition (CRD) for API definitions will need to be updated to allow these new fields.
    • tyk-sync / MDCB: The data synchronization utility will need to be updated to recognize and propagate the new fields across data centers.
    • Tyk Developer Portal: The portal's API management features must be updated to handle, persist, and not strip out these custom fields when APIs are managed through its interface.

Architecture Visualization

The following diagram illustrates the new, extensible validation workflow, highlighting the logic for applying schema modifications and using the caching layer.

graph TD
    A[Start: ValidateOASObjectWithOptions] --> B{Has Options?};
    B -- No --> C[Get Base Schema];
    B -- Yes --> D[Build Schema Modifier];
    D --> E{Is Modified Schema in Cache?};
    E -- No --> F[Copy Base Schema];
    F --> G[Apply Modifiers e.g., AllowRootFields];
    G --> H[Store Modified Schema in Cache];
    H --> I[Use Modified Schema];
    E -- Yes --> J[Get Schema from Cache];
    J --> I;
    C --> K[Validate Document with Schema];
    I --> K;
    K --> L[End: Return Validation Result];
Loading

Powered by Visor from Probelabs

Last updated: 2025-09-24T09:49:42.266Z | Triggered by: synchronize | Commit: 9e0525b

Copy link

probelabs bot commented Sep 23, 2025

🔍 Code Analysis Results

Security Issues (2)

Severity Location Issue
🟡 Warning apidef/oas/validator.go:261-285
The schema cache is vulnerable to resource exhaustion. The cache key is generated from user-controllable input (the list of allowed fields), allowing an attacker to flood the cache with unique entries. This forces expensive schema regeneration for each validation, leading to a potential Denial of Service through CPU and memory exhaustion.
💡 SuggestionTo mitigate the risk of cache exhaustion, limit the number and length of fields that can be passed to `AllowRootFields`. Additionally, consider optimizing the schema modification to be a single operation (e.g., using a JSON merge patch) to reduce the performance impact of a cache miss.
🟡 Warning pkg/utils/utils.go:4-10
The new generic helper `utils.Must` panics on any error. Adding a generic panic-inducing helper to a shared `utils` package is risky. It can be easily misused in request-handling logic where panics are not expected, leading to service crashes and a Denial of Service.
💡 SuggestionTo reduce the risk of misuse, either make the function private to the `apidef/oas` package where it is used, or rename it to be more specific to its intended purpose, such as `utils.MustInit`, to clarify it should only be used for application initialization.

Performance Issues (3)

Severity Location Issue
🟢 Info pkg/utils/utils.go:4-10
The new generic helper `utils.Must` panics on any non-nil error. While its current use for initializing a global cache at startup is an acceptable fail-fast pattern, placing a generic panic-inducing helper in a shared `utils` package is risky. It could be inadvertently used in request-handling logic where panics are not expected and would crash the service.
💡 SuggestionTo improve code safety and prevent misuse, limit the scope of this function. Either make it a private helper within the `apidef/oas` package where it is used or rename it to be more specific to its intended purpose, such as `utils.MustInit`. This makes it clearer to other developers that it should only be used for application initialization where a panic is the desired outcome on failure.
🟡 Warning apidef/oas/validator.go:270-281
The schema is modified iteratively within a `for` loop by calling `jsonparser.Set` for each allowed field. Each call can reallocate and copy the entire schema byte slice, leading to increased CPU and memory usage during cache misses, especially with large schemas or many fields.
💡 SuggestionTo optimize performance, consolidate all schema modifications into a single operation. Construct a JSON merge patch containing all the new properties to be added and apply it once. The `github.com/evanphx/json-patch` library, which is already a test dependency, could be used for this purpose.
🟡 Warning apidef/oas/validator.go:315-328
The current implementation of the LRU cache does not protect against concurrent executions of the schema creation function for the same cache key. If multiple goroutines request validation for the same new (uncached) schema configuration simultaneously, they will all miss the cache and independently execute the expensive schema generation logic. This "thundering herd" problem leads to redundant work and spikes in CPU and memory usage.
💡 SuggestionTo prevent this, ensure that the schema creation function is executed only once per key for concurrent requests. The `golang.org/x/sync/singleflight` package is the idiomatic solution in Go for this problem. By wrapping the `create` call in a `singleflight.Group`, you can ensure that for a given key, the first request executes the function while subsequent concurrent requests for the same key wait for and share its result.

Quality Issues (2)

Severity Location Issue
🟡 Warning pkg/utils/utils.go:4-10
The new generic helper `utils.Must` panics on any error. Placing a generic panic-inducing function in a shared `utils` package is risky as it can be misused in request-handling logic where panics are not expected, leading to service crashes.
💡 SuggestionTo reduce the risk of misuse, rename the function to be more specific to its intended purpose, such as `utils.MustInit`, to clarify it should only be used for application initialization where a fail-fast approach is appropriate.
🟡 Warning apidef/oas/schema/3.0.json:52
Setting `"additionalProperties": true` in the base OAS schema weakens validation by default. While the goal is to allow custom fields via options, the base schema should remain strict to enforce the principle of least privilege and prevent unintended fields from being accepted if the custom validation logic is bypassed.
💡 SuggestionRevert this change to `"additionalProperties": false`. The schema modification logic should dynamically allow specific fields on a copy of the strict base schema before validation, which aligns with the overall design goal without weakening the default contract.

Style Issues (2)

Severity Location Issue
🟢 Info pkg/utils/utils_test.go:15
There is a minor typo in the test case name. It reads `"panics is err is not nil"`.
💡 SuggestionCorrect the typo to `"panics if err is not nil"` to improve the readability and clarity of the test suite.
🟡 Warning pkg/utils/utils.go:4
The new generic helper `utils.Must` panics on error. Adding a generic panic-inducing helper to a shared `utils` package is risky as it can be easily misused in request-handling logic where panics should be avoided, leading to service crashes.
💡 SuggestionTo reduce the risk of misuse, consider making the function more specific to its intended use case by renaming it (e.g., `utils.MustInit`). This would prevent its accidental use in contexts where panicking is not appropriate.

Dependency Issues (4)

Severity Location Issue
🔴 Critical apidef/oas/schema/3.0.json:52
The change setting `additionalProperties` to `true` in the OAS API definition schema allows for custom root-level fields. This will break the `tyk-operator`, as its Custom Resource Definition (CRD) for `ApiDefinition` enforces a strict schema. Any API definition containing custom fields will be rejected by the Kubernetes API server, preventing deployment.
💡 SuggestionA corresponding pull request must be opened against the `tyk-operator` repository to update its `ApiDefinition` CRD schema. The schema should be relaxed, for example by adding `x-kubernetes-preserve-unknown-fields: true` to the root of the schema definition, to allow for these custom fields. This PR should be blocked until the downstream dependency is addressed and linked.
🔴 Critical apidef/oas/schema/3.0.json:52
Allowing custom fields in the OAS API definition will likely cause data loss in the Tyk Developer Portal. The Portal's backend is probably not designed to handle unknown fields and will strip them when an API definition is saved, silently breaking any feature (like OPA) that relies on them.
💡 SuggestionThe Portal's data models and API management logic must be updated to preserve all fields within the API definition, including unknown ones. A pull request must be raised and merged in the `tyk-portal` repository to address this before this change is released.
🔴 Critical apidef/oas/schema/3.0.json:52
The change to the API definition structure will break `tyk-sink` (MDCB). During synchronization between data centers, `tyk-sink` will strip any unrecognized custom fields because its internal data model is based on the old, strict schema. This will cause configuration drift and break functionality in replica data centers.
💡 SuggestionThe internal API definition model in `tyk-sink` must be updated to support and preserve all fields. A pull request must be raised and merged in the `tyk-sink` repository to ensure configurations are synchronized correctly.
🟡 Warning go.mod:1
This feature will be included in a new Tyk Gateway release, which will produce a new Docker image. To make this functionality available to users deploying via Helm, the `tyk-charts` repository will need to be updated.
💡 SuggestionEnsure that updating the gateway image tag in the relevant `tyk-charts` (e.g., `tyk-headless`, `tyk-hybrid`) is part of the release checklist for this feature.

Connectivity Issues (2)

Severity Location Issue
🔴 Critical apidef/oas/schema/3.0.json:52
The change to allow `additionalProperties` in OAS API definitions introduces a significant risk to data integrity in MDCB deployments. API definitions are synchronized via RPC, and if the `tyk-sink` (MDCB) component's data model for API definitions is strict, any custom fields will be stripped during unmarshaling. This will cause the custom configuration (e.g., for OPA) to be lost during synchronization, leading to inconsistent API behavior across the gateway cluster.
💡 SuggestionThe RPC message format and the corresponding data models in both Tyk Gateway and `tyk-sink` must be verified to preserve unknown fields during JSON serialization and deserialization. This typically involves using a `map[string]interface{}` field to capture extra properties. A corresponding change in `tyk-sink` is required, and this PR should be blocked until that is addressed.
🟠 Error apidef/oas/schema/3.0.json:52
While Redis can store API definitions with additional fields, the `MdcbStorage` layer, which acts as a wrapper around local Redis and RPC calls, could introduce inconsistencies. If an API definition is fetched from MDCB via RPC and the custom fields are stripped (as per the RPC issue), the stripped version will be cached in the local Redis, overriding the correct definition. This would cause the feature to fail even on the gateway that originally received the correct configuration.
💡 SuggestionEnsure that the entire data flow through the `MdcbStorage` handler preserves the integrity of the API definition. The logic must guarantee that what is fetched via RPC is a faithful representation of the source, and what is cached in Redis is identical to what was fetched.

Powered by Visor from Probelabs

Last updated: 2025-09-24T09:49:43.610Z | Triggered by: synchronize | Commit: 9e0525b

@shults shults force-pushed the TT-10073-support-tyk-oas-with-open-policy-agent branch from ed48bad to 9e0525b Compare September 24, 2025 09:42
Copy link

@shults shults closed this Sep 29, 2025
@shults shults deleted the TT-10073-support-tyk-oas-with-open-policy-agent branch September 29, 2025 10:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants