
UN-3349 [FIX] Sanitize SQL identifiers in database connectors #1872

Merged
kirtimanmishrazipstack merged 9 commits into main from fix/UN-3349-sanitize-connector-inputs
Mar 31, 2026

Conversation

Contributor

muhammad-ali-e commented Mar 24, 2026

What

  • Add sql_safety.py utility for SQL identifier validation and quoting across all 8 database connectors
  • Parameterize dynamic values in metadata queries instead of f-string interpolation
  • Parameterize token lookup queries in prompt-service and x2text-service
  • Clean up error messages to avoid exposing query internals
  • Add 26 unit tests for the new sql_safety module

Why

  • Connector configuration values (schema, table, column names) were interpolated directly into SQL via f-strings with no sanitization or quoting (UN-3349)
  • Defense-in-depth: validate identifiers with allowlist regex + quote with DB-specific styles + parameterize WHERE clause values

How

  • New sql_safety.py module: validate_identifier() (allowlist), quote_identifier() (DB-specific), safe_identifier() (combined), QuoteStyle enum
  • Base class unstract_db.py: added get_quote_style() abstract method, parameterized execute(), updated get_information_schema(), create_table_query(), get_sql_insert_query()
  • Each connector implements get_quote_style() and uses safe identifiers in DDL/DML/metadata queries
  • PostgreSQL uses psycopg2.sql.Identifier() for SET search_path
  • Auth services use %s parameterized queries (pattern already existed in same files)
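The shape of the new utility described above can be sketched as follows. This is a simplified illustration, not the PR's actual implementation: the real module reportedly also supports dot-qualified identifiers via an allow_dots flag, and the allowlist details differ (see the review discussion on hyphens).

```python
import re
from enum import Enum

# Simplified allowlist: letters, digits, underscore; must not start with a digit.
_IDENTIFIER_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

class QuoteStyle(Enum):
    DOUBLE_QUOTE = '"'    # PostgreSQL, Redshift, Oracle, Snowflake
    BACKTICK = "`"        # MySQL, MariaDB, BigQuery
    SQUARE_BRACKET = "["  # MSSQL

def validate_identifier(name: str) -> str:
    """Reject anything outside the allowlist; do not echo the bad value."""
    if not isinstance(name, str) or not _IDENTIFIER_PATTERN.match(name):
        raise ValueError("Invalid SQL identifier")
    return name

def quote_identifier(name: str, style: QuoteStyle) -> str:
    """Wrap a pre-validated identifier in the DB-specific quote characters."""
    if style is QuoteStyle.SQUARE_BRACKET:
        return f"[{name}]"
    return f"{style.value}{name}{style.value}"

def safe_identifier(name: str, style: QuoteStyle) -> str:
    """Validate, then quote: the combined helper used in DDL/DML generation."""
    return quote_identifier(validate_identifier(name), style)
```

With this, safe_identifier("users", QuoteStyle.BACKTICK) yields a backtick-quoted identifier, while a payload such as "users; DROP TABLE x" raises ValueError before any SQL text is built.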

Can this PR break any existing features?

  • No. Quoting valid identifiers is a no-op in SQL ("my_table" behaves identically to my_table). Parameterized queries return identical results. The only behavioral change is that identifiers containing SQL metacharacters are now rejected with ValueError.
  • get_sql_insert_query changed from @staticmethod to instance method — all callers already invoke it on instances (db_class.get_sql_insert_query(...)), so this is backward compatible.
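The no-op claim about quoting is easy to check against any engine; a quick illustration using Python's built-in sqlite3 (the same identity holds in the supported databases for identifiers that need no quoting):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL with a quoted identifier...
conn.execute('CREATE TABLE "my_table" (id INTEGER)')
# ...is interchangeable with unquoted references to the same table.
conn.execute("INSERT INTO my_table VALUES (1)")
row = conn.execute('SELECT id FROM "my_table"').fetchone()
assert row == (1,)
```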

Database Migrations

  • None

Env Config

  • None

Related Issues or PRs

  • UN-3349

Dependencies Versions

  • None

Notes on Testing

  • Sanity tested on
    • PostgreSQL
    • MySQL
    • MSSQL
    • Redshift
    • MariaDB
    • BigQuery
  • 26 unit tests added for sql_safety.py covering validation, quoting, and real-world payloads
  • Pre-commit hooks pass on all 13 changed files
  • Manual verification: connector tests and workflow execution work as before

Checklist

  • I have added an appropriate PR title and description
  • I have read and understood the Contribution Guidelines
  • My code follows the style guidelines of this project
  • I have solved all the sonar issues
  • I have performed a self-review of my code
  • I have commented on my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

muhammad-ali-e and others added 2 commits March 24, 2026 14:48
Add defense-in-depth SQL injection prevention: validate identifiers
against allowlist regex, quote with DB-specific styles, and parameterize
WHERE clause values. Fixes GHSA-8mj6-f7hp-pwp9 (CVSS 8.8).

- New sql_safety.py utility: validate_identifier(), quote_identifier(),
  safe_identifier() with QuoteStyle enum (DOUBLE_QUOTE, BACKTICK,
  SQUARE_BRACKET)
- Fix all 8 connectors: PostgreSQL, Redshift, MSSQL, MySQL, MariaDB,
  Oracle, Snowflake, BigQuery
- Parameterize auth token queries in prompt-service and x2text-service
- Fix Snowflake error message leaking SQL query to HTTP response
- Add get_quote_style() abstract method to UnstractDB base class
- Add params support to UnstractDB.execute()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
26 tests covering validate_identifier, quote_identifier, and
safe_identifier across all QuoteStyle variants. Tests include
real-world injection payloads from the security advisory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai bot commented Mar 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8de8b777-53ea-4b69-8dfe-89ff980b2bf9

📥 Commits

Reviewing files that changed from the base of the PR and between 228c129 and 5260b6c.

📒 Files selected for processing (1)
  • unstract/connectors/src/unstract/connectors/databases/sql_safety.py
✅ Files skipped from review due to trivial changes (1)
  • unstract/connectors/src/unstract/connectors/databases/sql_safety.py

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Enhanced SQL security by converting string interpolation to parameterized queries, significantly reducing SQL injection vulnerabilities in authentication and database connector operations.
    • Improved support for special characters in table and column identifiers with consistent, database-specific quoting conventions across all supported database systems (BigQuery, MariaDB, MSSQL, MySQL, Oracle, PostgreSQL, Redshift, Snowflake).

Walkthrough

Introduces a sql_safety module and applies identifier validation/quoting plus parameterized query support across database connectors and authentication code; updates UnstractDB execute/insert APIs and adds tests for identifier safety.

Changes

Cohort / File(s) Summary
SQL Safety Core & Base DB
unstract/connectors/src/unstract/connectors/databases/sql_safety.py, unstract/connectors/src/unstract/connectors/databases/unstract_db.py
Added QuoteStyle, validate_identifier(), quote_identifier(), safe_identifier(); UnstractDB gains abstract get_quote_style(), execute(..., params=None) and uses identifier validation/quoting in SQL generation (including converting insert helper to instance method).
BigQuery
unstract/connectors/src/unstract/connectors/databases/bigquery/bigquery.py
Added get_quote_style(); execute() accepts params; replaced raw identifiers with safe_identifier(..., allow_dots=True) in CREATE/ALTER/INSERT; get_information_schema() validates identifiers and uses BigQuery parameter binding.
MySQL / MariaDB
unstract/connectors/src/unstract/connectors/databases/mysql/mysql.py, unstract/connectors/src/unstract/connectors/databases/mariadb/mariadb.py
Added get_quote_style() (BACKTICK); CREATE/ALTER/INSERT now use safe_identifier(); information_schema queries converted to parameterized queries using params.
PostgreSQL / Redshift
unstract/connectors/src/unstract/connectors/databases/postgresql/postgresql.py, unstract/connectors/src/unstract/connectors/databases/redshift/redshift.py
Added get_quote_style() (DOUBLE_QUOTE); schema validated via validate_identifier(); replaced ad-hoc quoting with safe_identifier() for DDL/DML and migrations.
MSSQL
unstract/connectors/src/unstract/connectors/databases/mssql/mssql.py
Added get_quote_style() (SQUARE_BRACKET); safely quoted schema/table/column identifiers (including dot-qualified names) for CREATE/ALTER; information_schema lookups use parameterized execution.
Oracle
unstract/connectors/src/unstract/connectors/databases/oracle_db/oracle_db.py
Added get_quote_style() (DOUBLE_QUOTE); switched to safe_identifier()/validate_identifier() for DDL/DML; get_sql_insert_query() became instance method; information_schema lookup parameterized.
Snowflake
unstract/connectors/src/unstract/connectors/databases/snowflake/snowflake.py
Added get_quote_style() (DOUBLE_QUOTE); applied safe_identifier() to CREATE/ALTER/INSERT and information schema calls; added column identifier validation and simplified programming error detail.
Auth: prompt-service & x2text-service
prompt-service/src/unstract/prompt_service/helpers/auth.py, x2text-service/app/authentication_middleware.py
Replaced string-interpolated bearer-token SQL with parameterized queries (WHERE ... = %s) and passed token/organization as bound parameters.
Tests
unstract/connectors/tests/databases/test_sql_safety.py
New unit tests for validate_identifier(), quote_identifier(), and safe_identifier() covering quote styles, escaping, dot-qualified identifiers, hyphens, and invalid/injection inputs.
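The auth-service change summarized above boils down to binding the token as a parameter instead of interpolating it into the query string. A minimal runnable illustration follows; sqlite3 with ? placeholders stands in for the %s paramstyle the services use, and the table/column names here are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE platform_keys (key TEXT, is_active INTEGER)")
conn.execute("INSERT INTO platform_keys VALUES ('secret-token', 1)")

def lookup_token(token: str):
    # The driver binds `token` as data; it is never spliced into SQL text.
    cur = conn.execute(
        "SELECT key, is_active FROM platform_keys WHERE key = ?", (token,)
    )
    return cur.fetchone()

# A classic injection payload is matched literally and finds nothing:
assert lookup_token("' OR '1'='1") is None
# The real token still resolves:
assert lookup_token("secret-token") == ("secret-token", 1)
```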

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage: ⚠️ Warning. Docstring coverage is 36.25%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Title check: ✅ Passed. The title 'UN-3349 [FIX] Sanitize SQL identifiers in database connectors' clearly and concisely describes the main change: adding SQL identifier sanitization across database connectors.
Description check: ✅ Passed. The description covers all template sections: What (new utilities and parameterization), Why (security rationale), How (implementation approach), breaking-changes analysis, testing notes, and checklist completion.


@greptile-apps
Contributor

greptile-apps bot commented Mar 24, 2026

Greptile Summary

This PR adds a defense-in-depth SQL injection fix across all 8 database connectors: a new sql_safety.py module provides allowlist validation + DB-specific identifier quoting (safe_identifier), DDL/DML queries are updated to use quoted identifiers, and token lookup queries in prompt-service and x2text-service are parameterized. The changes are broadly correct and well-tested (26 unit tests), with prior review concerns addressed.

Key findings:

  • P1 — Hyphen allowed in _IDENTIFIER_PATTERN but unsafe in unquoted column-name paths: _IDENTIFIER_PATTERN permits hyphens so that table/schema names like my-project pass validation. However, several code paths call only validate_identifier (no quoting) on column names and embed them raw in SQL — specifically create_table_query and the base-class/Snowflake/Oracle get_sql_insert_query. A workflow field named my-field would pass validation but produce malformed SQL. Removing hyphens from _IDENTIFIER_PATTERN would close the gap.
  • P2 — BigQuery execute silently drops params: BigQuery overrides execute but ignores the params argument without raising an error. No current breakage since BigQuery overrides all parameterized paths, but it violates the base-class contract.

Confidence Score: 4/5

Safe to merge with one targeted fix: the hyphen allowance in _IDENTIFIER_PATTERN can produce runtime SQL errors for column names with hyphens in the unquoted code paths.

Prior P0/P1 concerns from earlier review rounds are resolved. One new P1 remains: _IDENTIFIER_PATTERN allows hyphens, which are fine for quoted table/schema identifiers but cause SQL syntax errors when the same validation is used on unquoted column names. Removing the hyphen from the pattern is a one-line fix that should happen before merge.

sql_safety.py (_IDENTIFIER_PATTERN definition) and unstract_db.py (create_table_query, base get_sql_insert_query) are the two files that need attention.

Important Files Changed

Filename - Overview
unstract/connectors/src/unstract/connectors/databases/sql_safety.py - New utility module: allowlist regex + DB-specific quoting + combined safe_identifier(); P1 issue: allowlist permits hyphens, which are safe only for quoted identifiers but cause SQL errors in unquoted column-name paths
unstract/connectors/src/unstract/connectors/databases/unstract_db.py - Base class: execute() gains params support, get_sql_insert_query() becomes an instance method, column names validated only (not quoted); unquoted hyphenated column names would cause SQL syntax errors
unstract/connectors/src/unstract/connectors/databases/bigquery/bigquery.py - BigQuery: get_information_schema uses job_config parameterization, DDL uses safe_identifier with backticks; execute() override silently discards the params arg (P2)
unstract/connectors/src/unstract/connectors/databases/postgresql/postgresql.py - PostgreSQL: SET search_path uses psycopg2.sql.Identifier(), INSERT columns fully quoted via safe_identifier; no issues found
unstract/connectors/src/unstract/connectors/databases/snowflake/snowflake.py - Snowflake: table name quoted, column names validated only (intentional for UPPERCASE normalization); hyphenated column names would fail in SQL; error message sanitized
unstract/connectors/src/unstract/connectors/databases/oracle_db/oracle_db.py - Oracle: table name quoted, column names validated only (intentional for UPPERCASE normalization); parameterized user_tab_columns query with named :table_name parameter
unstract/connectors/src/unstract/connectors/databases/mssql/mssql.py - MSSQL: uses SQUARE_BRACKET quoting; IF NOT EXISTS WHERE literals use validated raw values (acknowledged in prior review, documented with comments)
unstract/connectors/src/unstract/connectors/databases/redshift/redshift.py - Redshift: validate_identifier(self.schema) in get_engine() is safe because the constructor defaults schema to 'public' and guards against empty values
unstract/connectors/src/unstract/connectors/databases/mysql/mysql.py - MySQL: information_schema query parameterized with %s, DDL uses safe_identifier with backtick quoting; no issues found
unstract/connectors/src/unstract/connectors/databases/mariadb/mariadb.py - MariaDB: identical improvements to MySQL, information_schema parameterized, DDL uses backtick quoting; no issues found
unstract/connectors/tests/databases/test_sql_safety.py - 26 unit tests covering validation, quoting styles, real-world injection payloads, and dot-qualified names; good coverage of the new module
prompt-service/src/unstract/prompt_service/helpers/auth.py - Token lookup queries parameterized with %s placeholders, eliminating direct string interpolation of user-supplied tokens
x2text-service/app/authentication_middleware.py - Bearer token lookup query parameterized with a (token,) tuple; same fix as the prompt-service auth

Comments Outside Diff (1)

  1. unstract/connectors/src/unstract/connectors/databases/sql_safety.py, line 29 (link)

    Hyphens allowed by allowlist but unsafe in unquoted column-name paths

    _IDENTIFIER_PATTERN explicitly allows hyphens (-) so that table/schema names like my-project pass validation. However, several code paths call validate_identifier on column names and then embed them unquoted directly in SQL:

    • unstract_db.py create_table_query (line ~178): validate_identifier(key) then sql_query += f"{key} {sql_type}, "
    • unstract_db.py get_sql_insert_query (line ~202–204): validate_identifier(k) then keys_str = ",".join(sql_keys)
    • snowflake.py get_sql_insert_query (line ~350–352): same pattern
    • oracle_db.py get_sql_insert_query: same pattern

    If a workflow's prompt schema defines a field named my-field, the key reaches these paths, passes validate_identifier, and produces SQL like:

    CREATE TABLE … (my-field VARCHAR(…))   -- parsed as: my MINUS field VARCHAR(...)
    INSERT INTO … (my-field) VALUES (…)    -- same parse error

    This causes a hard SQL syntax error at runtime, while the validation layer gave a false green light.

    Root cause: the same allowlist is shared between identifiers that will always be quoted (table/schema names via safe_identifier) and identifiers that are intentionally left unquoted (column names, to preserve Snowflake/Oracle upper-case normalization).

    Possible fix: strip hyphens from _IDENTIFIER_PATTERN so that the single pattern is safe for both quoted and unquoted contexts: _IDENTIFIER_PATTERN = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")

    Alternatively, introduce a stricter _COLUMN_IDENTIFIER_PATTERN for unquoted column-name contexts and use it in the paths that skip quoting.
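The second alternative the reviewer mentions could look like the following minimal sketch (the name _COLUMN_IDENTIFIER_PATTERN comes from the review comment; the helper name and placement are hypothetical):

```python
import re

# Stricter allowlist for identifiers embedded WITHOUT quoting: no hyphens,
# so `my-field` is rejected up front instead of parsing as `my MINUS field`.
_COLUMN_IDENTIFIER_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def validate_column_identifier(name: str) -> str:
    if not _COLUMN_IDENTIFIER_PATTERN.match(name):
        raise ValueError("Invalid SQL column identifier")
    return name
```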

Prompt To Fix All With AI
This is a comment left during a code review.
Path: unstract/connectors/src/unstract/connectors/databases/sql_safety.py
Line: 29

Comment:
**Hyphens allowed by allowlist but unsafe in unquoted column-name paths**

`_IDENTIFIER_PATTERN` explicitly allows hyphens (`-`) so that table/schema names like `my-project` pass validation. However, several code paths call `validate_identifier` on **column names** and then embed them **unquoted** directly in SQL:

- `unstract_db.py` `create_table_query` (line ~178): `validate_identifier(key)` then `sql_query += f"{key} {sql_type}, "`
- `unstract_db.py` `get_sql_insert_query` (line ~202–204): `validate_identifier(k)` then `keys_str = ",".join(sql_keys)`
- `snowflake.py` `get_sql_insert_query` (line ~350–352): same pattern
- `oracle_db.py` `get_sql_insert_query`: same pattern

If a workflow's prompt schema defines a field named `my-field`, the key reaches these paths, passes `validate_identifier`, and produces SQL like:

```sql
CREATE TABLE … (my-field VARCHAR(…))   -- parsed as: my MINUS field VARCHAR(...)
INSERT INTO … (my-field) VALUES (…)    -- same parse error
```

This causes a hard SQL syntax error at runtime, while the validation layer gave a false green light.

**Root cause:** the same allowlist is shared between identifiers that will always be quoted (table/schema names via `safe_identifier`) and identifiers that are intentionally left unquoted (column names, to preserve Snowflake/Oracle upper-case normalization).

**Possible fix:** strip hyphens from `_IDENTIFIER_PATTERN` so that the single pattern is safe for both quoted and unquoted contexts:

```suggestion
_IDENTIFIER_PATTERN = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")
```

Alternatively, introduce a stricter `_COLUMN_IDENTIFIER_PATTERN` for unquoted column-name contexts and use it in the paths that skip quoting.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: unstract/connectors/src/unstract/connectors/databases/bigquery/bigquery.py
Line: 129-134

Comment:
**`execute` silently drops `params` argument**

The base class `execute` signature was updated to accept `params` and the base-class `get_information_schema` now always passes `params=(table_name,)`. BigQuery overrides `get_information_schema`, so that particular call path is safe today. However, BigQuery's `execute` override accepts `params` and silently discards it:

```python
def execute(self, query: str, params: Any = None) -> Any:
    try:
        query_job = self.get_engine().query(query)   # params never forwarded
        return query_job.result()
```

Any future caller that does `self.execute(query, params=(...))` on a BigQuery instance will receive incorrect results without any error. The correct behavior is to either forward params via a `QueryJobConfig`, or raise `NotImplementedError` if called with params — rather than silently ignoring them.

How can I resolve this? If you propose a fix, please make it concise.
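One of the two remedies named above, sketched in isolation. The class is a stand-in (the real connector subclasses UnstractDB); the fail-loudly branch is shown because it needs no BigQuery client, while forwarding would instead build a google.cloud.bigquery.QueryJobConfig with query_parameters:

```python
from typing import Any

class BigQueryExecuteSketch:
    """Illustrates honoring the base-class execute(query, params) contract.

    Until params are forwarded via a QueryJobConfig, refusing them is safer
    than silently dropping them and returning wrong results.
    """

    def execute(self, query: str, params: Any = None) -> Any:
        if params is not None:
            raise NotImplementedError(
                "BigQuery execute() does not yet bind params; "
                "pass them via a QueryJobConfig instead"
            )
        # ...real connector would do: self.get_engine().query(query).result()
        return None
```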

Reviews (7): Last reviewed commit: "Merge branch 'main' into fix/UN-3349-san..."

Contributor

coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
prompt-service/src/unstract/prompt_service/helpers/auth.py (1)

29-40: ⚠️ Potential issue | 🟡 Minor

Consider not logging bearer tokens in error messages.

Similar to the x2text-service, the error messages log the actual bearer token value (lines 29, 35-36, 40), which could expose sensitive credentials in log files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prompt-service/src/unstract/prompt_service/helpers/auth.py` around lines 29 -
40, The error logs in helpers/auth.py currently print the full bearer token
(variable token) in multiple app.logger.error calls (the checks around
platform_key, is_active, and authentication failure); update these logger calls
to avoid emitting the raw token by either removing the token from the message or
logging a sanitized version (e.g., mask all but last 4 chars or log a hashed
representation) and keep contextual text (e.g., "Authentication failed. bearer
token not found" / "Token is not active" / "Authentication failed. Invalid
bearer token") without the plaintext credential; change the three
app.logger.error invocations that reference token to use the sanitized variable
(e.g., sanitized_token or token_hash) or omit the token entirely.
x2text-service/app/authentication_middleware.py (1)

36-51: ⚠️ Potential issue | 🟡 Minor

Consider not logging bearer tokens in error messages.

The error messages log the actual bearer token value (lines 37, 44-45, 50), which could expose sensitive credentials in log files. Consider logging only a truncated/masked version or omitting the token entirely.

🛡️ Suggested approach
             if not result_row or len(result_row) == 0:
                 current_app.logger.error(
-                    f"Authentication failed. bearer token not found {token}"
+                    "Authentication failed. bearer token not found"
                 )
                 return False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@x2text-service/app/authentication_middleware.py` around lines 36 - 51, The
current_app.logger.error calls in authentication_middleware.py are logging the
full bearer token (references: token, platform_key, is_active, result_row) —
change those three error logs (the "bearer token not found", "Token is not
active", and "Invalid bearer token" messages which currently interpolate
{token}) to avoid emitting the raw token: either omit the token entirely or
replace it with a masked/truncated value (e.g., show only last 4 chars or fixed
number of asterisks) before passing into current_app.logger.error, so all log
statements no longer contain the full sensitive token string.
unstract/connectors/src/unstract/connectors/databases/snowflake/snowflake.py (2)

152-153: ⚠️ Potential issue | 🟠 Major

Stop logging raw SQL text and bound values.

The exception detail is sanitized now, but these debug/error logs still dump the full statement and row payloads. That leaks query internals back into logs and undercuts the security goal of this change.

Also applies to: 171-177

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@unstract/connectors/src/unstract/connectors/databases/snowflake/snowflake.py`
around lines 152 - 153, Remove the debug logs that print raw SQL and bound
parameters in the execute_query flow: do not log sql_query or sql_values
directly in SnowflakeConnector.execute_query (and corresponding debug/error
calls around lines 171-177). Instead log a non-sensitive fingerprint (e.g., hash
of sql_query), the query type or first N characters if needed, and a count of
bound parameters, or redact values entirely. Update any
logger.debug/logger.error calls in that function to use the
fingerprint/count/redacted message so no full SQL text or row payloads are
written to logs.

112-124: ⚠️ Potential issue | 🔴 Critical

Quote the fixed Snowflake columns consistently in DDL and DML statements.

get_create_table_base_query() creates built-in columns unquoted (e.g., created_by TEXT), which Snowflake stores as uppercase CREATED_BY. However, get_sql_insert_query() now references these columns with quoted lowercase identifiers (e.g., "created_by"), which Snowflake treats as a different column. Insert operations for rows including built-in columns will fail with "column not found" errors.

Apply one consistent convention across all DDL and DML statements: either quote all fixed column names (and normalize to uppercase), or leave all unquoted (and normalize identifiers before insertion).

Also applies to: 133-142, 346-348

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@unstract/connectors/src/unstract/connectors/databases/snowflake/snowflake.py`
around lines 112 - 124, get_create_table_base_query defines built-in columns
unquoted while get_sql_insert_query uses quoted lowercase identifiers, causing
Snowflake to treat them as different columns; make quoting consistent by either
quoting all built-in column names in get_create_table_base_query (use
safe_identifier/QuoteStyle.DOUBLE_QUOTE or explicit double quotes and normalize
names to lowercase in DML), or remove quotes from get_sql_insert_query and
normalize identifiers to uppercase so both DDL and DML reference the same
identifier form; update the column definitions in get_create_table_base_query
and the corresponding column references in get_sql_insert_query (and the related
blocks at the ranges mentioned) to use the same quoting/normalization approach.
unstract/connectors/src/unstract/connectors/databases/oracle_db/oracle_db.py (1)

119-129: ⚠️ Potential issue | 🔴 Critical

Quoted table name with unquoted built-in columns breaks Oracle identifier semantics.

get_create_table_base_query() quotes the table identifier but leaves built-in columns unquoted, while get_sql_insert_query() quotes every column reference, and get_information_schema() uses UPPER(:table_name) to query the dictionary. In Oracle, unquoted identifiers normalize to uppercase at creation but are stored as-is when quoted. This creates a mismatch: unquoted columns in CREATE TABLE are stored uppercase (ID, CREATED_BY, etc.), but quoted columns in INSERT preserve case (lowercase "id", "created_by"), so subsequent INSERT operations will fail with ORA-00904 (column not found). Additionally, the schema lookup will fail for non-uppercase table names.

Normalize table and column names to uppercase before quoting, or keep all identifiers unquoted consistently.

Also applies to: 148-157, 173-183, 224-226

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@unstract/connectors/src/unstract/connectors/databases/oracle_db/oracle_db.py`
around lines 119 - 129, get_create_table_base_query() currently quotes the table
name via safe_identifier(..., QuoteStyle.DOUBLE_QUOTE) but emits unquoted column
names (which Oracle uppercases), while get_sql_insert_query() quotes every
column (preserving case), and get_information_schema() uses UPPER(:table_name),
causing identifier mismatches and ORA-00904. Fix by normalizing identifiers to
uppercase before quoting (or stop quoting all identifiers) so table and column
names are consistently created and referenced; update
get_create_table_base_query(), get_sql_insert_query(), and
get_information_schema() to call .upper() on table and column names prior to
passing to safe_identifier or to use unquoted identifiers consistently, and
ensure safe_identifier usage is uniform (QuoteStyle.DOUBLE_QUOTE) across these
functions so creation, insert and schema lookup use the same canonical form.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@unstract/connectors/src/unstract/connectors/databases/bigquery/bigquery.py`:
- Around line 129-132: The execute method currently ignores params; change it so
when params is provided you build a google.cloud.bigquery.QueryJobConfig with
query_parameters set (convert positional tuple/list into BigQuery
ScalarQueryParameter objects, or map dict keys to NamedQueryParameter objects)
and pass that as job_config to self.get_engine().query(query,
job_config=job_config) before returning query_job.result(); update execute and,
if necessary, add imports for QueryJobConfig and the appropriate Parameter
classes and handle the None/no-params path to keep current behavior.
- Around line 228-233: The BigQuery parameter names are being generated with
backticks (e.g., "@`{key}`") which is invalid for BigQuery named parameters;
update the code in the INSERT builder and the other occurrence (previously
around lines 279–281) to emit parameters as "@{key}" instead while keeping the
identifier validation via validate_identifier(key); specifically change the list
comprehensions/formatting that build escaped_params/parameter placeholders to
use f"@{key}" (or equivalent) instead of f"@`{key}`" so binding with
ScalarQueryParameter works.

In `@unstract/connectors/src/unstract/connectors/databases/unstract_db.py`:
- Around line 178-179: The function get_sql_insert_query currently uses an
implicit Optional by annotating sql_values: list[str] = None; update its type
hint to be explicit (e.g., sql_values: list[str] | None or sql_values:
Optional[list[str]]) and add the corresponding import (from typing import
Optional) if you choose Optional, so the parameter explicitly allows None;
adjust only the annotation for the sql_values parameter in get_sql_insert_query.

---

Outside diff comments:
In `@prompt-service/src/unstract/prompt_service/helpers/auth.py`:
- Around line 29-40: The error logs in helpers/auth.py currently print the full
bearer token (variable token) in multiple app.logger.error calls (the checks
around platform_key, is_active, and authentication failure); update these logger
calls to avoid emitting the raw token by either removing the token from the
message or logging a sanitized version (e.g., mask all but last 4 chars or log a
hashed representation) and keep contextual text (e.g., "Authentication failed.
bearer token not found" / "Token is not active" / "Authentication failed.
Invalid bearer token") without the plaintext credential; change the three
app.logger.error invocations that reference token to use the sanitized variable
(e.g., sanitized_token or token_hash) or omit the token entirely.

In
`@unstract/connectors/src/unstract/connectors/databases/oracle_db/oracle_db.py`:
- Around line 119-129: get_create_table_base_query() currently quotes the table
name via safe_identifier(..., QuoteStyle.DOUBLE_QUOTE) but emits unquoted column
names (which Oracle uppercases), while get_sql_insert_query() quotes every
column (preserving case), and get_information_schema() uses UPPER(:table_name),
causing identifier mismatches and ORA-00904. Fix by normalizing identifiers to
uppercase before quoting (or stop quoting all identifiers) so table and column
names are consistently created and referenced; update
get_create_table_base_query(), get_sql_insert_query(), and
get_information_schema() to call .upper() on table and column names prior to
passing to safe_identifier or to use unquoted identifiers consistently, and
ensure safe_identifier usage is uniform (QuoteStyle.DOUBLE_QUOTE) across these
functions so creation, insert and schema lookup use the same canonical form.

In
`@unstract/connectors/src/unstract/connectors/databases/snowflake/snowflake.py`:
- Around line 152-153: Remove the debug logs that print raw SQL and bound
parameters in the execute_query flow: do not log sql_query or sql_values
directly in SnowflakeConnector.execute_query (and corresponding debug/error
calls around lines 171-177). Instead log a non-sensitive fingerprint (e.g., hash
of sql_query), the query type or first N characters if needed, and a count of
bound parameters, or redact values entirely. Update any
logger.debug/logger.error calls in that function to use the
fingerprint/count/redacted message so no full SQL text or row payloads are
written to logs.
- Around line 112-124: get_create_table_base_query defines built-in columns
unquoted while get_sql_insert_query uses quoted lowercase identifiers, causing
Snowflake to treat them as different columns; make quoting consistent by either
quoting all built-in column names in get_create_table_base_query (use
safe_identifier/QuoteStyle.DOUBLE_QUOTE or explicit double quotes and normalize
names to lowercase in DML), or remove quotes from get_sql_insert_query and
normalize identifiers to uppercase so both DDL and DML reference the same
identifier form; update the column definitions in get_create_table_base_query
and the corresponding column references in get_sql_insert_query (and the related
blocks at the ranges mentioned) to use the same quoting/normalization approach.
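The log-redaction suggested in the first point can be sketched as a small helper (hypothetical; not the PR's code):

```python
import hashlib

def query_fingerprint(sql_query: str, sql_values=None) -> str:
    """Build a log-safe summary: a short hash of the SQL plus a bound-parameter
    count, so neither the query text nor row payloads reach the logs."""
    digest = hashlib.sha256(sql_query.encode("utf-8")).hexdigest()[:12]
    param_count = len(sql_values) if sql_values is not None else 0
    return f"executing query sha256={digest} params={param_count}"
```

The fingerprint stays stable across runs of the same statement, so logs remain correlatable without exposing SQL text or values.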

In `@x2text-service/app/authentication_middleware.py`:
- Around line 36-51: The current_app.logger.error calls in
authentication_middleware.py are logging the full bearer token (references:
token, platform_key, is_active, result_row) — change those three error logs (the
"bearer token not found", "Token is not active", and "Invalid bearer token"
messages which currently interpolate {token}) to avoid emitting the raw token:
either omit the token entirely or replace it with a masked/truncated value
(e.g., show only last 4 chars or fixed number of asterisks) before passing into
current_app.logger.error, so all log statements no longer contain the full
sensitive token string.
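The masking both prompts describe can be sketched as a small helper (hypothetical name; not part of the PR):

```python
def mask_token(token: str, visible: int = 4) -> str:
    """Mask a credential for logging, keeping only the last `visible` chars."""
    if not token:
        return "<empty>"
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]
```

Each flagged call would then log `mask_token(token)` instead of the raw value, e.g. `current_app.logger.error(f"Invalid bearer token: {mask_token(token)}")`.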

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fde10716-6d33-425f-8f59-986c5edcf09d

📥 Commits

Reviewing files that changed from the base of the PR and between 7f51af2 and 54ef025.

📒 Files selected for processing (13)
  • prompt-service/src/unstract/prompt_service/helpers/auth.py
  • unstract/connectors/src/unstract/connectors/databases/bigquery/bigquery.py
  • unstract/connectors/src/unstract/connectors/databases/mariadb/mariadb.py
  • unstract/connectors/src/unstract/connectors/databases/mssql/mssql.py
  • unstract/connectors/src/unstract/connectors/databases/mysql/mysql.py
  • unstract/connectors/src/unstract/connectors/databases/oracle_db/oracle_db.py
  • unstract/connectors/src/unstract/connectors/databases/postgresql/postgresql.py
  • unstract/connectors/src/unstract/connectors/databases/redshift/redshift.py
  • unstract/connectors/src/unstract/connectors/databases/snowflake/snowflake.py
  • unstract/connectors/src/unstract/connectors/databases/sql_safety.py
  • unstract/connectors/src/unstract/connectors/databases/unstract_db.py
  • unstract/connectors/tests/databases/test_sql_safety.py
  • x2text-service/app/authentication_middleware.py

…fix type hints

- Remove redundant validate_identifier() calls in BigQuery and Oracle
  (safe_identifier already validates internally)
- Fix type annotation: list[str] = None → list[str] | None = None
  in unstract_db.py and snowflake.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
unstract/connectors/src/unstract/connectors/databases/bigquery/bigquery.py (1)

330-384: Good security improvement with a minor exception-handling issue.

The get_information_schema method now properly:

  1. Validates project, dataset, and table via validate_identifier() before embedding in the schema path
  2. Uses parameterized query via QueryJobConfig with ScalarQueryParameter for the table_name filter

The static analysis S608 warning on lines 358-360 is a false positive - the identifiers are validated before being embedded.

However, the exception handling on lines 372-373 could be improved by chaining the exception:

♻️ Proposed improvement for exception chaining
         try:
             query_job = self.get_engine().query(query, job_config=job_config)
             results = query_job.result()
         except Exception as e:
-            raise ConnectorError(str(e))
+            raise ConnectorError(str(e)) from e
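A minimal demonstration of what `from e` preserves (class and function names here are illustrative):

```python
class ConnectorError(Exception):
    """Stand-in for the connector's exception type (hypothetical here)."""

def run_query() -> None:
    try:
        raise TimeoutError("query timed out")  # simulate a driver failure
    except Exception as e:
        # "from e" records the driver error as __cause__, so the original
        # traceback survives even though the message stays the same.
        raise ConnectorError(str(e)) from e
```

Without `from e`, the original traceback is still reachable via `__context__`, but explicit chaining documents intent and survives `raise ... from None` cleanups elsewhere.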
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@unstract/connectors/src/unstract/connectors/databases/bigquery/bigquery.py`
around lines 330 - 384, In get_information_schema, the except block currently
raises ConnectorError with only str(e); change it to chain the original
exception so the traceback is preserved (use "raise ConnectorError(... ) from e"
or construct ConnectorError with the original exception) in the except Exception
as e handler where query_job = self.get_engine().query(...) is called; this
keeps the same message but attaches the original exception for debugging while
preserving existing behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@unstract/connectors/src/unstract/connectors/databases/bigquery/bigquery.py`:
- Around line 330-384: In get_information_schema, the except block currently
raises ConnectorError with only str(e); change it to chain the original
exception so the traceback is preserved (use "raise ConnectorError(... ) from e"
or construct ConnectorError with the original exception) in the except Exception
as e handler where query_job = self.get_engine().query(...) is called; this
keeps the same message but attaches the original exception for debugging while
preserving existing behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 26ccb316-eb6e-4724-8505-6d758e4baa4b

📥 Commits

Reviewing files that changed from the base of the PR and between 54ef025 and 1422ee4.

📒 Files selected for processing (4)
  • unstract/connectors/src/unstract/connectors/databases/bigquery/bigquery.py
  • unstract/connectors/src/unstract/connectors/databases/oracle_db/oracle_db.py
  • unstract/connectors/src/unstract/connectors/databases/snowflake/snowflake.py
  • unstract/connectors/src/unstract/connectors/databases/unstract_db.py

…patibility

Snowflake and Oracle normalize unquoted identifiers to UPPERCASE.
Quoting column names (e.g., "created_by") preserves lowercase, causing
a mismatch with existing tables where columns are stored as CREATED_BY.

Changed approach: validate column names with allowlist regex (blocks
injection) but don't quote them, preserving the original unquoted
behavior. Table names remain quoted since they're quoted in both
CREATE and INSERT consistently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@muhammad-ali-e
Contributor Author

Addressing outside-diff findings from CodeRabbit review (#3997809940):

Snowflake/Oracle column case mismatch (Critical) — Fixed in 228c129. Changed to validate-only for column names (no quoting) in Snowflake, Oracle, and the base class. Snowflake/Oracle normalize unquoted identifiers to UPPERCASE, so quoting lowercase column names would break existing tables. validate_identifier() still blocks all injection payloads. Table names remain quoted since CREATE and INSERT both quote them consistently.

Bearer tokens logged in error messages (Minor, prompt-service + x2text) — Pre-existing pattern outside our diff. Valid concern but out of scope for this security fix. Should be addressed in a follow-up.

Snowflake debug logs still dump raw SQL (Major) — Pre-existing logger.debug/logger.error calls outside our diff. We fixed the user-facing exception detail (stripped SQL from SnowflakeProgrammingException). Server-side debug logs are lower risk — should be addressed in a follow-up.

@athul-rs
Contributor

Code review

Found 1 issue:

  1. BigQuery execute() accepts a params argument but silently ignores it. The method signature is execute(self, query: str, params: Any = None) but the implementation passes only query to self.get_engine().query(query) without forwarding params. Any caller relying on parameterized execution through this method will have their parameters silently dropped, defeating the SQL injection protection this PR aims to add. The base class execute() was updated to accept and forward params, but BigQuery's override does not honor this contract. By contrast, get_information_schema() in the same file correctly uses QueryJobConfig with query_parameters.

def execute(self, query: str, params: Any = None) -> Any:
    try:
        query_job = self.get_engine().query(query)
        return query_job.result()
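As a generic illustration of the contract being flagged (sqlite3 stands in for the real driver here; placeholder syntax differs per database), forwarding looks like:

```python
import sqlite3
from typing import Any

class ToyConnector:
    """Illustrates the base-class contract: params given to execute()
    must be forwarded to the driver, never silently discarded."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE tokens (key TEXT, active INTEGER)")
        self.conn.execute("INSERT INTO tokens VALUES ('abc123', 1)")

    def execute(self, query: str, params: Any = None) -> list:
        # Forward bound parameters when present (sqlite3 uses ? placeholders).
        cursor = (
            self.conn.execute(query, params)
            if params is not None
            else self.conn.execute(query)
        )
        return cursor.fetchall()

rows = ToyConnector().execute(
    "SELECT active FROM tokens WHERE key = ?", ("abc123",)
)
```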

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@muhammad-ali-e
Contributor Author

muhammad-ali-e commented Mar 27, 2026

@athul-rs Already addressed in this reply. BigQuery.execute() accepts params for LSP compliance with the base class signature, but it is never called with params — BigQuery overrides get_information_schema() with its own implementation using QueryJobConfig directly (lines 359-373). No caller in the codebase calls BigQuery.execute() with params.

Contributor

@kirtimanmishrazipstack kirtimanmishrazipstack left a comment


Code review

Found 3 issues:

  1. MSSQL get_create_table_base_query still uses string interpolation in WHERE clause -- While safe_identifier() validates the values first (blocking metacharacters), the existence check still interpolates schema_name and table_name directly as string literals (WHERE TABLE_SCHEMA = '{schema_name}'). This is inconsistent with the defense-in-depth approach used in get_information_schema in the same file, which correctly uses %s parameterized queries. Consider parameterizing these, or adding a comment explaining why parameterization is not feasible in this DDL IF NOT EXISTS context.

        f"IF NOT EXISTS ("
        f"SELECT * FROM INFORMATION_SCHEMA.TABLES "
        f"WHERE TABLE_SCHEMA = '{schema_name}' "
        f"AND TABLE_NAME = '{table_name}'"
        f")"
    )
    quoted_full_table = f"{safe_schema}.{safe_table}"
else:
    # Handle unqualified table names (default to dbo schema)
    safe_table = safe_identifier(table, QuoteStyle.SQUARE_BRACKET)
    # table is validated by safe_identifier above
    existence_check = (
        f"IF NOT EXISTS ("
        f"SELECT * FROM INFORMATION_SCHEMA.TABLES "
        f"WHERE TABLE_SCHEMA = 'dbo' "
        f"AND TABLE_NAME = '{table}'"
        f")"

  2. Allowlist regex excludes $ and # without documentation -- The regex ^[a-zA-Z_][a-zA-Z0-9_-]*$ is a good secure default but intentionally excludes $ (valid in Oracle/PostgreSQL identifiers) and # (MSSQL temp tables). If any deployment uses these patterns, this change will break them. Consider documenting the intentional exclusions above the pattern so future maintainers understand the tradeoff.

# Allowlist patterns for SQL identifiers.
# Permits: letters, digits, underscores, hyphens (common in table names).
_IDENTIFIER_PATTERN = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_-]*$")

  3. BigQuery get_information_schema has redundant local import -- The method imports from google.cloud import bigquery as bq_module at line 362, but the class __init__ already imports and stores the module as self.bigquery (line 35). This creates a second import path. Consider using bq_module = self.bigquery instead to reuse the existing reference.

    "table_name = @table_name"
)
from google.cloud import bigquery as bq_module
job_config = bq_module.QueryJobConfig(
    query_parameters=[
        bq_module.ScalarQueryParameter("table_name", "STRING", table)
    ]
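Taken together, the allowlist-plus-quoting design discussed in this review can be sketched as follows (a hypothetical reimplementation; the names mirror the PR's sql_safety module, but the bodies are illustrative, not the merged code):

```python
import re
from enum import Enum

# Strict allowlist: letters, digits, underscores, hyphens.
# $ and # are deliberately excluded as a conservative default.
_IDENTIFIER_PATTERN = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_-]*$")

class QuoteStyle(Enum):
    DOUBLE_QUOTE = "double_quote"      # e.g. PostgreSQL, Oracle, Snowflake
    BACKTICK = "backtick"              # e.g. MySQL, MariaDB, BigQuery
    SQUARE_BRACKET = "square_bracket"  # e.g. MSSQL

def validate_identifier(name: str) -> str:
    """Reject any identifier containing SQL metacharacters."""
    if not isinstance(name, str) or not _IDENTIFIER_PATTERN.match(name):
        raise ValueError(f"Unsafe SQL identifier: {name!r}")
    return name

def quote_identifier(name: str, style: QuoteStyle) -> str:
    """Wrap an already-validated identifier in the DB-specific quoting style."""
    if style is QuoteStyle.SQUARE_BRACKET:
        return f"[{name}]"
    char = '"' if style is QuoteStyle.DOUBLE_QUOTE else "`"
    return f"{char}{name}{char}"

def safe_identifier(name: str, style: QuoteStyle) -> str:
    """Validate, then quote: quoting a valid identifier is a SQL no-op."""
    return quote_identifier(validate_identifier(name), style)
```

Because the allowlist already bars quote characters, the quoted output can never contain an embedded closing quote, which is what makes the validate-then-quote combination defense-in-depth rather than two redundant checks.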


Overall this is a solid security fix with good defense-in-depth design. The sql_safety.py module is clean, well-tested (26 tests), and the parameterization changes across connectors and auth services are correct.

Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@muhammad-ali-e
Contributor Author

muhammad-ali-e commented Mar 27, 2026

@kirtimanmishrazipstack Thanks for the review and the positive feedback on the overall approach!

1. MSSQL WHERE clause interpolation — Already addressed in this reply. safe_identifier() is called before the WHERE clause, which runs validate_identifier() rejecting all SQL metacharacters. The IF NOT EXISTS ... CREATE TABLE is a compound DDL statement that cannot be split into a separate parameterized query. Comments in the code document this dependency.

2. Regex excludes $ and # — Intentional. Unstract connector table/schema names don't use $ or # in practice. Keeping the allowlist strict is the safer default for a security fix. Good suggestion on documenting it — will consider in a follow-up.

3. BigQuery redundant local import — This is intentional. The class-level comment says "DO NOT import Google API libraries at module level — these imports initialize gRPC state before Celery fork, causing SIGSEGV." The local import in get_information_schema follows this established pattern. Using self.bigquery would work since __init__ already loaded the module, but keeping the local import is consistent with the existing convention in this file (see execute_query() at line 249 which also does a local import).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

Test Results

Summary
  • Runner Tests: 11 passed, 0 failed (11 total)
  • SDK1 Tests: 98 passed, 0 failed (98 total)

Runner Tests - Full Report
All 11 tests live in runner/src/unstract/runner/clients/test_docker.py; each passed (1/1):

  test_logs, test_cleanup, test_cleanup_skip, test_client_init,
  test_get_image_exists, test_get_image, test_get_container_run_config,
  test_get_container_run_config_without_mount, test_run_container,
  test_get_image_for_sidecar, test_sidecar_container

  TOTAL: 11 passed
SDK1 Tests - Full Report
All tests passed; a count in parentheses marks tests that contributed more than one passed case.

tests/patches/test_litellm_cohere_timeout.py
  TestPatchedEmbeddingSyncTimeoutForwarding: test_timeout_passed_to_client_post,
    test_none_timeout_passed_to_client_post, test_httpx_timeout_object_forwarded
  TestMonkeyPatchApplied: test_cohere_handler_patched, test_bedrock_handler_patched,
    test_patch_module_loaded_via_embedding_import

tests/test_llm_compat.py
  TestLLMCompatFromLlm: test_from_llm_reuses_llm_instance,
    test_from_llm_returns_llmcompat_instance, test_from_llm_sets_model_name,
    test_from_llm_does_not_call_init
  TestLLMCompatDelegation: test_complete_delegates_to_llm,
    test_chat_delegates_to_llm_complete, test_chat_forwards_kwargs_to_llm,
    test_complete_forwards_kwargs_to_llm, test_acomplete_delegates_to_llm,
    test_achat_delegates_to_llm_acomplete, test_stream_chat_not_implemented,
    test_stream_complete_not_implemented, test_astream_chat_not_implemented,
    test_astream_complete_not_implemented, test_metadata_returns_emulated_type,
    test_get_model_name_delegates, test_get_metrics_delegates,
    test_test_connection_delegates
  TestEmulatedTypes: test_message_role_values, test_chat_message_defaults,
    test_chat_response_message_access, test_completion_response_text,
    test_llm_metadata_defaults
  TestMessagesToPrompt: test_single_user_message,
    test_none_content_becomes_empty_string, test_preserves_all_messages,
    test_multi_turn_conversation, test_empty_messages_returns_empty_string,
    test_string_role_fallback

tests/test_platform.py
  TestPlatformHelperRetry: test_success_on_first_attempt (2),
    test_retry_on_connection_error (2), test_non_retryable_http_error,
    test_retryable_http_errors (3), test_post_method_retry, test_retry_logging

tests/test_prompt.py
  TestPromptToolRetry: test_success_on_first_attempt, test_retry_on_errors (2),
    test_wrapper_methods_retry (4)

tests/utils/test_retry_utils.py
  TestIsRetryableError: test_connection_error_is_retryable,
    test_timeout_is_retryable, test_http_error_retryable_status_codes (3),
    test_http_error_non_retryable_status_codes (5),
    test_http_error_without_response, test_os_error_retryable_errno (5),
    test_os_error_non_retryable_errno, test_other_exception_not_retryable
  TestCalculateDelay: test_exponential_backoff_without_jitter,
    test_exponential_backoff_with_jitter, test_max_delay_cap,
    test_max_delay_cap_with_jitter
  TestRetryWithExponentialBackoff: test_successful_call_first_attempt,
    test_retry_after_transient_failure, test_max_retries_exceeded,
    test_retry_with_custom_predicate, test_no_retry_with_predicate_false,
    test_exception_not_in_tuple_not_retried
  TestCreateRetryDecorator: test_default_configuration,
    test_environment_variable_configuration, test_invalid_max_retries,
    test_invalid_base_delay, test_invalid_multiplier, test_jitter_values (2),
    test_custom_exceptions_only, test_custom_predicate_only,
    test_both_exceptions_and_predicate, test_exceptions_match_but_predicate_false
  TestPreconfiguredDecorators: test_retry_platform_service_call_exists,
    test_retry_prompt_service_call_exists,
    test_platform_service_decorator_retries_on_connection_error,
    test_prompt_service_decorator_retries_on_timeout
  TestRetryLogging: test_warning_logged_on_retry,
    test_info_logged_on_success_after_retry, test_exception_logged_on_giving_up

  TOTAL: 98 passed

@sonarqubecloud

@kirtimanmishrazipstack kirtimanmishrazipstack merged commit 3b1a343 into main Mar 31, 2026
8 checks passed
@kirtimanmishrazipstack kirtimanmishrazipstack deleted the fix/UN-3349-sanitize-connector-inputs branch March 31, 2026 07:23
