Skip to content

Conversation

@m3hm3t
Copy link
Contributor

@m3hm3t m3hm3t commented Oct 14, 2025

fixes #8249

On PostgreSQL 18, opening another backend’s temporary relation is rejected earlier. Our public columnar views (columnar.storage, and those that join through it) could traverse other sessions’ temp tables and error with cannot access temporary tables of other sessions. Also, direct callers of columnar.get_storage_id(regclass) could hit the same path on PG18.

What changed

  • SQL migration: citus_columnar--13.2-1--14.0-1.sql

    • Recreate columnar.storage with a temp-safe filter:

      • include permanent/unlogged relations,
      • include this session’s temp relations,
      • exclude other sessions’ temp schemas.
    • Re-emit columnar.stripe, columnar.chunk_group, and columnar.chunk with OR REPLACE so they inherit the safer storage and keep security_barrier.

  • C code: columnar_metadata.c

    • Add a PG18-only precheck in columnar.get_storage_id(regclass) to skip other sessions’ temp relations before relation_open(). For those, the function returns NULL (SQL is STRICT, so callers naturally drop such rows). Behavior on PG15–PG17 is unchanged.

Downgrade note

We intentionally keep the hardened view definitions after downgrade as a bugfix (safer on all supported PGs).

I couldn't find related commit but found these discussions:

https://www.postgresql.org/message-id/2736425.1758475979%40sss.pgh.pa.us
https://www.postgresql.org/message-id/CAMEv5_syU0ZopE-2Wr8A8QksqrCyYT2hW06Rgw4RSPdyJO-%3Dfw%40mail.gmail.com
https://www.postgresql.org/message-id/flat/CAJDiXghdFcZ8=nh4G69te7iRr3Q0uFyXxb3ZdG09_GTNZXwH0g@mail.gmail.com

@m3hm3t m3hm3t self-assigned this Oct 14, 2025
@codecov
Copy link

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 77.77778% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.99%. Comparing base (356f254) to head (5fd40f9).

Additional details and impacted files
@@                    Coverage Diff                     @@
##           m3hm3t/pg18_beta_confs    #8252      +/-   ##
==========================================================
+ Coverage                   86.65%   88.99%   +2.34%     
==========================================================
  Files                         287      287              
  Lines                       63189    63200      +11     
  Branches                     7949     7951       +2     
==========================================================
+ Hits                        54756    56245    +1489     
+ Misses                       5931     4635    -1296     
+ Partials                     2502     2320     -182     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@m3hm3t m3hm3t force-pushed the m3hm3t/pg18_columnar_access_temp_err branch 5 times, most recently from c332560 to d8e5b54 Compare October 14, 2025 22:03
@m3hm3t m3hm3t changed the title Refactor columnar tests to use helper functions for storage ID retrie… PG18 - columnar- cannot access temporary tables Oct 15, 2025
@m3hm3t m3hm3t force-pushed the m3hm3t/pg18_columnar_access_temp_err branch from d8e5b54 to 91b0f6b Compare October 15, 2025 11:27
@m3hm3t m3hm3t changed the title PG18 - columnar- cannot access temporary tables PG18: Refactor columnar regression tests to avoid cross-session temp access and stabilize storage_id lookups Oct 16, 2025
@m3hm3t m3hm3t marked this pull request as ready for review October 16, 2025 08:47
@onurctirtir
Copy link
Member

Instead, let's fix the columnar views -the views inside columnar schema, not the tables in columnar_internal- in a way that they skip other sessions' temp relations always.

For two reasons;

  1. seems these are not just regression test issues -> users can also query those views
  2. fixing these issues in the catalog view level should automatically fix most of those regression tests

I can imagine a few tests directly executing get_storage_id() and those tests should be updated in a way that they stop directly calling the UDF but they should query columnar.storage as in select storage_id from columnar.storage where relation = ... This should help once we fix columnar.storage in a way that it omits the temp relations of other sessions.

@m3hm3t m3hm3t marked this pull request as draft October 16, 2025 11:08
@m3hm3t m3hm3t closed this Oct 16, 2025
@m3hm3t m3hm3t force-pushed the m3hm3t/pg18_columnar_access_temp_err branch from 91b0f6b to bc8ab44 Compare October 16, 2025 12:31
@m3hm3t m3hm3t reopened this Oct 16, 2025
@m3hm3t m3hm3t changed the title PG18: Refactor columnar regression tests to avoid cross-session temp access and stabilize storage_id lookups PG18: Harden columnar views against cross-session temps + add safe storage_id helper Oct 16, 2025
@m3hm3t
Copy link
Contributor Author

m3hm3t commented Oct 16, 2025

Instead, let's fix the columnar views -the views inside columnar schema, not the tables in columnar_internal- in a way that they skip other sessions' temp relations always.

For two reasons;

  1. seems these are not just regression test issues -> users can also query those views
  2. fixing these issues in the catalog view level should automatically fix most of those regression tests

I can imagine a few tests directly executing get_storage_id() and those tests should be updated in a way that they stop directly calling the UDF but they should query columnar.storage as in select storage_id from columnar.storage where relation = ... This should help once we fix columnar.storage in a way that it omits the temp relations of other sessions.

@onurctirtir thank you for feedback

I updated the PR according to your suggestion. If you have time, could you review the PR?

@m3hm3t m3hm3t marked this pull request as ready for review October 16, 2025 12:44
@m3hm3t m3hm3t force-pushed the m3hm3t/pg18_beta_confs branch 2 times, most recently from ced7e44 to e0ccab3 Compare October 21, 2025 12:19
Copy link
Member

@onurctirtir onurctirtir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped one comment and one question below.
And also wondering if the test failures are related to this PR?


Also a note to myself: check the changes made in view definitions once again before approving.

Comment on lines 149 to 160
CREATE OR REPLACE FUNCTION get_storage_id_if_visible(rel regclass)
RETURNS bigint
LANGUAGE sql STABLE AS $$
SELECT CASE
WHEN c.relpersistence = 't'
AND c.relnamespace <> pg_catalog.pg_my_temp_schema()
THEN NULL -- other session’s temp → don’t touch
ELSE columnar.get_storage_id(c.oid)
END
FROM pg_catalog.pg_class c
WHERE c.oid = $1::oid
$$;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems get_storage_id() was a function we expose to users like it's the case for columnar views. So, rather than having a new test helper function, I believe we should fix get_storage_id() itself as in what we have done for columnar views.

Copy link
Contributor Author

@m3hm3t m3hm3t Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_storage_id() updated
test helper removed

@@ -1,2 +1,44 @@
-- citus_columnar--13.2-1--14.0-1
-- bump version to 14.0-1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder should create older versions of these views in downgrade path as we usually do, or, as we did in some of the similar bugfixes, should those stay as is even after a downgrade?

Copy link
Contributor Author

@m3hm3t m3hm3t Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose if we keep this view, we’ll only “lose” rows that came from other sessions’ temp columnar tables which nobody should rely on, and which is exactly what bit on PG18.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, I can add a downgrade script to this PR and open a separate issue to decide later whether to keep or remove it.

What do you think on this @onurctirtir ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after discuss will add downgrade path

m3hm3t and others added 4 commits October 24, 2025 12:41
Pg18 beta conf file updated

(cherry picked from commit c36410c)

Update image suffix in build and test workflow

Update image suffix in build configuration

Update image suffix in build configuration

Update image suffix in build configuration

(cherry picked from commit 7dbb946)

Update image suffix in build_and_test.yml to reflect latest development version

Update PostgreSQL version to 18beta3 in Dockerfile and CI workflow
… WHERE … IS NOT NULL (PG15–PG18) (#8139)

DESCRIPTION: Stabilize multi_insert_select expected: accept unqualified
columns in WHERE … IS NOT NULL

fixes #8133 


**Context**

* With PG18, ruleutils adds a GROUP RTE and improves column-name dedup.
As a side-effect, Vars that point at the GROUP RTE print as unqualified
column names even when `varprefix` is true.
* In Citus’ vendored `ruleutils_18.c` we already flattened GROUP Vars in
`targetList` and `havingQual`, but not in `jointree->quals`.
* For queries like `INSERT … SELECT … GROUP BY …`, Citus injects an
implicit null-guard on the group key in the WHERE clause. Because that
Var was still referencing the GROUP RTE, the deparser emitted `WHERE
(user_id IS NOT NULL)` instead of `WHERE (raw_events_first.user_id IS
NOT NULL)`, causing regress diffs only in grouped SELECTs.
* Related upstream change: PostgreSQL commit
`52c707483ce4d0161127e4958d981d1b5655865e` (ruleutils column-name de-dup
/ GROUP RTE exposure).

**What changed**

* Added an alternative expected file
`src/test/regress/expected/multi_insert_select_0.out` to keep CI green
across mixed environments where the qualified form may still be
produced.
The change in `merge_planner.c` fixes _unrecognized range table entry_
diffs in merge regress tests (category 2 diffs in #7992), the change in
`multi_router_planner.c` fixes _column reference ... is ambiguous_ diffs
in `multi_insert_select` and `multi_insert_select_window` (category 3
diffs in #7992). Edit to `common.py` enables standalone regress tests
with pg18 (e..g `citus_tests/run_test.py merge`).
@m3hm3t m3hm3t force-pushed the m3hm3t/pg18_beta_confs branch 3 times, most recently from 5f5da94 to 15ecb4b Compare November 5, 2025 09:11
@m3hm3t m3hm3t force-pushed the m3hm3t/pg18_beta_confs branch from 15ecb4b to 5565920 Compare November 5, 2025 18:45
naisila pushed a commit that referenced this pull request Nov 6, 2025
Fixes #8235

PG18 and PG latest minors ignore temporary relations in
`RelidByRelfilenumber` (`RelidByRelfilenode` in PG15)
Relevant PG commit:
postgres/postgres@86831952

Here we are keeping temp reloids instead of getting it with
RelidByRelfilenumber, for example, in some cases, we can directly get
reloid from relations, in other cases we keep it in some structures.

Note: there is still an outstanding issue with columnar temp tables in
concurrent sessions, that will be fixed in PR
#8252
manaldush added a commit to manaldush/citus that referenced this pull request Nov 7, 2025
Backporting citusdata#8235

PG18 and PG latest minors ignore temporary relations in
`RelidByRelfilenumber` (`RelidByRelfilenode` in PG15)
Relevant PG commit:
postgres/postgres@86831952

Here we are keeping temp reloids instead of getting it with
RelidByRelfilenumber, for example, in some cases, we can directly get
reloid from relations, in other cases we keep it in some structures.

Note: there is still an outstanding issue with columnar temp tables in
concurrent sessions, that will be fixed in PR
citusdata#8252
manaldush added a commit to manaldush/citus that referenced this pull request Nov 7, 2025
Backporting citusdata#8235

PG18 and PG latest minors ignore temporary relations in
`RelidByRelfilenumber` (`RelidByRelfilenode` in PG15)
Relevant PG commit:
postgres/postgres@86831952

Here we are keeping temp reloids instead of getting it with
RelidByRelfilenumber, for example, in some cases, we can directly get
reloid from relations, in other cases we keep it in some structures.

Note: there is still an outstanding issue with columnar temp tables in
concurrent sessions, that will be fixed in PR
citusdata#8252

(cherry picked from commit daa69be)
manaldush added a commit to manaldush/citus that referenced this pull request Nov 7, 2025
Backporting citusdata#8235

PG18 and PG latest minors ignore temporary relations in
`RelidByRelfilenumber` (`RelidByRelfilenode` in PG15)
Relevant PG commit:
postgres/postgres@86831952

Here we are keeping temp reloids instead of getting it with
RelidByRelfilenumber, for example, in some cases, we can directly get
reloid from relations, in other cases we keep it in some structures.

Note: there is still an outstanding issue with columnar temp tables in
concurrent sessions, that will be fixed in PR
citusdata#8252

(cherry picked from commit daa69be)
manaldush added a commit to manaldush/citus that referenced this pull request Nov 7, 2025
Backporting citusdata#8235

PG18 and PG latest minors ignore temporary relations in
`RelidByRelfilenumber` (`RelidByRelfilenode` in PG15)
Relevant PG commit:
postgres/postgres@86831952

Here we are keeping temp reloids instead of getting it with
RelidByRelfilenumber, for example, in some cases, we can directly get
reloid from relations, in other cases we keep it in some structures.

Note: there is still an outstanding issue with columnar temp tables in
concurrent sessions, that will be fixed in PR
citusdata#8252

(cherry picked from commit daa69be)
@m3hm3t m3hm3t force-pushed the m3hm3t/pg18_beta_confs branch 2 times, most recently from e302318 to 42d95c8 Compare November 17, 2025 08:09
@m3hm3t m3hm3t force-pushed the m3hm3t/pg18_beta_confs branch 3 times, most recently from 8c37f62 to d4fcdcb Compare November 20, 2025 08:21
@m3hm3t m3hm3t force-pushed the m3hm3t/pg18_beta_confs branch 5 times, most recently from bdd4c24 to 206fc6e Compare December 5, 2025 13:51
@naisila naisila force-pushed the m3hm3t/pg18_beta_confs branch 4 times, most recently from b515be7 to 141f8ff Compare December 15, 2025 13:49
@m3hm3t m3hm3t force-pushed the m3hm3t/pg18_beta_confs branch from 667e83a to 8056bff Compare December 16, 2025 11:05
Base automatically changed from m3hm3t/pg18_beta_confs to main December 17, 2025 06:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PG18 - columnar- cannot access temporary tables of other sessions

5 participants