Ll/local time part2 by lixiliu · Pull Request #406 · dsgrid/dsgrid

lixiliu · 2026-01-23T23:49:10Z

Closes GitHub Issue #

Description

Local time support.

Allows config to describe tz-naive timestamps that are:

aligned_in_absolute_time (with or without a time zone), or
aligned_in_local_std_time (where time zones are given by a TIME_ZONE_COLUMN in the data table)

Time zone localization is triggered during dataset registration when:

timestamps in data table are tz-naive (TIMESTAMP_NTZ) but time config has a time zone.
For aligned_in_local_std_time, localization is based on the TIME_ZONE_COLUMN from the geography dimension.
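
As a rough illustration of the two localization cases above, the sketch below uses only the Python standard library. It is not dsgrid or Chronify code: the helper names, the row layout, and the `STD_OFFSETS` table are hypothetical, and fixed UTC offsets stand in for standard-time zones.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed standard-time offsets keyed by the value of a time_zone column.
STD_OFFSETS = {
    "EST": timezone(timedelta(hours=-5), name="EST"),
    "PST": timezone(timedelta(hours=-8), name="PST"),
}


def localize_to_single_tz(rows, tz):
    """Attach the same time zone to every tz-naive timestamp (aligned in absolute time)."""
    return [{**row, "timestamp": row["timestamp"].replace(tzinfo=tz)} for row in rows]


def localize_to_multi_tz(rows):
    """Attach each row's own standard-time zone, read from its time_zone column."""
    return [
        {**row, "timestamp": row["timestamp"].replace(tzinfo=STD_OFFSETS[row["time_zone"]])}
        for row in rows
    ]


rows = [
    {"geography": "NY", "time_zone": "EST", "timestamp": datetime(2018, 1, 1, 0, 0)},
    {"geography": "CA", "time_zone": "PST", "timestamp": datetime(2018, 1, 1, 0, 0)},
]

# Aligned in absolute time: both rows describe the same instant.
single = localize_to_single_tz(rows, STD_OFFSETS["EST"])

# Aligned in local standard time: the same wall-clock reading is a different instant per geography.
multi = localize_to_multi_tz(rows)
```

With these inputs the single-zone result holds two identical instants, while in the per-row result the CA timestamp falls three hours after the NY one in absolute time.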

No time zone localization is available when a dataset is submitted to a project.

Chronify: NatLabRockies/chronify#61

Checklist

Tests exercising the new feature or bug fix
All tests pass
At least one code review approval
Consider transferring TODOs to GitHub

This reverts commit e15b0d4.

codecov · 2026-01-23T23:59:21Z

Codecov Report

❌ Patch coverage is 75.29412% with 63 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.12%. Comparing base (f3f5546) to head (17ced0b).
⚠️ Report is 21 commits behind head on main.

Files with missing lines	Patch %	Lines
dsgrid/utils/dataset.py	61.70%	36 Missing ⚠️
dsgrid/config/dimensions.py	50.00%	9 Missing ⚠️
dsgrid/config/date_time_dimension_config.py	84.00%	8 Missing ⚠️
dsgrid/query/query_submitter.py	72.72%	6 Missing ⚠️
dsgrid/dimension/time.py	50.00%	3 Missing ⚠️
dsgrid/registry/dataset_registry_manager.py	98.43%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #406      +/-   ##
==========================================
+ Coverage   76.90%   80.12%   +3.22%     
==========================================
  Files         137      124      -13     
  Lines       14784    14211     -573     
==========================================
+ Hits        11370    11387      +17     
+ Misses       3414     2824     -590

Flag	Coverage Δ
Linux	`80.05% <75.29%> (?)`
Windows	`80.12% <75.29%> (?)`

Copilot

Pull request overview

Adds local time support for tz-naive timestamps by introducing Chronify-based localization during dataset registration, along with new/updated time-zone configuration semantics and tests.

Changes:

Adds Chronify-powered timestamp localization helpers and wires them into dataset registration.
Updates time zone format modeling (rename to aligned_in_local_std_time, allow time_zone=None for aligned absolute time).
Adds tests for localization routing and UTC-offset parsing in time-in-parts conversion.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 11 comments.

Show a summary per file

File	Description
tests/test_localize_timestamps_if_necessary.py	New tests validating localization routing and no-op behavior.
tests/test_create_time_dimensions.py	Minor test fix for `periods` argument type.
tests/test_convert_time_format_if_necessary.py	New tests for UTC offset parsing and timestamp transformation behavior.
tests/data/dimension_models/minimal/dimension_test_time.json5	Updates time zone format string to new enum value.
pyproject.toml	Switches Chronify dependency to a Git URL/branch for local-time work.
missing_associations/geography__subsector.csv	Adds missing associations CSV fixture.
dsgrid/utils/dataset.py	Adds Chronify localization helpers and `localize_timestamps_if_necessary`.
dsgrid/registry/dataset_registry_manager.py	Integrates time-in-parts conversion and timestamp localization into registration.
dsgrid/query/query_submitter.py	Uses `TIME_ZONE_COLUMN` consistently; refactors Chronify conversion routing.
dsgrid/dimension/time.py	Renames time zone enum value to `aligned_in_local_std_time`.
dsgrid/dataset/dataset_schema_handler_base.py	Adds Chronify-based timestamp localization pathway at schema-handler level.
dsgrid/config/index_time_dimension_config.py	Updates Chronify return type hints for index time config.
dsgrid/config/dimensions.py	Updates time config models (offset column, time zone formats, removes legacy model).
dsgrid/config/date_time_dimension_config.py	Adds Chronify dtype support + localization plan logic.
dsgrid/common.py	Adds `TIME_COLUMN` constant.
dsgrid-test-data	Updates submodule pointer.

dsgrid/utils/dataset.py

dsgrid/config/dimensions.py

dsgrid/dataset/dataset_schema_handler_base.py

Copilot · 2026-01-24T00:22:00Z

dsgrid/registry/dataset_registry_manager.py

+        # update time_dim
+        time_column = time_dim.model.column_format.time_column
+        time_dim.model.column_format = TimeFormatDateTimeTZModel(time_column=time_column)
+


After localization, the code updates the file path and the time dimension dtype, but it does not update config.model.data_layout.data_file.columns (when present) to reflect TIMESTAMP_TZ. This can leave the dataset config’s declared schema inconsistent with the written parquet. Consider mirroring _update_config_for_timestamp behavior here: replace the time column entry with data_type='TIMESTAMP_TZ' when columns is not None.

Suggested change

+        # update data_file columns schema, if present, to reflect TIMESTAMP_TZ
+        data_file = config.model.data_layout.data_file
+        if data_file.columns is not None:
+            updated_columns: list[Column] = []
+            for col in data_file.columns:
+                if col.name == time_column:
+                    updated_columns.append(Column(name=time_column, data_type="TIMESTAMP_TZ"))
+                else:
+                    updated_columns.append(col)
+            data_file.columns = updated_columns

tests/test_convert_time_format_if_necessary.py

tests/test_localize_timestamps_if_necessary.py

dsgrid/dimension/time.py

dsgrid/utils/dataset.py

Copilot · 2026-01-24T00:22:01Z

pyproject.toml

    # "boto3",
    # "s3path",
-    "chronify ~= 0.6.0",
+    "chronify@git+https://github.com/NatLabRockies/chronify.git@ll/local_time2", # "chronify ~= 0.6.0",


Using a Git URL pinned to a branch name makes builds non-reproducible (branch heads can move) and can break dependency resolution for consumers. Prefer pinning to an immutable commit SHA or (ideally) a released Chronify version, and keep VCS overrides behind a dev-only extra if needed.

Suggested change

-    "chronify@git+https://github.com/NatLabRockies/chronify.git@ll/local_time2", # "chronify ~= 0.6.0",
+    "chronify ~= 0.6.0",

dsgrid/query/query_submitter.py

dsgrid/config/date_time_dimension_config.py

dsgrid/utils/dataset.py

dsgrid/dataset/dataset_schema_handler_base.py

daniel-thom · 2026-02-08T20:03:29Z

dsgrid/utils/dataset.py

+    runtime_config = dsgrid.runtime_config
+    match localization_plan:
+        case "localize_to_single_tz":
+            to_time_zone = time_dim._get_chronify_time_zone()


This shouldn't rely on a private method. Why do we need a chronify-specific method? Can't it be time_dim.get_time_zone?

Because we need the ZoneInfo obj version of time_dim.get_time_zone.

I don't want to make it a public method, though I can, because the method is only for the datetime time config.

daniel-thom · 2026-02-08T20:07:56Z

tests/test_localize_timestamps_if_necessary.py

+    if isinstance(res_df, DataFrame):
+        res_df = res_df.toPandas()
+    assert sorted(res_df[TIME_COLUMN]) == sorted(called_df[TIME_COLUMN])
+    # target.assert_called_once()


Why are many of these lines commented out?

They work in local pytest but don't seem to work in the CI version. I'll remove them.

elainethale

Looking at this code brings two main questions to mind:

Is it possible to know what operations will be/are applied to bring one time convention in line with another?
Are we supporting all time dimension types (when reasonably compatible) as possible project time base dimensions and/or supplemental dimensions (are they queryable)?

elainethale · 2026-02-25T17:04:59Z

dsgrid/config/dimensions.py

    """Format of timestamps in a dataset is timezone-naive datetime,
-    requiring localization to time zones."""
+    timestamps can be localized to time zone(s)."""


Perhaps delete, "timestamps can be localized to time zone(s)". This model is just documenting what the time format of the dataset is, which is covered by the first line.

elainethale · 2026-02-25T17:10:24Z

dsgrid/config/dimensions.py

+    offset_column: str | None = Field(
+        title="offset_column",
+        description="Name of the offset column in the dataset. Value is the UTC offset in hours (e.g., -8 or -08:00). "
+        "If None, the offset will not be set.",
        default=None,
-        title="time_zone",
-        description="IANA time zone of the timestamps. Use None for time zone-naive timestamps.",
    )


Does this only support standard time, or can offset change over the year?

It seems like it would be best to support time_zone strings as well ... Can we provide guidance/support for data that includes daylight savings? I.e., be robust to repeated hours in fall and skipped hour in spring?

Timezone offset is very confusing. Can we provide an example that clarifies our sign convention whenever it is mentioned?
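
One way to pin down the sign convention asked about here is a small worked example. The sketch below assumes the ISO 8601 convention (local time = UTC + offset, so -8 means eight hours behind UTC); whether dsgrid adopts exactly this convention is an assumption, `parse_utc_offset` is a hypothetical helper rather than the PR's parser, and the -12 to +14 hour bound is likewise assumed.

```python
from datetime import datetime, timedelta, timezone


def parse_utc_offset(value):
    """Parse -8, "-8", or "-08:00" into a timedelta (hypothetical helper)."""
    if isinstance(value, (int, float)):
        offset = timedelta(hours=value)
    else:
        text = str(value).strip()
        sign = -1 if text.startswith("-") else 1
        hours, _, minutes = text.lstrip("+-").partition(":")
        offset = sign * timedelta(hours=float(hours), minutes=int(minutes or 0))
    # Assumed bound: real-world UTC offsets fall between -12:00 and +14:00.
    if not timedelta(hours=-12) <= offset <= timedelta(hours=14):
        raise ValueError(f"UTC offset out of range: {value!r}")
    return offset


# A tz-naive reading of midnight with offset -08:00 is 08:00 in UTC.
naive = datetime(2018, 1, 1, 0, 0)
aware = naive.replace(tzinfo=timezone(parse_utc_offset("-08:00")))
as_utc = aware.astimezone(timezone.utc)
```

Under this convention a negative offset lies west of Greenwich, and converting to UTC subtracts the offset from the local reading.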

elainethale · 2026-02-25T17:11:43Z

dsgrid/config/dimensions.py



 class AlignedTimeSingleTimeZone(DSGBaseModel):
    """For each geography, data has the same set of timestamps in absolute time.


Suggested change

"""All geographies have data with the same set of timestamps in absolute time.

elainethale · 2026-02-25T17:13:23Z

dsgrid/config/dimensions.py

    """For each geography, data has the same set of timestamps in absolute time.
    Timestamps in the data must be tz-aware.

    E.g., data in CA and NY both start in 2018-01-01 00:00 EST.


This is a great example. We should have examples like this everywhere, especially for enum fields and data models / data model fields.

elainethale · 2026-02-25T17:15:22Z

dsgrid/config/dimensions.py


 class AlignedTimeSingleTimeZone(DSGBaseModel):
    """For each geography, data has the same set of timestamps in absolute time.
    Timestamps in the data must be tz-aware.


This doesn't seem to be true any longer.

elainethale · 2026-02-25T19:49:30Z

dsgrid/common.py

 SYNC_EXCLUDE_LIST = ["*.DS_Store", "**/*.lock"]
 TIME_ZONE_COLUMN = "time_zone"
 VALUE_COLUMN = "value"
+TIME_COLUMN = "timestamp"


Are we standardizing to this now?

If so, can data submitters map other column names to this one if necessary? The same question applies to these other columns: "value" and "time_zone".

elainethale · 2026-02-25T19:56:35Z

dsgrid/utils/dataset.py

+    Time zone conversion converts from tz-aware timestamps to
+    tz-naive timestamps with the specified time zone as a new column.


Time zone "conversion" is a vague term. Is it always tz-aware to tz-naive + time_zone column throughout the dsgrid and chronify codebases, or is it overloaded?
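
To make the docstring's sense of "conversion" concrete, here is a minimal standard-library sketch of going from a tz-aware timestamp to a tz-naive timestamp plus a time zone column. The function name and the returned row layout are hypothetical, and a fixed offset stands in for the target zone; this is not the Chronify implementation.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-in for a target time zone.
EST = timezone(timedelta(hours=-5), name="EST")


def convert_to_naive_with_zone(ts, tz):
    """Shift a tz-aware timestamp into tz, drop tzinfo, and record the zone separately."""
    if ts.tzinfo is None:
        raise ValueError("conversion expects a tz-aware timestamp")
    local = ts.astimezone(tz)
    return {"timestamp": local.replace(tzinfo=None), "time_zone": tz.tzname(None)}


row = convert_to_naive_with_zone(datetime(2018, 1, 1, 5, 0, tzinfo=timezone.utc), EST)
```

In this reading, conversion is the inverse of localization: localization attaches a zone to a naive timestamp, while conversion moves the zone out of the timestamp and into its own column.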

elainethale · 2026-02-25T19:59:20Z

dsgrid/utils/dataset.py

+    time_zone: tzinfo | None,
+    scratch_dir_context: ScratchDirContext,
+) -> DataFrame:
+    """Create a single time zone-localized table with chronify and Spark and a Hive Metastore."""


Check that the doc strings for the spark methods repeat relevant information from the duckdb methods.

elainethale · 2026-02-25T20:02:41Z

dsgrid/utils/dataset.py

+    config: DatasetConfig,
+    scratch_dir_context: ScratchDirContext,
+) -> tuple[DataFrame, bool]:
+    """Localize tz-naive timestamps to time zone(s) in the dataframe if necessary using Chronify."""


Under what conditions is localization "necessary"?
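
Based on the PR description and the review discussion, the conditions might be summarized roughly as in the sketch below. `sketch_localization_plan` is a hypothetical stand-in, not the PR's `get_localization_plan`. The string `"localize_to_single_tz"` appears in the diff, but `"localize_to_multi_tz"` is an assumed name for the per-geography case.

```python
from enum import Enum


class TimeZoneFormat(Enum):
    """Subset of the enum values shown in this PR."""

    ALIGNED_IN_ABSOLUTE_TIME = "aligned_in_absolute_time"
    ALIGNED_IN_LOCAL_STD_TIME = "aligned_in_local_std_time"


def sketch_localization_plan(dtype, tz_format, time_zone):
    """Return a plan name, or None when no localization is needed."""
    if dtype != "TIMESTAMP_NTZ":
        # Timestamps are already tz-aware; nothing to localize.
        return None
    if tz_format is TimeZoneFormat.ALIGNED_IN_LOCAL_STD_TIME:
        # Zones come from the time zone column, one per geography (assumed plan name).
        return "localize_to_multi_tz"
    if time_zone is None:
        # Truly tz-naive data: the config carries no time zone to apply.
        return None
    return "localize_to_single_tz"


plan = sketch_localization_plan(
    "TIMESTAMP_NTZ", TimeZoneFormat.ALIGNED_IN_ABSOLUTE_TIME, "America/New_York"
)
```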

elainethale · 2026-02-25T20:15:47Z

dsgrid/config/date_time_dimension_config.py

+                    measurement_type=self._model.measurement_type,
+                    interval_type=self._model.time_interval_type,
+                )
+            case TimeZoneFormat.ALIGNED_IN_LOCAL_STD_TIME:


Should we also support local time data with local timestamps like America/Denver? _TZ type, "aligned in standard time", but timestamps are localized to clock time, including DST.

elainethale

Here are a couple of additional points raised by Claude that make sense to me and we might want to address:

If someone set an ANNUAL or REPRESENTATIVE_PERIOD [project] base dimension, there's no guard — it would fail only at dataset registration or query time with a confusing error. Consider adding a project-level validator or at least documenting that DATETIME is the expected base type.

AlignedTimeSingleTimeZone.time_zone is now str | None. When None, get_time_zones() returns []. This flows through to get_localization_plan() returning None (no localization). This enables true tz-naive datetime data, which is useful. But it also means ALIGNED_IN_ABSOLUTE_TIME with time_zone=None and TIMESTAMP_NTZ produces data with no timezone information anywhere — which may be confusing for downstream consumers. Worth documenting what this combination means.
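
A project-level guard of the kind suggested in the first point could look something like the sketch below. The enum class name and `check_project_base_time_type` are hypothetical, and the rule that only DATETIME is accepted is the assumption under discussion, not confirmed behavior.

```python
from enum import Enum


class TimeDimensionType(Enum):
    """Values copied from the diff in this PR; the class name is assumed."""

    DATETIME = "datetime"
    ANNUAL = "annual"
    REPRESENTATIVE_PERIOD = "representative_period"


def check_project_base_time_type(time_type):
    """Fail early with a clear message if the base time dimension is unsupported."""
    if time_type is not TimeDimensionType.DATETIME:
        raise ValueError(
            f"Project base time dimension must be 'datetime', got '{time_type.value}'. "
            "Other types would otherwise fail later at dataset registration or query time."
        )
    return time_type


checked = check_project_base_time_type(TimeDimensionType.DATETIME)
```

Raising at project registration would replace the confusing downstream error with one that names the unsupported type directly.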

elainethale · 2026-02-25T20:34:50Z

dsgrid/dimension/time.py

    DATETIME = "datetime"
    ANNUAL = "annual"
    REPRESENTATIVE_PERIOD = "representative_period"
    DATETIME_EXTERNAL_TZ = "datetime_external_tz"


Removed by this PR and needs to be deleted or marked for deprecation, right?

elainethale · 2026-02-25T20:36:15Z

dsgrid/dimension/time.py

    ALIGNED_IN_ABSOLUTE_TIME = "aligned_in_absolute_time"
-    ALIGNED_IN_CLOCK_TIME = "aligned_in_clock_time"
+    ALIGNED_IN_LOCAL_STD_TIME = "aligned_in_local_std_time"
    LOCAL_AS_STRINGS = "local_as_strings"


Is this used? Claude thinks this might cause a silent failure if a user selected it.

elainethale · 2026-02-25T20:42:06Z

dsgrid/dimension/time.py

    )


 class LeapDayAdjustmentType(DSGEnum):


This Adjustment class and the next two seem one-directional and/or ambiguous to me. Should we have separate "adjustment types" for with leap day -> without leap day and without leap day -> with leap day, and likewise for multiple types of DST conversions (standard timestamps <-> timestamps with DST, representative or indexed timestamps with different DST handling <-> standard timestamps or timestamps with DST)?

This is related to my comment about whether we want to explicitly support standard timezone types like America/Denver. Claude raised a similar question.

daniel-thom · 2026-02-25T21:32:04Z

dsgrid/config/date_time_dimension_config.py

+
+    def _get_chronify_dtype(self) -> chronify.TimeDataType:
+        match self.model.column_format.dtype:
+            case "TIMESTAMP_NTZ":


Having this string be all-caps is unfortunate. All strings in our config files are lowercase. Can we still change this in chronify?

lixiliu added 14 commits January 20, 2026 14:04

rename config params

59961ac

basic implementation

6715a32

update

fa550c1

update localization calls

a394e38

fix integration errors

9ab7759

update test

0ed68de

clean up

1aeefa2

remove time zone from column_format

f11a541

update submodule

e15b0d4

Revert "update submodule"

7de702b

This reverts commit e15b0d4.

remove stale features

9bee3fe

update

ade1d41

update offset parsing

e0272d7

fix offset parsing

b0f7f20

refactor tests

97c1e3a

lixiliu requested review from Copilot and daniel-thom January 24, 2026 00:13

lixiliu self-assigned this Jan 24, 2026

Copilot AI reviewed Jan 24, 2026

lixiliu added 10 commits January 23, 2026 19:02

add bound on offset

6c881be

fix bugs

d196d1d

make spark df

12b2e67

change spark session

df1ef42

update

3322bea

fix bugs

7698332

update localize test

fd5aa0d

fix bugs

15f2f59

update

fbb1435

make time_zone_column optional

a5c7854

last try

903b42a

lixiliu mentioned this pull request Jan 27, 2026

Time zone localization NatLabRockies/chronify#61

Merged

daniel-thom requested changes Feb 8, 2026

address PR comments

89c1876

lixiliu requested a review from daniel-thom February 16, 2026 10:28

daniel-thom added 2 commits February 19, 2026 07:13

Update submodules

d977b27

Update chronify branch

17ced0b

elainethale self-requested a review February 19, 2026 20:12

elainethale reviewed Feb 25, 2026

daniel-thom reviewed Feb 25, 2026
