Skip to content

Conversation

jiayuasu
Copy link
Member

@jiayuasu jiayuasu commented Sep 7, 2025

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

Problem: DateTimeParseException: Text '2023-12-31T12:00:00.000' could not be parsed at index 23

Root Cause: Instant.parse() requires timezone information, but GeoPackage datetime fields often don't include it

Solution: Enhanced the epoch() method with fallback parsing that treats datetime strings without timezone as UTC

Modified:

  • spark/common/src/main/scala/org/apache/sedona/sql/datasources/geopackage/transform/DataTypesTransformations.scala
  • Enhanced: The epoch() method to handle multiple datetime formats
  • Added: Support for various millisecond formats (1, 2, 3 digits, or none)
  • Maintained: Full backward compatibility with timezone-aware formats

How was this patch tested?

Added 1 more test case and a hand-made geopackage file that contains problematic datetime data

Did this PR include necessary documentation updates?

  • No, this PR does not affect any public API so no need to change the documentation.

@jiayuasu jiayuasu added this to the sedona-1.8.0 milestone Sep 7, 2025
@jiayuasu jiayuasu requested a review from Copilot September 7, 2025 06:49
Copilot

This comment was marked as outdated.

@jiayuasu jiayuasu requested a review from Copilot September 7, 2025 07:22
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a DateTimeParseException in the GeoPackage reader when processing datetime fields that lack timezone information. The fix enhances the datetime parsing logic to handle multiple formats and treat timezone-less strings as UTC.

  • Enhanced the epoch() method with fallback parsing for datetime strings without timezone
  • Added support for various millisecond formats (1, 2, 3 digits, or none)
  • Added comprehensive test coverage for the datetime parsing fix

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.

File Description
DataTypesTransformations.scala Enhanced epoch() method with fallback parsing and support for multiple datetime formats
GeoPackageReaderTest.scala (3 versions) Added test case to verify datetime parsing without timezone information

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines 51 to 64
for (formatter <- datetimeFormatters) {
try {
val localDateTime = LocalDateTime.parse(timestampStr, formatter)
return localDateTime.toInstant(ZoneOffset.UTC).toEpochMilli
} catch {
case _: DateTimeParseException =>
// Continue to next formatter
}
}

// If all formatters failed, throw a descriptive exception
throw new IllegalArgumentException(
s"Unable to parse datetime: $timestampStr. " +
s"Expected formats: 'yyyy-MM-ddTHH:mm:ss[.S]' or 'yyyy-MM-ddTHH:mm:ss[.S]Z'")
Copy link
Preview

Copilot AI Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Using early return in a for loop is not optimal for functional programming. Consider using datetimeFormatters.view.map(...) with find or collectFirst to make the code more functional and potentially more efficient.

Suggested change
for (formatter <- datetimeFormatters) {
try {
val localDateTime = LocalDateTime.parse(timestampStr, formatter)
return localDateTime.toInstant(ZoneOffset.UTC).toEpochMilli
} catch {
case _: DateTimeParseException =>
// Continue to next formatter
}
}
// If all formatters failed, throw a descriptive exception
throw new IllegalArgumentException(
s"Unable to parse datetime: $timestampStr. " +
s"Expected formats: 'yyyy-MM-ddTHH:mm:ss[.S]' or 'yyyy-MM-ddTHH:mm:ss[.S]Z'")
datetimeFormatters.view
.map { formatter =>
try {
Some(LocalDateTime.parse(timestampStr, formatter).toInstant(ZoneOffset.UTC).toEpochMilli)
} catch {
case _: DateTimeParseException => None
}
}
.collectFirst { case Some(epochMilli) => epochMilli }
.getOrElse {
// If all formatters failed, throw a descriptive exception
throw new IllegalArgumentException(
s"Unable to parse datetime: $timestampStr. " +
s"Expected formats: 'yyyy-MM-ddTHH:mm:ss[.S]' or 'yyyy-MM-ddTHH:mm:ss[.S]Z'")
}

Copilot uses AI. Check for mistakes.

// If all formatters failed, throw a descriptive exception
throw new IllegalArgumentException(
s"Unable to parse datetime: $timestampStr. " +
s"Expected formats: 'yyyy-MM-ddTHH:mm:ss[.S]' or 'yyyy-MM-ddTHH:mm:ss[.S]Z'")
Copy link
Preview

Copilot AI Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message format examples don't accurately reflect all supported patterns. It shows '[.S]' but the code supports .S, .SS, and .SSS formats. Consider updating to: 'yyyy-MM-ddTHH:mm:ss[.SSS|.SS|.S]' or 'yyyy-MM-ddTHH:mm:ss[.SSS|.SS|.S]Z'

Suggested change
s"Expected formats: 'yyyy-MM-ddTHH:mm:ss[.S]' or 'yyyy-MM-ddTHH:mm:ss[.S]Z'")
s"Expected formats: 'yyyy-MM-ddTHH:mm:ss[.SSS|.SS|.S]' or 'yyyy-MM-ddTHH:mm:ss[.SSS|.SS|.S]Z'")

Copilot uses AI. Check for mistakes.

@jiayuasu jiayuasu merged commit 4b2b8a6 into master Sep 7, 2025
22 of 40 checks passed
@jiayuasu jiayuasu deleted the geopackage-bug branch September 8, 2025 07:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error Reading geopackage (DateTimeParseException) and Showing metadata
1 participant