Skip to content

Conversation

BentsiLeviav
Copy link
Collaborator

Only Glue (AWS) and Databricks environments were tested and verified; the rest of the managed services were added based on online research and vendor documentation.

close #397

@BentsiLeviav BentsiLeviav requested a review from mzitnik October 13, 2025 23:41
Copy link

@windsurf-bot windsurf-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other comments (2)
  • clickhouse-core/src/main/scala/com/clickhouse/spark/client/NodeClient.scala (63-66) The parseVersionString method returns None when the version string has fewer than 3 parts, but the original code would return "Spark-ClickHouse-Connector" in this case. Consider handling this case explicitly to maintain the same behavior:
    private def parseVersionString(version: String): Option[(String, String, String)] = {
      val parts = version.split("_")
      if (parts.length < 3) None
      else Some((parts(0), parts(1), parts(2)))
    }
    
  • clickhouse-core/src/main/scala/com/clickhouse/spark/client/NodeClient.scala (78-79) Consider expanding the shouldInferRuntime method to handle more truthy values for better compatibility with different configuration styles:
    private def shouldInferRuntime(): Boolean = {
      val value = nodeSpec.infer_runtime_env.toLowerCase
      value == "true" || value == "1" || value == "yes" || value == "on"
    }
    

💡 To request another review, post a new comment with "/windsurf-review".

Comment on lines +323 to +336
def detectViaThreadNames(): Option[String] = {
val threadNames = Thread.getAllStackTraces.keySet().asScala.map(_.getName.toLowerCase)

if (threadNames.exists(_.contains("databricks"))) {
Some("Databricks")
} else if (threadNames.exists(t => t.contains("glue") || t.contains("awsglue"))) {
Some("Glue")
} else if (threadNames.exists(_.contains("emr"))) {
Some("EMR")
} else if (threadNames.exists(_.contains("dataproc"))) {
Some("Dataproc")
} else {
None
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The detectViaThreadNames() method doesn't check for Synapse in thread names, while the other detection methods do. Consider adding a check for Synapse thread names for consistency.

Comment on lines +323 to +336
def detectViaThreadNames(): Option[String] = {
val threadNames = Thread.getAllStackTraces.keySet().asScala.map(_.getName.toLowerCase)

if (threadNames.exists(_.contains("databricks"))) {
Some("Databricks")
} else if (threadNames.exists(t => t.contains("glue") || t.contains("awsglue"))) {
Some("Glue")
} else if (threadNames.exists(_.contains("emr"))) {
Some("EMR")
} else if (threadNames.exists(_.contains("dataproc"))) {
Some("Dataproc")
} else {
None
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The detectViaThreadNames() method doesn't check for HDInsight, but HDInsight is included in the detectViaClassLoader() method. Consider adding a check for HDInsight thread names for consistency.

Comment on lines +201 to +246
def detectViaStackTrace(): Option[String] = {
val stackTrace = Thread.currentThread().getStackTrace
val stackClasses = stackTrace.map(_.getClassName.toLowerCase)

// Check for platform-specific classes in stack
if (
stackClasses.exists(c =>
c.contains("com.databricks.logging") ||
c.contains("databricks.spark") ||
c.contains("com.databricks.backend")
)
) {
Some("Databricks")
} else if (
stackClasses.exists { c =>
c.contains("com.amazonaws.services.glue") ||
c.contains("aws.glue") ||
c.contains("awsglue")
}
) {
Some("Glue")
} else if (
stackClasses.exists(c =>
c.contains("com.amazon.emr") ||
c.contains("amazon.emrfs")
)
) {
Some("EMR")
} else if (
stackClasses.exists(c =>
c.contains("com.google.cloud.dataproc") ||
c.contains("dataproc")
)
) {
Some("Dataproc")
} else if (
stackClasses.exists(c =>
c.contains("com.microsoft.azure.synapse") ||
c.contains("synapse.spark")
)
) {
Some("Synapse")
} else {
None
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The detectViaStackTrace() method doesn't check for HDInsight, but HDInsight is included in the detectViaClassLoader() method. Consider adding a check for HDInsight in stack traces for consistency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add runtime environment info (e.g. Glue, Databricks) to User-Agent string

1 participant