Handler failure cases for clickhouse connector #8133

NamanMahor · 2025-10-15T18:12:44Z

I have made a few fixes for PLAT-236: Test ClickHouse connector for wrong user inputs

Added a TCP dial check to validate host and port reachability before connecting, providing a clear error message when either is incorrect.
Handled the "operation timed out" error.
Handled the EOF and handshake [21] errors that occur when ssl is false, returning a clearer message.
Prevented redundant retries by skipping the second connection attempt when the protocol is already HTTP.

Checklist:

Covered by tests
Ran it and it works as intended
Reviewed the diff before requesting a review
Checked for unhandled edge cases
Linked the issues it closes
Checked if the docs need to be updated. If so, create a separate Linear DOCS issue
Intend to cherry-pick into the release branch
I'm proud of this work!

begelundmuller · 2025-10-16T10:01:26Z

runtime/drivers/clickhouse/clickhouse.go

+	if conf.Host != "" && conf.Port != 0 {
+		target := net.JoinHostPort(conf.Host, fmt.Sprintf("%d", conf.Port))
+		conn, err := net.DialTimeout("tcp", target, 5*time.Second)
+		if err != nil {
+			return nil, fmt.Errorf("Error: %w - please check that the host and port are correct %s", err, target)
+		}
+		conn.Close()
+	}


I think this will add a redundant round-trip every time we open a Clickhouse handle. Can you look into ways where it only does extra checks/validation in failure scenarios (i.e. the non-happy path where the connection does not open successfully)? Ideally if the connector is configured correctly, we shouldn't be doing additional requests.

Agreed, it’s redundant, but it is quite fast (a few ms). If we lower the dial timeout to below 30s, we can move this check to the error path so it only runs on failures, or we may not need it at all. We could also lower the dial timeout below 30s only when the host has the suffix clickHouse.cloud, but I think in-house clusters can also scale to zero.

I don't think we can reduce the timeout. But I don't understand why ping uses the full dial timeout when the host is unreachable? It feels like there's a difference between a) the dial taking a long time, b) the dial failing fast and getting retried. Is it because the Clickhouse driver treats those as the same? If that's the case, should we consider patching the Clickhouse driver to have two different settings for this?

begelundmuller · 2025-10-16T10:03:45Z

runtime/drivers/clickhouse/clickhouse.go

-			return nil, err
+		// Detect SSL/TLS mismatch (common causes: "read: EOF" or TLS Alert [21])
+		if strings.Contains(err.Error(), "EOF") || strings.Contains(err.Error(), "[handshake] unexpected packet [21]") {
+			return nil, fmt.Errorf("Error: %w — this usually happens due to SSL/TLS mismatch", err)


nit: In error wrapping in general, the %w should come at the end of the error message. It also shouldn't add an explicit Error: prefix in the error message – that kind of formatting should be done by the UI/CLI when printing the error (internally, we don't know if the error may get wrapped some more times by other callers, which would make the injected Error: look like broken formatting). E.g.:

return nil, fmt.Errorf("handshake failed (this usually happens due to SSL/TLS mismatch): %w", err)

begelundmuller · 2025-10-29T20:46:24Z

runtime/drivers/clickhouse/clickhouse.go

 		conn, err := net.DialTimeout("tcp", target, 5*time.Second)
 		if err != nil {
-			return nil, fmt.Errorf("Error: %w - please check that the host and port are correct %s", err, target)
+			return nil, fmt.Errorf("please check that the host and port are correct %s: %w - ", target, err)


nit: there is a redundant " - " at the end of the error message

begelundmuller · 2025-10-29T20:50:32Z

runtime/drivers/clickhouse/clickhouse.go

+	if conf.Host != "" && conf.Port != 0 {
+		target := net.JoinHostPort(conf.Host, fmt.Sprintf("%d", conf.Port))
+		conn, err := net.DialTimeout("tcp", target, 5*time.Second)
+		if err != nil {
+			return nil, fmt.Errorf("Error: %w - please check that the host and port are correct %s", err, target)
+		}
+		conn.Close()
+	}


I don't think we can reduce the timeout. But I don't understand why ping uses the full dial timeout when the host is unreachable? It feels like there's a difference between a) the dial taking a long time, b) the dial failing fast and getting retried. Is it because the Clickhouse driver treats those as the same? If that's the case, should we consider patching the Clickhouse driver to have two different settings for this?

Fix for wrong host and port issue

5737b44

NamanMahor requested a review from begelundmuller October 15, 2025 18:12

NamanMahor changed the title ~~Fix for wrong host and port issue~~ Handler failure cases for clickhouse connector Oct 15, 2025

NamanMahor marked this pull request as ready for review October 15, 2025 19:03

NamanMahor added 2 commits October 16, 2025 01:56

ssl/tls error

20e48c5

change error message

7dbd1a9

NamanMahor self-assigned this Oct 16, 2025

begelundmuller requested changes Oct 16, 2025

review

8496c62

NamanMahor requested a review from begelundmuller October 22, 2025 12:52

begelundmuller reviewed Oct 29, 2025
