@Xaelp Xaelp commented Feb 5, 2025

This PR adds the self-hosted tool https://github.com/d-Rickyy-b/certstream-server-go, a Go implementation, as an additional option for querying CT logs.

The tool includes a WebSocket server from which it can efficiently stream parsed log entries. It has no storage option (yet).

@Xaelp Xaelp requested a review from a team as a code owner February 5, 2025 16:36
@Xaelp Xaelp requested a review from phbnf February 5, 2025 16:36

google-cla bot commented Feb 5, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

|[crt.sh](https://github.com/crtsh) |SQL |[yes](https://github.com/crtsh/ct_monitor/blob/174e0d8d4954dacd80eaf45dedd90061d7e7a6f4/ct/logList.go#L24) |[yes](https://github.com/crtsh/ct_monitor/blob/174e0d8d4954dacd80eaf45dedd90061d7e7a6f4/ct/getEntries.go#L77) |[static](https://github.com/crtsh/ct_monitor/blob/174e0d8d4954dacd80eaf45dedd90061d7e7a6f4/ct/logList.go#L75) |
|[CertStream](https://github.com/CaliDog/certstream-server?tab=readme-ov-file) |files (json), last 25 entries|[yes](https://github.com/CaliDog/certstream-server/blob/41c054704316f9ade21a0cc89db19d51e10469e6/lib/certstream/ct_watcher.ex#L165) |[no](https://github.com/CaliDog/certstream-server-python/blob/790718da384d3710e7842bd32b8367d2e142cc14/certstream/watcher.py#L143)|no |
|[certstream server go](https://github.com/d-Rickyy-b/certstream-server-go) | n/a | [yes](https://github.com/d-Rickyy-b/certstream-server-go/blob/22cc89fc7ea2994d4d2717e5dcc5ad17a444fee7/internal/certificatetransparency/ct-watcher.go#L233) | no | [yes](https://github.com/d-Rickyy-b/certstream-server-go/blob/22cc89fc7ea2994d4d2717e5dcc5ad17a444fee7/internal/certificatetransparency/ct-watcher.go#L230)|

Contributor

Is it possible for clients to bump parallelism to a value higher than 1?

Author

Yes. The current default is 1, and with a higher value it makes multiple queries to the same log in parallel. It can be adjusted here.

Underneath it uses this library.

Author

@Xaelp Xaelp Feb 6, 2025

I am preparing a setup to monitor all CT logs and store ALL unique certificates, to power a tool that helps fight phishing.

Of all the alternatives I compared, this library is the closest to what we need to do this efficiently.

We have some other requirements that will call for adjustments to it, but it's a great out-of-the-box self-hosted solution for fetching entries from logs.

Contributor

Very sorry for the delay, that fell off my radar.

The intent of the parallelism column is to tell folks running this code what features they can benefit from when they use the tool, without modifying the code. Given that changing parallelism requires changing the code, I'd rather set it to no. Are you planning on making it a flag and/or putting it in the config? That would make it easier for clients to control it.

If you use the fetcher library, then I think that we can say it supports dynamic indexes though?
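
As a rough sketch of what exposing parallelism as a flag could look like, using only Go's standard `flag` package: the flag name `parallel-fetch`, the `config` struct, and `parseConfig` are hypothetical and do not reflect how certstream-server-go is configured today.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config holds fetch settings a user can tune without editing the code.
type config struct {
	ParallelFetch int
}

// parseConfig reads settings from command-line style arguments.
// The flag name "parallel-fetch" is an assumption made for this sketch.
func parseConfig(args []string) (config, error) {
	fs := flag.NewFlagSet("ct-fetcher", flag.ContinueOnError)
	var c config
	fs.IntVar(&c.ParallelFetch, "parallel-fetch", 1, "concurrent requests per log")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	if c.ParallelFetch < 1 {
		return config{}, fmt.Errorf("parallel-fetch must be >= 1, got %d", c.ParallelFetch)
	}
	return c, nil
}

func main() {
	c, err := parseConfig(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("parallel fetch:", c.ParallelFetch)
}
```

Keeping the default at 1 would preserve the current behavior for anyone who does not pass the flag.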

Author

@Xaelp Xaelp Mar 17, 2025

Thanks for the feedback!

I am not a contributor to the https://github.com/d-Rickyy-b/certstream-server-go project, but I will open a PR there to make parallelism configurable, as that seems a logical improvement.

Regarding dynamic indexes, you're totally correct. It keeps track of the tree size reported in each log's responses so that it doesn't miss a certificate.

However, there is something I want to improve there (either by forking it or through a PR): it should also persistently cache each CT log's current tree size, so that after a restart it can continue from where it stopped.

Author

I've updated the dynamic index info. See https://github.com/Xaelp/certificate-transparency-community-site/blob/add-self-hosted-tools/docs/google/fetch-logs.md#self-hosted-tools.

Regarding parallelism, shall I change it to

| Tool | Storage | Parallelism | Dynamic indexes | Backoff |
|---|---|---|---|---|
| certstream server go | n/a | yes* (requires changing code) | yes | yes |

or just set it as

| Tool | Storage | Parallelism | Dynamic indexes | Backoff |
|---|---|---|---|---|
| certstream server go | n/a | no | yes | yes |

and update it later, once the configuration is implemented?

Contributor

phbnf commented Feb 6, 2025

Nice to see this!

@phbnf phbnf self-requested a review March 11, 2025 10:28
@Xaelp Xaelp force-pushed the add-self-hosted-tools branch from 849e44c to 383c180 Compare March 17, 2025 10:03