-
Notifications
You must be signed in to change notification settings - Fork 412
Use bazel's downloader for Go modules when possible #2124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
752ad5e to
78f1456
Compare
| auth = use_netrc(read_user_netrc(ctx), ctx.attr.urls, ctx.attr.auth_patterns), | ||
| allow_fail = True, | ||
| ) | ||
| if result.success: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which cases do we expect this to fail? Should we print a warning if it does or fail instead and provide a way to opt into the fallback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe native go tooling will silently fallback when there are multiple comma-separated options in GOPROXY (as there are by default) so I was attempting to replicate that behavior. So for example GOPROXY=https://nosuchproxy.com,https://proxy.golang.org,direct should continue to work, as should GOPROXY=https://nosuchproxy.com,direct. I figured the easiest way to handle this edge case was to fallback to the full resolution chain via native tooling, even if it retries the first (failed) one.
Probably worth at least a comment, but happy to revise the approach if you have a better idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To give a taste of complexity here, it's important to only fall back to the next option if the return code is 404 or 410 specifically. The proxy should be able to deny fallback by returning another error code, like 403. That lets a proxy enforce usage policies without needing to serve all external modules.
The separators are , (above behavior) or | (fallback on any error including unavailability).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, good to know. That reaffirms my belief that if we go down this route, we should not try to replicate the full fallback logic, but defer to go mod download after the first failure, which is what this PR does.
Do you think there are cases where ctx.download_and_extract does NOT fail but we should still do a fallback?
internal/go_repository.bzl
Outdated
| if not pat: | ||
| return False | ||
| if pat.endswith("/..."): | ||
| return module_path.startswith(pat[:-4]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would let foo/bar/... match foo/barbaz.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, good catch. i'll add some tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually i'll remove this for now, given #2124 (comment)
internal/go_repository.bzl
Outdated
| if goproxy == "off": | ||
| return None | ||
|
|
||
| def matches_pattern(pattern, module_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be a top-level function? That does avoid some overhead.
internal/go_repository.bzl
Outdated
| return False | ||
| if pat.endswith("/..."): | ||
| return module_path.startswith(pat[:-4]) | ||
| if "*" in pat: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this occur in any position? The logic doesn't look quite right for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not only is this incorrect, but it's also insufficient. This would need to implement all of https://pkg.go.dev/path#Match to really be correct. I'm tempted to just avoid using this new codepath if you have anything in GOPRIVATE/GONOPROXY, I feel like trying to handle it correctly is really not worth it, at least for the first cut
78f1456 to
4c6288d
Compare
4c6288d to
ef58f55
Compare
ef58f55 to
ecd4203
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is safe, and I'd prefer we not go in this direction for a few reasons.
- At this point,
go_repositoryshould almost always be used in module mode, withversionandsumset, not withsha256. I don't think there's any need to support GOPATH mode at this point.sumin a different format thansha256: it's a hash of the uncompressed files and metadata, not the archive itself. Bazel doesn't know how to verify these hashes, so we can't usectx.download_and_extract. - There is a lot of complex logic in
go mod downloadthat's important for security and privacy. I don't think we should bypass that.
| ) | ||
| result = ctx.download_and_extract( | ||
| url = url, | ||
| sha256 = ctx.attr.sha256, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ctx.attr.sha256 is not set in module mode. can_use_direct_download is only set when ctx.attr.version is set (L231), but we verify sha256 is not set in L233.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah right, we should snip this line, good catch. Note that that sha256 was actually not the part ensuring correctness of download; we invoke the fetch_repo tool in -no-fetch mode so it still computes the dirhash and verifies that with the value from go.sum
The sha256 attribute was an oversight, but the actual correctness is verified via the There's still a question of what happens if the download is corrupted; the
Is this the same concern you mention above about the complex fallback behavior (which I think this PR sidesteps by only peeling off the first attempt into starlark) or are there other behaviors you're worried about? |
I didn't get that deep into the code: my concerns are more architectural (below).
Yes, other behaviors, too. I think Could you say more about the problem being solved? You mentioned Bazel's downloader config, but could you give more detail? I wonder if this is something that could be solved with a customized Go module proxy server. The proxy protocol is meant to be simple to implement: most of the complexity is on the client side ( You mentioned remote asset API too, but I don't think that will work as long as |
|
We should also take into account that |
Taking a step back, the problem we've observed is that there are times when Of course the same effect can be achieved by running our own module proxy server, but it's appealing to reuse the same solution that's already powering the download caching for packages for other languages (python, JS, java, etc.)
We haven't seen any more flakes in weeks since running with this patch, so I suspect that even though the entire repository rule execution cannot be cached, the download itself comes from the CAS and then gazelle still runs locally
Yeah perhaps some of the recent/upcoming changes can provide more holistic caching. I am all for being able to cache the state of the repository post-gazelle-run, as long as it correctly invalidates :) |
|
I think what's needed here is a service that acts as a Go module proxy but is actually a transparent pull through cache. It could store files in the CAS using the Asset API. I'm not sure such a thing exists yet, but it would be relatively simple to implement, and architecturally, it seems like the right thing for this situation. The rules shouldn't have to change for this. The main risk of using the CAS as storage is durability: files can get garbage collected after a while. But |
What type of PR is this?
Feature
What package or component does this PR mostly affect?
go_repository
What does this PR do? Why is it needed?
When GOPROXY is enabled, a module fetch reduces to an HTTP download from the proxy, which Bazel is capable of doing on its own. This lets the rules play more nicely with the downloader config, remote asset API, etc.
Which issues(s) does this PR fix?
Fixes #
Other notes for review