Conversation


@reingart reingart commented Jan 11, 2025

Fixes: #6732 #9197
Related: #8172 (inspired by by a prior ebekebe fork, with a more general approach)

Description

docker buildx is an enhanced builder using BuildKit that can replace the traditional docker build command with almost the same syntax (transparently).

This adds several distributed features and advanced capabilities:

  • local & remote BuildKit builders, running standalone, in Docker, or in Kubernetes clusters
  • improved cache support (registry destination, pushing full metadata and multi-stage layers)
  • improved multi-platform image building support (e.g. x86_64 and arm64 combined, with emulation or cross-compilation)

These features are very useful for corporate CI/CD, for example when using GitLab, where the use case requires:

  • using an ephemeral remote BuildKit instance (rootless)
  • using a remote docker registry for cache
  • using a different cache destination tag

The remote BuildKit support is useful in cases where a privileged docker daemon is not possible or desirable due to security policies (e.g. untrusted images or review pipelines).
Daemon-less mode is also useful to offload container building from developers' notebooks, sharing and reusing remote caches more effectively.

The cache-tag is useful to point to latest or a generic cache tag, instead of the one generated for this build (via tagPolicy, which would be useless in most distributed cases, as it can be invalidated by minor changes, especially when using inputDigest).
Having it as an option allows different branches to use different cache destinations (e.g. production vs. development cache).

User facing changes

To use BuildKit transparently, BuildX should be installed as the default builder (this creates an alias in the docker config) and a remote instance created:

docker buildx install
docker buildx create --name remote --driver remote tcp://buildkitd:2375

Then Skaffold should be configured to detect buildx and set a default cache tag:

skaffold config set -g buildx-builder remote
skaffold config set -g cache-tag cache

Optionally, in skaffold.yaml the cache source and destination can be configured with cacheFrom and cacheTo, using the image name:

build:
  artifacts:
  - image: my-app-image
    context: my-app-image
    docker:
      dockerfile: Dockerfile
      cacheFrom:
      - "my-app-image"
  tagPolicy:
    inputDigest: {}
  local:
    useBuildkit: true
    useDockerCLI: true
    tryImportMissing: true

  • If no tag is specified for the cache, the configured cache-tag will be used (in this example my-app-image:cache)
  • If no cache-to destination is specified, the cache-from image name and tag will be adjusted, adding type=registry,mode=max (only if pushing images is enabled)

This cache behavior is intended to provide a transparent user experience, with functionality similar to traditional local docker builds, without additional boilerplate or different configuration/patches for a CI workflow.

Finally, the user can run skaffold dev or skaffold build as usual.

Follow-up Work

Future work: #2110 to include more BuildKit advanced features support like multiple build contexts, for an improved mono-repo experience for large code-bases.

TODO:

  • Design document
  • Unit Tests
  • Integration tests

Changes summary:

  • adds a new global config buildx-builder to enable buildx detection and support for different builders
  • implements logic to detect whether buildx is the default builder (via the docker config alias)
  • removes the dependency on the docker daemon when using BuildX (loading images is still supported when using minikube or similar)
  • adds a new global config cache-tag to override default cache tagging (instead of the generated tag)
  • adds cacheTo to the skaffold schema for the cache destination (optional; the default is to use the new cacheFrom + cache tag)
  • adds --cache-to support to the Docker CLI build when using the BuildX CLI
  • adds multi-platform image building support for BuildKit under buildx

All changes are backward compatible: if the developer is not using buildx, the traditional docker builder will be used.

Signed-off-by: [email protected]

…Tools#6732)

* detect-buildx global config option for backward compatibility
* cache-tag global config option to customize cache destination
* new CacheTo in DockerArtifact in configuration yaml (for docker build --cache-to)

* export LoadDockerConfig to read ~/.docker/config.json for buildx detection
* fix avoid loading image via buildx if no docker daemon is accessible
* fix remote lookup / import missing in buildx workaround
* fix image import if no docker daemon is available (under buildx)
* adjust cache reference preserving tag and default cacheTo if not given
* parse buildx metadata to extract ImageID digest

Initially based on ebekebe's GoogleContainerTools#8172 patch
ebekebe@1c1fdeb

Signed-off-by: [email protected]
@reingart reingart requested a review from a team as a code owner January 11, 2025 21:07
@reingart reingart requested a review from Darien-Lin January 11, 2025 21:07

google-cla bot commented Jan 11, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@ghost ghost left a comment

Hi team,

I'm currently reviewing this pull request and will provide feedback in a few minutes. In the meantime, here's a summary of the changes for you and other reviewers to quickly get up to speed:

This pull request, feat: transparent buildx support for remote buildkit (#6732), aims to improve Skaffold's build process by adding transparent support for docker buildx. The primary goal is to enable users to leverage the advanced features of buildx, such as remote buildkit builders and improved caching, without significant changes to their existing Skaffold configurations.

Here's a breakdown of the changes:

  • New Global Configs:
    • detect-buildx: Enables automatic detection of buildx as the default builder.
    • cache-tag: Allows overriding the default cache tagging strategy, useful for managing caches across different branches or environments.
  • Build Process Enhancements:
    • The build process now intelligently detects and utilizes buildx if available and configured.
    • Reduced dependency on the local Docker daemon when using buildx, enhancing security and flexibility.
    • Added cacheTo support in the Skaffold schema to specify custom cache destinations.
    • Added --cache-to support to the Docker CLI build process when using buildx.
  • Backward Compatibility: All changes are designed to be backward compatible. If buildx is not detected or used, Skaffold will fall back to the traditional Docker builder.

Key File Changes:
The majority of changes are focused within the pkg/skaffold/build directory, specifically affecting the cache and Docker build logic. Significant modifications were made to cache.go, lookup.go, and docker.go to handle buildx integration and caching strategies. The schema (config.go) was also updated to include the new cacheTo field and detect-buildx option.

Note: The pull request mentions follow-up work to support additional advanced buildkit features in the future. Unit and integration tests are also listed as TODO items.

I hope this summary is helpful. I'll post my detailed review shortly.


And now, a little haiku to lighten the mood:

BuildKit's swift flight,
Remote caches, smooth and fast,
Code flows like a stream.

@ghost ghost left a comment

This PR introduces transparent buildx support, which is a valuable addition. The implementation looks good overall and addresses the issues raised in #6732 and #9197. I have a few suggestions for improvement.

for more info see https://github.com/orgs/community/discussions/45969

tl;dr buildx attestations include additional metadata that confuses registry tools (breaking the integration test)

```
docker buildx imagetools inspect localhost:5000/my-app:v2.11.1-120-g8adcfa106-dirty@sha256:...
```
```
Name:      localhost:5000/my-app:v2.11.1-120-g8adcfa106-dirty@sha256:...
MediaType: application/vnd.oci.image.index.v1+json
Digest:    sha256:...

Manifests:
  Name:        localhost:5000/my-app:v2.11.1-120-g8adcfa106-dirty@sha256:...
  MediaType:   application/vnd.oci.image.manifest.v1+json
  Platform:    linux/arm64

  Name:        localhost:5000/my-app:v2.11.1-120-g8adcfa106-dirty@sha256:...
  MediaType:   application/vnd.oci.image.manifest.v1+json
  Platform:    linux/amd64

  Name:        localhost:5000/my-app:v2.11.1-120-g8adcfa106-dirty@sha256:...
  MediaType:   application/vnd.oci.image.manifest.v1+json
  Platform:    unknown/unknown
  Annotations:
    vnd.docker.reference.digest: sha256:...
    vnd.docker.reference.type:   attestation-manifest

  Name:        localhost:5000/my-app:v2.11.1-120-g8adcfa106-dirty@sha256:...
  MediaType:   application/vnd.oci.image.manifest.v1+json
  Platform:    unknown/unknown
  Annotations:
    vnd.docker.reference.digest: sha256:...
    vnd.docker.reference.type:   attestation-manifest
```

Using `BUILDX_NO_DEFAULT_ATTESTATIONS=1` fixes the issue, removing the unknown platforms:
```
Name:      localhost:5000/my-app:v2.11.1-120-g8adcfa106-dirty@sha256:...
MediaType: application/vnd.docker.distribution.manifest.list.v2+json
Digest:    sha256:...

Manifests:
  Name:      localhost:5000/my-app:v2.11.1-120-g8adcfa106-dirty@sha256:...
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/arm64

  Name:      localhost:5000/my-app:v2.11.1-120-g8adcfa106-dirty@sha256:...
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/amd64
```
@reingart
Author

reingart commented Mar 1, 2025

Hi team! ping :-)

I've updated the new schema changes, following the new version 23 generated by @menahyouyeah in #9741, so now the schema test passes.

In my personal fork all checks are passing: https://github.com/reingart/skaffold/pull/1/checks including:

  • New unit test cases fully covering buildx local and remote scenarios, including detection and new cache-to.
  • New integration test cases fully covering basic buildx image building, and advanced ones for amd64/arm64 architectures and multi-platform images.
    Integration tests run with a buildx buildkit builder and a registry deployed in containers, so no additional changes to the testing infra are required (they can run locally and do not use GCP or any other special cloud infra)

Also, I've written a tentative design proposal: buildx-transparent-support.md, trying to address requests in related PRs like #8172 (comment) by @gsquared94 regarding multiple buildx instances, actionable errors and multi-platform images.

Let me know if you need anything else to evaluate the PR; any recommendation is welcome.

@mikegore1000

mikegore1000 commented Apr 11, 2025

Hi,

I'm currently looking to use buildx with Skaffold. One of the use cases I have is being able to authenticate with a package feed during the build process.

I have spiked out an approach using custom build scripts so I can use buildx commands. One of the features I've used is Docker secret mounts as this allows me to pass in the credentials in a way that doesn't invalidate the Docker cache if a secret changes.

Our build process passes a unique secret per run to authenticate with the package feed, so without this support we effectively have to rebuild the image from scratch every single time which adds minutes onto our builds.

Would it be possible to add support for build secrets as part of this PR?

Contributor

@snickell snickell left a comment

This PR is a huge leap forward, we (code.org) are evaluating skaffold, and our primary reservation in selecting skaffold is the complexity of maintaining our own custom builder scripts to use buildx. I hope the skaffold team has the bandwidth to work with this PR all the way to merge.

My primary question, which may best be addressed in a follow-on PR in order to build momentum: how can we use multiple buildx instances in order to build with native builders, and then stitch them into a multi-platform image?

"type": "string"
},
"type": "array",
"description": "the Docker images used as cache destination. If omitted, cacheFrom is used with max mode to export all layers.",
Contributor

Not terribly important, since most people playing with this (today at least) will already know what buildx --cache-to accepts, but I wouldn't mind a link to the docker docs here.

Author

Ok, will add a link to the docs: https://docs.docker.com/reference/cli/docker/buildx/build/#cache-to (I see ko and other builders have similar links)

Since `docker build` and `docker buildx build` essentially share the same options, no major modifications are needed to Skaffold for backward compatibility.

This proposal implements the logic to detect if buildx is the default builder (looking for an alias in the docker config).
To [set buildx as the default builder](https://github.com/docker/buildx?tab=readme-ov-file#set-buildx-as-the-default-builder), the command `docker buildx install` should be used.
Contributor

Not a must-have by any means, but it'd be nice to have an explicit way at the skaffold.yml level, like useBuildkit: true, to specify that buildx should be used, and therefore error out if it's not available.

Author

A more actionable error would be great if buildx is not installed properly (that is included in the design doc for further phases), but I explicitly would like to avoid a configuration in skaffold.yaml because:

  • users who don't have buildx installed can still build
  • no modification of skaffold.yaml is needed to switch between CI and local envs

This way, buildx support is transparent and the user can decide whether to use it, just by setting the config parameter with skaffold config set buildx-builder ...

Yes, some advanced features won't work (like exporting the cache), but this is not critical for local development (and won't work anyway if proper registry push credentials are not given).

Contributor

That thinking makes sense to me, I do like your idea of providing a graceful "fallback". That said, in the case of very big projects, if cache-from in particular isn't working, I'd personally prefer my "app developers" to get an error that they really need to configure buildx, or at the very least a noisy warning. Many of them will silently not configure buildx, and will later complain that builds take 30 minutes when, if cache-from were working, they'd take about 5 minutes.

skaffold config setting the builder system-wide rather than per-project feels like it has small issues in some cases: for example, in my case I will have one project where I need multi-platform, and another where I don't. But I agree that hardcoding the builder names in skaffold.yaml would be much worse.

Then, additional buildx features are available, like multi-platform support and different cache destinations.

Cache adjustment is extended via the `cache-tag`, useful to point to latest or a generic cache tag, instead of the one generated for this build
(via `tagPolicy`, which would be useless in most distributed cases, as it can be invalidated by minor changes, especially when using `inputDigest`).
Contributor

This is a great feature, appreciate the thoughtfulness here.

docker buildx install
```

Then Skaffold should be configured to detect buildx (default builder) and set a generic cache tag:
Contributor

I believe, per below in the doc, these are optional. If my understanding is correct, it'd be great if this doc lists these as "can be" rather than "should be".

if !b.pushImages {
// All of the builders will rely on a local Docker:
if !b.buildx && !b.pushImages {
// Most builders will rely on a local Docker (except when using a remote buildkit via buildx):
Contributor

How does this relate to using a local buildkit? Will these have to basically dump to local docker?

Author

If there is a local docker daemon, it should have a "default" buildkit builder.
If I understand correctly, it is currently used when the skaffold.yaml has useBuildkit: true.
It can be used with buildx too, by executing skaffold config set buildx-builder default.
Let me know if I understood the question correctly or if more info is needed.

if platforms.IsMultiPlatform() && !SupportsMultiPlatformBuild(*artifact) {
// buildx creates multiplatform images via buildkit directly
if platforms.IsMultiPlatform() && !SupportsMultiPlatformBuild(*artifact) && !b.buildx {
built, err = CreateMultiPlatformImage(ctx, out, artifact, tag, platforms, artifactBuilder)
Contributor

I appreciate the simplicity of this approach, and this is really a curiosity more than a comment. Is there a reason not to instead extend CreateMultiPlatformImage to run individual buildx instances for each platform, and then stitch them into a single manifest using the built-in skaffold stitching?

In particular, in my use case, I would want to run (via GH actions) separate /native/ builders (our build is too slow on arm to want to use qemu), define them as multiple remotes, and then stitch them in the main skaffold build command.

Author

Using the native buildkit multi-platform support can offload the build parallelization and image stitching from skaffold, and it supports more strategies:
https://github.com/docker/buildx?tab=readme-ov-file#building-multi-platform-images

  1. Using the QEMU emulation support in the kernel
  2. Building on multiple native nodes using the same builder instance
  3. Using a stage in Dockerfile to cross-compile to different architectures

I addressed using native nodes in more detail in my other reply; I think it should be possible and match your use case without needing separate builds or stitching: #9648 (comment)

Contributor

That makes great sense to me, thanks for explaining, and it's really cool that buildx has built-in support for "fanning out" to native builders. Somehow I had never noticed that despite using buildx for a couple of years.

@reingart
Author

reingart commented Apr 16, 2025

Hi @mikegore1000, thanks for the comments!

I have spiked out an approach using custom build scripts so I can use buildx commands. One of the features I've used is Docker secret mounts as this allows me to pass in the credentials in a way that doesn't invalidate the Docker cache if a secret changes.

Yes, I use that too, including SSH key forwarding.
Besides cache invalidation, it also helps to avoid leaking secrets into the built image.

If you can share the custom script and skaffold config, I could take a look and tell you how to adapt it to use this PR.

Our build process passes a unique secret per run to authenticate with the package feed, so without this support we effectively have to rebuild the image from scratch every single time which adds minutes onto our builds.

Would it be possible to add support for build secrets as part of this PR?

AFAIK build secrets are already supported by BuildKit in skaffold (natively), and this PR supports them even with remote builders (both docker secrets and ssh mounts).
Other BuildKit features also work remotely with this PR, like cache mounts, multi-stage builds and cache import/export (some will be limited by the type of builder and shared storage; for example, only a registry can be used as cache destination if the builder runs in containers, IIRC).

I will try to implement a more advanced example with these features (secrets and ssh mounts + caching), in a future stage once the code stabilizes (proposed in the design document).

@reingart
Author

reingart commented Apr 16, 2025

Hi @snickell , thanks for the comments and review!

My primary question, which may best be addressed in a follow-on PR in order to build momentum: how can we use multiple buildx instances in order to build with native builders, and then stitch them into a multi-platform image?

The current PR already supports multi-platform builds without needing to stitch different images (see the "build multiplatform images with buildx" integration test for both amd64 & arm64).
In fact, the individual build pipeline and stitching were modified to not be used with buildx, and the integration test uses emulation.

I think this could even work for native builders; according to the buildx documentation it could be implemented just by adding the nodes:

Using multiple native nodes provide better support for more complicated cases that are not handled by QEMU and generally have better performance. You can add additional nodes to the builder instance using the --append flag.

$ docker buildx create --use --name mybuild node-amd64
$ docker buildx create --append --name mybuild node-arm64
$ docker buildx build --platform linux/amd64,linux/arm64 .

I haven't tested this scenario, but it should work, as it is just buildx configuration outside skaffold.

@snickell
Contributor

Thank you @reingart for linking me to the use of "native build nodes" from buildx. Really cool. I'll do a test of using this from within skaffold (using this branch) via gh actions.

In case it wasn't clear to everyone: I really support this PR, and for our use case it's a very good fit. I would love to see this PR merged as-is.

@mikegore1000

Hi @reingart,

I'm pretty new to Skaffold so must have missed the built-in secret support. That plus your PR means we can do everything without any build scripts and it's clear how I'd go about setting that up.

Thanks for the help. Really looking forward to this getting merged in!

@rajatvig

Really looking forward to having this change merged in.

@rajatvig

@reingart Are you still working on this PR?

@reingart
Author

@rajatvig yes, mostly testing it in development.
I will try to set up a fork to provide a compiled release to ease reviews/QA and keep it in sync (and probably to include other minor improvements).
Let me know if you have any questions (sorry for the late reply, I was on vacation)

reingart added a commit to reingart/skaffold that referenced this pull request Aug 14, 2025
Personal fork to test "transparent buildx support for remote buildkit"

GoogleContainerTools#9648
Successfully merging this pull request may close these issues.

Support building securely against remote buildkitd