@yakutovicha
Member

No description provided.

@yakutovicha yakutovicha marked this pull request as draft May 5, 2025 16:23
@edoardob90 edoardob90 changed the title from "Temporarily disable arm build." to "Temporarily disable arm build" May 5, 2025
@edoardob90
Member

@yakutovicha This seems to work on Renku, and pausing/resuming the session works as well. It seems that there are problems with multi-arch images, maybe something related to how K8s is set up?

I added the annotations according to this page: https://docs.docker.com/build/exporters/image-registry/

@rokroskar

Hi, I'm from the Renku team 👋

Great that you're able to get your image to work! However, I noticed it's rather large, it looks like over 11GB which means that sessions take a really long time to start. Is there any way you can reduce the image size? Are you adding data to the image or is it purely python packages?

@rokroskar

btw I believe to make multiarch images work you need to disable the provenance flag, e.g. like we do here
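A minimal sketch of what disabling provenance can look like when building with `docker buildx`. This is not the configuration linked in the comment above; the image reference and the helper name `buildx_command` are hypothetical, and the snippet only assembles the command line without running it. It relies on the `--provenance=false` and `--platform` options of `docker buildx build`.

```python
import shlex


def buildx_command(image_ref, platforms, context="."):
    """Return the argv for a multi-platform build with provenance attestations disabled."""
    return [
        "docker", "buildx", "build",
        "--platform", ",".join(platforms),  # e.g. linux/amd64,linux/arm64
        "--provenance=false",               # skip the provenance attestation
        "--tag", image_ref,
        "--push",
        context,
    ]


# Hypothetical image reference, used only for illustration.
cmd = buildx_command(
    "ghcr.io/example-org/example-image:latest",
    ["linux/amd64", "linux/arm64"],
)
print(shlex.join(cmd))
```

The printed command can be pasted into a shell or passed to `subprocess.run(cmd, check=True)` on a machine where `buildx` is available.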

@yakutovicha
Member Author

Hey @rokroskar, thanks for reaching out!

> Great that you're able to get your image to work!

Unfortunately, the image doesn't work on my account. It stays like this and never proceeds further (not sure why 🤷):

image

> However, I noticed it's rather large, it looks like over 11GB which means that sessions take a really long time to start.

On my PC the image is 3.47 GB, see below:

image

I feel a bit stuck here, any help would be greatly appreciated 🙏

@rokroskar

I see this in our logs

Successfully pulled image "ghcr.io/empa-scientific-it/python-tutorial:35139ae5b38f" in 560ms (560ms including waiting). Image size: 11587224791 bytes.

Your docker client might be reporting the compressed size? not sure.

Unfortunately very large images can take a really long time to download and to start. I see torch installed - this can easily lead to very large images, especially if it is installed potentially multiple times. Could this be part of what is causing the large size?
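For reference, the size in the log line above can be converted to more readable units. This is only a small sketch of the arithmetic; the helper name `human_sizes` is made up for illustration.

```python
def human_sizes(num_bytes):
    """Convert a byte count to decimal gigabytes (GB) and binary gibibytes (GiB)."""
    return {"GB": num_bytes / 10**9, "GiB": num_bytes / 2**30}


logged = 11587224791  # "Image size" value from the log line above
sizes = human_sizes(logged)
print(f"{sizes['GB']:.2f} GB, {sizes['GiB']:.2f} GiB")  # prints "11.59 GB, 10.79 GiB"
```

Either way, the logged image is well over 10 GB, which matches the "over 11GB" estimate given earlier in the thread.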

@edoardob90
Member

> Hi, I'm from the Renku team 👋
>
> Great that you're able to get your image to work! However, I noticed it's rather large, it looks like over 11GB which means that sessions take a really long time to start. Is there any way you can reduce the image size? Are you adding data to the image or is it purely python packages?

Hi @rokroskar! We're not even adding the content of the repository to the image. The base image is quay.io/jupyter/minimal-notebook:latest (which is about 1.6 GB locally), and then we're simply adding some dependencies via apt, plus updating the base environment. I suspect it's that last step that bloats the image somehow, but I don't know how it could get that big.

Is there any way to know about the image's actual size Renku is going to pull?
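One way to estimate the download size, sketched here under assumptions: a single-platform image manifest lists the compressed size of the config blob and of every layer, so summing those gives roughly what a client has to pull. The manifest JSON can be obtained with, for example, `docker manifest inspect <image>`. The sample manifest and the helper name `compressed_size` below are invented for illustration, and whether Renku's logs report this number or the unpacked size is not established in this thread.

```python
import json


def compressed_size(manifest):
    """Sum the config and layer blob sizes (in bytes) of a single-platform image manifest."""
    if "layers" not in manifest:
        # A multi-arch image index lists per-platform manifests instead of layers.
        raise ValueError("expected a single-platform image manifest, not an image index")
    return manifest["config"]["size"] + sum(layer["size"] for layer in manifest["layers"])


# Hypothetical manifest with made-up sizes, standing in for real CLI output.
sample = json.loads("""
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {"size": 7000},
  "layers": [
    {"size": 30000000},
    {"size": 1200000000},
    {"size": 450000000}
  ]
}
""")

total = compressed_size(sample)
print(f"{total / 10**9:.2f} GB compressed")
```

For a multi-arch image, the top-level document is an index, so the per-platform manifest has to be fetched first before its layers can be summed.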

@edoardob90
Member

> I see this in our logs
>
> Successfully pulled image "ghcr.io/empa-scientific-it/python-tutorial:35139ae5b38f" in 560ms (560ms including waiting). Image size: 11587224791 bytes.
>
> Your docker client might be reporting the compressed size? not sure.
>
> Unfortunately very large images can take a really long time to download and to start. I see torch installed - this can easily lead to very large images, especially if it is installed potentially multiple times. Could this be part of what is causing the large size?

Ah, well, PyTorch might easily be responsible here. Honestly, I don't know of a workaround other than trying another base image (or a combination?) where PyTorch is already installed and maybe optimized.

@rokroskar

I was more wondering if the different torch-enabled packages are maybe adding their own versions of cuda? That is what contributes to torch bloat.
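A rough way to check this from inside the environment is to rank the top-level directories of `site-packages` by disk usage. The sketch below uses only the standard library; the function names are invented for illustration. Bundled CUDA libraries often show up under directories such as `torch` or `nvidia`, though that depends on how the packages were installed.

```python
import os


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):  # skip symlinks to avoid double counting
                total += os.path.getsize(file_path)
    return total


def largest_packages(site_packages, top=10):
    """Return up to `top` (directory name, size in bytes) pairs, largest first."""
    sizes = {}
    for entry in os.scandir(site_packages):
        if entry.is_dir(follow_symlinks=False):
            sizes[entry.name] = dir_size(entry.path)
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:top]
```

Calling `largest_packages(sysconfig.get_paths()["purelib"])` inside the image (after `import sysconfig`) lists the heaviest packages of the active environment. Several multi-gigabyte entries would support the duplicated-CUDA hypothesis.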

@yakutovicha
Member Author

> I see this in our logs
>
> Successfully pulled image "ghcr.io/empa-scientific-it/python-tutorial:35139ae5b38f" in 560ms (560ms including waiting). Image size: 11587224791 bytes.
>
> Your docker client might be reporting the compressed size? not sure.

That image was built with repo2docker; we changed the approach recently. The image from this PR (ghcr.io/empa-scientific-it/python-tutorial:pr-296) should be much smaller, but it somehow still fails to start.

@olevski

olevski commented May 6, 2025

dive is a tool that lets you analyze Docker images. Specifically, it can tell you how large each layer in your image is, and you can then use that information to optimize. It also has some tests to determine how "efficient" the image is, although I am not sure what metrics/heuristics it uses for that.

https://github.com/wagoodman/dive

@olevski

olevski commented May 6, 2025

@yakutovicha I am one of the Renku developers. Can you share the project where you are trying to run the failing image? If that is not possible, can you share the session launcher configuration you are using to launch that image?

@yakutovicha
Member Author

@olevski, thanks, yes sure. The project is https://renkulab.io/v2/projects/empa-scientific-it/empa-it-python-tutorial

Just a heads-up: I was wrong about it failing. It took about 20 minutes to start the first time, but then it worked.

Regarding the image size, @edoardob90 will explore the options in a separate PR. Thanks a lot for the suggestions 🙏

@rokroskar

Maybe another data point - I made a fork of this repo and moved the environment.yml file to the root directory - then I used renku to build an image for it automatically. You can see it in the python tutorial launcher here. This makes a slightly smaller image (7GB) and launches fine - the limitation (for now) is that it uses vscode, but this will be changed soon to also support jupyter. Still takes ~5 minutes to launch, but might be a workable option? Feel free to try it out to see if all the notebooks work as expected (I ran a few and they seem ok).

@olevski

olevski commented May 6, 2025

> @olevski, thanks, yes sure. The project is https://renkulab.io/v2/projects/empa-scientific-it/empa-it-python-tutorial

OK, if the session eventually starts, then that is fine. I thought it was misconfigured and truly failing. What you describe is definitely a case of the image being really big.

What @rokroskar suggested is a really viable option, though, and it would save you some time building and publishing the image. When you let Renku build the image, we also store it in our own image repository, which is much faster to access. Also, I think ghcr and Docker Hub start to throttle you after they see you keep pulling images. This cannot happen when we build and host the image.

Currently, when you let Renku build the image for you, we only support VSCodium. But in the new release (coming out in about a week from now) we will also support JupyterLab.

@edoardob90
Member

> Maybe another data point - I made a fork of this repo and moved the environment.yml file to the root directory - then I used renku to build an image for it automatically. You can see it in the python tutorial launcher here. This makes a slightly smaller image (7GB) and launches fine - the limitation (for now) is that it uses vscode, but this will be changed soon to also support jupyter. Still takes ~5 minutes to launch, but might be a workable option? Feel free to try it out to see if all the notebooks work as expected (I ran a few and they seem ok).

The only problem with using VS Code: some features of the interactive exercises rely on Jupyter+pytest, and they don't work properly when the notebooks are opened directly in VS Code (or Codium).

@yakutovicha
Member Author

superseded by #298

@yakutovicha yakutovicha closed this May 6, 2025
@yakutovicha yakutovicha deleted the workaround/disable-arm-build branch May 6, 2025 12:15