
Conversation


@Evaan2001 Evaan2001 commented May 5, 2025

Description

This PR:

  1. Creates a web page for my blog on fine-tuning an LLM on CPUs using Bazel and NativeLink
  2. Adds some new words to the Vale vocabulary
  3. Makes a minor formatting change to `nativelink/web/platform/src/content/docs/docs/config/production-config.mdx` that is also present in PR Add AI Customer Service Blog Post #1743. That formatting change needs to be merged into production before I can create a new blog post, and since PR #1743 will probably get merged later, I committed the change in this PR as well so that all the CI tests pass.

Type of change

  • Documentation update

How Has This Been Tested?

The content and design of this web page are heavily inspired by the web page I made for my full AI Customer Service Agent blog post, which has already received LGTMs from Marcus and Aaron but has not yet been deployed to production. Once development was complete, I examined the page via `rm -rd dist && bun preview`; everything looked as expected and no errors were thrown.


This change is Reviewable


@MarcusSorealheis MarcusSorealheis left a comment


I'd love to see an explanation of how work is distributed or accelerated by NativeLink. My guess is the scheduler, but you should make that clear to the reader in a subsequent iteration. For now, you can ship this one.

Reviewable status: 0 of 2 LGTMs obtained, and 0 of 3 files reviewed, and 1 discussions need to be resolved


@MarcusSorealheis MarcusSorealheis left a comment


:lgtm: keep the crazy ideas coming.

Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: 1 of 2 LGTMs obtained, and 2 of 3 files reviewed, and 1 discussions need to be resolved

@MarcusSorealheis MarcusSorealheis enabled auto-merge (squash) May 5, 2025 14:11

@aaronmondal aaronmondal left a comment


Reviewed 1 of 3 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: 1 of 2 LGTMs obtained, and all files reviewed, and 14 discussions need to be resolved


web/platform/src/content/posts/Finetune_LLM_On_CPU.md line 37 at r2 (raw file):

1\. Create a directory called `finetune-repo` and open your IDE from within it. Also create a `setup.sh` script. Alternatively, you can use the following shell commands
<div style="max-height: 250px; max-width: 50vw; overflow: auto;">

All of these setup instructions seem redundant. We can cut down on reading and setup time significantly if we create a repo for this that the user can clone. Then the setup becomes a single `git clone github:TraceMachina/finetune-repo`.


web/platform/src/content/posts/Finetune_LLM_On_CPU.md line 132 at r2 (raw file):

#
# This file is autogenerated by pip-compile via the following command:
#    pip-compile --output-file=bazel_requirements_lock.txt requirements.txt

If you use pip-compile, run it with hash generation enabled (`--generate-hashes`). The better approach is to use `uv`, which is about 7x faster than pip: specify the versions in `pyproject.toml` and create the lockfile with `uv lock`.

See also: bazel-contrib/rules_python#1975
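
A sketch of the suggested workflow, assuming the project's dependencies are declared in a `pyproject.toml` (the output filename matches the one the post uses; a reasonably recent `uv` is needed for the `export` subcommand):

```bash
# Resolve all dependencies and write uv.lock (hashes are included by default):
uv lock

# Export a pip-compatible lockfile that rules_python's pip extension can consume:
uv export --format requirements-txt --output-file requirements.lock
```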


web/platform/src/content/posts/Finetune_LLM_On_CPU.md line 335 at r2 (raw file):

### Prologue - Docker Image For Remote Execution

By default, NativeLink provides a minimal Ubuntu 22.04 image *without any* dependencies installed for remote execution. For this project, we created our own custom publicly-accessible Docker image (Docker Hub reference: `container-image=docker://docker.io/evaanahmed2001/python-bazel-env:amd64-v2`) and we've made this free to use.

Use the sha256 digest when pulling container images from private sources to protect against compromised container registries. In this case it's `docker.io/evaanahmed2001/python-bazel-env@sha256:8de13199d587964b218c0b671272b42031cf4944b2f426e6eee7d7542802bf7c`, as displayed on this page:

https://hub.docker.com/layers/evaanahmed2001/python-bazel-env/amd64-v2/images/sha256-6d8058b6b44ee34860297321f62b2fe99afae21c8594d499998105b3b699c9dd
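
For reference, pinning by digest looks like this (digest copied from the comment above; the `container-image` exec property takes the same `@sha256` reference in place of the tag):

```bash
# Pull the exact image contents, regardless of where the tag currently points:
docker pull docker.io/evaanahmed2001/python-bazel-env@sha256:8de13199d587964b218c0b671272b42031cf4944b2f426e6eee7d7542802bf7c
```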


web/platform/src/content/posts/Finetune_LLM_On_CPU.md line 378 at r2 (raw file):

## **Important Note About Bazel’s Remote Execution Support**

Bazel supports remote execution for building (compiling) and testing only, **NOT** for running the built executables. To use remote servers for remote build execution via Bazel, a roundabout way is to design tests that just execute the Python binary we want to run. If the binary terminates without any errors, the test passes.

This is incorrect. Bazel can run executables. What you probably meant to say here is that `bazel run` uses the host platform as its target platform, meaning that the executable will be invoked on whichever platform is marked as host. You can work around this by wrapping the binary in a test, since `bazel test` runs on the target platform.

See also: bazelbuild/bazel#21805
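
A minimal sketch of that workaround: an `sh_test` wrapper script that simply runs the binary and passes iff it exits cleanly. The `train` function here is a hypothetical stand-in for the real `py_binary`, which would reach the script through the test's `data` attribute.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for the actual fine-tuning binary (hypothetical; in a real
# sh_test, invoke the py_binary via its runfiles path instead).
train() { echo "fine-tuning complete"; }

# Run the binary; `set -e` fails the test if it exits non-zero.
result="$(train)"
[ -n "$result" ]
echo "PASS: $result"
```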


web/platform/src/content/posts/Finetune_LLM_On_CPU.md line 384 at r2 (raw file):

1\. `py_test` uses the `pytest` package to create tests (convenient, user-friendly)

2\. `sh_test` involves coming up with a shell script that executes the test file (old-school but functional and robust)

There is also native_test from bazel-skylib: https://github.com/bazelbuild/bazel-skylib/blob/454b25912a8ddf3d90eb47f25260befd5ee274a8/rules/native_binary.bzl#L88


web/platform/src/content/posts/Finetune_LLM_On_CPU.md line 392 at r2 (raw file):

**DRAWBACK:**

Print statements that track progress (like "Starting to fine tune model" or "Exiting this function") behave differently in remote execution. Unlike local runs where these appear in real-time, remote testing collects all logs on the server and only displays them after test completion when control returns to the local machine. The expected output is still fully preserved - just delayed until the process finishes running.

I believe you can control this via flags (i.e. --test_output=all): https://bazel.build/reference/command-line-reference#build-flag--test_output
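
For example (the target label is illustrative):

```bash
# Print each test's full log once it finishes:
bazel test //:finetune_test --test_output=all

# Or stream output in real time (runs tests serially and disables caching):
bazel test //:finetune_test --test_output=streamed
```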


web/platform/src/content/posts/Finetune_LLM_On_CPU.md line 427 at r2 (raw file):

        If you use a different model, you may need to adjust the output shape
        """
        print("\nEvaluating model accuracy on test set...")

Don't recommend the use of `print` for "production examples". Use a logger instead. See: https://docs.astral.sh/ruff/rules/print/

For Bazel specifically this is relevant, as you might want to use environment variables in your build to control log output.


web/platform/src/content/posts/Finetune_LLM_On_CPU.md line 464 at r2 (raw file):

        false_negatives = np.sum((predicted_classes == 0) & (labels == 1))

        print(f"\nConfusion Matrix (calculated with NumPy):")

Use multiline strings for subsequent log statements.


web/platform/src/content/posts/Finetune_LLM_On_CPU.md line 506 at r2 (raw file):

            output_dir = os.path.expanduser(output_dir)

        # model_dir = os.path.join(output_dir, model_name.replace("/", "_"))

Remove the commented-out line.


web/platform/src/content/posts/Finetune_LLM_On_CPU.md line 521 at r2 (raw file):

        total_training_samples = len(dataset["train"])
        print(f"Total training samples: {total_training_samples}")
        # Reserve the last 1000 samples for testing

nit: potentially missing newline above


web/platform/src/content/posts/Finetune_LLM_On_CPU.md line 693 at r2 (raw file):

    ```bash
    #!/bin/bash

Use `#!/usr/bin/env bash` so the script resolves bash via `PATH` instead of hard-coding its location. If you need to deviate from this, add a comment to explain why that's necessary.
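
A minimal sketch of the difference: the `env` form looks bash up through `PATH`, so the script also works on systems where bash is not installed at `/bin/bash` (e.g. NixOS, or a newer Homebrew-installed bash on macOS).

```bash
#!/usr/bin/env bash
# Resolve the interpreter the same way the env shebang does.
bash_path="$(command -v bash)"
echo "bash resolved to: $bash_path"
```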


web/platform/src/content/posts/Finetune_LLM_On_CPU.md line 736 at r2 (raw file):

    )

    sh_test(

The built-in shell rules are deprecated. Use rules_shell instead: https://github.com/bazelbuild/rules_shell
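
The migration is small; a sketch (the version number is illustrative, check the rules_shell releases for the current one):

```python
# MODULE.bazel
bazel_dep(name = "rules_shell", version = "0.4.0")

# BUILD.bazel -- load sh_test instead of relying on the builtin rule:
load("@rules_shell//shell:sh_test.bzl", "sh_test")
```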


web/platform/src/content/posts/Finetune_LLM_On_CPU.md line 985 at r2 (raw file):

    <div style="max-height: 250px; max-width: 50vw; overflow: auto;">

    ```bash

Duplicating this kind of logic isn't scalable and isn't a recommended practice. Instead, use the intended rules for this, as there's no functionality specific to this script that the run_training.sh script doesn't already cover.

auto-merge was automatically disabled May 8, 2025 19:46

Head branch was pushed to by a user without write access

@Evaan2001 Evaan2001 force-pushed the finetune_llm_cpu_blog branch from b3e28fb to d864e5f Compare May 8, 2025 19:46

@aaronmondal aaronmondal left a comment


nit: The layout seems inconsistent, i.e. why do we need the `1\.` instead of just `1.`, and why do we use bold fonts in the titles when they're already titles?

Reviewed 1 of 2 files at r3, 2 of 2 files at r5, all commit messages.
Reviewable status: 1 of 2 LGTMs obtained, and all files reviewed, and pending CI: Bazel Dev / macos-15, Cargo Dev / macos-15, Cargo Dev / ubuntu-24.04, Installation / macos-14, Installation / macos-15, Local / lre-rs / macos-15, NativeLink.com Cloud / Remote Cache / macos-15, Publish image, Publish nativelink-worker-init, Remote / lre-cc / xlarge-ubuntu-24.04, Remote / lre-rs / xlarge-ubuntu-24.04, Web Platform Deployment / macos-15, buildstream, windows-2022 / stable, and 10 discussions need to be resolved


web/platform/src/content/posts/Finetune_LLM_On_CPU.md line 26 at r5 (raw file):

### Prerequisites

1\. Bazel ([installation instructions](https://bazel.build/install)). This demo uses Bazel 8.1.1

nit: The demo uses Bazel 8.2.1. A more future-proof way in general might be to just use `1. A recent version of Bazel ([installation...)`.


web/platform/src/content/posts/Finetune_LLM_On_CPU.md line 50 at r5 (raw file):

2\. `requirements.lock` - Ensures consistent Python dependencies across all environments<br>
3\. `.bazelrc` - Main Bazel configuration file setting global options for hermetic builds and remote execution<br>
4\. `MODULE.bazel` - Configures the project as a Bazel module, tells Bazel we'll need python, pip and CPU-only PyTorch, and manages external dependencies<br>

nit:

Suggestion:

Python, `pip`

web/platform/src/content/posts/Finetune_LLM_On_CPU.md line 75 at r5 (raw file):

### The NativeLink Difference:

nit: Headers probably shouldn't end in a colon.


web/platform/src/content/posts/Finetune_LLM_On_CPU.md line 77 at r5 (raw file):

### The NativeLink Difference:

To demonstrate NativeLink’s efficacy, consistency, and reliability, we ran the same fine-tuning job on the CPU of an M1 Pro MacBook Pro, the free version of Google Colab on CPU, and [NativeLink](https://github.com/TraceMachina/nativelink),which is free and open-source. We executed the fine-tuning task 5 times and this is what we observed:

Suggestion:

, w

web/platform/src/content/posts/Finetune_LLM_On_CPU.md line 98 at r5 (raw file):

For forward-thinking AI teams, this infrastructure stack represents a shift from the "bigger is better" hardware arms race toward thoughtful resource utilization. The competitive advantage increasingly belongs to those who can extract maximum value from available compute rather than those who deploy more powerful hardware.

nit: Duplicate newline


web/platform/src/content/docs/docs/config/production-config.mdx line 18 at r5 (raw file):

To run NativeLink, you just pass the path to a single JSON5 Configuration file, such as:

```bash

nit: This is a nice fix, but unrelated to this PR. Consider pulling it out into a separate mini fix PR.


.github/styles/config/vocabularies/TraceMachina/accept.txt line 19 at r5 (raw file):

[Hh]ermeticity
JDK
json

This shouldn't be in here. JSON should always be written in capital letters. If you need the file extension `.json`, the corresponding text should be in backticks.


.github/styles/config/vocabularies/TraceMachina/accept.txt line 38 at r5 (raw file):

OSSF
Reclient
[Rr]osetta

Rosetta should always be capitalized.


.github/styles/config/vocabularies/TraceMachina/accept.txt line 39 at r5 (raw file):

Reclient
[Rr]osetta
[Rr]epo

Write out repository instead.


@Evaan2001 Evaan2001 left a comment


Without the backslash in `1\.`, the `1.` doesn't show in `bun preview` (screenshot attached: Screenshot 2025-05-14 at 5.08.11 PM.png).

I'll remove the bold from the titles (just thought it looked better).

Reviewable status: 1 of 2 LGTMs obtained, and all files reviewed, and 10 discussions need to be resolved


web/platform/src/content/docs/docs/config/production-config.mdx line 18 at r5 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

nit: This is a nice fix, but unrelated to this PR. Consider pulling it out into a separate mini fix PR.

This was causing Vale-related errors that barred me from committing changes.


@aaronmondal aaronmondal left a comment


:lgtm:

Reviewed 2 of 2 files at r6, all commit messages.
Dismissed @MarcusSorealheis from a discussion.
Reviewable status: 2 of 2 LGTMs obtained, and all files reviewed, and pending CI: Cargo Dev / macos-15, Local / lre-rs / macos-15, NativeLink.com Cloud / Remote Cache / macos-15, buildstream, and 1 discussions need to be resolved

@aaronmondal aaronmondal enabled auto-merge (squash) May 14, 2025 22:55

@aaronmondal aaronmondal left a comment


Reviewable status: :shipit: complete! 2 of 2 LGTMs obtained, and all files reviewed

@aaronmondal aaronmondal merged commit 18eb9e1 into TraceMachina:main May 14, 2025
37 checks passed
MarcusSorealheis pushed a commit to MarcusSorealheis/nativelink that referenced this pull request Nov 3, 2025