Add different reproducibility tests #179

5u623l20 · 2025-03-11T15:17:37Z

uid - Tests if built with non-privileged user. Specifically root user with -DNO_ROOT and nobody with -DNO_ROOT
path - Tests if build is run from different path and using file-prefix-map
timestamp - Tests if built in two different timestamps with SOURCE_DATE_EPOCH
parallel - Tests with -j1 vs -jX
locale - Tests with different locales

I plan to check UFS vs ZFS as well, but I need some time to check this in the CI infrastructure.

1. arch - Tests if built with different CPUTYPE
2. clang - Tests if built with different CLANG version
3. kernconf - Tests if built with different KERNCONF mainly with DEBUG and NODEBUG
4. linkergc - Tests if built with garbage collection
5. linkicf - Tests if built with code folding
6. linkerstatic - Tests if built statically
7. locale - Tests with different locales
8. parallel - Tests with -j1 vs -jX
9. path - Tests if build is run from different path and using file-prefix-map
10. timestamp - Tests if built in two different timestamps with SOURCE_DATE_EPOCH
11. uid - Tests if built with non-privileged user. Specifically root user with -DNO_ROOT and nobody with -DNO_ROOT

I plan to check UFS vs ZFS as well, but I need some time to check this in the CI infrastructure.

emaste · 2025-03-11T15:52:27Z

Some of these are not expected to produce identical results -- CPUTYPE, Clang version, optimization options.

We should be reproducible with changes to locale, parallel builds, uid, fs type, perhaps path

scripts/jail/default-pkg-list

5u623l20 · 2025-03-11T19:20:48Z

Some of these are not expected to produce identical results -- CPUTYPE, Clang version, optimization options.

If I read the Makefile correctly, the only possible changes from CPUTYPE on amd64 should be related to OpenSSL, so I added this test to check whether any additional libraries or binaries differ. I added those tests to check regularly whether symbol reordering occurs in new code introduced into the base system. But if they do not carry any value in the longer term, I would be happy to remove them.

We should be reproducible with changes to locale, parallel builds, uid, fs type, perhaps path

I will try to complete the check with zfs vs ufs as MAKEOBJDIRPREFIX.

lwhsu · 2025-03-11T19:24:59Z

I feel the if [ ${TESTTYPE} = "XXX" ]; blocks in scripts/build/build-reproducible.sh are not very good practice. I suggest extracting them into the build.sh of each job. For common steps and settings, we can have something like reproducibility-env.conf for build.sh to source, plus reproducibility-pre.sh and reproducibility-post.sh to reduce duplication.
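A rough sketch of what such a per-job build.sh could look like. The file names follow the suggestion above; COMMON_DIR, source_if_present and run_reproducibility_job are hypothetical names used only for illustration, and the real layout may differ.

```shell
#!/bin/sh
# Sketch only: COMMON_DIR and both helper functions are hypothetical.
# Directory holding the shared reproducibility-* files.
COMMON_DIR=${COMMON_DIR:-scripts/build}

# Source a file only if it exists and is readable.
source_if_present() {
	if [ -r "$1" ]; then
		. "$1"
	fi
}

# run_reproducibility_job JOB_FUNCTION
# Loads the shared settings, runs the job-specific steps, then the shared
# post steps, and returns the status of the job-specific steps.
run_reproducibility_job() {
	source_if_present "${COMMON_DIR}/reproducibility-env.conf"
	source_if_present "${COMMON_DIR}/reproducibility-pre.sh"
	rc=0
	"$1" || rc=$?
	source_if_present "${COMMON_DIR}/reproducibility-post.sh"
	return "$rc"
}
```

Each job's build.sh would then define only its own build function and pass its name to run_reproducibility_job, so no TESTTYPE switch is needed in a shared script.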

lwhsu · 2025-03-11T19:27:50Z

It's not recommended to run pkg install in the build scripts, because it's better to keep the environment setup and execution parts separate. The pkg-list file in each job's directory is for that purpose.

For the long term, we also want to remove the use of sudo; it was used from the beginning because NO_ROOT is not well supported.

jjb/template.yaml

scripts/build/build-reproducible.sh

lwhsu · 2025-03-11T19:40:05Z

scripts/build/build-reproducible.sh

+TARGET=${TARGET:-amd64}
+TARGET_ARCH=${TARGET_ARCH:-amd64}
+ARTIFACT=${WORKSPACE}/diff.html
+ARTIFACT_DEST=artifact/reproducibility/${FBSD_BRANCH}/${TARGET}/${TARGET_ARCH}/${GIT_COMMIT}-${TESTTYPE}.html


Are these meant to archive the HTML output from diffoscope to the artifact server? That's OK for now, but I feel it would be better to keep the HTML result along with the builds, which makes garbage collection easier in the future.

That is doable, but I will address it after some runs. I could not exactly replicate the artifact upload in my setup, as all my build steps are failing with this error:

07:31:40 + ./freebsd-ci/artifact/post-link.py
07:31:40 Traceback (most recent call last):
07:31:40 File "/jenkins/workspace/FreeBSD-main-armv7-build/./freebsd-ci/artifact/post-link.py", line 14, in <module>
07:31:40 x['branch'] = os.environ['FBSD_BRANCH']
07:31:40 ~~~~~~~~~~^^^^^^^^^^^^^^^
07:31:40 File "<frozen os>", line 679, in __getitem__
07:31:40 KeyError: 'FBSD_BRANCH'

So I will try to address these after some runs.

scripts/build/build-reproducible.sh

lwhsu · 2025-03-11T19:52:59Z

The (theoretically) final step of the build is:

diffoscope --html ${WORKSPACE}/diff.html ${WORKSPACE}/obj ${WORKSPACE}/objXXX

Its exit code is 0 or non-zero, expressing whether the build is reproducible or not. It is important that this becomes the exit code of the whole build script, as it tells Jenkins (or another CI system) to mark the build as successful or failed. If it isn't the final step in the script, the exit code should be saved and the script should exit with it.

Also note that if the two targets of the comparison are the same, there is no html generated.
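A minimal sketch of saving the comparison's exit status so that later steps cannot mask it. compare_and_report and DIFF_TOOL are hypothetical names introduced for illustration; only the diffoscope --html invocation is taken from the command above.

```shell
#!/bin/sh
# Sketch only: compare_and_report and DIFF_TOOL are hypothetical names.
# DIFF_TOOL allows the comparison tool to be swapped out for a dry run.

# compare_and_report DIR_A DIR_B REPORT_HTML
compare_and_report() {
	rc=0
	# Keep the comparison's exit status instead of letting a later
	# command overwrite it.
	${DIFF_TOOL:-diffoscope} --html "$3" "$1" "$2" || rc=$?

	# Any post-processing goes here. When the two trees are identical,
	# no HTML report is generated, so check for the file first.
	if [ -f "$3" ]; then
		echo "differences found, report: $3"
	else
		echo "no report generated"
	fi

	return "$rc"
}

# Intended use as the last lines of the build script (not executed here):
#   compare_and_report "${WORKSPACE}/obj" "${WORKSPACE}/objXXX" "${WORKSPACE}/diff.html"
#   exit $?
```

Returning the saved status from the function keeps the decision about the build result in one place, whatever is added between the comparison and the end of the script.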

lwhsu · 2025-03-11T19:53:17Z

Do we really want to compare all the object files? My feeling is that comparing the final files would be sufficient, but that means we may need to do an installworld and installkernel to a directory. Although it's an extra step, it also lets us check things that are not generated by the buildworld and buildkernel steps but still affect the final artifacts.

lwhsu · 2025-03-11T20:03:10Z

One thing I'm thinking about, though I'm not sure of the best way to do it currently, is that we now build world and kernel twice in each job. However, the first pass is just the default build with WITH_REPRODUCIBLE_BUILD=yes. If all reproducibility jobs build against the same hash (likely), we can separate them into two steps (jobs): one builds the baseline and archives the artifacts to the artifact server, and the other test jobs build with the customized options and compare against the fetched baseline files.

lwhsu · 2025-03-11T20:10:02Z

Some of these are not expected to produce identical results -- CPUTYPE, Clang version, optimization options.

And also kernconf? DEBUG and NODEBUG seem to determine including debug information or not.

emaste · 2025-03-11T20:13:51Z

Yes, enabling debug options will certainly produce different results. (Not just debug info, but WITNESS, INVARIANTS etc.)

5u623l20 · 2025-03-12T10:33:56Z

One thing I'm thinking about, though I'm not sure of the best way to do it currently, is that we now build world and kernel twice in each job. However, the first pass is just the default build with WITH_REPRODUCIBLE_BUILD=yes. If all reproducibility jobs build against the same hash (likely), we can separate them into two steps (jobs): one builds the baseline and archives the artifacts to the artifact server, and the other test jobs build with the customized options and compare against the fetched baseline files.

This is a refinement that I want to work on, but not right at this moment. We would have to add an additional artifact to record the SOURCE_DATE_EPOCH and use it to build the later tests. I want to make some tests available to the developers first while we improve this.

- Add a build_world_kernel function

5u623l20 · 2025-03-13T09:33:21Z

Do we really want to compare all the object files? My feeling is that comparing the final files would be sufficient, but that means we may need to do an installworld and installkernel to a directory. Although it's an extra step, it also lets us check things that are not generated by the buildworld and buildkernel steps but still affect the final artifacts.

While the Debian reproducibility tests actually compare the final pkgbase, it makes more sense for us to test the build artifacts instead of the final artifacts as the developers can understand what went wrong for a certain file. We can add another test to test the final artifacts instead of the build artifacts.

emaste · 2025-03-14T14:35:41Z

I'll comment on other topics later, but specifically with respect to intermediate artifacts (object files, etc.) there's not much value in comparing them vs. the effort of doing so. The base system is almost completely reproducible today and the outstanding issues are in pkg (freebsd/pkg#2427) and base system packaging (tarballs, VM images, etc.), not in the individual binaries or libraries.

emaste · 2025-03-14T14:55:04Z

By a similar argument we don't really need to test reproducibility with only one change, at least for most variables -- there is little nonreproducibility, so we can just do the second build with all of the changes in one build (uid, -j value, locale) and save a bunch of CPU time.

paths may be an exception to this; I don't believe we've done much to address paths across the base system and it might make sense to keep them separate for now.
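The single combined second build could be sketched roughly as follows, with the path test kept as a separate job. This is only an illustration: build_variant, MAKE_CMD, the object directory names and the locale values are hypothetical, and the real script may pass different make arguments.

```shell
#!/bin/sh
# Sketch only: build_variant and MAKE_CMD are hypothetical names.
# MAKE_CMD lets the make invocation be replaced for a dry run.

# build_variant OBJDIR JOBS LOCALE
# Builds world and kernel into OBJDIR with the given -j value and locale.
build_variant() {
	LC_ALL="$3" MAKEOBJDIRPREFIX="$1" \
	    ${MAKE_CMD:-make} -j "$2" WITH_REPRODUCIBLE_BUILD=yes \
	    buildworld buildkernel
}

# Intended use (not executed here):
#   build_variant "${WORKSPACE}/obj"  "$(sysctl -n hw.ncpu)" C
#   # Second build: change -j and locale together; running it as a
#   # different user is left to the job setup.
#   build_variant "${WORKSPACE}/obj2" 1 en_US.UTF-8
```

If the two trees differ, the individual variables can still be bisected by rerunning with only one of them changed.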