#+title: Reproducibility, Bootstrappability and Functional Package Management
#+date: 2026-06-01 Mon
#+author: W. Kosior
#+email: wkosior@agh.edu.pl

* SolarWinds
- "Sunburst"
- 2020
- compromised build infrastructure
- ~18,000 customers affected
- motivated discussion & guidance development
* Chalk Backdoor
- 2025
- "I Got Pwned"
  - developer phished
  - honest about it
- *extremely* popular npm Registry package
  - terminal coloring
  - depended on by almost every project
- backdoor
  - crypto-stealing (typical for the npm Registry)
  - millions of downloads in ~2 hours
  - unused "potential" ;)
- plus a few other compromised packages
* The Build Process
#+ATTR_ORG: :width 100%
[[./11-build-process-overview.svg]]
* Kinds of Dependencies
- runtime dependencies
- development dependencies
  - build dependencies (compilers, test frameworks, etc.)
  - other tools (IDEs, linters, debuggers, etc.)
* Reproducibility
#+begin_example
hermeticity    fixed build inputs    a bit of care
     |                 |                   |
     |                 |                   |
     +------+          |            +------+
            |          |            |
            V          V            V
     bit-to-bit identical result each time
                       |
                       |
                       V
               identical hashes -----> multiparty verification
#+end_example
* Successful Multiparty Verification
#+ATTR_ORG: :width 100%
[[./11-rebuilds-no-contamination-diagram.svg]]
* Unsuccessful Multiparty Verification
#+ATTR_ORG: :width 100%
[[./11-rebuilds-contamination-diagram.svg]]
* Reproducibility & Hermeticity in Guidance
- Software Component Verification Standard
  - OWASP, 2020
  - no direct mention
- Supply Chain Levels for Software Artifacts
  - OpenSSF, 2023
  - in draft, later removed
  - might be re-introduced in later versions
- Secure Supply Chain Consumption Framework
  - OpenSSF, 2022
  - briefly mentioned
- Software Supply Chain Security Best Practices
  - CNCF, 2021
  - potentially leverageable when high assurance is needed
- Securing the Software Supply Chain: Recommended Practices Guide
  - NSA, ODNI, and CISA, 2022
  - "advanced" mitigations, "additional protection"
  - some text copied from the SLSA draft
* Reproducibility in Practice
- 1992-3 (Cygnus, GCC)
  - cross-compilation too!
  - wisdom later lost
* Reproducibility in Practice, Cont.
- 1992-3 (Cygnus, GCC)
- Gitian (2011)
  - builds inside a VM
  - used for Bitcoin Core builds until 2021
* Reproducibility in Practice, Cont…
- 1992-3 (Cygnus, GCC)
- Gitian (2011)
- Debian
  - DebConf13 (2013)
  - 24% reproducible in 2013
  - 2018 — APT transport with in-toto
    - not usable as of 2025 :(
  - 97% reproducible in 2025
* Reproducibility in Practice, Cont…
- 1992-3 (Cygnus, GCC)
- Gitian (2011)
- Debian
- other early adopters (2015)
  - coreboot
  - OpenWrt
  - NetBSD
  - FreeBSD
  - Arch
  - Fedora
* Reproducibility in Practice, Cont…
- 1992-3 (Cygnus, GCC)
- Gitian (2011)
- Debian
- other early adopters (2015)
- reproducible container images
  - /(traditionally a reproducibility-unfriendly area)/
  - base Arch container (2024)
  - GNU Guix containers (2025)
* Reproducibility in Practice, Cont…
- 1992-3 (Cygnus, GCC)
- Gitian (2011)
- Debian
- other early adopters (2015)
- reproducible container images
- more at https://reproducible-builds.org
* Reproducibility CI
- https://reproducible-builds.org/citests/
* Sample Reproducibility Challenges
- timestamps
  - solution: set all to a fixed value (e.g., Epoch or =SOURCE_DATE_EPOCH=)
- order of filenames
  - directory listings
  - created archives
  - solution: sort names
- paths (debugging information)
  - solutions:
    - exclude or remap paths in DWARF (e.g., GCC's =-ffile-prefix-map=)
    - build in the same prefix
- basic *rebuildability* problems
  - build time bombs (e.g., expiring certs in tests)
  - no automated rebuildability for npm Registry, PyPI, etc.
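The timestamp and filename-order fixes can be sketched with Python's standard =tarfile= and =gzip= modules. This is only a minimal illustration under stated assumptions, not how Debian, GNU Guix or any other project implements it: the build output is assumed to be a plain directory, =SOURCE_DATE_EPOCH= is honoured if set (otherwise the Epoch is used), and ownership and permissions are normalized too. =fake_build= and the file contents are hypothetical stand-ins for two independent rebuilds.

#+begin_src python
import gzip
import hashlib
import io
import os
import tarfile
import tempfile

# Assumption: follow the SOURCE_DATE_EPOCH convention if it is set,
# otherwise pin every timestamp to 0 (the Unix Epoch).
FIXED_TIME = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))


def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Drop builder-specific metadata from a single archive member."""
    info.mtime = FIXED_TIME  # timestamps: identical for every file
    info.uid = info.gid = 0  # ownership: independent of the build user
    info.uname = info.gname = ""
    # permissions: independent of the builder's umask, keep only "executable or not"
    info.mode = 0o755 if info.isdir() or info.mode & 0o100 else 0o644
    return info


def reproducible_tar_gz(src_dir: str) -> bytes:
    """Pack src_dir so that the bytes depend only on file names and contents."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # order of filenames: deterministic directory traversal
            for name in sorted(dirs + files):
                path = os.path.join(root, name)
                arcname = os.path.relpath(path, src_dir)  # no build prefix in names
                tar.add(path, arcname=arcname, recursive=False, filter=normalize)
    # the gzip header carries its own timestamp, so pin that one too
    return gzip.compress(buf.getvalue(), mtime=FIXED_TIME)


def naive_tar_gz(src_dir: str) -> bytes:
    """For comparison: default tarfile behaviour, real mtimes included."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.add(src_dir, arcname=".")
    return buf.getvalue()


def fake_build(out_dir: str, mtime: int) -> None:
    """Hypothetical stand-in for a build: same outputs, different file times."""
    os.makedirs(os.path.join(out_dir, "somedir"))
    for name, text in [("README", "hello\n"), ("somedir/lecture-head", "#+title: ...\n")]:
        path = os.path.join(out_dir, *name.split("/"))
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        os.utime(path, (mtime, mtime))


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    # two "rebuilders" produce the same files one minute apart
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        fake_build(a, 1_780_000_000)
        fake_build(b, 1_780_000_060)
        print("naive hashes match:", sha256(naive_tar_gz(a)) == sha256(naive_tar_gz(b)))
        print("normalized hashes match:", sha256(reproducible_tar_gz(a)) == sha256(reproducible_tar_gz(b)))
#+end_src

Run as a script, it should report that the default archives differ (they embed each file's mtime), while the normalized archives, and therefore their SHA-256 hashes, are byte-for-byte identical, which is the precondition for multiparty verification.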
* Diffoscope
#+begin_example
$ diffoscope somearchive.tar.gz otherarchive.tar.gz
--- somearchive.tar.gz
+++ otherarchive.tar.gz
│   --- somearchive.tar
├── +++ otherarchive.tar
│ ├── file list
│ │ @@ -1,2 +1,2 @@
│ │ -drwxr-xr-x 0 urz (1000) urz (1000) 0 2026-06-01 05:41:31.000000 somedir/
│ │ --rw-r--r-- 0 urz (1000) urz (1000) 102 2026-06-01 05:41:32.000000 somedir/lecture-head
│ │ +drwxr-xr-x 0 urz (1000) urz (1000) 0 2026-06-01 05:41:10.000000 somedir/
│ │ +-rw-r--r-- 0 urz (1000) urz (1000) 148 2026-06-01 05:41:10.000000 somedir/lecture-head
│ ├── somedir/lecture-head
│ │ @@ -1,4 +1,4 @@
│ │ -#+title: Software Repositories
│ │ -#+date: 2026-05-25 Mon
│ │ +#+title: Reproducibility, Boostrappability and Functional Package Management
│ │ +#+date: 2026-06-01 Mon
│ │  #+author: W. Kosior
│ │  #+email: wkosior@agh.edu.pl
#+end_example
* Diffoscope, Cont.
#+begin_example
$ diffoscope hello1 hello2
--- ../build-2026-06-01T0759/hello1
+++ hello2
├── readelf --wide --debug-dump=rawline {}
│ @@ -25,15 +25,15 @@
│    Opcode 9 has 1 arg
│    Opcode 10 has 0 args
│    Opcode 11 has 0 args
│    Opcode 12 has 1 arg
│
│    The Directory Table (offset 0x22, lines 2, columns 1):
│    Entry Name
│ -  0 (line_strp) (offset: 0x8): /tmp/build-2026-06-01T0759
│ +  0 (line_strp) (offset: 0x8): /tmp/build-2026-06-01T0800
│    1 (line_strp) (offset: 0x23): /usr/local/include
#+end_example
* Making Use of Reproducibility
- monitoring software distribution
  - easy
  - Debian, Arch, GNU Guix, Nix, etc.
- verifying software before installing/distributing
  - better guarantees
  - harder
  - Bitcoin Core
- reproducing research
* "Reflections on Trusting Trust"
- Ken Thompson, 1984
- replicating compiler backdoors
- conspiracy theory:
  #+begin_quote
  Proprietary compiler used to build the first GCC versions implanted a
  self-replicating backdoor in them that lives in all GCC binaries to this day.
  #+end_quote
  - now disproved :)
- solutions:
  - Diverse Double-Compiling (Wheeler, 2009)
  - bootstrappable builds
* Bootstrappable Builds
- verify dependencies of dependencies of dependencies, etc.
  - including build dependencies
- 2022 — full source bootstrap
  - hex0 — minimal hex assembler that can assemble its own source code
  - hex1 — richer version of hex0 (with, e.g., label jumps)
  - catm — =cat= replacement buildable with hex0
  - hex2 — even richer counterpart of hex1 (with, e.g., absolute addresses)
  - other MesCC Tools (M0, cc_x86, M1, M2, get_machine)
    - can compile simple C
  - GNU Mes (Scheme interpreter & C compiler)
  - Tiny C Compiler (TCC)
  - GCC (2.95.3)
  - binutils, glibc
  - GCC (4.9.4)
  - more…
* Bootstrappable Builds, Cont.
- verify dependencies of dependencies of dependencies, etc.
- 2022 — full source bootstrap
  - thanks to NLnet for funding!
* Bootstrappable Builds, Cont…
- verify dependencies of dependencies of dependencies, etc.
- 2022 — full source bootstrap
  - thanks to NLnet for funding!
- what about the kernel?
  - live-bootstrap project
* Traditional Package Management / Development
- in the past:
  - surely, 0x1e8266 is the memory address of the framebuffer
- in more modern times:
  - surely =/usr/bin/python= exists and is …
- Filesystem Hierarchy Standard (FHS)
  - =/etc= — configuration files
  - =/bin= — essential executables
  - =/usr/share= — architecture-independent files
  - =/usr/bin= — executables
  - =/usr/lib= — libraries
  - =/usr/libexec= — executables run by other programs
  - =/usr/local/bin=, =/usr/local/share= — locally installed (non-repo) software
  - etc.
* Traditional Package Management / Development — Challenges
- software that requires older/newer libraries
- using multiple software versions on a single system
- updating safely (atomically)
- rolling back updates
- ad-hoc installation & usage
- rootless installation
- patching dependencies deep in the tree
- bootstrappability & reproducibility
  - not provided by Docker, virtualenv et al. :(
* Functional Package Management
- "The Purely Functional Software Deployment Model"
  - Eelco Dolstra, 2006
- Nix package manager
- every program/library in its own directory
  - =/store/008naskq2zc7dq93fpz4ard66qiyzywy-libx11-1.8.10/=
  - =/store/9q279phakibws2s76paciw0g8hvxvl0p-libx11-1.8.12/=
  - =/store/a6kkajhmaymz2rmx7m29bp9aqh9pawrx-libx11-1.8.12/=
  - used as a "prefix" (=lib/=, =bin/=, etc. inside)
* Functional Package Management, Cont.
- "The Purely Functional Software Deployment Model"
- Nix package manager
- every program/library in its own directory
- *built software is a function of build inputs*
  - store names
    - hash of all sources & dependencies
    - dependency update → new dependent (new hash)
    - tricks to limit rebuilds & space usage
  - dependencies
    - creative use of rpath, etc.
    - e.g., =/store/a6kkajhmaymz2rmx7m29bp9aqh9pawrx-libx11-1.8.12/= points to =/store/yj053cys0724p7vs9kir808x7fivz17m-glibc-2.41=
    - no dependence on, e.g., =/usr/bin/python=
    - versions can coexist
* Functional Package Management, Cont…
- "The Purely Functional Software Deployment Model"
- Nix package manager
- every program/library in its own directory
- *built software is a function of build inputs*
  - store names
  - dependencies
    - creative use of rpath, etc.
    - e.g., =/store/a6kkajhmaymz2rmx7m29bp9aqh9pawrx-libx11-1.8.12/= points to =/store/yj053cys0724p7vs9kir808x7fivz17m-glibc-2.41=
    - no dependence on, e.g., =/usr/bin/python=
    - versions can coexist
* Nix
- NixOS
- Nixpkgs
  - stable & rolling release
- DSL for recipes
  - packaging as code!
  - 1 file — many definitions
    - compare to Debian…
- Nix daemon
  - local builds & substitutes
- declarative system configuration
* GNU Guix
- 2014
- Nix's sibling
- Guile Scheme (Lisp family) for:
  - package definitions
  - system configuration
  - the package manager itself
- used for Bitcoin Core releases (since 2021)
* GNU Guix Features
- full source bootstrap merged in
- reproducibility tests
  - =guix challenge=
  - =guix build --rounds=
- hermeticity enforced on builds
- cross-compilation (works in some cases)
- ad-hoc shells
  - & containers
- grafts
- VM images, container images
- =guix deploy= (like Ansible, but for Guix)
- policies: libre software only
- patching packages (code & command line)
  - =guix shell yt-dlp mpv --with-branch=yt-dlp=master=
- time travel (=guix time-machine=)
- more!
* Functional Package Management — Drawbacks, Problems
- speed (recompilations)
  - substitute availability
- limited compatibility (no FHS)
  - but: Guix's container shells with FHS emulation (=guix shell --container --emulate-fhs=)
- live updates
  - e.g., Apache config & declarative service definition
- feature-completeness
  - declarative configuration interfaces — all options of all services
  - secrets deployment (declaratively defined GNU Guix systems)
  - Gentoo =USE= flags (equivalent still lacking)
  - package count (Debian > GNU Guix)
- "cheating"
  - prebuilt npm et al. packages in Nix
- no "stable" branch in GNU Guix
- Microsoft bought a GUIX trademark (used for something else)
* Local Variables
# Local Variables:
# org-image-actual-width: nil
# End: