#+title: Reproducibility, Bootstrappability and Functional Package Management
#+date: 2026-06-01 Mon
#+author: W. Kosior
#+email: wkosior@agh.edu.pl

* SolarWinds
- "Sunburst"
- 2020
- compromised build infrastructure
- ~18,000 customers affected
- motivated discussion & guidance development

* Chalk Backdoor
- 2025
- "I Got Pwned"
  - developer phished
  - honest about it
- *extremely* popular npm Registry package
  - terminal coloring
  - depended on by almost every project
- backdoor
  - crypto-stealing (typical for the npm Registry)
  - millions of downloads in ~2 hours
  - unused "potential" ;)
- plus a few other compromised packages

* The Build Process
#+ATTR_ORG: :width 100%
[[./11-build-process-overview.svg]]

* Kinds of Dependencies
- runtime dependencies
- development dependencies
  - build dependencies (compilers, test frameworks, etc.)
  - other tools (IDEs, linters, debuggers, etc.)

* Reproducibility
#+begin_example
hermeticity         fixed build inputs      a bit of care
      |                     |                     |
      |                     |                     |
      +------+              |              +------+
             |              |              |
             V              V              V
          bit-for-bit identical result each time
                            |
                            |
                            V
                     identical hashes -----> multiparty verification
#+end_example
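
The last arrow of the diagram can be sketched in Python, assuming every independent rebuilder publishes the SHA-256 of its build output. The builder names and artifact bytes are made up for illustration, and the majority vote is just one possible policy.

#+begin_src python
import hashlib
from collections import Counter


def sha256_hex(artifact: bytes) -> str:
    """Hash a build artifact; equal bytes always give equal digests."""
    return hashlib.sha256(artifact).hexdigest()


def verify(reports: dict[str, str]) -> tuple[str, list[str]]:
    """Return the most common digest and the builders that disagree with it."""
    majority, _count = Counter(reports.values()).most_common(1)[0]
    dissenters = sorted(name for name, digest in reports.items() if digest != majority)
    return majority, dissenters


# Hypothetical rebuilders; "mirror-c" reports a tampered artifact.
reports = {
    "distro-farm": sha256_hex(b"hello-1.0 binary"),
    "university": sha256_hex(b"hello-1.0 binary"),
    "mirror-c": sha256_hex(b"hello-1.0 binary + implant"),
}
majority, dissenters = verify(reports)
print(dissenters)  # ['mirror-c']
#+end_src

A mismatch only says that somebody's build differs. Finding out whose build is contaminated (or merely non-reproducible) still takes investigation.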

* Successful Multiparty Verification
#+ATTR_ORG: :width 100%
[[./11-rebuilds-no-contamination-diagram.svg]]

* Unsuccessful Multiparty Verification
#+ATTR_ORG: :width 100%
[[./11-rebuilds-contamination-diagram.svg]]

* Reproducibility & Hermeticity in Guidance
- Software Component Verification Standard (SCVS)
  - OWASP, 2020
  - no direct mention
- Supply Chain Levels for Software Artifacts (SLSA)
  - OpenSSF, 2023
  - present in the draft, later removed
  - might be reintroduced in later versions
- Secure Supply Chain Consumption Framework (S2C2F)
  - OpenSSF, 2022
  - briefly mentioned
- Software Supply Chain Security Best Practices
  - CNCF, 2021
  - potentially leverageable when high assurance is needed
- Securing the Software Supply Chain: Recommended Practices Guide
  - NSA, ODNI, and CISA, 2022
  - "advanced" mitigations, "additional protection"
  - some text copied from the SLSA draft

* Reproducibility in Practice
- 1992-3 (Cygnus, GCC)
  - cross-compilation too!
  - wisdom later lost

* Reproducibility in Practice, Cont.
- 1992-3 (Cygnus, GCC)
- Gitian (2011)
  - VM for builds
  - Bitcoin Core builds until 2021

* Reproducibility in Practice, Cont…
- 1992-3 (Cygnus, GCC)
- Gitian (2011)
- Debian
  - DebConf13 (2013)
  - 24% reproducible in 2013
  - 2018 — APT transport with in-toto
    - not usable as of 2025 :(
  - 97% reproducible in 2025

* Reproducibility in Practice, Cont…
- 1992-3 (Cygnus, GCC)
- Gitian (2011)
- Debian
- other early adopters (2015)
  - Coreboot
  - OpenWrt
  - NetBSD
  - FreeBSD
  - Arch
  - Fedora

* Reproducibility in Practice, Cont…
- 1992-3 (Cygnus, GCC)
- Gitian (2011)
- Debian
- other early adopters (2015)
- reproducible container images
  - /(traditionally a reproducibility-unfriendly area)/
  - base Arch container (2024)
  - GNU Guix containers (2025)

* Reproducibility in Practice, Cont…
- 1992-3 (Cygnus, GCC)
- Gitian (2011)
- Debian
- other early adopters (2015)
- reproducible container images
- more at https://reproducible-builds.org

* Reproducibility CI
- https://reproducible-builds.org/citests/

* Sample Reproducibility Challenges
- timestamps
  - solution: set all to the Unix epoch
- order of filenames
  - directory listings
  - created archives
  - solution: sort names
- paths (debugging information)
  - solutions:
    - exclude paths in DWARF
    - build in the same prefix
- casual *rebuildability* problems
  - build time bombs (e.g., expiring certs in tests)
  - no automated rebuildability for npm Registry, PyPI, etc.
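
The timestamp and file-order fixes above can be sketched with Python's standard =tarfile= module. This is a minimal illustration, not what any particular distribution does; the helper name =deterministic_tar= and the sample files are assumptions made for the example.

#+begin_src python
import hashlib
import io
import tarfile


def deterministic_tar(files: dict[str, bytes]) -> bytes:
    """Pack files into an uncompressed tar with normalized metadata."""
    buf = io.BytesIO()
    # USTAR keeps the headers simple (no pax extended records).
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):        # fixed member order
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = 0                # timestamp pinned to the Unix epoch
            info.uid = info.gid = 0       # no builder-specific ownership
            info.uname = info.gname = ""
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


# Same content, different insertion order: the archives are still identical.
a = deterministic_tar({"b.txt": b"two", "a.txt": b"one"})
b = deterministic_tar({"a.txt": b"one", "b.txt": b"two"})
print(hashlib.sha256(a).hexdigest() == hashlib.sha256(b).hexdigest())  # True
#+end_src

Compressing the result with gzip would add one more timestamp (in the gzip header), which needs the same treatment.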

* Diffoscope
#+begin_example
$ diffoscope somearchive.tar.gz otherarchive.tar.gz
--- somearchive.tar.gz
+++ otherarchive.tar.gz
│   --- somearchive.tar
├── +++ otherarchive.tar
│ ├── file list
│ │ @@ -1,2 +1,2 @@
│ │ -drwxr-xr-x   0 urz       (1000) urz       (1000)        0 2026-06-01 05:41:31.000000 somedir/
│ │ --rw-r--r--   0 urz       (1000) urz       (1000)      102 2026-06-01 05:41:32.000000 somedir/lecture-head
│ │ +drwxr-xr-x   0 urz       (1000) urz       (1000)        0 2026-06-01 05:41:10.000000 somedir/
│ │ +-rw-r--r--   0 urz       (1000) urz       (1000)      148 2026-06-01 05:41:10.000000 somedir/lecture-head
│ ├── somedir/lecture-head
│ │ @@ -1,4 +1,4 @@
│ │ -#+title: Software Repositories
│ │ -#+date: 2026-05-25 Mon
│ │ +#+title: Reproducibility, Boostrappability and Functional Package Management
│ │ +#+date: 2026-06-01 Mon
│ │  #+author: W. Kosior
│ │  #+email: wkosior@agh.edu.pl
#+end_example

* Diffoscope, Cont.
#+begin_example
$ diffoscope ../build-2026-06-01T0759/hello1 hello2
--- ../build-2026-06-01T0759/hello1
+++ hello2
├── readelf --wide --debug-dump=rawline {}
│ @@ -25,15 +25,15 @@
│    Opcode 9 has 1 arg
│    Opcode 10 has 0 args
│    Opcode 11 has 0 args
│    Opcode 12 has 1 arg
│
│   The Directory Table (offset 0x22, lines 2, columns 1):
│    Entry      Name
│ -  0  (line_strp)     (offset: 0x8): /tmp/build-2026-06-01T0759
│ +  0  (line_strp)     (offset: 0x8): /tmp/build-2026-06-01T0800
│    1  (line_strp)     (offset: 0x23): /usr/local/include
#+end_example

* Making Use of Reproducibility
- monitoring software distribution
  - easy
  - Debian, Arch, GNU Guix, Nix, etc.
- verifying software before installing/distributing
  - better guarantees
  - harder
  - Bitcoin Core
- reproducing research

* "Reflections on Trusting Trust"
- Ken Thompson, 1984
- self-replicating compiler backdoors
- conspiracy theory:
  #+begin_quote
  The proprietary compiler used to build the first GCC versions implanted a
  self-replicating backdoor in them that lives in all GCC binaries to this day.
  #+end_quote
  - now disproved :)
- solutions:
  - Diverse Double-Compiling (Wheeler, 2009)
  - bootstrappable builds
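
Diverse Double-Compiling can be illustrated with a deliberately tiny toy model. Everything here is an assumption made for the sketch: a "binary" is a record of the source it was built from, the compiler whose code generator emitted it, and a flag for the backdoor; record equality stands in for bit-for-bit comparison.

#+begin_src python
from dataclasses import dataclass

A_SRC = "compiler A source"  # hypothetical compiler under test
T_SRC = "compiler T source"  # hypothetical independent, trusted compiler


@dataclass(frozen=True)
class Binary:
    built_from: str       # source text; defines what the program does
    codegen: str          # which compiler's code generator emitted the bits
    trojan: bool = False  # carries the self-replicating backdoor?


def run_compiler(compiler: Binary, source: str) -> Binary:
    """Toy model of executing a compiler binary on some source text."""
    # The backdoor re-inserts itself only when compiler A itself is compiled.
    infected = compiler.trojan and source == A_SRC
    return Binary(built_from=source, codegen=compiler.built_from, trojan=infected)


def ddc_passes(distributed_a: Binary, trusted: Binary) -> bool:
    stage1 = run_compiler(trusted, A_SRC)  # behaves like A, but different bits
    stage2 = run_compiler(stage1, A_SRC)   # should equal a self-built, honest A
    return stage2 == distributed_a


trusted = Binary(T_SRC, codegen=T_SRC)
honest_a = Binary(A_SRC, codegen=A_SRC)
backdoored_a = Binary(A_SRC, codegen=A_SRC, trojan=True)

print(run_compiler(backdoored_a, A_SRC) == backdoored_a)  # True: self-rebuild looks fine
print(ddc_passes(honest_a, trusted))                      # True
print(ddc_passes(backdoored_a, trusted))                  # False
#+end_src

The first line of output shows why rebuilding a compiler with itself proves nothing. The comparison in =ddc_passes= is only meaningful if the compiler build is reproducible.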

* Bootstrappable Builds
- verify dependencies of dependencies of dependencies, etc.
  - including build dependencies
- 2022 — full source bootstrap
  - hex0 — minimal hex assembler that can assemble its source code
  - hex1 — richer version of hex0 (with, e.g., label jumps)
  - catm — =cat= replacement buildable with hex0
  - hex2 — even richer counterpart of hex1 (with, e.g., absolute addresses)
  - other MesCC Tools (M0, cc_x86, M1, M2, get_machine)
    - can compile simple C
  - GNU Mes (Scheme interpreter & C compiler)
  - Tiny C Compiler (TCC)
  - GCC (2.95.3)
  - binutils, glibc
  - GCC (4.9.4)
  - more…
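
To give a feel for how little the first stage does, here is a Python sketch of a hex0-style assembler: it drops comments and whitespace and turns pairs of hex digits into bytes. The comment markers (=#= and =;=) are an assumption about the input syntax, and the real hex0 is of course a few hundred bytes of machine code, not Python.

#+begin_src python
def hex0(source: str) -> bytes:
    """Assemble hex0-style text: hex digit pairs with '#' or ';' line comments."""
    digits = []
    for line in source.splitlines():
        for marker in "#;":
            line = line.split(marker, 1)[0]  # cut the comment, if any
        digits.extend(ch for ch in line if not ch.isspace())
    # Raises ValueError on stray characters or an odd number of digits.
    return bytes.fromhex("".join(digits))


program = """
7F 45 4C 46   # ELF magic: 0x7F 'E' 'L' 'F'
02 01 01 00   ; 64-bit, little-endian, ELF version 1
"""
print(hex0(program))  # b'\x7fELF\x02\x01\x01\x00'
#+end_src

The input is the binary written out digit by digit, so a human can audit it against the resulting file. That is what makes it a reasonable root of trust.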

* Bootstrappable Builds, Cont.
- verify dependencies of dependencies of dependencies, etc.
- 2022 — full source bootstrap
- thanks to NLnet for funding!

* Bootstrappable Builds, Cont…
- verify dependencies of dependencies of dependencies, etc.
- 2022 — full source bootstrap
- thanks to NLnet for funding!
- what about kernel?
  - live-bootstrap project

* Traditional Package Management / Development
- in the past:
  - surely, 0x1e8266 is the memory address of the framebuffer
- in more modern times:
  - surely =/usr/bin/python= exists and is …
- Filesystem Hierarchy Standard (FHS)
  - =/etc= — configuration files
  - =/bin= — essential executables (needed even when =/usr= is not mounted)
  - =/usr/share= — architecture-independent files
  - =/usr/bin= — executables
  - =/usr/lib= — libraries
  - =/usr/libexec= — executables for use by programs
  - =/usr/local/bin=, =/usr/local/share= — non-repo software
  - etc.

* Traditional Package Management / Development — Challenges
- software that requires older/newer libraries
- using multiple software versions on a single system
- updating safely (atomically)
- rolling back updates
- ad-hoc installation & usage
- rootless installation
- patching dependencies deep in the tree
- bootstrappability & reproducibility
  - Docker, virtualenv, et al. do not help with these :(

* Functional Package Management
- "The Purely Functional Software Deployment Model"
  - Eelco Dolstra, 2006
- Nix package manager
- every program/library in its own directory
  - =/store/008naskq2zc7dq93fpz4ard66qiyzywy-libx11-1.8.10/=
  - =/store/9q279phakibws2s76paciw0g8hvxvl0p-libx11-1.8.12/=
  - =/store/a6kkajhmaymz2rmx7m29bp9aqh9pawrx-libx11-1.8.12/=
  - used as "prefix" (=lib/=, =bin/=, etc. inside)

* Functional Package Management, Cont.
- "The Purely Functional Software Deployment Model"
- Nix package manager
- every program/library in its own directory
- *built software is a function of build inputs*
- store names
  - hash of all sources & dependencies
  - dependency update → new store name for each dependent
    - tricks to limit rebuilds & space usage
- dependencies
  - creative use of rpath, etc.
    - e.g., =/store/a6kkajhmaymz2rmx7m29bp9aqh9pawrx-libx11-1.8.12/= points to
      =/store/yj053cys0724p7vs9kir808x7fivz17m-glibc-2.41=
  - no dependence on, e.g., =/usr/bin/python=
  - versions can coexist

* Functional Package Management, Cont…
- "The Purely Functional Software Deployment Model"
- Nix package manager
- every program/library in its own directory
- *built software is a function of build inputs*
- store names
- dependencies
  - creative use of rpath, etc.
    - e.g., =/store/a6kkajhmaymz2rmx7m29bp9aqh9pawrx-libx11-1.8.12/= points to
      =/store/yj053cys0724p7vs9kir808x7fivz17m-glibc-2.41=
  - no dependence on, e.g., =/usr/bin/python=
  - versions can coexist

* Nix
- NixOS
- Nixpkgs
  - stable & rolling release
- DSL for recipes
  - packaging as code!
  - 1 file — many definitions
    - compare to Debian…
- Nix daemon
- local builds & substitutes
- declarative system configuration

* GNU Guix
- 2014
- Nix's sibling
- Guile Scheme (Lisp family)
  - package definitions
  - system configuration
  - package manager itself
- used for Bitcoin Core releases (since 2021)

* GNU Guix Features
- full source bootstrap merged in
- reproducibility tests
  - =guix challenge=
  - =guix build --rounds=N=
- hermeticity enforced on builds
- cross-compilation (works in some cases)
- ad-hoc shells
  - & containers
- grafts
- VM images, container images
- =guix deploy= (like Ansible, but for Guix)
- policies: libre software only
- patching packages (code & command line)
  - =guix shell yt-dlp mpv --with-branch=yt-dlp=master=
- time-traveling
- more!
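
The idea behind repeated-build checks such as =guix build --rounds=N= can be sketched as follows: build the same thing several times and compare output hashes. The helper and the two toy "builds" are hypothetical and say nothing about how Guix implements the check.

#+begin_src python
import hashlib
import itertools
from typing import Callable


def reproducible(build: Callable[[], bytes], rounds: int = 2) -> bool:
    """Run the build several times and check that all outputs hash the same."""
    digests = {hashlib.sha256(build()).hexdigest() for _ in range(rounds)}
    return len(digests) == 1


job_ids = itertools.count(1)


def stable_build() -> bytes:
    return b"hello binary"


def leaky_build() -> bytes:
    # Embeds a per-run value, much like a timestamp or a random temp path.
    return b"hello binary, job %d" % next(job_ids)


print(reproducible(stable_build, rounds=3))  # True
print(reproducible(leaky_build, rounds=3))   # False
#+end_src

Passing such a check on one machine does not prove reproducibility. Differences that depend on the host (kernel, CPU count, locale) only show up when someone else rebuilds.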

* Functional Package Management — Drawbacks, Problems
- speed (recompilations)
- substitute availability
- limited compatibility (no FHS)
  - but: Guix's container shells with FHS emulation
- live updates
  - e.g., Apache config & declarative service definition
- feature-completeness
  - declarative configuration interfaces — all options of all services
  - secrets deployment (declaratively defined GNU Guix systems)
  - Gentoo =USE= flags (equivalent still lacking)
  - package count (Debian > GNU Guix)
  - "cheating"
    - prebuilt npm et al packages in Nix
- no "stable" branch in GNU Guix
- Microsoft bought a GUIX trademark (used for something else)

* Local Variables
# Local Variables:
# org-image-actual-width: nil
# End: