#+title: BPF
#+date: 2026-05-11 Mon
#+author: W. Kosior
#+email: wkosior@agh.edu.pl

* BPF
- 1991 (BSD)
- Berkeley Packet Filter
- VM in a kernel
- tcpdump
- https://www.tcpdump.org/papers/bpf-usenix93.pdf

* eBPF
- 2014 (kernel 3.18)
  - 2021 (Windows implementation)
- extended BPF
- C → BPF program
- loading
  - =CAP_SYS_ADMIN=
  - =CAP_BPF= since kernel 5.8
- verification
- efficient in-kernel operation
- unambiguous naming: eBPF & cBPF (classic BPF)

* eBPF VM
- registers R0-R10
  - R10: read-only frame pointer
  - 64-bit
- helper functions (side effects)
  - safely read kernel & userspace
    - =bpf_probe_read_kernel()=, =bpf_core_read()= & friends
    - direct pointer dereference = verification failure
  - traverse kernel data structures
  - kill a process
  - write to userspace
  - …
- dedicated LLVM & GCC target
  - use of object file sections
  - https://eunomia.dev/tutorials/1-helloworld/

* eBPF Verification
- no arbitrary memory writes
  - static bounds checking
- no direct kernel memory access
- loops
  - none (originally)
  - bounded only (since 2019; kernel 5.3)
- execution paths checked during loading
- helper functions

* eBPF Compilation & Loading
- libbpf
  - =program.bpf.c= → =program.skel.h=
    - compiled BPF → C array blob
    - =.bpf.c=: one of several naming conventions
  - =program_bpf__open()= & =program_bpf__load()= & =program_bpf__attach()=
  - https://gothub.r4fo.com/libbpf/libbpf-bootstrap/blob/master/examples/c/fentry.c
- other toolsets

* eBPF Communication with Userspace
- =bpf_printk()=
  - =sudo cat /sys/kernel/debug/tracing/trace_pipe=
- data structures (maps)

* eBPF Data Structures
- array (=BPF_MAP_TYPE_ARRAY=)
  - optionally per-CPU
  - https://gothub.r4fo.com/libbpf/libbpf-bootstrap/blob/master/examples/c/minimal_legacy.bpf.c
  - https://gothub.r4fo.com/libbpf/libbpf-bootstrap/blob/master/examples/c/minimal_legacy.c
- hash (=BPF_MAP_TYPE_HASH=)
  - optionally per-CPU
- perf buffers (=BPF_MAP_TYPE_PERF_EVENT_ARRAY=)
  - per-CPU
  - efficient polling possible
  - https://gothub.r4fo.com/libbpf/libbpf-bootstrap/blob/master/examples/c/bootstrap_legacy.bpf.c
  - https://gothub.r4fo.com/libbpf/libbpf-bootstrap/blob/master/examples/c/bootstrap_legacy.c
- ring buffers (=BPF_MAP_TYPE_RINGBUF=)
  - reserve & submit (spinlock)
  - global (shared across CPUs) alternative to perf buffers

* eBPF Attachment Points
- kprobes
  - & kretprobes
- uprobes
  - & uretprobes
- tracepoints
- ingress / egress
- XDP
- LSM hooks
- …

* Kprobe eBPF
- (almost) arbitrary kernel code
  - even inside functions!
  - =int3= (or equivalent on non-x86) injection
  - function exit (kretprobe)
- *predates eBPF*
  - usable from kernel modules
- analyze registers
  - function arguments
- traverse kernel data structures
- https://gothub.r4fo.com/libbpf/libbpf-bootstrap/blob/master/examples/c/kprobe.bpf.c

* Uprobe eBPF
- attach to arbitrary program code
  - e.g., TLS routines :)
  - function exit (uretprobe)
- *also predates eBPF*
- can read userspace
  - access controls? *beware of TOCTOU*
- *can modify userspace*
- no ptrace needed
  - higher efficiency, less intrusive
- https://gothub.r4fo.com/libbpf/libbpf-bootstrap/blob/master/examples/c/uprobe.bpf.c
- User Statically Defined Tracepoints (USDT)
  - non-pre-compiled languages

* Tracepoint eBPF
- attach to non-arbitrary points in kernel
  - explicitly exposed by kernel programmers
  - higher stability than kprobes
- =/sys/kernel/debug/tracing/events/=
- =bootstrap_legacy.bpf.c= & =minimal_legacy.bpf.c= from before

* Tracepoint Format Listing
#+begin_example
# cat /sys/kernel/debug/tracing/events/sched/sched_process_exec/format
name: sched_process_exec
ID: 296
format:
    field:unsigned short common_type;    offset:0;    size:2;    signed:0;
    field:unsigned char common_flags;    offset:2;    size:1;    signed:0;
    field:unsigned char common_preempt_count;    offset:3;    size:1;    signed:0;
    field:int common_pid;    offset:4;    size:4;    signed:1;

    field:__data_loc char[] filename;    offset:8;    size:4;    signed:0;
    field:pid_t pid;    offset:12;    size:4;    signed:1;
    field:pid_t old_pid;    offset:16;    size:4;    signed:1;

print fmt: "filename=%s pid=%d old_pid=%d", __get_str(filename), REC->pid, REC->old_pid
#+end_example

* ingress / egress eBPF
- Traffic Control (*also predates eBPF*)
- packet dropping, redirection & modification
  - =__sk_buff= (writable)
- return codes
  - =TC_ACT_UNSPEC=
  - =TC_ACT_OK=
  - =TC_ACT_RECLASSIFY=
  - =TC_ACT_SHOT=
  - =TC_ACT_PIPE=
  - …
- https://upload.wikimedia.org/wikipedia/commons/3/37/Netfilter-packet-flow.svg
- https://gothub.r4fo.com/libbpf/libbpf-bootstrap/blob/master/examples/c/tc.bpf.c
- https://gothub.r4fo.com/libbpf/libbpf-bootstrap/blob/master/examples/c/tc.c
  - interface name → interface index

* eXpress Data Path
- "zero-copy"
- even quicker redirects, etc.
- return codes
  - =XDP_ABORTED=
  - =XDP_DROP=
  - =XDP_PASS=
  - =XDP_TX=: back out on the same NIC!
  - =XDP_REDIRECT=
- NIC offloading
  - few cards
- https://gothub.r4fo.com/haolipeng/ebpf-tutorial/blob/master/src/18-xdp-filter/xdp_filter.bpf.c

* Kernel Struct Field Accesses
- direct pointer dereference only on BPF-owned memory
- =bpf_probe_read_kernel()=: read kernel memory
  - size & offset
- =bpf_probe_read_kernel()= → =bpf_core_read()=
  - type safety
- =bpf_core_read()= → =BPF_CORE_READ()=
  - replaces multiple =bpf_core_read()=
  - https://gothub.r4fo.com/libbpf/libbpf-bootstrap/blob/master/examples/c/bootstrap.bpf.c
  - =BPF_CORE_READ(task, real_parent, tgid)= → =task->real_parent->tgid=
- BPF Type Format (BTF)
- "Compile Once Run Everywhere" ("CO-RE")

* BPF Use-Cases
- load-balancing (XDP)
- DDoS mitigation (XDP)
- QoS / traffic shaping
- monitoring
- access restrictions (seccomp-bpf, LSM)
- debugging
- snooping on / hijacking userspace

* Tools (Development Libraries)
- libbpf (C)
- libbpfgo (Go) (libbpf wrapper)
- Cilium/eBPF (Go)
- aya (Rust)
- libbpf-rs (Rust)

* Tools (BPF Users)
- bpftrace
  - https://bpftrace.org/docs/release_023/cli
- bpfilter (more efficient iptables?)
- seccomp-bpf (cBPF)
  - https://kernel-internals.org/security/seccomp/

* Omitted Tracing Facilities
- DTrace
- ftrace (can also be used by eBPF)
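
* Decoding =__data_loc= Fields (Sketch)
The =__data_loc char[] filename= field in the =sched_process_exec= format listing is only 4 bytes wide: it does not hold the string itself, only a descriptor. The lower 16 bits give the string's offset from the start of the record and the upper 16 bits give its length. The sketch below shows how a consumer of raw tracepoint records could resolve that descriptor in plain userspace C. It is an illustration under stated assumptions, not code from libbpf or the kernel: =struct exec_record=, =data_loc_str()=, =build_sample_record()= and =print_exec_record()= are hypothetical names, the sample record is hand-built rather than captured from a kernel, and the length is assumed to include the terminating NUL.

#+begin_src c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fixed part of a sched_process_exec record, mirroring the format listing.
 * Hypothetical struct name; the field order and sizes come from the listing. */
struct exec_record {
    uint16_t common_type;
    uint8_t  common_flags;
    uint8_t  common_preempt_count;
    int32_t  common_pid;
    uint32_t filename_loc;   /* __data_loc char[] filename */
    int32_t  pid;
    int32_t  old_pid;
};

/* Check that the C layout agrees with the offsets shown in the listing. */
static_assert(offsetof(struct exec_record, filename_loc) == 8, "filename at 8");
static_assert(offsetof(struct exec_record, pid) == 12, "pid at 12");
static_assert(offsetof(struct exec_record, old_pid) == 16, "old_pid at 16");

/* Hypothetical helper: resolve a __data_loc descriptor to a string inside the
 * record. Returns NULL if the described range is empty, falls outside the
 * record, or is not NUL-terminated. */
static const char *data_loc_str(const unsigned char *rec, size_t rec_size,
                                uint32_t loc, uint16_t *len_out)
{
    uint16_t off = (uint16_t)(loc & 0xffff);   /* lower 16 bits: offset */
    uint16_t len = (uint16_t)(loc >> 16);      /* upper 16 bits: length */

    if (len == 0 || (size_t)off + len > rec_size)
        return NULL;
    if (rec[off + len - 1] != '\0')
        return NULL;
    if (len_out)
        *len_out = len;
    return (const char *)(rec + off);
}

/* Hypothetical helper: hand-build a sample record (NOT real kernel output),
 * placing the string right after the fixed part. Returns the record size,
 * or 0 if it does not fit into buf. */
static size_t build_sample_record(unsigned char *buf, size_t cap,
                                  const char *path, int32_t pid)
{
    size_t len = strlen(path) + 1;             /* assumed to include the NUL */
    size_t off = sizeof(struct exec_record);

    if (len > 0xffff || off + len > cap)
        return 0;

    struct exec_record rec = {0};
    rec.common_pid = pid;
    rec.pid = pid;
    rec.old_pid = pid;
    rec.filename_loc = ((uint32_t)len << 16) | (uint32_t)off;

    memcpy(buf, &rec, sizeof rec);
    memcpy(buf + off, path, len);
    return off + len;
}

/* Print a record in the shape of the tracepoint's "print fmt" line. */
static void print_exec_record(const unsigned char *buf, size_t size)
{
    struct exec_record rec;

    if (size < sizeof rec)
        return;
    memcpy(&rec, buf, sizeof rec);             /* avoid unaligned access */

    const char *name = data_loc_str(buf, size, rec.filename_loc, NULL);
    printf("filename=%s pid=%d old_pid=%d\n",
           name ? name : "?", rec.pid, rec.old_pid);
}
#+end_src

Calling =build_sample_record()= on a local buffer and passing the result to =print_exec_record()= prints one line in the same shape as the =print fmt= entry of the listing. The bounds and NUL checks in =data_loc_str()= are a design choice of this sketch: the descriptor is data, so a consumer should not trust it blindly.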