#+title: Filesystems
#+date: 2026-03-16 Mon
#+author: W. Kosior
#+email: wkosior@agh.edu.pl

* UID, GID
- =getuid()=
- =getgid()=
- =getgroups()=

* file permission bits
- =rwx-rwx-rwx=
- 4, 2, 1
- root

* inodes, links
- inode ← file information
  - file names *ABSENT*
- hard links
  - many fs paths, 1 inode
  - counted in inode
  - regular files only
  - metadata changes affect all links
- symlinks
  - link permissions do not matter (or they do, see macOS)
- Windows / NTFS has
  - hard links
  - junctions
  - symbolic links (for UNIX compatibility)

* setuid, setgid (EUID, EGID)
- =geteuid()=
- =getegid()=
- saved user / group ID (=setreuid()= & friends)
- =man credentials=
- +shell scripts+
- setgid on directories → group ownership auto copied to new files

* Sticky Bit
- inhibit non-owner file removal
- directories only
  - historically used on files on some systems

* Giving Files
- no SUID behavior on directories?

* Filesystem Quotas
- limiting filesystem usage by users / groups
- mount options =usrquota= / =grpquota=
- =aquota.user= / =aquota.group= (formerly without leading "a")
- details may vary between filesystems / operating systems
  - tmpfs quota only available since Linux 6.6 (2023)!
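
Whether quotas are enabled on a mounted filesystem can be read off its mount options. The following is a minimal sketch, assuming mount table text in the =/proc/mounts= layout (device, mount point, type, options, ...). The =quota_mounts= helper and the sample entries are hypothetical, and some filesystems report quota state under other option names.

#+begin_src python
QUOTA_OPTIONS = {"usrquota", "grpquota"}

def quota_mounts(mounts_text):
    """Map mount point -> sorted quota options found in its option list."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip blank, truncated and comment lines.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        mount_point = fields[1]
        options = set(fields[3].split(","))
        enabled = options & QUOTA_OPTIONS
        if enabled:
            result[mount_point] = sorted(enabled)
    return result

# Hypothetical sample entries, made up for illustration only.
SAMPLE = """\
/dev/sda2 /home ext4 rw,relatime,usrquota,grpquota 0 0
/dev/sda1 / ext4 rw,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev,usrquota 0 0
"""

if __name__ == "__main__":
    for mount_point, opts in quota_mounts(SAMPLE).items():
        print(mount_point, ",".join(opts))
#+end_src

On a Linux system the same function could be fed the contents of =/proc/mounts= instead of the sample text.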
* Filesystem Quotas, Diving in
- *separate inodes and blocks quotas*
  - space for inodes reserved a priori on ext filesystems
    - can be tuned
  - millions of small files → fs inoperable
- hard & soft limits
  - can be exceeded temporarily (soft)
    - can send warnings to the user
  - cannot be exceeded (hard)

* Other Resource Quotas
- =setrlimit()= / =getrlimit()=
- hard / soft limits can be lowered
- root can raise limits
  - non-root can only raise soft (up to the hard limit)
- limits on
  - absolute cpu time
  - file descriptor count
  - user thread count
  - memory used (virtual, RSS, stack)

* Cgroups
- Linux-specific
- groups of processes (group hierarchies)
- apply policies to groups
  - resource limiting
  - accounting
  - control
    - kill the entire group at once
    - freeze the entire group at once
- can be configured through a special fs
  - typically mounted at =/sys/fs/cgroup/=
- will also be very useful to us later (containers)

* =setuid()=
- root only
  - but… we'll cover capabilities in a minute

* New Process with a Different UID: Cases
- login manager
- su
- SSH
- cron
- inetd
- ...

* Threads
- seen as processes by Linux kernel
- shared resources (memory, etc.)
- thread-local storage

* sudo
- =/etc/sudoers=
  - visudo
  - =man sudo=
- sudoedit
- new process environment
- workaround for setuid scripts

* doas
- OpenBSD
- 2015
- fraction of sudo's code size

* Use of System (aka Non-Human) Accounts
- daemons
- =cat /etc/passwd=
- privilege dropping (=bind()= & =setuid()=)
  - re-binding after config update?
- rootless X11

* IPC
- semaphores, pipes, sockets, signals, shared mem…
- root can send signals to all
- users can send signals to their own processes
- sockets
  - local client authentication
  - sending file descriptor to other local process
    - Linux-based systems
    - sockets themselves operated through fds…
    - sending socket fd over socket
      - zero downtime service updates
      - binding to privileged ports and sending sockfd
- IPC namespace (future topic)

* =ptrace=
- traditional UNIX syscall
- used by debuggers
  - signal reception interception
  - operations on traced process' memory
- syscall interception
  - used by PRoot
  - used by User Mode Linux

* =ptrace= Security
- can only trace a process if the tracer either
  - has =CAP_SYS_PTRACE=
  - can send signals to it and the process is not SUID / SGID
- other ways of limitation (Linux Security Modules)

* Attributes
- examples:
  - append only (a)
  - compressed (c)
  - immutable (i)
  - data journaling (j)
  - no compression (m)
  - no atime update (A)
  - no copy on write (C)
- BSD file flags: analogous

* Extended Attributes
- arbitrary name+value pairs on files
- several uses
  - file mime type
  - backing up NTFS files (alternative data streams)
  - *security-related metadata (discussed next)*

* POSIX ACLs
- more flexible file access permissions
- supported by most popular UNIX filesystems
  - Linux → =acl= mount option needed
- user, e.g., "u:1000:rw"
- group, e.g., "g:hackers:r"
- mask, e.g., "m:rx"
- plus traditional permissions (represented as part of the ACL)
- default ACL → directory → created child ACL
- check order: owner → user → owning group → group → other
  - and mask
- entry creation order does not matter

* +POSIX+ Linux Capabilities
- failure at standardization
- privileges of root decoupled
- examples:
  - =CAP_NET_BIND_SERVICE=
  - =CAP_NET_RAW=
  - =CAP_SYS_TIME=
  - more… (limit of 64, formerly 32)
  - =CAP_SYS_ADMIN= ← overloaded
- use +setuid+ *setcap* binaries

* Capabilities, Diving in
- threads have caps
- executables have caps
  - ignored when mounted as nosuid
  - ignored when ptrace in use
  - …
- =capget= / =capset= syscalls
- =prctl= syscall

* Capability Sets
- Permitted
- Inheritable

Permitted → =capset()= → Inheritable: always allowed.

/*nothing*/ → =capset()= → Inheritable: if process has =CAP_SETPCAP=.

* Capabilities on =exec=
- P: process
- F: file

P_prm' := F_prm | (P_inh & F_inh) \\
P_inh' := P_inh

* Capability Sets, Cont.
- Permitted
- Inheritable
- Effective

On a file, Effective is a bit, not a set.

* Capabilities on =exec=, Cont.
- P: process
- F: file

P_prm' := F_prm | (P_inh & F_inh) \\
P_inh' := P_inh \\
P_eff' := F_eff ? P_prm' : 0

Effective bit useful for "dumb" binaries.

* Capability Sets, Cont…
- Permitted (processes and files)
- Effective (processes and files)
- Inheritable (processes and files)
- Bounding (processes *only*)

=prctl(PR_CAPBSET_DROP)= ← can remove from bounding set, not add

Bounding limits *adding to* Inheritable.

* Capabilities on =exec=, Cont.
- P: process
- F: file

P_prm' := (F_prm & P_bnd) | (P_inh & F_inh) \\
P_inh' := P_inh \\
P_eff' := F_eff ? P_prm' : 0 \\
P_bnd' := P_bnd

* Capability Sets, Cont…
- Permitted (processes and files)
- Effective (processes and files)
- Inheritable (processes and files)
- Bounding (processes *only*)
- Ambient (processes *only*)

=prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, …)= ← if cap already in Permitted & Inheritable

* Capabilities on =exec=, Cont.
- P: process
- F: file

P_amb' := F_has_caps ? 0 : P_amb \\
P_prm' := (F_prm & P_bnd) | (P_inh & F_inh) | P_amb' \\
P_inh' := P_inh \\
P_eff' := F_eff ? P_prm' : P_amb' \\
P_bnd' := P_bnd

Capability preservation on ordinary =exec=.

Note that file can *have* cap sets that are empty.

* Capabilities in =exec= by Root
P_prm' := P_inh | P_bnd \\
P_eff' := P_prm'

Also applies at the moment of gaining root through SUID binary…

… except when that binary has caps itself.
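
The =exec= transformation can be tried out on small sets. The following is a minimal sketch that models capability sets as Python frozensets of names. It follows the rule as stated in =capabilities(7)=, where the bounding set masks only the file's Permitted set. The =ProcCaps= and =FileCaps= types and both helper functions are hypothetical names, and set-user-ID details (including the exception for SUID binaries that carry caps) are not modelled.

#+begin_src python
from dataclasses import dataclass, replace

EMPTY = frozenset()

@dataclass(frozen=True)
class ProcCaps:
    prm: frozenset = EMPTY   # Permitted
    inh: frozenset = EMPTY   # Inheritable
    eff: frozenset = EMPTY   # Effective
    bnd: frozenset = EMPTY   # Bounding
    amb: frozenset = EMPTY   # Ambient

@dataclass(frozen=True)
class FileCaps:
    has_caps: bool = False   # file carries capability metadata at all
    prm: frozenset = EMPTY
    inh: frozenset = EMPTY
    eff: bool = False        # on a file, Effective is a single bit

def exec_caps(p, f):
    """Capability sets of process p after an ordinary exec of file f."""
    amb = EMPTY if f.has_caps else p.amb
    prm = (f.prm & p.bnd) | (p.inh & f.inh) | amb
    eff = prm if f.eff else amb
    return ProcCaps(prm=prm, inh=p.inh, eff=eff, bnd=p.bnd, amb=amb)

def exec_caps_as_root(p):
    """Root rule: P_prm' = P_inh | P_bnd, P_eff' = P_prm'.
    The other sets are left untouched in this sketch."""
    prm = p.inh | p.bnd
    return replace(p, prm=prm, eff=prm)

ALL = frozenset({"CAP_NET_BIND_SERVICE", "CAP_NET_RAW", "CAP_SYS_TIME"})
BIND = frozenset({"CAP_NET_BIND_SERVICE"})
RAW = frozenset({"CAP_NET_RAW"})

# Unprivileged process runs a binary with a file cap and the effective bit.
user = ProcCaps(bnd=ALL)
raw_tool = FileCaps(has_caps=True, prm=RAW, eff=True)

# Process holding an ambient cap runs a plain binary, then one whose
# cap sets exist but are empty.
amb_proc = ProcCaps(prm=BIND, inh=BIND, bnd=ALL, amb=BIND)
plain = FileCaps()
empty_caps = FileCaps(has_caps=True)

if __name__ == "__main__":
    print(sorted(exec_caps(user, raw_tool).eff))
    print(sorted(exec_caps(amb_proc, plain).eff))
    print(sorted(exec_caps(amb_proc, empty_caps).eff))
    print(sorted(exec_caps_as_root(user).eff))
#+end_src

In this model the ambient capability survives the =exec= of the plain binary but is dropped by the file with empty cap sets, which is why the difference between "no caps" and "empty caps" matters.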
* Windows SIDs
- instead of UID/GID
- well-known SIDs
  - =S-1-1-0= ("World", aka "Everyone")
- =S-1-5-21-1004336348-1177238915-682003330-512=
  - meant to be globally unique (GUID)

* Windows tokens and logon
- LSA
  - SAM database
  - remote authentication (domains, future topic)
    - NTLM
    - Kerberos
- access tokens
  - kernel objects
    - operation → handles
  - contain user & group SIDs
- threads & processes
  - associated tokens
  - primary / impersonating

* S4U
- included in a Kerberos extension
- equivalent of =setuid()=
- accessible to Local System ("SYSTEM")
- used in place of =NtCreateToken()=
  - OpenSSH server on Windows
  - Cygwin

"Local Security Authority Subsystem Service (LSASS) stores credentials in memory on behalf of users with active Windows sessions."

[[https://learn.microsoft.com/en-us/windows-server/security/windows-authentication/credentials-processes-in-windows-authentication#services-and-kernel-mode][https://learn.microsoft.com/
en-us/windows-server/security/windows-authentication/
credentials-processes-in-windows-authentication#services-and-kernel-mode]]

* Windows ACLs
- composed of ACEs
- processed *in order*
  - unlike POSIX ACLs
- can deny access
  - unlike POSIX ACLs
- not accounting for POSIX ACL special cases
  - mask
  - group with narrower permissions than "other"
- could be stored in xattrs on *NIX
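
The effect of in-order processing combined with deny entries can be sketched as follows. This is a simplified model and not the Windows API: the =Ace= type, the =access_check= helper, the right names and the user SID are hypothetical, while =S-1-1-0= is the well-known "Everyone" SID.

#+begin_src python
from typing import NamedTuple

class Ace(NamedTuple):
    kind: str           # "allow" or "deny"
    sid: str            # SID the entry applies to
    rights: frozenset   # rights named by the entry

def access_check(acl, token_sids, wanted):
    """Walk ACEs in order; the first decisive entry wins."""
    remaining = set(wanted)
    if not remaining:
        return True
    for ace in acl:
        if ace.sid not in token_sids:
            continue                    # entry is about someone else
        if ace.kind == "deny":
            if remaining & ace.rights:
                return False            # a still-needed right is denied
        else:
            remaining -= ace.rights     # allow entries add up
            if not remaining:
                return True
    return False                        # not everything was granted

EVERYONE = "S-1-1-0"
USER = "S-1-5-21-EXAMPLE-1001"          # hypothetical placeholder SID

deny_user_write = Ace("deny", USER, frozenset({"write"}))
allow_all_rw = Ace("allow", EVERYONE, frozenset({"read", "write"}))

deny_first = [deny_user_write, allow_all_rw]
allow_first = [allow_all_rw, deny_user_write]
token = {USER, EVERYONE}

if __name__ == "__main__":
    print(access_check(deny_first, token, {"write"}))
    print(access_check(allow_first, token, {"write"}))
#+end_src

The two lists contain the same entries, yet only the order with the deny entry first blocks the write. With POSIX ACLs the order in which entries were created has no such effect.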