chroot()
/ for the process
chroot()
cd into directory that is then moved outside of chroot dir
chroot() without cd /, which defeats the purpose
chroot()
INADDR_ANY
uts, pid, net, etc. (see later)
file descriptor table (CLONE_FILES)
memory space (CLONE_VM)
signal handlers (CLONE_SIGHAND)
filesystem information (CLONE_FS)
uid 0 are actually process capabilities
CAP_CHOWN, make arbitrary changes to file IDs
CAP_KILL, send signals to arbitrary processes
CAP_SYS_ADMIN: “Note: this capability is overloaded” ☺
capabilities(7)
CONFIG_UTS_NS
CAP_SYS_ADMIN
CONFIG_IPC_NS
CAP_SYS_ADMIN
ipcs|wc -l → 48 entries on my machine
cgroups(7) - resource isolation/monitoring
CAP_SYS_ADMIN
CLONE_NEWNS
chroot() effect: isolates the directory hierarchy.
(MS_RDONLY, MS_NOSUID, MS_NOEXEC) and the “atime” flags become locked and cannot be changed anymore
CAP_SYS_ADMIN
CONFIG_NET_NS, but completed in 2.6.29
CAP_SYS_ADMIN
# ip -o l
1: lo: …
2: eth0: …
# unshare --net
# ip -o l
1: lo: …
#
veth(4)
# ip link add veth2-left type veth peer veth2-right
# ip link set veth2-right netns ns-right
CONFIG_PID_NS
CAP_SYS_ADMIN
getpid() should never change, so a process cannot change PID namespaces (unlike all other namespace types)
setns() only changes the namespace for future children of this process, not for the process itself
getppid() returns 0 in such cases
setns(): downwards, not upwards
SIGKILL
SIGKILL/SIGSTOP
reboot() in this namespace works (and terminates it)
test@debian:~$ id
uid=1001(test) gid=1001(test) groups=1001(test)
test@debian:~$ unshare --user
nobody@debian:~$ id; exit
uid=65534(nobody) gid=65534(nogroup) groups=65534(nogroup)
test@debian:~$ unshare --user --map-root-user
root@debian:~# id
uid=0(root) gid=0(root) groups=0(root)
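
The two sessions above differ only in whether the new user namespace has a UID mapping. A minimal Python sketch of that lookup, assuming the three-column `inside outside count` layout that user_namespaces(7) documents for /proc/$pid/uid_map. The helper names `parse_id_map`, `to_inside` and `to_outside` are made up for illustration and are not part of any real API, and the kernel's actual translation has more rules than this:

```python
OVERFLOW_ID = 65534  # usual default overflow ID, displayed as nobody/nogroup


def parse_id_map(text):
    """Parse uid_map/gid_map contents into (inside_start, outside_start, count) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            entries.append(tuple(int(field) for field in fields))
    return entries


def to_inside(outside_id, id_map):
    """ID as seen inside the namespace; unmapped IDs show up as the overflow ID."""
    for inside_start, outside_start, count in id_map:
        if outside_start <= outside_id < outside_start + count:
            return inside_start + (outside_id - outside_start)
    return OVERFLOW_ID


def to_outside(inside_id, id_map):
    """ID as seen from the parent namespace, or None if there is no mapping."""
    for inside_start, outside_start, count in id_map:
        if inside_start <= inside_id < inside_start + count:
            return outside_start + (inside_id - inside_start)
    return None


if __name__ == "__main__":
    no_map = parse_id_map("")                # plain `unshare --user`
    root_map = parse_id_map("0 1001 1\n")    # what --map-root-user sets up for uid 1001
    print(to_inside(1001, no_map))    # unmapped, so the overflow ID
    print(to_inside(1001, root_map))  # mapped to root inside the namespace
```

With an empty map every ID collapses to 65534, which is why `id` reports nobody/nogroup in the first session; with the single-entry map, uid 1001 outside appears as uid 0 inside.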
echo 1 > /proc/sys/kernel/unprivileged_userns_clone
test@debian:~$ unshare --user --map-root-user --mount
root@debian:~# df -h|grep /mnt
root@debian:~# mount -t tmpfs none /mnt/
root@debian:~# df -h|grep /mnt
none 998M 0 998M 0% /mnt
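
The mount above works because the process holds capabilities inside its own user namespace. A rough way to check which ones a process has is to decode the `CapEff` hex mask from /proc/self/status. The sketch below is in Python, with bit numbers taken from <linux/capability.h> and limited to the capabilities named in these notes. The function names `decode_caps` and `effective_caps` are hypothetical helpers, not an existing library:

```python
# Bit numbers from <linux/capability.h>; only the capabilities named in these notes.
CAP_BITS = {
    "CAP_CHOWN": 0,
    "CAP_KILL": 5,
    "CAP_SETGID": 6,
    "CAP_SETUID": 7,
    "CAP_NET_ADMIN": 12,
    "CAP_SYS_ADMIN": 21,
}


def decode_caps(hex_mask):
    """Return the sorted names (from CAP_BITS) whose bit is set in a hex mask."""
    mask = int(hex_mask, 16)
    return sorted(name for name, bit in CAP_BITS.items() if mask & (1 << bit))


def effective_caps(status_text):
    """Find the CapEff line in /proc/<pid>/status text and decode it."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return decode_caps(line.split()[1])
    return []


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as status_file:
            print(effective_caps(status_file.read()))
    except OSError:
        print("/proc/self/status is not available on this system")
```

Run as an ordinary user, this would normally print an empty list; run inside `unshare --user --map-root-user`, the list should include CAP_SYS_ADMIN, although that capability only applies to resources governed by the new namespace.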
CAP_SYS_ADMIN in a (non-initial) user namespace is not quite the real thing
(mknod)
(procfs, sysfs, devpts, tmpfs, ramfs, mqueue, bpf)
dmesg to read the kernel logs, if they were originally disallowed
(CLONE_NEWUSER flag) in 2.6.23, semantics changed to current ones in 3.5, and the final bits were added to make it fully usable in 3.8; set CONFIG_USER_NS
bpf mounting appeared in 4.4, cgroup configuration introduced in 4.6, etc.
CAP_SYS_ADMIN in the target namespace
CAP_NET_ADMIN
test@debian:~$ unshare --user --map-root-user
root@debian:~# iptables -L
iptables: Permission denied (you must be root).
root@debian:~# id
uid=0(root) gid=0(root) groups=0(root)
test@debian:~$ unshare --user --map-root-user
root@debian:~# date > /tmp/foo; exit
test@debian:~$ ls -l /tmp/foo
-rw-r--r-- 1 test test 29 Mar 30 15:29 /tmp/foo
test@debian:~$ unshare --user --map-root-user
root@debian:~# su - more-test
su: Authentication failure
test@debian:~$ ls -l /tmp/foo
-rw-r--r-- 1 more-test test 32 Mar 30 15:39 /tmp/foo
test@debian:~$ unshare --user --map-root-user
root@debian:~# ls -l /tmp/foo
-rw-r--r-- 1 nobody root 32 Mar 30 15:39 /tmp/foo
CAP_SETUID/CAP_SETGID can set arbitrary mappings
/proc/$pid/uid_map (and gid_map), and read user_namespaces(7)
(65534, nobody/nogroup)
(stat, getuid, chown, etc.) as appropriate, both inside and outside of the namespace
setuid/setgid programs works as expected if there is a mapping!
setns(2): switches one or more namespaces:
ls -l /proc/self/ns/
setns(2) takes as argument a file descriptor referring to one of the entries in that directory
unshare(2): unshares parts of the execution context
CLONE_* flags
clone(2) is very much worth reading, to understand how complex process relationships are
ioctl_ns(2) allows discovering some of the relationships between namespaces
unshare(1), newuidmap(1), newgidmap(1), etc.
(CLONE_NEWUSER)
(CLONE_NEWUTS)
(CLONE_NEWNS)
(CLONE_NEWPID)
(CLONE_NEWNET)
namespaces(7), and all the “SEE ALSO” pages
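
To tie the CLONE_NEW* flags to the entries under /proc/self/ns/, here is a small Python sketch. The flag values are the ones defined in <linux/sched.h>; the names `NS_FLAGS`, `flags_for` and `list_namespaces` are invented for this example. It only reads symlinks and builds a bitmask, and does not call unshare(2) or setns(2) itself:

```python
import os

# CLONE_NEW* values from <linux/sched.h>, keyed by the entry name under /proc/<pid>/ns/.
NS_FLAGS = {
    "mnt": 0x00020000,   # CLONE_NEWNS
    "uts": 0x04000000,   # CLONE_NEWUTS
    "ipc": 0x08000000,   # CLONE_NEWIPC
    "user": 0x10000000,  # CLONE_NEWUSER
    "pid": 0x20000000,   # CLONE_NEWPID
    "net": 0x40000000,   # CLONE_NEWNET
}


def flags_for(names):
    """OR together the CLONE_NEW* flags for the given namespace type names."""
    mask = 0
    for name in names:
        mask |= NS_FLAGS[name]
    return mask


def list_namespaces(pid="self"):
    """Map each entry in /proc/<pid>/ns to its symlink target; empty dict on failure."""
    ns_dir = f"/proc/{pid}/ns"
    found = {}
    try:
        for entry in sorted(os.listdir(ns_dir)):
            found[entry] = os.readlink(os.path.join(ns_dir, entry))
    except OSError:
        pass  # not Linux, no such process, or not permitted
    return found


if __name__ == "__main__":
    # Roughly the flag combination behind `unshare --user --mount`.
    print(hex(flags_for(["user", "mnt"])))
    for name, target in list_namespaces().items():
        print(name, target)
```

Two processes are in the same namespace of a given type when their symlink targets for that entry are identical, so comparing the output for two PIDs is a quick way to see what a container actually isolates.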