Device Whitelist Controller

1. Description:

Implement a cgroup to track and enforce open and mknod restrictions
on device files.  A device cgroup associates a device access
whitelist with each cgroup.  A whitelist entry has 4 fields: type,
major, minor, and access.  'type' is a (all), c (char), or b (block).
'all' means it applies to all types and all major and minor numbers.
Major and minor are each either an integer or * for all.  Access is
a composition of r (read), w (write), and m (mknod).

The root device cgroup starts with rwm to 'all'.  A child device
cgroup gets a copy of the parent.  Administrators can then remove
devices from the whitelist or add new entries.  A child cgroup can
never receive a device access which is denied by its parent.

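The entry format described above can be checked before an entry is
written to a control file.  The following is a minimal sketch, not
part of the kernel or of any existing tool: the name valid_entry is
hypothetical, and the pattern assumes fields are separated by single
spaces, as in the examples in this document.

```shell
# Hypothetical helper: succeed if "$1" looks like a whitelist entry of the
# form "type major:minor access".  This is an illustration only; the kernel
# does its own parsing and may accept or reject inputs this check does not.
valid_entry() {
    # A bare 'a' is the shorthand used in this document for "a *:* rwm".
    if [ "$1" = "a" ]; then
        return 0
    fi
    # type: a, b or c; major and minor: an integer or '*';
    # access: one to three of the letters r, w, m (repeats are not rejected).
    printf '%s\n' "$1" | grep -Eq '^[abc] (\*|[0-9]+):(\*|[0-9]+) [rwm]{1,3}$'
}
```

A caller could use it as a guard, for example
valid_entry "$entry" && echo "$entry" > "$cgroup_dir/devices.allow",
where cgroup_dir is whatever directory the cgroup is mounted under.
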
2. User Interface

An entry is added using devices.allow, and removed using
devices.deny.  For instance

	echo 'c 1:3 mr' > /sys/fs/cgroup/1/devices.allow

allows cgroup 1 to read and mknod the device usually known as
/dev/null.  Doing

	echo a > /sys/fs/cgroup/1/devices.deny

will remove the default 'a *:* rwm' entry.  Doing

	echo a > /sys/fs/cgroup/1/devices.allow

will add the 'a *:* rwm' entry to the whitelist.

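The two control files can be combined to replace a cgroup's whitelist
with an explicit list of entries.  The following is a sketch under
stated assumptions: the function name set_whitelist is hypothetical,
the cgroup directory is passed in by the caller, and the caller is
assumed to have the privilege needed to write to the control files.
Writing 'a' is expected to fail once the cgroup has children (see the
Hierarchy section), in which case the function stops early.

```shell
# Hypothetical helper: deny everything, then allow only the listed entries.
# $1 is the cgroup directory (for example /sys/fs/cgroup/1 as used above);
# the remaining arguments are whitelist entries such as 'c 1:3 rwm'.
set_whitelist() {
    cg=$1
    shift
    # Remove the default 'a *:* rwm' entry first.
    echo a > "$cg/devices.deny" || return 1
    # Add the entries back one per write, as in the examples above.
    for entry in "$@"; do
        echo "$entry" > "$cg/devices.allow" || return 1
    done
}
```

For instance, set_whitelist /sys/fs/cgroup/1 'c 1:3 rwm' 'c 1:5 r'
would leave cgroup 1 with only those two entries, assuming both are
permitted by its parent.
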
3. Security

Any task can move itself between cgroups.  This clearly won't
suffice, but we can decide the best way to adequately restrict
movement as people get some experience with this.  We may just want
to require CAP_SYS_ADMIN, which at least is a separate bit from
CAP_MKNOD.  We may want to just refuse moving to a cgroup which
isn't a descendant of the current one.  Or we may want to use
CAP_MAC_ADMIN, since we really are trying to lock down root.

CAP_SYS_ADMIN is needed to modify the whitelist or move another
task to a new cgroup.  (Again, we'll probably want to change that.)

A cgroup may not be granted more permissions than the cgroup's
parent has.

4. Hierarchy

Device cgroups maintain hierarchy by making sure a cgroup never has more
access permissions than its parent.  Every time an entry is written to
a cgroup's devices.deny file, all its children will have that entry removed
from their whitelist and all the locally set whitelist entries will be
re-evaluated.  If one of the locally set whitelist entries would provide
more access than the cgroup's parent, it'll be removed from the whitelist.

Example:
      A
     / \
        B

    group        behavior        exceptions
    A            allow           "b 8:* rwm", "c 116:1 rw"
    B            deny            "c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm"

If a device is denied in group A:
	# echo "c 116:* r" > A/devices.deny
it'll propagate down and after revalidating B's entries, the whitelist entry
"c 116:2 rwm" will be removed:

    group        whitelist entries                        denied devices
    A            all                                      "b 8:* rwm", "c 116:* rw"
    B            "c 1:3 rwm", "b 3:* rwm"                 all the rest

If the parent's exceptions change and local exceptions are no longer
allowed, they'll be deleted.

Notice that new whitelist entries will not be propagated:
      A
     / \
        B

    group        whitelist entries                        denied devices
    A            "c 1:3 rwm", "c 1:5 r"                   all the rest
    B            "c 1:3 rwm", "c 1:5 r"                   all the rest

when adding "c *:3 rwm":
	# echo "c *:3 rwm" >A/devices.allow

the result:
    group        whitelist entries                        denied devices
    A            "c *:3 rwm", "c 1:5 r"                   all the rest
    B            "c 1:3 rwm", "c 1:5 r"                   all the rest

but now it'll be possible to add new entries to B:
	# echo "c 2:3 rwm" >B/devices.allow
	# echo "c 50:3 r" >B/devices.allow
or even
	# echo "c *:3 rwm" >B/devices.allow

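The reason B can accept "c 2:3 rwm" only after A has gained "c *:3 rwm"
can be illustrated with a small comparison of one child entry against
one parent entry.  This is a simplified sketch and not the kernel's
algorithm: the name entry_within is hypothetical, it looks at a single
parent entry rather than the parent's whole rule set, and it assumes
both entries are written as "type major:minor access".

```shell
# Hypothetical helper: succeed if whitelist entry $1 (child) grants no more
# access than whitelist entry $2 (parent).  Illustration only.
entry_within() {
    # Split each entry into type, major, minor and access fields.
    IFS=' :' read -r ctype cmaj cmin cacc <<EOF
$1
EOF
    IFS=' :' read -r ptype pmaj pmin pacc <<EOF
$2
EOF
    # The parent must cover the child's type, major and minor, either by
    # an exact match or by a wildcard ('a' for type, '*' for numbers).
    [ "$ptype" = a ] || [ "$ptype" = "$ctype" ] || return 1
    [ "$pmaj" = '*' ] || [ "$pmaj" = "$cmaj" ] || return 1
    [ "$pmin" = '*' ] || [ "$pmin" = "$cmin" ] || return 1
    # Every access letter requested by the child must be in the parent's.
    rest=$cacc
    while [ -n "$rest" ]; do
        letter=${rest%"${rest#?}"}
        rest=${rest#?}
        case $pacc in
            *"$letter"*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}
```

With the tables above, entry_within 'c 2:3 rwm' 'c 1:3 rwm' fails,
while entry_within 'c 2:3 rwm' 'c *:3 rwm' succeeds, which matches the
order of events in the example.
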
Allowing or denying all by writing 'a' to devices.allow or devices.deny will
not be possible once the device cgroup has children.

4.1 Hierarchy (internal implementation)

Device cgroups are implemented internally using a behavior (ALLOW, DENY) and a
list of exceptions.  The internal state is controlled using the same user
interface to preserve compatibility with the previous whitelist-only
implementation.  Removal or addition of exceptions that will reduce the access
to devices will be propagated down the hierarchy.
For every propagated exception, the effective rules will be re-evaluated based
on the current parent's access rules.