From: Eric Paris <eparis@redhat.com>
Date: Thu, 24 Jan 2008 10:40:18 -0500
Subject: [selinux] harden against null ptr dereference bugs
Message-id: 1201189218.3260.13.camel@localhost.localdomain
O-Subject: Re: [PATCH] harden against null pointer dereference bugs
Bugzilla: 233021

BZ 233021

The following is a mashup of a whole lot of my patches from upstream plus
some ABI hoops which I'm not going to talk about. I tried this patch (at
least part of it) in 5.1 and it broke stuff. Hopefully it will go a little
more smoothly this time, but we definitely want as much testing as we can
find. And I want more eyes looking for ways to make low memory addresses
accessible to userspace (I talk about that below).

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ed0321895182ffb6ecf210e066d87911b270d587
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8869477a49c3e99def1fcdadd6bbc407fea14b45
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7cd94146cd504016315608e297219f9fb7b1413b
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ecaf18c15aac8bb9bed7b7aa0e382fe252e275d5
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ab5a91a8364c3d6fc617abc47cc81d162c01d90a

****BIG NOTE, this patch turns things on by default*****
****BIG NOTE, this patch turns things on by default*****
****BIG NOTE, this patch turns things on by default*****

Hopefully with these patches users will not be able to make low addresses
of virtual memory accessible. The reason for this is to harden the kernel
against future unknown null pointer bugs.

Step 1) Find a way to make the kernel screw up and operate on memory in
        userspace.
Step 2) Put your own code where the kernel is about to access
Step 3) Execute the exploit you found in step 1
Step 4) .....
Step 5) Profit

My patch tries to attack this at Step 2. Everyone is already doing
everything they can to attack Step 1.
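
For anyone who wants to test this, here is a minimal userspace sketch of
what Step 2 looks like from an unprivileged process. It is not part of the
patch: probe_low_mapping() and report_page_zero() are made-up helper names,
and the sketch only exercises the mmap path with MAP_FIXED, not mremap,
expand_stack or do_brk.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on strict compilers */
#endif
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * probe_low_mapping() is a hypothetical helper, not part of the patch.
 * It asks for one anonymous page at exactly `addr` (MAP_FIXED) and
 * returns 0 if the kernel allowed it, or the errno from mmap() if not.
 */
static int probe_low_mapping(unsigned long addr)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	if (page <= 0)
		return EINVAL;

	p = mmap((void *)addr, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (p == MAP_FAILED)
		return errno;

	munmap(p, (size_t)page);	/* don't leave the page mapped */
	return 0;
}

/* Print a one-line verdict for page zero; also a made-up helper. */
static void report_page_zero(void)
{
	int err = probe_low_mapping(0);

	if (err == 0)
		printf("page zero is mappable by this process\n");
	else
		printf("page zero refused: %s\n", strerror(err));
}
```

Calling report_page_zero() from main() as an unprivileged user should give
the "refused" line while the protection is active, and the "mappable" line
once the check is disabled or the process has been granted the permission.
The exact errno depends on which security hook does the refusing.
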

Aside from random data corruption, the vast majority of kernel bad-pointer
bugs are null pointer dereferences. It's very easy to know where in
userspace memory those are going to try to operate (hint hint: null). If we
keep the user from using these pages the kernel can't be subverted; it will
instead oops() like it 'should'.

This protects mmap, mremap, expand_stack, and do_brk (specially crafted
binary) to keep users from getting access to the low pages. Please, if
anyone gets a minute and can think of any way to get low pages accessible
in userspace that I missed, let me know!

Who does this mess with/break?

1) X on i686 (ok, any user of vm86)
2) dosemu
3) maybe a company has a program which actually maps the zero page in
   hopes of using that to detect their own application's null pointers.

This permission can be allowed 1 of 3 ways.

1) If selinux is running, a custom policy module can be written to allow
   the application to map these pages. X is already given this permission
   as of 5.1.
2) If selinux is off, CAP_SYS_RAWIO is checked (which really just means
   root most of the time).
3) The tunable /proc/sys/vm/mmap_min_addr can be set to 0, which disables
   this check completely.

Upstream decided to take path 3 by default to save the 3 people in the
world who might have to otherwise tune it. I plan on using a change to
sysctl.conf in fedora to have it on by default....

(note to Don, do NOT rediff this patch, one small part of it was made using
-U 4 because -U 3 will misapply. If you rediff your kernel will build and
then go BOOM.
If you need to rediff chat with me)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 20d0d79..9f73447 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -24,6 +24,7 @@ Currently, these files are in /proc/sys/vm:
 - dirty_writeback_centisecs
 - max_map_count
 - min_free_kbytes
+- mmap_min_addr
 - laptop_mode
 - block_dump
 - drop-caches
@@ -110,6 +111,19 @@ a number of reserved free pages based proportionally on its size.
 
 ==============================================================
 
+mmap_min_addr
+
+This file indicates the amount of address space which a user process will be
+restricted from mmaping. Since kernel null dereference bugs could
+accidentally operate based on the information in the first couple of pages of
+memory userspace processes should not be allowed to write to them. By default
+this value is set to 0 and no protections will be enforced by the security
+module. Setting this value to something like 64k will allow the vast majority
+of applications to work correctly and provide defense in depth against future
+potential kernel bugs.
+
+==============================================================
+
 percpu_pagelist_fraction
 
 This is the fraction of pages at most (high mark pcp->high) in each zone that
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5d80deb..491eee4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -39,6 +39,10 @@ extern int sysctl_legacy_va_layout;
 #include <asm/pgtable.h>
 #include <asm/processor.h>
 
+#ifdef CONFIG_SECURITY
+extern unsigned long mmap_min_addr;
+#endif
+
 #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
 
 /*
@@ -545,6 +549,21 @@ static inline void set_page_links(struct page *page, unsigned long zone,
 }
 
 /*
+ * If a hint addr is less than mmap_min_addr change hint to be as
+ * low as possible but still greater than mmap_min_addr
+ */
+static inline unsigned long round_hint_to_min(unsigned long hint)
+{
+#ifdef CONFIG_SECURITY
+	hint &= PAGE_MASK;
+	if (((void *)hint != NULL) &&
+	    (hint < mmap_min_addr))
+		return PAGE_ALIGN(mmap_min_addr);
+#endif
+	return hint;
+}
+
+/*
  * Some inline functions in vmstat.h depend on page_zone()
  */
 #include <linux/vmstat.h>
diff --git a/include/linux/security.h b/include/linux/security.h
index 11fec5b..98fd375 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -71,6 +71,7 @@ struct xfrm_user_sec_ctx;
 extern int cap_netlink_send(struct sock *sk, struct sk_buff *skb);
 extern int cap_netlink_recv(struct sk_buff *skb, int cap);
 
+extern unsigned long mmap_min_addr;
 /*
  * Values used in the task_security_ops calls
  */
@@ -1396,6 +1397,12 @@ struct security_operations {
 
 #endif	/* CONFIG_KEYS */
 
+#ifndef __GENKSYMS__
+	int (*file_mmap_addr) (struct file * file,
+			       unsigned long reqprot,
+			       unsigned long prot, unsigned long flags,
+			       unsigned long addr, unsigned long addr_only);
+#endif
 };
 
 /* global variables */
@@ -1815,7 +1822,18 @@ static inline int security_file_mmap (struct file *file, unsigned long reqprot,
 				      unsigned long prot,
 				      unsigned long flags)
 {
-	return security_ops->file_mmap (file, reqprot, prot, flags);
+	return security_ops->file_mmap_addr (file, reqprot, prot, flags,
+					     (unsigned long)-1, 0);
+}
+
+static inline int security_file_mmap_addr (struct file *file, unsigned long reqprot,
+					   unsigned long prot,
+					   unsigned long flags,
+					   unsigned long addr,
+					   unsigned long addr_only)
+{
+	return security_ops->file_mmap_addr (file, reqprot, prot, flags,
+					     addr, addr_only);
 }
 
 static inline int security_file_mprotect (struct vm_area_struct *vma,
@@ -2493,6 +2511,15 @@ static inline int security_file_mmap (struct file *file, unsigned long reqprot,
 	return 0;
 }
 
+static inline int security_file_mmap_addr (struct file *file, unsigned long reqprot,
+					   unsigned long prot,
+					   unsigned long flags,
+					   unsigned long addr,
+					   unsigned long addr_only)
+{
+	return 0;
+}
+
 static inline int security_file_mprotect (struct vm_area_struct *vma,
 					  unsigned long reqprot,
 					  unsigned long prot)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 97b08e8..ac29a86 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -202,6 +202,7 @@ enum
 	VM_VDSO_ENABLED=34,	/* map VDSO into new processes? */
 	VM_MIN_SLAB=35,		/* Percent pages ignored by zone reclaim */
 	VM_PAGECACHE=37,	/* favor reclaiming unmapped pagecache pages */
+	VM_MMAP_MIN_ADDR=38,	/* amound of memory to protect from mmap */
 };
 
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index d248271..e935b76 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -827,7 +827,6 @@ static ctl_table kern_table[] = {
 		.proc_handler	= &proc_dointvec,
 	},
 #endif
-
 	{ .ctl_name = 0 }
 };
 
@@ -886,6 +885,16 @@ static ctl_table vm_table[] = {
 		.extra1		= &zero,
 		.extra2		= &one_hundred,
 	},
+#ifdef CONFIG_SECURITY
+	{
+		.ctl_name	= VM_MMAP_MIN_ADDR,
+		.procname	= "mmap_min_addr",
+		.data		= &mmap_min_addr,
+		.maxlen		= sizeof(unsigned long),
+		.mode		= 0644,
+		.proc_handler	= &proc_doulongvec_minmax,
+	},
+#endif
 	{
 		.ctl_name	= VM_DIRTY_WB_CS,
 		.procname	= "dirty_writeback_centisecs",
diff --git a/mm/mmap.c b/mm/mmap.c
index 883892c..8ce2fdf 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -931,6 +931,9 @@ unsigned long do_mmap_pgoff(struct file * file, unsigned long addr,
 	if (!len)
 		return -EINVAL;
 
+	if (!(flags & MAP_FIXED))
+		addr = round_hint_to_min(addr);
+
 	error = arch_mmap_check(addr, len, flags);
 	if (error)
 		return error;
@@ -1028,10 +1031,10 @@ unsigned long do_mmap_pgoff(struct file * file, unsigned long addr,
 		}
 	}
 
-	error = security_file_mmap(file, reqprot, prot, flags);
+	error = security_file_mmap_addr(file, reqprot, prot, flags, addr, 0);
 	if (error)
 		return error;
-
+
 	/* Clear old maps */
 	error = -ENOMEM;
 munmap_back:
@@ -1684,6 +1687,12 @@ int expand_stack(struct vm_area_struct *vma, unsigned long address)
 	 */
 	if (unlikely(anon_vma_prepare(vma)))
 		return -ENOMEM;
+
+	address &= PAGE_MASK;
+	error = security_file_mmap_addr(0, 0, 0, 0, address, 1);
+	if (error)
+		return error;
+
+
 	anon_vma_lock(vma);
 
 	/*
@@ -1691,8 +1700,6 @@ int expand_stack(struct vm_area_struct *vma, unsigned long address)
 	 * is required to hold the mmap_sem in read mode. We need the
 	 * anon_vma lock to serialize against concurrent expand_stacks.
 	 */
-	address &= PAGE_MASK;
-	error = 0;
 
 	/* Somebody else might have raced and expanded it already */
 	if (address < vma->vm_start) {
@@ -1974,6 +1981,10 @@ unsigned long do_brk(unsigned long addr, unsigned long len)
 	    is_hugepage_only_range(mm, addr, len))
 		return -EINVAL;
 
+	error = security_file_mmap_addr(0, 0, 0, 0, addr, 1);
+	if (error)
+		return error;
+
 	flags = VM_DATA_DEFAULT_FLAGS | VM_ACCOUNT | mm->def_flags;
 
 	error = arch_mmap_check(addr, len, flags);
diff --git a/mm/mremap.c b/mm/mremap.c
index f4e90ee..356e0e0 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -290,6 +290,10 @@ unsigned long do_mremap(unsigned long addr,
 		if ((addr <= new_addr) && (addr+old_len) > new_addr)
 			goto out;
 
+		ret = security_file_mmap_addr(0, 0, 0, 0, new_addr, 1);
+		if (ret)
+			goto out;
+
 		ret = do_munmap(mm, new_addr, new_len);
 		if (ret)
 			goto out;
@@ -389,8 +393,14 @@ unsigned long do_mremap(unsigned long addr,
 
 			new_addr = get_unmapped_area_prot(vma->vm_file, 0, new_len,
 						vma->vm_pgoff, map_flags, vma->vm_flags & VM_EXEC);
-			ret = new_addr;
-			if (new_addr & ~PAGE_MASK)
+
+			if (new_addr & ~PAGE_MASK) {
+				ret = new_addr;
+				goto out;
+			}
+
+			ret = security_file_mmap_addr(0, 0, 0, 0, new_addr, 1);
+			if (ret)
 				goto out;
 		}
 		ret = move_vma(vma, addr, old_len, new_len, new_addr);
diff --git a/mm/nommu.c b/mm/nommu.c
index 943bca5..084fb90 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -527,7 +527,7 @@ static int validate_mmap_request(struct file *file,
 	}
 
 	/* allow the security API to have its say */
-	ret = security_file_mmap(file, reqprot, prot, flags);
+	ret = security_file_mmap_addr(file, reqprot, prot, flags, addr, 0);
 	if (ret < 0)
 		return ret;
 
@@ -693,6 +693,9 @@ unsigned long do_mmap_pgoff(struct file *file,
 	void *result;
 	int ret;
 
+	if (!(flags & MAP_FIXED))
+		addr = round_hint_to_min(addr);
+
 	/* decide whether we should attempt the mapping, and if so what sort of
 	 * mapping */
 	ret = validate_mmap_request(file, addr, len, prot, flags, pgoff,
diff --git a/security/dummy.c b/security/dummy.c
index 558795b..962235e 100644
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -689,6 +689,17 @@ static int dummy_netlink_recv (struct sk_buff *skb, int cap)
 	return 0;
 }
 
+static int dummy_file_mmap_addr (struct file *file, unsigned long reqprot,
+				 unsigned long prot,
+				 unsigned long flags,
+				 unsigned long addr,
+				 unsigned long addr_only)
+{
+	if ((addr < mmap_min_addr) && !capable(CAP_SYS_RAWIO))
+		return -EACCES;
+	return 0;
+}
+
 #ifdef CONFIG_SECURITY_NETWORK
 static int dummy_unix_stream_connect (struct socket *sock,
 				      struct socket *other,
@@ -1080,6 +1091,7 @@ void security_fixup_ops (struct security_operations *ops)
 	set_to_dummy_if_null(ops, setprocattr);
 	set_to_dummy_if_null(ops, secid_to_secctx);
 	set_to_dummy_if_null(ops, release_secctx);
+	set_to_dummy_if_null(ops, file_mmap_addr);
 #ifdef CONFIG_SECURITY_NETWORK
 	set_to_dummy_if_null(ops, unix_stream_connect);
 	set_to_dummy_if_null(ops, unix_may_send);
diff --git a/security/security.c b/security/security.c
index ee4e070..8e5a2b1 100644
--- a/security/security.c
+++ b/security/security.c
@@ -25,6 +25,7 @@ extern struct security_operations dummy_security_ops;
 extern void security_fixup_ops(struct security_operations *ops);
 
 struct security_operations *security_ops;	/* Initialized to NULL */
+unsigned long mmap_min_addr = 65536;		/* 0 means no protection */
 
 static inline int verify(struct security_operations *ops)
 {
@@ -177,4 +178,5 @@ EXPORT_SYMBOL_GPL(register_security);
 EXPORT_SYMBOL_GPL(unregister_security);
 EXPORT_SYMBOL_GPL(mod_reg_security);
 EXPORT_SYMBOL_GPL(mod_unreg_security);
+EXPORT_SYMBOL_GPL(mmap_min_addr);
 EXPORT_SYMBOL(security_ops);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 20628dd..5425f88 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2562,13 +2562,17 @@ static int file_map_prot_check(struct file *file, unsigned long prot, int shared
 	return 0;
 }
 
-static int selinux_file_mmap(struct file *file, unsigned long reqprot,
-			     unsigned long prot, unsigned long flags)
+static int selinux_file_mmap_addr(struct file *file, unsigned long reqprot,
+				  unsigned long prot, unsigned long flags,
+				  unsigned long addr, unsigned long addr_only)
 {
-	int rc;
+	int rc = 0;
+	u32 sid = ((struct task_security_struct*)(current->security))->sid;
 
-	rc = secondary_ops->file_mmap(file, reqprot, prot, flags);
-	if (rc)
+	if (addr < mmap_min_addr)
+		rc = avc_has_perm(sid, sid, SECCLASS_MEMPROTECT,
+				  MEMPROTECT__MMAP_ZERO, NULL);
+	if (rc || addr_only)
 		return rc;
 
 	if (selinux_checkreqprot)
@@ -2578,6 +2582,14 @@ static int selinux_file_mmap(struct file *file, unsigned long reqprot,
 				   (flags & MAP_TYPE) == MAP_SHARED);
 }
 
+/* should be dead code unless some external garbage screwed with the security_ops */
+static int selinux_file_mmap(struct file *file, unsigned long reqprot,
+			     unsigned long prot, unsigned long flags)
+{
+	return selinux_file_mmap_addr(file, reqprot, prot, flags,
+				      (unsigned long)-1, 0);
+}
+
 static int selinux_file_mprotect(struct vm_area_struct *vma,
 				 unsigned long reqprot,
 				 unsigned long prot)
@@ -4807,6 +4819,7 @@ static struct security_operations selinux_ops = {
 	.key_free =			selinux_key_free,
 	.key_permission =		selinux_key_permission,
 #endif
+	.file_mmap_addr =		selinux_file_mmap_addr,
 };
 
 static __init int selinux_init(void)
@@ -4829,6 +4842,7 @@ static __init int selinux_init(void)
 	sel_inode_cache = kmem_cache_create("selinux_inode_security",
 					    sizeof(struct inode_security_struct),
 					    0, SLAB_PANIC, NULL, NULL);
+
 	avc_init();
 
 	original_ops = secondary_ops = security_ops;
diff --git a/security/selinux/include/av_inherit.h b/security/selinux/include/av_inherit.h
index a68fdd5..8377a4b 100644
--- a/security/selinux/include/av_inherit.h
+++ b/security/selinux/include/av_inherit.h
@@ -30,3 +30,4 @@
    S_(SECCLASS_NETLINK_DNRT_SOCKET, socket, 0x00400000UL)
    S_(SECCLASS_NETLINK_KOBJECT_UEVENT_SOCKET, socket, 0x00400000UL)
    S_(SECCLASS_APPLETALK_SOCKET, socket, 0x00400000UL)
+   S_(SECCLASS_DCCP_SOCKET, socket, 0x00400000UL)
diff --git a/security/selinux/include/av_perm_to_string.h b/security/selinux/include/av_perm_to_string.h
index 09fc8a2..1a262af 100644
--- a/security/selinux/include/av_perm_to_string.h
+++ b/security/selinux/include/av_perm_to_string.h
@@ -35,12 +35,16 @@
    S_(SECCLASS_NODE, NODE__RAWIP_RECV, "rawip_recv")
    S_(SECCLASS_NODE, NODE__RAWIP_SEND, "rawip_send")
    S_(SECCLASS_NODE, NODE__ENFORCE_DEST, "enforce_dest")
+   S_(SECCLASS_NODE, NODE__DCCP_RECV, "dccp_recv")
+   S_(SECCLASS_NODE, NODE__DCCP_SEND, "dccp_send")
    S_(SECCLASS_NETIF, NETIF__TCP_RECV, "tcp_recv")
    S_(SECCLASS_NETIF, NETIF__TCP_SEND, "tcp_send")
    S_(SECCLASS_NETIF, NETIF__UDP_RECV, "udp_recv")
    S_(SECCLASS_NETIF, NETIF__UDP_SEND, "udp_send")
    S_(SECCLASS_NETIF, NETIF__RAWIP_RECV, "rawip_recv")
    S_(SECCLASS_NETIF, NETIF__RAWIP_SEND, "rawip_send")
+   S_(SECCLASS_NETIF, NETIF__DCCP_RECV, "dccp_recv")
+   S_(SECCLASS_NETIF, NETIF__DCCP_SEND, "dccp_send")
    S_(SECCLASS_UNIX_STREAM_SOCKET, UNIX_STREAM_SOCKET__CONNECTTO, "connectto")
    S_(SECCLASS_UNIX_STREAM_SOCKET, UNIX_STREAM_SOCKET__NEWCONN, "newconn")
    S_(SECCLASS_UNIX_STREAM_SOCKET, UNIX_STREAM_SOCKET__ACCEPTFROM, "acceptfrom")
@@ -252,3 +256,8 @@
    S_(SECCLASS_KEY, KEY__LINK, "link")
    S_(SECCLASS_KEY, KEY__SETATTR, "setattr")
    S_(SECCLASS_KEY, KEY__CREATE, "create")
+   S_(SECCLASS_CONTEXT, CONTEXT__TRANSLATE, "translate")
+   S_(SECCLASS_CONTEXT, CONTEXT__CONTAINS, "contains")
+   S_(SECCLASS_DCCP_SOCKET, DCCP_SOCKET__NODE_BIND, "node_bind")
+   S_(SECCLASS_DCCP_SOCKET, DCCP_SOCKET__NAME_CONNECT, "name_connect")
+   S_(SECCLASS_MEMPROTECT, MEMPROTECT__MMAP_ZERO, "mmap_zero")
diff --git a/security/selinux/include/av_permissions.h b/security/selinux/include/av_permissions.h
index 81f4f52..5ec9101 100644
--- a/security/selinux/include/av_permissions.h
+++ b/security/selinux/include/av_permissions.h
@@ -312,6 +312,8 @@
 #define NODE__RAWIP_RECV 0x00000010UL
 #define NODE__RAWIP_SEND 0x00000020UL
 #define NODE__ENFORCE_DEST 0x00000040UL
+#define NODE__DCCP_RECV 0x00000080UL
+#define NODE__DCCP_SEND 0x00000100UL
 
 #define NETIF__TCP_RECV 0x00000001UL
 #define NETIF__TCP_SEND 0x00000002UL
@@ -319,6 +321,8 @@
 #define NETIF__UDP_SEND 0x00000008UL
 #define NETIF__RAWIP_RECV 0x00000010UL
 #define NETIF__RAWIP_SEND 0x00000020UL
+#define NETIF__DCCP_RECV 0x00000040UL
+#define NETIF__DCCP_SEND 0x00000080UL
 
 #define NETLINK_SOCKET__IOCTL 0x00000001UL
 #define NETLINK_SOCKET__READ 0x00000002UL
@@ -970,3 +974,33 @@
 #define KEY__LINK 0x00000010UL
 #define KEY__SETATTR 0x00000020UL
 #define KEY__CREATE 0x00000040UL
+
+#define CONTEXT__TRANSLATE 0x00000001UL
+#define CONTEXT__CONTAINS 0x00000002UL
+
+#define DCCP_SOCKET__IOCTL 0x00000001UL
+#define DCCP_SOCKET__READ 0x00000002UL
+#define DCCP_SOCKET__WRITE 0x00000004UL
+#define DCCP_SOCKET__CREATE 0x00000008UL
+#define DCCP_SOCKET__GETATTR 0x00000010UL
+#define DCCP_SOCKET__SETATTR 0x00000020UL
+#define DCCP_SOCKET__LOCK 0x00000040UL
+#define DCCP_SOCKET__RELABELFROM 0x00000080UL
+#define DCCP_SOCKET__RELABELTO 0x00000100UL
+#define DCCP_SOCKET__APPEND 0x00000200UL
+#define DCCP_SOCKET__BIND 0x00000400UL
+#define DCCP_SOCKET__CONNECT 0x00000800UL
+#define DCCP_SOCKET__LISTEN 0x00001000UL
+#define DCCP_SOCKET__ACCEPT 0x00002000UL
+#define DCCP_SOCKET__GETOPT 0x00004000UL
+#define DCCP_SOCKET__SETOPT 0x00008000UL
+#define DCCP_SOCKET__SHUTDOWN 0x00010000UL
+#define DCCP_SOCKET__RECVFROM 0x00020000UL
+#define DCCP_SOCKET__SENDTO 0x00040000UL
+#define DCCP_SOCKET__RECV_MSG 0x00080000UL
+#define DCCP_SOCKET__SEND_MSG 0x00100000UL
+#define DCCP_SOCKET__NAME_BIND 0x00200000UL
+#define DCCP_SOCKET__NODE_BIND 0x00400000UL
+#define DCCP_SOCKET__NAME_CONNECT 0x00800000UL
+
+#define MEMPROTECT__MMAP_ZERO 0x00000001UL
diff --git a/security/selinux/include/class_to_string.h b/security/selinux/include/class_to_string.h
index 24303b6..9d86294 100644
--- a/security/selinux/include/class_to_string.h
+++ b/security/selinux/include/class_to_string.h
@@ -61,3 +61,6 @@
     S_("appletalk_socket")
     S_("packet")
     S_("key")
+    S_("context")
+    S_("dccp_socket")
+    S_("memprotect")
diff --git a/security/selinux/include/flask.h b/security/selinux/include/flask.h
index 95887ae..36639dc 100644
--- a/security/selinux/include/flask.h
+++ b/security/selinux/include/flask.h
@@ -63,6 +63,9 @@
 #define SECCLASS_APPLETALK_SOCKET 56
 #define SECCLASS_PACKET 57
 #define SECCLASS_KEY 58
+#define SECCLASS_CONTEXT 59
+#define SECCLASS_DCCP_SOCKET 60
+#define SECCLASS_MEMPROTECT 61
 
 /*
  * Security identifier indices for initial entities