From: Larry Woodman <lwoodman@redhat.com>
Date: Tue, 21 Sep 2010 16:55:00 -0400
Subject: [mm] add dirty_background_bytes and dirty_bytes sysctls
Message-id: <1285088100.31554.26.camel@dhcp-100-19-198.bos.redhat.com>
Patchwork-id: 28329
O-Subject: [RHEL5 Patch] Backport dirty_background_bytes and dirty_bytes sysctls to RHEL 5
Bugzilla: 635782
RH-Acked-by: Rik van Riel <riel@redhat.com>

On really large systems with limited IO we need the patches to control
background writeout in terms of bytes rather than percentages of memory.

BZ 635782 says it all:
----------------------------------------------------------------------------------------
Description of problem:

With the increasing number of systems with 50 or more GiBs of RAM,
dirty_ratio with a lower bound of 5% is not that helpful. For example,
the effects of Bug 469848 (nfs_getattr() hangs during heavy write
workloads) could be limited if we were able to limit the number of dirty
pages to a much lower level on systems with a lot of RAM.

Patches:

The commits that need to be backported from mainline:
-----------------------------------------------------------
commit 2da02997e08d3efe8174c7a47696e6f7cbe69ba9
Author: David Rientjes <rientjes@google.com>
Date:   Tue Jan 6 14:39:31 2009 -0800

    mm: add dirty_background_bytes and dirty_bytes sysctls

    This change introduces two new sysctls to /proc/sys/vm:
    dirty_background_bytes and dirty_bytes.

    dirty_background_bytes is the counterpart to dirty_background_ratio and
    dirty_bytes is the counterpart to dirty_ratio.

    With growing memory capacities of individual machines, it's no longer
    sufficient to specify dirty thresholds as a percentage of the amount of
    dirtyable memory over the entire system.

    dirty_background_bytes and dirty_bytes specify quantities of memory, in
    bytes, that represent the dirty limits for the entire system.  If either
    of these values is set, its value represents the amount of dirty memory
    that is needed to commence either background or direct writeback.

    When a `bytes' or `ratio' file is written, its counterpart becomes a
    function of the written value.  For example, if dirty_bytes is written to
    be 8096, 8K of memory is required to commence direct writeback.
    dirty_ratio is then functionally equivalent to 8K / the amount of
    dirtyable memory:

        dirtyable_memory = free pages + mapped pages + file cache

        dirty_background_bytes = dirty_background_ratio * dirtyable_memory
            -or-
        dirty_background_ratio = dirty_background_bytes / dirtyable_memory

            AND

        dirty_bytes = dirty_ratio * dirtyable_memory
            -or-
        dirty_ratio = dirty_bytes / dirtyable_memory

    Only one of dirty_background_bytes and dirty_background_ratio may be
    specified at a time, and only one of dirty_bytes and dirty_ratio may be
    specified.  When one sysctl is written, the other appears as 0 when read.

    The `bytes' files operate on a page size granularity since dirty limits
    are compared with ZVC values, which are in page units.

    Prior to this change, the minimum dirty_ratio was 5 as implemented by
    get_dirty_limits() although /proc/sys/vm/dirty_ratio would show any user
    written value between 0 and 100.  This restriction is maintained, but
    dirty_bytes has a lower limit of only one page.

    Also prior to this change, the dirty_background_ratio could not equal or
    exceed dirty_ratio.  This restriction is maintained in addition to
    restricting dirty_background_bytes.  If either background threshold equals
    or exceeds that of the dirty threshold, it is implicitly set to half the
    dirty threshold.

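As a rough illustration of the threshold selection described above, the
following standalone C sketch computes both limits in pages. It is not the
kernel code: compute_dirty_limits() and struct dirty_limits are hypothetical
names, the unmapped_ratio clamp applied by get_dirty_limits() is omitted, and
the final background check compares page counts, as the description states,
rather than ratios.

```c
#include <assert.h>

/* Round-up integer division, same idea as the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical result type: both thresholds are expressed in pages. */
struct dirty_limits {
	long background;
	long dirty;
};

/*
 * Hypothetical helper, assumed for illustration only.
 * A non-zero *_bytes value takes precedence over the matching ratio.
 */
static struct dirty_limits
compute_dirty_limits(long dirtyable_pages, long page_size,
		     int dirty_ratio, int background_ratio,
		     long dirty_bytes, long background_bytes)
{
	struct dirty_limits lim;

	assert(dirtyable_pages > 0 && page_size > 0);

	if (dirty_bytes) {
		/* Byte limit: round up to whole pages, so the minimum is one page. */
		lim.dirty = DIV_ROUND_UP(dirty_bytes, page_size);
	} else {
		/* Ratio limit: the historical lower bound of 5% still applies. */
		if (dirty_ratio < 5)
			dirty_ratio = 5;
		lim.dirty = (dirty_ratio * dirtyable_pages) / 100;
	}

	if (background_bytes)
		lim.background = DIV_ROUND_UP(background_bytes, page_size);
	else
		lim.background = (background_ratio * dirtyable_pages) / 100;

	/* A background threshold at or above the dirty threshold is halved. */
	if (lim.background >= lim.dirty)
		lim.background = lim.dirty / 2;

	return lim;
}
```

With 1,000,000 dirtyable pages of 4096 bytes, writing 8096 to dirty_bytes
gives a direct writeback threshold of 2 pages (8K), and a background
threshold derived from a ratio of 10 is then reduced to 1 page.
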
    Acked-by: Peter Zijlstra <peterz@infradead.org>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Christoph Lameter <cl@linux-foundation.org>
    Signed-off-by: David Rientjes <rientjes@google.com>
    Cc: Andrea Righi <righi.andrea@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit fc3501d411d34823fb9be248a95a0c44f945866f
Author: Sven Wegener <sven.wegener@stealer.net>
Date:   Wed Feb 11 13:04:23 2009 -0800

    mm: fix dirty_bytes/dirty_background_bytes sysctls on 64bit arches

commit 9e4a5bda89034502fb144331e71a0efdfd5fae97
Author: Andrea Righi <righi.andrea@gmail.com>
Date:   Thu Apr 30 15:08:57 2009 -0700

    mm: prevent divide error for small values of vm_dirty_bytes
-----------------------------------------------------------------------------

The attached patch fixes the problem and BZ635782

Signed-off-by: Jarod Wilson <jarod@redhat.com>

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 5b302a0..bb53eb8 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -216,6 +216,8 @@ enum
 	VM_TOPDOWN_ALLOCATE_FAST=42, /* optimize speed over fragmentation in topdown alloc */
 	VM_MAX_RECLAIMS=43, /* max reclaims allowed */
 	VM_DEVZERO_OPTIMIZED=44, /* pagetables initialized with ZERO_PAGE at mmmap time */
+	VM_DIRTY_BYTES=45, /* specific number of dirty bytes allowed */
+	VM_DIRTY_BACKGND_BYTES=46, /* specific number of dirty background bytes allowed */
 };
 
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index aa705cf..8579372 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -84,6 +84,8 @@ extern int flush_mmap_pages;
 extern int max_writeback_pages;
 extern int blk_iopoll_enabled;
 extern int vm_devzero_optimized;
+extern int vm_dirty_bytes;
+extern int dirty_background_bytes;
 
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
 extern int proc_unknown_nmi_panic(ctl_table *, int, struct file *,
@@ -1253,6 +1255,26 @@ static ctl_table vm_table[] = {
 		.strategy = &sysctl_intvec,
 		.extra1 = &zero,
 	},
+	{
+		.ctl_name = VM_DIRTY_BYTES,
+		.procname = "vm_dirty_bytes",
+		.data = &vm_dirty_bytes,
+		.maxlen = sizeof(vm_dirty_bytes),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec,
+		.strategy = &sysctl_intvec,
+		.extra1 = &zero,
+	},
+	{
+		.ctl_name = VM_DIRTY_BACKGND_BYTES,
+		.procname = "dirty_background_bytes",
+		.data = &dirty_background_bytes,
+		.maxlen = sizeof(dirty_background_bytes),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec,
+		.strategy = &sysctl_intvec,
+		.extra1 = &zero,
+	},
 	{ .ctl_name = 0 }
 };
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d337e45..1c1c2dd 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -98,6 +98,9 @@ int laptop_mode;
 
 EXPORT_SYMBOL(laptop_mode);
 
+int vm_dirty_bytes = 0;
+int dirty_background_bytes = 0;
+
 /* End of sysctl-exported parameters */
 
 
@@ -146,21 +149,30 @@ get_dirty_limits(long *pbackground, long *pdirty,
 				global_page_state(NR_ANON_PAGES)) * 100) /
 					total_pages;
 
-	dirty_ratio = vm_dirty_ratio;
+	if (vm_dirty_bytes)
+		dirty = DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE);
+	else {
+		dirty_ratio = vm_dirty_ratio;
+
+		/* if vm_dirty_ratio is 100 dont limit to 1/2 unmapped_ratio */
+		if ((dirty_ratio > unmapped_ratio / 2) && (dirty_ratio != 100))
+			dirty_ratio = unmapped_ratio / 2;
 
-	/* if vm_dirty_ratio is 100 dont limit to 1/2 unmapped_ratio */
-	if ((dirty_ratio > unmapped_ratio / 2) && (dirty_ratio != 100))
-		dirty_ratio = unmapped_ratio / 2;
+		if (dirty_ratio < 5)
+			dirty_ratio = 5;
 
-	if (dirty_ratio < 5)
-		dirty_ratio = 5;
+		dirty = (dirty_ratio * available_memory) / 100;
+	}
 
-	background_ratio = dirty_background_ratio;
-	if (background_ratio >= dirty_ratio)
-		background_ratio = dirty_ratio / 2;
+	if (dirty_background_bytes)
+		background = DIV_ROUND_UP(dirty_background_bytes, PAGE_SIZE);
+	else {
+		background_ratio = dirty_background_ratio;
+		if (background_ratio >= dirty_ratio)
+			background_ratio = dirty_ratio / 2;
 
-	background = (background_ratio * available_memory) / 100;
-	dirty = (dirty_ratio * available_memory) / 100;
+		background = (background_ratio * available_memory) / 100;
+	}
 	tsk = current;
 	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
 		background += background / 4;