From: Larry Woodman <lwoodman@redhat.com>
Date: Mon, 5 Apr 2010 15:53:30 -0400
Subject: [mm] fix hugepage corruption using vm.drop_caches
Message-id: <1270482810.3551.304.camel@dhcp-100-19-198.bos.redhat.com>
Patchwork-id: 23887
O-Subject: [RHEL5 Patch] vm.drop_caches corrupts hugepages and causes Oracle Database ORA-600 crashes
Bugzilla: 579469
RH-Acked-by: Bob Picco <bpicco@redhat.com>
RH-Acked-by: Dean Nelson <dnelson@redhat.com>

While running an Oracle Database, single-instance or RAC, with the SGA
backed by hugepages, if you "echo 3 > /proc/sys/vm/drop_caches" the
system silently corrupts the database hugepages. This causes various
ORA-600 errors in the Oracle database.

This problem has been fixed upstream with commit
6649a3863232eb2e2f15ea6c622bd8ceacf96d76

---------------------------------------------------------------------------
Author: Ken Chen <kenchen@google.com>
Date: Thu Feb 8 14:20:27 2007 -0800

[PATCH] hugetlb: preserve hugetlb pte dirty state

__unmap_hugepage_range() is buggy that it does not preserve dirty state
of huge_pte when unmapping hugepage range. It causes data corruption in
the event of drop_caches being used by sys admin. For example, an
application creates a hugetlb file, modify pages, then unmap it. While
leaving the hugetlb file alive, comes along sys admin doing a "echo 3 >
/proc/sys/vm/drop_caches".

drop_pagecache_sb() will happily free all pages that aren't marked dirty
if there are no active mapping. Later when application remaps the
hugetlb file back and all data are gone, triggering catastrophic flip
over on application.

Not only that, the internal resv_huge_pages count will also get all
messed up. Fix it up by marking page dirty appropriately.

Signed-off-by: Ken Chen <kenchen@google.com>
Cc: "Nish Aravamudan" <nish.aravamudan@gmail.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: <stable@kernel.org>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-------------------------------------------------------------------------

The attached backport also fixes this problem in RHEL5-U5, BZ579469

Signed-off-by: Jarod Wilson <jarod@redhat.com>

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index f724806..c6d6ff3 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -450,9 +450,13 @@ static int hugetlbfs_symlink(struct inode *dir,
 
 /*
  * For direct-IO reads into hugetlb pages
+ * mark the head page dirty
  */
 static int hugetlbfs_set_page_dirty(struct page *page)
 {
+	struct page *head = compound_head(page);
+
+	SetPageDirty(head);
 	return 0;
 }
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fa2fd01..a542f79 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -426,6 +426,8 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 			continue;
 
 		page = pte_page(pte);
+		if (pte_dirty(pte))
+			set_page_dirty(page);
 		put_page(page);
 	}
 	spin_unlock(&mm->page_table_lock);
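
For reviewers who want to reason about the failure sequence without a
hugetlbfs mount, the following is a toy user-space model of the scenario
described in the upstream commit message (map, write, unmap,
drop_caches, remap). It is only a sketch: every toy_* type and function
is hypothetical and invented for illustration, none of it is kernel
code, and the "drop caches" rule is reduced to "free a page that is
neither dirty nor mapped", as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins, not kernel structures. */
struct toy_page {
	bool cached;	/* page is still held in the toy page cache */
	bool dirty;	/* page-level dirty flag, checked by toy_drop_caches() */
	int mapcount;	/* number of active mappings */
	int data;	/* stand-in for the page contents */
};

struct toy_pte {
	bool present;
	bool dirty;	/* set when the page is written through this mapping */
	struct toy_page *page;
};

/* Map the page; a page that is no longer cached comes back zero-filled. */
static void toy_map(struct toy_pte *pte, struct toy_page *page)
{
	if (!page->cached) {
		page->cached = true;
		page->dirty = false;
		page->data = 0;
	}
	pte->present = true;
	pte->dirty = false;
	pte->page = page;
	page->mapcount++;
}

/* A write through the mapping marks only the PTE dirty, not the page. */
static void toy_write(struct toy_pte *pte, int value)
{
	assert(pte->present);
	pte->page->data = value;
	pte->dirty = true;
}

/* preserve_dirty == true models the behaviour added by this patch. */
static void toy_unmap(struct toy_pte *pte, bool preserve_dirty)
{
	assert(pte->present);
	if (preserve_dirty && pte->dirty)
		pte->page->dirty = true;
	pte->page->mapcount--;
	pte->present = false;
	pte->dirty = false;
	pte->page = NULL;
}

/* Free the page if it is neither dirty nor mapped. */
static void toy_drop_caches(struct toy_page *page)
{
	if (page->cached && !page->dirty && page->mapcount == 0) {
		page->cached = false;
		page->data = 0;
	}
}

/* Run the whole sequence and return what the application reads back. */
int toy_scenario(bool preserve_dirty)
{
	struct toy_page page = {0};
	struct toy_pte pte = {0};

	toy_map(&pte, &page);
	toy_write(&pte, 42);
	toy_unmap(&pte, preserve_dirty);
	toy_drop_caches(&page);
	toy_map(&pte, &page);
	return pte.page->data;
}
```

In this model, toy_scenario(false) reads back 0 (the written value is
lost, as in the bug), while toy_scenario(true) reads back 42, which
corresponds to propagating the PTE dirty bit to the page at unmap time.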