kernel-2.6.18-238.19.1.el5.centos.plus.src.rpm

From: Jesse Larrew <jlarrew@redhat.com>
Date: Wed, 3 Jun 2009 15:52:04 -0400
Subject: [sched] fix cond_resched_softirq() offset
Message-id: 20090603194854.27279.10170.sendpatchset@squad5-lp1.lab.bos.redhat.com
O-Subject: [PATCH RHEL5.3 BZ496935] umount stalls machine.
Bugzilla: 496935
RH-Acked-by: Peter Zijlstra <pzijlstr@redhat.com>

RHBZ#:
======
https://bugzilla.redhat.com/show_bug.cgi?id=496935

Description:
===========
This is a bug fix for RHEL 5.3 on all archs.

Attempting to umount a filesystem with more than 20 million files results
in a soft lockup oops and stalls the machine. This occurs because the
umount process does not relinquish the CPU even though cond_resched_lock()
is called. cond_resched_lock() does not schedule out the process because
its call to __resched_legal(1) always fails.
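
The failure can be illustrated with a small user-space model of the
removed check. This is a sketch, not kernel code: the legacy_* names are
hypothetical stand-ins, and it assumes a kernel built without
CONFIG_PREEMPT, where taking a spinlock does not raise preempt_count(),
so the count stays at 0 while the check expects 1.

```c
/* User-space model; names prefixed legacy_ are hypothetical stand-ins. */
enum legacy_state { LEGACY_BOOTING, LEGACY_RUNNING };

/*
 * Mirrors the logic of the removed __resched_legal(), with the kernel
 * state (preempt count, system state) passed in as plain arguments.
 */
static int legacy_resched_legal(int preempt_count,
				int expected_preempt_count,
				enum legacy_state state)
{
	if (preempt_count != expected_preempt_count)
		return 0;	/* count mismatch: silently refuse */
	if (state != LEGACY_RUNNING)
		return 0;	/* not fully booted: refuse */
	return 1;
}
```

Under that assumption, legacy_resched_legal(0, 1, LEGACY_RUNNING) returns
0 on every call, so the lock is never dropped and the task never yields.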

The solution from upstream is to remove the __resched_legal() check: it
is conceptually broken. The biggest problem it had is that it can mask
buggy cond_resched() calls. A cond_resched() call is only legal if we
are not in an atomic context, with two narrow exceptions:

 1) if the system is booting
 2) a reacquire_kernel_lock() down() done while PREEMPT_ACTIVE is set

However, __resched_legal() hid this and just silently returned whenever
these primitives were called from invalid contexts. (The same goes for
cond_resched_lock() and cond_resched_softirq().) Furthermore, the
__resched_legal(0) call was buggy in that it caused unnecessarily long
softirq latencies via cond_resched_softirq(), which is only called from
softirq-off sections, hence the code did nothing. The fix is to
resurrect the effectiveness of the might_sleep() checks and to only allow
the narrow exceptions.
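
The conditions that remain after the patch can be sketched as two
user-space predicates. The model_* names are hypothetical, and
MODEL_PREEMPT_ACTIVE is an illustrative bit only; the real
PREEMPT_ACTIVE value is architecture-specific.

```c
#include <stdbool.h>

/* User-space model; all model_* / MODEL_* names are hypothetical. */
enum model_system_state { MODEL_SYSTEM_BOOTING, MODEL_SYSTEM_RUNNING };

/* Illustrative bit only; the real PREEMPT_ACTIVE is arch-specific. */
#define MODEL_PREEMPT_ACTIVE 0x10000000u

/* Gate applied by cond_resched() after the patch. */
static bool model_cond_resched_gate(bool need_resched,
				    unsigned int preempt_count,
				    enum model_system_state state)
{
	return need_resched &&
	       !(preempt_count & MODEL_PREEMPT_ACTIVE) &&
	       state == MODEL_SYSTEM_RUNNING;
}

/* Gate applied by cond_resched_lock() and cond_resched_softirq(). */
static bool model_cond_resched_lock_gate(bool need_resched,
					 enum model_system_state state)
{
	return need_resched && state == MODEL_SYSTEM_RUNNING;
}
```

Neither predicate compares the preempt count against an expected value
any more, so a held spinlock or a softirq-off section no longer prevents
the reschedule by itself.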

RHEL Version Found:
================
RHEL 5.3

kABI Status:
============
No symbols were harmed.

Brew:
=====
Built on all platforms.
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1824336

Upstream Status:
================
committed upstream:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=9414232fa0cc28e2f51b8c76d260f2748f7953fc

Test Status:
============
This can be reproduced by mounting a filesystem (e.g. to /mnt)
containing more than 20 million files and running the following commands:

du -s /mnt
umount /mnt

The system becomes unresponsive for a couple of seconds, and a soft lockup
oops message is seen on the console. With this patch applied, no oops
occurs. This has been verified by Ramachandra Pai <linuxram@us.ibm.com>.

===============================================================

Jesse Larrew
IBM Onsite Partner
978-392-3183
jlarrew@redhat.com

Proposed Patch:
===============
This patch is based on kernel-2.6.18-151.el5.

diff --git a/kernel/sched.c b/kernel/sched.c
index 9921513..267715b 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4776,15 +4776,6 @@ asmlinkage long sys_sched_yield(void)
 	return 0;
 }
 
-static inline int __resched_legal(int expected_preempt_count)
-{
-	if (unlikely(preempt_count() != expected_preempt_count))
-		return 0;
-	if (unlikely(system_state != SYSTEM_RUNNING))
-		return 0;
-	return 1;
-}
-
 static void __cond_resched(void)
 {
 #ifdef CONFIG_DEBUG_SPINLOCK_SLEEP
@@ -4804,7 +4795,8 @@ static void __cond_resched(void)
 
 int __sched cond_resched(void)
 {
-	if (need_resched() && __resched_legal(0)) {
+	if (need_resched() && !(preempt_count() & PREEMPT_ACTIVE) &&
+					system_state == SYSTEM_RUNNING) {
 		__cond_resched();
 		return 1;
 	}
@@ -4830,7 +4822,7 @@ int cond_resched_lock(spinlock_t *lock)
 		ret = 1;
 		spin_lock(lock);
 	}
-	if (need_resched() && __resched_legal(1)) {
+	if (need_resched() && system_state == SYSTEM_RUNNING) {
 		spin_release(&lock->dep_map, 1, _THIS_IP_);
 		_raw_spin_unlock(lock);
 		preempt_enable_no_resched();
@@ -4846,7 +4838,7 @@ int __sched cond_resched_softirq(void)
 {
 	BUG_ON(!in_softirq());
 
-	if (need_resched() && __resched_legal(0)) {
+	if (need_resched() && system_state == SYSTEM_RUNNING) {
 		raw_local_irq_disable();
 		_local_bh_enable();
 		raw_local_irq_enable();