kernel-2.6.18-238.19.1.el5.centos.plus.src.rpm

From: Oleg Nesterov <oleg@redhat.com>
Date: Mon, 29 Nov 2010 14:21:38 -0500
Subject: [misc] posix-cpu-timers: workaround for mt exec problems
Message-id: <20101129142138.GA7063@redhat.com>
Patchwork-id: 29648
O-Subject: [RHEL5.6 PATCH] bz656266: posix-cpu-timers: workaround to suppress
	the problems with mt exec
Bugzilla: 656266
RH-Acked-by: Stanislaw Gruszka <sgruszka@redhat.com>

https://bugzilla.redhat.com/show_bug.cgi?id=656266

Upstream commit e0a70217107e6f9844628120412cb27bb4cea194
Author: Oleg Nesterov <oleg@redhat.com>
Date:   Fri Nov 5 16:53:42 2010 +0100

    posix-cpu-timers: workaround to suppress the problems with mt exec

    posix-cpu-timers.c correctly assumes that the dying process does
    posix_cpu_timers_exit_group() and removes all !CPUCLOCK_PERTHREAD
    timers from signal->cpu_timers list.

    But, it also assumes that timer->it.cpu.task is always the group
    leader, and thus the dead ->task means the dead thread group.

    This is obviously not true after de_thread() changes the leader.
    After that almost every posix_cpu_timer_ method has problems.

    It is not simple to fix this bug correctly. First of all, I think
    that timer->it.cpu should use struct pid instead of task_struct.
    Also, the locking should be reworked completely. In particular,
    tasklist_lock should not be used at all. This all needs a lot of
    nontrivial and hard-to-test changes.

    Change __exit_signal() to do posix_cpu_timers_exit_group() when
    the old leader dies during exec. This is not a real fix, just a
    temporary hack to hide the problem for 2.6.37 and stable. IOW,
    this is obviously wrong but this is what we currently have anyway:
    cpu timers do not work after mt exec.

    In theory this change adds another race. The exiting leader can
    detach the timers which were attached to the new leader. However,
    the window between de_thread() and release_task() is small, so we
    can pretend that sys_timer_create() was called before de_thread().

    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>

Trivial backport. __exit_signal() differs from upstream, but the
logic is the same. A thread_group_leader() can't be released before
the other threads, so has_group_leader_pid() can't give a false
positive here: it always means that the new leader is doing
release_task(old_leader).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
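To make the backport note concrete, here is a minimal userspace sketch (not kernel code) of how the two predicates diverge after a non-leader exec. The struct model_task type and model_de_thread() are hypothetical simplifications. thread_group_leader() and has_group_leader_pid() are modeled on the kernel helpers of the same names (a group_leader pointer comparison and a pid == tgid comparison, respectively). model_de_thread() only mimics the pid and group_leader swap that de_thread() performs; it ignores locking, pid hashing and the wait for the old leader to become a zombie.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for struct task_struct:
 * only the fields the two predicates look at. */
struct model_task {
	int pid;                         /* per-thread id */
	int tgid;                        /* thread group ("process") id */
	struct model_task *group_leader; /* current leader of the group */
};

/* Modeled on the kernel helper: is this task the current leader? */
static bool thread_group_leader(const struct model_task *p)
{
	return p == p->group_leader;
}

/* Modeled on the kernel helper: does this task carry the leader's pid? */
static bool has_group_leader_pid(const struct model_task *p)
{
	return p->pid == p->tgid;
}

/* Rough model of the identity swap de_thread() does when a non-leader
 * thread execs: the execing thread takes over the old leader's pid and
 * becomes the leader. The old leader keeps its own pid field, but it
 * is no longer the group leader. */
static void model_de_thread(struct model_task *old_leader,
			    struct model_task *tsk)
{
	assert(old_leader->tgid == tsk->tgid); /* same thread group */
	assert(!thread_group_leader(tsk));     /* exec from a non-leader */

	tsk->pid = old_leader->pid;
	old_leader->group_leader = tsk;
	tsk->group_leader = tsk;
}
```

After model_de_thread(), the old leader still has pid == tgid but is no longer the group leader. That is the state in which de_thread() hands it to release_task(), and the reason the check in __exit_signal() uses has_group_leader_pid() rather than thread_group_leader().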

diff --git a/kernel/exit.c b/kernel/exit.c
index 695fbe4..b45d02a 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -88,6 +88,14 @@ static void __exit_signal(struct task_struct *tsk)
 		posix_cpu_timers_exit_group(tsk);
 	else {
 		/*
+		 * This can only happen if the caller is de_thread().
+		 * FIXME: this is the temporary hack, we should teach
+		 * posix-cpu-timers to handle this case correctly.
+		 */
+		if (unlikely(has_group_leader_pid(tsk)))
+			posix_cpu_timers_exit_group(tsk);
+
+		/*
 		 * If there is any task waiting for the group exit
 		 * then notify it:
 		 */