From: AMEET M. PARANJAPE <aparanja@redhat.com>
Date: Thu, 16 Oct 2008 16:53:51 -0500
Subject: [ppc64] SPUs hang when run with affinity-1
Message-id: 48F7B7EF.6070804@REDHAT.COM
O-Subject: Re: [PATCH RHEL5.3 BZ464686 1/2] Fix SPUs hangs when run with affinity
Bugzilla: 464686
RH-Acked-by: David Howells <dhowells@redhat.com>

RHBZ#
======
https://bugzilla.redhat.com/show_bug.cgi?id=464686

Description:
===========
The fix for this problem requires two patches; the first is described
here:

It is possible to lock aff_mutex and cbe_spu_info[n].list_mutex in
different orders, allowing a deadlock to occur. With this change,
aff_mutex is no longer taken within a list_mutex critical section.

The second patch will follow in a separate post.

RHEL Version Found:
================
RHEL 5.2

kABI Status:
============
No symbols were harmed.

Brew:
=====
Built on all platforms.
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1508838

Upstream Status:
================
The patches were accepted upstream in kernel 2.6.27-rc1
(see http://kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.27-rc1)

Test Status:
============
A testcase is provided in the Red Hat Bugzilla. Without the patches, the
dmabench stress test hangs, and a reboot is required before another
application can run on the Cell Synergistic Processing Elements (SPEs).
With the patches, dmabench runs successfully and other SPE programs can
be run afterwards.
===============================================================
Ameet Paranjape
978-392-3903 ext 23903
IBM on-site partner

Proposed Patch:
===============

diff --git a/arch/powerpc/platforms/cell/spufs/sched.c b/arch/powerpc/platforms/cell/spufs/sched.c
index 24f4c43..9df2068 100644
--- a/arch/powerpc/platforms/cell/spufs/sched.c
+++ b/arch/powerpc/platforms/cell/spufs/sched.c
@@ -382,6 +382,9 @@ static int has_affinity(struct spu_context *ctx)
 	if (list_empty(&ctx->aff_list))
 		return 0;
 
+	if (atomic_read(&gang->aff_sched_count) == 0)
+		gang->aff_ref_spu = NULL;
+
 	if (!gang->aff_ref_spu) {
 		if (!(gang->aff_flags & AFF_MERGED))
 			aff_merge_remaining_ctxs(gang);
@@ -407,14 +410,13 @@ static void spu_unbind_context(struct spu *spu, struct spu_context *ctx)
 	if (spu->ctx->flags & SPU_CREATE_NOSCHED)
 		atomic_dec(&cbe_spu_info[spu->node].reserved_spus);
 
-	if (ctx->gang){
-		mutex_lock(&ctx->gang->aff_mutex);
-		if (has_affinity(ctx)) {
-			if (atomic_dec_and_test(&ctx->gang->aff_sched_count))
-				ctx->gang->aff_ref_spu = NULL;
-		}
-		mutex_unlock(&ctx->gang->aff_mutex);
-	}
+	if (ctx->gang)
+		/*
+		 * If ctx->gang->aff_sched_count is positive, SPU affinity is
+		 * being considered in this gang. Using atomic_dec_if_positive
+		 * allow us to skip an explicit check for affinity in this gang
+		 */
+		atomic_dec_if_positive(&ctx->gang->aff_sched_count);
 
 	spu_switch_notify(spu, NULL);
 	spu_unmap_mappings(ctx);
@@ -543,11 +545,7 @@ static struct spu *spu_get_idle(struct spu_context *ctx)
 				goto found;
 			mutex_unlock(&cbe_spu_info[node].list_mutex);
 
-			mutex_lock(&ctx->gang->aff_mutex);
-			if (atomic_dec_and_test(&ctx->gang->aff_sched_count))
-				ctx->gang->aff_ref_spu = NULL;
-			mutex_unlock(&ctx->gang->aff_mutex);
-
+			atomic_dec(&ctx->gang->aff_sched_count);
 			return NULL;
 		}
 		mutex_unlock(&ctx->gang->aff_mutex);
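
Illustration (not part of the patch): the idea behind the
spu_unbind_context() and has_affinity() hunks can be sketched in
userspace C11. Everything below is an assumption made for illustration:
struct gang_sketch, dec_if_positive(), unbind_sketch() and
refresh_ref_sketch() are hypothetical names, and dec_if_positive() only
approximates the kernel's atomic_dec_if_positive() with a
compare-and-swap loop. The point is that the unbind path no longer needs
aff_mutex, because it only performs a guarded atomic decrement, while
the stale reference SPU is cleared later by whoever next checks affinity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the two gang fields the patch touches. */
struct gang_sketch {
	atomic_int aff_sched_count;	/* contexts scheduled with affinity */
	void *aff_ref_spu;		/* reference SPU, NULL when unset */
};

/*
 * Userspace approximation of atomic_dec_if_positive(): decrement only
 * if the result stays >= 0, and return the old value minus 1 whether or
 * not the store happened.
 */
static int dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);
	int dec;

	do {
		dec = old - 1;
		if (dec < 0)
			break;	/* count was not positive: leave it alone */
		/* on failure, 'old' is refreshed with the current value */
	} while (!atomic_compare_exchange_weak(v, &old, dec));

	return dec;
}

/* Unbind path in the sketch: no mutex, just a guarded decrement. */
static void unbind_sketch(struct gang_sketch *g)
{
	assert(g != NULL);
	dec_if_positive(&g->aff_sched_count);
}

/*
 * Affinity-check path in the sketch: the reference is cleared lazily
 * once the count has dropped to zero. The caller is assumed to hold
 * whatever lock protects aff_ref_spu. Returns 1 if a reference SPU is
 * still set, 0 otherwise.
 */
static int refresh_ref_sketch(struct gang_sketch *g)
{
	assert(g != NULL);
	if (atomic_load(&g->aff_sched_count) == 0)
		g->aff_ref_spu = NULL;
	return g->aff_ref_spu != NULL;
}
```

In this sketch, calling unbind_sketch() on a gang whose count is already
zero leaves the count at zero, which mirrors why the patched code can
drop the explicit has_affinity() check before decrementing.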