From: David Teigland <teigland@redhat.com>
Date: Thu, 5 Jun 2008 14:10:26 -0500
Subject: [dlm] call to confirm_master in receive_request_reply
Message-id: 20080605191026.GG18635@redhat.com
O-Subject: [RHEL5.3 PATCH 07/18] dlm: another call to confirm_master in receive_request_reply
Bugzilla: 450132
RH-Acked-by: Bob Peterson <rpeterso@redhat.com>

bz 450132
dlm: fixes for recovery of user lockspace

Fix bugs when userland apps using the dlm join/leave the lockspace,
causing recovery.

brew build including this patch
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1344633

upstream commit:

>From aec64e1be2225c6fc64499594d23257c6adf6168 Mon Sep 17 00:00:00 2001
>From: David Teigland <teigland@redhat.com>
>Date: Tue, 8 Jan 2008 15:37:47 -0600
>Subject: [PATCH] dlm: another call to confirm_master in receive_request_reply

When a failed request (EBADR or ENOTBLK) is unlocked/canceled instead of
retried, there may be other lkb's waiting on the rsb_lookup list for it to
complete.  A call to confirm_master() is needed to move on to the next
waiting lkb since the current one won't be retried.

Signed-off-by: David Teigland <teigland@redhat.com>

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index f3fd150..37a2988 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -1941,8 +1941,11 @@ static void confirm_master(struct dlm_rsb *r, int error)
 		break;
 
 	case -EAGAIN:
-		/* the remote master didn't queue our NOQUEUE request;
-		   make a waiting lkb the first_lkid */
+	case -EBADR:
+	case -ENOTBLK:
+		/* the remote request failed and won't be retried (it was
+		   a NOQUEUE, or has been canceled/unlocked); make a waiting
+		   lkb the first_lkid */
 
 		r->res_first_lkid = 0;
 
@@ -3383,6 +3386,7 @@ static void receive_request_reply(struct dlm_ls *ls, struct dlm_message *ms)
 		if (is_overlap(lkb)) {
 			/* we'll ignore error in cancel/unlock reply */
 			queue_cast_overlap(r, lkb);
+			confirm_master(r, result);
 			unhold_lkb(lkb); /* undoes create_lkb() */
 		} else
 			_request_lock(r, lkb);