From: Marcus Barrow <mbarrow@redhat.com>
Date: Wed, 8 Apr 2009 00:09:39 -0400
Subject: [scsi] qla2xxx: reduce DID_BUS_BUSY failover errors
Message-id: 20090408040939.12343.33337.sendpatchset@file.bos.redhat.com
O-Subject: [rhel 5.4 patch] qla2xxx : Reduce DID_BUS_BUSY errors causing failover
Bugzilla: 244967
RH-Acked-by: Mike Christie <mchristi@redhat.com>

BZ 244967
Frequent path failures during I/O on DM multipath devices

This patch changes the driver to reduce the number of conditions for
which it returns DID_BUS_BUSY. That error is very serious and causes
path failovers. Errors caused by a dropped frame are now answered with
DID_ERROR, which causes a retry to occur.

This patch applies and builds cleanly with 2.6.18-137.
It was tested at QLogic.

qla2xxx - reduce use of DID_BUS_BUSY

Instead of DID_BUS_BUSY, return DID_TRANSPORT_DISRUPTED or DID_ERROR.
Use DID_ERROR for a dropped frame on CS_UNDERRUN instead of DID_BUS_BUSY.

With DID_TRANSPORT_DISRUPTED, IO will not fail until fast IO fail
fires. If fast IO fail is not set, IO will fail when the dev loss tmo
fires. This may change behavior: users would have to set fast IO fail
to have IO fail quickly, rather than waiting for the dev loss tmo to
fire.

diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index 7901e24..554d77f 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -1185,7 +1185,7 @@ qla2x00_status_entry(scsi_qla_host_t *ha, void *pkt)
 			/*
 			 * If RISC reports underrun and target does not report
 			 * it then we must have a lost frame, so tell upper
-			 * layer to retry it by reporting a bus busy.
+			 * layer to retry it by reporting a did error.
 			 */
 			if (!(scsi_status & SS_RESIDUAL_UNDER)) {
 				DEBUG2(printk("scsi(%ld:%d:%d:%d) Dropped "
@@ -1195,7 +1195,7 @@ qla2x00_status_entry(scsi_qla_host_t *ha, void *pkt)
 					      cp->device->lun, resid,
 					      scsi_bufflen(cp)));
 
-				cp->result = DID_BUS_BUSY << 16;
+				cp->result = DID_ERROR << 16;
 				break;
 			}
 
@@ -1252,7 +1252,7 @@ qla2x00_status_entry(scsi_qla_host_t *ha, void *pkt)
 		    cp->serial_number, comp_status,
 		    atomic_read(&fcport->state)));
 
-		cp->result = DID_BUS_BUSY << 16;
+		cp->result = DID_TRANSPORT_DISRUPTED << 16;
 		if (atomic_read(&fcport->state) == FCS_ONLINE) {
 			qla2x00_mark_device_lost(ha, fcport, 1, 1);
 		}
@@ -1280,7 +1280,7 @@ qla2x00_status_entry(scsi_qla_host_t *ha, void *pkt)
 		break;
 
 	case CS_TIMEOUT:
-		cp->result = DID_BUS_BUSY << 16;
+		cp->result = DID_TRANSPORT_DISRUPTED << 16;
 
 		if (IS_FWI2_CAPABLE(ha)) {
 			DEBUG2(printk(KERN_INFO