From: AMEET M. PARANJAPE <aparanja@redhat.com>
Date: Fri, 7 Nov 2008 14:48:53 -0600
Subject: [openib] ehca: remove ref to QP if port activation fails
Message-id: 4914A9B5.6020302@REDHAT.COM
O-Subject: Re: [PATCH RHEL5.3 BZ469941] IB/ehca: remove reference to the QP in case of port activation failure
Bugzilla: 469941
RH-Acked-by: Doug Ledford <dledford@redhat.com>

RHBZ#:
======
https://bugzilla.redhat.com/show_bug.cgi?id=469941

Description:
===========
If the initialization of a special queue pair (QP) (e.g. AQP1) fails due to
a software timeout, we have to remove the reference to that special QP struct
from the port struct. This prevents the driver from accessing the QP, since
it will be or has been destroyed by the caller, i.e. in this case ib_mad.

RHEL Version Found:
================
RHEL 5.3 alpha and beta

kABI Status:
============
No symbols were harmed.

Brew:
=====
Built on all platforms.
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1556681

Upstream Status:
================
http://lkml.org/lkml/2008/11/4/178

Test Status:
============
On a ppc64 system, the ehca ports sometimes did not come up in time. Some
time after the probe function timed out, the "port up" event occurred and in
some cases led to a kernel panic. With this patch the issue is no longer seen.

diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c
index 99642a6..4c57161 100644
--- a/drivers/infiniband/hw/ehca/ehca_irq.c
+++ b/drivers/infiniband/hw/ehca/ehca_irq.c
@@ -370,25 +370,27 @@ static void parse_ec(struct ehca_shca *shca, u64 eqe)
 	switch (ec) {
 	case 0x30: /* port availability change */
 		if (EHCA_BMASK_GET(NEQE_PORT_AVAILABILITY, eqe)) {
-			int suppress_event;
-			/* replay modify_qp for sqps */
-			spin_lock_irqsave(&sport->mod_sqp_lock, flags);
-			suppress_event = !sport->ibqp_sqp[IB_QPT_GSI];
-			if (sport->ibqp_sqp[IB_QPT_SMI])
-				ehca_recover_sqp(sport->ibqp_sqp[IB_QPT_SMI]);
-			if (!suppress_event)
-				ehca_recover_sqp(sport->ibqp_sqp[IB_QPT_GSI]);
-			spin_unlock_irqrestore(&sport->mod_sqp_lock, flags);
-
-			/* AQP1 was destroyed, ignore this event */
-			if (suppress_event)
-				break;
+			/* perform recovery only for port auto-detect mode */
+			if (ehca_nr_ports == -1) {
+				int suppress_event;
+				/* replay modify_qp for sqps */
+				spin_lock_irqsave(&sport->mod_sqp_lock, flags);
+				suppress_event = !sport->ibqp_sqp[IB_QPT_GSI];
+				if (sport->ibqp_sqp[IB_QPT_SMI])
+					ehca_recover_sqp(sport->ibqp_sqp[IB_QPT_SMI]);
+				if (!suppress_event)
+					ehca_recover_sqp(sport->ibqp_sqp[IB_QPT_GSI]);
+				spin_unlock_irqrestore(&sport->mod_sqp_lock, flags);
+
+				/* AQP1 was destroyed, ignore this event */
+				if (suppress_event)
+					break;
+			}
 
 			sport->port_state = IB_PORT_ACTIVE;
 			dispatch_port_event(shca, port, IB_EVENT_PORT_ACTIVE,
 					    "is active");
-			ehca_query_sma_attr(shca, port,
-					    &sport->saved_attr);
+			ehca_query_sma_attr(shca, port, &sport->saved_attr);
 		} else {
 			sport->port_state = IB_PORT_DOWN;
 			dispatch_port_event(shca, port, IB_EVENT_PORT_ERR,
diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c b/drivers/infiniband/hw/ehca/ehca_qp.c
index 556de4c..f3ca3f6 100644
--- a/drivers/infiniband/hw/ehca/ehca_qp.c
+++ b/drivers/infiniband/hw/ehca/ehca_qp.c
@@ -847,6 +847,11 @@ static struct ehca_qp *internal_create_qp(
 	if (qp_type == IB_QPT_GSI) {
 		h_ret = ehca_define_sqp(shca, my_qp, init_attr);
 		if (h_ret != H_SUCCESS) {
+			kfree(my_qp->mod_qp_parm);
+			my_qp->mod_qp_parm = NULL;
+			/* the QP pointer is no longer valid */
+			shca->sport[init_attr->port_num - 1].ibqp_sqp[qp_type] =
+				NULL;
 			ret = ehca2ib_return_code(h_ret);
 			goto create_qp_exit6;
 		}
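
Illustration (not part of the patch):
The pattern behind this fix can be sketched in plain userspace C. This is a
simplified model only, not driver code. All names here (demo_qp, demo_port,
demo_create_sqp, demo_port_up) are hypothetical. Locking is omitted, whereas
the driver takes mod_sqp_lock. The sketch also frees the QP itself, whereas in
the driver the caller destroys it. The point shown is that a failed activation
clears the pointer published in the port, so a late "port up" event finds NULL
and is suppressed instead of touching freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum { QPT_SMI = 0, QPT_GSI = 1, QPT_COUNT = 2 };

/* Hypothetical stand-ins for the driver's QP and port structures. */
struct demo_qp {
	int recovered;			/* counts recovery attempts */
};

struct demo_port {
	struct demo_qp *sqp[QPT_COUNT];	/* published special QPs */
	bool active;
};

/*
 * Create a special QP and publish it in the port. If activation fails,
 * unpublish the pointer before freeing so no later event can reach it.
 */
static struct demo_qp *demo_create_sqp(struct demo_port *port, int type,
				       bool activation_ok)
{
	struct demo_qp *qp;

	assert(type >= 0 && type < QPT_COUNT);
	qp = calloc(1, sizeof(*qp));
	if (!qp)
		return NULL;

	port->sqp[type] = qp;
	if (!activation_ok) {
		port->sqp[type] = NULL;	/* the QP pointer is no longer valid */
		free(qp);
		return NULL;
	}
	return qp;
}

/*
 * Late "port up" event. Returns true if the event was dispatched,
 * false if it was suppressed because the GSI QP is gone.
 */
static bool demo_port_up(struct demo_port *port)
{
	if (!port->sqp[QPT_GSI])
		return false;
	if (port->sqp[QPT_SMI])
		port->sqp[QPT_SMI]->recovered++;
	port->sqp[QPT_GSI]->recovered++;
	port->active = true;
	return true;
}
```

Calling demo_create_sqp() with activation_ok set to false and then
demo_port_up() on the same port leaves the port inactive. Without the
"= NULL" assignment, the handler would dereference the freed QP.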