From: Paolo Bonzini <pbonzini@redhat.com>
Date: Wed, 4 Aug 2010 15:24:14 -0400
Subject: [net] cxgb3: alt buffer freeing strategy when xen dom0
Message-id: <1280935454-22447-3-git-send-email-pbonzini@redhat.com>
Patchwork-id: 27384
O-Subject: [RHEL5.5 PATCH] cxgb3: adopt alternative buffer freeing strategy when running on dom0
Bugzilla: 488882

Bugzilla: 488882
Upstream status: driver does not exist in linux-2.6.18.hg
Brew build: https://brewweb.devel.redhat.com/taskinfo?taskID=2651766

Patch mostly coming from Chelsio, the only change I made is to restrict
the new path to dom0, in order to avoid losing performance when the NIC
is passed through to a PV domain.

> In all the virtualized environments we have tested, the VM's app's
> send buffer frees up its load only when the hypervisor's driver frees
> the corresponding skb. cxgb3 however does not free a TX skb on DMA
> completion. The driver relies on FW generated credit returns posted on
> the receive control queues.
>
> In non virtual environments, the driver programs the HW to coalesce
> these credit returns to minimize the FW management load, and relies on
> skb_orphan() to free up space in the app'send buffer. skbs are freed on
> credit return receptions. It does not work for the VMs, skb_orphan()
> won't free up virtualized app'send buffer.
>
> The attached patch provides a much more aggressive credit return policy,
> and has solved our perf issues on other virtualized platforms.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c
index 4d44cec..0ca7c72 100644
--- a/drivers/net/cxgb3/sge.c
+++ b/drivers/net/cxgb3/sge.c
@@ -1265,7 +1265,31 @@ int t3_eth_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	gen = q->gen;
 	q->unacked += ndesc;
-	compl = (q->unacked & 8) << (S_WR_COMPL - 3);
+#ifdef CONFIG_XEN
+	if (is_initial_xendomain())
+		/*
+		 * Some Guest OS clients get terrible performance when
+		 * they have bad message size / socket send buffer space
+		 * parameters. For instance, if an application selects an
+		 * 8KB message size and an 8KB send socket buffer size.
+		 * This forces the application into a single packet
+		 * stop-and-go mode where it's only willing to have a
+		 * single message outstanding. The next message is only
+		 * sent when the previous message is noted as having
+		 * been sent. Until we issue a kfree_skb() against the
+		 * TX skb, the skb is charged against the application's
+		 * send buffer space. We only free up TX skbs when we
+		 * get a TX credit return from the hardware / firmware
+		 * which is fairly lazy about this. So we request a TX
+		 * WR Completion Notification on every TX descriptor in
+		 * order to accellerate TX credit returns. See also the
+		 * change in handle_rsp_cntrl_info() to free up TX skb's
+		 * when we receive the TX WR Completion Notifications ...
+		 */
+		compl = F_WR_COMPL;
+	else
+#endif
+		compl = (q->unacked & 8) << (S_WR_COMPL - 3);
 	q->unacked &= 7;
 	pidx = q->pidx;
 	q->pidx += ndesc;
@@ -2171,9 +2195,38 @@ static inline void handle_rsp_cntrl_info(struct sge_qset *qs, u32 flags)
 #endif
 
 	credits = G_RSPD_TXQ0_CR(flags);
-	if (credits)
+	if (credits) {
 		qs->txq[TXQ_ETH].processed += credits;
-
+#ifdef CONFIG_XEN
+		if (is_initial_xendomain()) {
+			/*
+			 * In the normal Linux driver t3_eth_xmit()
+			 * routine, we call skb_orphan() on unshared TX skb.
+			 * This results in a call to the destructor for
+			 * the skb which frees up the send buffer space
+			 * it was holding down. This, in turn, allows the
+			 * application to make forward progress generating
+			 * more data which is important at 10Gb/s.
+			 * For Virtual Machine Guest Operating Systems
+			 * this doesn't work since the send buffer space is
+			 * being held down in the Virtual Machine. Thus we
+			 * need to get the TX skb's freed up as soon as
+			 * possible in order to prevent applications from
+			 * stalling. This code is largely copied from the
+			 * corresponding code in sge_timer_tx() and should
+			 * probably be kept in sync with any changes there. */
+			if (spin_trylock(&qs->txq[TXQ_ETH].lock)) {
+				struct sge_txq *q = &qs->txq[TXQ_ETH];
+				struct port_info *pi = netdev_priv(qs->netdev);
+				struct adapter *adap = pi->adapter;
+
+				reclaim_completed_tx(adap, &qs->txq[TXQ_ETH],
+						     TX_RECLAIM_CHUNK);
+				spin_unlock(&qs->txq[TXQ_ETH].lock);
+			}
+		}
+#endif
+	}
 	credits = G_RSPD_TXQ2_CR(flags);
 	if (credits)
 		qs->txq[TXQ_CTRL].processed += credits;
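
For illustration only, and not part of the patch: the difference between the two completion-request policies in t3_eth_xmit() can be sketched as a small user-space model. The value 21 for S_WR_COMPL is an assumption made for this sketch, and struct txq_sim, compl_coalesced(), compl_always() and count_requests() are hypothetical names that do not exist in the driver.

```c
/* Assumed for illustration: bit position of the WR completion flag. */
#define S_WR_COMPL 21
#define F_WR_COMPL (1U << S_WR_COMPL)

/* Hypothetical stand-in for the per-queue counter q->unacked. */
struct txq_sim {
	unsigned int unacked;
};

/*
 * Coalesced policy (the pre-patch expression): bit 3 of the counter is
 * shifted up into the completion-flag position, so a completion is only
 * requested once 8 or more descriptors have accumulated.
 */
static unsigned int compl_coalesced(struct txq_sim *q, unsigned int ndesc)
{
	unsigned int compl;

	q->unacked += ndesc;
	compl = (q->unacked & 8) << (S_WR_COMPL - 3);
	q->unacked &= 7;
	return compl;
}

/* dom0 policy from the patch: request a completion on every send. */
static unsigned int compl_always(struct txq_sim *q, unsigned int ndesc)
{
	q->unacked += ndesc;
	q->unacked &= 7;
	return F_WR_COMPL;
}

/* Count how many of n single-descriptor sends request a completion. */
static unsigned int count_requests(
	unsigned int (*policy)(struct txq_sim *, unsigned int),
	unsigned int n)
{
	struct txq_sim q = { 0 };
	unsigned int i, hits = 0;

	for (i = 0; i < n; i++)
		if (policy(&q, 1) & F_WR_COMPL)
			hits++;
	return hits;
}
```

Under these assumptions, count_requests(compl_coalesced, 32) returns 4 while count_requests(compl_always, 32) returns 32, which matches the "much more aggressive credit return policy" described in the quoted text.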