From: mchristi@redhat.com <mchristi@redhat.com> Date: Wed, 11 Feb 2009 18:11:43 -0600 Subject: [net] skbuff: fix oops in skb_seq_read Message-id: 1234397503674-git-send-email-mchristi@redhat.com O-Subject: [PATCH] RHEL 5.4: net: fix oops in skb_seq_read Bugzilla: 483285 RH-Acked-by: Neil Horman <nhorman@redhat.com> RH-Acked-by: David Miller <davem@redhat.com> RH-Acked-by: Thomas Graf <tgraf@redhat.com> From: Mike Christie <mchristi@redhat.com> This is for BZ 483285. This patch fixes multiple bugs in skb_seq_read. >From the usptream commits: ------------------------------------------------------- http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=95e3b24cfb4ec0479d2c42f7a1780d68063a542a The frag_list handling was broken in skb_seq_read: 1) We didn't add the stepped offset when looking at the head are of fragments other than the first. 2) We didn't take the stepped offset away when setting the data pointer in the head area. 3) The frag index wasn't reset ------------------------------------------------------- http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=71b3346d182355f19509fadb8fe45114a35cc499 The st pointer is shifted to the next skb whereas actually it should have hit the second else if first since the data is in the frag_list. ------------------------------------------------------- http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=5b5a60da281c767196427ce8144deae6ec46b389 Having walked through the entire skbuff, skb_seq_read would leave the last fragment mapped. As a consequence, the unwary caller would leak kmaps, and proceed with preempt_count off by one. The only (kind of non-intuitive) workaround is to use skb_seq_read_abort. I have tested with a couple ixgbe drivers from intel's website with LRO enabled. Some there do not build becuase of the missing net if lro, but others do compile and have a warning to not use with iscsi because of its use of skb_seq_read. Our iscsi driver in 5.3 does not use skb_seq_read, but I am porting iscsi fixes for 5.4 that will use it, so this is why I am sending the net fixes now. With or without the intel LRO driver I did not hit the oops. Upstream, I also did not hit the oops like intel and lsi did. I would see data corruption every once and a while instead (did not see that with rhel though). diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 750ec6f..836cc5f 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1720,11 +1720,11 @@ void skb_prepare_seq_read(struct sk_buff *skb, unsigned int from, * of bytes already consumed and the next call to * skb_seq_read() will return the remaining part of the block. * - * Note: The size of each block of data returned can be arbitary, + * Note 1: The size of each block of data returned can be arbitary, * this limitation is the cost for zerocopy seqeuental * reads of potentially non linear data. * - * Note: Fragment lists within fragments are not implemented + * Note 2: Fragment lists within fragments are not implemented * at the moment, state->root_skb could be replaced with * a stack for this purpose. */ @@ -1738,10 +1738,10 @@ unsigned int skb_seq_read(unsigned int consumed, const u8 **data, return 0; next_skb: - block_limit = skb_headlen(st->cur_skb); + block_limit = skb_headlen(st->cur_skb) + st->stepped_offset; if (abs_offset < block_limit) { - *data = st->cur_skb->data + abs_offset; + *data = st->cur_skb->data + (abs_offset - st->stepped_offset); return block_limit - abs_offset; } @@ -1771,13 +1771,19 @@ next_skb: st->stepped_offset += frag->size; } - if (st->cur_skb->next) { - st->cur_skb = st->cur_skb->next; + if (st->frag_data) { + kunmap_skb_frag(st->frag_data); + st->frag_data = NULL; + } + + if (st->root_skb == st->cur_skb && + skb_shinfo(st->root_skb)->frag_list) { + st->cur_skb = skb_shinfo(st->root_skb)->frag_list; st->frag_idx = 0; goto next_skb; - } else if (st->root_skb == st->cur_skb && - skb_shinfo(st->root_skb)->frag_list) { - st->cur_skb = skb_shinfo(st->root_skb)->frag_list; + } else if (st->cur_skb->next) { + st->cur_skb = st->cur_skb->next; + st->frag_idx = 0; goto next_skb; }