<HTML
><HEAD
><TITLE
>Our perspective</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.76b+
"><LINK
REL="HOME"
TITLE="Usenet News HOWTO "
HREF="index.html"><LINK
REL="PREVIOUS"
TITLE="Usenet news clients"
HREF="x1208.html"><LINK
REL="NEXT"
TITLE="Usenet software: a historical perspective"
HREF="softwarehistory.html"></HEAD
><BODY
CLASS="SECTION"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>Usenet News HOWTO</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="x1208.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
></TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="softwarehistory.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECTION"
><H1
CLASS="SECTION"
><A
NAME="AEN1243"></A>12. Our perspective</H1
><P
>This chapter shares our perspective on certain technical choices.
It covers issues that are more a matter of opinion than of
detail.</P
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="FEEDEFFICIENCY"></A>12.1. Efficiency issues of NNTP</H2
><P
>    To understand why NNTP is often an inappropriate choice for
    newsfeeds, we need to understand TCP's sliding window protocol
    and the nature of NNTP. NNTP is an appalling waste of bandwidth
    for most bulk article transfer situations, for the
    following simple reasons:</P
><P
></P
><UL
><LI
><P
>    <EM
>No compression</EM
>: articles are transferred in plain text. </P
></LI
><LI
><P
> 
    <EM
>No article transmission restart</EM
>: if a
    connection breaks halfway through an article, the next round
    will have to start with the beginning of the article.</P
></LI
><LI
><P
> 
    <EM
>Ping-pong protocol</EM
>: NNTP is unsuitable for
    bulk streaming data transfer because the TCP sliding window feature
    is unusable with NNTP.</P
></LI
></UL
><P
>    What is a ping-pong protocol? TCP uses a sliding window mechanism to
    pump out data in one direction very rapidly, and can achieve near
    wire speeds under most circumstances. However, this only works if
    the application layer protocol can aggregate a large amount of data
    and pump it out without having to stop every so often, waiting for
    an ack or a response from the other end's application layer. This is
    precisely why sending one file of 100 Mbytes by FTP takes so much less
    clock time than 10,000 files of 10 Kbytes each, all other parameters
    remaining unchanged. The trick is to keep the sliding window sliding
    smoothly over the outgoing data, blasting packets out as fast as the
    wire will carry them, without ever allowing the window to empty out
    while you wait for an ack.  Protocols which require short bursts of
    data from either end constantly, <EM
>e.g.</EM
> in the
    case of remote procedure calls, are called ``ping pong protocols''
    because they remind you of a table-tennis ball.</P
><P
>    With NNTP, this is precisely the problem. The average size
    of Usenet news messages, including header and body, is
    3 Kbytes. When thousands of such articles are sent out by
    NNTP, the sending server has to send the message ID of the
    first article, then wait for the receiving server to respond
    with a ``yes'' or ``no.'' Once the sending server gets the
    ``yes'', it sends out that article, and waits for an ``ok''
    from the receiving server. Then it sends out the message ID
    of the second article, and waits for another ``yes'' or
    ``no.'' And so on. The TCP sliding window never gets to do
    its job.  </P
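><P
>    The cost of all this waiting can be estimated with a rough
    model. The following Python sketch is ours, not part of any news
    software: it assumes two round trips per article (one for the
    ``yes'' or ``no'', one for the ``ok''), ignores the bytes taken by
    message IDs and replies, and uses a hypothetical 56 Kbits/sec link
    with a 300 ms round-trip time.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
# Rough model, not a measurement: every value passed in below is an assumption.

def pingpong_seconds(articles, article_bytes, link_bytes_per_sec, rtt_sec):
    """One article per exchange: offer the ID, wait, send the article, wait."""
    wire_time = articles * article_bytes / link_bytes_per_sec
    # Two waits per article: one for the yes/no, one for the final ok.
    waiting = articles * 2 * rtt_sec
    return wire_time + waiting


def streamed_seconds(articles, article_bytes, link_bytes_per_sec, rtt_sec):
    """One long batch: the window stays full, so only one round trip is paid."""
    wire_time = articles * article_bytes / link_bytes_per_sec
    return wire_time + rtt_sec


if __name__ == "__main__":
    # Hypothetical link: 56 kbit/s (7000 bytes/s) with a 300 ms round trip.
    args = (10000, 3000, 7000.0, 0.3)
    print("ping-pong: %.0f s" % pingpong_seconds(*args))
    print("streamed:  %.0f s" % streamed_seconds(*args))
```
</PRE
><P
>    Under these assumed figures, the time spent waiting for replies
    exceeds the time spent sending article data; the longer the round
    trip, the worse the ping-pong case becomes.</P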
><P
>    This sub-optimal use of TCP's data pumping ability, coupled with
    the absence of compression, makes for a protocol which is great
    for synchronous connectivity, <EM
>e.g.</EM
> for news
    reading or real-time
    updates, but very poor for batched transfer of data which can be
    delayed and pumped out. All these are precisely reversed in the
    case of UUCP over TCP.</P
><P
>    To decide which protocol, UUCP over TCP or NNTP, is appropriate
    for your server, you must address two questions:</P
><P
></P
><OL
TYPE="1"
><LI
><P
> 
    How much time can your server afford to wait from the time
    your upstream server receives an article to the time it
    passes it on to you?</P
></LI
><LI
><P
> 
    Are you receiving the same set of hierarchies from multiple
    next-door neighbour servers, <EM
>i.e.</EM
> is your
    newsfeed flow pattern a mesh instead of a tree?</P
></LI
></OL
><P
>    If your answers to the two questions above are ``messages cannot
    wait'' and ``we operate in a mesh'', then NNTP is the correct
    protocol for your server to receive its primary feed(s). </P
><P
>    In most cases, carrier-class servers operated by major service
    providers do not want to accept even a minute's delay from the
    time they receive an article to the time they retransmit it out.
    They also operate in a mesh with other servers operated by their
    own organisations (<EM
>e.g.</EM
> for redundancy) or
    others. They usually
    sit very close to the Internet backbone,
    <EM
>i.e.</EM
> with Tier 1 ISPs,
    and have extremely fast Internet links, usually more than
    10 Mbits/sec. The amount of data that flows out of such servers
    in outgoing feeds is more than the amount that comes in, because
    each incoming article is retained, not for local consumption,
    but for retransmission to others lower down in the flow. And
    these servers boast of a retransmission latency of less than 30
    seconds, <EM
>i.e.</EM
> I will retransmit an article
    to you within 30 seconds of my having received it.  </P
><P
>    However, if your server is used by a company for making Usenet
    news available for its employees, or by an institute to make the
    service available for its students and teachers, then you are
    not operating your server in a mesh pattern, nor do you mind it
    if messages take a few hours to reach you from your upstream
    neighbour. </P
><P
>    In that case, you have enormous bandwidth to conserve by moving
    to UUCP.  Even if, in this Internet-dominated era, you have no
    one to supply you with a newsfeed using dialup point-to-point
    links, you can pick up a compressed batched newsfeed using UUCP
    over TCP, over the Internet.  </P
><P
>    In this context, we want to mention Taylor UUCP, an excellent
    UUCP implementation available under GNU GPL. We use this UUCP
    implementation in preference to the bundled UUCP systems offered
    by commercial Unix vendors even for dialup connections, because
    it is far more stable, performs better, and always supports
    file transfer restart. Over TCP/IP, Taylor is the only one we
    have tried, and we have no wish to try any others.  </P
><P
>    Apart from its robustness, Taylor UUCP has one invaluable
    feature critical to large Usenet batch transfers: file transfer
    restart. If it is transferring a 10 MB batch, and the connection
    breaks after 8 MB, it will restart precisely where it left off
    last time. Therefore, no bytes of bandwidth are wasted, and
    queues never get stuck forever.  </P
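><P
>    The effect of restart on a flaky link can be illustrated with a
    toy model. This Python sketch does not reproduce Taylor UUCP's
    actual mechanism; the function name and the session figures are
    made up for illustration.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
def bytes_on_wire(total_bytes, session_limits, can_restart):
    """Return bytes sent before the file arrives, or None if it never does.

    session_limits lists how many bytes each successive connection carries
    before it breaks; these figures are made up for illustration.
    """
    sent = 0
    done = 0  # bytes the receiver has safely kept
    for limit in session_limits:
        remaining = total_bytes - done
        moved = min(limit, remaining)
        sent += moved
        if moved == remaining:
            return sent
        if can_restart:
            done += moved  # resume from here next time
        # without restart, done stays at 0 and the next attempt starts over
    return None
```
</PRE
><P
>    For a 10 MB batch over connections that each break after 8 MB,
    <TT
CLASS="LITERAL"
>bytes_on_wire(10, [8, 8, 8], True)</TT
> returns 10, whereas the
    same call with <TT
CLASS="LITERAL"
>False</TT
> returns
    <TT
CLASS="LITERAL"
>None</TT
>: without restart, the transfer never
    completes.</P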
><P
>    Over NNTP, since there is no batching, transfers happen one
    article at a time. Considering the (relatively) small size of an
    article compared to multi-megabyte UUCP batches, one would
    expect that an article would never pose a major problem while
    being transported; if it can't be pushed across in one attempt,
    it'll surely be copied the next time.  However, we have
    experienced entire NNTP feeds getting stuck for days on end
    because of one article, with logs showing the same article
    breaking the connection over and over again while being
    transferred <A
NAME="AEN1281"
HREF="#FTN.AEN1281"
>[1]</A
>. Some rare articles can be
    more than a megabyte in size, particularly in
    <TT
CLASS="LITERAL"
>comp.binaries</TT
>. In each such incident, we have
    had to manually edit the queue file on the transmitting server
    and remove the offending article from the head of the queue.
    Taylor UUCP, on the other hand, has never given us a single
    hiccup with blocked queues.  </P
><P
>    We feel that the overwhelming majority of servers offering the
    Usenet news service are at the leaf nodes  of the Usenet news
    flow, not at the heart. These servers are usually connected in a
    tree, with each server having one upstream ``parent node'', and
    multiple downstream ``child nodes.'' These servers receive their
    bulk incoming feed from their upstream server, and their users
    can tolerate a delay of a few hours for articles to move in and
    out. If your server is in this class, we feel you should
    consider using UUCP over TCP and transfer compressed batches.
    This will minimise bandwidth usage, and if you operate using
    dialup Internet connections, it will directly reduce your
    expenses.  </P
><P
>    A word about the link between mesh-patterned newsfeed flow and
    the need to use NNTP. If your server is receiving primary ---
    as against trickle --- feeds from multiple next-door neighbours,
    then you have to use NNTP to receive these feeds. The reason
    lies in the way UUCP batches are accepted. UUCP batches are
    received in their entirety into your server, and then they are
    uncompressed and processed. When the sending server is giving
    you the batch, it is not getting a chance to go through the
    batch article by article and ask your server whether you have or
    don't have each article. This way, if multiple servers give you
    large feeds for the same hierarchies, then you will be bound to
    receive multiple copies of each article if you go the UUCP way.
    All the gains of compressed batches will then be neutralised.
    NNTP's <TT
CLASS="LITERAL"
>IHAVE</TT
> command, which the receiving server answers with response
    code 335 (send the article) or 435 (article not wanted),
    in effect
    permits precisely this double-check for each article, and thus
    you don't receive even a single article twice. </P
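><P
>    The difference can be sketched in a few lines of Python. This is
    a toy model with hypothetical function names, not a description of
    how any news server is implemented. The comments refer to the NNTP
    response codes 335 (send the article) and 435 (article not wanted)
    with which a receiver answers an offer, and the message IDs used
    with it are plain stand-in strings.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
def offered_feed(history, offers):
    """Article-by-article offer: the receiver declines IDs it already holds.

    Mirrors the IHAVE exchange, where the receiver replies 335 (send it)
    or 435 (not wanted). Returns the number of bytes actually transferred.
    """
    transferred = 0
    for message_id, size in offers:
        if message_id in history:
            continue  # receiver would answer 435
        history.add(message_id)  # receiver would answer 335, then store it
        transferred += size
    return transferred


def batched_feed(history, batch):
    """Whole-batch delivery: every byte arrives before duplicates are found."""
    transferred = sum(size for _, size in batch)
    history.update(message_id for message_id, _ in batch)
    return transferred
```
</PRE
><P
>    If two neighbours each deliver the same three 3000-byte
    articles, the two offered feeds transfer 9000 bytes in total,
    while the two batched feeds transfer 18000 bytes, half of them
    wasted.</P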
><P
>    For Usenet servers which connect to the Internet periodically
    using dialup connections to fetch news, the UUCP option is
    especially important. Their primary incoming newsfeed cannot be
    pushed into them using queued NNTP feeds for reasons described
    in the above <A
HREF="x64.html#DIALUPNONNTP"
>paragraph</A
>.
    These
    hapless servers are usually forced to pull out their articles
    using a pull NNTP feed, which is often very slow. This may lead
    to long connect times, repeat attempts after every line break,
    and high Internet connection charges.  </P
><P
>    On the other hand, we have been using UUCP over TCP and
    <TT
CLASS="LITERAL"
>gzip</TT
>'d batches for more than five years now
    in a variety of sites. Even today, a full feed of all eight
    standard hierarchies, plus the full
    <TT
CLASS="LITERAL"
>microsoft</TT
>, <TT
CLASS="LITERAL"
>gnu</TT
>
    and <TT
CLASS="LITERAL"
>netscape</TT
> hierarchies, minus 
    <TT
CLASS="LITERAL"
>alt</TT
> and <TT
CLASS="LITERAL"
>comp.binaries</TT
>, can
    comfortably be handled in just a few hours of connect time every
    night, dialing up to the 
    Internet at 33.6 or 56 Kbits/sec. We believe that the proverbial
    `full feed' with all hierarchies including
    <TT
CLASS="LITERAL"
>alt</TT
> can be handled comfortably with a 24-hour
    link at 56 Kbits/sec, provided you forget about NNTP feeds. We
    usually get compression ratios of 4:1 using
    <TT
CLASS="LITERAL"
>gzip -9</TT
> on our news batches, incidentally. </P
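><P
>    The arithmetic behind this estimate can be checked with a short
    Python sketch. The function names are ours, and the calculation
    ignores protocol overhead and assumes the link sustains its nominal
    speed. The second function measures the ratio for a single batch
    using Python's standard <TT
CLASS="LITERAL"
>gzip</TT
> module at its
    highest compression level.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
import gzip


def daily_capacity_bytes(link_bits_per_sec, hours, compression_ratio):
    """Uncompressed news volume a link can carry in the given connect time."""
    raw = link_bits_per_sec / 8 * hours * 3600
    return raw * compression_ratio


def batch_ratio(batch_bytes):
    """Measured ratio for one batch at gzip's highest level (like gzip -9)."""
    packed = gzip.compress(batch_bytes, compresslevel=9)
    return len(batch_bytes) / len(packed)
```
</PRE
><P
>    With the figures from the text (56 Kbits/sec, 24 hours, a 4:1
    ratio), <TT
CLASS="LITERAL"
>daily_capacity_bytes(56000, 24, 4)</TT
> comes
    to roughly 2.4 billion bytes of uncompressed news per day.</P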
></DIV
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="AEN1299"></A>12.2. C-News+NNTPd or INN?</H2
><P
>INN and CNews are the two most popular free software implementations
of Usenet news. Of these two, we prefer CNews, primarily because
we have been using it across a very large range of Unixen for more
than one decade, starting from its earliest release --- the so-called
``Shellscript release'' --- and we have yet to see a need to
change.<A
NAME="AEN1302"
HREF="#FTN.AEN1302"
>[2]</A
></P
><P
>We have seen INN, and we are not comfortable with a software
implementation which puts so much functionality inside one
executable. This reminds us of Windows NT, Netscape Communicator,
and other complex and monolithic systems, which make us uncomfortable
with their opaqueness. We feel that CNews' architecture, which comprises
many small programs, intuitively fits into the Unix approach of building
large and complex systems, where each piece can be understood, debugged,
and if needed, replaced, individually.</P
><P
>Secondly, we seem to see the move towards INN accompanied by a move
towards NNTP as a primary newsfeed mechanism. This is no fault of INN;
we suspect it is a sort of cultural difference between INN users and
CNews users.  We find the issue of UUCP versus NNTP for batched newsfeeds
a far more serious issue than the choice of CNews versus INN. We simply
cannot agree with the idea that NNTP is an appropriate protocol for bulk
Usenet feeds for most sites. Unfortunately, most sites which are
comfortable using INN also seem to prefer NNTP over
UUCP, for reasons not clear to us.</P
><P
>Our comments should not be taken as expressing any reservation about
INN's quality or robustness. Its popularity is testimony to its
quality; it most certainly ``gets the job done'' as well as anything
else. In addition, there are a large number of commercial Usenet news
server implementations which have started with the INN code; we do not
know of any which have started with the CNews code. The Netwinsite DNews
system and the Cyclone Typhoon, we suspect, both are INN-spired.</P
><P
>We recommend CNews and NNTPd over INN, because we are more
comfortable with the CNews architecture for reasons given above, and we
do not run carrier-class sites. We will continue to support, maintain and
extend this software base, at least for Linux.  And we see no reason for
the overwhelming majority of Usenet sites to be forced to use anything
else. Your viewpoints are welcome.</P
><P
>Had we been setting up and managing carrier-class sites with their
near-real-time throughput requirements, we would probably not have
chosen CNews. And for those situations, our opinion of NNTP versus
compressed UUCP has been discussed in <A
HREF="x1243.html#FEEDEFFICIENCY"
>Section 12.1</A
>.</P
><P
>Suck and Leafnode have their place in the range of options, where they
appear to be attractive for novices who are intimidated by the ``full
blown'' appearance of CNews+NNTPd or INN. However, we run CNews + NNTPd
even on Linux laptops. We suspect INN can be used this way too. We do
not find these ``full blown'' implementations any more resource
hungry than their simpler cousins. Therefore, other than administration
and configuration familiarity, we don't see any other reason why even a
solitary end-user will choose Leafnode or Suck over CNews+NNTPd. As
always, contrary opinions invited.</P
></DIV
></DIV
><H3
CLASS="FOOTNOTES"
>Notes</H3
><TABLE
BORDER="0"
CLASS="FOOTNOTES"
WIDTH="100%"
><TR
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="5%"
><A
NAME="FTN.AEN1281"
HREF="x1243.html#AEN1281"
>[1]</A
></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="95%"
><P
>    This lack of a restart facility is something NNTP shares with
    its older cousin, SMTP, and we have often seen email messages
    getting stuck in a similar fashion over flaky data links. In
    many such networks which we manage for our clients, we have
    moved the inter-server mail transfer to Taylor UUCP, using UUCP
    over TCP.</P
></TD
></TR
><TR
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="5%"
><A
NAME="FTN.AEN1302"
HREF="x1243.html#AEN1302"
>[2]</A
></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="95%"
><P
>One of us did his first installation with BNews,
actually, at the IIT Mumbai. Then we rapidly moved from there to CNews
Shellscript Release, then CNews Performance Release, CNews Cleanup
Release, and our current release has fixed some bugs in the latest
Cleanup Release.</P
></TD
></TR
></TABLE
></BODY
></HTML
>