<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> <HTML> <HEAD> <META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9"> <TITLE>LVS-HOWTO: Introduction</TITLE> <LINK HREF="LVS-HOWTO-2.html" REL=next> <LINK HREF="LVS-HOWTO.html#toc1" REL=contents> </HEAD> <BODY> <A HREF="LVS-HOWTO-2.html">Next</A> Previous <A HREF="LVS-HOWTO.html#toc1">Contents</A> <HR> <H2><A NAME="s1">1. Introduction</A></H2> <P> <H2><A NAME="ss1.1">1.1 ChangeLog</A> </H2> <P> <P>LVS-HOWTO <UL> <LI>v0.1 12 Jun 99, sent to Wensong for checking</LI> <LI>v0.2 18 Jul 99, for kernel 2.0.36</LI> <LI>v0.3 6 Nov 99, for kernel 2.2.12, LVS v0.9.2, ipvsadm v1.5 (17Oct99 release)</LI> <LI>v0.4 7 Nov 99, for kernel 2.2.13, LVS v0.9.3, ipvsadm v1.5 (7Nov99 release)</LI> <LI>v0.5 14 Nov 99, for kernel 2.2.13, LVS v0.9.4, ipvsadm v1.5 (11Nov99 release)</LI> <LI>v0.6 17 Nov 99, updates off the mailing list</LI> <LI>v0.7 1 Dec 99, fixes to "arp problem" </LI> <LI>v1.0 Mar 2001, Transparent proxy for VS-DR (Horms' method) and for VS-Tun (from Julian) corrections to VS-NAT instructions for VS-DR for non-Linux real-servers (from ratz). Change to sgml format. Lots of updates</LI> <LI>v1.1 Mar 2001, transparent proxy for 2.4.x</LI> <LI>v1.2 Apr 2001, fwmark</LI> <LI>v1.3 May 2001, kernels 0.2.12-2.4.4, 1.0.7-2.2.19. e-commerce comments from Ratz, ftp is not secure, netscape persistence, dynamic web pages.</LI> <LI>v1.4 started May 2001, kernels 0.8.0-2.4.4, 1.0.7-2.2.19</LI> </UL> <P>posted to the LVS site, <A HREF="http://www.linuxvirtualserver.org/">LVS website http://www.linuxvirtualserver.org</A><P> <H2><A NAME="ss1.2">1.2 Purpose of this HOWTO</A> </H2> <P> <P>To enable you to understand how a Linux Virtual Server (LVS) works. Another document, the LVS-mini-HOWTO, tells you how to setup and install an LVS without understanding how the LVS works. The mini-HOWTO was extracted from this HOWTO. Eventually the redundant material will be dropped from this HOWTO. <P>The material here covers directors and real-servers with 2.2 kernels. For 2.4 kernels, the coverage is less extensive. In general ipvsadm commands and services have not changed between kernels. Otherwise there will be separate information for 2.2 and 2.4 kernels in each section. If there's nothing on 2.4 kernels, it's probably because I haven't written it yet. <P> <H2><A NAME="ss1.3">1.3 Nomenclature/Abbreviations</A> </H2> <P> <P>Preferred names: <UL> <LI> <EM>director</EM>: the node that runs the ipvs code. Clients connect to the director. The director forwards packets to the servers.</LI> <LI> <EM>real-servers</EM>: the servers that handle the requests from the clients. </LI> <LI> <EM>LVS</EM> director + real-servers. The virtual server which appears as one machine to the clients.</LI> </UL> <P>The code that patches the linux kernel is called ipvs (or IPVS). <P> <H3>synonyms</H3> <P> <P> <UL> <LI> <EM>director</EM>: load balancer, dispatcher, redirector.</LI> <LI> <EM>real-server</EM>: servers, real servers. </LI> <LI> <EM>LVS</EM>: the whole cluster, the (linux) virtual server (LVS)</LI> </UL> <P> <H3>other</H3> <P>The real-servers sometimes are front-ends to other (back-end) servers. (The client does not connect to the back-end servers and they are not in the ipvsadm table.) <P><EM>e.g.</EM> <UL> <LI> a real-server may run a web application. The web application in turn connects to a database on another (back-end) server.</LI> <LI> a real-server which is a webcache (<EM>e.g.</EM> a squid). The squid connects to (back-end) webserver(s). </LI> </UL> <P> <H3>Nomenclature is not uniform</H3> <P> <P>Anarchy still rules with nomenclature on the mailing list. People call both the director and the real-servers "the server" (in the ipvsadm man page the director is called the "LVS server/host" - this is a historical accident which will be fixed), the real-server is called the "real server" and the whole system the "virtual server". Because of the ambiguity involved, I would rather avoid the term "the server" with LVS. Most people use "real server" for what I call "real-server". I use "real-server" as I despair of finding a reference to a real server in a webpage using the search keys "real" and "server". Until a standardised nomenclature is adopted, you should be prepared for anything on the mailing list. <P> <H3>IPs</H3> <P> <P>The following abbreviations are used in the HOWTO (and in the mailing list). <P> <PRE> client IP = CIP virtual IP = VIP (the IP on the director that the client connects to) director IP = DIP real-server IP = RIP (and RIP1, RIP2...) director GW = DGW real-serer GW = RGW </PRE> <P> <H2><A NAME="ss1.4">1.4 What is an LVS?</A> </H2> <P> <P>A Linux Virtual Server (LVS) is a piece of software which makes a cluster of servers appear to be one server to an outside client. This apparent single server is called here a "virtual server". The individual servers (real-servers) are under the control of a director (or load balancer), which runs a patched Linux kernel. The client is connected to one particular real-server for the duration of its connection to the LVS. This connection is chosen and managed by the director. The real-servers serve services (eg ftp, http, dns, telnet, nntp, smtp) such as are found in /etc/services or inetd.conf. The LVS presents one IP on the director (the virtual IP, VIP) to clients. <P> <P>An LVS is useful <UL> <LI> for higher throughput. The cost of increasing throughput by adding real-servers in an LVS increases linearly, whereas the cost of increased throughput by buying a larger single machine increases faster than linearly</LI> <LI> for redundancy. Individual machines can be switched out of the LVS, upgraded and brought back on line without interuption of service to the clients. Machines can move to a new site and brought on line one at a time while machines are removed from the old site, without interruption of service to the clients.</LI> <LI> for adaptability. If the throughput is expected to change gradually (as a business builds up), or quickly (for an event), the number of servers can be increased (and then decreased) transparently to the clients.</LI> </UL> <P>The Client/Server relationship is preserved in an LVS since <P> <UL> <LI>IPs of all servers is mapped to one IP (the VIP). Client sees only one IP address </LI> <LI> servers at different IP addresses believe they are contacted directly by the client.</LI> </UL> <P>Here is the basic setup for an LVS. <P> <BLOCKQUOTE><CODE> <HR> <PRE> ________ | | | client | (local or on internet) |________| | (router) | -- | L Virtual IP i ____|_____ n | | (director can have 1 or 2 NICs) u | director | x |__________| | V | i | r ---------------------------------- t | | | u | | | a | | | l _____________ _____________ _____________ | | | | | | S | real-server | | real-server | | real-server | e |_____________| |_____________| |_____________| r v e r --- </PRE> <HR> </CODE></BLOCKQUOTE> <P>In the computer bestiary, the director is a layer 4 (L4) switch. The director makes decisions at the IP layer and just sees a stream of packets going between the client and the real-servers. In particular an L4 switch makes decisions based on the IP information in the headers of the packets. <P>Here's a description of an L4 switch from <A HREF="http://www.supersparrow.org/ss_paper/">Super Sparrow Global Load Balancer documentation</A><P>" Layer 4 Switching: Determining the path of packets based on information available at layer 4 of the OSI 7 layer protocol stack. In the context of the Internet, this implies that the IP address and port are available as is the underlying protocol, TCP/IP or UCP/IP. This is used to effect load balancing by keeping an affinity for a client to a particular server for the duration of a connection. " <P>The director does not inspect the content of the packets and cannot make decisions based on the content of the packets (e.g. if the packet contains a <A HREF="LVS-HOWTO-10.html#cookie">cookie</A>, the director doesn't know about it and doesn't care). The director doesn't know anything about the application generating the packets or what the application is doing. Because the director does not inspect the content of the packets (layer 7, L7) it is not capable of session management or providing service based on packet content. L7 capability would be a useful feature for LVS and perhaps this will be developed in the future. <P>The director is basically a router, with routing tables set up for the LVS function. These tables allow the director to forward packets to real-servers for services that are being LVS'ed. If http (port 80) is a service that is being LVS'ed then the director will forward those packets. The director does not have a socket listener on VIP:80 (i.e. netstat won't see a listener). <P>John Cronin <CODE>jsc3@havoc.gtf.org</CODE> (19 Oct 2000) calls these types of servers (i.e. lots of little boxes appearing to be one machine) "RAILS" (Redundant Arrays of Inexpensive Linux|Little|Lightweight|L* Servers). Lorn Kay <CODE>lorn_kay@hotmail.com</CODE> calls them RAICs (C=computer), pronounced "rake". <P>The director uses 3 different methods of <A HREF="LVS-HOWTO-5.html#forwarding">forwarding</A> <P> <UL> <LI> VS-NAT based on network address translation (NAT)</LI> <LI> VS-DR (direct routing) where the MAC addresses on the packet are changed and the packet forwarded to the real-server</LI> <LI> VS-Tun (tunnelling) where the packet is IPIP encapsulated and forwarded to the real-server.</LI> </UL> <P>Some modification of the real-server's ifconfig and routing tables will be needed for VS-DR and VS-Tun forwarding. For VS-NAT the real-servers only need a functioning tcpip stack (ie the real-server can be a networked printer). <P> <P>LVS works with all services tested so far (single and 2 port services) except that VS-DR and VS-Tun cannot work with services that initiate connects from the real-servers (so far; identd and rsh). <P>The real-servers can be indentical, presenting the same service (eg http, ftp) working off file systems which are kept in sync for content. This type of LVS increases the number of clients able to be served. Or the real-servers can be different, presenting a range of services from machines with different services or operating systems, enabling the virtual server to present a total set of services not available on any one server. The real-servers can be local/remote, running Linux (any kernel) or other OS's. Some methods for setting up an LVS have fast packet handling (eg VS-DR which is good for http and ftp) while others are easier to setup (eg transparent proxy) but have slower packet throughput. In the latter case, if the service is CPU or I/O bound, the slower packet throughput may not be a problem. <P> <P>For any one service (eg httpd at port 80) all the real-servers must present identical content since the client could be connected to any one of them and over many connections/reconnections, will cycle through the real-servers. Thus if the LVS is providing access to a farm of web, database, file or mail servers, all real-servers must have identical files/content. You cannot split up a database amongst the real-servers and access pieces of it with LVS. <P>An LVS requires a Linux director (Intel and <A HREF="LVS-HOWTO-2.html#Alpha">Alpha</A> versions known to work, other versions eg Sparc expected to work, but haven't been tested). <P>There are differences in the coding for LVS for the 2.0.x, 2.2.x and 2.4.x kernels. Development of LVS on 2.0.36 kernels has stopped (May 99). Code for 2.2.x kernels is production level and this HOWTO is up to date for 2.2.18 kernels. Code for 2.4.x kernels is relatively new and this HOWTO may not contain all the information neccessary to run 2.4.x kernel ipvs (check on the mailing list). <P>The 2.0.x and 2.2.x code is based on the masquerading code. Even if you don't explicitely use ipchains (eg with VS-DR or VS-Tun), you will see masquerading entries with `ipchains -M -L` (or `netstat -M`). Because 2.2.x kernels are not SMP for kernel code, LVS can only use one processor at a time. Code for 2.4.x kernels was rewritten to be compatible with the netfilter code (i.e. its entries will show up in netfilter tables). It is production level for VS-DR and VS-Tun. Because of incompatibilities with VS-NAT for 2.4.x LVS is development (Jan 2001) with problems are expected to be resolved shortly. Directors running 2.4.x kernel LVS's will run in SMP mode in the kernel allowing a higher throughput of packets than with a 2.2.x kernel running on the same (SMP) hardware. <P> <P>If you are setting up an LVS for the first time, then set up the director with the latest 2.2.x kernel or 2.4.x kernel. You can have almost any OS on the real-servers. <P> <P>LVS works on ethernet. There are some limitations on using <A HREF="LVS-HOWTO-3.html#ATM">ATM</A>. <P>LVS is continually being developed and usually only the more recent kernel and kernel patches are supported. Usually development is incremental, but with the 2.2.14 kernels the entries in the /proc file system changed and all subsequent 2.2.x versions were incompatible with previous versions. <P> <P>For more documentation, look at the LVS web site (eg a talk I gave on how LVS works on <A HREF="http://www.linuxvirtualserver.org/Joseph.Mack">2.0.36 kernel directors</A>) <P> <H2><A NAME="minimal_knowledge"></A> <A NAME="ss1.5">1.5 Minimal knowledge required</A> </H2> <P> <P>(from Ratz <CODE>ratz@tac.ch</CODE>) <P>To be able to setup/maintain an LVS it would be helpfull if you <P> <UL> <LI>know how to patch and compile a kernel</LI> <LI>the basics of shell-scripting</LI> <LI>have intermediate knowledge of TCP/IP</LI> <LI>have read the man-page, the online-documentation and LVS-HOWTO (this document) (and the LVS-mini-HOWTO)</LI> <LI>know basic system administration tools (e.g. ipchains, syslog)</LI> </UL> <P> <H2><A NAME="ss1.6">1.6 LVS Failure</A> </H2> <P> <P>The LVS itself does not provide high availability. Other software (eg mon, ldirectord, or the Linux HA code) is used in conjunction with LVS to provide high availability (i.e. to switch out a failed real-server/service or a failed director). <P>Another package <A HREF="http://keepalived.sourceforge.net/">keepalived</A> is designed to work with LVS watching the health of services, but no-one has reported using it. Julian has written <A HREF="http://www.linuxvirtualserver.org/~julian/">Netparse</A>, which is suffering the same fate. <P>There are two types of failures with an LVS. <P> <UL> <LI>director failure <P> <P>This is handled by having a redundant director available. Director failover is handled by the <A HREF="http://ultramonkey.sourceforge.net">Ultra Monkey Project</A>. <P> </LI> <LI>real-server failure, or failure of a service on a real-server <P> <P>This is relatively simple to handle (compared to director failover). <P>An agent running on the director monitors the services on the real-servers. If a service goes down, that service is removed from the ipvsadm table. When the service comes back up, the service is added back to the ipvsadm table. There is no separate handling of real-server failure (e.g. it catches on fire, a concern of Mattieu Marc <CODE>marc.mathieu@metcelo.com</CODE>) - the agent on the director will just remove all that real-server's services from the ipvsadm table. <P>In the <A HREF="http://ultramonkey.sourceforge.net">Ultra Monkey Project</A>, service failure is monitored by ldirectord. <P>The <A HREF="LVS-HOWTO-9.html#configure">configure script</A> monitors services with mond. Setting up mon is covered in <A HREF="LVS-HOWTO-19.html#failover">Failover protection</A><P> </LI> </UL> <P>In both cases (failure of director, or failure of a service), the client's session with the real-server will be lost (as would happen in the case of a single server). With failover however, the client will be presented with a new connection when they reconnect. <P> <H2><A NAME="ss1.7">1.7 Thanks</A> </H2> <P>Contributions to this HOWTO came from the mailing list and are attibuted to the poster (with e-mail address). Postings may have been edited to fit them into the flow of the HOWTO. <P> <P>The LVS logo (Tux with 3 lighter shaded penguins behind him representing a director and 3 real-servers) is by Mike Douglas spike@bayside.net <P> <P> <A HREF="http://www.linuxvirtualserver.org">LVS homepage</A> in Germany hosted by Lars <CODE>lmb@suse.de</CODE> <P> <H2><A NAME="ss1.8">1.8 Mailing list, subscribing, unsubscribing and searchable archives</A> </H2> <P> <P> <A HREF="http://www.linuxvirtualserver.org/mailing.html">mailing list details</A><P>Thanks to Hank Leininger for the mailing list archive which is searchable not only by subject and author, but by strings in the body. Hank's resource has been of great help in assembling this HOWTO. <P> <H2><A NAME="ss1.9">1.9 getting technical help</A> </H2> <P> <P> <UL> <LI> mailing list</LI> <LI> LVS-mini-HOWTO. This will setup a simple 3 node (client, director, real-server) LVS without you needing to understand how an LVS works. Presumably you will have read the mini-HOWTO before this HOWTO.</LI> <LI> updates/problems - post to the mailing list and cc: mailto:jmack@wm7d.net</LI> </UL> <P> <H2><A NAME="problem_report"></A> <A NAME="ss1.10">1.10 Posting problems/questions to the mailing list</A> </H2> <P> <P>There's always new ideas and questions being posted on the mailing list. We don't expect this to stop. <P>If you have a problem with your LVS not working, before you come up on the mailing list, please - <P> <P> <UL> <LI> Read the LVS-mini-HOWTO</LI> <LI>Set up a simple LVS (3 nodes: client, director, real-server) with VS-DR or VS-NAT forwarding, with the service telnet using the instructions in the LVS-mini-HOWTO. You should be able to do this starting from a freshly downloaded kernel and ipvs-patch pair. </LI> <LI>LVS is distribution neutral. It only needs the kernel (from ftp.kernel.org) and the ipvs patch (from www.linuxvirtualserver.org). If you can get an LVS to work with these files but not with the kernel on your distro, then you have a problem with your distro. If your problem is distribution specific (eg Redhat ships with LVS patches applied, with a version which will be old by the time you get to talk to us; SuSE is planning on doing the same thing) then we may know what the problem is, but it would be better if you contacted your distro - they need the feedback, not us. </LI> <LI>If you are using one of the packages that can be used with LVS (eg heartbeat from the Linux HA project http://www.henge.com/ alanr/ha, or piranha from Redhat), again we may know what the problem is, but they need the feedback that you can't get it to work, not us. Many of us are on each others' mailing lists and we try to help when we can but the best people to handle the problem are the developers for each package. </LI> <LI> Read the LVS-HOWTO </LI> <LI> Consult the <A HREF="http://marc.theaimsgroup.com/?l=linux-kernel&m=91754108723334&w=2">LVS archives</A> </LI> <LI>Please use our jargon as best you can. The machine names will be client, director, real-server1, real-server2... IPs are CIP, VIP, RIP, DIP. If you do this, we won't have to translate "susanne" and "annie" to their functional names as we scan your posting. </LI> <LI>When you come up on the mailing list we need to know your kernel (eg 2.2.14) and the patch that was applied to it (eg 0.9.11), whether you are using VS-DR, VS-NAT or VS-Tun. Tell us what you did, what you expected and what you got and why that's a problem. </LI> <LI>We didn't know everything when we first started and we don't expect you to be any better at LVS than we were at first, but we'll feel much better about helping you if you've done the above first. We can't help you if you just tell us that you've read the documents and your LVS doesn't work. </LI> </UL> <P> <P>If you don't understand your problem well, Here's a suggested submission format from Roberto Nibali <CODE>ratz@tac.ch</CODE> <P> <OL> <LI>System information, such as kernel, tools and their versions. Example: <PRE> hog:~ # uname -a Linux hog 2.2.18 #2 Sun Dec 24 15:27:49 CET 2000 i686 unknown hog:~ # ipvsadm -L -n | head -1 IP Virtual Server version 1.0.2 (size=4096) hog:~ # ipvsadm -h | head -1 ipvsadm v1.13 2000/12/17 (compiled with popt and IPVS v1.0.2) hog:~ # </PRE> </LI> <LI>Short description and maybe sketch of what you intended to setup. Example (LVS-DR): <PRE> o Using LVS-DR, gatewaying method. o Load balancing port 80 (http) non-persistent. o Network Setup: ________ | | | client | |________| | CIP | | | (router) | | | | GEP (packetfilter, firewall) | GIP | | __________ | DIP | | +------+ director | | VIP |__________| | | | +-----------------+----------------+ | | | | | | RIP1, VIP RIP2, VIP RIP3, VIP ____________ ____________ ____________ | | | | | | |real-server1| |real-server2| |real-server3| |____________| |____________| |____________| CIP = 212.23.34.83 GEP = 81.23.10.2 (external gateway, eth0) GIP = 192.168.1.1 (internal gateway, eth1, masq or NAT) DIP = 192.168.1.2 (eth0) VIP1 = 192.168.1.100 (eth0:4 on director, lo:1 on realserver) RIP1 = 192.168.1.11 (belonging to VIP1) RIP2 = 192.168.1.12 (belonging to VIP1) RIP3 = 192.168.1.13 (belonging to VIP1) DGW = 192.168.1.1 (GIP for all realserver) o ipvsadm -L -n hog:~ # ipvsadm -L -n IP Virtual Server version 1.0.2 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 192.168.1.10:80 wlc -> 192.168.1.13:80 Route 0 0 0 -> 192.168.1.12:80 Route 0 0 0 -> 192.168.1.11:80 Route 0 0 0 hog:~ # The output from ifconfig from all machines (abbreviated, just need the IP, netmask etc), and the output from netstat -rn. </PRE> </LI> <LI>What doesn't work. Show some output like tcpdump, ipchains, ipvsadm and kernlog. Maybe we even ask you for a more detailed configuration like routing table, OS-version or interface setup on some machines used in your setup. Tell us what you expected. Example: <PRE> o ipchains -L -M -n o echo 9 > /proc/sys/net/ipv4/vs/debug_level && tail -f /var/log/kernlog o tcpdump -n -e -i eth0 tcp port 80 o route -n o netstat -an o ifconfig -a </PRE> </LI> </OL> <P> <H2><A NAME="ss1.11">1.11 ToDo List</A> </H2> <P> <P>Combining HA and LVS (<EM>e.g.</EM> Ultramonkey). <P>I realise that information in here isn't all that easy to locate yet (there's no index and you'll have to search with your editor) and that the ordering of sections could be improved. <P>I'll work on it as I have time. <P> <H2><A NAME="ss1.12">1.12 Other load balancing solutions</A> </H2> <P> <P>from lvs@spiderhosting.com <A HREF="http://dmoz.org/Computers/Software/Internet/Site_Management/Load_Balancing">a list of load balancers</A><P> <H2><A NAME="ss1.13">1.13 Software/Information useful/related to LVS</A> </H2> <P> <P> <A HREF="http://ultramonkey.sourceforge.net/">Ultra Monkey</A> is LVS and HA combined. <P>from lvs@spiderhosting.com <A HREF="http://supersparrow.sourceforge.net/">Super Sparrow Global Load Balancing</A> using BGP routing information. <P>From ratz, there's a write up on load imbalance with persistence and sticky bits at our friends at <A HREF="http://www.micosoft.com/TechNet/win2000/nlbovw.asp?a=printable">M$</A>. <P>From ratz, Zero copy patches to the kernel to speed up network throughput, <A HREF="ftp://ftp.kernel.org/pub/linux/kernel/people/davem">Dave Miller's patches</A>, <A HREF="http://www.surriel.com/patches">Rik van Riel's vm-patches</A> and <A HREF="http://www.linux-mm.org/">more of Rick van Riel's patches</A>. The Zero copy patches may not work with LVS and may not work with netfilter either (from Kate<CODE>john@antefacto.com</CODE>). <P>From Michael Brown <CODE>michael_e_brown@dell.com</CODE>, the <A HREF="http://www.redhat.com/products/software/ecommerce/TUX">TUX kernel level webserver</A>. <P>From Lars <CODE>lmb@suse.de</CODE> <A HREF="http://www.backhand.org/">mod_backhand</A>, a method of balancing apache httpd servers that looks like ICP for web caches. <P>A lightweight and simple webbased cluster monitoring tool designed for beowulfs <A HREF="http://www.phy.duke.edu/brahma/">procstatd</A>, the latest version was 1.3.4 (you'll have to look around on this page). <P>From Putchong Uthayopas <CODE>pu@ku.ac.th</CODE> a heavyweight (lots of bells and whistles) cluster monitoring tool <A HREF="http://smile.cpe.ku.ac.th/software/">KCAP</A><P> <HR> <A HREF="LVS-HOWTO-2.html">Next</A> Previous <A HREF="LVS-HOWTO.html#toc1">Contents</A> </BODY> </HTML>