LVS-HOWTO
dave@glynjones.com
), other random stuff from the mailing list posted to the LVS site, and the LVS website http://www.linuxvirtualserver.org
To enable you to understand how a Linux Virtual Server (LVS) works. Another document, the LVS-mini-HOWTO, tells you how to set up and install an LVS without understanding how the LVS works. The mini-HOWTO was extracted from this HOWTO. Eventually the redundant material will be dropped from this HOWTO.
The material here covers directors and realservers with 2.2 and 2.4 kernels. The code for 2.0.x kernels still works fine and was used on production systems when 2.0.x kernels were current, but is not being developed further. For 2.2 kernels, the networking code was rewritten, producing the arp problem. This changes the installation of LVS from a simple process that can be done by almost anyone, to a thought-provoking, head-scratching exercise, which requires detailed understanding of the workings of LVS. For 2.0 and 2.2 kernels, LVS is stand-alone code based on ip_masquerading and doesn't integrate well with other uses of ip_masquerading. For 2.4 kernels, LVS was rewritten as a netfilter module to allow it to fit into and be visible to other netfilter modules. Unfortunately the fit isn't perfect, but cooperation with netfilter does work in most cases. Being a netfilter module, the latency and throughput are slightly worse for 2.4 LVS than for the 2.2 code. However with modern CPUs running at 800MHz, the bottleneck now is network throughput rather than LVS throughput (you only need a small number of realservers to saturate 100Mbps ethernet).
In general ipvsadm commands and services have not changed between kernels.
If you use these terms when you mail us, we'll know what you're talking about.
Please use the first term in these lines. The other words are valid but less precise (or are redundant).
The realservers sometimes are frontends to other backend servers. The client does not connect to these backend servers and they are not in the ipvsadm table.
e.g.
These backend servers are set up separately from the LVS.
People sometimes call the director or the realservers, "the server". Since the LVS appears as a server to the client and since the realservers are also serving services, the term "server" is ambiguous. Do not use the term "the server" when talking about LVS. Most often you are referring to the "director" or the "realservers". Sometimes (e.g. when talking about throughput) you are talking about the LVS.
I use "realserver" as I despair of finding a reference to a "real server" in a webpage using the search keys "real" and "server". Horms and I (for reasons that neither of us can remember) have been pushing the term "real-server" for about a year, on the mailing list, and no-one has adopted it. We're going back to "realserver".
client IP     = CIP
virtual IP    = VIP (the IP on the director that the client connects to)
director IP   = DIP (the IP on the director in the realserver's network)
realserver IP = RIP (and RIP1, RIP2...) (the IP on the realserver)
director GW   = DGW (or director gw)
realserver GW = RGW, SGW (or server gw)
A Linux Virtual Server (LVS) is a cluster of servers which appears to be one server to an outside client. This apparent single server is called here a "virtual server". The individual servers (realservers) are under the control of a director (or load balancer), which runs a Linux kernel patched to include the ipvs code. The ipvs code running on the director is the essential feature of the LVS (although other user-level code can be, and is, used to manage the LVS).
The director presents an IP called the Virtual IP (VIP) to clients. (When using fwmarks, VIPs are aggregated into groups of IPs, but the same principles apply as for a single IP.) When a client connects to the VIP, the director forwards the client's packets to one particular realserver for the duration of the client's connection to the LVS. This connection is chosen and managed by the director. The realservers serve services (eg ftp, http, dns, telnet, nntp, smtp) such as are found in /etc/services or inetd.conf.
Peter Martin p.martin@ies.uk.com and John Cronin jsc3@havoc.gtf.org (05 Jul 2001)
The VIP is the address which you want to load balance, i.e. the address of your website. The VIP is usually an alias (e.g. eth0:1) so that the VIP can be swapped between two directors if a fault is detected on one. The VIP is the IP address of the "service", not the IP address of any of the particular systems used in providing the service (ie the director and the realservers).
The VIP can be moved from one director to another backup director if a fault is detected (typically this is done by using mon and heartbeat, or something similar). The director can have multiple VIPs. Each VIP can have one or more services associated with it, e.g. you could have HTTP/HTTPS balanced using one VIP, and FTP service (or whatever) balanced using another VIP, and calls to these VIPs can be answered by the same or different realservers.
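Here's a minimal sketch of bringing up a VIP as an alias on the director (the address 192.168.1.110 is only an example, not from any particular setup):

    # bring up the VIP as an alias on the director's eth0;
    # the /32 netmask stops the alias adding a new network route
    ifconfig eth0:1 192.168.1.110 netmask 255.255.255.255 broadcast 192.168.1.110 up

A failover tool (mon/heartbeat or similar) moves the VIP by downing this alias on the failed director and bringing it up on the backup.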
Groups of VIPs and/or ports can be set up with fwmark.
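As a sketch of how this looks on a 2.4.x director (the mark value 1 and the addresses are illustrative assumptions), packets are marked with iptables and the mark is then balanced as a single virtual service:

    # mark http and https to the VIP with fwmark 1 (2.4.x)
    iptables -t mangle -A PREROUTING -d 192.168.1.110 -p tcp --dport 80 -j MARK --set-mark 1
    iptables -t mangle -A PREROUTING -d 192.168.1.110 -p tcp --dport 443 -j MARK --set-mark 1
    # balance everything carrying fwmark 1 as one virtual service
    ipvsadm -A -f 1 -s rr
    ipvsadm -a -f 1 -r 192.168.1.11 -g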
The realservers have to be configured to work with the VIPs on the director (this includes handling the arp problem).
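For LVS-DR, the usual arrangement on each realserver looks something like this sketch (example address again; the hidden flag requires Julian's "hidden" patch on 2.2.x/2.4.x kernels):

    # accept packets addressed to the VIP without answering arp for it
    ifconfig lo:0 192.168.1.110 netmask 255.255.255.255 up
    # hide lo from arp (needs the "hidden" patch)
    echo 1 > /proc/sys/net/ipv4/conf/all/hidden
    echo 1 > /proc/sys/net/ipv4/conf/lo/hidden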
There can be persistence issues, if you are using cookies or https, or anything else that expects the realserver fulfilling the requests to have some connection state information. This is also addressed on the LVS persistence page.
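ipvsadm can declare a service persistent, so that a given client keeps landing on the same realserver; a minimal sketch (the 360 second timeout and the addresses are arbitrary examples):

    # https with persistence: a client sticks to one realserver
    # for 360s after its last connection
    ipvsadm -A -t 192.168.1.110:443 -s rr -p 360
    ipvsadm -a -t 192.168.1.110:443 -r 192.168.1.11 -g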
________
| |
| client | (local or on internet)
|________|
|
(router)
|
-- |
L Virtual IP
i ____|_____
n | | (director can have 1 or 2 NICs)
u | director |
x |__________|
|
V |
i |
r ----------------------------------
t | | |
u | | |
a | | |
l _____________ _____________ _____________
| | | | | |
S | realserver1 | | realserver2 | | realserver3 |
e |_____________| |_____________| |_____________|
r
v
e
r
---
In the computer bestiary, the director is a layer 4 (L4) switch. The director makes decisions at the IP layer and just sees a stream of packets going between the client and the realservers. In particular an L4 switch makes decisions based on the IP information in the headers of the packets.
Here's a description of an L4 switch from Super Sparrow Global Load Balancer documentation
" Layer 4 Switching: Determining the path of packets based on information available at layer 4 of the OSI 7 layer protocol stack. In the context of the Internet, this implies that the IP address and port are available as is the underlying protocol, TCP/IP or UCP/IP. This is used to effect load balancing by keeping an affinity for a client to a particular server for the duration of a connection. "
This is all fine except
Nevo Hed nevo@aviancommunications.com (13 Jun 2001): The IP layer is L3.
Alright, I lied. TCPIP is a 4 layer protocol and these layers do not map well onto the 7 layers of the OSI model. (As far as I can tell the 7 layer OSI model is only used to torture students in classes.) It seems that everyone has agreed to pretend that tcpip uses the OSI model and that tcpip devices like the LVS director should therefore be named according to the OSI model. Because of this, the name "L4 switch" really isn't correct, but we all use it anyhow.
The director does not inspect the content of the packets and cannot make decisions based on the content of the packets (e.g. if the packet contains a cookie, the director doesn't know about it and doesn't care). The director doesn't know anything about the application generating the packets or what the application is doing. Because the director does not inspect the content of the packets (layer 7, L7) it is not capable of session management or providing service based on packet content. L7 capability would be a useful feature for LVS and perhaps this will be developed in the future (preliminary code is out - May 2001 - ktcpvs).
The director is basically a router, with routing tables set up for the LVS function. These tables allow the director to forward packets to realservers for services that are being LVS'ed. If http (port 80) is a service that is being LVS'ed then the director will forward those packets. The director does not have a socket listener on VIP:80 (i.e. netstat won't see a listener).
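A minimal sketch of what those forwarding entries look like (addresses are examples); note that netstat on the director shows no listener on VIP:80 afterwards:

    # LVS the http service: forward VIP:80 to two realservers (LVS-DR)
    ipvsadm -A -t 192.168.1.110:80 -s rr
    ipvsadm -a -t 192.168.1.110:80 -r 192.168.1.11:80 -g
    ipvsadm -a -t 192.168.1.110:80 -r 192.168.1.12:80 -g
    # no socket listener appears for VIP:80 on the director
    netstat -an | grep 192.168.1.110:80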
John Cronin jsc3@havoc.gtf.org (19 Oct 2000) calls these types of servers (i.e. lots of little boxes appearing to be one machine) "RAILS" (Redundant Arrays of Inexpensive Linux|Little|Lightweight|L* Servers). Lorn Kay lorn_kay@hotmail.com calls them RAICs (C=computer), pronounced "rake".
The director uses 3 different methods of forwarding (LVS-NAT, LVS-DR and LVS-Tun).
Some modification of the realserver's ifconfig and routing tables will be needed for LVS-DR and LVS-Tun forwarding. For LVS-NAT the realservers only need a functioning tcpip stack (i.e. the realserver can be a networked printer).
LVS works with all services tested so far (single and 2 port services), except that LVS-DR and LVS-Tun cannot work with services that initiate connections from the realservers (so far: identd and rsh).
The realservers can be identical, presenting the same service (eg http, ftp), working off file systems which are kept in sync for content. This type of LVS increases the number of clients able to be served. Or the realservers can be different, presenting a range of services from machines with different services or operating systems, enabling the virtual server to present a total set of services not available on any one server. The realservers can be local/remote, running Linux (any kernel) or other OS's. Some methods for setting up an LVS have fast packet handling (eg LVS-DR which is good for http and ftp) while others are easier to set up (eg transparent proxy) but have slower packet throughput. In the latter case, if the service is CPU or I/O bound, the slower packet throughput may not be a problem.
For any one service (eg httpd at port 80) all the realservers must present identical content since the client could be connected to any one of them and over many connections/reconnections, will cycle through the realservers. Thus if the LVS is providing access to a farm of web, database, file or mail servers, all realservers must have identical files/content. You cannot split up a database amongst the realservers and access pieces of it with LVS.
The simplest LVS to set up involves clients doing read-only fetches (e.g. a webfarm). If the client is allowed to write to the LVS (e.g. database, mail farm), then some method is required so that data written on one realserver is transferred to other realservers before the client disconnects and reconnects again. This need not be all that fast (you can tell them that their mail won't be updated for 10mins), but the simplest (and most expensive) approach is for the mail farm to have a common file system for all servers. For a database, the realservers can be running database clients which connect to a single backend database, or else the realservers can be running independent database daemons which replicate their data.
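For the read-only case, keeping realserver content identical can be as simple as pushing from a master copy; a sketch (the hostnames rs1..rs3 and the path are made-up examples):

    # push the document root from a master machine to each realserver
    for rs in rs1 rs2 rs3; do
        rsync -a --delete /var/www/ $rs:/var/www/
    done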
An LVS requires a Linux director (Intel and Alpha versions are known to work; the LVS code doesn't have any Intel-specific instructions and is expected to work on any machine that Linux runs on).
There are differences in the coding for LVS for the 2.0.x, 2.2.x and 2.4.x kernels. Development of LVS on 2.0.36 kernels has stopped (May 99). Code for 2.2.x kernels is production level and this HOWTO is up to date for 2.2.19 kernels. Code for 2.4.x kernels is relatively new and the HOWTO is less complete for the 2.4.x material (check on the mailing list). (Jun 2001, we're pretty much up to date now.)
The 2.0.x and 2.2.x code is based on the masquerading code. Even if you don't explicitly use ipchains (eg with LVS-DR or LVS-Tun), you will see masquerading entries with `ipchains -M -L` (or `netstat -M`).
Code for 2.4.x kernels was rewritten to be compatible with the netfilter code (i.e. its entries will show up in netfilter tables). It is now production level code. Because of incompatibilities, LVS-NAT for 2.4.x was in development mode till about Jan 2001.
2.4.x kernels are SMP for kernel code as well as user space code, while 2.2.x kernels are only SMP for user space code. LVS is all kernel code. A dual CPU director running a 2.4.x kernel should be able to push packets at twice the rate of the same machine running a 2.2 kernel (if other resources on the director don't become limiting).
You can have almost any OS on the realservers (all are expected to work, but we haven't tried them all yet). The realservers only need a tcpip stack - a networked printer can be a realserver.
LVS works on ethernet. There are some limitations on using ATM.
LVS is continually being developed and usually only the more recent kernel and kernel patches are supported. Usually development is incremental, but with the 2.2.14 kernels the entries in the /proc file system changed and all subsequent 2.2.x versions were incompatible with previous versions.
For more documentation, look at the LVS web site (eg a talk I gave on how LVS works on 2.0.36 kernel directors)
The LVS itself does not provide high availability. Other software (eg mon, ldirectord, or the Linux HA code) is used in conjunction with LVS to provide high availability (i.e. to switch out a failed realserver/service or a failed director).
Another package keepalived is designed to work with LVS watching the health of services. Julian has written Netparse, which is suffering the same fate.
There are two types of failures with an LVS: failure of the director, and failure of a service on a realserver.
Director failure is handled by having a redundant director available. Director failover is handled by the Ultra Monkey Project.
Service failure is relatively simple to handle (compared to director failover).
An agent running on the director monitors the services on the realservers.
If a service goes down, that service is removed from the ipvsadm table.
When the service comes back up, the service is added back to the ipvsadm table.
There is no separate handling of realserver failure (e.g. it catches on fire, a concern of Mattieu Marc marc.mathieu@metcelo.com) - the agent on the director will just remove all that realserver's services from the ipvsadm table.
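What the agent does amounts to a pair of ipvsadm calls; a sketch with example addresses (tools like ldirectord and mon automate exactly this):

    # service on 192.168.1.11 failed a health check: remove it
    ipvsadm -d -t 192.168.1.110:80 -r 192.168.1.11:80
    # health check passes again: add it back
    ipvsadm -a -t 192.168.1.110:80 -r 192.168.1.11:80 -g -w 1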
In the Ultra Monkey Project, service failure is monitored by ldirectord.
The configure script monitors services with mon. Setting up mon is covered in Failover protection
In both cases (failure of director, or failure of a service), the client's session with the realserver will be lost (as would happen in the case of a single server). With failover however, the client will be presented with a new connection when they initiate a reconnect.
Contributions to this HOWTO came from the mailing list and are attributed to the poster (with e-mail address). Postings may have been edited to fit them into the flow of the HOWTO.
The LVS logo (Tux with 3 lighter shaded penguins behind him representing a director and 3 realservers) is by Mike Douglas spike@bayside.net
LVS homepage is running on a machine donated by Voxel.
LVS mailing list is hosted by Lars in Germany lmb@suse.de
The HOWTO is written in sgml. The char '&' found in C source code has to be written as &amp; in sgml. If you swipe patches from the sgml rather than, say, the html rendering of it, you will get code which needs to be edited to fix the &amp;.
Thanks to Hank Leininger for the mailing list archive which is searchable not only by subject and author, but by strings in the body. Hank's resource has been of great help in assembling this HOWTO.
Ratz ratz@tac.ch
To be able to setup/maintain an LVS, you need to be able to
- know how to patch and compile a kernel
- know the basics of shell-scripting
- have intermediate knowledge of TCP/IP
- have read the man page, the online documentation and the LVS-HOWTO (this document) (and the LVS-mini-HOWTO)
- know basic system administration tools (e.g. ipchains, syslog)
The mailing list and HOWTOs cover information specific to LVS. The rest you have to handle yourself. All of us knew nothing about computers when we first started, we learnt it and you can too. If you can't setup a simple LVS from the mini-HOWTO, without getting into a major sweat (or being able to tell us what's wrong with the instructions), then you need to do some more homework.
It's hard to believe but we do get postings like
recompiling the kernel is hard (or I don't read HOWTOs), can't you guys cut me some slack and just tell me what to do?
The answer is: NO WE WON'T
The people on the mailing list answer questions for free, and have important things to do, like keeping up with /. and checking our e-mail. When we're at home, there is beer to drink and Gilligan's Island re-runs to watch. Reading to you does not rate. I expect the people who post these statements don't read the HOWTO, so I may be wasting my time here. Still there's people who think that their time is more important than ours.
"can anybody tell me how to setup a windows realserver? thank you very much! I'm in a hurry."
robert.gehr@web2cad.de: I can't think of anyone who has set up lvs in a hurry :-)
To get technical help:
Please don't e-mail me privately with general questions. The mailing list will archive your question and the answer(s) which can be retrieved later. Other people may have more interesting, relevant or useful comments than I will.
If you are writing to me to avoid the public humiliation of showing your ignorance on the mailing list, it's not going to happen. We've had too many good ideas from people who were "ignorant" to let this happen. If your question has been answered many times before and it's in the HOWTO and the archives, you'll just be told to read the HOWTO, that's all.
There's always new ideas and questions being posted on the mailing list. We don't expect this to stop.
If you have a problem with your LVS not working, before you come up on the mailing list, please -
Don't set up first with http, with filter rules, with firewalls, with complicated file systems (e.g. coda, nfs) or network accelerators - debug all these nifty things after you have LVS working with telnet and with no filter rules.
If you don't understand your problem well,
here's a suggested submission format from Roberto Nibali ratz@tac.ch
hog:~ # uname -a
Linux hog 2.2.18 #2 Sun Dec 24 15:27:49 CET 2000 i686 unknown
hog:~ # ipvsadm -L -n | head -1
IP Virtual Server version 1.0.2 (size=4096)
hog:~ # ipvsadm -h | head -1
ipvsadm v1.13 2000/12/17 (compiled with popt and IPVS v1.0.2)
hog:~ #
o Using LVS-DR, gatewaying method.
o Load balancing port 80 (http) non-persistent.
o Network Setup:

            ________
           |        |
           | client |
           |________|
               |
              CIP
               |
            (router)
               |
              GEP   (packetfilter, firewall)
              GIP
               |
              DIP
            __________
           |          |
           | director |
           |__________|
              VIP
               |
        +-----------------+----------------+
        |                 |                |
     RIP1, VIP        RIP2, VIP        RIP3, VIP
     ____________    ____________    ____________
    |            |  |            |  |            |
    |realserver1 |  |realserver2 |  |realserver3 |
    |____________|  |____________|  |____________|

    CIP  = 212.23.34.83
    GEP  = 81.23.10.2   (external gateway, eth0)
    GIP  = 192.168.1.1  (internal gateway, eth1, masq or NAT)
    DIP  = 192.168.1.2  (eth0:1, or eth1:1)
    VIP1 = 192.168.1.110 (director: eth0:110, realserver: lo0:110)
    RIP1 = 192.168.1.11
    RIP2 = 192.168.1.12
    RIP3 = 192.168.1.13
    DGW  = 192.168.1.1  (GIP for all realservers)

o ipvsadm -L -n

    hog:~ # ipvsadm -L -n
    IP Virtual Server version 1.0.2 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    TCP  192.168.1.10:80 wlc
      -> 192.168.1.13:80             Route   0      0          0
      -> 192.168.1.12:80             Route   0      0          0
      -> 192.168.1.11:80             Route   0      0          0
    hog:~ #

o The output from ifconfig from all machines (abbreviated, just need the IP, netmask etc), and the output from netstat -rn.
o ipchains -L -M -n (2.2.x) or cat /proc/net/ip_conntrack (2.4.x)
o echo 9 > /proc/sys/net/ipv4/vs/debug_level && tail -f /var/log/kernlog
o tcpdump -n -e -i eth0 tcp port 80
o route -n
o netstat -an
o ifconfig -a
Combining HA and LVS (e.g. Ultramonkey).
I realise that information in here isn't all that easy to locate yet (there's no index and you'll have to search with your editor) and that the ordering of sections could be improved.
I'll work on it as I have time.
Does anyone want to write a MIB for LVS?
Nov 2001. We have a MIB written by Romeo Benzoni rb@ssn.tp!
From lvs@spiderhosting.com, a list of load balancers.
Ultra Monkey is LVS and HA combined.
From lvs@spiderhosting.com, Super Sparrow: Global Load Balancing using BGP routing information.
From ratz, there's a write-up on load imbalance with persistence and sticky bits at our friends at M$.
From ratz, Zero copy patches to the kernel to speed up network throughput: Dave Miller's patches, Rik van Riel's vm-patches and more of Rik van Riel's patches. The Zero copy patches may not work with LVS and may not work with netfilter either (from Kate john@antefacto.com).
From Michael Brown michael_e_brown@dell.com, the TUX kernel level webserver.
From Lars lmb@suse.de, mod_backhand, a method of balancing apache httpd servers that looks like ICP for web caches.
procstatd, a lightweight and simple web-based cluster monitoring tool designed for beowulfs; the latest version was 1.3.4 (you'll have to look around on this page).
From Putchong Uthayopas pu@ku.ac.th, a heavyweight (lots of bells and whistles) cluster monitoring tool, KCAP.