LVS-DR is based on IBM's NetDispatcher. The NetDispatcher sits in front of a set of webservers, which appear as one webserver to the clients. The NetDispatcher served http for the Atlanta and the Sydney Olympic games and for the chess match between Kasparov and Deep Blue.
Here's an example set of IPs for an LVS-DR setup. Note that in this example the RIPs are on the same network as the VIP. If the RIPs were on a different network (e.g. 192.168.2.0/24) to the VIP and the router (here the client) was not allowed to route to the RIP network (i.e. the realservers do not receive the arp requests from the router asking "who has VIP tell router"), then the arp problem is solved without requiring a patch on the realservers' kernels. In this example, for my convenience, the servers are on the same network as the client and you have to handle the arp problem (I used the arp -f /etc/ethers approach).
Host                 IP
client               CIP=192.168.1.254
director             DIP=192.168.1.1
virtual IP (VIP)     VIP=192.168.1.110 (arpable, IP clients connect to)
realserver1          RIP1=192.168.1.2, VIP=192.168.1.110 (lo:0, not arpable)
realserver2          RIP2=192.168.1.3, VIP=192.168.1.110 (lo:0, not arpable)
realserver3          RIP3=192.168.1.4, VIP=192.168.1.110 (lo:0, not arpable)
.
.
realserver-n         192.168.1.n+1
#lvs_dr.conf
LVS_TYPE=VS_DR
INITIAL_STATE=on
VIP=eth0:110 lvs 255.255.255.255 192.168.1.110
DIP=eth0 dip 192.168.1.0 255.255.255.0 192.168.1.255
DIRECTOR_DEFAULT_GW=client
SERVICE=t telnet rr realserver1 realserver2 realserver3
SERVER_VIP_DEVICE=lo:0
SERVER_NET_DEVICE=eth0
SERVER_DEFAULT_GW=client
#----------end lvs_dr.conf------------------------------------
                        ________
                       |        |
                       | client |
                       |________|
                     CIP=192.168.1.254
                           |
                        (router)
                           |
           __________      |
          |          |     |  VIP=192.168.1.110 (eth0:1, arps)
          | director |-----|  DIP=192.168.1.1 (eth0)
          |__________|     |
                           |
                           |
          -------------------------------------
          |                |                  |
          |                |                  |
   RIP1=192.168.1.2   RIP2=192.168.1.3   RIP3=192.168.1.4 (eth0)
   VIP=192.168.1.110  VIP=192.168.1.110  VIP=192.168.1.110 (all lo:0, non-arping)
    _____________      _____________      _____________
   |             |    |             |    |             |
   | realserver  |    | realserver  |    | realserver  |
   |_____________|    |_____________|    |_____________|
          |                |                  |
       (router)         (router)           (router)
          |                |                  |
          ----------------------------------------------> to client (or router in front of director)
Here's the lvs_dr.conf file
#--------------------lvs_dr.conf
LVS_TYPE=VS_DR
INITIAL_STATE=on

#director setup
VIP=eth0:12 192.168.1.110 255.255.255.255 192.168.1.110
DIP=eth0 192.168.1.10 192.168.1.0 255.255.255.0 192.168.1.255

#service setup, one service at a time
SERVICE=t telnet rr 192.168.1.1 192.168.1.8 127.0.0.1

#realserver setup
SERVER_LVS_DEVICE=lo0:1
SERVER_NET_DEVICE=eth0
#----------end lvs_dr.conf------------------------------------
LVS-DR setup and testing is the same as LVS-Tun except that all machines within the LVS-DR (i.e. the director and realservers) must be able to arp each other. This means that they have to be on the same network without any forwarding devices between them, i.e. they are using the same piece of physical-layer hardware ("wire"), e.g. RJ-45, coax, fibre. There can be hub(s) or switch(es) in this mix. Communication within the LVS is by link-layer, using MAC addresses rather than IPs. All machines in the LVS have the VIP; only the VIP on the director replies to arp requests. The VIP on the realservers must be on a non-arping device (e.g. lo:0, dummy).
The restrictions for LVS-DR are
For more info see e-mail postings about LVS-DR topologies in the section More on the arp problem and topologies of LVS-DR and LVS-Tun LVS's.
To allow the director to be the default gw for the realservers (e.g. when the director is the firewall), see Julian's martian modification.
Note for LVS-DR (and LVS-Tun), the services on the realservers are listening to the VIP. You can have the service listening to the RIP as well, but the LVS needs the service to be listening to the VIP. This is not an issue with services like telnet which listen to all local IPs (ie 0.0.0.0), but httpd is set up to listen to only the IPs that you tell it.
Normally for LVS-DR, the client is on a different network to the director/server(s), and each realserver has its own route to the outside world. In the simple test case below, where all machines are on the 192.168.1.0 network, no routers are required, and the return packets, instead of going out (the router(s)) at the bottom of the diagram, would return to the client via the network device on 192.168.1.0 (presumably eth0).
Here's part of the rc.lvs_dr script which configures the realserver with RIP=192.168.1.8
#setup servers for telnet, LVS-DR
/sbin/ipvsadm -A -t 192.168.1.110:23 -s rr
echo "adding service 23 to realserver 192.168.1.8 "
/sbin/ipvsadm -a -t 192.168.1.110:23 -R 192.168.1.8 -g -w 1
With LVS-DR, the target port numbers of incoming packets cannot be remapped (unlike LVS-NAT). A request to port 23 (telnet) on the VIP will be forwarded to port 23 on a realserver, thus the RIP entry has no accompanying port.
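As a rough illustration of the point about ports, here is a sketch that prints (rather than runs) the ipvsadm commands for the three realservers of the first example. The function name print_dr_service and the RS_FLAG variable are made up for this sketch. The -R spelling is copied from the script above; newer ipvsadm releases use -r for the realserver instead.

```shell
#!/bin/sh
# Dry run: print (do not execute) the ipvsadm commands for one LVS-DR service.
VIP=192.168.1.110
PORT=23
RS_FLAG=-R   # realserver flag as in the script above; newer ipvsadm spells it -r

print_dr_service() {
    # $1 = VIP, $2 = port, remaining arguments = RIPs
    vip=$1; port=$2; shift 2
    # one virtual service on VIP:port with round robin scheduling
    echo "/sbin/ipvsadm -A -t $vip:$port -s rr"
    for rip in "$@"; do
        # -g selects direct routing; the RIP has no port of its own
        echo "/sbin/ipvsadm -a -t $vip:$port $RS_FLAG $rip -g -w 1"
    done
}

print_dr_service "$VIP" "$PORT" 192.168.1.2 192.168.1.3 192.168.1.4
```

Each realserver line carries only the RIP; the port is fixed by the virtual service on the VIP.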
Here are the packet headers as the request is processed by the LVS.
packet                                source       dest          data
1. request from client                CIP:3456     VIP:23        -
2. ipvsadm table:
   director chooses server=RIP1,
   creates link-layer packet          MAC of DIP   MAC of RIP1   IP datagram
                                                                 source=CIP:3456,
                                                                 dest=VIP:23,
                                                                 data= -
3. realserver recovers IP datagram    CIP:3456     VIP:23        -
4. realserver looks up routing table,
   finds VIP is local,
   processes request locally,
   generates reply                    VIP:23       CIP:3456      "login:"
5. packet leaves realserver via its default gw, not via DIP.
For the verbally oriented...
A packet arrives from the client for the VIP (CIP:3456->VIP:23). The director looks up its tables and decides to send the connection to realserver_1. The director arps for the MAC address of RIP1 and sends a link-layer packet to that MAC containing an IP datagram with CIP:3456->VIP:23. This is the same src:dst as the incoming packet and the tcpip layer sees this as a forwarded packet. To allow this packet to be sent to the realserver, it is not necessary for forwarding to be on in the director (it is off by default in 2.2.x and 2.4.x kernels; this is handled by the configure script).
The packet arrives at realserver_1. The realserver recovers the IP datagram, looks up its routing table, finds that the VIP (on an otherwise unused, non-arping and nonfunctional device) is local.
I'm not sure what exactly happens next, but I believe the Linux tcpip stack then delivers the packet to the socket listeners, rather than to the device with the VIP, but I'm out of my depth now.
The realserver now has a packet CIP:3456->VIP:23, processes it locally, constructs a reply, VIP:23->CIP:3456. The realserver looks up its routing table and sends the reply out its default gw to the internet (or client). The reply does not go through the director.
The role of LVS-DR is to allow the director to deliver a packet with dst=VIP (the only arp'ing VIP being on the director), not to itself, but to some machine that (as far as the director knows) doesn't have the VIP address at all. The only difference between LVS-DR and LVS-Tun is that instead of putting the IP datagram inside a link-layer packet with dst=MAC of the RIP, for LVS-Tun the IP datagram from the client CIP->VIP is put inside another IP datagram DIP->RIP.
The use of the non-arping lo:0 and tunl0 to hold the VIP for LVS-DR and LVS-Tun (respectively) is to allow the realserver's routing table to have an entry for a local device with IP=VIP and, at the same time, to keep other machines from seeing this IP (i.e. it doesn't reply to arp requests). There is nothing particularly loopback about the lo:0 device that is required to make LVS-DR work, any more than there is anything tunnelling about a tunl0 device. For 2.0.x kernels, a tunnel packet is de-capsulated because it is marked type=IPIP, and will be decapsulated if delivered to an lo device just as well as if delivered to a tunl device. The 2.2.x kernels are more particular and need a tunl device (see "Properties of devices for VIP").
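A minimal sketch of how the VIP might be put on lo:0 of a realserver, assuming the VIP of the example above and the classic ifconfig/route tools. The function name print_realserver_vip is hypothetical and the commands are only printed, not run. This step by itself does not deal with arp replies on 2.2.x/2.4.x kernels.

```shell
#!/bin/sh
# Dry run: print the commands that would put the VIP on lo:0 of a realserver.
print_realserver_vip() {
    # $1 = VIP; the all-ones netmask keeps the alias to a single host address
    echo "/sbin/ifconfig lo:0 $1 broadcast $1 netmask 255.255.255.255 up"
    # host route so that the realserver's routing table treats the VIP as local
    echo "/sbin/route add -host $1 dev lo:0"
}

print_realserver_vip 192.168.1.110
```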
The VIP on the realservers must not reply to arp requests from the client (or the router between the client and the director).
The loopback device does not arp by default for all OS's except Linux 2.2.x,2.4.x kernels (even when you use -noarp with ifconfig). You may need to do something if you are running a realserver with a 2.2.x or 2.4.x kernel (see the arp problem).
This requires hiding the VIP on the realservers, by putting them on a separate network.
Lars set this up first on LVS-Tun. Here it is for LVS-DR. The director has 2 NICs and the realservers are on a different network (10.1.1.0/24) to the VIP (192.168.1.0/24). All IPs reply to arps. The router/client cannot route to the realserver network and the RIPs do not need to be internet routable. Since the director has 2 NICs, in the lvs_dr.conf file, set the DIP to eth1.
                        ________
                       |        |
                       | client |
                       |________|
                     CIP=192.168.1.254
                           |
                        (router)
                           |
                     VIP=192.168.1.110 (eth0, arps)
                       __________
                      |          |
                      | director |
                      |__________|
                     DIP=10.1.1.1 (eth1, arps)
                           |
                           |
          -------------------------------------
          |                |                  |
          |                |                  |
   RIP1=10.1.1.2      RIP2=10.1.1.3      RIP3=10.1.1.4 (eth0)
   VIP=192.168.1.110  VIP=192.168.1.110  VIP=192.168.1.110 (all lo:0, can arp)
    _____________      _____________      _____________
   |             |    |             |    |             |
   | realserver  |    | realserver  |    | realserver  |
   |_____________|    |_____________|    |_____________|
          |                |                  |
       (router)         (router)           (router)
          |                |                  |
          ----------------------------------------------> to client
This subject has its own section.
Performance tests (75MHz pentium classics, on 100Mbps network) with LVS-DR on the performance page showed that the limit for LVS-DR is the rate at which the director can forward packets to the realservers. LVS doesn't add any detectable latency or change the throughput of forwarding. There is little load on the director operating at high throughput in LVS-DR mode. Apparently little computation is involved in forwarding.
In the case where the director is the firewall for the realserver network, the director has to be the default gw for the realservers.
If the reply packet from the realserver to the client (VIP->CIP) goes through the director (which has a device with IP=VIP), the director is being asked to route a packet with a src address that is on the director.
Horms horms@vergenet.net
>The problem is that with Direct routing the reply from the real server has the vip as the source address. As this is an address of one of the interfaces on the director it will drop it if you try and forward it through the director. It appears from experimentation with /proc/sys/net/ipv4/conf/*/rp_filter that at least on 2.2.14, there is no way to turn this behaviour off.
This type of packet is called a "source martian" and is dropped by the director. Martians can be logged with
# echo 1 >/proc/sys/net/ipv4/conf/all/log_martians
There are 3 solutions to this; 2 by Julian and 1 by Horms.
If the director accepts packets for the VIP via transparent proxy, then the director doesn't have the VIP and the return packets are processed normally. (Note: transparent proxy only works on the director for 2.2.x kernels).
Here's Julian's posting
                        Clients
                           |
                          ISP
                           |eth0/ppp0/...
           Router/Firewall/Director (LVS box)
                           |eth1
                +----------+------------+
                |eth0                   |eth0
              Real 1                  Real2
Router:
  transparent proxy for VIP (or all served VIPs). The ISP must feed your Director with packets for your subnet 199.199.199.0/24
  LVS-DR mode (Yes, LVS-DR, this is not a mistake).
  eth1: 199.199.199.2.
  default gw is ISP.

Real server(s):
  nothing special.
  VIP on hidden device or via transparent proxy.
  eth0: 199.199.199.3.
  default gateway is 199.199.199.2 (the Director)
This is a minimum required config. You can add internal subnets yourself using the same physical network (one NIC) or by adding additional NICs, etc. They are not needed for this test.
Packets from the real servers with saddr=VIP will be forwarded from the director because VIP is not configured in the Director. We expect that this setup is faster than VS/NAT.
(see earlier for an explanation of "source martians".)
The martian modification is currently (Aug 2001) implemented with the hidden-forward_shared-xxx.diff patch. This patch has the hidden (for realservers) and forward_shared (for directors) patch and can be applied to both realservers and directors. (Remember for the director you need the ipvs patch too).
This is a kernel patch, director has 2 NICs (doesn't work with one NIC), VIP is on outside NIC. Here are the original patches for the martian modification, 2.2 and 2.4 kernels.
The patch (below) has been tested against 2.2.15pre9 (Joe) and 2.2.13 (Stephen Zander gibreel@pobox.com). The kernel code is not changing very fast for these files. If patching other 2.2 kernels produces no rejects (i.e. no "HUNK FAILED" notices) then the patch is probably OK. The patch for 2.4.x kernels is at Julian's patch page and in the text below.
After applying this patch, for a test, use the default values for */rp_filter(=0). This allows real servers to send packets with saddr=VIP and daddr=client through the Director.
If this patch is applied and external_eth/rp_filter is 0 (which is the default) the real servers can receive packets with saddr=any_director_ip and dst=any_RIP_or_VIP which is not very good. On the external net, set rp_filter=1 for better security.
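A sketch of the two rp_filter settings just described, assuming (purely for illustration) that the director's outside NIC is eth0 and the inside NIC is eth1. The function name print_rp_filter is hypothetical, and the commands are printed rather than run, since writing to /proc needs root.

```shell
#!/bin/sh
# Dry run: print the command that would set rp_filter on one interface.
print_rp_filter() {
    # $1 = interface name, $2 = value (0 = no source check, 1 = source check)
    echo "echo $2 > /proc/sys/net/ipv4/conf/$1/rp_filter"
}

# outside NIC (assumed eth0): tighten the check, as suggested for security
print_rp_filter eth0 1
# inside NIC (assumed eth1): leave at the default so saddr=VIP packets pass
print_rp_filter eth1 0
```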
Here's the test setup
         ____________
        |            |
        |   client   |
        |____________|
              |
              |  192.168.2.0/24
         _____|______
        |            |
        |  director  |   LVS-DR director has 2 NICs
        |____________|
              |  eth0    192.168.1.9
              |  eth0:12 192.168.1.1
              |
              |  192.168.1.0/24
         _____|____________________
              |
              |
         _____|__________
        |                |
        | realserver(s)  |  default gw=192.168.1.1
        |________________|
192.168.1.1 is the normal router. For the test it was put on the director instead (as an alias). The director has 2 NICs, with forwarding=on (client and realservers can ping each other).
Director runs linux-0.9.8-2.2.15pre9 unpatched or with Julian's patch. LVS is setup using the configure script in the HOWTO, redirecting telnet, with rr scheduling to 3 realservers. The realservers were running 2.0.36 (1) or 2.2.14 (2). The arp problem was handled for the 2.2.14 realservers by permanently installing in the client's arp table, the MAC address of the NIC on the outside of the director, using the command `arp -f /etc/ethers`
The director was booted 4 times, into unpatched, patched, unpatched and patched. After each reboot the lvs scripts were run on the director and the realservers, then the functioning of the LVS tested by telnet'ing multiple times from the client to the VIP.
For the unpatched kernel, the client connection hung and inactive connections accumulated for each realserver. For the patched kernel, the client telnet'ed to the VIP connecting with each realserver in turn.
The configure script will set up the modified LVS-DR (it will warn you that you need the patch to work). Setup details are in the performance page.
Performance has similar latency to LVS-NAT but the load is low on the director at high throughput of LVS-DR (see the performance page).
See Performance page on website.
Horms fwmarks allows the director to accept packets by fwmark. There is no VIP required on the director.
The material here came from listening to a talk by Herbie Pearthree of IBM (posting 2000-10-10) and from a posting by TC Lewis (which I've lost).
In normal IP communication between two hosts, the routing is symmetrical: each end of the link has an ethernet device with an IP and a route to the other machine. Packets are transmitted in pairs (an outgoing packet and a reply, often just an ACK).
In LVS-DR or LVS-Tun the roles of the two machines are split between 3 machines. Here is a two network test setup, with the client in the position normally occupied by the router. In production, the client will have a public IP and connect via a router. (This is my test setup. A big trap is that services which make calls from the RIP, eg identd and rshd will work in my setup, but fail in a production setup as the RIP will not be a routable IP).
        ____________
       |            |192.168.1.254 (eth0)
       |   client   |----------------------
       |____________|          <-         |
     CIP=192.168.2.254 (eth1)             |
              |                           |
            | |                           |
            V |                           |
     VIP=192.168.2.110 (eth0)             |
        ____________                      |
       |            |                     |
       |  director  |                     |
       |____________|                     |
     DIP=192.168.1.1 (eth1, arps)         |
              |                           |
            | |                           |
            V |----------------------------
              |          ->
     RIP=192.168.1.2 (eth0)
     VIP=192.168.2.110 (lo:0, no_arp)
        _____________
       |             |
       | realserver  |
       |_____________|
The client sends a packet to the VIP on the director. In a normal exchange of packets between a pair of machines, the director would send a reply packet back to the client. With an LVS, the director's response instead is a packet to the MAC address of the RIP. In most normal operation, the VIP on the director never sends packets back to the client; it only sends packets to the realservers. A default gw for the director is not needed for the functioning of the LVS. Having a default gw would only allow the VIP on the director to reply to packets from the internet, such as port scans, creating a security hazard. The director doesn't need and shouldn't have a default gw.
There are pathological conditions when the VIP needs to reply to the client. If the realserver goes down, the director will issue ICMP "host unreachable" packets, till a new realserver is switched in by mon or ldirectord. (If you have a long lived tcp connection, eg with telnet or https, the new realserver will be getting packets for a connection which it doesn't know about, and it will issue a tcp reset. This reset will go out the default gw for the realserver and the client's session will hang or drop.)
Julian Anastasov ja@ssi.bg
> 30 Aug 2001
It may be these ICMPs are not fatal if they are not sent. This is true when LVS is used in transparent proxy setups and particularly in 2.4 where there is no real transparent proxy support. There icmp_send() does not send any packets when there is no running squid. But maybe the original email sender wanted to use the LocalNode feature together with a DR setup, IIRC. I see that configure-lvs does not have configs for such setups with mixed forwarding methods. So, as you said, the users with more knowledge can select another way to build their setup. And they will know when they need a default gateway :)
If the mtu is not matched between the router and the director, the director will need to send ICMP "fragmentation needed" packets back to the router. This is a bad setup.
The configure script doesn't add a default gw for packets from the VIP. If you want one, just put it in yourself.
If you need to talk to the outside world from the director, use another IP on the director (e.g. the primary IP on the outside of the director) and use iproute2 to send packets from this IP to 0/0.
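A sketch of what such a source-based default route might look like, using the same ip rule / ip route forms that appear in Julian's recipe for the realservers. The primary IP 192.168.2.1, the gateway 192.168.2.254, the device eth0 and the table number 200 are all assumptions chosen for illustration, and the function name is hypothetical. The commands are printed, not run.

```shell
#!/bin/sh
# Dry run: print iproute2 commands giving one source IP its own default route.
print_src_default_route() {
    # $1 = source IP, $2 = gateway, $3 = device, $4 = routing table number
    # packets with this source address are looked up in the given table
    echo "ip rule add prio $4 from $1 table $4"
    # that table holds only a default route (0/0) via the gateway
    echo "ip route add table $4 0/0 via $2 dev $3"
}

print_src_default_route 192.168.2.1 192.168.2.254 eth0 200
```

Packets with src_addr=VIP do not match the rule, so the VIP still has no default route.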
The realserver doesn't reply to the director, instead it sends its reply to the client. The realserver requires a default gw (here 192.168.1.254), but the client/router never replies to the realserver; the client/router sends its replies to the director. So the client/router doesn't need a route to the realserver network. To have one would be a security hazard. The realserver now can't ping its default gw (since there's no route for the reply packet), but the LVS still works.
The flow of packets around the LVS-DR LVS is shown by the ascii arrows.
When an attacker tries to access the nodes on the LVS, they can only connect to the LVS services on the director. They can't connect to the realserver network, as there is no routing to the realservers (even if they get access to the router). Presumably the realservers are not accessible from the outside as they'll be on private networks anyhow.
Note that for Julian's martian modification, the director will need a default gw.
If you are only using the link between the director and realserver for LVS-DR packets (i.e. you aren't telnet or ssh'ing from the realserver to the director for your admin, and you aren't copying logs from one machine to another), then you don't need an IP on the interface on the director which connects to the realserver(s).
tc lewis tcl@bunzy.net
12 Jul 2000 (paraphrased) I would like to send packets from the LVS-DR director to the realservers by a separate interface (eth2), but not assign an IP to this interface. Normally I put a 192.168.100.x ip on eth2, but without it, route add -net 192.168.100.0 netmask 255.255.255.0 dev eth2 just gives me an error about eth2 not existing. I just want to save an extra IP.
What i'm asking is: does the director's eth2 need an ip on 192.168.100.0/24, or can i just somehow add that route to that interface to tell the machine to send packets that way? With lvs, the real servers are never going to care about the director's interface ip, since there's no direct tcp/ip connections or anything there, but it looks like it still needs an ip anyway.
If all that that interface is doing is forwarding outgoing packets from the director via the dr method, then i don't see why it needs an ip address.
Ted Pavlic tpavlic@netwalk.com
You basically want to do device routing. There's nothing special about this -- many routers do it... NT even does it. So does Linux. Your original route command should work
route add -net 192.168.100.0 netmask 255.255.255.0 dev eth2
as long as you've brought up eth2. Now tricking Linux into bringing up eth2 without an address might be the hard part. Try this:
ifconfig eth2 0.0.0.0 up
or
ifconfig eth2 0 up
tc lewis tcl@bunzy.net
ifconfig eth0 0.0.0.0 up
then the route did work. I tried that before with a netmask but it didn't work.
Ted Pavlic tpavlic@netwalk.com
Remember that IP=0 actually is IP=0.0.0.0, which is another name for the default route.
The reason why IP=0 is 0.0.0.0 ... Remember that each IP address is simply a 4-byte unsigned integer, right? Well... the easiest way to envision this is to imagine that an IP is just like a base-256 number. For example:
216.69.192.12 (my mail server) would be: 12 + 192 * 256 + 69 * 256 * 256 + 216 * 256 * 256 * 256
Which is equal to 3628449804. So...
telnet 216.69.192.12 25
is the same as:
telnet 3628449804 25
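The base-256 arithmetic above can be checked with a few lines of shell. This is only a sketch; the function name ip_to_int is made up, and it assumes a shell with 64-bit arithmetic (the usual case on current systems).

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to its single-integer form,
# treating the four octets as the digits of a base-256 number.
ip_to_int() {
    # split "a.b.c.d" on the dots into the positional parameters
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 * 256 * 256 * 256) + ($2 * 256 * 256) + ($3 * 256) + $4 ))
}

ip_to_int 216.69.192.12   # prints 3628449804
ip_to_int 0.0.0.0         # prints 0
```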
0.0.0.0 is just a special system address which is the same as the default route. Making a route from 0.0.0.0 to some gateway will set your default route equal to that gateway. That's all "route add default gw ..." does. Don't believe me? Do a route -n.
So when I told TC to put 0 on his IP-less NIC, I was just choosing a system IP that I knew would not ever need to be transmitted on. Linux wanted an IP to create the interface... so I gave it one -- the IP of the default gateway. Packets would never need to leave the system going to 0.0.0.0, and Linux has to listen to this address ANYWAY, so you might as well explicitly put it on an interface.
What would have also worked (and might have been a better idea) would be to put 127.0.0.1 on that interface. That is another system address that Linux will listen to anyway if loopback has been turned on... and it should never transmit anything away from itself with that as the destination address, so it's safe to put it on more than one interface.
The only reason I chose 0 over 127.0.0.1 is because 0 is easy... It's small... It's quick. Whenever I want to telnet to my localhost's port blah I just do a:
telnet 0 blah
because I'm lazy.. (Linux sees 0, interprets 0.0.0.0, sees an address it listens to, and basically treats 0 like a loopback)
Also you'll notice that if you give an interface 0.0.0.0 as an IP address and do an ifconfig to get stats on that interface, it will still retain no IP address. Another perquisite of using 0.0.0.0 in TC's particular situation. It may actually cause less confusion in the end.
The realserver in LVS-DR has two IPs, the RIP and the VIP. The LVS'ed services are running on the VIP. Packets from LVS'ed services, returning from the realserver, have src_addr=VIP. The RIP is not directly involved in the LVS. Services may be running on the RIP too, e.g. telnet which listens to 0.0.0.0, but services running on the RIP are of no interest to an LVS-DR. The director only needs the RIP to determine the target MAC address to forward packets from the clients destined for the VIP. Thus you are free to do whatever you like with the RIP without affecting the LVS. Usually the RIP is on a private IP (e.g. 192.168.x.x) so as to not require an extra IP, and to shield the realserver from the internet.

It would be unusual to run non-LVS'ed services on the realservers, as the RIP would have to be a public IP and the realservers would have to be firewalled. However it is reasonable to run clients on the realservers. A client session (e.g. telnet) initiated from the RIP would have to be NAT'ed out to the outside world. The NAT box could be the router or the director. Here's how to set up with the director doing the NAT'ing (the router setup would be the same).
Routing packets according to their source address is not possible with the standard destination-based route command. You need the policy routing tools from iproute2.
Here's Julian's recipe (25 Sep 2000) for setting up NAT for clients on realservers in a LVS-DR LVS.
Settings for the real server(s), send all packets from the RIP network (RIPN) to the DIP (an IP on the director in the RIPN).
#create a rule with priority 100, which says that for any packet
#with src_addr in the RIP network, lookup the action in table 100.
realserver: #ip rule add prio 100 from RIPN/24 table 100

#route all packets in table 100 which go to 0/0
#(ie anywhere, the default route), via the DIP.
realserver: #ip route add table 100 0/0 via DIP dev eth0

#the result of this is that packets with src_addr=RIPnetwork
#and dst_addr=0/0 go via the DIP.
The director has to listen on the DIP (if it doesn't already), not send ICMP redirects from the DIP ethernet device, and masquerade packets from the RIPN.
director: #ifconfig eth0:1 DIP netmask 255.255.255.0
director: #echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects
director: #echo 0 > /proc/sys/net/ipv4/conf/eth0/send_redirects
Here's how to masquerade all services, from all machines on the realserver network for kernel 2.2. In practice you will add masquerading by RIP:service, only masquerading those services needed.
director: #ipchains -A forward -s RIPN/24 -j MASQ
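To masquerade by RIP:service instead, the rule is narrowed to one source host, one protocol and one destination port. The sketch below prints such rules for outgoing telnet (tcp port 23) from the three RIPs of the first example. The choice of service and RIPs is an assumption for illustration, the function name print_masq_rule is hypothetical, and the commands are printed rather than run.

```shell
#!/bin/sh
# Dry run: print one ipchains masquerading rule per realserver and service.
print_masq_rule() {
    # $1 = RIP, $2 = protocol, $3 = destination port of the outgoing session
    # ipchains takes the port after the address it belongs to
    echo "ipchains -A forward -p $2 -s $1/32 -d 0.0.0.0/0 $3 -j MASQ"
}

for rip in 192.168.1.2 192.168.1.3 192.168.1.4; do
    print_masq_rule "$rip" tcp 23
done
```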
For LVS-DR, no default gw is needed for packets from the primary IP on the outside of the director or from the VIP (which will be an alias/secondary IP). For security reasons then none is installed. To allow masquerading of clients on the realservers, a default route will be needed for packets from the primary IP on the outside of the director (but not for packets from the VIP).
If you want to test this out first, just put in a default route for the director using the route command. If you like it you can add the more restrictive routes with iproute2 later.