Flexible Level 4 switching Applications using Linux Virtual Servers
(aka Building Stupidly Large Servers using Linux Virtual Servers)

Michael Sparks
zath@i.am
http://MichaelSparks.tripod.com/
Level 4 switching & LVS : What is it?
- Essentially very similar to IP masquerading, just "the other way around".
- Packets arrive at a load balancer and are forwarded to private servers,
based on the contents of the /proc/net/ip_masquerade table's "FromIP:port,
ToIP:port, Type" triplet.
- New entries are added to the /proc/net/ip_masquerade table when SYN packets
arrive. There is lots of flexibility in the schedulers that choose which
server to use.
- The front end is termed the director. The backend servers are termed real
servers.
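You can watch these entries directly on the director; a quick look, assuming
a 2.2 kernel with the masquerading/LVS support built in (the exact field
layout varies between kernel versions):
# each line is one tracked connection: protocol, client, virtual
# service, and the real server it was mapped to
cat /proc/net/ip_masquerade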
Credits: Who wrote it?
- Lead developer - Wensong Zhang. Other important people:
- Julian Anastasov - Lots of patches/ideas
- Peter Kese - Port to 2.2 kernel
- Joseph Mack - HOWTO Maintainer.
- Lars Marowsky-Bree - Hosting the primary LVS website amongst other
things
- Many, many others: Mike Wangsmo, Rob Thomas, TC Lewis, Matthew
Kellett, Mike Douglas, Horms, VA Linux, Red Hat, plus a number of the
usual suspects.
Level 4 switching & LVS : What's out there?
- Most commercial level 4 switches - eg Alteon, Arrowpoint, Big IP, Foundry
(all bar one of those, as far as I know) - operate using NAT - ie just like
IP masquerading. (LVS term: VS-NAT)
- IBM's Net Dispatcher has the service IP public on the director, and
private on the real servers. The director rewrites the ethernet packet's MAC
address to that of the real server, so the real server can reply directly
to the client. Director and servers must be on the same LAN. (LVS term: VS-DR)
- LVS has both of the above options, and a third - packets can be
forwarded using IPIP tunnelling. (not IP-GRE tunnelling, unfortunately) This
allows the real servers to be on different networks from the load balancer -
very useful for resiliency & failover. (LVS term: VS-TUN)
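The forwarding method is chosen per real server as it is added to a service;
a minimal sketch (all addresses illustrative):
# one virtual service, three real servers, three forwarding methods
ipvsadm -A -t 130.88.203.3:80 -s wlc
ipvsadm -a -t 130.88.203.3:80 -r 192.168.1.10:80 -m     # VS-NAT (masquerading)
ipvsadm -a -t 130.88.203.3:80 -r 130.88.203.45:80 -g    # VS-DR (direct routing)
ipvsadm -a -t 130.88.203.3:80 -r 194.83.1.20:80 -i      # VS-TUN (IPIP tunnel)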
Level 4 switching & LVS : Pros & Cons
- Commercial systems
- pro: plug in and go. Sometimes built into higher end switches/routers
anyway.
- cons: expensive; no general purpose OS available; normally only
one choice of scheduler, and one choice of forwarding.
- Linux Virtual Servers
- con: still under active development.
- pros:
- Still under active development :-)
- FREE
- You may already have it - built into the Red Hat 6.1 kernel.
- More flexible
- Choice of forwarding on a per server basis.
- Full operating system available.
Forwarding & Scheduling
- Forwarding mechanism benefits:
- VS-NAT - simple to set up, requires no modification to the servers, which
can be running any OS.
- VS-DR - since servers reply directly, more scalable; any OS.
- VS-TUN - due to using IPIP, Linux only, but the most flexible
forwarding mechanism.
- Scheduling:
- round-robin scheduling
- weighted round-robin scheduling
- least-connection scheduling
- weighted least-connection scheduling
- Persistence (clients stick to the same real server for a timeout period).
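Scheduler and persistence are both set when the virtual service is created;
a minimal sketch, assuming a VIP of 130.88.203.3 and an HTTPS service, where
persistence matters for SSL sessions:
# weighted least-connection, clients pinned to one real server for 300s
ipvsadm -A -t 130.88.203.3:443 -s wlc -p 300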
Gotchas!
Kernel Configuration
- Best approach is to build support for all scheduling methods into the
kernel in one go. Key options to select:
Prompt for development and/or incomplete code/drivers
Network firewalls,
IP: firewalling
IP: masquerading
IP: masquerading virtual server support,
(16) IP masquerading VS table size (the Nth power of 2)
IP: aliasing support (optional)
And as modules:
IPVS: round-robin scheduling
IPVS: weighted round-robin scheduling
IPVS: least-connection scheduling
IPVS: weighted least-connection scheduling
IP: tunneling
- Add "alias tunl0 ipip" to your /etc/conf.modules file.
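For example, the relevant fragment of /etc/conf.modules, plus loading a
scheduler by hand (the ip_vs_* module names are an assumption based on the
2.2 LVS patch naming - check what your build actually produced):
# /etc/conf.modules
alias tunl0 ipip

# load a scheduler module manually if it isn't autoloaded
modprobe ip_vs_wlc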
IPVSADM
IP Virtual Server ADMinistration tool. Quick and simple access to the
mechanism:
ipvsadm v1.7 1999/11/28
Usage: /sbin/ipvsadm -[A|E] -[t|u] service-address [-s scheduler] [-p [timeout]] [-M [netmask]]
/sbin/ipvsadm -D -[t|u] service-address
/sbin/ipvsadm -C
/sbin/ipvsadm -[a|e] -[t|u] service-address -r server-address [options]
/sbin/ipvsadm -d -[t|u] service-address -r server-address
/sbin/ipvsadm -[L|l] [-n]
Commands:
Either long or short options are allowed.
--add-service -A add virtual service with options
--edit-service -E edit virtual service with options
--delete-service -D delete virtual service
--clear -C clear the whole table
--add-server -a add real server with options
--edit-server -e edit real server with options
--delete-server -d delete real server
--list -L list the table
Options:
--tcp-service -t service-address service-address is host and port
--udp-service -u service-address service-address is host and port
--scheduler -s It can be rr|wrr|lc|wlc,
the default scheduler is wlc.
--persistent -p [timeout] persistent port
--netmask -M [netmask] persistent granularity mask
--real-server -r server-address server-address is host (and port)
--masquerading -m masquerading (NAT)
--ipip -i ipip encapsulation (tunneling)
--gatewaying -g gatewaying (direct routing) (default)
--weight: -w capacity of real server
--numeric -n numeric output of addresses and ports
Building a large scale web server : Realserver setup
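For VS-DR, each real server must accept traffic addressed to the VIP while
replying from its own address; the usual trick is to put the VIP on a dummy
device. A minimal sketch, assuming the VIP 130.88.203.3 used below (on 2.2
kernels you may also need the ARP-hiding measures described in the LVS HOWTO,
so the real servers don't answer ARP for the VIP):
# on each of the 200 real servers
ifconfig dummy0 130.88.203.3 netmask 255.255.255.255 broadcast 130.88.203.3 up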
Building a large scale web server : Director setup
On the director, put the virtual IP (VIP) on a public device - eg an alias
of a normal ethernet device:
ifconfig eth0:0 130.88.203.3 netmask 255.255.255.255 broadcast 130.88.203.3
And create the service:
ipvsadm -A -t 130.88.203.3:80 -s wlc
Assuming your 200 real servers have IPs 130.88.203.45 - 130.88.203.244,
and that you've got sh-utils installed:
for i in `seq 45 244`; do
ipvsadm -a -t 130.88.203.3:80 -r 130.88.203.$i:80 -g -w 1000
done
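A quick sanity check; wget here is just an assumption, any HTTP client
will do:
ipvsadm -L -n                    # should list the :80 service with 200 real servers
wget -O - http://130.88.203.3/   # from a client: fetch a page via the VIP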
Building a large scale DNS server
Assume the network device config on all servers (director & real
alike) is unchanged. DNS is a good example of a UDP based service, hence the
choice. On the real servers, if you're running bind 8 and using VS-DR or
VS-TUN, you need to put the following into the options section of
/etc/named.conf (assuming 130.88.203.3 as the VIP, to match the director
config below):
listen-on { 130.88.203.3; };
Director configuration.
ipvsadm -A -u 130.88.203.3:53 -s wrr
ipvsadm -A -t 130.88.203.3:53 -s wrr
Build the routing table:
for i in `seq 45 244`; do
ipvsadm -a -u 130.88.203.3:53 -r 130.88.203.$i:53 -g -w 1000
ipvsadm -a -t 130.88.203.3:53 -r 130.88.203.$i:53 -g -w 1000
done
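A quick check that queries via the VIP get answered (hostname illustrative):
dig @130.88.203.3 www.mydomain.net A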
Building a large scale Web Proxy server
Assume the network device config on all servers (director & real
alike) is unchanged, and that the software on all boxes is squid. In the
squid.conf file on each real server, add the line:
udp_incoming_address 130.88.203.3
Director configuration.
Create the TCP based HTTP proxy service:
ipvsadm -A -t 130.88.203.3:3128 -s wlc
Create the UDP based ICP service:
ipvsadm -A -u 130.88.203.3:3130 -s wrr
Build the routing table:
for i in `seq 45 244`; do
ipvsadm -a -t 130.88.203.3:3128 -r 130.88.203.$i:3128 -g -w 1000
ipvsadm -a -u 130.88.203.3:3130 -r 130.88.203.$i:3130 -g -w 1000
done
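From a client, you can test the balanced proxy by fetching a page through
the VIP (wget honours the http_proxy environment variable; the URL is
illustrative):
http_proxy=http://130.88.203.3:3128/ wget -O - http://www.linux.org/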
Building a bigger better Web server : BBBWS
- Assuming you have a large number of ASP, PHP and CGI requests, you may want
machines dedicated to those purposes. eg Allocate servers:
- 130.88.203.45-100 to be cgi-bin servers.
- 130.88.203.101-144 to be php3 servers.
- 130.88.203.145-200 to be asp servers.
- 130.88.203.201-244 to be normal webservers/frontends.
- Designate an IP address per partition - eg:
- 130.88.203.245 be www-cgi.mydomain.net
- 130.88.203.246 be www-php.mydomain.net
- 130.88.203.247 be www-asp.mydomain.net
- 130.88.203.3 be www.mydomain.net
BBBWS : Farming off the request types
- Run squid on all the machines as an http accelerator. Run a redirector
that has the following rules (a sketch of such a redirector follows this
list):
- map requests containing .*\.cgi$ or cgi-bin to www-cgi.mydomain.net
- map requests for .*\.php$ to www-php.mydomain.net
- map requests for .*\.asp$ to www-asp.mydomain.net
- otherwise serve from the local server.
- Use squid's ability to hide this from the user.
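A minimal shell redirector along those lines; it assumes squid 2.x feeds one
request per line as "URL client/fqdn ident method" and reads back the
(possibly rewritten) URL. The script name and exact patterns are illustrative:
#!/bin/sh
# redirector.sh - send dynamic requests to the dedicated farms
while read url rest; do
  path="${url#*://*/}"              # strip scheme and hostname
  case "$url" in
    *cgi-bin*|*.cgi) echo "http://www-cgi.mydomain.net/$path" ;;
    *.php3|*.php)    echo "http://www-php.mydomain.net/$path" ;;
    *.asp)           echo "http://www-asp.mydomain.net/$path" ;;
    *)               echo "$url" ;;  # anything else: serve locally
  esac
done
Hook it in with a redirect_program line in squid.conf.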
BBBWS : Configuring the real servers
On machines          | configure dummy0 as
130.88.203.45-100    | 130.88.203.245
130.88.203.101-144   | 130.88.203.246
130.88.203.145-200   | 130.88.203.247
130.88.203.201-244   | 130.88.203.3
- On machines 130.88.203.201-244, run the webserver on port 81 rather than
the usual port 80, leaving port 80 free for squid. (see next slide :-)
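On those frontends, squid then answers on port 80 and accelerates the local
webserver on port 81; a squid.conf sketch, assuming squid 2.x accelerator
directives (exact options vary between squid versions):
http_port 80
httpd_accel_host localhost
httpd_accel_port 81
httpd_accel_with_proxy on
redirect_program /usr/local/bin/redirector.sh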
BBBWS : Configuring the Director
- Listen on all the IPs:
ifconfig eth0:0 130.88.203.3 netmask 255.255.255.255 broadcast 130.88.203.3 up
ifconfig eth0:1 130.88.203.245 netmask 255.255.255.255 broadcast 130.88.203.245 up
ifconfig eth0:2 130.88.203.246 netmask 255.255.255.255 broadcast 130.88.203.246 up
ifconfig eth0:3 130.88.203.247 netmask 255.255.255.255 broadcast 130.88.203.247 up
- Create the services:
for i in 3 245 246 247; do
ipvsadm -A -t 130.88.203.$i:80 -s wlc -p
done
- Create the routing tables:
for i in `seq 45 100`; do # CGI-BIN servers
ipvsadm -a -t 130.88.203.245:80 -r 130.88.203.$i:80 -g -w 1000
done
for i in `seq 101 144`; do # PHP servers
ipvsadm -a -t 130.88.203.246:80 -r 130.88.203.$i:80 -g -w 1000
done
for i in `seq 145 200`; do # ASP servers
ipvsadm -a -t 130.88.203.247:80 -r 130.88.203.$i:80 -g -w 1000
done
for i in `seq 201 244`; do # Normal/frontends servers
ipvsadm -a -t 130.88.203.3:80 -r 130.88.203.$i:80 -g -w 1000
done
What else can we load balance?
- Any TCP or UDP based service.
- Including SMTP, FTP (specific support for FTP's data/control connections is
in place), quake servers, POP3 servers, ssh/telnet connections into a cluster
of machines (a sketch follows this list), etc. (eg 30-100 diskless X terminals
using a handful of real servers as central config)
- Your imagination is effectively the limit.
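For instance, balancing ssh logins across the cluster, with persistence so
that reconnecting clients land on the same box; a sketch reusing the
addresses from earlier:
ipvsadm -A -t 130.88.203.3:22 -s wlc -p
for i in `seq 45 244`; do
ipvsadm -a -t 130.88.203.3:22 -r 130.88.203.$i:22 -g -w 1000
done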
What I didn't cover
- Monitoring tools - essentially, what you use depends on how closely
you need to monitor the real servers. If they're flaky, you need very good
monitoring; if they're not, you can get by with very simple tools. There are
a large number of tools out there, including mon, and for simple setups the
LVS tarball comes with some.
- Red Hat 6.1 is set up to do VS-DR out of the box, and includes a simple
admin/monitoring tool called Piranha.
- Director failover - currently this is best achieved using the
software fake, which uses gratuitous ARP to tell the router that the
director's IP address has moved to a new MAC address.
- Content synchronisation - one way of doing this is to use a network
file system - such as AFS or Coda if available on the real servers. If the
amount of content is fairly small - ie less than 4Gb - it may be easier
just to use rsync for the static data.
- Any errors in this talk are down to either typos or me being stupid;
in all cases, the Linux Virtual Server website is the canonical
documentation. Thanks for listening :-)
Linux Virtual Servers website address
http://www.LinuxVirtualServer.org/