Manchester Linux User Group (ManLUG) - meeting February 2000 - Flexible Level 4 switching Applications using Linux Virtual Servers (aka Building Stupidly Large Servers using Linux Virtual Servers)

Flexible Level 4 switching Applications using Linux Virtual Servers

(aka Building Stupidly Large Servers using Linux Virtual Servers)

Michael Sparks
zath@i.am
http://MichaelSparks.tripod.com/

Level 4 switching & LVS : What is it ?

Essentially very similar to IP masquerading, just "the other way around".
Packets arrive at a load balancer and are forwarded to private servers, based on the "FromIP:port ToIP:Port Type" triplet in the /proc/net/ip_masquerade table.
New entries are added to the /proc/net/ip_masquerade table when SYN packets arrive. The schedulers give lots of flexibility in choosing which server to use.
The front end is termed a director. The backend servers are termed real servers.

Credits: Who wrote it?

Lead developer - Wensong Zhang. Other important people:

Julian Anastasov - Lots of patches/ideas
Peter Kese - Port to 2.2 kernel
Joseph Mack - HOWTO Maintainer.
Lars Marowsky-Bree - Hosting the primary LVS website amongst other things
Many, many others: Mike Wangsmo, Rob Thomas, TC Lewis, Matthew Kellett, Mike Douglas, Horms, VA Linux, Red Hat, plus a number of the usual suspects.

Level 4 switching & LVS : What's out there ?

Most commercial level 4 switches - eg Alteon, Arrowpoint, Big IP, Foundry (all bar 1 of those I know) - operate using NAT, ie just like IP masquerading. (LVS Term VS-NAT)
IBM's Net Dispatcher has the service IP public on the director, and private on the real servers. The director rewrites the destination MAC address of the ethernet frame to that of the real server. The real server can then reply directly to the client. Director and server must be on the same LAN. (LVS Term VS-DR)
LVS has both of the above 2 options, and a third - packets can be forwarded using IPIP tunnelling (not IP-GRE tunnelling, unfortunately). This allows the real servers to be on different networks from the load balancer - very useful for resiliency & failover. (LVS Term VS-TUN)

Level 4 switching & LVS : Pros & Cons

Commercial systems
- pros: Plug in and go. Sometimes already built into higher end switches/routers anyway.
- cons: Expensive. No general purpose OS available. Normally only one choice of scheduler, and one choice of forwarding.
Linux Virtual Servers
- con: Still under active development.
- pros
  - Still under active development :-)
  - FREE
  - You may already have it - built into the Red Hat 6.1 kernel.
  - More flexible
  - Choice of forwarding on a per server basis.
  - Full operating system available.

Forwarding & Scheduling

Forwarding Mechanism Benefits:
- VS-NAT - simple to set up; requires no modification to the servers, which can be running any OS.
- VS-DR - since the servers reply directly to clients, more scalable; works with any OS.
- VS-TUN - Linux only, because it uses IPIP, but the most flexible forwarding mechanism.
Scheduling:
- round-robin scheduling
- weighted round-robin scheduling
- least-connection scheduling
- weighted least-connection scheduling
- persistence
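
To make the weighted least-connection idea concrete, here is a minimal sketch of the selection rule: pick the server with the lowest ratio of active connections to weight. This is an illustration only, not the kernel's code; the function name wlc_pick and the "server weight connections" input format are assumptions made for this example.

```shell
# Hypothetical helper (not part of LVS): choose a real server the way a
# weighted least-connection scheduler roughly would.
# Input on stdin, one server per line: <server> <weight> <active_connections>
wlc_pick() {
    awk '
        $2 > 0 {
            # Compare connections/weight ratios without dividing:
            # a/b < c/d  is the same as  a*d < c*b  for positive weights.
            if (!found || $3 * best_w < best_c * $2) {
                found = 1; best = $1; best_w = $2; best_c = $3
            }
        }
        END { if (found) print best }
    '
}
```

Feeding it three lines such as "130.88.203.45 1000 30", "130.88.203.46 2000 40" and "130.88.203.47 1000 25" selects 130.88.203.46, since 40 connections on a weight of 2000 is the lightest relative load. Servers with weight 0 are skipped, and on a tie the first server listed wins.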

Gotchas !

Level 4 switching is only a mechanism. You need to use other tools for monitoring/manipulating system state - eg mon, or home grown tools.
VS-NAT - the real servers' only route to the outside world must be through the director.
VS-DR/VS-TUN - ARP. Since all the machines have the same IP address, only one of them may be allowed to respond to ARP requests. For VS-DR under non-Linux systems, simply specifying -arp when configuring the interface works. For 2.2 kernel Linux boxes, you need to tell the kernel the device is private (hidden).
Eg:
```
ifconfig <DEV> up
echo 1 > /proc/sys/net/ipv4/conf/all/hidden
echo 1 > /proc/sys/net/ipv4/conf/<DEV>/hidden
ifconfig <DEV> <VIP> up
```
UDP Services - if using VS-TUN/VS-DR, you must configure the UDP services either to respond from the same address the request came in on, or to bind to the VIP. (UDP is connectionless, so nothing else ties the reply to the address the client used.)
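
For the first gotcha above (LVS itself does no monitoring), the glue between a monitor and the director can be quite small. The sketch below is a dry run that only prints the ipvsadm commands it would issue. The function name apply_transitions and its input format ("real-server:port up|down" per line) are assumptions for illustration; a tool such as mon would have to be configured to produce something similar.

```shell
# Hypothetical glue between a monitor and ipvsadm (dry run: prints commands).
# Usage: apply_transitions <virtual-service host:port>
# Stdin: one state change per line, "<real-server:port> <up|down>".
apply_transitions() {
    service=$1
    while read -r server state; do
        case "$state" in
            down) # take a failed real server out of the table
                  echo "ipvsadm -d -t $service -r $server" ;;
            up)   # put a recovered real server back (direct routing, as in this talk)
                  echo "ipvsadm -a -t $service -r $server -g -w 1000" ;;
            *)    echo "ignoring unknown state '$state' for $server" >&2 ;;
        esac
    done
}
```

Piping the output to sh on the director (as root) would apply the changes. The sketch assumes the monitor only reports genuine transitions, since re-adding a server that is already in the table is an error.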

Kernel Configuration

Best approach is to build support for all scheduling methods into the kernel in one go. Key options to select:

	Prompt for development and/or incomplete code/drivers
	Network firewalls
	IP: firewalling
	IP: masquerading
	IP: masquerading virtual server support
	(16) IP masquerading VS table size (the Nth power of 2)
	IP: aliasing support (optional)

And as modules:

	IPVS: round-robin scheduling
	IPVS: weighted round-robin scheduling
	IPVS: least-connection scheduling
	IPVS: weighted least-connection scheduling
	IP: tunneling

Add alias tunl0 ipip to your /etc/conf.modules file.

IPVSADM

ipvsadm  v1.7 1999/11/28
Usage: /sbin/ipvsadm -[A|E] -[t|u] service-address [-s scheduler] [-p [timeout]] [-M [netmask]]
       /sbin/ipvsadm -D -[t|u] service-address
       /sbin/ipvsadm -C
       /sbin/ipvsadm -[a|e] -[t|u] service-address -r server-address [options]
       /sbin/ipvsadm -d -[t|u] service-address -r server-address
       /sbin/ipvsadm -[L|l] [-n]

Commands:
Either long or short options are allowed.
  --add-service     -A        add virtual service with options
  --edit-service    -E        edit virtual service with options
  --delete-service  -D        delete virtual service
  --clear           -C        clear the whole table
  --add-server      -a        add real server with options
  --edit-server     -e        edit real server with options
  --delete-server   -d        delete real server
  --list            -L        list the table

Options:
  --tcp-service  -t service-address  service-address is host and port
  --udp-service  -u service-address  service-address is host and port
  --scheduler    -s <scheduler>      It can be rr|wrr|lc|wlc,
                                     the default scheduler is wlc.
  --persistent   -p [timeout]        persistent port
  --netmask      -M [netmask]        persistent granularity mask
  --real-server  -r server-address   server-address is host (and port)
  --masquerading -m                  masquerading (NAT)
  --ipip         -i                  ipip encapsulation (tunneling)
  --gatewaying   -g                  gatewaying (direct routing) (default)
  --weight:      -w <weight>         capacity of real server
  --numeric      -n                  numeric output of addresses and ports
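
As a sketch of how these options combine, the function below builds one persistent virtual service whose real servers each use a different forwarding method. It is a dry run that prints the commands rather than executing them. The wrapper ipvs, the function setup_mixed_service, the weights and the 192.168.1.10 and 10.1.2.3 addresses are all made up for illustration; whether mixing methods makes sense depends on your network layout.

```shell
# Dry-run wrapper: prints each command instead of running it.
# To apply the rules for real (as root), change the body to: ipvsadm "$@"
ipvs() { echo "ipvsadm $*"; }

# Usage: setup_mixed_service <virtual-service host:port>
setup_mixed_service() {
    vip=$1
    ipvs -A -t "$vip" -s wrr -p                      # new TCP service, weighted round-robin, persistent
    ipvs -a -t "$vip" -r 192.168.1.10:80 -m -w 1     # real server behind NAT (VS-NAT)
    ipvs -a -t "$vip" -r 130.88.203.45:80 -g -w 2    # same-LAN server, direct routing (VS-DR)
    ipvs -a -t "$vip" -r 10.1.2.3:80 -i -w 1         # remote server via IPIP tunnel (VS-TUN)
    ipvs -L -n                                       # list the resulting table numerically
}
```

Calling setup_mixed_service 130.88.203.3:80 prints the five commands in order, which makes it easy to review a configuration before touching a live director.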

Building a large scale web server : Realserver setup

Assuming Linux boxes running a web server - eg Apache, Roxen, etc, and all web servers on same network segment.
Decide on a service address - eg 130.88.203.3

On the real servers:

ifconfig dummy0 up
echo 1 > /proc/sys/net/ipv4/conf/all/hidden
echo 1 > /proc/sys/net/ipv4/conf/dummy0/hidden
ifconfig dummy0 130.88.203.3 up

Building a large scale web server : Director setup

ifconfig eth0:0 130.88.203.3 netmask 255.255.255.255 broadcast 130.88.203.3

ipvsadm -A -t 130.88.203.3:80 -s wlc

for i in `seq 45 244`; do
   ipvsadm -a -t 130.88.203.3:80 -r 130.88.203.$i:80 -g -w 1000
done

Building a large scale DNS server

On the real servers, bind named to the service address (named.conf options):

        listen-on { 130.88.203.3; };

Director configuration.

ipvsadm -A -u 130.88.203.3:53 -s wrr
ipvsadm -A -t 130.88.203.3:53 -s wrr

for i in `seq 45 244`; do
   ipvsadm -a -u 130.88.203.3:53 -r 130.88.203.$i:53 -g -w 1000
   ipvsadm -a -t 130.88.203.3:53 -r 130.88.203.$i:53 -g -w 1000
done

Building a large scale Web Proxy server

On the real servers, make squid answer UDP (ICP) queries from the service address (squid.conf):

udp_incoming_address 130.88.203.3

Director configuration.

ipvsadm -A -t 130.88.203.3:3128 -s wlc

ipvsadm -A -u 130.88.203.3:3130 -s wrr

for i in `seq 45 244`; do
   ipvsadm -a -t 130.88.203.3:3128 -r 130.88.203.$i:3128 -g -w 1000
   ipvsadm -a -u 130.88.203.3:3130 -r 130.88.203.$i:3130 -g -w 1000
done

Building a bigger better Web server : BBBWS

Assuming you have a large number of ASP, PHP and CGI requests, you may want machines dedicated to each. Eg allocate servers:
- 130.88.203.45-100 to be cgi-bin servers.
- 130.88.203.101-144 to be php3 servers.
- 130.88.203.145-200 to be asp servers.
- 130.88.203.201-244 to be normal webservers/frontends.
Designate an IP address per partition - eg:
- 130.88.203.245 be www-cgi.mydomain.net
- 130.88.203.246 be www-php.mydomain.net
- 130.88.203.247 be www-asp.mydomain.net
- 130.88.203.3 be www.mydomain.net

BBBWS : Farming off the request types

Run squid on all the machines as an http accelerator. Run a redirector that has the following rules:
- map requests containing .*\.cgi$ or cgi-bin to www-cgi.mydomain.net
- map requests for .*\.php$ to www-php.mydomain.net
- map requests for .*\.asp$ to www-asp.mydomain.net
- otherwise serve from local server.
Use squid's ability to hide this from the user.
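
The rules above could be sketched as a small redirector script. This is an assumption-laden illustration, not the redirector used for the talk: it assumes squid hands the redirector one request per line with the URL as the first field, that a blank reply means "leave the URL unchanged", and that requests arrive as http://www.mydomain.net/... URLs. The function names are made up.

```shell
# Hypothetical squid redirector following the rules on this slide.
FRONT=http://www.mydomain.net

# Print the rewritten URL, or an empty line to leave the request alone.
rewrite_url() {
    url=$1
    case "$url" in
        "$FRONT"/*) ;;                    # only touch requests for the front-end name
        *) echo ""; return 0 ;;
    esac
    path=${url#"$FRONT"}                  # keep path and query string for the new URL
    bare=${path%%\?*}                     # match on the path without the query string
    case "$bare" in
        *.cgi|*cgi-bin*) echo "http://www-cgi.mydomain.net$path" ;;
        *.php)           echo "http://www-php.mydomain.net$path" ;;
        *.asp)           echo "http://www-asp.mydomain.net$path" ;;
        *)               echo "" ;;       # everything else: serve from the local server
    esac
}

# Main loop: one request per input line, one reply per output line.
redirector_loop() {
    while read -r url _rest; do
        rewrite_url "$url"
    done
}
```

Saved as a script that ends by calling redirector_loop, this would be hooked in with squid's redirect_program setting (assuming a Squid 2.x style configuration). Because the script replies with a plain URL rather than an HTTP redirect, squid fetches from the other farm itself and the client never sees the internal host names.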

BBBWS : Configuring the real servers

On machines             configure dummy0 as
130.88.203.45-100       130.88.203.245
130.88.203.101-144      130.88.203.246
130.88.203.145-200      130.88.203.247
130.88.203.201-244      130.88.203.3

On machines 130.88.203.201-244, run the web server on port 81 rather than the usual port 80, since squid (the accelerator) takes port 80. (see next slide :-)

BBBWS : Configuring the Director

Listen on all the IPs:

ifconfig eth0:0 130.88.203.3 netmask 255.255.255.255 broadcast 130.88.203.3 up
ifconfig eth0:1 130.88.203.245 netmask 255.255.255.255 broadcast 130.88.203.245 up 
ifconfig eth0:2 130.88.203.246 netmask 255.255.255.255 broadcast 130.88.203.246 up
ifconfig eth0:3 130.88.203.247 netmask 255.255.255.255 broadcast 130.88.203.247 up

Create the services:

for i in 3 245 246 247; do
   ipvsadm -A -t 130.88.203.$i:80 -s wlc -p
done

Create the routing tables:

for i in `seq 45 100`; do             # CGI-BIN servers
   ipvsadm -a -t 130.88.203.245:80 -r 130.88.203.$i:80 -g -w 1000
done
for i in `seq 101 144`; do            # PHP servers
   ipvsadm -a -t 130.88.203.246:80 -r 130.88.203.$i:80 -g -w 1000
done
for i in `seq 145 200`; do            # ASP servers
   ipvsadm -a -t 130.88.203.247:80 -r 130.88.203.$i:80 -g -w 1000
done
for i in `seq 201 244`; do            # Normal/frontend servers
   ipvsadm -a -t 130.88.203.3:80 -r 130.88.203.$i:80 -g -w 1000
done
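
With four near-identical loops it is easy for a range and its service address to drift apart. One possible alternative, sketched here as a dry run that only prints commands, is to generate the rules from a single table. The function name partition_rules and the "first last vip-host" table format are assumptions for this example.

```shell
# Hypothetical generator: prints one "ipvsadm -a" line per real server.
# Usage: partition_rules <network prefix, eg 130.88.203>
# Stdin: one partition per line, "<first host> <last host> <VIP host number>".
partition_rules() {
    net=$1
    while read -r first last vip_host; do
        i=$first
        while [ "$i" -le "$last" ]; do
            echo "ipvsadm -a -t $net.$vip_host:80 -r $net.$i:80 -g -w 1000"
            i=$((i + 1))
        done
    done
}
```

Fed the four lines "45 100 245", "101 144 246", "145 200 247" and "201 244 3", it prints 200 commands, one per real server, which can be reviewed and then piped to sh on the director.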

What else can we load balance ?

Any TCP or UDP based service.
Including SMTP, FTP (specific support for FTP's data/control connections is in place), Quake servers, POP3 servers, ssh/telnet connections into a cluster of machines, etc. (eg 30-100 diskless X terminals using a handful of real servers as central config)
Your imagination is effectively the limit.

What I didn't cover

Monitoring tools - essentially what you use depends on how closely you need to monitor the real servers. If they're flaky, you need very good monitoring. If they're not, you can get by with very simple tools. There's a large number of tools out there including mon, and for simple setups, the LVS tarball comes with some.
Red Hat 6.1 is set up to do VS-DR out of the box, and includes a simple admin/monitoring tool called Piranha.
Director failover - currently this is best achieved using the "fake" software to inform the router that the MAC address of the director has changed.
Content Synchronisation - one way of doing this is to use a network file system, such as AFS or Coda, if available on the real servers. If the amount of content is fairly small - ie less than 4GB - then it may be easier just to use rsync for the static data.
Any errors in this talk are all down to either typos or me being stupid. In all cases, the Linux Virtual Server website is the canonical documentation. Thanks for listening :-)

Linux Virtual Servers website address

http://www.LinuxVirtualServer.org/