Tuesday, March 3, 2015

Round Robin Internet Failover Setup



Ahoy! It's been a while; I've been too busy. Here is another of my personal projects, and I hope it helps others too. I have updated the instructions below so they are easier to follow. I hope :)

In this setup, the server switches its gateway on the WAN side whenever the current router fails.
Below is a sample graphic of how this setup was made:

[Figure: diagram of the setup]

On the router side:
- disable DHCP on each router.
- give each router a static IP on the same subnet, for example 192.168.0.1/24 for the first router, 192.168.0.2/24 for the second, and so on.

For the Server:
- Set its WAN IP to something like 192.168.0.10/24, or whatever IP you want, as long as the routers and your server are on the same subnet.
- Set the LAN side to something like 192.168.1.1/24, or whatever IP you like, as long as it is not on the same subnet as the WAN side.
- Set up DHCP, BIND (optional) and Squid (optional) on the server for the LAN side (kindly see the link).
- Set up iptables MASQUERADE (NAT) so the LAN can share the internet connection (kindly see the link).
- I used Ubuntu Linux as the server in this setup.
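The linked iptables setup is not reproduced here. As a minimal sketch only, assuming eth0 is the WAN side (192.168.0.0/24) and eth1 is the LAN side (192.168.1.0/24), as in the routing table shown further down, masquerading comes down to enabling IP forwarding plus one NAT rule and two forwarding rules. The `nat_rules` function is a hypothetical helper: it only prints the commands so you can review them before applying them as root.

```shell
#!/bin/bash
# Minimal NAT/masquerading sketch (assumption: eth0 = WAN, eth1 = LAN).
# nat_rules only PRINTS the commands; nothing changes until you pipe them to a root shell.
nat_rules() {
    local wan="${1:-eth0}" lan="${2:-eth1}"
    # let the kernel forward packets between the two interfaces
    echo "sysctl -w net.ipv4.ip_forward=1"
    # rewrite the source address of LAN traffic leaving through the WAN interface
    echo "iptables -t nat -A POSTROUTING -o $wan -j MASQUERADE"
    # allow new connections from the LAN out to the WAN
    echo "iptables -A FORWARD -i $lan -o $wan -j ACCEPT"
    # allow only replies (established/related traffic) back in from the WAN
    echo "iptables -A FORWARD -i $wan -o $lan -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT"
}
```

Review the output of `nat_rules eth0 eth1`, then apply it with `nat_rules eth0 eth1 | sudo bash`. These rules are lost on reboot unless you save them (for example with the iptables-persistent package on Ubuntu) and set net.ipv4.ip_forward=1 in /etc/sysctl.conf.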

Script1: the route table. You can save this as rtable.sh:
#!/bin/bash
# This is my route add script.
# created by verZion 2/26/2015
# place this in /etc/rc.local for this to run at startup

#clear the current route table
/sbin/route del -net 0.0.0.0
/sbin/route del -net 0.0.0.0
/sbin/route del -net 0.0.0.0

#deploy the route table. Note that the route added last becomes the primary gateway
/sbin/route add -net 0.0.0.0 netmask 0.0.0.0 gw 192.168.0.3
/sbin/route add -net 0.0.0.0 netmask 0.0.0.0 gw 192.168.0.2
/sbin/route add -net 0.0.0.0 netmask 0.0.0.0 gw 192.168.0.1
#end of script

- make rtable.sh executable
# chmod a+x rtable.sh
 
- To test Script1, execute it manually and check that the last executed route is at the beginning of the output list.
# ./rtable.sh 
# netstat -nr            (to check the route table)
Example output of the deployed route table...
$ netstat -nr
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.0.1     0.0.0.0         UG        0 0          0 eth0
0.0.0.0         192.168.0.2     0.0.0.0         UG        0 0          0 eth0
0.0.0.0         192.168.0.3     0.0.0.0         UG        0 0          0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1
192.168.0.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
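Instead of eyeballing the list, this check can also be scripted. A small sketch with hypothetical helper names: `default_gateways` picks the default routes (destination and genmask 0.0.0.0) out of `netstat -nr` output in the order they are listed, and `check_primary` succeeds only when the first one is the gateway you expect.

```shell
#!/bin/bash
# Print the gateways of all default routes, in the order netstat lists them.
# Per the behaviour described above, the first one printed is the primary gateway.
default_gateways() {
    awk '$1 == "0.0.0.0" && $3 == "0.0.0.0" { print $2 }'
}

# check_primary EXPECTED: read "netstat -nr" output on stdin and
# succeed only if the first default route goes through EXPECTED.
check_primary() {
    local expected="$1" primary
    primary=$(default_gateways | head -n 1)
    [ "$primary" = "$expected" ]
}
```

For example, `netstat -nr | check_primary 192.168.0.1 && echo "primary gateway OK"` after running rtable.sh.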


- If you encounter no problems with the rtable.sh script, put it in the /etc/rc.local file, just above the exit 0 line.
For example:
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.

/home/verzion/rtable.sh

exit 0

#####
Script2: the failover
- Below is the script I created for the round robin internet failover. Save it as netfailover.sh:
#!/bin/bash
# This is my internet failover script.
# created by verZion 2/26/2015
# Place this in cron. If you use the download check, run it every 5 mins.
# Why 5 mins? Because it downloads a 2MB file to determine the
# consistency of the connection.
# For a lighter check, use the ping line below (the default) and set cron to
# run it every minute. Note: cron's smallest interval is 1 minute, not seconds.
# updated 3/6/2015 - found and fixed some bugs in the loop statement

#failsw Legend
# 0 = do the check.
# 1 = failure encountered. Email the admins.
# 2 = critical error. Stop checking and emailing until the admin fixes the problem


#initialize some variables
loop1=1
dum01=0
failsw=0
gwh=3
#uncomment dURL (and the wget line below) if you are going to use the download as the check tool
#dURL="http://squid.acmeconsulting.it/download/squid-2.7.STABLE8-bin.zip"

#get the current route table and save it in /tmp/dump01
/bin/netstat -nr > /tmp/dump01

#Set the maximum gateways in the network
maxcon=3

#Start testing and changing the connection on failure
while [ $loop1 -le $maxcon ] && [ $failsw -ne 2 ]; do

    gw=$(awk '{print $2}' /tmp/dump01 | head -n $gwh | tail -n 1)
    echo "Checking connection on $gw..." >> /var/log/syslog
    #download the file
    #wget --limit-rate=30k --tries=1 --connect-timeout=10 --read-timeout=5 --no-check-certificate --output-document=/tmp/dummy "$dURL" &> /dev/null

    #if you wish to use ping instead of download, uncomment the line below
    #and comment out the wget line above
    ping zdnet.com -c3 &> /dev/null

    #check the return value; non-zero means we have a problem
    if [[ $? -ne 0 ]]; then
        let loop1=$loop1+1
        failsw=1

        if [[ $loop1 -le $maxcon ]]; then  #a gateway is still left to try, so rebuild the route table
            echo "network problem detected!!! Changing gateway..."
            echo "clearing the route configuration..."
            #delete every default route
            loop2=$maxcon
            while [ $loop2 -ne 0 ]; do
                echo "delete route $loop2"
                /sbin/route del -net 0.0.0.0
                let loop2=$loop2-1
            done

            echo "then shuffling the route table..."
            #re-add every gateway except the new primary, in the original order
            #(the default routes are on lines 3 to maxcon+2 of /tmp/dump01)
            loop2=3
            let dum01=$maxcon+2
            let loop4=$loop1+2
            while [ $loop2 -le $dum01 ]; do
                if [[ $loop2 -ne $loop4 ]]; then
                    gw=$(awk '{print $2}' /tmp/dump01 | head -n $loop2 | tail -n 1)
                    echo "loading gateway $gw"
                    /sbin/route add -net 0.0.0.0 netmask 0.0.0.0 gw $gw
                fi
                let loop2=$loop2+1
            done
            #the last route added is the primary gateway
            gw=$(awk '{print $2}' /tmp/dump01 | head -n $loop4 | tail -n 1)
            echo "then loading default gateway $gw"
            /sbin/route add -net 0.0.0.0 netmask 0.0.0.0 gw $gw
        fi
        let gwh=$gwh+1
    else
        echo "no network problem detected. Exiting!"
        break     #exit loop
    fi
done

#check if all gateways failed
if [[ $loop1 -gt $maxcon ]]; then
    echo "Warning: all gateways failed!!!"
    #change the "failsw=0" line of this script to "failsw=2" so the checking stops until the admin resets it to 0
    sed -i 's/^failsw=0$/failsw=2/' /home/verzion/netfailover.sh
    #email the admin about it
    /usr/bin/mail -s "Server reporting: MAJOR Problem on the internet connection!!!" verzion@internal.com <<< "All gateways are down! :( After fixing the problem reset the failsw to 0 to enable the autocheck again."
elif [[ $failsw -eq 1 ]]; then
    #email the admin about the current gateway change
    echo "Warning: one of the internet gateways has failed. Currently gateway $gw is loaded."
    /usr/bin/mail -s "Server reporting: Internet failure detected!" verzion@internal.com <<< "The internet gateway was changed to $gw"
fi
if [[ $failsw -eq 2 ]]; then
    echo "Major network downtime detected. Reset the failsw variable to 0 after fixing the problem"
fi

#end of script
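To see what the shuffle in Script2 actually does: every gateway except the new primary is re-added in its original order, and the new primary is added last so that it ends up at the top of the table. The hypothetical `add_order` helper below only prints that order and does not touch the routing table. Its first argument is the position of the new primary in the saved table (the value of `loop1` in Script2), followed by the gateways as they appear in /tmp/dump01.

```shell
#!/bin/bash
# add_order N GW1 GW2 ...: print the order in which the gateways would be passed
# to "route add" when gateway number N (1-based) becomes the new primary.
add_order() {
    local primary="$1"
    shift
    local gws=("$@") i
    for ((i = 0; i < ${#gws[@]}; i++)); do
        # everything except the new primary keeps its original order
        if [ $((i + 1)) -ne "$primary" ]; then
            echo "${gws[i]}"
        fi
    done
    # added last, so it lands on top of the table and becomes the active gateway
    echo "${gws[primary - 1]}"
}
```

For the first failover on the sample table, `add_order 2 192.168.0.1 192.168.0.2 192.168.0.3` prints 192.168.0.1, 192.168.0.3 and 192.168.0.2, one per line, so 192.168.0.2 becomes the primary gateway.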

- set the permission on the script file so it becomes executable.
# chmod a+x netfailover.sh

- you may test it manually by running it as in the example below
# ./netfailover.sh

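- If the manual run works without problems, edit /etc/crontab and add the line below at the end. It runs the check every minute; if you switched to the download test, use */5 in the first field instead so it runs every 5 minutes, as the script's comments suggest.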
* * * * *   root    sleep 10; /home/verzion/netfailover.sh >> /var/log/syslog

- No need to restart cron; it picks up the change once the file is saved.

- to watch the log, run the command below…
# tail -f /var/log/syslog

- Below is a sample log output from syslog…
Mar  2 12:17:01 VZServer CRON[32463]: (root) CMD (   sleep 10; /home/verzion/netfailover.sh >> /var/log/syslog)
Checking connection on 192.168.0.1...
no network problem detected. Exiting!


Summary of the Setup
- Note that the script mails the admin as a notification. I have set up an internal mail server, and its setup is not covered here.

- Also, in Script2 the settings described below are the ones you can modify according to your needs, while failsw is the one you need to reset to 0 if the script hits a total failure.

- Below are some details on the adjustable settings in Script2 (netfailover.sh):

#dURL="http://squid.acmeconsulting.it/download/squid-2.7.STABLE8-bin.zip"
To use the download test, remove the remark (#) from this line and from the wget line, and put the remark on the ping line. In short, use only one of the two.
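If you switch between the two checks often, a variable can replace the commenting and uncommenting. A minimal sketch, assuming a new hypothetical `CHECK_METHOD` setting (`ping` or `download`) that is not part of the original script; the ping and wget options are the ones already used in Script2.

```shell
#!/bin/bash
# Hypothetical settings (not in the original script)
CHECK_METHOD="ping"   # "ping" or "download"
dURL="http://squid.acmeconsulting.it/download/squid-2.7.STABLE8-bin.zip"

# Returns 0 when the connection looks fine and non-zero otherwise,
# so it can stand in for the ping/wget line right before the "$?" check.
check_connection() {
    case "$CHECK_METHOD" in
        ping)
            ping zdnet.com -c3 &> /dev/null
            ;;
        download)
            wget --limit-rate=30k --tries=1 --connect-timeout=10 --read-timeout=5 \
                --no-check-certificate --output-document=/tmp/dummy "$dURL" &> /dev/null
            ;;
        *)
            echo "unknown CHECK_METHOD: $CHECK_METHOD" >&2
            return 2
            ;;
    esac
}
```

In Script2 you would then call `check_connection` in place of the wget and ping lines; the `if [[ $? -ne 0 ]]` test right after it keeps working unchanged.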

#Set the maximum gateways in the network
maxcon=3
Set the number of routers here; the lowest is 2. Please do not use this script with only 1 router, since there would be nothing to fail over to.
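If you would rather not keep maxcon in sync by hand, it could be derived from the saved route table. A sketch with a hypothetical `count_gateways` helper; it assumes rtable.sh has already loaded all the gateways, since it can only count the default routes that are currently in the table.

```shell
#!/bin/bash
# count_gateways FILE: count the default routes (destination and genmask 0.0.0.0)
# in a saved "netstat -nr" dump such as /tmp/dump01.
count_gateways() {
    awk '$1 == "0.0.0.0" && $3 == "0.0.0.0" { n++ } END { print n + 0 }' "$1"
}

# Example use inside Script2, right after the dump is taken:
#   maxcon=$(count_gateways /tmp/dump01)
#   if [ "$maxcon" -lt 2 ]; then
#       echo "Need at least 2 gateways, found $maxcon. Exiting!"
#       exit 1
#   fi
```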

ping zdnet.com -c3
Here you may use any host you wish; for example, you can ping facebook.com if you want.
The -c3 option sets how many echo requests ping sends (3 here); ping reports success if at least one reply comes back. You may set it higher or lower according to your needs.
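Relying on a single host means that if zdnet.com itself is down, the script wrongly treats your gateway as failed. A sketch of one way around that, assuming a hypothetical `CHECK_HOSTS` list: the connection counts as up if any host answers.

```shell
#!/bin/bash
# Hypothetical settings: hosts to try and how many echo requests to send to each.
CHECK_HOSTS="zdnet.com facebook.com"
PING_COUNT=3

# Succeeds as soon as one host answers; fails only if none of them do.
check_hosts() {
    local host
    for host in $CHECK_HOSTS; do
        if ping -c "$PING_COUNT" "$host" &> /dev/null; then
            return 0
        fi
    done
    return 1
}
```

Putting `check_hosts` in place of the ping line leaves the rest of Script2 unchanged. The trade-off is that a real outage takes a little longer to detect, since every host is tried before giving up.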

/home/verzion/netfailover.sh
This is where the netfailover.sh script is placed on your computer (it is used by the sed line in the script and by the crontab entry). Do not forget to change it, since the path has to be correct.

"Server reporting: MAJOR Problem on the internet connection!!!" verzion@internal.com
The mail lines send notifications to the internal mail server I set up on my server. The mail server setup is not covered in this documentation; kindly search for it, as there are tons of working tutorials out there.
If you do not wish to use this for now, you may comment out (#) the mail command lines.
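Rather than commenting out each mail line, a small wrapper can keep that choice in one place. A sketch with hypothetical names (`SEND_MAIL`, `ADMIN_EMAIL`, `notify_admin`): when mail is turned off, or the mail command is not installed, the message is only echoed, so it still ends up in the log when the script runs from cron.

```shell
#!/bin/bash
# Hypothetical settings: set SEND_MAIL=0 to disable email notifications.
SEND_MAIL=1
ADMIN_EMAIL="verzion@internal.com"

# notify_admin SUBJECT BODY
notify_admin() {
    local subject="$1" body="$2"
    if [ "$SEND_MAIL" = "1" ] && command -v mail > /dev/null 2>&1; then
        mail -s "$subject" "$ADMIN_EMAIL" <<< "$body"
    else
        echo "Mail not sent ($subject): $body"
    fi
}
```

For example: `notify_admin "Server reporting: Internet failure detected!" "The internet gateway was changed to $gw"`.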


Have fun coding!!!  
