Daemonizing rsync on Debian

I use rsync a lot in my multi-server environment. For example, it's handy for consolidating the log files from Apache (which sometimes serves the same site from many servers) into one place. In my GeoIP BIND setup, BIND's normal zone transfers no longer work, so a tool like rsync is required to replace them. I also use it to relocate telephone recordings from Asterisk to another machine that is better suited to distributing them to those with access, as it has more storage and bandwidth.

There are two problems with rsync & Debian – Debian doesn’t start the rsync daemon and provides no init.d script for startup at boot, and rsync has no facility for PID files when executing a file transfer.

The first solution is to add the following command to /etc/rc.local. The same command can also be run from cron to make sure the rsync daemon stays running. It is better than a plain startup command because it checks the PID file and won't start rsync if it's already running:

/sbin/start-stop-daemon -p /var/run/rsyncd.pid -u root -x /usr/bin/rsync -n rsync -S -- --daemon
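One caveat is worth checking. In daemon mode, rsync only writes a PID file when rsyncd.conf sets the `pid file` parameter. Without that setting, the `-p` check in the command above has nothing to look at. The following is a minimal sketch of a wrapper that warns about this before calling start-stop-daemon. The function names `is_pidfile_configured` and `start_rsyncd` are hypothetical, and the sketch assumes the configuration lives in /etc/rsyncd.conf unless another path is given.

```shell
#!/bin/sh
# Hypothetical helpers around the start-stop-daemon command from this post.

# is_pidfile_configured CONF PIDFILE
# Succeeds if CONF contains a "pid file = PIDFILE" line. The match is
# case-insensitive and tolerates extra whitespace, because rsyncd.conf ignores
# both in parameter names.
is_pidfile_configured() {
        grep -Eiq "^[[:space:]]*pid[[:space:]]*file[[:space:]]*=[[:space:]]*$2[[:space:]]*$" "$1"
}

# start_rsyncd [CONF]
# Warns if the daemon will not write /var/run/rsyncd.pid, then starts
# rsync --daemon unless start-stop-daemon finds it already running.
start_rsyncd() {
        conf=${1:-/etc/rsyncd.conf}
        if ! is_pidfile_configured "$conf" /var/run/rsyncd.pid
        then
                echo "warning: $conf does not set 'pid file = /var/run/rsyncd.pid'" >&2
        fi
        /sbin/start-stop-daemon -p /var/run/rsyncd.pid -u root -x /usr/bin/rsync \
                -n rsync -S -- --daemon --config="$conf"
}
```

Calling `start_rsyncd` from /etc/rc.local or cron behaves like the one-liner. The difference is that it complains when the PID file isn't configured, so the duplicate-start protection doesn't silently fail.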

Further to that, crontab’ed rsync jobs can sometimes have a huge hit of data that will take some time to transfer. Or perhaps there are problems with the network causing slower than normal file transfers. There are many scenarios where the crontab’ed job would be executed again before a previous job has completed.

The solution again is to check PID files. My answer was to write a small shell script (below) which creates a PID file for rsync jobs based on a specified “name” (which probably should be associated with the rsync share name and/or the host specified in the job). The solution is good enough to allow runs of rsync every minute, or maybe even multiple times per minute.

To execute rsync with this script you would run:

./rsync-pid.sh "rsync -avz --compress-level=9 --delete --password-file=/path/to/password/file /path/to/local/data/* rsync://user@host/path/to/remote/data" processname

The script’s source code is as follows:

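#!/bin/sh
# Usage: rsync-pid.sh "<rsync command>" <name>
# Runs the command only if no earlier job with the same name is still running.

CMD="$1"
NAME="$2"
PIDDIR=/var/run/rsync
PIDFILE="$PIDDIR/$NAME.pid"

if [ -z "$CMD" ] || [ -z "$NAME" ]
then
        echo "usage: $0 \"rsync ...\" name" >&2
        exit 1
fi

mkdir -p "$PIDDIR"

# Treat the PID file as stale unless that PID is still a process whose
# command line mentions rsync (this script's own name, rsync-pid.sh, qualifies)
if [ -e "$PIDFILE" ]
then
        pid=`cat "$PIDFILE"`
        if ! ps -p "$pid" -o args= 2>/dev/null | grep -q rsync
        then
                /bin/rm -f "$PIDFILE"
        fi
fi

if [ ! -e "$PIDFILE" ]
then
        # Record this script's PID; the rsync job runs as its child
        echo $$ > "$PIDFILE"
        # Remove the PID file however the script exits
        trap '/bin/rm -f "$PIDFILE"' EXIT
        trap 'exit 1' INT TERM

        # Left unquoted on purpose so the command is split into words and globs expand
        $CMD
else
        echo "$NAME is already running!"
fi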
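The same guard can be achieved without managing PID files at all. As an alternative technique (not what the script above does), util-linux's flock(1) holds an exclusive lock for as long as the job runs. The kernel drops the lock when the process exits, so there are no stale files or reused PIDs to worry about. This is a minimal sketch, assuming flock is installed and reusing /var/run/rsync for the lock files; `run_locked` is a hypothetical name.

```shell
#!/bin/sh
# Alternative to the PID-file script: let flock(1) hold a lock for the
# lifetime of the job. LOCKDIR and run_locked are hypothetical names.
LOCKDIR=${LOCKDIR:-/var/run/rsync}

# run_locked NAME COMMAND [ARGS...]
# Runs COMMAND while holding an exclusive lock on $LOCKDIR/NAME.lock.
# With -n, flock exits with status 1 straight away if another job holds
# the lock; otherwise the exit status is that of COMMAND.
run_locked() {
        name=$1
        shift
        mkdir -p "$LOCKDIR" || return 1
        flock -n "$LOCKDIR/$name.lock" "$@"
}
```

Usage would look like `run_locked processname rsync -avz --delete /path/to/local/data/ rsync://user@host/path/to/remote/data`. One trade-off is that rsync itself exits with status 1 on a syntax or usage error. That is the same status flock -n uses for "lock busy", so the two cases can't be told apart from the exit code alone.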

Clustering SIP servers with Asterisk

I've been thinking I should make my VoIP systems more redundant. At present it's just a single Asterisk installation on a Jumba Virtuozzo VPS account. While many have laughed at me for doing this, the reality is that this Asterisk rig has supported about 50 users for several years with very few hiccups. If Jumba, for whatever reason, falls over, my pure VoIP telephone goes offline. In an ideal world I'd have automatic failover with hosting from several different providers in Sydney (so latency remains really low while redundancy is really good).

I’ve been considering how this can be pulled off, but I think playing with it over the Christmas period will be the best plan so any downtime doesn’t affect business (as business is closed anyway).

My thoughts are that multiple Asterisk installs would each handle a few different tasks. One task would be SIP registration, where 1 to 3 machines would continually register to SIP providers like Exetel and Pennytel. When an inbound call is received, the node would try to dial it locally and, failing that, use IAX to try dialling it on every other Asterisk node. Another task would be SIP registration for end users, with a number of nodes listed in DNS A and SRV records.

The real magic I’ll have to work on is a macro for the Asterisk dial plan, so that we can replace Dial(SIP/somedestination) with a routine that will attempt that destination on every node in the cluster before producing a failed result. But I can’t see any reason as to why this isn’t possible.

The complicated thing with day to day administration will be duplicating the same configurations on every node in the cluster. Perhaps at a later date the development of some scripts to assist would be beneficial. Naturally I’ll be blogging about this adventure as it progresses.

Another note on VoIP – today I changed my POST 15 VoIP plan with Exetel to the $5 per month plan. I also recharged my Pennytel account and am now using Exetel as a primary provider with the alaw & ulaw codecs and Pennytel for calls to mobiles and 1300 numbers with the g729 codec. I noticed there are some decent differences in price with this operation.

50% of Internet data now has a source and destination inside Australia thanks to Google and Akamai

I was reading the blog of John Linton (owner of Exetel) about a week ago and one article piqued my interest (reading it requires that you're either an Exetel customer or pay $20 for membership). I don't read John's blog regularly, but I'm a big fan of it, as it provides some unique insight into the Australian telecommunications market.

John was talking about how the cost of delivering data to ADSL customers has changed. This is partly due to the continual fall in the cost of IP data and partly because of the increasing amount of content delivered from the Akamai and Google caches in Sydney.

I found this interesting because at some point during the past year or two I noticed that Google traffic was progressively "switched on" to be served over Pipe Networks. Pipe offers Internet Service Providers incredibly low-cost peering, typically much cheaper than any other transit provider. At first only Google Search seemed to be served from Pipe, and later other sites like YouTube were added.

I remember a time in the distant past where over 90% of all Internet data in Australia had a source or destination that was offshore. In the days of dialup Internet I worked for a small rural ISP who decided that as the majority of their traffic was to/from the United States they would bypass the high costs of Telstra and other Australian carriers and get a satellite link from an American firm directly to Los Angeles.

In the past 10 years there have been improvements in the fibre optic links between the US and Australia and there are now numerous non-Telstra suppliers of international transit. Pipe Networks was the most recent entry to international transit with their fibre link to the US via Guam. These improvements are what John was talking about with the fall in cost for IP data.

How times have changed. Now, in the days of broadband and a heavy focus on latency, using satellite or linking directly to America would be an absurd decision.

I believe that eventually, every large Internet firm will be placing servers close to the end user, so that international data is only a fraction of data used by end users. This would mean that international transit will become a realm for web sites and hosting firms, and will no longer be a primary focus for ISPs even in the Australian market which is very geographically isolated.

Combining the access_log from multiple web servers into a single file

Further to my blog the other day about remote syslogd with Debian: I run numerous web servers that serve the same site, and visitors are directed to them based on GeoIP in BIND.

Making sense of the log files is difficult as they’re spread over separate files on separate servers. Thankfully awstats comes with a tool that helps solve this problem.

logresolvemerge.pl ships with awstats. You pass each log file as a parameter, and it merges the entries in chronological order and writes the result to standard output, so you just redirect the output into a file.

So for example on Debian you would:

/usr/share/awstats/tools/logresolvemerge.pl /var/log/apache2/somesite_access_log_node1 /var/log/apache2/somesite_access_log_node2 /var/log/apache2/somesite_access_log_node3 > /var/log/apache2/somesite_access_log

And it's probably sensible to use rsync to send your logs to a centralized location.
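Putting the two steps together, here is a minimal sketch of a collection job. It pulls each node's log with rsync over SSH, then merges the copies with logresolvemerge.pl. The node names, the remote log path and the output file name are hypothetical. The FETCH and MERGE variables exist only so the commands can be swapped out (for example, when testing).

```shell
#!/bin/sh
# Hypothetical settings; override them in the environment or edit to suit.
NODES=${NODES:-"node1.example.com node2.example.com node3.example.com"}
REMOTE_LOG=${REMOTE_LOG:-/var/log/apache2/somesite_access_log}
LOGDIR=${LOGDIR:-/var/log/apache2}
FETCH=${FETCH:-"rsync -az"}
MERGE=${MERGE:-/usr/share/awstats/tools/logresolvemerge.pl}

# collect_and_merge: fetch every node's log into $LOGDIR, then write the
# merged result to $LOGDIR/somesite_access_log (or to $OUT if it is set).
collect_and_merge() {
        out=${OUT:-$LOGDIR/somesite_access_log}
        inputs=""
        for node in $NODES
        do
                copy="$LOGDIR/somesite_access_log_${node%%.*}"
                # host:/path is rsync's remote-shell (SSH) syntax
                if $FETCH "$node:$REMOTE_LOG" "$copy"
                then
                        inputs="$inputs $copy"
                else
                        echo "warning: could not fetch $REMOTE_LOG from $node" >&2
                fi
        done
        [ -n "$inputs" ] || return 1
        # Merge into a temporary file first so a failed run keeps the old combined log
        $MERGE $inputs > "$out.tmp" && mv "$out.tmp" "$out"
}
```

Run from cron on the central host, this keeps one combined log up to date for awstats. An unreachable node is skipped with a warning instead of aborting the whole merge.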

Once you’ve used logresolvemerge.pl you can then use tools like awstats on the combined log file.

Remote syslogd with Debian

I’m running a number of OpenVZ based Debian VPS accounts for my hosting needs and I’ve busted out into using GeoIP in BIND to direct clients of my hosted services to their nearest available server to maximize performance and add the ability to failover.

A crucial piece in this puzzle is logging. Most logging on Linux systems is done with syslogd, and Debian uses sysklogd. syslogd has had remote logging capabilities for years, and it's very simple to set up.

First I set up some “log servers” which receive logs from other hosts.

I edited /etc/default/syslogd so that the SYSLOGD variable was defined like:

SYSLOGD="-r -s example.com -l node1.example.com:node2.example.com:node3.example.com"

Because OpenVZ hosting providers are bodgy and not all do proper reverse DNS, I created an /etc/hosts file that accurately produced reverse DNS on the logging hosts. I actually made just one file and used rsync to keep it in sync on other “log servers”.

I also edited /etc/logrotate.conf so that two options were tweaked:

# keep 1 year (52 weeks) worth of backlogs
rotate 52

# uncomment this if you want your log files compressed
compress

So then my “log servers” were all set up. With disk space so cheap in certain locations, I thought I may as well keep lots of logs. As it's centralized, it's easier to monitor for log over-runs.

I then went on to edit /etc/syslog.conf on each node so that it contained:

*.*    @syslog1.example.com
*.*    @syslog2.example.com
*.*    @syslog3.example.com

With this configuration, each log server writes every node's messages into the same files, with the sending host's short hostname printed at the start of each entry. The data is transmitted over UDP.

Storing logs at several sites means logging is redundant and a comprehensive log set is always available. Additionally, I never disabled the local logging facilities, so each node still has its own logs in /var/log.

My final step was to whinge to one provider as the clock is set by the hosting provider in OpenVZ VPS containers. One provider had a clock that was inaccurate by 67 seconds. Clearly accurate time is important in a logging application and this effectively renders that node useless until the hosting provider fixes it.

An example of the advantage of this configuration: now, when I run `tail -f /var/log/mail.log` on one of the “log servers”, I see the activity for every single mail server I operate.

Oracle Grid Engine 6.2 update 6

Well, my toying with clustered computing came a little too late.

It would appear that Sun Microsystems, when they bought Gridware, decided to release Sun Grid Engine as an open source product.

Grid Engine basically handles job allocations, so a computing job can be assigned to a cluster node that has the most resources available and is most suited to the job.

But when Oracle bought up Sun they decided to kill off a few open source projects including Grid Engine.

Anything later than Grid Engine 6.2 update 6 is no longer open sourced. Grid Engine 6.2 update 7 is the latest edition and Oracle don’t have a download for update 6.

I’m also yet to find any pricing for Grid Engine.

It's a shame, because I would really like to implement this software at production level.

Geographic server assignment/routing in DNS

Correctly setting up routes in DNS for geographically clustered servers or proxies is a challenging task.

There are simply a lot of countries out there, and a big chunk of them are small enough to draw little or no traffic. On top of that, overseas Internet cables and Internet routing aren't mapped or documented very well.

To assign a server to a country in DNS for best performance, I've been using a fairly simple technique. I made a KML map of my server locations in Google Earth and use it to inspect the regions surrounding each server. I then search Google for the name of the country in question plus the keywords "ip address" (for example "new caledonia ip address"), and the results give me a list of subnets assigned in that country. Finally, I run traceroute from each of the servers to find the lowest latency and devise the best approach to routing.

It seems to work quite well. It's just a tedious job of locating other parts of the world and determining your best connectivity to those locations.
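The per-server latency comparison can be scripted. This is a minimal sketch, assuming SSH access to every server and iputils ping on each of them. It measures average ping round-trip time rather than reading traceroute output, which is a substitution of technique. The SERVERS list and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical list of your own servers, reachable over SSH.
SERVERS=${SERVERS:-"syd.example.com lax.example.com ams.example.com"}

# avg_rtt: read iputils ping output on stdin and print the average RTT in ms,
# taken from the "rtt min/avg/max/mdev = a/b/c/d ms" summary line.
avg_rtt() {
        awk -F/ '/^rtt / { print $5 }'
}

# measure TARGET: ping TARGET five times from each server, print "server rtt"
measure() {
        for server in $SERVERS
        do
                rtt=$(ssh "$server" ping -q -c 5 "$1" 2>/dev/null | avg_rtt)
                if [ -n "$rtt" ]
                then
                        echo "$server $rtt"
                fi
        done
}

# best_server: read "server rtt" lines on stdin, print the lowest-latency server
best_server() {
        sort -k2,2n | head -n 1 | cut -d' ' -f1
}
```

For example, `measure 203.0.113.1 | best_server` pings an address from the country's published ranges and prints the server that reached it fastest. That server is the candidate for the country's view in BIND.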

Updated config for GeoIP in BIND

I posted previously about using GeoIP in BIND with Debian. Today I altered my config a little so that my Europe region also includes the Middle East and Africa, as these regions are largely connected to the Internet via Europe anyway. This saves them from being directed to an American server, which isn't the closest available.

My new config looks a little something like this:


Server Clustering

I was looking at the costing of my Internet hosting a few weeks ago and decided I’m paying too much.

I have a couple of dedicated servers in a few locations, mostly at iWeb. For the purpose of this blog I'll do the costing on just one of them, which was $104 USD per month.

The main reason I was keeping the hosting at iWeb was for Icecast servers, as I operate the Internet streams for several FM radio stations, which can consume a bit of bandwidth. Last week I was able to locate several Virtual Private Server packages with sufficient bandwidth for my Icecast servers from $3 USD per month. Quite a difference. Even if these VPS providers are overselling or are fly-by-night companies, with good backups and my credit card I should be able to install replacement servers in a matter of hours.

Lowendbox proved to be a useful resource in tracking down VPS packages.

I decided to migrate all web, email and DNS hosting to Australia with Jumba. 90% of these sites target Australians, so the move will improve the performance of these websites for their target audience.

I also decided to get accounts in Los Angeles and The Netherlands for mail and DNS backups. Later I will do some work on clustering web servers, use GeoIP support in BIND to direct visitors to their nearest server, and migrate a few websites across. I've already successfully set up my Icecast servers as a cluster and am in the process of stability testing that system.

So all up to replace one machine costing $104 per month… I ended up buying:

  • 2x VPS with Jumba – 512MB Memory, 10GB Hard Disk, 100GB Network Transfer – $20 AUD per month
  • VPS with Hostitek – 1GB Memory, 80GB Hard Disk, 2000GB Network Transfer – $6 USD per month
  • VPS with VMPort – 256MB Memory, 20GB Hard Disk, 250GB Network Transfer – £3.52 GBP per month

So for around $35 per month… I’ve successfully reduced that hosting cost of $104 per month. Sure I don’t get the same processor, memory, hard disk and bandwidth… but I didn’t need such large quotas anyway.

BIND with GeoIP on Debian

After my last blog I needed to set up Geographic IP support in BIND. Apparently Debian already ships with the patch from Caraytech, so no patching is needed, just configuration.

First off, because we're using views in BIND, we need to remove the inclusion of /etc/bind/named.conf.default-zones from /etc/bind/named.conf (once any view is defined, every zone must be declared inside a view). You can do that with vi or your editor of choice.

Secondly, so we don't have to duplicate zone definitions, I created /etc/bind/named.conf.zones for domains that do not require Geographic IP support, and included it in each view in the config below.

We then need to edit /etc/bind/named.conf.local to have something like this:

view "Australasia" {
        match-clients { country_AU; country_NZ; };
        recursion no;
        include "/etc/bind/named.conf.default-zones";
        include "/etc/bind/named.conf.zones";
        zone "example.com" {
                type master;
                file "/etc/bind/geoip/au.example.com.hosts";
        };
};

view "Europe" {
        match-clients { country_AD; country_AL; country_AM; country_AT;
                country_AZ; country_BA; country_BE; country_BG; country_BY;
                country_CH; country_CZ; country_DE; country_DK; country_EE;
                country_ES; country_FI; country_FR; country_GE; country_GR;
                country_HR; country_HU; country_IE; country_IS; country_IT;
                country_KZ; country_LI; country_LT; country_LU; country_LV;
                country_MC; country_MD; country_ME; country_MK; country_MT;
                country_NL; country_NO; country_PL; country_PT; country_RO;
                country_RS; country_RU; country_SE; country_SI; country_SK;
                country_SM; country_TR; country_UA; country_GB; country_VA; };
        recursion no;
        include "/etc/bind/named.conf.default-zones";
        include "/etc/bind/named.conf.zones";
        zone "example.com" {
                type master;
                file "/etc/bind/geoip/eu.example.com.hosts";
        };
};

view "Default" {
        match-clients { any; };
        recursion no;
        include "/etc/bind/named.conf.default-zones";
        include "/etc/bind/named.conf.zones";
        zone "example.com" {
                type master;
                file "/etc/bind/geoip/us.example.com.hosts";
        };
};

With that all done, we also need to keep in mind that AXFR zone transfers will not work with GeoIP. This is why I've placed the GeoIP zones in /etc/bind/geoip: with some scripting magic, we can use rsync to copy them and reload BIND when the files change. I've omitted this script, as I believe it's my intellectual property.
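The script mentioned above isn't published here. Purely as an independent illustration, and not that script, here is a minimal sketch of the idea: pull /etc/bind/geoip from a master server with rsync, and reload BIND only when rsync reports a change. The master host name is hypothetical. The sketch assumes rsync can reach the master over SSH and that rndc is configured on the receiving server.

```shell
#!/bin/sh
# Hypothetical master and paths; adjust to your own setup.
SRC=${SRC:-ns1.example.com:/etc/bind/geoip/}
DEST=${DEST:-/etc/bind/geoip/}
RELOAD=${RELOAD:-"rndc reload"}

# sync_zones: mirror the GeoIP zone files and reload BIND only if something changed.
sync_zones() {
        # --itemize-changes prints one line per item rsync updates or deletes,
        # so empty output means the local copy was already current.
        changes=$(rsync -a --delete --itemize-changes "$SRC" "$DEST") || return 1
        if [ -n "$changes" ]
        then
                $RELOAD
        fi
}
```

Run from cron every minute on each secondary server, this keeps BIND from being reloaded needlessly when nothing has changed.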

Enjoy.

UPDATE: Provided some further configuration here.