#!/usr/local/bin/python
# encoding: utf-8
"""
servermonitor.py
Version 0.1
http://www.subjectivereality.org
------------------------------------------------------------------------------
Copyright (c) 2008 Ryan Parrish

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
------------------------------------------------------------------------------

PURPOSE:
The problem with most server monitoring systems is that they run from inside
your network. Suppose a major outage happens after office hours and the
monitor server cannot send an alert email because the mail server or the
Internet connection is down. In that case you may never get the notification.
The ideal solution is some kind of out-of-band alerting, such as a paging
service. Very small shops may not have the equipment or funds for that kind of
setup, but an externally facing server going down is still a big deal for
them. This script is meant for that situation.
Its purpose is to use the many free online shell accounts to add external
monitoring to your existing internal system at no cost. It also works well
for home server owners who have one box hosting their domain and want a
simple monitor that tells them when their site, Internet connection, or DNS
has stopped working.

REQUIREMENTS:
The requirements are meant to be absolutely minimal. For full functionality
you need nothing more than a shell account with Python (Python 2.4 also
requires the pysqlite module[1]), some type of cron scheduler, a small amount
of web space (only a few KB) to host the status page, and an email address
that is external to the network you are monitoring.

Most cell phone providers offer an email-to-SMS gateway, which I feel is
optimal for the purpose of this script. Your phone gets an email address such
as 5555551212@txt.yourcellcompany.com, and messages sent to that address are
delivered to your phone via SMS. Almost all free shells provide everything
listed above. You can find an extensive list of free shells at
http://www.red-pill.eu/freeunix.shtml.

[1] Most shell accounts will not install packages at your whim, so if your
shell does not have pysqlite, look into using
http://pypi.python.org/pypi/virtualenv to set up your own local Python
install where you can install the pysqlite module.

SETUP:
    Step 0. Set the variables below to suit your environment.

    Step 1. Initialize the database.
        $./servermonitor.py --initialize-database

    Step 2. Add some services to check.
        $./servermonitor.py --add-service server1 www.example.com 80 ConnectAndVerify
        $./servermonitor.py --add-service server2 jabber.example.com 5222 ConnectCheck
        $./servermonitor.py --add-service server1 www.example.com 443 ConnectAndVerify
        $./servermonitor.py --add-service server3 mail.example.com 25 ConnectAndVerify

    Step 3. Run all the checks.
        $./servermonitor.py --run

    Step 4. Generate the status page.
        $./servermonitor.py --generate-html

USAGE:
Two kinds of tests are available: ConnectCheck and ConnectAndVerify.

ConnectCheck is a very simple check. It makes sure something is listening on
the other end, then closes the connection. It is a simple "hey, are the
lights on?" test.

ConnectAndVerify does a little more to check that whatever is listening on
the tested port is actually working as planned. It is more like "hey, the
lights are on, let's see if anyone is home". When you first add a service
with a check type of ConnectAndVerify, the script connects to the port and
grabs whatever data it returns. Examples are the first line of the banner on
an SMTP or SSH port, or the / page of a web server. The script then generates
an MD5 hex digest of that data and stores it as the "known good" return for
that service. Subsequent tests retrieve the data again, hash it, and compare
the new MD5 with the known good one. If they differ, the script sets the
service's state to VerifyFail and you get an email alerting you to the
change. This can happen, for example, when the service is still listening on
its port but returns error codes.

Obviously ConnectAndVerify only works on ports that return some unsolicited
data. For ports that don't, use ConnectCheck instead.

SAMPLE RUN SCRIPT:
    #!/usr/local/bin/bash
    #script name: run.sh
    PID=$(ps ux | grep servermonitor.py | grep -v grep | awk '{print $2}')
    if [ -n "$PID" ]
    then
        echo "Script already running"
        exit
    fi

    cd /home/username/servermonitor/

    #first make sure we can even reach the Internet
    if ping -c 1 www.google.com
    then
        ./servermonitor.py --run
        RETURN_VALUE=$?
        #if everything ran fine, let's update the status page
        if [ $RETURN_VALUE -eq 0 ]
        then
            ./servermonitor.py --generate-html
        fi
    fi

SAMPLE CRON:
    */5 * * * * /home/username/servermonitor/run.sh > /dev/null
"""

######################################
#This is the stuff you want to change#
######################################

#How long to wait, in seconds, before we assume the port is not listening.
TIMEOUT = 10

#Email address to send status messages to.
EMAIL_TO = "5555551212@txt.att.net"

#Who should the email say it is from? Honestly, you could put pretty much
#anything in here.
EMAIL_FROM = "yourname@shell.net"

#SMTP server to send mail through. Personally I like using the MX server of
#the EMAIL_TO address, so we do not have to rely on the shell account's
#local SMTP server.
SMTPSERVER = "atlsmtp.cingularme.net"

#A descriptive name for this monitor. This is useful if you are running
#multiple copies of this script and want to know which one sent the status
#message. Personally I use the DNS name of the shell account.
MYNAME = "shell.net"

#File name and location of the sqlite database.
DB = "./servermonitor.db"

#File name and location to write the HTML version of the status page to.
OUTPUT_FILE = "/home/username/public_html/servermonitor/index.html"

#Commonly used HTTP protocol ports. We need to know these so that a
#ConnectAndVerify check uses urllib2 rather than a raw socket to connect.
http_ports = (80, 443)

#Ports that display a banner upon connection. If you want to do a
#ConnectAndVerify on a service that is not HTTP, its port must be listed here.
banner_ports = (21, 22, 23, 25)

############################################################
#No need to edit anything below, unless you really want to.#
############################################################

import sys

#I'm pretty sure pysqlite is API compatible with the Python 2.5 sqlite3
#module for the few basic functions I am using.
try:
    import sqlite3 as sqlite
except ImportError:
    #probably using python 2.4
    try:
        from pysqlite2 import dbapi2 as sqlite
    except ImportError:
        print "You must have sqlite installed to use this script."
        sys.exit(1)
import os
import socket
import md5
import smtplib
import urllib2
from time import strftime

index_html_template = """<html>
<body>
<h1>Service Checker at """+MYNAME+"""</h1>
<p>Last service check: %s</p>
<p>Page generated: %s</p>
<table border="1">
<tr><th>Name</th><th>Address</th><th>Port</th><th>Status</th><th>Status state since</th></tr>
%s
</table>
</body>
</html>
"""

service_html_template = """<tr><td>%(host)s</td><td>%(address)s</td><td>%(port)s</td><td>%(status)s</td><td>%(timestamp)s</td></tr>
"""

email_template = """
This is the monitor service at """+MYNAME+""".
As of %s service %s has become state %s
"""

services_query = "SELECT service_id, name, address, port, checktype, hash FROM service"
service_states_query = "SELECT state_id, id_service, status, timestamp FROM state WHERE id_service = ? ORDER BY state_id DESC"
insert_state_query = "INSERT INTO state (status, id_service, timestamp) VALUES (?, ?, ?)"


class ConnectCheck:
    '''This check type does a simple socket test of whether the port has a
    listening socket. Returns a two element tuple of (status, data): status
    is "Ok" or "ConnectFail", and data is the first line of the banner for
    ports listed in banner_ports, otherwise None.'''

    def __init__(self, address, port, *args):
        self.address = address
        self.port = port

    def check(self):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(TIMEOUT)
        try:
            s.connect((self.address, self.port))
            if self.port not in banner_ports:
                s.close()
                return ("Ok", None)
            result = ""
            while 1:
                data = s.recv(1024)
                result = result + data
                #we only read the first line of the banner; also stop if the
                #server closes the connection before sending a newline
                if not data or "\n" in data:
                    break
            s.close()
            return ("Ok", result.strip())
        except:
            s.close()
            return ("ConnectFail", None)


class ConnectAndVerify(ConnectCheck):
    def __init__(self, address, port, hash):
        ConnectCheck.__init__(self, address, port)
        self.hash = hash

    def _verifyMD5(self, data):
        '''Compare the known MD5 with the one we generate from the returned
        socket data'''
        new_hash = md5.new(data).hexdigest()
        if new_hash != self.hash:
            return ("VerifyFail", new_hash)
        else:
            return ("Ok", new_hash)

    def check(self):
        if self.port in http_ports:
            try:
                if self.port == 80:
                    h = urllib2.urlopen("http://%s:%s" % (self.address, self.port))
                elif self.port == 443:
                    h = urllib2.urlopen("https://%s:%s" % (self.address, self.port))
                else:
                    #default to plain http
                    h = urllib2.urlopen("http://%s:%s" % (self.address, self.port))
                data = h.read()
            except:
                return ("ConnectFail", None)
            return self._verifyMD5(data)
        else:
            result = ConnectCheck.check(self)
            #ConnectCheck reports success as "Ok"; pass any failure through
            if result[0] != "Ok":
                return result
            return self._verifyMD5(result[1])


check_to_class = {'ConnectCheck': ConnectCheck,
                  'ConnectAndVerify': ConnectAndVerify}


def checkDBExists(database):
    if not os.path.isfile(database):
        print database+" file missing, run this script with --initialize-database"
        sys.exit(1)


def sendNotification(time, service, state):
    '''Sends an email notification to the specified address noting a change
    in the state of a service.'''
    msg = email_template % (time, service, state)
    server = smtplib.SMTP(SMTPSERVER)
    server.sendmail(EMAIL_FROM, EMAIL_TO, msg)
    server.quit()


def buildHTML():
    '''Builds the HTML status page for monitored services'''
    checkDBExists(DB)
    conn = sqlite.connect(DB)
    conn.row_factory = sqlite.Row
    services_text = ""
    c = conn.cursor()
    c.execute("SELECT value FROM stats WHERE stat ='lastrun'")
    lastrun = c.fetchone()[0]
    c.execute(services_query)
    rows = c.fetchall()
    for service in rows:
        c.execute(service_states_query, (service[0],))
        state = c.fetchone()
        services_text = services_text + service_html_template % {
            'status': state['status'],
            'address': service[2],
            'port': service[3],
            'host': service['name'],
            'timestamp': state['timestamp']}
    c.close()
    conn.close()
    html = index_html_template % (lastrun, strftime("%Y-%m-%d %H:%M:%S"), services_text)
    fh = open(OUTPUT_FILE, 'w')
    fh.write(html)
    fh.close()


def createSchema():
    '''Creates the sqlite db schema'''
    if os.path.isfile(DB):
        print "Database "+DB+" already exists.\nPlease delete it and try again."
        sys.exit(1)
    conn = sqlite.connect(DB)
    schema = """
    CREATE TABLE service(
        service_id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        address TEXT NOT NULL,
        port INTEGER NOT NULL,
        checktype TEXT NOT NULL,
        hash TEXT);
    CREATE UNIQUE INDEX unq_service ON service (address, port, checktype);
    CREATE TABLE state(
        state_id INTEGER PRIMARY KEY AUTOINCREMENT,
        id_service INTEGER NOT NULL,
        status TEXT NOT NULL,
        timestamp TEXT NOT NULL);
    CREATE TABLE stats(
        stat TEXT NOT NULL,
        value TEXT);
    INSERT INTO stats (stat, value) VALUES ('lastrun', null);"""
    c = conn.cursor()
    c.executescript(schema)
    c.close()
    conn.close()


def addService(name, address, port, type):
    insert_service_query = """INSERT INTO service (name, address, port, checktype, hash)
        VALUES (?, ?, ?, ?, ?)"""
    checkDBExists(DB)
    conn = sqlite.connect(DB)
    c = conn.cursor()
    if type == "ConnectAndVerify":
        x = ConnectAndVerify(address, port, '')
        result = x.check()
        if result[0] == "ConnectFail":
            print "Service is not up right now; it needs to be up when you add it"
            c.close()
            sys.exit(1)
        c.execute(insert_service_query, (name, address, port, type, result[1]))
        c.execute("SELECT last_insert_rowid()")
        service_id = c.fetchone()[0]
        c.execute(insert_state_query, ("Ok", service_id, strftime("%Y-%m-%d %H:%M:%S")))
        conn.commit()
    elif type == "ConnectCheck":
        x = ConnectCheck(address, port)
        result = x.check()
        if result[0] == "ConnectFail":
            print "Service is not up right now; it needs to be up when you add it"
            c.close()
            sys.exit(1)
        c.execute(insert_service_query, (name, address, port, type, ""))
        c.execute("SELECT last_insert_rowid()")
        service_id = c.fetchone()[0]
        c.execute(insert_state_query, ("Ok", service_id, strftime("%Y-%m-%d %H:%M:%S")))
        conn.commit()
    else:
        print "Unrecognized check type, %s" % type
        sys.exit(1)


def deleteService(service_id):
    '''Deletes a service and all its associated state entries from the
    database, then runs a VACUUM to keep the database neat.'''
    checkDBExists(DB)
    conn = sqlite.connect(DB)
    c = conn.cursor()
    c.execute("DELETE FROM service WHERE service_id = ?", (service_id,))
    c.execute("DELETE FROM state WHERE id_service = ?", (service_id,))
    conn.commit()
    c.execute("VACUUM")
    c.close()


def padOrTruncate(input, length):
    '''Pads input with spaces to length, or truncates it and appends "..."'''
    input = str(input)
    if len(input) <= length:
        return input.ljust(length)
    else:
        return "%s..." % input[0:length-3]


def listServices():
    checkDBExists(DB)
    conn = sqlite.connect(DB)
    conn.row_factory = sqlite.Row
    c = conn.cursor()
    c2 = conn.cursor()
    c.execute(services_query)
    print "-----------+--------------+--------------------------+------+------------------+-------------+--------------------"
    print "Service ID | Name         | Address                  | Port | Check Type       | Last Status | Status State Since"
    print "-----------+--------------+--------------------------+------+------------------+-------------+--------------------"
    for service in c:
        c2.execute(service_states_query, (service['service_id'],))
        state = c2.fetchone()
        print "%s | %s | %s | %s | %s | %s | %s" % (
            padOrTruncate(service['service_id'], 10),
            padOrTruncate(service['name'], 12),
            padOrTruncate(service['address'], 24),
            padOrTruncate(service['port'], 4),
            padOrTruncate(service['checktype'], 16),
            padOrTruncate(state['status'], 11),
            state['timestamp'])
    print "-----------+--------------+--------------------------+------+------------------+-------------+--------------------"
    c.execute("SELECT value FROM stats WHERE stat ='lastrun'")
    lastrun = c.fetchone()[0]
    print "Last run: %s" % lastrun


def main():
    '''This is the main loop that checks all the services, updates the state
    table and sends the alert mail if necessary'''
    checkDBExists(DB)
    conn = sqlite.connect(DB)
    conn.row_factory = sqlite.Row
    c = conn.cursor()
    c2 = conn.cursor()
    c.execute(services_query)
    #fetch every row up front: committing while still iterating over an open
    #SELECT cursor can reset that cursor in some sqlite3 versions
    for row in c.fetchall():
        x = check_to_class[row['checktype']]
        checker = x(row['address'], row['port'], row['hash'])
        result = checker.check()
        c2.execute(service_states_query, (row['service_id'],))
        last_state = c2.fetchone()
        if last_state['status'] != result[0]:
            current_time = strftime("%Y-%m-%d %H:%M:%S")
            sendNotification(current_time, "%s:%s" % (row['address'], row['port']), result[0])
            c2.execute(insert_state_query, (result[0], row['service_id'], current_time))
            conn.commit()
    c.execute("UPDATE stats SET value=? WHERE stat='lastrun'", (strftime("%Y-%m-%d %H:%M:%S"),))
    conn.commit()
    c.close()
    conn.close()


if __name__ == '__main__':
    import optparse
    parser = optparse.OptionParser()
    parser.add_option('--run', '-r', action="store_true", dest="run",
                      help="Do the main service check on all services")
    parser.add_option('--initialize-database', '-i', action="store_true", dest="create",
                      help="Initialize the database")
    parser.add_option('--generate-html', '-g', action="store_true", dest="generate",
                      help="Generate the HTML rendering of the status page")
    parser.add_option('--add-service', '-a', action="store", dest="service",
                      help="""Add a new service to check.
                      --add-service <name> <address> <port> <checktype>
                      <name> = The 'pretty' name to call your service.
                      <address> = The DNS name or IP address of your service.
                      <port> = The port to check.
                      <checktype> = ConnectCheck OR ConnectAndVerify""")
    parser.add_option('--delete-service', '-d', action="store", dest="delete",
                      help="Delete a service and all associated state entries by its service id")
    (options, args) = parser.parse_args()

    if options.create:
        createSchema()
    elif options.generate:
        buildHTML()
    elif options.service is not None:
        if len(args) != 3:
            print "Incorrect number of arguments"
            print "--add-service <name> <address> <port> <checktype>"
            sys.exit(1)
        addService(options.service, args[0], int(args[1]), args[2])
    elif options.run:
        main()
    elif options.delete is not None:
        deleteService(options.delete)
    else:
        listServices()