#!/usr/local/bin/python
# encoding: utf-8
"""
servermonitor.py
Version 0.3
http://www.subjectivereality.org/category/servermonitorpy
------------------------------------------------------------------------------
Copyright (c) 2008 Ryan Parrish
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
------------------------------------------------------------------------------
PURPOSE:
The problem with most server monitoring systems is that they run from inside
your network. If there is a major outage after office hours and the monitoring
server cannot send an alert email (because the mail server or the internet
connection is down), you may never get the notification.
There are ideal solutions to this, such as signing up for a paging service
so your servers have some kind of out-of-band alerting. But very small
shops may not have the extra equipment or funds to install such hardware,
and having an externally facing server go down is still a big deal for them,
so I offer this script. Its purpose is to use the many free online shell
accounts to add external monitoring to your existing internal system at no
cost. It would also work great for all the home server folks out there who
have one box hosting their domain and want a simple monitor to know if their
site, internet connection, or DNS has stopped working.
REQUIREMENTS:
The requirements are meant to be absolutely minimal. For full
functionality, nothing more is required than a shell account with
Python (2.4 requires the pysqlite module[1]), some type of cron scheduler,
a small amount of web space (only a few KB) to host the status
page, and finally an email address that is external to the
network you are monitoring. Most cell phone providers offer an email->SMS
gateway service, which I feel is optimal for the purpose of this script:
your phone has an email address such as
5555551212@txt.yourcellcompany.com, and messages sent to that address are
delivered to your cell phone via SMS.
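As a rough illustration of what a message for such a gateway could look
like, here is a minimal sketch. The helper name build_alert and the subject
line format are made up for this example and are not part of the script,
which sends only its plain template text. The addresses are the placeholder
values used elsewhere in this file.

```python
from email.mime.text import MIMEText

# Hypothetical helper, not part of servermonitor.py.
def build_alert(sender, recipient, monitor_name, when, service, state):
    # Keep the body short, since a single SMS holds roughly 160 characters.
    body = "%s: as of %s service %s has become state %s" % (
        monitor_name, when, service, state)
    msg = MIMEText(body)
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "%s is %s" % (service, state)
    return msg

alert = build_alert("yourname@myshell.net",
                    "5555551212@txt.yourcellcompany.com",
                    "myshell.net", "2008-01-01 12:00:00",
                    "www.example.com:80", "ConnectFail")
# alert.as_string() is the text that would be handed to smtplib's sendmail().
```

Whether a given gateway shows the subject line in the SMS varies by
provider, so treat the subject as optional.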
Almost all free shells provide all of the above needed items and
you can find an extensive list of free shells at
http://www.red-pill.eu/freeunix.shtml.
[1] I know that most shell accounts will not just install packages at your
whim, so if your shell does not have pysqlite, look into using
http://pypi.python.org/pypi/virtualenv to set up your own local install of
Python where you can install the pysqlite module.
SETUP:
Step 0. Set the variables below to suit your environment.
Step 1. Initialize the database.
$./servermonitor.py --initialize-database
Step 2. Add some services to check.
$./servermonitor.py --add-service server1 www.example.com 80 ConnectAndVerify
$./servermonitor.py --add-service server2 jabber.example.com 5222 ConnectCheck
$./servermonitor.py --add-service server1 www.example.com 443 ConnectAndVerify
$./servermonitor.py --add-service server3 mail.example.com 25 ConnectAndVerify
Step 3. Run all the checks.
$./servermonitor.py --run
Step 4. Generate the status page.
$./servermonitor.py --generate-html
USAGE:
I will quickly cover the two kinds of tests that are available,
ConnectCheck and ConnectAndVerify. ConnectCheck is a very simple check
that just makes sure there is a socket listening on the other end and then
closes the connection. It's just a simple "hey, are the lights on?" test.
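The idea behind ConnectCheck can be sketched in a few lines. This is only
an illustration under stated assumptions: the function name port_is_open
and its boolean return value are invented for this example (the script's
own class returns a status tuple instead), and it needs Python 2.5 or newer.

```python
import socket

# Hypothetical helper, not part of servermonitor.py.
def port_is_open(address, port, timeout=10):
    # Return True if a TCP connection to (address, port) succeeds in time.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((address, port))
        return True
    except (socket.error, socket.timeout):
        # Refused, unreachable, DNS failure, or timed out.
        return False
    finally:
        s.close()

# Example call (performs a real network connection):
#   port_is_open("www.example.com", 80)
```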
The next test, ConnectAndVerify, does a little more to check that whatever
is listening on the tested port is actually working as planned. It's like
"hey, the lights are on, let's see if anyone is home". The first time you
add a service with a check type of ConnectAndVerify, the script goes out
to the port and grabs whatever data the port returns, such as a banner on an
SMTP port, the / page on a web server, or even a banner on an SSH
port. With that data in hand the script generates an MD5 hex digest of the
data and stores it as the 'known good' return for that particular service.
Subsequent tests of that service also retrieve the data and generate
an MD5 hex digest of it, but this time the script compares the stored
'known good' MD5 with the MD5 of the data just returned. If they differ,
meaning maybe the service returned some error code (while still listening
on its port), the script sets the state of the service to VerifyFail
and you get an email alerting you to the change. If the returned data
changes legitimately, use --update-hash to store a new 'known good' hash.
ConnectAndVerify only works on HTTP ports (see http_ports below) and on ports
that return some unsolicited data (see banner_ports below). For ports that
don't return data you need to fall back to ConnectCheck.
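The known good comparison described above boils down to something like the
following sketch. The function name verify and the sample banners are made
up for illustration, and hashlib (Python 2.5 and newer) is used here in
place of the older md5 module the script itself imports.

```python
import hashlib

# Hypothetical helper, not part of servermonitor.py.
def verify(data, known_hash):
    # Return (status, new_hash) using the script's Ok / VerifyFail states.
    new_hash = hashlib.md5(data).hexdigest()
    if new_hash == known_hash:
        return ("Ok", new_hash)
    return ("VerifyFail", new_hash)

# First contact: store the digest of the banner as the known good value.
banner = b"220 mail.example.com ESMTP"
known_good = hashlib.md5(banner).hexdigest()

# Later checks: the same banner passes, a changed one fails.
print(verify(banner, known_good)[0])                        # prints Ok
print(verify(b"421 Service not available", known_good)[0])  # prints VerifyFail
```

Note that any change in the returned bytes, even a harmless one such as a
date in a page footer, produces a different digest, so this kind of check
suits services whose response is stable.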
SAMPLE RUN SCRIPT:
#!/usr/local/bin/bash
#script name: run.sh
PID=$(ps ux | grep servermonitor.py | grep -v grep | awk '{print $2}')
if [ -n "$PID" ]
then
    echo "Script already running"
    exit
fi
cd /home/username/servermonitor/
#we want to make sure we can even get out to the internet
if ping -c 1 www.google.com
then
    ./servermonitor.py --run
    RETURN_VALUE=$?
    #if everything ran fine, let's update the status page
    if [ $RETURN_VALUE == 0 ]
    then
        ./servermonitor.py --generate-html
    fi
fi
SAMPLE CRON:
*/5 * * * * /home/username/servermonitor/run.sh > /dev/null
"""
######################################
#This is the stuff you want to change#
######################################
#How long to wait till we assume the port is not listening, in seconds.
TIMEOUT = 10
#Email address to send status messages to.
EMAIL_TO = "5555551212@txt.att.net"
#Whom should we say the email is from? Honestly, you could put pretty much
#anything in here.
EMAIL_FROM = "yourname@myshell.net"
#SMTP server to send mail through. Personally I like the idea of using the
#EMAIL_TO address's MX server, as this way we do not have to rely on the shell
#account's local SMTP server.
SMTPSERVER = "atlsmtp.cingularme.net"
#A descriptive name for this monitor instance. Useful if you
#were running multiple copies of this script and wanted to know
#which one sent the status message. Personally I use the DNS name of the
#shell account.
MYNAME = "myshell.net"
#File name and location to use for the sqlite database.
DB = "/home/yourhome/servermonitor/servermonitor.db"
#Directory to write the HTML version of the status page to (keep the
#trailing slash, file names are appended to it directly).
OUTPUT_DIRECTORY = "/home/yourhome/public_html/servermonitor/"
#Commonly used HTTP ports. We need to know these so we use
#urllib2 rather than socket to connect to them when we do a ConnectAndVerify.
http_ports = (80, 443)
#These are ports that display a banner upon connection. If you want to do a
#ConnectAndVerify on a service that is not HTTP, you need to have the port
#listed here.
banner_ports = (21, 22, 23, 25)
############################################################
#No need to edit anything below, unless you really want to.#
############################################################
import sys
#I'm pretty sure pysqlite is API compatible with the Python 2.5 sqlite3 module
#for the few basic functions I am using.
try:
    import sqlite3 as sqlite
except ImportError:
    #probably using python 2.4
    try:
        from pysqlite2 import dbapi2 as sqlite
    except ImportError:
        print "You must have sqlite installed to use this script."
        sys.exit(1)
import os
import socket
import md5
import smtplib
import urllib2
from time import strftime
index_html_template = """
Service Checker at """+MYNAME+"""
Last service check: %s
Page generated: %s
| Name | Address | Port | Status | Status state since |
%s
Generated by servermonitor.py
"""
service_html_template = """
| %(host)s | %(address)s | %(port)s | %(status)s | %(timestamp)s |
"""
service_history_template_lead = """
| |
"""
service_history_template_following = """
| %(status)s | %(timestamp)s |
"""
email_template = """This is the monitor service at """+MYNAME+""". \r\n As of %s service %s has become state %s
"""
services_query = "SELECT service_id, name, address, port, checktype, hash FROM service"
service_states_query = "SELECT state_id, id_service, status, timestamp FROM state WHERE id_service = ? ORDER BY state_id DESC"
insert_state_query = "INSERT INTO state (status, id_service, timestamp) VALUES (?, ?, ?)"
class ConnectCheck:
    '''This check type does a simple socket test of whether the port has a
    listening socket. check() returns a two element tuple of
    (status, banner), where banner is None for ports not in banner_ports.'''
    def __init__(self, address, port, *args):
        self.address = address
        self.port = port

    def check(self):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(TIMEOUT)
        #nested try blocks keep this compatible with python 2.4
        try:
            try:
                s.connect((self.address, self.port))
                if self.port not in banner_ports:
                    return ("Ok", None)
                result = ""
                while 1:
                    data = s.recv(1024)
                    result = result + data
                    #we are going to only read the first line of the banner;
                    #an empty read means the peer closed the connection
                    if not data or "\n" in data:
                        break
                return ("Ok", result.strip())
            except Exception:
                return ("ConnectFail", None)
        finally:
            #always release the socket, whichever path we took
            s.close()
class ConnectAndVerify(ConnectCheck):
    '''This check type retrieves data from the service and compares its MD5
    hex digest with the stored known good hash.'''
    def __init__(self, address, port, hash):
        ConnectCheck.__init__(self, address, port)
        self.hash = hash

    def _vefifyMD5(self, data):
        '''Compare the known MD5 with the one we generate from the returned
        socket data'''
        new_hash = md5.new(data).hexdigest()
        if new_hash != self.hash:
            return ("VerifyFail", new_hash)
        else:
            return ("Ok", new_hash)

    def check(self):
        if self.port in http_ports:
            try:
                if self.port == 443:
                    h = urllib2.urlopen("https://%s:%s" % (self.address,
                                                           self.port))
                else:
                    #default to plain http
                    h = urllib2.urlopen("http://%s:%s" % (self.address,
                                                          self.port))
                data = h.read()
            except Exception:
                return ("ConnectFail", None)
            return self._vefifyMD5(data)
        else:
            result = ConnectCheck.check(self)
            #ConnectCheck reports success as "Ok"; anything else is a failure
            if result[0] != "Ok":
                return result
            #ports outside banner_ports return no data; hash an empty string
            return self._vefifyMD5(result[1] or "")
#TODO:I'm pretty sure there is a magic var i can use to accomplish the same mapping
#I'm doing here, need to look into it.
check_to_class = {'ConnectCheck': ConnectCheck,
'ConnectAndVerify': ConnectAndVerify}
def checkDBExists(database):
    if not os.path.isfile(database):
        print database+" file missing, run this script with the --initialize-database option"
        sys.exit(1)
def sendNotification(time, service, state):
    '''Sends an email notification to the specified address noting a change
    in the state of a service.'''
    msg = email_template % (time, service, state)
    server = smtplib.SMTP(SMTPSERVER)
    server.sendmail(EMAIL_FROM, EMAIL_TO, msg)
    server.quit()
def buildHTML():
    '''Builds the HTML status page for monitored services'''
    timestamp_link_template = "%(timestamp)s"
    checkDBExists(DB)
    conn = sqlite.connect(DB)
    conn.row_factory = sqlite.Row
    services_text = ""
    c = conn.cursor()
    c.execute("SELECT value FROM stats WHERE stat ='lastrun'")
    lastrun = c.fetchone()[0]
    c.execute(services_query)
    rows = c.fetchall()
    for service in rows:
        c.execute(service_states_query, (service[0],))
        states = c.fetchall()
        if len(states) >= 2:
            timestamp = timestamp_link_template % {'id': service[0],
                                                   'timestamp': states[0]['timestamp']}
        else:
            timestamp = states[0]['timestamp']
        services_text = services_text + service_html_template % {
            'id': service[0],
            'status': states[0]['status'],
            'address': service[2],
            'port': service[3],
            'host': service['name'],
            'timestamp': timestamp}
        if len(states) >= 2:
            history_data = ""
            for history_item in states[1:]:
                history_data = history_data + service_history_template_following % {
                    'status': history_item[2],
                    'timestamp': history_item[3]}
            history_file = open(OUTPUT_DIRECTORY + "history_%s.html" % service[0], "w")
            history_file.write(service_history_template_lead % {
                'rowcount': len(states) - 1,
                'id': service[0],
                'history_data': history_data})
            history_file.close()
    c.close()
    conn.close()
    html = index_html_template % (lastrun,
                                  strftime("%Y-%m-%d %H:%M:%S"),
                                  services_text)
    fh = open(OUTPUT_DIRECTORY + "index.html", 'w')
    fh.write(html)
    fh.close()
def createSchema():
    '''Creates the sqlite db schema'''
    if os.path.isfile(DB):
        print "Database "+DB+" already exists.\nPlease delete it and try again."
        sys.exit(1)
    conn = sqlite.connect(DB)
    schema = """
    CREATE TABLE service(
        service_id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        address TEXT NOT NULL,
        port INTEGER NOT NULL,
        checktype TEXT NOT NULL,
        hash TEXT);
    CREATE UNIQUE INDEX unq_service ON service (address, port, checktype);
    CREATE TABLE state(
        state_id INTEGER PRIMARY KEY AUTOINCREMENT,
        id_service INTEGER NOT NULL,
        status TEXT NOT NULL,
        timestamp TEXT NOT NULL);
    CREATE TABLE stats(
        stat TEXT NOT NULL,
        value TEXT);
    INSERT INTO stats (stat, value) values ('lastrun', null);"""
    c = conn.cursor()
    c.executescript(schema)
    c.close()
def addService(name, address, port, type):
    '''Adds a service to the database after confirming it is reachable, and
    records its initial "Ok" state.'''
    insert_service_query = """INSERT INTO
                              service (name, address, port, checktype, hash)
                              VALUES
                              (?, ?, ?, ?, ?)"""
    checkDBExists(DB)
    conn = sqlite.connect(DB)
    c = conn.cursor()
    if type == "ConnectAndVerify":
        x = ConnectAndVerify(address, port, '')
        result = x.check()
        if result[0] == "ConnectFail":
            print "Service not up right now, it needs to be up when you add it"
            c.close()
            sys.exit(1)
        #result[1] is the hash of the data just returned, our known good value
        c.execute(insert_service_query, (name, address, port, type, result[1]))
        c.execute("SELECT last_insert_rowid()")
        service_id = c.fetchone()[0]
        c.execute(insert_state_query, ("Ok", service_id,
                                       strftime("%Y-%m-%d %H:%M:%S")))
        conn.commit()
    elif type == "ConnectCheck":
        x = ConnectCheck(address, port)
        result = x.check()
        if result[0] == "ConnectFail":
            print "Service not up right now, it needs to be up when you add it"
            c.close()
            sys.exit(1)
        c.execute(insert_service_query, (name, address, port, type, ""))
        c.execute("SELECT last_insert_rowid()")
        service_id = c.fetchone()[0]
        c.execute(insert_state_query, ("Ok", service_id,
                                       strftime("%Y-%m-%d %H:%M:%S")))
        conn.commit()
    else:
        print "Unrecognized check type, %s" % type
        sys.exit(1)
def deleteService(service_id):
    '''Deletes a service and all associated state entries from the database,
    then follows it up with a VACUUM to keep the database neat.'''
    checkDBExists(DB)
    conn = sqlite.connect(DB)
    c = conn.cursor()
    c.execute("DELETE FROM service WHERE service_id = ?", (service_id,))
    c.execute("DELETE FROM state WHERE id_service = ?", (service_id,))
    conn.commit()
    c.execute("VACUUM")
    c.close()
def updateHash(service_id):
    '''Updates the hash of a ConnectAndVerify service'''
    checkDBExists(DB)
    conn = sqlite.connect(DB)
    conn.row_factory = sqlite.Row
    c = conn.cursor()
    c.execute("SELECT * FROM service WHERE service_id = ?", (service_id,))
    service = c.fetchone()
    x = ConnectAndVerify(service['address'], service['port'], '')
    result = x.check()
    if result[0] == "ConnectFail":
        print "Service not up right now, it needs to be up when you update it"
        c.close()
        sys.exit(1)
    c.execute("UPDATE service SET hash = ? WHERE service_id = ?", (result[1],
                                                                    service['service_id']))
    c.execute(insert_state_query, ("Ok", service['service_id'],
                                   strftime("%Y-%m-%d %H:%M:%S")))
    conn.commit()
def padOrTurncate(input, length):
    input = str(input)
    if len(input) <= length:
        return input.ljust(length)
    else:
        return "%s..." % input[0:length-3]
def listServices():
    checkDBExists(DB)
    conn = sqlite.connect(DB)
    conn.row_factory = sqlite.Row
    c = conn.cursor()
    c2 = conn.cursor()
    c.execute(services_query)
    print "-----------+--------------+--------------------------+------+------------------+-------------+--------------------"
    print "Service ID | Name | Address | Port | Check Type | Last Status | Status State Since"
    print "-----------+--------------+--------------------------+------+------------------+-------------+--------------------"
    for service in c:
        c2.execute(service_states_query, (service['service_id'],))
        state = c2.fetchone()
        print "%s | %s | %s | %s | %s | %s | %s" % (padOrTurncate(service['service_id'], 10),
                                                    padOrTurncate(service['name'], 12),
                                                    padOrTurncate(service['address'], 24),
                                                    padOrTurncate(service['port'], 4),
                                                    padOrTurncate(service['checktype'], 16),
                                                    padOrTurncate(state['status'], 11),
                                                    state['timestamp'])
    print "-----------+--------------+--------------------------+------+------------------+-------------+--------------------"
    c.execute("SELECT value FROM stats WHERE stat ='lastrun'")
    lastrun = c.fetchone()[0]
    print "Last run: %s" % lastrun
def main():
    '''This is the main loop that checks all the services and updates the
    state table and sends the alert mail if necessary'''
    checkDBExists(DB)
    conn = sqlite.connect(DB)
    conn.row_factory = sqlite.Row
    c = conn.cursor()
    c2 = conn.cursor()
    c.execute(services_query)
    for row in c:
        x = check_to_class[row['checktype']]
        checker = x(row['address'], row['port'], row['hash'])
        result = checker.check()
        c2.execute(service_states_query, (row['service_id'],))
        last_state = c2.fetchone()
        if last_state['status'] != result[0]:
            current_time = strftime("%Y-%m-%d %H:%M:%S")
            sendNotification(current_time,
                             "%s:%s" % (row['address'], row['port']),
                             result[0])
            c2.execute(insert_state_query, (result[0], row['service_id'],
                                            strftime("%Y-%m-%d %H:%M:%S")))
            #conn.commit()
    c.execute("UPDATE stats SET value=? WHERE stat='lastrun'", (strftime("%Y-%m-%d %H:%M:%S"),))
    c.close()
    c2.close()
    conn.commit()
    conn.close()
if __name__ == '__main__':
    import optparse
    parser = optparse.OptionParser()
    parser.add_option('--run', '-r', action="store_true", dest="run",
                      help="Do the main service check on all services")
    parser.add_option('--initialize-database', '-i', action="store_true",
                      dest="create", help="Initialize the database")
    parser.add_option('--generate-html', '-g', action="store_true",
                      dest="generate",
                      help="Generate the HTML rendering of the status page")
    parser.add_option('--add-service', '-a', action="store", dest="service",
                      help = """Add a new service to check. \n
                      --add-service <name> <address> <port> <checktype>
                      <name> = The 'pretty' name to call your service. \n
                      <address> = The DNS name or IP address of your service \n
                      <port> = The port to check \n
                      <checktype> = ConnectCheck OR ConnectAndVerify""")
    parser.add_option('--delete-service', '-d', action="store",
                      dest="delete",
                      help="Delete a service and all associated state entries by its service id")
    parser.add_option('--update-hash', '-u', action='store',
                      dest="update",
                      help="Update the hash of a ConnectAndVerify service when the returned data has changed")
    (options, args) = parser.parse_args()
    if options.create:
        createSchema()
    elif options.generate:
        buildHTML()
    elif options.service is not None:
        #the service name is the option value; the rest are positional args
        if len(args) != 3:
            print "Incorrect number of arguments"
            print "--add-service <name> <address> <port> <checktype>"
            sys.exit(1)
        addService(options.service, args[0], int(args[1]), args[2])
    elif options.run:
        main()
    elif options.delete is not None:
        deleteService(options.delete)
    elif options.update is not None:
        updateHash(options.update)
    else:
        #with no options given, list the configured services
        listServices()