While writing or modifying Check_MK checks, it quickly becomes obvious how easy it is to break things. Since the checks you write aren't readily executable outside of Check_MK either, the repeated cycle of making a change, precompiling your host checks, manually running checks for some host, then figuring out what went wrong can become tedious.
Being a programmer at heart, and with Check_MK checks being Python code, I decided to attack the problem the same way I would if doing regular software development: write tests!
This can be done either before or after making the required changes, depending on how you feel about test-driven development; in either case you'll need to structure the check code in a certain way though.
To begin with, import the unittest module (1), which is part of standard Python:
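As a rough sketch, the top of such a check file might look like the following. The check name `foo_temp`, the two-column agent output and the thresholds are all hypothetical, and the function signatures assume the classic Check_MK convention of `inventory_*(info)` and `check_*(item, params, info)`:

```python
import unittest  # (1) part of the standard library, nothing to install

# Hypothetical default thresholds (warn, crit) in degrees Celsius.
foo_temp_default_levels = (70, 80)


def inventory_foo_temp(info):
    # One service per agent line; the first column is the sensor name.
    return [(line[0], foo_temp_default_levels) for line in info]


def check_foo_temp(item, params, info):
    # Returns (state, text) where state is the Nagios code:
    # 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    warn, crit = params
    for line in info:
        if line[0] != item:
            continue
        temp = int(line[1])
        if temp >= crit:
            return (2, "CRITICAL - %d C" % temp)
        if temp >= warn:
            return (1, "WARNING - %d C" % temp)
        return (0, "OK - %d C" % temp)
    return (3, "UNKNOWN - sensor %s not found" % item)
```

Keeping the inventory and check functions free of side effects like this is what makes them easy to call from a test later on.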
Now add tests. These follow regular unittest convention, so you need to define a class that inherits unittest.TestCase (1). Next, we define a helper function that splits up our fake agent output for processing by the check function (2), and then write one or more test functions for whatever it is we want to test (3):
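A minimal sketch of such a test class could look like this. The numbered comments mirror the callouts in the text, but the check function `check_foo_temp`, its thresholds and the fake agent output are hypothetical stand-ins, and the helper is written as a method here purely as one possible choice:

```python
import unittest


def check_foo_temp(item, params, info):
    # Hypothetical check under test: compares a temperature to (warn, crit).
    warn, crit = params
    for line in info:
        if line[0] == item:
            temp = int(line[1])
            if temp >= crit:
                return (2, "CRITICAL - %d C" % temp)
            if temp >= warn:
                return (1, "WARNING - %d C" % temp)
            return (0, "OK - %d C" % temp)
    return (3, "UNKNOWN - sensor %s not found" % item)


class TestFooTemp(unittest.TestCase):  # (1)

    def parse(self, agent_output):  # (2)
        # Split fake agent output into one list of tokens per line,
        # which is the shape the check function expects as "info".
        return [line.split() for line in agent_output.splitlines()]

    def test_ok_below_warn(self):  # (3)
        info = self.parse("cpu 45\nboard 38")
        state, text = check_foo_temp("cpu", (70, 80), info)
        self.assertEqual(state, 0)

    def test_critical_at_threshold(self):
        info = self.parse("cpu 80")
        state, text = check_foo_temp("cpu", (70, 80), info)
        self.assertEqual(state, 2)
```

Each test feeds a small piece of fake agent output through the helper and compares the returned state code with the expected one.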
Test function names must always start with the letters "test". They will in most cases simply need to provide some input, call the inventory and/or check functions, and make sure that the results match expectations. Have a look at http://docs.python.org/2/library/unittest.html for more information on what you can do with unittest.
The main script is where things happen.
A regular check simply sets up check_info and related settings immediately after defining any required functions:
We will do that inside a try block instead. If we get a NameError, check_info doesn't exist in our environment and we can reasonably assume that our check was called outside of Check_MK - so just execute the tests instead:
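The pattern might be sketched as follows. The check name, the functions and the `check_info` keys shown are assumptions based on the classic check API, not a copy of any particular check. Calling `unittest.main()` in the except branch is the usual one-liner; an explicit runner is used here only so that the interpreter does not exit afterwards:

```python
import unittest


def inventory_foo_temp(info):
    # Hypothetical inventory: one service per agent line.
    return [(line[0], (70, 80)) for line in info]


def check_foo_temp(item, params, info):
    # Hypothetical check, kept trivial for this sketch.
    return (0, "OK - sensor %s present" % item)


class TestFooTemp(unittest.TestCase):
    def test_inventory_finds_sensor(self):
        self.assertEqual(inventory_foo_temp([["cpu", "45"]]),
                         [("cpu", (70, 80))])


try:
    # Inside Check_MK, check_info already exists, so this registers the check.
    check_info["foo_temp"] = {
        "check_function": check_foo_temp,
        "inventory_function": inventory_foo_temp,
        "service_description": "Temperature %s",
    }
except NameError:
    # check_info is undefined: we were started outside Check_MK,
    # so run the tests instead of registering anything.
    suite = unittest.TestLoader().loadTestsFromTestCase(TestFooTemp)
    result = unittest.TextTestRunner(verbosity=2).run(suite)
```

With this in place, running the check file directly with the Python interpreter executes the tests, while Check_MK loads it as an ordinary check.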
Now we're ready to run our tests. Let's say we've made a mistake when checking our critical threshold; we could get the following output:
The problem will appear in the summary (1), with a detailed description of the differences between expected and actual results below (2). Apparently, though the check result should be CRITICAL, the numerical code I'm returning is 1, which means WARNING to Nagios. This must obviously be fixed!
After fixing the issue above, so that the check function returns the correct numerical code for our CRITICAL status, we run the tests again. If everything looks good this time, we get the following output instead:
Looking good! We should be able to put this check into production use without worrying too much about breaking things.
Writing tests for your Check_MK checks will give you more confidence in deploying changes to your monitoring environment, and should help in preventing problems both for new checks, and when making changes to old checks. You don't need to do it all at once of course - when about to make a change, just start with writing tests for the CURRENT expected functionality of the check, making sure the tests pass. THEN you can make your changes, and if you break things, it should become immediately obvious.
Bonus points for writing tests for the new functionality BEFORE you implement it, so you can watch your new tests break first, then go OK as you start to get things right. See http://en.wikipedia.org/wiki/Test-driven_development for more information.
Obviously, mistakes can still be made, and your checks are only as good as your tests in that respect - but if you have actual tests, at least a mistake in code under test coverage has to be made in two places, in the check and in its test, for you to miss it.