=============================================
Xymon config/installation/manipulation notes:
=============================================

Lessons learned:
================

* xymond listens on port 1984 - useful for firewall restrictions.

* Acknowledging an alert from the CLI: ::

    xymon 127.0.0.1 'hobbitdack ${alert_id} ${time_in_minutes} ${alert_msg}'

* To segregate alerts by filesystem:

  * In analysis.cfg, ensure appropriate filesystems (and other alerts) are
    grouped: ::

      HOST=%client[1-3]
          DISK /opt/app GROUP=mw 90 95
          DISK * GROUP=infra 90 95
      HOST=client4
          DISK /opt/app GROUP=dol 90 95
          DISK * GROUP=infra 90 95

  * In alerts.cfg, use the GROUP name as the *host*: ::

      GROUP=mw
          MAIL $Middleware
      GROUP=dol
          MAIL $Dkoleary
      GROUP=infra
          MAIL $Mpiunix

* To disable a test for a period of time: ::

    xymon 127.0.0.1 'disable ${host}.[${test}|*] ${minutes} ${free_text}'

  Can set ${minutes} to -1 to disable it until it comes back good again.

* To ID the alert_id of a test - in fact, to obtain quite a bit of info
  regarding a test - ``xymon localhost 'xymondlog ${host}.${test}'`` displays
  the test status. See the xymon man page, xymondlog section, for details.
  Note: you do NOT have to be root to run it. ::

    $ xymon localhost 'xymondlog client4.disk'
    client4|disk|red||1412444137|1412448972|1412450772|0|0|192.168.122.25|1578903790|||Y|
    red Sat Oct 4 13:56:11 CDT 2014 - Filesystems NOT ok

    &red /opt/app (100% used) has reached the PANIC level (95%)

    Filesystem             1024-blocks      Used  Available Capacity Mounted on
    /dev/mapper/vg00-root      1032088    370652     609008      38% /
    /dev/vda1                   495844     67751     402493      15% /boot
    /dev/mapper/vg00-opt       1032088     34060     945600       4% /opt
    /dev/mapper/vg00-tmp       2064208     68616    1890736       4% /tmp
    /dev/mapper/vg00-usr       4128448   1684704    2234032      43% /usr
    /dev/mapper/vg00-var       2064208    439152    1520200      23% /var
    /dev/mapper/vg00-app       2064208   2042292          0     100% /opt/app

* To display the alert id, parse the above output: ::

    $ xymon localhost 'xymondlog client4.disk' | head -1 | \
        awk -F\| '{print $11}'
    1578903790

* To display the results of a test across the env: ::

    # xymon localhost 'xymondboard test=lntp'
    client1|lntp|green||1412468328|1412478507|1412480307|0|0|127.0.0.1||green Sat Oct 4 22:08:27 CDT 2014
    client2|lntp|green||1412468328|1412478507|1412480307|0|0|127.0.0.1||green Sat Oct 4 22:08:27 CDT 2014
    client3|lntp|green||1412472773|1412478507|1412480307|0|0|127.0.0.1||green Sat Oct 4 22:08:27 CDT 2014
    client4|lntp|green||1412468328|1412478507|1412480307|0|0|127.0.0.1||green Sat Oct 4 22:08:27 CDT 2014
    ldapsvr|lntp|green||1412468348|1412478507|1412480307|0|0|127.0.0.1||green Sat Oct 4 22:08:27 CDT 2014
    syslog|lntp|green||1412468348|1412478507|1412480307|0|0|127.0.0.1||green Sat Oct 4 22:08:27 CDT 2014
    xymon|lntp|green||1412470655|1412478507|1412480307|0|0|127.0.0.1||green Sat Oct 4 22:08:27 CDT 2014

  See the xymon man page for details.

Items to learn:
===============

* How to set up different scripts. For instance, for ntp testing.

Notes:
======

08/17/14:

* Got xymon and four clients running. Downloaded rpms for the same version
  we're using at work from http://terabithia.org/rpms/xymon/. Server and
  clients are installed but not yet configured or running.
* Still need to:

  * Edit /etc/xymon-client/xymonclient.cfg, updating XYMONSERVERS (see the
    sketch after this list).
  * Figure out the server configuration.
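That first edit is a one-liner. A minimal sketch of it - the server address
here is made up, so substitute your own: ::

    # /etc/xymon-client/xymonclient.cfg
    # Point the client at the xymon server (hypothetical address);
    # XYMONSERVERS takes a space-separated list if there's more than one.
    XYMONSERVERS="192.168.122.10"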
09/01/14:

* Scratch that and reverse. Got xymon installed on a new vm, called xymon.
* Got xymon-client running on six other clients.
* xymon.conf for http access put in place automatically. That's nice.
* xymond listens on port 1984 - useful for firewall restrictions.
* **Got** my ghost clients. Nice!
* Read through the hosts.cfg man page. Nothing too out of the ordinary.
* One interesting bit, though, was the .default. tag, used for identifying
  default tests on otherwise unidentified hosts. That's how you get the new
  hosts on the ghosts page.
* OK: got my two groups, client and infra, got clients all green, and got one
  host in infra red.
* Next goals:

  * ack alerts
  * rewrite ntp reporting

09/02/14:

* Read through alerts.cfg. I think I found out, at least initially, how to
  configure disk alerts to go to other people. Specific lines: ::

    For some tests - e.g. "procs" or "msgs" - the right group of people to
    alert in case of a failure may be different, depending on which of the
    client rules actually detected a problem. E.g. if you have PROCS rules
    for a host checking both "httpd" and "sshd" processes, then the Web
    admins should handle httpd-failures, whereas "sshd" failures are handled
    by the Unix admins.

    To handle this, all rules can have a "GROUP=groupname" setting. When a
    rule with this setting triggers a yellow or red status, the groupname is
    passed on to the Xymon alerts module, so you can use it in the alert
    rule definitions in alerts.cfg(5) to direct alerts to the correct group
    of people.

  Need to experiment a bit with that one.

09/05/14:

* Files:

  * hosts.cfg: IDs the hosts to monitor and the tests to run on them.
  * analysis.cfg: IDs specific parameters for each host:

    * memphys
    * memswap
    * memact
    * load
    * up
    * disk

  * alerts.cfg: IDs who gets alerted for what.

* Updated analysis.cfg and alerts.cfg to direct emails for specific
  filesystems to specific groups. The trick is as follows:

  * analysis.cfg: ::

      HOST=%client[1-3]
          DISK /opt/app GROUP=mw 90 95
          DISK * GROUP=infra 90 95
      HOST=client4
          DISK /opt/app GROUP=dol 90 95
          DISK * GROUP=infra 90 95
      HOST=%xymon|ldapsvr|syslog
          DISK * GROUP=infra 90 95

  * alerts.cfg: ::

      GROUP=mw
          MAIL $Middleware
      GROUP=dol
          MAIL $Dkoleary
      GROUP=infra
          MAIL $Mpiunix

* Didn't get duplicate alerts, though. When client[14] were already alerting
  due to disk issues, the alert didn't go out for /tmp. That may be expected.
  Will have to check on that w/Justin at some point.

09/06/14:

Remaining goals:

* How to ID the alert number if it's not emailed out.
  Answer: /var/lib/xymon/histlogs/${host}/${test}: Nope; not it.
* How to script an alert on a client. (ntp)
* How to send alerts to scripts (for further redirection to OVO)

Well, didn't find out how to acknowledge a specific alert, but I did find out
how to disable the damned thing for a bit. That, at least, makes it go away
for the duration. I disabled caauth until it comes live again. At work, I
disabled walvdevwapp062's memory until 0800 Monday morning, and I disabled
nap-lvad-075's memory until it goes green again. Damn thing's been yellow for
pushing 20 days now...

Still, remaining goals:

* How to script an alert on a client. (ntp)
* How to send alerts to scripts (for further redirection to OVO) - see the
  sketch after this list.
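On that second goal: alerts.cfg has a SCRIPT recipient that can sit alongside
MAIL, which looks like the hook for an OVO bridge. A hedged sketch - the
script path and recipient name are invented, and the environment variables
are from my reading of the alerts.cfg(5) man page, so verify before relying
on them: ::

    # alerts.cfg: hand the alert to a script instead of mailing it.
    # Xymon runs the script with the alert described in environment
    # variables (e.g. BBHOSTNAME, BBSVCNAME, and BBALPHAMSG for the
    # full alert text).
    HOST=*
        SCRIPT /usr/local/bin/xymon2ovo ovo-feed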
10/04/14:

Been a bit. Vacation, new role at work, and complete and utter task
saturation. Today's work: figure out how to identify the alert_id from an
alert that's not mailed out. To do that, I'm going to kick off an alert, wait
for the alert, then find the fucking alert_id.

OK: forgot the firewall update on xymon. That's sorted now.

Alert ID for client4:disk is 1578903790. Found the fucker! ::

    xymon "xymondlog ${host}.${test}"

example: ::

    # xymon "xymondlog client4.disk"
    2014-10-04 12:39:36 No recipient specified - assuming localhost
    client4|disk|red||1412444137|1412444338|1412446138|0|0|192.168.122.25|1578903790|||Y|
    red Sat Oct 4 12:38:57 CDT 2014 - Filesystems NOT ok

    &red /opt/app (100% used) has reached the PANIC level (95%)

    Filesystem             1024-blocks      Used  Available Capacity Mounted on
    /dev/mapper/vg00-root      1032088    370652     609008      38% /
    /dev/vda1                   495844     67751     402493      15% /boot
    /dev/mapper/vg00-opt       1032088     34060     945600       4% /opt
    /dev/mapper/vg00-tmp       2064208     68616    1890736       4% /tmp
    /dev/mapper/vg00-usr       4128448   1684704    2234032      43% /usr
    /dev/mapper/vg00-var       2064208    438492    1520860      23% /var
    /dev/mapper/vg00-app       2064208   2042292          0     100% /opt/app

Or, more explicitly: ::

    xymon localhost "xymondlog client4.disk" | head -1 | \
        awk -F\| '{print $11}'

Combining that with our ack cli: ::

    xymon localhost 'hobbitdack ${alert_id} ${time_in_minutes} ${alert_msg}'
    xymon localhost 'hobbitdack 1578903790 5 testing cli alert ack'

Then, update xymonserver.cfg to not propagate acknowledged alerts, and your
non-green view becomes much clearer: ::

    XYMONGENOPTS="--nopropack='*'...

OK; some excellent progress today. That was one of the main goals. If ntp's
still fucked up, I can probably live with that. I **really** wanted to be
able to acknowledge those goddamned alerts, though.
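Putting today's pieces together, a quick helper to ack by host and test (the
function and its name are mine; everything inside it is straight from the
notes above): ::

    #!/bin/sh
    # ack_test: look up a test's alert_id via xymondlog (field 11 of the
    # first line of output) and acknowledge it with hobbitdack.
    # usage: ack_test host test minutes message...
    ack_test() {
        host=$1; test=$2; mins=$3; shift 3
        id=$(xymon localhost "xymondlog ${host}.${test}" | head -1 |
            awk -F\| '{print $11}')
        [ -n "$id" ] && xymon localhost "hobbitdack $id $mins $*"
    }

    # e.g., matching today's test:
    ack_test client4 disk 5 testing cli alert ack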