MCSG: Symptoms/Cause matrix

Title:

MCSG: Symptoms/Cause matrix

Author:

Douglas O’Leary <dkoleary@olearycomputers.com>

Description:

MCSG: Symptoms/Cause matrix

Date created:

06/2007

Date updated:

06/2008

Disclaimer:

Standard: Use the information that follows at your own risk. If you screw up a system, don’t blame it on me…

This, more than any other document in the MCSG section, is going to be a work in progress. The following are symptoms of a cluster problem and how to go about fixing them. Please send me a note with anything that can/should be added.

There are a couple of places to look for errors. Probably the first stop is the package log maintained in the package directory, /etc/cmcluster/${pkg}. The next stop is the syslog and/or the log file created via cmsetlog. Read through those logs carefully; there’s usually only one line indicating the problem and it’ll be easy to miss.
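Because the telling line is easy to miss, a first pass with grep can narrow things down. What follows is a minimal sketch, not a ServiceGuard tool: scan_logs is a hypothetical helper name, and the keyword pattern is an assumption rather than a list of real ServiceGuard messages, so adjust it to what your logs actually contain.

```shell
#!/bin/sh
# Sketch: print candidate error lines from the log files named as arguments.
# PATTERN is an assumed keyword list, not an official set of messages.
PATTERN='error|fail|unable|cannot|conflict'

scan_logs() {
    for log in "$@"; do
        if [ -r "$log" ]; then
            # Passing /dev/null as a second file makes grep prefix each hit
            # with the file name; -n adds the line number.
            # grep exits non-zero when nothing matches; that is not an error here.
            grep -i -n -E "$PATTERN" "$log" /dev/null || :
        else
            echo "skipping unreadable log: $log" >&2
        fi
    done
    return 0
}
```

A typical call might be scan_logs /etc/cmcluster/${pkg}/*.log /var/adm/syslog/syslog.log, assuming the package log ends in .log and syslog is in its usual HP-UX location. Each hit is printed as file:line-number:text, so you can jump straight to the surrounding context in the full log.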

  • Symptom: One of the nodes reboots for unknown reasons.

    Cause: Please check out the Reasons for TOCs for possible causes.

  • Symptom: Activation mode requested for volume group ${vg} conflicts with configured mode

    Cause: Almost certainly caused by ${vg} not having the exclusive bit set. With the cluster running, execute vgchange -c y ${vg}.

  • Symptom: Node ${node} is currently unable to run ${pkg}

    Cause: The node's local switch for the package is disabled. Check with cmviewcl -vp ${pkg} and re-enable with cmmodpkg -e -n ${node} ${pkg}.

  • Symptom: Can’t find service name ${service}

    Cause: Possibly a problem with how the service name is specified in the package configuration file. Ensure the name is a single word without quotes.

  • Symptom:

    1. Package starts on primary node, then shortly thereafter stops.

    2. Package starts up on each of the adoptive nodes, then shortly thereafter stops.

    3. Once the package has gone through all the adoptive nodes, it dies completely.

    4. The local switches for all primary/adoptive nodes read disabled

    Cause: If the package is running any services, ensure each one runs in an infinite loop so that it never exits on its own. By default, a service that exits causes a package switch and sets the node's local switch to disabled.

  • Symptom: cmviewcl command hangs

    Cause: Run ps -ef | grep cmclconfd. If there are many cmclconfd processes, kill them all, then restart inetd by running inetd -c.
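The cmclconfd check in the last item can be sketched as a small shell helper. This is only an illustration: list_cmclconfd_pids is a hypothetical function name, and it assumes the usual ps -ef layout with the PID in the second column. It lists PIDs and nothing more; whether to kill them stays your call.

```shell
#!/bin/sh
# Sketch: read `ps -ef` output on standard input and print the PID
# (second column, an assumption about the ps layout) of each cmclconfd
# process, skipping the grep/awk helper processes themselves.
list_cmclconfd_pids() {
    awk '/cmclconfd/ && !/grep/ && !/awk/ { print $2 }'
}

# Example use (review the list before killing anything):
#   pids=$(ps -ef | list_cmclconfd_pids)
#   echo "$pids"
#   [ -n "$pids" ] && kill $pids && inetd -c
```

Reading from standard input keeps the filter separate from the ps call, so you can save the ps output to a file first and inspect exactly what would be killed.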