E10K: Debugging Drain Failures

Author:

Douglas O’Leary <dkoleary@olearycomputers.com>

Description:

E10K: Debugging Drain Failures

Date created:

08/1999

Date updated:

09/1999

Disclaimer:

Standard: Use the information that follows at your own risk. If you screw up a system, don’t blame it on me…

  1. Enable the kernel variable dr_mem_debug by setting its value to -1, either on the running kernel with adb or in /etc/system followed by a reboot. Note that the sample adb session below writes 0x1 rather than -1:

    # adb -kw
    physmem 13af5d
    dr_mem_debug/W0x1
    dr_mem_debug: 0x0 = 0x1
    $q
    
  2. Capture the console output from a failed DR drain session. The failed address will be readily apparent; the message will be something to the effect of:

    hold_pfns: page not held: <some address>

  3. In an adb session, enter the following command:

    <page address from step 2>$<page
    
  4. Look for the field p_selock. If the value in this field is 1, the problem is possibly related to swap.

  5. If there is a value in the p_vnode field, then enter the following:

    <vnode address>$<vnode
    

    Look for the vop field; it tells us which virtual operation is in progress.

  6. If the value in p_selock is an address, adjust it by subtracting 8 from the high-order hex digit (that is, clear the top bit). For example, if the p_selock field = c0000000, then the value we need for the next step is 40000000. This is a thread address. To check this, enter the following command:

    <thread address>$<thread
    
  7. Look for the field called procp and get that address. Enter the following:

    <proc address>$<proc2u
    
  8. Search through the screen output for the psargs field. This will indicate the process that is holding the lock.
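
The address adjustment in step 6 is easy to get wrong by hand, so here is a minimal sketch of the arithmetic in Python. It assumes 32-bit values, as in the c0000000 example above; the function name selock_to_thread is made up for illustration, and the check for the top bit is an added safeguard, not something the procedure itself requires.

```python
HIGH_BIT = 0x80000000  # assumption: 32-bit p_selock values, as in the c0000000 example


def selock_to_thread(p_selock):
    """Subtract 8 from the high-order hex digit of p_selock (step 6).

    Hypothetical helper: this is only the arithmetic from the note,
    not anything that adb itself provides.
    """
    if p_selock < HIGH_BIT or p_selock > 0xFFFFFFFF:
        # Added safeguard: the subtraction only makes sense when the
        # leading hex digit is 8 or higher.
        raise ValueError("p_selock does not have the high-order bit set")
    return p_selock - HIGH_BIT


# The example from step 6: c0000000 becomes 40000000.
print(format(selock_to_thread(0xC0000000), "x"))
```

The printed value is what goes in front of $<thread in step 6.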

If the process belongs to a third-party vendor's product, we need to know which vendor. In any case, mail the screen output to us and we will look at it further.
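
When the captured console output is long, the page addresses from step 2 can be pulled out mechanically. The sketch below is only that: the message shape is taken from the example in step 2, the hex address format (with an optional 0x prefix) is an assumption, and the sample text and the name failed_pages are made up for illustration.

```python
import re
from typing import List

# Assumed message shape, based on the example in step 2.
# The real console text may differ; adjust the pattern to match.
HOLD_PFNS_RE = re.compile(r"hold_pfns: page not held:\s*(?:0x)?([0-9a-fA-F]+)")


def failed_pages(console_text: str) -> List[str]:
    """Return the unique page addresses reported by hold_pfns, in first-seen order."""
    seen = []  # type: List[str]
    for match in HOLD_PFNS_RE.finditer(console_text):
        addr = match.group(1).lower()
        if addr not in seen:
            seen.append(addr)
    return seen


# Made-up sample text, for illustration only.
sample = (
    "some unrelated console line\n"
    "hold_pfns: page not held: 0x10a4f2c0\n"
    "hold_pfns: page not held: 10A4F2C0\n"
    "hold_pfns: page not held: 10b00000\n"
)

for addr in failed_pages(sample):
    # Build the adb command from step 3 for each address.
    print(addr + "$<page")
```

Each printed line is the command from step 3, ready to paste into the adb session.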

Notes on adb: Be careful of the columns. Things don’t always align.