VDM Monitor

This chapter shows you how to use VDM to check for and report low free space, errors on your disk drives and changes to critical files automatically.  With it, you can eliminate production problems and database inconsistencies caused by system hangs resulting from lack of free space.

VDM/MONITOR guards free space, checks for errors on your disk drives and checks critical files for changes in size or content.  At intervals you set (from 1 to 60 minutes), it checks how many blocks are free on each disk drive and whether any errors have occurred.  If any drive has fewer than the minimum free blocks you've set, it sends an alarm and recovers space by submitting a batch procedure to purge multiple file versions and delete temporary files.  Two different free space thresholds can be set for each drive, with a batch procedure for each threshold.  You can use the standard cleanup procedures supplied with the package or create your own.  If any errors have occurred during the specified interval, VDM/MONITOR sounds an alarm and, optionally, sends mail to the user(s) you specify.  If VDM detects changes in a file, according to criteria you specify, it takes any combination of the following actions: sending console messages, sending mail and submitting a correction procedure to batch.

To invoke VDM's disk monitoring function, type "VDM/MONITOR" at the command level along with the desired qualifiers.  This creates a detached process which monitors your disk drives and files.  Various /MONITOR qualifiers allow you to add, remove and change entries in VDM's permanent data file, which contains all the details on what the monitor is watching for.  You can enter "HELP VDM Monitor_disk" at the command level to obtain information about disk monitoring.

Getting Started
We recommend you do the following to get started:


. Create the permanent data file
 Do this by typing any VDM/MONITOR command that changes the permanent data file, such as VDM/MONITOR/NOACTIVE:

    $ VDM/MONITOR/NOACTIVE
    %VDM-W-ERROPEPERM, Error opening permanent data file
    %RMS-E-FNF, file not found
    %SYSTEM-W-NOSUCHFILE, no such file
    Cannot open the permanent data file
    Would you like to create a new one(y/n) [Y]? y
    %VDM-E-NOTRUN, VDM/MONITOR is not running

VDM/MONITOR responds by telling you that it can't open the permanent data file, because the file was not found.  It then asks you if you want to create a new one.  When you say yes, it creates the new data file and fills it with the defaults for all parameters.  It also gives you the error message that VDM/MONITOR is not running, because the detached process has not been started.  You can ignore that error for now.

. List the permanent data file
 You can see what the defaults are for the permanent file, by doing a list:

    $ VDM/MONITOR/LIST
    Logging to the operators console is enabled
    .
    .

The list shows which defaults VDM/MONITOR will use for each disk you add to the monitor list.  You may override many of these defaults on a disk-by-disk basis.

NOTE: VDM's disk monitor will not check for free space thresholds or errors on any device unless you add that device to the permanent data file.

. Change parameters
You will probably want to change some of the defaults. You can do that now. For example, to disable the log file, you would enter the command:

    $ VDM /MONITOR /NOLOG
    %VDM-E-NOTRUN, VDM/MONITOR is not running

No messages will be written to the log file until a /LOG command is entered.

Here is another example, to mail a message to the user SYSTEM when low free space is detected:

    $ VDM/MONITOR/MAIL=(USER=SYSTEM,QUEUE=SYS$BATCH)
    %VDM-E-NOTRUN, VDM/MONITOR is not running

Here is another example, to add DRA1: to the list of drives to be checked using the default thresholds:

    $ VDM/MONITOR/DISK=DRA1:
    %VDM-E-NOTRUN, VDM/MONITOR is not running

NOTE: When you change a parameter, VDM/MONITOR tells the detached process to read the parameter file.  Since the detached process doesn't exist at this point, you get the "VDM/MONITOR is not running" message.  This can be ignored.

. List the changed parameters
When you have modified the setup to your needs, check the permanent data file to ensure that all parameters are as you want them:

    $ VDM/MONITOR/LIST
    Logging to the operators console is disabled
    .
    .

. Start the detached process
 When the defaults are satisfactory, start the detached process:

    $ VDM/MONITOR/START
    %VDM-S-PROCID, Identification of created process is 00000AFB

The above command starts the detached process, with a system-assigned process ID of 00000AFB.  The process name is "VDM_MONITOR".

NOTE: To limit the possibility of problems with the detached process and the batch jobs it submits, we suggest you issue the VDM/MONITOR/START command while logged in as SYSTEM.  If you add this command to your system startup file, the monitor will automatically be started under the SYSTEM account.

. Define cleanup procedures
There are standard cleanup procedures supplied with the distribution kit.  Their names are WORRY and PANIC.  Once you see from the console log and the VDM/MONITOR log file how the parameters you have set up are working and are satisfied that this is the correct behaviour, you can start running VDM/MONITOR live:

    $ VDM /MONITOR /ACTIVE

Now VDM/MONITOR will monitor free space and activate cleanup procedures.

Once you have the permanent data file set up, it is not necessary to change it again.  To start up VDM/MONITOR automatically, add the command VDM/MONITOR/START to the system-specific startup file SYS$MANAGER:SYSTARTUP.COM.  Insert the command after the part of the command procedure that defines all the logical names.  Whenever the system starts up, VDM/MONITOR will be started.  You should also add the command VDM/MONITOR/STOP to the system-specific shutdown command procedure SYS$MANAGER:SYSHUTDWN.COM.  This will terminate the VDM/MONITOR detached process and close the log file.
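
As a sketch, the two additions might look like the following.  The comment lines are illustrative only; the commands themselves are the ones named above.

```dcl
$! Added to the system-specific startup procedure,
$! after the section that defines the logical names:
$ VDM /MONITOR /START
$!
$! Added to SYS$MANAGER:SYSHUTDWN.COM:
$ VDM /MONITOR /STOP
```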

Defining thresholds
VDM/MONITOR has two free space thresholds called WORRY and PANIC.  Use WORRY and PANIC threshold clauses to define free space thresholds and actions on a device.

Threshold clause
A threshold clause consists of the following two items:

• A threshold level.  A threshold is the free space point below which action should be taken.  You can specify a threshold as a percentage of the disk, using the PERCENT=n clause, or an exact number of blocks, using the BLOCKS=n clause.

• The name of the command procedure that you wish to run when the threshold is reached.

For example, DUA1: is an RA81 disk drive.  The RA81 has a total of approximately 800,000 blocks.  We want to purge the disk if less than 10 percent of the disk (about 80,000 blocks) is free, and we want to clean up temporary files if fewer than 10,000 blocks are free.  Thus we have the following VDM/MONITOR command:

    $ VDM /MONITOR /DISK=DUA1: /WORRY=(PERCENT=10) /PANIC=(BLOCKS=10000)
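
To compare the two clauses on a particular drive, you can work out the block count that a percentage represents.  The following lines are a minimal sketch using the DCL lexical function F$GETDVI.  They assume that the percentage is taken against the total size of the volume (the MAXBLOCK item), which this chapter does not state explicitly, and the symbol names are made up for illustration.

```dcl
$ total = F$GETDVI("DUA1:", "MAXBLOCK")      ! total blocks on the volume
$ free  = F$GETDVI("DUA1:", "FREEBLOCKS")    ! blocks currently free
$ worry = (total / 100) * 10                 ! 10 percent, in integer arithmetic
$ WRITE SYS$OUTPUT "WORRY below " + F$STRING(worry) + " blocks, PANIC below 10000 blocks"
$ WRITE SYS$OUTPUT "Currently free: " + F$STRING(free) + " blocks"
```

Dividing before multiplying keeps the intermediate result small; the rounding error is at most a few blocks.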

WORRY
Use the /WORRY=(...) threshold_clause to indicate what the first free space threshold is. The default is /WORRY=(PERCENT=15).

You can also specify a specific number of blocks as the threshold using the BLOCKS=n threshold clause.

PANIC
Use the /PANIC=(...) threshold_clause to indicate what the second free space threshold is. The default is /PANIC=(PERCENT=5).

You can also specify a specific number of blocks as the threshold using the BLOCKS=n threshold clause.

For a more detailed explanation of thresholds, please see the section of the manual that describes the /WORRY and /PANIC qualifiers.

Setting up VDM to monitor files
New with Version 6 of VDM is the ability to set up VDM to monitor files.  Files can be monitored for any of the following event types:

- a file exists: a file with a specific name has been created
- a file does not exist: a file with a specific name cannot be found any longer
- a file has changed: a check of certain VMS file attributes indicates that this file has been modified since the last time it was checked
- a file has grown: a file has become larger than a threshold number of blocks
- a file has shrunk: a file has become smaller than a threshold number of blocks

With each of these event types it is possible to configure VDM to send console messages, send mail and/or submit a batch procedure when the event is detected.  The procedures for setting up and taking advantage of file monitor events are very similar to those for disks.  The /FILE and /EVENT qualifiers are used in conjunction with the /MONITOR qualifier to add file events to the monitor's permanent data file.  The file size events can be configured to check against disk blocks used, allocated or unused; since OpenVMS often allocates blocks long before it marks them as used, this can be particularly useful for catching problems with files that are growing beyond reasonable size.

 These events can be configured to detect a number of possible problems before they become serious and allow you to correct the problem, often automatically using a batch job. The following examples should help to illustrate the usefulness of these events.        

1.    This event will submit a job to automatically reset the OpenVMS accounting file when it exceeds 15,000 blocks used. The correction procedure used simply does a SET ACCOUNTING/NEW command and then purges the system accounting file keeping 3 versions. Mail will also be sent to the SYSOP and a console message broadcast.

    $ VDM /MONITOR /FILE=SYS$MANAGER:ACCOUNTNG.DAT -
    /EVENT=(NODE=THISND, TYPE=SIZE, BLOCKS=15000, SIZE=USED, -
    COMMAND=SYS$MANAGER:RESET_SYSTEM_ACCOUNTING.COM,-
    QUEUE=THISND$BATCH) /MAIL=SYSOP /CONSOLE

2.    This command will submit a job that performs a comparison and mails the results to the system operator every time the OpenVMS system startup file is changed on THISND.  Mail is also sent to the system operator directly by VDM, so that the operator will be aware if the comparison job fails for any reason.  Setting up this event and keeping a copy of the system startup file in another directory will allow the operator responsible for this file to quickly determine if an unauthorized change has been made to the system startup file.

    $ VDM /MONITOR /FILE=SYS$MANAGER:SYSTARTUP_VMS.COM -
    /EVENT=(TYPE=CHANGE, NODE=THISND, QUEUE=THISND$BATCH, -
    COMMAND=SYS$MANAGER:COMPARE_STARTUP.COM) /MAIL=SYSOP

To remove a file event from the VDM Monitor permanent data file simply enter:

    $ VDM /MONITOR /NOFILE=filename

using the exact same filename that the event was added with. The event will be deleted from the permanent data file and VDM will stop monitoring that file.


Defining cleanup procedures
A batch procedure can be submitted when you reach either of the WORRY or PANIC thresholds.  You define the contents of the procedure.

Writing cleanup procedures
When a threshold is crossed, VDM/MONITOR executes a cleanup procedure by submitting a command file to run on a specified batch queue.  The purpose of this procedure is to free disk space by eliminating unnecessary files.  Two possible cleanup procedures may be executed: the worry command procedure or the panic command procedure.  Remember, you define the contents of these procedures.

 The command procedure for the first threshold level (WORRY level) should purge multiple versions of files.  Two parameters, P1 and P2, are passed to the worry command procedure. P1 is the device name.  P2 is the worry threshold level.  P2 can be used to give you more control over the cleanup procedure.  For example, you can check the number of free blocks on the disk against P2 to see if you have passed the threshold level yet.  The command procedure can then execute the commands PURGE/KEEP=3 or PURGE/KEEP=2, depending on this check.
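
A worry procedure along these lines might look like the following.  This is a minimal sketch, not the WORRY procedure supplied with the kit.  It assumes that P2 arrives as a number of blocks (if your threshold is defined with PERCENT=n, the comparison would need adjusting) and that purging the whole volume is acceptable; in practice you would probably name specific directories.  The symbols dev and limit are made up for illustration.

```dcl
$! Sketch of a worry cleanup procedure (not the supplied WORRY procedure).
$! P1 = device name, P2 = worry threshold (assumed to be a block count).
$ SET NOON                                  ! keep going if a command reports an error
$ dev = P1 - ":" + ":"                      ! make sure the name ends in one colon
$ limit = F$INTEGER(P2)
$ IF F$GETDVI(dev, "FREEBLOCKS") .GE. limit THEN EXIT
$ PURGE /KEEP=3 'dev'[000000...]
$! Purge more aggressively only if the first pass did not recover enough space.
$ IF F$GETDVI(dev, "FREEBLOCKS") .LT. limit THEN PURGE /KEEP=2 'dev'[000000...]
$ EXIT
```

Checking the free block count before and after the first purge means the more drastic PURGE/KEEP=2 runs only when it is needed.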

The command procedure for the second threshold level (PANIC level) should delete temporary files.  It should also execute at a higher priority (for example, priority 5), either by running in a separate queue or by including a SET PROCESS/PRIORITY command in the procedure.  Two parameters, P1 and P2, are passed to the panic command procedure.  P1 is the device name.  P2 is the panic threshold level.  P2 can be used in the panic command procedure in the same way as P2 in the worry command procedure.
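
A panic procedure might be sketched as follows.  This is not the PANIC procedure supplied with the kit.  It assumes that P2 is a number of blocks, that files of type .TMP and .LIS are safe to delete on your system, and that the account running the job is allowed to raise its own priority.  The symbol dev is made up for illustration.

```dcl
$! Sketch of a panic cleanup procedure (not the supplied PANIC procedure).
$! P1 = device name, P2 = panic threshold (assumed to be a block count).
$ SET NOON                                  ! keep going if no files match
$ SET PROCESS /PRIORITY=5                   ! run ahead of normal batch work
$ dev = P1 - ":" + ":"                      ! make sure the name ends in one colon
$ IF F$GETDVI(dev, "FREEBLOCKS") .GE. F$INTEGER(P2) THEN EXIT
$ DELETE /NOCONFIRM 'dev'[000000...]*.TMP;*
$ DELETE /NOCONFIRM 'dev'[000000...]*.LIS;*
$ EXIT
```

Adjust the file types to whatever counts as temporary at your site before using anything like this on a production disk.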

Purging multiple versions of a file
As files are modified, OpenVMS creates new versions of files and retains the previous versions.  This is a great convenience but can consume a lot of space.  Purging multiple versions of a file frees up space without eliminating the most current version.  You can specify how many versions of the file are to be kept and specify which directories are to be purged.  Within OpenVMS, you can automatically restrict the number of versions of a file in a directory.  You can set version limits on a directory, or on a specific file.  The DCL commands to do this are:

    $ SET DIRECTORY /VERSION_LIMIT=n directory_name
    $ SET FILE /VERSION_LIMIT=n file_name

where n is the number of versions of the file(s) you want to keep.  For example, if you set the version limit to 3, then when the fourth version of a file is created, the first version is automatically deleted.  If you are already using version limits on files and directories, then purging the disk will have a lesser effect on the free space than if you were not using them.  If you set these limits, they will not be in effect on a file until it has been purged to the number of versions you specified (or fewer).  For further information on setting version limits on directories and files, see the OpenVMS manual, DCL DICTIONARY.
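
For example, to hold a directory to three versions per file, you might combine the two commands with a one-time purge.  This is a sketch only; DUA1:[FRED] is an example directory.  The limit set with SET DIRECTORY applies to files created afterwards, so existing files are given the limit with SET FILE and then purged down to it.

```dcl
$ SET DIRECTORY /VERSION_LIMIT=3 DUA1:[FRED]
$ SET FILE /VERSION_LIMIT=3 DUA1:[FRED]*.*;*
$ PURGE /KEEP=3 DUA1:[FRED]
```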

Deleting temporary files
In the course of operation, many programs create temporary files which aren't always deleted after completion.  Any file that can be easily regenerated can also be considered a temporary file.  For example, the listing and object files generated from a FORTRAN compile can be easily recreated by simply recompiling the FORTRAN source.  RUNOFF output files, such as file.MEM, file.BRN, etc., can be recreated if you have the original RUNOFF document.  It is important to remember that this type of file should only be considered temporary if the file that it was generated from is readily available, and the procedure for creating the output files does not involve a significant amount of work.

Truncating files
Use the OpenVMS command SET FILE/TRUNCATE file-spec to truncate files to the number of blocks they are using as opposed to the number they have allocated.

Recording activity
VDM/MONITOR can record every action it takes.  This record can be logged to a file.  It can also be sent to the operator console if you wish.

Log information
The following information is contained in each log record:
• Date
• Time
• Drive number
• Threshold
• Cleanup procedure name

System console log
Use the /CONSOLE qualifier to tell VDM/MONITOR to send a message to all operators enabled as a CENTRAL operator when a cleanup procedure is activated.  Use the /NOCONSOLE qualifier to suppress messages to these operators.  The default is /CONSOLE.

Log file
VDM/MONITOR automatically logs all activity to a file.  Use the /LOG qualifier to tell VDM/MONITOR to close the existing file and start a new one with the same name and a higher version number.  This allows you to look at the log file.

The log file is, by default, called VDM_DAT:VDM_LOG_FILE.LOG.  To change its name, define the system logical name VDM_LOG_FILE to be the file name you want.  For example:

    $ DEFINE /SYSTEM VDM_LOG_FILE DUA1:[FRED]DISK_LOG.DAT

will cause the new log file to be opened as "DUA1:[FRED]DISK_LOG.DAT".  Use the /NOLOG qualifier to close the current log file without opening a new one.  No messages will be written to the log file until a new file is opened with the /LOG qualifier.

Sending Warning Messages
In addition to submitting cleanup procedures, VDM/MONITOR can send alarm messages when a threshold is crossed.

Broadcast
VDM/MONITOR has the ability to send a warning message to user terminals when disk space gets low.  This allows the users to clean up their own accounts if they have space allocated to temporary files or multiple versions.

Use the /BROADCAST qualifier to indicate that a message is to be sent.  Use the /NOBROADCAST qualifier to eliminate warnings.  The default is /BROADCAST.

If broadcast is turned on, VDM/MONITOR sends a message to all terminals, telling users that the panic stage has been reached.  If the free space on a disk drops below 100 blocks, VDM/MONITOR broadcasts a message to all terminals regardless of the broadcast setting, and repeats it every time it checks the disk and finds fewer than 100 free blocks.  A user who does not want to receive these messages can execute the command:

    $ SET BROADCAST=NOGENERAL

This will still allow the user to receive messages from DCL (spawn and CTRL/T messages), mail, phone, etc., but not the VDM/MONITOR messages.  If the user's terminal is set /NOBROADCAST or the user has typed CTRL/S, the warning message will not be displayed.

Mail
VDM/MONITOR can send a mail message to a user or group of users.  If the mail qualifier is used, it will send this mail message to the specified users when the panic stage is reached. For more information on how the mail qualifier works, see the command qualifiers section of the manual.

Collecting information
VDM/MONITOR can collect information about free space without activating any cleanup procedures or generating any alarms.  VDM/MONITOR has the ability to monitor the drives and record when thresholds were crossed and cleanup procedures would have been activated.  This function is used to collect information about the activity on your system before turning VDM/MONITOR loose.

Use the /NOACTIVE qualifier to prevent procedures from being activated.  The default is /ACTIVE.  /NOACTIVE disables the activation of procedures and warning of terminals but the log file, console and mail functions continue to operate.
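
Put together with the commands from the Getting Started section, a trial period might run as follows.  This is only a sketch of the sequence; the length of the trial is up to you.

```dcl
$ VDM /MONITOR /NOACTIVE    ! record threshold crossings only
$ VDM /MONITOR /START       ! start the detached process
$! ... review the log file and console messages for a few days ...
$ VDM /MONITOR /ACTIVE      ! allow cleanup procedures to be submitted
```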

NOTE: We recommend that you collect information for a few days before you start activating cleanup procedures.

Here are some sample VDM/MONITOR console messages:

This message is created when VDM/MONITOR starts up, or by VDM/MONITOR/LOG:

    %%%%%%%%%%% OPCOM 7-JUL-1999 16:48:44.03 %%%%%%%%%%%
    Message from user SYSTEM
    7-JUL-1999 16:48:43.77 VDM/MONITOR - New log file created

This is the message when VDM/MONITOR submits the WORRY procedure:

    %%%%%%%%%%% OPCOM 7-JUL-1999 17:00:32.17 %%%%%%%%%%%
    Message from user SYSTEM
    7-JUL-1999 17:00:31.11 VDM/MONITOR - 23,900 free blocks on _DRA0:, Worry Batch Procedure submitted

This is the message when VDM/MONITOR submits the PANIC procedure:

    %%%%%%%%%%% OPCOM 7-JUL-1999 17:01:25.97 %%%%%%%%%%%
    Message from user SYSTEM
    7-JUL-1999 17:01:24.17 VDM/MONITOR - 11,900 free blocks on _DRA0:, Panic Batch Procedure submitted

This is the message when someone modifies the permanent data file:

    %%%%%%%%%%% OPCOM 7-JUL-1999 17:02:45.51 %%%%%%%%%%%
    Message from user SYSTEM
    7-JUL-1999 17:02:45.37 VDM/MONITOR - Permanent data file modified

This is the time stamp:

    %%%%%%%%%%% OPCOM 7-JUL-1999 17:00:32.17 %%%%%%%%%%%
    Message from user SYSTEM
    This is VDM/MONITOR, see you in an hour

This is the message from VDM/MONITOR/STOP:

    %%%%%%%%%%% OPCOM 7-JUL-1999 17:22:59.59 %%%%%%%%%%%
    Message from user SYSTEM
    VDM/MONITOR shutting down

Handling error messages
Because the VDM/MONITOR process is detached, i.e. not connected to a terminal, it sends all error messages to the console.

There will be one or more messages.  The first gives the error that occurred (for example, unable to open log file).  The next gives the reason (for example, insufficient quota).

All errors from the detached process are signalled on the operator's console regardless of the setting of the console logging switch.  This is to ensure that you will know as soon as possible when the VDM/MONITOR detached process stops running.

Signalling these errors is done by establishing an error handler which sends the error to the console, then continues with the error handling. All errors pass through the error handler and it decides whether the error comes from the detached process and has to be signalled.

Here are some sample VDM/MONITOR error messages:

This is an error message from the detached process:

    %%%%%%%%%%% OPCOM 7-JUL-1999 19:24:04.00 %%%%%%%%%%%
    Message from user SYSTEM
    %VDM-W-ERRCREMBX, Error creating mailbox

    %%%%%%%%%%% OPCOM 7-JUL-1999 19:24:05.36 %%%%%%%%%%%
    Message from user SYSTEM
    %SYSTEM-F-NOPRIV, no privilege for attempted operation


Hints for running VDM/MONITOR
This section describes some hints and explanations that you might need for running VDM/MONITOR.

Resource quotas
Since VDM/MONITOR uses asynchronous system traps (ASTs) to time its disk checks and to be notified when batch jobs complete, you may have to increase the UAF parameter ASTLM.  The process AST limit is the number of ASTs that a process can have outstanding at one time.  The detached process's AST limit defaults to the limit specified for the account that started the detached process.  If you see the detached process go into the RWAST (resource wait for AST) state, you may want to increase this parameter.

Stopping and re-starting VDM/MONITOR
 If the system is heavily loaded, and you issue the command VDM/MONITOR/STOP immediately followed by the command VDM/MONITOR/START, you may receive the error message "VDM/MONITOR is already running".  This is because the detached process may not have actually stopped running yet. If this happens, execute the command VDM/MONITOR/START again.
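
In a command procedure, one way to reduce the chance of this is to pause between the two commands.  The 10-second delay below is an arbitrary value chosen for illustration, not a documented requirement.

```dcl
$ VDM /MONITOR /STOP
$ WAIT 00:00:10             ! arbitrary pause; lengthen it on a busy system
$ VDM /MONITOR /START
```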

Clusters
When running VDM/MONITOR on a cluster, you should only need to have one copy running across the cluster.  VDM/MONITOR should be executed on a machine that has access to all the disks in the cluster.  If you have disks that are only accessible from one machine, you should run a copy of VDM/MONITOR on that machine, and that copy should have its own permanent data file.  This copy of the data file should contain the disks that are accessible from only that machine.

Disk quotas
If you have disk quotas enabled, the VDM/MONITOR detached process may not be able to create its log files.  This will occur if VDM/MONITOR is started from an account without the EXQUOTA (exceed quota) privilege and the account has exceeded its quota.

If this is the problem, each time you start the VDM/MONITOR detached process the PID of the created process is displayed, but a subsequent $ SHOW SYSTEM command does not list the detached process.

To find out if the account has exceeded its disk quota, run the DISKQUOTA utility and look up the username.  If the permanent quota is less than the usage, increase the quota so that it is larger than the usage.  You may increase the quota to twice the usage to be safe.  The SYSTEM account has the EXQUOTA privilege, so the quota does not restrict it.

NOTE: To limit the possibility of problems with the detached process and any batch jobs it submits, we suggest you issue the VDM/MONITOR/START command while logged in as SYSTEM.  If you add this command to your system startup file, the monitor will automatically be started under the SYSTEM account.

Debugging the VDM Monitor
If, at any time, the VDM monitor does not appear to be working, use the VDM /MONITOR /STATUS command.  This generates a screen showing whether or not the monitor is running, and lists every disk and file being monitored along with the current state of each event.  For disk space events this state will be Okay, Worry or Panic; for file events the state will be either true, if the event has already been detected, or false.


Correcting problems detected by VDM

This section shows you how to correct problems you locate using VDM.

For many VDM operations, you can generate a list of file names in your current directory using the /DCL qualifier.  This allows you to make a list of the files which meet the criteria you've specified.

The [VDM.COM] directory contains a DCL shell procedure named PROCESS_FILE.COM, which accepts a parameter specifying a file containing filenames, reads the file and performs an operation on each file.  Use this procedure with the output from /DCL to perform "corrective" surgery on files with problems, after editing it to include the operation you want to perform on each file.
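
The general shape of such a shell procedure can be sketched as follows.  This is not the text of PROCESS_FILE.COM; it is a minimal illustration which assumes that the list file holds one file specification per line, and it uses SET FILE/TRUNCATE as a stand-in for whatever operation you want.  The symbol and label names are made up.

```dcl
$! Sketch of a list-driven shell procedure (not the supplied PROCESS_FILE.COM).
$! P1 = name of a file containing one file specification per line (assumed format).
$ SET NOON                                   ! keep going if one file fails
$ OPEN /READ /ERROR=done list_file 'P1'
$next:
$ READ /END_OF_FILE=finish list_file filespec
$ filespec = F$EDIT(filespec, "TRIM")        ! drop leading and trailing blanks
$ IF filespec .EQS. "" THEN GOTO next        ! skip empty lines
$ SET FILE /TRUNCATE 'filespec'              ! replace with the operation you want
$ GOTO next
$finish:
$ CLOSE list_file
$done:
$ EXIT
```

Keeping the operation on a single line makes it easy to swap in a different command without touching the loop.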