Data Collection
This chapter describes how VDM collects the data that is later analyzed in usage, capacity and directory reports. Data collection must be run before any of these reports can be generated. Cohort users can take advantage of the new common data collection procedure in RA Version 6 or 7, which collects disk information for RA and VDM in one pass and reduces data collection overhead by nearly 50 percent. Because RA’s data collection is more involved than VDM’s, the common procedure was added to RA; refer to the section on VDM and RA in the RA user manual for more information.

Process Description
The data collection process performs the following operations on every disk drive you've specified in DISKNAMES.DAT:


. Generate usage data
Run the OpenVMS utility ANALYZE/DISK with the /USAGE qualifier on the drive. This creates a temporary file listing usage by file, one entry per file, which can exceed 5,000 blocks. VDM deletes each temporary file as soon as it has summarized it.

. Create VDM data files
Read the temporary usage files and build the VDM data files from which VDM reports are later produced. This step is divided into four phases:

Determine usage by disk drive
Create usage by UIC/identifier
Create usage by directory (Optional)
Run the special FILES report, which shows a breakdown of files by size, the 50 largest files, and files that may need to be purged (Optional)

. Get quota data (Optional)
Enable quotas if they are not already enabled, rebuild the quota file, extract the quota information into a temporary file, and disable quotas again if they were originally disabled. If this step is skipped, quotas are displayed as zero on all reports.
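To make the summarizing phases more concrete, here is a minimal Python sketch, not VDM's actual code. It assumes the temporary usage file has already been parsed into records; the `FileUsage` layout, `summarize_usage`, and its `collect_directories` and `top_n` parameters are hypothetical names chosen for illustration. It totals blocks per disk, per UIC and, optionally, per directory, and picks the largest files the way a FILES-style report might.

```python
import heapq
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FileUsage:
    """Hypothetical per-file record, as if parsed from one usage file."""
    disk: str
    uic: str
    directory: str
    name: str
    blocks: int


def summarize_usage(records, collect_directories=True, top_n=50):
    """Total blocks by disk, by UIC and (optionally) by directory,
    and pick the top_n largest files for a FILES-style report."""
    records = list(records)  # accept generators; the records are read twice
    by_disk = Counter()
    by_uic = Counter()
    by_directory = Counter()
    for rec in records:
        by_disk[rec.disk] += rec.blocks
        by_uic[(rec.disk, rec.uic)] += rec.blocks
        if collect_directories:
            by_directory[(rec.disk, rec.directory)] += rec.blocks
    largest = heapq.nlargest(top_n, records, key=lambda r: r.blocks)
    return {
        "by_disk": dict(by_disk),
        "by_uic": dict(by_uic),
        # None signals that the optional directory phase was skipped.
        "by_directory": dict(by_directory) if collect_directories else None,
        "largest": largest,
    }
```

Passing `collect_directories=False` mirrors the optional directory phase: fewer totals to compute and store.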

System Impact
Although data collection can run while other users are on the system, it is best run when the system is idle. It uses the OpenVMS ANALYZE/DISK and DISK QUOTA utilities, which momentarily freeze volume activity and place a sizable load on the drive for several minutes. Refer to the chapters on ANALYZE/DISK and DISK QUOTA for more details.

Conflict with BACKUP
If you run BACKUP at night, make sure it is not scheduled to overlap the data collection run. Running both at once causes no errors, but data collection takes much longer. The same applies to any other site-specific jobs that run off-line at night. VDM data collection is best performed on a lightly loaded system.

Automatic submission
Once the data collection procedure is set up, it will normally run without further interaction from you.  However, if your system goes down while a data collection is running, you may have to resubmit the batch job.

Time required
VDM requires approximately 2 minutes per disk drive for data collection and consolidation on a MicroVAX 3100. The exact time depends on the type of disk drive, the number of files, the number of UICs, the amount of activity on the system and the processor type. Make sample runs to determine the time more precisely on your system.
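Because the per-drive figure is only a rough baseline, a linear estimate is the simplest planning aid. The Python sketch below assumes time grows linearly with the number of drives, which ignores the file-count, UIC-count and load factors listed above; `rate_from_sample` and `estimate_minutes` are hypothetical helpers. It lets you replace the 2-minute baseline with a rate measured from your own sample run.

```python
# Baseline from this chapter: about 2 minutes per drive on a MicroVAX 3100.
BASELINE_MINUTES_PER_DRIVE = 2.0


def rate_from_sample(elapsed_minutes, drives_in_sample):
    """Per-drive rate measured from a sample collection run (hypothetical helper)."""
    if drives_in_sample <= 0:
        raise ValueError("sample must cover at least one drive")
    return elapsed_minutes / drives_in_sample


def estimate_minutes(drive_count, minutes_per_drive=BASELINE_MINUTES_PER_DRIVE):
    """Linear estimate: number of drives times minutes per drive."""
    if drive_count < 0:
        raise ValueError("drive_count must be non-negative")
    return drive_count * minutes_per_drive
```

For example, 20 drives at the baseline rate come to about 40 minutes; if a sample run over 12 drives took 36 minutes, 50 drives would take roughly 150 minutes.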

Collecting quota information
VDM can collect UIC quota information using the OpenVMS DISKQUOTA utility, and does so by default. If you do not need quota information, remove the call to COLLECT_QUOTA from the collection procedure. Skipping quota collection shortens data collection time, but quotas are then displayed as zero on all reports.
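The quota step follows a save-and-restore pattern: quotas are enabled only if needed and are returned to their original state afterward. This Python sketch models that logic only; `FakeQuotaVolume` is a hypothetical in-memory stand-in for a volume, and `collect_quota_sketch` is an illustrative name, not VDM's COLLECT_QUOTA procedure. A real procedure would invoke the OpenVMS quota utility rather than these methods.

```python
from dataclasses import dataclass, field


@dataclass
class FakeQuotaVolume:
    """Hypothetical stand-in for a volume's quota state (illustration only)."""
    enabled: bool
    quotas: dict               # UIC -> permanent quota, in blocks
    usage: dict                # UIC -> blocks actually in use on the volume
    used_in_file: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def enable(self):
        self.enabled = True
        self.log.append("enable")

    def disable(self):
        self.enabled = False
        self.log.append("disable")

    def rebuild(self):
        # Recompute the usage recorded in the quota file from actual usage.
        if not self.enabled:
            raise RuntimeError("quotas must be enabled to rebuild")
        self.used_in_file = {uic: self.usage.get(uic, 0) for uic in self.quotas}
        self.log.append("rebuild")

    def extract(self):
        # UIC -> (quota, blocks used), as recorded in the rebuilt quota file.
        return {uic: (q, self.used_in_file.get(uic, 0)) for uic, q in self.quotas.items()}


def collect_quota_sketch(volume):
    """Enable quotas if needed, rebuild, extract, then restore the original state."""
    was_enabled = volume.enabled
    if not was_enabled:
        volume.enable()
    try:
        volume.rebuild()
        return volume.extract()
    finally:
        # Runs even if rebuild or extract fails, so the volume is left as found.
        if not was_enabled:
            volume.disable()
```

The `try`/`finally` is the design point: a volume that started with quotas disabled is disabled again even if the rebuild fails part-way.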

Collecting directory information
By default, VDM collects directory information, but you can turn this off. If you do not need current or capacity reports in directory sequence, remove the VDM/DIR_COLLECT command from the data collection procedure. Disabling directory collection shortens data collection and substantially reduces the disk space used by the VDM data files.

Specifying disk drives
VDM uses the file DISKNAMES.DAT in the [VDM.DAT] directory to determine which disk drives to collect data on. Modify this file to fit your needs. Scratch drives that hold no permanent data are normally excluded from data collection. VDM can handle at most 200 drives in a single pass.
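This chapter does not describe the layout of DISKNAMES.DAT, so the Python sketch below assumes one device name per line, with blank lines and lines starting with `!` ignored. `read_disknames`, its `exclude` parameter and these parsing rules are all hypothetical; the sketch only illustrates two checks that seem sensible before a run: leaving out scratch drives and staying within the 200-drive limit.

```python
MAX_DRIVES_PER_PASS = 200  # limit stated in this chapter


def read_disknames(text, exclude=()):
    """Parse DISKNAMES.DAT contents under an ASSUMED format:
    one device per line; blank lines and '!' comment lines ignored."""
    excluded = {d.strip().upper() for d in exclude}
    drives, seen = [], set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue
        name = line.upper()  # device names compared case-insensitively
        if name in excluded or name in seen:
            continue  # skip scratch drives and duplicate entries
        seen.add(name)
        drives.append(name)
    if len(drives) > MAX_DRIVES_PER_PASS:
        raise ValueError(
            f"{len(drives)} drives listed; at most {MAX_DRIVES_PER_PASS} per pass"
        )
    return drives
```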

Specifying collection time
By default, VDM starts collecting data at 10 p.m. This is normally a period of low activity, so data collection does not interfere with other processes. If you want to collect data at some other time, modify the COLLECT.COM command procedure. Remember to consider other off-line procedures such as BACKUP.
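When choosing a start time, the question is whether the collection window overlaps BACKUP or other nightly jobs, keeping in mind that windows can cross midnight. The Python sketch below is one way to check this; `windows_overlap` is a hypothetical helper, and the durations you pass in (for example, drive count times the per-drive estimate) are assumptions you supply.

```python
from datetime import time

DEFAULT_COLLECTION_START = time(22, 0)  # default start time stated in this chapter
MINUTES_PER_DAY = 24 * 60


def _minute_of_day(t):
    return t.hour * 60 + t.minute


def windows_overlap(start_a, minutes_a, start_b, minutes_b):
    """Return True if two daily windows overlap on a 24-hour clock.

    Each window is a start time plus a duration in minutes; windows may
    run past midnight, but each must be shorter than one day.
    """
    for minutes in (minutes_a, minutes_b):
        if not 0 < minutes < MINUTES_PER_DAY:
            raise ValueError("durations must be between 0 and 24 hours")
    a0 = _minute_of_day(start_a)
    b0 = _minute_of_day(start_b)
    # Compare window A with window B on the previous, same and next day,
    # so that windows wrapping past midnight are handled.
    for shift in (-MINUTES_PER_DAY, 0, MINUTES_PER_DAY):
        b_start = b0 + shift
        if a0 < b_start + minutes_b and b_start < a0 + minutes_a:
            return True
    return False
```

With the default 10 p.m. start and an estimated 120-minute run, a BACKUP job starting at 11:30 p.m. would overlap, while a 40-minute run would finish well before a BACKUP job starting at 1 a.m.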

Generating file size reports
By default, VDM generates a file size report. You can omit this report if you wish. For more information, refer to the chapter "Generating File Size Reports".

Specifying the location of the batch log
VDM writes the batch log from data collection to the VDM_DAT directory. If the /LOG_FILE qualifier is omitted from the SUBMIT command, the log file is written to SYS$LOGIN by default.

Side benefits
VDM data collection has two side benefits. First, ANALYZE/DISK is run on your disk drives regularly as part of data collection, which may let you detect errors on a drive earlier than you otherwise would. Second, if you choose to collect quota information, the quota file is rebuilt on every collection run, so the information in it stays more accurate.