PROBE: PROblem data augmented By Experience

Joseph L. Hellerstein

IBM Thomas J. Watson Research Center, Hawthorne, New York

jlh@watson.ibm.com

Chuanyi Ji

Rensselaer Polytechnic Institute, Troy, New York

chuanyi@ecse.rpi.edu

December 2, 1996

Summary

The PROBE Project is intended to provide researchers with access to data needed to develop a new generation of technologies for managing the availability and performance of information systems. These technologies include: proactive detection of service-level degradations, models of MIB variables to facilitate intelligent monitoring, measurement data warehouses that enable advanced decision support for availability and performance management, and more effective algorithms for diagnosing performance problems. Also of interest are data that relate to the management of emerging computing and communication technologies, such as LANs that support mobile users. A repository of problem data is being established at Rensselaer Polytechnic Institute. Guidelines for submitting data to the repository are discussed.

Motivation

Client/server computing. Intranets. Downsizing. These trends have increased the number of systems to manage, added to the diversity of data sources, and reduced the number of skilled people. As a result, it has become increasingly difficult to manage the availability and performance of workstations, LANs, and similar resources. With the rapid evolution of computing and communications technology, the management problem is getting worse.

Addressing this situation requires innovative, ground-breaking management technologies. For example, at Rensselaer Polytechnic Institute, work in proactive detection holds the promise of detecting problems before serious service degradations result. At Columbia University, work in forecasting models for MIB variables may simplify monitoring (e.g., making timestamp reconciliation easier). At Queen's University, work on measurement data warehouses may provide an efficient infrastructure for a new generation of decision support tools for availability and performance management.

To succeed, these efforts require data. These data are different from those that are routinely collected for capacity planning. For example, proactive detection requires time-series data (e.g., sampled at one-minute intervals) collected both during periods when problems are not present and during periods when problems are present.

The PROBE Project seeks to foster innovative, ground-breaking technologies for availability and performance management by providing the data necessary for the development of these technologies. This project will establish a repository of problem data along with related information that is necessary to develop, tune, and evaluate advanced management technologies. Also of interest are data that foster research in systems and network management of emerging technologies for computing and communications (e.g., mobile networks).

The intent is that customers, vendors, and others will contribute appropriately sanitized data to the repository in accordance with guidelines that we are establishing. The data will then be freely available to researchers in universities, government, and private industry. It is expected that the guidelines for submitting data as well as the procedures for accessing data will evolve over time.

Data Submission Guidelines

The guidelines listed below are intended to help potential data providers understand what information is needed by researchers using PROBE. Supplying all of the information listed below could be burdensome. As such, a provider may choose to supply a subset and then allow researchers to contact the provider directly for further details.

The following guidelines are provided based on our understanding of the requirements of existing research projects. We expect these guidelines to evolve as new requirements arise. A data submission may consist of many data sets from multiple measurement tools running on one or more managed nodes. For example, a data submission may consist of Unix(TM) vmstat data from several hosts on two connected LANs along with SNMP data from the routers that connect the LANs. Another example would be RMF Monitor III data collected from one or more partitions in an MVS Sysplex in combination with data from an SNA network connected to the Sysplex. Sampled and event data are preferred, since we anticipate much work in the area of real-time detection and diagnosis. Details of the data requirements are listed below:

  1. Background information
  2. Problem description
  3. Variables measured
  4. Data formats
  5. Contact information. Contact information should be provided so that the PROBE administrator can resolve questions about the data. If it is acceptable to the data provider, contact information will be given to researchers so that they can ask their questions directly. To minimize the burden on data providers, researchers and the PROBE administrator will record the information collected from data providers so that questions need be answered only once.
  6. Analysis process. Researchers developing decision support tools need to understand better how management data are navigated and interpreted.
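
As an illustration only, the six items above could be captured in a small record that also reports which items a provider has not yet supplied, reflecting the point that a subset is acceptable. This is a minimal sketch and not a format defined by the PROBE Project: the class names, field names, host names, and the e-mail address are all hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class DataSet:
    """One data set within a submission (hypothetical structure)."""
    tool: str   # measurement tool, e.g. "vmstat" or "SNMP"
    node: str   # managed node the tool ran on
    kind: str   # "sampled" or "event"


@dataclass
class Submission:
    """Illustrative record mirroring the six guideline items."""
    background: Optional[str] = None
    problem_description: Optional[str] = None
    variables_measured: Optional[str] = None
    data_formats: Optional[str] = None
    contact: Optional[str] = None
    analysis_process: Optional[str] = None
    data_sets: List[DataSet] = field(default_factory=list)


def missing_items(sub: Submission) -> List[str]:
    """Return the guideline items the provider has not yet supplied."""
    return [
        f.name
        for f in fields(sub)
        if f.name != "data_sets" and getattr(sub, f.name) is None
    ]


# A partial submission: hosts on two LANs plus a connecting router.
example = Submission(
    problem_description="Slow response times on two connected LANs",
    contact="provider@example.com",  # placeholder address
    data_sets=[
        DataSet(tool="vmstat", node="host-a", kind="sampled"),
        DataSet(tool="SNMP", node="router-1", kind="sampled"),
    ],
)
print(missing_items(example))
```

A record of this kind would let the PROBE administrator see at a glance which details still have to be requested from the provider.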

Repository

The repository is administered and maintained by Professor Chuanyi Ji at Rensselaer Polytechnic Institute. Current access is by FTP and is limited to a few researchers while the repository is undergoing its initial construction. We are investigating the use of resources available to the Computer Measurement Group (e.g., a web site) as an alternative to using machines at RPI.

Responsibilities of Researchers

For the research community to get maximum benefit from the PROBE repository, certain guidelines should be followed by researchers using PROBE data.

  1. Tools developed to access, reduce, and analyze PROBE data should be submitted to the PROBE administrator for general use by the research community. Where appropriate, the tools should be developed according to standards specified by the PROBE administrator.
  2. Researchers who contact data providers must respect the providers' limited time. To this end, the PROBE administrator should be notified of any contact made. Also, researchers are responsible for recording the information obtained (e.g., definitions of measurement variables, refinements of configuration information) and providing this to the PROBE administrator.

Current Research Activities

Listed below are efforts that we anticipate will benefit from the PROBE Project.