NT Benchmarking
Dr. Bernard Domanski
The City University of New York/College of Staten Island
Email: {domanski [at] mail [dot] csi [dot] cuny [dot] edu } Phone: 732-303-1500
Overview
Our objective here is simply to identify the current state of the art in benchmarking the speed of processors capable of running Windows NT®. We will describe some of the standard processor ratings and benchmarks that currently exist in the industry, and, where possible, we will offer ratings of various processors. Our research indicates that a fair amount of work has been done in developing different benchmark suites that correspond to different workloads, each yielding different processor ratings. Finally, pointers to World Wide Web sites will be provided for the reader to access more information on a particular subject.
Existing Benchmarks
AIM's Server Benchmark For The Windows NT Operating System
AIM Technology's Server Benchmark for Windows NT is a system level WIN32-compliant Benchmark for Windows NT. This benchmark utilizes a load-mix benchmarking technology and is designed to test overall system performance of standard Windows NT Server configurations on Alpha, Intel, and PowerPC platforms.
AIM Technology uses load-mix benchmarking to test how well multi-user systems perform under different application loads. The role of load-mix benchmarking is to apply any type of load to a system running NT. The benchmark includes a pre-defined set of application mixes to model the most general uses of server systems. The two initial application mixes for the Server Benchmark are the Domain Server Mix and File Server Mix. AIM plans to develop additional application mixes in future months.
Standard Mixes
Domain Server Mix/Windows NT
The AIM Domain Server Mix/Windows NT is composed of 50 different tests from what appears to be all subsystem categories. The Domain Server Mix represents a balanced usage of subsystems that are configured as a reportedly typical enterprise shared server. (Aside: what is really typical?) The major tasks performed by this typical domain server include email, shared applications such as spreadsheets and word processing, light file transfers, network routing and packet forwarding, and network maintenance.
File Server Mix/Windows NT
The AIM File Server Mix/Windows NT is composed of 37 different tests from nearly all major subsystem categories. The File Server Mix represents a balanced usage of subsystems for a gateway file server. The major tasks performed include file transfers of various sizes (both synchronous and asynchronous), network routing and packet forwarding, system security and access permission checking, heavy memory usage, and IPC calls.
Custom User Mix
If an application load is thought of as unique, the benchmark is designed to allow customization so that it comes closer to modeling the actual environment, using what is called an application mix template. This provides more control over the mix of application loads, giving a more in-depth picture of how systems will run in the unique environment. The application mix can be customized by:
Certified Server Reports
AIM uses these benchmark results to generate third-party independent Certified Reports for Windows NT. AIM produces a Certified Report for Windows NT by combining the results from Server Benchmark, the Workstation Benchmark and the AIM Subsystem Benchmark.
These reports detail the performance results for the selected system running a specific application mix. The Server Report for NT also includes WinNT Peak Performance in application jobs per minute (the system's highest performance level when a significant amount of CPU, RAM, and disk caching is used) and the WinNT Sustained Performance (a measurement taken at maximum system capacity).
AIM Server Tests For Windows NT
The following table lists 59 different subsystem tests for the AIM Server Benchmark for Windows NT. They test the configuration in the following categories: Floating Point Performance, Integer Math Performance, Disk Performance, InterProcess Communication, Directory Routines, System and Memory Performance, Run Time Support, and Algorithmic Operations.
| add_double | div_long | num_rtns_1 |
| add_float | div_short | pipe_cpy |
| add_int | dll_test | ram_copy |
| add_long | exec_test | record_lock |
| add_short | fun_cal | series_1 |
| array_rtns | fun_cal1 | shared_memory |
| cmd_rtns_1 | fun_cal2 | sieve |
| cmd_rtns_2 | fun_cal15 | sort_rtns_1 |
| cmd_rtns_3 | heap_test | string_rtns |
| creat-clo | jmp_test | sync_disk_cp |
| dir_rtns_1 | matrix_rtns | sync_disk_rw |
| disk_cp | mem_rtns_1 | sync_disk_update |
| disk_rd | mem_rtns_2 | sync_disk_wrt |
| disk_rr | misc_rtns_1 | sync_test |
| disk_rw | mul_double | tcp_test |
| disk_src | mul_float | thread_test |
| disk_wrt | mul_int | trig_rtns |
| div_double | mul_long | udp_test |
| div_float | mul_short | virtual_test |
| div_int | new_raph |
NT Benchmark Results - Certified Server Report
Below we've reproduced a small portion of an AIM Certified Server Report, so you can get a better idea of what they provide. Performance ratings are noted by the Peak and Sustained values listed, and they are defined as:
| System Name | Peak | Sustained | TVI | Price | CPU | Cache | Clock | RAM | Disk |
| AcerAltos 110 | 3307.1 | 2681.3 | 88 | $4,940 | Pentium II (2) | 32K / 512K | 400MHz | 256MB | 2GB, 9ms (4) |
| AcerAltos 19000 Pro4 | 2671.6 | 2336.7 | 86 | $19,993 | Pentium Pro (4) | 16K / 512K | 200MHz | 1GB | 2GB, 9.4ms (9) |
| 9100B | 3830 | 2927.6 | 87 | $7,725 | Pentium II (2) | 32K / 512K | 450MHz | 512MB | 2GB, 9ms (4) |
| AcerAltos 930 | 2549.6 | 2110.7 | 84 | $6,893 | Pentium II (2) | 32K / 512K | 333MHz | 256MB | 2GB, 9.4ms (4) |
| ALR Revolution 2x | 1256.9 | 1216.6 | 87 | $5,709 | Pentium II (2) | 32K / 512K | 266MHz | 128MB | 4.2GB, 12ms (2) |
| ALR Revolution 6x6 | 4542.8 | 4029.3 | 82 | $44,660 | Pentium Pro (6) | 16K / 1MB | 200MHz | 1GB | 4.3GB, 9ms (1) 9GB, 9ms (8) |
| Apricot KT1200 (266MHz) | 1176.8 | 1108.5 | 88 | $7,468 | Pentium II (2) | 32K / 512K | 266MHz | 256MB | 4GB, 8.5ms (3) |
| Apricot KT1200 (333MHz) | 2594.5 | 2566.1 | 75 | $12,418 | Pentium II (2) | 32K / 512K | 333MHz | 512MB | 4GB, 8.5ms (4) |
| Aspen Systems Durango II | 1297.1 | 1251.7 | 84 | $5,805 | DEC 21164 (1) | 16K / 96K / 2MB | 533MHz | 128MB | 4.2GB, 12ms (2) |
| Cubix DP6200 | 1045.7 | 1004.4 | 88 | | Pentium Pro (2) | 3K / 512K | 200MHz | 1GB | 4GB, 8ms (1) |
| General Aviion 8600 (4CPU) | 3842.5 | 2932.9 | 83 | $124,642 | Pentium Pro (4) | 16K / 1MB | 200MHz | 4GB | 9GB, 8ms, 10KRPM (10) |
| General Aviion 8600 (6CPU) | 5083.6 | 3584.6 | 79 | $138,149 | Pentium Pro (6) | 16K / 1MB | 200MHz | 4GB | 9GB, 8ms, 10KRPM (10) |
| General Aviion 8600 (8CPU) | 5747.4 | 3850 | 85 | $151,655 | Pentium Pro (8) | 16K / 1MB | 200MHz | 4GB | 9GB, 8ms, 10KRPM (10) |
| Dell PowerEdge 2200 (192MB) | 1610.2 | 1554.9 | 92 | $5,584 | Pentium II (2) | 32K / 512K | 333MHz | 192MB | 2GB, 9ms (3) |
| Dell PowerEdge 2200 (256MB) | 1640.7 | 1593.7 | 92 | $5,872 | Pentium II (2) | 32K / 512K | 333MHz | 256MB | 2GB, 9ms (3) |
AIM's Subsystem Benchmark For the Windows NT Operating System
The AIM Subsystem Benchmark for Windows NT exercises and measures each component of a computer system running NT. The benchmark uses 73 subtests to generate absolute processing rates, in operations per second, for subsystems, I/O transfers, function calls, and system calls. Test results can be used to compare different machines on a test-by-test basis, or to measure the success or failure of system tuning and configuration changes on a single system. This benchmark yields specific results on a per-test basis.
The individual values shown with the Subsystem benchmark represent how well the subsystem components operate. These values can show how changes in the components, OS, or compiler options affect each subsystem. For example, differences in disk speed are very apparent with a glance at the disk tests. The amount of RAM affects the memory tests. A 20-minute test can show the results of adding RAM, changing disks, or changing CPUs.
AIM Technology's Workstation Benchmark for Windows NT® is a system level WIN32-compliant Benchmark for the Microsoft Windows NT operating system. It is designed to test overall system performance of standard Windows NT Workstation configurations on Alpha and Intel platforms.
AIM Technology uses Load/Mix Modeling to test how well multithreading systems perform in different application environments. The role of Load/Mix modeling is to allow AIM to apply a set load to any workstation running the Windows NT operating system. The benchmark includes a pre-defined application mix to model the most general uses of workstation systems. The Workstation Benchmark includes both 2-D and 3-D graphical tests, as well as the ability to run across multiple processors. The primary application mix for the Workstation Benchmark is the General Workstation Mix, which represents a balanced usage of subsystems that are configured as a typical corporate desktop workstation. The major tasks performed by the typical workstation include applications such as spreadsheets and word processors, graphical applications, Internet access, peer-to-peer connections, and email.
For more information on AIM: (800)848-8649 and www.aim.com or email: benchinfo@aim.com . AIM Technology is a wholly-owned subsidiary of Network General Corporation.
SYSmark/NT 4.0 is designed for five basic market segments: corporations that are migrating to Windows NT and need a tool to evaluate computer performance; end users who make purchasing decisions for computers running Windows NT; testing labs and publications whose benchmarking results influence purchasing decisions; systems developers who need to analyze and tune products under development; and VARs/resellers.
Benchmarking workloads within SYSmark/NT 4.0 represent word processing (WP), spreadsheet (SS), project management (PM), computer-aided design (CAD), and presentation graphics programs (Pres.).
"The demand for SYSmark/NT, our first NT-based benchmark, exceeded our expectations," says John Sammons, BAPCo's president. "We expect to have a high demand for this product from both those upgrading from our first release and from other NT users seeking performance benchmarks based on real-world business applications."
Workloads for SYSmark/NT 4.0 were developed based on BAPCo's standardized practice of surveying users to determine how they exercise popular applications in day-to-day work. The applications selected for testing had to be able to run across Intel and Alpha architectures. SYSmark/NT 4.0 can generate performance metrics as a composite of all the different applications or for a specific application, such as word processing or spreadsheets.
| Word Processing | Spreadsheets | Project Management | Computer Aided Design | Presentation Graphics | Desktop Publishing |
| Microsoft Word 6.0 (native 32-bit on all architectures) | Microsoft Excel 5.0 (native 32-bit on all architectures) | Welcom Software Technology Texim Project 2.0e (native 32-bit on all architectures) | Orcad Layout for Windows 7.0 (PCB design tool) (native 32-bit on all architectures) | Microsoft PowerPoint 4.0 (16-bit Windows emulation) | Adobe Pagemaker 6.0 |
SYSmark/NT 4.0 is available for those who want to do their own benchmarking at a nominal fee. The benchmark was developed and is fully supported by the current BAPCo membership, which includes: AER Energy Resources, Amdahl, Apricot Computers, AT&T Global Information Solutions, Client/Server Labs LLC, Compaq, Dell, Digital Equipment Corp., Duracell, EMAP Computing Labs, Gateway2000, Hewlett-Packard, IBM, InfoWorld, Intel, Lotus, Microsoft, Motorola, NEC Technologies, Texas Instruments, Unisys and Ziff-Davis Labs.
The following summary of the top 20 NT performers using the Sysmark32 benchmark comes from http://www.ideasinternational.com/benchmark/bapco/sysnt4.html:
| Rank | System | CPU | CLK (MHz) | Mem | SYSmark/NT (ver. 4.0) | SS | PM | WP | Pres. | CAD |
| 1 | IBM IntelliStation M Pro 6889 -modified | Intel Pentium II (Dual) | 450 | 128MB | 435 | 438 | 479 | 372 | 496 | 403 |
| 2 | HP Kayak XU 6/400 Slot2 DP fastRAID | Intel Pentium II Xeon (Dual) | 400 | 128MB | 418 | 427 | 479 | 356 | 461 | 378 |
| 3 | HP Kayak XU 6/400 DP 2x4.5GB in RAID0 | Intel Pentium II (Dual) | 400 | 128MB | 408 | 417 | 451 | 354 | 463 | 368 |
| 4 | IBM IntelliStation M Pro 6889 -modified | Intel Pentium II (Dual) | 400 | 128MB | 407 | 421 | 448 | 352 | 457 | 368 |
| 5 | HP Kayak XU 6/400 DP 9.1GB | Intel Pentium II (Dual) | 400 | 128MB | 399 | 397 | 451 | 335 | 459 | 366 |
| 6 | IBM IntelliStation M Pro 6889 -modified | Intel Pentium II (Dual) | 400 | 128MB | 397 | 396 | 445 | 336 | 450 | 368 |
| 7 | Seattle -modified | Intel Pentium II | 450 | 64MB | 394 | 325 | 513 | 325 | 425 | 413 |
| 8 | HP Kayak XU 6/400 4.5 GB | Intel Pentium II | 400 | 128MB | 379 | 321 | 486 | 325 | 409 | 377 |
| 9 | DP5400 | Intel Pentium II (Dual) | 400 | 260MB | 373 | 371 | 406 | 308 | 443 | 353 |
| 10 | Intel SE440BX -modified | Intel Pentium II | 400 | 64MB | 368 | 307 | 481 | 310 | 393 | 375 |
| 11 | Gateway E-4200 | Intel Pentium II | 400 | 128MB | 362 | 298 | 471 | 297 | 394 | 379 |
| 12 | Compaq Deskpro EP Series 6400 | Intel Pentium II | 400 | 64MB | 341 | 280 | 430 | 280 | 377 | 362 |
| 13 | HP Kayak XU 6/333 DP 2x4.5GB Fast Raid -modified | Intel Pentium II (Dual) | 333 | 128MB | 340 | 351 | 380 | 297 | 376 | 304 |
| 14 | IBM IntelliStation M Pro 6898 -modified | Intel Pentium II (Dual) | 333 | 128MB | 340 | 358 | 374 | 302 | 370 | 303 |
| 15 | Duracom Performa 6221 | Intel Pentium II | 400 | 64MB | 339 | 270 | 449 | 281 | 366 | 360 |
| 16 | IBM IntelliStation M Pro 6898 -modified | Intel Pentium II (Dual) | 333 | 128MB | 338 | 352 | 375 | 298 | 369 | 306 |
| 17 | IBM IntelliStation M Pro 6898 -modified | Intel Pentium II (Dual) | 333 | 128MB | 337 | 346 | 379 | 291 | 375 | 303 |
| 18 | Nova P-II 400 | Intel Pentium II | 400 | 64MB | 337 | 277 | 428 | 276 | 362 | 369 |
| 19 | HP Kayak XU 6/333 DP 4.5GB | Intel Pentium II (Dual) | 333 | 128MB | 335 | 339 | 378 | 292 | 374 | 302 |
| 20 | Intergraph TDZ 2000 | Intel Pentium II (Dual) | 333 | 128MB | 334 | 337 | 375 | 292 | 368 | 304 |
For more information, contact BAPCo, 2200 Mission College Blvd., RN2-02, Santa Clara, CA 95052; phone: 408-988-7654; fax: 408-765-4920; Internet: http://www.bapco.com.
ByteMark
This data is based on a usenet post to comp.sys.ibm.pc.hardware.chips by Eric Mintz. Byte Magazine's "Byte CPU DOS 32" is used for testing. It's popular because it's relatively easy to run and report, it's easy to get and free, and they publish the source code. But it really is a processor-only type of test.
| # | CPU | Int | FP | NS | SS | Bit | FPem | Four | Assg | Idea | Huff | NNet | LUde |
| 3. | i486DX2-66 | 0.41 | 0.25 | 0.43 | 0.27 | 0.43 | 0.50 | 0.28 | 0.38 | 0.40 | 0.49 | 0.20 | 0.30 |
| 6. | i486DX2-66 | 0.42 | 0.26 | 0.42 | 0.36 | 0.43 | 0.50 | 0.28 | 0.38 | 0.40 | 0.48 | 0.20 | 0.31 |
| 2. | Am486DX2/80 | 0.51 | 0.31 | 0.52 | 0.44 | 0.52 | 0.60 | 0.34 | 0.46 | 0.48 | 0.57 | 0.25 | 0.34 |
| 11. | AM5x86-133 | 0.58 | 0.39 | 0.80 | 0.89 | 0.87 | 0.50 | 0.28 | 0.37 | 0.40 | 0.48 | 0.42 | 0.50 |
| 15. | AMD486dx4-100 | 0.60 | 0.38 | 0.65 | 0.74 | 0.41 | 0.55 | 0.60 | 0.71 | 0.28 | 0.39 | 0.59 | 0.36 |
| 12. | AM5x86-133 | 0.61 | 0.25 | 0.80 | 0.89 | 0.87 | 0.81 | 0.28 | 0.37 | 0.40 | 0.47 | 0.21 | 0.26 |
| 5. | Am486DX4/100 | 0.61 | 0.37 | 0.63 | 0.41 | 0.65 | 0.74 | 0.42 | 0.56 | 0.60 | 0.72 | 0.29 | 0.42 |
| 1. | iPOD-83 | 0.79 | 0.85 | 0.81 | 0.47 | 0.89 | 0.85 | 0.89 | 0.86 | 0.91 | 0.86 | 0.85 | 0.82 |
| 10. | iP5-75 | 0.84 | 0.96 | 0.88 | 0.84 | 0.83 | 0.84 | 0.85 | 0.88 | 0.84 | 0.77 | 0.91 | 1.17 |
| 9. | AMD5x86-133 | 0.86 | 0.49 | 0.79 | 0.89 | 0.87 | 1.01 | 0.57 | 0.74 | 0.80 | 0.95 | 0.42 | 0.50 |
| 4. | Cx5x86-120 | 0.97 | 0.69 | 0.86 | 1.46 | 0.94 | 1.02 | 0.81 | 0.79 | 0.89 | 0.93 | 0.55 | 0.73 |
| 13. | AM5x86-160 | 1.01 | 0.58 | 0.87 | 1.07 | 1.03 | 1.21 | 0.67 | 0.84 | 0.95 | 1.14 | 0.51 | 0.57 |
| 14. | AM5x86-160 | 1.02 | 0.58 | 0.91 | 1.07 | 1.04 | 1.21 | 0.68 | 0.87 | 0.95 | 1.14 | 0.51 | 0.58 |
| 17. | AM5x86-160 | 1.03 | 0.59 | 0.94 | 1.07 | 1.04 | 1.21 | 0.68 | 0.88 | 0.95 | 1.15 | 0.51 | 0.59 |
| 7. | iP5-100 | 1.10 | 1.21 | 1.12 | 1.08 | 1.10 | 1.12 | 1.12 | 1.16 | 1.11 | 1.01 | 1.16 | 1.35 |
| 16. | P100 | 1.17 | 1.12 | 1.11 | 1.13 | 1.13 | 1.17 | 1.12 | 1.02 | 1.20 | 1.53 | 1.12 | 1.27 |
| 8. | iP5-120 | 1.30 | 1.38 | 1.27 | 1.28 | 1.33 | 1.35 | 1.35 | 1.36 | 1.33 | 1.21 | 1.35 | 1.45 |
| 18. | iP5-166 | 1.80 | 1.74 | 1.82 | 1.73 | 1.84 | 1.87 | 1.86 | 1.87 | 1.84 | 1.68 | 1.89 | 1.48 |
Legend: Int = Overall Integer, FP = Overall Floating Point, NS = Numeric Sort, SS = String Sort, Bit = Bitfield, FPem = Floating Point Emulation, Four = Fourier, Assg = Assignment, Idea = Idea, Huff = Huffman, NNet = Neural Net, LUde = LU Decomposition
SPEC
What is SPEC
SPEC, the Standard Performance Evaluation Corporation, is a non-profit corporation formed to "establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers" (from SPEC's bylaws). SPEC's founders believe that the user community will benefit greatly from an objective series of applications-oriented tests that can serve as common reference points and be considered during the evaluation process. While no one benchmark can fully characterize overall system performance, the results of a variety of realistic benchmarks can give valuable insight into expected real performance.
The list of current members of SPEC includes: (1) Open Systems Group (OSG): Amdahl, AT&T, Auspex Systems, Bull, Compaq Computer, Cray Research, Dansk Data Elektronik, Data General, Digital Equipment, Electronic Data Systems, FirePower Systems, Fujitsu, HAL Computer Systems, Hewlett-Packard, Hitachi, IBM, Intel, Intergraph, International Computer (ICL), Locus Computing, Motorola, Network Appliance, Nikkei Datapro, Novell, Olivetti, Pyramid Technology, Ross Technology, Siemens Nixdorf Informationssysteme, Silicon Graphics, Sun Microsystems, Tricord Systems, Unisys, and Ziff-Davis Publishing.
The Open Systems Group associates include: Center for Scientific Computing (Finland), Defense Logistics Agency / Systems Design Center, Leibniz-Rechenzentrum (Germany), NASA AMES Research Center, National Taiwan University (Taiwan), Oregon Graduate Institute, OSF Research Institute, Princeton University, Technische Universitaet Chemnitz-Zwickau (Germany), University of Aizu (Japan), and University of California at Berkeley.
(2) High Performance Group (HPG): Convex Computers, Cray Research, Digital Equipment, Electronic Data Systems, Fujitsu America, Hewlett-Packard, International Supercomputing Technology Institute (ISTI, France), Kuck & Associates, NEC/HNSX Supercomputer, Silicon Graphics, and Sun Microsystems.
Legally, SPEC is a non-profit corporation registered in California. SPEC basically performs two functions:
The SPEC organization comprises two groups, each with its own benchmarks. The Open Systems Group (OSG) covers benchmarks in a UNIX, NT, or VMS environment. The High Performance Group (HPG) covers benchmarking in a numeric computing environment, with emphasis on high-performance numeric computing. Many people think only of the OSG benchmarks (often only the CPU benchmarks) when they hear "SPEC", as these are indeed the best known of SPEC's benchmarks.
How to Contact SPEC
SPEC (Standard Performance Evaluation Corporation)
10754 Ambassador Drive, Suite 201
Manassas, VA 22110, USA
Phone: (703) 331-0180 Fax: (703) 331-0181
E-Mail: info@spec.org
Ziff-Davis Benchmarks
Ziff-Davis provides three different benchmarks at no charge that test application servers (ServerBench), file servers (NetBench), and web servers (WebBench). These are summarized here in terms of what they are, what they measure, and what benchmark tricks are possible to improve scores.
ServerBench 4.0 - http://www.zdnet.com/zdbop/svrbench/svrbench.html
What Is ServerBench? ServerBench measures the performance of application servers in a client/server environment. It provides an overall score for a server and individual scores for the clients, which are PCs running Windows® 95 or Windows NT®. Test runs are started and monitored from the controller, which is a PC running Windows 95 or Windows NT.
TPS: The Units Used To Report Scores Results are in TPS, or transactions per second. Each client measures how long each transaction takes and how many transactions ran. The client calculates its TPS score by dividing the total number of transactions by the total amount of time for completion. ServerBench combines the individual client TPS scores to calculate the overall server score.
ServerBench provides an overall measure of how a server's performance stacks up against others if the standard system test suite was run. If one of ServerBench's subsystem test suites is run, the results obtained indicate how well that server subsystem is performing. The higher the score, the better your application server performed.
What Is A "Transaction" A transaction consists of the request a client sends to the server, the response it gets back, and the time it takes from the moment the client sends the request until it receives the reply from the server. ServerBench breaks down the time the transaction spends traveling along the network to and from the server, waiting in a queue on the server to receive a service, and receiving service.
How Performance Is Measured ServerBench uses a weighted harmonic mean to calculate the overall server performance score. By using a harmonic mean, the scores for different transactions can be combined to create a single representative score. Different transactions are weighted based on how often clients request each transaction in one iteration of a mix.
To calculate the overall server TPS score, the ServerBench benchmark:
· Tracks the amount of time each transaction takes to complete.
· Tallies the number of completed transactions. Incomplete transactions or transactions that began during the Ramp up or Ramp down periods at the start and end of a benchmark run are not counted.
· Creates a total TPS score for each transaction by adding together the TPS score for each client.
· Uses a weighted harmonic mean to turn the total TPS scores into a single overall server score.
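The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the transaction names, and the sample numbers are hypothetical, and the textbook weighted harmonic mean is used because the source does not give ServerBench's exact formula. The weights stand for how often each transaction is requested in one iteration of a mix.

```python
def client_tps(transactions_completed, elapsed_seconds):
    """TPS for one client: completed transactions divided by elapsed time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return transactions_completed / elapsed_seconds


def weighted_harmonic_mean(values, weights):
    """Textbook weighted harmonic mean: sum(w) / sum(w / v)."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    if any(v <= 0 for v in values) or any(w <= 0 for w in weights):
        raise ValueError("values and weights must be positive")
    return sum(weights) / sum(w / v for v, w in zip(values, weights))


# Hypothetical mix with two transaction types.
# Per-transaction totals are the sums of the per-client TPS scores.
total_tps = {
    "read": client_tps(1200, 60) + client_tps(1200, 60),   # 20 + 20 = 40 TPS
    "write": client_tps(300, 60) + client_tps(300, 60),    # 5 + 5 = 10 TPS
}
# Assumed request counts per iteration of the mix (the weights).
requests_per_iteration = {"read": 3, "write": 1}

names = list(total_tps)
overall = weighted_harmonic_mean(
    [total_tps[n] for n in names],
    [requests_per_iteration[n] for n in names],
)
print(f"overall server score: {overall:.2f} TPS")
```

A harmonic mean is pulled toward the slower transaction, so a server cannot earn a high overall score by being fast on only one transaction type.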
Setting Test Parameters To Get The Best Results Adjusting the test parameters for different mixes can also influence results. Specifically, Ziff-Davis points out:
· Number Of Transaction Iterations. The higher the value, the greater the weight that transaction will have in the test results. In addition, the more times a client requests a transaction during an iteration of a mix, the longer one iteration of the mix takes.
· Number Of Test Iterations. The more test iterations specified, the longer the transaction takes, which lowers TPS scores. TPS scores may be lower than expected because each client measures the amount of time it takes the server to get a transaction and then send a reply.
· Number Of Iterations For The Processor Test. The more iterations specified as the Total Size value for the Processor test, the larger the CPU load that the test places on the system. By creating tests with different Processor test iterations, improvements are made in the way the benchmark stresses the processor subsystem on servers with different processor power.
· Data Segment Size. When the size is increased, the amount of file data accessed during a test is increased. This has a significant impact on the file system cache. As you add memory to the server, this parameter should be increased so as to stress the disk subsystem.
· Segment Access Ratio. This indicates how widely the file data is spread out. Changing it spreads the file data across more physical disk space without increasing the amount of data being accessed.
Creating Test Mixes To Stress The Server If the goal is to stress the server to the point where TPS drops off, then these "hints" from Ziff-Davis will likely help:
Determining What To Test On Your System When studying the different tests, consider what would be the most useful information desired about a system. For example:
Additional information: To see the effects of changing parameters, change only one parameter at a time between benchmark tests. That policy will let you gauge each parameter's effect on performance. The server parameters to consider changing are:
Factoring In Multitasking Multitasking is at the core of the ServerBench tests. Many tasks share processing time in a multitasking environment. Tasks are swapped in and out of memory by the processor. Many of the machines ServerBench and the other benchmarks run on use multiple processors. The following, though, is a single processor example.
Each mix contains a fixed number of clients. The mix tells the clients which specific transactions to send to the server. The clients begin the test and start sending transaction requests to the server. The server, instead of telling the clients to wait while it handles a single client's request, takes all the transactions from all the clients. Because a processor can only respond to one request at a time, the server gathers up all the other requests and puts them in a queue to wait for the processor time they need. If there are many clients sending requests, the queues can get unruly. Meanwhile, as the clients wait for the server to reply to their tests, they're running a clock and calculating how much time the server is taking.
By knowing how many clients are competing for processing time and how much time it takes for them to get service, the throughput for the tests can be calculated and the point at which throughput drops off can also be identified.
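The single-processor queueing behavior described above can be mimicked with a small simulation. This is a toy sketch, not how ServerBench works internally: it assumes every request needs the same fixed service time, that each client resubmits the moment it gets a reply, and that network time is zero. The function name and the numbers are hypothetical.

```python
from collections import deque


def simulate_single_processor(num_clients, service_ms, duration_ms):
    """Simulate one processor serving a FIFO queue of client requests.

    Returns (throughput in transactions per second, mean response time in ms).
    Times are whole milliseconds so the arithmetic stays exact.
    """
    if num_clients < 1 or service_ms <= 0 or duration_ms < service_ms:
        raise ValueError("need at least one client and room for one transaction")

    # Every client submits its first request at time 0.
    waiting = deque((0, client) for client in range(num_clients))
    clock_ms = 0
    completed = 0
    total_response_ms = 0

    while clock_ms + service_ms <= duration_ms:
        submitted_ms, client = waiting.popleft()
        clock_ms += service_ms                       # the processor handles one request
        completed += 1
        total_response_ms += clock_ms - submitted_ms  # the clock the client was running
        waiting.append((clock_ms, client))            # client immediately asks again

    return completed / (duration_ms / 1000), total_response_ms / completed


for clients in (1, 4, 16):
    tps, response = simulate_single_processor(clients, service_ms=10, duration_ms=10_000)
    print(f"{clients:2d} clients: {tps:6.1f} TPS, mean response {response:6.2f} ms")
```

In this idealized model the processor is already saturated by a single client, so adding clients lengthens each client's wait without raising total throughput. Real servers have other resources that sit idle at low client counts, which is why measured throughput usually rises at first.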
Figuring Out The Knees In The Curves A typical curve that plots a server's total TPS throughput against the number of active clients is shown below. The point at which the number of clients is increasing but the throughput starts to level off is called the knee of the curve.
A Results Curve Containing a Single Knee

As the number of clients sending requests to a server increases, the total throughput on the server goes up. This is because at low client counts, the server is not being utilized to its full potential. Some of the components of a computer system will remain idle.
With a server, as the number of client requests increases, the overhead of managing the requests also increases. The additional requests take more memory, there is more work for the scheduler, there is only a limited bandwidth for the network, and so on. Eventually, the throughput starts decreasing, indicating the server reached the knee of the curve. This is the point where the contention for resources and the overhead of managing the resources causes throughput to decrease.
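One simple way to locate a knee in measured results is to look for the first point where the gain per added client falls well below the initial gain. The sketch below is a heuristic of my own choosing, not a ServerBench feature; the function name, the 10% threshold, and the sample data are all hypothetical.

```python
def find_knee(clients, tps, threshold=0.1):
    """Return the client count where throughput starts to level off.

    The knee is taken to be the first measurement after which the slope
    (TPS gained per added client) drops below `threshold` times the slope
    of the first segment. Returns None if no such point exists.
    Assumes `clients` is strictly increasing and the first slope is positive.
    """
    if len(clients) != len(tps) or len(clients) < 3:
        raise ValueError("need at least three matching measurements")

    initial_slope = (tps[1] - tps[0]) / (clients[1] - clients[0])
    for i in range(1, len(clients) - 1):
        slope = (tps[i + 1] - tps[i]) / (clients[i + 1] - clients[i])
        if slope < threshold * initial_slope:
            return clients[i]
    return None


# Hypothetical results: throughput climbs, flattens, then dips slightly.
clients = [1, 4, 8, 12, 16, 20, 24]
tps = [10, 40, 78, 110, 118, 119, 115]
print("knee at", find_knee(clients, tps), "clients")
```

If the function returns None, the curve never flattened, which corresponds to the case where the test load needs to be increased before a knee appears.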
A server's configuration can affect where the knee of the curve occurs. Peak throughput can often be improved by:
There are three general types of curves that you will see with ServerBench as well.
A Results Curve With A Double Knee

To get a valid measure of a server's performance, run the benchmark test until you see the knee in the server's performance curve. This means the client load should be increased until the TPS scores begin to level off. In the standard test suites, add clients incrementally as the tests run (i.e., start with 1 client, then run the test with 4 clients, next 8 clients, and so on).
If the ServerBench test suites are run and don't hit a knee, then the tests need to be made more intense. Try:
How To Evaluate Throughput In Terms Of Your Server As you examine the throughput graphs for different servers, keep in mind that the server with the best peak throughput number may not be the best server for your requirements. Exactly how the server handles intense loads must be studied. ServerBench measures how much the throughput degrades as the load is increased on a server.
An example: Suppose two servers are being compared. Server A reaches a knee at 30 clients and then experiences a sharp drop in throughput. Server B, however, reaches a knee at 25 clients and then begins a very slow drop in throughput. If you only have 25 to 30 users, then Server A is probably the server you want. But if you have a workload equivalent to 40 clients, Server B may be the better server for you. This is because, while the throughput on Server B drops off sooner than on Server A, it doesn't deteriorate as sharply, so more users will probably get reasonable service. Since Server A's throughput shows such a sharp drop, if you have a workload equivalent to more than 30 clients, some of those clients will get really poor service.
System Throughput Versus Client Response When looking at results, you may notice that the system throughput as shown by the overall score behaves differently from the throughput for the individual clients. In other words, system throughput may increase even as client response time gets longer. Don't worry. This is normal.
To understand how this happens, consider the following example. Imagine you are at a bank where each transaction takes ten minutes. There are four tellers and you. From your point of view, the process took you ten minutes (i.e., that is your response time). From the bank's point of view, one transaction occurred in ten minutes; thus, the transaction throughput was one in ten minutes or .1 per minute. Now, go through the same process but this time you are one of four customers. You still have to wait ten minutes, but the throughput is now four in ten minutes or .4 per minute.
Consider this scenario one more time but with eight customers instead of four. The tellers are working harder (probably, they're not asking people how they are any more), so the transaction time is now eight minutes. This means that, if you were at the end of the line, you had to wait sixteen minutes. This response time seems bad to you; however, throughput from the bank's point of view is now eight in sixteen minutes or .5 per minute. In other words, the bank's throughput improved even though your response time increased.
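The arithmetic in the bank example can be checked with a short script. It is a minimal sketch that assumes customers are served in rounds of up to four at a time (one per teller) and that the response time of interest is the wait of the last customer in line; the function name is hypothetical.

```python
import math


def bank_scenario(customers, tellers, minutes_per_transaction):
    """Return (wait of the last customer in minutes, throughput per minute)."""
    rounds = math.ceil(customers / tellers)        # how many waves of customers
    total_minutes = rounds * minutes_per_transaction
    throughput = customers / total_minutes
    return total_minutes, throughput


scenarios = [
    (1, 4, 10),  # you alone, ten-minute transactions
    (4, 4, 10),  # four customers, one per teller
    (8, 4, 8),   # eight customers, tellers now take eight minutes each
]
for customers, tellers, minutes in scenarios:
    wait, rate = bank_scenario(customers, tellers, minutes)
    print(f"{customers} customers: last one waits {wait} min, "
          f"throughput {rate:.1f} per minute")
```

The three cases reproduce the figures in the text: 0.1, 0.4, and 0.5 transactions per minute, with the last customer's wait rising from ten to sixteen minutes in the final case.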
Similarly, suppose you are running the benchmark on a four-processor, totally symmetric multiprocessing system and a test mix is run that only does processor tests. It takes the first client 150 milliseconds to run the test. It takes the same amount of time for the second client and also for the third client. Thus, from the client's point of view, nothing has changed: the test took the same amount of time for each client. From the server's point of view, it has tripled its throughput. This is a case where the overall system throughput improves but the client response times stay the same. In evaluating a server, consider both the system throughput and the client response time. You want to find out how evenly the server handles the clients.
Processor Scaling With a multiprocessor system, consider, too, the effect that the number of processors has on the scores. Within a certain range, the more processors a system has, the better its scores. A point will be reached where the overhead required to manage more processors outweighs the advantage of having them. Finding how much throughput improves in relation to the number of processors is called processor scaling.
The multiprocessor systems likely to be used with ServerBench and other NT benchmarks are tightly coupled processors. This means all the processors have an equal access to memory. While access to memory is an advantage, it also results in overhead. The system has to coordinate access to memory. This means the hardware and software spend more time making this processing system work.
For example, with two processors, you would expect to see throughput double. However, because of the overhead of managing an additional processor, throughput may only increase by a rate of something like 1.8, not 2. Even at that rate, there's a definite advantage to having two processors. Eventually, though, as processors are added, the server's throughput rate will start dropping instead of increasing. At that point, the peak of processor scaling will have been passed and we are better off not adding another processor.
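Processor scaling can be summarized by dividing each configuration's throughput by the single-processor throughput. The sketch below uses made-up throughput figures chosen so that two processors give the 1.8x gain mentioned above; the function names and the numbers are hypothetical, not measured results.

```python
def scaling_report(throughput_by_cpus):
    """Map processor count -> (speedup over 1 CPU, efficiency per CPU)."""
    if 1 not in throughput_by_cpus:
        raise ValueError("a single-processor measurement is required as the baseline")
    base = throughput_by_cpus[1]
    report = {}
    for cpus in sorted(throughput_by_cpus):
        speedup = throughput_by_cpus[cpus] / base
        report[cpus] = (speedup, speedup / cpus)
    return report


def best_processor_count(throughput_by_cpus):
    """Processor count with the highest measured throughput."""
    return max(throughput_by_cpus, key=throughput_by_cpus.get)


# Hypothetical TPS measurements for the same server with 1, 2, 4, and 8 CPUs.
measured = {1: 100.0, 2: 180.0, 4: 310.0, 8: 290.0}

for cpus, (speedup, efficiency) in scaling_report(measured).items():
    print(f"{cpus} CPUs: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
print("peak of processor scaling at", best_processor_count(measured), "CPUs")
```

In this made-up data set the eight-processor configuration is slower than the four-processor one, which is the situation the text describes as having passed the peak of processor scaling.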
While bad processor scaling or even a lack of processor scaling is not a problem that would be encountered often, it is something to think about as scores are evaluated. Processor scaling is especially important when comparing servers with different numbers of processors; for example, if one alternative being evaluated is to upgrade the current server with more CPU engines.
ServerBench Sample Reports




NetBench 5.01 - http://www.zdnet.com/zdbop/netbench/netbench.html
NetBench is a portable Ziff-Davis benchmark program that measures the performance of file servers as they handle requests from DOS, Windows® 95, Windows for Workgroups, Windows NT®, and Mac® OS clients.
How NetBench Measures
NetBench provides a measure of how well a server handles network file I/O by having each client in the test make repeated requests to the server for file I/O service. Each client records the number of bytes of data moved and divides this number by the amount of time required to move the data. NetBench totals all the client throughput scores to determine the overall throughput for the server.
NetBench does not distinguish between the server subsystems. For example, NetBench can't determine the relative performance differences between the disk and the network.
The Units NetBench Uses To Report Its Scores
NetBench reports its overall server scores in both bytes per second and megabits per second. The key unit of measure for NetBench, though, is bytes per second. NetBench uses bytes per second to report all other test information, such as individual client throughputs.
To determine the overall server scores, NetBench totals the individual throughput for all the clients. The higher the score, the better the file server performed. Some publications convert NetBench's bytes per second results to bits per second, megabits per second, or kilobits per second.
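The calculation described above can be sketched as follows. This is an illustration of the arithmetic only, not NetBench's actual code: the function names and client figures are hypothetical, and the conversion assumes one megabit is 1,000,000 bits, which may differ from the convention a given publication uses.

```python
def client_throughput(bytes_moved, seconds):
    """Bytes per second for a single client."""
    return bytes_moved / seconds


def server_score(clients):
    """Total client throughput, returned as (bytes/s, megabits/s).

    `clients` is a list of (bytes_moved, seconds) pairs, one per client.
    Assumes 1 megabit = 1,000,000 bits.
    """
    total_bytes_per_sec = sum(client_throughput(b, s) for b, s in clients)
    megabits_per_sec = total_bytes_per_sec * 8 / 1_000_000
    return total_bytes_per_sec, megabits_per_sec


# Hypothetical results for three clients: (bytes moved, elapsed seconds).
clients = [(6_000_000, 10.0), (4_500_000, 10.0), (3_000_000, 12.0)]

total_bps, total_mbps = server_score(clients)
print(f"{total_bps:,.0f} bytes/s = {total_mbps:.1f} megabits/s")
```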
Some Concepts That Play A Role In NetBench's Results
Here's a summary of some of the NetBench concepts:
Suggestions For Improving Your Server's Results
Because NetBench tests a system as a whole instead of isolating the individual components of a server, everything about a system can affect the scores. Both hardware and software factors can influence NetBench results. For example, CPU speed obviously affects results, though it might not be the primary factor. Perhaps less obvious, though, is that a fragmented disk can slow down the NetBench tests as well.
Server scores can be improved if you:
Determining What To Test On Your System
When looking at NetBench tests, consider what would be the most useful information you can get about the system. For example:
WebBench 1.1 - http://www.zdnet.com/zdbop/webbench/webbench.html
WebBench 1.1 provides a way to measure the performance of Web servers. WebBench uses client PCs to simulate Web browsers. However, unlike actual browsers, the clients don't display the files that the server sends in response to their requests. Instead, when a client receives a response from the server, it records the information associated with the response and then immediately sends another request to the server.
WebBench In Brief
When WebBench is installed, a set of HyperText Markup Language (HTML) files is copied into the Web server's document root (i.e., the location where the Web server looks for its HTML files). The WebBench clients use HyperText Transfer Protocol (HTTP) to request these files from the server. Each client keeps a count of how many bytes the server transfers to it, how many requests succeed, and how many requests fail during either the connection phase or the transfer phase of the request. When the test mix ends, WebBench takes the individual client results, combines them, and produces two overall server scores: requests per second and throughput as measured in bytes per second. WebBench's results tables also include details about how each client performed.
An Overview Of WebBench

How WebBench Measures Performance
WebBench measures a Web server's performance by having each client issue HTTP requests to the server. The server responds to these requests based on the URL in the HTTP request. The URL can point to any of the typical HTML elements: files, graphics, programs, and so on. For the standard test suites in WebBench 1.1, the server always sends an HTML file in response to a client request.
The Units WebBench Uses To Report Its Scores
WebBench produces two overall Web server scores: requests per second and throughput as measured in bytes per second. Requests per second is a basic measure of the client/server interaction. Clients count only completed requests; that is, requests where the client sends a request to the server and the server sends a response to the client. By looking at the server's requests per second score, you can get a pretty good idea of how many hits per day that server can handle. (Hits are one way people measure a Web server's capacity.) Because WebBench acts as a stress test, it loads the server with requests as fast as the clients can issue them. As a result, the requests per second score that WebBench generates is higher than the typical load serviced by most Web servers.
The server's throughput score indicates how many bytes per second the server is moving to the clients. The value seen for throughput depends in part on the distribution of the file sizes in the workload file. If the clients are making a higher percentage of requests for large files, then they will be receiving more bytes from the server and, at least initially, throughput will go up. Once either the network or the server becomes saturated, the throughput will flatten out. If the clients request a greater quantity of smaller files, then they will make more requests per second, but the throughput won't be as high because the server won't move as many bytes. WebBench counts as throughput only the number of bytes in the HTML file the Web server sends the clients. Note that WebBench doesn't include the bytes from the HTTP file header, the TCP/IP header, or the Ethernet header when it records its throughput scores.
To calculate the overall server scores, WebBench totals the individual requests per second and throughput scores for all the clients. The higher the score, the better the Web server performed.
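The two overall scores, and the rough hits-per-day estimate mentioned earlier, can be sketched as shown below. This is a hedged illustration rather than WebBench's implementation: the `ClientResult` type, its field names, and the sample figures are all hypothetical, and the hits-per-day figure simply assumes the server could sustain the stress-test rate for 24 hours, so it is best read as a ceiling.

```python
from typing import NamedTuple


class ClientResult(NamedTuple):
    completed_requests: int  # only requests that received a full response
    html_bytes: int          # HTML file bytes only; protocol headers excluded
    seconds: float           # length of the test mix for this client


def overall_scores(results):
    """Sum per-client rates into (requests per second, bytes per second)."""
    requests_per_sec = sum(r.completed_requests / r.seconds for r in results)
    bytes_per_sec = sum(r.html_bytes / r.seconds for r in results)
    return requests_per_sec, bytes_per_sec


def hits_per_day_ceiling(requests_per_sec):
    """Upper bound on daily hits if the stress-test rate held all day."""
    return requests_per_sec * 86_400  # seconds in a day


# Hypothetical results from two clients over a 60-second test mix.
results = [
    ClientResult(completed_requests=3000, html_bytes=18_000_000, seconds=60.0),
    ClientResult(completed_requests=2400, html_bytes=14_400_000, seconds=60.0),
]

rps, bps = overall_scores(results)
print(f"{rps:.0f} requests/s, {bps:,.0f} bytes/s")
print(f"at most {hits_per_day_ceiling(rps):,.0f} hits/day")
```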
Some Concepts That Play A Role In The Results
Some of the WebBench concepts that are important to consider are:
Additional information: Adding clients increases the total requests per second and throughput, up to a point. When the overhead of managing the additional clients outweighs the advantage of having more clients, these two numbers stop increasing. This is the point where the results curve starts to flatten out. With WebBench, one normally sees a rising slope at low client counts followed by a flat line once the server becomes saturated.
(Figure: the types of results curves seen in WebBench testing.)
The table below summarizes the three Ziff-Davis benchmarks.
| ServerBench | NetBench | WebBench |
| Measures application servers. | Measures file servers. | Measures Web servers. |
| Provides an overall TPS score for your server and individual scores for clients. The scores show how well servers handle client requests for a variety of operations. | Provides an overall I/O throughput score for your server and individual scores for clients. The scores show how well servers handle client requests for network file operations. | Provides two overall server scores (requests per second and throughput as measured in bytes per second) as well as individual client scores. These scores show how well a server handles client HTTP requests and how many bytes per second the server is moving to the clients. |
| Accepts Windows 95 and Windows NT clients. | Accepts DOS, 16-bit Windows, and 32-bit Windows PC clients and Mac® OS clients. | Accepts Windows 95 and Windows NT clients. |
| Requires a server program, a controller program, and a client program. | Requires only a controller program and a client program. | Requires a controller program and a client program; however, you do need to place files on the server. In addition, the server must be running a third-party Web server program. |
| Requires you to use a Winsock 1.1 compliant TCP/IP stack on the controller and the clients. | Runs on top of whatever protocol your system uses. | Requires you to use a Winsock 1.1 compliant TCP/IP stack on the controller and the clients. |
| Uses proprietary programs to communicate only with itself. | Accesses data through a publicly available API. | Accesses data through HyperText Transfer Protocol (HTTP). |
| Runs only on specific server platforms. | Runs on any server that provides shared file access to the controller and clients. | Runs static (.HTML only) tests on any Web server; runs CGI, ISAPI, NSAPI, and IntranetWare 4.11 NLM tests on specific server platforms only. |
| CPU power and disk I/O play a very large role in how well your server performs. | The server's disk I/O speed and the network I/O speed have the most influence on test scores. | The server's CPU power, the server's memory size, and the network I/O speed all affect your Web server's performance. |
| Reports scores as TPS (transactions per second). | Reports scores as bytes per second. | Reports scores as requests per second and throughput as measured in bytes per second. |
Summary
By no means should the reader think that the benchmark suites covered in this article are a complete list. New benchmark techniques are always being developed, and we hope to cover two additional tools, Client/Server Solutions' Benchmark Factory 97 and Bluecurve's Dynameasure, in depth in a subsequent article.
Our objective was to show some of the many current, state-of-the-art benchmarks for Windows NT®. We must point out that benchmark results reflect both the system the benchmark runs on and the characteristics of the underlying software. While these benchmarks describe some of the standard processor ratings that currently exist in the industry, potential benchmark users must be careful about assuming that the results achieved in an industry benchmark will apply to their own specific computing environment. Clearly, considerable work has been done in developing different benchmark suites that correspond to different workloads, each yielding different processor ratings. Be sure that any differences between what was run and what you will run can be explained.