
NT Benchmarking

Dr. Bernard Domanski

The City University of New York/College of Staten Island

Email: {domanski [at] mail [dot] csi [dot] cuny [dot] edu } Phone: 732-303-1500

Overview

Our objective here is simply to identify the current state of the art in benchmarking the speed of processors capable of running Windows NT®. We will describe some of the standard processor ratings and benchmarks that currently exist in the industry, and, where possible, we will offer ratings of various processors. Our research indicates that a fair amount of work has been done in developing different benchmark suites that correspond to different workloads, each yielding different processor ratings. Finally, pointers to World Wide Web sites will be provided for the reader to access more information on a particular subject.

Existing Benchmarks

AIM's Server Benchmark For The Windows NT Operating System

AIM Technology's Server Benchmark for Windows NT is a system level WIN32-compliant Benchmark for Windows NT. This benchmark utilizes a load-mix benchmarking technology and is designed to test overall system performance of standard Windows NT Server configurations on Alpha, Intel, and PowerPC platforms.

AIM Technology uses load-mix benchmarking to test how well multi-user systems perform under different application loads. The role of load-mix benchmarking is to apply any type of load to a system running NT. The benchmark includes a pre-defined set of application mixes to model the most general uses of server systems. The two initial application mixes for the Server Benchmark are the Domain Server Mix and File Server Mix. AIM plans to develop additional application mixes in future months.

Standard Mixes

Domain Server Mix/Windows NT

The AIM Domain Server Mix/Windows NT is composed of 50 different tests from what appears to be all subsystem categories. The Domain Server Mix represents a balanced usage of subsystems that are configured as a reportedly typical enterprise shared server. (Aside: what is really typical?) The major tasks performed by this typical domain server include email, shared applications such as spreadsheets and word processing, light file transfers, network routing and packet forwarding, and network maintenance.

File Server Mix/Windows NT

The AIM File Server Mix/Windows NT is composed of 37 different tests from nearly all major subsystem categories. The File Server Mix represents a balanced usage of subsystems for a gateway file server. The major tasks performed include file transfers of various sizes (both synchronous and asynchronous), network routing and packet forwarding, system security and access permission checking, heavy memory usage, and IPC calls.

Custom User Mix

If an application load is thought of as unique, the benchmark is designed to allow customization, using what is called an application mix template, so that it comes closer to modeling the actual environment. This provides more control over the mix of application loads, giving a more in-depth picture of how systems will run in the unique environment. The application mix can be customized by:

 

Certified Server Reports

AIM uses these benchmark results to generate third-party independent Certified Reports for Windows NT. AIM produces a Certified Report for Windows NT by combining the results from Server Benchmark, the Workstation Benchmark and the AIM Subsystem Benchmark.

These reports detail the performance results for the selected system running a specific application mix. The Server Report for NT also includes WinNT Peak Performance in application jobs per minute (the system's highest performance level when a significant amount of CPU, RAM, and disk caching is used) and the WinNT Sustained Performance (a measurement taken at maximum system capacity).

Aim Server Tests For Windows NT

The following table lists 59 different subsystem tests for the AIM Server Benchmark for Windows NT. They test the configuration in the following categories: Floating Point Performance, Integer Math Performance, Disk Performance, InterProcess Communication, Directory Routines, System and Memory Performance, Run Time Support, and Algorithmic Operations.

 

add_double div_long num_rtns_1
add_float div_short pipe_cpy
add_int dll_test ram_copy
add_long exec_test record_lock
add_short fun_cal series_1
array_rtns fun_cal1 shared_memory
cmd_rtns_1 fun_cal2 sieve
cmd_rtns_2 fun_cal15 sort_rtns_1
cmd_rtns_3 heap_test string_rtns
creat-clo jmp_test sync_disk_cp
dir_rtns_1 matrix_rtns sync_disk_rw
disk_cp mem_rtns_1 sync_disk_update
disk_rd mem_rtns_2 sync_disk_wrt
disk_rr misc_rtns_1 sync_test
disk_rw mul_double tcp_test
disk_src mul_float thread_test
disk_wrt mul_int trig_rtns
div_double mul_long udp_test
div_float mul_short virtual_test
div_int new_raph  

NT Benchmark Results - Certified Server Report

Below we've reproduced a small portion of an AIM Certified Server Report, so you can get a better idea of what they provide. Performance ratings are noted by the Peak and Sustained values listed, and they are defined as:

 

 

 

System Name Peak Sustained TVI Price CPU Cache Clock RAM Disk
AcerAltos 110 3307.1 2681.3 88 $4,940 Pentium II (2) 32K / 512K 400MHz 256MB 2GB, 9ms (4)
AcerAltos 19000 Pro4 2671.6 2336.7 86 $19,993 Pentium Pro (4) 16K / 512K 200MHz 1GB 2GB, 9.4ms (9)
9100B 3830 2927.6 87 $7,725 Pentium II (2) 32K / 512K 450MHz 512MB 2GB, 9ms (4)
AcerAltos 930 2549.6 2110.7 84 $6,893 Pentium II (2) 32K / 512K 333MHz 256MB 2GB, 9.4ms (4)
ALR Revolution 2x 1256.9 1216.6 87 $5,709 Pentium II (2) 32K / 512K 266MHz 128MB 4.2GB, 12ms (2)
ALR Revolution 6x6 4542.8 4029.3 82 $44,660 Pentium Pro (6) 16K / 1MB 200MHz 1GB 4.3GB, 9ms (1) 9GB, 9ms (8)
Apricot KT1200 (266MHz) 1176.8 1108.5 88 $7,468 Pentium II (2) 32K / 512K 266MHz 256MB 4GB, 8.5ms (3)
Apricot KT1200 (333MHz) 2594.5 2566.1 75 $12,418 Pentium II (2) 32K / 512K 333MHz 512MB 4GB, 8.5ms (4)
Aspen Systems Durango II 1297.1 1251.7 84 $5,805 DEC 21164 (1) 16K / 96K / 2MB 533MHz 128MB 4.2GB, 12ms (2)
Cubix DP6200 1045.7 1004.4 88 Pentium Pro (2) 3K / 512K 200MHz 1GB 4GB, 8ms (1)
General Aviion 8600 (4CPU) 3842.5 2932.9 83 $124,642 Pentium Pro (4) 16K / 1MB 200MHz 4GB 9GB, 8ms, 10KRPM (10)
General Aviion 8600 (6CPU) 5083.6 3584.6 79 $138,149 Pentium Pro (6) 16K / 1MB 200MHz 4GB 9GB, 8ms, 10KRPM (10)
General Aviion 8600 (8CPU) 5747.4 3850 85 $151,655 Pentium Pro (8) 16K / 1MB 200MHz 4GB 9GB, 8ms, 10KRPM (10)
Dell PowerEdge 2200 (192MB) 1610.2 1554.9 92 $5,584 Pentium II (2) 32K / 512K 333MHz 192MB 2GB, 9ms (3)
Dell PowerEdge 2200 (256MB) 1640.7 1593.7 92 $5,872 Pentium II (2) 32K / 512K 333MHz 256MB 2GB, 9ms (3)

 

 

 

AIM's Subsystem Benchmark For the Windows NT Operating System

The AIM Subsystem Benchmark for Windows NT exercises and measures each component of a computer system running NT. The benchmark uses 73 subtests to generate absolute processing rates, in operations per second, for subsystems, I/O transfers, function calls, and system calls. Test results can be used to compare different machines on a test-by-test basis, or to measure the success or failure of system tuning and configuration changes on a single system. This benchmark yields specific results on a per-test basis.

The individual values shown with the Subsystem benchmark represent how well the subsystem components operate. These values can show how changes in the components, OS, or compiler options affect each subsystem. For example, differences in disk speed are very apparent with a glance at the disk tests. The amount of RAM affects the memory tests. A 20-minute test can show the results of adding RAM, changing disks, or changing CPUs.

AIM Technology’s Workstation Benchmark for Windows NT® is a system level WIN32-compliant Benchmark for the Microsoft Windows NT operating system. It is designed to test overall system performance of standard Windows NT Workstation configurations on Alpha and Intel platforms.

AIM Technology uses Load/Mix Modeling to test how well multithreading systems perform in different application environments. The role of Load/Mix modeling is to allow AIM to apply a set load to any workstation running the Windows NT operating system. The benchmark includes a pre-defined application mix to model the most general uses of workstation systems. The Workstation Benchmark includes both 2-D and 3-D graphical tests, as well as the ability to run across multiple processors. The primary application mix for the Workstation Benchmark is the General Workstation Mix, which represents a balanced usage of subsystems that are configured as a typical corporate desktop workstation. The major tasks performed by the typical workstation include applications such as spreadsheets and word processors, graphical applications, Internet access, peer-to-peer connections, and email.

For more information on AIM: (800)848-8649 and www.aim.com or email: benchinfo@aim.com . AIM Technology is a wholly-owned subsidiary of Network General Corporation.

SYSmark/NT 4.0 is designed for five basic market segments: corporations that are migrating to Windows NT and need a tool to evaluate computer performance; end users who make purchasing decisions for computers running Windows NT; testing labs and publications whose benchmarking results influence purchasing decisions; systems developers who need to analyze and tune products under development; and VARs/resellers.

Benchmarking workloads within SYSmark/NT 4.0 represent word processing (WP), spreadsheet (SS), project management (PM), computer-aided design (CAD), and presentation graphics programs (Pres.).


"The demand for SYSmark/NT, our first NT-based benchmark, exceeded our expectations," says John Sammons, BAPCo's president. "We expect to have a high demand for this product from both those upgrading from our first release and from other NT users seeking performance benchmarks based on real-world business applications."

Workloads for SYSmark/NT 4.0 were developed based on BAPCo's standardized practice of surveying users to determine how they exercise popular applications in day-to-day work. The applications selected for testing had to be able to run across Intel and Alpha architectures. SYSmark/NT 4.0 can generate performance metrics as a composite of all the different applications or for a specific application, such as word processing or spreadsheets.

 

Workloads based on these applications for Intel and Alpha platforms are included in SYSmark/NT 4.0:

Word Processing: Microsoft Word 6.0 (native 32-bit on all architectures)
Spreadsheets: Microsoft Excel 5.0 (native 32-bit on all architectures)
Project Management: Welcom Software Technology Texim Project 2.0e (native 32-bit on all architectures)
Computer Aided Design: Orcad Layout for Windows 7.0 (PCB design tool) (native 32-bit on all architectures)
Presentation Graphics: Microsoft PowerPoint 4.0 (16-bit Windows emulation)
Desktop Publishing: Adobe Pagemaker 6.0

 

SYSmark/NT 4.0 is available for those who want to do their own benchmarking at a nominal fee. The benchmark was developed and is fully supported by the current BAPCo membership, which includes: AER Energy Resources, Amdahl, Apricot Computers, AT&T Global Information Solutions, Client/Server Labs LLC, Compaq, Dell, Digital Equipment Corp., Duracell, EMAP Computing Labs, Gateway2000, Hewlett-Packard, IBM, InfoWorld, Intel, Lotus, Microsoft, Motorola, NEC Technologies, Texas Instruments, Unisys and Ziff-Davis Labs.

The following summary of the top 20 NT performers using the Sysmark32 benchmark comes from http://www.ideasinternational.com/benchmark/bapco/sysnt4.html --

Rank  System                                            CPU                           CLK  Mem    SYSmark/NT 4.0  SS   PM   WP   Pres.  CAD
1     IBM IntelliStation M Pro 6889 -modified           Intel Pentium II (Dual)       450  128MB  435             438  479  372  496    403
2     HP Kayak XU 6/400 Slot2 DP fastRAID               Intel Pentium II Xeon (Dual)  400  128MB  418             427  479  356  461    378
3     HP Kayak XU 6/400 DP 2x4.5GB in RAID0             Intel Pentium II (Dual)       400  128MB  408             417  451  354  463    368
4     IBM IntelliStation M Pro 6889 -modified           Intel Pentium II (Dual)       400  128MB  407             421  448  352  457    368
5     HP Kayak XU 6/400 DP 9.1GB                        Intel Pentium II (Dual)       400  128MB  399             397  451  335  459    366
6     IBM IntelliStation M Pro 6889 -modified           Intel Pentium II (Dual)       400  128MB  397             396  445  336  450    368
7     Seattle -modified                                 Intel Pentium II              450  64MB   394             325  513  325  425    413
8     HP Kayak XU 6/400 4.5 GB                          Intel Pentium II              400  128MB  379             321  486  325  409    377
9     DP5400                                            Intel Pentium II (Dual)       400  260MB  373             371  406  308  443    353
10    Intel SE440BX -modified                           Intel Pentium II              400  64MB   368             307  481  310  393    375
11    Gateway E-4200                                    Intel Pentium II              400  128MB  362             298  471  297  394    379
12    Compaq Deskpro EP Series 6400                     Intel Pentium II              400  64MB   341             280  430  280  377    362
13    HP Kayak XU 6/333 DP 2x4.5GB Fast Raid -modified  Intel Pentium II (Dual)       333  128MB  340             351  380  297  376    304
14    IBM IntelliStation M Pro 6898 -modified           Intel Pentium II (Dual)       333  128MB  340             358  374  302  370    303
15    Duracom Performa 6221                             Intel Pentium II              400  64MB   339             270  449  281  366    360
16    IBM IntelliStation M Pro 6898 -modified           Intel Pentium II (Dual)       333  128MB  338             352  375  298  369    306
17    IBM IntelliStation M Pro 6898 -modified           Intel Pentium II (Dual)       333  128MB  337             346  379  291  375    303
18    Nova P-II 400                                     Intel Pentium II              400  64MB   337             277  428  276  362    369
19    HP Kayak XU 6/333 DP 4.5GB                        Intel Pentium II (Dual)       333  128MB  335             339  378  292  374    302
20    Intergraph TDZ 2000                               Intel Pentium II (Dual)       333  128MB  334             337  375  292  368    304

For more information, contact BAPCo, 2200 Mission College Blvd., RN2-02, Santa Clara, CA 95052; phone: 408-988-7654; fax: 408-765-4920; Internet: http://www.bapco.com.

 

ByteMark

This data is based on a usenet post to comp.sys.ibm.pc.hardware.chips by Eric Mintz. Byte Magazine's "Byte CPU DOS 32" is used for testing. It's popular because it's relatively easy to run and report, it's easy to get and free, and they publish the source code. But it really is a processor-only type of test.

#     CPU           Int    FP    NS    SS   Bit   FPem  Four  Assg  Idea  Huff  NNet  LUde
3.  i486DX2-66      0.41  0.25  0.43  0.27  0.43  0.50  0.28  0.38  0.40  0.49  0.20  0.30
6.  i486DX2-66      0.42  0.26  0.42  0.36  0.43  0.50  0.28  0.38  0.40  0.48  0.20  0.31
2.  Am486DX2/80     0.51  0.31  0.52  0.44  0.52  0.60  0.34  0.46  0.48  0.57  0.25  0.34
11. AM5x86-133      0.58  0.39  0.80  0.89  0.87  0.50  0.28  0.37  0.40  0.48  0.42  0.50
15. AMD486dx4-100   0.60  0.38  0.65  0.74  0.41  0.55  0.60  0.71  0.28  0.39  0.59  0.36
12. AM5x86-133      0.61  0.25  0.80  0.89  0.87  0.81  0.28  0.37  0.40  0.47  0.21  0.26
5.  Am486DX4/100    0.61  0.37  0.63  0.41  0.65  0.74  0.42  0.56  0.60  0.72  0.29  0.42
1.  iPOD-83         0.79  0.85  0.81  0.47  0.89  0.85  0.89  0.86  0.91  0.86  0.85  0.82
10. iP5-75          0.84  0.96  0.88  0.84  0.83  0.84  0.85  0.88  0.84  0.77  0.91  1.17
9.  AMD5x86-133     0.86  0.49  0.79  0.89  0.87  1.01  0.57  0.74  0.80  0.95  0.42  0.50
4.  Cx5x86-120      0.97  0.69  0.86  1.46  0.94  1.02  0.81  0.79  0.89  0.93  0.55  0.73
13. AM5x86-160      1.01  0.58  0.87  1.07  1.03  1.21  0.67  0.84  0.95  1.14  0.51  0.57
14. AM5x86-160      1.02  0.58  0.91  1.07  1.04  1.21  0.68  0.87  0.95  1.14  0.51  0.58
17. AM5x86-160      1.03  0.59  0.94  1.07  1.04  1.21  0.68  0.88  0.95  1.15  0.51  0.59
7.  iP5-100         1.10  1.21  1.12  1.08  1.10  1.12  1.12  1.16  1.11  1.01  1.16  1.35
16. P100            1.17  1.12  1.11  1.13  1.13  1.17  1.12  1.02  1.20  1.53  1.12  1.27
8.  iP5-120         1.30  1.38  1.27  1.28  1.33  1.35  1.35  1.36  1.33  1.21  1.35  1.45
18. iP5-166         1.80  1.74  1.82  1.73  1.84  1.87  1.86  1.87  1.84  1.68  1.89  1.48
Int = Overall Integer
FP = Overall Floating Point
NS = Numeric Sort
SS = String Sort
Bit = Bitfield
FPem = Floating Point Emulation

Four = Fourier
Assg = Assignment
Idea = Idea
Huff = Huffman
NNet = Neural Net
LUde = LU Decomposition

SPEC

What is SPEC

SPEC, the Standard Performance Evaluation Corporation, is a non-profit corporation formed to "establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers" (from SPEC's bylaws). SPEC's founders believe that the user community will benefit greatly from an objective series of applications-oriented tests that can serve as common reference points and be considered during the evaluation process. While no one benchmark can fully characterize overall system performance, the results of a variety of realistic benchmarks can give valuable insight into expected real performance.

The list of current members of SPEC includes: (1) Open Systems Group (OSG): Amdahl, AT&T, Auspex Systems, Bull, Compaq Computer, Cray Research, Dansk Data Elektronik, Data General, Digital Equipment, Electronic Data Systems, FirePower Systems, Fujitsu, HAL Computer Systems, Hewlett-Packard, Hitachi, IBM, Intel, Intergraph, International Computer (ICL), Locus Computing, Motorola, Network Appliance, Nikkei Datapro, Novell, Olivetti, Pyramid Technology, Ross Technology, Siemens Nixdorf Informationssysteme, Silicon Graphics, Sun Microsystems, Tricord Systems, Unisys, and Ziff-Davis Publishing.

The Open Systems Group associates include: Center for Scientific Computing (Finland), Defense Logistics Agency / Systems Design Center, Leibniz-Rechenzentrum (Germany), NASA AMES Research Center, National Taiwan University (Taiwan), Oregon Graduate Institute, OSF Research Institute, Princeton University, Technische Universitaet Chemnitz-Zwickau (Germany), University of Aizu (Japan), University of California at Berkeley.

(2) High Performance Group (HPG): Convex Computers, Cray Research, Digital Equipment, Electronic Data Systems, Fujitsu America, Hewlett-Packard, International Supercomputing Technology Institute (ISTI, France), Kuck & Associates, NEC/HNSX Supercomputer, Silicon Graphics, and Sun Microsystems.

Legally, SPEC is a non-profit corporation registered in California. SPEC basically performs two functions:

The SPEC organization really comprises two groups, each with its own benchmarks. The Open Systems Group (OSG) covers benchmarks in a UNIX, NT, or VMS environment. The High Performance Computing Group (HPCG) covers benchmarking in a numeric computing environment, with emphasis on high-performance numeric computing. Many people think only of the OSG benchmarks (often only the CPU benchmarks) when they hear "SPEC", as these are indeed the best known of SPEC's benchmarks.

 

How to Contact SPEC

SPEC (Standard Performance Evaluation Corporation)

10754 Ambassador Drive, Suite 201

Manassas, VA 22110, USA

Phone: (703) 331-0180 Fax: (703) 331-0181

E-Mail: info@spec.org

Ziff-Davis Benchmarks

Ziff-Davis provides three different benchmarks at no charge that test application servers (ServerBench), file servers (NetBench), and web servers (WebBench). These are summarized here in terms of what they are, what they measure, and what benchmark tricks are possible to improve scores.

ServerBench 4.0 - http://www.zdnet.com/zdbop/svrbench/svrbench.html

What Is ServerBench? ServerBench measures the performance of application servers in a client/server environment. It provides an overall score for a server and individual scores for the clients, which are PCs running Windows® 95 or Windows NT®. Test runs are started and monitored from the controller, which is a PC running Windows 95 or Windows NT.

 

TPS: The Units Used To Report Scores Results are in TPS, or transactions per second. Each client measures how long each transaction takes and how many transactions ran. The client calculates its TPS score by dividing the total number of transactions by the total amount of time for completion. ServerBench combines the individual client TPS scores to calculate the overall server score.

ServerBench provides an overall measure of how a server's performance stacks up against others if the standard system test suite was run. If one of ServerBench’s subsystem test suites is run, the results obtained indicate how well that server subsystem is performing. The higher the score, the better your application server performed.

What Is A "Transaction" A transaction consists of the request a client sends to the server, the response it gets back, and the time it takes from the moment the client sends the request until it receives the reply from the server. ServerBench breaks down the time the transaction spends traveling along the network to and from the server, waiting in a queue on the server to receive a service, and receiving service.

How Performance Is Measured ServerBench uses a weighted harmonic mean to calculate the overall server performance score. By using a harmonic mean, the scores for different transactions can be combined to create a single representative score. Different transactions are weighted based on how often clients request each transaction in one iteration of a mix.

To calculate the overall server TPS score, the ServerBench benchmark:

· Tracks the amount of time each transaction takes to complete.

· Tallies the number of completed transactions. Incomplete transactions or transactions that began during the Ramp up or Ramp down periods at the start and end of a benchmark run are not counted.

· Creates a total TPS score for each transaction by adding together the TPS score for each client.

· Uses a weighted harmonic mean to turn the total TPS scores into a single overall server score.
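The steps above can be sketched in a few lines of Python. This is only an illustration, not ServerBench's actual code: the transaction names, client counts, and mix weights are made up, and the weighted harmonic mean is assumed to take the usual textbook form, sum(w) / sum(w / x).

```python
def client_tps(transactions_completed, elapsed_seconds):
    """TPS for one client: completed transactions divided by elapsed time."""
    return transactions_completed / elapsed_seconds


def weighted_harmonic_mean(values, weights):
    """Textbook weighted harmonic mean: sum(w) / sum(w / x)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(weights) / sum(w / x for w, x in zip(weights, values))


# Hypothetical run: two transaction types, three clients each, 60-second run.
per_client_tps = {
    "read": [client_tps(600, 60), client_tps(540, 60), client_tps(660, 60)],
    "write": [client_tps(120, 60), client_tps(150, 60), client_tps(90, 60)],
}
# Assumed weights: how often each transaction is requested in one mix iteration.
mix_weights = {"read": 3, "write": 1}

# Total TPS per transaction type is the sum of the client TPS scores.
total_tps = {name: sum(scores) for name, scores in per_client_tps.items()}

names = list(total_tps)
overall = weighted_harmonic_mean(
    [total_tps[n] for n in names],
    [mix_weights[n] for n in names],
)
print(total_tps, round(overall, 2))
```

A harmonic mean is the natural choice for combining rates: a slow transaction type pulls the overall score down more than an arithmetic mean would let it.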

Setting Test Parameters To Get The Best Results Adjusting the test parameters for different mixes can also influence results. Specifically, Ziff-Davis points out:

· Number Of Transaction Iterations. The higher the value, the greater the weight that transaction will have in the test results. In addition, the more times a client requests a transaction during an iteration of a mix, the longer one iteration of the mix takes.

· Number Of Test Iterations. The more test iterations specified, the longer the transaction takes and the lower the TPS scores. Lower TPS scores than expected could be seen because each client measures the amount of time it takes the server to get a transaction and then send a reply.

· Number Of Iterations For The Processor Test. The more iterations specified as the Total Size value for the Processor test, the larger the CPU load that the test places on the system. By creating tests with different Processor test iterations, improvements are made in the way the benchmark stresses the processor subsystem on servers with different processor power.

· Data Segment Size. When the size is increased, the amount of file data accessed during a test is increased. This has a significant impact on the file system cache. As you add memory to the server, this parameter should be increased so as to stress the disk subsystem.

· Segment Access Ratio. This controls how spread out the file data is, letting a test span more physical disk space without increasing the amount of data being accessed.

Creating Test Mixes To Stress The Server If the goal is to stress the server to the point where TPS drops off, then these "hints" from Ziff-Davis will likely help:

Determining What To Test On Your System When studying the different tests, consider what would be the most useful information desired about a system. For example:

Additional information: To see the effects of changing parameters, change only one parameter at a time between benchmark tests. That policy will let you gauge each parameter's effect on performance. The server parameters to consider changing are:
  • Amount of server RAM.
  • Number of CPUs.
  • Number of disks.
  • Disk organization (mirroring versus striping).
  • Network topology.

Factoring In Multitasking Multitasking is at the core of the ServerBench tests. Many tasks share processing time in a multitasking environment. Tasks are swapped in and out of memory by the processor. Many of the machines ServerBench and the other benchmarks run on use multiple processors. The following, though, is a single processor example.

Each mix contains a fixed number of clients. The mix tells the clients which specific transactions to send to the server. The clients begin the test and start sending transaction requests to the server. The server, instead of telling the clients to wait while it handles a single client's request, takes all the transactions from all the clients. Because a processor can only respond to one request at a time, the server gathers up all the other requests and puts them in a queue to wait for the processor time they need. If there are many clients sending requests, the queues can get unruly. Meanwhile, as the clients wait for the server to reply to their tests, they're running a clock and calculating how much time the server is taking.

By knowing how many clients are competing for processing time and how much time it takes for them to get service, the throughput for the tests can be calculated and the point at which throughput drops off can also be identified.
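As a rough illustration of the single-processor case, the sketch below assumes the simplest possible model: every client sends one request at time zero, each request needs the same fixed service time, and the processor serves them strictly in arrival order. Real ServerBench runs are more complicated; the function names and the 100 ms service time are hypothetical.

```python
def fifo_response_times_ms(num_clients, service_ms):
    """Response time of each client when all requests arrive at time 0
    and one processor serves them one at a time, in order."""
    return [service_ms * position for position in range(1, num_clients + 1)]


def throughput_tps(num_clients, service_ms):
    """Completed requests per second over the whole run."""
    total_ms = fifo_response_times_ms(num_clients, service_ms)[-1]
    return num_clients * 1000 / total_ms


for clients in (1, 4, 8):
    times = fifo_response_times_ms(clients, service_ms=100)
    mean_ms = sum(times) / len(times)
    print(clients, mean_ms, throughput_tps(clients, service_ms=100))
```

In this toy model the processor is already saturated by one client, so throughput stays flat while the mean wait in the queue grows with every client added.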

Figuring Out The Knees In The Curves A typical curve that plots a server's total TPS throughput against the number of active clients is shown below. The point at which the number of clients is increasing but the throughput starts to level off is called the knee of the curve.

 

A Results Curve Containing a Single Knee

 

As the number of clients sending requests to a server increases, the total throughput on the server goes up. This is because at low client counts, the server is not being utilized to its full potential. Some of the components of a computer system will remain idle.

With a server, as the number of client requests increases, the overhead of managing the requests also increases. The additional requests take more memory, there is more work for the scheduler, there is only a limited bandwidth for the network, and so on. Eventually, the throughput starts decreasing, indicating the server reached the knee of the curve. This is the point where the contention for resources and the overhead of managing the resources causes throughput to decrease.

A server's configuration can affect where the knee of the curve occurs. Peak throughput can often be improved by:

There are three general types of curves that you will see with ServerBench as well.

A Results Curve With A Double Knee

To get a valid measure of a server's performance, run the benchmark test until you see the knee in the server's performance curve. This means the client load should be increased until the TPS scores begin to level off. In the standard test suites, add clients incrementally as the tests run (i.e., start with 1 client, then run the test with 4 clients, next 8 clients, and so on).
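One simple way to spot the knee in a set of results is to look for the first step where adding clients no longer buys a meaningful gain in TPS. The sketch below is an assumption-laden illustration, not part of ServerBench: the 5% threshold is arbitrary, and the measurements are invented.

```python
def find_knee(points, min_gain=0.05):
    """Return the client count at which throughput stops growing.

    points: list of (clients, tps) pairs sorted by client count.
    min_gain: assumed threshold; a step that raises TPS by less than this
    fraction counts as "leveling off".
    Returns None when every step still improves TPS by at least min_gain,
    i.e. the run never reached a knee.
    """
    for (prev_clients, prev_tps), (_, tps) in zip(points, points[1:]):
        if tps < prev_tps * (1 + min_gain):
            return prev_clients
    return None


# Made-up measurements for illustration only: (clients, total TPS).
run = [(1, 40.0), (4, 150.0), (8, 270.0), (16, 400.0), (24, 410.0), (32, 380.0)]
print(find_knee(run))       # knee at 16 clients for this made-up data
print(find_knee(run[:4]))   # None: the load was too light to reach a knee
```

A result of None corresponds to the situation where the test suites ran without hitting a knee and the load needs to be made heavier.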

If the ServerBench test suites are run and don't hit a knee, then the intensity of the tests needs to be increased. Try:

How To Evaluate Throughput In Terms Of Your Server As you examine the throughput graphs for different servers, keep in mind that the server with the best peak throughput number may not be the best server for your requirements. Exactly how the server handles intense loads must be studied. ServerBench measures how much the throughput degrades as the load is increased on a server.

An example: Suppose two servers are being compared. Server A reaches a knee at 30 clients and then experiences a sharp drop in throughput. Server B, however, reaches a knee at 25 clients and then begins a very slow drop in throughput. If you only have 25 to 30 users, then Server A is probably the server you want. But, if you have a workload equivalent to 40 clients, Server B may be the better server for you. This is because, while the throughput on Server B drops off sooner than on Server A, it doesn't deteriorate as sharply. This says that probably more users will get reasonable service. Since Server A's throughput shows such a sharp drop, if you have a workload equivalent to more than 30 clients, some of those clients will get really poor service.

System Throughput Versus Client Response When looking at results, you may notice that the system throughput as shown by the overall score is different from the throughput for the individual clients. In other words, system throughput may increase even as client response time gets worse. Don't worry. This is normal.

To understand how this happens, consider the following example. Imagine you are at a bank where each transaction takes ten minutes. There are four tellers and you. From your point of view, the process took you ten minutes (i.e., that is your response time). From the bank's point of view, one transaction occurred in ten minutes; thus, the transaction throughput was one in ten minutes or .1 per minute. Now, go through the same process but this time you are one of four customers. You still have to wait ten minutes, but the throughput is now four in ten minutes or .4 per minute.

Consider this scenario one more time but with eight customers instead of four. The tellers are working harder (probably, they're not asking people how they are any more), so the transaction time is now eight minutes. This means that, if you were at the end of the line, you had to wait sixteen minutes. This response time seems bad to you; however, throughput from the bank's point of view is now eight in sixteen minutes or .5 per minute. In other words, the bank's throughput improved even though your response time increased.
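The arithmetic in the bank example can be checked directly. The figures below are the ones given in the text; only the function name is made up.

```python
def throughput_per_minute(transactions, elapsed_minutes):
    """Transactions completed per minute, from the bank's point of view."""
    return transactions / elapsed_minutes


# (customers, minutes until the last customer is done), taken from the text
scenarios = [(1, 10), (4, 10), (8, 16)]
for customers, minutes in scenarios:
    print(customers, minutes, throughput_per_minute(customers, minutes))
```

The last customer's wait rises from ten to sixteen minutes between the second and third scenarios, yet the bank's throughput rises from .4 to .5 per minute.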

Similarly, suppose you are running the benchmark on a four-processor, totally symmetric multiprocessing system and a test mix is run that only does processor tests. It takes the first client 150 milliseconds to run the test. It takes the same amount of time for the second client and also for the third client. Thus, from the client's point of view, nothing has changed: the test took the same amount of time for each client. From the server's point of view, it has tripled its throughput. This is a case where the overall system throughput improves but the client response times stay the same. In evaluating a server, consider both the system throughput and the client response time. You want to find out how evenly the server handles the clients.

Processor Scaling With a multiprocessor system, consider, too, the effect that the number of processors has on the scores. Within a certain range, the more processors a system has, the better its scores. A point will be reached where the overhead required to manage more processors outweighs the advantage of having them. Finding how much throughput improves in relation to the number of processors is called processor scaling.

The multiprocessor systems likely to be used with ServerBench and other NT benchmarks are tightly coupled processors. This means all the processors have an equal access to memory. While access to memory is an advantage, it also results in overhead. The system has to coordinate access to memory. This means the hardware and software spend more time making this processing system work.

For example, with two processors, you would expect to see throughput double. However, because of the overhead of managing an additional processor, throughput may only increase by a factor of about 1.8, not 2. Even at that rate, there's a definite advantage to having two processors. Eventually, though, as processors are added, the server's throughput rate will start dropping instead of increasing. At that point, the peak of processor scaling will have been passed and we are better off not adding another processor.
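Processor scaling is easy to quantify once scores for different processor counts are in hand. The sketch below assumes the usual definitions (speedup is the n-processor score over the one-processor score, efficiency is speedup divided by n). The scores are hypothetical, chosen so the two-processor case matches the 1.8 figure above.

```python
def speedup(tps_one_cpu, tps_n_cpus):
    """How many times faster the n-processor score is than the 1-processor score."""
    return tps_n_cpus / tps_one_cpu


def scaling_efficiency(tps_one_cpu, tps_n_cpus, n_cpus):
    """Fraction of ideal (linear) scaling actually achieved."""
    return speedup(tps_one_cpu, tps_n_cpus) / n_cpus


# Hypothetical overall TPS scores keyed by processor count.
scores = {1: 100.0, 2: 180.0, 4: 300.0, 8: 280.0}

for n_cpus, tps in scores.items():
    print(
        n_cpus,
        round(speedup(scores[1], tps), 2),
        round(scaling_efficiency(scores[1], tps, n_cpus), 2),
    )

# Processor count with the highest score; adding CPUs beyond it hurts here.
best = max(scores, key=scores.get)
print(best)
```

In this invented data set the score peaks at four processors and falls at eight, which is the "peak of processor scaling" situation described above.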

While bad processor scaling or even a lack of processor scaling is not a problem that would be encountered often, it is something to think about as scores are evaluated. Processor scaling is especially important when comparing servers with different numbers of processors; for example, if one alternative being evaluated is to upgrade the current server with more CPU engines.

 

ServerBench Sample Reports

 

NetBench 5.01 - http://www.zdnet.com/zdbop/netbench/netbench.html

NetBench is a portable Ziff-Davis benchmark program that measures the performance of file servers as they handle requests from DOS, Windows® 95, Windows for Workgroups, Windows NT®, and Mac® OS clients.

How NetBench Measures

NetBench provides a measure of how well a server handles network file I/O by having each client in the test make repeated requests to the server for file I/O service. Each client records the number of bytes of data moved and divides this number by the amount of time required to move the data. NetBench totals all the client throughput scores to determine the overall throughput for the server.

NetBench does not distinguish between the server subsystems. For example, NetBench can’t determine the relative performance differences between the disk and network.

The Units NetBench Uses To Report Its Scores

NetBench reports its overall server scores in both bytes per second and megabits per second. The key unit of measure for NetBench, though, is bytes per second. NetBench uses bytes per second to report all other test information, such as individual client throughputs.

To determine the overall server scores, NetBench totals the individual throughput for all the clients. The higher the score, the better the file server performed. Some publications convert NetBench’s bytes per second results to bits per second, megabits per second, or kilobits per second.
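The scoring described above (bytes moved divided by elapsed time per client, then summed across clients) can be sketched as follows. This is not NetBench code; the function names and sample figures are hypothetical, and the conversion assumes one megabit equals 1,000,000 bits, which may differ from the convention a given publication uses.

```python
def client_throughput(bytes_moved, seconds):
    """Bytes per second for a single client."""
    return bytes_moved / seconds


def server_throughput(clients):
    """Overall server score: the sum of all client throughputs.
    clients is a list of (bytes_moved, seconds) pairs."""
    return sum(client_throughput(b, s) for b, s in clients)


def to_megabits_per_second(bytes_per_second):
    """Convert bytes/s to megabits/s (assumes 1 megabit = 10**6 bits)."""
    return bytes_per_second * 8 / 1_000_000


# Hypothetical client results: (bytes moved, seconds elapsed).
clients = [(6_000_000, 10.0), (4_500_000, 10.0), (2_000_000, 8.0)]

total = server_throughput(clients)
mbits = to_megabits_per_second(total)
print(f"server throughput: {total:,.0f} bytes/s ({mbits:.1f} Mbit/s)")
```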

Some Concepts That Play A Role In NetBench's Results

Here's a summary of some of the NetBench concepts:

Suggestions For Improving Your Server's Results

Because NetBench tests a system as a whole instead of isolating the individual components of a server, everything about a system can affect the scores. Both hardware and software factors can influence NetBench results. For example, CPU speed obviously affects results, though it might not be the primary factor. Perhaps less obvious, though, is that a fragmented disk can slow down the NetBench tests as well.

Server scores can be improved if you:

Determining What To Test On Your System

When looking at NetBench tests, consider what would be the most useful information you can get about the system. For example:

WebBench 1.1 - http://www.zdnet.com/zdbop/webbench/webbench.html

WebBench 1.1 provides a way to measure the performance of Web servers. WebBench uses client PCs to simulate Web browsers. However, unlike actual browsers, the clients don’t display the files that the server sends in response to their requests. Instead, when a client receives a response from the server, it records the information associated with the response and then immediately sends another request to the server.

WebBench In Brief

When WebBench is installed, a set of HyperText Markup Language (HTML) files is copied into the Web server’s document root (i.e., the location where the Web server looks for its HTML files). The WebBench clients use HyperText Transfer Protocol (HTTP) to request these files from the server. Each client keeps a count of how many bytes the server transfers to it, how many requests succeed, and how many requests fail during either the connection phase of the request or the transfer phase. When the test mix ends, WebBench takes the individual client results, combines them, and produces two overall server scores: requests per second and throughput as measured in bytes per second. WebBench’s results tables also include details about how each client performed.

 

An Overview Of WebBench

How WebBench Measures Performance

WebBench measures a Web server’s performance by having each client issue HTTP requests to the server. The server responds to these requests based on the URL in the HTTP request. The URL can point to any of the typical HTML elements: files, graphics, programs, and so on. For the standard test suites in WebBench 1.1, the server always sends an HTML file in response to a client request.

The Units WebBench Uses To Report Its Scores

WebBench produces two overall Web server scores: requests per second and throughput as measured in bytes per second. Requests per second is a basic measure of the client/server interaction. Clients count only completed requests; i.e., the client sends a request to the server and the server sends a response to the client. By looking at the server’s requests per second score, you can get a pretty good idea of how many hits per day that server can handle. (Hits are one way people measure a Web server’s capacity.) Because WebBench acts as a stress test, it loads the server with requests as fast as the clients can issue them. As a result, the requests per second score that WebBench generates is higher than the typical load serviced by most Web servers.

The server’s throughput score indicates how many bytes per second the server is moving to the clients. The value seen for throughput depends in part on the distribution of the file sizes in the workload file. If the clients are making a higher percentage of requests for large files, then they will be receiving more bytes from the server and, at least initially, throughput will go up. Once either the network or the server becomes saturated, the throughput will flatten out. If the client requests a greater quantity of smaller files, then it will make more requests per second, but the throughput won’t be as high because the server won’t move as many bytes. WebBench counts as throughput only the number of bytes in the HTML file the Web server sends the clients. Note, WebBench doesn’t include the bytes from the HTTP file header, the TCP/IP header, or the Ethernet header when it records its throughput scores.

To calculate the overall server scores, WebBench totals the individual requests per second and throughput scores for all the clients. The higher the score, the better the Web server performed.
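A minimal Python sketch of this aggregation is shown below, together with the kind of hits-per-day estimate mentioned earlier. It is an illustration only: the record layout, function names, and sample numbers are hypothetical. The hits-per-day figure simply multiplies requests per second by 86,400 seconds, so it should be read as an upper bound obtained under stress-test conditions, not as a forecast of real traffic.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def client_scores(completed_requests, body_bytes, seconds):
    """Return (requests per second, bytes per second) for one client.
    body_bytes counts only the bytes of the files received, not the
    HTTP, TCP/IP, or Ethernet headers."""
    return completed_requests / seconds, body_bytes / seconds


def server_scores(clients):
    """Total the per-client scores into the two overall server scores."""
    per_client = [client_scores(*c) for c in clients]
    total_rps = sum(rps for rps, _ in per_client)
    total_bps = sum(bps for _, bps in per_client)
    return total_rps, total_bps


# Hypothetical results: (completed requests, body bytes, seconds).
clients = [(3000, 18_000_000, 60.0), (2400, 12_000_000, 60.0)]

total_rps, total_bps = server_scores(clients)
hits_per_day = total_rps * SECONDS_PER_DAY

print(f"requests per second: {total_rps:.1f}")
print(f"throughput: {total_bps:,.0f} bytes/s")
print(f"upper bound on hits per day: {hits_per_day:,.0f}")
```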

Some Concepts That Play A Role In The Results

Some of the WebBench concepts that are important to consider are:

 

Additional information: Adding clients increases the total requests per second and throughput ... up to a point. When the overhead of managing the additional clients outweighs the advantage of having more clients, these two numbers quit increasing. This is the point in the results curve where it starts to flatten out. With WebBench, one normally sees a rising slope at low client counts followed by a flat line once the server becomes saturated.

Here are the types of results curves seen in WebBench testing:

  • Most curves increase very sharply from 1 to 8 or 12 clients and then flatten out, indicating that the server has hit its saturation point.
  • Occasionally, curves hit a peak saturation point and then dip slightly before they flatten out.
  • If a curve keeps increasing and never reaches a point where it flattens out, then the server hasn't been saturated. More physical clients should be added.
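The three curve shapes listed above could be told apart with a simple heuristic such as the one sketched below. This is not something WebBench itself provides: the function name, the 5% tolerance, and the sample scores are all assumptions chosen for illustration, and a real analysis would want more data points than the last two.

```python
def classify_curve(scores, tolerance=0.05):
    """Classify a results curve given the overall score measured at
    each successive client count (lowest client count first).

    tolerance is the relative change treated as 'flat' (assumed 5%)."""
    peak = max(scores)
    prev, last = scores[-2], scores[-1]
    growth = (last - prev) / prev
    if growth > tolerance:
        # Still climbing at the highest client count tested.
        return "not saturated: add more clients"
    if last < peak * (1 - tolerance):
        # Ended noticeably below an earlier peak.
        return "peaked, dipped, then flattened"
    return "saturated"


# Hypothetical requests-per-second scores at increasing client counts.
flat = classify_curve([100, 400, 780, 800, 805])
dipped = classify_curve([100, 400, 900, 820, 815])
rising = classify_curve([100, 200, 300, 400])

print(flat)
print(dipped)
print(rising)
```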

We summarize the 3 Ziff-Davis benchmarks here.

| ServerBench | NetBench | WebBench |
| --- | --- | --- |
| Measures application servers. | Measures file servers. | Measures Web servers. |
| Provides an overall TPS score for your server and individual scores for clients. The scores show how well servers handle client requests for a variety of operations. | Provides an overall I/O throughput score for your server and individual scores for clients. The scores show how well servers handle client requests for network file operations. | Provides two overall server scores (requests per second and throughput as measured in bytes per second) as well as individual client scores. These scores show how well a server handles client HTTP requests and how many bytes per second the server is moving to the client. |
| Accepts Windows 95 and Windows NT clients. | Accepts DOS, 16-bit Windows, and 32-bit Windows PC clients and Mac® OS clients. | Accepts Windows 95 and Windows NT clients. |
| Requires a server program, a controller program, and a client program. | Requires only a controller program and a client program. | Requires a controller program and a client program; however, you do need to place files on the server. In addition, the server must be running a third-party Web server program. |
| Requires you to use a Winsock 1.1 compliant TCP/IP stack on the controller and the clients. | Runs on top of whatever protocol your system uses. | Requires you to use a Winsock 1.1 compliant TCP/IP stack on the controller and the clients. |
| Uses proprietary programs to communicate only with itself. | Accesses data through a publicly available API. | Accesses data through HyperText Transfer Protocol (HTTP). |
| Runs only on specific server platforms. | Runs on any server that provides shared file access to the controller and clients. | Runs static (.HTML only) tests on any Web server; runs CGI, ISAPI, NSAPI, and IntranetWare 4.11 NLM tests on specific server platforms only. |
| CPU power and disk I/O play a very large role in how well your server performs. | The server's disk I/O speed and the network I/O speed have the most influence on test scores. | The server's CPU power, the server’s memory size, and the network I/O speed all affect your Web server’s performance. |
| Reports scores as TPS (transactions per second). | Reports scores as bytes per second. | Reports scores as requests per second and throughput as measured in bytes per second. |

 

Summary

By no means should the reader think that the benchmark suites covered in this article are a complete list. New benchmark techniques are always being developed, and we hope to cover two additional tools, Client/Server Solutions' Benchmark Factory 97 and Bluecurve's Dynameasure, in depth in a subsequent article.

Our objective was to show some of the many current, state-of-the-art benchmarks for Windows NT®. We must point out that benchmark results reflect the performance of the system they run on and the characteristics of the underlying software. While we have described some of the standard processor ratings and benchmarks that currently exist in the industry, the potential benchmark user must be careful about assuming that the results achieved in an industry benchmark will apply to their own specific computing environment. Clearly, considerable work has been done in developing different benchmark suites that correspond to different workloads, each yielding different processor ratings. Be sure that any differences between what was run and what you will run can be explained.