Hard Disk
The Physical Characteristics of All Disks
The hard disk has one or more metal platters coated top and bottom with a magnetic material similar to the coating on VCR tape. Information is recorded onto bands of the disk surface that form concentric circles. The circle closest to the outside is much bigger than the circle closest to the center. Since each platter has a top and a bottom surface, there are at least two magnetic circles of each size and location. A disk may have as many as five platters, producing ten of these identical circles at the same distance from the center.
There is a separate magnetic read/write head for each disk surface, so with five platters there are ten heads. Each head is mounted on a metal arm that can move it from the innermost to the outermost circular position on the disk surface. The arms all move together, so if there are 10 disk surfaces, the read/write heads are at the same position on all 10 surfaces.
One circle on one disk surface, equivalent to one position of the read/write head on the end of the arm, is called a track.
All of the tracks on all of the surfaces that correspond to the same arm position are together called a cylinder.
To find some particular piece of data, the disk must move the arms to the position where it is located. This is called a seek. An average seek takes around 9 ms (milliseconds) on a desktop drive and around 5 ms on an enterprise server disk. To get the higher speed, the server disk uses more powerful motors to move the arms, and these are loud enough that you can hear them in a quiet home or office.
Once the arm is in position, the disk must wait for the data on the disk surface to rotate around until the start of the data comes under the read head. This is called rotational latency, and it depends on how fast the disk rotates. Latency on a typical 7200 RPM desktop disk is about twice that of a 15,000 RPM enterprise server disk.
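As a quick check of that ratio, here is a small Python sketch that computes average rotational latency from the rotation speed. It assumes the usual convention that, on average, the start of the data is half a revolution away when the head arrives; the function names are illustrative only.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm


def average_latency_ms(rpm: float) -> float:
    """Average rotational latency: on average the data is half a revolution away."""
    return rotation_time_ms(rpm) / 2


for rpm in (4200, 5400, 7200, 10_000, 15_000):
    print(f"{rpm:>6} RPM: {average_latency_ms(rpm):.2f} ms average latency")
```

Under this assumption a 7200 RPM disk averages about 4.17 ms and a 15,000 RPM disk 2.00 ms, a ratio of about 2.1.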
Once the arm is in position and the first byte of data rotates under the head, the disk transfers data to the computer at a speed determined by the rotational speed and the density (how much data is squeezed onto the disk surface). Desktop disks generally store more data at a higher density, so although enterprise server disks get to the data more quickly, a desktop disk and a server disk will copy a 6 gigabyte HD video file from one disk to another at roughly the same speed.
When you buy a disk, there are several parameters to consider:
Size
3.5 inch - This is the size of all desktop disks and many server disks. The larger disk holds more data, and capacities are available up to 1.5 terabytes (1500 gigabytes).
2.5 inch - A smaller disk surface holds less data, but it is lighter and requires less power to rotate. This is the typical disk size for all but the smallest, thinnest laptops (the lower power drain saves battery). Recently it has also become a common size for enterprise disks, which typically do not require large capacities but do require better performance and lower power drain.
1.8 inch - The small, thin laptops choose an even smaller disk with less capacity but even lower power use.
Rotation
Almost all desktop disks are 3.5" and rotate at 7200 RPM. A few lower power "Green" disks slow rotation down to 5400 RPM when they are mostly idle, then speed back up to 7200 RPM during periods of heavy use.
Laptop 2.5" disks can rotate at 7200 RPM, but then they generate more heat that is hard to dissipate. Typically laptop disks run at 5400 RPM, and when even more battery life is required (at the cost of performance), some run at 4200 RPM.
Enterprise disks rotate at 10,000 RPM or 15,000 RPM. One home disk, the WD Raptor, has a SATA connector but rotates at 10,000 RPM. It sells at a premium price to buyers who want the best performance of any desktop system.
Seek
How fast the arms move depends on the power of the motor that drives them, and a more powerful motor draws more electric power. Enterprise disks have the fastest seek, but they draw a lot of power and make an audible noise when seeking. Desktop and laptop disks use progressively less power. A new generation of intelligent disks can reduce power use even further. They calculate where the data will be in its rotation at the moment the arm would arrive after a full-speed seek. If the data is just about to rotate under the head, the seek runs at full speed. However, if the data would still be half a rotation away, or worse, if it would have just rotated past so that the head has to wait for almost a full rotation, then it makes no sense to move the arm at full speed just so it can wait. An "intelligent seek" tells the arm motor to move more slowly and use less power so it gets into position just in time to read the data.
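The timing decision behind an "intelligent seek" can be sketched in a few lines. This is a simplified model, not any drive's actual firmware: it assumes the drive knows how long a full-speed seek takes and how far ahead (measured in time) the start of the data is in the rotation. The 7200 RPM figure and all names are assumptions for illustration.

```python
ROTATION_MS = 60_000 / 7200  # one revolution of an assumed 7200 RPM disk, about 8.33 ms


def rotational_wait_ms(seek_ms: float, data_ahead_ms: float) -> float:
    """How long the head would sit idle after a full-speed seek.

    data_ahead_ms: time until the start of the data next passes the head's
    angular position, measured from the moment the seek begins.
    """
    return (data_ahead_ms - seek_ms) % ROTATION_MS


def just_in_time_seek_ms(seek_ms: float, data_ahead_ms: float) -> float:
    """Seek duration that arrives exactly when the data does.

    Stretching the seek by the idle time adds no latency, so the arm can be
    driven more gently; the actual power saving is not modeled here.
    """
    return seek_ms + rotational_wait_ms(seek_ms, data_ahead_ms)


print(just_in_time_seek_ms(9.0, 10.0))            # 10.0
print(round(just_in_time_seek_ms(9.0, 8.9), 2))   # 17.23
```

In the first case the arm would idle for only 1 ms, so the seek is stretched from 9 ms to 10 ms. In the second case the data slips past just before the arm arrives, so the arm would otherwise wait about 8.2 ms, almost a full rotation, and the seek can be made much slower at no cost.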
Capacity
Disks come with 1 to 5 platters, two surfaces to each platter. The current state of the art is a 3.5 inch platter holding 333 gigabytes. That can produce a three platter 1 terabyte disk, or a five platter 1.5 terabyte disk. The largest capacity 2.5 inch disk is 500 gigabytes.
Vendors will sell you a disk with 80 gigabytes, but it doesn't make a lot of sense to use only part of a platter. In any given generation there is a sweet spot that minimizes cost per byte. Generally speaking, it is whatever disk size costs around $100 to $120 (currently a 500 or maybe 750 gigabyte size). You will pay a premium to get the very largest disk sizes, and if you only have one or two disks in your machine, you will probably get better performance across many different types of use if you buy twice as many 500 gigabyte disks rather than half as many 1 terabyte disks. However, once your machine fills up at 4-6 disks and you run out of SATA connectors and power plugs, you have a problem. So the trick is to plan your space requirements and choose a size that is comfortable, leaves room for expansion, but distributes your data across multiple disks right now.
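The planning rule above can be sketched as a small search: among candidate disk sizes, pick the one with the lowest cost per gigabyte that covers your space needs plus room for growth, uses at least two disks so the data is spread out, and still fits in the bays you have free. The function name, the growth factor, the two-disk minimum, and the prices in the example are all assumptions; the prices are placeholders, not market data.

```python
import math


def plan_disks(needed_gb: float, growth_factor: float, free_bays: int,
               prices: dict[int, float]):
    """Return (size_gb, count, dollars_per_gb) for the cheapest workable plan, or None.

    prices maps a disk size in GB to its price in dollars (supplied by the caller).
    """
    target_gb = needed_gb * growth_factor          # leave room for expansion
    best = None
    for size_gb, price in sorted(prices.items()):
        # At least two disks, so data can be spread across separate arms.
        count = max(2, math.ceil(target_gb / size_gb))
        if count > free_bays:                      # not enough bays or SATA ports
            continue
        per_gb = price / size_gb
        if best is None or per_gb < best[2]:
            best = (size_gb, count, per_gb)
    return best


# Placeholder prices for illustration only.
example_prices = {250: 60.0, 500: 100.0, 750: 120.0, 1000: 200.0, 1500: 300.0}
print(plan_disks(needed_gb=800, growth_factor=1.5, free_bays=4, prices=example_prices))
```

With these made-up prices the sketch picks two 750 gigabyte disks; plugging in real prices from the current generation is what locates the actual sweet spot.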
External
For several years there have been external disk enclosures that can be connected to a USB port. That provides extra capacity at decent performance, but the USB connection is slower than the maximum disk transfer capability.
The SATA connector was designed to be "hot pluggable". The disk can easily slide along a guide and plug into a connector in the back of an enclosure. This design was important for enterprise server disks, where a disk failure in a redundant RAID system can be corrected by removing the broken disk and replacing it with a new disk without stopping the system. However, it also works with desktop disks in ordinary home environments. For around $30 you can buy a SATA hot swap disk bracket that fits into one of the 5 1/4 inch bays below your DVD drive. You connect a SATA disk cable and power to the back. You then open a door in the front of the bracket, slide a naked 3 1/2" SATA disk into the guide, and close the door. The SATA connector on the disk mates with a socket in the back of the unit. Now if your mainboard disk driver is smart enough, it will recognize the new disk, power it up, and make it immediately available. Otherwise you may have to tell the OS to scan for new hardware. If your driver is smart enough, you will also have the option to disconnect and power down the disk. Otherwise you will have to manually disable the drive, or in the worst case reboot. Even in the worst case, this is a lot simpler than taking the cover off and screwing the disk into the disk bay.
Vendors also sell an external $35 version of a pluggable SATA disk bay. It connects to your computer through an eSATA cable (although some models also support slower USB). The disk is inserted vertically and typically sticks out of the top of the unit.
If your laptop boots up slowly and seems sluggish, this is more likely to be caused by the slow, low power 2.5 inch disk than by a problem with the CPU. Of course you can max out the memory to increase the disk cache, and you can put the laptop to sleep instead of turning it off to speed up power on, but at some point it may be worth attacking the problem head on. Replacing the internal disk is hard, and the new disk may run too hot. However, you can buy an ExpressCard eSATA adapter and hook the laptop up to an external hard drive. The eSATA connection runs at full SATA speed, so you get the full performance of a desktop disk. You can simply leave a disk at work with your work stuff and have a second unit at home with your home stuff, or you can pop the 3 1/2 inch disk out of the unit, wrap it in an antistatic bag, and take it with you. At $35 it makes sense to buy several external enclosure units and leave them in place rather than trying to carry them around with you.
Rough Performance Numbers
You have now been given all the numbers, but you won't appreciate what they mean until we run a few calculations.
Suppose you read small pieces of data scattered randomly all over the disk. Each time you jump from one piece of data to the next, you have to wait for a seek to complete and then for the rotational latency. If you add the 9 millisecond average seek to the average latency (half of a single 7200 RPM rotation, about 4.2 ms), each read takes roughly 13 ms, so a typical desktop hard drive can do about 75 random reads per second.
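The same arithmetic as a small Python function. It assumes each random read costs one average seek plus one average rotational latency and ignores the comparatively small time to transfer the data itself; the function name is illustrative. The 5 ms enterprise seek is the figure given earlier for server disks.

```python
def random_reads_per_second(avg_seek_ms: float, rpm: float) -> float:
    """Random reads per second when every read pays one seek plus one rotational delay."""
    avg_latency_ms = (60_000.0 / rpm) / 2           # half a revolution on average
    service_time_ms = avg_seek_ms + avg_latency_ms  # transfer time ignored
    return 1000.0 / service_time_ms


print(round(random_reads_per_second(9.0, 7200)))    # desktop disk: 76
print(round(random_reads_per_second(5.0, 15_000)))  # enterprise disk: 143
```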
Most people measure disk performance in terms of sequential transfer of large data files. A desktop disk can transfer 40 to 60 megabytes per second if all the data is written sequentially on consecutive tracks and cylinders. However, if you are reading 4K chunks of data scattered randomly around the disk surface, you can only read 300K of that data in the 75 operations you perform each second.
There is a massive difference between 300 thousand and 60 million bytes per second. This is why you cannot really state what disk performance is going to be unless you know what you are asking it to do.
Obviously you improve overall performance by periodically defragmenting the system disk to make data sequential. That is not, however, the whole story. Suppose you are reading two sequential files that happen to be on different parts of the same disk. If the system reads 4K from one file and then 4K from the other file, with a seek and rotational delay between each file switch, then it will be experiencing data transfer in the 300K per second range. If, on the other hand, the system reads 10 megabytes from one file and then 10 megabytes from the other file (and caches up the data in memory even before the program asks for it) then performance will be close to the 60 megabytes per second.
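To see why the size of each read matters so much, the sketch below models alternating between two files: every switch costs one average seek plus one average rotational latency, and then the chunk streams at the sequential rate. The 9 ms seek, 7200 RPM, and 60 MB/s figures come from the text; treating every switch as an average-length seek is a simplifying assumption, and the names are illustrative.

```python
SEEK_MS = 9.0
LATENCY_MS = 60_000 / 7200 / 2   # half a revolution at 7200 RPM, about 4.17 ms
SEQUENTIAL_BYTES_PER_S = 60e6    # upper end of the 40 to 60 MB/s desktop range


def effective_rate(chunk_bytes: float) -> float:
    """Bytes per second when each chunk costs one seek, one rotational delay, then a transfer."""
    access_s = (SEEK_MS + LATENCY_MS) / 1000
    transfer_s = chunk_bytes / SEQUENTIAL_BYTES_PER_S
    return chunk_bytes / (access_s + transfer_s)


for chunk in (4096, 64 * 1024, 1024 * 1024, 10e6):
    print(f"{chunk / 1024:>8.0f} KB chunks: {effective_rate(chunk) / 1e6:6.2f} MB/s")
```

With 4K chunks the model gives roughly 0.31 MB/s, and with 10 megabyte chunks about 55.6 MB/s, which matches the contrast between the 300K per second range and nearly 60 megabytes per second.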
Operating systems will detect when a program is reading data sequentially and will "read ahead" to reduce arm movement and improve performance. However, this type of automatic optimization is conservative and only makes things two or three times better.
The place where this is most obvious is when you are editing or re-encoding a large video file. If you try to save the output file onto the same disk where you are reading the input file, the arm will be jumping back and forth. The processing will take 10 times as long as it would if the output file were written to a different hard disk.
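A back-of-the-envelope model shows where a slowdown of that size comes from. On one disk, each chunk is read, the arm moves to the output file, the chunk is written, and the arm moves back, so every chunk pays two seeks and two rotational delays. On two disks, each arm simply streams, assuming the program overlaps reading and writing. The 256 KB chunk size is an assumption standing in for whatever buffering the program and operating system actually do; the other figures follow the text.

```python
ACCESS_S = (9.0 + 60_000 / 7200 / 2) / 1000  # average seek + average latency, in seconds
SEQUENTIAL_BYTES_PER_S = 60e6


def copy_rate_same_disk(chunk_bytes: float) -> float:
    """Copy rate when input and output share one arm: read, seek, write, seek back."""
    per_chunk_s = 2 * ACCESS_S + 2 * chunk_bytes / SEQUENTIAL_BYTES_PER_S
    return chunk_bytes / per_chunk_s


def copy_rate_two_disks() -> float:
    """Copy rate when reader and writer each have their own arm and just stream."""
    return SEQUENTIAL_BYTES_PER_S


chunk = 256 * 1024  # assumed buffering size
slowdown = copy_rate_two_disks() / copy_rate_same_disk(chunk)
print(f"same disk is about {slowdown:.0f}x slower with {chunk // 1024} KB chunks")
```

With these assumptions the single-disk copy is about 8 times slower; smaller chunks make the gap larger.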
Although modern computers have lots of free memory and could solve the problem with read-ahead and buffering, given current operating system behavior the real solution to this problem is to have more disks and to optimize locations so that whenever possible, different files being used by the same program are on different disks.
Enterprise disks are about twice as fast as desktop disks at random access. This is better, but it still leaves a spread of roughly 600K per second versus 80M per second (random access is still about 100 times slower). However, while enterprise disks by themselves may not make much difference, the microcode in a RAID adapter or Storage Area Network (SAN) may do more aggressive read-ahead than the operating system. Just don't count on it.
There is no substitute for manually optimizing things that you do all the time. In a corporate database, for example, you always put the "log" file on a physically different disk than the data (or the arm will be constantly jumping between the two). A desktop computer user should think about how often he copies or processes large files (for example, removing commercials from recorded TV programs). Putting the input and output files on different disks will cause the operation to run more than 10 times faster.
Using cache (in the computer memory, on the disk controller, or on the disk itself) will help with the random requests you cannot anticipate. Careful positioning of data for things you do over and over will have a much greater effect. You might think it would cost more to have two disks than to have one, but it depends on the amount of data you store. Two 250 gigabyte desktop disks actually cost less than one 500 gigabyte drive, and three of them cost a lot less than one 750 gigabyte disk. You pay a premium for large devices, and a larger number of smaller independent devices will generally perform better. Of course, you must plan for such a configuration: you need room in the case for more drives, SATA connectors on the mainboard, and power connectors from the power supply.
One of the most widely quoted performance characteristics is totally meaningless. A desktop disk can read data at a maximum rate of 40 to 60 megabytes per second. An ATA or Serial ATA connection may be advertised at 100, 150, or 300 megabytes per second. That speed represents the burst speed for transferring data from the disk cache to the computer, but the disk performance still remains a tiny fraction of this nominal transfer rate.
PIO means something is wrong
In 1985 the PC AT transmitted data to or from the disk two bytes at a time using IN or OUT CPU instructions. The CPU had to go into a loop transferring data two bytes per instruction until the entire block had been transferred. This is called Programmed IO (PIO) and the ability to operate this way is still built into every computer (for compatibility and for use at boot time) even though much faster and more efficient bulk data transfer mechanisms are available.
Unfortunately, when something is wrong with your disk cable or its connectors, bad enough to interfere but not to make data transfer completely impossible, Windows will discover the problem and, instead of displaying a big message telling you to reconnect or replace your disk cable, it will silently fall back into PIO mode. At that point your disk performance goes to hell and your CPU utilization runs at 100% even for trivial I/O operations. Look in Windows Device Manager at the ATA disk controllers. There are two devices on each controller, and Windows will show you what mode they are running in. If any is in PIO mode, shut the system down and replace or replug the disk cables. Note that PIO can be selected as a fallback even for SATA disks.
SCSI and SAS (For Servers and Power Users)
SCSI is the name of a particular family of connectors and a command set. In modern practice, however, SCSI is the connection used by enterprise class disks with 4.5 millisecond seek times and 10K or 15K RPM rotation.
The SCSI command set does provide more powerful options for optimization than is available for dumb ATA drives. However, effective use of this command set requires a lot of disks and some fairly powerful system software.
Modern Serial Attached SCSI (SAS) disks use basically the same type of connection as a SATA disk. One pair of wires sends data from the computer to the disk while another pair sends data from the disk to the computer. Different grades of wire can then be used to extend the distance between the computer and the disk, so that a SAS disk can be a few meters away from the computer. In corporate server rooms, the disks can be in a different part of the rack than the CPU they are connected to.
The most valuable part of the SAS architecture is seldom used, particularly when all you do is buy a standard server from Dell. Each SAS disk has two data connectors and has an "address" similar to the address on an Ethernet adapter. Each SAS controller has four or eight connectors and each of those connectors has an address. Now inside a Dell server each disk has one of its two attachments connected to one port on the controller. The effect is almost identical to SATA.
However, just as many computers, printers, and routers can be connected to each other using Ethernet cables and switches, so it is possible to get SAS switches that connect a bunch of disks to a bunch of computers. Physically any computer could in theory talk to any disk, but you configure each computer with the addresses of the disks it should use. Everything works the same until a computer (or maybe just a RAID adapter card) fails for some reason.
Then if you are using the full SAS architecture, a computer may have a second RAID card that can also get to the same disk using a different path. Or if the entire computer goes down, a backup computer can connect to the disks and pick up where the first computer left off. All of this is possible given the SAS architecture, but it is not typically the solution that hardware vendors sell. Instead, they treat SAS disks the same as an older generation of SCSI disks, and if you want to store disks separate from the computer case itself then the vendors would rather sell you some type of SAN device for big bucks rather than a SAS switch.
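The failover idea can be illustrated with a toy model: each disk has up to two ports, each port may be reachable from a different controller, and I/O uses the first path that still works. This is only a sketch of the concept. It does not implement the SAS protocol, real SAS addressing, or any vendor's multipath driver, and the class names and address strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    controller: str       # hypothetical name of a RAID card or host port
    healthy: bool = True


@dataclass
class SasDisk:
    address: str                                     # stand-in for the disk's SAS address
    paths: list[Path] = field(default_factory=list)  # one entry per disk port (at most two)

    def active_path(self) -> Path:
        """Return the first working path; raise if the disk is unreachable."""
        for path in self.paths:
            if path.healthy:
                return path
        raise RuntimeError(f"no working path to disk {self.address}")


disk = SasDisk("disk-0", [Path("raid-card-a"), Path("raid-card-b")])
print(disk.active_path().controller)  # raid-card-a
disk.paths[0].healthy = False         # first controller fails
print(disk.active_path().controller)  # raid-card-b
```

In the typical vendor setup described above, each disk would have only one entry in `paths`, so a controller failure leaves the disk unreachable.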
Copyright 1998, 2008 PCLT -- Introduction to PC Hardware -- H. Gilbert
