British mathematician and physicist William Thomson (1824–1907), otherwise known as Lord Kelvin, indicated the importance of measurement to science:
When you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of science, whatever the matter may be.
Possibly the most striking application of Kelvin's words is to the explanation of combustion by the French chemist Antoine Lavoisier (1743–1794). Combustion was confusing to scientists of the time because some materials, such as wood, seemed to decrease in mass on burning: ashes weigh less than wood. In contrast, others, including iron, increased in mass: rust weighs more than iron. By carefully measuring the masses of the reactants (air and the material to be burned) and those of the products, Lavoisier was able to explain that combustion results when oxygen in the air unites with the material being burned. Because Lavoisier was careful to capture all products of combustion, it became clear that wood seemed to lose mass on burning because one of its combustion products is a gas, carbon dioxide, which had been allowed to escape.
Lavoisier's experiments and his explanations of them and of the experiments of others are often regarded as the beginning of modern chemistry. It is not an exaggeration to say that modern chemistry is the result of careful measurement.
Most people think of measurement as a simple process. One simply finds a measuring device, uses it on the object to be measured, and records the result. Careful scientific measurement is more involved than this and must be thought of as consisting of four steps, each one of which is discussed here: choosing a measuring device, selecting a sample to be measured, making a measurement, and interpreting the results.
Choosing a Measuring Device
The measuring device one chooses may be determined by the devices available and by the object to be measured. For example, if it were necessary to determine the mass of a coin, obviously inappropriate measuring devices would include a truck scale (reading in units of 20 pounds, with a 10-ton capacity), bathroom scale (in units of 1 pound, with a 300-pound capacity), and baby scale (in units of 0.1 ounce, with a 30-pound capacity). None of these is capable of determining the mass of so small an object. Possibly useful devices include a centigram balance (reading in units of 0.01 gram, with a 500-gram capacity), milligram balance (in units of 0.001 gram, with a 300-gram capacity), and analytical balance (in units of 0.00001 gram, with a 100-gram capacity). Even within this limited group of six instruments, those that are suitable differ if the object to be measured is an approximately one-kilogram book instead of a coin. Then only the bathroom scale and baby scale will suffice.
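The selection described above can be sketched in a few lines of Python. This is only an illustration: the names Instrument and usable_instruments are hypothetical, the 10-ton capacity is assumed to mean short tons (20,000 pounds), and the min_steps threshold (the object must span at least two reading increments) is a deliberately loose assumption chosen so that the sketch reproduces the grouping given in the text. A real experiment would usually demand far finer resolution.

```python
from dataclasses import dataclass

GRAMS_PER_POUND = 453.59237
GRAMS_PER_OUNCE = 28.349523125


@dataclass(frozen=True)
class Instrument:
    name: str
    resolution_g: float  # smallest reading increment, in grams
    capacity_g: float    # largest load the instrument accepts, in grams


# The six instruments from the text, converted to grams.
# Assumption: "10-ton" means 10 short tons, i.e. 20,000 pounds.
INSTRUMENTS = [
    Instrument("truck scale", 20 * GRAMS_PER_POUND, 20_000 * GRAMS_PER_POUND),
    Instrument("bathroom scale", 1 * GRAMS_PER_POUND, 300 * GRAMS_PER_POUND),
    Instrument("baby scale", 0.1 * GRAMS_PER_OUNCE, 30 * GRAMS_PER_POUND),
    Instrument("centigram balance", 0.01, 500.0),
    Instrument("milligram balance", 0.001, 300.0),
    Instrument("analytical balance", 0.00001, 100.0),
]


def usable_instruments(mass_g, min_steps=2.0):
    """Return names of instruments that can both hold and resolve the object.

    min_steps is an assumed threshold: the object's mass must span at
    least this many reading increments of the instrument.
    """
    return [
        inst.name
        for inst in INSTRUMENTS
        if mass_g <= inst.capacity_g and mass_g / inst.resolution_g >= min_steps
    ]


print(usable_instruments(3.0))     # a coin of roughly 3 grams
print(usable_instruments(1000.0))  # a book of roughly 1 kilogram
```

Raising min_steps narrows the list quickly, which is one way to see why the bathroom scale only barely suffices for the book.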
In addition, it is essential that the measuring device provide reproducible results. A milligram balance that yields successive measurements of 3.012, 1.246, 8.937, and 6.008 grams for the mass of the same coin is clearly faulty. One can check the reliability of a measuring device by measuring a standard object, in part to make sure that measurements are reproducible. A common measuring practice is to intersperse samples of known value within a group of many samples to be measured. When the final results are tallied, incorrect values for the known samples indicate some fault, which may be that of the measuring device or that of the experimenter. In the example of measuring the masses of different coins, one would include several "standard" coins, the mass of each being very well known.
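As a rough sketch of this practice, the check on the interspersed standards might look like the following. The sample labels, the masses, and the 0.005-gram tolerance are all invented for illustration, and check_standards is a hypothetical helper rather than an established procedure.

```python
def check_standards(readings, known_values, tolerance):
    """Return labels of standards whose reading misses the known value."""
    return [
        label
        for label, reading in readings.items()
        if label in known_values
        and abs(reading - known_values[label]) > tolerance
    ]


# Hypothetical "standard" coins with well-known masses, in grams.
known = {"std-A": 3.110, "std-B": 2.500}

# Hypothetical run: unknown coins with the standards interspersed.
run = {
    "coin-1": 3.094,
    "std-A": 3.111,
    "coin-2": 3.051,
    "std-B": 2.538,
    "coin-3": 2.507,
}

failed = check_standards(run, known, tolerance=0.005)
print(failed)  # any label listed here signals a fault worth investigating
```

A failed standard does not say whether the device or the experimenter is at fault, only that the run should not be trusted until the cause is found.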
Selecting a Sample
There may be no choice of sample because the task at hand may be simply that of measuring one object, such as determining the mass of a specific coin. If the goal is to determine the mass of a specific kind of coin, such as a U.S. penny, there are several questions to be addressed, including the following. Are uncirculated or worn coins to be measured? Worn coins may have less mass because copper has worn off, or more mass because copper oxide weighs more than copper and dirt also adds mass. Are the coins of just one year to be measured? Coin mass may differ from year to year. How many coins should be measured to obtain a representative sample? It is likely that there is a slight variation in mass among coins and a large enough number of coins should be measured to encompass that variation. How many sources (banks or stores) should be visited to obtain samples? Different batches of new coins may be sent to different banks; circulated coins may be used mostly in vending machines and show more wear as a result.
The questions asked depend on the type of sample to be measured. If the calorie content of breakfast cereal is to be determined, the sampling questions include how many factories to visit for samples, whether to sample unopened or opened boxes of cereal, and when the cereal sample was manufactured. These are asked for much the same reasons as the similar questions about coins. In addition, other questions come to mind. How many samples should be taken from each box? From where in the box should samples be taken? Might samples of small flakes have a different calorie content than samples of large flakes?
These sampling questions are often the most difficult to formulate but they are also the most important to consider in making a measurement. The purpose of asking them is to obtain a sample that is as representative as possible of the object being measured, without repeating the measurement unnecessarily. Obviously, a very exact average mass of the U.S. penny can be obtained by measuring every penny in circulation. This procedure would be so time-consuming that it is impractical, in addition to being expensive.
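One way to act on these questions is to draw the same number of coins at random from each source, so that no single bank or vending machine dominates the sample. The sketch below assumes that approach. The source names, coin identifiers, and the draw_sample helper are hypothetical, and equal allocation per source is only one of several reasonable sampling plans.

```python
import random


def draw_sample(coins_by_source, per_source, seed=None):
    """Draw up to per_source coins at random from every source."""
    rng = random.Random(seed)
    sample = []
    for source, coins in coins_by_source.items():
        k = min(per_source, len(coins))  # a source may hold fewer coins
        sample.extend((source, coin) for coin in rng.sample(coins, k))
    return sample


# Hypothetical inventory: coin identifiers grouped by where they were obtained.
available = {
    "bank A": [f"A{i:03d}" for i in range(50)],
    "bank B": [f"B{i:03d}" for i in range(50)],
    "vending machine": [f"V{i:03d}" for i in range(20)],
}

sample = draw_sample(available, per_source=5, seed=1)
print(len(sample))
```

Fixing the seed makes the draw repeatable, which helps when the sampling plan itself has to be documented.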
Making a Measurement
As mentioned above, making a measurement includes verifying that the measuring device yields reproducible results, typically by measuring standard samples. Another reason for measuring standard samples is to calibrate the measuring instrument. For example, a common method to determine the viscosity of a liquid—its resistance to flow—requires knowing the density of that liquid and the time that it takes for a definite volume of liquid to flow through a thin tube, within a device called a viscometer. It is very difficult to construct duplicate viscometers that have exactly the same length and diameter of that tube. To overcome this natural variation, a viscometer is calibrated by timing the flow of a pure liquid whose viscosity is known—such as water—through it. Careful calibration involves timing the flow of a standard volume of more than one pure liquid.
Calibration not only accounts for variations in the dimensions of the viscometer. It also compensates for small variations in the composition of the glass of which the viscometer is made, small differences in temperatures, and even differences in the gravitational acceleration due to different positions on Earth. Finally, calibration can compensate for small variations in technique from one experimenter to another.
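The calibration can be illustrated with a short sketch. It assumes the simple capillary relation in which viscosity is proportional to density times flow time (viscosity = k × density × time), with the constant k fixed by the standard liquids. That relation, the function names, and the flow times are assumptions made for illustration. The water values (about 1.002 mPa·s and 0.9982 g/mL near 20°C) are commonly quoted figures.

```python
def calibration_constant(viscosity, density, flow_time):
    """k in viscosity = k * density * flow_time, from one standard liquid."""
    return viscosity / (density * flow_time)


def mean_constant(standards):
    """Average k over several (viscosity, density, flow_time) standards."""
    constants = [calibration_constant(*standard) for standard in standards]
    return sum(constants) / len(constants)


def viscosity_from_flow(k, density, flow_time):
    """Viscosity of a sample from its density and its measured flow time."""
    return k * density * flow_time


# Water near 20 degrees C: about 1.002 mPa·s and 0.9982 g/mL.
# The 100.0-second flow time is hypothetical.
standards = [(1.002, 0.9982, 100.0)]
k = mean_constant(standards)

# Hypothetical sample: density 0.7893 g/mL, flow time 152.0 seconds.
eta = viscosity_from_flow(k, density=0.7893, flow_time=152.0)
print(round(eta, 3))  # viscosity in mPa·s
```

Adding a second or third standard liquid to the list averages the constant over them, in the spirit of the careful calibration described above.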
These variations between experimenters are of special concern. Different experimenters can obtain very different values when measuring the same sample. The careful experimenter takes care to prevent bias or difference in technique from being reflected in the final result. Methods of prevention include attempting to measure different samples without knowing the identity of each sample. For instance, if the viscosities of two colorless liquids are to be measured, several different aliquots of each liquid will be prepared, the aliquots will be shuffled, and each aliquot will be measured in order. As much of the measurement as possible will be made mechanically. Rather than timing flow with a stopwatch, it is timed with an electronic device that starts and stops as liquid passes definite points.
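The shuffling step can be sketched as follows. The code labels (S01, S02, and so on) and the blind_aliquots helper are hypothetical. The idea is that the person measuring sees only the coded labels, while the key linking codes to liquids is held by someone else until the measurements are complete.

```python
import random


def blind_aliquots(liquids, aliquots_per_liquid, seed=None):
    """Shuffle aliquots and hide their identities behind coded labels.

    Returns the run order (coded labels only) and the key that maps each
    code back to its liquid.
    """
    rng = random.Random(seed)
    identities = [
        liquid for liquid in liquids for _ in range(aliquots_per_liquid)
    ]
    rng.shuffle(identities)
    key = {f"S{i + 1:02d}": identity for i, identity in enumerate(identities)}
    return list(key), key


run_order, key = blind_aliquots(["liquid A", "liquid B"], 4, seed=7)
print(run_order)  # what the experimenter sees; the key stays sealed
```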
Finally, the experimenter makes certain to observe the measurement the same way for each trial. When a length is measured with a meter stick or a volume is measured with a graduated cylinder, the eye of the experimenter is in line with or at the same level as the object being measured to avoid parallax. When using a graduated device, such as a thermometer, meter stick, or graduated cylinder, the measurement is estimated one digit more finely than the finest graduation. For instance, if a thermometer is graduated in degrees, 25.4°C (77.7°F) would be a reasonable measurement made with it, with the ".4" estimated by the experimenter.
Each measurement is recorded as it is made. It is important to not trust one's memory. In addition, it is important to write down the measurements made, not the results from them. For instance, if the mass of a sample of sodium chloride is determined on a balance, one will first obtain the mass of a container, such as 24.789 grams, and then the mass of the container with the sodium chloride present, such as 32.012 grams. It is important to record both of these masses and not just their difference, the mass of sodium chloride, 7.223 grams.
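In a digital notebook the same rule applies: store the two readings and derive the difference from them. The sketch below uses the masses from the example above. The WeighingRecord class is hypothetical, and Decimal is used only so that the subtraction reproduces the written digits exactly.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class WeighingRecord:
    """Raw balance readings; the sample mass is derived, never stored."""
    container_g: Decimal
    container_plus_sample_g: Decimal

    @property
    def sample_g(self):
        return self.container_plus_sample_g - self.container_g


record = WeighingRecord(Decimal("24.789"), Decimal("32.012"))
print(record.sample_g)  # 7.223
```

If one of the readings later proves to be wrong, the raw values show which one, something the bare difference cannot do.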
Typically, the results of a measurement involve many values, the observations of many trials. It is tempting to discard values that seem quite different from the others. This is an acceptable course of action if there is good reason to believe that the errant value was improperly measured. If the experimenter kept good records while measuring, notations made during one or more trials may indicate that an individual value was poorly obtained—for instance, by not zeroing or leveling a balance, neglecting to read the starting volume in a buret before titration, or failing to cool a dried sample before obtaining its mass.
Simply discarding a value because it deviates from the others, without sound experimental reasons for doing so, is unjustified and may produce misleading results. Consider the masses of several pennies determined with a milligram balance to be 3.107, 3.078, 3.112, 2.911, 3.012, 3.091, 3.055, and 2.508 grams. Discarding the last mass because of its deviation would obscure the fact that post-1982 pennies have a zinc core with copper cladding (a total of about 2.5 percent copper), whereas pre-1982 pennies are composed of an alloy that is 95 percent copper. There are statistical tests that help in deciding whether to reject a specific value or not.
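One such test is Dixon's Q test, which compares the gap between a suspect value and its nearest neighbor with the full range of the data. The sketch below applies it to the penny masses above. The text does not name a particular test, so the choice of the Q test is an assumption, and the critical value of 0.526 (eight observations, 95 percent confidence) is taken from commonly published tables and should be checked against one before use.

```python
def dixon_q(values):
    """Return (suspect, Q) for the more extreme end of the data.

    Q is the gap between the suspect value and its nearest neighbor,
    divided by the full range of the data.
    """
    ordered = sorted(values)
    spread = ordered[-1] - ordered[0]
    if len(ordered) < 3 or spread == 0:
        raise ValueError("need at least three values that are not all equal")
    low_gap = ordered[1] - ordered[0]
    high_gap = ordered[-1] - ordered[-2]
    if low_gap >= high_gap:
        return ordered[0], low_gap / spread
    return ordered[-1], high_gap / spread


pennies = [3.107, 3.078, 3.112, 2.911, 3.012, 3.091, 3.055, 2.508]
suspect, q = dixon_q(pennies)

# Assumed critical value for n = 8 at 95 percent confidence; verify in a table.
Q_CRIT_95_N8 = 0.526
flagged = q > Q_CRIT_95_N8
print(suspect, round(q, 3), flagged)
```

Here the test does flag the 2.508-gram penny, which illustrates the warning above: a statistical flag is a reason to investigate the value, not proof that it was badly measured.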
It is cumbersome, however, to report all the values that have been measured. Reporting solely the average or mean value gives no indication of how carefully the measurement has been made or how reproducible the repeated measurements are. Care in measurement is implied by the number of significant figures reported; this corresponds to the number of digits to which one can read the measuring devices, with one digit beyond the finest graduation, as indicated earlier.
The reproducibility of measurements is a manifestation of their precision. Precision is easily expressed by citing the range of the results; a narrow range indicates high precision. Other methods of expressing precision include relative average deviation and standard deviation. Again, a small value of either deviation indicates high precision; repeated measurements are apt to replicate the values of previous ones.
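The three measures of precision named above can be computed with Python's standard library, as in this sketch. The replicate masses are hypothetical, precision_summary is an illustrative helper, and the standard deviation shown is the sample standard deviation (dividing by one less than the number of values).

```python
import statistics


def precision_summary(values):
    """Range, relative average deviation, and sample standard deviation."""
    mean = statistics.mean(values)
    average_deviation = sum(abs(v - mean) for v in values) / len(values)
    return {
        "mean": mean,
        "range": max(values) - min(values),
        "relative_average_deviation": average_deviation / mean,
        "standard_deviation": statistics.stdev(values),
    }


# Hypothetical replicate masses of one coin, in grams.
replicates = [3.107, 3.112, 3.109, 3.105, 3.110]
summary = precision_summary(replicates)
for name, value in summary.items():
    print(f"{name}: {value:.5f}")
```

The relative average deviation is often multiplied by 1,000 and quoted in parts per thousand.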
When several different quantities are combined to obtain a final value—such as combining flow time and liquid density to determine viscosity—standard propagation-of-error techniques are employed to calculate the deviation in the final value from the deviations in the different quantities.
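For quantities that are multiplied together, such as density and flow time, a common propagation rule is to add the relative deviations in quadrature. The sketch below assumes that rule, assumes the deviations are independent and random, and treats the calibration constant as exact. All numerical values are hypothetical.

```python
import math


def relative_deviation_of_product(*relative_deviations):
    """Combine independent relative deviations of multiplied quantities."""
    return math.sqrt(sum(r * r for r in relative_deviations))


# Hypothetical measurements with their standard deviations.
density, s_density = 0.7893, 0.0004  # g/mL
flow_time, s_time = 152.0, 0.3       # seconds

rel = relative_deviation_of_product(s_density / density, s_time / flow_time)
print(f"relative deviation in viscosity: {rel:.2%}")
```

In this example the timing dominates the combined deviation, so improving the density measurement alone would gain little.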
Both errors and deviations combine in the same way when several quantities are combined, even though error and deviation are quite different concepts. As mentioned above, deviation indicates how reproducible successive measurements are. Error is a measure of how close an individual value—or an average—is to an accepted value of a quantity. A measurement with small error is said to be accurate. Often, an experimenter will believe that high precision indicates low error. This frequently is true, but very precise measurements may have a uniform error, known as a systematic error. An example would be a balance that is not zeroed, resulting in masses that are uniformly high or low.
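The distinction can be made concrete with a small sketch. The readings below are invented to mimic a balance that was not zeroed, weighing a standard whose accepted mass is 10.000 grams. The error_and_deviation helper is hypothetical.

```python
import statistics


def error_and_deviation(values, accepted_value):
    """Return (error of the mean, sample standard deviation)."""
    mean = statistics.mean(values)
    return mean - accepted_value, statistics.stdev(values)


# Hypothetical readings, in grams, from a balance that was not zeroed.
readings = [10.052, 10.049, 10.051, 10.050, 10.053]
error, deviation = error_and_deviation(readings, accepted_value=10.000)
print(f"error: {error:+.3f} g, standard deviation: {deviation:.4f} g")
```

The readings agree with one another to within a few milligrams, yet every one of them is about 0.05 gram high: high precision alongside a systematic error.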
The goal of careful measurement ultimately is to determine an accepted value. Careful measurement technique—including choosing the correct measuring device, selecting a sample to be measured, making a measurement, and interpreting the results—helps to realize that goal.
Youden, W. J. (1991). Experimentation and Measurement. NIST Special Publication 672. Washington, DC: National Institute of Standards and Technology.