EFTA00307512.pdf
PDF Source (No Download)
Extracted Text (OCR)
Vol.. 63, No. 2
MARCH, 1956
THE PSYCHOLOGICAL REVIEW
THE MAGICAL NUMBER SEVEN, PLUS OR MINUS TWO:
SOME LIMITS ON OUR CAPACITY FOR
PROCESSING INFORMATION'
GEORGE A. MILLER
Harvard University
My problem is that I have been perse-
cuted by an integer. For seven years
this number has followed me around, has
Intruded in my most private data, and
has assaulted me from the pages of our
most public journals. This number as-
sumes a variety of disguises, being some-
times a little larger and sometimes a
little smaller than usual, but never
changing so much as to be unrecogniz-
able. The persistence with which this
number plagues me is far more than
a random accident. There is, to quote
a famous senator, a design behind it,
some pattern governing its appearances.
Either there really is something unusual
about the number or else I am suffering
from delusions of persecution.
I shall begin my case history by tell-
ing you about some experiments that
tested how accurately people can assign
numbers to the magnitudes of various
aspects of a stimulus. In the tradi-
tional language of psychology these
would be called experiments in absolute
Th1s paper was first read as an Invited
Address before the Eastern Psychological As-
sociation In Philadelphia on April IS, 193.5.
Preparation of the paper was supported by
the Harvard Psycho-Acoustic Laboratory un-
der Contract NSort-76 between Harvard Uni-
versity and the Office of Naval Research, U. S.
Navy (Project NR142-201, Report PNR-174).
Reproduction for any purpose of the U. S.
Government is permitted.
judgment.
Historical accident, how-
ever, has decreed that they should have
another name. We now call them ex-
periments on the capacity of people to
transmit information. Since these ex-
periments would not have been done
without the appearance of information
theory on the psychological scene, and
since the results are analyzed in terms
of the concepts of information theory,
I shall have to preface my discussion
with a few remarks about this theory.
INFORMATION MEASUREMENT
The "amount of information" is ex-
actly the same concept that we have
talked about for years under the name
of "variance." The equations are dif-
ferent, but if we hold tight to the idea
that anything that increases the vari-
ance also increases the amount of infor-
mation we cannot go far astray.
The advantages of this new way
of talking about variance are simple
enough. Variance is always stated in
terms of the unit of measurement—
inches, pounds, volts, etc.—whereas the
amount of information is a dimension-
less quantity. Since the information in
a discrete statistical distribution does
not depend upon the unit of measure-
ment, we can extend the concept to
situations where we have no metric and
we would not ordinarily think of using
81
I '
EFTA00307512
82
GEORGE A. Mn...ten
the variance. And It also enables us to
compare results obtained in quite dif-
ferent experimental situations where it
would be meaningless to compare vari-
ances based on different metrics. So
there are some good reasons for adopt-
ing the newer concept.
The similarity of variance and amount
of information might be explained this
way: When we have a large variance,
we are very ignorant about what is go-
ing to happen. If we are very ignorant,
then when we make the observation It
gives us a lot of information. On the
other hand, if the variance is very small,
we know in advance how our observa-
tion must come out, so we get little in-
formation from making the observation.
If you will now imagine a communi-
cation system, you will realize that
there is a great deal of variability about
what goes into the system and also a
great deal of variability about what
comes out. The input and the output
can therefore be described in terms of
their variance (or their information).
If it is a good communication system,
however, there must be some system-
atic relation between what goes in and
what comes out. That is to say, the
output will depend upon the input, or
will be correlated with the input. If we
measure this correlation, then we can
say how much of the output variance is
attributable to the input and how much
is due to random fluctuations or "noise"
introduced by the system during trans-
mission. So we see that the measure
of transmitted information is simply a
measure of the input-output correlation.
There are two simple rules to follow.
Whenever I refer to "amount of in-
formation," you will understand "vari-
ance." And whenever I refer to "amount
of transmitted information," you will
understand "covariance" or "correla-
tion."
The situation can be described graphi-
cally by two partially overlapping cir-
des. Then the left circle can be taken
to represent the variance of the input,
the right circle the variance of the out-
put, and the overlap the covariance of
input and output. I shall speak of the
left circle as the amount of input infor-
mation, the right circle as the amount
of output information, and the overlap
as the amount of transmitted informa-
tion.
In the experiments on absolute judg-
ment, the observer is considered to be
a communication channel. Then the
left circle would represent the amount
of information in the stimuli, the right
circle the amount of information in his
responses, and the overlap the stimulus-
response correlation as measured by the
amount of transmitted information. The
experimental problem is to increase the
amount of input information and to
measure the amount of transmitted in-
formation. If the observer's absolute
judgments are quite accurate, then
nearly all of the input information will
be transmitted and will be recoverable
from his responses. If he makes errors,
then the transmitted information may
be considerably less than the input. We
expect that, as we increase the amount
of input information, the observer will
begin to make more and more errors;
we can test the limits of accuracy of his
absolute judgments. If the human ob-
server is a reasonable kind of communi-
cation system, then when we increase
the amount of input information the
transmitted information will increase at
first and will eventually level off at some
asymptotic value. This asymptotic value
we take to be the channel capacity of
the observer: it represents the greatest
amount of information that he can give
us about the stimulus on the basis of
an absolute judgment. The channel ca-
pacity is the upper limit on the extent
to which the observer can match his re-
sponses to the stimuli we give him.
Now just a brief word about the bit
EFTA00307513
THE MAGICAL
and we can begin to look at some data.
One bit of information is the amount of
information that we need to make a
decision between two equally likely al-
ternatives. If we must decide whether
a man is less than six feet tall or more
than six feet tall and if we know that
the chances are SO-50, then we need
one bit of information.
Notice that
this unit of information does not refer
in any way to the unit of length that
we use—feet, inches, centimeters, etc.
However you measure the man's height,
we still need just one bit of information.
Two bits of information enable us to
decide among four equally likely alter-
natives. Three bits of information en-
able us to decide among eight equally
likely alternatives. Four bits of infor-
mation decide among 16 alternatives,
five among 32, and so on. That is to
say, if there are 32 equally likely alter-
natives, we must make five successive
binary decisions, worth one bit each, be-
fore we know which alternative is cor-
rect.
So the general rule is simple:
every time the number of alternatives
is increased by a factor of two, one bit
of information is added.
There are two ways we might in-
crease the amount of input information.
We could increase the rate at which we
give information to the observer, so that
the amount of information per unit time
would increase. Or we could ignore the
time variable completely and increase
the amount of input information by
increasing the number of alternative
stimuli. In the absolute judgment ex-
periment we are interested in the second
alternative. We give the observer as
much time as he wants to make his re-
sponse; we simply increase the number
of alternative stimuli among which he
must discriminate and look to see where
confusions begin to occur. Confusions
will appear near the point that we are
calling his "channel capacity."
NUMBER SEVEN
83
ABSOLUTE JUDGMENTS OF UNI-
DIMENSIONAL STIMULI
Now let us consider what happens
when we make absolute judgments of
tones. Pollack (17) asked listeners to
identify tones by assigning numerals to
them. The tones were different with re-
spect to frequency, and covered the
range from 100 to 8000 cps in equal
logarithmic steps. A tone was sounded
and the listener responded by giving a
numeral. After the listener had made
his response he was told the correct
identification of the tone.
When only two or three tones were
used the listeners never confused them.
With four different tones confusions
were quite rare, but with five or more
tones confusions were frequent. With
fourteen different tones the listeners
made many mistakes.
These data are plotted in Fig. 1.
Along the bottom is the amount of in-
put information in bits per stimulus.
As the number of alternative tones was
increased from 2 to 14, the input infor-
mation increased from 1 to 3.8 bits. On
the ordinate is plotted the amount of
2
0
4
0
PITCHES
i00-8000 CP5
3
n
5
INPUT INrORMAT ION
2 5
sirs
no. 1. Data from Pollack (17, 18) on the
amount of information that is transmitted by
listeners who make absolute judgments of
auditory pitch. As the amount of input in-
formation is increased by increasing from 2
to 14 the number of different pitches to be
judged, the amount of transmitted informa-
tion approaches as its upper limit a channel
capacity of about IS bits per judgment.
EFTA00307514
84
GEORGE A. Minn
transmitted information. The amount
of transmitted information behaves in
much the way we would expect a corn-
munication channel to behave; the trans-
mitted information increases linearly up
to about 2 bits and then bends off to-
ward an asymptote at about 2.5 bits.
This value, 2.5 bits, therefore, is what
we are calling the channel capacity of
the listener for absolute judgments of
pitch.
So now we have the number 2.5
bits. What does it mean? First, note
that 2.5 bits corresponds to about six
equally likely alternatives. The result
means that we cannot pick more than
six different pitches that the listener will
never confuse. Or, stated slightly dif-
ferently, no matter how many alterna-
tive tones we ask him to judge, the best
we can expect him to do is to assign
them to about six different classes with-
out error. Or, again, if we know that
there were N alternative stimuli, then
his judgment enables us to narrow down
the particular stimulus to one out of
N/6.
Most people are surprised that the
number is as small as six. Of course,
there is evidence that a musically so-
phisticated person with absolute pitch
can identify accurately any one of 50
or 60 different pitches. Fortunately, I
do not have time to discuss these re-
markable exceptions. I say it is for-
tunate because I do not know how to
explain their superior performance. So
I shall stick to the more pedestrian fact
that most of us can identify about one
out of only five or six pitches before we
begin to get confused.
It is interesting to consider that psy-
chologists have been using seven-point
rating scales for a long time, on the
intuitive basis that trying to rate into
finer categories does not really add much
to the usefulness of the ratings. Pol-
lack's results indicate that, at least for
pitches, this Intuition is fairly sound.
2.3
BITS
INPUT INFORMATION
FIG. 2. Data from Garner (7) on the chan-
nel capacity for absolute judgments of audi-
tory loudness.
Next you can ask how reproducible
this result is. Does it depend on the
spacing of the tones or the various con-
ditions of judgment?
Pollack varied
these conditions in a number of ways.
The range of frequencies can be changed
by a factor of about 20 without chang-
ing the amount of information trans-
mitted more than a small percentage.
Different groupings of the pitches de-
creased the transmission, but the loss
was small. For example, if you can
discriminate five high-pitched tones in
one series and five low-pitched tones in
another series, it is reasonable to ex-
pect that you could combine all ten into
a single series and still tell them all
apart without error. When you try it,
however, it does not work. The chan-
nel capacity for pitch seems to be about
six and that is the best you can do.
While we are on tones, let us look
next at Garner's (7) work on loudness.
Garner's data for loudness are sum-
marized in Fig. 2. Garner went to some
trouble to get the best possible spacing
of his tones over the intensity range
from 15 to 110 db. He used 4, 5, 6, 7,
10, and 20 different stimulus intensities.
The results shown in Fig. 2 take into
account the differences among subjects
and the sequential influence of the im-
mediately preceding judgment. Again
we find that there seems to be a limit.
EFTA00307515
THE MAGICAL NUMBER SEVEN
85
- I9
gas
TASTF$
•Fuocaattts of samw
CONCENTRATION
2
3
4
5
elm INFORMATION
Fro. S. Data from Beebe-Center, Rogers,
and O'Connell (1) on the channel capacity for
absolute judgments of saltiness.
The channel capacity for absolute judg-
ments of loudness is 2.3 bits, or about
five perfectly discriminable alternatives.
Since these two studies were done in
different laboratories with slightly dif-
ferent techniques and methods of analy-
sis, we are not in a good position to
argue whether five loudnesses is signifi-
cantly different from six pitches. Prob-
ably the difference is in the right direc-
tion, and absolute judgments of pitch
are slightly more accurate than absolute
judgments of loudness. The important
point, however, is that the two answers
are of the same order of magnitude.
The experiment has also been done
for taste intensities. In Fig. 3 are the
results obtained by Beebe-Center, Rog-
ers, and O'Connell (1) for absolute
judgments of the concentration of salt
solutions.
The concentrations ranged
from 0.3 to 34.7 gin. Neel per 100
cc. tap water in equal subjective steps.
They used 3, 5, 9, and 17 different con-
centrations. The channel capacity is
1.9 bits, which is about four distinct
concentrations. Thus taste intensities
seem a little less distinctive than audi-
tory stimuli, but again the order of
magnitude is not far off.
On the other hand, the channel ca-
pacity for judgments of visual position
seems to be significantly larger. Hake
and Garner (8) asked observers to in-
terpolate visually between two scale
markers. Their results are shown in
Fig. 4. They did the experiment in
two ways. In one version they let the
observer use any number between zero
and 100 to describe the position, al-
though they presented stimuli at only
5, 10, 20, or 50 different positions. The
results with this unlimited response
technique are shown by the filled circles
on the graph. In the other version the
observers were limited in their re-
sponses to reporting just those stimu-
lus values that were possible. That is
to say, in the second version the num-
ber of different responses that the ob-
server could make was exactly the same
as the number of different stimuli that
the experimenter might present. The
results with this limited response tech-
nique are shown by the open circles on
the graph. The two functions are so
similar that it seems fair to conclude
that the number of responses available
to the observer had nothing to do with
the channel capacity of 3.25 bits.
The Hake-Garner experiment has been
repeated by Coonan and Klemmer. Al-
though they have not yet published
their results, they have given me per-
mission to say that they obtained chan-
nel capacities ranging from 3.2 bits for
a
POINTS ON A LINE
o me • ms
• mr • KO
2
3
4
5
iNFUT INFORMATION
Fro. 4. Data from Hake and Garner (8)
on the channel capacity for absolute Judg-
ments of the position of a pointer in a linear
interval.
EFTA00307516
86
GEORGE A. MILLER
very short exposures of the pointer po-
sition to 3.9 bits for longer exposures.
These values are slightly higher than
Hake and Garner's, so we must con-
clude that there are between 10 and IS
distinct positions along a linear inter-
val. This is the largest channel ca-
pacity that has been measured for any
unidimensional variable.
At the present time these four experi-
ments on absolute judgments of simple,
unidimensional stimuli are all that have
appeared in the psychological journals.
However, a great deal of work on other
stimulus variables has not yet appeared
in the journals. For example, Eriksen
and Hake (6) have found that the
channel capacity for judging the sizes
of squares is 2.2 bits, or about five
categories, under a wide range of ex-
perimental conditions.
In a separate
experiment Eriksen (5) found 2.8 bits
for size, 3.1 bits for hue, and 2.3 bits
for brightness. Geldard has measured
the channel capacity for the skin by
placing vibrators on the chest region.
A good observer can identify about four
intensities, about five durations, and
about seven locations.
One of the most active groups in this
area has been the Air Force Operational
Applications Laboratory. Pollack has
been kind enough to furnish me with
the results of their measurements for
several aspects of visual displays. They
made measurements for area and for
the curvature, length, and direction of
lines. In one set of experiments they
used a very short exposure of the stimu-
lus—%0 second—and then they re-
peated the measurements with a 5-
second exposure.
For area they got
2.6 bits with the short exposure and
2.7 bits with the long exposure. For
the length of a line they got about 2.6
bits with the short exposure and about
3.0 bits with the long exposure. Direc-
tion, or angle of inclination, gave 2.8
bits for the short exposure and 3.3 bits
for the long exposure. Curvature was
apparently harder to judge. When the
length of the arc was constant, the re-
sult at the short exposure duration was
2.2 bits, but when the length of the
chord was constant, the result was only
1.6 bits. This last value is the lowest
that anyone has measured to date. I
should add, however, that these values
are apt to be slightly too low because
the data from all subjects were pooled
before the transmitted information was
computed.
Now let us see where we are. First,
the channel capacity does seem to be a
valid notion for describing human ob-
servers. Second, the channel capacities
measured for these unidimensional vari-
ables range from 1.6 bits for curvature
to 3.9 bits for positions in an interval.
Although there is no question that the
differences among the variables are real
and meaningful, the more impressive
fact to me is their considerable simi-
larity. If I take the best estimates I
can get of the channel capacities for all
the stimulus variables I have mentioned,
the mean is 2.6 bits and the standard
deviation is only 0.6 bit. In terms of
distinguishable alternatives, this mean
corresponds to about 6.5 categories, one
standard deviation includes from 4 to
10 categories, and the total range is
from 3 to 15 categories. Considering
the wide variety of different variables
that have been studied, I find this to
be a remarkably narrow range.
There seems to be some limitation
built into us either by learning or by
the design of our nervous systems, a
limit that keeps our channel capacities
in this general range. On the basis of
the present evidence it seems safe to
say that we possess a finite and rather
small capacity for making such unfelt-
mensional judgments and that this ca-
pacity does not vary a great deal from
one simple sensory attribute to another.
EFTA00307517
THE MAGICAL NUMBER SEVEN
87
ABSOLUTE JUDGMENTS OF MULTI-
DIMENSIONAL STIMULI
You may have noticed that I have
been careful to say that this magical
number seven applies to one-dimensional
judgments. Everyday experience teaches
us that we can identify accurately any
one of several hundred faces, any one
of several thousand words, any one of
several thousand objects, etc. The story
certainly would not be complete if we
stopped at this point. We must have
some understanding of why the one-
dimensional variables we judge in the
laboratory give results so far out of
line with what we do constantly in our
behavior outside the laboratory. A pos-
sible explanation lies in the number of
independently variable attributes of the
stimuli that are being judged. Objects,
faces, words, and the like differ from
one another in many ways, whereas the
simple stimuli we have considered thus
far differ from one another in only one
respect.
Fortunately, there are a few data on
what happens when we make absolute
judgments of stimuli that differ from
one another in several ways. Let us
look first at the results Klemmer and
Frick (13) have reported for the abso-
lute judgment of the position of a dot
in a square. In Fig. S we see their re-
I
z
3
4
5
6
7
6
INPUT INFORMATION
a
- --4.6 BITS—
7ete
POINTS IN A SQUARE
I43 GRID
.03 SEC. EXPOSURE
Fro. S. Data from ICkmmer and Frick (13)
on the channel capacity for absolute Judg-
ments of the position of a dot In a square.
sults. Now the channel capacity seems
to have increased to 4.6 bits, which
means that people can identify accu-
rately any one of 24 positions in the
square.
The position of a dot In a square is
clearly a two-dimensional proposition.
Both its horizontal and its vertical po-
sition must be identified. Thus it seems
natural to compare the 4.6-bit capacity
for a square with the 3.25-bit capacity
for the position of a point in an inter-
val. The point in the square requires
two judgments of the interval type. If
we have a capacity of 3.25 bits for esti-
mating intervals and we do this twice,
we should get 6.5 bits as our capacity
for locating points in a square. Adding
the second independent dimension gives
us an increase from .3.25 to 4.6, but it
falls short of the perfect addition that
would give 6.5 bits.
Another example is provided by Beebe-
Center, Rogers, and O'Connell. When
they asked people to identify both the
saltiness and the sweetness of solutions
containing various concentrations of salt
and sucrose, they found that the chan-
nel capacity was 2.3 bits. Since the ca-
pacity for salt alone was 1,9, we might
expect about 3.8 bits if the two aspects
of the compound stimuli were judged
independently.
As with spatial loca-
tions, the second dimension adds a little
to the capacity but not as much as it
conceivably might.
A third example is provided by Pol-
lack (18), who asked listeners to judge
both the loudness and the pitch of pure
tones. Since pitch gives 2.5 bits and
loudness gives 2.3 bits, we might hope
to get as much as 4.8 bits for pitch and
loudness together. Pollack obtained 3.1
bits, which again indicates that the
second dimension augments the channel
capacity but not so much as it might.
A fourth example can be drawn from
the work of Halsey and Chapanis (9)
on confusions among colors of equal
EFTA00307518
88
GEORGE A. MILLER
luminance. Although they did not ana-
lyze their results in informational terms,
they estimate that there are about 11 to
15 identifiable colors, or, in our terms,
about 3.6 bits. Since these colors varied
in both hue and saturation, it is prob-
ably correct to regard this as a two-
dimensional judgment. If we compare
this with Eriksen's 3.1 bits for hue
(which is a questionable comparison to
draw), we again have something less
than perfect addition when a second
dimension is added.
It is still a long way, however, from
these two-dimensional examples to the
multidimensional stimuli provided by
faces, words, etc. To fill this gap we
have only one experiment, an auditory
study done by Pollack and Ficks (19).
They managed to get six different acous-
tic variables that they could change:
frequency, intensity, rate of Interrup-
tion, on-time fraction, total duration,
and spatial location. Each one of these
six variables could assume any one of
five different values, so altogether there
were 58, or 15,625 different tones that
they could present. The listeners made
a separate rating for each one of these
six dimensions. Under these conditions
the transmitted information was 7.2 bits,
which corresponds to about ISO differ-
ent categories that could be absolutely
identified without error. Now we are
beginning to get up into the range that
ordinary experience would lead us to
expect.
Suppose that we plot these data,
fragmentary as they are, and make a
guess about how the channel capacity
changes with the dimensionality of the
stimuli. The result is given in Fig. 6.
In a moment of considerable daring I
sketched the dotted line to indicate
roughly the trend that the data seemed
to be taking.
Clearly, the addition of independently
variable attributes to the stimulus in-
creases the channel capacity, but at a
33
10
6
2
0
2
3
4
5
6
7
NUMBER OF VAAI&SLC ASPECTS
Fm. 6. The general form of the relation be-
tween channel capacity and the number of in-
dependently variable attributes of the stimuli.
decreasing rate. It Is Interesting to
note that the channel capacity is in-
creased even when the several variables
are not independent. Eriksen (5) re-
ports that, when size, brightness, and
hue all vary together in perfect correla-
tion, the transmitted information is 4.1
bits as compared with an average of
about 2.7 bits when these attributes are
varied one at a time. By confounding
three attributes, Eriksen increased the
dimensionality of the input without in-
creasing the amount of input informa-
tion; the result was an increase in chan-
nel capacity of about the amount that
the dotted function in Fig. 6 would lead
us to expect.
The point seems to be that, as we
add more variables to the display, we
increase the total capacity, but we de-
crease the accuracy for any particular
variable. In other words, we can make
relatively crude judgments of several
things simultaneously.
We might argue that in the course of
evolution those organisms were most
successful that were responsive to the
widest range of stimulus energies in
their environment. In order to survive
in a constantly fluctuating world, it was
better to have a little information about
a lot of things than to have a lot of in-
formation about a small segment of the
EFTA00307519
THE MAGICAL NUMBER SEVEN
environment. If a compromise was nec-
essary, the one we seem to have made is
clearly the more adaptive.
Pollack and Ficks's results are very
strongly suggestive of an argument that
linguists and phoneticians have been
making for some time (11). According
to the linguistic analysis of the sounds
of human speech, there are about eight
or ten dimensions—the linguists call
them distinctive features—that distin-
guish one phoneme from another. These
distinctive features are usually binary,
or at most ternary, in nature. For ex-
ample, a binary distinction is made be-
tween vowels and consonants, a binary
decision is made between oral and nasal
consonants, a ternary decision is made
among front, middle, and back pho-
nemes, etc.
This approach gives us
quite a different picture of speech per-
ception than we might otherwise obtain
from our studies of the speech spectrum
and of the ear's ability to discriminate
relative differences among pure tones.
I am personally much interested in this
new approach (15), and I regret that
there is not time to discuss it here.
It was probably with this linguistic
theory in mind that Pollack and Ficks
conducted a test on a set of tonal
stimuli that varied in eight dimensions,
but required only a binary decision on
each dimension. With these tones they
measured the transmitted information
at 6.9 bits, or about 120 recognizable
kinds of sounds. It is an intriguing
question, as yet unexplored, whether
one can go on adding dimensions in-
definitely in this way.
In human speech there is clearly a
limit to the number of dimensions that
we use. In this instance, however, it is
not known whether the limit is imposed
by the nature of the perceptual ma-
chinery that must recognize the sounds
or by the nature of the speech ma-
chinery that must produce them. Some-
body will have to do the experiment to
89
find out. There is a limit, however, at
about eight or nine distinctive features
in every language that has been studied,
and so when we talk we must resort to
still another trick for increasing our
channel capacity.
Language uses se-
quences of phonemes, so we make sev-
eral judgments successively when we
listen to words and sentences. That is
to say, we use both simultaneous and
successive discriminations in order to
expand the rather rigid limits imposed
by the inaccuracy of our absolute judg-
ments of simple magnitudes.
These multidimensional judgments are
strongly reminiscent of the abstraction
experiment of Kiilpe (14). As you may
remember, Kiilpe showed that observers
report more accurately on an attribute
for which they are set than on attributes
for which they are not set. For exam-
ple, Chapman (4) used three different
attributes and compared the results ob-
tained when the observers were in-
structed before the tachistoscopic pres-
entation with the results obtained when
they were not told until after the pres-
entation which one of the three attri-
butes was to be reported. When the
instruction was given in advance, the
judgments were more accurate. When
the instruction was given afterwards,
the subjects presumably had to judge all
three attributes in order to report on
any one of them and the accuracy was
correspondingly lower. This Is in com-
plete accord with the results we have
just been considering, where the ac-
curacy of judgment on each attribute
decreased as more dimensions were
added. The point is probably obvious,
but I shall make it anyhow, that the
abstraction experiments did not demon-
strate that people can judge only one
attribute at a time. They merely showed
what seems quite reasonable, that peo-
ple are less accurate if they must Judge
more than one attribute simultaneously.
EFTA00307520
90
GEORGE A. Main
&INTIM°
I cannot leave this general area with-
out mentioning, however briefly, the ex-
periments conducted at Mount Holyoke
College on the discrimination of num-
ber (12). In experiments by Kaufman,
Lord, Reese, and Volkmann random
patterns of dots were flashed on a screen
for t/5
a second. Anywhere from 1
to more than 200 dots could appear in
the pattern. The subject's task was to
report how many dots there were.
The first point to note is that on pat-
terns containing up to five or six dots
the subjects simply did not make errors.
The performance on these small num-
bers of dots was so different from the
performance with more dots that it was
given a special name. Below seven the
subjects were said to subitize; above
seven they were said to estimate. This
is, as you will recognize, what we once
optimistically called "the span of atten-
tion."
This discontinuity at seven is, of
course, suggestive.
Is this the same
basic process that limits our unidimen-
sional judgments to about seven cate-
gories? The generalization is tempting,
but not sound in my opinion. The data
on number estimates have not been ana-
lyzed in informational terms; but on
the basis of the published data I would
guess that the subjects transmitted
something more than four bits of in-
formation about the number of dots.
Using the same arguments as before, we
would conclude that there are about 20
or 30 distinguishable categories of nu-
merousness. This is considerably more
information than we would expect to
get from a unidimensional display. It
is, as a matter of fact, very much like a
two-dimensional display. Although the
dimensionality of the random dot pat-
terns is not entirely clear, these results
are in the same range as Klemmer and
Frick's for their two-dimensional dis-
play of dots in a square. Perhaps the
two dimensions of numerousness are
area and density. When the subject
can subitize, area and density may not
be the significant variables, but when
the subject must estimate perhaps they
are significant. In any event, the com-
parison is not so simple as it might
seem at first thought.
This is one of the ways in which the
magical number seven has persecuted
me. Here we have two closely related
kinds of experiments, both of which
point to the significance of the number
seven as a limit on our capacities. And
yet when we examine the matter more
closely, there seems to be a reasonable
suspicion that it is nothing more than
a coincidence.
THE SPAN OF IMMEDIATE MEMORY
Let me summarize the situation in
this way. There is a clear and definite
limit to the accuracy with which we can
identify absolutely the magnitude of
a unidimensional stimulus variable. I
would propose to call this limit the
span of absolute judgment, and I
maintain that for unidimensional judg-
ments this span is usually somewhere
in the neighborhood of seven. We are
not completely at the mercy of this
limited span, however, because we have
a variety of techniques for getting
around it and increasing the accuracy
of our judgments. The three most im-
portant of these devices are (a) to
make relative rather than absolute judg-
ments; or, if that is not possible, (b)
to increase the number of dimensions
along which the stimuli can differ; or
(c) to arrange the task in such a way
that we make a sequence of several ab-
solute judgments in a row.
The study of relative judgments is
one of the oldest topics in experimental
psychology, and I will not pause to re-
view it now. The second device, in-
creasing the dimensionality, we have just
considered. It seems that by adding
EFTA00307521
THE MAGICAL NUMBER SEVEN
more dimensions and requiring crude,
binary, yes-no judgments on each at-
tribute we can extend the span of abso-
lute judgment from seven to at least
ISO. Judging from our everyday be-
havior, the limit is probably in the
thousands, if indeed there is a limit. In
my opinion, we cannot go on compound-
ing dimensions indefinitely. I suspect
that there is also a span of perceptual
dimensionality and that this span is
somewhere in the neighborhood of ten,
but I must add at once that there is no
objective evidence to support this sus-
picion. This is a question sadly need-
ing experimental exploration.
Concerning the third device, the use
of successive judgments, I have quite a
bit to say because this device introduces
memory as the handmaiden of discrimi-
nation. And, since mnemonic processes
are at least as complex as are perceptual
processes, we can anticipate that their
interactions will not be easily disen-
tangled.
Suppose that we start by simply ex-
tending slightly the experimental pro-
cedure that we have been using. Up
to this point we have presented a single
stimulus and asked the observer to name
it immediately thereafter. We can ex-
tend this procedure by requiring the ob-
server to withhold his response until we
have given him several stimuli in suc-
cession. At the end of the sequence of
stimuli he then makes his response. We
still have the same sort of input-out-
put situation that is required for the
measurement of transmitted informa-
tion. But now we have passed from
an experiment on absolute judgment to
what is traditionally called an experi-
ment on immediate memory.
Before we look at any data on this
topic I feel I must give you a word of
warning to help you avoid some obvi-
ous associations that can be confusing.
Everybody knows that there is a finite
span of immediate memory and that for
91
a lot of different kinds of test materials
this span is about seven items in length.
I have just shown you that there is a
span of absolute judgment that can dis-
tinguish about seven categories and that
there is a span of attention that will
encompass about six objects at a glance.
What is more natural than to think that
all three of these spans are different as-
pects of a single underlying process?
And that is a fundamental mistake, as
I shall be at some pains to demonstrate.
This mistake is one of the malicious
persecutions that the magical number
seven has subjected me to.
My mistake went something like this.
We have seen that the invariant fea-
ture in the span of absolute judgment
is the amount of information that the
observer can transmit. There is a real
operational similarity between the ab-
solute judgment experiment and the
immediate memory experiment. If im-
mediate memory is like absolute judg-
ment, then it should follow that the in-
variant feature in the span of immediate
memory is also the amount of informa-
tion that an observer can retain. If the
amount of information in the span of
immediate memory is a constant, then
the span should be short when the indi-
vidual items contain a lot of informa-
tion and the span should be long when
the items contain little information. For
example, decimal digits are worth 3.3
bits apiece. We can recall about seven
of them, for a total of 23 bits of in-
formation. Isolated English words are
worth about 10 bits apiece. If the total
amount of information is to remain
constant at 23 bits, then we should be
able to remember only two or three
words chosen at random. In this way
I generated a theory about how the span
of immediate memory should vary as a
function of the amount of information
per item in the test materials.
The measurements of memory span in
the literature are suggestive on this
EFTA00307522
E 50
aa
20
00
92
GEORGE A. Main
question, but not definitive. And so it
was necessary to do the experiment to
see. Hayes (10) tried it out with five
different kinds of test materials: binary
digits, decimal digits, letters of the al-
phabet, letters plus decimal digits, and
with 1,000 monosyllabic words.
The
lists were read aloud at the rate of one
item per second and the subjects bad as
much time as they needed to give their
responses.
A procedure described by
Woodworth (20) was used to score the
responses.
The results are shown by the filled
circles in Fig. 7. Here the dotted line
indicates what the span should have
been if the amount of information in the
span were constant. The solid curves
represent the data. Hayes repeated the
experiment using test vocabularies of
different sizes but all containing only
English monosyllables (open circles in
Fig. 7). This more homogeneous test
material did not change the picture sig-
nificantly. With binary items the span
is about nine and, although it drops to
about five with monosyllabic English
words, the difference is far less than
the hypothesis of constant information
would require.
80
APIS
C44maN. 15(1103
DTI
I
SOIOITS WORDS
40
%CONSTANT
t INFORMATION
1
'PA
2
1
6
8
10
12
INFORMATION PER ITEM IN BITS
Pro. 7. Data from Hayes (10) on the span
of immediate memory plotted as a function
of the amount of Information per item in the
test materials.
50 -
Of 40
5
ffi so
it 20
Ct.
1
4
S
CONSIANT
NO OF
OS!
/.
z
I.ETTEF4
AtED
piGn‘
mrFORmATiON PER ITEM IN NITS
Fm. 8. Data from Pollack (16) on the
amount of information retained after one
presentation plotted as a function of the
amount of information per item in the test
materials.
There is nothing wrong with Hayes's
experiment, because Pollack (16) re-
peated it much more elaborately and
got essentially the same result. Pol-
lack took pains to measure the amount
of information transmitted and did not
rely on the traditional procedure for
scoring the responses. His results are
plotted in Fig. 8. Here it is clear that
the amount of information transmitted
is not a constant, but increases almost
linearly as the amount of information
per item in the input is increased.
And so the outcome is perfectly clear.
In spite of the coincidence that the
magical number seven appears in both
places, the span of absolute judgment
and the span of immediate memory are
quite different kinds of limitations that
are imposed on our ability to process
information. Absolute judgment is lim-
ited by the amount of information. Im-
mediate memory is limited by the num-
ber of items. In order to capture this dis-
tinction in somewhat picturesque terms,
I have fallen into the custom of distin-
guishing between bits of information
and chunks of information. Then I can
say that the number of bits of informa-
tion is constant for absolute judgment
and the number of chunks of informa-
EFTA00307523
THE MAGICAL NUMBER $EVEN
tion is constant for immediate memory.
The span of immediate memory seems
to be almost independent of the number
of bits per chunk, at least over the
range that has been examined to date.
The contrast of the terms bit and
chunk also serves to highlight the fact
that we are not very definite about what
constitutes a chunk of information. For
example, the memory span of five words
that Hayes obtained when each word
was drawn at random from a set of 1000
English monosyllables might just as ap-
propriately have been called a memory
span of 15 phonemes, since each word
had about three phonemes in it. Intui-
tively, it is clear that the subjects were
recalling five words, not IS phonemes,
but the logical distinction is not im-
mediately apparent.
We are dealing
here with a process of organizing or
grouping the input into familiar units
or chunks, and a great deal of learning
has gone into the formation of these
familiar units.
RECODING
In order to speak more precisely,
therefore, we must recognize the impor-
tance of grouping or organizing the in-
put sequence into units or chunks.
Since the memory span is a fixed num-
ber of chunks, we can increase the num-
ber of bits of information that it con-
tains simply by building larger and
larger chunks, each chunk containing
more information than before.
A man just beginning to learn radio-
telegraphic code hears each dit and dale
as a separate chunk. Soon he is able
to organize these sounds into letters and
then he can deal with the letters as
chunks.
Then the letters organize
themselves as words, which are still
larger chunks, and he begins to hear
whole phrases. I do not mean that each
step is a discrete process, or that pla-
teaus must appear in his learning curve,
for surely the levels of organization are
93
achieved at different rates and overlap
each other during the learning process.
I ant simply pointing to the obvious
fact that the dits and dahs are organ-
ized by learning into patterns and that
as these larger chunks emerge the
amount of message that the operator
can remember increases correspondingly.
In the terms I am proposing to use, the
operator learns to increase the bits per
chunk.
In the jargon of communication the-
ory, this process would be called recod-
ing. The input is given in a code that
contains many chunks with few bits per
chunk. The operator recodes the input
into another code that contains fewer
chunks with more bits per chunk. There
are many ways to do this recoding, but
probably the simplest is to group the
input events, apply a new name to the
group, and then remember the new name
rather than the original input events.
Since I am convinced that this proc-
ess is a very general and important one
for psychology, I want to tell you about
a demonstration experiment that should
make perfectly explicit what I am talk-
ing about. This experiment was con-
ducted by Sidney Smith and was re-
ported by him before the Eastern Psy-
chological Association in 1954.
Begin with the observed fact that peo-
ple can repeat back eight decimal digits,
but only nine binary digits. Since there
is a large discrepancy in the amount of
information recalled in these two cases,
we suspect at once that a recoding pro-
cedure could be used to increase the
span of immediate memory for binary
digits. In Table 1 a method for group-
ing and renaming is illustrated. Along
the top is a sequence of 18 binary digits,
far more than any subject was able to
recall after a single presentation. In
the next line these same binary digits
are grouped by pairs.
Four possible
pairs can occur: 00 is renamed 0, 01 is
renamed 1, 10 is renamed 2, and 1 i is
EFTA00307524
94
GEORGE A. Maxis
TABLE 1
WAYS OP RECODING SEQUENCES OF BINARY DIGITS
Elnary Malts (NO)
1 0 ( 0 0 0 1 0 0 1 t 1 0 0 1 1 1 0
2:1 Chunks
Recoding
3:1 Chunks
Receding
4:1 Chunks
Recoding
5:1 Chunks
Receding
10
10
00
10
01
11
00
11
10
2
2
0
2
1
3
0
3
2
101
030
103
111
001
110
5
0
4
1
1
6
1010
0010
0111
0011
10
10
2
7
3
10100
01001
11001
110
20
9
25
renamed 3. That is to say, we recode
from a base-two arithmetic to a base-
four arithmetic. In the recoded
se-
quence there are now just nine digits to
remember, and this is almost within the
span of immediate memory. In the next
line the same sequence of binary digits
is regrouped into chunks of three. There
are eight possible sequences of three, so
we give each sequence a new name be-
tween 0 and 7. Now we have recoded
from a sequence of 18 binary digits
into a sequence of 6 octal digits, and
this is well within the span of immedi-
ate memory. In the last two lines the
binary digits are grouped by fours and
by fives and are given decimal-digit
names from 0 to 15 and from 0 to 31.
It is reasonably obvious that this kind
of recoding increases the bits per chunk,
and packages the binary sequence into
a form that can be retained within the
span of immediate memory. So Smith
assembled 20 subjects and measured
their spans for binary and octal digits.
The spans were 9 for binaries and 7 for
octals.
Then he gave each recoding
scheme to five of the subjects. They
studied the recoding until they said
they understood it—for about S or 10
minutes. Then he tested their span for
binary digits again while they tried to
use the recoding schemes they had
studied.
The recoding schemes increased their
span for binary digits in every case.
But the increase was not as large as we
had expected on the basis of their span
for octal digits. Since the discrepancy
increased as the recoding ratio increased,
we reasoned that the few minutes the
subjects had spent learning the recod-
ing schemes had not been sufficient.
Apparently the translation from one
code to the other must be almost auto-
matic or the subject will lose part of the
next group while he is trying to remem-
ber the translation of the last group.
Since the 4:1 and 5:1 ratios require
considerable study, Smith decided to
imitate Ebbinghaus and do the experi-
ment on himself. With Germanic pa-
tience he drilled himself on each recod-
ing successively, and obtained the re-
sults shown in Fig. 9. Here the data
follow along rather nicely with the re-
sults you would predict on the basis of
his span for octal digits. He could re-
member 12 octal digits. With the 2:1
recoding, these 12 chunks were worth
24 binary digits. With the 3:1 recod-
ing they were worth 36 binary digits.
With the 4:1 and 5:1 recodings, they
were worth about 40 binary digits.
It is a little dramatic to watch a per-
son get 40 binary digits in a row and
then repeat them back without error.
However, if you think of this merely as
EFTA00307525
THE ?term. Nyman SAVES;
95
50
3 40
a so
20
10
0
1:1
2:1
3:1
4:1
5:1
RECODIWI RATI0
Flo. 9. The span of immediate memory for
binary digits is plotted as a function of the
recoding procedure used. The predicted func-
tion is obtained by multiplying the span for
octals by 2, 3 and 3.3 for recoding into base
4, base 8, and base 10, respectively.
a mnemonic trick for extending the
memory span, you will miss the more
important point that is implicit in
nearly all such mnemonic devices. The
point is that recoding is an extremely
powerful weapon for increasing the
amount of information that we can
deal with. In one form or another we
use recoding constantly in our daily
behavior.
In my opinion the most customary
kind of recoding that we do all the time
is to translate into a verbal code. When
there is a story or an argument or an
idea that we want to remember, we usu-
ally try to rephrase it "in our own
words." When we witness some event
we want to remember, we make a verbal
description of the event and then re-
member our verbalization. Upon recall
we recreate by secondary elaboration
the details that seem consistent with
the particular verbal recoding we hap-
pen to have made. The well-known a-
periment by Carmichael, Hogan, and
Walter (3) on the influence that names
have on the recall of visual figures is
one demonstration of the protege.
The inaccuracy of the testimony of
eyewitnesses is well known in legal psy-
chology, but the distortions of testi-
mony are not random—they follow
naturally from the particular recoding
that the witness used, and the particu-
lar recoding he used depends upon his
whole life history. Our language is tre-
mendously useful for repackaging ma-
terial into a few chunks rich in infor-
mation. I suspect that imagery is a
form of recoding, too, but images seem
much harder to get at operationally and
to study experimentally than the more
symbolic kinds of recoding.
It seems probable that even memori-
zation can be studied in these terms.
The process of memorizing may be sim-
ply the formation of chunks, or groups
of items that go together, until there
are few enough chunks so that we can
recall all the items. The work by Bons-
field and Cohen (2) on the occurrence
of clustering in the recall of words is
especially interesting in this respect.
SUMMARY
I have come to the end of the data
that I wanted to present, so I would
like now to make some summarizing re-
marks.
First, the span of absolute judgment
and the span of immediate memory im-
pose severe limitations on the amount
of information that we are able to re-
ceive, procecs, and remember. By or-
ganizing the stimulus input simultane-
ously into several dimensions and suc-
cessively into a sequence of chunks, we
manage to break (or at least stretch)
this informational bottleneck.
Second, the process of recoding is a
very important one in human psychol-
ogy and deserves much more explicit at-
tention than it has received. In par-
ticular, the kind of linguistic recoding
that people do seems to me to be the
very lifeblood of the thought processes.
Recoding procedures are a constant
concern to clinicians, social psycholo-
EFTA00307526
96
GEORGE A. MILLER
gists, linguists, and anthropologists and
yet, probably because recoding is less
accessible to experimental manipulation
than nonsense syllables or T mazes, the
traditional experimental psychologist has
contributed little or nothing to their
analysis.
Nevertheless, experimental
techniques can be used, methods of re-
coding can be specified, behavioral in-
dicants can be found. And I anticipate
that we will find a very orderly set of
relations describing what now seems an
uncharted wilderness of individual dif-
ferences.
Third, the concepts and measures
provided by the theory of information
provide a quantitative way of getting at
some of these questions. The theory
provides us with a yardstick for cali-
brating our stimulus materials and for
measuring the performance of our sub-
jects. In the interests of communica-
tion I have suppressed the technical de-
tails of information measurement and
have tried to express the ideas in more
familiar terms; I hope this paraphrase
will not lead you to think they are not
useful in research. Informational con-
cepts have already proved valuable in
the study of discrimination and of lan-
guage; they promise a great deal in the
study of learning and memory; and it
has even been proposed that they can
be useful in the study of concept for-
mation. A lot of questions that seemed
fruitless twenty or thirty years ago may
now be worth another look. In fact, I
feel that my story here must stop just
as it begins to get really interesting.
And finally, what about the magical
number seven? What about the seven
wonders of the world, the seven seas,
the seven deadly sins, the seven daugh-
ters of Atlas in the Pleiades, the seven
ages of man, the seven levels of hell,
the seven primary colors, the seven notes
of the musical scale, and the seven days
of the week? What about the seven-
point rating scale, the seven categories
for absolute judgment, the seven ob-
jects in the span of attention, and the
seven digits in the span of immediate
memory? For the present I propose to
withhold judgment. Perhaps there is
something deep and profound behind all
these sevens, something just calling out
for us to discover it. But I suspect
that it is only a pernicious, Pythagorean
coincidence.
REFERENCES
1. Bstas-Currint, 3. 6, Rooras, M. S., •
O'Cosznaz., D. N. Transmission of in-
formation about sucrose and saline solu-
tions through the sense of taste. J.
Puckett, 1955, 39, 157-160.
2. Bousinno, W. A., • Corms, B. H. The
occurrence of clustering in the recall of
randomly arranged words of different
frequencies-of-usage. J. gen. Psycho!.,
1955, 52, 83-93.
3. CAnccnaEL, L, Hones, H. P., & Werna,
A. A. An experimental study of the
effect of language on the reproduction
of visually perceived form. J. exp.
Psycho!., 1932, IS, 73-46.
4. CH•PMAN, 1). W. Relative effects of de-
terminate and Indeterminate Aufgabeit.
Amer. J. Psycho!., 1932, 44, 163-174.
S. Eutaw, C. W. Multidimensional stimu-
lus differences and accuracy of discrimi-
nation.
USAF, WADC Tech. Rep.,
1954, No. 34-16S.
6. Boum, C. W., & Hiss, H. W. Abso-
lute Judgments as a function of the
stimulus range and the number of
stimulus and response categories.
J.
up. Psychol., 1955, 49, 323-331.
7. Gusts, W. R. An informational analy-
sis of absolute Judgments of loudness.
J. exp. Psycho!., 1953, 46, 373-380.
8. HARZ, H. W., a GARNER, W. R. The ef-
fect of presenting various numbers of
discrete steps on scale reading accuracy.
J. exp. Psycho!., 1951, 42, 358-366.
9. HALSEY, R. M., • CIIAPANIS, A. Chro-
maticity-confusion contours in a com-
plex viewing situation. J. Opt. Soc.
Amer., 1954, 44, 442-454.
10. HAYES, J. R. M. Memory span for sev-
eral vocabularies as a function of vo-
cabulary size. In Quarterly Progress
Report, Cambridge, Mass.: Acoustics
Laboratory, Massachusetts Institute of
Technology, jan.-June, 1932.
EFTA00307527
THE MAGICAL NUMBER SEVEN
11. Jxxossott, R., PANT, C. G. M., a Kent,
M.
Preliminaries to speech analysis.
Cambridge, Mass.: Acoustics Labora-
tory, Massachusetts Institute of Tech-
nology, 1952. (Tech. Rep. No. 13.)
12. Kannuar, E. L., LORD, M. W., Rater,
T. W., a VOLX11ANN, J. The discrimi-
nation of visual number.
Amer. J.
Psycho!., 1949, 62, 498-525.
13. lannialt, E. T., a Fate; F. C. Assimi-
lation of information from dot and
matrix patterns. J. exp. Psycho!., 1953,
45, 15-19.
14. KIH.PE, O.
Versuche Ober Abstraktion.
Ber. Si. d. I ?Contr. J. exper. Psycho!.,
1904, 56-68.
15. Maw, G. A., a NICELY, P. E. An analy-
sis of perceptual confusions among some
97
English consonants. J. Accost. Soc.
Amer., 1955, 27, 338-352.
16. POLLACK, I. The assimilation of sequen-
tially encoded information. Amer. J.
Psyche, 1953, 66, 421-435.
17. POLLACE, I. The information of elemen-
tary auditory displays. J. Acoust. Soc.
Amer., 1952, 24, 745-749.
18. POLLACK, I. The information of elemen-
tary auditory displays. II. J. Acosta.
Soc. Amer., 1953, 25, 765-769.
19. Poriacx, I., a Fruits, L. Information of
elementary multi-dimensional auditory
displays. J. Acosta. Soc. Amer., 1954,
26, 155-158.
20. WOODWORTH, R. S.
Experimental psy-
chology. New York: Holt, 1938.
(Received May 4, 1955)
EFTA00307528
Document Preview
PDF source document
This document was extracted from a PDF. No image preview is available. The OCR text is shown on the left.
This document was extracted from a PDF. No image preview is available. The OCR text is shown on the left.
Extracted Information
Dates
Document Details
| Filename | EFTA00307512.pdf |
| File Size | 2617.9 KB |
| OCR Confidence | 85.0% |
| Has Readable Text | Yes |
| Text Length | 56,774 characters |
| Indexed | 2026-02-11T13:25:23.728271 |