Acoustical Society of America
138th Meeting Lay Language Papers
Diana Deutsch- [email protected]
Trevor Henthorn
Mark Dolson
Department of Psychology, University of California San Diego
La Jolla, CA 92093
Through End of October Call: 858-453-1558 or 858-534-4615 email: ( [email protected] )
During ASA Meeting (2-5 Nov) Call: 614-228-3200 ( Courtyard by Marriott )
Popular version of paper 4pPP5 
Presented Thursday afternoon, November 4, 1999 
138th ASA Meeting, Columbus, Ohio
Absolute pitch, or perfect pitch, is defined as the ability to produce or identify the pitch of a tone without reference to an external standard. It is considered to be an extremely rare faculty, with an estimated incidence in our population of less than one in ten thousand. There have been many speculations concerning its genesis: Some have argued that it is an inherited trait, others that it can be acquired through extensive training, and yet others that its attainment requires that the individual have taken music lessons at an early age. However, the evidence with respect to these various hypotheses has been equivocal.
This paper shows that native speakers of two tone languages, Vietnamese and Mandarin, display remarkably precise absolute pitch in reading out lists of words. Since pitch is an essential feature in conveying the meaning of words in tone languages, our findings lead us to conjecture that the potential for acquiring absolute pitch may be universal, and that it can be realised by the association of pitches with meaningful words very early in life.
Seven native speakers of Vietnamese served as subjects in Experiment 1.[1] Each subject was tested individually in two sessions, which were held on different days. In each session, the subject was handed a list of ten Vietnamese words to read out, at a rate of roughly one word every two seconds. The words were chosen so that they spanned the range of tones in Vietnamese speech. The same list was handed to the subject to read out on both days.
The speech samples were recorded and entered into computer memory. For each spoken word, pitch estimates were taken at 5 ms intervals, and an average pitch was derived from these estimates. Then, for each subject, we calculated the difference between the average pitches of each word as read out on the two days, and we averaged these differences across the words in the list.[2]
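The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis software used in the study: the function names, the 440 Hz reference, the skipping of frames without a pitch estimate, and the use of the absolute (unsigned) difference are all assumptions. The one detail taken from the text is that pitch is averaged along the musical scale, that is, in semitones on a log-frequency continuum.

```python
import math

REF_HZ = 440.0  # assumed reference (A4); it cancels out when differences are taken


def hz_to_semitones(f_hz, ref_hz=REF_HZ):
    """Map a frequency in Hz onto the musical (log-frequency) scale, in semitones."""
    return 12.0 * math.log2(f_hz / ref_hz)


def average_pitch(track_hz):
    """Average one word's pitch track along the semitone scale.

    track_hz holds one estimate per 5 ms frame. Frames without a usable
    estimate (None or non-positive) are skipped, which is an assumption.
    """
    semitones = [hz_to_semitones(f) for f in track_hz if f is not None and f > 0]
    if not semitones:
        raise ValueError("pitch track has no usable frames")
    return sum(semitones) / len(semitones)


def mean_pitch_difference(day1_tracks, day2_tracks):
    """Average, across the word list, of the absolute per-word pitch difference."""
    if not day1_tracks or len(day1_tracks) != len(day2_tracks):
        raise ValueError("both sessions must contain the same non-empty word list")
    diffs = [abs(average_pitch(a) - average_pitch(b))
             for a, b in zip(day1_tracks, day2_tracks)]
    return sum(diffs) / len(diffs)


# Hypothetical two-word example (values invented for illustration only).
day1 = [[220.0, 222.0, 221.0], [180.0, 178.0, None]]
day2 = [[223.0, 224.0, 222.0], [181.0, 180.0, 179.0]]
print(round(mean_pitch_difference(day1, day2), 3))
```

Averaging in semitones rather than in hertz means that a given musical interval counts equally whether the speaker's voice is low or high, which makes scores from male and female subjects directly comparable.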
The results showed remarkable consistencies: The data from all seven subjects displayed averaged pitch differences of less than 1.1 semitones, and the data from four of the seven subjects displayed averaged pitch differences of less than 0.5 semitone. The subjects must therefore have been referring to precise and stable absolute pitch templates in enunciating the words.
Sound Demonstrations 1 and 2 illustrate these findings. Demonstration 1 presents the two readings of the list by a male subject. The readings have been interleaved so that the listener can compare directly the pitches produced for each word on the different days. First we present Word 1 read on Day 1, followed by Word 1 read on Day 2. Next we present Word 2 read on Day 1, followed by Word 2 read on Day 2, and so on until all ten words have been presented. Demonstration 2 presents the two readings by a female subject, interleaved in the same way.
Sound Demonstration 1 (44.1 kHz, 16-bit, mono, 2.6 MB)
Sound Demonstration 2 (44.1 kHz, 16-bit, mono, 2.4 MB)
For Experiment 2, a list of twelve words was prepared which spanned the range of tones in Mandarin speech, and fifteen native speakers of Mandarin served as subjects.[3]
The basic procedure was as in Experiment 1. However, the design was extended to investigate whether the pitch differences found in enunciating the same words on different days should be taken to reflect imprecisions in the subjects’ absolute pitch templates, or whether other factors were responsible.
To this end, each subject was asked to read out the word list twice in each session, with roughly 20 seconds intervening between the two readings. The same word list was therefore read out four times: twice on each of the two days. Four difference scores were then calculated: (1) between the first readings on Day 1 and Day 2, (2) between the second readings on Day 1 and Day 2, (3) between the first and second readings on Day 1, and (4) between the first and second readings on Day 2.
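The four comparisons can be laid out as a small lookup table. The sketch below is illustrative only: it assumes each reading has already been reduced to a list of per-word average pitches in semitones, and the names `reading_difference`, `COMPARISONS`, and `four_difference_scores`, as well as the `(day, reading)` keys, are hypothetical.

```python
def reading_difference(reading_a, reading_b):
    """Mean absolute per-word pitch difference (semitones) between two readings."""
    if not reading_a or len(reading_a) != len(reading_b):
        raise ValueError("readings must cover the same non-empty word list")
    return sum(abs(a - b) for a, b in zip(reading_a, reading_b)) / len(reading_a)


# Each comparison pairs two readings, identified as (day, reading number).
COMPARISONS = {
    "first readings, Day 1 vs. Day 2": ((1, 1), (2, 1)),
    "second readings, Day 1 vs. Day 2": ((1, 2), (2, 2)),
    "first vs. second reading, Day 1": ((1, 1), (1, 2)),
    "first vs. second reading, Day 2": ((2, 1), (2, 2)),
}


def four_difference_scores(readings):
    """Compute the four difference scores for one subject.

    readings maps (day, reading number) to that reading's per-word
    average pitches in semitones.
    """
    return {label: reading_difference(readings[x], readings[y])
            for label, (x, y) in COMPARISONS.items()}
```

Scores (1) and (2) compare readings made on different days, while scores (3) and (4) compare readings made about 20 seconds apart, so contrasting the two pairs separates day-to-day drift from moment-to-moment variability.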
The results are displayed in the table below. As can be seen, remarkable and surprising consistencies were obtained. For all comparisons, half of the subjects showed averaged pitch differences of less than 0.5 semitone, and one-third of the subjects showed averaged pitch differences of less than 0.25 semitone. In addition, statistical analyses found no significant difference in the degree of pitch consistency in reading out the word list on different days, compared with reading it twice in immediate succession. This leads us to conclude that although the pitch differences obtained were remarkably small, they nevertheless underestimated the precision of the subjects’ absolute pitch templates.
Table. Mandarin speakers: differences between the pitches produced in reading the list of words on different occasions. For each comparison, the table displays the number of subjects whose difference scores fell in each 0.25-semitone bin. Pitch difference is given in fractions of a semitone.
Comparisons shown: first reading, Day 1 vs. Day 2; second reading, Day 1 vs. Day 2; first vs. second reading, Day 1; first vs. second reading, Day 2.
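The tabulation described in the caption, counting how many subjects fall in each 0.25-semitone bin, can be sketched as follows. The function name `bin_scores` is hypothetical, and treating each bin as half-open (lower edge included, upper edge excluded) is an assumption, since the text does not say how scores on a bin boundary were assigned.

```python
import math
from collections import Counter


def bin_scores(scores, width=0.25):
    """Count difference scores (semitones) per bin of the given width.

    Returns a dict mapping (lower edge, upper edge) to the number of
    subjects in that bin. Bins are half-open, [lower, upper), which is
    an assumption.
    """
    if any(s < 0 for s in scores):
        raise ValueError("difference scores are expected to be non-negative")
    counts = Counter(math.floor(s / width) for s in scores)
    return {(i * width, (i + 1) * width): counts[i] for i in sorted(counts)}
```

Given one difference score per subject for a single comparison, the result lists only the occupied bins in ascending order.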
Sound Demonstrations 3 and 4 present, for two subjects, the first readings of the word list on Day 1 and on Day 2. In each case, the two readings have been interleaved as in Demonstrations 1 and 2, so that the pitches produced for each word on the different days can be compared directly. Demonstration 3 presents these readings by a male subject, and Demonstration 4 by a female subject.
Sound Demonstration 3 (44.1 kHz, 16-bit, mono, 2.9 MB)
Sound Demonstration 4 (44.1 kHz, 16-bit, mono, 3.5 MB)
In summary, the findings show that speakers of Vietnamese and Mandarin possess an extraordinarily precise form of absolute pitch, which is reflected in their enunciation of words. Since all except one of the subjects in the study had received little or no musical training, we conclude that this ability resulted from their early acquisition of tone language, so that they had learned to associate pitches with meaningful words very early in life.[4]
Footnotes
1. The subjects (two male and five female) ranged in age from 27 to 56 years, and had grown up in different regions of Vietnam. They had been living in the U.S. for periods ranging from a few months to 17 years. All subjects had had very little or no musical training.
2. The pitch estimates were averaged along the musical scale, and so along a log frequency continuum.
3. The subjects (seven male and eight female) were all graduate students at the University of California, San Diego, and had grown up in different regions of the P.R.C. They had been living in the U.S. for periods ranging from a few months to six years. All but one subject had had very little or no musical training.
4. We are grateful to Vincent Hsieh, Leonard Zhang, Karin Liu, Van Doan, Quyen Doan, Lynsey Doan, Phi Nguyen, and Larry McClure for their contributions in the different phases of the study.