Filled Pause
Research Center

Investigating 'um' and 'uh' and other hesitation phenomena

The CORE of filled pause research

The study of filled pauses now stretches back well over half a century and is widely dispersed across numerous academic fields. As such, it is very difficult to narrow it down to a small body of work that represents the full range of research and knowledge on um and uh, and another researcher would no doubt come up with a very different list. I have chosen to emphasize works that represent the state of the field, even though they may not be the earliest studies of the specific aspect of filled pauses they focus on. These are all easily accessible resources, some of them openly licensed, and all are well written and well documented. Works that might have taken their place in the list are almost surely cited in their respective references.

A good place to start is the work by Stanley Schachter and colleagues, “Speech Disfluency and the Structure of Knowledge”[1], which shows how very interesting and thoughtful research can be done with quite minimal means. One of those colleagues, Nicholas Christenfeld, wrote a further paper, “Options and Ums”[2], which uses a similarly simple experimental design to investigate and establish some basic notions about the cognitive states surrounding the use of filled pauses.

Two of the most highly cited works in the field are by Herb Clark and Jean Fox Tree and by Elizabeth Shriberg. Shriberg's doctoral dissertation, “Preliminaries to a theory of speech disfluencies”[3], covers more than just filled pauses (it also addresses silent pauses, repairs, lengthenings, false starts, etc.), but its treatment of filled pauses laid the groundwork for studying their acoustic phonetic properties. Clark and Fox Tree's paper, “Using uh and um in spontaneous speaking”[4], is arguably what brought wide attention to the study of filled pauses in psycholinguistics and related fields. The paper is chock-full of references and information about filled pauses in various languages and, like Shriberg's thesis, is ground-breaking work. Their filler-as-word hypothesis remains a challenging and motivating idea in research on filled pauses. The hypothesis is controversial, however, and many researchers argue against it. One particularly cogent argument, by Martin Corley and Oliver Stewart in “Hesitation Disfluencies in Spontaneous Speech: The Meaning of um”[5], reviews the evidence comprehensively and concludes that there is little support for the idea that fillers are intentionally produced or should be considered words in the conventional sense.

Various other researchers have looked at specific phenomena involving filled pauses. Marc Swerts, in “Filled pauses as markers of discourse structure”[6], shows that speakers use filled pauses more often at major than at minor discourse boundaries. In “Listeners' uses of "um" and "uh" in speech comprehension”[7], Jean Fox Tree demonstrates that listeners seem to make some inferences about speakers' meaning based on the occurrence of filled pauses. Several other researchers have extended this line of work in different directions. Karl Bailey and Fernanda Ferreira observe in “Disfluencies affect the parsing of garden-path sentences”[8] that the placement of filled pauses relative to clause boundaries affects how listeners perceive the sentence structure. Jennifer Arnold and colleagues, in “The Old and Thee, uh, New: Disfluency and Reference Resolution”[9], show that listeners assume that a noun phrase preceded by a filled pause refers to a discourse-new entity. Martin Corley and colleagues show in “It's the way that you, er, say it: Hesitations in speech affect language comprehension”[10] that listeners are less surprised by unpredictable words (that is, words of low contextual probability) when those words follow a filled pause. And Scott Fraundorf and Duane Watson, in “The disfluent discourse: Effects of filled pauses on recall”[11], show that listeners remember information better when it follows a filled pause.

As I have already suggested, there are surely many more excellent works that could have been included in this list. I apologize to anyone who feels their own paper should be listed here; its exclusion is not intended as an evaluative judgment. Some works have been included in other lists (for example, those on the acoustic phonetics of filled pauses and on second language use and perception). However, I would suggest that the list here represents the thematic core of filled pause research. Anyone looking for a quick introduction to the field would do well to read (or at least skim) all of these works.


  • Stanley Schachter, Nicholas Christenfeld, Bernard Ravina, and Frances Bilous, “Speech Disfluency and the Structure of Knowledge,” Journal of Personality and Social Psychology, vol. 60, no. 3, 1991, pp. 362-367. DOI: 10.1037/0022-3514.60.3.362.

    Abstract: It is generally accepted that filled pauses ("uh," "er," and "um") indicate time out while the speaker searches for the next word or phrase. It is hypothesized that the more options, the more likely that a speaker will say "uh." The academic disciplines differ in the extent to which their subject matter and mode of thought require a speaker to choose among options. The more formal, structured, and factual the discipline, the fewer the options. It follows that lecturers in the humanities should use more filled pauses during lectures than social scientists and that natural scientists should use fewest of all. Observations of lecturers in 10 academic disciplines indicate that this is the case. That this is due to subject matter rather than to self-selection into disciplines is suggested by observations of this same set of lecturers all speaking on a common subject. In this circumstance, the academic disciplines are identical in the number of filled pauses used.

    Keywords: lecturers, number of filled pauses in speech, word options in academic discipline

  • Nicholas Christenfeld, “Options and Ums,” Journal of Language & Social Psychology, vol. 13, no. 2, June 1994, pp. 192-199. DOI: 10.1177/0261927X94132005.

    Abstract: Most people who have speculated about the causes of ums in speech (also known as filled pauses) have suggested that they are produced when the speaker is confronted with a challenging choice. This idea, in spite of its intuitive appeal and theoretical usefulness, has never been directly tested. The present experiment manipulates the complexity of options facing a speaker by having subjects describe mazes with a varying number of alternate possible routes. The mazes with more options did produce more filled pauses. However, in describing even the simplest maze, one of the easiest possible speech tasks, the subjects still said um regularly. It is suggested that options are only one factor in filled pause production, and that breaking up the rhythm of speech may also foster filled pauses.

  • Elizabeth Shriberg, “Preliminaries to a theory of speech disfluencies,” Ph.D. dissertation, University of California, Berkeley, 1994.

    Abstract: This thesis examines disfluencies (e.g., "um", repeated words, and a variety of forms of self-repair) in the spontaneous speech of adult normal speakers of American English. Despite their prevalence, disfluencies have traditionally been viewed as irregular events and have received little attention. The goal of the thesis is to provide evidence that, on the contrary, disfluencies show remarkably regular trends in a number of dimensions. These regularities have consequences for models of human language production; they can also be exploited to improve performance in speech applications. The method includes analysis of over 5000 hand-annotated disfluencies from a database (250,000 words) containing three different styles of spontaneous speech: task-oriented human-computer dialog, task-oriented human-human dialog, and human-human conversation on a prescribed topic. The approach is theory-neutral and strongly data-driven. The annotations correspond to observable characteristics ("features") in the data, including: 1) the speech domain; 2) the speaker; 3) the sentence in which a disfluency occurs; 4) word-related characteristics of the disfluency; and 5) simple acoustic characteristics of the disfluency. A methodology is developed for representing these features in a database format, and an algorithm is provided for automatic disfluency type classification based on this representation. Results show regular trends in disfluency rates by sentence length, by disfluency position, by presence of another disfluency in the same sentence, by disfluency type, and by combinations of these features both across and within speakers. Regularities are also found for word-related features of the disfluency, including the number of excised words, the rate of cut-off words, and the rate of editing phrases. Additional analyses describe characteristics of overlapping disfluencies and prosodic characteristics of the simplest disfluency types. Across analyses, data from the three different speech styles are compared; where relevant, simple parametric models are provided. In sum, disfluencies show regularities in a variety of dimensions. These regularities can help guide and constrain models of spoken language production. In addition they can be modeled in applications to improve the automatic processing of spontaneous speech.

  • Herbert Clark, and Jean E. Fox Tree, “Using uh and um in spontaneous speaking,” Cognition, vol. 84, no. 1, May 2002, pp. 73-111. DOI: 10.1016/S0010-0277(02)00017-3.

    Abstract: The proposal examined here is that speakers use uh and um to announce that they are initiating what they expect to be a minor (uh), or major (um), delay in speaking. Speakers can use these announcements in turn to implicate, for example, that they are searching for a word, are deciding what to say next, want to keep the floor, or want to cede the floor. Evidence for the proposal comes from several large corpora of spontaneous speech. The evidence shows that speakers monitor their speech plans for upcoming delays worthy of comment. When they discover such a delay, they formulate where and how to suspend speaking, which item to produce (uh or um), whether to attach it as a clitic onto the previous word (as in "and-uh"), and whether to prolong it. The argument is that uh and um are conventional English words, and speakers plan for, formulate, and produce them just as they would any word.

    Keywords: conversation, Dialogue, disfluencies, Language production, spontaneous speech, uh, um

  • Martin Corley, and Oliver W. Stewart, “Hesitation Disfluencies in Spontaneous Speech: The Meaning of um,” Language and Linguistics Compass, vol. 2, no. 4, July 2008, pp. 589-602. DOI: 10.1111/j.1749-818X.2008.00068.x.

    Abstract: Human speech is peppered with ums and uhs, among other signs of hesitation in the planning process. But are these so-called fillers (or filled pauses) intentionally uttered by speakers, or are they side-effects of difficulties in the planning process? And how do listeners respond to them? In the present paper, we review evidence concerning the production and comprehension of fillers such as um and uh, in an attempt to determine whether they can be said to be ‘words’ with ‘meanings’ that are understood by listeners. We conclude that, whereas listeners are highly sensitive to hesitation disfluencies in speech, there is little evidence to suggest that they are intentionally produced, or should be considered to be words in the conventional sense.

  • Marc Swerts, “Filled pauses as markers of discourse structure,” Journal of Pragmatics, vol. 30, no. 4, 1998, pp. 485-496. DOI: 10.1016/S0378-2166(98)00014-9.

    Abstract: This study aims to test whether filled pauses (FPs) may highlight discourse structure. This question is tackled from the perspectives of both the speaker and the listener. More specifically, it is first investigated whether FPs are more typical in the vicinity of major discourse boundaries. Secondly, FPs are analyzed acoustically, to check whether those occurring at major discourse boundaries are segmentally and prosodically different from those at shallower breaks. Analyses of twelve spontaneous monologues (Dutch) show that phrases following major discourse boundaries more often contain FPs. Additionally, FPs after stronger breaks tend to occur phrase-initially, whereas the majority of the FPs after weak boundaries are in phrase-internal position. Also, acoustic observations reveal that FPs at major discourse boundaries are both segmentally and prosodically distinct. They also differ with respect to the distribution of neighbouring silent pauses. Finally, a general linear model reveals that discourse structure can to some extent be predicted from characteristics of the FPs.

  • Jean E. Fox Tree, “Listeners’ uses of "um" and "uh" in speech comprehension,” Memory and Cognition, vol. 29, no. 2, March 2001, pp. 320-326.

    Abstract: Despite their frequency in conversational talk, little is known about how ums and uhs affect listeners’ on-line processing of spontaneous speech. Two studies of ums and uhs in English and Dutch reveal that hearing an uh has a beneficial effect on listeners’ ability to recognize words in upcoming speech, but that hearing an um has neither a beneficial nor a detrimental effect. The results suggest that um and uh are different from one another and support the hypothesis that uh is a signal of short upcoming delay and um is a signal of a long upcoming delay.

  • Karl G.D. Bailey, and Fernanda Ferreira, “Disfluencies affect the parsing of garden-path sentences,” Journal of Memory and Language, vol. 49, no. 2, 2003, pp. 183-200. DOI: 10.1016/S0749-596X(03)00027-5.

    Abstract: Spontaneous speech differs in several ways from the sentences often studied in psycholinguistics experiments. One important difference is that naturally produced utterances often contain disfluencies. In this study, we examined how the presence of “uh” in a spoken sentence might affect processes that assign syntactic structure (i.e., parsing). Four experiments are reported. In the first, participants judged the grammaticality of sentences that had disfluencies either right before the head noun of the ambiguous phrase or after (e.g., Sandra bumped into the busboy and the uh uh waiter told her to be careful or Sandra bumped into the busboy and the waiter uh uh told her to be careful). Sentences in the latter condition were judged grammatical less often. This result was replicated in the second experiment, in which disfluencies were replaced with environmental sounds. These findings suggest that interruptions can affect syntactic parsing, and the content of the interruption need not be speechlike. In Experiments 3 and 4 we tested whether these effects occurred because listeners use interruptions as cues to help resolve a structural ambiguity. Results from these latter two grammaticality judgment tasks suggest that when an interruption occurs before an ambiguous noun phrase, comprehenders are more likely to assume that the noun phrase is the subject of a new clause rather than the object of an old one, and furthermore, it appears that the parser is relatively insensitive to the form of the interruption. We conclude that disfluencies can influence the parser by signaling a particular structure; at the same time, for the parser, a disfluency might be any interruption to the flow of speech.

  • Jennifer Arnold, Michael K. Tanenhaus, Rebecca Altmann, and Maria Fagnano, “The Old and Thee, uh, New: Disfluency and Reference Resolution,” Psychological Science, vol. 15, no. 9, September 2004, pp. 578-582. DOI: 10.1111/j.0956-7976.2004.00723.x.

    Abstract: Most research on the rapid mental processes of online language processing has been limited to the study of idealized, fluent utterances. Yet speakers are often disfluent, for example, saying "thee, uh, candle" instead of "the candle." By monitoring listeners’ eye movements to objects in a display, we demonstrated that the fluency of an article ("thee uh" vs. "the") affects how listeners interpret the following noun. With a fluent article, listeners were biased toward an object that had been mentioned previously, but with a disfluent article, they were biased toward an object that had not been mentioned. These biases were apparent as early as lexical information became available, showing that disfluency affects the basic processes of decoding linguistic input.

  • Martin Corley, Lucy J. MacGregor, and David Donaldson, “It’s the way that you, er, say it: Hesitations in speech affect language comprehension,” Cognition, vol. 105, no. 3, 2007, pp. 658-668. DOI: 10.1016/j.cognition.2006.10.010.

    Abstract: Everyday speech is littered with disfluency, often correlated with the production of less predictable words (e.g., Beattie & Butterworth [Beattie, G., & Butterworth, B. (1979). Contextual probability and word frequency as determinants of pauses in spontaneous speech. Language and Speech, 22, 201-211.]). But what are the effects of disfluency on listeners? In an ERP experiment which compared fluent to disfluent utterances, we established an N400 effect for unpredictable compared to predictable words. This effect, reflecting the difference in ease of integrating words into their contexts, was reduced in cases where the target words were preceded by a hesitation marked by the word er. Moreover, a subsequent recognition memory test showed that words preceded by disfluency were more likely to be remembered. The study demonstrates that hesitation affects the way in which listeners process spoken language, and that these changes are associated with longer-term consequences for the representation of the message.

    Keywords: disfluency, ERPs, Language comprehension, speech

  • Scott H. Fraundorf, and Duane G. Watson, “The disfluent discourse: Effects of filled pauses on recall,” Journal of Memory and Language, vol. 65, no. 2, 2011, pp. 161-175.

    Abstract: We investigated the mechanisms by which fillers, such as uh and um, affect memory for discourse. Participants listened to and attempted to recall recorded passages adapted from Alice’s Adventures in Wonderland. The type and location of interruptions were manipulated through digital splicing. In Experiment 1, we tested a processing time account of fillers’ effects. While fillers facilitated recall, coughs matched in duration to the fillers impaired recall, suggesting that fillers’ benefits cannot be attributed to adding processing time. In Experiment 2, fillers’ locations were manipulated based on norming data to be either predictive or non-predictive of upcoming material. Fillers facilitated recall in both cases, inconsistent with an account in which listeners predict upcoming material using past experience with the distribution of fillers. Instead, these results suggest an attentional orienting account in which fillers direct attention to the speech stream but do not always result in specific predictions about upcoming material.

    Keywords: Language comprehension