Investigating 'um' and 'uh' and other hesitation phenomena

My greatest hits!

Okay, this is shameless self-promotion, but if anyone is curious about the work I've done on filled pauses, then this list should give you some highlights. Admittedly, I'm not quite as prolific as some researchers, but I hope you'll find some interesting and even innovative work here.

Though my first work on filled pauses was my master's dissertation, its content has largely been superseded by later work from other researchers, so I'll skip it here. Perhaps the first work that includes a comprehensive review of filled pauses as they relate to second language teaching is “Filled Pauses in Language Teaching: Why and How”[1], wherein I do exactly what the title says: review the relevant research and argue both why and how filled pauses should be addressed more explicitly in language teaching.

Another work that has formed the basis for many later works describes the construction of the Crosslinguistic Corpus of Hesitation Phenomena (CCHP): the aptly named “Crosslinguistic Corpus of Hesitation Phenomena: A Corpus for Investigating First and Second Language Speech Performance”[2]. Those who want to know about the corpus in detail should consult that work.

Building on the CCHP are several later works, including “Temporal Variables in First and Second Language Speech and Perception of Fluency”[3], which examines the correlations between temporal features of first and second language speech (articulation rate, pause rate, and pause duration) and how those features relate to perceived fluency. I looked at the crosslinguistic perception of fluency in “Differences in second language speech fluency ratings: Native versus nonnative listeners”[4], using recordings from the CCHP as the listening stimuli. And in “A Comparison of Form and Temporal Characteristics of Filled Pauses in L1 Japanese and L2 English”[5], I compared the form and timing of filled pauses in first language Japanese and second language English speech.

Two other works focus on first language filled pauses, using English data. The first, “The structural signaling effect of silent and filled pauses”[6], looks at how silent and filled pauses relate to each other in speech perception and shows that silent pauses, despite being "silent," seem to have a greater impact on listeners' perception of sentence structure. Last, in “A Comparison of Filled Pauses in Scripted and Non-Scripted Spontaneous Speech”[7], I looked at some distributional differences between filled pauses in actual spontaneous speech and those in the scripted, simulated spontaneous speech of TV and film productions.

I have several other works as well, but I suppose I should show some humility and stop here. If anyone really wants to see everything I've done, check out the FPRC bibliography and filter by the author name "Rose". Or, if you're curious about my work in other areas, you can find an exhaustive list on my institutional web site.


  • Ralph L. Rose, “Filled Pauses in Language Teaching: Why and How,” Bulletin of Gunma Prefectural Women’s University, vol. 29, 2008, pp. 47-64.

    Abstract: Filled pauses (uh, um) are ubiquitous elements of spontaneous speech but have received relatively little attention in second language teaching. Perhaps this is because filled pauses have often been regarded as meaningless elements resulting from speech processing difficulties. This paper draws from research in widely disparate fields to show that speakers and listeners use them systematically and meaningfully. These facts are used to generate a unified and coherent model of filled pauses in spontaneous speech. This model is then used to develop a concept of communicative competence in which filled pauses play a role at the interface between pragmatic constraints and communication strategies. The article concludes with practical recommendations for how filled pauses may be incorporated into the second-language teaching curriculum.

  • Ralph L. Rose, “Crosslinguistic Corpus of Hesitation Phenomena: A Corpus for Investigating First and Second Language Speech Performance,” in INTERSPEECH 2013, Lyon, France, August 2013, pp. 992-996.

    Abstract: There is a growing consensus that there is a need to evaluate second language speech performance with respect to first language speech behavior. To support this need, the Crosslinguistic Corpus of Hesitation Phenomena was developed. This freely available corpus is designed to investigate the crosslinguistic influence of speech patterns and consists of recordings of speakers producing first and second language speech samples in response to parallel elicitation tasks in each language. Preliminary results from the corpus are consistent with other findings that second language performance is sometimes correlated with first language speech behavior. In particular, findings show that silent pause rate and duration as well as other hesitation phenomena correlate with first language performance while speech rate does not. Interestingly, repeats also differ from first language production. Results show that the corpus may be a useful tool for researchers who wish to investigate the correspondence between first and second language speech, particularly with respect to the use of hesitation phenomena.

    Keywords: corpus, hesitation phenomena, second language speech

  • Ralph L. Rose, “Temporal Variables in First and Second Language Speech and Perception of Fluency,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK: University of Glasgow, August 2015, paper 0405, pp. 1-5.

    Abstract: Evidence is accumulating that many temporal features of second language speech are correlated with those of first language speech. This study looks at the correlation between articulation rate, pause rate, and mean pause duration in Japanese first and English second language speech and how second language fluency raters perceive these. In a crosslinguistic corpus of spontaneous speech, mean pause duration was found to have a near-high correlation while the other two temporal variables have a moderate correlation. A subsequent elicitation of fluency judgments on the second language English speech via Amazon Mechanical Turk showed that ratings were highly dependent on pause duration, rather less on articulation rate, but not on pause rate. Results suggest that raters’ perception of second language fluency is divergent from speakers’ actual second language development: Ratings are related to features that are not indicative of second language development but rather of individual speech patterns.

    Keywords: articulation rate, fluency, second language acquisition, silent pause

  • Ralph L. Rose, “Differences in second language speech fluency ratings: Native versus nonnative listeners,” February 2017.

    Abstract: (none)

  • Ralph L. Rose, “A Comparison of Form and Temporal Characteristics of Filled Pauses in L1 Japanese and L2 English,” Journal of the Phonetic Society of Japan, vol. 21, no. 3, 2017, pp. 33-40. DOI: 10.24467/onseikenkyu.21.3_33.

    Abstract: Filled pauses (FPs) in English can be either monophonemic ‘uh’ [ə] or polyphonemic ‘um’ [əm]. These differ temporally: shorter ‘uh’ is associated with shorter overall delay (including silent pauses). Japanese FPs are more varied, including both monophonemic ([ε], [ŋ]) and polyphonemic ([ε:to], [ɑno]) forms. This study compares the FPs of native Japanese speakers in a crosslinguistic speech corpus. Results show speakers use FPs with a lower F1 than native English speakers and strongly prefer the monophonemic form. Duration patterns are similar, but low proficiency speakers delay longer with monophonemic FPs. Results suggest possibilities for nonnative speech detection in speech applications.

  • Ralph L. Rose, “The structural signaling effect of silent and filled pauses,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 19-22. DOI: 10.21862/diss-09-006-rose.

    Abstract: Filled pauses (uh, um) have been shown in a number of studies to have a facilitative effect for listeners, such as helping them better perceive the syntactic structure of ongoing speech. This may be because the extra time afforded by the filled pause gives listeners more time to process the input. Theoretically, then, silent pauses should show a comparable effect. The present study tests this prediction using a grammaticality judgment task following a study by Bailey and Ferreira (2003). Results show that filled and silent pauses have a comparable influence on listeners’ grammaticality judgments but further suggest that listeners deem silent pauses as more important and influential markers.

  • Ralph L. Rose, “A comparison of filled pauses in scripted and non-scripted spontaneous speech,” in The 3rd International Symposium on Linguistic Patterns in Spontaneous Speech, Taipei, Taiwan, November 2019, pp. 21-25.

    Abstract: Television and film productions are heavily scripted, but intend to portray speech as unscripted within the fiction of the dramatic universe they depict. Previous evidence (Quaglio, 2009) suggests, however, that various lexical features of speech occur in such scripted spontaneous speech differently than they do in actual spontaneous speech. The present study is a comparison of the occurrence of filled pause disfluencies (in English, uh and um) in scripted spontaneous speech and actual spontaneous speech, to see if the basic usage patterns are similar. Using the web site interface, filled pauses were examined in three corpora (spontaneous speech, TV transcripts, and movie transcripts) in terms of their basic frequency of occurrence, their um:uh ratios, and their structural distribution with respect to sentence boundaries. Each was also evaluated in terms of how they shifted over time. Results show that the disfluency patterns of scripted spontaneous speech are similar in many ways to that of actual spontaneous speech. The frequency of filled pauses is similar to that shown in other major corpora and the um:uh ratio also replicates a trend observed in other work (Wieling et al., 2016; Fruehwald, 2016) suggesting an ongoing shift toward the use of um over uh but with television and film speech patterns lagging that of society.