Filled Pause
Research Center

Investigating 'um' and 'uh' and other hesitation phenomena

The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021)

The tenth Workshop on Disfluency in Spontaneous Speech was held virtually as a satellite event of the INTERSPEECH annual conference. Furthermore, this two-day edition of DiSS was followed by a special day focusing on (Dis)Fluency in speech and language pathology (Aug 27).

Date: August 25-26, 2021

Location: Paris 8 University Vincennes (St. Denis, France)

Organizers: Ivana Didirková (Chair), Robert Eklund, Pierre-Olivier Gaumin, Fabrice Hirsch, Takeki Kamiyama, Sébastien Le Maguer, Ralph L. Rose, Sabina Tabacaru

Invited speakers: Liesbeth Degand, Vered Silber-Varod, Bridget Walsh

Web site:

Papers presented

(Download references in bibtex format here. Proceedings available in full (TBA)).

  • Simon Betz, Nataliya Bryhadyr, Loulou Kosmala, and Loredana Schettino, “A crosslinguistic study on the interplay of fillers and silences,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 47-52.

    Abstract We present a crosslinguistic study on the interplay of hesitation silences and fillers in conversation. The research questions have been addressed for English in a previous DiSS workshop paper (Betz & Kosmala, 2019) and this study extends the analysis to German, Italian and French. The research questions are: 1) Does the type of the filler influence the duration of the following silence? 2) Does the duration of the filler correlate with silence duration? 3) Does silence duration vary depending on its distance from the filler? The analysis shows cross-linguistic similarities and differences, thus highlighting the role and the language- and culture-specific nature of disfluencies.

  • Judit Bóna, “Disfluencies in spontaneous speech: The effect of age, sex and speech task,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 99-104.

    Abstract The main question of this study is whether there are differences in the occurrence of disfluencies of young and old males and females depending on speech task. Frequency and types of disfluencies of 20 young and 20 old speakers were analyzed in three different speech tasks. Results show that speakers’ age has a significant effect on the frequency of disfluencies only in males’ speech. There are disfluencies which are more characteristic of old speakers’ speech, and others of young speakers’ speech. Speech task has a significant effect on the analyzed parameters in both age groups, while sex has the least impact on frequency.

  • Liesbeth Degand, “Discourse markers as markers of (dis)fluency: The role of peripheral position,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 1-2.

    Abstract Studies on the relationship between discourse markers (DMs) and (dis)fluency have a Janus-headed face. On the one hand, DMs are described as structuring devices key to the local and global organization of discourse. As such, they contribute to its overall fluency. On the other hand, they have been described as traces of impediments in the speech production process, thus signalling disfluency. In other words, DMs are characterized by “functional ambivalence”, a notion reflecting their effects as symptoms of production difficulties and as signals of inferences to be made (Crible, 2018:3, see also Clark & Fox Tree, 2002). The starting point of this presentation is the observation that DMs occur overwhelmingly in initial position of their host unit, where they fulfil specific discourse functions. DMs may also occur in a functionally motivated way in final position, albeit less frequently. The simplified hypothesis of this study is that DMs in peripheral position have a fluent signalling function, while DMs in non-peripheral position are symptomatic of disfluent use. We will show that this dichotomy needs to be fine-tuned considering the type of host unit under study. On the basis of previous work investigating the relationship between DM function, DM position and the linguistic type of host unit (syntactic clause, intonation unit or speech turn) (Degand & Crible, in press), the hypothesis is that DMs work as functional boundary markers at the syntactic (clause) and the interactional (turn) levels, but not at the prosodic (intonation unit) level. Other (medial) uses should be less functionally motivated and be considered as symptoms of disfluency. Fluent and disfluent use will be evaluated in context, considering co-occurrence with other disfluency markers (Crible, Degand & Gilquin, 2017). A systematic study of the functional distribution of DMs in spoken French will show that this hypothesis is at least partially borne out.

  • Jessica Di Napoli, “Filled pauses in university lectures,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 41-46.

    Abstract Previous studies have shown that filled pauses such as uh and um may cue listeners to discourse structure and information structure. The present study employs a corpus-based approach to investigate to what extent filled pauses occur in this function in eight undergraduate lectures in American English. Results show that filled pauses occur most frequently in initial (i.e., post-pausal) position, and that they often cluster together following topic changes. Filled pauses are also shown to occur before important words in the corpus. Together, the results suggest that filled pauses in lectures may highlight important information and mark discourse structure at various levels. The findings contribute to a better understanding of filled pause use across different registers and provide support for filled pauses as signals which benefit listeners.

  • Dorottya Gyarmathy, Valéria Krepsz, Anna Huszár, and Viktória Horváth, “Dynamic changes of pausing in triadic conversations,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 105-110.

    Abstract Pausing in conversation has several roles from speech planning to managing turn-takings (TTs). However, less is known about the dynamic changes of pauses over time or with regard to the turn-taking system. The frequency and the duration of silent and filled pauses (SPs and FPs) as well as shared silences were analyzed in 20 triadic Hungarian conversations using dynamic frames (altogether more than 7700 items). Data showed that the frequency of SPs and FPs decreased over time across conversations. By contrast, shared silences were found to be most frequent in the last sections of conversations. However, the duration of the pauses did not change over time across conversations; it may be influenced by other factors. We found that the SPs containing audible breathing were longer than other SPs. The SPs were less frequent before turn-takings than in other positions. However, their duration was not affected by the turn-taking system.

  • Mária Gósy and Vered Silber-Varod, “Attached filled pauses: Occurrences and durations,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 71-76.

    Abstract Filled pauses may reveal speech planning or execution problems that result in various positional and temporal patterns in spontaneous utterances. The purpose of this study was to analyze the position of the vocalic FPs, with respect to an adjacent word, in terms of occurrences and their durations produced by young (mean age: 25 years) and elderly (mean age: 76 years) speakers of Hungarian (a total of 32 participants). Elderly speakers produced significantly fewer and longer vocalic FPs than young speakers did. Both the occurrences and durations were significantly influenced by the position of FPs and by age. In this paper, we introduced the concept of a functional difference between FPs attached either to the preceding or to the following word. The findings indicated different ways of resolving speech planning or execution problems depending on age.

  • Loulou Kosmala, “Gestures in fluent and disfluent cycles of speech: What they may tell us about the role of (dis)fluency in L2 discourse,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 77-82.

    Abstract The present study looks at the production of gestures in fluent versus disfluent speech in L1-L2 interactions, following Graziano and Gullberg (2013, 2018). The aim of this paper is twofold: first to argue against the Lexical Retrieval Hypothesis (Krauss, Chen, & Gottesman, 2000) by comparing the distribution and function of gestures in fluent versus disfluent speech; second, to closely examine the unfolding of embodied (dis)fluencies, where vocal and visual-gestural actions are coordinated and situated within word searching sequences. The analyses are conducted on a video-recorded corpus of semi-spontaneous interactions between French and American speakers in tandem settings. Overall, our results support Graziano and Gullberg’s (2018) findings, and show that gestures accompanying (dis)fluencies are not necessarily related to lexical difficulties. Additionally, the qualitative analyses highlight the interactional and multimodal role of (dis)fluencies, which offers a fresh perspective of these phenomena which have often been treated from an internal production perspective.

  • Xinyue Li, Carlos Toshinori Ishi, and Ryoko Hayashi, “EGG analysis of filled pauses in Japanese spontaneous speech: Differences in Japanese native speakers and Chinese learners,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 65-70.

    Abstract Previous studies on L2 learners of Japanese have shown that the appropriate use of filled pauses is a crucial skill in communication with native speakers. However, there are few acoustic investigations of filled pauses produced by L2 learners of Japanese. The present study examines the production of filled pauses in Japanese native speakers and L1-Chinese L2 learners of Japanese, using open quotient features extracted from Electroglottography (EGG) signals. The results show that open quotient values of filled pauses were lower than those in ordinary lexical items for Chinese learners of L2 Japanese, suggesting that they may be using vocal tension as one cue to distinguish filled pauses from ordinary lexical items. However, no similar differences for open quotient were observed for the Japanese native speakers. Furthermore, open quotient-valued voice range profiles reveal that Chinese learners of L2 Japanese transfer their native glottal source cues when they produce filled pauses in Japanese.

  • Gabrielle Morin and Benjamin Tucker, “The acoustic characteristics of um and uh in spontaneous Canadian English,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 53-58.

    Abstract The present study investigates and compares the acoustic characteristics of uh [ə] and um [əm] in spontaneous speech. The data come from a corpus of Western Canadian conversational spontaneous speech. Measures of duration, fundamental frequency, F1 and F2 were extracted from 1,048 instances of um and uh. Results indicate that longer durations occurred when markers preceded silent pauses. Um was found to have higher F1 and lower F2 than uh. F0 was overall lower for um in comparison to uh. These results provide a preliminary understanding of um and uh as markers in spontaneous Canadian English. Canadian English shows a similar proportion of um over uh usage in comparison to American and British English. Findings on vowel duration show no significant difference between um and uh. Differences in F0, F1 and F2 provide additional indication of how um and uh are different.

  • Sieb Nooteboom and Hugo Quené, “Why are some speech errors detected by self-monitoring ‘early’ and others ‘late’?,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 11-16.

    Abstract In this paper we attempt to answer the question why in self-monitoring some segmental speech errors are detected in internal, some in external speech, and others not at all. This was done by re-analyzing data obtained in two earlier published SLIP experiments. It is hypothesized that detection of errors that are similar to the correct target takes longer than detection of errors that are dissimilar. It is also hypothesized that the time available for error detection in internal speech and for detection at all is limited. Results show that indeed a major factor is the strength of phonetic contrast between two competing response candidates.

  • Aurélie Pistono and Robert Hartsuiker, “Word-form related disfluency versus lemma related disfluency: An exploratory analysis of disfluency patterns in connected-speech production,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 95-98.

    Abstract Several language production levels may be involved in the production of disfluencies. In the current study, we conducted network task experiments to tackle disfluencies related to conceptualization, which we operationalized by impeding visual object recognition (i.e. blurring). Contrary to what was expected, blurriness did not lead to more disfluency. However, disfluency type and disfluency location were closely related. This suggests a distinction in the underlying function of disfluencies, some reflecting word-form related difficulties, others reflecting lemma related difficulties.

  • Valeriya Prokaeva and Elena Riekhakaynen, “Hesitation phenomena in first and second languages: Evidence from reading in Russian as L1 and Japanese as L2,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 89-94.

    Abstract The studies of speech disfluencies rarely involve spontaneous reading data. The current study aims at the identification and the comparative analysis of the hesitation phenomena during unprepared reading of texts in the native (Russian) and non-native (Japanese) language. Three groups of disfluencies are differentiated: silent pauses, filled pauses (including lexical fillers, non-lexical fillers, lengthenings, syllable-by-syllable pronunciation and paralinguistic phenomena), and other hesitations (error-related disfluencies, repetitions, self-interruptions and within-word breaks). The results suggest that disfluency is more frequent in non-native reading and is prevalent in the lower Japanese proficiency group, whilst the higher text complexity defined by a text type does not necessarily induce more hesitations. The self-correction phenomena were equally widespread in both L2 proficiency groups, whereas the number of noticed but uncorrected errors was higher in the lower Japanese proficiency group.

  • Laurent Prévot, Roxane Bertrand, and Stéphane Rauzy, “Investigating disfluencies contribution to discourse-prosody mismatches in French conversations,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 35-40.

    Abstract In conversation, the association of discourse and prosodic units can be articulated through an interesting range of configurations. The situation in which these units are mismatching is the least studied and understood of these configurations. We hypothesize in this paper that disfluencies are a major cause of such mismatches. Our quantitative analysis, based on an 8-hour corpus of French conversations manually annotated with disfluencies, discourse units (DU) and prosodic units (PU), confirms that disfluencies do play a major role in PU-DU mismatch but also that other sources should be considered. In the analysis, we also provide some insight into the different types of disfluencies and their frequency in the different DU-PU configurations.

  • Ralph L. Rose, “Variation in jitter, shimmer, and intensity of filled pauses and their contexts in native and nonnative speech,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 59-64.

    Abstract Various acoustic parameters of filled pauses (e.g. uh/um in English, e-(to) in Japanese) have been investigated including duration, pitch, and formants. Less investigated have been jitter, shimmer, and intensity. The present work looks at systematic variation in these properties of filled pauses and their immediate contexts in a crosslinguistic speech corpus. Filled pauses were examined within the five token (word) window centered on the filled pause, exploring variation with respect to first (L1 Japanese) and second language (L2 English) speech as well as L2 proficiency. Results show that relative to the central filled pause, higher jitter and shimmer occur before the filled pause and higher intensity afterward. Proficiency group differences are weak, but suggest that jitter differences are greater in high proficiency speakers and shimmer differences greater in low proficiency speakers. Results vary somewhat from earlier work, but suggest jitter and shimmer may be advance indicators of upcoming disfluency.

  • Toshiyuki Sadanobu, “Attitudinal correlates of word-internal disfluencies in Japanese communication,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 5-10.

    Abstract Through a case observation and a questionnaire survey, this presentation seeks to elucidate the patterns of word-internal disfluency in Japanese communication and determine how speakers implement these patterns. Two conclusions can be drawn: (i) Four possible patterns of word-internal disfluency exist in Japanese communication. Some cases show that disfluency that superficially appears not to be prolonged may come under prolongation. (ii) Some deviations are observed in disfluency patterns in accordance with the speaker’s attitude; all four patterns can be seen to occur in hesitant attitudes, whereas those expressed in the attitude of surprise primarily belong to the “suspending and restarting” pattern. However, where the degree of surprise is low or close to disgust, disfluency is more likely to be expressed as “prolonging and continuing.”

  • Loredana Schettino, Simon Betz, and Petra Wagner, “Hesitations distribution in Italian discourse,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 29-34.

    Abstract The acknowledgment of the functional role of hesitations in speech has increased the research interest in investigating and modeling their occurrence in discourse. This study explores hesitation combinations and distribution in Italian discourse. Though clusters represent less frequent occurrences than standalone hesitations, it is still worth examining their composition, distribution, and context of occurrence for a better understanding of hesitations’ role in discourse. Also, the emerging patterns may provide interesting findings for technological applications, such as integrating hesitations models in conversational agents’ production to improve their communicative efficiency and naturalness.

  • Vered Silber-Varod, “DiSStory: A computational analysis of 9 editions of Disfluency in Spontaneous Speech workshop,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 3-4.

    Abstract What are the most prominent research topics during the past nine DiSS workshops? Do we see any shift over the years? Can we identify the specific terms used in the research of disfluency? At the 10th workshop of DiSS, I will present some answers I have come up with using a data-driven approach on the database of abstracts published in the proceedings of DiSS workshops from 1999 to 2019. In this talk I call on participants to “Trust the text”, as Sinclair (2004) entitled his book, and to join the journey into the DiSS story.

  • Nette Vandenhouwe and Robert Hartsuiker, “Speech disfluencies as actual and believed cues to deception: Individuality of liars and the collective of listeners,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 17-22.

    Abstract There is no consensus about the relationship between disfluencies and deception in speech production. However, it is well established that listeners believe deceptive speech to contain more disfluencies than truthful speech. Here, we used an interactive game to collect the speech of liars and the veracity decisions of listeners. Using Multivariate Pattern Analysis (MVPA), we determined the predictive value of speech disfluencies as both actual and believed cues to deception. We found that patterns of disfluencies can indeed be used to predict both an utterance’s veracity and a listener’s decision about that veracity better than chance. However, there was much individual variation in how lies altered speech, whereas listeners were consistent in how they thought the speech of others indicates lying.

  • Simon Williams, “Categorical differences in the false starts of speakers of English as a second language: Further evidence for developmental disfluency,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 83.

    Abstract Although much is known about the formal properties of L2 repair in general and error corrections in particular, less is known about other subtypes, here collectively referred to as false starts. Unlike L2 self-corrections, false starts are psycholinguistically more comparable with NS equivalents and are of particular interest as possible sites of learner monitoring and modified output. Consistent with previous research on L2 repairs, this study found that lower-intermediate and advanced L2 speakers produced similar numbers of false starts. Their mapping by speaker proficiency level onto Levelt’s (1989) model of speech production revealed that both groups were concerned with lexical and morphological false start repair but that lower-intermediate speakers produced more syntactic and advanced speakers more conceptual examples.

  • Yaru Wu, Mathilde Hutin, Ioana Vasilescu, Lori Lamel, Martine Adda-Decker, and Liesbeth Degand, “Fine phonetic details for DM disambiguation: A corpus-based investigation,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 23-28.

    Abstract In this study we examine phonetic variation of discourse markers in French, using for this purpose the 4-hour richly annotated LOCAS-F corpus. Both linguistic factors and stylistic variables are considered: speech style, part-of-speech category, mean phone duration and vowel formant distributions with respect to the word status. The results show that the use of discourse markers increases with the degree of spontaneity of the speech. Coordinating conjunctions are the part-of-speech which is most frequently used as discourse markers. Moreover, the mean phone duration tends to be shorter and the vowel space more centralized when words are employed as discourse markers, suggesting that discourse markers undergo hypoarticulation and, more generally, reduction.