Filled Pause
Research Center

Investigating 'um' and 'uh' and other hesitation phenomena

The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019)

The ninth Workshop on Disfluency in Spontaneous Speech was held as a satellite event of the INTERSPEECH annual conference. Furthermore, this two-day edition of DiSS was followed by a special day on (Dis)Fluency in Children's Speech (Sep 14).

Date: September 12-13, 2019

Location: ELTE Eötvös Loránd University (Budapest, Hungary)

Organizers: Judit Bóna (Chair), Márton Bartók, Andrea Deme, Robert Eklund, Mária Gósy, Viktória Horváth, Ágnes Jordanidisz, Alexandra Markó, Vered Silber-Varod, Valéria Krepsz, Tímea Vakula, Viola Váradi

Invited speakers: Helena Moniz, Melissa Redford, Mária Gósy

Web site:

Papers presented

(Download references in bibtex format here. Proceedings available in full here)

  • Thanaporn Anansiripinyo, and Chutamanee Onsuwan, “Acoustic-phonetic characteristics of Thai filled pauses in monologues,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 51-54. DOI: 10.21862/diss-09-014-anan-onsu.

    Abstract Filled pause (FP) is one type of disfluent phenomena that is commonly found in everyday speech. It has been widely studied in many languages, but little is known about this topic in Thai. This work explored three important acoustic-phonetic characteristics of Thai filled pauses in monologues. To elicit target monosyllabic tokens of FPs and those of regular word (RW) counterparts, 31 Thai adult females were asked to watch two short cooking videos and describe the contents. They were also asked to read out loud target word lists. Three acoustic measures: syllable duration, first (F1) and second formant (F2) frequencies were taken from 738 tokens. Across vowel contexts, only F2, not F1, in FPs, was significantly different from that in RWs. Differences in syllable duration between RWs versus FPs were near significant. The findings suggest that Thai speakers produced FPs in a presumably different way from RWs. In FPs, the syllable was relatively lengthened and the tongue position was moved towards the center of vowel space. Future directions include a detailed analysis of FPs in terms of amplitude, fundamental frequency, pause duration before/after fillers and other non-linguistic factors.

  • Maria Bakti, “Error type disfluencies in consecutively interpreted and spontaneous monolingual Hungarian speech,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 71-74. DOI: 10.21862/diss-09-019-bakti.

    Abstract Interpreting can be considered as a form of spontaneous speech, the key differences being that language change is involved in interpreting and the fact that speech production is influenced by several constraints during interpreting. Research has shown that the interpreting task influences the disfluency patterns of target language texts. The aim of this paper is to investigate how the frequency and distribution of error type disfluencies change in the target language output of trainee interpreters as they progress in their training. Results indicate that there is no considerable change in the frequency and proportion of error type disfluencies in the target language texts recorded at the end of the second, third and fourth semesters of interpreter training. The proportion of error type disfluencies is higher in the consecutively interpreted texts than in the spontaneous monolingual speech of the students. This suggests that the complexity of the task, rather than progress in training, determines the disfluency pattern of consecutively interpreted target language texts.

  • Charlotte Bellinghausen, Thomas Fangmeier, Bernhard Schröder, Johanna Keller, Susanne Drechsel, Peter Birkholz, Ludger Tebartz van Elst, and Andreas Riedel, “On the role of disfluent speech for uncertainty in articulatory speech synthesis,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 39-42. DOI: 10.21862/diss-09-011-bell-etal.

    Abstract In this paper we present a perception study on the role of disfluent speech in forms of prosodic cues of uncertainty in question-answering situations. In our scenario the answer to each question was modeled by varying three prosodic cues: pause, intonation, and hesitation. The utterances were generated by means of an articulatory speech synthesizer. Subjects were asked to rate each answer on a Likert scale with respect to uncertainty, naturalness and understandability. Results showed evidence for an additive principle of the prosodic cues, i.e. the more cues were activated the higher the perceived level of uncertainty. Overall, the effect of intonation and hesitation was more evident than the effect of pause.

  • Simon Betz, and Loulou Kosmala, “Fill the silence! Basics for modeling hesitation,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 11-14. DOI: 10.21862/diss-09-004-betz-kosm.

    Abstract In order to model hesitations for technical applications such as conversational speech synthesis, it is desirable to understand interactions between individual hesitation markers. In this study, we explore two markers that have been subject to many discussions: silences and fillers. While it is generally acknowledged that fillers occur in two distinct forms, um and uh, it is not agreed on whether these forms systematically influence the length of associated silences. This notion will be investigated on a small dataset of English spontaneous speech data, and the measure of distance between filler and silence will be introduced to the analyses. Results suggest that filler type influences associated silence duration systematically and that silences tend to gravitate towards fillers in utterances, exhibiting systematically lower duration when preceding them. These results provide valuable insights for improving existing hesitation models.

  • Iulia Grosman, Anne Catherine Simon, and Liesbeth Degand, “Empathetic hearers perceive repetitions as less disfluent, especially in non-broadcast situations,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 23-26. DOI: 10.21862/diss-09-007-gros-etal.

    Abstract This experiment measures the impact of the communicative situation on perceived fluency in French speech. We consider three dimensions of fluency: grammatical, discursive and socio-interpersonal. We first hypothesise that grammatical fluency is less influenced by contextual constraints than the other two dimensions. Furthermore, taking into account the Interpersonal Reactivity Index of each participant, we hypothesise that hearers with higher interpersonal capacities will be more tolerant in their fluency evaluation, because of their ability to project into the speaker’s mind. The strength of the design rests on the proposal to test natural stimuli and integrate social and individual variables in a perception experiment.

  • Dorottya Gyarmathy, and Viktória Horváth, “Pausing strategies with regard to speech style,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 27-30. DOI: 10.21862/diss-09-008-gyar-horv.

    Abstract Speech is occasionally interrupted by silent and filled pauses of various lengths. Pauses have many different functions in spontaneous speech (e.g. breathing, marking syntactic boundaries as well as speech planning difficulties, time for self-repair). The aim of the study was the analysis of the interrelation between the temporal pattern and the syntactical position of silent pauses (SP) on one hand. On the other hand, filled pauses (FP) were also analyzed according to their phonetic realization, as well as the combination of SPs and FPs. The effect of speech style on pausing strategies was also analyzed. A narrative recording and a conversational recording from 10 speakers (ages between 20 and 35 years, 5 male, 5 female) were selected from Hungarian Spontaneous Speech Database for the study. The material was manually annotated, silent pauses were categorized, then the duration of pauses was extracted. Results showed that the position of silent and filled pauses affects their duration. The speech style did not influence the frequency of pauses. However, silent and filled pauses were longer in narratives than in conversations. Results suggest that pausing strategies are similar in general; however, the timing patterns of pauses may depend on various factors, e.g. speech style.

  • Mária Gósy, “Halt command in word retrieval,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 3-6. DOI: 10.21862/diss-09-002-gosy.

    Abstract In this study, occurrences and temporal patterns of five types of disfluencies were analyzed that show a common feature on the surface. All of them have some kind of interruption of content words followed by some continuation. The purpose was to show whether the place of interruption of the word articulation and the durational patterns of the editing phases are characteristic of re-starts, false starts, slips of the tongue, pauses within words, and prolongations. More than 1,400 instances were processed. Both (i) the number of pronounced segments of abandoned words and the duration of the corresponding editing phases are characteristic of a specific disfluency type, and (ii) speakers select a strategy to overcome their speech planning difficulties most economically.

  • Julianna Jankovics, and Luca Garai, “Disfluencies in mildly intellectually disabled young adults’ spontaneous speech,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 79-82. DOI: 10.21862/diss-09-021-jank-gara.

    Abstract The study analyzes various hesitations and repairs in the spontaneous speech of mildly intellectually disabled women. The main research questions of the study focus on the similarities and differences in the frequency of disfluencies and the duration of pauses between the spontaneous speech of mildly intellectually disabled and mentally healthy young adults. Our results show that hesitation phenomena were more frequent among intellectually disabled subjects in spontaneous speech, while repairs occurred more frequently among control subjects in guided spontaneous speech.

  • Borbála Keszler, and Judit Bóna, “Pausing and disfluencies in elderly speech: Longitudinal case studies,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 67-70. DOI: 10.21862/diss-09-018-kesz-bona.

    Abstract The aim of this paper was to investigate the changes in fluency of speech during ageing. The novelty of the examination is that this is a longitudinal study: it analyses the speech of 7 speakers from middle or young-old age to old-old age. Pausing strategies and frequency of disfluencies were analyzed. Results show that active aging helps to preserve certain parameters of speech characteristics of young speakers.

  • Valéria Krepsz, “Vowel lengthening — Effect of position, age, and phonological quantity,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 59-62. DOI: 10.21862/diss-09-016-krepsz.

    Abstract The present research examined the effect of phrase-final lengthening on the spectral structure of vowels in the spontaneous speech of children and adults. Three Hungarian vowel pairs (in quantity pairs) were analyzed in two positions: in the middle of the phrase and at the end of the phrase. The effect of lengthening on the spectral structure of the vowels could already be detected in four-year-olds. However, its extent was strongly correlated with the articulation aspects of the vowels. There was a discrepancy in the tendencies of the lengthening’s effect between the two groups of children and the adults, presumably due to different linguistic experience, inaccuracy of articulation, and significant individual differences.

  • Mária Laczkó, “Temporal characteristics of teenagers’ spontaneous speech and topic based narratives produced during school lessons,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 63-66. DOI: 10.21862/diss-09-017-laczko.

    Abstract The aim of this presentation is to analyse the articulation and speech rates of teenagers and the types of pauses in their spontaneous speech and topic based narratives during school lessons. The speech samples were analysed in terms of temporal characteristics using the Praat program. The results showed the different tempo values and various functions of filled pauses in the examined situations.

  • Kikuo Maekawa, “Five pieces of evidence suggesting large lookahead in spontaneous monologue,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 7-10. DOI: 10.21862/diss-09-003-maekawa.

    Abstract There is considerable disagreement among the researchers of speech production with respect to the range of lookahead or pre-planning. In this paper, five pieces of evidence suggesting the presence of relatively large lookahead in spontaneous monologues are presented, based on the analyses of the Corpus of Spontaneous Japanese. This evidence consistently suggests that the range of a lookahead is six to seven accentual phrases long, which corresponds on average to 3–4 seconds in the time domain.

  • Helena Moniz, “Processing disfluencies in distinct speaking styles: Idiosyncrasies and transversality,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 1-2. DOI: 10.21862/diss-09-001-moniz.

    Abstract This talk will tackle the idiosyncratic properties of disfluencies in distinct speaking styles, mostly university lectures (Trancoso et al., 2008) and map-task dialogues (Trancoso et al., 1998), but also featuring verbal fluency tests, and (more recently) second language learning presentations in ecological settings. It will also discuss the transversal acoustic-prosodic properties pertained across speaking styles. The main research questions are twofold: i) are there domain effects in the production of disfluencies when speakers adjust to distinct communicative contexts, as in university lectures and dialogues?; ii) if domain effects do exist, are there still acoustic-prosodic properties that can be shared across domains?

  • Johanna Pap, “Effects of speech rate changes on pausing and disfluencies in cluttering,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 75-78. DOI: 10.21862/diss-09-020-pap.

    Abstract People with cluttering (PWC) often receive feedback such as “Slow down!”; even so, this fluency disorder cannot be cured merely by slowing down the speakers’ speech rate. When PWC accelerate their speech rate, language planning difficulties and word structure errors might occur, which might result in breakdowns in fluency and/or intelligibility. In the present paper, the frequency of disfluencies was examined in four different speech tasks, ranging from deliberately slow to maximum speech rate, to determine whether speech rate changes have effects on cluttered speech. The twenty participants of this investigation were individuals of both genders suspected of cluttering, aged between 20 and 50 years. The results show that PWC are able to change not only their speech rate but their articulatory rate as well. Moreover, disfluencies were produced most frequently in the speech task of maximum speech rate, where PWC do not have enough time for speech planning. The research provides empirical, measured data for a better insight into the nature of cluttering. Understanding the correlation between speech rate and disfluencies in cluttered speech is fundamental to improving the diagnosis of cluttering.

  • Kata Baditzné Pálvölgyi, “Hesitation patterns in the Spanish spontaneous speech of Hungarian learners of Spanish,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 35-38. DOI: 10.21862/diss-09-010-badi.

    Abstract This paper examines what native Spanish speakers find most disturbing in the pronunciation of Hungarian language learners of Spanish. Former research (Baditzné Pálvölgyi, 2019) showed that in spontaneous Spanish speech of at least threshold level Hungarian learners, one of the aspects that Spanish native speakers least tolerated was the way Hungarians hesitated. So the present paper focuses primarily on hesitation phenomena—lengthening and filled pauses—assuming that Hungarians hesitate more, and the lengthened segments are longer than the Spanish ones. In order to validate the hypothesis, an investigation comparing a corpus of Northern Spanish spontaneous speech to another corpus of advanced Hungarian learners of Spanish was conducted.

  • Ralph L. Rose, “The structural signaling effect of silent and filled pauses,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 19-22. DOI: 10.21862/diss-09-006-rose.

    Abstract Filled pauses (uh, um) have been shown in a number of studies to have a facilitative effect for listeners, such as helping them better perceive the syntactic structure of ongoing speech. This may be because the extra time afforded by the filled pause gives listeners more time to process the input. Theoretically, then, silent pauses should show a comparable effect. The present study tests this prediction using a grammaticality judgment task following a study by Bailey and Ferreira (2003). Results show that filled and silent pauses have a comparable influence on listeners’ grammaticality judgments but further suggest that listeners deem silent pauses as more important and influential markers.

  • Vered Silber-Varod, Mária Gósy, and Robert Eklund, “Segment prolongation in Hebrew,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 47-50. DOI: 10.21862/diss-09-013-silb-etal.

    Abstract In this paper we study segment prolongations (PRs), a type of disfluency sometimes included under the term “hesitation disfluencies”, in Hebrew. PRs have previously been studied in a number of other languages within a comprehensive speech disfluency framework, which is applied to Hebrew in the current study. For the purpose of this study we defined Hebrew clitics, such as conjunctions, articles, prepositions and so on, as words. The most striking difference between Hebrew and the previously studied languages is how restricted PRs seem to be in Hebrew, occurring almost exclusively on word-final vowels. The most frequently prolonged vowel is [e]. The segment type does not affect PRs’ duration. We found significant differences between men and women regarding the frequency of PRs.

  • Shungo Suzuki, and Judit Kormos, “The effects of read-aloud assistance on second language oral fluency in text summary speech,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 31-34. DOI: 10.21862/diss-09-009-suzu-korm.

    Abstract Focusing on text summary speaking tasks, the present study investigated the effects of the activation of phonological representations during text comprehension (operationalized by read-aloud assistance) on the subsequent retelling speech. A total of 24 Japanese learners of English completed text summary speaking tasks under two conditions: (a) reading without read-aloud assistance and (b) reading with read-aloud assistance. Their speech data were analyzed by lexical overlap indices (i.e. the ratio of characteristic single-words and multiword sequences) and by fluency measures capturing three major dimensions of fluency—speed, breakdown, and repair fluency. The results showed that read-aloud assistance directly facilitated lexical overlaps with source texts and indirectly improved speed and repair fluency. Furthermore, read-aloud assistance was found to affect the interrelationship between lexical overlaps and utterance fluency. The findings suggested that read-aloud assistance might help second language learners to store multiword sequences as a single unit (i.e. chunking) during text comprehension.

  • Linda Taschenberger, Outi Tuomainen, and Valerie Hazan, “Disfluencies in spontaneous speech in easy and adverse communicative situations: The effect of age,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 55-58. DOI: 10.21862/diss-09-015-tasc-etal.

    Abstract Disfluencies are a pervasive feature of speech communication. Their function in communication is still widely discussed with some proposing that their usage might aid understanding. Accordingly, talkers may produce more disfluencies when conversing in adverse communicative situations, e.g. in background noise. Moreover, increasing age may have an effect on disfluency use as older adults report particular difficulties when communicating in adverse conditions. In this study, we elicited spontaneous speech via a problem-solving task from four different age groups (19–76 years old) to investigate the effect of energetic and informational maskers on the use of filled pauses (FPs), and its interaction with age. Measures of disfluency rates, effort ratings, and communication efficiency were obtained. Results show that, against our predictions, FP usage may decrease in adverse conditions. Moreover, age does not play a great role in adults with normal hearing. The results indicate that individuals differ greatly in their disfluency adaptations, utilising different strategies to overcome challenging communicative situations.

  • Michiko Watanabe, Yusaku Korematsu, and Yuma Shirahata, ““Uh” is preferred by male speakers in informal presentations in American English,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 43-46. DOI: 10.21862/diss-09-012-wata-etal.

    Abstract This study investigates factors that are likely to be related to speakers' choice of filler type between uh and um in English, using an informal presentation speech corpus. The effects of the following factors on the probability of each filler type were examined: (1) immediately preceding clause boundary depth, (2) clause size measured as the number of words in the clause, (3) the number of quotation remarks in the clause, and (4) speaker's sex. The filler probabilities increased with the boundary depths. This trend was much stronger with um than with uh. Ums are more likely to appear clause-initially than uhs. Clause size had similar effect sizes on the two filler types. The number of quotation remarks had a stronger negative effect with ums. Speaker's sex had a significant effect only with uhs. Uhs are used more frequently by male speakers than by female speakers. The results indicate that speakers' choice of filler type is affected by the combination of multiple factors with various effect sizes.

  • Hong Zhang, “Variation in the choice of filled pause: A language change, or a variation in meaning?,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 15-18. DOI: 10.21862/diss-09-005-zhang.

    Abstract The role of filled pauses in message structuring is a heavily debated question, but the result is still somewhat inconclusive. In this study, I consider this question jointly with sociolinguistic factors that have been thought to affect the choice of filled pause in American English. The results suggest that the use of uh is subject to higher variability across not only age groups, but also conversation topics and interlocutors. A latent semantic analysis found consistent difference between two forms of filled pause and silent pauses of varying duration in the primary latent dimension, but similarity between short silent pause and uh, as well as long silent pause and um in the second dimension. Therefore, the functional difference between um and uh should be acknowledged, and the observed change in their relative popularity is potentially related to their different meaning or function in the discourse.