Bibliography of Disfluency in Spontaneous Speech (DiSS) papers

The following is a complete list of proceedings papers from the Disfluency in Spontaneous Speech (DiSS) workshop series. The entire list is also available for download in BibTeX format.

2021

Simon Betz, Nataliya Bryhadyr, Loulou Kosmala, and Loredana Schettino, “A crosslinguistic study on the interplay of fillers and silences,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 47-52.

Abstract We present a crosslinguistic study on the interplay of hesitation silences and fillers in conversation. The research questions have been addressed for English in a previous DiSS workshop paper (Betz & Kosmala, 2019) and this study extends the analysis to German, Italian and French. The research questions are: 1) Does the type of the filler influence following silence duration? 2) Does the duration of the filler correlate with silence duration? 3) Does silence duration vary depending on its distance from the filler? The analysis shows cross-linguistic similarities and differences, thus highlighting the role and the language- and culture-specific nature of disfluencies.
Judit Bóna, “Disfluencies in spontaneous speech: The effect of age, sex and speech task,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 99-104.

Abstract The main question of this study is if there are differences in the occurrence of disfluencies of young and old males and females depending on speech task. Frequency and types of disfluencies of 20 young and 20 old speakers were analyzed in three different speech tasks. Results show that speakers’ age has significant effect on the frequency of disfluencies only in males’ speech. There are disfluencies which are more characteristic of old speakers’ speech, and others of young speakers’ speech. Speech task has significant effect on the analyzed parameters in both ages, while sex has the least impact on frequency.
Liesbeth Degand, “Discourse markers as markers of (dis)fluency: The role of peripheral position,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 1-2.

Abstract Studies on the relationship between discourse markers (DMs) and (dis)fluency have a Janus-headed face. On the one hand, DMs are described as structuring devices key to the local and global organization of discourse. As such, they contribute to its overall fluency. On the other hand, they have been described as traces of impediments in the speech production process, thus signalling disfluency. In other words, DMs are characterized by “functional ambivalence”, a notion reflecting their effects as symptoms of production difficulties and as signals of inferences to be made (Crible, 2018:3, see also Clark & Fox Tree, 2002). Starting point of this presentation is the observation that DMs occur overwhelmingly in initial position of their host unit, where they fulfil specific discourse functions. Discourse Markers may also occur in a functionally-motivated way in final position, be it less frequently. The simplified hypothesis of this study is that DMs in peripheral position have a fluent signalling function, while DMs in non-peripheral position are symptomatic of disfluent use. We will show that this dichotomy needs to be fine-tuned considering the type of host unit under study. On the basis of previous work investigating the relationship between DM function, DM position and the linguistic type of host unit (syntactic clause, intonation unit or speech turn) (Degand & Crible, in press), the hypothesis is that DMs work as functional boundary markers at the syntactic (clause) and the interactional (turn) levels, but not at the prosodic (intonation unit) level. Other (medial) uses should be less functionally motivated and be considered as symptoms of disfluency. Fluent and disfluent use will be evaluated in context, considering co-occurrence with other disfluency markers (Crible, Degand & Gilquin, 2017). A systematic study of the functional distribution of DMs in spoken French will show that this hypothesis is at least partially borne out.
Jessica Di Napoli, “Filled pauses in university lectures,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 41-46.

Abstract Previous studies have shown that filled pauses such as uh and um may provide cues to listeners to discourse structure and information structure. The present study employs a corpus-based approach to investigate to what extent filled pauses occur in this function in eight undergraduate lectures in American English. Results show that filled pauses occur most frequently in initial (i.e., post-pausal) position, and that they often cluster together following topic changes. Filled pauses are also shown to occur before important words in the corpus. Together, the results suggest that filled pauses in lectures may highlight important information and mark discourse structure at various levels. The findings contribute to gaining a better understanding of filled pause use across different registers and provide support of filled pauses as signals which benefit listeners.
Dorottya Gyarmathy, Valéria Krepsz, Anna Huszár, and Viktória Horváth, “Dynamic changes of pausing in triadic conversations,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 105-110.

Abstract Pausing in conversation has several roles from speech planning to managing turn-takings (TTs). However, less is known about the dynamic changes of pauses over time or with regard to the turn-taking system. The frequency and the duration of silent and filled pauses (SPs and FPs) as well as shared silences were analyzed in 20 triadic Hungarian conversations using dynamic frames (altogether more than 7700 items). Data showed that the frequency of SPs and FPs decreased over time across conversations. By contrast, shared silences were found to be the most frequent in the last sections of conversations. However, the duration of the pauses did not change over time across conversations; it may be influenced by other factors. We found that the SPs containing audible breathing were longer than other SPs. The SPs were less frequent before turn-takings than in other positions. However, their duration was not affected by the turn-taking system.
Mária Gósy, and Vered Silber-Varod, “Attached filled pauses: Occurrences and durations,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 71-76.

Abstract Filled pauses may reveal speech planning or execution problems that result in various positional and temporal patterns in spontaneous utterances. The purpose of this study was to analyze the position of the vocalic FPs, with respect to an adjacent word, in terms of occurrences and their durations produced by young (mean age: 25 years) and elderly (mean age: 76 years) speakers of Hungarian (a total of 32 participants). Elderly speakers produced significantly less and longer vocalic FPs than young speakers did. Both the occurrences and durations were significantly influenced by position of FPs and by age. In this paper, we introduced the conception of a functional difference between FPs attached either to the preceding or to the following word. The findings indicated different ways of resolving speech planning or execution problems depending on age.
Loulou Kosmala, “Gestures in fluent and disfluent cycles of speech: What they may tell us about the role of (dis)fluency in L2 discourse,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 77-82.

Abstract The present study looks at the production of gestures in fluent versus disfluent speech in L1-L2 interactions, following Graziano and Gullberg (2013, 2018). The aim of this paper is twofold: first to argue against the Lexical Retrieval Hypothesis (Krauss, Chen, & Gottesman, 2000) by comparing the distribution and function of gestures in fluent versus disfluent speech; second, to closely examine the unfolding of embodied (dis)fluencies, where vocal and visual-gestural actions are coordinated and situated within word searching sequences. The analyses are conducted on a video-recorded corpus of semi-spontaneous interactions between French and American speakers in tandem settings. Overall, our results support Graziano and Gullberg’s (2018) findings, and show that gestures accompanying (dis)fluencies are not necessarily related to lexical difficulties. Additionally, the qualitative analyses highlight the interactional and multimodal role of (dis)fluencies, which offers a fresh perspective of these phenomena which have often been treated from an internal production perspective.
Xinyue Li, Carlos Toshinori Ishi, and Ryoko Hayashi, “EGG analysis of filled pauses in Japanese spontaneous speech: Differences in Japanese native speakers and Chinese learners,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 65-70.

Abstract Previous studies on L2 learners of Japanese have shown that the appropriate use of filled pauses is a crucial skill in communication with native speakers. However, there are limited acoustic investigations on filled pauses produced by L2 learners of Japanese. The present study examines the production of filled pauses in Japanese native speakers and L1-Chinese L2 learners of Japanese, using open quotient features extracted from Electroglottography (EGG) signals. The results show that open quotient values of filled pauses were lower than those in ordinary lexical items for Chinese learners of L2 Japanese, suggesting that they may be using vocal tension as one cue to distinguish filled pauses from ordinary lexical items. However, no similar differences for open quotient were observed for the Japanese native speakers. Furthermore, open quotient-valued voice range profiles reveal that Chinese learners of L2 Japanese transfer their native glottal source cues when they produce filled pauses in Japanese.
Gabrielle Morin, and Benjamin Tucker, “The acoustic characteristics of um and uh in spontaneous Canadian English,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 53-58.

Abstract The present study investigates and compares the acoustic characteristics of uh [ə] and um [əm] in spontaneous speech. The data comes from a corpus of Western Canadian conversational spontaneous speech. Measures of duration, fundamental frequency, F1 and F2 were extracted from 1,048 instances of um and uh. Results indicate that longer durations occurred when markers preceded silent pauses. Um was found to have higher F1 and lower F2 than uh. F0 was overall lower for um in comparison to uh. These results provide a preliminary understanding of um and uh as markers in spontaneous Canadian English. Canadian English shows a similar proportion of um over uh usage in comparison to American and British English. Findings on vowel duration show no significant difference between um and uh. Differences in F0, F1 and F2 provide additional indication of how um and uh are different.
Sieb Nooteboom, and Hugo Quené, “Why are some speech errors detected by self-monitoring “early” and others “late”?,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 11-16.

Abstract In this paper we attempt to answer the question why in self-monitoring some segmental speech errors are detected in internal, some in external speech, and others not at all. This was done by re-analyzing data obtained in two earlier published SLIP experiments. It is hypothesized that detection of errors that are similar to the correct target takes longer than detection of errors that are dissimilar. It is also hypothesized that the time available for error detection in internal speech and for detection at all is limited. Results show that indeed a major factor is the strength of phonetic contrast between two competing response candidates.
Aurélie Pistono, and Robert Hartsuiker, “Word-form related disfluency versus lemma related disfluency: An exploratory analysis of disfluency patterns in connected-speech production,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 95-98.

Abstract Several language production levels may be involved in the production of disfluencies. In the current study, we conducted network task experiments to tackle disfluencies related to conceptualization, which we operationalized by impeding visual object recognition (i.e. blurring). Contrary to what was expected, blurriness did not lead to more disfluency. However, disfluency type and disfluency location were closely related. This suggests a distinction in the underlying function of disfluencies, some reflecting word-form related difficulties, others reflecting lemma related difficulties.
Valeriya Prokaeva, and Elena Riekhakaynen, “Hesitation phenomena in first and second languages: Evidence from reading in Russian as L1 and Japanese as L2,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 89-94.

Abstract The studies of speech disfluencies rarely involve spontaneous reading data. The current study aims at the identification and the comparative analysis of the hesitation phenomena during unprepared reading of texts in the native (Russian) and non-native (Japanese) language. Three groups of disfluencies are differentiated: silent pauses, filled pauses (including lexical fillers, non-lexical fillers, lengthenings, syllable-by-syllable pronunciation and paralinguistic phenomena), and other hesitations (error-related disfluencies, repetitions, self-interruptions and within-word breaks). The results suggest that disfluency is more frequent in non-native reading and is prevalent in the lower Japanese proficiency group, whilst the higher text complexity defined by a text type does not necessarily induce more hesitations. The self-correction phenomena were equally widespread in both L2 proficiency groups, whereas the number of noticed but uncorrected errors was higher in the lower Japanese proficiency group.
Laurent Prévot, Roxane Bertrand, and Stéphane Rauzy, “Investigating disfluencies contribution to discourse-prosody mismatches in French conversations,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 35-40.

Abstract In conversation, discourse and prosodic units association can be articulated through an interesting range of configurations. The situation in which these units are mismatching is the least studied and understood of these configurations. We make the hypothesis in this paper that disfluencies are a major cause for such mismatches. Our quantitative analysis, based on an 8-hour corpus of French conversations manually annotated with disfluencies, discourse units (DU) and prosodic units (PU), confirms that disfluencies do play a major role in PU-DU mismatch but also that other sources should be considered. In the analysis, we also provide some insight about the different types of disfluencies and their frequency in the different DU-PU configurations.
Ralph L. Rose, “Variation in jitter, shimmer, and intensity of filled pauses and their contexts in native and nonnative speech,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 59-64.

Abstract Various acoustic parameters of filled pauses (e.g. uh/um in English, e-(to) in Japanese) have been investigated including duration, pitch, and formants. Less investigated have been jitter, shimmer, and intensity. The present work looks at systematic variation in these properties of filled pauses and their immediate contexts in a crosslinguistic speech corpus. Filled pauses were examined within the five token (word) window centered on the filled pause, exploring variation with respect to first (L1 Japanese) and second language (L2 English) speech as well as L2 proficiency. Results show that relative to the central filled pause, higher jitter and shimmer occur before the filled pause and higher intensity afterward. Proficiency group differences are weak, but suggest that jitter differences are greater in high proficiency speakers and shimmer differences greater in low proficiency speakers. Results vary somewhat from earlier work, but suggest jitter and shimmer may be advance indicators of upcoming disfluency.
Toshiyuki Sadanobu, “Attitudinal correlates of word-internal disfluencies in Japanese communication,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 5-10.

Abstract Through a case observation and a questionnaire survey, this presentation seeks to elucidate the patterns of word-internal disfluency in Japanese communication and determine how speakers implement these patterns. Two conclusions can be drawn: (i) Four possible patterns of word-internal disfluency exist in Japanese communication. Some cases show that disfluency that superficially appears not to be prolonged may come under prolongation. (ii) Some deviations are observed in disfluency patterns in accordance with the speaker’s attitude; all four patterns can be seen to occur in hesitant attitudes, whereas those expressed in the attitude of surprise primarily belong to the “suspending and restarting” pattern. However, where the degree of surprise is low or close to disgust, disfluency is more likely to be expressed as “prolonging and continuing.”
Loredana Schettino, Simon Betz, and Petra Wagner, “Hesitations distribution in Italian discourse,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 29-34.

Abstract The acknowledgment of the functional role of hesitations in speech has increased the research interest in investigating and modeling their occurrence in discourse. This study explores hesitation combinations and distribution in Italian discourse. Though clusters represent less frequent occurrences than standalone hesitations, it is still worth examining their composition, distribution, and context of occurrence for a better understanding of hesitations’ role in discourse. Also, the emerging patterns may provide interesting findings for technological applications, such as integrating hesitations models in conversational agents’ production to improve their communicative efficiency and naturalness.
Vered Silber-Varod, “DiSStory: A computational analysis of 9 editions of Disfluency in Spontaneous Speech workshop,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 3-4.

Abstract What are the most prominent research topics during the past nine DiSS workshops? Do we see any shift over the years? Can we identify the specific terms used in the research of disfluency? At the 10th workshop of DiSS, I will present some answers I have come up with using a data-driven approach on the database of abstracts published in the proceedings of DiSS workshops from 1999 to 2019. In this talk I call the participants to “Trust the text”, as Sinclair (2004) entitled his book, and to join the journey into the DiSS story.
Nette Vandenhouwe, and Robert Hartsuiker, “Speech disfluencies as actual and believed cues to deception: Individuality of liars and the collective of listeners,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 17-22.

Abstract There is no consensus about the relationship between disfluencies and deception in speech production. However, it is well established that listeners believe deceptive speech to contain more disfluencies than truthful speech. Here, we used an interactive game to collect the speech of liars and the veracity decisions of listeners. Using Multivariate Pattern Analysis (MVPA), we determined the predictive value of speech disfluencies as both actual and believed cues to deception. We found that patterns of disfluencies can indeed be used to predict both an utterance’s veracity and a listener’s decision about that veracity better than chance. However, there was much individual variation in how lies altered speech, whereas listeners were consistent in how they thought the speech of others indicates lying.
Simon Williams, “Categorical differences in the false starts of speakers of English as a second language: Further evidence for developmental disfluency,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 83.

Abstract Although much is known about the formal properties of L2 repair in general and error corrections in particular, less is known about other subtypes, here collectively referred to as false starts. Unlike L2 self-corrections, false starts are psycholinguistically more comparable with NS equivalents and are of particular interest as possible sites of learner monitoring and modified output. Consistent with previous research on L2 repairs, this study found that lower-intermediate and advanced L2 speakers produced similar numbers of false starts. Their mapping by speaker proficiency level onto Levelt’s (1989) model of speech production revealed that both groups were concerned with lexical and morphological false start repair but that lower-intermediate speakers produced more syntactic and advanced speakers more conceptual examples.
Yaru Wu, Mathilde Hutin, Ioana Vasilescu, Lori Lamel, Martine Adda-Decker, and Liesbeth Degand, “Fine phonetic details for DM disambiguation: A corpus-based investigation,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 23-28.

Abstract In this study we examine phonetic variation of discourse markers in French, using for this purpose the 4-hour richly annotated LOCAS-F corpus. Both linguistic factors and stylistic variables are considered: speech style, part-of-speech category, mean phone duration and vowel formant distributions with respect to the word status. The results show that the use of discourse markers increases with the degree of spontaneity of the speech. Coordinating conjunctions are the part-of-speech which is most frequently used as discourse markers. Moreover, the mean phone duration tends to be shorter and the vowel space more centralized when words are employed as discourse markers, suggesting that discourse markers undergo hypoarticulation and, more generally, reduction.

2019

Thanaporn Anansiripinyo, and Chutamanee Onsuwan, “Acoustic-phonetic characteristics of Thai filled pauses in monologues,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 51-54. DOI: 10.21862/diss-09-014-anan-onsu. https://doi.org/10.21862/diss-09-014-anan-onsu.

Abstract Filled pause (FP) is one type of disfluent phenomena that is commonly found in everyday speech. It has been widely studied in many languages, but little is known about this topic in Thai. This work explored three important acoustic-phonetic characteristics of Thai filled pauses in monologues. To elicit target monosyllabic tokens of FPs and those of regular word (RW) counterparts, 31 Thai adult females were asked to watch two short cooking videos and describe the contents. They were also asked to read out loud target word lists. Three acoustic measures: syllable duration, first (F1) and second formant (F2) frequencies were taken from 738 tokens. Across vowel contexts, only F2, not F1, in FPs, was significantly different from that in RWs. Differences in syllable duration between RWs versus FPs were near significant. The findings suggest that Thai speakers produced FPs in a presumably different way from RWs. In FPs, the syllable was relatively lengthened and the tongue position was moved towards the center of vowel space. Future directions include a detailed analysis of FPs in terms of amplitude, fundamental frequency, pause duration before/after fillers and other non-linguistic factors.
Maria Bakti, “Error type disfluencies in consecutively interpreted and spontaneous monolingual Hungarian speech,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 71-74. DOI: 10.21862/diss-09-019-bakti. https://doi.org/10.21862/diss-09-019-bakti.

Abstract Interpreting can be considered as a form of spontaneous speech, the key differences being that language change is involved in interpreting and the fact that speech production is influenced by several constraints during interpreting. Research has shown that the interpreting task influences the disfluency patterns of target language texts. The aim of this paper is to investigate how the frequency and distribution of error type disfluencies changes in the target language output of trainee interpreters as they progress in their training. Results indicate that there is no considerable change in the frequency and proportion of error type disfluencies in the target language texts recorded at the end of the second, third and fourth semesters of interpreter training. The proportion of error type disfluencies is higher in the consecutively interpreted texts than in the spontaneous monolingual speech of the students. This suggests that the complexity of the task, rather than progress in training, determines the disfluency pattern of consecutively interpreted target language texts.
Charlotte Bellinghausen, Thomas Fangmeier, Bernhard Schröder, Johanna Keller, Susanne Drechsel, Peter Birkholz, Ludger Tebartz van Elst, and Andreas Riedel, “On the role of disfluent speech for uncertainty in articulatory speech synthesis,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 39-42. DOI: 10.21862/diss-09-011-bell-etal. https://doi.org/10.21862/diss-09-011-bell-etal.

Abstract In this paper we present a perception study on the role of disfluent speech in forms of prosodic cues of uncertainty in question-answering situations. In our scenario the answer to each question was modeled by varying three prosodic cues: pause, intonation, and hesitation. The utterances were generated by means of an articulatory speech synthesizer. Subjects were asked to rate each answer on a Likert scale with respect to uncertainty, naturalness and understandability. Results showed evidence for an additive principle of the prosodic cues, i.e. the more cues were activated the higher the perceived level of uncertainty. Overall, the effect of intonation and hesitation was more evident than the effect of pause.
Simon Betz, and Loulou Kosmala, “Fill the silence! Basics for modeling hesitation,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 11-14. DOI: 10.21862/diss-09-004-betz-kosm. https://doi.org/10.21862/diss-09-004-betz-kosm.

Abstract In order to model hesitations for technical applications such as conversational speech synthesis, it is desirable to understand interactions between individual hesitation markers. In this study, we explore two markers that have been subject to many discussions: silences and fillers. While it is generally acknowledged that fillers occur in two distinct forms, um and uh, it is not agreed on whether these forms systematically influence the length of associated silences. This notion will be investigated on a small dataset of English spontaneous speech data, and the measure of distance between filler and silence will be introduced to the analyses. Results suggest that filler type influences associated silence duration systematically and that silences tend to gravitate towards fillers in utterances, exhibiting systematically lower duration when preceding them. These results provide valuable insights for improving existing hesitation models.
Iulia Grosman, Anne Catherine Simon, and Liesbeth Degand, “Empathetic hearers perceive repetitions as less disfluent, especially in non-broadcast situations,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 23-26. DOI: 10.21862/diss-09-007-gros-etal. https://doi.org/10.21862/diss-09-007-gros-etal.

Abstract This experiment measures the impact of the communicative situation on perceived fluency in French speech. We consider three dimensions of fluency: grammatical, discursive and socio-interpersonal. We first hypothesise that grammatical fluency is less influenced by contextual constraints than the other two dimensions. Furthermore, taking into account the Interpersonal Reactivity Index of each participant, we hypothesise that hearers with higher interpersonal capacities will be more tolerant in their fluency evaluation, because of their ability to project into the speaker’s mind. The strength of the design rests on the proposal to test natural stimuli and integrate social and individual variables in a perception experiment.
Dorottya Gyarmathy, and Viktória Horváth, “Pausing strategies with regard to speech style,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 27-30. DOI: 10.21862/diss-09-008-gyar-horv. https://doi.org/10.21862/diss-09-008-gyar-horv.

Abstract Speech is occasionally interrupted by silent and filled pauses of various length. Pauses have many different functions in spontaneous speech (e.g. breathing, marking syntactic boundaries as well as speech planning difficulties, time for self-repair). The aim of the study was the analysis of the interrelation between the temporal pattern and the syntactical position of silent pauses (SP) on one hand. On the other hand, filled pauses (FP) were also analyzed according to their phonetic realization, as well as the combination of SPs and FPs. The effect of speech style on pausing strategies was also analyzed. A narrative recording and a conversational recording from 10 speakers (ages between 20 and 35 years, 5 male, 5 female) were selected from Hungarian Spontaneous Speech Database for the study. The material was manually annotated, silent pauses were categorized, then the durations of pauses were extracted. Results showed that the position of silent and filled pauses affects their duration. The speech style did not influence the frequency of pauses. However, silent and filled pauses were longer in narratives than in conversations. Results suggest that pausing strategies are similar in general; however, the timing patterns of pauses may depend on various factors, e.g. speech style.
Mária Gósy, “Halt command in word retrieval,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 3-6. DOI: 10.21862/diss-09-002-gosy. https://doi.org/10.21862/diss-09-002-gosy.

Abstract In this study, occurrences and temporal patterns of five types of disfluencies were analyzed that show a common feature on the surface. All of them have some kind of interruption of content words followed by some continuation. The purpose was to show whether the place of interruption of the word articulation and the durational patterns of the editing phases are characteristic of re-starts, false starts, slips of the tongue, pauses within words, and prolongations. More than 1,400 instances were processed. Both (i) the number of pronounced segments of abandoned words and the duration of the corresponding editing phases are characteristic of a specific disfluency type, and (ii) speakers select a strategy to overcome their speech planning difficulties most economically.
Julianna Jankovics, and Luca Garai, “Disfluencies in mildly intellectually disabled young adults’ spontaneous speech,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 79-82. DOI: 10.21862/diss-09-021-jank-gara. https://doi.org/10.21862/diss-09-021-jank-gara.

Abstract The study analyzes various hesitations and repairs in the spontaneous speech of mildly intellectually disabled women. The main research questions of the study focus on the similarities and differences in the frequency of disfluencies and the duration of pauses between the spontaneous speech of mildly intellectually disabled and mentally healthy young adults. Our results show that hesitation phenomena were more frequent among intellectually disabled subjects in spontaneous speech, while repairs occurred more frequently among control subjects in guided spontaneous speech.
Borbála Keszler, and Judit Bóna, “Pausing and disfluencies in elderly speech: Longitudinal case studies,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 67-70. DOI: 10.21862/diss-09-018-kesz-bona. https://doi.org/10.21862/diss-09-018-kesz-bona.

Abstract The aim of this paper was to investigate the changes in fluency of speech during ageing. The novelty of the examination is that this is a longitudinal study: it analyses the speech of 7 speakers from middle or young-old age to old-old age. Pausing strategies and frequency of disfluencies were analyzed. Results show that active aging helps to preserve certain parameters of speech characteristics of young speakers.
Valéria Krepsz, “Vowel lengthening — Effect of position, age, and phonological quantity,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 59-62. DOI: 10.21862/diss-09-016-krepsz. https://doi.org/10.21862/diss-09-016-krepsz.

Abstract The present research examined the effect of phrase-final lengthening on the spectral structure of vowels in the spontaneous speech of children and adults. Three Hungarian vowel pairs (in quantity pairs) were analyzed in two positions: in the middle of the phrase and at the end of the phrase. The effect of lengthening on the spectral structure of the vowels could already be detected in four-year-olds. However, its extent was strongly correlated with the articulation aspects of the vowels. There was a discrepancy in the tendencies of the lengthening’s effect between the two groups of children and the adults, presumably due to different linguistic experience, inaccuracy of articulation, and significant individual differences.
Mária Laczkó, “Temporal characteristics of teenagers’ spontaneous speech and topic based narratives produced during school lessons,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 63-66. DOI: 10.21862/diss-09-017-laczko. https://doi.org/10.21862/diss-09-017-laczko.

Abstract The aim of this presentation is to analyse the articulation and speech rates of teenagers and the types of pauses in their spontaneous speech and topic based narratives during school lessons. The speech samples were analysed in terms of temporal characteristics using the Praat program. The results showed the different tempo values and various functions of filled pauses in the examined situations.
Kikuo Maekawa, “Five pieces of evidence suggesting large lookahead in spontaneous monologue,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 7-10. DOI: 10.21862/diss-09-003-maekawa. https://doi.org/10.21862/diss-09-003-maekawa.

Abstract There is considerable disagreement among the researchers of speech production with respect to the range of lookahead or pre-planning. In this paper, five pieces of evidence suggesting the presence of relatively large lookahead in spontaneous monologues are presented, based on the analyses of the Corpus of Spontaneous Japanese. This evidence consistently suggests that the range of a lookahead is six to seven accentual phrases long, which corresponds on average to 3–4 seconds in the time domain.
Helena Moniz, “Processing disfluencies in distinct speaking styles: Idiosyncrasies and transversality,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 1-2. DOI: 10.21862/diss-09-001-moniz. https://doi.org/10.21862/diss-09-001-moniz.

Abstract This talk will tackle the idiosyncratic properties of disfluencies in distinct speaking styles, mostly university lectures (Trancoso et al., 2008) and map-task dialogues (Trancoso et al., 1998), but also featuring verbal fluency tests, and (more recently) second language learning presentations in ecological settings. It will also discuss the transversal acoustic-prosodic properties pertained across speaking styles. The main research questions are twofold: i) are there domain effects in the production of disfluencies when speakers adjust to distinct communicative contexts, as in university lectures and dialogues?; ii) if domain effects do exist, are there still acoustic-prosodic properties that can be shared across domains?
Johanna Pap, “Effects of speech rate changes on pausing and disfluencies in cluttering,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 75-78. DOI: 10.21862/diss-09-020-pap. https://doi.org/10.21862/diss-09-020-pap.

Abstract People with cluttering (PWC) often receive feedback, such as “Slow down!”. Even so, this fluency disorder cannot be cured by only slowing down the speakers’ speech rate. When PWC accelerate their speech rate, language planning difficulties and word structure errors might occur, which might result in breakdowns in fluency and/or intelligibility. In the present paper, characteristics of the frequency of disfluencies were examined in four different speech tasks, from deliberately slow to maximum speech rate, to determine whether speech rate changes have effects on cluttered speech. Twenty participants of this investigation were individuals suspected of cluttering with ages between 20 and 50 years of both genders. The results show that PWC are able to change not only their speech rate but their articulatory rate as well. Moreover, disfluencies were produced the most frequently in the speech task of maximum speech rate, where PWC do not have enough time for speech planning. The research provides empirical, measured data for a better insight into the nature of cluttering. Understanding the correlation between speech rate and disfluencies in cluttered speech is fundamental to improve the diagnosis of cluttering.
Kata Baditzné Pálvölgyi, “Hesitation patterns in the Spanish spontaneous speech of Hungarian learners of Spanish,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 35-38. DOI: 10.21862/diss-09-010-badi. https://doi.org/10.21862/diss-09-010-badi.

Abstract This paper examines what native Spanish speakers find most disturbing in the pronunciation of Hungarian language learners of Spanish. Former research (Baditzné Pálvölgyi, 2019) showed that in spontaneous Spanish speech of at least threshold level Hungarian learners, one of the aspects that Spanish native speakers least tolerated was the way Hungarians hesitated. So the present paper focuses primarily on hesitation phenomena—lengthening and filled pauses—assuming that Hungarians hesitate more, and the lengthened segments are longer than the Spanish ones. In order to validate the hypothesis, an investigation comparing a corpus of Northern Spanish spontaneous speech to another corpus of advanced Hungarian learners of Spanish was conducted.
Ralph L. Rose, “The structural signaling effect of silent and filled pauses,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 19-22. DOI: 10.21862/diss-09-006-rose. https://doi.org/10.21862/diss-09-006-rose.

Abstract Filled pauses (uh, um) have been shown in a number of studies to have a facilitative effect for listeners, such as helping them better perceive the syntactic structure of ongoing speech. This may be because the extra time afforded by the filled pause gives listeners more time to process the input. Theoretically, then, silent pauses should show a comparable effect. The present study tests this prediction using a grammaticality judgment task following a study by Bailey and Ferreira (2003). Results show that filled and silent pauses have a comparable influence on listeners’ grammaticality judgments but further suggest that listeners deem silent pauses as more important and influential markers.
Vered Silber-Varod, Mária Gósy, and Robert Eklund, “Segment prolongation in Hebrew,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 47-50. DOI: 10.21862/diss-09-013-silb-etal. https://doi.org/10.21862/diss-09-013-silb-etal.

Abstract In this paper we study segment prolongations (PRs), a type of disfluency sometimes included under the term “hesitation disfluencies”, in Hebrew. PRs have previously been studied in a number of other languages within a comprehensive speech disfluency framework, which is applied to Hebrew in the current study. For the purpose of this study we defined Hebrew clitics, such as conjunctions, articles, prepositions and so on, as words. The most striking difference between Hebrew and the previously studied languages is how restricted PRs seem to be in Hebrew, occurring almost exclusively on word-final vowels. The most frequently prolonged vowel is [e]. The segment type does not affect PRs’ duration. We found significant differences between men and women regarding the frequency of PRs.
Shungo Suzuki, and Judit Kormos, “The effects of read-aloud assistance on second language oral fluency in text summary speech,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 31-34. DOI: 10.21862/diss-09-009-suzu-korm. https://doi.org/10.21862/diss-09-009-suzu-korm.

Abstract Focusing on text summary speaking tasks, the present study investigated the effects of the activation of phonological representations during text comprehension (operationalized by read-aloud assistance) on the subsequent retelling speech. A total of 24 Japanese learners of English completed text summary speaking tasks under two conditions: (a) reading without read-aloud assistance and (b) reading with read-aloud assistance. Their speech data were analyzed by lexical overlap indices (i.e. the ratio of characteristic single-words and multiword sequences) and by fluency measures capturing three major dimensions of fluency—speed, breakdown, and repair fluency. The results showed that read-aloud assistance directly facilitated lexical overlaps with source texts and indirectly improved speed and repair fluency. Furthermore, read-aloud assistance was found to affect the interrelationship between lexical overlaps and utterance fluency. The findings suggested that read-aloud assistance might help second language learners to store multiword sequences as a single unit (i.e. chunking) during text comprehension.
Linda Taschenberger, Outi Tuomainen, and Valerie Hazan, “Disfluencies in spontaneous speech in easy and adverse communicative situations: The effect of age,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 55-58. DOI: 10.21862/diss-09-015-tasc-etal. https://doi.org/10.21862/diss-09-015-tasc-etal.

Abstract Disfluencies are a pervasive feature of speech communication. Their function in communication is still widely discussed with some proposing that their usage might aid understanding. Accordingly, talkers may produce more disfluencies when conversing in adverse communicative situations, e.g. in background noise. Moreover, increasing age may have an effect on disfluency use as older adults report particular difficulties when communicating in adverse conditions. In this study, we elicited spontaneous speech via a problem-solving task from four different age groups (19–76 years old) to investigate the effect of energetic and informational maskers on the use of filled pauses (FPs), and its interaction with age. Measures of disfluency rates, effort ratings, and communication efficiency were obtained. Results show that, against our predictions, FP usage may decrease in adverse conditions. Moreover, age does not play a great role in adults with normal hearing. The results indicate that individuals differ greatly in their disfluency adaptations, utilising different strategies to overcome challenging communicative situations.
Michiko Watanabe, Yusaku Korematsu, and Yuma Shirahata, ““Uh” is preferred by male speakers in informal presentations in American English,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 43-46. DOI: 10.21862/diss-09-012-wata-etal. https://doi.org/10.21862/diss-09-012-wata-etal.

Abstract This study investigates factors that are likely to be related to speakers' choice of filler type between uh and um in English, using an informal presentation speech corpus. The effects of the following factors on the probability of each filler type were examined: (1) immediately preceding clause boundary depth, (2) clause size measured as the number of words in the clause, (3) the number of quotation remarks in the clause, and (4) speaker's sex. The filler probabilities increased with the boundary depths. This trend was much stronger with um than with uh. Ums are more likely to appear clause-initially than uhs. Clause size had similar effect sizes on the two filler types. The number of quotation remarks had a stronger negative effect with ums. Speaker's sex had a significant effect only with uhs. Uhs are used more frequently by male speakers than by female speakers. The results indicate that speakers' choice of filler type is affected by the combination of multiple factors with various effect sizes.
Hong Zhang, “Variation in the choice of filled pause: A language change, or a variation in meaning?,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 15-18. DOI: 10.21862/diss-09-005-zhang. https://doi.org/10.21862/diss-09-005-zhang.

Abstract The role of filled pauses in message structuring is a heavily debated question, but the result is still somewhat inconclusive. In this study, I consider this question jointly with sociolinguistic factors that have been thought to affect the choice of filled pause in American English. The results suggest that the use of uh is subject to higher variability across not only age groups, but also conversation topics and interlocutors. A latent semantic analysis found consistent difference between two forms of filled pause and silent pauses of varying duration in the primary latent dimension, but similarity between short silent pause and uh, as well as long silent pause and um in the second dimension. Therefore, the functional difference between um and uh should be acknowledged, and the observed change in their relative popularity is potentially related to their different meaning or function in the discourse.

2017

Jens Allwood, “Fluency or disfluency?,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 1-4. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract In this paper, I investigate the concepts of “fluency” and “disfluency” and argue that the application of the two concepts must be relativized to type of communicative activity. It is not clear that there is a generic sense of fluency or disfluency, rather what contributes to fluency and disfluency depends on what type of communication we are dealing with. The paper then turns to a brief investigation of what makes interactive face-to-face communication fluent or disfluent and argues that many of the features that have been labeled as disfluent, in fact, contribute to the fluency of interactive communication. Finally, I suggest that maybe it is time for a change of terminology and abandon the term “disfluent” for more positive or neutral terminology.

Keywords DiSS
Malte Belz, “Glottal filled pauses in German,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 5-8. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract For German, filled pauses are traditionally described with a vocalic form äh and a vocalic-nasal form ähm. A corpus-based approach and a closer phonetic inspection is used here to argue for an additional form, namely glottal filled pauses. In the data analysed for this study, the glottal form is produced by all seven speakers and amounts to 21% of all filled pauses. Contexts and durations of occurrences are discussed and compared to earlier studies on traditional filled pauses. It is suggested that the glottal variant should be considered in future studies on filled pauses and disfluencies.

Keywords DiSS
Axel Bergström, Martin Johansson, and Robert Eklund, “Differences in production of disfluencies in children with typical language development and children with mixed receptive-expressive language disorder,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 9-12. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract There are several studies about non-fluency in people who stutter, but comparatively few regarding children with language impairment. The current research body regarding disfluencies in children with language impairment has been using different study-designs and definitions, making some results rather contradictory. The purpose of the present study is to expand the knowledge about disfluencies in children with language impairment and compare the occurrence of disfluencies between children with language impairment and children with typical language development in the same age group. A total of ten children with language impairment and six children with typical language development participated in this study. The subjects were recorded when talking freely about a thematic picture or toys and then analysed by calculating disfluencies per 50 words including frequency of different kinds of disfluencies according to Johnson and Associates’ (1959) classic taxonomy. Our results show that children with language impairment do produce statistically significant more disfluency in general, notably sound and syllable repetition, broken words and prolongations.

Keywords DiSS
Simon Betz, Robert Eklund, and Petra Wagner, “Prolongation in German,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 13-16. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract We investigate segment prolongation as a means of disfluent hesitation in spontaneous German speech. We describe phonetic and structural features of disfluent prolongation and compare it to data of other languages and to non-disfluent prolongations.

Keywords DiSS
Jillian Donahue, Christine Schoepfer, and Robin Lickley, “The effects of disfluent repetitions and speech rate on recall accuracy in a discourse listening task,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 17-20. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract disfluency on word recognition and local syntactic or semantic issues, fewer have addressed the impact on comprehension at a discourse level. In this work, we ask what effects features typical in the pathological condition of cluttering (essentially, rapid, disfluent and unintelligible speech) have on our ability to retain the information conveyed in speech. Specifically, we manipulate repetition disfluencies and speech rate in passages of running speech. Forty participants listened to four recordings of passages presented in four conditions: Control, Rapid, Disfluent, Rapid + Disfluent. They were asked to recall details of the passages and rate their speed, fluency and comprehensibility. Both repetition disfluencies and increased speech rate significantly reduced recall of information from discourse. Though no relationship was found between the working memory span of individuals and information recall, we argue that the cognitive load of these features of cluttered speech significantly affects intelligibility and thus recall of speech.

Keywords DiSS
Megan Drevets, and Robin Lickley, “A psycholinguistic exploration of disfluency behaviour during the tip-of-the-tongue phenomenon,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 21-24. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract A tip-of-the-tongue state (TOT) occurs when a speaker knows a word but cannot retrieve its phonological form from memory. While previous studies have found that disfluencies are related to lexical retrieval difficulties, the literature lacks studies which have specifically investigated the impact of TOTs on disfluency. This study explores the relationship between TOTs and such disfluency behaviours as hesitations and target approximations (i.e. incorrect attempts to produce targets). TOTs were induced using the TOTimal method (Smith, Brown & Balfour, 1991), where participants memorised and retrieved the names of imaginary animals. Speech samples were analysed for TOTs and disfluencies. Disfluency rates increased with retrieval times during resolved TOTs. Additionally, target approximation rates correlated with the rates of both TOTs and “Don’t Know” responses, suggesting that target approximations are not unique to TOTs but are indicative of general uncertainty during lexical retrieval.

Keywords DiSS
Emer Gilmartin, Carl Vogel, and Nick Campbell, “Disfluency in chat and chunk phases of multiparty casual talk,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 25-28. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract Multiparty casual conversation lasting more than a few minutes can be viewed as a series of phases of chat and chunk type interaction, where chat is interactive conversation with several participants taking turns, and chunk refers to phases where one participant dominates the conversation, often by telling a story or giving an opinion. We investigate the distribution of disfluency in these phases in a 70-minute 5-party conversation where participants had no practical task to perform. This pilot study shows differences in the distribution of disfluency types and frequency in the two phases.

Keywords DiSS
Mária Gósy, and Robert Eklund, “Segment prolongation in Hungarian,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 29-32. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract Segment prolongation (PR) has been shown to be one of the most common forms of non-pathological speech disfluencies (Eklund, 2001). The distribution of PRs in the word (initial–medial–final segment) seems to vary between languages of different syllable-structure complexity, making it interesting to study segment prolongation in languages that exhibit different syllable structure characteristics. Previous studies have examined languages with complex syllable structure, such as English and Swedish (Eklund & Shriberg, 1998; Eklund, 2001, 2004) where affixation creates complex consonant clusters, and languages with very simple syllable structure, such as Japanese (Den, 2003) or Tok Pisin (Eklund, 2001, 2004), as well as Mandarin Chinese (Lee et al., 2004). In this paper we study PRs in Hungarian. Our results indicate that PRs in Hungarian are more similar to those in English and Swedish than to those in Japanese, Tok Pisin or Mandarin Chinese, which lends support to the notion that underlying morphology plays a role in how PRs are realised.

Keywords DiSS
Peter Howell, Kaho Yoshikawa, Kevin Tang, John Harris, and Clarissa Sorger, “Intervention for word-finding difficulty for children starting school who have diverse language backgrounds,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 33-36. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract Children who have word-finding difficulty can be identified by the pattern of disfluencies in their spontaneous speech; in particular whole-word repetition of prior words often occurs when they cannot retrieve the subsequent word. Work is reviewed that shows whole-word repetitions can be used to identify children from diverse language backgrounds who have word-finding difficulty. The symptom-based identification procedure was validated using a non-word repetition task. Children who were identified as having word-finding difficulty were given phonological training that taught them features of English that they lacked (this depended on their language background). Then they received semantic training. In the cases of children whose first language was not English, the children were primed to use English and then presented with material where there was interference in meanings across the languages (English names had to be produced). It was found that this training improved a range of outcome measures related to education.

Keywords DiSS
Loulou Kosmala, and Aliyah Morgenstern, “A preliminary study of hesitation phenomena in L1 and L2 productions: a multimodal approach,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 37-40. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract This paper presents a preliminary study of vocal hesitations in L1 and L2 productions using a multimodal perspective. It investigates the use of vocal hesitations of French learners of English interacting in tandem with American speakers in semi-spontaneous speech. Several hesitation markers were analyzed (filled pauses, unfilled pauses, prolongations and non-lexical sounds) based on formal and functional features as well as their relation to gesture. Results do not show great differences in the frequency of vocal hesitations between L1 and L2 productions overall; however, we find differences in duration and combination complexity. Our study indicated that vocal hesitations mainly served planning functions and were very often accompanied with gaze aversion both in L1 and L2 productions. Moreover, speakers did not tend to gesture while hesitating. We conclude that hesitations mainly served planning strategies both in L1 and L2 speech, but with some differences in duration and complexity.

Keywords DiSS
Kikuo Maekawa, Ken’ya Nishikawa, and Shu-Chuan Tseng, “Phonetic characteristics of filled pauses: a preliminary comparison between Japanese and Chinese,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 41-44. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract Filled pauses in spontaneous Chinese and Japanese were analyzed to examine if there is a systematic phonetic difference between the vowels of filled pauses and those occurring in ordinary lexical items. Also, the effect of the category of filled pauses (simple vocalic fillers versus fillers derived from demonstratives) was examined in both languages. Random forests analysis revealed that it was possible to construct automatic classifiers that achieved F-measure values of .7-.9. It turned out also that, in both languages, vowels in simple vocalic filled pauses showed higher F-values than the filled pauses derived from demonstratives. Lastly, it turned out that acoustic features distinguishing filled pauses from ordinary lexical items differ depending on both the category of filled pauses and languages.

Keywords DiSS
Sieb Nooteboom, and Hugo Quené, “The time course of self-monitoring within words and utterances,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 45-48. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract The within-word and within-utterance time course of internal and external self-monitoring is investigated in a four-word tongue twister experiment eliciting interactional word initial and word medial segmental errors and their repairs. It is found that detection rate for both internal and external self-monitoring decreases from early to late both within words and within utterances. Also, offset-to-repair times are more often of 0 ms in initial than in medial consonants.

Keywords DiSS
Ralph Rose, “Silent and filled pauses and speech planning in first and second language production,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 49-52. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract The present study looks at the relative association of silent and filled pauses to problems in discourse and syntactic planning via utterance and clause boundary phenomena, respectively, by focusing on crosslinguistic data. The occurrence of boundary pauses in a crosslinguistic corpus of speech suggests that silent pauses are more closely related to both discourse and syntactic planning than filled pauses, but more strongly so for discourse planning. These results were consistent across both first and second language production. However, clause boundary silent pauses in first language speech were more atypical (i.e., longer than average) than those in second language speech. This difference may be due to complexity differences in the first and second language speech samples.

Keywords DiSS
Vered Silber-Varod, and Anat Lerner, “Analysis of silences in unbalanced dialogues: the effect of genre and role,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 53-57. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract This study examines the diversity of silences in unbalanced dialogues, i.e. dialogues between speakers with different participation levels: responder and reporter. We examined two genres: therapeutic sessions and private dialogues that are based on this responder-reporter structure. When looking at silences versus speech ratios, we found no differences between the genres nor between the roles. However, when grouping the silences by their types, namely pauses (intra-speaker silences), gaps (inter-speaker silences) and silences that occur in the vicinity of speech overlaps, we found that the silence duration of pauses is role dependent in both genres, while the silence duration of gaps was found to be genre dependent, but not role dependent. Moreover, speech rate was not found to be genre dependent. It seems that although silences in unbalanced dialogues vary considerably, genre and speaker’s role are influential.

Keywords DiSS

2015

Malte Belz, and Uwe Reichel, “Pitch Characteristics of Filled Pauses,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract We investigate the pitch characteristics of filled pauses in order to distinguish between hesitational and floor-holding functions of filled pauses. A corpus of spontaneous dialogues is explored using a parametric bottom-up approach to extract intonation contours. We find that subjects tend to utter filled pauses more prominently when they cannot see each other, which indicates an increased floor-holding usage of filled pauses in this condition.

Keywords DiSS, disfluencies, filled pauses, intonation, floor-holding
Hans Rutger Bosker, Jade Tjiong, Hugo Quené, Ted Sanders, and Nivja de Jong, “Both native and non-native disfluencies trigger listeners’ attention,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Disfluencies, such as uh and uhm, are known to help the listener in speech comprehension. For instance, disfluencies may elicit prediction of less accessible referents and may trigger listeners’ attention to the following word. However, recent work suggests differential processing of disfluencies in native and non-native speech. The current study investigated whether the beneficial effects of disfluencies on listeners’ attention are modulated by the (non-)native identity of the speaker. Using the Change Detection Paradigm, we investigated listeners’ recall accuracy for words presented in disfluent and fluent contexts, in native and non-native speech. We observed beneficial effects of both native and non-native disfluencies on listeners’ recall accuracy, suggesting that native and non-native disfluencies trigger listeners’ attention in a similar fashion.

Keywords DiSS, disfluencies, attention, non-native speech, Change Detection Paradigm
Rasmus Dall, Mirjam Wester, and Martin Corley, “Disfluencies in change detection in natural, vocoded and synthetic speech,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract In this paper, we investigate the effect of filled pauses, a discourse marker and silent pauses in a change detection experiment in natural, vocoded and synthetic speech. In natural speech change detection has been found to increase in the presence of filled pauses; we extend this work by replicating earlier findings and exploring the effect of a discourse marker, like, and silent pauses. Furthermore we report how the use of "unnatural" speech, namely synthetic and vocoded, affects change detection rates. It was found that the filled pauses, the discourse marker and silent pauses all increase change detection rates in natural speech; however, in neither synthetic nor vocoded speech did this effect appear. Rather, change detection rates decreased in both types of "unnatural" speech compared to natural speech. The natural results suggest that while each type of pause increases detection rates, the type of pause may have a further effect. The "unnatural" results suggest that it is not the full pipeline of synthetic speech that causes the degradation, but rather that something in the pre-processing, i.e. vocoding, of the speech database limits the resulting synthesis.

Keywords DiSS, change detection, filled pauses, speech synthesis
Stephanie Don, and Robin Lickley, “Uh I forgot what I was going to say: How memory affects fluency,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Disfluency rates vary considerably between individuals. Previous studies have considered gender, age and conversational roles amongst other factors that may affect fluency. Testing a nonclinical, gender-balanced population of young adults performing the same speaking tasks, this study explores how inter-speaker variations in working memory and in long-term (lexical) memory affect disfluency in two different ways. Working memory was tested by a forward digit span test; long-term lexical memory was tested by the Verbal Fluency Test, both semantic and phonological versions. In addition, each participant provided 3 one-minute samples of monologue speech. The speech samples were analysed for disfluencies. Speakers with lower working memory scores produced more error repairs in running speech. Speakers with lower lexical access scores produced a higher rate of hesitations. The two types of memory affected fluency in different ways.

Keywords DiSS, hesitation, error repair, working memory, long term lexical memory
Robert Eklund, Peter Fransson, and Martin Ingvar, “Neural correlates of the processing of unfilled and filled pauses,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Spontaneously produced Unfilled Pauses (UPs) and Filled Pauses (FPs) were played to subjects in an fMRI experiment. While both stimuli resulted in increased activity in the Primary Auditory Cortex, FPs, unlike UPs, also elicited modulation in the Supplementary Motor Area, Brodmann Area 6. This observation provides neurocognitive confirmation of the oft-reported difference between FPs and other kinds of speech disfluency and also could provide a partial explanation for the previously reported beneficial effect of FPs on reaction times in speech perception. The results are discussed in the light of the suggested role of FPs as floor-holding devices in human polylogs.

Keywords DiSS, speech disfluency, filled pauses, unfilled pauses, speech perception, spontaneous speech, fMRI, Auditory Cortex, PAC, Supplementary Motor Area, SMA, Brodmann Area 6, BA6
Lorenzo García-Amaya, “A longitudinal study of filled pauses and silent pauses in second language speech,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract This study provides a longitudinal analysis of speech rate and the use of filled pauses (FPs) and unfilled or silent pauses (SPs) in the oral production of L2 learners of Spanish in two learning contexts: a 6-week intensive overseas immersion program (OIM), and a 15-week US-based ‘at-home’ foreign language classroom (AH). Fifty-six native speakers of English performed two video-retell tasks at three different time points. A total of five measurements of oral production were calculated. The results show a significant increase in rate of speech over time in the OIM group compared to the AH group. Additionally, the OIM learners show greater use of “disfluencies” over time, namely FPs and short SPs. We suggest that OIM learners increase their use of hesitation phenomena over time as a speech processing and planning strategy and discuss this finding within the framework of L2 cognitive fluency.

Keywords DiSS, second language fluency, disfluencies, rate of speech, filled pauses, silent pauses, study abroad, Spanish
Emer Gilmartin, Carl Vogel, and Nick Campbell, “Disfluency in multiparty social talk,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Much research on disfluencies in spontaneous spoken interaction has been carried out on corpora of task-based conversations, resulting in greater understanding of the role of several phenomena. Modern multimodal corpora allow the full spectrum of signals in face to face communication to be analysed. However, the ‘unmarked’ case of casual conversation or social talk with no obvious short-term instrumental goal has been less studied in this manner. Corpus-based work on social talk tends to deal with short dyadic interactions, although the norm for social conversation is for longer multiparty interaction. In this paper, we outline our programme of exploratory studies of disfluency in a longer multiparty conversation. We briefly describe the background to our research goals, and then report on the collection, transcription, and annotation of the data for our experiments. We present and discuss some of our early results.

Keywords DiSS, disfluency, hesitation, repair, casual conversation, spoken interaction
Iulia Grosman, “Complexity cues or attention triggers? Repetitions and editing terms for native speakers of French,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract A growing stream of research shows evidence of the metalinguistic information that disfluencies (silent and filled pauses, repetitions, false-starts, repairs, etc.) can display to listeners. As a result, disfluencies may work as fluent devices. By means of a decision task latencies, this study investigates whether lexical repetition co-occurring with an editing term affects the perception of native speakers of French. There is a lack of consensus in the literature: do disfluencies trigger conceptual priming of complex entity or act simply as attention cues? Results from multiple analysis of variance and linear mixed-effect modelling show that the presence of a disfluency triggers a faster response from the participant, however complex the following noun-phrase might be, supporting the hypothesis that repetition and co-occurring editing terms act as cognitive signposts rather than as cues of complexity of an upcoming event.

Keywords DiSS, disfluencies, reaction time, perception, prosody, repetitions, French
Sandra Götz, “Fluency in ENL, ESL and EFL: A corpus-based approach,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Against the background of a ‘cline model’ of increasing fluency/decreasing disfluency from ENL to ESL to EFL forms of English, the present pilot study investigates (dis)fluency features in British English, Sri Lankan English and German Learner English. The analysis of selected variables of temporal fluency (viz. unfilled pauses, mean length of runs) and fluency-enhancement strategies (viz. discourse markers, smallwords and repeats) is based on the c. 40,000-word subcorpora of the British and the Sri Lankan components of the International Corpus of English (ICE-GB and ICE-SL) and the c. 80,000-word German component of the Louvain International Database of Spoken English Interlanguage (LINDSEI-GE). The study reveals that, while the EFL variant shows the lowest degree of temporal fluency (e.g. the highest number of unfilled pauses), the findings are mixed for ESL and ENL (e.g. the ESL speakers show a lower number of unfilled pauses, but the ENL speakers show a higher number of smallwords). Also, variant-specific preferences of using certain fluency-enhancement strategies become clearly visible.

Keywords DiSS, ENL vs. ESL vs. EFL, fluency, corpus-based (dis)fluency, fluency profiles
Zara Harmon, and Vsevolod Kapatsinski, “Studying the dynamics of lexical access using disfluencies,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Faced with planning problems related to lexical access, speakers take advantage of a major function of disfluencies: buying time. It is reasonable, then, to expect that the structure of disfluencies sheds light on the mechanisms underlying lexical access. Using data from the Switchboard Corpus, we investigated the effect of semantic competition during lexical access on repetition disfluencies. We hypothesized that the more time the speaker needs to access the following unit, the longer the repetition. We examined the repetitions preceding verbs and nouns and tested predictors influencing the accessibility of these items. Results suggest that speed of lexical access negatively correlates with the length of repetition and that the main determinants of lexical access speed differ for verbs and nouns. Longer disfluencies before verbs appear to be due to significant paradigmatic competition from semantically similar verbs. For nouns, they occur when the noun is relatively unpredictable given the preceding context.

Keywords DiSS, repetition, lexical access, semantic competition, sentence planning, lexicalization
Clara Hedenqvist, Frida Persson, and Robert Eklund, “Disfluency incidence in 6-year old Swedish boys and girls with typical language development,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract This paper reports the prevalence of disfluencies in a group of 55 (25F/30M) Swedish children with typical speech development, and within the age range 6;0 and 6;11. All children had Swedish as their mother tongue. Speech was elicited using an “event picture” which the children described in their own, spontaneously produced, words. The data were analysed with regard to sex differences and lexical ability, including size of vocabulary and word retrieval, which was assessed using the two tests Peabody Picture Vocabulary Test and Ordracet. Results showed that girls produced significantly more unfilled pauses, prolongations and sound repetitions, while boys produced more word repetitions. However, no correlation with lexical development was found. The results are of interest to speech pathologists who study early speech development in search for potential early predictors of speech pathologies.

Keywords DiSS, speech disfluency, children, lexical development, sex differences
Julian Hough, Laura de Ruiter, Simon Betz, and David Schlangen, “Disfluency and laughter annotation agreement in a light-weight dialogue mark-up protocol,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Despite a great deal of research effort, disfluency and laughter annotation is still an unsolved problem, both in terms of consensus for a generally applicable system, and in terms of annotation agreement metrics. In this paper we present a new annotation scheme within a light-weight mark-up for spontaneous speech. We show that, despite the low overhead required for understanding the annotation protocol, it allows for good inter-annotator agreement and can be used to map onto existing disfluency categorization, with no loss of information.

Keywords DiSS, disfluency annotation, laughter, German corpora, inter-annotator agreement, spontaneous speech
Peter Howell, “Intervention for children with word-finding difficulty: Impact on fluency during spontaneous speech for children using English as their native or as an additional language,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Types of intervention that could be targeted when there are high rates of word-finding difficulty were examined for any impact they had on speech fluency (whole-word repetition rate in particular). Results are reported that are interpreted as showing that a semantic-based intervention has an impact on fluency as well as word-finding.

Keywords DiSS, EAL, word-finding, stuttering, intervention
Hanae Koiso, and Yasuharu Den, “Causal analysis of acoustic and linguistic factors related to speech planning in Japanese monologs,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract In this paper, we applied a general method of testing path models, investigating causal relationship between cognitive load in speech planning and four types of disfluencies in Japanese monologs. The four disfluencies examined were i) clause-initial fillers, ii) inter-clausal pauses, iii) clause-final lengthening, and iv) boundary pitch movements, which occurred at weak clause boundaries. The length of the constituents following weak clause boundaries was assumed to be a measure of the complexity affecting the cognitive load. By using a model selection technique based on the AIC, we found an optimal model with the smallest AIC, in which the constituent complexity had direct effects on all of the four disfluency variables. In addition, some of the disfluencies influenced one another; clause-final lengthening was enhanced by the presence of a boundary pitch movement and the occurrence of clause-initial fillers was affected by all the other three disfluency variables.

Keywords DiSS, path models, fillers, pauses, clause-final lengthening, boundary pitch movements
Kikuo Maekawa, and Hiroki Mori, “Voice quality analysis of Japanese filled pauses: a preliminary report,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Using the Core of the Corpus of Spontaneous Japanese, acoustic analysis of F1, spectral tilt (TL), H1-H2, jitter and F0 was conducted to examine the voice-quality difference between the vowels in filled pauses and those in ordinary lexical items. It turned out by simple SVM analysis that the two classes of vowels could be discriminated with the mean accuracy of higher than 70%.

Keywords DiSS
Helena Moniz, Jaime Ferreira, Fernando Batista, and Isabel Trancoso, “Disfluency detection across domains,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract This paper focuses on disfluency detection across distinct domains using a large set of openSMILE features, derived from the Interspeech 2013 Paralinguistic challenge. Amongst different machine learning methods being applied, SVMs achieved the best performance. Feature selection experiments revealed that the dimensionality of the larger set of features can be further reduced at the cost of a small degradation. Different models trained with one corpus were tested on the other corpus, revealing that models can be quite robust across corpora for this task, despite their distinct nature. We have conducted additional experiments aiming at disfluency prediction in the context of IVR systems, and results reveal that there is no substantial degradation on the performance, encouraging the use of the models in IVR domains.

Keywords DiSS, disfluency detection, acoustic-prosodic features, cross-domain analysis, European Portuguese
Ralph Rose, “Um and uh as differential delay markers: the role of contextual factors,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract The English filled pauses uh and um have been argued to correspond respectively to shorter and longer anticipated delays in speech production. This study looks at some contextual factors that might cause this difference by investigating filled pause instances in monologue and conversation speech corpora. Results are consistent with previously observed delay differences and further show that discourse-level processing may influence differential delay marking though monologue results are more conclusive than conversation results. However, no evidence was found that lexical factors (word type, frequency) correlate with filled pause choice. The findings suggest a limited view of how speakers use filled pauses as delay markers: Not all contextual factors may trigger differential delay marking.

Keywords DiSS, filled pause, delay, contextual factors
Vered Silber-Varod, Adva Weiss, and Noam Amir, “Can you hear these mid-front vowels? Formants analysis of hesitation disfluencies in spontaneous Hebrew,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract This study attempts to characterize the timbre of the default type of hesitation disfluency (HD) in Israeli Hebrew: the mid-front vowel /e/. For this purpose, we analysed the frequencies of the first three formants, F1, F2, and F3, of hundreds of HD pronunciations taken from The Corpus of Spoken Israeli Hebrew (COSIH). We also compared the formant values with two former studies that were carried out on the vowel /e/ in fluent speech. The findings show that, in general, elongated word-final syllables and appended [e]s are pronounced with the same amount of openness as fluent [e], while filled pauses tend to be more open (lower F1), and more frontal (higher F2). Following these results, we suggest using a different set of IPA symbols, and not the phonemic mid-front /e/, in order to better represent hesitation disfluencies.

Keywords DiSS, hesitation disfluency, filled pauses, LPC analysis, formants, spontaneous speech, Hebrew
Jozsef Szakos, and Ulrike Glavitsch, “Investigating disfluency in recordings of last speakers of endangered Austronesian languages in Taiwan,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract The nearly three decades spent in Formosan language documentation produced hundreds of hours of recorded speech. In this paper, we show how the use of SpeechIndexer for transcribing and indexing the data visualises the problem of disfluency in the spontaneous narratives and dialogues. The semiautomatic alignment of speech and transcription needs to be adjusted manually each time when unpredictable pauses occur which are disfluencies, rather than markers of phrasal units. It is illustrated how the combination of SpeechIndexer’s pause finder with pitch measurements can help to pinpoint the difference of phrasal boundaries and pauses of disfluency.

Keywords DiSS, Austronesian, lesser-documented unwritten language, SpeechIndexer, pause finder
Leimin Tian, Catherine Lai, and Johanna Moore, “Recognising emotions in dialogues with disfluencies and non-verbal vocalisations,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract We investigate the usefulness of DISfluencies and Non-verbal Vocalisations (DIS-NV) for recognizing human emotions in dialogues. The proposed features measure filled pauses, fillers, stutters, laughter, and breath in utterances. The predictiveness of DIS-NV features is compared with lexical features and state-of-the-art low-level acoustic features. Our experimental results show that using DIS-NV features alone is not as predictive as using lexical or acoustic features. However, adding them to lexical or acoustic feature set yields improvement compared to using lexical or acoustic features alone. This indicates that disfluencies and non-verbal vocalisations provide useful information overlooked by the other two types of features for emotion recognition.

Keywords DiSS, emotion recognition, dialogue, disfluency, speech processing, HCI
Marcus Tomalin, Mirjam Wester, Rasmus Dall, Bill Byrne, and Simon King, “A lattice-based approach to automatic filled pause insertion,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract This paper describes a novel method for automatically inserting filled pauses (e.g., UM) into fluent texts. Although filled pauses are known to serve a wide range of psychological and structural functions in conversational speech, they have not traditionally been modelled overtly by state-of-the-art speech synthesis systems. However, several recent systems have started to model disfluencies specifically, and so there is an increasing need to create disfluent speech synthesis input by automatically inserting filled pauses into otherwise fluent text. The approach presented here interpolates Ngrams and Full-Output Recurrent Neural Network Language Models (f-RNNLMs) in a lattice-rescoring framework. It is shown that the interpolated system outperforms separate Ngram and f-RNNLM systems, where performance is analysed using the Precision, Recall, and F-score metrics.

Keywords DiSS, disfluency, filled pauses, f-RNNLMs, Ngrams, lattices
Michiko Watanabe, Yosuke Kashiwagi, and Kikuo Maekawa, “The relationship between preceding clause type, subsequent clause length and duration of silent and filled pauses at clause boundaries in Japanese monologues,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Filled pauses (FPs) are claimed to occur when speakers have some difficulties and need extra time in speech production. This study investigated whether the following two factors affect silent pause (SP) and FP durations at clause boundaries, using a spontaneous speech corpus: 1) boundary strength and 2) subsequent clause length. First, whether SP and FP durations increase with syntactic boundary strength was examined. Second, whether subsequent clause length affects SP and FP durations at the boundaries was investigated. Results show SP duration increased with boundary strength and subsequent clause length, but FP duration did not, suggesting only SP duration is affected by the two factors.

Keywords DiSS, silent pause, filled pause, clause boundary, speech planning, disfluency
Mirjam Wester, Martin Corley, and Rasmus Dall, “The temporal delay hypothesis: natural, vocoded and synthetic speech,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Including disfluencies in synthetic speech is being explored as a way of making synthetic speech sound more natural and conversational. How to measure whether the resulting speech is actually more natural, however, is not straightforward. Conventional approaches to synthetic speech evaluation fall short as a listener is either primed to prefer stimuli with filled pauses or, when they aren’t primed they prefer more fluent speech. Psycholinguistic reaction time experiments may circumvent this issue. In this paper, we revisit one such reaction time experiment. For natural speech, delays in word onset were found to facilitate word recognition regardless of the type of delay; be they a filled pause (um), silence or a tone. We expand these experiments by examining the effect of using vocoded and synthetic speech. Our results partially replicate previous findings. For natural and vocoded speech, if the delay is a silent pause, significant increases in the speed of word recognition are found. If the delay comprises a filled pause there is a significant increase in reaction time for vocoded speech but not for natural speech. For synthetic speech, no clear effects of delay on word recognition are found. We hypothesise this is because it takes longer (requires more cognitive resources) to process synthetic speech than natural or vocoded speech.

Keywords DiSS, delay hypothesis, disfluency
Clare Wright, and Cong Zhang, “The effect of study abroad experience on L2 Mandarin disfluency in different types of tasks,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Disfluency is a common phenomenon in L2 speech, especially in beginners’ speech. Whether studying abroad can help with reducing their disfluency or not remains debated [8]. We examined longitudinal data from 10 adult English instructed learners of Mandarin measured before and after ten months of studying abroad (SA) in this paper. We used two speaking tasks comparing pre-planned vs. unplanned spontaneous speech to compare differences over time and between tasks, using eight linguistic and temporal fluency measures (analysed using CLAN and PRAAT). Overall mean linguistic and temporal fluency scores improved significantly (p < .05), especially speech rate (p < .01), supporting the general claim that SA favours oral development, particularly fluency [2]. Further analysis revealed task differences at both times of measurement, but with greater improvement in the spontaneous task.

Keywords DiSS, fluency, L2 Mandarin, study abroad

2013

Julie Beliao, and Anne Lacheret, “Disfluency and discursive markers: when prosody and syntax plan discourse,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 5-8. http://www.isca-speech.org/archive/diss_2013/papers/dis6_005.pdf.

Abstract Hesitations, interruptions within phrases or within words are common in spontaneous speech. Those phenomena are widely known to be observable from a prosodic point of view through disfluencies. From a syntactic point of view, many studies already established that discursive markers such as hm, oh, I mean, etc. are representative of spontaneous speech. In this study, we demonstrate through a joint corpus-based analysis that these prosodical and syntactical features are correlated, without however being equivalent. More precisely, the lack of either disfluencies or discursive markers is consistently shown to be representative of a planned discourse.

Keywords DiSS, disfluency, discursive marker, genres
Malte Belz, and Myriam Klapi, “Pauses following fillers in L1 and L2 German map task dialogues,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 9-12. http://www.isca-speech.org/archive/diss_2013/papers/dis6_009.pdf.

Abstract Fillers and pauses in spoken language indicate hesitations. Filler type (uh vs. um) is believed to signal a minor or major following speech delay in L1. We examined whether advanced speakers of L2 German use pauses following filler type (äh vs. ähm) in the same way as native speakers do. Two Map Task corpora of L1 and L2 were contrasted with respect to speaker role, filler type and the exact time interval of fillers and pauses. Speaker role influenced the disfluency patterns in L1 and L2 in the same way. Filler type had no impact on the length of the following pause, but the time interval patterns differed significantly. Longer filler intervals are followed by longer pauses in L2 and by shorter pauses in L1. These results suggest that filler type in German is not used to indicate the length of the following delay. Advanced learners seem to have adopted this pattern of use, but cannot overcome their hesitations as fast as native speakers, probably due to their less automatised speech production.

Keywords DiSS, fillers, pauses, spontaneous speech, L1, L2, map task, German, disfluencies, contrastive analysis
Sara Candeias, Dirce Celorico, Jorge Proença, Arlindo Veiga, and Fernando Perdigão, “HESITA(tions) in Portuguese: a database,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 13-16. http://www.isca-speech.org/archive/diss_2013/papers/dis6_013.pdf.

Abstract With this paper we present a European Portuguese database of hesitations in speech. Under the name of HESITA, this database contains annotations of hesitation events, such as filled pauses, vocalic extensions, truncated words, repetitions and substitutions. The hesitations were found over 30 daily news programs collected from podcasts of a Portuguese television channel. The database also includes speaking style classification as well as acoustical information and other speech events. Statistical analysis of the hesitation events in terms of their occurrence is presented. Insights into the process of human speech communication can be extracted from this database, which encloses relevant information about how Portuguese speakers hesitate. The HESITA database is freely available online to the research community.

Keywords DiSS, hesitations, disfluency, prepared speech, spontaneous speech, annotation, hesitation corpus
Nivja H. de Jong, and Hans Rutger Bosker, “Choosing a threshold for silent pauses to measure second language fluency,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 17-20. http://www.isca-speech.org/archive/diss_2013/papers/dis6_017.pdf.

Abstract Second language (L2) research often involves analyses of acoustic measures of fluency. The studies investigating fluency, however, have been difficult to compare because the measures of fluency that were used differed widely. One of the differences between studies concerns the lower cut-off point for silent pauses, which has been set anywhere between 100 ms and 1000 ms. The goal of this paper is to find an optimal cut-off point. We calculate acoustic measures of fluency using different pause thresholds and then relate these measures to a measure of L2 proficiency and to ratings on fluency.

Keywords DiSS, silent pauses, number of pauses, duration of pauses, silent pause threshold, second language speech
Laura E. de Ruiter, “Self-repairs in German children's peer interaction - initial explorations,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 29-32. http://www.isca-speech.org/archive/diss_2013/papers/dis6_029.pdf.

Abstract Forty-nine self-repairs were extracted from a corpus of conversational speech of ten German children (mean age 5;1) with peers. The repairs were analysed using Levelt’s [1] classification and compared with his adult data. Children produced fewer appropriateness repairs than adults, but more covert repairs and more phonetic repairs. Like adults, children had a preference to interrupt themselves within-word only for error repairs. Unlike adults, children did not produce editing terms following interruptions.

Keywords DiSS
Andrea Deme, and Alexandra Markó, “Lengthenings and filled pauses in Hungarian adults' and children's speech,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 21-24. http://www.isca-speech.org/archive/diss_2013/papers/dis6_021.pdf.

Abstract In the present paper vowel lengthenings and non-lexicalized filled pauses were studied in the spontaneous speech of children and adults (focusing more on the much less studied phenomenon: vowel lengthening). The results revealed different usage and appearance of lengthenings in the two age groups, therefore, differences in speech skills and strategies can be concluded. LEs and FPs differ mostly in their position in the speech session between the age groups, which has implications regarding different planning strategies of adults and children. We also draw conclusions regarding the methodological considerations in the issue of identifying vowel lengthening supporting a previously formulated conception.

Keywords DiSS, lengthening, (non-lexicalized) filled pause, spontaneous speech, speech planning, discourse management
Yasuharu Den, and Natsuko Nakagawa, “Anti-zero pronominalization: when Japanese speakers overtly express omissible topic phrases,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 25-28. http://www.isca-speech.org/archive/diss_2013/papers/dis6_025.pdf.

Abstract In this paper, we focus on cases where Japanese speakers overtly express a topic phrase that could have been omitted. We call this phenomenon anti-zero-pronominalization and hypothesize that this helps speakers gain time for planning a following utterance; anti-zero-pronominalization is another option to deal with cognitive load at the beginning of an utterance in addition to fillers and other speech disfluencies. Based on a quantitative analysis of a corpus of spontaneous Japanese dialogs, we investigate the difference between overt topic NPs and zero-pronouns. We show that i) the utterance is more complex when the topic is expressed as an overt NP than when it is expressed as a zero-pronoun; ii) turn-initial items such as fillers are produced less frequently when overt NPs appear than when zero-pronouns appear; and iii) the utterance becomes more complex when the last mora of the topic is more prolonged.

Keywords DiSS, zero-pronouns, topic phrases, cognitive load, Japanese dialogs
Jonathan Ginzburg, Raquel Fernández, and David Schlangen, “Self-addressed questions in disfluencies,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 33-36. http://www.isca-speech.org/archive/diss_2013/papers/dis6_033.pdf.

Abstract The paper considers self-addressed queries – queries speakers address to themselves in the aftermath of a filled pause. We study their distribution in the BNC and show that such queries show signs of sensitivity to the syntactic/semantic type of the sub-utterance they follow. We offer a formal model that explains the coherence of such queries.

Keywords DiSS
Hanae Koiso, and Yasuharu Den, “Acoustic and linguistic features related to speech planning appearing at weak clause boundaries in Japanese monologs,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 37-40. http://www.isca-speech.org/archive/diss_2013/papers/dis6_037.pdf.

Abstract In this paper, we focus on weak clause boundaries in Japanese monologs in order to investigate the relationship of the length of constituents following weak boundaries to three acoustic and linguistic features: 1) occurrence rate of fillers, 2) occurrence rate of boundary pitch movements, and 3) degree of lengthening of clause-final morae. We found that all these features were significantly correlated with the length of following constituents. Most importantly, boundary pitch movements had an additional effect that can be distinct from the effect of clause-final lengthening. These results suggest that Japanese speakers have earlier-occurring items that help them deal with cognitive load in speech planning, in addition to fillers and other clause-initial disfluencies.

Keywords DiSS, fillers, boundary pitch movements, clause-final lengthening, Japanese monologs
Kikuo Maekawa, “Prediction of F0 height of filled pauses in spontaneous Japanese: a preliminary report,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 41-44. http://www.isca-speech.org/archive/diss_2013/papers/dis6_041.pdf.

Abstract F0 values of filled pauses (FP) in the Corpus of Spontaneous Japanese were analyzed to examine the mechanism by which the F0 heights of FP were determined. Statistical analyses of the F0 values of FP occurring in between two full-fledged accentual phrases (AP) revealed correspondence between the occurrence timing of FP and the F0 height. Based upon this finding, 5 models of F0 prediction were proposed. Comparison of the mean prediction errors revealed that the best prediction was obtained in a model that linearly interpolates the phrase-final L% tone of the immediately preceding AP and the phrase-initial %L tone of the immediately following AP. This finding suggests that the F0 of FP was specified at the level of phonetic realization rather than phonological prosodic representation.

Keywords DiSS
Takehiko Maruyama, “Analysis of parenthetical clauses in spontaneous Japanese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 45-48. http://www.isca-speech.org/archive/diss_2013/papers/dis6_045.pdf.

Abstract In this paper, I will discuss the functional aspects of parenthetical clauses and sentences in spontaneous Japanese monologues. Parentheticals can be defined as syntactic elements that are instantly inserted in the middle of an ongoing utterance to add supplemental information and thus interrupt the fluent flow of speech production. Examples of parenthetical clauses/sentences that appeared in the Corpus of Spontaneous Japanese were examined and then classified into three types. These types differ in their contextual functions, but share a commonality in that they present multiplex information simultaneously in the process of producing spontaneous speech.

Keywords DiSS, parenthetical clause/sentence, Corpus of Spontaneous Japanese, contextual functions
Helena Moniz, Fernando Batista, Isabel Trancoso, and Ana Isabel Mata, “Automatic structural metadata identification based on multilayer prosodic information,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 49-52. http://www.isca-speech.org/archive/diss_2013/papers/dis6_049.pdf.

Abstract This paper discriminates different types of structural metadata in transcripts of university lectures: boundary events (comma, full stops and interrogatives), and disfluencies (repair). The disambiguation process is based on predefined multilayered linguistic information and on its hierarchical structure. Since boundary events may share similar linguistic properties, in terms of F0 and energy slopes, presence/absence of silent pauses, and duration of different units of analysis, different classification methods based on a set of automatically derived prosodic features have been applied to differentiate between those events and disfluencies. This paper also performs a detailed analysis on the impact of each individual feature in discriminating each structural event. The results of our data-driven approach allow us to reach a structured set of basic features towards the disambiguation of metadata events. These results are a step forward towards the analysis of speech acts and their disambiguation from disfluencies.

Keywords DiSS, disfluencies, automatic speech processing, structural metadata, speech prosody
Rena Nemoto, “Which kind of hesitations can be found in Estonian spontaneous speech?,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 53-54. http://www.isca-speech.org/archive/diss_2013/papers/dis6_053.pdf.

Abstract This paper describes the acoustic characteristics of hesitations in Estonian spontaneous speech. We especially investigate duration, fundamental frequency, and first two formant analyses. Most frequent hesitations can be expressed by lengthened phonemes such as /ää/, /ee/, /õõ/, and /mm/. We compare lengthened phoneme hesitations with their related phonemes. The results from our preliminary hesitation study show (i) hesitations have longer duration and its range is spread; (ii) hesitations globally include lower pitch; (iii) hesitation formants are likely to be centralized or posterior and opened in comparison with related phonemes.

Keywords DiSS, hesitation, Estonian, spontaneous speech
Sieb Nooteboom, and Hugo Quené, “Self-monitoring as reflected in identification of misspoken segments,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 55-57. http://www.isca-speech.org/archive/diss_2013/papers/dis6_055.pdf.

Abstract Most segmental speech errors probably are articulatory blends of competing segments. Perceptual consequences were studied in listeners' reactions to misspoken segments. 291 speech fragments containing misspoken initial consonants plus 291 correct control fragments, all stemming from earlier SLIP experiments, were presented for identification to listeners. Results show that misidentifications (i.e. deviations from an earlier auditory transcription) are rare (3%), but reaction times to correctly identified fragments systematically reflect differences between correct controls, undetected, early detected and late detected speech errors, leading to the following speculative conclusions: (1) segmental errors begin their life in inner speech as full substitutions, and competition with correct target segments often is slightly delayed; (2) in early interruptions speech is initiated before competing target segments are activated, but then rapidly interrupted after error detection; (3) late detected errors reflect conflict-based monitoring of articulation or monitoring overt speech.

Keywords DiSS
Klim Peshkov, Laurent Prévot, Stéphane Rauzy, and Berthille Pallaud, “Categorizing syntactic chunks for marking disfluent speech in French language,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 59-62. http://www.isca-speech.org/archive/diss_2013/papers/dis6_059.pdf.

Abstract Disfluency is the first phenomenon one has to address when processing spontaneous speech. Efficient systems combining transcription-based and signal-based cues have been created for English. These systems generally use supervised machine learning models, trained over large annotated datasets combining signal and transcription. As for other languages, including French, the situation is complicated by the lack of resources. A few proposals based on filled pauses, truncated words and repetitions have been made for identifying disfluencies in French. In this paper, we propose a transcription-based approach to this task, with high-quality morpho-syntactic tags as input for identifying disfluent areas. Originally, we adopted a transcription-based approach for obtaining an independent way of characterizing disfluencies. This can be later compared and combined with prosodic cues. Our method consists in building syntactic chunks from our tagging and then classifying these chunks into several categories, some of them being considered as disfluent. We apply our method to speaker style characterization, discourse genres zoning, as well as to dataset cleaning. Finally, an attempt is made to relate our disfluent chunks to a more standard description of disfluencies in order to open the way of a deeper integration of our work with the one of the disfluency community.

Keywords DiSS, tagging, chunking, transcription-based approach, disfluencies, speaking style
Jorge Proença, Dirce Celorico, Arlindo Veiga, Sara Candeias, and Fernando Perdigão, “Acoustical characterization of vocalic fillers in European Portuguese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 63-66. http://www.isca-speech.org/archive/diss_2013/papers/dis6_063.pdf.

Abstract This study attempts to acoustically characterize the most common filled pause vocalizations (or vocalic fillers) in spontaneous speech in European Portuguese: the near-open central vowel [ɐ] and the mid-central vowel [ə]. For this purpose we analyzed the spectral information of the vocalic fillers by estimating their first two formant frequencies as well as their duration properties. The vocalic fillers are taken from a large corpus of European Portuguese broadcast news' speech. We also compared the vocalic fillers with lexical vowels possessing similar timbre. No formant variation trend was attained for the vocalic fillers and a great overlap of formant values is observed. These results provide a base of information for understanding the most common vocalic fillers in European Portuguese spontaneous speech.

Keywords DiSS, filled pauses, vocalic fillers, formant estimation, spontaneous speech, hesitations
Vered Silber-Varod, and Takehiko Maruyama, “The linguistic role of hesitation disfluencies: evidence from Hebrew and Japanese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 67-70. http://www.isca-speech.org/archive/diss_2013/papers/dis6_067.pdf.

Abstract In this paper we examine a certain aspect of prosody-syntax interface, that of hesitation disfluencies (HD) that occur intra-phrases or intra-morphemes. Such cases were found in two spontaneous corpora of two syntactically distinct languages – Israeli Hebrew (IH) and Japanese. It was found that intra-phrasal hesitations in the two languages call for different explanations, since in Japanese the noun (e.g., in NP) precedes the case marking particle while in IH the preposition (e.g., in PP) precedes the noun. In this paper we will present qualitative findings and suggest a unified view of the phenomenon of intra-phrasal HDs.

Keywords DiSS, hesitation disfluency, prosody-syntax interface, Israeli Hebrew, Japanese
Michiko Watanabe, “Phrasal complexity and the occurrence of filled pauses in presentation speeches in Japanese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 71-72. http://www.isca-speech.org/archive/diss_2013/papers/dis6_071.pdf.

Abstract Filled pauses are ubiquitous in everyday speech. I investigated whether linguistic complexity of upcoming phrases affects filler rate at phrase boundaries in presentation speeches in Japanese. Filler rate at phrase boundaries increased monotonically with complexity of the following phrases. However, when the following phrase was composed of more than 11 Bunsetsu-phrases, the filler rate did not show any constant increase. The results indicate that filler rate at phrase boundaries is closely related to cognitive load of local linguistic encoding and that the maximum planning span for linguistic encoding is about 10 Bunsetsu-phrases in Japanese monologues.

Keywords DiSS, filled pause, bunsetsu-phrase, linguistic complexity, planning load
Charlotte Wollermann, Eva Lasarcyk, Ulrich Schade, and Bernhard Schröder, “Disfluencies and uncertainty perception - evidence from a human-machine scenario,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 73-76. http://www.isca-speech.org/archive/diss_2013/papers/dis6_073.pdf.

Abstract This paper deals with the modelling and perception of disfluencies in articulatory speech synthesis. The stimuli are embedded into short dialogues in question-answering situations in a human–machine scenario. The system is supposed to express uncertainty in the answer. We test the influence of delay, intonation, and filler as prosodic indicators of uncertainty on perception in two studies. Study 1 deals with the effect of delay and filler on uncertainty perception. Results suggest an additive effect of the cues, i.e. the activation of both prosodic cues of uncertainty has a stronger impact on uncertainty perception than the deactivation of a single cue or of both cues. With respect to the effect of single cues, no significant difference can be observed. Study 2 investigates the impact of delay and intonation on perceived uncertainty. Again, a principle of additivity can be observed. Furthermore as modelled here, intonation has a stronger influence than delay. In both studies no correlation between the ranking of uncertainty and naturalness of the stimuli is found.

Keywords DiSS, uncertainty, disfluencies, speech synthesis, speech perception

2010

Rachel Baker, and Valerie Hazan, “LUCID: a corpus of spontaneous and read clear speech in British English,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 3-6. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_003.pdf.

Abstract This paper describes LUCID, the London UCL Clear Speech in Interaction Database, which contains spontaneous and read speech in clear and casual speaking styles for 40 Southern British English speakers. The problem-solving task used to collect the spontaneous speech, the DiapixUK task, is also described, along with ways of using the task to elicit different types of clear speech without explicit instruction, e.g. using different ‘barriers’ to communication. Applications of the corpus and of the task materials for future research projects are discussed. The corpus and materials will be available online to the research community at the end of the project.

Keywords DiSS, spontaneous speech, speech production, clear speech, interaction
Catia Cucchiarini, Joost van Doremalen, and Helmer Strik, “Fluency in non-native read and spontaneous speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 15-18. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_015.pdf.

Abstract Various studies have investigated the temporal aspects of non-native speech and their relation to perceived fluency, because fluency constitutes an important aspect of second language proficiency. For this purpose it is important to determine which measures are most strongly correlated with perceived fluency and how these measures vary. In the present study objective measures related to perceived fluency were calculated for read and spontaneous speech of non-native speakers of Dutch. The results indicate that the objective measures vary as a function of different variables. Suggestions are made for future investigations so as to facilitate comparisons between studies and meta-analyses.

Keywords DiSS, fluency, non-native speech, temporal measures
Anne Cutler, Holger Mitterer, Susanne Brouwer, and Annelie Tuinman, “Phonological competition in casual speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 43-46. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_043.pdf.

Abstract The natural processes affecting spontaneous speech production and the natural processes of spoken-word recognition combine to cause significant activation of irrelevant lexical competitors. Using eye-tracking, we show that reduced forms of words that occur in casual speech cause listeners to activate lexical candidates that resemble the reduced form but are quite unlike the canonical form of the intended word. In L2, the problem is worse: casual speech processes that occur in the L2 but not in the L1 lead to activation of irrelevant competitors even where native listeners experience no such competition.

Keywords DiSS, word recognition, competition, eyetracking
Robert Eklund, “The effect of directed and open disambiguation prompts in authentic call center data on the frequency and distribution of filled pauses and possible implications for filled pause hypotheses and data collection methodology,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 23-26. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_023.pdf.

Abstract This paper studies the frequency and distribution of filled pauses (FPs) in ecologically valid data where unaware and authentic customers called in to report problems with their telephony and/or Internet services and were met by a novel Wizard-of-Oz paradigm using real call center agents as wizards. The data analyzed were caller utterances following a directed or an open disambiguation prompt. While no significant differences in FP production were observed as a function of prompt type, FP frequency was found to be considerably higher than what is usually reported in the literature. Moreover, a higher proportion of utterance-initial FPs than normally reported was also observed. The results are compared to previously reported FP frequencies. Potential implications for data collection methodology are discussed.

Keywords DiSS, filled pauses, Wizard-of-Oz, WOZ, speech planning, speech production, many-options, data collection, open prompts, directed prompts, call center, dialog systems
Ian R. Finlayson, Robin J. Lickley, and Martin Corley, “The influence of articulation rate, and the disfluency of others, on one's own speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 119-122. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_119.pdf.

Abstract Disfluencies are a regular feature of spontaneous speech, and much has been learnt about the effects of various linguistic factors on their production. Speech usually occurs within dialogue, yet little is known about the influence of an interlocutor's speech on a speaker's own fluency. It has been shown that speakers tend to align on various levels, converging, for example, on lexical, and syntactic levels. But we know little about convergence in rate of speech or disfluency. Little is also known about the effects of speech rate on fluency in a speaker's own speech. In this paper, we examine these effects through analysis of speech rate, hesitation and error correction in a corpus of task-oriented dialogues (the HCRC Map Task Corpus). Our findings demonstrate that different types of disfluencies can be influenced in different ways by speech rate. Furthermore, the probability of an interlocutor being disfluent appears to affect the speaker's own likelihood, raising the possibility that interlocutors may “align” on disfluent, as well as fluent, speech.

Keywords DiSS, articulation rate, alignment, accommodation theory, dialogue
Anne Garcia-Fernandez, Ioana Vasilescu, and Sophie Rosset, “euh as cue for speaker confidence and word searching in human spoken answers in French,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 79-80. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_079.pdf.

Abstract This paper deals with the contextual analysis of the vocalic hesitation euh in French in a corpus of human elicited answers. Through the analysis of the contextual combinatorial patterns, the new information introductory role of this vocalic hesitation is investigated. Observations support trends noticed in other languages and suggest potential optimization for question answering automatic systems.

Keywords DiSS, vocalic hesitation, feeling of knowing, rephrasing, interaction management, QA systems
Jean-Philippe Goldman, Mathieu Avanzi, and Antoine Auchlin, “Hesitations in read vs. spontaneous French in a multi-genre corpus,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 101-104. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_101.pdf.

Abstract This study is a part of an on-going work whose goal is the prosodic characterization of various speaking styles in a multi-genre 70-minute French corpus as well as the development of prosodic automatic detection tools. In this corpus, a manual annotation of prominences and disfluencies like hesitations and syntactic ruptures is used to show evident phonological aspects of hesitation in regard to quality, pause position and proximity to syntactic rupture.

Keywords DiSS, hesitation, filled pause, vowel lengthening, spoken French, disfluencies
Joakim Gustafson, and Daniel Neiberg, “Prosodic cues to engagement in non-lexical response tokens in Swedish,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 63-66. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_063.pdf.

Abstract This paper investigates the prosodic patterns of non-lexical response tokens in a Swedish call-in radio show. The feedback of a professional speaker was investigated to give insight in how to build a simulated active listener that could encourage its users to continue talking. Possible domains for such systems include customer care and second language learning. The prosodic analysis of the non-lexical response tokens showed that the engagement level decreases over time. Prosodic cues to this include change in syllabicity, pitch slope and loudness. We have also investigated prosodic alignment, to see to what extent the active listener mimics the prosody of the callers in his non-lexical response tokens.

Keywords DiSS, listener responses, prosodic cues, turn management, prosodic alignment
Corinna Harwardt, “Investigating the COG ratio as feature for speaker verification on high-effort speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 35-38. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_035.pdf.

Abstract Vocal effort mismatch in training and test data leads to immense degradations of speaker recognition systems. The changes on the acoustics of a speech signal induced by raised vocal effort are complex and despite several studies from various authors not completely known yet. Instead of just gaining knowledge about these differences for automatic speaker recognition it is rather essential to discover features that remain relatively stable in changing vocal effort conditions and contain speaker specific information. In this study we investigate the center of gravity (COG) ratio for high and mid frequency bands as feature for speaker recognition. We find that vocal effort mismatch leads to an equal error rate (EER) more than six times higher for a standard MFCC-based GMM-UBM system. For the COG ratio we observe a much smaller degradation of around 25%. When adapting the UBM with additional high-effort speech data the EER of the COG ratio gets even better for the mismatch condition than for the matching task. Combining MFCC and the COG ratio leads to best results with an overall improvement of 16% compared to the standard MFCC-based system.

Keywords DiSS, vocal effort, speaker recognition, center of gravity ratio
Valerie Hazan, and Rachel Baker, “Does reading clearly produce the same acoustic-phonetic modifications as spontaneous speech in a clear speaking style?,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 7-10. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_007.pdf.

Abstract This paper describes an acoustic-phonetic comparison of casual and clear speech styles elicited in read and spontaneous speech. For the spontaneous speech, 20 pairs of English talkers were recorded doing a problem-solving picture task in good and degraded listening conditions. Each person also read sentences in casual and clear styles. The read clear speech was an exaggerated form of clear speech relative to the spontaneous clear speech: it had higher median F0 in both styles, a greater increase in F0 range and greater decrease in speaking rate between casual and clear styles, and trends towards greater vowel space expansion.

Keywords DiSS, spontaneous speech, read speech, clear speech, interaction, acoustic-phonetic characteristics
Pei-Yu Hsieh, “Pitch patterns in the vocalization of a 3-month-old Taiwanese infant,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 93-96. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_093.pdf.

Abstract This paper studied pitch contours of a Taiwanese-acquiring infant at gooing stage. Breath group theory has shown that pitch patterns of this stage were physiologically-based [6]. Fall was expected to occur at the boundary of a breath group. It predicted Fall to be the most common pitch contour and Rise-Fall the second most common. But previous studies [8], [9] showed that Rise-Fall occurred more. We investigated patterns of an infant from six weeks old to twelve weeks old. Mean f0 of basic contours of this stage were also shown. The f0 range of Level, Fall, and Rise were reported. Our results showed four types of contours (Level, Fall, Rise, Rise-Fall) appearing at this stage. Consistent with the hypothesis, Fall was found to be most common. Rise-Fall was found to be the second most common. Fall and Rise-Fall made up almost seventy percent. Level contour was found to be rare. The mean f0 of the infant at 3 months old was 400 Hz, higher than that of a toddler at 1;3 (370 Hz) and that of an adult (220 Hz). The f0 range was 700 Hz, greater than that of a toddler at 1;3 (450 Hz), and an adult (300 Hz).

Keywords DiSS, vocalization, pitch, acquisition
Yuichi Ishimoto, and Mika Enomoto, “Analysis of prosodic features for end-of-utterance prediction in spontaneous Japanese,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 97-100. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_097.pdf.

Abstract In this study, we analyzed prosodic features of accentual phrases and investigated their temporal changes to obtain cues for detecting boundaries at where turn-taking could occur in spontaneous conversations. The acoustic parameters used as prosodic features were the fundamental frequency, sound pressure level, and duration of accentual phrases in long utterance units. The results showed that the fundamental frequency shift between the first and second accentual phrases could be useful for detecting the number of accentual phrases in the long utterance unit. In addition, the results suggested that a rapid decrease in sound pressure and an extended duration of the accentual phrase constitute a cue for detecting the end of the utterance. That is, the acoustic predictor of the utterance length appeared at the beginning of the utterance, and the predictor of the utterance boundary appeared shortly before the end of the utterance.

Keywords DiSS, prosody, turn-taking, accentual phrase, long utterance unit
Kristiina Jokinen, “Hesitation and uncertainty as feedback,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 103-106. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_103.pdf.

Abstract This paper deals with the signals that are used to express hesitation and uncertainty in conversational interactions. It studies the relation between gesturing, body posture, facial expressions, and speech, and draws conclusions of their role and function in the interpretation and coordination of interaction with respect to the basic enablements of communication. Dialogues are assumed to be cooperative activity that is constrained by the participants' roles, social obligations, and communicative situation.

Keywords DiSS, hesitation, uncertainty, interaction, speech
Takuya Kawada, “On the characteristics of three types of Japanese fillers: e-, ma-, and demonstrative-type fillers,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 27-30. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_027.pdf.

Abstract Japanese has various forms of fillers. However, the characteristics of each form have yet to be well understood. We use a large corpus of spontaneous Japanese speech and conversation and focus on three frequently observed types of fillers : e-, ma-, and demonstrative-type fillers. We show that it is possible to characterize Japanese fillers from the viewpoint of how a speaker concerns himself with the listener in the communicative setting. The type of discourse, way of speaking, and direction of gaze of the speaker influence the distribution of the types of filler.

Keywords DiSS, Japanese, fillers, spoken settings, gaze
Hanae Koiso, and Yasuharu Den, “Towards a precise model of turn-taking for conversation: a quantitative analysis of overlapped utterances,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 55-58. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_055.pdf.

Abstract In this paper, we present the outline of a new model of turn-taking that is applicable not only to smooth transitions but also to transitions involving overlapping speech. We identify acoustic, prosodic, and syntactic cues in overlapped utterances that elicit early initiation of a next turn, based on a quantitative analysis of Japanese three-party conversations, proposing a model for predicting a turn's completion in an incremental fashion using sources from units at multiple levels.

Keywords DiSS, turn-taking, overlapped utterances, incremental processing
Rebecca Lunsford, Peter A. Heeman, Lois Black, and Jan van Santen, “Autism and the use of fillers: differences between ‘um’ and ‘uh’,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 107-110. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_107.pdf.

Abstract Little research has been done to explore differences in the use of the fillers ‘um’ and ‘uh’ between children with Autistic Spectrum Disorder (ASD) and those with typical development (TD). Quantifying any differences could aid in diagnosing ASD, understanding its nature, and better understanding the mechanisms involved in dialogue processing. In this paper, we report on a study of dialogues between clinicians and children with ASD or TD, finding that the two groups of children differ substantially in their use of ‘um’ but not ‘uh’. This suggests that these two fillers result from different cognitive processes.

Keywords DiSS, disfluencies, fillers, autism
Kikuo Maekawa, “Final lowering and boundary pitch movements in spontaneous Japanese,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 47-50. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_047.pdf.

Abstract Standard theory of the prosodic structure in Tokyo Japanese treats both the final lowering and boundary pitch movements as the properties of utterance node. Validity of this treatment was examined by means of corpus-based analyses of spontaneous speech. The results showed that while final lowering could be treated as a property of utterance, boundary pitch movement could not. The latter should rather be treated as the property of accentual phrase. Based on these results, revised prosodic structure and annotation scheme were proposed.

Keywords DiSS, final lowering, CSJ, X-JToBI, BPM
Takehiko Maruyama, Katsuya Takanashi, and Nao Yoshida, “An annotation scheme for syntactic unit in Japanese dialog,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 51-54. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_051.pdf.

Abstract In this paper, we propose a scheme for annotating syntactic units called DCU (Dialog Clause-Unit) in Japanese dialogs. Since there are no explicit devices to mark sentence boundaries in speech, precise definition and criteria must be designed to extract syntactic units from the utterance. We show a design of DCU which consists of clausal and non-clausal units. Annotating DCU tags to eight dialogs of 40 minutes from two different dialog corpora, we examine characteristics of each dialog from the viewpoint of DCU, and compare them to the distribution of clausal-units annotated to monologs.

Keywords DiSS, dialog clause-unit, Japanese dialog and monolog, clause boundary, unit length
Sandra Merlo, and Plínio A. Barbosa, “Periodic cycles of hesitation phenomena in spontaneous speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 19-22. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_019.pdf.

Abstract To verify whether hesitation phenomena are distributed periodically in spontaneous speech, twenty speech samples produced by five male adults were analyzed. Spectral analysis allowed for three main findings. First, hesitations present stationary behavior, which implies they did not accumulate in the beginning, in the middle, or in the end of speech samples. Second, periodic cycles of hesitation phenomena were detected in all speech samples (mean cycle duration around 13 seconds). This implies that regions with more hesitations tended to regularly alternate with regions with fewer hesitations. Third, periodic cycles accounted for about 30% of variance in data.

Keywords DiSS, hesitation phenomena, time series, periodic cycles
Emi Morita, “Salientizing the breaks in talk: a study of Japanese segmentizing,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 59-62. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_059.pdf.

Abstract In naturally occurring conversation, Japanese speakers often break up their turns at talk with seemingly random or disfluent pauses that break the flow of talk into a series of successive small segments which may not be semantically coherent. Moreover, the boundaries between such segments are often made salient via the attachment of interactional particles, such as ne and sa. Empirical observation of such naturally occurring partitioning of talk reveals that such “semantically irregular” segmentation is used by both speakers and their recipients to accomplish a legitimate communicative function in managing the fine-tuned choreography of moment-by-moment conversational interaction.

Keywords DiSS, utterance segmentation, interactional particles, Japanese conversation
Daniel Neiberg, and Joakim Gustafson, “Modeling conversational interaction using coupled Markov chains,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 81-84. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_081.pdf.

Abstract This paper presents a series of experiments on automatic transcription and classification of fillers and feedbacks in conversational speech corpora. A feature combination of PCA projected normalized F0 Constant-Q Cepstra and MFCCs has been shown to be effective for standard Hidden Markov Models (HMM). We demonstrate how to model both speaker channels with coupled HMMs and show expected improvements. In particular, we explore model topologies which take advantage of predictive cues for fillers and feedback. This is done by initializing the training with special labels located immediately before fillers in the same channel and immediately before feedbacks in the other speaker channel. The average F-score for a standard HMM is 34.1%, for a coupled HMM 36.7% and for a coupled HMM with pre-filler and pre-feedback labels 40.4%. In a pilot study the detectors are found to be useful for semi-automatic transcription of feedback and fillers in socializing conversations.

Keywords DiSS, fillers, feedbacks, coupled hidden markov models, cross-speaker modeling, conversation
Hannele Nicholson, Kathleen Eberhard, and Matthias Scheutz, “"um...i don't see any": the function of filled pauses and repairs,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 89-92. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_089.pdf.

Abstract We investigate disfluency distribution rates within different moves from an interactive task-oriented experiment to further explore the suggestion by Bortfeld et al. [1] and Nicholson [2] that different types of disfluencies may fulfill varying functions. We focus on disfluency types within moves, or speech turns, where a speaker initiates something compared to a response to such a move. We find that filled pauses (FPs) such as um or uh fulfilled an interpersonal role for participants while repairs occurred out of difficulty.

Keywords DiSS, disfluency, dialogue, dialogue moves, language production
Kazuki Sekine, “Gesture correction in children,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 71-74. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_071.pdf.

Abstract Speakers sometimes modify their gestures during the process of production into disguised adaptors. Such disguised adaptors can be treated as evidence that speakers can monitor their gestures. This study investigated when disguised adaptors are produced in Japanese elementary school children. The results showed that children did not produce disguised adaptors until the age of 8. The emergence of disguised adaptors suggested that children start to monitor their gestures when they are 9 or 10 years old. Cultural influences and cognitive changes were considered as factors to influence emergence of disguised adaptors.

Keywords DiSS, spontaneous gestures, adaptors, speech error
Shu-Chuan Tseng, and Yun-Ru Huang, “A socio-phonetic analysis of Taiwan Mandarin interview speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 67-70. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_067.pdf.

Abstract This paper presents results of a socio-phonetic analysis of Taiwan Mandarin by using a corpus of questionnaire-based interview speech. Questions were asked to collect data of the interviewee's background of language use, socio-economic status, and internet access in different regions of Taiwan. Two typical dialect-influenced pronunciation errors, the deletion of /w/ before /o/ and the delabialization of /y/, were analyzed with the associated socio-economic factors and the degree of dialect exposure. The degree of dialect exposure (Southern Min) and the studied pronunciation variants are statistically correlated with the accuracy rate. But no direct correlation was found between the pronunciation variation and the socio-economic factors.

Keywords DiSS, sociophonetics, Taiwan Mandarin, interview speech
Shu-Chuan Tseng, and Tzu-Lun Lee, “Contextual effects in recognizing reduced words in spontaneous speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 39-42. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_039.pdf.

Abstract This study investigates the effects of context on recognizing reduced word forms in spontaneous speech. Sixteen high-frequency disyllabic targets, eight disyllabic and eight combinations of monosyllabic words are presented to 48 subjects in a spoken word recognition experiment in three conditions: in their original context, in isolation, and embedded in a carrier sentence. Results show that context, degree of reduction, word unit type, gender, and age group all show an effect on the accuracy rates of recognizing the target items. Most interestingly, while a meaningful context helps recognize reduced word forms, a less meaningful context inhibits the recognition more than no context.

Keywords DiSS, spoken word recognition, context effect
Shu-Chuan Tseng, Pei-Chen Tsou, Ko Kuei, and Chien-Wen Lee, “Assessing sentence repetition and narrative speech data produced by hearing-impaired and normally hearing children,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 11-14. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_011.pdf.

Abstract This paper examines sentence repetition and narrative speech data produced by hearing-impaired and normally hearing children with matched gender, age and level of speech comprehension. We assessed these two kinds of speech styles by talker intelligibility, vowel space, and spike production in plosives. In both speaking styles, normally hearing children performed better in talker intelligibility than their hearing-impaired counterparts. No clear vowel space shrinkage was observed in respect of speech style, hearing impairment, and age group. Surprisingly, the production of the spike in plosives was a useful measure for distinguishing acoustic properties of different speaking styles and hearing ability.

Keywords DiSS, speech assessment, hearing impairment, speaking style, acoustic properties
Ioana Vasilescu, Sophie Rosset, and Martine Adda-Decker, “On the functions of the vocalic hesitation euh in interactive man-machine question answering dialogs in French,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 111-114. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_111.pdf.

Abstract This paper deals with the functions of the French vocalic hesitation euh in interactive speech of man-machine question answering dialogs. The present analysis suggests that the vocalic hesitation euh may carry various properties in speech, both disfluent signaling the speakers' efforts to put the intended message under production into appropriate words, and fluent, as markers of discourse structure. Moreover, euh seems to play a role in bracketing lexical units, pointing to the informative content within an utterance. This bracketing may favour intelligibility or decoding fluency on the listener's side. The potential contribution of the vocalic hesitation euh to lexical information bracketing is investigated with the goal of improved information processing by QA systems. Future objectives include a smarter interaction capacity by an appropriate usage of such euh items.

Keywords DiSS, disfluency, fluency, vocalic hesitation, French, discourse markers, Q/A, dialog corpus
Kun-Ching Wang, Chiun-Li Chin, and Yi-Hsing Tsai, “Voice activity detection based on combination of weighted sub-band features using auto-correlation function,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 85-88. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_085.pdf.

Abstract This paper shows the voice activity detection (VAD) based on combination of weighted sub-band features using autocorrelation function. According to the fact that the noise corruption on each sub-band is different from each other, so the estimated signal to noise ratio (SNR) is employed to weight utility rate of each frequency sub-band. Furthermore, a strategy of sub-band features combination is used to integrate all of weighted sub-band auto-correlation function feature parameter and to develop the combined feature parameter. Experimental results demonstrate that the proposed VAD achieves better performance than existing standard VADs at any noise level.

Keywords DiSS, voice activity detection, auto-correlation, wavelet packet transform, sub-band weighting, feature combination
Michiko Watanabe, and Yasuharu Den, “Utterance-initial elements in Japanese: a comparison among fillers, conjunctions, and topic phrases,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 31-34. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_031.pdf.

Abstract Speakers need to plan the following part of speech under the pressure of a temporal imperative at utterance-initial positions. Each language seems to have some devices to solve this problem, which we call utterance-initial elements (UIEs). We investigated effects of two factors, boundary strengths and complexity of the following constituents, on the durations of possible UIEs, such as fillers, conjunctions, and topic phrases. We found that the last mora of filler e, as well as wa-marked topic phrases, became longer as the complexity increased in certain conditions. Possible interpretations for the results are discussed.

Keywords DiSS, utterance-initial elements, prolongation, boundary strengths, constituent complexity
Li-chiung Yang, “Meaning and use: a pragmatic and prosodic analysis of interjections in conversational speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 75-78. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_075.pdf.

Abstract In this paper we report on our research on the pragmatic-contextual meaning and prosody of three interjections ey, wa, and oh. A detailed qualitative-contextual analysis of our corpus shows that these interjections share important contextual and prosodic characteristics due to their similar functional status with respect to new or unexpected information. We show that there are also significant differences in contextual meaning arising from specific emotional or cognitive states, and that these differences are expressively communicated in the varied prosody of each interjection.

Keywords DiSS, prosody, meaning, interjections, discourse
Etsuko Yoshida, and Robin J. Lickley, “Disfluency patterns in dialogue processing,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 115-118. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_115.pdf.

Abstract Spontaneous speech abounds with disfluencies such as filled pauses, repairs, repetitions, false starts and prolongations, all of which are significant but easily overlooked features of speech communication. Based on the comparable corpora of English and Japanese dialogues, we argue that disfluency features can have a positive effect on turn-taking issues and the establishment of common referring expressions in dialogue processing. We examined the occurrence of ten types of filled pauses in Japanese and investigated how they interact with discourse entities and the sharing of common ground. The results indicate that two patterns of disfluency features contribute to on-line speech planning of the participants and their four functions serve to construct the collaborative process of speech communication.

Keywords DiSS, dialogue, disfluency, referring expressions, corpus, common ground

2005

Timothy Arbisi-Kelm, and Sun-Ah Jun, “A comparison of disfluency patterns in normal and stuttered speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 13-16. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_013.pdf.

Abstract While speech disfluencies are commonly found in every speaker's speech, stuttering is a language disorder characterized by an abnormally high rate of speech aberrations, including prolongation, cessation, and repetition of speech segments. However, despite the obvious differences between stuttered and normal speech, identifying the crucial qualities that identify stuttered speech remains a significant challenge. A story-telling task was presented to four stutterers and four non-stutterers in order to analyze the prosodic patterns that surfaced from their spontaneous narrations. Preliminary results revealed that the major difference between stutterers' and non-stutterers' disfluencies—aside from the total number—is the type of disfluency and the context affected by the disfluency. Disfluencies in both groups included prolongation, pause and cut, but stutterers' disfluencies also include repetition and combinations of the three (e.g., cut followed by pause). In addition, stutterers' disfluencies were accompanied by more prosodic irregularities (e.g. pitch accent on function words, creating a prosodic break with degraded phonetic cues) prior to the actual disfluency than non-stutterers' disfluencies, indirectly supporting the overvigilant self-monitoring hypothesis.

Keywords DiSS
Matthew P. Aylett, “Extracting the acoustic features of interruption points using non-lexical prosodic analysis,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 17-20. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_017.pdf.

Abstract Non-lexical prosodic analysis is our term for the process of extracting prosodic structure from a speech waveform without reference to the lexical contents of the speech. It has been shown that human subjects are able to perceive prosodic structure within speech without lexical cues. There is some evidence that this extends to the perception of disfluency, for example, the detection of interruption points (IPs) in low pass filtered speech samples. In this paper, we apply non-lexical prosodic analysis to a corpus of data collected for a speaker in a multi-person meeting environment. We show how non-lexical prosodic analysis can help structure corpus data of this kind, and reinforce previous findings that non-lexical acoustic cues can help detect IPs. These cues can be described by changes in amplitude and f0 after the IP and they can be related to the acoustic characteristics of hyper-articulated speech.

Keywords DiSS
Katarina Bartkova, “Prosodic cues of spontaneous speech in French,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 21-25. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_021.pdf.

Abstract Disfluencies, when present in speech signal, can make syntactic parsing difficult. This difficulty is increased when machines are involved in communication and when speech devices rely on automatic speech recognition techniques. In order to improve automatic speech parsing and thus speech comprehension, methods have been proposed to filter disfluencies out from the speech signal. Attempts have been made to use prosodic parameters to improve such a filtering. However, before introducing prosodic parameters into automatic speech recognition processes, it would be useful to investigate whether disfluencies can be characterized in a prosodic way and whether their prosodic cues would be representative enough to be used in automatic systems. The aim of this study was to examine to which extent prosodic parameters would be able to characterize disfluencies in French. Word repetitions, filled and silent pauses and speech repairs were described in a prosodic way using statistical analyses of their prosodic parameters. These analyses allowed simple prosodic rules to be formulated. The efficiency of the prosodic rules was evaluated on the task of filled pauses, word repetitions and hesitation detections.

Keywords DiSS
Philippe Boula de Mareüil, Benoît Habert, Frédérique Bénard, Martine Adda-Decker, Claude Barras, Gilles Adda, and Patrick Paroubek, “A quantitative study of disfluencies in French broadcast interviews,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 27-32. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_027.pdf.

Abstract The reported study aims at increasing our understanding of spontaneous speech-related phenomena from sibling corpora of speech and orthographic transcriptions at various levels of elaboration. It makes use of 9 hours of French broadcast interview archives, involving 10 journalists and 10 personalities from political or civil society. First we considered press-oriented transcripts, where most of the so-called disfluencies are discarded. They were then aligned with automatic transcripts, by using the LIMSI speech recogniser. This facilitated the production of exact transcripts, where all audible phenomena in non-overlapping speech segments were transcribed manually. Four types of disfluencies were distinguished: discourse markers, filled pauses, repetitions and revisions, each of which accounts for about 2% of the corpus (8% in total). They were analysed by utterance, speaker and disfluency pattern types. Four questions were raised. Where do disfluencies occur in the utterance? What is the influence of the speakers' status? And what are the most frequent disfluency patterns?

Keywords DiSS
Jean-Leon Bouraoui, and Nadine Vigouroux, “Disfluency phenomena in an apprenticeship corpus,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 33-37. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_033.pdf.

Abstract This paper presents a study carried out on an apprenticeship corpus. It features dialogues between air traffic controllers in formation and "pseudo-pilots". "Pseudo-pilots" are people (often instructors) that simulate the behavior of real pilots, in real situations. Its main specificities are the apprenticeship characteristic, and the fact that the production is subordinate to a particular phraseology. Our study is related to the many kinds of disfluency phenomena that occur in this specific corpus. We define 6 main categories of these phenomena, and take position in regard to the terminology used in literature. We then present the distribution of these categories. It appears that some of the occurrence frequencies differ largely from those observed in other studies. Our explanation is based on the corpus specificity: because of their responsibilities, both controllers and pseudo-pilots have to be especially careful about the mistakes they could make, since these could lead to some dramas. The remainder of our paper is dedicated to a more in-depth study of one disfluency class: the "false starts". A false start consists of the beginning of an utterance of a word that is not completed. We show that this category consists of several sub-categories, of which we study the distribution.

Keywords DiSS
Pierpaolo Busan, Giovanna Pelamatti, Alessandro Tavano, Michele Grassi, and Franco Fabbro, “Improvement of verbal behavior after pharmacological treatment of developmental stuttering: a case study,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 39-42. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_039.pdf.

Abstract Developmental stuttering is a disruption in normal speech fluency and rhythm. Developmental stuttering usually manifests between 6 and 9 years of age and may persist in adulthood. At present, the exact etiology of developmental stuttering is not fully clear. Besides, the dopaminergic neurological component is likely to have a causal role in the manifestation of stuttering behaviors. Actually, some studies seem to confirm the efficacy of antidopaminergic drugs (haloperidol, risperidone and olanzapine, among others) in controlling stuttering behaviors. We present a case of persistent developmental stuttering in a 24-year-old adult male who was able to control his symptoms to a significant extent after administration of risperidone, an antidopaminergic drug. Our findings show that the pharmacological intervention helped the patient improve on a set of fluency tasks but especially when the tasks involved the uttering of content words. Our results are discussed against the current theories on the cognitive and neurological basis of developmental stuttering.

Keywords DiSS
Estelle Campione, and Jean Véronis, “Pauses and hesitations in French spontaneous speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 43-46. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_043.pdf.

Abstract In traditional terminology, silent and filled pauses are grouped together, whereas hesitation lengthening is put into a separate category. However, while these various phenomena are very often associated, there have been few studies on how they interact. We analyzed an hour of spontaneous speech to show that silent and filled pauses operate in a totally different way, and that contrary to common belief, silent pauses by themselves never serve as hesitation markers, but only do so when coupled with other markers – mostly syllabic lengthening and filled pauses. These last two hesitation markers have similar acoustic and articulatory characteristics; they are also distributed and function alike.

Keywords DiSS
Maria Candea, Ioana Vasilescu, and Martine Adda-Decker, “Inter- and intra-language acoustic analysis of autonomous fillers,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 47-51. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_047.pdf.

Abstract The present work deals with autonomous fillers in a multilingual context. The question addressed here is whether fillers carry universal or language-specific characteristics. Fillers occur frequently in spontaneous speech and represent an interesting topic for improving language-specific models in automatic language processing. Most current studies focus on a few languages such as English and French. We focus here on multilingual fillers from eight languages (Arabic, Mandarin Chinese, French, German, Italian, European Portuguese, American English and Latin American Spanish). We thus propose an acoustic typology based on the vocalic peculiarities of the autonomous fillers. Three parameters are considered here: duration, pitch (F0) and timbre (F1/F2). We also compare the vocalic segments of the fillers with intra-lexical vowels possessing similar timbre. For this purpose, a preliminary study on French is described.

Keywords DiSS
Jennifer Cole, Mark Hasegawa-Johnson, Chilin Shih, Heejin Kim, Eun-Kyung Lee, Hsin-yi Lu, Yoonsook Mo, and Tae-Jin Yoon, “Prosodic parallelism as a cue to repetition and error correction disfluency,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 53-58. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_053.pdf.

Abstract Complex disfluencies that involve the repetition or correction of words are frequent in conversational speech, with repetition disfluencies alone accounting for over 20% of disfluencies. These disfluencies generally do not lead to comprehension errors for human listeners. We propose that the frequent occurrence of parallel prosodic features in the reparandum (REP) and alteration (ALT) intervals of complex disfluencies may serve as strong perceptual cues that signal the disfluency to the listener. We report results from a transcription analysis of complex disfluencies that classifies disfluent regions on the basis of prosodic factors, and preliminary evidence from F0 analysis to support our finding of prosodic parallelism.

Keywords DiSS
Andrew A. Cooper, and John T. Hale, “Promotion of disfluency in syntactic parallelism,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 59-63. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_059.pdf.

Abstract The development of a disfluency-robust speech parser requires some insight into where disfluencies occur in spontaneous spoken language. This corpus study deals with one syntactic variable which is predictive of disfluency location: syntactic parallelism. A formal definition of syntactic parallelism is used to show that syntactic parallelism is indeed predictive of disfluency.

Keywords DiSS
Rodolfo Delmonte, “Modeling conversational styles in Italian by means of overlaps,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 65-70. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_065.pdf.

Abstract Conversational styles vary remarkably across cultures: communities of speakers, rather than single speakers, seem to share turn-taking rules which do not always coincide with those shared by other communities of the same language. These rules are usually responsible for the smoothness of conversational interaction and the readiness of the attainment of communicative goals by conversants. Overlaps constitute a disruptive element in the economy of conversations: however, they show regular patterns which can be used to define conversational styles (Ford and Thompson, 1996). Overlaps constitute a challenge for any system of linguistic representations in that they cannot be treated as a one-dimensional event: in order to take into account the purport of an overlapping stretch of dialogue for the ongoing pragmatics and semantics of discourse, we have devised a new annotation schema which is then fed into the parser and produces a multidimensional linear syntactic constituency representation. This study takes a new tack on the issues raised by overlaps, both in terms of its linguistic representation and its semantic and pragmatic interpretation. It will present work carried out on the 60,000-word Italian Spontaneous Speech Corpus called AVIP, under national project API - the Italian version of MapTask, in particular the parser, to produce syntactic structures of overlapped temporally aligned turns. We will also present preliminary data from IPAR, another corpus of spontaneous dialogues run with the Spot Differences protocol. Then it will concentrate on the syntactic, semantic and prosodic aspects related to this debated issue. The paper will argue in favour of a joint and thus temporally aligned representation of overlapping material to capture all linguistic information made available by the local context.
This will result in a syntactically branching node we call OVL which contains both the overlapper's and the overlappee's material (linguistic or non-linguistic). An extended classification of the phenomenon has shown that overlaps contribute substantially to the interpretation of the local context rather than the other way around. They also determine the overall conversational style of a given community of speakers with cultural import.

Keywords DiSS
Janet Fletcher, Nicholas Evans, and Belinda Ross, “The intra-word pause and disfluency in Dalabon,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 77-81. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_077.pdf.

Abstract Earlier impressionistic analyses of Dalabon indicate that the grammatical word is often realized as either an accentual or an intonational phrase, followed by a pause. Unusually, it can also be interrupted by a silent pause, with each section being potentially (although not necessarily) realized as separate intonational phrases. Our analyses of pause duration and pause placement within grammatical words support these earlier impressions, although this use of the silent pause appears to be restricted to certain affix boundaries, and other phonological constraints relating to the following surrounding linguistic material. These interruptions also share certain characteristics of "normal" disfluencies however.

Keywords DiSS
Kristy Beers Fägersten, “Hesitations and repair in German,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 71-76. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_071.pdf.

Abstract The occurrence of pauses and hesitations in spontaneous speech has been shown to occur systematically, for example, "between sentences, after discourse markers and conjunctions and before accented content words." (Hansson [15]) This is certainly plausible in English, where pauses and hesitations can and often do occur before content words such as nominals, for example, "uh, there's a ... man." (Chafe [8]) However, if hesitations are, in fact, evidence of "deciding what to talk about next," (Chafe [8]) then the complex grammatical system of German should render this pausing position precarious, since pre-modifiers must account for the gender of the nominals they modify. In this paper, I present data to test the hypothesis that pre-nominal hesitation patterns in German are dissimilar to those in English. Hesitations in German will be shown, in fact, to occur within noun phrase units. Nevertheless, native speakers most often succeed in supplying a nominal which conforms to the gender indicated by the determiner or pre-modifier. Corrections, or repairs, of infelicitous pre-modifiers indicate that the speaker was unable to supply a nominal of the same gender which the choice of pre-modifier had committed him/her to. The frequency of such repairs is shown to vary according to task, with fewest repairs occurring in elicited speech which allows for linguistic freedom and therefore is most like spontaneous speech. The data sets indicate that among German native speakers, hesitations occurring before noun phrase units (pre-NPU hesitations) indicate deliberation of what to say, while hesitations within or before the head of the noun phrase (pre-NPH hesitations) indicate deliberation of how to say what has already been decided (cf. Chafe [8]).

Keywords DiSS
Tiit Hennoste, “Repair-initiating particles and um-s in Estonian spontaneous speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 83-88. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_083.pdf.

Abstract Particles and um-s used in spontaneous Estonian speech as initiators of different types of repair are analysed. Our model and typology of repair based on conversation analysis is introduced. Three main types of repair and particles used to initiate those are described: prepositioned self-initiated self-repair, postpositioned self-initiated self-repair (addition, substitution, insertion and abandon), and other-initiated self-repair (reformulation, clarification and misunderstanding). In conclusion, six groups of particles are brought out by the role they play in the initiation of the repair sequence. Data come from the Corpus of Spoken Estonian of the University of Tartu, which contains everyday and institutional speech, telephone and face-to-face conversations.

Keywords DiSS
Sandrine Henry, “Repeats in spontaneous spoken French: the influence of the complexity of phrases,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 89-92. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_089.pdf.

Abstract We here present the results of a descriptive study we conducted on 383 disfluent repeats from a corpus of spontaneous spoken French. We analyze noun phrases under construction and study whether there is a correlation between the frequency of the repeats and the complexity of the phrases. We then focus on complex noun phrases in order to locate the repeats precisely. We also analyze how repeats affect structures such as [Preposition + Determiner + Noun] and what the constraints upon such structures are.

Keywords DiSS
Peter Howell, and Olatunji Akande, “Simulations of the types of disfluency produced in spontaneous utterances by fluent speakers, and the change in disfluency type seen as speakers who stutter get older,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 93-98. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_093.pdf.

Abstract The EXPLAN model is implemented on a graphic simulator. It is shown that it is able to produce speech in serial order and several types of fluency failure produced by fluent speakers and speakers who stutter. A way that EXPLAN accounts for longitudinal changes in the pattern of fluency failures shown by speakers who stutter is demonstrated.

Keywords DiSS
Peter Howell, Jennifer Hayes, Ceri Savage, Jane Ladd, and Nafisa Patel, “Factors that determine the form and position of disfluencies in spontaneous utterances,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 99-102. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_099.pdf.

Abstract This presentation reviews work on types of disfluency in the spontaneous speech of fluent speakers and speakers who stutter. Examination is made of factors that determine where disfluencies are located. It is concluded that the phonological, or prosodic, word provides a good basis for explaining the distribution of different types of disfluency in spontaneous speech.

Keywords DiSS
T. Florian Jaeger, “Optional 'that' indicates production difficulty: evidence from disfluencies,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 103-108. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_103.pdf.

Abstract Optional word omission, such as that omission in complement and relative clauses, has been argued to be driven by production pressure (rather than by comprehension). One particularly strong production-driven hypothesis states that speakers insert words to buy time to alleviate production difficulties. I present evidence from the distribution of disfluencies in non-subject-extracted relative clauses arguing against this hypothesis. While word omission is driven by production difficulties, speakers may use that as a collateral signal to addressees, informing them of anticipated production difficulties. In that sense, word omission would be subject to audience design (i.e. catering to addressees' needs).

Keywords DiSS
Jumpei Kaneda, “Phrase-final rise-fall intonation and disfluency in Japanese - a preliminary study,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 109-112. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_109.pdf.

Abstract In Japanese conversations, rise-fall intonation with vowel lengthening often occurs on the final syllable of a phrase. This phrase-final rise-fall (PFRF) is a new type of intonation first reported in the 1960's. Researchers consider PFRF intonation a discourse marker which functions to sharpen the phrase boundary and retain the utterance turn, but other phrase-final intonation such as phrase-final lengthening (PFL) can have a similar pattern. PFLs are recognized as a type of disfluent speech with similar characteristics to PFRFs in terms of final-lengthening and having discourse functions. Also from reports about the spontaneity of speech, we assume that PFRFs would have a relation with disfluency, as well as with PFLs. To examine this assumption, this paper attempts to show the co-occurrence relation between PFRF and disfluency in the same utterance. The results show that PFRFs and PFLs have a relation to posterior disfluent units and suggest that both indicate speech planning strategies. Further, this paper speculates that a difference between PFRF and PFL is a difference in the purposes of speech planning: the latter represents ongoing linguistic editing while the former indicates adjusting the utterance according to the interlocutor's reaction. Disfluencies accordingly occur as effects from processes of speech planning.

Keywords DiSS
Shigeyoshi Kitazawa, “Evaluation of vowel hiatus in prosodic boundaries of Japanese,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 113-116. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_113.pdf.

Abstract We investigated V-V hiatus through J-ToBI labeling and listening to whole phrases to estimate degree of discontinuity and, if possible, to determine the exact boundary between two phrases. Appropriate boundaries were found in most cases as the maximum perceptual score. Using electroglottography (EGG) of the open quotients OQ, pitch mark and spectrogram, the acoustic phonological feature of these V-V hiatus was found as phrase-initial glottalization and phrase-final nasalization observable in EGG and spectrogram, as well as phrase-final lengthening and phrase-initial shortening of the morae. A small dip was observable at the boundary of V-V hiatus showing glottalization. The test materials are taken from the "Japanese MULTEXT", consisting of a particle - vowel (36), adjective - vowel (5), and word - word (4).

Keywords DiSS
Che-Kuang Lin, Shu-Chuan Tseng, and Lin-Shan Lee, “Important and new features with analysis for disfluency interruption point (IP) detection in spontaneous Mandarin speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 117-121. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_117.pdf.

Abstract This paper presents a whole set of new features, some duration-related and some pitch-related, to be used in disfluency interruption point (IP) detection for spontaneous Mandarin speech, considering the special linguistic characteristics of Mandarin Chinese. Decision tree is incorporated into the maximum entropy model to perform the IP detection. By examining performance degradation when each specific feature was missing from the whole set, the most important features for IP detection for each disfluency type were analyzed in detail. The experiments were conducted on the Mandarin Conversational Dialogue Corpus (MCDC) developed by the Institute of Linguistics of Academia Sinica in Taiwan.

Keywords DiSS
Tobias Lövgren, and Jan van Doorn, “Influence of manipulation of short silent pause duration on speech fluency,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 123-126. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_123.pdf.

Abstract Ordinary speech contains disfluencies in the form of hesitations and repairs. When listeners make global judgements on speech fluency they are influenced by the frequency and nature of the individual disfluencies contained in the speech. The aim of this study was to investigate a single dimension, pause duration, in the perception of speech fluency. The method involved simulation of pause duration within naturally fluent speech by manipulating existing acoustic silences in the speech. Four conditions were created: one for the natural speech and three with stepwise increases in acoustic silence durations (average x2, x4 and x7.5 respectively). In a forced choice task listeners were asked to judge the speech samples as fluent or non-fluent. The results showed that the percentage of judgements of disfluency increased as the pause durations increased, and that the differences between the unmanipulated speech condition and the two conditions with the longest pause durations were statistically significant. The results were interpreted to indicate that the individual dimension of pause duration has an independent influence on the judgement of fluency in ordinary speech.

Keywords DiSS
Elgar-Paul Magro, “Disfluency markers and their facial and gestural correlates. Preliminary observations on a dialogue in French,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 127-131. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_127.pdf.

Abstract The aim of this article is to try to establish any observable regularities between the vocal and the visual expression of disfluency markers in a French spontaneous dialogue. The data show different configurations for different types of disfluency markers. Thus "euh"s are typically accompanied by mutual eye contact and no gesture; interrupted eye contact takes place less frequently, on occasions where speech planning is more seriously impaired (syntactical disruption and combination of "euh" with other disfluency markers). False starts seem to be typically accompanied by gesture production whereas eye contact can be maintained if the speaker relies or not on the listener to resolve the speech production problem. The article takes up the idea that disfluency markers can be classified along a continuum throughout the speech formulation process, going from the most discreet to the most prominent. It suggests that the more prominent the disfluency, the more likely is the visual channel to play a role (interrupted eye contact and gesture production).

Keywords DiSS
Jan McAllister, and Mary Kingston, “Characteristics of final part-word repetitions,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 7-11. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_007.pdf.

Abstract In an earlier paper, we have described final part-word repetitions in the conversational speech of two school-age boys of normal intelligence with no known neurological lesions. In this paper we explore in more detail the phonetic and linguistic characteristics of the speech of the boys. The repeated word fragments were more likely to be preceded by a pause than followed by one. The word immediately following the fragment tended to have a higher word frequency score than other surrounding words. Utterances containing the disfluencies typically contained a greater number of syllables than those that did not; however, there was no reliable difference between fluent and disfluent utterances in terms of their grammatical complexity.

Keywords DiSS
Hannele Nicholson, Ellen Gurman Bard, Robin Lickley, Anne H. Anderson, Catriona Havard, and Yiya Chen, “Disfluency and behaviour in dialogue: evidence from eye-gaze,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 133-138. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_133.pdf.

Abstract Previous research on disfluency types has focused on their distinct cognitive causes, prosodic patterns, or effects on the listener. This paper seeks to add to this taxonomy by providing a psycholinguistic account of the dialogue and gaze behaviour speakers engage in when they make certain types of disfluency. Dialogues came from a version of the Map Task, [2, 4], in which 36 normal adult speakers each participated in six dialogues across which feedback modality and time-pressure were counter-balanced. In this paper, we ask whether disfluency, both generally and type-specifically, was associated with speaker attention to the listener. We show that certain disfluency types can be linked to particular dialogue goals, depending on whether the speaker had attended to listener feedback. The results shed light on the general cognitive causes of disfluency and suggest that it will be possible to predict the types of disfluency which will accompany particular behaviours.

Keywords DiSS
Sieb Nooteboom, “Lexical bias re-re-visited. Some further data on its possible cause,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 139-144. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_139.pdf.

Abstract This paper describes an experiment eliciting spoonerisms by using the so-called SLIP technique. The purpose of the experiment was to provide a further test of the hypothesis that self-monitoring of inner speech is a major source of lexical bias. This is a follow-up on an earlier experiment in which subjects were explicitly prompted after each response to make a correction in case of a speech error. In the current experiment both the prompt and the extra time for correction were left out, and there was no strong time pressure for the subject in giving his response. It is shown that under these conditions many primed-for spoonerisms are replaced by other, mostly lexical, errors. These 'replacing' or 'secondary' errors are more frequent in the condition priming for nonword-nonword errors than in the condition priming for word-word errors. Response times obtained for replacing errors are considerably and significantly longer than response times for overtly interrupted errors, and also longer than response times for the primed-for spoonerisms. This suggests that a time-consuming operation follows the primed-for spoonerisms in inner speech, and replaces those with other speech errors, often to preserve lexicality of the error.

Keywords DiSS
Berthille Pallaud, “The re-adjustment of word-fragments in spontaneous spoken French,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 145-149. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_145.pdf.

Abstract A study of word-fragments in spoken French has been undertaken for a few years on the basis of non directive talks corpora recorded and transcribed according to GARS' conventions (DELIC currently). These disfluencies are often analyzed within the framework of disfluent repetitions. The observations made on these two types of disfluencies led us to distinguish them. The aim of our study is to describe on the one hand insertions which take place in relation to the word interruptions and their re-adjustment, and on the other hand, to specify the types and localizations of retracing which follow these interruptions. Two kinds of incidental clauses were observed at the time of the readjustments which follow these disturbances. Some (the more numerous) are syntactically linked to the fragment or to its retracing; others are not. Moreover, the word-fragments which will be modified are the only ones to be dependent on the type of localization. For the others, this localization does not make it possible to predict the category of interruption (complemented or unfinished). Our results on word-fragments confirm, however, that in contemporary French, the retracing at the head of the nominal or verbal group which contains the disfluency remains the simplest example (and at the same time the most frequent [5]). Nevertheless, a third of the retracing either does not go back to the beginning of the group, or exceeds it.

Keywords DiSS
Myriam Piccaluga, Jean-Luc Nespoulous, and Bernard Harmegnies, “Disfluencies as a window on cognitive processing. An analysis of silent pauses in simultaneous interpreting,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 151-155. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_151.pdf.

Abstract The paper focuses on silent pauses observed in the productions of subjects involved in simultaneous interpreting tasks. Four bilingual subjects with various degrees of expertise in interpreting and various degrees of mastery of the languages involved (French and Spanish) were recorded while interpreting utterances from French and Spanish talks. The source discourses had been perturbed by changes both in speech rate (by time compression) and in auditory quality (by addition of an interfering noise). On the basis of acoustic analyses performed on the subjects' productions, statistical analyses focus both on the number and on the duration of the observed pauses. This double approach enables investigation of the kind of cognitive disturbances caused by the independent variables and allows further speculation on the semiology of pause durations.

Keywords DiSS
Melanie Soderstrom, and James L. Morgan, “Disfluency in speech input to infants? The interaction of mother and child to create error-free speech input for language acquisition,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 157-162. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_157.pdf.

Abstract One characteristic of infant-directed speech is that it is highly fluent compared with adult-directed speech. However, the speech that infants hear still contains disfluencies. Such disfluencies might potentially cause problems for infants during language development. We first analyzed samples of spontaneous speech in the presence of infants (both adult- and infant-directed) and found that under ideal circumstances the speech infants hear is highly fluent. Under less than ideal circumstances infants hear much more highly disfluent speech - however this disfluent speech is almost entirely adult-directed. While grammatically ill-formed, the prosodic structure of these disfluencies might signal their ill-formedness to the infants. In a preference experiment, 10 month olds listened longer to infant-directed speech samples containing prosodic disfluencies than to equated samples without disfluency. However, this effect was found in only one of two counterbalancing groups. Using adult ratings of low-pass versions of these speech samples, we found that infants' preferences were correlated with the adults' perception of the relative disfluency of the samples. A follow-up experiment using adult-directed disfluencies found that while the 10 month olds showed no differences in their listening preferences, older infants preferred to listen to the fluent speech. These results suggest that younger and older infants attend differently to infant and adult-directed speech, and that older infants may be able to differentiate grammatical adult-directed input from input distorted by disfluency. We discuss implications of these findings for language acquisition.

Keywords DiSS
Ellen Thompson, “A cross-linguistic look at VP-ellipsis and verbal speech errors,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 163-164. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_163.pdf.

Abstract This paper argues that consideration of spontaneous speech errors provides insight into cross-linguistic analyses of syntactic phenomena. In particular, I claim that differences in the distribution of non-parallel VP-Ellipsis constructions in English and German, as well as variation in the spontaneously-occurring verbal speech errors, is explained by a parametric analysis of variation in the inflectional systems of the two languages.

Keywords DiSS
Doroteo T. Toledano, Antonio Moreno Sandoval, José Colás Pasamontes, and Javier Garrido Salas, “Acoustic-phonetic decoding of different types of spontaneous speech in Spanish,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 165-168. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_165.pdf.

Abstract This paper presents preliminary acoustic-phonetic decoding results for Spanish on the spontaneous speech corpus C-ORAL-ROM. These results are compared with results on the read speech corpus ALBAYZIN. We also compare the decoding results obtained with the different types of spontaneous speech in C-ORAL-ROM. As the most important conclusions, the experiments show that the type of spontaneous speech has a deep impact on spontaneous speech recognition results. Best speech recognition results are those obtained on speech captured from the media.

Keywords DiSS
Michiko Watanabe, Yasuharu Den, Keikichi Hirose, and Nobuaki Minematsu, “The effects of filled pauses on native and non-native listeners' speech processing,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 169-172. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_169.pdf.

Abstract Everyday speech is abundant with disfluencies. However, little is known about their roles in speech communication. We examined the effects of filled pauses at phrase boundaries on native and non-native listeners in Japanese. A study of a spontaneous speech corpus showed that filled pauses tended to precede relatively long and complex constituents. We tested the hypothesis that filled pauses bias listeners' expectations about the upcoming phrase toward a longer and more complex one. In the experiment participants were presented with two shapes at one time, one simple and the other compound. Their task was to identify the one that they heard described as soon as possible. The speech stimuli involved two factors: complexity and fluency. For the complexity factor, half of the speech stimuli described compound shapes with long and complex phrases and the other half described simple shapes with short and simple phrases. For the fluency factor, phrases describing a shape had a preceding filled pause, a preceding silent pause of the same length, or no preceding pause. The results of the experiments with both native and non-native listeners showed that response times to the complex phrases were significantly shorter after filled or silent pauses than when there was no pause. In contrast, there was no significant difference between the three conditions for the simple phrases, supporting the hypothesis.

Keywords DiSS
Yelena Yasinnik, Stefanie Shattuck-Hufnagel, and Nanette Veilleux, “Gesture marking of disfluencies in spontaneous speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 173-178. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_173.pdf.

Abstract Speakers effectively use both visual and acoustic cues to convey information in speech. While earlier research has concentrated on the association of visual cues (provided by gestures) with fluent prosodic structure, this study looks at the relationship between visual cues, prosodic markers and spoken disfluencies. Preliminary results suggested that speakers preferentially perform gestures in the eye region in spoken disfluencies, but a more careful frame-by-frame analysis capturing all gestures revealed that movements of the eye region (blinks, frowns, eyebrow raises and changes in direction of eyegaze) occur with high frequency in both fluent and non-fluent speech. The paper describes a method for frame-by-frame labelling of speech-accompanying gestures for a speech sample, whose output can then be combined with independently derived labels of the prosody. Initial analysis of 3 minute samples from two speakers reveals that one speaker produces eye movements in association with disfluencies and the other does not, and that this tendency does not result from alignment of brow gestures with pitch accents.

Keywords DiSS
Yuan Zhao, and Dan Jurafsky, “A preliminary study of Mandarin filled pauses,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 179-182. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_179.pdf.

Abstract The paper reports preliminary results on Mandarin filled pauses (FPs), based on a large speech corpus of Mandarin telephone conversation. We find that Mandarin intensively uses both demonstratives (zhege 'this', nage 'that') and uh/mm as FPs. Demonstratives are more frequent FPs and are more likely to be surrounded by other types of disfluency phenomena than uh/mm, as well as occurring more often in nominal environments. We also find durational differences: FP demonstratives are longer than non-FP demonstratives, and mm is longer than uh. The study also revealed dialectal influence on the use of FPs. Our results agree with earlier work which shows that a language may divide conversational labor among different FPs. Our work also extends this research in suggesting that different languages may assign conversational functions to FPs in different ways.

Keywords DiSS

2003

Martine Adda-Decker, Benoît Habert, Claude Barras, Gilles Adda, Philippe Boula de Mareuil, and Patrick Paroubek, “A disfluency study for cleaning spontaneous speech automatic transcripts and improving speech language models,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 67-70. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_067.pdf.

Abstract The aim of this study is to elaborate a disfluent speech model by comparing different types of audio transcripts. The study makes use of 10 hours of French radio interview archives, involving journalists and personalities from political or civil society. A first type of transcripts is press-oriented where most disfluencies are discarded. For 10% of the corpus, we produced exact audio transcripts: all audible phenomena and overlapping speech segments are transcribed manually. In these transcripts about 14% of the words correspond to disfluencies and discourse markers. The audio corpus has then been transcribed using the LIMSI speech recognizer. With 8% of the corpus the disfluency words explain 12% of the overall error rate. This shows that disfluencies have no major effect on neighboring speech segments. Restarts are the most error prone, with a 36.9% within class error rate.

Keywords DiSS
Matthew P. Aylett, “Disfluency and speech recognition profile factors,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 51-54. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_051.pdf.

Abstract This paper reports on work bringing together disfluency coding carried out by Lickley [1] and recognition work carried out as part of the ERF project (Bard, Thompson & Isard, [2]) at Edinburgh University. A set of factors are investigated which characterise the behaviour of the ASR during recognition based on an analysis of the resulting word lattice. These factors can be grouped as: Entropy Factors - the entropy of the acoustic and language model likelihoods, within the word lattice, over a 10 ms frame, and, Arc Factors - the number of non-unique and unique arcs in the word lattice in any given 10 ms time frame, together with the variance of start and end times of these arcs, and the number of arcs starting or ending in the frame. The values of all factors were used to train a simple CART model. The CART model was used to predict: recognition failure, interruption point location (the point where a disfluency begins), and whether the location was in a repair or a reparandum. The entropy of the language model values contributed most to the model's prediction of recognition failure, and whether a frame was in a repair or reparandum. In contrast, the number of unique word hypotheses contributed most to the successful prediction of a frame being close to an interruption point.

Keywords DiSS
Ramona Benkenstein, and Adrian P. Simpson, “Phonetic correlates of self-repair involving word repetition in German spontaneous speech,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 81-84. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_081.pdf.

Abstract A phonetic description of self-initiated self-repair sequences involving the repetition of words in German spontaneous speech is presented. Data are drawn from the Kiel Corpus of Spontaneous Speech. The description is primarily impressionistic auditory, but it also employs acoustic records to verify and objectify the impressionistic findings. A number of different patterns around cut-off are identified. The comparison of phonetic differences between reparandum and repair tokens is used to argue that repair sequences can also provide an interesting insight into the way in which fluent stretches of spontaneous speech are phonetically organized.

Keywords DiSS
Yasuharu Den, “Some strategies in prolonging speech segments in spontaneous Japanese,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 87-90. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_087.pdf.

Abstract In this paper, we investigate segmental prolongation in a corpus of spontaneous Japanese monologues consisting of over 700,000 words. We examine effects on the rate of prolongation of various factors including speech types, the genders of speakers, word classes, word positions in the phrase and in the inter-pausal unit, and the presence of preceding fillers. Based on the empirical findings, we state some strategies in prolonging speech segments used by Japanese speakers.

Keywords DiSS
Sheena Finlayson, Victoria Forrest, Robin Lickley, and Janet Mackenzie Beck, “Effects of the restriction of hand gestures on disfluency,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 21-24. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_021.pdf.

Abstract This paper describes an experimental pilot study of disfluency and gesture rates in spontaneous speech where speakers perform a communication task in three conditions: hands free, one arm immobilized, both arms immobilized. Previous work suggests that the restriction of the ability to gesture can have an impact on the fluency of speech. In particular, it has been found that the inability to produce iconic gestures, which depict actions and objects, results in a higher rate of disfluency. Models of speech production account for this by suggesting that gesture and speech production are part of the same integrated system. Such models differ in their interpretation of the location of the gesture planning mechanism in relation to the speech model: some authors suggest that iconic gestures relate closely to lexical access, while others suggest that the link is located around the conceptualization stage. The findings of this study tentatively confirm that there is a relationship between gesture and fluency - overall, disfluency increases as gesture is restricted. But it remains unclear whether the disfluency is more related to lexical access than to conceptualization. Proposals for a larger study are suggested. The work is of interest to psycholinguists focusing on the integration of gesture into models of speech production and to Speech and Language Therapists who need to know about the impact that an impaired ability to produce gestures may have on communication.

Keywords DiSS
Kotaro Funakoshi, and Takenobu Tokunaga, “Evaluation of a robust parser for spoken Japanese,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 55-58. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_055.pdf.

Abstract We implemented a parser designed to handle ill-formedness in Japanese speech. The parser was evaluated by utilizing newly collected speech data, which was obtained from an experiment designed to produce ill-formed data effectively. Introducing the proposed method increased the number of correctly analyzed utterances from 171 to 322, from among 532 utterances in the corpus.

Keywords DiSS
Robert J. Hartsuiker, Martin Corley, Robin Lickley, and Melanie Russell, “Perception of disfluency in people who stutter and people who do not stutter: Results from magnitude estimation,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 35-37. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_035.pdf.

Abstract Recent accounts of stuttering consider disfluencies the result of an interaction between speech planning and self-monitoring, emphasizing the continuity between errors made in everyday speech and those made by people who stutter. On Vasić & Wijnen's account, the monitor is hypervigilant for upcoming problems and interrupts and restarts the speech signal, resulting in disfluent speech. Crucially, on this account, self-monitoring is a perceptual function. Therefore, this account makes two predictions: (1) people who stutter are also hypervigilant in perceiving another person's speech; (2) the quality of disfluencies made by people who stutter and those who do not will be comparable. We tested these hypotheses using a magnitude estimation judgment task. Twenty participants who stutter and 20 controls were asked to rate the fluency of excerpted fluent and disfluent fragments from recorded dialogues, either between people who stutter or between non-stutterers. In line with the first hypothesis, people who stutter tended to rate all fragments as more disfluent than controls did. However the second hypothesis was not confirmed: across judges, fluent and disfluent fragments excerpted from recordings of people who stutter were rated as less fluent than those excerpted from control dialogues, suggesting that there are perceptually relevant differences between the speech of PWS and PWDNS, independent of number and type of disfluencies.

Keywords DiSS
Sandrine Henry, and Berthille Pallaud, “Word fragments and repeats in spontaneous spoken French,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 77-80. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_077.pdf.

Abstract This paper presents the results of a study conducted on the interaction of two disfluencies: repeats and word fragments. It is based on 150 repeated word fragments (e.g., "on le re- re- revendique encore une fois") extracted from a one-million-word corpus of spoken French. Word fragments such as: "notre metier spé- spécifique", are, like repeats (e.g., "vous avez évalué le le montant des dégâts"), very frequent events in spoken language: on average, there is 1 word fragment every 50 seconds, 1 repeat every 17 seconds. Speakers and listeners alike are generally unaware of these phenomena as if they were not part of the communication process. They seldom trigger a metalinguistic reaction from the speaker and are even more rarely acknowledged by the listener. These phenomena have sometimes been interpreted as 'errors' in the communication process, like slips of the tongue. Word fragments and repeats encompass different categories of phenomena, and this enables us to define them as a heterogeneous group ruled by different types of constraints and mechanisms. This analysis rests on the following criteria: structural aspects of the repeat, types of word fragments, morphological and syntactic aspects. Analyses of these repeats of identical word fragments from two different angles - that of the repeats and then that of the word fragments - confirm the relevance of the distinction between these two types of disfluencies.

Keywords DiSS
Peter Howell, “Is a perceptual monitor needed to explain how speech errors are repaired?,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 31-34. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_031.pdf.

Abstract Kolk & Postma [2] proposed, following Dell & O'Seaghdha [1], that when a speaker chooses a word, phonologically-related words as well as the intended word are activated. Initially, the activations of all these words are similar, though eventually the intended word reaches a higher asymptotic value when activation is complete [1]. According to Kolk & Postma [2], if a response is made in the phase where activation is building up (rather than at full activation), there is a higher chance of the competing, rather than the intended, word being selected (i.e. an error). They propose that a speaker detects such errors when they are produced overtly using the perceptual system, and a monitor in the linguistic system responds by interrupting and initiating the correction [2]. Word repetition and hesitation (not errors in themselves) have been regarded as signifying underlying errors that are detected and interrupted before speech is output in a similar way to overt errors. An assumption in [2] is that activation for a word stops (or, if it continues, is ignored) immediately a candidate word is selected. The brain processes responsible for speech production have massive parallel capacity. Consequently, activation for all the candidates for a word slot could continue beyond the point where a word is selected in cases where a word is responded to prematurely. When the selected word reaches asymptote, the relative activations of this and the other candidate words indicate when an error has occurred (when the selected word has a lower activation than one of the competing words), and what correction is appropriate (the word with the highest activation). This provides the basis for error detection and correction without the need for a perceptual monitor. Continuing the buildup of activation after a word has been selected implies that activation of nearby words in its phrase overlaps. It is shown, with some realistic assumptions about how activation builds up and decays across different words in a phrase, that this model predicts word repetition and hesitation and also part-word disfluencies (a characteristic of stuttering), again without the need for a perceptual monitor.

Keywords DiSS
Kim Kirsner, John Dunn, and Kathryn Hird, “Fluency: Time for a Paradigm Shift,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 13-16. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_013.pdf.

Abstract Pauses in spontaneous speaking constitute a rich source of data for several disciplines. They have been used to enhance automatic segmentation of speech, classification of patients with acquired communication disorders, the design of psycholinguistic models of speaking, and the analysis of psychological disorders. Unfortunately, however, although pause analysis has been with us for more than 40 years, their interpretation has been compromised by several problems [1]. The first problem is that the pause distribution is skewed, making mean duration a poor measure of central tendency. The second problem is that there are at least two components to the pause duration distribution, a problem that has been confounded by the fact that most authors have assumed that short pauses can be ignored. The third problem is that many scholars have used an arbitrary criterion to separate the pause components thereby adopting statistics that reflect errors of commission or omission. In this paper we review recent work that resolves each of these issues and illustrates the application of the new paradigm to a variety of problems. Our research indicates that, first, there are at least two pause duration distributions, each of which may be sensitive to theoretically interesting variables; second, the distributions are log-normal, thereby opening the way to appropriate measures of central tendency and dispersion, and, third, the distributions can be reliably separated by application of signal detection theory, and the proportion of misclassifications minimised and estimated. This paper reviews recent research using the new approach to pause analysis.
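
Editorial illustration (not part of the paper): the abstract's point that a skewed, log-normal distribution makes the arithmetic mean a poor summary can be sketched as follows. The function name `lognormal_summary` and the pause durations are hypothetical, chosen only for illustration; nothing here reproduces the authors' procedure or data.

```python
import math
import statistics


def lognormal_summary(durations_ms):
    """Summarise positive durations on the log scale.

    Returns (geometric_mean, geometric_sd), the natural measures of
    central tendency and dispersion if the data are log-normal.
    """
    logs = [math.log(d) for d in durations_ms]
    mu = statistics.fmean(logs)      # mean of log-durations
    sigma = statistics.stdev(logs)   # sample SD of log-durations
    return math.exp(mu), math.exp(sigma)


# Made-up pause durations in milliseconds (assumption, not corpus data).
pauses_ms = [10.0, 1000.0]
geo_mean, geo_sd = lognormal_summary(pauses_ms)
arith_mean = statistics.fmean(pauses_ms)
print(f"arithmetic mean: {arith_mean:.1f} ms, geometric mean: {geo_mean:.1f} ms")
```

With skewed data the arithmetic mean is pulled toward the long pauses, while the geometric mean stays nearer the bulk of the distribution.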

Keywords DiSS
Torbjörn Lager, “In dialogue with a desktop calculator: A concurrent stream processing approach to building simple conversational agents,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 59-62. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_059.pdf.

Abstract Human spontaneous face-to-face conversations are characterized by phenomena such as turn-taking, feedback, sounds of hesitation and repairs. A simple and highly modular stream-based approach to natural language processing is proposed that attempts to deal with such things. A basic version of the model has been implemented in the Oz programming language.

Keywords DiSS
Piroska Lendvai, Antal van den Bosch, and Emiel Krahmer, “Memory-based disfluency chunking,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 63-66. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_063.pdf.

Abstract We investigate the feasibility of machine learning in automatic detection of disfluencies in a large syntactically annotated corpus of spontaneous spoken Dutch. We define disfluencies as chunks that do not fit under the syntactic tree of a sentence (including fragmented words, laughter, self-corrections, repetitions, abandoned constituents, hesitations and filled pauses). We use a memory-based learning algorithm for detecting disfluent chunks, on the basis of a relatively small set of low-level features, keeping track of the local context of the focus word and of potential overlaps between words in this context. We use attenuation to deal with sparse data and show that this leads to a slight improvement of the results and more efficient experiments. We perform a search for the optimal settings of the learning algorithm, which yields an accuracy of 97% and an F-score of 80%. This is a significant improvement of the baselines and of the results obtained with the default settings of the learner.
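
Editorial illustration (not part of the paper): accuracy and F-score can diverge as in the figures above when the target class is rare, because accuracy also credits the many correctly labelled fluent words. The sketch below uses hypothetical confusion-matrix counts, picked only so that the arithmetic lands on similar values; they are not the paper's data, and the function name `accuracy_and_f1` is an assumption.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Compute accuracy and F1 for the positive (disfluent) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Hypothetical counts for 1000 words, 75 of them truly disfluent.
acc, f1 = accuracy_and_f1(tp=60, fp=15, fn=15, tn=910)
print(f"accuracy = {acc:.2f}, F1 = {f1:.2f}")
```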

Keywords DiSS
Krisztina Menyhárt, “Age-dependent types and frequency of disfluencies,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 45-48. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_045.pdf.

Abstract The age-dependent changes of one's speech production from childhood up to old age are relatively well known. However, there has been less research conducted concerning the possible alterations of the disfluency phenomena in speakers' spontaneous speech determined by age. Our hypothesis is that permanent changes are going on in the operation of speech production processes from early childhood up to old age, and that those changes can be studied via observing disfluency phenomena. A series of experiments has been carried out with the participation of altogether 30 Hungarian-speaking persons, children, middle-aged adults and old subjects (ages of 77). Their spontaneous speech was recorded and analyzed concerning the articulation and speech tempi, silent and filled pauses, as well as other disfluency phenomena (like false starts, repetitions, slips, etc.). The aim of the research is to explore the invariant and variable factors of the disfluencies depending on age. The results highlight also the individual differences that seem to be independent of the age factor.

Keywords DiSS
Hannele Nicholson, Ellen Gurman Bard, Robin Lickley, Anne H. Anderson, Jim Mullin, David Kenicer, and Lucy Smallwood, “The intentionality of disfluency: Findings from feedback and timing,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 17-20. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_017.pdf.

Abstract This paper addresses the causes of disfluency. Disfluency has been described as a strategic device for intentionally signalling to an interlocutor that the speaker is committed to an utterance under construction. It is also described as an automatic effect of cognitive burdens, particularly of managing speech production during other tasks. To assess these claims, we used a version of the map task and tested 24 normal adult subjects in a baseline untimed monologue condition against conditions adding either feedback in the form of an indication of a supposed listener's gaze, or time-pressure, or both. Both feedback and time-pressure affected the nature of the speaker's performance overall. Disfluency rate increased when feedback was available, as the strategic view predicts, but only deletion disfluencies showed a significant effect of this manipulation. Both the nature of the deletion disfluencies in the current task and of the information which the speaker would need to acquire in order to use them appropriately suggest ways of refining the strategic view of disfluency.

Keywords DiSS
Sieb G. Nooteboom, “Self-monitoring is the main cause of lexical bias in phonological speech errors,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 27-30. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_027.pdf.

Abstract In this paper I present new evidence, stemming both from an experiment and from spontaneous speech, demonstrating that (a) lexical bias is caused by self-monitoring of inner speech, as proposed by Levelt et al. [1], and (b) that there is phoneme-to-word feedback in the mental programming of speech, as supposed by Dell [2] and Stemberger [3]. It is argued here that possibly phoneme-to-word feedback is an unavoidable side-effect of self-monitoring of inner speech.

Keywords DiSS
Caroline L. Rieger, “Disfluencies and hesitation strategies in oral L2 tests,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 41-44. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_041.pdf.

Abstract This paper presents an investigation of hesitation strategies of intermediate learners of German as a second or foreign language (L2) when they take part in oral L2 tests. Previous studies of L2 hesitation strategies have focused on beginning and advanced L2 learners. They found that beginners tend to leave their hesitation pauses unfilled making their speech highly disfluent [17], while advanced L2 speakers - similar to native speakers - use a variety of fillers. In oral L2 tests, intermediate learners hesitate mainly for two reasons: to search for a German word or structure, or to think about the content of their utterance. Some participants use a variety of strategies to signal to the addressee that they are hesitating. This variety is not as rich as it is for advanced L2 learners or native speakers. Other participants leave their hesitation pauses unfilled or rely on quasi-lexical fillers to hold the floor when hesitating.

Keywords DiSS
Guergana Savova, and Joan Bachenko, “Prosodic features of four types of disfluencies,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 91-94. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_091.pdf.

Abstract We present a corpus-based approach for using intonation and duration to detect disfluency sites. The questions we aim to answer are: what are the prosodic cues for each disfluency type? Can predictive models be built to describe the relationship between disfluency types and prosodic cues? Are there correlations between the reparandum onset and offset and the repair onset and offset? Is there a general prosodic strategy? Our findings support four main hypotheses: 1) The Combination Rule: A single prosodic feature does not uniquely identify disfluencies or their types. Rather, it is a combination of several features that signals each type. 2) The Compensatory Rule: If there is an overlap of one prosodic feature, then another cue neutralizes the overlap. 3) The Discourse Type Rule: Prosodic cues for disfluencies vary according to discourse type. 4) The Expanded Reset Rule: Repair onsets are dependent on reparandum onsets and reparandum offsets. The limitation of the current study is the relatively small corpus size. Further testing of our proposed hypotheses is needed.

Keywords DiSS
Shu-Chuan Tseng, “Repairs and repetitions in spontaneous Mandarin,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 73-76. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_073.pdf.

Abstract 246 overt repairs, 653 complete repetitions and 475 partial repetitions were identified in an annotated corpus of spontaneous Mandarin conversations. On the basis of the data, this paper investigates Mandarin repairs and repetitions by segmenting them into the reparandum part, the editing part and the reparans part and by tagging them using the CKIP automatic word segmentation and tagging system. Results of the use of editing term, the distribution of part of speech and syllables in the reparandum are presented. Semantic differences and similarity in the discrepancy of tagging results of the reparandum and the reparans are also discussed.

Keywords DiSS
Fan Yang, Peter A. Heeman, and Susan E. Strayer, “Acoustically verifying speech repair annotations,” in Disfluency in Spontaneous Speech (DiSS '03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 97-100. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_097.pdf.

Abstract Identifying speech repairs is a critical part of annotating spontaneous speech. DialogueView is an annotation tool that provides visual and audio supports for directly annotating speech repairs. In this paper, we report the usability of clean play, a special feature implemented in DialogueView, which cuts out the annotated reparanda and editing terms and plays the remaining speech. We find that although clean play does not help users detect repairs, it does help them determine the extent of repairs. We also find that clean play improves users' confidence because they have another way to verify their annotations.

Keywords DiSS

2001

Laura Abou-Haidar, “Pauses in speech by French speakers with Down Syndrome,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 33-36. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_033.pdf.

Abstract A better understanding of the control mechanisms of speech in verbal interaction is very important for the evaluation of the pragmatic competence of a mentally deficient speaker. This study focuses on pauses in the oral production of a Speaker with Down syndrome involved in a conversation: it brings to light the temporal compensation mechanisms which allow the speaker to go beyond the distortions of the segmental level. It confirms the important role of prosody in the success of a conversation, particularly with a speaker who has a handicap which disrupts language structure. Down Syndrome is a condition characterised by an overall delay in cognitive, social, linguistic and motor development. At the oral production level, it leads to deficits in segmental and supra-segmental speech patterning. The goal of this study is to bring elements of response to the following question: is the pragmatic function of language preserved in spite of significant distortions of the motor functions of the phonatory organs? The description of the management of pauses by a speaker with Down syndrome involved in a conversation makes it possible to clarify this subject, while taking into account the various functions which are specific to them beyond the respiratory function: their role in encoding, in the delimitation of syntactic boundaries, and in the regulation of speaking turns, among others. This study allowed us to define criteria which make it possible to characterise the oral production of a Speaker with Down syndrome. These elements relate to the variation of the frequency and the length of pauses. The results obtained are the following: 1. a high frequency of occurrence of pauses in the production of the trisomic speaker; 2. a frequency of occurrence of "mixed pauses", of which the majority have very long lengths, this element revealing a lack of ease and disfluency on the production level; 3. a significant recourse to false-starts, hesitation, repetition and lengthening, to mark sound pauses; 4. a considerable number of very long pauses; 5. a relatively high number of pauses located at the boundaries of or within syntagms, with rather long lengths of intra-syntagmatic uses. We furthermore noted a rarity of long phonic sequences in the speaker with Down syndrome, these sequences seldom exceeding 2000 ms. In spite of these results, it is important to note that we have defined parameters which show that the speaker with Down syndrome integrated rules relating to the management of pauses in verbal interaction.

Keywords DiSS
Karl G. D. Bailey, and Fernanda Ferreira, “Do non-word disfluencies affect syntactic parsing?,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 61-64. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_061.pdf.

Abstract Although disfluencies such as uh are generally not treated as linguistic items, our results suggest that they can affect syntactic parsing. Using a grammaticality judgment task, we demonstrate that disfluencies are able to affect the syntactic parse of a sentence in two ways. First, disfluencies can make syntactic reanalysis more difficult by coming between an ambiguous constituent and a disambiguating item. Second, the pattern of disfluencies in spontaneous speech may be used by the listener to guide the parse of a sentence. Thus, although disfluencies have often been viewed as pragmatic phenomena, they can affect the language comprehension by influencing its parsing procedures.

Keywords DiSS
Ellen G. Bard, Robin J. Lickley, and Matthew P. Aylett, “Is disfluency just difficulty?,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 97-100. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_097.pdf.

Abstract The question addressed by this paper is whether disfluency resembles Inter-Move Interval, a measure of reaction time in conversation, in displaying effects of the overall difficulty of conducting a coherent conversation. Five sources of difficulty are considered as potential causes of disfluency: planning and producing an utterance, comprehending the prior utterance, performing a communicative task, order effects, and interpersonal factors. A multiple regression analysis on simple disfluencies in the HCRC Map Task Corpus shows that planning and production make the major independent contribution to predicting the rate of disfluencies, with interpersonal variables and position in dialogue also contributing significantly. Notably, comprehension variables did not affect either the total rate of disfluency or the rate of individual kinds of disfluencies.

Keywords DiSS
Jeanne-Marie Debaisieux, and José Deulofeu, “Grammatically unacceptable utterances are communicatively accepted by native speakers, why are they?,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 69-72. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_069.pdf.

Abstract This paper aims at redefining the generally accepted notion of unfinished or elliptic sentence, which appears to be crucial in defining in turn the notion of fluency itself. It will be shown that a large part of utterances which a regularly trained linguist would consider as unacceptable and revealing some kind of disfluency of the speaker who produced them, are in fact fully accepted by the participants of a regular verbal interaction. This apparent contradiction will be explained by the fact that linguists base their judgments of well formedness of the utterances on their grammatical structure, whereas speakers interact basically by means of communicative units, which are not necessarily made up of grammatically well formed parts.

Keywords DiSS
Yasuharu Den, “Are word repetitions really intended by the speaker?,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 25-28. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_025.pdf.

Abstract This paper compares, using our Japanese data, word repetitions with error repairs in terms of their temporal structures in order to examine whether or not the prolongation of first tokens in word repetitions, observed by Den and Clark (2000), is really an effect of the speaker's strategy. Analyses of 10 task-oriented Japanese dialogues reveal a difference between word repetitions and error repairs for the data involving cut-off in first tokens; in both types of disfluencies, the final phoneme of the first token is considerably prolonged, but the degree of the prolongation is much greater in word repetitions than in error repairs. These results support our view that prolonged first tokens in word repetitions are a product of a process under the speaker's control or intention.

Keywords DiSS
Danielle Duez, “Acoustico-phonetic characteristics of filled pauses in spontaneous French speech: preliminary results,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 41-44. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_041.pdf.

Abstract In the current analysis we examined the acoustic and phonetic characteristics of filled pauses in spontaneous French speech and their relationship to the prosody of the surrounding context. Two main results emerged: 1) There was no effect of the duration of filled pauses or their sentence location on their F0 patterns or on the differences between the highest and lowest values. 2) There was no relationship between peak-F0 values and the F0 values of filled-pause onsets, but the F0 values of filled-pause onsets and the F0-values of non-marked breath-group onsets were highly similar. The F0 values of filled-pause onsets seem to be stable within the same speaker's speech. They are speaker-dependent and strongly linked to the physiological, absolute aspects of speech production. It is assumed that filled-pause onset may be used by listeners as a reference for evaluating the speaker's pitch range.

Keywords DiSS
Robert Eklund, “Prolongations: A dark horse in the disfluency stable,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 5-8. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_005.pdf.

Abstract This paper studies a specific type of disfluency, viz. segment prolongation (PR), i.e., the "stretching out" of speech sounds as a means of hesitation. It is shown that the occurrence of PRs varies as a function of phone type, position in the word, lexical factors and word class, and that PRs are subject to phonotactic constraints in Swedish. A comparison between Swedish and Tok Pisin suggests that there are language-specific traits associated with PR production.

Keywords DiSS
Mária Gósy, “The double function of disfluency phenomena in spontaneous speech,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 57-60. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_057.pdf.

Abstract Disfluency in spontaneous speech is the outcome of a speaker's indecision about what to say next. The listener, however, is continuously adapted to both the language signals and the types of disfluency of the heard text. What is in the background of this adaptation process? This paper analyses the types and characteristics of the disfluency phenomena of a 78-minute spontaneous speech sample (produced by 10 adults). The author's intention is to explain the characteristics of disharmony between speech planning and articulation within the speech production process. In order to explain the hypothesized double function of disfluency in terms of perceptual necessity from the listener's side various experiments have been carried out. Three different samples of spontaneous speech have been selected for experimental purposes. Three groups of listeners (altogether 60 university students) participated in the experiments. One of the groups had to detect the instances of disfluency in the texts marking them on a paper sheet. The subjects of the other group listened to the same texts and then wrote down their contents. The pauses and hesitations were then eliminated from the texts. The third group of the subjects had the same comprehension task as the previous one had. Results show that (i) instances of disfluency are consequences of the speaker's speech planning processes, (ii) their reasons and occurrences are unconsciously known by the listener as well, (iii) disfluency phenomena are relatively well predicted, (iv) the listeners need pauses and hesitations in order to comprehend the heard texts successfully.

Keywords DiSS
Tapio Hokkanen, “Prosodic marking of self-repairs,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 37-40. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_037.pdf.

Abstract Slip studies predominantly focus on either structural or semantic properties of the errors. Since most analyses have been based on pen-and-paper collections, i.e., on-line notes, it is quite understandable that suprasegmental properties of errors have remained a neglected area. The present prosodic analysis is based on acoustical measurements of 307 self-repairs. Each repair has been measured with the Praat program. In order to make the measurements psychoacoustically relevant and comparable across speakers, the changes in F0 are expressed in terms of semitones. In general, speakers repair slightly less than three quarters of the errors they commit whereas one quarter remains either totally undetected or at least without a repair. With respect to prosodic marking, it appears that the proportion of marked repairs in the present data is significantly larger than in previous studies: approximately two thirds of self-repairs are marked with remarkably higher pitch (>+3ST), and a total of 96.7 per cent with a somewhat heightened pitch. It is concluded that alternations of fundamental frequency are utilized in marking self-initiated repairs.

Keywords DiSS
Peter Howell, and James Au-Yeung, “Application of EXPLAN theory to spontaneous speech control,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 9-12. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_009.pdf.

Abstract Problems for theories that explain speech errors by a monitoring process are discussed. EXPLAN theory is based on a proposal about planning and execution time, not on how errors arise. This theory is outlined and support from characteristics of fluency failure and altered feedback studies given.

Keywords DiSS
Ben Hutchinson, and Cécile Pereira, “Um, one large pizza. A preliminary study of disfluency modelling for improving ASR,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 77-80. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_077.pdf.

Abstract A corpus of spontaneous telephone transactions between call centre operators of a pizza company and its customers is examined for disfluencies (fillers and speech repairs) with the aim of improving automatic speech recognition. From this, a subset of the customer orders is selected as a test set. An architecture is presented which allows filled pauses and repairs to be detected and corrected. A language repair module removes fillers and reparanda and transforms utterances containing them into fluent utterances. An experiment on filled pauses using this module and architecture is then described. A speech recognition grammar for recognising fluent speech is used to provide a baseline. This grammar is then enriched with filled pauses, based on their placement in relation to syntactic boundaries. Evaluation is done at the level of understanding, using a metric on feature structures. Initial results indicate that incorporating filled pauses at syntactic boundaries improves the recognition results for spontaneous continuous speech containing disfluencies.

Keywords DiSS
Klaus J. Kohler, Benno Peters, and Thomas Wesener, “Interruption glottalization in German spontaneous speech,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 45-48. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_045.pdf.

Abstract This paper analyzes the occurrence of phonetic interruption cues at points of syntactic irregularities (false starts and truncations) in a large annotated corpus of German dialogues and compares interruption glottalization with laryngealization in terminal low phrase-final prosodies. Glottalization (including glottal stop) predominantly marks word fragments, whereas non-verbal insertions, e.g. breathing, tend to be word-external interruption cues. Laryngealization (excluding glottal stop) predominantly signals terminal phrase boundaries in turn-final positions. Individual speakers differ a great deal as to the distribution of these phenomena.

Keywords DiSS
Robin J. Lickley, “Dialogue moves and disfluency rates,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 93-96. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_093.pdf.

Abstract Many factors conspire to cause speakers to produce hesitations and self-repairs in dialogue. It has been noted that disfluency rates vary between corpora, with different overall dialogue tasks and with different modalities (e.g. human-computer vs. human-human) and between speakers, where they play different roles within a given dialogue. In this paper, we attempt to account for some of these results by examining the interaction between rates of different types of disfluency and types of utterance (dialogue moves) within one corpus of human-human task oriented dialogues. We find both that overall disfluency rate varies by dialogue move type, with moves which require more planning producing more disfluency, and that the distribution of disfluency types varies between move types, most notably with complex and negative responses to questions producing more filled pauses than positive replies and other moves. This work helps us to understand how dialogue structure can account for differences in disfluency rates between and within speech corpora and has implications for research in speech production and perception, discourse studies, dialogue management and automatic speech recognition.

Keywords DiSS
Jan McAllister, Susan Cato-Symonds, and Blake Johnson, “Listeners' ERP responses to false starts and repetitions in spontaneous speech,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 65-68. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_065.pdf.

Abstract Hindle [1] suggested that false starts and repetitions should be handled differently in a computational account of the processing of the two kinds of disfluency, and there is behavioural evidence that the human sentence processing mechanism likewise honours this distinction [2]. The same dichotomy was also evident in the electrophysiological data reported here. False starts and repetitions were identified in a corpus of spontaneous speech. Control items for the false starts were prepared by excising the reparanda to yield apparently fluent items. Continuous EEG was recorded while subjects listened to items containing the false starts, fluent false start controls, and first and second tokens of repetitions. Compared with identical words in their fluent controls, the false starts elicited a positive response similar to the P600 which is reported for syntactically anomalous words [3, 4, 5]. By contrast, second tokens of repetitions in general resulted in increased amplitude of the N400 [6]; yet, when the same repetitions were excised from context and presented list-fashion, they elicited the positive-going response which has been reported by other researchers [7].

Keywords DiSS
Nikolinka Nenova, Gina Joue, Ronan Reilly, and Julie Carson-Berndsen, “Sound and function regularities in interjections,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 49-52. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_049.pdf.

Abstract This paper investigates the relation between the sound patterns of interjections and their functional realisation in the discourse process. It considers whether certain interjection functions tend to have particular sound distributions. In order to address these questions a classification scheme for American English nonlexical interjections in terms of discourse markers is also presented.

Keywords DiSS
Sieb G. Nooteboom, “Different sources of lexical bias and overt self-corrections,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 21-24. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_021.pdf.

Abstract In this paper it is argued, on the basis of a quantitative analysis of spontaneous speech errors and their corrections in Dutch, that the mechanism leading to lexical bias in speech errors cannot be the same as that leading to overt self-corrections. Although spontaneous speech errors show a strong lexical bias, overt self-corrections do not. Lexical bias strongly increases with dissimilarity between target phoneme and source phoneme. No such effect is found in overt self-corrections. Several possible sources of these differences are discussed.

Keywords DiSS
Caroline L. Rieger, “Idiosyncratic fillers in the speech of bilinguals,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 81-84. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_081.pdf.

Abstract This paper introduces a never before described strategy used by bilinguals to fill hesitation pauses. This strategy proved so unique that it was given the name 'idiosyncratic filler.' It describes a filler type that is produced unusually often by one individual when hesitating. It is usually a particular lexical filler that is used as often as or more often than all other lexical fillers combined. Idiosyncratic fillers are as flexible as, but more 'prestigious' than quasi-lexical fillers and they are used by bilinguals in their non-native language as an overgeneralization and to avoid the incessant production of 'uhs' and 'uhms.'

Keywords DiSS
L. J. Rodríguez, I. Torres, and A. Varona, “Annotation and analysis of disfluencies in a spontaneous speech corpus in Spanish,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 1-4. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_001.pdf.

Abstract A new database consisting of 227 dialogues in Spanish was annotated with disfluencies. Then a detailed analysis of the annotations was carried out. The database had been recorded according to the well-known Wizard of Oz paradigm. Seventy-five speakers were each given three different scenarios to make queries about timetables, prices and other conditions of train travel between two Spanish cities. The notion of disfluency was relaxed to include any acoustic, lexical or syntactic feature that distinguishes spontaneous from read speech. A specific XML annotation scheme was developed. A simple text editor was used to insert marks, and a specific parser was implemented to find errors in annotations. The analysis of annotations revealed that disfluencies were not uniformly distributed among either user turns or speakers. Most disfluencies were grouped into certain user turns, especially the first one. On the other hand, some speakers were remarkably more prone to hesitate, repeat or correct fragments of speech than others.

Keywords DiSS
Mandana Seyfeddinipur, and Sotaro Kita, “Gesture as an indicator of early error detection in self-monitoring of speech,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 29-32. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_029.pdf.

Abstract There is a theoretical controversy regarding when the self-monitoring process interrupts the speech stream. One view holds that the speech stream is interrupted as soon as an error is detected. Another view holds that, even after an error is detected, the speaker does not interrupt immediately but continues speaking and at the same time plans the upcoming repair. We address this question by observing speech-accompanying gestures at the moment of speech disfluency. The results show that the concurrent gestural movements are typically stopped on average 240 ms before speech is stopped. In other words, the gesture suspension foreshadows the speech suspension. The gestural foreshadowing shows that the speaker must know early on that he is going to suspend speech. The gestural indication of an upcoming speech suspension suggests that the speaker does not interrupt speech at the very moment s/he detects an error. This result supports the hypothesis on speech monitoring stating that the speaker continues to talk after error detection and at the same time plans the upcoming repair.

Keywords DiSS
Richard Shillcock, Simon Kirby, Scott McDonald, and Chris Brew, “Filled pauses and their status in the mental lexicon,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 53-56. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_053.pdf.

Abstract We report a study of the relationship between form and meaning in the most frequent monosyllabic words in the lexicon of English. There is a small but significant correlation between the phonological distance and the semantic distance between each pair of words. To this extent, words that have similar meanings tend to sound similar. Words differ as to the size of this meaning-form correlation in their relationship with all of the other words. When the words are ranked according to the size of this correlation we find that the words which appear towards the top of the ranking are the communicatively important words. When we look at the position in the ranking of the speech editing terms, such as er, oh and um, we find that they are at the very top of the ranking. We argue that this position reflects the communicative importance of these items, and that it therefore makes sense to treat them as a proper part of the mental lexicon.

Keywords DiSS
Jörg Spilker, Anton Batliner, and Elmar Nöth, “How to repair speech repairs in an end-to-end system,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 73-76. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_073.pdf.

Abstract If automatic speech processing wants to deal with spontaneous speech, it has to deal with disfluencies in general and speech repairs in particular as well. The paper describes the processing of speech repairs in the VERBMOBIL system and discusses the special requirements of real-time systems. With respect to this criterion, the VERBMOBIL approach and its results are compared to other work. All these results are based more or less on the evaluation of a stand alone process, not integrated in a speech system. The ultimate goal is, of course, the use and the evaluation of the impact of such a repair process in a real-time, end-to-end system. An evaluation method based on this idea is presented and some preliminary results are given.

Keywords DiSS
Nada Vasic, and Frank Wijnen, “Stuttering and speech monitoring,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 13-16. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_013.pdf.

Abstract In this paper, we would like to argue that stuttering represents inadequate monitoring of the speech production process. The model we are proposing is the vicious circle hypothesis. The stuttering speaker has a malfunctioning monitor whose three parameters, namely focus, effort, and threshold are inappropriately set. In order to test our hypothesis, we tested 20 stuttering individuals in a dual task situation. The experiment consisted of three conditions: baseline where semi-spontaneous speech was elicited and two dual-task conditions. The first dual task was speaking and playing a computer game at the same time where the processing resources were taken away from monitoring. The second dual task was designed to shift the monitor's focus away from habitual monitoring. Subjects were asked to monitor for a particular word in their speech. The preliminary results for our experiment show that in the dual task condition the number of disfluencies decreased in relation to the number of words, which, in turn supports our prediction that distraction has a positive effect on fluency in the case of stuttering individuals.

Keywords DiSS
Michiko Watanabe, “The usage of fillers at discourse segment boundaries in Japanese lecture-style monologues,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 89-92. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_089.pdf.

Abstract We examined whether fillers (filled pauses) in a Japanese lecture appeared more frequently after discourse segment boundaries (DSB) than after other sentence boundaries. Contrary to our hypothesis that fillers occur more often after DSB than after other sentence boundaries, the frequency of fillers in the first phrase after DSB did not differ statistically from that after other sentence boundaries. The location of fillers in the first phrase after DSB and after other boundaries did not show any clear difference, either. However, the types of fillers at the initial position of the first phrase after two kinds of boundaries were different; sentence initial 'eto' appeared exclusively at DSB. This result indicates that sentence initial 'eto' may help highlighting DSB, but not other types of fillers. Other kinds of fillers ('e', 'ma', 'ano', 'sono') seem to be mainly concerned with planning units of the utterance that are smaller than a sentence.

Keywords DiSS
Asa Wengelin, “Disfluencies in writing - are they like in speaking?,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 85-88. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_085.pdf.

Abstract This paper presents a study of disfluencies in written language production. Texts from ten university students are compared to data from people who almost never use writing, namely adult dyslexics, and to texts from people who communicate in writing under real-time constraints every day, namely deaf people whose main use of writing is text telephone conversations. This paper investigates which types of disfluencies occur in writing, where they occur and their durations. Further, this paper investigates how different text types and the specific characteristics of deaf and dyslexic writers influence the distribution of disfluencies. The results are discussed in relation to earlier work on disfluencies in speaking.

Keywords DiSS
Michiko Yoshida, “Repeated phoneme effect in Japanese speech errors,” in Disfluency in Spontaneous Speech (DiSS '01), Edinburgh, Scotland, August 2001, pp. 17-20. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_017.pdf.

Abstract Analyses of errors in the natural speech of Dutch, German, and English have shown that involuntary rearrangements of phonemes (e.g., left hemisphere → heft lemisphere) are more likely to occur when the two words involved in the error have the same phoneme before or after the phoneme on which the error occurred (e.g., /E/ in left hemisphere) [1, 2]. A study by Dell (1984) has revealed that phoneme repetition could also contribute to experimentally induced speech errors in English [3]. The present study explored the effect of repeated phonemes in Japanese speech errors by means of two error-inducing experiments. Analyses of subjects' errors showed that a sequence of syllables that share the same phoneme was more error-prone than one with a variety of phonemes, suggesting that phoneme repetition could contribute to Japanese speech errors. These results are consistent with the view that the repeated phoneme effect is common to all speakers regardless of language.

Keywords DiSS

1999

Heather Bortfeld, Silvia D. Leon, Jonathan Bloom, Michael F. Schober, and Susan E. Brennan, “Which speakers are most disfluent in conversation, and when?,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 7-10.

Abstract We examined disfluency rates in a corpus of task-oriented conversations [1] in which several factors were manipulated that could affect fluency rates. These factors included: speakers' age (young, middle-aged, and older), task roles (director vs. matcher), difficulty of domain (abstract geometric figures or tangrams vs. photographs of children's faces), relationship between speakers (married vs. strangers), and gender (each pair consisted of a man and a woman). Older speakers produced only marginally higher (combined) disfluency rates than young and middle-aged speakers. Overall, disfluency rates were higher both when speakers took the initiative and when they discussed tangrams, associating disfluencies with an increase in planning difficulty. However, fillers (such as uh) were distributed somewhat differently than repetitions and restarts, supporting the idea that fillers may be a resource for or a consequence of interpersonal coordination.

Keywords DiSS
Susan E. Brennan, and Michael F. Schober, “Uhs and interrupted words: The information available to listeners,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 19-22.

Abstract Speech disfluencies are generally assumed to harm comprehension. Our studies investigated whether this is true, or whether certain disfluencies might actually help comprehension by marking for listeners which information the speaker intends to repair. We tested two hypotheses: (1) whether an interrupted word signals that the word was produced in error, and (2) whether a filler such as uh after an interrupted word signals an error. Listeners heard fluent instructions and disfluent ones whose reparanda contained completed words, interrupted words, or interrupted words with fillers, and then responded to these instructions. Responses to mid-word interruptions were no faster than to between-word interruptions, although there were fewer errors when less of the unintended word was heard. Responses to mid-word interruptions with uh were faster and more accurate than controls without disfluencies. With more complex displays, the response time advantage (but not the error rate advantage) diminished, suggesting that an interrupted word followed by uh tells listeners what the speaker does NOT mean. A fourth experiment showed that it is not the presence of the uh per se, but the additional time after the interrupted word that is the source of this "disfluency advantage."

Keywords DiSS
Mark G. Core, and Lenhart K. Schubert, “Speech Repairs: A Parsing Perspective,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 47-50.

Abstract This paper presents a grammatical and processing framework for handling speech repairs. The proposed framework has proved adequate for a collection of human-human task-oriented dialogs, both in a full manual examination of the corpus, and in tests with a parser capable of parsing some of that corpus. This parser can also correct a pre-parser speech repair identifier producing increases in recall varying from 2% to 4.8%.

Keywords DiSS
Robert Eklund, “A Comparative Analysis of Disfluencies in Four Swedish Travel Dialogue Corpora,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 3-6.

Abstract This paper reports on ongoing work on disfluencies carried out at Telia Research AB. Four travel dialogue corpora are described: human-"machine"-human (Wizard-of-Oz); human-"machine" (Wizard-of-Oz); human-human and human-machine. The data collection methods are outlined and their possible influence on the collected material is discussed. An annotation scheme for disfluency labelling is described. Preliminary results on five different kinds of disfluencies are presented: filled and unfilled pauses, prolonged segments, truncations and explicit editing terms.

Keywords DiSS
Jean E. Fox Tree, “Between-Turn Pauses and Ums,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 15-17.

Abstract Pauses and ums are often treated as two versions of the same thing, with the traditional label for ums, filled pauses, emphasizing this seeming interchangeability. To explore this hypothesis, I compared how overhearers interpreted a speaker's contribution to a conversation depending on whether the speaker responded immediately, paused and responded, or said um and responded. Overhearers answered a series of questions about the turn exchanges they had heard. The questions measured their interpretations of the second speakers' speech production difficulty, honesty, comfort with the topic discussed, familiarity with the interlocutor, and desire to have further contact with the interlocutor. In two experiments, the type of turn exchange was found to influence overhearers' interpretations. Results supply information about both the signalling properties of ums and the relationship between ums and pauses of varying lengths in the environment of a turn exchange.

Keywords DiSS
Dafydd Gibbon, and Shu-Chuan Tseng, “Toward a formal characterisation of disfluency processing,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 35-38.

Abstract Inherent structural characteristics of speech disfluencies are the prerequisite for the fulfilment of detecting and correcting speech disfluencies in spontaneous speech. However, a considerable number of recent research works on speech disfluencies focus on the surface patterns of speech disfluency editing structure, instead of looking into the relations between editing structure, the syntactic structure and the prosodic structure of speech disfluencies. In this paper we present first results of a new line of research, using feature structures modelled by finite state transducers, on the formal modelling of speech disfluencies in unplanned speech, in relation to all three levels of description.

Keywords DiSS
Peter A. Heeman, and K.H. Loken-Kim, “Detecting and Correcting Speech Repairs in Japanese,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 43-46.

Abstract One of the characteristics of spontaneous speech is the abundance of speech repairs, in which speakers go back and repeat or change something they have just said. In other work [7], we proposed a language model for speech recognition that can detect and correct speech repairs in English. In this paper, we show that this model works equally as well on a Japanese corpus of spontaneous speech. The structure of the model captures the language independent aspect of speech repairs, while machine training techniques on an annotated corpus learn the language dependent aspects.

Keywords DiSS
Kim Kirsner, Ben Roberts, and Yong-Heng Lee, “Why does spontaneous speech unfold in temporal cycles, sometimes?,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 11-14.

Abstract Spontaneous speech typically consists of alternating periods of continuous fluency, where fluency refers to the ratio of speech to pausing. Individual differences in fluency are substantial, with mean pause per minute ranging from less than 20 to more than 40 sec per minute in our sample of English and Mandarin speakers. While pauses have been regarded as critical clues for psycholinguistic analysis for decades, the existence of temporal cycles has been subject to extensive debate. The results of our experiments provide strong support for the presence of temporal cycles in spontaneous speech, and demonstrate in particular that fluency declines and increases prior and subsequent to topic shifts respectively. The source of temporal cycles is unclear, however. The prevailing assumption is that they reflect alternating periods of high level macro-planning, associated with low fluency, and low level micro-execution, associated with high fluency. However, a variety of alternative explanations merit consideration.

Keywords DiSS
Robin Lickley, David McKelvie, and Ellen Gurman Bard, “Comparing human and automatic speech recognition using word-gating,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 23-26.

Abstract This paper describes a study in which we compare human and automatic recognition of words in fluent and disfluent spontaneous speech. In a word-level gating study with confidence judgements, we examine how the recognition and confidence of recognition of words by humans develops over utterances and show how disfluency disrupts the process. We give an automatic recogniser the same task and compare its performance with the humans’. With both systems, subsequent context supports word recognition: confidence in word recognition peaks after subsequent words have been heard. With both systems, disfluency adversely affects recognition of words in the immediate vicinity of the disfluent interruption (for repeats and repairs): disrupted subsequent context disrupts the recognition process.

Keywords DiSS
Douglas O'Shaughnessy, “Better detection of hesitations in spontaneous speech,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 39-42.

Abstract Practical speech recognizers must accept normal conversational voice input (including hesitations). However, most automatic speech recognition work has concentrated on read speech, whose acoustic aspects differ significantly from speech found in actual dialogues. Hesitations, of which the most frequent are filled pauses, are common in natural speech, yet few recognition systems handle such disfluencies with any degree of success. Filled pauses (e.g., "uhh," "umm"), unlike most silent pauses, resemble phones which form words in continuous speech. The work reported here further develops techniques to allow automatic identification of filled pauses. Such identification, if reliable, would reduce potential confusion in determining an estimated textual output for an utterance. The Switchboard database (of natural telephone conversations) provided data for the study. While most automatic recognition methods rely entirely on spectral envelope (e.g., low-order cepstral coefficients), identifying filled pauses requires using a combination of spectra, fundamental frequency and duration. High precision and a low false alarm rate for filled pauses are feasible without excessive computation.

Keywords DiSS
Sherri Page, “Use of a postprocessor to identify and correct speaker disfluencies in automated speech recognition for medical transcription,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 27-30.

Abstract Medical practitioners speak in a quasi-spontaneous monologue when they dictate a chart note, letter, or patient history. Prior research has largely ignored the issue of disfluency in dictation, arguing that speakers can control recording and start over if necessary. In 550,000 words of hand transcribed medical dictation, however, we find numerous filled pauses, repetitions, and other self-repairs. This paper describes: a pre-theoretical classification of disfluencies, developed to identify patterns useful in automatic text processing; the patterns of disfluency found in a corpus hand tagged with this classification, which include repetitions in combination with substitutions, insertions, and deletions; and, preliminary results of implementation of a disfluency pattern matcher and filter in a postprocessor developed for commercial use.

Keywords DiSS
Sergey Pakhomov, and Guergana Savova, “Filled Pause Distribution and Modeling in Quasi-Spontaneous Speech,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 31-34.

Abstract Filled pauses (FP’s) are characteristic of spontaneous speech and present considerable problems for speech recognition by being often recognized as short words. Recognition of quasi-spontaneous speech (medical dictation) is subject to this problem as well. An um can be recognized as thumb or arm if the recognizer’s language model does not adequately represent FP’s. Representing FP’s in the training corpus improves recognition. Several techniques of seeding a training corpus with FP’s were evaluated to show that a stochastic method, along with random insertion uniformly distributed around the average sentence length, yields better results compared to random insertion at other ranges. The optimal method of seeding a training corpus with FP’s may be linked to clause boundaries despite the fact that an imperfect method of inserting FP’s at clause boundaries used in this study failed.

Keywords DiSS

Filled Pause

Research Center
