Bibliography of hesitation phenomena resources

The following is the complete list of published resources in the FPRC bibliography. Note that it is not an exhaustive list of publications related to hesitation phenomena. If you know of resources that ought to be in this list, please send them to me via the FPRC contact form. The entire list can also be downloaded in BibTeX format.

2021

Simon Betz, Nataliya Bryhadyr, Loulou Kosmala, and Loredana Schettino, “A crosslinguistic study on the interplay of fillers and silences,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 47-52.

Abstract We present a crosslinguistic study on the interplay of hesitation silences and fillers in conversation. The research questions have been addressed for English in a previous DiSS workshop paper (Betz & Kosmala, 2019) and this study extends the analysis to German, Italian and French. The research questions are: 1) Does the type of the filler influence following silence duration 2) Does the duration of the filler correlate with silence duration 3) Does silence duration vary depending on its distance from filler. The analysis shows cross-linguistic similarities and differences, thus highlighting the role and the language- and culture-specific nature of disfluencies.
Judit Bóna, “Disfluencies in spontaneous speech: The effect of age, sex and speech task,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 99-104.

Abstract The main question of this study is if there are differences in the occurrence of disfluencies of young and old males and females depending on speech task. Frequency and types of disfluencies of 20 young and 20 old speakers were analyzed in three different speech tasks. Results show that speakers’ age has significant effect on the frequency of disfluencies only in males’ speech. There are disfluencies which are more characteristic of old speakers’ speech, and others of young speakers’ speech. Speech task has significant effect on the analyzed parameters in both ages, while sex has the least impact on frequency.
Liesbeth Degand, “Discourse markers as markers of (dis)fluency: The role of peripheral position,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 1-2.

Abstract Studies on the relationship between discourse markers (DMs) and (dis)fluency have a Janus-headed face. On the one hand, DMs are described as structuring devices key to the local and global organization of discourse. As such, they contribute to its overall fluency. On the other hand, they have been described as traces of impediments in the speech production process, thus signalling disfluency. In other words, DMs are characterized by “functional ambivalence”, a notion reflecting their effects as symptoms of production difficulties and as signals of inferences to be made (Crible, 2018:3, see also Clark & Fox Tree, 2002). Starting point of this presentation is the observation that DMs occur overwhelmingly in initial position of their host unit, where they fulfil specific discourse functions. Discourse Markers may also occur in a functionally-motivated way in final position, be it less frequently. The simplified hypothesis of this study is that DMs in peripheral position have a fluent signalling function, while DMs in non-peripheral position are symptomatic of disfluent use. We will show that this dichotomy needs to be fine-tuned considering the type of host unit under study. On the basis of previous work investigating the relationship between DM function, DM position and the linguistic type of host unit (syntactic clause, intonation unit or speech turn) (Degand & Crible, in press), the hypothesis is that DMs work as functional boundary markers at the syntactic (clause) and the interactional (turn) levels, but not at the prosodic (intonation unit) level. Other (medial) uses should be less functionally motivated and be considered as symptoms of disfluency. Fluent and disfluent use will be evaluated in context, considering co-occurrence with other disfluency markers (Crible, Degand & Gilquin, 2017). A systematic study of the functional distribution of DMs in spoken French will show that this hypothesis is at least partially borne out.
Jessica Di Napoli, “Filled pauses in university lectures,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 41-46.

Abstract Previous studies have shown that filled pauses such as uh and um may provide cues to listeners to discourse structure and information structure. The present study employs a corpus-based approach to investigate to what extent filled pauses occur in this function in eight undergraduate lectures in American English. Results show that filled pauses occur most frequently in initial (i.e., post-pausal) position, and that they often cluster together following topic changes. Filled pauses are also shown to occur before important words in the corpus. Together, the results suggest that filled pauses in lectures may highlight important information and mark discourse structure at various levels. The findings contribute to gaining a better understanding of filled pause use across different registers and provide support of filled pauses as signals which benefit listeners.
Dorottya Gyarmathy, Valéria Krepsz, Anna Huszár, and Viktória Horváth, “Dynamic changes of pausing in triadic conversations,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 105-110.

Abstract Pausing in conversation has several roles from speech planning to managing turn-takings (TTs). However, less is known about the dynamic changes of pauses over time or with regard to the turn-taking system. The frequency and the duration of silent and filled pauses (SPs and FPs) as well as shared silences was analyzed in 20 triadic Hungarian conversations using dynamic frames (altogether more than 7700 items). Data showed that the frequency of silent and FPs decreased over time across conversations. As opposite, shared silences were found to be the most frequent in the last sections of conversations. However, the duration of the pauses did not change over time across conversation—it may be influenced by other factors. We found that the SPs containing audible breathing were longer than other SPs. The SPs were less frequent before turn-takings than in other positions. However, their duration was not affected by the turn-taking system.
Mária Gósy and Vered Silber-Varod, “Attached filled pauses: Occurrences and durations,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 71-76.

Abstract Filled pauses may reveal speech planning or execution problems that result in various positional and temporal patterns in spontaneous utterances. The purpose of this study was to analyze the position of the vocalic FPs, with respect to an adjacent word, in terms of occurrences and their durations produced by young (mean age: 25 years) and elderly (mean age: 76 years) speakers of Hungarian (a total of 32 participants). Elderly speakers produced significantly less and longer vocalic FPs than young speakers did. Both the occurrences and durations were significantly influenced by position of FPs and by age. In this paper, we introduced the conception of a functional difference between FPs attached either to the preceding or to the following word. The findings indicated different ways of resolving speech planning or execution problems depending on age.
Loulou Kosmala, “Gestures in fluent and disfluent cycles of speech: What they may tell us about the role of (dis)fluency in L2 discourse,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 77-82.

Abstract The present study looks at the production of gestures in fluent versus disfluent speech in L1-L2 interactions, following Graziano and Gullberg (2013, 2018). The aim of this paper is twofold: first to argue against the Lexical Retrieval Hypothesis (Krauss, Chen, & Gottesman, 2000) by comparing the distribution and function of gestures in fluent versus disfluent speech; second, to closely examine the unfolding of embodied (dis)fluencies, where vocal and visual-gestural actions are coordinated and situated within word searching sequences. The analyses are conducted on a video-recorded corpus of semi-spontaneous interactions between French and American speakers in tandem settings. Overall, our results support Graziano and Gullberg’s (2018) findings, and show that gestures accompanying (dis)fluencies are not necessarily related to lexical difficulties. Additionally, the qualitative analyses highlight the interactional and multimodal role of (dis)fluencies, which offers a fresh perspective of these phenomena which have often been treated from an internal production perspective.
Xinyue Li, Carlos Toshinori Ishi, and Ryoko Hayashi, “EGG analysis of filled pauses in Japanese spontaneous speech: Differences in Japanese native speakers and Chinese learners,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 65-70.

Abstract Previous studies on L2 learners of Japanese have shown that the appropriate use of filled pauses is a crucial skill in communication with native speakers. However, there is limited acoustic investigations on filled pauses produced by L2 learners of Japanese. The present study examines the production of filled pauses in Japanese native speakers and L1-Chinese L2 learners of Japanese, using open quotient features extracted from Electroglottography (EGG) signals. The results show that open quotient values of filled pauses were lower than those in ordinary lexical items for Chinese learners of L2 Japanese, suggesting that they may be using vocal tension as one cue to distinguish filled pauses from ordinary lexical items. However, no similar differences for open quotient were observed for the Japanese native speakers. Furthermore, open quotient-valued voice range profiles reveal that Chinese learners of L2 Japanese transfer their native glottal source cues when they produce filled pauses in Japanese.
Gabrielle Morin and Benjamin Tucker, “The acoustic characteristics of um and uh in spontaneous Canadian English,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 53-58.

Abstract The present study investigates and compares the acoustic characteristics of uh [ə] and um [əm] spontaneous speech. The data comes from a corpus of Western Canadian conversational spontaneous speech. Measures of duration, fundamental frequency, F1 and F2 were extracted from 1,048 instances of um and uh. Results indicate that longer durations occurred when markers preceded silent pauses. Um was found to have higher F1 and lower F2 than uh. F0 was overall lower for um in comparison to uh. These results provide a preliminary understanding of um and uh as markers in spontaneous Canadian English. Canadian English shows a similar proportion of um over uh usage in comparison to American and British English. Findings on vowel duration show no significant difference between um and uh. Differences in f0, F1 and F2 provide additional indication of how um and uh are different.
Sieb Nooteboom and Hugo Quené, “Why are some speech errors detected by self-monitoring ‘early’ and others ‘late’?” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 11-16.

Abstract In this paper we attempt to answer the question why in self-monitoring some segmental speech errors are detected in internal, some in external speech, and others not at all. This was done by re-analyzing data obtained in two earlier published SLIP experiments. It is hypothesized that detection of errors that are similar to the correct target takes longer than detection of errors that are dissimilar. It is also hypothesized that the time available for error detection in internal speech and for detection at all is limited. Results show that indeed a major factor is the strength of phonetic contrast between two competing response candidates.
Aurélie Pistono and Robert Hartsuiker, “Word-form related disfluency versus lemma related disfluency: An exploratory analysis of disfluency patterns in connected-speech production,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 95-98.

Abstract Several language production levels may be involved in the production of disfluencies. In the current study, we conducted network task experiments to tackle disfluencies related to conceptualization, which we operationalized by impeding visual object recognition (i.e. blurring). Contrary to what was expected, blurriness did not lead to more disfluency. However, disfluency type and disfluency location were closely related. This suggests a distinction in the underlying function of disfluencies, some reflecting word-form related difficulties, others reflecting lemma related difficulties.
Valeriya Prokaeva and Elena Riekhakaynen, “Hesitation phenomena in first and second languages: Evidence from reading in Russian as L1 and Japanese as L2,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 89-94.

Abstract The studies of speech disfluencies rarely involve spontaneous reading data. The current study aims at the identification and the comparative analysis of the hesitation phenomena during unprepared reading of texts in the native (Russian) and non-native (Japanese) language. Three groups of disfluencies are differentiated: silent pauses, filled pauses (including lexical fillers, non-lexical fillers, lengthenings, syllable-by-syllable pronunciation and paralinguistic phenomena), and other hesitations (error-related disfluencies, repetitions, self-interruptions and within-word breaks). The results suggest that disfluency is more frequent in non-native reading and is prevalent in the lower Japanese proficiency group, whilst the higher text complexity defined by a text type does not necessarily induce more hesitations. The self-correction phenomena were equally widespread in both L2 proficiency groups, whereas the number of noticed but uncorrected errors was higher in the lower Japanese proficiency group.
Laurent Prévot, Roxane Bertrand, and Stéphane Rauzy, “Investigating disfluencies contribution to discourse-prosody mismatches in French conversations,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 35-40.

Abstract In conversation, discourse and prosodic units association can be articulated through an interesting range of configurations. The situation in which these units are mismatching is the least studied and understood of these configurations. We make the hypothesis in this paper that disfluencies are a major cause for such mismatches. Our quantitative analysis based on a 8 hour corpus of French conversations manually annotated with disfluencies, discourse units (DU) and prosodic units (PU), confirms that disfluencies do play a major role in PU-DU mismatch but also that other sources should be considered. In the analysis, we also provide some insight about the different types of disfluencies and their frequency in the different DU-PU configurations.
Ralph L. Rose, “Variation in jitter, shimmer, and intensity of filled pauses and their contexts in native and nonnative speech,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 59-64.

Abstract Various acoustic parameters of filled pauses (e.g. uh/um in English, e-(to) in Japanese) have been investigated including duration, pitch, and formants. Less investigated have been jitter, shimmer, and intensity. The present work looks at systematic variation in these properties of filled pauses and their immediate contexts in a crosslinguistic speech corpus. Filled pauses were examined within the five token (word) window centered on the filled pause, exploring variation with respect to first (L1 Japanese) and second language (L2 English) speech as well as L2 proficiency. Results show that relative to the central filled pause, higher jitter and shimmer occur before the filled pause and higher intensity afterward. Proficiency group differences are weak, but suggest that jitter differences are greater in high proficiency speakers and shimmer differences greater in low proficiency speakers. Results vary somewhat from earlier work, but suggest jitter and shimmer may be advance indicators of upcoming disfluency.
Toshiyuki Sadanobu, “Attitudinal correlates of word-internal disfluencies in Japanese communication,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 5-10.

Abstract Through a case observation and a questionnaire survey, this presentation seeks to elucidate the patterns of word-internal disfluency in Japanese communication and determine how speakers implement these patterns. Two conclusions can be drawn: (i) Four possible patterns of word-internal disfluency exist in Japanese communication. Some cases show that disfluency that superficially appears not to be prolonged may come under prolongation. (ii) Some deviations are observed in disfluency patterns in accordance with the speaker’s attitude; all four patterns can be seen to occur in hesitant attitudes, whereas those expressed in the attitude of surprise primarily belong to the “suspending and restarting” pattern. However, where the degree of surprise is low or close to disgust, disfluency is more likely to be expressed as “prolonging and continuing.”
Loredana Schettino, Simon Betz, and Petra Wagner, “Hesitations distribution in Italian discourse,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 29-34.

Abstract The acknowledgment of the functional role of hesitations in speech has increased the research interest in investigating and modeling their occurrence in discourse. This study explores hesitation combinations and distribution in Italian discourse. Though clusters represent less frequent occurrences than standalone hesitations, it is still worth examining their composition, distribution, and context of occurrence for a better understanding of hesitations’ role in discourse. Also, the emerging patterns may provide interesting findings for technological applications, such as integrating hesitations models in conversational agents’ production to improve their communicative efficiency and naturalness.
Vered Silber-Varod, “DiSStory: A computational analysis of 9 editions of Disfluency in Spontaneous Speech workshop,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 3-4.

Abstract What are the most prominent research topics during the past nine DiSS workshops? Do we see any shift over the years? Can we identify the specific terms used in the research of disfluency? At the 10th workshop of DiSS, I will present some answers I have come up with using a data-driven approach on the database of abstracts published in the proceedings of DiSS workshops from 1999 to 2019. In this talk I call the participant to “Trust the text”, as Sinclair (2004) entitled his book, and to join the journey into the DiSS story.
Nette Vandenhouwe and Robert Hartsuiker, “Speech disfluencies as actual and believed cues to deception: Individuality of liars and the collective of listeners,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 17-22.

Abstract There is no consensus about the relationship between disfluencies and deception in speech production. However, it is well established that listeners believe deceptive speech to contain more disfluencies than truthful speech. Here, we used an interactive game to collect the speech of liars and the veracity decisions of listeners. Using Multivariate Pattern Analysis (MVPA), we determined the predictive value of speech disfluencies as both actual and believed cues to deception. We found that patterns of disfluencies can indeed be used to predict both an utterance’s veracity and a listener’s decision about that veracity better than chance. However, there was much individual variation in how lies altered speech, whereas listeners were consistent in how they thought the speech of others indicates lying.
Simon Williams, “Categorical differences in the false starts of speakers of English as a second language: Further evidence for developmental disfluency,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 83.

Abstract Although much is known about the formal properties of L2 repair in general and error corrections in particular, less is known about other subtypes, here collectively referred to as false starts. Unlike L2 self-corrections, false starts are psycholinguistically more comparable with NS equivalents and are of particular interest as possible sites of learner monitoring and modified output. Consistent with previous research on L2 repairs, this study found that lower-intermediate and advanced L2 speakers produced similar numbers of false starts. Their mapping by speaker proficiency level onto Levelt’s (1989) model of speech production revealed that both groups were concerned with lexical and morphological false start repair but that lower-intermediate speakers produced more syntactic and advanced speakers more conceptual examples.
Yaru Wu, Mathilde Hutin, Ioana Vasilescu, Lori Lamel, Martine Adda-Decker, and Liesbeth Degand, “Fine phonetic details for DM disambiguation: A corpus-based investigation,” in The 10th Workshop on Disfluency in Spontaneous Speech (DiSS 2021), St. Denis, France, August 2021, pp. 23-28.

Abstract In this study we examine phonetic variation of discourse markers in French, using for this purpose the 4-hour richly annotated LOCAS-F corpus. Both linguistic factors and stylistic variables are considered: speech style, part-of-speech category, mean phone duration and vowel formant distributions with respect to the word status. The results show that the use of discourse markers increases with the degree of spontaneity of the speech. Coordinating conjunctions are the part-of-speech which is most frequently used as discourse markers. Moreover, the mean phone duration tends to be shorter and the vowel space more centralized when words are employed as discourse markers, suggesting that discourse markers undergo hypoarticulation and, more generally, reduction.

2020

Burcu Arslan and Tilbe Göksun, “Understanding Multimodal Communication: Gesture Production and Disfluency in Speech,” in Proceedings of Gesture and Speech in Interaction (GESPIN2020), Stockholm, Sweden, September 2020. https://trello.com/c/uc6U4thK.

Abstract Do gestures facilitate lexical access particularly when speech production is not fluent? This study investigates gesture and disfluency rates and patterns when individuals describe concrete and abstract paintings and asks whether gestures facilitate speech by resolving disfluencies. Turkish-speaking participants (N=30) were asked to describe three concrete and three abstract paintings. We coded speech disfluencies (i.e., filled pauses, repairs, repetitions), frequency and type of gestures used. The results showed that although describing abstract paintings were relatively more difficult compared to the concrete ones, disfluency rates and overall gesture frequency were similar between the two painting categories. However, representational gesture frequency was higher for the abstract category, emphasizing the relationship between representational gestures and conceptualization process. Moreover, we found that most disfluencies occurred without gestures and most gestures occurred without disfluent speech. These findings suggest that although there can be cases in which gestures facilitate speech, it does not mean that gestures are fully compensatory in nature.
Malte Belz, “Acoustic vowel quality of filler particles in German,” in Laughter and Other Non-Verbal Vocalisations Workshop 2020, October 2020, pp. 7-10. DOI: 10.4119/lw2020-908.

Abstract The vowel quality of filler particles (FP) is studied for 24 speakers of German who produced 666 instances of vocalic (äh) and vocalic-nasal forms (ähm) in spontaneous dialogues. The FP vowel quality is compared to reference vowels of a word list as well as to phonologically and graphematically similarly constructed lexical syllables. Filler particles show a complete overlap with the reference vowels [œ] and [ɐ], but overlap only partially with [ɛ] and [ə].
Alyssa Bulow, “Write before you Speak: The Impact of Writing on L2 Oral Narratives,” Master's Thesis, Michigan State University, East Lansing, MI, 2020. https://search.proquest.com/openview/a7419aa70a4496835659a3f90739b625/1?pq-origsite=gscholar&cbl=18750&diss=y.

Abstract Current literature suggests that writing may better facilitate language learning than speaking practice alone, but direct empirical research demonstrating this is limited. Evidence is also limited as to whether grammar and vocabulary learned while writing can transfer to speaking. This study investigates the prediction that written planning, even more so than oral planning, leads to improved oral narratives. Thirty-four Spanish-speaking learners of English were randomly assigned to one of two groups: writing rehearsal or oral rehearsal; rehearsal being individual practice before the final task. The writing group composed a story ending in the written modality while the oral group rehearsed by narrating theirs out loud. Both groups recorded their oral story continuation task as the final product. In order to compare the impact of writing versus oral rehearsal on learners’ subsequent oral performance, final narratives were examined using complexity, accuracy, and fluency measures. Results showed that the writing group produced more fluent and lexically diverse narratives than the speaking group but there was no effect on accuracy, and limited effects on grammatical complexity. The study concludes with pedagogical implications for using writing tasks to prepare students for oral tasks.

Keywords L2 writing, complexity, fluency, story continuation task (SCT), EFL, benefits of writing for speaking, pre-task planning, rehearsal
Aurélie Chlébowski, “A Semasiological Approach to Non-Lexical Conversational Sounds: Issues, Benefits and Impact,” in Laughter and Other Non-Verbal Vocalisations Workshop 2020, October 2020, pp. 11-14. DOI: 10.4119/lw2020-911.

Abstract This paper proposes to consider a semasiological approach to non-verbal vocalisations. We claim that an acoustic analysis of the components of these sounds is needed to complement the findings of earlier studies. We propose that part of the information conveyed by these sounds comes from their acoustic components and that these components might be subjected to what resembles grammatical rules. Semantic issues are discussed at the end of the paper.
Aurélie Chlébowski and Nicolas Ballier, “A Manually Annotated Resource for the Investigation of Nasal Grunts,” in Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, European Language Resources Association, May 2020, pp. 6514-6522 (in English). https://www.aclweb.org/anthology/2020.lrec-1.802.

Abstract This paper presents an annotation framework for nasal grunts of the whole French CID corpus (Bertrand et al., 2008). The acoustic components under scrutiny are justified and the annotation guidelines are described. We carefully characterise the acoustic cues and visual cues followed by the annotator, especially for non-modal phonation types. The conventions followed for the annotation of interactional and positional properties of grunts are explained. The resulting datasets after data extraction with Praat scripts (Boersma and Weenink, 2019) are analysed with R (R Core Team, 2017), focusing on duration. We analyse the effect of non-modal phonation (especially ingressive phonation) on duration and discuss a specialisation of grunts observed in the CID for grunts with ingressive phonation. The more general aim of this research is to establish putative core and additive properties of grunts and a tentative typology of grunts in spoken interactions.
Sylvain Detey, Lionel Fontan, Maxime Le Coz, and Saïd Jmel, “Computer-assisted assessment of phonetic fluency in a second language: a longitudinal study of Japanese learners of French,” Speech Communication, vol. 125, December 2020, pp. 69-79. DOI: 10.1016/j.specom.2020.10.001. http://www.sciencedirect.com/science/article/pii/S0167639320302776.

Abstract Automatic second language (L2) speech fluency assessment has been one of the ultimate goals of several projects aiming at designing Computer-Assisted Pronunciation Training (CAPT) tools for L2 learners. Usually, three challenges must be tackled in order to solve the issues at stake: 1) Defining fluency from a threefold interdisciplinary perspective (acoustic and perceptual phonetics, computer science, L2 education); 2) Using a cost-effective algorithm; 3) Testing the procedure with actual learners’ data. Despite rapid technical developments in the field of automatic speech processing, the tools which are actually available for learners are still scarce, and most of them rely on automatic speech recognition (ASR). Moreover, most research on the topic is focusing on English as the target L2. Therefore, in this article, we address the following research questions: (a) is it possible to use a non-ASR-based low-level signal segmentation algorithm to predict human expert assessment of phonetic fluency in beginner Japanese learners of French in a text-reading task during the first stages of their learning? (b) if the answer to (a) is positive, then what are the best predictors of phonetic fluency among a set of available measures (see below for more details)? (c) is it possible to use this algorithm to monitor the evolution of phonetic fluency (and of its associated predictors) in these learners in a longitudinal study? As a first step, a corpus of French sentences read aloud by 12 Japanese learners of different proficiency levels in French was used to design a prediction system. The read-aloud speech data was perceptually annotated by three human experts on four dimensions: overall speech fluency, speech rate, regularity of speech rate, speech fluidity (i.e. smoothness of transitions between phones). Inter-rater agreement and reliability were high for all dimensions, and the average human ratings were compared with the scores provided by our prediction system. 
The results show strong correlations between human and automatic scores of speech rate and regularity of speech rate, and a weak correlation for speech fluidity. Automatic scores were finally combined together through a multiple linear regression model in order to predict overall speech fluency. The best model led to a correlation coefficient of .92 between automatic and human ratings, with a root-mean-square error of .38. In the second step of this study, a corpus of identical sentences read aloud four times over two years by 12 Japanese learners of French (after 4, 7, 12, and 19 months of French courses in Japan) was fed to the automatic system. The results show regular progress in overall speech fluency, which fits with the regular progress the Japanese learners under scrutiny were expected to make through their academic program in French at their university in Japan every semester. Our study suggests a positive answer to our first and third research questions, with speech rate as the best predictor to answer our second research question. In a pedagogical perspective, it seems that such a simple algorithm could be integrated in a CAPT tool to monitor learners’ progress in phonetic fluency in reading-aloud tasks.

Keywords Fluency, Automatic, Assessment, French, Japanese, Longitudinal
Jessica Di Napoli, “Filled pauses and prolongations in Roman Italian task-oriented dialogue,” in Laughter and Other Non-Verbal Vocalisations Workshop 2020, October 2020, pp. 24-27. DOI: 10.4119/lw2020-915.

Abstract This paper presents work in progress on two markers of hesitation in Roman Italian task-oriented dialogue, namely filled pauses and prolongations. We investigate their form, relative frequency, and distributional characteristics in Italian. Initial results suggest that Italian speakers produce prolongations more frequently than filled pauses, and that the prototypical hesitant prolongation involves a word-final vowel.
Jing Fang, “Pause in Sight Translation: A Pilot Study,” in Translation Education: A Tribute to the Establishment of World Interpreter and Translator Training Association (WITTA), Junfeng Zhao, Defeng Li, and Lu Tian, Eds. Singapore: Springer, 2020, pp. 173-192. DOI: 10.1007/978-981-15-7390-3_11.

Abstract Pauses are common in the practice of sight translation, especially among student interpreters. However, research on this topic has been limited so far. Based on a pilot project, this study aims to explore pauses in English-Chinese sight translation. Two groups of student interpreters, at different stages of training, were recruited to sight translate two texts with different syntactic complexity. The data collection also involved a pre-task vocabulary test and a post-task interview. All the silent pauses were identified and labelled based on their duration, and grammatical position in the text. The results showed that syntactic complexity of the source text had affected the pauses of short and medium length, but its effect on the long pauses of over 2 s was limited. Also, training was found to have an effect on reducing short pauses in the simple text, and medium pauses in the complex text. And students with longer training had significantly fewer longer pauses (over 1 s) at an ungrammatical position than the junior student interpreters. The research also found that, although interpreters had encountered difficult words in the source texts, these words were not a major contributor to pauses. Apart from pausing, interpreters also responded in other ways when facing lexical challenges.

Keywords Pause; Sight translation; Syntactic complexity; Training effect
Lorenzo García-Amaya, and Sean Lang, “Filled Pauses are Susceptible to Cross-Language Phonetic Influence: Evidence from Afrikaans-Spanish Bilinguals,” Studies in Second Language Acquisition, 2020, pp. 1–29. DOI: 10.1017/S0272263120000169.

Abstract This article investigates the effects of long-term bilingualism on the production of filled pauses (FPs; e.g., uh, um, eh, em) in the speech of Afrikaans-Spanish bilinguals from Patagonia, Argentina. The instrumental analysis draws from a corpus of sociolinguistic interviews obtained from three speaker groups: L1-Afrikaans/L2-Spanish bilinguals; L1-Spanish-comparison speakers, also from Patagonia; and L1-Afrikaans-comparison speakers from South Africa. In the data analysis, we examined relative FP usage (categorical outcomes), as well as phonetic measures of vowel quality and segmental duration (continuous outcomes). The results allude to multiple patterns of cross-language influence (e.g., L1-to-L2 influence, L2-to-L1 influence, bidirectional influence), which depend on the phonetic measure explored. Overall, the findings suggest that the patterns of cross-language phonetic influence observed in the L2 learning of traditionally understood lexical items likewise hold in the L2 learning of hesitation markers such as FPs.
Kajsa Gullberg, “Planning Processes in Speaking, Texting, and Writing: The effect of reader’s and listener’s temporal and spatial presence on planning in language production,” Master's Thesis, Lund University. September 2020. http://lup.lub.lu.se/student-papers/record/9030233.

Abstract This thesis investigates planning processes in language production, more specifically in texting as compared to speaking and writing, through pauses analyses (Goldman Eisler, 1969; Matsuhashi, 1981). Texting is used to examine the role of the spatial and temporal presence of a listener/reader in language production. Texting offers an interesting context in this respect since it has spatial absence between texter and reader, just like in writing, but temporal presence just like in speaking. The main research questions are as follows: In which contexts are pauses located in texting, speaking, and writing? How long are the production bursts in speaking, texting, and writing? This study uses the processing models the blueprint of the speaker (Levelt 1999) and the individual-environmental model of written language production (Hayes 1996) to identify the processes in texting. Part of the thesis comprises method development to capture and analyse the real-time language production in texting on smartphones. The method consists of an experimental set-up where the same participant talks and texts dialogically, and then writes monologically. In the texting and writing conditions, the pause threshold is 1 minute, and in the speaking condition all perceived pauses are identified. In the analyses, the pauses are categorised based on the context that precede the pause (e.g. syntactic unit or revision). The results show that the temporal and spatial presence of the reader/listener has an effect on language production. Clause boundaries are important contexts for pausing and planning in all three conditions, indicating that language users make use of syntactic units when they produce language regardless of the spatial and temporal presence of the speaker/listener. In texting and writing, pauses following a revision are important, showing that the texters and writers review what they have written. 
Further, the results show that texting has shorter planning units than both speaking and writing, which can be explained by the temporal presence of the reader resulting in a faster pace of communication, while the writing tool limits the speed at which language can be produced. In speaking and texting, pauses in phrase-final position are more common than in writing, which can be a result of the shorter planning units. In conclusion, texters adapt their language production to the temporal presence of a reader, through shorter planning units, while also adapting to the spatial absence of the reader through reviewing and editing their messages. The findings of this thesis are finally used to propose a model for the language processes in texting.

Keywords language processing; spoken language production; written language production; computer mediated communication; CMC; planning processes; texting
Widya Nindi Hardianti, and Rohmani Nur Indah, “Disfluencies in Stand-Up Comedy: A Psycholinguistic Analysis on Drew Lynch's Stuttering,” LEKSEMA: Jurnal Bahasa dan Sastra, vol. 5, no. 1, 2020, pp. 27-38. DOI: 10.22515/ljbs.v5i1.2075. http://ejournal.iainsurakarta.ac.id/index.php/leksema/article/view/2075.

Abstract Difficulties of producing speech sound in stutterers are indicated by the repetition, pause, prolongation, revision, and filled pause on the speaking. However, such difficulties do not hinder the communication as shown in the speech of a stand-up comedian named Drew Lynch. This study aims at exploring the types of fluency disorder identified in Lynch’s utterances on stage. This study uses the descriptive qualitative method employed through the process of observing, transcribing, describing, and analyzing his utterances in American Got Talent videos. The result shows Lynch produces all kinds of disfluency covering filled pause, phrase repetition, revision, multisyllabic whole-word repetition, monosyllabic whole-word repetition, repetition of individual sound or syllable, prolongation of sound, and block. The monosyllabic whole-word repetition is more dominant. The combination happens between revision with monosyllabic whole-word repetition, prolongation, or multisyllabic whole-word repetition. These findings confirm that in the context of stand-up comedy, the disfluencies in stuttering do not hamper the transfer of meaning.

Keywords disfluency, fluency disorder, stand-up comedy, stuttering
Zara Harmon, and Vsevolod Kapatsinski, “The best-laid plans of mice and men: Competition between top-down and preceding-item cues in plan execution,” in CogSci 2020 Proceedings (Proceedings of the Cognitive Science Society), 2020, pp. 1674-1680. https://cognitivesciencesociety.org/cogsci20/papers/0366/index.html.

Abstract There is evidence that the process of executing a planned utterance involves the use of both preceding-context and top-down cues. Utterance-initial words are cued only by the top-down plan. In contrast, non-initial words are cued both by top-down cues and preceding-context cues. Co-existence of both cue types raises the question of how they interact during learning. We argue that this interaction is competitive: items that tend to be preceded by predictive preceding-context cues are harder to activate from the plan without this predictive context. A novel computational model of this competition is developed. The model is tested on a corpus of repetition disfluencies and shown to account for the influences on patterns of restarts during production. In particular, this model predicts a novel Initiation Effect: following an interruption, speakers re-initiate production from words that tend to occur in utterance-initial position, even when they are not initial in the interrupted utterance.
Nur Kafifah, and Nurul Aini, “A Comparative Analysis of Spoken Error of Students’ Utterances,” Pedagogy : Journal of English Language Teaching, vol. 8, no. 1, 2020, pp. 64-72. DOI: 10.32332/pedagogy.v8i1.1926. http://e-journal.metrouniv.ac.id/index.php/pedagogy/article/view/1926.

Abstract This present study deals with the comparative analysis in spoken production errors made by the 2nd and the 4th-semester students of English Education Study Program in STKIP Kumala Metro. The objectives of this research are to comparative the types of errors, the frequency of error, the dominant type of errors, the similarities and differences of errors, and the sources of errors. The type of this research is qualitative research. The data of this research are utterances containing errors taken from the 2nd and the 4th-semester students. In collecting data, the researcher listened to the audio record carefully, writes the scripts correctly, then identifies the data, and selects the data deals with the types of errors. The researcher used Clark and Clark, Dulay, Burt, and Krashen's theory to analyze the errors. The results indicated that there are three types of errors made by the 2nd-semester students, namely, speech errors (78,22%), morphological errors (15,6%), and syntactical errors (6,06%). Whereas, the erroneous made by the 4th-semester students are speech errors (83,86%), morphological errors (13,1%), and syntactical errors (2,93%). The speech errors made by the 2nd and the 4th-semester students have similarities and differences. The similarities of speech errors that found by the researcher were: silent pause, filled pause, repeats, false start (unretracted), false start (retraced), correction, interjection, stutters, a slip of the tongue, error in pronunciation, error in vocabulary, error in word selection, the omission of bound morpheme-s, the omission of to be, the addition of to be, the omission of the verb, the omission of –Ing, the addition of –Ing, and misuse of to be. The differences of errors made by the 2nd and the 4th-semester students are in the addition of preposition, malformation, and disordering. The dominant error made by students is filled pause. 
These speech errors mostly caused by three sources; cognitive difficulty, situational anxiety, and social reason.
Oriana Kilbourn-Ceron, Meghan Clayards, and Michael Wagner, “Predictability modulates pronunciation variants through speech planning effects: A case study on coronal stop realizations,” Laboratory Phonology: Journal of the Association for Laboratory Phonology, vol. 11, no. 1, 2020, pp. 5. DOI: 10.5334/labphon.168.

Abstract Predictability has been shown to be associated with many dimensions of variation in speech, including durational variation and variable omission of segments. However, the mechanism or mechanisms that underlie these effects are still unclear. This paper presents data on a new aspect of predictability in speech, namely how it affects allophonic variation. We examine two coronal stop allophones in English, flap and glottal stop, and find that their relationship with predictability is quite different from what is expected under current theories of probabilistic reduction in speech. Flapping is more likely when the word that follows is more predictable, but is not influenced by the frequency of the word itself, while glottal stops are more likely in words that are less predictable. We propose that the crucial distinction between these two allophones is how they are conditioned by phonological context. This, we argue, interacts with online speech planning processes and gives rise to variability for context-dependent allophones. This hypothesis offers a specific, testable mechanism for certain predictability effects, and has the potential to extend to other factors that contribute to variability in speech.

Keywords Phonological variation, predictability, speech production planning, corpus phonology
Katarzyna Klessa, and Maciej Karpiński, “Hesitation markers in a corpus of Polish-German, German-German and Polish-Polish task-oriented dialogues in the context of communicative alignment,” in Proceedings of the 19th Meeting of the Texas Linguistics Society, vol. 19, Austin, Texas, USA, February 2020, pp. 17-26. http://tls.ling.utexas.edu/2020tls/TLS19_Conference_Proceedings.pdf.

Abstract In this study, we investigate the distribution and properties of hesitation markers produced in task-oriented dialogues by Polish and German teenagers. The material comes from a multimodal corpus which has been collected in the Polish-German border area, in the cities of Słubice and Frankfurt (Oder). The speakers took part in two kinds of dialogue tasks: a collaborative and a competitive one. We report that the number and durational variability of hesitation markers produced by the speakers are influenced by dialogue task type and language configuration. We inspect aspects of interlocutor alignment using automatized annotation mining. A number of patterns of alignment can be visually traced for the study material. However, only few of them can be confirmed by tests as statistically significant.
Christian Koch, and Britta Thörle, “Metadiscursive Activities in Oral Discourse Production in L2 French: A Study on Learner Profiles,” Corpus Pragmatics, 2020. DOI: 10.1007/s41701-020-00089-7.

Abstract This study explores the use of discourse markers (DMs) in metadiscursive activities such as word searches, repairs or metalinguistic evaluations that occur during spontaneous oral production. The analysis is based on a corpus of telephone conversations between advanced learners and native speakers of French and draws on functional as well as on interactional work on DM. In a first step, three selected learner profiles provide insight, by means of sequence analysis, into how individual learners make use of their particular DM inventory for their utterance planning, carrying out repairs and expressing attitudes toward their oral production. In a second step, the study compares native and non-native speaker’s DM inventories in order to detect general tendencies in the learners’ DM use that differ from the native speakers’ use of DMs. The comparison of the profiles shows that, even if there is relatively little agreement among the learners regarding the concrete lexical forms of the DMs, similarities can be discerned regarding the interlinguistic characteristics (e.g. individual preferences and overuse in the form of “lexical teddy bears” such as oui, alors or voilà, underuse of typical French reformulation markers like enfin, and weak routine in the lexicalisation of metadiscursive comments).
Justin J. H. Lo, “Between Äh(m) and Euh(m): The Distribution and Realization of Filled Pauses in the Speech of German-French Simultaneous Bilinguals,” Language and Speech, vol. 63, no. 4, December 2020, pp. 746-768. DOI: 10.1177/0023830919890068. https://journals.sagepub.com/doi/10.1177/0023830919890068.

Abstract Filled pauses are well known for their speaker specificity, yet cross-linguistic research has also shown language-specific trends in their distribution and phonetic quality. To examine the extent to which speakers acquire filled pauses as language- or speaker-specific phenomena, this study investigates the use of filled pauses in the context of adult simultaneous bilinguals. Making use of both distributional and acoustic data, this study analyzed UH, consisting of only a vowel component, and UM, with a vowel followed by [m], in the speech of 15 female speakers who were simultaneously bilingual in French and German. Speakers were found to use UM more frequently in German than in French, but only German-dominant speakers had a preference for UM in German. Formant and durational analyses showed that while speakers maintained distinct vowel qualities in their filled pauses in different languages, filled pauses in their weaker language exhibited a shift towards those in their dominant language. These results suggest that, despite high levels of variability between speakers, there is a significant role for language in the acquisition of filled pauses in simultaneous bilingual speakers, which is further shaped by the linguistic environment they grow up in.
Minxia Luo, Mona Neysari, Gerold Schneider, Mike Martin, and Burcu Demiray, “Linear and Nonlinear Age Trajectories of Language Use: A Laboratory Observation Study of Couples’ Conflict Conversations,” The Journals of Gerontology: Series B, vol. 75, no. 9, March 2020, pp. e206-e214. DOI: 10.1093/geronb/gbaa041.

Abstract This study investigated linear and nonlinear age effects on language use with speech samples that were representative of naturally occurring conversations. Using a corpus-based approach, we examined couples’ conflict conversations in the laboratory. The conversations, from a total of 364 community-dwelling German-speaking heterosexual couples (aged 19–82), were videotaped and transcribed. We examined usage of lower-frequency words, grammatical complexity, and utterance of filled pauses (e.g., äh [“um”]). Multilevel models showed that age effects on the usage of lower-frequency words were nonsignificant. Grammatical complexity increased until middle age (i.e., 54) and then declined. The utterance of filled pauses increased until old age (i.e., 70) and then decreased. Results are discussed in relation to cognitive aging research.

Keywords Adult life span; Cognitive aging; Filled pauses; Frequency of nouns; Grammatical complexity
Nathan D. Maxfield, “Inhibitory Control of Lexical Selection in Adults who Stutter,” Journal of Fluency Disorders, vol. 66, 2020, pp. 105780. DOI: 10.1016/j.jfludis.2020.105780. http://www.sciencedirect.com/science/article/pii/S0094730X20300358.

Abstract Purpose: Based on previous evidence that lexical selection may operate differently in adults who stutter (AWS) versus typically-fluent adults (TFA), and that atypical attentional processing may be a contributing factor, the purpose of this study was to investigate inhibitory control of lexical selection in AWS. | Method: 12 AWS and 12 TFA completed two tasks. One was a picture naming task featuring High and Low Agreement object naming. Naming accuracy and reaction times (RT), and event-related potentials (ERPs) time-locked to picture onset, were recorded. Second was a flanker task featuring Congruent and Incongruent arrow arrays. Push-button accuracy and RTs, and ERPs time-locked to arrow array onset, were recorded. | Results: Low Agreement pictures were named less accurately and slower than High Agreement pictures in both Groups. The magnitude of the Agreement effect on naming RTs was larger in AWS versus TFA. Delta-plot analysis revealed that the Agreement effect was positively correlated with individual differences in inhibition in TFA but not in AWS. Moreover, Low Agreement pictures elicited negative-going ERP activity relative to High Agreement pictures in both Groups. However, the scalp topography of this effect was markedly reduced in AWS versus TFA. For the Flanker task, Congruency affected push-button accuracy and RTs, and N2 amplitudes, similarly between groups. | Conclusions: Results point to a selective deficit in inhibitory control of lexical selection in AWS. Potential pathways between diminished inhibitory control of lexical selection, speech motor control and stuttering are discussed.

Keywords stuttering, lexical selection, executive, inhibition, ERP
Mohammed Ali Mohsen, and Mutahar Qassem, “Analyses of L2 Learners’ Text Writing Strategy: Process-Oriented Perspective,” Journal of Psycholinguistic Research, vol. 49, 2020, pp. 435-451. DOI: 10.1007/s10936-020-09693-9.

Abstract Second language writing researchers have examined the affordances of Automated Writing Evaluation programs in providing immediate feedback that helps improve students’ writing outputs. However, a little is known about tracking learners’ process during writing essays and whether much/less pauses made by learners could predict good/poor quality of students’ writing output. This article aims to address this issue by recording a case study of 8 postgraduate students’ pauses during writing two types of text genre; descriptive and argumentative essays. Their pauses have been recorded using Keystroke logging program—Input Log 7.0 (Leijten and Van Waes in Writ Commun 30:358–392, 2013. https://doi.org/10.1177/0741088313491692) and their screen activities were captured by Active Presenter program. Findings revealed that the students’ pauses were significantly higher in word boundary than in sentence and/or paragraph boundaries. Moreover, on word boundary, pauses before words were significantly higher than that after words for both types of text genre. Concerning pauses across text genre, students’ pauses were significantly higher in the argumentative essay than that of the descriptive essay. Multiple regression revealed negative correlation between much pauses and poor quality of students’ product in the descriptive essay while there was no correlation found in the argumentative essay.
Beeke Muhlack, “L1 and L2 Production of Non-Lexical Hesitation Particles of German and English Native Speakers,” in Laughter and Other Non-Verbal Vocalisations Workshop 2020, October 2020, pp. 44-47. DOI: 10.4119/lw2020-924.

Abstract This study focuses on the vowel quality of non-lexical hesitation particles produced by 24 English and German native speakers in their native language (L1) and their second language (L2) both of which are English and German. The aim is to show that a) English and German hesitation particles employ a different vowel quality and b) L2-learners of the respective language can adapt the native-like vowel quality if they are sufficiently proficient in their L2.
Costanza Navarretta, “Speech Pauses and Dialogue Acts,” in 2020 IEEE International Conference on Human-Machine Systems (ICHMS), Rome, Italy, IEEE, 2020, pp. 1-6. DOI: 10.1109/ICHMS49158.2020.9209502. https://ieeexplore.ieee.org/document/9209502.

Abstract This study concerns the use of speech pauses, and especially breath pauses in a Danish corpus of spontaneous dyadic conversations. Speech pauses which have specific communicative functions are investigated in relation to their occurrences before and after other communicative units, all annotated and classified in the form of dialogue acts. Breath pauses have been addressed in only few studies even though they are important in communication and therefore should be accounted for when implementing human-machine dialogue systems. Dialogue acts, on the contrary, have been one of the backbones in dialogue systems since they generalize over different expressions of common communicative functions. In the current work, we describe the annotation of dialogue acts in the corpus and present an analysis of pauses using these annotations. To our best knowledge, dialogue acts have not been previously used for analyzing the functions of breath pauses. Our analysis shows that the most common type of pause having a communicative function in the Danish conversations are breath pauses. Breath pauses in the corpus have different uses, one of these being that of delimiting speech segments which are left unfinished and are then abandoned by the speaker (retractions in dialogue acts terminology) and therefore perceivable breathing can be a useful feature for determining spoken segments which must not be included in the dialogue history in human-machine dialogue systems.
Luis Bernardo Quesada Nieto, “Fenómenos de vacilación, sus contextos léxicos y sintácticos en entrevistas formales de legisladores a ciudadanos en el Congreso de la Ciudad de México [Lexical and syntactic contexts of hesitation phenomena in formal deputy-citizen interviews conducted at Mexico City congress],” Cuadernos de Lingüística de El Colegio de México, vol. 7, no. e141, October 2020, pp. 1-50. DOI: 10.24201/clecm.v7i0.141. http://www.scielo.org.mx/scielo.php?pid=S2007-736X2020000100102&script=sci_abstract&tlng=en.

Abstract This article, which is an outcome of an ethnographic research, aims to offer an insight into lexical and syntactic contexts of some hesitation phenomena (short fillers, repetitions, long fillers, word lengthening, unfinished words and unfinished phrases), identified in a corpus sample that consists of structured interviews conducted by a group of deputies of Mexico City Local Congress with citizens who applied for the ombudsperson’s position at the city’s Human Rights Office (Comisión de Derechos Humanos de la Ciudad de México). Drawing upon a lexical and syntactic description, some remarks on the hesitation phenomena’s linguistic and communicative values are presented. I propose an interpretation of hesitation occurrence patterns that appear in the respondent’s answers. This interpretation is based on the discursive planning level, the interaction between hesitation markers and word classes, and the concept of repertoire as it has been used in the theory of translanguaging. Towards the end of the manuscript I argue that the studied phenomena and their distribution are directly related to open class words, and the cognitive effort of producing grammatical, accurate and socially appropriate messages.

Keywords hesitation markers, discursive planning, oral language, word classes, repertoire, translanguaging theory
Nikhil Saini, Jyotsana Khatri, Preethi Jyothi, and Pushpak Bhattacharyya, “Generating Fluent Translations from Disfluent Text Without Access to Fluent References: IIT Bombay@IWSLT2020,” in Proceedings of the 17th International Conference on Spoken Language Translation, Online, Association for Computational Linguistics, July 2020, pp. 178-186. DOI: 10.18653/v1/2020.iwslt-1.22. https://www.aclweb.org/anthology/2020.iwslt-1.22.

Abstract Machine translation systems perform reasonably well when the input is well-formed speech or text. Conversational speech is spontaneous and inherently consists of many disfluencies. Producing fluent translations of disfluent source text would typically require parallel disfluent to fluent training data. However, fluent translations of spontaneous speech are an additional resource that is tedious to obtain. This work describes the submission of IIT Bombay to the Conversational Speech Translation challenge at IWSLT 2020. We specifically tackle the problem of disfluency removal in disfluent-to-fluent text-to-text translation assuming no access to fluent references during training. Common patterns of disfluency are extracted from disfluent references and a noise induction model is used to simulate them starting from a clean monolingual corpus. This synthetically constructed dataset is then considered as a proxy for labeled data during training. We also make use of additional fluent text in the target language to help generate fluent translations. This work uses no fluent references during training and beats a baseline model by a margin of 4.21 and 3.11 BLEU points where the baseline uses disfluent and fluent references, respectively.

Keywords disfluency removal, machine translation, noise induction, leveraging monolingual data, denoising for disfluency removal
Loredana Schettino, Maria Di Maro, and Francesco Cutugno, “Silent pauses as clarification trigger,” in Laughter and Other Non-Verbal Vocalisations Workshop 2020, October 2020, pp. 51-54. DOI: 10.4119/lw2020-927.

Abstract Among possible pragmatic feedback an interlocutor can use to acknowledge the degree of understanding of an utterance, clarification requests (CRs) are to be considered. The functional role of CRs can furthermore be expressed via silent pauses - or failed turn-giving moves - which express an understanding problem and are solved through a clarify speech act. In this work, we therefore hypothesise that some silent pauses, in specific conditions, may also have an interactional role which is interpreted by the speaker as a clarification need.
Katerina Smirnova, Nikolay Korotaev, Yana Panikratova, Irina Lebedeva, Ekaterina Pechenkova, and Olga Fedorova, “Using the RUPEX Multichannel Corpus in a Pilot fMRI Study on Speech Disfluencies,” in Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, European Language Resources Association, May 2020, pp. 195-203 (in English). https://www.aclweb.org/anthology/2020.lrec-1.25.

Abstract In modern linguistics and psycholinguistics speech disfluencies in real fluent speech are a well-known phenomenon. But it's not still clear which components of brain systems are involved into its comprehension in a listener's brain. In this paper we provide a pilot neuroimaging study of the possible neural correlates of speech disfluencies perception, using a combination of the corpus and functional magnetic-resonance imaging (fMRI) methods. Special technical procedure of selecting stimulus material from Russian multichannel corpus RUPEX allowed to create fragments in terms of requirements for the fMRI BOLD temporal resolution. They contain isolated speech disfluencies and their clusters. Also, we used the referential task for participants fMRI scanning. As a result, it was demonstrated that annotated multichannel corpora like RUPEX can be an important resource for experimental research in interdisciplinary fields. Thus, different aspects of communication can be explored through the prism of brain activation.
Jürgen Trouvain, and Raphael Werner, “Comparing Annotations of Non-verbal Vocalisations in Speech Corpora,” in Laughter and Other Non-Verbal Vocalisations Workshop 2020, October 2020, pp. 69-72. DOI: 10.4119/lw2020-931.

Abstract In this study eleven corpora of spontaneous and scripted speech (in English and in German) are analysed regarding their annotation inventories of selected highly frequent nonverbal vocalisations (NVVs). It appears that only one corpus considers all NVVs and that laughter is the only NVV annotated in all corpora. The findings lead to a discussion of possible reasons for this situation. In conclusion it is argued that a wider distribution and more consistency is needed with respect to the annotation of NVVs.
Wikipedia contributors, “Filler (linguistics) -- Wikipedia, The Free Encyclopedia,” October 2020. https://en.wikipedia.org/w/index.php?title=Filler_(linguistics)&oldid=978016784.

Abstract In linguistics, a filler, filled pause, hesitation marker or planner is a sound or word that is spoken in conversation by one participant to signal to others a pause to think without giving the impression of having finished speaking. (These are not to be confused with placeholder names, such as thingamajig, whatchamacallit, whosawhatsa and whats'isface, which refer to objects or people whose names are temporarily forgotten, irrelevant, or unknown.) Fillers fall into the category of formulaic language, and different languages have different characteristic filler sounds. The term filler also has a separate use in the syntactic description of wh-movement constructions.
Kadek Wirahyuni, and Putu Nitiasih, “Pause and Slip of the Tongue on the Participants of 2019 Putra Putri Undiksha in the Interview Session,” International Journal of Education and Pedagogy, vol. 2, no. 2, 2020, pp. 64-77. http://myjms.moe.gov.my/index.php/ijeap/article/view/9488.

Abstract The Election of Putra Putri Undiksha is conducted every year. There are several stages in the selection of Putra Putri Undiksha, one of which is the interview stage. At this stage, participants will be interviewed about insight, talent, personality, and beauty or good looks. During this interview, researchers found pause and slip of the tongue that were said by several participants. This research uses descriptive qualitative research. Qualitative research is an approach in conducting research whose orientation lies in natural phenomena (Mahmud, 2011: 89). Sources of data in this study took the form of slip of the tongue and pause notes experienced by the participants. The subjects of the research were 50 participants of Putra Putri Undiksha consisting of 22 men and 28 women. Data collection technique in this study is indirect techniques in the form of documentary study techniques. The source consists of documents in the form of notes (Syamsuddin and Damaianti, 2015: 108). The types of pause that are obtained are pause and filled pause. Nine pauses occurred, consisting of 2 pauses and 7 filled pauses, namely ‘e’, ‘m’, and ‘ng’. In addition there are also progressive repetitive pauses, namely ‘saya’, ‘apa’, ‘itu’, and ‘ya’. Furthermore, there were 13 slips of the tongue spoken by Undiksha Putra Putri participants during the interview. Tongue blobs found were tongue flirting, selection error and assembling error. Selection errors are divided into three types, namely semantic errors, which are the utterances, 'Pak' and 'selamat pagi'. Furthermore, the error of malapropism is the utterance of 'fikir', and the error of mixed words or blends on the utterance of sinu, benul, inu, bileh. The mistake of assembling in this research is the transposition error ‘menyadari sudah’, ‘semester tiga baru’, and ‘media sosial’. Furthermore, the mistake of anticipating assembling is found in the utterances ‘halus’, ‘pretasi’, and ‘diporpaganda’. The cause of pause and slips of the tongue in the Putra Putri Undiksha participants during this interview was due to nervousness, thinking, not knowing the answers, haste, spontaneity, out of focus, and habits.
Yasunori Yamada, Kaoru Shinkawa, Akihiro Kosugi, Masatomo Kobayashi, Hironobu Takagi, Miyuki Nemoto, Kiyotaka Nemoto, and Tetsuaki Arai, “Predicting Future Accident Risks of Older Drivers by Speech Data from a Voice-Based Dialogue System: A Preliminary Result,” in Advances in the Human Side of Service Engineering. AHFE 2020. Advances in Intelligent Systems and Computing, vol. 1208, Springer, Cham, July 2020, pp. 131-137. DOI: 10.1007/978-3-030-51057-2_19.

Abstract As the world’s elderly population increases, driving accidents involving older adults has become an increasingly serious social problem. Previous studies have suggested cognitive impairments as one of the risk factors for future accidents. However, it remains unclear whether and how such future accident risks related to cognitive impairments can be predicted by using health monitoring technologies. In this study, we collected speech data from simulated conversations between 38 healthy older adults and a voice-based dialogue system. We followed up with the participants 1.5 years later and found that 17 of them had experienced near-accidents within the past year. We then built a binary classification model using the originally obtained speech data and found through leave-one-out cross-validation that it could predict whether a person would have a near-accident experience with 78.9% accuracy. Our preliminary results suggest that speech data from voice-based interaction systems might help older drivers recognize future accident risks.
Michael Zock, and Chris Biemann, “Comparison of Different Lexical Resources With Respect to the Tip-of-the-Tongue Problem,” Journal of Cognitive Science, vol. 21, no. 2, 2020, pp. 193-252. http://cogsci.snu.ac.kr/jcs/index.php/issues/?uid=298&mod=document.

Abstract Language production is largely a matter of words which, in the case of access problems, can be searched for in an external resource (lexicon, thesaurus). When accessing the resource, the user provides her momentarily available knowledge concerning the target and the resource-powered system responds with the best guess(es) it can make given this input. As tip-of-the-tongue studies have shown, people always have some knowledge concerning the target (meaning fragments, number of syllables, ...) even if its precise or complete form is eluding them. We will show here how to tap on this knowledge to build a resource likely to help authors (speakers/writers) to overcome the Tip-of-the-Tongue (ToT) problem. Yet, before doing so we need a better understanding of the various kinds of knowledge people have when looking for a word. To this end, we asked crowd workers to provide some cues to describe a given target and to specify then how each one of them relates to it, in the hope that this could help others to find the elusive word. Next, we checked how well a given search strategy worked when being applied to differently built lexical networks. The results showed quite dramatic differences, which is not really surprising. After all, different networks are built for different purposes; hence each one of them is more or less well suited for a given task. What was more surprising though is the fact that the relational information given by the users did not allow us to find the elusive word in WordNet more easily than without relying on this information.

Keywords word access, tip of the tongue problem, indexing, knowledge states, metaknowledge, mental lexicon, navigation, lexical networks

2019

Thanaporn Anansiripinyo, and Chutamanee Onsuwan, “Acoustic-phonetic characteristics of Thai filled pauses in monologues,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 51-54. DOI: 10.21862/diss-09-014-anan-onsu. https://doi.org/10.21862/diss-09-014-anan-onsu.

Abstract Filled pause (FP) is one type of disfluent phenomena that is commonly found in everyday speech. It has been widely studied in many languages, but little is known about this topic in Thai. This work explored three important acoustic-phonetic characteristics of Thai filled pauses in monologues. To elicit target monosyllabic tokens of FPs and those of regular word (RW) counterparts, 31 Thai adult females were asked to watch two short cooking videos and describe the contents. They were also asked to read out loud target word lists. Three acoustic measures: syllable duration, first (F1) and second formant (F2) frequencies were taken from 738 tokens. Across vowel contexts, only F2, not F1, in FPs, was significantly different from that in RWs. Differences in syllable duration between RWs versus FPs were near significant. The findings suggest that Thai speakers produced FPs in a presumably different way from RWs. In FPs, the syllable was relatively lengthened and the tongue position was moved towards the center of vowel space. Future directions include a detailed analysis of FPs in terms of amplitude, fundamental frequency, pause duration before/after fillers and other non-linguistic factors.
Maria Bakti, “Error type disfluencies in consecutively interpreted and spontaneous monolingual Hungarian speech,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 71-74. DOI: 10.21862/diss-09-019-bakti. https://doi.org/10.21862/diss-09-019-bakti.

Abstract Interpreting can be considered as a form of spontaneous speech, the key differences being that language change is involved in interpreting and the fact that speech production is influenced by several constraints during interpreting. Research has shown that the interpreting task influences the disfluency patterns of target language texts. The aim of this paper is to investigate how the frequency and distribution of error type disfluencies changes in the target language output of trainee interpreters as they progress in their training. Results indicate that there is no considerable change in the frequency and proportion of error type disfluencies in the target language texts recorded at the end of the second, third and fourth semesters of interpreter training. The proportion of error type disfluencies is higher in the consecutively interpreted texts than in the spontaneous monolingual speech of the students. This suggests that the complexity of the task, rather than progress in training, determines the disfluency pattern of consecutively interpreted target language texts.
Charlotte Bellinghausen, Thomas Fangmeier, Bernhard Schröder, Johanna Keller, Susanne Drechsel, Peter Birkholz, Ludger Tebartz van Elst, and Andreas Riedel, “On the role of disfluent speech for uncertainty in articulatory speech synthesis,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 39-42. DOI: 10.21862/diss-09-011-bell-etal. https://doi.org/10.21862/diss-09-011-bell-etal.

Abstract In this paper we present a perception study on the role of disfluent speech in forms of prosodic cues of uncertainty in question-answering situations. In our scenario the answer to each question was modeled by varying three prosodic cues: pause, intonation, and hesitation. The utterances were generated by means of an articulatory speech synthesizer. Subjects were asked to rate each answer on a Likert scale with respect to uncertainty, naturalness and understandability. Results showed evidence for an additive principle of the prosodic cues, i.e. the more cues were activated the higher the perceived level of uncertainty. Overall, the effect of intonation and hesitation was more evident than the effect of pause.
Simon Betz, and Loulou Kosmala, “Fill the silence! Basics for modeling hesitation,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 11-14. DOI: 10.21862/diss-09-004-betz-kosm. https://doi.org/10.21862/diss-09-004-betz-kosm.

Abstract In order to model hesitations for technical applications such as conversational speech synthesis, it is desirable to understand interactions between individual hesitation markers. In this study, we explore two markers that have been subject to many discussions: silences and fillers. While it is generally acknowledged that fillers occur in two distinct forms, um and uh, it is not agreed on whether these forms systematically influence the length of associated silences. This notion will be investigated on a small dataset of English spontaneous speech data, and the measure of distance between filler and silence will be introduced to the analyses. Results suggest that filler type influences associated silence duration systematically and that silences tend to gravitate towards fillers in utterances, exhibiting systematically lower duration when preceding them. These results provide valuable insights for improving existing hesitation models.
Simon Philip Botley, and Sharifah Zakiah Wan Hassan, “Investigating Dysfluency in Malaysian Spoken Discussions,” in Research Mosaics of Language Studies in Asia: Differences and Diversity, Salasiah Che Lah and Rita Abdul Rahman Ramakrishna, Eds.: Penerbit Universiti Sains Malaysia, 2019. https://books.google.co.jp/books?hl=en&lr=lang_en&id=HDX6DwAAQBAJ&oi=fnd&pg=PT8&dq="hesitation phenomena"&ots=h2H97wyXzv&sig=M40wPldX9FGmh4JvgwTd0k4NrqU#v=onepage&q="hesitation phenomena"&f=false.

Abstract (none)
Harry Collins, Willow Leonard-Clarke, and Hannah O’Mahoney, “‘Um, er’: how meaning varies between speech and its typed transcript,” Qualitative Research, vol. 19, no. 6, 2019, pp. 653-668. DOI: 10.1177/1468794118816615. https://journals.sagepub.com/doi/10.1177/1468794118816615.

Abstract We report a small empirical study on the way the transcription used to represent speech affects its meaning. We show that ‘disfluencies’ in speech indicate far more uncertainty in the speaker when transmitted in text than when transmitted in recorded sound. This has important implications for how transcribed interviews should be edited when they are being used to convey meaning rather than the organization of phonemes. We propose the implications of different ways of representing speech in text could be a new subject for investigation. Presented here is one possible empirical approach to such studies.

Keywords certainty in text and speech, disfluencies, editing of transcripts, interview transcription, meaning, qualitative research, transcribing fillers: um, er, uh
Iulia Grosman, Anne Catherine Simon, and Liesbeth Degand, “Empathetic hearers perceive repetitions as less disfluent, especially in non-broadcast situations,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 23-26. DOI: 10.21862/diss-09-007-gros-etal. https://doi.org/10.21862/diss-09-007-gros-etal.

Abstract This experiment measures the impact of the communicative situation on perceived fluency in French speech. We consider three dimensions of fluency: grammatical, discursive and socio-interpersonal. We first hypothesise that grammatical fluency is less influenced by contextual constraints than the other two dimensions. Furthermore, taking into account the Interpersonal Reactivity Index of each participant, we hypothesise that hearers with higher interpersonal capacities will be more tolerant in their fluency evaluation, because of their ability to project into the speaker’s mind. The strength of the design rests on the proposal to test natural stimuli and integrate social and individual variables in a perception experiment.
Dorottya Gyarmathy, and Viktória Horváth, “Pausing strategies with regard to speech style,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 27-30. DOI: 10.21862/diss-09-008-gyar-horv. https://doi.org/10.21862/diss-09-008-gyar-horv.

Abstract Speech is occasionally interrupted by silent and filled pauses of various length. Pauses have many different functions in spontaneous speech (e.g. breathing, marking syntactic boundaries as well as speech planning difficulties, time for self-repair). The aim of the study was the analysis of the interrelation between the temporal pattern and the syntactical position of silent pauses (SP) on one hand. On the other hand, filled pauses (FP) were also analyzed according to their phonetic realization, as well as the combination of SPs and FPs. The effect of speech style on pausing strategies was also analyzed. A narrative recording and a conversational recording from 10 speakers (ages between 20 and 35 years, 5 male, 5 female) were selected from Hungarian Spontaneous Speech Database for the study. The material was manually annotated, silent pauses were categorized, then the duration of pauses was extracted. Results showed that the position of silent and filled pauses affects their duration. The speech style did not influence the frequency of pauses. However, silent and filled pauses were longer in narratives than in conversations. Results suggest that pausing strategies are similar in general; however, the timing patterns of pauses may depend on various factors, e.g. speech style.
Mária Gósy, “Halt command in word retrieval,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 3-6. DOI: 10.21862/diss-09-002-gosy. https://doi.org/10.21862/diss-09-002-gosy.

Abstract In this study, occurrences and temporal patterns of five types of disfluencies were analyzed that show a common feature on the surface. All of them have some kind of interruption of content words followed by some continuation. The purpose was to show whether the place of interruption of the word articulation and the durational patterns of the editing phases are characteristic of re-starts, false starts, slips of the tongue, pauses within words, and prolongations. More than 1,400 instances were processed. Both (i) the number of pronounced segments of abandoned words and the duration of the corresponding editing phases are characteristic of a specific disfluency type, and (ii) speakers select a strategy to overcome their speech planning difficulties most economically.
Julianna Jankovics, and Luca Garai, “Disfluencies in mildly intellectually disabled young adults’ spontaneous speech,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 79-82. DOI: 10.21862/diss-09-021-jank-gara. https://doi.org/10.21862/diss-09-021-jank-gara.

Abstract The study analyzes various hesitations and repairs in the spontaneous speech of mildly intellectually disabled women. The main research questions of the study focus on the similarities and differences in the frequency of disfluencies and the duration of pauses between the spontaneous speech of mildly intellectually disabled and mentally healthy young adults. Our results show that hesitation phenomena were more frequent among intellectually disabled subjects in spontaneous speech, while repairs occurred more frequently among control subjects in guided spontaneous speech.
Annelies Jehoul, “Filled pauses from a multimodal perspective. On the interplay of speech and eye gaze,” PhD Dissertation, Katholieke Universiteit Leuven, September 2019. https://lirias.kuleuven.be/2814932?limo=0.

Abstract This project offers a novel, integrative approach on filled pauses, the elements 'euh' and 'euhm' in Dutch. Insights on filled pauses from various research traditions are united to obtain a comprehensive overview of their form and function. Starting from a cognitive-interactional framework, our analysis relates formal variation in filled pauses to the functional variation. We show that formal differences in filled pauses, such as the difference between 'euh' and 'euhm', the difference in duration, the presence of surrounding silences and the speaker's eye gaze behavior, are associated with functional variation. In the study of the function of filled pauses, earlier studies can be distinguished in two approaches: the filler-as-symptom approach and the filler-as-signal approach (Clark & Fox Tree 2002, De Leeuw 2007). The filler-as-symptom perspective interprets filled pauses as symptoms of cognitive difficulties, for example when the speaker is uncertain or has trouble producing an utterance (e.g. Siegman & Pope 1965, Goldman-Eisler 1968, Christenfeld 1994). In the filler-as-signal perspective, a signaling function is attributed to filled pauses. Filled pauses are, amongst other things, claimed to signal the speaker's intention to continue the turn (Maclay & Osgood 1959), mark a delay in speech (Clark & Fox Tree 2002), structure the discourse (Rendle-Short 2004) and exit a sequence (Schegloff 2010). In this project, however, we show that filled pauses cannot be distinguished into cognitive and discursive filled pauses, but rather, that in most of their functions, these two dimensions are connected. There is an association of the complexity of the cognitive processing, and the scope of the discursive force. Both complex cognitive processing and a broad scope are reflected in the form of the filled pause: a longer duration of the filled pause, more pauses, the use of 'euhm' (instead of 'euh'), and the speaker's gaze aversion.
Borbála Keszler, and Judit Bóna, “Pausing and disfluencies in elderly speech: Longitudinal case studies,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 67-70. DOI: 10.21862/diss-09-018-kesz-bona. https://doi.org/10.21862/diss-09-018-kesz-bona.

Abstract The aim of this paper was to investigate the changes in fluency of speech during ageing. The novelty of the examination is that this is a longitudinal study: it analyses the speech of 7 speakers from middle or young-old age to old-old age. Pausing strategies and frequency of disfluencies were analyzed. Results show that active aging helps to preserve certain parameters of speech characteristics of young speakers.
Valéria Krepsz, “Vowel lengthening — Effect of position, age, and phonological quantity,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 59-62. DOI: 10.21862/diss-09-016-krepsz. https://doi.org/10.21862/diss-09-016-krepsz.

Abstract The present research examined the effect of phrase-final lengthening on the spectral structure of vowels in the spontaneous speech of children and adults. Three Hungarian vowel pairs (in quantity pairs) were analyzed in two positions: in the middle of the phrase and at the end of the phrase. The effect of lengthening on the spectral structure of the vowels was already detected in four-year-olds. However, its extent was strongly correlated with the articulation aspects of the vowels. There was a discrepancy in the tendencies of the lengthening’s effect between the two groups of children and the adults, presumably due to different linguistic experience, inaccuracy of articulation, and significant individual differences.
Mária Laczkó, “Temporal characteristics of teenagers’ spontaneous speech and topic based narratives produced during school lessons,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 63-66. DOI: 10.21862/diss-09-017-laczko. https://doi.org/10.21862/diss-09-017-laczko.

Abstract The aim of this presentation is to analyse the articulation and speech rates of teenagers and the types of pauses in their spontaneous speech and topic based narratives during school lessons. The speech samples were analysed in terms of temporal characteristics using the Praat program. The results showed the different tempo values and various functions of filled pauses in the examined situations.
Mark Liberman, “Dysfluency considered Harmful,” May 2019. https://languagelog.ldc.upenn.edu/nll/?p=42775.

Abstract … as a technical term, that is. Disfluency is no better, although the prefix is less judgmental. There are two problems: 1. These terms pathologize normal behavior, creating confusion between pathological symptoms and common phenomena in normal speech, which may be different not only in their causes and their frequency but also in behavioral detail; 2. Applied to normal speech, these terms often treat intrinsic aspects of the content and performance of spoken messages as if they were disruptions or failures.
Kikuo Maekawa, “Five pieces of evidence suggesting large lookahead in spontaneous monologue,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 7-10. DOI: 10.21862/diss-09-003-maekawa. https://doi.org/10.21862/diss-09-003-maekawa.

Abstract There is considerable disagreement among the researchers of speech production with respect to the range of lookahead or pre-planning. In this paper, five pieces of evidence suggesting the presence of relatively large lookahead in spontaneous monologues are presented, based on the analyses of the Corpus of Spontaneous Japanese. This evidence consistently suggests that the range of a lookahead is six to seven accentual phrases long, which corresponds on average to 3–4 seconds in the time domain.
Helena Moniz, “Processing disfluencies in distinct speaking styles: Idiosyncrasies and transversality,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 1-2. DOI: 10.21862/diss-09-001-moniz. https://doi.org/10.21862/diss-09-001-moniz.

Abstract This talk will tackle the idiosyncratic properties of disfluencies in distinct speaking styles, mostly university lectures (Trancoso et al., 2008) and map-task dialogues (Trancoso et al., 1998), but also featuring verbal fluency tests, and (more recently) second language learning presentations in ecological settings. It will also discuss the transversal acoustic-prosodic properties pertained across speaking styles. The main research questions are twofold: i) are there domain effects in the production of disfluencies when speakers adjust to distinct communicative contexts, as in university lectures and dialogues?; ii) if domain effects do exist, are there still acoustic-prosodic properties that can be shared across domains?
Elizabeth Morin-Lessard, and Krista Byers-Heinlein, “Uh and euh signal novelty for monolinguals and bilinguals: evidence from children and adults,” Journal of Child Language, vol. 46, no. 3, 2019, pp. 522–545. DOI: 10.1017/S0305000918000612.

Abstract Previous research suggests that English monolingual children and adults can use speech disfluencies (e.g., uh) to predict that a speaker will name a novel object. To understand the origins of this ability, we tested 48 32-month-old children (monolingual English, monolingual French, bilingual English–French; Study 1) and 16 adults (bilingual English–French; Study 2). Our design leveraged the distinct realizations of English (uh) versus French (euh) disfluencies. In a preferential-looking paradigm, participants saw familiar–novel object pairs (e.g., doll–rel), labeled in either Fluent (“Look at the doll/rel!”), Disfluent Language-consistent (“Look at thee uh doll/rel!”), or Disfluent Language-inconsistent (“Look at thee euh doll/rel!”) sentences. All participants looked more at the novel object when hearing disfluencies, irrespective of their phonetic realization. These results suggest that listeners from different language backgrounds harness disfluencies to comprehend day-to-day speech, possibly by attending to their lengthening as a signal of speaker uncertainty. Stimuli and data are available at [https://osf.io/qn6px/].
Johanna Pap, “Effects of speech rate changes on pausing and disfluencies in cluttering,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 75-78. DOI: 10.21862/diss-09-020-pap. https://doi.org/10.21862/diss-09-020-pap.

Abstract People with cluttering (PWC) often receive feedback, such as “Slow down!”, even so, this fluency disorder cannot be cured by only slowing down the speakers’ speech rate. When PWC accelerate their speech rate, language planning difficulties and word structure errors might occur, which might result in breakdowns in fluency and/or intelligibility. In the present paper characteristics of the frequency of disfluencies were examined in four different speech tasks from deliberately slow to maximum speech rate, whether speech rate changes have effects on cluttered speech. Twenty participants of this investigation were individuals suspected of cluttering with ages between 20 and 50 years of both genders. The results show that PWC are able to change, not only their speech rate but articulatory rate as well. Moreover, disfluencies were produced the most frequently in the speech task of maximum speech rate, where PWC do not have enough time for speech planning. The research provides empirical, measured data for a better insight into the nature of cluttering. Understanding the correlation between speech rate and disfluencies in cluttered speech is fundamental to improve the diagnosis of cluttering.
Brent Pitchford, and Karen M. Arnell, “Speech of young offenders as a function of their psychopathic tendencies,” Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, vol. 73, no. 3, 2019, pp. 193-201. DOI: 10.1037/cep0000176.

Abstract The purpose of this study was to analyse young psychopathic offenders’ speech compared with controls and to determine whether it was dissimilar. An examination of two subsets of disfluencies in speech was conducted (i.e., filled pauses and discourse markers) to explore their disfluent language. Transcripts of Psychopathy Checklist–Revised Youth Version (PCL:YV) interviews from a sample of young offenders were analysed using Wmatrix software (Rayson, 2003, 2008). The young offenders were divided into a high psychopathy group (HP; n = 13) and a low psychopathy group (LP; n = 13). HP participants included more words relating to basic needs (i.e., money, sex) in their speech than their counterparts, but not fewer words relating to social needs (i.e., family, kin), which could reflect viewing the world in a more unemotional and instrumental way by HP individuals compared with LP participants. HP participants had fewer total disfluencies and filled pauses (i.e., uh, um) in their speech than LP participants. However, the usage of discourse markers (i.e., I mean, you know, like) was similar for HP and LP participants. Like adult psychopaths, the young offenders with higher psychopathic tendencies tended to use more basic needs words in their speech. Reduced filled pause use, which has been found to be related to individual’s self-consciousness, may reflect less self-monitoring in psychopaths when they are engaging in secondary tasks (i.e., tasks that will not offer rewards). These findings provide further support that individual differences can be reflected by characteristics in speech.
Kata Baditzné Pálvölgyi, “Hesitation patterns in the Spanish spontaneous speech of Hungarian learners of Spanish,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 35-38. DOI: 10.21862/diss-09-010-badi. https://doi.org/10.21862/diss-09-010-badi.

Abstract This paper examines what native Spanish speakers find most disturbing in the pronunciation of Hungarian language learners of Spanish. Former research (Baditzné Pálvölgyi, 2019) showed that in spontaneous Spanish speech of at least threshold level Hungarian learners, one of the aspects that Spanish native speakers least tolerated was the way Hungarians hesitated. So the present paper focuses primarily on hesitation phenomena—lengthening and filled pauses—assuming that Hungarians hesitate more, and the lengthened segments are longer than the Spanish ones. In order to validate the hypothesis, an investigation comparing a corpus of Northern Spanish spontaneous speech to another corpus of advanced Hungarian learners of Spanish was conducted.
Ralph L. Rose, “The structural signaling effect of silent and filled pauses,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 19-22. DOI: 10.21862/diss-09-006-rose. https://doi.org/10.21862/diss-09-006-rose.

Abstract Filled pauses (uh, um) have been shown in a number of studies to have a facilitative effect for listeners, such as helping them better perceive the syntactic structure of ongoing speech. This may be because the extra time afforded by the filled pause gives listeners more time to process the input. Theoretically, then, silent pauses should show a comparable effect. The present study tests this prediction using a grammaticality judgment task following a study by Bailey and Ferreira (2003). Results show that filled and silent pauses have a comparable influence on listeners’ grammaticality judgments but further suggest that listeners deem silent pauses as more important and influential markers.
Ralph L. Rose, “A comparison of filled pauses in scripted and non-scripted spontaneous speech,” in The 3rd International Symposium on Linguistic Patterns in Spontaneous Speech, Taipei, Taiwan, November 2019, pp. 21-25. http://hdl.handle.net/2065/00074187.

Abstract Television and film productions are heavily scripted, but intend to portray speech as unscripted within the fiction of the dramatic universe they depict. Previous evidence (Quaglio, 2009) suggests however, that various lexical features of speech occur in such scripted spontaneous speech differently than they do in actual spontaneous speech. The present study is a comparison of the occurrence of filled pause disfluencies (in English, uh and um) in scripted spontaneous speech and actual spontaneous speech, to see if the basic usage patterns are similar. Using the English-Corpora.org web site interface, filled pauses were examined in three corpora (spontaneous speech, TV transcripts, and movie transcripts) in terms of their basic frequency of occurrence, their um:uh ratios, and their structural distribution with respect to sentence boundaries. Each was also evaluated in terms of how they shifted over time. Results show that the disfluency patterns of scripted spontaneous speech are similar in many ways to that of actual spontaneous speech. The frequency of filled pauses is similar to that shown in other major corpora and the um:uh ratio also replicates a trend observed in other work (Wieling et al, 2016; Fruehwald, 2016) suggesting an ongoing shift toward the use of um over uh but with television and film speech patterns lagging that of society.
Vered Silber-Varod, Mária Gósy, and Robert Eklund, “Segment prolongation in Hebrew,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 47-50. DOI: 10.21862/diss-09-013-silb-etal. https://doi.org/10.21862/diss-09-013-silb-etal.

Abstract In this paper we study segment prolongations (PRs), a type of disfluency sometimes included under the term “hesitation disfluencies”, in Hebrew. PRs have previously been studied in a number of other languages within a comprehensive speech disfluency framework, which is applied to Hebrew in the current study. For the purpose of this study we defined Hebrew clitics, such as conjunctions, articles, prepositions and so on, as words. The most striking difference between Hebrew and the previously studied languages is how restricted PRs seem to be in Hebrew, occurring almost exclusively on word-final vowels. The most frequently prolonged vowel is [e]. The segment type does not affect PRs’ duration. We found significant differences between men and women regarding the frequency of PRs.
Shungo Suzuki, and Judit Kormos, “The effects of read-aloud assistance on second language oral fluency in text summary speech,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 31-34. DOI: 10.21862/diss-09-009-suzu-korm. https://doi.org/10.21862/diss-09-009-suzu-korm.

Abstract Focusing on text summary speaking tasks, the present study investigated the effects of the activation of phonological representations during text comprehension (operationalized by read-aloud assistance) on the subsequent retelling speech. A total of 24 Japanese learners of English completed text summary speaking tasks under two conditions: (a) reading without read-aloud assistance and (b) reading with read-aloud assistance. Their speech data were analyzed by lexical overlap indices (i.e. the ratio of characteristic single-words and multiword sequences) and by fluency measures capturing three major dimensions of fluency—speed, breakdown, and repair fluency. The results showed that read-aloud assistance directly facilitated lexical overlaps with source texts and indirectly improved speed and repair fluency. Furthermore, read-aloud assistance was found to affect the interrelationship between lexical overlaps and utterance fluency. The findings suggested that read-aloud assistance might help second language learners to store multiword sequences as a single unit (i.e. chunking) during text comprehension.
Linda Taschenberger, Outi Tuomainen, and Valerie Hazan, “Disfluencies in spontaneous speech in easy and adverse communicative situations: The effect of age,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 55-58. DOI: 10.21862/diss-09-015-tasc-etal. https://doi.org/10.21862/diss-09-015-tasc-etal.

Abstract Disfluencies are a pervasive feature of speech communication. Their function in communication is still widely discussed with some proposing that their usage might aid understanding. Accordingly, talkers may produce more disfluencies when conversing in adverse communicative situations, e.g. in background noise. Moreover, increasing age may have an effect on disfluency use as older adults report particular difficulties when communicating in adverse conditions. In this study, we elicited spontaneous speech via a problem-solving task from four different age groups (19–76 years old) to investigate the effect of energetic and informational maskers on the use of filled pauses (FPs), and its interaction with age. Measures of disfluency rates, effort ratings, and communication efficiency were obtained. Results show that, against our predictions, FP usage may decrease in adverse conditions. Moreover, age does not play a great role in adults with normal hearing. The results indicate that individuals differ greatly in their disfluency adaptations, utilising different strategies to overcome challenging communicative situations.
Michiko Watanabe, Yusaku Korematsu, and Yuma Shirahata, ““Uh” is preferred by male speakers in informal presentations in American English,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 43-46. DOI: 10.21862/diss-09-012-wata-etal. https://doi.org/10.21862/diss-09-012-wata-etal.

Abstract This study investigates factors that are likely to be related to speakers' choice of filler type between uh and um in English, using an informal presentation speech corpus. The effects of the following factors on the probability of each filler type were examined: (1) immediately preceding clause boundary depth, (2) clause size measured as the number of words in the clause, (3) the number of quotation remarks in the clause, and (4) speaker's sex. The filler probabilities increased with the boundary depths. This trend was much stronger with um than with uh. Ums are more likely to appear clause-initially than uhs. Clause size had similar effect sizes on the two filler types. The number of quotation remarks had a stronger negative effect with ums. Speaker's sex had a significant effect only with uhs. Uhs are used more frequently by male speakers than by female speakers. The results indicate that speakers' choice of filler type is affected by the combination of multiple factors with various effect sizes.
Hong Zhang, “Variation in the choice of filled pause: A language change, or a variation in meaning?,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 15-18. DOI: 10.21862/diss-09-005-zhang. https://doi.org/10.21862/diss-09-005-zhang.

Abstract The role of filled pauses in message structuring is a heavily debated question, but the result is still somewhat inconclusive. In this study, I consider this question jointly with sociolinguistic factors that have been thought to affect the choice of filled pause in American English. The results suggest that the use of uh is subject to higher variability across not only age groups, but also conversation topics and interlocutors. A latent semantic analysis found consistent difference between two forms of filled pause and silent pauses of varying duration in the primary latent dimension, but similarity between short silent pause and uh, as well as long silent pause and um in the second dimension. Therefore, the functional difference between um and uh should be acknowledged, and the observed change in their relative popularity is potentially related to their different meaning or function in the discourse.
Derya Çokal, Vitor Zimmerer, Douglas Turkington, Nicol Ferrier, Rosemary Varley, Stuart Watson, and Wolfram Hinzen, “Disturbing the rhythm of thought: Speech pausing patterns in schizophrenia, with and without formal thought disorder,” PLOS ONE, vol. 14, no. 5, May 2019, pp. 1-14. DOI: 10.1371/journal.pone.0217404. https://doi.org/10.1371/journal.pone.0217404.

Abstract Everyday speech is produced with an intricate timing pattern and rhythm. Speech units follow each other with short interleaving pauses, which can be either bridged by fillers (erm, ah) or empty. Through their syntactic positions, pauses connect to the thoughts expressed. We investigated whether disturbances of thought in schizophrenia are manifest in patterns at this level of linguistic organization, whether these are seen in first degree relatives (FDR) and how specific they are to formal thought disorder (FTD). Spontaneous speech from 15 participants without FTD (SZ-FTD), 15 with FTD (SZ+FTD), 15 FDRs and 15 neurotypical controls (NC) was obtained from a comic strip retelling task and rated for pauses subclassified by syntactic position and duration. SZ-FTD produced significantly more unfilled pauses than NC in utterance-initial positions and before embedded clauses. Unfilled pauses occurring within clausal units did not distinguish any groups. SZ-FTD also differed from SZ+FTD in producing significantly more pauses before embedded clauses. SZ+FTD differed from NC and FDR only in producing longer utterance-initial pauses. FDRs produced significantly fewer fillers than NC. Results reveal that the temporal organization of speech is an important window on disturbances of the thought process and how these relate to language.

2018

Nawal Fadhil Abbas, Ru'aa Tariq Jawad, and Maysoon Tahir Muhi, “Pauses and Hesitations in Drama Texts,” International Journal of English Linguistics, vol. 8, no. 4, 2018, pp. 106-114. DOI: 10.5539/ijel.v8n4p106. http://www.ccsenet.org/journal/index.php/ijel/article/view/74205.

Abstract Pauses and hesitations are phenomena that can be found in speech. They can help both the speaker and the hearer, due to the functions they have in a dialogue. Their occurrence in speech has a value that they make it more understandable. In this regard, the researchers intend to critically examine the pauses and hesitations used in the two texts as well as their functions. The present paper aims to identify the types of pauses and hesitations used by Pinter’s The Homecoming and Baker’s Circle Mirror Transformation as well as the functions they serve and to compare both playwrights in this regard. To do so, the sequential production approach of turn taking, in combination with the contributions of some scholars who state the multifunctional use of pauses and hesitations, has been used. The findings of the present study show that pauses and hesitations do not exist arbitrarily in speech but they are found to serve certain functions depending on the context in which they occur. Regarding the two selected extracts, it is noticed from the comparison that the two writers do not use pauses and hesitations equally. Baker uses them more frequently than Pinter due to the context in which they are used which requires using pauses to aid comprehension.
Ayşe Altıparmak, and Gülmira Kuruoğlu, “An Analysis of Speech Disfluencies of Turkish Speakers Based on Age Variable,” Journal of Psycholinguistic Research, Jan 2018. DOI: 10.1007/s10936-017-9553-4. https://doi.org/10.1007/s10936-017-9553-4.

Abstract The focus of this research is to verify the influence of the age variable on fluent Turkish native speakers’ production of the various types of speech disfluencies. To accomplish this, four groups of native speakers of Turkish between ages 4–8, 18–23, 33–50 years respectively and those over 50-year-olds were constructed. A total of 84 participants took part in this study. Prepared and unprepared speech samples of at least 300 words were collected from each participant via face-to-face interviews that were tape recorded and transcribed; for practical reasons, only the unprepared speech samples were collected from children. As a result, for the prepared speech situation, there was no statistically significant difference in terms of age in the production rates of filled gaps, false starts, slips of the tongue and repetitions; however, participants in the over 50-year-old group produced more hesitations and prolongations than participants in the 18–23 and 33–50-year-old groups. For the unprepared speech situation, age variable was not effective on the production rates of filled gaps. However, 4–8 and over 50-year-old participants produced more hesitations and prolongations than the 18–23 and 33–50-year-old groups. 4–8-year-old children produced more slips of the tongue than the 18–23 and 33–50-year-old groups, and more false starts and repetitions than the participants in the other three age groups (18–23, 33–50, over 50). Further analyses revealed more extensive insights related to the types of disfluencies, the position of disfluencies, and the linguistic units involved in disfluency production in Turkish speech.

Keywords linguistics, Speech disfluencies, Speech production, Turkish speech
Yu-Lin Cheng, “Unfamiliar Accented English Negatively Affects EFL Listening Comprehension: It Helps to be a More Able Accent Mimic,” Journal of Psycholinguistic Research, Feb 2018. DOI: 10.1007/s10936-018-9562-y. https://doi.org/10.1007/s10936-018-9562-y.

Abstract In this study, EFL learners who listened to four short context-rich audio files each delivered in an unfamiliar English accent were required to produce best-attempt transcriptions and accent imitation recordings. Results indicate that exposure alone does not suffice to eliminate accent impact on EFL listeners. Importantly, results from one-way ANOVA analyses reveal between-participants differences in residual accent impact, vocabulary knowledge, and quality of accent imitation. Results from a linear mixed-effects model analysis, while suggesting that other unidentified factors may also assist EFL listeners in processing unfamiliar accented English, demonstrate that the more able mimics cope more successfully with unfamiliar accents than the less able mimics. Counter-intuitively, vocabulary knowledge is rejected as a predictor for success in reducing accent impact. A logical explanation for this particular finding is that a larger vocabulary repertoire aids listeners where there is no interference from unfamiliar accents. Given these findings, to better prepare EFL listeners for the English-as-an-International-Language world, training should include both listening to a variety of native and non-native accents and performing accent imitation (reproduction) exercises to further expand listeners’ phonological-phonetic flexibility.

Keywords Accent imitation, Accent impact, Chinese-L1, EFL
Felix Ball, Lara E. Michels, Carsten Thiele, and Toemme Noesselt, “The role of multisensory interplay in enabling temporal expectations,” Cognition, vol. 170, no. Supplement C, 2018, pp. 130-146. DOI: 10.1016/j.cognition.2017.09.015. http://www.sciencedirect.com/science/article/pii/S0010027717302585.

Abstract Temporal regularities can guide our attention to focus on a particular moment in time and to be especially vigilant just then. Previous research provided evidence for the influence of temporal expectation on perceptual processing in unisensory auditory, visual, and tactile contexts. However, in real life we are often exposed to a complex and continuous stream of multisensory events. Here we tested – in a series of experiments – whether temporal expectations can enhance perception in multisensory contexts and whether this enhancement differs from enhancements in unisensory contexts. Our discrimination paradigm contained near-threshold targets (subject-specific 75% discrimination accuracy) embedded in a sequence of distractors. The likelihood of target occurrence (early or late) was manipulated block-wise. Furthermore, we tested whether spatial and modality-specific target uncertainty (i.e. predictable vs. unpredictable target position or modality) would affect temporal expectation (TE) measured with perceptual sensitivity (d′) and response times (RT). In all our experiments, hidden temporal regularities improved performance for expected multisensory targets. Moreover, multisensory performance was unaffected by spatial and modality-specific uncertainty, whereas unisensory TE effects on d′ but not RT were modulated by spatial and modality-specific uncertainty. Additionally, the size of the temporal expectation effect, i.e. the increase in perceptual sensitivity and decrease of RT, scaled linearly with the likelihood of expected targets. Finally, temporal expectation effects were unaffected by varying target position within the stream. Together, our results strongly suggest that participants quickly adapt to novel temporal contexts, that they benefit from multisensory (relative to unisensory) stimulation and that multisensory benefits are maximal if the stimulus-driven uncertainty is highest. We propose that enhanced informational content (i.e. multisensory stimulation) enables the robust extraction of temporal regularities which in turn boost (uni-)sensory representations.

Keywords Auditory dominance, Multisensory interplay, Redundant target, Spatial coincidence, Temporal expectation, Temporal orienting
Jia E. Loy, Hannah Rohde, and Martin Corley, “Cues to Lying May be Deceptive: Speaker and Listener Behaviour in an Interactive Game of Deception,” Journal of Cognition, vol. 1, no. 1, 2018, pp. 1-21. DOI: 10.5334/joc.46.

Abstract Are the cues that speakers produce when lying the same cues that listeners attend to when attempting to detect deceit? We used a two-person interactive game to explore the production and perception of speech and nonverbal cues to lying. In each game turn, participants viewed pairs of images, with the location of some treasure indicated to the speaker but not to the listener. The speaker described the location of the treasure, with the objective of misleading the listener about its true location; the listener attempted to locate the treasure, based on their judgement of the speaker’s veracity. In line with previous comprehension research, listeners’ responses suggest that they attend primarily to behaviours associated with increased mental difficulty, perhaps because lying, under a cognitive hypothesis, is thought to cause an increased cognitive load. Moreover, a mouse-tracking analysis suggests that these judgements are made quickly, while the speakers’ utterances are still unfolding. However, there is a surprising mismatch between listeners and speakers: When producing false statements, speakers are less likely to produce the cues that listeners associate with lying. This production pattern is in keeping with an attempted control hypothesis, whereby liars may take into account listeners’ expectations and correspondingly manipulate their behaviour to avoid detection.

Keywords Deception; Communication; Pragmatics; Disfluency
Emi Morita, and Tomoyo Takagi, “Marking “commitment to undertaking of the task at hand”: Initiating responses with eeto in Japanese conversation,” Journal of Pragmatics, vol. 124, January 2018, pp. 31-49. DOI: 10.1016/j.pragma.2017.12.002. http://www.sciencedirect.com/science/article/pii/S0378216617302515.

Abstract Eeto is one of the most frequently occurring Japanese vocal markers. Often characterized as a mere time-buyer, or “filler”, this token has been frequently said to reflect ongoing internal cognitive processing or reflection. Examining naturally occurring instances of eeto by focusing on its occurrences at the beginning of responses to information–seeking questions, however, we found that eeto-prefaced responses all provide a carefully constructed answer in contexts where the responses might otherwise be heard as not aligning in the most straightforward way. We argue that eeto affords Japanese conversationalists a way through which they can project the maximally prosocial stance of interactional commitment to undertaking the task at hand. Rather than a marker of an internal processing state, eeto, we argue, is instead a useful linguistic resource to publically display a respectful stance toward the questioner while the respondent is carefully building an appropriately contextualized response.

Keywords Japanese, Fillers, Conversation analysis, Turn beginnings, stance
Matthew Purver, Julian Hough, and Christine Howes, “Computational Models of Miscommunication Phenomena,” Topics in Cognitive Science, March 2018. DOI: 10.1111/tops.12324. https://doi.org/10.1111/tops.12324.

Abstract Miscommunication phenomena such as repair in dialogue are important indicators of the quality of communication. Automatic detection is therefore a key step toward tools that can characterize communication quality and thus help in applications from call center management to mental health monitoring. However, most existing computational linguistic approaches to these phenomena are unsuitable for general use in this way, and particularly for analyzing human–human dialogue: Although models of other-repair are common in human-computer dialogue systems, they tend to focus on specific phenomena (e.g., repair initiation by systems), missing the range of repair and repair initiation forms used by humans; and while self-repair models for speech recognition and understanding are advanced, they tend to focus on removal of “disfluent” material important for full understanding of the discourse contribution, and/or rely on domain-specific knowledge. We explain the requirements for more satisfactory models, including incrementality of processing and robustness to sparsity. We then describe models for self- and other-repair detection that meet these requirements (for the former, an adaptation of an existing repair model; for the latter, an adaptation of standard techniques) and investigate how they perform on datasets from a range of dialogue genres and domains, with promising results.

Keywords Dialogue, disfluency, Incrementality, Miscommunication, Parallelism, repair, Sparsity
Jennifer M. Roche, and Hayley S. Arnold, “The Effects of Emotion Suppression During Language Planning and Production,” Journal of Speech, Language, and Hearing Research, vol. 61, no. 8, August 2018, pp. 2076-2083. DOI: 10.1044/2018_JSLHR-L-17-0232. https://pubs.asha.org/doi/10.1044/2018_JSLHR-L-17-0232.

Abstract Purpose: Emotion regulation and language planning occur in parallel during interactive communication, but their processes are often studied separately. It has been suggested that emotion suppression and more complex language production both recruit cognitive resources. However, it is currently less clear how the language planning and production system is impacted when required to emotionally suppress outward displays of affect (i.e., expressive suppression). The purpose of the current study was to evaluate the interactive effects of emotion regulation and language production processes. | Method: Through discourse analysis of a corpus of interactive dialogue, we evaluated the production of interjections (i.e., also termed “filled pauses,” a type of speech disfluency) when participants regulated outward displays of emotion and when language was lexically complex (i.e., via lexical diversity). One participant (the sender) was assigned to either express or suppress affective displays during the interaction. The other person (the receiver) was given no special instructions before the interaction. The interactions were transcribed, and their linguistic content (i.e., lexical diversity, lexical alignment, and interjections) was analyzed. | Results: Results indicated that participants actively suppressing outward displays of affect produced more interjections and that participants asked to emotionally regulate, both expressors and suppressors, were more disfluent when producing lexically diverse statements (2 cognitively demanding tasks). | Conclusions: The current research provides support that, when suppressing emotion, one might be more disfluent when speaking. However, also when engaged in 2 simultaneous, demanding tasks of having to either upregulate or downregulate emotions and utter lexically diverse statements, the combined cognitive load may impede fluency in language production. More specifically, in the context of language planning and production, emotion suppression may pilfer resources away from the language planning and production system, leading to higher rates of disfluent speech. This finding is of particular importance because understanding the interactive effects of emotion and language production may be impactful to interventions for communication disorders.
Julie Sedivy, “Your Speech Is Packed With Misunderstood, Unconscious Messages,” March 2018. http://nautil.us/blog/-your-speech-is-packed-with-misunderstood-unconscious-messages.

Abstract Imagine standing up to give a speech in front of a critical audience. As you do your best to wax eloquent, someone in the room uses a clicker to conspicuously count your every stumble, hesitation, um and uh; once you’ve finished, this person loudly announces how many of these blemishes have marred your presentation...
Sophia Uddin, Shannon L.M. Heald, Stephen C. Van Hedger, Serena Klos, and Howard C. Nusbaum, “Understanding environmental sounds in sentence context,” Cognition, vol. 172, 2018, pp. 134-143. DOI: 10.1016/j.cognition.2017.12.009. https://www.sciencedirect.com/science/article/pii/S0010027717303293.

Abstract There is debate about how individuals use context to successfully predict and recognize words. One view argues that context supports neural predictions that make use of the speech motor system, whereas other views argue for a sensory or conceptual level of prediction. While environmental sounds can convey clear referential meaning, they are not linguistic signals, and are thus neither produced with the vocal tract nor typically encountered in sentence context. We compared the effect of spoken sentence context on recognition and comprehension of spoken words versus nonspeech, environmental sounds. In Experiment 1, sentence context decreased the amount of signal needed for recognition of spoken words and environmental sounds in similar fashion. In Experiment 2, listeners judged sentence meaning in both high and low contextually constraining sentence frames, when the final word was present or replaced with a matching environmental sound. Results showed that sentence constraint affected decision time similarly for speech and nonspeech, such that high constraint sentences (i.e., frame plus completion) were processed faster than low constraint sentences for speech and nonspeech. Linguistic context facilitates the recognition and understanding of nonspeech sounds in much the same way as for spoken words. This argues against a simple form of a speech-motor explanation of predictive coding in spoken language understanding, and suggests support for conceptual-level predictions.

Keywords Constraint, Context, Environmental sound perception, Language, Recognition, speech perception
Sylvie Hancil, “Discourse coherence and intersubjectivity: The development of final but in dialogues,” Language Sciences, 2018. DOI: 10.1016/j.langsci.2017.12.002. http://www.sciencedirect.com/science/article/pii/S0388000117300852.

Abstract All the studies on final particles in non-Asian languages systematically propose a synchronic view of the constructions under consideration. This paper closes the gap by offering a diachronic analysis of final but in dialogues in a corpus of Northern English over a sixty-year period. Relying on Schiffrin’s (1987) planes of discourse and Hasselgård’s (2006) definition of a modal particle, it is shown that final but has semantic–pragmatic properties of both a discourse marker and a modal particle. A socio-linguistic approach complements the analysis. Besides, the modal values identified are discussed in relation to Traugott’s (1982) and Traugott and Dasher’s (2002) theories of language change. Finally, it is explained how final but can be inserted in the category of final particles.

Keywords Discourse value, Final particles, language change, Modal value, Northern English, Socio-linguistic parameters

2017

Jens Allwood, “Fluency or disfluency?,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 1-4. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract In this paper, I investigate the concepts of “fluency” and “disfluency” and argue that the application of the two concepts must be relativized to type of communicative activity. It is not clear that there is a generic sense of fluency or disfluency, rather what contributes to fluency and disfluency depends on what type of communication we are dealing with. The paper then turns to a brief investigation of what makes interactive face-to-face communication fluent or disfluent and argues that many of the features that have been labeled as disfluent, in fact, contribute to the fluency of interactive communication. Finally, I suggest that maybe it is time for a change of terminology and abandon the term “disfluent” for more positive or neutral terminology.

Keywords DiSS
Ana Rita S. Valente, Kenneth O. St. Louis, Margaret Leahy, Andreia Hall, and Luis M.T. Jesus, “A country-wide probability sample of public attitudes toward stuttering in Portugal,” Journal of Fluency Disorders, vol. 52, 2017, pp. 37-52. DOI: 10.1016/j.jfludis.2017.03.001. http://www.sciencedirect.com/science/article/pii/S0094730X16300249.

Abstract Background. Negative public attitudes toward stuttering have been widely reported, although differences among countries and regions exist. Clear reasons for these differences remain obscure. | Purpose. Published research is unavailable on public attitudes toward stuttering in Portugal as well as a representative sample that explores stuttering attitudes in an entire country. This study sought to (a) determine the feasibility of a country-wide probability sampling scheme to measure public stuttering attitudes in Portugal using a standard instrument (the "Public Opinion Survey of Human Attributes–Stuttering" ["POSHA–S"]) and (b) identify demographic variables that predict Portuguese attitudes. | Methods. The POSHA–S was translated to European Portuguese through a five-step process. Thereafter, a local administrative office-based, three-stage, cluster, probability sampling scheme was carried out to obtain 311 adult respondents who filled out the questionnaire. | Results. The Portuguese population held stuttering attitudes that were generally within the average range of those observed from numerous previous POSHA–S samples. Demographic variables that predicted more versus less positive stuttering attitudes were respondents’ age, region of the country, years of school completed, working situation, and number of languages spoken. Non-predicting variables were respondents’ sex, marital status, and parental status. | Conclusion. A local administrative office-based, probability sampling scheme generated a respondent profile similar to census data and indicated that Portuguese attitudes are generally typical.

Keywords Representative Sampling
Anne Ruth van Leeuwen, Right on time. Utrecht, the Netherlands: Netherlands Graduate School of Linguistics / Landelijke (LOT), 2017, pp. 155. https://www.lotpublications.nl/right-on-time.

Abstract When a conversation is running smoothly, you know exactly when to nod, hum, or when to start your turn. You feel understood and connected, and you sense that your conversational partner feels the same. However, a conversation may also contain awkward silences, simultaneous starts, and an overall feeling of stuttering and stammering. During such conversations, you are often left with feelings of distance and mutual incomprehension. | Many people share the intuition that the expression of ‘being in sync’ with someone means that you are somehow in tune, in agreement, or in harmony with the other. This dissertation explores whether this intuition is correct; it investigates whether specific temporal patterns between turn-taking speakers, including synchronization of speech rhythms, shape the affective impression of speakers in conversation. The answer to this question can broaden our understanding of the affective push-and-pull of spoken interaction that we experience every day. | This question was explored by presenting participants with short fragments of dialogues between speakers in which we manipulated the temporal patterns between those speakers. Participants were then asked to rate the perceived degree of affiliation between the speakers of those fragments. In the last study of this dissertation we also recorded participants’ real-time affective response during listening to these fragments. We found that, in addition to the presence of overlapping talk, responding too early given the beat of the previous speaker conveys disaffiliation. ‘Being in sync’ is not just a figure of speech, but a real sign of affiliation in spoken dialogue.
Malte Belz, “Glottal filled pauses in German,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 5-8. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract For German, filled pauses are traditionally described with a vocalic form äh and a vocalic-nasal form ähm. A corpus-based approach and a closer phonetic inspection are used here to argue for an additional form, namely glottal filled pauses. In the data analysed for this study, the glottal form is produced by all seven speakers and amounts to 21% of all filled pauses. Contexts and durations of occurrences are discussed and compared to earlier studies on traditional filled pauses. It is suggested that the glottal variant should be considered in future studies on filled pauses and disfluencies.

Keywords DiSS
Axel Bergström, Martin Johansson, and Robert Eklund, “Differences in production of disfluencies in children with typical language development and children with mixed receptive-expressive language disorder,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 9-12. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract There are several studies about non-fluency in people who stutter, but comparatively few regarding children with language impairment. The current research body regarding disfluencies in children with language impairment has been using different study-designs and definitions, making some results rather contradictory. The purpose of the present study is to expand the knowledge about disfluencies in children with language impairment and compare the occurrence of disfluencies between children with language impairment and children with typical language development in the same age group. A total of ten children with language impairment and six children with typical language development participated in this study. The subjects were recorded when talking freely about a thematic picture or toys and then analysed by calculating disfluencies per 50 words including frequency of different kinds of disfluencies according to Johnson and Associates’ (1959) classic taxonomy. Our results show that children with language impairment do produce statistically significantly more disfluency in general, notably sound and syllable repetition, broken words and prolongations.

Keywords DiSS
Simon Betz, Robert Eklund, and Petra Wagner, “Prolongation in German,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 13-16. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract We investigate segment prolongation as a means of disfluent hesitation in spontaneous German speech. We describe phonetic and structural features of disfluent prolongation and compare it to data of other languages and to non-disfluent prolongations.

Keywords DiSS
Miriam Bilac, Marine Chamoux, and Angelica Lim, “Gaze and filled pause detection for smooth human-robot conversations,” in 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids), 2017, pp. 297-304. DOI: 10.1109/HUMANOIDS.2017.8246889.

Abstract Let the human speak! Interactive robots and voice interfaces such as Pepper, Amazon Alexa, and OK Google are becoming more and more popular, allowing for more natural interaction compared to screens or keyboards. One issue with voice interfaces is that they tend to require a “robotic” flow of human speech. Humans must be careful to not produce disfluencies, such as hesitations or extended pauses between words. If they do, the agent may assume that the human has finished their speech turn, and interrupts them mid-thought. Interactive robots often rely on the same limited dialogue technology built for speech interfaces. Yet humanoid robots have the potential to also use their vision systems to determine when the human has finished their speaking turn. In this paper, we introduce HOMAGE (Human-rObot Multimodal Audio and Gaze End-of-turn), a multimodal turntaking system for conversational humanoid robots. We created a dataset of humans spontaneously hesitating when responding to a robot's open-ended questions such as, “What was your favorite moment this year?”. Our analyses found that users produced both auditory filled pauses such as “uhhh”, as well as gaze away from the robot to keep their speaking turn. We then trained a machine learning system to detect the auditory filled pauses and integrated it along with gaze into the Pepper humanoid robot's real-time dialog system. Experiments with 28 naive users revealed that adding auditory filled pause detection and gaze tracking significantly reduced robot interruptions. Furthermore, user turns were 2.1 times longer (without repetitions), suggesting that this strategy allows humans to express themselves more, toward less time pressure and better robot listeners.

Keywords Speech; Humanoid robots; Feature extraction; Human-robot interaction; Training; Real-time systems; human computer interaction; humanoid robots; human-robot interaction; interactive systems; learning (artificial intelligence); man-machine systems; robot vision; Gaze; human-robot conversations; interactive robots; voice interfaces; Amazon Alexa; OK Google; natural interaction; robotic flow; human speech; humans; speech turn; speech interfaces; vision systems; speaking turn; Human-rObot Multimodal Audio; multimodal turntaking system; conversational humanoid robots; auditory filled pauses; Pepper humanoid robot; auditory filled pause detection; robot interruptions; user turns; robot listeners
Hans Rutger Bosker, “How our own speech rate influences our perception of others,” Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 8, 08/2017, pp. 1225-1238. DOI: 10.1037/xlm0000381. http://psycnet.apa.org/record/2017-01854-001.

Abstract In conversation, our own speech and that of others follow each other in rapid succession. Effects of the surrounding context on speech perception are well documented but, despite the ubiquity of the sound of our own voice, it is unknown whether our own speech also influences our perception of other talkers. This study investigated context effects induced by our own speech through 6 experiments, specifically targeting rate normalization (i.e., perceiving phonetic segments relative to surrounding speech rate). Experiment 1 revealed that hearing prerecorded fast or slow context sentences altered the perception of ambiguous vowels, replicating earlier work. Experiment 2 demonstrated that talking at a fast or slow rate prior to target presentation also altered target perception, though the effect of preceding speech rate was reduced. Experiment 3 showed that silent talking (i.e., inner speech) at fast or slow rates did not modulate the perception of others, suggesting that the effect of self-produced speech rate in Experiment 2 arose through monitoring of the external speech signal. Experiment 4 demonstrated that, when participants were played back their own (fast/slow) speech, no reduction of the effect of preceding speech rate was observed, suggesting that the additional task of speech production may be responsible for the reduced effect in Experiment 2. Finally, Experiments 5 and 6 replicate Experiments 2 and 3 with new participant samples. Taken together, these results suggest that variation in speech production may induce variation in speech perception, thus carrying implications for our understanding of spoken communication in dialogue settings.
Hans Rutger Bosker, Eva Reinisch, and Matthias J. Sjerps, “Cognitive load makes speech sound fast, but does not modulate acoustic context effects,” Journal of Memory and Language, vol. 94, 2017, pp. 166-176. DOI: 10.1016/j.jml.2016.12.002. http://www.sciencedirect.com/science/article/pii/S0749596X16302492.

Abstract In natural situations, speech perception often takes place during the concurrent execution of other cognitive tasks, such as listening while viewing a visual scene. The execution of a dual task typically has detrimental effects on concurrent speech perception, but how exactly cognitive load disrupts speech encoding is still unclear. The detrimental effect on speech representations may consist of either a general reduction in the robustness of processing of the speech signal (‘noisy encoding’), or, alternatively it may specifically influence the temporal sampling of the sensory input, with listeners missing temporal pulses, thus underestimating segmental durations (‘shrinking of time’). The present study investigated whether and how spectral and temporal cues in a precursor sentence that has been processed under high vs. low cognitive load influence the perception of a subsequent target word. If cognitive load effects are implemented through ‘noisy encoding’, increasing cognitive load during the precursor should attenuate the encoding of both its temporal and spectral cues, and hence reduce the contextual effect that these cues can have on subsequent target sound perception. However, if cognitive load effects are expressed as ‘shrinking of time’, context effects should not be modulated by load, but a main effect would be expected on the perceived duration of the speech signal. Results from two experiments indicate that increasing cognitive load (manipulated through a secondary visual search task) did not modulate temporal (Experiment 1) or spectral context effects (Experiment 2). However, a consistent main effect of cognitive load was found: increasing cognitive load during the precursor induced a perceptual increase in its perceived speech rate, biasing the perception of a following target word towards longer durations. This finding suggests that cognitive load effects in speech perception are implemented via ‘shrinking of time’, in line with a temporal sampling framework. In addition, we argue that our results align with a model in which early (spectral and temporal) normalization is unaffected by attention but later adjustments may be attention-dependent.

Keywords Acoustic context, cognitive load, Rate normalization, Spectral normalization
Shin Ying Chu, Naomi Sakai, Koichi Mori, and Lisa Iverach, “Japanese normative data for the Unhelpful Thoughts and Beliefs about Stuttering (UTBAS) Scales for adults who stutter,” Journal of Fluency Disorders, vol. 51, 03/2017, pp. 1-7. DOI: 10.1016/j.jfludis.2016.09.006. http://www.sciencedirect.com/science/article/pii/S0094730X16300274.

Abstract Purpose. This study reports Japanese normative data for the Unhelpful Thoughts and Beliefs about Stuttering (UTBAS) scales. We outline the translation process, and evaluate the psychometric properties of the Japanese version of the UTBAS scales. | Methods. The translation of the UTBAS scales into Japanese (UTBAS-J) was completed using the standard forward-backward translation process, and was administered to 130 Japanese adults who stutter. To validate the UTBAS-J scales, scores for the Japanese and Australian cohorts were compared. Spearman correlations were conducted between the UTBAS-J and the Modified Erickson Communication Attitude scale (S-24), the self-assessment scale of speech (SA scale), and age. The test-retest reliability and internal consistency of the UTBAS-J were assessed. Independent t-tests were conducted to evaluate the differences in the UTBAS-J scales according to gender, speech treatment experience, and stuttering self-help group participation experience. | Results. The UTBAS-J showed good test-retest reliability, high internal consistency, and moderate to high significant correlations with S-24 and SA scale. A weak correlation was found between the UTBAS-J scales and age. No significant relationships were found between UTBAS-J scores, gender and speech treatment experience. However, those who participated in the stuttering self-help group demonstrated lower UTBAS-J scores than those who did not. | Conclusion. Given the current scarcity of clinical assessment tools for adults who stutter in Japan, the UTBAS-J holds promise as an assessment tool and outcome measure for use in clinical and research environments.

Keywords Assessment, Japanese, Psychosocial issues, Questionnaire, stuttering
Jennifer Cole, Timothy Mahrt, and Joseph Roy, “Crowd-sourcing prosodic annotation,” Computer Speech & Language, 2017. DOI: 10.1016/j.csl.2017.02.008. http://www.sciencedirect.com/science/article/pii/S0885230816302455.

Abstract Much of what is known about prosody is based on native speaker intuitions of idealized speech, or on prosodic annotations from trained annotators whose auditory impressions are augmented by visual evidence from speech waveforms, spectrograms and pitch tracks. Expanding the prosodic data currently available to cover more languages, and to cover a broader range of unscripted speech styles, is prohibitive due to the time, money and human expertise needed for prosodic annotation. We describe an alternative approach to prosodic data collection, with coarse-grained annotations from a cohort of untrained annotators performing rapid prosody transcription (RPT) using LMEDS, an open-source software tool we developed to enable large-scale, crowd-sourced data collection with RPT. Results from three RPT experiments are reported. The reliability of RPT is analysed comparing kappa statistics for lab-based and crowd-sourced annotations for American English, comparing annotators from the same (US) versus different (Indian) dialect groups, and comparing each RPT annotator with a ToBI annotation. Results show better reliability for same-dialect annotators (US), and the best overall reliability from crowd-sourced US annotators, though lab-based annotations are the most similar to ToBI annotations. A generalized additive mixed model is used to test differences among annotator groups in the factors that predict prosodic annotation. Results show that a common set of acoustic and contextual factors predict prosodic labels for all annotator groups, with only small differences among the RPT groups, but with larger effects on prosodic marking for ToBI annotators. The findings suggest methods for optimizing the efficiency of RPT annotations. Overall, crowd-sourced prosodic annotation is shown to be efficient, and to rely on established cues to prosody, supporting its use for prosody research across languages, dialects, speaker populations, and speech genres.

Keywords Speech transcription
Ludivine Crible, Liesbeth Degand, and Gaëtanelle Gilquin, “The clustering of discourse markers and filled pauses: A corpus-based French-English study of (dis)fluency,” Languages in Contrast, vol. 17, 02/2017, pp. 69-95. DOI: 10.1075/lic.17.1.04cri. http://www.jbe-platform.com/content/journals/10.1075/lic.17.1.04cri.

Abstract This article presents a corpus-based contrastive study of (dis)fluency in French and English, focusing on the clustering of discourse markers (DMs) and filled pauses (FPs) across various spoken registers. Starting from the hypothesis that markers of (dis)fluency, or ‘fluencemes’, occur more frequently in sequences than in isolation, and that their contribution to the relative fluency of discourse can only be assessed by taking into account the contextual distribution of these sequences, this study uncovers the specific contextual conditions that trigger the clustering of fluencemes in the two languages. First, the contexts of appearance of DMs and FPs are described separately, both in English and French, focusing on their distribution, position and co-occurrence patterns. Then, the combination of DMs and FPs in sequences and their different configurations (DM+FP, FP+DM, etc.) are investigated. Overall, it appears that FPs function differently depending on whether they are clustered with DMs or not, and this difference consists in either maintaining or erasing inter- and intra-linguistic contrasts.

Keywords comparable corpus, Discourse markers, English/French, filled pauses, Fluency
Jillian Donahue, Christine Schoepfer, and Robin Lickley, “The effects of disfluent repetitions and speech rate on recall accuracy in a discourse listening task,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 17-20. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract While many studies have addressed the effects of disfluency on word recognition and local syntactic or semantic issues, fewer have addressed the impact on comprehension at a discourse level. In this work, we ask what effects features typical in the pathological condition of cluttering (essentially, rapid, disfluent and unintelligible speech) have on our ability to retain the information conveyed in speech. Specifically, we manipulate repetition disfluencies and speech rate in passages of running speech. Forty participants listened to four recordings of passages presented in four conditions: Control, Rapid, Disfluent, Rapid + Disfluent. They were asked to recall details of the passages and rate their speed, fluency and comprehensibility. Both repetition disfluencies and increased speech rate significantly reduced recall of information from discourse. Though no relationship was found between the working memory span of individuals and information recall, we argue that the cognitive load of these features of cluttered speech significantly affects intelligibility and thus recall of speech.

Keywords DiSS
Megan Drevets and Robin Lickley, “A psycholinguistic exploration of disfluency behaviour during the tip-of-the-tongue phenomenon,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 21-24. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract A tip-of-the-tongue state (TOT) occurs when a speaker knows a word but cannot retrieve its phonological form from memory. While previous studies have found that disfluencies are related to lexical retrieval difficulties, the literature lacks studies which have specifically investigated the impact of TOTs on disfluency. This study explores the relationship between TOTs and such disfluency behaviours as hesitations and target approximations (i.e. incorrect attempts to produce targets). TOTs were induced using the TOTimal method (Smith, Brown & Balfour, 1991), where participants memorised and retrieved the names of imaginary animals. Speech samples were analysed for TOTs and disfluencies. Disfluency rates increased with retrieval times during resolved TOTs. Additionally, target approximation rates correlated with the rates of both TOTs and “Don’t Know” responses, suggesting that target approximations are not unique to TOTs but are indicative of general uncertainty during lexical retrieval.

Keywords DiSS
Gary Geunbae Lee, Ho-Young Lee, Jieun Song, Byeongchang Kim, Sechun Kang, Jinsik Lee, and Hyosung Hwang, “Automatic sentence stress feedback for non-native English learners,” Computer Speech & Language, vol. 41, 2017, pp. 29-42. DOI: 10.1016/j.csl.2016.04.003. http://www.sciencedirect.com/science/article/pii/S0885230816301759.

Abstract This paper proposes a sentence stress feedback system in which sentence stress prediction, detection, and feedback provision models are combined. This system provides non-native learners with feedback on sentence stress errors so that they can improve their English rhythm and fluency in a self-study setting. The sentence stress feedback system was devised to predict and detect the sentence stress of any practice sentence. The accuracy of the prediction and detection models was 96.6% and 84.1%, respectively. The stress feedback provision model offers positive or negative stress feedback for each spoken word by comparing the probability of the predicted stress pattern with that of the detected stress pattern. In an experiment that evaluated the educational effect of the proposed system incorporated in our CALL system, significant improvements in accentedness and rhythm were seen with the students who trained with our system but not with those in the control group.

Keywords CALL
Emer Gilmartin, Carl Vogel, and Nick Campbell, “Disfluency in chat and chunk phases of multiparty casual talk,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 25-28. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract Multiparty casual conversation lasting more than a few minutes can be viewed as a series of phases of chat and chunk type interaction, where chat is interactive conversation with several participants taking turns, and chunk refers to phases where one participant dominates the conversation, often by telling a story or giving an opinion. We investigate the distribution of disfluency in these phases in a 70-minute 5-party conversation where participants had no practical task to perform. This pilot study shows differences in the distribution of disfluency types and frequency in the two phases.

Keywords DiSS
Mária Gósy, Dorottya Gyarmathy, and András Beke, “Phonetic analysis of filled pauses based on a Hungarian-English learner corpus,” International Journal of Learner Corpus Research, vol. 3, 12/2017, pp. 149-174. DOI: 10.1075/ijlcr.3.2.03gos. http://www.jbe-platform.com/content/journals/10.1075/ijlcr.3.2.03gos.

Abstract Filled pauses may reveal speech planning or execution problems to a greater extent in L2 spontaneous speech than in L1. The purpose of this study was to analyze the forms and position of all filled pauses, and the durations and the formants of vocalic filled pauses in English (L2) and in Hungarian (L1) spontaneous speech produced by 30 young learners with various L2 proficiency levels using data from our HunEng-D learner corpus. The findings showed that the forms of filled pauses were similar in both languages, irrespective of level of language proficiency. Results confirmed significantly longer vocalic filled pauses in basic and intermediate learners in their L2 relative to their more advanced peers. Formant values (as acoustic reflections of vowel quality) indicated very similar articulatory configurations for all vocalic filled pauses, irrespective of language and language proficiency.

Keywords acoustics of vocalic filled pauses, duration, HunEng-D corpus, proficiency level
Mária Gósy and Robert Eklund, “Segment prolongation in Hungarian,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 29-32. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract Segment prolongation (PR) has been shown to be one of the most common forms of non-pathological speech disfluencies (Eklund, 2001). The distribution of PRs in the word (initial–medial–final segment) seems to vary between languages of different syllable-structure complexity, making it interesting to study segment prolongation in languages that exhibit different syllable structure characteristics. Previous studies have studied languages with complex syllable structure, such as English and Swedish (Eklund & Shriberg, 1998; Eklund, 2001, 2004) where affixation creates complex consonant clusters, and languages with very simple syllable structure, such as Japanese (Den, 2003) or Tok Pisin (Eklund, 2001, 2004), as well as Mandarin Chinese (Lee et al., 2004). In this paper we study PRs in Hungarian. Our results indicate that PRs in Hungarian are more similar to those in English and Swedish than to those in Japanese, Tok Pisin or Mandarin Chinese, which lends support to the notion that underlying morphology plays a role in how PRs are realised.

Keywords DiSS
Peter Howell, Kaho Yoshikawa, Kevin Tang, John Harris, and Clarissa Sorger, “Intervention for word-finding difficulty for children starting school who have diverse language backgrounds,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 33-36. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract Children who have word-finding difficulty can be identified by the pattern of disfluencies in their spontaneous speech; in particular whole-word repetition of prior words often occurs when they cannot retrieve the subsequent word. Work is reviewed that shows whole-word repetitions can be used to identify children from diverse language backgrounds who have word-finding difficulty. The symptom-based identification procedure was validated using a non-word repetition task. Children who were identified as having word-finding difficulty were given phonological training that taught them features of English that they lacked (this depended on their language background). Then they received semantic training. In the cases of children whose first language was not English, the children were primed to use English and then presented with material where there was interference in meanings across the languages (English names had to be produced). It was found that this training improved a range of outcome measures related to education.

Keywords DiSS
Kenneth O. St. Louis, Farzan Irani, Rodney M. Gabel, Stephanie Hughes, Marilyn Langevin, Midori Rodriguez, Kathleen Scaler Scott, and Mary E. Weidner, “Evidence-based guidelines for being supportive of people who stutter in North America,” Journal of Fluency Disorders, 2017. DOI: 10.1016/j.jfludis.2017.05.002. http://www.sciencedirect.com/science/article/pii/S0094730X17300050.

Abstract Purpose. While many resources, particularly those available on the Internet, provide suggestions for fluent speakers as they interact with people who stutter (PWS), little evidence exists to support these suggestions. Thus, the purpose of this study was to document the supportiveness of common public reactions, behaviors, or interventions to stuttering by PWS. | Methods. 148 PWS completed the Personal Appraisal of Support for Stuttering-Adults. Additionally, a comparison of the opinions of adults who stutter based on gender and their involvement in self-help/support groups was undertaken. | Results. Many of the Internet-based suggestions for interacting with PWS are aligned with the opinions of the participants of this study. Significant differences were found amongst people who stutter on the basis of gender and involvement in self-help groups. | Conclusions. Lists of “DOs and DON’Ts” that are readily available on the Internet are largely supported by the data in this study; however, the findings highlight the need for changing the emphasis from strict rules for interacting with people who stutter to more flexible principles that keep the needs of individual PWS in mind.
Loulou Kosmala and Aliyah Morgenstern, “A preliminary study of hesitation phenomena in L1 and L2 productions: a multimodal approach,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 37-40. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract This paper presents a preliminary study of vocal hesitations in L1 and L2 productions using a multimodal perspective. It investigates the use of vocal hesitations of French learners of English interacting in tandem with American speakers in semi-spontaneous speech. Several hesitation markers were analyzed (filled pauses, unfilled pauses, prolongations and non-lexical sounds) based on formal and functional features as well as their relation to gesture. Results do not show great differences in the frequency of vocal hesitations between L1 and L2 productions overall; however, we find differences in duration and combination complexity. Our study indicated that vocal hesitations mainly served planning functions and were very often accompanied with gaze aversion both in L1 and L2 productions. Moreover, speakers did not tend to gesture while hesitating. We conclude that hesitations mainly served planning strategies both in L1 and L2 speech, but with some differences in duration and complexity.

Keywords DiSS
Kurt Eggers and Sabine Van Eerdenbrugh, “Speech disfluencies in children with Down Syndrome,” Journal of Communication Disorders, 2017. DOI: 10.1016/j.jcomdis.2017.11.001. http://www.sciencedirect.com/science/article/pii/S0021992416301794.

Abstract Purpose. Speech and language development in individuals with Down syndrome is often delayed and/or disordered and speech disfluencies appear to be more common. These disfluencies have been labeled over time as stuttering, cluttering or both. Findings were usually generated from studies with adults or a mixed age group, quite often using different methodologies, making it difficult to compare findings. Therefore, the purpose of this study was to analyze and describe the speech disfluencies of a group, only consisting of children with Down Syndrome between 3 and 13 years of age. | Method. Participants consisted of 26 Dutch-speaking children with DS. Spontaneous speech samples were collected and 50 utterances were analyzed for each child. Types of disfluencies were identified and classified into stuttering-like (SLD) and other disfluencies (OD). The criterion of three or more SLD per 100 syllables (cf. Ambrose & Yairi, 1999) was used to identify stuttering. Additional parameters such as mean articulation rate (MAR), ratio of disfluencies, and telescoping (cf. Coppens-Hofman et al., 2013) were used to identify cluttering and to differentiate between stuttering and cluttering. | Results & conclusion. Approximately 30 percent of children with DS between 3 and 13 years of age in this study stutter, which is much higher than the prevalence in normally developing children. Moreover, this study showed that the speech of children with DS has a different distribution of types of disfluencies than the speech of normally developing children. Although different cluttering-like characteristics were found in the speech of young children with DS, none of them could be identified as cluttering or cluttering-stuttering.

Keywords Cluttering, Down Syndrome, Speech disfluencies, stuttering
Craig Lambert, Judit Kormos, and Danny Minn, “Task Repetition and Second Language Speech Processing,” Studies in Second Language Acquisition, vol. 39, no. 1, 2017, pp. 167-196. DOI: 10.1017/S0272263116000085.

Abstract This study examines the relationship between the repetition of oral monologue tasks and immediate gains in L2 fluency. It considers the effect of aural-oral task repetition on speech rate, frequency of clause-final and midclause filled pauses, and overt self-repairs across different task types and proficiency levels and relates these findings to specific stages of L2 speech production (conceptualization, formulation, and monitoring). Thirty-two Japanese learners of English sampled at three levels of proficiency completed three oral communication tasks (instruction, narration, and opinion) six times. Results revealed that immediate aural-oral same task repetition was related to gains in oral fluency regardless of proficiency level or task type. Overall gains in speech rate were the largest across the first three performances of each task type but continued until the fifth performance. More specifically, however, clause-final pauses decreased until the second performance, midclause pauses decreased up to the fourth, and self-repairs decreased only after the fourth performance, indicating that task repetition may have been differentially related to specific stages in the speech production process.
Robin Lickley, “Disfluency in typical and stuttered speech,” in Fattori Sociali e Biologici Nella Variazione Fonetica [Social and Biological Factors in Speech Variation] (Studi AISV), Chiara Bertini, Chiara Celata, Giovanna Lenoci, Chiara Meluzzi, and Irene Ricci, Eds. Milano, Italy: Associazione Italiana Scienze della Voce, 2017, pp. 373-387. DOI: 10.17469/O2103AISV000019.

Abstract This paper discusses what happens when things go wrong in the planning and execution of running speech, comparing disfluency in typical speech with pathological disfluency in stuttering. Spontaneous speech by typical speakers is rarely completely fluent. There are several reasons why fluency can break down in typical speech. Various studies suggest that we produce disfluencies at a rate of around 6 per 100 fluent words, so a significant proportion of our utterances are disfluent in some way. Stuttering can halt the flow of speech at a much higher rate than typical disfluency. While persons who stutter are also prone to the same kinds of disfluency as typical speakers, their impairment results in the production of other forms of disfluency that are both quantitatively and qualitatively different from typical forms. In this paper, I give an overview of the causes of disfluency in both typical and stuttered speech and relate these causes to their articulatory and phonetic realisations. I show how typical and stuttered disfluencies differ in both their cause and their realisations.
Ludivine Crible, “Discourse markers and (dis)fluency in English and French: Variation and combination in the DisFrEn corpus,” International Journal of Corpus Linguistics, vol. 22, no. 2, 09/2017 2017, pp. 242-264. DOI: 10.1075/ijcl.22.2.04cri. http://www.jbe-platform.com/content/journals/10.1075/ijcl.22.2.04cri.

Abstract While discourse markers (DMs) and (dis)fluency have been extensively studied in the past as separate phenomena, corpus-based research combining large-scale yet fine-grained annotations of both categories has, however, never been carried out before. Integrating these two levels of analysis, while methodologically challenging, is not only innovative but also highly relevant to the investigation of spoken discourse in general and form-meaning patterns in particular. The aim of this paper is to provide corpus-based evidence of the register-sensitivity of DMs and other disfluencies (e.g. pauses, repetitions) and of their tendency to combine in recurrent clusters. These claims are supported by quantitative findings on the variation and combination of DMs with other (dis)fluency devices in DisFrEn, a richly annotated and comparable English-French corpus representative of eight different interaction settings. The analysis uncovers the prominent place of DMs within (dis)fluency and meaningful association patterns between forms and functions, in a usage-based approach to meaning-in-context.

Keywords corpus annotation, disfluency, Discourse markers, speech, usage-based
Kikuo Maekawa, Ken’ya Nishikawa, and Shu-Chuan Tseng, “Phonetic characteristics of filled pauses: a preliminary comparison between Japanese and Chinese,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 41-44. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract Filled pauses in spontaneous Chinese and Japanese were analyzed to examine if there is systematic phonetic difference between the vowels of filled pauses and those occurred in ordinary lexical items. Also, the effect of the category of filled pauses (simple vocalic fillers versus fillers derived from demonstratives) was examined in both languages. Random forests analysis revealed that it was possible to construct automatic classifiers that achieved F-measure values of .7-.9. It turned out also that, in both languages, vowels in simple vocalic filled pauses showed higher F-values than the filled pauses derived from demonstratives. Lastly, it turned out that acoustic features distinguishing filled pauses from ordinary lexical items differ depending on both the category of filled pauses and languages.

Keywords DiSS
Srdan Medimorec, Torin P. Young, and Evan F. Risko, “Disfluency effects on lexical selection,” Cognition, vol. 158, 01/2017 2017, pp. 28 - 32. DOI: 10.1016/j.cognition.2016.10.008. http://www.sciencedirect.com/science/article/pii/S0010027716302426.

Abstract Recent research has suggested that introducing a disfluency in the context of written composition (i.e., typing with one hand) can increase lexical sophistication. In the current study, we provide a strong test between two accounts of this phenomenon, one that attributes it to the delay caused by the disfluency and one that attributes it to the disruption of typical finger-to-letter mappings caused by the disfluency. To test between these accounts, we slowed down participants’ typewriting by introducing a small delay between keystrokes while individuals wrote essays. Critically, this manipulation did not disrupt typical finger-to-letter mappings. Consistent with the delay-based account, our results demonstrate that the essays written in this less fluent condition were more lexically diverse and used less frequent words. Implications for the temporal dynamics of lexical selection in complex cognitive tasks are discussed.

Keywords Lexical sophistication
Mohammad Alameer, Lotte Meteyard, and David Ward, “Stuttering Generalization Self-Measure: Preliminary Development of a Self-Measuring Tool,” Journal of Fluency Disorders, 2017. DOI: 10.1016/j.jfludis.2017.04.001. http://www.sciencedirect.com/science/article/pii/S0094730X16300390.

Abstract Introduction. Generalization of treatment is considered a difficult task for clinicians and people who stutter (PWS), and can constitute a barrier to long-term treatment success. To our knowledge, there are no standardized tests that collect measurement of the behavioral and cognitive aspects alongside the client’s self-perception in real-life speaking situations. | Purpose. This paper describes the preliminary development of a Stuttering Generalization Self-Measure (SGSM). The purpose of SGSM is to assess 1) stuttering severity and 2) speech-anxiety level during real-life situations as perceived by PWS. Additionally, this measurement aims to 3) investigate correlations between stuttering severity and speech-anxiety level within the same real-life situation. | Method. The SGSM initially reported includes nine speaking situations designed to cover a variety of frequent speaking scenarios. However, two of these were less commonly encountered by participants and subsequently not included in the final analyses. Items were created according to five listener categories (family and close friends, acquaintances, strangers, persons of authority, and giving a short speech to small audience). Forty-three participants (22 PWS, and 21 control) aged 18 to 53 years were asked to complete the assessment in real-life situations. | Results. Analyses indicated that test-retest reliability was high for both groups. Discriminant validity was also achieved as the SGSM scores significantly differed between the two groups (controls and PWS) for stuttering and speech-anxiety. Convergent validity was confirmed by significant correlations between the SGSM and other speech-related anxiety measures.

Keywords Assessment, Generalization, Self-perception, Speech-anxiety, Stuttering severity
Naomi Ogi, Involvement and Attitude in Japanese Discourse. Amsterdam, Netherlands: John Benjamins, 2017. DOI: 10.1075/pbns.272. https://benjamins.com/#catalog/books/pbns.272/main.

Abstract This book addresses the long discussed issue of Japanese interactive markers (traditionally called sentence-final particles) in a new light, and provides the comprehensive linguistic documentation of the interactional functions of seven interactive markers: ne, na, yo, sa, wa, zo and ze. By adopting three key notions, ‘involvement’, ‘formality’ and ‘gender’, the study not only reveals the functions and pragmatic effects of each marker, but also sheds light on some fundamental issues of the nature of spoken discourse in general, including how speakers collaborate with each other to create and sustain their conversations and how linguistic functions of verbal forms interface with sociocultural norms. This book will be of interest to students and scholars in a wide range of linguistic fields such as Japanese linguistics, pragmatics, sociolinguistics, discourse analysis and applied linguistics and to teachers and learners of Japanese and of a second/foreign language.
Sieb Nooteboom, and Hugo Quené, “The time course of self-monitoring within words and utterances,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 45-48. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract The within-word and within-utterance time course of internal and external self-monitoring is investigated in a four-word tongue twister experiment eliciting interactional word initial and word medial segmental errors and their repairs. It is found that detection rate for both internal and external self-monitoring decreases from early to late both within words and within utterances. Also, offset-to-repair times are more often of 0 ms in initial than in medial consonants.

Keywords DiSS
Dan Nosowitz, “The Mystery and Occasional Poetry of, Uh, Filled Pauses,” January 2017. https://www.atlasobscura.com/articles/the-mystery-and-occasional-poetry-of-uh-filled-pauses.

Abstract Nearly every language and every culture has what are called “filled pauses,” a notoriously difficult-to-define concept that generally refers to sounds or words that a speaker uses when, well, not exactly speaking. In American English, the most common are “uh” and “um.”
Pauliina Peltonen, “Temporal fluency and problem-solving in interaction: An exploratory study of fluency resources in L2 dialogue,” System, vol. 70, 2017, pp. 1 - 13. DOI: 10.1016/j.system.2017.08.009. http://www.sciencedirect.com/science/article/pii/S0346251X1630286X.

Abstract Second language (L2) speech fluency has mostly been studied from monologues with temporal measures. In the present study, dialogue data are examined with a new framework that links (temporal) fluency analysis to a broader problem-solving perspective, offering a unique approach to examining the resources learners have for maintaining fluent speech despite problems. Dialogues based on a pairwise problem-solving task from 42 Finnish learners of English at two school levels were analyzed quantitatively for temporal fluency, dialogue fluency, stalling mechanisms, and communication strategies (CSs). A complementary qualitative analysis of selected productions was also conducted. The results indicate that temporal and dialogue fluency measures differentiate learners at different school levels, but the relationship between CSs and fluency is complex. While correlations between mid-clause pauses and certain strategies were found, the qualitative analysis indicated that stalling mechanisms and CSs can compensate for local dysfluencies and even contribute to temporal fluency. The results highlight the importance of combining quantitative and qualitative analysis in L2 fluency studies. Conceptually, L2 speech fluency should include collaborative aspects (dialogue fluency) in addition to individual, temporal fluency, and cover resources for maintaining fluency.

Keywords Communication strategies, interaction, Mixed-methods, oral fluency, Problem-solving, second language speech
Ralph Rose, “Silent and filled pauses and speech planning in first and second language production,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 49-52. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract The present study looks at the relative association of silent and filled pauses to problems in discourse and syntactic planning via utterance and clause boundary phenomena, respectively, by focusing on crosslinguistic data. The occurrence of boundary pauses in a crosslinguistic corpus of speech suggests that silent pauses are more closely related to both discourse and syntactic planning than filled pauses, but more strongly so for discourse planning. These results were consistent across both first and second language production. However, clause boundary silent pauses in first language speech were more atypical (i.e., longer than average) than those in second language speech. This difference may be due to complexity differences in the first and second language speech samples.

Keywords DiSS
Ralph L. Rose, “Differences in second language speech fluency ratings: native versus nonnative listeners,” in Proceedings of the International Conference: Fluency & Disfluency Across Languages and Language Varieties, Université catholique de Louvain, February 2017, pp. 101-103. http://hdl.handle.net/2078.1/195807.

Abstract (none)
Ralph L. Rose, “A Comparison of Form and Temporal Characteristics of Filled Pauses in L1 Japanese and L2 English,” Journal of the Phonetic Society of Japan, vol. 21, no. 3, 2017, pp. 33-40. DOI: 10.24467/onseikenkyu.21.3_33. https://www.jstage.jst.go.jp/article/onseikenkyu/21/3/21_33/_article/-char/en.

Abstract Filled pauses (FPs) in English can be either monophonemic ‘uh’ [ə] or polyphonemic ‘um’ [əm]. These differ temporally: shorter ‘uh’ is associated with shorter overall delay (including silent pauses). Japanese FPs are more varied, including both monophonemic ([ε], [ŋ]) and polyphonemic ([ε:to], [ɑno]) forms. This study compares the FPs of native Japanese speakers in a crosslinguistic speech corpus. Results show speakers use FPs with a lower F1 than native English speakers and strongly prefer the monophonemic form. Duration patterns are similar, but low proficiency speakers delay longer with monophonemic FPs. Results suggest possibilities for nonnative speech detection in speech applications.
June Ruivivar, and Laura Collins, “The Effects of Foreign Accent on Perceptions of Nonstandard Grammar: A Pilot Study,” TESOL Quarterly, 05/2017 2017. DOI: 10.1002/tesq.374. http://onlinelibrary.wiley.com/doi/10.1002/tesq.374/full.

Abstract (none)
Naomi Sakai, Shin Ying Chu, Koichi Mori, and J. Scott Yaruss, “The Japanese version of the Overall Assessment of the Speaker’s Experience of Stuttering for Adults (OASES-A-J): Translation and psychometric evaluation,” Journal of Fluency Disorders, 01/2017 2017. DOI: 10.1016/j.jfludis.2016.11.002. http://www.sciencedirect.com/science/article/pii/S0094730X16300663.

Abstract Purpose. This study evaluates the psychometric performance of the Japanese version of the Overall Assessment of the Speaker’s Experience of Stuttering for Adults (OASES-A), a comprehensive assessment tool of individuals who stutter. | Methods. The OASES-A-J was administered to 200 adults who stutter in Japan. All respondents also evaluated their own speech (SA scale), satisfaction of their own speech (SS scale) and the Japanese translation version of the Modified Erickson Communication Attitude scale (S-24). The test-retest reliability and internal consistency of the OASES-A-J were assessed. To examine the concurrent validity of the questionnaire, Pearson correlation was conducted between the OASES-A-J Impact score and the S-24 scale, SA scale and SS scale. In addition, Pearson correlation among the impact scores of each section and total were calculated to examine the construct validity. | Results. The OASES-A-J showed a good test-retest reliability (r = 0.81–0.95) and high internal consistency (α > 0.80). Concurrent validity was moderate to high (0.55–0.75). Construct validity was confirmed by the relation between internal consistency in each section and correlation among sections’ impact scores. Japanese adults showed higher negative impact for ‘General Information’, ‘Reactions to Stuttering’ and ‘Quality of Life’ sections. | Conclusion. These results suggest that the OASES-A-J is a reliable and valid instrument to measure the impact of stuttering on Japanese adults who stutter. The OASES-A-J could be used as a clinical tool in Japanese stuttering field.

Keywords ICF, OASES, Psychometric analysis, Quality of life, stuttering
Vered Silber-Varod, and Anat Lerner, “Analysis of silences in unbalanced dialogues: the effect of genre and role,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 53-57. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

Abstract This study examines the diversity of silences in unbalanced dialogues, i.e. dialogues between speakers with different participation levels: responder and reporter. We examined two genres: therapeutic sessions and private dialogues that are based on this responder-reporter structure. When looking at silences versus speech ratios, we found no differences between the genres nor between the roles. However, when grouping the silences by their types: Pauses (intra-speaker silences), gaps (interspeakers’ silences) and silences that occur in the vicinity of speech overlaps, we found that the silence duration of pauses are role dependent in both genres, while the silence duration of gaps were found genre dependent, but not role dependent. Moreover, speech rate was not found genre dependent. It seems that although silences in unbalanced dialogues vary considerably, genre and speaker’s role are influential.

Keywords DiSS
Richard Stephens, and Amy Zile, “Does Emotional Arousal Influence Swearing Fluency?,” Journal of Psycholinguistic Research, 01/2017 2017, pp. 1–13. DOI: 10.1007/s10936-016-9473-8. http://dx.doi.org/10.1007/s10936-016-9473-8.

Abstract This study assessed the effect of experimentally manipulated emotional arousal on swearing fluency. We hypothesised that swear word generation would be increased with raised emotional arousal. The emotional arousal of 60 participants was manipulated by having them play a first-person shooter video game or, as a control, a golf video game, in a randomised order. A behavioural measure of swearing fluency based on the Controlled Oral Word Association Test was employed. Successful experimental manipulation was indicated by raised State Hostility Questionnaire scores after playing the shooter game. Swearing fluency was significantly greater after playing the shooter game compared with the golf game. Validity of the swearing fluency task was demonstrated via positive correlations with self-reported swearing fluency and daily swearing frequency. In certain instances swearing may represent a form of emotional expression. This finding will inform debates around the acceptability of using taboo language.
Stewart M. McCauley, and Morten H. Christiansen, “Computational Investigations of Multiword Chunks in Language Learning,” Topics in Cognitive Science, 2017. DOI: 10.1111/tops.12258. https://dx.doi.org/10.1111/tops.12258.

Abstract Second-language learners rarely arrive at native proficiency in a number of linguistic domains, including morphological and syntactic processing. Previous approaches to understanding the different outcomes of first- versus second-language learning have focused on cognitive and neural factors. In contrast, we explore the possibility that children and adults may rely on different linguistic units throughout the course of language learning, with specific focus on the granularity of those units. Following recent psycholinguistic evidence for the role of multiword chunks in online language processing, we explore the hypothesis that children rely more heavily on multiword units in language learning than do adults learning a second language. To this end, we take an initial step toward using large-scale, corpus-based computational modeling as a tool for exploring the granularity of speakers’ linguistic units. Employing a computational model of language learning, the Chunk-Based Learner, we compare the usefulness of chunk-based knowledge in accounting for the speech of second-language learners versus children and adults speaking their first language. Our findings suggest that while multiword units are likely to play a role in second-language learning, adults may learn less useful chunks, rely on them to a lesser extent, and arrive at them through different means than children learning a first language.

Keywords chunking, Computational modeling, Corpora, L2, Language learning
Uriel Cohen Priva, “Not so fast: Fast speech correlates with lower lexical and structural information,” Cognition, vol. 160, 2017, pp. 27 - 34. DOI: 10.1016/j.cognition.2016.12.002. http://www.sciencedirect.com/science/article/pii/S0010027716302888.

Abstract Speakers dynamically adjust their speech rate throughout conversations. These adjustments have been linked to cognitive and communicative limitations: for example, speakers speak words that are contextually unexpected (and thus add more information) with slower speech rates. This raises the question whether limitations of this type vary wildly across speakers or are relatively constant. The latter predicts that across speakers (or conversations), speech rate and the amount of information content are inversely correlated: on average, speakers can either provide high information content or speak quickly, but not both. Using two corpus studies replicated across two corpora, I demonstrate that indeed, fast speech correlates with the use of less informative words and syntactic structures. Thus, while there are individual differences in overall information throughput, speakers are more similar in this aspect than differences in speech rate would suggest. The results suggest that information theoretic constraints on production operate at a higher level than was observed before and affect language throughout production, not only after words and structures are chosen.

Keywords Information, Information rate, Language, speech rate
Vered Aharonson, Eran Aharonson, Katia Raichlin-Levi, Aviv Sotzianu, Ofer Amir, and Zehava Ovadia-Blechman, “A real-time phoneme counting algorithm and application for speech rate monitoring,” Journal of Fluency Disorders, vol. 51, 2017, pp. 60 - 68. DOI: 10.1016/j.jfludis.2017.01.001. http://www.sciencedirect.com/science/article/pii/S0094730X16300389.

Abstract Adults who stutter can learn to control and improve their speech fluency by modifying their speaking rate. Existing speech therapy technologies can assist this practice by monitoring speaking rate and providing feedback to the patient, but cannot provide an accurate, quantitative measurement of speaking rate. Moreover, most technologies are too complex and costly to be used for home practice. We developed an algorithm and a smartphone application that monitor a patient’s speaking rate in real time and provide user-friendly feedback to both patient and therapist. Our speaking rate computation is performed by a phoneme counting algorithm which implements spectral transition measure extraction to estimate phoneme boundaries. The algorithm is implemented in real time in a mobile application that presents its results in a user-friendly interface. The application incorporates two modes: one provides the patient with visual feedback of his/her speech rate for self-practice and another provides the speech therapist with recordings, speech rate analysis and tools to manage the patient’s practice. The algorithm’s phoneme counting accuracy was validated on ten healthy subjects who read a paragraph at slow, normal and fast paces, and was compared to manual counting of speech experts. Test-retest and intra-counter reliability were assessed. Preliminary results indicate differences of −4% to 11% between automatic and human phoneme counting. Differences were largest for slow speech. The application can thus provide reliable, user-friendly, real-time feedback for speaking rate control practice.

Keywords Smartphone application, Speaking rate computation, Spectral transition measure, Stuttering therapy
Xiaoming Jiang, and Marc D. Pell, “The sound of confidence and doubt,” Speech Communication, vol. 88, 2017, pp. 106 - 126. DOI: 10.1016/j.specom.2017.01.011. http://www.sciencedirect.com/science/article/pii/S0167639316301509.

Abstract Feeling of knowing (or "expressed confidence") reflects a speaker’s certainty or commitment to a statement and can be associated with one’s trustworthiness or persuasiveness in social interaction. We investigated the perceptual-acoustic correlates of expressed confidence and doubt in spoken language, with a focus on both linguistic and vocal speech cues. In Experiment 1, utterances subserving different communicative functions (e.g., stating facts, making judgments) produced in a confident, close-to-confident, unconfident, and neutral-intending voice by six speakers, were then rated for perceived confidence by 72 native listeners. As expected, speaker confidence ratings increased with the intended level of expressed confidence; neutral-intending statements were frequently judged as relatively high in confidence. The communicative function of the statement, and the presence vs. absence of an utterance-initial probability phrase (e.g., Maybe, I’m sure), further modulated speaker confidence ratings. In Experiment 2, acoustic analysis of perceptually valid tokens rated in Experiment 1 revealed distinct patterns of pitch, intensity and temporal features according to perceived confidence levels; confident expressions were highest in fundamental frequency (f0) range, mean amplitude, and amplitude range, whereas unconfident expressions were highest in mean f0, slowest in speaking rate, with more frequent pauses. Dynamic analyses of f0 and intensity changes across the utterance uncovered distinctive patterns in expression as a function of confidence level at different positions of the utterance. Our findings provide new information on how metacognitive states such as confidence and doubt are communicated by vocal and linguistic cues which permit listeners to arrive at graded impressions of a speaker’s feeling of (un)knowing.

Keywords nonverbal behavior
Yuh-show Cheng, “Development and preliminary validation of four brief measures of L2 language-skill-specific anxiety,” System, 2017. DOI: 10.1016/j.system.2017.06.009. http://www.sciencedirect.com/science/article/pii/S0346251X17304888.

Abstract This paper reports a study on the development and validation of four brief measures of L2 language-skill-specific anxiety scales: L2 listening, speaking, reading, and writing anxiety scales. A total of 523 college students in Taiwan participated in the study. Lang’s (1971) tripartite model of anxiety provided a theoretical basis for developing the four scales. An initial pool of items were developed based on a review of related literature and the results of a focus group interview. Less ideal items were removed based upon the results of a pilot test. In the formal study, exploratory factor analysis was conducted to select items for each anxiety scale, which was subsequently validated by confirmatory factor analysis and correlation analysis. The results provided evidence for the reliability, convergent validity, and discriminant validity of the scores of the four brief measures.

Keywords Brief measure, L2, Language anxiety, Language-skill-specific, Psychometric properties

2016

Niloofar Akhavan, Tilbe Göksun, and Nazbanou Bonnie Nozari, “Disfluency production in speech and gesture,” in Proceedings of the 38th Annual Conference of the Cognitive Science Society, Philadelphia, USA, 2016, pp. 716-721. http://www.tilbegoksunyoruk.com/documents/Akhavan2016.

Abstract The cognitive architecture and function of co-speech gesture has been the subject of a large body of research. We investigate two main questions in this field, namely, whether language and gesture are the same or two inter-related systems, and whether gestures help resolve speech problems, by examining the relationship between gesture and disfluency in neurotypical speakers. Our results support the view of separate, but inter-related systems by showing that speech problems do not necessarily cause gesture problems, and on many occasions, gestures signal an upcoming speech problem even before it surfaces in overt speech. We also show that while gestures are more common on fluent trials, speakers use both iconic and beat gestures on disfluent trials to facilitate communication, although the two gesture types support communication in different ways.

Keywords gesture; speech production; disfluency
Akiko Fuse, and Erika A. Lanham, “Impact of social media and quality of life of people who stutter,” Journal of Fluency Disorders, vol. 50, 2016, pp. 59 - 71. DOI: 10.1016/j.jfludis.2016.09.005. http://www.sciencedirect.com/science/article/pii/S0094730X16300262.

Abstract Highlights. • People who stutter (PWS) who are connecting with other PWS have seen an improvement in their overall confidence. • PWS who use social media feel that they do not rely on it as their main form of communication and feel that they use social media an average amount. • Social media relieves PWS anxiety in communication by allowing them to communicate without negative evaluation or experience difficulty with functional communication.
Amy Watts, Patricia Eadie, Susan Block, Fiona Mensah, and Sheena Reilly, “Language skills of children during the first 12 months after stuttering onset,” Journal of Fluency Disorders, 12/2016 2016. DOI: 10.1016/j.jfludis.2016.12.001. http://www.sciencedirect.com/science/article/pii/S0094730X16300286.

Abstract Purpose. To describe the language development in a sample of young children who stutter during the first 12 months after stuttering onset was reported. | Methods. Language production was analysed in a sample of 66 children who stuttered (aged 2 to 4 years). The sample were identified from a pre-existing prospective, community based longitudinal cohort. Data were collected at three time points within the first year after stuttering onset. Stuttering severity was measured, and global indicators of expressive language proficiency (length of utterances and grammatical complexity) were derived from the samples and summarised. Language production abilities of the children who stutter were contrasted with normative data. | Results. The majority of children’s stuttering was rated as mild in severity, with more than 83% of participants demonstrating very mild or mild stuttering at each of the time points studied. The participants demonstrated developmentally appropriate spoken language skills comparable with available normative data. | Conclusion. In the first year following the report of stuttering onset, the language skills of the children who were stuttering progressed in a manner that is consistent with developmental expectations.

Keywords Language
Andrea Révész, Monika Ekiert, and Eivind Nessa Torgersen, “The Effects of Complexity, Accuracy, and Fluency on Communicative Adequacy in Oral Task Performance,” Applied Linguistics, vol. 37, no. 6, 12/2016 2016, pp. 828-848. DOI: 10.1093/applin/amu069. http://applij.oxfordjournals.org/content/37/6/828.short?rss=1.

Abstract Communicative adequacy is a key construct in second language research, as the primary goal of most language learners is to communicate successfully in real-world situations. Nevertheless, little is known about what linguistic features contribute to communicatively adequate speech. This study fills this gap by investigating the extent to which complexity, accuracy, and fluency (CAF) predict adequacy, and whether proficiency and task type moderate these relationships. In all, 20 native speakers and 80 second language users from four proficiency levels performed five tasks. Speech samples were rated for adequacy and coded for a range of CAF indices. Filled pause frequency, a feature of breakdown fluency, emerged as the strongest predictor of adequacy. Predictors with significant but smaller effects included indices of all three CAF dimensions: linguistic complexity (lexical diversity, overall syntactic complexity, syntactic complexity by subordination, and frequency of conjoined clauses), accuracy (general accuracy and accuracy of connectors), and fluency (silent pause frequency and speed fluency). For advanced speakers, incidence of false starts also emerged as predicting communicatively adequate speech. Task type did not influence the link between linguistic features and adequacy.
Andrew Martin, Yosuke Igarashi, Nobuyuki Jincho, and Reiko Mazuka, “Utterances in infant-directed speech are shorter, not slower,” Cognition, vol. 156, 2016, pp. 52-59. DOI: 10.1016/j.cognition.2016.07.015. http://www.sciencedirect.com/science/article/pii/S0010027716301901.

Abstract It has become a truism in the literature on infant-directed speech (IDS) that IDS is pronounced more slowly than adult-directed speech (ADS). Using recordings of 22 Japanese mothers speaking to their infant and to an adult, we show that although IDS has an overall lower mean speech rate than ADS, this is not the result of an across-the-board slowing in which every vowel is expanded equally. Instead, the speech rate difference is entirely due to the effects of phrase-final lengthening, which disproportionally affects IDS because of its shorter utterances. These results demonstrate that taking utterance-internal prosodic characteristics into account is crucial to studies of speech rate.

Keywords Final lengthening
Elina Banzina, “Consonant lengthening for persuasiveness in L1 and L2 English,” International Journal of Applied Linguistics, vol. 26, no. 3, November 2016, pp. 403-419. DOI: 10.1111/ijal.12137. http://www.ingentaconnect.com/content/bpl/ijal/2016/00000026/00000003/art00007.

Abstract The present study explored how persuasiveness is expressed phonetically in English and whether non-native speakers of English are able to employ L2 phonetic cues to convey importance in L2 in a native-like manner. An acoustic experiment compared English and Latvian speakers’ of English treatment of syllable-onset consonant duration relative to vowels in (i) neutral and (ii) persuasive speech contexts. Duration was measured in voiceless stops and continuants and a wide variety of vowels in the stressed syllables of key words. Results revealed that in persuasive speech, native English speakers significantly increased the proportion of consonantal duration, whereas no consonant lengthening was found in Latvian L1 and L2 productions. These findings provide evidence for the paralinguistic function of consonants and the existence of language-specific persuasion cues.

Keywords consonant duration, consonant lengthening, emphasis, English as a foreign language, persuasive speech, public speaking, spoken English
Benjamin V. Tucker and Mirjam Ernestus, “Why we need to investigate casual speech to truly understand language production, processing and the mental lexicon,” The Mental Lexicon, vol. 11, no. 3, December 2016, pp. 375-400. DOI: 10.1075/ml.11.3.03tuc. http://www.jbe-platform.com/content/journals/10.1075/ml.11.3.03tuc.

Abstract The majority of studies addressing psycholinguistic questions focus on speech produced and processed in a careful, laboratory speech style. This ‘careful’ speech is very different from the speech that listeners encounter in casual conversations. This article argues that research on casual speech is necessary to show the validity of conclusions based on careful speech. Moreover, research on casual speech produces new insights and questions on the processes underlying communication and on the mental lexicon that cannot be revealed by research using careful speech. This article first places research on casual speech in its historic perspective. It then provides many examples of how casual speech differs from careful speech and shows that these differences may have important implications for psycholinguistic theories. Subsequently, the article discusses the challenges that research on casual speech faces, which stem from the high variability of this speech style, its necessary casual context, and that casual speech is connected speech. We also present opportunities for research on casual speech, mostly in the form of new experimental methods that facilitate research on connected speech. However, real progress can only be made if these new methods are combined with advanced (still to be developed) statistical techniques.

Keywords casual speech, conversational speech, experimental paradigms, pronunciation variability, statistical analyses
Bjørn Wessel-Tolvig and Patrizia Paggio, “Revisiting the thinking-for-speaking hypothesis: Speech and gesture representation of motion in Danish and Italian,” Journal of Pragmatics, vol. 99, July 2016, pp. 39-61. DOI: 10.1016/j.pragma.2016.05.004. http://www.sciencedirect.com/science/article/pii/S0378216616301539.

Abstract Many studies try to explain thought processes based on verbal data alone and often take the linguistic variation between languages as evidence for cross-linguistic thought processes during speaking. We argue that looking at co-speech gestures might broaden the scope and shed new light on different thinking-for-speaking patterns. Data comes from a corpus study investigating the relationship between speech and gesture in two typologically different languages: Danish, a satellite-framed language and Italian, a verb-framed language. Results show cross-linguistic variation in how motion components are mapped onto linguistic constituents, but also show how Italian speakers to some degree deviate from standard verb-framed lexicalization patterns, and use typical satellite-framed constructions. Co-speech gestures, when they occur, largely follow the patterns used in speech, with a notable exception: In 28% of the cases, in fact, Italian speakers express manner in path-only speech constructions gesturally. This finding suggests that gestures may be instrumental in revealing what semantic components speakers attend to while speaking; in other words, purely verbal data may not fully account for the thinking part of the thinking-for-speaking hypothesis.

Keywords Gesture
Boaz M. Ben-David, Maroof I. Moral, Aravind K. Namasivayam, Hadas Erel, and Pascal H.H.M. van Lieshout, “Linguistic and Emotional-Valence Characteristics of Reading Passages for Clinical Use and Research,” Journal of Fluency Disorders, 2016, pp. -. DOI: 10.1016/j.jfludis.2016.06.003. http://www.sciencedirect.com/science/article/pii/S0094730X16300377.

Abstract Highlights: • There is little information on fundamental properties of reading passages that can affect reading (e.g., words’ arousal and valence, passage readability). • In a detailed analysis, the three commonly used passages were found to contain a share of emotionally valenced, high arousal, lower familiarity and polysyllabic content words. • The paper also provides a new well-balanced (and ranked high on ease of readability) passage that minimizes the impact of these properties (e.g., low arousal words). • Testing 26 PWS, error rates on a traditional passage and on the novel passage were correlated, yet many individuals showed a large difference between the two. • We suggest a combined procedure, using more than one passage. The details on passage characteristics can inform clinical practice.
Jazmín Cevasco and Paul van den Broek, “The effect of filled pauses on the processing of the surface form and the establishment of causal connections during the comprehension of spoken expository discourse,” Cognitive Processing, vol. 17, no. 2, 2016, pp. 185–194. DOI: 10.1007/s10339-016-0755-8. http://dx.doi.org/10.1007/s10339-016-0755-8.

Abstract The purpose of this study was to examine the effect of filled pauses (uh) on the verification of words and the establishment of causal connections during the comprehension of spoken expository discourse. With this aim, we asked Spanish-speaking students to listen to excerpts of interviews with writers, and to perform a word-verification task and a question-answering task on causal connectivity. There were two versions of the excerpts: filled pause present and filled pause absent. Results indicated that filled pauses increased verification times for words that preceded them, but did not make a difference on response times to questions on causal connectivity. The results suggest that, as signals of delay, filled pauses create a break with surface information, but they do not have the same effect on the establishment of meaningful connections.
David Wood, “Willingness to communicate and second language speech fluency: An idiodynamic investigation,” System, vol. 60, 2016, pp. 11-28. DOI: 10.1016/j.system.2016.05.003. http://www.sciencedirect.com/science/article/pii/S0346251X16300276.

Abstract Second language (L2) speech fluency has usually been studied as a function of a set of measurable temporal features of speech, but it has seldom been researched in relation to learner or situational factors in performance such as willingness to communicate (WTC), definable as readiness to engage in communication at a specific time and with specific interlocutors. The present study is an examination of the fluid relationship between WTC and L2 fluency from a dynamic systems perspective. The exploratory case study presents an examination of WTC and fluency in Japanese learners of English L2, in communication with a non-Japanese interlocutor. Speech samples produced by the learners were analyzed for markers of fluency. The learners produced WTC profiles for their speech samples by creating bitmaps during stimulated recall, and also provided retrospective self-analysis of WTC in stimulated recall. The fluency profiles and WTC profiles were matched and analyzed to explore the interrelationship between fluency and WTC. The results illuminate the relationship between fluency and WTC, particularly the fluidity and possible directionality of the relationship, i.e. whether fluency breakdowns lead to lowered WTC or vice versa.

Keywords Cognitive fluency
Nivja H. de Jong, “Predicting pauses in L1 and L2 speech: the effects of utterance boundaries and word frequency,” International Review of Applied Linguistics in Language Teaching, vol. 54, no. 2, June 2016, pp. 113-132. DOI: 10.1515/iral-2016-9993. http://www.degruyter.com/view/j/iral.2016.54.issue-2/iral-2016-9993/iral-2016-9993.xml.

Abstract This paper compares the distribution of silent and filled pauses in first (L1) and second language (L2) speech. The occurrence of pauses of 52 L2 and 18 L1 Dutch speakers was evaluated with respect to utterance boundaries and word frequency. We found that L2 speakers paused more often than L1 speakers within utterances; but not between utterances. Similarly, only within utterances, L2 pauses were longer than L1 pauses. Regarding word frequency, both L1 and L2 speakers are more likely to pause before lower frequency words as compared to higher frequency words. These findings imply that L1 and L2 speakers’ production processes may be similar in that (1) pauses at utterance boundaries are used for conceptual planning mostly and (2) lexical retrieval difficulties are comparable for L1 and L2 speakers. These findings furthermore imply that when using fluency for L2 testing, pause locations must be taken into account.
Francesca Bianchi and Sara Gesuato (eds.), Pragmatic Issues in Specialized Communicative Contexts. Brill, 2016, pp. 240. DOI: 10.1163/9789004323902. http://www.brill.com/products/book/pragmatic-issues-specialized-communicative-contexts.

Abstract "Pragmatic Issues in Specialized Communicative Contexts", edited by Francesca Bianchi and Sara Gesuato, illustrates how interactants systematically and effectively employ micro and macro linguistic resources and textual strategies to engage in communicative practices in such specific contexts as healthcare services, TV interpreting, film dialogue, TED talks, archaeology academic communication, student-teacher communication, and multilingual classrooms. Each contribution presents a pedagogical slant, reporting on or suggesting didactic approaches to, or applications of, pragmatic aspects of communication in SL, FL and LSP learning contexts. The topics covered and the issues addressed are all directly relevant to applied pragmatics, that is, pragmatically oriented linguistic analysis that accounts for interpersonal-transactional issues in real-life situated communication.
Josef Fruehwald, “Filled Pause Choice as a Sociolinguistic Variable,” University of Pennsylvania Working Papers in Linguistics, vol. 22, no. 2, 2016, Article 6. https://repository.upenn.edu/pwpl/vol22/iss2/6.

Abstract In this paper, I argue that filled pause selection (um/uh) is a sociolinguistic variable, conditioned by both internal and external factors. There appears to be a language change in progress towards selecting um more often than uh. In all respects, the (UHM) variable appears to pattern quantitatively just like all other sociolinguistic variables which have been examined, even though the locus of (UHM) variation would seem to be firmly in the speech planning domain. Combined with the quantitative systematicity of sociolinguistic variables across the full range of linguistic modules, I argue that the locus of variation may not be in the grammar, but rather constitutes a separate domain of knowledge, perhaps what Preston (2004) called the “sociocultural selection device.”
Effrosyni Georgiadou and Karen Roehr-Brackin, “Investigating Executive Working Memory and Phonological Short-Term Memory in Relation to Fluency and Self-Repair Behavior in L2 Speech,” Journal of Psycholinguistic Research, 2016, pp. 1–19. DOI: 10.1007/s10936-016-9463-x. http://dx.doi.org/10.1007/s10936-016-9463-x.

Abstract This paper reports the findings of a study investigating the relationship of executive working memory (WM) and phonological short-term memory (PSTM) to fluency and self-repair behavior during an unrehearsed oral task performed by second language (L2) speakers of English at two levels of proficiency, elementary and lower intermediate. Correlational analyses revealed a negative relationship between executive WM and number of pauses in the lower intermediate L2 speakers. However, no reliable association was found in our sample between executive WM or PSTM and self-repair behavior in terms of either frequency or type of self-repair. Taken together, our findings suggest that while executive WM may enhance performance at the conceptualization and formulation stages of the speech production process, self-repair behavior in L2 speakers may depend on factors other than working memory.

Keywords Executive working memory, Fluency, hesitation phenomena, L2 speech production, Phonological short-term memory, Self-repair behavior, Working memory capacity
Anna Gladkova, Ulla Vanhatalo, and Cliff Goddard, “The semantics of interjections: An experimental study with natural semantic metalanguage,” Applied Psycholinguistics, vol. 37, July 2016, pp. 841–865. DOI: 10.1017/S0142716415000260. http://journals.cambridge.org/article_S0142716415000260.

Abstract The paper reports the results of a pilot experimental study aimed at evaluating natural semantic metalanguage (NSM) explications of English interjections. It proposes a novel online survey technique to test NSM explications with language speakers. The survey tested recently developed semantic explications of selected English interjections as published in Goddard (2014a): 'wow', 'gosh', 'gee', 'yikes' (“surprise” group) and 'yuck', 'ugh' (“disgust” group). The results provide overall support for the proposed explications and indicate directions for their further development. It is interesting that respondents’ preexisting knowledge of NSM and other background variables (age, gender, being a native speaker, or studying linguistics) were shown to have little influence on the test results.
Kaisa Hahl, Heini-Marja Järvinen, and Kalle Juuti, “Accommodating to English-medium instruction in teacher education in Finland,” International Journal of Applied Linguistics, vol. 26, no. 3, November 2016, pp. 291-310. DOI: 10.1111/ijal.12093. http://www.ingentaconnect.com/content/bpl/ijal/2016/00000026/00000003/art00001.

Abstract This study analyses teacher educators’ and student teachers’ perceptions of teaching and learning situations in an international English as a lingua franca (ELF) context in an English-medium instruction (EMI) teacher education programme in Finland. The analysis of semi-structured interviews revealed that the participants perceived a partial reversal of traditional teacher and student roles; students assisted voluntarily and teaching became reciprocal. Some teachers reflected on having used typical strategies in ELF context, such as code-switching, to further communication and engage students. However, teachers’ lack of fluency was sometimes considered causing frustration among students and affected negatively their feeling of being professional teacher educators. Nevertheless, by increasing more learner-led activities, ELF can positively affect teacher education pedagogy.

Keywords accommodation strategies, co-construction of communication, ELF, EMI, teacher education
Hyunkyung Lee, Hyunsub Sim, Eunju Lee, and Dahye Choi, “Disfluency characteristics of children with attention-deficit/hyperactivity disorder symptoms,” Journal of Communication Disorders, 2016, pp. -. DOI: 10.1016/j.jcomdis.2016.12.001. http://www.sciencedirect.com/science/article/pii/S0021992416302027.

Abstract The purpose of the current study was to investigate the characteristics of speech disfluency in 15 children with attention-deficit/hyperactivity disorder (ADHD) symptoms and 15 age-matched control children. Reading, story retelling, and picture description tasks were used to elicit utterances from the participants. The findings indicated that children with ADHD symptoms produced significantly more stuttering-like disfluencies (SLD) and other disfluencies (OD) when compared to the control group during all three tasks. Further statistical analysis showed that children with ADHD symptoms produced more OD during the story retelling task than the other two tasks, whereas no significant differences in OD were observed among the three tasks in the control children. Finally, children with ADHD symptoms exhibited a higher proportion of SLD in total disfluencies (TD) than the control children. These results are consistent with previous studies that children with ADHD are disfluent in their verbal production. Furthermore, children with ADHD symptoms seem to be more vulnerable to a speaking task that places greater demands on their attentional resources for language production, resulting in increased speech disfluencies.

Keywords Stuttering-like disfluency
Magdalena Igras-Cybulska, Bartosz Ziółko, Piotr Żelasko, and Marcin Witkowski, “Structure of pauses in speech in the context of speaker verification and classification of speech type,” EURASIP Journal on Audio, Speech, and Music Processing, vol. 2016, no. 1, November 2016, pp. 18. DOI: 10.1186/s13636-016-0096-7.

Abstract Statistics of pauses appearing in Polish as a potential source of biometry information for automatic speaker recognition were described. The usage of three main types of acoustic pauses (silent, filled and breath pauses) and syntactic pauses (punctuation marks in speech transcripts) was investigated quantitatively in three types of spontaneous speech (presentations, simultaneous interpretation and radio interviews) and read speech (audio books). Selected parameters of pauses extracted for each speaker separately or for speaker groups were examined statistically to verify usefulness of information on pauses for speaker recognition and speaker profile estimation. Quantity and duration of filled pauses, audible breaths, and correlation between the temporal structure of speech and the syntax structure of the spoken language were the features which characterize speakers most. The experiment of using pauses in speaker biometry system (using Universal Background Model and i-vectors) resulted in 30 % equal error rate. Including pause-related features to the baseline Mel-frequency cepstral coefficient system has not significantly improved its performance. In the experiment with automatic recognition of three types of spontaneous speech, we achieved 78 % accuracy, using GMM classifier. Silent pause-related features allowed distinguishing between read and spontaneous speech by extreme gradient boosting with 75 % accuracy.
Jennifer A. Foote and Pavel Trofimovich, “A Multidimensional Scaling Study of Native and Non-Native Listeners’ Perception of Second Language Speech,” Perceptual and Motor Skills, vol. 122, no. 2, March 2016, pp. 470-489. DOI: 10.1177/0031512516636528. http://pms.sagepub.com/content/122/2/470.

Abstract Second language speech learning is predicated on learners’ ability to notice differences between their own language output and that of their interlocutors. Because many learners interact primarily with other second language users, it is crucial to understand which dimensions underlie the perception of second language speech by learners, compared to native speakers. For this study, 15 non-native and 10 native English speakers rated 30-s language audio-recordings from controlled reading and interview tasks for dissimilarity, using all pairwise combinations of recordings. PROXSCAL multidimensional scaling analyses revealed fluency and aspects of speakers’ pronunciation as components underlying listener judgments but showed little agreement across listeners. Results contribute to an understanding of why second language speech learning is difficult and provide implications for language training.

Keywords multidimensional scaling, second language speech, speech perception
Joana Cholin, Sabrina Heiler, Alexander Whillier, and Martin Sommer, “Premonitory Awareness in Stuttering Scale (PAiS),” Journal of Fluency Disorders, 2016, pp. -. DOI: 10.1016/j.jfludis.2016.07.001. http://www.sciencedirect.com/science/article/pii/S0094730X16300353.

Abstract Anticipation of stuttering events in persistent developmental stuttering is a frequent but inadequately measured phenomenon that is of both theoretical and clinical importance. Here, we describe the development and preliminary testing of a German version of the Premonitory Awareness in Stuttering Scale (PAiS) a 12-item questionnaire assessing immediate and prospective anticipation of stuttering that was translated and adapted from the Premonitory Urge for Tics Scale (PUTS) (Woods, Piacentini, Himle, & Chang, 2005). After refining the preliminary PAiS scale in a pilot study, we administered a revised version to 21 adults who stutter (AWS) and 21 age, gender and education-matched control participants. Results demonstrated that the PAiS had good internal consistency and discriminated the two speaker groups very effectively, with AWS reporting anticipation of speech disruptions significantly more often than adults with typical speech. Correlations between the PAiS total score and both the objective and subjective measures of stuttering severity revealed that AWS with high PAiS scores produced fewer stuttered syllables. This is possibly because these individuals are better able to adaptively use these anticipatory sensations to modulate their speech. These results suggest that, with continued refinement, the PAiS has the potential to provide clinicians and researchers with a practical and psychometrically sound tool that can quantify how a given AWS anticipates upcoming stuttering events.

Keywords premonitory awareness
Kristen Lucas, Sharon A. Kerrick, Jenna Haugen, and Cole J. Crider, “Communicating Entrepreneurial Passion: Personal Passion vs. Perceived Passion in Venture Pitches,” IEEE Transactions on Professional Communication, vol. 59, no. 4, October 2016, pp. 363-378. DOI: 10.1109/TPC.2016.2607818. http://ieeexplore.ieee.org/document/7604127/.

Abstract Research problem: Entrepreneurial passion has been shown to play an important role in venture success and, therefore, in investors’ funding decisions. However, it is unknown whether the passion entrepreneurs personally feel or experience can be accurately assessed by investors during a venture pitch. Research questions: (1) To what extent does entrepreneurs’ personal passion align with investors’ perceived passion? (2) To what cues do investors attend when assessing entrepreneurs’ passion? Literature review: Integrating theory and research in entrepreneurship communication and entrepreneurial passion within the context of venture pitching, we explain that during venture pitches, investors make judgments about entrepreneurs’ passion that have consequences for their investment decisions. However, they can attend to only those cues that entrepreneurs outwardly display. As a result, they may not be assessing the passion entrepreneurs personally feel or experience. Methodology: We used a sequential explanatory mixed methods research design. For our data collection, we surveyed 40 student entrepreneurs, videorecorded their venture pitches, and facilitated focus groups with 16 investors who viewed the videos and ranked, rated, and discussed their perceptions of entrepreneurs’ passion. We conducted statistical analyses to assess the extent to which entrepreneurs’ personal passion and investors’ perceived passion aligned. We then performed an inductive analysis of critical cases to identify specific cues that investors attributed to passion or lack thereof. Results and conclusions: We revealed a large misalignment between entrepreneurs’ personal passion and investors’ perceived passion. Our critical case analysis demonstrated that entrepreneurs’ weak or strong presentation skills led investors either to underestimate or overestimate, respectively, perceptions of entrepreneurs’ passion. We suggest that entrepreneurs should develop specific presentation skills and rhetorical strategies for displaying their passion; at the same time, investors should be wary of attending too closely to presentation skills when assessing passion.

Keywords Communication effectiveness, oral communication, public speaking
Lisa Iverach, Mark Jones, Lauren F. McLellan, Heidi J. Lyneham, Ross G. Menzies, Mark Onslow, and Ronald M. Rapee, “Prevalence of anxiety disorders among children who stutter,” Journal of Fluency Disorders, 2016, pp. -. DOI: 10.1016/j.jfludis.2016.07.002. http://www.sciencedirect.com/science/article/pii/S0094730X16300067.

Abstract Purpose Stuttering during adulthood is associated with a heightened rate of anxiety disorders, especially social anxiety disorder. Given the early onset of both anxiety and stuttering, this comorbidity could be present among stuttering children. Method Participants were 75 stuttering children 7–12 years and 150 matched non-stuttering control children. Multinomial and binary logistic regression models were used to estimate odds ratios for anxiety disorders, and two-sample t-tests compared scores on measures of anxiety and psycho-social difficulties. Results Compared to non-stuttering controls, the stuttering group had six-fold increased odds for social anxiety disorder, seven-fold increased odds for subclinical generalized anxiety disorder, and four-fold increased odds for any anxiety disorder. Conclusion These results show that, as is the case during adulthood, stuttering during childhood is associated with a significantly heightened rate of anxiety disorders. Future research is needed to determine the impact of those disorders on speech treatment outcomes.

Keywords stuttering
Louise Cummings, Case Studies in Communication Disorders. New York: Cambridge University Press, 2016. get-book.cfm?BookID=109554.

Abstract Designed for students of speech-language pathology, audiology and clinical linguistics, this valuable text introduces students to all aspects of the assessment, diagnosis and treatment of clients with developmental and acquired communication disorders through a series of structured case studies. Each case study includes questions which direct readers to important features of the case that will facilitate clinical learning. A selection of further readings encourages students to extend their knowledge of communication disorders. Key features of this book include: • 48 detailed case studies based on actual clients with communication disorders • 25 questions within each case study • Fully-worked answers to every question • 105 suggestions for further reading The text also develops knowledge of the epidemiology, aetiology, and linguistic and cognitive features of communication disorders, highlights salient aspects of client histories, and examines assessments and interventions used in the management of clients.

Keywords cognitive science, General Linguistics, Neurolinguistics, psycholinguistics
Carolyn Mancuso and Raymond G. Miltenberger, “Using habit reversal to decrease filled pauses in public speaking,” Journal of Applied Behavior Analysis, vol. 49, no. 1, 2016, pp. 188–192. DOI: 10.1002/jaba.267. http://dx.doi.org/10.1002/jaba.267.

Abstract This study evaluated the effectiveness of simplified habit reversal in reducing filled pauses that occur during public speaking. Filled pauses consist of “uh,” “um,” or “er”; clicking sounds; and misuse of the word “like.” After baseline, participants received habit reversal training that consisted of awareness training and competing response training. During postintervention assessments, all 6 participants exhibited an immediate decrease in filled pauses.

Keywords awareness training, competing response training, habit reversal, public speaking
Martijn Wieling, Jack Grieve, Gosse Bouma, Josef Fruehwald, John Coleman, and Mark Liberman, “Variation and Change in the Use of Hesitation Markers in Germanic Languages,” Language Dynamics and Change, vol. 6, no. 2, 2016, pp. 199-234. DOI: 10.1163/22105832-00602001. http://booksandjournals.brillonline.com/content/journals/10.1163/22105832-00602001.

Abstract In this study, we investigate crosslinguistic patterns in the alternation between UM, a hesitation marker consisting of a neutral vowel followed by a final labial nasal, and UH, a hesitation marker consisting of a neutral vowel in an open syllable. Based on a quantitative analysis of a range of spoken and written corpora, we identify clear and consistent patterns of change in the use of these forms in various Germanic languages (English, Dutch, German, Norwegian, Danish, Faroese) and dialects (American English, British English), with the use of UM increasing over time relative to the use of UH. We also find that this pattern of change is generally led by women and more educated speakers. Finally, we propose a series of possible explanations for this surprising change in hesitation marker usage that is currently taking place across Germanic languages.

Keywords corpus linguistics, crosslinguistic change, hesitation markers, language change
Michael P. Boyle, Lauren Dioguardi, and Julie E. Pate, “A comparison of three strategies for reducing the public stigma associated with stuttering,” Journal of Fluency Disorders, vol. 50, September 2016, pp. 44-58. DOI: 10.1016/j.jfludis.2016.09.004. http://www.sciencedirect.com/science/article/pii/S0094730X16300316.

Abstract Purpose. The effects of three anti-stigma strategies for stuttering—contact (hearing personal stories from an individual who stutters), education (replacing myths about stuttering with facts), and protest (condemning negative attitudes toward people who stutter)—were examined on attitudes, emotions, and behavioral intentions toward people who stutter. Method. Two hundred and twelve adults recruited from a nationwide survey in the United States were randomly assigned to one of the three anti-stigma conditions or a control condition. Participants completed questionnaires about stereotypes, negative emotional reactions, social distance, discriminatory intentions, and empowerment regarding people who stutter prior to and after watching a video for the assigned condition, and reported their attitude changes about people who stutter. Some participants completed follow-up questionnaires on the same measures one week later. Results. All three anti-stigma strategies were more effective than the control condition for reducing stereotypes, negative emotions, and discriminatory intentions from pretest to posttest. Education and protest effects for reducing negative stereotypes were maintained at one-week follow-up. Contact had the most positive effect for increasing affirming attitudes about people who stutter from pretest to posttest and pretest to follow-up. Participants in the contact and education groups, but not protest, self-reported significantly more positive attitude change about people who stutter as a result of watching the video compared to the control group. Conclusion. Advocates in the field of stuttering can use education and protest strategies to reduce negative attitudes about people who stutter, and people who stutter can increase affirming attitudes through interpersonal contact with others.

Keywords Anti-stigma programs, Empowerment, Public stigma, Stereotypes, Stuttering advocacy
Milly Heelan, Jan McAllister, and Jane Skinner, “Stuttering, alcohol consumption and smoking,” Journal of Fluency Disorders, vol. 48, 2016, pp. 27-34. DOI: 10.1016/j.jfludis.2016.05.001. http://www.sciencedirect.com/science/article/pii/S0094730X1630016X.

Abstract Purpose: Limited research has been published regarding the association between stuttering and substance use. An earlier study provided no evidence for such an association, but the authors called for further research to be conducted using a community sample. The present study used data from a community sample to investigate whether an association between stuttering and alcohol consumption or regular smoking exists in late adolescence and adulthood. Methods: Regression analyses were carried out on data from a birth cohort study, the National Child Development Study (NCDS), whose initial cohort included 18,558 participants who have since been followed up until age 55. In the analyses, the main predictor variable was parent-reported stuttering at age 16. Parental socio-economic group, cohort member’s sex and childhood behavioural problems were also included. The outcome variables related to alcohol consumption and smoking habits at ages 16, 23, 33, 41, 46, 50 and 55. Results: No significant association was found between stuttering and alcohol consumption or stuttering and smoking at any of the ages. It was speculated that the absence of significant associations might be due to avoidance of social situations on the part of many of the participants who stutter, or adoption of alternative coping strategies. Conclusion: Because of the association between anxiety and substance use, individuals who stutter and are anxious might be found to drink or smoke excessively, but as a group, people who stutter are not more likely than those who do not to have high levels of consumption of alcohol or nicotine.

Keywords Birth cohort
Nadia Brejon Teitler, Sandrine Ferré, and Clémentine Dailly, “Specific subtype of fluency disorder affecting French speaking children: A phonological analysis,” Journal of Fluency Disorders, vol. 50, 2016, pp. 33-43. DOI: 10.1016/j.jfludis.2016.09.002. http://www.sciencedirect.com/science/article/pii/S0094730X16300237.

Abstract Purpose Clinicians working with fluency disorders sometimes see children whose word repetitions are mostly located at the end of words and do not induce physical tension. Prior studies on the topic have proposed several names for these disfluencies including “end word repetitions”, “final sound repetitions” and “atypical disfluency”. The purpose of this study was to use phonological analysis to explore the patterns of this poorly recognized fluency disorder in order to better understand its specific speech characteristics. Methods We analyzed a spontaneous language sample of 8 French speaking children. Audio and video recordings allowed us to study general communication issues as well as linguistic and acoustical data. Results We did not detect speech rupture or coarticulation failures between the syllable onset and rhyme. The problem resides primarily on the rhyme production with a voicing interruption in the middle of the syllable nucleus or a repetition of the rhyme (nucleus alone or nucleus and coda), regardless of the position in the word or phrase. Conclusion The present study provides data suggesting that there exist major differences in syllable production between the disfluencies produced by our 8 children and stuttered disfluencies. Consequently, we believe that this fluency disorder should be recognized as distinct from stuttering.

Keywords Syllable rhyme
Naomi Hertsberg and Patricia M. Zebrowski, “Self-perceived competence and social acceptance of young children who stutter: Initial findings,” Journal of Communication Disorders, vol. 64, 2016, pp. 18-31. DOI: 10.1016/j.jcomdis.2016.08.004. http://www.sciencedirect.com/science/article/pii/S0021992416301083.

Abstract Purpose. The goals of this study were to determine whether young children who stutter (CWS) perceive their own competence and social acceptance differently than young children who do not stutter (CWNS), and to identify the predictors of perceived competence and social acceptance in young speakers. | Method. We administered the "Pictorial Scale of Perceived Competence and Social Acceptance for Young Children" (PSPCSA; Harter & Pike, 1984) to 13 CWS and 14 CWNS and examined group differences. We also collected information on the children’s genders, temperaments, stuttering frequencies, language abilities, and phonological skills to identify which of these factors predicted PSPCSA scores. | Results. CWS, as a group, did not differ from CWNS in their perceived general competence or social acceptance. Gender predicted scores of perceived general competence, and stuttering frequency predicted perceived social acceptance. Temperament, language abilities, and phonological skills were not significant predictors of perceived competence or social acceptance in our sample. | Conclusions. While CWS did not significantly differ from CWNS in terms of perceived competence and social acceptance, when both talker groups were considered together, girls self-reported greater perceived competence than boys. Further, lower stuttering frequency was associated with greater perceived social acceptance. These preliminary findings provide motivation for further empirical study of the psychosocial components of childhood stuttering. | Learning outcomes. Readers will be able to describe the constructs of perceived competence and social acceptance in young children, and whether early stuttering plays a role in the development of these constructs.

Keywords children
Olga Kozar, “Teachers’ reaction to silence and teachers’ wait time in video and audioconferencing English lessons: Do webcams make a difference?,” System, 2016. DOI: 10.1016/j.system.2016.07.002. http://www.sciencedirect.com/science/article/pii/S0346251X16300720.

Abstract There is a mismatch between an increasing number of people teaching languages via video or audioconferencing tools, and the amount of research available to such teachers to guide their practice. One particular pedagogical question that research does not provide guidance on is teachers’ treatment of silence during videoconferencing and audioconferencing lessons. This study uses Conversation Analysis to compare lessons conducted by the same teacher-student dyads in audio and videoconferencing. The findings show distinct differences in teachers’ treatment of silence and teachers’ and students’ pausing behaviour in video and audioconferencing. Specifically, teachers tended to wait longer in videoconferencing and took the conversational floor faster in audioconferencing, thus leading to a higher number of overlaps with students’ emergent turns. This suggests that teachers need to be trained for conducting lessons via audio and video conferencing, and that teachers and teacher trainers need to identify specific pedagogical behaviours for each of these contexts.

Keywords Online language teaching
Mary Grantham O’Brien, “Methodological Choices in Rating Speech Samples,” Studies in Second Language Acquisition, vol. 38, 9 2016, pp. 587–605. DOI: 10.1017/S0272263115000418. http://journals.cambridge.org/article_S0272263115000418.

Abstract Much pronunciation research critically relies upon listeners’ judgments of speech samples, but researchers have rarely examined the impact of methodological choices. In the current study, 30 German native listeners and 42 German L2 learners (L1 English) rated speech samples produced by English-German L2 learners along three continua: accentedness, fluency, and comprehensibility. The goal was to determine whether rating condition, that is, (a) whether each speech sample is rated along all three continua after it is heard once or (b) whether all speech samples are rated along one continuum before being rated along the next continuum, and continuum order (e.g., whether participants rate speech samples for accentedness before comprehensibility or fluency) have an effect on listeners’ ratings. Results indicate no significant overall effects of rating condition or continuum order, but there is evidence of rating condition effects by listener group. The results have implications for laboratory and classroom assessments of L2 speech.
Ross Menzies, Sue O’Brian, Robyn Lowe, Ann Packman, and Mark Onslow, “International Phase II clinical trial of CBTPsych: A standalone Internet social anxiety treatment for adults who stutter,” Journal of Fluency Disorders, vol. 48, 2016, pp. 35-43. DOI: 10.1016/j.jfludis.2016.06.002. http://www.sciencedirect.com/science/article/pii/S0094730X16300195.

Abstract Purpose CBTPsych is an individualized, fully automated, standalone Internet treatment program that requires no clinical contact or support. It is designed specifically for those who stutter. Two preliminary trials demonstrated that it may be efficacious for treating the social anxiety commonly associated with stuttering. However, both trials involved pre- and post-treatment assessment at a speech clinic. This contact may have increased compliance, commitment and adherence with the program. The present study sought to establish the effectiveness of CBTPsych in a large international trial with no contact of any kind from researchers or clinicians. Method Participants were 267 adults with a reported history of stuttering who were given a maximum of 5 months access to CBTPsych. Pre- and post-treatment functioning was assessed within the online program with a range of psychometric measures. Results Forty-nine participants (18.4%) completed all seven modules of CBTPsych and completed the post-treatment online assessments. That compliance rate was far superior to similar community trials of self-directed Internet mental health programs. Completion of the program was associated with large, statistically and clinically significant reductions for all measures. The reductions were similar to those obtained in earlier trials of CBTPsych, and those obtained in trials of in-clinic CBT with an expert clinician. Conclusions CBTPsych is a promising individualized treatment for social anxiety for a proportion of adults who stutter, which requires no health care costs in terms of clinician contact or support. Educational objectives The reader will be able to: (a) Discuss the reasons for investigating CBTPsych without any clinical contact; (b) Describe the main components of the CBTPsych treatment; (c) Summarize the results of this clinical trial; (d) Describe how the results might affect clinical practice, if at all.

Keywords Stuttering, Cognitive behavior therapy, E-therapy, Internet
Benjamin G. Schultz, Irena O’Brien, Natalie Phillips, David H. McFarland, Debra Titone, and Caroline Palmer, “Speech rates converge in scripted turn-taking conversations,” Applied Psycholinguistics, vol. 37, 09/2016, pp. 1201–1220. DOI: 10.1017/S0142716415000545. http://journals.cambridge.org/article_S0142716415000545.

Abstract When speakers engage in conversation, acoustic features of their utterances sometimes converge. We examined how the speech rate of participants changed when a confederate spoke at fast or slow rates during readings of scripted dialogues. A beat-tracking algorithm extracted the periodic relations between stressed syllables (beats) from acoustic recordings. The mean interbeat interval (IBI) between successive stressed syllables was compared across speech rates. Participants’ IBIs were smaller in the fast condition than in the slow condition; the difference between participants’ and the confederate’s IBIs decreased across utterances. Cross-correlational analyses demonstrated mutual influences between speakers, with greater impact of the confederate on participants’ beat rates than vice versa. Beat rates converged in scripted conversations, suggesting speakers mutually entrain to one another’s beat.
Ye Tian, Takehiko Maruyama, and Jonathan Ginzburg, “Self Addressed Questions and Filled Pauses: A Cross-linguistic Investigation,” Journal of Psycholinguistic Research, 12/2016, pp. 1–18. DOI: 10.1007/s10936-016-9468-5. http://dx.doi.org/10.1007/s10936-016-9468-5.

Abstract There is an ongoing debate whether phenomena of disfluency (such as filled pauses) are produced communicatively. Clark and Fox Tree (Cognition 84(1):73–111, 2002) propose that filled pauses are words, and that different forms signal different lengths of delay. This paper evaluates this Filler-As-Words hypothesis by analyzing the distribution of self-addressed-questions or SAQs (such as ‘‘what’s the word’’) in relation to filled pauses. We found that SAQs address different problems in different languages (most frequently about memory-retrieval in English and Chinese, and about appropriateness in Japanese). In relation to filled pauses, British but not American English uses ‘‘um’’ to signal a more severe problem than ‘‘uh’’. Chinese uses different filled pauses to signal the syntactic category of the problem constituent. Japanese uses different filled pauses to signal levels of interaction with the interlocuter. Overall, our data supports the Filler-As-Words hypothesis that filled pauses are used communicatively. However, the dimensions of its meanings vary across languages and dialects.

Keywords Cross-linguistic analysis, disfluency, filled pauses, Self addressed questions
Gunnel Tottie, “Planning what to say: Uh and um among the pragmatic markers,” in Outside the Clause: Form and function of extra-clausal constituents, John Benjamins, 2016, pp. 97-122. https://benjamins.com/#catalog/books/slcs.178.04tot/details.

Abstract Based on data from the Santa Barbara Corpus of Spoken American English, this paper argues that the vocalizations [ə(:)] and [ə(:)m], usually transcribed 'uh' and 'um,' can be regarded as pragmatic markers, rather than as undesirable disfluencies or hesitation markers. It is shown that they are especially frequent in registers and contexts that require more planning by speakers, like narrative passages in conversation and in task-related contexts, especially in long turns. The term 'planner' is therefore proposed as an appropriate designation. Co-occurrences of 'uh' and 'um' with other pragmatic markers such as 'well, you know, I mean' and 'like' as well as with 'and' and 'but' are shown to support this view.
Vincent Hughes, Sophie Wood, and Paul Foulkes, “Strength of forensic voice comparison evidence from the acoustics of filled pauses,” International Journal of Speech Language and the Law, vol. 23, no. 1, 2016, pp. 99-132. DOI: 10.1558/ijsll.v23i1.29874. https://journals.equinoxpub.com/index.php/IJSLL/article/view/29874.

Abstract This study investigates the evidential value of filled pauses (FPs, i.e. um, uh) as variables in forensic voice comparison. FPs for 60 young male speakers of standard southern British English were analysed, drawn from Task 1 of the DyViS corpus (Nolan et al. 2009). The following acoustic properties were analysed: midpoint frequencies of the first three formants in the vocalic portion; ‘dynamic’ characterisations of formant trajectories (i.e. quadratic polynomial equations fitted to nine measurement points over the entire vowel); vowel duration; and nasal duration for um. Likelihood ratio (LR) scores were computed using the Multivariate Kernel Density formula (MVKD; Aitken and Lucy, 2004) and converted to calibrated log10 LRs (LLRs) using logistic-regression (Brümmer et al., 2007). System validity was assessed using both equal error rate (EER) and the log LR cost function (Cllr; Brümmer and du Preez, 2006). The system with the best performance combines dynamic measurements of all three formants with vowel and nasal duration for um, achieving an EER of 4.08% and Cllr of 0.12. In terms of general patterns, um consistently outperformed uh. For um, the formant dynamic systems generated better validity than those based on midpoints, presumably reflecting the additional degree of formant movement in um caused by the transition from vowel to nasal. By contrast, midpoints outperformed dynamics for the more monophthongal uh. Further, the addition of duration (vowel or vowel and nasal) consistently improved system performance. The study supports the view that FPs have excellent potential as variables in forensic voice comparison cases.

Keywords durations, Forensic voice comparison, formant dynamics, hesitation markers, likelihood ratio
Vincenza Tudini, “Repair and codeswitching for learning in online intercultural talk,” System, 2016. DOI: 10.1016/j.system.2016.06.011. http://www.sciencedirect.com/science/article/pii/S0346251X16300641.

Abstract This study examines the role of repair and code switching for language learning in online written interaction between two speakers of both Italian and English as, respectively, either an L1 or L2. Specifically, during episodes of general repair and corrective feedback, these geographically dispersed university language students used both languages in their repertoire as key interactional and learning resources to co-construct a language learning partnership and pursue affiliation. Despite the face-threatening nature of corrective feedback, also known as other-initiated other-repair, participants managed to construct and maintain intersubjectivity in the text chat environment by availing themselves of the reciprocal possibilities of their bilingual expertise, thus overcoming linguistic asymmetries. In this way both social and learning objectives were achieved during written talk-in-interaction, suggesting that online language learning partnerships with multilingual intercultural speakers of the target language rather than monolingual native speaker partners should be given a more prominent role in languages programs across sectors.

Keywords Written talk-in-interaction
Yvonne Préfontaine, Judit Kormos, and Daniel Ezra Johnson, “How do utterance measures predict raters’ perceptions of fluency in French as a second language?,” Language Testing, vol. 33, no. 1, 2016, pp. 53-73. DOI: 10.1177/0265532215579530. http://dx.doi.org/10.1177/0265532215579530.

Abstract While the research literature on second language (L2) fluency is replete with descriptions of fluency and its influence with regard to English as an additional language, little is known about what fluency features influence judgments of fluency in L2 French. This study reports the results of an investigation that analyzed the relationship between utterance fluency measures and raters’ perceptions of L2 fluency in French using mixed-effects modeling. Participants were 40 adult learners of French at varying levels of proficiency, studying in a university immersion context. Speech performances were collected on three different types of narrative tasks. Four utterance fluency measures were extracted from each performance. Eleven untrained judges rated the speech performances and we examined which utterance fluency measures are the best predictors of the scores awarded by the raters. The mean length of runs and articulation rate proved to be the most influential factors in raters’ judgments, while the frequency of pauses played a less important role. The length of pauses was positively related to fluency scores, indicating a prominent cross-linguistic variation specific to French. The relative importance of the utterance measures in predicting fluency ratings, however, was found to vary across tasks.
Peyman Zamani, Majid Ravanbakhsh, Farzad Weisi, Vahid Rashedi, Sara Naderi, Ayub Hosseinzadeh, and M Rezaei, “Effect(s) of Language Tasks on Severity of Disfluencies in Preschool Children with Stuttering,” Journal of Psycholinguistic Research, 05/2016. DOI: 10.1007/s10936-016-9437-z. http://dx.doi.org/10.1007/s10936-016-9437-z.

Abstract Speech disfluency in children can be increased or decreased depending on the type of linguistic task presented to them. In this study, the effect of sentence imitation and sentence modeling on severity of speech disfluencies in preschool children with stuttering is investigated. In this cross-sectional descriptive analytical study, 58 children with stuttering (29 with mild stuttering and 29 with moderate stuttering) and 58 typical children aged between 4 and 6 years old participated. The severity of speech disfluencies was assessed by SSI-3 and TOCS before and after offering each task. In boys with mild stuttering, the mean stuttering severity scores in two tasks of sentence imitation and sentence modeling were 21.81±1.72 and 12.94±1.38 respectively (P=0.837). But, in boys with moderate stuttering the stuttering severity in the both tasks were 23.79±1.26 and 29.00±2.03 respectively (P=0.004). In girls with mild stuttering, the stuttering severity in two tasks of sentence imitation and sentence modeling were 13.14±2.47 and 13.86±2.03 respectively (P=0.094). But, in girls with moderate stuttering the mean stuttering severity in the both tasks were 25.27±1.93 and 33.18±2.32 respectively (P=0.007). In both gender of typical children, the score of speech disfluencies had no significant difference between two tasks (P>0.05). In preschool children with mild stuttering and peer non-stutters, performing the tasks of sentence imitation and sentence modeling could not increase the severity of stuttering. But, in preschool children with moderate stuttering, doing the task of sentence modeling increased the stuttering severity score.

2015

Jamie Lynn Armbrecht, “Hesitation Rate as a Speaker-Specific Cue in Bilingual Individuals,” Master's Thesis, University of South Florida. June 2015. https://scholarcommons.usf.edu/etd/5634.

Abstract Hesitation use is common among all speakers, regardless of whether they are engaged in their dominant or non-dominant language (Fehringer & Fry, 2007; Reed, 2000). The question is whether a bilingual speaker will engage in the same types of hesitations in both languages. If hesitation patterns can be identified consistently across speakers regardless of language, their use as an acoustic cue for speaker identification may be possible. This study examines differences in hesitation use across languages and speaking contexts (reading vs. conversation) in bilingual speakers. | Twenty Spanish-English bilinguals (ages 19-31 years) were tested as part of a larger speaker identification project focusing on bilingual speech patterns. These individuals were recorded in a sound-treated booth while speaking extemporaneously and reading a standardized passage in both Spanish and English. Unfilled pause length and speech segment durations were obtained from one minute speech samples using Praat scripts (Boersma & Weenink, 2014). Pause to speaking ratios were computed in Excel. The number of filled pauses were determined from the same one minute speech samples in English and Spanish. Differences in planning style were demonstrated with step graphs which compared both the frequency and length of alternations between speech and pauses in two participants with different planning styles. | Wilcoxon signed ranks tests revealed significant differences in the use of unfilled pauses across speaking contexts in both languages. Both pause to speaking ratios and pause durations were larger in spontaneous speech when compared to read speech. Speech segment durations were shorter in extemporaneous speech and filled pauses were more common in spontaneous speech. | Cross-language comparisons were considered within each speaking condition. Results indicated few instances where there were significant differences. There were longer speech segment durations in read speech and more filled pause use in spontaneous speech in English. Further demonstration of these patterns was illustrated through step graphs. | The similarities in the hesitation phenomenon between languages suggests that bilingual speakers often use the same planning aspects between languages and carryover aspects of speech production from their first language to their second (Fehringer & Fry, 2007). Therefore, comparisons within and across languages within a specific speaking condition may be useful in speaker identification. However, these findings also indicate the need for caution when comparing speech samples across speaking conditions using unfilled and filled pauses. One should consider hesitation as one of several acoustic cues for use in speaker identification in a cross-language situation.

Keywords Bilingual; Hesitation; Reading; Speaker Identification; Spontaneous Speech
Malte Belz, and Uwe Reichel, “Pitch Characteristics of Filled Pauses,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract We investigate the pitch characteristics of filled pauses in order to distinguish between hesitational and floor-holding functions of filled pauses. A corpus of spontaneous dialogues is explored using a parametric bottom-up approach to extract intonation contours. We find that subjects tend to utter filled pauses more prominently when they cannot see each other, which indicates an increased floor-holding usage of filled pauses in this condition.

Keywords disfluencies, DiSS, filled pauses, floor-holding, intonation
Hans Rutger Bosker, and Eva Reinisch, “Normalization for Speechrate in Native and Nonnative Speech,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0324.1-5. http://www.icphs2015.info/pdfs/Papers/ICPHS0324.pdf.

Abstract Speech perception involves a number of processes that deal with variation in the speech signal. One such process is normalization for speechrate: local temporal cues are perceived relative to the rate in the surrounding context. It is as yet unclear whether and how this perceptual effect interacts with higher level impressions of rate, such as a speaker’s nonnative identity. Nonnative speakers typically speak more slowly than natives, an experience that listeners take into account when explicitly judging the rate of nonnative speech. The present study investigated whether this is also reflected in implicit rate normalization. Results indicate that nonnative speech is implicitly perceived as faster than temporally-matched native speech, suggesting that the additional cognitive load of listening to an accent speeds up rate perception. Therefore, rate perception in speech is not dependent on syllable durations alone but also on the ease of processing of the temporal signal.

Keywords cognitive load, implicit processing, nonnative speech, speech perception, speechrate
Hans Rutger Bosker, Jade Tjiong, Hugo Quené, Ted Sanders, and Nivja De Jong, “Both native and non-native disfluencies trigger listeners’ attention,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract Disfluencies, such as uh and uhm, are known to help the listener in speech comprehension. For instance, disfluencies may elicit prediction of less accessible referents and may trigger listeners’ attention to the following word. However, recent work suggests differential processing of disfluencies in native and non-native speech. The current study investigated whether the beneficial effects of disfluencies on listeners’ attention are modulated by the (non-)native identity of the speaker. Using the Change Detection Paradigm, we investigated listeners’ recall accuracy for words presented in disfluent and fluent contexts, in native and non-native speech. We observed beneficial effects of both native and non-native disfluencies on listeners’ recall accuracy, suggesting that native and non-native disfluencies trigger listeners’ attention in a similar fashion.

Keywords attention, Change Detection Paradigm, disfluencies, DiSS, non-native speech
Angelika Braun, and Annabelle Rosin, “On the Speaker-Specificity of Hesitation Markers,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0731.1-5. http://www.icphs2015.info/pdfs/Papers/ICPHS0731.pdf.

Abstract The occurrence of hesitation markers is generally considered to be part of the verbal planning process. It is also a feature which is of potential importance to the forensic application of phonetics if hesitation behaviour could be linked to individual speakers. This study examines a total of eight female speakers on three different days. It can be demonstrated that, even though results vary across sessions, subjects exhibit distinct patterns of hesitation marker usage. This pertains to the number as well as the type of hesitations marker, which makes this feature a potential candidate for forensic investigations.

Keywords forensic phonetics, verbal planning
Vera Cabarrão, Helena Moniz, Jaime Ferreira, and Fernando Batista, “Prosodic Classification of Discourse Markers,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0634.1-5. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/Papers/ICPHS0634.pdf.

Abstract The first contribution of this study is the description of the prosodic behavior of discourse markers present in two speech corpora of European Portuguese (EP) in different domains (university lectures, and map-task dialogues). The second contribution is a multiclass classification to verify, given their prosodic features, which words in both corpora are classified as discourse markers, which are disfluencies, and which correspond to words that are neither markers nor disfluencies (chunks). Our goal is to automatically predict discourse markers and include them in rich transcripts, along with other structural metadata events (e.g., disfluencies and punctuation marks) that are already encompassed in the language models of our in-house speech recognizer. Results show that the automatic classification of discourse markers is better for the lectures corpus (87%) than for the dialogue corpus (84%). Nonetheless, in both corpora, discourse markers are more easily confused with chunks than with disfluencies.

Keywords Dialogues, Discourse markers, Lectures, prosody, Structural Metadata Events
Rasmus Dall, Mirjam Wester, and Martin Corley, “Disfluencies in change detection in natural, vocoded and synthetic speech,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract In this paper, we investigate the effect of filled pauses, a discourse marker and silent pauses in a change detection experiment in natural, vocoded and synthetic speech. In natural speech change detection has been found to increase in the presence of filled pauses, we extend this work by replicating earlier findings and explore the effect of a discourse marker, like, and silent pauses. Furthermore we report how the use of "unnatural" speech, namely synthetic and vocoded, affects change detection rates. It was found that the filled pauses, the discourse marker and silent pauses all increase change detection rates in natural speech, however in neither synthetic nor vocoded speech did this effect appear. Rather, change detection rates decreased in both types of "unnatural" speech compared to natural speech. The natural results suggests that while each type of pause increase detection rates, the type of pause may have a further effect. The "unnatural" results suggest that it is not the full pipeline of synthetic speech that causes the degradation, but rather that something in the pre-processing, i.e. vocoding, of the speech database limits the resulting synthesis.

Keywords change detection, DiSS, filled pauses, speech synthesis
Nivja H. de Jong, Rachel Groenhout, Rob Schoonen, and Jan H. Hulstijn, “Second language fluency: Speaking style or proficiency? Correcting measures of second language fluency for first language behavior,” Applied Psycholinguistics, vol. 36, no. 2, March 2015, pp. 223-243. DOI: 10.1017/S0142716413000210. http://journals.cambridge.org/article_S0142716413000210.

Abstract In second language (L2) research and testing, measures of oral fluency are used as diagnostics for proficiency. However, fluency is also determined by personality or speaking style, raising the question to what extent L2 fluency measures are valid indicators of L2 proficiency. In this study, we obtained a measure of L2 (Dutch) proficiency (vocabulary knowledge), L2 fluency measures, and fluency measures that were corrected for first language behavior from the same group of Turkish and English native speakers (N = 51). For most measures of fluency, except for silent pause duration, both the corrected and the uncorrected measures significantly predicted L2 proficiency. For syllable duration, the corrected measure was a stronger predictor of L2 proficiency than was the uncorrected measure. We conclude that for L2 research purposes, as well as for some types of L2 testing, it is useful to obtain corrected measures of syllable duration to measure L2-specific fluency.
Mark Dingemanse, Seán G. Roberts, Julija Baranova, Joe Blythe, Paul Drew, Simeon Floyd, Rosa S. Gisladottir, Kobin H. Kendrick, Stephen C. Levinson, Elizabeth Manrique, Giovanni Rossi, and N. J. Enfield, “Universal Principles in the Repair of Communication Problems,” PLoS ONE, vol. 10, no. 9, September 2015, pp. e0136100. DOI: 10.1371/journal.pone.0136100. http://dx.doi.org/10.1371%2Fjournal.pone.0136100.

Abstract There would be little adaptive value in a complex communication system like human language if there were no ways to detect and correct problems. A systematic comparison of conversation in a broad sample of the world’s languages reveals a universal system for the real-time resolution of frequent breakdowns in communication. In a sample of 12 languages of 8 language families of varied typological profiles we find a system of ‘other-initiated repair’, where the recipient of an unclear message can signal trouble and the sender can repair the original message. We find that this system is frequently used (on average about once per 1.4 minutes in any language), and that it has detailed common properties, contrary to assumptions of radical cultural variation. Unrelated languages share the same three functionally distinct types of repair initiator for signalling problems and use them in the same kinds of contexts. People prefer to choose the type that is the most specific possible, a principle that minimizes cost both for the sender being asked to fix the problem and for the dyad as a social unit. Disruption to the conversation is kept to a minimum, with the two-utterance repair sequence being on average no longer than the single utterance which is being fixed. The findings, controlled for historical relationships, situation types and other dependencies, reveal the fundamentally cooperative nature of human communication and offer support for the pragmatic universals hypothesis: while languages may vary in the organization of grammar and meaning, key systems of language use may be largely similar across cultural groups. They also provide a fresh perspective on controversies about the core properties of language, by revealing a common infrastructure for social interaction which may be the universal bedrock upon which linguistic diversity rests.
Stephanie Don, and Robin Lickley, “Uh I forgot what I was going to say: How memory affects fluency,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract Disfluency rates vary considerably between individuals. Previous studies have considered gender, age and conversational roles amongst other factors that may affect fluency. Testing a nonclinical, gender-balanced population of young adults performing the same speaking tasks, this study explores how inter-speaker variations in working memory and in long-term (lexical) memory affect disfluency in two different ways. Working memory was tested by a forward digit span test; long-term lexical memory was tested by the Verbal Fluency Test, both semantic and phonological versions. In addition, each participant provided 3 one-minute samples of monologue speech. The speech samples were analysed for disfluencies. Speakers with lower working memory scores produced more error repairs in running speech. Speakers with lower lexical access scores produced a higher rate of hesitations. The two types of memory affected fluency in different ways.

Keywords DiSS, error repair, hesitation, long term lexical memory, working memory
Robert Eklund, Peter Fransson, and Martin Ingvar, “Neural correlates of the processing of unfilled and filled pauses,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract Spontaneously produced Unfilled Pauses (UPs) and Filled Pauses (FPs) were played to subjects in an fMRI experiment. While both stimuli resulted in increased activity in the Primary Auditory Cortex, FPs, unlike UPs, also elicited modulation in the Supplementary Motor Area, Brodmann Area 6. This observation provides neurocognitive confirmation of the oft-reported difference between FPs and other kinds of speech disfluency and also could provide a partial explanation for the previously reported beneficial effect of FPs on reaction times in speech perception. The results are discussed in the light of the suggested role of FPs as floor-holding devices in human polylogs.

Keywords Auditory Cortex, BA6, Brodmann Area 6, DiSS, filled pauses, fMRI, PAC, SMA, speech disfluency, speech perception, spontaneous speech, Supplementary Motor Area, unfilled pauses
Ewa Guz, “Establishing the Fluency Gap Between Native and Non-Native-Speech,” Research in Language, vol. 13, no. 3, 2015. DOI: 10.1515/rela-2015-0021. https://www.degruyter.com/view/j/rela.2015.13.issue-3/rela-2015-0021/rela-2015-0021.xml.

Abstract Although various dimensions of speech fluency have so far generated a great deal of research interest, very few accounts have tackled the issue of the relationship between L1 and L2 fluency. Also, little empirical evidence has been provided to support the claim that language users are more fluent in their mother tongue than in a foreign/second language. This study examines the fluency gap between L1 and L2 fluency using a battery of objectively quantifiable temporal measures of speed and breakdown fluency. It also attempts to identify those temporal fluency variables which are affected by the individual way of speaking rather than the degree of automatisation of speech processing and which underlie oral performance both in L1 and L2. The analysis draws on transcriptions of elicited speech samples in L1 (Polish) and L2 (English).

Keywords breakdown fluency, hesitation phenomena, L1/ L2 speech fluency, pausing, speech rate, speed fluency, temporal measures of fluency
Elena Galkina, “Processing of Garden-Path Sentences Containing Silent and Filled Pauses in Stuttered Speech: Evidence From a Comprehensive Study,” Master's Thesis, University of South Carolina - Columbia, Columbia, South Carolina, USA, 2015. http://scholarcommons.sc.edu/etd/3139.

Abstract Disfluency is common in spontaneous speech. Self-correction is a type of disfluency that consists of reparandum, filler, and repair (Levelt, 1989). Little is known about the processing of self-corrections in a normally disfluent speech, and even less is known about its processing in atypically disfluent speech (e.g. speech in patients with autism spectrum disorder, hearing impaired, patients with brain damage, and stuttered speech; see: Lake, Humphreys, & Cardy, 2011; Lind, Hickson, & Erber, 2004; Plexico et al., 2010; Rossi et al., 2011; Yairi, Gintautas, & Avent, 1981). This study focuses on self-correction disfluencies in garden-path sentences and employs a behavioral data collection method to investigate how disfluencies are processed as they are heard. This experiment examines spoken language comprehension by measuring accuracy and response time to comprehension questions. The data was gathered and analyzed. Two experimental conditions were presented where in the first one normal speakers listened to typically disfluent speech, and in the second one normal speakers listened to atypically disfluent stuttered speech. The information about the speakers in the recorded stimuli was kept from the listeners. Fillers, such as uh and um are common in stuttered speech because of their helpful role in starting an utterance. In stuttered speech, the uhs, ums and pauses tend to be longer and in odd places, relative to the speech of people who do not stutter. Therefore, the hypothesis of this study was that the fillers and pauses made by people who stutter affect the dynamics of processing, particularly in garden-path sentences. Namely, the accuracy rate for the comprehensive questions was predicted to be lower for the garden-path filled pause sentences, particularly for atypical speaker condition. Reaction time was predicted to be longer for the same condition. 
The analysis revealed an accuracy measure dependence on the speaker condition but no significant time correlation. This study provides significant information about how normal speakers’ comprehension is affected by disfluency such as pauses in general, and how speech impairment, such as stuttering, affects the processing of filled and silent pause disfluencies.
Lorenzo García-Amaya, “A longitudinal study of filled pauses and silent pauses in second language speech,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract This study provides a longitudinal analysis of speech rate and the use of filled pauses (FPs) and unfilled or silent pauses (SPs) in the oral production of L2 learners of Spanish in two learning contexts: a 6-week intensive overseas immersion program (OIM), and a 15-week US-based ‘at-home’ foreign language classroom (AH). Fifty-six native speakers of English performed two video-retell tasks at three different time points. A total of five measurements of oral production were calculated. The results show a significant increase in rate of speech over time in the OIM group compared to the AH group. Additionally, the OIM learners show greater use of “disfluencies” over time, namely FPs and short SPs. We suggest that OIM learners increase their use of hesitation phenomena over time as a speech processing and planning strategy and discuss this finding within the framework of L2 cognitive fluency.

Keywords disfluencies, DiSS, filled pauses, rate of speech, second language fluency, silent pauses, Spanish, study abroad
Emer Gilmartin, Carl Vogel, and Nick Campbell, “Disfluency in multiparty social talk,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract Much research on disfluencies in spontaneous spoken interaction has been carried out on corpora of task-based conversations, resulting in greater understanding of the role of several phenomena. Modern multimodal corpora allow the full spectrum of signals in face to face communication to be analysed. However, the ‘unmarked’ case of casual conversation or social talk with no obvious short-term instrumental goal has been less studied in this manner. Corpus-based work on social talk tends to deal with short dyadic interactions, although the norm for social conversation is for longer multiparty interaction. In this paper, we outline our programme of exploratory studies of disfluency in a longer multiparty conversation. We briefly describe the background to our research goals, and then report on the collection, transcription, and annotation of the data for our experiments. We present and discuss some of our early results.

Keywords casual conversation, disfluency, DiSS, hesitation, repair, spoken interaction
Iulia Grosman, “Complexity cues or attention triggers? Repetitions and editing terms for native speakers of French,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract A growing stream of research shows evidence of the metalinguistic information that disfluencies (silent and filled pauses, repetitions, false-starts, repairs, etc.) can display to listeners. As a result, disfluencies may work as fluent devices. By means of decision-task latencies, this study investigates whether lexical repetition co-occurring with an editing term affects the perception of native speakers of French. There is a lack of consensus in the literature: do disfluencies trigger conceptual priming of a complex entity or act simply as attention cues? Results from multiple analysis of variance and linear mixed-effect modelling show that the presence of a disfluency triggers a faster response from the participant, however complex the following noun-phrase might be, supporting the hypothesis that repetition and co-occurring editing terms act as cognitive signposts rather than as cues of complexity of an upcoming event.

Keywords disfluencies, DiSS, French, perception, prosody, reaction time, repetitions
Sandra Götz, “Fluency in ENL, ESL and EFL: A corpus-based approach,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract Against the background of a ‘cline model’ of increasing fluency/decreasing disfluency from ENL to ESL to EFL forms of English, the present pilot study investigates (dis)fluency features in British English, Sri Lankan English and German Learner English. The analysis of selected variables of temporal fluency (viz. unfilled pauses, mean length of runs) and fluency-enhancement strategies (viz. discourse markers, smallwords and repeats) is based on the c. 40,000-word subcorpora of the British and the Sri Lankan components of the International Corpus of English (ICE-GB and ICE-SL) and the c. 80,000-word German component of the Louvain International Database of Spoken English Interlanguage (LINDSEI-GE). The study reveals that, while the EFL variant shows the lowest degree of temporal fluency (e.g. the highest number of unfilled pauses), the findings are mixed for ESL and ENL (e.g. the ESL speakers show a lower number of unfilled pauses, but the ENL speakers show a higher number of smallwords). Also, variant-specific preferences of using certain fluency-enhancement strategies become clearly visible.

Keywords corpus-based (dis)fluency, DiSS, ENL vs. ESL vs. EFL, Fluency, fluency profiles
Zara Harmon, and Vsevolod Kapatsinski, “Studying the dynamics of lexical access using disfluencies,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract Faced with planning problems related to lexical access, speakers take advantage of a major function of disfluencies: buying time. It is reasonable, then, to expect that the structure of disfluencies sheds light on the mechanisms underlying lexical access. Using data from the Switchboard Corpus, we investigated the effect of semantic competition during lexical access on repetition disfluencies. We hypothesized that the more time the speaker needs to access the following unit, the longer the repetition. We examined the repetitions preceding verbs and nouns and tested predictors influencing the accessibility of these items. Results suggest that speed of lexical access negatively correlates with the length of repetition and that the main determinants of lexical access speed differ for verbs and nouns. Longer disfluencies before verbs appear to be due to significant paradigmatic competition from semantically similar verbs. For nouns, they occur when the noun is relatively unpredictable given the preceding context.

Keywords DiSS, lexical access, lexicalization, repetition, semantic competition, sentence planning
Clara Hedenqvist, Frida Persson, and Robert Eklund, “Disfluency incidence in 6-year old Swedish boys and girls with typical language development,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract This paper reports the prevalence of disfluencies in a group of 55 (25F/30M) Swedish children with typical speech development, and within the age range 6;0 and 6;11. All children had Swedish as their mother tongue. Speech was elicited using an “event picture” which the children described in their own, spontaneously produced, words. The data were analysed with regard to sex differences and lexical ability, including size of vocabulary and word retrieval, which was assessed using the two tests Peabody Picture Vocabulary Test and Ordracet. Results showed that girls produced significantly more unfilled pauses, prolongations and sound repetitions, while boys produced more word repetitions. However, no correlation with lexical development was found. The results are of interest to speech pathologists who study early speech development in search for potential early predictors of speech pathologies.

Keywords children, DiSS, lexical development, sex differences, speech disfluency
Julian Hough, Laura de Ruiter, Simon Betz, and David Schlangen, “Disfluency and laughter annotation agreement in a light-weight dialogue mark-up protocol,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract Despite a great deal of research effort, disfluency and laughter annotation is still an unsolved problem, both in terms of consensus for a generally applicable system, and in terms of annotation agreement metrics. In this paper we present a new annotation scheme within a light-weight mark-up for spontaneous speech. We show that, despite the low overhead required for understanding the annotation protocol, it allows for good inter-annotator agreement and can be used to map onto existing disfluency categorization, with no loss of information.

Keywords disfluency annotation, DiSS, German corpora, inter-annotator agreement, laughter, spontaneous speech
Peter Howell, “Intervention for children with word-finding difficulty: Impact on fluency during spontaneous speech for children using English as their native or as an additional language,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract Types of intervention that could be targeted when there are high rates of word-finding difficulty were examined for any impact they had on speech fluency (whole-word repetition rate in particular). Results are reported that are interpreted as showing that a semantic-based intervention has an impact on fluency as well as word-finding.

Keywords DiSS, EAL, intervention, stuttering, word-finding
Jennifer E. Mack, Sarah D. Chandler, Aya Meltzer-Asscher, Emily Rogalski, Sandra Weintraub, M.-Marsel Mesulam, and Cynthia K. Thompson, “What do pauses in narrative production reveal about the nature of word retrieval deficits in PPA?,” Neuropsychologia, vol. 77, 2015, pp. 211-222. DOI: 10.1016/j.neuropsychologia.2015.08.019. http://www.sciencedirect.com/science/article/pii/S0028393215301354.

Abstract Naming and word-retrieval deficits, which are common characteristics of primary progressive aphasia (PPA), differentially affect production across word classes (e.g., nouns, verbs) in some patients. Individuals with the agrammatic variant (PPA-G) often show greater difficulty producing verbs whereas those with the semantic variant (PPA-S) show greater noun deficits and those with logopenic PPA (PPA-L) evince no clear-cut differences in production of the two word classes. To determine the source of these production patterns, the present study examined word-finding pauses as conditioned by lexical variables (i.e., word class, frequency, length) in narrative speech samples of individuals with PPA-S (n=12), PPA-G (n=12), PPA-L (n=11), and cognitively healthy controls (n=12). We also examined the relation between pause distribution and cortical atrophy (i.e., cortical thickness) in nine left hemisphere regions of interest (ROIs) linked to word production. Results showed higher overall pause rates for PPA compared to unimpaired controls; however, greater naming severity was not associated with increased pause rate. Across all groups, more pauses were produced before lower vs. higher frequency words, with no independent effects of word length after controlling for frequency. With regard to word class, the PPA-L group showed a higher rate of pauses prior to production of nouns compared to verbs, consistent with noun-retrieval deficits arising at the lemma level of word production. Those with PPA-G and PPA-S, like controls, produced similar pause rates across word classes; however, lexical simplification (i.e., production of higher-frequency and/or shorter words) was evident in the more-impaired word class: nouns for PPA-S and verbs for PPA-G. 
These patterns are consistent with conceptual and/or lemma-level impairments for PPA-S, predominantly affecting objects/nouns, and a lemma-level verb-retrieval deficit for PPA-G, with a concomitant impairment in phonological encoding and articulation affecting overall pause rates. The greater tendency to pause before nouns was correlated with atrophy in the left precentral gyrus, inferior frontal gyrus and inferior parietal lobule, whereas the greater tendency to pause before less frequent and longer words was associated with atrophy in left precentral and inferior parietal regions.

Keywords Brain–behavior relationship
Hanae Koiso, and Yasuharu Den, “Causal analysis of acoustic and linguistic factors related to speech planning in Japanese monologs,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract In this paper, we applied a general method of testing path models, investigating causal relationship between cognitive load in speech planning and four types of disfluencies in Japanese monologs. The four disfluencies examined were i) clause-initial fillers, ii) inter-clausal pauses, iii) clause-final lengthening, and iv) boundary pitch movements, which occurred at weak clause boundaries. The length of the constituents following weak clause boundaries was assumed to be a measure of the complexity affecting the cognitive load. By using a model selection technique based on the AIC, we found an optimal model with the smallest AIC, in which the constituent complexity had direct effects on all of the four disfluency variables. In addition, some of the disfluencies influenced one another; clause-final lengthening was enhanced by the presence of a boundary pitch movement and the occurrence of clause-initial fillers was affected by all the other three disfluency variables.

Keywords boundary pitch movements, clause-final lengthening, DiSS, fillers, path models, pauses
Marie-José Kolly, Adrian Leemann, Philippe Boula de Mareüil, and Volker Dellwo, “Speaker-Idiosyncrasy in Pausing Behavior: Evidence from a Cross-Linguistic Study,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0294.1-5. http://www.icphs2015.info/pdfs/Papers/ICPHS0294.pdf.

Abstract Phoneticians study acoustic speech signals. But what about the aspects of speech where the signal is silent? The present study investigated speakers’ pausing behavior in their native and non-native speech. Pausing measures were applied in order to study between-speaker and within-speaker variability, where within-speaker variability was introduced by recording speakers in their native Zurich German, and in their second languages English and French. Results showed that pausing measures in the form of pause numbers and pause durations are speaker-specific. Furthermore, this speaker-specificity became evident across different languages. Results are discussed in the context of forensic voice comparison.

Keywords forensic phonetics, pausing, second language, speaker-idiosyncrasy, temporal features
Jixing Li, and Sam Tilsen, “Phonetic Evidence for Two Types of Disfluency,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0766.1-5. http://www.icphs2015.info/pdfs/Papers/ICPHS0766.pdf.

Abstract Disfluency, such as pause (silences), filled pause (e.g., ‘um’, ‘uh’), repetition (e.g., ‘the the’) and cutoff word (e.g., ‘hori[zontal]-’), is a common part of human speech that occurs at a rate of 6 to 10 per 100 words [2, 5]. According to one model of speech production [8], there are two types of disfluency: disfluency at the internal planning stage (e.g., word-retrieval difficulties), and disfluency at the external monitoring stage (e.g., self-correction of speech errors). The current study provides phonetic evidence for the two types of disfluency by examining word durations before different types of disfluency in the Switchboard corpus [6]. The results showed only a marginal increase in the durations of words before cutoffs, but a large increase in the durations of words before repetitions, silences and filled pauses, suggesting internal processing difficulty before noncutoff disfluency, but not before cutoff disfluency.

Keywords disfluency, duration, self-monitoring, Switchboard
Yan-Hua Long, and Hong Ye, “Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech,” PLoS ONE, vol. 10, no. 4, April 2015. DOI: 10.1371/journal.pone.0123466.

Abstract Nowadays, although automatic speech recognition has become quite proficient in recognizing or transcribing well-prepared fluent speech, the transcription of speech that contains many disfluencies remains problematic, such as spontaneous conversational and lecture speech. Filled pauses (FPs) are the most frequently occurring disfluencies in this type of speech. Most recent studies have shown that FPs are widely believed to increase the error rates for state-of-the-art speech transcription, primarily because most FPs are not well annotated or provided in training data transcriptions and because of the similarities in acoustic characteristics between FPs and some common non-content words. To enhance the speech transcription system, we propose a new automatic refinement approach to detect FPs in British English lecture speech transcription. This approach combines the pronunciation probabilities for each word in the dictionary and acoustic language model scores for FP refinement through a modified speech recognition forced-alignment framework. We evaluate the proposed approach on the Reith Lectures speech transcription task, in which only imperfect training transcriptions are available. Successful results are achieved for both the development and evaluation datasets. Acoustic models trained on different styles of speech genres have been investigated with respect to FP refinement. To further validate the effectiveness of the proposed approach, speech transcription performance has also been examined using systems built on training data transcriptions with and without FP refinement.
Kikuo Maekawa, and Hiroki Mori, “Voice quality analysis of Japanese filled pauses: a preliminary report,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract Using the Core of the Corpus of Spontaneous Japanese, acoustic analysis of F1, spectral tilt (TL), H1-H2, jitter and F0 was conducted to examine the voice-quality difference between the vowels in filled pauses and those in ordinary lexical items. A simple SVM analysis showed that the two classes of vowels could be discriminated with a mean accuracy higher than 70%.

Keywords DiSS
Kirsty McDougall, Martin Duckworth, and Toby Hudson, “Individual and Group Variation in Disfluency Features: A Cross-Accent Investigation,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0308.1-5. http://www.icphs2015.info/pdfs/Papers/ICPHS0308.pdf.

Abstract A study of individual differences in the fluency disruptions of speakers of two different accents, Standard Southern British English (SSBE) and York English is presented. Distributions of rates of occurrence per 100 syllables are examined for filled and silent pauses, repetitions, prolongations and (self-)interruptions, and subcategories of these. Patterns of occurrence of disfluency features show considerable between-speaker variation in both SSBE and York English. Similar ranges of speakers’ overall disfluency rates are exhibited by both accents, but cross-accent differences are present in the patterning of some disfluency feature categories. The results suggest that a detailed record of disfluency features is a useful additional tool in forensic speaker comparison.

Keywords accent differences, disfluency, forensic speaker comparison, individual differences
Helena Moniz, Jaime Ferreira, Fernando Batista, and Isabel Trancoso, “Disfluency detection across domains,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract This paper focuses on disfluency detection across distinct domains using a large set of openSMILE features, derived from the Interspeech 2013 Paralinguistic challenge. Amongst different machine learning methods being applied, SVMs achieved the best performance. Feature selection experiments revealed that the dimensionality of the larger set of features can be further reduced at the cost of a small degradation. Different models trained with one corpus were tested on the other corpus, revealing that models can be quite robust across corpora for this task, despite their distinct nature. We have conducted additional experiments aiming at disfluency prediction in the context of IVR systems, and results reveal that there is no substantial degradation on the performance, encouraging the use of the models in IVR domains.

Keywords acoustic-prosodic features, cross-domain analysis, disfluency detection, DiSS, European Portuguese
Helena Moniz, A. Pompili, Fernando Batista, Isabel Trancoso, A. Abad, and C. Amorim, “Automatic Recognition of Prosodic Patterns in Semantic Verbal Fluency Tests – An Animal Naming Task for Edutainment Applications,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0997.1-5. http://www.icphs2015.info/pdfs/Papers/ICPHS0997.pdf.

Abstract This paper automatically detects prosodic patterns in the domain of semantic fluency tests. Verbal fluency tests aim at evaluating the spontaneous production of words under constrained conditions. Mostly used for assessing cognitive impairment, they can be used in a plethora of domains, such as edutainment applications or games with educational purposes. This work discriminates between list effects, disfluencies, and other linguistic events in an animal naming task. Recordings from 42 Portuguese speakers were automatically recognized and AuToBI was applied in order to detect prosodic patterns, using both European Portuguese and English models. Both models made it possible to differentiate list effects from the other events, mostly represented by the tunes: L* H/L(-%) (English models) or L*+H H/L(-%) (Portuguese models). However, English models proved to be more suitable because they rely on substantially more training material.

Keywords Automatic Speech Recognition, Edutainment, prosody, Semantic Fluency
Costanza Navarretta, “The functions of fillers, filled pauses and co-occurring gestures in Danish dyadic conversations,” in Proceedings from the 3rd European Symposium on Multimodal Communication, Dublin, September 2015, pp. 55-61. https://ep.liu.se/konferensartikel.aspx?series=ecp&issue=105&Article_No=10.

Abstract Fillers, alone or accompanied by pauses and/or gestures, are quite frequent in all types of spoken communication. They have numerous and non-exclusive functions which are related to interaction management (feedback and turn management) or discourse planning. Fillers are part of the language and thus, to some extent, language dependent. This article presents an analysis of fillers, filled pauses and co-occurring gestures in a Danish multimodal corpus of first encounters. The aims of the study are to determine the most common fillers in the corpus, the gestures co-occurring with them, their functions, and possibly their most prototypical uses. The results of our study indicate that the most common fillers in the data are øh, mm, øhm which all are accompanied by one or more gestures in most of their occurrences. We also found that each filler type has a predominant or prototypical use. Mm often occurs alone as feedback marker and is accompanied by feedback gestures. Øhm has the longest duration and often precedes an utterance or a clausal phrase signaling discourse planning. Its co-speech gestures have also interaction management functions. Finally, øh often precedes a content word, has a shorter duration than øhm and signals lexical retrieval. Interestingly the prototypical uses of the vocal øh and the vocal-nasal øhm are the same as those of the English vocal uh and vocal-nasal um, respectively.

Keywords multimodal communication; gestures; filled pause
Sieb Nooteboom, and Hugo Quené, “The Word-Onset Effect: Some Contradictory Findings,” 2015. http://www.siebnooteboom.nl/files/pdf/Diss2015WordOnsetsSomeContradictoryFindings.pdf.

Abstract In this paper we describe two experiments exploring possible reasons for earlier conflicting results concerning the so-called word-onset effect in interactional segmental speech errors. Experiment 1 elicits errors in pairs of CVC real words with the SLIP technique. No word-onset effect is found. Experiment 2 is a tongue-twister experiment with lists of four disyllabic words. A significant word-onset effect is found. The conflicting results are not resolved. We also found that intervocalic consonants hardly ever interact with initial and final consonants, and that words sharing a stress pattern are a major factor in generating interactional errors.
Núria Enríquez, Lourdes Díaz, and Mariona Taulé, “Mental Processes in the Oral Production of Non-Native Spanish Speakers: Pauses and Self-Correction,” Procedia - Social and Behavioral Sciences, vol. 173, 2015, pp. 24-30. DOI: http://dx.doi.org/10.1016/j.sbspro.2015.02.025. http://www.sciencedirect.com/science/article/pii/S1877042815013348.

Abstract In the field of teaching Spanish as a Foreign Language (SFL), textbooks and teaching materials often provide learners with language samples characterized by a lack of naturalness. We propose the use of a prototypical model of core competence, obtained from the analysis of communicative situations based on real corpora and the comparison of the same type of work with native and non-native speakers. The specific objective is the study of communication strategies related to pauses and self-correction in native and non-native speech, in order to analyse the repair strategies related to language processing.

Keywords L1/L2 corpora
Naoki Ohshima, Keita Kimijima, Junji Yamato, and Naoki Mukawa, “A conversational robot with vocal and bodily fillers for recovering from awkward silence at turn-takings,” in 2015 24th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2015, pp. 325-330. DOI: 10.1109/ROMAN.2015.7333677. https://ieeexplore.ieee.org/document/7333677.

Abstract When there is a lull in conversation, many people feel awkward and make sounds, such as “ummm” (a vocal filler), or stroke their chins or other parts of their bodies (a bodily filler). These fillers suggest the intent to recommence and continue the conversation. Thus, the purpose of conversation is to facilitate the sharing of beneficial information as well as a comfortable moment with an interlocutor. In this manner, a robot intended to help foster a comfortable or relaxing conversational atmosphere through interaction with a user (e.g., therapeutic robots) needs to be designed to convey such a cooperative demeanor to its human interlocutor. In this study, we analyze the effects of a robot's vocal and bodily fillers during awkward silences between turns in conversations with humans. The results of our study show that people feel awkward during silences, even when in conversation with a robot, and that the robot's conversational fillers help mitigate awkwardness and express a cooperative attitude in verbal interactions with people. Our experiments also revealed that subjects who were less socially adept reported feeling that their robot interlocutor was more sincere than its human counterpart.

Keywords Robots; Speech recognition; Pragmatics; Animals; Laboratories; Speech; Human-robot interaction; human factors; human-robot interaction; awkward silence recovery; turn-takings; interlocutor; comfortable conversational atmosphere; relaxing conversational atmosphere; user-robot interaction; therapeutic robots; human interlocutor; robot vocal fillers; robot bodily fillers; robot conversational fillers; verbal interactions; cooperative attitude; robot interlocutor
Leendert Plug, “Prosodic Marking and Predictability in Lexical Self-Repair,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0032.1-5. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/Papers/ICPHS0032.pdf.

Abstract This paper reports on an investigation of lexical self-repair in Dutch spontaneous dialogue. Lexical self-repairs, in which one word is rejected for another, can be produced with or without notable ‘prosodic marking’ of the second word. It remains unclear what motivates speakers’ choices, but previous research has shown that the semantic distance between the two words is relevant. This study assesses the relevance of the words’ predictability. Prosodic marking judgements are modelled using an established semantic classification and a range of probabilistic variables, including both frequency-based and cloze-based measures. Results suggest that probabilistic measures add little predictive power to the semantic classification, although informative data trends can be observed.

Keywords Dutch, predictability, prosody, self-repair, spontaneous speech
Ines Rehbein, “Filled Pauses in User-generated Content are Words with Extra-propositional Meaning,” in Proceedings of the Second Workshop on Extra-Propositional Aspects of Meaning in Computational Semantics (ExProM 2015), Denver, Colorado, Association for Computational Linguistics, June 2015, pp. 12-21. DOI: 10.3115/v1/W15-1302. https://www.aclweb.org/anthology/W15-1302.

Abstract In this paper, we present a corpus study investigating the use of the fillers äh (uh) and ähm (uhm) in informal spoken German youth language and in written text from social media. Our study shows that filled pauses occur in both corpora as markers of hesitations, corrections, repetitions and unfinished sentences, and that the form as well as the type of the fillers are distributed similarly in both registers. We present an analysis of fillers in written microblogs, illustrating that äh and ähm are used intentionally and can add a subtext to the message that is understandable to both author and reader. We thus argue that filled pauses in user-generated content from social media are words with extrapropositional meaning.
Sandra Reitbrecht, and Ursula Hirschfeld, “The Impact of Fluency and Hesitation Phenomena on the Perception of Non-native Speakers by Native Listeners of German,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0166.1-4. http://www.icphs2015.info/pdfs/Papers/ICPHS0166.pdf.

Abstract The here presented and ongoing study addresses L2 fluency and hesitation phenomena in the context of speech effects in intercultural communication. It investigates the impact of fluency and hesitation phenomena on the perception of non-native speakers by native listeners of German. The first results underline the importance and salience of hesitation phenomena and fluency for speech effects and suggest a higher consideration of these features in future studies. Native recipients’ verbal reactions to L2 speech material show that they often make reference to features of L2 utterance fluency to explain how they perceive non-native speakers, their personality and their emotional state. Furthermore, Spearman’s rank correlation tests for a certain number of fixed perceptual categories prove significant correlations between perceived fluency and the attributes assured (r(309)=0.617, p<0.01), well prepared (r(303)=0.589, p<0.01), competent (r(305)=0.483, p<0.01), relaxed (r(307)=0.375, p<0.01) and nervous (r(309)=-0.322, p<0.01).

Keywords Czech, Fluency, French, German as a foreign language, speech effects
Ralph Rose, “Um and uh as differential delay markers: the role of contextual factors,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract The English filled pauses uh and um have been argued to correspond respectively to shorter and longer anticipated delays in speech production. This study looks at some contextual factors that might cause this difference by investigating filled pause instances in monologue and conversation speech corpora. Results are consistent with previously observed delay differences and further show that discourse-level processing may influence differential delay marking though monologue results are more conclusive than conversation results. However, no evidence was found that lexical factors (word type, frequency) correlate with filled pause choice. The findings suggest a limited view of how speakers use filled pauses as delay markers: Not all contextual factors may trigger differential delay marking.

Keywords contextual factors, delay, DiSS, filled pause
Ralph Rose, “Temporal Variables in First and Second Language Speech and Perception of Fluency,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0405.1-5. http://www.roselab.sci.waseda.ac.jp/resources/file/2015_icphs_rose_paper.pdf.

Abstract Evidence is accumulating that many temporal features of second language speech are correlated with those of first language speech. This study looks at the correlation between articulation rate, pause rate, and mean pause duration in Japanese first and English second language speech and how second language fluency raters perceive these. In a crosslinguistic corpus of spontaneous speech, mean pause duration was found to have a near-high correlation while the other two temporal variables have a moderate correlation. A subsequent elicitation of fluency judgments on the second language English speech via Amazon Mechanical Turk showed that ratings were highly dependent on pause duration, rather less on articulation rate, but not on pause rate. Results suggest that raters’ perception of second language fluency is divergent from speakers’ actual second language development: Ratings are related to features that are not indicative of second language development but rather of individual speech patterns.

Keywords articulation rate, Fluency, second language acquisition, silent pause
Sara Bögels, Kobin H. Kendrick, and Stephen C. Levinson, “Never Say No … How the Brain Interprets the Pregnant Pause in Conversation,” PLoS ONE, vol. 10, no. 12, 2015, pp. 15. DOI: 10.1371/journal.pone.0145474. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0145474.

Abstract In conversation, negative responses to invitations, requests, offers, and the like are more likely to occur with a delay–conversation analysts talk of them as dispreferred. Here we examine the contrastive cognitive load ‘yes’ and ‘no’ responses make, either when relatively fast (300 ms after question offset) or delayed (1000 ms). Participants heard short dialogues contrasting in speed and valence of response while having their EEG recorded. We found that a fast ‘no’ evokes an N400-effect relative to a fast ‘yes’; however, this contrast disappeared in the delayed responses. ‘No’ responses, however, elicited a late frontal positivity both if they were fast and if they were delayed. We interpret these results as follows: a fast ‘no’ evoked an N400 because an immediate response is expected to be positive–this effect disappears as the response time lengthens because now in ordinary conversation the probability of a ‘no’ has increased. However, regardless of the latency of response, a ‘no’ response is associated with a late positivity, since a negative response is always dispreferred. Together these results show that negative responses to social actions exact a higher cognitive load, but especially when least expected, in immediate response.
Miki Shrosbree, “Cross-Linguistic Articulation Rate among Near-Balanced Bilinguals and Implications for Second Language Fluency Measurement,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0572.1-4. http://www.icphs2015.info/pdfs/Papers/ICPHS0572.pdf.

Abstract The present study examines cross-linguistic articulation rates in read speech among 28 native speakers (14 English and 14 Japanese) and 14 Japanese-English near-balanced bilinguals. The results show that: (1) articulation rates are comparable between the native speakers and the bilinguals; (2) there was a significant difference of articulation rates in Japanese and English among the bilinguals; (3) there is a strong positive correlation between English and Japanese articulation rates among bilinguals. Implications for development of L2 fluency measurement using the L1 fluency as a baseline are discussed.

Keywords articulation rate, balanced bilingual, Fluency, second language, speech rate
Vered Silber-Varod, Adva Weiss, and Noam Amir, “Can you hear these mid-front vowels? Formants analysis of hesitation disfluencies in spontaneous Hebrew,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract This study attempts to characterize the timbre of the default type of hesitation disfluency (HD) in Israeli Hebrew: the mid-front vowel /e/. For this purpose, we analysed the frequencies of the first three formants, F1, F2, and F3, of hundreds of HD pronunciations taken from The Corpus of Spoken Israeli Hebrew (COSIH). We also compared the formant values with two former studies that were carried out on the vowel /e/ in fluent speech. The findings show that, in general, elongated word-final syllables and appended [e]s are pronounced with the same amount of openness as fluent [e], while filled pauses tend to be more open (lower F1), and more frontal (higher F2). Following these results, we suggest to use different set of IPA symbols, and not the phonemic mid-front /e/, in order to better represent hesitation disfluencies.

Keywords DiSS, filled pauses, formants, Hebrew, hesitation disfluency, LPC analysis, spontaneous speech
Anton Stepikhov, and Anastassia Loukina, “Sentence Boundaries in Text and Pauses in Speech: Correlation or Confrontation?,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0588.1-5. http://www.icphs2015.info/pdfs/Papers/ICPHS0588.pdf.

Abstract The paper explores the interaction between sentence boundaries marked by annotators in transcriptions of Russian spontaneous speech and actual prosodic boundaries in the signal. The aim of the research is to investigate whether annotators’ prosodic competence allows them to correctly detect sentence boundaries in speech based on textual information only. We found that inter-annotator agreement for each sentence boundary identified in transcription was affected by both presence or absence of pause and pause duration. Mixed linear model showed that presence or absence of pause explain 13% of variance in boundary detection. Pause duration explained only 4% of variance in inter-annotator agreement with moderate correlation of r = 0.21. We argue that relatively small size of effect in this case may be due to the interaction of different pausing strategies typical for reading and spontaneous speech, ambiguity of sentence boundaries and individual differences in speech perception.

Keywords annotation, boundary detection, pausing, Russian, spontaneous speech
Jozsef Szakos, and Ulrike Glavitsch, “Investigating disfluency in recordings of last speakers of endangered Austronesian languages in Taiwan,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract The nearly three decades spent in Formosan language documentation produced hundreds of hours of recorded speech. In this paper, we show how the use of SpeechIndexer for transcribing and indexing the data visualises the problem of disfluency in the spontaneous narratives and dialogues. The semiautomatic alignment of speech and transcription needs to be adjusted manually each time when unpredictable pauses occur which are disfluencies, rather than markers of phrasal units. It is illustrated how the combination of SpeechIndexer’s pause finder with pitch measurements can help to pinpoint the difference of phrasal boundaries and pauses of disfluency.

Keywords Austronesian, DiSS, lesser-documented unwritten language, pause finder, SpeechIndexer
Leimin Tian, Catherine Lai, and Johanna Moore, “Recognising emotions in dialogues with disfluencies and non-verbal vocalisations,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract We investigate the usefulness of DISfluencies and Non-verbal Vocalisations (DIS-NV) for recognizing human emotions in dialogues. The proposed features measure filled pauses, fillers, stutters, laughter, and breath in utterances. The predictiveness of DIS-NV features is compared with lexical features and state-of-the-art low-level acoustic features. Our experimental results show that using DIS-NV features alone is not as predictive as using lexical or acoustic features. However, adding them to lexical or acoustic feature set yields improvement compared to using lexical or acoustic features alone. This indicates that disfluencies and non-verbal vocalisations provide useful information overlooked by the other two types of features for emotion recognition.

Keywords Dialogue, disfluency, DiSS, emotion recognition, HCI, speech processing
Leimin Tian, Johanna D. Moore, and Catherine Lai, “Emotion recognition in spontaneous and acted dialogues,” in 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), September 2015, pp. 698-704. DOI: 10.1109/ACII.2015.7344645. https://ieeexplore.ieee.org/document/7344645.

Abstract In this work, we compare emotion recognition on two types of speech: spontaneous and acted dialogues. Experiments were conducted on the AVEC2012 database of spontaneous dialogues and the IEMOCAP database of acted dialogues. We studied the performance of two types of acoustic features for emotion recognition: knowledge-inspired disfluency and nonverbal vocalisation (DIS-NV) features, and statistical Low-Level Descriptor (LLD) based features. Both Support Vector Machines (SVM) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) were built using each feature set on each emotional database. Our work aims to identify aspects of the data that constrain the effectiveness of models and features. Our results show that the performance of different types of features and models is influenced by the type of dialogue and the amount of training data. Because DIS-NVs are less frequent in acted dialogues than in spontaneous dialogues, the DIS-NV features perform better than the LLD features when recognizing emotions in spontaneous dialogues, but not in acted dialogues. The LSTM-RNN model gives better performance than the SVM model when there is enough training data, but the complex structure of a LSTM-RNN model may limit its performance when there is less training data available, and may also risk over-fitting. Additionally, we find that long distance contexts may be more useful when performing emotion recognition at the word level than at the utterance level.

Keywords Hidden Markov models; Databases; Emotion recognition; Support vector machines; Data models; Feature extraction; Context modeling
Marcus Tomalin, Mirjam Wester, Rasmus Dall, Bill Byrne, and Simon King, “A lattice-based approach to automatic filled pause insertion,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract This paper describes a novel method for automatically inserting filled pauses (e.g., UM) into fluent texts. Although filled pauses are known to serve a wide range of psychological and structural functions in conversational speech, they have not traditionally been modelled overtly by state-of-the-art speech synthesis systems. However, several recent systems have started to model disfluencies specifically, and so there is an increasing need to create disfluent speech synthesis input by automatically inserting filled pauses into otherwise fluent text. The approach presented here interpolates Ngrams and Full-Output Recurrent Neural Network Language Models (f-RNNLMs) in a lattice-rescoring framework. It is shown that the interpolated system outperforms separate Ngram and f-RNNLM systems, where performance is analysed using the Precision, Recall, and F-score metrics.

Keywords disfluency, DiSS, f-RNNLMs, filled pauses, lattices, Ngrams
Gunnel Tottie, “From pause to word: Uh and um in written language,” in ICAME 36 (WORDS, WORDS, WORDS – CORPORA AND LEXIS), May 2015, pp. 174. https://www.uni-trier.de/fileadmin/fb2/ANG/ICAME36/ICAME_36_abstracts_booklet.pdf.

Abstract (none)
Michiko Watanabe, Yosuke Kashiwagi, and Kikuo Maekawa, “The relationship between preceding clause type, subsequent clause length and duration of silent and filled pauses at clause boundaries in Japanese monologues,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract Filled pauses (FPs) are claimed to occur when speakers have some difficulties and need extra time in speech production. This study investigated whether the following two factors affect silent pause (SP) and FP durations at clause boundaries, using a spontaneous speech corpus: 1) boundary strength and 2) subsequent clause length. First, whether SP and FP durations increase with syntactic boundary strength was examined. Second, whether subsequent clause length affects SP and FP durations at the boundaries was investigated. Results show SP duration increased with boundary strength and subsequent clause length, but FP duration did not, suggesting only SP duration is affected by the two factors.

Keywords clause boundary, disfluency, DiSS, filled pause, silent pause, speech planning
Mirjam Wester, Martin Corley, and Rasmus Dall, “The temporal delay hypothesis: natural, vocoded and synthetic speech,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract Including disfluencies in synthetic speech is being explored as a way of making synthetic speech sound more natural and conversational. How to measure whether the resulting speech is actually more natural, however, is not straightforward. Conventional approaches to synthetic speech evaluation fall short as a listener is either primed to prefer stimuli with filled pauses or, when they aren’t primed they prefer more fluent speech. Psycholinguistic reaction time experiments may circumvent this issue. In this paper, we revisit one such reaction time experiment. For natural speech, delays in word onset were found to facilitate word recognition regardless of the type of delay; be they a filled pause (um), silence or a tone. We expand these experiments by examining the effect of using vocoded and synthetic speech. Our results partially replicate previous findings. For natural and vocoded speech, if the delay is a silent pause, significant increases in the speed of word recognition are found. If the delay comprises a filled pause there is a significant increase in reaction time for vocoded speech but not for natural speech. For synthetic speech, no clear effects of delay on word recognition are found. We hypothesise this is because it takes longer (requires more cognitive resources) to process synthetic speech than natural or vocoded speech.

Keywords delay hypothesis, disfluency, DiSS
Maria K. Wolters, Luis Ferrini, Elaine Farrow, Aurora Szentagotai Tatar, and Christopher D. Burton, “Tracking Depressed Mood Using Speech Pause Patterns,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0811.1-5. http://www.icphs2015.info/pdfs/Papers/ICPHS0811.pdf.

Abstract The speech of people with depression often shows clear signs of their condition (e.g., flat intonation, slow speech, long pauses), but it is not clear to what extent these signs covary with diurnal fluctuations in mood. In this paper, we report results from a pilot longitudinal study where 11 people with depression tracked various aspects of their mental health for a month. This included a daily mood tracker and regular completion of speech tasks. Speech tasks were designed to be emotionally neutral and require different levels of automaticity. We found that participants differed in their willingness to complete the speech tasks, and that preliminary analyses show no clear link between mood and prosody. We discuss implications of this study for tracking depressed mood using speech in real-life applications.

Keywords depression, emotion, pauses, prosody
Clare Wright, and Cong Zhang, “The effect of study abroad experience on L2 Mandarin disfluency in different types of tasks,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

Abstract Disfluency is a common phenomenon in L2 speech, especially in beginners’ speech. Whether studying abroad can help with reducing their disfluency or not remains debated [8]. We examined longitudinal data from 10 adult English instructed learners of Mandarin measured before and after ten months of studying abroad (SA) in this paper. We used two speaking tasks comparing pre-planned vs. unplanned spontaneous speech to compare differences over time and between tasks, using eight linguistic and temporal fluency measures (analysed using CLAN and PRAAT). Overall mean linguistic and temporal fluency scores improved significantly (p < .05), especially speech rate (p < .01), supporting the general claim that SA favours oral development, particularly fluency [2]. Further analysis revealed task differences at both times of measurement, but with greater improvement in the spontaneous task.

Keywords DiSS, Fluency, L2 Mandarin, study abroad

2014

Hans Rutger Bosker, Hugo Quené, Ted Sanders, and Nivja H. de Jong, “The Perception of Fluency in Native and Nonnative Speech,” Language Learning, vol. 64, no. 3, September 2014, pp. 579–614. DOI: 10.1111/lang.12067. https://dx.doi.org/10.1111/lang.12067.

Abstract Where native speakers supposedly are fluent by default, nonnative speakers often have to strive hard to achieve a nativelike fluency level. However, disfluencies (such as pauses, fillers, repairs, etc.) occur in both native and nonnative speech and it is as yet unclear how fluency raters weigh the fluency characteristics of native and nonnative speech. Two rating experiments compared the way raters assess the fluency of native and nonnative speech. The fluency characteristics were controlled by using phonetic manipulations in pause (Experiment 1) and speed characteristics (Experiment 2). The results show that the ratings of manipulated native and nonnative speech were affected in a similar fashion. This suggests that there is no difference in the way listeners weigh the fluency characteristics of native and nonnative speakers.
Richard Dufour, Yannick Estève, and Paul Deléglise, “Characterizing and detecting spontaneous speech: Application to speaker role recognition,” Speech Communication, vol. 56, 2014, pp. 1 - 18. DOI: 10.1016/j.specom.2013.07.007. http://www.sciencedirect.com/science/article/pii/S0167639313000976.

Abstract Processing spontaneous speech is one of the many challenges that automatic speech recognition systems have to deal with. The main characteristics of this kind of speech are disfluencies (filled pause, repetition, false start, etc.) and many studies have focused on their detection and correction. Spontaneous speech is defined in opposition to prepared speech, where utterances contain well-formed sentences close to those found in written documents. Acoustic and linguistic features made available by the use of an automatic speech recognition system are proposed to characterize and detect spontaneous speech segments from large audio databases. To better define this notion of spontaneous speech, segments of an 11-hour corpus (French Broadcast News) had been manually labeled according to three classes of spontaneity. Firstly, we present a study of these features. We then propose a two-level strategy to automatically assign a class of spontaneity to each speech segment. The proposed system reaches a 73.0% precision and a 73.5% recall on high spontaneous speech segments, and a 66.8% precision and a 69.6% recall on prepared speech segments. A quantitative study shows that the classes of spontaneity are useful information to characterize the speaker roles. This is confirmed by extending the speech spontaneity characterization approach to build an efficient automatic speaker role recognition system.

Keywords Spontaneous speech; Speaker role; Feature extraction; Speech classification; Automatic speech recognition; Role recognition
Eszter Tisljár-Szabó, and Csaba Pléh, “Ascribing emotions depending on pause length in native and foreign language speech,” Speech Communication, vol. 56, 2014, pp. 35-48. DOI: http://dx.doi.org/10.1016/j.specom.2013.07.009. http://www.sciencedirect.com/science/article/pii/S016763931300099X.

Abstract Although the relationship between emotions and speech is well documented, little is known about the role of speech pauses in emotion expression and emotion recognition. The present study investigated how speech pause length influences how listeners ascribe emotional states to the speaker. Emotionally neutral Hungarian speech samples were taken, and speech pauses were systematically manipulated to create five variants of all passages. Hungarian and Austrian participants rated the emotionality of these passages by indicating on a 1–6 point scale how angry, sad, disgusted, happy, surprised, scared, positive, and heated the speaker could have been. The data reveal that the length of silent pauses influences listeners in attributing emotional states to the speaker. Our findings argue that pauses play a relevant role in ascribing emotions and that this phenomenon might be partly independent of language.

Keywords Foreign language
Ian R. Finlayson, “Testing the roles of disfluency and rate of speech in the coordination of conversation,” Master's Thesis, Queen Margaret University, Edinburgh, Scotland, UK. 2014. http://etheses.qmu.ac.uk/1631/.

Abstract This thesis is concerned with two different accounts of how speakers coordinate conversation. In both accounts it is suggested that aspects of the manner in which speech is performed (its disfluency and its rate) are integral to the smooth performance of conversation. In the first strand, we address Clark’s (1996) suggestion that speakers design hesitations, such as filled pauses (e.g. uh and um), repetitions and prolongations, to signal to their audience that they are experiencing difficulties during language production. Such signals allow speakers to account for their use of time, particularly when they experience disruptions during production. The account is tested against three criteria, proposed by Kraljic and Brennan (2005), for evaluating whether a feature of speech is being designed: That it be produced with regularity, that it be interpretable by listeners, and that its production varies according to the speaker’s communicative intention. While existing literature offers support for the first two criteria, neither an experiment with dyads nor analyses of dialogue in the Map Task Corpus (MTC; Anderson et al., 1991) found support for the third criterion. We conclude that, rather than being signals of difficulty, hesitations are merely symptoms which listeners may exploit to aid comprehension. In the second strand, we tested Wilson and Wilson’s (2005) oscillator theory of the timing of turn-taking. This suggests that entrainment between conversational partners’ rates of speech allow them to make precise predictions about when each others’ turns are going to end, and, subsequently, when they can begin a turn of their own. As a critical test of the theory, we predicted that speakers who were more tightly entrained would produce more seamless turn-taking. 
Again using the MTC, we found no evidence of a relationship between how closely entrained speakers were and how precisely they timed the beginning of their turns relative to the ends of each others’ turns.
Craig Lambert, and Judit Kormos, “Complexity, Accuracy, and Fluency in Task-based L2 Research: Toward More Developmentally Based Measures of Second Language Acquisition,” Applied Linguistics, vol. 35, no. 5, 08/2014, pp. 607-614. DOI: 10.1093/applin/amu047. https://academic.oup.com/applij/article/35/5/607/2887860/Complexity-Accuracy-and-Fluency-in-Task-based-L2.

Abstract This article surveys how complexity, accuracy, and fluency (CAF) have been operationalized in studies of task-based L2 production, pointing out some problems with this approach and the need for more precise information about L2 development during task performance. Research into developing L1 text construction ability is then discussed and some approaches for establishing measures of the relevant constructs in L2 performance are suggested.
Charlyn M. Laserna, Yi-Tai Seih, and James W. Pennebaker, “Um . . . Who Like Says You Know: Filler Word Use as a Function of Age, Gender, and Personality,” Journal of Language and Social Psychology, vol. 33, no. 3, 2014, pp. 328-338. DOI: 10.1177/0261927X14526993. http://jls.sagepub.com/content/early/2014/03/26/0261927X14526993.abstract.

Abstract Filler words ('I mean, you know, like, uh, um') are commonly used in spoken conversation. The authors analyzed these five filler words from transcripts recorded by a device called the Electronically Activated Recorder (EAR), which sampled participants’ language use in daily conversations over several days. By examining filler words from 263 transcriptions of natural language from five separate studies, the current research sought to clarify the psychometric properties of filler words. An exploratory factor analysis extracted two factors from the five filler words: filled pauses ('uh, um') and discourse markers ('I mean, you know, like'). Overall, filled pauses were used at comparable rates across genders and ages. Discourse markers, however, were more common among women, younger participants, and more conscientious people. These findings suggest that filler word use can be considered a potential social and personality marker.

Keywords discourse marker, EAR, filler word, LIWC
Olga Vyacheslavovna Maletina, “Understanding L1-L2 Fluency Relationship Across Different Languages and Different Proficiency Levels,” Master's Thesis, Brigham Young University. 06/2014, pp. 4094. http://scholarsarchive.byu.edu/etd/4094/.

Abstract The purpose of this research was to better understand the relationship between L1 and L2 fluency, precisely, whether there is a relationship between L1 and L2 temporal fluency measures and whether this relationship differs across different languages and different proficiency levels. In order to answer these questions, L1 and L2 speech samples of the same speakers were collected and analyzed. Twenty-five native speakers and 45 non-native speakers of Japanese, Mandarin Chinese, Portuguese, Spanish, and Russian were asked to respond to questions and perform picture descriptions in their L1 and L2. The recorded speech samples were then analyzed by means of a Praat script in order to identify mean length of run (MLR), speech rate, and number of pauses. Several different statistical analyses were then performed to compare these L1 and L2 temporal features across different languages and different proficiency levels. The results of this study indicate that there is a strong relationship between L1 and L2 fluency and that this relationship may play a role in L2 production. Furthermore, it was found that native languages differ in their patterns of L1 temporal fluency production and that these differences may affect the production of L2 temporal fluency. It was also found that L1-L2 fluency relationship did not differ at different proficiency levels suggesting that individual factors may play a role in L2 fluency production. Thus, it was found that an Intermediate speaker of Spanish, for instance, did not speak faster than an Intermediate speaker of Russian, suggesting that naturally slower speakers in their L1 will still speak slower in their L2. These results indicate that fluency is as much of a trait as it is a state. However, it was also found that not all of the L1-L2 language combinations demonstrated the same results, indicating that the L1-L2 fluency relationship is affected by the L2. 
These findings have different implications for both L2 teaching and learning, as well as L2 assessment of fluency and overall language proficiency.

Keywords acquisition, Fluency, proficiency, second-language
Helena Moniz, Fernando Batista, Ana Isabel Mata, and Isabel Trancoso, “Speaking style effects in the production of disfluencies,” Speech Communication, vol. 65, 2014, pp. 20-35. DOI: 10.1016/j.specom.2014.05.004. http://www.sciencedirect.com/science/article/pii/S0167639314000430.

Abstract This work explores speaking style effects in the production of disfluencies. University lectures and map-task dialogues are analyzed in order to evaluate if the prosodic strategies used when uttering disfluencies vary across speaking styles. Our results show that the distribution of disfluency types is not arbitrary across lectures and dialogues. Moreover, although there is a statistically significant cross-style strategy of prosodic contrast marking (pitch and energy increases) between the region to repair and the repair of fluency, this strategy is displayed differently depending on the specific speech task. The overall patterns observed in the lectures, with regularities ascribed for speaker and disfluency types, do not hold with the same strength for the dialogues, due to underlying specificities of the communicative purposes. The tempo patterns found for both speech tasks also confirm their distinct behaviour, evidencing the more dynamic tempo characteristics of dialogues. In university lectures, prosodic cues are given to the listener both for the units inside disfluent regions and between these and the adjacent contexts. This suggests a stronger prosodic contrast marking of disfluency–fluency repair when compared to dialogues, as if teachers were monitoring the different regions – the introduction to a disfluency, the disfluency itself and the beginning of the repair – demarcating them in very contrastive ways.

Keywords Prosody, Disfluencies, Lectures, Dialogues, Speaking styles
Naoki Mukawa, Hiroki Sasaki, and Atsushi Kimura, “How do verbal/bodily fillers ease embarrassing situations during silences in conversations?,” in The 23rd IEEE International Symposium on Robot and Human Interactive Communication, 2014, pp. 30-35. DOI: 10.1109/ROMAN.2014.6926226.

Abstract In this study we analyzed the roles of verbal/bodily fillers for recovering from awkward silences in conversations. We focused on verbal fillers such as “ummm” and “uh,” and bodily fillers like “touching own hair or chin” that commonly emerge during silences between turns in conversations. We designed and created simulated dyadic-conversation scenarios using computer graphics characters, and then performed evaluations utilizing stimuli drawn from these simulations. Subjective evaluation results suggested that fillers express participants' sincerity in maintaining conversations and they can be used as clues for other participants to begin their utterances. These findings have practical implications for the behavioral design of conversational robots that can behave more appropriately and politely with humans.

Keywords Educational robots; Estimation; Pragmatics; Computer graphics; Hair; Maintenance engineering; human-robot interaction; verbal-bodily fillers; embarrassing situations; conversation silences; awkward silences; bodily fillers; simulated dyadic-conversation scenarios; computer graphics characters; behavioral design; conversational robots; humans
Mary Grantham O’Brien, “L2 Learners’ Assessments of Accentedness, Fluency, and Comprehensibility of Native and Nonnative German Speech,” Language Learning, vol. 64, no. 4, 12/2014, pp. 715-748. DOI: 10.1111/lang.12082. http://onlinelibrary.wiley.com/doi/10.1111/lang.12082/full.

Abstract In early stages of classroom language learning, many adult second language (L2) learners communicate primarily with one another, yet we know little about which speech stream characteristics learners tune into or the extent to which they understand this lingua franca communication. In the current study, 25 native English speakers learning German as a L2 with varying levels of German proficiency rated German speech produced by native speakers and fellow learners of German along three continua: accentedness, fluency, and comprehensibility. An examination of speech stream (i.e., phonological, fluency based, and lexical/grammatical) characteristics along with partial correlations indicates both that the raters distinguished among the three concepts but that they conflated the term fluency with proficiency. Self‐reported proficiency in German and linguistic training were the best predictors of the ratings assigned.

Keywords accentedness, Comprehensibility, Fluency, German, L2 raters, L2 speech
Vikram Ramanarayanan, Adam Lammert, Louis Goldstein, and Shrikanth Narayanan, “Are Articulatory Settings Mechanically Advantageous for Speech Motor Control?,” PLoS ONE, vol. 9, no. 8, 08/2014, pp. e104168. DOI: 10.1371/journal.pone.0104168. http://dx.doi.org/10.1371%2Fjournal.pone.0104168.

Abstract We address the hypothesis that postures adopted during grammatical pauses in speech production are more “mechanically advantageous” than absolute rest positions for facilitating efficient postural motor control of vocal tract articulators. We quantify vocal tract posture corresponding to inter-speech pauses, absolute rest intervals as well as vowel and consonant intervals using automated analysis of video captured with real-time magnetic resonance imaging during production of read and spontaneous speech by 5 healthy speakers of American English. We then use locally-weighted linear regression to estimate the articulatory forward map from low-level articulator variables to high-level task/goal variables for these postures. We quantify the overall magnitude of the first derivative of the forward map as a measure of mechanical advantage. We find that postures assumed during grammatical pauses in speech as well as speech-ready postures are significantly more mechanically advantageous than postures assumed during absolute rest. Further, these postures represent empirical extremes of mechanical advantage, between which lie the postures assumed during various vowels and consonants. Relative mechanical advantage of different postures might be an important physical constraint influencing planning and control of speech production.
Jessamyn Schertz, and Mirjam Ernestus, “Variability in the pronunciation of non-native English the: Effects of frequency and disfluencies,” Corpus Linguistics and Linguistic Theory, vol. 10, no. 2, October 2014, pp. 329-345. DOI: 10.1515/cllt-2014-0024. https://www.degruyter.com/view/journals/cllt/10/2/article-p329.xml.

Abstract This study examines how lexical frequency and planning problems can predict phonetic variability in the function word ‘the’ in conversational speech produced by non-native speakers of English. We examined 3180 tokens of ‘the’ drawn from English conversations between native speakers of Czech or Norwegian. Using regression models, we investigated the effect of following word frequency and disfluencies on three phonetic parameters: vowel duration, vowel quality, and consonant quality. Overall, the non-native speakers showed variation that is very similar to the variation displayed by native speakers of English. Like native speakers, Czech speakers showed an effect of frequency on vowel durations, which were shorter in more frequent word sequences. Both groups of speakers showed an effect of frequency on consonant quality: the substitution of another consonant for /ð/ occurred more often in the context of more frequent words. The speakers in this study also showed a native-like allophonic distinction in vowel quality, in which /ði/ occurs more often before vowels and /ðə/ before consonants. Vowel durations were longer in the presence of following disfluencies, again mirroring patterns in native speakers, and the consonant quality was more likely to be the target /ð/ before disfluencies, as opposed to a different consonant. The fact that non-native speakers show native-like sensitivity to lexical frequency and disfluencies suggests that these effects are consequences of a general, non-language-specific production mechanism governing language planning. On the other hand, the non-native speakers in this study did not show native-like patterns of vowel quality in the presence of disfluencies, suggesting that the pattern attested in native speakers of English may result from language-specific processes separate from the general production mechanisms.

Keywords pronunciation variation; non-native speech; phonetics; lexical probability; disfluencies
Scott H. Fraundorf, and Duane G. Watson, “Alice’s adventures in um-derland: psycholinguistic sources of variation in disfluency production,” Language, Cognition and Neuroscience, vol. 29, no. 9, 2014, pp. 1083-1096. DOI: 10.1080/01690965.2013.832785. http://dx.doi.org/10.1080/01690965.2013.832785.

Abstract This study tests the hypothesis that three common types of disfluency (fillers, silent pauses and repeated words) reflect variance in what strategies are available to the production system for responding to difficulty in language production. Participants’ speech in a storytelling paradigm was coded for the three disfluency types. Repeats occurred most often when difficult material was already being produced and could be repeated, but fillers and silent pauses occurred most when difficult material was still being planned. Fillers were associated only with conceptual difficulties, consistent with the proposal that they reflect a communicative signal, whereas silent pauses and repeats were also related to lexical and phonological difficulties. These differences are discussed in terms of different strategies available to the language production system.

Keywords discourse, Disfluency, Language production
Gunnel Tottie, “On the use of uh and um in American English,” Functions of Language, vol. 21, no. 1, 2014, pp. 6-29. DOI: http://dx.doi.org/10.1075/fol.21.1.02tot. http://www.jbe-platform.com/content/journals/10.1075/fol.21.1.02tot.

Abstract This study examines the use of uh and um — referred to jointly as UHM — in 14 conversations totaling c. 62,350 words from the Santa Barbara Corpus of Spoken American English. UHM was much less frequent than in British English with 7.5 vs. 14.5 instances per million words in the British National Corpus. However, as in British English the frequency of UHM was closely correlated to extra-linguistic context. Conversations in non-private environments (such as offices and classrooms) had higher frequencies than those taking place in private spaces, mostly homes. Time required for planning, especially when difficult subjects were discussed, appeared to be an important explanatory factor. It is clear that UHM cannot be dismissed as mere hesitation or disfluency; it functions as a pragmatic marker on a par with well, you know, and I mean, sharing some of the functions of these in discourse. Although the role of sociolinguistic factors was less clear, the tendencies for older speakers and educated speakers to use UHM more frequently than younger and less educated ones paralleled British usage, but contrary to British usage, there were no gender differences.

2013

Julie Beliao, and Anne Lacheret, “Disfluency and discursive markers: when prosody and syntax plan discourse,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 5-8. http://www.isca-speech.org/archive/diss_2013/papers/dis6_005.pdf.

Abstract Hesitations, interruptions within phrases or within words are common in spontaneous speech. Those phenomena are widely known to be observable from a prosodic point of view through disfluencies. From a syntactic point of view, many studies already established that discursive markers such as hm, oh, I mean, etc. are representative of spontaneous speech. In this study, we demonstrate through a joint corpus-based analysis that these prosodical and syntactical features are correlated, without however being equivalent. More precisely, the lack of either disfluencies or discursive markers is consistently shown to be representative of a planned discourse.

Keywords discursive marker, disfluency, DiSS, genres
Malte Belz, and Myriam Klapi, “Pauses following fillers in L1 and L2 German map task dialogues,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 9-12. http://www.isca-speech.org/archive/diss_2013/papers/dis6_009.pdf.

Abstract Fillers and pauses in spoken language indicate hesitations. Filler type (uh vs. um) is believed to signal a minor or major following speech delay in L1. We examined whether advanced speakers of L2 German use pauses following filler type (äh vs. ähm) in the same way as native speakers do. Two Map Task corpora of L1 and L2 were contrasted with respect to speaker role, filler type and the exact time interval of fillers and pauses. Speaker role influenced the disfluency patterns in L1 and L2 in the same way. Filler type had no impact on the length of the following pause, but the time interval patterns differed significantly. Longer filler intervals are followed by longer pauses in L2 and by shorter pauses in L1. These results suggest that filler type in German is not used to indicate the length of the following delay. Advanced learners seem to have adopted this pattern of use, but cannot overcome their hesitations as fast as native speakers, probably due to their less automatised speech production.

Keywords contrastive analysis, disfluencies, DiSS, fillers, German, L1, L2, map task, pauses, spontaneous speech
Sara Candeias, Dirce Celorico, Jorge Proença, Arlindo Veiga, and Fernando Perdigão, “HESITA(tions) in Portuguese: a database,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 13-16. http://www.isca-speech.org/archive/diss_2013/papers/dis6_013.pdf.

Abstract With this paper we present a European Portuguese database of hesitations in speech. Under the name of HESITA, this database contains annotations of hesitation events, such as filled pauses, vocalic extensions, truncated words, repetitions and substitutions. The hesitations were found over 30 daily news programs collected from podcasts of a Portuguese television channel. The database also includes speaking style classification as well as acoustical information and other speech events. Statistic analysis of the hesitation events in terms of their occurrence is presented. Insights into the process of human speech communication can be extracted from this database, which encloses relevant information about how Portuguese speakers hesitate. The HESITA database is freely available online to the research community.

Keywords annotation, disfluency, DiSS, hesitation corpus, hesitations, prepared speech, spontaneous speech
Rebecca Carroll, and Esther Ruigendijk, “The Effects of Syntactic Complexity on Processing Sentences in Noise,” Journal of Psycholinguistic Research, vol. 42, no. 2, 2013, pp. 139-159. DOI: 10.1007/s10936-012-9213-7. http://dx.doi.org/10.1007/s10936-012-9213-7.

Abstract This paper discusses the influence of stationary (non-fluctuating) noise on processing and understanding of sentences, which vary in their syntactic complexity (with the factors canonicity, embedding, ambiguity). It presents data from two RT-studies with 44 participants testing processing of German sentences in silence and in noise. Results show a stronger impact of noise on the processing of structurally difficult than on syntactically simpler parts of the sentence. This may be explained by a combination of decreased acoustical information and an increased strain on cognitive resources, such as working memory or attention, which is caused by noise. The noise effect for embedded sentences is less than for non-embedded sentences, which may be explained by a benefit from prosodic information.
Nivja H. de Jong, and Hans Rutger Bosker, “Choosing a threshold for silent pauses to measure second language fluency,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 17-20. http://www.isca-speech.org/archive/diss_2013/papers/dis6_017.pdf.

Abstract Second language (L2) research often involves analyses of acoustic measures of fluency. The studies investigating fluency, however, have been difficult to compare because the measures of fluency that were used differed widely. One of the differences between studies concerns the lower cut-off point for silent pauses, which has been set anywhere between 100 ms and 1000 ms. The goal of this paper is to find an optimal cut-off point. We calculate acoustic measures of fluency using different pause thresholds and then relate these measures to a measure of L2 proficiency and to ratings on fluency.

Keywords DiSS, duration of pauses, number of pauses, second language speech, silent pause threshold, silent pauses
Nivja H. de Jong, Margarita P. Steinel, Arjen Florijn, Rob Schoonen, and Jan H. Hulstijn, “Linguistic skills and speaking fluency in a second language,” Applied Psycholinguistics, vol. 34, no. 5, 09/2013, pp. 893-916. DOI: 10.1017/S0142716412000069. http://journals.cambridge.org/article_S0142716412000069.

Abstract This study investigated how individual differences in linguistic knowledge and processing skills relate to individual differences in speaking fluency. Speakers of Dutch as a second language (N = 179) performed eight speaking tasks, from which several measures of fluency were derived such as measures for pausing, repairing, and speed (mean syllable duration). In addition, participants performed separate tasks, designed to gauge individuals’ second language linguistic knowledge and linguistic processing speed. The results showed that the linguistic skills were most strongly related to average syllable duration, of which 50% of individual variance was explained; in contrast, average pausing duration was only weakly related to linguistic knowledge and processing skills.
Laura E. de Ruiter, “Self-repairs in German children’s peer interaction - initial explorations,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 29-32. http://www.isca-speech.org/archive/diss_2013/papers/dis6_029.pdf.

Abstract Forty-nine self-repairs were extracted from a corpus of conversational speech of ten German children (mean age 5;1) with peers. The repairs were analysed using Levelt’s [1] classification and compared with his adult data. Children produced fewer appropriateness repairs than adults, but more covert repairs and more phonetic repairs. Like adults, children had a preference to interrupt themselves within-word only for error repairs. Unlike adults, children did not produce editing terms following interruptions.

Keywords DiSS
Andrea Deme, and Alexandra Markó, “Lengthenings and filled pauses in Hungarian adults’ and children’s speech,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 21-24. http://www.isca-speech.org/archive/diss_2013/papers/dis6_021.pdf.

Abstract In the present paper vowel lengthenings and non-lexicalized filled pauses were studied in the spontaneous speech of children and adults (focusing more on the much less studied phenomenon: vowel lengthening). The results revealed different usage and appearance of lengthenings in the two age groups, therefore, differences in speech skills and strategies can be concluded. LEs and FPs differ mostly in their position in the speech session between the age groups, which has implications regarding different planning strategies of adults and children. We also draw conclusions regarding the methodological considerations in the issue of identifying vowel lengthening supporting a previously formulated conception.

Keywords (non-lexicalized) filled pause, discourse management, DiSS, lengthening, speech planning, spontaneous speech
Yasuharu Den, and Natsuko Nakagawa, “Anti-zero pronominalization: when Japanese speakers overtly express omissible topic phrases,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 25-28. http://www.isca-speech.org/archive/diss_2013/papers/dis6_025.pdf.

Abstract In this paper, we focus on cases where Japanese speakers overtly express a topic phrase that could have been omitted. We call this phenomenon anti-zero-pronominalization and hypothesize that this helps speakers gain time for planning a following utterance; anti-zero-pronominalization is another option to deal with cognitive load at the beginning of an utterance in addition to fillers and other speech disfluencies. Based on a quantitative analysis of a corpus of spontaneous Japanese dialogs, we investigate the difference between overt topic NPs and zero-pronouns. We show that i) the utterance is more complex when the topic is expressed as an overt NP than when it is expressed as a zero-pronoun; ii) turn-initial items such as fillers are produced less frequently when overt NPs appear than when zero-pronouns appear; and iii) the utterance becomes more complex when the last mora of the topic is more prolonged.

Keywords cognitive load, DiSS, Japanese dialogs, topic phrases, zero-pronouns
Luis J. García-López, M. Belén Díez-Bedmar, and José M. Almansa-Moreno, “From Being a Trainee to Being a Trainer: Helping Peers Improve their Public Speaking Skills,” Revista de Psicodidáctica, vol. 18, no. 2, 2013, pp. 331-342. DOI: 10.1387/RevPsicodidact.6419. http://www.redalyc.org/articulo.oa?id=17527003006.

Abstract Although public speaking anxiety is present at all educational stages, the university period is critical since the students’ lack of oral communication skills may prevent them from accomplishing their educational goals. To improve this situation, a two-fold objective was pursued in this study. First, to examine the effects of a 3-hour public speaking training workshop for Psychology undergraduates. Second, to test if these students could effectively train other undergraduates to use public speaking skills and reduce their anxiety by using a collaborative methodology and peer tutoring. The findings prove that the training of Psychology students resulted in their peers’ improvement of their oral communication skills and reduction of their speech anxiety. Both groups of students benefited from the study: Psychology students had the opportunity to improve their communication skills and gained practical experience, and the other undergraduates received a free, personalized and successful workshop which improved their communication skills and reduced their anxiety levels.

Keywords collaborative methodology, Communication skills, peers, public speaking
Jonathan Ginzburg, Raquel Fernández, and David Schlangen, “Self-addressed questions in disfluencies,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 33-36. http://www.isca-speech.org/archive/diss_2013/papers/dis6_033.pdf.

Abstract The paper considers self-addressed queries – queries speakers address to themselves in the aftermath of a filled pause. We study their distribution in the BNC and show that such queries show signs of sensitivity to the syntactic/semantic type of the sub-utterance they follow. We offer a formal model that explains the coherence of such queries.

Keywords DiSS
Sandra Götz, Fluency in Native and Nonnative English Speech. Amsterdam, Netherlands: John Benjamins Publishing Company. 2013, pp. 238. DOI: 10.1075/scl.53. https://benjamins.com/#catalog/books/scl.53/main.

Abstract This book takes a new and holistic approach to fluency in English speech and differentiates between productive, perceptive, and nonverbal fluency. The in-depth corpus-based description of productive fluency points out major differences of how fluency is established in native and nonnative speech. It also reveals areas in which even highly advanced learners of English still deviate strongly from the native target norm and in which they have already approximated to it. Based on these findings, selected learners are subjected to native speakers’ ratings of seven perceptive fluency variables in order to test which variables are most responsible for a perception of oral proficiency on the sides of the listeners. Finally, language-pedagogical implications derived from these findings for the improvement of fluency in learner language are presented. This book is conceptually and methodologically relevant for corpus-linguistics, learner corpus research and foreign language teaching and learning.
Ivan Hernandez, and Jesse Lee Preston, “Disfluency disrupts the confirmation bias,” Journal of Experimental Social Psychology, vol. 49, no. 1, 01/2013, pp. 178-182. DOI: http://dx.doi.org/10.1016/j.jesp.2012.08.010. http://www.sciencedirect.com/science/article/pii/S002210311200176X.

Abstract One difficulty in persuasion is overcoming the confirmation bias, where people selectively seek evidence that is consistent with their prior beliefs and expectations. This biased search for information allows people to analyze new information in an efficient, but shallow way. The present research discusses how experienced difficulty in processing (disfluency) can reduce the confirmation bias by promoting careful, analytic processing. In two studies, participants with prior attitudes on an issue became less extreme after reading an argument on the issues in a disfluent format. The change occurred for both naturally occurring attitudes (i.e. political ideology) and experimentally assigned attitudes (i.e. positivity toward a court defendant). Importantly, disfluency did not reduce confirmation biases when participants were under cognitive load, suggesting that cognitive resources are necessary to overcome these biases. Overall, these results suggest that changing the style of an argument’s presentation can lead to attitude change by promoting more comprehensive consideration of opposing views.

Keywords Attitude change, Confirmation bias, Fluency, Persuasion
Martina Jakesch, Helmut Leder, and Michael Forster, “Image Ambiguity and Fluency,” PLoS ONE, vol. 8, no. 9, September 2013, pp. e74084. DOI: 10.1371/journal.pone.0074084. http://dx.doi.org/10.1371%2Fjournal.pone.0074084.

Abstract Ambiguity is often associated with negative affective responses, and enjoying ambiguity seems restricted to only a few situations, such as experiencing art. Nevertheless, theories of judgment formation, especially the “processing fluency account”, suggest that easy-to-process (non-ambiguous) stimuli are processed faster and are therefore preferred to (ambiguous) stimuli, which are hard to process. In a series of six experiments, we investigated these contrasting approaches by manipulating fluency (presentation duration: 10ms, 50ms, 100ms, 500ms, 1000ms) and testing effects of ambiguity (ambiguous versus non-ambiguous pictures of paintings) on classification performance (Part A; speed and accuracy) and aesthetic appreciation (Part B; liking and interest). As indicated by signal detection analyses, classification accuracy increased with presentation duration (Exp. 1a), but we found no effects of ambiguity on classification speed (Exp. 1b). Fifty percent of the participants were able to successfully classify ambiguous content at a presentation duration of 100ms, and at 500ms even 75% performed above chance level. Ambiguous artworks were found more interesting (in conditions 50ms to 1000ms) and were preferred over non-ambiguous stimuli at 500ms and 1000ms (Exp. 2a - 2c, 3). Importantly, ambiguous images were nonetheless rated significantly harder to process than non-ambiguous images. These results suggest that ambiguity is an essential ingredient in art appreciation even though or maybe because it is harder to process.
Frank Jansen, and Daniel Janssen, “Uw reservering is eh komen te vervallen - Experimenteel onderzoek naar het effect van gevulde pauzes in voicemails met slecht nieuws,” Tijdschrift voor Taalbeheersing, vol. 35, no. 3, December 2013, pp. 237-253. DOI: 10.5117/TVT2013.3.JANS. https://www.ingentaconnect.com/content/aup/tt/2013/00000035/00000003/art00003.

Abstract This article presents the results of three experiments in which the influence of the pause eh in bad news voicemails is studied on the hearer evaluation. Based on the politeness theory of Brown & Levinson (1987) we expect that eh will facilitate the hearer’s acceptance of the bad news. The addition of eh turns out to have a positive effect on the attributed relational qualities of the speaker of the voice mail. On the other hand, his attributed communicative professionalism is rated lower. One of the two potential explanations for these results is that eh causes some delay in the presentation of the bad news itself, thereby triggering the hearer’s suspicion that really very bad news is forthcoming. Against this expectation the eventual bad news is not that bad. The experimental evidence does not support this hypothesis. Therefore the alternative hypothesis, eh signals the speaker’s difficulty to communicate the message, which in turn makes him more empathic, becomes highly probable.

Keywords communicative professionalism, empathy, filled pause, hearer’s evaluation, politeness
Tyler Kendall, Speech Rate, Pause and Sociolinguistic Variation. Basingstoke: Palgrave Macmillan, 2013. DOI: 10.1057/9781137291448.0001. http://www.palgrave.com/page/detail/speech-rate-pause-and-sociolinguistic-variation-tyler-kendall/?isb=9780230249776.

Abstract Speech Rate, Pause, and Sociolinguistic Variation examines the confluence of psycholinguistic factors and social factors in linguistic variation through corpus-based analyses of speech rate and silent pause in US English. In particular, based on a large amount of data extracted from a wide range of sociolinguistic interview recordings, it demonstrates the great extent to which articulation rates are correlated with social factors of speakers (such as regional origin and sex) while pause durations are less so. Through the development of new quantitative techniques, it considers the cognitive importance of variability in pauses and highlights new ways that speech features like these can be used to help understand the production of sociolinguistic variables. With detailed discussions of its data and methods, and with a helpful accompanying website, it makes a valuable guide for conducting one’s own corpus (socio)phonetic research.
Hanae Koiso, and Yasuharu Den, “Acoustic and linguistic features related to speech planning appearing at weak clause boundaries in Japanese monologs,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 37-40. http://www.isca-speech.org/archive/diss_2013/papers/dis6_037.pdf.

Abstract In this paper, we focus on weak clause boundaries in Japanese monologs in order to investigate the relationship of the length of constituents following weak boundaries to three acoustic and linguistic features: 1) occurrence rate of fillers, 2) occurrence rate of boundary pitch movements, and 3) degree of lengthening of clause-final morae. We found that all these features were significantly correlated with the length of following constituents. Most importantly, boundary pitch movements had an additional effect that can be distinct from the effect of clause-final lengthening. These results suggest that Japanese speakers have earlier-occurring items that help them deal with cognitive load in speech planning, in addition to fillers and other clause-initial disfluencies.

Keywords boundary pitch movements, clause-final lengthening, DiSS, fillers, Japanese monologs
Kikuo Maekawa, “Prediction of F0 height of filled pauses in spontaneous Japanese: a preliminary report,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 41-44. http://www.isca-speech.org/archive/diss_2013/papers/dis6_041.pdf.

Abstract F0 values of filled pauses (FP) in the Corpus of Spontaneous Japanese were analyzed to examine the mechanism by which the F0 heights of FP were determined. Statistical analyses of the F0 values of FP occurring in between two full-fledged accentual phrases (AP) revealed correspondence between the occurrence timing of FP and the F0 height. Based upon this finding, 5 models of F0 prediction were proposed. Comparison of the mean prediction errors revealed that the best prediction was obtained in a model that linearly interpolates the phrase-final L% tone of the immediately preceding AP and the phrase-initial %L tone of the immediately following AP. This finding suggests that the F0 of FP was specified at the level of phonetic realization rather than phonological prosodic representation.

Keywords DiSS
Takehiko Maruyama, “Analysis of parenthetical clauses in spontaneous Japanese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 45-48. http://www.isca-speech.org/archive/diss_2013/papers/dis6_045.pdf.

Abstract In this paper, I will discuss the functional aspects of parenthetical clauses and sentences in spontaneous Japanese monologues. Parentheticals can be defined as syntactic elements that are instantly inserted in the middle of an ongoing utterance to add supplemental information and thus interrupt the fluent flow of speech production. Examples of parenthetical clauses/sentences that appeared in the Corpus of Spontaneous Japanese were examined and then classified into three types. These types differ in their contextual functions, but share a commonality in that they present multiplex information simultaneously in the process of producing spontaneous speech.

Keywords contextual functions, Corpus of Spontaneous Japanese, DiSS, parenthetical clause/sentence
Helena Moniz, Fernando Batista, Isabel Trancoso, and Ana Isabel Mata, “Automatic structural metadata identification based on multilayer prosodic information,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 49-52. http://www.isca-speech.org/archive/diss_2013/papers/dis6_049.pdf.

Abstract This paper discriminates different types of structural metadata in transcripts of university lectures: boundary events (comma, full stops and interrogatives), and disfluencies (repair). The disambiguation process is based on predefined multilayered linguistic information and on its hierarchical structure. Since boundary events may share similar linguistic properties, in terms of F0 and energy slopes, presence/absence of silent pauses, and duration of different units of analysis, different classification methods based on a set of automatically derived prosodic features have been applied to differentiate between those events and disfluencies. This paper also performs a detailed analysis on the impact of each individual feature in discriminating each structural event. The results of our data-driven approach allow us to reach a structured set of basic features towards the disambiguation of metadata events. These results are a step forward towards the analysis of speech acts and their disambiguation from disfluencies.

Keywords automatic speech processing, disfluencies, DiSS, speech prosody, structural metadata
Rena Nemoto, “Which kind of hesitations can be found in Estonian spontaneous speech?,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 53-54. http://www.isca-speech.org/archive/diss_2013/papers/dis6_053.pdf.

Abstract This paper describes the acoustic characteristics of hesitations in Estonian spontaneous speech. We especially investigate duration, fundamental frequency, and first two formant analyses. Most frequent hesitations can be expressed by lengthened phonemes such as /ää/, /ee/, /õõ/, and /mm/. We compare lengthened phoneme hesitations with their related phonemes. The results from our preliminary hesitation study show (i) hesitations have longer duration and its range is spread; (ii) hesitations globally include lower pitch; (iii) hesitation formants are likely to be centralized or posterior and opened in comparison with related phonemes.

Keywords DiSS, Estonian, hesitation, spontaneous speech
Sieb Nooteboom, and Hugo Quené, “Self-monitoring as reflected in identification of misspoken segments,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 55-57. http://www.isca-speech.org/archive/diss_2013/papers/dis6_055.pdf.

Abstract Most segmental speech errors probably are articulatory blends of competing segments. Perceptual consequences were studied in listeners’ reactions to misspoken segments. 291 speech fragments containing misspoken initial consonants plus 291 correct control fragments, all stemming from earlier SLIP experiments, were presented for identification to listeners. Results show that misidentifications (i.e. deviations from an earlier auditory transcription) are rare (3%), but reaction times to correctly identified fragments systematically reflect differences between correct controls, undetected, early detected and late detected speech errors, leading to the following speculative conclusions: (1) segmental errors begin their life in inner speech as full substitutions, and competition with correct target segments often is slightly delayed; (2) in early interruptions speech is initiated before competing target segments are activated, but then rapidly interrupted after error detection; (3) late detected errors reflect conflict-based monitoring of articulation or monitoring overt speech.

Keywords DiSS
Klim Peshkov, Laurent Prévot, Stéphane Rauzy, and Berthille Pallaud, “Categorizing syntactic chunks for marking disfluent speech in French language,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 59-62. http://www.isca-speech.org/archive/diss_2013/papers/dis6_059.pdf.

Abstract Disfluency is the first phenomenon one has to address when processing spontaneous speech. Efficient systems combining transcription-based and signal-based cues have been created for English. These systems generally use supervised machine learning models, trained over large annotated datasets combining signal and transcription. As for other languages, including French, the situation is complicated by the lack of resources. A few proposals based on filled pauses, truncated words and repetitions have been made for identifying disfluencies in French. In this paper, we propose a transcription-based approach to this task, with high-quality morpho-syntactic tags as input for identifying disfluent areas. Originally, we adopted a transcription-based approach for obtaining an independent way of characterizing disfluencies. This can be later compared and combined with prosodic cues. Our method consists in building syntactic chunks from our tagging and then classifying these chunks into several categories, some of them being considered as disfluent. We apply our method to speaker style characterization, discourse genres zoning, as well as to dataset cleaning. Finally, an attempt is made to relate our disfluent chunks to a more standard description of disfluencies in order to open the way of a deeper integration of our work with the one of the disfluency community.

Keywords chunking, disfluencies, DiSS, speaking style, tagging, transcription-based approach
Jorge Proença, Dirce Celorico, Arlindo Veiga, Sara Candeias, and Fernando Perdigão, “Acoustical characterization of vocalic fillers in European Portuguese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 63-66. http://www.isca-speech.org/archive/diss_2013/papers/dis6_063.pdf.

Abstract This study attempts to acoustically characterize the most common filled pause vocalizations (or vocalic fillers) in spontaneous speech in European Portuguese: the near-open central vowel [ɐ] and the mid-central vowel [ə]. For this purpose we analyzed the spectral information of the vocalic fillers by estimating their first two formant frequencies as well as their duration properties. The vocalic fillers are taken from a large corpus of European Portuguese broadcast news’ speech. We also compared the vocalic fillers with lexical vowels possessing similar timbre. No formant variation trend was attained for the vocalic fillers and a great overlap of formant values is observed. These results provide a base of information for understanding the most common vocalic fillers in European Portuguese spontaneous speech.

Keywords DiSS, filled pauses, formant estimation, hesitations, spontaneous speech, vocalic fillers
Ralph L. Rose, “Crosslinguistic Corpus of Hesitation Phenomena: A Corpus for Investigating First and Second Language Speech Performance,” in INTERSPEECH 2013, Lyon, France, August 2013, pp. 992-996. http://www.isca-speech.org/archive/interspeech_2013/i13_0992.html.

Abstract There is a growing consensus that there is a need to evaluate second language speech performance with respect to first language speech behavior. To support this need, the Crosslinguistic Corpus of Hesitation Phenomena was developed. This freely available corpus is designed to investigate the crosslinguistic influence of speech patterns and consists of recordings of speakers producing first and second language speech samples in response to parallel elicitation tasks in each language. Preliminary results from the corpus are consistent with other findings that second language performance is sometimes correlated with first language speech behavior. In particular, findings show that silent pause rate and duration as well as other hesitation phenomena correlate with first language performance while speech rate does not. Interestingly, repeats also differ from first language production. Results show that the corpus may be a useful tool for researchers who wish to investigate the correspondence between first and second language speech, particularly with respect to the use of hesitation phenomena.

Keywords corpus, hesitation phenomena, second language speech
Vered Silber-Varod, and Takehiko Maruyama, “The linguistic role of hesitation disfluencies: evidence from Hebrew and Japanese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 67-70. http://www.isca-speech.org/archive/diss_2013/papers/dis6_067.pdf.

Abstract In this paper we examine a certain aspect of prosody-syntax interface, that of hesitation disfluencies (HD) that occur intra-phrases or intra-morphemes. Such cases were found in two spontaneous corpora of two syntactically distinct languages – Israeli Hebrew (IH) and Japanese. It was found that intra-phrasal hesitations in the two languages call for different explanations, since in Japanese the noun (e.g., in NP) precedes the case marking particle while in IH the preposition (e.g., in PP) precedes the noun. In this paper we will present qualitative findings and suggest a unified view of the phenomenon of intra-phrasal HDs.

Keywords DiSS, hesitation disfluency, Israeli Hebrew, Japanese, prosody-syntax interface
Michiko Watanabe, “Phrasal complexity and the occurrence of filled pauses in presentation speeches in Japanese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 71-72. http://www.isca-speech.org/archive/diss_2013/papers/dis6_071.pdf.

Abstract Filled pauses are ubiquitous in everyday speech. I investigated whether linguistic complexity of upcoming phrases affects filler rate at phrase boundaries in presentation speeches in Japanese. Filler rate at phrase boundaries increased monotonically with complexity of the following phrases. However, when the following phrase was composed of more than 11 Bunsetsu-phrases, the filler rate did not show any constant increase. The results indicate that filler rate at phrase boundaries is closely related to cognitive load of local linguistic encoding and that the maximum planning span for linguistic encoding is about 10 Bunsetsu-phrases in Japanese monologues.

Keywords bunsetsu-phrase, DiSS, filled pause, linguistic complexity, planning load
Charlotte Wollermann, Eva Lasarcyk, Ulrich Schade, and Bernhard Schröder, “Disfluencies and uncertainty perception - evidence from a human-machine scenario,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 73-76. http://www.isca-speech.org/archive/diss_2013/papers/dis6_073.pdf.

Abstract This paper deals with the modelling and perception of disfluencies in articulatory speech synthesis. The stimuli are embedded into short dialogues in question-answering situations in a human–machine scenario. The system is supposed to express uncertainty in the answer. We test the influence of delay, intonation, and filler as prosodic indicators of uncertainty on perception in two studies. Study 1 deals with the effect of delay and filler on uncertainty perception. Results suggest an additive effect of the cues, i.e. the activation of both prosodic cues of uncertainty has a stronger impact on uncertainty perception than the deactivation of a single cue or of both cues. With respect to the effect of single cues, no significant difference can be observed. Study 2 investigates the impact of delay and intonation on perceived uncertainty. Again, a principle of additivity can be observed. Furthermore as modelled here, intonation has a stronger influence than delay. In both studies no correlation between the ranking of uncertainty and naturalness of the stimuli is found.

Keywords disfluencies, DiSS, speech perception, speech synthesis, uncertainty
Luke Jai Wood, Kerstin Dautenhahn, Austen Rainer, Ben Robins, Hagen Lehmann, and Dag Sverre Syrdal, “Robot-Mediated Interviews - How Effective Is a Humanoid Robot as a Tool for Interviewing Young Children?,” PLoS ONE, vol. 8, no. 3, March 2013, pp. e59448. DOI: 10.1371/journal.pone.0059448. http://dx.doi.org/10.1371%2Fjournal.pone.0059448.

Abstract Robots have been used in a variety of education, therapy or entertainment contexts. This paper introduces the novel application of using humanoid robots for robot-mediated interviews. An experimental study examines how children’s responses towards the humanoid robot KASPAR in an interview context differ in comparison to their interaction with a human in a similar setting. Twenty-one children aged between 7 and 9 took part in this study. Each child participated in two interviews, one with an adult and one with a humanoid robot. Measures include the behavioural coding of the children’s behaviour during the interviews and questionnaire data. The questions in these interviews focused on a special event that had recently taken place in the school. The results reveal that the children interacted with KASPAR very similarly to how they interacted with a human interviewer. The quantitative behaviour analysis reveals that the most notable differences between the interviews with KASPAR and the human were the duration of the interviews, the eye gaze directed towards the different interviewers, and the response time of the interviewers. These results are discussed in light of future work towards developing KASPAR as an ‘interviewer’ for young children in application areas where a robot may have advantages over a human interviewer, e.g. in police, social services, or healthcare applications.

2012

Hans Rutger Bosker, Anne-France Pinget, Hugo Quené, Ted Sanders, and Nivja H. de Jong, “What makes speech sound fluent? The contributions of pauses, speed and repairs,” Language Testing, vol. 30, no. 2, 04/2013 2012, pp. 159-175. DOI: 10.1177/0265532212455394. http://ltj.sagepub.com/content/30/2/159.

Abstract The oral fluency level of an L2 speaker is often used as a measure in assessing language proficiency. The present study reports on four experiments investigating the contributions of three fluency aspects (pauses, speed and repairs) to perceived fluency. In Experiment 1 untrained raters evaluated the oral fluency of L2 Dutch speakers. Using specific acoustic measures of pause, speed and repair phenomena, linear regression analyses revealed that pause and speed measures best predicted the subjective fluency ratings, and that repair measures contributed only very little. A second research question sought to account for these results by investigating perceptual sensitivity to acoustic pause, speed and repair phenomena, possibly accounting for the results from Experiment 1. In Experiments 2–4 three new groups of untrained raters rated the same L2 speech materials from Experiment 1 on the use of pauses, speed and repairs. A comparison of the results from perceptual sensitivity (Experiments 2–4) with fluency perception (Experiment 1) showed that perceptual sensitivity alone could not account for the contributions of the three aspects to perceived fluency. We conclude that listeners weigh the importance of the perceived aspects of fluency to come to an overall judgment.

Keywords Fluency perception, pauses, perceptual sensitivity, repair, speed
Troy Cox, and Wendy Baker-Smemoe, “The relationship between L1 fluency and L2 fluency across different proficiency levels and L1s,” November 2012. https://nivjadj.wixsite.com/workshopfluentspeech/coxandsmemoe/c1gv7.

Abstract Our understanding of oral temporal fluency (i.e., speech rate, pauses, and hesitations) in a second language (L2) has increased greatly in the past several years, along with our understanding of its relationship to overall proficiency, language processing, and automaticity (i.e., Brand & Götz, 2011; Segalowitz, 2007). However, the role of the speaker’s fluency in their native language (L1) on L2 fluency is still not understood. Few studies have examined this relationship, and these studies have examined few L1/L2 relationships across few proficiency levels (Scanlon, 1987; Derwing et al., 2009). Thus, the influence of L1 fluency on L2 fluency development is still unclear. The purpose of this study is to determine the effect of native language (L1) fluency and L2 proficiency level on features of L2 temporal fluency. Over one hundred English as a second language (ESL) students participated from five L1 backgrounds (Chinese, Japanese, Korean, Spanish, Portuguese) and 9 proficiency levels (novice high to advanced high on the ACTFL scale). Participants were asked to describe 4 picture stories, 2 in their L1 and 2 in their L2. Several fluency measures including unfilled pauses, speech rate, and articulation rate were analyzed using the Praat script described in de Jong and Wempe (2007). These fluency measures in the L1 were compared to those in the L2. The results of this analysis revealed that all features were highly correlated across the two languages, that these correlations were stronger for lower than higher proficiency speakers, and that the number and type of pauses, as well as speaking rate, differed across L1s. These results suggest that fluency reveals more than processing constraints aggregated by learning an L2, and suggest that measuring L1 fluency is important in any investigation of L2 fluency.
Nivja De Jong, Margarita P. Steinel, Arjen Florijn, Rob Schoonen, and Jan H. Hulstijn, “Facets of Speaking Proficiency,” Studies in Second Language Acquisition, vol. 34, no. 1, March 2012, pp. 5-34. DOI: 10.1017/S0272263111000489.

Abstract This study examined the componential structure of second-language (L2) speaking proficiency. Participants—181 L2 and 54 native speakers of Dutch—performed eight speaking tasks and six tasks tapping nine linguistic skills. Performance in the speaking tasks was rated on functional adequacy by a panel of judges and formed the dependent variable in subsequent analyses (structural equation modeling). The following independent variables were assessed separately: linguistic knowledge in two tests (vocabulary and grammar); linguistic processing skills (four reaction time measures obtained in three tasks: picture naming, delayed picture naming, and sentence building); and pronunciation skills (speech sounds, word stress, and intonation). All linguistic skills, with the exception of two articulation measures in the delayed picture naming task, were significantly and substantially related to functional adequacy of speaking, explaining 76% of the variance. This provides substantial evidence for a componential view of L2 speaking proficiency that consists of language-knowledge and language-processing components. The componential structure of speaking proficiency was almost identical for the 40% of participants at the lower and the 40% of participants at the higher end of the functional adequacy distribution (n = 73 each), which does not support Higgs and Clifford’s (1982) relative contribution model, predicting that, although L2 learners become more proficient over time, the relative weight of component skills may change.
Ian R. Finlayson, and Martin Corley, “Disfluency in dialogue: an intentional signal from the speaker?,” Psychonomic Bulletin & Review, vol. 19, no. 5, October 2012, pp. 921-928. DOI: 10.3758/s13423-012-0279-x. https://link.springer.com/article/10.3758/s13423-012-0279-x.

Abstract Disfluency is a characteristic feature of spontaneous human speech, commonly seen as a consequence of problems with production. However, the question remains open as to why speakers are disfluent: Is it a mechanical by-product of planning difficulty, or do speakers use disfluency in dialogue to manage listeners’ expectations? To address this question, we present two experiments investigating the production of disfluency in monologue and dialogue situations. Dialogue affected the linguistic choices made by participants, who aligned on referring expressions by choosing less frequent names for ambiguous images where those names had previously been mentioned. However, participants were no more disfluent in dialogue than in monologue situations, and the distribution of types of disfluency used remained constant. Our evidence rules out at least a straightforward interpretation of the view that disfluencies are an intentional signal in dialogue.
Jordi Adell, David Escudero, and Antonio Bonafonte, “Production of filled pauses in concatenative speech synthesis based on the underlying fluent sentence,” Speech Communication, vol. 54, no. 3, 2012, pp. 459-476. DOI: 10.1016/j.specom.2011.10.010. http://www.sciencedirect.com/science/article/pii/S0167639311001580.

Abstract Until now, speech synthesis has mainly involved reading-style speech. Today, however, text-to-speech systems must provide a variety of styles because users expect these interfaces to do more than just read information. If synthetic voices must be integrated into future technology, they must simulate the way people talk instead of the way people read. Existing knowledge about how disfluencies occur has made it possible to propose a general framework for synthesising disfluencies. We propose a model based on the definition of disfluency and the concept of underlying fluent sentences. The model incorporates the parameters of standard prosodic models for fluent speech with local modifications of prosodic parameters near the interruption point. The constituents of the local models for filled pauses are derived from the analysis corpus, and constituent’s prosodic parameters are predicted via linear regression analysis. We also discuss the implementation details of the model when used in a real speech synthesis system. Objective and perceptual evaluations showed that the proposed models outperformed the baseline model. Perceptual evaluations of the system showed that it is possible to synthesise filled pauses without decreasing the overall naturalness of the system, and users stated that the speech produced is even more natural than the one produced without filled pauses.

Keywords Perceptual evaluation
Ralph L. Rose, “On the lexical status of filled pauses: Seeing ’uh’ and ’um’ as words,” 2012.

Abstract Filled pauses (FPs: e.g., English uh/um, Japanese e-(to)) occur frequently in everyday communication. However, the exact linguistic status of FPs has been the subject of some debate. Some researchers have argued that FPs are words, with the same lexical status as such interjections as well or oh (Clark and Fox Tree 2002), or at least word-like in that they can be used in a controlled fashion (Villar et al 2012). However, others have argued that the evidence is inconclusive and that FPs can be regarded as resulting automatically from cognitive processes (Corley and Stewart 2008). I argue that FPs are words based on facts showing the systematic and distinctive use of FPs in speech corpora (Kjellmer, 2003), and particularly in a corpus of blog writings (Rose 2011). Evidence from these corpora shows that FPs are used, among other ways, to highlight unexpected or unusual words and phrases (e.g., "Jan Wenner’s famous pub has gone, um, gaga for [Lady] Gaga.").
Gina Villar, Joanne Arciuli, and David Mallard, “Use of "um" in the deceptive speech of a convicted murderer,” Applied Psycholinguistics, vol. 33, no. 1, January 2012, pp. 83-95. DOI: 10.1017/S0142716411000117.

Abstract Previous studies have demonstrated a link between language behaviors and deception; however, questions remain about the role of specific linguistic cues, especially in real-life high-stakes lies. This study investigated use of the so-called filler, "um," in externally verifiable truthful versus deceptive speech of a convicted murderer. The data revealed significantly fewer instances of "um" in deceptive speech. These results are in line with our recent study of "um" in laboratory elicited low-stakes lies. Rather than constituting a filled pause or speech disfluency, "um" may have a lexical status similar to other English words and may be under the strategic control of the speaker. In an attempt to successfully deceive, humans may alter their speech, perhaps in order to avoid certain language behaviors that they think might give them away.

2011

Karin Aijmer, “"Well I’m not sure I think…" The use of "well" by non-native speakers,” International Journal of Corpus Linguistics, vol. 16, no. 2, 2011, pp. 231-254. DOI: 10.1075/ijcl.16.2.04aij.

Abstract Pragmatic markers are an important part of the grammar of conversation and not simply markers of disfluency. They have a number of functions that help the speaker to organise the conversation and to express feelings and attitudes. Advanced EFL learners use frequent pragmatic markers such as well. However their use of well diverges from the native speaker norm. The present study uses data from the Swedish component of the LINDSEI corpus and its native speaker counterpart (LOCNEC) to examine similarities and differences between native and non-native speakers. The overall picture is that Swedish learners overuse well, although there are considerable individual differences. Thus learners use well above all as a fluency device to cope with speech management problems but underuse it for attitudinal purposes. Pragmatic markers cannot be taught in the same way as other lexical items but it is important to discuss how and where they are used.

Keywords language teaching, learner corpora, non-native speaker, pragmatic marker, well
Christiane Brand, and Sandra Götz, “Fluency versus accuracy in advanced spoken learner language: A multi-method approach,” International Journal of Corpus Linguistics, vol. 16, no. 2, 2011, pp. 255-275. DOI: 10.1075/ijcl.16.2.05bra.

Abstract In this paper we present a possible multi-method approach towards the description of a potential correlation between errors and temporal variables of (dys-)fluency in spoken learner language. Using the German subcorpus of the Louvain International Database of Spoken English Interlanguage (LINDSEI) and the native control corpus Louvain Corpus of Native English Conversation (LOCNEC), we first analysed errors and temporal variables of fluency quantitatively. We detected lexical and grammatical categories which are especially error-prone as well as problematic aspects of fluency for all learners in the LINDSEI subcorpus, e.g. confusion in tense agreement across clauses or an overuse of unfilled pauses. In the ensuing qualitative analysis of five prototypical learners, no trend for a possible correlation of accuracy and fluency could be observed. Fifty native speakers’ ratings of these five learners revealed that the learner with an average performance across the investigated variables received the highest ratings for overall oral proficiency.

Keywords accuracy, error analysis, errors, Fluency, learner corpus, LINDSEI
Martin Corley, and Robert J. Hartsuiker, “Why Um Helps Auditory Word Recognition: The Temporal Delay Hypothesis,” PLOS ONE, vol. 6, no. 5, May 2011, pp. 1-6. DOI: 10.1371/journal.pone.0019792.

Abstract Several studies suggest that speech understanding can sometimes benefit from the presence of filled pauses (uh, um, and the like), and that words following such filled pauses are recognised more quickly. Three experiments examined whether this is because filled pauses serve to delay the onset of upcoming words and these delays facilitate auditory word recognition, or whether the fillers themselves serve to signal upcoming delays in a way which informs listeners' reactions. Participants viewed pairs of images on a computer screen, and followed recorded instructions to press buttons corresponding to either an easy (unmanipulated, with a high-frequency name) or a difficult (visually blurred, low-frequency) image. In all three experiments, participants were faster to respond to easy images. In 50% of trials in each experiment, the name of the image was directly preceded by a delay; in the remaining trials an equivalent delay was included earlier in the instruction. Participants were quicker to respond when a name was directly preceded by a delay, regardless of whether this delay was filled with a spoken um, was silent, or contained an artificial tone. This effect did not interact with the effect of image difficulty, nor did it change over the course of each experiment. Taken together, our consistent finding that delays of any kind help word recognition indicates that natural delays such as fillers need not be seen as ‘signals’ to explain the benefits they have to listeners' ability to recognise and respond to the words which follow them.
Nivja De Jong, “Cross-linguistic differences in pausing behavior,” December 2011. https://mirjamernestus.nl/Ernestus/public/AbstractsWorkshop2011.pdf.

Abstract Pauses in speech can serve communicative means, to help listeners understand (Clark, 1994), and pauses can be due to cognitive factors, when a speaker has not finished planning and formulating the upcoming utterance (Howell & Au-Yeung, 2002). In theories of speech production, lexical concepts are seen as the basic units of planning. If this holds for all languages, one would predict that for an agglutinative language such as Turkish, units of planning can be larger than for a non-agglutinative language such as English. Following this reasoning, speakers of Turkish would have fewer opportunities to pause than speakers of English. This hypothesis is tested by comparing speech data of Turkish and English native speakers. Twenty-four Turkish speakers and twenty-nine English speakers performed eight speaking tasks. These tasks were long turns in simulated conversation. In total, nine hours of Turkish and English speech were annotated, adding information about frequency and duration of silent pauses (as well as other hesitation phenomena). The results showed that Turkish words are indeed longer in number of syllables and in duration. Furthermore, speakers hardly paused within words, confirming the hypothesis that lexical items form the basis of units-of-speech. Finally, Turkish speakers paused less often than English speakers, but when they paused the duration of these pauses was longer. In total, percentage of time spent pausing did not differ for the Turkish and English speakers. We conclude that usage of pauses due to cognitive factors is dependent on typological features of languages, leading to cross-linguistic differences in pausing behavior.
Tyko Dirksmeyer, “Lexical hesitation marking in Chintang: Evidence for fillers as words,” December 2011. https://mirjamernestus.nl/Ernestus/public/AbstractsWorkshop2011.pdf.

Abstract The status of hesitation markers (or ‘fillers’, ‘filled pauses’, ‘editing expressions’, etc. — such as uh(m) in English) has been fiercely disputed in various subdisciplines of the language sciences over the past decades. | Should these items be viewed as aberrations in performance that need to be excluded from linguistic analysis (e.g. Chomsky 1965), are they symptoms of speech production processes that signal trouble but do not signify anything beyond that (Goldman-Eisler 1968; Levelt 1989), or are they actively employed as communicative means just like other words are (Clark and Fox Tree 2002; Jefferson 1974; Schegloff 2010), and thus form an integral part of language? | Chintang, a Tibeto-Burman language spoken in two villages in Nepal, provides evidence for the latter view. Its principal hesitation marker me~ı occurs in the same range of functional environments — word search, self-repair, prefacing dispreferred turns, among others — in which uh(m) appears in English (and similar forms feature in other well-known languages). Yet, me~ı demonstrably conforms to standard phonological, morphosyntactic and semantic criteria for wordhood, can be seamlessly integrated into utterances, and is regularly exploited for communicative purposes such as "floor management" and projecting what to expect next. | In this talk, I will review data drawn from a corpus of video-recorded naturally occurring conversational interaction in Chintang and argue for the profoundly conventional nature of hesitation marking with me~ı. The findings from this small, as-yet-understudied speech community indicate that fillers should indeed be treated as lexical items on a par with other words. Consequently, they call on linguistic theorizing not only to take hesitation marking and its communicative functions in conversational speech seriously, but also to embrace and incorporate typological diversity in order to arrive at truly generalizable models of language processing.
Gaëtanelle Gilquin, and Sylvie De Cock, “Errors and disfluencies in spoken corpora: Setting the scene,” International Journal of Corpus Linguistics, vol. 16, no. 2, 2011, pp. 141-172. DOI: 10.1075/ijcl.16.2.01gil.

Abstract (none)
John Osborne, “Fluency, complexity and informativeness in native and non-native speech,” International Journal of Corpus Linguistics, vol. 16, no. 2, 2011, pp. 276-298. DOI: 10.1075/ijcl.16.2.06osb.

Abstract Individual speakers vary considerably in their rate of speech, their syntactic choices, and the organization of information in their discourse. This study, based on a corpus of monologue productions from native and non-native speakers of English and French, examines the relations between temporal fluency, syntactic complexity and informational content. The purpose is to identify which features, or combinations of features, are common to more fluent speakers, and which are more idiosyncratic in nature. While the syntax of fluent speakers is not necessarily more complex than that of less fluent speakers, it is suggested that they are able to deliver content more efficiently through a combination of less hesitant speech and of lexical and syntactic choices that allow them to package information more economically.

Keywords Fluency, information content, learner corpora, lexical bundles, syntactic complexity
Anne-France Pinget, “Native Speakers’ Perceptions of Fluency and Accent in L2 Speech,” Master's Thesis, Utrecht University, Utrecht, the Netherlands. June 2011. http://igitur-archive.library.uu.nl/student-theses/2011-0816-200626/UUindex.html.

Abstract The goal of this study is threefold. It is aimed at exploring (i) the relationship between objective properties of speech and perceived fluency, (ii) the relationship between segmental characteristics of speech and perceived accent, and (iii) the relationship between fluency and accent. We collected 90 speech samples from Turkish and English L2 learners of Dutch. Objective measures of fluency and accent were made for each sample. Forty untrained native speakers of Dutch rated the samples for fluency and accentedness. The results showed that the temporal measures of fluency were good predictors of fluency ratings, and that their predictive power depends on the type of measures used (i.e. traditional measures per time units, measures per information units, measures that take the L1 into consideration). Furthermore, the segmental measure of accent could predict a small part of accent ratings. Finally, perceived fluency and accent appeared to be weakly correlated, but objective measures of fluency and accent did not add additional explanatory power to the models of perceived accent and perceived fluency respectively.

Keywords accent, Fluency, perception, second language acquisition
Ralph L. Rose, “Filled Pauses in Writing: What can they Teach us about Speech?,” December 2011. https://mirjamernestus.nl/Ernestus/public/AbstractsWorkshop2011.pdf.

Abstract This presentation reports on a research effort to use filled pauses ('uh', 'um': hereafter, FPs) in blog writings to better understand how and why speakers use them in spontaneous speech. Blog FPs are written intentionally and cannot be the result of some linguistic processing shortcoming (i.e., speech-repair as in Levelt, 1983). Hence, if written FPs can be accurately characterized, then the spoken FPs that fit this characterization can be removed from consideration leaving a smaller, potentially more uniform set of other FPs for further study. | Samples of FPs in blog writings were gathered from 100 top blogs. Samples of FPs in spontaneous speech were taken from the Switchboard corpus. A balanced sample of 227 FPs were gathered of each type. Each FP was categorized according to its medium (written or spoken), its location (at clause boundary or clause-internal), the part-of-speech of the immediately following word (content or function, following Maclay and Osgood's 1959 classification), and the FP type (open 'uh' or closed 'um', after Rose, 1998). The data was analyzed under a generalized linear model with chi-square tests. | There was a main effect of FP Type (Chi-square=48.4, p<0.001) with a ratio of open to closed FPs of approximately 2:1. This is comparable to previous studies (e.g., Rose, 1998). There were no other main effects. There was an interaction between medium and following word type (Chi-square=37.0, p<0.001), as well as between medium and FP type (Chi-square=5.4, p<0.05). In the spoken medium, the following word was 30% more likely to be a function word than a content word, while in the written medium, this trend reversed: the following word was 70% more likely to be a content word than a function word. Also, in the spoken medium, the ratio of open to closed FPs was almost 3:1, but in the written medium, this ratio dropped to 1.4:1. | Results from FPs in writing suggest a hybrid view of FPs in speech: Some FPs are used intentionally and with some selectional restrictions (i.e., before content words) in order to serve some pragmatic function (cf., filler-as-word hypothesis in Clark and Fox Tree, 2002), with open FPs being slightly preferred in this role. Other FPs in speech are the result of difficulties during linguistic processing and occur semi-automatically as part of speech repair (cf., Levelt, 1983).
Christoph Rühlemann, Andrej Bagoutdinov, and Matthew Brook O’Donnell, “Windows on the mind: Pauses in conversational narrative,” International Journal of Corpus Linguistics, vol. 16, no. 2, 2011, pp. 198-230. DOI: 10.1075/ijcl.16.2.03ruh.

Abstract This paper investigates four different types of pauses in conversational narrative: the filled pauses er and erm, and short and long silent pauses. The study is based on the Narrative Corpus (NC), a recently created corpus of everyday narratives. The texts, which include both the narrative and some context, have been annotated for important textual components. The current analysis reveals that pauses are more frequent in conversational narrative than in general conversation. We suggest three factors that account for this high frequency: (i) the need for narrators, in the opening utterance of the story, to provide specific information to orient listeners to the situation in which the events unfolded, (ii) the need to coordinate narrative clauses to match the story events, and (iii) the preference of narrators to present speech, thought, emotion and gesture using direct-mode discourse presentation, which is more "dramatic" but also more costly in terms of reference resolution.

Keywords discourse presentation, narrative, narrative corpus, pauses, quotatives, Reference
Scott H. Fraundorf, and Duane G. Watson, “The disfluent discourse: Effects of filled pauses on recall,” Journal of Memory and Language, vol. 65, no. 2, 2011, pp. 161-175. DOI: 10.1016/j.jml.2011.03.004. http://www.sciencedirect.com/science/article/pii/S0749596X11000234.

Abstract We investigated the mechanisms by which fillers, such as uh and um, affect memory for discourse. Participants listened to and attempted to recall recorded passages adapted from Alice’s Adventures in Wonderland. The type and location of interruptions were manipulated through digital splicing. In Experiment 1, we tested a processing time account of fillers’ effects. While fillers facilitated recall, coughs matched in duration to the fillers impaired recall, suggesting that fillers’ benefits cannot be attributed to adding processing time. In Experiment 2, fillers’ locations were manipulated based on norming data to be either predictive or non-predictive of upcoming material. Fillers facilitated recall in both cases, inconsistent with an account in which listeners predict upcoming material using past experience with the distribution of fillers. Instead, these results suggest an attentional orienting account in which fillers direct attention to the speech stream but do not always result in specific predictions about upcoming material.

Keywords Language comprehension
Parvaneh Tavakoli, “Pausing patterns: differences between L2 learners and native speakers,” ELT Journal, vol. 65, no. 1, May 2011, pp. 71-79. DOI: 10.1093/elt/ccq020.

Abstract This paper reports on a comparative study of pauses made by L2 learners and native speakers of English while narrating picture stories. The comparison is based on the number of pauses and total amount of silence in the middle and at the end of clauses in the performance of 40 native speakers and 40 L2 learners of English. The results of the quantitative analyses suggest that, although the L2 learners generally pause more repeatedly and have longer periods of silence than the native speakers, the distinctive feature of their pausing pattern is that they pause frequently in the middle of clauses rather than at the end. The qualitative analysis of the data suggests that some of the L2 learners’ mid-clause pauses are associated with processes such as replacement, reformulation, and online planning. Formulaic sequences, however, contain very few pauses and therefore appear to facilitate the learners’ fluency.
Gunnel Tottie, “"Uh" and "Um" as sociolinguistic markers in British English,” International Journal of Corpus Linguistics, vol. 16, no. 2, 2011, pp. 173-197. DOI: 10.1075/ijcl.16.2.02tot.

Abstract This study is based on the British National Corpus (BNC) and also takes data from the London-Lund Corpus (LLC) into account. It shows that the so-called filled pauses er/uh and erm/um are sociolinguistic markers that differentiate between registers of English and along gender, age and socio-economic class. Men, older people and educated speakers use more fillers than women, younger speakers and less educated speakers. Nasalization is used more often by women, younger speakers and more educated speakers. These sociolinguistic factors can probably partly explain the fact that the use of fillers is higher in the LLC and the context-governed sample of the BNC than in the demographic sample of the BNC. It is argued that a more positive view should be taken of fillers as planning signals, or planners, and that their functions should be submitted to careful discourse analytic study. Their recognition as words will facilitate such an undertaking.

Keywords corpus linguistics, Discourse markers, disfluency, filled pauses, hesitation markers, sociolinguistic markers

2010

April Ginther, Slobodanka Dimova, and Rui Yang, “Conceptual and empirical relationships between temporal measures of fluency and oral English proficiency with implications for automated scoring,” Language Testing, vol. 27, no. 3, June 2010, pp. 379-399. DOI: 10.1177/0265532210364407. http://ltj.sagepub.com/content/27/3/379.short.

Abstract Information provided by examination of the skills that underlie holistic scores can be used not only as supporting evidence for the validity of inferences associated with performance tests but also as a way to improve the scoring rubrics, descriptors, and benchmarks associated with scoring scales. As fluency is considered a critical, perhaps foundational, component of speaking proficiency, temporal measures of fluency are expected to be strongly related to holistic ratings of speech quality. This study examines the relationships among selected temporal measures of fluency and holistic scores on a semi-direct measure of oral English proficiency. The spoken responses of 150 respondents to one item on the Oral English Proficiency Test (OEPT) were analyzed for selected temporal measures of fluency. The examinees represented three first language backgrounds (Chinese, Hindi, and English) and the range of scores on the OEPT scale. While strong and moderate correlations between OEPT scores and speech rate, speech time ratio, mean length of run, and the number and length of silent pauses were found, fluency variables alone did not distinguish adjacent levels of the OEPT scale. Temporal measures of fluency may reasonably be selected for the development of automated scoring systems for speech; however, identification of an examinee’s level remains dependent on aspects of performance only partially represented by fluency measures.

Keywords automated scoring, Fluency, oral English proficiency
Joanne Arciuli, David Mallard, and Gina Villar, “"Um, I can tell you’re lying": Linguistic markers of deception versus truth-telling in speech,” Applied Psycholinguistics, vol. 31, no. 3, 2010, pp. 397-411. DOI: 10.1017/s0142716410000044. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=7792900&fulltextType=RA&fileId=S0142716410000044.

Abstract Lying is a deliberate attempt to transmit messages that mislead others. Analysis of language behaviors holds great promise as an objective method of detecting deception. The current study reports on the frequency of use and acoustic nature of "um" and "like" during laboratory-elicited lying versus truth-telling. Results obtained using a within-participants false opinion paradigm showed that instances of "um" occur less frequently and are of shorter duration during lying compared to truth-telling. There were no significant differences in relation to "like." These findings contribute to our understanding of the linguistic markers of deception behavior. They also assist in our understanding of the role of "um" in communication more generally. Our results suggest that "um" may not be accurately conceptualized as a filled pause/hesitation or speech disfluency/error whose increased usage coincides with increased cognitive load or increased arousal during lying. It may instead carry a lexical status similar to interjections and form an important part of authentic, effortless communication, which is somewhat lacking during lying.
Rachel Baker, and Valerie Hazan, “LUCID: a corpus of spontaneous and read clear speech in British English,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 3-6. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_003.pdf.

Abstract This paper describes LUCID, the London UCL Clear Speech in Interaction Database, which contains spontaneous and read speech in clear and casual speaking styles for 40 Southern British English speakers. The problem-solving task used to collect the spontaneous speech, the DiapixUK task, is also described, along with ways of using the task to elicit different types of clear speech without explicit instruction, e.g. using different ‘barriers’ to communication. Applications of the corpus and of the task materials for future research projects are discussed. The corpus and materials will be available online to the research community at the end of the project.

Keywords clear speech, DiSS, interaction, Speech production, spontaneous speech
Catia Cucchiarini, Joost van Doremalen, and Helmer Strik, “Fluency in non-native read and spontaneous speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 15-18. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_015.pdf.

Abstract Various studies have investigated the temporal aspects of nonnative speech and their relation to perceived fluency, because fluency constitutes an important aspect of second language proficiency. For this purpose it is important to determine which measures are most strongly correlated with perceived fluency and how these measures vary. In the present study objective measures related to perceived fluency were calculated for read and spontaneous speech of non-native speakers of Dutch. The results indicate that the objective measures vary as a function of different variables. Suggestions are made for future investigations so as to facilitate comparisons between studies and meta-analyses.

Keywords DiSS, Fluency, non-native speech, temporal measures
Anne Cutler, Holger Mitterer, Susanne Brouwer, and Annelie Tuinman, “Phonological competition in casual speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 43-46. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_043.pdf.

Abstract The natural processes affecting spontaneous speech production and the natural processes of spoken-word recognition combine to cause significant activation of irrelevant lexical competitors. Using eye-tracking, we show that reduced forms of words that occur in casual speech cause listeners to activate lexical candidates that resemble the reduced form but are quite unlike the canonical form of the intended word. In L2, the problem is worse: casual speech processes that occur in the L2 but not in the L1 lead to activation of irrelevant competitors even where native listeners experience no such competition.

Keywords competition, DiSS, eyetracking, word recognition
Yuto Daikuhara, “日本語教育におけるフィラーの指導のための基礎的研究 : フィラーの定義と個々の形式の使い分けについて [Basic research on filler for Japanese as foreign language: definition of filler and differentiated use of each form],” PhD Dissertation, Kobe University, Kobe, Japan. March 2010. http://www.lib.kobe-u.ac.jp/infolib/meta_pub/G0000003kernel_D1004831.

Abstract (none)
Dale J. Barr, and Mandana Seyfeddinipur, “The role of fillers in listener attributions for speaker disfluency,” Language and Cognitive Processes, vol. 25, no. 4, 2010, pp. 441-455. DOI: 10.1080/01690960903047122. https://www.tandfonline.com/doi/abs/10.1080/01690960903047122.

Abstract When listeners hear a speaker become disfluent, they expect the speaker to refer to something new. What is the mechanism underlying this expectation? In a mouse-tracking experiment, listeners sought to identify images that a speaker was describing. Listeners more strongly expected new referents when they heard a speaker say um than when they heard a matched utterance where the um was replaced by noise. This expectation was speaker-specific: it depended on what was new and old for the current speaker, not just on what was new or old for the listener. This finding suggests that listeners treat fillers as collateral signals.

Keywords common ground, Dialogue, Disfluency, fillers, Perspective taking
Robert Eklund, “The effect of directed and open disambiguation prompts in authentic call center data on the frequency and distribution of filled pauses and possible implications for filled pause hypotheses and data collection methodology,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 23-26. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_023.pdf.

Abstract This paper studies the frequency and distribution of filled pauses (FPs) in ecologically valid data where unaware and authentic customers called in to report problems with their telephony and/or Internet services and were met by a novel Wizard-of-Oz paradigm using real call center agents as wizards. The data analyzed were caller utterances following a directed or an open disambiguation prompt. While no significant differences in FP production were observed as a function of prompt type, FP frequency was found to be considerably higher than what is usually reported in the literature. Moreover, a higher proportion of utterance-initial FPs than normally reported was also observed. The results are compared to previously reported FP frequencies. Potential implications for data collection methodology are discussed.

Keywords call center, data collection, dialog systems, directed prompts, DiSS, filled pauses, many-options, open prompts, speech planning, Speech production, Wizard-of-Oz, WOZ
Paul E. Engelhardt, Martin Corley, Joel T. Nigg, and Fernanda Ferreira, “The role of inhibition in the production of disfluencies,” Memory & Cognition, vol. 38, no. 5, July 2010, pp. 617-628. DOI: 10.3758/MC.38.5.617. https://link.springer.com/article/10.3758/MC.38.5.617.

Abstract Disfluency is a common occurrence in speech and is generally thought to be related to difficulty in the production system. One unexplored issue is the extent to which inhibition is required to prevent incorrect speech plans from being articulated. Therefore, we examined disfluency production in participants with attention-deficit/hyperactivity disorder (ADHD), which is linked to deficits in inhibitory function and response suppression (Nigg, 2001). Participants completed a sentence production task in which they were presented with two pictures and a verb and their task was to produce a sentence. If inhibition plays a role in preventing incorrect speech plans, we would expect ADHD participants to produce more repetition and repair disfluencies than would non-ADHD controls. The results showed that one subtype of ADHD (i.e., the combined) produced more repair disfluencies as task demands increased. We conclude that the production system relies on inhibitory control in order to prevent errors in language production.

Keywords Inhibitory Control, Language Production, Animate Object, Object Order, Abnormal Child Psychology
Ian R. Finlayson, Robin J. Lickley, and Martin Corley, “The influence of articulation rate, and the disfluency of others, on one’s own speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 119-122. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_119.pdf.

Abstract Disfluencies are a regular feature of spontaneous speech, and much has been learnt about the effects of various linguistic factors on their production. Speech usually occurs within dialogue, yet little is known about the influence of an interlocutor’s speech on a speaker’s own fluency. It has been shown that speakers tend to align on various levels, converging, for example, on lexical, and syntactic levels. But we know little about convergence in rate of speech or disfluency. Little is also known about the effects of speech rate on fluency in a speaker’s own speech. In this paper, we examine these effects through analysis of speech rate, hesitation and error correction in a corpus of task-oriented dialogues (the HCRC Map Task Corpus). Our findings demonstrate that different types of disfluencies can be influenced in different ways by speech rate. Furthermore, the probability of an interlocutor being disfluent appears to affect the speaker’s own likelihood, raising the possibility that interlocutors may “align” on disfluent, as well as fluent, speech.

Keywords accommodation theory, alignment, articulation rate, Dialogue, DiSS
Anne Garcia-Fernandez, Ioana Vasilescu, and Sophie Rosset, “euh as cue for speaker confidence and word searching in human spoken answers in French,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 79-80. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_079.pdf.

Abstract This paper deals with the contextual analysis of the vocalic hesitation euh in French in a corpus of human elicited answers. Through the analysis of the contextual combinatorial patterns, the new information introductory role of this vocalic hesitation is investigated. Observations support trends noticed in other languages and suggest potential optimization for question answering automatic systems.

Keywords DiSS, feeling of knowing, interaction management, QA systems, rephrasing, vocalic hesitation
Jean-Philippe Goldman, Mathieu Avanzi, and Antoine Auchlin, “Hesitations in read vs. spontaneous French in a multi-genre corpus,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 101-104. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_101.pdf.

Abstract This study is a part of an on-going work whose goal is the prosodic characterization of various speaking styles in a multi-genre 70-minutes French corpus as well as the development of prosodic automatic detection tools. In this corpus, a manual annotation prominences and disfluencies like hesitations and syntactic ruptures is used to show evident phonological aspects of hesitation in regard to quality, pause position and proximity to syntactic rupture.

Keywords disfluencies, DiSS, filled pause, hesitation, spoken French, vowel lengthening
Joakim Gustafson, and Daniel Neiberg, “Prosodic cues to engagement in non-lexical response tokens in Swedish,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 63-66. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_063.pdf.

Abstract This paper investigates the prosodic patterns of non-lexical response tokens in a Swedish call-in radio show. The feedback of a professional speaker was investigated to give insight in how to build a simulated active listener that could encourage its users to continue talking. Possible domains for such systems include customer care and second language learning. The prosodic analysis of the non-lexical response tokens showed that the engagement level decreases over time. Prosodic cues to this include change in syllabicity, pitch slope and loudness. We have also investigated prosodic alignment, to see to what extent the active listener mimic the prosody of the callers in his non-lexical response tokens.

Keywords DiSS, listener responses, prosodic alignment, prosodic cues, turn management
Corinna Harwardt, “Investigating the COG ratio as feature for speaker verification on high-effort speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 35-38. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_035.pdf.

Abstract Vocal effort mismatch in training and test data leads to immense degradations of speaker recognition systems. The changes on the acoustics of a speech signal induced by raised vocal effort are complex and despite several studies from various authors not completely known yet. Instead of just gaining knowledge about these differences for automatic speaker recognition it is rather an essential to discover features that remain relatively stable in changing vocal effort conditions and contain speaker specific information. In this study we investigate the center of gravity (COG) ratio for high and mid frequency bands as feature for speaker recognition. We find that vocal effort mismatch leads to an equal error rate (EER) more than six times higher for a standard MFCC-based GMM-UBM system. For the COG ratio we observe a much smaller degradation of around 25%. When adapting the UBM with additional high-effort speech data the EER of the COG ratio gets even better for the mismatch condition than for the matching task. Combining MFCC and the COG ratio leads to best results with an overall improvement of 16% compared to the standard MFCC-based system.

Keywords center of gravity ratio, DiSS, speaker recognition, vocal effort
Valerie Hazan, and Rachel Baker, “Does reading clearly produce the same acoustic-phonetic modifications as spontaneous speech in a clear speaking style?,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 7-10. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_007.pdf.

Abstract This paper describes an acoustic-phonetic comparison of casual and clear speech styles elicited in read and spontaneous speech. For the spontaneous speech, 20 pairs of English talkers were recorded doing a problem-solving picture task in good and degraded listening conditions. Each person also read sentences in casual and clear styles. The read clear speech was an exaggerated form of clear speech relative to the spontaneous clear speech: it had higher median F0 in both styles, a greater increase in F0 range and greater decrease in speaking rate between casual and clear styles, and trends towards greater vowel space expansion.

Keywords acoustic-phonetic characteristics, clear speech, DiSS, interaction, read speech, spontaneous speech
Pei-Yu Hsieh, “Pitch patterns in the vocalization of a 3-month-old Taiwanese infant,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 93-96. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_093.pdf.

Abstract This paper studied pitch contours of a Taiwanese-acquiring infant at gooing stage. Breath group theory has shown that pitch patterns of this stage were physiologically-based [6]. Fall was expected to occur at the boundary of a breath group. It predicted that Fall to be the most common pitch contour, and the second high was Rise-Fall. But previous studies [8], [9] showed that Rise-Fall occurred more. We investigated patterns of an infant from six weeks old to twelve weeks old. Mean f0 of basic contours of this stage were also shown. The f0 range of Level, Fall, and Rise were reported. Our results showed four types of contours (Level, Fall, Rise, Rise-Fall) appearing at this stage. Consistent with the hypothesis, Fall was found to be most common. Rise-Fall was found to be the second high. Fall and Rise-Fall made up to almost seventy percent. Level contour was found to be rare. The mean f0 of the infant at 3-month old was 400 Hz, higher than that of a toddler at 1;3 (370 Hz) and that of an adult (220 Hz). The f0 range was 700 Hz, greater than that of a toddler at 1;3 (450 Hz), and an adult (300 Hz).

Keywords acquisition, DiSS, pitch, vocalization
Tomohito Ishikawa, “Coding disfluency phenomena for a fluency measure in TBLT research,” Journal of Soka Women’s College, vol. 40, March 2010, pp. 101-130. http://ci.nii.ac.jp/naid/40017373381/en/.

Abstract The aim of this article is to describe coding steps for a disfluency measure employed in Ishikawa (2008a, b). According to Ellis and Barkhuizen (2005), fluency measures can be divided into two major categories. One is related to speed of speaking (i.e., temporal variables) and the other is related to repair fluency. In the sections to follow, I will first describe Shriberg’s classification system of disfluency. After the description of Shriberg’s classification system, I will describe an L2 disfluency measure used in Ishikawa (2008a, b).
Yuichi Ishimoto, and Mika Enomoto, “Analysis of prosodic features for end-of-utterance prediction in spontaneous Japanese,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 97-100. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_097.pdf.

Abstract In this study, we analyzed prosodic features of accentual phrases and investigated their temporal changes to obtain cues for detecting boundaries at where turn-taking could occur in spontaneous conversations. The acoustic parameters used as prosodic features were the fundamental frequency, sound pressure level, and duration of accentual phrases in long utterance units. The results showed that the fundamental frequency shift between the first and second accentual phrases could be useful for detecting the number of accentual phrases in the long utterance unit. In addition, the results suggested that a rapid decrease in sound pressure and an extended duration of the accentual phrase constitute a cue for detecting the end of the utterance. That is, the acoustic predictor of the utterance length appeared at the beginning of the utterance, and the predictor of the utterance boundary appeared shortly before the end of the utterance.

Keywords accentual phrase, DiSS, long utterance unit, prosody, turn-taking
Kristiina Jokinen, “Hesitation and uncertainty as feedback,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 103-106. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_103.pdf.

Abstract This paper deals with the signals that are used to express hesitation and uncertainty in conversational interactions. It studies the relation between gesturing, body posture, facial expressions, and speech, and draws conclusions of their role and function in the interpretation and coordination of interaction with respect to the basic enablements of communication. Dialogues are assumed to be cooperative activity that is constrained by the participants’ roles, social obligations, and communicative situation.

Keywords DiSS, hesitation, interaction, speech, uncertainty
Okim Kang, “Relative salience of suprasegmental features on judgments of L2 comprehensibility and accentedness,” System, vol. 38, no. 2, June 2010, pp. 301-315. DOI: 10.1016/j.system.2010.01.005.

Abstract Suprasegmentals have been emphasized in ESL/EFL pedagogy since the advent of communicative language teaching. However, it is still unclear how individual suprasegmental features affect listeners’ judgments of non-native speakers’ accented speech. The current study began to specify relative weights of individual temporal and prosodic features for listeners’ judgments on L2 comprehensibility and accentedness. Using the PRAAT computer program, 5 min of continuous in-class lectures from 11 international teaching assistants (ITAs) were acoustically analyzed for measures of speech rate, pauses, stress, and pitch range. Fifty eight US undergraduate students evaluated the ITAs’ oral performance and commented on their ratings. The results revealed that suprasegmental features independently contributed to listeners’ perceptual judgments. Accent ratings were best predicted by pitch range and word stress measures whereas comprehensibility scores were mostly associated with speaking rates. ITAs’ acoustic profiles as well as listeners’ comments on their rating offer practical implications to ITA program developers, ESL teachers, and future research in accented speech.

Keywords accentedness, Comprehensibility, International teaching assistants, Suprasegmentals
Takuya Kawada, “On the characteristics of three types of Japanese fillers: e-, ma-, and demonstrative-type fillers,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 27-30. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_027.pdf.

Abstract Japanese has various forms of fillers. However, the characteristics of each form have yet to be well understood. We use a large corpus of spontaneous Japanese speech and conversation and focus on three frequently observed types of fillers: e-, ma-, and demonstrative-type fillers. We show that it is possible to characterize Japanese fillers from the viewpoint of how a speaker concerns himself with the listener in the communicative setting. The type of discourse, way of speaking, and direction of gaze of the speaker influence the distribution of the types of filler.

Keywords DiSS, fillers, gaze, Japanese, spoken settings
Hanae Koiso, and Yasuharu Den, “Towards a precise model of turn-taking for conversation: a quantitative analysis of overlapped utterances,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 55-58. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_055.pdf.

Abstract In this paper, we present the outline of a new model of turn-taking that is applicable not only to smooth transitions but also to transitions involving overlapping speech. We identify acoustic, prosodic, and syntactic cues in overlapped utterances that elicit early initiation of a next turn, based on a quantitative analysis of Japanese three-party conversations, proposing a model for predicting a turn’s completion in an incremental fashion using sources from units at multiple levels.

Keywords DiSS, incremental processing, overlapped utterances, turn-taking
Phoenix W. Y. Lam, “Discourse Particles in Corpus Data and Textbooks: The Case of Well,” Applied Linguistics, vol. 31, no. 2, May 2010, pp. 260-281. DOI: 10.1093/applin/amp026. http://applij.oxfordjournals.org/content/31/2/260.abstract.

Abstract Discourse particles are ubiquitous in spoken discourse. Yet despite their pervasiveness very few studies attempt to look at their use in the pedagogical setting. Drawing on data from an intercultural corpus of speech and a textbook database, the present study compares the use of discourse particles by expert users of English in Hong Kong with their descriptions and presentations in textbooks designed for learners of English in the same community. Specifically, it investigates the similarities and differences in the use of the discourse particle well between the two datasets in terms of its frequency of occurrence, its positional preference and its discourse function. Results from the analysis show that there are vast differences as regards how the particle well is used in real-world examples and how its use is described and presented in teaching materials. This raises the question to what extent foreign language learners who have minimal exposure to naturally-occurring spoken interactions in English could effectively master the use of discourse particles if they solely rely on these textbooks.
Rebecca Lunsford, Peter A. Heeman, Lois Black, and Jan van Santen, “Autism and the use of fillers: differences between ‘um’ and ‘uh’,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 107-110. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_107.pdf.

Abstract Little research has been done to explore differences in the use of the fillers ‘um’ and ‘uh’ between children with Autistic Spectrum Disorder (ASD) and those with typical development (TD). Quantifying any differences could aid in diagnosing ASD, understanding its nature, and better understanding the mechanisms involved in dialogue processing. In this paper, we report on a study of dialogues between clinicians and children with ASD or TD, finding that the two groups of children differ substantially in their use of ‘um’ but not ‘uh’. This suggests that these two fillers result from different cognitive processes.

Keywords autism, disfluencies, DiSS, fillers
Lucy J. MacGregor, Martin Corley, and David I. Donaldson, “Listening to the sound of silence: disfluent silent pauses in speech have consequences for listeners,” Neuropsychologia, vol. 48, no. 14, 2010, pp. 3982-3992. DOI: 10.1016/j.neuropsychologia.2010.09.024. http://www.sciencedirect.com/science/article/pii/S0028393210004148.

Abstract Silent pauses are a common form of disfluency in speech yet little attention has been paid to them in the psycholinguistic literature. The present paper investigates the consequences of such silences for listeners, using an Event-Related Potential (ERP) paradigm. Participants heard utterances ending in predictable or unpredictable words, some of which included a disfluent silence before the target. In common with previous findings using er disfluencies, the N400 difference between predictable and unpredictable words was attenuated for the utterances that included silent pauses, suggesting a reduction in the relative processing benefit for predictable words. An earlier relative negativity, topographically distinct from the N400 effect and identifiable as a Phonological Mismatch Negativity (PMN), was found for fluent utterances only. This suggests that only in the fluent condition did participants perceive the phonology of unpredictable words to mismatch with their expectations. By contrast, for disfluent utterances only, unpredictable words gave rise to a late left frontal positivity, an effect previously observed following ers and disfluent repetitions. We suggest that this effect reflects the engagement of working memory processes that occurs when fluent speech is resumed. Using a surprise recognition memory test, we also show that listeners were more likely to recognise words which had been encountered after silent pauses, demonstrating that silence affects not only the process of language comprehension but also its eventual outcome. We argue that, from a listener's perspective, one critical feature of disfluency is the temporal delay which it adds to the speech signal.

Keywords Language comprehension, Disfluency, ERPs, Recognition memory, N400, PMN, LPC
Kikuo Maekawa, “Final lowering and boundary pitch movements in spontaneous Japanese,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 47-50. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_047.pdf.

Abstract Standard theory of the prosodic structure in Tokyo Japanese treats both the final lowering and boundary pitch movements as the properties of utterance node. Validity of this treatment was examined by means of corpus-based analyses of spontaneous speech. The results showed that while final lowering could be treated as a property of utterance, boundary pitch movement could not. The latter should rather be treated as the property of accentual phrase. Based on these results, revised prosodic structure and annotation scheme were proposed.

Keywords BPM, CSJ, DiSS, final lowering, X-JToBI
Takehiko Maruyama, Katsuya Takanashi, and Nao Yoshida, “An annotation scheme for syntactic unit in Japanese dialog,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 51-54. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_051.pdf.

Abstract In this paper, we propose a scheme for annotating syntactic units called DCU (Dialog Clause-Unit) in Japanese dialogs. Since there is no explicit devices to mark sentence boundaries in speech, precise definition and criteria must be designed to extract syntactic units from the utterance. We show a design of DCU which consists of clausal and non-clausal units. Annotating DCU tags to eight dialogs of 40 minutes from two different dialog corpora, we examine characteristics of each dialog from the viewpoint of DCU, and compare them to the distribution of clausal-units annotated to monologs.

Keywords clause boundary, dialog clause-unit, DiSS, Japanese dialog and monolog, unit length
Dana McDaniel, Cecile McKee, and Merrill F. Garrett, “Children’s sentence planning: Syntactic correlates of fluency variations,” Journal of Child Language, vol. 37, no. 1, 2010, pp. 59-94. DOI: 10.1017/s0305000909009507. http://journals.cambridge.org/article_S0305000909009507.

Abstract This paper argues for broader consideration of children’s language production systems and, in that context, describes research on children’s planning of syntactic structures. The research presented here measures non-fluency patterns in elicited utterances of varied syntactic type. We describe and interpret several regularities in these patterns for two groups of children (‘young’: three–five-year-olds; and ‘older’: six–eight-year-olds) and an adult comparison group. The evidence indicates a strong correspondence of adult and child responses to structural complexity, both in terms of global fluency measures and in terms of more detailed indicators of planning load. In addition, we report some specific contrasts in the patterning for children and adults that suggest disparities in processing resources and/or in local planning strategies.
Sandra Merlo, and Plínio A. Barbosa, “Periodic cycles of hesitation phenomena in spontaneous speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 19-22. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_019.pdf.

Abstract To verify whether hesitation phenomena are distributed periodically in spontaneous speech, twenty speech samples produced by five male adults were analyzed. Spectral analysis allowed for three main findings. First, hesitations present stationary behavior, which implies they did not accumulate in the beginning, in the middle, or in the end of speech samples. Second, periodic cycles of hesitation phenomena were detected in all speech samples (mean cycle duration around 13 seconds). This implies that regions with more hesitations tended to regularly alternate with regions with fewer hesitations. Third, periodic cycles accounted for about 30% of variance in data.

Keywords DiSS, hesitation phenomena, periodic cycles, time series
Emi Morita, “Salientizing the breaks in talk: a study of Japanese segmentizing,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 59-62. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_059.pdf.

Abstract In naturally occurring conversation, Japanese speakers often break up their turns at talk with seemingly random or disfluent pauses that break the flow of talk into a series of successive small segments which may not be semantically coherent. Moreover, the boundaries between such segments are often made salient via the attachment of interactional particles, such as ne and sa. Empirical observation of such naturally occurring partitioning of talk reveals that such “semantically irregular” segmentation is used by both speakers and their recipients to accomplish a legitimate communicative function in managing the fine-tuned choreography of moment-by-moment conversational interaction.

Keywords DiSS, interactional particles, Japanese conversation, utterance segmentation
Daniel Neiberg, and Joakim Gustafson, “Modeling conversational interaction using coupled Markov chains,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 81-84. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_081.pdf.

Abstract This paper presents a series of experiments on automatic transcription and classification of fillers and feedbacks in conversational speech corpora. A feature combination of PCA projected normalized F0 Constant-Q Cepstra and MFCCs has shown to be effective for standard Hidden Markov Models (HMM). We demonstrate how to model both speaker channel with coupled HMMs and show expected improvements. In particular, we explore model topologies which take advantage of predictive cues for fillers and feedback. This is done by initializing the training with special labels located immediately before fillers in the same channel and immediately before feedbacks in the other speaker channel. The average F-score for a standard HMM is 34.1%, for a coupled HMM 36.7% and for a coupled HMM with pre-filler and pre-feedback labels 40.4%. In a pilot study the detectors are found to be useful for semi-automatic transcription of feedback and fillers in socializing conversations.

Keywords conversation, coupled hidden markov models, cross-speaker modeling, DiSS, feedbacks, fillers
Hannele Nicholson, Kathleen Eberhard, and Matthias Scheutz, “"um...i don’t see any": the function of filled pauses and repairs,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 89-92. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_089.pdf.

Abstract We investigate disfluency distribution rates within different moves from an interactive task-oriented experiment to further explore the suggestion by Bortfeld et al. [1] and Nicholson [2] that different types of disfluencies may fulfill varying functions. We focus on disfluency types within moves, or speech turns, where a speaker initiates something compared to a response to such a move. We find that filled pauses (FPs) such as um or uh fulfilled an interpersonal role for participants while repairs occurred out of difficulty.

Keywords Dialogue, dialogue moves, disfluency, DiSS, Language production
Emanuel A. Schegloff, “Some Other "Uh(m)"s,” Discourse Processes, vol. 47, no. 2, 2010, pp. 130-174. DOI: 10.1080/01638530903223380.

Abstract Recent work on the occurrence of "uh" and "uhm" in ordinary talk-in-interaction is concerned almost exclusively with its relation to trouble in the speech production process. After touching briefly on this environment of occurrence, this conversation-analytic article focuses attention on several interactional environments in which "uh(m)" figures in other ways—most extensively on its use to indicate the "reason-for-the-interaction’s-launching." The underlying theme is that accounts for what gets done and gets understood in talk-in-interaction must take into account not only its composition, but also its position—not only with respect to the grammar of sentences, but also with respect to the organization of turns at talk, of action sequences encompassing multiple turns at talk, and of occasions of talk, all of which are demonstrably oriented to by speakers in their production of the talk and by recipients in their analyzing of the talk.
Norman Segalowitz, Cognitive Bases of Second Language Fluency. London: Routledge, June 2010. http://www.routledge.com/books/details/9780805856620/.

Abstract Exploring fluency from multiple vantage points that together constitute a cognitive science perspective, this book examines research in second language acquisition and bilingualism that points to promising avenues for understanding and promoting second language fluency. Cognitive Bases of Second Language Fluency covers essential topics such as units of analysis for measuring fluency, the relation of second language fluency to general cognitive fluidity, social and motivational contributors to fluency, and neural correlates of fluency. The author provides clear and accessible summaries of foundational empirical work on speech production, automaticity, lexical access, and other issues of relevance to second language acquisition theory. Cognitive Bases of Second Language Fluency is a valuable reference for scholars in SLA, cognitive psychology, and language teaching, and it can also serve as an ideal textbook for advanced courses in these fields.
Kazuki Sekine, “Gesture correction in children,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 71-74. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_071.pdf.

Abstract Speakers sometimes modify their gestures during the process of production into disguised adaptors. Such disguised adaptors can be treated as evidence that speakers can monitor their gestures. This study investigated when disguised adaptors are produced in Japanese elementary school children. The results showed that children did not produce disguised adaptors until the age of 8. The emergence of disguised adaptors suggested that children start to monitor their gestures when they are 9 or 10 years old. Cultural influences and cognitive changes were considered as factors to influence emergence of disguised adaptors.

Keywords adaptors, DiSS, speech error, spontaneous gestures
Shu-Chuan Tseng, and Yun-Ru Huang, “A socio-phonetic analysis of Taiwan Mandarin interview speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 67-70. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_067.pdf.

Abstract This paper presents results of a socio-phonetic analysis of Taiwan Mandarin by using a corpus of questionnaire-based interview speech. Questions were asked to collect data of the interviewee’s background of language use, socio-economic status, and internet access in different regions of Taiwan. Two typical dialect-influenced pronunciation errors, the deletion of /w/ before /o/ and the delabialization of /y/ were analyzed with the associated socio-economic factors and the degree of dialect exposure. The degree of dialect exposure (Southern Min) and the studied pronunciation variants are statistically correlated with the accuracy rate. But no direct correlation was found between the pronunciation variation and the socioeconomic factors.

Keywords DiSS, interview speech, sociophonetics, Taiwan Mandarin
Shu-Chuan Tseng, and Tzu-Lun Lee, “Contextual effects in recognizing reduced words in spontaneous speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 39-42. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_039.pdf.

Abstract This study investigates the effects of context on recognizing reduced word forms in spontaneous speech. Sixteen high-frequency disyllabic targets, eight disyllabic and eight combinations of monosyllabic words are presented to 48 subjects in a spoken word recognition experiment in three conditions: in their original context, in isolation, and embedded in a carrier sentence. Results show that context, degree of reduction, word unit type, gender, and age group all show an effect on the accuracy rates of recognizing the target items. Most interestingly, while a meaningful context helps recognize reduced word forms, a less meaningful context inhibits the recognition more than no context.

Keywords context effect, DiSS, spoken word recognition
Shu-Chuan Tseng, Pei-Chen Tsou, Ko Kuei, and Chien-Wen Lee, “Assessing sentence repetition and narrative speech data produced by hearing-impaired and normally hearing children,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 11-14. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_011.pdf.

Abstract This paper examines sentence repetition and narrative speech data produced by hearing-impaired and normally hearing children with matched gender, age and level of speech comprehension. We assessed these two kinds of speech styles by talker intelligibility, vowel space, and spike production in plosives. In both speaking styles, normally hearing children performed better in talker intelligibility than their hearing-impaired counterparts. No clear vowel space shrinkage was observed in respect of speech style, hearing impairment, and age group. Surprisingly, the production of the spike in plosives was a useful measure for distinguishing acoustic properties of different speaking styles and hearing ability.

Keywords acoustic properties, DiSS, hearing impairment, speaking style, speech assessment
Ioana Vasilescu, Sophie Rosset, and Martine Adda-Decker, “On the functions of the vocalic hesitation euh in interactive man-machine question answering dialogs in French,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 111-114. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_111.pdf.

Abstract This paper deals with the functions of the French vocalic hesitation euh in interactive speech of man-machine question answering dialogs. The present analysis suggests that the vocalic hesitation euh may carry various properties in speech, both disfluent signaling the speakers’ efforts to put the intended message under production into appropriate words, and fluent, as markers of discourse structure. Moreover, euh seems to play a role in bracketing lexical units, pointing to the informative content within an utterance. This bracketing may favour intelligibility or decoding fluency on the listener’s side. The potential contribution of the vocalic hesitation euh to lexical information bracketing is investigated with the goal of improved information processing by QA systems. Future objectives include a smarter interaction capacity by an appropriate usage of such euh items.

Keywords dialog corpus, Discourse markers, disfluency, DiSS, Fluency, French, Q/A, vocalic hesitation
Kun-Ching Wang, Chiun-Li Chin, and Yi-Hsing Tsai, “Voice activity detection based on combination of weighted sub-band features using auto-correlation function,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 85-88. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_085.pdf.

Abstract This paper shows the voice activity detection (VAD) based on combination of weighted sub-band features using autocorrelation function. According to the fact that the noise corruption on each sub-band is different from each other, so the estimated signal to noise ratio (SNR) is employed to weight utility rate of each frequency sub-band. Furthermore, a strategy of sub-band features combination is used to integrate all of weighted sub-band auto-correlation function feature parameter and to develop the combined feature parameter. Experimental results demonstrate that the proposed VAD achieves better performance than existing standard VADs at any noise level.

Keywords auto-correlation, DiSS, feature combination, sub-band weighting, voice activity detection, wavelet packet transform
Michiko Watanabe, and Yasuharu Den, “Utterance-initial elements in Japanese: a comparison among fillers, conjunctions, and topic phrases,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 31-34. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_031.pdf.

Abstract Speakers need to plan the following part of speech under the pressure of a temporal imperative at utterance-initial positions. Each language seems to have some devices to solve this problem, which we call utterance-initial elements (UIEs). We investigated effects of two factors, boundary strengths and complexity of the following constituents, on the durations of possible UIEs, such as fillers, conjunctions, and topic phrases. We found that the last mora of filler e, as well as wa-marked topic phrases, became longer as the complexity increased in certain conditions. Possible interpretations for the results are discussed.

Keywords boundary strengths, constituent complexity, DiSS, prolongation, utterance-initial elements
Li-chiung Yang, “Meaning and use: a pragmatic and prosodic analysis of interjections in conversational speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 75-78. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_075.pdf.

Abstract In this paper we report on our research on the pragmatic-contextual meaning and prosody of three interjections ey, wa, and oh. A detailed qualitative-contextual analysis of our corpus shows that these interjections share important contextual and prosodic characteristics due to their similar functional status with respect to new or unexpected information. We show that there are also significant differences in contextual meaning arising from specific emotional or cognitive states, and that these differences are expressively communicated in the varied prosody of each interjection.

Keywords discourse, DiSS, interjections, meaning, prosody
Etsuko Yoshida, and Robin J. Lickley, “Disfluency patterns in dialogue processing,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 115-118. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_115.pdf.

Abstract Spontaneous speech abounds with disfluencies such as filled pauses, repairs, repetitions, false start and prolongations, all of which are significant but easily overlooked features of speech communication. Based on the comparable corpora of English and Japanese dialogues, we argue that disfluency features can have a positive effect on turn-taking issues and the establishment of common referring expressions in dialogue processing. We examined the occurrence of ten types of filled pauses in Japanese and investigated how they interact with discourse entities and the sharing of common ground. The results indicate that two patterns of disfluency features contribute to on-line speech planning of the participants and their four functions serve to construct the collaborative process of speech communication.

Keywords common ground, corpus, Dialogue, disfluency, DiSS, referring expressions

2009

Kartik Audhkhasi, Kundan Kandhway, Om D. Deshmukh, and Ashish Verma, “Formant-based technique for automatic filled-pause detection in spontaneous spoken English,” in 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, pp. 4857-4860. DOI: 10.1109/ICASSP.2009.4960719.

Abstract Detection of filled pauses is a challenging research problem which has several practical applications. It can be used to evaluate the spoken fluency skills of the speaker, to improve the performance of automatic speech recognition systems or to predict the mental state of the speaker. This paper presents an algorithm for filled pause detection that is based on the premise that the vocal tract characteristics, and hence the formants, are stable during the production of a filled pause. The performance of the proposed algorithm is evaluated on real-life recordings of call center agents where the locations of the filled pauses are hand labeled. The proposed algorithm outperforms a standard cepstral stability based filled pause detection algorithm and a standard pitch-based detection technique.
Tracey M. Derwing, Murray J. Munro, Ron I. Thomson, and Marian J. Rossiter, “The Relationship between L1 Fluency and L2 Fluency Development,” Studies in Second Language Acquisition, vol. 31, no. 4, December 2009, pp. 533-557. DOI: 10.1017/S0272263109990015.

Abstract A fundamental question in the study of second language (L2) fluency is the extent to which temporal characteristics of speakers’ first language (L1) productions predict the same characteristics in the L2. A close relationship between a speaker’s L1 and L2 temporal characteristics would suggest that fluency is governed by an underlying trait. This longitudinal investigation compared L1 and L2 English fluency at three times over 2 years in Russian- and Ukrainian- (which we will refer to here as Slavic) and Mandarin-speaking adult immigrants to Canada. Fluency ratings of narratives by trained judges indicated a relationship between the L1 and the L2 in the initial stages of L2 exposure, although this relationship was found to be stronger in the Slavic than in the Mandarin learners. Pauses per second, speech rate, and pruned syllables per second were all related to the listeners’ judgments in both languages, although vowel durations were not. Between-group differences may reflect differential exposure to spoken English and a closer relationship between Slavic languages and English than between Mandarin and English. Suggestions for pedagogical interventions and further research are also proposed.
Rod Ellis, “The Differential Effects of Three Types of Task Planning on the Fluency, Complexity, and Accuracy in L2 Oral Production,” Applied Linguistics, vol. 30, no. 4, December 2009, pp. 474-509. DOI: 10.1093/applin/amp042. http://applij.oxfordjournals.org/content/30/4/474.abstract.

Abstract The main purpose of this article is to review studies that have investigated the effects of three types of planning (rehearsal, pre-task planning, and within-task planning) on the fluency, complexity, and accuracy of L2 performance. All three types of planning have been shown to have a beneficial effect on fluency but the results for complexity and accuracy are more mixed, reflecting both the type of planning and also the mediating role of various factors, including task design and implementation variables and individual difference factors. A secondary purpose is to outline a theory that can account for the role that planning plays in L2 performance. The article concludes with a list of limitations in the research to date.
Klaus Zechner, Derrick Higgins, Xiaoming Xi, and David M. Williamson, “Automatic scoring of non-native spontaneous speech in tests of spoken English,” Speech Communication, vol. 51, no. 10, 2009, pp. 883-895. DOI: 10.1016/j.specom.2009.04.009. http://www.sciencedirect.com/science/article/pii/S0167639309000703.

Abstract This paper presents the first version of the SpeechRater℠ system for automatically scoring non-native spontaneous high-entropy speech in the context of an online practice test for prospective takers of the Test of English as a Foreign Language® internet-based test (TOEFL® iBT). The system consists of a speech recognizer trained on non-native English speech data, a feature computation module, using speech recognizer output to compute a set of mostly fluency based features, and a multiple regression scoring model which predicts a speaking proficiency score for every test item response, using a subset of the features generated by the previous component. Experiments with classification and regression trees (CART) complement those performed with multiple regression. We evaluate the system both on TOEFL Practice data [TOEFL Practice Online (TPO)] as well as on Field Study data collected before the introduction of the TOEFL iBT. Features are selected by test development experts based on both their empirical correlations with human scores as well as on their coverage of the concept of communicative competence. We conclude that while the correlation between machine scores and human scores on TPO (of 0.57) still differs by 0.17 from the inter-human correlation (of 0.74) on complete sets of six items (Pearson r correlation coefficients), the correlation of 0.57 is still high enough to warrant the deployment of the system in a low-stakes practice environment, given its coverage of several important aspects of communicative competence such as fluency, vocabulary diversity, grammar, and pronunciation.
Another reason why the deployment of the system in a low-stakes practice environment is warranted is that this system is an initial version of a long-term research and development program where features related to vocabulary, grammar, and content will be added in a later stage when automatic speech recognition performance improves, which can then be easily achieved without a re-design of the system. Exact agreement on single TPO items between our system and human scores was 57.8%, essentially at par with inter-human agreement of 57.2%. Our system has been in operational use to score TOEFL Practice Online Speaking tests since the Fall of 2006 and has since scored tens of thousands of tests.

Keywords Speaking assessment
Lucy J. MacGregor, Martin Corley, and David I. Donaldson, “Not all disfluencies are are equal: The effects of disfluent repetitions on language comprehension,” Brain and Language, vol. 111, no. 1, 2009, pp. 36-45. DOI: 10.1016/j.bandl.2009.07.003. http://www.sciencedirect.com/science/article/pii/S0093934X09000819.

Abstract Disfluencies can affect language comprehension, but to date, most studies have focused on disfluent pauses such as er. We investigated whether disfluent repetitions in speech have discernible effects on listeners during language comprehension, and whether repetitions affect the linguistic processing of subsequent words in speech in ways which have been previously observed with ers. We used event-related potentials (ERPs) to measure participants’ neural responses to disfluent repetitions of words relative to acoustically identical words in fluent contexts, as well as to unpredictable and predictable words that occurred immediately post-disfluency and in fluent utterances. We additionally measured participants’ recognition memories for the predictable and unpredictable words. Repetitions elicited an early onsetting relative positivity (100–400ms post-stimulus), clearly demonstrating listeners’ sensitivity to the presence of disfluent repetitions. Unpredictable words elicited an N400 effect. Importantly, there was no evidence that this effect, thought to reflect the difficulty of semantically integrating unpredictable compared to predictable words, differed quantitatively between fluent and disfluent utterances. Furthermore there was no evidence that the memorability of words was affected by the presence of a preceding repetition. These findings contrast with previous research which demonstrated an N400 attenuation of, and an increase in memorability for, words that were preceded by an er. However, in a later (600–900ms) time window, unpredictable words following a repetition elicited a relative positivity. Reanalysis of previous data confirmed the presence of a similar effect following an er. The effect may reflect difficulties in resuming linguistic processing following any disruption to speech.

Keywords Language comprehension, Disfluency, Speech, ERPs, Repetitions
Laura M. Pfeifer, and Timothy Bickmore, “Should Agents Speak Like, um, Humans? The Use of Conversational Fillers by Virtual Agents,” in Intelligent Virtual Agents. IVA 2009. Lecture Notes in Computer Science, vol. 5773, Berlin, Heidelberg, Springer, 2009, pp. 460-466. DOI: 10.1007/978-3-642-04380-2_50. https://link.springer.com/chapter/10.1007/978-3-642-04380-2_50.

Abstract We describe the design and evaluation of an agent that uses the fillers um and uh in its speech. We describe an empirical study of human-human dialogue, analyzing gaze behavior during the production of fillers and use this data to develop a model of agent-based gaze behavior. We find that speakers are significantly more likely to gaze away from their dialogue partner while uttering fillers, especially if the filler occurs at the beginning of a speaking turn. This model is evaluated in a preliminary experiment. Results indicate mixed attitudes towards an agent that uses conversational fillers in its speech.

Keywords embodied conversational agent, fillers, filled pause, gaze

2008

Philip Collard, Martin Corley, Lucy J. MacGregor, and David I. Donaldson, “Attention orienting effects of hesitations in speech: Evidence from ERPs,” Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 34, no. 3, May 2008, pp. 696-702. DOI: 10.1037/0278-7393.34.3.696.

Abstract Filled-pause disfluencies such as um and er affect listeners' comprehension, possibly mediated by attentional mechanisms (J. E. Fox Tree, 2001). However, there is little direct evidence that hesitations affect attention. The current study used an acoustic manipulation of continuous speech to induce event-related potential components associated with attention (mismatch negativity [MMN] and P300) during the comprehension of fluent and disfluent utterances. In fluent cases, infrequently occurring acoustically manipulated target words gave rise to typical MMN and P300 components when compared to nonmanipulated controls. In disfluent cases, where targets were preceded by natural sounding hesitations culminating in the filled pause er, an MMN (reflecting a detection of deviance) was still apparent for manipulated words, but there was little evidence of a subsequent P300. This suggests that attention was not reoriented to deviant words in disfluent cases. A subsequent recognition test showed that nonmanipulated words were more likely to be remembered if they had been preceded by a hesitation. Taken together, these results strongly implicate attention in an account of disfluency processing: Hesitations orient listeners' attention, with consequences for the immediate processing and later representation of an utterance.
Martin Corley, and Oliver W. Stewart, “Hesitation Disfluencies in Spontaneous Speech: The Meaning of um,” Language and Linguistics Compass, vol. 2, no. 4, July 2008, pp. 589-602. DOI: 10.1111/j.1749-818X.2008.00068.x.

Abstract Human speech is peppered with ums and uhs, among other signs of hesitation in the planning process. But are these so-called fillers (or filled pauses) intentionally uttered by speakers, or are they side-effects of difficulties in the planning process? And how do listeners respond to them? In the present paper, we review evidence concerning the production and comprehension of fillers such as um and uh, in an attempt to determine whether they can be said to be ‘words’ with ‘meanings’ that are understood by listeners. We conclude that, whereas listeners are highly sensitive to hesitation disfluencies in speech, there is little evidence to suggest that they are intentionally produced, or should be considered to be words in the conventional sense.
Tracey M. Derwing, Murray J. Munro, and Ron I. Thomson, “A Longitudinal Study of ESL Learners’ Fluency and Comprehensibility Development,” Applied Linguistics, vol. 29, no. 3, 2008, pp. 359-380. DOI: 10.1093/applin/amm041. http://applij.oxfordjournals.org/content/29/3/359.abstract.

Abstract This longitudinal mixed-methods study compared the oral fluency of well-educated adult immigrants from Mandarin and Slavic language backgrounds (16 per group) enrolled in introductory English as a second language (ESL) classes. Speech samples were collected over a 2-year period, together with estimates of weekly English use. We also conducted interviews at the last data collection session. The participants’ fluency and comprehensibility at three points over 22 months were judged by 33 native speakers of English. We examine the learners’ progress in light of their exposure to English outside of their ESL class. The Slavic language speakers showed a small but significant improvement in both fluency and comprehensibility, whereas the Mandarin speakers’ performance did not change over 2 years, although both groups started at the same level of oral proficiency. These differences may be attributable in part to degree of exposure to English outside the ESL courses. Neither group had extensive exposure outside of their classes because of employment and familial responsibilities (although the Slavic language speakers reported more opportunities). Thus both groups may have been disadvantaged by a lack of oral fluency instruction. The findings, both quantitative and qualitative, are interpreted using the Willingness to Communicate framework; we also discuss implications for the language classroom.
Michael Erard, Um... Slips, Stumbles, and Verbal Blunders, and What They Mean. New York: Penguin Random House, August 2008. https://www.penguinrandomhouse.com/books/46803/um---by-michael-erard/.

Abstract This original, entertaining, and surprising book investigates verbal blunders: what they are, what they say about those who make them, and how and why we’ve come to judge them. Um… is about how you really speak, and why it’s normal for your everyday speech to be filled with errors—about one in every ten words. In this charming, engaging account of language in the wild, linguist and writer Michael Erard also explains why our attention to some blunders rises and falls. Where did the Freudian slip come from? Why do we prize "umlessness" in speaking—and should we? And how do we explain the American presidents who are famous for their verbal stumbles? Full of entertaining examples, Um… is essential reading for talkers and listeners of all stripes.
Gaëtanelle Gilquin, “Hesitation markers among EFL learners: Pragmatic deficiency or difference?,” in Pragmatics and Corpus Linguistics: A Mutualistic Entente, Romero-Trillo, Jesús, Ed. Berlin: De Gruyter Mouton, September 2008, pp. 119-150. DOI: 10.1515/9783110199024.119. https://www.degruyter.com/view/book/9783110199024/10.1515/9783110199024.119.xml.

Abstract Spoken corpora, by giving access to detailed transcriptions of authentic speech, have made it possible to study hesitation phenomena with a precision and reliability that were practically unattainable before. Taking advantage of the availability of spoken corpora, and more precisely of a newcomer to the field, the spoken learner corpus, which contains samples of non-native speech (NNS), this paper sets out to investigate the function of hesitation among EFL (English as a Foreign Language) learners with French as a mother tongue, using as a baseline the way the function is performed in native speech (NS). The paper is structured as follows. First the function of hesitation in speech is briefly introduced. Next, the two corpora on which the study is based (LOCNEC and LINDSEI-FR) and the three categories of hesitation phenomena investigated (pauses, smallwords and other devices) are described. The following two sections present the main results of the corpus-based analysis and discuss these results in the light of Foreign Language Teaching (FLT) and English as a Lingua Franca (ELF). Section 7 concludes the paper.
Carla L. Hudson Kam, and Nicole A. Edwards, “The use of uh and um by 3- and 4-year-old native English-speaking children: Not quite right but not completely wrong,” First Language, vol. 28, no. 3, August 2008, pp. 313-327. DOI: 10.1177/0142723708091149. http://fla.sagepub.com/content/28/3/313.abstract.

Abstract The delay markers (DMs) 'uh' and 'um' are often used by adult English speakers to indicate that an upcoming pause is due to a speech disruption, not the end of a conversational turn. Moreover, 'uh' and 'um' indicate different degrees of disruption (Clark & Fox Tree, 2002). Thus, it appears that children must learn how to use DMs appropriately. In the current study we examined DM use in elicited speech samples from 24 3- and 4-year-old children. We found that pauses following DMs were longer than those not following a DM, but that there was no difference between the pauses following 'uh' and 'um'. Children at this age, then, appear to understand the basic use of DMs, but do not yet differentiate between them.

Keywords Conversational development, disfluencies, filled pauses, narrative, turn-taking
T. Florian Jaeger, and Celeste Kidd, “A Unified Model of Redundancy Avoidance and Strategic Lengthening,” in The 21st CUNY Sentence Processing Conference, March 2008. https://www.researchgate.net/publication/228797456_A_Unified_Model_of_Redundancy_Avoidance_and_Strategic_Lengthening.

Abstract Recent studies have revealed an intriguing link between redundancy and reduction: words that are more predictable in their context are more commonly reduced (shorter and with less articulatory detail [1,2,3]). These studies have, however, also found a puzzling asymmetry: Content words are reduced when predictable given the previous word, but function words are reduced when predictable given the following word. We present a solution to this puzzle that unifies work on redundancy with work on strategic lengthening [4]. We find that the apparent backward-predictability effect on function word reduction is an artifact caused by speakers' tendency to slow pronunciation when the next word is unavailable.
Lucy J. MacGregor, “Disfluencies affect language comprehension: evidence from event-related potentials and recognition memory,” Master's Thesis, The University of Edinburgh. 2008. http://hdl.handle.net/1842/3311.

Abstract Everyday speech is littered with disfluencies such as filled pauses, silent pauses, repetitions and repairs which reflect a speaker’s language production difficulties. But what are the effects on language comprehension? This thesis took a novel approach to the study of disfluencies by combining an investigation of the immediate effects on language processing with an investigation of the longer-term effects for the representation of language in memory. A series of experiments is reported which reflects the first attempt at a systematic investigation of the effects of different types of disfluencies on language comprehension. The experiments focused on the effects of three types of disfluencies—ers, silent pauses, and repetitions—on the comprehension of subsequent words. Critical words were either straightforward continuations of the pre-interrupted speech or a repair word which corrected the pre-interrupted speech. In addition, the effects that occur when er, repetition, and repair disfluencies themselves are processed, were assessed. ERPs showed that the N400 effect elicited in response to contextually unpredictable compared to predictable words was attenuated by the presence of a pre-target er reflecting a reduction in the standard difference where unpredictable words are more difficult to integrate into their contexts. This finding suggests that ers may reduce the extent to which listeners make predictions about upcoming words. In addition, words preceded by an er were more likely to be correctly recognised in a subsequent memory test. These findings demonstrate a longer-term consequence for representation which may reflect heightened attention during processing. Silent pauses did not affect the N400 but there was some indication of an effect on recognition memory. Repetition disfluencies did not affect the N400 or recognition memory. These findings demonstrate the importance of the nature of the disruption to speech. 
For all types of disfluent utterances, unpredictable words elicited a Late Positive Complex (LPC), possibly reflecting processes associated with memory retrieval and control as listeners attempted to resume structural fluency after any interruption. Ers themselves elicited standard attention-related ERP effects: the Mismatch Negativity (MMN) and P300 effects, supporting the possibility that ers heighten attention. Repetition disfluencies elicited a right posterior positivity, reflecting detection of the disfluency and possibly syntactic reanalysis. Repair disfluencies elicited an early frontal negativity, possibly related to the detection of a word category violation, and a P600 effect, reflecting syntactic reanalysis. The presence of an er preceding the repair eliminated the early negativity, but had no effect on the P600 suggesting that ers may prepare listeners for the possibility of an upcoming repair, but that they do not reduce the difficulty associated with reanalysis. Taken together, the results from the studies reported in the thesis support an account of disfluency processing which incorporates both prediction and attention.

Keywords Language comprehension, Psychology
Ralph L. Rose, “Filled Pauses in Language Teaching: Why and How,” Bulletin of Gunma Prefectural Women’s University, vol. 29, 2008, pp. 47-64. http://www.roselab.sci.waseda.ac.jp/resources/file/teachingfps.pdf.

Abstract Filled Pauses (uh, um) are ubiquitous elements of spontaneous speech but have received relatively little attention in second language teaching. Perhaps this is because filled pauses have often been regarded as meaningless elements resulting from speech processing difficulties. This paper draws from research in widely disparate fields to show that speakers and listeners use them systematically and meaningfully. These facts are used to generate a unified and coherent model of filled pauses in spontaneous speech. This model is then used to develop a concept of communicative competence in which filled pauses play a role at the interface between pragmatic constraints and communication strategies. The article concludes with practical recommendations for how filled pauses may be incorporated into the second-language teaching curriculum.
Michiko Watanabe, Keikichi Hirose, Yasuharu Den, and Nobuaki Minematsu, “Filled pauses as cues to the complexity of upcoming phrases for native and non-native listeners,” Speech Communication, vol. 50, no. 2, February 2008, pp. 81-94. DOI: 10.1016/j.specom.2007.06.002.

Abstract We examined whether filled pauses (FPs) affect listeners’ predictions about the complexity of upcoming phrases in Japanese. Studies of spontaneous speech corpora show that constituents tend to be longer or more complex when they are immediately preceded by FPs than when they are not. From this finding, we hypothesized that FPs cause listeners to expect that the speaker is going to refer to something that is likely to be expressed by a relatively long or complex constituent. In the experiments, participants listened to sentences describing both simple and compound shapes on a computer screen. Their task was to press a button as soon as they had identified the shape corresponding to the description. Phrases describing shapes were immediately preceded by a FP, a silent pause of the same duration, or no pause. We predicted that listeners’ response times to compound shapes would be shorter when there is a FP before phrases describing the shape than when there is no FP, because FPs are good cues to complex phrases, whereas response times to simple shapes would not be shorter with a preceding FP than without. The results of native Japanese and proficient non-native Chinese listeners agreed with the prediction and provided evidence to support the hypothesis. Response times of the least proficient non-native listeners were not affected by the existence of FPs, suggesting that the effects of FPs on non-native listeners depend on their language proficiency.
Chen-huei Wu, “Filled Pauses in L2 Chinese: A Comparison of Native and Non-Native Speakers,” in Proceedings of the 20th North American Conference on Chinese Linguistics (NACCL-20), Columbus, Ohio, The Ohio State University, 2008, pp. 213-227. http://chinalinks.osu.edu/naccl/naccl-20/NACCL-20_Proceedings.htm.

Abstract The aim of this paper is to determine whether native and non-native speech can be predicted on the basis of temporal measurements of filled pauses by training a Classification and Regression Tree (Breiman et al. 1984). On the basis of the present results, several conclusions can be drawn: First, distinguishing between native and non-native speech can increase in accuracy based on temporal measurements of FPs. Among these variables, the rate of speech appears to be the best predictor. Second, this study suggests that information from the FPs ‘uh’ and ‘um’ is a useful predictor of fluency in further differentiating native/nonnative speakers. Third, the classification can be accurately predicted with a small set of variables.

2007

Karl G.D. Bailey, and Fernanda Ferreira, “The Processing of Filled Pause Disfluencies in the Visual World,” in Eye movements: A window on mind and brain, Van Gompel, Roger P.G. and Murray, Wayne S. and Fischer, Martin H. and Hill, Robin L., Eds. Amsterdam: Elsevier, 2007, ch. 22, pp. 485-500. DOI: 10.1016/B978-008044980-7/50024-0.

Abstract One type of spontaneous speech disfluency is the filled pause, in which a filler (e.g. uh) interrupts production of an utterance. We report a visual world experiment in which participants’ eye movements were monitored while they responded to ambiguous utterances containing filled pauses by manipulating objects placed in front of them. Participants’ eye movements and actions suggested that filled pauses informed resolution of the current referential ambiguity, but did not affect the final parse. We suggest that filled pauses may inform the resolution of whatever ambiguity is most salient in a given situation.
Esther de Leeuw, “Hesitation Markers in English, German, and Dutch,” Journal of Germanic Linguistics, vol. 19, no. 2, 2007, pp. 85-114. DOI: 10.1017/S1470542707000049.

Abstract This study reports on a number of highly significant differences found between English, German, and Dutch hesitation markers. English and German native speakers used significantly more vocalic-nasal hesitation markers than Dutch native speakers, who used predominantly vocalic hesitation markers. English hesitation markers occurred most frequently when preceded by silence and followed by a lexical item, or when surrounded by silence. German and Dutch hesitation markers occurred most frequently surrounded by lexical items. In Dutch, vocalic-nasal hesitation markers dominated only when surrounded by silence. Vocalic-nasal hesitation markers dominated in all positions in English and German, although in the former language this was more salient than in the latter. Nasal hesitation markers were used significantly more frequently in German than in English or Dutch. In addition to overall language trends, speaker-specific differences, especially within German and Dutch, were observed. These results raise questions in terms of the symptom versus signal hypotheses regarding the function of hesitation markers.
Carol Fehringer, and Christina Fry, “Hesitation phenomena in the language production of bilingual speakers: The role of working memory,” Folia Linguistica, vol. 41, no. 1-2, June 2007, pp. 37-72. DOI: 10.1515/flin.41.1-2.37. http://related.springerprotocols.com/lp/de-gruyter/hesitation-phenomena-in-the-language-production-of-bilingual-speakers-1GCcNqDqgA.

Abstract This paper is an empirical investigation of the use of hesitation phenomena, specifically filled pauses (ums and ers), automatisms (sort of, at the end of the day), repetitions and reformulations, in both the mother tongue (L1) and second language (L2) of highly proficient adult bilingual speakers (English and German). Its purpose is to ascertain: i) whether speakers who are highly proficient in L2 produce an approximately similar amount of hesitation phenomena in both languages; and ii) whether the production of such elements (in both languages) is linked to working memory capacity. Results show that: i) despite high proficiency, speakers produced a higher overall rate of hesitation phenomena in their L2, indicating that there was an additional cognitive load imposed by working in L2; and ii) in each language there was an underlying negative relationship between memory capacity and the production of hesitation phenomena, implying that speakers with lower memory ability rely more heavily on such time-buying devices. Furthermore, it was shown that the individual types of hesitation phenomena produced by speakers in their L1 were carried over into their L2, which suggests that a speaker’s planning behaviour is mirrored in both languages.

Keywords bilingual, hesitation, L2, memory, prefabricated utterance, Speech production, working
Jean E. Fox Tree, “Folk notions of um and uh, you know, and like,” Text & Talk, vol. 27, no. 3, 2007, pp. 297-314. DOI: 10.1515/TEXT.2007.012. https://www.degruyter.com/view/j/text.2007.27.issue-3/text.2007.012/text.2007.012.xml.

Abstract The current study measures laypeople’s uses of 'um', 'uh', 'you know', and 'like', including folk notions of meanings, self-assessments of use, history of discussing use, and attitudes toward the words. Unlike the prevalent idea in the popular press that these discourse markers are interchangeable speaker production flaws, respondents in this study demonstrated that people do possess folk notions of meanings and uses that dramatically distinguish markers from each other. 'Um' and 'uh' were thought to indicate production trouble, 'you know' was thought to be used in checking for understanding and connecting with listeners, and 'like' defied definition. The folk notions of 'um', 'uh', and 'you know' accord well with researchers’ ideas about the meanings of these words. The use of 'like' may be too subtle for laypeople to articulate. Most researchers’ views of 'like' involve some kind of discrepancy between what’s said and what’s meant. Even if they cannot state a meaning, people do treat the different markers differently.

Keywords Discourse markers, fillers, like, meaning, spontaneous speech, you know
Irena O’Brien, Norman Segalowitz, Barbara Freed, and Joe Collentine, “Phonological Memory Predicts Second Language Oral Fluency Gains in Adults,” Studies in Second Language Acquisition, vol. 29, no. 04, 2007, pp. 557-581. DOI: 10.1017/s027226310707043x. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=1392672&fulltextType=RA&fileId=S027226310707043X.

Abstract This study investigated the relationship between phonological memory and second language (L2) fluency gains in native English-speaking adults learning Spanish in two learning contexts: at their home university or abroad in an immersion context. Phonological memory (operationalized as serial nonword recognition) and Spanish oral fluency (temporal/hesitation phenomena) were assessed at two times, 13 weeks apart. Hierarchical regressions showed that, after the variance attributable to learning context was partialed out, initial serial nonword recognition performance was significantly associated with L2 oral fluency development, explaining 4.5-9.7% of unique variance. These results indicate that phonological memory makes an important contribution to L2 learning in terms of oral fluency development. Furthermore, these results from an adult population extend conclusions from previous studies that have claimed a role for phonological memory primarily in vocabulary development in younger populations.
Pavel Trofimovich, and Wendy Baker, “Learning prosody and fluency characteristics of second language speech: The effect of experience on child learners’ acquisition of five suprasegmentals,” Applied Psycholinguistics, vol. 28, no. 2, 2007, pp. 251-276. DOI: 10.1017/s0142716407070130. http://journals.cambridge.org/article_S0142716407070130.

Abstract This study examined second language (L2) experience effects on children’s acquisition of fluency- (speech rate, frequency, and duration of pausing) and prosody-based (stress timing, peak alignment) suprasegmentals. Twenty Korean children (age of arrival in the United States = 7-11 years, length of US residence = 1 vs. 11 years) and 20 age-matched English monolinguals produced six English sentences in a sentence repetition task. Acoustic analyses and listener judgments were used to determine how accurately the suprasegmentals were produced and to what extent they contributed to foreign accent. Results indicated that the children with 11 years of US residence, unlike those with 1 year of US residence, produced all but one (speech rate) suprasegmentals natively. Overall, findings revealed similarities between L2 segmental and suprasegmental learning.
Ioana Vasilescu, Rena Nemoto, and Martine Adda-Decker, “Vocalic Hesitations vs Vocalic Systems: A Cross-Language Comparison,” in 16th International Congress of Phonetic Sciences, 2007. http://www.icphs2007.de/conference/Papers/1504/index.html.

Abstract This paper deals with the acoustic characteristics of vocalic hesitations in a cross-language perspective. The underlying questions concern the "neutral" vs. language-dependent timbre of vocalic hesitations and the link between their vocalic quality and the phonemic system of the language. An additional point of interest concerns the duration effect on vocalic hesitations compared to intra-lexical vowels. Acoustic measurements have been carried out in American English, French and Spanish. Results on vocalic timbre show that hesitations (i) carry language-specific information; (ii) whereas often close to measurements of existing vowels, they do not necessarily collapse with them. Finally, (iii) duration variation affects the timbre of vocalic hesitation and a centralization towards a "neutral" realization is observed for decreasing durations.

Keywords centralization, duration, timbre, vocalic hesitation, vocalic systems

2006

Felix K. Ameka, “Interjections,” in Encyclopedia of Language & Linguistics, Brown, Keith, Ed. Oxford, UK: Oxford, 2006, pp. 743-746. DOI: 10.1016/B0-08-044854-2/00396-5.

Abstract Interjections are words that conventionally constitute utterances by themselves and express a speaker’s current mental state or reaction toward an element in the linguistic or extralinguistic context. Some English interjections are words such as yuk! ‘I feel disgusted,’ ow! ‘I feel sudden pain,’ wow! ‘I feel surprised and I am impressed,’ aha! ‘I now understand,’ hey! ‘I want someone’s attention,’ damn! ‘I feel frustrated,’ and bother! ‘I feel annoyed.’ Such words are found in all languages of the world. This article surveys the different uses and definitions of the term ‘interjection’ and the different types of interjections that are found in the languages of the world. It also explores the relationship of interjections to other pragmatic devices such as particles, discourse markers, and speech formulae.

Keywords formulaic language, Indexicality, interjections, language functions, onomatopoeia, particles, routines, speech acts
Richard Bello, “Causes and paralinguistic correlates of interpersonal equivocation,” Journal of Pragmatics, vol. 38, no. 9, 2006, pp. 1430-1441. DOI: 10.1016/j.pragma.2005.09.001.

Abstract This paper examines the long standing theory of the Bavelas group which suggests that the only consistent cause of interpersonal equivocation is avoidance-avoidance conflict (AAC), and it also attempts to uncover a psycholinguistic profile of equivocation, especially in the form of paralinguistic cues such as dysfluencies. Participants responded orally to questions from hypothetical interlocutors within scenarios which manipulated both the presence/absence of AAC and level of situational formality. Their responses (72 messages) were audio taped, transcribed, rated for degree of equivocation, and coded for dysfluencies. Results of ANOVA showed that AAC not only resulted in more equivocation, but also that formality level interacted with AAC in influencing equivocation. Participants used filled pauses, surprisingly, in the condition within which they equivocated the least, although they produced other dysfluencies (combined) within conditions where they equivocated the most. Results are discussed in terms of the notion that filled pauses are special and in terms of interpersonal deception theory.

Keywords avoidance-avoidance conflict, disfluencies, Equivocation, filled pauses, Informality, Interpersonal communication, Paralinguistics
Stefan Benus, Frank Enos, Julia Hirschberg, and Elizabeth Shriberg, “Pauses in Deceptive Speech,” in Speech Prosody 18, Dresden, Germany, 2006, pp. 2-5. http://aune.lpl.univ-aix.fr/sprosig/sp2006/.

Abstract We use a corpus of spontaneous interview speech to investigate the relationship between the distributional and prosodic characteristics of silent and filled pauses and the intent of an interviewee to deceive an interviewer. Our data suggest that the use of pauses correlates more with truthful than with deceptive speech, and that prosodic features extracted from filled pauses themselves as well as features describing contextual prosodic information in the vicinity of filled pauses may facilitate the detection of deceit in speech.
Alex Boulton, “To er is human: Silent pauses and speech dysfunctions of the 2004 US presidential debates,” in Le Désaccord, Pereiro, M. and Daniels, H., Ed. Nancy: AMAES, 2006, pp. 7-32. http://hal.archives-ouvertes.fr/hal-00114282/en/.

Abstract It has become fashionable, even axiomatic in some circles today, to suppose that politics is all about form, not content—it’s not what they say but the way that they say it. It ought to follow that the most powerful politicians should be the best speakers, so this paper takes as its starting point the 2004 US presidential debates. These televised confrontations, where each candidate has to react to new questions as well as to counter his opponent, are notoriously high-risk, and present considerable opportunities for various speech "dysfunctions". These are analysed in relation to media reaction and public perception of the outcome.

Keywords cognitive science, disfluency, hesitation, linguistics, presidential debate, speed of articulation
Martin Corley, Lucy J. MacGregor, and David Donaldson, “It’s the way that you, er, say it: Hesitations in speech affect language comprehension,” Cognition, vol. 105, no. 3, 2006, pp. 658-698. DOI: 10.1016/j.cognition.2006.10.010. http://www.elsevier.com/locate/COGNIT.

Abstract Everyday speech is littered with disfluency, often correlated with the production of less predictable words (e.g., Beattie & Butterworth [Beattie, G., & Butterworth, B. (1979). Contextual probability and word frequency as determinants of pauses in spontaneous speech. Language and Speech, 22, 201-211.]). But what are the effects of disfluency on listeners? In an ERP experiment which compared fluent to disfluent utterances, we established an N400 effect for unpredictable compared to predictable words. This effect, reflecting the difference in ease of integrating words into their contexts, was reduced in cases where the target words were preceded by a hesitation marked by the word er. Moreover, a subsequent recognition memory test showed that words preceded by disfluency were more likely to be remembered. The study demonstrates that hesitation affects the way in which listeners process spoken language, and that these changes are associated with longer-term consequences for the representation of the message.

Keywords disfluency, ERPs, Language comprehension, speech
Chika Nagaoka, “Mutual influence of nonverbal behavior in interpersonal communication,” Japanese Journal of Interpersonal and Social Psychology, vol. 6, 2006, pp. 101-112. http://syasin.hus.osaka-u.ac.jp/jjisp/006/nagaoka-a.html.

Abstract In social interactions, the interactants’ nonverbal behavior may synchronize and become similar. In this study, the author called this phenomenon ‘synchrony tendency’. Since conventional research about this phenomenon has been conducted from various angles separately, there has been almost no attempt to examine the role of synchrony tendency systematically. In this light, the present study aims at reviewing synchrony tendency based on previous studies from various fields and perspectives. The synchrony tendency has been observed in various communication channels, and in various forms, such as interspeaker congruence of paralanguage, convergence of accents in cross-cultural communication, mimicry of other’s facial and vocal emotional expressions, neonate imitation, interpersonal synchrony of body movements, entrainment between a neonate’s body movement and the flow of an adult’s speech. Therefore, this phenomenon has been labeled with various terms, each one having a specific nuance. Moreover, the synchrony tendency is not always observed in all interactions, and it sensitively changes with various factors, such as the interactants’ level of empathy and socialization. For example, the results of my experiments indicate that the convergence of response latencies (i.e., latencies before responding to the last utterance of one’s partner) in dialogues reflects whether a speaker is receptive to the conversational partner during the dialogue. All these suggest that the synchrony tendency provides an effective indicator reflecting various aspects of our communication behavior. Various functions of the synchrony tendency in adults’ interactions can be inferred from past literature: (a) it facilitates the understanding of an interactional partner’s emotions, (b) it conveys empathy and rapport, and (c) it makes the speakers’ personality and attitude feel positive. 
Furthermore, the results of my experiments showed that the synchrony tendency facilitates goal achievement, such as reaching a compromise through discussion (the speakers whose response latencies became similar over the time course to those of their conversational partners evaluated that they reached a compromise). Past literature along with the results of my own experiments bring to light two aspects of the synchrony tendency: the emotional/automatic/inherent aspect and the cognitive/acquired aspect. Examples that clearly illustrate the former aspect are imitations of facial and vocal emotional expressions and neonate imitation. On the other hand, the cognitive/acquired aspect is illustrated by convergence or congruence of response latencies, vocal intensity, speech duration, language, or accent, and is influenced by social factors. The above-mentioned aspects of the synchrony tendency match Hess, Philippot, & Blairy (1999)’s mimicry model, Giles et al.’s communication accommodation theory (ex. Shepard, Giles, & LePoire, 2001), as well as the author’s speech style convergence model. The speech styles convergence model derived from a series of studies on the convergence of response latencies in dialogues. This model suggests that adopting a partner’s speech style and the output cycle between the interactants being influenced by the speakers’ social skills and attitude towards the partner, this cycle develops over the course of the interaction until the speech styles finally converge to a point most suitable for the members of the dyad to progress smoothly through the dialogue. In the future, it is necessary to investigate quantitatively through which communication channels, and when in the time course of an interaction, the synchrony tendency is displayed.

Keywords cognition, emotion, nonverbal behavior, synchrony tendency
Stefanie Pillai, “Self-Monitoring and Self-Repair in Spontaneous Speech,” k@ta, vol. 8, no. 2, 2006, pp. 114-126. http://puslit2.petra.ac.id/ejournal/index.php/ing/article/viewArticle/16575.

Abstract This study explores what repairs in the spontaneous production of speech reveal about the psycholinguistic processes of self-monitoring and self-repair. Three intervals were examined: error-to-cut off; cut off-to-repair; error-to-repair. The intervals indicate support for theories of internal speech monitoring, and also indicate that the planning of speech-repairs can take place pre-articulatorily as well.

Keywords error-detection, Perceptual loop theory, self-monitoring, self-repairs, Speech production
Mandana Seyfeddinipur, “Disfluency: Interrupting speech and gesture,” PhD Dissertation, Radboud University Nijmegen, Nijmegen, The Netherlands. 2006. DOI: 10.17617/2.59337. http://hdl.handle.net/11858/00-001M-0000-0013-1B6F-F.

Abstract (none)
Siegfried Ludwig Sporer, and Barbara Schwandt, “Paraverbal indicators of deception: a meta-analytic synthesis,” Applied Cognitive Psychology, vol. 20, no. 4, 2006, pp. 421-446. DOI: 10.1002/acp.1190. https://onlinelibrary.wiley.com/doi/abs/10.1002/acp.1190.

Abstract This meta-analysis provides a quantitative synthesis of paraverbal indicators of deception as a function of different moderator variables. Of nine different speech behaviours analysed only two were reliably associated with deception in the weighted, and four in the analysis unweighted by sample size. Pitch, response latency and speech errors were positively, message duration negatively related to deception. As most effect sizes were found to be heterogeneous, analyses of moderator variables revealed that many of the observed relationships varied as a function of content, preparation, motivation, sanctioning of the lie, experimental design and operationalization. Of different theoretical approaches reviewed, a working memory model of lie production may best account for the findings. Because of the small effect sizes, and the heterogeneity in findings, practitioners must be cautioned to use such indicators in assessing the truthfulness of reports but nonetheless practical implications for different types of situations are outlined.
Pavel Trofimovich, and Wendy Baker, “Learning Second Language Suprasegmentals: Effect of L2 Experience on Prosody and Fluency Characteristics of L2 Speech,” Studies in Second Language Acquisition, vol. 28, 2006, pp. 1-30. DOI: 10.1017/S0272263106060013.

Abstract This study examines effects of short, medium, and extended second language (L2) experience (3 months, 3 years, and 10 years of United States residence, respectively) on the production of five suprasegmentals (stress timing, peak alignment, speech rate, pause frequency, and pause duration) in six English declarative sentences by 30 adult Korean learners of English and 10 adult native English speakers. Acoustic analyses and listener judgments were used to determine how accurately the suprasegmentals were produced and to what extent they contributed to foreign accent. Results revealed that amount of experience influenced the production of one suprasegmental (stress timing), whereas adult learners’ age at the time of first extensive exposure to the L2 (indexed as age of arrival in the United States) influenced the production of others (speech rate, pause frequency, pause duration). Moreover, it was found that suprasegmentals contributed to foreign accent at all levels of experience and that some suprasegmentals (pause duration, speech rate) were more likely to do so than others (stress timing, peak alignment). Overall, results revealed similarities between L2 segmental and suprasegmental learning.
Aldert Vrij, Lucy Akehurst, Laura Brown, and Samantha Mann, “Detecting Lies in Young Children, Adolescents and Adults,” Applied Cognitive Psychology, vol. 20, 2006, pp. 1225-1237. DOI: 10.1002/acp.1278.

Abstract The ability of teachers, social workers, police officers and laypersons (undergraduate and postgraduate students) to detect truths and lies told by 5-6 year-olds, adolescents and adults was tested in the present experiment. Lie detectors judged the veracity of statements from 18 liars and 18 truth tellers belonging to these three age groups. Accuracy scores were around 60% for each of these three age groups, both for detecting truths and for detecting lies. No occupational differences emerged. Moreover, judgements made by teachers, social workers and police officers showed an overlap, suggesting that an erroneous decision made by a member of one group may not easily be detected by a member of the other groups. The lie detectors were inclined to judge cues of nervousness, cognitive demand and attempted behavioural control as cues to deceit, even when truth tellers were displaying these cues.

2005

Timothy Arbisi-Kelm, and Sun-Ah Jun, “A comparison of disfluency patterns in normal and stuttered speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 13-16. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_013.pdf.

Abstract While speech disfluencies are commonly found in every speaker’s speech, stuttering is a language disorder characterized by an abnormally high rate of speech aberrations, including prolongation, cessation, and repetition of speech segments. However, despite the obvious differences between stuttered and normal speech, identifying the crucial qualities that identify stuttered speech remains a significant challenge. A story-telling task was presented to four stutterers and four non-stutterers in order to analyze the prosodic patterns that surfaced from their spontaneous narrations. Preliminary results revealed that the major difference between stutterers’ and non-stutterers’ disfluencies – aside from the total number – is the type of disfluency and the context affected by the disfluency. Disfluencies in both groups included prolongation, pause and cut, but stutterers’ disfluencies also include repetition and combinations of the three (e.g., cut followed by pause). In addition, stutterers’ disfluencies were accompanied by more prosodic irregularities (e.g. pitch accent on function words, creating a prosodic break with degraded phonetic cues) prior to the actual disfluency than non-stutterers’ disfluencies, indirectly supporting the overvigilant self-monitoring hypothesis.

Keywords DiSS
Matthew P. Aylett, “Extracting the acoustic features of interruption points using non-lexical prosodic analysis,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 17-20. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_017.pdf.

Abstract Non-lexical prosodic analysis is our term for the process of extracting prosodic structure from a speech waveform without reference to the lexical contents of the speech. It has been shown that human subjects are able to perceive prosodic structure within speech without lexical cues. There is some evidence that this extends to the perception of disfluency, for example, the detection of interruption points (IPs) in low pass filtered speech samples. In this paper, we apply non-lexical prosodic analysis to a corpus of data collected for a speaker in a multi-person meeting environment. We show how non-lexical prosodic analysis can help structure corpus data of this kind, and reinforce previous findings that non-lexical acoustic cues can help detect IPs. These cues can be described by changes in amplitude and f0 after the IP and they can be related to the acoustic characteristics of hyper-articulated speech.

Keywords DiSS
Katarina Bartkova, “Prosodic cues of spontaneous speech in French,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 21-25. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_021.pdf.

Abstract Disfluencies, when present in speech signal, can make syntactic parsing difficult. This difficulty is increased when machines are involved in communication and when speech devices rely on automatic speech recognition techniques. In order to improve automatic speech parsing and thus speech comprehension, methods have been proposed to filter disfluencies out from the speech signal. Attempts have been made to use prosodic parameters to improve such a filtering. However, before introducing prosodic parameters into automatic speech recognition processes, it would be useful to investigate whether disfluencies can be characterized in a prosodic way and whether their prosodic cues would be representative enough to be used in automatic systems. The aim of this study was to examine to which extent prosodic parameters would be able to characterize disfluencies in French. Word repetitions, filled and silent pauses and speech repairs were described in a prosodic way using statistical analyses of their prosodic parameters. These analyses allowed simple prosodic rules to be formulated. The efficiency of the prosodic rules was evaluated on the task of filled pauses, word repetitions and hesitation detections.

Keywords DiSS
Philippe Boula de Mareüil, Benoît Habert, Frédérique Bénard, Martine Adda-Decker, Claude Barras, Gilles Adda, and Patrick Paroubek, “A quantitative study of disfluencies in French broadcast interviews,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 27-32. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_027.pdf.

Abstract The reported study aims at increasing our understanding of spontaneous speech-related phenomena from sibling corpora of speech and orthographic transcriptions at various levels of elaboration. It makes use of 9 hours of French broadcast interview archives, involving 10 journalists and 10 personalities from political or civil society. First we considered press-oriented transcripts, where most of the so-called disfluencies are discarded. They were then aligned with automatic transcripts, by using the LIMSI speech recogniser. This facilitated the production of exact transcripts, where all audible phenomena in non-overlapping speech segments were transcribed manually. Four types of disfluencies were distinguished: discourse markers, filled pauses, repetitions and revisions, each of which accounts for about 2% of the corpus (8% in total). They were analysed by utterance, speaker and disfluency pattern types. Four questions were raised. Where do disfluencies occur in the utterance? What is the influence of the speakers’ status? And what are the most frequent disfluency patterns?

Keywords DiSS
Jean-Leon Bouraoui, and Nadine Vigouroux, “Disfluency phenomena in an apprenticeship corpus,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 33-37. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_033.pdf.

Abstract This paper presents a study carried out on an apprenticeship corpus. It features dialogues between air traffic controllers in training and "pseudo-pilots". "Pseudo-pilots" are people (often instructors) who simulate the behavior of real pilots in real situations. Its main specificities are the apprenticeship characteristic, and the fact that the production is subordinate to a particular phraseology. Our study is related to the many kinds of disfluency phenomena that occur in this specific corpus. We define 6 main categories of these phenomena, and take a position with regard to the terminology used in the literature. We then present the distribution of these categories. It appears that some of the occurrence frequencies differ largely from those observed in other studies. Our explanation is based on the corpus specificity: because of their responsibilities, both controllers and pseudo-pilots have to be especially careful about the mistakes they could make, since these could have dramatic consequences. The remainder of our paper is dedicated to a more in-depth study of one disfluency class: the "false starts". A false start consists of the beginning of a word whose utterance is not completed. We show that this category consists of several sub-categories, of which we study the distribution.

Keywords DiSS
Pierpaolo Busan, Giovanna Pelamatti, Alessandro Tavano, Michele Grassi, and Franco Fabbro, “Improvement of verbal behavior after pharmacological treatment of developmental stuttering: a case study,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 39-42. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_039.pdf.

Abstract Developmental stuttering is a disruption in normal speech fluency and rhythm. Developmental stuttering usually manifests between 6 and 9 years of age and may persist in adulthood. At present, the exact etiology of developmental stuttering is not fully clear. Besides, the dopaminergic neurological component is likely to have a causal role in the manifestation of stuttering behaviors. Actually, some studies seem to confirm the efficacy of antidopaminergic drugs (haloperidol, risperidone and olanzapine, among others) in controlling stuttering behaviors. We present a case of persistent developmental stuttering in a 24-year-old adult male who was able to control his symptoms to a significant extent after administration of risperidone, an antidopaminergic drug. Our findings show that the pharmacological intervention helped the patient improve on a set of fluency tasks but especially when the tasks involved the uttering of content words. Our results are discussed against the current theories on the cognitive and neurological basis of developmental stuttering.

Keywords DiSS
Estelle Campione, and Jean Véronis, “Pauses and hesitations in French spontaneous speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 43-46. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_043.pdf.

Abstract In traditional terminology, silent and filled pauses are grouped together, whereas hesitation lengthening is put into a separate category. However, while these various phenomena are very often associated, there have been few studies on how they interact. We analyzed an hour of spontaneous speech to show that silent and filled pauses operate in a totally different way, and that contrary to common belief, silent pauses by themselves never serve as hesitation markers, but only do so when coupled with other markers – mostly syllabic lengthening and filled pauses. These last two hesitation markers have similar acoustic and articulatory characteristics; they are also distributed and function alike.

Keywords DiSS
Maria Candea, Ioana Vasilescu, and Martine Adda-Decker, “Inter- and intra-language acoustic analysis of autonomous fillers,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 47-51. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_047.pdf.

Abstract The present work deals with autonomous fillers in a multilingual context. The question addressed here is whether fillers carry universal or language-specific characteristics. Fillers occur frequently in spontaneous speech and represent an interesting topic for improving language-specific models in automatic language processing. Most of the current studies focus on a few languages such as English and French. We focus here on multilingual fillers resulting from eight languages (Arabic, Mandarin Chinese, French, German, Italian, European Portuguese, American English and Latin American Spanish). We thus propose an acoustic typology based on the vocalic peculiarities of the autonomous fillers. Three parameters are considered here: duration, pitch (F0) and timbre (F1/F2). We also compare the vocalic segments of the fillers with intra-lexical vowels possessing similar timbre. For this purpose, a preliminary study on the French language is described.

Keywords DiSS
Jennifer Cole, Mark Hasegawa-Johnson, Chilin Shih, Heejin Kim, Eun-Kyung Lee, Hsin-yi Lu, Yoonsook Mo, and Tae-Jin Yoon, “Prosodic parallelism as a cue to repetition and error correction disfluency,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 53-58. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_053.pdf.

Abstract Complex disfluencies that involve the repetition or correction of words are frequent in conversational speech, with repetition disfluencies alone accounting for over 20% of disfluencies. These disfluencies generally do not lead to comprehension errors for human listeners. We propose that the frequent occurrence of parallel prosodic features in the reparandum (REP) and alteration (ALT) intervals of complex disfluencies may serve as strong perceptual cues that signal the disfluency to the listener. We report results from a transcription analysis of complex disfluencies that classifies disfluent regions on the basis of prosodic factors, and preliminary evidence from F0 analysis to support our finding of prosodic parallelism.

Keywords DiSS
Andrew A. Cooper, and John T. Hale, “Promotion of disfluency in syntactic parallelism,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 59-63. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_059.pdf.

Abstract The development of a disfluency-robust speech parser requires some insight into where disfluencies occur in spontaneous spoken language. This corpus study deals with one syntactic variable which is predictive of disfluency location: syntactic parallelism. A formal definition of syntactic parallelism is used to show that syntactic parallelism is indeed predictive of disfluency.

Keywords DiSS
Rodolfo Delmonte, “Modeling conversational styles in Italian by means of overlaps,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 65-70. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_065.pdf.

Abstract Conversational styles vary cross-culturally remarkably: communities of speakers – rather than single speakers - seem to share turn-taking rules which do not always coincide with those shared by other communities of the same language. These rules are usually responsible for the smoothness of conversational interaction and the readiness of the attainment of communicative goals by conversants. Overlaps constitute a disruptive element in the economy of conversations: however, they show regular patterns which can be used to define conversational styles (Ford and Thompson, 1996). Overlaps constitute a challenge for any system of linguistic representations in that they cannot be treated as a one-dimensional event: in order to take into account the purport of an overlapping stretch of dialogue for the ongoing pragmatics and semantics of discourse, we have devised a new annotation schema which is then fed into the parser and produces a multidimensional linear syntactic constituency representation. This study takes a new tack on the issues raised by overlaps, both in terms of its linguistic representation and its semantic and pragmatic interpretation. It will present work carried out on the 60,000 words Italian Spontaneous Speech Corpus called AVIP, under national project API - the Italian version of MapTask, in particular the parser, to produce syntactic structures of overlapped temporally aligned turns. We will also present preliminary data from IPAR, another corpus of spontaneous dialogues run with the Spot Differences protocol. Then it will concentrate on the syntactic, semantic and prosodic aspects related to this debated issue. The paper will argue in favour of a joint and thus temporally aligned representation of overlapping material to capture all linguistic information made available by the local context. 
This will result in a syntactically branching node we call OVL which contains both the overlapper’s and the overlappee’s material (linguistic or non-linguistic). An extended classification of the phenomenon has shown that overlaps contribute substantially to the interpretation of the local context rather than the other way around. They also determine the overall conversational style of a given community of speakers with cultural import.

Keywords DiSS
Janet Fletcher, Nicholas Evans, and Belinda Ross, “The intra-word pause and disfluency in Dalabon,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 77-81. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_077.pdf.

Abstract Earlier impressionistic analyses of Dalabon indicate that the grammatical word is often realized as either an accentual or an intonational phrase, followed by a pause. Unusually, it can also be interrupted by a silent pause, with each section being potentially (although not necessarily) realized as separate intonational phrases. Our analyses of pause duration and pause placement within grammatical words support these earlier impressions, although this use of the silent pause appears to be restricted to certain affix boundaries, and other phonological constraints relating to the following surrounding linguistic material. These interruptions also share certain characteristics of "normal" disfluencies however.

Keywords DiSS
Kristy Beers Fägersten, “Hesitations and repair in German,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 71-76. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_071.pdf.

Abstract The occurrence of pauses and hesitations in spontaneous speech has been shown to occur systematically, for example, "between sentences, after discourse markers and conjunctions and before accented content words." (Hansson [15]) This is certainly plausible in English, where pauses and hesitations can and often do occur before content words such as nominals, for example, "uh, there’s a ... man." (Chafe [8]) However, if hesitations are, in fact, evidence of "deciding what to talk about next," (Chafe [8]) then the complex grammatical system of German should render this pausing position precarious, since pre-modifiers must account for the gender of the nominals they modify. In this paper, I present data to test the hypothesis that pre-nominal hesitation patterns in German are dissimilar to those in English. Hesitations in German will be shown, in fact, to occur within noun phrase units. Nevertheless, native speakers most often succeed in supplying a nominal which conforms to the gender indicated by the determiner or pre-modifier. Corrections, or repairs, of infelicitous pre-modifiers indicate that the speaker was unable to supply a nominal of the same gender which the choice of pre-modifier had committed him/her to. The frequency of such repairs is shown to vary according to task, with fewest repairs occurring in elicited speech which allows for linguistic freedom and therefore is most like spontaneous speech. The data sets indicate that among German native speakers, hesitations occurring before noun phrase units (pre-NPU hesitations) indicate deliberation of what to say, while hesitations within or before the head of the noun phrase (pre-NPH hesitations) indicate deliberation of how to say what has already been decided (cf. Chafe [8]).

Keywords DiSS
Tiit Hennoste, “Repair-initiating particles and um-s in Estonian spontaneous speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 83-88. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_083.pdf.

Abstract Particles and um-s used in spontaneous Estonian speech as initiators of different types of repair are analysed. Our model and typology of repair based on conversation analysis is introduced. Three main types of repair and particles used to initiate those are described: prepositioned self-initiated self-repair, postpositioned self-initiated self-repair (addition, substitution, insertion and abandon), and other-initiated self-repair (reformulation, clarification and misunderstanding). In conclusion 6 groups of particles are brought out by the role they play in the initiation of the repair sequence. Data come from Corpus of Spoken Estonian of the University of Tartu, which contains everyday and institutional speech, telephone and face-to-face conversations.

Keywords DiSS
Sandrine Henry, “Repeats in spontaneous spoken French: the influence of the complexity of phrases,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 89-92. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_089.pdf.

Abstract We here present the results of a descriptive study we conducted on 383 disfluent repeats from a corpus of spontaneous spoken French. We analyze noun phrases under construction and study whether there is a co-relation between the frequency of the repeats and the complexity feature of the phrases. We then focus on complex noun phrases in order to locate precisely the repeats. We also analyze how repeats affect structures such as [Preposition + Determiner + Noun] and what the constraints upon such structures are.

Keywords DiSS
Peter Howell, and Olatunji Akande, “Simulations of the types of disfluency produced in spontaneous utterances by fluent speakers, and the change in disfluency type seen as speakers who stutter get older,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 93-98. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_093.pdf.

Abstract The EXPLAN model is implemented on a graphic simulator. It is shown that it is able to produce speech in serial order and several types of fluency failure produced by fluent speakers and speakers who stutter. A way that EXPLAN accounts for longitudinal changes in the pattern of fluency failures shown by speakers who stutter is demonstrated.

Keywords DiSS
Peter Howell, Jennifer Hayes, Ceri Savage, Jane Ladd, and Nafisa Patel, “Factors that determine the form and position of disfluencies in spontaneous utterances,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 99-102. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_099.pdf.

Abstract This presentation reviews work on types of disfluency in the spontaneous speech of fluent speakers and speakers who stutter. Examination is made of factors that determine where disfluencies are located. It is concluded that the phonological, or prosodic, word provides a good basis for explaining the distribution of different types of disfluency in spontaneous speech.

Keywords DiSS
T. Florian Jaeger, “Optional ’that’ indicates production difficulty: evidence from disfluencies,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 103-108. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_103.pdf.

Abstract Optional word omission, such as that omission in complement and relative clauses, has been argued to be driven by production pressure (rather than by comprehension). One particularly strong production-driven hypothesis states that speakers insert words to buy time to alleviate production difficulties. I present evidence from the distribution of disfluencies in non-subject-extracted relative clauses arguing against this hypothesis. While word omission is driven by production difficulties, speakers may use that as a collateral signal to addressees, informing them of anticipated production difficulties. In that sense, word omission would be subject to audience design (i.e. catering to addressees’ needs).

Keywords DiSS
Jumpei Kaneda, “Phrase-final rise-fall intonation and disfluency in Japanese - a preliminary study,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 109-112. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_109.pdf.

Abstract In Japanese conversations, rise-fall intonation with vowel lengthening often occurs on the final syllable of a phrase. This phrase-final rise-fall (PFRF) is a new type of intonation first reported in the 1960’s. Researchers consider PFRF intonation a discourse marker which functions to sharpen the phrase boundary and retain the utterance turn, but other phrase-final intonation such as phrase-final lengthening (PFL) can have a similar pattern. PFLs are recognized as a type of disfluent speech with similar characteristics to PFRFs in terms of final-lengthening and having discourse functions. Also from reports about the spontaneity of speech, we assume that PFRFs would have a relation with disfluency, as well as with PFLs. To examine this assumption, this paper attempts to show the co-occurrence relation between PFRF and disfluency in the same utterance. The results show that PFRFs and PFLs have a relation to posterior disfluent units and suggest that both indicate speech planning strategies. Further, this paper speculates that a difference between PFRF and PFL is a difference in the purposes of speech planning: the latter represents ongoing linguistic editing while the former indicates adjusting the utterance according to the interlocutor’s reaction. Disfluencies accordingly occur as effects from processes of speech planning.

Keywords DiSS
Shigeyoshi Kitazawa, “Evaluation of vowel hiatus in prosodic boundaries of Japanese,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 113-116. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_113.pdf.

Abstract We investigated V-V hiatus through J-ToBI labeling and listening to whole phrases to estimate degree of discontinuity and, if possible, to determine the exact boundary between two phrases. Appropriate boundaries were found in most cases as the maximum perceptual score. Using electroglottography (EGG) of the open quotients OQ, pitch mark and spectrogram, the acoustic phonological feature of these V-V hiatus was found as phrase-initial glottalization and phrase-final nasalization observable in EGG and spectrogram, as well as phrase-final lengthening and phrase-initial shortening of the morae. A small dip was observable at the boundary of V-V hiatus showing glottalization. The test materials are taken from the "Japanese MULTEXT", consisting of a particle - vowel (36), adjective - vowel (5), and word - word (4).

Keywords DiSS
Ellen F. Lau, and Fernanda Ferreira, “Lingering effects of disfluent material on comprehension of garden path sentences,” Language and Cognitive Processes, vol. 20, no. 5, 2005, pp. 633-666. DOI: 10.1080/01690960444000142. http://www.tandf.co.uk/journals/pp/01690965.html.

Abstract In two experiments, we tested for lingering effects of verb replacement disfluencies on the processing of garden path sentences that exhibit the main verb/reduced relative (MV/RR) ambiguity. Participants heard sentences with revisions like The little girl chosen, uh, selected for the role celebrated with her parents and friends. We found that the syntactic ambiguity associated with the reparandum verb involved in the disfluency (here chosen) had an influence on later parsing: Garden path sentences that included such revisions were more likely to be judged grammatical if the reparandum verb was structurally unambiguous. Conversely, ambiguous non-garden path sentences were more likely to be judged ungrammatical if the structurally unambiguous disfluency verb was inconsistent with the final reading. Results support a model of disfluency processing in which the syntactic frame associated with the replacement verb "overlays" the previous verb’s structure rather than actively deleting the already-built tree.

Keywords Cognitive Psychology, Language, Language & Linguistics, Neuropsychology, Psychology of, Speech & Language Disorders, Speech Perception & Production
Che-Kuang Lin, Shu-Chuan Tseng, and Lin-Shan Lee, “Important and new features with analysis for disfluency interruption point (IP) detection in spontaneous Mandarin speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 117-121. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_117.pdf.

Abstract This paper presents a whole set of new features, some duration-related and some pitch-related, to be used in disfluency interruption point (IP) detection for spontaneous Mandarin speech, considering the special linguistic characteristics of Mandarin Chinese. Decision tree is incorporated into the maximum entropy model to perform the IP detection. By examining performance degradation when each specific feature was missing from the whole set, the most important features for IP detection for each disfluency type were analyzed in detail. The experiments were conducted on the Mandarin Conversational Dialogue Corpus (MCDC) developed by the Institute of Linguistics of Academia Sinica in Taiwan.

Keywords DiSS
Tobias Lövgren, and Jan van Doorn, “Influence of manipulation of short silent pause duration on speech fluency,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 123-126. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_123.pdf.

Abstract Ordinary speech contains disfluencies in the form of hesitations and repairs. When listeners make global judgements on speech fluency they are influenced by the frequency and nature of the individual disfluencies contained in the speech. The aim of this study was to investigate a single dimension, pause duration, in the perception of speech fluency. The method involved simulation of pause duration within naturally fluent speech by manipulating existing acoustic silences in the speech. Four conditions were created: one for the natural speech and three with step wise increases in acoustic silence durations (average x2, x4 and x7.5 respectively). In a forced choice task listeners were asked to judge the speech samples as fluent or non fluent. The results showed that the percentage of judgements of disfluency increased as the pause durations increased, and that the difference between the unmanipulated speech condition and the two conditions with the longest pause durations were statistically significant. The results were interpreted to indicate that the individual dimension of pause duration has an independent influence on the judgement of fluency in ordinary speech.

Keywords DiSS
Elgar-Paul Magro, “Disfluency markers and their facial and gestural correlates. Preliminary observations on a dialogue in French,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 127-131. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_127.pdf.

Abstract The aim of this article is to try to establish any observable regularities between the vocal and the visual expression of disfluency markers in a French spontaneous dialogue. The data show different configurations for different types of disfluency markers. Thus "euh"s are typically accompanied by mutual eye contact and no gesture; interrupted eye contact takes place less frequently, on occasions where speech planning is more seriously impaired (syntactical disruption and combination of "euh" with other disfluency markers). False starts seem to be typically accompanied by gesture production whereas eye contact can be maintained if the speaker relies or not on the listener to resolve the speech production problem. The article takes up the idea that disfluency markers can be classified along a continuum throughout the speech formulation process, going from the most discreet to the most prominent. It suggests that the more prominent the disfluency, the more likely is the visual channel to play a role (interrupted eye contact and gesture production).

Keywords DiSS
Jan McAllister, and Mary Kingston, “Characteristics of final part-word repetitions,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 7-11. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_007.pdf.

Abstract In an earlier paper, we have described final part-word repetitions in the conversational speech of two school-age boys of normal intelligence with no known neurological lesions. In this paper we explore in more detail the phonetic and linguistic characteristics of the speech of the boys. The repeated word fragments were more likely to be preceded by a pause than followed by one. The word immediately following the fragment tended to have a higher word frequency score than other surrounding words. Utterances containing the disfluencies typically contained a greater number of syllables than those that did not; however, there was no reliable difference between fluent and disfluent utterances in terms of their grammatical complexity.

Keywords DiSS
Hannele Nicholson, Ellen Gurman Bard, Robin Lickley, Anne H. Anderson, Catriona Havard, and Yiya Chen, “Disfluency and behaviour in dialogue: evidence from eye-gaze,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 133-138. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_133.pdf.

Abstract Previous research on disfluency types has focused on their distinct cognitive causes, prosodic patterns, or effects on the listener. This paper seeks to add to this taxonomy by providing a psycholinguistic account of the dialogue and gaze behaviour speakers engage in when they make certain types of disfluency. Dialogues came from a version of the Map Task, [2, 4], in which 36 normal adult speakers each participated in six dialogues across which feedback modality and time-pressure were counter-balanced. In this paper, we ask whether disfluency, both generally and type-specifically, was associated with speaker attention to the listener. We show that certain disfluency types can be linked to particular dialogue goals, depending on whether the speaker had attended to listener feedback. The results shed light on the general cognitive causes of disfluency and suggest that it will be possible to predict the types of disfluency which will accompany particular behaviours.

Keywords DiSS
Sieb Nooteboom, “Lexical bias re-re-visited. Some further data on its possible cause,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 139-144. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_139.pdf.

Abstract This paper describes an experiment eliciting spoonerisms by using the so-called SLIP technique. The purpose of the experiment was to provide a further test of the hypothesis that self-monitoring of inner speech is a major source of lexical bias. This is a follow-up on an earlier experiment in which subjects were explicitly prompted after each response to make a correction in case of a speech error. In the current experiment both the prompt and the extra time for correction were left out, and there was no strong time pressure for the subject in giving his response. It is shown that under these conditions many primed-for spoonerisms are replaced by other, mostly lexical, errors. These ’replacing’ or ’secondary’ errors are more frequent in the condition priming for nonword-nonword errors than in the condition priming for word-word errors. Response times obtained for replacing errors are considerably and significantly longer than response times for overtly interrupted errors, and also longer than response times for the primed-for spoonerisms. This suggests that a time-consuming operation follows the primed-for spoonerisms in inner speech, and replaces those with other speech errors, often to preserve lexicality of the error.

Keywords DiSS
Daniel O’Connell, and Sabine Kowal, “Where Do Interjections Come From? A Psycholinguistic Analysis of Shaw’s Pygmalion,” Journal of Psycholinguistic Research, vol. 34, no. 5, September 2005, pp. 497-514. DOI: 10.1007/s10936-005-6205-x.

Abstract Starting from our recent findings regarding emotional and initializing functions of interjections in TV and radio interviews (Kowal & O’Connell, 2004b; O’Connell & Kowal, in press; O’Connell, Kowal, & Ageneau, 2005), we used the book and script of Shaw (1916/1969) and the audiotape of the motion picture (Pascal, Asquith, & Howard, 1938) Pygmalion to investigate how actors use interjections to express emotions. The following hypotheses were tested: (1) The actors use the written cues selectively in their oral performance by substituting, adding, and deleting interjections; (2) primary interjections added by the actors are less conventional than those in the written text; (3) durations and number of syllables of Eliza Doolittle’s spoken renditions of her signature interjection ah-ah-ah-ow-ow-ow-oo do not correlate with the length in letters and syllables of the written versions; and (4) there is no evidence for Ameka’s (1992b, 1994) characterization of interjections as temporally isolated, i.e., preceded and followed by silent pauses, in consequence of their syntactic isolation. Our findings confirmed all the hypotheses except for one unexpectedly significant correlation between number of syllables in Eliza Doolittle’s signature interjection in the written version and duration in seconds of the spoken version thereof. The common thread throughout these data is the actor’s need to personalize emotions in a dramatic performance—by means of interjections other than those provided in the written text. In this process of personalization, the emotional and initializing functions of interjections are confirmed.

Keywords conceptual and medial orality, dramatic performance, emotional expression, interjections, spontaneity
Daniel O’Connell, and Sabine Kowal, “Uh and Um Revisited: Are They Interjections for Signaling Delay?,” Journal of Psycholinguistic Research, vol. 34, no. 6, 2005, pp. 555-576. DOI: 10.1007/s10936-005-9164-3.

Abstract Clark and Fox Tree (2002) have presented empirical evidence, based primarily on the London-Lund corpus (LL; Svartvik & Quirk, 1980), that the fillers uh and um are conventional English words that signal a speaker’s intention to initiate a minor and a major delay, respectively. We present here empirical analyses of uh and um and of silent pauses (delays) immediately following them in six media interviews of Hillary Clinton. Our evidence indicates that uh and um cannot serve as signals of upcoming delay, let alone signal it differentially: In most cases, both uh and um were not followed by a silent pause, that is, there was no delay at all; the silent pauses that did occur after um were too short to be counted as major delays; finally, the distributions of durations of silent pauses after uh and um were almost entirely overlapping and could therefore not have served as reliable predictors for a listener. The discrepancies between Clark and Fox Tree’s findings and ours are largely a consequence of the fact that their LL analyses reflect the perceptions of professional coders, whereas our data were analyzed by means of acoustic measurements with the PRAAT software (www.praat.org). A comparison of our findings with those of O’Connell, Kowal, and Ageneau (2005) did not corroborate the hypothesis of Clark and Fox Tree that uh and um are interjections: Fillers occurred typically in initial, interjections in medial positions; fillers did not constitute an integral turn by themselves, whereas interjections did; fillers never initiated cited speech, whereas interjections did; and fillers did not signal emotion, whereas interjections did. Clark and Fox Tree’s analyses were embedded within a theory of ideal delivery that we find inappropriate for the explication of these phenomena.

Keywords filled pauses, fillers, hesitations, interjections, spontaneous speech, uh, um
Daniel O’Connell, Sabine Kowal, and Carie Ageneau, “Interjections in Interviews,” Journal of Psycholinguistic Research, vol. 34, no. 2, March 2005, pp. 153-171. DOI: 10.1007/s10936-005-3636-3.

Abstract A psycholinguistic hypothesis regarding the use of interjections in spoken utterances, originally formulated by Ameka (1992b, 1994) for the English language, but not confirmed in the German-language research of Kowal and O’Connell (2004 a & c), was tested: The local syntactic isolation of interjections is paralleled by their articulatory isolation in spoken utterances, i.e., by their occurrence between a preceding and a following pause. The corpus consisted of four TV and two radio interviews of Hillary Clinton that had coincided with the publication of her book Living History (2003) and one TV interview of Robin Williams by James Lipton. No evidence was found for articulatory isolation of English-language interjections. In the Hillary Clinton interviews and Robin Williams interviews, respectively, 71% and 73% of all interjections occurred initially, i.e., at the onset of various units of spoken discourse: at the beginning of turns; at the beginning of articulatory phrases within turns, i.e., after a preceding pause; and at the beginning of a citation within a turn (either Direct Reported Speech [DRS] or what we have designated Hypothetical Speaker Formulation [HSF]). One conventional interjection (OH) occurred most frequently. The Robin Williams interview had a much higher occurrence of interjections, especially nonconventional ones, than the Hillary Clinton interviews had. It is suggested that the onset or initializing role of interjections reflects the temporal priority of the affective and the intuitive over the analytic, grammatical, and cognitive in speech production. Both this temporal priority and the spontaneous and emotional use of interjections are consonant with Wundt’s (1900) characterization of the primary interjection as psychologically primitive. The interjection is indeed the purest verbal implementation of conceptual orality.

Keywords conceptual orality, interjection, interview
Berthille Pallaud, “The re-adjustment of word-fragments in spontaneous spoken French,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 145-149. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_145.pdf.

Abstract A study of word-fragments in spoken French has been undertaken for a few years on the basis of non directive talks corpora recorded and transcribed according to GARS’ conventions (DELIC currently). These disfluencies are often analyzed within the framework of disfluent repetitions. The observations made on these two types of disfluencies led us to distinguish them. The aim of our study is to describe on the one hand insertions which take place in relation to the word interruptions and their re-adjustment, and on the other hand, to specify the types and localizations of retracing which follow these interruptions. Two kinds of incidental clauses were observed at the time of the readjustments which follow these disturbances. Some (the more numerous) are syntactically linked to the fragment or with its retracing, others are not. Moreover, the word-fragments which will be modified are the only ones to be dependent on the type of localization. For the others, this localization does not make it possible to predict the category of interruption (complemented or unfinished). Our results on word-fragments confirm, however, that in contemporary French, the retracing at the head of the nominal or verbal group which contains the disfluency remains the simplest example (at the same time the most frequent) [5]. Nevertheless, a third of the retracing either does not go back to the beginning of the Group, or exceeds it.

Keywords DiSS
Myriam Piccaluga, Jean-Luc Nespoulous, and Bernard Harmegnies, “Disfluencies as a window on cognitive processing. An analysis of silent pauses in simultaneous interpreting,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 151-155. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_151.pdf.

Abstract The paper focuses on silent pauses observed in the productions of subjects involved in simultaneous interpreting tasks. Four bilingual subjects with various degrees of expertise in interpreting and various degrees of mastery of the languages involved (French and Spanish) have been recorded while interpreting utterances of French and Spanish talks. The source discourses had been perturbated by changes both in speech rates (by time compression) and in auditory quality (by addition of a parasiting noise). On the basis of acoustical analyses performed on the subjects’ productions, statistical analyses focus both on the number and on the duration of the observed pauses. This double approach enables investigations of the kind of cognitive disturbances caused by the independent variables and allows further speculation on the semiology of the pauses durations.

Keywords DiSS
Melanie Soderstrom, and James L. Morgan, “Disfluency in speech input to infants? The interaction of mother and child to create error-free speech input for language acquisition,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 157-162. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_157.pdf.

Abstract One characteristic of infant-directed speech is that it is highly fluent compared with adult-directed speech. However, the speech that infants hear still contains disfluencies. Such disfluencies might potentially cause problems for infants during language development. We first analyzed samples of spontaneous speech in the presence of infants (both adult- and infant-directed) and found that under ideal circumstances the speech infants hear is highly fluent. Under less than ideal circumstances infants hear much more highly disfluent speech - however this disfluent speech is almost entirely adult-directed. While grammatically ill-formed, the prosodic structure of these disfluencies might signal their ill-formedness to the infants. In a preference experiment, 10 month olds listened longer to infant-directed speech samples containing prosodic disfluencies than to equated samples without disfluency. However, this effect was found in only one of two counterbalancing groups. Using adult ratings of low-pass versions of these speech samples, we found that infants’ preferences were correlated with the adults’ perception of the relative disfluency of the samples. A follow-up experiment using adult-directed disfluencies found that while the 10 month olds showed no differences in their listening preferences, older infants preferred to listen to the fluent speech. These results suggest that younger and older infants attend differently to infant and adult-directed speech, and that older infants may be able to differentiate grammatical adult-directed input from input distorted by disfluency. We discuss implications of these findings for language acquisition.

Keywords DiSS
Ellen Thompson, “A cross-linguistic look at VP-ellipsis and verbal speech errors,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 163-164. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_163.pdf.

Abstract This paper argues that consideration of spontaneous speech errors provides insight into cross-linguistic analyses of syntactic phenomena. In particular, I claim that differences in the distribution of non-parallel VP-Ellipsis constructions in English and German, as well as variation in the spontaneously-occurring verbal speech errors, is explained by a parametric analysis of variation in the inflectional systems of the two languages.

Keywords DiSS
Doroteo T. Toledano, Antonio Moreno Sandoval, José Colás Pasamontes, and Javier Garrido Salas, “Acoustic-phonetic decoding of different types of spontaneous speech in Spanish,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 165-168. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_165.pdf.

Abstract This paper presents preliminary acoustic-phonetic decoding results for Spanish on the spontaneous speech corpus C-ORAL-ROM. These results are compared with results on the read speech corpus ALBAYZIN. We also compare the decoding results obtained with the different types of spontaneous speech in C-ORAL-ROM. As the most important conclusions, the experiments show that the type of spontaneous speech has a deep impact on spontaneous speech recognition results. Best speech recognition results are those obtained on speech captured from the media.

Keywords DiSS
Michiko Watanabe, Yasuharu Den, Keikichi Hirose, and Nobuaki Minematsu, “The effects of filled pauses on native and non-native listeners’ speech processing,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 169-172. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_169.pdf.

Abstract Everyday speech is abundant with disfluencies. However, little is known about their roles in speech communication. We examined the effects of filled pauses at phrase boundaries on native and non-native listeners in Japanese. Study of spontaneous speech corpus showed that filled pauses tended to precede relatively long and complex constituents. We tested the hypothesis that filled pauses biased listeners’ expectation about the upcoming phrase toward a longer and complex one. In the experiment participants were presented with two shapes at one time, one simple and the other compound. Their task was to identify the one that they heard as soon as possible. The speech stimuli involved two factors: complexity and fluency. As the complexity factor, a half of the speech stimuli described compound shapes with long and complex phrases and the other half described simple shapes with short and simple phrases. As the fluency factor phrases describing a shape had a preceding filled pause, a preceding silent pause of the same length, or no preceding pause. The results of the experiments with both native and non-native listeners showed that response times to the complex phrases were significantly shorter after filled or silent pauses than when there was no pause. In contrast, there was no significant difference between the three conditions for the simple phrases, supporting the hypothesis.

Keywords DiSS
Yelena Yasinnik, Stefanie Shattuck-Hufnagel, and Nanette Veilleux, “Gesture marking of disfluencies in spontaneous speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 173-178. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_173.pdf.

Abstract Speakers effectively use both visual and acoustic cues to convey information in speech. While earlier research has concentrated on the association of visual cues (provided by gestures) with fluent prosodic structure, this study looks at the relationship between visual cues, prosodic markers and spoken disfluencies. Preliminary results suggested that speakers preferentially perform gestures in the eye region in spoken disfluencies, but a more careful frame-by-frame analysis capturing all gestures revealed that movements of the eye region (blinks, frowns, eyebrow raises and changes in direction of eyegaze) occur with high frequency in both fluent and non-fluent speech. The paper describes a method for frame-by-frame labelling of speech-accompanying gestures for a speech sample, whose output can then be combined with independently derived labels of the prosody. Initial analysis of 3 minute samples from two speakers reveals that one speaker produces eye movements in association with disfluencies and the other does not, and that this tendency does not result from alignment of brow gestures with pitch accents.

Keywords DiSS
Yuan Zhao, and Dan Jurafsky, “A preliminary study of Mandarin filled pauses,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 179-182. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_179.pdf.

Abstract The paper reports preliminary results on Mandarin filled pauses (FPs), based on a large speech corpus of Mandarin telephone conversation. We find that Mandarin intensively uses both demonstratives (zhege ’this’, nage ’that’) and uh/mm as FPs. Demonstratives are more frequent FPs and are more likely to be surrounded by other types of disfluency phenomena than uh/mm, as well as occurring more often in nominal environments. We also find durational differences: FP demonstratives are longer than non-FP demonstratives, and mm is longer than uh. The study also revealed dialectal influence on the use of FPs. Our results agree with earlier work which shows that a language may divide conversational labor among different FPs. Our work also extends this research in suggesting that different languages may assign conversational functions to FPs in different ways.

Keywords DiSS

2004

Jennifer Arnold, Michael K. Tanenhaus, Rebecca Altmann, and Maria Fagnano, “The Old and Thee, uh, New: Disfluency and Reference Resolution,” Psychological Science, vol. 15, no. 9, September 2004, pp. 578-582. DOI: 10.1111/j.0956-7976.2004.00723.x.

Abstract Most research on the rapid mental processes of online language processing has been limited to the study of idealized, fluent utterances. Yet speakers are often disfluent, for example, saying "thee, uh, candle" instead of "the candle." By monitoring listeners’ eye movements to objects in a display, we demonstrated that the fluency of an article ("thee uh" vs. "the") affects how listeners interpret the following noun. With a fluent article, listeners were biased toward an object that had been mentioned previously, but with a disfluent article, they were biased toward an object that had not been mentioned. These biases were apparent as early as lexical information became available, showing that disfluency affects the basic processes of decoding linguistic input.
J. C. Brown, “Eliminating the Segmental Tier: Evidence from Speech Errors,” Journal of Psycholinguistic Research, vol. 33, no. 2, March 2004, pp. 97-101. DOI: 10.1023/B:JOPR.0000017222.24698.73.

Abstract The dominant viewpoint regarding phonologically driven speech errors is that segments are the units responsible behind the errors. The goal of this paper is to illustrate the point that other potential candidates for explaining these speech errors, which have gone largely unnoticed, provide a better explanatory framework for speech errors than do segments. By looking at unambiguous cases and patterns of markedness, it can be shown that there exists good evidence for features and prosodic constituents in speech errors, but never any positive evidence for segments. All of these considerations taken into account together lend strong support to the argument that there is no need for a segmental level of analysis in phonology.

Keywords Phonology, production errors, segments, slips of the tongue
Fernanda Ferreira, and Karl G.D. Bailey, “Disfluencies and human language comprehension,” TRENDS in Cognitive Sciences, vol. 8, no. 5, May 2004, pp. 231-237. DOI: 10.1016/j.tics.2004.03.011.

Abstract Spoken language contains disfluencies, which include editing terms such as uh and um as well as repeats and corrections. In less than ten years the question of how disfluencies are handled by the human sentence comprehension system has gone from virtually ignored to a topic of major interest in computational linguistics and psycholinguistics. We discuss relevant empirical findings and describe a computational model that captures how disfluencies influence parsing and comprehension. The research reviewed shows that the parser, which presumably evolved to handle conversations, deals with disfluencies in a way that is efficient and linguistically principled. The success of this research program reinforces the current trend in cognitive science to view cognitive mechanisms as adaptations to real-world constraints and challenges.
Fernanda Ferreira, Ellen F. Lau, and Karl G.D. Bailey, “Disfluencies, language comprehension, and Tree Adjoining Grammars,” Cognitive Science, vol. 28, no. 5, 2004, pp. 721-749. DOI: 10.1016/j.cogsci.2003.10.006.

Abstract Disfluencies include editing terms such as uh and um as well as repeats and revisions. Little is known about how disfluencies are processed, and there has been next to no research focused on the way that disfluencies affect structure-building operations during comprehension. We review major findings from both computational linguistics and psycholinguistics, and then we summarize the results of our own work which centers on how the parser behaves when it encounters a disfluency. We describe some new research showing that information associated with misarticulated verbs lingers, and which adds to the large body of data on the critical influence of verb argument structures on sentence comprehension. The paper also presents a model of disfluency processing. The parser uses a Tree Adjoining Grammar to build phrase structure. In this approach, filled and unfilled pauses affect the timing of Substitution operations. Repairs and corrections are handled by a mechanism we term "Overlay," which allows the parser to overwrite an undesired tree with the appropriate, correct tree. This model of disfluency processing highlights the need for the parser to sometimes coordinate the mechanisms that perform garden-path reanalysis with those that do disfluency repair. The research program as a whole demonstrates that it is possible to study disfluencies systematically and to learn how the parser handles filler material and mistakes. It also showcases the power of Tree Adjoining Grammars, a formalism developed by Aravind Joshi which has yielded results in many different areas of linguistics and cognitive science.

Keywords disfluencies, parsing, syntax, TAG
Barbara F. Freed, Norman Segalowitz, and Dan P. Dewey, “Context of Learning and Second Language Fluency in French: Comparing Regular Classroom, Study Abroad, and Intensive Domestic Immersion Programs,” Studies in Second Language Acquisition, vol. 26, no. 02, 2004, pp. 275-301. DOI: 10.1017/S0272263104262064. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=214874&fulltextType=RA&fileId=S0272263104262064.

Abstract We compared the acquisition of various dimensions of fluency by 28 students of French studying in three different learning contexts: formal language classrooms in an at home (AH) institution, an intensive summer immersion (IM) program, and a study abroad (SA) setting. For the purpose of oral data collection, students participated in oral interviews (similar to the Oral Proficiency Interview) at the beginning and the end of the semester and provided information regarding language use and interactions. Analyses included comparisons of gain scores as a function of the learning context and as a function of the time reported using French outside of class. The main findings that reached statistical significance include: (a) The IM group made significant gains in oral performance in terms of the total number of words spoken, in length of the longest turn, in rate of speech, and in speech fluidity based on a composite of fluidity measures. When compared to the AH group, the SA group made statistically significant gains only in terms of speech fluidity but fewer gains than the IM group. The AH group made no significant gains. (b) The IM students reported that they spoke and wrote French significantly more hours per week than the other two groups. The SA group reported using English more than French (although the difference was not statistically significant) and reported using significantly more English in out-of-class activities than the IM group. (c) Multiple regression analyses revealed that reported hours per week spent writing outside of class was significantly associated with oral fluidity gains.
Judit Kormos, and Mariann Dénes, “Exploring measures and perceptions of fluency in the speech of second language learners,” System, vol. 32, no. 2, 2004, pp. 145-164. DOI: 10.1016/j.system.2004.01.001.

Abstract The research reported in this paper explores which variables predict native and non-native speaking teachers’ perception of fluency and distinguish fluent from non-fluent L2 learners. In addition to traditional measures of the quality of students’ output such as accuracy and lexical diversity, we investigated speech samples collected from 16 Hungarian L2 learners at two distinct levels of proficiency with the help of computer technology. The two groups of students were compared and their temporal and linguistic measures were correlated with the fluency scores they received from three experienced native and three non-native speaker teacher judges. The teachers’ written comments concerning the students’ performance were also taken into consideration. For all the native and non-native teachers, speech rate, the mean length of utterance, phonation time ratio and the number of stressed words produced per minute were the best predictors of fluency scores. However, the raters differed as regards how much importance they attributed to accuracy, lexical diversity and the mean length of pauses. The number of filled and unfilled pauses and other disfluency phenomena were not found to influence perceptions of fluency.
John Local, “Getting back to prior talk: and-uh (m) as a back-connecting device in British and American English,” in Sound Patterns in Interaction: Cross-linguistic studies from conversation (Typological Studies in Language), Couper-Kuhlen, Elizabeth and Ford, Cecilia E., Ed. Amsterdam, The Netherlands: John Benjamins, 2004, pp. 377-400. DOI: 10.1075/tsl.62.18loc. https://benjamins.com/catalog/tsl.62.18loc.

Abstract Participants in talk-in-interaction understand what is happening by reference to preceding turns at talk. However, some utterances are not closely linked to the turn that immediately precedes them. Here I explore one device that participants may use to display that subsequent talk is not to be treated as cohering with the immediately prior talk, but that it relates to some earlier talk produced by the same speaker. I document some aspects of the sequential organization, strategic deployment, and general phonetic features of and-uh(m) as a device for linking back to a speaker’s own prior talk. I argue that when and-uh(m) is employed in this fashion, it is characterized by a remarkably stable cluster of distinctive characteristics.
Sandra Merlo, and Letícia Lessa Mansur, “Descriptive discourse: topic familiarity and disfluencies,” Journal of Communication Disorders, vol. 37, 2004, pp. 489-503. DOI: 10.1016/j.jcomdis.2004.03.002.

Abstract This investigation was undertaken to address questions about topic familiarity and disfluencies during oral descriptive discourse of adult speakers. Participants expressed more attributes when the topic was familiar than when it was unfamiliar. Fillers and lexical pauses were the most frequent disfluencies. The mean duration of each hesitation pause was 776 ms. The sum of hesitation pause durations was well correlated with the number of occurrences. Repetitions, hesitation pauses, and prolongations were shown to have the same role, which was distinct from the role of fillers. The type of analysis conducted in this investigation may be useful in distinguishing between normal and disordered speech production. Learning outcomes: The reader will obtain information about the differences between the number of propositions in familiar and unfamiliar oral descriptions. The reader will also become aware of the distribution of disfluencies in discourse categories employed by the participants in this investigation.

Keywords Descriptive discourse, disfluency, Fluency, Topic familiarity
Daniel O’Connell, and Sabine Kowal, “The History of Research on the Filled Pause as Evidence of ’The Written Language Bias in Linguistics’ (Linell, 1982),” Journal of Psycholinguistic Research, vol. 33, no. 6, 2004, pp. 459-474. DOI: 10.1007/s10936-004-2666-6.

Abstract Erard’s (2004) publication in the New York Times of a journalistic history of the filled pause serves as the occasion for this critical review of the past half-century of research on the filled pause. Historically, the various phonetic realizations or instantiations of the filled pause have been presented with an odd recurrent admixture of the interjection ah. In addition, the filled pause has been consistently associated with both hesitation and disfluency. The present authors hold that such a mandatory association of the filled pause with disfluency is the product of The Written Language Bias in Linguistics [Linell, 1982] and disregards much cogent evidence to the contrary. The implicit prescriptivism of well formedness—a demand derived from literacy—must be rejected; literate well formedness is not a necessary or even typical property of spontaneous spoken discourse; its structures and functions—including those of the filled pause—are very different from those of written language. The recent work of Clark and Fox Tree (2002) holds promise for moving the status of the filled pause not only toward that of a conventional word, but also toward its status as an interjection. This latter development is also being fostered by lexicographers. Nonetheless, in view of ongoing research regarding the disparate privileges of occurrence and functions of filled pauses in comparison with interjections, the present authors are reluctant to categorize the filled pause as an interjection.

Keywords disfluency, filler, hesitation, interjection, orality, spontaneity, word
Daniel O’Connell, Sabine Kowal, and Edward J. Dill, “Dialogicality in TV News Interviews,” Journal of Pragmatics, vol. 36, 2004, pp. 185-205. DOI: 10.1016/j.pragma.2003.06.001.

Abstract Eight TV news interviews, six American, one British, and one German, were analyzed for markers of orality/literacy (back channeling, hesitations, interruptions, contractions and elisions, first-person singular pronominals, interjections, and tag questions). The interviewer/interviewee pairs were: W. Blitzer/B. Clinton; K. Couric/H. Clinton; B. Shaw/B. Bush, /M. Thatcher, /B. Goldwater, and /C. Powell; M. Bashir/Princess Diana; and G. Gaus/H. Arendt. The most evident markers of orality were hesitations (filled pauses, repeats, and false starts) and first-person singular pronominals on the part of interviewees. Across the four interviews of B. Shaw, there were notable differences in style for both interviewer and interviewees. The women participants used interjections and tag questions more frequently than the men and were interrupted more often by the men. The results are interpreted in light of a dialogical theory of intersubjectivity.

Keywords Dialogicality, Discourse markers, Informality, Intersubjectivity, orality, TV news interviews
Johanna Rendle-Short, “Showing structure: Using um in the academic seminar,” Pragmatics, vol. 14, no. 4, January 2004, pp. 479-498. DOI: 10.1075/prag.14.4.04ren. https://www.jbe-platform.com/content/journals/10.1075/prag.14.4.04ren.

Abstract Um and uh are generally considered to be indicative of dysfluency and uncertainty in speech production. However, analysis of the academic seminar indicates that the distribution of um and uh is not random. In specific well-defined environments um is used to indicate the underlying structure of the talk. Although Swerts (1998) has already suggested that fillers such as um and uh could be treated as discourse markers in Dutch, the notion that such tokens are functioning as discourse markers has not been developed in detail. This paper analyses the role played by um in a series of computer science seminars. Using traditional conversation analysis techniques, the paper focuses on the way in which um indicates structure in the academic seminar by maintaining coherence across bits of talk. It thus argues that in specific well-defined environments um functions as a discourse marker. This paper therefore addresses such issues as the role and function of um in seminar talk, the environments in which it occurs, and its use in indicating the structure of the talk to the listening audience.

Keywords Institutional talk; Um; Academic monologue; Repair; Uh; Discourse markers
Norman Segalowitz, and Barbara F. Freed, “Context, Contact, and Cognition in Oral Fluency Acquisition: Learning Spanish in At Home and Study Abroad Contexts,” Studies in Second Language Acquisition, vol. 26, no. 02, 2004, pp. 173-199. DOI: 10.1017/s0272263104262027. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=214862&fulltextType=RA&fileId=S0272263104262027.

Abstract This study investigates the role of context of learning in second language (L2) acquisition. Participants were 40 native speakers of English studying Spanish for one semester in one of two different learning contexts—a formal classroom at a home university (AH) and a study abroad (SA) setting. The research looks at various indexes of oral performance gains—particularly gains in oral fluency as measured by temporal and hesitation phenomena and gains in oral proficiency based on the Oral Proficiency Interview (OPI). The study also examines the relation these oral gains bore to L2-specific cognitive measures of speed of lexical access (word recognition), efficiency (automaticity) of lexical access, and speed and efficiency of attention control hypothesized to underlie oral performance. The learners also provided estimates of the number of hours they spent in extracurricular language-contact activities. The results show that in some respects learners in the SA context made greater gains, both in terms of temporal and hesitation phenomena and in oral proficiency as measured by the OPI, than learners in the AH context. There were also, however, significant interaction effects and correlational patterns indicating complex relationships between oral proficiency, cognitive abilities, and language contact. The results demonstrate the importance of the dynamic interactions that exist among oral, cognitive, and contextual variables. Such interactions may help explain the enormous individual variation one sees in learning outcomes, and they underscore the importance of studying such variables together rather than in isolation.
Sidney J. Segalowitz, and Korri Lane, “Perceptual fluency and lexical access for function versus content words,” Behavioral and Brain Sciences, vol. 27, 4 2004, pp. 307-308. DOI: 10.1017/S0140525X04310071. http://journals.cambridge.org/article_S0140525X04310071.

Abstract By examining single-word reading times (in full sentences read for meaning), we show that (1) function words are accessed faster than content words, independent of perceptual characteristics; (2) previous failures to show this involved problems of frequency range and task used; and (3) these differences in lexical access are related to perceptual fluency. We relate these findings to issues in the literature on event-related potentials (ERPs) and neurolinguistics.
Chung-Hsien Wu, and Gwo-Lang Yan, “Acoustic Feature Analysis and Discriminative Modeling of Filled Pauses for Spontaneous Speech Recognition,” Journal of VLSI Signal Processing, vol. 36, no. 2-3, 2004, pp. 91-104. DOI: 10.1023/B:VLSI.0000015089.17975.f4.

Abstract Most automatic speech recognizers (ASRs) concentrate on read speech, which is different from spontaneous speech with disfluencies. ASRs cannot deal with speech with a high rate of disfluencies such as filled pauses, repetitions, lengthening, repairs, false starts and silence pauses. In this paper, we focus on the feature analysis and modeling of the filled pauses "ah," "ung," "um," "em," and "hem" in spontaneous speech. Karhunen-Loève transform (KLT) and linear discriminant analysis (LDA) were adopted to select discriminant features for filled pause detection. In order to suitably determine the number of discriminant features, Bartlett hypothesis testing was adopted. Twenty-six features were selected using Bartlett hypothesis testing. Gaussian mixture models (GMMs), trained with a gradient descent algorithm, were used to improve the filled pause detection performance. The experimental results show that the filled pause detection rates using KLT and LDA were 84.4% and 86.8%, respectively. A significant improvement was obtained in the filled pause detection rate using the discriminative GMM with KLT and LDA. In addition, the LDA features outperformed the KLT features in the detection of filled pauses.

2003

Martine Adda-Decker, Benoît Habert, Claude Barras, Gilles Adda, Philippe Boula de Mareuil, and Patrick Paroubek, “A disfluency study for cleaning spontaneous speech automatic transcripts and improving speech language models,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 67-70. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_067.pdf.

Abstract The aim of this study is to elaborate a disfluent speech model by comparing different types of audio transcripts. The study makes use of 10 hours of French radio interview archives, involving journalists and personalities from political or civil society. A first type of transcripts is press-oriented where most disfluencies are discarded. For 10% of the corpus, we produced exact audio transcripts: all audible phenomena and overlapping speech segments are transcribed manually. In these transcripts about 14% of the words correspond to disfluencies and discourse markers. The audio corpus has then been transcribed using the LIMSI speech recognizer. With 8% of the corpus the disfluency words explain 12% of the overall error rate. This shows that disfluencies have no major effect on neighboring speech segments. Restarts are the most error prone, with a 36.9% within class error rate.

Keywords DiSS
Jennifer Arnold, Maria Fagnano, and Michael K. Tanenhaus, “Disfluencies Signal Theee, Um, New Information,” Journal of Psycholinguistic Research, vol. 32, no. 1, January 2003, pp. 25-36. DOI: 10.1023/A:1021980931292.

Abstract Speakers are often disfluent, for example, saying "theee uh candle" instead of "the candle." Production data show that disfluencies occur more often during references to things that are discourse-new, rather than given. An eyetracking experiment shows that this correlation between disfluency and discourse status affects speech comprehension. Subjects viewed scenes containing four objects, including two cohort competitors (e.g., camel, candle), and followed spoken instructions to move the objects. The first instruction established one cohort as discourse-given; the other was discourse-new. The second instruction was either fluent or disfluent, and referred to either the given or new cohort. Fluent instructions led to more initial fixations on the given cohort object (replicating Dahan et al., 2002). By contrast, disfluent instructions resulted in more fixations on the new cohort. This shows that discourse-new information can be accessible under some circumstances. More generally, it suggests that disfluency affects core language comprehension processes.

Keywords disfluency, information status, language processing, reference comprehension
Matthew P. Aylett, “Disfluency and speech recognition profile factors,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 51-54. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_051.pdf.

Abstract This paper reports on work bringing together disfluency coding carried out by Lickley [1] and recognition work carried out as part of the ERF project (Bard, Thompson & Isard, [2]) at Edinburgh University. A set of factors are investigated which characterise the behaviour of the ASR during recognition based on an analysis of the resulting word lattice. These factors can be grouped as: Entropy Factors - the entropy of the acoustic and language model likelihoods, within the word lattice, over a 10 ms frame, and, Arc Factors - the number of non-unique and unique arcs in the word lattice in any given 10 ms time frame, together with the variance of start and end times of these arcs, and the number of arcs starting or ending in the frame. The values of all factors were used to train a simple CART model. The CART model was used to predict: recognition failure, interruption point location (the point where a disfluency begins), and whether the location was in a repair or a reparandum. The entropy of the language model values contributed most to the model's prediction of recognition failure, and whether a frame was in a repair or reparandum. In contrast, the number of unique word hypotheses contributed most to the successful prediction of a frame being close to an interruption point.

Keywords DiSS
Karl G.D. Bailey, and Fernanda Ferreira, “Disfluencies affect the parsing of garden-path sentences,” Journal of Memory and Language, vol. 49, no. 2, 2003, pp. 183-200. DOI: 10.1016/S0749-596X(03)00027-5.

Abstract Spontaneous speech differs in several ways from the sentences often studied in psycholinguistics experiments. One important difference is that naturally produced utterances often contain disfluencies. In this study, we examined how the presence of “uh” in a spoken sentence might affect processes that assign syntactic structure (i.e., parsing). Four experiments are reported. In the first, participants judged the grammaticality of sentences that had disfluencies either right before the head noun of the ambiguous phrase or after (e.g., Sandra bumped into the busboy and the uh uh waiter told her to be careful or Sandra bumped into the busboy and the waiter uh uh told her to be careful). Sentences in the latter condition were judged grammatical less often. This result was replicated in the second experiment, in which disfluencies were replaced with environmental sounds. These findings suggest that interruptions can affect syntactic parsing, and the content of the interruption need not be speechlike. In Experiments 3 and 4 we tested whether these effects occurred because listeners use interruptions as cues to help resolve a structural ambiguity. Results from these latter two grammaticality judgment tasks suggest that when an interruption occurs before an ambiguous noun phrase, comprehenders are more likely to assume that the noun phrase is the subject of a new clause rather than the object of an old one, and furthermore, it appears that the parser is relatively insensitive to the form of the interruption. We conclude that disfluencies can influence the parser by signaling a particular structure; at the same time, for the parser, a disfluency might be any interruption to the flow of speech.
Alan Bell, Daniel Jurafsky, Eric Fosler-Lussier, Cynthia Girand, and Daniel Gildea, “Effects of disfluencies, predictability, and utterance position on word form variation in English conversation,” Journal of the Acoustical Society of America, vol. 113, no. 2, February 2003, pp. 1001-1024. DOI: 10.1121/1.1534836.

Abstract Function words, especially frequently occurring ones such as (the, that, and, and of), vary widely in pronunciation. Understanding this variation is essential both for cognitive modeling of lexical production and for computer speech recognition and synthesis. This study investigates which factors affect the forms of function words, especially whether they have a fuller pronunciation (e.g., ði, ðæt, ænd, ʌv) or a more reduced or lenited pronunciation (e.g., ðə, ðit, n, ə). It is based on over 8000 occurrences of the ten most frequent English function words in a 4-h sample from conversations from the Switchboard corpus. Ordinary linear and logistic regression models were used to examine variation in the length of the words, in the form of their vowel (basic, full, or reduced), and whether final obstruents were present or not. For all these measures, after controlling for segmental context, rate of speech, and other important factors, there are strong independent effects that made high-frequency monosyllabic function words more likely to be longer or have a fuller form (1) when neighboring disfluencies (such as filled pauses uh and um) indicate that the speaker was encountering problems in planning the utterance; (2) when the word is unexpected, i.e., less predictable in context; (3) when the word is either utterance initial or utterance final. Looking at the phenomenon in a different way, frequent function words are more likely to be shorter and to have less-full forms in fluent speech, in predictable positions or multiword collocations, and utterance internally. Also considered are other factors such as sex (women are more likely to use fuller forms, even after controlling for rate of speech, for example), and some of the differences among the ten function words in their response to the factors.
Ramona Benkenstein, and Adrian P. Simpson, “Phonetic correlates of self-repair involving word repetition in German spontaneous speech,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 81-84. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_081.pdf.

Abstract A phonetic description of self-initiated self-repair sequences involving the repetition of words in German spontaneous speech is presented. Data are drawn from the Kiel Corpus of Spontaneous Speech. The description is primarily impressionistic auditory, but it also employs acoustic records to verify and objectify the impressionistic findings. A number of different patterns around cut-off are identified. The comparison of phonetic differences between reparandum and repair tokens is used to argue that repair sequences can also provide an interesting insight into the way in which fluent stretches of spontaneous speech are phonetically organized.

Keywords DiSS
Martin Corley, and Robert Hartsuiker, “Hesitation in speech can... um... help a listener understand,” in Proceedings of the twenty-fifth meeting of the Cognitive Science Society, Erlbaum, August 2003, pp. 276-281. http://csjarchive.cogsci.rpi.edu/proceedings/2003/mac/prof70.html.

Abstract This paper investigates the effect of disfluencies on listeners’ on-line processing of speech. More specifically, it tests the hypothesis that filled pauses like um, which tend to occur before words that are low in accessibility, act as a signal to the listener that a relatively inaccessible word is about to be produced. Two experiments are reported, in which participants followed recorded instructions to press buttons corresponding to images on a computer screen. In 50% of trials, the spoken name of the image was preceded by um. In experiment 1, the intrinsic accessibility of the target items was manipulated (by means of lexical frequency); in experiment 2, the extrinsic (visual) accessibility varied. Both experiments demonstrated that participants were quicker to respond when a target was preceded by um, regardless of whether the item referred to was difficult to access or not. In addition, in experiment 2 there was a weak interaction between accessibility and presence or absence of an um. We present the data here as early evidence that listeners can benefit from disfluencies in others’ speech, and outline some methodological and theoretical considerations and further experiments.
Yasuharu Den, “Some strategies in prolonging speech segments in spontaneous Japanese,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 87-90. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_087.pdf.

Abstract In this paper, we investigate segmental prolongation in a corpus of spontaneous Japanese monologues consisting of over 700,000 words. We examine effects on the rate of prolongation of various factors including speech types, the genders of speakers, word classes, word positions in the phrase and in the inter-pausal unit, and the presence of preceding fillers. Based on the empirical findings, we state some strategies in prolonging speech segments used by Japanese speakers.

Keywords DiSS
Sheena Finlayson, Victoria Forrest, Robin Lickley, and Janet Mackenzie Beck, “Effects of the restriction of hand gestures on disfluency,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 21-24. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_021.pdf.

Abstract This paper describes an experimental pilot study of disfluency and gesture rates in spontaneous speech where speakers perform a communication task in three conditions: hands free, one arm immobilized, both arms immobilized. Previous work suggests that the restriction of the ability to gesture can have an impact on the fluency of speech. In particular, it has been found that the inability to produce iconic gestures, which depict actions and objects, results in a higher rate of disfluency. Models of speech production account for this by suggesting that gesture and speech production are part of the same integrated system. Such models differ in their interpretation of the location of the gesture planning mechanism in relation to the speech model: some authors suggest that iconic gestures relate closely to lexical access, while others suggest that the link is located around the conceptualization stage. The findings of this study tentatively confirm that there is a relationship between gesture and fluency - overall, disfluency increases as gesture is restricted. But it remains unclear whether the disfluency is more related to lexical access than to conceptualization. Proposals for a larger study are suggested. The work is of interest to psycholinguists focusing on the integration of gesture into models of speech production and to Speech and Language Therapists who need to know about the impact that an impaired ability to produce gestures may have on communication.

Keywords DiSS
Kotaro Funakoshi, and Takenobu Tokunaga, “Evaluation of a robust parser for spoken Japanese,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 55-58. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_055.pdf.

Abstract We implemented a parser designed to handle ill-formedness in Japanese speech. The parser was evaluated by utilizing newly collected speech data, which was obtained from an experiment designed to produce ill-formed data effectively. Introducing the proposed method increased the number of correctly analyzed utterances from 171 to 322, from among 532 utterances in the corpus.

Keywords DiSS
Robert J. Hartsuiker, Martin Corley, Robin Lickley, and Melanie Russell, “Perception of disfluency in people who stutter and people who do not stutter: Results from magnitude estimation,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 35-37. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_035.pdf.

Abstract Recent accounts of stuttering consider disfluencies the result of an interaction between speech planning and self-monitoring, emphasizing the continuity between errors made in everyday speech and those made by people who stutter. On Vasić & Wijnen’s account, the monitor is hypervigilant for upcoming problems and interrupts and restarts the speech signal, resulting in disfluent speech. Crucially, on this account, self-monitoring is a perceptual function. Therefore, this account makes two predictions (1) people who stutter are also hypervigilant in perceiving another person’s speech. (2) the quality of disfluencies made by people who stutter and those who do not will be comparable. We tested these hypotheses using a magnitude estimation judgment task. Twenty participants who stutter and 20 controls were asked to rate the fluency of excerpted fluent and disfluent fragments from recorded dialogues, either between people who stutter or between non-stutterers. In line with the first hypothesis, people who stutter tended to rate all fragments as more disfluent than controls did. However the second hypothesis was not confirmed: across judges, fluent and disfluent fragments excerpted from recordings of people who stutter were rated as less fluent than those excerpted from control dialogues, suggesting that there are perceptually relevant differences between the speech of PWS and PWDNS, independent of number and type of disfluencies.

Keywords DiSS
Sandrine Henry, and Berthille Pallaud, “Word fragments and repeats in spontaneous spoken French,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 77-80. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_077.pdf.

Abstract This paper presents the results of a study conducted on the interaction of two disfluencies: repeats and word fragments. It is based on 150 repeated word fragments (e.g., "on le re- re- revendique encore une fois") extracted from a one-million-word corpus of spoken French. Word fragments such as: "notre metier spé- spécifique", are, like repeats (e.g., "vous avez évalué le le montant des dégâts"), very frequent events in spoken language: on average, there is 1 word fragment every 50 seconds, 1 repeat every 17 seconds. Speakers and listeners alike are generally unaware of these phenomena as if they were not part of the communication process. They seldom trigger a metalinguistic reaction from the speaker and are even more rarely acknowledged by the listener. These phenomena have sometimes been interpreted as ’errors’ in the communication process, like slips of the tongue. Word fragments and repeats encompass different categories of phenomena, and this enables us to define them as an heterogeneous group ruled by different types of constraints and mechanisms. This analysis rests on the following criteria: structural aspects of the repeat, types of word fragments, morphological and syntactic aspects. Analyses of these repeated of identical word fragments from two different angles - that of the repeats and then that of the word fragments - confirm the relevance of the distinction between these two types of disfluencies.

Keywords DiSS
Peter Howell, “Is a perceptual monitor needed to explain how speech errors are repaired?,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 31-34. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_031.pdf.

Abstract Kolk & Postma [2] proposed, following Dell & O’Seaghdha [1], that when a speaker chooses a word, phonologically-related words as well as the intended word are activated. Initially, the activations of all these words are similar, though eventually the intended word reaches a higher asymptotic value when activation is complete [1]. According to Kolk & Postma [2], if a response is made in the phase where activation is building up (rather than at full activation), there is a higher chance of the competing, rather than the intended, word being selected (i.e. an error). They propose that a speaker detects such errors when they are produced overtly using the perceptual system, and a monitor in the linguistic system responds by interrupting and initiating the correction [2]. Word repetition and hesitation (not errors in themselves) have been regarded as signifying underlying errors that are detected and interrupted before speech is output in a similar way to overt errors. An assumption in [2] is that activation for a word stops (or, if it continues, is ignored) immediately a candidate word is selected. The brain processes responsible for speech production have massive parallel capacity. Consequently, activation for all the candidates for a word slot could continue beyond the point where a word is selected in cases where a word is responded to prematurely. When the selected word reaches asymptote, the relative activations of this and the other candidate words indicate when an error has occurred (when the selected word has a lower activation than one of the competing words), and what correction is appropriate (the word with the highest activation). This provides the basis for error detection and correction without the need for a perceptual monitor. Continuing the buildup of activation after a word has been selected, implies that activation of nearby words in its phrase overlaps. It is shown, with some realistic assumptions about how activation builds up and decays across different words in a phrase, that this model predicts word repetition and hesitation and also part-word disfluencies (a characteristic of stuttering), again without the need for a perceptual monitor.

Keywords DiSS
Kim Kirsner, John Dunn, and Kathryn Hird, “Fluency: Time for a Paradigm Shift,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 13-16. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_013.pdf.

Abstract Pauses in spontaneous speaking constitute a rich source of data for several disciplines. They have been used to enhance automatic segmentation of speech, classification of patients with acquired communication disorders, the design of psycholinguistic models of speaking, and the analysis of psychological disorders. Unfortunately, however, although pause analysis has been with us for more than 40 years, their interpretation has been compromised by several problems [1]. The first problem is that the pause distribution is skewed, making mean duration a poor measure of central tendency. The second problem is that there are at least two components to the pause duration distribution, a problem that has been confounded by the fact that most authors have assumed that short pauses can be ignored. The third problem is that many scholars have used an arbitrary criterion to separate the pause components thereby adopting statistics that reflect errors of commission or omission. In this paper we review recent work that resolves each of these issues and illustrates the application of the new paradigm to a variety of problems. Our research indicates that, first, there are at least two pause duration distributions, each of which may be sensitive to theoretically interesting variables; second, the distributions are log-normal, thereby opening the way to appropriate measures of central tendency and dispersion, and, third, the distributions can be reliably separated by application of signal detection theory, and the proportion of misclassifications minimised and estimated. This paper reviews recent research using the new approach to pause analysis.

Keywords DiSS
Koji Kitayama, Masataka Goto, Katunobu Itou, and Tetsunori Kobayashi, “Speech Spotter: New Speech Interface Capable of Invoking Speech Recognition Functions during Human-Human Conversation,” in Proceedings of Workshop on Interactive Systems and Software, 2003, pp. 9-18. http://www.wiss.org/WISS2003/program.html.

Abstract In this paper, we propose a novel speech interface function, called "Speech Spotter", which enables a user to enter voice commands into a speech recognizer during natural human-human conversation. Only when a user utters a filled pause (a vowel-lengthening hesitation like "er...") and then utters a voice command with a high pitch, its voice command is accepted by the speech recognizer. Thus the Speech Spotter function makes full use of nonverbal information of human voice: a filled pause and the voice pitch of an utterance. By using the Speech Spotter function, we built two application systems: "on-demand human-human conversation support system" and "a telephone system with BGM-playback function". The results of using these systems showed that the Speech Spotter function is robust and convenient enough to be used in daily human-human conversation at a site or over a cellular phone.
Göran Kjellmer, “Hesitation. In Defence of ER and ERM,” English Studies, vol. 84, no. 2, 2003, pp. 170-198. DOI: 10.1076/enst.84.2.170.14903.

Abstract Speech differs in a number of ways from writing. How great the differences are has only been fully realised when detailed comparisons were made possible by the publication of large corpora that were partly or wholly based on the spoken language. While the two media, speech and writing, necessarily have large sections in common, it is true to say that they often use widely differing means of conveying information. The means that are specific to speech were long either neglected or ignored by researchers, so that the description of individual languages was formerly based mainly on their written manifestations. One characteristic of speech is its frequent indication of hesitation or uncertainty. The means by which it is expressed range from nonlinguistic, such as gestures, facial expressions and bodily movements to linguistic, such as repetitions. Another linguistic hesitation marker is the pause, whether silent or filled. This feature can now be studied by means of modern corpora.
Torbjörn Lager, “In dialogue with a desktop calculator: A concurrent stream processing approach to building simple conversational agents,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 59-62. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_059.pdf.

Abstract Human spontaneous face-to-face conversations are characterized by phenomena such as turn-taking, feedback, sounds of hesitation and repairs. A simple and highly modular stream-based approach to natural language processing is proposed that attempts to deal with such things. A basic version of the model has been implemented in the Oz programming language.

Keywords DiSS
Piroska Lendvai, Antal van den Bosch, and Emiel Krahmer, “Memory-based disfluency chunking,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 63-66. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_063.pdf.

Abstract We investigate the feasibility of machine learning in automatic detection of disfluencies in a large syntactically annotated corpus of spontaneous spoken Dutch. We define disfluencies as chunks that do not fit under the syntactic tree of a sentence (including fragmented words, laughter, self-corrections, repetitions, abandoned constituents, hesitations and filled pauses). We use a memory-based learning algorithm for detecting disfluent chunks, on the basis of a relatively small set of low-level features, keeping track of the local context of the focus word and of potential overlaps between words in this context. We use attenuation to deal with sparse data and show that this leads to a slight improvement of the results and more efficient experiments. We perform a search for the optimal settings of the learning algorithm, which yields an accuracy of 97% and an F-score of 80%. This is a significant improvement of the baselines and of the results obtained with the default settings of the learner.

Keywords DiSS
Krisztina Menyhárt, “Age-dependent types and frequency of disfluencies,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 45-48. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_045.pdf.

Abstract The age-dependent changes of one’s speech production from childhood up to old age are relatively well known. However, there has been less research conducted concerning the possible alterations of the disfluency phenomena in speakers’ spontaneous speech determined by age. Our hypothesis is that permanent changes are going on in the operation of speech production processes from early childhood up to old age, and that those changes can be studied via observing disfluency phenomena. A series of experiments has been carried out with the participation of altogether 30 Hungarian-speaking persons, children, middle-aged adults and old subjects (ages of 77). Their spontaneous speech was recorded and analyzed concerning the articulation and speech tempi, silent and filled pauses, as well as other disfluency phenomena (like false starts, repetitions, slips, etc.). The aim of the research is to explore the invariant and variable factors of the disfluencies depending on age. The results highlight also the individual differences that seem to be independent of the age factor.

Keywords DiSS
Hannele Nicholson, Ellen Gurman Bard, Robin Lickley, Anne H. Anderson, Jim Mullin, David Kenicer, and Lucy Smallwood, “The intentionality of disfluency: Findings from feedback and timing,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 17-20. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_017.pdf.

Abstract This paper addresses the causes of disfluency. Disfluency has been described as a strategic device for intentionally signalling to an interlocutor that the speaker is committed to an utterance under construction. It is also described as an automatic effect of cognitive burdens, particularly of managing speech production during other tasks. To assess these claims, we used a version of the map task and tested 24 normal adult subjects in a baseline untimed monologue condition against conditions adding either feedback in the form of an indication of a supposed listener’s gaze, or time-pressure, or both. Both feedback and time-pressure affected the nature of the speaker’s performance overall. Disfluency rate increased when feedback was available, as the strategic view predicts, but only deletion disfluencies showed a significant effect of this manipulation. Both the nature of the deletion disfluencies in the current task and of the information which the speaker would need to acquire in order to use them appropriately suggest ways of refining the strategic view of disfluency.

Keywords DiSS
Sieb G. Nooteboom, “Self-monitoring is the main cause of lexical bias in phonological speech errors,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 27-30. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_027.pdf.

Abstract In this paper I present new evidence, stemming both from an experiment and from spontaneous speech, demonstrating that (a) lexical bias is caused by self-monitoring of inner speech, as proposed by Levelt et al. [1], and (b) that there is phoneme-to-word feedback in the mental programming of speech, as supposed by Dell [2] and Stemberger [3]. It is argued here that possibly phoneme-to-word feedback is an unavoidable side-effect of self-monitoring of inner speech.

Keywords DiSS
Caroline L. Rieger, “Disfluencies and hesitation strategies in oral L2 tests,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 41-44. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_041.pdf.

Abstract This paper presents an investigation of hesitation strategies of intermediate learners of German as a second or foreign language (L2) when they take part in oral L2 tests. Previous studies of L2 hesitation strategies have focused on beginning and advanced L2 learners. They found that beginners tend to leave their hesitation pauses unfilled making their speech highly disfluent [17], while advanced L2 speakers - similar to native speakers - use a variety of fillers. In oral L2 tests, intermediate learners hesitate mainly for two reasons: to search for a German word or structure, or to think about the content of their utterance. Some participants use a variety of strategies to signal to the addressee that they are hesitating. This variety is not as rich as it is for advanced L2 learners or native speakers. Other participants leave their hesitation pauses unfilled or rely on quasi-lexical fillers to hold the floor when hesitating.

Keywords DiSS
Guergana Savova, and Joan Bachenko, “Prosodic features of four types of disfluencies,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 91-94. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_091.pdf.

Abstract We present a corpus-based approach for using intonation and duration to detect disfluency sites. The questions we aim to answer are: what are the prosodic cues for each disfluency type? Can predictive models be built to describe the relationship between disfluency types and prosodic cues? Are there correlations between the reparandum onset and offset and the repair onset and offset? Is there a general prosodic strategy? Our findings support four main hypotheses: 1) The Combination Rule: A single prosodic feature does not uniquely identify disfluencies or their types. Rather, it is a combination of several features that signals each type. 2) The Compensatory Rule: If there is an overlap of one prosodic feature, then another cue neutralizes the overlap. 3) The Discourse Type Rule: Prosodic cues for disfluencies vary according to discourse type. 4) The Expanded Reset Rule: Repair onsets are dependent on reparandum onsets and reparandum offsets. The limitation of the current study is the relatively small corpus size. Further testing of our proposed hypotheses is needed.

Keywords DiSS
Shu-Chuan Tseng, “Repairs and repetitions in spontaneous Mandarin,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 73-76. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_073.pdf.

Abstract 246 overt repairs, 653 complete repetitions and 475 partial repetitions were identified in an annotated corpus of spontaneous Mandarin conversations. On the basis of the data, this paper investigates Mandarin repairs and repetitions by segmenting them into the reparandum part, the editing part and the reparans part and by tagging them using the CKIP automatic word segmentation and tagging system. Results of the use of editing term, the distribution of part of speech and syllables in the reparandum are presented. Semantic differences and similarity in the discrepancy of tagging results of the reparandum and the reparans are also discussed.

Keywords DiSS
Fan Yang, Peter A. Heeman, and Susan E. Strayer, “Acoustically verifying speech repair annotations,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 97-100. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_097.pdf.

Abstract Identifying speech repairs is a critical part of annotating spontaneous speech. DialogueView is an annotation tool that provides visual and audio supports for directly annotating speech repairs. In this paper, we report the usability of clean play, a special feature implemented in DialogueView, which cuts out the annotated reparanda and editing terms and plays the remaining speech. We find that although clean play does not help users detect repairs, it does help them determine the extent of repairs. We also find that clean play improves users’ confidence because they have another way to verify their annotations.

Keywords DiSS

2002

Jennifer Arnold, Maria Fagnano, and Michael K. Tanenhaus, “Disfluencies signal theee, um, new stuff: Immediate use of disfluencies during reference comprehension,” in 15th CUNY Conference on Human Sentence Processing, New York, NY, 2002. http://qcpages.qc.cuny.edu/~efernand/CUNY2002/program/absts/021.htm.

Abstract Spontaneous speech is rarely fluent, resulting in hesitations, fillers ("um" / "uh"), repeated or repaired words, or pronouncing "the" as /thiy/ (Fox Tree & Clark, 1997). Yet these features are generally considered to not affect the core processes of language comprehension. While disfluencies have been argued to signal that the speaker is having difficulty (Clark & Wasow, 1998; Fox Tree & Clark, 1997), this metalinguistic knowledge has not been linked to specific language comprehension phenomena. A corpus analysis showed that speakers are disfluent more often when referring to entities that are new (rather than given) in the discourse. If listeners are sensitive to this correlation, disfluencies at the start of a noun phrase should lead them to focus on objects that are visible but have not yet been mentioned. Eye movements of 24 native speakers of English were recorded as they listened to pairs of instructions to move objects on a computer screen (Table 1). Each display contained 4 colored pictures (Rossion & Purtois, 2001), including two cohorts (e.g., camel/candle). We investigated the time course of referent identification for the first noun in the second instruction, manipulating whether: 1) the critical NP was fluent (the camel) or disfluent (thiy, uh, camel), and 2) the referent was discourse-new, or was given but unfocused, having just been mentioned as the goal of the first instruction. All NPs were accented. Disfluent NPs should lead to faster target looks in the new condition, and increased cohort competition in the given condition. By contrast, fluent, accented NPs provide an initial bias toward the given but nonfocused object (Dahan et al., in press), so we expected fluent NPs to lead to faster target looks in the given condition and more cohort competition in the new condition. Results showed precisely this interaction, beginning 200 msec after the onset of the head noun ("ca-"). 
Prior to the noun, there was also a preference for new objects in the disfluent condition and given objects in the fluent condition, emerging 200 msec after the determiner (the/thiy), which provided the first information about fluency. Thus, comprehenders immediately use information provided by disfluencies. This may stem from use of purely distributional information about disfluencies and discourse status, or may result from inferring that the speaker is having difficulty in lexical retrieval (which would be less likely for a just-mentioned referent). Regardless, information about fluency affects the earliest moments of reference resolution.
Table 1: Sample instructions (target NP is underlined)
Given (Discourse-Old) Context: Put the grapes below the candle.
Discourse-new Context: Put the grapes below the camel.
a. fluent (accented): Now put the candle below the salt shaker.
b. disfluent: Now put thiy, uh, CANDLE below the salt shaker.
Thomas Berg, “Slips of the typewriter key,” Applied Psycholinguistics, vol. 23, no. 2, 2002, pp. 185-207. DOI: 10.1017/s0142716402002023. http://journals.cambridge.org/article_S0142716402002023.

Abstract This article presents an analysis of 500 submorphemic slips of the typewriter key that escaped the notice of authors and other proofreaders and thereby made their way into the published records of scientific research. Despite this high selectivity, the corpus is not found to differ in major ways from other collections of keying slips. The main characteristics of this error type include a predominance of within-word slips, an elevated rate of noncontextual slips, a heightened incidence of omissions (in particular, masking errors), a high number of adjacent switches, and an uncommonness of these slips in word edges. In all these respects, slips of the key resemble slips of the pen, although not slips of the tongue. It is argued that speech errors are shaped by a fully deployed structural representation, whereas key slips arise under the influence of a weak structural representation. By implication, speaking is characterized by a hierarchical strategy of activation while typewriting is subject to the so-called staircase strategy of serialization in which activation is a function of linear distance. These disparate strategies may be understood as a response of the processing system to disparate requirements, such as varying speed of execution.
Herbert Clark, and Jean E. Fox Tree, “Using uh and um in spontaneous speaking,” Cognition, vol. 84, no. 1, May 2002, pp. 73-111. DOI: 10.1016/S0010-0277(02)00017-3.

Abstract The proposal examined here is that speakers use uh and um to announce that they are initiating what they expect to be a minor (uh), or major (um), delay in speaking. Speakers can use these announcements in turn to implicate, for example, that they are searching for a word, are deciding what to say next, want to keep the floor, or want to cede the floor. Evidence for the proposal comes from several large corpora of spontaneous speech. The evidence shows that speakers monitor their speech plans for upcoming delays worthy of comment. When they discover such a delay, they formulate where and how to suspend speaking, which item to produce (uh or um), whether to attach it as a clitic onto the previous word (as in "and-uh"), and whether to prolong it. The argument is that uh and um are conventional English words, and speakers plan for, formulate, and produce them just as they would any word.

Keywords conversation, Dialogue, disfluencies, Language production, spontaneous speech, uh, um
Catia Cucchiarini, Helmer Strik, and Lou Boves, “Quantitative assessment of second language learners’ fluency: Comparisons between read and spontaneous speech,” Journal of the Acoustical Society of America, vol. 111, no. 6, June 2002, pp. 2862-2873. DOI: 10.1121/1.1471894.

Abstract This paper describes two experiments aimed at exploring the relationship between objective properties of speech and perceived fluency in read and spontaneous speech. The aim is to determine whether such quantitative measures can be used to develop objective fluency tests. Fragments of read speech (Experiment 1) of 60 non-native speakers of Dutch and of spontaneous speech (Experiment 2) of another group of 57 non-native speakers of Dutch were scored for fluency by human raters and were analyzed by means of a continuous speech recognizer to calculate a number of objective measures of speech quality known to be related to perceived fluency. The results show that the objective measures investigated in this study can be employed to predict fluency ratings, but the predictive power of such measures is stronger for read speech than for spontaneous speech. Moreover, the adequacy of the variables to be employed appears to be dependent on the specific type of speech material investigated and the specific task performed by the speaker.
Jean E. Fox Tree, “Interpreting pauses and ums at turn exchanges,” Discourse Processes, vol. 34, no. 1, 2002, pp. 37-55. DOI: 10.1207/S15326950DP3401_2.

Abstract In 3 experiments, this article compares how overhearers interpreted second speakers’ contributions to a conversation depending on whether the second speaker responded to a first speaker immediately; paused and responded; said um and responded; or said um, paused, and then responded. The conversational snippets tested were unscripted and diverse; an example of one exchange is, "Are you here because of affirmative action?" (pause, um, or both) "It helped me out a little bit." Overhearers thought speakers had more production difficulty, were less honest, and were less comfortable with topics under discussion when speakers either said um or paused, and even more so with both. The best explanation for the data is that overhearers are judging, for each question asked, what it means for speakers to produce an anticipated or an unanticipated delay.
Yoko Kato Nakai, “Topic Shifting Devices Used by Supporting Participants in Native/Native and Native/Non-Native Japanese Conversations,” Japanese Language and Literature, vol. 36, no. 1, April 2002, pp. 1-25. DOI: 10.2307/3250876.

Abstract In this paper, I analyzed differences in the devices used by native and nonnative supporting participants in topic openings and closings in Japanese face-to-face conversations. My analysis builds on previous research on conversational units and topic-shifting devices in Japanese conversations (Hayashi 1960; Minami 1972, 1983, 1993; Ichikawa 1978; Sugito and Sawaki 1979; Noda 1981, 1990; Ikuta 1983; Sugito 1983, 1987; Jorden with Noda 1987; Sakuma 1987, 1990, 1992; Szatrowski 1986a, 1986b, 1987, 1991, 1993, 1997, 1998; Imaishi 1992; Sakuma and Suzuki 1993; Suzuki 1994, 1995; Karatsu 1995; Emmett 1996, 1998; Okada 1996; Sasaki 1996, 1998; Kato 1999), analyses of topic-shifting devices in English conversations (Garfinkel and Sacks 1970; Reichman 1978; Derber 1979; Goodwin 1981; Long 1981; Levinson 1983; Chafe 1987; Goodwin and Goodwin 1992; Sacks 1992; Geluykens 1993), and contrastive analyses of topic-shifting strategies in English and Japanese conversation (Maynard 1989; Yamada 1992; Watanabe 1993). I demonstrate that the non-native supporting participants in my data used fewer devices such as discourse developing connectives (e.g., demo ’but’, ja ’so [in that case]’, etc.) and the extended predicate (Jorden with Noda 1987) to indicate the relation of their utterances to the context in topic openings than Japanese native supporting participants did. Non-native supporting participants also tended to use more aizuchi ’backchannel utterances’ in topic closings than did native supporting participants, who combined aizuchi with a variety of other devices such as fragments, assessments, summary utterances, direct style, final particles, prolonged vowels, overlap, repetition, and co-construction.
Miguel Oliveira, “The Role of Pause Occurrence and Pause Duration in the Signaling of Narrative Structure,” in PorTAL ’02 Proceedings of the Third International Conference on Advances in Natural Language Processing, Springer-Verlag, 2002, pp. 43-52. http://dl.acm.org/citation.cfm?id=646963.712274.

Abstract This paper addresses the prosodic feature of pause and its distribution in spontaneous narrative in relation to the role it plays in signaling narrative structure. Pause duration and pause occurrence were taken as variables for the present analysis. The results indicate that both variables consistently mark narrative section boundaries, suggesting thus that pause is a very important structuring device in oral narratives.
Michiko Watanabe, “Fillers as Indicators of Discourse Segment Boundaries in Japanese Monologues,” in Proceedings of Speech Prosody 2002, 2002. http://aune.lpl.univ-aix.fr/sp2002/papers.htm.

Abstract We investigated distribution of fillers (filled pauses) in the vicinity of boundaries of different strengths in Japanese monologues, to understand whether fillers may convey information about the location and the strength of boundaries. Consistent with the results of studies on Dutch monologues, fillers tend to increase as the boundary strength grows. It has also been revealed that fillers tend to occur phrase-initially, more strongly at deeper boundaries than at shallower ones. Regarding filler types, the frequency of eto grows most sharply as boundary strength increases, as does e to a lesser degree. These findings indicate that occurrence of fillers, particularly phrase-initial eto and e, provide contributory evidence to discourse boundaries.

2001

Laura Abou-Haidar, “Pauses in speech by French speakers with Down Syndrome,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 33-36. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_033.pdf.

Abstract A better understanding of the control mechanisms of speech in verbal interaction is very important for the evaluation of the pragmatic competence of a mentally deficient speaker. This study focuses on pauses in the oral production of a Speaker with Down syndrome involved in a conversation: it brings to light the temporal compensation mechanisms which allow the speaker to go beyond the distortions of the segmental level. It confirms the important role of prosody in the success of a conversation, particularly with a speaker who has a handicap which disrupts language structure. Down Syndrome is a condition characterised by an overall delay in cognitive, social, linguistic and motor development. At the oral production level, it leads to deficits in segmental and supra-segmental speech patterning. The goal of this study is to bring elements of response to the following question: is the pragmatic function of language preserved in spite of significant distortions of the motor functions of the phonatory organs? The description of the management of pauses by a speaker with Down syndrome involved in a conversation makes it possible to clarify this subject, while taking into account the various functions which are specific to them beyond the respiratory function: their role in encoding, in the delimitation of syntactic boundaries, and in the regulation of speaking turns, among others. This study allowed us to define criteria which make it possible to characterise the oral production of a Speaker with Down syndrome. These elements relate to the variation of the frequency and the length of pauses. The results obtained are the following: 1. a high frequency of occurrence of pauses in the production of the trisomic speaker; 2. a frequency of occurrence of "mixed pauses", of which the majority have very long lengths, this element revealing a lack of ease and disfluency on the production level; 3. a significant recourse to false-starts, hesitation, repetition and lengthening, to mark sound pauses; 4. a considerable number of very long pauses; 5. a relatively high number of pauses located at the boundaries of or within syntagms, with rather long lengths of intra-syntagmatic uses. We furthermore noted a rarity of long phonic sequences in the speaker with Down syndrome, these sequences seldom exceeding 2000 ms. In spite of these results, it is important to note that we have defined parameters which show that the speaker with Down syndrome integrated rules relating to the management of pauses in verbal interaction.

Keywords DiSS
Karl G.D. Bailey, and Fernanda Ferreira, “Do non-word disfluencies affect syntactic parsing?,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 61-64. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_061.pdf.

Abstract Although disfluencies such as uh are generally not treated as linguistic items, our results suggest that they can affect syntactic parsing. Using a grammaticality judgment task, we demonstrate that disfluencies are able to affect the syntactic parse of a sentence in two ways. First, disfluencies can make syntactic reanalysis more difficult by coming between an ambiguous constituent and a disambiguating item. Second, the pattern of disfluencies in spontaneous speech may be used by the listener to guide the parse of a sentence. Thus, although disfluencies have often been viewed as pragmatic phenomena, they can affect the language comprehension by influencing its parsing procedures.

Keywords DiSS
Ellen G. Bard, Robin J. Lickley, and Matthew P. Aylett, “Is disfluency just difficulty?,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 97-100. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_097.pdf.

Abstract The question addressed by this paper is whether disfluency resembles Inter-Move Interval, a measure of reaction time in conversation, in displaying effects of the overall difficulty of conducting a coherent conversation. Five sources of difficulty are considered as potential causes of disfluency: planning and producing an utterance, comprehending the prior utterance, performing a communicative task, order effects, and interpersonal factors. A multiple regression analysis on simple disfluencies in the HCRC Map Task Corpus shows that planning and production make the major independent contribution to predicting the rate of disfluencies, with interpersonal variables and position in dialogue also contributing significantly. Notably, comprehension variables did not affect either the total rate of disfluency or the rate of individual kinds of disfluencies.

Keywords DiSS
Heather Bortfeld, Silvia Leon, Jonathan Bloom, Michael Schober, and Susan Brennan, “Disfluency Rates in Conversation: Effects of Age, Relationship, Topic, Role, and Gender,” Language and Speech, vol. 44, 2001, pp. 123-147. http://openurl.ingenta.com/content?genre=article&issn=0023-8309&volume=44&issue=2&spage=123&epage=147.

Abstract After reviewing situational and demographic factors that have been argued to affect speakers’ disfluency rates, we examined disfluency rates in a corpus of task-oriented conversations (Schober & Carstensen, 2001) with variables that might affect fluency rates. These factors included: speakers’ ages (young, middle-aged, and older), task roles (director vs. matcher in a referential communication task), difficulty of topic domain (abstract geometric figures vs. photographs of children), relationships between speakers (married vs. strangers), and gender (each pair consisted of a man and a woman). Older speakers produced only slightly higher disfluency rates than young and middle-aged speakers. Overall, disfluency rates were higher both when speakers acted as directors and when they discussed abstract figures, confirming that disfluencies are associated with an increase in planning difficulty. However, fillers (such as uh) were distributed somewhat differently than repeats or restarts, supporting the idea that fillers may be a resource for or a consequence of interpersonal coordination.

Keywords communication, conversation, disfluency, speech planning, spontaneous speech
Susan Brennan, and Michael Schober, “How Listeners Compensate for Disfluencies in Spontaneous Speech,” Journal of Memory and Language, vol. 44, no. 2, 2001, pp. 274-296. DOI: 10.1006/jmla.2000.2753.

Abstract Listeners often encounter disfluencies (like uhs and repairs) in spontaneous speech. How is comprehension affected? In four experiments, listeners followed fluent and disfluent instructions to select an object on a graphical display. Disfluent instructions included mid-word interruptions (Move to the yel- purple square), mid-word interruptions with fillers (Move to the yel- uh, purple square), and between-word interruptions (Move to the yellow- purple square). Relative to the target color word, listeners selected the target object more quickly, and no less accurately, after hearing mid-word interruptions with fillers than after hearing comparable fluent utterances as well as utterances that replaced disfluencies with pauses of equal length. Hearing less misleading information before the interruption site led listeners to make fewer errors, and fillers allowed for more time after the interruption for listeners to cancel misleading information. The information available in disfluencies can help listeners compensate for disruptions and delays in spontaneous utterances.

Keywords comprehension, disfluencies, fillers, paralinguistic cues, parsing, pauses, repairs, spontaneous speech
Jeanne-Marie Debaisieux, and José Deulofeu, “Grammatically unacceptable utterances are communicatively accepted by native speakers, why are they?,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 69-72. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_069.pdf.

Abstract This paper aims at redefining the generally accepted notion of unfinished or elliptic sentence, which appears to be crucial in defining in turn the notion of fluency itself. It will be shown that a large part of utterances which a regularly trained linguist would consider as unacceptable and revealing some kind of disfluency of the speaker who produced them, are in fact fully accepted by the participants of a regular verbal interaction. This apparent contradiction will be explained by the fact that linguists base their judgments of well formedness of the utterances on their grammatical structure, whereas speakers interact basically by means of communicative units, which are not necessarily made up of grammatically well formed parts.

Keywords DiSS
Yasuharu Den, “Are word repetitions really intended by the speaker?,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 25-28. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_025.pdf.

Abstract This paper compares, using our Japanese data, word repetitions with error repairs in terms of their temporal structures in order to examine whether or not the prolongation of first tokens in word repetitions, observed by Den and Clark (2000), is really an effect of the speaker’s strategy. Analyses of 10 task-oriented Japanese dialogues reveal a difference between word repetitions and error repairs for the data involving cut-off in first tokens; in both types of disfluencies, the final phoneme of the first token is considerably prolonged, but the degree of the prolongation is much greater in word repetitions than in error repairs. These results support our view that prolonged first tokens in word repetitions are a product of a process under the speaker’s control or intention.

Keywords DiSS
Danielle Duez, “Acoustico-phonetic characteristics of filled pauses in spontaneous French speech: preliminary results,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 41-44. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_041.pdf.

Abstract In the current analysis we examined the acoustic and phonetic characteristics of filled pauses in spontaneous French speech and their relationship to the prosody of the surrounding context. Two main results emerged: 1) There was no effect of the duration of filled pauses or their sentence location on their F0 patterns or on the differences between the highest and lowest values. 2) There was no relationship between peak-F0 values and the F0 values of filled-pause onsets, but the F0 values of filled-pause onsets and the F0-values of non-marked breath-group onsets were highly similar. The F0 values of filled-pause onsets seem to be stable within the same speaker’s speech. They are speaker-dependent and strongly linked to the physiological, absolute aspects of speech production. It is assumed that filled-pause onset may be used by listeners as a reference for evaluating the speaker’s pitch range.

Keywords DiSS
Robert Eklund, “Prolongations: A dark horse in the disfluency stable,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 5-8. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_005.pdf.

Abstract This paper studies a specific type of disfluency, viz. segment prolongation (PR), i.e., the "stretching out" of speech sounds as a means of hesitation. It is shown that the occurrence of PRs varies as a function of phone type, position in the word, lexical factors and word class, and that PRs are subject to phonotactic constraints in Swedish. A comparison between Swedish and Tok Pisin suggests that there are language-specific traits associated with PR production.

Keywords DiSS
Jean E. Fox Tree, “Listeners’ uses of "um" and "uh" in speech comprehension,” Memory and Cognition, vol. 29, no. 2, March 2001, pp. 320-326. http://mc.psychonomic-journals.org/content/29/2/320.abstract.

Abstract Despite their frequency in conversational talk, little is known about how ums and uhs affect listeners’ on-line processing of spontaneous speech. Two studies of ums and uhs in English and Dutch reveal that hearing an uh has a beneficial effect on listeners’ ability to recognize words in upcoming speech, but that hearing an um has neither a beneficial nor a detrimental effect. The results suggest that um and uh are different from one another and support the hypothesis that uh is a signal of short upcoming delay and um is a signal of a long upcoming delay.
Mária Gósy, “The double function of disfluency phenomena in spontaneous speech,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 57-60. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_057.pdf.

Abstract Disfluency in spontaneous speech is the outcome of a speaker’s indecision about what to say next. The listener, however, is continuously adapted to both the language signals and the types of disfluency of the heard text. What is in the background of this adaptation process? This paper analyses the types and characteristics of the disfluency phenomena of a 78-minute spontaneous speech sample (produced by 10 adults). The author’s intention is to explain the characteristics of disharmony between speech planning and articulation within the speech production process. In order to explain the hypothesized double function of disfluency in terms of perceptual necessity from the listener’s side various experiments have been carried out. Three different samples of spontaneous speech have been selected for experimental purposes. Three groups of listeners (altogether 60 university students) participated in the experiments. One of the groups had to detect the instances of disfluency in the texts marking them on a paper sheet. The subjects of the other group listened to the same texts and then wrote down their contents. The pauses and hesitations were then eliminated from the texts. The third group of the subjects had the same comprehension task as the previous one had. Results show that (i) instances of disfluency are consequences of the speaker’s speech planning processes, (ii) their reasons and occurrences are unconsciously known by the listener as well, (iii) disfluency phenomena are relatively well predicted, (iv) the listeners need pauses and hesitations in order to comprehend the heard texts successfully.

Keywords DiSS
Lynne Hansen, “Language Attrition: The Fate of the Start,” Annual Review of Applied Linguistics, vol. 21, 2001, pp. 60-73. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=168293&fulltextType=RA&fileId=S0267190501210058.

Abstract This chapter reviews the literature on psycholinguistic aspects of language attrition over the past half decade. Descriptive data-based studies have continued to dominate during this time, providing needed groundwork for the emerging discipline. A few studies have continued theoretical threads from previous work, however, by examining attrition data from the perspectives of the regression hypothesis and markedness theory. We have also seen the beginnings of promising new lines of research which draw theoretical underpinnings from neighboring disciplines, most notably from the savings paradigm in cognitive psychology and from theories of codeswitching in bilingualism studies. Evidence on the effects in attrition of non-linguistic variables such as age, proficiency level, and literacy has continued to accumulate. Hesitation phenomena in attriter speech have begun to receive serious attention. Relearning, one of the main areas to potentially benefit from language attrition studies, is also gaining new research impetus at the turn of the century.
Tapio Hokkanen, “Prosodic marking of self-repairs,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 37-40. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_037.pdf.

Abstract Slip studies predominantly focus on either structural or semantic properties of the errors. Since most analyses have been based on pen-and-paper collections, i.e., on-line notes, it is quite understandable that suprasegmental properties of errors have remained a neglected area. The present prosodic analysis is based on acoustical measurements of 307 self-repairs. Each repair has been measured with the Praat program. In order to make the measurements psychoacoustically relevant and comparable across speakers, the changes in F0 are expressed in terms of semitones. In general, speakers repair slightly less than three quarters of the errors they commit whereas one quarter remains either totally undetected or at least without a repair. With respect to prosodic marking, it appears that the proportion of marked repairs in the present data is significantly larger than in previous studies: approximately two thirds of self-repairs are marked with remarkably higher pitch (>+3ST), and a total of 96.7 per cent with a somewhat heightened pitch. It is concluded that alternations of fundamental frequency are utilized in marking self-initiated repairs.

Keywords DiSS
Peter Howell, and James Au-Yeung, “Application of EXPLAN theory to spontaneous speech control,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 9-12. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_009.pdf.

Abstract Problems for theories that explain speech errors by a monitoring process are discussed. EXPLAN theory is based on a proposal about planning and execution time, not on how errors arise. This theory is outlined, and support from characteristics of fluency failure and from altered feedback studies is given.

Keywords DiSS
Peter Howell, and Stevie Sackin, “Function Word Repetitions Emerge When Speakers Are Operantly Conditioned to Reduce Frequency of Silent Pauses,” Journal of Psycholinguistic Research, vol. 30, no. 5, 2001, pp. 457-474. http://www.ncbi.nlm.nih.gov/pubmed/11529422.

Abstract Beattie and Bradbury (1979) reported a study in which, in one condition, they punished speakers when they produced silent pauses (by lighting a light they were supposed to keep switched off). They found speakers were able to reduce silent pauses and that this was not achieved at the expense of reduced overall speech rate. They reported an unexpected increase in word repetition rate. A recent theory proposed by Howell, Au-Yeung, and Sackin (1999) predicts that the change in word repetition rate will occur on function, not content words. This hypothesis is tested and confirmed. The results are used to assess the theory and to consider practical applications of this conditioning procedure.
Ben Hutchinson, and Cécile Pereira, “Um, one large pizza. A preliminary study of disfluency modelling for improving ASR,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 77-80. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_077.pdf.

Abstract A corpus of spontaneous telephone transactions between call centre operators of a pizza company and its customers is examined for disfluencies (fillers and speech repairs) with the aim of improving automatic speech recognition. From this, a subset of the customer orders is selected as a test set. An architecture is presented which allows filled pauses and repairs to be detected and corrected. A language repair module removes fillers and reparanda and transforms utterances containing them into fluent utterances. An experiment on filled pauses using this module and architecture is then described. A speech recognition grammar for recognising fluent speech is used to provide a baseline. This grammar is then enriched with filled pauses, based on their placement in relation to syntactic boundaries. Evaluation is done at the level of understanding, using a metric on feature structures. Initial results indicate that incorporating filled pauses at syntactic boundaries improves the recognition results for spontaneous continuous speech containing disfluencies.

Keywords DiSS
Klaus J. Kohler, Benno Peters, and Thomas Wesener, “Interruption glottalization in German spontaneous speech,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 45-48. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_045.pdf.

Abstract This paper analyzes the occurrence of phonetic interruption cues at points of syntactic irregularities (false starts and truncations) in a large annotated corpus of German dialogues and compares interruption glottalization with laryngealization in terminal low phrase-final prosodies. Glottalization (including glottal stop) predominantly marks word fragments, whereas non-verbal insertions, e.g. breathing, tend to be word-external interruption cues. Laryngealization (excluding glottal stop) predominantly signals terminal phrase boundaries in turn-final positions. Individual speakers differ a great deal as to the distribution of these phenomena.

Keywords DiSS
Robin J. Lickley, “Dialogue moves and disfluency rates,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 93-96. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_093.pdf.

Abstract Many factors conspire to cause speakers to produce hesitations and self-repairs in dialogue. It has been noted that disfluency rates vary between corpora, with different overall dialogue tasks and with different modalities (e.g. human-computer vs. human-human) and between speakers, where they play different roles within a given dialogue. In this paper, we attempt to account for some of these results by examining the interaction between rates of different types of disfluency and types of utterance (dialogue moves) within one corpus of human-human task oriented dialogues. We find both that overall disfluency rate varies by dialogue move type, with moves which require more planning producing more disfluency, and that the distribution of disfluency types varies between move types, most notably with complex and negative responses to questions producing more filled pauses than positive replies and other moves. This work helps us to understand how dialogue structure can account for differences in disfluency rates between and within speech corpora and has implications for research in speech production and perception, discourse studies, dialogue management and automatic speech recognition.

Keywords DiSS
Jan McAllister, Susan Cato-Symonds, and Blake Johnson, “Listeners’ ERP responses to false starts and repetitions in spontaneous speech,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 65-68. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_065.pdf.

Abstract Hindle [1] suggested that false starts and repetitions should be handled differently in a computational account of the processing of the two kinds of disfluency, and there is behavioural evidence that the human sentence processing mechanism likewise honours this distinction [2]. The same dichotomy was also evident in the electrophysiological data reported here. False starts and repetitions were identified in a corpus of spontaneous speech. Control items for the false starts were prepared by excising the reparanda to yield apparently fluent items. Continuous EEG was recorded while subjects listened to items containing the false starts, fluent false start controls, and first and second tokens of repetitions. Compared with identical words in their fluent controls, the false starts elicited a positive response similar to the P600 which is reported for syntactically anomalous words [3, 4, 5]. By contrast, second tokens of repetitions in general resulted in increased amplitude of the N400 [6]; yet, when the same repetitions were excised from context and presented list-fashion, they elicited the positive-going response which has been reported by other researchers [7].

Keywords DiSS
Nikolinka Nenova, Gina Joue, Ronan Reilly, and Julie Carson-Berndsen, “Sound and function regularities in interjections,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 49-52. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_049.pdf.

Abstract This paper investigates the relation between the sound patterns of interjections and their functional realisation in the discourse process. It considers whether certain interjection functions tend to have particular sound distributions. In order to address these questions a classification scheme for American English nonlexical interjections in terms of discourse markers is also presented.

Keywords DiSS
Sieb G. Nooteboom, “Different sources of lexical bias and overt self-corrections,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 21-24. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_021.pdf.

Abstract In this paper it is argued, on the basis of a quantitative analysis of spontaneous speech errors and their corrections in Dutch, that the mechanism leading to lexical bias in speech errors cannot be the same as that leading to overt self-corrections. Although spontaneous speech errors show a strong lexical bias, overt self-corrections do not. Lexical bias strongly increases with dissimilarity between target phoneme and source phoneme. No such effect is found in overt self-corrections. Several possible sources of these differences are discussed.

Keywords DiSS
Serguei V. Pakhomov, “Hesitations and Cognitive Status of Noun Phrase Referents in Spontaneous Discourse,” PhD Dissertation, University of Minnesota. 2001. https://catalog.hathitrust.org/Record/102207142.

Abstract (none)
Anastasia Riazantseva, “Second Language Proficiency and Pausing: A Study of Russian Speakers of English,” Studies in Second Language Acquisition, vol. 23, no. 4, December 2001, pp. 497-526. DOI: 10.1017/s027226310100403x. http://journals.cambridge.org/article_S027226310100403X.

Abstract The present study examines the relationship between second language (L2) proficiency and pausing patterns (i.e., pause duration, frequency, and distribution) in the speech of 30 Russian speakers of English performing two oral tasks—a topic narrative and a cartoon description—in Russian and in English. The subjects were divided into two oral English proficiency groups, high and intermediate, on the basis of a standardized test of spoken English. Baseline data were collected from a control group of 20 native English speakers. Statistical analyses were performed to determine: (a) the native norms of pause duration, frequency, and distribution for Russian and English on the two experimental tasks; (b) the effect of the level of L2 proficiency (high and intermediate) on the pausing of Russian speakers in English; and (c) the differences or similarities in pausing exhibited by native English speakers and native Russian speakers (with two different levels of English proficiency) when speaking English. The results of this study indicate that English and Russian informal monologue speech can be characterized as having different pausing conventions, thus suggesting that crosslinguistic differences involve, among many other aspects, contrasts in pausing patterns. Additionally, L2 proficiency was found to affect the pause duration of advanced nonnative speakers in that they were able to adjust the duration of their pauses in English to produce a nativelike pausing norm. It was also found that even highly proficient L2 speakers pause more frequently in their L2 than in their first language (L1). The examination of pause distribution patterns suggests that persons of intermediate to high L2 speaking proficiency make the same number of within-constituent pauses as native speakers. Overall, the findings of this study support the view that adherence to the target language pausing norms may lead to the perception of nonnative speech as more fluent and nativelike. 
The findings also highlight the importance of exposing L2 students to a richer variety of situations that illustrate native patterns of verbal communication.
Caroline L. Rieger, “Idiosyncratic fillers in the speech of bilinguals,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 81-84. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_081.pdf.

Abstract This paper introduces a never before described strategy used by bilinguals to fill hesitation pauses. This strategy proved so unique that it was given the name ’idiosyncratic filler.’ It describes a filler type that is produced unusually often by one individual when hesitating. It is usually a particular lexical filler that is used as often as or more often than all other lexical fillers combined. Idiosyncratic fillers are as flexible as, but more ’prestigious’ than quasi-lexical fillers and they are used by bilinguals in their non-native language as an overgeneralization and to avoid the incessant production of ’uhs’ and ’uhms.’

Keywords DiSS
L. J. Rodríguez, I. Torres, and A. Varona, “Annotation and analysis of disfluencies in a spontaneous speech corpus in Spanish,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 1-4. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_001.pdf.

Abstract A new database consisting of 227 dialogues in Spanish was annotated with disfluencies. Then a detailed analysis of the annotations was carried out. The database had been recorded according to the well-known Wizard of Oz paradigm. Seventy-five speakers were each given three different scenarios to make queries about timetables, prices and other conditions of train travel between two Spanish cities. The notion of disfluency was relaxed to include any acoustic, lexical or syntactic feature that distinguishes spontaneous from read speech. A specific XML annotation scheme was developed. A simple text editor was used to insert marks, and a specific parser was implemented to find errors in annotations. The analysis of annotations revealed that disfluencies were not uniformly distributed among either user turns or speakers. Most disfluencies were grouped into certain user turns, especially the first one. On the other hand, some speakers were remarkably more prone to hesitate, repeat or correct fragments of speech than others.

Keywords DiSS
Mandana Seyfeddinipur, and Sotaro Kita, “Gesture as an indicator of early error detection in self-monitoring of speech,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 29-32. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_029.pdf.

Abstract There is a theoretical controversy regarding when the self-monitoring process interrupts the speech stream. One view holds that the speech stream is interrupted as soon as an error is detected. Another view holds that, even after an error is detected, the speaker does not interrupt immediately but continues speaking and at the same time plans the upcoming repair. We address this question by observing speech-accompanying gestures at the moment of speech disfluency. The results show that the concurrent gestural movements are typically stopped on average 240 ms before speech is stopped. In other words, the gesture suspension foreshadows the speech suspension. The gestural foreshadowing shows that the speaker must know early on that he is going to suspend speech. The gestural indication of an upcoming speech suspension suggests that the speaker does not interrupt speech at the very moment s/he detects an error. This result supports the hypothesis on speech monitoring stating that the speaker continues to talk after error detection and at the same time plans the upcoming repair.

Keywords DiSS
Richard Shillcock, Simon Kirby, Scott McDonald, and Chris Brew, “Filled pauses and their status in the mental lexicon,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 53-56. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_053.pdf.

Abstract We report a study of the relationship between form and meaning in the most frequent monosyllabic words in the lexicon of English. There is a small but significant correlation between the phonological distance and the semantic distance between each pair of words. To this extent, words that have similar meanings tend to sound similar. Words differ as to the size of this meaning-form correlation in their relationship with all of the other words. When the words are ranked according to the size of this correlation we find that the words which appear towards the top of the ranking are the communicatively important words. When we look at the position in the ranking of the speech editing terms, such as er, oh and um, we find that they are at the very top of the ranking. We argue that this position reflects the communicative importance of these items, and that it therefore makes sense to treat them as a proper part of the mental lexicon.

Keywords DiSS
Elizabeth Shriberg, “To ’errrr’ is human: ecology and acoustics of speech disfluencies,” Journal of the International Phonetic Association, vol. 31, no. 1, 2001, pp. 153-169. DOI: 10.1017/S0025100301001128.

Abstract Unlike read or laboratory speech, spontaneous speech contains high rates of disfluencies (e.g. repetitions, repairs, filled pauses, false starts). This paper aims to promote ’disfluency awareness’ especially in the field of phonetics — which has much to offer in the way of increasing our understanding of these phenomena. Two broad claims are made, based on analyses of disfluencies in different corpora of spontaneous American English speech. First, an Ecology Claim suggests that disfluencies are related to aspects of the speaking environments in which they arise. The claim is supported by evidence from task effects, location analyses, speaker effects and sociolinguistic effects. Second, an Acoustics Claim argues that disfluency has consequences for phonetic and prosodic aspects of speech that are not represented in the speech patterns of laboratory speech. Such effects include modifications in segment durations, intonation, voice quality, vowel quality and coarticulation patterns. The ecological and acoustic evidence provide insights about human language production in real-world contexts. Such evidence can also guide methods for the processing of spontaneous speech in automatic speech recognition applications.
Jörg Spilker, Anton Batliner, and Elmar Nöth, “How to repair speech repairs in an end-to-end system,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 73-76. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_073.pdf.

Abstract If automatic speech processing wants to deal with spontaneous speech, it has to deal with disfluencies in general and speech repairs in particular as well. The paper describes the processing of speech repairs in the VERBMOBIL system and discusses the special requirements of real-time systems. With respect to this criterion, the VERBMOBIL approach and its results are compared to other work. All these results are based more or less on the evaluation of a stand-alone process, not integrated in a speech system. The ultimate goal is, of course, the use and the evaluation of the impact of such a repair process in a real-time, end-to-end system. An evaluation method based on this idea is presented and some preliminary results are given.

Keywords DiSS
Nada Vasic, and Frank Wijnen, “Stuttering and speech monitoring,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 13-16. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_013.pdf.

Abstract In this paper, we would like to argue that stuttering represents inadequate monitoring of the speech production process. The model we are proposing is the vicious circle hypothesis. The stuttering speaker has a malfunctioning monitor whose three parameters, namely focus, effort, and threshold, are inappropriately set. In order to test our hypothesis, we tested 20 stuttering individuals in a dual task situation. The experiment consisted of three conditions: a baseline where semi-spontaneous speech was elicited and two dual-task conditions. The first dual task was speaking and playing a computer game at the same time, where the processing resources were taken away from monitoring. The second dual task was designed to shift the monitor’s focus away from habitual monitoring. Subjects were asked to monitor for a particular word in their speech. The preliminary results for our experiment show that in the dual task condition the number of disfluencies decreased in relation to the number of words, which, in turn, supports our prediction that distraction has a positive effect on fluency in the case of stuttering individuals.

Keywords DiSS
Michiko Watanabe, “The usage of fillers at discourse segment boundaries in Japanese lecture-style monologues,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 89-92. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_089.pdf.

Abstract We examined whether fillers (filled pauses) in a Japanese lecture appeared more frequently after discourse segment boundaries (DSB) than after other sentence boundaries. Contrary to our hypothesis that fillers occur more often after DSB than after other sentence boundaries, the frequency of fillers in the first phrase after DSB did not differ statistically from that after other sentence boundaries. The location of fillers in the first phrase after DSB and after other boundaries did not show any clear difference, either. However, the types of fillers at the initial position of the first phrase after two kinds of boundaries were different; sentence initial ’eto’ appeared exclusively at DSB. This result indicates that sentence initial ’eto’ may help highlighting DSB, but not other types of fillers. Other kinds of fillers (’e’, ’ma’, ’ano’, ’sono’) seem to be mainly concerned with planning units of the utterance that are smaller than a sentence.

Keywords DiSS
Asa Wengelin, “Disfluencies in writing - are they like in speaking?,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 85-88. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_085.pdf.

Abstract This paper presents a study of disfluencies in written language production. Texts from ten university students are compared to data from people who almost never use writing, namely adult dyslexics and to texts from people who communicate in writing under real-time constraints every day, namely deaf whose main use of writing is text telephone conversations. This paper investigates which types of disfluencies occur in writing, where they occur and their durations. Further, this paper investigates how different text types and the specific characteristics of deaf and dyslexic writers influence the distribution of disfluencies. The results are discussed in relation to earlier work on disfluencies in speaking.

Keywords DiSS
Kouzou Yanagawa, “Hesitation Phenomenaが高校生のリスニング理解に及ぼす影響 [The effect of hesitation phenomena on high school students’ listening comprehension],” STEP Bulletin, vol. 13, 2001, pp. 13-25. http://www.eiken.or.jp/teacher/research/study13.html.

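Abstract It is easy to understand from experience that what the author calls hesitation phenomena (HP) play an important role in everyday conversation, making communication more realistic and lively. Textbook dialogues in the classroom, and practice based on prepared dialogues, sound hollow and seem unrealistic, perhaps because the lubricating role of HP is absent from them. Moreover, freely and promptly inserting interjections such as “You know,” or “I mean,” into conversation seems to be one of the things Japanese speakers find most difficult. However, the presence of such HP may in some cases help beginners’ listening comprehension and in other cases hinder it, which makes it an interesting research topic. This study, which compares the effect of the presence of HP on the listening comprehension of Japanese high school students, can be considered significant in that it offers many suggestions for the teaching of English. | Beginning with a review of previous research and continuing through the preparation of listening materials, their administration, and the analysis of the collected data, the procedures leading to the completed paper appear to have been handled with considerable care and caution. However, contrary to the author’s hypothesis, the results showed quite clearly that the presence of HP aids listening comprehension. Until now, the question of whether the presence of HP has a positive effect on listening comprehension has not been settled, which shows that drawing a general conclusion is not a simple matter, but the results here should encourage those who value teaching with living, natural English. This is a study from which various developments can be expected.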
Michiko Yoshida, “Repeated phoneme effect in Japanese speech errors,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 17-20. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_017.pdf.

Abstract Analyses of errors in the natural speech of Dutch, German, and English have shown that involuntary rearrangements of phonemes (e.g., left hemisphere → heft lemisphere) are more likely to occur when the two words involved in the error have the same phoneme before or after the phoneme on which the error occurred (e.g., /E/ in left hemisphere) [1, 2]. A study by Dell (1984) has revealed that phoneme repetition could also contribute to experimentally induced speech errors in English [3]. The present study explored the effect of repeated phonemes in Japanese speech errors by means of two error-inducing experiments. Analyses of subjects’ errors showed that a sequence of syllables that share the same phoneme was more error-prone than one with a variety of phonemes, suggesting that phoneme repetition could contribute to Japanese speech errors. These results are consistent with the view that the repeated phoneme effect is common to all speakers regardless of language.

Keywords DiSS

2000

Judit Kormos, “The Role of Attention in Monitoring Second Language Speech Production,” Language Learning, vol. 50, no. 2, June 2000, pp. 343-384. DOI: 10.1111/0023-8333.00120.

Abstract The study investigates the role of attention in monitoring second language speech production by means of analyzing the distribution and frequency of self-repairs and the correction rate of errors in the speech of 30 Hungarian learners of English at 3 levels of proficiency and of 10 native speakers of Hungarian. The results indicate that the amount of attention paid to the linguistic form of the utterance does not vary at different stages of L2 competence and that the distribution of attention in monitoring for errors is markedly different in L1 and L2. In the case of advanced L2 speakers, the extra attentional resources made available by the automaticity of certain encoding processes were used for checking the discourse-level aspects of their message.
Liz Temple, “Second language learner speech production,” Studia Linguistica, vol. 54, no. 2, August 2000, pp. 288-297. DOI: 10.1111/1467-9582.00068.

Abstract This paper reports on a study which investigated temporal variables in foreign language learner speech and native speech. The findings are discussed from a cognitive processing perspective. The subjects were 30 intermediate/advanced level adult students of French as a foreign language and 20 native speakers of French. Short extracts of recorded interviews were transcribed and quantitative measures of pause and hesitation phenomena, repairs and errors were calculated. The speech production model of Levelt (1989) provides a framework for understanding the source of these phenomena and the significant differences between natives and learners in planning and encoding speech. Capacity limitations of working memory, related in particular to foreign language learners’ non-automatic processing mode, resulted in non-fluent speech performance, compared with native speakers.

1999

Heather Bortfeld, Silvia D. Leon, Jonathan Bloom, Michael F. Schober, and Susan E. Brennan, “Which speakers are most disfluent in conversation, and when?,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 7-10. http://diss2019.elte.hu/wp-content/uploads/2019/01/DiSS1999_Proceedings-2.pdf.

Abstract We examined disfluency rates in a corpus of task-oriented conversations [1] in which several factors were manipulated that could affect fluency rates. These factors included: speakers’ age (young, middle-aged, and older), task roles (director vs. matcher), difficulty of domain (abstract geometric figures or tangrams vs. photographs of children’s faces), relationship between speakers (married vs. strangers), and gender (each pair consisted of a man and a woman). Older speakers produced only marginally higher (combined) disfluency rates than young and middle-aged speakers. Overall, disfluency rates were higher both when speakers took the initiative and when they discussed tangrams, associating disfluencies with an increase in planning difficulty. However, fillers (such as uh) were distributed somewhat differently than repetitions and restarts, supporting the idea that fillers may be a resource for or a consequence of interpersonal coordination.

Keywords DiSS
Susan E. Brennan, and Michael F. Schober, “Uhs and interrupted words: The information available to listeners,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 19-22. http://diss2019.elte.hu/wp-content/uploads/2019/01/DiSS1999_Proceedings-2.pdf.

Abstract Speech disfluencies are generally assumed to harm comprehension. Our studies investigated whether this is true, or whether certain disfluencies might actually help comprehension by marking for listeners which information the speaker intends to repair. We tested two hypotheses: (1) whether an interrupted word signals that the word was produced in error, and (2) whether a filler such as uh after an interrupted word signals an error. Listeners heard fluent instructions and disfluent ones whose reparanda contained completed words, interrupted words, or interrupted words with fillers, and then responded to these instructions. Responses to mid-word interruptions were no faster than to between-word interruptions, although there were fewer errors when less of the unintended word was heard. Responses to mid-word interruptions with uh were faster and more accurate than controls without disfluencies. With more complex displays, the response time advantage (but not the error rate advantage) diminished, suggesting that an interrupted word followed by uh tells listeners what the speaker does NOT mean. A fourth experiment showed that it is not the presence of the uh per se, but the additional time after the interrupted word that is the source of this "disfluency advantage."

Keywords DiSS
Mark G. Core, and Lenhart K. Schubert, “Speech Repairs: A Parsing Perspective,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 47-50. http://diss2019.elte.hu/wp-content/uploads/2019/01/DiSS1999_Proceedings-2.pdf.

Abstract This paper presents a grammatical and processing framework for handling speech repairs. The proposed framework has proved adequate for a collection of human-human task-oriented dialogs, both in a full manual examination of the corpus, and in tests with a parser capable of parsing some of that corpus. This parser can also correct a pre-parser speech repair identifier producing increases in recall varying from 2% to 4.8%.

Keywords DiSS
Robert Eklund, “A Comparative Analysis of Disfluencies in Four Swedish Travel Dialogue Corpora,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 3-6. http://diss2019.elte.hu/wp-content/uploads/2019/01/DiSS1999_Proceedings-2.pdf.

Abstract This paper reports on ongoing work on disfluencies carried out at Telia Research AB. Four travel dialogue corpora are described: human-"machine"-human (Wizard-of-Oz); human-"machine" (Wizard-of-Oz); human-human and human-machine. The data collection methods are outlined and their possible influence on the collected material is discussed. An annotation scheme for disfluency labelling is described. Preliminary results on five different kinds of disfluencies are presented: filled and unfilled pauses, prolonged segments, truncations and explicit editing terms.

Keywords DiSS
Jean E. Fox Tree, “Between-Turn Pauses and Ums,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 15-17. http://diss2019.elte.hu/wp-content/uploads/2019/01/DiSS1999_Proceedings-2.pdf.

Abstract Pauses and ums are often treated as two versions of the same thing, with the traditional label for ums, filled pauses, emphasizing this seeming interchangeability. To explore this hypothesis, I compared how overhearers interpreted a speaker’s contribution to a conversation depending on whether the speaker responded immediately, paused and responded, or said um and responded. Overhearers answered a series of questions about the turn exchanges they had heard. The questions measured their interpretations of the second speakers’ speech production difficulty, honesty, comfort with the topic discussed, familiarity with the interlocutor, and desire to have further contact with the interlocutor. In two experiments, the type of turn exchange was found to influence overhearers’ interpretations. Results supply information about both the signalling properties of ums and the relationship between ums and pauses of varying lengths in the environment of a turn exchange.

Keywords DiSS
Jean E. Fox Tree, and Josef C. Schrock, “Discourse Markers in Spontaneous Speech: Oh What a Difference an Oh Makes,” Journal of Memory and Language, vol. 40, 1999, pp. 280-295. DOI: 10.1006/jmla.1998.2613.

Abstract Discourse markers are usually studied from the vantage point of corpora analyses. By looking at where they fall in spontaneous talk, hypotheses can be made about their possible functions. However, direct tests of listeners’ uses of these expressions are rare. In five experiments, we looked at the on-line spontaneous speech comprehension effects of one discourse marker, oh. We found that recognition of words was faster after oh than when the oh was either excised and replaced by a pause or excised entirely. We also found that semantic verification of words heard earlier in the discourse was faster after oh than when the oh was either excised and replaced by a pause or excised entirely, but only when the test point was downstream from the oh. Results demonstrate that oh is not only a potential signal to addressees, as has been suggested by corpora analyses, but that it is in fact used by addressees to help them integrate information in spontaneous talk.
Dafydd Gibbon, and Shu-Chuan Tseng, “Toward a formal characterisation of disfluency processing,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 35-38. http://diss2019.elte.hu/wp-content/uploads/2019/01/DiSS1999_Proceedings-2.pdf.

Abstract Inherent structural characteristics of speech disfluencies are the prerequisite for the fulfilment of detecting and correcting speech disfluencies in spontaneous speech. However, a considerable number of recent research works on speech disfluencies focus on the surface patterns of speech disfluency editing structure, instead of looking into the relations between editing structure, the syntactic structure and the prosodic structure of speech disfluencies. In this paper we present first results of a new line of research, using feature structures modelled by finite state transducers, on the formal modelling of speech disfluencies in unplanned speech, in relation to all three levels of description.

Keywords DiSS
Marie-Noëlle Guillot, Fluency and Its Teaching. Clevedon, England: Multilingual Matters. 1999. https://eric.ed.gov/?id=ED438732.

Abstract We can all recognize fluency and practice it, but often do not understand what linguistic and paralinguistic operations are involved. This text tries to solve this puzzle. It begins by exploring perceptions of fluency to understand their common denominators. It goes on to pinpoint the specific features which promote fluency while emphasizing its relative and interactional nature. These analyses produce both a methodological framework and a pedagogical strategy, illustrated by sample classroom activities. Language teachers, applied linguists, linguists and their students should find this book an accessible companion to the teaching and study of oral language, with French as its domain of application.
Peter A. Heeman, and K.H. Loken-Kim, “Detecting and Correcting Speech Repairs in Japanese,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 43-46. http://diss2019.elte.hu/wp-content/uploads/2019/01/DiSS1999_Proceedings-2.pdf.

Abstract One of the characteristics of spontaneous speech is the abundance of speech repairs, in which speakers go back and repeat or change something they have just said. In other work [7], we proposed a language model for speech recognition that can detect and correct speech repairs in English. In this paper, we show that this model works equally as well on a Japanese corpus of spontaneous speech. The structure of the model captures the language independent aspect of speech repairs, while machine training techniques on an annotated corpus learn the language dependent aspects.

Keywords DiSS
Kim Kirsner, Ben Roberts, and Yong-Heng Lee, “Why does spontaneous speech unfold in temporal cycles, sometimes?,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 11-14. http://diss2019.elte.hu/wp-content/uploads/2019/01/DiSS1999_Proceedings-2.pdf.

Abstract Spontaneous speech typically consists of alternating periods of continuous fluency, where fluency refers to the ratio of speech to pausing. Individual differences in fluency are substantial, with mean pause per minute ranging from less than 20 to more than 40 sec per minute in our sample of English and Mandarin speakers. While pauses have been regarded as critical clues for psycholinguistic analysis for decades, the existence of temporal cycles has been subject to extensive debate. The results of our experiments provide strong support for the presence of temporal cycles in spontaneous speech, and demonstrate in particular that fluency declines and increases prior and subsequent to topic shifts respectively. The source of temporal cycles is unclear, however. The prevailing assumption is that they reflect alternating periods of high level macro-planning, associated with low fluency, and low level micro-execution, associated with high fluency. However, a variety of alternative explanations merit consideration.

Keywords DiSS
Judit Kormos, “Monitoring and Self-Repair in L2,” Language Learning, vol. 49, no. 2, June 1999, pp. 303-342. DOI: 10.1111/0023-8333.00090.

Abstract The aim of this article is to review the psycholinguistic research on second language (L2) self-repair to date with particular attention to the relevance of this field for L2 production and acquisition. The article points out that W. J. M. Levelt’s (1989, 1993, 1992) and W. J. M. Levelt et al.’s (in press) perceptual loop theory of monitoring can be adopted for monitoring in L2 speech as well. It is also argued, however, that this theory needs to be complemented with recent research on consciousness, attention, and noticing in order to account for mechanisms of error detection in L2.
Willem J. M. Levelt, Ardi Roelofs, and Antje S. Meyer, “A theory of lexical access in speech production,” Behavioral and Brain Sciences, vol. 22, no. 1, 1999, pp. 1–38. DOI: 10.1017/S0140525X99001776. https://psycnet.apa.org/record/1999-13199-001.

Abstract Preparing words in speech production is normally a fast and accurate process. We generate them two or three per second in fluent conversation; and overtly naming a clear picture of an object can easily be initiated within 600 msec after picture onset. The underlying process, however, is exceedingly complex. The theory reviewed in this target article analyzes this process as staged and feed-forward. After a first stage of conceptual preparation, word generation proceeds through lexical selection, morphological and phonological encoding, phonetic encoding, and articulation itself. In addition, the speaker exerts some degree of output control, by monitoring of self-produced internal and overt speech. The core of the theory, ranging from lexical selection to the initiation of phonetic encoding, is captured in a computational model, called WEAVER++. Both the theory and the computational model have been developed in interaction with reaction time experiments, particularly in picture naming or related word production paradigms, with the aim of accounting for the real-time processing in normal word production. A comprehensive review of theory, model, and experiments is presented. The model can handle some of the main observations in the domain of speech errors (the major empirical domain for most other theories of lexical access), and the theory opens new ways of approaching the cerebral organization of speech production by way of high-temporal-resolution imaging.
Robin Lickley, David McKelvie, and Ellen Gurman Bard, “Comparing human and automatic speech recognition using word-gating,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 23-26. http://diss2019.elte.hu/wp-content/uploads/2019/01/DiSS1999_Proceedings-2.pdf.

Abstract This paper describes a study in which we compare human and automatic recognition of words in fluent and disfluent spontaneous speech. In a word-level gating study with confidence judgements, we examine how the recognition and confidence of recognition of words by humans develops over utterances and show how disfluency disrupts the process. We give an automatic recogniser the same task and compare its performance with the humans’. With both systems, subsequent context supports word recognition: confidence in word recognition peaks after subsequent words have been heard. With both systems, disfluency adversely affects recognition of words in the immediate vicinity of the disfluent interruption (for repeats and repairs): disrupted subsequent context disrupts the recognition process.

Keywords DiSS
Douglas O’Shaughnessy, “Better detection of hesitations in spontaneous speech,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 39-42. http://diss2019.elte.hu/wp-content/uploads/2019/01/DiSS1999_Proceedings-2.pdf.

Abstract Practical speech recognizers must accept normal conversational voice input (including hesitations). However, most automatic speech recognition work has concentrated on read speech, whose acoustic aspects differ significantly from speech found in actual dialogues. Hesitations, of which the most frequent are filled pauses, are common in natural speech, yet few recognition systems handle such disfluencies with any degree of success. Filled pauses (e.g., "uhh," "umm"), unlike most silent pauses, resemble phones which form words in continuous speech. The work reported here further develops techniques to allow automatic identification of filled pauses. Such identification, if reliable, would reduce potential confusion in determining an estimated textual output for an utterance. The Switchboard database (of natural telephone conversations) provided data for the study. While most automatic recognition methods rely entirely on spectral envelope (e.g., low-order cepstral coefficients), identifying filled pauses requires using a combination of spectra, fundamental frequency and duration. High precision and a low false alarm rate for filled pauses are feasible without excessive computation.

Keywords DiSS
Sherri Page, “Use of a postprocessor to identify and correct speaker disfluencies in automated speech recognition for medical transcription,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 27-30. http://diss2019.elte.hu/wp-content/uploads/2019/01/DiSS1999_Proceedings-2.pdf.

Abstract Medical practitioners speak in a quasi-spontaneous monologue when they dictate a chart note, letter, or patient history. Prior research has largely ignored the issue of disfluency in dictation, arguing that speakers can control recording and start over if necessary. In 550,000 words of hand transcribed medical dictation, however, we find numerous filled pauses, repetitions, and other self-repairs. This paper describes: a pre-theoretical classification of disfluencies, developed to identify patterns useful in automatic text processing; the patterns of disfluency found in a corpus hand tagged with this classification, which include repetitions in combination with substitutions, insertions, and deletions; and, preliminary results of implementation of a disfluency pattern matcher and filter in a postprocessor developed for commercial use.

Keywords DiSS
Sergey Pakhomov, and Guergana Savova, “Filled Pause Distribution and Modeling in Quasi-Spontaneous Speech,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 31-34. http://diss2019.elte.hu/wp-content/uploads/2019/01/DiSS1999_Proceedings-2.pdf.

Abstract Filled pauses (FP’s) are characteristic of spontaneous speech and present considerable problems for speech recognition by being often recognized as short words. Recognition of quasispontaneous speech (medical dictation) is subject to this problem as well. An um can be recognized as thumb or arm if the recognizer’s language model does not adequately represent FP’s. Representing FP’s in the training corpus improves recognition. Several techniques of seeding a training corpus with FP’s were evaluated to show that a stochastic method, along with random insertion uniformly distributed around the average sentence length, yield better results compared to random insertion at other ranges. The optimal method of seeding a training corpus with FP’s may be linked to clause boundaries despite the fact that an imperfect method of inserting FP’s at clause boundaries used in this study failed.

Keywords DiSS
Shu-Chuan Tseng, “Grammar, prosody and speech disfluencies in spoken dialogues,” Master's Thesis, Bielefeld University. 1999. https://pub.uni-bielefeld.de/publication/2306427.

Abstract Two questions are asked in this thesis: 1) How are speech disfluencies produced from the linguistic point of view, focusing especially on the syntactic and prosodic features? 2) Do regular internal relations exist within speech disfluencies and if so, what do they look like? These two questions are examined by means of empirical studies and formal models. With respect to the first question, a corpus analysis is carried out to investigate the production of disfluencies from the syntactic point of view. The acoustic-prosodic features of selected types of disfluencies are also examined. A further syntactic analysis is undertaken to secure the syntactic features found in the speaker-independent corpus analysis is refound in the speaker-dependent data which are used in the signal analysis. Furthermore, the empirical results of the corpus analysis and signal analysis should provide empirical evidence for the existence of regular internal relations within speech disfluencies as asked in the second question. Finite state techniques are used for describing this regularity. The results of the corpus analysis show a highly significant syntactic regularity when disfluencies are produced. The results of the signal analysis show that the production of disfluencies is also prosodically marked. Furthermore, according to the results obtained by the speaker-independent and -dependent syntactic analyses as well as the speaker-dependent acoustic-prosodic analysis, a clear mapping from the syntactic level to the prosodic level was found. This means that one can find both syntactic and prosodic cues at specific positions within or around speech disfluencies. Thus, the hypothesis that there exist specific phrase-internal disfluency relations has been empirically supported. Based on the empirical results, a formal description system is developed to cover the majority of the disfluencies produced in the corpus. 
The disfluency relations found are subsequently modelled by means of three disfluency models which also demonstrate the progress of investigating internal structures of disfluencies. Both syntactic and prosodic features are integrated into the final model and the relationship between syntax and prosody with respect to disfluencies is established. These relations are formally described in terms of finite state automata.

1998

Herbert Clark, and Thomas Wasow, “Repeating Words in Spontaneous Speech,” Cognitive Psychology, vol. 37, no. 3, December 1998, pp. 201-242. DOI: 10.1006/cogp.1998.0693.

Abstract Speakers often repeat the first word of major constituents, as in, "I uh I wouldn’t be surprised at that." Repeats like this divide into four stages: an initial commitment to the constituent (with "I"); the suspension of speech; a hiatus in speaking (filled with "uh"); and a restart of the constituent ("I wouldn’t . . ."). An analysis of all repeated articles and pronouns in two large corpora of spontaneous speech shows that the four stages reflect different principles. Speakers are more likely to make a premature commitment, immediately suspending their speech, as both the local constituent and the constituent containing it become more complex. They plan some of these suspensions from the start as preliminary commitments to what they are about to say. And they are more likely to restart a constituent the more their stopping has disrupted its delivery. We argue that the principles governing these stages are general and not specific to repeats.
Keiko S. Emmett, “Ano(o) is more than "um": interactional functions of ano(o) in Japanese conversation,” in Proceedings of the fifth annual symposium about language and society, vol. 39, Austin, Texas, University of Texas Department of Linguistics, 1998, pp. 136-148. http://salsa.ling.utexas.edu/proceedings/1997/index.html.

Abstract (none)
Felix C. M. Quimbo, Tatsuya Kawahara, and Shuji Doshita, “Prosodic analysis of fillers and self-repair in Japanese speech,” in ICSLP, 1998. https://www.isca-speech.org/archive/icslp_1998/i98_0762.html.

Abstract The prosodic features of filled pauses (fillers) and self-repair are investigated with a view towards the detection of disfluencies. First, we compare the prosodic features of typical fillers and their fluent homonyms using read sentences of identical phoneme sequences. It is confirmed that the fillers (1) have at least 2 times longer duration than their non-disfluent counterparts, (2) tend to be followed by definitely longer pauses, and (3) have much smaller movement in their pitch contours. Then, the spontaneous fillers segmented out from a dialogue corpus are also analyzed. The same tendency is confirmed, but some samples lie halfway between the read fillers and their fluent homonyms. The abruptly cut-off endings in self-repair are also analyzed by comparing with the ordinary endings of words. It is found that a short phoneme ending coupled with a relatively short succeeding pause indicates the abrupt cut-off.
Hanae Koiso, Yasuo Horiuchi, Syun Tutiya, Akira Ichikawa, and Yasuharu Den, “An Analysis of Turn-Taking and Backchannels Based on Prosodic and Syntactic Features in Japanese Map Task Dialogs,” Language and Speech, vol. 41, no. 3-4, 1998, pp. 295-321. DOI: 10.1177/002383099804100404. https://journals.sagepub.com/doi/10.1177/002383099804100404.

Abstract In this study, we investigate syntactic and prosodic features of the speaker's speech at points where turn-taking and backchannels occur, on the basis of our analysis of Japanese spontaneous dialogs. Specifically, we focus on features such as part of speech, duration, F0 contour pattern, relative height of the peak F0, energy trajectory pattern, and relative height of the peak energy at the final part of speech segments. We examine, first, the relationship between turn-taking/backchannels and each feature of speech segments independently, showing that the features examined in this study are all related to turn-taking or backchannels and that the way they correlate is fairly consistent with previous studies. Next, we explore the inter-relationship among the features with respect to turn-taking and backchannels. We show that in both turn-taking and backchannels, (1) some instances of syntactic features make extremely strong contributions, and (2) in general, syntax has a stronger contribution than any individual prosodic feature, although the whole prosody contributes as strongly as, or even more strongly than, syntax. We also discuss some implications of our results, comparing them with previous models that have mentioned roles of syntax and prosody in turn-taking and backchannels.
R. J. Lickley, and E. G. Bard, “When Can Listeners Detect Disfluency in Spontaneous Speech?,” Language and Speech, vol. 41, no. 2, 1998, pp. 203-226. DOI: 10.1177/002383099804100204. https://journals.sagepub.com/doi/10.1177/002383099804100204.

Abstract Three experiments investigated listeners' ability to detect disfluency in spontaneous speech. All employed gated word recognition with judgments of disfluency for spontaneous utterances containing disfluencies and for three kinds of fluent control utterances from the same six speakers: repetitions of corrected recordings of original disfluent items, spontaneous fluent utterances loosely matched in structure to the disfluent items, and repetitions of those spontaneous fluent items. In Experiment 1, 120 stimuli were word-level gated and presented to 20 subjects for word identification and for judgments on whether the utterance was about to become disfluent. Listeners were unable to predict disfluency reliably. New subjects (N = 20, 43) judged whether the same utterances had already become disfluent at each word gate in Experiment 2 or at each 35 ms gate in Experiment 3. Subjects reliably detected existing disfluencies during the first word gate after the interruption and before they recognized the word. Though more common around disfluencies than at similar points in controls, failures of word identification were not reliably associated with detection. Results are discussed in the light of computational models of disfluency detection.
Murray J. Munro, and Tracey M. Derwing, “The Effects of Speaking Rate on Listener Evaluations of Native and Foreign-Accented Speech,” Language Learning, vol. 48, no. 2, June 1998, pp. 159-182. DOI: 10.1111/1467-9922.00038.

Abstract This study tested the hypothesis that accented speech heard at a reduced rate would sound less accented and more comprehensible than speech produced at a normal rate. In 2 experiments, English native-speaker listeners rated a passage read by 10 high-proficiency Mandarin learners of English. In the first experiment, 20 listeners evaluated passages read slowly as more accented and less comprehensible than normal-rate passages. In the second experiment, in which a computer modified speaking rates, 20 new listeners preferred some speeded passages, but none of the slowed ones. Overall, the findings suggest that although native listeners may prefer to hear accented speech at slower rates, a general speaking strategy of slowing down may not help second language learners.
Ralph L. Rose, “The Communicative Value of Filled Pauses in Spontaneous Speech,” Master's Thesis, University of Birmingham, Birmingham, UK. 1998. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.60.8422.

Abstract Filled pauses (FPs, e.g. er, erm) and other hesitation phenomena are ever-present elements of spontaneous speech and have been the subject of various psycholinguistic studies. However, recommendations have been sparse for language teaching; consequently little attention is given to FPs in English Language Teaching course materials. The present research addresses this gap. A systematic analysis of hesitation phenomena in a mini-corpus of spontaneous speech supports earlier research on FPs, but suggests a refinement: although researchers have generally combined open and closed FPs (er and erm, respectively), this study suggests they are independent. Recommendations are given on approaches to FPs in the language classroom. It is suggested that a focus on FPs may benefit listening comprehension by encouraging students to make use of speakers’ pause time to process input. FPs may further benefit speaking ability by helping students to hold their conversational turns and to improve their apparent fluency. Specific activities designed to improve both listening and speaking skills are given.
Marc Swerts, “Filled pauses as markers of discourse structure,” Journal of Pragmatics, vol. 30, no. 4, 1998, pp. 485-496. DOI: 10.1016/S0378-2166(98)00014-9.

Abstract This study aims to test whether filled pauses (FPs) may highlight discourse structure. This question is tackled from the perspectives of both the speaker and the listener. More specifically, it is first investigated whether FPs are more typical in the vicinity of major discourse boundaries. Secondly, FPs are analyzed acoustically, to check whether those occurring at major discourse boundaries are segmentally and prosodically different from those at shallower breaks. Analyses of twelve spontaneous monologues (Dutch) show that phrases following major discourse boundaries more often contain FPs. Additionally, FPs after stronger breaks tend to occur phrase-initially, whereas the majority of the FPs after weak boundaries are in phrase-internal position. Also, acoustic observations reveal that FPs at major discourse boundaries are both segmentally and prosodically distinct. They also differ with respect to the distribution of neighbouring silent pauses. Finally, a general linear model reveals that discourse structure can to some extent be predicted from characteristics of the FPs.

1997

Yasuharu Den, Yuu Haruki, and Masato Ishizaki, “A Corpus-based Analysis of Speech Repairs in Japanese,” in Computational Psycholinguistics, August 1997.

Abstract this paper is preliminary, it basically showed that Levelt’s monitoring theory holds for speech repairs naturally occurring in Japanese dialogues except for the timing of trouble detection and the nonretracing strategy in substitution of function words. These exceptions might indicate a difference of cognitive linguistic units between Japanese and other languages like Dutch. This issue is the next target of our study.
Jean E. Fox Tree, and Herbert Clark, “Pronouncing "the" as "thee" to signal problems in speaking,” Cognition, vol. 62, 1997, pp. 151-167. DOI: 10.1016/S0010-0277(96)00781-0.

Abstract In spontaneous speaking, the is normally pronounced as thuh, with the reduced vowel schwa (rhyming with the first syllable of about). But it is sometimes pronounced as thiy, with a nonreduced vowel (rhyming with see). In a large corpus of spontaneous English conversation, speakers were found to use thiy to signal an immediate suspension of speech to deal with a problem in production. Fully 81% of the instances of thiy in the corpus were followed by a suspension of speech, whereas only 7% of a matched sample of thuhs were followed by such suspensions. The problems people dealt with after thiy were at many levels of production, including articulation, word retrieval, and choice of message, but most were in the following nominal.
Stefan G. Hofmann, Alexander L. Gerlach, Amy Wender, and Walton T. Roth, “Speech Disturbances and Gaze Behavior During Public Speaking in Subtypes of Social Phobia,” Journal of Anxiety Disorders, vol. 11, no. 6, 1997, pp. 573-585. DOI: 10.1016/S0887-6185(97)00040-6.

Abstract Twenty-four social phobics with public speaking anxiety and 25 nonphobic individuals (controls) gave a speech in front of two people. Subjective anxiety, gaze behavior, and speech disturbances were assessed. Based on subjects’ fear ratings of social situations, phobics and controls were divided into the generalized and nongeneralized subtype. Results showed that generalized phobics reported the most, and nongeneralized controls the least anxiety during public speaking. All subjects had longer and more frequent eye contact when delivering a speech than when talking with an experimenter or sitting in front of an audience. Phobics showed more filled pauses, had longer silent pauses, paused more frequently, and spent more time pausing than controls when giving a speech. Generalized phobics spent more time pausing during their speech than the other subgroups (nongeneralized controls, generalized controls, and nongeneralized phobics). These results suggest that generalized phobics tended to shift attentional resources from speech production to other cognitive tasks.
Elizabeth Shriberg, R. Bates, and A. Stolcke, “A prosody-only decision-tree model for disfluency detection,” in Proceedings of Eurospeech, vol. 5, Rhodes, Greece, 1997, pp. 2383-2386. http://130.203.136.60/viewdoc/summary?doi=10.1.1.158.3851.

Abstract Speech disfluencies (filled pauses, repetitions, repairs, and false starts) are pervasive in spontaneous speech. The ability to detect and correct disfluencies automatically is important for effective natural language understanding, as well as to improve speech models in general. Previous approaches to disfluency detection have relied heavily on lexical information, which makes them less applicable when word recognition is unreliable. We have developed a disfluency detection method using decision tree classifiers that use only local and automatically extracted prosodic features. Because the model doesn’t rely on lexical information, it is widely applicable even when word recognition is unreliable. The model performed significantly better than chance at detecting four disfluency types. It also outperformed a language model in the detection of false starts, given the correct transcription. Combining the prosody model with a specialized language model improved accuracy over either model alone for the detection of false starts. Results suggest that a prosody-only model can aid the automatic detection of disfluencies in spontaneous speech.

1996

Nicholas Christenfeld, “Effects of a Metronome on the Filled Pauses of Fluent Speakers,” Journal of Speech & Hearing Research, vol. 39, no. 6, 1996, pp. 1232-1238. http://jslhr.asha.org/cgi/content/abstract/39/6/1232.

Abstract Filled pauses (the "ums" and "uhs" that litter spontaneous speech) seem to be a product of the speaker paying deliberate attention to the normally automatic act of talking. This is the same sort of explanation that has been offered for stuttering. In this paper we explore whether a manipulation that has long been known to decrease stuttering, synchronizing speech to the beats of a metronome, will then also decrease filled pauses. Two experiments indicate that a metronome has a dramatic effect on the production of filled pauses. This effect is not due to any simplification or slowing of the speech and supports the view that a metronome causes speakers to attend more to how they are talking and less to what they are saying. It also lends support to the connection between stutters and filled pauses.

Keywords automaticity, disfluency, filled pauses, metronome, stuttering
Nicholas Christenfeld, and Beth Creager, “Anxiety, Alcohol, Aphasia, and Ums,” Journal of Personality and Social Psychology, vol. 70, no. 3, 1996, pp. 451-460. DOI: 10.1037/0022-3514.70.3.451.

Abstract Although several studies have documented a link between anxiety and filled pauses (ums, ers, and uhs), numerous failures make it impossible to believe that the two are linked in any simple way. This article suggests anxiety may increase ums not when it makes the speech task harder but when it causes the speaker to pay attention to the speech. Two experiments examined this idea. One manipulated evaluation apprehension, and the other manipulated self-consciousness. Both showed dramatic increases in ums. Two more studies examined the real-world implications of this approach. Alcohol, which makes speaking harder but also makes speakers care less about what they say, was found to reduce ums. The second study found that Broca’s aphasics, who produce simple speech but must deliberate over every word, produce many ums. Wernicke’s aphasics may not talk well, but do not mind, and manage with few ums.

Keywords anxiety & self consciousness, Broca’s aphasia, college students, social alcohol drinkers, speech fluency, Wernicke’s aphasia
Barbara A. Fox, Makoto Hayashi, and Robert Jasperson, “Resources and repair: a cross-linguistic study of syntax and repair,” in Interaction and Grammar (Studies in Interactional Sociolinguistics), Elinor Ochs, Emanuel A. Schegloff, and Sandra A. Thompson, Eds. Cambridge, UK: Cambridge University Press, 1996, ch. 4, pp. 185-237. DOI: 10.1017/CBO9780511620874.004.

Abstract The organization of repair in conversation has been the focus of much work in conversation analysis and related fields over the last twenty years (e.g., Hockett, 1967; Du Bois, 1974; Jefferson, 1974, 1987; Moerman, 1977; Schegloff, Sacks, and Jefferson, 1977; Schegloff, 1979, 1987a; Goodwin, 1981; Levelt, 1982, 1983, 1989; Carbonell and Hayes, 1983; Hindle, 1983; Levelt and Cutler, 1983; Reilly, 1987; van Wijk and Kempen, 1987; Good, 1990; Postma, Kolk, and Povel, 1990; Bredart, 1991; Blacker and Mitton, 1991; Bear, Dowding, and Shriberg, 1992; Couper-Kuhlen, 1992; Local, 1992; Shriberg, Bear, and Dowding, 1992; Nakatani and Hirschberg, 1993). This work has uncovered the mechanisms of self- and other-initiation of repair, self- and other-achievement of repair, repair position, perception of repair, and so on. But within this fairly extensive literature, the relationships between repair and syntax have received relatively little attention (the major exceptions being Schegloff, 1979; Goodwin, 1981; Levelt, 1983; Geluykens, 1987; van Wijk and Kempen, 1987; Fox and Jasperson, frth.). And the operation of repair in different languages, with different syntactic systems, has, to the best of our knowledge, been the object of only a small body of research (see Schegloff, 1987b). This present study aims to begin to fill this gap by focusing on the syntax of repair from a cross-linguistic perspective. Cross-linguistic work on repair is especially compelling to us given our own, and others’, research on the relationships between same-turn (also known as first-position) self-repair and syntax in English conversation (Schegloff, 1979, this volume; Fox and Jasperson, frth.).

Keywords Language and linguistics, Linguistic anthropology, Sociolinguistics
Elizabeth Shriberg, “Disfluencies in SWITCHBOARD,” in Proceedings of the International Conference on Spoken Language Processing, Philadelphia, PA, October 1996, pp. 11-14. http://www.speech.sri.com/people/ees/publications.html.

Abstract Disfluencies ("um," repeats, self-repairs) are prevalent in spontaneous speech, and are relevant to both human speech communication and speech processing by machine. Although disfluencies have commonly been viewed as ’noisy’ events, results from a large descriptive study indicate that disfluencies show regularities in a number of dimensions (Shriberg, 1994). This paper reports selected results on Switchboard and two comparison corpora of spontaneous speech. Results illustrate the systematic distribution of disfluencies, and highlight differences as well as universals across corpora and speakers.
Elizabeth Shriberg, and Andreas Stolcke, “Word predictability after hesitations: A corpus-based study,” in Proceedings of the International Conference on Spoken Language Processing, vol. 3, 1996, pp. 1868-1871. http://www.speech.sri.com/people/ees/publications.html.

Abstract We ask whether lexical hesitations in spontaneous speech tend to precede words that are difficult to predict. We define predictability in terms of both transition probability and entropy, in the context of an N-gram language model. Results show that transition probability is significantly lower at hesitation transitions, and that this is attributable to both the following word and the word history. In addition, results suggest that fluent transitions in sentences with a hesitation elsewhere are significantly more likely than transitions in fluent sentences to contain out-of-vocabulary words and novel word combinations. Such findings could be used to improve statistical language modeling for spontaneous-speech applications.
Andreas Stolcke, and Elizabeth Shriberg, “Statistical language modeling for speech disfluencies,” in Proceedings of the International Conference on Acoustics: Speech and Signal Processing, vol. 1, Atlanta, GA, May 1996, pp. 405-408. http://www.speech.sri.com/people/ees/publications.html.

Abstract Speech disfluencies (such as filled pauses, repetitions, restarts) are among the characteristics distinguishing spontaneous speech from planned or read speech. We introduce a language model that predicts disfluencies probabilistically and uses an edited, fluent context to predict following words. The model is based on a generalization of the standard N-gram language model. It uses dynamic programming to compute the probability of a word sequence, taking into account possible hidden disfluency events. We analyze the model’s performance for various disfluency types on the Switchboard corpus. We find that the model reduces word perplexity in the neighborhood of disfluency events; however, overall differences are small and have no significant impact on recognition accuracy. We also note that for modeling of the most frequent type of disfluency, filled pauses, a segmentation of utterances into linguistic (rather than acoustic) units is required. Our analysis illustrates a generally useful technique for language model evaluation based on local perplexity comparisons.
Marc Swerts, Anne Wichmann, and Robbert-Jan Beun, “Filled pauses as markers of discourse structure,” in Fourth International Conference on Spoken Language Processing, vol. 2, 1996, pp. 1033-1036. DOI: 10.1109/ICSLP.1996.607780.

Abstract The study aims to test quantitatively whether filled pauses (FPs) may highlight discourse structure. More specifically it is first investigated whether FPs are more typical in the vicinity of major discourse boundaries. Secondly, the FPs are analyzed acoustically, to check whether those occurring at major discourse boundaries are segmentally and prosodically different from those at shallower breaks. Analyses of twelve spontaneous monologues (Dutch) show that phrases following major discourse boundaries more often contain FPs. Additionally, FPs after stronger breaks tend to occur phrase-initially, whereas the majority of the FPs after weak boundaries are in phrase-internal position. Also, acoustic observations reveal that FPs at major discourse boundaries are both segmentally and prosodically distinct. They also differ with respect to the distribution of neighbouring silent pauses.
R. Towell, Roger Hawkins, and N. Bazergui, “The Development of Fluency in Advanced Learners of French,” Applied Linguistics, vol. 17, no. 1, March 1996, pp. 84-119. DOI: 10.1093/applin/17.1.84. http://applij.oxfordjournals.org/content/17/1/84.abstract.

Abstract In this article, it will be argued that the proceduralization of linguistic knowledge is the most important factor in the development of fluency in advanced second language learners. Levelt’s (1989) model of language production is used to provide the descriptive base for the sub-processes of language production. This posits the existence of a conceptualizer, a formulator, and an articulator, each of which contains procedural knowledge. Levelt’s model does not, however, deal with how that knowledge is developed. It is proposed that Anderson’s (1983) model of adaptive control of thought may be used to account for developmental aspects. This posits that the learning process involves the conversion of declarative knowledge into procedural knowledge via cognitive, associative, and autonomous stages of compilation and tuning. Neither Levelt nor Anderson, however, have stated how the contribution of the sub-processes or how the developmental stages may be measured in language use. It is argued that the temporal variables used by Grosjean and Deschamps (1972, 1973, 1975) provide a way of measuring (a) fluency and (b) the contribution of the sub-processes in the model. Evidence from 12 advanced learners of French and English is used to show how this may be done. Initial results from experiments indicate that on a specific task learners became more fluent (as measured by speaking rate) as a result of the period of residence abroad and that an increase in mean length of run was the most important of the temporal variables contributing to this development. It is argued that the increase in mean length of run is mainly attributable to the proceduralization of different kinds of knowledge, including procedural knowledge of syntax and of lexical phrases (Nattinger and DeCarrico 1992). The way in which this may have taken place is illustrated by means of extracts from the texts produced by the subjects. We conclude that the quantitative and qualitative evidence supports the contention that increases in fluency are attributable mainly to increases in the degree of proceduralization of knowledge.

1995

Susan Brennan, and Maurice Williams, “The feeling of another’s knowing: Prosody and filled pauses as cues to listeners about the metacognitive states of speakers,” Journal of Memory and Language, vol. 34, no. 3, 1995, pp. 383-398. DOI: 10.1006/jmla.1995.1017.

Abstract In question-answering, speakers display their metacognitive states using filled pauses and prosody (Smith & Clark, 1993). We examined whether listeners are actually sensitive to this information. Experiment 1 replicated Smith and Clark’s study; respondents were tested on general knowledge questions, surveyed about their FOK (feeling-of-knowing) for these questions, and tested for recognition of answers. In Experiment 2, listeners heard spontaneous verbal responses from Experiment 1 and were tested on their feeling-of-another’s-knowing (FOAK) to see if metacognitive information was reliably conveyed by the surface form of responses. For answers, rising intonation and longer latencies led to lower FOAK ratings by listeners. For nonanswers, longer latencies led to higher FOAK ratings. In Experiment 3, electronically edited responses with 1-s latencies led to higher FOAK ratings for answers and lower FOAK ratings for nonanswers than those with 5-s latencies. Filled pauses led to lower ratings for answers and higher ratings for nonanswers than did unfilled pauses. There was no support for a filler-as-morpheme hypothesis, that "um" and "uh" contrast in meaning. We conclude that listeners can interpret the metacognitive information that speakers display about their states of knowledge in question-answering.
Nicholas Christenfeld, “Does it Hurt to Say Um?,” Journal of Nonverbal Behavior, vol. 19, no. 3, 1995, pp. 171-186. DOI: 10.1007/BF02175503.

Abstract This paper examines whether the profusion of ums that so many speakers produce is noticed, and whether these ums influence what audiences think of speakers. Even though ums do not seem to be a product of anxiety or lack of preparation, the first study, using a simple questionnaire, indicated that the average listener assumes that they are. The second study manipulated um rates by editing a tape to create a version where ums were replaced by silence or were eliminated. The original and edited versions were played to audiences who were told to focus on either the content or the style, or were not given any particular instructions. Estimates of ums showed no sensitivity whatsoever in the content focus, some sensitivity without focus instruction, and greatest sensitivity with the style focus, suggesting that ums can be, but are not always, processed automatically. On subjective ratings of the speaker, filled pauses created a better impression than silent pauses, but no pauses proved best of all. The ums had an effect even in conditions where the audience was unable to report their presence.
Jean E. Fox Tree, “The Effects of False Starts and Repetitions on the Processing of Subsequent Words in Spontaneous Speech,” Journal of Memory and Language, vol. 34, no. 6, 1995, pp. 709-738. DOI: 10.1006/jmla.1995.1032.

Abstract Speech disfluencies have different effects on comprehension depending on the type and placement of disfluency. Words following false starts (such as "windmill" after "in the" in "the eleventh example is um in the a windmill") have longer word monitoring latencies than the same tokens with the false starts excised. The decremental effect seems to be limited to false starts that occur in the middle of sentences or after discourse markers. I suggest it is at these points that the repair process is most burdened by the false start. In contrast, words following repetitions ("heart" in "of a of a heart") do not have longer word monitoring latencies than the same tokens with the repetitions excised. In two experiments, words following spontaneously produced repetitions have faster word monitoring latencies. Two other experiments suggest that this seeming repetition advantage is more likely the result of slowed monitoring after a phonological phrase disruption. Inserting repetitions where they did not occur in a manner that preserved the original phonological phrases resulted in neither an advantage nor a disadvantage of repeating. These studies provide a first glimpse at how speech disfluencies affect understanding, and also provide information about the types of comprehension models that can accommodate the effects of speech disfluencies.
Robin J. Lickley, “Missing Disfluencies,” in Proceedings of International Congress of Phonetic Science, vol. 4, Stockholm, 1995, pp. 192-195. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS1995/ICPhS95_Vol4.

Abstract Everyday experience suggests that many disfluencies pass unnoticed by listeners attending to speech. This paper presents the results of a perception experiment on a corpus of spontaneous Dutch speech, where the subjects are asked to detect disfluencies as they compare a transcript with the recording they are hearing. The results show that many disfluencies are missed by listeners even when they are trying to spot them.
Sharon Oviatt, “Predicting spoken disfluencies during human–computer interaction,” Computer Speech & Language, vol. 9, no. 1, January 1995, pp. 19-35. DOI: https://doi.org/10.1006/csla.1995.0002. http://www.sciencedirect.com/science/article/pii/S0885230885700022.

Abstract This research characterizes the spontaneous spoken disfluencies typical of human–computer interaction, and presents a predictive model accounting for their occurrence. Data were collected during three empirical studies in which people spoke or wrote to a highly interactive simulated system as they completed service transactions. The studies involved within-subject factorial designs in which the input modality and presentation format were varied. Spoken disfluency rates during human–computer interaction were documented to be substantially lower than rates typically observed during comparable human–human speech. Two separate factors, both associated with increased planning demands, were statistically related to higher disfluency rates: (1) length of utterance; and (2) lack of structure in the presentation format. Regression techniques demonstrated that a linear model based simply on utterance length accounted for over 77% of the variability in spoken disfluencies. Therefore, design methods capable of guiding users’ speech into briefer sentences have the potential to eliminate the majority of spoken disfluencies. In this research, for example, a structured presentation format successfully eliminated 60–70% of all disfluent speech. The long-term goal of this research is to provide empirical guidance for the design of robust spoken language technology.
V.M. Holmes, “A crosslinguistic comparison of the production of utterances in discourse,” Cognition, vol. 54, no. 2, February 1995, pp. 169-207. DOI: http://dx.doi.org/10.1016/0010-0277(94)00635-X. http://www.sciencedirect.com/science/article/pii/001002779400635X.

Abstract Functionalist theorists have proposed a number of decisions that a speaker has to make regarding the packaging of messages in response to the knowledge shared by the speaker and the listener in a discourse situation. The present study examined some procedures used by French and English speakers to implement message packaging during sentence formulation. The speech of French and English students talking informally about topics of interest to them was recorded, and hesitations were identified and located in the speech. According to the hesitation data, like English speakers, French speakers organised their thoughts into successive units having a variety of structural characterisations. Sentences, surface clauses, basic clauses and phrases were all found to be output units. In addition, French as well as English speakers output clauses containing new information more independently than clauses either containing presupposed information or satisfying an essential argument of the verb. French speakers also differed from English speakers in several ways. During articulation, phrases acted as more tightly integrated output units for French than for English speakers. French speakers also used different syntactic devices from English speakers for introducing and focussing on topics in the discourse. They did this by means other than the use of lexical subjects, such as left-detached topics and cleft sentences, supporting the hypothesis that spoken French has topic-comment structure, while English has subject-verb-object organisation. The crosslinguistic differences were argued to result largely from the distinct prosodic characteristics of the languages. The results were seen as providing new evidence for the similar and contrasting ways in which speakers of different languages respond to decisions about message packaging.

1994

Janet Anderson-Hsieh, and Horabail Venkatagiri, “Syllable Duration and Pausing in the Speech of Chinese ESL Speakers,” TESOL Quarterly, vol. 28, no. 4, 1994, pp. 807-812. http://www.eric.ed.gov/ERICWebPortal/detail?accno=EJ499443.

Abstract Reports a study that investigated syllable duration and pausing in Chinese speakers learning English as a Second Language.
Nicholas Christenfeld, “Options and Ums,” Journal of Language & Social Psychology, vol. 13, no. 2, June 1994, pp. 192-199. DOI: 10.1177/0261927X94132005.

Abstract Most people who have speculated about the causes of ums in speech (also known as filled pauses) have suggested that they are produced when the speaker is confronted with a challenging choice. This idea, in spite of its intuitive appeal and theoretical usefulness, has never been directly tested. The present experiment manipulates the complexity of options facing a speaker by having subjects describe mazes with a varying number of alternate possible routes. The mazes with more options did produce more filled pauses. However, in describing even the simplest maze, one of the easiest possible speech tasks, the subjects still said um regularly. It is suggested that options are only one factor in filled pause production, and that breaking up the rhythm of speech may also foster filled pauses.
Herbert Clark, “Managing Problems in Speaking,” Speech Communication, vol. 15, 1994, pp. 243-250. DOI: 10.1016/0167-6393(94)90075-2.

Abstract The problems that participants in conversation have, it is argued, are really joint problems and have to be managed jointly. The participants have three types of strategies for managing them. (1) They try to prevent foreseeable but avoidable problems. (2) They warn partners of foreseeable but unavoidable problems. And (3) they repair problems that have already arisen. Speakers and addressees coordinate actions at three levels of talk: (1) the speaker’s articulation and the addressees’ attention to that articulation; (2) the speaker’s presentation of an utterance and the addressees’ identification of that utterance; and (3) the speaker’s meaning and the addressees’ understanding of that meaning. There is evidence that the participants have joint strategies for preventing, warning about and repairing problems at each of these levels. There is also evidence that they prefer preventatives to warnings, and warnings to repairs, all other things being equal.
Elizabeth Shriberg, “Preliminaries to a theory of speech disfluencies,” Master's Thesis, University of California, Berkeley. 1994. http://www.speech.sri.com/people/ees/publications.html.

Abstract This thesis examines disfluencies (e.g., "um", repeated words, and a variety of forms of self-repair) in the spontaneous speech of adult normal speakers of American English. Despite their prevalence, disfluencies have traditionally been viewed as irregular events and have received little attention. The goal of the thesis is to provide evidence that, on the contrary, disfluencies show remarkably regular trends in a number of dimensions. These regularities have consequences for models of human language production; they can also be exploited to improve performance in speech applications. The method includes analysis of over 5000 hand-annotated disfluencies from a database (250,000 words) containing three different styles of spontaneous speech: task-oriented human-computer dialog, task-oriented human-human dialog, and human-human conversation on a prescribed topic. The approach is theory-neutral and strongly data-driven. The annotations correspond to observable characteristics ("features") in the data, including: 1) the speech domain; 2) the speaker; 3) the sentence in which a disfluency occurs; 4) word-related characteristics of the disfluency; and 5) simple acoustic characteristics of the disfluency. A methodology is developed for representing these features in a database format, and an algorithm is provided for automatic disfluency type classification based on this representation. Results show regular trends in disfluency rates by sentence length, by disfluency position, by presence of another disfluency in the same sentence, by disfluency type, and by combinations of these features both across and within speakers. Regularities are also found for word-related features of the disfluency, including the number of excised words, the rate of cut-off words, and the rate of editing phrases. Additional analyses describe characteristics of overlapping disfluencies and prosodic characteristics of the simplest disfluency types.
Across analyses, data from the three different speech styles are compared; where relevant, simple parametric models are provided. In sum, disfluencies show regularities in a variety of dimensions. These regularities can help guide and constrain models of spoken language production. In addition they can be modeled in applications to improve the automatic processing of spontaneous speech.
Brigitte Zellner, “Pauses and the temporal structure of speech,” in Fundamentals of speech synthesis and speech recognition, Keller, Eric, Ed. Chichester: John Wiley, September 1994, ch. 3, pp. 41-62. DOI: 10.5555/214780.214786. https://dl.acm.org/doi/10.5555/214780.214786.

Abstract (none)

1993

Elizabeth Shriberg, and Robin J. Lickley, “Intonation of clause-internal filled pauses,” Phonetica, vol. 50, no. 3, 1993, pp. 172-179. DOI: 10.1159/000261937. http://www.speech.sri.com/people/ees/publications.html.

Abstract Clause-internal filled pauses and preceding peak fundamental frequency (F0) values were analyzed to determine whether the intonation of filled pauses is relative to, or independent of, prior prosodic context. Higher peaks were found to be systematically associated with higher filled-pause values, supporting the ‘relative’ hypothesis. A linear model, in which filled-pause F0 was expressed as an invariant (over speakers) proportion of the distance between preceding peak F0 and a speaker-dependent baseline F0, produced results nearly identical to those of a two-parameter model in which the coefficients of peak and baseline were allowed to vary freely. The model was less appropriate for filled pauses after sentence-initial peaks, but unaffected by temporal variables.
Vicki Smith, and Herbert Clark, “On the course of answering questions,” Journal of Memory and Language, vol. 32, no. 1, February 1993, pp. 25-38. DOI: 10.1006/jmla.1993.1002.

Abstract People responding to questions are sometimes uncertain, slow, or unable to answer. They handle these problems of self-presentation, we propose, by the way they respond. Twenty-five respondents were each asked 40 factual questions in a conversational setting. Later, they rated for each question their feeling that they would recognize the correct answer, then took a recognition test on all 40 questions. As found previously, the weaker their feeling of knowing, the slower their answers, the faster their nonanswers ("I don't know"), and the worse their recognition. But further, as proposed, the weaker their feeling of knowing, the more often they answered with rising intonation, used hedges such as "I guess," responded "I don't know" instead of "I can't remember," and added "uh" or "um," self-talk, and other face-saving comments. They reliably used "uh" to signal brief delays and "um" longer ones.

1992

“To Er (or Um) is Human,” Discover, vol. 13, no. 1, 1992, pp. 8+.

Abstract (none)

1991

Elizabeth R. Blackmer, and Janet L. Mitton, “Theories of monitoring and the timing of repairs in spontaneous speech,” Cognition, vol. 39, no. 3, June 1991, pp. 173-194. DOI: 10.1016/0010-0277(91)90052-6. https://www.sciencedirect.com/science/article/pii/0010027791900526.

Abstract This study reports the first data to be published on the timing of self-repairs in spontaneous speech, giving means and confidence intervals for error-to-cut-off, error-to-repair, and cut-off-to-repair times for different types of repair based on 1525 repairs made in the conversational turns of 61 callers to a radio talk show. The three most detailed models of monitoring are discussed in the introduction, with emphasis on their temporal implications. Many of the cut-off-to-repair times observed were faster than would be predicted by any model in the literature. Laver's (1980) theory of monitoring is shown to be incongruent with the observed times, as is Levelt's (1983, 1989) main interruption rule. The results show that people can plan corrections to their speech while talking, and suggest that Kempen and Hoenkamp's (1987) concept of incremental processing can be extended to repairs.
Eileen Blau, “More on Comprehensible Input: The Effect of Pauses and Hesitation Markers on Listening Comprehension,” 1991. http://www.eric.ed.gov/ERICWebPortal/detail?accno=ED340234.

Abstract Two studies, one in Puerto Rico and one in Japan, assessed the effects of pauses and hesitation markers on listening comprehension of university students who were learners of English as a Second Language. In one, 61 students of basic English were assigned to three groups to hear monologues under three conditions: (1) normal speed; (2) with 3-second pauses inserted, on average, every 23 words; and (3) with similar pauses filled with hesitation markers (e.g., "well, I mean, uh"). Students responded in Spanish to questions immediately after each monologue. Results indicate comprehension of the version with filled pauses was significantly higher than comprehension of the normal version. The version with blank pauses was understood slightly less well than the filled-pause version. In the second study, 48 Japanese education majors were randomly assigned to four groups. Three heard the monologues used in the previous study and the fourth heard a mechanically slowed monologue. Comprehension questions were in English. Results indicate comprehension of the filled-pause version was significantly better than for the normal, slow, and blank-pause versions, with little difference in comprehension found among those versions. Overall, insertion of hesitation markers was the most effective aid to listening comprehension. Instructional implications are considered.
Nicholas Christenfeld, Stanley Schachter, and Frances Bilous, “Filled Pauses and Gestures: It’s Not Coincidence,” Journal of Psycholinguistic Research, vol. 20, no. 1, 1991, pp. 1-20. DOI: 10.1007/BF01076916.

Abstract Though filled pauses and gestures frequently accompany speech, their function is not well understood. We suggest that it may be helpful in furthering our knowledge of these phenomena to examine their relationship to each other. To this end, we carried out two studies examining whether they tend to occur together, or to occur at separate times. Both faculty colloquium speakers and undergraduate subjects used filled pauses less frequently when they were gesturing than when they were not gesturing. This effect held for 30 out of 31 subjects. We suggest that detailed theories may be premature, but speculate that gestures may be an indication that the speech production apparatus has completed its search for the next word, phrase or idea and is ready to continue.
Roger Griffiths, “Pausological Research in an L2 Context: A Rationale, and Review of Selected Studies,” Applied Linguistics, vol. 12, no. 4, December 1991, pp. 345-364. DOI: 10.1093/applin/12.4.345. http://applij.oxfordjournals.org/content/12/4/345.short.

Abstract It is here suggested that temporal variables, such as speech rate, and pause and hesitation phenomena, which are studied within the science of pausology, are of direct relevance to L2 (second language) research and ELT methodology. Examples are given, however, to demonstrate that the use of methodology conventions from this very specialized area are not evident in early L2 research, and it is only as they are increasingly observed that L2 findings can be reported with confidence.
William Safire, “Impregnating the Pause,” New York Times Magazine, June 1991, pp. 8. https://www.nytimes.com/1991/06/16/magazine/on-language-impregnating-the-pause.html.

Abstract (none)
Stanley Schachter, Nicholas Christenfeld, Bernard Ravina, and Frances Bilous, “Speech Disfluency and the Structure of Knowledge,” Journal of Personality and Social Psychology, vol. 60, no. 3, 1991, pp. 362-367. DOI: 10.1037/0022-3514.60.3.362.

Abstract It is generally accepted that filled pauses ("uh," "er," and "um") indicate time out while the speaker searches for the next word or phrase. It is hypothesized that the more options, the more likely that a speaker will say "uh." The academic disciplines differ in the extent to which their subject matter and mode of thought require a speaker to choose among options. The more formal, structured, and factual the discipline, the fewer the options. It follows that lecturers in the humanities should use more filled pauses during lectures than social scientists and that natural scientists should use fewest of all. Observations of lecturers in 10 academic disciplines indicate that this is the case. That this is due to subject matter rather than to self-selection into disciplines is suggested by observations of this same set of lecturers all speaking on a common subject. In this circumstance, the academic disciplines are identical in the number of filled pauses used.

Keywords lecturers, number of filled pauses in speech, word options in academic discipline

1990

Jens Allwood, Joakim Nivre, and Elisabeth Ahlsén, “Speech Management - on the Non-written Life of Speech,” Nordic Journal of Linguistics, vol. 13, no. 01, 1990, pp. 3-48. DOI: 10.1017/s0332586500002092. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=2863368&fulltextType=RA&fileId=S0332586500002092.

Abstract This paper introduces the concept of speech management (SM), which refers to processes whereby a speaker manages his or her linguistic contributions to a communicative interaction, and which involves phenomena which have previously been studied under such rubrics as "planning", "editing", "(self-)repair", etc. It is argued that SM phenomena exhibit considerable systematicity and regularity and must be considered part of the linguistic system. Furthermore, it is argued that SM phenomena must be related not only to such intraindividual factors as planning and memory, but also to interactional factors such as turntaking and feedback, and to informational content. Structural and functional taxonomies are presented together with a formal description of complex types of SM. The structural types are exemplified with data from a corpus of SM phenomena.
Paul Lennon, “Investigating Fluency in EFL: A Quantitative Approach,” Language Learning, vol. 40, no. 3, September 1990, pp. 387-417. DOI: 10.1111/j.1467-1770.1990.tb00669.x. https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-1770.1990.tb00669.x.

Abstract This paper investigates various easily quantifiable performance features that might function as objective indicators of oral fluency. It would be advantageous if we could assemble a set of variables that functioned as good indicators of what expert judges, such as experienced native-speaker EFL teachers, are reacting to when subjectively assessing fluency. This would advance our knowledge of what constitutes fluency and especially what makes for perceived fluency differences among learners and how an individual learner improves in fluency over time. To these ends a sample of the spoken performance of four advanced EFL learners was recorded at the start of six-months’ residence in Britain and again shortly before departure. A panel of 10 native-speaker teachers of EFL subjectively rated the recordings for global fluency and generally agreed that the second set was more fluent than was the first, though for each subject one or two panel members dissented. A battery of 12 readily quantifiable performance variables considered to be related to fluency was then assembled. Values per subject per recording were obtained, expressed as frequency rates or as proportions so that comparisons could be made between first and second renderings. For each variable, subjects’ scores were compared between the two time points to ascertain in which features improvements were consistently manifested. For each variable t-tests were conducted between sample means at Week 2 and Week 23. Improvement of note at the 0.05 level of significance was found for three variables (one-tailed test), namely, speech rate, filled pauses per T-Unit, and percentage of T-Units followed by pause. Surprisingly, self-corrections did not prove a good indicator. The implications of the study are that quantitative analysis can indeed help to identify fluency improvements in individual learners, and may have the potential to provide objective assessment of spoken fluency. Findings revealed two key areas of performance that seem to be important for fluency: (1) speech-pause relationships in performance and (2) frequency of occurrence of dysfluency markers such as filled pauses and repetitions (but not necessarily self-corrections). However, even from this small-scale study it does seem that there is scope for individual variation among subjects in the precise areas in which fluency improvements may occur. Further research might be able to identify both “core” and “peripheral” fluency variables. Quantitative analysis has applications both as a testing instrument and as a diagnostic tool to identify individual learner strengths and weaknesses among the components of fluency. Investigation of native-speaker performance might provide native-like target score ranges on each variable for learners to aim at.
Norman Markel, “Speaking style as an expression of solidarity: Words per pause,” Language in Society, vol. 19, no. 01, 1990, pp. 81-88. DOI: 10.1017/s0047404500014123. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=3003284&fulltextType=RA&fileId=S0047404500014123.

Abstract This study examines the use of words per pause (W/P) as a practical means for identifying solidarity in everyday conversation. Eight listeners recorded the narratives of a female and a male, either friends or strangers. Ten speakers were categorized as friends and six as strangers; they talked about a good and a bad experience. Average reliability of coding pauses was .83. The results indicated a statistically significant difference in W/P of speakers who were friends and those who were strangers. Statistical results support the conclusion that friends are more likely to employ many W/P and strangers few W/P. One practical implication of this study is that W/P can be employed by researchers with relative ease and a high degree of reliability for investigations of speaking style in a variety of contexts. A second practical implication is that W/P is a diagnostic device that can serve as a social litmus test in everyday conversation to identify the expression of sympathy and estrangement.

Keywords Expressive language, nonverbal communication, paralanguage, pauses, psycholinguistics, Sociolinguistics, solidarity, speech and personality
Frank Wijnen, “The development of sentence planning,” Journal of Child Language, vol. 17, no. 03, 1990, pp. 651-675. DOI: 10.1017/s030500090001093x. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=4234816&fulltextType=RA&fileId=S030500090001093X.

Abstract This is an exploratory case study of the relation between speech output disturbances (dysfluencies) and the development of language production processes. The data consist of transcribed weekly speech samples of a Dutch boy between 2;4 and 2;11. The period of observation captures the early phase of the transition from to grammatical language. The frequency of occurrence of dysfluencies (i.e. repetitions, revisions and incomplete phrases) shows a significant increase and a subsequent decline. Whereas in the first half of the observation period the dysfluencies are distributed relatively randomly over sentences, in the second half they tend to concentrate in function words and sentence-initial words. The decline of dysfluency rate is shown to be related to an abundant use of a few . It is argued that these results reflect the emergence of a component in the speech production apparatus which is specifically dedicated to serial-order planning.

1989

John O. Greene, and A. E. Lindsey, “Encoding Processes in the Production of Multiple-Goal Messages,” Human Communication Research, vol. 16, no. 1, 1989, pp. 120-140. DOI: 10.1111/j.1468-2958.1989.tb00207.x.

Abstract It is commonly recognized that interpersonal messages function in the service of multiple social goals. Despite this, relatively little is known of the encoding processes underlying the production of such messages. One possible account of these encoding processes is found in action assembly theory. This article explicates the production of multiple-goal messages from the perspective of action assembly theory and reports an experimental investigation of this account. In this study, the speech of participants assigned the task of pursuing multiple social goals was contrasted with that of people assigned a single task. Consistent with the theory, the results revealed that participants pursuing multiple goals had longer onset latencies than their counterparts given a single goal. Similarly, multiple goals were associated with greater pause/phonation ratios after the onset of speech. The effects of opportunity for advance message preparation were also examined. As expected, participants given the opportunity for advance planning exhibited shorter response latencies than those who spoke spontaneously. In keeping with previous research in this area, filled-pause rate was not significantly affected by either number of goals or the opportunity for advance preparation.
Lawrence A. Hosman, “The Evaluative Consequences of Hedges, Hesitations, and Intensifiers: Powerful and Powerless Speech Styles,” Human Communication Research, vol. 15, no. 3, 1989, pp. 383-406. DOI: 10.1111/j.1468-2958.1989.tb00190.x.

Abstract This article examines the separate and combined impact of hedges, hesitations, and intensifiers on perceptions of authoritativeness, sociability, character, and similarity, and the extent to which messages containing one or more of these language variables differs from a "prototypically" powerless message in evaluative consequences. A "prototypically" powerless message is one that contains not only hedges, hesitations, and intensifiers, but also contains polite forms and meaningless particles, such as "oh, well" and "you know." Two studies indicated that hedges and hesitations individually affected perceptions of authoritativeness and sociability, but interactions among the variables were not found in the studies. Furthermore, only high intensifiers/low hedges/low hesitations and low intensifiers/low hedges/low hesitations messages differed significantly from the "prototypically" powerless message. The second study revealed that speaker status interacted to affect evaluative consequences. The results are discussed in terms of their implications for the power of speech style construct.
Willem J. M. Levelt, Speaking: From Intention to Articulation. Cambridge, MA: MIT Press, 1989. https://mitpress.mit.edu/books/speaking.

Abstract In Speaking, Willem "Pim" Levelt, Director of the Max-Planck-Institut für Psycholinguistik, accomplishes the formidable task of covering the entire process of speech production, from constraints on conversational appropriateness to articulation and self-monitoring of speech. Speaking is unique in its balanced coverage of all major aspects of the production of speech, in the completeness of its treatment of the entire speech process, and in its strategy of exemplifying rather than formalizing theoretical issues.
Paul Nation, “Improving speaking fluency,” System, vol. 17, no. 3, 1989, pp. 377-384. DOI: 10.1016/0346-251X(89)90010-9.

Abstract This paper examines the improvement of learners of English during the performance of a speaking activity which involves repeating the same unrehearsed talk. Improvements in fluency, grammatical accuracy, and control of the content showed that during the short time spent doing the activity, learners performed at a level above their normal level of performance. It is argued that working at this higher than usual performance is a way of bringing about long-term improvement in fluency.

1988

Martin Duckworth, and Martin J. Ball, “Problems in the transcription of dysfluent speech,” Journal of the International Phonetic Association, vol. 18, 1988, pp. 152-155. DOI: 10.1017/S0025100300003777.

Abstract Clinical phoneticians and speech pathologists often face difficulties in transcription of a different nature from those which the general phonetician encounters. A wide range of speech disorders present at the clinic (see Crystal 1980, 1981, 1984), and in many cases a detailed phonetic description of speech is an important part of the analysis and diagnosis of a person’s problem (see Carney 1979, on the dangers of inadequate description).

1987

Joan Fayer, and Emily Krasinski, “Native and Nonnative Judgments of Intelligibility and Irritation,” Language Learning, vol. 37, no. 3, 1987, pp. 313-326. DOI: 10.1111/j.1467-1770.1987.tb00573.x.

Abstract This study compares the reactions of native English speakers and native Spanish speakers who listened to tapes of Puerto Rican learners of English of various levels of proficiency. The listeners completed a questionnaire that examines the following variables: intelligibility, grammar, pronunciation, intonation, wrong words, voice, hesitations, distraction and annoyance. It was found that the English and Spanish listeners differed principally in how they rated the linguistic form of the speakers and in the annoyance reported. The Spanish listeners rated the linguistic form much lower than did the English listeners and also reported more annoyance. This indicates that the Spanish listeners were less tolerant toward nonnative speech than were the English listeners. In addition, pronunciation and hesitations were reported by both groups of listeners to be, overall, the features most distracting from the message.
Marian Olynyk, Alison d’Anglejan, and David Sankoff, “A quantitative and qualitative analysis of speech markers in the native and second language speech of bilinguals,” Applied Psycholinguistics, vol. 8, no. 02, 1987, pp. 121-136. DOI: 10.1017/s0142716400000163. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=2607360&fulltextType=RA&fileId=S0142716400000163.

Abstract The present study investigated the use of five speech markers in the native and second language production of French-English bilinguals in a military setting. We propose that these speech markers, mechanisms for self-repair and turn-taking in conversations, are a major component of fluency. The ten participants, five high fluency speakers and five low fluency speakers, were tape-recorded with their peers in three different situations in their native and second languages, and the frequency of occurrence of speech markers was tabulated for a 5-minute segment for each situation. It was hypothesized that speakers who used differentially more prepositioned repairs (progressives) or markers placed before the repair that do not require a reorganization of the expectation of what is to follow based on what has been produced in the turn so far, would be judged more favourably than those who used more postpositioned repairs (regressives). There was no quantitative difference in the frequency of occurrence of speech markers between the high and low fluency speakers, but the high fluency speakers used more progressive than regressive types of marker. Progressive markers place fewer demands on the interlocutor than regressive markers, which require constant readjustments on the part of the listener. The profiles were similar for each individual in the native and second language but in every case there were fewer markers in the native than in the second language. Furthermore, there were fewer markers in the planned (teaching) than in the unplanned (interview) situation. The findings have important implications for the evaluation of second language fluency.
Marcel E. Wingate, “Fluency and disfluency; Illusion and identification,” Journal of Fluency Disorders, vol. 12, no. 2, April 1987, pp. 79-101. DOI: 10.1016/0094-730X(87)90015-5. http://www.sciencedirect.com/science/article/pii/0094730X87900155.

Abstract At approximately the same time, two lines of research have studied disfluencies from different orientations—one in stuttering and the other in normal speech. In certain important respects the findings of these separate lines differ. Resolution of these differences, which is particularly important for understanding stuttering in its relation to disfluency and fluency, has been precluded because the two research areas have remained essentially isolated from each other. Progress in understanding stuttering would benefit considerably from adequate attention to the findings of research on disfluency in normal speech, which already has yielded a substantial amount of information pertinent to the concepts of fluency and disfluency; the nature and extent of disfluency; the linguistic and cognitive significance of disfluencies; and the differentiation between normal and abnormal disfluency.

1986

Patrick R. Bennett, “The role of pause in discourse and its place in linguistics: Some evidence from Eastern Bantu,” Language Sciences, vol. 8, no. 1, April 1986, pp. 63-79. DOI: 10.1016/S0388-0001(86)80006-7.

Abstract Relatively few studies of breaks in the flow of speech assume a role in linguistic structure for pause. Using evidence from pause placement in several Eastern Bantu languages, including Kikuyu (from which all examples cited in the paper are drawn), it is possible to show that pause is not random, nor physiologically conditioned, nor explicable in terms of planning speech-bursts to come. There is a consistent contrast of three lengths of pause, with a ratio 1:2:3 or 1:2:4. The shortest pauses occur, apparently optionally, at word, phrase, or clause boundaries. The two longer categories, however, correlate with linguistic boundaries at the discourse level. The pattern of three pause levels is found in all the Eastern Bantu languages so far investigated, as a feature of narrative. If cases of hesitation are carefully distinguished from significant pause, it can be shown that the same pause hierarchy is part of normal conversation, at least for Kikuyu and other Dhaagiew languages. It is concluded that pause should indeed be seen as a linguistic category, though other languages have yet to be investigated, the possibility remains that pause is universal and automatic, rather than language-specific and linguistically significant. However, the fact that Kikuyu and Cinyanja differ, even though minimally, with ratios of 1:2:3 and 1:2:4, respectively, encourages acceptance of the pause as a valid linguistic unit.

1985

Rita Denny, “Marking the interaction order: The social constitution of turn exchange and speaking turns,” Language in Society, vol. 14, no. 01, 1985, pp. 41-62. DOI: 10.1017/s0047404500010939. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=2992216&fulltextType=RA&fileId=S0047404500010939.

Abstract This paper is about turn exchanges, the structure of speaking turns and the relationship of nonverbal behavior to both exchanges and turns. Its purpose is to present a conceptual framework for analyzing and interpreting turn exchange and speaking turns, and data are cited when possible. First discussed are specific forms of exchange such as and The Praguean concept of functional differentiation is invoked to argue that forms of turn exchange have indexical value. The relationship of nonverbal behavior to turn exchange is then discussed. An analysis of videotaped, dyadic interactions between strangers, conversation, is reported in order to demonstrate that the nonoverlapping exchange has structurally, hence indexically, distinct forms. These forms, defined by both verbal and nonverbal elements, are ordered in a hierarchy of pragmatic markedness. It is concluded that differentiating pragmatic markedness in conversational patterns is a powerful device for determining indexical features of conversation and thus of relevance for a semiotic understanding of everyday speech.

1984

John O. Greene, “Speech Preparation Processes and Verbal Fluency,” Human Communication Research, vol. 11, no. 1, 1984, pp. 61-84. DOI: 10.1111/j.1468-2958.1984.tb00038.x.

Abstract Preparation of speech in advance of actual production has consistently been shown to result in greater speech fluency. This observation is important given the impact of speech fluency in social perception; however, it raises questions concerning the nature of the processes by which communicative behaviors are prepared and of the representation of those behaviors in the cognitive system. The current research represents an attempt to address these issues. In Experiment I subjects provided with an abstract problem-solution sequence exhibited less silent pausing during speech than a control group which was not given such a sequence. A second experimental group provided with an abstract solution-problem sequence exhibited less pausing than the control group, but not significantly so. In Experiment II, increasing practice with the solution-problem sequence was found to lead to a decreasing linear trend in silent pausing. These findings are discussed in terms of their implications for understanding the nature of production of communicative behavior.
John Sherblom, and Dwayne D. Van Rheenen, “Spoken Language Indices of Uncertainty,” Human Communication Research, vol. 11, no. 2, 1984, pp. 221-230. DOI: 10.1111/j.1468-2958.1984.tb00046.x.

Abstract The present study investigates two propositions of uncertainty reduction theory and examines their effects on language use. Linguistic diversity and verbal immediacy were measured in two conversational segments taken from different periods of the entry phase of 72 interviews. A discriminant analysis function accounted for 19.36% of the variance in the measures. A sign test of the discriminant function coefficients showed a significantly consistent shift, across individuals, from the earlier conversational segment to the later. The results are consistent with the propositions of uncertainty reduction. Implications of this interpretation are discussed.

1983

Geoffrey Beattie, Talk: an analysis of speech and non-verbal behaviour in conversation. Milton Keynes: Open University Press, 1983. https://www.researchgate.net/publication/259194270_Talk_An_Analysis_of_Speech_and_Non-Verbal_Behavior_in_Conversation.

Abstract (none)
Willem J. M. Levelt, “Monitoring and self-repair in speech,” Cognition, vol. 14, no. 1, July 1983, pp. 41-104. DOI: 10.1016/0010-0277(83)90026-4. http://www.ncbi.nlm.nih.gov/pubmed/6685011?dopt=Abstract.

Abstract Making a self-repair in speech typically proceeds in three phases. The first phase involves the monitoring of one’s own speech and the interruption of the flow of speech when trouble is detected. From an analysis of 959 spontaneous self-repairs it appears that interrupting follows detection promptly, with the exception that correct words tend to be completed. Another finding is that detection of trouble improves towards the end of constituents. The second phase is characterized by hesitation, pausing, but especially the use of so-called editing terms. Which editing term is used depends on the nature of the speech trouble in a rather regular fashion: Speech errors induce other editing terms than words that are merely inappropriate, and trouble which is detected quickly by the speaker is preferably signalled by the use of ’uh’. The third phase consists of making the repair proper. The linguistic well-formedness of a repair is not dependent on the speaker’s respecting the integrity of constituents, but on the structural relation between original utterance and repair. A bi-conditional well-formedness rule links this relation to a corresponding relation between the conjuncts of a coordination. It is suggested that a similar relation holds also between question and answer. In all three cases the speaker respects certain structural commitments derived from an original utterance. It was finally shown that the editing term plus the first word of the repair proper almost always contain sufficient information for the listener to decide how the repair should be related to the original utterance. Speakers almost never produce misleading information in this respect. It is argued that speakers have little or no access to their speech production process; self-monitoring is probably based on parsing one’s own inner or overt speech.
Marian Olynyk, David Sankoff, and Alison d’Anglejan, “Second Language Fluency and the Subjective Evaluation of Officer Cadets in a Military College,” Studies in Second Language Acquisition, vol. 5, no. 02, 1983, pp. 213-236. DOI: 10.1017/s0272263100004861. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=2546036&fulltextType=RA&fileId=S0272263100004861.

Abstract The present study investigated the role of first and second language fluency in subjective judgments of linguistic, social and professional competence of adult bilinguals in a military setting. We examined the use of five types of speech marker, commonly referred to as hesitation phenomena, among ten Francophone officer-cadets in their native and second language, English. The results confirmed the experimenters’ a priori classification of the subjects as high or low fluency speakers. Anglophone and Francophone peer judges of various levels of proficiency in their second language listened to a tape assembled of fifteen second segments of each subject’s speech production in the native and second language and completed a questionnaire composed of ten scales evaluating the subjects in three domains: linguistic, social, and professional. The results showed that the eighty-six judges evaluated the subjects more positively in their native than in their second language guises in all three areas. High fluency speakers were evaluated more highly than low fluency speakers. Judges’ reactions were shown to vary as a function of their degree of bilingualism and their minority versus majority group membership.

1982

Danielle Duez, “Silent and Non-Silent Pauses in Three Speech Styles,” Language and Speech, vol. 25, no. 1, January 1982, pp. 11-28. DOI: 10.1177/002383098202500102.

Abstract The frequency, duration and distribution of pauses in French were investigated acoustically in three types of speech styles: political interviews and casual interviews, which belong to spontaneous speech, and political speeches, which are carefully prepared. The speech samples were subdivided into articulated sequences, silent pauses, and non-silent pauses. The total time of silent pauses was 50% greater in political speeches than in either type of interview. It appears to be one of the characteristics of political speeches. In all three styles, the distribution of silent pauses was generally correlated with the syntactic structure of the sentence. Most of the time, these pauses occurred at clause or phrase boundaries. In political speeches, however, their frequency was greater and their duration longer. Some of these pauses, particularly the long ones, must have a predominantly stylistic function. In interviews, non-silent pauses were frequent and long, particularly in casual interviews, whereas they were almost completely absent in political speeches. These results confirm previous studies that involve other languages as well, and investigate the syntactic distribution of pauses and the importance of hesitation in spontaneous speech; they open onto a new research area concerned with the stylistic function of pauses.
Margaret McLaughlin, and Michael J. Cody, “Awkward Silences: Behavioral Antecedents and Consequences of the Conversational Lapse,” Human Communication Research, vol. 8, no. 4, 1982, pp. 299-316. DOI: 10.1111/j.1468-2958.1982.tb00669.x.

Abstract Audio tape-recordings of 30-minute conversations between pairs of strangers (N=90) were scored for the frequency and duration of conversational lapses, interactive silences of three or more seconds occurring at the recognizable completion of a turn-constructional unit. Ten-utterance segments of conversation immediately prior and immediately subsequent to lapses were transcribed from the tapes of 45 of the conversations characterized by multiple lapses. Pre- and post lapse behaviors were coded as (A) (B) discloses, questions, edifies, acknowledges, advises, interprets, confirms, reflects. Also coded were gaps and laughter outbursts. Lag sequential analysis of the pre lapse data indicated that behavior sequences prior to lapses were characterized by a pattern of "minimal response" by one of the participants. Post lapse sequences were characterized by the presence of question-answer adjacency pairs.
J. Donald Ragsdale, and Catherine F. Silvia, “Distribution of Kinesic Hesitation Phenomena in Spontaneous Speech,” Language and Speech, vol. 25, no. 2, April 1982, pp. 185-190. DOI: 10.1177/002383098202500205.

Abstract Forty college student volunteers, equally divided by sex, participated in a study of vocal hesitations and kinesic hesitation phenomena in an anxiety-producing interview conducted by male and female interviewers. The vocal hesitations were of Mahl’s non-ah type and included sentence change, repetition, stutter, omission, sentence incompletion, tongue slip, and intruding incoherent sounds. The kinesic hesitation phenomena included head, hand, arm, leg, and foot movement, posture change, and body shift. The results showed a close relationship between the occurrences of vocal hesitations and movement. Additionally, they showed clear differences in the frequency of occurrence of the kinds of vocal hesitations and movements. Finally, it was found that the most common location of a kinesic phenomenon is just before or simultaneously with a nonfluency rather than after one. The findings lend support to the idea that vocal and nonvocal behaviors may not always be so interdependent as parallel.

1981

Adolf E. Hieke, “A Content-Processing View of Hesitation Phenomena,” Language and Speech, vol. 24, no. 2, April 1981, pp. 147-160. DOI: 10.1177/002383098102400203.

Abstract Hesitation phenomena are intricately connected with prospective and retrospective speech-production tasks and mark critical points in processing. They are also causally related to types of quality control which can be expressed as conversational postulates governing wellformedness criteria. Corresponding to the concepts of forestalled versus committed errors (error-free or error-full output), two major hesitation categories suffice: stalls and repair. Supported by a corpus of English and German, the new taxonomy captures previously uncategorized information: the grammatical locus of repair operations and the structural changes they cause.

1980

Geoffrey W. Beattie, “The Skilled Art of Conversational Interaction: Verbal and Nonverbal Signals in Its Regulation and Management,” in The Analysis of Social Skill, W. T. Singleton, P. Spurgeon, and R. B. Stammers, Eds. Boston, MA: Springer, 1980, pp. 193-211. DOI: 10.1007/978-1-4684-3623-5_11. https://link.springer.com/chapter/10.1007/978-1-4684-3623-5_11.

Abstract This chapter will differ from many others in this book because it concentrates upon one specific, single, low-level social skill. Many of the other chapters will provide broad and general accounts of higher-level skills and direct criticism of the concept of social skills. This chapter will provide a different kind of criticism of social skills by presenting an argument almost in the form of a fable (and I don’t mean by this that the story is untrue), in that the chapter will, hopefully, illustrate some general principles (in this case, of skilled social performance) by providing a detailed account of one low-level but common phenomenon. Unlike the characters from traditional fables — the woodcutters, the bakers and the shoe menders — the central figure in this story has not been chosen at random. The central figure in this story, the process of turn-taking in conversation — seems to have always been near the centre of the stage in attempts by social psychologists to apply the concept of skill to social performance (see Argyle, 1967, 1974; Argyle & Kendon, 1967). Here, I want to provide a detailed and up-to-date account of the way in which this phenomenon has been approached by social psychologists. Through so doing, I hope to demonstrate that psychologists are now, at last, coming to grips with the complexity and richness of some of the more basic processes which underlie all skilled social performance.

Keywords Nonverbal Behaviour, Traffic Signal, Traffic Light, Spontaneous Speech, Conversational Interaction
David A. Good, and Brian L. Butterworth, “Hesitancy as a conversational resource: Some methodological implications,” in Temporal Variables in Speech, Hans W. Dechert and Manfred Raupach, Eds. Berlin: De Gruyter Mouton, 1980, pp. 145-152. DOI: 10.1515/9783110816570.145. https://www.degruyter.com/view/book/9783110816570/10.1515/9783110816570.145.xml.

Abstract In a previous paper (Good, 1978) it was argued that the levels of hesitancy in the speech of an individual form an important prosodic cue, for the participants in a conversation, as to the relationship between the speaker and his utterance. This claim was made principally in the context of hesitancy as an indicator of cognitive load for the speaker, it being proposed that, whilst speakers may well need to hesitate more when faced with a heavy task demand, they may also increase the relative amounts of hesitation in their speech to achieve some interactional goal, even though the difficulty of the particular utterance would not directly necessitate the change. However no direct empirical evidence was offered to support this position, nor were any claims made as to whether speakers who were hesitating more than they needed to, would produce patterns of speech and silence that corresponded to those found in spontaneous speech or not. The purpose of this paper is to report an investigation of speaker behaviour when producing material that was already well known, whilst under the constraint of attempting to generate the impression that the converse was true. Thus the hypothesis offered by Good (1978), would be directly tested, and samples of 'simulated', and genuinely hesitant speech would be provided for a comparative analysis.
Adolf E. Hieke, “Aspects of native and non-native fluency skills,” Master's Thesis, University of Kansas, March 1980, 274 pp. http://www.worldcat.org/title/aspects-of-native-and-non-native-fluency-skills/oclc/6496448.

Abstract One measure of language competency is fluency, whether it be in the speaker’s first or second language, yet the concept of fluency has received little attention in linguistics so far. Research on second language acquisition has also neglected this complex issue although the attainment of fluency is the ultimate objective of most foreign language programs. In an article on testing oral fluency published no more than five years ago, only the following, rather general definition could be offered: Fluency: tentatively defined as the ability to give proof of sustained oral production implying a certain communicative competence, as well as the unstilted, spontaneous use of English "conversanal [sic] lubricants" (Beardsmore 1974:323). Investigations by psychologists and others have paved the way for an exploration of the potentials of a linguistic inquiry into fluency. There is a growing body of literature on the general subject as well as fairly sophisticated equipment to study components of fluency and hesitation phenomena. With the study of discourse becoming an area of growing interest to linguists, a focus on fluency promises to shed additional light on the nature of language and may offer valuable insights into language learning. Fluency must not be understood as a unified concept, as the only book-length treatment on teaching fluency (Leeson 1975) makes clear, because the phenomenon is highly complex and the issues attending it are multi-faceted. Therefore any individual study such as the present one must be selective in its focus and can make only a partial and modest contribution. In this case the concentration is on just one mode of speech, paraphrase, through which the more readily identifiable aspects of fluency lend themselves to investigation: rate of speech and a number of so-called hesitation phenomena, namely silent and filled pauses, repeats, false starts, and parenthetical remarks.
All statements and figures relating to fluency and to hesitation phenomena throughout this study therefore pertain to spontaneous speech under the conditions of the paraphrase mode only. Although paraphrase is not spontaneous speech in the strictest sense, it preserves the essential characteristics of spontaneous speech and offers several advantages for the researcher concerned with fluency. It permits some control over content so that, for instance, deviations from the known story would become immediately apparent in the re-telling, which reveals something about the strategies in speech planning. More importantly, since this mode has been used in experimentation before, its adoption makes comparisons with other research findings possible. For teachers who wish to test oral fluency, the experimental results offered in the following can provide some normative data against which to gauge their own test results. Since the paraphrase skill is commonly used in oral fluency assessment, the experiments here have been set up in much the same way they would be found in a teaching program. That the results from the present study may be put to immediate practical use conforms exactly to the overall purpose here, which is above all to present research of practical value. Even the experimental design adopted here derives from the attempt to approximate as closely as possible an ordinary, realistic teaching situation. Consequently, all experiments were conducted as part of a normal, on-going teaching program and testing process in English as a Foreign Language (EFL) classes at the University of Tuebingen. Based on a number of experiments conducted between 1977 and 1979, the present study pursues three goals. 
First, it attempts to establish a range of baseline data for native English and German spontaneous speech; such data then serve to evaluate nonnative fluency skills in English, both before and after instruction of a type involving a new teaching technique; finally, it offers a reclassification of all the hesitation phenomena along criteria different from the traditional ones, which in turn leads to a re-analysis of the raw data. Chapter One provides a review of studies on crosslinguistic speech rate and hesitation phenomena. Chapter Two reports on experimentally derived baseline data on native speech rates and hesitation phenomena in both English and German, respectively, which amounts to a thorough investigation of the time continuum in spontaneous speech: rate; mean length and frequency of silent pauses; rate of filled pauses, repeats, false starts, and parenthetical remarks; articulation rate and length of runs. In this manner the time components can be accounted for in terms of silence, speech, and hesitation phenomena and the percentage of time each takes up. Chapter Three presents the rationale and function of Audio-Lectal Practice (ALP), a new teaching technique designed to facilitate fluency acquisition, in some detail. This sets the stage for Chapter Four which shows the effect of controlled, imitative practice in continuous speech on fluency skills, based on pre-tests and post-tests after a twelve-week exposure to ALP. An interpretation of the data provides information on fluency skills in both native and non-native speakers (in this case German university students learning English) as well as comparative values between these groups. With that the purposes of the study are fulfilled as far as normative and comparative measures are concerned, but the process of analyzing the wealth of data available (all in all, 78 speech samples of one minute each) forcefully suggested a reclassification of hesitation phenomena. 
Chapter Five thus turns from numerical findings to matters of classification. In addition to the traditional classification system, which handles the data in sequential fashion along a time axis, a point of view is possible which captures the data non-redundantly and with less need for controversial decisions in classification. The focus here shifts to criteria of acceptability; for that purpose a set of conversational postulates governing oral speech is introduced. Seen from this angle, all the hesitation phenomena (except parenthetical remarks which are not affected) are divided into just two major classes. These are labelled ‘stalls’ and ‘repair’. It is shown that the data support such an analysis, but this also makes it necessary to re-analyze the raw data accordingly; this is accomplished in Chapter Six, along with the presentation of the derivative set of data. Chapter Seven, finally, provides a summary of findings and interpretations and, in addition, some conclusions which may be drawn from the study for linguistics and language teaching.
Shuli Reich, “Significance of Pauses for Speech Perception,” Journal of Psycholinguistic Research, vol. 9, no. 4, 1980, pp. 379-389. DOI: 10.1007/BF01067450.

Abstract Pauses can be used to facilitate certain operations involved in the production and in the perception of speech. In the case of speech perception, pauses have been found to improve the accuracy of detection and the recall of lists of digits and letters. The aim of the present experiments was to examine the effects of pause time on the perception of sentences. In experiment I, a semantic categorization task was used and in experiment II a sentence recall task. The results indicated that in sentences containing pauses between clauses, words were categorized more rapidly (experiment I) and propositions were recalled more accurately (experiment II) than in sentences containing pauses within the clause. The results are interpreted in the context of existing models of speech processing, and the significance of pause time for cognitive activity is discussed.
Temporal Variables in Speech: Studies in Honour of Frieda Goldman-Eisler. The Hague: Mouton. 1980. DOI: 10.1515/9783110816570. https://www.degruyter.com/view/title/6106?tab_body=overview.

Abstract (none)

1979

Geoffrey W. Beattie, “Planning units in spontaneous speech: some evidence from hesitation in speech and speaker gaze direction in conversation,” Linguistics, vol. 17, no. 1-2, January 1979, pp. 61-78. DOI: 10.1515/ling.1979.17.1-2.61. https://www.degruyter.com/view/journals/ling/17/1-2/article-p61.xml.

Abstract The aim of the present study was to attempt to elucidate the nature of the units of encoding involved in the generation of spontaneous speech, firstly through analysis of the distribution of hesitations in speech, and secondly through analysis of speaker gaze direction in conversation. These analyses suggested that both suprasentential units and simple clausal units are implicated in the encoding process. Moreover, evidence of encoding on a clausal basis was only obtained for speech produced during the planning phases of the larger, suprasentential units.
Geoffrey W. Beattie, and R. J. Bradbury, “An Experimental Investigation of the Modifiability of the Temporal Structure of Spontaneous Speech,” Journal of Psycholinguistic Research, vol. 8, no. 3, 1979, pp. 225-248. DOI: 10.1007/BF01067306.

Abstract This study attempted to test the hypothesis that the temporal structure of spontaneous speech is modifiable by reinforcing and punishing pauses, of a certain duration, in an operant conditioning situation. Pause rate was significantly affected by these contingencies: moreover, rate of change was rapid, indicating a "prepared" association between pausing and such contingencies. This study also attempted to test the hypothesis that there is a class of noncognitive pauses in monologue by punishing UPs to determine if UPs can be eliminated without affecting speech content. Although this manipulation did lead to a decline in pause rate, a significant increase in the amount of filled hesitation, particularly in repetition, resulted. This suggests that the overall amount of hesitation is fixed by the cognitive demands of the task but that a speaker is able to adapt to different interactional contexts by varying the category of hesitation used for cognitive planning.
Geoffrey W. Beattie, and Brian L. Butterworth, “Contextual Probability and Word Frequency as Determinants of Pauses and Errors in Spontaneous Speech,” Language and Speech, vol. 22, no. 3, July 1979, pp. 201-211. DOI: 10.1177/002383097902200301.

Abstract This study investigated the relationship between the contextual probability of lexical items in spontaneous speech, as measured by the Cloze procedure, and word frequency. It also attempted to determine the relative importance of the two variables in causing delay, in the form of hesitation, in the production of spontaneous speech. The analysis revealed that content words of low contextual probability tended to be more infrequent than other words, and that both contextual probability and word frequency were associated with hesitation in speech. Contextual probability had an effect on hesitation even when word frequency was held constant, but word frequency had no effect when contextual probability was controlled. Analysis of certain types of errors, also, revealed that word frequency may play an important role in the lexical selection process.
Susan J. Frances, “Sex Differences in Nonverbal Behavior,” Sex Roles, vol. 5, no. 4, August 1979, pp. 519-535. DOI: 10.1007/BF00287326.

Abstract A variety of nonverbal behaviors was coded from videotapes of 88 dyadic conversations. The 44 male and 44 female subjects were paired so that each participated in one conversation with a stranger of the same sex and one conversation with a stranger of the opposite sex. It was found that sex of subject, but not sex of partner, had a significant effect on many of the nonverbal behaviors displayed during the conversations. Subjects’ scores on the behavioral measures were correlated with their scores on several personality measures and on a post-conversation questionnaire. Sex differences in these correlations were used to generate hypotheses linking specific behavioral differences between the sexes to more general differences between the masculine and feminine interpersonal styles.
Bernd Voss, “Hesitation phenomena as sources of perceptual errors for non-native speakers,” Language and Speech, vol. 22, no. 2, April 1979, pp. 129-144. DOI: 10.1177/002383097902200203. http://www.eric.ed.gov/ERICWebPortal/search/detailmini.jsp?_nfpb=true&_&ERICExtSearch_SearchValue_0=EJ214159&ERICExtSearch_SearchType_0=no&accno=EJ214159.

Abstract Analyzes the perceptual problems of 22 nonnative speakers of English who transcribed spontaneous speech. Finds that perception resembles a matching of the listener’s projection and the incoming acoustic information, that native/nonnative perception strategies were similar, and that hesitation phenomena were important sources of nonnative speakers’ perceptual problems.

1978

Clara Mayo, and Marianne La France, “On the Acquisition of Nonverbal Communication: A Review,” Merrill-Palmer Quarterly of Behavior and Development, vol. 24, no. 4, October 1978, pp. 213-228. http://www.jstor.org/stable/23083902.

Abstract Children need to learn to communicate nonverbally as well as verbally. A child's first word is so dramatic an event that its appearance tends to blind observers to emergent developments in other communication channels. In the past 15 years, considerable research has appeared showing the communicative aspects of such nonverbal behaviors as gaze direction and eye contact; posture, interpersonal distance, and touch; facial expression and body movement; and vocal variations in speech.

1977

Geoffrey W. Beattie, “The dynamics of interruption and the filled pause,” British Journal of Social & Clinical Psychology, vol. 16, no. 3, 1977, pp. 283-284. DOI: 10.1111/j.2044-8260.1977.tb00230.x.

Abstract (none)

1976

J. Donald Ragsdale, “Relationship between hesitation phenomena, anxiety and self-control in a normal communication situation,” Language and Speech, vol. 19, no. 4, July 1976, pp. 257-265. DOI: 10.1177/002383097601900307.

Abstract This study investigated relationships between three categories of hesitation phenomena, anxiety as measured by Welsh’s Anxiety Index, and self-control as measured by Welsh’s Internalization Ratio. The three categories of hesitation phenomena were ah and its variants, non-ah (stutters, repetitions, etc.), and silent pause. Since previous research had concentrated primarily upon psychiatric interviews, this study focused on a normal, interpersonal communication situation. Previous research also had not utilized Welsh’s Internalization Ratio. Subjects were 15 male and 15 female undergraduate beginning speech students engaged in small-group discussion. It was hypothesized that data from these subjects, as with that from subjects in a clinical communication situation, would reveal that non-ah phenomena would be positively correlated with anxiety, but that there would be no other significant correlations among the variables. The findings confirmed the hypotheses, except that a significant r was found for the relationship between non-ah phenomena and the Internalization Ratio. The normal subjects in this study exhibited behaviour quite similar to that of the clinical subjects of previous research. As their Anxiety Indexes and Internalization Ratios increased, so did their stutters, repetitions, sentence changes, and the like.

1975

Peter Ball, “Listener’s Responses to Filled Pauses in Relation to Floor Apportionment,” British Journal of Social & Clinical Psychology, vol. 14, no. 1, 1975, pp. 423-424. DOI: 10.1111/j.2044-8260.1975.tb00198.x. https://onlinelibrary.wiley.com/doi/10.1111/j.2044-8260.1975.tb00198.x.

Abstract (none)
Brian L. Butterworth, “Hesitation and Semantic Planning in Speech,” Journal of Psycholinguistic Research, vol. 4, no. 1, 1975, pp. 75-87. DOI: 10.1007/BF01066991.

Abstract Samples of spontaneous speech were analyzed according to their distributions of phonations and silences. Some of these exhibited cyclic, or "rhythmic" patterns, in the sense defined by Goldman-Eisler. Transcripts of three such samples were subjected to a segmentation procedure carried out by independent judges utilizing a common semantic intuition. Points in the transcripts where agreement was high among the judges were found to correspond with the beginnings of temporal cycles, and agreed semantic segments coincided with sentence or clause boundaries and usually consisted of several clauses and more than one sentence. It is argued that a theory of speech generation must contain provision for semantic integration at the suprasentential level.
Richard Leeson, Fluency and Language Teaching. London: Longman. 1975. https://books.google.co.jp/books/about/Fluency_and_Language_Teaching.html?id=z02YnQEACAAJ&redir_esc=y.

Abstract (none)

1973

Sherry R. Rochester, “The significance of pauses in spontaneous speech,” Journal of Psycholinguistic Research, vol. 2, no. 1, 1973, pp. 51-81. DOI: 10.1007/BF01067111.

Abstract Studies of filled and silent pauses performed in the last two decades are reviewed in order to determine the significance of pauses for the speaker. Following a brief history, the theoretical implications of pause location are examined and the relevant studies summarized. In addition, the functional significance of pauses is considered in terms of cognitive, affective-state, and social interaction variables.

1972

Frieda Goldman-Eisler, “Pauses, Clauses, Sentences,” Language and Speech, vol. 15, no. 2, April 1972, pp. 103-113. DOI: 10.1177/002383097201500201.

Abstract The tool of pause measurement was applied to the question of the psychological reality of syntactic structures in spontaneous speech. The material investigated covered a wide field of speech productions, of different speakers and different speech tasks. Their analysis showed that the hierarchy of syntactic structures is reflected differentially in the pause structure of spontaneous speech. When readings of the spontaneous texts were compared with the original spontaneous speech it emerged that the reading process modifies the pausing for different syntactic structures differently. Sentences as distinct from clauses are marked by their temporal cohesion in spontaneous speech as well as in reading. This fact is discussed with reference to Wundt’s analytical theory of sentence-wholes.
Daniel O’Connell, and Sabine Kowal, “Cross-Linguistic Pause and Rate Phenomena in Adults and Adolescents,” Journal of Psycholinguistic Research, vol. 1, no. 2, 1972, pp. 155-164. DOI: 10.1007/BF01068105.

Abstract Three groups of 40 Ss (German adolescents and American adults and adolescents) read two passages and retold them. In confirmation of O’Connell, Kowal, and Hörmann (1969) for German adults, a number of pause and rate measures were significantly different for semantically ordinary or unusual passages. Comparisons among the four experiments manifested different patterns of pauses and rate across the two languages and age brackets.

1971

Mark Cook, “The incidence of filled pauses in relation to part of speech,” Language & Speech, vol. 14, 1971, pp. 135-139. DOI: 10.1177/002383097101400203. https://journals.sagepub.com/doi/abs/10.1177/002383097101400203.

Abstract Maclay and Osgood's theory about filled pauses (FP's) is described and evidence relevant to it is summarized. Maclay and Osgood's own evidence—that FP's occur more often before certain types of word than before others—is criticised and suggestions made for an improved procedure. Data obtained by this improved procedure does not show the same tendency as Maclay and Osgood's, i.e. filled pauses occur as often before nouns, pronouns, verbs, adverbs, and adjectives as before other parts of speech. However it is also found that FP's occur more often than would be expected by chance before pronouns, but less often before nouns, verbs and adverbs. Some details of individual differences are presented. The significance of these findings is discussed in the light of Maclay and Osgood's hypothesis.

1970

Mark Cook, and Mansur Lalljee, “The Interpretation of Pauses by the Listener,” British Journal of Social and Clinical Psychology, vol. 9, no. 4, 1970, pp. 375-376. DOI: 10.1111/j.2044-8260.1970.tb00988.x.

Abstract It has been suggested (Maclay & Osgood, 1959) that filled pauses (FPs) are signals by the speaker that he has not finished, even though he has paused; previous work (Lalljee & Cook, 1969) has not confirmed this. It was decided to test the hypothesis again, by determining whether FPs were interpreted by listeners as meaning that the speaker had not finished. The listeners were asked to decide when the speaker had finished his utterance, and silent and filled pauses were inserted to see whether this affected their decision. According to the hypothesis, they should think the speaker has finished if a silent pause occurs in the utterance, and should not think he has finished if a filled pause occurs.

Keywords Sociology & Social History; Psychology
Richard Leeson, “The Exploitation of Pauses and Hesitation Phenomena in Second Language Teaching: Some possible lines of exploration,” Audiovisual Language Journal, vol. 8, no. 1, 1970, pp. 19-22. https://eric.ed.gov/?id=EJ021605.

Abstract Identifies three types of pausing and discusses their relevance to language teaching.

1969

Mark Cook, “Anxiety, Speech Disturbances and Speech Rate,” British Journal of Social and Clinical Psychology, vol. 8, no. 1, 1969, pp. 13-21. DOI: 10.1111/j.2044-8260.1969.tb00580.x. https://bpspsychub.onlinelibrary.wiley.com/doi/abs/10.1111/j.2044-8260.1969.tb00580.x.

Abstract A distinction is drawn between two types of anxiety that might affect speech. Previous work on speech disturbance and speech rate is reviewed in the light of this distinction. An experiment is carried out in which both types of anxiety are varied. A significant effect of one type of anxiety on certain types of speech disturbance is found. A significant interaction between both types of anxiety and speech rate is found. On the basis of these results, conclusions are drawn about the usefulness of speech disturbances as an indicator of anxiety.
Insup Taylor, “Content and structure in sentence production,” Journal of Verbal Learning and Verbal Behavior, vol. 8, no. 2, 1969, pp. 170-175. DOI: 10.1016/S0022-5371(69)80057-5. http://www.sciencedirect.com/science/article/pii/S0022537169800575.

Abstract Subjects were asked to produce sentences using given topic words. Content was manipulated by varying levels of difficulty of the topics selected. Latencies and various types of hesitations were recorded. Latency, which is assumed to reflect preprocessing, was examined as a function of topic difficulty, sentence length, structural complexity, and types of sentences produced. Levels of difficulty of the topics, but not structural encoding operations, sentence length, or sentence types affected latency. Based on the findings, a tentative working model of sentence production is proposed with emphasis more on content than on structure. Content is conceptualized in differing degrees, perhaps in the form of a concise central idea, and stored in short-term memory (STM). In the process of speaking, the rest of the sentence is fitted in around the conceptualized content retrieved from STM. Structure serves as a container of content, and is more or less automatically produced to suit a particular content.
Mansur Lalljee, and Mark Cook, “An experimental investigation of the function of filled pauses in speech,” Language and Speech, vol. 12, no. 1, January 1969, pp. 24-29. DOI: 10.1177/002383096901200102.

Abstract Filled pauses have been described as a product of anxiety, and have also been explained as attempts by the speaker to maintain control of the ‘floor’. The latter hypothesis is tested directly, by altering the pressure on the subject to continue speaking. Possible confounding effects of anxiety are controlled for. Filled pauses do not increase, as pressure to continue speaking increases. It is suggested that the ‘control’ hypothesis may apply only to monologues; evidence concerning the relative frequency of filled pauses in monologues and dialogues is presented.
James McCroskey, and R. Mehrley, “The effects of disorganization and nonfluency on attitude change and source credibility,” Speech Monographs, vol. 36, no. 1, 1969, pp. 13-21. DOI: 10.1080/03637756909375604. https://www.tandfonline.com/doi/abs/10.1080/03637756909375604?journalCode=rcmm19.

Abstract Rhetorical theorists commonly assert that message disorganization and nonfluent delivery reduce persuasive effectiveness. But empirical data provide only ambiguous support for this generally accepted view. The purpose of the present study was to test hypothesized effects of organization and fluency on attitude change and source credibility. A review of the literature clarifies the relationship between these variables and provides a theoretical schema for the hypotheses that we tested.
Daniel O'Connell, Sabine Kowal, and Hans Hörmann, “Semantic determinants of pauses,” Psychologische Forschung, vol. 33, no. 1, March 1969, pp. 50-67. DOI: 10.1007/BF00424616.

Abstract The following experiment presents evidence that variations in semantic context can produce changes in the rate and length of pauses in a situation in which syntactic and other variations are minimized. Each of 40 Ss read two paragraphs aloud and after each paragraph retold the “story” without further instructions. Each paragraph consisted of five sentences, each containing 23 syllables. The third sentence was either in accord with the story or an unusual occurrence (depending on exchange of subject and object). The most important experimental finding was that both number and length of unfilled pauses are more frequent throughout the unusual stories as compared with the usual ones. In the readings, the effect was limited to the critical sentence and the pauses immediately thereafter. The evidence supports the view of the authors that the role of semantic context has been underestimated in psycholinguistic research to date.

1968

James Martin, and Winifred Strange, “The perception of hesitation in spontaneous speech,” Psychonomic Journals: Perception & Psychophysics, vol. 3, no. 6, November 1968, pp. 427-438. DOI: 10.3758/BF03205750.

Abstract The issue in this paper was whether attending to acoustic elements and to message elements in a speech signal were compatible operations. In four experiments Ss listened for pauses and other hesitation phenomena in spontaneous speech; in three the task was reproduction of heard speech to include hesitations; in one the task was simply the marking of heard hesitations on transcripts. Experimental variables were instructions, degree of “ungrammaticality” of hesitations in speech inputs, time interval between listening and reproduction, and task manipulations along a continuum between simple hesitation detection and hesitation detection plus simultaneous speech decoding. Results were: (1) In all experiments Ss displaced within-constituent hesitations to constituent boundaries, suggesting a grammatical organization between input and output. (2) Instructional set to reproduce hesitations increased hesitations and words but at the expense of per cent words correct, suggesting that attending to acoustic elements such as hesitations was an interfering task during speech decoding. (3) The hesitation shift persisted in the hesitation-marking task when simultaneous speech decoding was required by the nature of the task, indicating that speaking (encoding) characteristics may not completely account for the shift. (4) The distribution of hesitation marking errors toward grammatical organization seemed to require an account in terms of perceptual processes during listening.
A. Reynolds, and A. Paivio, “Cognitive and emotional determinants of speech,” Canadian Journal of Psychology/Revue canadienne de psychologie, vol. 22, no. 3, 1968, pp. 164-175. DOI: 10.1037/h0082757. https://psycnet.apa.org/doi/10.1037/h0082757.

Abstract Studied 48 university students in verbal associative productivity and audience sensitivity test scores. Ss defined abstract and concrete nouns before an audience or in the presence of E alone. Definitions were scored for extralinguistic and style features of speech. The design was based on the assumption that effects of associative productivity and stimulus concreteness are mediated by cognitive processes, whereas effects of audience sensitivity and audience conditions are mediated by emotional states. Latency of definitions, word production, and filled pauses ("ahs") were related only to the 2 cognitive factors. A significant interaction revealed that in the audience situation highly audience-sensitive Ss had the highest silent-pause ratio, suggesting that this variable was most affected by emotional arousal. Frequency of silent pauses, word length, ratio of concrete to abstract nouns in the definitions, and evaluative ratings of the definitions were related to both classes of independent variables.

1967

Harry Levin, Irene Silverman, and Boyce Ford, “Hesitations in Children’s Speech During Explanation and Description,” Journal of Verbal Learning and Verbal Behavior, vol. 6, no. 4, August 1967, pp. 560-564. DOI: 10.1016/S0022-5371(67)80017-3.

Abstract Twenty-four children, six each from the kindergarten, second, fourth, and sixth grades, were shown three simple physical demonstrations. They described and explained what they saw. For children of all ages, explanation compared with description was characterized by more words, pauses, hesitations, longer pauses, and a slower rate of speaking.
James Martin, “Hesitation in speaker’s production and listener’s reproduction of utterances,” Journal of Verbal Learning and Verbal Behavior, vol. 6, no. 6, December 1967, pp. 903-909. DOI: 10.1016/S0022-5371(67)80157-9.

Abstract Twenty-four college Ss (encoders) described TAT pictures in short utterances. Each was yoked unsystematically with one of 24 listener Ss (decoders) who heard his recorded utterances and attempted to reproduce them. Words were classified as content or function. While encoders and decoders yielded about the same proportion of content words (41%), encoders yielded a relatively higher proportion of repeats, unfilled pauses, and total hesitations before content words (which have greater uncertainty) than did decoders. Decoders placed relatively more of their hesitations at sentence breaks than did encoders. Apparently, while encoder pauses reflect uncertainty, decoder pauses tend more to mark grammatical boundaries. The selection of semantic-syntactic structure precedes selection of individual words during encoding but follows during decoding.

1966

Alan Henderson, Frieda Goldman-Eisler, and Andrew Skarbek, “Sequential Temporal Patterns in Spontaneous Speech,” Language and Speech, vol. 9, no. 4, 1966, pp. 207-216. DOI: 10.1177/002383096600900402.

Abstract The successive speech and silence durations in selected passages of spontaneous speech were found to have a regular structure. Relatively long pauses tended to occur with short utterances and these periods alternated with periods in which relatively short pauses and long utterances occurred together. Other hesitation phenomena appear to be associated more with the periods characterised by the longer pauses than with the periods characterised by long utterances. A relationship was demonstrated in terms of temporal features such that a hesitant period and the following fluent period together might be hypothesised to be a unit.
Aron W Siegman, and Benjamin Pope, “Ambiguity and verbal fluency in the TAT,” Journal of Consulting Psychology, vol. 30, 1966. DOI: 10.1037/h0023374.

Abstract 15 TAT cards, divided into low-, medium-, and high-ambiguity groups, were administered to 30 female nursing students. Stimulus ambiguity, defined in terms of variability of themes evoked by a given card, was found to be associated with hesitant and disrupted speech. These findings are explained in terms of the mediating role of uncertainty on speech. An adaptation effect was noted. The later, as opposed to the earlier stories, are associated with a longer reaction time, but fewer "ah’s," less silence, and a quicker articulation rate. Finally, significant differences are noted between Ss’ verbal fluency indexes, based on all 15 cards and thus independent of stimulus ambiguity, and verbal fluency indexes obtained in an interview situation. These differences are discussed in terms of monological vs. dialogical speech. (21 ref.)

1965

Donald S. Boomer, “Hesitation and Grammatical Encoding,” Language and Speech, vol. 8, no. 3, July 1965, pp. 148-158. DOI: 10.1177/002383096500800302.

Abstract The occurrence of filled and unfilled pauses was examined with respect to their location in phonemic clauses. Both types of hesitation were most frequent after the first word in the clause, regardless of length. These data are regarded as directly challenging the transitional probability theory of hesitations. The phonemic clause is proposed as the encoding unit of speech at the grammatical level.
Stanislav V Kasl, and George F Mahl, “Relationship of disturbances and hesitations in spontaneous speech to anxiety,” Journal of Personality and Social Psychology, vol. 1, 1965. DOI: 10.1037/h0021918.

Abstract Past work has indicated that flustered or confused speech can be classed into several distinct speech disturbance categories. Such disturbances, occurring frequently in everyday conversation, have no conventional semantic function. In the present study, 25 experimental and 20 control male Ss were used. Anxiety was manipulated in an interview setting. Under anxiety, the frequency of all speech disturbances, except the familiar "ah," showed a sizable increase. The frequency of ah’s increased strikingly in a change from normal to a telephonelike conversation. Such change did not affect the other disturbances. Measurement of palmar sweat revealed modest positive association with the speech disturbances. Exploration of the relationship of the Taylor MA scale to the disturbances suggested that the ah is functionally distinct from the other speech disturbances. (PsycINFO Database Record (c) 2016 APA, all rights reserved)
Percy H. Tannenbaum, Frederick Williams, and Carolyn S. Hillier, “Word predictability in the environments of hesitations,” Journal of Verbal Learning and Verbal Behavior, vol. 4, no. 2, 1965, pp. 134-140. DOI: 10.1016/S0022-5371(65)80097-4. http://www.sciencedirect.com/science/article/pii/S0022537165800974.

Abstract Two experiments were conducted to study the predictability of words in hesitation contexts. The first study focused on a comparison of the first word after hesitations with words sampled from fluent contexts. The second study involved gathering predictability data for all words in a language sample. Results supported the hypothesis that words subsequent to hesitations tend to be less predictable than words uttered in fluent context. But the associated hypothesis that the word antecedent to hesitations is more predictable than other fluent context was not supported. This led to further analysis of predictability of words in the environments of different hesitations, specifically filled pauses and repeats. The implication drawn was that different types of hesitations index different kinds of encoding decision points.
Aron W Siegman, and Benjamin Pope, “Effects of question specificity and anxiety-producing messages on verbal fluency in the initial interview,” Journal of Personality and Social Psychology, vol. 2, 1965. DOI: 10.1037/h0022491.

Abstract An experimental analogue of the initial interview is used to investigate the effects of interviewer specificity and topical focus, i.e., a low anxiety-arousing vs. a high anxiety-arousing topic, on interviewee’s verbal behavior. It was found that low-specificity interviewer remarks are associated with verbal indices of caution and hesitation ("ah’s," a slow articulation rate and silent pauses). It is suggested that a conceptualization of the specificity variable in terms of informational uncertainty provides a parsimonious explanation for the above findings. The anxiety-arousing topic was associated with disrupted speech ("non-ah" speech disturbances). (25 ref.) (PsycINFO Database Record (c) 2016 APA, all rights reserved)

1963

William Livant, “Antagonistic Functions of Verbal Pauses: Filled and Unfilled Pauses in the Solution of Additions,” Language and Speech, vol. 6, no. 1, January/March 1963, pp. 1-4. DOI: 10.1177/002383096300600101.

Abstract The time required for the mental solution of addition problems is much greater if the subject fills the pause with vocalization habitually used to fill pauses in speaking (e.g., ah, er, um), than if the pause is filled with silence. This supports previous results of Goldman-Eisler with thematic materials. Practice reduces solution time under both filled and unfilled pauses by an identical proportion whose magnitude varies from subject to subject. It is argued that filled pauses serve antagonistic functions, increasing the speaker’s control of conversation, but decreasing the quality of his production; filled pauses in speech without an audience may serve an analogous function of controlling distractions introduced by anxiety.

1962

Basil Bernstein, “Linguistic Codes, Hesitation Phenomena and Intelligence,” Language and Speech, vol. 5, 1962, pp. 31-46. DOI: 10.1177/002383096200500104. https://journals.sagepub.com/doi/10.1177/002383096200500104.

Abstract Two linguistic codes have been proposed, elaborated and restricted. These codes are regarded as functions of different social structures. The codes are considered to entail qualitatively different verbal planning orientations which control different modes of self-regulation and levels of cognitive behaviour. Social class differences in the use of the codes were postulated and the hesitation phenomena associated with them predicted. Speech samples were obtained and the hesitation phenomena analysed from a discussion situation involving small groups of middle-class and working-class subjects with varying I.Q. profiles.
Donald Boomer, and Allen Dittmann, “Hesitation Pauses and Juncture Pauses in Speech,” Language and Speech, vol. 5, 1962, pp. 215-220. DOI: 10.1177/002383096200500404.

Abstract A psychophysical comparison of speech pause perception thresholds for juncture and hesitation pauses yielded significantly lower thresholds for the latter. On the basis of these and other data, a functional and methodological distinction between these two types of pause is proposed.

1961

Frieda Goldman-Eisler, “A Comparative Study of Two Hesitation Phenomena,” Language and Speech, vol. 4, no. 1, 1961, pp. 18-26. DOI: 10.1177/002383096100400102.

Abstract The durations of hesitation devices such as the sounds /ɑ, ɛ, æ, r, ə, m/, also called filled pauses, were measured and compared with the durations of silent hesitations or unfilled pauses. Their individual consistency and psychological significance were also investigated and the relation to uncertainty of filled pauses and unfilled pauses respectively was compared. It appears that under certain conditions of speech production the two hesitation phenomena reflect different internal processes.

1959

Howard Maclay, and Charles Osgood, “Hesitation Phenomena in Spontaneous English Speech,” Word, vol. 15, 1959, pp. 19-44. DOI: 10.1080/00437956.1959.11659682. https://www.tandfonline.com/doi/abs/10.1080/00437956.1959.11659682.

Abstract This paper reports an exploratory investigation of hesitation phenomena in spontaneously spoken English. Following a brief review of the literature bearing on such phenomena, a quantitative study of filled and unfilled pauses, repeats, and false starts in the speech of some twelve participants in a conference is described. Analysis in terms of both individual differences and linguistic distribution is made, and some psycholinguistic implications are drawn, particularly as to the nature of encoding units and their relative uncertainty. A distinction between non-chance statistical dependencies and all-or-nothing dependencies in linguistic methodology is made.