Filled Pause
Research Center

Investigating 'um' and 'uh' and other hesitation phenomena

Bibliography of hesitation phenomena resources

The following is a complete list of published resources in the FPRC bibliography. Note that this is not an exhaustive list of publications related to hesitation phenomena. If you know of resources that ought to be in this list, please send them to me via the FPRC contact form. Download the entire list in BibTeX format here.

in press


  • Luis Bernardo QUESADA NIETO, “Fenómenos de vacilación, sus contextos léxicos y sintácticos en entrevistas formales de legisladores a ciudadanos en el Congreso de la Ciudad de México [Lexical and syntactic contexts of hesitation phenomena in formal deputy-citizen interviews conducted at Mexico City congress],” Cuadernos de Lingüística de El Colegio de México, vol. 7, no. e141, October 2020, pp. 1-50. DOI: 10.24201/clecm.v7i0.141.

    Abstract This article, which is an outcome of an ethnographic research, aims to offer an insight into lexical and syntactic contexts of some hesitation phenomena (short fillers, repetitions, long fillers, word lengthening, unfinished words and unfinished phrases), identified in a corpus sample that consists of structured interviews conducted by a group of deputies of Mexico City Local Congress with citizens who applied for the ombudsperson’s position at the city’s Human Rights Office (Comisión de Derechos Humanos de la Ciudad de México). Drawing upon a lexical and syntactic description, some remarks on the hesitation phenomena’s linguistic and communicative values are presented. I propose an interpretation of hesitation occurrence patterns that appear in the respondent’s answers. This interpretation is based on the discursive planning level, the interaction between hesitation markers and word classes, and the concept of repertoire as it has been used in the theory of translanguaging. Towards the end of the manuscript I argue that the studied phenomena and their distribution are directly related to open class words, and the cognitive effort of producing grammatical, accurate and socially appropriate messages.

    Keywords hesitation markers, discursive planning, oral language, word classes, repertoire, translanguaging theory

  • Wikipedia contributors, “Filler (linguistics) -- Wikipedia, The Free Encyclopedia,” October 2020.

    Abstract In linguistics, a filler, filled pause, hesitation marker or planner is a sound or word that is spoken in conversation by one participant to signal to others a pause to think without giving the impression of having finished speaking. (These are not to be confused with placeholder names, such as thingamajig, whatchamacallit, whosawhatsa and whats'isface, which refer to objects or people whose names are temporarily forgotten, irrelevant, or unknown.) Fillers fall into the category of formulaic language, and different languages have different characteristic filler sounds. The term filler also has a separate use in the syntactic description of wh-movement constructions.


  • Thanaporn Anansiripinyo, and Chutamanee Onsuwan, “Acoustic-phonetic characteristics of Thai filled pauses in monologues,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 51-54. DOI: 10.21862/diss-09-014-anan-onsu.

    Abstract Filled pause (FP) is one type of disfluent phenomena that is commonly found in everyday speech. It has been widely studied in many languages, but little is known about this topic in Thai. This work explored three important acoustic-phonetic characteristics of Thai filled pauses in monologues. To elicit target monosyllabic tokens of FPs and those of regular word (RW) counterparts, 31 Thai adult females were asked to watch two short cooking videos and describe the contents. They were also asked to read out loud target word lists. Three acoustic measures: syllable duration, first (F1) and second formant (F2) frequencies were taken from 738 tokens. Across vowel contexts, only F2, not F1, in FPs, was significantly different from that in RWs. Differences in syllable duration between RWs versus FPs were near significant. The findings suggest that Thai speakers produced FPs in a presumably different way from RWs. In FPs, the syllable was relatively lengthened and the tongue position was moved towards the center of vowel space. Future directions include a detailed analysis of FPs in terms of amplitude, fundamental frequency, pause duration before/after fillers and other non-linguistic factors.

  • Maria Bakti, “Error type disfluencies in consecutively interpreted and spontaneous monolingual Hungarian speech,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 71-74. DOI: 10.21862/diss-09-019-bakti.

    Abstract Interpreting can be considered as a form of spontaneous speech, the key differences being that language change is involved in interpreting and the fact that speech production is influenced by several constraints during interpreting. Research has shown that the interpreting task influences the disfluency patterns of target language texts. The aim of this paper is to investigate how the frequency and distribution of error type disfluencies changes in the target language output of trainee interpreters as they progress in their training. Results indicate that there is no considerable change in the frequency and proportion of error type disfluencies in the target language texts recorded at the end of the second, third and fourth semesters of interpreter training. The proportion of error type disfluencies is higher in the consecutively interpreted texts than in the spontaneous monolingual speech of the students. This suggests that the complexity of the task, rather than progress in training, determines the disfluency pattern of consecutively interpreted target language texts.

  • Charlotte Bellinghausen, Thomas Fangmeier, Bernhard Schröder, Johanna Keller, Susanne Drechsel, Peter Birkholz, Ludger Tebartz van Elst, and Andreas Riedel, “On the role of disfluent speech for uncertainty in articulatory speech synthesis,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 39-42. DOI: 10.21862/diss-09-011-bell-etal.

    Abstract In this paper we present a perception study on the role of disfluent speech in forms of prosodic cues of uncertainty in question-answering situations. In our scenario the answer to each question was modeled by varying three prosodic cues: pause, intonation, and hesitation. The utterances were generated by means of an articulatory speech synthesizer. Subjects were asked to rate each answer on a Likert scale with respect to uncertainty, naturalness and understandability. Results showed evidence for an additive principle of the prosodic cues, i.e. the more cues were activated the higher the perceived level of uncertainty. Overall, the effect of intonation and hesitation was more evident than the effect of pause.

  • Simon Betz, and Loulou Kosmala, “Fill the silence! Basics for modeling hesitation,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 11-14. DOI: 10.21862/diss-09-004-betz-kosm.

    Abstract In order to model hesitations for technical applications such as conversational speech synthesis, it is desirable to understand interactions between individual hesitation markers. In this study, we explore two markers that have been subject to many discussions: silences and fillers. While it is generally acknowledged that fillers occur in two distinct forms, um and uh, it is not agreed on whether these forms systematically influence the length of associated silences. This notion will be investigated on a small dataset of English spontaneous speech data, and the measure of distance between filler and silence will be introduced to the analyses. Results suggest that filler type influences associated silence duration systematically and that silences tend to gravitate towards fillers in utterances, exhibiting systematically lower duration when preceding them. These results provide valuable insights for improving existing hesitation models.

  • Harry Collins, Willow Leonard-Clarke, and Hannah O’Mahoney, “‘Um, er’: how meaning varies between speech and its typed transcript,” Qualitative Research, vol. 19, no. 6, 2019, pp. 653-668. DOI: 10.1177/1468794118816615.

    Abstract We report a small empirical study on the way the transcription used to represent speech affects its meaning. We show that ‘disfluencies’ in speech indicate far more uncertainty in the speaker when transmitted in text than when transmitted in recorded sound. This has important implications for how transcribed interviews should be edited when they are being used to convey meaning rather than the organization of phonemes. We propose the implications of different ways of representing speech in text could be a new subject for investigation. Presented here is one possible empirical approach to such studies.

    Keywords certainty in text and speech, disfluencies, editing of transcripts, interview transcription, meaning, qualitative research, transcribing fillers: um, er, uh

  • Iulia Grosman, Anne Catherine Simon, and Liesbeth Degand, “Empathetic hearers perceive repetitions as less disfluent, especially in non-broadcast situations,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 23-26. DOI: 10.21862/diss-09-007-gros-etal.

    Abstract This experiment measures the impact of the communicative situation on perceived fluency in French speech. We consider three dimensions of fluency: grammatical, discursive and socio-interpersonal. We first hypothesise that grammatical fluency is less influenced by contextual constraints than the other two dimensions. Furthermore, taking into account the Interpersonal Reactivity Index of each participant, we hypothesise that hearers with higher interpersonal capacities will be more tolerant in their fluency evaluation, because of their ability to project into the speaker’s mind. The strength of the design rests on the proposal to test natural stimuli and integrate social and individual variables in a perception experiment.

  • Dorottya Gyarmathy, and Viktória Horváth, “Pausing strategies with regard to speech style,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 27-30. DOI: 10.21862/diss-09-008-gyar-horv.

    Abstract Speech is occasionally interrupted by silent and filled pauses of various length. Pauses have many different functions in spontaneous speech (e.g. breathing, marking syntactic boundaries as well as speech planning difficulties, time for self-repair). The aim of the study was the analysis of the interrelation between the temporal pattern and the syntactical position of silent pauses (SP) on one hand. On the other hand, filled pauses (FP) were also analyzed according to their phonetic realization, as well as the combination of SPs and FPs. The effect of speech style on pausing strategies was also analyzed. A narrative recording and a conversational recording from 10 speakers (ages between 20 and 35 years, 5 male, 5 female) were selected from Hungarian Spontaneous Speech Database for the study. The material was manually annotated, silent pauses were categorized, then the duration of pauses were extracted. Results showed that the position of silent and filled pauses affects their duration. The speech style did not influence the frequency of pauses. However, silent and filled pauses were longer in narratives than in conversations. Results suggest that pausing strategies are similar in general; however, the timing patterns of pauses may depend on various factors, e.g. speech style.

  • Mária Gósy, “Halt command in word retrieval,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 3-6. DOI: 10.21862/diss-09-002-gosy.

    Abstract In this study, occurrences and temporal patterns of five types of disfluencies were analyzed that show a common feature on the surface. All of them have some kind of interruption of content words followed by some continuation. The purpose was to show whether the place of interruption of the word articulation and the durational patterns of the editing phases are characteristic of re-starts, false starts, slips of the tongue, pauses within words, and prolongations. More than 1,400 instances were processed. Both (i) the number of pronounced segments of abandoned words and the duration of the corresponding editing phases are characteristic of a specific disfluency type, and (ii) speakers select a strategy to overcome their speech planning difficulties most economically.

  • Julianna Jankovics, and Luca Garai, “Disfluencies in mildly intellectually disabled young adults’ spontaneous speech,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 79-82. DOI: 10.21862/diss-09-021-jank-gara.

    Abstract The study analyzes various hesitations and repairs in the spontaneous speech of mildly intellectually disabled women. The main research questions of the study focus on the similarities and differences in the frequency of disfluencies and the duration of pauses between the spontaneous speech of mildly intellectually disabled and mentally healthy young adults. Our results show that hesitation phenomena were more frequent among intellectually disabled subjects in spontaneous speech, while repairs occurred more frequently among control subjects in guided spontaneous speech.

  • Annelies Jehoul, “Filled pauses from a multimodal perspective. On the interplay of speech and eye gaze,” PhD Dissertation, Katholieke Universiteit Leuven, September 2019 (in English).

    Abstract This project offers a novel, integrative approach on filled pauses, the elements 'euh' and 'euhm' in Dutch. Insights on filled pauses from various research traditions are united to obtain a comprehensive overview of their form and function. Starting from a cognitive-interactional framework, our analysis relates formal variation in filled pauses to the functional variation. We show that formal differences in filled pauses, such as the difference between 'euh' and 'euhm', the difference in duration, the presence of surrounding silences and the speaker's eye gaze behavior, are associated with functional variation. In the study of the function of filled pauses, earlier studies can be distinguished in two approaches: the filler-as-symptom approach and the filler-as-signal approach (Clark & Fox Tree 2002, De Leeuw 2007). The filler-as-symptom perspective interprets filled pauses as symptoms of cognitive difficulties, for example when the speaker is uncertain or has trouble producing an utterance (e.g. Siegman & Pope 1965, Goldman-Eisler 1968, Christenfeld 1994). In the filler-as-signal perspective, a signaling function is attributed to filled pauses. Filled pauses are, amongst other things, claimed to signal the speaker's intention to continue the turn (Maclay & Osgood 1959), mark a delay in speech (Clark & Fox Tree 2002), structure the discourse (Rendle-Short 2004) and exit a sequence (Schegloff 2010). In this project, however, we show that filled pauses cannot be distinguished into cognitive and discursive filled pauses, but rather, that in most of their functions, these two dimensions are connected. There is an association of the complexity of the cognitive processing, and the scope of the discursive force. Both complex cognitive processing and a broad scope are reflected in the form of the filled pause: a longer duration of the filled pause, more pauses, the use of 'euhm' (instead of 'euh'), and the speaker's gaze aversion.

  • Borbála Keszler, and Judit Bóna, “Pausing and disfluencies in elderly speech: Longitudinal case studies,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 67-70. DOI: 10.21862/diss-09-018-kesz-bona.

    Abstract The aim of this paper was to investigate the changes in fluency of speech during ageing. The novelty of the examination is that this is a longitudinal study: it analyses the speech of 7 speakers from middle or young-old age to old-old age. Pausing strategies and frequency of disfluencies were analyzed. Results show that active aging helps to preserve certain parameters of speech characteristics of young speakers.

  • Valéria Krepsz, “Vowel lengthening — Effect of position, age, and phonological quantity,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 59-62. DOI: 10.21862/diss-09-016-krepsz.

    Abstract The present research examined the effect of phrase-final lengthening on the spectral structure of vowels in the spontaneous speech of children and adults. Three Hungarian vowel pairs (in quantity pairs) were analyzed in two positions: in the middle of the phrase and at the end of the phrase. The effect of lengthening on the spectral structure of the vowels was already detected in four-year-olds. However, its extent was strongly correlated with the articulation aspects of the vowels. There was a discrepancy in the tendencies of the lengthening’s effect between the two groups of children and the adults, presumably due to different linguistic experience, inaccuracy of articulation, and significant individual differences.

  • Mária Laczkó, “Temporal characteristics of teenagers’ spontaneous speech and topic based narratives produced during school lessons,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 63-66. DOI: 10.21862/diss-09-017-laczko.

    Abstract The aim of this presentation is to analyse the articulation and speech rates of teenagers and the types of pauses in their spontaneous speech and topic based narratives during school lessons. The speech samples were analysed in terms of temporal characteristics by Praat program. The results showed the different tempo values and various function of filled pauses in the examined situations.

  • Mark Liberman, “Dysfluency considered Harmful,” May 2019.

    Abstract … as a technical term, that is. Disfluency is no better, although the prefix is less judgmental. There are two problems: 1. These terms pathologize normal behavior, creating confusion between pathological symptoms and common phenomena in normal speech, which may be different not only in their causes and their frequency but also in behavioral detail; 2. Applied to normal speech, these terms often treat intrinsic aspects of the content and performance of spoken messages as if they were disruptions or failures.

  • Kikuo Maekawa, “Five pieces of evidence suggesting large lookahead in spontaneous monologue,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 7-10. DOI: 10.21862/diss-09-003-maekawa.

    Abstract There is considerable disagreement among the researchers of speech production with respect to the range of lookahead or pre-planning. In this paper, five pieces of evidence suggesting the presence of relatively large lookahead in spontaneous monologues are presented, based on the analyses of the Corpus of Spontaneous Japanese. This evidence consistently suggests that the range of a lookahead is six to seven accentual phrases long, which corresponds on average to 3–4 seconds in the time domain.

  • Helena Moniz, “Processing disfluencies in distinct speaking styles: Idiosyncrasies and transversality,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 1-2. DOI: 10.21862/diss-09-001-moniz.

    Abstract This talk will tackle the idiosyncratic properties of disfluencies in distinct speaking styles, mostly university lectures (Trancoso et al., 2008) and map-task dialogues (Trancoso et al., 1998), but also featuring verbal fluency tests, and (more recently) second language learning presentations in ecological settings. It will also discuss the transversal acoustic-prosodic properties pertained across speaking styles. The main research questions are twofold: i) are there domain effects in the production of disfluencies when speakers adjust to distinct communicative contexts, as in university lectures and dialogues?; ii) if domain effects do exist, are there still acoustic-prosodic properties that can be shared across domains?

  • Johanna Pap, “Effects of speech rate changes on pausing and disfluencies in cluttering,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 75-78. DOI: 10.21862/diss-09-020-pap.

    Abstract People with cluttering (PWC) often receive feedback, such as “Slow down!”, even so, this fluency disorder cannot be cured by only slowing down the speakers’ speech rate. When PWC accelerate their speech rate, language planning difficulties and word structure errors might occur, which might result in breakdowns in fluency and/or intelligibility. In the present paper characteristics of the frequency of disfluencies were examined in four different speech tasks from deliberately slow to maximum speech rate, whether speech rate changes have effects on cluttered speech. Twenty participants of this investigation were individuals suspected of cluttering with ages between 20 and 50 years of both genders. The results show that PWC are able to change, not only their speech rate but articulatory rate as well. Moreover, disfluencies were produced the most frequently in the speech task of maximum speech rate, where PWC do not have enough time for speech planning. The research provides empirical, measured data for a better insight into the nature of cluttering. Understanding the correlation between speech rate and disfluencies in cluttered speech is fundamental to improve the diagnosis of cluttering.

  • Kata Baditzné Pálvölgyi, “Hesitation patterns in the Spanish spontaneous speech of Hungarian learners of Spanish,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 35-38. DOI: 10.21862/diss-09-010-badi.

    Abstract This paper examines what native Spanish speakers find most disturbing in the pronunciation of Hungarian language learners of Spanish. Former research (Baditzné Pálvölgyi, 2019) showed that in spontaneous Spanish speech of at least threshold level Hungarian learners, one of the aspects that Spanish native speakers least tolerated was the way Hungarians hesitated. So the present paper focuses primarily on hesitation phenomena—lengthening and filled pauses—assuming that Hungarians hesitate more, and the lengthened segments are longer than the Spanish ones. In order to validate the hypothesis, an investigation comparing a corpus of Northern Spanish spontaneous speech to another corpus of advanced Hungarian learners of Spanish was conducted.

  • Ralph L. Rose, “A comparison of filled pauses in scripted and non-scripted spontaneous speech,” in The 3rd International Symposium on Linguistic Patterns in Spontaneous Speech, Taipei, Taiwan, November 2019, pp. 21-25.

    Abstract Television and film productions are heavily scripted, but intend to portray speech as unscripted within the fiction of the dramatic universe they depict. Previous evidence (Quaglio, 2009) suggests however, that various lexical features of speech occur in such scripted spontaneous speech differently than they do in actual spontaneous speech. The present study is a comparison of the occurrence of filled pause disfluencies (in English, uh and um) in scripted spontaneous speech and actual spontaneous speech, to see if the basic usage patterns are similar. Using the web site interface, filled pauses were examined in three corpora (spontaneous speech, TV transcripts, and movie transcripts) in terms of their basic frequency of occurrence, their um:uh ratios, and their structural distribution with respect to sentence boundaries. Each was also evaluated in terms of how they shifted over time. Results show that the disfluency patterns of scripted spontaneous speech are similar in many ways to that of actual spontaneous speech. The frequency of filled pauses is similar to that shown in other major corpora and the um:uh ratio also replicates a trend observed in other work (Wieling et al, 2016; Fruehwald, 2016) suggesting an ongoing shift toward the use of um over uh but with television and film speech patterns lagging that of society.

  • Ralph L. Rose, “The structural signaling effect of silent and filled pauses,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 19-22. DOI: 10.21862/diss-09-006-rose.

    Abstract Filled pauses (uh, um) have been shown in a number of studies to have a facilitative effect for listeners, such as helping them better perceive the syntactic structure of ongoing speech. This may be because the extra time afforded by the filled pause gives listeners more time to process the input. Theoretically, then, silent pauses should show a comparable effect. The present study tests this prediction using a grammaticality judgment task following a study by Bailey and Ferreira (2003). Results show that filled and silent pauses have a comparable influence on listeners’ grammaticality judgments but further suggest that listeners deem silent pauses as more important and influential markers.

  • Vered Silber-Varod, Mária Gósy, and Robert Eklund, “Segment prolongation in Hebrew,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 47-50. DOI: 10.21862/diss-09-013-silb-etal.

    Abstract In this paper we study segment prolongations (PRs), a type of disfluency sometimes included under the term “hesitation disfluencies”, in Hebrew. PRs have previously been studied in a number of other languages within a comprehensive speech disfluency framework, which is applied to Hebrew in the current study. For the purpose of this study we defined Hebrew clitics, such as conjunctions, articles, prepositions and so on, as words. The most striking difference between Hebrew and the previously studied languages is how restricted PRs seem to be in Hebrew, occurring almost exclusively on word-final vowels. The most frequently prolonged vowel is [e]. The segment type does not affect PRs’ duration. We found significant differences between men and women regarding the frequency of PRs.

  • Shungo Suzuki, and Judit Kormos, “The effects of read-aloud assistance on second language oral fluency in text summary speech,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 31-34. DOI: 10.21862/diss-09-009-suzu-korm.

    Abstract Focusing on text summary speaking tasks, the present study investigated the effects of the activation of phonological representations during text comprehension (operationalized by read-aloud assistance) on the subsequent retelling speech. A total of 24 Japanese learners of English completed text summary speaking tasks under two conditions: (a) reading without read-aloud assistance and (b) reading with read-aloud assistance. Their speech data were analyzed by lexical overlap indices (i.e. the ratio of characteristic single-words and multiword sequences) and by fluency measures capturing three major dimensions of fluency—speed, breakdown, and repair fluency. The results showed that read-aloud assistance directly facilitated lexical overlaps with source texts and indirectly improved speed and repair fluency. Furthermore, read-aloud assistance was found to affect the interrelationship between lexical overlaps and utterance fluency. The findings suggested that read-aloud assistance might help second language learners to store multiword sequences as a single unit (i.e. chunking) during text comprehension.

  • Linda Taschenberger, Outi Tuomainen, and Valerie Hazan, “Disfluencies in spontaneous speech in easy and adverse communicative situations: The effect of age,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 55-58. DOI: 10.21862/diss-09-015-tasc-etal.

    Abstract Disfluencies are a pervasive feature of speech communication. Their function in communication is still widely discussed with some proposing that their usage might aid understanding. Accordingly, talkers may produce more disfluencies when conversing in adverse communicative situations, e.g. in background noise. Moreover, increasing age may have an effect on disfluency use as older adults report particular difficulties when communicating in adverse conditions. In this study, we elicited spontaneous speech via a problem-solving task from four different age groups (19–76 years old) to investigate the effect of energetic and informational maskers on the use of filled pauses (FPs), and its interaction with age. Measures of disfluency rates, effort ratings, and communication efficiency were obtained. Results show that, against our predictions, FP usage may decrease in adverse conditions. Moreover, age does not play a great role in adults with normal hearing. The results indicate that individuals differ greatly in their disfluency adaptations, utilising different strategies to overcome challenging communicative situations.

  • Michiko Watanabe, Yusaku Korematsu, and Yuma Shirahata, ““Uh” is preferred by male speakers in informal presentations in American English,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 43-46. DOI: 10.21862/diss-09-012-wata-etal.

    Abstract This study investigates factors that are likely to be related to speakers' choice of filler type between uh and um in English, using an informal presentation speech corpus. The effects of the following factors on the probability of each filler type was examined: (1) immediately preceding clause boundary depth, (2) clause size measured as the number of words in the clause, (3) the number of quotation remarks in the clause, and (4) speaker's sex. The filler probabilities increased with the boundary depths. This trend was much stronger with um than with uh. Ums are more likely to appear clause-initially than uhs. Clause size had similar effect sizes on the two filler types. The number of quotation remarks had a stronger negative effect with ums. Speaker's sex had a significant effect only with uhs. Uhs are used more frequently by male speakers than by female speakers. The results indicate that speakers' choice of filler type is affected by the combination of multiple factors with various effect sizes.

  • Hong Zhang, “Variation in the choice of filled pause: A language change, or a variation in meaning?,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 15-18. DOI: 10.21862/diss-09-005-zhang.

    Abstract The role of filled pauses in message structuring is a heavily debated question, but the result is still somewhat inconclusive. In this study, I consider this question jointly with sociolinguistic factors that have been thought to affect the choice of filled pause in American English. The results suggest that the use of uh is subject to higher variability across not only age groups, but also conversation topics and interlocutors. A latent semantic analysis found consistent difference between two forms of filled pause and silent pauses of varying duration in the primary latent dimension, but similarity between short silent pause and uh, as well as long silent pause and um in the second dimension. Therefore, the functional difference between um and uh should be acknowledged, and the observed change in their relative popularity is potentially related to their different meaning or function in the discourse.


  • Ayşe Altıparmak and Gülmira Kuruoğlu, “An Analysis of Speech Disfluencies of Turkish Speakers Based on Age Variable,” Journal of Psycholinguistic Research, Jan 2018. DOI: 10.1007/s10936-017-9553-4.

    Abstract The focus of this research is to verify the influence of the age variable on fluent Turkish native speakers’ production of the various types of speech disfluencies. To accomplish this, four groups of native speakers of Turkish between ages 4–8, 18–23, 33–50 years respectively and those over 50-year-olds were constructed. A total of 84 participants took part in this study. Prepared and unprepared speech samples of at least 300 words were collected from each participant via face-to-face interviews that were tape recorded and transcribed; for practical reasons, only the unprepared speech samples were collected from children. As a result, for the prepared speech situation, there was no statistically significant difference in terms of age in the production rates of filled gaps, false starts, slips of the tongue and repetitions; however, participants in the over 50-year-old group produced more hesitations and prolongations than participants in the 18–23 and 33–50-year-old groups. For the unprepared speech situation, age variable was not effective on the production rates of filled gaps. However, 4–8 and over 50-year-old participants produced more hesitations and prolongations than the 18–23 and 33–50-year-old groups. 4–8-year-old children produced more slips of the tongue than the 18–23 and 33–50-year-old groups, and more false starts and repetitions than the participants in the other three age groups (18–23, 33–50, over 50). Further analyses revealed more extensive insights related to the types of disfluencies, the position of disfluencies, and the linguistic units involved in disfluency production in Turkish speech.

    Keywords linguistics, Speech disfluencies, Speech production, Turkish speech

  • Yu-Lin Cheng, “Unfamiliar Accented English Negatively Affects EFL Listening Comprehension: It Helps to be a More Able Accent Mimic,” Journal of Psycholinguistic Research, Feb 2018. DOI: 10.1007/s10936-018-9562-y.

    Abstract In this study, EFL learners who listened to four short context-rich audio files each delivered in an unfamiliar English accent were required to produce best-attempt transcriptions and accent imitation recordings. Results indicate that exposure alone does not suffice to eliminate accent impact on EFL listeners. Importantly, results from one-way ANOVA analyses reveal between-participants differences in residual accent impact, vocabulary knowledge, and quality of accent imitation. Results from a linear mixed-effects model analysis, while suggesting that other unidentified factors may also assist EFL listeners in processing unfamiliar accented English, demonstrate that the more able mimics cope more successfully with unfamiliar accents than the less able mimics. Counter-intuitively, vocabulary knowledge is rejected as a predictor for success in reducing accent impact. A logical explanation for this particular finding is that a larger vocabulary repertoire aids listeners where there is no interference from unfamiliar accents. Given these findings, to better prepare EFL listeners for the English-as-an-International-Language world, training should include both listening to a variety of native and non-native accents and performing accent imitation (reproduction) exercises to further expand listeners’ phonological-phonetic flexibility.

    Keywords Accent imitation, Accent impact, Chinese-L1, EFL

  • Felix Ball, Lara E. Michels, Carsten Thiele, and Toemme Noesselt, “The role of multisensory interplay in enabling temporal expectations,” Cognition, vol. 170, 2018, pp. 130-146. DOI: 10.1016/j.cognition.2017.09.015.

    Abstract Temporal regularities can guide our attention to focus on a particular moment in time and to be especially vigilant just then. Previous research provided evidence for the influence of temporal expectation on perceptual processing in unisensory auditory, visual, and tactile contexts. However, in real life we are often exposed to a complex and continuous stream of multisensory events. Here we tested – in a series of experiments – whether temporal expectations can enhance perception in multisensory contexts and whether this enhancement differs from enhancements in unisensory contexts. Our discrimination paradigm contained near-threshold targets (subject-specific 75% discrimination accuracy) embedded in a sequence of distractors. The likelihood of target occurrence (early or late) was manipulated block-wise. Furthermore, we tested whether spatial and modality-specific target uncertainty (i.e. predictable vs. unpredictable target position or modality) would affect temporal expectation (TE) measured with perceptual sensitivity (d′) and response times (RT). In all our experiments, hidden temporal regularities improved performance for expected multisensory targets. Moreover, multisensory performance was unaffected by spatial and modality-specific uncertainty, whereas unisensory TE effects on d′ but not RT were modulated by spatial and modality-specific uncertainty. Additionally, the size of the temporal expectation effect, i.e. the increase in perceptual sensitivity and decrease of RT, scaled linearly with the likelihood of expected targets. Finally, temporal expectation effects were unaffected by varying target position within the stream. Together, our results strongly suggest that participants quickly adapt to novel temporal contexts, that they benefit from multisensory (relative to unisensory) stimulation and that multisensory benefits are maximal if the stimulus-driven uncertainty is highest. We propose that enhanced informational content (i.e. multisensory stimulation) enables the robust extraction of temporal regularities which in turn boost (uni-)sensory representations.

    Keywords Auditory dominance, Multisensory interplay, Redundant target, Spatial coincidence, Temporal expectation, Temporal orienting

  • Matthew Purver, Julian Hough, and Christine Howes, “Computational Models of Miscommunication Phenomena,” Topics in Cognitive Science, Mar 2018. DOI: 10.1111/tops.12324.

    Abstract Miscommunication phenomena such as repair in dialogue are important indicators of the quality of communication. Automatic detection is therefore a key step toward tools that can characterize communication quality and thus help in applications from call center management to mental health monitoring. However, most existing computational linguistic approaches to these phenomena are unsuitable for general use in this way, and particularly for analyzing human–human dialogue: Although models of other-repair are common in human-computer dialogue systems, they tend to focus on specific phenomena (e.g., repair initiation by systems), missing the range of repair and repair initiation forms used by humans; and while self-repair models for speech recognition and understanding are advanced, they tend to focus on removal of “disfluent” material important for full understanding of the discourse contribution, and/or rely on domain-specific knowledge. We explain the requirements for more satisfactory models, including incrementality of processing and robustness to sparsity. We then describe models for self- and other-repair detection that meet these requirements (for the former, an adaptation of an existing repair model; for the latter, an adaptation of standard techniques) and investigate how they perform on datasets from a range of dialogue genres and domains, with promising results.

    Keywords Dialogue, disfluency, Incrementality, Miscommunication, Parallelism, repair, Sparsity

  • Julie Sedivy, “Your Speech Is Packed With Misunderstood, Unconscious Messages,” March 2018.

    Abstract Imagine standing up to give a speech in front of a critical audience. As you do your best to wax eloquent, someone in the room uses a clicker to conspicuously count your every stumble, hesitation, um and uh; once you’ve finished, this person loudly announces how many of these blemishes have marred your presentation...

  • Sophia Uddin, Shannon L.M. Heald, Stephen C. Van Hedger, Serena Klos, and Howard C. Nusbaum, “Understanding environmental sounds in sentence context,” Cognition, vol. 172, 2018, pp. 134-143. DOI: 10.1016/j.cognition.2017.12.009.

    Abstract There is debate about how individuals use context to successfully predict and recognize words. One view argues that context supports neural predictions that make use of the speech motor system, whereas other views argue for a sensory or conceptual level of prediction. While environmental sounds can convey clear referential meaning, they are not linguistic signals, and are thus neither produced with the vocal tract nor typically encountered in sentence context. We compared the effect of spoken sentence context on recognition and comprehension of spoken words versus nonspeech, environmental sounds. In Experiment 1, sentence context decreased the amount of signal needed for recognition of spoken words and environmental sounds in similar fashion. In Experiment 2, listeners judged sentence meaning in both high and low contextually constraining sentence frames, when the final word was present or replaced with a matching environmental sound. Results showed that sentence constraint affected decision time similarly for speech and nonspeech, such that high constraint sentences (i.e., frame plus completion) were processed faster than low constraint sentences for speech and nonspeech. Linguistic context facilitates the recognition and understanding of nonspeech sounds in much the same way as for spoken words. This argues against a simple form of a speech-motor explanation of predictive coding in spoken language understanding, and suggests support for conceptual-level predictions.

    Keywords Constraint, Context, Environmental sound perception, Language, Recognition, speech perception

  • Sylvie Hancil, “Discourse coherence and intersubjectivity: The development of final but in dialogues,” Language Sciences, 2018. DOI: 10.1016/j.langsci.2017.12.002.

    Abstract All the studies on final particles in non-Asian languages systematically propose a synchronic view of the constructions under consideration. This paper closes the gap by offering a diachronic analysis of final but in dialogues in a corpus of Northern English over a sixty-year period. Relying on Schiffrin’s (1987) planes of discourse and Hasselgård’s (2006) definition of a modal particle, it is shown that final but has semantic–pragmatic properties of both a discourse marker and a modal particle. A socio-linguistic approach complements the analysis. Besides, the modal values identified are discussed in relation to Traugott’s (1982) and Traugott and Dasher’s (2002) theories of language change. Finally, it is explained how final but can be inserted in the category of final particles.

    Keywords Discourse value, Final particles, language change, Modal value, Northern English, Socio-linguistic parameters


  • Jens Allwood, “Fluency or disfluency?,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 1-4.

    Abstract In this paper, I investigate the concepts of “fluency” and “disfluency” and argue that the application of the two concepts must be relativized to type of communicative activity. It is not clear that there is a generic sense of fluency or disfluency, rather what contributes to fluency and disfluency depends on what type of communication we are dealing with. The paper then turns to a brief investigation of what makes interactive face-to-face communication fluent or disfluent and argues that many of the features that have been labeled as disfluent, in fact, contribute to the fluency of interactive communication. Finally, I suggest that maybe it is time for a change of terminology and abandon the term “disfluent” for more positive or neutral terminology.

    Keywords DiSS

  • Ana Rita S. Valente, Kenneth O. St. Louis, Margaret Leahy, Andreia Hall, and Luis M.T. Jesus, “A country-wide probability sample of public attitudes toward stuttering in Portugal,” Journal of Fluency Disorders, vol. 52, 2017, pp. 37-52.

    Abstract Background. Negative public attitudes toward stuttering have been widely reported, although differences among countries and regions exist. Clear reasons for these differences remain obscure. | Purpose. Published research is unavailable on public attitudes toward stuttering in Portugal as well as a representative sample that explores stuttering attitudes in an entire country. This study sought to (a) determine the feasibility of a country-wide probability sampling scheme to measure public stuttering attitudes in Portugal using a standard instrument (the "Public Opinion Survey of Human Attributes–Stuttering" ["POSHA–S"]) and (b) identify demographic variables that predict Portuguese attitudes. | Methods. The POSHA–S was translated to European Portuguese through a five-step process. Thereafter, a local administrative office-based, three-stage, cluster, probability sampling scheme was carried out to obtain 311 adult respondents who filled out the questionnaire. | Results. The Portuguese population held stuttering attitudes that were generally within the average range of those observed from numerous previous POSHA–S samples. Demographic variables that predicted more versus less positive stuttering attitudes were respondents’ age, region of the country, years of school completed, working situation, and number of languages spoken. Non-predicting variables were respondents’ sex, marital status, and parental status. | Conclusion. A local administrative office-based, probability sampling scheme generated a respondent profile similar to census data and indicated that Portuguese attitudes are generally typical.

    Keywords Representative Sampling

  • Anne Ruth van Leeuwen, Right on time. Utrecht, the Netherlands: Netherlands Graduate School of Linguistics / Landelijke (LOT), 2017, pp. 155.

    Abstract When a conversation is running smoothly, you know exactly when to nod, hum, or when to start your turn. You feel understood and connected, and you sense that your conversational partner feels the same. However, a conversation may also contain awkward silences, simultaneous starts, and an overall feeling of stuttering and stammering. During such conversations, you are often left with feelings of distance and mutual incomprehension. | Many people share the intuition that the expression of ‘being in sync’ with someone means that you are somehow in tune, in agreement, or in harmony with the other. This dissertation explores whether this intuition is correct; it investigates whether specific temporal patterns between turn-taking speakers, including synchronization of speech rhythms, shape the affective impression of speakers in conversation. The answer to this question can broaden our understanding of the affective push-and-pull of spoken interaction that we experience every day. | This question was explored by presenting participants with short fragments of dialogues between speakers in which we manipulated the temporal patterns between those speakers. Participants were then asked to rate the perceived degree of affiliation between the speakers of those fragments. In the last study of this dissertation we also recorded participants’ real-time affective response during listening to these fragments. We found that, in addition to the presence of overlapping talk, responding too early given the beat of the previous speaker conveys disaffiliation. ‘Being in sync’ is not just a figure of speech, but a real sign of affiliation in spoken dialogue.

  • Malte Belz, “Glottal filled pauses in German,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 5-8.

    Abstract For German, filled pauses are traditionally described with a vocalic form äh and a vocalic-nasal form ähm. A corpus-based approach and a closer phonetic inspection is used here to argue for an additional form, namely glottal filled pauses. In the data analysed for this study, the glottal form is produced by all seven speakers and amounts to 21% of all filled pauses. Contexts and durations of occurrences are discussed and compared to earlier studies on traditional filled pauses. It is suggested that the glottal variant should be considered in future studies on filled pauses and disfluencies.

    Keywords DiSS

  • Axel Bergström, Martin Johansson, and Robert Eklund, “Differences in production of disfluencies in children with typical language development and children with mixed receptive-expressive language disorder,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 9-12.

    Abstract There are several studies about non-fluency in people who stutter, but comparatively few regarding children with language impairment. The current research body regarding disfluencies in children with language impairment has been using different study-designs and definitions, making some results rather contradictory. The purpose of the present study is to expand the knowledge about disfluencies in children with language impairment and compare the occurrence of disfluencies between children with language impairment and children with typical language development in the same age group. A total of ten children with language impairment and six children with typical language development participated in this study. The subjects were recorded when talking freely about a thematic picture or toys and then analysed by calculating disfluencies per 50 words including frequency of different kinds of disfluencies according to Johnson and Associates’ (1959) classic taxonomy. Our results show that children with language impairment do produce statistically significant more disfluency in general, notably sound and syllable repetition, broken words and prolongations.

    Keywords DiSS

  • Simon Betz, Robert Eklund, and Petra Wagner, “Prolongation in German,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 13-16.

    Abstract We investigate segment prolongation as a means of disfluent hesitation in spontaneous German speech. We describe phonetic and structural features of disfluent prolongation and compare it to data of other languages and to non-disfluent prolongations.

    Keywords DiSS

  • Hans Rutger Bosker, “How our own speech rate influences our perception of others,” Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 8, August 2017, pp. 1225-1238. DOI: 10.1037/xlm0000381.

    Abstract In conversation, our own speech and that of others follow each other in rapid succession. Effects of the surrounding context on speech perception are well documented but, despite the ubiquity of the sound of our own voice, it is unknown whether our own speech also influences our perception of other talkers. This study investigated context effects induced by our own speech through 6 experiments, specifically targeting rate normalization (i.e., perceiving phonetic segments relative to surrounding speech rate). Experiment 1 revealed that hearing prerecorded fast or slow context sentences altered the perception of ambiguous vowels, replicating earlier work. Experiment 2 demonstrated that talking at a fast or slow rate prior to target presentation also altered target perception, though the effect of preceding speech rate was reduced. Experiment 3 showed that silent talking (i.e., inner speech) at fast or slow rates did not modulate the perception of others, suggesting that the effect of self-produced speech rate in Experiment 2 arose through monitoring of the external speech signal. Experiment 4 demonstrated that, when participants were played back their own (fast/slow) speech, no reduction of the effect of preceding speech rate was observed, suggesting that the additional task of speech production may be responsible for the reduced effect in Experiment 2. Finally, Experiments 5 and 6 replicate Experiments 2 and 3 with new participant samples. Taken together, these results suggest that variation in speech production may induce variation in speech perception, thus carrying implications for our understanding of spoken communication in dialogue settings.

  • Hans Rutger Bosker, Eva Reinisch, and Matthias J. Sjerps, “Cognitive load makes speech sound fast, but does not modulate acoustic context effects,” Journal of Memory and Language, vol. 94, 2017, pp. 166-176. DOI: 10.1016/j.jml.2016.12.002.

    Abstract In natural situations, speech perception often takes place during the concurrent execution of other cognitive tasks, such as listening while viewing a visual scene. The execution of a dual task typically has detrimental effects on concurrent speech perception, but how exactly cognitive load disrupts speech encoding is still unclear. The detrimental effect on speech representations may consist of either a general reduction in the robustness of processing of the speech signal (‘noisy encoding’), or, alternatively it may specifically influence the temporal sampling of the sensory input, with listeners missing temporal pulses, thus underestimating segmental durations (‘shrinking of time’). The present study investigated whether and how spectral and temporal cues in a precursor sentence that has been processed under high vs. low cognitive load influence the perception of a subsequent target word. If cognitive load effects are implemented through ‘noisy encoding’, increasing cognitive load during the precursor should attenuate the encoding of both its temporal and spectral cues, and hence reduce the contextual effect that these cues can have on subsequent target sound perception. However, if cognitive load effects are expressed as ‘shrinking of time’, context effects should not be modulated by load, but a main effect would be expected on the perceived duration of the speech signal. Results from two experiments indicate that increasing cognitive load (manipulated through a secondary visual search task) did not modulate temporal (Experiment 1) or spectral context effects (Experiment 2). However, a consistent main effect of cognitive load was found: increasing cognitive load during the precursor induced a perceptual increase in its perceived speech rate, biasing the perception of a following target word towards longer durations. This finding suggests that cognitive load effects in speech perception are implemented via ‘shrinking of time’, in line with a temporal sampling framework. In addition, we argue that our results align with a model in which early (spectral and temporal) normalization is unaffected by attention but later adjustments may be attention-dependent.

    Keywords Acoustic context, cognitive load, Rate normalization, Spectral normalization

  • Shin Ying Chu, Naomi Sakai, Koichi Mori, and Lisa Iverach, “Japanese normative data for the Unhelpful Thoughts and Beliefs about Stuttering (UTBAS) Scales for adults who stutter,” Journal of Fluency Disorders, vol. 51, March 2017, pp. 1-7.

    Abstract Purpose. This study reports Japanese normative data for the Unhelpful Thoughts and Beliefs about Stuttering (UTBAS) scales. We outline the translation process, and evaluate the psychometric properties of the Japanese version of the UTBAS scales. | Methods. The translation of the UTBAS scales into Japanese (UTBAS-J) was completed using the standard forward-backward translation process, and was administered to 130 Japanese adults who stutter. To validate the UTBAS-J scales, scores for the Japanese and Australian cohorts were compared. Spearman correlations were conducted between the UTBAS-J and the Modified Erickson Communication Attitude scale (S-24), the self-assessment scale of speech (SA scale), and age. The test-retest reliability and internal consistency of the UTBAS-J were assessed. Independent t-tests were conducted to evaluate the differences in the UTBAS-J scales according to gender, speech treatment experience, and stuttering self-help group participation experience. | Results. The UTBAS-J showed good test-retest reliability, high internal consistency, and moderate to high significant correlations with S-24 and SA scale. A weak correlation was found between the UTBAS-J scales with age. No significant relationships were found between UTBAS-J scores, gender and speech treatment experience. However, those who participated in the stuttering self-help group demonstrated lower UTBAS-J scores than those who did not. | Conclusion. Given the current scarcity of clinical assessment tools for adults who stutter in Japan, the UTBAS-J holds promise as an assessment tool and outcome measure for use in clinical and research environments.

    Keywords Assessment, Japanese, Psychosocial issues, Questionnaire, stuttering

  • Jennifer Cole, Timothy Mahrt, and Joseph Roy, “Crowd-sourcing prosodic annotation,” Computer Speech & Language, 2017.

    Abstract Much of what is known about prosody is based on native speaker intuitions of idealized speech, or on prosodic annotations from trained annotators whose auditory impressions are augmented by visual evidence from speech waveforms, spectrograms and pitch tracks. Expanding the prosodic data currently available to cover more languages, and to cover a broader range of unscripted speech styles, is prohibitive due to the time, money and human expertise needed for prosodic annotation. We describe an alternative approach to prosodic data collection, with coarse-grained annotations from a cohort of untrained annotators performing rapid prosody transcription (RPT) using LMEDS, an open-source software tool we developed to enable large-scale, crowd-sourced data collection with RPT. Results from three RPT experiments are reported. The reliability of RPT is analysed comparing kappa statistics for lab-based and crowd-sourced annotations for American English, comparing annotators from the same (US) versus different (Indian) dialect groups, and comparing each RPT annotator with a ToBI annotation. Results show better reliability for same-dialect annotators (US), and the best overall reliability from crowd-sourced US annotators, though lab-based annotations are the most similar to ToBI annotations. A generalized additive mixed model is used to test differences among annotator groups in the factors that predict prosodic annotation. Results show that a common set of acoustic and contextual factors predict prosodic labels for all annotator groups, with only small differences among the RPT groups, but with larger effects on prosodic marking for ToBI annotators. The findings suggest methods for optimizing the efficiency of RPT annotations. Overall, crowd-sourced prosodic annotation is shown to be efficient, and to rely on established cues to prosody, supporting its use for prosody research across languages, dialects, speaker populations, and speech genres.

    Keywords Speech transcription

  • Ludivine Crible, Liesbeth Degand, and Gaëtanelle Gilquin, “The clustering of discourse markers and filled pauses: A corpus-based French-English study of (dis)fluency,” Languages in Contrast, vol. 17, February 2017, pp. 69-95. DOI: 10.1075/lic.17.1.04cri.

    Abstract This article presents a corpus-based contrastive study of (dis)fluency in French and English, focusing on the clustering of discourse markers (DMs) and filled pauses (FPs) across various spoken registers. Starting from the hypothesis that markers of (dis)fluency, or ‘fluencemes’, occur more frequently in sequences than in isolation, and that their contribution to the relative fluency of discourse can only be assessed by taking into account the contextual distribution of these sequences, this study uncovers the specific contextual conditions that trigger the clustering of fluencemes in the two languages. First, the contexts of appearance of DMs and FPs are described separately, both in English and French, focusing on their distribution, position and co-occurrence patterns. Then, the combination of DMs and FPs in sequences and their different configurations (DM+FP, FP+DM, etc.) are investigated. Overall, it appears that FPs function differently depending on whether they are clustered with DMs or not, and this difference consists in either maintaining or erasing inter- and intra-linguistic contrasts.

    Keywords comparable corpus, Discourse markers, English/French, filled pauses, Fluency

  • Jillian Donahue, Christine Schoepfer, and Robin Lickley, “The effects of disfluent repetitions and speech rate on recall accuracy in a discourse listening task,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 17-20.

    Abstract [While many studies have examined the effects of] disfluency on word recognition and local syntactic or semantic issues, fewer have addressed the impact on comprehension at a discourse level. In this work, we ask what effects features typical in the pathological condition of cluttering (essentially, rapid, disfluent and unintelligible speech) have on our ability to retain the information conveyed in speech. Specifically, we manipulate repetition disfluencies and speech rate in passages of running speech. Forty participants listened to four recordings of passages presented in four conditions: Control, Rapid, Disfluent, Rapid + Disfluent. They were asked to recall details of the passages and rate their speed, fluency and comprehensibility. Both repetition disfluencies and increased speech rate significantly reduced recall of information from discourse. Though no relationship was found between the working memory span of individuals and information recall, we argue that the cognitive load of these features of cluttered speech significantly affects intelligibility and thus recall of speech.

    Keywords DiSS

  • Megan Drevets and Robin Lickley, “A psycholinguistic exploration of disfluency behaviour during the tip-of-the-tongue phenomenon,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 21-24.

    Abstract A tip-of-the-tongue state (TOT) occurs when a speaker knows a word but cannot retrieve its phonological form from memory. While previous studies have found that disfluencies are related to lexical retrieval difficulties, the literature lacks studies which have specifically investigated the impact of TOTs on disfluency. This study explores the relationship between TOTs and such disfluency behaviours as hesitations and target approximations (i.e. incorrect attempts to produce targets). TOTs were induced using the TOTimal method (Smith, Brown & Balfour, 1991), where participants memorised and retrieved the names of imaginary animals. Speech samples were analysed for TOTs and disfluencies. Disfluency rates increased with retrieval times during resolved TOTs. Additionally, target approximation rates correlated with the rates of both TOTs and “Don’t Know” responses, suggesting that target approximations are not unique to TOTs but are indicative of general uncertainty during lexical retrieval.

    Keywords DiSS

  • Gary Geunbae Lee, Ho-Young Lee, Jieun Song, Byeongchang Kim, Sechun Kang, Jinsik Lee, and Hyosung Hwang, “Automatic sentence stress feedback for non-native English learners,” Computer Speech & Language, vol. 41, 2017, pp. 29-42.

    Abstract This paper proposes a sentence stress feedback system in which sentence stress prediction, detection, and feedback provision models are combined. This system provides non-native learners with feedback on sentence stress errors so that they can improve their English rhythm and fluency in a self-study setting. The sentence stress feedback system was devised to predict and detect the sentence stress of any practice sentence. The accuracy of the prediction and detection models was 96.6% and 84.1%, respectively. The stress feedback provision model offers positive or negative stress feedback for each spoken word by comparing the probability of the predicted stress pattern with that of the detected stress pattern. In an experiment that evaluated the educational effect of the proposed system incorporated in our CALL system, significant improvements in accentedness and rhythm were seen with the students who trained with our system but not with those in the control group.

    Keywords CALL

  • Emer Gilmartin, Carl Vogel, and Nick Campbell, “Disfluency in chat and chunk phases of multiparty casual talk,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 25-28.

    Abstract Multiparty casual conversation lasting more than a few minutes can be viewed as a series of phases of chat and chunk type interaction, where chat is interactive conversation with several participants taking turns, and chunk refers to phases where one participant dominates the conversation, often by telling a story or giving an opinion. We investigate the distribution of disfluency in these phases in a 70-minute 5-party conversation where participants had no practical task to perform. This pilot study shows differences in the distribution of disfluency types and frequency in the two phases.

    Keywords DiSS

  • Mária Gósy, Dorottya Gyarmathy, and András Beke, “Phonetic analysis of filled pauses based on a Hungarian-English learner corpus,” International Journal of Learner Corpus Research, vol. 3, December 2017, pp. 149-174. DOI: 10.1075/ijlcr.3.2.03gos.

    Abstract Filled pauses may reveal speech planning or execution problems to a greater extent in L2 spontaneous speech than in L1. The purpose of this study was to analyze the forms and position of all filled pauses, and the durations and the formants of vocalic filled pauses in English (L2) and in Hungarian (L1) spontaneous speech produced by 30 young learners with various L2 proficiency levels using data from our HunEng-D learner corpus. The findings showed that the forms of filled pauses were similar in both languages, irrespective of level of language proficiency. Results confirmed significantly longer vocalic filled pauses in basic and intermediate learners in their L2 relative to their more advanced peers. Formant values (as acoustic reflections of vowel quality) indicated very similar articulatory configurations for all vocalic filled pauses, irrespective of language and language proficiency.

    Keywords acoustics of vocalic filled pauses, duration, HunEng-D corpus, proficiency level

  • Mária Gósy, and Robert Eklund, “Segment prolongation in Hungarian,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 29-32.

    Abstract Segment prolongation (PR) has been shown to be one of the most common forms of non-pathological speech disfluencies (Eklund, 2001). The distribution of PRs in the word (initial–medial–final segment) seems to vary between languages of different syllable-structure complexity, making it interesting to study segment prolongation in languages that exhibit different syllable structure characteristics. Previous studies have examined languages with complex syllable structure, such as English and Swedish (Eklund & Shriberg, 1998; Eklund, 2001, 2004), where affixation creates complex consonant clusters, and languages with very simple syllable structure, such as Japanese (Den, 2003) or Tok Pisin (Eklund, 2001, 2004), as well as Mandarin Chinese (Lee et al., 2004). In this paper we study PRs in Hungarian. Our results indicate that PRs in Hungarian are more similar to those in English and Swedish than to those in Japanese, Tok Pisin or Mandarin Chinese, which lends support to the notion that underlying morphology plays a role in how PRs are realised.

    Keywords DiSS

  • Peter Howell, Kaho Yoshikawa, Kevin Tang, John Harris, and Clarissa Sorger, “Intervention for word-finding difficulty for children starting school who have diverse language backgrounds,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 33-36.

    Abstract Children who have word-finding difficulty can be identified by the pattern of disfluencies in their spontaneous speech; in particular whole-word repetition of prior words often occurs when they cannot retrieve the subsequent word. Work is reviewed that shows whole-word repetitions can be used to identify children from diverse language backgrounds who have word-finding difficulty. The symptom-based identification procedure was validated using a non-word repetition task. Children who were identified as having word-finding difficulty were given phonological training that taught them features of English that they lacked (this depended on their language background). Then they received semantic training. In the cases of children whose first language was not English, the children were primed to use English and then presented with material where there was interference in meanings across the languages (English names had to be produced). It was found that this training improved a range of outcome measures related to education.

    Keywords DiSS

  • Kenneth O. St. Louis, Farzan Irani, Rodney M. Gabel, Stephanie Hughes, Marilyn Langevin, Midori Rodriguez, Kathleen Scaler Scott, and Mary E. Weidner, “Evidence-based guidelines for being supportive of people who stutter in North America,” Journal of Fluency Disorders, 2017. DOI: 10.1016/j.jfludis.2017.05.002.

    Abstract Purpose. While many resources, particularly those available on the Internet, provide suggestions for fluent speakers as they interact with people who stutter (PWS), little evidence exists to support these suggestions. Thus, the purpose of this study was to document the supportiveness of common public reactions, behaviors, or interventions to stuttering by PWS. | Methods. 148 PWS completed the Personal Appraisal of Support for Stuttering-Adults. Additionally, a comparison of the opinions of adults who stutter based on gender and their involvement in self-help/support groups was undertaken. | Results. Many of the Internet-based suggestions for interacting with PWS are aligned with the opinions of the participants of this study. Significant differences were found amongst people who stutter on the basis of gender and involvement in self-help groups. | Conclusions. Lists of “DOs and DON’Ts” that are readily available on the Internet are largely supported by the data in this study; however, the findings highlight the need for changing the emphasis from strict rules for interacting with people who stutter to more flexible principles that keep the needs of individual PWS in mind.

  • Loulou Kosmala, and Aliyah Morgenstern, “A preliminary study of hesitation phenomena in L1 and L2 productions: a multimodal approach,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 37-40.

    Abstract This paper presents a preliminary study of vocal hesitations in L1 and L2 productions using a multimodal perspective. It investigates the use of vocal hesitations of French learners of English interacting in tandem with American speakers in semi-spontaneous speech. Several hesitation markers were analyzed (filled pauses, unfilled pauses, prolongations and non-lexical sounds) based on formal and functional features as well as their relation to gesture. Results do not show great differences in the frequency of vocal hesitations between L1 and L2 productions overall; however, we find differences in duration and combination complexity. Our study indicated that vocal hesitations mainly served planning functions and were very often accompanied with gaze aversion both in L1 and L2 productions. Moreover, speakers did not tend to gesture while hesitating. We conclude that hesitations mainly served planning strategies both in L1 and L2 speech, but with some differences in duration and complexity.

    Keywords DiSS

  • Kurt Eggers, and Sabine Van Eerdenbrugh, “Speech disfluencies in children with Down Syndrome,” Journal of Communication Disorders, 2017. DOI: 10.1016/j.jcomdis.2017.11.001.

    Abstract Purpose. Speech and language development in individuals with Down syndrome is often delayed and/or disordered and speech disfluencies appear to be more common. These disfluencies have been labeled over time as stuttering, cluttering or both. Findings were usually generated from studies with adults or a mixed age group, quite often using different methodologies, making it difficult to compare findings. Therefore, the purpose of this study was to analyze and describe the speech disfluencies of a group, only consisting of children with Down Syndrome between 3 and 13 years of age. | Method. Participants consisted of 26 Dutch-speaking children with DS. Spontaneous speech samples were collected and 50 utterances were analyzed for each child. Types of disfluencies were identified and classified into stuttering-like (SLD) and other disfluencies (OD). The criterion of three or more SLD per 100 syllables (cf. Ambrose & Yairi, 1999) was used to identify stuttering. Additional parameters such as mean articulation rate (MAR), ratio of disfluencies, and telescoping (cf. Coppens-Hofman et al., 2013) were used to identify cluttering and to differentiate between stuttering and cluttering. | Results & conclusion. Approximately 30 percent of children with DS between 3 and 13 years of age in this study stutter, which is much higher than the prevalence in normally developing children. Moreover, this study showed that the speech of children with DS has a different distribution of types of disfluencies than the speech of normally developing children. Although different cluttering-like characteristics were found in the speech of young children with DS, none of them could be identified as cluttering or cluttering-stuttering.

    Keywords Cluttering, Down Syndrome, Speech disfluencies, stuttering

  • Craig Lambert, Judit Kormos, and Danny Minn, “Task Repetition and Second Language Speech Processing,” Studies in Second Language Acquisition, vol. 39, no. 1, 2017, pp. 167–196. DOI: 10.1017/S0272263116000085.

    Abstract This study examines the relationship between the repetition of oral monologue tasks and immediate gains in L2 fluency. It considers the effect of aural-oral task repetition on speech rate, frequency of clause-final and midclause filled pauses, and overt self-repairs across different task types and proficiency levels and relates these findings to specific stages of L2 speech production (conceptualization, formulation, and monitoring). Thirty-two Japanese learners of English sampled at three levels of proficiency completed three oral communication tasks (instruction, narration, and opinion) six times. Results revealed that immediate aural-oral same task repetition was related to gains in oral fluency regardless of proficiency level or task type. Overall gains in speech rate were the largest across the first three performances of each task type but continued until the fifth performance. More specifically, however, clause-final pauses decreased until the second performance, midclause pauses decreased up to the fourth, and self-repairs decreased only after the fourth performance, indicating that task repetition may have been differentially related to specific stages in the speech production process.

  • Ludivine Crible, “Discourse markers and (dis)fluency in English and French: Variation and combination in the DisFrEn corpus,” International Journal of Corpus Linguistics, vol. 22, no. 2, September 2017, pp. 242-264. DOI: 10.1075/ijcl.22.2.04cri.

    Abstract While discourse markers (DMs) and (dis)fluency have been extensively studied in the past as separate phenomena, corpus-based research combining large-scale yet fine-grained annotations of both categories has, however, never been carried out before. Integrating these two levels of analysis, while methodologically challenging, is not only innovative but also highly relevant to the investigation of spoken discourse in general and form-meaning patterns in particular. The aim of this paper is to provide corpus-based evidence of the register-sensitivity of DMs and other disfluencies (e.g. pauses, repetitions) and of their tendency to combine in recurrent clusters. These claims are supported by quantitative findings on the variation and combination of DMs with other (dis)fluency devices in DisFrEn, a richly annotated and comparable English-French corpus representative of eight different interaction settings. The analysis uncovers the prominent place of DMs within (dis)fluency and meaningful association patterns between forms and functions, in a usage-based approach to meaning-in-context.

    Keywords corpus annotation, disfluency, Discourse markers, speech, usage-based

  • Kikuo Maekawa, Ken’ya Nishikawa, and Shu-Chuan Tseng, “Phonetic characteristics of filled pauses: a preliminary comparison between Japanese and Chinese,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 41-44.

    Abstract Filled pauses in spontaneous Chinese and Japanese were analyzed to examine if there is systematic phonetic difference between the vowels of filled pauses and those occurred in ordinary lexical items. Also, the effect of the category of filled pauses (simple vocalic fillers versus fillers derived from demonstratives) was examined in both languages. Random forests analysis revealed that it was possible to construct automatic classifiers that achieved F-measure values of .7-.9. It turned out also that, in both languages, vowels in simple vocalic filled pauses showed higher F-values than the filled pauses derived from demonstratives. Lastly, it turned out that acoustic features distinguishing filled pauses from ordinary lexical items differ depending on both the category of filled pauses and languages.

    Keywords DiSS

  • Srdan Medimorec, Torin P. Young, and Evan F. Risko, “Disfluency effects on lexical selection,” Cognition, vol. 158, January 2017, pp. 28-32.

    Abstract Recent research has suggested that introducing a disfluency in the context of written composition (i.e., typing with one hand) can increase lexical sophistication. In the current study, we provide a strong test between two accounts of this phenomenon, one that attributes it to the delay caused by the disfluency and one that attributes it to the disruption of typical finger-to-letter mappings caused by the disfluency. To test between these accounts, we slowed down participants’ typewriting by introducing a small delay between keystrokes while individuals wrote essays. Critically, this manipulation did not disrupt typical finger-to-letter mappings. Consistent with the delay-based account, our results demonstrate that the essays written in this less fluent condition were more lexically diverse and used less frequent words. Implications for the temporal dynamics of lexical selection in complex cognitive tasks are discussed.

    Keywords Lexical sophistication

  • Mohammad Alameer, Lotte Meteyard, and David Ward, “Stuttering Generalization Self-Measure: Preliminary Development of a Self-Measuring Tool,” Journal of Fluency Disorders, 2017. DOI: 10.1016/j.jfludis.2017.04.001.

    Abstract Introduction. Generalization of treatment is considered a difficult task for clinicians and people who stutter (PWS), and can constitute a barrier to long-term treatment success. To our knowledge, there are no standardized tests that collect measurement of the behavioral and cognitive aspects alongside the client’s self-perception in real-life speaking situations. | Purpose. This paper describes the preliminary development of a Stuttering Generalization Self-Measure (SGSM). The purpose of SGSM is to assess 1) stuttering severity and 2) speech-anxiety level during real-life situations as perceived by PWS. Additionally, this measurement aims to 3) investigate correlations between stuttering severity and speech-anxiety level within the same real-life situation. | Method. The SGSM initially included nine speaking situations designed to cover a variety of frequent speaking scenarios. However, two of these were less commonly encountered by participants and subsequently not included in the final analyses. Items were created according to five listener categories (family and close friends, acquaintances, strangers, persons of authority, and giving a short speech to small audience). Forty-three participants (22 PWS, and 21 control) aged 18 to 53 years were asked to complete the assessment in real-life situations. | Results. Analyses indicated that test-retest reliability was high for both groups. Discriminant validity was also achieved as the SGSM scores significantly differed between the two groups (controls and PWS) for stuttering and speech-anxiety. Convergent validity was confirmed by significant correlations between the SGSM and other speech-related anxiety measures.

    Keywords Assessment, Generalization, Self-perception, Speech-anxiety, Stuttering severity

  • Naomi Ogi, Involvement and Attitude in Japanese Discourse. Amsterdam, Netherlands: John Benjamins, 2017. DOI: 10.1075/pbns.272.

    Abstract This book addresses the long discussed issue of Japanese interactive markers (traditionally called sentence-final particles) in a new light, and provides the comprehensive linguistic documentation of the interactional functions of seven interactive markers: ne, na, yo, sa, wa, zo and ze. By adopting three key notions, ‘involvement’, ‘formality’ and ‘gender’, the study not only reveals the functions and pragmatic effects of each marker, but also sheds light on some fundamental issues of the nature of spoken discourse in general, including how speakers collaborate with each other to create and sustain their conversations and how linguistic functions of verbal forms interface with sociocultural norms. This book will be of interest to students and scholars in a wide range of linguistic fields such as Japanese linguistics, pragmatics, sociolinguistics, discourse analysis and applied linguistics and to teachers and learners of Japanese and of a second/foreign language.

  • Sieb Nooteboom, and Hugo Quené, “The time course of self-monitoring within words and utterances,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 45-48.

    Abstract The within-word and within-utterance time course of internal and external self-monitoring is investigated in a four-word tongue twister experiment eliciting interactional word initial and word medial segmental errors and their repairs. It is found that detection rate for both internal and external self-monitoring decreases from early to late both within words and within utterances. Also, offset-to-repair times are more often of 0 ms in initial than in medial consonants.

    Keywords DiSS

  • Dan Nosowitz, “The Mystery and Occasional Poetry of, Uh, Filled Pauses,” January 2017.

    Abstract Nearly every language and every culture has what are called “filled pauses,” a notoriously difficult-to-define concept that generally refers to sounds or words that a speaker uses when, well, not exactly speaking. In American English, the most common are “uh” and “um.”

  • Pauliina Peltonen, “Temporal fluency and problem-solving in interaction: An exploratory study of fluency resources in L2 dialogue,” System, vol. 70, 2017, pp. 1-13. DOI: 10.1016/j.system.2017.08.009.

    Abstract Second language (L2) speech fluency has mostly been studied from monologues with temporal measures. In the present study, dialogue data are examined with a new framework that links (temporal) fluency analysis to a broader problem-solving perspective, offering a unique approach to examining the resources learners have for maintaining fluent speech despite problems. Dialogues based on a pairwise problem-solving task from 42 Finnish learners of English at two school levels were analyzed quantitatively for temporal fluency, dialogue fluency, stalling mechanisms, and communication strategies (CSs). A complementary qualitative analysis of selected productions was also conducted. The results indicate that temporal and dialogue fluency measures differentiate learners at different school levels, but the relationship between CSs and fluency is complex. While correlations between mid-clause pauses and certain strategies were found, the qualitative analysis indicated that stalling mechanisms and CSs can compensate for local dysfluencies and even contribute to temporal fluency. The results highlight the importance of combining quantitative and qualitative analysis in L2 fluency studies. Conceptually, L2 speech fluency should include collaborative aspects (dialogue fluency) in addition to individual, temporal fluency, and cover resources for maintaining fluency.

    Keywords Communication strategies, interaction, Mixed-methods, oral fluency, Problem-solving, second language speech

  • Ralph Rose, “Silent and filled pauses and speech planning in first and second language production,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 49-52.

    Abstract The present study looks at the relative association of silent and filled pauses to problems in discourse and syntactic planning via utterance and clause boundary phenomena, respectively, by focusing on crosslinguistic data. The occurrence of boundary pauses in a crosslinguistic corpus of speech suggests that silent pauses are more closely related to both discourse and syntactic planning than filled pauses, but more strongly so for discourse planning. These results were consistent across both first and second language production. However, clause boundary silent pauses in first language speech were more atypical (i.e., longer than average) than those in second language speech. This difference may be due to complexity differences in the first and second language speech samples.

    Keywords DiSS

  • Ralph L. Rose, “Differences in second language speech fluency ratings: Native versus nonnative listeners,” February 2017.

    Abstract (none)

  • Ralph L. Rose, “A Comparison of Form and Temporal Characteristics of Filled Pauses in L1 Japanese and L2 English,” Journal of the Phonetic Society of Japan, vol. 21, no. 3, 2017, pp. 33-40. DOI: 10.24467/onseikenkyu.21.3_33.

    Abstract Filled pauses (FPs) in English can be either monophonemic ‘uh’ [ə] or polyphonemic ‘um’ [əm]. These differ temporally: shorter ‘uh’ is associated with shorter overall delay (including silent pauses). Japanese FPs are more varied, including both monophonemic ([ε], [ŋ]) and polyphonemic ([ε:to], [ɑno]) forms. This study compares the FPs of native Japanese speakers in a crosslinguistic speech corpus. Results show speakers use FPs with a lower F1 than native English speakers and strongly prefer the monophonemic form. Duration patterns are similar, but low proficiency speakers delay longer with monophonemic FPs. Results suggest possibilities for nonnative speech detection in speech applications.

  • June Ruivivar, and Laura Collins, “The Effects of Foreign Accent on Perceptions of Nonstandard Grammar: A Pilot Study,” TESOL Quarterly, May 2017. DOI: 10.1002/tesq.374.

    Abstract (none)

  • Naomi Sakai, Shin Ying Chu, Koichi Mori, and J. Scott Yaruss, “The Japanese version of the Overall Assessment of the Speaker’s Experience of Stuttering for Adults (OASES-A-J): Translation and psychometric evaluation,” Journal of Fluency Disorders, January 2017. DOI: 10.1016/j.jfludis.2016.11.002.

    Abstract Purpose. This study evaluates the psychometric performance of the Japanese version of the Overall Assessment of the Speaker’s Experience of Stuttering for Adults (OASES-A), a comprehensive assessment tool of individuals who stutter. | Methods. The OASES-A-J was administered to 200 adults who stutter in Japan. All respondents also evaluated their own speech (SA scale), satisfaction of their own speech (SS scale) and the Japanese translation version of the Modified Erickson Communication Attitude scale (S-24). The test-retest reliability and internal consistency of the OASES-A-J were assessed. To examine the concurrent validity of the questionnaire, Pearson correlation was conducted between the OASES-A-J Impact score and the S-24 scale, SA scale and SS scale. In addition, Pearson correlation among the impact scores of each section and total were calculated to examine the construct validity. | Results. The OASES-A-J showed a good test-retest reliability (r = 0.81–0.95) and high internal consistency (α > 0.80). Concurrent validity was moderate to high (0.55–0.75). Construct validity was confirmed by the relation between internal consistency in each section and correlation among sections’ impact scores. Japanese adults showed higher negative impact for ‘General Information’, ‘Reactions to Stuttering’ and ‘Quality of Life’ sections. | Conclusion. These results suggest that the OASES-A-J is a reliable and valid instrument to measure the impact of stuttering on Japanese adults who stutter. The OASES-A-J could be used as a clinical tool in Japanese stuttering field.

    Keywords ICF, OASES, Psychometric analysis, Quality of life, stuttering

  • Vered Silber-Varod, and Anat Lerner, “Analysis of silences in unbalanced dialogues: the effect of genre and role,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 53-57.

    Abstract This study examines the diversity of silences in unbalanced dialogues, i.e. dialogues between speakers with different participation levels: responder and reporter. We examined two genres: therapeutic sessions and private dialogues that are based on this responder-reporter structure. When looking at silences versus speech ratios, we found no differences between the genres nor between the roles. However, when grouping the silences by type (pauses, i.e. intra-speaker silences; gaps, i.e. inter-speaker silences; and silences that occur in the vicinity of speech overlaps), we found that the silence durations of pauses are role dependent in both genres, while the silence durations of gaps were found to be genre dependent, but not role dependent. Moreover, speech rate was not found to be genre dependent. It seems that although silences in unbalanced dialogues vary considerably, genre and speaker’s role are influential.

    Keywords DiSS

  • Richard Stephens, and Amy Zile, “Does Emotional Arousal Influence Swearing Fluency?,” Journal of Psycholinguistic Research, January 2017, pp. 1–13. DOI: 10.1007/s10936-016-9473-8.

    Abstract This study assessed the effect of experimentally manipulated emotional arousal on swearing fluency. We hypothesised that swear word generation would be increased with raised emotional arousal. The emotional arousal of 60 participants was manipulated by having them play a first-person shooter video game or, as a control, a golf video game, in a randomised order. A behavioural measure of swearing fluency based on the Controlled Oral Word Association Test was employed. Successful experimental manipulation was indicated by raised State Hostility Questionnaire scores after playing the shooter game. Swearing fluency was significantly greater after playing the shooter game compared with the golf game. Validity of the swearing fluency task was demonstrated via positive correlations with self-reported swearing fluency and daily swearing frequency. In certain instances swearing may represent a form of emotional expression. This finding will inform debates around the acceptability of using taboo language.

  • Stewart M. McCauley, and Morten H. Christiansen, “Computational Investigations of Multiword Chunks in Language Learning,” Topics in Cognitive Science, 2017. DOI: 10.1111/tops.12258.

    Abstract Second-language learners rarely arrive at native proficiency in a number of linguistic domains, including morphological and syntactic processing. Previous approaches to understanding the different outcomes of first- versus second-language learning have focused on cognitive and neural factors. In contrast, we explore the possibility that children and adults may rely on different linguistic units throughout the course of language learning, with specific focus on the granularity of those units. Following recent psycholinguistic evidence for the role of multiword chunks in online language processing, we explore the hypothesis that children rely more heavily on multiword units in language learning than do adults learning a second language. To this end, we take an initial step toward using large-scale, corpus-based computational modeling as a tool for exploring the granularity of speakers’ linguistic units. Employing a computational model of language learning, the Chunk-Based Learner, we compare the usefulness of chunk-based knowledge in accounting for the speech of second-language learners versus children and adults speaking their first language. Our findings suggest that while multiword units are likely to play a role in second-language learning, adults may learn less useful chunks, rely on them to a lesser extent, and arrive at them through different means than children learning a first language.

    Keywords chunking, Computational modeling, Corpora, L2, Language learning

  • Uriel Cohen Priva, “Not so fast: Fast speech correlates with lower lexical and structural information,” Cognition, vol. 160, 2017, pp. 27-34. DOI: 10.1016/j.cognition.2016.12.002.

    Abstract Speakers dynamically adjust their speech rate throughout conversations. These adjustments have been linked to cognitive and communicative limitations: for example, speakers speak words that are contextually unexpected (and thus add more information) with slower speech rates. This raises the question whether limitations of this type vary wildly across speakers or are relatively constant. The latter predicts that across speakers (or conversations), speech rate and the amount of information content are inversely correlated: on average, speakers can either provide high information content or speak quickly, but not both. Using two corpus studies replicated across two corpora, I demonstrate that indeed, fast speech correlates with the use of less informative words and syntactic structures. Thus, while there are individual differences in overall information throughput, speakers are more similar in this aspect than differences in speech rate would suggest. The results suggest that information theoretic constraints on production operate at a higher level than was observed before and affect language throughout production, not only after words and structures are chosen.

    Keywords Information, Information rate, Language, speech rate

  • Vered Aharonson, Eran Aharonson, Katia Raichlin-Levi, Aviv Sotzianu, Ofer Amir, and Zehava Ovadia-Blechman, “A real-time phoneme counting algorithm and application for speech rate monitoring,” Journal of Fluency Disorders, vol. 51, 2017, pp. 60-68. DOI: 10.1016/j.jfludis.2017.01.001.

    Abstract Adults who stutter can learn to control and improve their speech fluency by modifying their speaking rate. Existing speech therapy technologies can assist this practice by monitoring speaking rate and providing feedback to the patient, but cannot provide an accurate, quantitative measurement of speaking rate. Moreover, most technologies are too complex and costly to be used for home practice. We developed an algorithm and a smartphone application that monitor a patient’s speaking rate in real time and provide user-friendly feedback to both patient and therapist. Our speaking rate computation is performed by a phoneme counting algorithm which implements spectral transition measure extraction to estimate phoneme boundaries. The algorithm is implemented in real time in a mobile application that presents its results in a user-friendly interface. The application incorporates two modes: one provides the patient with visual feedback of his/her speech rate for self-practice and another provides the speech therapist with recordings, speech rate analysis and tools to manage the patient’s practice. The algorithm’s phoneme counting accuracy was validated on ten healthy subjects who read a paragraph at slow, normal and fast paces, and was compared to manual counting of speech experts. Test-retest and intra-counter reliability were assessed. Preliminary results indicate differences of −4% to 11% between automatic and human phoneme counting. Differences were largest for slow speech. The application can thus provide reliable, user-friendly, real-time feedback for speaking rate control practice.

    Keywords Smartphone application, Speaking rate computation, Spectral transition measure, Stuttering therapy

  • Xiaoming Jiang, and Marc D. Pell, “The sound of confidence and doubt,” Speech Communication, vol. 88, 2017, pp. 106-126.

    Abstract Feeling of knowing (or "expressed confidence") reflects a speaker’s certainty or commitment to a statement and can be associated with one’s trustworthiness or persuasiveness in social interaction. We investigated the perceptual-acoustic correlates of expressed confidence and doubt in spoken language, with a focus on both linguistic and vocal speech cues. In Experiment 1, utterances subserving different communicative functions (e.g., stating facts, making judgments) produced in a confident, close-to-confident, unconfident, and neutral-intending voice by six speakers, were then rated for perceived confidence by 72 native listeners. As expected, speaker confidence ratings increased with the intended level of expressed confidence; neutral-intending statements were frequently judged as relatively high in confidence. The communicative function of the statement, and the presence vs. absence of an utterance-initial probability phrase (e.g., Maybe, I’m sure), further modulated speaker confidence ratings. In Experiment 2, acoustic analysis of perceptually valid tokens rated in Experiment 1 revealed distinct patterns of pitch, intensity and temporal features according to perceived confidence levels; confident expressions were highest in fundamental frequency (f0) range, mean amplitude, and amplitude range, whereas unconfident expressions were highest in mean f0, slowest in speaking rate, with more frequent pauses. Dynamic analyses of f0 and intensity changes across the utterance uncovered distinctive patterns in expression as a function of confidence level at different positions of the utterance. Our findings provide new information on how metacognitive states such as confidence and doubt are communicated by vocal and linguistic cues which permit listeners to arrive at graded impressions of a speaker’s feeling of (un)knowing.

    Keywords nonverbal behavior

  • Yuh-show Cheng, “Development and preliminary validation of four brief measures of L2 language-skill-specific anxiety,” System, 2017. DOI: 10.1016/j.system.2017.06.009.

    Abstract This paper reports a study on the development and validation of four brief measures of L2 language-skill-specific anxiety scales: L2 listening, speaking, reading, and writing anxiety scales. A total of 523 college students in Taiwan participated in the study. Lang’s (1971) tripartite model of anxiety provided a theoretical basis for developing the four scales. An initial pool of items were developed based on a review of related literature and the results of a focus group interview. Less ideal items were removed based upon the results of a pilot test. In the formal study, exploratory factor analysis was conducted to select items for each anxiety scale, which was subsequently validated by confirmatory factor analysis and correlation analysis. The results provided evidence for the reliability, convergent validity, and discriminant validity of the scores of the four brief measures.

    Keywords Brief measure, L2, Language anxiety, Language-skill-specific, Psychometric properties


  • Akiko Fuse, and Erika A. Lanham, “Impact of social media and quality of life of people who stutter,” Journal of Fluency Disorders, vol. 50, 2016, pp. 59-71. DOI: 10.1016/j.jfludis.2016.09.005.

    Abstract Highlights. • People who stutter (PWS) who are connecting with other PWS have seen an improvement in their overall confidence. • PWS who use social media feel that they do not rely on it as their main form of communication and feel that they use social media an average amount. • Social media relieves PWS anxiety in communication by allowing them to communicate without negative evaluation or experience difficulty with functional communication.

  • Amy Watts, Patricia Eadie, Susan Block, Fiona Mensah, and Sheena Reilly, “Language skills of children during the first 12 months after stuttering onset,” Journal of Fluency Disorders, 12/2016.

    Abstract Purpose: To describe the language development in a sample of young children who stutter during the first 12 months after stuttering onset was reported. Methods: Language production was analysed in a sample of 66 children who stuttered (aged 2 to 4 years). The sample were identified from a pre-existing prospective, community based longitudinal cohort. Data were collected at three time points within the first year after stuttering onset. Stuttering severity was measured, and global indicators of expressive language proficiency (length of utterances and grammatical complexity) were derived from the samples and summarised. Language production abilities of the children who stutter were contrasted with normative data. Results: The majority of children’s stuttering was rated as mild in severity, with more than 83% of participants demonstrating very mild or mild stuttering at each of the time points studied. The participants demonstrated developmentally appropriate spoken language skills comparable with available normative data. Conclusion: In the first year following the report of stuttering onset, the language skills of the children who were stuttering progressed in a manner that is consistent with developmental expectations.

    Keywords Language

  • Andrea Révész, Monika Ekiert, and Eivind Nessa Torgersen, “The Effects of Complexity, Accuracy, and Fluency on Communicative Adequacy in Oral Task Performance,” Applied Linguistics, vol. 37, no. 6, 12/2016, pp. 828-848. DOI: 10.1093/applin/amu069.

    Abstract Communicative adequacy is a key construct in second language research, as the primary goal of most language learners is to communicate successfully in real-world situations. Nevertheless, little is known about what linguistic features contribute to communicatively adequate speech. This study fills this gap by investigating the extent to which complexity, accuracy, and fluency (CAF) predict adequacy, and whether proficiency and task type moderate these relationships. In all, 20 native speakers and 80 second language users from four proficiency levels performed five tasks. Speech samples were rated for adequacy and coded for a range of CAF indices. Filled pause frequency, a feature of breakdown fluency, emerged as the strongest predictor of adequacy. Predictors with significant but smaller effects included indices of all three CAF dimensions: linguistic complexity (lexical diversity, overall syntactic complexity, syntactic complexity by subordination, and frequency of conjoined clauses), accuracy (general accuracy and accuracy of connectors), and fluency (silent pause frequency and speed fluency). For advanced speakers, incidence of false starts also emerged as predicting communicatively adequate speech. Task type did not influence the link between linguistic features and adequacy.

  • Andrew Martin, Yosuke Igarashi, Nobuyuki Jincho, and Reiko Mazuka, “Utterances in infant-directed speech are shorter, not slower,” Cognition, vol. 156, 2016, pp. 52-59.

    Abstract It has become a truism in the literature on infant-directed speech (IDS) that IDS is pronounced more slowly than adult-directed speech (ADS). Using recordings of 22 Japanese mothers speaking to their infant and to an adult, we show that although IDS has an overall lower mean speech rate than ADS, this is not the result of an across-the-board slowing in which every vowel is expanded equally. Instead, the speech rate difference is entirely due to the effects of phrase-final lengthening, which disproportionally affects IDS because of its shorter utterances. These results demonstrate that taking utterance-internal prosodic characteristics into account is crucial to studies of speech rate.

    Keywords Final lengthening

  • Elina Banzina, “Consonant lengthening for persuasiveness in L1 and L2 English,” International Journal of Applied Linguistics, vol. 26, no. 3, 11/2016, pp. 403-419.

    Abstract The present study explored how persuasiveness is expressed phonetically in English and whether non-native speakers of English are able to employ L2 phonetic cues to convey importance in L2 in a native-like manner. An acoustic experiment compared English and Latvian speakers’ of English treatment of syllable-onset consonant duration relative to vowels in (i) neutral and (ii) persuasive speech contexts. Duration was measured in voiceless stops and continuants and a wide variety of vowels in the stressed syllables of key words. Results revealed that in persuasive speech, native English speakers significantly increased the proportion of consonantal duration, whereas no consonant lengthening was found in Latvian L1 and L2 productions. These findings provide evidence for the paralinguistic function of consonants and the existence of language-specific persuasion cues.

    Keywords consonant duration, consonant lengthening, discurso persuasivo, discurso público, duración de consonante, emphasis, énfasis, inglés como lengua extranjera, persuasive speech, public speaking, spoken English

  • Benjamin V. Tucker, and Mirjam Ernestus, “Why we need to investigate casual speech to truly understand language production, processing and the mental lexicon,” The Mental Lexicon, vol. 11, no. 3, 12/2016, pp. 375-400. DOI: 10.1075/ml.11.3.03tuc.

    Abstract The majority of studies addressing psycholinguistic questions focus on speech produced and processed in a careful, laboratory speech style. This ‘careful’ speech is very different from the speech that listeners encounter in casual conversations. This article argues that research on casual speech is necessary to show the validity of conclusions based on careful speech. Moreover, research on casual speech produces new insights and questions on the processes underlying communication and on the mental lexicon that cannot be revealed by research using careful speech. This article first places research on casual speech in its historic perspective. It then provides many examples of how casual speech differs from careful speech and shows that these differences may have important implications for psycholinguistic theories. Subsequently, the article discusses the challenges that research on casual speech faces, which stem from the high variability of this speech style, its necessary casual context, and that casual speech is connected speech. We also present opportunities for research on casual speech, mostly in the form of new experimental methods that facilitate research on connected speech. However, real progress can only be made if these new methods are combined with advanced (still to be developed) statistical techniques.

    Keywords casual speech, conversational speech, experimental paradigms, pronunciation variability, statistical analyses

  • Bjørn Wessel-Tolvig, and Patrizia Paggio, “Revisiting the thinking-for-speaking hypothesis: Speech and gesture representation of motion in Danish and Italian,” Journal of Pragmatics, vol. 99, 07/2016, pp. 39-61.

    Abstract Many studies try to explain thought processes based on verbal data alone and often take the linguistic variation between languages as evidence for cross-linguistic thought processes during speaking. We argue that looking at co-speech gestures might broaden the scope and shed new light on different thinking-for-speaking patterns. Data comes from a corpus study investigating the relationship between speech and gesture in two typologically different languages: Danish, a satellite-framed language and Italian, a verb-framed language. Results show cross-linguistic variation in how motion components are mapped onto linguistic constituents, but also show how Italian speakers to some degree deviate from standard verb-framed lexicalization patterns, and use typical satellite-framed constructions. Co-speech gestures, when they occur, largely follow the patterns used in speech, with a notable exception: In 28% of the cases, in fact, Italian speakers express manner in path-only speech constructions gesturally. This finding suggests that gestures may be instrumental in revealing what semantic components speakers attend to while speaking; in other words, purely verbal data may not fully account for the thinking part of the thinking-for-speaking hypothesis.

    Keywords Gesture

  • Boaz M. Ben-David, Maroof I. Moral, Aravind K. Namasivayam, Hadas Erel, and Pascal H.H.M. van Lieshout, “Linguistic and Emotional-Valence Characteristics of Reading Passages for Clinical Use and Research,” Journal of Fluency Disorders, 2016.

    Abstract Highlights: • There is little information on fundamental properties of reading passages that can affect reading (e.g., words’ arousal and valence, passage readability). • In a detailed analysis, the three commonly used passages were found to contain a share of emotionally valenced, high arousal, lower familiarity and polysyllabic content words. • The paper also provides a new well-balanced (and ranked high on ease of readability) passage that minimizes the impact of these properties (e.g., low arousal words). • Testing 26 PWS, error rates on a traditional passage and on the novel passage were correlated, yet many individuals showed a large difference between the two. • We suggest a combined procedure, using more than one passage. The details on passage characteristics can inform clinical practice.

  • Jazmín Cevasco, and Paul van den Broek, “The effect of filled pauses on the processing of the surface form and the establishment of causal connections during the comprehension of spoken expository discourse,” Cognitive Processing, vol. 17, no. 2, 2016, pp. 185–194. DOI: 10.1007/s10339-016-0755-8.

    Abstract The purpose of this study was to examine the effect of filled pauses (uh) on the verification of words and the establishment of causal connections during the comprehension of spoken expository discourse. With this aim, we asked Spanish-speaking students to listen to excerpts of interviews with writers, and to perform a word-verification task and a question-answering task on causal connectivity. There were two versions of the excerpts: filled pause present and filled pause absent. Results indicated that filled pauses increased verification times for words that preceded them, but did not make a difference on response times to questions on causal connectivity. The results suggest that, as signals of delay, filled pauses create a break with surface information, but they do not have the same effect on the establishment of meaningful connections.

  • David Wood, “Willingness to communicate and second language speech fluency: An idiodynamic investigation,” System, vol. 60, 2016, pp. 11-28.

    Abstract Second language (L2) speech fluency has usually been studied as a function of a set of measurable temporal features of speech, but it has seldom been researched in relation to learner or situational factors in performance such as willingness to communicate (WTC), definable as readiness to engage in communication at a specific time and with specific interlocutors. The present study is an examination of the fluid relationship between WTC and L2 fluency from a dynamic systems perspective. The exploratory case study presents an examination of WTC and fluency in Japanese learners of English L2, in communication with a non-Japanese interlocutor. Speech samples produced by the learners were analyzed for markers of fluency. The learners produced WTC profiles for their speech samples by creating bitmaps during stimulated recall, and also provided retrospective self-analysis of WTC in stimulated recall. The fluency profiles and WTC profiles were matched and analyzed to explore the interrelationship between fluency and WTC. The results illuminate the relationship between fluency and WTC, particularly the fluidity and possible directionality of the relationship, i.e. whether fluency breakdowns lead to lowered WTC or vice versa.

    Keywords Cognitive fluency

  • Nivja H. de Jong, “Predicting pauses in L1 and L2 speech: the effects of utterance boundaries and word frequency,” International Review of Applied Linguistics in Language Teaching, vol. 54, no. 2, 06/2016, pp. 113-132. DOI: 10.1515/iral-2016-9993.

    Abstract This paper compares the distribution of silent and filled pauses in first (L1) and second language (L2) speech. The occurrence of pauses of 52 L2 and 18 L1 Dutch speakers was evaluated with respect to utterance boundaries and word frequency. We found that L2 speakers paused more often than L1 speakers within utterances; but not between utterances. Similarly, only within utterances, L2 pauses were longer than L1 pauses. Regarding word frequency, both L1 and L2 speakers are more likely to pause before lower frequency words as compared to higher frequency words. These findings imply that L1 and L2 speakers’ production processes may be similar in that (1) pauses at utterance boundaries are used for conceptual planning mostly and (2) lexical retrieval difficulties are comparable for L1 and L2 speakers. These findings furthermore imply that when using fluency for L2 testing, pause locations must be taken into account.

  • Francesca Bianchi, and Sara Gesuato, Pragmatic Issues in Specialized Communicative Contexts. Brill, 2016, pp. 240. DOI: 10.1163/9789004323902.

    Abstract "Pragmatic Issues in Specialized Communicative Contexts", edited by Francesca Bianchi and Sara Gesuato, illustrates how interactants systematically and effectively employ micro and macro linguistic resources and textual strategies to engage in communicative practices in such specific contexts as healthcare services, TV interpreting, film dialogue, TED talks, archaeology academic communication, student-teacher communication, and multilingual classrooms. Each contribution presents a pedagogical slant, reporting on or suggesting didactic approaches to, or applications of, pragmatic aspects of communication in SL, FL and LSP learning contexts. The topics covered and the issues addressed are all directly relevant to applied pragmatics, that is, pragmatically oriented linguistic analysis that accounts for interpersonal-transactional issues in real-life situated communication.

  • Effrosyni Georgiadou, and Karen Roehr-Brackin, “Investigating Executive Working Memory and Phonological Short-Term Memory in Relation to Fluency and Self-Repair Behavior in L2 Speech,” Journal of Psycholinguistic Research, 2016, pp. 1–19. DOI: 10.1007/s10936-016-9463-x.

    Abstract This paper reports the findings of a study investigating the relationship of executive working memory (WM) and phonological short-term memory (PSTM) to fluency and self-repair behavior during an unrehearsed oral task performed by second language (L2) speakers of English at two levels of proficiency, elementary and lower intermediate. Correlational analyses revealed a negative relationship between executive WM and number of pauses in the lower intermediate L2 speakers. However, no reliable association was found in our sample between executive WM or PSTM and self-repair behavior in terms of either frequency or type of self-repair. Taken together, our findings suggest that while executive WM may enhance performance at the conceptualization and formulation stages of the speech production process, self-repair behavior in L2 speakers may depend on factors other than working memory.

    Keywords Executive working memory, Fluency, hesitation phenomena, L2 speech production, Phonological short-term memory, Self-repair behavior, Working memory capacity

  • Anna Gladkova, Ulla Vanhatalo, and Cliff Goddard, “The semantics of interjections: An experimental study with natural semantic metalanguage,” Applied Psycholinguistics, vol. 37, 07/2016, pp. 841–865. DOI: 10.1017/S0142716415000260.

    Abstract The paper reports the results of a pilot experimental study aimed at evaluating natural semantic metalanguage (NSM) explications of English interjections. It proposes a novel online survey technique to test NSM explications with language speakers. The survey tested recently developed semantic explications of selected English interjections as published in Goddard (2014a): 'wow', 'gosh', 'gee', 'yikes' (“surprise” group) and 'yuck', 'ugh' (“disgust” group). The results provide overall support for the proposed explications and indicate directions for their further development. It is interesting that respondents’ preexisting knowledge of NSM and other background variables (age, gender, being a native speaker, or studying linguistics) were shown to have little influence on the test results.

  • Kaisa Hash, Heini-Marja Javinen, and Kalle Juuti, “Accommodating to English-medium instruction in teacher education in Finland,” International Journal of Applied Linguistics, vol. 26, no. 3, 11/2016, pp. 291-310. DOI: 10.1111/ijal.12093.

    Abstract This study analyses teacher educators’ and student teachers’ perceptions of teaching and learning situations in an international English as a lingua franca (ELF) context in an English-medium instruction (EMI) teacher education programme in Finland. The analysis of semi-structured interviews revealed that the participants perceived a partial reversal of traditional teacher and student roles; students assisted voluntarily and teaching became reciprocal. Some teachers reflected on having used typical strategies in ELF context, such as code-switching, to further communication and engage students. However, teachers’ lack of fluency was sometimes considered causing frustration among students and affected negatively their feeling of being professional teacher educators. Nevertheless, by increasing more learner-led activities, ELF can positively affect teacher education pedagogy.

    Keywords accommodation strategies, co-construction of communication, ELF, EMI, englanninkielinen koulutus, opettajankoulutus, sovittamisstrategiat, teacher education, yhdessä rakennettu viestintä

  • Hyunkyung Lee, Hyunsub Sim, Eunju Lee, and Dahye Choi, “Disfluency characteristics of children with attention-deficit/hyperactivity disorder symptoms,” Journal of Communication Disorders, 2016.

    Abstract The purpose of the current study was to investigate the characteristics of speech disfluency in 15 children with attention-deficit/hyperactivity disorder (ADHD) symptoms and 15 age-matched control children. Reading, story retelling, and picture description tasks were used to elicit utterances from the participants. The findings indicated that children with ADHD symptoms produced significantly more stuttering-like disfluencies (SLD) and other disfluencies (OD) when compared to the control group during all three tasks. Further statistical analysis showed that children with ADHD symptoms produced more OD during the story retelling task than the other two tasks, whereas no significant differences in OD were observed among the three tasks in the control children. Finally, children with ADHD symptoms exhibited a higher proportion of SLD in total disfluencies (TD) than the control children. These results are consistent with previous studies that children with ADHD are disfluent in their verbal production. Furthermore, children with ADHD symptoms seem to be more vulnerable to a speaking task that places greater demands on their attentional resources for language production, resulting in increased speech disfluencies.

    Keywords Stuttering-like disfluency

  • Jennifer A. Foote, and Pavel Trofimovich, “A Multidimensional Scaling Study of Native and Non-Native Listeners’ Perception of Second Language Speech,” Perceptual and Motor Skills, vol. 122, no. 2, 03/2016, pp. 470-489. DOI: 10.1177/0031512516636528.

    Abstract Second language speech learning is predicated on learners’ ability to notice differences between their own language output and that of their interlocutors. Because many learners interact primarily with other second language users, it is crucial to understand which dimensions underlie the perception of second language speech by learners, compared to native speakers. For this study, 15 non-native and 10 native English speakers rated 30-s language audio-recordings from controlled reading and interview tasks for dissimilarity, using all pairwise combinations of recordings. PROXSCAL multidimensional scaling analyses revealed fluency and aspects of speakers’ pronunciation as components underlying listener judgments but showed little agreement across listeners. Results contribute to an understanding of why second language speech learning is difficult and provide implications for language training.

    Keywords multidimensional scaling, second language speech, speech perception

  • Joana Cholin, Sabrina Heiler, Alexander Whillier, and Martin Sommer, “Premonitory Awareness in Stuttering Scale (PAiS),” Journal of Fluency Disorders, 2016.

    Abstract Anticipation of stuttering events in persistent developmental stuttering is a frequent but inadequately measured phenomenon that is of both theoretical and clinical importance. Here, we describe the development and preliminary testing of a German version of the Premonitory Awareness in Stuttering Scale (PAiS) a 12-item questionnaire assessing immediate and prospective anticipation of stuttering that was translated and adapted from the Premonitory Urge for Tics Scale (PUTS) (Woods, Piacentini, Himle, & Chang, 2005). After refining the preliminary PAiS scale in a pilot study, we administered a revised version to 21 adults who stutter (AWS) and 21 age, gender and education-matched control participants. Results demonstrated that the PAiS had good internal consistency and discriminated the two speaker groups very effectively, with AWS reporting anticipation of speech disruptions significantly more often than adults with typical speech. Correlations between the PAiS total score and both the objective and subjective measures of stuttering severity revealed that AWS with high PAiS scores produced fewer stuttered syllables. This is possibly because these individuals are better able to adaptively use these anticipatory sensations to modulate their speech. These results suggest that, with continued refinement, the PAiS has the potential to provide clinicians and researchers with a practical and psychometrically sound tool that can quantify how a given AWS anticipates upcoming stuttering events.

    Keywords premonitory awareness

  • Kristen Lucas, Sharon A. Kerrick, Jenna Haugen, and Cole J. Corider, “Communicating Entrepreneurial Passion: Personal Passion vs. Perceived Passion in Venture Pitches,” IEEE Transactions on Professional Communication, vol. 59, no. 4, 10/2016, pp. 363-378. DOI: 10.1109/TPC.2016.2607818.

    Abstract Research problem: Entrepreneurial passion has been shown to play an important role in venture success and, therefore, in investors’ funding decisions. However, it is unknown whether the passion entrepreneurs personally feel or experience can be accurately assessed by investors during a venture pitch. Research questions: (1) To what extent does entrepreneurs’ personal passion align with investors’ perceived passion? (2) To what cues do investors attend when assessing entrepreneurs’ passion? Literature review: Integrating theory and research in entrepreneurship communication and entrepreneurial passion within the context of venture pitching, we explain that during venture pitches, investors make judgments about entrepreneurs’ passion that have consequences for their investment decisions. However, they can attend to only those cues that entrepreneurs outwardly display. As a result, they may not be assessing the passion entrepreneurs personally feel or experience. Methodology: We used a sequential explanatory mixed methods research design. For our data collection, we surveyed 40 student entrepreneurs, videorecorded their venture pitches, and facilitated focus groups with 16 investors who viewed the videos and ranked, rated, and discussed their perceptions of entrepreneurs’ passion. We conducted statistical analyses to assess the extent to which entrepreneurs’ personal passion and investors’ perceived passion aligned. We then performed an inductive analysis of critical cases to identify specific cues that investors attributed to passion or lack thereof. Results and conclusions: We revealed a large misalignment between entrepreneurs’ personal passion and investors’ perceived passion. Our critical case analysis demonstrated that entrepreneurs’ weak or strong presentation skills led investors either to underestimate or overestimate, respectively, perceptions of entrepreneurs’ passion. We suggest that entrepreneurs should develop specific presentation skills and rhetorical strategies for displaying their passion; at the same time, investors should be wary of attending too closely to presentation skills when assessing passion.

    Keywords Communication effectiveness, oral communication, public speaking

  • Lisa Iverach, Mark Jones, Lauren F. McLellan, Heidi J. Lyneham, Ross G. Menzies, Mark Onslow, and Ronald M. Rapee, “Prevalence of anxiety disorders among children who stutter,” Journal of Fluency Disorders, 2016.

    Abstract Purpose: Stuttering during adulthood is associated with a heightened rate of anxiety disorders, especially social anxiety disorder. Given the early onset of both anxiety and stuttering, this comorbidity could be present among stuttering children. Method: Participants were 75 stuttering children 7–12 years and 150 matched non-stuttering control children. Multinomial and binary logistic regression models were used to estimate odds ratios for anxiety disorders, and two-sample t-tests compared scores on measures of anxiety and psycho-social difficulties. Results: Compared to non-stuttering controls, the stuttering group had six-fold increased odds for social anxiety disorder, seven-fold increased odds for subclinical generalized anxiety disorder, and four-fold increased odds for any anxiety disorder. Conclusion: These results show that, as is the case during adulthood, stuttering during childhood is associated with a significantly heightened rate of anxiety disorders. Future research is needed to determine the impact of those disorders on speech treatment outcomes.

    Keywords stuttering

  • Louise Cummings, Case Studies in Communication Disorders. New York: Cambridge University Press, 2016. get-book.cfm?BookID=109554.

    Abstract Designed for students of speech-language pathology, audiology and clinical linguistics, this valuable text introduces students to all aspects of the assessment, diagnosis and treatment of clients with developmental and acquired communication disorders through a series of structured case studies. Each case study includes questions which direct readers to important features of the case that will facilitate clinical learning. A selection of further readings encourages students to extend their knowledge of communication disorders. Key features of this book include: • 48 detailed case studies based on actual clients with communication disorders • 25 questions within each case study • Fully-worked answers to every question • 105 suggestions for further reading The text also develops knowledge of the epidemiology, aetiology, and linguistic and cognitive features of communication disorders, highlights salient aspects of client histories, and examines assessments and interventions used in the management of clients.

    Keywords cognitive science, General Linguistics, Neurolinguistics, psycholinguistics

  • Carolyn Mancuso, and Raymond G. Miltenberger, “Using habit reversal to decrease filled pauses in public speaking,” Journal of Applied Behavior Analysis, vol. 49, no. 1, 2016, pp. 188–192. DOI: 10.1002/jaba.267.

    Abstract This study evaluated the effectiveness of simplified habit reversal in reducing filled pauses that occur during public speaking. Filled pauses consist of “uh,” “um,” or “er”; clicking sounds; and misuse of the word “like.” After baseline, participants received habit reversal training that consisted of awareness training and competing response training. During postintervention assessments, all 6 participants exhibited an immediate decrease in filled pauses.

    Keywords awareness training, competing response training, habit reversal, public speaking

  • Martijn Wieling, Jack Grieve, Gosse Bouma, Josef Fruehwald, John Coleman, and Mark Liberman, “Variation and Change in the Use of Hesitation Markers in Germanic Languages,” Language Dynamics and Change, vol. 6, no. 2, 2016, pp. 199-234. DOI: 10.1163/22105832-00602001.

    Abstract In this study, we investigate crosslinguistic patterns in the alternation between UM, a hesitation marker consisting of a neutral vowel followed by a final labial nasal, and UH, a hesitation marker consisting of a neutral vowel in an open syllable. Based on a quantitative analysis of a range of spoken and written corpora, we identify clear and consistent patterns of change in the use of these forms in various Germanic languages (English, Dutch, German, Norwegian, Danish, Faroese) and dialects (American English, British English), with the use of UM increasing over time relative to the use of UH. We also find that this pattern of change is generally led by women and more educated speakers. Finally, we propose a series of possible explanations for this surprising change in hesitation marker usage that is currently taking place across Germanic languages.

    Keywords corpus linguistics, crosslinguistic change, hesitation markers, language change

  • Michael P. Boyle, Lauren Dioguardi, and Julie E. Pate, “A comparison of three strategies for reducing the public stigma associated with stuttering,” Journal of Fluency Disorders, vol. 50, 09/2016, pp. 44-58. DOI: 10.1016/j.jfludis.2016.09.004.

    Abstract Purpose. The effects of three anti-stigma strategies for stuttering—contact (hearing personal stories from an individual who stutters), education (replacing myths about stuttering with facts), and protest (condemning negative attitudes toward people who stutter)—were examined on attitudes, emotions, and behavioral intentions toward people who stutter. | Method. Two hundred and twelve adults recruited from a nationwide survey in the United States were randomly assigned to one of the three anti-stigma conditions or a control condition. Participants completed questionnaires about stereotypes, negative emotional reactions, social distance, discriminatory intentions, and empowerment regarding people who stutter prior to and after watching a video for the assigned condition, and reported their attitude changes about people who stutter. Some participants completed follow-up questionnaires on the same measures one week later. | Results. All three anti-stigma strategies were more effective than the control condition for reducing stereotypes, negative emotions, and discriminatory intentions from pretest to posttest. Education and protest effects for reducing negative stereotypes were maintained at one-week follow-up. Contact had the most positive effect for increasing affirming attitudes about people who stutter from pretest to posttest and pretest to follow-up. Participants in the contact and education groups, but not protest, self-reported significantly more positive attitude change about people who stutter as a result of watching the video compared to the control group. | Conclusion. Advocates in the field of stuttering can use education and protest strategies to reduce negative attitudes about people who stutter, and people who stutter can increase affirming attitudes through interpersonal contact with others.

    Keywords Anti-stigma programs, Empowerment, Public stigma, Stereotypes, Stuttering advocacy

  • Milly Heelan, Jan McAllister, and Jane Skinner, “Stuttering, alcohol consumption and smoking,” Journal of Fluency Disorders, vol. 48, 2016, pp. 27-34.

    Abstract Purpose: Limited research has been published regarding the association between stuttering and substance use. An earlier study provided no evidence for such an association, but the authors called for further research to be conducted using a community sample. The present study used data from a community sample to investigate whether an association between stuttering and alcohol consumption or regular smoking exists in late adolescence and adulthood. Methods: Regression analyses were carried out on data from a birth cohort study, the National Child Development Study (NCDS), whose initial cohort included 18,558 participants who have since been followed up until age 55. In the analyses, the main predictor variable was parent-reported stuttering at age 16. Parental socio-economic group, cohort member’s sex and childhood behavioural problems were also included. The outcome variables related to alcohol consumption and smoking habits at ages 16, 23, 33, 41, 46, 50 and 55. Results: No significant association was found between stuttering and alcohol consumption or stuttering and smoking at any of the ages. It was speculated that the absence of significant associations might be due to avoidance of social situations on the part of many of the participants who stutter, or adoption of alternative coping strategies. Conclusion: Because of the association between anxiety and substance use, individuals who stutter and are anxious might be found to drink or smoke excessively, but as a group, people who stutter are not more likely than those who do not to have high levels of consumption of alcohol or nicotine.

    Keywords Birth cohort

  • Nadia Brejon Teitler, Sandrine Ferré, and Clémentine Dailly, “Specific subtype of fluency disorder affecting French speaking children: A phonological analysis,” Journal of Fluency Disorders, vol. 50, 2016, pp. 33-43.

    Abstract Purpose Clinicians working with fluency disorders sometimes see children whose word repetitions are mostly located at the end of words and do not induce physical tension. Prior studies on the topic have proposed several names for these disfluencies including “end word repetitions”, “final sound repetitions” and “atypical disfluency”. The purpose of this study was to use phonological analysis to explore the patterns of this poorly recognized fluency disorder in order to better understand its specific speech characteristics. Methods We analyzed a spontaneous language sample of 8 French speaking children. Audio and video recordings allowed us to study general communication issues as well as linguistic and acoustical data. Results We did not detect speech rupture or coarticulation failures between the syllable onset and rhyme. The problem resides primarily on the rhyme production with a voicing interruption in the middle of the syllable nucleus or a repetition of the rhyme (nucleus alone or nucleus and coda), regardless of the position in the word or phrase. Conclusion The present study provides data suggesting that there exist major differences in syllable production between the disfluencies produced by our 8 children and stuttered disfluencies. Consequently, we believe that this fluency disorder should be recognized as distinct from stuttering.

    Keywords Syllable rhyme

  • Naomi Hertsberg, and Patricia M. Zebrowski, “Self-perceived competence and social acceptance of young children who stutter: Initial findings,” Journal of Communication Disorders, vol. 64, 2016, pp. 18-31.

    Abstract Purpose. The goals of this study were to determine whether young children who stutter (CWS) perceive their own competence and social acceptance differently than young children who do not stutter (CWNS), and to identify the predictors of perceived competence and social acceptance in young speakers. | Method. We administered the "Pictorial Scale of Perceived Competence and Social Acceptance for Young Children" (PSPCSA; Harter & Pike, 1984) to 13 CWS and 14 CWNS and examined group differences. We also collected information on the children’s genders, temperaments, stuttering frequencies, language abilities, and phonological skills to identify which of these factors predicted PSPCSA scores. | Results. CWS, as a group, did not differ from CWNS in their perceived general competence or social acceptance. Gender predicted scores of perceived general competence, and stuttering frequency predicted perceived social acceptance. Temperament, language abilities, and phonological skills were not significant predictors of perceived competence or social acceptance in our sample. | Conclusions. While CWS did not significantly differ from CWNS in terms of perceived competence and social acceptance, when both talker groups were considered together, girls self-reported greater perceived competence than boys. Further, lower stuttering frequency was associated with greater perceived social acceptance. These preliminary findings provide motivation for further empirical study of the psychosocial components of childhood stuttering. | Learning outcomes. Readers will be able to describe the constructs of perceived competence and social acceptance in young children, and whether early stuttering plays a role in the development of these constructs.

    Keywords children

  • Olga Kozar, “Teachers’ reaction to silence and teachers’ wait time in video and audioconferencing English lessons: Do webcams make a difference?,” System, 2016.

    Abstract There is a mismatch between an increasing number of people teaching languages via video or audioconferencing tools, and the amount of research available to such teachers to guide their practice. One particular pedagogical question that research does not provide guidance on is teachers’ treatment of silence during videoconferencing and audioconferencing lessons. This study uses Conversation Analysis to compare lessons conducted by the same teacher-student dyads in audio and videoconferencing. The findings show distinct differences in teachers’ treatment of silence and teachers’ and students’ pausing behaviour in video and audioconferencing. Specifically, teachers tended to wait longer in videoconferencing and took the conversational floor faster in audioconferencing, thus leading to a higher number of overlaps with students’ emergent turns. This suggests that teachers need to be trained for conducting lessons via audio and video conferencing, and that teachers and teacher trainers need to identify specific pedagogical behaviours for each of these contexts.

    Keywords Online language teaching

  • Mary Grantham O’Brien, “Methodological Choices in Rating Speech Samples,” Studies in Second Language Acquisition, vol. 38, 09/2016, pp. 587–605. DOI: 10.1017/S0272263115000418.

    Abstract Much pronunciation research critically relies upon listeners’ judgments of speech samples, but researchers have rarely examined the impact of methodological choices. In the current study, 30 German native listeners and 42 German L2 learners (L1 English) rated speech samples produced by English-German L2 learners along three continua: accentedness, fluency, and comprehensibility. The goal was to determine whether rating condition, that is, (a) whether each speech sample is rated along all three continua after it is heard once or (b) whether all speech samples are rated along one continuum before being rated along the next continuum, and continuum order (e.g., whether participants rate speech samples for accentedness before comprehensibility or fluency) have an effect on listeners’ ratings. Results indicate no significant overall effects of rating condition or continuum order, but there is evidence of rating condition effects by listener group. The results have implications for laboratory and classroom assessments of L2 speech.

  • Ross Menzies, Sue O’Brian, Robyn Lowe, Ann Packman, and Mark Onslow, “International Phase II clinical trial of CBTPsych: A standalone Internet social anxiety treatment for adults who stutter,” Journal of Fluency Disorders, vol. 48, 2016, pp. 35-43.

    Abstract Purpose CBTPsych is an individualized, fully automated, standalone Internet treatment program that requires no clinical contact or support. It is designed specifically for those who stutter. Two preliminary trials demonstrated that it may be efficacious for treating the social anxiety commonly associated with stuttering. However, both trials involved pre- and post-treatment assessment at a speech clinic. This contact may have increased compliance, commitment and adherence with the program. The present study sought to establish the effectiveness of CBTPsych in a large international trial with no contact of any kind from researchers or clinicians. Method Participants were 267 adults with a reported history of stuttering who were given a maximum of 5 months access to CBTPsych. Pre- and post-treatment functioning was assessed within the online program with a range of psychometric measures. Results Forty-nine participants (18.4%) completed all seven modules of CBTPsych and completed the post-treatment online assessments. That compliance rate was far superior to similar community trials of self-directed Internet mental health programs. Completion of the program was associated with large, statistically and clinically significant reductions for all measures. The reductions were similar to those obtained in earlier trials of CBTPsych, and those obtained in trials of in-clinic CBT with an expert clinician. Conclusions CBTPsych is a promising individualized treatment for social anxiety for a proportion of adults who stutter, which requires no health care costs in terms of clinician contact or support. Educational objectives The reader will be able to: (a) Discuss the reasons for investigating CBTPsych without any clinical contact; (b) Describe the main components of the CBTPsych treatment; (c) Summarize the results of this clinical trial; (d) Describe how the results might affect clinical practice, if at all.

    Keywords Stuttering, Cognitive behavior therapy, E-therapy, Internet

  • Benjamin G. Schultz, Irena O’Brien, Natalie Phillips, David H. McFarland, Debra Titone, and Caroline Palmer, “Speech rates converge in scripted turn-taking conversations,” Applied Psycholinguistics, vol. 37, 09/2016, pp. 1201–1220. DOI: 10.1017/S0142716415000545.

    Abstract When speakers engage in conversation, acoustic features of their utterances sometimes converge. We examined how the speech rate of participants changed when a confederate spoke at fast or slow rates during readings of scripted dialogues. A beat-tracking algorithm extracted the periodic relations between stressed syllables (beats) from acoustic recordings. The mean interbeat interval (IBI) between successive stressed syllables was compared across speech rates. Participants’ IBIs were smaller in the fast condition than in the slow condition; the difference between participants’ and the confederate’s IBIs decreased across utterances. Cross-correlational analyses demonstrated mutual influences between speakers, with greater impact of the confederate on participants’ beat rates than vice versa. Beat rates converged in scripted conversations, suggesting speakers mutually entrain to one another’s beat.

  • Ye Tian, Takehiko Maruyama, and Jonathan Ginzburg, “Self Addressed Questions and Filled Pauses: A Cross-linguistic Investigation,” Journal of Psycholinguistic Research, 12/2016, pp. 1–18. DOI: 10.1007/s10936-016-9468-5.

    Abstract There is an ongoing debate whether phenomena of disfluency (such as filled pauses) are produced communicatively. Clark and Fox Tree (Cognition 84(1):73–111, 2002) propose that filled pauses are words, and that different forms signal different lengths of delay. This paper evaluates this Filler-As-Words hypothesis by analyzing the distribution of self-addressed-questions or SAQs (such as “what’s the word”) in relation to filled pauses. We found that SAQs address different problems in different languages (most frequently about memory-retrieval in English and Chinese, and about appropriateness in Japanese). In relation to filled pauses, British but not American English uses “um” to signal a more severe problem than “uh”. Chinese uses different filled pauses to signal the syntactic category of the problem constituent. Japanese uses different filled pauses to signal levels of interaction with the interlocutor. Overall, our data supports the Filler-As-Words hypothesis that filled pauses are used communicatively. However, the dimensions of its meanings vary across languages and dialects.

    Keywords Cross-linguistic analysis, disfluency, filled pauses, Self addressed questions

  • Gunnel Tottie, “Planning what to say: Uh and um among the pragmatic markers,” in Outside the Clause: Form and function of extra-clausal constituents, John Benjamins, 2016, pp. 97-122. catalog/books/slcs.178.04tot/details.

    Abstract Based on data from the Santa Barbara Corpus of Spoken American English, this paper argues that the vocalizations [ə(:)] and [ə(:)m], usually transcribed 'uh' and 'um,' can be regarded as pragmatic markers, rather than as undesirable disfluencies or hesitation markers. It is shown that they are especially frequent in registers and contexts that require more planning by speakers, like narrative passages in conversation and in task-related contexts, especially in long turns. The term 'planner' is therefore proposed as an appropriate designation. Co-occurrences of 'uh' and 'um' with other pragmatic markers such as 'well, you know, I mean' and 'like' as well as with 'and' and 'but' are shown to support this view.

  • Vincent Hughes, Sophie Wood, and Paul Foulkes, “Strength of forensic voice comparison evidence from the acoustics of filled pauses,” International Journal of Speech Language and the Law, vol. 23, no. 1, 2016, pp. 99-132. DOI: 10.1558/ijsll.v23i1.29874.

    Abstract This study investigates the evidential value of filled pauses (FPs, i.e. um, uh) as variables in forensic voice comparison. FPs for 60 young male speakers of standard southern British English were analysed, drawn from Task 1 of the DyViS corpus (Nolan et al. 2009). The following acoustic properties were analysed: midpoint frequencies of the first three formants in the vocalic portion; ‘dynamic’ characterisations of formant trajectories (i.e. quadratic polynomial equations fitted to nine measurement points over the entire vowel); vowel duration; and nasal duration for um. Likelihood ratio (LR) scores were computed using the Multivariate Kernel Density formula (MVKD; Aitken and Lucy, 2004) and converted to calibrated log10 LRs (LLRs) using logistic-regression (Brümmer et al., 2007). System validity was assessed using both equal error rate (EER) and the log LR cost function (Cllr; Brümmer and du Preez, 2006). The system with the best performance combines dynamic measurements of all three formants with vowel and nasal duration for um, achieving an EER of 4.08% and Cllr of 0.12. In terms of general patterns, um consistently outperformed uh. For um, the formant dynamic systems generated better validity than those based on midpoints, presumably reflecting the additional degree of formant movement in um caused by the transition from vowel to nasal. By contrast, midpoints outperformed dynamics for the more monophthongal uh. Further, the addition of duration (vowel or vowel and nasal) consistently improved system performance. The study supports the view that FPs have excellent potential as variables in forensic voice comparison cases.

    Keywords durations, Forensic voice comparison, formant dynamics, hesitation markers, likelihood ratio

  • Vincenza Tudini, “Repair and codeswitching for learning in online intercultural talk,” System, 2016.

    Abstract This study examines the role of repair and code switching for language learning in online written interaction between two speakers of both Italian and English as, respectively, either an L1 or L2. Specifically, during episodes of general repair and corrective feedback, these geographically dispersed university language students used both languages in their repertoire as key interactional and learning resources to co-construct a language learning partnership and pursue affiliation. Despite the face-threatening nature of corrective feedback, also known as other-initiated other-repair, participants managed to construct and maintain intersubjectivity in the text chat environment by availing themselves of the reciprocal possibilities of their bilingual expertise, thus overcoming linguistic asymmetries. In this way both social and learning objectives were achieved during written talk-in-interaction, suggesting that online language learning partnerships with multilingual intercultural speakers of the target language rather than monolingual native speaker partners should be given a more prominent role in languages programs across sectors.

    Keywords Written talk-in-interaction

  • Yvonne Préfontaine, Judit Kormos, and Daniel Ezra Johnson, “How do utterance measures predict raters’ perceptions of fluency in French as a second language?,” Language Testing, vol. 33, no. 1, 2016, pp. 53-73. DOI: 10.1177/0265532215579530.

    Abstract While the research literature on second language (L2) fluency is replete with descriptions of fluency and its influence with regard to English as an additional language, little is known about what fluency features influence judgments of fluency in L2 French. This study reports the results of an investigation that analyzed the relationship between utterance fluency measures and raters’ perceptions of L2 fluency in French using mixed-effects modeling. Participants were 40 adult learners of French at varying levels of proficiency, studying in a university immersion context. Speech performances were collected on three different types of narrative tasks. Four utterance fluency measures were extracted from each performance. Eleven untrained judges rated the speech performances and we examined which utterance fluency measures are the best predictors of the scores awarded by the raters. The mean length of runs and articulation rate proved to be the most influential factors in raters’ judgments, while the frequency of pauses played a less important role. The length of pauses was positively related to fluency scores, indicating a prominent cross-linguistic variation specific to French. The relative importance of the utterance measures in predicting fluency ratings, however, was found to vary across tasks.

  • Peyman Zamani, Majid Ravanbakhsh, Farzad Weisi, Vahid Rashedi, Sara Naderi, Ayub Hosseinzadeh, and M Rezaei, “Effect(s) of Language Tasks on Severity of Disfluencies in Preschool Children with Stuttering,” Journal of Psycholinguistic Research, 05/2016. DOI: 10.1007/s10936-016-9437-z.

    Abstract Speech disfluency in children can be increased or decreased depending on the type of linguistic task presented to them. In this study, the effect of sentence imitation and sentence modeling on severity of speech disfluencies in preschool children with stuttering is investigated. In this cross-sectional descriptive analytical study, 58 children with stuttering (29 with mild stuttering and 29 with moderate stuttering) and 58 typical children aged between 4 and 6 years old participated. The severity of speech disfluencies was assessed by SSI-3 and TOCS before and after offering each task. In boys with mild stuttering, the mean stuttering severity scores in two tasks of sentence imitation and sentence modeling were 21.81±1.72 and 12.94±1.38 respectively (P=0.837). But, in boys with moderate stuttering the stuttering severity in the both tasks were 23.79±1.26 and 29.00±2.03 respectively (P=0.004). In girls with mild stuttering, the stuttering severity in two tasks of sentence imitation and sentence modeling were 13.14±2.47 and 13.86±2.03 respectively (P=0.094). But, in girls with moderate stuttering the mean stuttering severity in the both tasks were 25.27±1.93 and 33.18±2.32 respectively (P=0.007). In both gender of typical children, the score of speech disfluencies had no significant difference between two tasks (P>0.05). In preschool children with mild stuttering and peer non-stutters, performing the tasks of sentence imitation and sentence modeling could not increase the severity of stuttering. But, in preschool children with moderate stuttering, doing the task of sentence modeling increased the stuttering severity score.


  • Malte Belz, and Uwe Reichel, “Pitch Characteristics of Filled Pauses,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract We investigate the pitch characteristics of filled pauses in order to distinguish between hesitational and floor-holding functions of filled pauses. A corpus of spontaneous dialogues is explored using a parametric bottom-up approach to extract intonation contours. We find that subjects tend to utter filled pauses more prominently when they cannot see each other, which indicates an increased floor-holding usage of filled pauses in this condition.

    Keywords disfluencies, DiSS, filled pauses, floor-holding, intonation

  • Hans Rutger Bosker, and Eva Reinisch, “Normalization for Speechrate in Native and Nonnative Speech,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0324.1-5.

    Abstract Speech perception involves a number of processes that deal with variation in the speech signal. One such process is normalization for speechrate: local temporal cues are perceived relative to the rate in the surrounding context. It is as yet unclear whether and how this perceptual effect interacts with higher level impressions of rate, such as a speaker’s nonnative identity. Nonnative speakers typically speak more slowly than natives, an experience that listeners take into account when explicitly judging the rate of nonnative speech. The present study investigated whether this is also reflected in implicit rate normalization. Results indicate that nonnative speech is implicitly perceived as faster than temporally-matched native speech, suggesting that the additional cognitive load of listening to an accent speeds up rate perception. Therefore, rate perception in speech is not dependent on syllable durations alone but also on the ease of processing of the temporal signal.

    Keywords cognitive load, implicit processing, nonnative speech, speech perception, speechrate

  • Hans Rutger Bosker, Jade Tjiong, Hugo Quené, Ted Sanders, and Nivja De Jong, “Both native and non-native disfluencies trigger listeners’ attention,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract Disfluencies, such as uh and uhm, are known to help the listener in speech comprehension. For instance, disfluencies may elicit prediction of less accessible referents and may trigger listeners’ attention to the following word. However, recent work suggests differential processing of disfluencies in native and non-native speech. The current study investigated whether the beneficial effects of disfluencies on listeners’ attention are modulated by the (non-)native identity of the speaker. Using the Change Detection Paradigm, we investigated listeners’ recall accuracy for words presented in disfluent and fluent contexts, in native and non-native speech. We observed beneficial effects of both native and non-native disfluencies on listeners’ recall accuracy, suggesting that native and non-native disfluencies trigger listeners’ attention in a similar fashion.

    Keywords attention, Change Detection Paradigm, disfluencies, DiSS, non-native speech

  • Angelika Braun, and Annabelle Rosin, “On the Speaker-Specificity of Hesitation Markers,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0731.1-5.

    Abstract The occurrence of hesitation markers is generally considered to be part of the verbal planning process. It is also a feature which is of potential importance to the forensic application of phonetics if hesitation behaviour could be linked to individual speakers. This study examines a total of eight female speakers on three different days. It can be demonstrated that, even though results vary across sessions, subjects exhibit distinct patterns of hesitation marker usage. This pertains to the number as well as the type of hesitations marker, which makes this feature a potential candidate for forensic investigations.

    Keywords forensic phonetics, verbal planning

  • Vera Cabarrão, Helena Moniz, Jaime Ferreira, and Fernando Batista, “Prosodic Classification of Discourse Markers,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0634.1-5.

    Abstract The first contribution of this study is the description of the prosodic behavior of discourse markers present in two speech corpora of European Portuguese (EP) in different domains (university lectures, and map-task dialogues). The second contribution is a multiclass classification to verify, given their prosodic features, which words in both corpora are classified as discourse markers, which are disfluencies, and which correspond to words that are neither markers nor disfluencies (chunks). Our goal is to automatically predict discourse markers and include them in rich transcripts, along with other structural metadata events (e.g., disfluencies and punctuation marks) that are already encompassed in the language models of our in-house speech recognizer. Results show that the automatic classification of discourse markers is better for the lectures corpus (87%) than for the dialogue corpus (84%). Nonetheless, in both corpora, discourse markers are more easily confused with chunks than with disfluencies.

    Keywords Dialogues, Discourse markers, Lectures, prosody, Structural Metadata Events

  • Rasmus Dall, Mirjam Wester, and Martin Corley, “Disfluencies in change detection in natural, vocoded and synthetic speech,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract In this paper, we investigate the effect of filled pauses, a discourse marker and silent pauses in a change detection experiment in natural, vocoded and synthetic speech. In natural speech change detection has been found to increase in the presence of filled pauses, we extend this work by replicating earlier findings and explore the effect of a discourse marker, like, and silent pauses. Furthermore we report how the use of "unnatural" speech, namely synthetic and vocoded, affects change detection rates. It was found that the filled pauses, the discourse marker and silent pauses all increase change detection rates in natural speech, however in neither synthetic nor vocoded speech did this effect appear. Rather, change detection rates decreased in both types of "unnatural" speech compared to natural speech. The natural results suggests that while each type of pause increase detection rates, the type of pause may have a further effect. The "unnatural" results suggest that it is not the full pipeline of synthetic speech that causes the degradation, but rather that something in the pre-processing, i.e. vocoding, of the speech database limits the resulting synthesis.

    Keywords change detection, DiSS, filled pauses, speech synthesis

  • Nivja H. de Jong, Rachel Groenhout, Rob Schoonen, and Jan H. Hulstijn, “Second language fluency: Speaking style or proficiency? Correcting measures of second language fluency for first language behavior,” Applied Psycholinguistics, vol. 36, no. 2, March 2015, pp. 223-243. DOI: 10.1017/S0142716413000210.

    Abstract In second language (L2) research and testing, measures of oral fluency are used as diagnostics for proficiency. However, fluency is also determined by personality or speaking style, raising the question to what extent L2 fluency measures are valid indicators of L2 proficiency. In this study, we obtained a measure of L2 (Dutch) proficiency (vocabulary knowledge), L2 fluency measures, and fluency measures that were corrected for first language behavior from the same group of Turkish and English native speakers (N = 51). For most measures of fluency, except for silent pause duration, both the corrected and the uncorrected measures significantly predicted L2 proficiency. For syllable duration, the corrected measure was a stronger predictor of L2 proficiency than was the uncorrected measure. We conclude that for L2 research purposes, as well as for some types of L2 testing, it is useful to obtain corrected measures of syllable duration to measure L2-specific fluency.

  • Mark Dingemanse, Seán G. Roberts, Julija Baranova, Joe Blythe, Paul Drew, Simeon Floyd, Rosa S. Gisladottir, Kobin H. Kendrick, Stephen C. Levinson, Elizabeth Manrique, Giovanni Rossi, and N. J. Enfield, “Universal Principles in the Repair of Communication Problems,” PLoS ONE, vol. 10, no. 9, September 2015, pp. e0136100. DOI: 10.1371/journal.pone.0136100.

    Abstract There would be little adaptive value in a complex communication system like human language if there were no ways to detect and correct problems. A systematic comparison of conversation in a broad sample of the world’s languages reveals a universal system for the real-time resolution of frequent breakdowns in communication. In a sample of 12 languages of 8 language families of varied typological profiles we find a system of ‘other-initiated repair’, where the recipient of an unclear message can signal trouble and the sender can repair the original message. We find that this system is frequently used (on average about once per 1.4 minutes in any language), and that it has detailed common properties, contrary to assumptions of radical cultural variation. Unrelated languages share the same three functionally distinct types of repair initiator for signalling problems and use them in the same kinds of contexts. People prefer to choose the type that is the most specific possible, a principle that minimizes cost both for the sender being asked to fix the problem and for the dyad as a social unit. Disruption to the conversation is kept to a minimum, with the two-utterance repair sequence being on average no longer than the single utterance which is being fixed. The findings, controlled for historical relationships, situation types and other dependencies, reveal the fundamentally cooperative nature of human communication and offer support for the pragmatic universals hypothesis: while languages may vary in the organization of grammar and meaning, key systems of language use may be largely similar across cultural groups. They also provide a fresh perspective on controversies about the core properties of language, by revealing a common infrastructure for social interaction which may be the universal bedrock upon which linguistic diversity rests.

  • Stephanie Don, and Robin Lickley, “Uh I forgot what I was going to say: How memory affects fluency,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract Disfluency rates vary considerably between individuals. Previous studies have considered gender, age and conversational roles amongst other factors that may affect fluency. Testing a nonclinical, gender-balanced population of young adults performing the same speaking tasks, this study explores how inter-speaker variations in working memory and in long-term (lexical) memory affect disfluency in two different ways. Working memory was tested by a forward digit span test; long-term lexical memory was tested by the Verbal Fluency Test, both semantic and phonological versions. In addition, each participant provided 3 one-minute samples of monologue speech. The speech samples were analysed for disfluencies. Speakers with lower working memory scores produced more error repairs in running speech. Speakers with lower lexical access scores produced a higher rate of hesitations. The two types of memory affected fluency in different ways.

    Keywords DiSS, error repair, hesitation, long term lexical memory, working memory

  • Robert Eklund, Peter Fransson, and Martin Ingvar, “Neural correlates of the processing of unfilled and filled pauses,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract Spontaneously produced Unfilled Pauses (UPs) and Filled Pauses (FPs) were played to subjects in an fMRI experiment. While both stimuli resulted in increased activity in the Primary Auditory Cortex, FPs, unlike UPs, also elicited modulation in the Supplementary Motor Area, Brodmann Area 6. This observation provides neurocognitive confirmation of the oft-reported difference between FPs and other kinds of speech disfluency and also could provide a partial explanation for the previously reported beneficial effect of FPs on reaction times in speech perception. The results are discussed in the light of the suggested role of FPs as floor-holding devices in human polylogs.

    Keywords Auditory Cortex, BA6, Brodmann Area 6, DiSS, filled pauses, fMRI, PAC, SMA, speech disfluency, speech perception, spontaneous speech, Supplementary Motor Area, unfilled pauses

  • Ewa Guz, “Establishing the Fluency Gap Between Native and Non-Native-Speech,” Research in Language, vol. 13, no. 3, 2015. DOI: 10.1515/rela-2015-0021.

    Abstract Although various dimensions of speech fluency have so far generated a great deal of research interest, very few accounts have tackled the issue of the relationship between L1 and L2 fluency. Also, little empirical evidence has been provided to support the claim that language users are more fluent in their mother tongue than in a foreign/second language. This study examines the fluency gap between L1 and L2 fluency using a battery of objectively quantifiable temporal measures of speed and breakdown fluency. It also attempts to identify those temporal fluency variables which are affected by the individual way of speaking rather than the degree of automatisation of speech processing and which underlie oral performance both in L1 and L2. The analysis draws on transcriptions of elicited speech samples in L1 (Polish) and L2 (English).

    Keywords breakdown fluency, hesitation phenomena, L1/ L2 speech fluency, pausing, speech rate, speed fluency, temporal measures of fluency

  • Elena Galkina, “Processing of Garden-Path Sentences Containing Silent and Filled Pauses in Stuttered Speech: Evidence From a Comprehensive Study,” Master's Thesis, University of South Carolina - Columbia, Columbia, South Carolina, USA, 2015.

    Abstract Disfluency is common in spontaneous speech. Self-correction is a type of disfluency that consists of reparandum, filler, and repair (Levelt, 1989). Little is known about the processing of self-corrections in a normally disfluent speech, and even less is known about its processing in atypically disfluent speech (e.g. speech in patients with autism spectrum disorder, hearing impaired, patients with brain damage, and stuttered speech; see: Lake, Humphreys, & Cardy, 2011; Lind, Hickson, & Erber, 2004; Plexico et al., 2010; Rossi et al., 2011; Yairi, Gintautas, & Avent, 1981). This study focuses on self-correction disfluencies in garden-path sentences and employs a behavioral data collection method to investigate how disfluencies are processed as they are heard. This experiment examines spoken language comprehension by measuring accuracy and response time to comprehension questions. The data was gathered and analyzed. Two experimental conditions were presented where in the first one normal speakers listened to typically disfluent speech, and in the second one normal speakers listened to atypically disfluent stuttered speech. The information about the speakers in the recorded stimuli was kept from the listeners. Fillers, such as uh and um are common in stuttered speech because of their helpful role in starting an utterance. In stuttered speech, the uhs, ums and pauses tend to be longer and in odd places, relative to the speech of people who do not stutter. Therefore, the hypothesis of this study was that the fillers and pauses made by people who stutter affect the dynamics of processing, particularly in garden-path sentences. Namely, the accuracy rate for the comprehensive questions was predicted to be lower for the garden-path filled pause sentences, particularly for atypical speaker condition. Reaction time was predicted to be longer for the same condition. 
The analysis revealed an accuracy measure dependence on the speaker condition but no significant time correlation. This study provides significant information about how normal speakers’ comprehension is affected by disfluency such as pauses in general, and how speech impairment, such as stuttering, affects the processing of filled and silent pause disfluencies.

  • Lorenzo García-Amaya, “A longitudinal study of filled pauses and silent pauses in second language speech,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract This study provides a longitudinal analysis of speech rate and the use of filled pauses (FPs) and unfilled or silent pauses (SPs) in the oral production of L2 learners of Spanish in two learning contexts: a 6-week intensive overseas immersion program (OIM), and a 15-week US-based ‘at-home’ foreign language classroom (AH). Fifty-six native speakers of English performed two video-retell tasks at three different time points. A total of five measurements of oral production were calculated. The results show a significant increase in rate of speech over time in the OIM group compared to the AH group. Additionally, the OIM learners show greater use of “disfluencies” over time, namely FPs and short SPs. We suggest that OIM learners increase their use of hesitation phenomena over time as a speech processing and planning strategy and discuss this finding within the framework of L2 cognitive fluency.

    Keywords disfluencies, DiSS, filled pauses, rate of speech, second language fluency, silent pauses, Spanish, study abroad

  • Emer Gilmartin, Carl Vogel, and Nick Campbell, “Disfluency in multiparty social talk,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract Much research on disfluencies in spontaneous spoken interaction has been carried out on corpora of task-based conversations, resulting in greater understanding of the role of several phenomena. Modern multimodal corpora allow the full spectrum of signals in face to face communication to be analysed. However, the ‘unmarked’ case of casual conversation or social talk with no obvious short-term instrumental goal has been less studied in this manner. Corpus-based work on social talk tends to deal with short dyadic interactions, although the norm for social conversation is for longer multiparty interaction. In this paper, we outline our programme of exploratory studies of disfluency in a longer multiparty conversation. We briefly describe the background to our research goals, and then report on the collection, transcription, and annotation of the data for our experiments. We present and discuss some of our early results.

    Keywords casual conversation, disfluency, DiSS, hesitation, repair, spoken interaction

  • Iulia Grosman, “Complexity cues or attention triggers? Repetitions and editing terms for native speakers of French,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract A growing stream of research shows evidence of the metalinguistic information that disfluencies (silent and filled pauses, repetitions, false-starts, repairs, etc.) can display to listeners. As a result, disfluencies may work as fluent devices. By means of decision task latencies, this study investigates whether lexical repetition co-occurring with an editing term affects the perception of native speakers of French. There is a lack of consensus in the literature: do disfluencies trigger conceptual priming of a complex entity or act simply as attention cues? Results from multiple analyses of variance and linear mixed-effect modelling show that the presence of a disfluency triggers a faster response from the participant, however complex the following noun-phrase might be, supporting the hypothesis that repetition and co-occurring editing terms act as cognitive signposts rather than as cues of complexity of an upcoming event.

    Keywords disfluencies, DiSS, French, perception, prosody, reaction time, repetitions

  • Sandra Götz, “Fluency in ENL, ESL and EFL: A corpus-based approach,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract Against the background of a ‘cline model’ of increasing fluency/decreasing disfluency from ENL to ESL to EFL forms of English, the present pilot study investigates (dis)fluency features in British English, Sri Lankan English and German Learner English. The analysis of selected variables of temporal fluency (viz. unfilled pauses, mean length of runs) and fluency-enhancement strategies (viz. discourse markers, smallwords and repeats) is based on the c. 40,000-word subcorpora of the British and the Sri Lankan components of the International Corpus of English (ICE-GB and ICE-SL) and the c. 80,000-word German component of the Louvain International Database of Spoken English Interlanguage (LINDSEI-GE). The study reveals that, while the EFL variant shows the lowest degree of temporal fluency (e.g. the highest number of unfilled pauses), the findings are mixed for ESL and ENL (e.g. the ESL speakers show a lower number of unfilled pauses, but the ENL speakers show a higher number of smallwords). Also, variant-specific preferences of using certain fluency-enhancement strategies become clearly visible.

    Keywords corpus-based (dis)fluency, DiSS, ENL vs. ESL vs. EFL, Fluency, fluency profiles

  • Zara Harmon, and Vsevolod Kapatsinski, “Studying the dynamics of lexical access using disfluencies,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract Faced with planning problems related to lexical access, speakers take advantage of a major function of disfluencies: buying time. It is reasonable, then, to expect that the structure of disfluencies sheds light on the mechanisms underlying lexical access. Using data from the Switchboard Corpus, we investigated the effect of semantic competition during lexical access on repetition disfluencies. We hypothesized that the more time the speaker needs to access the following unit, the longer the repetition. We examined the repetitions preceding verbs and nouns and tested predictors influencing the accessibility of these items. Results suggest that speed of lexical access negatively correlates with the length of repetition and that the main determinants of lexical access speed differ for verbs and nouns. Longer disfluencies before verbs appear to be due to significant paradigmatic competition from semantically similar verbs. For nouns, they occur when the noun is relatively unpredictable given the preceding context.

    Keywords DiSS, lexical access, lexicalization, repetition, semantic competition, sentence planning

  • Clara Hedenqvist, Frida Persson, and Robert Eklund, “Disfluency incidence in 6-year old Swedish boys and girls with typical language development,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract This paper reports the prevalence of disfluencies in a group of 55 (25F/30M) Swedish children with typical speech development, and within the age range 6;0 and 6;11. All children had Swedish as their mother tongue. Speech was elicited using an “event picture” which the children described in their own, spontaneously produced, words. The data were analysed with regard to sex differences and lexical ability, including size of vocabulary and word retrieval, which was assessed using the two tests Peabody Picture Vocabulary Test and Ordracet. Results showed that girls produced significantly more unfilled pauses, prolongations and sound repetitions, while boys produced more word repetitions. However, no correlation with lexical development was found. The results are of interest to speech pathologists who study early speech development in search for potential early predictors of speech pathologies.

    Keywords children, DiSS, lexical development, sex differences, speech disfluency

  • Julian Hough, Laura de Ruiter, Simon Betz, and David Schlangen, “Disfluency and laughter annotation agreement in a light-weight dialogue mark-up protocol,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract Despite a great deal of research effort, disfluency and laughter annotation is still an unsolved problem, both in terms of consensus for a generally applicable system, and in terms of annotation agreement metrics. In this paper we present a new annotation scheme within a light-weight mark-up for spontaneous speech. We show that, despite the low overhead required for understanding the annotation protocol, it allows for good inter-annotator agreement and can be used to map onto existing disfluency categorization, with no loss of information.

    Keywords disfluency annotation, DiSS, German corpora, inter-annotator agreement, laughter, spontaneous speech

  • Peter Howell, “Intervention for children with word-finding difficulty: Impact on fluency during spontaneous speech for children using English as their native or as an additional language,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract Types of intervention that could be targeted when there are high rates of word-finding difficulty were examined for any impact they had on speech fluency (whole-word repetition rate in particular). Results are reported that are interpreted as showing that a semantic-based intervention has an impact on fluency as well as word-finding.

    Keywords DiSS, EAL, intervention, stuttering, word-finding

  • Jennifer E. Mack, Sarah D. Chandler, Aya Meltzer-Asscher, Emily Rogalski, Sandra Weintraub, M.-Marsel Mesulam, and Cynthia K. Thompson, “What do pauses in narrative production reveal about the nature of word retrieval deficits in PPA?,” Neuropsychologia, vol. 77, 2015, pp. 211-222.

    Abstract Naming and word-retrieval deficits, which are common characteristics of primary progressive aphasia (PPA), differentially affect production across word classes (e.g., nouns, verbs) in some patients. Individuals with the agrammatic variant (PPA-G) often show greater difficulty producing verbs whereas those with the semantic variant (PPA-S) show greater noun deficits and those with logopenic PPA (PPA-L) evince no clear-cut differences in production of the two word classes. To determine the source of these production patterns, the present study examined word-finding pauses as conditioned by lexical variables (i.e., word class, frequency, length) in narrative speech samples of individuals with PPA-S (n=12), PPA-G (n=12), PPA-L (n=11), and cognitively healthy controls (n=12). We also examined the relation between pause distribution and cortical atrophy (i.e., cortical thickness) in nine left hemisphere regions of interest (ROIs) linked to word production. Results showed higher overall pause rates for PPA compared to unimpaired controls; however, greater naming severity was not associated with increased pause rate. Across all groups, more pauses were produced before lower vs. higher frequency words, with no independent effects of word length after controlling for frequency. With regard to word class, the PPA-L group showed a higher rate of pauses prior to production of nouns compared to verbs, consistent with noun-retrieval deficits arising at the lemma level of word production. Those with PPA-G and PPA-S, like controls, produced similar pause rates across word classes; however, lexical simplification (i.e., production of higher-frequency and/or shorter words) was evident in the more-impaired word class: nouns for PPA-S and verbs for PPA-G. 
These patterns are consistent with conceptual and/or lemma-level impairments for PPA-S, predominantly affecting objects/nouns, and a lemma-level verb-retrieval deficit for PPA-G, with a concomitant impairment in phonological encoding and articulation affecting overall pause rates. The greater tendency to pause before nouns was correlated with atrophy in the left precentral gyrus, inferior frontal gyrus and inferior parietal lobule, whereas the greater tendency to pause before less frequent and longer words was associated with atrophy in left precentral and inferior parietal regions.

    Keywords Brain–behavior relationship

  • Hanae Koiso, and Yasuharu Den, “Causal analysis of acoustic and linguistic factors related to speech planning in Japanese monologs,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract In this paper, we applied a general method of testing path models, investigating causal relationship between cognitive load in speech planning and four types of disfluencies in Japanese monologs. The four disfluencies examined were i) clause-initial fillers, ii) inter-clausal pauses, iii) clause-final lengthening, and iv) boundary pitch movements, which occurred at weak clause boundaries. The length of the constituents following weak clause boundaries was assumed to be a measure of the complexity affecting the cognitive load. By using a model selection technique based on the AIC, we found an optimal model with the smallest AIC, in which the constituent complexity had direct effects on all of the four disfluency variables. In addition, some of the disfluencies influenced one another; clause-final lengthening was enhanced by the presence of a boundary pitch movement and the occurrence of clause-initial fillers was affected by all the other three disfluency variables.

    Keywords boundary pitch movements, clause-final lengthening, DiSS, fillers, path models, pauses

  • Marie-José Kolly, Adrian Leemann, Philippe Boula de Mareüil, and Volker Dellwo, “Speaker-Idiosyncrasy in Pausing Behavior: Evidence from a Cross-Linguistic Study,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0294.1-5.

    Abstract Phoneticians study acoustic speech signals. But what about the aspects of speech where the signal is silent? The present study investigated speakers’ pausing behavior in their native and non-native speech. Pausing measures were applied in order to study between-speaker and within-speaker variability, where within-speaker variability was introduced by recording speakers in their native Zurich German, and in their second languages English and French. Results showed that pausing measures in the form of pause numbers and pause durations are speaker-specific. Furthermore, this speaker-specificity became evident across different languages. Results are discussed in the context of forensic voice comparison.

    Keywords forensic phonetics, pausing, second language, speaker-idiosyncrasy, temporal features

  • Jixing Li, and Sam Tilsen, “Phonetic Evidence for Two Types of Disfluency,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0766.1-5.

    Abstract Disfluency, such as pause (silences), filled pause (e.g., ‘um’, ‘uh’), repetition (e.g., ‘the the’) and cutoff word (e.g., ‘hori[zontal]-’), is a common part of human speech that occurs at a rate of 6 to 10 per 100 words [2, 5]. According to one model of speech production [8], there are two types of disfluency: disfluency at the internal planning stage (e.g., word-retrieval difficulties), and disfluency at the external monitoring stage (e.g., self-correction of speech errors). The current study provides phonetic evidence for the two types of disfluency by examining word durations before different types of disfluency in the Switchboard corpus [6]. The results showed only a marginal increase in the durations of words before cutoffs, but a large increase in the durations of words before repetitions, silences and filled pauses, suggesting internal processing difficulty before noncutoff disfluency, but not before cutoff disfluency.

    Keywords disfluency, duration, self-monitoring, Switchboard

  • Yan-Hua Long, and Hong Ye, “Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech,” PLoS ONE, vol. 10, no. 4, April 2015. DOI: 10.1371/journal.pone.0123466.

    Abstract Nowadays, although automatic speech recognition has become quite proficient in recognizing or transcribing well-prepared fluent speech, the transcription of speech that contains many disfluencies remains problematic, such as spontaneous conversational and lecture speech. Filled pauses (FPs) are the most frequently occurring disfluencies in this type of speech. Most recent studies have shown that FPs are widely believed to increase the error rates for state-of-the-art speech transcription, primarily because most FPs are not well annotated or provided in training data transcriptions and because of the similarities in acoustic characteristics between FPs and some common non-content words. To enhance the speech transcription system, we propose a new automatic refinement approach to detect FPs in British English lecture speech transcription. This approach combines the pronunciation probabilities for each word in the dictionary and acoustic language model scores for FP refinement through a modified speech recognition forced-alignment framework. We evaluate the proposed approach on the Reith Lectures speech transcription task, in which only imperfect training transcriptions are available. Successful results are achieved for both the development and evaluation datasets. Acoustic models trained on different styles of speech genres have been investigated with respect to FP refinement. To further validate the effectiveness of the proposed approach, speech transcription performance has also been examined using systems built on training data transcriptions with and without FP refinement.

  • Kikuo Maekawa, and Hiroki Mori, “Voice quality analysis of Japanese filled pauses: a preliminary report,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract Using the Core of the Corpus of Spontaneous Japanese, acoustic analysis of F1, spectral tilt (TL), H1-H2, jitter and F0 was conducted to examine the voice-quality difference between the vowels in filled pauses and those in ordinary lexical items. It turned out by simple SVM analysis that the two classes of vowels could be discriminated with the mean accuracy of higher than 70%.

    Keywords DiSS

  • Kirsty McDougall, Martin Duckworth, and Toby Hudson, “Individual and Group Variation in Disfluency Features: A Cross-Accent Investigation,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0308.1-5.

    Abstract A study of individual differences in the fluency disruptions of speakers of two different accents, Standard Southern British English (SSBE) and York English is presented. Distributions of rates of occurrence per 100 syllables are examined for filled and silent pauses, repetitions, prolongations and (self-)interruptions, and subcategories of these. Patterns of occurrence of disfluency features show considerable between-speaker variation in both SSBE and York English. Similar ranges of speakers’ overall disfluency rates are exhibited by both accents, but cross-accent differences are present in the patterning of some disfluency feature categories. The results suggest that a detailed record of disfluency features is a useful additional tool in forensic speaker comparison.

    Keywords accent differences, disfluency, forensic speaker comparison, individual differences

  • Helena Moniz, Jaime Ferreira, Fernando Batista, and Isabel Trancoso, “Disfluency detection across domains,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract This paper focuses on disfluency detection across distinct domains using a large set of openSMILE features, derived from the Interspeech 2013 Paralinguistic challenge. Amongst different machine learning methods being applied, SVMs achieved the best performance. Feature selection experiments revealed that the dimensionality of the larger set of features can be further reduced at the cost of a small degradation. Different models trained with one corpus were tested on the other corpus, revealing that models can be quite robust across corpora for this task, despite their distinct nature. We have conducted additional experiments aiming at disfluency prediction in the context of IVR systems, and results reveal that there is no substantial degradation on the performance, encouraging the use of the models in IVR domains.

    Keywords acoustic-prosodic features, cross-domain analysis, disfluency detection, DiSS, European Portuguese

  • Helena Moniz, A. Pompili, Fernando Batista, Isabel Trancoso, A. Abad, and C. Amorim, “Automatic Recognition of Prosodic Patterns in Semantic Verbal Fluency Tests – An Animal Naming Task for Edutainment Applications,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0997.1-5.

    Abstract This paper automatically detects prosodic patterns in the domain of semantic fluency tests. Verbal fluency tests aim at evaluating the spontaneous production of words under constrained conditions. Mostly used for assessing cognitive impairment, they can be used in a plethora of domains, such as edutainment applications or games with educational purposes. This work discriminates between list effects, disfluencies, and other linguistic events in an animal naming task. Recordings from 42 Portuguese speakers were automatically recognized and AuToBI was applied in order to detect prosodic patterns, using both European Portuguese and English models. Both models made it possible to differentiate list effects from the other events, mostly represented by the tunes: L* H/L(-%) (English models) or L*+H H/L(-%) (Portuguese models). However, English models proved to be more suitable because they rely on substantially more training material.

    Keywords Automatic Speech Recognition, Edutainment, prosody, Semantic Fluency

  • Sieb Nooteboom, and Hugo Quené, “The Word-Onset Effect: Some Contradictory Findings,” 2015.

    Abstract In this paper we describe two experiments exploring possible reasons for earlier conflicting results concerning the so-called word-onset effect in interactional segmental speech errors. Experiment 1 elicits errors in pairs of CVC real words with the SLIP technique. No word-onset effect is found. Experiment 2 is a tongue-twister experiment with lists of four disyllabic words. A significant word-onset effect is found. The conflicting results are not resolved. We also found that intervocalic consonants hardly ever interact with initial and final consonants, and that words sharing a stress pattern are a major factor in generating interactional errors.

  • Núria Enríquez, Lourdes Díaz, and Mariona Taulé, “Mental Processes in the Oral Production of Non-Native Spanish Speakers: Pauses and Self-Correction,” Procedia - Social and Behavioral Sciences, vol. 173, 2015, pp. 24-30. DOI:

    Abstract In the field of teaching Spanish as a Foreign Language (SFL), textbooks and teaching materials often provide learners with language samples characterized by a lack of naturalness. We propose the use of a prototypical model of core competence, obtained from the analysis of communicative situations based on real corpora and the comparison of the same type of work with native and non-native speakers. The specific objective is the study of communication strategies related to pauses and self-correction in native and non-native speech, in order to analyse the repair strategies related to language processing.

    Keywords L1/L2 corpora

  • Leendert Plug, “Prosodic Marking and Predictability in Lexical Self-Repair,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0032.1-5.

    Abstract This paper reports on an investigation of lexical self-repair in Dutch spontaneous dialogue. Lexical self-repairs, in which one word is rejected for another, can be produced with or without notable ‘prosodic marking’ of the second word. It remains unclear what motivates speakers’ choices, but previous research has shown that the semantic distance between the two words is relevant. This study assesses the relevance of the words’ predictability. Prosodic marking judgements are modelled using an established semantic classification and a range of probabilistic variables, including both frequency-based and cloze-based measures. Results suggest that probabilistic measures add little predictive power to the semantic classification, although informative data trends can be observed.

    Keywords Dutch, predictability, prosody, self-repair, spontaneous speech

  • Sandra Reitbrecht, and Ursula Hirschfeld, “The Impact of Fluency and Hesitation Phenomena on the Perception of Non-native Speakers by Native Listeners of German,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0166.1-4.

    Abstract The here presented and ongoing study addresses L2 fluency and hesitation phenomena in the context of speech effects in intercultural communication. It investigates the impact of fluency and hesitation phenomena on the perception of non-native speakers by native listeners of German. The first results underline the importance and salience of hesitation phenomena and fluency for speech effects and suggest a higher consideration of these features in future studies. Native recipients’ verbal reactions to L2 speech material show that they often make reference to features of L2 utterance fluency to explain how they perceive non-native speakers, their personality and their emotional state. Furthermore, Spearman’s rank correlation tests for a certain number of fixed perceptual categories prove significant correlations between perceived fluency and the attributes assured (r(309)=0.617, p<0.01), well prepared (r(303)=0.589, p<0.01), competent (r(305)=0.483, p<0.01), relaxed (r(307)=0.375, p<0.01) and nervous (r(309)=-0.322, p<0.01).

    Keywords Czech, Fluency, French, German as a foreign language, speech effects

  • Ralph Rose, “Um and uh as differential delay markers: the role of contextual factors,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract The English filled pauses uh and um have been argued to correspond respectively to shorter and longer anticipated delays in speech production. This study looks at some contextual factors that might cause this difference by investigating filled pause instances in monologue and conversation speech corpora. Results are consistent with previously observed delay differences and further show that discourse-level processing may influence differential delay marking though monologue results are more conclusive than conversation results. However, no evidence was found that lexical factors (word type, frequency) correlate with filled pause choice. The findings suggest a limited view of how speakers use filled pauses as delay markers: Not all contextual factors may trigger differential delay marking.

    Keywords contextual factors, delay, DiSS, filled pause

  • Ralph Rose, “Temporal Variables in First and Second Language Speech and Perception of Fluency,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0405.1-5.

    Abstract Evidence is accumulating that many temporal features of second language speech are correlated with those of first language speech. This study looks at the correlation between articulation rate, pause rate, and mean pause duration in Japanese first and English second language speech and how second language fluency raters perceive these. In a crosslinguistic corpus of spontaneous speech, mean pause duration was found to have a near-high correlation while the other two temporal variables have a moderate correlation. A subsequent elicitation of fluency judgments on the second language English speech via Amazon Mechanical Turk showed that ratings were highly dependent on pause duration, rather less on articulation rate, but not on pause rate. Results suggest that raters’ perception of second language fluency is divergent from speakers’ actual second language development: Ratings are related to features that are not indicative of second language development but rather of individual speech patterns.

    Keywords articulation rate, Fluency, second language acquisition, silent pause

  • Sara Bögels, Kobin H. Kendrick, and Stephen C. Levinson, “Never Say No … How the Brain Interprets the Pregnant Pause in Conversation,” PLoS ONE, vol. 10, no. 12, 2015, pp. 15. DOI: 10.1371/journal.pone.0145474.

    Abstract In conversation, negative responses to invitations, requests, offers, and the like are more likely to occur with a delay–conversation analysts talk of them as dispreferred. Here we examine the contrastive cognitive load ‘yes’ and ‘no’ responses make, either when relatively fast (300 ms after question offset) or delayed (1000 ms). Participants heard short dialogues contrasting in speed and valence of response while having their EEG recorded. We found that a fast ‘no’ evokes an N400-effect relative to a fast ‘yes’; however, this contrast disappeared in the delayed responses. ‘No’ responses, however, elicited a late frontal positivity both if they were fast and if they were delayed. We interpret these results as follows: a fast ‘no’ evoked an N400 because an immediate response is expected to be positive–this effect disappears as the response time lengthens because now in ordinary conversation the probability of a ‘no’ has increased. However, regardless of the latency of response, a ‘no’ response is associated with a late positivity, since a negative response is always dispreferred. Together these results show that negative responses to social actions exact a higher cognitive load, but especially when least expected, in immediate response.

  • Miki Shrosbree, “Cross-Linguistic Articulation Rate among Near-Balanced Bilinguals and Implications for Second Language Fluency Measurement,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0572.1-4.

    Abstract The present study examines cross-linguistic articulation rates in read speech among 28 native speakers (14 English and 14 Japanese) and 14 Japanese-English near-balanced bilinguals. The results show that: (1) articulation rates are comparable between the native speakers and the bilinguals; (2) there was a significant difference of articulation rates in Japanese and English among the bilinguals; (3) there is a strong positive correlation between English and Japanese articulation rates among bilinguals. Implications for development of L2 fluency measurement using the L1 fluency as a baseline are discussed.

    Keywords articulation rate, balanced bilingual, Fluency, second language, speech rate

  • Vered Silber-Varod, Adva Weiss, and Noam Amir, “Can you hear these mid-front vowels? Formants analysis of hesitation disfluencies in spontaneous Hebrew,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract This study attempts to characterize the timbre of the default type of hesitation disfluency (HD) in Israeli Hebrew: the mid-front vowel /e/. For this purpose, we analysed the frequencies of the first three formants, F1, F2, and F3, of hundreds of HD pronunciations taken from The Corpus of Spoken Israeli Hebrew (COSIH). We also compared the formant values with two former studies that were carried out on the vowel /e/ in fluent speech. The findings show that, in general, elongated word-final syllables and appended [e]s are pronounced with the same amount of openness as fluent [e], while filled pauses tend to be more open (lower F1), and more frontal (higher F2). Following these results, we suggest to use different set of IPA symbols, and not the phonemic mid-front /e/, in order to better represent hesitation disfluencies.

    Keywords DiSS, filled pauses, formants, Hebrew, hesitation disfluency, LPC analysis, spontaneous speech

  • Anton Stepikhov, and Anastassia Loukina, “Sentence Boundaries in Text and Pauses in Speech: Correlation or Confrontation?,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0588.1-5.

    Abstract The paper explores the interaction between sentence boundaries marked by annotators in transcriptions of Russian spontaneous speech and actual prosodic boundaries in the signal. The aim of the research is to investigate whether annotators’ prosodic competence allows them to correctly detect sentence boundaries in speech based on textual information only. We found that inter-annotator agreement for each sentence boundary identified in transcription was affected by both presence or absence of pause and pause duration. Mixed linear model showed that presence or absence of pause explain 13% of variance in boundary detection. Pause duration explained only 4% of variance in inter-annotator agreement with moderate correlation of r = 0.21. We argue that relatively small size of effect in this case may be due to the interaction of different pausing strategies typical for reading and spontaneous speech, ambiguity of sentence boundaries and individual differences in speech perception.

    Keywords annotation, boundary detection, pausing, Russian, spontaneous speech

  • Jozsef Szakos, and Ulrike Glavitsch, “Investigating disfluency in recordings of last speakers of endangered Austronesian languages in Taiwan,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract The nearly three decades spent in Formosan language documentation produced hundreds of hours of recorded speech. In this paper, we show how the use of SpeechIndexer for transcribing and indexing the data visualises the problem of disfluency in the spontaneous narratives and dialogues. The semiautomatic alignment of speech and transcription needs to be adjusted manually each time when unpredictable pauses occur which are disfluencies, rather than markers of phrasal units. It is illustrated how the combination of SpeechIndexer’s pause finder with pitch measurements can help to pinpoint the difference of phrasal boundaries and pauses of disfluency.

    Keywords Austronesian, DiSS, lesser-documented unwritten language, pause finder, SpeechIndexer

  • Leimin Tian, Catherine Lai, and Johanna Moore, “Recognising emotions in dialogues with disfluencies and non-verbal vocalisations,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract We investigate the usefulness of DISfluencies and Non-verbal Vocalisations (DIS-NV) for recognizing human emotions in dialogues. The proposed features measure filled pauses, fillers, stutters, laughter, and breath in utterances. The predictiveness of DIS-NV features is compared with lexical features and state-of-the-art low-level acoustic features. Our experimental results show that using DIS-NV features alone is not as predictive as using lexical or acoustic features. However, adding them to lexical or acoustic feature set yields improvement compared to using lexical or acoustic features alone. This indicates that disfluencies and non-verbal vocalisations provide useful information overlooked by the other two types of features for emotion recognition.

    Keywords Dialogue, disfluency, DiSS, emotion recognition, HCI, speech processing

  • Marcus Tomalin, Mirjam Wester, Rasmus Dall, Bill Byrne, and Simon King, “A lattice-based approach to automatic filled pause insertion,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract This paper describes a novel method for automatically inserting filled pauses (e.g., UM) into fluent texts. Although filled pauses are known to serve a wide range of psychological and structural functions in conversational speech, they have not traditionally been modelled overtly by state-of-the-art speech synthesis systems. However, several recent systems have started to model disfluencies specifically, and so there is an increasing need to create disfluent speech synthesis input by automatically inserting filled pauses into otherwise fluent text. The approach presented here interpolates Ngrams and Full-Output Recurrent Neural Network Language Models (f-RNNLMs) in a lattice-rescoring framework. It is shown that the interpolated system outperforms separate Ngram and f-RNNLM systems, where performance is analysed using the Precision, Recall, and F-score metrics.

    Keywords disfluency, DiSS, f-RNNLMs, filled pauses, lattices, Ngrams

  • Gunnel Tottie, “From pause to word: Uh and um in written language,” in ICAME 36 (WORDS, WORDS, WORDS – CORPORA AND LEXIS), May 2015, pp. 174.

    Abstract (none)

  • Michiko Watanabe, Yosuke Kashiwagi, and Kikuo Maekawa, “The relationship between preceding clause type, subsequent clause length and duration of silent and filled pauses at clause boundaries in Japanese monologues,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract Filled pauses (FPs) are claimed to occur when speakers have some difficulties and need extra time in speech production. This study investigated whether the following two factors affect silent pause (SP) and FP durations at clause boundaries, using a spontaneous speech corpus: 1) boundary strength and 2) subsequent clause length. First, whether SP and FP durations increase with syntactic boundary strength was examined. Second, whether subsequent clause length affects SP and FP durations at the boundaries was investigated. Results show SP duration increased with boundary strength and subsequent clause length, but FP duration did not, suggesting only SP duration is affected by the two factors.

    Keywords clause boundary, disfluency, DiSS, filled pause, silent pause, speech planning

  • Mirjam Wester, Martin Corley, and Rasmus Dall, “The temporal delay hypothesis: natural, vocoded and synthetic speech,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract Including disfluencies in synthetic speech is being explored as a way of making synthetic speech sound more natural and conversational. How to measure whether the resulting speech is actually more natural, however, is not straightforward. Conventional approaches to synthetic speech evaluation fall short as a listener is either primed to prefer stimuli with filled pauses or, when they aren’t primed they prefer more fluent speech. Psycholinguistic reaction time experiments may circumvent this issue. In this paper, we revisit one such reaction time experiment. For natural speech, delays in word onset were found to facilitate word recognition regardless of the type of delay; be they a filled pause (um), silence or a tone. We expand these experiments by examining the effect of using vocoded and synthetic speech. Our results partially replicate previous findings. For natural and vocoded speech, if the delay is a silent pause, significant increases in the speed of word recognition are found. If the delay comprises a filled pause there is a significant increase in reaction time for vocoded speech but not for natural speech. For synthetic speech, no clear effects of delay on word recognition are found. We hypothesise this is because it takes longer (requires more cognitive resources) to process synthetic speech than natural or vocoded speech.

    Keywords delay hypothesis, disfluency, DiSS

  • Maria K. Wolters, Luis Ferrini, Elaine Farrow, Aurora Szentagotai Tatar, and Christopher D. Burton, “Tracking Depressed Mood Using Speech Pause Patterns,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0811.1-5.

    Abstract The speech of people with depression often shows clear signs of their condition (e.g., flat intonation, slow speech, long pauses), but it is not clear to what extent these signs covary with diurnal fluctuations in mood. In this paper, we report results from a pilot longitudinal study where 11 people with depression tracked various aspects of their mental health for a month. This included a daily mood tracker and regular completion of speech tasks. Speech tasks were designed to be emotionally neutral and require different levels of automaticity. We found that participants differed in their willingness to complete the speech tasks, and that preliminary analyses show no clear link between mood and prosody. We discuss implications of this study for tracking depressed mood using speech in real-life applications.

    Keywords depression, emotion, pauses, prosody

  • Clare Wright, and Cong Zhang, “The effect of study abroad experience on L2 Mandarin disfluency in different types of tasks,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

    Abstract Disfluency is a common phenomenon in L2 speech, especially in beginners’ speech. Whether studying abroad can help with reducing their disfluency or not remains debated [8]. We examined longitudinal data from 10 adult English instructed learners of Mandarin measured before and after ten months of studying abroad (SA) in this paper. We used two speaking tasks comparing pre-planned vs. unplanned spontaneous speech to compare differences over time and between tasks, using eight linguistic and temporal fluency measures (analysed using CLAN and PRAAT). Overall mean linguistic and temporal fluency scores improved significantly (p < .05), especially speech rate (p < .01), supporting the general claim that SA favours oral development, particularly fluency [2]. Further analysis revealed task differences at both times of measurement, but with greater improvement in the spontaneous task.

    Keywords DiSS, Fluency, L2 Mandarin, study abroad


  • Hans Rutger Bosker, Hugo Quené, Ted Sanders, and Nivja H. de Jong, “Native ‘um’s elicit prediction of low-frequency referents, but non-native ‘um’s do not,” Journal of Memory and Language, vol. 75, 2014, pp. 104-116. DOI:

    Abstract Speech comprehension involves extensive use of prediction. Linguistic prediction may be guided by the semantics or syntax, but also by the performance characteristics of the speech signal, such as disfluency. Previous studies have shown that listeners, when presented with the filler uh, exhibit a disfluency bias for discourse-new or unknown referents, drawing inferences about the source of the disfluency. The goal of the present study is to study the contrast between native and non-native disfluencies in speech comprehension. Experiment 1 presented listeners with pictures of high-frequency (e.g., a hand) and low-frequency objects (e.g., a sewing machine) and with fluent and disfluent instructions. Listeners were found to anticipate reference to low-frequency objects when encountering disfluency, thus attributing disfluency to speaker trouble in lexical retrieval. Experiment 2 showed that, when participants listened to disfluent non-native speech, no anticipation of low-frequency referents was observed. We conclude that listeners can adapt their predictive strategies to the (non-native) speaker at hand, extending our understanding of the role of speaker identity in speech comprehension.

    Keywords Speech comprehension

  • Eszter Tisljár-Szabó, and Csaba Pléh, “Ascribing emotions depending on pause length in native and foreign language speech,” Speech Communication, vol. 56, 2014, pp. 35-48. DOI:

    Abstract Although the relationship between emotions and speech is well documented, little is known about the role of speech pauses in emotion expression and emotion recognition. The present study investigated how speech pause length influences how listeners ascribe emotional states to the speaker. Emotionally neutral Hungarian speech samples were taken, and speech pauses were systematically manipulated to create five variants of all passages. Hungarian and Austrian participants rated the emotionality of these passages by indicating on a 1–6 point scale how angry, sad, disgusted, happy, surprised, scared, positive, and heated the speaker could have been. The data reveal that the length of silent pauses influences listeners in attributing emotional states to the speaker. Our findings argue that pauses play a relevant role in ascribing emotions and that this phenomenon might be partly independent of language.

    Keywords Foreign language

  • Ian R. Finlayson, “Testing the roles of disfluency and rate of speech in the coordination of conversation,” Master's Thesis, Queen Margaret University, Edinburgh, Scotland, UK, 2014.

    Abstract This thesis is concerned with two different accounts of how speakers coordinate conversation. In both accounts it is suggested that aspects of the manner in which speech is performed (its disfluency and its rate) are integral to the smooth performance of conversation. In the first strand, we address Clark’s (1996) suggestion that speakers design hesitations, such as filled pauses (e.g. uh and um), repetitions and prolongations, to signal to their audience that they are experiencing difficulties during language production. Such signals allow speakers to account for their use of time, particularly when they experience disruptions during production. The account is tested against three criteria, proposed by Kraljic and Brennan (2005), for evaluating whether a feature of speech is being designed: That it be produced with regularity, that it be interpretable by listeners, and that its production varies according to the speaker’s communicative intention. While existing literature offers support for the first two criteria, neither an experiment with dyads nor analyses of dialogue in the Map Task Corpus (MTC; Anderson et al., 1991) found support for the third criterion. We conclude that, rather than being signals of difficulty, hesitations are merely symptoms which listeners may exploit to aid comprehension. In the second strand, we tested Wilson and Wilson’s (2005) oscillator theory of the timing of turn-taking. This suggests that entrainment between conversational partners’ rates of speech allow them to make precise predictions about when each others’ turns are going to end, and, subsequently, when they can begin a turn of their own. As a critical test of the theory, we predicted that speakers who were more tightly entrained would produce more seamless turn-taking. 
    Again using the MTC, we found no evidence of a relationship between how closely entrained speakers were and how precisely they timed the beginning of their turns relative to the ends of each others’ turns.

  • Craig Lambert, and Judit Kormos, “Complexity, Accuracy, and Fluency in Task-based L2 Research: Toward More Developmentally Based Measures of Second Language Acquisition,” Applied Linguistics, vol. 35, no. 5, August 2014, pp. 607-614. DOI:

    Abstract This article surveys how complexity, accuracy, and fluency (CAF) have been operationalized in studies of task-based L2 production, pointing out some problems with this approach and the need for more precise information about L2 development during task performance. Research into developing L1 text construction ability is then discussed and some approaches for establishing measures of the relevant constructs in L2 performance are suggested.

  • Charlyn M. Laserna, Yi-Tai Seih, and James W. Pennebaker, “Um . . . Who Like Says You Know: Filler Word Use as a Function of Age, Gender, and Personality,” Journal of Language and Social Psychology, vol. 33, no. 3, 2014, pp. 328-338. DOI: 10.1177/0261927X14526993.

    Abstract Filler words ('I mean, you know, like, uh, um') are commonly used in spoken conversation. The authors analyzed these five filler words from transcripts recorded by a device called the Electronically Activated Recorder (EAR), which sampled participants’ language use in daily conversations over several days. By examining filler words from 263 transcriptions of natural language from five separate studies, the current research sought to clarify the psychometric properties of filler words. An exploratory factor analysis extracted two factors from the five filler words: filled pauses ('uh, um') and discourse markers ('I mean, you know, like'). Overall, filled pauses were used at comparable rates across genders and ages. Discourse markers, however, were more common among women, younger participants, and more conscientious people. These findings suggest that filler word use can be considered a potential social and personality marker.

    Keywords discourse marker, EAR, filler word, LIWC

  • Olga Vyacheslavovna Maletina, “Understanding L1-L2 Fluency Relationship Across Different Languages and Different Proficiency Levels,” Master's Thesis, Brigham Young University, June 2014, pp. 4094.

    Abstract The purpose of this research was to better understand the relationship between L1 and L2 fluency, precisely, whether there is a relationship between L1 and L2 temporal fluency measures and whether this relationship differs across different languages and different proficiency levels. In order to answer these questions, L1 and L2 speech samples of the same speakers were collected and analyzed. Twenty-five native speakers and 45 non-native speakers of Japanese, Mandarin Chinese, Portuguese, Spanish, and Russian were asked to respond to questions and perform picture descriptions in their L1 and L2. The recorded speech samples were then analyzed by means of a Praat script in order to identify mean length of run (MLR), speech rate, and number of pauses. Several different statistical analyses were then performed to compare these L1 and L2 temporal features across different languages and different proficiency levels. The results of this study indicate that there is a strong relationship between L1 and L2 fluency and that this relationship may play a role in L2 production. Furthermore, it was found that native languages differ in their patterns of L1 temporal fluency production and that these differences may affect the production of L2 temporal fluency. It was also found that L1-L2 fluency relationship did not differ at different proficiency levels suggesting that individual factors may play a role in L2 fluency production. Thus, it was found that an Intermediate speaker of Spanish, for instance, did not speak faster than an Intermediate speaker of Russian, suggesting that naturally slower speakers in their L1 will still speak slower in their L2. These results indicate that fluency is as much of a trait as it is a state. However, it was also found that not all of the L1-L2 language combinations demonstrated the same results, indicating that the L1-L2 fluency relationship is affected by the L2. 
    These findings have different implications for both L2 teaching and learning, as well as L2 assessment of fluency and overall language proficiency.

    Keywords acquisition, Fluency, proficiency, second-language

  • Mary Grantham O’Brien, “L2 Learners’ Assessments of Accentedness, Fluency, and Comprehensibility of Native and Nonnative German Speech,” Language Learning, vol. 64, no. 4, December 2014, pp. 715-748. DOI: 10.1111/lang.12082.

    Abstract In early stages of classroom language learning, many adult second language (L2) learners communicate primarily with one another, yet we know little about which speech stream characteristics learners tune into or the extent to which they understand this lingua franca communication. In the current study, 25 native English speakers learning German as a L2 with varying levels of German proficiency rated German speech produced by native speakers and fellow learners of German along three continua: accentedness, fluency, and comprehensibility. An examination of speech stream (i.e., phonological, fluency based, and lexical/grammatical) characteristics along with partial correlations indicates both that the raters distinguished among the three concepts but that they conflated the term fluency with proficiency. Self‐reported proficiency in German and linguistic training were the best predictors of the ratings assigned.

    Keywords accentedness, Comprehensibility, Fluency, German, L2 raters, L2 speech

  • Vikram Ramanarayanan, Adam Lammert, Louis Goldstein, and Shrikanth Narayanan, “Are Articulatory Settings Mechanically Advantageous for Speech Motor Control?,” PLoS ONE, vol. 9, no. 8, August 2014, pp. e104168. DOI: 10.1371/journal.pone.0104168.

    Abstract We address the hypothesis that postures adopted during grammatical pauses in speech production are more “mechanically advantageous” than absolute rest positions for facilitating efficient postural motor control of vocal tract articulators. We quantify vocal tract posture corresponding to inter-speech pauses, absolute rest intervals as well as vowel and consonant intervals using automated analysis of video captured with real-time magnetic resonance imaging during production of read and spontaneous speech by 5 healthy speakers of American English. We then use locally-weighted linear regression to estimate the articulatory forward map from low-level articulator variables to high-level task/goal variables for these postures. We quantify the overall magnitude of the first derivative of the forward map as a measure of mechanical advantage. We find that postures assumed during grammatical pauses in speech as well as speech-ready postures are significantly more mechanically advantageous than postures assumed during absolute rest. Further, these postures represent empirical extremes of mechanical advantage, between which lie the postures assumed during various vowels and consonants. Relative mechanical advantage of different postures might be an important physical constraint influencing planning and control of speech production.

  • Scott H. Fraundorf, and Duane G. Watson, “Alice’s adventures in um-derland: psycholinguistic sources of variation in disfluency production,” Language, Cognition and Neuroscience, vol. 29, no. 9, 2014, pp. 1083-1096. DOI: 10.1080/01690965.2013.832785.

    Abstract This study tests the hypothesis that three common types of disfluency (fillers, silent pauses and repeated words) reflect variance in what strategies are available to the production system for responding to difficulty in language production. Participants’ speech in a storytelling paradigm was coded for the three disfluency types. Repeats occurred most often when difficult material was already being produced and could be repeated, but fillers and silent pauses occurred most when difficult material was still being planned. Fillers were associated only with conceptual difficulties, consistent with the proposal that they reflect a communicative signal, whereas silent pauses and repeats were also related to lexical and phonological difficulties. These differences are discussed in terms of different strategies available to the language production system.

    Keywords discourse, Disfluency, Language production

  • Gunnel Tottie, “On the use of uh and um in American English,” Functions of Language, vol. 21, no. 1, 2014, pp. 6-29.

    Abstract This study examines the use of uh and um, referred to jointly as UHM, in 14 conversations totaling c. 62,350 words from the Santa Barbara Corpus of Spoken American English. UHM was much less frequent than in British English with 7.5 vs. 14.5 instances per thousand words in the British National Corpus. However, as in British English the frequency of UHM was closely correlated to extra-linguistic context. Conversations in non-private environments (such as offices and classrooms) had higher frequencies than those taking place in private spaces, mostly homes. Time required for planning, especially when difficult subjects were discussed, appeared to be an important explanatory factor. It is clear that UHM cannot be dismissed as mere hesitation or disfluency; it functions as a pragmatic marker on a par with well, you know, and I mean, sharing some of the functions of these in discourse. Although the role of sociolinguistic factors was less clear, the tendencies for older speakers and educated speakers to use UHM more frequently than younger and less educated ones paralleled British usage, but contrary to British usage, there were no gender differences.


  • Julie Beliao, and Anne Lacheret, “Disfluency and discursive markers: when prosody and syntax plan discourse,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 5-8.

    Abstract Hesitations, interruptions within phrases or within words are common in spontaneous speech. Those phenomena are widely known to be observable from a prosodic point of view through disfluencies. From a syntactic point of view, many studies already established that discursive markers such as hm, oh, I mean, etc. are representative of spontaneous speech. In this study, we demonstrate through a joint corpus-based analysis that these prosodical and syntactical features are correlated, without however being equivalent. More precisely, the lack of either disfluencies or discursive markers is consistently shown to be representative of a planned discourse.

    Keywords discursive marker, disfluency, DiSS, genres

  • Malte Belz, and Myriam Klapi, “Pauses following fillers in L1 and L2 German map task dialogues,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 9-12.

    Abstract Fillers and pauses in spoken language indicate hesitations. Filler type (uh vs. um) is believed to signal a minor or major following speech delay in L1. We examined whether advanced speakers of L2 German use pauses following filler type (äh vs. ähm) in the same way as native speakers do. Two Map Task corpora of L1 and L2 were contrasted with respect to speaker role, filler type and the exact time interval of fillers and pauses. Speaker role influenced the disfluency patterns in L1 and L2 in the same way. Filler type had no impact on the length of the following pause, but the time interval patterns differed significantly. Longer filler intervals are followed by longer pauses in L2 and by shorter pauses in L1. These results suggest that filler type in German is not used to indicate the length of the following delay. Advanced learners seem to have adopted this pattern of use, but cannot overcome their hesitations as fast as native speakers, probably due to their less automatised speech production.

    Keywords contrastive analysis, disfluencies, DiSS, fillers, German, L1, L2, map task, pauses, spontaneous speech

  • Sara Candeias, Dirce Celorico, Jorge Proença, Arlindo Veiga, and Fernando Perdigão, “HESITA(tions) in Portuguese: a database,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 13-16.

    Abstract With this paper we present a European Portuguese database of hesitations in speech. Under the name of HESITA, this database contains annotations of hesitation events, such as filled pauses, vocalic extensions, truncated words, repetitions and substitutions. The hesitations were found over 30 daily news programs collected from podcasts of a Portuguese television channel. The database also includes speaking style classification as well as acoustical information and other speech events. Statistic analysis of the hesitation events in terms of their occurrence is presented. Insights into the process of human speech communication can be extracted from this database, which encloses relevant information about how Portuguese speakers hesitate. The HESITA database is freely available online to the research community.

    Keywords annotation, disfluency, DiSS, hesitation corpus, hesitations, prepared speech, spontaneous speech

  • Rebecca Carroll, and Esther Ruigendijk, “The Effects of Syntactic Complexity on Processing Sentences in Noise,” Journal of Psycholinguistic Research, vol. 42, no. 2, 2013, pp. 139–159. DOI: 10.1007/s10936-012-9213-7.

    Abstract This paper discusses the influence of stationary (non-fluctuating) noise on processing and understanding of sentences, which vary in their syntactic complexity (with the factors canonicity, embedding, ambiguity). It presents data from two RT-studies with 44 participants testing processing of German sentences in silence and in noise. Results show a stronger impact of noise on the processing of structurally difficult than on syntactically simpler parts of the sentence. This may be explained by a combination of decreased acoustical information and an increased strain on cognitive resources, such as working memory or attention, which is caused by noise. The noise effect for embedded sentences is less than for non-embedded sentences, which may be explained by a benefit from prosodic information.

  • Nivja H. de Jong, and Hans Rutger Bosker, “Choosing a threshold for silent pauses to measure second language fluency,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 17-20.

    Abstract Second language (L2) research often involves analyses of acoustic measures of fluency. The studies investigating fluency, however, have been difficult to compare because the measures of fluency that were used differed widely. One of the differences between studies concerns the lower cut-off point for silent pauses, which has been set anywhere between 100 ms and 1000 ms. The goal of this paper is to find an optimal cut-off point. We calculate acoustic measures of fluency using different pause thresholds and then relate these measures to a measure of L2 proficiency and to ratings on fluency.

    Keywords DiSS, duration of pauses, number of pauses, second language speech, silent pause threshold, silent pauses

  • Nivja H. de Jong, Margarita P. Steinel, Arjen Florijn, Rob Schoonen, and Jan H. Hulstijn, “Linguistic skills and speaking fluency in a second language,” Applied Psycholinguistics, vol. 34, no. 5, September 2013, pp. 893-916. DOI: 10.1017/S0142716412000069.

    Abstract This study investigated how individual differences in linguistic knowledge and processing skills relate to individual differences in speaking fluency. Speakers of Dutch as a second language (N = 179) performed eight speaking tasks, from which several measures of fluency were derived such as measures for pausing, repairing, and speed (mean syllable duration). In addition, participants performed separate tasks, designed to gauge individuals’ second language linguistic knowledge and linguistic processing speed. The results showed that the linguistic skills were most strongly related to average syllable duration, of which 50% of individual variance was explained; in contrast, average pausing duration was only weakly related to linguistic knowledge and processing skills.

  • Laura E. de Ruiter, “Self-repairs in German children’s peer interaction - initial explorations,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 29-32.

    Abstract Forty-nine self-repairs were extracted from a corpus of conversational speech of ten German children (mean age 5;1) with peers. The repairs were analysed using Levelt’s [1] classification and compared with his adult data. Children produced fewer appropriateness repairs than adults, but more covert repairs and more phonetic repairs. Like adults, children had a preference to interrupt themselves within-word only for error repairs. Unlike adults, children did not produce editing terms following interruptions.

    Keywords DiSS

  • Andrea Deme, and Alexandra Markó, “Lengthenings and filled pauses in Hungarian adults’ and children’s speech,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 21-24.

    Abstract In the present paper vowel lengthenings and non-lexicalized filled pauses were studied in the spontaneous speech of children and adults (focusing more on the much less studied phenomenon: vowel lengthening). The results revealed different usage and appearance of lengthenings in the two age groups, therefore, differences in speech skills and strategies can be concluded. LEs and FPs differ mostly in their position in the speech session between the age groups, which has implications regarding different planning strategies of adults and children. We also draw conclusions regarding the methodological considerations in the issue of identifying vowel lengthening supporting a previously formulated conception.

    Keywords (non-lexicalized) filled pause, discourse management, DiSS, lengthening, speech planning, spontaneous speech

  • Yasuharu Den, and Natsuko Nakagawa, “Anti-zero pronominalization: when Japanese speakers overtly express omissible topic phrases,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 25-28.

    Abstract In this paper, we focus on cases where Japanese speakers overtly express a topic phrase that could have been omitted. We call this phenomenon anti-zero-pronominalization and hypothesize that this helps speakers gain time for planning a following utterance; anti-zero-pronominalization is another option to deal with cognitive load at the beginning of an utterance in addition to fillers and other speech disfluencies. Based on a quantitative analysis of a corpus of spontaneous Japanese dialogs, we investigate the difference between overt topic NPs and zero-pronouns. We show that i) the utterance is more complex when the topic is expressed as an overt NP than when it is expressed as a zero-pronoun; ii) turn-initial items such as fillers are produced less frequently when overt NPs appear than when zero-pronouns appear; and iii) the utterance becomes more complex when the last mora of the topic is more prolonged.

    Keywords cognitive load, DiSS, Japanese dialogs, topic phrases, zero-pronouns

  • Luis J. García-López, M. Belén Díez-Bedmar, and José M. Almansa-Moreno, “From Being a Trainee to Being a Trainer: Helping Peers Improve their Public Speaking Skills,” Revista de Psicodidáctica, vol. 18, no. 2, 2013, pp. 331-342. DOI: 10.1387/RevPsicodidact.6419.

    Abstract Although public speaking anxiety is present at all educational stages, the university period is critical since the students’ lack of oral communication skills may prevent them from accomplishing their educational goals. To improve this situation, a two-fold objective was pursued in this study. First, to examine the effects of a 3-hour public speaking training workshop for Psychology undergraduates. Second, to test if these students could effectively train other undergraduates to use public speaking skills and reduce their anxiety by using a collaborative methodology and peer tutoring. The findings prove that the training of Psychology students resulted in their peers’ improvement of their oral communication skills and reduction of their speech anxiety. Both groups of students benefited from the study: Psychology students had the opportunity to improve their communication skills and gained practical experience, and the other undergraduates received a free, personalized and successful workshop which improved their communication skills and reduced their anxiety levels.

    Keywords collaborative methodology, Communication skills, peers, public speaking

  • Jonathan Ginzburg, Raquel Fernández, and David Schlangen, “Self-addressed questions in disfluencies,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 33-36.

    Abstract The paper considers self-addressed queries – queries speakers address to themselves in the aftermath of a filled pause. We study their distribution in the BNC and show that such queries show signs of sensitivity to the syntactic/semantic type of the sub-utterance they follow. We offer a formal model that explains the coherence of such queries.

    Keywords DiSS

  • Sandra Götz, Fluency in Native and Nonnative English Speech. Amsterdam, Netherlands: John Benjamins Publishing Company, 2013, pp. 238. DOI: 10.1075/scl.53.

    Abstract This book takes a new and holistic approach to fluency in English speech and differentiates between productive, perceptive, and nonverbal fluency. The in-depth corpus-based description of productive fluency points out major differences of how fluency is established in native and nonnative speech. It also reveals areas in which even highly advanced learners of English still deviate strongly from the native target norm and in which they have already approximated to it. Based on these findings, selected learners are subjected to native speakers’ ratings of seven perceptive fluency variables in order to test which variables are most responsible for a perception of oral proficiency on the sides of the listeners. Finally, language-pedagogical implications derived from these findings for the improvement of fluency in learner language are presented. This book is conceptually and methodologically relevant for corpus-linguistics, learner corpus research and foreign language teaching and learning.

  • Ivan Hernandez, and Jesse Lee Preston, “Disfluency disrupts the confirmation bias,” Journal of Experimental Social Psychology, vol. 49, no. 1, January 2013, pp. 178-182.

    Abstract One difficulty in persuasion is overcoming the confirmation bias, where people selectively seek evidence that is consistent with their prior beliefs and expectations. This biased search for information allows people to analyze new information in an efficient, but shallow way. The present research discusses how experienced difficulty in processing (disfluency) can reduce the confirmation bias by promoting careful, analytic processing. In two studies, participants with prior attitudes on an issue became less extreme after reading an argument on the issues in a disfluent format. The change occurred for both naturally occurring attitudes (i.e. political ideology) and experimentally assigned attitudes (i.e. positivity toward a court defendant). Importantly, disfluency did not reduce confirmation biases when participants were under cognitive load, suggesting that cognitive resources are necessary to overcome these biases. Overall, these results suggest that changing the style of an argument’s presentation can lead to attitude change by promoting more comprehensive consideration of opposing views.

    Keywords Attitude change, Confirmation bias, Fluency, Persuasion

  • Martina Jakesch, Helmut Leder, and Michael Forster, “Image Ambiguity and Fluency,” PLoS ONE, vol. 8, no. 9, September 2013, pp. e74084. DOI: 10.1371/journal.pone.0074084.

    Abstract Ambiguity is often associated with negative affective responses, and enjoying ambiguity seems restricted to only a few situations, such as experiencing art. Nevertheless, theories of judgment formation, especially the “processing fluency account”, suggest that easy-to-process (non-ambiguous) stimuli are processed faster and are therefore preferred to (ambiguous) stimuli, which are hard to process. In a series of six experiments, we investigated these contrasting approaches by manipulating fluency (presentation duration: 10 ms, 50 ms, 100 ms, 500 ms, 1000 ms) and testing effects of ambiguity (ambiguous versus non-ambiguous pictures of paintings) on classification performance (Part A; speed and accuracy) and aesthetic appreciation (Part B; liking and interest). As indicated by signal detection analyses, classification accuracy increased with presentation duration (Exp. 1a), but we found no effects of ambiguity on classification speed (Exp. 1b). Fifty percent of the participants were able to successfully classify ambiguous content at a presentation duration of 100 ms, and at 500 ms even 75% performed above chance level. Ambiguous artworks were found more interesting (in conditions 50 ms to 1000 ms) and were preferred over non-ambiguous stimuli at 500 ms and 1000 ms (Exp. 2a - 2c, 3). Importantly, ambiguous images were nonetheless rated significantly harder to process than non-ambiguous images. These results suggest that ambiguity is an essential ingredient in art appreciation even though or maybe because it is harder to process.

  • Tyler Kendall, Speech Rate, Pause and Sociolinguistic Variation. Basingstoke: Palgrave Macmillan, 2013. DOI: 10.1057/9781137291448.0001.

    Abstract Speech Rate, Pause, and Sociolinguistic Variation examines the confluence of psycholinguistic factors and social factors in linguistic variation through corpus-based analyses of speech rate and silent pause in US English. In particular, based on a large amount of data extracted from a wide range of sociolinguistic interview recordings, it demonstrates the great extent to which articulation rates are correlated with social factors of speakers (such as regional origin and sex) while pause durations are less so. Through the development of new quantitative techniques, it considers the cognitive importance of variability in pauses and highlights new ways that speech features like these can be used to help understand the production of sociolinguistic variables. With detailed discussions of its data and methods, and with a helpful accompanying website, it makes a valuable guide for conducting one’s own corpus (socio)phonetic research.

  • Hanae Koiso, and Yasuharu Den, “Acoustic and linguistic features related to speech planning appearing at weak clause boundaries in Japanese monologs,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 37-40.

    Abstract In this paper, we focus on weak clause boundaries in Japanese monologs in order to investigate the relationship of the length of constituents following weak boundaries to three acoustic and linguistic features: 1) occurrence rate of fillers, 2) occurrence rate of boundary pitch movements, and 3) degree of lengthening of clause-final morae. We found that all these features were significantly correlated with the length of following constituents. Most importantly, boundary pitch movements had an additional effect that can be distinct from the effect of clause-final lengthening. These results suggest that Japanese speakers have earlier-occurring items that help them deal with cognitive load in speech planning, in addition to fillers and other clause-initial disfluencies.

    Keywords boundary pitch movements, clause-final lengthening, DiSS, fillers, Japanese monologs

  • Kikuo Maekawa, “Prediction of F0 height of filled pauses in spontaneous Japanese: a preliminary report,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 41-44.

    Abstract F0 values of filled pauses (FP) in the Corpus of Spontaneous Japanese were analyzed to examine the mechanism by which the F0 heights of FP were determined. Statistical analyses of the F0 values of FP occurring in between two full-fledged accentual phrases (AP) revealed correspondence between the occurrence timing of FP and the F0 height. Based upon this finding, 5 models of F0 prediction were proposed. Comparison of the mean prediction errors revealed that the best prediction was obtained in a model that linearly interpolates the phrase-final L% tone of the immediately preceding AP and the phrase-initial %L tone of the immediately following AP. This finding suggests that the F0 of FP was specified at the level of phonetic realization rather than phonological prosodic representation.

    Keywords DiSS

  • Takehiko Maruyama, “Analysis of parenthetical clauses in spontaneous Japanese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 45-48.

    Abstract In this paper, I will discuss the functional aspects of parenthetical clauses and sentences in spontaneous Japanese monologues. Parentheticals can be defined as syntactic elements that are instantly inserted in the middle of an ongoing utterance to add supplemental information and thus interrupt the fluent flow of speech production. Examples of parenthetical clauses/sentences that appeared in the Corpus of Spontaneous Japanese were examined and then classified into three types. These types differ in their contextual functions, but share a commonality in that they present multiplex information simultaneously in the process of producing spontaneous speech.

    Keywords contextual functions, Corpus of Spontaneous Japanese, DiSS, parenthetical clause/sentence

  • Helena Moniz, Fernando Batista, Isabel Trancoso, and Ana Isabel Mata, “Automatic structural metadata identification based on multilayer prosodic information,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 49-52.

    Abstract This paper discriminates different types of structural metadata in transcripts of university lectures: boundary events (comma, full stops and interrogatives), and disfluencies (repair). The disambiguation process is based on predefined multilayered linguistic information and on its hierarchical structure. Since boundary events may share similar linguistic properties, in terms of F0 and energy slopes, presence/absence of silent pauses, and duration of different units of analysis, different classification methods based on a set of automatically derived prosodic features have been applied to differentiate between those events and disfluencies. This paper also performs a detailed analysis on the impact of each individual feature in discriminating each structural event. The results of our data-driven approach allow us to reach a structured set of basic features towards the disambiguation of metadata events. These results are a step forward towards the analysis of speech acts and their disambiguation from disfluencies.

    Keywords automatic speech processing, disfluencies, DiSS, speech prosody, structural metadata

  • Rena Nemoto, “Which kind of hesitations can be found in Estonian spontaneous speech?,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 53-54.

    Abstract This paper describes the acoustic characteristics of hesitations in Estonian spontaneous speech. We especially investigate duration, fundamental frequency, and first two formant analyses. Most frequent hesitations can be expressed by lengthened phonemes such as /ää/, /ee/, /õõ/, and /mm/. We compare lengthened phoneme hesitations with their related phonemes. The results from our preliminary hesitation study show (i) hesitations have longer duration and its range is spread; (ii) hesitations globally include lower pitch; (iii) hesitation formants are likely to be centralized or posterior and opened in comparison with related phonemes.

    Keywords DiSS, Estonian, hesitation, spontaneous speech

  • Sieb Nooteboom, and Hugo Quené, “Self-monitoring as reflected in identification of misspoken segments,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 55-57.

    Abstract Most segmental speech errors probably are articulatory blends of competing segments. Perceptual consequences were studied in listeners’ reactions to misspoken segments. 291 speech fragments containing misspoken initial consonants plus 291 correct control fragments, all stemming from earlier SLIP experiments, were presented for identification to listeners. Results show that misidentifications (i.e. deviations from an earlier auditory transcription) are rare (3%), but reaction times to correctly identified fragments systematically reflect differences between correct controls, undetected, early detected and late detected speech errors, leading to the following speculative conclusions: (1) segmental errors begin their life in inner speech as full substitutions, and competition with correct target segments often is slightly delayed; (2) in early interruptions speech is initiated before competing target segments are activated, but then rapidly interrupted after error detection; (3) late detected errors reflect conflict-based monitoring of articulation or monitoring overt speech.

    Keywords DiSS

  • Klim Peshkov, Laurent Prévot, Stéphane Rauzy, and Berthille Pallaud, “Categorizing syntactic chunks for marking disfluent speech in French language,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 59-62.

    Abstract Disfluency is the first phenomenon one has to address when processing spontaneous speech. Efficient systems combining transcription-based and signal-based cues have been created for English. These systems generally use supervised machine learning models, trained over large annotated datasets combining signal and transcription. As for other languages, including French, the situation is complicated by the lack of resources. A few proposals based on filled pauses, truncated words and repetitions have been made for identifying disfluencies in French. In this paper, we propose a transcription-based approach to this task, with high-quality morpho-syntactic tags as input for identifying disfluent areas. Originally, we adopted a transcription-based approach for obtaining an independent way of characterizing disfluencies. This can be later compared and combined with prosodic cues. Our method consists in building syntactic chunks from our tagging and then classify these chunks into several categories, some of them being considered as disfluent. We apply our method to speaker style characterization, discourse genres zoning, as well as to dataset cleaning. Finally, an attempt is made to relate our disfluent chunks to a more standard description of disfluencies in order to open the way of a deeper integration of our work with the one of the disfluency community.

    Keywords chunking, disfluencies, DiSS, speaking style, tagging, transcription-based approach

  • Jorge Proença, Dirce Celorico, Arlindo Veiga, Sara Candeias, and Fernando Perdigão, “Acoustical characterization of vocalic fillers in European Portuguese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 63-66.

    Abstract This study attempts to acoustically characterize the most common filled pause vocalizations (or vocalic fillers) in spontaneous speech in European Portuguese: the near-open central vowel [ɐ] and the mid-central vowel [ə]. For this purpose we analyzed the spectral information of the vocalic fillers by estimating their first two formant frequencies as well as their duration properties. The vocalic fillers are taken from a large corpus of European Portuguese broadcast news’ speech. We also compared the vocalic fillers with lexical vowels possessing similar timbre. No formant variation trend was attained for the vocalic fillers and a great overlap of formant values is observed. These results provide a base of information for understanding the most common vocalic fillers in European Portuguese spontaneous speech.

    Keywords DiSS, filled pauses, formant estimation, hesitations, spontaneous speech, vocalic fillers

  • Ralph L. Rose, “Crosslinguistic Corpus of Hesitation Phenomena: A Corpus for Investigating First and Second Language Speech Performance,” in INTERSPEECH 2013, Lyon, France, August 2013, pp. 992-996.

    Abstract There is a growing consensus that there is a need to evaluate second language speech performance with respect to first language speech behavior. To support this need, the Crosslinguistic Corpus of Hesitation Phenomena was developed. This freely available corpus is designed to investigate the crosslinguistic influence of speech patterns and consists of recordings of speakers producing first and second language speech samples in response to parallel elicitation tasks in each language. Preliminary results from the corpus are consistent with other findings that second language performance is sometimes correlated with first language speech behavior. In particular, findings show that silent pause rate and duration as well as other hesitation phenomena correlate with first language performance while speech rate does not. Interestingly, repeats also differ from first language production. Results show that the corpus may be a useful tool for researchers who wish to investigate the correspondence between first and second language speech, particularly with respect to the use of hesitation phenomena.

    Keywords corpus, hesitation phenomena, second language speech

  • Vered Silber-Varod, and Takehiko Maruyama, “The linguistic role of hesitation disfluencies: evidence from Hebrew and Japanese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 67-70.

    Abstract In this paper we examine a certain aspect of the prosody-syntax interface, that of hesitation disfluencies (HD) that occur intra-phrases or intra-morphemes. Such cases were found in two spontaneous corpora of two syntactically distinct languages – Israeli Hebrew (IH) and Japanese. It was found that intra-phrasal hesitations in the two languages call for different explanations, since in Japanese the noun (e.g., in NP) precedes the case marking particle while in IH the preposition (e.g., in PP) precedes the noun. In this paper we will present qualitative findings and suggest a unified view of the phenomenon of intra-phrasal HDs.

    Keywords DiSS, hesitation disfluency, Israeli Hebrew, Japanese, prosody-syntax interface

  • Michiko Watanabe, “Phrasal complexity and the occurrence of filled pauses in presentation speeches in Japanese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 71-72.

    Abstract Filled pauses are ubiquitous in everyday speech. I investigated whether linguistic complexity of upcoming phrases affects filler rate at phrase boundaries in presentation speeches in Japanese. Filler rate at phrase boundaries increased monotonically with complexity of the following phrases. However, when the following phrase was composed of more than 11 Bunsetsu-phrases, the filler rate did not show any constant increase. The results indicate that filler rate at phrase boundaries is closely related to cognitive load of local linguistic encoding and that the maximum planning span for linguistic encoding is about 10 Bunsetsu-phrases in Japanese monologues.

    Keywords bunsetsu-phrase, DiSS, filled pause, linguistic complexity, planning load

  • Charlotte Wollermann, Eva Lasarcyk, Ulrich Schade, and Bernhard Schröder, “Disfluencies and uncertainty perception - evidence from a human-machine scenario,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 73-76.

    Abstract This paper deals with the modelling and perception of disfluencies in articulatory speech synthesis. The stimuli are embedded into short dialogues in question-answering situations in a human–machine scenario. The system is supposed to express uncertainty in the answer. We test the influence of delay, intonation, and filler as prosodic indicators of uncertainty on perception in two studies. Study 1 deals with the effect of delay and filler on uncertainty perception. Results suggest an additive effect of the cues, i.e. the activation of both prosodic cues of uncertainty has a stronger impact on uncertainty perception than the deactivation of a single cue or of both cues. With respect to the effect of single cues, no significant difference can be observed. Study 2 investigates the impact of delay and intonation on perceived uncertainty. Again, a principle of additivity can be observed. Furthermore as modelled here, intonation has a stronger influence than delay. In both studies no correlation between the ranking of uncertainty and naturalness of the stimuli is found.

    Keywords disfluencies, DiSS, speech perception, speech synthesis, uncertainty

  • Luke Jai Wood, Kerstin Dautenhahn, Austen Rainer, Ben Robins, Hagen Lehmann, and Dag Sverre Syrdal, “Robot-Mediated Interviews - How Effective Is a Humanoid Robot as a Tool for Interviewing Young Children?,” PLoS ONE, vol. 8, no. 3, March 2013, pp. e59448. DOI: 10.1371/journal.pone.0059448.

    Abstract Robots have been used in a variety of education, therapy or entertainment contexts. This paper introduces the novel application of using humanoid robots for robot-mediated interviews. An experimental study examines how children’s responses towards the humanoid robot KASPAR in an interview context differ in comparison to their interaction with a human in a similar setting. Twenty-one children aged between 7 and 9 took part in this study. Each child participated in two interviews, one with an adult and one with a humanoid robot. Measures include the behavioural coding of the children’s behaviour during the interviews and questionnaire data. The questions in these interviews focused on a special event that had recently taken place in the school. The results reveal that the children interacted with KASPAR very similar to how they interacted with a human interviewer. The quantitative behaviour analysis reveal that the most notable difference between the interviews with KASPAR and the human were the duration of the interviews, the eye gaze directed towards the different interviewers, and the response time of the interviewers. These results are discussed in light of future work towards developing KASPAR as an ‘interviewer’ for young children in application areas where a robot may have advantages over a human interviewer, e.g. in police, social services, or healthcare applications.


  • Hans Rutger Bosker, Anne-France Pinget, Hugo Quené, Ted Sanders, and Nivja H. de Jong, “What makes speech sound fluent? The contributions of pauses, speed and repairs,” Language Testing, vol. 30, no. 2, April 2013, pp. 159-175. DOI: 10.1177/0265532212455394.

    Abstract The oral fluency level of an L2 speaker is often used as a measure in assessing language proficiency. The present study reports on four experiments investigating the contributions of three fluency aspects (pauses, speed and repairs) to perceived fluency. In Experiment 1 untrained raters evaluated the oral fluency of L2 Dutch speakers. Using specific acoustic measures of pause, speed and repair phenomena, linear regression analyses revealed that pause and speed measures best predicted the subjective fluency ratings, and that repair measures contributed only very little. A second research question sought to account for these results by investigating perceptual sensitivity to acoustic pause, speed and repair phenomena, possibly accounting for the results from Experiment 1. In Experiments 2–4 three new groups of untrained raters rated the same L2 speech materials from Experiment 1 on the use of pauses, speed and repairs. A comparison of the results from perceptual sensitivity (Experiments 2–4) with fluency perception (Experiment 1) showed that perceptual sensitivity alone could not account for the contributions of the three aspects to perceived fluency. We conclude that listeners weigh the importance of the perceived aspects of fluency to come to an overall judgment.

    Keywords Fluency perception, pauses, perceptual sensitivity, repair, speed

  • Troy Cox, and Wendy Baker-Smemoe, “The relationship between L1 fluency and L2 fluency across different proficiency levels and L1s,” November 2012.

    Abstract Our understanding of oral temporal fluency (i.e., speech rate, pauses, and hesitations) in a second language (L2) has increased greatly in the past several years, along with our understanding of its relationship to overall proficiency, language processing, and automaticity (i.e., Brand & Götz, 2011; Segalowitz, 2007). However, the role of the speaker’s fluency in their native language (L1) on L2 fluency is still not understood. Few studies have examined this relationship, and these studies have examined few L1/L2 relationships across few proficiency levels (Scanlon, 1987; Derwing et al., 2009). Thus, the influence of L1 fluency on L2 fluency development is still unclear. The purpose of this study is to determine the effect of native language (L1) fluency and L2 proficiency level on features of L2 temporal fluency. Over one hundred English as a second language (ESL) students participated from five L1 backgrounds (Chinese, Japanese, Korean, Spanish, Portuguese) and 9 proficiency levels (novice high to advanced high on the ACTFL scale). Participants were asked to describe 4 picture stories, 2 in their L1 and 2 in their L2. Several fluency measures including unfilled pauses, speech rate, and articulation rate were analyzed using the Praat script described in de Jong and Wempe (2007). These fluency measures in the L1 were compared to those in the L2. The results of this analysis revealed that all features were highly correlated across the two languages, that these correlations were stronger for lower than higher proficiency speakers, and that differences in the number and type of pauses, as well as speaking rate, differed across L1s. These results suggest that fluency reveals more than processing constraints aggregated by learning an L2, and suggest that measuring L1 fluency is important in any investigation of L2 fluency.

  • Nivja De Jong, Margarita P. Steinel, Arjen Florijn, Rob Schoonen, and Jan H. Hulstijn, “Facets of Speaking Proficiency,” Studies in Second Language Acquisition, vol. 34, no. 1, March 2012, pp. 5-34. DOI: 10.1017/S0272263111000489.

    Abstract This study examined the componential structure of second-language (L2) speaking proficiency. Participants—181 L2 and 54 native speakers of Dutch—performed eight speaking tasks and six tasks tapping nine linguistic skills. Performance in the speaking tasks was rated on functional adequacy by a panel of judges and formed the dependent variable in subsequent analyses (structural equation modeling). The following independent variables were assessed separately: linguistic knowledge in two tests (vocabulary and grammar); linguistic processing skills (four reaction time measures obtained in three tasks: picture naming, delayed picture naming, and sentence building); and pronunciation skills (speech sounds, word stress, and intonation). All linguistic skills, with the exception of two articulation measures in the delayed picture naming task, were significantly and substantially related to functional adequacy of speaking, explaining 76% of the variance. This provides substantial evidence for a componential view of L2 speaking proficiency that consists of language-knowledge and language-processing components. The componential structure of speaking proficiency was almost identical for the 40% of participants at the lower and the 40% of participants at the higher end of the functional adequacy distribution (n = 73 each), which does not support Higgs and Clifford’s (1982) relative contribution model, predicting that, although L2 learners become more proficient over time, the relative weight of component skills may change.

  • Ian R. Finlayson, and Martin Corley, “Disfluency in dialogue: an intentional signal from the speaker?,” Psychonomic Bulletin & Review, vol. 19, no. 5, October 2012, pp. 921-928. DOI: 10.3758/s13423-012-0279-x.

    Abstract Disfluency is a characteristic feature of spontaneous human speech, commonly seen as a consequence of problems with production. However, the question remains open as to why speakers are disfluent: Is it a mechanical by-product of planning difficulty, or do speakers use disfluency in dialogue to manage listeners’ expectations? To address this question, we present two experiments investigating the production of disfluency in monologue and dialogue situations. Dialogue affected the linguistic choices made by participants, who aligned on referring expressions by choosing less frequent names for ambiguous images where those names had previously been mentioned. However, participants were no more disfluent in dialogue than in monologue situations, and the distribution of types of disfluency used remained constant. Our evidence rules out at least a straightforward interpretation of the view that disfluencies are an intentional signal in dialogue.

  • Jordi Adell, David Escudero, and Antonio Bonafonte, “Production of filled pauses in concatenative speech synthesis based on the underlying fluent sentence,” Speech Communication, vol. 54, no. 3, 2012, pp. 459-476.

    Abstract Until now, speech synthesis has mainly involved reading-style speech. Today, however, text-to-speech systems must provide a variety of styles because users expect these interfaces to do more than just read information. If synthetic voices must be integrated into future technology, they must simulate the way people talk instead of the way people read. Existing knowledge about how disfluencies occur has made it possible to propose a general framework for synthesising disfluencies. We propose a model based on the definition of disfluency and the concept of underlying fluent sentences. The model incorporates the parameters of standard prosodic models for fluent speech with local modifications of prosodic parameters near the interruption point. The constituents of the local models for filled pauses are derived from the analysis corpus, and constituent’s prosodic parameters are predicted via linear regression analysis. We also discuss the implementation details of the model when used in a real speech synthesis system. Objective and perceptual evaluations showed that the proposed models outperformed the baseline model. Perceptual evaluations of the system showed that it is possible to synthesise filled pauses without decreasing the overall naturalness of the system, and users stated that the speech produced is even more natural than the one produced without filled pauses.

    Keywords Perceptual evaluation

  • Ralph L. Rose, “On the lexical status of filled pauses: Seeing ’uh’ and ’um’ as words,” 2012.

    Abstract Filled pauses (FPs: e.g., English uh/um, Japanese e-(to)) occur frequently in everyday communication. However, the exact linguistic status of FPs has been the subject of some debate. Some researchers have argued that FPs are words, with the same lexical status as such interjections as well or oh (Clark and Fox Tree 2002), or at least word-like in that they can be used in a controlled fashion (Villar et al. 2012). However, others have argued that the evidence is inconclusive and that FPs can be regarded as resulting automatically from cognitive processes (Corley and Stewart 2008). I argue that FPs are words based on facts showing the systematic and distinctive use of FPs in speech corpora (Kjellmer 2003), and particularly in a corpus of blog writings (Rose 2011). Evidence from these corpora shows that FPs are used, among other ways, to highlight unexpected or unusual words and phrases (e.g., "Jan Wenner’s famous pub has gone, um, gaga for [Lady] Gaga.").

  • Gina Villar, Joanne Arciuli, and David Mallard, “Use of "um" in the deceptive speech of a convicted murderer,” Applied Psycholinguistics, vol. 33, no. 1, January 2012, pp. 83-95. DOI: 10.1017/S0142716411000117.

    Abstract Previous studies have demonstrated a link between language behaviors and deception; however, questions remain about the role of specific linguistic cues, especially in real-life high-stakes lies. This study investigated use of the so-called filler, "um," in externally verifiable truthful versus deceptive speech of a convicted murderer. The data revealed significantly fewer instances of "um" in deceptive speech. These results are in line with our recent study of "um" in laboratory elicited low-stakes lies. Rather than constituting a filled pause or speech disfluency, "um" may have a lexical status similar to other English words and may be under the strategic control of the speaker. In an attempt to successfully deceive, humans may alter their speech, perhaps in order to avoid certain language behaviors that they think might give them away.


  • Karin Aijmer, “"Well I’m not sure I think…" The use of "well" by non-native speakers,” International Journal of Corpus Linguistics, vol. 16, no. 2, 2011, pp. 231-254. DOI: 10.1075/ijcl.16.2.04aij.

    Abstract Pragmatic markers are an important part of the grammar of conversation and not simply markers of disfluency. They have a number of functions that help the speaker to organise the conversation and to express feelings and attitudes. Advanced EFL learners use frequent pragmatic markers such as well. However their use of well diverges from the native speaker norm. The present study uses data from the Swedish component of the LINDSEI corpus and its native speaker counterpart (LOCNEC) to examine similarities and differences between native and non-native speakers. The overall picture is that Swedish learners overuse well, although there are considerable individual differences. Thus learners use well above all as a fluency device to cope with speech management problems but underuse it for attitudinal purposes. Pragmatic markers cannot be taught in the same way as other lexical items but it is important to discuss how and where they are used.

    Keywords language teaching, learner corpora, non-native speaker, pragmatic marker, well

  • Christiane Brand, and Sandra Götz, “Fluency versus accuracy in advanced spoken learner language: A multi-method approach,” International Journal of Corpus Linguistics, vol. 16, no. 2, 2011, pp. 255-275. DOI: 10.1075/ijcl.16.2.05bra.

    Abstract In this paper we present a possible multi-method approach towards the description of a potential correlation between errors and temporal variables of (dys-)fluency in spoken learner language. Using the German subcorpus of the Louvain International Database of Spoken English Interlanguage (LINDSEI) and the native control corpus Louvain Corpus of Native English Conversation (LOCNEC), we first analysed errors and temporal variables of fluency quantitatively. We detected lexical and grammatical categories which are especially error-prone as well as problematic aspects of fluency for all learners in the LINDSEI subcorpus, e.g. confusion in tense agreement across clauses or an overuse of unfilled pauses. In the ensuing qualitative analysis of five prototypical learners, no trend for a possible correlation of accuracy and fluency could be observed. Fifty native speakers’ ratings of these five learners revealed that the learner with an average performance across the investigated variables received the highest ratings for overall oral proficiency.

    Keywords accuracy, error analysis, errors, Fluency, learner corpus, LINDSEI

  • Nivja De Jong, “Cross-linguistic differences in pausing behavior,” December 2011.

    Abstract Pauses in speech can serve communicative means, to help listeners understand (Clark, 1994), and pauses can be due to cognitive factors, when a speaker has not finished planning and formulating the upcoming utterance (Howell & Au-Yeung, 2002). In theories of speech production, lexical concepts are seen as the basic units of planning. If this holds for all languages, one would predict that for an agglutinative language such as Turkish, units of planning can be larger than for a non-agglutinative language such as English. Following this reasoning, speakers of Turkish would have fewer opportunities to pause than speakers of English. This hypothesis is tested by comparing speech data of Turkish and English native speakers. Twenty-four Turkish speakers and twenty-nine English speakers performed eight speaking tasks. These tasks were long turns in simulated conversation. In total, nine hours of Turkish and English speech were annotated, adding information about frequency and duration of silent pauses (as well as other hesitation phenomena). The results showed that Turkish words are indeed longer in number of syllables and in duration. Furthermore, speakers hardly paused within words, confirming the hypothesis that lexical items form the basis of units-of-speech. Finally, Turkish speakers paused less often than English speakers, but when they paused the duration of these pauses was longer. In total, percentage of time spent pausing did not differ for the Turkish and English speakers. We conclude that usage of pauses due to cognitive factors is dependent on typological features of languages, leading to cross-linguistic differences in pausing behavior.

  • Tyko Dirksmeyer, “Lexical hesitation marking in Chintang: Evidence for fillers as words,” December 2011.

    Abstract The status of hesitation markers (or ‘fillers’, ‘filled pauses’, ‘editing expressions’, etc. — such as uh(m) in English) has been fiercely disputed in various subdisciplines of the language sciences over the past decades. Should these items be viewed as aberrations in performance that need to be excluded from linguistic analysis (e.g. Chomsky 1965), are they symptoms of speech production processes that signal trouble but do not signify anything beyond that (Goldman-Eisler 1968; Levelt 1989), or are they actively employed as communicative means just like other words are (Clark and Fox Tree 2002; Jefferson 1974; Schegloff 2010), and thus form an integral part of language? Chintang, a Tibeto-Burman language spoken in two villages in Nepal, provides evidence for the latter view. Its principal hesitation marker me~ı occurs in the same range of functional environments — word search, self-repair, prefacing dispreferred turns, among others — in which uh(m) appears in English (and similar forms feature in other well-known languages). Yet, me~ı demonstrably conforms to standard phonological, morphosyntactic and semantic criteria for wordhood, can be seamlessly integrated into utterances, and is regularly exploited for communicative purposes such as "floor management" and projecting what to expect next. In this talk, I will review data drawn from a corpus of video-recorded naturally occurring conversational interaction in Chintang and argue for the profoundly conventional nature of hesitation marking with me~ı. The findings from this small, as-yet-understudied speech community indicate that fillers should indeed be treated as lexical items on a par with other words. Consequently, they call on linguistic theorizing not only to take hesitation marking and its communicative functions in conversational speech seriously, but also to embrace and incorporate typological diversity in order to arrive at truly generalizable models of language processing.

  • Gaëtanelle Gilquin, and Sylvie De Cock, “Errors and disfluencies in spoken corpora: Setting the scene,” International Journal of Corpus Linguistics, vol. 16, no. 2, 2011, pp. 141-172. DOI: 10.1075/ijcl.16.2.01gil.

    Abstract (none)

  • John Osborne, “Fluency, complexity and informativeness in native and non-native speech,” International Journal of Corpus Linguistics, vol. 16, no. 2, 2011, pp. 276-298. DOI: 10.1075/ijcl.16.2.06osb.

    Abstract Individual speakers vary considerably in their rate of speech, their syntactic choices, and the organization of information in their discourse. This study, based on a corpus of monologue productions from native and non-native speakers of English and French, examines the relations between temporal fluency, syntactic complexity and informational content. The purpose is to identify which features, or combinations of features, are common to more fluent speakers, and which are more idiosyncratic in nature. While the syntax of fluent speakers is not necessarily more complex than that of less fluent speakers, it is suggested that they are able to deliver content more efficiently through a combination of less hesitant speech and of lexical and syntactic choices that allow them to package information more economically.

    Keywords Fluency, information content, learner corpora, lexical bundles, syntactic complexity

  • Anne-France Pinget, “Native Speakers’ Perceptions of Fluency and Accent in L2 Speech,” Master's Thesis, Utrecht University, Utrecht, the Netherlands, June 2011.

    Abstract The goal of this study is threefold. It is aimed at exploring (i) the relationship between objective properties of speech and perceived fluency, (ii) the relationship between segmental characteristics of speech and perceived accent, and (iii) the relationship between fluency and accent. We collected 90 speech samples from Turkish and English L2 learners of Dutch. Objective measures of fluency and accent were made for each sample. Forty untrained native speakers of Dutch rated the samples for fluency and accentedness. The results showed that the temporal measures of fluency were good predictors of fluency ratings, and that their predictive power depends on the type of measures used (i.e. traditional measures per time units, measures per information units, measures that take the L1 into consideration). Furthermore, the segmental measure of accent could predict a small part of accent ratings. Finally, perceived fluency and accent appeared to be weakly correlated, but objective measures of fluency and accent did not add additional explanatory power to the models of perceived accent and perceived fluency respectively.

    Keywords accent, Fluency, perception, second language acquisition

  • Ralph L. Rose, “Filled Pauses in Writing: What can they Teach us about Speech?,” December 2011.

    Abstract This presentation reports on a research effort to use filled pauses ('uh', 'um': hereafter, FPs) in blog writings to better understand how and why speakers use them in spontaneous speech. Blog FPs are written intentionally and cannot be the result of some linguistic processing shortcoming (i.e., speech-repair as in Levelt, 1983). Hence, if written FPs can be accurately characterized, then the spoken FPs that fit this characterization can be removed from consideration leaving a smaller, potentially more uniform set of other FPs for further study. Samples of FPs in blog writings were gathered from 100 top blogs. Samples of FPs in spontaneous speech were taken from the Switchboard corpus. A balanced sample of 227 FPs were gathered of each type. Each FP was categorized according to its medium (written or spoken), its location (at clause boundary or clause-internal), the part-of-speech of the immediately following word (content or function, following Maclay and Osgood's 1959 classification), and the FP type (open 'uh' or closed 'um', after Rose, 1998). The data was analyzed under a generalized linear model with chi-square tests. There was a main effect of FP Type (Chi-square=48.4, p<0.001) with a ratio of open to closed FPs of approximately 2:1. This is comparable to previous studies (e.g., Rose, 1998). There were no other main effects. There was an interaction between medium and following word type (Chi-square=37.0, p<0.001), as well as between medium and FP type (Chi-square=5.4, p<0.05). In the spoken medium, the following word was 30% more likely to be a function word than a content word, while in the written medium, this trend reversed: the following word was 70% more likely to be a content word than a function word. Also, in the spoken medium, the ratio of open to closed FPs was almost 3:1, but in the written medium, this ratio dropped to 1.4:1. Results from FPs in writing suggest a hybrid view of FPs in speech: Some FPs are used intentionally and with some selectional restrictions (i.e., before content words) in order to serve some pragmatic function (cf., filler-as-word hypothesis in Clark and Fox Tree, 2002), with open FPs being slightly preferred in this role. Other FPs in speech are the result of difficulties during linguistic processing and occur semi-automatically as part of speech repair (cf., Levelt, 1983).

  • Christoph Rühlemann, Andrej Bagoutdinov, and Matthew Brook O’Donnell, “Windows on the mind: Pauses in conversational narrative,” International Journal of Corpus Linguistics, vol. 16, no. 2, 2011, pp. 198-230. DOI: 10.1075/ijcl.16.2.03ruh.

    Abstract This paper investigates four different types of pauses in conversational narrative: the filled pauses er and erm, and short and long silent pauses. The study is based on the Narrative Corpus (NC), a recently created corpus of everyday narratives. The texts, which include both the narrative and some context, have been annotated for important textual components. The current analysis reveals that pauses are more frequent in conversational narrative than in general conversation. We suggest three factors that account for this high frequency: (i) the need for narrators, in the opening utterance of the story, to provide specific information to orient listeners to the situation in which the events unfolded, (ii) the need to coordinate narrative clauses to match the story events, and (iii) the preference of narrators to present speech, thought, emotion and gesture using direct-mode discourse presentation, which is more "dramatic" but also more costly in terms of reference resolution.

    Keywords discourse presentation, narrative, narrative corpus, pauses, quotatives, Reference

  • Scott H. Fraundorf, and Duane G. Watson, “The disfluent discourse: Effects of filled pauses on recall,” Journal of Memory and Language, vol. 65, no. 2, 2011, pp. 161-175.

    Abstract We investigated the mechanisms by which fillers, such as uh and um, affect memory for discourse. Participants listened to and attempted to recall recorded passages adapted from Alice’s Adventures in Wonderland. The type and location of interruptions were manipulated through digital splicing. In Experiment 1, we tested a processing time account of fillers’ effects. While fillers facilitated recall, coughs matched in duration to the fillers impaired recall, suggesting that fillers’ benefits cannot be attributed to adding processing time. In Experiment 2, fillers’ locations were manipulated based on norming data to be either predictive or non-predictive of upcoming material. Fillers facilitated recall in both cases, inconsistent with an account in which listeners predict upcoming material using past experience with the distribution of fillers. Instead, these results suggest an attentional orienting account in which fillers direct attention to the speech stream but do not always result in specific predictions about upcoming material.

    Keywords Language comprehension

  • Parvaneh Tavakoli, “Pausing patterns: differences between L2 learners and native speakers,” ELT Journal, vol. 65, no. 1, May 2011, pp. 71-79. DOI: 10.1093/elt/ccq020.

    Abstract This paper reports on a comparative study of pauses made by L2 learners and native speakers of English while narrating picture stories. The comparison is based on the number of pauses and total amount of silence in the middle and at the end of clauses in the performance of 40 native speakers and 40 L2 learners of English. The results of the quantitative analyses suggest that, although the L2 learners generally pause more repeatedly and have longer periods of silence than the native speakers, the distinctive feature of their pausing pattern is that they pause frequently in the middle of clauses rather than at the end. The qualitative analysis of the data suggests that some of the L2 learners’ mid-clause pauses are associated with processes such as replacement, reformulation, and online planning. Formulaic sequences, however, contain very few pauses and therefore appear to facilitate the learners’ fluency.

  • Gunnel Tottie, “"Uh" and "Um" as sociolinguistic markers in British English,” International Journal of Corpus Linguistics, vol. 16, no. 2, 2011, pp. 173-197. DOI: 10.1075/ijcl.16.2.02tot.

    Abstract This study is based on the British National Corpus (BNC) and also takes data from the London-Lund Corpus (LLC) into account. It shows that the so-called filled pauses er/uh and erm/um are sociolinguistic markers that differentiate between registers of English and along gender, age and socio-economic class. Men, older people and educated speakers use more fillers than women, younger speakers and less educated speakers. Nasalization is used more often by women, younger speakers and more educated speakers. These sociolinguistic factors can probably partly explain the fact that the use of fillers is higher in the LLC and the context-governed sample of the BNC than in the demographic sample of the BNC. It is argued that a more positive view should be taken of fillers as planning signals, or planners, and that their functions should be submitted to careful discourse analytic study. Their recognition as words will facilitate such an undertaking.

    Keywords corpus linguistics, Discourse markers, disfluency, filled pauses, hesitation markers, sociolinguistic markers


  • April Ginther, Slobodanka Dimova, and Rui Yang, “Conceptual and empirical relationships between temporal measures of fluency and oral English proficiency with implications for automated scoring,” Language Testing, vol. 27, no. 3, June 2010, pp. 379-399. DOI: 10.1177/0265532210364407.

    Abstract Information provided by examination of the skills that underlie holistic scores can be used not only as supporting evidence for the validity of inferences associated with performance tests but also as a way to improve the scoring rubrics, descriptors, and benchmarks associated with scoring scales. As fluency is considered a critical, perhaps foundational, component of speaking proficiency, temporal measures of fluency are expected to be strongly related to holistic ratings of speech quality. This study examines the relationships among selected temporal measures of fluency and holistic scores on a semi-direct measure of oral English proficiency. The spoken responses of 150 respondents to one item on the Oral English Proficiency Test (OEPT) were analyzed for selected temporal measures of fluency. The examinees represented three first language backgrounds (Chinese, Hindi, and English) and the range of scores on the OEPT scale. While strong and moderate correlations between OEPT scores and speech rate, speech time ratio, mean length of run, and the number and length of silent pauses were found, fluency variables alone did not distinguish adjacent levels of the OEPT scale. Temporal measures of fluency may reasonably be selected for the development of automated scoring systems for speech; however, identification of an examinee’s level remains dependent on aspects of performance only partially represented by fluency measures.

    Keywords automated scoring, Fluency, oral English proficiency

  • Joanne Arciuli, David Mallard, and Gina Villar, “"Um, I can tell you’re lying": Linguistic markers of deception versus truth-telling in speech,” Applied Psycholinguistics, vol. 31, no. 3, 2010, pp. 397-411. DOI: 10.1017/s0142716410000044.

    Abstract Lying is a deliberate attempt to transmit messages that mislead others. Analysis of language behaviors holds great promise as an objective method of detecting deception. The current study reports on the frequency of use and acoustic nature of "um" and "like" during laboratory-elicited lying versus truth-telling. Results obtained using a within-participants false opinion paradigm showed that instances of "um" occur less frequently and are of shorter duration during lying compared to truth-telling. There were no significant differences in relation to "like." These findings contribute to our understanding of the linguistic markers of deception behavior. They also assist in our understanding of the role of "um" in communication more generally. Our results suggest that "um" may not be accurately conceptualized as a filled pause/hesitation or speech disfluency/error whose increased usage coincides with increased cognitive load or increased arousal during lying. It may instead carry a lexical status similar to interjections and form an important part of authentic, effortless communication, which is somewhat lacking during lying.

  • Rachel Baker, and Valerie Hazan, “LUCID: a corpus of spontaneous and read clear speech in British English,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 3-6.

    Abstract This paper describes LUCID, the London UCL Clear Speech in Interaction Database, which contains spontaneous and read speech in clear and casual speaking styles for 40 Southern British English speakers. The problem-solving task used to collect the spontaneous speech, the DiapixUK task, is also described, along with ways of using the task to elicit different types of clear speech without explicit instruction, e.g., using different ‘barriers’ to communication. Applications of the corpus and of the task materials for future research projects are discussed. The corpus and materials will be available online to the research community at the end of the project.

    Keywords clear speech, DiSS, interaction, Speech production, spontaneous speech

  • Catia Cucchiarini, Joost van Doremalen, and Helmer Strik, “Fluency in non-native read and spontaneous speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 15-18.

    Abstract Various studies have investigated the temporal aspects of nonnative speech and their relation to perceived fluency, because fluency constitutes an important aspect of second language proficiency. For this purpose it is important to determine which measures are most strongly correlated with perceived fluency and how these measures vary. In the present study objective measures related to perceived fluency were calculated for read and spontaneous speech of non-native speakers of Dutch. The results indicate that the objective measures vary as a function of different variables. Suggestions are made for future investigations so as to facilitate comparisons between studies and meta-analyses.

    Keywords DiSS, Fluency, non-native speech, temporal measures

  • Anne Cutler, Holger Mitterer, Susanne Brouwer, and Annelie Tuinman, “Phonological competition in casual speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 43-46.

    Abstract The natural processes affecting spontaneous speech production and the natural processes of spoken-word recognition combine to cause significant activation of irrelevant lexical competitors. Using eye-tracking, we show that reduced forms of words that occur in casual speech cause listeners to activate lexical candidates that resemble the reduced form but are quite unlike the canonical form of the intended word. In L2, the problem is worse: casual speech processes that occur in the L2 but not in the L1 lead to activation of irrelevant competitors even where native listeners experience no such competition.

    Keywords competition, DiSS, eyetracking, word recognition

  • Dale J. Barr, and Mandana Seyfeddinipur, “The role of fillers in listener attributions for speaker disfluency,” Language and Cognitive Processes, vol. 25, no. 4, 2010, pp. 441-455. DOI: 10.1080/01690960903047122.

    Abstract When listeners hear a speaker become disfluent, they expect the speaker to refer to something new. What is the mechanism underlying this expectation? In a mouse-tracking experiment, listeners sought to identify images that a speaker was describing. Listeners more strongly expected new referents when they heard a speaker say um than when they heard a matched utterance where the um was replaced by noise. This expectation was speaker-specific: it depended on what was new and old for the current speaker, not just on what was new or old for the listener. This finding suggests that listeners treat fillers as collateral signals.

    Keywords common ground, Dialogue, Disfluency, fillers, Perspective taking

  • Robert Eklund, “The effect of directed and open disambiguation prompts in authentic call center data on the frequency and distribution of filled pauses and possible implications for filled pause hypotheses and data collection methodology,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 23-26.

    Abstract This paper studies the frequency and distribution of filled pauses (FPs) in ecologically valid data where unaware and authentic customers called in to report problems with their telephony and/or Internet services and were met by a novel Wizard-of-Oz paradigm using real call center agents as wizards. The data analyzed were caller utterances following a directed or an open disambiguation prompt. While no significant differences in FP production were observed as a function of prompt type, FP frequency was found to be considerably higher than what is usually reported in the literature. Moreover, a higher proportion of utterance-initial FPs than normally reported was also observed. The results are compared to previously reported FP frequencies. Potential implications for data collection methodology are discussed.

    Keywords call center, data collection, dialog systems, directed prompts, DiSS, filled pauses, many-options, open prompts, speech planning, Speech production, Wizard-of-Oz, WOZ

  • Ian R. Finlayson, Robin J. Lickley, and Martin Corley, “The influence of articulation rate, and the disfluency of others, on one’s own speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 119-122.

    Abstract Disfluencies are a regular feature of spontaneous speech, and much has been learnt about the effects of various linguistic factors on their production. Speech usually occurs within dialogue, yet little is known about the influence of an interlocutor’s speech on a speaker’s own fluency. It has been shown that speakers tend to align on various levels, converging, for example, on lexical, and syntactic levels. But we know little about convergence in rate of speech or disfluency. Little is also known about the effects of speech rate on fluency in a speaker’s own speech. In this paper, we examine these effects through analysis of speech rate, hesitation and error correction in a corpus of task-oriented dialogues (the HCRC Map Task Corpus). Our findings demonstrate that different types of disfluencies can be influenced in different ways by speech rate. Furthermore, the probability of an interlocutor being disfluent appears to affect the speaker’s own likelihood, raising the possibility that interlocutors may “align” on disfluent, as well as fluent, speech.

    Keywords accommodation theory, alignment, articulation rate, Dialogue, DiSS

  • Anne Garcia-Fernandez, Ioana Vasilescu, and Sophie Rosset, “euh as cue for speaker confidence and word searching in human spoken answers in French,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 79-80.

    Abstract This paper deals with the contextual analysis of the vocalic hesitation euh in French in a corpus of human elicited answers. Through the analysis of the contextual combinatorial patterns, the new information introductory role of this vocalic hesitation is investigated. Observations support trends noticed in other languages and suggest potential optimization for automatic question answering systems.

    Keywords DiSS, feeling of knowing, interaction management, QA systems, rephrasing, vocalic hesitation

  • Jean-Philippe Goldman, Mathieu Avanzi, and Antoine Auchlin, “Hesitations in read vs. spontaneous French in a multi-genre corpus,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 101-104.

    Abstract This study is part of on-going work whose goal is the prosodic characterization of various speaking styles in a multi-genre 70-minute French corpus as well as the development of prosodic automatic detection tools. In this corpus, a manual annotation of prominences and disfluencies like hesitations and syntactic ruptures is used to show evident phonological aspects of hesitation in regard to quality, pause position and proximity to syntactic rupture.

    Keywords disfluencies, DiSS, filled pause, hesitation, spoken French, vowel lengthening

  • Joakim Gustafson, and Daniel Neiberg, “Prosodic cues to engagement in non-lexical response tokens in Swedish,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 63-66.

    Abstract This paper investigates the prosodic patterns of non-lexical response tokens in a Swedish call-in radio show. The feedback of a professional speaker was investigated to give insight into how to build a simulated active listener that could encourage its users to continue talking. Possible domains for such systems include customer care and second language learning. The prosodic analysis of the non-lexical response tokens showed that the engagement level decreases over time. Prosodic cues to this include change in syllabicity, pitch slope and loudness. We have also investigated prosodic alignment, to see to what extent the active listener mimics the prosody of the callers in his non-lexical response tokens.

    Keywords DiSS, listener responses, prosodic alignment, prosodic cues, turn management

  • Corinna Harwardt, “Investigating the COG ratio as feature for speaker verification on high-effort speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 35-38.

    Abstract Vocal effort mismatch in training and test data leads to immense degradations of speaker recognition systems. The changes in the acoustics of a speech signal induced by raised vocal effort are complex and, despite several studies from various authors, not completely known yet. Instead of just gaining knowledge about these differences, for automatic speaker recognition it is rather essential to discover features that remain relatively stable in changing vocal effort conditions and contain speaker-specific information. In this study we investigate the center of gravity (COG) ratio for high and mid frequency bands as a feature for speaker recognition. We find that vocal effort mismatch leads to an equal error rate (EER) more than six times higher for a standard MFCC-based GMM-UBM system. For the COG ratio we observe a much smaller degradation of around 25%. When adapting the UBM with additional high-effort speech data the EER of the COG ratio gets even better for the mismatch condition than for the matching task. Combining MFCC and the COG ratio leads to best results with an overall improvement of 16% compared to the standard MFCC-based system.

    Keywords center of gravity ratio, DiSS, speaker recognition, vocal effort

  • Valerie Hazan, and Rachel Baker, “Does reading clearly produce the same acoustic-phonetic modifications as spontaneous speech in a clear speaking style?,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 7-10.

    Abstract This paper describes an acoustic-phonetic comparison of casual and clear speech styles elicited in read and spontaneous speech. For the spontaneous speech, 20 pairs of English talkers were recorded doing a problem-solving picture task in good and degraded listening conditions. Each person also read sentences in casual and clear styles. The read clear speech was an exaggerated form of clear speech relative to the spontaneous clear speech: it had higher median F0 in both styles, a greater increase in F0 range and greater decrease in speaking rate between casual and clear styles, and trends towards greater vowel space expansion.

    Keywords acoustic-phonetic characteristics, clear speech, DiSS, interaction, read speech, spontaneous speech

  • Pei-Yu Hsieh, “Pitch patterns in the vocalization of a 3-month-old Taiwanese infant,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 93-96.

    Abstract This paper studied pitch contours of a Taiwanese-acquiring infant at the gooing stage. Breath group theory has shown that pitch patterns of this stage were physiologically-based [6]. Fall was expected to occur at the boundary of a breath group. The theory predicted Fall to be the most common pitch contour and Rise-Fall the second most common. But previous studies [8], [9] showed that Rise-Fall occurred more. We investigated patterns of an infant from six weeks old to twelve weeks old. Mean f0 of basic contours of this stage were also shown. The f0 range of Level, Fall, and Rise were reported. Our results showed four types of contours (Level, Fall, Rise, Rise-Fall) appearing at this stage. Consistent with the hypothesis, Fall was found to be most common. Rise-Fall was found to be the second most common. Fall and Rise-Fall made up almost seventy percent. Level contour was found to be rare. The mean f0 of the infant at 3 months old was 400 Hz, higher than that of a toddler at 1;3 (370 Hz) and that of an adult (220 Hz). The f0 range was 700 Hz, greater than that of a toddler at 1;3 (450 Hz), and an adult (300 Hz).

    Keywords acquisition, DiSS, pitch, vocalization

  • Tomohito Ishikawa, “Coding disfluency phenomena for a fluency measure in TBLT research,” Journal of Soka Women’s College, vol. 40, March 2010, pp. 101-130.

    Abstract The aim of this article is to describe coding steps for a disfluency measure employed in Ishikawa (2008a, b). According to Ellis and Barkhuizen (2005), fluency measures can be divided into two major categories. One is related to speed of speaking (i.e., temporal variables) and the other is related to repair fluency. In the sections to follow, I will first describe Shriberg’s classification system of disfluency. After the description of Shriberg’s classification system, I will describe an L2 disfluency measure used in Ishikawa (2008a, b).

  • Yuichi Ishimoto, and Mika Enomoto, “Analysis of prosodic features for end-of-utterance prediction in spontaneous Japanese,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 97-100.

    Abstract In this study, we analyzed prosodic features of accentual phrases and investigated their temporal changes to obtain cues for detecting boundaries at where turn-taking could occur in spontaneous conversations. The acoustic parameters used as prosodic features were the fundamental frequency, sound pressure level, and duration of accentual phrases in long utterance units. The results showed that the fundamental frequency shift between the first and second accentual phrases could be useful for detecting the number of accentual phrases in the long utterance unit. In addition, the results suggested that a rapid decrease in sound pressure and an extended duration of the accentual phrase constitute a cue for detecting the end of the utterance. That is, the acoustic predictor of the utterance length appeared at the beginning of the utterance, and the predictor of the utterance boundary appeared shortly before the end of the utterance.

    Keywords accentual phrase, DiSS, long utterance unit, prosody, turn-taking

  • Kristiina Jokinen, “Hesitation and uncertainty as feedback,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 103-106.

    Abstract This paper deals with the signals that are used to express hesitation and uncertainty in conversational interactions. It studies the relation between gesturing, body posture, facial expressions, and speech, and draws conclusions of their role and function in the interpretation and coordination of interaction with respect to the basic enablements of communication. Dialogues are assumed to be cooperative activity that is constrained by the participants’ roles, social obligations, and communicative situation.

    Keywords DiSS, hesitation, interaction, speech, uncertainty

  • Okim Kang, “Relative salience of suprasegmental features on judgments of L2 comprehensibility and accentedness,” System, vol. 38, no. 2, June 2010, pp. 301-315. DOI: 10.1016/j.system.2010.01.005.

    Abstract Suprasegmentals have been emphasized in ESL/EFL pedagogy since the advent of communicative language teaching. However, it is still unclear how individual suprasegmental features affect listeners’ judgments of non-native speakers’ accented speech. The current study began to specify relative weights of individual temporal and prosodic features for listeners’ judgments on L2 comprehensibility and accentedness. Using the PRAAT computer program, 5 min of continuous in-class lectures from 11 international teaching assistants (ITAs) were acoustically analyzed for measures of speech rate, pauses, stress, and pitch range. Fifty eight US undergraduate students evaluated the ITAs’ oral performance and commented on their ratings. The results revealed that suprasegmental features independently contributed to listeners’ perceptual judgments. Accent ratings were best predicted by pitch range and word stress measures whereas comprehensibility scores were mostly associated with speaking rates. ITAs’ acoustic profiles as well as listeners’ comments on their rating offer practical implications to ITA program developers, ESL teachers, and future research in accented speech.

    Keywords accentedness, Comprehensibility, International teaching assistants, Suprasegmentals

  • Takuya Kawada, “On the characteristics of three types of Japanese fillers: e-, ma-, and demonstrative-type fillers,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 27-30.

    Abstract Japanese has various forms of fillers. However, the characteristics of each form have yet to be well understood. We use a large corpus of spontaneous Japanese speech and conversation and focus on three frequently observed types of fillers: e-, ma-, and demonstrative-type fillers. We show that it is possible to characterize Japanese fillers from the viewpoint of how a speaker concerns himself with the listener in the communicative setting. The type of discourse, way of speaking, and direction of gaze of the speaker influence the distribution of the types of filler.

    Keywords DiSS, fillers, gaze, Japanese, spoken settings

  • Hanae Koiso, and Yasuharu Den, “Towards a precise model of turn-taking for conversation: a quantitative analysis of overlapped utterances,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 55-58.

    Abstract In this paper, we present the outline of a new model of turn-taking that is applicable not only to smooth transitions but also to transitions involving overlapping speech. We identify acoustic, prosodic, and syntactic cues in overlapped utterances that elicit early initiation of a next turn, based on a quantitative analysis of Japanese three-party conversations, proposing a model for predicting a turn’s completion in an incremental fashion using sources from units at multiple levels.

    Keywords DiSS, incremental processing, overlapped utterances, turn-taking

  • Phoenix W. Y. Lam, “Discourse Particles in Corpus Data and Textbooks: The Case of Well,” Applied Linguistics, vol. 31, no. 2, May 2010, pp. 260-281. DOI: 10.1093/applin/amp026.

    Abstract Discourse particles are ubiquitous in spoken discourse. Yet despite their pervasiveness very few studies attempt to look at their use in the pedagogical setting. Drawing on data from an intercultural corpus of speech and a textbook database, the present study compares the use of discourse particles by expert users of English in Hong Kong with their descriptions and presentations in textbooks designed for learners of English in the same community. Specifically, it investigates the similarities and differences in the use of the discourse particle well between the two datasets in terms of its frequency of occurrence, its positional preference and its discourse function. Results from the analysis show that there are vast differences as regards how the particle well is used in real-world examples and how its use is described and presented in teaching materials. This raises the question to what extent foreign language learners who have minimal exposure to naturally-occurring spoken interactions in English could effectively master the use of discourse particles if they solely rely on these textbooks.

  • Rebecca Lunsford, Peter A. Heeman, Lois Black, and Jan van Santen, “Autism and the use of fillers: differences between ‘um’ and ‘uh’,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 107-110.

    Abstract Little research has been done to explore differences in the use of the fillers ‘um’ and ‘uh’ between children with Autistic Spectrum Disorder (ASD) and those with typical development (TD). Quantifying any differences could aid in diagnosing ASD, understanding its nature, and better understanding the mechanisms involved in dialogue processing. In this paper, we report on a study of dialogues between clinicians and children with ASD or TD, finding that the two groups of children differ substantially in their use of ‘um’ but not ‘uh’. This suggests that these two fillers result from different cognitive processes.

    Keywords autism, disfluencies, DiSS, fillers

  • Kikuo Maekawa, “Final lowering and boundary pitch movements in spontaneous Japanese,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 47-50.

    Abstract Standard theory of the prosodic structure in Tokyo Japanese treats both the final lowering and boundary pitch movements as the properties of utterance node. Validity of this treatment was examined by means of corpus-based analyses of spontaneous speech. The results showed that while final lowering could be treated as a property of utterance, boundary pitch movement could not. The latter should rather be treated as the property of accentual phrase. Based on these results, revised prosodic structure and annotation scheme were proposed.

    Keywords BPM, CSJ, DiSS, final lowering, X-JToBI

  • Takehiko Maruyama, Katsuya Takanashi, and Nao Yoshida, “An annotation scheme for syntactic unit in Japanese dialog,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 51-54.

    Abstract In this paper, we propose a scheme for annotating syntactic units called DCU (Dialog Clause-Unit) in Japanese dialogs. Since there are no explicit devices to mark sentence boundaries in speech, precise definition and criteria must be designed to extract syntactic units from the utterance. We show a design of DCU which consists of clausal and non-clausal units. Annotating DCU tags to eight dialogs of 40 minutes from two different dialog corpora, we examine characteristics of each dialog from the viewpoint of DCU, and compare them to the distribution of clausal-units annotated to monologs.

    Keywords clause boundary, dialog clause-unit, DiSS, Japanese dialog and monolog, unit length

  • Dana McDaniel, Cecile McKee, and Merrill F. Garrett, “Children’s sentence planning: Syntactic correlates of fluency variations,” Journal of Child Language, vol. 37, no. 1, 2010, pp. 59-94. DOI: 10.1017/s0305000909009507.

    Abstract This paper argues for broader consideration of children’s language production systems and, in that context, describes research on children’s planning of syntactic structures. The research presented here measures non-fluency patterns in elicited utterances of varied syntactic type. We describe and interpret several regularities in these patterns for two groups of children (‘young’: three–five-year-olds; and ‘older’: six–eight-year-olds) and an adult comparison group. The evidence indicates a strong correspondence of adult and child responses to structural complexity, both in terms of global fluency measures and in terms of more detailed indicators of planning load. In addition, we report some specific contrasts in the patterning for children and adults that suggest disparities in processing resources and/or in local planning strategies.

  • Sandra Merlo, and Plínio A. Barbosa, “Periodic cycles of hesitation phenomena in spontaneous speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 19-22.

    Abstract To verify whether hesitation phenomena are distributed periodically in spontaneous speech, twenty speech samples produced by five male adults were analyzed. Spectral analysis allowed for three main findings. First, hesitations present stationary behavior, which implies they did not accumulate in the beginning, in the middle, or in the end of speech samples. Second, periodic cycles of hesitation phenomena were detected in all speech samples (mean cycle duration around 13 seconds). This implies that regions with more hesitations tended to regularly alternate with regions with fewer hesitations. Third, periodic cycles accounted for about 30% of variance in data.

    Keywords DiSS, hesitation phenomena, periodic cycles, time series

  • Emi Morita, “Salientizing the breaks in talk: a study of Japanese segmentizing,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 59-62.

    Abstract In naturally occurring conversation, Japanese speakers often break up their turns at talk with seemingly random or disfluent pauses that break the flow of talk into a series of successive small segments which may not be semantically coherent. Moreover, the boundaries between such segments are often made salient via the attachment of interactional particles, such as ne and sa. Empirical observation of such naturally occurring partitioning of talk reveals that such “semantically irregular” segmentation is used by both speakers and their recipients to accomplish a legitimate communicative function in managing the fine-tuned choreography of moment-by-moment conversational interaction.

    Keywords DiSS, interactional particles, Japanese conversation, utterance segmentation

  • Daniel Neiberg, and Joakim Gustafson, “Modeling conversational interaction using coupled Markov chains,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 81-84.

    Abstract This paper presents a series of experiments on automatic transcription and classification of fillers and feedbacks in conversational speech corpora. A feature combination of PCA projected normalized F0 Constant-Q Cepstra and MFCCs has been shown to be effective for standard Hidden Markov Models (HMM). We demonstrate how to model both speaker channels with coupled HMMs and show expected improvements. In particular, we explore model topologies which take advantage of predictive cues for fillers and feedback. This is done by initializing the training with special labels located immediately before fillers in the same channel and immediately before feedbacks in the other speaker channel. The average F-score for a standard HMM is 34.1%, for a coupled HMM 36.7% and for a coupled HMM with pre-filler and pre-feedback labels 40.4%. In a pilot study the detectors are found to be useful for semi-automatic transcription of feedback and fillers in socializing conversations.

    Keywords conversation, coupled hidden markov models, cross-speaker modeling, DiSS, feedbacks, fillers

  • Hannele Nicholson, Kathleen Eberhard, and Matthias Scheutz, “"um...i don’t see any": the function of filled pauses and repairs,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 89-92.

    Abstract We investigate disfluency distribution rates within different moves from an interactive task-oriented experiment to further explore the suggestion by Bortfeld et al. [1] and Nicholson [2] that different types of disfluencies may fulfill varying functions. We focus on disfluency types within moves, or speech turns, where a speaker initiates something compared to a response to such a move. We find that filled pauses (FPs) such as um or uh fulfilled an interpersonal role for participants while repairs occurred out of difficulty.

    Keywords Dialogue, dialogue moves, disfluency, DiSS, Language production

  • Emanuel A. Schegloff, “Some Other "Uh(m)"s,” Discourse Processes, vol. 47, no. 2, 2010, pp. 130-174. DOI: 10.1080/01638530903223380.

    Abstract Recent work on the occurrence of "uh" and "uhm" in ordinary talk-in-interaction is concerned almost exclusively with its relation to trouble in the speech production process. After touching briefly on this environment of occurrence, this conversation-analytic article focuses attention on several interactional environments in which "uh(m)" figures in other ways—most extensively on its use to indicate the "reason-for-the-interaction’s-launching." The underlying theme is that accounts for what gets done and gets understood in talk-in-interaction must take into account not only its composition, but also its position—not only with respect to the grammar of sentences, but also with respect to the organization of turns at talk, of action sequences encompassing multiple turns at talk, and of occasions of talk, all of which are demonstrably oriented to by speakers in their production of the talk and by recipients in their analyzing of the talk.

  • Norman Segalowitz, Cognitive Bases of Second Language Fluency. London: Routledge. June 2010.

    Abstract Exploring fluency from multiple vantage points that together constitute a cognitive science perspective, this book examines research in second language acquisition and bilingualism that points to promising avenues for understanding and promoting second language fluency. Cognitive Bases of Second Language Fluency covers essential topics such as units of analysis for measuring fluency, the relation of second language fluency to general cognitive fluidity, social and motivational contributors to fluency, and neural correlates of fluency. The author provides clear and accessible summaries of foundational empirical work on speech production, automaticity, lexical access, and other issues of relevance to second language acquisition theory. Cognitive Bases of Second Language Fluency is a valuable reference for scholars in SLA, cognitive psychology, and language teaching, and it can also serve as an ideal textbook for advanced courses in these fields.

  • Kazuki Sekine, “Gesture correction in children,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 71-74.

    Abstract Speakers sometimes modify their gestures during the process of production into disguised adaptors. Such disguised adaptors can be treated as evidence that speakers can monitor their gestures. This study investigated when disguised adaptors are produced in Japanese elementary school children. The results showed that children did not produce disguised adaptors until the age of 8. The emergence of disguised adaptors suggested that children start to monitor their gestures when they are 9 or 10 years old. Cultural influences and cognitive changes were considered as factors to influence emergence of disguised adaptors.

    Keywords adaptors, DiSS, speech error, spontaneous gestures

  • Shu-Chuan Tseng, and Yun-Ru Huang, “A socio-phonetic analysis of Taiwan Mandarin interview speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 67-70.

    Abstract This paper presents results of a socio-phonetic analysis of Taiwan Mandarin by using a corpus of questionnaire-based interview speech. Questions were asked to collect data of the interviewee’s background of language use, socio-economic status, and internet access in different regions of Taiwan. Two typical dialect-influenced pronunciation errors, the deletion of /w/ before /o/ and the delabialization of /y/, were analyzed with the associated socio-economic factors and the degree of dialect exposure. The degree of dialect exposure (Southern Min) and the studied pronunciation variants are statistically correlated with the accuracy rate. But no direct correlation was found between the pronunciation variation and the socioeconomic factors.

    Keywords DiSS, interview speech, sociophonetics, Taiwan Mandarin

  • Shu-Chuan Tseng, and Tzu-Lun Lee, “Contextual effects in recognizing reduced words in spontaneous speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 39-42.

    Abstract This study investigates the effects of context on recognizing reduced word forms in spontaneous speech. Sixteen high-frequency disyllabic targets, eight disyllabic and eight combinations of monosyllabic words are presented to 48 subjects in a spoken word recognition experiment in three conditions: in their original context, in isolation, and embedded in a carrier sentence. Results show that context, degree of reduction, word unit type, gender, and age group all show an effect on the accuracy rates of recognizing the target items. Most interestingly, while a meaningful context helps recognize reduced word forms, a less meaningful context inhibits the recognition more than no context.

    Keywords context effect, DiSS, spoken word recognition

  • Shu-Chuan Tseng, Pei-Chen Tsou, Ko Kuei, and Chien-Wen Lee, “Assessing sentence repetition and narrative speech data produced by hearing-impaired and normally hearing children,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 11-14.

    Abstract This paper examines sentence repetition and narrative speech data produced by hearing-impaired and normally hearing children with matched gender, age and level of speech comprehension. We assessed these two kinds of speech styles by talker intelligibility, vowel space, and spike production in plosives. In both speaking styles, normally hearing children performed better in talker intelligibility than their hearing-impaired counterparts. No clear vowel space shrinkage was observed in respect of speech style, hearing impairment, and age group. Surprisingly, the production of the spike in plosives was a useful measure for distinguishing acoustic properties of different speaking styles and hearing ability.

    Keywords acoustic properties, DiSS, hearing impairment, speaking style, speech assessment

  • Ioana Vasilescu, Sophie Rosset, and Martine Adda-Decker, “On the functions of the vocalic hesitation euh in interactive man-machine question answering dialogs in French,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 111-114.

    Abstract This paper deals with the functions of the French vocalic hesitation euh in interactive speech of man-machine question answering dialogs. The present analysis suggests that the vocalic hesitation euh may carry various properties in speech, both disfluent signaling the speakers’ efforts to put the intended message under production into appropriate words, and fluent, as markers of discourse structure. Moreover, euh seems to play a role in bracketing lexical units, pointing to the informative content within an utterance. This bracketing may favour intelligibility or decoding fluency on the listener’s side. The potential contribution of the vocalic hesitation euh to lexical information bracketing is investigated with the goal of improved information processing by QA systems. Future objectives include a smarter interaction capacity by an appropriate usage of such euh items.

    Keywords dialog corpus, Discourse markers, disfluency, DiSS, Fluency, French, Q/A, vocalic hesitation

  • Kun-Ching Wang, Chiun-Li Chin, and Yi-Hsing Tsai, “Voice activity detection based on combination of weighted sub-band features using auto-correlation function,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 85-88.

    Abstract This paper shows the voice activity detection (VAD) based on combination of weighted sub-band features using autocorrelation function. According to the fact that the noise corruption on each sub-band is different from each other, so the estimated signal to noise ratio (SNR) is employed to weight utility rate of each frequency sub-band. Furthermore, a strategy of sub-band features combination is used to integrate all of weighted sub-band auto-correlation function feature parameter and to develop the combined feature parameter. Experimental results demonstrate that the proposed VAD achieves better performance than existing standard VADs at any noise level.

    Keywords auto-correlation, DiSS, feature combination, sub-band weighting, voice activity detection, wavelet packet transform

  • Michiko Watanabe, and Yasuharu Den, “Utterance-initial elements in Japanese: a comparison among fillers, conjunctions, and topic phrases,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 31-34.

    Abstract Speakers need to plan the following part of speech under the pressure of a temporal imperative at utterance-initial positions. Each language seems to have some devices to solve this problem, which we call utterance-initial elements (UIEs). We investigated effects of two factors, boundary strengths and complexity of the following constituents, on the durations of possible UIEs, such as fillers, conjunctions, and topic phrases. We found that the last mora of filler e, as well as wa-marked topic phrases, became longer as the complexity increased in certain conditions. Possible interpretations for the results are discussed.

    Keywords boundary strengths, constituent complexity, DiSS, prolongation, utterance-initial elements

  • Li-chiung Yang, “Meaning and use: a pragmatic and prosodic analysis of interjections in conversational speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 75-78.

    Abstract In this paper we report on our research on the pragmatic-contextual meaning and prosody of three interjections ey, wa, and oh. A detailed qualitative-contextual analysis of our corpus shows that these interjections share important contextual and prosodic characteristics due to their similar functional status with respect to new or unexpected information. We show that there are also significant differences in contextual meaning arising from specific emotional or cognitive states, and that these differences are expressively communicated in the varied prosody of each interjection.

    Keywords discourse, DiSS, interjections, meaning, prosody

  • Etsuko Yoshida, and Robin J. Lickley, “Disfluency patterns in dialogue processing,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 115-118.

    Abstract Spontaneous speech abounds with disfluencies such as filled pauses, repairs, repetitions, false start and prolongations, all of which are significant but easily overlooked features of speech communication. Based on the comparable corpora of English and Japanese dialogues, we argue that disfluency features can have a positive effect on turn-taking issues and the establishment of common referring expressions in dialogue processing. We examined the occurrence of ten types of filled pauses in Japanese and investigated how they interact with discourse entities and the sharing of common ground. The results indicate that two patterns of disfluency features contribute to on-line speech planning of the participants and their four functions serve to construct the collaborative process of speech communication.

    Keywords common ground, corpus, Dialogue, disfluency, DiSS, referring expressions


  • Tracey M. Derwing, Murray J. Munro, Ron I. Thomson, and Marian J. Rossiter, “The Relationship between L1 Fluency and L2 Fluency Development,” Studies in Second Language Acquisition, vol. 31, no. 4, December 2009, pp. 533-557. DOI: 10.1017/S0272263109990015.

    Abstract A fundamental question in the study of second language (L2) fluency is the extent to which temporal characteristics of speakers’ first language (L1) productions predict the same characteristics in the L2. A close relationship between a speaker’s L1 and L2 temporal characteristics would suggest that fluency is governed by an underlying trait. This longitudinal investigation compared L1 and L2 English fluency at three times over 2 years in Russian- and Ukrainian- (which we will refer to here as Slavic) and Mandarin-speaking adult immigrants to Canada. Fluency ratings of narratives by trained judges indicated a relationship between the L1 and the L2 in the initial stages of L2 exposure, although this relationship was found to be stronger in the Slavic than in the Mandarin learners. Pauses per second, speech rate, and pruned syllables per second were all related to the listeners’ judgments in both languages, although vowel durations were not. Between-group differences may reflect differential exposure to spoken English and a closer relationship between Slavic languages and English than between Mandarin and English. Suggestions for pedagogical interventions and further research are also proposed.

  • Rod Ellis, “The Differential Effects of Three Types of Task Planning on the Fluency, Complexity, and Accuracy in L2 Oral Production,” Applied Linguistics, vol. 30, no. 4, December 2009, pp. 474-509. DOI: 10.1093/applin/amp042.

    Abstract The main purpose of this article is to review studies that have investigated the effects of three types of planning (rehearsal, pre-task planning, and within-task planning) on the fluency, complexity, and accuracy of L2 performance. All three types of planning have been shown to have a beneficial effect on fluency but the results for complexity and accuracy are more mixed, reflecting both the type of planning and also the mediating role of various factors, including task design and implementation variables and individual difference factors. A secondary purpose is to outline a theory that can account for the role that planning plays in L2 performance. The article concludes with a list of limitations in the research to date.

  • Klaus Zechner, Derrick Higgins, Xiaoming Xi, and David M. Williamson, “Automatic scoring of non-native spontaneous speech in tests of spoken English,” Speech Communication, vol. 51, no. 10, 2009, pp. 883-895. DOI:

    Abstract This paper presents the first version of the SpeechRater℠ system for automatically scoring non-native spontaneous high-entropy speech in the context of an online practice test for prospective takers of the Test of English as a Foreign Language® internet-based test (TOEFL® iBT). The system consists of a speech recognizer trained on non-native English speech data, a feature computation module, using speech recognizer output to compute a set of mostly fluency based features, and a multiple regression scoring model which predicts a speaking proficiency score for every test item response, using a subset of the features generated by the previous component. Experiments with classification and regression trees (CART) complement those performed with multiple regression. We evaluate the system both on TOEFL Practice data [TOEFL Practice Online (TPO)] as well as on Field Study data collected before the introduction of the TOEFL iBT. Features are selected by test development experts based on both their empirical correlations with human scores as well as on their coverage of the concept of communicative competence. We conclude that while the correlation between machine scores and human scores on TPO (of 0.57) still differs by 0.17 from the inter-human correlation (of 0.74) on complete sets of six items (Pearson r correlation coefficients), the correlation of 0.57 is still high enough to warrant the deployment of the system in a low-stakes practice environment, given its coverage of several important aspects of communicative competence such as fluency, vocabulary diversity, grammar, and pronunciation.
Another reason why the deployment of the system in a low-stakes practice environment is warranted is that this system is an initial version of a long-term research and development program where features related to vocabulary, grammar, and content will be added in a later stage when automatic speech recognition performance improves, which can then be easily achieved without a re-design of the system. Exact agreement on single TPO items between our system and human scores was 57.8%, essentially at par with inter-human agreement of 57.2%. Our system has been in operational use to score TOEFL Practice Online Speaking tests since the Fall of 2006 and has since scored tens of thousands of tests.

    Keywords Speaking assessment


  • Carla L. Hudson Kam, and Nicole A. Edwards, “The use of uh and um by 3- and 4-year-old native English-speaking children: Not quite right but not completely wrong,” First Language, vol. 28, no. 3, August 2008, pp. 313-327. DOI: 10.1177/0142723708091149.

    Abstract The delay markers (DMs) 'uh' and 'um' are often used by adult English speakers to indicate that an upcoming pause is due to a speech disruption, not the end of a conversational turn. Moreover, 'uh' and 'um' indicate different degrees of disruption (Clark & Fox Tree, 2002). Thus, it appears that children must learn how to use DMs appropriately. In the current study we examined DM use in elicited speech samples from 24 3- and 4-year-old children. We found that pauses following DMs were longer than those not following a DM, but that there was no difference between the pauses following 'uh' and 'um'. Children at this age, then, appear to understand the basic use of DMs, but do not yet differentiate between them.

    Keywords Conversational development, disfluencies, filled pauses, narrative, turn-taking

  • Martin Corley, and Oliver W. Stewart, “Hesitation Disfluencies in Spontaneous Speech: The Meaning of um,” Language and Linguistics Compass, vol. 2, no. 4, July 2008, pp. 589-602. DOI: 10.1111/j.1749-818X.2008.00068.x.

    Abstract Human speech is peppered with ums and uhs, among other signs of hesitation in the planning process. But are these so-called fillers (or filled pauses) intentionally uttered by speakers, or are they side-effects of difficulties in the planning process? And how do listeners respond to them? In the present paper, we review evidence concerning the production and comprehension of fillers such as um and uh, in an attempt to determine whether they can be said to be ‘words’ with ‘meanings’ that are understood by listeners. We conclude that, whereas listeners are highly sensitive to hesitation disfluencies in speech, there is little evidence to suggest that they are intentionally produced, or should be considered to be words in the conventional sense.

  • Tracey M. Derwing, Murray J. Munro, and Ron I. Thomson, “A Longitudinal Study of ESL Learners’ Fluency and Comprehensibility Development,” Applied Linguistics, vol. 29, no. 3, 2008, pp. 359-380. DOI: 10.1093/applin/amm041.

    Abstract This longitudinal mixed-methods study compared the oral fluency of well-educated adult immigrants from Mandarin and Slavic language backgrounds (16 per group) enrolled in introductory English as a second language (ESL) classes. Speech samples were collected over a 2-year period, together with estimates of weekly English use. We also conducted interviews at the last data collection session. The participants’ fluency and comprehensibility at three points over 22 months were judged by 33 native speakers of English. We examine the learners’ progress in light of their exposure to English outside of their ESL class. The Slavic language speakers showed a small but significant improvement in both fluency and comprehensibility, whereas the Mandarin speakers’ performance did not change over 2 years, although both groups started at the same level of oral proficiency. These differences may be attributable in part to degree of exposure to English outside the ESL courses. Neither group had extensive exposure outside of their classes because of employment and familial responsibilities (although the Slavic language speakers reported more opportunities). Thus both groups may have been disadvantaged by a lack of oral fluency instruction. The findings, both quantitative and qualitative, are interpreted using the Willingness to Communicate framework; we also discuss implications for the language classroom.

  • Michael Erard, Um... Slips, Stumbles, and Verbal Blunders, and What They Mean. New York: Penguin Random House. August 2008.

    Abstract This original, entertaining, and surprising book investigates verbal blunders: what they are, what they say about those who make them, and how and why we’ve come to judge them. Um… is about how you really speak, and why it’s normal for your everyday speech to be filled with errors—about one in every ten words. In this charming, engaging account of language in the wild, linguist and writer Michael Erard also explains why our attention to some blunders rises and falls. Where did the Freudian slip come from? Why do we prize "umlessness" in speaking—and should we? And how do we explain the American presidents who are famous for their verbal stumbles? Full of entertaining examples, Um… is essential reading for talkers and listeners of all stripes.

  • T. Florian Jaeger, and Celeste Kidd, “A Unified Model of Redundancy Avoidance and Strategic Lengthening,” in The 21st CUNY Sentence Processing Conference, March 2008.

    Abstract Recent studies have revealed an intriguing link between redundancy and reduction: words that are more predictable in their context are more commonly reduced (shorter and with less articulatory detail [1,2,3]). These studies have, however, also found a puzzling asymmetry: Content words are reduced when predictable given the previous word, but function words are reduced when predictable given the following word. We present a solution to this puzzle that unifies work on redundancy with work on strategic lengthening [4]. We find that the apparent backward-predictability effect on function word reduction is an artifact caused by speakers' tendency to slow pronunciation when the next word is unavailable.

  • Lucy J. MacGregor, “Disfluencies affect language comprehension: evidence from event-related potentials and recognition memory,” Master's Thesis, The University of Edinburgh. 2008.

    Abstract Everyday speech is littered with disfluencies such as filled pauses, silent pauses, repetitions and repairs which reflect a speaker’s language production difficulties. But what are the effects on language comprehension? This thesis took a novel approach to the study of disfluencies by combining an investigation of the immediate effects on language processing with an investigation of the longer-term effects for the representation of language in memory. A series of experiments is reported which reflects the first attempt at a systematic investigation of the effects of different types of disfluencies on language comprehension. The experiments focused on the effects of three types of disfluencies—ers, silent pauses, and repetitions—on the comprehension of subsequent words. Critical words were either straightforward continuations of the pre-interrupted speech or a repair word which corrected the pre-interrupted speech. In addition, the effects that occur when er, repetition, and repair disfluencies themselves are processed, were assessed. ERPs showed that the N400 effect elicited in response to contextually unpredictable compared to predictable words was attenuated by the presence of a pre-target er reflecting a reduction in the standard difference where unpredictable words are more difficult to integrate into their contexts. This finding suggests that ers may reduce the extent to which listeners make predictions about upcoming words. In addition, words preceded by an er were more likely to be correctly recognised in a subsequent memory test. These findings demonstrate a longer-term consequence for representation which may reflect heightened attention during processing. Silent pauses did not affect the N400 but there was some indication of an effect on recognition memory. Repetition disfluencies did not affect the N400 or recognition memory. These findings demonstrate the importance of the nature of the disruption to speech. 
For all types of disfluent utterances, unpredictable words elicited a Late Positive Complex (LPC), possibly reflecting processes associated with memory retrieval and control as listeners attempted to resume structural fluency after any interruption. Ers themselves elicited standard attention-related ERP effects: the Mismatch Negativity (MMN) and P300 effects, supporting the possibility that ers heighten attention. Repetition disfluencies elicited a right posterior positivity, reflecting detection of the disfluency and possibly syntactic reanalysis. Repair disfluencies elicited an early frontal negativity, possibly related to the detection of a word category violation, and a P600 effect, reflecting syntactic reanalysis. The presence of an er preceding the repair eliminated the early negativity, but had no effect on the P600 suggesting that ers may prepare listeners for the possibility of an upcoming repair, but that they do not reduce the difficulty associated with reanalysis. Taken together, the results from the studies reported in the thesis support an account of disfluency processing which incorporates both prediction and attention.

    Keywords Language comprehension, Psychology

  • Ralph L. Rose, “Filled Pauses in Language Teaching: Why and How,” Bulletin of Gunma Prefectural Women’s University, vol. 29, 2008, pp. 47-64.

    Abstract Filled Pauses (uh, um) are ubiquitous elements of spontaneous speech but have received relatively little attention in second language teaching. Perhaps this is because filled pauses have often been regarded as meaningless elements resulting from speech processing difficulties. This paper draws from research in widely disparate fields to show that speakers and listeners use them systematically and meaningfully. These facts are used to generate a unified and coherent model of filled pauses in spontaneous speech. This model is then used to develop a concept of communicative competence in which filled pauses play a role at the interface between pragmatic constraints and communication strategies. The article concludes with practical recommendations for how filled pauses may be incorporated into the second-language teaching curriculum.

  • Michiko Watanabe, Keikichi Hirose, Yasuharu Den, and Nobuaki Minematsu, “Filled pauses as cues to the complexity of upcoming phrases for native and non-native listeners,” Speech Communication, vol. 50, no. 2, February 2008, pp. 81-94. DOI: 10.1016/j.specom.2007.06.002.

    Abstract We examined whether filled pauses (FPs) affect listeners’ predictions about the complexity of upcoming phrases in Japanese. Studies of spontaneous speech corpora show that constituents tend to be longer or more complex when they are immediately preceded by FPs than when they are not. From this finding, we hypothesized that FPs cause listeners to expect that the speaker is going to refer to something that is likely to be expressed by a relatively long or complex constituent. In the experiments, participants listened to sentences describing both simple and compound shapes on a computer screen. Their task was to press a button as soon as they had identified the shape corresponding to the description. Phrases describing shapes were immediately preceded by a FP, a silent pause of the same duration, or no pause. We predicted that listeners’ response times to compound shapes would be shorter when there is a FP before phrases describing the shape than when there is no FP, because FPs are good cues to complex phrases, whereas response times to simple shapes would not be shorter with a preceding FP than without. The results of native Japanese and proficient non-native Chinese listeners agreed with the prediction and provided evidence to support the hypothesis. Response times of the least proficient non-native listeners were not affected by the existence of FPs, suggesting that the effects of FPs on non-native listeners depend on their language proficiency.

  • Chen-huei Wu, “Filled Pauses in L2 Chinese: A Comparison of Native and Non-Native Speakers,” in Proceedings of the 20th North American Conference on Chinese Linguistics (NACCL-20), Columbus, Ohio, The Ohio State University, 2008, pp. 213-227.

    Abstract The aim of this paper is to determine whether native and non-native speech can be predicted on the basis of temporal measurements of filled pauses by training a Classification and Regression Tree (Breiman et al. 1984). On the basis of the present results, several conclusions can be drawn: First, distinguishing between native and non-native speech can increase in accuracy based on temporal measurements of FPs. Among these variables, the rate of speech appears to be the best predictor. Second, this study suggests that information from the FPs ‘uh’ and ‘um’ is a useful predictor of fluency in further differentiating native/nonnative speakers. Third, the classification can be accurately predicted with a small set of variables.


  • Karl G.D. Bailey, and Fernanda Ferreira, “The Processing of Filled Pause Disfluencies in the Visual World,” in Eye movements: A window on mind and brain, Roger P.G. Van Gompel, Wayne S. Murray, Martin H. Fischer, and Robin L. Hill, Eds. Amsterdam: Elsevier, 2007, ch. 22, pp. 485-500. DOI: 10.1016/B978-008044980-7/50024-0.

    Abstract One type of spontaneous speech disfluency is the filled pause, in which a filler (e.g. uh) interrupts production of an utterance. We report a visual world experiment in which participants’ eye movements were monitored while they responded to ambiguous utterances containing filled pauses by manipulating objects placed in front of them. Participants’ eye movements and actions suggested that filled pauses informed resolution of the current referential ambiguity, but did not affect the final parse. We suggest that filled pauses may inform the resolution of whatever ambiguity is most salient in a given situation.

  • Esther de Leeuw, “Hesitation Markers in English, German, and Dutch,” Journal of Germanic Linguistics, vol. 19, no. 2, 2007, pp. 85-114. DOI: 10.1017/S1470542707000049.

    Abstract This study reports on a number of highly significant differences found between English, German, and Dutch hesitation markers. English and German native speakers used significantly more vocalic-nasal hesitation markers than Dutch native speakers, who used predominantly vocalic hesitation markers. English hesitation markers occurred most frequently when preceded by silence and followed by a lexical item, or when surrounded by silence. German and Dutch hesitation markers occurred most frequently surrounded by lexical items. In Dutch, vocalic-nasal hesitation markers dominated only when surrounded by silence. Vocalic-nasal hesitation markers dominated in all positions in English and German, although in the former language this was more salient than in the latter. Nasal hesitation markers were used significantly more frequently in German than in English or Dutch. In addition to overall language trends, speaker-specific differences, especially within German and Dutch, were observed. These results raise questions in terms of the symptom versus signal hypotheses regarding the function of hesitation markers.

  • Carol Fehringer, and Christina Fry, “Hesitation phenomena in the language production of bilingual speakers: The role of working memory,” Folia Linguistica, vol. 41, no. 1-2, June 2007, pp. 37-72. DOI: 10.1515/flin.41.1-2.37.

    Abstract This paper is an empirical investigation of the use of hesitation phenomena, specifically filled pauses (ums and ers), automatisms (sort of, at the end of the day), repetitions and reformulations, in both the mother tongue (L1) and second language (L2) of highly proficient adult bilingual speakers (English and German). Its purpose is to ascertain: i) whether speakers who are highly proficient in L2 produce an approximately similar amount of hesitation phenomena in both languages; and ii) whether the production of such elements (in both languages) is linked to working memory capacity. Results show that: i) despite high proficiency, speakers produced a higher overall rate of hesitation phenomena in their L2, indicating that there was an additional cognitive load imposed by working in L2; and ii) in each language there was an underlying negative relationship between memory capacity and the production of hesitation phenomena, implying that speakers with lower memory ability rely more heavily on such time-buying devices. Furthermore, it was shown that the individual types of hesitation phenomena produced by speakers in their L1 were carried over into their L2, which suggests that a speaker’s planning behaviour is mirrored in both languages.

    Keywords bilingual, hesitation, L2, memory, prefabricated utterance, Speech production, working

  • Jean E. Fox Tree, “Folk notions of um and uh, you know, and like,” Text & Talk, vol. 22, no. 3, 2007, pp. 297-314.

    Abstract The current study measures laypeople’s uses of 'um', 'uh', 'you know', and 'like', including folk notions of meanings, self-assessments of use, history of discussing use, and attitudes toward the words. Unlike the prevalent idea in the popular press that these discourse markers are interchangeable speaker production flaws, respondents in this study demonstrated that people do possess folk notions of meanings and uses that dramatically distinguish markers from each other. 'Um' and 'uh' were thought to indicate production trouble, 'you know' was thought to be used in checking for understanding and connecting with listeners, and 'like' defied definition. The folk notions of 'um', 'uh', and 'you know' accord well with researchers’ ideas about the meanings of these words. The use of 'like' may be too subtle for laypeople to articulate. Most researchers’ views of 'like' involve some kind of discrepancy between what’s said and what’s meant. Even if they cannot state a meaning, people do treat the different markers differently.

    Keywords Discourse markers, fillers, like, meaning, spontaneous speech, you know

  • Irena O’Brien, Norman Segalowitz, Barbara Freed, and Joe Collentine, “Phonological Memory Predicts Second Language Oral Fluency Gains in Adults,” Studies in Second Language Acquisition, vol. 29, no. 04, 2007, pp. 557-581. DOI: 10.1017/s027226310707043x.

    Abstract This study investigated the relationship between phonological memory and second language (L2) fluency gains in native English-speaking adults learning Spanish in two learning contexts: at their home university or abroad in an immersion context. Phonological memory (operationalized as serial nonword recognition) and Spanish oral fluency (temporal/hesitation phenomena) were assessed at two times, 13 weeks apart. Hierarchical regressions showed that, after the variance attributable to learning context was partialed out, initial serial nonword recognition performance was significantly associated with L2 oral fluency development, explaining 4.5-9.7% of unique variance. These results indicate that phonological memory makes an important contribution to L2 learning in terms of oral fluency development. Furthermore, these results from an adult population extend conclusions from previous studies that have claimed a role for phonological memory primarily in vocabulary development in younger populations.

  • Pavel Trofimovich, and Wendy Baker, “Learning prosody and fluency characteristics of second language speech: The effect of experience on child learners’ acquisition of five suprasegmentals,” Applied Psycholinguistics, vol. 28, no. 2, 2007, pp. 251-276. DOI: 10.1017/s0142716407070130.

    Abstract This study examined second language (L2) experience effects on children’s acquisition of fluency- (speech rate, frequency, and duration of pausing) and prosody-based (stress timing, peak alignment) suprasegmentals. Twenty Korean children (age of arrival in the United States = 7-11 years, length of US residence = 1 vs. 11 years) and 20 age-matched English monolinguals produced six English sentences in a sentence repetition task. Acoustic analyses and listener judgments were used to determine how accurately the suprasegmentals were produced and to what extent they contributed to foreign accent. Results indicated that the children with 11 years of US residence, unlike those with 1 year of US residence, produced all but one (speech rate) suprasegmentals natively. Overall, findings revealed similarities between L2 segmental and suprasegmental learning.

  • Ioana Vasilescu, Rena Nemoto, and Martine Adda-Decker, “Vocalic Hesitations vs Vocalic Systems: A Cross-Language Comparison,” in 16th International Congress of Phonetic Sciences, 2007.

    Abstract This paper deals with the acoustic characteristics of vocalic hesitations in a cross-language perspective. The underlying questions concern the "neutral" vs. language-dependent timbre of vocalic hesitations and the link between their vocalic quality and the phonemic system of the language. An additional point of interest concerns the duration effect on vocalic hesitations compared to intra-lexical vowels. Acoustic measurements have been carried out in American English, French and Spanish. Results on vocalic timbre show that hesitations (i) carry language-specific information; (ii) whereas often close to measurements of existing vowels, they do not necessarily collapse with them. Finally, (iii) duration variation affects the timbre of vocalic hesitation and a centralization towards a "neutral" realization is observed for decreasing durations.

    Keywords centralization, duration, timbre, vocalic hesitation, vocalic systems


  • Felix K. Ameka, “Interjections,” in Encyclopedia of Language & Linguistics, Brown, Keith, Ed. Oxford, UK: Oxford, 2006, pp. 743-746. DOI: 10.1016/B0-08-044854-2/00396-5.

    Abstract Interjections are words that conventionally constitute utterances by themselves and express a speaker’s current mental state or reaction toward an element in the linguistic or extralinguistic context. Some English interjections are words such as yuk! ‘I feel disgusted,’ ow! ‘I feel sudden pain,’ wow! ‘I feel surprised and I am impressed,’ aha! ‘I now understand,’ hey! ‘I want someone’s attention,’ damn! ‘I feel frustrated,’ and bother! ‘I feel annoyed.’ Such words are found in all languages of the world. This article surveys the different uses and definitions of the term ‘interjection’ and the different types of interjections that are found in the languages of the world. It also explores the relationship of interjections to other pragmatic devices such as particles, discourse markers, and speech formulae.

    Keywords formulaic language, Indexicality, interjections, language functions, onomatopoeia, particles, routines, speech acts

  • Richard Bello, “Causes and paralinguistic correlates of interpersonal equivocation,” Journal of Pragmatics, vol. 38, no. 9, 2006, pp. 1430-1441. DOI: 10.1016/j.pragma.2005.09.001.

    Abstract This paper examines the long standing theory of the Bavelas group which suggests that the only consistent cause of interpersonal equivocation is avoidance-avoidance conflict (AAC), and it also attempts to uncover a psycholinguistic profile of equivocation, especially in the form of paralinguistic cues such as dysfluencies. Participants responded orally to questions from hypothetical interlocutors within scenarios which manipulated both the presence/absence of AAC and level of situational formality. Their responses (72 messages) were audio taped, transcribed, rated for degree of equivocation, and coded for dysfluencies. Results of ANOVA showed that AAC not only resulted in more equivocation, but also that formality level interacted with AAC in influencing equivocation. Participants used filled pauses, surprisingly, in the condition within which they equivocated the least, although they produced other dysfluencies (combined) within conditions where they equivocated the most. Results are discussed in terms of the notion that filled pauses are special and in terms of interpersonal deception theory.

    Keywords avoidance-avoidance conflict, disfluencies, Equivocation, filled pauses, Informality, Interpersonal communication, Paralinguistics

  • Stefan Benus, Frank Enos, Julia Hirschberg, and Elizabeth Shriberg, “Pauses in Deceptive Speech,” in Speech Prosody 18, Dresden, Germany, 2006, pp. 2-5.

    Abstract We use a corpus of spontaneous interview speech to investigate the relationship between the distributional and prosodic characteristics of silent and filled pauses and the intent of an interviewee to deceive an interviewer. Our data suggest that the use of pauses correlates more with truthful than with deceptive speech, and that prosodic features extracted from filled pauses themselves as well as features describing contextual prosodic information in the vicinity of filled pauses may facilitate the detection of deceit in speech.

  • Alex Boulton, “To er is human: Silent pauses and speech dysfunctions of the 2004 US presidential debates,” in Le Désaccord, Pereiro, M. and Daniels, H., Ed. Nancy: AMAES, 2006, pp. 7-32.

    Abstract It has become fashionable, even axiomatic in some circles today, to suppose that politics is all about form, not content—it’s not what they say but the way that they say it. It ought to follow that the most powerful politicians should be the best speakers, so this paper takes as its starting point the 2004 US presidential debates. These televised confrontations, where each candidate has to react to new questions as well as to counter his opponent, are notoriously high-risk, and present considerable opportunities for various speech "dysfunctions". These are analysed in relation to media reaction and public perception of the outcome.

    Keywords cognitive science, disfluency, hesitation, linguistics, presidential debate, speed of articulation

  • Martin Corley, Lucy J. MacGregor, and David Donaldson, “It’s the way that you, er, say it: Hesitations in speech affect language comprehension,” Cognition, vol. 105, no. 3, 2006, pp. 658-698. DOI: 10.1016/j.cognition.2006.10.010.

    Abstract Everyday speech is littered with disfluency, often correlated with the production of less predictable words (e.g., Beattie & Butterworth [Beattie, G., & Butterworth, B. (1979). Contextual probability and word frequency as determinants of pauses in spontaneous speech. Language and Speech, 22, 201-211.]). But what are the effects of disfluency on listeners? In an ERP experiment which compared fluent to disfluent utterances, we established an N400 effect for unpredictable compared to predictable words. This effect, reflecting the difference in ease of integrating words into their contexts, was reduced in cases where the target words were preceded by a hesitation marked by the word er. Moreover, a subsequent recognition memory test showed that words preceded by disfluency were more likely to be remembered. The study demonstrates that hesitation affects the way in which listeners process spoken language, and that these changes are associated with longer-term consequences for the representation of the message.

    Keywords disfluency, ERPs, Language comprehension, speech

  • Chika Nagaoka, “Mutual influence of nonverbal behavior in interpersonal communication,” Japanese Journal of Interpersonal and Social Psychology, vol. 6, 2006, pp. 101-112.

    Abstract In social interactions, the interactants’ nonverbal behavior may synchronize and become similar. In this study, the author called this phenomenon ‘synchrony tendency’. Since conventional research about this phenomenon has been conducted from various angles separately, there has been almost no attempt to examine the role of synchrony tendency systematically. In this light, the present study aims at reviewing synchrony tendency based on previous studies from various fields and perspectives. The synchrony tendency has been observed in various communication channels, and in various forms, such as interspeaker congruence of paralanguage, convergence of accents in cross-cultural communication, mimicry of other’s facial and vocal emotional expressions, neonate imitation, interpersonal synchrony of body movements, entrainment between a neonate’s body movement and the flow of an adult’s speech. Therefore, this phenomenon has been labeled with various terms, each one having a specific nuance. Moreover, the synchrony tendency is not always observed in all interactions, and it sensitively changes with various factors, such as the interactants’ level of empathy and socialization. For example, the results of my experiments indicate that the convergence of response latencies (i.e., latencies before responding to the last utterance of one’s partner) in dialogues reflects whether a speaker is receptive to the conversational partner during the dialogue. All these suggest that the synchrony tendency provides an effective indicator reflecting various aspects of our communication behavior. Various functions of the synchrony tendency in adults’ interactions can be inferred from past literature: (a) it facilitates the understanding of an interactional partner’s emotions, (b) it conveys empathy and rapport, and (c) it makes the speakers’ personality and attitude feel positive. 
    Furthermore, the results of my experiments showed that the synchrony tendency facilitates goal achievement, such as reaching a compromise through discussion (the speakers whose response latencies became similar over the time course to those of their conversational partners evaluated that they reached a compromise). Past literature along with the results of my own experiments bring to light two aspects of the synchrony tendency: the emotional/automatic/inherent aspect and the cognitive/acquired aspect. Examples that clearly illustrate the former aspect are imitations of facial and vocal emotional expressions and neonate imitation. On the other hand, the cognitive/acquired aspect is illustrated by convergence or congruence of response latencies, vocal intensity, speech duration, language, or accent, and is influenced by social factors. The above-mentioned aspects of the synchrony tendency match Hess, Philippot, & Blairy (1999)’s mimicry model, Giles et al.’s communication accommodation theory (ex. Shepard, Giles, & LePoire, 2001), as well as the author’s speech style convergence model. The speech styles convergence model derived from a series of studies on the convergence of response latencies in dialogues. This model suggests that adopting a partner’s speech style and the output cycle between the interactants being influenced by the speakers’ social skills and attitude towards the partner, this cycle develops over the course of the interaction until the speech styles finally converge to a point most suitable for the members of the dyad to progress smoothly through the dialogue. In the future, it is necessary to investigate quantitatively through which communication channels, and when in the time course of an interaction, the synchrony tendency is displayed.

    Keywords cognition, emotion, nonverbal behavior, synchrony tendency

  • Stefanie Pillai, “Self-Monitoring and Self-Repair in Spontaneous Speech,” k@ta, vol. 8, no. 2, 2006, pp. 114-126.

    Abstract This study explores what repairs in the spontaneous production of speech reveal about the psycholinguistic processes of self-monitoring and self-repair. Three intervals were examined: error-to-cut off; cut off-to-repair; error-to-repair. The intervals support theories of internal speech monitoring, and also indicate that the planning of speech-repairs can take place pre-articulatorily as well.

    Keywords error-detection, Perceptual loop theory, self-monitoring, self-repairs, Speech production

  • Pavel Trofimovich, and Wendy Baker, “Learning Second Language Suprasegmentals: Effect of L2 Experience on Prosody and Fluency Characteristics of L2 Speech,” Studies in Second Language Acquisition, vol. 28, 2006, pp. 1-30. DOI: 10.1017/S0272263106060013.

    Abstract This study examines effects of short, medium, and extended second language (L2) experience (3 months, 3 years, and 10 years of United States residence, respectively) on the production of five suprasegmentals (stress timing, peak alignment, speech rate, pause frequency, and pause duration) in six English declarative sentences by 30 adult Korean learners of English and 10 adult native English speakers. Acoustic analyses and listener judgments were used to determine how accurately the suprasegmentals were produced and to what extent they contributed to foreign accent. Results revealed that amount of experience influenced the production of one suprasegmental (stress timing), whereas adult learners’ age at the time of first extensive exposure to the L2 (indexed as age of arrival in the United States) influenced the production of others (speech rate, pause frequency, pause duration). Moreover, it was found that suprasegmentals contributed to foreign accent at all levels of experience and that some suprasegmentals (pause duration, speech rate) were more likely to do so than others (stress timing, peak alignment). Overall, results revealed similarities between L2 segmental and suprasegmental learning.

  • Aldert Vrij, Lucy Akehurst, Laura Brown, and Samantha Mann, “Detecting Lies in Young Children, Adolescents and Adults,” Applied Cognitive Psychology, vol. 20, 2006, pp. 1225-1237. DOI: 10.1002/acp.1278.

    Abstract The ability of teachers, social workers, police officers and laypersons (undergraduate and postgraduate students) to detect truths and lies told by 5-6 year-olds, adolescents and adults was tested in the present experiment. Lie detectors judged the veracity of statements from 18 liars and 18 truth tellers belonging to these three age groups. Accuracy scores were around 60% for each of these three age groups, both for detecting truths and for detecting lies. No occupational differences emerged. Moreover, judgements made by teachers, social workers and police officers showed an overlap, suggesting that an erroneous decision made by a member of one group may not easily be detected by a member of the other groups. The lie detectors were inclined to judge cues of nervousness, cognitive demand and attempted behavioural control as cues to deceit, even when truth tellers were displaying these cues.


  • Timothy Arbisi-Kelm, and Sun-Ah Jun, “A comparison of disfluency patterns in normal and stuttered speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 13-16.

    Abstract While speech disfluencies are commonly found in every speaker’s speech, stuttering is a language disorder characterized by an abnormally high rate of speech aberrations, including prolongation, cessation, and repetition of speech segments. However, despite the obvious differences between stuttered and normal speech, identifying the crucial qualities that identify stuttered speech remains a significant challenge. A story-telling task was presented to four stutterers and four non-stutterers in order to analyze the prosodic patterns that surfaced from their spontaneous narrations. Preliminary results revealed that the major difference between stutterers’ and non-stutterers’ disfluencies – aside from the total number – is the type of disfluency and the context affected by the disfluency. Disfluencies in both groups included prolongation, pause and cut, but stutterers’ disfluencies also include repetition and combinations of the three (e.g., cut followed by pause). In addition, stutterers’ disfluencies were accompanied by more prosodic irregularities (e.g. pitch accent on function words, creating a prosodic break with degraded phonetic cues) prior to the actual disfluency than non-stutterers’ disfluencies, indirectly supporting the overvigilant self-monitoring hypothesis.

    Keywords DiSS

  • Matthew P. Aylett, “Extracting the acoustic features of interruption points using non-lexical prosodic analysis,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 17-20.

    Abstract Non-lexical prosodic analysis is our term for the process of extracting prosodic structure from a speech waveform without reference to the lexical contents of the speech. It has been shown that human subjects are able to perceive prosodic structure within speech without lexical cues. There is some evidence that this extends to the perception of disfluency, for example, the detection of interruption points (IPs) in low pass filtered speech samples. In this paper, we apply non-lexical prosodic analysis to a corpus of data collected for a speaker in a multi-person meeting environment. We show how non-lexical prosodic analysis can help structure corpus data of this kind, and reinforce previous findings that non-lexical acoustic cues can help detect IPs. These cues can be described by changes in amplitude and f0 after the IP and they can be related to the acoustic characteristics of hyper-articulated speech.

    Keywords DiSS

  • Katarina Bartkova, “Prosodic cues of spontaneous speech in French,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 21-25.

    Abstract Disfluencies, when present in the speech signal, can make syntactic parsing difficult. This difficulty is increased when machines are involved in communication and when speech devices rely on automatic speech recognition techniques. In order to improve automatic speech parsing and thus speech comprehension, methods have been proposed to filter disfluencies out from the speech signal. Attempts have been made to use prosodic parameters to improve such filtering. However, before introducing prosodic parameters into automatic speech recognition processes, it would be useful to investigate whether disfluencies can be characterized in a prosodic way and whether their prosodic cues would be representative enough to be used in automatic systems. The aim of this study was to examine to what extent prosodic parameters would be able to characterize disfluencies in French. Word repetitions, filled and silent pauses and speech repairs were described in a prosodic way using statistical analyses of their prosodic parameters. These analyses allowed simple prosodic rules to be formulated. The efficiency of the prosodic rules was evaluated on the task of filled pauses, word repetitions and hesitation detections.

    Keywords DiSS

  • Philippe Boula de Mareüil, Benoît Habert, Frédérique Bénard, Martine Adda-Decker, Claude Barras, Gilles Adda, and Patrick Paroubek, “A quantitative study of disfluencies in French broadcast interviews,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 27-32.

    Abstract The reported study aims at increasing our understanding of spontaneous speech-related phenomena from sibling corpora of speech and orthographic transcriptions at various levels of elaboration. It makes use of 9 hours of French broadcast interview archives, involving 10 journalists and 10 personalities from political or civil society. First we considered press-oriented transcripts, where most of the so-called disfluencies are discarded. They were then aligned with automatic transcripts, by using the LIMSI speech recogniser. This facilitated the production of exact transcripts, where all audible phenomena in non-overlapping speech segments were transcribed manually. Four types of disfluencies were distinguished: discourse markers, filled pauses, repetitions and revisions, each of which accounts for about 2% of the corpus (8% in total). They were analysed by utterance, speaker and disfluency pattern types. Four questions were raised. Where do disfluencies occur in the utterance? What is the influence of the speakers’ status? And what are the most frequent disfluency patterns?

    Keywords DiSS

  • Jean-Leon Bouraoui, and Nadine Vigouroux, “Disfluency phenomena in an apprenticeship corpus,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 33-37.

    Abstract This paper presents a study carried out on an apprenticeship corpus. It features dialogues between air traffic controllers in training and "pseudo-pilots". "Pseudo-pilots" are people (often instructors) who simulate the behavior of real pilots in real situations. Its main specificities are the apprenticeship characteristic, and the fact that the production is subordinate to a particular phraseology. Our study is related to the many kinds of disfluency phenomena that occur in this specific corpus. We define 6 main categories of these phenomena, and take a position with regard to the terminology used in the literature. We then present the distribution of these categories. It appears that some of the occurrence frequencies differ considerably from those observed in other studies. Our explanation is based on the corpus specificity: because of their responsibilities, both controllers and pseudo-pilots have to be especially careful about the mistakes they could make, since these could have dramatic consequences. The remainder of our paper is dedicated to a more in-depth study of one disfluency class: the "false starts". A false start consists of the beginning of the utterance of a word that is not completed. We show that this category consists of several sub-categories, of which we study the distribution.

    Keywords DiSS

  • Pierpaolo Busan, Giovanna Pelamatti, Alessandro Tavano, Michele Grassi, and Franco Fabbro, “Improvement of verbal behavior after pharmacological treatment of developmental stuttering: a case study,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 39-42.

    Abstract Developmental stuttering is a disruption in normal speech fluency and rhythm. Developmental stuttering usually manifests between 6 and 9 years of age and may persist in adulthood. At present, the exact etiology of developmental stuttering is not fully clear. Besides, the dopaminergic neurological component is likely to have a causal role in the manifestation of stuttering behaviors. Actually, some studies seem to confirm the efficacy of antidopaminergic drugs (haloperidol, risperidone and olanzapine, among others) in controlling stuttering behaviors. We present a case of persistent developmental stuttering in a 24-year-old adult male who was able to control his symptoms to a significant extent after administration of risperidone, an antidopaminergic drug. Our findings show that the pharmacological intervention helped the patient improve on a set of fluency tasks but especially when the tasks involved the uttering of content words. Our results are discussed against the current theories on the cognitive and neurological basis of developmental stuttering.

    Keywords DiSS

  • Estelle Campione, and Jean Véronis, “Pauses and hesitations in French spontaneous speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 43-46.

    Abstract In traditional terminology, silent and filled pauses are grouped together, whereas hesitation lengthening is put into a separate category. However, while these various phenomena are very often associated, there have been few studies on how they interact. We analyzed an hour of spontaneous speech to show that silent and filled pauses operate in a totally different way, and that contrary to common belief, silent pauses by themselves never serve as hesitation markers, but only do so when coupled with other markers – mostly syllabic lengthening and filled pauses. These last two hesitation markers have similar acoustic and articulatory characteristics; they are also distributed and function alike.

    Keywords DiSS

  • Maria Candea, Ioana Vasilescu, and Martine Adda-Decker, “Inter- and intra-language acoustic analysis of autonomous fillers,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 47-51.

    Abstract The present work deals with autonomous fillers in a multilingual context. The question addressed here is whether fillers carry universal or language-specific characteristics. Fillers occur frequently in spontaneous speech and represent an interesting topic for improving language-specific models in automatic language processing. Most of the current studies focus on a few languages such as English and French. We focus here on multilingual fillers resulting from eight languages (Arabic, Mandarin Chinese, French, German, Italian, European Portuguese, American English and Latin American Spanish). We thus propose an acoustic typology based on the vocalic peculiarities of the autonomous fillers. Three parameters are considered here: duration, pitch (F0) and timbre (F1/F2). We also compare the vocalic segments of the fillers with intra-lexical vowels possessing similar timbre. For this purpose, a preliminary study on the French language is described.

    Keywords DiSS

  • Jennifer Cole, Mark Hasegawa-Johnson, Chilin Shih, Heejin Kim, Eun-Kyung Lee, Hsin-yi Lu, Yoonsook Mo, and Tae-Jin Yoon, “Prosodic parallelism as a cue to repetition and error correction disfluency,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 53-58.

    Abstract Complex disfluencies that involve the repetition or correction of words are frequent in conversational speech, with repetition disfluencies alone accounting for over 20% of disfluencies. These disfluencies generally do not lead to comprehension errors for human listeners. We propose that the frequent occurrence of parallel prosodic features in the reparandum (REP) and alteration (ALT) intervals of complex disfluencies may serve as strong perceptual cues that signal the disfluency to the listener. We report results from a transcription analysis of complex disfluencies that classifies disfluent regions on the basis of prosodic factors, and preliminary evidence from F0 analysis to support our finding of prosodic parallelism.

    Keywords DiSS

  • Andrew A. Cooper, and John T. Hale, “Promotion of disfluency in syntactic parallelism,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 59-63.

    Abstract The development of a disfluency-robust speech parser requires some insight into where disfluencies occur in spontaneous spoken language. This corpus study deals with one syntactic variable which is predictive of disfluency location: syntactic parallelism. A formal definition of syntactic parallelism is used to show that syntactic parallelism is indeed predictive of disfluency.

    Keywords DiSS

  • Rodolfo Delmonte, “Modeling conversational styles in Italian by means of overlaps,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 65-70.

    Abstract Conversational styles vary cross-culturally remarkably: communities of speakers – rather than single speakers - seem to share turn-taking rules which do not always coincide with those shared by other communities of the same language. These rules are usually responsible for the smoothness of conversational interaction and the readiness of the attainment of communicative goals by conversants. Overlaps constitute a disruptive element in the economy of conversations: however, they show regular patterns which can be used to define conversational styles (Ford and Thompson, 1996). Overlaps constitute a challenge for any system of linguistic representations in that they cannot be treated as a one-dimensional event: in order to take into account the purport of an overlapping stretch of dialogue for the ongoing pragmatics and semantics of discourse, we have devised a new annotation schema which is then fed into the parser and produces a multidimensional linear syntactic constituency representation. This study takes a new tack on the issues raised by overlaps, both in terms of its linguistic representation and its semantic and pragmatic interpretation. It will present work carried out on the 60,000 words Italian Spontaneous Speech Corpus called AVIP, under national project API - the Italian version of MapTask, in particular the parser, to produce syntactic structures of overlapped temporally aligned turns. We will also present preliminary data from IPAR, another corpus of spontaneous dialogues run with the Spot Differences protocol. Then it will concentrate on the syntactic, semantic and prosodic aspects related to this debated issue. The paper will argue in favour of a joint and thus temporally aligned representation of overlapping material to capture all linguistic information made available by the local context. This will result in a syntactically branching node we call OVL which contains both the overlapper’s and the overlappee’s material (linguistic or non-linguistic). An extended classification of the phenomenon has shown that overlaps contribute substantially to the interpretation of the local context rather than the other way around. They also determine the overall conversational style of a given community of speakers with cultural import.

    Keywords DiSS

  • Janet Fletcher, Nicholas Evans, and Belinda Ross, “The intra-word pause and disfluency in Dalabon,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 77-81.

    Abstract Earlier impressionistic analyses of Dalabon indicate that the grammatical word is often realized as either an accentual or an intonational phrase, followed by a pause. Unusually, it can also be interrupted by a silent pause, with each section being potentially (although not necessarily) realized as separate intonational phrases. Our analyses of pause duration and pause placement within grammatical words support these earlier impressions, although this use of the silent pause appears to be restricted to certain affix boundaries, and other phonological constraints relating to the following surrounding linguistic material. These interruptions also share certain characteristics of "normal" disfluencies however.

    Keywords DiSS

  • Kristy Beers Fägersten, “Hesitations and repair in German,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 71-76.

    Abstract The occurrence of pauses and hesitations in spontaneous speech has been shown to occur systematically, for example, "between sentences, after discourse markers and conjunctions and before accented content words." (Hansson [15]) This is certainly plausible in English, where pauses and hesitations can and often do occur before content words such as nominals, for example, "uh, there’s a ... man." (Chafe [8]) However, if hesitations are, in fact, evidence of "deciding what to talk about next," (Chafe [8]) then the complex grammatical system of German should render this pausing position precarious, since pre-modifiers must account for the gender of the nominals they modify. In this paper, I present data to test the hypothesis that pre-nominal hesitation patterns in German are dissimilar to those in English. Hesitations in German will be shown, in fact, to occur within noun phrase units. Nevertheless, native speakers most often succeed in supplying a nominal which conforms to the gender indicated by the determiner or pre-modifier. Corrections, or repairs, of infelicitous pre-modifiers indicate that the speaker was unable to supply a nominal of the same gender which the choice of pre-modifier had committed him/her to. The frequency of such repairs is shown to vary according to task, with fewest repairs occurring in elicited speech which allows for linguistic freedom and therefore is most like spontaneous speech. The data sets indicate that among German native speakers, hesitations occurring before noun phrase units (pre-NPU hesitations) indicate deliberation of what to say, while hesitations within or before the head of the noun phrase (pre-NPH hesitations) indicate deliberation of how to say what has already been decided (cf. Chafe [8]).

    Keywords DiSS

  • Tiit Hennoste, “Repair-initiating particles and um-s in Estonian spontaneous speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 83-88.

    Abstract Particles and um-s used in spontaneous Estonian speech as initiators of different types of repair are analysed. Our model and typology of repair based on conversation analysis is introduced. Three main types of repair and particles used to initiate those are described: prepositioned self-initiated self-repair, postpositioned self-initiated self-repair (addition, substitution, insertion and abandon), and other-initiated self-repair (reformulation, clarification and misunderstanding). In conclusion 6 groups of particles are brought out by the role they play in the initiation of the repair sequence. Data come from Corpus of Spoken Estonian of the University of Tartu, which contains everyday and institutional speech, telephone and face-to-face conversations.

    Keywords DiSS

  • Sandrine Henry, “Repeats in spontaneous spoken French: the influence of the complexity of phrases,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 89-92.

    Abstract We here present the results of a descriptive study we conducted on 383 disfluent repeats from a corpus of spontaneous spoken French. We analyze noun phrases under construction and study whether there is a co-relation between the frequency of the repeats and the complexity feature of the phrases. We then focus on complex noun phrases in order to locate precisely the repeats. We also analyze how repeats affect structures such as [Preposition + Determiner + Noun] and what the constraints upon such structures are.

    Keywords DiSS

  • Peter Howell, and Olatunji Akande, “Simulations of the types of disfluency produced in spontaneous utterances by fluent speakers, and the change in disfluency type seen as speakers who stutter get older,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 93-98.

    Abstract The EXPLAN model is implemented on a graphic simulator. It is shown that it is able to produce speech in serial order and several types of fluency failure produced by fluent speakers and speakers who stutter. A way that EXPLAN accounts for longitudinal changes in the pattern of fluency failures shown by speakers who stutter is demonstrated.

    Keywords DiSS

  • Peter Howell, Jennifer Hayes, Ceri Savage, Jane Ladd, and Nafisa Patel, “Factors that determine the form and position of disfluencies in spontaneous utterances,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 99-102.

    Abstract This presentation reviews work on types of disfluency in the spontaneous speech of fluent speakers and speakers who stutter. Examination is made of factors that determine where disfluencies are located. It is concluded that the phonological, or prosodic, word provides a good basis for explaining the distribution of different types of disfluency in spontaneous speech.

    Keywords DiSS

  • T. Florian Jaeger, “Optional ’that’ indicates production difficulty: evidence from disfluencies,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 103-108.

    Abstract Optional word omission, such as that omission in complement and relative clauses, has been argued to be driven by production pressure (rather than by comprehension). One particularly strong production-driven hypothesis states that speakers insert words to buy time to alleviate production difficulties. I present evidence from the distribution of disfluencies in non-subject-extracted relative clauses arguing against this hypothesis. While word omission is driven by production difficulties, speakers may use that as a collateral signal to addressees, informing them of anticipated production difficulties. In that sense, word omission would be subject to audience design (i.e. catering to addressees’ needs).

    Keywords DiSS

  • Jumpei Kaneda, “Phrase-final rise-fall intonation and disfluency in Japanese - a preliminary study,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 109-112.

    Abstract In Japanese conversations, rise-fall intonation with vowel lengthening often occurs on the final syllable of a phrase. This phrase-final rise-fall (PFRF) is a new type of intonation first reported in the 1960’s. Researchers consider PFRF intonation a discourse marker which functions to sharpen the phrase boundary and retain the utterance turn, but other phrase-final intonation such as phrase-final lengthening (PFL) can have a similar pattern. PFLs are recognized as a type of disfluent speech with similar characteristics to PFRFs in terms of final-lengthening and having discourse functions. Also from reports about the spontaneity of speech, we assume that PFRFs would have a relation with disfluency, as well as with PFLs. To examine this assumption, this paper attempts to show the co-occurrence relation between PFRF and disfluency in the same utterance. The results show that PFRFs and PFLs have a relation to posterior disfluent units and suggest that both indicate speech planning strategies. Further, this paper speculates that a difference between PFRF and PFL is a difference in the purposes of speech planning: the latter represents ongoing linguistic editing while the former indicates adjusting the utterance according to the interlocutor’s reaction. Disfluencies accordingly occur as effects from processes of speech planning.

    Keywords DiSS

  • Shigeyoshi Kitazawa, “Evaluation of vowel hiatus in prosodic boundaries of Japanese,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 113-116.

    Abstract We investigated V-V hiatus through J-ToBI labeling and listening to whole phrases to estimate degree of discontinuity and, if possible, to determine the exact boundary between two phrases. Appropriate boundaries were found in most cases as the maximum perceptual score. Using electroglottography (EGG) of the open quotients OQ, pitch mark and spectrogram, the acoustic phonological feature of these V-V hiatus was found as phrase-initial glottalization and phrase-final nasalization observable in EGG and spectrogram, as well as phrase-final lengthening and phrase-initial shortening of the morae. A small dip was observable at the boundary of V-V hiatus showing glottalization. The test materials are taken from the "Japanese MULTEXT", consisting of a particle - vowel (36), adjective - vowel (5), and word - word (4).

    Keywords DiSS

  • Ellen F. Lau, and Fernanda Ferreira, “Lingering effects of disfluent material on comprehension of garden path sentences,” Language and Cognitive Processes, vol. 20, no. 5, 2005, pp. 633-666. DOI: 10.1080/01690960444000142.

    Abstract In two experiments, we tested for lingering effects of verb replacement disfluencies on the processing of garden path sentences that exhibit the main verb/reduced relative (MV/RR) ambiguity. Participants heard sentences with revisions like The little girl chosen, uh, selected for the role celebrated with her parents and friends. We found that the syntactic ambiguity associated with the reparandum verb involved in the disfluency (here chosen) had an influence on later parsing: Garden path sentences that included such revisions were more likely to be judged grammatical if the reparandum verb was structurally unambiguous. Conversely, ambiguous non-garden path sentences were more likely to be judged ungrammatical if the structurally unambiguous disfluency verb was inconsistent with the final reading. Results support a model of disfluency processing in which the syntactic frame associated with the replacement verb "overlays" the previous verb’s structure rather than actively deleting the already-built tree.

    Keywords Cognitive Psychology, Language, Language & Linguistics, Neuropsychology, Psychology of, Speech & Language Disorders, Speech Perception & Production

  • Che-Kuang Lin, Shu-Chuan Tseng, and Lin-Shan Lee, “Important and new features with analysis for disfluency interruption point (IP) detection in spontaneous Mandarin speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 117-121.

    Abstract This paper presents a whole set of new features, some duration-related and some pitch-related, to be used in disfluency interruption point (IP) detection for spontaneous Mandarin speech, considering the special linguistic characteristics of Mandarin Chinese. Decision tree is incorporated into the maximum entropy model to perform the IP detection. By examining performance degradation when each specific feature was missing from the whole set, the most important features for IP detection for each disfluency type were analyzed in detail. The experiments were conducted on the Mandarin Conversational Dialogue Corpus (MCDC) developed by the Institute of Linguistics of Academia Sinica in Taiwan.

    Keywords DiSS

  • Tobias Lövgren, and Jan van Doorn, “Influence of manipulation of short silent pause duration on speech fluency,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 123-126.

    Abstract Ordinary speech contains disfluencies in the form of hesitations and repairs. When listeners make global judgements on speech fluency they are influenced by the frequency and nature of the individual disfluencies contained in the speech. The aim of this study was to investigate a single dimension, pause duration, in the perception of speech fluency. The method involved simulation of pause duration within naturally fluent speech by manipulating existing acoustic silences in the speech. Four conditions were created: one for the natural speech and three with step wise increases in acoustic silence durations (average x2, x4 and x7.5 respectively). In a forced choice task listeners were asked to judge the speech samples as fluent or non fluent. The results showed that the percentage of judgements of disfluency increased as the pause durations increased, and that the difference between the unmanipulated speech condition and the two conditions with the longest pause durations were statistically significant. The results were interpreted to indicate that the individual dimension of pause duration has an independent influence on the judgement of fluency in ordinary speech.

    Keywords DiSS

  • Elgar-Paul Magro, “Disfluency markers and their facial and gestural correlates. preliminary observations on a dialogue in French,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 127-131.

    Abstract The aim of this article is to try to establish any observable regularities between the vocal and the visual expression of disfluency markers in a French spontaneous dialogue. The data show different configurations for different types of disfluency markers. Thus "euh"s are typically accompanied by mutual eye contact and no gesture; interrupted eye contact takes place less frequently, on occasions where speech planning is more seriously impaired (syntactical disruption and combination of "euh" with other disfluency markers). False starts seem to be typically accompanied by gesture production whereas eye contact can be maintained if the speaker relies or not on the listener to resolve the speech production problem. The article takes up the idea that disfluency markers can be classified along a continuum throughout the speech formulation process, going from the most discreet to the most prominent. It suggests that the more prominent the disfluency, the more likely is the visual channel to play a role (interrupted eye contact and gesture production).

    Keywords DiSS

  • Jan McAllister, and Mary Kingston, “Characteristics of final part-word repetitions,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 7-11.

    Abstract In an earlier paper, we have described final part-word repetitions in the conversational speech of two school-age boys of normal intelligence with no known neurological lesions. In this paper we explore in more detail the phonetic and linguistic characteristics of the speech of the boys. The repeated word fragments were more likely to be preceded by a pause than followed by one. The word immediately following the fragment tended to have a higher word frequency score than other surrounding words. Utterances containing the disfluencies typically contained a greater number of syllables than those that did not; however, there was no reliable difference between fluent and disfluent utterances in terms of their grammatical complexity.

    Keywords DiSS

  • Hannele Nicholson, Ellen Gurman Bard, Robin Lickley, Anne H. Anderson, Catriona Havard, and Yiya Chen, “Disfluency and behaviour in dialogue: evidence from eye-gaze,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 133-138.

    Abstract Previous research on disfluency types has focused on their distinct cognitive causes, prosodic patterns, or effects on the listener. This paper seeks to add to this taxonomy by providing a psycholinguistic account of the dialogue and gaze behaviour speakers engage in when they make certain types of disfluency. Dialogues came from a version of the Map Task, [2, 4], in which 36 normal adult speakers each participated in six dialogues across which feedback modality and time-pressure were counter-balanced. In this paper, we ask whether disfluency, both generally and type-specifically, was associated with speaker attention to the listener. We show that certain disfluency types can be linked to particular dialogue goals, depending on whether the speaker had attended to listener feedback. The results shed light on the general cognitive causes of disfluency and suggest that it will be possible to predict the types of disfluency which will accompany particular behaviours.

    Keywords DiSS

  • Sieb Nooteboom, “Lexical bias re-re-visited. some further data on its possible cause,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 139-144.

    Abstract This paper describes an experiment eliciting spoonerisms by using the so-called SLIP technique. The purpose of the experiment was to provide a further test of the hypothesis that self-monitoring of inner speech is a major source of lexical bias. This is a follow-up on an earlier experiment in which subjects were explicitly prompted after each response to make a correction in case of a speech error. In the current experiment both the prompt and the extra time for correction were left out, and there was no strong time pressure for the subject in giving his response. It is shown that under these conditions many primed-for spoonerisms are replaced by other, mostly lexical, errors. These ’replacing’ or ’secondary’ errors are more frequent in the condition priming for nonword-nonword errors than in the condition priming for word-word errors. Response times obtained for replacing errors are considerably and significantly longer than response times for overtly interrupted errors, and also longer than response times for the primed-for spoonerisms. This suggests that a time-consuming operation follows the primed-for spoonerisms in inner speech, and replaces those with other speech errors, often to preserve lexicality of the error.

    Keywords DiSS

  • Daniel O’Connell, and Sabine Kowal, “Uh and Um Revisited: Are They Interjections for Signaling Delay?,” Journal of Psycholinguistic Research, vol. 34, no. 6, 2005, pp. 555-576. DOI: 10.1007/s10936-005-9164-3.

    Abstract Clark and Fox Tree (2002) have presented empirical evidence, based primarily on the London-Lund corpus (LL; Svartvik & Quirk, 1980), that the fillers uh and um are conventional English words that signal a speaker’s intention to initiate a minor and a major delay, respectively. We present here empirical analyses of uh and um and of silent pauses (delays) immediately following them in six media interviews of Hillary Clinton. Our evidence indicates that uh and um cannot serve as signals of upcoming delay, let alone signal it differentially: In most cases, both uh and um were not followed by a silent pause, that is, there was no delay at all; the silent pauses that did occur after um were too short to be counted as major delays; finally, the distributions of durations of silent pauses after uh and um were almost entirely overlapping and could therefore not have served as reliable predictors for a listener. The discrepancies between Clark and Fox Tree’s findings and ours are largely a consequence of the fact that their LL analyses reflect the perceptions of professional coders, whereas our data were analyzed by means of acoustic measurements with the PRAAT software. A comparison of our findings with those of O’Connell, Kowal, and Ageneau (2005) did not corroborate the hypothesis of Clark and Fox Tree that uh and um are interjections: Fillers occurred typically in initial, interjections in medial positions; fillers did not constitute an integral turn by themselves, whereas interjections did; fillers never initiated cited speech, whereas interjections did; and fillers did not signal emotion, whereas interjections did. Clark and Fox Tree’s analyses were embedded within a theory of ideal delivery that we find inappropriate for the explication of these phenomena.

    Keywords filled pauses, fillers, hesitations, interjections, spontaneous speech, uh, um

  • Daniel O’Connell, and Sabine Kowal, “Where Do Interjections Come From? A Psycholinguistic Analysis of Shaw’s Pygmalion,” Journal of Psycholinguistic Research, vol. 34, no. 5, September 2005, pp. 497-514. DOI: 10.1007/s10936-005-6205-x.

    Abstract Starting from our recent findings regarding emotional and initializing functions of interjections in TV and radio interviews (Kowal & O’Connell, 2004b; O’Connell & Kowal, in press; O’Connell, Kowal, & Ageneau, 2005), we used the book and script of Shaw (1916/1969) and the audiotape of the motion picture (Pascal, Asquith, & Howard, 1938) Pygmalion to investigate how actors use interjections to express emotions. The following hypotheses were tested: (1) The actors use the written cues selectively in their oral performance by substituting, adding, and deleting interjections; (2) primary interjections added by the actors are less conventional than those in the written text; (3) durations and number of syllables of Eliza Doolittle’s spoken renditions of her signature interjection ah-ah-ah-ow-ow-ow-oo do not correlate with the length in letters and syllables of the written versions; and (4) there is no evidence for Ameka’s (1992b, 1994) characterization of interjections as temporally isolated, i.e., preceded and followed by silent pauses, in consequence of their syntactic isolation. Our findings confirmed all the hypotheses except for one unexpectedly significant correlation between number of syllables in Eliza Doolittle’s signature interjection in the written version and duration in seconds of the spoken version thereof. The common thread throughout these data is the actor’s need to personalize emotions in a dramatic performance—by means of interjections other than those provided in the written text. In this process of personalization, the emotional and initializing functions of interjections are confirmed.

    Keywords conceptual and medial orality, dramatic performance, emotional expression, interjections, spontaneity

  • Daniel O’Connell, Sabine Kowal, and Carie Ageneau, “Interjections in Interviews,” Journal of Psycholinguistic Research, vol. 34, no. 2, March 2005, pp. 153-171. DOI: 10.1007/s10936-005-3636-3.

    Abstract A psycholinguistic hypothesis regarding the use of interjections in spoken utterances, originally formulated by Ameka (1992b, 1994) for the English language, but not confirmed in the German-language research of Kowal and O’Connell (2004 a & c), was tested: The local syntactic isolation of interjections is paralleled by their articulatory isolation in spoken utterances, i.e., by their occurrence between a preceding and a following pause. The corpus consisted of four TV and two radio interviews of Hillary Clinton that had coincided with the publication of her book Living History (2003) and one TV interview of Robin Williams by James Lipton. No evidence was found for articulatory isolation of English-language interjections. In the Hillary Clinton interviews and Robin Williams interviews, respectively, 71% and 73% of all interjections occurred initially, i.e., at the onset of various units of spoken discourse: at the beginning of turns; at the beginning of articulatory phrases within turns, i.e., after a preceding pause; and at the beginning of a citation within a turn (either Direct Reported Speech [DRS] or what we have designated Hypothetical Speaker Formulation [HSF]). One conventional interjection (OH) occurred most frequently. The Robin Williams interview had a much higher occurrence of interjections, especially nonconventional ones, than the Hillary Clinton interviews had. It is suggested that the onset or initializing role of interjections reflects the temporal priority of the affective and the intuitive over the analytic, grammatical, and cognitive in speech production. Both this temporal priority and the spontaneous and emotional use of interjections are consonant with Wundt’s (1900) characterization of the primary interjection as psychologically primitive. The interjection is indeed the purest verbal implementation of conceptual orality.

    Keywords conceptual orality, interjection, interview

  • Berthille Pallaud, “The re-adjustment of word-fragments in spontaneous spoken French,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 145-149.

    Abstract A study of word-fragments in spoken French has been undertaken for a few years on the basis of non directive talks corpora recorded and transcribed according to GARS’ conventions (DELIC currently). These disfluencies are often analyzed within the framework of disfluent repetitions. The observations made on these two types of disfluencies led us to distinguish them. The aim of our study is to describe on the one hand insertions which take place in relation to the word interruptions and their re-adjustment, and on the other hand, to specify the types and localizations of retracing which follow these interruptions. Two kinds of incidental clauses were observed at the time of the readjustments which follow these disturbances. Some, (the more numerous) are syntactically linked to the fragment or with its retracing, others are not. Moreover, the word-fragments which will be modified are the only one to be dependent on the type of localization. For the others, this localization does not make it possible to predict the category of interruption (complemented or unfinished). Our results on word-fragments, confirm however that in contemporary French, the retracing at the head of the nominal or verbal group which contains the disfluency remains the simplest example (at the same time the most frequent, [5]). Nevertheless, a third of the retracing either does not go back to the beginning of the Group, or exceeds it.

    Keywords DiSS

  • Myriam Piccaluga, Jean-Luc Nespoulous, and Bernard Harmegnies, “Disfluencies as a window on cognitive processing. an analysis of silent pauses in simultaneous interpreting,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 151-155.

    Abstract The paper focuses on silent pauses observed in the productions of subjects involved in simultaneous interpreting tasks. Four bilingual subjects with various degrees of expertise in interpreting and various degrees of mastery of the languages involved (French and Spanish) have been recorded while interpreting utterances of French and Spanish talks. The source discourses had been perturbated by changes both in speech rates (by time compression) and in auditory quality (by addition of a parasiting noise). On the basis of acoustical analyses performed on the subjects’ productions, statistical analyses focus both on the number and on the duration of the observed pauses. This double approach enables investigations of the kind of cognitive disturbances caused by the independent variables and allows further speculation on the semiology of the pauses durations.

    Keywords DiSS

  • Melanie Soderstrom, and James L. Morgan, “Disfluency in speech input to infants? The interaction of mother and child to create error-free speech input for language acquisition,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 157-162.

    Abstract One characteristic of infant-directed speech is that it is highly fluent compared with adult-directed speech. However, the speech that infants hear still contains disfluencies. Such disfluencies might potentially cause problems for infants during language development. We first analyzed samples of spontaneous speech in the presence of infants (both adult- and infant-directed) and found that under ideal circumstances the speech infants hear is highly fluent. Under less than ideal circumstances infants hear much more highly disfluent speech - however this disfluent speech is almost entirely adult-directed. While grammatically ill-formed, the prosodic structure of these disfluencies might signal their ill-formedness to the infants. In a preference experiment, 10 month olds listened longer to infant-directed speech samples containing prosodic disfluencies than to equated samples without disfluency. However, this effect was found in only one of two counterbalancing groups. Using adult ratings of low-pass versions of these speech samples, we found that infants’ preferences were correlated with the adults’ perception of the relative disfluency of the samples. A follow-up experiment using adult-directed disfluencies found that while the 10 month olds showed no differences in their listening preferences, older infants preferred to listen to the fluent speech. These results suggest that younger and older infants attend differently to infant and adult-directed speech, and that older infants may be able to differentiate grammatical adult-directed input from input distorted by disfluency. We discuss implications of these findings for language acquisition.

    Keywords DiSS

  • Ellen Thompson, “A cross-linguistic look at VP-ellipsis and verbal speech errors,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 163-164.

    Abstract This paper argues that consideration of spontaneous speech errors provides insight into cross-linguistic analyses of syntactic phenomena. In particular, I claim that differences in the distribution of non-parallel VP-Ellipsis constructions in English and German, as well as variation in the spontaneously-occurring verbal speech errors, is explained by a parametric analysis of variation in the inflectional systems of the two languages.

    Keywords DiSS

  • Doroteo T. Toledano, Antonio Moreno Sandoval, José Colás Pasamontes, and Javier Garrido Salas, “Acoustic-phonetic decoding of different types of spontaneous speech in Spanish,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 165-168.

    Abstract This paper presents preliminary acoustic-phonetic decoding results for Spanish on the spontaneous speech corpus C-ORAL-ROM. These results are compared with results on the read speech corpus ALBAYZIN. We also compare the decoding results obtained with the different types of spontaneous speech in C-ORAL-ROM. As the most important conclusions, the experiments show that the type of spontaneous speech has a deep impact on spontaneous speech recognition results. Best speech recognition results are those obtained on speech captured from the media.

    Keywords DiSS

  • Michiko Watanabe, Yasuharu Den, Keikichi Hirose, and Nobuaki Minematsu, “The effects of filled pauses on native and non-native listeners’ speech processing,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 169-172.

    Abstract Everyday speech is abundant with disfluencies. However, little is known about their roles in speech communication. We examined the effects of filled pauses at phrase boundaries on native and non-native listeners in Japanese. Study of spontaneous speech corpus showed that filled pauses tended to precede relatively long and complex constituents. We tested the hypothesis that filled pauses biased listeners’ expectation about the upcoming phrase toward a longer and complex one. In the experiment participants were presented with two shapes at one time, one simple and the other compound. Their task was to identify the one that they heard as soon as possible. The speech stimuli involved two factors: complexity and fluency. As the complexity factor, a half of the speech stimuli described compound shapes with long and complex phrases and the other half described simple shapes with short and simple phrases. As the fluency factor phrases describing a shape had a preceding filled pause, a preceding silent pause of the same length, or no preceding pause. The results of the experiments with both native and non-native listeners showed that response times to the complex phrases were significantly shorter after filled or silent pauses than when there was no pause. In contrast, there was no significant difference between the three conditions for the simple phrases, supporting the hypothesis.

    Keywords DiSS

  • Yelena Yasinnik, Stefanie Shattuck-Hufnagel, and Nanette Veilleux, “Gesture marking of disfluencies in spontaneous speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 173-178.

    Abstract Speakers effectively use both visual and acoustic cues to convey information in speech. While earlier research has concentrated on the association of visual cues (provided by gestures) with fluent prosodic structure, this study looks at the relationship between visual cues, prosodic markers and spoken disfluencies. Preliminary results suggested that speakers preferentially perform gestures in the eye region in spoken disfluencies, but a more careful frame-by-frame analysis capturing all gestures revealed that movements of the eye region (blinks, frowns, eyebrow raises and changes in direction of eyegaze) occur with high frequency in both fluent and non-fluent speech. The paper describes a method for frame-by-frame labelling of speech-accompanying gestures for a speech sample, whose output can then be combined with independently derived labels of the prosody. Initial analysis of 3 minute samples from two speakers reveals that one speaker produces eye movements in association with disfluencies and the other does not, and that this tendency does not result from alignment of brow gestures with pitch accents.

    Keywords DiSS

  • Yuan Zhao, and Dan Jurafsky, “A preliminary study of Mandarin filled pauses,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 179-182.

    Abstract The paper reports preliminary results on Mandarin filled pauses (FPs), based on a large speech corpus of Mandarin telephone conversation. We find that Mandarin intensively uses both demonstratives (zhege ’this’, nage ’that’) and uh/mm as FPs. Demonstratives are more frequent FPs and are more likely to be surrounded by other types of disfluency phenomena than uh/mm, as well as occurring more often in nominal environments. We also find durational differences: FP demonstratives are longer than non-FP demonstratives, and mm is longer than uh. The study also revealed dialectal influence on the use of FPs. Our results agree with earlier work which shows that a language may divide conversational labor among different FPs. Our work also extends this research in suggesting that different languages may assign conversational functions to FPs in different ways.

    Keywords DiSS


  • Jennifer Arnold, Michael K. Tanenhaus, Rebecca Altmann, and Maria Fagnano, “The Old and Thee, uh, New: Disfluency and Reference Resolution,” Psychological Science, vol. 15, no. 9, September 2004, pp. 578-582. DOI: 10.1111/j.0956-7976.2004.00723.x.

    Abstract Most research on the rapid mental processes of online language processing has been limited to the study of idealized, fluent utterances. Yet speakers are often disfluent, for example, saying "thee, uh, candle" instead of "the candle." By monitoring listeners’ eye movements to objects in a display, we demonstrated that the fluency of an article ("thee uh" vs. "the") affects how listeners interpret the following noun. With a fluent article, listeners were biased toward an object that had been mentioned previously, but with a disfluent article, they were biased toward an object that had not been mentioned. These biases were apparent as early as lexical information became available, showing that disfluency affects the basic processes of decoding linguistic input.

  • J. C. Brown, “Eliminating the Segmental Tier: Evidence from Speech Errors,” Journal of Psycholinguistic Research, vol. 33, no. 2, March 2004, pp. 97-101. DOI: 10.1023/B:JOPR.0000017222.24698.73.

    Abstract The dominant viewpoint regarding phonologically driven speech errors is that segments are the units responsible behind the errors. The goal of this paper is to illustrate the point that other potential candidates for explaining these speech errors, which have gone largely unnoticed, provide a better explanatory framework for speech errors than do segments. By looking at unambiguous cases and patterns of markedness, it can be shown that there exists good evidence for features and prosodic constituents in speech errors, but never any positive evidence for segments. All of these considerations taken into account together lend strong support to the argument that there is no need for a segmental level of analysis in phonology.

    Keywords Phonology, production errors, segments, slips of the tongue

  • Fernanda Ferreira, and Karl G.D. Bailey, “Disfluencies and human language comprehension,” TRENDS in Cognitive Sciences, vol. 8, no. 5, May 2004, pp. 231-237. DOI: 10.1016/j.tics.2004.03.011.

    Abstract Spoken language contains disfluencies, which include editing terms such as uh and um as well as repeats and corrections. In less than ten years the question of how disfluencies are handled by the human sentence comprehension system has gone from virtually ignored to a topic of major interest in computational linguistics and psycholinguistics. We discuss relevant empirical findings and describe a computational model that captures how disfluencies influence parsing and comprehension. The research reviewed shows that the parser, which presumably evolved to handle conversations, deals with disfluencies in a way that is efficient and linguistically principled. The success of this research program reinforces the current trend in cognitive science to view cognitive mechanisms as adaptations to real-world constraints and challenges.

  • Fernanda Ferreira, Ellen F. Lau, and Karl G.D. Bailey, “Disfluencies, language comprehension, and Tree Adjoining Grammars,” Cognitive Science, vol. 28, no. 5, 2004, pp. 721-749. DOI: 10.1016/j.cogsci.2003.10.006.

    Abstract Disfluencies include editing terms such as uh and um as well as repeats and revisions. Little is known about how disfluencies are processed, and there has been next to no research focused on the way that disfluencies affect structure-building operations during comprehension. We review major findings from both computational linguistics and psycholinguistics, and then we summarize the results of our own work which centers on how the parser behaves when it encounters a disfluency. We describe some new research showing that information associated with misarticulated verbs lingers, and which adds to the large body of data on the critical influence of verb argument structures on sentence comprehension. The paper also presents a model of disfluency processing. The parser uses a Tree Adjoining Grammar to build phrase structure. In this approach, filled and unfilled pauses affect the timing of Substitution operations. Repairs and corrections are handled by a mechanism we term "Overlay," which allows the parser to overwrite an undesired tree with the appropriate, correct tree. This model of disfluency processing highlights the need for the parser to sometimes coordinate the mechanisms that perform garden-path reanalysis with those that do disfluency repair. The research program as a whole demonstrates that it is possible to study disfluencies systematically and to learn how the parser handles filler material and mistakes. It also showcases the power of Tree Adjoining Grammars, a formalism developed by Aravind Joshi which has yielded results in many different areas of linguistics and cognitive science.

    Keywords disfluencies, parsing, syntax, TAG

  • Barbara F. Freed, Norman Segalowitz, and Dan P. Dewey, “Context of Learning and Second Language Fluency in French: Comparing Regular Classroom, Study Abroad, and Intensive Domestic Immersion Programs,” Studies in Second Language Acquisition, vol. 26, no. 02, 2004, pp. 275-301. DOI: 10.1017/S0272263104262064.

    Abstract We compared the acquisition of various dimensions of fluency by 28 students of French studying in three different learning contexts: formal language classrooms in an at home (AH) institution, an intensive summer immersion (IM) program, and a study abroad (SA) setting. For the purpose of oral data collection, students participated in oral interviews (similar to the Oral Proficiency Interview) at the beginning and the end of the semester and provided information regarding language use and interactions. Analyses included comparisons of gain scores as a function of the learning context and as a function of the time reported using French outside of class. The main findings that reached statistical significance include: (a) The IM group made significant gains in oral performance in terms of the total number of words spoken, in length of the longest turn, in rate of speech, and in speech fluidity based on a composite of fluidity measures. When compared to the AH group, the SA group made statistically significant gains only in terms of speech fluidity but fewer gains than the IM group. The AH group made no significant gains. (b) The IM students reported that they spoke and wrote French significantly more hours per week than the other two groups. The SA group reported using English more than French (although the difference was not statistically significant) and reported using significantly more English in out-of-class activities than the IM group. (c) Multiple regression analyses revealed that reported hours per week spent writing outside of class was significantly associated with oral fluidity gains.

  • Judit Kormos, and Mariann Dénes, “Exploring measures and perceptions of fluency in the speech of second language learners,” System, vol. 32, no. 2, 2004, pp. 145-164. DOI: 10.1016/j.system.2004.01.001.

    Abstract The research reported in this paper explores which variables predict native and non-native speaking teachers’ perception of fluency and distinguish fluent from non-fluent L2 learners. In addition to traditional measures of the quality of students’ output such as accuracy and lexical diversity, we investigated speech samples collected from 16 Hungarian L2 learners at two distinct levels of proficiency with the help of computer technology. The two groups of students were compared and their temporal and linguistic measures were correlated with the fluency scores they received from three experienced native and three non-native speaker teacher judges. The teachers’ written comments concerning the students’ performance were also taken into consideration. For all the native and non-native teachers, speech rate, the mean length of utterance, phonation time ratio and the number of stressed words produced per minute were the best predictors of fluency scores. However, the raters differed as regards how much importance they attributed to accuracy, lexical diversity and the mean length of pauses. The number of filled and unfilled pauses and other disfluency phenomena were not found to influence perceptions of fluency.

  • Sandra Merlo, and Letícia Lessa Mansur, “Descriptive discourse: topic familiarity and disfluencies,” Journal of Communication Disorders, vol. 37, 2004, pp. 489-503. DOI: 10.1016/j.jcomdis.2004.03.002.

    Abstract This investigation was undertaken to address questions about topic familiarity and disfluencies during oral descriptive discourse of adult speakers. Participants expressed more attributes when the topic was familiar than when it was unfamiliar. Fillers and lexical pauses were the most frequent disfluencies. The mean duration of each hesitation pause was 776 ms. The sum of hesitation pause durations was well correlated with the number of occurrences. Repetitions, hesitation pauses, and prolongations were shown to have the same role, which was distinct from the role of fillers. The type of analysis conducted in this investigation may be useful in distinguishing between normal and disordered speech production. Learning outcomes: The reader will obtain information about the differences between the number of propositions in familiar and unfamiliar oral descriptions. The reader will also become aware of the distribution of disfluencies in discourse categories employed by the participants in this investigation.

    Keywords Descriptive discourse, disfluency, Fluency, Topic familiarity

  • Daniel O’Connell, and Sabine Kowal, “The History of Research on the Filled Pause as Evidence of ’The Written Language Bias in Linguistics’ (Linell, 1982),” Journal of Psycholinguistic Research, vol. 33, no. 6, 2004, pp. 459-474. DOI: 10.1007/s10936-004-2666-6.

    Abstract Erard’s (2004) publication in the New York Times of a journalistic history of the filled pause serves as the occasion for this critical review of the past half-century of research on the filled pause. Historically, the various phonetic realizations or instantiations of the filled pause have been presented with an odd recurrent admixture of the interjection ah. In addition, the filled pause has been consistently associated with both hesitation and disfluency. The present authors hold that such a mandatory association of the filled pause with disfluency is the product of The Written Language Bias in Linguistics [Linell, 1982] and disregards much cogent evidence to the contrary. The implicit prescriptivism of well formedness—a demand derived from literacy—must be rejected; literate well formedness is not a necessary or even typical property of spontaneous spoken discourse; its structures and functions—including those of the filled pause—are very different from those of written language. The recent work of Clark and Fox Tree (2002) holds promise for moving the status of the filled pause not only toward that of a conventional word, but also toward its status as an interjection. This latter development is also being fostered by lexicographers. Nonetheless, in view of ongoing research regarding the disparate privileges of occurrence and functions of filled pauses in comparison with interjections, the present authors are reluctant to categorize the filled pause as an interjection.

    Keywords disfluency, filler, hesitation, interjection, orality, spontaneity, word

  • Daniel O’Connell, Sabine Kowal, and Edward J. Dill, “Dialogicality in TV News Interviews,” Journal of Pragmatics, vol. 36, 2004, pp. 185-205. DOI: 10.1016/j.pragma.2003.06.001.

    Abstract Eight TV news interviews, six American, one British, and one German, were analyzed for markers of orality/literacy (back channeling, hesitations, interruptions, contractions and elisions, first-person singular pronominals, interjections, and tag questions). The interviewer/interviewee pairs were: W. Blitzer/B. Clinton; K. Couric/H. Clinton; B. Shaw/B. Bush, /M. Thatcher, /B. Goldwater, and /C. Powell; M. Bashir/Princess Diana; and G. Gaus/H. Arendt. The most evident markers of orality were hesitations (filled pauses, repeats, and false starts) and first-person singular pronominals on the part of interviewees. Across the four interviews of B. Shaw, there were notable differences in style for both interviewer and interviewees. The women participants used interjections and tag questions more frequently than the men and were interrupted more often by the men. The results are interpreted in light of a dialogical theory of intersubjectivity.

    Keywords Dialogicality, Discourse markers, Informality, Intersubjectivity, orality, TV news interviews

  • Norman Segalowitz, and Barbara F. Freed, “Context, Contact, and Cognition in Oral Fluency Acquisition: Learning Spanish in At Home and Study Abroad Contexts,” Studies in Second Language Acquisition, vol. 26, no. 02, 2004, pp. 173-199. DOI: 10.1017/s0272263104262027.

    Abstract This study investigates the role of context of learning in second language (L2) acquisition. Participants were 40 native speakers of English studying Spanish for one semester in one of two different learning contexts—a formal classroom at a home university (AH) and a study abroad (SA) setting. The research looks at various indexes of oral performance gains—particularly gains in oral fluency as measured by temporal and hesitation phenomena and gains in oral proficiency based on the Oral Proficiency Interview (OPI). The study also examines the relation these oral gains bore to L2-specific cognitive measures of speed of lexical access (word recognition), efficiency (automaticity) of lexical access, and speed and efficiency of attention control hypothesized to underlie oral performance. The learners also provided estimates of the number of hours they spent in extracurricular language-contact activities. The results show that in some respects learners in the SA context made greater gains, both in terms of temporal and hesitation phenomena and in oral proficiency as measured by the OPI, than learners in the AH context. There were also, however, significant interaction effects and correlational patterns indicating complex relationships between oral proficiency, cognitive abilities, and language contact. The results demonstrate the importance of the dynamic interactions that exist among oral, cognitive, and contextual variables. Such interactions may help explain the enormous individual variation one sees in learning outcomes, and they underscore the importance of studying such variables together rather than in isolation.

  • Sidney J. Segalowitz, and Korri Lane, “Perceptual fluency and lexical access for function versus content words,” Behavioral and Brain Sciences, vol. 27, April 2004, pp. 307-308. DOI: 10.1017/S0140525X04310071.

    Abstract By examining single-word reading times (in full sentences read for meaning), we show that (1) function words are accessed faster than content words, independent of perceptual characteristics; (2) previous failures to show this involved problems of frequency range and task used; and (3) these differences in lexical access are related to perceptual fluency. We relate these findings to issues in the literature on event-related potentials (ERPs) and neurolinguistics.

  • Chung-Hsien Wu, and Gwo-Lang Yan, “Acoustic Feature Analysis and Discriminative Modeling of Filled Pauses for Spontaneous Speech Recognition,” Journal of VLSI Signal Processing, vol. 36, no. 2-3, 2004, pp. 91-104. DOI: 10.1023/B:VLSI.0000015089.17975.f4.

    Abstract Most automatic speech recognizers (ASRs) concentrate on read speech, which is different from spontaneous speech with disfluencies. ASRs cannot deal with speech with a high rate of disfluencies such as filled pauses, repetitions, lengthening, repairs, false starts and silence pauses. In this paper, we focus on the feature analysis and modeling of the filled pauses "ah," "ung," "um," "em," and "hem" in spontaneous speech. Karhunen-Loève transform (KLT) and linear discriminant analysis (LDA) were adopted to select discriminant features for filled pause detection. In order to suitably determine the number of discriminant features, Bartlett hypothesis testing was adopted. Twenty-six features were selected using Bartlett hypothesis testing. Gaussian mixture models (GMMs), trained with a gradient descent algorithm, were used to improve the filled pause detection performance. The experimental results show that the filled pause detection rates using KLT and LDA were 84.4% and 86.8%, respectively. A significant improvement was obtained in the filled pause detection rate using the discriminative GMM with KLT and LDA. In addition, the LDA features outperformed the KLT features in the detection of filled pauses.


  • Martine Adda-Decker, Benoît Habert, Claude Barras, Gilles Adda, Philippe Boula de Mareuil, and Patrick Paroubek, “A disfluency study for cleaning spontaneous speech automatic transcripts and improving speech language models,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 67-70.

    Abstract The aim of this study is to elaborate a disfluent speech model by comparing different types of audio transcripts. The study makes use of 10 hours of French radio interview archives, involving journalists and personalities from political or civil society. A first type of transcripts is press-oriented where most disfluencies are discarded. For 10% of the corpus, we produced exact audio transcripts: all audible phenomena and overlapping speech segments are transcribed manually. In these transcripts about 14% of the words correspond to disfluencies and discourse markers. The audio corpus has then been transcribed using the LIMSI speech recognizer. With 8% of the corpus the disfluency words explain 12% of the overall error rate. This shows that disfluencies have no major effect on neighboring speech segments. Restarts are the most error prone, with a 36.9% within class error rate.

    Keywords DiSS

  • Jennifer Arnold, Maria Fagnano, and Michael K. Tanenhaus, “Disfluencies Signal Theee, Um, New Information,” Journal of Psycholinguistic Research, vol. 32, no. 1, January 2003, pp. 25-36. DOI: 10.1023/A:1021980931292.

    Abstract Speakers are often disfluent, for example, saying "theee uh candle" instead of "the candle." Production data show that disfluencies occur more often during references to things that are discourse-new, rather than given. An eyetracking experiment shows that this correlation between disfluency and discourse status affects speech comprehension. Subjects viewed scenes containing four objects, including two cohort competitors (e.g., camel, candle), and followed spoken instructions to move the objects. The first instruction established one cohort as discourse-given; the other was discourse-new. The second instruction was either fluent or disfluent, and referred to either the given or new cohort. Fluent instructions led to more initial fixations on the given cohort object (replicating Dahan et al., 2002). By contrast, disfluent instructions resulted in more fixations on the new cohort. This shows that discourse-new information can be accessible under some circumstances. More generally, it suggests that disfluency affects core language comprehension processes.

    Keywords disfluency, information status, language processing, reference comprehension

  • Matthew P. Aylett, “Disfluency and speech recognition profile factors,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 51-54.

    Abstract This paper reports on work bringing together disfluency coding carried out by Lickley [1] and recognition work carried out as part of the ERF project (Bard, Thompson & Isard, [2]) at Edinburgh University. A set of factors are investigated which characterise the behaviour of the ASR during recognition based on an analysis of the resulting word lattice. These factors can be grouped as: Entropy Factors - the entropy of the acoustic and language model likelihoods, within the word lattice, over a 10 ms frame, and, Arc Factors - the number of non-unique and unique arcs in the word lattice in any given 10 ms time frame, together with the variance of start and end times of these arcs, and the number of arcs starting or ending in the frame. The values of all factors were used to train a simple CART model. The CART model was used to predict: recognition failure, interruption point location (the point where a disfluency begins), and whether the location was in a repair or a reparandum. The entropy of the language model values contributed most to the model's prediction of recognition failure, and whether a frame was in a repair or reparandum. In contrast, the number of unique word hypotheses contributed most to the successful prediction of a frame being close to an interruption point.

    Keywords DiSS

  • Karl G.D. Bailey, and Fernanda Ferreira, “Disfluencies affect the parsing of garden-path sentences,” Journal of Memory and Language, vol. 49, no. 2, 2003, pp. 183-200. DOI: 10.1016/S0749-596X(03)00027-5.

    Abstract Spontaneous speech differs in several ways from the sentences often studied in psycholinguistics experiments. One important difference is that naturally produced utterances often contain disfluencies. In this study, we examined how the presence of “uh” in a spoken sentence might affect processes that assign syntactic structure (i.e., parsing). Four experiments are reported. In the first, participants judged the grammaticality of sentences that had disfluencies either right before the head noun of the ambiguous phrase or after (e.g., Sandra bumped into the busboy and the uh uh waiter told her to be careful or Sandra bumped into the busboy and the waiter uh uh told her to be careful). Sentences in the latter condition were judged grammatical less often. This result was replicated in the second experiment, in which disfluencies were replaced with environmental sounds. These findings suggest that interruptions can affect syntactic parsing, and the content of the interruption need not be speechlike. In Experiments 3 and 4 we tested whether these effects occurred because listeners use interruptions as cues to help resolve a structural ambiguity. Results from these latter two grammaticality judgment tasks suggest that when an interruption occurs before an ambiguous noun phrase, comprehenders are more likely to assume that the noun phrase is the subject of a new clause rather than the object of an old one, and furthermore, it appears that the parser is relatively insensitive to the form of the interruption. We conclude that disfluencies can influence the parser by signaling a particular structure; at the same time, for the parser, a disfluency might be any interruption to the flow of speech.

  • Alan Bell, Daniel Jurafsky, Eric Fosler-Lussier, Cynthia Girand, and Daniel Gildea, “Effects of disfluencies, predictability, and utterance position on word form variation in English conversation,” Journal of the Acoustical Society of America, vol. 113, no. 2, February 2003, pp. 1001-1024. DOI: 10.1121/1.1534836.

    Abstract Function words, especially frequently occurring ones such as (the, that, and, and of), vary widely in pronunciation. Understanding this variation is essential both for cognitive modeling of lexical production and for computer speech recognition and synthesis. This study investigates which factors affect the forms of function words, especially whether they have a fuller pronunciation (e.g., ði, ðæt, ænd, ʌv) or a more reduced or lenited pronunciation (e.g., ðə, ðit, n, ə). It is based on over 8000 occurrences of the ten most frequent English function words in a 4-h sample from conversations from the Switchboard corpus. Ordinary linear and logistic regression models were used to examine variation in the length of the words, in the form of their vowel (basic, full, or reduced), and whether final obstruents were present or not. For all these measures, after controlling for segmental context, rate of speech, and other important factors, there are strong independent effects that made high-frequency monosyllabic function words more likely to be longer or have a fuller form (1) when neighboring disfluencies (such as filled pauses uh and um) indicate that the speaker was encountering problems in planning the utterance; (2) when the word is unexpected, i.e., less predictable in context; (3) when the word is either utterance initial or utterance final. Looking at the phenomenon in a different way, frequent function words are more likely to be shorter and to have less-full forms in fluent speech, in predictable positions or multiword collocations, and utterance internally. Also considered are other factors such as sex (women are more likely to use fuller forms, even after controlling for rate of speech, for example), and some of the differences among the ten function words in their response to the factors.

  • Ramona Benkenstein, and Adrian P. Simpson, “Phonetic correlates of self-repair involving word repetition in German spontaneous speech,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 81-84.

    Abstract A phonetic description of self-initiated self-repair sequences involving the repetition of words in German spontaneous speech is presented. Data are drawn from the Kiel Corpus of Spontaneous Speech. The description is primarily impressionistic auditory, but it also employs acoustic records to verify and objectify the impressionistic findings. A number of different patterns around cut-off are identified. The comparison of phonetic differences between reparandum and repair tokens is used to argue that repair sequences can also provide an interesting insight into the way in which fluent stretches of spontaneous speech are phonetically organized.

    Keywords DiSS

  • Martin Corley, and Robert Hartsuiker, “Hesitation in speech can... um... help a listener understand,” in Proceedings of the twenty-fifth meeting of the Cognitive Science Society, Erlbaum, August 2003, pp. 276-281.

    Abstract This paper investigates the effect of disfluencies on listeners’ on-line processing of speech. More specifically, it tests the hypothesis that filled pauses like um, which tend to occur before words that are low in accessibility, act as a signal to the listener that a relatively inaccessible word is about to be produced. Two experiments are reported, in which participants followed recorded instructions to press buttons corresponding to images on a computer screen. In 50% of trials, the spoken name of the image was preceded by um. In experiment 1, the intrinsic accessibility of the target items was manipulated (by means of lexical frequency); in experiment 2, the extrinsic (visual) accessibility varied. Both experiments demonstrated that participants were quicker to respond when a target was preceded by um, regardless of whether the item referred to was difficult to access or not. In addition, in experiment 2 there was a weak interaction between accessibility and presence or absence of an um. We present the data here as early evidence that listeners can benefit from disfluencies in others’ speech, and outline some methodological and theoretical considerations and further experiments.

  • Yasuharu Den, “Some strategies in prolonging speech segments in spontaneous Japanese,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 87-90.

    Abstract In this paper, we investigate segmental prolongation in a corpus of spontaneous Japanese monologues consisting of over 700,000 words. We examine effects on the rate of prolongation of various factors including speech types, the genders of speakers, word classes, word positions in the phrase and in the inter-pausal unit, and the presence of preceding fillers. Based on the empirical findings, we state some strategies in prolonging speech segments used by Japanese speakers.

    Keywords DiSS

  • Sheena Finlayson, Victoria Forrest, Robin Lickley, and Janet Mackenzie Beck, “Effects of the restriction of hand gestures on disfluency,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 21-24.

    Abstract This paper describes an experimental pilot study of disfluency and gesture rates in spontaneous speech where speakers perform a communication task in three conditions: hands free, one arm immobilized, both arms immobilized. Previous work suggests that the restriction of the ability to gesture can have an impact on the fluency of speech. In particular, it has been found that the inability to produce iconic gestures, which depict actions and objects, results in a higher rate of disfluency. Models of speech production account for this by suggesting that gesture and speech production are part of the same integrated system. Such models differ in their interpretation of the location of the gesture planning mechanism in relation to the speech model: some authors suggest that iconic gestures relate closely to lexical access, while others suggest that the link is located around the conceptualization stage. The findings of this study tentatively confirm that there is a relationship between gesture and fluency - overall, disfluency increases as gesture is restricted. But it remains unclear whether the disfluency is more related to lexical access than to conceptualization. Proposals for a larger study are suggested. The work is of interest to psycholinguists focusing on the integration of gesture into models of speech production and to Speech and Language Therapists who need to know about the impact that an impaired ability to produce gestures may have on communication.

    Keywords DiSS

  • Kotaro Funakoshi, and Takenobu Tokunaga, “Evaluation of a robust parser for spoken Japanese,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 55-58.

    Abstract We implemented a parser designed to handle ill-formedness in Japanese speech. The parser was evaluated by utilizing newly collected speech data, which was obtained from an experiment designed to produce ill-formed data effectively. Introducing the proposed method increased the number of correctly analyzed utterances from 171 to 322, from among 532 utterances in the corpus.

    Keywords DiSS

  • Robert J. Hartsuiker, Martin Corley, Robin Lickley, and Melanie Russell, “Perception of disfluency in people who stutter and people who do not stutter: Results from magnitude estimation,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 35-37.

    Abstract Recent accounts of stuttering consider disfluencies the result of an interaction between speech planning and self-monitoring, emphasizing the continuity between errors made in everyday speech and those made by people who stutter. On Vasić & Wijnen’s account, the monitor is hypervigilant for upcoming problems and interrupts and restarts the speech signal, resulting in disfluent speech. Crucially, on this account, self-monitoring is a perceptual function. Therefore, this account makes two predictions: (1) people who stutter are also hypervigilant in perceiving another person’s speech; (2) the quality of disfluencies made by people who stutter and those who do not will be comparable. We tested these hypotheses using a magnitude estimation judgment task. Twenty participants who stutter and 20 controls were asked to rate the fluency of excerpted fluent and disfluent fragments from recorded dialogues, either between people who stutter or between non-stutterers. In line with the first hypothesis, people who stutter tended to rate all fragments as more disfluent than controls did. However, the second hypothesis was not confirmed: across judges, fluent and disfluent fragments excerpted from recordings of people who stutter were rated as less fluent than those excerpted from control dialogues, suggesting that there are perceptually relevant differences between the speech of PWS and PWDNS, independent of number and type of disfluencies.

    Keywords DiSS

  • Sandrine Henry, and Berthille Pallaud, “Word fragments and repeats in spontaneous spoken French,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 77-80.

    Abstract This paper presents the results of a study conducted on the interaction of two disfluencies: repeats and word fragments. It is based on 150 repeated word fragments (e.g., "on le re- re- revendique encore une fois") extracted from a one-million-word corpus of spoken French. Word fragments such as: "notre metier spé- spécifique", are, like repeats (e.g., "vous avez évalué le le montant des dégâts"), very frequent events in spoken language: on average, there is 1 word fragment every 50 seconds, 1 repeat every 17 seconds. Speakers and listeners alike are generally unaware of these phenomena as if they were not part of the communication process. They seldom trigger a metalinguistic reaction from the speaker and are even more rarely acknowledged by the listener. These phenomena have sometimes been interpreted as ’errors’ in the communication process, like slips of the tongue. Word fragments and repeats encompass different categories of phenomena, and this enables us to define them as an heterogeneous group ruled by different types of constraints and mechanisms. This analysis rests on the following criteria: structural aspects of the repeat, types of word fragments, morphological and syntactic aspects. Analyses of these repeated of identical word fragments from two different angles - that of the repeats and then that of the word fragments - confirm the relevance of the distinction between these two types of disfluencies.

    Keywords DiSS

  • Peter Howell, “Is a perceptual monitor needed to explain how speech errors are repaired?,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 31-34.

    Abstract Kolk & Postma [2] proposed, following Dell & O’Seaghdha [1], that when a speaker chooses a word, phonologically-related words as well as the intended word are activated. Initially, the activations of all these words are similar, though eventually the intended word reaches a higher asymptotic value when activation is complete [1]. According to Kolk & Postma [2], if a response is made in the phase where activation is building up (rather than at full activation), there is a higher chance of the competing, rather than the intended, word being selected (i.e. an error). They propose that a speaker detects such errors when they are produced overtly using the perceptual system, and a monitor in the linguistic system responds by interrupting and initiating the correction [2]. Word repetition and hesitation (not errors in themselves) have been regarded as signifying underlying errors that are detected and interrupted before speech is output in a similar way to overt errors. An assumption in [2] is that activation for a word stops (or, if it continues, is ignored) immediately a candidate word is selected. The brain processes responsible for speech production have massive parallel capacity. Consequently, activation for all the candidates for a word slot could continue beyond the point where a word is selected in cases where a word is responded to prematurely. When the selected word reaches asymptote, the relative activations of this and the other candidate words indicate when an error has occurred (when the selected word has a lower activation than one of the competing words), and what correction is appropriate (the word with the highest activation). This provides the basis for error detection and correction without the need for a perceptual monitor. Continuing the buildup of activation after a word has been selected, implies that activation of nearby words in its phrase overlaps. It is shown, with some realistic assumptions about how activation builds up and decays across different words in a phrase, that this model predicts word repetition and hesitation and also part-word disfluencies (a characteristic of stuttering), again without the need for a perceptual monitor.

    Keywords DiSS

  • Kim Kirsner, John Dunn, and Kathryn Hird, “Fluency: Time for a Paradigm Shift,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 13-16.

    Abstract Pauses in spontaneous speaking constitute a rich source of data for several disciplines. They have been used to enhance automatic segmentation of speech, classification of patients with acquired communication disorders, the design of psycholinguistic models of speaking, and the analysis of psychological disorders. Unfortunately, however, although pause analysis has been with us for more than 40 years, their interpretation has been compromised by several problems [1]. The first problem is that the pause distribution is skewed, making mean duration a poor measure of central tendency. The second problem is that there are at least two components to the pause duration distribution, a problem that has been confounded by the fact that most authors have assumed that short pauses can be ignored. The third problem is that many scholars have used an arbitrary criterion to separate the pause components thereby adopting statistics that reflect errors of commission or omission. In this paper we review recent work that resolves each of these issues and illustrates the application of the new paradigm to a variety of problems. Our research indicates that, first, there are at least two pause duration distributions, each of which may be sensitive to theoretically interesting variables; second, the distributions are log-normal, thereby opening the way to appropriate measures of central tendency and dispersion, and, third, the distributions can be reliably separated by application of signal detection theory, and the proportion of misclassifications minimised and estimated. This paper reviews recent research using the new approach to pause analysis.

    Keywords DiSS

  • Koji Kitayama, Masataka Goto, Katunobu Itou, and Tetsunori Kobayashi, “Speech Spotter: New Speech Interface Capable of Invoking Speech Recognition Functions during Human-Human Conversation,” in Proceedings of Workshop on Interactive Systems and Software, 2003, pp. 9-18.

    Abstract In this paper, we propose a novel speech interface function, called "Speech Spotter", which enables a user to enter voice commands into a speech recognizer during natural human-human conversation. Only when a user utters a filled pause (a vowel-lengthening hesitation like "er...") and then utters a voice command with a high pitch, its voice command is accepted by the speech recognizer. Thus the Speech Spotter function makes full use of nonverbal information of human voice: a filled pause and the voice pitch of an utterance. By using the Speech Spotter function, we built two application systems: "on-demand human-human conversation support system" and "a telephone system with BGM-playback function". The results of using these systems showed that the Speech Spotter function is robust and convenient enough to be used in daily human-human conversation at a site or over a cellular phone.

  • Göran Kjellmer, “Hesitation. In Defence of ER and ERM,” English Studies, vol. 84, no. 2, 2003, pp. 170-198. DOI: 10.1076/enst.

    Abstract Speech differs in a number of ways from writing. How great the differences are has only been fully realised when detailed comparisons were made possible by the publication of large corpora that were partly or wholly based on the spoken language. While the two media, speech and writing, necessarily have large sections in common, it is true to say that they often use widely differing means of conveying information. The means that are specific to speech were long either neglected or ignored by researchers, so that the description of individual languages was formerly based mainly on their written manifestations. One characteristic of speech is its frequent indication of hesitation or uncertainty. The means by which it is expressed range from nonlinguistic, such as gestures, facial expressions and bodily movements to linguistic, such as repetitions. Another linguistic hesitation marker is the pause, whether silent or filled. This feature can now be studied by means of modern corpora.

  • Torbjörn Lager, “In dialogue with a desktop calculator: A concurrent stream processing approach to building simple conversational agents,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 59-62.

    Abstract Human spontaneous face-to-face conversations are characterized by phenomena such as turn-taking, feedback, sounds of hesitation and repairs. A simple and highly modular stream-based approach to natural language processing is proposed that attempts to deal with such things. A basic version of the model has been implemented in the Oz programming language.

    Keywords DiSS

  • Piroska Lendvai, Antal van den Bosch, and Emiel Krahmer, “Memory-based disfluency chunking,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 63-66.

    Abstract We investigate the feasibility of machine learning in automatic detection of disfluencies in a large syntactically annotated corpus of spontaneous spoken Dutch. We define disfluencies as chunks that do not fit under the syntactic tree of a sentence (including fragmented words, laughter, self-corrections, repetitions, abandoned constituents, hesitations and filled pauses). We use a memory-based learning algorithm for detecting disfluent chunks, on the basis of a relatively small set of low-level features, keeping track of the local context of the focus word and of potential overlaps between words in this context. We use attenuation to deal with sparse data and show that this leads to a slight improvement of the results and more efficient experiments. We perform a search for the optimal settings of the learning algorithm, which yields an accuracy of 97% and an F-score of 80%. This is a significant improvement of the baselines and of the results obtained with the default settings of the learner.

    Keywords DiSS

  • Krisztina Menyhárt, “Age-dependent types and frequency of disfluencies,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 45-48.

    Abstract The age-dependent changes of one’s speech production from childhood up to old age are relatively well known. However, there has been less research conducted concerning the possible alterations of the disfluency phenomena in speakers’ spontaneous speech determined by age. Our hypothesis is that permanent changes are going on in the operation of speech production processes from early childhood up to old age, and that those changes can be studied via observing disfluency phenomena. A series of experiments has been carried out with the participation of altogether 30 Hungarian-speaking persons, children, middle-aged adults and old subjects (ages of 77). Their spontaneous speech was recorded and analyzed concerning the articulation and speech tempi, silent and filled pauses, as well as other disfluency phenomena (like false starts, repetitions, slips, etc.). The aim of the research is to explore the invariant and variable factors of the disfluencies depending on age. The results highlight also the individual differences that seem to be independent of the age factor.

    Keywords DiSS

  • Hannele Nicholson, Ellen Gurman Bard, Robin Lickley, Anne H. Anderson, Jim Mullin, David Kenicer, and Lucy Smallwood, “The intentionality of disfluency: Findings from feedback and timing,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 17-20.

    Abstract This paper addresses the causes of disfluency. Disfluency has been described as a strategic device for intentionally signalling to an interlocutor that the speaker is committed to an utterance under construction. It is also described as an automatic effect of cognitive burdens, particularly of managing speech production during other tasks. To assess these claims, we used a version of the map task and tested 24 normal adult subjects in a baseline untimed monologue condition against conditions adding either feedback in the form of an indication of a supposed listener’s gaze, or time-pressure, or both. Both feedback and time-pressure affected the nature of the speaker’s performance overall. Disfluency rate increased when feedback was available, as the strategic view predicts, but only deletion disfluencies showed a significant effect of this manipulation. Both the nature of the deletion disfluencies in the current task and of the information which the speaker would need to acquire in order to use them appropriately suggest ways of refining the strategic view of disfluency.

    Keywords DiSS

  • Sieb G. Nooteboom, “Self-monitoring is the main cause of lexical bias in phonological speech errors,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 27-30.

    Abstract In this paper I present new evidence, stemming both from an experiment and from spontaneous speech, demonstrating that (a) lexical bias is caused by self-monitoring of inner speech, as proposed by Levelt et al. [1], and (b) that there is phoneme-to-word feedback in the mental programming of speech, as supposed by Dell [2] and Stemberger [3]. It is argued here that possibly phoneme-to-word feedback is an unavoidable side-effect of self-monitoring of inner speech.

    Keywords DiSS

  • Caroline L. Rieger, “Disfluencies and hesitation strategies in oral L2 tests,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 41-44.

    Abstract This paper presents an investigation of hesitation strategies of intermediate learners of German as a second or foreign language (L2) when they take part in oral L2 tests. Previous studies of L2 hesitation strategies have focused on beginning and advanced L2 learners. They found that beginners tend to leave their hesitation pauses unfilled making their speech highly disfluent [17], while advanced L2 speakers - similar to native speakers - use a variety of fillers. In oral L2 tests, intermediate learners hesitate mainly for two reasons: to search for a German word or structure, or to think about the content of their utterance. Some participants use a variety of strategies to signal to the addressee that they are hesitating. This variety is not as rich as it is for advanced L2 learners or native speakers. Other participants leave their hesitation pauses unfilled or rely on quasi-lexical fillers to hold the floor when hesitating.

    Keywords DiSS

  • Guergana Savova, and Joan Bachenko, “Prosodic features of four types of disfluencies,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 91-94.

    Abstract We present a corpus-based approach for using intonation and duration to detect disfluency sites. The questions we aim to answer are: what are the prosodic cues for each disfluency type? Can predictive models be built to describe the relationship between disfluency types and prosodic cues? Are there correlations between the reparandum onset and offset and the repair onset and offset? Is there a general prosodic strategy? Our findings support four main hypotheses: 1) The Combination Rule: A single prosodic feature does not uniquely identify disfluencies or their types. Rather, it is a combination of several features that signals each type. 2) The Compensatory Rule: If there is an overlap of one prosodic feature, then another cue neutralizes the overlap. 3) The Discourse Type Rule: Prosodic cues for disfluencies vary according to discourse type. 4) The Expanded Reset Rule: Repair onsets are dependent on reparandum onsets and reparandum offsets. The limitation of the current study is the relatively small corpus size. Further testing of our proposed hypotheses is needed.

    Keywords DiSS

  • Shu-Chuan Tseng, “Repairs and repetitions in spontaneous Mandarin,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 73-76.

    Abstract 246 overt repairs, 653 complete repetitions and 475 partial repetitions were identified in an annotated corpus of spontaneous Mandarin conversations. On the basis of the data, this paper investigates Mandarin repairs and repetitions by segmenting them into the reparandum part, the editing part and the reparans part and by tagging them using the CKIP automatic word segmentation and tagging system. Results of the use of editing term, the distribution of part of speech and syllables in the reparandum are presented. Semantic differences and similarity in the discrepancy of tagging results of the reparandum and the reparans are also discussed.

    Keywords DiSS

  • Fan Yang, Peter A. Heeman, and Susan E. Strayer, “Acoustically verifying speech repair annotations,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 97-100.

    Abstract Identifying speech repairs is a critical part of annotating spontaneous speech. DialogueView is an annotation tool that provides visual and audio supports for directly annotating speech repairs. In this paper, we report the usability of clean play, a special feature implemented in DialogueView, which cuts out the annotated reparanda and editing terms and plays the remaining speech. We find that although clean play does not help users detect repairs, it does help them determine the extent of repairs. We also find that clean play improves users’ confidence because they have another way to verify their annotations.

    Keywords DiSS


  • Jennifer Arnold, Maria Fagnano, and Michael K. Tanenhaus, “Disfluencies signal theee, um, new stuff: Immediate use of disfluencies during reference comprehension,” in 15th CUNY Conference on Human Sentence Processing, New York, NY, 2002.

    Abstract Spontaneous speech is rarely fluent, resulting in hesitations, fillers ("um" / "uh"), repeated or repaired words, or pronouncing "the" as /thiy/ (Fox Tree & Clark, 1997). Yet these features are generally considered to not affect the core processes of language comprehension. While disfluencies have been argued to signal that the speaker is having difficulty (Clark & Wasow, 1998; Fox Tree & Clark, 1997), this metalinguistic knowledge has not been linked to specific language comprehension phenomena. A corpus analysis showed that speakers are disfluent more often when referring to entities that are new (rather than given) in the discourse. If listeners are sensitive to this correlation, disfluencies at the start of a noun phrase should lead them to focus on objects that are visible but have not yet been mentioned. Eye movements of 24 native speakers of English were recorded as they listened to pairs of instructions to move objects on a computer screen (Table 1). Each display contained 4 colored pictures (Rossion & Pourtois, 2001), including two cohorts (e.g., camel/candle). We investigated the time course of referent identification for the first noun in the second instruction, manipulating whether: 1) the critical NP was fluent (the camel) or disfluent (thiy, uh, camel), and 2) the referent was discourse-new, or was given but unfocused, having just been mentioned as the goal of the first instruction. All NPs were accented. Disfluent NPs should lead to faster target looks in the new condition, and increased cohort competition in the given condition. By contrast, fluent, accented NPs provide an initial bias toward the given but nonfocused object (Dahan et al., in press), so we expected fluent NPs to lead to faster target looks in the given condition and more cohort competition in the new condition. Results showed precisely this interaction, beginning 200 msec after the onset of the head noun ("ca-"). Prior to the noun, there was also a preference for new objects in the disfluent condition and given objects in the fluent condition, emerging 200 msec after the determiner (the/thiy), which provided the first information about fluency. Thus, comprehenders immediately use information provided by disfluencies. This may stem from use of purely distributional information about disfluencies and discourse status, or may result from inferring that the speaker is having difficulty in lexical retrieval (which would be less likely for a just-mentioned referent). Regardless, information about fluency affects the earliest moments of reference resolution. Table 1: Sample instructions (target NP is underlined) Given (Discourse-Old) Context: Put the grapes below the candle. Discourse-new Context: Put the grapes below the camel. a. fluent (accented): Now put the candle below the salt shaker. b. disfluent: Now put thiy, uh, CANDLE below the salt shaker.

  • Thomas Berg, “Slips of the typewriter key,” Applied Psycholinguistics, vol. 23, no. 2, 2002, pp. 185-207. DOI: 10.1017/s0142716402002023.

    Abstract This article presents an analysis of 500 submorphemic slips of the typewriter key that escaped the notice of authors and other proofreaders and thereby made their way into the published records of scientific research. Despite this high selectivity, the corpus is not found to differ in major ways from other collections of keying slips. The main characteristics of this error type include a predominance of within-word slips, an elevated rate of noncontextual slips, a heightened incidence of omissions (in particular, masking errors), a high number of adjacent switches, and an uncommonness of these slips in word edges. In all these respects, slips of the key resemble slips of the pen, although not slips of the tongue. It is argued that speech errors are shaped by a fully deployed structural representation, whereas key slips arise under the influence of a weak structural representation. By implication, speaking is characterized by a hierarchical strategy of activation while typewriting is subject to the so-called staircase strategy of serialization in which activation is a function of linear distance. These disparate strategies may be understood as a response of the processing system to disparate requirements, such as varying speed of execution.

  • Herbert Clark, and Jean E. Fox Tree, “Using uh and um in spontaneous speaking,” Cognition, vol. 84, no. 1, May 2002, pp. 73-111. DOI: 10.1016/S0010-0277(02)00017-3.

    Abstract The proposal examined here is that speakers use uh and um to announce that they are initiating what they expect to be a minor (uh), or major (um), delay in speaking. Speakers can use these announcements in turn to implicate, for example, that they are searching for a word, are deciding what to say next, want to keep the floor, or want to cede the floor. Evidence for the proposal comes from several large corpora of spontaneous speech. The evidence shows that speakers monitor their speech plans for upcoming delays worthy of comment. When they discover such a delay, they formulate where and how to suspend speaking, which item to produce (uh or um), whether to attach it as a clitic onto the previous word (as in "and-uh"), and whether to prolong it. The argument is that uh and um are conventional English words, and speakers plan for, formulate, and produce them just as they would any word.

    Keywords conversation, Dialogue, disfluencies, Language production, spontaneous speech, uh, um

  • Catia Cucchiarini, Helmer Strik, and Lou Boves, “Quantitative assessment of second language learners’ fluency: Comparisons between read and spontaneous speech,” Journal of the Acoustical Society of America, vol. 111, no. 6, June 2002, pp. 2862-2873. DOI: 10.1121/1.1471894.

    Abstract This paper describes two experiments aimed at exploring the relationship between objective properties of speech and perceived fluency in read and spontaneous speech. The aim is to determine whether such quantitative measures can be used to develop objective fluency tests. Fragments of read speech (Experiment 1) of 60 non-native speakers of Dutch and of spontaneous speech (Experiment 2) of another group of 57 non-native speakers of Dutch were scored for fluency by human raters and were analyzed by means of a continuous speech recognizer to calculate a number of objective measures of speech quality known to be related to perceived fluency. The results show that the objective measures investigated in this study can be employed to predict fluency ratings, but the predictive power of such measures is stronger for read speech than for spontaneous speech. Moreover, the adequacy of the variables to be employed appears to be dependent on the specific type of speech material investigated and the specific task performed by the speaker.

  • Jean E. Fox Tree, “Interpreting pauses and ums at turn exchanges,” Discourse Processes, vol. 34, no. 1, 2002, pp. 37-55. DOI: 10.1207/S15326950DP3401_2.

    Abstract In 3 experiments, this article compares how overhearers interpreted second speakers’ contributions to a conversation depending on whether the second speaker responded to a first speaker immediately; paused and responded; said um and responded; or said um, paused, and then responded. The conversational snippets tested were unscripted and diverse; an example of one exchange is, "Are you here because of affirmative action?" (pause, um, or both) "It helped me out a little bit." Overhearers thought speakers had more production difficulty, were less honest, and were less comfortable with topics under discussion when speakers either said um or paused, and even more so with both. The best explanation for the data is that overhearers are judging, for each question asked, what it means for speakers to produce an anticipated or an unanticipated delay.

  • Yoko Kato Nakai, “Topic Shifting Devices Used by Supporting Participants in Native/Native and Native/Non-Native Japanese Conversations,” Japanese Language and Literature, vol. 36, no. 1, April 2002, pp. 1-25. DOI: 10.2307/3250876.

    Abstract In this paper, I analyzed differences in the devices used by native and nonnative supporting participants in topic openings and closings in Japanese face-to-face conversations. My analysis builds on previous research on conversational units and topic-shifting devices in Japanese conversations (Hayashi 1960; Minami 1972, 1983, 1993; Ichikawa 1978; Sugito and Sawaki 1979; Noda 1981, 1990; Ikuta 1983; Sugito 1983, 1987; Jorden with Noda 1987; Sakuma 1987, 1990, 1992; Szatrowski 1986a, 1986b, 1987, 1991, 1993, 1997, 1998; Imaishi 1992; Sakuma and Suzuki 1993; Suzuki 1994, 1995; Karatsu 1995; Emmett 1996, 1998; Okada 1996; Sasaki 1996, 1998; Kato 1999), analyses of topic-shifting devices in English conversations (Garfinkel and Sacks 1970; Reichman 1978; Derber 1979; Goodwin 1981; Long 1981; Levinson 1983; Chafe 1987; Goodwin and Goodwin 1992; Sacks 1992; Geluykens 1993), and contrastive analyses of topic-shifting strategies in English and Japanese conversation (Maynard 1989; Yamada 1992; Watanabe 1993). I demonstrate that the non-native supporting participants in my data used fewer devices such as discourse developing connectives (e.g., demo ’but’, ja ’so [in that case]’, etc.) and the extended predicate (Jorden with Noda 1987) to indicate the relation of their utterances to the context in topic openings than Japanese native supporting participants did. Non-native supporting participants also tended to use more aizuchi ’backchannel utterances’ in topic closings than did native supporting participants, who combined aizuchi with a variety of other devices such as fragments, assessments, summary utterances, direct style, final particles, prolonged vowels, overlap, repetition, and co-construction.

  • Miguel Oliveira, “The Role of Pause Occurrence and Pause Duration in the Signaling of Narrative Structure,” in PorTAL ’02 Proceedings of the Third International Conference on Advances in Natural Language Processing, Springer-Verlag, 2002, pp. 43-52.

    Abstract This paper addresses the prosodic feature of pause and its distribution in spontaneous narrative in relation to the role it plays in signaling narrative structure. Pause duration and pause occurrence were taken as variables for the present analysis. The results indicate that both variables consistently mark narrative section boundaries, suggesting thus that pause is a very important structuring device in oral narratives.

  • Michiko Watanabe, “Fillers as Indicators of Discourse Segment Boundaries in Japanese Monologues,” in Proceedings of Speech Prosody 2002, 2002.

    Abstract We investigated distribution of fillers (filled pauses) in the vicinity of boundaries of different strengths in Japanese monologues, to understand whether fillers may convey information about the location and the strength of boundaries. Consistent with the results of studies on Dutch monologues, fillers tend to increase as the boundary strength grows. It has also been revealed that fillers tend to occur phrase-initially, more strongly at deeper boundaries than at shallower ones. Regarding filler types, the frequency of eto grows most sharply as boundary strength increases, as does e to a lesser degree. These findings indicate that occurrence of fillers, particularly phrase-initial eto and e, provide contributory evidence to discourse boundaries.


  • Laura Abou-Haidar, “Pauses in speech by French speakers with Down Syndrome,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 33-36.

    Abstract A better understanding of the control mechanisms of speech in verbal interaction is very important for the evaluation of the pragmatic competence of a mentally deficient speaker. This study focuses on pauses in the oral production of a Speaker with Down syndrome involved in a conversation: it brings to light the temporal compensation mechanisms which allow the speaker to go beyond the distortions of the segmental level. It confirms the important role of prosody in the success of a conversation, particularly with a speaker who has a handicap which disrupts language structure. Down Syndrome is a condition characterised by an overall delay in cognitive, social, linguistic and motor development. At the oral production level, it leads to deficits in segmental and supra-segmental speech patterning. The goal of this study is to bring elements of response to the following question: is the pragmatic function of language preserved in spite of significant distortions of the motor functions of the phonatory organs? The description of the management of pauses by a speaker with Down syndrome involved in a conversation makes it possible to clarify this subject, while taking into account the various functions which are specific to them beyond the respiratory function: their role in encoding, in the delimitation of syntactic boundaries, and in the regulation of speaking turns, among others. This study allowed us to define criteria which make it possible to characterise the oral production of a Speaker with Down syndrome. These elements relate to the variation of the frequency and the length of pauses. The results obtained are the following: 1. a high frequency of occurrence of pauses in the production of the trisomic speaker; 2. a frequency of occurrence of "mixed pauses", of which the majority have very long lengths, this element revealing a lack of ease and disfluency on the production level; 3. a significant recourse to false-starts, hesitation, repetition and lengthening, to mark sound pauses; 4. a considerable number of very long pauses; 5. a relatively high number of pauses located at the boundaries of or within syntagms, with rather long lengths of intra-syntagmatic uses. We furthermore noted a rarity of long phonic sequences in the speaker with Down syndrome, these sequences seldom exceeding 2000 ms. In spite of these results, it is important to note that we have defined parameters which show that the speaker with Down syndrome integrated rules relating to the management of pauses in verbal interaction.

    Keywords DiSS

  • Karl G.D. Bailey, and Fernanda Ferreira, “Do non-word disfluencies affect syntactic parsing?,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 61-64.

    Abstract Although disfluencies such as uh are generally not treated as linguistic items, our results suggest that they can affect syntactic parsing. Using a grammaticality judgment task, we demonstrate that disfluencies are able to affect the syntactic parse of a sentence in two ways. First, disfluencies can make syntactic reanalysis more difficult by coming between an ambiguous constituent and a disambiguating item. Second, the pattern of disfluencies in spontaneous speech may be used by the listener to guide the parse of a sentence. Thus, although disfluencies have often been viewed as pragmatic phenomena, they can affect the language comprehension by influencing its parsing procedures.

    Keywords DiSS

  • Ellen G. Bard, Robin J. Lickley, and Matthew P. Aylett, “Is disfluency just difficulty?,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 97-100.

    Abstract The question addressed by this paper is whether disfluency resembles Inter-Move Interval, a measure of reaction time in conversation, in displaying effects of the overall difficulty of conducting a coherent conversation. Five sources of difficulty are considered as potential causes of disfluency: planning and producing an utterance, comprehending the prior utterance, performing a communicative task, order effects, and interpersonal factors. A multiple regression analysis on simple disfluencies in the HCRC Map Task Corpus shows that planning and production make the major independent contribution to predicting the rate of disfluencies, with interpersonal variables and position in dialogue also contributing significantly. Notably, comprehension variables did not affect either the total rate of disfluency or the rate of individual kinds of disfluencies.

    Keywords DiSS

  • Heather Bortfeld, Silvia Leon, Jonathan Bloom, Michael Schober, and Susan Brennan, “Disfluency Rates in Conversation: Effects of Age, Relationship, Topic, Role, and Gender,” Language and Speech, vol. 44, 2001, pp. 123-147.

    Abstract After reviewing situational and demographic factors that have been argued to affect speakers’ disfluency rates, we examined disfluency rates in a corpus of task-oriented conversations (Schober & Carstensen, 2001) with variables that might affect fluency rates. These factors included: speakers’ ages (young, middle-aged, and older), task roles (director vs. matcher in a referential communication task), difficulty of topic domain (abstract geometric figures vs. photographs of children), relationships between speakers (married vs. strangers), and gender (each pair consisted of a man and a woman). Older speakers produced only slightly higher disfluency rates than young and middle-aged speakers. Overall, disfluency rates were higher both when speakers acted as directors and when they discussed abstract figures, confirming that disfluencies are associated with an increase in planning difficulty. However, fillers (such as uh) were distributed somewhat differently than repeats or restarts, supporting the idea that fillers may be a resource for or a consequence of interpersonal coordination.

    Keywords communication, conversation, disfluency, speech planning, spontaneous speech

  • Susan Brennan, and Michael Schober, “How Listeners Compensate for Disfluencies in Spontaneous Speech,” Journal of Memory and Language, vol. 44, no. 2, 2001, pp. 274-296. DOI: 10.1006/jmla.2000.2753.

    Abstract Listeners often encounter disfluencies (like uhs and repairs) in spontaneous speech. How is comprehension affected? In four experiments, listeners followed fluent and disfluent instructions to select an object on a graphical display. Disfluent instructions included mid-word interruptions (Move to the yel- purple square), mid-word interruptions with fillers (Move to the yel- uh, purple square), and between-word interruptions (Move to the yellow- purple square). Relative to the target color word, listeners selected the target object more quickly, and no less accurately, after hearing mid-word interruptions with fillers than after hearing comparable fluent utterances as well as utterances that replaced disfluencies with pauses of equal length. Hearing less misleading information before the interruption site led listeners to make fewer errors, and fillers allowed for more time after the interruption for listeners to cancel misleading information. The information available in disfluencies can help listeners compensate for disruptions and delays in spontaneous utterances.

    Keywords comprehension, disfluencies, fillers, paralinguistic cues, parsing, pauses, repairs, spontaneous speech

  • Jeanne-Marie Debaisieux, and José Deulofeu, “Grammatically unacceptable utterances are communicatively accepted by native speakers, why are they?,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 69-72.

    Abstract This paper aims at redefining the generally accepted notion of unfinished or elliptic sentence, which appears to be crucial in defining in turn the notion of fluency itself. It will be shown that a large part of utterances which a regularly trained linguist would consider as unacceptable and revealing some kind of disfluency of the speaker who produced them, are in fact fully accepted by the participants of a regular verbal interaction. This apparent contradiction will be explained by the fact that linguists base their judgments of well formedness of the utterances on their grammatical structure, whereas speakers interact basically by means of communicative units, which are not necessarily made up of grammatically well formed parts.

    Keywords DiSS

  • Yasuharu Den, “Are word repetitions really intended by the speaker?,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 25-28.

    Abstract This paper compares, using our Japanese data, word repetitions with error repairs in terms of their temporal structures in order to examine whether or not the prolongation of first tokens in word repetitions, observed by Den and Clark (2000), is really an effect of the speaker’s strategy. Analyses of 10 task-oriented Japanese dialogues reveal a difference between word repetitions and error repairs for the data involving cut-off in first tokens; in both types of disfluencies, the final phoneme of the first token is considerably prolonged, but the degree of the prolongation is much greater in word repetitions than in error repairs. These results support our view that prolonged first tokens in word repetitions are a product of a process under the speaker’s control or intention.

    Keywords DiSS

  • Danielle Duez, “Acoustico-phonetic characteristics of filled pauses in spontaneous French speech: preliminary results,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 41-44.

    Abstract In the current analysis we examined the acoustic and phonetic characteristics of filled pauses in spontaneous French speech and their relationship to the prosody of the surrounding context. Two main results emerged: 1) There was no effect of the duration of filled pauses or their sentence location on their F0 patterns or on the differences between the highest and lowest values. 2) There was no relationship between peak-F0 values and the F0 values of filled-pause onsets, but the F0 values of filled-pause onsets and the F0-values of non-marked breath-group onsets were highly similar. The F0 values of filled-pause onsets seem to be stable within the same speaker’s speech. They are speaker-dependent and strongly linked to the physiological, absolute aspects of speech production. It is assumed that filled-pause onset may be used by listeners as a reference for evaluating the speaker’s pitch range.

    Keywords DiSS

  • Robert Eklund, “Prolongations: A dark horse in the disfluency stable,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 5-8.

    Abstract This paper studies a specific type of disfluency, viz. segment prolongation (PR), i.e., the "stretching out" of speech sounds as a means of hesitation. It is shown that the occurrence of PRs varies as a function of phone type, position in the word, lexical factors and word class, and that PRs are subject to phonotactic constraints in Swedish. A comparison between Swedish and Tok Pisin suggests that there are language-specific traits associated with PR production.

    Keywords DiSS

  • Jean E. Fox Tree, “Listeners’ uses of "um" and "uh" in speech comprehension,” Memory and Cognition, vol. 29, no. 2, March 2001, pp. 320-326.

    Abstract Despite their frequency in conversational talk, little is known about how ums and uhs affect listeners’ on-line processing of spontaneous speech. Two studies of ums and uhs in English and Dutch reveal that hearing an uh has a beneficial effect on listeners’ ability to recognize words in upcoming speech, but that hearing an um has neither a beneficial nor a detrimental effect. The results suggest that um and uh are different from one another and support the hypothesis that uh is a signal of short upcoming delay and um is a signal of a long upcoming delay.

  • Mária Gósy, “The double function of disfluency phenomena in spontaneous speech,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 57-60.

    Abstract Disfluency in spontaneous speech is the outcome of a speaker’s indecision about what to say next. The listener, however, is continuously adapted to both the language signals and the types of disfluency of the heard text. What is in the background of this adaptation process? This paper analyses the types and characteristics of the disfluency phenomena of a 78-minute spontaneous speech sample (produced by 10 adults). The author’s intention is to explain the characteristics of disharmony between speech planning and articulation within the speech production process. In order to explain the hypothesized double function of disfluency in terms of perceptual necessity from the listener’s side various experiments have been carried out. Three different samples of spontaneous speech have been selected for experimental purposes. Three groups of listeners (altogether 60 university students) participated in the experiments. One of the groups had to detect the instances of disfluency in the texts marking them on a paper sheet. The subjects of the other group listened to the same texts and then wrote down their contents. The pauses and hesitations were then eliminated from the texts. The third group of the subjects had the same comprehension task as the previous one had. Results show that (i) instances of disfluency are consequences of the speaker’s speech planning processes, (ii) their reasons and occurrences are unconsciously known by the listener as well, (iii) disfluency phenomena are relatively well predicted, (iv) the listeners need pauses and hesitations in order to comprehend the heard texts successfully.

    Keywords DiSS

  • Lynne Hansen, “Language Attrition: The Fate of the Start,” Annual Review of Applied Linguistics, vol. 21, 2001, pp. 60-73.

    Abstract This chapter reviews the literature on psycholinguistic aspects of language attrition over the past half decade. Descriptive data-based studies have continued to dominate during this time, providing needed groundwork for the emerging discipline. A few studies have continued theoretical threads from previous work, however, by examining attrition data from the perspectives of the regression hypothesis and markedness theory. We have also seen the beginnings of promising new lines of research which draw theoretical underpinnings from neighboring disciplines, most notably from the savings paradigm in cognitive psychology and from theories of codeswitching in bilingualism studies. Evidence on the effects in attrition of non-linguistic variables such as age, proficiency level, and literacy has continued to accumulate. Hesitation phenomena in attriter speech have begun to receive serious attention. Relearning, one of the main areas to potentially benefit from language attrition studies, is also gaining new research impetus at the turn of the century.

  • Tapio Hokkanen, “Prosodic marking of self-repairs,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 37-40.

    Abstract Slip studies predominantly focus on either structural or semantic properties of the errors. Since most analyses have been based on pen-and-paper collections, i.e., on-line notes, it is quite understandable that suprasegmental properties of errors have remained a neglected area. The present prosodic analysis is based on acoustical measurements of 307 self-repairs. Each repair has been measured with the Praat program. In order to make the measurements psychoacoustically relevant and comparable across speakers, the changes in F0 are expressed in terms of semitones. In general, speakers repair slightly less than three quarters of the errors they commit whereas one quarter remains either totally undetected or at least without a repair. With respect to prosodic marking, it appears that the proportion of marked repairs in the present data is significantly larger than in previous studies: approximately two thirds of self-repairs are marked with remarkably higher pitch (>+3ST), and a total of 96.7 per cent with a somewhat heightened pitch. It is concluded that alternations of fundamental frequency are utilized in marking self-initiated repairs.

    Keywords DiSS

  • Peter Howell, and James Au-Yeung, “Application of EXPLAN theory to spontaneous speech control,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 9-12.

    Abstract Problems for theories that explain speech errors by a monitoring process are discussed. EXPLAN theory is based on a proposal about planning and execution time, not on how errors arise. This theory is outlined and support from characteristics of fluency failure and altered feedback studies given.

    Keywords DiSS

  • Peter Howell, and Stevie Sackin, “Function Word Repetitions Emerge When Speakers Are Operantly Conditioned to Reduce Frequency of Silent Pauses,” Journal of Psycholinguistic Research, vol. 30, no. 5, 2001, pp. 457-474.

    Abstract Beattie and Bradbury (1979) reported a study in which, in one condition, they punished speakers when they produced silent pauses (by lighting a light they were supposed to keep switched off). They found speakers were able to reduce silent pauses and that this was not achieved at the expense of reduced overall speech rate. They reported an unexpected increase in word repetition rate. A recent theory proposed by Howell, Au-Yeung, and Sackin (1999) predicts that the change in word repetition rate will occur on function, not content words. This hypothesis is tested and confirmed. The results are used to assess the theory and to consider practical applications of this conditioning procedure.

  • Ben Hutchinson, and Cécile Pereira, “Um, one large pizza. A preliminary study of disfluency modelling for improving ASR,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 77-80.

    Abstract A corpus of spontaneous telephone transactions between call centre operators of a pizza company and its customers is examined for disfluencies (fillers and speech repairs) with the aim of improving automatic speech recognition. From this, a subset of the customer orders is selected as a test set. An architecture is presented which allows filled pauses and repairs to be detected and corrected. A language repair module removes fillers and reparanda and transforms utterances containing them into fluent utterances. An experiment on filled pauses using this module and architecture is then described. A speech recognition grammar for recognising fluent speech is used to provide a baseline. This grammar is then enriched with filled pauses, based on their placement in relation to syntactic boundaries. Evaluation is done at the level of understanding, using a metric on feature structures. Initial results indicate that incorporating filled pauses at syntactic boundaries improves the recognition results for spontaneous continuous speech containing disfluencies.

    Keywords DiSS

  • Klaus J. Kohler, Benno Peters, and Thomas Wesener, “Interruption glottalization in German spontaneous speech,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 45-48.

    Abstract This paper analyzes the occurrence of phonetic interruption cues at points of syntactic irregularities (false starts and truncations) in a large annotated corpus of German dialogues and compares interruption glottalization with laryngealization in terminal low phrase-final prosodies. Glottalization (including glottal stop) predominantly marks word fragments, whereas non-verbal insertions, e.g. breathing, tend to be word-external interruption cues. Laryngealization (excluding glottal stop) predominantly signals terminal phrase boundaries in turn-final positions. Individual speakers differ a great deal as to the distribution of these phenomena.

    Keywords DiSS

  • Robin J. Lickley, “Dialogue moves and disfluency rates,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 93-96.

    Abstract Many factors conspire to cause speakers to produce hesitations and self-repairs in dialogue. It has been noted that disfluency rates vary between corpora, with different overall dialogue tasks and with different modalities (e.g. human-computer vs. human-human) and between speakers, where they play different roles within a given dialogue. In this paper, we attempt to account for some of these results by examining the interaction between rates of different types of disfluency and types of utterance (dialogue moves) within one corpus of human-human task oriented dialogues. We find both that overall disfluency rate varies by dialogue move type, with moves which require more planning producing more disfluency, and that the distribution of disfluency types varies between move types, most notably with complex and negative responses to questions producing more filled pauses than positive replies and other moves. This work helps us to understand how dialogue structure can account for differences in disfluency rates between and within speech corpora and has implications for research in speech production and perception, discourse studies, dialogue management and automatic speech recognition.

    Keywords DiSS

  • Jan McAllister, Susan Cato-Symonds, and Blake Johnson, “Listeners’ ERP responses to false starts and repetitions in spontaneous speech,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 65-68.

    Abstract Hindle [1] suggested that false starts and repetitions should be handled differently in a computational account of the processing of the two kinds of disfluency, and there is behavioural evidence that the human sentence processing mechanism likewise honours this distinction [2]. The same dichotomy was also evident in the electrophysiological data reported here. False starts and repetitions were identified in a corpus of spontaneous speech. Control items for the false starts were prepared by excising the reparanda to yield apparently fluent items. Continuous EEG was recorded while subjects listened to items containing the false starts, fluent false start controls, and first and second tokens of repetitions. Compared with identical words in their fluent controls, the false starts elicited a positive response similar to the P600 which is reported for syntactically anomalous words [3, 4, 5]. By contrast, second tokens of repetitions in general resulted in increased amplitude of the N400 [6]; yet, when the same repetitions were excised from context and presented list-fashion, they elicited the positive-going response which has been reported by other researchers [7].

    Keywords DiSS

  • Nikolinka Nenova, Gina Joue, Ronan Reilly, and Julie Carson-Berndsen, “Sound and function regularities in interjections,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 49-52.

    Abstract This paper investigates the relation between the sound patterns of interjections and their functional realisation in the discourse process. It considers whether certain interjection functions tend to have particular sound distributions. In order to address these questions a classification scheme for American English nonlexical interjections in terms of discourse markers is also presented.

    Keywords DiSS

  • Sieb G. Nooteboom, “Different sources of lexical bias and overt self-corrections,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 21-24.

    Abstract In this paper it is argued, on the basis of a quantitative analysis of spontaneous speech errors and their corrections in Dutch, that the mechanism leading to lexical bias in speech errors cannot be the same as that leading to overt self-corrections. Although spontaneous speech errors show a strong lexical bias, overt self-corrections do not. Lexical bias strongly increases with dissimilarity between target phoneme and source phoneme. No such effect is found in overt self-corrections. Several possible sources of these differences are discussed.

    Keywords DiSS

  • Serguei V. Pakhomov, “Hesitations and Cognitive Status of Noun Phrase Referents in Spontaneous Discourse,” PhD Dissertation, University of Minnesota, 2001.

    Abstract (none)

  • Anastasia Riazantseva, “Second Language Proficiency and Pausing: A Study of Russian Speakers of English,” Studies in Second Language Acquisition, vol. 23, no. 4, December 2001, pp. 497-526. DOI: 10.1017/s027226310100403x.

    Abstract The present study examines the relationship between second language (L2) proficiency and pausing patterns (i.e., pause duration, frequency, and distribution) in the speech of 30 Russian speakers of English performing two oral tasks (a topic narrative and a cartoon description) in Russian and in English. The subjects were divided into two oral English proficiency groups, high and intermediate, on the basis of a standardized test of spoken English. Baseline data were collected from a control group of 20 native English speakers. Statistical analyses were performed to determine: (a) the native norms of pause duration, frequency, and distribution for Russian and English on the two experimental tasks; (b) the effect of the level of L2 proficiency (high and intermediate) on the pausing of Russian speakers in English; and (c) the differences or similarities in pausing exhibited by native English speakers and native Russian speakers (with two different levels of English proficiency) when speaking English. The results of this study indicate that English and Russian informal monologue speech can be characterized as having different pausing conventions, thus suggesting that crosslinguistic differences involve, among many other aspects, contrasts in pausing patterns. Additionally, L2 proficiency was found to affect the pause duration of advanced nonnative speakers in that they were able to adjust the duration of their pauses in English to produce a nativelike pausing norm. It was also found that even highly proficient L2 speakers pause more frequently in their L2 than in their first language (L1). The examination of pause distribution patterns suggests that persons of intermediate to high L2 speaking proficiency make the same number of within-constituent pauses as native speakers. Overall, the findings of this study support the view that adherence to the target language pausing norms may lead to the perception of nonnative speech as more fluent and nativelike. The findings also highlight the importance of exposing L2 students to a richer variety of situations that illustrate native patterns of verbal communication.

  • Caroline L. Rieger, “Idiosyncratic fillers in the speech of bilinguals,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 81-84.

    Abstract This paper introduces a never before described strategy used by bilinguals to fill hesitation pauses. This strategy proved so unique that it was given the name ’idiosyncratic filler.’ It describes a filler type that is produced unusually often by one individual when hesitating. It is usually a particular lexical filler that is used as often as or more often than all other lexical fillers combined. Idiosyncratic fillers are as flexible as, but more ’prestigious’ than quasi-lexical fillers and they are used by bilinguals in their non-native language as an overgeneralization and to avoid the incessant production of ’uhs’ and ’uhms.’

    Keywords DiSS

  • L. J. Rodríguez, I. Torres, and A. Varona, “Annotation and analysis of disfluencies in a spontaneous speech corpus in Spanish,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 1-4.

    Abstract A new database consisting of 227 dialogues in Spanish was annotated with disfluencies. Then a detailed analysis of the annotations was carried out. The database had been recorded according to the well-known Wizard of Oz paradigm. Seventy-five speakers were each given three different scenarios to make queries about timetables, prices and other conditions of train travel between two Spanish cities. The notion of disfluency was relaxed to include any acoustic, lexical or syntactic feature that distinguishes spontaneous from read speech. A specific XML annotation scheme was developed. A simple text editor was used to insert marks, and a specific parser was implemented to find errors in annotations. The analysis of annotations revealed that disfluencies were not uniformly distributed among either user turns or speakers. Most disfluencies were grouped into certain user turns, especially the first one. On the other hand, some speakers were remarkably more prone to hesitate, repeat or correct fragments of speech than others.

    Keywords DiSS

  • Mandana Seyfeddinipur, and Sotaro Kita, “Gesture as an indicator of early error detection in self-monitoring of speech,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 29-32.

    Abstract There is a theoretical controversy regarding when the self-monitoring process interrupts the speech stream. One view holds that the speech stream is interrupted as soon as an error is detected. Another view holds that, even after an error is detected, the speaker does not interrupt immediately but continues speaking and at the same time plans the upcoming repair. We address this question by observing speech-accompanying gestures at the moment of speech disfluency. The results show that the concurrent gestural movements are typically stopped on average 240 ms before speech is stopped. In other words, the gesture suspension foreshadows the speech suspension. The gestural foreshadowing shows that the speaker must know early on that he is going to suspend speech. The gestural indication of an upcoming speech suspension suggests that the speaker does not interrupt speech at the very moment s/he detects an error. This result supports the hypothesis on speech monitoring stating that the speaker continues to talk after error detection and at the same time plans the upcoming repair.

    Keywords DiSS

  • Richard Shillcock, Simon Kirby, Scott McDonald, and Chris Brew, “Filled pauses and their status in the mental lexicon,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 53-56.

    Abstract We report a study of the relationship between form and meaning in the most frequent monosyllabic words in the lexicon of English. There is a small but significant correlation between the phonological distance and the semantic distance between each pair of words. To this extent, words that have similar meanings tend to sound similar. Words differ as to the size of this meaning-form correlation in their relationship with all of the other words. When the words are ranked according to the size of this correlation we find that the words which appear towards the top of the ranking are the communicatively important words. When we look at the position in the ranking of the speech editing terms, such as er, oh and um, we find that they are at the very top of the ranking. We argue that this position reflects the communicative importance of these items, and that it therefore makes sense to treat them as a proper part of the mental lexicon.

    Keywords DiSS

  • Elizabeth Shriberg, “To ’errrr’ is human: ecology and acoustics of speech disfluencies,” Journal of the International Phonetic Association, vol. 31, no. 1, 2001, pp. 153-169. DOI: 10.1017/S0025100301001128.

    Abstract Unlike read or laboratory speech, spontaneous speech contains high rates of disfluencies (e.g. repetitions, repairs, filled pauses, false starts). This paper aims to promote ’disfluency awareness’ especially in the field of phonetics — which has much to offer in the way of increasing our understanding of these phenomena. Two broad claims are made, based on analyses of disfluencies in different corpora of spontaneous American English speech. First, an Ecology Claim suggests that disfluencies are related to aspects of the speaking environments in which they arise. The claim is supported by evidence from task effects, location analyses, speaker effects and sociolinguistic effects. Second, an Acoustics Claim argues that disfluency has consequences for phonetic and prosodic aspects of speech that are not represented in the speech patterns of laboratory speech. Such effects include modifications in segment durations, intonation, voice quality, vowel quality and coarticulation patterns. The ecological and acoustic evidence provide insights about human language production in real-world contexts. Such evidence can also guide methods for the processing of spontaneous speech in automatic speech recognition applications.

  • Jörg Spilker, Anton Batliner, and Elmar Nöth, “How to repair speech repairs in an end-to-end system,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 73-76.

    Abstract If automatic speech processing wants to deal with spontaneous speech, it has to deal with disfluencies in general and speech repairs in particular as well. The paper describes the processing of speech repairs in the VERBMOBIL system and discusses the special requirements of real-time systems. With respect to this criterion, the VERBMOBIL approach and its results are compared to other work. All these results are based more or less on the evaluation of a stand alone process, not integrated in a speech system. The ultimate goal is, of course, the use and the evaluation of the impact of such a repair process in a real-time, end-to-end system. An evaluation method based on this idea is presented and some preliminary results are given.

    Keywords DiSS

  • Nada Vasic, and Frank Wijnen, “Stuttering and speech monitoring,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 13-16.

    Abstract In this paper, we would like to argue that stuttering represents inadequate monitoring of the speech production process. The model we are proposing is the vicious circle hypothesis. The stuttering speaker has a malfunctioning monitor whose three parameters, namely focus, effort, and threshold are inappropriately set. In order to test our hypothesis, we tested 20 stuttering individuals in a dual task situation. The experiment consisted of three conditions: baseline where semi-spontaneous speech was elicited and two dual-task conditions. The first dual task was speaking and playing a computer game at the same time where the processing resources were taken away from monitoring. The second dual task was designed to shift the monitor’s focus away from habitual monitoring. Subjects were asked to monitor for a particular word in their speech. The preliminary results for our experiment show that in the dual task condition the number of disfluencies decreased in relation to the number of words, which, in turn supports our prediction that distraction has a positive effect on fluency in the case of stuttering individuals.

    Keywords DiSS

  • Michiko Watanabe, “The usage of fillers at discourse segment boundaries in Japanese lecture-style monologues,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 89-92.

    Abstract We examined whether fillers (filled pauses) in a Japanese lecture appeared more frequently after discourse segment boundaries (DSB) than after other sentence boundaries. Contrary to our hypothesis that fillers occur more often after DSB than after other sentence boundaries, the frequency of fillers in the first phrase after DSB did not differ statistically from that after other sentence boundaries. The location of fillers in the first phrase after DSB and after other boundaries did not show any clear difference, either. However, the types of fillers at the initial position of the first phrase after two kinds of boundaries were different; sentence initial ’eto’ appeared exclusively at DSB. This result indicates that sentence initial ’eto’ may help highlighting DSB, but not other types of fillers. Other kinds of fillers (’e’, ’ma’, ’ano’, ’sono’) seem to be mainly concerned with planning units of the utterance that are smaller than a sentence.

    Keywords DiSS

  • Asa Wengelin, “Disfluencies in writing - are they like in speaking?,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 85-88.

    Abstract This paper presents a study of disfluencies in written language production. Texts from ten university students are compared to data from people who almost never use writing, namely adult dyslexics and to texts from people who communicate in writing under real-time constraints every day, namely deaf whose main use of writing is text telephone conversations. This paper investigates which types of disfluencies occur in writing, where they occur and their durations. Further, this paper investigates how different text types and the specific characteristics of deaf and dyslexic writers influence the distribution of disfluencies. The results are discussed in relation to earlier work on disfluencies in speaking.

    Keywords DiSS

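  • Kouzou Yanagawa, “Hesitation Phenomenaが 高校生のリスニング理解に及ぼす影響 [The effect of hesitation phenomena on high school students’ listening comprehension],” STEP Bulletin, vol. 13, 2001, pp. 13-25.

    Abstract It is easy to appreciate from experience that what the author calls hesitation phenomena (HP) play an important role in everyday conversation, making communication more realistic and lively. Textbook-style dialogues in the classroom, and practice conversations based on prepared dialogues, may sound hollow and unrealistic precisely because the lubricating role of HP is absent from them. Moreover, freely and promptly inserting interjections such as ’you know’ and ’I mean’ into a conversation seems to be one of the things Japanese speakers find most difficult. For beginners, however, the presence of HP sometimes helps listening comprehension and sometimes hinders it, which makes it an interesting research topic. This study, which compared the effect of the presence of HP on the listening comprehension of Japanese high school students, is significant in that it offers many suggestions for the teaching of English. | Beginning with a review of previous research and continuing through the preparation of the listening materials, their administration, and the analysis of the collected data, the procedures leading to the completed paper appear to have been carried out with considerable care. However, contrary to the author’s hypothesis, the results showed quite clearly that the presence of HP helps listening comprehension. Whether the presence of HP has a positive effect on listening comprehension has not yet been settled, and this shows that drawing a general conclusion is not a simple matter; nevertheless, the results here should encourage those who value teaching through living, natural English. This is a study from which various further developments can be expected.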

  • Michiko Yoshida, “Repeated phoneme effect in Japanese speech errors,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 17-20.

    Abstract Analyses of errors in the natural speech of Dutch, German, and English have shown that involuntary rearrangements of phonemes (e.g., left hemisphere → heft lemisphere) are more likely to occur when the two words involved in the error have the same phoneme before or after the phoneme on which the error occurred (e.g., /E/ in left hemisphere) [1, 2]. A study by Dell (1984) has revealed that phoneme repetition could also contribute to experimentally induced speech errors in English [3]. The present study explored the effect of repeated phonemes in Japanese speech errors by means of two error-inducing experiments. Analyses of subjects’ errors showed that a sequence of syllables that share the same phoneme was more error-prone than one with a variety of phonemes, suggesting that phoneme repetition could contribute to Japanese speech errors. These results are consistent with the view that the repeated phoneme effect is common to all speakers regardless of language.

    Keywords DiSS


  • Judit Kormos, “The Role of Attention in Monitoring Second Language Speech Production,” Language Learning, vol. 50, no. 2, June 2000, pp. 343-384. DOI: 10.1111/0023-8333.00120.

    Abstract The study investigates the role of attention in monitoring second language speech production by means of analyzing the distribution and frequency of self-repairs and the correction rate of errors in the speech of 30 Hungarian learners of English at 3 levels of proficiency and of 10 native speakers of Hungarian. The results indicate that the amount of attention paid to the linguistic form of the utterance does not vary at different stages of L2 competence and that the distribution of attention in monitoring for errors is markedly different in L1 and L2. In the case of advanced L2 speakers, the extra attentional resources made available by the automaticity of certain encoding processes were used for checking the discourse-level aspects of their message.

  • Liz Temple, “Second language learner speech production,” Studia Linguistica, vol. 54, no. 2, August 2000, pp. 288-297. DOI: 10.1111/1467-9582.00068.

    Abstract This paper reports on a study which investigated temporal variables in foreign language learner speech and native speech. The findings are discussed from a cognitive processing perspective. The subjects were 30 intermediate/advanced level adult students of French as a foreign language and 20 native speakers of French. Short extracts of recorded interviews were transcribed and quantitative measures of pause and hesitation phenomena, repairs and errors were calculated. The speech production model of Levelt (1989) provides a framework for understanding the source of these phenomena and the significant differences between natives and learners in planning and encoding speech. Capacity limitations of working memory, related in particular to foreign language learners’ non-automatic processing mode, resulted in non-fluent speech performance, compared with native speakers.


  • Heather Bortfeld, Silvia D. Leon, Jonathan Bloom, Michael F. Schober, and Susan E. Brennan, “Which speakers are most disfluent in conversation, and when?,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 7-10.

    Abstract We examined disfluency rates in a corpus of task-oriented conversations [1] in which several factors were manipulated that could affect fluency rates. These factors included: speakers’ age (young, middle-aged, and older), task roles (director vs. matcher), difficulty of domain (abstract geometric figures or tangrams vs. photographs of children’s faces), relationship between speakers (married vs. strangers), and gender (each pair consisted of a man and a woman). Older speakers produced only marginally higher (combined) disfluency rates than young and middle-aged speakers. Overall, disfluency rates were higher both when speakers took the initiative and when they discussed tangrams, associating disfluencies with an increase in planning difficulty. However, fillers (such as uh) were distributed somewhat differently than repetitions and restarts, supporting the idea that fillers may be a resource for or a consequence of interpersonal coordination.

    Keywords DiSS

  • Susan E. Brennan, and Michael F. Schober, “Uhs and interrupted words: The information available to listeners,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 19-22.

    Abstract Speech disfluencies are generally assumed to harm comprehension. Our studies investigated whether this is true, or whether certain disfluencies might actually help comprehension by marking for listeners which information the speaker intends to repair. We tested two hypotheses: (1) whether an interrupted word signals that the word was produced in error, and (2) whether a filler such as uh after an interrupted word signals an error. Listeners heard fluent instructions and disfluent ones whose reparanda contained completed words, interrupted words, or interrupted words with fillers, and then responded to these instructions. Responses to mid-word interruptions were no faster than to between-word interruptions, although there were fewer errors when less of the unintended word was heard. Responses to mid-word interruptions with uh were faster and more accurate than controls without disfluencies. With more complex displays, the response time advantage (but not the error rate advantage) diminished, suggesting that an interrupted word followed by uh tells listeners what the speaker does NOT mean. A fourth experiment showed that it is not the presence of the uh per se, but the additional time after the interrupted word that is the source of this "disfluency advantage."

    Keywords DiSS

  • Mark G. Core, and Lenhart K. Schubert, “Speech Repairs: A Parsing Perspective,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 47-50.

    Abstract This paper presents a grammatical and processing framework for handling speech repairs. The proposed framework has proved adequate for a collection of human-human task-oriented dialogs, both in a full manual examination of the corpus, and in tests with a parser capable of parsing some of that corpus. This parser can also correct a pre-parser speech repair identifier producing increases in recall varying from 2% to 4.8%.

    Keywords DiSS

  • Robert Eklund, “A Comparative Analysis of Disfluencies in Four Swedish Travel Dialogue Corpora,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 3-6.

    Abstract This paper reports on ongoing work on disfluencies carried out at Telia Research AB. Four travel dialogue corpora are described: human-"machine"-human (Wizard-of-Oz); human-"machine" (Wizard-of-Oz); human-human and human-machine. The data collection methods are outlined and their possible influence on the collected material is discussed. An annotation scheme for disfluency labelling is described. Preliminary results on five different kinds of disfluencies are presented: filled and unfilled pauses, prolonged segments, truncations and explicit editing terms.

    Keywords DiSS

  • Jean E. Fox Tree, “Between-Turn Pauses and Ums,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 15-17.

    Abstract Pauses and ums are often treated as two versions of the same thing, with the traditional label for ums, filled pauses, emphasizing this seeming interchangeability. To explore this hypothesis, I compared how overhearers interpreted a speaker’s contribution to a conversation depending on whether the speaker responded immediately, paused and responded, or said um and responded. Overhearers answered a series of questions about the turn exchanges they had heard. The questions measured their interpretations of the second speakers’ speech production difficulty, honesty, comfort with the topic discussed, familiarity with the interlocutor, and desire to have further contact with the interlocutor. In two experiments, the type of turn exchange was found to influence overhearers’ interpretations. Results supply information about both the signalling properties of ums and the relationship between ums and pauses of varying lengths in the environment of a turn exchange.

    Keywords DiSS

  • Jean E. Fox Tree, and Josef C. Schrock, “Discourse Markers in Spontaneous Speech: Oh What a Difference an Oh Makes,” Journal of Memory and Language, vol. 40, 1999, pp. 280-295. DOI: 10.1006/jmla.1998.2613.

    Abstract Discourse markers are usually studied from the vantage point of corpora analyses. By looking at where they fall in spontaneous talk, hypotheses can be made about their possible functions. However, direct tests of listeners’ uses of these expressions are rare. In five experiments, we looked at the on-line spontaneous speech comprehension effects of one discourse marker, oh. We found that recognition of words was faster after oh than when the oh was either excised and replaced by a pause or excised entirely. We also found that semantic verification of words heard earlier in the discourse was faster after oh than when the oh was either excised and replaced by a pause or excised entirely, but only when the test point was downstream from the oh. Results demonstrate that oh is not only a potential signal to addressees, as has been suggested by corpora analyses, but that it is in fact used by addressees to help them integrate information in spontaneous talk.

  • Dafydd Gibbon, and Shu-Chuan Tseng, “Toward a formal characterisation of disfluency processing,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 35-38.

    Abstract Inherent structural characteristics of speech disfluencies are the prerequisite for the fulfilment of detecting and correcting speech disfluencies in spontaneous speech. However, a considerable number of recent research works on speech disfluencies focus on the surface patterns of speech disfluency editing structure, instead of looking into the relations between editing structure, the syntactic structure and the prosodic structure of speech disfluencies. In this paper we present first results of a new line of research, using feature structures modelled by finite state transducers, on the formal modelling of speech disfluencies in unplanned speech, in relation to all three levels of description.

    Keywords DiSS

  • Marie-Noëlle Guillot, Fluency and Its Teaching. Clevedon, England: Multilingual Matters, 1999.

    Abstract We can all recognize fluency and practice it, but often do not understand what linguistic and paralinguistic operations are involved. This text tries to solve this puzzle. It begins by exploring perceptions of fluency to understand their common denominators. It goes on to pinpoint the specific features which promote fluency while emphasizing its relative and interactional nature. These analyses produce both a methodological framework and a pedagogical strategy, illustrated by sample classroom activities. Language teachers, applied linguists, linguists and their students should find this book an accessible companion to the teaching and study of oral language, with French as its domain of application.

  • Peter A. Heeman, and K.H. Loken-Kim, “Detecting and Correcting Speech Repairs in Japanese,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 43-46.

    Abstract One of the characteristics of spontaneous speech is the abundance of speech repairs, in which speakers go back and repeat or change something they have just said. In other work [7], we proposed a language model for speech recognition that can detect and correct speech repairs in English. In this paper, we show that this model works equally as well on a Japanese corpus of spontaneous speech. The structure of the model captures the language independent aspect of speech repairs, while machine training techniques on an annotated corpus learn the language dependent aspects.

    Keywords DiSS

  • Kim Kirsner, Ben Roberts, and Yong-Heng Lee, “Why does spontaneous speech unfold in temporal cycles, sometimes?,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 11-14.

    Abstract Spontaneous speech typically consists of alternating periods of continuous fluency, where fluency refers to the ratio of speech to pausing. Individual differences in fluency are substantial, with mean pause per minute ranging from less than 20 to more than 40 sec per minute in our sample of English and Mandarin speakers. While pauses have been regarded as critical clues for psycholinguistic analysis for decades, the existence of temporal cycles has been subject to extensive debate. The results of our experiments provide strong support for the presence of temporal cycles in spontaneous speech, and demonstrate in particular that fluency declines and increases prior and subsequent to topic shifts respectively. The source of temporal cycles is unclear, however. The prevailing assumption is that they reflect alternating periods of high level macro-planning, associated with low fluency, and low level micro-execution, associated with high fluency. However, a variety of alternative explanations merit consideration.

    Keywords DiSS

  • Judit Kormos, “Monitoring and Self-Repair in L2,” Language Learning, vol. 49, no. 2, June 1999, pp. 303-342. DOI: 10.1111/0023-8333.00090.

    Abstract The aim of this article is to review the psycholinguistic research on second language (L2) self-repair to date with particular attention to the relevance of this field for L2 production and acquisition. The article points out that W. J. M. Levelt’s (1989, 1993, 1992) and W. J. M. Levelt et al.’s (in press) perceptual loop theory of monitoring can be adopted for monitoring in L2 speech as well. It is also argued, however, that this theory needs to be complemented with recent research on consciousness, attention, and noticing in order to account for mechanisms of error detection in L2.

  • Willem J. M. Levelt, Ardi Roelofs, and Antje S. Meyer, “A theory of lexical access in speech production,” Behavioral and Brain Sciences, vol. 22, no. 1, 1999, pp. 1–38. DOI: 10.1017/S0140525X99001776.

    Abstract Preparing words in speech production is normally a fast and accurate process. We generate them two or three per second in fluent conversation; and overtly naming a clear picture of an object can easily be initiated within 600 msec after picture onset. The underlying process, however, is exceedingly complex. The theory reviewed in this target article analyzes this process as staged and feed-forward. After a first stage of conceptual preparation, word generation proceeds through lexical selection, morphological and phonological encoding, phonetic encoding, and articulation itself. In addition, the speaker exerts some degree of output control, by monitoring of self-produced internal and overt speech. The core of the theory, ranging from lexical selection to the initiation of phonetic encoding, is captured in a computational model, called WEAVER++. Both the theory and the computational model have been developed in interaction with reaction time experiments, particularly in picture naming or related word production paradigms, with the aim of accounting for the real-time processing in normal word production. A comprehensive review of theory, model, and experiments is presented. The model can handle some of the main observations in the domain of speech errors (the major empirical domain for most other theories of lexical access), and the theory opens new ways of approaching the cerebral organization of speech production by way of high-temporal-resolution imaging.

  • Robin Lickley, David McKelvie, and Ellen Gurman Bard, “Comparing human and automatic speech recognition using word-gating,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 23-26.

    Abstract This paper describes a study in which we compare human and automatic recognition of words in fluent and disfluent spontaneous speech. In a word-level gating study with confidence judgements, we examine how the recognition and confidence of recognition of words by humans develops over utterances and show how disfluency disrupts the process. We give an automatic recogniser the same task and compare its performance with the humans’. With both systems, subsequent context supports word recognition: confidence in word recognition peaks after subsequent words have been heard. With both systems, disfluency adversely affects recognition of words in the immediate vicinity of the disfluent interruption (for repeats and repairs): disrupted subsequent context disrupts the recognition process.

    Keywords DiSS

  • Douglas O’Shaughnessy, “Better detection of hesitations in spontaneous speech,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 39-42.

    Abstract Practical speech recognizers must accept normal conversational voice input (including hesitations). However, most automatic speech recognition work has concentrated on read speech, whose acoustic aspects differ significantly from speech found in actual dialogues. Hesitations, of which the most frequent are filled pauses, are common in natural speech, yet few recognition systems handle such disfluencies with any degree of success. Filled pauses (e.g., "uhh," "umm"), unlike most silent pauses, resemble phones which form words in continuous speech. The work reported here further develops techniques to allow automatic identification of filled pauses. Such identification, if reliable, would reduce potential confusion in determining an estimated textual output for an utterance. The Switchboard database (of natural telephone conversations) provided data for the study. While most automatic recognition methods rely entirely on spectral envelope (e.g., low-order cepstral coefficients), identifying filled pauses requires using a combination of spectra, fundamental frequency and duration. High precision and a low false alarm rate for filled pauses are feasible without excessive computation.

    Keywords DiSS

  • Sherri Page, “Use of a postprocessor to identify and correct speaker disfluencies in automated speech recognition for medical transcription,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 27-30.

    Abstract Medical practitioners speak in a quasi-spontaneous monologue when they dictate a chart note, letter, or patient history. Prior research has largely ignored the issue of disfluency in dictation, arguing that speakers can control recording and start over if necessary. In 550,000 words of hand transcribed medical dictation, however, we find numerous filled pauses, repetitions, and other self-repairs. This paper describes: a pre-theoretical classification of disfluencies, developed to identify patterns useful in automatic text processing; the patterns of disfluency found in a corpus hand tagged with this classification, which include repetitions in combination with substitutions, insertions, and deletions; and, preliminary results of implementation of a disfluency pattern matcher and filter in a postprocessor developed for commercial use.

    Keywords DiSS

  • Sergey Pakhomov, and Guergana Savova, “Filled Pause Distribution and Modeling in Quasi-Spontaneous Speech,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 31-34.

    Abstract Filled pauses (FP’s) are characteristic of spontaneous speech and present considerable problems for speech recognition by being often recognized as short words. Recognition of quasi-spontaneous speech (medical dictation) is subject to this problem as well. An um can be recognized as thumb or arm if the recognizer’s language model does not adequately represent FP’s. Representing FP’s in the training corpus improves recognition. Several techniques of seeding a training corpus with FP’s were evaluated to show that a stochastic method, along with random insertion uniformly distributed around the average sentence length, yield better results compared to random insertion at other ranges. The optimal method of seeding a training corpus with FP’s may be linked to clause boundaries despite the fact that an imperfect method of inserting FP’s at clause boundaries used in this study failed.

    Keywords DiSS

  • Shu-Chuan Tseng, “Grammar, prosody and speech disfluencies in spoken dialogues,” Master's Thesis, Bielefeld University. 1999.