Filled Pause
Research Center

Investigating 'um' and 'uh' and other hesitation phenomena

Bibliography of hesitation phenomena resources

The following is a complete list of published resources in the FPRC bibliography. Note that this is not an exhaustive list of publications related to hesitation phenomena. If you know of resources that ought to be in this list, please send them to me via the FPRC contact form. Download the entire list in BibTeX format here.


2020

  • Burcu Arslan, and Tilbe Göksun, “Understanding Multimodal Communication: Gesture Production and Disfluency in Speech,” in Proceedings of Gesture and Speech in Interaction (GESPIN2020), Stockholm, Sweden, September 2020. https://trello.com/c/uc6U4thK.

    Abstract Do gestures facilitate lexical access particularly when speech production is not fluent? This study investigates gesture and disfluency rates and patterns when individuals describe concrete and abstract paintings and asks whether gestures facilitate speech by resolving disfluencies. Turkish-speaking participants (N=30) were asked to describe three concrete and three abstract paintings. We coded speech disfluencies (i.e., filled pauses, repairs, repetitions) and the frequency and type of gestures used. The results showed that although describing abstract paintings was relatively more difficult than describing concrete ones, disfluency rates and overall gesture frequency were similar between the two painting categories. However, representational gesture frequency was higher for the abstract category, emphasizing the relationship between representational gestures and the conceptualization process. Moreover, we found that most disfluencies occurred without gestures and most gestures occurred without disfluent speech. These findings suggest that although there can be cases in which gestures facilitate speech, this does not mean that gestures are fully compensatory in nature.

  • Alyssa Bulow, “Write before you Speak: The Impact of Writing on L2 Oral Narratives,” Master's Thesis, Michigan State University, East Lansing, MI, 2020. https://search.proquest.com/openview/a7419aa70a4496835659a3f90739b625/1?pq-origsite=gscholar&cbl=18750&diss=y.

    Abstract Current literature suggests that writing may better facilitate language learning than speaking practice alone, but direct empirical research demonstrating this is limited. Evidence is also limited as to whether grammar and vocabulary learned while writing can transfer to speaking. This study investigates the prediction that written planning, even more so than oral planning, leads to improved oral narratives. Thirty-four Spanish-speaking learners of English were randomly assigned to one of two groups: writing rehearsal or oral rehearsal; rehearsal being individual practice before the final task. The writing group composed a story ending in the written modality while the oral group rehearsed by narrating theirs out loud. Both groups recorded their oral story continuation task as the final product. In order to compare the impact of writing versus oral rehearsal on learners’ subsequent oral performance, final narratives were examined using complexity, accuracy, and fluency measures. Results showed that the writing group produced more fluent and lexically diverse narratives than the speaking group but there was no effect on accuracy, and limited effects on grammatical complexity. The study concludes with pedagogical implications for using writing tasks to prepare students for oral tasks.

    Keywords L2 writing, complexity, fluency, story continuation task (SCT), EFL, benefits of writing for speaking, pre-task planning, rehearsal

  • Aurélie Chlébowski, and Nicolas Ballier, “A Manually Annotated Resource for the Investigation of Nasal Grunts,” in Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, European Language Resources Association, May 2020, pp. 6514-6522 (in English). https://www.aclweb.org/anthology/2020.lrec-1.802.

    Abstract This paper presents an annotation framework for nasal grunts of the whole French CID corpus (Bertrand et al., 2008). The acoustic components under scrutiny are justified and the annotation guidelines are described. We carefully characterise the acoustic cues and visual cues followed by the annotator, especially for non-modal phonation types. The conventions followed for the annotation of interactional and positional properties of grunts are explained. The resulting datasets after data extraction with Praat scripts (Boersma and Weenink, 2019) are analysed with R (R Core Team, 2017), focusing on duration. We analyse the effect of non-modal phonation (especially ingressive phonation) on duration and discuss a specialisation of grunts observed in the CID for grunts with ingressive phonation. The more general aim of this research is to establish putative core and additive properties of grunts and a tentative typology of grunts in spoken interactions.

  • Jing Fang, “Pause in Sight Translation: A Pilot Study,” in Translation Education: A Tribute to the Establishment of World Interpreter and Translator Training Association (WITTA), Junfeng Zhao, Defeng Li, and Lu Tian, Eds. Singapore: Springer, 2020, pp. 173-192. DOI: 10.1007/978-981-15-7390-3_11.

    Abstract Pauses are common in the practice of sight translation, especially among student interpreters. However, research on this topic has been limited so far. Based on a pilot project, this study aims to explore pauses in English-Chinese sight translation. Two groups of student interpreters, at different stages of training, were recruited to sight translate two texts with different syntactic complexity. The data collection also involved a pre-task vocabulary test and a post-task interview. All the silent pauses were identified and labelled based on their duration and grammatical position in the text. The results showed that the syntactic complexity of the source text affected pauses of short and medium length, but its effect on long pauses of over 2 s was limited. Training was also found to reduce short pauses in the simple text and medium pauses in the complex text. In addition, students with longer training had significantly fewer long pauses (over 1 s) at ungrammatical positions than the junior student interpreters. The research also found that, although interpreters had encountered difficult words in the source texts, these words were not a major contributor to pauses. Apart from pausing, interpreters also responded in other ways when facing lexical challenges.

    Keywords Pause; Sight translation; Syntactic complexity; Training effect

  • Lorenzo García-Amaya, and Sean Lang, “Filled Pauses are Susceptible to Cross-Language Phonetic Influence: Evidence from Afrikaans-Spanish Bilinguals,” Studies in Second Language Acquisition, 2020, pp. 1–29. DOI: 10.1017/S0272263120000169.

    Abstract This article investigates the effects of long-term bilingualism on the production of filled pauses (FPs; e.g., uh, um, eh, em) in the speech of Afrikaans-Spanish bilinguals from Patagonia, Argentina. The instrumental analysis draws from a corpus of sociolinguistic interviews obtained from three speaker groups: L1-Afrikaans/L2-Spanish bilinguals; L1-Spanish-comparison speakers, also from Patagonia; and L1-Afrikaans-comparison speakers from South Africa. In the data analysis, we examined relative FP usage (categorical outcomes), as well as phonetic measures of vowel quality and segmental duration (continuous outcomes). The results allude to multiple patterns of cross-language influence (e.g., L1-to-L2 influence, L2-to-L1 influence, bidirectional influence), which depend on the phonetic measure explored. Overall, the findings suggest that the patterns of cross-language phonetic influence observed in the L2 learning of traditionally understood lexical items likewise hold in the L2 learning of hesitation markers such as FPs.

  • Kajsa Gullberg, “Planning Processes in Speaking, Texting, and Writing: The effect of reader’s and listener’s temporal and spatial presence on planning in language production,” Master's Thesis, Lund University. September 2020. http://lup.lub.lu.se/student-papers/record/9030233.

    Abstract This thesis investigates planning processes in language production, more specifically in texting as compared to speaking and writing, through pause analyses (Goldman-Eisler, 1969; Matsuhashi, 1981). Texting is used to examine the role of the spatial and temporal presence of a listener/reader in language production. Texting offers an interesting context in this respect since it has spatial absence between texter and reader, just like writing, but temporal presence, just like speaking. The main research questions are as follows: In which contexts are pauses located in texting, speaking, and writing? How long are the production bursts in speaking, texting, and writing? This study uses two processing models, the blueprint of the speaker (Levelt 1999) and the individual-environmental model of written language production (Hayes 1996), to identify the processes in texting. Part of the thesis comprises method development to capture and analyse real-time language production in texting on smartphones. The method consists of an experimental set-up where the same participant talks and texts dialogically, and then writes monologically. In the texting and writing conditions, the pause threshold is 1 minute, and in the speaking condition all perceived pauses are identified. In the analyses, the pauses are categorised based on the context that precedes the pause (e.g. syntactic unit or revision). The results show that the temporal and spatial presence of the reader/listener has an effect on language production. Clause boundaries are important contexts for pausing and planning in all three conditions, indicating that language users make use of syntactic units when they produce language regardless of the spatial and temporal presence of the speaker/listener. In texting and writing, pauses following a revision are important, showing that texters and writers review what they have written.
Further, the results show that texting has shorter planning units than both speaking and writing, which can be explained by the temporal presence of the reader resulting in a faster pace of communication, while the writing tool limits the speed at which language can be produced. In speaking and texting, pauses in phrase-final position are more common than in writing, which can be a result of the shorter planning units. In conclusion, texters adapt their language production to the temporal presence of a reader, through shorter planning units, while also adapting to the spatial absence of the reader through reviewing and editing their messages. The findings of this thesis are finally used to propose a model for the language processes in texting.

    Keywords language processing; spoken language production; written language production; computer mediated communication; CMC; planning processes; texting

  • Widya Nindi Hardianti, and Rohmani Nur Indah, “Disfluencies in Stand-Up Comedy: A Psycholinguistic Analysis on Drew Lynch's Stuttering,” LEKSEMA: Jurnal Bahasa dan Sastra, vol. 5, no. 1, 2020, pp. 27-38. DOI: 10.22515/ljbs.v5i1.2075. http://ejournal.iainsurakarta.ac.id/index.php/leksema/article/view/2075.

    Abstract Difficulties in producing speech sounds in people who stutter are indicated by repetitions, pauses, prolongations, revisions, and filled pauses in their speech. However, such difficulties do not hinder communication, as shown in the speech of the stand-up comedian Drew Lynch. This study aims to explore the types of fluency disorder identified in Lynch’s utterances on stage. It uses a descriptive qualitative method, carried out by observing, transcribing, describing, and analyzing his utterances in America’s Got Talent videos. The results show that Lynch produces all kinds of disfluency: filled pauses, phrase repetition, revision, multisyllabic whole-word repetition, monosyllabic whole-word repetition, repetition of individual sounds or syllables, prolongation of sounds, and blocks. Monosyllabic whole-word repetition is the most dominant. Revision co-occurs with monosyllabic whole-word repetition, prolongation, or multisyllabic whole-word repetition. These findings confirm that in the context of stand-up comedy, stuttering disfluencies do not hamper the transfer of meaning.

    Keywords disfluency, fluency disorder, stand-up comedy, stuttering

  • Zara Harmon, and Vsevolod Kapatsinski, “The best-laid plans of mice and men: Competition between top-down and preceding-item cues in plan execution,” in CogSci 2020 Proceedings (Proceedings of the Cognitive Science Society), 2020, pp. 1674-1680. https://cognitivesciencesociety.org/cogsci20/papers/0366/index.html.

    Abstract There is evidence that the process of executing a planned utterance involves the use of both preceding-context and top-down cues. Utterance-initial words are cued only by the top-down plan. In contrast, non-initial words are cued both by top-down cues and preceding-context cues. Co-existence of both cue types raises the question of how they interact during learning. We argue that this interaction is competitive: items that tend to be preceded by predictive preceding-context cues are harder to activate from the plan without this predictive context. A novel computational model of this competition is developed. The model is tested on a corpus of repetition disfluencies and shown to account for the influences on patterns of restarts during production. In particular, this model predicts a novel Initiation Effect: following an interruption, speakers re-initiate production from words that tend to occur in utterance-initial position, even when they are not initial in the interrupted utterance.

  • Nur Kafifah, and Nurul Aini, “A Comparative Analysis of Spoken Error of Students’ Utterances,” Pedagogy: Journal of English Language Teaching, vol. 8, no. 1, 2020, pp. 64-72. DOI: 10.32332/pedagogy.v8i1.1926. http://e-journal.metrouniv.ac.id/index.php/pedagogy/article/view/1926.

    Abstract This study presents a comparative analysis of spoken production errors made by 2nd- and 4th-semester students of the English Education Study Program at STKIP Kumala Metro. The objectives of this research are to compare the types of errors, the frequency of errors, the dominant type of error, the similarities and differences between errors, and the sources of errors. This is qualitative research. The data are utterances containing errors taken from the 2nd- and 4th-semester students. In collecting data, the researcher listened to the audio recordings carefully, transcribed them, then identified the data and selected the utterances relevant to the types of errors. The researcher used the theories of Clark and Clark and of Dulay, Burt, and Krashen to analyze the errors. The results indicated that the 2nd-semester students made three types of errors, namely speech errors (78.22%), morphological errors (15.6%), and syntactical errors (6.06%), whereas the 4th-semester students made speech errors (83.86%), morphological errors (13.1%), and syntactical errors (2.93%). The errors made by the two groups show both similarities and differences. The error types found in both groups were: silent pause, filled pause, repeats, false start (unretraced), false start (retraced), correction, interjection, stutters, slip of the tongue, error in pronunciation, error in vocabulary, error in word selection, omission of the bound morpheme -s, omission of to be, addition of to be, omission of the verb, omission of -ing, addition of -ing, and misuse of to be. The groups differed in the addition of prepositions, malformation, and disordering. The dominant error made by the students is the filled pause. These speech errors are mostly caused by three sources: cognitive difficulty, situational anxiety, and social reasons.

  • Oriana Kilbourn-Ceron, Meghan Clayards, and Michael Wagner, “Predictability modulates pronunciation variants through speech planning effects: A case study on coronal stop realizations,” Laboratory Phonology: Journal of the Association for Laboratory Phonology, vol. 11, no. 1, 2020, pp. 5. DOI: 10.5334/labphon.168.

    Abstract Predictability has been shown to be associated with many dimensions of variation in speech, including durational variation and variable omission of segments. However, the mechanism or mechanisms that underlie these effects are still unclear. This paper presents data on a new aspect of predictability in speech, namely how it affects allophonic variation. We examine two coronal stop allophones in English, flap and glottal stop, and find that their relationship with predictability is quite different from what is expected under current theories of probabilistic reduction in speech. Flapping is more likely when the word that follows is more predictable, but is not influenced by the frequency of the word itself, while glottal stops are more likely in words that are less predictable. We propose that the crucial distinction between these two allophones is how they are conditioned by phonological context. This, we argue, interacts with online speech planning processes and gives rise to variability for context-dependent allophones. This hypothesis offers a specific, testable mechanism for certain predictability effects, and has the potential to extend to other factors that contribute to variability in speech.

    Keywords Phonological variation, predictability, speech production planning, corpus phonology

  • Katarzyna Klessa, and Maciej Karpiński, “Hesitation markers in a corpus of Polish-German, German-German and Polish-Polish task-oriented dialogues in the context of communicative alignment,” in Proceedings of the 19th Meeting of the Texas Linguistics Society, vol. 19, Austin, Texas, USA, February 2020, pp. 17-26. http://tls.ling.utexas.edu/2020tls/TLS19_Conference_Proceedings.pdf.

    Abstract In this study, we investigate the distribution and properties of hesitation markers produced in task-oriented dialogues by Polish and German teenagers. The material comes from a multimodal corpus which has been collected in the Polish-German border area, in the cities of Słubice and Frankfurt (Oder). The speakers took part in two kinds of dialogue tasks: a collaborative and a competitive one. We report that the number and durational variability of hesitation markers produced by the speakers are influenced by dialogue task type and language configuration. We inspect aspects of interlocutor alignment using automatized annotation mining. A number of patterns of alignment can be visually traced for the study material. However, only a few of them can be confirmed by tests as statistically significant.

  • Christian Koch, and Britta Thörle, “Metadiscursive Activities in Oral Discourse Production in L2 French: A Study on Learner Profiles,” Corpus Pragmatics, 2020. DOI: 10.1007/s41701-020-00089-7.

    Abstract This study explores the use of discourse markers (DMs) in metadiscursive activities such as word searches, repairs or metalinguistic evaluations that occur during spontaneous oral production. The analysis is based on a corpus of telephone conversations between advanced learners and native speakers of French and draws on functional as well as on interactional work on DMs. In a first step, three selected learner profiles provide insight, by means of sequence analysis, into how individual learners make use of their particular DM inventory for their utterance planning, carrying out repairs and expressing attitudes toward their oral production. In a second step, the study compares native and non-native speakers’ DM inventories in order to detect general tendencies in the learners’ DM use that differ from the native speakers’ use of DMs. The comparison of the profiles shows that, even if there is relatively little agreement among the learners regarding the concrete lexical forms of the DMs, similarities can be discerned regarding the interlinguistic characteristics (e.g. individual preferences and overuse in the form of “lexical teddy bears” such as oui, alors or voilà, underuse of typical French reformulation markers like enfin, and weak routine in the lexicalisation of metadiscursive comments).

  • Justin J. H. Lo, “Between Äh(m) and Euh(m): The Distribution and Realization of Filled Pauses in the Speech of German-French Simultaneous Bilinguals,” Language and Speech, vol. 63, no. 4, December 2020, pp. 746-768. DOI: 10.1177/0023830919890068. https://journals.sagepub.com/doi/10.1177/0023830919890068.

    Abstract Filled pauses are well known for their speaker specificity, yet cross-linguistic research has also shown language-specific trends in their distribution and phonetic quality. To examine the extent to which speakers acquire filled pauses as language- or speaker-specific phenomena, this study investigates the use of filled pauses in the context of adult simultaneous bilinguals. Making use of both distributional and acoustic data, this study analyzed UH, consisting of only a vowel component, and UM, with a vowel followed by [m], in the speech of 15 female speakers who were simultaneously bilingual in French and German. Speakers were found to use UM more frequently in German than in French, but only German-dominant speakers had a preference for UM in German. Formant and durational analyses showed that while speakers maintained distinct vowel qualities in their filled pauses in different languages, filled pauses in their weaker language exhibited a shift towards those in their dominant language. These results suggest that, despite high levels of variability between speakers, there is a significant role for language in the acquisition of filled pauses in simultaneous bilingual speakers, which is further shaped by the linguistic environment they grow up in.

  • Minxia Luo, Mona Neysari, Gerold Schneider, Mike Martin, and Burcu Demiray, “Linear and Nonlinear Age Trajectories of Language Use: A Laboratory Observation Study of Couples’ Conflict Conversations,” The Journals of Gerontology: Series B, vol. 75, no. 9, March 2020, pp. e206-e214. DOI: 10.1093/geronb/gbaa041.

    Abstract This study investigated linear and nonlinear age effects on language use with speech samples that were representative of naturally occurring conversations. Using a corpus-based approach, we examined couples’ conflict conversations in the laboratory. The conversations, from a total of 364 community-dwelling German-speaking heterosexual couples (aged 19–82), were videotaped and transcribed. We examined usage of lower-frequency words, grammatical complexity, and utterance of filled pauses (e.g., äh [“um”]). Multilevel models showed that age effects on the usage of lower-frequency words were nonsignificant. Grammatical complexity increased until middle age (i.e., 54) and then declined. The utterance of filled pauses increased until old age (i.e., 70) and then decreased. Results are discussed in relation to cognitive aging research.

    Keywords Adult life span; Cognitive aging; Filled pauses; Frequency of nouns; Grammatical complexity

  • Nathan D. Maxfield, “Inhibitory Control of Lexical Selection in Adults who Stutter,” Journal of Fluency Disorders, vol. 66, 2020, pp. 105780. DOI: 10.1016/j.jfludis.2020.105780. http://www.sciencedirect.com/science/article/pii/S0094730X20300358.

    Abstract Purpose: Based on previous evidence that lexical selection may operate differently in adults who stutter (AWS) versus typically-fluent adults (TFA), and that atypical attentional processing may be a contributing factor, the purpose of this study was to investigate inhibitory control of lexical selection in AWS. | Method: 12 AWS and 12 TFA completed two tasks. One was a picture naming task featuring High and Low Agreement object naming. Naming accuracy and reaction times (RT), and event-related potentials (ERPs) time-locked to picture onset, were recorded. Second was a flanker task featuring Congruent and Incongruent arrow arrays. Push-button accuracy and RTs, and ERPs time-locked to arrow array onset, were recorded. | Results: Low Agreement pictures were named less accurately and slower than High Agreement pictures in both Groups. The magnitude of the Agreement effect on naming RTs was larger in AWS versus TFA. Delta-plot analysis revealed that the Agreement effect was positively correlated with individual differences in inhibition in TFA but not in AWS. Moreover, Low Agreement pictures elicited negative-going ERP activity relative to High Agreement pictures in both Groups. However, the scalp topography of this effect was markedly reduced in AWS versus TFA. For the Flanker task, Congruency affected push-button accuracy and RTs, and N2 amplitudes, similarly between groups. | Conclusions: Results point to a selective deficit in inhibitory control of lexical selection in AWS. Potential pathways between diminished inhibitory control of lexical selection, speech motor control and stuttering are discussed.

    Keywords stuttering, lexical selection, executive, inhibition, ERP

  • Mohammed Ali Mohsen, and Mutahar Qassem, “Analyses of L2 Learners’ Text Writing Strategy: Process-Oriented Perspective,” Journal of Psycholinguistic Research, vol. 49, 2020, pp. 435-451. DOI: 10.1007/s10936-020-09693-9.

    Abstract Second language writing researchers have examined the affordances of Automated Writing Evaluation programs in providing immediate feedback that helps improve students’ writing outputs. However, little is known about tracking learners’ processes while writing essays and about whether more or fewer pauses made by learners could predict good or poor quality of students’ writing output. This article aims to address this issue through a case study of 8 postgraduate students’ pauses while writing two text genres: descriptive and argumentative essays. Their pauses were recorded using the keystroke logging program Inputlog 7.0 (Leijten and Van Waes in Writ Commun 30:358–392, 2013. https://doi.org/10.1177/0741088313491692), and their screen activities were captured with the Active Presenter program. Findings revealed that the students’ pauses were significantly higher at word boundaries than at sentence and/or paragraph boundaries. Moreover, at word boundaries, pauses before words were significantly higher than pauses after words for both text genres. Across genres, students’ pauses were significantly higher in the argumentative essay than in the descriptive essay. Multiple regression revealed a negative correlation between frequent pausing and poor quality of the students’ product in the descriptive essay, while no correlation was found in the argumentative essay.

  • Costanza Navarretta, “Speech Pauses and Dialogue Acts,” in 2020 IEEE International Conference on Human-Machine Systems (ICHMS), Rome, Italy, IEEE, 2020, pp. 1-6. DOI: 10.1109/ICHMS49158.2020.9209502. https://ieeexplore.ieee.org/document/9209502.

  • Luis Bernardo Quesada Nieto, “Fenómenos de vacilación, sus contextos léxicos y sintácticos en entrevistas formales de legisladores a ciudadanos en el Congreso de la Ciudad de México [Lexical and syntactic contexts of hesitation phenomena in formal deputy-citizen interviews conducted at the Mexico City Congress],” Cuadernos de Lingüística de El Colegio de México, vol. 7, no. e141, October 2020, pp. 1-50. DOI: 10.24201/clecm.v7i0.141. http://www.scielo.org.mx/scielo.php?pid=S2007-736X2020000100102&script=sci_abstract&tlng=en.

    Abstract This article, which is an outcome of ethnographic research, aims to offer insight into the lexical and syntactic contexts of some hesitation phenomena (short fillers, repetitions, long fillers, word lengthening, unfinished words and unfinished phrases), identified in a corpus sample that consists of structured interviews conducted by a group of deputies of the Mexico City Local Congress with citizens who applied for the ombudsperson’s position at the city’s Human Rights Office (Comisión de Derechos Humanos de la Ciudad de México). Drawing upon a lexical and syntactic description, some remarks on the hesitation phenomena’s linguistic and communicative values are presented. I propose an interpretation of the hesitation occurrence patterns that appear in the respondents’ answers. This interpretation is based on the discursive planning level, the interaction between hesitation markers and word classes, and the concept of repertoire as it has been used in the theory of translanguaging. Towards the end of the manuscript I argue that the studied phenomena and their distribution are directly related to open-class words and to the cognitive effort of producing grammatical, accurate and socially appropriate messages.

    Keywords hesitation markers, discursive planning, oral language, word classes, repertoire, translanguaging theory

  • Nikhil Saini, Jyotsana Khatri, Preethi Jyothi, and Pushpak Bhattacharyya, “Generating Fluent Translations from Disfluent Text Without Access to Fluent References: IIT Bombay@IWSLT2020,” in Proceedings of the 17th International Conference on Spoken Language Translation, Online, Association for Computational Linguistics, July 2020, pp. 178-186. DOI: 10.18653/v1/2020.iwslt-1.22. https://www.aclweb.org/anthology/2020.iwslt-1.22.

    Abstract Machine translation systems perform reasonably well when the input is well-formed speech or text. Conversational speech is spontaneous and inherently consists of many disfluencies. Producing fluent translations of disfluent source text would typically require parallel disfluent to fluent training data. However, fluent translations of spontaneous speech are an additional resource that is tedious to obtain. This work describes the submission of IIT Bombay to the Conversational Speech Translation challenge at IWSLT 2020. We specifically tackle the problem of disfluency removal in disfluent-to-fluent text-to-text translation assuming no access to fluent references during training. Common patterns of disfluency are extracted from disfluent references and a noise induction model is used to simulate them starting from a clean monolingual corpus. This synthetically constructed dataset is then considered as a proxy for labeled data during training. We also make use of additional fluent text in the target language to help generate fluent translations. This work uses no fluent references during training and beats a baseline model by a margin of 4.21 and 3.11 BLEU points where the baseline uses disfluent and fluent references, respectively. Index Terms: disfluency removal, machine translation, noise induction, leveraging monolingual data, denoising for disfluency removal.

  • Katerina Smirnova, Nikolay Korotaev, Yana Panikratova, Irina Lebedeva, Ekaterina Pechenkova, and Olga Fedorova, “Using the RUPEX Multichannel Corpus in a Pilot fMRI Study on Speech Disfluencies,” in Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, European Language Resources Association, May 2020, pp. 195-203 (in English). https://www.aclweb.org/anthology/2020.lrec-1.25.

    Abstract In modern linguistics and psycholinguistics, speech disfluencies in real fluent speech are a well-known phenomenon. However, it is still not clear which brain systems are involved in their comprehension by a listener. In this paper we present a pilot neuroimaging study of the possible neural correlates of the perception of speech disfluencies, using a combination of corpus and functional magnetic resonance imaging (fMRI) methods. A special technical procedure for selecting stimulus material from the Russian multichannel corpus RUPEX made it possible to create fragments that meet the temporal-resolution requirements of fMRI BOLD imaging. These fragments contain isolated speech disfluencies and their clusters. We also used a referential task during the participants' fMRI scanning. The study demonstrates that annotated multichannel corpora like RUPEX can be an important resource for experimental research in interdisciplinary fields, allowing different aspects of communication to be explored through the prism of brain activation.

  • Wikipedia contributors, “Filler (linguistics) -- Wikipedia, The Free Encyclopedia,” October 2020. https://en.wikipedia.org/w/index.php?title=Filler_(linguistics)&oldid=978016784.

    Abstract In linguistics, a filler, filled pause, hesitation marker or planner is a sound or word that is spoken in conversation by one participant to signal to others a pause to think without giving the impression of having finished speaking. (These are not to be confused with placeholder names, such as thingamajig, whatchamacallit, whosawhatsa and whats'isface, which refer to objects or people whose names are temporarily forgotten, irrelevant, or unknown.) Fillers fall into the category of formulaic language, and different languages have different characteristic filler sounds. The term filler also has a separate use in the syntactic description of wh-movement constructions.

  • Kadek Wirahyuni, and Putu Nitiasih, “Pause and Slip of the Tongue on the Participants of 2019 Putra Putri Undiksha in the Interview Session,” International Journal of Education and Pedagogy, vol. 2, no. 2, 2020, pp. 64-77. http://myjms.moe.gov.my/index.php/ijeap/article/view/9488.

    Abstract The Election of Putra Putri Undiksha is conducted every year. There are several stages in the selection of Putra Putri Undiksha, one of which is the interview stage. At this stage, participants are interviewed about insight, talent, personality, and beauty or good looks. During these interviews, the researchers found pauses and slips of the tongue produced by several participants. This research uses a descriptive qualitative design. Qualitative research is an approach in conducting research whose orientation lies in natural phenomena (Mahmud, 2011: 89). The data sources in this study took the form of notes on the slips of the tongue and pauses experienced by the participants. The subjects of the research were 50 participants of Putra Putri Undiksha, consisting of 22 men and 28 women. The data collection technique was indirect, in the form of documentary study techniques. The source consists of documents in the form of notes (Syamsuddin and Damaianti, 2015: 108). The types of pause obtained are silent pauses and filled pauses. Nine pauses occurred: 2 silent pauses and 7 filled pauses, namely ‘e’, ‘m’, and ‘ng’. In addition, there were also progressive repetitive pauses, namely ‘saya’, ‘apa’, ‘itu’, and ‘ya’. Furthermore, there were 13 slips of the tongue produced by Putra Putri Undiksha participants during the interview. The slips of the tongue found were tongue flirting, selection errors, and assembling errors. Selection errors are divided into three types: semantic errors, namely the utterances 'Pak' and 'selamat pagi'; malapropism errors, namely the utterance 'fikir'; and errors of mixed words or blends, namely the utterances sinu, benul, inu, and bileh. The assembling errors in this research are the transposition errors ‘menyadari sudah’, ‘semester tiga baru’, and ‘media sosial’. Furthermore, anticipation assembling errors are found in the utterances ‘halus’, ‘pretasi’, and ‘diporpaganda’. The causes of pauses and slips of the tongue in the Putra Putri Undiksha participants during this interview were nervousness, thinking, not knowing the answers, haste, spontaneity, loss of focus, and habit.

  • Yasunori Yamada, Kaoru Shinkawa, Akihiro Kosugi, Masatomo Kobayashi, Hironobu Takagi, Miyuki Nemoto, Kiyotaka Nemoto, and Tetsuaki Arai, “Predicting Future Accident Risks of Older Drivers by Speech Data from a Voice-Based Dialogue System: A Preliminary Result,” in Advances in the Human Side of Service Engineering. AHFE 2020. Advances in Intelligent Systems and Computing, vol. 1208, Springer, Cham, July 2020, pp. 131-137. DOI: 10.1007/978-3-030-51057-2_19.

    Abstract As the world’s elderly population increases, driving accidents involving older adults have become an increasingly serious social problem. Previous studies have suggested cognitive impairments as one of the risk factors for future accidents. However, it remains unclear whether and how such future accident risks related to cognitive impairments can be predicted by using health monitoring technologies. In this study, we collected speech data from simulated conversations between 38 healthy older adults and a voice-based dialogue system. We followed up with the participants 1.5 years later and found that 17 of them had experienced near-accidents within the past year. We then built a binary classification model using the originally obtained speech data and found through leave-one-out cross-validation that it could predict whether a person would have a near-accident experience with 78.9% accuracy. Our preliminary results suggest that speech data from voice-based interaction systems might help older drivers recognize future accident risks.

  • Michael Zock, and Chris Biemann, “Comparison of Different Lexical Resources With Respect to the Tip-of-the-Tongue Problem,” Journal of Cognitive Science, vol. 21, no. 2, 2020, pp. 193-252. http://cogsci.snu.ac.kr/jcs/index.php/issues/?uid=298&mod=document.

    Abstract Language production is largely a matter of words which, in the case of access problems, can be searched for in an external resource (lexicon, thesaurus). When accessing the resource, the user provides her momentarily available knowledge concerning the target and the resource-powered system responds with the best guess(es) it can make given this input. As tip-of-the-tongue studies have shown, people always have some knowledge concerning the target (meaning fragments, number of syllables, ...) even if its precise or complete form is eluding them. We will show here how to tap on this knowledge to build a resource likely to help authors (speakers/writers) to overcome the Tip-of-the-Tongue (ToT) problem. Yet, before doing so we need a better understanding of the various kinds of knowledge people have when looking for a word. To this end, we asked crowd workers to provide some cues to describe a given target and to specify then how each one of them relates to it, in the hope that this could help others to find the elusive word. Next, we checked how well a given search strategy worked when being applied to differently built lexical networks. The results showed quite dramatic differences, which is not really surprising. After all, different networks are built for different purposes; hence each one of them is more or less well suited for a given task. What was more surprising though is the fact that the relational information given by the users did not allow us to find the elusive word in WordNet more easily than without relying on this information.

    Keywords word access, tip of the tongue problem, indexing, knowledge states, metaknowledge, mental lexicon, navigation, lexical networks

2019

  • Thanaporn Anansiripinyo, and Chutamanee Onsuwan, “Acoustic-phonetic characteristics of Thai filled pauses in monologues,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 51-54. DOI: 10.21862/diss-09-014-anan-onsu. https://doi.org/10.21862/diss-09-014-anan-onsu.

    Abstract Filled pause (FP) is one type of disfluent phenomena that is commonly found in everyday speech. It has been widely studied in many languages, but little is known about this topic in Thai. This work explored three important acoustic-phonetic characteristics of Thai filled pauses in monologues. To elicit target monosyllabic tokens of FPs and those of regular word (RW) counterparts, 31 Thai adult females were asked to watch two short cooking videos and describe the contents. They were also asked to read out loud target word lists. Three acoustic measures: syllable duration, first (F1) and second formant (F2) frequencies were taken from 738 tokens. Across vowel contexts, only F2, not F1, in FPs, was significantly different from that in RWs. Differences in syllable duration between RWs versus FPs were near significant. The findings suggest that Thai speakers produced FPs in a presumably different way from RWs. In FPs, the syllable was relatively lengthened and the tongue position was moved towards the center of vowel space. Future directions include a detailed analysis of FPs in terms of amplitude, fundamental frequency, pause duration before/after fillers and other non-linguistic factors.

  • Maria Bakti, “Error type disfluencies in consecutively interpreted and spontaneous monolingual Hungarian speech,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 71-74. DOI: 10.21862/diss-09-019-bakti. https://doi.org/10.21862/diss-09-019-bakti.

    Abstract Interpreting can be considered as a form of spontaneous speech, the key differences being that language change is involved in interpreting and the fact that speech production is influenced by several constraints during interpreting. Research has shown that the interpreting task influences the disfluency patterns of target language texts. The aim of this paper is to investigate how the frequency and distribution of error type disfluencies change in the target language output of trainee interpreters as they progress in their training. Results indicate that there is no considerable change in the frequency and proportion of error type disfluencies in the target language texts recorded at the end of the second, third and fourth semesters of interpreter training. The proportion of error type disfluencies is higher in the consecutively interpreted texts than in the spontaneous monolingual speech of the students. This suggests that the complexity of the task, rather than progress in training, determines the disfluency pattern of consecutively interpreted target language texts.

  • Charlotte Bellinghausen, Thomas Fangmeier, Bernhard Schröder, Johanna Keller, Susanne Drechsel, Peter Birkholz, Ludger Tebartz van Elst, and Andreas Riedel, “On the role of disfluent speech for uncertainty in articulatory speech synthesis,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 39-42. DOI: 10.21862/diss-09-011-bell-etal. https://doi.org/10.21862/diss-09-011-bell-etal.

    Abstract In this paper we present a perception study on the role of disfluent speech in forms of prosodic cues of uncertainty in question-answering situations. In our scenario the answer to each question was modeled by varying three prosodic cues: pause, intonation, and hesitation. The utterances were generated by means of an articulatory speech synthesizer. Subjects were asked to rate each answer on a Likert scale with respect to uncertainty, naturalness and understandability. Results showed evidence for an additive principle of the prosodic cues, i.e. the more cues were activated the higher the perceived level of uncertainty. Overall, the effect of intonation and hesitation was more evident than the effect of pause.

  • Simon Betz, and Loulou Kosmala, “Fill the silence! Basics for modeling hesitation,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 11-14. DOI: 10.21862/diss-09-004-betz-kosm. https://doi.org/10.21862/diss-09-004-betz-kosm.

    Abstract In order to model hesitations for technical applications such as conversational speech synthesis, it is desirable to understand interactions between individual hesitation markers. In this study, we explore two markers that have been subject to many discussions: silences and fillers. While it is generally acknowledged that fillers occur in two distinct forms, um and uh, it is not agreed on whether these forms systematically influence the length of associated silences. This notion will be investigated on a small dataset of English spontaneous speech data, and the measure of distance between filler and silence will be introduced to the analyses. Results suggest that filler type influences associated silence duration systematically and that silences tend to gravitate towards fillers in utterances, exhibiting systematically lower duration when preceding them. These results provide valuable insights for improving existing hesitation models.

  • Simon Philip Botley, and Sharifah Zakiah Wan Hassan, “Investigating Dysfluency in Malaysian Spoken Discussions,” in Research Mosaics of Language Studies in Asia: Differences and Diversity, Salasiah Che Lah and Rita Abdul Rahman Ramakrishna, Eds., Penerbit Universiti Sains Malaysia, 2019. https://books.google.co.jp/books?hl=en&lr=lang_en&id=HDX6DwAAQBAJ&oi=fnd&pg=PT8&dq="hesitation phenomena"&ots=h2H97wyXzv&sig=M40wPldX9FGmh4JvgwTd0k4NrqU#v=onepage&q="hesitation phenomena"&f=false.

    Abstract (none)

  • Harry Collins, Willow Leonard-Clarke, and Hannah O’Mahoney, “‘Um, er’: how meaning varies between speech and its typed transcript,” Qualitative Research, vol. 19, no. 6, 2019, pp. 653-668. DOI: 10.1177/1468794118816615. https://journals.sagepub.com/doi/10.1177/1468794118816615.

    Abstract We report a small empirical study on the way the transcription used to represent speech affects its meaning. We show that ‘disfluencies’ in speech indicate far more uncertainty in the speaker when transmitted in text than when transmitted in recorded sound. This has important implications for how transcribed interviews should be edited when they are being used to convey meaning rather than the organization of phonemes. We propose the implications of different ways of representing speech in text could be a new subject for investigation. Presented here is one possible empirical approach to such studies.

    Keywords certainty in text and speech, disfluencies, editing of transcripts, interview transcription, meaning, qualitative research, transcribing fillers: um, er, uh

  • Iulia Grosman, Anne Catherine Simon, and Liesbeth Degand, “Empathetic hearers perceive repetitions as less disfluent, especially in non-broadcast situations,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 23-26. DOI: 10.21862/diss-09-007-gros-etal. https://doi.org/10.21862/diss-09-007-gros-etal.

    Abstract This experiment measures the impact of the communicative situation on perceived fluency in French speech. We consider three dimensions of fluency: grammatical, discursive and socio-interpersonal. We first hypothesise that grammatical fluency is less influenced by contextual constraints than the other two dimensions. Furthermore, taking into account the Interpersonal Reactivity Index of each participant, we hypothesise that hearers with higher interpersonal capacities will be more tolerant in their fluency evaluation, because of their ability to project into the speaker’s mind. The strength of the design rests on the proposal to test natural stimuli and integrate social and individual variables in a perception experiment.

  • Dorottya Gyarmathy, and Viktória Horváth, “Pausing strategies with regard to speech style,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 27-30. DOI: 10.21862/diss-09-008-gyar-horv. https://doi.org/10.21862/diss-09-008-gyar-horv.

    Abstract Speech is occasionally interrupted by silent and filled pauses of various lengths. Pauses have many different functions in spontaneous speech (e.g. breathing, marking syntactic boundaries as well as speech planning difficulties, time for self-repair). The aim of the study was, on the one hand, to analyze the interrelation between the temporal pattern and the syntactic position of silent pauses (SP). On the other hand, filled pauses (FP) were also analyzed according to their phonetic realization, as well as the combination of SPs and FPs. The effect of speech style on pausing strategies was also analyzed. A narrative recording and a conversational recording from 10 speakers (ages between 20 and 35 years, 5 male, 5 female) were selected from the Hungarian Spontaneous Speech Database for the study. The material was manually annotated, silent pauses were categorized, and then the duration of the pauses was extracted. Results showed that the position of silent and filled pauses affects their duration. The speech style did not influence the frequency of pauses. However, silent and filled pauses were longer in narratives than in conversations. Results suggest that pausing strategies are similar in general; however, the timing patterns of pauses may depend on various factors, e.g. speech style.

  • Mária Gósy, “Halt command in word retrieval,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 3-6. DOI: 10.21862/diss-09-002-gosy. https://doi.org/10.21862/diss-09-002-gosy.

    Abstract In this study, occurrences and temporal patterns of five types of disfluencies were analyzed that show a common feature on the surface. All of them have some kind of interruption of content words followed by some continuation. The purpose was to show whether the place of interruption of the word articulation and the durational patterns of the editing phases are characteristic of re-starts, false starts, slips of the tongue, pauses within words, and prolongations. More than 1,400 instances were processed. Both (i) the number of pronounced segments of abandoned words and the duration of the corresponding editing phases are characteristic of a specific disfluency type, and (ii) speakers select a strategy to overcome their speech planning difficulties most economically.

  • Julianna Jankovics, and Luca Garai, “Disfluencies in mildly intellectually disabled young adults’ spontaneous speech,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 79-82. DOI: 10.21862/diss-09-021-jank-gara. https://doi.org/10.21862/diss-09-021-jank-gara.

    Abstract The study analyzes various hesitations and repairs in the spontaneous speech of mildly intellectually disabled women. The main research questions of the study focus on the similarities and differences in the frequency of disfluencies and the duration of pauses between the spontaneous speech of mildly intellectually disabled and mentally healthy young adults. Our results show that hesitation phenomena were more frequent among intellectually disabled subjects in spontaneous speech, while repairs occurred more frequently among control subjects in guided spontaneous speech.

  • Annelies Jehoul, “Filled pauses from a multimodal perspective. On the interplay of speech and eye gaze,” PhD Dissertation, Katholieke Universiteit Leuven, September 2019 (in English). https://lirias.kuleuven.be/2814932?limo=0.

    Abstract This project offers a novel, integrative approach on filled pauses, the elements 'euh' and 'euhm' in Dutch. Insights on filled pauses from various research traditions are united to obtain a comprehensive overview of their form and function. Starting from a cognitive-interactional framework, our analysis relates formal variation in filled pauses to the functional variation. We show that formal differences in filled pauses, such as the difference between 'euh' and 'euhm', the difference in duration, the presence of surrounding silences and the speaker's eye gaze behavior, are associated with functional variation. In the study of the function of filled pauses, earlier studies can be distinguished in two approaches: the filler-as-symptom approach and the filler-as-signal approach (Clark & Fox Tree 2002, De Leeuw 2007). The filler-as-symptom perspective interprets filled pauses as symptoms of cognitive difficulties, for example when the speaker is uncertain or has trouble producing an utterance (e.g. Siegman & Pope 1965, Goldman-Eisler 1968, Christenfeld 1994). In the filler-as-signal perspective, a signaling function is attributed to filled pauses. Filled pauses are, amongst other things, claimed to signal the speaker's intention to continue the turn (Maclay & Osgood 1959), mark a delay in speech (Clark & Fox Tree 2002), structure the discourse (Rendle-Short 2004) and exit a sequence (Schegloff 2010). In this project, however, we show that filled pauses cannot be distinguished into cognitive and discursive filled pauses, but rather, that in most of their functions, these two dimensions are connected. There is an association of the complexity of the cognitive processing, and the scope of the discursive force. Both complex cognitive processing and a broad scope are reflected in the form of the filled pause: a longer duration of the filled pause, more pauses, the use of 'euhm' (instead of 'euh'), and the speaker's gaze aversion.

  • Borbála Keszler, and Judit Bóna, “Pausing and disfluencies in elderly speech: Longitudinal case studies,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 67-70. DOI: 10.21862/diss-09-018-kesz-bona. https://doi.org/10.21862/diss-09-018-kesz-bona.

    Abstract The aim of this paper was to investigate the changes in fluency of speech during ageing. The novelty of the examination is that this is a longitudinal study: it analyses the speech of 7 speakers from middle or young-old age to old-old age. Pausing strategies and frequency of disfluencies were analyzed. Results show that active aging helps to preserve certain parameters of speech characteristics of young speakers.

  • Valéria Krepsz, “Vowel lengthening — Effect of position, age, and phonological quantity,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 59-62. DOI: 10.21862/diss-09-016-krepsz. https://doi.org/10.21862/diss-09-016-krepsz.

    Abstract The present research examined the effect of phrase-final lengthening on the spectral structure of vowels in the spontaneous speech of children and adults. Three Hungarian vowel pairs (quantity pairs) were analyzed in two positions: in the middle of the phrase and at the end of the phrase. The effect of lengthening on the spectral structure of the vowels could already be detected in four-year-olds. However, its extent was strongly correlated with the articulation aspects of the vowels. There was a discrepancy in the tendencies of the lengthening’s effect between the two groups of children and the adults, presumably due to different linguistic experience, inaccuracy of articulation, and significant individual differences.

  • Mária Laczkó, “Temporal characteristics of teenagers’ spontaneous speech and topic based narratives produced during school lessons,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 63-66. DOI: 10.21862/diss-09-017-laczko. https://doi.org/10.21862/diss-09-017-laczko.

    Abstract The aim of this presentation is to analyse the articulation and speech rates of teenagers and the types of pauses in their spontaneous speech and topic based narratives during school lessons. The speech samples were analysed in terms of temporal characteristics using the Praat program. The results showed different tempo values and various functions of filled pauses in the examined situations.

  • Mark Liberman, “Dysfluency considered Harmful,” May 2019. https://languagelog.ldc.upenn.edu/nll/?p=42775.

    Abstract … as a technical term, that is. Disfluency is no better, although the prefix is less judgmental. There are two problems: 1. These terms pathologize normal behavior, creating confusion between pathological symptoms and common phenomena in normal speech, which may be different not only in their causes and their frequency but also in behavioral detail; 2. Applied to normal speech, these terms often treat intrinsic aspects of the content and performance of spoken messages as if they were disruptions or failures.

  • Kikuo Maekawa, “Five pieces of evidence suggesting large lookahead in spontaneous monologue,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 7-10. DOI: 10.21862/diss-09-003-maekawa. https://doi.org/10.21862/diss-09-003-maekawa.

    Abstract There is considerable disagreement among the researchers of speech production with respect to the range of lookahead or pre-planning. In this paper, five pieces of evidence suggesting the presence of relatively large lookahead in spontaneous monologues are presented, based on the analyses of the Corpus of Spontaneous Japanese. This evidence consistently suggests that the range of a lookahead is six to seven accentual phrases long, which corresponds on average to 3–4 seconds in the time domain.

  • Helena Moniz, “Processing disfluencies in distinct speaking styles: Idiosyncrasies and transversality,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 1-2. DOI: 10.21862/diss-09-001-moniz. https://doi.org/10.21862/diss-09-001-moniz.

    Abstract This talk will tackle the idiosyncratic properties of disfluencies in distinct speaking styles, mostly university lectures (Trancoso et al., 2008) and map-task dialogues (Trancoso et al., 1998), but also featuring verbal fluency tests, and (more recently) second language learning presentations in ecological settings. It will also discuss the transversal acoustic-prosodic properties pertained across speaking styles. The main research questions are twofold: i) are there domain effects in the production of disfluencies when speakers adjust to distinct communicative contexts, as in university lectures and dialogues?; ii) if domain effects do exist, are there still acoustic-prosodic properties that can be shared across domains?

  • Elizabeth Morin-Lessard, and Krista Byers-Heinlein, “Uh and euh signal novelty for monolinguals and bilinguals: evidence from children and adults,” Journal of Child Language, vol. 46, no. 3, 2019, pp. 522–545. DOI: 10.1017/S0305000918000612.

    Abstract Previous research suggests that English monolingual children and adults can use speech disfluencies (e.g., uh) to predict that a speaker will name a novel object. To understand the origins of this ability, we tested 48 32-month-old children (monolingual English, monolingual French, bilingual English–French; Study 1) and 16 adults (bilingual English–French; Study 2). Our design leveraged the distinct realizations of English (uh) versus French (euh) disfluencies. In a preferential-looking paradigm, participants saw familiar–novel object pairs (e.g., doll–rel), labeled in either Fluent (“Look at the doll/rel!”), Disfluent Language-consistent (“Look at thee uh doll/rel!”), or Disfluent Language-inconsistent (“Look at thee euh doll/rel!”) sentences. All participants looked more at the novel object when hearing disfluencies, irrespective of their phonetic realization. These results suggest that listeners from different language backgrounds harness disfluencies to comprehend day-to-day speech, possibly by attending to their lengthening as a signal of speaker uncertainty. Stimuli and data are available at [https://osf.io/qn6px/].

  • Johanna Pap, “Effects of speech rate changes on pausing and disfluencies in cluttering,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 75-78. DOI: 10.21862/diss-09-020-pap. https://doi.org/10.21862/diss-09-020-pap.

    Abstract People with cluttering (PWC) often receive feedback, such as “Slow down!”, even so, this fluency disorder cannot be cured by only slowing down the speakers’ speech rate. When PWC accelerate their speech rate, language planning difficulties and word structure errors might occur, which might result in breakdowns in fluency and/or intelligibility. In the present paper characteristics of the frequency of disfluencies were examined in four different speech tasks from deliberately slow to maximum speech rate, whether speech rate changes have effects on cluttered speech. Twenty participants of this investigation were individuals suspected of cluttering with ages between 20 and 50 years of both genders. The results show that PWC are able to change, not only their speech rate but articulatory rate as well. Moreover, disfluencies were produced the most frequently in the speech task of maximum speech rate, where PWC do not have enough time for speech planning. The research provides empirical, measured data for a better insight into the nature of cluttering. Understanding the correlation between speech rate and disfluencies in cluttered speech is fundamental to improve the diagnosis of cluttering.

  • Brent Pitchford, and Karen M. Arnell, “Speech of young offenders as a function of their psychopathic tendencies,” Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, vol. 73, no. 3, 2019, pp. 193-201. DOI: 10.1037/cep0000176.

    Abstract The purpose of this study was to analyse young psychopathic offenders’ speech compared with controls and to determine whether it was dissimilar. An examination of two subsets of disfluencies in speech was conducted (i.e., filled pauses and discourse markers) to explore their disfluent language. Transcripts of Psychopathy Checklist–Revised Youth Version (PCL:YV) interviews from a sample of young offenders were analysed using Wmatrix software (Rayson, 2003, 2008). The young offenders were divided into a high psychopathy group (HP; n = 13) and a low psychopathy group (LP; n = 13). HP participants included more words relating to basic needs (i.e., money, sex) in their speech than their counterparts, but not fewer words relating to social needs (i.e., family, kin), which could reflect viewing the world in a more unemotional and instrumental way by HP individuals compared with LP participants. HP participants had fewer total disfluencies and filled pauses (i.e., uh, um) in their speech than LP participants. However, the usage of discourse markers (i.e., I mean, you know, like) was similar for HP and LP participants. Like adult psychopaths, the young offenders with higher psychopathic tendencies tended to use more basic needs words in their speech. Reduced filled pause use, which has been found to be related to individual’s self-consciousness, may reflect less self-monitoring in psychopaths when they are engaging in secondary tasks (i.e., tasks that will not offer rewards). These findings provide further support that individual differences can be reflected by characteristics in speech.

  • Kata Baditzné Pálvölgyi, “Hesitation patterns in the Spanish spontaneous speech of Hungarian learners of Spanish,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 35-38. DOI: 10.21862/diss-09-010-badi. https://doi.org/10.21862/diss-09-010-badi.

    Abstract This paper examines what native Spanish speakers find most disturbing in the pronunciation of Hungarian language learners of Spanish. Former research (Baditzné Pálvölgyi, 2019) showed that in spontaneous Spanish speech of at least threshold level Hungarian learners, one of the aspects that Spanish native speakers least tolerated was the way Hungarians hesitated. So the present paper focuses primarily on hesitation phenomena—lengthening and filled pauses—assuming that Hungarians hesitate more, and the lengthened segments are longer than the Spanish ones. In order to validate the hypothesis, an investigation comparing a corpus of Northern Spanish spontaneous speech to another corpus of advanced Hungarian learners of Spanish was conducted.

  • Ralph L. Rose, “The structural signaling effect of silent and filled pauses,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 19-22. DOI: 10.21862/diss-09-006-rose. https://doi.org/10.21862/diss-09-006-rose.

    Abstract Filled pauses (uh, um) have been shown in a number of studies to have a facilitative effect for listeners, such as helping them better perceive the syntactic structure of ongoing speech. This may be because the extra time afforded by the filled pause gives listeners more time to process the input. Theoretically, then, silent pauses should show a comparable effect. The present study tests this prediction using a grammaticality judgment task following a study by Bailey and Ferreira (2003). Results show that filled and silent pauses have a comparable influence on listeners’ grammaticality judgments but further suggest that listeners deem silent pauses as more important and influential markers.

  • Ralph L. Rose, “A comparison of filled pauses in scripted and non-scripted spontaneous speech,” in The 3rd International Symposium on Linguistic Patterns in Spontaneous Speech, Taipei, Taiwan, November 2019, pp. 21-25. http://hdl.handle.net/2065/00074187.

    Abstract Television and film productions are heavily scripted, but intend to portray speech as unscripted within the fiction of the dramatic universe they depict. Previous evidence (Quaglio, 2009) suggests, however, that various lexical features of speech occur in such scripted spontaneous speech differently than they do in actual spontaneous speech. The present study is a comparison of the occurrence of filled pause disfluencies (in English, uh and um) in scripted spontaneous speech and actual spontaneous speech, to see if the basic usage patterns are similar. Using the English-Corpora.org web site interface, filled pauses were examined in three corpora (spontaneous speech, TV transcripts, and movie transcripts) in terms of their basic frequency of occurrence, their um:uh ratios, and their structural distribution with respect to sentence boundaries. Each was also evaluated in terms of how they shifted over time. Results show that the disfluency patterns of scripted spontaneous speech are similar in many ways to those of actual spontaneous speech. The frequency of filled pauses is similar to that shown in other major corpora and the um:uh ratio also replicates a trend observed in other work (Wieling et al., 2016; Fruehwald, 2016) suggesting an ongoing shift toward the use of um over uh but with television and film speech patterns lagging behind those of society.

  • Vered Silber-Varod, Mária Gósy, and Robert Eklund, “Segment prolongation in Hebrew,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 47-50. DOI: 10.21862/diss-09-013-silb-etal. https://doi.org/10.21862/diss-09-013-silb-etal.

    Abstract In this paper we study segment prolongations (PRs), a type of disfluency sometimes included under the term “hesitation disfluencies”, in Hebrew. PRs have previously been studied in a number of other languages within a comprehensive speech disfluency framework, which is applied to Hebrew in the current study. For the purpose of this study we defined Hebrew clitics, such as conjunctions, articles, prepositions and so on, as words. The most striking difference between Hebrew and the previously studied languages is how restricted PRs seem to be in Hebrew, occurring almost exclusively on word-final vowels. The most frequently prolonged vowel is [e]. The segment type does not affect PRs’ duration. We found significant differences between men and women regarding the frequency of PRs.

  • Shungo Suzuki, and Judit Kormos, “The effects of read-aloud assistance on second language oral fluency in text summary speech,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 31-34. DOI: 10.21862/diss-09-009-suzu-korm. https://doi.org/10.21862/diss-09-009-suzu-korm.

    Abstract Focusing on text summary speaking tasks, the present study investigated the effects of the activation of phonological representations during text comprehension (operationalized by read-aloud assistance) on the subsequent retelling speech. A total of 24 Japanese learners of English completed text summary speaking tasks under two conditions: (a) reading without read-aloud assistance and (b) reading with read-aloud assistance. Their speech data were analyzed by lexical overlap indices (i.e. the ratio of characteristic single-words and multiword sequences) and by fluency measures capturing three major dimensions of fluency—speed, breakdown, and repair fluency. The results showed that read-aloud assistance directly facilitated lexical overlaps with source texts and indirectly improved speed and repair fluency. Furthermore, read-aloud assistance was found to affect the interrelationship between lexical overlaps and utterance fluency. The findings suggested that read-aloud assistance might help second language learners to store multiword sequences as a single unit (i.e. chunking) during text comprehension.

  • Linda Taschenberger, Outi Tuomainen, and Valerie Hazan, “Disfluencies in spontaneous speech in easy and adverse communicative situations: The effect of age,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 55-58. DOI: 10.21862/diss-09-015-tasc-etal. https://doi.org/10.21862/diss-09-015-tasc-etal.

    Abstract Disfluencies are a pervasive feature of speech communication. Their function in communication is still widely discussed with some proposing that their usage might aid understanding. Accordingly, talkers may produce more disfluencies when conversing in adverse communicative situations, e.g. in background noise. Moreover, increasing age may have an effect on disfluency use as older adults report particular difficulties when communicating in adverse conditions. In this study, we elicited spontaneous speech via a problem-solving task from four different age groups (19–76 years old) to investigate the effect of energetic and informational maskers on the use of filled pauses (FPs), and its interaction with age. Measures of disfluency rates, effort ratings, and communication efficiency were obtained. Results show that, against our predictions, FP usage may decrease in adverse conditions. Moreover, age does not play a great role in adults with normal hearing. The results indicate that individuals differ greatly in their disfluency adaptations, utilising different strategies to overcome challenging communicative situations.

  • Michiko Watanabe, Yusaku Korematsu, and Yuma Shirahata, “‘Uh’ is preferred by male speakers in informal presentations in American English,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 43-46. DOI: 10.21862/diss-09-012-wata-etal. https://doi.org/10.21862/diss-09-012-wata-etal.

    Abstract This study investigates factors that are likely to be related to speakers' choice of filler type between uh and um in English, using an informal presentation speech corpus. The effects of the following factors on the probability of each filler type were examined: (1) immediately preceding clause boundary depth, (2) clause size measured as the number of words in the clause, (3) the number of quotation remarks in the clause, and (4) speaker's sex. The filler probabilities increased with the boundary depths. This trend was much stronger with um than with uh. Ums are more likely to appear clause-initially than uhs. Clause size had similar effect sizes on the two filler types. The number of quotation remarks had a stronger negative effect with ums. Speaker's sex had a significant effect only with uhs. Uhs are used more frequently by male speakers than by female speakers. The results indicate that speakers' choice of filler type is affected by the combination of multiple factors with various effect sizes.

  • Hong Zhang, “Variation in the choice of filled pause: A language change, or a variation in meaning?,” in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019), Budapest, Hungary, September 2019, pp. 15-18. DOI: 10.21862/diss-09-005-zhang. https://doi.org/10.21862/diss-09-005-zhang.

    Abstract The role of filled pauses in message structuring is a heavily debated question, but the result is still somewhat inconclusive. In this study, I consider this question jointly with sociolinguistic factors that have been thought to affect the choice of filled pause in American English. The results suggest that the use of uh is subject to higher variability across not only age groups, but also conversation topics and interlocutors. A latent semantic analysis found consistent difference between two forms of filled pause and silent pauses of varying duration in the primary latent dimension, but similarity between short silent pause and uh, as well as long silent pause and um in the second dimension. Therefore, the functional difference between um and uh should be acknowledged, and the observed change in their relative popularity is potentially related to their different meaning or function in the discourse.

  • Derya Çokal, Vitor Zimmerer, Douglas Turkington, Nicol Ferrier, Rosemary Varley, Stuart Watson, and Wolfram Hinzen, “Disturbing the rhythm of thought: Speech pausing patterns in schizophrenia, with and without formal thought disorder,” PLOS ONE, vol. 14, no. 5, May 2019, pp. 1-14. DOI: 10.1371/journal.pone.0217404. https://doi.org/10.1371/journal.pone.0217404.

    Abstract Everyday speech is produced with an intricate timing pattern and rhythm. Speech units follow each other with short interleaving pauses, which can be either bridged by fillers (erm, ah) or empty. Through their syntactic positions, pauses connect to the thoughts expressed. We investigated whether disturbances of thought in schizophrenia are manifest in patterns at this level of linguistic organization, whether these are seen in first degree relatives (FDR) and how specific they are to formal thought disorder (FTD). Spontaneous speech from 15 participants without FTD (SZ-FTD), 15 with FTD (SZ+FTD), 15 FDRs and 15 neurotypical controls (NC) was obtained from a comic strip retelling task and rated for pauses subclassified by syntactic position and duration. SZ-FTD produced significantly more unfilled pauses than NC in utterance-initial positions and before embedded clauses. Unfilled pauses occurring within clausal units did not distinguish any groups. SZ-FTD also differed from SZ+FTD in producing significantly more pauses before embedded clauses. SZ+FTD differed from NC and FDR only in producing longer utterance-initial pauses. FDRs produced significantly fewer fillers than NC. Results reveal that the temporal organization of speech is an important window on disturbances of the thought process and how these relate to language.

2018

  • Ayşe Altıparmak, and Gülmira Kuruoğlu, “An Analysis of Speech Disfluencies of Turkish Speakers Based on Age Variable,” Journal of Psycholinguistic Research, Jan 2018. DOI: 10.1007/s10936-017-9553-4. https://doi.org/10.1007/s10936-017-9553-4.

    Abstract The focus of this research is to verify the influence of the age variable on fluent Turkish native speakers’ production of the various types of speech disfluencies. To accomplish this, four groups of native speakers of Turkish between ages 4–8, 18–23, 33–50 years respectively and those over 50-year-olds were constructed. A total of 84 participants took part in this study. Prepared and unprepared speech samples of at least 300 words were collected from each participant via face-to-face interviews that were tape recorded and transcribed; for practical reasons, only the unprepared speech samples were collected from children. As a result, for the prepared speech situation, there was no statistically significant difference in terms of age in the production rates of filled gaps, false starts, slips of the tongue and repetitions; however, participants in the over 50-year-old group produced more hesitations and prolongations than participants in the 18–23 and 33–50-year-old groups. For the unprepared speech situation, age variable was not effective on the production rates of filled gaps. However, 4–8 and over 50-year-old participants produced more hesitations and prolongations than the 18–23 and 33–50-year-old groups. 4–8-year-old children produced more slips of the tongue than the 18–23 and 33–50-year-old groups, and more false starts and repetitions than the participants in the other three age groups (18–23, 33–50, over 50). Further analyses revealed more extensive insights related to the types of disfluencies, the position of disfluencies, and the linguistic units involved in disfluency production in Turkish speech.

    Keywords linguistics, Speech disfluencies, Speech production, Turkish speech

  • Yu-Lin Cheng, “Unfamiliar Accented English Negatively Affects EFL Listening Comprehension: It Helps to be a More Able Accent Mimic,” Journal of Psycholinguistic Research, Feb 2018. DOI: 10.1007/s10936-018-9562-y. https://doi.org/10.1007/s10936-018-9562-y.

    Abstract In this study, EFL learners who listened to four short context-rich audio files each delivered in an unfamiliar English accent were required to produce best-attempt transcriptions and accent imitation recordings. Results indicate that exposure alone does not suffice to eliminate accent impact on EFL listeners. Importantly, results from one-way ANOVA analyses reveal between-participants differences in residual accent impact, vocabulary knowledge, and quality of accent imitation. Results from a linear mixed-effects model analysis, while suggesting that other unidentified factors may also assist EFL listeners in processing unfamiliar accented English, demonstrate that the more able mimics cope more successfully with unfamiliar accents than the less able mimics. Counter-intuitively, vocabulary knowledge is rejected as a predictor for success in reducing accent impact. A logical explanation for this particular finding is that a larger vocabulary repertoire aids listeners where there is no interference from unfamiliar accents. Given these findings, to better prepare EFL listeners for the English-as-an-International-Language world, training should include both listening to a variety of native and non-native accents and performing accent imitation (reproduction) exercises to further expand listeners’ phonological-phonetic flexibility.

    Keywords Accent imitation, Accent impact, Chinese-L1, EFL

  • Felix Ball, Lara E. Michels, Carsten Thiele, and Toemme Noesselt, “The role of multisensory interplay in enabling temporal expectations,” Cognition, vol. 170, no. Supplement C, 2018, pp. 130-146. DOI: 10.1016/j.cognition.2017.09.015. http://www.sciencedirect.com/science/article/pii/S0010027717302585.

    Abstract Temporal regularities can guide our attention to focus on a particular moment in time and to be especially vigilant just then. Previous research provided evidence for the influence of temporal expectation on perceptual processing in unisensory auditory, visual, and tactile contexts. However, in real life we are often exposed to a complex and continuous stream of multisensory events. Here we tested – in a series of experiments – whether temporal expectations can enhance perception in multisensory contexts and whether this enhancement differs from enhancements in unisensory contexts. Our discrimination paradigm contained near-threshold targets (subject-specific 75% discrimination accuracy) embedded in a sequence of distractors. The likelihood of target occurrence (early or late) was manipulated block-wise. Furthermore, we tested whether spatial and modality-specific target uncertainty (i.e. predictable vs. unpredictable target position or modality) would affect temporal expectation (TE) measured with perceptual sensitivity (d′) and response times (RT). In all our experiments, hidden temporal regularities improved performance for expected multisensory targets. Moreover, multisensory performance was unaffected by spatial and modality-specific uncertainty, whereas unisensory TE effects on d′ but not RT were modulated by spatial and modality-specific uncertainty. Additionally, the size of the temporal expectation effect, i.e. the increase in perceptual sensitivity and decrease of RT, scaled linearly with the likelihood of expected targets. Finally, temporal expectation effects were unaffected by varying target position within the stream. Together, our results strongly suggest that participants quickly adapt to novel temporal contexts, that they benefit from multisensory (relative to unisensory) stimulation and that multisensory benefits are maximal if the stimulus-driven uncertainty is highest. We propose that enhanced informational content (i.e. multisensory stimulation) enables the robust extraction of temporal regularities which in turn boost (uni-)sensory representations.

    Keywords Auditory dominance, Multisensory interplay, Redundant target, Spatial coincidence, Temporal expectation, Temporal orienting

  • Jia E. Loy, Hannah Rohde, and Martin Corley, “Cues to Lying May be Deceptive: Speaker and Listener Behaviour in an Interactive Game of Deception,” Journal of Cognition, vol. 1, no. 1, 2018, pp. 1-21. DOI: 10.5334/joc.46.

    Abstract Are the cues that speakers produce when lying the same cues that listeners attend to when attempting to detect deceit? We used a two-person interactive game to explore the production and perception of speech and nonverbal cues to lying. In each game turn, participants viewed pairs of images, with the location of some treasure indicated to the speaker but not to the listener. The speaker described the location of the treasure, with the objective of misleading the listener about its true location; the listener attempted to locate the treasure, based on their judgement of the speaker’s veracity. In line with previous comprehension research, listeners’ responses suggest that they attend primarily to behaviours associated with increased mental difficulty, perhaps because lying, under a cognitive hypothesis, is thought to cause an increased cognitive load. Moreover, a mouse-tracking analysis suggests that these judgements are made quickly, while the speakers’ utterances are still unfolding. However, there is a surprising mismatch between listeners and speakers: When producing false statements, speakers are less likely to produce the cues that listeners associate with lying. This production pattern is in keeping with an attempted control hypothesis, whereby liars may take into account listeners’ expectations and correspondingly manipulate their behaviour to avoid detection.

    Keywords Deception; Communication; Pragmatics; Disfluency

  • Emi Morita, and Tomoyo Takagi, “Marking ‘commitment to undertaking of the task at hand’: Initiating responses with eeto in Japanese conversation,” Journal of Pragmatics, vol. 124, January 2018, pp. 31-49. DOI: 10.1016/j.pragma.2017.12.002. http://www.sciencedirect.com/science/article/pii/S0378216617302515.

    Abstract Eeto is one of the most frequently occurring Japanese vocal markers. Often characterized as a mere time-buyer, or “filler”, this token has been frequently said to reflect ongoing internal cognitive processing or reflection. Examining naturally occurring instances of eeto by focusing on its occurrences at the beginning of responses to information-seeking questions, however, we found that eeto-prefaced responses all provide a carefully constructed answer in contexts where the responses might otherwise be heard as not aligning in the most straightforward way. We argue that eeto affords Japanese conversationalists a way through which they can project the maximally prosocial stance of interactional commitment to undertaking the task at hand. Rather than a marker of an internal processing state, eeto, we argue, is instead a useful linguistic resource to publicly display a respectful stance toward the questioner while the respondent is carefully building an appropriately contextualized response.

    Keywords Japanese, Fillers, Conversation analysis, Turn beginnings, stance

  • Matthew Purver, Julian Hough, and Christine Howes, “Computational Models of Miscommunication Phenomena,” Topics in Cognitive Science, March 2018. DOI: 10.1111/tops.12324. https://doi.org/10.1111/tops.12324.

    Abstract Miscommunication phenomena such as repair in dialogue are important indicators of the quality of communication. Automatic detection is therefore a key step toward tools that can characterize communication quality and thus help in applications from call center management to mental health monitoring. However, most existing computational linguistic approaches to these phenomena are unsuitable for general use in this way, and particularly for analyzing human–human dialogue: Although models of other-repair are common in human-computer dialogue systems, they tend to focus on specific phenomena (e.g., repair initiation by systems), missing the range of repair and repair initiation forms used by humans; and while self-repair models for speech recognition and understanding are advanced, they tend to focus on removal of “disfluent” material important for full understanding of the discourse contribution, and/or rely on domain-specific knowledge. We explain the requirements for more satisfactory models, including incrementality of processing and robustness to sparsity. We then describe models for self- and other-repair detection that meet these requirements (for the former, an adaptation of an existing repair model; for the latter, an adaptation of standard techniques) and investigate how they perform on datasets from a range of dialogue genres and domains, with promising results.

    Keywords Dialogue, disfluency, Incrementality, Miscommunication, Parallelism, repair, Sparsity

  • Jennifer M. Roche, and Hayley S. Arnold, “The Effects of Emotion Suppression During Language Planning and Production,” Journal of Speech, Language, and Hearing Research, vol. 61, no. 8, August 2018, pp. 2076-2083. DOI: 10.1044/2018_JSLHR-L-17-0232. https://pubs.asha.org/doi/10.1044/2018_JSLHR-L-17-0232.

    Abstract Purpose: Emotion regulation and language planning occur in parallel during interactive communication, but their processes are often studied separately. It has been suggested that emotion suppression and more complex language production both recruit cognitive resources. However, it is currently less clear how the language planning and production system is impacted when required to emotionally suppress outward displays of affect (i.e., expressive suppression). The purpose of the current study was to evaluate the interactive effects of emotion regulation and language production processes. | Method: Through discourse analysis of a corpus of interactive dialogue, we evaluated the production of interjections (i.e., also termed “filled pauses,” a type of speech disfluency) when participants regulated outward displays of emotion and when language was lexically complex (i.e., via lexical diversity). One participant (the sender) was assigned to either express or suppress affective displays during the interaction. The other person (the receiver) was given no special instructions before the interaction. The interactions were transcribed, and their linguistic content (i.e., lexical diversity, lexical alignment, and interjections) was analyzed. | Results: Results indicated that participants actively suppressing outward displays of affect produced more interjections and that participants asked to emotionally regulate, both expressors and suppressors, were more disfluent when producing lexically diverse statements (2 cognitively demanding tasks). | Conclusions: The current research provides support that, when suppressing emotion, one might be more disfluent when speaking. However, also when engaged in 2 simultaneous, demanding tasks of having to either upregulate or downregulate emotions and utter lexically diverse statements, the combined cognitive load may impede fluency in language production. More specifically, in the context of language planning and production, emotion suppression may pilfer resources away from the language planning and production system, leading to higher rates of disfluent speech. This finding is of particular importance because understanding the interactive effects of emotion and language production may be impactful to interventions for communication disorders.

  • Julie Sedivy, “Your Speech Is Packed With Misunderstood, Unconscious Messages,” March 2018. http://nautil.us/blog/-your-speech-is-packed-with-misunderstood-unconscious-messages.

    Abstract Imagine standing up to give a speech in front of a critical audience. As you do your best to wax eloquent, someone in the room uses a clicker to conspicuously count your every stumble, hesitation, um and uh; once you’ve finished, this person loudly announces how many of these blemishes have marred your presentation...

  • Sophia Uddin, Shannon L.M. Heald, Stephen C. Van Hedger, Serena Klos, and Howard C. Nusbaum, “Understanding environmental sounds in sentence context,” Cognition, vol. 172, 2018, pp. 134-143. DOI: 10.1016/j.cognition.2017.12.009. https://www.sciencedirect.com/science/article/pii/S0010027717303293.

    Abstract There is debate about how individuals use context to successfully predict and recognize words. One view argues that context supports neural predictions that make use of the speech motor system, whereas other views argue for a sensory or conceptual level of prediction. While environmental sounds can convey clear referential meaning, they are not linguistic signals, and are thus neither produced with the vocal tract nor typically encountered in sentence context. We compared the effect of spoken sentence context on recognition and comprehension of spoken words versus nonspeech, environmental sounds. In Experiment 1, sentence context decreased the amount of signal needed for recognition of spoken words and environmental sounds in similar fashion. In Experiment 2, listeners judged sentence meaning in both high and low contextually constraining sentence frames, when the final word was present or replaced with a matching environmental sound. Results showed that sentence constraint affected decision time similarly for speech and nonspeech, such that high constraint sentences (i.e., frame plus completion) were processed faster than low constraint sentences for speech and nonspeech. Linguistic context facilitates the recognition and understanding of nonspeech sounds in much the same way as for spoken words. This argues against a simple form of a speech-motor explanation of predictive coding in spoken language understanding, and suggests support for conceptual-level predictions.

    Keywords Constraint, Context, Environmental sound perception, Language, Recognition, speech perception

  • Sylvie Hancil, “Discourse coherence and intersubjectivity: The development of final but in dialogues,” Language Sciences, 2018. DOI: 10.1016/j.langsci.2017.12.002. http://www.sciencedirect.com/science/article/pii/S0388000117300852.

    Abstract All the studies on final particles in non-Asian languages systematically propose a synchronic view of the constructions under consideration. This paper closes the gap by offering a diachronic analysis of final but in dialogues in a corpus of Northern English over a sixty-year period. Relying on Schiffrin’s (1987) planes of discourse and Hasselgård’s (2006) definition of a modal particle, it is shown that final but has semantic–pragmatic properties of both a discourse marker and a modal particle. A socio-linguistic approach complements the analysis. Besides, the modal values identified are discussed in relation to Traugott’s (1982) and Traugott and Dasher’s (2002) theories of language change. Finally, it is explained how final but can be inserted in the category of final particles.

    Keywords Discourse value, Final particles, language change, Modal value, Northern English, Socio-linguistic parameters

2017

  • Jens Allwood, “Fluency or disfluency?,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 1-4. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

    Abstract In this paper, I investigate the concepts of “fluency” and “disfluency” and argue that the application of the two concepts must be relativized to type of communicative activity. It is not clear that there is a generic sense of fluency or disfluency; rather, what contributes to fluency and disfluency depends on what type of communication we are dealing with. The paper then turns to a brief investigation of what makes interactive face-to-face communication fluent or disfluent and argues that many of the features that have been labeled as disfluent, in fact, contribute to the fluency of interactive communication. Finally, I suggest that maybe it is time for a change of terminology, and to abandon the term “disfluent” in favor of more positive or neutral terminology.

    Keywords DiSS

  • Ana Rita S. Valente, Kenneth O. St. Louis, Margaret Leahy, Andreia Hall, and Luis M.T. Jesus, “A country-wide probability sample of public attitudes toward stuttering in Portugal,” Journal of Fluency Disorders, vol. 52, 2017, pp. 37-52. DOI: 10.1016/j.jfludis.2017.03.001. http://www.sciencedirect.com/science/article/pii/S0094730X16300249.

    Abstract Background. Negative public attitudes toward stuttering have been widely reported, although differences among countries and regions exist. Clear reasons for these differences remain obscure. | Purpose. Published research is unavailable on public attitudes toward stuttering in Portugal as well as a representative sample that explores stuttering attitudes in an entire country. This study sought to (a) determine the feasibility of a country-wide probability sampling scheme to measure public stuttering attitudes in Portugal using a standard instrument (the "Public Opinion Survey of Human Attributes–Stuttering" ["POSHA–S"]) and (b) identify demographic variables that predict Portuguese attitudes. | Methods. The POSHA–S was translated to European Portuguese through a five-step process. Thereafter, a local administrative office-based, three-stage, cluster, probability sampling scheme was carried out to obtain 311 adult respondents who filled out the questionnaire. | Results. The Portuguese population held stuttering attitudes that were generally within the average range of those observed from numerous previous POSHA–S samples. Demographic variables that predicted more versus less positive stuttering attitudes were respondents’ age, region of the country, years of school completed, working situation, and number of languages spoken. Non-predicting variables were respondents’ sex, marital status, and parental status. | Conclusion. A local administrative office-based, probability sampling scheme generated a respondent profile similar to census data and indicated that Portuguese attitudes are generally typical.

    Keywords Representative Sampling

  • Anne Ruth van Leeuwen, Right on time. Utrecht, the Netherlands: Netherlands Graduate School of Linguistics / Landelijke Onderzoekschool Taalwetenschap (LOT), 2017, pp. 155. https://www.lotpublications.nl/right-on-time.

    Abstract When a conversation is running smoothly, you know exactly when to nod, hum, or when to start your turn. You feel understood and connected, and you sense that your conversational partner feels the same. However, a conversation may also contain awkward silences, simultaneous starts, and an overall feeling of stuttering and stammering. During such conversations, you are often left with feelings of distance and mutual incomprehension. | Many people share the intuition that the expression of ‘being in sync’ with someone means that you are somehow in tune, in agreement, or in harmony with the other. This dissertation explores whether this intuition is correct; it investigates whether specific temporal patterns between turn-taking speakers, including synchronization of speech rhythms, shape the affective impression of speakers in conversation. The answer to this question can broaden our understanding of the affective push-and-pull of spoken interaction that we experience every day. | This question was explored by presenting participants with short fragments of dialogues between speakers in which we manipulated the temporal patterns between those speakers. Participants were then asked to rate the perceived degree of affiliation between the speakers of those fragments. In the last study of this dissertation we also recorded participants’ real-time affective response during listening to these fragments. We found that, in addition to the presence of overlapping talk, responding too early given the beat of the previous speaker conveys disaffiliation. ‘Being in sync’ is not just a figure of speech, but a real sign of affiliation in spoken dialogue.

  • Malte Belz, “Glottal filled pauses in German,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 5-8. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

    Abstract For German, filled pauses are traditionally described with a vocalic form äh and a vocalic-nasal form ähm. A corpus-based approach and a closer phonetic inspection is used here to argue for an additional form, namely glottal filled pauses. In the data analysed for this study, the glottal form is produced by all seven speakers and amounts to 21% of all filled pauses. Contexts and durations of occurrences are discussed and compared to earlier studies on traditional filled pauses. It is suggested that the glottal variant should be considered in future studies on filled pauses and disfluencies.

    Keywords DiSS

  • Axel Bergström, Martin Johansson, and Robert Eklund, “Differences in production of disfluencies in children with typical language development and children with mixed receptive-expressive language disorder,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 9-12. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

    Abstract There are several studies about non-fluency in people who stutter, but comparatively few regarding children with language impairment. Research to date on disfluencies in children with language impairment has used different study designs and definitions, making some results rather contradictory. The purpose of the present study is to expand the knowledge about disfluencies in children with language impairment and compare the occurrence of disfluencies between children with language impairment and children with typical language development in the same age group. A total of ten children with language impairment and six children with typical language development participated in this study. The subjects were recorded when talking freely about a thematic picture or toys and then analysed by calculating disfluencies per 50 words including frequency of different kinds of disfluencies according to Johnson and Associates’ (1959) classic taxonomy. Our results show that children with language impairment produce statistically significantly more disfluency in general, notably sound and syllable repetitions, broken words and prolongations.

    Keywords DiSS

  • Simon Betz, Robert Eklund, and Petra Wagner, “Prolongation in German,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 13-16. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

    Abstract We investigate segment prolongation as a means of disfluent hesitation in spontaneous German speech. We describe phonetic and structural features of disfluent prolongation and compare it to data of other languages and to non-disfluent prolongations.

    Keywords DiSS

  • Hans Rutger Bosker, “How our own speech rate influences our perception of others,” Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 8, August 2017, pp. 1225-1238. DOI: 10.1037/xlm0000381. http://psycnet.apa.org/record/2017-01854-001.

    Abstract In conversation, our own speech and that of others follow each other in rapid succession. Effects of the surrounding context on speech perception are well documented but, despite the ubiquity of the sound of our own voice, it is unknown whether our own speech also influences our perception of other talkers. This study investigated context effects induced by our own speech through 6 experiments, specifically targeting rate normalization (i.e., perceiving phonetic segments relative to surrounding speech rate). Experiment 1 revealed that hearing prerecorded fast or slow context sentences altered the perception of ambiguous vowels, replicating earlier work. Experiment 2 demonstrated that talking at a fast or slow rate prior to target presentation also altered target perception, though the effect of preceding speech rate was reduced. Experiment 3 showed that silent talking (i.e., inner speech) at fast or slow rates did not modulate the perception of others, suggesting that the effect of self-produced speech rate in Experiment 2 arose through monitoring of the external speech signal. Experiment 4 demonstrated that, when participants were played back their own (fast/slow) speech, no reduction of the effect of preceding speech rate was observed, suggesting that the additional task of speech production may be responsible for the reduced effect in Experiment 2. Finally, Experiments 5 and 6 replicate Experiments 2 and 3 with new participant samples. Taken together, these results suggest that variation in speech production may induce variation in speech perception, thus carrying implications for our understanding of spoken communication in dialogue settings.

  • Hans Rutger Bosker, Eva Reinisch, and Matthias J. Sjerps, “Cognitive load makes speech sound fast, but does not modulate acoustic context effects,” Journal of Memory and Language, vol. 94, 2017, pp. 166-176. DOI: 10.1016/j.jml.2016.12.002. http://www.sciencedirect.com/science/article/pii/S0749596X16302492.

    Abstract In natural situations, speech perception often takes place during the concurrent execution of other cognitive tasks, such as listening while viewing a visual scene. The execution of a dual task typically has detrimental effects on concurrent speech perception, but how exactly cognitive load disrupts speech encoding is still unclear. The detrimental effect on speech representations may consist of either a general reduction in the robustness of processing of the speech signal (‘noisy encoding’), or, alternatively it may specifically influence the temporal sampling of the sensory input, with listeners missing temporal pulses, thus underestimating segmental durations (‘shrinking of time’). The present study investigated whether and how spectral and temporal cues in a precursor sentence that has been processed under high vs. low cognitive load influence the perception of a subsequent target word. If cognitive load effects are implemented through ‘noisy encoding’, increasing cognitive load during the precursor should attenuate the encoding of both its temporal and spectral cues, and hence reduce the contextual effect that these cues can have on subsequent target sound perception. However, if cognitive load effects are expressed as ‘shrinking of time’, context effects should not be modulated by load, but a main effect would be expected on the perceived duration of the speech signal. Results from two experiments indicate that increasing cognitive load (manipulated through a secondary visual search task) did not modulate temporal (Experiment 1) or spectral context effects (Experiment 2). However, a consistent main effect of cognitive load was found: increasing cognitive load during the precursor induced a perceptual increase in its perceived speech rate, biasing the perception of a following target word towards longer durations. This finding suggests that cognitive load effects in speech perception are implemented via ‘shrinking of time’, in line with a temporal sampling framework. In addition, we argue that our results align with a model in which early (spectral and temporal) normalization is unaffected by attention but later adjustments may be attention-dependent.

    Keywords Acoustic context, cognitive load, Rate normalization, Spectral normalization

  • Shin Ying Chu, Naomi Sakai, Koichi Mori, and Lisa Iverach, “Japanese normative data for the Unhelpful Thoughts and Beliefs about Stuttering (UTBAS) Scales for adults who stutter,” Journal of Fluency Disorders, vol. 51, March 2017, pp. 1-7. DOI: 10.1016/j.jfludis.2016.09.006. http://www.sciencedirect.com/science/article/pii/S0094730X16300274.

    Abstract Purpose. This study reports Japanese normative data for the Unhelpful Thoughts and Beliefs about Stuttering (UTBAS) scales. We outline the translation process, and evaluate the psychometric properties of the Japanese version of the UTBAS scales. | Methods. The translation of the UTBAS scales into Japanese (UTBAS-J) was completed using the standard forward-backward translation process, and was administered to 130 Japanese adults who stutter. To validate the UTBAS-J scales, scores for the Japanese and Australian cohorts were compared. Spearman correlations were conducted between the UTBAS-J and the Modified Erickson Communication Attitude scale (S-24), the self-assessment scale of speech (SA scale), and age. The test-retest reliability and internal consistency of the UTBAS-J were assessed. Independent t-tests were conducted to evaluate the differences in the UTBAS-J scales according to gender, speech treatment experience, and stuttering self-help group participation experience. | Results. The UTBAS-J showed good test-retest reliability, high internal consistency, and moderate to high significant correlations with S-24 and SA scale. A weak correlation was found between the UTBAS-J scales with age. No significant relationships were found between UTBAS-J scores, gender and speech treatment experience. However, those who participated in the stuttering self-help group demonstrated lower UTBAS-J scores than those who did not. | Conclusion. Given the current scarcity of clinical assessment tools for adults who stutter in Japan, the UTBAS-J holds promise as an assessment tool and outcome measure for use in clinical and research environments.

    Keywords Assessment, Japanese, Psychosocial issues, Questionnaire, stuttering

  • Jennifer Cole, Timothy Mahrt, and Joseph Roy, “Crowd-sourcing prosodic annotation,” Computer Speech & Language, 2017. DOI: 10.1016/j.csl.2017.02.008. http://www.sciencedirect.com/science/article/pii/S0885230816302455.

    Abstract Much of what is known about prosody is based on native speaker intuitions of idealized speech, or on prosodic annotations from trained annotators whose auditory impressions are augmented by visual evidence from speech waveforms, spectrograms and pitch tracks. Expanding the prosodic data currently available to cover more languages, and to cover a broader range of unscripted speech styles, is prohibitive due to the time, money and human expertise needed for prosodic annotation. We describe an alternative approach to prosodic data collection, with coarse-grained annotations from a cohort of untrained annotators performing rapid prosody transcription (RPT) using LMEDS, an open-source software tool we developed to enable large-scale, crowd-sourced data collection with RPT. Results from three RPT experiments are reported. The reliability of RPT is analysed comparing kappa statistics for lab-based and crowd-sourced annotations for American English, comparing annotators from the same (US) versus different (Indian) dialect groups, and comparing each RPT annotator with a ToBI annotation. Results show better reliability for same-dialect annotators (US), and the best overall reliability from crowd-sourced US annotators, though lab-based annotations are the most similar to ToBI annotations. A generalized additive mixed model is used to test differences among annotator groups in the factors that predict prosodic annotation. Results show that a common set of acoustic and contextual factors predict prosodic labels for all annotator groups, with only small differences among the RPT groups, but with larger effects on prosodic marking for ToBI annotators. The findings suggest methods for optimizing the efficiency of RPT annotations. Overall, crowd-sourced prosodic annotation is shown to be efficient, and to rely on established cues to prosody, supporting its use for prosody research across languages, dialects, speaker populations, and speech genres.

    Keywords Speech transcription

  • Ludivine Crible, Liesbeth Degand, and Gaëtanelle Gilquin, “The clustering of discourse markers and filled pauses: A corpus-based French-English study of (dis)fluency,” Languages in Contrast, vol. 17, February 2017, pp. 69-95. DOI: 10.1075/lic.17.1.04cri. http://www.jbe-platform.com/content/journals/10.1075/lic.17.1.04cri.

    Abstract This article presents a corpus-based contrastive study of (dis)fluency in French and English, focusing on the clustering of discourse markers (DMs) and filled pauses (FPs) across various spoken registers. Starting from the hypothesis that markers of (dis)fluency, or ‘fluencemes’, occur more frequently in sequences than in isolation, and that their contribution to the relative fluency of discourse can only be assessed by taking into account the contextual distribution of these sequences, this study uncovers the specific contextual conditions that trigger the clustering of fluencemes in the two languages. First, the contexts of appearance of DMs and FPs are described separately, both in English and French, focusing on their distribution, position and co-occurrence patterns. Then, the combination of DMs and FPs in sequences and their different configurations (DM+FP, FP+DM, etc.) are investigated. Overall, it appears that FPs function differently depending on whether they are clustered with DMs or not, and this difference consists in either maintaining or erasing inter- and intra-linguistic contrasts.

    Keywords comparable corpus, Discourse markers, English/French, filled pauses, Fluency

  • Jillian Donahue, Christine Schoepfer, and Robin Lickley, “The effects of disfluent repetitions and speech rate on recall accuracy in a discourse listening task,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 17-20. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

    Abstract disfluency on word recognition and local syntactic or semantic issues, fewer have addressed the impact on comprehension at a discourse level. In this work, we ask what effects features typical in the pathological condition of cluttering (essentially, rapid, disfluent and unintelligible speech) have on our ability to retain the information conveyed in speech. Specifically, we manipulate repetition disfluencies and speech rate in passages of running speech. Forty participants listened to four recordings of passages presented in four conditions: Control, Rapid, Disfluent, Rapid + Disfluent. They were asked to recall details of the passages and rate their speed, fluency and comprehensibility. Both repetition disfluencies and increased speech rate significantly reduced recall of information from discourse. Though no relationship was found between the working memory span of individuals and information recall, we argue that the cognitive load of these features of cluttered speech significantly affects intelligibility and thus recall of speech.

    Keywords DiSS

  • Megan Drevets, and Robin Lickley, “A psycholinguistic exploration of disfluency behaviour during the tip-of-the-tongue phenomenon,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 21-24. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

    Abstract A tip-of-the-tongue state (TOT) occurs when a speaker knows a word but cannot retrieve its phonological form from memory. While previous studies have found that disfluencies are related to lexical retrieval difficulties, the literature lacks studies which have specifically investigated the impact of TOTs on disfluency. This study explores the relationship between TOTs and such disfluency behaviours as hesitations and target approximations (i.e. incorrect attempts to produce targets). TOTs were induced using the TOTimal method (Smith, Brown & Balfour, 1991), where participants memorised and retrieved the names of imaginary animals. Speech samples were analysed for TOTs and disfluencies. Disfluency rates increased with retrieval times during resolved TOTs. Additionally, target approximation rates correlated with the rates of both TOTs and “Don’t Know” responses, suggesting that target approximations are not unique to TOTs but are indicative of general uncertainty during lexical retrieval.

    Keywords DiSS

  • Gary Geunbae Lee, Ho-Young Lee, Jieun Song, Byeongchang Kim, Sechun Kang, Jinsik Lee, and Hyosung Hwang, “Automatic sentence stress feedback for non-native English learners,” Computer Speech & Language, vol. 41, 2017, pp. 29-42. DOI: 10.1016/j.csl.2016.04.003. http://www.sciencedirect.com/science/article/pii/S0885230816301759.

    Abstract This paper proposes a sentence stress feedback system in which sentence stress prediction, detection, and feedback provision models are combined. This system provides non-native learners with feedback on sentence stress errors so that they can improve their English rhythm and fluency in a self-study setting. The sentence stress feedback system was devised to predict and detect the sentence stress of any practice sentence. The accuracy of the prediction and detection models was 96.6% and 84.1%, respectively. The stress feedback provision model offers positive or negative stress feedback for each spoken word by comparing the probability of the predicted stress pattern with that of the detected stress pattern. In an experiment that evaluated the educational effect of the proposed system incorporated in our CALL system, significant improvements in accentedness and rhythm were seen with the students who trained with our system but not with those in the control group.

    Keywords CALL

  • Emer Gilmartin, Carl Vogel, and Nick Campbell, “Disfluency in chat and chunk phases of multiparty casual talk,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 25-28. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

    Abstract Multiparty casual conversation lasting more than a few minutes can be viewed as a series of phases of chat and chunk type interaction, where chat is interactive conversation with several participants taking turns, and chunk refers to phases where one participant dominates the conversation, often by telling a story or giving an opinion. We investigate the distribution of disfluency in these phases in a 70-minute 5-party conversation where participants had no practical task to perform. This pilot study shows differences in the distribution of disfluency types and frequency in the two phases.

    Keywords DiSS

  • Mária Gósy, Dorottya Gyarmathy, and András Beke, “Phonetic analysis of filled pauses based on a Hungarian-English learner corpus,” International Journal of Learner Corpus Research, vol. 3, December 2017, pp. 149-174. DOI: 10.1075/ijlcr.3.2.03gos. http://www.jbe-platform.com/content/journals/10.1075/ijlcr.3.2.03gos.

    Abstract Filled pauses may reveal speech planning or execution problems to a greater extent in L2 spontaneous speech than in L1. The purpose of this study was to analyze the forms and position of all filled pauses, and the durations and the formants of vocalic filled pauses in English (L2) and in Hungarian (L1) spontaneous speech produced by 30 young learners with various L2 proficiency levels using data from our HunEng-D learner corpus. The findings showed that the forms of filled pauses were similar in both languages, irrespective of level of language proficiency. Results confirmed significantly longer vocalic filled pauses in basic and intermediate learners in their L2 relative to their more advanced peers. Formant values (as acoustic reflections of vowel quality) indicated very similar articulatory configurations for all vocalic filled pauses, irrespective of language and language proficiency.

    Keywords acoustics of vocalic filled pauses, duration, HunEng-D corpus, proficiency level

  • Mária Gósy, and Robert Eklund, “Segment prolongation in Hungarian,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 29-32. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

    Abstract Segment prolongation (PR) has been shown to be one of the most common forms of non-pathological speech disfluencies (Eklund, 2001). The distribution of PRs in the word (initial–medial–final segment) seems to vary between languages of different syllable-structure complexity, making it interesting to study segment prolongation in languages that exhibit different syllable structure characteristics. Previous work has examined languages with complex syllable structure, such as English and Swedish (Eklund & Shriberg, 1998; Eklund, 2001, 2004), where affixation creates complex consonant clusters, and languages with very simple syllable structure, such as Japanese (Den, 2003) or Tok Pisin (Eklund, 2001, 2004), as well as Mandarin Chinese (Lee et al., 2004). In this paper we study PRs in Hungarian. Our results indicate that PRs in Hungarian are more similar to those in English and Swedish than to those in Japanese, Tok Pisin or Mandarin Chinese, which lends support to the notion that underlying morphology plays a role in how PRs are realised.

    Keywords DiSS

  • Peter Howell, Kaho Yoshikawa, Kevin Tang, John Harris, and Clarissa Sorger, “Intervention for word-finding difficulty for children starting school who have diverse language backgrounds,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 33-36. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

    Abstract Children who have word-finding difficulty can be identified by the pattern of disfluencies in their spontaneous speech; in particular whole-word repetition of prior words often occurs when they cannot retrieve the subsequent word. Work is reviewed that shows whole-word repetitions can be used to identify children from diverse language backgrounds who have word-finding difficulty. The symptom-based identification procedure was validated using a non-word repetition task. Children who were identified as having word-finding difficulty were given phonological training that taught them features of English that they lacked (this depended on their language background). Then they received semantic training. In the cases of children whose first language was not English, the children were primed to use English and then presented with material where there was interference in meanings across the languages (English names had to be produced). It was found that this training improved a range of outcome measures related to education.

    Keywords DiSS

  • Kenneth O. St. Louis, Farzan Irani, Rodney M. Gabel, Stephanie Hughes, Marilyn Langevin, Midori Rodriguez, Kathleen Scaler Scott, and Mary E. Weidner, “Evidence-based guidelines for being supportive of people who stutter in North America,” Journal of Fluency Disorders, 2017. DOI: 10.1016/j.jfludis.2017.05.002. http://www.sciencedirect.com/science/article/pii/S0094730X17300050.

    Abstract Purpose. While many resources, particularly those available on the Internet, provide suggestions for fluent speakers as they interact with people who stutter (PWS), little evidence exists to support these suggestions. Thus, the purpose of this study was to document the supportiveness of common public reactions, behaviors, or interventions to stuttering by PWS. | Methods. 148 PWS completed the Personal Appraisal of Support for Stuttering-Adults. Additionally, a comparison of the opinions of adults who stutter based on gender and their involvement in self-help/support groups was undertaken. | Results. Many of the Internet-based suggestions for interacting with PWS are aligned with the opinions of the participants of this study. Significant differences were found amongst people who stutter on the basis of gender and involvement in self-help groups. | Conclusions. Lists of “DOs and DON’Ts” that are readily available on the Internet are largely supported by the data in this study; however, the findings highlight the need for changing the emphasis from strict rules for interacting with people who stutter to more flexible principles that keep the needs of individual PWS in mind.

  • Loulou Kosmala, and Aliyah Morgenstern, “A preliminary study of hesitation phenomena in L1 and L2 productions: a multimodal approach,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 37-40. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

    Abstract This paper presents a preliminary study of vocal hesitations in L1 and L2 productions using a multimodal perspective. It investigates the use of vocal hesitations of French learners of English interacting in tandem with American speakers in semi-spontaneous speech. Several hesitation markers were analyzed (filled pauses, unfilled pauses, prolongations and non-lexical sounds) based on formal and functional features as well as their relation to gesture. Results do not show great differences in the frequency of vocal hesitations between L1 and L2 productions overall; however, we find differences in duration and combination complexity. Our study indicated that vocal hesitations mainly served planning functions and were very often accompanied by gaze aversion both in L1 and L2 productions. Moreover, speakers did not tend to gesture while hesitating. We conclude that hesitations mainly served planning strategies both in L1 and L2 speech, but with some differences in duration and complexity.

    Keywords DiSS

  • Kurt Eggers, and Sabine Van Eerdenbrugh, “Speech disfluencies in children with Down Syndrome,” Journal of Communication Disorders, 2017. DOI: 10.1016/j.jcomdis.2017.11.001. http://www.sciencedirect.com/science/article/pii/S0021992416301794.

    Abstract Purpose. Speech and language development in individuals with Down syndrome is often delayed and/or disordered and speech disfluencies appear to be more common. These disfluencies have been labeled over time as stuttering, cluttering or both. Findings were usually generated from studies with adults or a mixed age group, quite often using different methodologies, making it difficult to compare findings. Therefore, the purpose of this study was to analyze and describe the speech disfluencies of a group consisting only of children with Down syndrome between 3 and 13 years of age. | Method. Participants consisted of 26 Dutch-speaking children with DS. Spontaneous speech samples were collected and 50 utterances were analyzed for each child. Types of disfluencies were identified and classified into stuttering-like (SLD) and other disfluencies (OD). The criterion of three or more SLD per 100 syllables (cf. Ambrose & Yairi, 1999) was used to identify stuttering. Additional parameters such as mean articulation rate (MAR), ratio of disfluencies, and telescoping (cf. Coppens-Hofman et al., 2013) were used to identify cluttering and to differentiate between stuttering and cluttering. | Results & conclusion. Approximately 30 percent of children with DS between 3 and 13 years of age in this study stutter, which is much higher than the prevalence in normally developing children. Moreover, this study showed that the speech of children with DS has a different distribution of types of disfluencies than the speech of normally developing children. Although different cluttering-like characteristics were found in the speech of young children with DS, none of them could be identified as cluttering or cluttering-stuttering.

    Keywords Cluttering, Down Syndrome, Speech disfluencies, stuttering

  • Craig Lambert, Judit Kormos, and Danny Minn, “Task Repetition and Second Language Speech Processing,” Studies in Second Language Acquisition, vol. 39, no. 1, 2017, pp. 167–196. DOI: 10.1017/S0272263116000085.

    Abstract This study examines the relationship between the repetition of oral monologue tasks and immediate gains in L2 fluency. It considers the effect of aural-oral task repetition on speech rate, frequency of clause-final and midclause filled pauses, and overt self-repairs across different task types and proficiency levels and relates these findings to specific stages of L2 speech production (conceptualization, formulation, and monitoring). Thirty-two Japanese learners of English sampled at three levels of proficiency completed three oral communication tasks (instruction, narration, and opinion) six times. Results revealed that immediate aural-oral same task repetition was related to gains in oral fluency regardless of proficiency level or task type. Overall gains in speech rate were the largest across the first three performances of each task type but continued until the fifth performance. More specifically, however, clause-final pauses decreased until the second performance, midclause pauses decreased up to the fourth, and self-repairs decreased only after the fourth performance, indicating that task repetition may have been differentially related to specific stages in the speech production process.

  • Robin Lickley, “Disfluency in typical and stuttered speech,” in Fattori Sociali e Biologici Nella Variazione Fonetica [Social and Biological Factors in Speech Variation] (Studi AISV), Chiara Bertini, Chiara Celata, Giovanna Lenoci, Chiara Meluzzi, and Irene Ricci, Eds. Milano, Italy: Associazione Italiana Scienze della Voce, 2017, pp. 373-387. DOI: 10.17469/O2103AISV000019.

    Abstract This paper discusses what happens when things go wrong in the planning and execution of running speech, comparing disfluency in typical speech with pathological disfluency in stuttering. Spontaneous speech by typical speakers is rarely completely fluent. There are several reasons why fluency can break down in typical speech. Various studies suggest that we produce disfluencies at a rate of around 6 per 100 fluent words, so a significant proportion of our utterances are disfluent in some way. Stuttering can halt the flow of speech at a much higher rate than typical disfluency. While persons who stutter are also prone to the same kinds of disfluency as typical speakers, their impairment results in the production of other forms of disfluency that are both quantitatively and qualitatively different from typical forms. In this paper, I give an overview of the causes of disfluency in both typical and stuttered speech and relate these causes to their articulatory and phonetic realisations. I show how typical and stuttered disfluencies differ in both their cause and their realisations.

  • Ludivine Crible, “Discourse markers and (dis)fluency in English and French: Variation and combination in the DisFrEn corpus,” International Journal of Corpus Linguistics, vol. 22, no. 2, September 2017, pp. 242-264. DOI: 10.1075/ijcl.22.2.04cri. http://www.jbe-platform.com/content/journals/10.1075/ijcl.22.2.04cri.

    Abstract While discourse markers (DMs) and (dis)fluency have been extensively studied in the past as separate phenomena, corpus-based research combining large-scale yet fine-grained annotations of both categories has, however, never been carried out before. Integrating these two levels of analysis, while methodologically challenging, is not only innovative but also highly relevant to the investigation of spoken discourse in general and form-meaning patterns in particular. The aim of this paper is to provide corpus-based evidence of the register-sensitivity of DMs and other disfluencies (e.g. pauses, repetitions) and of their tendency to combine in recurrent clusters. These claims are supported by quantitative findings on the variation and combination of DMs with other (dis)fluency devices in DisFrEn, a richly annotated and comparable English-French corpus representative of eight different interaction settings. The analysis uncovers the prominent place of DMs within (dis)fluency and meaningful association patterns between forms and functions, in a usage-based approach to meaning-in-context.

    Keywords corpus annotation, disfluency, Discourse markers, speech, usage-based

  • Kikuo Maekawa, Ken’ya Nishikawa, and Shu-Chuan Tseng, “Phonetic characteristics of filled pauses: a preliminary comparison between Japanese and Chinese,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 41-44. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

    Abstract Filled pauses in spontaneous Chinese and Japanese were analyzed to examine if there is systematic phonetic difference between the vowels of filled pauses and those occurring in ordinary lexical items. Also, the effect of the category of filled pauses (simple vocalic fillers versus fillers derived from demonstratives) was examined in both languages. Random forests analysis revealed that it was possible to construct automatic classifiers that achieved F-measure values of .7-.9. It turned out also that, in both languages, vowels in simple vocalic filled pauses showed higher F-values than the filled pauses derived from demonstratives. Lastly, it turned out that acoustic features distinguishing filled pauses from ordinary lexical items differ depending on both the category of filled pauses and languages.

    Keywords DiSS

  • Srdan Medimorec, Torin P. Young, and Evan F. Risko, “Disfluency effects on lexical selection,” Cognition, vol. 158, January 2017, pp. 28-32. DOI: 10.1016/j.cognition.2016.10.008. http://www.sciencedirect.com/science/article/pii/S0010027716302426.

    Abstract Recent research has suggested that introducing a disfluency in the context of written composition (i.e., typing with one hand) can increase lexical sophistication. In the current study, we provide a strong test between two accounts of this phenomenon, one that attributes it to the delay caused by the disfluency and one that attributes it to the disruption of typical finger-to-letter mappings caused by the disfluency. To test between these accounts, we slowed down participants’ typewriting by introducing a small delay between keystrokes while individuals wrote essays. Critically, this manipulation did not disrupt typical finger-to-letter mappings. Consistent with the delay-based account, our results demonstrate that the essays written in this less fluent condition were more lexically diverse and used less frequent words. Implications for the temporal dynamics of lexical selection in complex cognitive tasks are discussed.

    Keywords Lexical sophistication

  • Mohammad Alameer, Lotte Meteyard, and David Ward, “Stuttering Generalization Self-Measure: Preliminary Development of a Self-Measuring Tool,” Journal of Fluency Disorders, 2017. DOI: 10.1016/j.jfludis.2017.04.001. http://www.sciencedirect.com/science/article/pii/S0094730X16300390.

    Abstract Introduction. Generalization of treatment is considered a difficult task for clinicians and people who stutter (PWS), and can constitute a barrier to long-term treatment success. To our knowledge, there are no standardized tests that collect measurements of the behavioral and cognitive aspects alongside the client’s self-perception in real-life speaking situations. | Purpose. This paper describes the preliminary development of a Stuttering Generalization Self-Measure (SGSM). The purpose of SGSM is to assess 1) stuttering severity and 2) speech-anxiety level during real-life situations as perceived by PWS. Additionally, this measurement aims to 3) investigate correlations between stuttering severity and speech-anxiety level within the same real-life situation. | Method. The SGSM as initially reported includes nine speaking situations developed to cover a variety of frequent speaking scenarios. However, two of these were less commonly encountered by participants and were subsequently excluded from the final analyses. Items were created according to five listener categories (family and close friends, acquaintances, strangers, persons of authority, and giving a short speech to a small audience). Forty-three participants (22 PWS and 21 controls) aged 18 to 53 years were asked to complete the assessment in real-life situations. | Results. Analyses indicated that test-retest reliability was high for both groups. Discriminant validity was also achieved, as SGSM scores for stuttering and speech anxiety differed significantly between the control and PWS groups. Convergent validity was confirmed by significant correlations between the SGSM and other speech-related anxiety measures.

    Keywords Assessment, Generalization, Self-perception, Speech-anxiety, Stuttering severity

  • Naomi Ogi, Involvement and Attitude in Japanese Discourse. Amsterdam, Netherlands: John Benjamins, 2017. DOI: 10.1075/pbns.272. https://benjamins.com/#catalog/books/pbns.272/main.

    Abstract This book addresses the long discussed issue of Japanese interactive markers (traditionally called sentence-final particles) in a new light, and provides the comprehensive linguistic documentation of the interactional functions of seven interactive markers: ne, na, yo, sa, wa, zo and ze. By adopting three key notions, ‘involvement’, ‘formality’ and ‘gender’, the study not only reveals the functions and pragmatic effects of each marker, but also sheds light on some fundamental issues of the nature of spoken discourse in general, including how speakers collaborate with each other to create and sustain their conversations and how linguistic functions of verbal forms interface with sociocultural norms. This book will be of interest to students and scholars in a wide range of linguistic fields such as Japanese linguistics, pragmatics, sociolinguistics, discourse analysis and applied linguistics and to teachers and learners of Japanese and of a second/foreign language.

  • Sieb Nooteboom, and Hugo Quené, “The time course of self-monitoring within words and utterances,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 45-48. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

    Abstract The within-word and within-utterance time course of internal and external self-monitoring is investigated in a four-word tongue twister experiment eliciting interactional word initial and word medial segmental errors and their repairs. It is found that detection rate for both internal and external self-monitoring decreases from early to late both within words and within utterances. Also, offset-to-repair times are more often of 0 ms in initial than in medial consonants.

    Keywords DiSS

  • Dan Nosowitz, “The Mystery and Occasional Poetry of, Uh, Filled Pauses,” January 2017. https://www.atlasobscura.com/articles/the-mystery-and-occasional-poetry-of-uh-filled-pauses.

    Abstract Nearly every language and every culture has what are called “filled pauses,” a notoriously difficult-to-define concept that generally refers to sounds or words that a speaker uses when, well, not exactly speaking. In American English, the most common are “uh” and “um.”

  • Pauliina Peltonen, “Temporal fluency and problem-solving in interaction: An exploratory study of fluency resources in L2 dialogue,” System, vol. 70, 2017, pp. 1-13. DOI: 10.1016/j.system.2017.08.009. http://www.sciencedirect.com/science/article/pii/S0346251X1630286X.

    Abstract Second language (L2) speech fluency has mostly been studied from monologues with temporal measures. In the present study, dialogue data are examined with a new framework that links (temporal) fluency analysis to a broader problem-solving perspective, offering a unique approach to examining the resources learners have for maintaining fluent speech despite problems. Dialogues based on a pairwise problem-solving task from 42 Finnish learners of English at two school levels were analyzed quantitatively for temporal fluency, dialogue fluency, stalling mechanisms, and communication strategies (CSs). A complementary qualitative analysis of selected productions was also conducted. The results indicate that temporal and dialogue fluency measures differentiate learners at different school levels, but the relationship between CSs and fluency is complex. While correlations between mid-clause pauses and certain strategies were found, the qualitative analysis indicated that stalling mechanisms and CSs can compensate for local dysfluencies and even contribute to temporal fluency. The results highlight the importance of combining quantitative and qualitative analysis in L2 fluency studies. Conceptually, L2 speech fluency should include collaborative aspects (dialogue fluency) in addition to individual, temporal fluency, and cover resources for maintaining fluency.

    Keywords Communication strategies, interaction, Mixed-methods, oral fluency, Problem-solving, second language speech

  • Ralph Rose, “Silent and filled pauses and speech planning in first and second language production,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 49-52. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

    Abstract The present study looks at the relative association of silent and filled pauses to problems in discourse and syntactic planning via utterance and clause boundary phenomena, respectively, by focusing on crosslinguistic data. The occurrence of boundary pauses in a crosslinguistic corpus of speech suggests that silent pauses are more closely related to both discourse and syntactic planning than filled pauses, but more strongly so for discourse planning. These results were consistent across both first and second language production. However, clause boundary silent pauses in first language speech were more atypical (i.e., longer than average) than those in second language speech. This difference may be due to complexity differences in the first and second language speech samples.

    Keywords DiSS

  • Ralph L Rose, “Differences in second language speech fluency ratings: native versus nonnative listeners,” in Proceedings of the International Conference: Fluency & Disfluency Across Languages and Language Varieties, Université catholique de Louvain, February 2017, pp. 101-103. http://hdl.handle.net/2078.1/195807.

    Abstract (none)

  • Ralph L. Rose, “A Comparison of Form and Temporal Characteristics of Filled Pauses in L1 Japanese and L2 English,” Journal of the Phonetic Society of Japan, vol. 21, no. 3, 2017, pp. 33-40. DOI: 10.24467/onseikenkyu.21.3_33. https://www.jstage.jst.go.jp/article/onseikenkyu/21/3/21_33/_article/-char/en.

    Abstract Filled pauses (FPs) in English can be either monophonemic ‘uh’ [ə] or polyphonemic ‘um’ [əm]. These differ temporally: shorter ‘uh’ is associated with shorter overall delay (including silent pauses). Japanese FPs are more varied, including both monophonemic ([ε], [ŋ]) and polyphonemic ([ε:to], [ɑno]) forms. This study compares the FPs of native Japanese speakers in a crosslinguistic speech corpus. Results show speakers use FPs with a lower F1 than native English speakers and strongly prefer the monophonemic form. Duration patterns are similar, but low proficiency speakers delay longer with monophonemic FPs. Results suggest possibilities for nonnative speech detection in speech applications.

  • June Ruivivar, and Laura Collins, “The Effects of Foreign Accent on Perceptions of Nonstandard Grammar: A Pilot Study,” TESOL Quarterly, May 2017. DOI: 10.1002/tesq.374. http://onlinelibrary.wiley.com/doi/10.1002/tesq.374/full.

    Abstract (none)

  • Naomi Sakai, Shin Ying Chu, Koichi Mori, and J. Scott Yaruss, “The Japanese version of the Overall Assessment of the Speaker’s Experience of Stuttering for Adults (OASES-A-J): Translation and psychometric evaluation,” Journal of Fluency Disorders, January 2017. DOI: 10.1016/j.jfludis.2016.11.002. http://www.sciencedirect.com/science/article/pii/S0094730X16300663.

    Abstract Purpose. This study evaluates the psychometric performance of the Japanese version of the Overall Assessment of the Speaker’s Experience of Stuttering for Adults (OASES-A), a comprehensive assessment tool of individuals who stutter. | Methods. The OASES-A-J was administered to 200 adults who stutter in Japan. All respondents also evaluated their own speech (SA scale), their satisfaction with their own speech (SS scale) and the Japanese translation version of the Modified Erickson Communication Attitude scale (S-24). The test-retest reliability and internal consistency of the OASES-A-J were assessed. To examine the concurrent validity of the questionnaire, Pearson correlation was conducted between the OASES-A-J Impact score and the S-24 scale, SA scale and SS scale. In addition, Pearson correlation among the impact scores of each section and total were calculated to examine the construct validity. | Results. The OASES-A-J showed a good test-retest reliability (r = 0.81–0.95) and high internal consistency (α > 0.80). Concurrent validity was moderate to high (0.55–0.75). Construct validity was confirmed by the relation between internal consistency in each section and correlation among sections’ impact scores. Japanese adults showed higher negative impact for ‘General Information’, ‘Reactions to Stuttering’ and ‘Quality of Life’ sections. | Conclusion. These results suggest that the OASES-A-J is a reliable and valid instrument to measure the impact of stuttering on Japanese adults who stutter. The OASES-A-J could be used as a clinical tool in Japanese stuttering field.

    Keywords ICF, OASES, Psychometric analysis, Quality of life, stuttering

  • Vered Silber-Varod, and Anat Lerner, “Analysis of silences in unbalanced dialogues: the effect of genre and role,” in The 8th Workshop on Disfluency in Spontaneous Speech (DiSS 2017) (TMH-QPSR), vol. 58, no. 1, Stockholm, Sweden, August 2017, pp. 53-57. https://www.isca-speech.org/archive/diss_2017/DiSS2017_Proceedings.pdf.

    Abstract This study examines the diversity of silences in unbalanced dialogues, i.e. dialogues between speakers with different participation levels: responder and reporter. We examined two genres based on this responder-reporter structure: therapeutic sessions and private dialogues. When looking at silence-to-speech ratios, we found no differences between the genres or between the roles. However, when grouping the silences by type (pauses, i.e. intra-speaker silences; gaps, i.e. inter-speaker silences; and silences that occur in the vicinity of speech overlaps), we found that the duration of pauses is role-dependent in both genres, while the duration of gaps is genre-dependent but not role-dependent. Moreover, speech rate was not found to be genre-dependent. It seems that although silences in unbalanced dialogues vary considerably, genre and speaker’s role are influential.

    Keywords DiSS

  • Richard Stephens, and Amy Zile, “Does Emotional Arousal Influence Swearing Fluency?,” Journal of Psycholinguistic Research, January 2017, pp. 1-13. DOI: 10.1007/s10936-016-9473-8. http://dx.doi.org/10.1007/s10936-016-9473-8.

    Abstract This study assessed the effect of experimentally manipulated emotional arousal on swearing fluency. We hypothesised that swear word generation would be increased with raised emotional arousal. The emotional arousal of 60 participants was manipulated by having them play a first-person shooter video game or, as a control, a golf video game, in a randomised order. A behavioural measure of swearing fluency based on the Controlled Oral Word Association Test was employed. Successful experimental manipulation was indicated by raised State Hostility Questionnaire scores after playing the shooter game. Swearing fluency was significantly greater after playing the shooter game compared with the golf game. Validity of the swearing fluency task was demonstrated via positive correlations with self-reported swearing fluency and daily swearing frequency. In certain instances swearing may represent a form of emotional expression. This finding will inform debates around the acceptability of using taboo language.

  • Stewart M. McCauley, and Morten H. Christiansen, “Computational Investigations of Multiword Chunks in Language Learning,” Topics in Cognitive Science, 2017. DOI: 10.1111/tops.12258. https://dx.doi.org/10.1111/tops.12258.

    Abstract Second-language learners rarely arrive at native proficiency in a number of linguistic domains, including morphological and syntactic processing. Previous approaches to understanding the different outcomes of first- versus second-language learning have focused on cognitive and neural factors. In contrast, we explore the possibility that children and adults may rely on different linguistic units throughout the course of language learning, with specific focus on the granularity of those units. Following recent psycholinguistic evidence for the role of multiword chunks in online language processing, we explore the hypothesis that children rely more heavily on multiword units in language learning than do adults learning a second language. To this end, we take an initial step toward using large-scale, corpus-based computational modeling as a tool for exploring the granularity of speakers’ linguistic units. Employing a computational model of language learning, the Chunk-Based Learner, we compare the usefulness of chunk-based knowledge in accounting for the speech of second-language learners versus children and adults speaking their first language. Our findings suggest that while multiword units are likely to play a role in second-language learning, adults may learn less useful chunks, rely on them to a lesser extent, and arrive at them through different means than children learning a first language.

    Keywords chunking, Computational modeling, Corpora, L2, Language learning

  • Uriel Cohen Priva, “Not so fast: Fast speech correlates with lower lexical and structural information,” Cognition, vol. 160, 2017, pp. 27-34. DOI: 10.1016/j.cognition.2016.12.002. http://www.sciencedirect.com/science/article/pii/S0010027716302888.

    Abstract Speakers dynamically adjust their speech rate throughout conversations. These adjustments have been linked to cognitive and communicative limitations: for example, speakers speak words that are contextually unexpected (and thus add more information) with slower speech rates. This raises the question whether limitations of this type vary wildly across speakers or are relatively constant. The latter predicts that across speakers (or conversations), speech rate and the amount of information content are inversely correlated: on average, speakers can either provide high information content or speak quickly, but not both. Using two corpus studies replicated across two corpora, I demonstrate that indeed, fast speech correlates with the use of less informative words and syntactic structures. Thus, while there are individual differences in overall information throughput, speakers are more similar in this aspect than differences in speech rate would suggest. The results suggest that information theoretic constraints on production operate at a higher level than was observed before and affect language throughout production, not only after words and structures are chosen.

    Keywords Information, Information rate, Language, speech rate

  • Vered Aharonson, Eran Aharonson, Katia Raichlin-Levi, Aviv Sotzianu, Ofer Amir, and Zehava Ovadia-Blechman, “A real-time phoneme counting algorithm and application for speech rate monitoring,” Journal of Fluency Disorders, vol. 51, 2017, pp. 60-68. DOI: 10.1016/j.jfludis.2017.01.001. http://www.sciencedirect.com/science/article/pii/S0094730X16300389.

    Abstract Adults who stutter can learn to control and improve their speech fluency by modifying their speaking rate. Existing speech therapy technologies can assist this practice by monitoring speaking rate and providing feedback to the patient, but cannot provide an accurate, quantitative measurement of speaking rate. Moreover, most technologies are too complex and costly to be used for home practice. We developed an algorithm and a smartphone application that monitor a patient’s speaking rate in real time and provide user-friendly feedback to both patient and therapist. Our speaking rate computation is performed by a phoneme counting algorithm which implements spectral transition measure extraction to estimate phoneme boundaries. The algorithm is implemented in real time in a mobile application that presents its results in a user-friendly interface. The application incorporates two modes: one provides the patient with visual feedback of his/her speech rate for self-practice and another provides the speech therapist with recordings, speech rate analysis and tools to manage the patient’s practice. The algorithm’s phoneme counting accuracy was validated on ten healthy subjects who read a paragraph at slow, normal and fast paces, and was compared to manual counting of speech experts. Test-retest and intra-counter reliability were assessed. Preliminary results indicate differences of −4% to 11% between automatic and human phoneme counting. Differences were largest for slow speech. The application can thus provide reliable, user-friendly, real-time feedback for speaking rate control practice.

    Keywords Smartphone application, Speaking rate computation, Spectral transition measure, Stuttering therapy

  • Xiaoming Jiang, and Marc D. Pell, “The sound of confidence and doubt,” Speech Communication, vol. 88, 2017, pp. 106-126. DOI: 10.1016/j.specom.2017.01.011. http://www.sciencedirect.com/science/article/pii/S0167639316301509.

    Abstract Feeling of knowing (or "expressed confidence") reflects a speaker’s certainty or commitment to a statement and can be associated with one’s trustworthiness or persuasiveness in social interaction. We investigated the perceptual-acoustic correlates of expressed confidence and doubt in spoken language, with a focus on both linguistic and vocal speech cues. In Experiment 1, utterances subserving different communicative functions (e.g., stating facts, making judgments), produced in a confident, close-to-confident, unconfident, and neutral-intending voice by six speakers, were rated for perceived confidence by 72 native listeners. As expected, speaker confidence ratings increased with the intended level of expressed confidence; neutral-intending statements were frequently judged as relatively high in confidence. The communicative function of the statement, and the presence vs. absence of an utterance-initial probability phrase (e.g., Maybe, I’m sure), further modulated speaker confidence ratings. In Experiment 2, acoustic analysis of perceptually valid tokens rated in Experiment 1 revealed distinct patterns of pitch, intensity and temporal features according to perceived confidence levels; confident expressions were highest in fundamental frequency (f0) range, mean amplitude, and amplitude range, whereas unconfident expressions were highest in mean f0, slowest in speaking rate, with more frequent pauses. Dynamic analyses of f0 and intensity changes across the utterance uncovered distinctive patterns in expression as a function of confidence level at different positions of the utterance. Our findings provide new information on how metacognitive states such as confidence and doubt are communicated by vocal and linguistic cues which permit listeners to arrive at graded impressions of a speaker’s feeling of (un)knowing.

    Keywords nonverbal behavior

  • Yuh-show Cheng, “Development and preliminary validation of four brief measures of L2 language-skill-specific anxiety,” System, 2017. DOI: 10.1016/j.system.2017.06.009. http://www.sciencedirect.com/science/article/pii/S0346251X17304888.

    Abstract This paper reports a study on the development and validation of four brief measures of L2 language-skill-specific anxiety scales: L2 listening, speaking, reading, and writing anxiety scales. A total of 523 college students in Taiwan participated in the study. Lang’s (1971) tripartite model of anxiety provided a theoretical basis for developing the four scales. An initial pool of items was developed based on a review of related literature and the results of a focus group interview. Less ideal items were removed based upon the results of a pilot test. In the formal study, exploratory factor analysis was conducted to select items for each anxiety scale, which was subsequently validated by confirmatory factor analysis and correlation analysis. The results provided evidence for the reliability, convergent validity, and discriminant validity of the scores of the four brief measures.

    Keywords Brief measure, L2, Language anxiety, Language-skill-specific, Psychometric properties

2016

  • Akiko Fuse, and Erika A. Lanham, “Impact of social media and quality life of people who stutter,” Journal of Fluency Disorders, vol. 50, 2016, pp. 59-71. DOI: 10.1016/j.jfludis.2016.09.005. http://www.sciencedirect.com/science/article/pii/S0094730X16300262.

    Abstract Highlights. • People who stutter (PWS) who are connecting with other PWS have seen an improvement in their overall confidence. • PWS who use social media feel that they do not rely on it as their main form of communication and feel that they use social media an average amount. • Social media relieves PWS anxiety in communication by allowing them to communicate without negative evaluation or difficulty with functional communication.

  • Amy Watts, Patricia Eadie, Susan Block, Fiona Mensah, and Sheena Reilly, “Language skills of children during the first 12 months after stuttering onset,” Journal of Fluency Disorders, December 2016. DOI: 10.1016/j.jfludis.2016.12.001. http://www.sciencedirect.com/science/article/pii/S0094730X16300286.

    Abstract Purpose. To describe the language development of a sample of young children who stutter during the first 12 months after reported stuttering onset. | Methods. Language production was analysed in a sample of 66 children who stuttered (aged 2 to 4 years). The sample was identified from a pre-existing prospective, community-based longitudinal cohort. Data were collected at three time points within the first year after stuttering onset. Stuttering severity was measured, and global indicators of expressive language proficiency (length of utterances and grammatical complexity) were derived from the samples and summarised. Language production abilities of the children who stutter were contrasted with normative data. | Results. The majority of children’s stuttering was rated as mild in severity, with more than 83% of participants demonstrating very mild or mild stuttering at each of the time points studied. The participants demonstrated developmentally appropriate spoken language skills comparable with available normative data. | Conclusion. In the first year following the report of stuttering onset, the language skills of the children who were stuttering progressed in a manner that is consistent with developmental expectations.

    Keywords Language

  • Andrea Révész, Monika Ekiert, and Eivind Nessa Torgersen, “The Effects of Complexity, Accuracy, and Fluency on Communicative Adequacy in Oral Task Performance,” Applied Linguistics, vol. 37, no. 6, December 2016, pp. 828-848. DOI: 10.1093/applin/amu069. http://applij.oxfordjournals.org/content/37/6/828.short?rss=1.

    Abstract Communicative adequacy is a key construct in second language research, as the primary goal of most language learners is to communicate successfully in real-world situations. Nevertheless, little is known about what linguistic features contribute to communicatively adequate speech. This study fills this gap by investigating the extent to which complexity, accuracy, and fluency (CAF) predict adequacy, and whether proficiency and task type moderate these relationships. In all, 20 native speakers and 80 second language users from four proficiency levels performed five tasks. Speech samples were rated for adequacy and coded for a range of CAF indices. Filled pause frequency, a feature of breakdown fluency, emerged as the strongest predictor of adequacy. Predictors with significant but smaller effects included indices of all three CAF dimensions: linguistic complexity (lexical diversity, overall syntactic complexity, syntactic complexity by subordination, and frequency of conjoined clauses), accuracy (general accuracy and accuracy of connectors), and fluency (silent pause frequency and speed fluency). For advanced speakers, incidence of false starts also emerged as predicting communicatively adequate speech. Task type did not influence the link between linguistic features and adequacy.

  • Andrew Martin, Yosuke Igarashi, Nobuyuki Jincho, and Reiko Mazuka, “Utterances in infant-directed speech are shorter, not slower,” Cognition, vol. 156, 2016, pp. 52-59. DOI: 10.1016/j.cognition.2016.07.015. http://www.sciencedirect.com/science/article/pii/S0010027716301901.

    Abstract It has become a truism in the literature on infant-directed speech (IDS) that IDS is pronounced more slowly than adult-directed speech (ADS). Using recordings of 22 Japanese mothers speaking to their infant and to an adult, we show that although IDS has an overall lower mean speech rate than ADS, this is not the result of an across-the-board slowing in which every vowel is expanded equally. Instead, the speech rate difference is entirely due to the effects of phrase-final lengthening, which disproportionally affects IDS because of its shorter utterances. These results demonstrate that taking utterance-internal prosodic characteristics into account is crucial to studies of speech rate.

    Keywords Final lengthening

  • Elina Banzina, “Consonant lengthening for persuasiveness in L1 and L2 English,” International Journal of Applied Linguistics, vol. 26, no. 3, November 2016, pp. 403-419. DOI: 10.1111/ijal.12137. http://www.ingentaconnect.com/content/bpl/ijal/2016/00000026/00000003/art00007.

    Abstract The present study explored how persuasiveness is expressed phonetically in English and whether non-native speakers of English are able to employ L2 phonetic cues to convey importance in L2 in a native-like manner. An acoustic experiment compared how native English speakers and Latvian speakers of English treat syllable-onset consonant duration relative to vowels in (i) neutral and (ii) persuasive speech contexts. Duration was measured in voiceless stops and continuants and a wide variety of vowels in the stressed syllables of key words. Results revealed that in persuasive speech, native English speakers significantly increased the proportion of consonantal duration, whereas no consonant lengthening was found in Latvian L1 and L2 productions. These findings provide evidence for the paralinguistic function of consonants and the existence of language-specific persuasion cues.

    Keywords consonant duration, consonant lengthening, emphasis, English as a foreign language, persuasive speech, public speaking, spoken English

  • Benjamin V. Tucker, and Mirjam Ernestus, “Why we need to investigate casual speech to truly understand language production, processing and the mental lexicon,” The Mental Lexicon, vol. 11, no. 3, December 2016, pp. 375-400. DOI: 10.1075/ml.11.3.03tuc. http://www.jbe-platform.com/content/journals/10.1075/ml.11.3.03tuc.

    Abstract The majority of studies addressing psycholinguistic questions focus on speech produced and processed in a careful, laboratory speech style. This ‘careful’ speech is very different from the speech that listeners encounter in casual conversations. This article argues that research on casual speech is necessary to show the validity of conclusions based on careful speech. Moreover, research on casual speech produces new insights and questions on the processes underlying communication and on the mental lexicon that cannot be revealed by research using careful speech. This article first places research on casual speech in its historic perspective. It then provides many examples of how casual speech differs from careful speech and shows that these differences may have important implications for psycholinguistic theories. Subsequently, the article discusses the challenges that research on casual speech faces, which stem from the high variability of this speech style, its necessary casual context, and that casual speech is connected speech. We also present opportunities for research on casual speech, mostly in the form of new experimental methods that facilitate research on connected speech. However, real progress can only be made if these new methods are combined with advanced (still to be developed) statistical techniques.

    Keywords casual speech, conversational speech, experimental paradigms, pronunciation variability, statistical analyses

  • Bjørn Wessel-Tolvig, and Patrizia Paggio, “Revisiting the thinking-for-speaking hypothesis: Speech and gesture representation of motion in Danish and Italian,” Journal of Pragmatics, vol. 99, July 2016, pp. 39-61. DOI: 10.1016/j.pragma.2016.05.004. http://www.sciencedirect.com/science/article/pii/S0378216616301539.

    Abstract Many studies try to explain thought processes based on verbal data alone and often take the linguistic variation between languages as evidence for cross-linguistic thought processes during speaking. We argue that looking at co-speech gestures might broaden the scope and shed new light on different thinking-for-speaking patterns. Data comes from a corpus study investigating the relationship between speech and gesture in two typologically different languages: Danish, a satellite-framed language and Italian, a verb-framed language. Results show cross-linguistic variation in how motion components are mapped onto linguistic constituents, but also show how Italian speakers to some degree deviate from standard verb-framed lexicalization patterns, and use typical satellite-framed constructions. Co-speech gestures, when they occur, largely follow the patterns used in speech, with a notable exception: In 28% of the cases, in fact, Italian speakers express manner in path-only speech constructions gesturally. This finding suggests that gestures may be instrumental in revealing what semantic components speakers attend to while speaking; in other words, purely verbal data may not fully account for the thinking part of the thinking-for-speaking hypothesis.

    Keywords Gesture

  • Boaz M. Ben-David, Maroof I. Moral, Aravind K. Namasivayam, Hadas Erel, and Pascal H.H.M. van Lieshout, “Linguistic and Emotional-Valence Characteristics of Reading Passages for Clinical Use and Research,” Journal of Fluency Disorders, 2016. DOI: 10.1016/j.jfludis.2016.06.003. http://www.sciencedirect.com/science/article/pii/S0094730X16300377.

    Abstract Highlights: • There is little information on fundamental properties of reading passages that can affect reading (e.g., words’ arousal and valence, passage readability). • In a detailed analysis, the three commonly used passages were found to contain a share of emotionally valenced, high-arousal, lower-familiarity and polysyllabic content words. • The paper also provides a new well-balanced passage (ranked high on ease of readability) that minimizes the impact of these properties (e.g., by using low-arousal words). • When 26 people who stutter (PWS) were tested, error rates on a traditional passage and on the novel passage were correlated, yet many individuals showed a large difference between the two. • We suggest a combined procedure, using more than one passage. The details on passage characteristics can inform clinical practice.

  • Jazmín Cevasco, and Paul van den Broek, “The effect of filled pauses on the processing of the surface form and the establishment of causal connections during the comprehension of spoken expository discourse,” Cognitive Processing, vol. 17, no. 2, 2016, pp. 185–194. DOI: 10.1007/s10339-016-0755-8. http://dx.doi.org/10.1007/s10339-016-0755-8.

    Abstract The purpose of this study was to examine the effect of filled pauses (uh) on the verification of words and the establishment of causal connections during the comprehension of spoken expository discourse. With this aim, we asked Spanish-speaking students to listen to excerpts of interviews with writers, and to perform a word-verification task and a question-answering task on causal connectivity. There were two versions of the excerpts: filled pause present and filled pause absent. Results indicated that filled pauses increased verification times for words that preceded them, but did not make a difference on response times to questions on causal connectivity. The results suggest that, as signals of delay, filled pauses create a break with surface information, but they do not have the same effect on the establishment of meaningful connections.

  • David Wood, “Willingness to communicate and second language speech fluency: An idiodynamic investigation,” System, vol. 60, 2016, pp. 11-28. DOI: 10.1016/j.system.2016.05.003. http://www.sciencedirect.com/science/article/pii/S0346251X16300276.

    Abstract Second language (L2) speech fluency has usually been studied as a function of a set of measurable temporal features of speech, but it has seldom been researched in relation to learner or situational factors in performance such as willingness to communicate (WTC), definable as readiness to engage in communication at a specific time and with specific interlocutors. The present study is an examination of the fluid relationship between WTC and L2 fluency from a dynamic systems perspective. The exploratory case study presents an examination of WTC and fluency in Japanese learners of English L2, in communication with a non-Japanese interlocutor. Speech samples produced by the learners were analyzed for markers of fluency. The learners produced WTC profiles for their speech samples by creating bitmaps during stimulated recall, and also provided retrospective self-analysis of WTC in stimulated recall. The fluency profiles and WTC profiles were matched and analyzed to explore the interrelationship between fluency and WTC. The results illuminate the relationship between fluency and WTC, particularly the fluidity and possible directionality of the relationship, i.e. whether fluency breakdowns lead to lowered WTC or vice versa.

    Keywords Cognitive fluency

  • Nivja H. de Jong, “Predicting pauses in L1 and L2 speech: the effects of utterance boundaries and word frequency,” International Review of Applied Linguistics in Language Teaching, vol. 54, no. 2, June 2016, pp. 113-132. DOI: 10.1515/iral-2016-9993. http://www.degruyter.com/view/j/iral.2016.54.issue-2/iral-2016-9993/iral-2016-9993.xml.

    Abstract This paper compares the distribution of silent and filled pauses in first (L1) and second language (L2) speech. The occurrence of pauses of 52 L2 and 18 L1 Dutch speakers was evaluated with respect to utterance boundaries and word frequency. We found that L2 speakers paused more often than L1 speakers within utterances, but not between utterances. Similarly, only within utterances were L2 pauses longer than L1 pauses. Regarding word frequency, both L1 and L2 speakers were more likely to pause before lower-frequency words than before higher-frequency words. These findings imply that L1 and L2 speakers’ production processes may be similar in that (1) pauses at utterance boundaries are used mostly for conceptual planning and (2) lexical retrieval difficulties are comparable for L1 and L2 speakers. These findings furthermore imply that when fluency is used in L2 testing, pause locations must be taken into account.

  • Francesca Bianchi, and Sara Gesuato (eds.), Pragmatic Issues in Specialized Communicative Contexts. Brill, 2016, 240 pp. DOI: 10.1163/9789004323902. http://www.brill.com/products/book/pragmatic-issues-specialized-communicative-contexts.

    Abstract "Pragmatic Issues in Specialized Communicative Contexts", edited by Francesca Bianchi and Sara Gesuato, illustrates how interactants systematically and effectively employ micro and macro linguistic resources and textual strategies to engage in communicative practices in such specific contexts as healthcare services, TV interpreting, film dialogue, TED talks, archaeology academic communication, student-teacher communication, and multilingual classrooms. Each contribution presents a pedagogical slant, reporting on or suggesting didactic approaches to, or applications of, pragmatic aspects of communication in SL, FL and LSP learning contexts. The topics covered and the issues addressed are all directly relevant to applied pragmatics, that is, pragmatically oriented linguistic analysis that accounts for interpersonal-transactional issues in real-life situated communication.

  • Josef Fruehwald, “Filled Pause Choice as a Sociolinguistic Variable,” University of Pennsylvania Working Papers in Linguistics, vol. 22, no. 2, 2016, Article 6. https://repository.upenn.edu/pwpl/vol22/iss2/6.

    Abstract In this paper, I argue that filled pause selection (um/uh) is a sociolinguistic variable, conditioned by both internal and external factors. There appears to be a language change in progress towards selecting um more often than uh. In all respects, the (UHM) variable appears to pattern quantitatively just like all other sociolinguistic variables which have been examined, even though the locus of (UHM) variation would seem to be firmly in the speech planning domain. Combined with the quantitative systematicity of sociolinguistic variables across the full range of linguistic modules, I argue that the locus of variation may not be in the grammar, but rather constitutes a separate domain of knowledge, perhaps what Preston (2004) called the “sociocultural selection device.”

  • Effrosyni Georgiadou, and Karen Roehr-Brackin, “Investigating Executive Working Memory and Phonological Short-Term Memory in Relation to Fluency and Self-Repair Behavior in L2 Speech,” Journal of Psycholinguistic Research, 2016, pp. 1–19. DOI: 10.1007/s10936-016-9463-x. http://dx.doi.org/10.1007/s10936-016-9463-x.

    Abstract This paper reports the findings of a study investigating the relationship of executive working memory (WM) and phonological short-term memory (PSTM) to fluency and self-repair behavior during an unrehearsed oral task performed by second language (L2) speakers of English at two levels of proficiency, elementary and lower intermediate. Correlational analyses revealed a negative relationship between executive WM and number of pauses in the lower intermediate L2 speakers. However, no reliable association was found in our sample between executive WM or PSTM and self-repair behavior in terms of either frequency or type of self-repair. Taken together, our findings suggest that while executive WM may enhance performance at the conceptualization and formulation stages of the speech production process, self-repair behavior in L2 speakers may depend on factors other than working memory.

    Keywords Executive working memory, Fluency, hesitation phenomena, L2 speech production, Phonological short-term memory, Self-repair behavior, Working memory capacity

  • Anna Gladkova, Ulla Vanhatalo, and Cliff Goddard, “The semantics of interjections: An experimental study with natural semantic metalanguage,” Applied Psycholinguistics, vol. 37, July 2016, pp. 841–865. DOI: 10.1017/S0142716415000260. http://journals.cambridge.org/article_S0142716415000260.

    Abstract The paper reports the results of a pilot experimental study aimed at evaluating natural semantic metalanguage (NSM) explications of English interjections. It proposes a novel online survey technique to test NSM explications with language speakers. The survey tested recently developed semantic explications of selected English interjections as published in Goddard (2014a): 'wow', 'gosh', 'gee', 'yikes' (“surprise” group) and 'yuck', 'ugh' (“disgust” group). The results provide overall support for the proposed explications and indicate directions for their further development. It is interesting that respondents’ preexisting knowledge of NSM and other background variables (age, gender, being a native speaker, or studying linguistics) were shown to have little influence on the test results.

  • Kaisa Hahl, Heini-Marja Järvinen, and Kalle Juuti, “Accommodating to English-medium instruction in teacher education in Finland,” International Journal of Applied Linguistics, vol. 26, no. 3, November 2016, pp. 291-310. DOI: 10.1111/ijal.12093. http://www.ingentaconnect.com/content/bpl/ijal/2016/00000026/00000003/art00001.

    Abstract This study analyses teacher educators’ and student teachers’ perceptions of teaching and learning situations in an international English as a lingua franca (ELF) context in an English-medium instruction (EMI) teacher education programme in Finland. The analysis of semi-structured interviews revealed that the participants perceived a partial reversal of traditional teacher and student roles; students assisted voluntarily and teaching became reciprocal. Some teachers reflected on having used strategies typical of ELF contexts, such as code-switching, to further communication and engage students. However, teachers’ lack of fluency was sometimes considered to cause frustration among students and negatively affected the teachers’ feeling of being professional teacher educators. Nevertheless, by increasing learner-led activities, ELF can positively affect teacher education pedagogy.

    Keywords accommodation strategies, co-construction of communication, ELF, EMI, teacher education

  • Hyunkyung Lee, Hyunsub Sim, Eunju Lee, and Dahye Choi, “Disfluency characteristics of children with attention-deficit/hyperactivity disorder symptoms,” Journal of Communication Disorders, 2016. DOI: 10.1016/j.jcomdis.2016.12.001. http://www.sciencedirect.com/science/article/pii/S0021992416302027.

    Abstract The purpose of the current study was to investigate the characteristics of speech disfluency in 15 children with attention-deficit/hyperactivity disorder (ADHD) symptoms and 15 age-matched control children. Reading, story retelling, and picture description tasks were used to elicit utterances from the participants. The findings indicated that children with ADHD symptoms produced significantly more stuttering-like disfluencies (SLD) and other disfluencies (OD) than the control group during all three tasks. Further statistical analysis showed that children with ADHD symptoms produced more OD during the story retelling task than during the other two tasks, whereas no significant differences in OD were observed among the three tasks in the control children. Finally, children with ADHD symptoms exhibited a higher proportion of SLD in total disfluencies (TD) than the control children. These results are consistent with previous studies showing that children with ADHD are disfluent in their verbal production. Furthermore, children with ADHD symptoms seem to be more vulnerable to a speaking task that places greater demands on their attentional resources for language production, resulting in increased speech disfluencies.

    Keywords Stuttering-like disfluency

  • Jennifer A. Foote, and Pavel Trofimovich, “A Multidimensional Scaling Study of Native and Non-Native Listeners’ Perception of Second Language Speech,” Perceptual and Motor Skills, vol. 122, no. 2, March 2016, pp. 470-489. DOI: 10.1177/0031512516636528. http://pms.sagepub.com/content/122/2/470.

    Abstract Second language speech learning is predicated on learners’ ability to notice differences between their own language output and that of their interlocutors. Because many learners interact primarily with other second language users, it is crucial to understand which dimensions underlie the perception of second language speech by learners, compared to native speakers. For this study, 15 non-native and 10 native English speakers rated 30-s language audio-recordings from controlled reading and interview tasks for dissimilarity, using all pairwise combinations of recordings. PROXSCAL multidimensional scaling analyses revealed fluency and aspects of speakers’ pronunciation as components underlying listener judgments but showed little agreement across listeners. Results contribute to an understanding of why second language speech learning is difficult and provide implications for language training.

    Keywords multidimensional scaling, second language speech, speech perception

  • Joana Cholin, Sabrina Heiler, Alexander Whillier, and Martin Sommer, “Premonitory Awareness in Stuttering Scale (PAiS),” Journal of Fluency Disorders, 2016. DOI: 10.1016/j.jfludis.2016.07.001. http://www.sciencedirect.com/science/article/pii/S0094730X16300353.

    Abstract Anticipation of stuttering events in persistent developmental stuttering is a frequent but inadequately measured phenomenon that is of both theoretical and clinical importance. Here, we describe the development and preliminary testing of a German version of the Premonitory Awareness in Stuttering Scale (PAiS), a 12-item questionnaire assessing immediate and prospective anticipation of stuttering that was translated and adapted from the Premonitory Urge for Tics Scale (PUTS) (Woods, Piacentini, Himle, & Chang, 2005). After refining the preliminary PAiS scale in a pilot study, we administered a revised version to 21 adults who stutter (AWS) and 21 age-, gender- and education-matched control participants. Results demonstrated that the PAiS had good internal consistency and discriminated the two speaker groups very effectively, with AWS reporting anticipation of speech disruptions significantly more often than adults with typical speech. Correlations between the PAiS total score and both the objective and subjective measures of stuttering severity revealed that AWS with high PAiS scores produced fewer stuttered syllables. This is possibly because these individuals are better able to adaptively use these anticipatory sensations to modulate their speech. These results suggest that, with continued refinement, the PAiS has the potential to provide clinicians and researchers with a practical and psychometrically sound tool that can quantify how a given AWS anticipates upcoming stuttering events.

    Keywords premonitory awareness

  • Kristen Lucas, Sharon A. Kerrick, Jenna Haugen, and Cole J. Corider, “Communicating Entrepreneurial Passion: Personal Passion vs. Perceived Passion in Venture Pitches,” IEEE Transactions on Professional Communication, vol. 59, no. 4, October 2016, pp. 363-378. DOI: 10.1109/TPC.2016.2607818. http://ieeexplore.ieee.org/document/7604127/.

    Abstract Research problem: Entrepreneurial passion has been shown to play an important role in venture success and, therefore, in investors’ funding decisions. However, it is unknown whether the passion entrepreneurs personally feel or experience can be accurately assessed by investors during a venture pitch. Research questions: (1) To what extent does entrepreneurs’ personal passion align with investors’ perceived passion? (2) To what cues do investors attend when assessing entrepreneurs’ passion? Literature review: Integrating theory and research in entrepreneurship communication and entrepreneurial passion within the context of venture pitching, we explain that during venture pitches, investors make judgments about entrepreneurs’ passion that have consequences for their investment decisions. However, they can attend to only those cues that entrepreneurs outwardly display. As a result, they may not be assessing the passion entrepreneurs personally feel or experience. Methodology: We used a sequential explanatory mixed methods research design. For our data collection, we surveyed 40 student entrepreneurs, videorecorded their venture pitches, and facilitated focus groups with 16 investors who viewed the videos and ranked, rated, and discussed their perceptions of entrepreneurs’ passion. We conducted statistical analyses to assess the extent to which entrepreneurs’ personal passion and investors’ perceived passion aligned. We then performed an inductive analysis of critical cases to identify specific cues that investors attributed to passion or lack thereof. Results and conclusions: We revealed a large misalignment between entrepreneurs’ personal passion and investors’ perceived passion. Our critical case analysis demonstrated that entrepreneurs’ weak or strong presentation skills led investors either to underestimate or overestimate, respectively, perceptions of entrepreneurs’ passion. We suggest that entrepreneurs should develop specific presentation skills and rhetorical strategies for displaying their passion; at the same time, investors should be wary of attending too closely to presentation skills when assessing passion.

    Keywords Communication effectiveness, oral communication, public speaking

  • Lisa Iverach, Mark Jones, Lauren F. McLellan, Heidi J. Lyneham, Ross G. Menzies, Mark Onslow, and Ronald M. Rapee, “Prevalence of anxiety disorders among children who stutter,” Journal of Fluency Disorders, 2016. DOI: 10.1016/j.jfludis.2016.07.002. http://www.sciencedirect.com/science/article/pii/S0094730X16300067.

    Abstract Purpose: Stuttering during adulthood is associated with a heightened rate of anxiety disorders, especially social anxiety disorder. Given the early onset of both anxiety and stuttering, this comorbidity could be present among stuttering children. Method: Participants were 75 stuttering children aged 7–12 years and 150 matched non-stuttering control children. Multinomial and binary logistic regression models were used to estimate odds ratios for anxiety disorders, and two-sample t-tests compared scores on measures of anxiety and psycho-social difficulties. Results: Compared to non-stuttering controls, the stuttering group had six-fold increased odds for social anxiety disorder, seven-fold increased odds for subclinical generalized anxiety disorder, and four-fold increased odds for any anxiety disorder. Conclusion: These results show that, as is the case during adulthood, stuttering during childhood is associated with a significantly heightened rate of anxiety disorders. Future research is needed to determine the impact of those disorders on speech treatment outcomes.

    Keywords stuttering

  • Louise Cummings, Case Studies in Communication Disorders. New York: Cambridge University Press, 2016. get-book.cfm?BookID=109554.

    Abstract Designed for students of speech-language pathology, audiology and clinical linguistics, this valuable text introduces students to all aspects of the assessment, diagnosis and treatment of clients with developmental and acquired communication disorders through a series of structured case studies. Each case study includes questions which direct readers to important features of the case that will facilitate clinical learning. A selection of further readings encourages students to extend their knowledge of communication disorders. Key features of this book include: • 48 detailed case studies based on actual clients with communication disorders • 25 questions within each case study • Fully-worked answers to every question • 105 suggestions for further reading The text also develops knowledge of the epidemiology, aetiology, and linguistic and cognitive features of communication disorders, highlights salient aspects of client histories, and examines assessments and interventions used in the management of clients.

    Keywords cognitive science, General Linguistics, Neurolinguistics, psycholinguistics

  • Carolyn Mancuso, and Raymond G. Miltenberger, “Using habit reversal to decrease filled pauses in public speaking,” Journal of Applied Behavior Analysis, vol. 49, no. 1, 2016, pp. 188–192. DOI: 10.1002/jaba.267. http://dx.doi.org/10.1002/jaba.267.

    Abstract This study evaluated the effectiveness of simplified habit reversal in reducing filled pauses that occur during public speaking. Filled pauses consist of “uh,” “um,” or “er”; clicking sounds; and misuse of the word “like.” After baseline, participants received habit reversal training that consisted of awareness training and competing response training. During postintervention assessments, all 6 participants exhibited an immediate decrease in filled pauses.

    Keywords awareness training, competing response training, habit reversal, public speaking

  • Martijn Wieling, Jack Grieve, Gosse Bouma, Josef Fruehwald, John Coleman, and Mark Liberman, “Variation and Change in the Use of Hesitation Markers in Germanic Languages,” Language Dynamics and Change, vol. 6, no. 2, 2016, pp. 199-234. DOI: 10.1163/22105832-00602001. http://booksandjournals.brillonline.com/content/journals/10.1163/22105832-00602001.

    Abstract In this study, we investigate crosslinguistic patterns in the alternation between UM, a hesitation marker consisting of a neutral vowel followed by a final labial nasal, and UH, a hesitation marker consisting of a neutral vowel in an open syllable. Based on a quantitative analysis of a range of spoken and written corpora, we identify clear and consistent patterns of change in the use of these forms in various Germanic languages (English, Dutch, German, Norwegian, Danish, Faroese) and dialects (American English, British English), with the use of UM increasing over time relative to the use of UH. We also find that this pattern of change is generally led by women and more educated speakers. Finally, we propose a series of possible explanations for this surprising change in hesitation marker usage that is currently taking place across Germanic languages.

    Keywords corpus linguistics, crosslinguistic change, hesitation markers, language change

  • Michael P. Boyle, Lauren Dioguardi, and Julie E. Pate, “A comparison of three strategies for reducing the public stigma associated with stuttering,” Journal of Fluency Disorders, vol. 50, September 2016, pp. 44-58. DOI: 10.1016/j.jfludis.2016.09.004. http://www.sciencedirect.com/science/article/pii/S0094730X16300316.

    Abstract Purpose. The effects of three anti-stigma strategies for stuttering—contact (hearing personal stories from an individual who stutters), education (replacing myths about stuttering with facts), and protest (condemning negative attitudes toward people who stutter)—were examined on attitudes, emotions, and behavioral intentions toward people who stutter. | Method. Two hundred and twelve adults recruited from a nationwide survey in the United States were randomly assigned to one of the three anti-stigma conditions or a control condition. Participants completed questionnaires about stereotypes, negative emotional reactions, social distance, discriminatory intentions, and empowerment regarding people who stutter prior to and after watching a video for the assigned condition, and reported their attitude changes about people who stutter. Some participants completed follow-up questionnaires on the same measures one week later. | Results. All three anti-stigma strategies were more effective than the control condition for reducing stereotypes, negative emotions, and discriminatory intentions from pretest to posttest. Education and protest effects for reducing negative stereotypes were maintained at one-week follow-up. Contact had the most positive effect for increasing affirming attitudes about people who stutter from pretest to posttest and pretest to follow-up. Participants in the contact and education groups, but not protest, self-reported significantly more positive attitude change about people who stutter as a result of watching the video compared to the control group. | Conclusion. Advocates in the field of stuttering can use education and protest strategies to reduce negative attitudes about people who stutter, and people who stutter can increase affirming attitudes through interpersonal contact with others.

    Keywords Anti-stigma programs, Empowerment, Public stigma, Stereotypes, Stuttering advocacy

  • Milly Heelan, Jan McAllister, and Jane Skinner, “Stuttering, alcohol consumption and smoking,” Journal of Fluency Disorders, vol. 48, 2016, pp. 27-34. DOI: 10.1016/j.jfludis.2016.05.001. http://www.sciencedirect.com/science/article/pii/S0094730X1630016X.

    Abstract Purpose: Limited research has been published regarding the association between stuttering and substance use. An earlier study provided no evidence for such an association, but the authors called for further research to be conducted using a community sample. The present study used data from a community sample to investigate whether an association between stuttering and alcohol consumption or regular smoking exists in late adolescence and adulthood. Methods: Regression analyses were carried out on data from a birth cohort study, the National Child Development Study (NCDS), whose initial cohort included 18,558 participants who have since been followed up until age 55. In the analyses, the main predictor variable was parent-reported stuttering at age 16. Parental socio-economic group, cohort member’s sex and childhood behavioural problems were also included. The outcome variables related to alcohol consumption and smoking habits at ages 16, 23, 33, 41, 46, 50 and 55. Results: No significant association was found between stuttering and alcohol consumption or stuttering and smoking at any of the ages. It was speculated that the absence of significant associations might be due to avoidance of social situations on the part of many of the participants who stutter, or adoption of alternative coping strategies. Conclusion: Because of the association between anxiety and substance use, individuals who stutter and are anxious might be found to drink or smoke excessively, but as a group, people who stutter are not more likely than those who do not to have high levels of consumption of alcohol or nicotine.

    Keywords Birth cohort

  • Nadia Brejon Teitler, Sandrine Ferré, and Clémentine Dailly, “Specific subtype of fluency disorder affecting French speaking children: A phonological analysis,” Journal of Fluency Disorders, vol. 50, 2016, pp. 33-43. DOI: 10.1016/j.jfludis.2016.09.002. http://www.sciencedirect.com/science/article/pii/S0094730X16300237.

    Abstract Purpose Clinicians working with fluency disorders sometimes see children whose word repetitions are mostly located at the end of words and do not induce physical tension. Prior studies on the topic have proposed several names for these disfluencies including “end word repetitions”, “final sound repetitions” and “atypical disfluency”. The purpose of this study was to use phonological analysis to explore the patterns of this poorly recognized fluency disorder in order to better understand its specific speech characteristics. Methods We analyzed a spontaneous language sample of 8 French speaking children. Audio and video recordings allowed us to study general communication issues as well as linguistic and acoustical data. Results We did not detect speech rupture or coarticulation failures between the syllable onset and rhyme. The problem resides primarily on the rhyme production with a voicing interruption in the middle of the syllable nucleus or a repetition of the rhyme (nucleus alone or nucleus and coda), regardless of the position in the word or phrase. Conclusion The present study provides data suggesting that there exist major differences in syllable production between the disfluencies produced by our 8 children and stuttered disfluencies. Consequently, we believe that this fluency disorder should be recognized as distinct from stuttering.

    Keywords Syllable rhyme

  • Naomi Hertsberg, and Patricia M. Zebrowski, “Self-perceived competence and social acceptance of young children who stutter: Initial findings,” Journal of Communication Disorders, vol. 64, 2016, pp. 18-31. DOI: 10.1016/j.jcomdis.2016.08.004. http://www.sciencedirect.com/science/article/pii/S0021992416301083.

    Abstract Purpose. The goals of this study were to determine whether young children who stutter (CWS) perceive their own competence and social acceptance differently than young children who do not stutter (CWNS), and to identify the predictors of perceived competence and social acceptance in young speakers. | Method. We administered the "Pictorial Scale of Perceived Competence and Social Acceptance for Young Children" (PSPCSA; Harter & Pike, 1984) to 13 CWS and 14 CWNS and examined group differences. We also collected information on the children’s genders, temperaments, stuttering frequencies, language abilities, and phonological skills to identify which of these factors predicted PSPCSA scores. | Results. CWS, as a group, did not differ from CWNS in their perceived general competence or social acceptance. Gender predicted scores of perceived general competence, and stuttering frequency predicted perceived social acceptance. Temperament, language abilities, and phonological skills were not significant predictors of perceived competence or social acceptance in our sample. | Conclusions. While CWS did not significantly differ from CWNS in terms of perceived competence and social acceptance, when both talker groups were considered together, girls self-reported greater perceived competence than boys. Further, lower stuttering frequency was associated with greater perceived social acceptance. These preliminary findings provide motivation for further empirical study of the psychosocial components of childhood stuttering. | Learning outcomes. Readers will be able to describe the constructs of perceived competence and social acceptance in young children, and whether early stuttering plays a role in the development of these constructs.

    Keywords children

  • Olga Kozar, “Teachers’ reaction to silence and teachers’ wait time in video and audioconferencing English lessons: Do webcams make a difference?,” System, 2016. DOI: http://dx.doi.org/10.1016/j.system.2016.07.002. http://www.sciencedirect.com/science/article/pii/S0346251X16300720.

    Abstract There is a mismatch between an increasing number of people teaching languages via video or audioconferencing tools, and the amount of research available to such teachers to guide their practice. One particular pedagogical question on which research provides no guidance is teachers’ treatment of silence during videoconferencing and audioconferencing lessons. This study uses Conversation Analysis to compare lessons conducted by the same teacher-student dyads in audio and videoconferencing. The findings show distinct differences in teachers’ treatment of silence and teachers’ and students’ pausing behaviour in video and audioconferencing. Specifically, teachers tended to wait longer in videoconferencing and took the conversational floor faster in audioconferencing, thus leading to a higher number of overlaps with students’ emergent turns. This suggests that teachers need to be trained for conducting lessons via audio and video conferencing, and that teachers and teacher trainers need to identify specific pedagogical behaviours for each of these contexts.

    Keywords Online language teaching

  • Mary Grantham O’Brien, “Methodological Choices in Rating Speech Samples,” Studies in Second Language Acquisition, vol. 38, September 2016, pp. 587–605. DOI: 10.1017/S0272263115000418. http://journals.cambridge.org/article_S0272263115000418.

    Abstract Much pronunciation research critically relies upon listeners’ judgments of speech samples, but researchers have rarely examined the impact of methodological choices. In the current study, 30 German native listeners and 42 German L2 learners (L1 English) rated speech samples produced by English-German L2 learners along three continua: accentedness, fluency, and comprehensibility. The goal was to determine whether rating condition, that is, (a) whether each speech sample is rated along all three continua after it is heard once or (b) whether all speech samples are rated along one continuum before being rated along the next continuum, and continuum order (e.g., whether participants rate speech samples for accentedness before comprehensibility or fluency) have an effect on listeners’ ratings. Results indicate no significant overall effects of rating condition or continuum order, but there is evidence of rating condition effects by listener group. The results have implications for laboratory and classroom assessments of L2 speech.

  • Ross Menzies, Sue O’Brian, Robyn Lowe, Ann Packman, and Mark Onslow, “International Phase II clinical trial of CBTPsych: A standalone Internet social anxiety treatment for adults who stutter,” Journal of Fluency Disorders, vol. 48, 2016, pp. 35-43. DOI: http://dx.doi.org/10.1016/j.jfludis.2016.06.002. http://www.sciencedirect.com/science/article/pii/S0094730X16300195.

    Abstract Purpose. CBTPsych is an individualized, fully automated, standalone Internet treatment program that requires no clinical contact or support. It is designed specifically for those who stutter. Two preliminary trials demonstrated that it may be efficacious for treating the social anxiety commonly associated with stuttering. However, both trials involved pre- and post-treatment assessment at a speech clinic. This contact may have increased compliance, commitment and adherence with the program. The present study sought to establish the effectiveness of CBTPsych in a large international trial with no contact of any kind from researchers or clinicians. | Method. Participants were 267 adults with a reported history of stuttering who were given a maximum of 5 months’ access to CBTPsych. Pre- and post-treatment functioning was assessed within the online program with a range of psychometric measures. | Results. Forty-nine participants (18.4%) completed all seven modules of CBTPsych and completed the post-treatment online assessments. That compliance rate was far superior to similar community trials of self-directed Internet mental health programs. Completion of the program was associated with large, statistically and clinically significant reductions for all measures. The reductions were similar to those obtained in earlier trials of CBTPsych, and those obtained in trials of in-clinic CBT with an expert clinician. | Conclusions. CBTPsych is a promising individualized treatment for social anxiety for a proportion of adults who stutter, which requires no health care costs in terms of clinician contact or support. | Educational objectives. The reader will be able to: (a) discuss the reasons for investigating CBTPsych without any clinical contact; (b) describe the main components of the CBTPsych treatment; (c) summarize the results of this clinical trial; (d) describe how the results might affect clinical practice, if at all.

    Keywords Stuttering, Cognitive behavior therapy, E-therapy, Internet

  • Benjamin G. Schultz, Irena O’Brien, Natalie Phillips, David H. McFarland, Debra Titone, and Caroline Palmer, “Speech rates converge in scripted turn-taking conversations,” Applied Psycholinguistics, vol. 37, September 2016, pp. 1201–1220. DOI: 10.1017/S0142716415000545. http://journals.cambridge.org/article_S0142716415000545.

    Abstract When speakers engage in conversation, acoustic features of their utterances sometimes converge. We examined how the speech rate of participants changed when a confederate spoke at fast or slow rates during readings of scripted dialogues. A beat-tracking algorithm extracted the periodic relations between stressed syllables (beats) from acoustic recordings. The mean interbeat interval (IBI) between successive stressed syllables was compared across speech rates. Participants’ IBIs were smaller in the fast condition than in the slow condition; the difference between participants’ and the confederate’s IBIs decreased across utterances. Cross-correlational analyses demonstrated mutual influences between speakers, with greater impact of the confederate on participants’ beat rates than vice versa. Beat rates converged in scripted conversations, suggesting speakers mutually entrain to one another’s beat.

  • Ye Tian, Takehiko Maruyama, and Jonathan Ginzburg, “Self Addressed Questions and Filled Pauses: A Cross-linguistic Investigation,” Journal of Psycholinguistic Research, December 2016, pp. 1–18. DOI: 10.1007/s10936-016-9468-5. http://dx.doi.org/10.1007/s10936-016-9468-5.

    Abstract There is an ongoing debate whether phenomena of disfluency (such as filled pauses) are produced communicatively. Clark and Fox Tree (Cognition 84(1):73–111, 2002) propose that filled pauses are words, and that different forms signal different lengths of delay. This paper evaluates this Filler-As-Words hypothesis by analyzing the distribution of self-addressed questions or SAQs (such as “what’s the word”) in relation to filled pauses. We found that SAQs address different problems in different languages (most frequently about memory retrieval in English and Chinese, and about appropriateness in Japanese). In relation to filled pauses, British but not American English uses “um” to signal a more severe problem than “uh”. Chinese uses different filled pauses to signal the syntactic category of the problem constituent. Japanese uses different filled pauses to signal levels of interaction with the interlocutor. Overall, our data supports the Filler-As-Words hypothesis that filled pauses are used communicatively. However, the dimensions of their meanings vary across languages and dialects.

    Keywords Cross-linguistic analysis, disfluency, filled pauses, Self addressed questions

  • Gunnel Tottie, “Planning what to say: Uh and um among the pragmatic markers,” in Outside the Clause: Form and function of extra-clausal constituents, John Benjamins, 2016, pp. 97-122. https://benjamins.com/#catalog/books/slcs.178.04tot/details.

    Abstract Based on data from the Santa Barbara Corpus of Spoken American English, this paper argues that the vocalizations [ə(:)] and [ə(:)m], usually transcribed 'uh' and 'um', can be regarded as pragmatic markers, rather than as undesirable disfluencies or hesitation markers. It is shown that they are especially frequent in registers and contexts that require more planning by speakers, like narrative passages in conversation and in task-related contexts, especially in long turns. The term 'planner' is therefore proposed as an appropriate designation. Co-occurrences of 'uh' and 'um' with other pragmatic markers such as 'well', 'you know', 'I mean' and 'like', as well as with 'and' and 'but', are shown to support this view.

  • Vincent Hughes, Sophie Wood, and Paul Foulkes, “Strength of forensic voice comparison evidence from the acoustics of filled pauses,” International Journal of Speech Language and the Law, vol. 23, no. 1, 2016, pp. 99-132. DOI: 10.1558/ijsll.v23i1.29874. https://journals.equinoxpub.com/index.php/IJSLL/article/view/29874.

    Abstract This study investigates the evidential value of filled pauses (FPs, i.e. um, uh) as variables in forensic voice comparison. FPs for 60 young male speakers of standard southern British English were analysed, drawn from Task 1 of the DyViS corpus (Nolan et al. 2009). The following acoustic properties were analysed: midpoint frequencies of the first three formants in the vocalic portion; ‘dynamic’ characterisations of formant trajectories (i.e. quadratic polynomial equations fitted to nine measurement points over the entire vowel); vowel duration; and nasal duration for um. Likelihood ratio (LR) scores were computed using the Multivariate Kernel Density formula (MVKD; Aitken and Lucy, 2004) and converted to calibrated log10 LRs (LLRs) using logistic regression (Brümmer et al., 2007). System validity was assessed using both equal error rate (EER) and the log LR cost function (Cllr; Brümmer and du Preez, 2006). The system with the best performance combines dynamic measurements of all three formants with vowel and nasal duration for um, achieving an EER of 4.08% and Cllr of 0.12. In terms of general patterns, um consistently outperformed uh. For um, the formant dynamic systems generated better validity than those based on midpoints, presumably reflecting the additional degree of formant movement in um caused by the transition from vowel to nasal. By contrast, midpoints outperformed dynamics for the more monophthongal uh. Further, the addition of duration (vowel or vowel and nasal) consistently improved system performance. The study supports the view that FPs have excellent potential as variables in forensic voice comparison cases.

    Keywords durations, Forensic voice comparison, formant dynamics, hesitation markers, likelihood ratio

  • Vincenza Tudini, “Repair and codeswitching for learning in online intercultural talk,” System, 2016. DOI: http://dx.doi.org/10.1016/j.system.2016.06.011. http://www.sciencedirect.com/science/article/pii/S0346251X16300641.

    Abstract This study examines the role of repair and code switching for language learning in online written interaction between two speakers of both Italian and English as, respectively, either an L1 or L2. Specifically, during episodes of general repair and corrective feedback, these geographically dispersed university language students used both languages in their repertoire as key interactional and learning resources to co-construct a language learning partnership and pursue affiliation. Despite the face-threatening nature of corrective feedback, also known as other-initiated other-repair, participants managed to construct and maintain intersubjectivity in the text chat environment by availing themselves of the reciprocal possibilities of their bilingual expertise, thus overcoming linguistic asymmetries. In this way both social and learning objectives were achieved during written talk-in-interaction, suggesting that online language learning partnerships with multilingual intercultural speakers of the target language rather than monolingual native speaker partners should be given a more prominent role in languages programs across sectors.

    Keywords Written talk-in-interaction

  • Yvonne Préfontaine, Judit Kormos, and Daniel Ezra Johnson, “How do utterance measures predict raters’ perceptions of fluency in French as a second language?,” Language Testing, vol. 33, no. 1, 2016, pp. 53-73. DOI: 10.1177/0265532215579530. http://dx.doi.org/10.1177/0265532215579530.

    Abstract While the research literature on second language (L2) fluency is replete with descriptions of fluency and its influence with regard to English as an additional language, little is known about what fluency features influence judgments of fluency in L2 French. This study reports the results of an investigation that analyzed the relationship between utterance fluency measures and raters’ perceptions of L2 fluency in French using mixed-effects modeling. Participants were 40 adult learners of French at varying levels of proficiency, studying in a university immersion context. Speech performances were collected on three different types of narrative tasks. Four utterance fluency measures were extracted from each performance. Eleven untrained judges rated the speech performances and we examined which utterance fluency measures are the best predictors of the scores awarded by the raters. The mean length of runs and articulation rate proved to be the most influential factors in raters’ judgments, while the frequency of pauses played a less important role. The length of pauses was positively related to fluency scores, indicating a prominent cross-linguistic variation specific to French. The relative importance of the utterance measures in predicting fluency ratings, however, was found to vary across tasks.

  • Peyman Zamani, Majid Ravanbakhsh, Farzad Weisi, Vahid Rashedi, Sara Naderi, Ayub Hosseinzadeh, and M Rezaei, “Effect(s) of Language Tasks on Severity of Disfluencies in Preschool Children with Stuttering,” Journal of Psycholinguistic Research, May 2016. DOI: 10.1007/s10936-016-9437-z. http://dx.doi.org/10.1007/s10936-016-9437-z.

    Abstract Speech disfluency in children can be increased or decreased depending on the type of linguistic task presented to them. In this study, the effect of sentence imitation and sentence modeling on the severity of speech disfluencies in preschool children with stuttering is investigated. In this cross-sectional descriptive analytical study, 58 children with stuttering (29 with mild stuttering and 29 with moderate stuttering) and 58 typical children aged between 4 and 6 years old participated. The severity of speech disfluencies was assessed by SSI-3 and TOCS before and after offering each task. In boys with mild stuttering, the mean stuttering severity scores in the two tasks of sentence imitation and sentence modeling were 21.81±1.72 and 12.94±1.38 respectively (P=0.837). By contrast, in boys with moderate stuttering the stuttering severity in the two tasks was 23.79±1.26 and 29.00±2.03 respectively (P=0.004). In girls with mild stuttering, the stuttering severity in the two tasks of sentence imitation and sentence modeling was 13.14±2.47 and 13.86±2.03 respectively (P=0.094). By contrast, in girls with moderate stuttering the mean stuttering severity in the two tasks was 25.27±1.93 and 33.18±2.32 respectively (P=0.007). In typical children of both genders, the score of speech disfluencies did not differ significantly between the two tasks (P>0.05). In preschool children with mild stuttering and their peers who do not stutter, performing the tasks of sentence imitation and sentence modeling did not increase the severity of stuttering. However, in preschool children with moderate stuttering, doing the task of sentence modeling increased the stuttering severity score.

2015

  • Malte Belz, and Uwe Reichel, “Pitch Characteristics of Filled Pauses,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract We investigate the pitch characteristics of filled pauses in order to distinguish between hesitational and floor-holding functions of filled pauses. A corpus of spontaneous dialogues is explored using a parametric bottom-up approach to extract intonation contours. We find that subjects tend to utter filled pauses more prominently when they cannot see each other, which indicates an increased floor-holding usage of filled pauses in this condition.

    Keywords disfluencies, DiSS, filled pauses, floor-holding, intonation

  • Hans Rutger Bosker, and Eva Reinisch, “Normalization for Speechrate in Native and Nonnative Speech,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0324.1-5. http://www.icphs2015.info/pdfs/Papers/ICPHS0324.pdf.

    Abstract Speech perception involves a number of processes that deal with variation in the speech signal. One such process is normalization for speechrate: local temporal cues are perceived relative to the rate in the surrounding context. It is as yet unclear whether and how this perceptual effect interacts with higher level impressions of rate, such as a speaker’s nonnative identity. Nonnative speakers typically speak more slowly than natives, an experience that listeners take into account when explicitly judging the rate of nonnative speech. The present study investigated whether this is also reflected in implicit rate normalization. Results indicate that nonnative speech is implicitly perceived as faster than temporally-matched native speech, suggesting that the additional cognitive load of listening to an accent speeds up rate perception. Therefore, rate perception in speech is not dependent on syllable durations alone but also on the ease of processing of the temporal signal.

    Keywords cognitive load, implicit processing, nonnative speech, speech perception, speechrate

  • Hans Rutger Bosker, Jade Tjiong, Hugo Quené, Ted Sanders, and Nivja De Jong, “Both native and non-native disfluencies trigger listeners’ attention,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract Disfluencies, such as uh and uhm, are known to help the listener in speech comprehension. For instance, disfluencies may elicit prediction of less accessible referents and may trigger listeners’ attention to the following word. However, recent work suggests differential processing of disfluencies in native and non-native speech. The current study investigated whether the beneficial effects of disfluencies on listeners’ attention are modulated by the (non-)native identity of the speaker. Using the Change Detection Paradigm, we investigated listeners’ recall accuracy for words presented in disfluent and fluent contexts, in native and non-native speech. We observed beneficial effects of both native and non-native disfluencies on listeners’ recall accuracy, suggesting that native and non-native disfluencies trigger listeners’ attention in a similar fashion.

    Keywords attention, Change Detection Paradigm, disfluencies, DiSS, non-native speech

  • Angelika Braun, and Annabelle Rosin, “On the Speaker-Specificity of Hesitation Markers,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0731.1-5. http://www.icphs2015.info/pdfs/Papers/ICPHS0731.pdf.

    Abstract The occurrence of hesitation markers is generally considered to be part of the verbal planning process. It is also a feature which is of potential importance to the forensic application of phonetics if hesitation behaviour could be linked to individual speakers. This study examines a total of eight female speakers on three different days. It can be demonstrated that, even though results vary across sessions, subjects exhibit distinct patterns of hesitation marker usage. This pertains to the number as well as the type of hesitation markers, which makes this feature a potential candidate for forensic investigations.

    Keywords forensic phonetics, verbal planning

  • Vera Cabarrão, Helena Moniz, Jaime Ferreira, and Fernando Batista, “Prosodic Classification of Discourse Markers,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0634.1-5. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/Papers/ICPHS0634.pdf.

    Abstract The first contribution of this study is the description of the prosodic behavior of discourse markers present in two speech corpora of European Portuguese (EP) in different domains (university lectures, and map-task dialogues). The second contribution is a multiclass classification to verify, given their prosodic features, which words in both corpora are classified as discourse markers, which are disfluencies, and which correspond to words that are neither markers nor disfluencies (chunks). Our goal is to automatically predict discourse markers and include them in rich transcripts, along with other structural metadata events (e.g., disfluencies and punctuation marks) that are already encompassed in the language models of our in-house speech recognizer. Results show that the automatic classification of discourse markers is better for the lectures corpus (87%) than for the dialogue corpus (84%). Nonetheless, in both corpora, discourse markers are more easily confused with chunks than with disfluencies.

    Keywords Dialogues, Discourse markers, Lectures, prosody, Structural Metadata Events

  • Rasmus Dall, Mirjam Wester, and Martin Corley, “Disfluencies in change detection in natural, vocoded and synthetic speech,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract In this paper, we investigate the effect of filled pauses, a discourse marker and silent pauses in a change detection experiment in natural, vocoded and synthetic speech. In natural speech, change detection has been found to increase in the presence of filled pauses; we extend this work by replicating earlier findings and exploring the effect of a discourse marker (like) and of silent pauses. Furthermore, we report how the use of "unnatural" speech, namely synthetic and vocoded, affects change detection rates. The filled pauses, the discourse marker and silent pauses were all found to increase change detection rates in natural speech; however, this effect appeared in neither synthetic nor vocoded speech. Rather, change detection rates decreased in both types of "unnatural" speech compared to natural speech. The natural-speech results suggest that while each type of pause increases detection rates, the type of pause may have a further effect. The "unnatural" results suggest that it is not the full pipeline of synthetic speech that causes the degradation, but rather that something in the pre-processing, i.e. vocoding, of the speech database limits the resulting synthesis.

    Keywords change detection, DiSS, filled pauses, speech synthesis

  • Nivja H. de Jong, Rachel Groenhout, Rob Schoonen, and Jan H. Hulstijn, “Second language fluency: Speaking style or proficiency? Correcting measures of second language fluency for first language behavior,” Applied Psycholinguistics, vol. 36, no. 2, March 2015, pp. 223-243. DOI: 10.1017/S0142716413000210. http://journals.cambridge.org/article_S0142716413000210.

    Abstract In second language (L2) research and testing, measures of oral fluency are used as diagnostics for proficiency. However, fluency is also determined by personality or speaking style, raising the question to what extent L2 fluency measures are valid indicators of L2 proficiency. In this study, we obtained a measure of L2 (Dutch) proficiency (vocabulary knowledge), L2 fluency measures, and fluency measures that were corrected for first language behavior from the same group of Turkish and English native speakers (N = 51). For most measures of fluency, except for silent pause duration, both the corrected and the uncorrected measures significantly predicted L2 proficiency. For syllable duration, the corrected measure was a stronger predictor of L2 proficiency than was the uncorrected measure. We conclude that for L2 research purposes, as well as for some types of L2 testing, it is useful to obtain corrected measures of syllable duration to measure L2-specific fluency.

  • Mark Dingemanse, Seán G. Roberts, Julija Baranova, Joe Blythe, Paul Drew, Simeon Floyd, Rosa S. Gisladottir, Kobin H. Kendrick, Stephen C. Levinson, Elizabeth Manrique, Giovanni Rossi, and N. J. Enfield, “Universal Principles in the Repair of Communication Problems,” PLoS ONE, vol. 10, no. 9, September 2015, pp. e0136100. DOI: 10.1371/journal.pone.0136100. http://dx.doi.org/10.1371%2Fjournal.pone.0136100.

    Abstract There would be little adaptive value in a complex communication system like human language if there were no ways to detect and correct problems. A systematic comparison of conversation in a broad sample of the world’s languages reveals a universal system for the real-time resolution of frequent breakdowns in communication. In a sample of 12 languages of 8 language families of varied typological profiles we find a system of ‘other-initiated repair’, where the recipient of an unclear message can signal trouble and the sender can repair the original message. We find that this system is frequently used (on average about once per 1.4 minutes in any language), and that it has detailed common properties, contrary to assumptions of radical cultural variation. Unrelated languages share the same three functionally distinct types of repair initiator for signalling problems and use them in the same kinds of contexts. People prefer to choose the type that is the most specific possible, a principle that minimizes cost both for the sender being asked to fix the problem and for the dyad as a social unit. Disruption to the conversation is kept to a minimum, with the two-utterance repair sequence being on average no longer than the single utterance which is being fixed. The findings, controlled for historical relationships, situation types and other dependencies, reveal the fundamentally cooperative nature of human communication and offer support for the pragmatic universals hypothesis: while languages may vary in the organization of grammar and meaning, key systems of language use may be largely similar across cultural groups. They also provide a fresh perspective on controversies about the core properties of language, by revealing a common infrastructure for social interaction which may be the universal bedrock upon which linguistic diversity rests.

  • Stephanie Don, and Robin Lickley, “Uh I forgot what I was going to say: How memory affects fluency,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract Disfluency rates vary considerably between individuals. Previous studies have considered gender, age and conversational roles amongst other factors that may affect fluency. Testing a nonclinical, gender-balanced population of young adults performing the same speaking tasks, this study explores how inter-speaker variations in working memory and in long-term (lexical) memory affect disfluency in two different ways. Working memory was tested by a forward digit span test; long-term lexical memory was tested by the Verbal Fluency Test, both semantic and phonological versions. In addition, each participant provided 3 one-minute samples of monologue speech. The speech samples were analysed for disfluencies. Speakers with lower working memory scores produced more error repairs in running speech. Speakers with lower lexical access scores produced a higher rate of hesitations. The two types of memory affected fluency in different ways.

    Keywords DiSS, error repair, hesitation, long term lexical memory, working memory

  • Robert Eklund, Peter Fransson, and Martin Ingvar, “Neural correlates of the processing of unfilled and filled pauses,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract Spontaneously produced Unfilled Pauses (UPs) and Filled Pauses (FPs) were played to subjects in an fMRI experiment. While both stimuli resulted in increased activity in the Primary Auditory Cortex, FPs, unlike UPs, also elicited modulation in the Supplementary Motor Area, Brodmann Area 6. This observation provides neurocognitive confirmation of the oft-reported difference between FPs and other kinds of speech disfluency and also could provide a partial explanation for the previously reported beneficial effect of FPs on reaction times in speech perception. The results are discussed in the light of the suggested role of FPs as floor-holding devices in human polylogs.

    Keywords Auditory Cortex, BA6, Brodmann Area 6, DiSS, filled pauses, fMRI, PAC, SMA, speech disfluency, speech perception, spontaneous speech, Supplementary Motor Area, unfilled pauses

  • Ewa Guz, “Establishing the Fluency Gap Between Native and Non-Native-Speech,” Research in Language, vol. 13, no. 3, 2015. DOI: 10.1515/rela-2015-0021. https://www.degruyter.com/view/j/rela.2015.13.issue-3/rela-2015-0021/rela-2015-0021.xml.

    Abstract Although various dimensions of speech fluency have so far generated a great deal of research interest, very few accounts have tackled the issue of the relationship between L1 and L2 fluency. Also, little empirical evidence has been provided to support the claim that language users are more fluent in their mother tongue than in a foreign/second language. This study examines the fluency gap between L1 and L2 fluency using a battery of objectively quantifiable temporal measures of speed and breakdown fluency. It also attempts to identify those temporal fluency variables which are affected by the individual way of speaking rather than the degree of automatisation of speech processing and which underlie oral performance both in L1 and L2. The analysis draws on transcriptions of elicited speech samples in L1 (Polish) and L2 (English).

    Keywords breakdown fluency, hesitation phenomena, L1/L2 speech fluency, pausing, speech rate, speed fluency, temporal measures of fluency

  • Elena Galkina, “Processing of Garden-Path Sentences Containing Silent and Filled Pauses in Stuttered Speech: Evidence From a Comprehensive Study,” Master's Thesis, University of South Carolina - Columbia, Columbia, South Carolina, USA, 2015. http://scholarcommons.sc.edu/etd/3139.

    Abstract Disfluency is common in spontaneous speech. Self-correction is a type of disfluency that consists of reparandum, filler, and repair (Levelt, 1989). Little is known about the processing of self-corrections in normally disfluent speech, and even less is known about its processing in atypically disfluent speech (e.g. speech in patients with autism spectrum disorder, hearing-impaired individuals, patients with brain damage, and stuttered speech; see: Lake, Humphreys, & Cardy, 2011; Lind, Hickson, & Erber, 2004; Plexico et al., 2010; Rossi et al., 2011; Yairi, Gintautas, & Avent, 1981). This study focuses on self-correction disfluencies in garden-path sentences and employs a behavioral data collection method to investigate how disfluencies are processed as they are heard. This experiment examines spoken language comprehension by measuring accuracy and response time to comprehension questions. The data was gathered and analyzed. Two experimental conditions were presented: in the first, normal speakers listened to typically disfluent speech; in the second, normal speakers listened to atypically disfluent stuttered speech. The information about the speakers in the recorded stimuli was kept from the listeners. Fillers such as uh and um are common in stuttered speech because of their helpful role in starting an utterance. In stuttered speech, the uhs, ums and pauses tend to be longer and in odd places, relative to the speech of people who do not stutter. Therefore, the hypothesis of this study was that the fillers and pauses made by people who stutter affect the dynamics of processing, particularly in garden-path sentences. Namely, the accuracy rate for the comprehension questions was predicted to be lower for the garden-path filled pause sentences, particularly for the atypical speaker condition. Reaction time was predicted to be longer for the same condition.
    The analysis revealed a dependence of the accuracy measure on the speaker condition but no significant time correlation. This study provides significant information about how normal speakers’ comprehension is affected by disfluency such as pauses in general, and how speech impairment, such as stuttering, affects the processing of filled and silent pause disfluencies.

  • Lorenzo García-Amaya, “A longitudinal study of filled pauses and silent pauses in second language speech,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract This study provides a longitudinal analysis of speech rate and the use of filled pauses (FPs) and unfilled or silent pauses (SPs) in the oral production of L2 learners of Spanish in two learning contexts: a 6-week intensive overseas immersion program (OIM) and a 15-week US-based ‘at-home’ foreign language classroom (AH). Fifty-six native speakers of English performed two video-retell tasks at three different time points. A total of five measurements of oral production were calculated. The results show a significant increase in rate of speech over time in the OIM group compared to the AH group. Additionally, the OIM learners show greater use of “disfluencies” over time, namely FPs and short SPs. We suggest that OIM learners increase their use of hesitation phenomena over time as a speech processing and planning strategy, and we discuss this finding within the framework of L2 cognitive fluency.

    Keywords disfluencies, DiSS, filled pauses, rate of speech, second language fluency, silent pauses, Spanish, study abroad

  • Emer Gilmartin, Carl Vogel, and Nick Campbell, “Disfluency in multiparty social talk,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract Much research on disfluencies in spontaneous spoken interaction has been carried out on corpora of task-based conversations, resulting in greater understanding of the role of several phenomena. Modern multimodal corpora allow the full spectrum of signals in face to face communication to be analysed. However, the ‘unmarked’ case of casual conversation or social talk with no obvious short-term instrumental goal has been less studied in this manner. Corpus-based work on social talk tends to deal with short dyadic interactions, although the norm for social conversation is for longer multiparty interaction. In this paper, we outline our programme of exploratory studies of disfluency in a longer multiparty conversation. We briefly describe the background to our research goals, and then report on the collection, transcription, and annotation of the data for our experiments. We present and discuss some of our early results.

    Keywords casual conversation, disfluency, DiSS, hesitation, repair, spoken interaction

  • Iulia Grosman, “Complexity cues or attention triggers? Repetitions and editing terms for native speakers of French,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract A growing stream of research shows evidence of the metalinguistic information that disfluencies (silent and filled pauses, repetitions, false starts, repairs, etc.) can display to listeners. As a result, disfluencies may work as fluent devices. Using decision-task latencies, this study investigates whether lexical repetition co-occurring with an editing term affects the perception of native speakers of French. There is a lack of consensus in the literature: do disfluencies trigger conceptual priming of a complex entity, or do they act simply as attention cues? Results from multiple analyses of variance and linear mixed-effects modelling show that the presence of a disfluency triggers a faster response from the participant, however complex the following noun phrase might be, supporting the hypothesis that repetitions and co-occurring editing terms act as cognitive signposts rather than as cues of the complexity of an upcoming event.

    Keywords disfluencies, DiSS, French, perception, prosody, reaction time, repetitions

  • Sandra Götz, “Fluency in ENL, ESL and EFL: A corpus-based approach,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract Against the background of a ‘cline model’ of increasing fluency/decreasing disfluency from ENL to ESL to EFL forms of English, the present pilot study investigates (dis)fluency features in British English, Sri Lankan English and German Learner English. The analysis of selected variables of temporal fluency (viz. unfilled pauses, mean length of runs) and fluency-enhancement strategies (viz. discourse markers, smallwords and repeats) is based on the c. 40,000-word subcorpora of the British and the Sri Lankan components of the International Corpus of English (ICE-GB and ICE-SL) and the c. 80,000-word German component of the Louvain International Database of Spoken English Interlanguage (LINDSEI-GE). The study reveals that, while the EFL variant shows the lowest degree of temporal fluency (e.g. the highest number of unfilled pauses), the findings are mixed for ESL and ENL (e.g. the ESL speakers show a lower number of unfilled pauses, but the ENL speakers show a higher number of smallwords). Also, variant-specific preferences of using certain fluency-enhancement strategies become clearly visible.

    Keywords corpus-based (dis)fluency, DiSS, ENL vs. ESL vs. EFL, Fluency, fluency profiles

  • Zara Harmon, and Vsevolod Kapatsinski, “Studying the dynamics of lexical access using disfluencies,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract Faced with planning problems related to lexical access, speakers take advantage of a major function of disfluencies: buying time. It is reasonable, then, to expect that the structure of disfluencies sheds light on the mechanisms underlying lexical access. Using data from the Switchboard Corpus, we investigated the effect of semantic competition during lexical access on repetition disfluencies. We hypothesized that the more time the speaker needs to access the following unit, the longer the repetition. We examined the repetitions preceding verbs and nouns and tested predictors influencing the accessibility of these items. Results suggest that speed of lexical access negatively correlates with the length of repetition and that the main determinants of lexical access speed differ for verbs and nouns. Longer disfluencies before verbs appear to be due to significant paradigmatic competition from semantically similar verbs. For nouns, they occur when the noun is relatively unpredictable given the preceding context.

    Keywords DiSS, lexical access, lexicalization, repetition, semantic competition, sentence planning

  • Clara Hedenqvist, Frida Persson, and Robert Eklund, “Disfluency incidence in 6-year-old Swedish boys and girls with typical language development,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract This paper reports the prevalence of disfluencies in a group of 55 (25F/30M) Swedish children with typical speech development, aged between 6;0 and 6;11. All children had Swedish as their mother tongue. Speech was elicited using an “event picture” which the children described in their own, spontaneously produced, words. The data were analysed with regard to sex differences and lexical ability, including size of vocabulary and word retrieval, which were assessed using two tests, the Peabody Picture Vocabulary Test and Ordracet. Results showed that girls produced significantly more unfilled pauses, prolongations and sound repetitions, while boys produced more word repetitions. However, no correlation with lexical development was found. The results are of interest to speech pathologists who study early speech development in search of potential early predictors of speech pathologies.

    Keywords children, DiSS, lexical development, sex differences, speech disfluency

  • Julian Hough, Laura de Ruiter, Simon Betz, and David Schlangen, “Disfluency and laughter annotation agreement in a light-weight dialogue mark-up protocol,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract Despite a great deal of research effort, disfluency and laughter annotation is still an unsolved problem, both in terms of consensus on a generally applicable system and in terms of annotation agreement metrics. In this paper we present a new annotation scheme within a light-weight mark-up for spontaneous speech. We show that, despite the low overhead required for understanding the annotation protocol, it allows for good inter-annotator agreement and can be mapped onto existing disfluency categorizations with no loss of information.

    Keywords disfluency annotation, DiSS, German corpora, inter-annotator agreement, laughter, spontaneous speech

  • Peter Howell, “Intervention for children with word-finding difficulty: Impact on fluency during spontaneous speech for children using English as their native or as an additional language,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract Types of intervention that could be targeted when there are high rates of word-finding difficulty were examined for any impact they had on speech fluency (whole-word repetition rate in particular). Results are reported that are interpreted as showing that a semantic-based intervention has an impact on fluency as well as word-finding.

    Keywords DiSS, EAL, intervention, stuttering, word-finding

  • Jennifer E. Mack, Sarah D. Chandler, Aya Meltzer-Asscher, Emily Rogalski, Sandra Weintraub, M.-Marsel Mesulam, and Cynthia K. Thompson, “What do pauses in narrative production reveal about the nature of word retrieval deficits in PPA?,” Neuropsychologia, vol. 77, 2015, pp. 211-222. DOI: http://dx.doi.org/10.1016/j.neuropsychologia.2015.08.019. http://www.sciencedirect.com/science/article/pii/S0028393215301354.

    Abstract Naming and word-retrieval deficits, which are common characteristics of primary progressive aphasia (PPA), differentially affect production across word classes (e.g., nouns, verbs) in some patients. Individuals with the agrammatic variant (PPA-G) often show greater difficulty producing verbs whereas those with the semantic variant (PPA-S) show greater noun deficits and those with logopenic PPA (PPA-L) evince no clear-cut differences in production of the two word classes. To determine the source of these production patterns, the present study examined word-finding pauses as conditioned by lexical variables (i.e., word class, frequency, length) in narrative speech samples of individuals with PPA-S (n=12), PPA-G (n=12), PPA-L (n=11), and cognitively healthy controls (n=12). We also examined the relation between pause distribution and cortical atrophy (i.e., cortical thickness) in nine left hemisphere regions of interest (ROIs) linked to word production. Results showed higher overall pause rates for PPA compared to unimpaired controls; however, greater naming severity was not associated with increased pause rate. Across all groups, more pauses were produced before lower vs. higher frequency words, with no independent effects of word length after controlling for frequency. With regard to word class, the PPA-L group showed a higher rate of pauses prior to production of nouns compared to verbs, consistent with noun-retrieval deficits arising at the lemma level of word production. Those with PPA-G and PPA-S, like controls, produced similar pause rates across word classes; however, lexical simplification (i.e., production of higher-frequency and/or shorter words) was evident in the more-impaired word class: nouns for PPA-S and verbs for PPA-G. 
These patterns are consistent with conceptual and/or lemma-level impairments for PPA-S, predominantly affecting objects/nouns, and a lemma-level verb-retrieval deficit for PPA-G, with a concomitant impairment in phonological encoding and articulation affecting overall pause rates. The greater tendency to pause before nouns was correlated with atrophy in the left precentral gyrus, inferior frontal gyrus and inferior parietal lobule, whereas the greater tendency to pause before less frequent and longer words was associated with atrophy in left precentral and inferior parietal regions.

    Keywords Brain–behavior relationship

  • Hanae Koiso, and Yasuharu Den, “Causal analysis of acoustic and linguistic factors related to speech planning in Japanese monologs,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract In this paper, we applied a general method of testing path models, investigating the causal relationships between cognitive load in speech planning and four types of disfluencies in Japanese monologs. The four disfluencies examined were i) clause-initial fillers, ii) inter-clausal pauses, iii) clause-final lengthening, and iv) boundary pitch movements, which occurred at weak clause boundaries. The length of the constituents following weak clause boundaries was assumed to be a measure of the complexity affecting the cognitive load. By using a model selection technique based on the AIC, we found an optimal model with the smallest AIC, in which the constituent complexity had direct effects on all four disfluency variables. In addition, some of the disfluencies influenced one another: clause-final lengthening was enhanced by the presence of a boundary pitch movement, and the occurrence of clause-initial fillers was affected by all three of the other disfluency variables.

    Keywords boundary pitch movements, clause-final lengthening, DiSS, fillers, path models, pauses

  • Marie-José Kolly, Adrian Leemann, Philippe Boula de Mareüil, and Volker Dellwo, “Speaker-Idiosyncrasy in Pausing Behavior: Evidence from a Cross-Linguistic Study,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0294.1-5. http://www.icphs2015.info/pdfs/Papers/ICPHS0294.pdf.

    Abstract Phoneticians study acoustic speech signals. But what about the aspects of speech where the signal is silent? The present study investigated speakers’ pausing behavior in their native and non-native speech. Pausing measures were applied in order to study between-speaker and within-speaker variability, where within-speaker variability was introduced by recording speakers in their native Zurich German, and in their second languages English and French. Results showed that pausing measures in the form of pause numbers and pause durations are speaker-specific. Furthermore, this speaker-specificity became evident across different languages. Results are discussed in the context of forensic voice comparison.

    Keywords forensic phonetics, pausing, second language, speaker-idiosyncrasy, temporal features

  • Jixing Li, and Sam Tilsen, “Phonetic Evidence for Two Types of Disfluency,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0766.1-5. http://www.icphs2015.info/pdfs/Papers/ICPHS0766.pdf.

    Abstract Disfluency, such as pause (silences), filled pause (e.g., ‘um’, ‘uh’), repetition (e.g., ‘the the’) and cutoff word (e.g., ‘hori[zontal]-’), is a common part of human speech that occurs at a rate of 6 to 10 per 100 words [2, 5]. According to one model of speech production [8], there are two types of disfluency: disfluency at the internal planning stage (e.g., word-retrieval difficulties), and disfluency at the external monitoring stage (e.g., self-correction of speech errors). The current study provides phonetic evidence for the two types of disfluency by examining word durations before different types of disfluency in the Switchboard corpus [6]. The results showed only a marginal increase in the durations of words before cutoffs, but a large increase in the durations of words before repetitions, silences and filled pauses, suggesting internal processing difficulty before noncutoff disfluency, but not before cutoff disfluency.

    Keywords disfluency, duration, self-monitoring, Switchboard

  • Yan-Hua Long, and Hong Ye, “Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech,” PLoS ONE, vol. 10, no. 4, April 2015. DOI: 10.1371/journal.pone.0123466.

    Abstract Nowadays, although automatic speech recognition has become quite proficient at recognizing or transcribing well-prepared fluent speech, the transcription of speech that contains many disfluencies, such as spontaneous conversational and lecture speech, remains problematic. Filled pauses (FPs) are the most frequently occurring disfluencies in this type of speech. FPs are widely believed to increase the error rates of state-of-the-art speech transcription, primarily because most FPs are not well annotated or provided in training data transcriptions, and because FPs share acoustic characteristics with some common non-content words. To enhance the speech transcription system, we propose a new automatic refinement approach to detect FPs in British English lecture speech transcription. This approach combines the pronunciation probabilities for each word in the dictionary with acoustic language model scores for FP refinement through a modified speech recognition forced-alignment framework. We evaluate the proposed approach on the Reith Lectures speech transcription task, in which only imperfect training transcriptions are available. Successful results are achieved for both the development and evaluation datasets. Acoustic models trained on different speech genres have been investigated with respect to FP refinement. To further validate the effectiveness of the proposed approach, speech transcription performance has also been examined using systems built on training data transcriptions with and without FP refinement.

  • Kikuo Maekawa, and Hiroki Mori, “Voice quality analysis of Japanese filled pauses: a preliminary report,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract Using the Core of the Corpus of Spontaneous Japanese, an acoustic analysis of F1, spectral tilt (TL), H1-H2, jitter and F0 was conducted to examine the voice-quality difference between the vowels in filled pauses and those in ordinary lexical items. A simple SVM analysis showed that the two classes of vowels could be discriminated with a mean accuracy higher than 70%.

    Keywords DiSS

  • Kirsty McDougall, Martin Duckworth, and Toby Hudson, “Individual and Group Variation in Disfluency Features: A Cross-Accent Investigation,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0308.1-5. http://www.icphs2015.info/pdfs/Papers/ICPHS0308.pdf.

    Abstract A study of individual differences in the fluency disruptions of speakers of two different accents, Standard Southern British English (SSBE) and York English is presented. Distributions of rates of occurrence per 100 syllables are examined for filled and silent pauses, repetitions, prolongations and (self-)interruptions, and subcategories of these. Patterns of occurrence of disfluency features show considerable between-speaker variation in both SSBE and York English. Similar ranges of speakers’ overall disfluency rates are exhibited by both accents, but cross-accent differences are present in the patterning of some disfluency feature categories. The results suggest that a detailed record of disfluency features is a useful additional tool in forensic speaker comparison.

    Keywords accent differences, disfluency, forensic speaker comparison, individual differences

  • Helena Moniz, Jaime Ferreira, Fernando Batista, and Isabel Trancoso, “Disfluency detection across domains,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract This paper focuses on disfluency detection across distinct domains using a large set of openSMILE features, derived from the Interspeech 2013 Paralinguistic challenge. Among the different machine learning methods applied, SVMs achieved the best performance. Feature selection experiments revealed that the dimensionality of the larger feature set can be further reduced at the cost of a small degradation. Models trained on one corpus were tested on the other corpus, revealing that models can be quite robust across corpora for this task, despite their distinct nature. We have conducted additional experiments aiming at disfluency prediction in the context of IVR systems, and the results reveal no substantial degradation in performance, encouraging the use of the models in IVR domains.

    Keywords acoustic-prosodic features, cross-domain analysis, disfluency detection, DiSS, European Portuguese

  • Helena Moniz, A. Pompili, Fernando Batista, Isabel Trancoso, A. Abad, and C. Amorim, “Automatic Recognition of Prosodic Patterns in Semantic Verbal Fluency Tests – An Animal Naming Task for Edutainment Applications,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0997.1-5. http://www.icphs2015.info/pdfs/Papers/ICPHS0997.pdf.

    Abstract This paper automatically detects prosodic patterns in the domain of semantic fluency tests. Verbal fluency tests aim at evaluating the spontaneous production of words under constrained conditions. Mostly used for assessing cognitive impairment, they can also be used in a plethora of domains, such as edutainment applications or games with educational purposes. This work discriminates between list effects, disfluencies, and other linguistic events in an animal naming task. Recordings from 42 Portuguese speakers were automatically recognized, and AuToBI was applied in order to detect prosodic patterns, using both European Portuguese and English models. Both models made it possible to differentiate list effects from the other events, mostly represented by the tunes: L* H/L(-%) (English models) or L*+H H/L(-%) (Portuguese models). However, the English models proved to be more suitable because they rely on substantially more training material.

    Keywords Automatic Speech Recognition, Edutainment, prosody, Semantic Fluency

  • Sieb Nooteboom, and Hugo Quené, “The Word-Onset Effect: Some Contradictory Findings,” 2015. http://www.siebnooteboom.nl/files/pdf/Diss2015WordOnsetsSomeContradictoryFindings.pdf.

    Abstract In this paper we describe two experiments exploring possible reasons for earlier conflicting results concerning the so-called word-onset effect in interactional segmental speech errors. Experiment 1 elicits errors in pairs of CVC real words with the SLIP technique. No word-onset effect is found. Experiment 2 is a tongue-twister experiment with lists of four disyllabic words. A significant word-onset effect is found. The conflicting results are not resolved. We also found that intervocalic consonants hardly ever interact with initial and final consonants, and that words sharing a stress pattern are a major factor in generating interactional errors.

  • Núria Enríquez, Lourdes Díaz, and Mariona Taulé, “Mental Processes in the Oral Production of Non-Native Spanish Speakers: Pauses and Self-Correction,” Procedia - Social and Behavioral Sciences, vol. 173, 2015, pp. 24-30. DOI: http://dx.doi.org/10.1016/j.sbspro.2015.02.025. http://www.sciencedirect.com/science/article/pii/S1877042815013348.

    Abstract In the field of teaching Spanish as a Foreign Language (SFL), textbooks and teaching materials often provide learners with language samples characterized by a lack of naturalness. We propose the use of a prototypical model of core competence, obtained from the analysis of communicative situations based on real corpora and the comparison of the same type of work with native and non-native speakers. The specific objective is the study of communication strategies related to pauses and self-correction in native and non-native speech, in order to analyse the repair strategies related to language processing.

    Keywords L1/L2 corpora

  • Leendert Plug, “Prosodic Marking and Predictability in Lexical Self-Repair,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0032.1-5. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/Papers/ICPHS0032.pdf.

    Abstract This paper reports on an investigation of lexical self-repair in Dutch spontaneous dialogue. Lexical self-repairs, in which one word is rejected for another, can be produced with or without notable ‘prosodic marking’ of the second word. It remains unclear what motivates speakers’ choices, but previous research has shown that the semantic distance between the two words is relevant. This study assesses the relevance of the words’ predictability. Prosodic marking judgements are modelled using an established semantic classification and a range of probabilistic variables, including both frequency-based and cloze-based measures. Results suggest that probabilistic measures add little predictive power to the semantic classification, although informative data trends can be observed.

    Keywords Dutch, predictability, prosody, self-repair, spontaneous speech

  • Ines Rehbein, “Filled Pauses in User-generated Content are Words with Extra-propositional Meaning,” in Proceedings of the Second Workshop on Extra-Propositional Aspects of Meaning in Computational Semantics (ExProM 2015), Denver, Colorado, Association for Computational Linguistics, June 2015, pp. 12-21. DOI: 10.3115/v1/W15-1302. https://www.aclweb.org/anthology/W15-1302.

    Abstract In this paper, we present a corpus study investigating the use of the fillers äh (uh) and ähm (uhm) in informal spoken German youth language and in written text from social media. Our study shows that filled pauses occur in both corpora as markers of hesitations, corrections, repetitions and unfinished sentences, and that the form as well as the type of the fillers are distributed similarly in both registers. We present an analysis of fillers in written microblogs, illustrating that äh and ähm are used intentionally and can add a subtext to the message that is understandable to both author and reader. We thus argue that filled pauses in user-generated content from social media are words with extrapropositional meaning.

  • Sandra Reitbrecht, and Ursula Hirschfeld, “The Impact of Fluency and Hesitation Phenomena on the Perception of Non-native Speakers by Native Listeners of German,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0166.1-4. http://www.icphs2015.info/pdfs/Papers/ICPHS0166.pdf.

    Abstract The present, ongoing study addresses L2 fluency and hesitation phenomena in the context of speech effects in intercultural communication. It investigates the impact of fluency and hesitation phenomena on the perception of non-native speakers by native listeners of German. The first results underline the importance and salience of hesitation phenomena and fluency for speech effects and suggest that these features deserve greater consideration in future studies. Native recipients’ verbal reactions to L2 speech material show that they often make reference to features of L2 utterance fluency to explain how they perceive non-native speakers, their personality and their emotional state. Furthermore, Spearman’s rank correlation tests for a number of fixed perceptual categories show significant correlations between perceived fluency and the attributes assured (r(309)=0.617, p<0.01), well prepared (r(303)=0.589, p<0.01), competent (r(305)=0.483, p<0.01), relaxed (r(307)=0.375, p<0.01) and nervous (r(309)=-0.322, p<0.01).

    Keywords Czech, Fluency, French, German as a foreign language, speech effects

  • Ralph Rose, “Um and uh as differential delay markers: the role of contextual factors,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract The English filled pauses uh and um have been argued to correspond respectively to shorter and longer anticipated delays in speech production. This study looks at some contextual factors that might cause this difference by investigating filled pause instances in monologue and conversation speech corpora. Results are consistent with previously observed delay differences and further show that discourse-level processing may influence differential delay marking though monologue results are more conclusive than conversation results. However, no evidence was found that lexical factors (word type, frequency) correlate with filled pause choice. The findings suggest a limited view of how speakers use filled pauses as delay markers: Not all contextual factors may trigger differential delay marking.

    Keywords contextual factors, delay, DiSS, filled pause

  • Ralph Rose, “Temporal Variables in First and Second Language Speech and Perception of Fluency,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0405.1-5. http://www.roselab.sci.waseda.ac.jp/resources/file/2015_icphs_rose_paper.pdf.

    Abstract Evidence is accumulating that many temporal features of second language speech are correlated with those of first language speech. This study looks at the correlation between articulation rate, pause rate, and mean pause duration in Japanese first and English second language speech and how second language fluency raters perceive these. In a crosslinguistic corpus of spontaneous speech, mean pause duration was found to have a near-high correlation while the other two temporal variables have a moderate correlation. A subsequent elicitation of fluency judgments on the second language English speech via Amazon Mechanical Turk showed that ratings were highly dependent on pause duration, rather less on articulation rate, but not on pause rate. Results suggest that raters’ perception of second language fluency is divergent from speakers’ actual second language development: Ratings are related to features that are not indicative of second language development but rather of individual speech patterns.

    Keywords articulation rate, Fluency, second language acquisition, silent pause

  • Sara Bögels, Kobin H. Kendrick, and Stephen C. Levinson, “Never Say No … How the Brain Interprets the Pregnant Pause in Conversation,” PLoS ONE, vol. 10, no. 12, 2015, pp. 15. DOI: 10.1371/journal.pone.0145474. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0145474.

    Abstract In conversation, negative responses to invitations, requests, offers, and the like are more likely to occur with a delay; conversation analysts talk of them as dispreferred. Here we examine the contrastive cognitive load ‘yes’ and ‘no’ responses make, either when relatively fast (300 ms after question offset) or delayed (1000 ms). Participants heard short dialogues contrasting in speed and valence of response while having their EEG recorded. We found that a fast ‘no’ evokes an N400-effect relative to a fast ‘yes’; however, this contrast disappeared in the delayed responses. ‘No’ responses, however, elicited a late frontal positivity both if they were fast and if they were delayed. We interpret these results as follows: a fast ‘no’ evoked an N400 because an immediate response is expected to be positive; this effect disappears as the response time lengthens because now in ordinary conversation the probability of a ‘no’ has increased. However, regardless of the latency of response, a ‘no’ response is associated with a late positivity, since a negative response is always dispreferred. Together these results show that negative responses to social actions exact a higher cognitive load, but especially when least expected, in immediate response.

  • Miki Shrosbree, “Cross-Linguistic Articulation Rate among Near-Balanced Bilinguals and Implications for Second Language Fluency Measurement,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0572.1-4. http://www.icphs2015.info/pdfs/Papers/ICPHS0572.pdf.

    Abstract The present study examines cross-linguistic articulation rates in read speech among 28 native speakers (14 English and 14 Japanese) and 14 Japanese-English near-balanced bilinguals. The results show that: (1) articulation rates are comparable between the native speakers and the bilinguals; (2) there is a significant difference between articulation rates in Japanese and English among the bilinguals; (3) there is a strong positive correlation between English and Japanese articulation rates among bilinguals. Implications for development of L2 fluency measurement using the L1 fluency as a baseline are discussed.

    Keywords articulation rate, balanced bilingual, Fluency, second language, speech rate

  • Vered Silber-Varod, Adva Weiss, and Noam Amir, “Can you hear these mid-front vowels? Formants analysis of hesitation disfluencies in spontaneous Hebrew,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract This study attempts to characterize the timbre of the default type of hesitation disfluency (HD) in Israeli Hebrew: the mid-front vowel /e/. For this purpose, we analysed the frequencies of the first three formants, F1, F2, and F3, of hundreds of HD pronunciations taken from The Corpus of Spoken Israeli Hebrew (COSIH). We also compared the formant values with two earlier studies that were carried out on the vowel /e/ in fluent speech. The findings show that, in general, elongated word-final syllables and appended [e]s are pronounced with the same amount of openness as fluent [e], while filled pauses tend to be more open (lower F1), and more frontal (higher F2). Based on these results, we suggest using a different set of IPA symbols, rather than the phonemic mid-front /e/, to better represent hesitation disfluencies.

    Keywords DiSS, filled pauses, formants, Hebrew, hesitation disfluency, LPC analysis, spontaneous speech

  • Anton Stepikhov, and Anastassia Loukina, “Sentence Boundaries in Text and Pauses in Speech: Correlation or Confrontation?,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0588.1-5. http://www.icphs2015.info/pdfs/Papers/ICPHS0588.pdf.

    Abstract The paper explores the interaction between sentence boundaries marked by annotators in transcriptions of Russian spontaneous speech and actual prosodic boundaries in the signal. The aim of the research is to investigate whether annotators’ prosodic competence allows them to correctly detect sentence boundaries in speech based on textual information only. We found that inter-annotator agreement for each sentence boundary identified in transcription was affected by both the presence or absence of a pause and pause duration. A mixed linear model showed that the presence or absence of a pause explained 13% of the variance in boundary detection. Pause duration explained only 4% of the variance in inter-annotator agreement, with a moderate correlation of r = 0.21. We argue that the relatively small effect size in this case may be due to the interaction of different pausing strategies typical of reading and spontaneous speech, the ambiguity of sentence boundaries, and individual differences in speech perception.

    Keywords annotation, boundary detection, pausing, Russian, spontaneous speech

  • Jozsef Szakos, and Ulrike Glavitsch, “Investigating disfluency in recordings of last speakers of endangered Austronesian languages in Taiwan,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract The nearly three decades spent in Formosan language documentation produced hundreds of hours of recorded speech. In this paper, we show how the use of SpeechIndexer for transcribing and indexing the data visualises the problem of disfluency in the spontaneous narratives and dialogues. The semiautomatic alignment of speech and transcription has to be adjusted manually whenever unpredictable pauses occur that are disfluencies rather than markers of phrasal units. We illustrate how combining SpeechIndexer’s pause finder with pitch measurements can help to distinguish phrasal boundaries from disfluent pauses.

    Keywords Austronesian, DiSS, lesser-documented unwritten language, pause finder, SpeechIndexer

  • Leimin Tian, Catherine Lai, and Johanna Moore, “Recognising emotions in dialogues with disfluencies and non-verbal vocalisations,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract We investigate the usefulness of DISfluencies and Non-verbal Vocalisations (DIS-NV) for recognizing human emotions in dialogues. The proposed features measure filled pauses, fillers, stutters, laughter, and breath in utterances. The predictiveness of DIS-NV features is compared with lexical features and state-of-the-art low-level acoustic features. Our experimental results show that using DIS-NV features alone is not as predictive as using lexical or acoustic features. However, adding them to the lexical or acoustic feature sets yields improvement compared to using lexical or acoustic features alone. This indicates that disfluencies and non-verbal vocalisations provide useful information for emotion recognition that the other two types of features overlook.

    Keywords Dialogue, disfluency, DiSS, emotion recognition, HCI, speech processing

  • Marcus Tomalin, Mirjam Wester, Rasmus Dall, Bill Byrne, and Simon King, “A lattice-based approach to automatic filled pause insertion,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract This paper describes a novel method for automatically inserting filled pauses (e.g., UM) into fluent texts. Although filled pauses are known to serve a wide range of psychological and structural functions in conversational speech, they have not traditionally been modelled overtly by state-of-the-art speech synthesis systems. However, several recent systems have started to model disfluencies specifically, and so there is an increasing need to create disfluent speech synthesis input by automatically inserting filled pauses into otherwise fluent text. The approach presented here interpolates Ngrams and Full-Output Recurrent Neural Network Language Models (f-RNNLMs) in a lattice-rescoring framework. It is shown that the interpolated system outperforms separate Ngram and f-RNNLM systems, where performance is analysed using the Precision, Recall, and F-score metrics.

    Keywords disfluency, DiSS, f-RNNLMs, filled pauses, lattices, Ngrams

  • Gunnel Tottie, “From pause to word: Uh and um in written language,” in ICAME 36 (WORDS, WORDS, WORDS – CORPORA AND LEXIS), May 2015, pp. 174. https://www.uni-trier.de/fileadmin/fb2/ANG/ICAME36/ICAME_36_abstracts_booklet.pdf.

    Abstract (none)

  • Michiko Watanabe, Yosuke Kashiwagi, and Kikuo Maekawa, “The relationship between preceding clause type, subsequent clause length and duration of silent and filled pauses at clause boundaries in Japanese monologues,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract Filled pauses (FPs) are claimed to occur when speakers have some difficulties and need extra time in speech production. This study investigated whether the following two factors affect silent pause (SP) and FP durations at clause boundaries, using a spontaneous speech corpus: 1) boundary strength and 2) subsequent clause length. First, whether SP and FP durations increase with syntactic boundary strength was examined. Second, whether subsequent clause length affects SP and FP durations at the boundaries was investigated. Results show SP duration increased with boundary strength and subsequent clause length, but FP duration did not, suggesting that only SP duration is affected by these two factors.

    Keywords clause boundary, disfluency, DiSS, filled pause, silent pause, speech planning

  • Mirjam Wester, Martin Corley, and Rasmus Dall, “The temporal delay hypothesis: natural, vocoded and synthetic speech,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract Including disfluencies in synthetic speech is being explored as a way of making synthetic speech sound more natural and conversational. How to measure whether the resulting speech is actually more natural, however, is not straightforward. Conventional approaches to synthetic speech evaluation fall short because listeners are either primed to prefer stimuli with filled pauses or, when they are not primed, prefer more fluent speech. Psycholinguistic reaction time experiments may circumvent this issue. In this paper, we revisit one such reaction time experiment. For natural speech, delays in word onset were found to facilitate word recognition regardless of the type of delay, be it a filled pause (um), silence, or a tone. We expand these experiments by examining the effect of using vocoded and synthetic speech. Our results partially replicate previous findings. For natural and vocoded speech, if the delay is a silent pause, significant increases in the speed of word recognition are found. If the delay comprises a filled pause, there is a significant increase in reaction time for vocoded speech but not for natural speech. For synthetic speech, no clear effects of delay on word recognition are found. We hypothesise this is because it takes longer (requires more cognitive resources) to process synthetic speech than natural or vocoded speech.

    Keywords delay hypothesis, disfluency, DiSS

  • Maria K. Wolters, Luis Ferrini, Elaine Farrow, Aurora Szentagotai Tatar, and Christopher D. Burton, “Tracking Depressed Mood Using Speech Pause Patterns,” in Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK, the University of Glasgow, August 2015, pp. 0811.1-5. http://www.icphs2015.info/pdfs/Papers/ICPHS0811.pdf.

    Abstract The speech of people with depression often shows clear signs of their condition (e.g., flat intonation, slow speech, long pauses), but it is not clear to what extent these signs covary with diurnal fluctuations in mood. In this paper, we report results from a pilot longitudinal study where 11 people with depression tracked various aspects of their mental health for a month. This included a daily mood tracker and regular completion of speech tasks. Speech tasks were designed to be emotionally neutral and require different levels of automaticity. We found that participants differed in their willingness to complete the speech tasks, and that preliminary analyses show no clear link between mood and prosody. We discuss implications of this study for tracking depressed mood using speech in real-life applications.

    Keywords depression, emotion, pauses, prosody

  • Clare Wright, and Cong Zhang, “The effect of study abroad experience on L2 Mandarin disfluency in different types of tasks,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015. http://diss2019.elte.hu/wp-content/uploads/2018/09/DiSS2015_Papers.pdf.

    Abstract Disfluency is a common phenomenon in L2 speech, especially in beginners’ speech. Whether studying abroad can help learners reduce their disfluency remains debated [8]. In this paper, we examined longitudinal data from 10 adult English instructed learners of Mandarin, measured before and after ten months of study abroad (SA). We used two speaking tasks, comparing pre-planned vs. unplanned spontaneous speech, to examine differences over time and between tasks, using eight linguistic and temporal fluency measures (analysed using CLAN and PRAAT). Overall mean linguistic and temporal fluency scores improved significantly (p < .05), especially speech rate (p < .01), supporting the general claim that SA favours oral development, particularly fluency [2]. Further analysis revealed task differences at both times of measurement, but with greater improvement in the spontaneous task.

    Keywords DiSS, Fluency, L2 Mandarin, study abroad

2014

  • Hans Rutger Bosker, Hugo Quené, Ted Sanders, and Nivja H. de Jong, “The Perception of Fluency in Native and Nonnative Speech,” Language Learning, vol. 64, no. 3, September 2014, pp. 579-614. DOI: 10.1111/lang.12067. https://dx.doi.org/10.1111/lang.12067.

    Abstract Where native speakers supposedly are fluent by default, nonnative speakers often have to strive hard to achieve a nativelike fluency level. However, disfluencies (such as pauses, fillers, repairs, etc.) occur in both native and nonnative speech and it is as yet unclear how fluency raters weigh the fluency characteristics of native and nonnative speech. Two rating experiments compared the way raters assess the fluency of native and nonnative speech. The fluency characteristics were controlled by using phonetic manipulations in pause (Experiment 1) and speed characteristics (Experiment 2). The results show that the ratings of manipulated native and nonnative speech were affected in a similar fashion. This suggests that there is no difference in the way listeners weigh the fluency characteristics of native and nonnative speakers.

  • Richard Dufour, Yannick Estève, and Paul Deléglise, “Characterizing and detecting spontaneous speech: Application to speaker role recognition,” Speech Communication, vol. 56, 2014, pp. 1-18. DOI: 10.1016/j.specom.2013.07.007. http://www.sciencedirect.com/science/article/pii/S0167639313000976.

    Abstract Processing spontaneous speech is one of the many challenges that automatic speech recognition systems have to deal with. The main characteristics of this kind of speech are disfluencies (filled pause, repetition, false start, etc.) and many studies have focused on their detection and correction. Spontaneous speech is defined in opposition to prepared speech, where utterances contain well-formed sentences close to those found in written documents. Acoustic and linguistic features made available by the use of an automatic speech recognition system are proposed to characterize and detect spontaneous speech segments from large audio databases. To better define this notion of spontaneous speech, segments of an 11-hour corpus (French Broadcast News) were manually labeled according to three classes of spontaneity. First, we present a study of these features. We then propose a two-level strategy to automatically assign a class of spontaneity to each speech segment. The proposed system reaches a 73.0% precision and a 73.5% recall on high spontaneous speech segments, and a 66.8% precision and a 69.6% recall on prepared speech segments. A quantitative study shows that the classes of spontaneity are useful information to characterize the speaker roles. This is confirmed by extending the speech spontaneity characterization approach to build an efficient automatic speaker role recognition system.

    Keywords Spontaneous speech; Speaker role; Feature extraction; Speech classification; Automatic speech recognition; Role recognition

  • Eszter Tisljár-Szabó, and Csaba Pléh, “Ascribing emotions depending on pause length in native and foreign language speech,” Speech Communication, vol. 56, 2014, pp. 35-48. DOI: 10.1016/j.specom.2013.07.009. http://www.sciencedirect.com/science/article/pii/S016763931300099X.

    Abstract Although the relationship between emotions and speech is well documented, little is known about the role of speech pauses in emotion expression and emotion recognition. The present study investigated how speech pause length influences how listeners ascribe emotional states to the speaker. Emotionally neutral Hungarian speech samples were taken, and speech pauses were systematically manipulated to create five variants of all passages. Hungarian and Austrian participants rated the emotionality of these passages by indicating on a 1–6 point scale how angry, sad, disgusted, happy, surprised, scared, positive, and heated the speaker could have been. The data reveal that the length of silent pauses influences listeners in attributing emotional states to the speaker. Our findings argue that pauses play a relevant role in ascribing emotions and that this phenomenon might be partly independent of language.

    Keywords Foreign language

  • Ian R. Finlayson, “Testing the roles of disfluency and rate of speech in the coordination of conversation,” Master's Thesis, Queen Margaret University, Edinburgh, Scotland, UK, 2014. http://etheses.qmu.ac.uk/1631/.

    Abstract This thesis is concerned with two different accounts of how speakers coordinate conversation. In both accounts it is suggested that aspects of the manner in which speech is performed (its disfluency and its rate) are integral to the smooth performance of conversation. In the first strand, we address Clark’s (1996) suggestion that speakers design hesitations, such as filled pauses (e.g. uh and um), repetitions and prolongations, to signal to their audience that they are experiencing difficulties during language production. Such signals allow speakers to account for their use of time, particularly when they experience disruptions during production. The account is tested against three criteria, proposed by Kraljic and Brennan (2005), for evaluating whether a feature of speech is being designed: that it be produced with regularity, that it be interpretable by listeners, and that its production vary according to the speaker’s communicative intention. While existing literature offers support for the first two criteria, neither an experiment with dyads nor analyses of dialogue in the Map Task Corpus (MTC; Anderson et al., 1991) found support for the third criterion. We conclude that, rather than being signals of difficulty, hesitations are merely symptoms which listeners may exploit to aid comprehension. In the second strand, we tested Wilson and Wilson’s (2005) oscillator theory of the timing of turn-taking. This theory suggests that entrainment between conversational partners’ rates of speech allows them to make precise predictions about when each other’s turns are going to end, and, subsequently, when they can begin a turn of their own. As a critical test of the theory, we predicted that speakers who were more tightly entrained would produce more seamless turn-taking. Again using the MTC, we found no evidence of a relationship between how closely entrained speakers were and how precisely they timed the beginning of their turns relative to the ends of each other’s turns.

  • Craig Lambert, and Judit Kormos, “Complexity, Accuracy, and Fluency in Task-based L2 Research: Toward More Developmentally Based Measures of Second Language Acquisition,” Applied Linguistics, vol. 35, no. 5, August 2014, pp. 607-614. DOI: 10.1093/applin/amu047. https://academic.oup.com/applij/article/35/5/607/2887860/Complexity-Accuracy-and-Fluency-in-Task-based-L2.

    Abstract This article surveys how complexity, accuracy, and fluency (CAF) have been operationalized in studies of task-based L2 production, pointing out some problems with this approach and the need for more precise information about L2 development during task performance. Research into developing L1 text construction ability is then discussed and some approaches for establishing measures of the relevant constructs in L2 performance are suggested.

  • Charlyn M. Laserna, Yi-Tai Seih, and James W. Pennebaker, “Um . . . Who Like Says You Know: Filler Word Use as a Function of Age, Gender, and Personality,” Journal of Language and Social Psychology, vol. 33, no. 3, 2014, pp. 328-338. DOI: 10.1177/0261927X14526993. http://jls.sagepub.com/content/early/2014/03/26/0261927X14526993.abstract.

    Abstract Filler words ('I mean, you know, like, uh, um') are commonly used in spoken conversation. The authors analyzed these five filler words from transcripts recorded by a device called the Electronically Activated Recorder (EAR), which sampled participants’ language use in daily conversations over several days. By examining filler words from 263 transcriptions of natural language from five separate studies, the current research sought to clarify the psychometric properties of filler words. An exploratory factor analysis extracted two factors from the five filler words: filled pauses ('uh, um') and discourse markers ('I mean, you know, like'). Overall, filled pauses were used at comparable rates across genders and ages. Discourse markers, however, were more common among women, younger participants, and more conscientious people. These findings suggest that filler word use can be considered a potential social and personality marker.

    Keywords discourse marker, EAR, filler word, LIWC

  • Olga Vyacheslavovna Maletina, “Understanding L1-L2 Fluency Relationship Across Different Languages and Different Proficiency Levels,” Master's Thesis, Brigham Young University, June 2014, paper 4094. http://scholarsarchive.byu.edu/etd/4094/.

    Abstract The purpose of this research was to better understand the relationship between L1 and L2 fluency, specifically, whether there is a relationship between L1 and L2 temporal fluency measures and whether this relationship differs across different languages and different proficiency levels. In order to answer these questions, L1 and L2 speech samples of the same speakers were collected and analyzed. Twenty-five native speakers and 45 non-native speakers of Japanese, Mandarin Chinese, Portuguese, Spanish, and Russian were asked to respond to questions and perform picture descriptions in their L1 and L2. The recorded speech samples were then analyzed by means of a Praat script in order to identify mean length of run (MLR), speech rate, and number of pauses. Several different statistical analyses were then performed to compare these L1 and L2 temporal features across different languages and different proficiency levels. The results of this study indicate that there is a strong relationship between L1 and L2 fluency and that this relationship may play a role in L2 production. Furthermore, it was found that native languages differ in their patterns of L1 temporal fluency production and that these differences may affect the production of L2 temporal fluency. It was also found that the L1-L2 fluency relationship did not differ at different proficiency levels, suggesting that individual factors may play a role in L2 fluency production. Thus, it was found that an Intermediate speaker of Spanish, for instance, did not speak faster than an Intermediate speaker of Russian, suggesting that naturally slower speakers in their L1 will still speak slower in their L2. These results indicate that fluency is as much of a trait as it is a state. However, it was also found that not all of the L1-L2 language combinations demonstrated the same results, indicating that the L1-L2 fluency relationship is affected by the L2. These findings have implications for L2 teaching and learning, as well as for the assessment of L2 fluency and overall language proficiency.

    Keywords acquisition, Fluency, proficiency, second-language

  • Helena Moniz, Fernando Batista, Ana Isabel Mata, and Isabel Trancoso, “Speaking style effects in the production of disfluencies,” Speech Communication, vol. 65, 2014, pp. 20-35. DOI: 10.1016/j.specom.2014.05.004. http://www.sciencedirect.com/science/article/pii/S0167639314000430.

    Abstract This work explores speaking style effects in the production of disfluencies. University lectures and map-task dialogues are analyzed in order to evaluate if the prosodic strategies used when uttering disfluencies vary across speaking styles. Our results show that the distribution of disfluency types is not arbitrary across lectures and dialogues. Moreover, although there is a statistically significant cross-style strategy of prosodic contrast marking (pitch and energy increases) between the region to repair and the repair of fluency, this strategy is displayed differently depending on the specific speech task. The overall patterns observed in the lectures, with regularities ascribed for speaker and disfluency types, do not hold with the same strength for the dialogues, due to underlying specificities of the communicative purposes. The tempo patterns found for both speech tasks also confirm their distinct behaviour, evidencing the more dynamic tempo characteristics of dialogues. In university lectures, prosodic cues are given to the listener both for the units inside disfluent regions and between these and the adjacent contexts. This suggests a stronger prosodic contrast marking of disfluency–fluency repair when compared to dialogues, as if teachers were monitoring the different regions – the introduction to a disfluency, the disfluency itself and the beginning of the repair – demarcating them in very contrastive ways.

    Keywords Prosody, Disfluencies, Lectures, Dialogues, Speaking styles

  • Mary Grantham O’Brien, “L2 Learners’ Assessments of Accentedness, Fluency, and Comprehensibility of Native and Nonnative German Speech,” Language Learning, vol. 64, no. 4, December 2014, pp. 715-748. DOI: 10.1111/lang.12082. http://onlinelibrary.wiley.com/doi/10.1111/lang.12082/full.

    Abstract In early stages of classroom language learning, many adult second language (L2) learners communicate primarily with one another, yet we know little about which speech stream characteristics learners tune into or the extent to which they understand this lingua franca communication. In the current study, 25 native English speakers learning German as an L2 with varying levels of German proficiency rated German speech produced by native speakers and fellow learners of German along three continua: accentedness, fluency, and comprehensibility. An examination of speech stream (i.e., phonological, fluency-based, and lexical/grammatical) characteristics along with partial correlations indicates that the raters distinguished among the three concepts but conflated the term fluency with proficiency. Self‐reported proficiency in German and linguistic training were the best predictors of the ratings assigned.

    Keywords accentedness, Comprehensibility, Fluency, German, L2 raters, L2 speech

  • Vikram Ramanarayanan, Adam Lammert, Louis Goldstein, and Shrikanth Narayanan, “Are Articulatory Settings Mechanically Advantageous for Speech Motor Control?,” PLoS ONE, vol. 9, no. 8, August 2014, pp. e104168. DOI: 10.1371/journal.pone.0104168. http://dx.doi.org/10.1371%2Fjournal.pone.0104168.

    Abstract We address the hypothesis that postures adopted during grammatical pauses in speech production are more “mechanically advantageous” than absolute rest positions for facilitating efficient postural motor control of vocal tract articulators. We quantify vocal tract posture corresponding to inter-speech pauses, absolute rest intervals as well as vowel and consonant intervals using automated analysis of video captured with real-time magnetic resonance imaging during production of read and spontaneous speech by 5 healthy speakers of American English. We then use locally-weighted linear regression to estimate the articulatory forward map from low-level articulator variables to high-level task/goal variables for these postures. We quantify the overall magnitude of the first derivative of the forward map as a measure of mechanical advantage. We find that postures assumed during grammatical pauses in speech as well as speech-ready postures are significantly more mechanically advantageous than postures assumed during absolute rest. Further, these postures represent empirical extremes of mechanical advantage, between which lie the postures assumed during various vowels and consonants. Relative mechanical advantage of different postures might be an important physical constraint influencing planning and control of speech production.

  • Scott H. Fraundorf, and Duane G. Watson, “Alice’s adventures in um-derland: psycholinguistic sources of variation in disfluency production,” Language, Cognition and Neuroscience, vol. 29, no. 9, 2014, pp. 1083-1096. DOI: 10.1080/01690965.2013.832785. http://dx.doi.org/10.1080/01690965.2013.832785.

    Abstract This study tests the hypothesis that three common types of disfluency (fillers, silent pauses and repeated words) reflect variance in what strategies are available to the production system for responding to difficulty in language production. Participants’ speech in a storytelling paradigm was coded for the three disfluency types. Repeats occurred most often when difficult material was already being produced and could be repeated, but fillers and silent pauses occurred most when difficult material was still being planned. Fillers were associated only with conceptual difficulties, consistent with the proposal that they reflect a communicative signal, whereas silent pauses and repeats were also related to lexical and phonological difficulties. These differences are discussed in terms of different strategies available to the language production system.

    Keywords discourse, Disfluency, Language production

  • Gunnel Tottie, “On the use of uh and um in American English,” Functions of Language, vol. 21, no. 1, 2014, pp. 6-29. DOI: 10.1075/fol.21.1.02tot. http://www.jbe-platform.com/content/journals/10.1075/fol.21.1.02tot.

    Abstract This study examines the use of uh and um, referred to jointly as UHM, in 14 conversations totaling c. 62,350 words from the Santa Barbara Corpus of Spoken American English. UHM was much less frequent than in British English, with 7.5 vs. 14.5 instances per thousand words in the British National Corpus. However, as in British English, the frequency of UHM was closely correlated to extra-linguistic context. Conversations in non-private environments (such as offices and classrooms) had higher frequencies than those taking place in private spaces, mostly homes. Time required for planning, especially when difficult subjects were discussed, appeared to be an important explanatory factor. It is clear that UHM cannot be dismissed as mere hesitation or disfluency; it functions as a pragmatic marker on a par with well, you know, and I mean, sharing some of the functions of these in discourse. Although the role of sociolinguistic factors was less clear, the tendencies for older speakers and educated speakers to use UHM more frequently than younger and less educated ones paralleled British usage, but contrary to British usage, there were no gender differences.

2013

  • Julie Beliao, and Anne Lacheret, “Disfluency and discursive markers: when prosody and syntax plan discourse,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 5-8. http://www.isca-speech.org/archive/diss_2013/papers/dis6_005.pdf.

    Abstract Hesitations, interruptions within phrases or within words are common in spontaneous speech. Those phenomena are widely known to be observable from a prosodic point of view through disfluencies. From a syntactic point of view, many studies already established that discursive markers such as hm, oh, I mean, etc. are representative of spontaneous speech. In this study, we demonstrate through a joint corpus-based analysis that these prosodical and syntactical features are correlated, without however being equivalent. More precisely, the lack of either disfluencies or discursive markers is consistently shown to be representative of a planned discourse.

    Keywords discursive marker, disfluency, DiSS, genres

  • Malte Belz, and Myriam Klapi, “Pauses following fillers in L1 and L2 German map task dialogues,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 9-12. http://www.isca-speech.org/archive/diss_2013/papers/dis6_009.pdf.

    Abstract Fillers and pauses in spoken language indicate hesitations. Filler type (uh vs. um) is believed to signal a minor or major following speech delay in L1. We examined whether advanced speakers of L2 German use pauses following filler type (äh vs. ähm) in the same way as native speakers do. Two Map Task corpora of L1 and L2 were contrasted with respect to speaker role, filler type and the exact time interval of fillers and pauses. Speaker role influenced the disfluency patterns in L1 and L2 in the same way. Filler type had no impact on the length of the following pause, but the time interval patterns differed significantly. Longer filler intervals are followed by longer pauses in L2 and by shorter pauses in L1. These results suggest that filler type in German is not used to indicate the length of the following delay. Advanced learners seem to have adopted this pattern of use, but cannot overcome their hesitations as fast as native speakers, probably due to their less automatised speech production.

    Keywords contrastive analysis, disfluencies, DiSS, fillers, German, L1, L2, map task, pauses, spontaneous speech

  • Sara Candeias, Dirce Celorico, Jorge Proença, Arlindo Veiga, and Fernando Perdigão, “HESITA(tions) in Portuguese: a database,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 13-16. http://www.isca-speech.org/archive/diss_2013/papers/dis6_013.pdf.

    Abstract In this paper we present a European Portuguese database of hesitations in speech. Under the name of HESITA, this database contains annotations of hesitation events, such as filled pauses, vocalic extensions, truncated words, repetitions and substitutions. The hesitations were found in 30 daily news programs collected from podcasts of a Portuguese television channel. The database also includes speaking style classification as well as acoustical information and other speech events. A statistical analysis of the occurrence of the hesitation events is presented. Insights into the process of human speech communication can be extracted from this database, which contains relevant information about how Portuguese speakers hesitate. The HESITA database is freely available online to the research community.

    Keywords annotation, disfluency, DiSS, hesitation corpus, hesitations, prepared speech, spontaneous speech

  • Rebecca Carroll, and Esther Ruigendijk, “The Effects of Syntactic Complexity on Processing Sentences in Noise,” Journal of Psycholinguistic Research, vol. 42, no. 2, 2013, pp. 139–159. DOI: 10.1007/s10936-012-9213-7. http://dx.doi.org/10.1007/s10936-012-9213-7.

    Abstract This paper discusses the influence of stationary (non-fluctuating) noise on processing and understanding of sentences, which vary in their syntactic complexity (with the factors canonicity, embedding, ambiguity). It presents data from two RT-studies with 44 participants testing processing of German sentences in silence and in noise. Results show a stronger impact of noise on the processing of structurally difficult than on syntactically simpler parts of the sentence. This may be explained by a combination of decreased acoustical information and an increased strain on cognitive resources, such as working memory or attention, which is caused by noise. The noise effect for embedded sentences is less than for non-embedded sentences, which may be explained by a benefit from prosodic information.

  • Nivja H. de Jong, and Hans Rutger Bosker, “Choosing a threshold for silent pauses to measure second language fluency,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 17-20. http://www.isca-speech.org/archive/diss_2013/papers/dis6_017.pdf.

    Abstract Second language (L2) research often involves analyses of acoustic measures of fluency. The studies investigating fluency, however, have been difficult to compare because the measures of fluency that were used differed widely. One of the differences between studies concerns the lower cut-off point for silent pauses, which has been set anywhere between 100 ms and 1000 ms. The goal of this paper is to find an optimal cut-off point. We calculate acoustic measures of fluency using different pause thresholds and then relate these measures to a measure of L2 proficiency and to ratings on fluency.

    Keywords DiSS, duration of pauses, number of pauses, second language speech, silent pause threshold, silent pauses

  • Nivja H. de Jong, Margarita P. Steinel, Arjen Florijn, Rob Schoonen, and Jan H. Hulstijn, “Linguistic skills and speaking fluency in a second language,” Applied Psycholinguistics, vol. 34, no. 5, September 2013, pp. 893-916. DOI: 10.1017/S0142716412000069. http://journals.cambridge.org/article_S0142716412000069.

    Abstract This study investigated how individual differences in linguistic knowledge and processing skills relate to individual differences in speaking fluency. Speakers of Dutch as a second language (N = 179) performed eight speaking tasks, from which several measures of fluency were derived such as measures for pausing, repairing, and speed (mean syllable duration). In addition, participants performed separate tasks, designed to gauge individuals’ second language linguistic knowledge and linguistic processing speed. The results showed that the linguistic skills were most strongly related to average syllable duration, of which 50% of individual variance was explained; in contrast, average pausing duration was only weakly related to linguistic knowledge and processing skills.

  • Laura E. de Ruiter, “Self-repairs in German children’s peer interaction - initial explorations,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 29-32. http://www.isca-speech.org/archive/diss_2013/papers/dis6_029.pdf.

    Abstract Forty-nine self-repairs were extracted from a corpus of conversational speech of ten German children (mean age 5;1) with peers. The repairs were analysed using Levelt’s [1] classification and compared with his adult data. Children produced fewer appropriateness repairs than adults, but more covert repairs and more phonetic repairs. Like adults, children had a preference to interrupt themselves within-word only for error repairs. Unlike adults, children did not produce editing terms following interruptions.

    Keywords DiSS

  • Andrea Deme, and Alexandra Markó, “Lengthenings and filled pauses in Hungarian adults’ and children’s speech,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 21-24. http://www.isca-speech.org/archive/diss_2013/papers/dis6_021.pdf.

    Abstract In the present paper, vowel lengthenings (LEs) and non-lexicalized filled pauses (FPs) were studied in the spontaneous speech of children and adults, focusing more on the much less studied phenomenon of vowel lengthening. The results revealed differences in the usage and appearance of lengthenings between the two age groups, from which differences in speech skills and strategies can be inferred. Between the age groups, LEs and FPs differ mostly in their position in the speech session, which has implications regarding the different planning strategies of adults and children. We also draw methodological conclusions regarding the identification of vowel lengthening, supporting a previously formulated conception.

    Keywords (non-lexicalized) filled pause, discourse management, DiSS, lengthening, speech planning, spontaneous speech

  • Yasuharu Den, and Natsuko Nakagawa, “Anti-zero pronominalization: when Japanese speakers overtly express omissible topic phrases,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 25-28. http://www.isca-speech.org/archive/diss_2013/papers/dis6_025.pdf.

    Abstract In this paper, we focus on cases where Japanese speakers overtly express a topic phrase that could have been omitted. We call this phenomenon anti-zero-pronominalization and hypothesize that this helps speakers gain time for planning a following utterance; anti-zero-pronominalization is another option to deal with cognitive load at the beginning of an utterance in addition to fillers and other speech disfluencies. Based on a quantitative analysis of a corpus of spontaneous Japanese dialogs, we investigate the difference between overt topic NPs and zero-pronouns. We show that i) the utterance is more complex when the topic is expressed as an overt NP than when it is expressed as a zero-pronoun; ii) turn-initial items such as fillers are produced less frequently when overt NPs appear than when zero-pronouns appear; and iii) the utterance becomes more complex when the last mora of the topic is more prolonged.

    Keywords cognitive load, DiSS, Japanese dialogs, topic phrases, zero-pronouns

  • Luis J. García-López, M. Belén Díez-Bedmar, and José M. Almansa-Moreno, “From Being a Trainee to Being a Trainer: Helping Peers Improve their Public Speaking Skills,” Revista de Psicodidáctica, vol. 18, no. 2, 2013, pp. 331-342. DOI: 10.1387/RevPsicodidact.6419. http://www.redalyc.org/articulo.oa?id=17527003006.

    Abstract Although public speaking anxiety is present at all educational stages, the university period is critical since the students’ lack of oral communication skills may prevent them from accomplishing their educational goals. To improve this situation, a two-fold objective was pursued in this study. First, to examine the effects of a 3-hour public speaking training workshop for Psychology undergraduates. Second, to test if these students could effectively train other undergraduates to use public speaking skills and reduce their anxiety by using a collaborative methodology and peer tutoring. The findings prove that the training of Psychology students resulted in their peers’ improvement of their oral communication skills and reduction of their speech anxiety. Both groups of students benefited from the study: Psychology students had the opportunity to improve their communication skills and gained practical experience, and the other undergraduates received a free, personalized and successful workshop which improved their communication skills and reduced their anxiety levels.

    Keywords collaborative methodology, Communication skills, peers, public speaking

  • Jonathan Ginzburg, Raquel Fernández, and David Schlangen, “Self-addressed questions in disfluencies,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 33-36. http://www.isca-speech.org/archive/diss_2013/papers/dis6_033.pdf.

    Abstract The paper considers self-addressed queries – queries speakers address to themselves in the aftermath of a filled pause. We study their distribution in the BNC and show that such queries show signs of sensitivity to the syntactic/semantic type of the sub-utterance they follow. We offer a formal model that explains the coherence of such queries.

    Keywords DiSS

  • Sandra Götz, Fluency in Native and Nonnative English Speech. Amsterdam, Netherlands: John Benjamins Publishing Company. 2013, pp. 238. DOI: 10.1075/scl.53. https://benjamins.com/#catalog/books/scl.53/main.

    Abstract This book takes a new and holistic approach to fluency in English speech and differentiates between productive, perceptive, and nonverbal fluency. The in-depth corpus-based description of productive fluency points out major differences in how fluency is established in native and nonnative speech. It also reveals areas in which even highly advanced learners of English still deviate strongly from the native target norm and areas in which they have already approximated it. Based on these findings, selected learners are subjected to native speakers’ ratings of seven perceptive fluency variables in order to test which variables are most responsible for listeners’ perception of oral proficiency. Finally, language-pedagogical implications derived from these findings for the improvement of fluency in learner language are presented. This book is conceptually and methodologically relevant for corpus linguistics, learner corpus research, and foreign language teaching and learning.

  • Ivan Hernandez, and Jesse Lee Preston, “Disfluency disrupts the confirmation bias,” Journal of Experimental Social Psychology, vol. 49, no. 1, January 2013, pp. 178-182. DOI: http://dx.doi.org/10.1016/j.jesp.2012.08.010. http://www.sciencedirect.com/science/article/pii/S002210311200176X.

    Abstract One difficulty in persuasion is overcoming the confirmation bias, where people selectively seek evidence that is consistent with their prior beliefs and expectations. This biased search for information allows people to analyze new information in an efficient, but shallow way. The present research discusses how experienced difficulty in processing (disfluency) can reduce the confirmation bias by promoting careful, analytic processing. In two studies, participants with prior attitudes on an issue became less extreme after reading an argument on the issue in a disfluent format. The change occurred for both naturally occurring attitudes (i.e. political ideology) and experimentally assigned attitudes (i.e. positivity toward a court defendant). Importantly, disfluency did not reduce confirmation biases when participants were under cognitive load, suggesting that cognitive resources are necessary to overcome these biases. Overall, these results suggest that changing the style of an argument’s presentation can lead to attitude change by promoting more comprehensive consideration of opposing views.

    Keywords Attitude change, Confirmation bias, Fluency, Persuasion

  • Martina Jakesch, Helmut Leder, and Michael Forster, “Image Ambiguity and Fluency,” PLoS ONE, vol. 8, no. 9, September 2013, pp. e74084. DOI: 10.1371/journal.pone.0074084. http://dx.doi.org/10.1371%2Fjournal.pone.0074084.

    Abstract Ambiguity is often associated with negative affective responses, and enjoying ambiguity seems restricted to only a few situations, such as experiencing art. Nevertheless, theories of judgment formation, especially the “processing fluency account”, suggest that easy-to-process (non-ambiguous) stimuli are processed faster and are therefore preferred to (ambiguous) stimuli, which are hard to process. In a series of six experiments, we investigated these contrasting approaches by manipulating fluency (presentation duration: 10ms, 50ms, 100ms, 500ms, 1000ms) and testing effects of ambiguity (ambiguous versus non-ambiguous pictures of paintings) on classification performance (Part A; speed and accuracy) and aesthetic appreciation (Part B; liking and interest). As indicated by signal detection analyses, classification accuracy increased with presentation duration (Exp. 1a), but we found no effects of ambiguity on classification speed (Exp. 1b). Fifty percent of the participants were able to successfully classify ambiguous content at a presentation duration of 100 ms, and at 500ms even 75% performed above chance level. Ambiguous artworks were found more interesting (in conditions 50ms to 1000ms) and were preferred over non-ambiguous stimuli at 500ms and 1000ms (Exp. 2a - 2c, 3). Importantly, ambiguous images were nonetheless rated significantly harder to process than non-ambiguous images. These results suggest that ambiguity is an essential ingredient in art appreciation even though, or maybe because, it is harder to process.

  • Frank Jansen, and Daniel Janssen, “Uw reservering is eh komen te vervallen - Experimenteel onderzoek naar het effect van gevulde pauzes in voicemails met slecht nieuws” [Your reservation has, uh, been cancelled: Experimental research into the effect of filled pauses in bad-news voicemails], Tijdschrift voor Taalbeheersing, vol. 35, no. 3, December 2013, pp. 237-253. DOI: 10.5117/TVT2013.3.JANS. https://www.ingentaconnect.com/content/aup/tt/2013/00000035/00000003/art00003.

    Abstract This article presents the results of three experiments studying the influence of the filled pause eh in bad-news voicemails on hearer evaluation. Based on the politeness theory of Brown & Levinson (1987), we expect that eh will facilitate the hearer’s acceptance of the bad news. The addition of eh turns out to have a positive effect on the attributed relational qualities of the speaker of the voicemail. On the other hand, his attributed communicative professionalism is rated lower. One of the two potential explanations for these results is that eh causes some delay in the presentation of the bad news itself, thereby triggering the hearer’s suspicion that really very bad news is forthcoming. Measured against this expectation, the eventual bad news is not that bad. The experimental evidence does not support this hypothesis. Therefore the alternative hypothesis, that eh signals the speaker’s difficulty in communicating the message, which in turn makes him more empathic, becomes highly probable.

    Keywords communicative professionalism, empathy, filled pause, hearer’s evaluation, politeness

  • Tyler Kendall, Speech Rate, Pause and Sociolinguistic Variation. Basingstoke: Palgrave Macmillan. 2013. DOI: 10.1057/9781137291448.0001. http://www.palgrave.com/page/detail/speech-rate-pause-and-sociolinguistic-variation-tyler-kendall/?isb=9780230249776.

    Abstract Speech Rate, Pause, and Sociolinguistic Variation examines the confluence of psycholinguistic factors and social factors in linguistic variation through corpus-based analyses of speech rate and silent pause in US English. In particular, based on a large amount of data extracted from a wide range of sociolinguistic interview recordings, it demonstrates the great extent to which articulation rates are correlated with social factors of speakers (such as regional origin and sex) while pause durations are less so. Through the development of new quantitative techniques, it considers the cognitive importance of variability in pauses and highlights new ways that speech features like these can be used to help understand the production of sociolinguistic variables. With detailed discussions of its data and methods, and with a helpful accompanying website, it makes a valuable guide for conducting one’s own corpus (socio)phonetic research.

  • Hanae Koiso, and Yasuharu Den, “Acoustic and linguistic features related to speech planning appearing at weak clause boundaries in Japanese monologs,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 37-40. http://www.isca-speech.org/archive/diss_2013/papers/dis6_037.pdf.

    Abstract In this paper, we focus on weak clause boundaries in Japanese monologs in order to investigate the relationship of the length of constituents following weak boundaries to three acoustic and linguistic features: 1) occurrence rate of fillers, 2) occurrence rate of boundary pitch movements, and 3) degree of lengthening of clause-final morae. We found that all these features were significantly correlated with the length of following constituents. Most importantly, boundary pitch movements had an additional effect that can be distinct from the effect of clause-final lengthening. These results suggest that Japanese speakers have earlier-occurring items that help them deal with cognitive load in speech planning, in addition to fillers and other clause-initial disfluencies.

    Keywords boundary pitch movements, clause-final lengthening, DiSS, fillers, Japanese monologs

  • Kikuo Maekawa, “Prediction of F0 height of filled pauses in spontaneous Japanese: a preliminary report,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 41-44. http://www.isca-speech.org/archive/diss_2013/papers/dis6_041.pdf.

    Abstract F0 values of filled pauses (FP) in the Corpus of Spontaneous Japanese were analyzed to examine the mechanism by which the F0 heights of FP were determined. Statistical analyses of the F0 values of FP occurring between two full-fledged accentual phrases (AP) revealed a correspondence between the occurrence timing of FP and the F0 height. Based upon this finding, 5 models of F0 prediction were proposed. Comparison of the mean prediction errors revealed that the best prediction was obtained by a model that linearly interpolates between the phrase-final L% tone of the immediately preceding AP and the phrase-initial %L tone of the immediately following AP. This finding suggests that the F0 of FP was specified at the level of phonetic realization rather than phonological prosodic representation.

    Keywords DiSS

  • Takehiko Maruyama, “Analysis of parenthetical clauses in spontaneous Japanese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 45-48. http://www.isca-speech.org/archive/diss_2013/papers/dis6_045.pdf.

    Abstract In this paper, I will discuss the functional aspects of parenthetical clauses and sentences in spontaneous Japanese monologues. Parentheticals can be defined as syntactic elements that are instantly inserted in the middle of an ongoing utterance to add supplemental information and thus interrupt the fluent flow of speech production. Examples of parenthetical clauses/sentences that appeared in the Corpus of Spontaneous Japanese were examined and then classified into three types. These types differ in their contextual functions, but share a commonality in that they present multiplex information simultaneously in the process of producing spontaneous speech.

    Keywords contextual functions, Corpus of Spontaneous Japanese, DiSS, parenthetical clause/sentence

  • Helena Moniz, Fernando Batista, Isabel Trancoso, and Ana Isabel Mata, “Automatic structural metadata identification based on multilayer prosodic information,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 49-52. http://www.isca-speech.org/archive/diss_2013/papers/dis6_049.pdf.

    Abstract This paper discriminates between different types of structural metadata in transcripts of university lectures: boundary events (commas, full stops, and interrogatives) and disfluencies (repairs). The disambiguation process is based on predefined multilayered linguistic information and on its hierarchical structure. Since boundary events may share similar linguistic properties, in terms of F0 and energy slopes, presence/absence of silent pauses, and duration of different units of analysis, different classification methods based on a set of automatically derived prosodic features have been applied to differentiate between those events and disfluencies. This paper also performs a detailed analysis of the impact of each individual feature in discriminating each structural event. The results of our data-driven approach allow us to reach a structured set of basic features towards the disambiguation of metadata events. These results are a step forward towards the analysis of speech acts and their disambiguation from disfluencies.

    Keywords automatic speech processing, disfluencies, DiSS, speech prosody, structural metadata

  • Rena Nemoto, “Which kind of hesitations can be found in Estonian spontaneous speech?,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 53-54. http://www.isca-speech.org/archive/diss_2013/papers/dis6_053.pdf.

    Abstract This paper describes the acoustic characteristics of hesitations in Estonian spontaneous speech. We especially investigate duration, fundamental frequency, and the first two formants. The most frequent hesitations are realized as lengthened phonemes such as /ää/, /ee/, /õõ/, and /mm/. We compare these lengthened-phoneme hesitations with their related phonemes. The results from our preliminary hesitation study show that (i) hesitations have longer durations, spread over a wide range; (ii) hesitations generally have lower pitch; (iii) hesitation formants are likely to be more centralized or posterior, and more open, than those of the related phonemes.

    Keywords DiSS, Estonian, hesitation, spontaneous speech

  • Sieb Nooteboom, and Hugo Quené, “Self-monitoring as reflected in identification of misspoken segments,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 55-57. http://www.isca-speech.org/archive/diss_2013/papers/dis6_055.pdf.

    Abstract Most segmental speech errors probably are articulatory blends of competing segments. Perceptual consequences were studied in listeners’ reactions to misspoken segments. 291 speech fragments containing misspoken initial consonants plus 291 correct control fragments, all stemming from earlier SLIP experiments, were presented for identification to listeners. Results show that misidentifications (i.e. deviations from an earlier auditory transcription) are rare (3%), but reaction times to correctly identified fragments systematically reflect differences between correct controls, undetected, early detected and late detected speech errors, leading to the following speculative conclusions: (1) segmental errors begin their life in inner speech as full substitutions, and competition with correct target segments often is slightly delayed; (2) in early interruptions speech is initiated before competing target segments are activated, but then rapidly interrupted after error detection; (3) late detected errors reflect conflict-based monitoring of articulation or monitoring overt speech.

    Keywords DiSS

  • Klim Peshkov, Laurent Prévot, Stéphane Rauzy, and Berthille Pallaud, “Categorizing syntactic chunks for marking disfluent speech in French language,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 59-62. http://www.isca-speech.org/archive/diss_2013/papers/dis6_059.pdf.

    Abstract Disfluency is the first phenomenon one has to address when processing spontaneous speech. Efficient systems combining transcription-based and signal-based cues have been created for English. These systems generally use supervised machine learning models, trained over large annotated datasets combining signal and transcription. For other languages, including French, the situation is complicated by the lack of resources. A few proposals based on filled pauses, truncated words and repetitions have been made for identifying disfluencies in French. In this paper, we propose a transcription-based approach to this task, with high-quality morpho-syntactic tags as input for identifying disfluent areas. Originally, we adopted a transcription-based approach to obtain an independent way of characterizing disfluencies, which can later be compared and combined with prosodic cues. Our method consists of building syntactic chunks from our tagging and then classifying these chunks into several categories, some of which are considered disfluent. We apply our method to speaker style characterization, discourse genre zoning, and dataset cleaning. Finally, an attempt is made to relate our disfluent chunks to a more standard description of disfluencies in order to pave the way for a deeper integration of our work with that of the disfluency community.

    Keywords chunking, disfluencies, DiSS, speaking style, tagging, transcription-based approach

  • Jorge Proença, Dirce Celorico, Arlindo Veiga, Sara Candeias, and Fernando Perdigão, “Acoustical characterization of vocalic fillers in European Portuguese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 63-66. http://www.isca-speech.org/archive/diss_2013/papers/dis6_063.pdf.

    Abstract This study attempts to acoustically characterize the most common filled pause vocalizations (or vocalic fillers) in spontaneous speech in European Portuguese: the near-open central vowel [ɐ] and the mid-central vowel [ə]. For this purpose we analyzed the spectral information of the vocalic fillers by estimating their first two formant frequencies as well as their duration properties. The vocalic fillers are taken from a large corpus of European Portuguese broadcast news’ speech. We also compared the vocalic fillers with lexical vowels possessing similar timbre. No formant variation trend was attained for the vocalic fillers and a great overlap of formant values is observed. These results provide a base of information for understanding the most common vocalic fillers in European Portuguese spontaneous speech.

    Keywords DiSS, filled pauses, formant estimation, hesitations, spontaneous speech, vocalic fillers

  • Ralph L. Rose, “Crosslinguistic Corpus of Hesitation Phenomena: A Corpus for Investigating First and Second Language Speech Performance,” in INTERSPEECH 2013, Lyon, France, August 2013, pp. 992-996. http://www.isca-speech.org/archive/interspeech_2013/i13_0992.html.

    Abstract There is a growing consensus that there is a need to evaluate second language speech performance with respect to first language speech behavior. To support this need, the Crosslinguistic Corpus of Hesitation Phenomena was developed. This freely available corpus is designed to investigate the crosslinguistic influence of speech patterns and consists of recordings of speakers producing first and second language speech samples in response to parallel elicitation tasks in each language. Preliminary results from the corpus are consistent with other findings that second language performance is sometimes correlated with first language speech behavior. In particular, findings show that silent pause rate and duration as well as other hesitation phenomena correlate with first language performance while speech rate does not. Interestingly, repeats also differ from first language production. Results show that the corpus may be a useful tool for researchers who wish to investigate the correspondence between first and second language speech, particularly with respect to the use of hesitation phenomena.

    Keywords corpus, hesitation phenomena, second language speech

  • Vered Silber-Varod, and Takehiko Maruyama, “The linguistic role of hesitation disfluencies: evidence from Hebrew and Japanese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 67-70. http://www.isca-speech.org/archive/diss_2013/papers/dis6_067.pdf.

    Abstract In this paper we examine a certain aspect of the prosody-syntax interface, that of hesitation disfluencies (HD) that occur intra-phrases or intra-morphemes. Such cases were found in two spontaneous corpora of two syntactically distinct languages – Israeli Hebrew (IH) and Japanese. It was found that intra-phrasal hesitations in the two languages call for different explanations, since in Japanese the noun (e.g., in NP) precedes the case marking particle while in IH the preposition (e.g., in PP) precedes the noun. In this paper we will present qualitative findings and suggest a unified view of the phenomenon of intra-phrasal HDs.

    Keywords DiSS, hesitation disfluency, Israeli Hebrew, Japanese, prosody-syntax interface

  • Michiko Watanabe, “Phrasal complexity and the occurrence of filled pauses in presentation speeches in Japanese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 71-72. http://www.isca-speech.org/archive/diss_2013/papers/dis6_071.pdf.

    Abstract Filled pauses are ubiquitous in everyday speech. I investigated whether linguistic complexity of upcoming phrases affects filler rate at phrase boundaries in presentation speeches in Japanese. Filler rate at phrase boundaries increased monotonically with complexity of the following phrases. However, when the following phrase was composed of more than 11 Bunsetsu-phrases, the filler rate did not show any constant increase. The results indicate that filler rate at phrase boundaries is closely related to cognitive load of local linguistic encoding and that the maximum planning span for linguistic encoding is about 10 Bunsetsu-phrases in Japanese monologues.

    Keywords bunsetsu-phrase, DiSS, filled pause, linguistic complexity, planning load

  • Charlotte Wollermann, Eva Lasarcyk, Ulrich Schade, and Bernhard Schröder, “Disfluencies and uncertainty perception - evidence from a human-machine scenario,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 73-76. http://www.isca-speech.org/archive/diss_2013/papers/dis6_073.pdf.

    Abstract This paper deals with the modelling and perception of disfluencies in articulatory speech synthesis. The stimuli are embedded into short dialogues in question-answering situations in a human–machine scenario. The system is supposed to express uncertainty in the answer. We test the influence of delay, intonation, and filler as prosodic indicators of uncertainty on perception in two studies. Study 1 deals with the effect of delay and filler on uncertainty perception. Results suggest an additive effect of the cues, i.e. the activation of both prosodic cues of uncertainty has a stronger impact on uncertainty perception than the deactivation of a single cue or of both cues. With respect to the effect of single cues, no significant difference can be observed. Study 2 investigates the impact of delay and intonation on perceived uncertainty. Again, a principle of additivity can be observed. Furthermore, as modelled here, intonation has a stronger influence than delay. In both studies no correlation between the ranking of uncertainty and naturalness of the stimuli is found.

    Keywords disfluencies, DiSS, speech perception, speech synthesis, uncertainty

  • Luke Jai Wood, Kerstin Dautenhahn, Austen Rainer, Ben Robins, Hagen Lehmann, and Dag Sverre Syrdal, “Robot-Mediated Interviews - How Effective Is a Humanoid Robot as a Tool for Interviewing Young Children?,” PLoS ONE, vol. 8, no. 3, March 2013, pp. e59448. DOI: 10.1371/journal.pone.0059448. http://dx.doi.org/10.1371%2Fjournal.pone.0059448.

    Abstract Robots have been used in a variety of education, therapy or entertainment contexts. This paper introduces the novel application of using humanoid robots for robot-mediated interviews. An experimental study examines how children’s responses towards the humanoid robot KASPAR in an interview context differ in comparison to their interaction with a human in a similar setting. Twenty-one children aged between 7 and 9 took part in this study. Each child participated in two interviews, one with an adult and one with a humanoid robot. Measures include the behavioural coding of the children’s behaviour during the interviews and questionnaire data. The questions in these interviews focused on a special event that had recently taken place in the school. The results reveal that the children interacted with KASPAR very similarly to how they interacted with a human interviewer. The quantitative behaviour analysis reveals that the most notable differences between the interviews with KASPAR and the human were the duration of the interviews, the eye gaze directed towards the different interviewers, and the response time of the interviewers. These results are discussed in light of future work towards developing KASPAR as an ‘interviewer’ for young children in application areas where a robot may have advantages over a human interviewer, e.g. in police, social services, or healthcare applications.

2012

  • Hans Rutger Bosker, Anne-France Pinget, Hugo Quené, Ted Sanders, and Nivja H. de Jong, “What makes speech sound fluent? The contributions of pauses, speed and repairs,” Language Testing, vol. 30, no. 2, 2012 (print issue: April 2013), pp. 159-175. DOI: 10.1177/0265532212455394. http://ltj.sagepub.com/content/30/2/159.

    Abstract The oral fluency level of an L2 speaker is often used as a measure in assessing language proficiency. The present study reports on four experiments investigating the contributions of three fluency aspects (pauses, speed and repairs) to perceived fluency. In Experiment 1 untrained raters evaluated the oral fluency of L2 Dutch speakers. Using specific acoustic measures of pause, speed and repair phenomena, linear regression analyses revealed that pause and speed measures best predicted the subjective fluency ratings, and that repair measures contributed only very little. A second research question sought to account for these results by investigating perceptual sensitivity to acoustic pause, speed and repair phenomena, possibly accounting for the results from Experiment 1. In Experiments 2–4 three new groups of untrained raters rated the same L2 speech materials from Experiment 1 on the use of pauses, speed and repairs. A comparison of the results from perceptual sensitivity (Experiments 2–4) with fluency perception (Experiment 1) showed that perceptual sensitivity alone could not account for the contributions of the three aspects to perceived fluency. We conclude that listeners weigh the importance of the perceived aspects of fluency to come to an overall judgment.

    Keywords Fluency perception, pauses, perceptual sensitivity, repair, speed

  • Troy Cox, and Wendy Baker-Smemoe, “The relationship between L1 fluency and L2 fluency across different proficiency levels and L1s,” November 2012. https://nivjadj.wixsite.com/workshopfluentspeech/coxandsmemoe/c1gv7.

    Abstract Our understanding of oral temporal fluency (i.e., speech rate, pauses, and hesitations) in a second language (L2) has increased greatly in the past several years, along with our understanding of its relationship to overall proficiency, language processing, and automaticity (e.g., Brand & Götz, 2011; Segalowitz, 2007). However, the role of the speaker’s fluency in their native language (L1) on L2 fluency is still not understood. Few studies have examined this relationship, and these studies have examined few L1/L2 relationships across few proficiency levels (Scanlon, 1987; Derwing et al., 2009). Thus, the influence of L1 fluency on L2 fluency development is still unclear. The purpose of this study is to determine the effect of native language (L1) fluency and L2 proficiency level on features of L2 temporal fluency. Over one hundred English as a second language (ESL) students participated from five L1 backgrounds (Chinese, Japanese, Korean, Spanish, Portuguese) and 9 proficiency levels (novice high to advanced high on the ACTFL scale). Participants were asked to describe four picture stories, two in their L1 and two in their L2. Several fluency measures including unfilled pauses, speech rate, and articulation rate were analyzed using the Praat script described in de Jong and Wempe (2007). These fluency measures in the L1 were compared to those in the L2. The results of this analysis revealed that all features were highly correlated across the two languages, that these correlations were stronger for lower than higher proficiency speakers, and that the number and type of pauses, as well as speaking rate, differed across L1s. These results suggest that fluency reveals more than processing constraints aggregated by learning an L2, and suggest that measuring L1 fluency is important in any investigation of L2 fluency.

  • Nivja De Jong, Margarita P. Steinel, Arjen Florijn, Rob Schoonen, and Jan H. Hulstijn, “Facets of Speaking Proficiency,” Studies in Second Language Acquisition, vol. 34, no. 1, March 2012, pp. 5-34. DOI: 10.1017/S0272263111000489.

    Abstract This study examined the componential structure of second-language (L2) speaking proficiency. Participants—181 L2 and 54 native speakers of Dutch—performed eight speaking tasks and six tasks tapping nine linguistic skills. Performance in the speaking tasks was rated on functional adequacy by a panel of judges and formed the dependent variable in subsequent analyses (structural equation modeling). The following independent variables were assessed separately: linguistic knowledge in two tests (vocabulary and grammar); linguistic processing skills (four reaction time measures obtained in three tasks: picture naming, delayed picture naming, and sentence building); and pronunciation skills (speech sounds, word stress, and intonation). All linguistic skills, with the exception of two articulation measures in the delayed picture naming task, were significantly and substantially related to functional adequacy of speaking, explaining 76% of the variance. This provides substantial evidence for a componential view of L2 speaking proficiency that consists of language-knowledge and language-processing components. The componential structure of speaking proficiency was almost identical for the 40% of participants at the lower and the 40% of participants at the higher end of the functional adequacy distribution (n = 73 each), which does not support Higgs and Clifford’s (1982) relative contribution model, predicting that, although L2 learners become more proficient over time, the relative weight of component skills may change.

  • Ian R. Finlayson, and Martin Corley, “Disfluency in dialogue: an intentional signal from the speaker?,” Psychonomic Bulletin & Review, vol. 19, no. 5, October 2012, pp. 921-928. DOI: 10.3758/s13423-012-0279-x. https://link.springer.com/article/10.3758/s13423-012-0279-x.

    Abstract Disfluency is a characteristic feature of spontaneous human speech, commonly seen as a consequence of problems with production. However, the question remains open as to why speakers are disfluent: Is it a mechanical by-product of planning difficulty, or do speakers use disfluency in dialogue to manage listeners’ expectations? To address this question, we present two experiments investigating the production of disfluency in monologue and dialogue situations. Dialogue affected the linguistic choices made by participants, who aligned on referring expressions by choosing less frequent names for ambiguous images where those names had previously been mentioned. However, participants were no more disfluent in dialogue than in monologue situations, and the distribution of types of disfluency used remained constant. Our evidence rules out at least a straightforward interpretation of the view that disfluencies are an intentional signal in dialogue.

  • Jordi Adell, David Escudero, and Antonio Bonafonte, “Production of filled pauses in concatenative speech synthesis based on the underlying fluent sentence,” Speech Communication, vol. 54, no. 3, 2012, pp. 459-476. DOI: 10.1016/j.specom.2011.10.010. http://www.sciencedirect.com/science/article/pii/S0167639311001580.

    Abstract Until now, speech synthesis has mainly involved reading-style speech. Today, however, text-to-speech systems must provide a variety of styles because users expect these interfaces to do more than just read information. If synthetic voices are to be integrated into future technology, they must simulate the way people talk instead of the way people read. Existing knowledge about how disfluencies occur has made it possible to propose a general framework for synthesising disfluencies. We propose a model based on the definition of disfluency and the concept of underlying fluent sentences. The model incorporates the parameters of standard prosodic models for fluent speech with local modifications of prosodic parameters near the interruption point. The constituents of the local models for filled pauses are derived from the analysis corpus, and the constituents’ prosodic parameters are predicted via linear regression analysis. We also discuss the implementation details of the model when used in a real speech synthesis system. Objective and perceptual evaluations showed that the proposed models outperformed the baseline model. Perceptual evaluations of the system showed that it is possible to synthesise filled pauses without decreasing the overall naturalness of the system, and users stated that the speech produced is even more natural than that produced without filled pauses.

    Keywords Perceptual evaluation

  • Ralph L. Rose, “On the lexical status of filled pauses: Seeing ‘uh’ and ‘um’ as words,” 2012.

    Abstract Filled pauses (FPs: e.g., English uh/um, Japanese e-(to)) occur frequently in everyday communication. However, the exact linguistic status of FPs has been the subject of some debate. Some researchers have argued that FPs are words, with the same lexical status as such interjections as well or oh (Clark and Fox Tree 2002), or at least word-like in that they can be used in a controlled fashion (Villar et al. 2012). However, others have argued that the evidence is inconclusive and that FPs can be regarded as resulting automatically from cognitive processes (Corley and Stewart 2008). I argue that FPs are words based on facts showing the systematic and distinctive use of FPs in speech corpora (Kjellmer, 2003), and particularly in a corpus of blog writings (Rose 2011). Evidence from these corpora shows that FPs are used, among other ways, to highlight unexpected or unusual words and phrases (e.g., "Jan Wenner’s famous pub has gone, um, gaga for [Lady] Gaga.").

  • Gina Villar, Joanne Arciuli, and David Mallard, “Use of "um" in the deceptive speech of a convicted murderer,” Applied Psycholinguistics, vol. 33, no. 1, January 2012, pp. 83-95. DOI: 10.1017/S0142716411000117.

    Abstract Previous studies have demonstrated a link between language behaviors and deception; however, questions remain about the role of specific linguistic cues, especially in real-life high-stakes lies. This study investigated use of the so-called filler, "um," in externally verifiable truthful versus deceptive speech of a convicted murderer. The data revealed significantly fewer instances of "um" in deceptive speech. These results are in line with our recent study of "um" in laboratory elicited low-stakes lies. Rather than constituting a filled pause or speech disfluency, "um" may have a lexical status similar to other English words and may be under the strategic control of the speaker. In an attempt to successfully deceive, humans may alter their speech, perhaps in order to avoid certain language behaviors that they think might give them away.

2011

  • Karin Aijmer, “"Well I’m not sure I think…" The use of "well" by non-native speakers,” International Journal of Corpus Linguistics, vol. 16, no. 2, 2011, pp. 231-254. DOI: 10.1075/ijcl.16.2.04aij.

    Abstract Pragmatic markers are an important part of the grammar of conversation and not simply markers of disfluency. They have a number of functions that help the speaker to organise the conversation and to express feelings and attitudes. Advanced EFL learners use frequent pragmatic markers such as well. However, their use of well diverges from the native speaker norm. The present study uses data from the Swedish component of the LINDSEI corpus and its native speaker counterpart (LOCNEC) to examine similarities and differences between native and non-native speakers. The overall picture is that Swedish learners overuse well, although there are considerable individual differences. Thus learners use well above all as a fluency device to cope with speech management problems but underuse it for attitudinal purposes. Pragmatic markers cannot be taught in the same way as other lexical items, but it is important to discuss how and where they are used.

    Keywords language teaching, learner corpora, non-native speaker, pragmatic marker, well

  • Christiane Brand, and Sandra Götz, “Fluency versus accuracy in advanced spoken learner language: A multi-method approach,” International Journal of Corpus Linguistics, vol. 16, no. 2, 2011, pp. 255-275. DOI: 10.1075/ijcl.16.2.05bra.

    Abstract In this paper we present a possible multi-method approach towards the description of a potential correlation between errors and temporal variables of (dys-)fluency in spoken learner language. Using the German subcorpus of the Louvain International Database of Spoken English Interlanguage (LINDSEI) and the native control corpus Louvain Corpus of Native English Conversation (LOCNEC), we first analysed errors and temporal variables of fluency quantitatively. We detected lexical and grammatical categories which are especially error-prone as well as problematic aspects of fluency for all learners in the LINDSEI subcorpus, e.g. confusion in tense agreement across clauses or an overuse of unfilled pauses. In the ensuing qualitative analysis of five prototypical learners, no trend for a possible correlation of accuracy and fluency could be observed. Fifty native speakers’ ratings of these five learners revealed that the learner with an average performance across the investigated variables received the highest ratings for overall oral proficiency.

    Keywords accuracy, error analysis, errors, Fluency, learner corpus, LINDSEI

  • Martin Corley, and Robert J. Hartsuiker, “Why Um Helps Auditory Word Recognition: The Temporal Delay Hypothesis,” PLOS ONE, vol. 6, no. 5, May 2011, pp. 1-6. DOI: 10.1371/journal.pone.0019792.

    Abstract Several studies suggest that speech understanding can sometimes benefit from the presence of filled pauses (uh, um, and the like), and that words following such filled pauses are recognised more quickly. Three experiments examined whether this is because filled pauses serve to delay the onset of upcoming words and these delays facilitate auditory word recognition, or whether the fillers themselves serve to signal upcoming delays in a way which informs listeners' reactions. Participants viewed pairs of images on a computer screen, and followed recorded instructions to press buttons corresponding to either an easy (unmanipulated, with a high-frequency name) or a difficult (visually blurred, low-frequency) image. In all three experiments, participants were faster to respond to easy images. In 50% of trials in each experiment, the name of the image was directly preceded by a delay; in the remaining trials an equivalent delay was included earlier in the instruction. Participants were quicker to respond when a name was directly preceded by a delay, regardless of whether this delay was filled with a spoken um, was silent, or contained an artificial tone. This effect did not interact with the effect of image difficulty, nor did it change over the course of each experiment. Taken together, our consistent finding that delays of any kind help word recognition indicates that natural delays such as fillers need not be seen as ‘signals’ to explain the benefits they have to listeners' ability to recognise and respond to the words which follow them.

  • Nivja De Jong, “Cross-linguistic differences in pausing behavior,” December 2011. https://mirjamernestus.nl/Ernestus/public/AbstractsWorkshop2011.pdf.

    Abstract Pauses in speech can serve communicative ends, helping listeners understand (Clark, 1994), and pauses can be due to cognitive factors, when a speaker has not finished planning and formulating the upcoming utterance (Howell & Au-Yeung, 2002). In theories of speech production, lexical concepts are seen as the basic units of planning. If this holds for all languages, one would predict that for an agglutinative language such as Turkish, units of planning can be larger than for a non-agglutinative language such as English. Following this reasoning, speakers of Turkish would have fewer opportunities to pause than speakers of English. This hypothesis is tested by comparing speech data of Turkish and English native speakers. Twenty-four Turkish speakers and twenty-nine English speakers performed eight speaking tasks. These tasks were long turns in simulated conversation. In total, nine hours of Turkish and English speech were annotated, adding information about frequency and duration of silent pauses (as well as other hesitation phenomena). The results showed that Turkish words are indeed longer in number of syllables and in duration. Furthermore, speakers hardly paused within words, confirming the hypothesis that lexical items form the basis of units-of-speech. Finally, Turkish speakers paused less often than English speakers, but when they paused the duration of these pauses was longer. In total, percentage of time spent pausing did not differ for the Turkish and English speakers. We conclude that usage of pauses due to cognitive factors is dependent on typological features of languages, leading to cross-linguistic differences in pausing behavior.

  • Tyko Dirksmeyer, “Lexical hesitation marking in Chintang: Evidence for fillers as words,” December 2011. https://mirjamernestus.nl/Ernestus/public/AbstractsWorkshop2011.pdf.

    Abstract The status of hesitation markers (or ‘fillers’, ‘filled pauses’, ‘editing expressions’, etc., such as uh(m) in English) has been fiercely disputed in various subdisciplines of the language sciences over the past decades. Should these items be viewed as aberrations in performance that need to be excluded from linguistic analysis (e.g. Chomsky 1965), are they symptoms of speech production processes that signal trouble but do not signify anything beyond that (Goldman-Eisler 1968; Levelt 1989), or are they actively employed as communicative means just like other words are (Clark and Fox Tree 2002; Jefferson 1974; Schegloff 2010), and thus form an integral part of language?

    Chintang, a Tibeto-Burman language spoken in two villages in Nepal, provides evidence for the latter view. Its principal hesitation marker me~ı occurs in the same range of functional environments (word search, self-repair, prefacing dispreferred turns, among others) in which uh(m) appears in English (and similar forms feature in other well-known languages). Yet, me~ı demonstrably conforms to standard phonological, morphosyntactic and semantic criteria for wordhood, can be seamlessly integrated into utterances, and is regularly exploited for communicative purposes such as "floor management" and projecting what to expect next.

    In this talk, I will review data drawn from a corpus of video-recorded naturally occurring conversational interaction in Chintang and argue for the profoundly conventional nature of hesitation marking with me~ı. The findings from this small, as-yet-understudied speech community indicate that fillers should indeed be treated as lexical items on a par with other words. Consequently, they call on linguistic theorizing not only to take hesitation marking and its communicative functions in conversational speech seriously, but also to embrace and incorporate typological diversity in order to arrive at truly generalizable models of language processing.

  • Gaëtanelle Gilquin, and Sylvie De Cock, “Errors and disfluencies in spoken corpora: Setting the scene,” International Journal of Corpus Linguistics, vol. 16, no. 2, 2011, pp. 141-172. DOI: 10.1075/ijcl.16.2.01gil.

    Abstract (none)

  • John Osborne, “Fluency, complexity and informativeness in native and non-native speech,” International Journal of Corpus Linguistics, vol. 16, no. 2, 2011, pp. 276-298. DOI: 10.1075/ijcl.16.2.06osb.

    Abstract Individual speakers vary considerably in their rate of speech, their syntactic choices, and the organization of information in their discourse. This study, based on a corpus of monologue productions from native and non-native speakers of English and French, examines the relations between temporal fluency, syntactic complexity and informational content. The purpose is to identify which features, or combinations of features, are common to more fluent speakers, and which are more idiosyncratic in nature. While the syntax of fluent speakers is not necessarily more complex than that of less fluent speakers, it is suggested that they are able to deliver content more efficiently through a combination of less hesitant speech and of lexical and syntactic choices that allow them to package information more economically.

    Keywords Fluency, information content, learner corpora, lexical bundles, syntactic complexity

  • Anne-France Pinget, “Native Speakers’ Perceptions of Fluency and Accent in L2 Speech,” Master's Thesis, Utrecht University, Utrecht, the Netherlands, June 2011. http://igitur-archive.library.uu.nl/student-theses/2011-0816-200626/UUindex.html.

    Abstract The goal of this study is threefold. It is aimed at exploring (i) the relationship between objective properties of speech and perceived fluency, (ii) the relationship between segmental characteristics of speech and perceived accent, and (iii) the relationship between fluency and accent. We collected 90 speech samples from Turkish and English L2 learners of Dutch. Objective measures of fluency and accent were made for each sample. Forty untrained native speakers of Dutch rated the samples for fluency and accentedness. The results showed that the temporal measures of fluency were good predictors of fluency ratings, and that their predictive power depends on the type of measures used (i.e. traditional measures per time units, measures per information units, measures that take the L1 into consideration). Furthermore, the segmental measure of accent could predict a small part of accent ratings. Finally, perceived fluency and accent appeared to be weakly correlated, but objective measures of fluency and accent did not add additional explanatory power to the models of perceived accent and perceived fluency respectively.

    Keywords accent, Fluency, perception, second language acquisition

  • Ralph L. Rose, “Filled Pauses in Writing: What can they Teach us about Speech?,” December 2011. https://mirjamernestus.nl/Ernestus/public/AbstractsWorkshop2011.pdf.

    Abstract This presentation reports on a research effort to use filled pauses ('uh', 'um': hereafter, FPs) in blog writings to better understand how and why speakers use them in spontaneous speech. Blog FPs are written intentionally and cannot be the result of some linguistic processing shortcoming (i.e., speech-repair as in Levelt, 1983). Hence, if written FPs can be accurately characterized, then the spoken FPs that fit this characterization can be removed from consideration, leaving a smaller, potentially more uniform set of other FPs for further study.

    Samples of FPs in blog writings were gathered from 100 top blogs. Samples of FPs in spontaneous speech were taken from the Switchboard corpus. A balanced sample of 227 FPs of each type was gathered. Each FP was categorized according to its medium (written or spoken), its location (at clause boundary or clause-internal), the part-of-speech of the immediately following word (content or function, following Maclay and Osgood's 1959 classification), and the FP type (open 'uh' or closed 'um', after Rose, 1998). The data were analyzed under a generalized linear model with chi-square tests.

    There was a main effect of FP Type (Chi-square=48.4, p<0.001) with a ratio of open to closed FPs of approximately 2:1. This is comparable to previous studies (e.g., Rose, 1998). There were no other main effects. There was an interaction between medium and following word type (Chi-square=37.0, p<0.001), as well as between medium and FP type (Chi-square=5.4, p<0.05). In the spoken medium, the following word was 30% more likely to be a function word than a content word, while in the written medium, this trend reversed: the following word was 70% more likely to be a content word than a function word. Also, in the spoken medium, the ratio of open to closed FPs was almost 3:1, but in the written medium, this ratio dropped to 1.4:1.

    Results from FPs in writing suggest a hybrid view of FPs in speech: Some FPs are used intentionally and with some selectional restrictions (i.e., before content words) in order to serve some pragmatic function (cf., filler-as-word hypothesis in Clark and Fox Tree, 2002), with open FPs being slightly preferred in this role. Other FPs in speech are the result of difficulties during linguistic processing and occur semi-automatically as part of speech repair (cf., Levelt, 1983).

  • Christoph Rühlemann, Andrej Bagoutdinov, and Matthew Brook O’Donnell, “Windows on the mind: Pauses in conversational narrative,” International Journal of Corpus Linguistics, vol. 16, no. 2, 2011, pp. 198-230. DOI: 10.1075/ijcl.16.2.03ruh.

    Abstract This paper investigates four different types of pauses in conversational narrative: the filled pauses er and erm, and short and long silent pauses. The study is based on the Narrative Corpus (NC), a recently created corpus of everyday narratives. The texts, which include both the narrative and some context, have been annotated for important textual components. The current analysis reveals that pauses are more frequent in conversational narrative than in general conversation. We suggest three factors that account for this high frequency: (i) the need for narrators, in the opening utterance of the story, to provide specific information to orient listeners to the situation in which the events unfolded, (ii) the need to coordinate narrative clauses to match the story events, and (iii) the preference of narrators to present speech, thought, emotion and gesture using direct-mode discourse presentation, which is more "dramatic" but also more costly in terms of reference resolution.

    Keywords discourse presentation, narrative, narrative corpus, pauses, quotatives, Reference

  • Scott H. Fraundorf, and Duane G. Watson, “The disfluent discourse: Effects of filled pauses on recall,” Journal of Memory and Language, vol. 65, no. 2, 2011, pp. 161-175. DOI: 10.1016/j.jml.2011.03.004. http://www.sciencedirect.com/science/article/pii/S0749596X11000234.

    Abstract We investigated the mechanisms by which fillers, such as uh and um, affect memory for discourse. Participants listened to and attempted to recall recorded passages adapted from Alice’s Adventures in Wonderland. The type and location of interruptions were manipulated through digital splicing. In Experiment 1, we tested a processing time account of fillers’ effects. While fillers facilitated recall, coughs matched in duration to the fillers impaired recall, suggesting that fillers’ benefits cannot be attributed to adding processing time. In Experiment 2, fillers’ locations were manipulated based on norming data to be either predictive or non-predictive of upcoming material. Fillers facilitated recall in both cases, inconsistent with an account in which listeners predict upcoming material using past experience with the distribution of fillers. Instead, these results suggest an attentional orienting account in which fillers direct attention to the speech stream but do not always result in specific predictions about upcoming material.

    Keywords Language comprehension

  • Parvaneh Tavakoli, “Pausing patterns: differences between L2 learners and native speakers,” ELT Journal, vol. 65, no. 1, May 2011, pp. 71-79. DOI: 10.1093/elt/ccq020.

    Abstract This paper reports on a comparative study of pauses made by L2 learners and native speakers of English while narrating picture stories. The comparison is based on the number of pauses and total amount of silence in the middle and at the end of clauses in the performance of 40 native speakers and 40 L2 learners of English. The results of the quantitative analyses suggest that, although the L2 learners generally pause more repeatedly and have longer periods of silence than the native speakers, the distinctive feature of their pausing pattern is that they pause frequently in the middle of clauses rather than at the end. The qualitative analysis of the data suggests that some of the L2 learners’ mid-clause pauses are associated with processes such as replacement, reformulation, and online planning. Formulaic sequences, however, contain very few pauses and therefore appear to facilitate the learners’ fluency.

  • Gunnel Tottie, “"Uh" and "Um" as sociolinguistic markers in British English,” International Journal of Corpus Linguistics, vol. 16, no. 2, 2011, pp. 173-197. DOI: 10.1075/ijcl.16.2.02tot.

    Abstract This study is based on the British National Corpus (BNC) and also takes data from the London-Lund Corpus (LLC) into account. It shows that the so-called filled pauses er/uh and erm/um are sociolinguistic markers that differentiate between registers of English and along gender, age and socio-economic class. Men, older people and educated speakers use more fillers than women, younger speakers and less educated speakers. Nasalization is used more often by women, younger speakers and more educated speakers. These sociolinguistic factors can probably partly explain the fact that the use of fillers is higher in the LLC and the context-governed sample of the BNC than in the demographic sample of the BNC. It is argued that a more positive view should be taken of fillers as planning signals, or planners, and that their functions should be submitted to careful discourse analytic study. Their recognition as words will facilitate such an undertaking.

    Keywords corpus linguistics, Discourse markers, disfluency, filled pauses, hesitation markers, sociolinguistic markers

2010

  • April Ginther, Slobodanka Dimova, and Rui Yang, “Conceptual and empirical relationships between temporal measures of fluency and oral English proficiency with implications for automated scoring,” Language Testing, vol. 27, no. 3, June 2010, pp. 379-399. DOI: 10.1177/0265532210364407. http://ltj.sagepub.com/content/27/3/379.short.

    Abstract Information provided by examination of the skills that underlie holistic scores can be used not only as supporting evidence for the validity of inferences associated with performance tests but also as a way to improve the scoring rubrics, descriptors, and benchmarks associated with scoring scales. As fluency is considered a critical, perhaps foundational, component of speaking proficiency, temporal measures of fluency are expected to be strongly related to holistic ratings of speech quality. This study examines the relationships among selected temporal measures of fluency and holistic scores on a semi-direct measure of oral English proficiency. The spoken responses of 150 respondents to one item on the Oral English Proficiency Test (OEPT) were analyzed for selected temporal measures of fluency. The examinees represented three first language backgrounds (Chinese, Hindi, and English) and the range of scores on the OEPT scale. While strong and moderate correlations between OEPT scores and speech rate, speech time ratio, mean length of run, and the number and length of silent pauses were found, fluency variables alone did not distinguish adjacent levels of the OEPT scale. Temporal measures of fluency may reasonably be selected for the development of automated scoring systems for speech; however, identification of an examinee’s level remains dependent on aspects of performance only partially represented by fluency measures.

    Keywords automated scoring, Fluency, oral English proficiency

  • Joanne Arciuli, David Mallard, and Gina Villar, “"Um, I can tell you’re lying": Linguistic markers of deception versus truth-telling in speech,” Applied Psycholinguistics, vol. 31, no. 03, 2010, pp. 397-411. DOI: 10.1017/s0142716410000044. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=7792900&fulltextType=RA&fileId=S0142716410000044.

    Abstract Lying is a deliberate attempt to transmit messages that mislead others. Analysis of language behaviors holds great promise as an objective method of detecting deception. The current study reports on the frequency of use and acoustic nature of "um" and "like" during laboratory-elicited lying versus truth-telling. Results obtained using a within-participants false opinion paradigm showed that instances of "um" occur less frequently and are of shorter duration during lying compared to truth-telling. There were no significant differences in relation to "like". These findings contribute to our understanding of the linguistic markers of deception behavior. They also assist in our understanding of the role of "um" in communication more generally. Our results suggest that "um" may not be accurately conceptualized as a filled pause/hesitation or speech disfluency/error whose increased usage coincides with increased cognitive load or increased arousal during lying. It may instead carry a lexical status similar to interjections and form an important part of authentic, effortless communication, which is somewhat lacking during lying.

  • Rachel Baker, and Valerie Hazan, “LUCID: a corpus of spontaneous and read clear speech in British English,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 3-6. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_003.pdf.

    Abstract This paper describes LUCID, the London UCL Clear Speech in Interaction Database, which contains spontaneous and read speech in clear and casual speaking styles for 40 Southern British English speakers. The problem-solving task used to collect the spontaneous speech, the DiapixUK task, is also described, along with ways of using the task to elicit different types of clear speech without explicit instruction, e.g. using different ‘barriers’ to communication. Applications of the corpus and of the task materials for future research projects are discussed. The corpus and materials will be available online to the research community at the end of the project.

    Keywords clear speech, DiSS, interaction, Speech production, spontaneous speech

  • Catia Cucchiarini, Joost van Doremalen, and Helmer Strik, “Fluency in non-native read and spontaneous speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 15-18. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_015.pdf.

    Abstract Various studies have investigated the temporal aspects of nonnative speech and their relation to perceived fluency, because fluency constitutes an important aspect of second language proficiency. For this purpose it is important to determine which measures are most strongly correlated with perceived fluency and how these measures vary. In the present study objective measures related to perceived fluency were calculated for read and spontaneous speech of non-native speakers of Dutch. The results indicate that the objective measures vary as a function of different variables. Suggestions are made for future investigations so as to facilitate comparisons between studies and meta-analyses.

    Keywords DiSS, Fluency, non-native speech, temporal measures

  • Anne Cutler, Holger Mitterer, Susanne Brouwer, and Annelie Tuinman, “Phonological competition in casual speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 43-46. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_043.pdf.

    Abstract The natural processes affecting spontaneous speech production and the natural processes of spoken-word recognition combine to cause significant activation of irrelevant lexical competitors. Using eye-tracking, we show that reduced forms of words that occur in casual speech cause listeners to activate lexical candidates that resemble the reduced form but are quite unlike the canonical form of the intended word. In L2, the problem is worse: casual speech processes that occur in the L2 but not in the L1 lead to activation of irrelevant competitors even where native listeners experience no such competition.

    Keywords competition, DiSS, eyetracking, word recognition

  • Yuto Daikuhara, “日本語教育におけるフィラーの指導のための基礎的研究 : フィラーの定義と個々の形式の使い分けについて [Basic research on filler for Japanese as foreign language: definition of filler and differentiated use of each form],” PhD Dissertation, Kobe University, Kobe, Japan. March 2010. http://www.lib.kobe-u.ac.jp/infolib/meta_pub/G0000003kernel_D1004831.

    Abstract (none)

  • Dale J. Barr, and Mandana Seyfeddinipur, “The role of fillers in listener attributions for speaker disfluency,” Language and Cognitive Processes, vol. 25, no. 4, 2010, pp. 441-455. DOI: 10.1080/01690960903047122. https://www.tandfonline.com/doi/abs/10.1080/01690960903047122.

    Abstract When listeners hear a speaker become disfluent, they expect the speaker to refer to something new. What is the mechanism underlying this expectation? In a mouse-tracking experiment, listeners sought to identify images that a speaker was describing. Listeners more strongly expected new referents when they heard a speaker say um than when they heard a matched utterance where the um was replaced by noise. This expectation was speaker-specific: it depended on what was new and old for the current speaker, not just on what was new or old for the listener. This finding suggests that listeners treat fillers as collateral signals.

    Keywords common ground, Dialogue, Disfluency, fillers, Perspective taking

  • Robert Eklund, “The effect of directed and open disambiguation prompts in authentic call center data on the frequency and distribution of filled pauses and possible implications for filled pause hypotheses and data collection methodology,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 23-26. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_023.pdf.

    Abstract This paper studies the frequency and distribution of filled pauses (FPs) in ecologically valid data where unaware and authentic customers called in to report problems with their telephony and/or Internet services and were met by a novel Wizard-of-Oz paradigm using real call center agents as wizards. The data analyzed were caller utterances following a directed or an open disambiguation prompt. While no significant differences in FP production were observed as a function of prompt type, FP frequency was found to be considerably higher than what is usually reported in the literature. Moreover, a higher proportion of utterance-initial FPs than normally reported was also observed. The results are compared to previously reported FP frequencies. Potential implications for data collection methodology are discussed.

    Keywords call center, data collection, dialog systems, directed prompts, DiSS, filled pauses, many-options, open prompts, speech planning, Speech production, Wizard-of-Oz, WOZ

  • Ian R. Finlayson, Robin J. Lickley, and Martin Corley, “The influence of articulation rate, and the disfluency of others, on one’s own speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 119-122. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_119.pdf.

    Abstract Disfluencies are a regular feature of spontaneous speech, and much has been learnt about the effects of various linguistic factors on their production. Speech usually occurs within dialogue, yet little is known about the influence of an interlocutor’s speech on a speaker’s own fluency. It has been shown that speakers tend to align on various levels, converging, for example, on lexical and syntactic levels. But we know little about convergence in rate of speech or disfluency. Little is also known about the effects of speech rate on fluency in a speaker’s own speech. In this paper, we examine these effects through analysis of speech rate, hesitation and error correction in a corpus of task-oriented dialogues (the HCRC Map Task Corpus). Our findings demonstrate that different types of disfluencies can be influenced in different ways by speech rate. Furthermore, the probability of an interlocutor being disfluent appears to affect the speaker’s own likelihood, raising the possibility that interlocutors may “align” on disfluent, as well as fluent, speech.

    Keywords accommodation theory, alignment, articulation rate, Dialogue, DiSS

  • Anne Garcia-Fernandez, Ioana Vasilescu, and Sophie Rosset, “euh as cue for speaker confidence and word searching in human spoken answers in French,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 79-80. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_079.pdf.

    Abstract This paper deals with the contextual analysis of the vocalic hesitation euh in French in a corpus of human elicited answers. Through the analysis of the contextual combinatorial patterns, the new information introductory role of this vocalic hesitation is investigated. Observations support trends noticed in other languages and suggest potential optimization for automatic question answering systems.

    Keywords DiSS, feeling of knowing, interaction management, QA systems, rephrasing, vocalic hesitation

  • Jean-Philippe Goldman, Mathieu Avanzi, and Antoine Auchlin, “Hesitations in read vs. spontaneous French in a multi-genre corpus,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 101-104. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_101.pdf.

    Abstract This study is part of ongoing work whose goal is the prosodic characterization of various speaking styles in a multi-genre 70-minute French corpus, as well as the development of automatic prosodic detection tools. In this corpus, a manual annotation of prominences and of disfluencies such as hesitations and syntactic ruptures is used to show evident phonological aspects of hesitation with regard to quality, pause position and proximity to syntactic rupture.

    Keywords disfluencies, DiSS, filled pause, hesitation, spoken French, vowel lengthening

  • Joakim Gustafson, and Daniel Neiberg, “Prosodic cues to engagement in non-lexical response tokens in Swedish,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 63-66. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_063.pdf.

    Abstract This paper investigates the prosodic patterns of non-lexical response tokens in a Swedish call-in radio show. The feedback of a professional speaker was investigated to give insight into how to build a simulated active listener that could encourage its users to continue talking. Possible domains for such systems include customer care and second language learning. The prosodic analysis of the non-lexical response tokens showed that the engagement level decreases over time. Prosodic cues to this include change in syllabicity, pitch slope and loudness. We have also investigated prosodic alignment, to see to what extent the active listener mimics the prosody of the callers in his non-lexical response tokens.

    Keywords DiSS, listener responses, prosodic alignment, prosodic cues, turn management

  • Corinna Harwardt, “Investigating the COG ratio as feature for speaker verification on high-effort speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 35-38. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_035.pdf.

    Abstract Vocal effort mismatch in training and test data leads to immense degradations of speaker recognition systems. The changes to the acoustics of a speech signal induced by raised vocal effort are complex and, despite several studies from various authors, not yet completely known. Rather than just gaining knowledge about these differences, for automatic speaker recognition it is essential to discover features that remain relatively stable in changing vocal effort conditions and contain speaker-specific information. In this study we investigate the center of gravity (COG) ratio for high and mid frequency bands as a feature for speaker recognition. We find that vocal effort mismatch leads to an equal error rate (EER) more than six times higher for a standard MFCC-based GMM-UBM system. For the COG ratio we observe a much smaller degradation of around 25%. When adapting the UBM with additional high-effort speech data, the EER of the COG ratio gets even better for the mismatch condition than for the matching task. Combining MFCC and the COG ratio leads to the best results, with an overall improvement of 16% compared to the standard MFCC-based system.

    Keywords center of gravity ratio, DiSS, speaker recognition, vocal effort

  • Valerie Hazan, and Rachel Baker, “Does reading clearly produce the same acoustic-phonetic modifications as spontaneous speech in a clear speaking style?,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 7-10. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_007.pdf.

    Abstract This paper describes an acoustic-phonetic comparison of casual and clear speech styles elicited in read and spontaneous speech. For the spontaneous speech, 20 pairs of English talkers were recorded doing a problem-solving picture task in good and degraded listening conditions. Each person also read sentences in casual and clear styles. The read clear speech was an exaggerated form of clear speech relative to the spontaneous clear speech: it had higher median F0 in both styles, a greater increase in F0 range and greater decrease in speaking rate between casual and clear styles, and trends towards greater vowel space expansion.

    Keywords acoustic-phonetic characteristics, clear speech, DiSS, interaction, read speech, spontaneous speech

  • Pei-Yu Hsieh, “Pitch patterns in the vocalization of a 3-month-old Taiwanese infant,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 93-96. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_093.pdf.

    Abstract This paper studied pitch contours of a Taiwanese-acquiring infant at the gooing stage. Breath group theory has shown that pitch patterns of this stage were physiologically based [6]. Fall was expected to occur at the boundary of a breath group. The theory predicted Fall to be the most common pitch contour and Rise-Fall the second most common. But previous studies [8], [9] showed that Rise-Fall occurred more often. We investigated patterns of an infant from six weeks old to twelve weeks old. Mean f0 of basic contours of this stage was also shown. The f0 ranges of Level, Fall, and Rise were reported. Our results showed four types of contours (Level, Fall, Rise, Rise-Fall) appearing at this stage. Consistent with the hypothesis, Fall was found to be the most common. Rise-Fall was found to be the second most common. Fall and Rise-Fall together made up almost seventy percent. The Level contour was found to be rare. The mean f0 of the infant at 3 months old was 400 Hz, higher than that of a toddler at 1;3 (370 Hz) and that of an adult (220 Hz). The f0 range was 700 Hz, greater than that of a toddler at 1;3 (450 Hz) and an adult (300 Hz).

    Keywords acquisition, DiSS, pitch, vocalization

  • Tomohito Ishikawa, “Coding disfluency phenomena for a fluency measure in TBLT research,” Journal of Soka Women’s College, vol. 40, March 2010, pp. 101-130. http://ci.nii.ac.jp/naid/40017373381/en/.

    Abstract The aim of this article is to describe coding steps for a disfluency measure employed in Ishikawa (2008a, b). According to Ellis and Barkhuizen (2005), fluency measures can be divided into two major categories. One is related to speed of speaking (i.e., temporal variables) and the other is related to repair fluency. In the sections to follow, I will first describe Shriberg’s classification system of disfluency. After the description of Shriberg’s classification system, I will describe an L2 disfluency measure used in Ishikawa (2008a, b).

  • Yuichi Ishimoto, and Mika Enomoto, “Analysis of prosodic features for end-of-utterance prediction in spontaneous Japanese,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 97-100. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_097.pdf.

    Abstract In this study, we analyzed prosodic features of accentual phrases and investigated their temporal changes to obtain cues for detecting boundaries where turn-taking could occur in spontaneous conversations. The acoustic parameters used as prosodic features were the fundamental frequency, sound pressure level, and duration of accentual phrases in long utterance units. The results showed that the fundamental frequency shift between the first and second accentual phrases could be useful for detecting the number of accentual phrases in the long utterance unit. In addition, the results suggested that a rapid decrease in sound pressure and an extended duration of the accentual phrase constitute a cue for detecting the end of the utterance. That is, the acoustic predictor of the utterance length appeared at the beginning of the utterance, and the predictor of the utterance boundary appeared shortly before the end of the utterance.

    Keywords accentual phrase, DiSS, long utterance unit, prosody, turn-taking

  • Kristiina Jokinen, “Hesitation and uncertainty as feedback,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 103-106. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_103.pdf.

    Abstract This paper deals with the signals that are used to express hesitation and uncertainty in conversational interactions. It studies the relation between gesturing, body posture, facial expressions, and speech, and draws conclusions about their role and function in the interpretation and coordination of interaction with respect to the basic enablements of communication. Dialogues are assumed to be a cooperative activity that is constrained by the participants’ roles, social obligations, and communicative situation.

    Keywords DiSS, hesitation, interaction, speech, uncertainty

  • Okim Kang, “Relative salience of suprasegmental features on judgments of L2 comprehensibility and accentedness,” System, vol. 38, no. 2, June 2010, pp. 301-315. DOI: 10.1016/j.system.2010.01.005.

    Abstract Suprasegmentals have been emphasized in ESL/EFL pedagogy since the advent of communicative language teaching. However, it is still unclear how individual suprasegmental features affect listeners’ judgments of non-native speakers’ accented speech. The current study began to specify relative weights of individual temporal and prosodic features for listeners’ judgments on L2 comprehensibility and accentedness. Using the PRAAT computer program, 5 min of continuous in-class lectures from 11 international teaching assistants (ITAs) were acoustically analyzed for measures of speech rate, pauses, stress, and pitch range. Fifty-eight US undergraduate students evaluated the ITAs’ oral performance and commented on their ratings. The results revealed that suprasegmental features independently contributed to listeners’ perceptual judgments. Accent ratings were best predicted by pitch range and word stress measures, whereas comprehensibility scores were mostly associated with speaking rates. ITAs’ acoustic profiles as well as listeners’ comments on their ratings offer practical implications to ITA program developers, ESL teachers, and future research in accented speech.

    Keywords accentedness, Comprehensibility, International teaching assistants, Suprasegmentals

  • Takuya Kawada, “On the characteristics of three types of Japanese fillers: e-, ma-, and demonstrative-type fillers,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 27-30. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_027.pdf.

    Abstract Japanese has various forms of fillers. However, the characteristics of each form have yet to be well understood. We use a large corpus of spontaneous Japanese speech and conversation and focus on three frequently observed types of fillers: e-, ma-, and demonstrative-type fillers. We show that it is possible to characterize Japanese fillers from the viewpoint of how a speaker concerns himself with the listener in the communicative setting. The type of discourse, way of speaking, and direction of gaze of the speaker influence the distribution of the types of filler.

    Keywords DiSS, fillers, gaze, Japanese, spoken settings

  • Hanae Koiso, and Yasuharu Den, “Towards a precise model of turn-taking for conversation: a quantitative analysis of overlapped utterances,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 55-58. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_055.pdf.

    Abstract In this paper, we present the outline of a new model of turn-taking that is applicable not only to smooth transitions but also to transitions involving overlapping speech. We identify acoustic, prosodic, and syntactic cues in overlapped utterances that elicit early initiation of a next turn, based on a quantitative analysis of Japanese three-party conversations, proposing a model for predicting a turn’s completion in an incremental fashion using sources from units at multiple levels.

    Keywords DiSS, incremental processing, overlapped utterances, turn-taking

  • Phoenix W. Y. Lam, “Discourse Particles in Corpus Data and Textbooks: The Case of Well,” Applied Linguistics, vol. 31, no. 2, May 2010, pp. 260-281. DOI: 10.1093/applin/amp026. http://applij.oxfordjournals.org/content/31/2/260.abstract.

    Abstract Discourse particles are ubiquitous in spoken discourse. Yet despite their pervasiveness very few studies attempt to look at their use in the pedagogical setting. Drawing on data from an intercultural corpus of speech and a textbook database, the present study compares the use of discourse particles by expert users of English in Hong Kong with their descriptions and presentations in textbooks designed for learners of English in the same community. Specifically, it investigates the similarities and differences in the use of the discourse particle well between the two datasets in terms of its frequency of occurrence, its positional preference and its discourse function. Results from the analysis show that there are vast differences as regards how the particle well is used in real-world examples and how its use is described and presented in teaching materials. This raises the question to what extent foreign language learners who have minimal exposure to naturally-occurring spoken interactions in English could effectively master the use of discourse particles if they solely rely on these textbooks.

  • Rebecca Lunsford, Peter A. Heeman, Lois Black, and Jan van Santen, “Autism and the use of fillers: differences between ‘um’ and ‘uh’,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 107-110. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_107.pdf.

    Abstract Little research has been done to explore differences in the use of the fillers ‘um’ and ‘uh’ between children with Autistic Spectrum Disorder (ASD) and those with typical development (TD). Quantifying any differences could aid in diagnosing ASD, understanding its nature, and better understanding the mechanisms involved in dialogue processing. In this paper, we report on a study of dialogues between clinicians and children with ASD or TD, finding that the two groups of children differ substantially in their use of ‘um’ but not ‘uh’. This suggests that these two fillers result from different cognitive processes.

    Keywords autism, disfluencies, DiSS, fillers

  • Kikuo Maekawa, “Final lowering and boundary pitch movements in spontaneous Japanese,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 47-50. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_047.pdf.

    Abstract Standard theory of the prosodic structure in Tokyo Japanese treats both the final lowering and boundary pitch movements as the properties of utterance node. Validity of this treatment was examined by means of corpus-based analyses of spontaneous speech. The results showed that while final lowering could be treated as a property of utterance, boundary pitch movement could not. The latter should rather be treated as the property of accentual phrase. Based on these results, revised prosodic structure and annotation scheme were proposed.

    Keywords BPM, CSJ, DiSS, final lowering, X-JToBI

  • Takehiko Maruyama, Katsuya Takanashi, and Nao Yoshida, “An annotation scheme for syntactic unit in Japanese dialog,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 51-54. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_051.pdf.

    Abstract In this paper, we propose a scheme for annotating syntactic units called DCU (Dialog Clause-Unit) in Japanese dialogs. Since there are no explicit devices to mark sentence boundaries in speech, precise definitions and criteria must be designed to extract syntactic units from the utterance. We show a design of DCU which consists of clausal and non-clausal units. Annotating DCU tags on eight 40-minute dialogs from two different dialog corpora, we examine characteristics of each dialog from the viewpoint of DCU, and compare them to the distribution of clausal units annotated in monologs.

    Keywords clause boundary, dialog clause-unit, DiSS, Japanese dialog and monolog, unit length

  • Dana McDaniel, Cecile McKee, and Merrill F. Garrett, “Children’s sentence planning: Syntactic correlates of fluency variations,” Journal of Child Language, vol. 37, no. 1, 2010, pp. 59-94. DOI: 10.1017/s0305000909009507. http://journals.cambridge.org/article_S0305000909009507.

    Abstract This paper argues for broader consideration of children’s language production systems and, in that context, describes research on children’s planning of syntactic structures. The research presented here measures non-fluency patterns in elicited utterances of varied syntactic type. We describe and interpret several regularities in these patterns for two groups of children (‘young’: three–five-year-olds; and ‘older’: six–eight-year-olds) and an adult comparison group. The evidence indicates a strong correspondence of adult and child responses to structural complexity, both in terms of global fluency measures and in terms of more detailed indicators of planning load. In addition, we report some specific contrasts in the patterning for children and adults that suggest disparities in processing resources and/or in local planning strategies.

  • Sandra Merlo, and Plínio A. Barbosa, “Periodic cycles of hesitation phenomena in spontaneous speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 19-22. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_019.pdf.

    Abstract To verify whether hesitation phenomena are distributed periodically in spontaneous speech, twenty speech samples produced by five male adults were analyzed. Spectral analysis allowed for three main findings. First, hesitations present stationary behavior, which implies they did not accumulate in the beginning, in the middle, or in the end of speech samples. Second, periodic cycles of hesitation phenomena were detected in all speech samples (mean cycle duration around 13 seconds). This implies that regions with more hesitations tended to regularly alternate with regions with fewer hesitations. Third, periodic cycles accounted for about 30% of variance in data.

    Keywords DiSS, hesitation phenomena, periodic cycles, time series

  • Emi Morita, “Salientizing the breaks in talk: a study of Japanese segmentizing,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 59-62. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_059.pdf.

    Abstract In naturally occurring conversation, Japanese speakers often break up their turns at talk with seemingly random or disfluent pauses that break the flow of talk into a series of successive small segments which may not be semantically coherent. Moreover, the boundaries between such segments are often made salient via the attachment of interactional particles, such as ne and sa. Empirical observation of such naturally occurring partitioning of talk reveals that such “semantically irregular” segmentation is used by both speakers and their recipients to accomplish a legitimate communicative function in managing the fine-tuned choreography of moment-by-moment conversational interaction.

    Keywords DiSS, interactional particles, Japanese conversation, utterance segmentation

  • Daniel Neiberg, and Joakim Gustafson, “Modeling conversational interaction using coupled Markov chains,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 81-84. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_081.pdf.

    Abstract This paper presents a series of experiments on automatic transcription and classification of fillers and feedbacks in conversational speech corpora. A feature combination of PCA projected normalized F0 Constant-Q Cepstra and MFCCs has been shown to be effective for standard Hidden Markov Models (HMM). We demonstrate how to model both speaker channels with coupled HMMs and show expected improvements. In particular, we explore model topologies which take advantage of predictive cues for fillers and feedback. This is done by initializing the training with special labels located immediately before fillers in the same channel and immediately before feedbacks in the other speaker channel. The average F-score for a standard HMM is 34.1%, for a coupled HMM 36.7% and for a coupled HMM with pre-filler and pre-feedback labels 40.4%. In a pilot study the detectors are found to be useful for semi-automatic transcription of feedback and fillers in socializing conversations.

    Keywords conversation, coupled hidden markov models, cross-speaker modeling, DiSS, feedbacks, fillers

  • Hannele Nicholson, Kathleen Eberhard, and Matthias Scheutz, “‘Um... I don’t see any’: the function of filled pauses and repairs,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 89-92. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_089.pdf.

    Abstract We investigate disfluency distribution rates within different moves from an interactive task-oriented experiment to further explore the suggestion by Bortfeld et al. [1] and Nicholson [2] that different types of disfluencies may fulfill varying functions. We focus on disfluency types within moves, or speech turns, where a speaker initiates something compared to a response to such a move. We find that filled pauses (FPs) such as um or uh fulfilled an interpersonal role for participants while repairs occurred out of difficulty.

    Keywords Dialogue, dialogue moves, disfluency, DiSS, Language production

  • Emanuel A. Schegloff, “Some Other "Uh(m)"s,” Discourse Processes, vol. 47, no. 2, 2010, pp. 130-174. DOI: 10.1080/01638530903223380.

    Abstract Recent work on the occurrence of "uh" and "uhm" in ordinary talk-in-interaction is concerned almost exclusively with its relation to trouble in the speech production process. After touching briefly on this environment of occurrence, this conversation-analytic article focuses attention on several interactional environments in which "uh(m)" figures in other ways—most extensively on its use to indicate the "reason-for-the-interaction’s-launching." The underlying theme is that accounts for what gets done and gets understood in talk-in-interaction must take into account not only its composition, but also its position—not only with respect to the grammar of sentences, but also with respect to the organization of turns at talk, of action sequences encompassing multiple turns at talk, and of occasions of talk, all of which are demonstrably oriented to by speakers in their production of the talk and by recipients in their analyzing of the talk.

  • Norman Segalowitz, Cognitive Bases of Second Language Fluency. London: Routledge, June 2010. http://www.routledge.com/books/details/9780805856620/.

    Abstract Exploring fluency from multiple vantage points that together constitute a cognitive science perspective, this book examines research in second language acquisition and bilingualism that points to promising avenues for understanding and promoting second language fluency. Cognitive Bases of Second Language Fluency covers essential topics such as units of analysis for measuring fluency, the relation of second language fluency to general cognitive fluidity, social and motivational contributors to fluency, and neural correlates of fluency. The author provides clear and accessible summaries of foundational empirical work on speech production, automaticity, lexical access, and other issues of relevance to second language acquisition theory. Cognitive Bases of Second Language Fluency is a valuable reference for scholars in SLA, cognitive psychology, and language teaching, and it can also serve as an ideal textbook for advanced courses in these fields.

  • Kazuki Sekine, “Gesture correction in children,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 71-74. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_071.pdf.

    Abstract Speakers sometimes modify their gestures during the process of production into disguised adaptors. Such disguised adaptors can be treated as evidence that speakers can monitor their gestures. This study investigated when disguised adaptors are produced in Japanese elementary school children. The results showed that children did not produce disguised adaptors until the age of 8. The emergence of disguised adaptors suggested that children start to monitor their gestures when they are 9 or 10 years old. Cultural influences and cognitive changes were considered as factors to influence emergence of disguised adaptors.

    Keywords adaptors, DiSS, speech error, spontaneous gestures

  • Shu-Chuan Tseng, and Yun-Ru Huang, “A socio-phonetic analysis of Taiwan Mandarin interview speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 67-70. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_067.pdf.

    Abstract This paper presents results of a socio-phonetic analysis of Taiwan Mandarin by using a corpus of questionnaire-based interview speech. Questions were asked to collect data on the interviewee’s background of language use, socio-economic status, and internet access in different regions of Taiwan. Two typical dialect-influenced pronunciation errors, the deletion of /w/ before /o/ and the delabialization of /y/, were analyzed with the associated socio-economic factors and the degree of dialect exposure. The degree of dialect exposure (Southern Min) and the studied pronunciation variants are statistically correlated with the accuracy rate. But no direct correlation was found between the pronunciation variation and the socioeconomic factors.

    Keywords DiSS, interview speech, sociophonetics, Taiwan Mandarin

  • Shu-Chuan Tseng, and Tzu-Lun Lee, “Contextual effects in recognizing reduced words in spontaneous speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 39-42. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_039.pdf.

    Abstract This study investigates the effects of context on recognizing reduced word forms in spontaneous speech. Sixteen high-frequency disyllabic targets, eight disyllabic and eight combinations of monosyllabic words are presented to 48 subjects in a spoken word recognition experiment in three conditions: in their original context, in isolation, and embedded in a carrier sentence. Results show that context, degree of reduction, word unit type, gender, and age group all show an effect on the accuracy rates of recognizing the target items. Most interestingly, while a meaningful context helps recognize reduced word forms, a less meaningful context inhibits the recognition more than no context.

    Keywords context effect, DiSS, spoken word recognition

  • Shu-Chuan Tseng, Pei-Chen Tsou, Ko Kuei, and Chien-Wen Lee, “Assessing sentence repetition and narrative speech data produced by hearing-impaired and normally hearing children,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 11-14. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_011.pdf.

    Abstract This paper examines sentence repetition and narrative speech data produced by hearing-impaired and normally hearing children with matched gender, age and level of speech comprehension. We assessed these two kinds of speech styles by talker intelligibility, vowel space, and spike production in plosives. In both speaking styles, normally hearing children performed better in talker intelligibility than their hearing-impaired counterparts. No clear vowel space shrinkage was observed in respect of speech style, hearing impairment, and age group. Surprisingly, the production of the spike in plosives was a useful measure for distinguishing acoustic properties of different speaking styles and hearing ability.

    Keywords acoustic properties, DiSS, hearing impairment, speaking style, speech assessment

  • Ioana Vasilescu, Sophie Rosset, and Martine Adda-Decker, “On the functions of the vocalic hesitation euh in interactive man-machine question answering dialogs in French,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 111-114. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_111.pdf.

    Abstract This paper deals with the functions of the French vocalic hesitation euh in interactive speech of man-machine question answering dialogs. The present analysis suggests that the vocalic hesitation euh may carry various properties in speech, both disfluent signaling the speakers’ efforts to put the intended message under production into appropriate words, and fluent, as markers of discourse structure. Moreover, euh seems to play a role in bracketing lexical units, pointing to the informative content within an utterance. This bracketing may favour intelligibility or decoding fluency on the listener’s side. The potential contribution of the vocalic hesitation euh to lexical information bracketing is investigated with the goal of improved information processing by QA systems. Future objectives include a smarter interaction capacity by an appropriate usage of such euh items.

    Keywords dialog corpus, Discourse markers, disfluency, DiSS, Fluency, French, Q/A, vocalic hesitation

  • Kun-Ching Wang, Chiun-Li Chin, and Yi-Hsing Tsai, “Voice activity detection based on combination of weighted sub-band features using auto-correlation function,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 85-88. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_085.pdf.

    Abstract This paper presents voice activity detection (VAD) based on a combination of weighted sub-band features using the autocorrelation function. Because noise corrupts each sub-band differently, the estimated signal-to-noise ratio (SNR) is employed to weight the utility rate of each frequency sub-band. Furthermore, a sub-band feature combination strategy is used to integrate all of the weighted sub-band autocorrelation function feature parameters into a combined feature parameter. Experimental results demonstrate that the proposed VAD achieves better performance than existing standard VADs at any noise level.

    Keywords auto-correlation, DiSS, feature combination, sub-band weighting, voice activity detection, wavelet packet transform

  • Michiko Watanabe, and Yasuharu Den, “Utterance-initial elements in Japanese: a comparison among fillers, conjunctions, and topic phrases,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 31-34. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_031.pdf.

    Abstract Speakers need to plan the following part of speech under the pressure of a temporal imperative at utterance-initial positions. Each language seems to have some devices to solve this problem, which we call utterance-initial elements (UIEs). We investigated effects of two factors, boundary strengths and complexity of the following constituents, on the durations of possible UIEs, such as fillers, conjunctions, and topic phrases. We found that the last mora of filler e, as well as wa-marked topic phrases, became longer as the complexity increased in certain conditions. Possible interpretations for the results are discussed.

    Keywords boundary strengths, constituent complexity, DiSS, prolongation, utterance-initial elements

  • Li-chiung Yang, “Meaning and use: a pragmatic and prosodic analysis of interjections in conversational speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 75-78. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_075.pdf.

    Abstract In this paper we report on our research on the pragmatic-contextual meaning and prosody of three interjections ey, wa, and oh. A detailed qualitative-contextual analysis of our corpus shows that these interjections share important contextual and prosodic characteristics due to their similar functional status with respect to new or unexpected information. We show that there are also significant differences in contextual meaning arising from specific emotional or cognitive states, and that these differences are expressively communicated in the varied prosody of each interjection.

    Keywords discourse, DiSS, interjections, meaning, prosody

  • Etsuko Yoshida, and Robin J. Lickley, “Disfluency patterns in dialogue processing,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 115-118. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_115.pdf.

    Abstract Spontaneous speech abounds with disfluencies such as filled pauses, repairs, repetitions, false starts and prolongations, all of which are significant but easily overlooked features of speech communication. Based on the comparable corpora of English and Japanese dialogues, we argue that disfluency features can have a positive effect on turn-taking issues and the establishment of common referring expressions in dialogue processing. We examined the occurrence of ten types of filled pauses in Japanese and investigated how they interact with discourse entities and the sharing of common ground. The results indicate that two patterns of disfluency features contribute to on-line speech planning of the participants and their four functions serve to construct the collaborative process of speech communication.

    Keywords common ground, corpus, Dialogue, disfluency, DiSS, referring expressions

2009

  • Kartik Audhkhasi, Kundan Kandhway, Om D. Deshmukh, and Ashish Verma, “Formant-based technique for automatic filled-pause detection in spontaneous spoken English,” in 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, pp. 4857-4860. DOI: 10.1109/ICASSP.2009.4960719.

    Abstract Detection of filled pauses is a challenging research problem which has several practical applications. It can be used to evaluate the spoken fluency skills of the speaker, to improve the performance of automatic speech recognition systems or to predict the mental state of the speaker. This paper presents an algorithm for filled pause detection that is based on the premise that the vocal tract characteristics, and hence the formants, are stable during the production of a filled pause. The performance of the proposed algorithm is evaluated on real-life recordings of call center agents where the locations of the filled pauses are hand labeled. The proposed algorithm outperforms a standard cepstral stability based filled pause detection algorithm and a standard pitch-based detection technique.

  • Tracey M. Derwing, Murray J. Munro, Ron I. Thomson, and Marian J. Rossiter, “The Relationship between L1 Fluency and L2 Fluency Development,” Studies in Second Language Acquisition, vol. 31, no. 4, December 2009, pp. 533-557. DOI: 10.1017/S0272263109990015.

    Abstract A fundamental question in the study of second language (L2) fluency is the extent to which temporal characteristics of speakers’ first language (L1) productions predict the same characteristics in the L2. A close relationship between a speaker’s L1 and L2 temporal characteristics would suggest that fluency is governed by an underlying trait. This longitudinal investigation compared L1 and L2 English fluency at three times over 2 years in Russian- and Ukrainian- (which we will refer to here as Slavic) and Mandarin-speaking adult immigrants to Canada. Fluency ratings of narratives by trained judges indicated a relationship between the L1 and the L2 in the initial stages of L2 exposure, although this relationship was found to be stronger in the Slavic than in the Mandarin learners. Pauses per second, speech rate, and pruned syllables per second were all related to the listeners’ judgments in both languages, although vowel durations were not. Between-group differences may reflect differential exposure to spoken English and a closer relationship between Slavic languages and English than between Mandarin and English. Suggestions for pedagogical interventions and further research are also proposed.

  • Rod Ellis, “The Differential Effects of Three Types of Task Planning on the Fluency, Complexity, and Accuracy in L2 Oral Production,” Applied Linguistics, vol. 30, no. 4, December 2009, pp. 474-509. DOI: 10.1093/applin/amp042. http://applij.oxfordjournals.org/content/30/4/474.abstract.

    Abstract The main purpose of this article is to review studies that have investigated the effects of three types of planning (rehearsal, pre-task planning, and within-task planning) on the fluency, complexity, and accuracy of L2 performance. All three types of planning have been shown to have a beneficial effect on fluency but the results for complexity and accuracy are more mixed, reflecting both the type of planning and also the mediating role of various factors, including task design and implementation variables and individual difference factors. A secondary purpose is to outline a theory that can account for the role that planning plays in L2 performance. The article concludes with a list of limitations in the research to date.

  • Klaus Zechner, Derrick Higgins, Xiaoming Xi, and David M. Williamson, “Automatic scoring of non-native spontaneous speech in tests of spoken English,” Speech Communication, vol. 51, no. 10, 2009, pp. 883-895. DOI: 10.1016/j.specom.2009.04.009. http://www.sciencedirect.com/science/article/pii/S0167639309000703.

    Abstract This paper presents the first version of the SpeechRater℠ system for automatically scoring non-native spontaneous high-entropy speech in the context of an online practice test for prospective takers of the Test of English as a Foreign Language® internet-based test (TOEFL® iBT). The system consists of a speech recognizer trained on non-native English speech data, a feature computation module, using speech recognizer output to compute a set of mostly fluency-based features, and a multiple regression scoring model which predicts a speaking proficiency score for every test item response, using a subset of the features generated by the previous component. Experiments with classification and regression trees (CART) complement those performed with multiple regression. We evaluate the system both on TOEFL Practice data [TOEFL Practice Online (TPO)] and on Field Study data collected before the introduction of the TOEFL iBT. Features are selected by test development experts based both on their empirical correlations with human scores and on their coverage of the concept of communicative competence. We conclude that while the correlation between machine scores and human scores on TPO (of 0.57) still differs by 0.17 from the inter-human correlation (of 0.74) on complete sets of six items (Pearson r correlation coefficients), the correlation of 0.57 is still high enough to warrant the deployment of the system in a low-stakes practice environment, given its coverage of several important aspects of communicative competence such as fluency, vocabulary diversity, grammar, and pronunciation.
    Another reason why the deployment of the system in a low-stakes practice environment is warranted is that this system is an initial version of a long-term research and development program where features related to vocabulary, grammar, and content will be added in a later stage when automatic speech recognition performance improves, which can then be easily achieved without a re-design of the system. Exact agreement on single TPO items between our system and human scores was 57.8%, essentially on par with inter-human agreement of 57.2%. Our system has been in operational use to score TOEFL Practice Online Speaking tests since the Fall of 2006 and has since scored tens of thousands of tests.

    Keywords Speaking assessment

2008

  • Martin Corley, and Oliver W. Stewart, “Hesitation Disfluencies in Spontaneous Speech: The Meaning of um,” Language and Linguistics Compass, vol. 2, no. 4, July 2008, pp. 589-602. DOI: 10.1111/j.1749-818X.2008.00068.x.

    Abstract Human speech is peppered with ums and uhs, among other signs of hesitation in the planning process. But are these so-called fillers (or filled pauses) intentionally uttered by speakers, or are they side-effects of difficulties in the planning process? And how do listeners respond to them? In the present paper, we review evidence concerning the production and comprehension of fillers such as um and uh, in an attempt to determine whether they can be said to be ‘words’ with ‘meanings’ that are understood by listeners. We conclude that, whereas listeners are highly sensitive to hesitation disfluencies in speech, there is little evidence to suggest that they are intentionally produced, or should be considered to be words in the conventional sense.

  • Tracey M. Derwing, Murray J. Munro, and Ron I. Thomson, “A Longitudinal Study of ESL Learners’ Fluency and Comprehensibility Development,” Applied Linguistics, vol. 29, no. 3, 2008, pp. 359-380. DOI: 10.1093/applin/amm041. http://applij.oxfordjournals.org/content/29/3/359.abstract.

    Abstract This longitudinal mixed-methods study compared the oral fluency of well-educated adult immigrants from Mandarin and Slavic language backgrounds (16 per group) enrolled in introductory English as a second language (ESL) classes. Speech samples were collected over a 2-year period, together with estimates of weekly English use. We also conducted interviews at the last data collection session. The participants’ fluency and comprehensibility at three points over 22 months were judged by 33 native speakers of English. We examine the learners’ progress in light of their exposure to English outside of their ESL class. The Slavic language speakers showed a small but significant improvement in both fluency and comprehensibility, whereas the Mandarin speakers’ performance did not change over 2 years, although both groups started at the same level of oral proficiency. These differences may be attributable in part to degree of exposure to English outside the ESL courses. Neither group had extensive exposure outside of their classes because of employment and familial responsibilities (although the Slavic language speakers reported more opportunities). Thus both groups may have been disadvantaged by a lack of oral fluency instruction. The findings, both quantitative and qualitative, are interpreted using the Willingness to Communicate framework; we also discuss implications for the language classroom.

  • Michael Erard, Um... Slips, Stumbles, and Verbal Blunders, and What They Mean. New York: Penguin Random House, August 2008. https://www.penguinrandomhouse.com/books/46803/um---by-michael-erard/.

    Abstract This original, entertaining, and surprising book investigates verbal blunders: what they are, what they say about those who make them, and how and why we’ve come to judge them. Um… is about how you really speak, and why it’s normal for your everyday speech to be filled with errors: about one in every ten words. In this charming, engaging account of language in the wild, linguist and writer Michael Erard also explains why our attention to some blunders rises and falls. Where did the Freudian slip come from? Why do we prize "umlessness" in speaking, and should we? And how do we explain the American presidents who are famous for their verbal stumbles? Full of entertaining examples, Um… is essential reading for talkers and listeners of all stripes.

  • Carla L. Hudson Kam, and Nicole A. Edwards, “The use of uh and um by 3- and 4-year-old native English-speaking children: Not quite right but not completely wrong,” First Language, vol. 28, no. 3, August 2008, pp. 313-327. DOI: 10.1177/0142723708091149. http://fla.sagepub.com/content/28/3/313.abstract.

    Abstract The delay markers (DMs) 'uh' and 'um' are often used by adult English speakers to indicate that an upcoming pause is due to a speech disruption, not the end of a conversational turn. Moreover, 'uh' and 'um' indicate different degrees of disruption (Clark & Fox Tree, 2002). Thus, it appears that children must learn how to use DMs appropriately. In the current study we examined DM use in elicited speech samples from 24 3- and 4-year-old children. We found that pauses following DMs were longer than those not following a DM, but that there was no difference between the pauses following 'uh' and 'um'. Children at this age, then, appear to understand the basic use of DMs, but do not yet differentiate between them.

    Keywords Conversational development, disfluencies, filled pauses, narrative, turn-taking

  • T. Florian Jaeger, and Celeste Kidd, “A Unified Model of Redundancy Avoidance and Strategic Lengthening,” in The 21st CUNY Sentence Processing Conference, March 2008. https://www.researchgate.net/publication/228797456_A_Unified_Model_of_Redundancy_Avoidance_and_Strategic_Lengthening.

    Abstract Recent studies have revealed an intriguing link between redundancy and reduction: words that are more predictable in their context are more commonly reduced (shorter and with less articulatory detail [1,2,3]). These studies have, however, also found a puzzling asymmetry: Content words are reduced when predictable given the previous word, but function words are reduced when predictable given the following word. We present a solution to this puzzle that unifies work on redundancy with work on strategic lengthening [4]. We find that the apparent backward-predictability effect on function word reduction is an artifact caused by speakers' tendency to slow pronunciation when the next word is unavailable.

  • Lucy J. MacGregor, “Disfluencies affect language comprehension: evidence from event-related potentials and recognition memory,” Master's Thesis, The University of Edinburgh, 2008. http://hdl.handle.net/1842/3311.

    Abstract Everyday speech is littered with disfluencies such as filled pauses, silent pauses, repetitions and repairs which reflect a speaker’s language production difficulties. But what are the effects on language comprehension? This thesis took a novel approach to the study of disfluencies by combining an investigation of the immediate effects on language processing with an investigation of the longer-term effects for the representation of language in memory. A series of experiments is reported which reflects the first attempt at a systematic investigation of the effects of different types of disfluencies on language comprehension. The experiments focused on the effects of three types of disfluencies—ers, silent pauses, and repetitions—on the comprehension of subsequent words. Critical words were either straightforward continuations of the pre-interrupted speech or a repair word which corrected the pre-interrupted speech. In addition, the effects that occur when er, repetition, and repair disfluencies themselves are processed, were assessed. ERPs showed that the N400 effect elicited in response to contextually unpredictable compared to predictable words was attenuated by the presence of a pre-target er reflecting a reduction in the standard difference where unpredictable words are more difficult to integrate into their contexts. This finding suggests that ers may reduce the extent to which listeners make predictions about upcoming words. In addition, words preceded by an er were more likely to be correctly recognised in a subsequent memory test. These findings demonstrate a longer-term consequence for representation which may reflect heightened attention during processing. Silent pauses did not affect the N400 but there was some indication of an effect on recognition memory. Repetition disfluencies did not affect the N400 or recognition memory. These findings demonstrate the importance of the nature of the disruption to speech. 
For all types of disfluent utterances, unpredictable words elicited a Late Positive Complex (LPC), possibly reflecting processes associated with memory retrieval and control as listeners attempted to resume structural fluency after any interruption. Ers themselves elicited standard attention-related ERP effects: the Mismatch Negativity (MMN) and P300 effects, supporting the possibility that ers heighten attention. Repetition disfluencies elicited a right posterior positivity, reflecting detection of the disfluency and possibly syntactic reanalysis. Repair disfluencies elicited an early frontal negativity, possibly related to the detection of a word category violation, and a P600 effect, reflecting syntactic reanalysis. The presence of an er preceding the repair eliminated the early negativity but had no effect on the P600, suggesting that ers may prepare listeners for the possibility of an upcoming repair, but that they do not reduce the difficulty associated with reanalysis. Taken together, the results from the studies reported in the thesis support an account of disfluency processing which incorporates both prediction and attention.

    Keywords Language comprehension, Psychology

  • Ralph L. Rose, “Filled Pauses in Language Teaching: Why and How,” Bulletin of Gunma Prefectural Women’s University, vol. 29, 2008, pp. 47-64. http://www.roselab.sci.waseda.ac.jp/resources/file/teachingfps.pdf.

    Abstract Filled pauses (uh, um) are ubiquitous elements of spontaneous speech but have received relatively little attention in second language teaching. Perhaps this is because filled pauses have often been regarded as meaningless elements resulting from speech processing difficulties. This paper draws from research in widely disparate fields to show that speakers and listeners use them systematically and meaningfully. These facts are used to generate a unified and coherent model of filled pauses in spontaneous speech. This model is then used to develop a concept of communicative competence in which filled pauses play a role at the interface between pragmatic constraints and communication strategies. The article concludes with practical recommendations for how filled pauses may be incorporated into the second-language teaching curriculum.

  • Michiko Watanabe, Keikichi Hirose, Yasuharu Den, and Nobuaki Minematsu, “Filled pauses as cues to the complexity of upcoming phrases for native and non-native listeners,” Speech Communication, vol. 50, no. 2, February 2008, pp. 81-94. DOI: 10.1016/j.specom.2007.06.002.

    Abstract We examined whether filled pauses (FPs) affect listeners’ predictions about the complexity of upcoming phrases in Japanese. Studies of spontaneous speech corpora show that constituents tend to be longer or more complex when they are immediately preceded by FPs than when they are not. From this finding, we hypothesized that FPs cause listeners to expect that the speaker is going to refer to something that is likely to be expressed by a relatively long or complex constituent. In the experiments, participants listened to sentences describing both simple and compound shapes on a computer screen. Their task was to press a button as soon as they had identified the shape corresponding to the description. Phrases describing shapes were immediately preceded by an FP, a silent pause of the same duration, or no pause. We predicted that listeners’ response times to compound shapes would be shorter when there is an FP before phrases describing the shape than when there is no FP, because FPs are good cues to complex phrases, whereas response times to simple shapes would not be shorter with a preceding FP than without. The results of native Japanese and proficient non-native Chinese listeners agreed with the prediction and provided evidence to support the hypothesis. Response times of the least proficient non-native listeners were not affected by the existence of FPs, suggesting that the effects of FPs on non-native listeners depend on their language proficiency.

  • Chen-huei Wu, “Filled Pauses in L2 Chinese: A Comparison of Native and Non-Native Speakers,” in Proceedings of the 20th North American Conference on Chinese Linguistics (NACCL-20), Columbus, Ohio, The Ohio State University, 2008, pp. 213-227. http://chinalinks.osu.edu/naccl/naccl-20/NACCL-20_Proceedings.htm.

    Abstract The aim of this paper is to determine whether native and non-native speech can be predicted on the basis of temporal measurements of filled pauses by training a Classification and Regression Tree (Breiman et al. 1984). On the basis of the present results, several conclusions can be drawn: First, the accuracy of distinguishing between native and non-native speech can be increased by using temporal measurements of FPs. Among these variables, the rate of speech appears to be the best predictor. Second, this study suggests that information from the FPs ‘uh’ and ‘um’ is a useful predictor of fluency in further differentiating native and non-native speakers. Third, the classification can be accurately predicted with a small set of variables.

2007

  • Karl G.D. Bailey, and Fernanda Ferreira, “The Processing of Filled Pause Disfluencies in the Visual World,” in Eye movements: A window on mind and brain, Roger P.G. van Gompel, Wayne S. Murray, Martin H. Fischer, and Robin L. Hill, Eds. Amsterdam: Elsevier, 2007, ch. 22, pp. 485-500. DOI: 10.1016/B978-008044980-7/50024-0.

    Abstract One type of spontaneous speech disfluency is the filled pause, in which a filler (e.g. uh) interrupts production of an utterance. We report a visual world experiment in which participants’ eye movements were monitored while they responded to ambiguous utterances containing filled pauses by manipulating objects placed in front of them. Participants’ eye movements and actions suggested that filled pauses informed resolution of the current referential ambiguity, but did not affect the final parse. We suggest that filled pauses may inform the resolution of whatever ambiguity is most salient in a given situation.

  • Esther de Leeuw, “Hesitation Markers in English, German, and Dutch,” Journal of Germanic Linguistics, vol. 19, no. 2, 2007, pp. 85-114. DOI: 10.1017/S1470542707000049.

    Abstract This study reports on a number of highly significant differences found between English, German, and Dutch hesitation markers. English and German native speakers used significantly more vocalic-nasal hesitation markers than Dutch native speakers, who used predominantly vocalic hesitation markers. English hesitation markers occurred most frequently when preceded by silence and followed by a lexical item, or when surrounded by silence. German and Dutch hesitation markers occurred most frequently surrounded by lexical items. In Dutch, vocalic-nasal hesitation markers dominated only when surrounded by silence. Vocalic-nasal hesitation markers dominated in all positions in English and German, although in the former language this was more salient than in the latter. Nasal hesitation markers were used significantly more frequently in German than in English or Dutch. In addition to overall language trends, speaker-specific differences, especially within German and Dutch, were observed. These results raise questions in terms of the symptom versus signal hypotheses regarding the function of hesitation markers.

  • Carol Fehringer, and Christina Fry, “Hesitation phenomena in the language production of bilingual speakers: The role of working memory,” Folia Linguistica, vol. 41, no. 1-2, June 2007, pp. 37-72. DOI: 10.1515/flin.41.1-2.37. http://related.springerprotocols.com/lp/de-gruyter/hesitation-phenomena-in-the-language-production-of-bilingual-speakers-1GCcNqDqgA.

    Abstract This paper is an empirical investigation of the use of hesitation phenomena, specifically filled pauses (ums and ers), automatisms (sort of, at the end of the day), repetitions and reformulations, in both the mother tongue (L1) and second language (L2) of highly proficient adult bilingual speakers (English and German). Its purpose is to ascertain: i) whether speakers who are highly proficient in L2 produce an approximately similar amount of hesitation phenomena in both languages; and ii) whether the production of such elements (in both languages) is linked to working memory capacity. Results show that: i) despite high proficiency, speakers produced a higher overall rate of hesitation phenomena in their L2, indicating that there was an additional cognitive load imposed by working in L2; and ii) in each language there was an underlying negative relationship between memory capacity and the production of hesitation phenomena, implying that speakers with lower memory ability rely more heavily on such time-buying devices. Furthermore, it was shown that the individual types of hesitation phenomena produced by speakers in their L1 were carried over into their L2, which suggests that a speaker’s planning behaviour is mirrored in both languages.

    Keywords bilingual, hesitation, L2, prefabricated utterance, Speech production, working memory

  • Jean E. Fox Tree, “Folk notions of um and uh, you know, and like,” Text & Talk, vol. 27, no. 3, 2007, pp. 297-314. DOI: 10.1515/TEXT.2007.012. https://www.degruyter.com/view/j/text.2007.27.issue-3/text.2007.012/text.2007.012.xml.

    Abstract The current study measures laypeople’s uses of 'um', 'uh', 'you know', and 'like', including folk notions of meanings, self-assessments of use, history of discussing use, and attitudes toward the words. Unlike the prevalent idea in the popular press that these discourse markers are interchangeable speaker production flaws, respondents in this study demonstrated that people do possess folk notions of meanings and uses that dramatically distinguish markers from each other. 'Um' and 'uh' were thought to indicate production trouble, 'you know' was thought to be used in checking for understanding and connecting with listeners, and 'like' defied definition. The folk notions of 'um', 'uh', and 'you know' accord well with researchers’ ideas about the meanings of these words. The use of 'like' may be too subtle for laypeople to articulate. Most researchers’ views of 'like' involve some kind of discrepancy between what’s said and what’s meant. Even if they cannot state a meaning, people do treat the different markers differently.

    Keywords Discourse markers, fillers, like, meaning, spontaneous speech, you know

  • Irena O’Brien, Norman Segalowitz, Barbara Freed, and Joe Collentine, “Phonological Memory Predicts Second Language Oral Fluency Gains in Adults,” Studies in Second Language Acquisition, vol. 29, no. 4, 2007, pp. 557-581. DOI: 10.1017/s027226310707043x. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=1392672&fulltextType=RA&fileId=S027226310707043X.

    Abstract This study investigated the relationship between phonological memory and second language (L2) fluency gains in native English-speaking adults learning Spanish in two learning contexts: at their home university or abroad in an immersion context. Phonological memory (operationalized as serial nonword recognition) and Spanish oral fluency (temporal/hesitation phenomena) were assessed at two times, 13 weeks apart. Hierarchical regressions showed that, after the variance attributable to learning context was partialed out, initial serial nonword recognition performance was significantly associated with L2 oral fluency development, explaining 4.5-9.7% of unique variance. These results indicate that phonological memory makes an important contribution to L2 learning in terms of oral fluency development. Furthermore, these results from an adult population extend conclusions from previous studies that have claimed a role for phonological memory primarily in vocabulary development in younger populations.

  • Pavel Trofimovich, and Wendy Baker, “Learning prosody and fluency characteristics of second language speech: The effect of experience on child learners’ acquisition of five suprasegmentals,” Applied Psycholinguistics, vol. 28, no. 2, 2007, pp. 251-276. DOI: 10.1017/s0142716407070130. http://journals.cambridge.org/article_S0142716407070130.

    Abstract This study examined second language (L2) experience effects on children’s acquisition of fluency-based (speech rate, frequency, and duration of pausing) and prosody-based (stress timing, peak alignment) suprasegmentals. Twenty Korean children (age of arrival in the United States = 7-11 years, length of US residence = 1 vs. 11 years) and 20 age-matched English monolinguals produced six English sentences in a sentence repetition task. Acoustic analyses and listener judgments were used to determine how accurately the suprasegmentals were produced and to what extent they contributed to foreign accent. Results indicated that the children with 11 years of US residence, unlike those with 1 year of US residence, produced all suprasegmentals but one (speech rate) natively. Overall, findings revealed similarities between L2 segmental and suprasegmental learning.

  • Ioana Vasilescu, Rena Nemoto, and Martine Adda-Decker, “Vocalic Hesitations vs Vocalic Systems: A Cross-Language Comparison,” in 16th International Congress of Phonetic Sciences, 2007. http://www.icphs2007.de/conference/Papers/1504/index.html.

    Abstract This paper deals with the acoustic characteristics of vocalic hesitations in a cross-language perspective. The underlying questions concern the "neutral" vs. language-dependent timbre of vocalic hesitations and the link between their vocalic quality and the phonemic system of the language. An additional point of interest concerns the duration effect on vocalic hesitations compared to intra-lexical vowels. Acoustic measurements have been carried out in American English, French and Spanish. Results on vocalic timbre show that hesitations (i) carry language-specific information; (ii) whereas often close to measurements of existing vowels, they do not necessarily collapse with them. Finally, (iii) duration variation affects the timbre of vocalic hesitation and a centralization towards a "neutral" realization is observed for decreasing durations.

    Keywords centralization, duration, timbre, vocalic hesitation, vocalic systems

2006

  • Felix K. Ameka, “Interjections,” in Encyclopedia of Language & Linguistics, Keith Brown, Ed. Oxford, UK: Elsevier, 2006, pp. 743-746. DOI: 10.1016/B0-08-044854-2/00396-5.

    Abstract Interjections are words that conventionally constitute utterances by themselves and express a speaker’s current mental state or reaction toward an element in the linguistic or extralinguistic context. Some English interjections are words such as yuk! ‘I feel disgusted,’ ow! ‘I feel sudden pain,’ wow! ‘I feel surprised and I am impressed,’ aha! ‘I now understand,’ hey! ‘I want someone’s attention,’ damn! ‘I feel frustrated,’ and bother! ‘I feel annoyed.’ Such words are found in all languages of the world. This article surveys the different uses and definitions of the term ‘interjection’ and the different types of interjections that are found in the languages of the world. It also explores the relationship of interjections to other pragmatic devices such as particles, discourse markers, and speech formulae.

    Keywords formulaic language, Indexicality, interjections, language functions, onomatopoeia, particles, routines, speech acts

  • Richard Bello, “Causes and paralinguistic correlates of interpersonal equivocation,” Journal of Pragmatics, vol. 38, no. 9, 2006, pp. 1430-1441. DOI: 10.1016/j.pragma.2005.09.001.

    Abstract This paper examines the long-standing theory of the Bavelas group which suggests that the only consistent cause of interpersonal equivocation is avoidance-avoidance conflict (AAC), and it also attempts to uncover a psycholinguistic profile of equivocation, especially in the form of paralinguistic cues such as dysfluencies. Participants responded orally to questions from hypothetical interlocutors within scenarios which manipulated both the presence/absence of AAC and level of situational formality. Their responses (72 messages) were audiotaped, transcribed, rated for degree of equivocation, and coded for dysfluencies. Results of ANOVA showed that AAC not only resulted in more equivocation, but also that formality level interacted with AAC in influencing equivocation. Participants used filled pauses, surprisingly, in the condition within which they equivocated the least, although they produced other dysfluencies (combined) within conditions where they equivocated the most. Results are discussed in terms of the notion that filled pauses are special and in terms of interpersonal deception theory.

    Keywords avoidance-avoidance conflict, disfluencies, Equivocation, filled pauses, Informality, Interpersonal communication, Paralinguistics

  • Stefan Benus, Frank Enos, Julia Hirschberg, and Elizabeth Shriberg, “Pauses in Deceptive Speech,” in Speech Prosody 18, Dresden, Germany, 2006, pp. 2-5. http://aune.lpl.univ-aix.fr/sprosig/sp2006/.

    Abstract We use a corpus of spontaneous interview speech to investigate the relationship between the distributional and prosodic characteristics of silent and filled pauses and the intent of an interviewee to deceive an interviewer. Our data suggest that the use of pauses correlates more with truthful than with deceptive speech, and that prosodic features extracted from filled pauses themselves as well as features describing contextual prosodic information in the vicinity of filled pauses may facilitate the detection of deceit in speech.

  • Alex Boulton, “To er is human: Silent pauses and speech dysfunctions of the 2004 US presidential debates,” in Le Désaccord, M. Pereiro and H. Daniels, Eds. Nancy: AMAES, 2006, pp. 7-32. http://hal.archives-ouvertes.fr/hal-00114282/en/.

    Abstract It has become fashionable, even axiomatic in some circles today, to suppose that politics is all about form, not content—it’s not what they say but the way that they say it. It ought to follow that the most powerful politicians should be the best speakers, so this paper takes as its starting point the 2004 US presidential debates. These televised confrontations, where each candidate has to react to new questions as well as to counter his opponent, are notoriously high-risk, and present considerable opportunities for various speech "dysfunctions". These are analysed in relation to media reaction and public perception of the outcome.

    Keywords cognitive science, disfluency, hesitation, linguistics, presidential debate, speed of articulation

  • Martin Corley, Lucy J. MacGregor, and David Donaldson, “It’s the way that you, er, say it: Hesitations in speech affect language comprehension,” Cognition, vol. 105, no. 3, 2006, pp. 658-668. DOI: 10.1016/j.cognition.2006.10.010. http://www.elsevier.com/locate/COGNIT.

    Abstract Everyday speech is littered with disfluency, often correlated with the production of less predictable words (e.g., Beattie & Butterworth [Beattie, G., & Butterworth, B. (1979). Contextual probability and word frequency as determinants of pauses in spontaneous speech. Language and Speech, 22, 201-211.]). But what are the effects of disfluency on listeners? In an ERP experiment which compared fluent to disfluent utterances, we established an N400 effect for unpredictable compared to predictable words. This effect, reflecting the difference in ease of integrating words into their contexts, was reduced in cases where the target words were preceded by a hesitation marked by the word er. Moreover, a subsequent recognition memory test showed that words preceded by disfluency were more likely to be remembered. The study demonstrates that hesitation affects the way in which listeners process spoken language, and that these changes are associated with longer-term consequences for the representation of the message.

    Keywords disfluency, ERPs, Language comprehension, speech

  • Chika Nagaoka, “Mutual influence of nonverbal behavior in interpersonal communication,” Japanese Journal of Interpersonal and Social Psychology, vol. 6, 2006, pp. 101-112. http://syasin.hus.osaka-u.ac.jp/jjisp/006/nagaoka-a.html.

    Abstract In social interactions, the interactants’ nonverbal behavior may synchronize and become similar. In this study, the author called this phenomenon ‘synchrony tendency’. Since conventional research about this phenomenon has been conducted from various angles separately, there has been almost no attempt to examine the role of synchrony tendency systematically. In this light, the present study aims to review synchrony tendency based on previous studies from various fields and perspectives. The synchrony tendency has been observed in various communication channels, and in various forms, such as interspeaker congruence of paralanguage, convergence of accents in cross-cultural communication, mimicry of others’ facial and vocal emotional expressions, neonate imitation, interpersonal synchrony of body movements, and entrainment between a neonate’s body movement and the flow of an adult’s speech. Therefore, this phenomenon has been labeled with various terms, each one having a specific nuance. Moreover, the synchrony tendency is not always observed in all interactions, and it sensitively changes with various factors, such as the interactants’ level of empathy and socialization. For example, the results of my experiments indicate that the convergence of response latencies (i.e., latencies before responding to the last utterance of one’s partner) in dialogues reflects whether a speaker is receptive to the conversational partner during the dialogue. All these suggest that the synchrony tendency provides an effective indicator reflecting various aspects of our communication behavior. Various functions of the synchrony tendency in adults’ interactions can be inferred from past literature: (a) it facilitates the understanding of an interactional partner’s emotions, (b) it conveys empathy and rapport, and (c) it makes the speakers’ personality and attitude feel positive.
Furthermore, the results of my experiments showed that the synchrony tendency facilitates goal achievement, such as reaching a compromise through discussion (speakers whose response latencies became more similar to those of their conversational partners over time judged that they had reached a compromise). Past literature, along with the results of my own experiments, brings to light two aspects of the synchrony tendency: the emotional/automatic/inherent aspect and the cognitive/acquired aspect. Examples that clearly illustrate the former aspect are imitations of facial and vocal emotional expressions and neonate imitation. On the other hand, the cognitive/acquired aspect is illustrated by convergence or congruence of response latencies, vocal intensity, speech duration, language, or accent, and is influenced by social factors. The above-mentioned aspects of the synchrony tendency match the mimicry model of Hess, Philippot, & Blairy (1999), Giles et al.’s communication accommodation theory (e.g., Shepard, Giles, & LePoire, 2001), as well as the author’s speech style convergence model. The speech style convergence model was derived from a series of studies on the convergence of response latencies in dialogues. This model suggests that a cycle in which each interactant adopts and reproduces the partner’s speech style, influenced by the speakers’ social skills and attitude toward the partner, develops over the course of the interaction until the speech styles finally converge to a point most suitable for the members of the dyad to progress smoothly through the dialogue. In the future, it will be necessary to investigate quantitatively through which communication channels, and when in the time course of an interaction, the synchrony tendency is displayed.

    Keywords cognition, emotion, nonverbal behavior, synchrony tendency

  • Stefanie Pillai, “Self-Monitoring and Self-Repair in Spontaneous Speech,” k@ta, vol. 8, no. 2, 2006, pp. 114-126. http://puslit2.petra.ac.id/ejournal/index.php/ing/article/viewArticle/16575.

    Abstract This study explores what repairs in the spontaneous production of speech reveal about the psycholinguistic processes of self-monitoring and self-repair. Three intervals were examined: error-to-cut off; cut off-to-repair; error-to-repair. The intervals support theories of internal speech monitoring, and also indicate that the planning of speech repairs can take place pre-articulatorily as well.

    Keywords error-detection, Perceptual loop theory, self-monitoring, self-repairs, Speech production

  • Pavel Trofimovich, and Wendy Baker, “Learning Second Language Suprasegmentals: Effect of L2 Experience on Prosody and Fluency Characteristics of L2 Speech,” Studies in Second Language Acquisition, vol. 28, 2006, pp. 1-30. DOI: 10.1017/S0272263106060013.

    Abstract This study examines effects of short, medium, and extended second language (L2) experience (3 months, 3 years, and 10 years of United States residence, respectively) on the production of five suprasegmentals (stress timing, peak alignment, speech rate, pause frequency, and pause duration) in six English declarative sentences by 30 adult Korean learners of English and 10 adult native English speakers. Acoustic analyses and listener judgments were used to determine how accurately the suprasegmentals were produced and to what extent they contributed to foreign accent. Results revealed that amount of experience influenced the production of one suprasegmental (stress timing), whereas adult learners’ age at the time of first extensive exposure to the L2 (indexed as age of arrival in the United States) influenced the production of others (speech rate, pause frequency, pause duration). Moreover, it was found that suprasegmentals contributed to foreign accent at all levels of experience and that some suprasegmentals (pause duration, speech rate) were more likely to do so than others (stress timing, peak alignment). Overall, results revealed similarities between L2 segmental and suprasegmental learning.

  • Aldert Vrij, Lucy Akehurst, Laura Brown, and Samantha Mann, “Detecting Lies in Young Children, Adolescents and Adults,” Applied Cognitive Psychology, vol. 20, 2006, pp. 1225-1237. DOI: 10.1002/acp.1278.

    Abstract The ability of teachers, social workers, police officers and laypersons (undergraduate and postgraduate students) to detect truths and lies told by 5-6 year-olds, adolescents and adults was tested in the present experiment. Lie detectors judged the veracity of statements from 18 liars and 18 truth tellers belonging to these three age groups. Accuracy scores were around 60% for each of these three age groups, both for detecting truths and for detecting lies. No occupational differences emerged. Moreover, judgements made by teachers, social workers and police officers showed an overlap, suggesting that an erroneous decision made by a member of one group may not easily be detected by a member of the other groups. The lie detectors were inclined to judge cues of nervousness, cognitive demand and attempted behavioural control as cues to deceit, even when truth tellers were displaying these cues.

2005

  • Timothy Arbisi-Kelm, and Sun-Ah Jun, “A comparison of disfluency patterns in normal and stuttered speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 13-16. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_013.pdf.

    Abstract While speech disfluencies are commonly found in every speaker’s speech, stuttering is a language disorder characterized by an abnormally high rate of speech aberrations, including prolongation, cessation, and repetition of speech segments. However, despite the obvious differences between stuttered and normal speech, identifying the crucial qualities that distinguish stuttered speech remains a significant challenge. A story-telling task was presented to four stutterers and four non-stutterers in order to analyze the prosodic patterns that surfaced from their spontaneous narrations. Preliminary results revealed that the major difference between stutterers’ and non-stutterers’ disfluencies, aside from the total number, is the type of disfluency and the context affected by the disfluency. Disfluencies in both groups included prolongation, pause and cut, but stutterers’ disfluencies also included repetition and combinations of the three (e.g., cut followed by pause). In addition, stutterers’ disfluencies were accompanied by more prosodic irregularities (e.g., pitch accent on function words, creating a prosodic break with degraded phonetic cues) prior to the actual disfluency than non-stutterers’ disfluencies, indirectly supporting the overvigilant self-monitoring hypothesis.

    Keywords DiSS

  • Matthew P. Aylett, “Extracting the acoustic features of interruption points using non-lexical prosodic analysis,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 17-20. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_017.pdf.

    Abstract Non-lexical prosodic analysis is our term for the process of extracting prosodic structure from a speech waveform without reference to the lexical contents of the speech. It has been shown that human subjects are able to perceive prosodic structure within speech without lexical cues. There is some evidence that this extends to the perception of disfluency, for example, the detection of interruption points (IPs) in low pass filtered speech samples. In this paper, we apply non-lexical prosodic analysis to a corpus of data collected for a speaker in a multi-person meeting environment. We show how non-lexical prosodic analysis can help structure corpus data of this kind, and reinforce previous findings that non-lexical acoustic cues can help detect IPs. These cues can be described by changes in amplitude and f0 after the IP and they can be related to the acoustic characteristics of hyper-articulated speech.

    Keywords DiSS

  • Katarina Bartkova, “Prosodic cues of spontaneous speech in French,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 21-25. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_021.pdf.

    Abstract Disfluencies, when present in the speech signal, can make syntactic parsing difficult. This difficulty is increased when machines are involved in communication and when speech devices rely on automatic speech recognition techniques. In order to improve automatic speech parsing and thus speech comprehension, methods have been proposed to filter disfluencies out from the speech signal. Attempts have been made to use prosodic parameters to improve such filtering. However, before introducing prosodic parameters into automatic speech recognition processes, it would be useful to investigate whether disfluencies can be characterized in a prosodic way and whether their prosodic cues would be representative enough to be used in automatic systems. The aim of this study was to examine to what extent prosodic parameters would be able to characterize disfluencies in French. Word repetitions, filled and silent pauses and speech repairs were described in a prosodic way using statistical analyses of their prosodic parameters. These analyses allowed simple prosodic rules to be formulated. The efficiency of the prosodic rules was evaluated on the task of detecting filled pauses, word repetitions and hesitations.

    Keywords DiSS

  • Philippe Boula de Mareüil, Benoît Habert, Frédérique Bénard, Martine Adda-Decker, Claude Barras, Gilles Adda, and Patrick Paroubek, “A quantitative study of disfluencies in French broadcast interviews,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 27-32. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_027.pdf.

    Abstract The reported study aims at increasing our understanding of spontaneous speech-related phenomena from sibling corpora of speech and orthographic transcriptions at various levels of elaboration. It makes use of 9 hours of French broadcast interview archives, involving 10 journalists and 10 personalities from political or civil society. First we considered press-oriented transcripts, where most of the so-called disfluencies are discarded. They were then aligned with automatic transcripts, by using the LIMSI speech recogniser. This facilitated the production of exact transcripts, where all audible phenomena in non-overlapping speech segments were transcribed manually. Four types of disfluencies were distinguished: discourse markers, filled pauses, repetitions and revisions, each of which accounts for about 2% of the corpus (8% in total). They were analysed by utterance, speaker and disfluency pattern types. Four questions were raised. Where do disfluencies occur in the utterance? What is the influence of the speakers’ status? And what are the most frequent disfluency patterns?

    Keywords DiSS

  • Jean-Leon Bouraoui, and Nadine Vigouroux, “Disfluency phenomena in an apprenticeship corpus,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 33-37. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_033.pdf.

    Abstract This paper presents a study carried out on an apprenticeship corpus. It features dialogues between air traffic controllers in training and "pseudo-pilots". "Pseudo-pilots" are people (often instructors) who simulate the behavior of real pilots in real situations. The corpus is distinctive in two main respects: its apprenticeship setting, and the fact that production is constrained by a particular phraseology. Our study concerns the many kinds of disfluency phenomena that occur in this specific corpus. We define 6 main categories of these phenomena, and take a position with regard to the terminology used in the literature. We then present the distribution of these categories. Some of the occurrence frequencies differ greatly from those observed in other studies. Our explanation is based on the specificity of the corpus: because of their responsibilities, both controllers and pseudo-pilots have to be especially careful about the mistakes they could make, since these could lead to serious incidents. The remainder of our paper is dedicated to a more in-depth study of one disfluency class: "false starts". A false start consists of the beginning of a word that is not completed. We show that this category consists of several sub-categories, whose distribution we study.

    Keywords DiSS

  • Pierpaolo Busan, Giovanna Pelamatti, Alessandro Tavano, Michele Grassi, and Franco Fabbro, “Improvement of verbal behavior after pharmacological treatment of developmental stuttering: a case study,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 39-42. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_039.pdf.

    Abstract Developmental stuttering is a disruption in normal speech fluency and rhythm. It usually manifests between 6 and 9 years of age and may persist into adulthood. At present, the exact etiology of developmental stuttering is not fully clear. Nevertheless, the dopaminergic neurological component is likely to play a causal role in the manifestation of stuttering behaviors. Indeed, some studies seem to confirm the efficacy of antidopaminergic drugs (haloperidol, risperidone and olanzapine, among others) in controlling stuttering behaviors. We present a case of persistent developmental stuttering in a 24-year-old adult male who was able to control his symptoms to a significant extent after administration of risperidone, an antidopaminergic drug. Our findings show that the pharmacological intervention helped the patient improve on a set of fluency tasks, especially when the tasks involved uttering content words. Our results are discussed in light of current theories on the cognitive and neurological basis of developmental stuttering.

    Keywords DiSS

  • Estelle Campione, and Jean Véronis, “Pauses and hesitations in French spontaneous speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 43-46. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_043.pdf.

    Abstract In traditional terminology, silent and filled pauses are grouped together, whereas hesitation lengthening is put into a separate category. However, while these various phenomena are very often associated, there have been few studies on how they interact. We analyzed an hour of spontaneous speech to show that silent and filled pauses operate in a totally different way, and that contrary to common belief, silent pauses by themselves never serve as hesitation markers, but only do so when coupled with other markers – mostly syllabic lengthening and filled pauses. These last two hesitation markers have similar acoustic and articulatory characteristics; they are also distributed and function alike.

    Keywords DiSS

  • Maria Candea, Ioana Vasilescu, and Martine Adda-Decker, “Inter- and intra-language acoustic analysis of autonomous fillers,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 47-51. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_047.pdf.

    Abstract The present work deals with autonomous fillers in a multilingual context. The question addressed here is whether fillers carry universal or language-specific characteristics. Fillers occur frequently in spontaneous speech and represent an interesting topic for improving language-specific models in automatic language processing. Most current studies focus on a few languages such as English and French. Here we focus on fillers from eight languages (Arabic, Mandarin Chinese, French, German, Italian, European Portuguese, American English and Latin American Spanish). We thus propose an acoustic typology based on the vocalic peculiarities of the autonomous fillers. Three parameters are considered: duration, pitch (F0) and timbre (F1/F2). We also compare the vocalic segments of the fillers with intra-lexical vowels of similar timbre. To this end, a preliminary study on French is described.

    Keywords DiSS

  • Jennifer Cole, Mark Hasegawa-Johnson, Chilin Shih, Heejin Kim, Eun-Kyung Lee, Hsin-yi Lu, Yoonsook Mo, and Tae-Jin Yoon, “Prosodic parallelism as a cue to repetition and error correction disfluency,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 53-58. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_053.pdf.

    Abstract Complex disfluencies that involve the repetition or correction of words are frequent in conversational speech, with repetition disfluencies alone accounting for over 20% of disfluencies. These disfluencies generally do not lead to comprehension errors for human listeners. We propose that the frequent occurrence of parallel prosodic features in the reparandum (REP) and alteration (ALT) intervals of complex disfluencies may serve as strong perceptual cues that signal the disfluency to the listener. We report results from a transcription analysis of complex disfluencies that classifies disfluent regions on the basis of prosodic factors, and preliminary evidence from F0 analysis to support our finding of prosodic parallelism.

    Keywords DiSS

  • Andrew A. Cooper, and John T. Hale, “Promotion of disfluency in syntactic parallelism,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 59-63. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_059.pdf.

    Abstract The development of a disfluency-robust speech parser requires some insight into where disfluencies occur in spontaneous spoken language. This corpus study deals with one syntactic variable which is predictive of disfluency location: syntactic parallelism. A formal definition of syntactic parallelism is used to show that syntactic parallelism is indeed predictive of disfluency.

    Keywords DiSS

  • Rodolfo Delmonte, “Modeling conversational styles in Italian by means of overlaps,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 65-70. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_065.pdf.

    Abstract Conversational styles vary remarkably across cultures: communities of speakers – rather than single speakers – seem to share turn-taking rules which do not always coincide with those shared by other communities of the same language. These rules are usually responsible for the smoothness of conversational interaction and the readiness with which conversants attain their communicative goals. Overlaps constitute a disruptive element in the economy of conversations; however, they show regular patterns which can be used to define conversational styles (Ford and Thompson, 1996). Overlaps constitute a challenge for any system of linguistic representation in that they cannot be treated as one-dimensional events: in order to take into account the purport of an overlapping stretch of dialogue for the ongoing pragmatics and semantics of discourse, we have devised a new annotation schema which is fed into the parser and produces a multidimensional linear syntactic constituency representation. This study takes a new tack on the issues raised by overlaps, both in terms of their linguistic representation and their semantic and pragmatic interpretation. It presents work carried out on the 60,000-word Italian Spontaneous Speech Corpus called AVIP, under the national project API (the Italian version of MapTask), in particular the parser used to produce syntactic structures of overlapped, temporally aligned turns. We also present preliminary data from IPAR, another corpus of spontaneous dialogues run with the Spot Differences protocol. The paper then concentrates on the syntactic, semantic and prosodic aspects of this debated issue. It argues in favour of a joint, and thus temporally aligned, representation of overlapping material to capture all linguistic information made available by the local context.
    This results in a syntactically branching node, which we call OVL, that contains both the overlapper’s and the overlappee’s material (linguistic or non-linguistic). An extended classification of the phenomenon shows that overlaps contribute substantially to the interpretation of the local context rather than the other way around. They also determine the overall conversational style of a given community of speakers, with cultural import.

    Keywords DiSS

  • Janet Fletcher, Nicholas Evans, and Belinda Ross, “The intra-word pause and disfluency in Dalabon,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 77-81. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_077.pdf.

    Abstract Earlier impressionistic analyses of Dalabon indicate that the grammatical word is often realized as either an accentual or an intonational phrase, followed by a pause. Unusually, it can also be interrupted by a silent pause, with each section potentially (although not necessarily) realized as a separate intonational phrase. Our analyses of pause duration and pause placement within grammatical words support these earlier impressions, although this use of the silent pause appears to be restricted to certain affix boundaries and subject to other phonological constraints relating to the surrounding linguistic material. However, these interruptions also share certain characteristics with "normal" disfluencies.

    Keywords DiSS

  • Kristy Beers Fägersten, “Hesitations and repair in German,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 71-76. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_071.pdf.

    Abstract The occurrence of pauses and hesitations in spontaneous speech has been shown to occur systematically, for example, "between sentences, after discourse markers and conjunctions and before accented content words." (Hansson [15]) This is certainly plausible in English, where pauses and hesitations can and often do occur before content words such as nominals, for example, "uh, there’s a ... man." (Chafe [8]) However, if hesitations are, in fact, evidence of "deciding what to talk about next," (Chafe [8]) then the complex grammatical system of German should render this pausing position precarious, since pre-modifiers must account for the gender of the nominals they modify. In this paper, I present data to test the hypothesis that pre-nominal hesitation patterns in German are dissimilar to those in English. Hesitations in German will be shown, in fact, to occur within noun phrase units. Nevertheless, native speakers most often succeed in supplying a nominal which conforms to the gender indicated by the determiner or pre-modifier. Corrections, or repairs, of infelicitous pre-modifiers indicate that the speaker was unable to supply a nominal of the same gender which the choice of pre-modifier had committed him/her to. The frequency of such repairs is shown to vary according to task, with fewest repairs occurring in elicited speech which allows for linguistic freedom and therefore is most like spontaneous speech. The data sets indicate that among German native speakers, hesitations occurring before noun phrase units (pre-NPU hesitations) indicate deliberation of what to say, while hesitations within or before the head of the noun phrase (pre-NPH hesitations) indicate deliberation of how to say what has already been decided (cf. Chafe [8]).

    Keywords DiSS

  • Tiit Hennoste, “Repair-initiating particles and um-s in Estonian spontaneous speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 83-88. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_083.pdf.

    Abstract Particles and um-s used in spontaneous Estonian speech as initiators of different types of repair are analysed. Our model and typology of repair, based on conversation analysis, is introduced. Three main types of repair and the particles used to initiate them are described: prepositioned self-initiated self-repair, postpositioned self-initiated self-repair (addition, substitution, insertion and abandon), and other-initiated self-repair (reformulation, clarification and misunderstanding). In conclusion, 6 groups of particles are brought out according to the role they play in the initiation of the repair sequence. The data come from the Corpus of Spoken Estonian of the University of Tartu, which contains everyday and institutional speech, telephone and face-to-face conversations.

    Keywords DiSS

  • Sandrine Henry, “Repeats in spontaneous spoken French: the influence of the complexity of phrases,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 89-92. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_089.pdf.

    Abstract Here we present the results of a descriptive study we conducted on 383 disfluent repeats from a corpus of spontaneous spoken French. We analyze noun phrases under construction and study whether there is a correlation between the frequency of the repeats and the complexity of the phrases. We then focus on complex noun phrases in order to locate the repeats precisely. We also analyze how repeats affect structures such as [Preposition + Determiner + Noun] and what constraints apply to such structures.

    Keywords DiSS

  • Peter Howell, and Olatunji Akande, “Simulations of the types of disfluency produced in spontaneous utterances by fluent speakers, and the change in disfluency type seen as speakers who stutter get older,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 93-98. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_093.pdf.

    Abstract The EXPLAN model is implemented on a graphic simulator. It is shown that it is able to produce speech in serial order and several types of fluency failure produced by fluent speakers and speakers who stutter. A way that EXPLAN accounts for longitudinal changes in the pattern of fluency failures shown by speakers who stutter is demonstrated.

    Keywords DiSS

  • Peter Howell, Jennifer Hayes, Ceri Savage, Jane Ladd, and Nafisa Patel, “Factors that determine the form and position of disfluencies in spontaneous utterances,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 99-102. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_099.pdf.

    Abstract This presentation reviews work on types of disfluency in the spontaneous speech of fluent speakers and speakers who stutter. Examination is made of factors that determine where disfluencies are located. It is concluded that the phonological, or prosodic, word provides a good basis for explaining the distribution of different types of disfluency in spontaneous speech.

    Keywords DiSS

  • T. Florian Jaeger, “Optional ’that’ indicates production difficulty: evidence from disfluencies,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 103-108. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_103.pdf.

    Abstract Optional word omission, such as ’that’ omission in complement and relative clauses, has been argued to be driven by production pressure (rather than by comprehension). One particularly strong production-driven hypothesis states that speakers insert words to buy time to alleviate production difficulties. I present evidence from the distribution of disfluencies in non-subject-extracted relative clauses arguing against this hypothesis. While word omission is driven by production difficulties, speakers may use ’that’ as a collateral signal to addressees, informing them of anticipated production difficulties. In that sense, word omission would be subject to audience design (i.e. catering to addressees’ needs).

    Keywords DiSS

  • Jumpei Kaneda, “Phrase-final rise-fall intonation and disfluency in Japanese - a preliminary study,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 109-112. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_109.pdf.

    Abstract In Japanese conversations, rise-fall intonation with vowel lengthening often occurs on the final syllable of a phrase. This phrase-final rise-fall (PFRF) is a new type of intonation first reported in the 1960s. Researchers consider PFRF intonation a discourse marker which functions to sharpen the phrase boundary and retain the utterance turn, but other phrase-final intonation such as phrase-final lengthening (PFL) can have a similar pattern. PFLs are recognized as a type of disfluent speech with characteristics similar to PFRFs in terms of final lengthening and discourse functions. Also, from reports about the spontaneity of speech, we assume that PFRFs would be related to disfluency, as well as to PFLs. To examine this assumption, this paper attempts to show the co-occurrence relation between PFRF and disfluency in the same utterance. The results show that PFRFs and PFLs are related to posterior disfluent units and suggest that both indicate speech planning strategies. Further, this paper speculates that a difference between PFRF and PFL is a difference in the purposes of speech planning: the latter represents ongoing linguistic editing while the former indicates adjusting the utterance according to the interlocutor’s reaction. Disfluencies accordingly occur as effects of speech planning processes.

    Keywords DiSS

  • Shigeyoshi Kitazawa, “Evaluation of vowel hiatus in prosodic boundaries of Japanese,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 113-116. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_113.pdf.

    Abstract We investigated V-V hiatus through J-ToBI labeling and by listening to whole phrases to estimate the degree of discontinuity and, where possible, to determine the exact boundary between two phrases. In most cases, appropriate boundaries were found at the maximum perceptual score. Using electroglottography (EGG) open quotients (OQ), pitch marks and spectrograms, the acoustic-phonological features of these V-V hiatuses were found to be phrase-initial glottalization and phrase-final nasalization, observable in the EGG and spectrogram, as well as phrase-final lengthening and phrase-initial shortening of the morae. A small dip was observable at V-V hiatus boundaries showing glottalization. The test materials are taken from the "Japanese MULTEXT" and consist of particle-vowel (36), adjective-vowel (5), and word-word (4) cases.

    Keywords DiSS

  • Ellen F. Lau, and Fernanda Ferreira, “Lingering effects of disfluent material on comprehension of garden path sentences,” Language and Cognitive Processes, vol. 20, no. 5, 2005, pp. 633-666. DOI: 10.1080/01690960444000142. http://www.tandf.co.uk/journals/pp/01690965.html.

    Abstract In two experiments, we tested for lingering effects of verb replacement disfluencies on the processing of garden path sentences that exhibit the main verb/reduced relative (MV/RR) ambiguity. Participants heard sentences with revisions like "The little girl chosen, uh, selected for the role celebrated with her parents and friends." We found that the syntactic ambiguity associated with the reparandum verb involved in the disfluency (here, "chosen") had an influence on later parsing: Garden path sentences that included such revisions were more likely to be judged grammatical if the reparandum verb was structurally unambiguous. Conversely, ambiguous non-garden path sentences were more likely to be judged ungrammatical if the structurally unambiguous disfluency verb was inconsistent with the final reading. Results support a model of disfluency processing in which the syntactic frame associated with the replacement verb "overlays" the previous verb’s structure rather than actively deleting the already-built tree.

    Keywords Cognitive Psychology, Language, Language & Linguistics, Neuropsychology, Psychology of, Speech & Language Disorders, Speech Perception & Production

  • Che-Kuang Lin, Shu-Chuan Tseng, and Lin-Shan Lee, “Important and new features with analysis for disfluency interruption point (IP) detection in spontaneous Mandarin speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 117-121. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_117.pdf.

    Abstract This paper presents a whole set of new features, some duration-related and some pitch-related, to be used in disfluency interruption point (IP) detection for spontaneous Mandarin speech, considering the special linguistic characteristics of Mandarin Chinese. A decision tree is incorporated into the maximum entropy model to perform the IP detection. By examining the performance degradation when each specific feature was removed from the whole set, the most important features for IP detection for each disfluency type were analyzed in detail. The experiments were conducted on the Mandarin Conversational Dialogue Corpus (MCDC) developed by the Institute of Linguistics of Academia Sinica in Taiwan.

    Keywords DiSS

  • Tobias Lövgren, and Jan van Doorn, “Influence of manipulation of short silent pause duration on speech fluency,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 123-126. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_123.pdf.

    Abstract Ordinary speech contains disfluencies in the form of hesitations and repairs. When listeners make global judgements on speech fluency they are influenced by the frequency and nature of the individual disfluencies contained in the speech. The aim of this study was to investigate a single dimension, pause duration, in the perception of speech fluency. The method involved simulating pause duration within naturally fluent speech by manipulating existing acoustic silences in the speech. Four conditions were created: one for the natural speech and three with stepwise increases in acoustic silence durations (average x2, x4 and x7.5 respectively). In a forced-choice task, listeners were asked to judge the speech samples as fluent or non-fluent. The results showed that the percentage of judgements of disfluency increased as the pause durations increased, and that the differences between the unmanipulated speech condition and the two conditions with the longest pause durations were statistically significant. The results were interpreted to indicate that the individual dimension of pause duration has an independent influence on the judgement of fluency in ordinary speech.

    Keywords DiSS

  • Elgar-Paul Magro, “Disfluency markers and their facial and gestural correlates: preliminary observations on a dialogue in French,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 127-131. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_127.pdf.

    Abstract The aim of this article is to try to establish any observable regularities between the vocal and the visual expression of disfluency markers in a French spontaneous dialogue. The data show different configurations for different types of disfluency markers. Thus "euh"s are typically accompanied by mutual eye contact and no gesture; interrupted eye contact occurs less frequently, on occasions where speech planning is more seriously impaired (syntactic disruption and combination of "euh" with other disfluency markers). False starts seem to be typically accompanied by gesture production, whereas whether eye contact is maintained depends on whether or not the speaker relies on the listener to resolve the speech production problem. The article takes up the idea that disfluency markers can be classified along a continuum throughout the speech formulation process, going from the most discreet to the most prominent. It suggests that the more prominent the disfluency, the more likely the visual channel is to play a role (interrupted eye contact and gesture production).

    Keywords DiSS

  • Jan McAllister, and Mary Kingston, “Characteristics of final part-word repetitions,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 7-11. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_007.pdf.

    Abstract In an earlier paper, we have described final part-word repetitions in the conversational speech of two school-age boys of normal intelligence with no known neurological lesions. In this paper we explore in more detail the phonetic and linguistic characteristics of the speech of the boys. The repeated word fragments were more likely to be preceded by a pause than followed by one. The word immediately following the fragment tended to have a higher word frequency score than other surrounding words. Utterances containing the disfluencies typically contained a greater number of syllables than those that did not; however, there was no reliable difference between fluent and disfluent utterances in terms of their grammatical complexity.

    Keywords DiSS

  • Hannele Nicholson, Ellen Gurman Bard, Robin Lickley, Anne H. Anderson, Catriona Havard, and Yiya Chen, “Disfluency and behaviour in dialogue: evidence from eye-gaze,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 133-138. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_133.pdf.

    Abstract Previous research on disfluency types has focused on their distinct cognitive causes, prosodic patterns, or effects on the listener. This paper seeks to add to this taxonomy by providing a psycholinguistic account of the dialogue and gaze behaviour speakers engage in when they make certain types of disfluency. Dialogues came from a version of the Map Task, [2, 4], in which 36 normal adult speakers each participated in six dialogues across which feedback modality and time-pressure were counter-balanced. In this paper, we ask whether disfluency, both generally and type-specifically, was associated with speaker attention to the listener. We show that certain disfluency types can be linked to particular dialogue goals, depending on whether the speaker had attended to listener feedback. The results shed light on the general cognitive causes of disfluency and suggest that it will be possible to predict the types of disfluency which will accompany particular behaviours.

    Keywords DiSS

  • Sieb Nooteboom, “Lexical bias re-re-visited: some further data on its possible cause,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 139-144. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_139.pdf.

    Abstract This paper describes an experiment eliciting spoonerisms by using the so-called SLIP technique. The purpose of the experiment was to provide a further test of the hypothesis that self-monitoring of inner speech is a major source of lexical bias. This is a follow-up on an earlier experiment in which subjects were explicitly prompted after each response to make a correction in case of a speech error. In the current experiment both the prompt and the extra time for correction were left out, and there was no strong time pressure for the subject in giving his response. It is shown that under these conditions many primed-for spoonerisms are replaced by other, mostly lexical, errors. These ’replacing’ or ’secondary’ errors are more frequent in the condition priming for nonword-nonword errors than in the condition priming for word-word errors. Response times obtained for replacing errors are considerably and significantly longer than response times for overtly interrupted errors, and also longer than response times for the primed-for spoonerisms. This suggests that a time-consuming operation follows the primed-for spoonerisms in inner speech, and replaces those with other speech errors, often to preserve lexicality of the error.

    Keywords DiSS

  • Daniel O’Connell, and Sabine Kowal, “Where Do Interjections Come From? A Psycholinguistic Analysis of Shaw’s Pygmalion,” Journal of Psycholinguistic Research, vol. 34, no. 5, September 2005, pp. 497-514. DOI: 10.1007/s10936-005-6205-x.

    Abstract Starting from our recent findings regarding emotional and initializing functions of interjections in TV and radio interviews (Kowal & O’Connell, 2004b; O’Connell & Kowal, in press; O’Connell, Kowal, & Ageneau, 2005), we used the book and script of Shaw (1916/1969) and the audiotape of the motion picture (Pascal, Asquith, & Howard, 1938) Pygmalion to investigate how actors use interjections to express emotions. The following hypotheses were tested: (1) The actors use the written cues selectively in their oral performance by substituting, adding, and deleting interjections; (2) primary interjections added by the actors are less conventional than those in the written text; (3) durations and number of syllables of Eliza Doolittle’s spoken renditions of her signature interjection ah-ah-ah-ow-ow-ow-oo do not correlate with the length in letters and syllables of the written versions; and (4) there is no evidence for Ameka’s (1992b, 1994) characterization of interjections as temporally isolated, i.e., preceded and followed by silent pauses, in consequence of their syntactic isolation. Our findings confirmed all the hypotheses except for one unexpectedly significant correlation between number of syllables in Eliza Doolittle’s signature interjection in the written version and duration in seconds of the spoken version thereof. The common thread throughout these data is the actor’s need to personalize emotions in a dramatic performance—by means of interjections other than those provided in the written text. In this process of personalization, the emotional and initializing functions of interjections are confirmed.

    Keywords conceptual and medial orality, dramatic performance, emotional expression, interjections, spontaneity

  • Daniel O’Connell, and Sabine Kowal, “Uh and Um Revisited: Are They Interjections for Signaling Delay?,” Journal of Psycholinguistic Research, vol. 34, no. 6, 2005, pp. 555-576. DOI: 10.1007/s10936-005-9164-3.

    Abstract Clark and Fox Tree (2002) have presented empirical evidence, based primarily on the London-Lund corpus (LL; Svartvik & Quirk, 1980), that the fillers uh and um are conventional English words that signal a speaker’s intention to initiate a minor and a major delay, respectively. We present here empirical analyses of uh and um and of silent pauses (delays) immediately following them in six media interviews of Hillary Clinton. Our evidence indicates that uh and um cannot serve as signals of upcoming delay, let alone signal it differentially: In most cases, both uh and um were not followed by a silent pause, that is, there was no delay at all; the silent pauses that did occur after um were too short to be counted as major delays; finally, the distributions of durations of silent pauses after uh and um were almost entirely overlapping and could therefore not have served as reliable predictors for a listener. The discrepancies between Clark and Fox Tree’s findings and ours are largely a consequence of the fact that their LL analyses reflect the perceptions of professional coders, whereas our data were analyzed by means of acoustic measurements with the PRAAT software (www.praat.org). A comparison of our findings with those of O’Connell, Kowal, and Ageneau (2005) did not corroborate the hypothesis of Clark and Fox Tree that uh and um are interjections: Fillers occurred typically in initial, interjections in medial positions; fillers did not constitute an integral turn by themselves, whereas interjections did; fillers never initiated cited speech, whereas interjections did; and fillers did not signal emotion, whereas interjections did. Clark and Fox Tree’s analyses were embedded within a theory of ideal delivery that we find inappropriate for the explication of these phenomena.

    Keywords filled pauses, fillers, hesitations, interjections, spontaneous speech, uh, um

  • Daniel O’Connell, Sabine Kowal, and Carie Ageneau, “Interjections in Interviews,” Journal of Psycholinguistic Research, vol. 34, no. 2, March 2005, pp. 153-171. DOI: 10.1007/s10936-005-3636-3.

    Abstract A psycholinguistic hypothesis regarding the use of interjections in spoken utterances, originally formulated by Ameka (1992b, 1994) for the English language, but not confirmed in the German-language research of Kowal and O’Connell (2004 a & c), was tested: The local syntactic isolation of interjections is paralleled by their articulatory isolation in spoken utterances, i.e., by their occurrence between a preceding and a following pause. The corpus consisted of four TV and two radio interviews of Hillary Clinton that had coincided with the publication of her book Living History (2003) and one TV interview of Robin Williams by James Lipton. No evidence was found for articulatory isolation of English-language interjections. In the Hillary Clinton interviews and Robin Williams interviews, respectively, 71% and 73% of all interjections occurred initially, i.e., at the onset of various units of spoken discourse: at the beginning of turns; at the beginning of articulatory phrases within turns, i.e., after a preceding pause; and at the beginning of a citation within a turn (either Direct Reported Speech [DRS] or what we have designated Hypothetical Speaker Formulation [HSF]). One conventional interjection (OH) occurred most frequently. The Robin Williams interview had a much higher occurrence of interjections, especially nonconventional ones, than the Hillary Clinton interviews had. It is suggested that the onset or initializing role of interjections reflects the temporal priority of the affective and the intuitive over the analytic, grammatical, and cognitive in speech production. Both this temporal priority and the spontaneous and emotional use of interjections are consonant with Wundt’s (1900) characterization of the primary interjection as psychologically primitive. The interjection is indeed the purest verbal implementation of conceptual orality.

    Keywords conceptual orality, interjection, interview

  • Berthille Pallaud, “The re-adjustment of word-fragments in spontaneous spoken French,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 145-149. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_145.pdf.

    Abstract A study of word-fragments in spoken French has been under way for a few years, based on corpora of non-directive talks recorded and transcribed according to GARS’ conventions (currently DELIC). These disfluencies are often analyzed within the framework of disfluent repetitions. The observations made on these two types of disfluencies led us to distinguish them. The aim of our study is, on the one hand, to describe insertions which take place in relation to the word interruptions and their re-adjustment, and, on the other hand, to specify the types and localizations of retracing which follow these interruptions. Two kinds of incidental clauses were observed at the time of the readjustments which follow these disturbances. Some (the more numerous) are syntactically linked to the fragment or to its retracing; others are not. Moreover, the word-fragments which will be modified are the only ones to be dependent on the type of localization. For the others, this localization does not make it possible to predict the category of interruption (complemented or unfinished). Our results on word-fragments confirm, however, that in contemporary French, retracing at the head of the nominal or verbal group which contains the disfluency remains the simplest case (and at the same time the most frequent) [5]. Nevertheless, a third of the retracings either do not go back to the beginning of the group or go beyond it.

    Keywords DiSS

  • Myriam Piccaluga, Jean-Luc Nespoulous, and Bernard Harmegnies, “Disfluencies as a window on cognitive processing. An analysis of silent pauses in simultaneous interpreting,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 151-155. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_151.pdf.

    Abstract The paper focuses on silent pauses observed in the productions of subjects involved in simultaneous interpreting tasks. Four bilingual subjects with various degrees of expertise in interpreting and various degrees of mastery of the languages involved (French and Spanish) were recorded while interpreting French and Spanish talks. The source discourses had been perturbed by changes both in speech rate (by time compression) and in auditory quality (by the addition of parasitic noise). On the basis of acoustic analyses performed on the subjects’ productions, statistical analyses focus on both the number and the duration of the observed pauses. This double approach enables investigation of the kinds of cognitive disturbance caused by the independent variables and allows further speculation on the semiology of pause durations.

    Keywords DiSS

  • Melanie Soderstrom, and James L. Morgan, “Disfluency in speech input to infants? The interaction of mother and child to create error-free speech input for language acquisition,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 157-162. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_157.pdf.

    Abstract One characteristic of infant-directed speech is that it is highly fluent compared with adult-directed speech. However, the speech that infants hear still contains disfluencies. Such disfluencies might potentially cause problems for infants during language development. We first analyzed samples of spontaneous speech in the presence of infants (both adult- and infant-directed) and found that under ideal circumstances the speech infants hear is highly fluent. Under less than ideal circumstances, infants hear far more disfluent speech; however, this disfluent speech is almost entirely adult-directed. Although these disfluencies are grammatically ill-formed, their prosodic structure might signal their ill-formedness to the infants. In a preference experiment, 10-month-olds listened longer to infant-directed speech samples containing prosodic disfluencies than to equated samples without disfluency. However, this effect was found in only one of two counterbalancing groups. Using adult ratings of low-pass versions of these speech samples, we found that infants’ preferences were correlated with the adults’ perception of the relative disfluency of the samples. A follow-up experiment using adult-directed disfluencies found that while the 10-month-olds showed no differences in their listening preferences, older infants preferred to listen to the fluent speech. These results suggest that younger and older infants attend differently to infant- and adult-directed speech, and that older infants may be able to differentiate grammatical adult-directed input from input distorted by disfluency. We discuss implications of these findings for language acquisition.

    Keywords DiSS

  • Ellen Thompson, “A cross-linguistic look at VP-ellipsis and verbal speech errors,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 163-164. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_163.pdf.

    Abstract This paper argues that consideration of spontaneous speech errors provides insight into cross-linguistic analyses of syntactic phenomena. In particular, I claim that differences in the distribution of non-parallel VP-Ellipsis constructions in English and German, as well as variation in the spontaneously-occurring verbal speech errors, are explained by a parametric analysis of variation in the inflectional systems of the two languages.

    Keywords DiSS

  • Doroteo T. Toledano, Antonio Moreno Sandoval, José Colás Pasamontes, and Javier Garrido Salas, “Acoustic-phonetic decoding of different types of spontaneous speech in Spanish,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 165-168. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_165.pdf.

    Abstract This paper presents preliminary acoustic-phonetic decoding results for Spanish on the spontaneous speech corpus C-ORAL-ROM. These results are compared with results on the read speech corpus ALBAYZIN. We also compare the decoding results obtained with the different types of spontaneous speech in C-ORAL-ROM. Most importantly, the experiments show that the type of spontaneous speech has a strong impact on spontaneous speech recognition results. The best speech recognition results were obtained on speech captured from the media.

    Keywords DiSS

  • Michiko Watanabe, Yasuharu Den, Keikichi Hirose, and Nobuaki Minematsu, “The effects of filled pauses on native and non-native listeners’ speech processing,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 169-172. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_169.pdf.

    Abstract Everyday speech is abundant with disfluencies. However, little is known about their roles in speech communication. We examined the effects of filled pauses at phrase boundaries on native and non-native listeners in Japanese. A study of a spontaneous speech corpus showed that filled pauses tended to precede relatively long and complex constituents. We tested the hypothesis that filled pauses bias listeners’ expectations about the upcoming phrase toward a longer and more complex one. In the experiment, participants were presented with two shapes at a time, one simple and the other compound. Their task was to identify the one that they heard described as quickly as possible. The speech stimuli involved two factors: complexity and fluency. For the complexity factor, half of the speech stimuli described compound shapes with long and complex phrases and the other half described simple shapes with short and simple phrases. For the fluency factor, phrases describing a shape had a preceding filled pause, a preceding silent pause of the same length, or no preceding pause. The results of the experiments with both native and non-native listeners showed that response times to the complex phrases were significantly shorter after filled or silent pauses than when there was no pause. In contrast, there was no significant difference between the three conditions for the simple phrases, supporting the hypothesis.

    Keywords DiSS

  • Yelena Yasinnik, Stefanie Shattuck-Hufnagel, and Nanette Veilleux, “Gesture marking of disfluencies in spontaneous speech,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 173-178. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_173.pdf.

    Abstract Speakers effectively use both visual and acoustic cues to convey information in speech. While earlier research has concentrated on the association of visual cues (provided by gestures) with fluent prosodic structure, this study looks at the relationship between visual cues, prosodic markers and spoken disfluencies. Preliminary results suggested that speakers preferentially perform gestures in the eye region in spoken disfluencies, but a more careful frame-by-frame analysis capturing all gestures revealed that movements of the eye region (blinks, frowns, eyebrow raises and changes in direction of eyegaze) occur with high frequency in both fluent and non-fluent speech. The paper describes a method for frame-by-frame labelling of speech-accompanying gestures for a speech sample, whose output can then be combined with independently derived labels of the prosody. Initial analysis of 3-minute samples from two speakers reveals that one speaker produces eye movements in association with disfluencies and the other does not, and that this tendency does not result from alignment of brow gestures with pitch accents.

    Keywords DiSS

  • Yuan Zhao, and Dan Jurafsky, “A preliminary study of Mandarin filled pauses,” in The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, September 2005, pp. 179-182. http://www.isca-speech.org/archive_open/archive_papers/diss_05/dis5_179.pdf.

    Abstract The paper reports preliminary results on Mandarin filled pauses (FPs), based on a large speech corpus of Mandarin telephone conversation. We find that Mandarin intensively uses both demonstratives (zhege 'this', nage 'that') and uh/mm as FPs. Demonstratives are more frequent FPs and are more likely to be surrounded by other types of disfluency phenomena than uh/mm, as well as occurring more often in nominal environments. We also find durational differences: FP demonstratives are longer than non-FP demonstratives, and mm is longer than uh. The study also revealed dialectal influence on the use of FPs. Our results agree with earlier work which shows that a language may divide conversational labor among different FPs. Our work also extends this research in suggesting that different languages may assign conversational functions to FPs in different ways.

    Keywords DiSS

2004

  • Jennifer Arnold, Michael K. Tanenhaus, Rebecca Altmann, and Maria Fagnano, “The Old and Thee, uh, New: Disfluency and Reference Resolution,” Psychological Science, vol. 15, no. 9, September 2004, pp. 578-582. DOI: 10.1111/j.0956-7976.2004.00723.x.

    Abstract Most research on the rapid mental processes of online language processing has been limited to the study of idealized, fluent utterances. Yet speakers are often disfluent, for example, saying "thee, uh, candle" instead of "the candle." By monitoring listeners’ eye movements to objects in a display, we demonstrated that the fluency of an article ("thee uh" vs. "the") affects how listeners interpret the following noun. With a fluent article, listeners were biased toward an object that had been mentioned previously, but with a disfluent article, they were biased toward an object that had not been mentioned. These biases were apparent as early as lexical information became available, showing that disfluency affects the basic processes of decoding linguistic input.

  • J. C. Brown, “Eliminating the Segmental Tier: Evidence from Speech Errors,” Journal of Psycholinguistic Research, vol. 33, no. 2, March 2004, pp. 97-101. DOI: 10.1023/B:JOPR.0000017222.24698.73.

    Abstract The dominant viewpoint regarding phonologically driven speech errors is that segments are the units responsible for the errors. The goal of this paper is to illustrate the point that other potential candidates for explaining these speech errors, which have gone largely unnoticed, provide a better explanatory framework for speech errors than do segments. By looking at unambiguous cases and patterns of markedness, it can be shown that there exists good evidence for features and prosodic constituents in speech errors, but never any positive evidence for segments. All of these considerations taken into account together lend strong support to the argument that there is no need for a segmental level of analysis in phonology.

    Keywords Phonology, production errors, segments, slips of the tongue

  • Fernanda Ferreira, and Karl G.D. Bailey, “Disfluencies and human language comprehension,” TRENDS in Cognitive Sciences, vol. 8, no. 5, May 2004, pp. 231-237. DOI: 10.1016/j.tics.2004.03.011.

    Abstract Spoken language contains disfluencies, which include editing terms such as uh and um as well as repeats and corrections. In less than ten years the question of how disfluencies are handled by the human sentence comprehension system has gone from virtually ignored to a topic of major interest in computational linguistics and psycholinguistics. We discuss relevant empirical findings and describe a computational model that captures how disfluencies influence parsing and comprehension. The research reviewed shows that the parser, which presumably evolved to handle conversations, deals with disfluencies in a way that is efficient and linguistically principled. The success of this research program reinforces the current trend in cognitive science to view cognitive mechanisms as adaptations to real-world constraints and challenges.

  • Fernanda Ferreira, Ellen F. Lau, and Karl G.D. Bailey, “Disfluencies, language comprehension, and Tree Adjoining Grammars,” Cognitive Science, vol. 28, no. 5, 2004, pp. 721-749. DOI: 10.1016/j.cogsci.2003.10.006.

    Abstract Disfluencies include editing terms such as uh and um as well as repeats and revisions. Little is known about how disfluencies are processed, and there has been next to no research focused on the way that disfluencies affect structure-building operations during comprehension. We review major findings from both computational linguistics and psycholinguistics, and then we summarize the results of our own work which centers on how the parser behaves when it encounters a disfluency. We describe some new research showing that information associated with misarticulated verbs lingers, which adds to the large body of data on the critical influence of verb argument structures on sentence comprehension. The paper also presents a model of disfluency processing. The parser uses a Tree Adjoining Grammar to build phrase structure. In this approach, filled and unfilled pauses affect the timing of Substitution operations. Repairs and corrections are handled by a mechanism we term "Overlay," which allows the parser to overwrite an undesired tree with the appropriate, correct tree. This model of disfluency processing highlights the need for the parser to sometimes coordinate the mechanisms that perform garden-path reanalysis with those that do disfluency repair. The research program as a whole demonstrates that it is possible to study disfluencies systematically and to learn how the parser handles filler material and mistakes. It also showcases the power of Tree Adjoining Grammars, a formalism developed by Aravind Joshi which has yielded results in many different areas of linguistics and cognitive science.

    Keywords disfluencies, parsing, syntax, TAG

  • Barbara F. Freed, Norman Segalowitz, and Dan P. Dewey, “Context of Learning and Second Language Fluency in French: Comparing Regular Classroom, Study Abroad, and Intensive Domestic Immersion Programs,” Studies in Second Language Acquisition, vol. 26, no. 2, 2004, pp. 275-301. DOI: 10.1017/S0272263104262064. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=214874&fulltextType=RA&fileId=S0272263104262064.

    Abstract We compared the acquisition of various dimensions of fluency by 28 students of French studying in three different learning contexts: formal language classrooms in an at home (AH) institution, an intensive summer immersion (IM) program, and a study abroad (SA) setting. For the purpose of oral data collection, students participated in oral interviews (similar to the Oral Proficiency Interview) at the beginning and the end of the semester and provided information regarding language use and interactions. Analyses included comparisons of gain scores as a function of the learning context and as a function of the time reported using French outside of class. The main findings that reached statistical significance include: (a) The IM group made significant gains in oral performance in terms of the total number of words spoken, in length of the longest turn, in rate of speech, and in speech fluidity based on a composite of fluidity measures. When compared to the AH group, the SA group made statistically significant gains only in terms of speech fluidity but fewer gains than the IM group. The AH group made no significant gains. (b) The IM students reported that they spoke and wrote French significantly more hours per week than the other two groups. The SA group reported using English more than French (although the difference was not statistically significant) and reported using significantly more English in out-of-class activities than the IM group. (c) Multiple regression analyses revealed that reported hours per week spent writing outside of class was significantly associated with oral fluidity gains.

  • Judit Kormos, and Mariann Dénes, “Exploring measures and perceptions of fluency in the speech of second language learners,” System, vol. 32, no. 2, 2004, pp. 145-164. DOI: 10.1016/j.system.2004.01.001.

    Abstract The research reported in this paper explores which variables predict native and non-native speaking teachers’ perception of fluency and distinguish fluent from non-fluent L2 learners. In addition to traditional measures of the quality of students’ output such as accuracy and lexical diversity, we investigated speech samples collected from 16 Hungarian L2 learners at two distinct levels of proficiency with the help of computer technology. The two groups of students were compared and their temporal and linguistic measures were correlated with the fluency scores they received from three experienced native and three non-native speaker teacher judges. The teachers’ written comments concerning the students’ performance were also taken into consideration. For all the native and non-native teachers, speech rate, the mean length of utterance, phonation time ratio and the number of stressed words produced per minute were the best predictors of fluency scores. However, the raters differed as regards how much importance they attributed to accuracy, lexical diversity and the mean length of pauses. The number of filled and unfilled pauses and other disfluency phenomena were not found to influence perceptions of fluency.

  • Sandra Merlo, and Letícia Lessa Mansur, “Descriptive discourse: topic familiarity and disfluencies,” Journal of Communication Disorders, vol. 37, 2004, pp. 489-503. DOI: 10.1016/j.jcomdis.2004.03.002.

    Abstract This investigation was undertaken to address questions about topic familiarity and disfluencies during oral descriptive discourse of adult speakers. Participants expressed more attributes when the topic was familiar than when it was unfamiliar. Fillers and lexical pauses were the most frequent disfluencies. The mean duration of each hesitation pause was 776 ms. The sum of hesitation pause durations was well correlated with the number of occurrences. Repetitions, hesitation pauses, and prolongations were shown to have the same role, which was distinct from the role of fillers. The type of analysis conducted in this investigation may be useful in distinguishing between normal and disordered speech production. Learning outcomes: The reader will obtain information about the differences between the number of propositions in familiar and unfamiliar oral descriptions. The reader will also become aware of the distribution of disfluencies in discourse categories employed by the participants in this investigation.

    Keywords Descriptive discourse, disfluency, Fluency, Topic familiarity

  • Daniel O’Connell, and Sabine Kowal, “The History of Research on the Filled Pause as Evidence of 'The Written Language Bias in Linguistics' (Linell, 1982),” Journal of Psycholinguistic Research, vol. 33, no. 6, 2004, pp. 459-474. DOI: 10.1007/s10936-004-2666-6.

    Abstract Erard’s (2004) publication in the New York Times of a journalistic history of the filled pause serves as the occasion for this critical review of the past half-century of research on the filled pause. Historically, the various phonetic realizations or instantiations of the filled pause have been presented with an odd recurrent admixture of the interjection ah. In addition, the filled pause has been consistently associated with both hesitation and disfluency. The present authors hold that such a mandatory association of the filled pause with disfluency is the product of The Written Language Bias in Linguistics [Linell, 1982] and disregards much cogent evidence to the contrary. The implicit prescriptivism of well formedness—a demand derived from literacy—must be rejected; literate well formedness is not a necessary or even typical property of spontaneous spoken discourse; its structures and functions—including those of the filled pause—are very different from those of written language. The recent work of Clark and Fox Tree (2002) holds promise for moving the status of the filled pause not only toward that of a conventional word, but also toward its status as an interjection. This latter development is also being fostered by lexicographers. Nonetheless, in view of ongoing research regarding the disparate privileges of occurrence and functions of filled pauses in comparison with interjections, the present authors are reluctant to categorize the filled pause as an interjection.

    Keywords disfluency, filler, hesitation, interjection, orality, spontaneity, word

  • Daniel O’Connell, Sabine Kowal, and Edward J. Dill, “Dialogicality in TV News Interviews,” Journal of Pragmatics, vol. 36, 2004, pp. 185-205. DOI: 10.1016/j.pragma.2003.06.001.

    Abstract Eight TV news interviews, six American, one British, and one German, were analyzed for markers of orality/literacy (back channeling, hesitations, interruptions, contractions and elisions, first-person singular pronominals, interjections, and tag questions). The interviewer/interviewee pairs were: W. Blitzer/B. Clinton; K. Couric/H. Clinton; B. Shaw/B. Bush, /M. Thatcher, /B. Goldwater, and /C. Powell; M. Bashir/Princess Diana; and G. Gaus/H. Arendt. The most evident markers of orality were hesitations (filled pauses, repeats, and false starts) and first-person singular pronominals on the part of interviewees. Across the four interviews of B. Shaw, there were notable differences in style for both interviewer and interviewees. The women participants used interjections and tag questions more frequently than the men and were interrupted more often by the men. The results are interpreted in light of a dialogical theory of intersubjectivity.

    Keywords Dialogicality, Discourse markers, Informality, Intersubjectivity, orality, TV news interviews

  • Norman Segalowitz, and Barbara F. Freed, “Context, Contact, and Cognition in Oral Fluency Acquisition: Learning Spanish in At Home and Study Abroad Contexts,” Studies in Second Language Acquisition, vol. 26, no. 2, 2004, pp. 173-199. DOI: 10.1017/s0272263104262027. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=214862&fulltextType=RA&fileId=S0272263104262027.

    Abstract This study investigates the role of context of learning in second language (L2) acquisition. Participants were 40 native speakers of English studying Spanish for one semester in one of two different learning contexts—a formal classroom at a home university (AH) and a study abroad (SA) setting. The research looks at various indexes of oral performance gains—particularly gains in oral fluency as measured by temporal and hesitation phenomena and gains in oral proficiency based on the Oral Proficiency Interview (OPI). The study also examines the relation these oral gains bore to L2-specific cognitive measures of speed of lexical access (word recognition), efficiency (automaticity) of lexical access, and speed and efficiency of attention control hypothesized to underlie oral performance. The learners also provided estimates of the number of hours they spent in extracurricular language-contact activities. The results show that in some respects learners in the SA context made greater gains, both in terms of temporal and hesitation phenomena and in oral proficiency as measured by the OPI, than learners in the AH context. There were also, however, significant interaction effects and correlational patterns indicating complex relationships between oral proficiency, cognitive abilities, and language contact. The results demonstrate the importance of the dynamic interactions that exist among oral, cognitive, and contextual variables. Such interactions may help explain the enormous individual variation one sees in learning outcomes, and they underscore the importance of studying such variables together rather than in isolation.

  • Sidney J. Segalowitz, and Korri Lane, “Perceptual fluency and lexical access for function versus content words,” Behavioral and Brain Sciences, vol. 27, no. 4, 2004, pp. 307-308. DOI: 10.1017/S0140525X04310071. http://journals.cambridge.org/article_S0140525X04310071.

    Abstract By examining single-word reading times (in full sentences read for meaning), we show that (1) function words are accessed faster than content words, independent of perceptual characteristics; (2) previous failures to show this involved problems of frequency range and task used; and (3) these differences in lexical access are related to perceptual fluency. We relate these findings to issues in the literature on event-related potentials (ERPs) and neurolinguistics.

  • Chung-Hsien Wu, and Gwo-Lang Yan, “Acoustic Feature Analysis and Discriminative Modeling of Filled Pauses for Spontaneous Speech Recognition,” Journal of VLSI Signal Processing, vol. 36, no. 2-3, 2004, pp. 91-104. DOI: 10.1023/B:VLSI.0000015089.17975.f4.

    Abstract Most automatic speech recognizers (ASRs) concentrate on read speech, which is different from spontaneous speech with disfluencies. ASRs cannot deal with speech with a high rate of disfluencies such as filled pauses, repetitions, lengthening, repairs, false starts and silent pauses. In this paper, we focus on the feature analysis and modeling of the filled pauses "ah," "ung," "um," "em," and "hem" in spontaneous speech. Karhunen-Loève transform (KLT) and linear discriminant analysis (LDA) were adopted to select discriminant features for filled pause detection. In order to suitably determine the number of discriminant features, Bartlett hypothesis testing was adopted. Twenty-six features were selected using Bartlett hypothesis testing. Gaussian mixture models (GMMs), trained with a gradient descent algorithm, were used to improve the filled pause detection performance. The experimental results show that the filled pause detection rates using KLT and LDA were 84.4% and 86.8%, respectively. A significant improvement was obtained in the filled pause detection rate using the discriminative GMM with KLT and LDA. In addition, the LDA features outperformed the KLT features in the detection of filled pauses.

2003

  • Martine Adda-Decker, Benoît Habert, Claude Barras, Gilles Adda, Philippe Boula de Mareuil, and Patrick Paroubek, “A disfluency study for cleaning spontaneous speech automatic transcripts and improving speech language models,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 67-70. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_067.pdf.

    Abstract The aim of this study is to elaborate a disfluent speech model by comparing different types of audio transcripts. The study makes use of 10 hours of French radio interview archives, involving journalists and personalities from political or civil society. A first type of transcripts is press-oriented where most disfluencies are discarded. For 10% of the corpus, we produced exact audio transcripts: all audible phenomena and overlapping speech segments are transcribed manually. In these transcripts about 14% of the words correspond to disfluencies and discourse markers. The audio corpus has then been transcribed using the LIMSI speech recognizer. With 8% of the corpus the disfluency words explain 12% of the overall error rate. This shows that disfluencies have no major effect on neighboring speech segments. Restarts are the most error prone, with a 36.9% within class error rate.

    Keywords DiSS

  • Jennifer Arnold, Maria Fagnano, and Michael K. Tanenhaus, “Disfluencies Signal Theee, Um, New Information,” Journal of Psycholinguistic Research, vol. 32, no. 1, January 2003, pp. 25-36. DOI: 10.1023/A:1021980931292.

    Abstract Speakers are often disfluent, for example, saying "theee uh candle" instead of "the candle." Production data show that disfluencies occur more often during references to things that are discourse-new, rather than given. An eyetracking experiment shows that this correlation between disfluency and discourse status affects speech comprehension. Subjects viewed scenes containing four objects, including two cohort competitors (e.g., camel, candle), and followed spoken instructions to move the objects. The first instruction established one cohort as discourse-given; the other was discourse-new. The second instruction was either fluent or disfluent, and referred to either the given or new cohort. Fluent instructions led to more initial fixations on the given cohort object (replicating Dahan et al., 2002). By contrast, disfluent instructions resulted in more fixations on the new cohort. This shows that discourse-new information can be accessible under some circumstances. More generally, it suggests that disfluency affects core language comprehension processes.

    Keywords disfluency, information status, language processing, reference comprehension

  • Matthew P. Aylett, “Disfluency and speech recognition profile factors,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 51-54. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_051.pdf.

    Abstract This paper reports on work bringing together disfluency coding carried out by Lickley [1] and recognition work carried out as part of the ERF project (Bard, Thompson & Isard, [2]) at Edinburgh University. A set of factors are investigated which characterise the behaviour of the ASR during recognition based on an analysis of the resulting word lattice. These factors can be grouped as: Entropy Factors - the entropy of the acoustic and language model likelihoods, within the word lattice, over a 10 ms frame, and, Arc Factors - the number of non-unique and unique arcs in the word lattice in any given 10 ms time frame, together with the variance of start and end times of these arcs, and the number of arcs starting or ending in the frame. The values of all factors were used to train a simple CART model. The CART model was used to predict: recognition failure, interruption point location (the point where a disfluency begins), and whether the location was in a repair or a reparandum. The entropy of the language model values contributed most to the model's prediction of recognition failure, and whether a frame was in a repair or reparandum. In contrast, the number of unique word hypotheses contributed most to the successful prediction of a frame being close to an interruption point.

    Keywords DiSS

  • Karl G.D. Bailey, and Fernanda Ferreira, “Disfluencies affect the parsing of garden-path sentences,” Journal of Memory and Language, vol. 49, no. 2, 2003, pp. 183-200. DOI: 10.1016/S0749-596X(03)00027-5.

    Abstract Spontaneous speech differs in several ways from the sentences often studied in psycholinguistics experiments. One important difference is that naturally produced utterances often contain disfluencies. In this study, we examined how the presence of “uh” in a spoken sentence might affect processes that assign syntactic structure (i.e., parsing). Four experiments are reported. In the first, participants judged the grammaticality of sentences that had disfluencies either right before the head noun of the ambiguous phrase or after (e.g., Sandra bumped into the busboy and the uh uh waiter told her to be careful or Sandra bumped into the busboy and the waiter uh uh told her to be careful). Sentences in the latter condition were judged grammatical less often. This result was replicated in the second experiment, in which disfluencies were replaced with environmental sounds. These findings suggest that interruptions can affect syntactic parsing, and the content of the interruption need not be speechlike. In Experiments 3 and 4 we tested whether these effects occurred because listeners use interruptions as cues to help resolve a structural ambiguity. Results from these latter two grammaticality judgment tasks suggest that when an interruption occurs before an ambiguous noun phrase, comprehenders are more likely to assume that the noun phrase is the subject of a new clause rather than the object of an old one, and furthermore, it appears that the parser is relatively insensitive to the form of the interruption. We conclude that disfluencies can influence the parser by signaling a particular structure; at the same time, for the parser, a disfluency might be any interruption to the flow of speech.

  • Alan Bell, Daniel Jurafsky, Eric Fosler-Lussier, Cynthia Girand, and Daniel Gildea, “Effects of disfluencies, predictability, and utterance position on word form variation in English conversation,” Journal of the Acoustical Society of America, vol. 113, no. 2, February 2003, pp. 1001-1024. DOI: 10.1121/1.1534836.

    Abstract Function words, especially frequently occurring ones such as (the, that, and, and of), vary widely in pronunciation. Understanding this variation is essential both for cognitive modeling of lexical production and for computer speech recognition and synthesis. This study investigates which factors affect the forms of function words, especially whether they have a fuller pronunciation (e.g., ði, ðæt, ænd, ʌv) or a more reduced or lenited pronunciation (e.g., ðə, ðit, n, ə). It is based on over 8000 occurrences of the ten most frequent English function words in a 4-h sample from conversations from the Switchboard corpus. Ordinary linear and logistic regression models were used to examine variation in the length of the words, in the form of their vowel (basic, full, or reduced), and whether final obstruents were present or not. For all these measures, after controlling for segmental context, rate of speech, and other important factors, there are strong independent effects that made high-frequency monosyllabic function words more likely to be longer or have a fuller form (1) when neighboring disfluencies (such as filled pauses uh and um) indicate that the speaker was encountering problems in planning the utterance; (2) when the word is unexpected, i.e., less predictable in context; (3) when the word is either utterance initial or utterance final. Looking at the phenomenon in a different way, frequent function words are more likely to be shorter and to have less-full forms in fluent speech, in predictable positions or multiword collocations, and utterance internally. Also considered are other factors such as sex (women are more likely to use fuller forms, even after controlling for rate of speech, for example), and some of the differences among the ten function words in their response to the factors.

  • Ramona Benkenstein, and Adrian P. Simpson, “Phonetic correlates of self-repair involving word repetition in German spontaneous speech,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 81-84. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_081.pdf.

    Abstract A phonetic description of self-initiated self-repair sequences involving the repetition of words in German spontaneous speech is presented. Data are drawn from the Kiel Corpus of Spontaneous Speech. The description is primarily impressionistic auditory, but it also employs acoustic records to verify and objectify the impressionistic findings. A number of different patterns around cut-off are identified. The comparison of phonetic differences between reparandum and repair tokens is used to argue that repair sequences can also provide an interesting insight into the way in which fluent stretches of spontaneous speech are phonetically organized.

    Keywords DiSS

  • Martin Corley, and Robert Hartsuiker, “Hesitation in speech can... um... help a listener understand,” in Proceedings of the twenty-fifth meeting of the Cognitive Science Society, Erlbaum, August 2003, pp. 276-281. http://csjarchive.cogsci.rpi.edu/proceedings/2003/mac/prof70.html.

    Abstract This paper investigates the effect of disfluencies on listeners’ on-line processing of speech. More specifically, it tests the hypothesis that filled pauses like um, which tend to occur before words that are low in accessibility, act as a signal to the listener that a relatively inaccessible word is about to be produced. Two experiments are reported, in which participants followed recorded instructions to press buttons corresponding to images on a computer screen. In 50% of trials, the spoken name of the image was preceded by um. In experiment 1, the intrinsic accessibility of the target items was manipulated (by means of lexical frequency); in experiment 2, the extrinsic (visual) accessibility varied. Both experiments demonstrated that participants were quicker to respond when a target was preceded by um, regardless of whether the item referred to was difficult to access or not. In addition, in experiment 2 there was a weak interaction between accessibility and presence or absence of an um. We present the data here as early evidence that listeners can benefit from disfluencies in others’ speech, and outline some methodological and theoretical considerations and further experiments.

  • Yasuharu Den, “Some strategies in prolonging speech segments in spontaneous Japanese,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 87-90. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_087.pdf.

    Abstract In this paper, we investigate segmental prolongation in a corpus of spontaneous Japanese monologues consisting of over 700,000 words. We examine effects on the rate of prolongation of various factors including speech types, the genders of speakers, word classes, word positions in the phrase and in the inter-pausal unit, and the presence of preceding fillers. Based on the empirical findings, we state some strategies in prolonging speech segments used by Japanese speakers.

    Keywords DiSS

  • Sheena Finlayson, Victoria Forrest, Robin Lickley, and Janet Mackenzie Beck, “Effects of the restriction of hand gestures on disfluency,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 21-24. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_021.pdf.

    Abstract This paper describes an experimental pilot study of disfluency and gesture rates in spontaneous speech where speakers perform a communication task in three conditions: hands free, one arm immobilized, both arms immobilized. Previous work suggests that the restriction of the ability to gesture can have an impact on the fluency of speech. In particular, it has been found that the inability to produce iconic gestures, which depict actions and objects, results in a higher rate of disfluency. Models of speech production account for this by suggesting that gesture and speech production are part of the same integrated system. Such models differ in their interpretation of the location of the gesture planning mechanism in relation to the speech model: some authors suggest that iconic gestures relate closely to lexical access, while others suggest that the link is located around the conceptualization stage. The findings of this study tentatively confirm that there is a relationship between gesture and fluency - overall, disfluency increases as gesture is restricted. But it remains unclear whether the disfluency is more related to lexical access than to conceptualization. Proposals for a larger study are suggested. The work is of interest to psycholinguists focusing on the integration of gesture into models of speech production and to Speech and Language Therapists who need to know about the impact that an impaired ability to produce gestures may have on communication.

    Keywords DiSS

  • Kotaro Funakoshi, and Takenobu Tokunaga, “Evaluation of a robust parser for spoken Japanese,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 55-58. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_055.pdf.

    Abstract We implemented a parser designed to handle ill-formedness in Japanese speech. The parser was evaluated by utilizing newly collected speech data, which was obtained from an experiment designed to produce ill-formed data effectively. Introducing the proposed method increased the number of correctly analyzed utterances from 171 to 322, from among 532 utterances in the corpus.

    Keywords DiSS

  • Robert J. Hartsuiker, Martin Corley, Robin Lickley, and Melanie Russell, “Perception of disfluency in people who stutter and people who do not stutter: Results from magnitude estimation,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 35-37. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_035.pdf.

    Abstract Recent accounts of stuttering consider disfluencies the result of an interaction between speech planning and self-monitoring, emphasizing the continuity between errors made in everyday speech and those made by people who stutter. On Vasić & Wijnen’s account, the monitor is hypervigilant for upcoming problems and interrupts and restarts the speech signal, resulting in disfluent speech. Crucially, on this account, self-monitoring is a perceptual function. Therefore, this account makes two predictions: (1) people who stutter are also hypervigilant in perceiving another person’s speech; (2) the quality of disfluencies made by people who stutter and those who do not will be comparable. We tested these hypotheses using a magnitude estimation judgment task. Twenty participants who stutter and 20 controls were asked to rate the fluency of excerpted fluent and disfluent fragments from recorded dialogues, either between people who stutter or between non-stutterers. In line with the first hypothesis, people who stutter tended to rate all fragments as more disfluent than controls did. However, the second hypothesis was not confirmed: across judges, fluent and disfluent fragments excerpted from recordings of people who stutter were rated as less fluent than those excerpted from control dialogues, suggesting that there are perceptually relevant differences between the speech of PWS and PWDNS, independent of number and type of disfluencies.

    Keywords DiSS

  • Sandrine Henry, and Berthille Pallaud, “Word fragments and repeats in spontaneous spoken French,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 77-80. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_077.pdf.

    Abstract This paper presents the results of a study conducted on the interaction of two disfluencies: repeats and word fragments. It is based on 150 repeated word fragments (e.g., "on le re- re- revendique encore une fois") extracted from a one-million-word corpus of spoken French. Word fragments such as: "notre métier spé- spécifique", are, like repeats (e.g., "vous avez évalué le le montant des dégâts"), very frequent events in spoken language: on average, there is 1 word fragment every 50 seconds, 1 repeat every 17 seconds. Speakers and listeners alike are generally unaware of these phenomena as if they were not part of the communication process. They seldom trigger a metalinguistic reaction from the speaker and are even more rarely acknowledged by the listener. These phenomena have sometimes been interpreted as ’errors’ in the communication process, like slips of the tongue. Word fragments and repeats encompass different categories of phenomena, and this enables us to define them as a heterogeneous group ruled by different types of constraints and mechanisms. This analysis rests on the following criteria: structural aspects of the repeat, types of word fragments, morphological and syntactic aspects. Analyses of these repeated of identical word fragments from two different angles - that of the repeats and then that of the word fragments - confirm the relevance of the distinction between these two types of disfluencies.

    Keywords DiSS

  • Peter Howell, “Is a perceptual monitor needed to explain how speech errors are repaired?,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 31-34. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_031.pdf.

    Abstract Kolk & Postma [2] proposed, following Dell & O’Seaghdha [1], that when a speaker chooses a word, phonologically-related words as well as the intended word are activated. Initially, the activations of all these words are similar, though eventually the intended word reaches a higher asymptotic value when activation is complete [1]. According to Kolk & Postma [2], if a response is made in the phase where activation is building up (rather than at full activation), there is a higher chance of the competing, rather than the intended, word being selected (i.e. an error). They propose that a speaker detects such errors when they are produced overtly using the perceptual system, and a monitor in the linguistic system responds by interrupting and initiating the correction [2]. Word repetition and hesitation (not errors in themselves) have been regarded as signifying underlying errors that are detected and interrupted before speech is output in a similar way to overt errors. An assumption in [2] is that activation for a word stops (or, if it continues, is ignored) immediately a candidate word is selected. The brain processes responsible for speech production have massive parallel capacity. Consequently, activation for all the candidates for a word slot could continue beyond the point where a word is selected in cases where a word is responded to prematurely. When the selected word reaches asymptote, the relative activations of this and the other candidate words indicate when an error has occurred (when the selected word has a lower activation than one of the competing words), and what correction is appropriate (the word with the highest activation). This provides the basis for error detection and correction without the need for a perceptual monitor. Continuing the buildup of activation after a word has been selected implies that activation of nearby words in its phrase overlaps. It is shown, with some realistic assumptions about how activation builds up and decays across different words in a phrase, that this model predicts word repetition and hesitation and also part-word disfluencies (a characteristic of stuttering), again without the need for a perceptual monitor.

    Keywords DiSS

  • Kim Kirsner, John Dunn, and Kathryn Hird, “Fluency: Time for a Paradigm Shift,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 13-16. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_013.pdf.

    Abstract Pauses in spontaneous speaking constitute a rich source of data for several disciplines. They have been used to enhance automatic segmentation of speech, classification of patients with acquired communication disorders, the design of psycholinguistic models of speaking, and the analysis of psychological disorders. Unfortunately, however, although pause analysis has been with us for more than 40 years, their interpretation has been compromised by several problems [1]. The first problem is that the pause distribution is skewed, making mean duration a poor measure of central tendency. The second problem is that there are at least two components to the pause duration distribution, a problem that has been confounded by the fact that most authors have assumed that short pauses can be ignored. The third problem is that many scholars have used an arbitrary criterion to separate the pause components thereby adopting statistics that reflect errors of commission or omission. In this paper we review recent work that resolves each of these issues and illustrates the application of the new paradigm to a variety of problems. Our research indicates that, first, there are at least two pause duration distributions, each of which may be sensitive to theoretically interesting variables; second, the distributions are log-normal, thereby opening the way to appropriate measures of central tendency and dispersion, and, third, the distributions can be reliably separated by application of signal detection theory, and the proportion of misclassifications minimised and estimated. This paper reviews recent research using the new approach to pause analysis.

    Keywords DiSS

  • Koji Kitayama, Masataka Goto, Katunobu Itou, and Tetsunori Kobayashi, “Speech Spotter: New Speech Interface Capable of Invoking Speech Recognition Functions during Human-Human Conversation,” in Proceedings of Workshop on Interactive Systems and Software, 2003, pp. 9-18. http://www.wiss.org/WISS2003/program.html.

    Abstract In this paper, we propose a novel speech interface function, called "Speech Spotter", which enables a user to enter voice commands into a speech recognizer during natural human-human conversation. Only when a user utters a filled pause (a vowel-lengthening hesitation like "er...") and then utters a voice command with a high pitch, its voice command is accepted by the speech recognizer. Thus the Speech Spotter function makes full use of nonverbal information of human voice: a filled pause and the voice pitch of an utterance. By using the Speech Spotter function, we built two application systems: "on-demand human-human conversation support system" and "a telephone system with BGM-playback function". The results of using these systems showed that the Speech Spotter function is robust and convenient enough to be used in daily human-human conversation at a site or over a cellular phone.

  • Göran Kjellmer, “Hesitation. In Defence of ER and ERM,” English Studies, vol. 84, no. 2, 2003, pp. 170-198. DOI: 10.1076/enst.84.2.170.14903.

    Abstract Speech differs in a number of ways from writing. How great the differences are has only been fully realised when detailed comparisons were made possible by the publication of large corpora that were partly or wholly based on the spoken language. While the two media, speech and writing, necessarily have large sections in common, it is true to say that they often use widely differing means of conveying information. The means that are specific to speech were long either neglected or ignored by researchers, so that the description of individual languages was formerly based mainly on their written manifestations. One characteristic of speech is its frequent indication of hesitation or uncertainty. The means by which it is expressed range from nonlinguistic, such as gestures, facial expressions and bodily movements to linguistic, such as repetitions. Another linguistic hesitation marker is the pause, whether silent or filled. This feature can now be studied by means of modern corpora.

  • Torbjörn Lager, “In dialogue with a desktop calculator: A concurrent stream processing approach to building simple conversational agents,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 59-62. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_059.pdf.

    Abstract Human spontaneous face-to-face conversations are characterized by phenomena such as turn-taking, feedback, sounds of hesitation and repairs. A simple and highly modular stream-based approach to natural language processing is proposed that attempts to deal with such things. A basic version of the model has been implemented in the Oz programming language.

    Keywords DiSS

  • Piroska Lendvai, Antal van den Bosch, and Emiel Krahmer, “Memory-based disfluency chunking,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 63-66. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_063.pdf.

    Abstract We investigate the feasibility of machine learning in automatic detection of disfluencies in a large syntactically annotated corpus of spontaneous spoken Dutch. We define disfluencies as chunks that do not fit under the syntactic tree of a sentence (including fragmented words, laughter, self-corrections, repetitions, abandoned constituents, hesitations and filled pauses). We use a memory-based learning algorithm for detecting disfluent chunks, on the basis of a relatively small set of low-level features, keeping track of the local context of the focus word and of potential overlaps between words in this context. We use attenuation to deal with sparse data and show that this leads to a slight improvement of the results and more efficient experiments. We perform a search for the optimal settings of the learning algorithm, which yields an accuracy of 97% and an F-score of 80%. This is a significant improvement of the baselines and of the results obtained with the default settings of the learner.

    Keywords DiSS

  • Krisztina Menyhárt, “Age-dependent types and frequency of disfluencies,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 45-48. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_045.pdf.

    Abstract The age-dependent changes of one’s speech production from childhood up to old age are relatively well known. However, there has been less research conducted concerning the possible alterations of the disfluency phenomena in speakers’ spontaneous speech determined by age. Our hypothesis is that permanent changes are going on in the operation of speech production processes from early childhood up to old age, and that those changes can be studied via observing disfluency phenomena. A series of experiments has been carried out with the participation of altogether 30 Hungarian-speaking persons, children, middle-aged adults and old subjects (ages of 77). Their spontaneous speech was recorded and analyzed concerning the articulation and speech tempi, silent and filled pauses, as well as other disfluency phenomena (like false starts, repetitions, slips, etc.). The aim of the research is to explore the invariant and variable factors of the disfluencies depending on age. The results highlight also the individual differences that seem to be independent of the age factor.

    Keywords DiSS

  • Hannele Nicholson, Ellen Gurman Bard, Robin Lickley, Anne H. Anderson, Jim Mullin, David Kenicer, and Lucy Smallwood, “The intentionality of disfluency: Findings from feedback and timing,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 17-20. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_017.pdf.

    Abstract This paper addresses the causes of disfluency. Disfluency has been described as a strategic device for intentionally signalling to an interlocutor that the speaker is committed to an utterance under construction. It is also described as an automatic effect of cognitive burdens, particularly of managing speech production during other tasks. To assess these claims, we used a version of the map task and tested 24 normal adult subjects in a baseline untimed monologue condition against conditions adding either feedback in the form of an indication of a supposed listener’s gaze, or time-pressure, or both. Both feedback and time-pressure affected the nature of the speaker’s performance overall. Disfluency rate increased when feedback was available, as the strategic view predicts, but only deletion disfluencies showed a significant effect of this manipulation. Both the nature of the deletion disfluencies in the current task and of the information which the speaker would need to acquire in order to use them appropriately suggest ways of refining the strategic view of disfluency.

    Keywords DiSS

  • Sieb G. Nooteboom, “Self-monitoring is the main cause of lexical bias in phonological speech errors,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 27-30. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_027.pdf.

    Abstract In this paper I present new evidence, stemming both from an experiment and from spontaneous speech, demonstrating that (a) lexical bias is caused by self-monitoring of inner speech, as proposed by Levelt et al. [1], and (b) that there is phoneme-to-word feedback in the mental programming of speech, as supposed by Dell [2] and Stemberger [3]. It is argued here that possibly phoneme-to-word feedback is an unavoidable side-effect of self-monitoring of inner speech.

    Keywords DiSS

  • Caroline L. Rieger, “Disfluencies and hesitation strategies in oral L2 tests,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 41-44. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_041.pdf.

    Abstract This paper presents an investigation of hesitation strategies of intermediate learners of German as a second or foreign language (L2) when they take part in oral L2 tests. Previous studies of L2 hesitation strategies have focused on beginning and advanced L2 learners. They found that beginners tend to leave their hesitation pauses unfilled making their speech highly disfluent [17], while advanced L2 speakers - similar to native speakers - use a variety of fillers. In oral L2 tests, intermediate learners hesitate mainly for two reasons: to search for a German word or structure, or to think about the content of their utterance. Some participants use a variety of strategies to signal to the addressee that they are hesitating. This variety is not as rich as it is for advanced L2 learners or native speakers. Other participants leave their hesitation pauses unfilled or rely on quasi-lexical fillers to hold the floor when hesitating.

    Keywords DiSS

  • Guergana Savova, and Joan Bachenko, “Prosodic features of four types of disfluencies,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 91-94. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_091.pdf.

    Abstract We present a corpus-based approach for using intonation and duration to detect disfluency sites. The questions we aim to answer are: what are the prosodic cues for each disfluency type? Can predictive models be built to describe the relationship between disfluency types and prosodic cues? Are there correlations between the reparandum onset and offset and the repair onset and offset? Is there a general prosodic strategy? Our findings support four main hypotheses: 1) The Combination Rule: A single prosodic feature does not uniquely identify disfluencies or their types. Rather, it is a combination of several features that signals each type. 2) The Compensatory Rule: If there is an overlap of one prosodic feature, then another cue neutralizes the overlap. 3) The Discourse Type Rule: Prosodic cues for disfluencies vary according to discourse type. 4) The Expanded Reset Rule: Repair onsets are dependent on reparandum onsets and reparandum offsets. The limitation of the current study is the relatively small corpus size. Further testing of our proposed hypotheses is needed.

    Keywords DiSS

  • Shu-Chuan Tseng, “Repairs and repetitions in spontaneous Mandarin,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 73-76. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_073.pdf.

    Abstract 246 overt repairs, 653 complete repetitions and 475 partial repetitions were identified in an annotated corpus of spontaneous Mandarin conversations. On the basis of the data, this paper investigates Mandarin repairs and repetitions by segmenting them into the reparandum part, the editing part and the reparans part and by tagging them using the CKIP automatic word segmentation and tagging system. Results on the use of editing terms and on the distribution of parts of speech and syllables in the reparandum are presented. Semantic differences and similarities revealed by discrepancies between the tagging results of the reparandum and the reparans are also discussed.

    Keywords DiSS

  • Fan Yang, Peter A. Heeman, and Susan E. Strayer, “Acoustically verifying speech repair annotations,” in Disfluency in Spontaneous Speech (DiSS ’03) (Gothenburg Papers in Theoretical Linguistics), vol. 90, Göteborg, Sweden, September 2003, pp. 97-100. http://www.isca-speech.org/archive_open/archive_papers/diss_03/dis3_097.pdf.

    Abstract Identifying speech repairs is a critical part of annotating spontaneous speech. DialogueView is an annotation tool that provides visual and audio support for directly annotating speech repairs. In this paper, we report on the usability of clean play, a special feature implemented in DialogueView, which cuts out the annotated reparanda and editing terms and plays the remaining speech. We find that although clean play does not help users detect repairs, it does help them determine the extent of repairs. We also find that clean play improves users’ confidence because they have another way to verify their annotations.

    Keywords DiSS

2002

  • Jennifer Arnold, Maria Fagnano, and Michael K. Tanenhaus, “Disfluencies signal theee, um, new stuff: Immediate use of disfluencies during reference comprehension,” in 15th CUNY Conference on Human Sentence Processing, New York, NY, 2002. http://qcpages.qc.cuny.edu/~efernand/CUNY2002/program/absts/021.htm.

    Abstract Spontaneous speech is rarely fluent, resulting in hesitations, fillers ("um" / "uh"), repeated or repaired words, or pronouncing "the" as /thiy/ (Fox Tree & Clark, 1997). Yet these features are generally considered not to affect the core processes of language comprehension. While disfluencies have been argued to signal that the speaker is having difficulty (Clark & Wasow, 1998; Fox Tree & Clark, 1997), this metalinguistic knowledge has not been linked to specific language comprehension phenomena. A corpus analysis showed that speakers are disfluent more often when referring to entities that are new (rather than given) in the discourse. If listeners are sensitive to this correlation, disfluencies at the start of a noun phrase should lead them to focus on objects that are visible but have not yet been mentioned. Eye movements of 24 native speakers of English were recorded as they listened to pairs of instructions to move objects on a computer screen (Table 1). Each display contained 4 colored pictures (Rossion & Pourtois, 2001), including two cohorts (e.g., camel/candle). We investigated the time course of referent identification for the first noun in the second instruction, manipulating whether: 1) the critical NP was fluent (the camel) or disfluent (thiy, uh, camel), and 2) the referent was discourse-new, or was given but unfocused, having just been mentioned as the goal of the first instruction. All NPs were accented. Disfluent NPs should lead to faster target looks in the new condition, and increased cohort competition in the given condition. By contrast, fluent, accented NPs provide an initial bias toward the given but nonfocused object (Dahan et al., in press), so we expected fluent NPs to lead to faster target looks in the given condition and more cohort competition in the new condition. Results showed precisely this interaction, beginning 200 msec after the onset of the head noun ("ca-"). Prior to the noun, there was also a preference for new objects in the disfluent condition and given objects in the fluent condition, emerging 200 msec after the determiner (the/thiy), which provided the first information about fluency. Thus, comprehenders immediately use information provided by disfluencies. This may stem from use of purely distributional information about disfluencies and discourse status, or may result from inferring that the speaker is having difficulty in lexical retrieval (which would be less likely for a just-mentioned referent). Regardless, information about fluency affects the earliest moments of reference resolution. Table 1: Sample instructions (target NP: candle). Given (discourse-old) context: Put the grapes below the candle. Discourse-new context: Put the grapes below the camel. (a) Fluent (accented): Now put the candle below the salt shaker. (b) Disfluent: Now put thiy, uh, CANDLE below the salt shaker.

  • Thomas Berg, “Slips of the typewriter key,” Applied Psycholinguistics, vol. 23, no. 2, 2002, pp. 185-207. DOI: 10.1017/s0142716402002023. http://journals.cambridge.org/article_S0142716402002023.

    Abstract This article presents an analysis of 500 submorphemic slips of the typewriter key that escaped the notice of authors and other proofreaders and thereby made their way into the published records of scientific research. Despite this high selectivity, the corpus is not found to differ in major ways from other collections of keying slips. The main characteristics of this error type include a predominance of within-word slips, an elevated rate of noncontextual slips, a heightened incidence of omissions (in particular, masking errors), a high number of adjacent switches, and an uncommonness of these slips in word edges. In all these respects, slips of the key resemble slips of the pen, although not slips of the tongue. It is argued that speech errors are shaped by a fully deployed structural representation, whereas key slips arise under the influence of a weak structural representation. By implication, speaking is characterized by a hierarchical strategy of activation while typewriting is subject to the so-called staircase strategy of serialization in which activation is a function of linear distance. These disparate strategies may be understood as a response of the processing system to disparate requirements, such as varying speed of execution.

  • Herbert Clark, and Jean E. Fox Tree, “Using uh and um in spontaneous speaking,” Cognition, vol. 84, no. 1, May 2002, pp. 73-111. DOI: 10.1016/S0010-0277(02)00017-3.

    Abstract The proposal examined here is that speakers use uh and um to announce that they are initiating what they expect to be a minor (uh), or major (um), delay in speaking. Speakers can use these announcements in turn to implicate, for example, that they are searching for a word, are deciding what to say next, want to keep the floor, or want to cede the floor. Evidence for the proposal comes from several large corpora of spontaneous speech. The evidence shows that speakers monitor their speech plans for upcoming delays worthy of comment. When they discover such a delay, they formulate where and how to suspend speaking, which item to produce (uh or um), whether to attach it as a clitic onto the previous word (as in "and-uh"), and whether to prolong it. The argument is that uh and um are conventional English words, and speakers plan for, formulate, and produce them just as they would any word.

    Keywords conversation, Dialogue, disfluencies, Language production, spontaneous speech, uh, um

  • Catia Cucchiarini, Helmer Strik, and Lou Boves, “Quantitative assessment of second language learners’ fluency: Comparisons between read and spontaneous speech,” Journal of the Acoustical Society of America, vol. 111, no. 6, June 2002, pp. 2862-2873. DOI: 10.1121/1.1471894.

    Abstract This paper describes two experiments aimed at exploring the relationship between objective properties of speech and perceived fluency in read and spontaneous speech. The aim is to determine whether such quantitative measures can be used to develop objective fluency tests. Fragments of read speech (Experiment 1) of 60 non-native speakers of Dutch and of spontaneous speech (Experiment 2) of another group of 57 non-native speakers of Dutch were scored for fluency by human raters and were analyzed by means of a continuous speech recognizer to calculate a number of objective measures of speech quality known to be related to perceived fluency. The results show that the objective measures investigated in this study can be employed to predict fluency ratings, but the predictive power of such measures is stronger for read speech than for spontaneous speech. Moreover, the adequacy of the variables to be employed appears to be dependent on the specific type of speech material investigated and the specific task performed by the speaker.

  • Jean E. Fox Tree, “Interpreting pauses and ums at turn exchanges,” Discourse Processes, vol. 34, no. 1, 2002, pp. 37-55. DOI: 10.1207/S15326950DP3401_2.

    Abstract In 3 experiments, this article compares how overhearers interpreted second speakers’ contributions to a conversation depending on whether the second speaker responded to a first speaker immediately; paused and responded; said um and responded; or said um, paused, and then responded. The conversational snippets tested were unscripted and diverse; an example of one exchange is, "Are you here because of affirmative action?" (pause, um, or both) "It helped me out a little bit." Overhearers thought speakers had more production difficulty, were less honest, and were less comfortable with topics under discussion when speakers either said um or paused, and even more so with both. The best explanation for the data is that overhearers are judging, for each question asked, what it means for speakers to produce an anticipated or an unanticipated delay.

  • Yoko Kato Nakai, “Topic Shifting Devices Used by Supporting Participants in Native/Native and Native/Non-Native Japanese Conversations,” Japanese Language and Literature, vol. 36, no. 1, April 2002, pp. 1-25. DOI: 10.2307/3250876.

    Abstract In this paper, I analyzed differences in the devices used by native and nonnative supporting participants in topic openings and closings in Japanese face-to-face conversations. My analysis builds on previous research on conversational units and topic-shifting devices in Japanese conversations (Hayashi 1960; Minami 1972, 1983, 1993; Ichikawa 1978; Sugito and Sawaki 1979; Noda 1981, 1990; Ikuta 1983; Sugito 1983, 1987; Jorden with Noda 1987; Sakuma 1987, 1990, 1992; Szatrowski 1986a, 1986b, 1987, 1991, 1993, 1997, 1998; Imaishi 1992; Sakuma and Suzuki 1993; Suzuki 1994, 1995; Karatsu 1995; Emmett 1996, 1998; Okada 1996; Sasaki 1996, 1998; Kato 1999), analyses of topic-shifting devices in English conversations (Garfinkel and Sacks 1970; Reichman 1978; Derber 1979; Goodwin 1981; Long 1981; Levinson 1983; Chafe 1987; Goodwin and Goodwin 1992; Sacks 1992; Geluykens 1993), and contrastive analyses of topic-shifting strategies in English and Japanese conversation (Maynard 1989; Yamada 1992; Watanabe 1993). I demonstrate that the non-native supporting participants in my data used fewer devices such as discourse developing connectives (e.g., demo ‘but’, ja ‘so [in that case]’, etc.) and the extended predicate (Jorden with Noda 1987) to indicate the relation of their utterances to the context in topic openings than Japanese native supporting participants did. Non-native supporting participants also tended to use more aizuchi ‘backchannel utterances’ in topic closings than did native supporting participants, who combined aizuchi with a variety of other devices such as fragments, assessments, summary utterances, direct style, final particles, prolonged vowels, overlap, repetition, and co-construction.

  • Miguel Oliveira, “The Role of Pause Occurrence and Pause Duration in the Signaling of Narrative Structure,” in PorTAL ’02 Proceedings of the Third International Conference on Advances in Natural Language Processing, Springer-Verlag, 2002, pp. 43-52. http://dl.acm.org/citation.cfm?id=646963.712274.

    Abstract This paper addresses the prosodic feature of pause and its distribution in spontaneous narrative in relation to the role it plays in signaling narrative structure. Pause duration and pause occurrence were taken as variables for the present analysis. The results indicate that both variables consistently mark narrative section boundaries, thus suggesting that pause is a very important structuring device in oral narratives.

  • Michiko Watanabe, “Fillers as Indicators of Discourse Segment Boundaries in Japanese Monologues,” in Proceedings of Speech Prosody 2002, 2002. http://aune.lpl.univ-aix.fr/sp2002/papers.htm.

    Abstract We investigated the distribution of fillers (filled pauses) in the vicinity of boundaries of different strengths in Japanese monologues, to understand whether fillers may convey information about the location and the strength of boundaries. Consistent with the results of studies on Dutch monologues, fillers tend to increase in frequency as boundary strength grows. It was also found that fillers tend to occur phrase-initially, and more so at deeper boundaries than at shallower ones. Regarding filler types, the frequency of eto grows most sharply as boundary strength increases, and that of e grows to a lesser degree. These findings indicate that the occurrence of fillers, particularly phrase-initial eto and e, provides contributory evidence for discourse boundaries.

2001

  • Laura Abou-Haidar, “Pauses in speech by French speakers with Down Syndrome,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 33-36. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_033.pdf.

    Abstract A better understanding of the control mechanisms of speech in verbal interaction is very important for the evaluation of the pragmatic competence of a mentally deficient speaker. This study focuses on pauses in the oral production of a speaker with Down syndrome involved in a conversation: it brings to light the temporal compensation mechanisms that allow the speaker to overcome distortions at the segmental level. It confirms the important role of prosody in the success of a conversation, particularly with a speaker who has a handicap that disrupts language structure. Down syndrome is a condition characterised by an overall delay in cognitive, social, linguistic and motor development. At the oral production level, it leads to deficits in segmental and supra-segmental speech patterning. The goal of this study is to help answer the following question: is the pragmatic function of language preserved in spite of significant distortions of the motor functions of the phonatory organs? Describing how a speaker with Down syndrome manages pauses in conversation helps to clarify this question, while taking into account the various functions that pauses serve beyond respiration: their role in encoding, in the delimitation of syntactic boundaries, and in the regulation of speaking turns, among others. This study allowed us to define criteria that make it possible to characterise the oral production of a speaker with Down syndrome. These criteria relate to variation in the frequency and length of pauses. The results obtained are the following: 1. a high frequency of occurrence of pauses in the production of the trisomic speaker; 2. the occurrence of "mixed pauses", most of which are very long, revealing a lack of ease and disfluency at the production level; 3. significant recourse to false starts, hesitation, repetition and lengthening to mark sound pauses; 4. a considerable number of very long pauses; 5. a relatively high number of pauses located at the boundaries of or within syntagms, with intra-syntagmatic pauses being rather long. We also noted that long phonic sequences were rare for the speaker with Down syndrome, seldom exceeding 2000 ms. In spite of these results, it is important to note that we have defined parameters showing that the speaker with Down syndrome has integrated the rules governing the management of pauses in verbal interaction.

    Keywords DiSS

  • Karl G.D. Bailey, and Fernanda Ferreira, “Do non-word disfluencies affect syntactic parsing?,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 61-64. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_061.pdf.

    Abstract Although disfluencies such as uh are generally not treated as linguistic items, our results suggest that they can affect syntactic parsing. Using a grammaticality judgment task, we demonstrate that disfluencies are able to affect the syntactic parse of a sentence in two ways. First, disfluencies can make syntactic reanalysis more difficult by coming between an ambiguous constituent and a disambiguating item. Second, the pattern of disfluencies in spontaneous speech may be used by the listener to guide the parse of a sentence. Thus, although disfluencies have often been viewed as pragmatic phenomena, they can affect language comprehension by influencing its parsing procedures.

    Keywords DiSS

  • Ellen G. Bard, Robin J. Lickley, and Matthew P. Aylett, “Is disfluency just difficulty?,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 97-100. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_097.pdf.

    Abstract The question addressed by this paper is whether disfluency resembles Inter-Move Interval, a measure of reaction time in conversation, in displaying effects of the overall difficulty of conducting a coherent conversation. Five sources of difficulty are considered as potential causes of disfluency: planning and producing an utterance, comprehending the prior utterance, performing a communicative task, order effects, and interpersonal factors. A multiple regression analysis on simple disfluencies in the HCRC Map Task Corpus shows that planning and production make the major independent contribution to predicting the rate of disfluencies, with interpersonal variables and position in dialogue also contributing significantly. Notably, comprehension variables did not affect either the total rate of disfluency or the rate of individual kinds of disfluencies.

    Keywords DiSS

  • Heather Bortfeld, Silvia Leon, Jonathan Bloom, Michael Schober, and Susan Brennan, “Disfluency Rates in Conversation: Effects of Age, Relationship, Topic, Role, and Gender,” Language and Speech, vol. 44, no. 2, 2001, pp. 123-147. http://openurl.ingenta.com/content?genre=article&issn=0023-8309&volume=44&issue=2&spage=123&epage=147.

    Abstract After reviewing situational and demographic factors that have been argued to affect speakers’ disfluency rates, we examined disfluency rates in a corpus of task-oriented conversations (Schober & Carstensen, 2001) with variables that might affect fluency rates. These factors included: speakers’ ages (young, middle-aged, and older), task roles (director vs. matcher in a referential communication task), difficulty of topic domain (abstract geometric figures vs. photographs of children), relationships between speakers (married vs. strangers), and gender (each pair consisted of a man and a woman). Older speakers produced only slightly higher disfluency rates than young and middle-aged speakers. Overall, disfluency rates were higher both when speakers acted as directors and when they discussed abstract figures, confirming that disfluencies are associated with an increase in planning difficulty. However, fillers (such as uh) were distributed somewhat differently than repeats or restarts, supporting the idea that fillers may be a resource for or a consequence of interpersonal coordination.

    Keywords communication, conversation, disfluency, speech planning, spontaneous speech

  • Susan Brennan, and Michael Schober, “How Listeners Compensate for Disfluencies in Spontaneous Speech,” Journal of Memory and Language, vol. 44, no. 2, 2001, pp. 274-296. DOI: 10.1006/jmla.2000.2753.

    Abstract Listeners often encounter disfluencies (like uhs and repairs) in spontaneous speech. How is comprehension affected? In four experiments, listeners followed fluent and disfluent instructions to select an object on a graphical display. Disfluent instructions included mid-word interruptions (Move to the yel- purple square), mid-word interruptions with fillers (Move to the yel- uh, purple square), and between-word interruptions (Move to the yellow- purple square). Relative to the target color word, listeners selected the target object more quickly, and no less accurately, after hearing mid-word interruptions with fillers than after hearing comparable fluent utterances as well as utterances that replaced disfluencies with pauses of equal length. Hearing less misleading information before the interruption site led listeners to make fewer errors, and fillers allowed for more time after the interruption for listeners to cancel misleading information. The information available in disfluencies can help listeners compensate for disruptions and delays in spontaneous utterances.

    Keywords comprehension, disfluencies, fillers, paralinguistic cues, parsing, pauses, repairs, spontaneous speech

  • Jeanne-Marie Debaisieux, and José Deulofeu, “Grammatically unacceptable utterances are communicatively accepted by native speakers, why are they?,” in Disfluency in Spontaneous Speech (DiSS ’01), Edinburgh, Scotland, August 2001, pp. 69-72. http://www.isca-speech.org/archive_open/archive_papers/diss_01/dis1_069.pdf.

    Abstract This paper aims at redefining the generally accepted notion of the unfinished or elliptic sentence, which appears to be crucial in defining, in turn, the notion of fluency itself. It will be shown that a large proportion of the utterances that a conventionally trained linguist would consider unacceptable, and as revealing some kind of disfluency on the part of the speaker who produced them, are in fact fully accepted by the participants in an ordinary verbal interaction. This apparent contradiction will be explained by the fact that linguists base their judgments of well-formedness on the grammatical structure of utterances, whereas speakers interact basically by means of communicative units, which are not necessarily made up of grammatically well-formed parts.