FPRC — The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015)

Malte Belz and Uwe Reichel, “Pitch Characteristics of Filled Pauses,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract We investigate the pitch characteristics of filled pauses in order to distinguish between hesitational and floor-holding functions of filled pauses. A corpus of spontaneous dialogues is explored using a parametric bottom-up approach to extract intonation contours. We find that subjects tend to utter filled pauses more prominently when they cannot see each other, which indicates an increased floor-holding usage of filled pauses in this condition.

Keywords DiSS, disfluencies, filled pauses, intonation, floor-holding

Hans Rutger Bosker, Jade Tjiong, Hugo Quené, Ted Sanders, and Nivja de Jong, “Both native and non-native disfluencies trigger listeners’ attention,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Disfluencies, such as uh and uhm, are known to help the listener in speech comprehension. For instance, disfluencies may elicit prediction of less accessible referents and may trigger listeners’ attention to the following word. However, recent work suggests differential processing of disfluencies in native and non-native speech. The current study investigated whether the beneficial effects of disfluencies on listeners’ attention are modulated by the (non-)native identity of the speaker. Using the Change Detection Paradigm, we investigated listeners’ recall accuracy for words presented in disfluent and fluent contexts, in native and non-native speech. We observed beneficial effects of both native and non-native disfluencies on listeners’ recall accuracy, suggesting that native and non-native disfluencies trigger listeners’ attention in a similar fashion.

Keywords DiSS, disfluencies, attention, non-native speech, Change Detection Paradigm

Rasmus Dall, Mirjam Wester, and Martin Corley, “Disfluencies in change detection in natural, vocoded and synthetic speech,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract In this paper, we investigate the effect of filled pauses, a discourse marker and silent pauses in a change detection experiment using natural, vocoded and synthetic speech. In natural speech, change detection has been found to increase in the presence of filled pauses; we extend this work by replicating earlier findings and exploring the effect of a discourse marker, like, and of silent pauses. Furthermore, we report how the use of "unnatural" speech, namely synthetic and vocoded speech, affects change detection rates. We found that filled pauses, the discourse marker and silent pauses all increase change detection rates in natural speech; however, this effect appeared in neither synthetic nor vocoded speech. Rather, change detection rates decreased in both types of "unnatural" speech compared to natural speech. The results for natural speech suggest that while each type of pause increases detection rates, the type of pause may have a further effect. The results for "unnatural" speech suggest that it is not the full synthesis pipeline that causes the degradation, but rather something in the pre-processing of the speech database, i.e. vocoding, that limits the resulting synthesis.

Keywords DiSS, change detection, filled pauses, speech synthesis

Stephanie Don and Robin Lickley, “Uh I forgot what I was going to say: How memory affects fluency,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Disfluency rates vary considerably between individuals. Previous studies have considered gender, age and conversational roles amongst other factors that may affect fluency. Testing a nonclinical, gender-balanced population of young adults performing the same speaking tasks, this study explores how inter-speaker variations in working memory and in long-term (lexical) memory affect disfluency in two different ways. Working memory was tested by a forward digit span test; long-term lexical memory was tested by the Verbal Fluency Test, in both its semantic and phonological versions. In addition, each participant provided three one-minute samples of monologue speech. The speech samples were analysed for disfluencies. Speakers with lower working memory scores produced more error repairs in running speech. Speakers with lower lexical access scores produced a higher rate of hesitations. The two types of memory affected fluency in different ways.

Keywords DiSS, hesitation, error repair, working memory, long term lexical memory

Robert Eklund, Peter Fransson, and Martin Ingvar, “Neural correlates of the processing of unfilled and filled pauses,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Spontaneously produced Unfilled Pauses (UPs) and Filled Pauses (FPs) were played to subjects in an fMRI experiment. While both types of stimuli resulted in increased activity in the Primary Auditory Cortex, FPs, unlike UPs, also elicited modulation in the Supplementary Motor Area, Brodmann Area 6. This observation provides neurocognitive confirmation of the oft-reported difference between FPs and other kinds of speech disfluency and could also provide a partial explanation for the previously reported beneficial effect of FPs on reaction times in speech perception. The results are discussed in the light of the suggested role of FPs as floor-holding devices in human polylogs.

Keywords DiSS, speech disfluency, filled pauses, unfilled pauses, speech perception, spontaneous speech, fMRI, Auditory Cortex, PAC, Supplementary Motor Area, SMA, Brodmann Area 6, BA6

Lorenzo García-Amaya, “A longitudinal study of filled pauses and silent pauses in second language speech,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract This study provides a longitudinal analysis of speech rate and the use of filled pauses (FPs) and unfilled or silent pauses (SPs) in the oral production of L2 learners of Spanish in two learning contexts: a 6-week intensive overseas immersion program (OIM), and a 15-week US-based ‘at-home’ foreign language classroom (AH). Fifty-six native speakers of English performed two video-retell tasks at three different time points. A total of five measurements of oral production were calculated. The results show a significant increase in rate of speech over time in the OIM group compared to the AH group. Additionally, the OIM learners show greater use of “disfluencies” over time, namely FPs and short SPs. We suggest that OIM learners increase their use of hesitation phenomena over time as a speech processing and planning strategy, and we discuss this finding within the framework of L2 cognitive fluency.

Keywords DiSS, second language fluency, disfluencies, rate of speech, filled pauses, silent pauses, study abroad, Spanish

Emer Gilmartin, Carl Vogel, and Nick Campbell, “Disfluency in multiparty social talk,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Much research on disfluencies in spontaneous spoken interaction has been carried out on corpora of task-based conversations, resulting in greater understanding of the role of several phenomena. Modern multimodal corpora allow the full spectrum of signals in face-to-face communication to be analysed. However, the ‘unmarked’ case of casual conversation or social talk with no obvious short-term instrumental goal has been less studied in this manner. Corpus-based work on social talk tends to deal with short dyadic interactions, although the norm for social conversation is longer multiparty interaction. In this paper, we outline our programme of exploratory studies of disfluency in a longer multiparty conversation. We briefly describe the background to our research goals, and then report on the collection, transcription, and annotation of the data for our experiments. We present and discuss some of our early results.

Keywords DiSS, disfluency, hesitation, repair, casual conversation, spoken interaction

Iulia Grosman, “Complexity cues or attention triggers? Repetitions and editing terms for native speakers of French,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract A growing body of research shows evidence of the metalinguistic information that disfluencies (silent and filled pauses, repetitions, false starts, repairs, etc.) can convey to listeners. As a result, disfluencies may work as fluent devices. By means of decision-task latencies, this study investigates whether lexical repetition co-occurring with an editing term affects the perception of native speakers of French. The literature lacks consensus: do disfluencies trigger conceptual priming of complex entities, or do they simply act as attention cues? Results from multiple analyses of variance and linear mixed-effects modelling show that the presence of a disfluency triggers a faster response from participants, however complex the following noun phrase might be, supporting the hypothesis that repetitions and co-occurring editing terms act as cognitive signposts rather than as cues to the complexity of an upcoming event.

Keywords DiSS, disfluencies, reaction time, perception, prosody, repetitions, French

Sandra Götz, “Fluency in ENL, ESL and EFL: A corpus-based approach,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Against the background of a ‘cline model’ of increasing fluency/decreasing disfluency from ENL to ESL to EFL forms of English, the present pilot study investigates (dis)fluency features in British English, Sri Lankan English and German Learner English. The analysis of selected variables of temporal fluency (viz. unfilled pauses, mean length of runs) and fluency-enhancement strategies (viz. discourse markers, smallwords and repeats) is based on the c. 40,000-word subcorpora of the British and the Sri Lankan components of the International Corpus of English (ICE-GB and ICE-SL) and the c. 80,000-word German component of the Louvain International Database of Spoken English Interlanguage (LINDSEI-GE). The study reveals that, while the EFL variant shows the lowest degree of temporal fluency (e.g. the highest number of unfilled pauses), the findings are mixed for ESL and ENL (e.g. the ESL speakers show a lower number of unfilled pauses, but the ENL speakers show a higher number of smallwords). Also, variant-specific preferences of using certain fluency-enhancement strategies become clearly visible.

Keywords DiSS, ENL vs. ESL vs. EFL, fluency, corpus-based (dis)fluency, fluency profiles

Zara Harmon and Vsevolod Kapatsinski, “Studying the dynamics of lexical access using disfluencies,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Faced with planning problems related to lexical access, speakers take advantage of a major function of disfluencies: buying time. It is reasonable, then, to expect that the structure of disfluencies sheds light on the mechanisms underlying lexical access. Using data from the Switchboard Corpus, we investigated the effect of semantic competition during lexical access on repetition disfluencies. We hypothesized that the more time the speaker needs to access the following unit, the longer the repetition. We examined the repetitions preceding verbs and nouns and tested predictors influencing the accessibility of these items. Results suggest that speed of lexical access negatively correlates with the length of repetition and that the main determinants of lexical access speed differ for verbs and nouns. Longer disfluencies before verbs appear to be due to significant paradigmatic competition from semantically similar verbs. For nouns, they occur when the noun is relatively unpredictable given the preceding context.

Keywords DiSS, repetition, lexical access, semantic competition, sentence planning, lexicalization

Clara Hedenqvist, Frida Persson, and Robert Eklund, “Disfluency incidence in 6-year old Swedish boys and girls with typical language development,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract This paper reports the prevalence of disfluencies in a group of 55 (25F/30M) Swedish children with typical speech development, aged between 6;0 and 6;11. All children had Swedish as their mother tongue. Speech was elicited using an “event picture”, which the children described in their own spontaneously produced words. The data were analysed with regard to sex differences and lexical ability, including vocabulary size and word retrieval, which were assessed using the Peabody Picture Vocabulary Test and Ordracet. Results showed that girls produced significantly more unfilled pauses, prolongations and sound repetitions, while boys produced more word repetitions. However, no correlation with lexical development was found. The results are of interest to speech pathologists who study early speech development in search of potential early predictors of speech pathologies.

Keywords DiSS, speech disfluency, children, lexical development, sex differences

Julian Hough, Laura de Ruiter, Simon Betz, and David Schlangen, “Disfluency and laughter annotation agreement in a light-weight dialogue mark-up protocol,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Despite a great deal of research effort, disfluency and laughter annotation is still an unsolved problem, both in terms of consensus on a generally applicable system and in terms of annotation agreement metrics. In this paper, we present a new annotation scheme within a light-weight mark-up for spontaneous speech. We show that, despite the low overhead required for understanding the annotation protocol, it allows for good inter-annotator agreement and can be mapped onto existing disfluency categorizations with no loss of information.

Keywords DiSS, disfluency annotation, laughter, German corpora, inter-annotator agreement, spontaneous speech

Peter Howell, “Intervention for children with word-finding difficulty: Impact on fluency during spontaneous speech for children using English as their native or as an additional language,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Types of intervention that could be targeted when there are high rates of word-finding difficulty were examined for any impact they had on speech fluency (whole-word repetition rate in particular). Results are reported that are interpreted as showing that a semantic-based intervention has an impact on fluency as well as word-finding.

Keywords DiSS, EAL, word-finding, stuttering, intervention

Hanae Koiso and Yasuharu Den, “Causal analysis of acoustic and linguistic factors related to speech planning in Japanese monologs,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract In this paper, we applied a general method of testing path models to investigate the causal relationships between cognitive load in speech planning and four types of disfluency in Japanese monologs. The four disfluencies examined were i) clause-initial fillers, ii) inter-clausal pauses, iii) clause-final lengthening, and iv) boundary pitch movements, which occurred at weak clause boundaries. The length of the constituents following weak clause boundaries was taken as a measure of the complexity affecting cognitive load. Using a model selection technique based on the Akaike information criterion (AIC), we found an optimal model with the smallest AIC, in which constituent complexity had direct effects on all four disfluency variables. In addition, some of the disfluencies influenced one another: clause-final lengthening was enhanced by the presence of a boundary pitch movement, and the occurrence of clause-initial fillers was affected by all three of the other disfluency variables.

Keywords DiSS, path models, fillers, pauses, clause-final lengthening, boundary pitch movements

Kikuo Maekawa and Hiroki Mori, “Voice quality analysis of Japanese filled pauses: a preliminary report,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Using the Core of the Corpus of Spontaneous Japanese, acoustic analysis of F1, spectral tilt (TL), H1-H2, jitter and F0 was conducted to examine the voice-quality difference between the vowels in filled pauses and those in ordinary lexical items. A simple SVM analysis showed that the two classes of vowels could be discriminated with a mean accuracy higher than 70%.

Keywords DiSS

Helena Moniz, Jaime Ferreira, Fernando Batista, and Isabel Trancoso, “Disfluency detection across domains,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract This paper focuses on disfluency detection across distinct domains using a large set of openSMILE features derived from the Interspeech 2013 Paralinguistic Challenge. Among the machine learning methods applied, SVMs achieved the best performance. Feature selection experiments revealed that the dimensionality of the larger feature set can be further reduced at the cost of a small degradation. Models trained on one corpus were tested on the other corpus, revealing that models can be quite robust across corpora for this task, despite their distinct nature. We have conducted additional experiments aiming at disfluency prediction in the context of interactive voice response (IVR) systems, and the results reveal no substantial degradation in performance, encouraging the use of the models in IVR domains.

Keywords DiSS, disfluency detection, acoustic-prosodic features, cross-domain analysis, European Portuguese

Ralph Rose, “Um and uh as differential delay markers: the role of contextual factors,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract The English filled pauses uh and um have been argued to correspond respectively to shorter and longer anticipated delays in speech production. This study looks at some contextual factors that might cause this difference by investigating filled pause instances in monologue and conversational speech corpora. Results are consistent with previously observed delay differences and further show that discourse-level processing may influence differential delay marking, though monologue results are more conclusive than conversation results. However, no evidence was found that lexical factors (word type, frequency) correlate with filled pause choice. The findings suggest a limited view of how speakers use filled pauses as delay markers: not all contextual factors may trigger differential delay marking.

Keywords DiSS, filled pause, delay, contextual factors

Vered Silber-Varod, Adva Weiss, and Noam Amir, “Can you hear these mid-front vowels? Formants analysis of hesitation disfluencies in spontaneous Hebrew,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract This study attempts to characterize the timbre of the default type of hesitation disfluency (HD) in Israeli Hebrew: the mid-front vowel /e/. For this purpose, we analysed the frequencies of the first three formants, F1, F2, and F3, of hundreds of HD pronunciations taken from The Corpus of Spoken Israeli Hebrew (COSIH). We also compared the formant values with those reported in two earlier studies of the vowel /e/ in fluent speech. The findings show that, in general, elongated word-final syllables and appended [e]s are pronounced with the same degree of openness as fluent [e], while filled pauses tend to be more open (lower F1) and more frontal (higher F2). Following these results, we suggest using a different set of IPA symbols, rather than the phonemic mid-front /e/, to better represent hesitation disfluencies.

Keywords DiSS, hesitation disfluency, filled pauses, LPC analysis, formants, spontaneous speech, Hebrew

Jozsef Szakos and Ulrike Glavitsch, “Investigating disfluency in recordings of last speakers of endangered Austronesian languages in Taiwan,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Nearly three decades of Formosan language documentation have produced hundreds of hours of recorded speech. In this paper, we show how using SpeechIndexer to transcribe and index the data makes the problem of disfluency in the spontaneous narratives and dialogues visible. The semiautomatic alignment of speech and transcription has to be adjusted manually whenever unpredictable pauses occur that are disfluencies rather than markers of phrasal units. We illustrate how combining SpeechIndexer’s pause finder with pitch measurements can help distinguish phrasal boundaries from disfluent pauses.

Keywords DiSS, Austronesian, lesser-documented unwritten language, SpeechIndexer, pause finder

Leimin Tian, Catherine Lai, and Johanna Moore, “Recognising emotions in dialogues with disfluencies and non-verbal vocalisations,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract We investigate the usefulness of DISfluencies and Non-verbal Vocalisations (DIS-NV) for recognising human emotions in dialogues. The proposed features measure filled pauses, fillers, stutters, laughter, and breath in utterances. The predictiveness of DIS-NV features is compared with that of lexical features and state-of-the-art low-level acoustic features. Our experimental results show that DIS-NV features alone are less predictive than lexical or acoustic features. However, adding them to the lexical or acoustic feature set yields an improvement over using lexical or acoustic features alone. This indicates that disfluencies and non-verbal vocalisations provide useful information for emotion recognition that the other two types of features overlook.

Keywords DiSS, emotion recognition, dialogue, disfluency, speech processing, HCI

Marcus Tomalin, Mirjam Wester, Rasmus Dall, Bill Byrne, and Simon King, “A lattice-based approach to automatic filled pause insertion,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract This paper describes a novel method for automatically inserting filled pauses (e.g., UM) into fluent texts. Although filled pauses are known to serve a wide range of psychological and structural functions in conversational speech, they have not traditionally been modelled overtly by state-of-the-art speech synthesis systems. However, several recent systems have started to model disfluencies specifically, and so there is an increasing need to create disfluent speech synthesis input by automatically inserting filled pauses into otherwise fluent text. The approach presented here interpolates Ngrams and Full-Output Recurrent Neural Network Language Models (f-RNNLMs) in a lattice-rescoring framework. It is shown that the interpolated system outperforms separate Ngram and f-RNNLM systems, with performance evaluated using precision, recall, and F-score.

Keywords DiSS, disfluency, filled pauses, f-RNNLMs, Ngrams, lattices

Michiko Watanabe, Yosuke Kashiwagi, and Kikuo Maekawa, “The relationship between preceding clause type, subsequent clause length and duration of silent and filled pauses at clause boundaries in Japanese monologues,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Filled pauses (FPs) are claimed to occur when speakers have difficulty and need extra time in speech production. Using a spontaneous speech corpus, this study investigated whether two factors affect silent pause (SP) and FP durations at clause boundaries: 1) boundary strength and 2) subsequent clause length. First, we examined whether SP and FP durations increase with syntactic boundary strength. Second, we investigated whether subsequent clause length affects SP and FP durations at the boundaries. Results show that SP duration increased with boundary strength and subsequent clause length, but FP duration did not, suggesting that only SP duration is affected by the two factors.

Keywords DiSS, silent pause, filled pause, clause boundary, speech planning, disfluency

Mirjam Wester, Martin Corley, and Rasmus Dall, “The temporal delay hypothesis: natural, vocoded and synthetic speech,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Including disfluencies in synthetic speech is being explored as a way of making synthetic speech sound more natural and conversational. How to measure whether the resulting speech is actually more natural, however, is not straightforward. Conventional approaches to synthetic speech evaluation fall short because listeners are either primed to prefer stimuli with filled pauses or, when not primed, prefer more fluent speech. Psycholinguistic reaction time experiments may circumvent this issue. In this paper, we revisit one such reaction time experiment. For natural speech, delays in word onset were found to facilitate word recognition regardless of the type of delay, whether a filled pause (um), silence or a tone. We expand these experiments by examining the effect of using vocoded and synthetic speech. Our results partially replicate previous findings. For natural and vocoded speech, if the delay is a silent pause, significant increases in the speed of word recognition are found. If the delay comprises a filled pause, there is a significant increase in reaction time for vocoded speech but not for natural speech. For synthetic speech, no clear effects of delay on word recognition are found. We hypothesise that this is because processing synthetic speech takes longer (requires more cognitive resources) than processing natural or vocoded speech.

Keywords DiSS, delay hypothesis, disfluency

Clare Wright and Cong Zhang, “The effect of study abroad experience on L2 Mandarin disfluency in different types of tasks,” in The 7th Workshop on Disfluency in Spontaneous Speech (DiSS 2015), Edinburgh, Scotland, August 2015.

Abstract Disfluency is a common phenomenon in L2 speech, especially in beginners’ speech. Whether studying abroad helps learners reduce their disfluency remains debated [8]. In this paper, we examined longitudinal data from 10 adult English instructed learners of Mandarin, measured before and after ten months of study abroad (SA). We used two speaking tasks comparing pre-planned vs. unplanned spontaneous speech to compare differences over time and between tasks, using eight linguistic and temporal fluency measures (analysed using CLAN and Praat). Overall mean linguistic and temporal fluency scores improved significantly (p < .05), especially speech rate (p < .01), supporting the general claim that SA favours oral development, particularly fluency [2]. Further analysis revealed task differences at both times of measurement, but with greater improvement in the spontaneous task.

Keywords DiSS, fluency, L2 Mandarin, study abroad

Filled Pause Research Center