The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013)
The sixth Workshop on Disfluency in Spontaneous Speech was held as a satellite event of the International Speech Communication Association (ISCA) annual conference.
Date: August 21-23, 2013
Location: KTH Royal Institute of Technology; Stockholm, Sweden
Organizers: Jens Edlund, Robert Eklund, Joakim Gustafson, and Sofia Strömbergsson
Invited speakers: Herbert H. Clark, Martin Corley
“Disfluency and discursive markers: when prosody and syntax plan discourse,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 5-8.
Abstract Hesitations, interruptions within phrases or within words are common in spontaneous speech. Those phenomena are widely known to be observable from a prosodic point of view through disfluencies. From a syntactic point of view, many studies already established that discursive markers such as hm, oh, I mean, etc. are representative of spontaneous speech. In this study, we demonstrate through a joint corpus-based analysis that these prosodical and syntactical features are correlated, without however being equivalent. More precisely, the lack of either disfluencies or discursive markers is consistently shown to be representative of a planned discourse.
Keywords DiSS, disfluency, discursive marker, genres
“Pauses following fillers in L1 and L2 German map task dialogues,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 9-12.
Abstract Fillers and pauses in spoken language indicate hesitations. Filler type (uh vs. um) is believed to signal a minor or major following speech delay in L1. We examined whether advanced speakers of L2 German use pauses following filler type (äh vs. ähm) in the same way as native speakers do. Two Map Task corpora of L1 and L2 were contrasted with respect to speaker role, filler type and the exact time interval of fillers and pauses. Speaker role influenced the disfluency patterns in L1 and L2 in the same way. Filler type had no impact on the length of the following pause, but the time interval patterns differed significantly. Longer filler intervals are followed by longer pauses in L2 and by shorter pauses in L1. These results suggest that filler type in German is not used to indicate the length of the following delay. Advanced learners seem to have adopted this pattern of use, but cannot overcome their hesitations as fast as native speakers, probably due to their less automatised speech production.
Keywords DiSS, fillers, pauses, spontaneous speech, L1, L2, map task, German, disfluencies, contrastive analysis
“HESITA(tions) in Portuguese: a database,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 13-16.
Abstract With this paper we present a European Portuguese database of hesitations in speech. Under the name of HESITA, this database contains annotations of hesitation events, such as filled pauses, vocalic extensions, truncated words, repetitions and substitutions. The hesitations were found over 30 daily news programs collected from podcasts of a Portuguese television channel. The database also includes speaking style classification as well as acoustical information and other speech events. Statistical analysis of the hesitation events in terms of their occurrence is presented. Insights into the process of human speech communication can be extracted from this database, which contains relevant information about how Portuguese speakers hesitate. The HESITA database is freely available online to the research community.
Keywords DiSS, hesitations, disfluency, prepared speech, spontaneous speech, annotation, hesitation corpus
“Choosing a threshold for silent pauses to measure second language fluency,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 17-20.
Abstract Second language (L2) research often involves analyses of acoustic measures of fluency. The studies investigating fluency, however, have been difficult to compare because the measures of fluency that were used differed widely. One of the differences between studies concerns the lower cut-off point for silent pauses, which has been set anywhere between 100 ms and 1000 ms. The goal of this paper is to find an optimal cut-off point. We calculate acoustic measures of fluency using different pause thresholds and then relate these measures to a measure of L2 proficiency and to ratings on fluency.
Keywords DiSS, silent pauses, number of pauses, duration of pauses, silent pause threshold, second language speech
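As a rough illustration of what varying the silent pause threshold means in practice, the sketch below recomputes simple pause measures at several cut-off points. It is not the authors' procedure: the function name `pause_measures`, the choice of measures, and the silence durations are all hypothetical, and the silent intervals are assumed to have been detected beforehand.

```python
def pause_measures(silences_ms, threshold_ms):
    """Summarise the silent intervals that count as pauses at a given cut-off.

    silences_ms  -- durations of detected silent intervals, in milliseconds
    threshold_ms -- lower cut-off; shorter silences are not counted as pauses
    """
    pauses = [d for d in silences_ms if d >= threshold_ms]
    total = sum(pauses)
    mean = total / len(pauses) if pauses else 0.0
    return {"n_pauses": len(pauses), "total_ms": total, "mean_ms": mean}


# Hypothetical silence durations from one speech sample (illustrative only).
silences_ms = [80, 120, 250, 400, 950, 1300]

# Thresholds spanning the 100-1000 ms range mentioned in the abstract.
for threshold in (100, 250, 500, 1000):
    print(threshold, pause_measures(silences_ms, threshold))
```

The same recording yields very different pause counts and mean durations depending on the cut-off, which is why studies using different thresholds are hard to compare.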
“Self-repairs in German children's peer interaction - initial explorations,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 29-32.
Abstract Forty-nine self-repairs were extracted from a corpus of conversational speech of ten German children (mean age 5;1) with peers. The repairs were analysed using Levelt’s classification and compared with his adult data. Children produced fewer appropriateness repairs than adults, but more covert repairs and more phonetic repairs. Like adults, children had a preference to interrupt themselves within-word only for error repairs. Unlike adults, children did not produce editing terms following interruptions.
“Lengthenings and filled pauses in Hungarian adults' and children's speech,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 21-24.
Abstract In the present paper vowel lengthenings and non-lexicalized filled pauses were studied in the spontaneous speech of children and adults (focusing more on the much less studied phenomenon: vowel lengthening). The results revealed different usage and appearance of lengthenings in the two age groups, therefore, differences in speech skills and strategies can be concluded. LEs and FPs differ mostly in their position in the speech session between the age groups, which has implications regarding different planning strategies of adults and children. We also draw conclusions regarding the methodological considerations in the issue of identifying vowel lengthening supporting a previously formulated conception.
Keywords DiSS, lengthening, (non-lexicalized) filled pause, spontaneous speech, speech planning, discourse management
“Anti-zero pronominalization: when Japanese speakers overtly express omissible topic phrases,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 25-28.
Abstract In this paper, we focus on cases where Japanese speakers overtly express a topic phrase that could have been omitted. We call this phenomenon anti-zero-pronominalization and hypothesize that this helps speakers gain time for planning a following utterance; anti-zero-pronominalization is another option to deal with cognitive load at the beginning of an utterance in addition to fillers and other speech disfluencies. Based on a quantitative analysis of a corpus of spontaneous Japanese dialogs, we investigate the difference between overt topic NPs and zero-pronouns. We show that i) the utterance is more complex when the topic is expressed as an overt NP than when it is expressed as a zero-pronoun; ii) turn-initial items such as fillers are produced less frequently when overt NPs appear than when zero-pronouns appear; and iii) the utterance becomes more complex when the last mora of the topic is more prolonged.
Keywords DiSS, zero-pronouns, topic phrases, cognitive load, Japanese dialogs
“Self-addressed questions in disfluencies,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 33-36.
Abstract The paper considers self-addressed queries – queries speakers address to themselves in the aftermath of a filled pause. We study their distribution in the BNC and show that such queries show signs of sensitivity to the syntactic/semantic type of the sub-utterance they follow. We offer a formal model that explains the coherence of such queries.
“Acoustic and linguistics features related to speech planning appearing at weak clause boundaries in Japanese monologs,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 37-40.
Abstract In this paper, we focus on weak clause boundaries in Japanese monologs in order to investigate the relationship of the length of constituents following weak boundaries to three acoustic and linguistic features: 1) occurrence rate of fillers, 2) occurrence rate of boundary pitch movements, and 3) degree of lengthening of clause-final morae. We found that all these features were significantly correlated with the length of following constituents. Most importantly, boundary pitch movements had an additional effect that can be distinct from the effect of clause-final lengthening. These results suggest that Japanese speakers have earlier-occurring items that help them deal with cognitive load in speech planning, in addition to fillers and other clause-initial disfluencies.
Keywords DiSS, fillers, boundary pitch movements, clause-final lengthening, Japanese monologs
“Prediction of F0 height of filled pauses in spontaneous Japanese: a preliminary report,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 41-44.
Abstract F0 values of filled pauses (FP) in the Corpus of Spontaneous Japanese were analyzed to examine the mechanism by which the F0 heights of FP were determined. Statistical analyses of the F0 values of FP occurring in between two full-fledged accentual phrases (AP) revealed correspondence between the occurrence timing of FP and the F0 height. Based upon this finding, 5 models of F0 prediction were proposed. Comparison of the mean prediction errors revealed that the best prediction was obtained in a model that linearly interpolates between the phrase-final L% tone of the immediately preceding AP and the phrase-initial %L tone of the immediately following AP. This finding suggests that the F0 of FP was specified at the level of phonetic realization rather than phonological prosodic representation.
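The interpolation idea in this abstract can be sketched as follows. This is only an illustration of linear interpolation between two tonal targets, not the paper's model: the function name `interpolate_f0`, the use of Hz on a linear scale, and all time and F0 values are assumptions made for the example.

```python
def interpolate_f0(t, t_prev, f0_prev, t_next, f0_next):
    """Predict F0 at time t by linear interpolation between two tonal targets.

    t_prev, f0_prev -- time and F0 of the phrase-final L% tone of the preceding AP
    t_next, f0_next -- time and F0 of the phrase-initial %L tone of the following AP
    """
    if t_next == t_prev:
        raise ValueError("the two tonal targets must be at different times")
    weight = (t - t_prev) / (t_next - t_prev)  # 0 at the L% tone, 1 at the %L tone
    return f0_prev + weight * (f0_next - f0_prev)


# Hypothetical values: L% at 0.0 s and 100 Hz, %L at 1.0 s and 120 Hz,
# with the filled pause centred midway between them.
predicted = interpolate_f0(0.5, 0.0, 100.0, 1.0, 120.0)
print(predicted)  # 110.0
```

Under this kind of model the predicted F0 of a filled pause depends only on its timing relative to the neighbouring tones, which matches the abstract's point that timing and F0 height correspond.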
“Analysis of parenthetical clauses in spontaneous Japanese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 45-48.
Abstract In this paper, I will discuss the functional aspects of parenthetical clauses and sentences in spontaneous Japanese monologues. Parentheticals can be defined as syntactic elements that are instantly inserted in the middle of an ongoing utterance to add supplemental information and thus interrupt the fluent flow of speech production. Examples of parenthetical clauses/sentences that appeared in the Corpus of Spontaneous Japanese were examined and then classified into three types. These types differ in their contextual functions, but share a commonality in that they present multiplex information simultaneously in the process of producing spontaneous speech.
Keywords DiSS, parenthetical clause/sentence, Corpus of Spontaneous Japanese, contextual functions
“Automatic structural metadata identification based on multilayer prosodic information,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 49-52.
Abstract This paper discriminates different types of structural metadata in transcripts of university lectures: boundary events (comma, full stops and interrogatives), and disfluencies (repair). The disambiguation process is based on predefined multilayered linguistic information and on its hierarchical structure. Since boundary events may share similar linguistic properties, in terms of F0 and energy slopes, presence/absence of silent pauses, and duration of different units of analysis, different classification methods based on a set of automatically derived prosodic features have been applied to differentiate between those events and disfluencies. This paper also performs a detailed analysis on the impact of each individual feature in discriminating each structural event. The results of our data-driven approach allow us to reach a structured set of basic features towards the disambiguation of metadata events. These results are a step forward towards the analysis of speech acts and their disambiguation from disfluencies.
Keywords DiSS, disfluencies, automatic speech processing, structural metadata, speech prosody
“Which kind of hesitations can be found in Estonian spontaneous speech?,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 53-54.
Abstract This paper describes the acoustic characteristics of hesitations in Estonian spontaneous speech. We especially investigate duration, fundamental frequency, and the first two formants. The most frequent hesitations can be expressed by lengthened phonemes such as /ää/, /ee/, /õõ/, and /mm/. We compare lengthened phoneme hesitations with their related phonemes. The results from our preliminary hesitation study show that (i) hesitations have longer durations with a wider range; (ii) hesitations generally have lower pitch; (iii) hesitation formants are likely to be centralized or posterior and opened in comparison with related phonemes.
Keywords DiSS, hesitation, Estonian, spontaneous speech
“Self-monitoring as reflected in identification of misspoken segments,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 55-57.
Abstract Most segmental speech errors probably are articulatory blends of competing segments. Perceptual consequences were studied in listeners' reactions to misspoken segments. 291 speech fragments containing misspoken initial consonants plus 291 correct control fragments, all stemming from earlier SLIP experiments, were presented for identification to listeners. Results show that misidentifications (i.e. deviations from an earlier auditory transcription) are rare (3%), but reaction times to correctly identified fragments systematically reflect differences between correct controls, undetected, early detected and late detected speech errors, leading to the following speculative conclusions: (1) segmental errors begin their life in inner speech as full substitutions, and competition with correct target segments often is slightly delayed; (2) in early interruptions speech is initiated before competing target segments are activated, but then rapidly interrupted after error detection; (3) late detected errors reflect conflict-based monitoring of articulation or monitoring overt speech.
“Categorizing syntactic chunks for marking disfluent speech in French language,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 59-62.
Abstract Disfluency is the first phenomenon one has to address when processing spontaneous speech. Efficient systems combining transcription-based and signal-based cues have been created for English. These systems generally use supervised machine learning models, trained over large annotated datasets combining signal and transcription. As for other languages, including French, the situation is complicated by the lack of resources. A few proposals based on filled pauses, truncated words and repetitions have been made for identifying disfluencies in French. In this paper, we propose a transcription-based approach to this task, with high-quality morpho-syntactic tags as input for identifying disfluent areas. Originally, we adopted a transcription-based approach for obtaining an independent way of characterizing disfluencies. This can be later compared and combined with prosodic cues. Our method consists in building syntactic chunks from our tagging and then classifying these chunks into several categories, some of them being considered as disfluent. We apply our method to speaker style characterization, discourse genres zoning, as well as to dataset cleaning. Finally, an attempt is made to relate our disfluent chunks to a more standard description of disfluencies in order to open the way to a deeper integration of our work with that of the disfluency community.
Keywords DiSS, tagging, chunking, transcription-based approach, disfluencies, speaking style
“Acoustical characterization of vocalic fillers in European Portuguese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 63-66.
Abstract This study attempts to acoustically characterize the most common filled pause vocalizations (or vocalic fillers) in spontaneous speech in European Portuguese: the near-open central vowel [ɐ] and the mid-central vowel [ə]. For this purpose we analyzed the spectral information of the vocalic fillers by estimating their first two formant frequencies as well as their duration properties. The vocalic fillers are taken from a large corpus of European Portuguese broadcast news' speech. We also compared the vocalic fillers with lexical vowels possessing similar timbre. No formant variation trend was attained for the vocalic fillers and a great overlap of formant values is observed. These results provide a base of information for understanding the most common vocalic fillers in European Portuguese spontaneous speech.
Keywords DiSS, filled pauses, vocalic fillers, formant estimation, spontaneous speech, hesitations
“The linguistic role of hesitation disfluencies: evidence from Hebrew and Japanese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 67-70.
Abstract In this paper we examine a certain aspect of the prosody-syntax interface, that of hesitation disfluencies (HD) that occur intra-phrases or intra-morphemes. Such cases were found in two spontaneous corpora of two syntactically distinct languages – Israeli Hebrew (IH) and Japanese. It was found that intra-phrasal hesitations in the two languages call for different explanations, since in Japanese the noun (e.g., in NP) precedes the case marking particle while in IH the preposition (e.g., in PP) precedes the noun. In this paper we will present qualitative findings and suggest a unified view of the phenomenon of intra-phrasal HDs.
Keywords DiSS, hesitation disfluency, prosody-syntax interface, Israeli Hebrew, Japanese
“Phrasal complexity and the occurrence of filled pauses in presentation speeches in Japanese,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 71-72.
Abstract Filled pauses are ubiquitous in everyday speech. I investigated whether linguistic complexity of upcoming phrases affects filler rate at phrase boundaries in presentation speeches in Japanese. Filler rate at phrase boundaries increased monotonically with complexity of the following phrases. However, when the following phrase was composed of more than 11 Bunsetsu-phrases, the filler rate did not show any constant increase. The results indicate that filler rate at phrase boundaries is closely related to cognitive load of local linguistic encoding and that the maximum planning span for linguistic encoding is about 10 Bunsetsu-phrases in Japanese monologues.
Keywords DiSS, filled pause, bunsetsu-phrase, linguistic complexity, planning load
“Disfluencies and uncertainty perception - evidence from a human-machine scenario,” in The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013) (TMH-QPSR), vol. 54, no. 1, Stockholm, Sweden, August 2013, pp. 73-76.
Abstract This paper deals with the modelling and perception of disfluencies in articulatory speech synthesis. The stimuli are embedded into short dialogues in question-answering situations in a human–machine scenario. The system is supposed to express uncertainty in the answer. We test the influence of delay, intonation, and filler as prosodic indicators of uncertainty on perception in two studies. Study 1 deals with the effect of delay and filler on uncertainty perception. Results suggest an additive effect of the cues, i.e. the activation of both prosodic cues of uncertainty has a stronger impact on uncertainty perception than the deactivation of a single cue or of both cues. With respect to the effect of single cues, no significant difference can be observed. Study 2 investigates the impact of delay and intonation on perceived uncertainty. Again, a principle of additivity can be observed. Furthermore, as modelled here, intonation has a stronger influence than delay. In both studies no correlation between the ranking of uncertainty and naturalness of the stimuli is found.
Keywords DiSS, uncertainty, disfluencies, speech synthesis, speech perception