Workshop on Disfluency in Spontaneous Speech (DiSS 1999)
The first Workshop on Disfluency in Spontaneous Speech was held as a one-day satellite meeting of the International Congress for Phonetic Sciences.
Date: July 30, 1999
Location: University of California Berkeley
Organizers: Robin Lickley, Ellen Gurman Bard, Jean Fox Tree, Peter Heeman, Liz Shriberg, and Madelaine Plauché.
Papers presented
-
“Which speakers are most disfluent in conversation, and when?,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 7-10.
Abstract We examined disfluency rates in a corpus of task-oriented conversations [1] in which several factors were manipulated that could affect fluency rates. These factors included: speakers' age (young, middle-aged, and older), task roles (director vs. matcher), difficulty of domain (abstract geometric figures or tangrams vs. photographs of children's faces), relationship between speakers (married vs. strangers), and gender (each pair consisted of a man and a woman). Older speakers produced only marginally higher (combined) disfluency rates than young and middle-aged speakers. Overall, disfluency rates were higher both when speakers took the initiative and when they discussed tangrams, associating disfluencies with an increase in planning difficulty. However, fillers (such as uh) were distributed somewhat differently than repetitions and restarts, supporting the idea that fillers may be a resource for or a consequence of interpersonal coordination.
Keywords DiSS
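The abstract above compares disfluency rates across speakers and conditions and reports fillers separately from repetitions and restarts. As a hedged illustration only, the Python sketch below shows one common way such rates are normalized (per 100 words) and combined; the category labels and the word count used as the denominator are assumptions, not the coding scheme used in the study.

```python
from collections import Counter

# Hypothetical disfluency categories, following the abstract's distinction
# between fillers on one side and repetitions/restarts on the other.
KINDS = ("filler", "repetition", "restart")


def disfluency_rates(n_words, events, per_words=100):
    """Return disfluency rates per `per_words` words, by kind and combined.

    n_words: number of words produced (what counts as a word here is an
             annotation decision and an assumption of this sketch)
    events:  one label from KINDS for each disfluency observed
    """
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    counts = Counter(events)
    unknown = set(counts) - set(KINDS)
    if unknown:
        raise ValueError(f"unknown disfluency kinds: {sorted(unknown)}")
    rates = {kind: per_words * counts[kind] / n_words for kind in KINDS}
    # The "combined" rate simply sums the per-kind rates.
    rates["combined"] = sum(rates[kind] for kind in KINDS)
    return rates
```

For example, `disfluency_rates(200, ["filler"] * 6 + ["repetition"] * 3 + ["restart"])` gives 3.0 fillers, 1.5 repetitions and 0.5 restarts per 100 words, or 5.0 combined. Keeping the per-kind rates alongside the combined rate matters here, because the abstract reports that fillers pattern differently from repetitions and restarts.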
-
“Uhs and interrupted words: The information available to listeners,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 19-22.
Abstract Speech disfluencies are generally assumed to harm comprehension. Our studies investigated whether this is true, or whether certain disfluencies might actually help comprehension by marking for listeners which information the speaker intends to repair. We tested two hypotheses: (1) whether an interrupted word signals that the word was produced in error, and (2) whether a filler such as uh after an interrupted word signals an error. Listeners heard fluent instructions and disfluent ones whose reparanda contained completed words, interrupted words, or interrupted words with fillers, and then responded to these instructions. Responses to mid-word interruptions were no faster than to between-word interruptions, although there were fewer errors when less of the unintended word was heard. Responses to mid-word interruptions with uh were faster and more accurate than controls without disfluencies. With more complex displays, the response time advantage (but not the error rate advantage) diminished, suggesting that an interrupted word followed by uh tells listeners what the speaker does NOT mean. A fourth experiment showed that it is not the presence of the uh per se, but the additional time after the interrupted word that is the source of this "disfluency advantage."
Keywords DiSS
-
“Speech Repairs: A Parsing Perspective,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 47-50.
Abstract This paper presents a grammatical and processing framework for handling speech repairs. The proposed framework has proved adequate for a collection of human-human task-oriented dialogs, both in a full manual examination of the corpus and in tests with a parser capable of parsing some of that corpus. This parser can also correct the output of a pre-parser speech repair identifier, increasing its recall by between 2% and 4.8%.
Keywords DiSS
-
“A Comparative Analysis of Disfluencies in Four Swedish Travel Dialogue Corpora,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 3-6.
Abstract This paper reports on ongoing work on disfluencies carried out at Telia Research AB. Four travel dialogue corpora are described: human-"machine"-human (Wizard-of-Oz); human-"machine" (Wizard-of-Oz); human-human and human-machine. The data collection methods are outlined and their possible influence on the collected material is discussed. An annotation scheme for disfluency labelling is described. Preliminary results on five different kinds of disfluencies are presented: filled and unfilled pauses, prolonged segments, truncations and explicit editing terms.
Keywords DiSS
-
“Between-Turn Pauses and Ums,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 15-17.
Abstract Pauses and ums are often treated as two versions of the same thing, with the traditional label for ums, filled pauses, emphasizing this seeming interchangeability. To explore this hypothesis, I compared how overhearers interpreted a speaker's contribution to a conversation depending on whether the speaker responded immediately, paused and responded, or said um and responded. Overhearers answered a series of questions about the turn exchanges they had heard. The questions measured their interpretations of the second speakers' speech production difficulty, honesty, comfort with the topic discussed, familiarity with the interlocutor, and desire to have further contact with the interlocutor. In two experiments, the type of turn exchange was found to influence overhearers' interpretations. Results supply information about both the signalling properties of ums and the relationship between ums and pauses of varying lengths in the environment of a turn exchange.
Keywords DiSS
-
“Toward a formal characterisation of disfluency processing,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 35-38.
Abstract Knowledge of the inherent structural characteristics of speech disfluencies is a prerequisite for detecting and correcting them in spontaneous speech. However, much recent research on speech disfluencies focuses on the surface patterns of their editing structure, instead of examining the relations between the editing structure, the syntactic structure and the prosodic structure of disfluencies. In this paper we present first results of a new line of research on the formal modelling of speech disfluencies in unplanned speech at all three levels of description, using feature structures modelled by finite-state transducers.
Keywords DiSS
-
“Detecting and Correcting Speech Repairs in Japanese,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 43-46.
Abstract One of the characteristics of spontaneous speech is the abundance of speech repairs, in which speakers go back and repeat or change something they have just said. In other work [7], we proposed a language model for speech recognition that can detect and correct speech repairs in English. In this paper, we show that this model works equally well on a Japanese corpus of spontaneous speech. The structure of the model captures the language-independent aspects of speech repairs, while machine learning techniques applied to an annotated corpus learn the language-dependent aspects.
Keywords DiSS
-
“Why does spontaneous speech unfold in temporal cycles, sometimes?,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 11-14.
Abstract Spontaneous speech typically consists of alternating periods of high and low fluency, where fluency refers to the ratio of speech to pausing. Individual differences in fluency are substantial, with mean pause time ranging from less than 20 to more than 40 seconds per minute in our sample of English and Mandarin speakers. While pauses have been regarded as critical clues for psycholinguistic analysis for decades, the existence of temporal cycles has been subject to extensive debate. The results of our experiments provide strong support for the presence of temporal cycles in spontaneous speech, and demonstrate in particular that fluency declines before topic shifts and increases after them. The source of temporal cycles is unclear, however. The prevailing assumption is that they reflect alternating periods of high-level macro-planning, associated with low fluency, and low-level micro-execution, associated with high fluency. However, a variety of alternative explanations merit consideration.
Keywords DiSS
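Because the abstract defines fluency as the ratio of speech to pausing and reports pause time in seconds per minute, both measures can be computed from a time-aligned speech/pause segmentation. The Python sketch below is a minimal illustration under a few assumptions: segments are hypothetical `(start, end, is_speech)` tuples in seconds that cover the recording contiguously, and fixed-length windows (an arbitrary choice) stand in for whatever unit the authors used when looking for temporal cycles.

```python
def _durations(segments):
    """Total speech and pause time (seconds) in (start, end, is_speech) segments."""
    speech = sum(end - start for start, end, is_speech in segments if is_speech)
    pause = sum(end - start for start, end, is_speech in segments if not is_speech)
    return speech, pause


def pause_seconds_per_minute(segments):
    """Seconds of pausing per minute of recorded time."""
    speech, pause = _durations(segments)
    total = speech + pause
    if total <= 0:
        raise ValueError("segments have no duration")
    return 60.0 * pause / total


def fluency_ratio(segments):
    """Speech-to-pause ratio; infinite if there is no pausing at all."""
    speech, pause = _durations(segments)
    return speech / pause if pause > 0 else float("inf")


def windowed_fluency(segments, window=10.0):
    """Speech-to-pause ratio in consecutive fixed-length windows.

    Each segment is clipped to the window(s) it overlaps, so a pause that
    straddles a window boundary is split between the two windows.
    """
    start_time = min(start for start, _, _ in segments)
    end_time = max(end for _, end, _ in segments)
    ratios = []
    w0 = start_time
    while w0 < end_time:
        w1 = w0 + window
        clipped = [(max(s, w0), min(e, w1), sp)
                   for s, e, sp in segments if s < w1 and e > w0]
        ratios.append(fluency_ratio(clipped))
        w0 = w1
    return ratios
```

`windowed_fluency` returns one speech-to-pause ratio per window (`inf` when a window contains no pause). A series that alternates between low and high values is the kind of pattern the temporal-cycle hypothesis predicts, although establishing cycles reliably would require more than this sketch.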
-
“Comparing human and automatic speech recognition using word-gating,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 23-26.
Abstract This paper describes a study in which we compare human and automatic recognition of words in fluent and disfluent spontaneous speech. In a word-level gating study with confidence judgements, we examine how humans' recognition of words, and their confidence in that recognition, develops over the course of an utterance, and show how disfluency disrupts the process. We give an automatic recogniser the same task and compare its performance with the humans’. With both systems, subsequent context supports word recognition: confidence in word recognition peaks after subsequent words have been heard. With both systems, disfluency adversely affects recognition of words in the immediate vicinity of the disfluent interruption (for repeats and repairs): disrupted subsequent context disrupts the recognition process.
Keywords DiSS
-
“Better detection of hesitations in spontaneous speech,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 39-42.
Abstract Practical speech recognizers must accept normal conversational voice input (including hesitations). However, most automatic speech recognition work has concentrated on read speech, whose acoustic aspects differ significantly from speech found in actual dialogues. Hesitations, of which the most frequent are filled pauses, are common in natural speech, yet few recognition systems handle such disfluencies with any degree of success. Filled pauses (e.g., "uhh," "umm"), unlike most silent pauses, resemble phones which form words in continuous speech. The work reported here further develops techniques for the automatic identification of filled pauses. Such identification, if reliable, would reduce potential confusion in estimating the textual output for an utterance. The Switchboard database (of natural telephone conversations) provided data for the study. While most automatic recognition methods rely entirely on the spectral envelope (e.g., low-order cepstral coefficients), identifying filled pauses requires a combination of spectra, fundamental frequency and duration. High precision and a low false alarm rate for filled pauses are feasible without excessive computation.
Keywords DiSS
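The abstract states that identifying filled pauses requires combining spectra, fundamental frequency and duration, rather than the spectral envelope alone. The following Python sketch is a hypothetical heuristic in that spirit, not the method of the paper: it flags runs of voiced frames whose cepstra change little from frame to frame and whose F0 stays within a narrow band, and it keeps only runs that last long enough. The frame format `(f0_hz, cepstrum)`, the 10 ms frame step and every threshold are illustrative assumptions that would need tuning on real data.

```python
import math


def _cepstral_distance(a, b):
    """Euclidean distance between two equal-length cepstral vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detect_filled_pauses(frames, frame_step=0.01, max_spec_change=1.0,
                         max_f0_range=20.0, min_duration=0.25):
    """Return (start_s, end_s) spans of candidate filled pauses.

    frames: sequence of (f0_hz, cepstrum) per analysis frame, with
            f0_hz == 0 for unvoiced frames (an assumed convention).
    A candidate is a run of voiced frames in which consecutive cepstra
    differ by at most max_spec_change, the F0 range stays within
    max_f0_range Hz, and the run lasts at least min_duration seconds.
    """
    detections = []
    start = None            # first frame index of the current run
    f0_lo = f0_hi = 0.0     # F0 range observed in the current run

    def close_run(end):
        # Keep the current run only if it is long enough.
        if start is not None and (end - start) * frame_step >= min_duration:
            detections.append((start * frame_step, end * frame_step))

    for i, (f0, cep) in enumerate(frames):
        if start is not None and f0 > 0:
            lo, hi = min(f0_lo, f0), max(f0_hi, f0)
            if (_cepstral_distance(frames[i - 1][1], cep) <= max_spec_change
                    and hi - lo <= max_f0_range):
                f0_lo, f0_hi = lo, hi   # frame i extends the run
                continue
        close_run(i)                    # the run (if any) ends before frame i
        if f0 > 0:
            start, f0_lo, f0_hi = i, f0, f0
        else:
            start = None
    close_run(len(frames))
    return detections
```

A filled pause such as "uhh" tends to be a sustained, steady vowel, which is why the sketch looks for spectral and F0 stability over a minimum duration; a practical detector would more likely learn these decision boundaries from labeled data than hard-code them.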
-
“Use of a postprocessor to identify and correct speaker disfluencies in automated speech recognition for medical transcription,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 27-30.
Abstract Medical practitioners speak in a quasi-spontaneous monologue when they dictate a chart note, letter, or patient history. Prior research has largely ignored the issue of disfluency in dictation, arguing that speakers can control recording and start over if necessary. In 550,000 words of hand-transcribed medical dictation, however, we find numerous filled pauses, repetitions, and other self-repairs. This paper describes a pre-theoretical classification of disfluencies, developed to identify patterns useful in automatic text processing; the patterns of disfluency found in a corpus hand-tagged with this classification, which include repetitions in combination with substitutions, insertions, and deletions; and preliminary results from implementing a disfluency pattern matcher and filter in a postprocessor developed for commercial use.
Keywords DiSS
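The postprocessor described above matches disfluency patterns and filters them out of recognizer output. As a rough illustration of the simplest such patterns, the Python sketch below removes filled pauses drawn from a hypothetical filler list and collapses exact immediate repetitions of up to three words. It is a toy under stated assumptions, not the commercial postprocessor: it does not handle the substitutions, insertions and deletions that the authors report occurring in combination with repetitions.

```python
# Hypothetical filler inventory; a real system would derive this from data.
FILLED_PAUSES = {"uh", "um", "er", "ah"}


def remove_filled_pauses(words):
    """Drop tokens that appear in the filler inventory (case-insensitive)."""
    return [w for w in words if w.lower() not in FILLED_PAUSES]


def collapse_repetitions(words, max_ngram=3):
    """Collapse exact immediate repeats of 1..max_ngram words to one copy.

    The longest repeated n-gram is tried first, so a repeated phrase is
    collapsed as a unit.
    """
    result = []
    i = 0
    while i < len(words):
        for n in range(max_ngram, 0, -1):
            first = [w.lower() for w in words[i:i + n]]
            second = [w.lower() for w in words[i + n:i + 2 * n]]
            if len(first) == n and first == second:
                i += n  # skip the first copy (the reparandum), keep the repeat
                break
        else:
            # No repetition starts here: keep the word and move on.
            result.append(words[i])
            i += 1
    return result


def filter_disfluencies(words, max_ngram=3):
    """Remove filled pauses first, then collapse the repetitions they exposed."""
    return collapse_repetitions(remove_filled_pauses(words), max_ngram)
```

For example, `filter_disfluencies("the patient uh the patient is is stable".split())` returns `["the", "patient", "is", "stable"]`. Blindly collapsing repeats would also remove grammatical ones such as "had had", which is one reason a real filter needs corpus-derived patterns like those the paper describes.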
-
“Filled Pause Distribution and Modeling in Quasi-Spontaneous Speech,” in Disfluency in Spontaneous Speech, Berkeley, CA, USA, July 1999, pp. 31-34.
Abstract Filled pauses (FPs) are characteristic of spontaneous speech and pose considerable problems for speech recognition because they are often recognized as short words. Recognition of quasi-spontaneous speech (medical dictation) is subject to this problem as well: an um can be recognized as thumb or arm if the recognizer’s language model does not adequately represent FPs. Representing FPs in the training corpus improves recognition. Several techniques for seeding a training corpus with FPs were evaluated; a stochastic method, together with random insertion uniformly distributed around the average sentence length, yields better results than random insertion at other ranges. The optimal method of seeding a training corpus with FPs may be linked to clause boundaries, even though the imperfect method of inserting FPs at clause boundaries used in this study failed.
Keywords DiSS
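The abstract reports that seeding a training corpus with filled pauses at random intervals distributed uniformly around the average sentence length worked better than other insertion ranges. One possible reading of that procedure, sketched below in Python, draws each gap between inserted FPs uniformly from an integer interval centered on the mean sentence length. The half-width `spread`, the token `"um"` and the function names are assumptions for illustration; the paper's stochastic method may differ.

```python
import random


def average_sentence_length(sentences):
    """Mean number of words per sentence, rounded to an integer."""
    if not sentences:
        raise ValueError("need at least one sentence")
    return round(sum(len(s) for s in sentences) / len(sentences))


def seed_filled_pauses(words, mean_gap, spread, token="um", rng=None):
    """Insert `token` into a word stream at random intervals.

    Each gap (in words) between insertions is drawn uniformly from the
    integers in [mean_gap - spread, mean_gap + spread]; requiring
    spread < mean_gap keeps every gap at least one word long.
    """
    if not 0 <= spread < mean_gap:
        raise ValueError("require 0 <= spread < mean_gap")
    rng = rng or random.Random()
    out = []
    gap = rng.randint(mean_gap - spread, mean_gap + spread)
    since_last = 0
    for word in words:
        out.append(word)
        since_last += 1
        if since_last == gap:
            out.append(token)
            since_last = 0
            gap = rng.randint(mean_gap - spread, mean_gap + spread)
    return out
```

In this reading, `mean_gap` would come from `average_sentence_length` over the training sentences, and passing a seeded `random.Random` makes the augmented corpus reproducible. Placing FPs at clause boundaries, which the authors suggest may be optimal, would replace the uniform draw with a boundary detector.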