FPRC — The Disfluency in Spontaneous Speech (DiSS) and Linguistic Patterns in Spontaneous Speech (LPSS) Joint Workshop 2010

Rachel Baker, and Valerie Hazan, “LUCID: a corpus of spontaneous and read clear speech in British English,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 3-6. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_003.pdf.

Abstract This paper describes LUCID, the London UCL Clear Speech in Interaction Database, which contains spontaneous and read speech in clear and casual speaking styles for 40 Southern British English speakers. The problem-solving task used to collect the spontaneous speech, the DiapixUK task, is also described, along with ways of using the task to elicit different types of clear speech without explicit instruction, e.g., using different ‘barriers’ to communication. Applications of the corpus and of the task materials for future research projects are discussed. The corpus and materials will be available online to the research community at the end of the project.

Keywords DiSS, spontaneous speech, speech production, clear speech, interaction

Catia Cucchiarini, Joost van Doremalen, and Helmer Strik, “Fluency in non-native read and spontaneous speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 15-18. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_015.pdf.

Abstract Various studies have investigated the temporal aspects of non-native speech and their relation to perceived fluency, because fluency constitutes an important aspect of second language proficiency. For this purpose it is important to determine which measures are most strongly correlated with perceived fluency and how these measures vary. In the present study objective measures related to perceived fluency were calculated for read and spontaneous speech of non-native speakers of Dutch. The results indicate that the objective measures vary as a function of different variables. Suggestions are made for future investigations so as to facilitate comparisons between studies and meta-analyses.

Keywords DiSS, fluency, non-native speech, temporal measures

Anne Cutler, Holger Mitterer, Susanne Brouwer, and Annelie Tuinman, “Phonological competition in casual speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 43-46. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_043.pdf.

Abstract The natural processes affecting spontaneous speech production and the natural processes of spoken-word recognition combine to cause significant activation of irrelevant lexical competitors. Using eye-tracking, we show that reduced forms of words that occur in casual speech cause listeners to activate lexical candidates that resemble the reduced form but are quite unlike the canonical form of the intended word. In L2, the problem is worse: casual speech processes that occur in the L2 but not in the L1 lead to activation of irrelevant competitors even where native listeners experience no such competition.

Keywords DiSS, word recognition, competition, eyetracking

Robert Eklund, “The effect of directed and open disambiguation prompts in authentic call center data on the frequency and distribution of filled pauses and possible implications for filled pause hypotheses and data collection methodology,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 23-26. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_023.pdf.

Abstract This paper studies the frequency and distribution of filled pauses (FPs) in ecologically valid data where unaware and authentic customers called in to report problems with their telephony and/or Internet services and were met by a novel Wizard-of-Oz paradigm using real call center agents as wizards. The data analyzed were caller utterances following a directed or an open disambiguation prompt. While no significant differences in FP production were observed as a function of prompt type, FP frequency was found to be considerably higher than what is usually reported in the literature. Moreover, a higher proportion of utterance-initial FPs than normally reported was also observed. The results are compared to previously reported FP frequencies. Potential implications for data collection methodology are discussed.

Keywords DiSS, filled pauses, Wizard-of-Oz, WOZ, speech planning, speech production, many-options, data collection, open prompts, directed prompts, call center, dialog systems

Ian R. Finlayson, Robin J. Lickley, and Martin Corley, “The influence of articulation rate, and the disfluency of others, on one's own speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 119-122. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_119.pdf.

Abstract Disfluencies are a regular feature of spontaneous speech, and much has been learnt about the effects of various linguistic factors on their production. Speech usually occurs within dialogue, yet little is known about the influence of an interlocutor's speech on a speaker's own fluency. It has been shown that speakers tend to align on various levels, converging, for example, on lexical and syntactic levels. But we know little about convergence in rate of speech or disfluency. Little is also known about the effects of speech rate on fluency in a speaker's own speech. In this paper, we examine these effects through analysis of speech rate, hesitation and error correction in a corpus of task-oriented dialogues (the HCRC Map Task Corpus). Our findings demonstrate that different types of disfluencies can be influenced in different ways by speech rate. Furthermore, the probability of an interlocutor being disfluent appears to affect the speaker's own likelihood of being disfluent, raising the possibility that interlocutors may “align” on disfluent, as well as fluent, speech.

Keywords DiSS, articulation rate, alignment, accommodation theory, dialogue

Anne Garcia-Fernandez, Ioana Vasilescu, and Sophie Rosset, “euh as cue for speaker confidence and word searching in human spoken answers in French,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 79-80. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_079.pdf.

Abstract This paper deals with the contextual analysis of the vocalic hesitation euh in French in a corpus of human elicited answers. Through the analysis of the contextual combinatorial patterns, the new information introductory role of this vocalic hesitation is investigated. Observations support trends noticed in other languages and suggest potential optimization for automatic question answering systems.

Keywords DiSS, vocalic hesitation, feeling of knowing, rephrasing, interaction management, QA systems

Jean-Philippe Goldman, Mathieu Avanzi, and Antoine Auchlin, “Hesitations in read vs. spontaneous French in a multi-genre corpus,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 101-104. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_101.pdf.

Abstract This study is part of on-going work whose goal is the prosodic characterization of various speaking styles in a multi-genre 70-minute French corpus as well as the development of prosodic automatic detection tools. In this corpus, a manual annotation of prominences and disfluencies like hesitations and syntactic ruptures is used to show evident phonological aspects of hesitation in regard to quality, pause position and proximity to syntactic rupture.

Keywords DiSS, hesitation, filled pause, vowel lengthening, spoken French, disfluencies

Joakim Gustafson, and Daniel Neiberg, “Prosodic cues to engagement in non-lexical response tokens in Swedish,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 63-66. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_063.pdf.

Abstract This paper investigates the prosodic patterns of non-lexical response tokens in a Swedish call-in radio show. The feedback of a professional speaker was investigated to give insight into how to build a simulated active listener that could encourage its users to continue talking. Possible domains for such systems include customer care and second language learning. The prosodic analysis of the non-lexical response tokens showed that the engagement level decreases over time. Prosodic cues to this include change in syllabicity, pitch slope and loudness. We have also investigated prosodic alignment, to see to what extent the active listener mimics the prosody of the callers in his non-lexical response tokens.

Keywords DiSS, listener responses, prosodic cues, turn management, prosodic alignment

Corinna Harwardt, “Investigating the COG ratio as feature for speaker verification on high-effort speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 35-38. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_035.pdf.

Abstract Vocal effort mismatch in training and test data leads to immense degradations of speaker recognition systems. The changes in the acoustics of a speech signal induced by raised vocal effort are complex and, despite several studies from various authors, not completely known yet. Instead of just gaining knowledge about these differences, for automatic speaker recognition it is rather essential to discover features that remain relatively stable in changing vocal effort conditions and contain speaker-specific information. In this study we investigate the center of gravity (COG) ratio for high and mid frequency bands as a feature for speaker recognition. We find that vocal effort mismatch leads to an equal error rate (EER) more than six times higher for a standard MFCC-based GMM-UBM system. For the COG ratio we observe a much smaller degradation of around 25%. When adapting the UBM with additional high-effort speech data the EER of the COG ratio gets even better for the mismatch condition than for the matching task. Combining MFCC and the COG ratio leads to best results with an overall improvement of 16% compared to the standard MFCC-based system.

Keywords DiSS, vocal effort, speaker recognition, center of gravity ratio

Valerie Hazan, and Rachel Baker, “Does reading clearly produce the same acoustic-phonetic modifications as spontaneous speech in a clear speaking style?,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 7-10. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_007.pdf.

Abstract This paper describes an acoustic-phonetic comparison of casual and clear speech styles elicited in read and spontaneous speech. For the spontaneous speech, 20 pairs of English talkers were recorded doing a problem-solving picture task in good and degraded listening conditions. Each person also read sentences in casual and clear styles. The read clear speech was an exaggerated form of clear speech relative to the spontaneous clear speech: it had higher median F0 in both styles, a greater increase in F0 range and greater decrease in speaking rate between casual and clear styles, and trends towards greater vowel space expansion.

Keywords DiSS, spontaneous speech, read speech, clear speech, interaction, acoustic-phonetic characteristics

Pei-Yu Hsieh, “Pitch patterns in the vocalization of a 3-month-old Taiwanese infant,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 93-96. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_093.pdf.

Abstract This paper studied pitch contours of a Taiwanese-acquiring infant at the gooing stage. Breath group theory has shown that pitch patterns of this stage were physiologically based [6]. Fall was expected to occur at the boundary of a breath group. The theory predicted Fall to be the most common pitch contour and Rise-Fall to be the second most common. But previous studies [8], [9] showed that Rise-Fall occurred more. We investigated patterns of an infant from six weeks old to twelve weeks old. Mean f0 of basic contours of this stage were also shown. The f0 ranges of Level, Fall, and Rise were reported. Our results showed four types of contours (Level, Fall, Rise, Rise-Fall) appearing at this stage. Consistent with the hypothesis, Fall was found to be most common. Rise-Fall was found to be the second most common. Fall and Rise-Fall together made up almost seventy percent. Level contour was found to be rare. The mean f0 of the infant at 3 months old was 400 Hz, higher than that of a toddler at 1;3 (370 Hz) and that of an adult (220 Hz). The f0 range was 700 Hz, greater than that of a toddler at 1;3 (450 Hz), and an adult (300 Hz).

Keywords DiSS, vocalization, pitch, acquisition

Yuichi Ishimoto, and Mika Enomoto, “Analysis of prosodic features for end-of-utterance prediction in spontaneous Japanese,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 97-100. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_097.pdf.

Abstract In this study, we analyzed prosodic features of accentual phrases and investigated their temporal changes to obtain cues for detecting boundaries where turn-taking could occur in spontaneous conversations. The acoustic parameters used as prosodic features were the fundamental frequency, sound pressure level, and duration of accentual phrases in long utterance units. The results showed that the fundamental frequency shift between the first and second accentual phrases could be useful for detecting the number of accentual phrases in the long utterance unit. In addition, the results suggested that a rapid decrease in sound pressure and an extended duration of the accentual phrase constitute a cue for detecting the end of the utterance. That is, the acoustic predictor of the utterance length appeared at the beginning of the utterance, and the predictor of the utterance boundary appeared shortly before the end of the utterance.

Keywords DiSS, prosody, turn-taking, accentual phrase, long utterance unit

Kristiina Jokinen, “Hesitation and uncertainty as feedback,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 103-106. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_103.pdf.

Abstract This paper deals with the signals that are used to express hesitation and uncertainty in conversational interactions. It studies the relation between gesturing, body posture, facial expressions, and speech, and draws conclusions about their role and function in the interpretation and coordination of interaction with respect to the basic enablements of communication. Dialogues are assumed to be a cooperative activity that is constrained by the participants' roles, social obligations, and communicative situation.

Keywords DiSS, hesitation, uncertainty, interaction, speech

Takuya Kawada, “On the characteristics of three types of Japanese fillers: e-, ma-, and demonstrative-type fillers,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 27-30. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_027.pdf.

Abstract Japanese has various forms of fillers. However, the characteristics of each form have yet to be well understood. We use a large corpus of spontaneous Japanese speech and conversation and focus on three frequently observed types of fillers: e-, ma-, and demonstrative-type fillers. We show that it is possible to characterize Japanese fillers from the viewpoint of how a speaker concerns himself with the listener in the communicative setting. The type of discourse, way of speaking, and direction of gaze of the speaker influence the distribution of the types of filler.

Keywords DiSS, Japanese, fillers, spoken settings, gaze

Hanae Koiso, and Yasuharu Den, “Towards a precise model of turn-taking for conversation: a quantitative analysis of overlapped utterances,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 55-58. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_055.pdf.

Abstract In this paper, we present the outline of a new model of turn-taking that is applicable not only to smooth transitions but also to transitions involving overlapping speech. We identify acoustic, prosodic, and syntactic cues in overlapped utterances that elicit early initiation of a next turn, based on a quantitative analysis of Japanese three-party conversations, proposing a model for predicting a turn's completion in an incremental fashion using sources from units at multiple levels.

Keywords DiSS, turn-taking, overlapped utterances, incremental processing

Rebecca Lunsford, Peter A. Heeman, Lois Black, and Jan van Santen, “Autism and the use of fillers: differences between ‘um’ and ‘uh’,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 107-110. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_107.pdf.

Abstract Little research has been done to explore differences in the use of the fillers ‘um’ and ‘uh’ between children with Autistic Spectrum Disorder (ASD) and those with typical development (TD). Quantifying any differences could aid in diagnosing ASD, understanding its nature, and better understanding the mechanisms involved in dialogue processing. In this paper, we report on a study of dialogues between clinicians and children with ASD or TD, finding that the two groups of children differ substantially in their use of ‘um’ but not ‘uh’. This suggests that these two fillers result from different cognitive processes.

Keywords DiSS, disfluencies, fillers, autism

Kikuo Maekawa, “Final lowering and boundary pitch movements in spontaneous Japanese,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 47-50. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_047.pdf.

Abstract The standard theory of prosodic structure in Tokyo Japanese treats both final lowering and boundary pitch movements as properties of the utterance node. The validity of this treatment was examined by means of corpus-based analyses of spontaneous speech. The results showed that while final lowering could be treated as a property of the utterance, boundary pitch movement could not. The latter should rather be treated as a property of the accentual phrase. Based on these results, a revised prosodic structure and annotation scheme were proposed.

Keywords DiSS, final lowering, CSJ, X-JToBI, BPM

Takehiko Maruyama, Katsuya Takanashi, and Nao Yoshida, “An annotation scheme for syntactic unit in Japanese dialog,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 51-54. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_051.pdf.

Abstract In this paper, we propose a scheme for annotating syntactic units called DCU (Dialog Clause-Unit) in Japanese dialogs. Since there are no explicit devices to mark sentence boundaries in speech, precise definitions and criteria must be designed to extract syntactic units from the utterance. We show a design of DCU which consists of clausal and non-clausal units. Annotating DCU tags to eight dialogs of 40 minutes from two different dialog corpora, we examine characteristics of each dialog from the viewpoint of DCU, and compare them to the distribution of clausal-units annotated to monologs.

Keywords DiSS, dialog clause-unit, Japanese dialog and monolog, clause boundary, unit length

Sandra Merlo, and Plínio A. Barbosa, “Periodic cycles of hesitation phenomena in spontaneous speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 19-22. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_019.pdf.

Abstract To verify whether hesitation phenomena are distributed periodically in spontaneous speech, twenty speech samples produced by five male adults were analyzed. Spectral analysis allowed for three main findings. First, hesitations present stationary behavior, which implies they did not accumulate in the beginning, in the middle, or in the end of speech samples. Second, periodic cycles of hesitation phenomena were detected in all speech samples (mean cycle duration around 13 seconds). This implies that regions with more hesitations tended to regularly alternate with regions with fewer hesitations. Third, periodic cycles accounted for about 30% of variance in data.

Keywords DiSS, hesitation phenomena, time series, periodic cycles

Emi Morita, “Salientizing the breaks in talk: a study of Japanese segmentizing,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 59-62. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_059.pdf.

Abstract In naturally occurring conversation, Japanese speakers often break up their turns at talk with seemingly random or disfluent pauses that break the flow of talk into a series of successive small segments which may not be semantically coherent. Moreover, the boundaries between such segments are often made salient via the attachment of interactional particles, such as ne and sa. Empirical observation of such naturally occurring partitioning of talk reveals that such “semantically irregular” segmentation is used by both speakers and their recipients to accomplish a legitimate communicative function in managing the fine-tuned choreography of moment-by-moment conversational interaction.

Keywords DiSS, utterance segmentation, interactional particles, Japanese conversation

Daniel Neiberg, and Joakim Gustafson, “Modeling conversational interaction using coupled Markov chains,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 81-84. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_081.pdf.

Abstract This paper presents a series of experiments on automatic transcription and classification of fillers and feedbacks in conversational speech corpora. A feature combination of PCA projected normalized F0 Constant-Q Cepstra and MFCCs has been shown to be effective for standard Hidden Markov Models (HMM). We demonstrate how to model both speaker channels with coupled HMMs and show expected improvements. In particular, we explore model topologies which take advantage of predictive cues for fillers and feedback. This is done by initializing the training with special labels located immediately before fillers in the same channel and immediately before feedbacks in the other speaker channel. The average F-score for a standard HMM is 34.1%, for a coupled HMM 36.7% and for a coupled HMM with pre-filler and pre-feedback labels 40.4%. In a pilot study the detectors are found to be useful for semi-automatic transcription of feedback and fillers in socializing conversations.

Keywords DiSS, fillers, feedbacks, coupled hidden markov models, cross-speaker modeling, conversation

Hannele Nicholson, Kathleen Eberhard, and Matthias Scheutz, “"um...i don't see any": the function of filled pauses and repairs,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 89-92. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_089.pdf.

Abstract We investigate disfluency distribution rates within different moves from an interactive task-oriented experiment to further explore the suggestion by Bortfeld et al. [1] and Nicholson [2] that different types of disfluencies may fulfill varying functions. We focus on disfluency types within moves, or speech turns, where a speaker initiates something compared to a response to such a move. We find that filled pauses (FPs) such as um or uh fulfilled an interpersonal role for participants while repairs occurred out of difficulty.

Keywords DiSS, disfluency, dialogue, dialogue moves, language production

Kazuki Sekine, “Gesture correction in children,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 71-74. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_071.pdf.

Abstract Speakers sometimes modify their gestures during the process of production into disguised adaptors. Such disguised adaptors can be treated as evidence that speakers can monitor their gestures. This study investigated when disguised adaptors are produced in Japanese elementary school children. The results showed that children did not produce disguised adaptors until the age of 8. The emergence of disguised adaptors suggested that children start to monitor their gestures when they are 9 or 10 years old. Cultural influences and cognitive changes were considered as factors to influence emergence of disguised adaptors.

Keywords DiSS, spontaneous gestures, adaptors, speech error

Shu-Chuan Tseng, and Yun-Ru Huang, “A socio-phonetic analysis of Taiwan Mandarin interview speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 67-70. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_067.pdf.

Abstract This paper presents results of a socio-phonetic analysis of Taiwan Mandarin by using a corpus of questionnaire-based interview speech. Questions were asked to collect data of the interviewee's background of language use, socio-economic status, and internet access in different regions of Taiwan. Two typical dialect-influenced pronunciation errors, the deletion of /w/ before /o/ and the delabialization of /y/, were analyzed with the associated socio-economic factors and the degree of dialect exposure. The degree of dialect exposure (Southern Min) and the studied pronunciation variants are statistically correlated with the accuracy rate. But no direct correlation was found between the pronunciation variation and the socio-economic factors.

Keywords DiSS, sociophonetics, Taiwan Mandarin, interview speech

Shu-Chuan Tseng, and Tzu-Lun Lee, “Contextual effects in recognizing reduced words in spontaneous speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 39-42. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_039.pdf.

Abstract This study investigates the effects of context on recognizing reduced word forms in spontaneous speech. Sixteen high-frequency disyllabic targets, eight disyllabic and eight combinations of monosyllabic words are presented to 48 subjects in a spoken word recognition experiment in three conditions: in their original context, in isolation, and embedded in a carrier sentence. Results show that context, degree of reduction, word unit type, gender, and age group all show an effect on the accuracy rates of recognizing the target items. Most interestingly, while a meaningful context helps recognize reduced word forms, a less meaningful context inhibits the recognition more than no context.

Keywords DiSS, spoken word recognition, context effect

Shu-Chuan Tseng, Pei-Chen Tsou, Ko Kuei, and Chien-Wen Lee, “Assessing sentence repetition and narrative speech data produced by hearing-impaired and normally hearing children,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 11-14. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_011.pdf.

Abstract This paper examines sentence repetition and narrative speech data produced by hearing-impaired and normally hearing children with matched gender, age and level of speech comprehension. We assessed these two kinds of speech styles by talker intelligibility, vowel space, and spike production in plosives. In both speaking styles, normally hearing children performed better in talker intelligibility than their hearing-impaired counterparts. No clear vowel space shrinkage was observed in respect of speech style, hearing impairment, and age group. Surprisingly, the production of the spike in plosives was a useful measure for distinguishing acoustic properties of different speaking styles and hearing ability.

Keywords DiSS, speech assessment, hearing impairment, speaking style, acoustic properties

Ioana Vasilescu, Sophie Rosset, and Martine Adda-Decker, “On the functions of the vocalic hesitation euh in interactive man-machine question answering dialogs in French,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 111-114. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_111.pdf.

Abstract This paper deals with the functions of the French vocalic hesitation euh in interactive speech of man-machine question answering dialogs. The present analysis suggests that the vocalic hesitation euh may carry various properties in speech, both disfluent, signaling the speakers' efforts to put the intended message under production into appropriate words, and fluent, as markers of discourse structure. Moreover, euh seems to play a role in bracketing lexical units, pointing to the informative content within an utterance. This bracketing may favour intelligibility or decoding fluency on the listener's side. The potential contribution of the vocalic hesitation euh to lexical information bracketing is investigated with the goal of improved information processing by QA systems. Future objectives include a smarter interaction capacity by an appropriate usage of such euh items.

Keywords DiSS, disfluency, fluency, vocalic hesitation, French, discourse markers, Q/A, dialog corpus

Kun-Ching Wang, Chiun-Li Chin, and Yi-Hsing Tsai, “Voice activity detection based on combination of weighted sub-band features using auto-correlation function,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 85-88. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_085.pdf.

Abstract This paper presents a voice activity detection (VAD) method based on a combination of weighted sub-band features using the auto-correlation function. Because noise corrupts each sub-band differently, the estimated signal-to-noise ratio (SNR) is used to weight the utility rate of each frequency sub-band. Furthermore, a sub-band feature combination strategy integrates all of the weighted sub-band auto-correlation function feature parameters into a combined feature parameter. Experimental results demonstrate that the proposed VAD achieves better performance than existing standard VADs at any noise level.

Keywords DiSS, voice activity detection, auto-correlation, wavelet packet transform, sub-band weighting, feature combination
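
The idea summarized in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact feature or weighting formula, and the keywords point to a wavelet packet transform, which is replaced here by simple equal-width FFT masks. The function names, the pitch-lag range, the use of the auto-correlation peak as the per-band feature, and the linear SNR normalization are all assumptions made for illustration.

```python
import numpy as np

def split_subbands(frame, n_bands):
    """Split a frame into n_bands time-domain signals using equal-width FFT masks.
    (Assumption: stands in for the wavelet packet transform named in the keywords.)"""
    spectrum = np.fft.rfft(frame)
    edges = np.linspace(0, len(spectrum), n_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spectrum)
        masked[lo:hi] = spectrum[lo:hi]
        bands.append(np.fft.irfft(masked, n=len(frame)))
    return np.array(bands)

def acf_peak(x, min_lag, max_lag):
    """Largest normalized auto-correlation value for lags in [min_lag, max_lag]."""
    x = x - x.mean()
    energy = float(np.dot(x, x))
    if energy < 1e-12:
        return 0.0
    acf = np.correlate(x, x, mode="full")[len(x) - 1:] / energy
    return float(acf[min_lag:max_lag + 1].max())

def snr_weights(band_power, noise_power):
    """Weights proportional to the estimated linear SNR of each sub-band (hypothetical rule)."""
    snr = np.maximum(band_power - noise_power, 0.0) / np.maximum(noise_power, 1e-12)
    total = snr.sum()
    if total <= 0.0:
        return np.full(len(snr), 1.0 / len(snr))  # no band exceeds the noise floor
    return snr / total

def combined_feature(frame, noise_power, fs=8000, n_bands=4):
    """SNR-weighted sum of per-band auto-correlation peaks, plus the weights used."""
    bands = split_subbands(frame, n_bands)
    band_power = np.mean(bands ** 2, axis=1)
    weights = snr_weights(band_power, noise_power)
    min_lag, max_lag = fs // 400, fs // 80  # assumed pitch range of 80 to 400 Hz
    peaks = np.array([acf_peak(b, min_lag, max_lag) for b in bands])
    return float(np.dot(weights, peaks)), weights

# Synthetic demonstration: a harmonic (voiced-like) frame versus a noise-only frame.
rng = np.random.default_rng(0)
fs, n = 8000, 512
t = np.arange(n) / fs
noise_ref = 0.05 * rng.standard_normal(n)  # noise-only reference frame
noise_power = np.mean(split_subbands(noise_ref, 4) ** 2, axis=1)
voiced = sum(np.sin(2 * np.pi * 100 * k * t) / k for k in range(1, 31))
speech_score, speech_weights = combined_feature(voiced + 0.05 * rng.standard_normal(n), noise_power)
noise_score, _ = combined_feature(0.05 * rng.standard_normal(n), noise_power)
print(speech_score, noise_score)
```

In this sketch a frame would be labeled as speech when the combined feature exceeds a threshold tuned on held-out data. Sub-bands with little energy above the noise estimate receive small weights, so they contribute little to the decision.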

Michiko Watanabe, and Yasuharu Den, “Utterance-initial elements in Japanese: a comparison among fillers, conjunctions, and topic phrases,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 31-34. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_031.pdf.

Abstract Speakers need to plan the following part of speech under the pressure of a temporal imperative at utterance-initial positions. Each language seems to have some devices to solve this problem, which we call utterance-initial elements (UIEs). We investigated effects of two factors, boundary strengths and complexity of the following constituents, on the durations of possible UIEs, such as fillers, conjunctions, and topic phrases. We found that the last mora of filler e, as well as wa-marked topic phrases, became longer as the complexity increased in certain conditions. Possible interpretations for the results are discussed.

Keywords DiSS, utterance-initial elements, prolongation, boundary strengths, constituent complexity

Li-chiung Yang, “Meaning and use: a pragmatic and prosodic analysis of interjections in conversational speech,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 75-78. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_075.pdf.

Abstract In this paper we report on our research on the pragmatic-contextual meaning and prosody of three interjections ey, wa, and oh. A detailed qualitative-contextual analysis of our corpus shows that these interjections share important contextual and prosodic characteristics due to their similar functional status with respect to new or unexpected information. We show that there are also significant differences in contextual meaning arising from specific emotional or cognitive states, and that these differences are expressively communicated in the varied prosody of each interjection.

Keywords DiSS, prosody, meaning, interjections, discourse

Etsuko Yoshida, and Robin J. Lickley, “Disfluency patterns in dialogue processing,” in DiSS-LPSS Joint Workshop 2010 - 5th Workshop on Disfluency in Spontaneous Speech and 2nd International Symposium on Linguistic Patterns in Spontaneous Speech, Tokyo, Japan, September 2010, pp. 115-118. http://www.isca-speech.org/archive/diss_lpss_2010/papers/dl10_115.pdf.

Abstract Spontaneous speech abounds with disfluencies such as filled pauses, repairs, repetitions, false starts and prolongations, all of which are significant but easily overlooked features of speech communication. Based on comparable corpora of English and Japanese dialogues, we argue that disfluency features can have a positive effect on turn-taking issues and the establishment of common referring expressions in dialogue processing. We examined the occurrence of ten types of filled pauses in Japanese and investigated how they interact with discourse entities and the sharing of common ground. The results indicate that two patterns of disfluency features contribute to on-line speech planning of the participants and that their four functions serve to construct the collaborative process of speech communication.

Keywords DiSS, dialogue, disfluency, referring expressions, corpus, common ground

Filled Pause

Research Center

The DiSS-LPSS Joint workshop (DiSS-LPSS 2010)

Papers presented