Original Corpus of Hesitation Phenomena (CHP)
The original corpus of hesitation phenomena (CHP) was collated in 1997-1998 as part of my master's dissertation work (University of Birmingham). I was interested in studying the use of filled pauses in spontaneous speech and the implications for how we should approach them in second language instruction. The purpose of the corpus was to gather my own primary data on filled pauses, which could be compared against what was then known about filled pauses and used as a basis for language teaching recommendations. Four native speakers of English responded to several personal questions in an interview-style format. The recordings were made using a Shure vocal microphone and a Harman-Kardon tape recorder. Although the recording environment was not sufficient for detailed acoustic analysis, it was adequate for capturing all of the speech and making basic timing measurements.
The corpus consists of about 60 minutes of recordings. I transcribed and annotated them; the annotations include word and syllable markings, tone unit boundaries with tone choice markings, hesitation phenomena markings, and speaking turn durations. In addition, silent and filled pauses are categorized coarsely by duration (i.e., short, medium, or long). The details of the methodology and mark-up are explained in the dissertation.
Findings from the CHP include some support for the idea that the open filled pause ('uh') and the closed filled pause ('um') are used under different conditions. Evidence from the CHP also suggests that there is a close parallel between lengthenings and filled pauses.
The corpus was compiled before consent forms became common practice. I am therefore still considering how I can distribute the corpus to others while protecting the rights of the original participants. I may make some portion of the corpus available in the near future. Watch the FPRC web site for updates.