This paper describes the design of speech act tags for spoken dialogue corpora and its evaluation. Compared with the tags used for conventional corpus annotation, the proposed speech intention tag is specialized enough to determine system operations. However, detailed information description increases the number of tag types. We have therefore designed an organization of tags, focusing attention on layered tagging and context-dependent tagging. Over 35,000 utterance units in the CIAIR corpus have been tagged by hand. To evaluate the reliability of the intention tag, a tagging experiment was conducted. The reliability of the tagging is evaluated by comparing the tags assigned by several annotators using the kappa value (a toy computation is sketched after these summaries). As a result, we confirmed that reliable data could be built. This corpus with speech intention tags could be widely used, from basic research to applications of spoken dialogue.

In spoken dialogues, if a spoken dialogue system does not respond at all during a user's utterances, the user might feel uneasy because the user does not know whether or not the system has recognized the utterances. In particular, back-channel utterances, which the system outputs as voices such as "yeah" and "uh huh" in English, have important roles for a driver in in-car speech dialogues because the driver does not look towards the listener while driving. This paper describes the construction of a back-channel utterance corpus and its analysis to develop a system that can output back-channel utterances at the proper timing in responsive in-car speech dialogue. First, we constructed the back-channel utterance corpus by integrating the back-channel utterances that four subjects provided for the driver's utterances in 60 dialogues in the CIAIR in-car speech dialogue corpus. Next, we analyzed the corpus and revealed the relation between back-channel utterance timings and information on bunsetsu, clause, pause, and rate of speech. Based on the analysis, we examined the possibility of detecting back-channel utterance timings by a machine learning technique (a minimal classification framing is sketched below). As the result of the experiment, we confirmed that our technique achieved the same detection capability as a human.

Speech and audio signal processing research is a tale of data collection efforts and evaluation campaigns. While large datasets for automatic speech recognition (ASR) in clean environments with various speaking styles are available, the landscape is not as picture-perfect when it comes to robust ASR in realistic environments, much less so for the evaluation of source separation and speech enhancement methods. Many data collection efforts have been conducted, moving towards more and more realistic conditions, each making different compromises between mostly antagonistic factors: financial and human cost, amount of collected data, availability and quality of annotations and ground truth, naturalness of mixing conditions, naturalness of speech content and speaking style, naturalness of the background noise, etc. In order to better understand what directions need to be explored to build datasets that best support the development and evaluation of algorithms for recognition, separation, or localization that can be used in real-world applications, we present here a study of existing datasets in terms of their key attributes.
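The inter-annotator reliability check mentioned in the first summary uses the kappa value. As a rough illustration only, here is a minimal Cohen's kappa computation for two annotators over the same utterance units; the intention tag names and counts below are invented for the example and are not from the CIAIR annotation.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same utterance units."""
    n = len(labels_a)
    # Observed agreement: fraction of units both annotators tagged identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's tag distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[t] * freq_b[t] for t in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical intention tags assigned by two annotators to six utterance units.
annotator_1 = ["request", "confirm", "request", "inform", "confirm", "inform"]
annotator_2 = ["request", "confirm", "inform",  "inform", "confirm", "inform"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.75
```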
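The back-channel study above relates timing to bunsetsu and clause boundaries, pauses, and speech rate. Below is a minimal sketch of framing timing detection as binary classification over candidate points in the driver's speech, assuming a generic logistic-regression classifier; the feature set, values, and labels are illustrative and not the paper's actual setup or data.

```python
from sklearn.linear_model import LogisticRegression

# Each candidate point in the driver's speech is described by:
# [is_bunsetsu_boundary, is_clause_boundary, pause_length_sec, speech_rate_morae_per_sec]
X_train = [
    [1, 1, 0.40, 6.5],  # clause boundary followed by a pause
    [1, 0, 0.05, 7.8],  # bunsetsu boundary, almost no pause
    [0, 0, 0.00, 8.2],  # mid-bunsetsu
    [1, 1, 0.60, 5.9],  # clause boundary, long pause
]
y_train = [1, 0, 0, 1]  # 1 = a subject produced a back-channel at this point

clf = LogisticRegression().fit(X_train, y_train)

# Decide whether the system should output a back-channel at a new candidate point.
print(clf.predict([[1, 1, 0.35, 6.8]]))  # e.g. [1]
```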