
 

 

 
 
 
 

 

   

About MSIIPL -- News

 

 

 

   Two papers from MSIIPL were accepted at the 2014 IEEE Spoken Language Technology Workshop (SLT 2014). Their citation information and abstracts are given below:

   Zhiyang He, Ping Lv, Ji Wu. "Label Correlation Mixture Model for Multi-Label Text Categorization." In Proc. SLT 2014, pp. 83-88.

   Zhipeng Chen, Teng Zhang, Ji Wu. "Subword Scheme for Keyword Search." In Proc. SLT 2014, pp. 483-488.

   Abstract of the first paper:

   Multi-label text categorization is more difficult, but also more practical, than conventional binary or multi-class text categorization. This paper proposes a novel probabilistic generative model, the label correlation mixture model (LCMM), to describe multi-labeled documents; the model can be used for multi-label text categorization. In LCMM, labels and topics are in one-to-one correspondence. LCMM consists of two parts: a label correlation model and a multi-label conditioned document model. The former formulates the process that generates the labels and takes the dependencies between labels into account. We also propose an efficient algorithm for calculating the probability of generating an arbitrary subset of labels. The multi-label conditioned document model can be regarded as a supervised label mixture model in which the labels of a document are known. To evaluate LCMM, multi-label text categorization experiments are performed on three standard text data sets. The experimental results demonstrate the effectiveness of LCMM compared with other reported methods.
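   To make the idea of a "supervised label mixture" more concrete, here is a minimal sketch of how a document likelihood could be scored once its labels are known. This is not the paper's LCMM. It deliberately omits the label correlation model and assumes the following: each label's topic is a unigram word distribution, the known labels are mixed with uniform weights, and unseen words receive a small probability floor. The function name and parameters are hypothetical.

```python
import math
from typing import Dict, Sequence


def doc_log_likelihood(
    words: Sequence[str],
    labels: Sequence[str],
    topic_word: Dict[str, Dict[str, float]],
    floor: float = 1e-12,
) -> float:
    """Hypothetical sketch: log p(words | labels) under a label-conditioned mixture.

    Assumptions (not taken from the paper): one unigram topic per label
    (one-to-one label/topic correspondence), uniform mixing weights over the
    document's known labels, and a probability floor for unseen words.
    """
    if not labels:
        raise ValueError("a multi-labeled document needs at least one label")
    weight = 1.0 / len(labels)  # uniform weight for each known label
    total = 0.0
    for w in words:
        # Each word is generated by one of the document's labels/topics.
        p = sum(weight * topic_word[label].get(w, 0.0) for label in labels)
        total += math.log(max(p, floor))
    return total


topics = {
    "sports": {"match": 0.5, "team": 0.5},
    "finance": {"market": 0.5, "team": 0.5},
}
print(doc_log_likelihood(["match", "team"], ["sports"], topics))
print(doc_log_likelihood(["match", "team"], ["sports", "finance"], topics))
```

   In this toy example, the label set {sports} explains the document better than {sports, finance}. A full categorizer would also need a prior over label subsets, which is the role the abstract assigns to the label correlation model.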

   Abstract of the second paper:

   Keyword search (KWS) is an important application of spoken language technology, and Large Vocabulary Continuous Speech Recognition (LVCSR) plays an important role in KWS systems. However, for a language with a large vocabulary and a relatively small text corpus, the vocabulary size keeps growing quickly as the amount of text increases, as we observed for Tamil. This makes it difficult to train a reliable language model, which may undermine KWS performance. Subword units have been successfully employed in KWS systems to handle the out-of-vocabulary (OOV) problem. Inspired by this, we propose a novel subword scheme, designed from the perspective of pronunciation, to alleviate the large vocabulary problem. We find that the subword-based system outperforms our best word-based system on Tamil conversational telephone speech. System combination experiments show that, relative to the best word-based system, a single subword-based system contributes more complementary information than the other three word-based systems combined.
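   The abstract reports that Tamil's vocabulary keeps growing as more text is added, but it does not describe how that growth was measured. One common way to check this kind of claim is to record a vocabulary growth curve and an OOV rate. The sketch below assumes pre-tokenized text. The helper names are hypothetical, and the code does not implement the paper's subword scheme.

```python
from typing import Iterable, List, Tuple


def vocabulary_growth(tokens: Iterable[str], step: int) -> List[Tuple[int, int]]:
    """Return (tokens_seen, distinct_words) every `step` tokens, plus the final point."""
    if step <= 0:
        raise ValueError("step must be positive")
    seen = set()
    curve: List[Tuple[int, int]] = []
    n = 0
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if n % step == 0:
            curve.append((n, len(seen)))
    # Always report the final corpus size, even if it is not a multiple of step.
    if n and (not curve or curve[-1][0] != n):
        curve.append((n, len(seen)))
    return curve


def oov_rate(train_tokens: Iterable[str], test_tokens: Iterable[str]) -> float:
    """Fraction of test tokens not covered by the training vocabulary."""
    vocab = set(train_tokens)
    test = list(test_tokens)
    if not test:
        return 0.0
    return sum(1 for t in test if t not in vocab) / len(test)


print(vocabulary_growth("a b a c d a".split(), 2))
print(oov_rate(["a", "b"], ["a", "c", "d", "b"]))
```

   If the curve keeps rising steeply, the language shows the behavior the abstract describes for Tamil. Many test tokens then fall outside any fixed word vocabulary, which is the situation subword units are meant to address.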

     
 
 

Tsinghua University  |  School of Information Science and Technology  |  Department of Electronic Engineering  |  USTC iFLYTEK

 
 

Copyright © Multimedia Signal and Intelligent Information Processing Laboratory