Two papers from MSIIP have been accepted by the 2014 IEEE Spoken Language Technology Workshop (SLT 2014). Their citation
information and abstracts are given below:
Zhiyang He, Ping Lv, Ji Wu. "Label Correlation Mixture Model for Multi-Label Text Categorization," in SLT, 2014, pp. 83-88.
Zhipeng Chen, Teng Zhang, Ji Wu. "Subword Scheme for Keyword Search," in SLT, 2014, pp. 483-488.
Abstract of the first paper:
Multi-label text categorization is more difficult, but also more practical, than conventional binary or multi-class text categorization.
This paper proposes a novel probabilistic generative model, the label correlation mixture model (LCMM), to describe multi-labeled
documents; the model can be used for multi-label text categorization. In LCMM, labels and topics are in one-to-one correspondence.
LCMM consists of two parts: a label correlation model and a multi-label conditioned document model. The former formulates the
process of generating labels and takes the dependencies between labels into account. We also propose an efficient algorithm
for calculating the probability of generating an arbitrary subset of labels. The multi-label conditioned document model can be regarded
as a supervised label mixture model in which the labels of a document are known. To evaluate LCMM, we perform multi-label text
categorization experiments on three standard text data sets. The experimental results demonstrate the effectiveness of LCMM
compared with other reported methods.
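To make the idea of a supervised label mixture model more concrete, here is a minimal Python sketch of a generic multi-label mixture likelihood. It is not the LCMM formulation from the paper. It assumes that each label owns one word distribution (mirroring the one-to-one label-topic correspondence), that the document's labels are known, and that mixing weights default to uniform. The function name `doc_log_likelihood`, the toy labels, and all probabilities are hypothetical.

```python
import math

def doc_log_likelihood(words, labels, word_dists, mix=None):
    """Log-probability of `words` under a mixture of per-label word distributions.

    Hypothetical illustration only: `word_dists[label][word]` is p(word | label),
    and `mix[label]` is the mixing weight of each known label (uniform if omitted).
    """
    labels = list(labels)
    if mix is None:
        # Assumption: every known label is equally likely to emit a token.
        mix = {l: 1.0 / len(labels) for l in labels}
    total = 0.0
    for w in words:
        # Each token is drawn from one of the document's known labels, so its
        # probability is a weighted sum over those labels only.
        p = sum(mix[l] * word_dists[l].get(w, 0.0) for l in labels)
        if p == 0.0:
            # No known label can generate this word.
            return float("-inf")
        total += math.log(p)
    return total

# Toy per-label word distributions (hypothetical values).
word_dists = {
    "sports": {"game": 0.5, "team": 0.4, "market": 0.1},
    "finance": {"game": 0.1, "team": 0.1, "market": 0.8},
}
ll = doc_log_likelihood(["game", "market"], ["sports", "finance"], word_dists)
```

Here the two-word document scores log(0.3 × 0.45) = log(0.135). In a full generative model, the score of a candidate label set would also include the probability of generating that set, which is the role the abstract assigns to the label correlation model. This sketch omits that part.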
Abstract of the second paper:
Keyword search (KWS) is an important application of spoken language technology. Large Vocabulary Continuous Speech Recognition
(LVCSR) plays an important role in KWS systems. However, for a language with a large vocabulary and a relatively small text corpus,
the vocabulary size keeps growing rapidly as the amount of text increases, as we observed in Tamil. This makes it difficult to
train a reliable language model, which may degrade KWS performance. Subword units have been successfully employed in KWS systems
to handle the out-of-vocabulary (OOV) problem. Inspired by this, we propose a novel subword scheme, designed from the perspective
of pronunciation, to alleviate the large vocabulary problem. We find that the subword-based system outperforms our best word-based
system on Tamil conversational telephone speech. System combination experiments show that, relative to the best word-based system,
a single subword-based system contributes more complementary information than the other three word-based systems combined.