About MSIIPL -- News

On September 9th, 2014, Ph.D. candidate Zhiyang He gave an oral presentation at the conference on the paper "Minimum Classification Error Rate Training of Supervised Topic Mixture Model for Multi-label Text Categorization".

The abstract of this paper is attached below:

Multi-label text categorization, which is more difficult but also more practical than conventional binary or multi-class text categorization, has received increasing attention in recent years. Supervised topic model approaches have proven effective for multi-label text categorization, and the models in most of these approaches are trained by maximum likelihood estimation (MLE). This paper proposes a discriminative learning approach based on minimum classification error (MCE) rate training to further improve classification performance. Different properties of this method are investigated, and the experimental results demonstrate the effectiveness of the new approach: the discriminative learning approach achieves a relative improvement of more than 10% over MLE training.
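
To make the MCE criterion concrete, here is a minimal sketch of the generic multi-class MCE loss: a misclassification measure d compares the correct class's discriminant score with a soft maximum of the competitors, and a sigmoid turns d into a smooth, differentiable approximation of the 0/1 error that can be minimized by gradient descent. How the paper adapts this to a multi-label supervised topic mixture model is not described in the abstract; the scores and the parameter names eta, gamma and theta below are illustrative assumptions.

```python
import math


def misclassification_measure(scores, label, eta=2.0):
    """d = -g_label + (1/eta) * log(mean_{j != label} exp(eta * g_j)).

    d < 0 means the correct class outscores a soft maximum of its competitors.
    """
    competitors = [g for j, g in enumerate(scores) if j != label]
    top = max(competitors)  # subtract the max for numerical stability
    mean_exp = sum(math.exp(eta * (g - top)) for g in competitors) / len(competitors)
    soft_max_competitor = top + math.log(mean_exp) / eta
    return -scores[label] + soft_max_competitor


def mce_loss(scores, label, gamma=1.0, theta=0.0, eta=2.0):
    """Sigmoid-smoothed 0/1 loss: near 0 when correct, near 1 when wrong."""
    d = misclassification_measure(scores, label, eta)
    return 1.0 / (1.0 + math.exp(-gamma * d + theta))


scores = [2.0, 0.0, -1.0]  # hypothetical discriminant scores g_j(x) for three classes
print(mce_loss(scores, label=0))  # small: class 0 wins by a clear margin
print(mce_loss(scores, label=2))  # large: class 2 is outscored
```

In training, this per-sample loss would be averaged over the training set and its gradient with respect to the model parameters used for the update, in contrast to MLE, which ignores competing classes.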

Ph.D. candidate Zhiyang also gave an oral presentation on the paper "Improving Keyword Search by Query Expansion in a Probabilistic Framework".

The abstract of this paper is attached below:

Keyword search (KWS) in speech data has become an important area of research. Speech recognition errors and the out-of-vocabulary (OOV) problem are two major challenges in KWS. In this paper, a unified probabilistic framework for query expansion in KWS is proposed to counter both problems. With this framework, the posterior scores of hits are re-estimated in order to re-rank hits and to determine decision thresholds. Experiments on Vietnamese conversational telephone speech show that expanding queries with this framework significantly improves the actual term-weighted value (ATWV). Further diagnostic analysis shows that the framework is insensitive to its parameter setting and remains robust in large-scale expansion, where false alarms are very common.
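
Since the results are reported in ATWV, a small sketch of how that metric is computed may help. It follows the NIST definition, TWV = 1 - mean over terms of [P_miss + beta * P_FA] with beta = 999.9 and P_FA = N_FA / (T_speech - N_true), evaluated at the system's actual decision threshold. It does not reproduce the paper's query-expansion framework; the term, posteriors and speech duration are hypothetical.

```python
def atwv(hits, n_true, speech_seconds, threshold=0.5, beta=999.9):
    """Actual term-weighted value at a fixed decision threshold.

    hits:           term -> list of (posterior, is_correct) for each detected hit
    n_true:         term -> number of reference occurrences of the term
    speech_seconds: total duration of the searched speech, in seconds
    """
    # Terms that never occur in the reference are excluded from the average.
    terms = [t for t, n in n_true.items() if n > 0]
    total_loss = 0.0
    for term in terms:
        # Only hits whose posterior passes the threshold count as detections.
        kept = [ok for posterior, ok in hits.get(term, []) if posterior >= threshold]
        n_correct = sum(kept)
        n_false_alarm = len(kept) - n_correct
        p_miss = 1.0 - n_correct / n_true[term]
        p_fa = n_false_alarm / (speech_seconds - n_true[term])
        total_loss += p_miss + beta * p_fa
    return 1.0 - total_loss / len(terms)


hits = {"hanoi": [(0.9, True), (0.2, False)]}  # hypothetical term and posteriors
n_true = {"hanoi": 1}
print(atwv(hits, n_true, speech_seconds=3600.0))                 # 1.0
print(atwv(hits, n_true, speech_seconds=3600.0, threshold=0.1))  # about 0.722
```

Lowering the threshold admits a single false alarm here and costs about 0.28 ATWV, which illustrates why accurate hit posteriors, and thresholds derived from them, matter so much once query expansion adds many candidate hits.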


Graduate student Hongyi Ding gave an oral presentation on the paper "An Ontology Semantic Tree based Natural Language Interface".

The abstract of this paper is attached below:

As more and more ontology knowledge bases are published, users gain access to a wealth of knowledge. However, to retrieve information from ontologies, users must be familiar with the ontologies and their formal query languages. Natural language interfaces (NLIs) have therefore been proposed in recent years to bridge the gap between ontologies and non-expert users. Traditional approaches have fairly broad coverage of natural language (NL) and perform well on well-organized NL queries, but they are sensitive to word order because they lack the original semantic information of the queries. This paper proposes an NLI that accepts NL as input and generates SPARQL (SPARQL Protocol and RDF Query Language) queries as output. To analyze NL queries, an ontology semantic tree is used to represent their semantic conceptual structure with the support of the ontology. The results show that the proposed system makes effective use of semantic structure and outperforms the baseline system on queries with flexible word order.
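
As a rough illustration of why a tree-shaped semantic representation is insensitive to word order, the sketch below turns a toy semantic tree into a SPARQL SELECT query: class nodes become variables with an rdf:type pattern, entity nodes become constants, and each edge becomes a triple pattern. The Node structure, the ex: prefix, and the ontology terms River, flowsThrough and Hubei are all hypothetical; the paper's actual tree construction and query generation are not described in the abstract.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a hypothetical ontology semantic tree."""
    name: str                 # local name of an ontology class or individual
    kind: str                 # "class" (becomes a variable) or "entity" (a constant)
    relation: str = ""        # property linking this node to its parent
    children: List["Node"] = field(default_factory=list)


def to_sparql(root, prefix="http://example.org/onto#"):
    """Translate a tree rooted at a class node into a SELECT query."""
    patterns = []
    counter = [0]

    def new_var():
        var = f"?x{counter[0]}"
        counter[0] += 1
        return var

    def walk(node, subject):
        if node.kind == "class":
            patterns.append(f"{subject} a ex:{node.name} .")
        for child in node.children:
            obj = new_var() if child.kind == "class" else f"ex:{child.name}"
            patterns.append(f"{subject} ex:{child.relation} {obj} .")
            if child.kind == "class":  # only class nodes open a nested sub-pattern
                walk(child, obj)

    answer = new_var()
    walk(root, answer)
    body = "\n  ".join(patterns)
    return f"PREFIX ex: <{prefix}>\nSELECT DISTINCT {answer} WHERE {{\n  {body}\n}}"


# "Which rivers flow through Hubei?" and "Through Hubei flow which rivers?"
# would both be analyzed into this same tree, hence the same query.
tree = Node("River", "class", children=[Node("Hubei", "entity", "flowsThrough")])
query = to_sparql(tree)
print(query)
```

Because the query is generated from the tree rather than from the surface string, any reordering of the question that yields the same tree yields the same SPARQL.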

On September 14th, Zhiyang He presented a poster on the paper "An Effective and Robust Approach to Mandarin Spoken Language Understanding in Specific Domain".

   The abstract of this paper is attached below:

This paper describes an effective and robust approach, based on a finite state word network, to Mandarin spoken language understanding (SLU) in a specific domain. A grammar representation syntax is defined to efficiently specify the utterances that may be spoken in a task, and arbitrary semantic meaning can be added to the grammars conveniently. The grammars are then compiled into a finite state word network that contains both the literal and the semantic information defined by the grammars. A robust parser is implemented based on 3-dimensional dynamic programming: given a transcription from an automatic speech recognition (ASR) system, the parser searches for the path in the word network that most closely matches the recognized text, and the semantic meaning of the transcription is then extracted from this best path. Experimental results demonstrate the good performance and robustness of the proposed approach on a Mandarin SLU task.
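
The robust-parsing idea can be sketched with a simplified dynamic program: find the start-to-final path through an acyclic word network whose word sequence has the minimum edit distance (insertions, deletions, substitutions, each costing 1) to the ASR transcript, then read semantic tags off the arcs of that path. This is a plain two-dimensional (network state by transcript position) alignment, not the paper's 3-dimensional formulation, and the Arc type, the tag strings and the English placeholder words are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Arc:
    src: int
    dst: int
    word: str
    tag: Optional[str] = None  # hypothetical semantic label carried by the arc


def best_path(num_states: int, arcs: List[Arc], finals: List[int],
              words: List[str]) -> Tuple[float, List[str], List[str]]:
    """Min-edit-distance alignment of `words` against any start-to-final path.

    Assumes state 0 is the start and every arc satisfies src < dst, so
    visiting states in numeric order is a topological order.
    """
    INF = float("inf")
    m = len(words)
    cost = [[INF] * (m + 1) for _ in range(num_states)]
    back = [[None] * (m + 1) for _ in range(num_states)]
    cost[0][0] = 0
    outgoing = [[] for _ in range(num_states)]
    for arc in arcs:
        outgoing[arc.src].append(arc)

    for s in range(num_states):
        row = cost[s]
        # Insertion: transcript word j-1 has no counterpart in the network.
        for j in range(1, m + 1):
            if row[j - 1] + 1 < row[j]:
                row[j] = row[j - 1] + 1
                back[s][j] = (s, j - 1, None)
        for arc in outgoing[s]:
            nxt = cost[arc.dst]
            for j in range(m + 1):
                if row[j] == INF:
                    continue
                # Deletion: the network word was not recognized at all.
                if row[j] + 1 < nxt[j]:
                    nxt[j] = row[j] + 1
                    back[arc.dst][j] = (s, j, arc)
                # Match (cost 0) or substitution (cost 1) with transcript word j.
                if j < m:
                    c = row[j] + (0 if arc.word == words[j] else 1)
                    if c < nxt[j + 1]:
                        nxt[j + 1] = c
                        back[arc.dst][j + 1] = (s, j, arc)

    # Pick the cheapest final state, then follow back-pointers to the start.
    end = min(finals, key=lambda f: cost[f][m])
    path, tags = [], []
    s, j = end, m
    while back[s][j] is not None:
        prev_s, prev_j, arc = back[s][j]
        if arc is not None:
            path.append(arc.word)
            if arc.tag:
                tags.append(arc.tag)
        s, j = prev_s, prev_j
    return cost[end][m], path[::-1], tags[::-1]


arcs = [
    Arc(0, 1, "call"),
    Arc(1, 2, "zhang", tag="contact=zhang"),
    Arc(1, 2, "li", tag="contact=li"),
    Arc(2, 3, "please"),  # optional trailing word (state 2 is also final)
]
# "uh" is a filler word inserted by the recognizer.
print(best_path(4, arcs, finals=[2, 3], words=["uh", "call", "li"]))
# -> (1, ['call', 'li'], ['contact=li'])
```

The filler word is absorbed as a single insertion, and the semantic result ("contact=li") comes from the arcs on the best path rather than from the raw transcript, which is what makes this style of parser tolerant of recognition errors.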

Hongyi Ding presented a poster on the paper "Global Discriminative Model for Dependency Parsing in NLP Pipeline".

   The abstract of this paper is attached below:

Dependency parsing, a fundamental task in natural language processing (NLP), has attracted much interest in recent years. In real-world Chinese NLP applications, it is generally one module of an NLP pipeline, together with word segmentation and part-of-speech (POS) tagging. Because such a pipeline is a cascade system, errors from earlier stages propagate into parsing. This paper proposes a global discriminative re-ranking model that uses non-local features from word segmentation, POS tagging and dependency parsing to re-rank the parse trees produced by an N-best enhanced NLP pipeline. Experimental results indicate that the proposed model improves the performance not only of dependency parsing but also of word segmentation and POS tagging in an NLP pipeline.
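
A global re-ranker of this kind can be sketched as a linear model over sparse features of a whole candidate analysis (segmentation, POS tags and dependency tree together): each N-best candidate is scored as w · f(y), and the highest-scoring one is chosen. The abstract does not say how the model is trained; the structured-perceptron update below is one common choice and is an assumption, as are all feature names and values.

```python
from collections import defaultdict


def score(features, weights):
    """Global linear score w . f(y) over a sparse feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rerank(candidates, weights):
    """Return the index of the highest-scoring candidate in the N-best list."""
    return max(range(len(candidates)), key=lambda i: score(candidates[i], weights))


def perceptron_update(weights, gold, predicted, lr=1.0):
    """Structured-perceptron step: move toward gold features, away from predicted."""
    for name, value in gold.items():
        weights[name] += lr * value
    for name, value in predicted.items():
        weights[name] -= lr * value


# Hypothetical joint analyses from an N-best pipeline; each mixes features
# from segmentation/POS ("pos:") and parsing ("dep:") with the pipeline score.
nbest = [
    {"pipeline_score": 1.0, "dep:VV->NN": 1, "pos:NN_NN": 2},
    {"pipeline_score": 0.8, "dep:VV->NN": 2, "pos:NN_VV": 1},
]
weights = defaultdict(float, {"pipeline_score": 1.0})
before = rerank(nbest, weights)  # 0: the pipeline's own 1-best
perceptron_update(weights, nbest[1], nbest[before])  # suppose candidate 1 is gold
after = rerank(nbest, weights)   # 1: the non-local features now override it
print(before, after)
```

In practice the "gold" candidate is usually the oracle-best analysis in the N-best list, since the true analysis may not appear there at all; because every feature sees the complete candidate, errors made early in the pipeline can be corrected at re-ranking time.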

Tsinghua University  |  School of Information Science and Technology  |  Department of Electronic Engineering  |  USTC iFLYTEK

Copyright © Multimedia Signal and Intelligent Information Processing Laboratory