A New Approach of Domain Dictionary Generation

  • Xi, Su Mei (College of Information Technology, The University of Suwon) ;
  • Cho, Young-Im (College of Information Technology, The University of Suwon) ;
  • Gao, Qian (College of Information Technology, The University of Suwon)
  • Received : 2012.01.11
  • Accepted : 2012.03.14
  • Published : 2012.03.25

Abstract

This paper presents a domain dictionary generation algorithm based on a pseudo feedback model, which improves the precision of domain dictionary generation. Generating a domain dictionary is treated as a domain term retrieval process: the top N strings in the original retrieval result set are assumed to be relevant to the domain C, these strings are appended to the dictionary, and retrieval is run again. The process is iterated until a predefined number of domain terms has been generated. Experiments on a corpus show that the precision of the pseudo feedback model based algorithm is much higher than that of existing algorithms.
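
The iterative loop described in the abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how strings are scored or how the first retrieval is seeded, so the names `generate_domain_terms`, `seed_terms`, `score`, and `char_overlap_score` are hypothetical, and the character-overlap scorer is a toy stand-in for whatever relevance measure the paper actually uses.

```python
from typing import Callable, Iterable, List, Set


def generate_domain_terms(
    candidates: Iterable[str],
    seed_terms: Iterable[str],
    score: Callable[[str, Set[str]], float],
    top_n: int,
    num_terms: int,
) -> List[str]:
    """Pseudo feedback loop: retrieve, accept the top N, retrieve again.

    Assumption: seed_terms stand in for the initial query on the domain C,
    and num_terms counts generated terms, not the seeds.
    """
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    seeds = set(seed_terms)
    # De-duplicate candidates while keeping their original order.
    remaining = [c for c in dict.fromkeys(candidates) if c not in seeds]
    generated: List[str] = []
    while len(generated) < num_terms and remaining:
        dictionary = seeds | set(generated)
        # Retrieval step: rank the remaining strings against the dictionary.
        ranked = sorted(remaining, key=lambda c: score(c, dictionary), reverse=True)
        # Feedback step: assume the top N strings are relevant and append them.
        room = num_terms - len(generated)
        accepted = ranked[: min(top_n, room)]
        generated.extend(accepted)
        remaining = [c for c in remaining if c not in accepted]
    return generated


def char_overlap_score(candidate: str, dictionary: Set[str]) -> float:
    """Toy relevance score (an assumption): best Jaccard overlap of character sets."""
    chars = set(candidate)
    return max(
        (len(chars & set(t)) / len(chars | set(t)) for t in dictionary),
        default=0.0,
    )
```

Because each round re-ranks the remaining strings against the enlarged dictionary, terms accepted early influence which strings are retrieved later. That is also the main risk of pseudo feedback: a wrongly accepted string is treated as relevant in every subsequent round.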
