Automatic Term Recognition using Domain Similarity and Statistical Methods

Oh, Jong-Hoon; Lee, Kyung-Soon; Choi, Key-Sun

Journal of KIISE:Software and Applications (한국정보과학회논문지:소프트웨어및응용)

Volume 29 Issue 4 / Pages 258-269 / 2002 / 1229-6848 (pISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

분야간 유사도와 통계기법을 이용한 전문용어의 자동 추출

Oh, Jong-Hoon (Dept. of Computer Science, Korea Advanced Institute of Science and Technology);
Lee, Kyung-Soon (National Institute of Informatics (NII), Japan);
Choi, Key-Sun (Dept. of Computer Science, Korea Advanced Institute of Science and Technology)

Published : 2002.04.01

Abstract

There have been many studies of automatic term recognition (ATR), and they have achieved good results. However, term extraction performance can still be improved by using additional technical dictionaries. This paper focuses on a method for extracting terms using the hierarchy among technical dictionaries. In addition, a statistical method based on frequencies, foreign words, and nested relations helps extract terms that do not appear in the dictionaries. Our method performs better than previous methods.
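The abstract does not say how domain similarity is measured or how the hierarchy among dictionaries is built. The following is a minimal sketch under stated assumptions. It treats each technical dictionary as a set of entries. It uses Jaccard overlap as a hypothetical domain-similarity measure. It then builds the hierarchy with average-link agglomerative clustering. None of these choices is confirmed as the authors' method. The function names `jaccard` and `build_hierarchy` and the toy dictionaries are illustrative only.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two entry sets; a hypothetical stand-in for domain similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_hierarchy(dictionaries: dict[str, set[str]]) -> list[tuple[frozenset, frozenset, float]]:
    """Average-link agglomerative clustering of domain dictionaries.

    Each merge is recorded as (left cluster, right cluster, similarity);
    replaying the merges in order yields a binary domain hierarchy.
    """
    clusters = [frozenset([name]) for name in dictionaries]
    merges = []

    def cluster_sim(c1: frozenset, c2: frozenset) -> float:
        # Average-link: mean pairwise similarity between member dictionaries.
        sims = [jaccard(dictionaries[x], dictionaries[y]) for x in c1 for y in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        # Merge the most similar pair of current clusters.
        left, right = max(combinations(clusters, 2), key=lambda p: cluster_sim(*p))
        sim = cluster_sim(left, right)
        clusters.remove(left)
        clusters.remove(right)
        clusters.append(left | right)
        merges.append((left, right, sim))
    return merges


# Hypothetical toy dictionaries (not data from the paper).
domains = {
    "computer": {"compiler", "kernel", "protocol", "algorithm"},
    "network": {"protocol", "router", "packet", "algorithm"},
    "chemistry": {"catalyst", "polymer", "reaction"},
}
for left, right, sim in build_hierarchy(domains):
    print(sorted(left), sorted(right), round(sim, 3))
```

In this toy input, "computer" and "network" share two of six distinct entries, so they merge first at similarity 1/3. "chemistry" joins last. Average-link was chosen because it is a common, simple option. Single- or complete-link would fit the same loop by changing `cluster_sim`.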

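Many studies have addressed automatic term recognition (ATR). These studies mainly extracted terms using simple statistical information, such as the frequency of terms within documents. However, the construction of machine-readable dictionaries for specialized domains has made it possible to use domain dictionaries for term extraction. In this paper, we use these machine-readable domain dictionaries to build a hierarchy among them and present a method that extracts terms using this hierarchy. In addition, to extract terms that do not appear in the domain dictionaries, we use statistical techniques based on term frequency, loanwords and foreign words, and nested relations. The proposed method performed better than previous methods.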
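The abstract does not give the statistical formula. One published measure that combines frequency with nested relations is the C-value of Frantzi and Ananiadou. The following minimal sketch implements that measure as an illustration only. It assumes that candidates are word tuples with known corpus frequencies. It is not confirmed to be the authors' statistic, and it leaves out the foreign-word feature. The names `is_nested` and `c_value` and the sample frequencies are hypothetical.

```python
import math


def is_nested(shorter: tuple[str, ...], longer: tuple[str, ...]) -> bool:
    """True if `shorter` occurs as a contiguous word sequence inside a longer `longer`."""
    m = len(shorter)
    return len(longer) > m and any(
        longer[i:i + m] == shorter for i in range(len(longer) - m + 1)
    )


def c_value(freq: dict[tuple[str, ...], int]) -> dict[tuple[str, ...], float]:
    """C-value of each candidate term, given corpus frequencies of all candidates."""
    scores = {}
    for a, f_a in freq.items():
        # Longer candidates that contain `a` (the nested relation).
        containers = [b for b in freq if is_nested(a, b)]
        length_weight = math.log2(len(a))  # 0 for single words, as in the original measure
        if containers:
            # Discount occurrences of `a` that are explained by longer terms.
            nested_freq = sum(freq[b] for b in containers) / len(containers)
            scores[a] = length_weight * (f_a - nested_freq)
        else:
            scores[a] = length_weight * f_a
    return scores


# Hypothetical candidate frequencies (not data from the paper).
freq = {
    ("term", "recognition"): 10,
    ("automatic", "term", "recognition"): 6,
    ("technical", "dictionary"): 4,
}
for term, score in sorted(c_value(freq).items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

Here "term recognition" drops from a raw frequency of 10 to a score of 4.0. This is because its one longer container, "automatic term recognition", accounts for 6 of its occurrences. The length weight favours longer candidates. Because log2(1) = 0, the measure as written applies only to multi-word terms.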

