[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3745/KIPSTB.2004.11B.3.387

Construction of Linearly Aligned Corpus Using Unsupervised Learning

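Lee, Kong-Joo (School of Computer and Information Technology, Kyungin Women's College)
Kim, Jae-Hoon (Department of Computer Engineering, Korea Maritime University)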

Publication Information

The KIPS Transactions: Part B / v.11B, no.3, 2004, pp. 387-394

Abstract

In this paper, we propose a modified unsupervised linear alignment algorithm for building an aligned corpus. The original algorithm inserts null characters into both aligned strings (the source string and the target string) because the two strings differ in length. These null characters cause difficulties for applications that use the aligned corpus: the search space can explode, and several machine learning algorithms cannot be applied. To alleviate these difficulties, we modify the algorithm so that the aligned source strings contain no null characters. We show the usability of our approach by applying it to three tasks: Korean-English back-transliteration, English grapheme-phoneme conversion, and Korean morphological analysis.
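
The role of null characters in a linear alignment can be illustrated with a small sketch. The code below is not the authors' algorithm: it assumes plain unit-cost edit distance rather than costs learned without supervision, and the merging step in remove_source_nulls is a hypothetical way to obtain source strings without null characters. The names NULL, align, and remove_source_nulls are introduced here for illustration only.

```python
NULL = "_"  # stand-in for the null character (assumed symbol, not from the paper)


def align(source, target):
    """Align two strings with unit-cost edit distance, padding with NULL."""
    n, m = len(source), len(target)
    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dist[i - 1][j - 1] + (source[i - 1] != target[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                dist[i][j] == dist[i - 1][j - 1] + (source[i - 1] != target[j - 1])):
            pairs.append((source[i - 1], target[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((source[i - 1], NULL))  # null on the target side
            i -= 1
        else:
            pairs.append((NULL, target[j - 1]))  # null on the source side
            j -= 1
    pairs.reverse()
    return pairs


def remove_source_nulls(pairs):
    """Fold target symbols aligned to a source NULL into a neighbouring pair.

    Assumes a non-empty source string; with an empty source there is no
    pair to attach the target symbols to.
    """
    merged = []
    pending = ""  # target symbols seen before the first real source symbol
    for s, t in pairs:
        if s == NULL:
            if merged:
                prev_s, prev_t = merged[-1]
                merged[-1] = (prev_s, t if prev_t == NULL else prev_t + t)
            else:
                pending += t
        else:
            if pending:
                t = pending if t == NULL else pending + t
                pending = ""
            merged.append((s, t))
    return merged


pairs = align("ab", "axb")
print(pairs)                       # [('a', 'a'), ('_', 'x'), ('b', 'b')]
print(remove_source_nulls(pairs))  # [('a', 'ax'), ('b', 'b')]
```

In this sketch, every source position ends up paired with zero or more target symbols, so a downstream learner can treat each source symbol as one instance to label. Null characters may still appear on the target side, which matches the abstract's statement that only the aligned source strings are kept free of them.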

Keywords

Unsupervised Learning; Edit Distance; Linear Alignment; Korean-English (Back) Transliteration; Grapheme-Phoneme Conversion; Korean Morphological Segmentation

