Building an Annotated English-Vietnamese Parallel Corpus for Training Vietnamese-related NLPs

Dien Dinh; Kiem Hoang

The Institute of Electronics and Information Engineers: Conference Proceedings (Proceedings of the IEEK Conference)

The Institute of Electronics and Information Engineers, 2004 ICEIC: The International Conference on Electronics Informations and Communications, pp. 103-109, 2004

The Institute of Electronics and Information Engineers (대한전자공학회)

Dien Dinh (Faculty of Information Technology, University of Natural Sciences) ;
Kiem Hoang (Center of Information Technology Development, Vietnam National University of HCMC)

Published: 2004.08.01

Abstract

In NLP (Natural Language Processing) tasks, the greatest difficulty computers face is the inherent ambiguity of natural languages. Earlier systems resolved this ambiguity with human-devised rules. Building a complete rule set is time-consuming and labor-intensive, yet it still does not cover all cases. Moreover, as a system grows, the rule set becomes very difficult to control. For these reasons, many NLP tasks have recently shifted from rule-based approaches to corpus-based approaches that rely on large annotated corpora. Corpus-based NLP for widely used languages such as English and French has been well studied, with satisfactory results. In contrast, corpus-based NLP for Vietnamese is at a deadlock because annotated training data is lacking. Furthermore, hand-annotation of even reasonably well-determined features such as part-of-speech (POS) tags has proved to be labor-intensive and costly. In this paper, we present the construction of an annotated, aligned English-Vietnamese parallel corpus, named EVC, for training Vietnamese-related NLP tasks such as word segmentation, POS tagging, word-order transfer, word sense disambiguation, and English-to-Vietnamese machine translation.
