Search | Korea Science

Improving Elasticsearch for Chinese, Japanese, and Korean Text Search through Language Detector

Kim, Ki-Ju; Cho, Young-Bok
- Journal of Information and Communication Convergence Engineering / v.18 no.1 / pp.33-38 / 2020
Elasticsearch is an open-source search and analytics engine that can search petabytes of data in near real time. It is designed as a distributed system that is horizontally scalable and highly available. It provides RESTful APIs, which makes it programming-language agnostic. Full-text search of multilingual text requires language-specific analyzers and field mappings suited to indexing and searching such text. Additionally, a language detector can be used in conjunction with the analyzers to improve multilingual text search. Elasticsearch provides more than 40 language analysis plugins that can process text and extract language-specific tokens, as well as language detector plugins that can determine the language of a given text. This study investigates three approaches to indexing and searching Chinese, Japanese, and Korean (CJK) text (single analyzer, multi-fields, and language detector-based) and identifies the advantages of the language detector-based approach over the other two.
https://doi.org/10.6109/jicce.2020.18.1.33
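
The three approaches named in the abstract above can be sketched as index mappings. This is a minimal illustration, not the authors' configuration: the field names (`body`, `body_zh`, `lang`, and so on) and the helper functions are hypothetical, and `detect_language` is a toy Unicode-range heuristic standing in for a real language detector plugin. The analyzer names are assumed to be available: `cjk` is built into Elasticsearch, while `smartcn`, `kuromoji`, and `nori` require the corresponding analysis plugins to be installed.

```python
# Analyzer per language (assumes the analysis-smartcn, analysis-kuromoji,
# and analysis-nori plugins are installed on the cluster).
ANALYZERS = {"zh": "smartcn", "ja": "kuromoji", "ko": "nori"}


def single_analyzer_mapping():
    """Approach 1: one field, one generic CJK analyzer for all text."""
    return {"properties": {"body": {"type": "text", "analyzer": "cjk"}}}


def multi_fields_mapping():
    """Approach 2: one field indexed several times, once per analyzer."""
    sub_fields = {
        lang: {"type": "text", "analyzer": analyzer}
        for lang, analyzer in ANALYZERS.items()
    }
    body = {"type": "text", "analyzer": "cjk", "fields": sub_fields}
    return {"properties": {"body": body}}


def language_detector_mapping():
    """Approach 3: one field per language, filled according to detection."""
    props = {
        f"body_{lang}": {"type": "text", "analyzer": analyzer}
        for lang, analyzer in ANALYZERS.items()
    }
    props["body"] = {"type": "text", "analyzer": "cjk"}  # fallback field
    props["lang"] = {"type": "keyword"}
    return {"properties": props}


def detect_language(text):
    """Toy stand-in for a language detector: guesses from Unicode ranges."""
    has_han = False
    for ch in text:
        cp = ord(ch)
        if 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF:
            return "ko"  # Hangul syllables or jamo
        if 0x3040 <= cp <= 0x30FF:
            return "ja"  # hiragana or katakana
        if 0x4E00 <= cp <= 0x9FFF:
            has_han = True  # Han ideograph; keep scanning for kana/Hangul
    return "zh" if has_han else "unknown"


def build_document(text):
    """Route text to the field whose analyzer matches the detected language."""
    lang = detect_language(text)
    field = f"body_{lang}" if lang in ANALYZERS else "body"
    return {field: text, "lang": lang}
```

With the third mapping, each document is analyzed once by the analyzer for its detected language, whereas the multi-fields mapping analyzes every document with every analyzer. A query can be routed the same way, by detecting the language of the query string and searching only the matching field.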

A Study on User Satisfaction with CJK Romanization in the OCLC WorldCat System (도서관 서지정보의 한중일 로마자표기법에 대한 이용자 만족도 연구)

Ha, Yoo-Jin
- Journal of the Korean Society for Information Management / v.27 no.2 / pp.95-115 / 2010
The purpose of this study is to investigate how individuals assess Chinese, Japanese, and Korean (CJK) transliterated bibliographic information in current library catalogs. Two separate studies, a survey and an experiment, were conducted using the WorldCat system. Users noted that Romanization has many issues that can inhibit a user's ability to understand transliterated bibliographic information, even when it is in the person's own native language and even when the individual has extensive experience with transliteration systems. The experimental results supported these findings: participants had better results and higher satisfaction when looking for information written in English than when searching for transliterated information written in their native language. Implications for future research suggest a need to investigate user preferences for translation versus transliteration of bibliographic information. This study proposes considering English translation as a parallel link alongside CJK Romanization for bibliographic information.
https://doi.org/10.3743/KOSIM.2010.27.2.095

CJK Chinese Character-Korean Character Conversion Keyword Domain Name System in Software Defined Network (소프트웨어 정의 네트워크를 이용한 한중일 한자-한국어 변환 키워드 도메인 이름 시스템)

Lee, SeungHun; Cho, SungChol; Xue, Yuanyuan; Lu, Kai; Xiang, Tiange; Han, Sunyoung
- Annual Conference on Human and Language Technology / 2019.10a / pp.339-342 / 2019
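This paper proposes a keyword domain name system that converts CJK Chinese characters to Korean using a software-defined network (SDN). Because Korea, China, and Japan, which mainly use Chinese character systems, together have too many Chinese characters to cover, the study first worked with CJK808, the set of Chinese characters used in common by the three countries. The study also uncovered many characteristics of each country's characters within CJK808, among which the diversity of standard and variant forms was the most prominent. Using SDN brings various advantages in terms of management. With the proposed system, users who enter Korean, Chinese, or Japanese Chinese characters can obtain an IP address through a domain name server managed by the SDN.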

Korean-Chinese Person Name Translation for Cross Language Information Retrieval

Wang, Yu-Chun; Lee, Yi-Hsun; Lin, Chu-Cheng; Tsai, Richard Tzong-Han; Hsu, Wen-Lian
- Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.489-497 / 2007
Named entity translation plays an important role in many applications, such as information retrieval and machine translation. In this paper, we focus on translating person names, the most common type of named entity in Korean-Chinese cross-language information retrieval (KCIR). Unlike other languages, Chinese uses characters (ideographs), which makes person name translation difficult because one syllable may map to several Chinese characters. We propose an effective hybrid person name translation method to improve the performance of KCIR. First, we use Wikipedia as a translation tool based on the inter-language links between the Korean edition and the Chinese or English editions. Second, we adopt the Naver people search engine to find the query name's Chinese or English translation. Third, we extract Korean-English transliteration pairs from Google snippets, and then search for the English-Chinese transliteration in the database of Taiwan's Central News Agency or in Google. The performance of KCIR using our method is over five times better than that of a dictionary-based system. The mean average precision is 0.3490 and the average recall is 0.7534. The method can handle Chinese, Japanese, and Korean person names, as well as non-CJK ones, when translating from Korean to Chinese. Hence, it substantially improves the performance of KCIR.
