• Title/Summary/Keyword: unicode chinese character

A Chinese Character (Hanja) Input System Based on Unicode 3.0 (유니코드 3.0 한자 입력시스템)

  • 윤지헌;변정용
    • Proceedings of the Korean Information Science Society Conference / 2000.04b / pp.375-377 / 2000
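  • The rapid spread of the Internet has changed many aspects of everyday life, most notably through e-commerce and online documents. In the past, e-commerce and online documents existed only on each country's own text-based PC communication services, but today most of them are connected to the Internet. For people around the world to use e-commerce services and online documents, a universal character code therefore became necessary. This demand led to the establishment of the ISO 10646 code, which has since developed into the current Unicode 3.0. Unicode 3.0 covers the scripts of many countries, including about 27,000 Chinese characters (Hanja) commonly used across the Hanja cultural sphere of Korea, China, and Japan. This is far more than the roughly 4,800 Hanja in the former Korean national standard, the precomposed (Wansung) code. With the advent of Unicode, documents containing Hanja, such as classical texts and legal codes from Korea and abroad, can be provided on the Internet. At present, however, the only way to enter Unicode Hanja is the Hanja input method of MS Word 2000, and almost no such facility exists on other operating systems or in Internet environments. In this paper, we research and develop a Unicode Hanja input system that works independently of the operating system.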
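
The figure of about 27,000 Hanja can be checked against the code-point ranges that Unicode 3.0 assigned to unified CJK ideographs: the original block from U+4E00 to U+9FA5 and Extension A (added in 3.0) from U+3400 to U+4DB5. The Python sketch below does that arithmetic and adds a membership test of the kind an input system might use to validate candidate characters. The helper name `is_unicode30_hanja` is illustrative only, and CJK Compatibility Ideographs are deliberately left out of the count.

```python
# Unified CJK ideograph ranges as assigned in Unicode 3.0 (inclusive bounds).
# Compatibility ideographs (U+F900 block) are intentionally not counted here.
UNICODE30_HANJA_RANGES = {
    "CJK Unified Ideographs": (0x4E00, 0x9FA5),
    "CJK Unified Ideographs Extension A": (0x3400, 0x4DB5),
}


def range_size(bounds: tuple[int, int]) -> int:
    """Number of code points in an inclusive range."""
    lo, hi = bounds
    return hi - lo + 1


def is_unicode30_hanja(ch: str) -> bool:
    """True if a single character lies in one of the Unicode 3.0 unified ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UNICODE30_HANJA_RANGES.values())


if __name__ == "__main__":
    for name, bounds in UNICODE30_HANJA_RANGES.items():
        print(f"{name}: {range_size(bounds)}")
    total = sum(range_size(b) for b in UNICODE30_HANJA_RANGES.values())
    print(f"Total: {total}")  # 20,902 + 6,582 = 27,484
    print(is_unicode30_hanja("漢"), is_unicode30_hanja("한"))  # True False
```

The 27,484 total is consistent with the abstract's "about 27,000"; a Hangul syllable such as 한 falls outside both ranges.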

Study on the prerequisite Chinese characters for the education of traditional Korean medicine (한의학 교육을 위한 필수한자 추출 및 분석연구)

  • Hwang, Sang-Moon;Lee, Byung-Wook;Shin, Sang-Woo;Cho, Su-In;Yim, Yun-Kyoung;Chae, Han
    • Journal of Korean Medical Classics / v.24 no.5 / pp.147-158 / 2011
  • There has been a need for an operational curriculum for teaching the Chinese characters used in traditional Korean medicine (TKM), but the subject has not been thoroughly reviewed so far. We analysed the frequency of Unicode Chinese characters in five TKM textbooks used as a national standard. We found that 氣, 經, 陽, 陰, 不, 熱, 血, 脈, 病, 證, 寒, 中, 心, 痛, 虛, 大, 生, 治, 本, and 之 are the 20 most frequently used Chinese characters, and we also present the 100 most frequently used characters for each textbook. Using a cumulative frequency analysis, we propose a list of 1,000 prerequisite Chinese characters for TKM education (TKM 1000), which reflects the current usage of Chinese characters in TKM and, combined with MEST 1800, covers 99% of character usage in the textbooks. This study identifies the prerequisite and essential Chinese characters needed to implement evidence-based teaching in TKM. The TKM 1000, derived in this study from the TKM textbooks, can be used to develop the Korean Medicine Education Eligibility Test (KEET), the entrance exam for the Colleges of Oriental Medicine, as well as textbooks and educational curricula for premed students.
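
The cumulative frequency analysis can be sketched as follows, under assumptions the abstract does not spell out: Hanja are counted across all textbooks, a set of already-known characters (standing in for MEST 1800) counts toward coverage without being added to the new list, and the remaining characters are taken in descending frequency until a target coverage such as 99% is reached. The names `prerequisite_list`, `is_hanja`, and their parameters are hypothetical, and the actual TKM 1000 procedure may differ.

```python
from collections import Counter
from collections.abc import Iterable


def is_hanja(ch: str) -> bool:
    """Rough test for CJK ideographs (Extension A, the main unified block,
    and compatibility ideographs); supplementary planes are ignored here."""
    cp = ord(ch)
    return (0x3400 <= cp <= 0x4DBF or 0x4E00 <= cp <= 0x9FFF
            or 0xF900 <= cp <= 0xFAFF)


def prerequisite_list(texts: Iterable[str], known: frozenset[str] = frozenset(),
                      coverage: float = 0.99) -> tuple[list[str], float]:
    """Pick the most frequent characters not in `known` until the combined set
    (known + selected) covers `coverage` of all Hanja occurrences."""
    counts = Counter(ch for text in texts for ch in text if is_hanja(ch))
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    # Occurrences already covered by characters the learner is assumed to know.
    covered = sum(n for ch, n in counts.items() if ch in known)
    selected: list[str] = []
    for ch, n in counts.most_common():  # descending frequency
        if covered / total >= coverage:
            break
        if ch in known:
            continue  # already covered; do not add to the new list
        selected.append(ch)
        covered += n
    return selected, covered / total
```

A call would look like `prerequisite_list(textbooks, known=frozenset(mest_1800), coverage=0.99)`, where `textbooks` and `mest_1800` are hypothetical inputs holding the textbook texts and the MEST 1800 characters. Counting coverage over all occurrences (tokens) rather than distinct characters mirrors the idea that a list "covers 99% of usage".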

Language-Independent Word Acquisition Method Using a State-Transition Model

  • Xu, Bin;Yamagishi, Naohide;Suzuki, Makoto;Goto, Masayuki
    • Industrial Engineering and Management Systems / v.15 no.3 / pp.224-230 / 2016
  • New words, colloquial expressions, and abbreviations are used extensively on the Internet, which makes it very difficult to acquire words automatically for analyzing Internet content. In a previous study, we proposed a method for Japanese word segmentation using character N-grams. That method is based on a simple state-transition model, which assumes that the input document can be described by four predefined states, A, B, C, and D: state A represents words (nouns, verbs, etc.); state B represents statement separators (punctuation marks, conjunctions, etc.); state C represents postpositions (words that follow nouns); and state D represents prepositions (words that precede nouns). Under this model, a state is applied to each pseudo-word and the document is searched from beginning to end for accessible patterns; in other words, words are detected as the search follows these transitions. In the present paper, we run experiments with the proposed word acquisition algorithm on Japanese and Chinese newspaper articles, obtained from Kyoto University in Japan and from the People's Daily in China, respectively. The proposed method does not depend on language structure: if the text is encoded in Unicode, the same algorithm can acquire words in both Japanese and Chinese, neither of which places spaces between words. Hence, we demonstrate that the proposed method is language independent.
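
The abstract attributes the method's language independence to working on Unicode characters. As a minimal illustration, and not the authors' algorithm, the Python sketch below generates candidate pseudo-words as repeated character N-grams, the unit the earlier segmentation method was built on. Because Python strings are sequences of Unicode code points, the same code runs unchanged on Japanese and Chinese text without spaces. The function names and the `max_n` and `min_count` thresholds are hypothetical; assigning states A to D and searching for accessible patterns is not reproduced here.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """All contiguous n-character substrings; str indexing is by code point."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def candidate_pseudo_words(text: str, max_n: int = 3, min_count: int = 2) -> Counter:
    """Count 1..max_n character n-grams and keep those seen at least min_count times."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        counts.update(char_ngrams(text, n))
    return Counter({g: c for g, c in counts.items() if c >= min_count})


if __name__ == "__main__":
    # Same call, no language-specific settings, for Japanese and Chinese input.
    print(candidate_pseudo_words("東京大学と京都大学"))
    print(candidate_pseudo_words("人民日报报道人民", max_n=2))
```

On the Japanese sample, 大学 surfaces as a repeated bigram; on the Chinese sample, 人民 does. A real system would still need the state-transition search to decide which candidates are words.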