• Title/Summary/Keyword: Chinese word segmentation


A Reverse Segmentation Algorithm of Compound Nouns Using Affix Information and Preference Pattern (접사정보 및 선호패턴을 이용한 복합명사의 역방향 분해 알고리즘)

  • Ryu, Bang;Baek, Hyun-Chul;Kim, Sang-Bok
    • Journal of Korea Multimedia Society / v.7 no.3 / pp.418-426 / 2004
  • This paper proposes a reverse segmentation algorithm for Korean compound nouns that uses affix information and preference pattern information. Korean compound nouns are mostly derived from Chinese characters, and their structure includes certain preference patterns, which this paper uses as segmentation rules. To evaluate the accuracy of the proposed algorithm, an experiment was performed with 36,061 compound nouns. The algorithm segmented 99.3% of them correctly and showed excellent results in a comparison with other algorithms; in particular, most four- and five-syllable compound nouns were segmented correctly.

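
The abstract does not spell out the segmentation procedure. As a rough illustration of what scanning a compound "in reverse" can look like, here is a minimal sketch of plain backward (right-to-left) maximum matching against a small lexicon. It is not the authors' algorithm: the affix information and preference patterns the paper relies on are not modeled, and the function name `reverse_segment`, the `max_len` limit, and the toy lexicon are all hypothetical.

```python
def reverse_segment(word, lexicon, max_len=4):
    """Backward maximum matching (illustrative sketch, not the paper's method).

    Starting at the right edge of the word, repeatedly take the longest
    lexicon entry that ends there; fall back to a single syllable if no
    entry matches, so the loop always makes progress.
    """
    segments = []
    end = len(word)
    while end > 0:
        # Try the longest candidate first, shrinking down to one syllable.
        for size in range(min(max_len, end), 0, -1):
            piece = word[end - size:end]
            if size == 1 or piece in lexicon:
                segments.append(piece)
                end -= size
                break
    # Segments were collected right to left; restore reading order.
    segments.reverse()
    return segments


# Hypothetical toy lexicon; a real system would use a large noun dictionary.
print(reverse_segment("한국어정보처리", {"한국어", "정보", "처리"}))
```

With this toy lexicon the compound "한국어정보처리" (Korean information processing) splits into 한국어 / 정보 / 처리. The single-syllable fallback is a design choice for the sketch; a real system would more likely consult affix lists at that point.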

Chinese Segmentation and POS-Tagging by Automatic POS Dictionary Training (품사 사전 자동 학습을 통한 중국어 단어 분할 및 품사 태깅)

  • Ha, Ju-Hong;Zheng, Yu;Lee, Gary G.
    • Annual Conference on Human and Language Technology / 2002.10e / pp.33-39 / 2002
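  • Because Chinese sentences have no explicit boundaries between words, part-of-speech (POS) tagging of Chinese requires word segmentation and POS tagging to be handled together. This paper segments input sentences into words with a word segmentation system that combines rule-based and dictionary-based techniques, and then applies statistical POS tagging based on a hidden Markov model (HMM). In particular, the POS dictionary is built by automatic training on a given corpus, which keeps the implemented system independent of the corpus. The corpora cover both Simplified and Traditional Chinese, and word segmentation and POS tagging are performed with the POS dictionary obtained by automatic training on each corpus. The experimental results show word segmentation and POS tagging performance for Simplified and Traditional Chinese separately.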

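
The abstract says the POS dictionary is built by automatic training on a corpus but gives no details. A minimal sketch of one straightforward way to do this, assuming a tagged corpus in a hypothetical "word/TAG" token format, is to count how often each word carries each tag and store the relative frequencies. The resulting word-to-tags table can then supply candidate tags to a tagger. The function name `build_pos_dictionary`, the input format, and the Penn Chinese Treebank-style tags in the example are assumptions for illustration.

```python
from collections import Counter, defaultdict


def build_pos_dictionary(tagged_sentences):
    """Build {word: {tag: P(tag | word)}} from a tagged corpus (sketch).

    Each sentence is assumed to be a string of space-separated
    "word/TAG" tokens; P(tag | word) is estimated by relative frequency.
    """
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for token in sentence.split():
            # rpartition splits on the last "/", so a word containing "/"
            # keeps its full form and only the final field is the tag.
            word, _, tag = token.rpartition("/")
            counts[word][tag] += 1

    dictionary = {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        dictionary[word] = {tag: n / total for tag, n in tag_counts.items()}
    return dictionary


corpus = [
    "他/PN 研究/VV 语言/NN",
    "语言/NN 研究/NN 很/AD 重要/VA",
]
pos_dict = build_pos_dictionary(corpus)
print(pos_dict["研究"])
```

Here 研究 appears once as a verb and once as a noun, so it gets two candidate tags with probability 0.5 each. Because the table is derived entirely from whatever corpus is supplied, the same code works unchanged for Simplified or Traditional Chinese data, which is the kind of corpus independence the abstract describes.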

Corpus Based Unrestricted vocabulary Mandarin TTS (코퍼스 기반 무제한 단어 중국어 TTS)

  • Yu Zheng;Ha Ju-Hong;Kim Byeongchang;Lee Gary Geunbae
    • Proceedings of the KSPS conference / 2003.10a / pp.175-179 / 2003
  • To produce high-quality (intelligible and natural) synthesized speech, accurate grapheme-to-phoneme conversion and an accurate prosody model are essential. In this paper, we analyzed Chinese text using word segmentation, POS tagging and unknown-word recognition. We present a grapheme-to-phoneme conversion method that combines dictionary-based and rule-based approaches. We constructed a prosody model using a probabilistic method and a decision-tree-based error correction method. Based on the results of this analysis, the exact syllable synthesis units can be successfully selected from the Chinese synthesis DB and concatenated.


Maximum Likelihood-based Automatic Lexicon Generation for AI Assistant-based Interaction with Mobile Devices

  • Lee, Donghyun;Park, Jae-Hyun;Kim, Kwang-Ho;Park, Jeong-Sik;Kim, Ji-Hwan;Jang, Gil-Jin;Park, Unsang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.9 / pp.4264-4279 / 2017
  • In this paper, maximum likelihood-based automatic lexicon generation using mixed syllables is proposed for an unlimited-vocabulary voice interface for East Asian languages (e.g., Korean, Chinese and Japanese) in AI-assistant-based interaction with mobile devices. The conventional lexicon has two inevitable problems: 1) the tedious, repeated addition of out-of-lexicon units to the lexicon, and 2) the propagation of errors during morpheme analysis and space segmentation. The proposed method provides an automatic framework that solves both problems. When a sentence contains one out-of-lexicon word, the proposed method achieves overall accuracy similar to that of previous methods; when a sentence contains two, three or four out-of-lexicon words, it achieves absolute word-accuracy improvements of 1.62%, 5.58% and 10.09%, respectively.

Sentence design for speech recognition database

  • Zu Yiqing
    • Proceedings of the KSPS conference / 1996.10a / pp.472-472 / 1996
  • The material of a speech recognition database should include as many phonetic phenomena as possible. At the same time, such material should be phonetically compact, with low redundancy [1, 2]. Phonetic phenomena in continuous speech are the key problem in speech recognition. This paper describes the processing of a set of sentences collected from the 1993 and 1994 "People's Daily" (a Chinese newspaper) database, which covers news, politics, economics, arts, sports, etc. These sentences include both phonetic phenomena and sentence patterns. In continuous speech, phonemes always appear as allophones shaped by coarticulation effects, so the design of a speech database should consider both intra-syllabic and inter-syllabic allophone structures. In our experiments, there are 404 syllables, 415 inter-syllabic diphones, 3050 merged inter-syllabic triphones and 2161 merged final-initial structures in read speech. Statistics on the "People's Daily" database provide an evaluation of all possible phonetic structures. In this sentence set, we first consider the phonetic balance among syllables, inter-syllabic diphones, inter-syllabic triphones and semi-syllables with their junctures. Syllabic balance ensures coverage of intra-syllabic phenomena such as phonemes, initials/finals and consonants/vowels; the rest describes the inter-syllabic junctures. The 1560 sentences cover 96% of toneless syllables (the absent syllables occur only in spoken language), 100% of inter-syllabic diphones and 67% of inter-syllabic triphones (87% of which appear in People's Daily). Roughly 17 kinds of sentence patterns appear in our sentence set. By taking the transitions between syllables into account, Chinese speech recognition systems have achieved markedly high recognition rates [3, 4]. The following figure shows the process of collecting sentences.
[People's Daily database] -> [sentence segmentation] -> [word-group segmentation] -> [conversion of the text into Pinyin] -> [statistics on phonetic phenomena and selection of useful paragraphs] -> [manual revision of the selected sentences] -> [phonetically compact sentence set]

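
The abstract reports coverage figures for the selected sentences and describes a statistical selection step followed by manual revision, but it does not say how sentences were chosen. A common way to build a phonetically compact set is greedy, set-cover-style selection: repeatedly pick the sentence that adds the most not-yet-covered units. The sketch below assumes that approach. It treats each sentence as a list of toneless Pinyin syllables and uses adjacent syllable pairs as a simplified stand-in for inter-syllabic diphones and triphones. The function names `units`, `greedy_select` and `coverage`, and the toy data, are hypothetical.

```python
def units(syllables):
    """Units a sentence covers: each syllable plus each adjacent
    syllable pair (a simplified stand-in for an inter-syllabic juncture)."""
    found = set(syllables)
    found.update(zip(syllables, syllables[1:]))
    return found


def greedy_select(sentences, max_sentences):
    """Greedily pick sentence indices that add the most uncovered units."""
    covered = set()
    chosen = []
    remaining = list(range(len(sentences)))
    while remaining and len(chosen) < max_sentences:
        # Ties are broken in favor of the earliest remaining sentence.
        best = max(remaining, key=lambda i: len(units(sentences[i]) - covered))
        gain = units(sentences[best]) - covered
        if not gain:
            break  # no remaining sentence adds anything new
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen, covered


def coverage(covered, universe):
    """Fraction of the target units that the selected set covers."""
    return len(covered & universe) / len(universe) if universe else 0.0


sentences = [
    ["ren", "min", "ri", "bao"],
    ["ren", "min"],
    ["xin", "wen", "bao", "dao"],
]
chosen, covered = greedy_select(sentences, max_sentences=10)
universe = set().union(*(units(s) for s in sentences))
print(chosen, coverage(covered, universe))
```

In the toy data, the second sentence adds nothing beyond the first, so only sentences 0 and 2 are kept while still reaching full unit coverage. This redundancy pruning is the point of a "phonetically compact" set. Greedy selection is not guaranteed to find the smallest covering set, which is one reason a manual revision pass like the one in the figure can still be useful.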

Chinese Consumers' Satisfaction with On-line Purchasing Agent Services of Korean Fashion Products according to Their Selection Criteria and Information Source (중국 소비자의 패션상품 선택기준과 정보원 이용에 따른 한국 패션상품 온라인 구매대행 서비스 만족도: 상해지역 20-30대를 중심으로)

  • Liu, Jia;Hwang, Choon-Sup
    • Journal of Distribution Science / v.14 no.11 / pp.117-128 / 2016
  • Purpose - To collect information needed to establish more effective marketing strategies for on-line purchasing agent services targeting Chinese consumers, the study investigated the relationships among Chinese consumers' selection criteria for fashion products, their use of information sources, and their satisfaction with on-line purchasing agent services. The study also identified differences in these selection criteria, in use of information sources, and in satisfaction with on-line purchasing agent services according to age and gender. Research design, data, and methodology - The study was implemented through a normative-descriptive survey method using a self-administered questionnaire. Data were collected from February 9 to 28, 2016, and analyzed by factor analysis, ANOVA and Duncan's test, t-tests, and multiple regression analysis. Results - Differences were found among groups in selection criteria for fashion products and in use of information sources. Consumers in their thirties were more concerned about price/brand than those in their twenties, while those in their twenties were more concerned about the practicality/quality of the products than those in their thirties. Men used Hallyu/broadcasting as an information source on Korean fashion more than women did, whereas women used SNS/WOM (word of mouth) more than men did. Consumers in their twenties showed lower satisfaction with customer services/credibility than with other factors, and those in their thirties showed lower satisfaction with the informational role of the service than with other factors. Those who used each type of fashion information source more showed higher satisfaction with the on-line purchasing agent service of Korean fashion products. In general, satisfaction with the on-line purchasing agent service of Korean fashion products differed according to selection criteria and use of information.
Conclusions - Considering the findings of the study, the age, gender, selection criteria and use of information sources of Chinese consumers could be used as criteria for market segmentation of on-line purchasing agent services for Korean fashion products. The results show a need to differentiate marketing strategies according to satisfaction levels with each satisfaction factor of the on-line purchasing agent service of Korean fashion products.

A Study on an effects of China consumers' self-congruence and public-cultural involvement on Hallyu contents evaluation and attitude (중국 소비자의 자기일치성과 대중문화 관여도가 한류콘텐츠 평가와 태도에 미치는 영향에 관한 연구)

  • Park, Se-Jeung;Choi, Jiyeon;Noh, Jeonpyo
    • Journal of Digital Convergence / v.14 no.2 / pp.377-388 / 2016
  • This study has three purposes. First, it aims to offer useful suggestions for segmenting the Chinese market according to customers' individual psychological variables (self-congruence). Second, it identifies the relative importance of actual self-congruence and ideal self-congruence. Lastly, it examines whether preferences for attributes differ according to popular-culture involvement. According to the results, actual self-congruence had a positive influence on the evaluation of the contents factor, while ideal self-congruence had a positive effect on the evaluation of the human and cultural factors. The human and cultural factors also had a positive influence on purchase intention, and the contents and cultural factors influenced word-of-mouth. In addition, the group with high involvement in popular culture rated the human and contents factors significantly higher than the low-involvement group.