• Title/Summary/Keyword: word dictionary


A Postprocessing Method of Korean Character Recognition by Mis-recognized Morphology Presumption (오인식 형태소 추정에 의한 한국어 문자 인식 후처리 기법)

  • Kim, Young-Hun;Lee, Young-Hwa;Lee, Sang-Jo
    • Journal of the Korean Institute of Telematics and Electronics C / v.36C no.7 / pp.46-55 / 1999
  • We propose a new postprocessing method that not only reduces the frequency of dictionary access through morphological analysis but also improves the recognition rate of the character recognizer. The method estimates the morphological structure of a mis-recognized word from the parts of speech already analyzed, and then corrects the morpheme presumed to be mis-recognized. Postprocessing at the morpheme level reduces the number of candidates, because a morpheme is shorter than a word, and reduces the frequency of dictionary access, because the candidates need no morphological analysis. Selecting the right candidate requires only a dictionary lookup. Experimental results show that the frequency of dictionary access was reduced to 60% of that of a postprocessing method using word units, and that the recognition rate improved from 94% to 97%.

  • PDF

An Experimental Study on an Effective Word Sense Disambiguation Model Based on Automatic Sense Tagging Using Dictionary Information (사전 정보를 이용한 단어 중의성 해소 모형에 관한 실험적 연구)

  • Lee, Yong-Gu;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.24 no.1 s.63 / pp.321-342 / 2007
  • This study presents an effective word sense disambiguation model that does not require a manual sense tagging process, because it automatically tags the right sense using a machine-readable dictionary and collocation co-occurrence-based methods. The dictionary information-based method with multiple feature selection showed a tagging accuracy of 70.06%, and the collocation co-occurrence-based method 56.33%. The sense classifier using the dictionary information-based tagging method showed a classification accuracy of 68.11%, and the one using the collocation co-occurrence-based tagging method 62.09%. The combined tagging method applying a data fusion technique achieved a higher tagging accuracy of 76.09%, resulting in a classification accuracy of 76.16%.

Study on Efficient Generation of Dictionary for Korean Vocabulary Recognition (한국어 음성인식을 위한 효율적인 사전 구성에 관한 연구)

  • Lee Sang-Bok;Choi Dae-Lim;Kim Chong-Kyo
    • Proceedings of the KSPS conference / 2002.11a / pp.41-44 / 2002
  • This paper concerns improving the speech recognition rate by using an enhanced pronunciation dictionary. Modern large-vocabulary continuous speech recognition systems have pronunciation dictionaries. A pronunciation dictionary provides pronunciation information for each word in the vocabulary in phonemic units, which are modeled in detail by the acoustic models. However, most speech recognition systems based on the Hidden Markov Model disregard actual pronunciation variations. Without pronunciation variations in the speech recognition system, the phonetic transcriptions in the dictionary do not match the actual occurrences in the database. In this paper, we propose applying the unvoicing rule for semivowels, one of the allophone rules, to the pronunciation dictionary. Experimental results show that a speech recognition system using the proposed dictionary performs better than one using existing pronunciation dictionaries.

  • PDF

Study on the Meaning of Nasal discharge(涕) in Five fluids (오액(五液) 중(中) '체(涕)'의 의미에 대한 고찰)

  • Jang, Heewon;Song, Jichung;Eom, Dongmyung
    • Journal of Korean Medical Classics / v.29 no.3 / pp.75-80 / 2016
  • Objectives : This paper objects to the use of the word '涕' to refer to nasal discharge, and proposes a word for nasal discharge based on a study of a set of medical books. Methods : The authors confirm the dictionary definition of '涕' and study how the word is used differently in medical books. Through this study, the authors show how the word '涕' is used incorrectly and infer the reason. The authors examine the old form of the word '涕' and its etymological origin, conjecture the word that should have been used to refer to nasal discharge, and find examples where this word is more appropriate. Results & Conclusions : In medical books such as Huangdineijing Suwen, '涕' is used to mean nasal discharge, but the word's dictionary definition does not support such usage. Yugunryeombu (劉君廉夫), in its commentary on Somun, used '?' and '鼻夷' for '涕'; '?' means nasal discharge, and is used in the same way as '涕' when it means tear. This phenomenon originated from '弟' and '夷' being used interchangeably, which led to the incorrect usage of '?'. To refer to nasal discharge, one needs to use '?'. '鼻夷' is believed to be the same word as '弟鼻', which is the old form of '?', and it means both tear (pronounced 'Che') and nasal discharge (pronounced 'Je'). However, the distinction in pronunciation between 'Che' and 'Je', together with the definition as tear, was split off in later periods into '涕', following the shape of '弟'. Following the shape of '夷', the meaning of nasal discharge remains in '?', which retains the pronunciation 'yi'. Therefore, the word '涕' used to mean nasal discharge is an incorrect form of '?', and every such instance should be rewritten as '?'.

A Word Dictionary Structure for the Postprocessing of Hangul Recognition (한글인식 후처리용 단어사전의 기억구조)

  • ;Yoshinao Aoki
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.9 / pp.1702-1709 / 1994
  • In the postprocessing stage of a Hangul recognition system, the storage structure of contextual information strongly affects the recognition rate and speed of the entire system. A trie is generally used to represent the context as a word dictionary, but its memory space efficiency is low. We therefore propose a new word dictionary structure that has better space efficiency while keeping the merits of a trie. Because Hangul characters are composed of phonemes, words can be represented either by phonemes or by characters. In the representation by phonemes (P-mode), retrieval is fast but space efficiency is low. In the representation by characters (C-mode), space efficiency is high but retrieval is slow. In this paper, the two representations are combined to form a hybrid representation (H-mode). First, an optimal level for the combination is selected from two characteristic curves, node utilization and dispersion. Then the input words are represented as a trie in P-mode from the first level to the optimal level, and the rest are represented as a sequentially linked list in C-mode. Experimental results on six kinds of word sets show that the proposed structure is more efficient: retrieval in H-mode is as fast as in P-mode, and the space efficiency is as good as in C-mode.

  • PDF
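
The hybrid idea in the abstract above (a phoneme-level trie for the leading levels, then a sequential list for the remainder) can be sketched roughly as follows. This is only an illustration, not the authors' structure: the class name `HybridDict`, the `level` parameter (counted here in leading syllables rather than phoneme levels), and the use of Unicode NFD decomposition to obtain phonemes (jamo) are all assumptions.

```python
import unicodedata


class HybridDict:
    """Toy H-mode dictionary: phoneme trie for the prefix, list for the rest."""

    def __init__(self, level=1):
        # Hypothetical parameter: how many leading syllables go into the trie.
        self.level = level
        self.root = {}

    def _split(self, word):
        # P-mode part: leading syllables decomposed into jamo (phonemes) via NFD.
        head = unicodedata.normalize("NFD", word[: self.level])
        # C-mode part: the remaining syllables, kept as whole characters.
        tail = word[self.level:]
        return head, tail

    def insert(self, word):
        head, tail = self._split(word)
        node = self.root
        for phoneme in head:
            node = node.setdefault(phoneme, {})
        # The None key holds the sequentially searched list of suffixes.
        suffixes = node.setdefault(None, [])
        if tail not in suffixes:
            suffixes.append(tail)

    def contains(self, word):
        head, tail = self._split(word)
        node = self.root
        for phoneme in head:
            node = node.get(phoneme)
            if node is None:
                return False
        # Sequential scan over the C-mode suffix list.
        return tail in node.get(None, [])


d = HybridDict(level=1)
for w in ["사전", "사람", "단어"]:
    d.insert(w)
print(d.contains("사전"), d.contains("사과"))
```

Raising `level` moves more of each word into the trie (faster lookup, more nodes), while lowering it pushes more into the suffix lists (fewer nodes, longer scans), which is the trade-off the abstract describes between P-mode and C-mode.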

WellnessWordNet: A Word Net for Unconstrained Subjective Well-Being Monitoring Based on Unstructured Data and Contextual Polarity (웰니스워드넷: 비정형데이터와 상황적 긍부정성에 기반하여 주관적 웰빙 상태를 무구속적으로 모니터링하기 위한 워드넷 개발)

  • Song, Yeongeun;Nam, Suhyun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.1-21 / 2016
  • IT-based subjective well-being (SWB) services, a main part of wellness IT, should measure the SWB state of individuals in an unrestrained, cost-effective manner. The dictionaries for sentiment analysis available in the market may be useful for this purpose, but obtaining proper sentiment values using only words from the sentiment lexicon is impossible; therefore, a new dictionary including wellness vocabulary is needed. The existing sentiment dictionaries link only a single sentiment value to a single sentiment word, although sentiment values may vary depending on personal traits. In this study, we develop an extended version of the SenticNet sentiment dictionary dubbed WellnessWordNet. SenticNet is considered the best and most expressive among the already existing sentiment dictionaries. Using the information provided by SenticNet, we created a database including the wellness states (estimated values) of stress, depression, and anger to develop the WellnessWordNet system. The accuracy of the system was validated through actual tests with live subjects. This study is unique and unprecedented in that i) an extended sentiment dictionary, WellnessWordNet, is developed; ii) values for wellness state language are offered; and iii) different sentiment values, namely contextual polarity, for people of the same gender or age group are suggested.

Sentiment Analysis Using Deep Learning Model based on Phoneme-level Korean (한글 음소 단위 딥러닝 모형을 이용한 감성분석)

  • Lee, Jae Jun;Kwon, Suhn Beom;Ahn, Sung Mahn
    • Journal of Information Technology Services / v.17 no.1 / pp.79-89 / 2018
  • Sentiment analysis is a text mining technique that extracts the feelings of the person who wrote a text such as a movie review. Earlier studies of sentiment analysis identified sentiments using a dictionary of negative and positive words collected in advance. As research on deep learning has become active, sentiment analysis using deep learning models with morpheme or word units has been carried out. However, such models have the disadvantages that the word dictionary varies by domain and that the number of morphemes or words is much larger than the number of phonemes. As a result, the dictionary becomes large and the complexity of the model increases accordingly. We construct a sentiment analysis model using a recurrent neural network by dividing the input data at the phoneme level, which is smaller than the morpheme level. To verify the performance, we use 30,000 movie reviews from Naver, the largest Korean portal. A morpheme-level sentiment analysis model is also implemented and compared. The results show that the phoneme-level sentiment analysis model is superior to the morpheme-level model and, in particular, that the phoneme-level model using LSTM performs better than the one using GRU. Korean text processing based on a phoneme-level model is expected to be applicable to various text mining tasks and language models.
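
As a rough illustration of what phoneme-level input means for Korean text, the sketch below splits Hangul syllables into their constituent jamo and maps them to integer ids of the kind a recurrent model would consume. It is not the authors' preprocessing: the function names `to_phonemes`, `build_vocab`, and `encode`, the use of Unicode NFD decomposition, and the choice of id 0 for unknown symbols are all assumptions.

```python
import unicodedata


def to_phonemes(text):
    """Decompose Hangul syllables into jamo (NFD) and drop whitespace."""
    return [c for c in unicodedata.normalize("NFD", text) if not c.isspace()]


def build_vocab(sequences):
    """Assign ids starting at 1; id 0 is reserved here for unknown symbols."""
    vocab = {}
    for seq in sequences:
        for symbol in seq:
            vocab.setdefault(symbol, len(vocab) + 1)
    return vocab


def encode(seq, vocab):
    """Map a phoneme sequence to integer ids, using 0 for unseen symbols."""
    return [vocab.get(symbol, 0) for symbol in seq]


reviews = ["영화 재미있다", "영화 지루하다"]  # made-up example sentences
phoneme_seqs = [to_phonemes(r) for r in reviews]
vocab = build_vocab(phoneme_seqs)
print(len(vocab), encode(phoneme_seqs[0], vocab))
```

Because every Hangul syllable decomposes into a small fixed inventory of jamo, the vocabulary stays small regardless of domain, which is the motivation the abstract gives for preferring phonemes over morphemes or words.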

Favorable analysis of users through the social data analysis based on sentimental analysis (소셜데이터 감성분석을 통한 사용자의 호감도 분석)

  • Lee, Min-gyu;Sohn, Hyo-jung;Seong, Baek-min;Kim, Jong-bae
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.10a / pp.438-440 / 2014
  • Recently, data from SNS services has been actively put to commercial use. In this paper, we therefore propose a method that can accurately analyze information related to the reputation of companies and products in a real-time SNS environment. We identify the relationships between words by performing morphological analysis on text data gathered by crawling SNS. In addition, the morphemes extracted from the sentences are analyzed statistically against an established sentiment dictionary, and the results are visualized. We also propose an algorithm that automatically adds an extracted word to the sentiment dictionary when the word does not exist in it.

  • PDF

The Change of the Concept and Meaning of Bulgogi in Cookery Book & Dictionary (문헌에 나타난 불고기의 개념과 의미 변화)

  • Lee, Kyou-Jin;Cho, Mi-Sook
    • Journal of the Korean Society of Food Culture / v.25 no.5 / pp.508-515 / 2010
  • The purpose of this research was to investigate the transition of the concept and meaning of "bulgogi". "Bulgogi" is a representative Korean food and is also a global menu item. The first dictionary that presented the word "bulgogi" was the Keunsajeon (big dictionary). An analysis of 17 dictionaries published in the last 60 years showed that the definition of "neobiani" as seasoned and broiled beef has not changed. In contrast, "bulgogi" has been defined in different ways, from "simply grilled meat of an animal" to the same meaning as that of "neobiani". Furthermore, to define how common grilled meat differed between the modern period and the present, 26 cookery books, from Sieuijeanseo, written in the late 1800s, to The Taste of Korea, written in 1987, were selected and examined. To date, the first known appearance of the word "bulgogi" in a cookbook is in Practice in Higher Cuisine, written by Shin-young Bang in 1958. The book states that "bulgogi" is the second name or the vulgar designation of "neobiani".

Optimizing Multiple Pronunciation Dictionary Based on a Confusability Measure for Non-native Speech Recognition (타언어권 화자 음성 인식을 위한 혼잡도에 기반한 다중발음사전의 최적화 기법)

  • Kim, Min-A;Oh, Yoo-Rhee;Kim, Hong-Kook;Lee, Yeon-Woo;Cho, Sung-Eui;Lee, Seong-Ro
    • MALSORI / no.65 / pp.93-103 / 2008
  • In this paper, we propose a method for optimizing a multiple pronunciation dictionary used for modeling pronunciation variations of non-native speech. The proposed method removes some confusable pronunciation variants in the dictionary, resulting in a reduced dictionary size and less decoding time for automatic speech recognition (ASR). To this end, a confusability measure is first defined based on the Levenshtein distance between two different pronunciation variants. Then, the number of phonemes for each pronunciation variant is incorporated into the confusability measure to compensate for ASR errors due to words of a shorter length. We investigate the effect of the proposed method on ASR performance, where Korean is selected as the target language and Korean utterances spoken by Chinese native speakers are considered as non-native speech. It is shown from the experiments that an ASR system using the multiple pronunciation dictionary optimized by the proposed method can provide a relative average word error rate reduction of 6.25%, with 11.67% less ASR decoding time, as compared with that using a multiple pronunciation dictionary without the optimization.

  • PDF
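
The abstract above builds its confusability measure on the Levenshtein distance between pronunciation variants but does not give the formula. The sketch below shows the distance computation over phoneme sequences, plus one possible length-normalized score. The `confusability` function and its normalization by the longer variant's phoneme count are assumptions made for illustration, not the measure defined in the paper, and the phoneme symbols are invented.

```python
def levenshtein(a, b):
    """Edit distance between two phoneme sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def confusability(a, b):
    """Hypothetical score in [0, 1]: 1.0 means the variants are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Two invented pronunciation variants differing in the final phoneme.
variant_1 = ["h", "a", "k", "k", "yo"]
variant_2 = ["h", "a", "k", "k", "o"]
print(levenshtein(variant_1, variant_2), confusability(variant_1, variant_2))
```

Dividing by the phoneme count means a single edit counts for more in short variants than in long ones, which is one simple way to account for variant length as the abstract suggests.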