• Title/Summary/Keyword: 어휘정보 (lexical information)

Search Result 1,062, Processing Time 0.021 seconds

Handwritten Hangul Word Recognition from Small Vocabulary using Grapheme Combination Type (자모 결합 유형을 이용한 적은 어휘에서의 필기 한글 단어 인식)

  • Jin, Yu-Ho;Kim, Ho-Yeon;Kim, In-Jung;Kim, Jin-Hyeong
    • Journal of KIISE:Software and Applications
    • /
    • v.28 no.1
    • /
    • pp.52-63
    • /
    • 2001
  • There are two approaches to handwritten word recognition: segmenting the word into characters and recognizing each character, or using a word lexicon to compare words directly against the image. The latter is very effective when the words to be recognized are limited to a small vocabulary. This paper proposes a lexicon-driven method for handwritten Hangul word recognition that, given an input image, searches for graphemes sequentially and recognizes the word by finding the optimal combination of the results. The input image is recognized by matching it against each word in the lexicon. A word is represented as a sequence of graphemes in writing order, and the input image is represented as a set of strokes. The graphemes of a word are searched for step by step in the set of strokes extracted from the input image. At each step, a matching expectation region, where the grapheme is expected to appear, is set from the matching state up to the previous step and the shape of the grapheme being sought, and a grapheme searcher looks for the grapheme within that region. The grapheme searcher outputs multiple grapheme candidates, each a set of strokes, together with their scores. The candidates generated at each step form the search space for finding the optimal word match. In this study, the lexicon is organized as a trie and dynamic programming is used to search it efficiently. Various kinds of knowledge, such as lexicon reduction and search space reduction, are also used to improve recognition speed. The proposed method can recognize unconstrained handwritten words and, because it uses a dynamic lexicon, can be applied in environments where the lexicon changes. In recognition experiments on 613 word images with a lexicon of 39 words, the method achieved a recognition rate of 98.54%, confirming that it is very effective.
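
The lexicon-as-trie idea in this abstract can be illustrated with a much simplified sketch. It assumes that a detector has already produced one score per grapheme at each step (the hypothetical `scores` list), and it ignores strokes and matching expectation regions entirely. The names `TrieNode`, `build_trie`, and `best_word` are illustrative and do not come from the paper. The point shown is only that words sharing a prefix share the accumulated score of that prefix.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # grapheme -> TrieNode
    word: Optional[str] = None                    # set on the node that ends a word


def build_trie(lexicon):
    """Build a trie from {word: grapheme sequence in writing order}."""
    root = TrieNode()
    for word, graphemes in lexicon.items():
        node = root
        for g in graphemes:
            node = node.children.setdefault(g, TrieNode())
        node.word = word
    return root


def best_word(root, scores):
    """Return (word, mean score) of the best-matching lexicon word.

    scores[i] maps a grapheme to its hypothetical detector score at step i.
    A grapheme missing from scores[i] means no candidate was found there.
    Only words with exactly len(scores) graphemes are considered.
    """
    best = (None, float("-inf"))
    stack = [(root, 0, 0.0)]  # (node, depth, accumulated score)
    while stack:
        node, depth, total = stack.pop()
        if depth == len(scores):
            if node.word is not None and depth > 0 and total / depth > best[1]:
                best = (node.word, total / depth)
            continue
        for g, child in node.children.items():
            if g in scores[depth]:  # prune branches with no candidate
                stack.append((child, depth + 1, total + scores[depth][g]))
    return best


# Hypothetical three-word lexicon and detector scores (assumptions).
lexicon = {
    "서울": ["ㅅ", "ㅓ", "ㅇ", "ㅜ", "ㄹ"],
    "서산": ["ㅅ", "ㅓ", "ㅅ", "ㅏ", "ㄴ"],
    "수원": ["ㅅ", "ㅜ", "ㅇ", "ㅜ", "ㅓ", "ㄴ"],
}
scores = [
    {"ㅅ": 0.9},
    {"ㅓ": 0.8, "ㅜ": 0.3},
    {"ㅇ": 0.7, "ㅅ": 0.4},
    {"ㅜ": 0.6, "ㅏ": 0.5},
    {"ㄹ": 0.9, "ㄴ": 0.2},
]
word, mean_score = best_word(build_trie(lexicon), scores)
```

The score of the shared prefix ㅅ, ㅓ is accumulated once on the path from the root and reused for both 서울 and 서산, which is where a trie saves work over matching each word independently.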


An Effect for Sequential Information Processing by the Anxiety Level and Temporary Affect Induction (불안수준 및 일시적 유발정서가 서열정보 어휘처리에 미치는 효과)

  • Kim, Choong-Myung
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.20 no.4
    • /
    • pp.224-231
    • /
    • 2019
  • This study examined how affect induced as a background emotion influences a cognitive task requiring sequence judgments, in groups with and without anxiety symptoms. Four types of affect induction and two sequential task types were used as within-subject variables, and two groups of college students classified by the Beck Anxiety Inventory (BAI) served as the between-subject variable; reaction times were measured for sequential judgments on lexical relevance information. DMDX 5 was used to present the stimuli and collect responses. Repeated-measures ANOVA revealed that reaction times were significantly longer and error rates significantly higher for anxious participants than for the normal group, regardless of affect and task type. Within-subject effects showed that one affect type (the sorrow condition) and the number-related task produced faster responses than the other affect types and the magnitude-related task, respectively. In sum, reaction times and error rates varied with the accompanying affect type as well as with anxiety level and task type, suggesting that the underlying background affect plays a major role in tasks that associate affect and cognition.

Analysis of Keywords in national river occupancy permits by region using text mining and network theory (텍스트 마이닝과 네트워크 이론을 활용한 권역별 국가하천 점용허가 키워드 분석)

  • Seong Yun Jeong
    • Smart Media Journal
    • /
    • v.12 no.11
    • /
    • pp.185-197
    • /
    • 2023
  • This study used text mining and network theory to extract information useful for occupancy applications and permit processing from the permit register, which has so far been used only to record occupancy permit information. Using text mining, after normalization steps such as stopword removal and morpheme analysis, we analyzed and compared vocabulary frequency and topic models across five regions: Seoul/Gyeonggi, Gyeongsang, Jeolla, Chungcheong, and Gangwon. By applying four centrality measures widely used in network theory (degree, closeness, betweenness, and eigenvector), we identified keywords that occupy a central position or act as intermediaries in the network. A comprehensive analysis of vocabulary frequency, topic modeling, and network centrality showed that the keyword 'installation' was the most influential in all regions. This is believed to result from the Ministry of Environment's permit management office issuing many permits for constructing facilities or installing structures. In addition, keywords related to road facilities, flood control facilities, underground facilities, power/communication facilities, and sports/park facilities were found to be central or to act as intermediaries in the topic models and networks. Most keywords followed a Zipf's-law distribution, occurring with low frequency and accounting for a small share of the total.
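
As an illustration of the centrality measures named in this abstract, the following minimal sketch computes degree and closeness centrality on a small undirected keyword graph. The graph itself is hypothetical (only the keyword 'installation' is taken from the abstract), the function names are illustrative, and the study's actual tooling is not stated. Betweenness and eigenvector centrality are omitted for brevity.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def closeness_centrality(graph):
    """Reachable nodes divided by the sum of shortest-path lengths (BFS)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical keyword co-occurrence graph (undirected adjacency sets).
graph = {
    "installation": {"road", "bridge", "pipeline"},
    "road": {"installation", "bridge"},
    "bridge": {"installation", "road"},
    "pipeline": {"installation"},
}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
most_central = max(degree, key=degree.get)
```

In this toy graph 'installation' is adjacent to every other keyword, so it has the highest degree and closeness, which mirrors the kind of result the abstract reports.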

Non-Keyword Model for the Improvement of Vocabulary Independent Keyword Spotting System (가변어휘 핵심어 검출 성능 향상을 위한 비핵심어 모델)

  • Kim, Min-Je;Lee, Jung-Chul
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.7
    • /
    • pp.319-324
    • /
    • 2006
  • We propose two new methods for non-keyword modeling to improve the performance of a speaker- and vocabulary-independent keyword spotting system. The first is decision-tree clustering of monophones at the state level, in place of monophone clustering based on the K-means algorithm. The second is multi-state, multiple-mixture modeling at the syllable level, in place of a single-state multiple-mixture model for non-keywords. To evaluate the methods, we used the ETRI speech DB for training and for a keyword spotting test (closed test). We also conducted an open test to spot 100 keywords in 400 sentences uttered by 4 speakers in an office environment. The decision-tree-based state clustering method improved keyword spotting by 28%/29% (closed/open test) over the K-means-based monophone clustering method, and multi-state non-keyword modeling at the syllable level improved it by 22%/2% (closed/open test) over the single-state non-keyword model. These results show that both proposed methods improve keyword spotting performance.

Development and Evaluation of a Document Summarization System using Features and a Text Component Identification Method (텍스트 구성요소 판별 기법과 자질을 이용한 문서 요약 시스템의 개발 및 평가)

  • Jang, Dong-Hyun;Myaeng, Sung-Hyon
    • Journal of KIISE:Software and Applications
    • /
    • v.27 no.6
    • /
    • pp.678-689
    • /
    • 2000
  • This paper describes an automatic summarization approach that constructs a summary by extracting sentences that are likely to represent the main theme of a document. As a way of selecting summary sentences, the system uses a model that takes into account lexical and statistical information obtained from a document corpus. As such, the system consists of two parts: the training part and the summarization part. The former processes sentences that have been manually tagged for summary sentences and extracts necessary statistical information of various kinds, and the latter uses the information to calculate the likelihood that a given sentence is to be included in the summary. There are at least three unique aspects of this research. First of all, the system uses a text component identification model to categorize sentences into one of the text components. This allows us to eliminate parts of text that are not likely to contain summary sentences. Second, although our statistically-based model stems from an existing one developed for English texts, it applies the framework to individual features separately and computes the final score for each sentence by combining the pieces of evidence using the Dempster-Shafer combination rule. Third, not only were new features introduced but also all the features were tested for their effectiveness in the summarization framework.
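
The Dempster-Shafer combination rule mentioned in this abstract can be sketched as follows. The two features and their mass values are hypothetical, chosen only to show how evidence from separate features is merged into one score per sentence; the function name is illustrative and the paper's actual features and masses are not given here.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass falling on contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    # Renormalize so the remaining masses sum to 1.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


S = frozenset({"summary"})
N = frozenset({"non-summary"})
EITHER = S | N  # ignorance: mass not committed to either hypothesis

# Hypothetical evidence from two features for one sentence (assumptions).
position_feature = {S: 0.6, EITHER: 0.4}
keyword_feature = {S: 0.5, N: 0.2, EITHER: 0.3}

belief = dempster_combine(position_feature, keyword_feature)
```

Here the conflicting mass is 0.6 × 0.2 = 0.12, so the combined mass on "summary" is 0.68 / 0.88, higher than either feature assigned alone, because the two pieces of evidence agree.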


Korean Semantic Role Labeling Using Case Frame Dictionary and Subcategorization (격틀 사전과 하위 범주 정보를 이용한 한국어 의미역 결정)

  • Kim, Wan-Su;Ock, Cheol-Young
    • Journal of KIISE
    • /
    • v.43 no.12
    • /
    • pp.1376-1384
    • /
    • 2016
  • To process sentences as humans do, computers need the capability to analyze and process the full range of human expression, and linguistic information processing forms the basis for this. Analyzing a sentence syntactically requires dividing it into components, finding the obligatory arguments of each predicate, identifying the sentence core, and understanding the semantic relations between arguments and predicates. For semantic role labeling, this study applies a case frame dictionary based on The Korean Standard Dictionary of The National Institute of the Korean Language, together with a CRF model that uses the subcategorization of predicates constructed in the Korean Lexical Semantic Network (UWordMap). Semantic roles are tagged automatically by the CRF model, which takes words, predicates, case frame dictionary information, and hypernyms of words as features. The method achieved an accuracy of 83.13%, compared with 81.2% for the existing method.

A Reranking Method Using Query Expansion and PageRank Check (페이지 랭크지수와 질의 확장을 이용한 재랭킹 방법)

  • Kim, Tae-Hwan;Jeon, Ho-Chul;Choi, Joong-Min
    • The KIPS Transactions:PartB
    • /
    • v.18B no.4
    • /
    • pp.231-240
    • /
    • 2011
  • Many researchers have implemented search algorithms for the World Wide Web. One of the best is Google's, which uses PageRank. The PageRank approach counts the inlinks of each document and ranks documents by inlink count. However, it is difficult for users to find the results they need, because this method finds documents that are valuable to the public rather than to an individual. To solve this problem, we use WordNet to analyze the user's query history. This paper proposes a personalized search engine that uses the user's query history and PageRank Check. We compared the proposed approach with the top 30 Google search results. The average R-precision of the proposed approach is about 60%, an improvement of about 14%.
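
The abstract describes PageRank as ranking by inlink count. The standard algorithm goes one step further and weights each inlink by the rank of the page it comes from. A minimal power-iteration sketch is given below; the four-page link graph, the damping factor of 0.85, and the iteration count are assumptions for illustration, and this is not the paper's reranking method.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict page -> list of pages it links to. Returns page -> score.

    Every linked page must also appear as a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: C has the most inlinks, D has none.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(links)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Page A has a single inlink but ranks second because that inlink comes from the top-ranked page C, which is the difference between PageRank and a plain inlink count.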

A Korean Homonym Disambiguation Model Based on Statistics Using Weights (가중치를 이용한 통계 기반 한국어 동형이의어 분별 모델)

  • 김준수;최호섭;옥철영
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.11
    • /
    • pp.1112-1123
    • /
    • 2003
  • Word sense disambiguation (WSD) is one of the most difficult problems in Korean information processing. A Bayesian model using semantic information extracted from a definition corpus (1 million POS-tagged eojeol of Korean dictionary definitions) achieved an accuracy of 72.08% (nouns 78.12%, verbs 62.45%). This paper proposes a statistical WSD model that uses NPH (New Prior Probability of Homonym sense) and distance weights. We selected 46 homonyms (30 nouns, 16 verbs) that occur with high frequency in the definition corpus and evaluated the model on 47,977 contexts from the ‘21C Sejong Corpus’ (3.5 million POS-tagged eojeol). The WSD model using NPH improves accuracy by 1.70% on average, and the model using both NPH and distance weights improves it by 2.01%.

A Study on the YouTube Content Analysis and Users' Emotional Responses Analysis (대학도서관 유튜브 콘텐츠 내용분석과 이용자 감성반응 분석에 관한 연구)

  • Young Song;Ji-Hyun Kim
    • Journal of the Korean Society for information Management
    • /
    • v.40 no.1
    • /
    • pp.73-93
    • /
    • 2023
  • This study comprehensively analyzed and evaluated library services delivered through YouTube, using content analysis of the videos and emotional response analysis of user comments. It analyzed 2,169 YouTube content items and 6,487 user comments from 61 university libraries. Among the four categories, 'data' content was the most numerous, followed by 'communication', 'education', and 'promotion' content. Among the sub-classifications, 'information services' content was the most numerous. In the analysis of users' emotional responses, most responses were to 'data' and 'communication' content. Responses were mostly positive in every content category, and the most frequent emotional expression was 'good'. In addition, the vocabulary in users' emotional responses referred more often to the people appearing in the videos than to the content itself.

Analysis of Language Message Expression in Beauty Magazine's Cosmetic Ads : Focusing on "Hyang-jang", AMOREPACIFIC's from 1958 to 2018 (화장품광고에 나타난 언어메시지 표현분석 : 1958년~2018년의 아모레퍼시픽 뷰티매거진<향장>을 중심으로)

  • Choi, Eun-Sob
    • Journal of Korea Entertainment Industry Association
    • /
    • v.13 no.7
    • /
    • pp.99-118
    • /
    • 2019
  • This study analyzed the language messages in 718 advertisements in Hyang-jang, AMOREPACIFIC's beauty magazine, published from 1958 to 2018, by product category and era, in terms of purchase information, persuasive expression, and word type. First, the largest number of advertisements came from the 1980s and 1990s, and by product category the most numerous were skincare, makeup, and men's products. Second, persuasive expression differed between headlines and body copy. Headlines mainly used expression "focused on image-making", with "situational image" generally dominant; "user image" was more common before the 1990s, while "brand image" has become more common in recent times. Body copy mostly used informational expression, with "general information" and "differentiated information" used most often. It is therefore more important to understand which information gives concrete form to the image each brand has established than simply to divide the appeal method into "rational appeal" and "emotional appeal". Lastly, when word type was categorized for brand names and headlines, foreign words were most dominant in brand names and Chinese characters in headlines. Notably, native-language brand names were temporarily frequent in the 1970s and 1980s, which can be interpreted as a result of the government policy promoting native-language brands at that time. In addition, foreign words were frequently used for cosmetics and Chinese characters for men's products. This can be explained by the fact that colors and seasons in cosmetic products were mostly expressed in foreign words, whereas the tendency of men's-product consumers to seek prestige or confidence in Chinese characters was actively reflected in the language messages.