• Title/Summary/Keyword: 어휘정보 (lexical information)


KMSCR: A system for managing knowledge assets of an IT consulting firm (IT 컨설팅 회사의 지적 자산 관리를 위한 지식관리시스템)

  • 김수연;황현석;서의호
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.06a / pp.233-239 / 2001
  • Most companies have recently come to recognize the importance of managing intellectual assets in order to share and reuse the knowledge and know-how needed to carry out their work. In IT consulting firms in particular, which are highly knowledge-intensive, intellectual asset management matters more than in any other kind of company. For a consulting firm, sharing verified intellectual assets and retrieving them intelligently and quickly are key factors tied directly to the quality of consulting services and to customer satisfaction. Most consulting firms therefore put considerable effort into managing their knowledge assets. The purpose of this paper is to raise the productivity and quality of consulting services by centrally managing the various types of intellectual assets held by an IT consulting firm, so that consultants carrying out projects at scattered customer sites can share them. To this end, we inventory and model all the intellectual assets managed in a consulting firm and propose a system architecture in which they can be easily stored and retrieved. The proposed architecture was implemented as a system on an NT platform using Index Server (KMSCR: A Knowledge Management System for managing Consulting Resources). In KMSCR, when a consultant enters a search term, results in various formats (.doc, .ppt, .xls, .rtf, .txt, .html, and so on) are returned in order of relevance, which helps consultants reuse consulting resources effectively. In addition, search covers not only pre-registered keywords but also the text within documents, enabling more effective and efficient retrieval of consulting resources.


Korean Semantic Role Labeling Based on Suffix Structure Analysis and Machine Learning (접사 구조 분석과 기계 학습에 기반한 한국어 의미 역 결정)

  • Seok, Miran;Kim, Yu-Seop
    • KIPS Transactions on Software and Data Engineering / v.5 no.11 / pp.555-562 / 2016
  • Semantic role labeling (SRL) determines the semantic relations between a predicate and its arguments in a sentence. Korean SRL has been difficult because the structure of the language differs from that of English, which makes it hard to apply the approaches developed so far. As a result, previously proposed methods have not reached satisfactory performance compared with English and Chinese. To address these problems, we focus on analyzing suffix information such as josa (case suffixes) and eomi (verbal endings). Korean, like Japanese, is an agglutinative language with a well-defined suffix structure in its words, and agglutinative languages can have free word order because of this developed suffix structure. Arguments with a single morpheme are then labeled with statistics. In addition, machine learning algorithms such as Support Vector Machines (SVM) and Conditional Random Fields (CRF) are used to model the SRL problem for arguments that are not labeled in the suffix analysis phase. The proposed method is intended to reduce the range of argument instances to which machine learning approaches must be applied, as these result in uncertain and inaccurate role labeling. In experiments, we use 15,224 arguments and obtain an F1-score of approximately 83.24%, an increase of about 4.85 percentage points over the state-of-the-art Korean SRL research.
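The two-phase idea in this abstract (label what the suffix determines, pass the rest to a learned model) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `JOSA_ROLE` table, the PropBank-style role names, and the plain string suffix match are hypothetical and are not the paper's rules. A real system would use a morphological analyzer, and `fallback` stands in for the SVM/CRF stage.

```python
# Hypothetical josa -> role table, for illustration only.
# The abstract does not list the paper's actual rules.
JOSA_ROLE = {
    "에서": "ARGM-LOC",
    "이": "ARG0",
    "가": "ARG0",
    "을": "ARG1",
    "를": "ARG1",
}


def split_josa(eojeol):
    """Return (stem, josa) by longest-suffix match, or (eojeol, None)."""
    for josa in sorted(JOSA_ROLE, key=len, reverse=True):
        # Require a non-empty stem so a bare particle is not split.
        if eojeol.endswith(josa) and len(eojeol) > len(josa):
            return eojeol[: -len(josa)], josa
    return eojeol, None


def label_arguments(eojeols, fallback):
    """Label each argument by suffix rule, else by the fallback model.

    Returns (eojeol, role, source) triples, where source is "rule" or
    "model" so the share handled by each phase can be inspected.
    """
    labels = []
    for eojeol in eojeols:
        _, josa = split_josa(eojeol)
        if josa is not None:
            labels.append((eojeol, JOSA_ROLE[josa], "rule"))
        else:
            labels.append((eojeol, fallback(eojeol), "model"))
    return labels


if __name__ == "__main__":
    # The lambda is a placeholder for a trained classifier.
    for row in label_arguments(["철수가", "학교에서", "밥을", "빨리"],
                               lambda e: "UNKNOWN"):
        print(row)
```

Tagging each label with its source makes it easy to see how many arguments never reach the learned model, which is the reduction the abstract describes.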

A Study on the Considerations in Rules for Authorized Access points of Music Work (음악 저작의 전거형접근점 규칙 마련시 고려사항에 관한 연구)

  • Lee, Mihwa
    • Journal of Korean Library and Information Science Society / v.49 no.4 / pp.147-166 / 2018
  • This study suggests considerations for rules on authorized access points for collocating music works, by identifying the directions for authorized access points in FRBR, LRM, ICP 2016, RDA, and BIBFRAME, and by analyzing RDA rules for the attributes and authorized access points of music works and expressions, together with VIAF examples. First, aggregated authorized access points were suggested as the direction for authorized access points; the original title may be selected as the preferred title, and the authorized access point may be based on a form in one of the languages suited to the users if the original title is not normally suited. Second, the authorized access point of a music work consists of the composer's authorized access point and the preferred title, or of the adapter's authorized access point and the preferred title when the composer's responsibility is lacking. The authorized access point of a Korean traditional music work must also be reviewed by work type, considering the composer's responsibility. Third, controlled vocabularies for the name of the music type, the medium of performance, and the key could be considered for describing the attributes of works and expressions. This study can serve as a foundation for authorized access points of music works, and additional research should be carried out by surveying music users' needs.

A Study on Considerations in the Authority Control to Accommodate LRM Nomen (LRM 노멘을 수용하기 위한 전거제어시 고려사항에 관한 연구)

  • Lee, Mihwa
    • Journal of Korean Library and Information Science Society / v.52 no.1 / pp.109-128 / 2021
  • This paper explores considerations in authority control for accommodating LRM nomen entities through a literature review, an analysis of RDA rules, and an opinion survey of domestic cataloging experts. As a result, considerations for authority control were proposed in terms of the nomen's attribute elements, catalog description, and the MARC authority format. First, the attributes of the nomen should be described in as much detail as possible: the category, the scheme, the intended audience, the context of use, the reference source, the language, the script, and the script conversion, along with the status of identification, note, and undifferentiated name indicator added in RDA. Second, attribute elements and relationship elements of the nomen can be described as unstructured, structured, identifier, or IRI, as suggested in RDA, and a vocabulary encoding scheme (VES) and string encoding scheme (SES) should be specified for structured description. Cataloging rules for structuring authorized access points and preferred names/titles should also be established. Third, an additional expansion plan based on Maxwell's expansion (draft) was proposed in order to prepare the MARC 21 authority format to reflect the LRM nomen. (1) The attributes must be described in 4XX and 5XX so that they can be entered for each nomen, and the attributes of the nomen to be described in 1XX, 5XX, and 4XX are presented separately. (2) To describe the nomen category, language, script, script conversion, context of use, and date of usage as nomen attributes, fields and subfields must be added to MARC 21. Accordingly, it was proposed to expand the subfields of 368, 381, and 377, and to add fields for describing the context of use and the date of usage. The considerations in authority control for the LRM nomen proposed in this paper will be the basis for establishing an authority control plan that reflects LRM in Korea.

Comparison of Korean Classification Models' Korean Essay Score Range Prediction Performance (한국어 학습 모델별 한국어 쓰기 답안지 점수 구간 예측 성능 비교)

  • Cho, Heeryon;Im, Hyeonyeol;Yi, Yumi;Cha, Junwoo
    • KIPS Transactions on Software and Data Engineering / v.11 no.3 / pp.133-140 / 2022
  • We investigate the performance of deep learning-based Korean language models on the task of predicting the score range of Korean essays written by foreign students. We construct a data set containing a total of 304 essays, which discuss the criteria for choosing a job ('job'), conditions of a happy life ('happ'), the relationship between money and happiness ('econ'), and the definition of success ('succ'). These essays were labeled with four letter grades (A, B, C, and D), and a total of eleven essay score range prediction experiments were conducted (i.e., five for predicting the score range of 'job' essays, five for predicting the score range of 'happ' essays, and one for predicting the score range of mixed-topic essays). Three deep learning-based Korean language models, KoBERT, KcBERT, and KR-BERT, were fine-tuned using various training data. Two traditional probabilistic machine learning classifiers, naive Bayes and logistic regression, were also evaluated. Experimental results show that the deep learning-based Korean language models performed better than the two traditional classifiers, with KR-BERT performing best at 55.83% overall average prediction accuracy. A close second was KcBERT (55.77%), followed by KoBERT (54.91%). The naive Bayes and logistic regression classifiers achieved 52.52% and 50.28%, respectively. Due to the scarcity of training data and the imbalance in class distribution, the overall prediction performance was not high for any classifier. Moreover, the classifiers' vocabulary did not explicitly capture the error features that were helpful in correctly grading the Korean essays. By overcoming these two limitations, we expect the score range prediction performance to improve.

LRM's Characteristics and Applications Plan Through Comparing with FRBR (FRBR과 비교를 통한 LRM의 특징 및 적용방안)

  • Lee, Mihwa
    • Journal of Korean Library and Information Science Society / v.53 no.2 / pp.355-375 / 2022
  • This study identifies the features of LRM and an application plan for reflecting LRM in cataloging-related standards and individual systems, by comparing and analyzing LRM against the FR models in terms of entities, attributes, and relationships. The application plan is suggested as follows. First, entities can be extended by defining sub-entities of each entity in the standards and the individual system in order to reflect LRM, even though entities such as families, groups, identifiers, authorized access points, concepts, objects, events, agency, and rules have been deleted in LRM. Second, attributes should be subdivided in the standards and the individual system in order to apply LRM, although in LRM many attributes have been changed to relationships for linked data and their number has decreased. In particular, more specific and detailed property names should be clearly presented in the standards and the individual system, and a vocabulary encoding scheme corresponding to each property should also be developed, since properties that have similar functions or are repeated across entities, as well as material-specific properties, have been generalized and integrated into comprehensive property names. Third, relationships should be extended by newly declaring refinements or subtypes of a relationship and by considering multi-level relationships, since the relationships themselves are general and abstract even though their number has increased relative to properties. This study can be put to practical use in applying LRM to cataloging-related standards and individual systems.

Mapping Heterogenous Ontologies for the HLP Applications - Sejong Semantic Classes and KorLexNoun 1.5 - (인간언어공학에의 활용을 위한 이종 개념체계 간 사상 - 세종의미부류와 KorLexNoun 1.5 -)

  • Bae, Sun-Mee;Im, Kyoung-Up;Yoon, Ae-Sun
    • Korean Journal of Cognitive Science / v.21 no.1 / pp.95-126 / 2010
  • This study proposes a bottom-up, inductive manual mapping methodology for integrating two heterogeneous fine-grained ontologies that were built by a top-down, deductive methodology, namely the Sejong semantic classes (SJSC) and the upper nodes of KorLexNoun 1.5 (KLN), for HLP applications. It also discusses various problems that the heterogeneity of the two language resources causes in the mapping process and proposes solutions. The methodology uses the terminal nodes of SJSC and the Least Upper Bounds (LUB) of KLN as basic mapping units. The mapping procedure is as follows. First, the mapping candidate groups are decided by the lexical correlation between the synsets of KLN and the noun senses of the Sejong Noun Dictionary (SJND), which are classified according to SJSC. Second, the meanings of the candidate groups are precisely disambiguated using the linguistic information provided by the two ontologies, i.e., the hierarchical structures, the definitions, and the examples. Third, the level of the LUB is determined by applying the appropriate predicates and definitions of SJSC to the upper-lower and sister nodes of the candidate LUB. Fourth, the mapping possibility of the terminal node of SJSC is judged by comparing the hierarchical relations of the two ontologies. Finally, the incorrect synsets of KLN and terminological candidate groups are excluded from the mapping. This study actively uses the various kinds of language information described in each ontology to establish the mapping criteria, which is the advantage of fine-grained manual mapping. With the proposed methodology, 6,487 LUBs are mapped to 474 terminal and non-terminal nodes of SJSC, excluding multiply mapped nodes, and 88,255 nodes of KLN are mapped when all lower-level nodes of the mapped LUBs are included. The total mapping coverage is 97.91% of KLN synsets. This result can be applied to many elaborate syntactic and semantic analyses for Korean language processing.
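For readers unfamiliar with the term, a least upper bound in a hypernym hierarchy is the lowest node that dominates every node in a given set. The sketch below illustrates only that concept. It assumes a single-parent tree stored as a hypothetical child-to-parent dictionary (`TOY_PARENT`), whereas wordnet-style resources can have multiple hypernyms, and it does not reproduce the manual, predicate-based procedure the abstract describes for choosing the LUB level.

```python
# Toy child -> parent hierarchy, invented for illustration.
# It is not taken from KorLexNoun or the Sejong semantic classes.
TOY_PARENT = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "animal",
    "cat": "feline",
    "feline": "animal",
}


def ancestors(node, parent):
    """Return the path from node up to the root, node first.

    Assumes the hierarchy is a tree with no cycles.
    """
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def least_upper_bound(nodes, parent):
    """Return the lowest node that is an ancestor-or-self of all nodes."""
    nodes = list(nodes)
    common = set(ancestors(nodes[0], parent))
    for node in nodes[1:]:
        common &= set(ancestors(node, parent))
    # Walking upward from any member, the first shared node is the lowest.
    for candidate in ancestors(nodes[0], parent):
        if candidate in common:
            return candidate
    return None  # the nodes share no ancestor (disconnected hierarchy)


if __name__ == "__main__":
    print(least_upper_bound(["dog", "wolf"], TOY_PARENT))
    print(least_upper_bound(["dog", "cat"], TOY_PARENT))
```

Mapping at the LUB rather than at each synset is what lets one mapped node carry all of its lower-level nodes, as in the coverage figures reported in the abstract.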


RGB Channel Selection Technique for Efficient Image Segmentation (효율적인 이미지 분할을 위한 RGB 채널 선택 기법)

  • 김현종;박영배
    • Journal of KIISE:Software and Applications / v.31 no.10 / pp.1332-1344 / 2004
  • With the development of the information superhighway and multimedia technologies in recent years, more efficient technologies for transmitting, storing, and retrieving multimedia data are required. Among these technologies, semantic-based image retrieval commonly relies on separate annotation that gives meaning to the image data and to low-level feature information such as color, texture, and shape. Although semantic-based retrieval uses a vocabulary dictionary of the given keywords, a first problem is that it has not escaped the limits of existing keyword-based text retrieval. A second problem is that content-based image retrieval systems show reduced retrieval performance: it is difficult to separate an object from an image with a complex background, and difficult to extract a region because the image is divided into too many regions. It is also difficult to separate objects in an image that contains multiple objects in a complex scene. To solve these problems, this paper builds a content-based retrieval system that processes images in five steps. The most critical step is extracting, from among the RGB images, the one with the largest background and the one with the smallest background. In particular, I propose a method that extracts both the subject and the background using the image with the largest background. To solve the second problem, I propose a method that separates multiple objects using an RGB channel selection technique, in which excessive region division is optimized with Watermerge's threshold value and objects are separated by RGB channel separation. Tests showed that the proposed methods outperform existing methods in retrieval performance, to the extent that they can replace methods developed for retrieving complex objects that have been difficult to retrieve until now.

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.69-92 / 2015
  • The explosion of social media data has led to the application of text-mining techniques to analyze big social media data more rigorously. Although social media text analysis algorithms have improved, previous approaches still have limitations. In sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common; some studies have added grammatical factors to the feature sets used to train the classification model. The other applies semantic analysis to sentiment analysis, but it has mainly been applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm, an extension of neural network algorithms, to handle the more extensive semantic features that were underestimated in existing sentiment analysis. The result from Word2Vec is compared with the result from co-occurrence analysis to identify the difference between the two approaches. The results show that Word2Vec extracted three times as many related words expressing some emotion about the keyword as co-occurrence analysis did. The difference comes from Word2Vec's vectorization of semantic features. It is therefore possible to say that the Word2Vec algorithm can catch hidden related words that have not been found in traditional analysis. In addition, part-of-speech (POS) tagging for Korean is used to detect adjectives as "emotional words". The emotion words extracted from the text are then converted into word vectors by the Word2Vec algorithm to find related words. Among these related words, nouns are selected because each of them may have a causal relationship with the "emotional word" in the sentence. The process of extracting these trigger factors of emotional words is named "Emotion Trigger" in this study. As a case study, the datasets were collected by searching with three keywords, professor, prosecutor, and doctor, because these keywords carry rich public emotion and opinion. Preliminary data collection was conducted to select secondary keywords for data gathering. The secondary keywords used to gather the data for the actual analysis are as follows: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin Hae-chul Sky Hospital, drinking and plastic surgery, rebate), Prosecutor (lewd behavior, sponsor). The text data amount to about 100,000 items (Professor: 25,720, Doctor: 35,110, Prosecutor: 43,225), gathered from news, blogs, and Twitter to reflect various levels of public emotion in the analysis. Gephi (http://gephi.github.io) was used for visualization, and all programs used for text processing and analysis were written in Java. The contributions of this study are as follows. First, different approaches to sentiment analysis are integrated to overcome the limitations of existing approaches. Second, finding Emotion Triggers can detect hidden connections to public emotion that existing methods cannot detect. Finally, the approach used in this study could be generalized regardless of the type of text data. The limitation of this study is that it is hard to say whether a word extracted by Emotion Trigger processing has a significant causal relationship with the emotional word in a sentence. Future work will clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing them with manually tagged relationships. Furthermore, the text data used in Emotion Trigger are from Twitter, so the data have a number of distinct features that were not dealt with in this study. These features will be considered in further study.
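The co-occurrence baseline that the abstract compares Word2Vec against can be sketched roughly as follows: given POS-tagged sentences, count the nouns that appear in the same sentence as a chosen emotion word. This is an illustrative Python sketch under stated assumptions, not the authors' code (the abstract says their programs were written in Java). The tag names `NOUN` and `ADJ`, the sample sentences, and the function name are all hypothetical.

```python
from collections import Counter


def cooccurring_nouns(tagged_sentences, emotion_word):
    """Count nouns that share a sentence with emotion_word.

    tagged_sentences is a list of sentences, each a list of
    (word, tag) pairs. The tag set is an assumption; a real Korean
    POS tagger would use its own labels.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        words = [word for word, _ in sentence]
        if emotion_word in words:
            counts.update(word for word, tag in sentence if tag == "NOUN")
    return counts


if __name__ == "__main__":
    # Invented sample data, shown in English for readability.
    sample = [
        [("professor", "NOUN"), ("angry", "ADJ"), ("scandal", "NOUN")],
        [("doctor", "NOUN"), ("kind", "ADJ")],
        [("angry", "ADJ"), ("scandal", "NOUN")],
    ]
    print(cooccurring_nouns(sample, "angry").most_common())
```

A count like this only sees words that literally share a sentence with the emotion word, which is consistent with the abstract's observation that vector-based similarity surfaces related words that co-occurrence analysis misses.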

A study on the relation between colors and tastes used mostly (실생활에서 주로 사용하는 색과 미각의 관계에 관한 연구)

  • Choi, Hyoung-Soon;Kim, Yoo-Jin;Lee, Kyung-Won
    • Science of Emotion and Sensibility / v.12 no.4 / pp.471-480 / 2009
  • According to previous studies, people can imagine a specific taste from certain specific colors, but not from every color. Studies also say that the relation between colors and tastes is determined by personal experience of food colors and by how frequently those colors are encountered. The authors therefore hypothesized that specific colors are related to specific tastes. To find the relation, we selected 24 colors and 24 taste adjectives that people mainly use. We then used questionnaires to examine the tastes imagined from each color by 20 college students who are sensitive to colors and able to use all 24 taste adjectives, and analyzed the results with MDS. We found 5 definite relations between colors and tastes. The results suggest that the number of colors that can be associated with tastes is quite limited, and that the associations differ by sex. This study shows not only the relation between a color and a taste but also how closely the taste is related to other colors. This study can be used for effective food package design, advertising, and so on.
