• Title/Summary/Keyword: 어휘정보 (lexical information)


Generalized LR Parser with Conditional Action Model(CAM) using Surface Phrasal Types (표층 구문 타입을 사용한 조건부 연산 모델의 일반화 LR 파서)

  • 곽용재;박소영;황영숙;정후중;이상주;임해창
    • Journal of KIISE:Software and Applications / v.30 no.1_2 / pp.81-92 / 2003
  • Generalized LR (GLR) parsing is an enhanced LR parsing method that overcomes the limitation of the traditional LR parser's one-way linear stack by using a graph-structured stack, and it has served as a firm starting point for many variants of natural language parsers equipped with various mechanisms. In this paper, we propose a Conditional Action Model (CAM) that addresses the problems of conventional probabilistic GLR methods. Previous probabilistic GLR parsers have used relatively limited contextual information for disambiguation because of the high complexity of the internal GLR stack. Our model uses Surface Phrasal Types, which represent the structural characteristics of a parse, as additional contextual information, so that more specific structural preferences can be reflected in the parser. Experimental results show that our GLR parser with the proposed Conditional Action Model outperforms previous methods by about 6-7% without using any lexical information, and that the model can exploit the rich stack information for syntactic disambiguation in a probabilistic LR parser.
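A conditional action model of this kind can be pictured as estimating the probability of each LR action given the parser state, the lookahead symbol and, as extra context, a surface phrasal type. The sketch below is a minimal illustration under assumptions: it uses plain relative-frequency counts with no smoothing, and the state numbers, action strings and phrasal-type labels (`NP`, `VP`) are hypothetical, not the paper's actual feature set or estimator.

```python
from collections import Counter, defaultdict


class ConditionalActionModel:
    """Relative-frequency estimate of P(action | state, lookahead, phrasal_type).

    Hypothetical sketch: phrasal_type stands in for a Surface Phrasal Type label.
    """

    def __init__(self):
        self.counts = defaultdict(Counter)  # context tuple -> Counter of actions

    def observe(self, state, lookahead, phrasal_type, action):
        # Count one action taken in this context (e.g. read off a training treebank).
        self.counts[(state, lookahead, phrasal_type)][action] += 1

    def prob(self, state, lookahead, phrasal_type, action):
        actions = self.counts.get((state, lookahead, phrasal_type))
        if not actions:
            return 0.0  # unseen context; a real parser would back off or smooth
        return actions[action] / sum(actions.values())

    def best_action(self, state, lookahead, phrasal_type, candidates):
        # Rank the conflicting actions a GLR parser would otherwise split on.
        return max(candidates, key=lambda a: self.prob(state, lookahead, phrasal_type, a))


model = ConditionalActionModel()
# The same (state, lookahead) conflict, resolved differently by phrasal type.
for _ in range(3):
    model.observe(7, "noun", "NP", "shift")
model.observe(7, "noun", "NP", "reduce 4")
for _ in range(2):
    model.observe(7, "noun", "VP", "reduce 4")

print(model.prob(7, "noun", "NP", "shift"))                       # 0.75
print(model.best_action(7, "noun", "VP", ["shift", "reduce 4"]))  # reduce 4
```

Conditioning on the extra structural label lets one shift/reduce conflict be resolved differently in different contexts, which is the intuition behind adding context to the action model; sparse counts are the practical cost.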

Multi-Document Summarization Method Based on Semantic Relationship using VAE (VAE를 이용한 의미적 연결 관계 기반 다중 문서 요약 기법)

  • Baek, Su-Jin
    • Journal of Digital Convergence / v.15 no.12 / pp.341-347 / 2017
  • As the amount of document data increases, users need summarized information to understand documents. However, existing document summarization methods rely on overly simple statistics, so there has been insufficient research on multi-document summarization that handles sentence ambiguity and generates meaningful sentences. In this paper, we investigate semantic connections and a preprocessing step that filters out unnecessary information. Based on lexical semantic pattern information, we propose a multi-document summarization method that enhances semantic connectivity between sentences using a VAE (variational autoencoder). Using sentence word vectors, we reconstruct sentences after learning from the compressed information and the attribute discriminators generated as latent variables, and semantic connection processing then generates natural summary sentences. Compared with other document summarization methods, the proposed method showed a small but consistent improvement in performance, showing that semantic sentence generation and connectivity can be increased. In future work, we will study how to extend semantic connections by experimenting with various attribute settings.
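The latent variables mentioned above come from a variational autoencoder. As a generic illustration, not the paper's summarization architecture, the sketch below shows two standard VAE ingredients for a diagonal Gaussian posterior: the reparameterization trick used to sample a latent vector, and the closed-form KL divergence to a standard normal prior that regularizes it. Vectors are plain Python lists, and the encoder outputs are invented numbers.

```python
import math
import random


def reparameterize(mu, logvar, rng=random):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(logvar / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def negative_elbo(reconstruction_error, mu, logvar):
    """VAE training loss: reconstruction term plus the KL regularizer."""
    return reconstruction_error + kl_to_standard_normal(mu, logvar)


# Encoder outputs for one sentence (invented numbers, 2-dimensional latent space).
mu, logvar = [0.5, -0.2], [0.0, -1.0]
z = reparameterize(mu, logvar, random.Random(0))
print(z, negative_elbo(1.3, mu, logvar))
```

Sampling through `mu + sigma * eps` keeps the sample differentiable with respect to the encoder outputs, which is why VAEs are trained this way.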

A Malicious Comments Detection Technique on the Internet using Sentiment Analysis and SVM (감성분석과 SVM을 이용한 인터넷 악성댓글 탐지 기법)

  • Hong, Jinju;Kim, Sehan;Park, Jeawon;Choi, Jaehyun
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.2 / pp.260-267 / 2016
  • The Internet has brought many changes by allowing people to share information with one another. However, like every social phenomenon, it is double-edged and has caused serious social problems. Malicious users have taken advantage of anonymity on the Internet to post aggressive comments involving defamation, personal attacks, privacy violations, and more. Malicious comments are the biggest source of the unlawful acts and insults that occur on the Internet. To address these issues, several studies have been conducted on managing comments efficiently. However, previous research is limited in recognizing deliberately modified malicious vocabulary. In this paper, we therefore propose a malicious comment detection technique that overcomes this limitation of previous studies. Experimental results show an accuracy of 87.8%, higher than that of previous studies.
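The title pairs sentiment analysis with an SVM. A minimal sketch of that combination, under assumptions: each comment becomes a vector of bag-of-words indicators plus one summed sentiment score from a tiny hypothetical lexicon, and a linear SVM is trained with the Pegasos stochastic sub-gradient method on the hinge loss. The lexicon, vocabulary and toy comments are invented, and the paper's actual features and its handling of modified malicious vocabulary are not reproduced.

```python
import random

# Hypothetical sentiment lexicon (illustrative only): negative words -1, positive +1.
SENTIMENT = {"idiot": -1.0, "trash": -1.0, "hate": -1.0,
             "great": 1.0, "good": 1.0, "thanks": 1.0}


def featurize(comment, vocab):
    """Bag-of-words indicators, a summed sentiment score, and a constant bias feature."""
    tokens = comment.lower().split()
    bow = [1.0 if word in tokens else 0.0 for word in vocab]
    sentiment = sum(SENTIMENT.get(tok, 0.0) for tok in tokens)
    return bow + [sentiment, 1.0]


def train_pegasos(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained with Pegasos sub-gradient steps on the regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularization step)
            if margin < 1.0:  # hinge loss is active: move toward the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """+1 = malicious, -1 = normal."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


train = [("you are an idiot", 1), ("this is trash and i hate you", 1), ("idiot trash", 1),
         ("great post thanks", -1), ("good point thanks", -1), ("great and good work", -1)]
vocab = sorted({tok for text, _ in train for tok in text.split()})
X = [featurize(text, vocab) for text, _ in train]
y = [label for _, label in train]
w = train_pegasos(X, y)
print(predict(w, featurize("what an idiot", vocab)))
```

The sentiment score gives the classifier a signal that still fires when a word is unseen, which is one plausible reason to combine it with lexical features.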

Design and Implementation of Search System Using Domain Ontology (도메인 온톨로지를 이용한 검색 시스템 설계 및 구현)

  • Kang, Rae-Goo;Jung, Chai-Yeoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.11 no.7 / pp.1318-1324 / 2007
  • The Traveling Salesman Problem (TSP) is the problem of finding the shortest route that, given N cities, starts from a certain city, visits every other city exactly once, and returns to the starting city. As the number of cities increases, the amount of computation grows rapidly. TSP is classified as an NP-hard problem, and genetic algorithms are a representative approach to solving it. To obtain better results on TSP, various operators have been developed and studied. This paper proposes new methods for population initialization and sequential transformation, and demonstrates their improved performance by comparing them with existing methods.
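To make the problem statement concrete, the sketch below computes the length of a closed tour and solves tiny instances exactly by enumerating every tour that starts at city 0. There are (N-1)! such orderings, which is why exhaustive search breaks down quickly and heuristics such as genetic algorithms are used instead. The coordinates are invented, and the code does not implement the population initialization or transformation operators, which the abstract does not specify.

```python
import itertools
import math


def tour_length(tour, coords):
    """Total Euclidean length of a closed tour that returns to its starting city."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def brute_force_tsp(coords):
    """Exact TSP by enumerating all (N-1)! tours that start at city 0."""
    n = len(coords)
    best = None
    for rest in itertools.permutations(range(1, n)):
        tour = (0,) + rest
        length = tour_length(tour, coords)
        if best is None or length < best[1]:
            best = (tour, length)
    return best


# Four corners of a unit square: the optimal tour walks the perimeter.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
tour, length = brute_force_tsp(square)
print(tour, length)                       # (0, 1, 2, 3) 4.0
print(math.factorial(len(square) - 1))    # tours examined: 6
```

At 15 cities the same loop would already examine 14! (about 87 billion) tours, which illustrates why population-based heuristics are preferred.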

A Study on the Statistical Characteristics for Table of Contents Text of the Books in Social Sciences Field (사회과학 분야 도서의 목차 텍스트에 대한 통계적 특성에 관한 연구)

  • Lee, Yong-Gu
    • Journal of the Korean Society for Information Management / v.36 no.2 / pp.255-273 / 2019
  • Recently, tables of contents (TOCs) have become increasingly accessible and widely used. This study carried out descriptive statistics and a comparative analysis of TOC text in terms of parts of speech and subject. For this purpose, the study selected social science books from the acquisition lists of an academic library, obtained the Dewey class numbers of the target books from the KERIS union catalog, and extracted TOC data from an online bookstore. Morphological analysis was performed on each book title and TOC, followed by descriptive statistics and frequency analysis. As a result, nouns made up roughly half of the morphemes in both titles and TOCs. TOCs contained about 50 times more nouns than titles. Unique nouns that appeared only in the TOC were estimated to account for 95.2% of all TOC nouns. TOC length also differed across the fields of social science.
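The noun statistics above can be computed directly from morphological-analysis output. A minimal sketch, assuming each title and TOC has already been tagged into (morpheme, POS) pairs with nouns tagged `NNG`/`NNP` (Sejong-style tags; the tag set the study actually used is an assumption), and assuming "unique nouns" are counted as distinct noun types found in the TOC but not in the title. The two tagged examples are invented.

```python
NOUN_TAGS = {"NNG", "NNP"}  # assumed noun tags (Sejong-style common/proper nouns)


def noun_share(tagged):
    """Fraction of morphemes tagged as nouns."""
    return sum(1 for _, pos in tagged if pos in NOUN_TAGS) / len(tagged)


def toc_only_noun_ratio(title_tagged, toc_tagged):
    """Share of distinct TOC nouns that never appear in the title."""
    title_nouns = {m for m, pos in title_tagged if pos in NOUN_TAGS}
    toc_nouns = {m for m, pos in toc_tagged if pos in NOUN_TAGS}
    return len(toc_nouns - title_nouns) / len(toc_nouns)


title = [("사회", "NNG"), ("학", "XSN"), ("의", "JKG"), ("이해", "NNG")]
toc = [("사회", "NNG"), ("의", "JKG"), ("개념", "NNG"), ("과", "JC"),
       ("연구", "NNG"), ("방법", "NNG"), ("이해", "NNG"), ("하", "XSV"), ("기", "ETN")]

print(noun_share(title), noun_share(toc), toc_only_noun_ratio(title, toc))
```

Whether the ratio is counted over noun types (as here) or noun tokens changes the result, so it is worth fixing that choice before comparing figures.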

Ontology Design for the Register of Officials(先生案) of the Joseon Period (조선시대 선생안 온톨로지 설계)

  • Kim, Sa-hyun
    • (The)Study of the Eastern Classic / no.69 / pp.115-146 / 2017
  • This paper presents research on ontology design for a digital archive of the seonsaengan (先生案) of the Joseon Period. A seonsaengan is the register of staff officials at a government office, recording their personal information, such as dates of birth and family clans, together with records of their transfers from one office to another. A total of 176 types of registers are known to be kept at libraries and museums in Korea. This paper designs an ontology for 47 of these registers preserved at the Jangseogak Archives of the Academy of Korean Studies (AKS), focusing on their content and structure, including the names of the relevant government offices and the posts held by the officials. The ontology design centers on the officials, the offices they belonged to, and the transfer records kept in the registers. Relevant resources were categorized into classes according to the attributes common to their individuals, and for each individual a semantic predicate was defined that explicitly expresses its relationship with other individuals. The classes were divided into eight categories, including registers, figures, offices, official posts, state examinations, records, and concepts. For the design of relationships and attributes, existing vocabularies already in use were consulted, such as Dublin Core, the Europeana Data Model, CIDOC-CRM, and the data model for the database of past state examination passers. Where terms from existing data models were used, the namespace of the relevant data model was retained, and the author defined new relationships where necessary. The designed ontology is demonstrated with an example implementation of the Myeongneung seonsaengan (明陵先生案). The work also considered the expected effects of the entered information when coverage is extended from a single register to multiple registers, along with ways to use it.
The ontology design is not based on a review of all 176 registers, so the model needs to be improved whenever relevant information is obtained. The aim of such efforts is the systematic arrangement of the information contained in the registers. Information arranged in this manner can be rearranged with the aid of databases or archives that exist now or will be built in the future. It is expected that the information entered through the ontology design, linked with existing databases, will serve as data showing how government offices were operated and what their personnel system was like, as well as the politics, economy, society, and culture of the Joseon Period.

Nonlinear Vector Alignment Methodology for Mapping Domain-Specific Terminology into General Space (전문어의 범용 공간 매핑을 위한 비선형 벡터 정렬 방법론)

  • Kim, Junwoo;Yoon, Byungho;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.127-146 / 2022
  • Recently, as word embedding has shown excellent performance in various deep learning-based natural language processing tasks, research on advancing and applying word, sentence, and document embeddings has been actively conducted. Among these, cross-lingual transfer, which enables semantic exchange between different languages, has grown alongside embedding models, and academic interest in vector alignment is increasing with the expectation that it can be applied to various embedding-based analyses. In particular, vector alignment is expected to be applicable to mapping between specialized and general domains: for example, mapping the vocabulary of specialized fields such as R&D, medicine, and law into the space of a language model pre-trained on a huge volume of general-purpose documents, or providing clues for mapping vocabulary between different specialized fields. However, the linear vector alignment methods mainly studied so far assume statistical linearity and therefore tend to oversimplify the vector space. This essentially assumes that different vector spaces are geometrically similar, which inevitably causes distortion during alignment. To overcome this limitation, we propose a deep learning-based vector alignment methodology that effectively learns the nonlinearity of the data. The proposed methodology sequentially trains a skip-connected autoencoder and a regression model to align specialized word embeddings, expressed in their own space, to the general embedding space. Finally, through inference with the two trained models, the specialized vocabulary can be aligned in the general space.
To verify the performance of the proposed methodology, an experiment was performed on a total of 77,578 documents in the 'health care' field among national R&D tasks carried out from 2011 to 2020. The results confirmed that the proposed methodology outperformed existing linear vector alignment in terms of cosine similarity.
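The evaluation compares aligned vectors by cosine similarity. As a hedged illustration of the linear baseline that the paper argues against, the sketch below fits a least-squares linear map `W` from specialized vectors `X` to general-space vectors `Y` (solving the normal equations with plain Gauss-Jordan elimination) and scores it by mean cosine similarity. The 2-dimensional toy vectors are invented; the proposed skip-connected autoencoder plus regression model is not implemented here.

```python
import math


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, B):
    """Solve A @ W = B for W by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + B[i][:] for i in range(n)]  # augmented matrix [A | B]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fit_linear_map(X, Y):
    """Least-squares W minimizing ||X W - Y||, via the normal equations."""
    Xt = transpose(X)
    return solve(matmul(Xt, X), matmul(Xt, Y))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_cosine(P, Y):
    return sum(cosine(p, t) for p, t in zip(P, Y)) / len(Y)


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
Y_linear = [[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0], [-1.0, 2.0]]  # X rotated by 90 degrees
Y_nonlinear = [[x1 * x1, x2] for x1, x2 in X]                   # first axis squared

for name, Y in [("linear", Y_linear), ("nonlinear", Y_nonlinear)]:
    W = fit_linear_map(X, Y)
    print(name, round(mean_cosine(matmul(X, W), Y), 4))
```

The linear map recovers the rotation exactly but cannot represent the squared axis, a small-scale version of the distortion the abstract attributes to linear alignment.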

Quantities, Degrees, and Possible Worlds - Lexical Semantics of Korean Adverb '거의(geoui)' (양(quantity), 정도(degree), 가능세계 - 부사 '거의'의 어휘의미를 중심으로 -)

  • Kim, Shin-Hwe
    • Language and Information / v.15 no.2 / pp.47-65 / 2011
  • The Korean adverb '거의(geoui)' modifies predicates to form complex predicates meaning 'nearly' having the complete or typical properties of the modified predicates in quantity, degree, or frequency. The 'complete' or 'typical' properties of the modified predicates serve counterfactually as standards against which the deficiency expressed by the derived predicates is measured. These counterfactual standards can be formalized with the counterfactual conditional operator of the intensional semantics in Cresswell (1990). The deficiency in the quantity, degree, or frequency of a property can be expressed formally by introducing a world-independent measure of comparison, which can be constructed from relations between intensional objects at indices and their equivalence classes. This world-independent measure of comparison has a semantic structure that is under-specified for quantity, degree, and frequency, and it appears well suited to describing the lexical meaning of '거의(geoui)'. The lexical-semantic analysis of '거의(geoui)' explicitly shows the plausibility that a comparison measure operating across real and counterfactual worlds is indispensable to natural language meaning. We also examine Kim Young-hee (1985)'s proposal of a transition to quantificational meaning for Korean degree adverbs, which attempts to explain the quantificational meaning of Korean degree adverbs in general, including '거의(geoui)', through several syntactic and semantic constraints of 'contextual deletion'. However, we show that the quantificational meanings of the degree adverbs discussed by Kim (1985) are better explained by their under-specified meanings in quantity, frequency, and degree, with the world-independent measure of comparison applied under their paradigmatic lexical constraint, than by Kim (1985)'s transition of meaning.


A Study on the Korean Broadcasting Speech Recognition (한국어 방송 음성 인식에 관한 연구)

  • 김석동;송도선;이행세
    • The Journal of the Acoustical Society of Korea / v.18 no.1 / pp.53-60 / 1999
  • This paper studies Korean broadcast speech recognition. We present methods for large-vocabulary continuous speech recognition, focusing on language modeling and the search algorithm. The acoustic model is a uniphone semi-continuous hidden Markov model, and the language model is an N-gram model. The search algorithm consists of three phases in order to use all available acoustic and linguistic information. First, a forward Viterbi beam search finds word-end frames and estimates the related scores. Second, a backward Viterbi beam search finds word-begin frames and estimates the related scores. Finally, an A* search combines these two results with the N-gram language model to obtain the recognition results. With these methods, a maximum word recognition rate of 96.0% and a syllable recognition rate of 99.2% were achieved on speaker-independent continuous speech recognition with a vocabulary of about 12,000 words.
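The first search pass is a forward Viterbi beam search. The sketch below is a generic version over a small discrete HMM, not the paper's recognizer: at each frame it extends every surviving hypothesis by one state, keeps the best-scoring path into each state (the Viterbi max), and then prunes hypotheses whose log score falls more than `beam` below that frame's best. The two-state weather model is a familiar toy example; word-end detection, the backward pass, and the A* combination with the N-gram model are omitted.

```python
import math


def _log(p):
    """Log probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_beam(obs, states, start_p, trans_p, emit_p, beam=10.0):
    """Forward Viterbi search with beam pruning; returns (best log score, state path)."""
    # Active hypotheses: state -> (log score, best path ending in that state).
    hyps = {s: (_log(start_p[s]) + _log(emit_p[s][obs[0]]), [s]) for s in states}
    for o in obs[1:]:
        new = {}
        for s in states:
            # Viterbi recursion: keep only the best surviving predecessor of s.
            score, path = max(((sc + _log(trans_p[prev][s]), p)
                               for prev, (sc, p) in hyps.items()),
                              key=lambda item: item[0])
            new[s] = (score + _log(emit_p[s][o]), path + [s])
        best = max(sc for sc, _ in new.values())
        # Beam pruning: discard hypotheses scoring more than `beam` below the best.
        hyps = {s: v for s, v in new.items() if v[0] >= best - beam}
    return max(hyps.values(), key=lambda item: item[0])


states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3}, "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
obs = ["walk", "shop", "clean"]

wide = viterbi_beam(obs, states, start_p, trans_p, emit_p, beam=10.0)
narrow = viterbi_beam(obs, states, start_p, trans_p, emit_p, beam=0.0)
print(wide[1], math.exp(wide[0]))      # exact Viterbi path
print(narrow[1], math.exp(narrow[0]))  # a too-narrow beam loses it
```

With a wide beam the result equals exact Viterbi decoding; with `beam=0.0` the eventually optimal path is pruned at the second frame, which shows the speed/accuracy trade-off that beam width controls.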


XML Schema Matching based on Ontology Update for the Transformation of XML Documents (XML 문서의 변환을 위한 온톨로지 갱신 기반 XML 스키마 매칭)

  • Lee, Kyong-Ho;Lee, Jun-Seung
    • Journal of KIISE:Databases / v.33 no.7 / pp.727-740 / 2006
  • Schema matching is an important prerequisite for transforming XML documents. This paper presents a schema matching method for the transformation of XML documents. The proposed method consists of two steps: first, preliminary matching relationships between leaf nodes in the two XML schemas are computed based on the proposed ontology and a leaf-node similarity; then, the final matchings are extracted based on a proposed path similarity. In particular, for more sophisticated schema matching, the proposed ontology is incrementally updated with user feedback. Furthermore, since the ontology can describe various relationships between concepts, the proposed method can compute complex matchings as well as simple ones. Experiments with schemas from various domains show that the proposed method is superior to previous work, with an average precision of 97% and an average recall of 83%. Furthermore, the dynamic ontology increased by 9 percent overall.
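The reported figures are precision and recall of the extracted matchings against a reference set. A minimal sketch of how those two numbers are computed for one schema pair, with matchings represented as (source path, target path) pairs; the paths are hypothetical, and the matcher itself (ontology update, leaf-node and path similarity) is not reproduced.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted schema matchings against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical reference matchings between two purchase-order schemas.
gold = {
    ("/order/customer/name", "/purchase/buyer/fullName"),
    ("/order/customer/phone", "/purchase/buyer/tel"),
    ("/order/item/price", "/purchase/line/cost"),
    ("/order/item/qty", "/purchase/line/quantity"),
    ("/order/date", "/purchase/orderDate"),
}
predicted = {
    ("/order/customer/name", "/purchase/buyer/fullName"),
    ("/order/item/price", "/purchase/line/cost"),
    ("/order/item/qty", "/purchase/line/quantity"),
    ("/order/customer/phone", "/purchase/line/cost"),  # a wrong match
}
print(precision_recall(predicted, gold))  # (0.75, 0.6)
```

An "on average" figure would then be the mean of these per-pair values over all schema pairs in the experiment.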