• Title/Summary/Keyword: Label

Search Result 1,634, Processing Time 0.025 seconds

Query-based Answer Extraction using Korean Dependency Parsing (의존 구문 분석을 이용한 질의 기반 정답 추출)

  • Lee, Dokyoung;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.161-177
    • /
    • 2019
  • In this paper, we study the performance improvement of the answer extraction in Question-Answering system by using sentence dependency parsing result. The Question-Answering (QA) system consists of query analysis, which is a method of analyzing the user's query, and answer extraction, which is a method to extract appropriate answers in the document. And various studies have been conducted on two methods. In order to improve the performance of answer extraction, it is necessary to accurately reflect the grammatical information of sentences. In Korean, because word order structure is free and omission of sentence components is frequent, dependency parsing is a good way to analyze Korean syntax. Therefore, in this study, we improved the performance of the answer extraction by adding the features generated by dependency parsing analysis to the inputs of the answer extraction model (Bidirectional LSTM-CRF). The process of generating the dependency graph embedding consists of the steps of generating the dependency graph from the dependency parsing result and learning the embedding of the graph. In this study, we compared the performance of the answer extraction model when inputting basic word features generated without the dependency parsing and the performance of the model when inputting the addition of the Eojeol tag feature and dependency graph embedding feature. Since dependency parsing is performed on a basic unit of an Eojeol, which is a component of sentences separated by a space, the tag information of the Eojeol can be obtained as a result of the dependency parsing. The Eojeol tag feature means the tag information of the Eojeol. The process of generating the dependency graph embedding consists of the steps of generating the dependency graph from the dependency parsing result and learning the embedding of the graph. From the dependency parsing result, a graph is generated from the Eojeol to the node, the dependency between the Eojeol to the edge, and the Eojeol tag to the node label. In this process, an undirected graph is generated or a directed graph is generated according to whether or not the dependency relation direction is considered. To obtain the embedding of the graph, we used Graph2Vec, which is a method of finding the embedding of the graph by the subgraphs constituting a graph. We can specify the maximum path length between nodes in the process of finding subgraphs of a graph. If the maximum path length between nodes is 1, graph embedding is generated only by direct dependency between Eojeol, and graph embedding is generated including indirect dependencies as the maximum path length between nodes becomes larger. In the experiment, the maximum path length between nodes is adjusted differently from 1 to 3 depending on whether direction of dependency is considered or not, and the performance of answer extraction is measured. Experimental results show that both Eojeol tag feature and dependency graph embedding feature improve the performance of answer extraction. In particular, considering the direction of the dependency relation and extracting the dependency graph generated with the maximum path length of 1 in the subgraph extraction process in Graph2Vec as the input of the model, the highest answer extraction performance was shown. As a result of these experiments, we concluded that it is better to take into account the direction of dependence and to consider only the direct connection rather than the indirect dependence between the words. The significance of this study is as follows. First, we improved the performance of answer extraction by adding features using dependency parsing results, taking into account the characteristics of Korean, which is free of word order structure and omission of sentence components. Second, we generated feature of dependency parsing result by learning - based graph embedding method without defining the pattern of dependency between Eojeol. Future research directions are as follows. In this study, the features generated as a result of the dependency parsing are applied only to the answer extraction model in order to grasp the meaning. However, in the future, if the performance is confirmed by applying the features to various natural language processing models such as sentiment analysis or name entity recognition, the validity of the features can be verified more accurately.

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.57-73
    • /
    • 2021
  • Maintenance and prevention of failure through anomaly detection of ICT infrastructure is becoming important. System monitoring data is multidimensional time series data. When we deal with multidimensional time series data, we have difficulty in considering both characteristics of multidimensional data and characteristics of time series data. When dealing with multidimensional data, correlation between variables should be considered. Existing methods such as probability and linear base, distance base, etc. are degraded due to limitations called the curse of dimensions. In addition, time series data is preprocessed by applying sliding window technique and time series decomposition for self-correlation analysis. These techniques are the cause of increasing the dimension of data, so it is necessary to supplement them. The anomaly detection field is an old research field, and statistical methods and regression analysis were used in the early days. Currently, there are active studies to apply machine learning and artificial neural network technology to this field. Statistically based methods are difficult to apply when data is non-homogeneous, and do not detect local outliers well. The regression analysis method compares the predictive value and the actual value after learning the regression formula based on the parametric statistics and it detects abnormality. Anomaly detection using regression analysis has the disadvantage that the performance is lowered when the model is not solid and the noise or outliers of the data are included. There is a restriction that learning data with noise or outliers should be used. The autoencoder using artificial neural networks is learned to output as similar as possible to input data. It has many advantages compared to existing probability and linear model, cluster analysis, and map learning. It can be applied to data that does not satisfy probability distribution or linear assumption. In addition, it is possible to learn non-mapping without label data for teaching. However, there is a limitation of local outlier identification of multidimensional data in anomaly detection, and there is a problem that the dimension of data is greatly increased due to the characteristics of time series data. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that enhances the performance of anomaly detection by considering local outliers and time series characteristics. First, we applied Multimodal Autoencoder (MAE) to improve the limitations of local outlier identification of multidimensional data. Multimodals are commonly used to learn different types of inputs, such as voice and image. The different modal shares the bottleneck effect of Autoencoder and it learns correlation. In addition, CAE (Conditional Autoencoder) was used to learn the characteristics of time series data effectively without increasing the dimension of data. In general, conditional input mainly uses category variables, but in this study, time was used as a condition to learn periodicity. The CMAE model proposed in this paper was verified by comparing with the Unimodal Autoencoder (UAE) and Multi-modal Autoencoder (MAE). The restoration performance of Autoencoder for 41 variables was confirmed in the proposed model and the comparison model. The restoration performance is different by variables, and the restoration is normally well operated because the loss value is small for Memory, Disk, and Network modals in all three Autoencoder models. The process modal did not show a significant difference in all three models, and the CPU modal showed excellent performance in CMAE. ROC curve was prepared for the evaluation of anomaly detection performance in the proposed model and the comparison model, and AUC, accuracy, precision, recall, and F1-score were compared. In all indicators, the performance was shown in the order of CMAE, MAE, and AE. Especially, the reproduction rate was 0.9828 for CMAE, which can be confirmed to detect almost most of the abnormalities. The accuracy of the model was also improved and 87.12%, and the F1-score was 0.8883, which is considered to be suitable for anomaly detection. In practical aspect, the proposed model has an additional advantage in addition to performance improvement. The use of techniques such as time series decomposition and sliding windows has the disadvantage of managing unnecessary procedures; and their dimensional increase can cause a decrease in the computational speed in inference.The proposed model has characteristics that are easy to apply to practical tasks such as inference speed and model management.

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

  • Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.3
    • /
    • pp.1-23
    • /
    • 2013
  • To discover significant social issues such as unemployment, economy crisis, social welfare etc. that are urgent issues to be solved in a modern society, in the existing approach, researchers usually collect opinions from professional experts and scholars through either online or offline surveys. However, such a method does not seem to be effective from time to time. As usual, due to the problem of expense, a large number of survey replies are seldom gathered. In some cases, it is also hard to find out professional persons dealing with specific social issues. Thus, the sample set is often small and may have some bias. Furthermore, regarding a social issue, several experts may make totally different conclusions because each expert has his subjective point of view and different background. In this case, it is considerably hard to figure out what current social issues are and which social issues are really important. To surmount the shortcomings of the current approach, in this paper, we develop a prototype system that semi-automatically detects social issue keywords representing social issues and problems from about 1.3 million news articles issued by about 10 major domestic presses in Korea from June 2009 until July 2012. Our proposed system consists of (1) collecting and extracting texts from the collected news articles, (2) identifying only news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models. The goal of our proposed matching algorithm is to best match paragraphs to each topic. Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA shows a set of topic clusters, and then each topic cluster is labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and then a human annotator labels "Unemployment Problem" on Topic1. In this example, it is non-trivial to understand what happened to the unemployment problem in our society. In other words, taking a look at only social keywords, we have no idea of the detailed events occurring in our society. To tackle this matter, we develop the matching algorithm that computes the probability value of a paragraph given a topic, relying on (i) topic terms and (ii) their probability values. For instance, given a set of text documents, we segment each text document to paragraphs. In the meantime, using LDA, we can extract a set of topics from the text documents. Based on our matching process, each paragraph is assigned to a topic, indicating that the paragraph best matches the topic. Finally, each topic has several best matched paragraphs. Furthermore, assuming there are a topic (e.g., Unemployment Problem) and the best matched paragraph (e.g., Up to 300 workers lost their jobs in XXX company at Seoul). In this case, we can grasp the detailed information of the social keyword such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly. Through this prototype system, we have detected various social issues appearing in our society and also showed effectiveness of our proposed methods according to our experimental results. Note that you can also use our proof-of-concept system in http://dslab.snu.ac.kr/demo.html.

The actual aspects of North Korea's 1950s Changgeuk through the Chunhyangjeon in the film Moranbong(1958) and the album Corée Moranbong(1960) (영화 <모란봉>(1958)과 음반 (1960) 수록 <춘향전>을 통해 본 1950년대 북한 창극의 실제적 양상)

  • Song, Mi-Kyoung
    • (The) Research of the performance art and culture
    • /
    • no.43
    • /
    • pp.5-46
    • /
    • 2021
  • The film Moranbong is the product of a trip to North Korea in 1958, when Armangati, Chris Marker, Claude Lantzmann, Francis Lemarck and Jean-Claude Bonardo left at the invitation of Joseon Film. However, for political reasons, the film was not immediately released, and it was not until 2010 that it was rediscovered and received attention. The movie consists of the narratives of Young-ran and Dong-il, set in the Korean War, that are folded into the narratives of Chunhyang and Mongryong in the classic Chunhyangjeon of Joseon. At this time, Joseon's classics are reproduced in the form of the drama Chunhyangjeon, which shares the time zone with the two main characters, and the two narratives are covered in a total of six scenes. There are two layers of middle-story frames in the movie, and if the same narrative is set in North Korea in the 1950s, there is an epic produced by the producers and actors of the Changgeuk Chunhyangjeon and the Changgeuk Chunhyangjeon as a complete work. In the outermost frame of the movie, Dong-il is the main character, but in the inner double frame, Young-ran, who is an actor growing up with the Changgeuk Chunhyangjeon and a character in the Changgeuk Chunhyangjeon, is the center. The following three OST albums are Corée Moranbong released in France in 1960, Musique de corée released in 1970, and 朝鮮の伝統音樂-唱劇 「春香伝」と伝統樂器- released in 1968 in Japan. While Corée Moranbong consists only of the music from the film Moranbong, the two subsequent albums included additional songs collected and recorded by Pyongyang National Broadcasting System. However, there is no information about the movie Moranbong on the album released in Japan. Under the circumstances, it is highly likely that the author of the record label or music commentary has not confirmed the existence of the movie Moranbong, and may have intentionally excluded related contents due to the background of the film's ban on its release. The results of analyzing the detailed scenes of the Changgeuk Chunhyangjeon, Farewell Song, Sipjang-ga, Chundangsigwa, Bakseokti and Prison Song in the movie Moranbong or OST album in the 1950s are as follows. First, the process of establishing the North Korean Changgeuk Chunhyangjeon in the 1950s was confirmed. The play, compiled in 1955 through the Joseon Changgeuk Collection, was settled in the form of a Changgeuk that can be performed in the late 1950s by the Changgeuk Chunhyangjeon between 1956 and 1958. Since the 1960s, Chunhyangjeon has no longer been performed as a traditional pansori-style Changgeuk, so the film Moranbong and the album Corée moranbong are almost the last records to capture the Changgeuk Chunhyangjeon and its music. Second, we confirmed the responses of the actors to the controversy over Takseong in the North Korean creative world in the 1950s. Until 1959, there was a voice of criticism surrounding Takseong and a voice of advocacy that it was also a national characteristic. Shin Woo-sun, who almost eliminated Takseong with clear and high-pitched phrases, air man who changed according to the situation, who chose Takseong but did not actively remove Takseong, Lim So-hyang, who tried to maintain his own tone while accepting some of modern vocalization. Although Cho Sang-sun and Lim So-hyang were also guaranteed roles to continue their voices, the selection/exclusion patterns in the movie Moranbong were linked to the Takseong removal guidelines required by North Korean musicians in the name of Dang and People in the 1950s. Second, Changgeuk actors' response to the controversy over the turbidity of the North Korean Changgeuk community in the 1950s was confirmed. Until 1959, there were voices of criticism and support surrounding Taksung in North Korea. Shin Woo-sun, who showed consistent performance in removing turbidity with clear, high-pitched vocal sounds, Gong Gi-nam, who did not actively remove turbidity depending on the situation, Cho Sang-sun, who accepted some of the vocalization required by the party, while maintaining his original tone. On the other hand, Cho Sang-seon and Lim So-hyang were guaranteed roles to continue their sounds, but the selection/exclusion patterns of Moranbong was independently linked to the guidelines for removing turbidity that the Gugak musicians who crossed to North Korea had been asked for.