• Title/Summary/Keyword: natural language query

Search Results: 79

Automatic Generation of Issue Analysis Report Based on Social Big Data Mining (소셜 빅데이터 마이닝 기반 이슈 분석보고서 자동 생성)

  • Heo, Jeong;Lee, Chung Hee;Oh, Hyo Jung;Yoon, Yeo Chan;Kim, Hyun Ki;Jo, Yo Han;Ock, Cheol Young
    • KIPS Transactions on Software and Data Engineering / v.3 no.12 / pp.553-564 / 2014
  • In this paper, we propose a system for the automatic generation of issue analysis reports based on social big data mining, with the purpose of resolving three problems of previous technologies in social media analysis and analytic report generation: the isolation of analysis, the subjectivity of experts, and the closure of information attributable to its high price. The system is comprised of natural language query analysis, issue analysis, social big data analysis, social big data correlation analysis, and automatic report generation. To evaluate report usefulness, we used a Likert scale and had two big data analysis experts rate the reports. The results show that the quality of the reports is comparatively useful and reliable. Because of the low cost of report generation, the correlation analysis of social big data, and the objectivity of the analysis, the proposed system will lead to the popularization of social big data analysis.

Question Analysis and Expansion based on Semantics (의미 기반의 질의 분석 및 확장)

  • Shin, Seung-Eun;Park, Hee-Guen;Seo, Young-Hoon
    • The Journal of the Korea Contents Association / v.7 no.7 / pp.50-59 / 2007
  • This paper describes semantics-based question analysis and expansion for efficient information retrieval. The results of all information retrieval systems include many non-relevant documents, because the index cannot fully reflect the contents of documents and because the queries used in information retrieval systems cannot represent enough of the information in the user's question. To solve this problem, we analyze the user's question semantically, determine the answer type, and extract semantic features. We then expand the user's question using these features and the syntactic structures that are used to represent the answer. Our aim is to rank documents that include the expanded queries in high positions. In particular, we found that efficient document retrieval is possible through semantics-based question analysis and expansion of natural language questions that are comparatively short but fully express the user's information need.
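
To make the expansion idea above concrete, here is a minimal sketch (not the authors' method) of semantics-based query expansion, using WordNet synonyms via NLTK as a stand-in for the paper's semantic features; the function name and parameters are hypothetical.

```python
# Illustrative sketch only: the paper expands queries with its own semantic
# features and answer-type patterns; WordNet synonyms stand in here for the
# general idea of semantics-based expansion.
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def expand_query(terms, max_synonyms=3):
    """Return the original terms plus a few synonym candidates per term."""
    expanded = list(terms)
    for term in terms:
        synonyms = set()
        for synset in wn.synsets(term):
            for lemma in synset.lemmas():
                name = lemma.name().replace("_", " ")
                if name.lower() != term.lower():
                    synonyms.add(name)
        expanded.extend(sorted(synonyms)[:max_synonyms])
    return expanded

print(expand_query(["capital", "france"]))
```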

Design and Implementation of OCR Correction Model for Numeric Digits based on a Context Sensitive and Multiple Streams (제한적 문맥 인식과 다중 스트림을 기반으로 한 숫자 정정 OCR 모델의 설계 및 구현)

  • Shin, Hyun-Kyung
    • The KIPS Transactions:PartD / v.18D no.1 / pp.67-80 / 2011
  • In an automated business document processing system that maintains financial data, errors in query-based retrieval of numbers are critical to the overall performance and usability of the system. Automatic spelling correction methods have emerged and have played an important role in the development of information retrieval systems. However, the scope of these methods has been limited to symbols, such as alphabetic strings, that can be stored in trainable templates or a custom dictionary. Numbers, on the other hand, are sequences of digits that cannot be stored in a dictionary; they are essentially pure Markov sequences. In this paper, we propose a new OCR model for correcting numbers that uses multiple streams and context-sensitive correction on top of a probabilistic information retrieval framework. We implemented the proposed error correction model as a sub-module and integrated it into an existing automated invoice document processing system. We also present comparative test results indicating that our model significantly enhances the overall precision of the system.
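
The digit-correction idea above can be illustrated with a minimal noisy-channel sketch. This is not the paper's model: it only shows how a digit-bigram (Markov) context model combined with an OCR confusion table can prefer one reading of a digit string over another; all probabilities below are invented.

```python
# A minimal sketch, not the paper's model: correct an OCR'd digit string with a
# digit-bigram (Markov) context model plus a per-digit confusion table.
import math

# P(observed | true), keyed as (observed_digit, true_digit); hypothetical values.
CONFUSION = {("3", "3"): 0.9, ("3", "8"): 0.1,
             ("8", "8"): 0.9, ("8", "3"): 0.1,
             ("1", "1"): 0.95}

# P(next | prev): hypothetical digit-bigram statistics from past invoices.
BIGRAM = {("1", "8"): 0.2, ("1", "3"): 0.01}

def score(candidate, observed):
    """Log-probability of a candidate digit string given the OCR reading."""
    total = math.log(CONFUSION.get((observed[0], candidate[0]), 1e-6))
    for prev, cur, obs in zip(candidate, candidate[1:], observed[1:]):
        total += math.log(BIGRAM.get((prev, cur), 1e-6))     # context model
        total += math.log(CONFUSION.get((obs, cur), 1e-6))   # channel model
    return total

# Decide between leaving the OCR reading "13" as-is or correcting it to "18":
observed = "13"
print(max(["13", "18"], key=lambda cand: score(cand, observed)))  # -> "18"
```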

PC-SAN: Pretraining-Based Contextual Self-Attention Model for Topic Essay Generation

  • Lin, Fuqiang;Ma, Xingkong;Chen, Yaofeng;Zhou, Jiajun;Liu, Bo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.8 / pp.3168-3186 / 2020
  • Automatic topic essay generation (TEG) is a controllable text generation task that aims to generate informative, diverse, and topic-consistent essays based on multiple topics. To make the generated essays of high quality, a reasonable method should consider both diversity and topic-consistency. Another essential issue is the intrinsic link between the topics, which helps the essays closely follow the semantics of the provided topics. However, it remains challenging for TEG to fill the semantic gap between source topic words and target output, and a more powerful model is needed to capture the semantics of the given topics. To this end, we propose a pretraining-based contextual self-attention (PC-SAN) model built upon the seq2seq framework. For the encoder of our model, we employ a dynamic weighted sum of layers from BERT to fully utilize the semantics of the topics, which is of great help in filling the gap and improving the quality of the generated essays. In the decoding phase, we also transform the target-side contextual history information into the query layers to alleviate the lack of context in typical self-attention networks (SANs). Experimental results on large-scale paragraph-level Chinese corpora verify that our model is capable of generating diverse, topic-consistent text and makes substantial improvements compared to strong baselines. Furthermore, extensive analysis validates the effectiveness of the contextual embeddings from BERT and the contextual history information in SANs.
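
The encoder trick described above, a dynamic weighted sum of BERT layers, can be sketched as follows. This assumes the Hugging Face transformers library rather than the authors' PC-SAN code, and the module and variable names are our own.

```python
# Sketch of a learnable softmax-weighted sum over all BERT layer outputs,
# which would then feed the encoder side of a seq2seq model.
import torch
from transformers import BertModel, BertTokenizer

class ScalarMix(torch.nn.Module):
    """Learnable softmax-weighted sum over the hidden states of all BERT layers."""
    def __init__(self, num_layers):
        super().__init__()
        self.weights = torch.nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states):          # tuple of (batch, seq, hidden)
        stacked = torch.stack(hidden_states)   # (num_layers, batch, seq, hidden)
        coeffs = torch.softmax(self.weights, dim=0).view(-1, 1, 1, 1)
        return (coeffs * stacked).sum(dim=0)   # (batch, seq, hidden)

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese", output_hidden_states=True)
mix = ScalarMix(num_layers=bert.config.num_hidden_layers + 1)  # +1 for embeddings

inputs = tokenizer("topic words go here", return_tensors="pt")
with torch.no_grad():
    outputs = bert(**inputs)
encoder_states = mix(outputs.hidden_states)    # representation passed to the decoder
print(encoder_states.shape)
```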

A Semantic Similarity Measure for Retrieving Software Components (소프트웨어 부품의 검색을 위한 의미 유사도 측정)

  • Kim, Tae-Hee;Kang, Moon-Seol
    • The Transactions of the Korea Information Processing Society / v.3 no.6 / pp.1443-1452 / 1996
  • In this paper, we propose a semantic similarity measure for reusable software components that aims to automate the classification of reusable components to be stored in a software library and to provide an efficient method for retrieving the components that satisfy a user's requirements. We identified facets that represent component characteristics by extracting information from component descriptions written in natural language, composed software component identifiers from the automatically extracted terms corresponding to each facet, and stored the components in the nearest locations according to the semantic similarity of the classified components. To retrieve components satisfying a user's requirements, we measured the semantic similarity between the queries and the components stored in the software library. By using the semantic similarity to retrieve reusable components, we could not only retrieve the set of components satisfying the user's queries but also reduce retrieval time, and we further improved overall retrieval efficiency by ranking the retrieved components by their degree of query satisfaction.
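
As a rough illustration of facet-based similarity (not the paper's actual facets or formula), the sketch below scores a query against a stored component by weighted term overlap per facet; the facet names and weights are assumed.

```python
# Illustrative only: Jaccard overlap per facet, averaged with hypothetical
# weights, stands in for the general idea of facet-based semantic similarity
# between a query and a stored component.
FACET_WEIGHTS = {"function": 0.5, "object": 0.3, "medium": 0.2}  # assumed weights

def facet_similarity(query_facets, component_facets):
    """Weighted average of per-facet term overlap (Jaccard)."""
    total = 0.0
    for facet, weight in FACET_WEIGHTS.items():
        q = set(query_facets.get(facet, []))
        c = set(component_facets.get(facet, []))
        if q or c:
            total += weight * len(q & c) / len(q | c)
    return total

query = {"function": ["sort"], "object": ["list"], "medium": ["array"]}
component = {"function": ["sort", "order"], "object": ["list"], "medium": ["tree"]}
print(round(facet_similarity(query, component), 3))  # used to rank retrieved components
```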


Classification and Retrieval of Object-Oriented Reuse Components with HACM (HACM을 사용한 객체지향 재사용 부품의 분류와 검색)

  • Bae, Je-Min;Kim, Sang-Geun;Lee, Kyung-Whan
    • The Transactions of the Korea Information Processing Society / v.4 no.7 / pp.1733-1748 / 1997
  • In this paper, we propose a classification scheme and a retrieval mechanism that can be applied to many application domains in order to construct a software reuse library. The classification scheme, which is the core of accessibility in reuse, is defined as a hierarchical structure of agglomerative clusters. An agglomerative cluster is a group of reusable components formed by their functional relationships. Functional relationships are measured with HACM, a representation method for software components that calculates the similarities among the classes in a particular domain. The clustering information is added to the library structure, which determines the functionality and accuracy of the retrieval system, and the system stores the classification results such as weighted index information, the similarity matrix, and the hierarchical structure. Users can therefore retrieve software components using natural language queries. This work focuses on the findability of software components in the reuse library. As a result, part of the construction process of the reuse library was automated, and we can construct an object-oriented reuse library with extensibility and explicit relationships among the reusable components. Our process is also visualized through the browse hierarchy of the retrieval environment, and the retrieval system is integrated into the reuse system CARS 2.1.
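
The clustering step can be illustrated with a short sketch. It is not HACM itself: it only shows how a pairwise similarity matrix over components could be turned into an agglomerative hierarchy with SciPy; the component names and similarity values are invented.

```python
# Group components by a pairwise similarity matrix with agglomerative clustering.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

components = ["Stack", "Queue", "FileReader", "FileWriter"]
similarity = np.array([
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.2],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.2, 0.9, 1.0],
])

# Convert similarity to distance and build the hierarchy (average linkage).
distance = 1.0 - similarity
np.fill_diagonal(distance, 0.0)
tree = linkage(squareform(distance), method="average")

# Cut the hierarchy into clusters; components in the same cluster would be
# stored near each other in the reuse library.
labels = fcluster(tree, t=0.5, criterion="distance")
print(dict(zip(components, labels)))  # e.g. Stack/Queue in one cluster, file components in another
```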


A study on the Extraction of Similar Information using Knowledge Base Embedding for Battlefield Awareness

  • Kim, Sang-Min;Jin, So-Yeon;Lee, Woo-Sin
    • Journal of the Korea Society of Computer and Information / v.26 no.11 / pp.33-40 / 2021
  • Due to increasingly complex strategies, the amount of information that a commander must analyze is growing. An intelligent service that can analyze the battlefield is needed for the commander's timely judgment. This service consists of extracting knowledge from battlefield information, building a knowledge base, and analyzing battlefield information from the knowledge base. This paper extracts information similar to an input query by embedding the knowledge base built in the second step. A transformation model is needed to generate the embedded knowledge base; it uses a random-walk algorithm. The transformed information is embedded using Word2Vec, and similar information is extracted through cosine similarity. In this paper, 980 sentences are generated from an open knowledge base and embedded as 100-dimensional vectors, and we confirmed that similar entities can be extracted through cosine similarity.
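
The pipeline described here (random walks over the knowledge base, Word2Vec embedding, cosine-similarity retrieval) can be sketched compactly. The toy triples, walk settings, and parameters below are invented and are not the paper's data.

```python
# Random walks over a toy knowledge graph -> Word2Vec -> cosine-similarity retrieval.
import random
import networkx as nx
from gensim.models import Word2Vec

# Toy knowledge base as (head, relation, tail) triples.
triples = [("unit_a", "located_in", "sector_1"),
           ("unit_b", "located_in", "sector_1"),
           ("unit_a", "equipped_with", "tank")]

graph = nx.Graph()
for head, _, tail in triples:
    graph.add_edge(head, tail)

def random_walk(g, start, length=5):
    """Generate one walk, treating the visited node names as a 'sentence'."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = list(g.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return walk

sentences = [random_walk(graph, node) for node in graph.nodes for _ in range(20)]
model = Word2Vec(sentences, vector_size=100, window=2, min_count=1, epochs=50)

# Entities similar to a query entity, ranked by cosine similarity.
print(model.wv.most_similar("unit_a", topn=3))
```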

Development of AI-based Real Time Agent Advisor System on Call Center - Focused on N Bank Call Center (AI기반 콜센터 실시간 상담 도우미 시스템 개발 - N은행 콜센터 사례를 중심으로)

  • Ryu, Ki-Dong;Park, Jong-Pil;Kim, Young-min;Lee, Dong-Hoon;Kim, Woo-Je
    • Journal of the Korea Academia-Industrial cooperation Society / v.20 no.2 / pp.750-762 / 2019
  • The importance of the call center as a contact point for the enterprise is growing. However, call centers have difficulty operating because of their agents' lack of knowledge and frequent agent turnover owing to downturns in the business, which causes deterioration in the quality of customer service. Therefore, through an N-bank call center case study, we developed a system to reduce the burden of keeping up with business knowledge and to improve customer service quality. It is a "real-time agent advisor" system that provides agents with answers to customer questions in real time by combining AI technology for speech recognition, natural language processing, and question answering with existing call center information systems, such as the private branch exchange (PBX) and computer telephony integration (CTI). As a result of the case study, we confirmed that the speech recognition system for real-time call analysis and the corpus construction method improve the natural language processing performance of the question answering system. In particular, with named entity recognition (NER), the accuracy of corpus learning improved by 31%. Also, after applying the agent advisor system, 93.1% of agents gave positive feedback on the answers from the agent advisor, which shows that the system is helpful to the agents.

Query-based Answer Extraction using Korean Dependency Parsing (의존 구문 분석을 이용한 질의 기반 정답 추출)

  • Lee, Dokyoung;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.161-177 / 2019
  • In this paper, we study improving the performance of answer extraction in a question-answering (QA) system by using sentence dependency parsing results. A QA system consists of query analysis, which analyzes the user's query, and answer extraction, which extracts appropriate answers from documents; various studies have been conducted on both. To improve the performance of answer extraction, it is necessary to accurately reflect the grammatical information of sentences. Because Korean has free word order and frequent omission of sentence components, dependency parsing is a good way to analyze Korean syntax. Therefore, in this study, we improved the performance of answer extraction by adding features generated by dependency parsing to the inputs of the answer extraction model (a bidirectional LSTM-CRF). We compared the performance of the model when given only basic word features generated without dependency parsing against its performance when the Eojeol tag feature and the dependency graph embedding feature were added. Since dependency parsing is performed on the Eojeol, the basic sentence unit separated by spaces, the tag information of each Eojeol is obtained as a result of parsing; the Eojeol tag feature is this tag information. Generating the dependency graph embedding consists of building the dependency graph from the parsing result and learning an embedding of that graph. From the parsing result, a graph is generated with Eojeols as nodes, dependencies between Eojeols as edges, and Eojeol tags as node labels; an undirected or a directed graph is generated depending on whether the direction of the dependency relation is considered. To obtain the embedding of the graph, we used Graph2Vec, a method that finds a graph embedding from the subgraphs constituting the graph. The maximum path length between nodes can be specified when extracting subgraphs: with a maximum path length of 1, the graph embedding reflects only direct dependencies between Eojeols, and larger values include increasingly indirect dependencies. In the experiments, the maximum path length was varied from 1 to 3, with and without dependency direction, and answer extraction performance was measured. The results show that both the Eojeol tag feature and the dependency graph embedding feature improve answer extraction. In particular, the highest performance was obtained when the direction of the dependency relation was considered and the subgraphs in Graph2Vec were extracted with a maximum path length of 1. From these experiments, we conclude that it is better to take the direction of dependency into account and to consider only direct connections rather than indirect dependencies between words.
The significance of this study is as follows. First, we improved the performance of answer extraction by adding features derived from dependency parsing results, taking into account the characteristics of Korean, which has free word order and frequent omission of sentence components. Second, we generated the dependency-parsing feature with a learning-based graph embedding method, without manually defining patterns of dependency between Eojeols. Future research directions are as follows. In this study, the features generated from dependency parsing are applied only to the answer extraction model. In the future, if performance gains are confirmed by applying these features to other natural language processing models such as sentiment analysis or named entity recognition, their validity can be verified more accurately.
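
The graph construction described above can be sketched briefly. The snippet below is not the authors' code: it builds a toy directed Eojeol dependency graph with networkx and lists depth-1 rooted-subgraph labels, a simplified stand-in for the Weisfeiler-Lehman style subgraphs that Graph2Vec would turn into a graph embedding; the example sentence structure and tag names are invented.

```python
# Toy sketch: directed dependency graph over Eojeols (nodes labeled with
# hypothetical Eojeol tags) and the depth-1 rooted subgraphs a Graph2Vec-style
# method would embed.
import networkx as nx

# (dependent_index, head_index) pairs and per-Eojeol tags for a toy sentence.
dependencies = [(0, 2), (1, 2), (2, 3)]        # Eojeol 3 is the root
tags = {0: "NP_SBJ", 1: "NP_OBJ", 2: "VP", 3: "VP_END"}

graph = nx.DiGraph()
for node, tag in tags.items():
    graph.add_node(node, label=tag)
for dependent, head in dependencies:
    graph.add_edge(dependent, head)            # direction of the dependency

def rooted_subgraph_label(g, node):
    """Depth-1 subgraph label: the node's tag plus the sorted tags of its heads."""
    heads = sorted(g.nodes[h]["label"] for h in g.successors(node))
    return g.nodes[node]["label"] + "->" + ",".join(heads)

# These labels correspond to the "max path length 1" setting: only direct
# dependencies contribute to the graph representation.
print([rooted_subgraph_label(graph, n) for n in graph.nodes])
```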