• Title/Summary/Keyword: Question Clustering

Question and Answering System through Search Result Summarization of Q&A Documents (Q&A 문서의 검색 결과 요약을 활용한 질의응답 시스템)

  • Yoo, Dong Hyun;Lee, Hyun Ah
    • KIPS Transactions on Software and Data Engineering / v.3 no.4 / pp.149-154 / 2014
  • When using a user-participation question answering community such as Knowledge-iN, users must pick out relevant answers themselves from various search results. If refined answers were provided automatically, the usability of such communities would improve. This paper divides questions in Q&A documents into four types (word, list, graph, and text) and proposes a summarization method for each question type using document statistics. Summarized answers for the word, list, and text types are obtained by question clustering and by scoring words using the frequency, proximity, and confidence of answers. Answers for the graph type are shown by extracting user opinions from the answers.
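
The word-scoring step described in this abstract might be sketched as follows. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: the function name `score_candidate_words`, the `1 / (1 + distance)` proximity weight, the per-answer confidence values, the stopword list, and the toy answers.

```python
from collections import defaultdict

def score_candidate_words(answers, query_terms, stopwords=frozenset()):
    """Score candidate answer words (hypothetical formula, not the paper's).

    Each occurrence of a word adds confidence * proximity to its score, so
    frequency enters through the sum over occurrences.
    """
    scores = defaultdict(float)
    for tokens, confidence in answers:
        query_positions = [i for i, t in enumerate(tokens) if t in query_terms]
        if not query_positions:
            continue  # assumption: ignore answers that never mention the query
        for i, token in enumerate(tokens):
            if token in query_terms or token in stopwords:
                continue
            # Distance (in tokens) to the nearest query term in this answer.
            distance = min(abs(i - q) for q in query_positions)
            scores[token] += confidence * (1.0 / (1.0 + distance))
    return dict(scores)

# Toy answers as (tokens, confidence) pairs, invented for illustration.
answers = [
    (["capital", "of", "korea", "is", "seoul"], 0.9),
    (["seoul", "is", "the", "capital"], 0.8),
    (["i", "think", "busan", "is", "the", "capital"], 0.2),
]
scores = score_candidate_words(
    answers,
    query_terms={"capital", "korea"},
    stopwords=frozenset({"is", "the", "of", "i"}),
)
best_word = max(scores, key=scores.get)
```

In this toy data the word that is frequent, close to the query terms, and backed by high-confidence answers ends up with the highest score.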

Research on the Hybrid Paragraph Detection System Using Syntactic-Semantic Analysis (구문의미 분석을 활용한 복합 문단구분 시스템에 대한 연구)

  • Kang, Won Seog
    • Journal of Korea Multimedia Society / v.24 no.1 / pp.106-116 / 2021
  • Paragraph detection is needed to improve the quality of systems for subjective-type question grading and document classification, but it is difficult because it requires semantic analysis. Many studies on paragraph detection address the problem with word-based clustering methods. However, word-based methods cannot use the order of words or the dependency relations between them. This paper proposes a paragraph detection system that uses syntactic-semantic relations between words, obtained through Korean syntactic-semantic analysis. The system is a hybrid of word-based, concept-based, and syntactic-semantic tree-based detection. Experimental results show that it performs better than the word-based system. The system will be used in Korean subjective question grading and document classification.

An Experimental Study on Multi-Document Summarization for Question Answering (질의응답을 위한 복수문서 요약에 관한 실험적 연구)

  • Choi, Sang-Hee;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.21 no.3 / pp.289-303 / 2004
  • This experimental study proposes a multi-document summarization method that produces optimal summaries in which users can find answers to their queries. To identify the most effective method for this purpose, the performance of three summarization methods was compared: sentence clustering, passage extraction through spreading activation, and a clustering-passage extraction hybrid. The effectiveness of each method was evaluated by two criteria that measure the accuracy and the redundancy of a summary. The passage extraction method using the sequential bnb search algorithm proved most effective in summarizing multiple documents with regard to summarization precision. This study therefore proposes passage extraction as the optimal multi-document summarization method.

Query Reconstruction for Searching QA Documents by Utilizing Structural Components (질의응답문서 검색에서 문서구조를 이용한 질의재생성에 관한 연구)

  • Choi, Sang-Hee;Seo, Eun-Gyoung
    • Journal of the Korean Society for Information Management / v.23 no.2 / pp.229-243 / 2006
  • This study suggests an effective way to enhance question-answer (QA) document retrieval performance by reconstructing queries based on the structural features of QA documents. A QA document is a structured document consisting of three components: a question from a questioner, a short description of the question, and answers chosen by the questioner. The study proposes methods for reconstructing a new query using the two major structural parts, question and answer, and examines which component of a QA document contributes most to improving query performance. The major finding is that using the answer document set is the most effective way to reconstruct a new query. That is, queries reconstructed from terms that appear in the answer document set provide the most relevant search results while reducing redundancy among the retrieved documents.

Automatic Response and Conceptual Browsing of Internet FAQs Using Self-Organizing Maps (자기구성 지도를 이용한 인터넷 FAQ의 자동응답 및 개념적 브라우징)

  • Ahn, Joon-Hyun;Ryu, Jung-Won;Cho, Sung-Bae
    • Journal of the Korean Institute of Intelligent Systems / v.12 no.5 / pp.432-441 / 2002
  • Although many services offer useful information on the Internet, computer users are often not familiar enough with them and need an assistant system to use the services easily. In the case of web sites, for example, operators answer users' e-mail questions, but the increasing number of users makes it hard to answer the questions efficiently. In this paper, we propose an assistant system that responds to users' questions automatically and helps them browse the Hanmail Net FAQ (Frequently Asked Questions) conceptually. The system uses a two-level self-organizing map (SOM): a keyword clustering SOM and a document classification SOM. The keyword clustering SOM reduces a variable-length question to a normalized vector, and the document classification SOM classifies the question into an answer class. Experiments on 2,206 e-mail questions collected over one month from Hanmail Net show that the system finds the correct answers with a recognition rate of 95% and that map-based browsing is conceptual and efficient.

Phonetic Question Set Generation Algorithm (음소 질의어 집합 생성 알고리즘)

  • 김성아;육동석;권오일
    • The Journal of the Acoustical Society of Korea / v.23 no.2 / pp.173-179 / 2004
  • Because training data are insufficient in large vocabulary continuous speech recognition, similar context-dependent phones can be clustered by decision trees to share the data. When the decision trees are built and used to predict unseen triphones, a phonetic question set is required. The phonetic question set, which contains categories of phones with similar co-articulation effects, is usually generated by phonetic or linguistic experts. This knowledge-based approach to generating a phonetic question set, however, may reduce the homogeneity of the clusters. Moreover, the experts must adjust the question sets whenever the language or the PLU (phone-like unit) of a recognition system is changed. Therefore, we propose a data-driven method that automatically generates a phonetic question set. Since the proposed method generates the phone categories from the distribution of the speech data, it does not depend on the language or the PLU, and may enhance the homogeneity of the clusters. In large vocabulary speech recognition experiments, the proposed algorithm was found to reduce the error rate by 14.3%.

The Strength of the Relationship between Semantic Similarity and the Subcategorization Frames of the English Verbs: a Stochastic Test based on the ICE-GB and WordNet (영어 동사의 의미적 유사도와 논항 선택 사이의 연관성 : ICE-GB와 WordNet을 이용한 통계적 검증)

  • Song, Sang-Houn;Choe, Jae-Woong
    • Language and Information / v.14 no.1 / pp.113-144 / 2010
  • The primary goal of this paper is to find a feasible way to answer the question: does the similarity in meaning between verbs relate to the similarity in their subcategorization? To answer this question concretely on the basis of a large set of English verbs, this study made use of various language resources, tools, and statistical methodologies. We first compiled a list of 678 verbs selected from the most frequent and second most frequent word lists of the Collins COBUILD English Dictionary that also appear in WordNet 3.0. We calculated similarity measures between all pairs of the words based on the 'jcn' algorithm (Jiang and Conrath, 1997) implemented in the WordNet::Similarity module (Pedersen, Patwardhan, and Michelizzi, 2004). The clustering process followed: first building similarity matrices out of the similarity measure values, next drawing dendrograms on the basis of the matrices, and finally obtaining 177 meaningful clusters (covering 437 verbs) that passed a threshold set by z-score. The subcategorization frames and their frequency values were taken from the ICE-GB. To calculate the Selectional Preference Strength (SPS) of the relationship between a verb and its subcategorizations, we relied on the Kullback-Leibler Divergence model (Resnik, 1996). The SPS values of the verbs in the same cluster were compared with each other, which yielded statistical values that indicate how much the SPS values overlap between the subcategorization frames of the verbs. Our final analysis shows that the degree of overlap, or the relationship between semantic similarity and the subcategorization frames of the verbs in English, is spread evenly from 'very strongly related' to 'very weakly related'. Some semantically similar verbs share a lot in terms of their subcategorization frames, some others show an average degree of strength in the relationship, and the rest, though still semantically similar, tend to share little in their subcategorization frames.
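
The Kullback-Leibler formulation of selectional preference strength mentioned in this abstract can be illustrated with a small sketch. It assumes the standard form SPS(v) = Σ_f P(f | v) · log2(P(f | v) / P(f)) applied to subcategorization frames f. The function name, the frame labels, and the counts below are invented for illustration and are neither ICE-GB data nor the authors' code.

```python
import math
from collections import Counter

def selectional_preference_strength(verb_frames, all_frames):
    """KL divergence D(P(frame | verb) || P(frame)) in bits.

    verb_frames: frame counts for one verb.
    all_frames:  frame counts pooled over all verbs (must cover every frame
                 seen with the verb).
    """
    verb_total = sum(verb_frames.values())
    all_total = sum(all_frames.values())
    sps = 0.0
    for frame, count in verb_frames.items():
        p_given_verb = count / verb_total
        p_prior = all_frames[frame] / all_total
        sps += p_given_verb * math.log2(p_given_verb / p_prior)
    return sps

# Toy frame counts per verb (assumed values).
corpus = {
    "give": Counter({"ditransitive": 8, "transitive": 2}),
    "see": Counter({"transitive": 9, "intransitive": 1}),
    "run": Counter({"intransitive": 7, "transitive": 3}),
}

# The prior P(frame) is estimated from the pooled counts of all verbs.
prior = Counter()
for frames in corpus.values():
    prior.update(frames)

sps_by_verb = {
    verb: selectional_preference_strength(frames, prior)
    for verb, frames in corpus.items()
}
```

A verb whose frame distribution matches the pooled distribution has a strength of zero. The more a verb concentrates on frames that are rare overall, the larger the value.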

Can the Evolutionary Economics Solve the Walras' Trap? (진화주의 기술경제학과 '왈라스 함정')

  • Kim, Tae-Eok
    • Journal of Technology Innovation / v.13 no.1 / pp.213-246 / 2005
  • Despite the impressive progress made in Evolutionary techno-economics during the last two decades, there has been very little, if any, theoretical advancement in explaining an endogenous mechanism of transforming a technological paradigm within self-perpetuating structural dynamics. This question, so far poorly addressed, was raised by Schumpeter a century ago in his effort to overcome the well-known 'Walras' trap'. Although an increasing number of researchers within the Evolutionary school have recently been tackling the issue quite seriously, I argue that a radical reconstruction of the basic principle of the Evolutionary research framework, from an evolutionary approach to a transformational approach, is urgently needed to solve this century-old fundamental question. In the paper, I show the theoretical feasibility of explaining an endogenous mechanism of paradigm transformation, relying on the concepts of localized dynamics and morphogenetic structuration. It should be emphasized that there must be an endogenous process of deepening structural instability generated in the process of economic coordination to secure efficient circular flow. The concept of a development bottleneck initiated by Baumol's cost disease could be regarded as one important source of such a mechanism. Unfortunately, however, the paper presents only a brief conceptual description rather than a comprehensive analytical model, owing to space limitations.

Water Distribution Network Partitioning Based on Community Detection Algorithm and Multiple-Criteria Decision Analysis

  • Bui, Xuan-Khoa;Kang, Doosun
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.115-115 / 2020
  • Water network partitioning (WNP) is a technique for dividing an original water distribution network (WDN) into several sub-networks, called District Metered Areas (DMAs), with only sparse connections between them. Operating and managing (O&M) a WDN through DMAs brings many advantages, such as quantification and detection of water leakage, uniform pressure management, and isolation from chemical contamination. Recent WNP research has applied different methods for dividing a network into a specified number of DMAs. However, how to determine the optimal number of DMAs for a given network remains an open question. In this study, we present a method that divides an original WDN into DMAs (a phase called Clustering) based on a community structure algorithm for the automatic creation of suitable DMAs. To that end, many hydraulic properties are taken into consideration to form appropriate DMAs, so that each DMA is as uniform as possible in terms of pressure, elevation, and water demand. In a second phase, called Sectorization, flow meters and control valves are optimally placed to divide the DMAs while minimizing the pressure reduction. To comprehensively evaluate WNP performance and determine the optimal number of DMAs for a given WDN, we apply the framework of multiple-criteria decision analysis. The proposed method is demonstrated on a real-life benchmark network and yields acceptable results. The approach is a decision-support scheme that helps water utilities make optimal decisions when designing the DMAs of their WDNs.

An Interactive Approach to Categorize Questions on the Internet BBSs (인터넷 게시판 질문 분류를 위한 인터랙티브 접근방법에 관한 연구)

  • Jae-Kwang Lee;Seong-Ho Noh;Ok-Hyun Ryou
    • The Journal of Society for e-Business Studies / v.8 no.3 / pp.177-195 / 2003
  • In a traditional customer support environment, call centers or service centers are mainly responsible for receiving inquiries from customers via telephone calls. With the rapid growth of the Internet and its widespread acceptance and accessibility, the traditional means of communicating with customers, such as telephones, letters, and direct visits, have steadily been replaced by e-mail and Internet bulletin board systems (BBSs). BBSs are basically question and answer systems, and they require some lead time before an administrator answers. To reduce the lead time, BBSs enable remote customers or users to log on and tap into a knowledge database, generally in the form of Frequently Asked Questions (FAQs), that provides answers and solutions to common problems. In addition, many different types of questions are mixed on a BBS, which is a burden on the administrator. To build FAQs and to support the BBS administrator, a supporting tool that categorizes questions is helpful. In this research, we suggest an interactive question categorizing methodology that consists of representing questions using keywords, identifying the affinity of keywords, computing the similarity among questions, and clustering questions. This methodology allows users to interact iteratively to clarify ambiguous questions. We also developed a prototype system, IQC (interactive question categorizer), and evaluated its performance in comparison experiments with other systems. IQC is not a general-purpose system, but it produces good results in a given specific domain.
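
The similarity and clustering steps listed in this abstract could look roughly like the sketch below. The abstract does not say which similarity measure or clustering procedure IQC uses, so the Jaccard measure, the greedy single-pass clustering, the 0.5 threshold, and the sample keyword sets are all assumptions made for illustration. The keyword-affinity and interactive steps are omitted.

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster_questions(keyword_sets, threshold=0.5):
    """Greedy single-pass clustering (assumed procedure, not IQC's).

    A question joins the first existing cluster that contains a member at
    least `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of question indices
    for i, keywords in enumerate(keyword_sets):
        for cluster in clusters:
            if any(jaccard(keywords, keyword_sets[j]) >= threshold
                   for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Each question is represented by its keyword set (invented examples).
questions = [
    {"password", "reset", "login"},
    {"login", "password", "forgot"},
    {"refund", "order", "cancel"},
    {"cancel", "order"},
]
clusters = cluster_questions(questions, threshold=0.5)
```

With these sample keyword sets, the two login questions fall into one cluster and the two order questions into another.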
