• Title/Abstract/Keywords: Inverse Vector Space Model

Search results: 8 items (processing time: 0.024 s)

Chatbot Design Method Using Hybrid Word Vector Expression Model Based on Real Telemarketing Data

  • Zhang, Jie;Zhang, Jianing;Ma, Shuhao;Yang, Jie;Gui, Guan
    • KSII Transactions on Internet and Information Systems (TIIS), Vol. 14, No. 4, pp. 1400-1418, 2020
  • In commercial promotion, the chatbot is one of the significant applications of natural language processing (NLP). Conventional design methods use the bag-of-words (BOW) model alone, based on the Google database and other online corpora. On the one hand, the vectors in the bag-of-words model are unrelated to one another. Although this method suits discrete features, it does not help the machine understand continuous statements, because the connection between words is lost in the encoded word vector. On the other hand, existing methods are tested on state-of-the-art online corpora and are hard to apply to real applications such as telemarketing data. In this paper, we propose an improved chatbot design method that uses a hybrid of the bag-of-words model and the skip-gram model, based on real telemarketing data. Specifically, we first collect real data in the telemarketing field and perform data cleaning and data classification on the constructed corpus. Second, the word representation adopts the hybrid bag-of-words and skip-gram model. The skip-gram model maps synonyms close to one another in the vector space. The correlation between words is thus expressed, so the amount of information contained in the word vector increases, making up for the shortcomings of using the bag-of-words model alone. Third, we use the term frequency-inverse document frequency (TF-IDF) weighting method to raise the weight of key words, and then output the final word expression. Finally, the answer is produced using a hybrid of a retrieval model and a generative model. The retrieval model can accurately answer in-domain questions. The generative model supplements it by answering open-domain questions, where the final reply is produced through long short-term memory (LSTM) training and prediction. Experimental results show that the hybrid word vector expression model improves the accuracy of the response and that the whole system can communicate with humans.
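
The TF-IDF weighting step mentioned in this abstract can be illustrated with a minimal sketch. It uses the common textbook form (relative term frequency times the log of inverse document frequency), which is an assumption: the abstract does not state which TF-IDF variant the authors use. The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(corpus)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus standing in for cleaned telemarketing utterances.
corpus = [
    ["loan", "rate", "offer"],
    ["loan", "card", "offer"],
    ["loan", "rate", "rate"],
]
weights = tf_idf(corpus)
```

A term that occurs in every document, such as "loan" here, receives weight zero, while rarer terms such as "card" are weighted up. This is the effect the abstract describes as improving the weight of key words.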

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model

  • 조원진;노상규;윤지영;박진수
    • Asia Pacific Journal of Information Systems, Vol. 21, No. 1, pp. 103-122, 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some online documents are accompanied by a list of keywords specified by the authors to guide users by facilitating the filtering process. A set of keywords is thus often considered a condensed version of the whole document and plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could also benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, implementation itself is the obstacle: manually assigning keywords to all documents is a daunting, even impractical, task because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to this aim: the keyword assignment approach and the keyword extraction approach. Both use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its terms to the texts. In other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document.
Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. In the latter approach, by contrast, the aim is to extract keywords according to their relevance in the text, without a prior vocabulary. Here, automatic keyword generation is treated as a classification task, and keywords are commonly extracted using supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document as positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as its keywords; as a result, keyword extraction is limited to terms that appear in the document and cannot generate implicit keywords that are not included in it. According to the experimental results of Turney, about 64% to 90% of the keywords assigned by authors can be found in the full text of an article. Conversely, 10% to 36% of the author-assigned keywords do not appear in the article and so cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of the keywords assigned by authors are not included in the full text. This is why we decided to adopt the keyword assignment approach. In this paper, we propose a new approach to automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets.
The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating the keywords that have high similarity scores. Two keyword generation systems were implemented using IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. The first is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers and has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. In our experiment, the precision of IVSM applied to the Web-based community service and to academic journals was 0.75 and 0.71, respectively. Both systems perform much better than baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to Extractor, a representative keyword extraction system developed by Turney. As electronic documents increase, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
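
The five-step process above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each candidate keyword is assumed to carry a term-weight profile (the hypothetical `keyword_profiles` dictionary), tokenization is a plain lowercase regex, and the names `assign_keywords`, `cosine`, and `norm` are invented for the example.

```python
import math
import re
from collections import Counter

def norm(vec):
    """Euclidean length of a sparse {term: weight} vector (steps 1 and 3)."""
    return math.sqrt(sum(w * w for w in vec.values()))

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty (step 4)."""
    na, nb = norm(a), norm(b)
    if na == 0 or nb == 0:
        return 0.0
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    return dot / (na * nb)

def assign_keywords(text, keyword_profiles, top_k=2):
    # Step 2: parse the target document into a term-frequency vector.
    doc_vec = Counter(re.findall(r"[a-z]+", text.lower()))
    # Step 4: score every candidate keyword against the document.
    scored = [(cosine(doc_vec, profile), kw)
              for kw, profile in keyword_profiles.items()]
    # Step 5: keep the candidates with the highest non-zero similarity.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [kw for score, kw in scored[:top_k] if score > 0]

# Hypothetical term-weight profiles for a small controlled vocabulary.
keyword_profiles = {
    "logistics": {"shipping": 2.0, "port": 1.0, "cargo": 1.0},
    "fashion": {"dress": 2.0, "style": 1.0},
    "information retrieval": {"query": 2.0, "document": 1.0},
}
text = "Port congestion delays cargo shipping and raises shipping costs."
keywords = assign_keywords(text, keyword_profiles)
```

In this toy run the only keyword returned is "logistics", a word that never occurs in the text itself. That is the kind of implicit keyword the assignment approach is meant to produce and that extraction-based methods cannot.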

An Optimal Weighting Method in Supervised Learning of Linguistic Model for Text Classification

  • Mikawa, Kenta;Ishida, Takashi;Goto, Masayuki
    • Industrial Engineering and Management Systems, Vol. 11, No. 1, pp. 87-93, 2012
  • This paper discusses a new weighting method for text analysis from the viewpoint of supervised learning. The term frequency-inverse document frequency measure (tf-idf measure) is a well-known weighting method for information retrieval, and it can also be used for text analysis. However, it is an empirical weighting method for information retrieval whose effectiveness has not been established theoretically. Therefore, a more effective weighting measure may be obtainable for document classification problems. In this study, we propose an optimal weighting method for document classification problems from the viewpoint of supervised learning. Because it uses training data, the proposed measure is better suited to the text classification problem than the tf-idf measure. The effectiveness of our proposal is demonstrated by simulation experiments on text classification of newspaper articles and of customer reviews posted on a web site.

An iterative method for damage identification of skeletal structures utilizing biconjugate gradient method and reduction of search space

  • Sotoudehnia, Ebrahim;Shahabian, Farzad;Sani, Ahmad Aftabi
    • Smart Structures and Systems, Vol. 23, No. 1, pp. 45-60, 2019
  • This paper is devoted to proposing a new approach for damage detection of structures. In this technique, the biconjugate gradient method (BCG) is employed. To remedy the noise effects, a new preconditioning algorithm is applied. The proposed preconditioner matrix significantly reduces the condition number of the system. Moreover, based on the characteristics of the damage vector, a new direct search algorithm is employed to increase the efficiency of the suggested damage detection scheme by reducing the number of unknowns. To corroborate the high efficiency and capability of the presented strategy, it is applied for estimating the severity and location of damage in the well-known 31-member and 52-member trusses. For damage detection of these trusses, the time history responses are measured by a limited number of sensors. The results of numerical examples reveal high accuracy and robustness of the proposed method.

Analysis on Kinematics and Dynamics of Human Arm Movement Toward Upper Limb Exoskeleton Robot Control Part 1: System Model and Kinematic Constraint

  • 김현철;이춘영
    • Journal of Institute of Control, Robotics and Systems (제어로봇시스템학회논문지), Vol. 18, No. 12, pp. 1106-1114, 2012
  • To achieve synchronized motion between a wearable robot and a human user, the redundancy must be resolved in the same manner by both systems. According to the seven-DOF (degrees of freedom) human arm model composed of the shoulder, elbow, and wrist joints, positioning and orienting the wrist in space is a task requiring only six DOFs. Due to this redundancy, a given task can be completed by multiple arm configurations, and thus there exists no unique mathematical solution to the inverse kinematics. This paper presents an analysis of the kinematic and dynamic aspects of human arm movement and their effect on the redundancy resolution of the human arm, based on a seven-DOF manipulator model. The redundancy of the arm is expressed mathematically by defining the swivel angle. The final form of the swivel angle can be represented as a linear combination of two different swivel angles obtained by optimizing different cost functions based on kinematic and dynamic criteria. The kinematic criterion is to maximize the projection of the longest principal axis of the manipulability ellipsoid of the human arm onto the vector connecting the wrist and the virtual target on the head region. The dynamic criterion is to minimize the mechanical work done in the joint space between each pair of consecutive points along the task space trajectory. As a first step, the redundancy based on the kinematic criterion is studied thoroughly through motion capture data analysis. Experimental results indicate that, using the proposed redundancy resolution criterion at the kinematic level, the error between the predicted swivel angle and the actual swivel angle acquired from the motor control system is less than five degrees.

Retrieval methodology for similar NPP LCO cases based on domain specific NLP

  • No Kyu Seong;Jae Hee Lee;Jong Beom Lee;Poong Hyun Seong
    • Nuclear Engineering and Technology, Vol. 55, No. 2, pp. 421-431, 2023
  • Nuclear power plants (NPPs) have technical specifications (Tech Specs) to ensure that the equipment and key operating parameters necessary for the safe operation of the plant are maintained within the limiting conditions for operation (LCO) determined by a safety analysis. The LCO of the Tech Specs, which identify the lowest functional capability of equipment required for safe operation of a facility, must be complied with for the safe operation of an NPP. Previous studies have used rule-based expert systems to aid compliance with LCO; however, expert systems have an obvious limit when it comes to implementing rules for the many situations related to LCO. Therefore, in this study, we present a retrieval methodology for similar LCO cases to support determining whether an LCO is met. To reflect NPP-specific features in natural language processing, a domain dictionary was built, and the optimal term frequency-inverse document frequency variant was selected. The retrieval performance was improved by adding a Boolean retrieval model based on LCO-related terms to the vector space model. The developed domain dictionary and retrieval methodology are expected to be highly useful in determining whether an LCO is met.
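
A combination of a Boolean retrieval model with a vector space model could look roughly like the sketch below. How the paper actually combines the two is not stated in the abstract; here the Boolean stage is assumed to be a hard filter on required terms, applied before cosine ranking on raw term counts. The case texts, the `retrieve` function, and the `required_terms` parameter are all hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, cases, required_terms):
    """Rank cases by cosine similarity, keeping only those that pass the Boolean filter."""
    q_vec = Counter(query.lower().split())
    results = []
    for case_id, text in cases.items():
        tokens = text.lower().split()
        # Boolean stage (assumed AND semantics): every required term must occur.
        if not set(required_terms) <= set(tokens):
            continue
        # Vector space stage: score the surviving case against the query.
        results.append((cosine(q_vec, Counter(tokens)), case_id))
    ranked = sorted(results, key=lambda r: (-r[0], r[1]))
    return [case_id for score, case_id in ranked]

# Hypothetical case descriptions, not taken from the paper.
cases = {
    "case-1": "diesel generator inoperable during surveillance test",
    "case-2": "charging pump inoperable after maintenance",
    "case-3": "diesel generator fuel leak repaired",
}
ranking = retrieve("diesel generator inoperable", cases, ["inoperable"])
```

Here "case-3" is dropped by the Boolean stage even though it shares two query terms, and the remaining cases are ordered by similarity. A real system would use the selected TF-IDF variant and the domain dictionary rather than raw whitespace tokens.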

Analysis on the Kinematics and Dynamics of Human Arm Movement Toward Upper Limb Exoskeleton Robot Control - Part 2: Combination of Kinematic and Dynamic Constraints

  • 김현철;이춘영
    • Journal of Institute of Control, Robotics and Systems (제어로봇시스템학회논문지), Vol. 20, No. 8, pp. 875-881, 2014
  • The redundancy resolution of the seven-DOF (degrees of freedom) upper limb exoskeleton is key to synchronous motion between a robot and a human user. According to the seven-DOF human arm model, positioning and orienting the wrist can be completed by multiple arm configurations, which results in a non-unique solution to the inverse kinematics. This paper presents an analysis of the kinematic and dynamic aspects of human arm movement and their effect on the redundancy resolution of the seven-DOF human arm model. The redundancy of the arm is expressed mathematically by defining the swivel angle. The final form of the swivel angle can be represented as a linear combination of two different swivel angles obtained by optimizing two cost functions based on kinematic and dynamic criteria. The kinematic criterion is to maximize the projection of the longest principal axis of the manipulability ellipsoid of the human arm onto the vector connecting the wrist and the virtual target on the head region. The dynamic criterion is to minimize the mechanical work done in the joint space between each pair of consecutive points along the task space trajectory. The contribution of each criterion to the redundancy resolution was verified by post-processing experimental data collected with a motion capture system. Results indicate that the bimodal redundancy resolution approach improved the accuracy of the predicted swivel angle. Statistical testing of the dynamic constraint contribution shows that, under moderate speeds and no load, the dynamic component of the human arm is not dominant, and the redundancy can be resolved without the dynamic constraint for real-time applications.
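
The linear combination described in this abstract can be written schematically as below. The symbols are assumptions introduced for illustration: $\phi_k$ and $\phi_d$ denote the swivel angles obtained from the kinematic and dynamic criteria, and $w_k$, $w_d$ are combination weights whose values and constraints the abstract does not give.

```latex
\phi = w_{k}\,\phi_{k} + w_{d}\,\phi_{d}
```

Under this notation, the reported finding that the dynamic component is not dominant at moderate speeds and no load corresponds to $w_d$ being small relative to $w_k$, so that $\phi \approx \phi_k$ may suffice for real-time use.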

Comparison of Product and Customer Feature Selection Methods for Content-based Recommendation in Internet Storefronts

  • 안형준;김종우
    • The KIPS Transactions: Part D (정보처리학회논문지D), Vol. 13D, No. 2, pp. 279-286, 2006
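  • One widely used approach to product recommendation in Internet shopping malls is to compare the characteristics of products with those of customers and recommend the products that suit each customer. With this approach, the more features there are describing products or customers, the more important it becomes, for both the effectiveness and the efficiency of recommendation, to determine which features should be selected to yield better recommendation performance; however, this question has not yet been sufficiently studied. In this study, based on a virtual purchase experiment at an Internet bookstore, we experiment with methods for selecting the features that best represent a user from the books the user has purchased, using the vector space model, TFIDF (Term Frequency-Inverse Document Frequency), Mutual Information, and SVD (Singular Value Decomposition), and compare the results. The experiments show that the feature extraction technique based on SVD achieved the best performance.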