• Title/Summary/Keyword: online information retrieval

Chatbot Design Method Using Hybrid Word Vector Expression Model Based on Real Telemarketing Data

  • Zhang, Jie;Zhang, Jianing;Ma, Shuhao;Yang, Jie;Gui, Guan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.4
    • /
    • pp.1400-1418
    • /
    • 2020
  • In commercial promotion, the chatbot is regarded as one of the significant applications of natural language processing (NLP). Conventional design methods use the bag-of-words (BOW) model alone, based on the Google database and other online corpora. This has two drawbacks. First, in the bag-of-words model the word vectors are unrelated to one another. Although this representation suits discrete features, it does not help a machine understand continuous statements, because the connections between words are lost in the encoded word vectors. Second, existing methods are tested on state-of-the-art online corpora but are hard to apply to real applications such as telemarketing data. In this paper, we propose an improved chatbot design method that uses a hybrid of the bag-of-words model and the skip-gram model, based on real telemarketing data. Specifically, we first collect real data in the telemarketing field and perform data cleaning and data classification on the constructed corpus. Second, words are represented with the hybrid of the bag-of-words and skip-gram models. The skip-gram model maps synonyms to nearby points in the vector space, so the correlation between words is expressed and the word vector carries more information, making up for the shortcomings of using the bag-of-words model alone. Third, we use term frequency-inverse document frequency (TF-IDF) weighting to increase the weight of key words, and then output the final word representation. Finally, the answer is produced by a hybrid of a retrieval model and a generative model. The retrieval model can accurately answer in-domain questions, and the generative model supplements it for open-domain questions, where the final reply is produced by long short-term memory (LSTM) training and prediction. Experimental results show that the hybrid word vector expression model improves the accuracy of the responses and that the whole system can communicate with humans.
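
A minimal sketch of how a hybrid word representation of this kind might be assembled. The abstract does not give the exact formulation, so everything here is an assumption: the smoothed IDF formula, the concatenation of a TF-IDF weighted bag-of-words vector with a TF-IDF weighted average of dense vectors, and the placeholder `embeddings` table standing in for trained skip-gram vectors.

```python
import math
from collections import Counter

def idf_table(corpus):
    """Smoothed inverse document frequency for every term in a tokenized corpus."""
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

def hybrid_sentence_vector(tokens, vocab, idf, embeddings):
    """Concatenate a TF-IDF weighted bag-of-words vector with a
    TF-IDF weighted average of dense (skip-gram style) embeddings."""
    dim = len(next(iter(embeddings.values())))
    tf = Counter(t for t in tokens if t in idf)  # ignore out-of-vocabulary tokens
    total = sum(tf.values())
    sparse = [0.0] * len(vocab)
    dense = [0.0] * dim
    weight_sum = 0.0
    for term, count in tf.items():
        w = (count / total) * idf[term]
        sparse[vocab[term]] = w
        for i, x in enumerate(embeddings[term]):
            dense[i] += w * x
        weight_sum += w
    if weight_sum > 0:
        dense = [x / weight_sum for x in dense]
    return sparse + dense

# Toy corpus (hypothetical telemarketing-style utterances).
corpus = [["cancel", "my", "plan"], ["upgrade", "my", "plan"], ["check", "my", "balance"]]
vocab = {t: i for i, t in enumerate(sorted({t for doc in corpus for t in doc}))}
idf = idf_table(corpus)
# Placeholder 2-dimensional vectors; a real system would load trained skip-gram embeddings.
embeddings = {t: [float(i), 1.0] for t, i in vocab.items()}

vec = hybrid_sentence_vector(["cancel", "my", "plan"], vocab, idf, embeddings)
```

In this toy run the rarer word "cancel" receives a larger weight than "plan", which in turn outweighs "my", the word that appears in every utterance.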

A Study of KORMARC Database: Problems and Recommendations (한국문헌목록정보(KORMARC)의 문제점 및 개선방향에 관한 연구)

    • Journal of Korean Library and Information Science Society
    • /
    • v.30 no.3
    • /
    • pp.295-322
    • /
    • 1999
  • The purpose of this study is to identify the problems of KORMARC on Disc, which is produced by the National Library of Korea and distributed nationwide, and to present solutions to them. Currently, KORMARC on Disc suffers from serious levels of duplicate records, input errors, and retrieval noise. Furthermore, the input data does not conform to the KORMARC Rules for Descriptive Cataloging, which causes many problems. Above all, because the current MARC system is itself based on a manual system, it does not respond effectively to the online environment. Accordingly, to raise the quality of the KORMARC database, the current problems must be resolved and, at the same time, Korea Machine Readable Cataloging must be modified into a format better suited to a machine-readable environment. This study therefore analyzes and identifies the data problems in KORMARC on Disc and examines the current KORMARC Format and the Korea Machine Readable Cataloging Rules for Descriptive Cataloging, in order to make them easier to use and to provide guidelines for accurate data input.

Development of an Indexing Model for Korean Textual Databases (국내 문자정보 데이터베이스의 색인에 관한 연구)

  • 정영미
    • Journal of the Korean Society for Information Management
    • /
    • v.13 no.1
    • /
    • pp.19-43
    • /
    • 1996
  • The indexing languages and techniques used for Korean textual databases were surveyed, and the retrieval effectiveness of two indexing languages was evaluated in an online searching experiment. It was found that most of the Korean textual databases surveyed employ natural language indexing, by either an automatic or a manual method, and that natural language indexing may outperform controlled language indexing if appropriate search strategies are employed.

A Study on Romanization Rules and Practices of the International institutions for Korean language materials (한글로마자표기에 대한 국제기관의 규정과 표기의 실제에 관한 연구)

  • Oh, Kyung-Mook
    • Journal of the Korean Society for Information Management
    • /
    • v.24 no.4
    • /
    • pp.33-51
    • /
    • 2007
  • The fundamental issue of information retrieval in the Internet-based society is closely tied to the characteristics of the language chosen. The McCune-Reischauer Romanization system is not only considered the international standard for romanizing the Korean language but is also familiar to the majority of international users of Korean materials. The McCune-Reischauer system is adopted by the ISO, UNGEGN, ALA, LC, British PCGN, BL, and the relevant agencies in Europe, Canada, Australia, and elsewhere. Encouraging a switch to the new Romanization system (2000) would complicate library catalogs and online databases, causing confusion for both staff and readers. This paper analyzes the international efforts and rules for romanizing Korean language materials and recommends directions for bibliographic issues.

Development of an Automated ESG Document Review System using Ensemble-Based OCR and RAG Technologies

  • Eun-Sil Choi
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.9
    • /
    • pp.25-37
    • /
    • 2024
  • This study proposes a novel automation system that integrates Optical Character Recognition (OCR) and Retrieval-Augmented Generation (RAG) technologies to enhance the efficiency of the ESG (Environmental, Social, and Governance) document review process. The proposed system improves text recognition accuracy by applying an ensemble model-based image preprocessing algorithm and hybrid information extraction models in the OCR process. Additionally, the RAG pipeline optimizes information retrieval and answer generation reliability through the implementation of layout analysis algorithms, re-ranking algorithms, and ensemble retrievers. The system's performance was evaluated using certificate images from online portals and corporate internal regulations obtained from various sources, such as the company's websites. The results demonstrated an accuracy of 93.8% for certification reviews and 92.2% for company regulations reviews, indicating that the proposed system effectively supports human evaluators in the ESG assessment process.
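
A minimal sketch of how an ensemble retriever can merge the ranked lists of several retrievers. The abstract does not say which fusion rule the system uses; reciprocal rank fusion is swapped in here as one common choice, and the document ids, the two hit lists, and the constant `k=60` are illustrative assumptions.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    Each document scores 1 / (k + rank) for every list it appears in;
    documents are returned by descending total score (ties broken by id).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical outputs of a keyword retriever and a vector retriever.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents found by both retrievers rise to the top; a re-ranking step, as mentioned in the abstract, would then reorder this fused list with a stronger relevance model.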

Understanding Sexual Identity-related Concerns through the Analysis of Questions on a Social Q&A Site (소셜 Q&A 사이트의 질문 분석을 통한 청소년의 성 정체성(sexual identity) 고민에 대한 이해)

  • Zhu, Yongjun;Nam, Seojin;Yi, Dajeong;Yi, Yong Jeong
    • Journal of Korean Library and Information Science Society
    • /
    • v.51 no.4
    • /
    • pp.101-119
    • /
    • 2020
  • The study aims to understand the major topics and concerns of gender identity-related questions expressed by the users of the NAVER social Q&A site. To achieve this goal, we analyzed 2,120 questions created from 2010 to 2018 using natural language- and information retrieval-based methods. Results indicated that the major topics discussed by the users include interpersonal relationships, doubts about gender identity, sexual orientation, feelings and relationships, and concerns about gender identity. In addition, users mainly expressed concerns regarding general issues of gender identity; sexual orientation; negative cognition about gender identity; confession, coming-out, homosexuality; future, heterosexual relationships, military enlistment; and causes of gender identity confusion. The present study effectively derives information needs from real-world concerns about sexual identity by employing topic modeling techniques and, by comparing the advantages of exact-match and tf-idf-based information retrieval methods, extends the methodology of Library and Information Science. Further, it contributes to the academic maturity of the study of information behavior by observing the information needs and information-seeking behaviors of online community users with specific interests.
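
A minimal sketch contrasting the two retrieval styles the abstract names. The study's actual matching and weighting details are not given, so the Boolean AND interpretation of exact match, the unsmoothed tf-idf formula, and the toy documents below are all assumptions.

```python
import math
from collections import Counter

def exact_match(query, docs):
    """Return indices of documents that contain every query term (Boolean AND)."""
    q = set(query)
    return [i for i, doc in enumerate(docs) if q <= set(doc)]

def tfidf_rank(query, docs):
    """Rank documents by the summed tf-idf weight of the query terms they contain."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    scored = []
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        score = sum(
            (tf[t] / len(doc)) * math.log(n / df[t])
            for t in set(query) if tf[t] > 0
        )
        if score > 0:
            scored.append((score, i))
    return [i for score, i in sorted(scored, key=lambda s: (-s[0], s[1]))]

# Toy tokenized documents (hypothetical).
docs = [
    ["library", "opening", "hours"],
    ["library", "card", "renewal"],
    ["opening", "hours", "holiday", "hours"],
]
query = ["library", "hours"]

exact = exact_match(query, docs)
ranked = tfidf_rank(query, docs)
```

Exact match returns only the first document, while the tf-idf ranking also surfaces the two partial matches in order of weight, which illustrates the precision versus recall trade-off between the two approaches.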

Analyzing the Relevancy of Policy by Abnormal Pattern Analysis : Focused on the Case of S-City's e-Card for Child Meal Support (이상 패턴 분석을 통한 정책의 적합성 분석 연구 : S 시의 아동 급식 전자 카드 사례를 중심으로)

  • Jeon, Jongshik;Kwon, Ohbyung
    • Journal of Information Technology Services
    • /
    • v.17 no.1
    • /
    • pp.135-153
    • /
    • 2018
  • The e-card service for the child nutrition program is one of today's main public policy services. When users experience inconvenience with the e-cards, cooperation with related organizations is recommended so that problems are handled promptly and guidance is provided, and so that aspects of the child meal service such as hygiene, nutrition, and kindness are thoroughly managed. To that end, it is very important to understand the usage patterns of the child meal e-card service and to provide, cost-effectively, a meal service that meets local conditions and children's needs for minors at risk of being poorly fed. Hence, this paper aims to investigate, through big data analysis, how the child meal e-card service can efficiently provide meals according to local conditions and children's needs, and to propose a method for identifying welfare conditions according to the purpose of the service, with actual application examples. The results suggest, first, that the appropriateness of a public institution's policy can be judged in a timely and repeatable manner through the analysis of non-standard data such as Naver News and transaction data. Second, this paper proposes a multi-layered analysis framework that performs online open data analysis to detect policy issues, visualizes the retrieval and preprocessing of real data, and performs abnormal pattern recognition. These results should serve as a useful reference for similar projects.

Face Annotation System for Social Network Environments (소셜 네트웍 환경에서의 얼굴 주석 시스템)

  • Chai, Kwon-Taeg;Byun, Hye-Ran
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.15 no.8
    • /
    • pp.601-605
    • /
    • 2009
  • Recently, Social Network Sites (SNSs) based on photo sharing and publishing have increasingly attracted the attention of academic and industry researchers. Millions of users have integrated these sites into their daily practices to communicate with people online. In this paper, we propose an efficient face annotation and retrieval system for SNSs. Since the system needs to deal with a huge database consisting of a growing number of users and images, both effectiveness and efficiency are required. To deal with this problem, we propose a face annotation classifier that adopts an online learning and social decomposition approach. The proposed method is shown to have accuracy comparable to, and better efficiency than, the widely used Support Vector Machine. Consequently, the proposed framework can reduce the user's tedious effort in annotating face images and provide a fast response to millions of users.

A Study on Service Integration of Research Information and Dictionary in Portal Site (포털사이트의 사전과 학술정보 연계 검색 방안 연구)

  • Yang, Chang-Jin
    • Journal of the Korean Society for Information Management
    • /
    • v.28 no.1
    • /
    • pp.7-22
    • /
    • 2011
  • Internet portals have evolved from simple search engines into a new space for Internet users. They have developed to give satisfying search results to users of academic information. However, their attention was given to the quantity rather than the quality of the results. This tendency is now changing. This study addresses the problems in the search process on current portal sites and presents an integrated scholarly information service in which users can access more organized and trustworthy information linked with an online dictionary of technical keywords. When a user enters a keyword on a portal site, he or she can access high-quality scholarly information resources linked with that keyword. This could assure users of gaining expanded, verified knowledge.

A Study on Query Refinement by Online Relevance Feedback in an Information Filtering System (온라인 이용자 피드백을 사용한 정보필터링 시스템의 수정질의 최적화에 관한 연구)

  • Choi, Kwang;Chung, Young-Mee
    • Journal of the Korean Society for Information Management
    • /
    • v.20 no.4 s.50
    • /
    • pp.23-48
    • /
    • 2003
  • In this study, an information filtering system was implemented and a series of relevance feedback experiments was conducted using the system. For the relevance feedback, the original queries were searched against the database and the results were reviewed by the researchers. Based on users' online relevance judgments, two sets of 17 refined queries were generated using two methods, called the 'co-occurrence exclusion method' and the 'lower frequencies exclusion method.' To generate them, the original queries and the descriptors and category codes that appeared in either the relevant or the irrelevant document sets were used as elements. Users' relevance judgments on the search results of the refined queries were compared and analyzed against those of the original queries.
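
The abstract names the two refinement methods but does not define them, so the following is only one plausible reading, not the authors' procedure. It assumes that "co-occurrence exclusion" drops candidate terms seen in both the relevant and the irrelevant sets, and that "lower frequencies exclusion" drops candidate terms appearing in fewer than `min_freq` relevant documents. The function name, the threshold, and the toy data are hypothetical.

```python
from collections import Counter

def refine_query(original, relevant_docs, irrelevant_docs, min_freq=2):
    """Build two refined queries from relevance judgments (assumed interpretation).

    Original query terms are always kept; expansion terms come from the
    relevant documents and are filtered by one of two exclusion rules.
    """
    original = set(original)
    rel = Counter(t for doc in relevant_docs for t in set(doc))
    irr = Counter(t for doc in irrelevant_docs for t in set(doc))
    candidates = original | set(rel)

    # Assumed "co-occurrence exclusion": drop terms found in both document sets.
    cooccurrence = sorted(
        t for t in candidates if t in original or not (rel[t] and irr[t])
    )
    # Assumed "lower frequencies exclusion": drop terms rare among relevant documents.
    lower_frequency = sorted(
        t for t in candidates if t in original or rel[t] >= min_freq
    )
    return cooccurrence, lower_frequency

# Toy judged documents, each a list of descriptors (hypothetical).
relevant = [["filtering", "profile", "feedback"], ["filtering", "feedback", "news"]]
irrelevant = [["filtering", "news", "spam"]]

cooc_query, lowfreq_query = refine_query(["filtering"], relevant, irrelevant)
```

Under these assumptions the first rule removes "news" because it also occurs in an irrelevant document, while the second rule removes both "news" and "profile" because each occurs in only one relevant document.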