• Title/Summary/Keyword: 유사 키워드

Search Result 311, Processing Time 0.028 seconds

Ontology-based User Customized Search Service Considering User Intention (온톨로지 기반의 사용자 의도를 고려한 맞춤형 검색 서비스)

  • Kim, Sukyoung;Kim, Gunwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.4
    • /
    • pp.129-143
    • /
    • 2012
  • Recently, the rapid progress of a number of standardized web technologies and the proliferation of web users in the world bring an explosive increase of producing and consuming information documents on the web. In addition, most companies have produced, shared, and managed a huge number of information documents that are needed to perform their businesses. They also have discretionally raked, stored and managed a number of web documents published on the web for their business. Along with this increase of information documents that should be managed in the companies, the need of a solution to locate information documents more accurately among a huge number of information sources have increased. In order to satisfy the need of accurate search, the market size of search engine solution market is becoming increasingly expended. The most important functionality among much functionality provided by search engine is to locate accurate information documents from a huge information sources. The major metric to evaluate the accuracy of search engine is relevance that consists of two measures, precision and recall. Precision is thought of as a measure of exactness, that is, what percentage of information considered as true answer are actually such, whereas recall is a measure of completeness, that is, what percentage of true answer are retrieved as such. These two measures can be used differently according to the applied domain. If we need to exhaustively search information such as patent documents and research papers, it is better to increase the recall. On the other hand, when the amount of information is small scale, it is better to increase precision. Most of existing web search engines typically uses a keyword search method that returns web documents including keywords which correspond to search words entered by a user. This method has a virtue of locating all web documents quickly, even though many search words are inputted. However, this method has a fundamental imitation of not considering search intention of a user, thereby retrieving irrelevant results as well as relevant ones. Thus, it takes additional time and effort to set relevant ones out from all results returned by a search engine. That is, keyword search method can increase recall, while it is difficult to locate web documents which a user actually want to find because it does not provide a means of understanding the intention of a user and reflecting it to a progress of searching information. Thus, this research suggests a new method of combining ontology-based search solution with core search functionalities provided by existing search engine solutions. The method enables a search engine to provide optimal search results by inferenceing the search intention of a user. To that end, we build an ontology which contains concepts and relationships among them in a specific domain. The ontology is used to inference synonyms of a set of search keywords inputted by a user, thereby making the search intention of the user reflected into the progress of searching information more actively compared to existing search engines. Based on the proposed method we implement a prototype search system and test the system in the patent domain where we experiment on searching relevant documents associated with a patent. The experiment shows that our system increases the both recall and precision in accuracy and augments the search productivity by using improved user interface that enables a user to interact with our search system effectively. In the future research, we will study a means of validating the better performance of our prototype system by comparing other search engine solution and will extend the applied domain into other domains for searching information such as portal.

Investigating Dynamic Mutation Process of Issues Using Unstructured Text Analysis (비정형 텍스트 분석을 활용한 이슈의 동적 변이과정 고찰)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.1-18
    • /
    • 2016
  • Owing to the extensive use of Web media and the development of the IT industry, a large amount of data has been generated, shared, and stored. Nowadays, various types of unstructured data such as image, sound, video, and text are distributed through Web media. Therefore, many attempts have been made in recent years to discover new value through an analysis of these unstructured data. Among these types of unstructured data, text is recognized as the most representative method for users to express and share their opinions on the Web. In this sense, demand for obtaining new insights through text analysis is steadily increasing. Accordingly, text mining is increasingly being used for different purposes in various fields. In particular, issue tracking is being widely studied not only in the academic world but also in industries because it can be used to extract various issues from text such as news, (SocialNetworkServices) to analyze the trends of these issues. Conventionally, issue tracking is used to identify major issues sustained over a long period of time through topic modeling and to analyze the detailed distribution of documents involved in each issue. However, because conventional issue tracking assumes that the content composing each issue does not change throughout the entire tracking period, it cannot represent the dynamic mutation process of detailed issues that can be created, merged, divided, and deleted between these periods. Moreover, because only keywords that appear consistently throughout the entire period can be derived as issue keywords, concrete issue keywords such as "nuclear test" and "separated families" may be concealed by more general issue keywords such as "North Korea" in an analysis over a long period of time. This implies that many meaningful but short-lived issues cannot be discovered by conventional issue tracking. Note that detailed keywords are preferable to general keywords because the former can be clues for providing actionable strategies. To overcome these limitations, we performed an independent analysis on the documents of each detailed period. We generated an issue flow diagram based on the similarity of each issue between two consecutive periods. The issue transition pattern among categories was analyzed by using the category information of each document. In this study, we then applied the proposed methodology to a real case of 53,739 news articles. We derived an issue flow diagram from the articles. We then proposed the following useful application scenarios for the issue flow diagram presented in the experiment section. First, we can identify an issue that actively appears during a certain period and promptly disappears in the next period. Second, the preceding and following issues of a particular issue can be easily discovered from the issue flow diagram. This implies that our methodology can be used to discover the association between inter-period issues. Finally, an interesting pattern of one-way and two-way transitions was discovered by analyzing the transition patterns of issues through category analysis. Thus, we discovered that a pair of mutually similar categories induces two-way transitions. In contrast, one-way transitions can be recognized as an indicator that issues in a certain category tend to be influenced by other issues in another category. For practical application of the proposed methodology, high-quality word and stop word dictionaries need to be constructed. In addition, not only the number of documents but also additional meta-information such as the read counts, written time, and comments of documents should be analyzed. A rigorous performance evaluation or validation of the proposed methodology should be performed in future works.

Professional Baseball Viewing Culture Survey According to Corona 19 using Social Network Big Data (소셜네트워크 빅데이터를 활용한 코로나 19에 따른 프로야구 관람문화조사)

  • Kim, Gi-Tak
    • Journal of Korea Entertainment Industry Association
    • /
    • v.14 no.6
    • /
    • pp.139-150
    • /
    • 2020
  • The data processing of this study focuses on the textom and social media words about three areas: 'Corona 19 and professional baseball', 'Corona 19 and professional baseball', and 'Corona 19 and professional sports' The data was collected and refined in a web environment and then processed in batch, and the Ucinet6 program was used to visualize it. Specifically, the web environment was collected using Naver, Daum, and Google's channels, and was summarized into 30 words through expert meetings among the extracted words and used in the final study. 30 extracted words were visualized through a matrix, and a CONCOR analysis was performed to identify clusters of similarity and commonality of words. As a result of analysis, the clusters related to Corona 19 and Pro Baseball were composed of one central cluster and five peripheral clusters, and it was found that the contents related to the opening of professional baseball according to the corona 19 wave were mainly searched. The cluster related to Corona 19 and unrelated to professional baseball consisted of one central cluster and five peripheral clusters, and it was found that the keyword of the position of professional baseball related to the professional baseball game according to Corona 19 was mainly searched. Corona 19 and the cluster related to professional sports consisted of one central cluster and five peripheral clusters, and it was found that the keywords related to the start of professional sports according to the aftermath of Corona 19 were mainly searched.

Analysis of Sea Trial's Title for Naval Ships Based on Big Data (빅데이터 기반 함정 시운전 종목명 분석)

  • Lee, Hyeong-Sin;Seo, Hyeong-Pil;Beak, Yong-Kawn;Lee, Sang-Il
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.11
    • /
    • pp.420-426
    • /
    • 2020
  • The purpose and main points of the ROK-US Navy were analyzed from various angles using the big data technology Word Cloud for efficient sea trials. First, a comparison of words extracted through keyword cleansing in the ROK-US Navy sea trial showed that the ROK Navy conducted a single equipment test, and the US Navy conducted an integrated test run focusing on the system. Second, an analysis of the ROK-US Navy sea trials showed that approximately 66.6% were analyzed as similar items, of which more than two items were 112 items Approximately 44% of the 252 items of the ROK Navy sea trials overlapped, and that 89 items (35% of the total) could be reduced when integrated into the US Navy sea trials. A ship is a complex system in which multiple equipment operates simultaneously. The focus on checking the functions and performance of individual equipment, such as the ROK Navy's sea trials, will increase the sea trial period because of the excessive number of sea trial targets. In addition, the budget required will inevitably increase due to an increase in schedule and evaluation costs. In the future, further research will be needed to achieve more efficient and accurate sea trials through integrated system evaluations, such as the U.S. Navy sea trials.

Intelligent Web Crawler for Supporting Big Data Analysis Services (빅데이터 분석 서비스 지원을 위한 지능형 웹 크롤러)

  • Seo, Dongmin;Jung, Hanmin
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.12
    • /
    • pp.575-584
    • /
    • 2013
  • Data types used for big-data analysis are very widely, such as news, blog, SNS, papers, patents, sensed data, and etc. Particularly, the utilization of web documents offering reliable data in real time is increasing gradually. And web crawlers that collect web documents automatically have grown in importance because big-data is being used in many different fields and web data are growing exponentially every year. However, existing web crawlers can't collect whole web documents in a web site because existing web crawlers collect web documents with only URLs included in web documents collected in some web sites. Also, existing web crawlers can collect web documents collected by other web crawlers already because information about web documents collected in each web crawler isn't efficiently managed between web crawlers. Therefore, this paper proposed a distributed web crawler. To resolve the problems of existing web crawler, the proposed web crawler collects web documents by RSS of each web site and Google search API. And the web crawler provides fast crawling performance by a client-server model based on RMI and NIO that minimize network traffic. Furthermore, the web crawler extracts core content from a web document by a keyword similarity comparison on tags included in a web documents. Finally, to verify the superiority of our web crawler, we compare our web crawler with existing web crawlers in various experiments.

An Investigation on Characteristics and Intellectual Structure of Sociology by Analyzing Cited Data (사회학 분야의 연구데이터 특성과 지적구조 규명에 관한 연구)

  • Choi, Hyung Wook;Chung, EunKyung
    • Journal of the Korean Society for information Management
    • /
    • v.34 no.3
    • /
    • pp.109-124
    • /
    • 2017
  • Through a wide variety of disciplines, practices on data access and re-use have been increased recently. In fact, there has been an emerging phenomenon that researchers tend to use the data sets produced by other researchers and give scholarly credit as citation. With respect to this practice, in 2012, Thomson Reuters launched Data Citation Index (DCI). With the DCI, citation to research data published by researchers are collected and analyzed in a similar way for citation to journal articles. The purpose of this study is to identify the characteristics and intellectual structure of sociology field based on research data, which is one of actively data-citing fields. To accomplish this purpose, two data sets were collected and analyzed. First, from DCI, a total of 8,365 data were collected in the field of sociology. Second, a total of 12,132 data were collected from Web of Science with a topic search with 'Sociology'. As a result of the co-word analysis of author provided-keywords for both data sets, the intellectual structure of research data-based sociology was composed of two areas and 15 clusters and that of article-based sociology was composed with three areas and 17 clusters. More importantly, medical science area was found to be actively studied in research data-based sociology and public health and psychology are identified to be central areas from data citation.

Effective Picture Search in Lifelog Management Systems using Bluetooth Devices (라이프로그 관리 시스템에서 블루투스 장치를 이용한 효과적인 사진 검색 방법)

  • Chung, Eun-Ho;Lee, Ki-Yong;Kim, Myoung-Ho
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.4
    • /
    • pp.383-391
    • /
    • 2010
  • A Lifelog management system provides users with services to store, manage, and search their life logs. This paper proposes a fully-automatic collecting method of real world social contacts and lifelog search engine using collected social contact information as keyword. Wireless short-distance network devices in mobile phones are used to detect social contacts of their users. Human-Bluetooth relationship matrix is built based on the frequency of a human-being and a Bluetooth device being observed at the same time. Results show that with 20% of social contact information out of full social contact information of the observation times used for calculation, 90% of human-Bluetooth relationship can be correctly acquired. A lifelog search-engine that takes human names as keyword is suggested which compares two vectors, a row of Human-Bluetooth matrix and a vector of Bluetooth list scanned while a lifelog was created, using vector information retrieval model. This search engine returns more lifelog than existing text-matching search engine and ranks the result unlike existing search-engine.

Improving Performance of Search Engine By Using WordNet-based Collaborative Evaluation and Hyperlink (워드넷 기반 협동적 평가와 하이퍼링크를 이용한 검색엔진의 성능 향상)

  • Kim, Hyun-Gil;Kim, Jun-Tae
    • The KIPS Transactions:PartB
    • /
    • v.11B no.3
    • /
    • pp.369-380
    • /
    • 2004
  • In this paper, we propose a web page weighting scheme based on WordNet-based collaborative evaluation and hyperlink to improve the precision of web search engine. Generally search engines use keyword matching to decide web page ranking. In the information retrieval from huge data such as the Web, simple word comparison cannot distinguish important documents because there exist too many documents with similar relevancy. In this paper, we implement a WordNet-based user interface that helps to distinguish different senses of query word, and constructed a search engine in which the implicit evaluations by multiple users are reflected in ranking by accumulating the number of clicks. In accumulating click counts, they are stored separately according to lenses, so that more accurate search is possible. Weighting of each web page by using collaborative evaluation and hyperlink is reflected in ranking. The experimental results with several keywords show that the precision of proposed system is improved compared to conventional search engines.

Design and Implementation of Web-Based Self-directed Learning System for Word Processor Qualifying Exams (워드프로세서 자격증 시험을 위한 웹 기반 자기 주도적 학습 시스템 설계 및 구현)

  • Yang, Yun-Jeong;Kim, Chang-Suk
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.16 no.1
    • /
    • pp.43-48
    • /
    • 2006
  • The educational system has been changed owing to Web, which is most actively used on internet and has the characteristics of providing suitable environments for implementing constructivism study theory. WBI(Web Based Instruction), web-mediated teaching form for students at a long distance, has the advantages of possible interact between instructors and learners, offering a great variety of learning materials, and overcome the spatiotemporal restriction. This paper focuces on the construction of learning surroundings where the learner-centered, active learning can be done by design and Implementation of web based instruct system providing a sham examination with an item pool system. The web based Self-directed Learning system for word processor qualifying exams on this paper, can be mentioned as a real item pool that the question is not setting each time by the instructors but can be reused by reference on item pool bank, designed the number of question. It helps the learner Self-directed Learning study with evaluation during the web based instruct process and immediate feedback. It also provides the chance to research some similar using keyword. To sum up, this system can amplify the efficiency of study.

Semantic Search and Recommendation of e-Catalog Documents through Concept Network (개념 망을 통한 전자 카탈로그의 시맨틱 검색 및 추천)

  • Lee, Jae-Won;Park, Sung-Chan;Lee, Sang-Keun;Park, Jae-Hui;Kim, Han-Joon;Lee, Sang-Goo
    • The Journal of Society for e-Business Studies
    • /
    • v.15 no.3
    • /
    • pp.131-145
    • /
    • 2010
  • Until now, popular paradigms to provide e-catalog documents that are adapted to users' needs are keyword search or collaborative filtering based recommendation. Since users' queries are too short to represent what users want, it is hard to provide the users with e-catalog documents that are adapted to their needs(i.e., queries and preferences). Although various techniques have beenproposed to overcome this problem, they are based on index term matching. A conventional Bayesian belief network-based approach represents the users' needs and e-catalog documents with their corresponding concepts. However, since the concepts are the index terms that are extracted from the e-catalog documents, it is hard to represent relationships between concepts. In our work, we extend the conventional Bayesian belief network based approach to represent users' needs and e-catalog documents with a concept network which is derived from the Web directory. By exploiting the concept network, it is possible to search conceptually relevant e-catalog documents although they do not contain the index terms of queries. Furthermore, by computing the conceptual similarity between users, we can exploit a semantic collaborative filtering technique for recommending e-catalog documents.