• Title/Summary/Keyword: Information Retrieval System (정보검색 시스템)

Search results: 5,087 (processing time: 0.036 seconds)

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People now create a tremendous amount of data on Social Network Services (SNS). In particular, the integration of SNS into mobile devices has led to massive data generation, which greatly influences society. This phenomenon is unprecedented, and we now live in the Age of Big Data. SNS data satisfies the defining conditions of Big Data: a large amount of data (volume), high data input and output speeds (velocity), and diverse data types (variety). Because SNS data covers society as a whole, discovering the trend of an issue in SNS Big Data can provide an important new source for creating value. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need to analyze SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides four functions: (1) providing the topic keyword sets corresponding to the daily rankings; (2) visualizing the daily time-series graph of a topic over a month; (3) showing the importance of a topic through a treemap based on a score system and frequency; and (4) visualizing the daily time-series graph of a keyword found by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process many unrefined forms of unstructured data. Such analysis also requires the latest big data technologies, such as the Hadoop distributed system or NoSQL databases (an alternative to relational databases), to rapidly process large amounts of real-time data. We built TITS on Hadoop to optimize big data processing, because Hadoop is designed to scale up from a single node to thousands of machines.
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no fixed schemas or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization has become increasingly attractive to the Big Data community because it helps analysts examine data easily and clearly. Therefore, TITS uses the d3.js library as its visualization tool. This library is designed for creating Data-Driven Documents that bind arbitrary data to the Document Object Model (DOM); it makes interaction with data easy and is useful for managing real-time data streams with smooth animation. In addition, TITS uses Bootstrap, which consists of preconfigured style sheets and JavaScript plug-ins, to build the web system. The TITS Graphical User Interface (GUI) is designed using these libraries and can detect issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS); based on this, we confirm the utility of storytelling and time-series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.
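The daily keyword ranking (function 1) and keyword time-series search (function 4) can be illustrated with a minimal frequency-counting sketch. This is not the authors' topic modeling pipeline: it assumes whitespace tokenization and a small hypothetical stop-word list in place of Korean noun extraction, keeps counts in memory rather than in Hadoop or MongoDB, and uses made-up sample tweets. The function names are illustrative only.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

# Hypothetical stop-word list; a real Korean pipeline would rely on noun extraction instead.
STOP_WORDS = {"the", "a", "is", "rt"}


def daily_keyword_counts(tweets: Iterable[Tuple[date, str]]) -> Dict[date, Counter]:
    """Count non-stop-word tokens per day from (day, text) pairs."""
    counts: Dict[date, Counter] = defaultdict(Counter)
    for day, text in tweets:
        tokens = [tok for tok in text.lower().split() if tok not in STOP_WORDS]
        counts[day].update(tokens)
    return dict(counts)


def top_keywords(counts: Dict[date, Counter], day: date, k: int) -> List[Tuple[str, int]]:
    """Return the k most frequent keywords for one day (a daily ranking)."""
    return counts.get(day, Counter()).most_common(k)


def keyword_series(counts: Dict[date, Counter], keyword: str, days: List[date]) -> List[int]:
    """Return the daily frequency of one keyword, using 0 for days without data."""
    return [counts.get(day, Counter())[keyword] for day in days]


if __name__ == "__main__":
    sample = [
        (date(2013, 3, 1), "RT the election is today"),
        (date(2013, 3, 1), "election results"),
        (date(2013, 3, 2), "snow today"),
    ]
    counts = daily_keyword_counts(sample)
    print(top_keywords(counts, date(2013, 3, 1), 1))  # [('election', 2)]
    print(keyword_series(counts, "election", [date(2013, 3, d) for d in (1, 2, 3)]))  # [2, 0, 0]
```

In a distributed setting the per-day counting step would naturally map onto a MapReduce job keyed by (day, keyword), but that mapping is an assumption, not something the abstract specifies.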

The Method for Real-time Complex Event Detection of Unstructured Big data (비정형 빅데이터의 실시간 복합 이벤트 탐지를 위한 기법)

  • Lee, Jun Heui;Baek, Sung Ha;Lee, Soon Jo;Bae, Hae Young
    • Spatial Information Research / v.20 no.5 / pp.99-109 / 2012
  • Recently, with the growth of social media and the spread of smartphones, the amount of data has increased considerably through heavy use of SNS (Social Network Services). Accordingly, the concept of Big Data has emerged, and many researchers are seeking ways to make the best use of it. To maximize the creative value of the big data held by many companies, it must be combined with existing data. Because the physical and logical storage structures of the data sources differ greatly, a system that can integrate and manage them is needed. MapReduce was developed to process big data quickly through distributed processing. However, it is difficult to build and store such a system for every keyword, and because data must first be stored and then searched, real-time processing is difficult to some extent. In addition, processing complex events incurs extra cost when there is no structure for processing heterogeneous data. To solve this problem, we use an existing Complex Event Processing (CEP) system. A CEP system receives data from different sources and combines them, enabling complex event processing that is especially useful for real-time processing of stream data. However, text-based unstructured data from SNS and Internet articles is managed as text, so strings must be compared every time a query is processed, which results in poor performance. We therefore aim to manage unstructured data and process queries quickly in a CEP system. To do so, we extend the data combination function to give strings a logical schema, which is achieved by converting string keywords into integers through filtering with a keyword set.
In addition, by using the CEP system to process stream data in memory in real time, we aim to reduce the time spent reading data for query processing after it has been stored on disk.
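The keyword-set filtering step, which turns string keywords into integers so that query matching avoids string comparison, can be sketched as follows. The class and function names, the sorted-order ID assignment, and the "all query keywords present" matching rule are assumptions for illustration; the abstract does not specify the encoding or the event pattern language.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple


class KeywordEncoder:
    """Hypothetical encoder: filters tokens against a fixed keyword set and maps them to integer IDs."""

    def __init__(self, keywords: Iterable[str]) -> None:
        # Assign stable integer IDs in sorted keyword order.
        self.ids: Dict[str, int] = {kw: i for i, kw in enumerate(sorted(set(keywords)))}

    def encode(self, text: str) -> List[int]:
        """Drop tokens outside the keyword set and replace the rest with their integer IDs."""
        return [self.ids[tok] for tok in text.lower().split() if tok in self.ids]

    def encode_query(self, keywords: Iterable[str]) -> Set[int]:
        """Encode query keywords once; raises KeyError for keywords outside the set."""
        return {self.ids[kw] for kw in keywords}


def detect(stream: Iterable[Tuple[str, str]], encoder: KeywordEncoder, query: Set[int]) -> Iterator[str]:
    """Yield IDs of events whose encoded keywords contain every query keyword.

    Each event is encoded once as it arrives; the match itself is an integer set test.
    """
    for event_id, text in stream:
        if query.issubset(encoder.encode(text)):
            yield event_id


if __name__ == "__main__":
    enc = KeywordEncoder(["fire", "seoul", "traffic"])
    query = enc.encode_query(["fire", "seoul"])
    events = [("e1", "fire in seoul"), ("e2", "traffic in seoul"), ("e3", "seoul fire alarm")]
    print(list(detect(events, enc, query)))  # ['e1', 'e3']
```

Encoding at ingestion means the cost of string handling is paid once per event, while every standing query compares small integers.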

Analyzing Passenger Arrival Behavior Based on the Spent Time for Airport Access (공항접근시간에 따른 여객의 공항도착 행태분석)

  • 오성열;김원규;박용화
    • Journal of Korean Society of Transportation / v.21 no.4 / pp.17-27 / 2003
  • In general, an airport access system influences airport terminal operations. Congestion and delays in an airport's service facilities depend heavily on passengers' arrival patterns and the time they spend in the terminal. Therefore, passenger arrival behavior must be analyzed to improve passenger terminal operations. Passenger arrival patterns depend mainly on factors such as the length of access time, the reliability of access time, and the availability of transport modes. This paper focuses on estimating the relationship between the length of access time and the total time passengers spend before boarding the aeroplane. For this purpose, passenger surveys were conducted at Gimpo International Airport (a large airport) and Sacheon Airport (a small airport). The mathematical relationship between access time and arrival time at the airport prior to the scheduled time of departure (STD) was then estimated. The results of this study can be used to reduce congestion and delays and thereby improve the efficiency of passenger services at airports.
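The abstract does not state the functional form of the estimated relationship. Assuming, purely for illustration, a straight-line model in which the arrival lead time before STD equals a + b × access time, an ordinary least-squares fit can be sketched as below; the linear form, the parameter names, and the minute units are assumptions, not the authors' model.

```python
from typing import Sequence, Tuple


def fit_line(access_min: Sequence[float], lead_min: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of lead = a + b * access; returns (a, b).

    access_min: access time to the airport (minutes, assumed unit)
    lead_min:   arrival time before the scheduled time of departure (minutes, assumed unit)
    """
    n = len(access_min)
    if n != len(lead_min) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(access_min) / n
    mean_y = sum(lead_min) / n
    sxx = sum((x - mean_x) ** 2 for x in access_min)
    if sxx == 0:
        raise ValueError("access times must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(access_min, lead_min))
    b = sxy / sxx          # slope: extra lead time per extra minute of access
    a = mean_y - b * mean_x  # intercept
    return a, b
```

With survey data, a nonlinear form (for example a logarithmic or reliability-adjusted term) may fit better; the same least-squares idea applies after transforming the predictor.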

A Study on the Regional Characteristics of Broadband Internet Termination by Coupling Type using Spatial Information based Clustering (공간정보기반 클러스터링을 이용한 초고속인터넷 결합유형별 해지의 지역별 특성연구)

  • Park, Janghyuk;Park, Sangun;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.45-67 / 2017
  • According to the Internet Usage Research conducted in 2016, both the number of Internet users and Internet usage have been increasing. Compared with the computer, the smartphone is taking a more dominant role as an Internet access device. As the number of smart devices has increased, some expect demand for high-speed Internet to decrease; however, despite the increase in smart devices, the high-speed Internet market is expected to grow slightly for a while owing to faster Giga Internet and the growth of the IoT market. As the broadband Internet market saturates, telecom operators are competing excessively to win new customers; if they knew why customers leave, more effective marketing could reduce marketing costs. In this study, we analyzed the relationship between the cancellation rates of telecommunication products and the factors affecting them by combining a telecommunication company's data for three cities (Anyang, Gunpo, and Uiwang) with regional data from KOSIS (Korean Statistical Information Service). In particular, assuming that neighboring areas affect the distribution of cancellation rates by coupling type, we conducted a spatial cluster analysis of the three types of cancellation rates in each region using the spatial analysis tool SaTScan, and analyzed the various relationships between the cancellation rates and the regional data. In the analysis phase, we first summarized the characteristics of the clusters derived by combining spatial information with the cancellation data. Next, based on the cluster analysis results, analysis of variance, correlation analysis, and regression analysis were used to analyze the relationship between the cancellation rate data and the regional data. Based on the results, we proposed marketing methods appropriate to each region.
Unlike previous studies of regional characteristics, this study is academically distinctive in that it performs clustering based on spatial information, grouping adjacent regions with similar cancellation types. In addition, few previous studies on the determinants of subscription to high-speed Internet services have considered regional characteristics; in this study, assuming that the relevant factors differ by region, we analyzed the relationship between the clusters and the regional characteristics data. We sought more efficient marketing methods for new subscriptions and customer management in high-speed Internet that take the characteristics of each region into account. The analysis of variance confirmed significant differences in regional characteristics among the clusters, and the correlation analysis showed stronger correlations within the clusters than across all regions. Regression analysis was used to analyze the relationship between the cancellation rate and the regional characteristics. As a result, we found that the cancellation rate differs depending on regional characteristics and that differentiated marketing can be targeted to each region. The biggest limitation of this study was the difficulty of obtaining enough data to carry out the analysis. In particular, it was difficult to find variables representing regional characteristics at the Dong level: most of the data was disclosed at the city level rather than the Dong level, which limited detailed analysis. Data such as income, card usage, and telecommunications company policies or characteristics that could affect cancellation were not available at the time. The most urgent requirement for a more sophisticated analysis is Dong-level data on regional characteristics.
Future studies should carry out target marketing based on these results. It would also be meaningful to analyze the effect of marketing by comparing results before and after target marketing, and useful to build clusters from new subscription data as well as cancellation data.
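The idea behind SaTScan's circular spatial scan under a discrete Poisson model can be sketched in simplified form: grow circles around each region centroid, and score each candidate zone with Kulldorff's log-likelihood ratio. Treating cancellations as "cases" and subscribers as "population", the region records, and the 50% population cap are assumptions; the sketch only finds the most likely cluster and omits SaTScan's Monte Carlo significance test.

```python
import math
from typing import List, Tuple

# (name, x, y, population, cases): hypothetical region record.
Region = Tuple[str, float, float, int, int]


def poisson_llr(cases: float, expected: float, total: float) -> float:
    """Kulldorff's Poisson log-likelihood ratio for a zone (0 unless the zone has an excess)."""
    if cases <= expected:
        return 0.0
    inside = cases * math.log(cases / expected)
    outside = 0.0
    if total > cases:
        outside = (total - cases) * math.log((total - cases) / (total - expected))
    return inside + outside


def most_likely_cluster(regions: List[Region], max_pop_frac: float = 0.5) -> Tuple[float, List[str]]:
    """Scan circles centred on each region, adding nearest neighbours up to a population cap."""
    if any(r[3] <= 0 for r in regions):
        raise ValueError("populations must be positive")
    total_cases = sum(r[4] for r in regions)
    total_pop = sum(r[3] for r in regions)
    best: Tuple[float, List[str]] = (0.0, [])
    for _, cx, cy, _, _ in regions:
        # Regions ordered by squared distance from the current centre.
        ordered = sorted(regions, key=lambda r: (r[1] - cx) ** 2 + (r[2] - cy) ** 2)
        pop, cases, zone = 0, 0, []
        for name, _, _, r_pop, r_cases in ordered:
            if pop + r_pop > max_pop_frac * total_pop:
                break
            pop += r_pop
            cases += r_cases
            zone.append(name)
            expected = total_cases * pop / total_pop
            llr = poisson_llr(cases, expected, total_cases)
            if llr > best[0]:
                best = (llr, list(zone))
    return best
```

Comparing the resulting clusters with regional variables (the ANOVA, correlation, and regression steps in the abstract) would then operate on the cluster labels produced here.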

DNA barcoding of Raptor carcass collected in the Paju city, Korea (파주시에서 수집한 폐사체 맹금류의 DNA 바코드 연구)

  • Jin, Seon-Deok;Paik, In-Hwan;Lee, Soo-Young;Han, Gap-Soo;Yu, Jae-Pyoung;Paek, Woon-Kee
    • Korean Journal of Environment and Ecology / v.28 no.5 / pp.523-530 / 2014
  • A juvenile raptor that could not be identified because of damage to its head was found on a roadside in Janggok-ri, Jori-eup, Paju, on 28 June 2011. The species was identified by DNA barcoding. After polymerase chain reaction (PCR) amplification of the mitochondrial cytochrome c oxidase subunit I (COI) gene, we obtained 695 bp of sequence. We compared the COI sequence with similar sequences in BOLD Systems and, using BLAST, in NCBI GenBank, and found that it showed 100% similarity to the sequence of one of five previously studied gray-faced buzzards. In addition, the bird was confirmed to be female by DNA-based sex determination. These results are important because they confirm breeding of the gray-faced buzzard in Paju for the first time in 43 years, since breeding was last recorded there in 1968. In the future, wildlife rescue centers should cooperate with nearby designated registration and preservation institutions whenever wild animal carcasses are collected or DNA samples are obtained, so that both species and sex can be identified more accurately through a systematic management system. Furthermore, the DNA sample of the gray-faced buzzard and its COI gene sequence (DNA barcode) could serve as reference standards for similar research in the future.
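The "100% similarity" figure is a percent identity between the query barcode and a reference sequence. One simple definition, computed over two sequences that are assumed to be already aligned, is sketched below; BOLD and BLAST compute identity from their own alignments and may treat gaps differently, so this is an illustration of the measure rather than a reproduction of either tool.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned sequences, skipping columns that contain a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```

For a 695 bp barcode, 100% identity under this definition means every compared position matches the reference.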

A Study on the Effects of the Characteristics of Internet Shopping Mall on Shopping Values and Customer Retention (인터넷 쇼핑몰 특성에 의한 쇼핑가치와 고객유지에 관한 연구)

  • Kim, Young-Man;Kim, Dong-Hyeon
    • Journal of Global Scholars of Marketing Science / v.8 / pp.61-87 / 2001
  • The Internet, which has developed as a new exchange revolution, forms a huge virtual exchange market, and innovative electronic commerce has completely broken away from existing methods of goods distribution. This study begins from an awareness of the importance of customer retention for continuing to win the competition among Internet shopping malls. To explain customer retention between individuals and Internet shopping malls, the study first introduces shopping satisfaction, followed by the importance of customer retention, and examines how trust, satisfaction, and relationship orientation are formed when valuable convenience is offered to customers. The study also explores how the characteristics that Internet shopping malls can possess influence shopping value, traces through cause-and-effect relationships the degree of shopping satisfaction, trust, and relationship orientation, and asks how these characteristics can be combined to achieve customer retention. This study therefore has the following purpose: after examining prior research on the characteristics of Internet shopping malls, it presents the possibility of connecting shopping value with customer retention in light of a theoretical framework of characteristic elements derived from emotional and utilitarian perspectives. To achieve this purpose, the characteristics of Internet retail shops examined were site design, virtual reality, web awareness, customer concern, merchandise search, information supply, product value, and transaction system. Hypotheses were set up for the relationships among these characteristics and empirically analyzed. To test them, we analyzed data collected from customers who had experience shopping at Internet shopping malls and discussed current strategic issues raised by the results.

The Consensus String Problem based on Radius is NP-complete (거리반경기반 대표문자열 문제의 NP-완전)

  • Na, Joong-Chae;Sim, Jeong-Seop
    • Journal of KIISE:Computer Systems and Theory / v.36 no.3 / pp.135-139 / 2009
  • Problems of computing the distances or similarities of multiple strings have been vigorously studied in diverse fields such as pattern matching, web searching, bioinformatics, and computer security. One well-known method for comparing the strings in a given set is to find a consensus string, which is a representative of the set. Two objective functions are frequently used to find a consensus string: the radius and the consensus error. The radius of a string x with respect to a set S of strings is the smallest number r such that the distance between x and each string in S is at most r. A consensus string based on radius is a string that minimizes the radius with respect to a given set. The consensus error of a string x with respect to a set S is the sum of the distances between x and all the strings in S. A consensus string of S based on consensus error is a string that minimizes the consensus error with respect to S. In this paper, we show that the problem of finding a consensus string based on radius is NP-complete when the distance function is a metric.
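The two objective functions defined above are r(x, S) = max over s in S of d(x, s) and E(x, S) = sum over s in S of d(x, s). The sketch below computes both with Hamming distance, which is one metric chosen here as an assumption (the abstract's result concerns metric distance functions in general), and finds a radius-based consensus string by exhaustive search over all equal-length candidates. That search takes time exponential in the string length, which is what one would expect for an NP-complete problem unless P = NP.

```python
from itertools import product
from typing import Callable, Sequence

Distance = Callable[[str, str], int]


def hamming(a: str, b: str) -> int:
    """Hamming distance between equal-length strings (one example of a metric)."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(1 for x, y in zip(a, b) if x != y)


def radius(x: str, strings: Sequence[str], d: Distance = hamming) -> int:
    """Smallest r with d(x, s) <= r for every s in the set, i.e. the maximum distance."""
    return max(d(x, s) for s in strings)


def consensus_error(x: str, strings: Sequence[str], d: Distance = hamming) -> int:
    """Sum of distances from x to every string in the set."""
    return sum(d(x, s) for s in strings)


def brute_force_radius_consensus(strings: Sequence[str], alphabet: str, d: Distance = hamming) -> str:
    """Try all len(alphabet) ** n candidates of length n and return one minimizing the radius."""
    n = len(strings[0])
    candidates = ("".join(chars) for chars in product(alphabet, repeat=n))
    return min(candidates, key=lambda x: radius(x, strings, d))


if __name__ == "__main__":
    S = ["AAA", "ABB", "BBA"]
    center = brute_force_radius_consensus(S, "AB")
    print(center, radius(center, S), consensus_error(center, S))  # ABA 1 3
```

Note that a string minimizing the radius need not minimize the consensus error, which is why the two objectives define different consensus problems.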

Design and Implementation of a Main Memory Index based on the R-tree for Moving Object Databases (이동체 데이터베이스를 위한 R-tree 기반 메인 메모리 색인의 설계 및 구현)

  • Ahn, Sung-Woo;An, Kyoung-Hwan;Lee, Chaug-Woo;Hong, Bong-Hee
    • Journal of Korea Spatial Information System Society / v.8 no.2 s.17 / pp.53-73 / 2006
  • Recently, the need for Location-Based Services (LBS) has increased with the development of mobile devices such as PDAs, cellular phones, and GPS. Because a moving object database, which stores and manages the positions of moving objects, is the core technology of LBS, a main memory DBMS must be maintained on the server to efficiently store and process the frequently reported positions of moving objects. However, previous work on moving object databases has mostly studied disk-based moving object indexes, which are not guaranteed to work efficiently in a main memory DBMS because they do not consider the characteristics of main memory. A main memory index scheme for moving object databases therefore needs to be studied. In this paper, we propose a main memory index scheme based on the R-tree for efficiently storing and processing the positions of moving objects in a main memory DBMS. The proposed index scheme uses a growing node structure, which prevents splitting costs from increasing by delaying node splits when a node overflows. The proposed scheme also improves search performance by using a MergeAndSplit policy to reduce overlap between nodes and a LargeDomainNodeSplit policy to reduce the proportion of the domain occupied by the nodes' MBRs. Our experiments show that the proposed index scheme outperforms the existing index scheme by up to 30% for range queries.
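The abstract names two quantities that the split policies try to reduce: overlap between node MBRs (MergeAndSplit) and the share of the domain occupied by node MBRs (LargeDomainNodeSplit). Since the abstract does not give the policies' actual split rules, the sketch below only shows how those two measures could be computed for 2-D axis-aligned rectangles; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Sequence


@dataclass(frozen=True)
class MBR:
    """Axis-aligned minimum bounding rectangle."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def union(self, other: "MBR") -> "MBR":
        """Smallest rectangle enclosing both MBRs (how a node's MBR grows on insertion)."""
        return MBR(min(self.xmin, other.xmin), min(self.ymin, other.ymin),
                   max(self.xmax, other.xmax), max(self.ymax, other.ymax))

    def overlap(self, other: "MBR") -> float:
        """Area of the intersection, or 0.0 if the rectangles do not overlap."""
        w = min(self.xmax, other.xmax) - max(self.xmin, other.xmin)
        h = min(self.ymax, other.ymax) - max(self.ymin, other.ymin)
        return w * h if w > 0 and h > 0 else 0.0


def total_overlap(mbrs: Sequence[MBR]) -> float:
    """Sum of pairwise overlap areas among sibling node MBRs (lower is better for search)."""
    return sum(a.overlap(b) for a, b in combinations(mbrs, 2))


def domain_coverage(mbrs: Sequence[MBR], domain: MBR) -> float:
    """Sum of node MBR areas divided by the domain area (overlapping areas are counted more than once)."""
    return sum(m.area() for m in mbrs) / domain.area()
```

A split policy could evaluate candidate partitions of an overflowing node's entries with these measures and keep the partition with the smallest overlap or coverage.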

Customizable Global Job Scheduler for Computational Grid (계산 그리드를 위한 커스터마이즈 가능한 글로벌 작업 스케줄러)

  • Hwang, Sun-Tae;Heo, Dae-Young
    • Journal of KIISE:Computer Systems and Theory / v.33 no.7 / pp.370-379 / 2006
  • A computational grid provides an environment that integrates various computing resources. A grid environment is more complex and diverse than a traditional computing environment, consisting of various resources on which different software packages are installed on different platforms. For more efficient use of a computational grid, therefore, some form of integration is required to manage grid resources more effectively. In this paper, we propose a global scheduler that integrates grid resources at the meta level while applying various scheduling policies. The global scheduler consists of a mechanical part and three policies. The mechanical part mainly searches user queues and resource queues to select an appropriate job and computing resource; an algorithm for the mechanical part is defined and optimized. The three policies are the user selecting policy, the resource selecting policy, and the executing policy. These policies can be newly defined and freely replaced while the operation of the computational grid is temporarily suspended. For example, the user selecting policy can be defined to give a certain user higher priority than other users; the resource selecting policy selects the computing resource that best matches the user's requirements; and the executing policy is used to overcome communication overheads in the grid middleware. Finally, various algorithms for the user selecting policy are defined solely in terms of user fairness, and their performance is compared.
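The split between a fixed mechanical part and replaceable policies can be sketched as a scheduler that delegates its two choices to plain functions. Only the user selecting and resource selecting policies are modelled; the executing policy is omitted. The "least-served user" rule is one simple fairness policy chosen for illustration and is not necessarily one of the algorithms the paper compares; all class, field, and function names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Optional, Tuple


@dataclass
class Job:
    name: str
    cpus: int


@dataclass
class Resource:
    name: str
    free_cpus: int


UserPolicy = Callable[[Dict[str, Deque[Job]], Dict[str, int]], Optional[str]]
ResourcePolicy = Callable[[Job, List[Resource]], Optional[Resource]]


def least_served_user(queues: Dict[str, Deque[Job]], served: Dict[str, int]) -> Optional[str]:
    """Fairness-oriented user policy: the waiting user with the fewest dispatched jobs (ties by name)."""
    waiting = [user for user, queue in queues.items() if queue]
    if not waiting:
        return None
    return min(waiting, key=lambda user: (served.get(user, 0), user))


def first_fit_resource(job: Job, resources: List[Resource]) -> Optional[Resource]:
    """Resource policy: the first resource with enough free CPUs for the job."""
    return next((r for r in resources if r.free_cpus >= job.cpus), None)


class GlobalScheduler:
    """Mechanical part: scans user queues and resources, delegating choices to pluggable policies."""

    def __init__(self, user_policy: UserPolicy, resource_policy: ResourcePolicy) -> None:
        self.user_policy = user_policy
        self.resource_policy = resource_policy
        self.queues: Dict[str, Deque[Job]] = {}
        self.served: Dict[str, int] = {}

    def set_policies(self, user_policy: UserPolicy, resource_policy: ResourcePolicy) -> None:
        """Swap policies, e.g. while grid operation is temporarily suspended."""
        self.user_policy = user_policy
        self.resource_policy = resource_policy

    def submit(self, user: str, job: Job) -> None:
        self.queues.setdefault(user, deque()).append(job)

    def dispatch_once(self, resources: List[Resource]) -> Optional[Tuple[str, str, str]]:
        """Dispatch one job; returns (user, job, resource) or None if nothing can run now."""
        user = self.user_policy(self.queues, self.served)
        if user is None:
            return None
        job = self.queues[user][0]
        resource = self.resource_policy(job, resources)
        if resource is None:
            return None  # the job stays at the head of the user's queue
        self.queues[user].popleft()
        resource.free_cpus -= job.cpus
        self.served[user] = self.served.get(user, 0) + 1
        return user, job.name, resource.name
```

Because the mechanical loop never inspects policy internals, comparing fairness algorithms amounts to calling `set_policies` with a different user policy and rerunning the same workload.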

A Logical Cell-Based Approach for Robot Component Repositories (논리적 셀 기반의 로봇 소프트웨어 컴포넌트 저장소)

  • Koo, Hyung-Min;Ko, In-Young
    • Journal of KIISE:Software and Applications / v.34 no.8 / pp.731-742 / 2007
  • Self-growing software is a software system that can evolve its own functionalities and configurations based on dynamically monitored situations. Self-growing software is especially necessary for intelligent service robots, which must monitor their surrounding environments and provide appropriate behaviors for human users. However, it is hard to anticipate every situation a robot will face and hard to equip robots with all the functionalities needed for various environments; in addition, robots have limited internal capacity. To support self-growing software for intelligent service robots, we are developing a cell-based distributed repository system that allows robots and developers to share robot functionalities transparently. To create evolutionary repositories, we introduce the concept of a cell, which is a logical group of distributed repositories based on the functionalities of components. A cell can also serve as a unit for the evolutionary growth of the components within the repositories. In this paper, we describe the requirements and architecture of the cell-based repository system for self-growing software, and we present a prototype implementation of the repository system along with experiments. The cell-based repositories improve the performance of self-growing actions for robots and enable efficient sharing of components among robots and developers.
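The notion of a cell as a logical group of distributed repositories organized by component functionality can be illustrated with a small in-memory index. The abstract does not describe the system's data structures, so the functionality labels, repository names, and methods below are hypothetical, and distribution, evolution, and transparent transfer of components are not modelled.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class CellRegistry:
    """Hypothetical index grouping components from distributed repositories into logical cells.

    Each cell is keyed by a functionality label and holds (repository, component) pairs,
    so a robot can look up a capability without knowing which repository provides it.
    """

    def __init__(self) -> None:
        self._cells: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def register(self, repository: str, component: str, functionality: str) -> None:
        """Add a component to the cell for its functionality."""
        self._cells[functionality].add((repository, component))

    def lookup(self, functionality: str) -> List[Tuple[str, str]]:
        """Return the cell's members in a deterministic order (empty if there is no such cell)."""
        return sorted(self._cells.get(functionality, set()))

    def repositories_in_cell(self, functionality: str) -> List[str]:
        """Return the distinct repositories that contribute to a cell."""
        return sorted({repo for repo, _ in self._cells.get(functionality, set())})
```

Keying cells by functionality rather than by repository is what lets one cell span several physical repositories.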