• Title/Summary/Keyword: information visualization (search)

193 search results, processing time 0.024 seconds

Knowledge Creation Structure of Big Data Research Domain (빅데이터 연구영역의 지식창출 구조)

  • Namn, Su-Hyeon
    • Journal of Digital Convergence
    • /
    • v.13 no.9
    • /
    • pp.129-136
    • /
    • 2015
  • We investigate the underlying structure of the big data research domain, which is diversified and complicated, using a bottom-up approach. For that purpose, we derive a set of articles by searching for "big data" in the Korea Citation Index System provided by the National Research Foundation of Korea. After some preprocessing of the author-provided keywords, we analyze bibliometric data such as the keywords themselves, publication year, author, and journal characteristics. From this analysis we identify the major sub-domains of the big data research area and uncover the hidden issues that make big data complex. Major keywords identified include SOCIAL NETWORK ANALYSIS, HADOOP, MAPREDUCE, PERSONAL INFORMATION POLICY/PROTECTION/PRIVATE INFORMATION, CLOUD COMPUTING, VISUALIZATION, and DATA MINING. Finally, we suggest missing research themes that would make big data a sustainable medium for management innovation and convergence.
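The keyword analysis described above boils down to counting preprocessed author-provided keywords across a bibliographic record set; a minimal Python sketch (the records and field names are invented for illustration, not taken from the study):

```python
from collections import Counter

# Hypothetical bibliographic records, each with author-provided keywords.
records = [
    {"year": 2013, "keywords": ["Hadoop", "MapReduce", "cloud computing"]},
    {"year": 2014, "keywords": ["social network analysis", "visualization"]},
    {"year": 2014, "keywords": ["hadoop ", "data mining", "visualization"]},
]

# Simple preprocessing (lowercase, strip whitespace), then frequency count.
freq = Counter(
    kw.strip().lower() for rec in records for kw in rec["keywords"]
)

print(freq.most_common(3))  # most frequent keywords first
```

The same counter can be grouped by `year` to study how sub-domains rise and fall over time.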

Adaptive Image Content-Based Retrieval Techniques for Multiple Queries (다중 질의를 위한 적응적 영상 내용 기반 검색 기법)

  • Hong Jong-Sun;Kang Dae-Seong
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.42 no.3 s.303
    • /
    • pp.73-80
    • /
    • 2005
  • Recently there have been many efforts to support searching and browsing based on the visual content of image and multimedia data. Most existing approaches to content-based image retrieval rely on query by example or on user-specified low-level features such as color, shape, and texture, but these query methods are neither easy to use nor flexible. In this paper we propose a method for automatic color-object extraction and labeling to support multiple queries in a content-based image retrieval system. Our approach simplifies the regions within images using a single colorizing algorithm and extracts color objects using the proposed Color and Spatial based Binary tree map (CSB tree map). By searching over a large number of the processed regions, an index for the database is created using the proposed labeling method. This allows very fast indexing of images by their color content and spatial attributes. Furthermore, information about the labeled regions, such as color set, size, and location, enables varied multiple queries that combine both color content and the spatial relationships of regions. Experiments on the 'Washington' image database show that the proposed system achieves high performance compared with a competing algorithm.

An Object-Based Image Retrieval Using Feature Analysis and Fractal Dimension (특징 분석과 프랙탈 차원을 이용한 객체 기반 영상검색)

  • 이정봉;박장춘
    • Journal of Korea Multimedia Society
    • /
    • v.7 no.2
    • /
    • pp.173-186
    • /
    • 2004
  • This paper proposes a content-based retrieval system that performs image retrieval through effective feature extraction of the semantically significant object, based on the characteristics of the human visual system. To detect the object region of interest first, a region that is comparatively large, clearly different from the background color, and located in the middle of the image is judged to be the major, meaningful object. To obtain the original features of the image, the object contour is partitioned into normalized segments, and the cumulative sum of the declination-difference vector of each contour segment and the signature of the bisected object are extracted in a form that accommodates rotation of the object and changes in its size. Starting from this shape feature, retrieval robust to translation, rotation, and scaling is achieved by combining information on texture samples, color, and eccentricity and measuring the degree of similarity. The method responds less sensitively to distortion of object features caused by partial change or damage to the region. Also, by imposing different similarity weights on image features according to the complexity relationship between the measured objects, computed from the fractal dimension via box counting, wrong retrievals are minimized and a more efficient retrieval rate is obtained.
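The box-counting fractal dimension used above is estimated by counting occupied grid boxes at several scales and fitting a line in log-log space; a minimal sketch, with an invented point set standing in for an object region:

```python
import math

def box_counting_dimension(points, sizes):
    """Estimate the box-counting dimension of a 2-D point set.

    points: iterable of (x, y) pairs; sizes: decreasing box side lengths.
    Returns the least-squares slope of log N(s) versus log(1/s).
    """
    pts = list(points)
    xs, ys = [], []
    for s in sizes:
        # N(s): number of grid boxes of side s containing at least one point.
        boxes = {(int(x / s), int(y / s)) for x, y in pts}
        xs.append(math.log(1.0 / s))
        ys.append(math.log(len(boxes)))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den

# A densely filled unit square should have dimension close to 2.
pts = [(i / 100.0, j / 100.0) for i in range(100) for j in range(100)]
dim = box_counting_dimension(pts, [1/2, 1/4, 1/8, 1/16])
print(round(dim, 3))  # close to 2.0
```

A sparse contour would instead yield a dimension near 1; the paper exploits this spread as a complexity measure for weighting similarity.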


A Study on the Intellectual Structure of Metadata Research by Using Co-word Analysis (동시출현단어 분석에 기반한 메타데이터 분야의 지적구조에 관한 연구)

  • Choi, Ye-Jin;Chung, Yeon-Kyoung
    • Journal of the Korean Society for Information Management
    • /
    • v.33 no.3
    • /
    • pp.63-83
    • /
    • 2016
  • As the use of information resources produced in various media and forms has increased, the importance of metadata as an information organization tool for describing those resources has become increasingly crucial. The purposes of this study are to analyze and demonstrate the intellectual structure of the metadata field through co-word analysis. The data set was collected from journals registered in the Web of Science Core Collection citation database between January 1, 1998 and July 8, 2016. Bibliographic data for 727 journal articles was collected using a Topic category search with the query word 'metadata'. From these 727 articles, 410 with author keywords were selected, and after data preprocessing 1,137 author keywords were extracted. Finally, 37 keywords with a frequency greater than 6 were selected for analysis. To demonstrate the intellectual structure of the metadata field, network analysis was conducted. As a result, 2 domains and 9 clusters were derived, the intellectual relations among keywords of the metadata field were visualized, and keywords with high global and local centrality were identified. Six clusters from the cluster analysis were shown in a multidimensional scaling map, and the knowledge structure was proposed based on the correlations among the keywords. The results of this study are expected to help readers understand the intellectual structure of the metadata field through visualization and to guide new approaches in metadata-related studies.
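Co-word analysis, as used above, builds a network whose edge weights count how often two author keywords appear in the same article; a minimal sketch (the keywords are invented for illustration):

```python
from collections import Counter
from itertools import combinations

# Hypothetical author-keyword sets, one list per article.
articles = [
    ["metadata", "dublin core", "interoperability"],
    ["metadata", "digital library", "interoperability"],
    ["metadata", "dublin core", "digital library"],
]

# Every unordered keyword pair within an article is one co-occurrence;
# the edge weight is the number of articles in which the pair co-occurs.
edges = Counter()
for kws in articles:
    for a, b in combinations(sorted(set(kws)), 2):
        edges[(a, b)] += 1

for (a, b), w in edges.most_common():
    print(f"{a} -- {b}: {w}")
```

The resulting weighted edge list is exactly what network analysis and multidimensional scaling tools take as input.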

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.109-122
    • /
    • 2014
  • People nowadays create a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, greatly influencing society. This is an unmatched phenomenon in history, and we now live in the Age of Big Data. SNS data qualifies as big data in that it satisfies the conditions of volume (amount of data), velocity (data input and output speed), and variety (range of data types). If the trend of an issue can be discovered in SNS big data, that information covers the whole of society and can serve as an important new source for the creation of new value. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need for analyzing SNS big data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides four functions: (1) a topic keyword set with a daily ranking; (2) a daily time-series graph of a topic over the course of a month; (3) the importance of a topic, shown as a treemap based on a score system and frequency; (4) a daily time-series graph of keywords retrieved by keyword search. The present study analyzes the big data generated by SNS in real time. SNS big data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process various unrefined forms of unstructured data. It also requires current big data technology to process a large amount of real-time data rapidly, such as the Hadoop distributed system or NoSQL, an alternative to the relational database. We built TITS on Hadoop to optimize big data processing, because Hadoop is designed to scale from single-node computing to thousands of machines. Furthermore, we use MongoDB, a NoSQL database: an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike a relational database, MongoDB has no schemas or tables, and its most important goals are data accessibility and data-processing performance. In the Age of Big Data, visualization is attractive to the big data community because it helps analysts examine data easily and clearly, so TITS uses the d3.js library as its visualization tool. This library is designed for creating Data-Driven Documents that bind the document object model (DOM) to arbitrary data, and it is useful for managing a real-time data stream with smooth animation. In addition, TITS uses Bootstrap, a collection of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS Graphical User Interface (GUI), designed with these libraries, can detect issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the quality of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique used in various research areas, including Library and Information Science (LIS), and on this basis confirm the utility of storytelling and time-series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets posted in Korea during March 2013.
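Functions (1) and (2) of the system described above amount to per-day keyword counting over the tweet stream; a minimal stdlib sketch with invented dates and tokens (the real system adds topic modeling, Hadoop, and MongoDB on top of this idea):

```python
from collections import Counter, defaultdict

# Hypothetical tokenized tweets (stop words already removed), with dates.
tweets = [
    ("2013-03-01", ["election", "candidate"]),
    ("2013-03-01", ["election", "debate"]),
    ("2013-03-02", ["election", "result"]),
]

# Function (1): daily ranking of topic keywords.
daily = defaultdict(Counter)
for date, tokens in tweets:
    daily[date].update(tokens)

print(daily["2013-03-01"].most_common(2))

# Function (2): a daily time series for one keyword, ordered by date.
series = {d: c["election"] for d, c in sorted(daily.items())}
print(series)
```

The `series` dictionary is the data a d3.js time-series graph would bind to on the web front end.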

Index-based Searching on Timestamped Event Sequences (타임스탬프를 갖는 이벤트 시퀀스의 인덱스 기반 검색)

  • 박상현;원정임;윤지희;김상욱
    • Journal of KIISE:Databases
    • /
    • v.31 no.5
    • /
    • pp.468-478
    • /
    • 2004
  • In various application areas such as data mining and bioinformatics, it is essential to retrieve occurrences of interesting patterns from sequence databases effectively. For example, consider a network event management system that records the types and timestamp values of events occurring in a specific network component (e.g., a router). A typical query for discovering temporal causal relationships among network events is as follows: 'Find all occurrences of CiscoDCDLinkUp that are followed by MLMStatusUP and subsequently by TCPConnectionClose, under the constraints that the interval between the first two events is no larger than 20 seconds and the interval between the first and third events is no larger than 40 seconds.' This paper proposes an indexing method that answers such queries efficiently. Unlike previous methods, which rely on inefficient sequential scans or on data structures not easily supported by DBMSs, the proposed method uses a multi-dimensional spatial index, proven efficient in both storage and search, to find the answers quickly without false dismissals. Given a sliding window W, the input to the multi-dimensional spatial index is an n-dimensional vector whose i-th element is the interval between the first event of W and the first occurrence of the event type Ei in W. Here, n is the number of event types that can occur in the system of interest. Because the 'curse of dimensionality' may arise when n is large, we use dimension selection or event-type grouping to avoid this problem. The experimental results reveal that the proposed technique can be a few orders of magnitude faster than the sequential scan and ISO-Depth index methods.
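The window-to-vector mapping described above can be sketched directly; the event names follow the example query, while the timestamps and the choice of infinity for absent event types are illustrative assumptions:

```python
# Event types of interest; the names follow the example query above.
EVENT_TYPES = ["CiscoDCDLinkUp", "MLMStatusUP", "TCPConnectionClose"]

def window_to_vector(window, missing=float("inf")):
    """Map a sliding window to its n-dimensional index vector.

    window: list of (timestamp, event_type) pairs sorted by timestamp.
    The i-th element is the interval between the window's first event
    and the first occurrence of EVENT_TYPES[i]; types absent from the
    window map to `missing` (an illustrative choice, not specified here).
    """
    t0 = window[0][0]
    vec = []
    for etype in EVENT_TYPES:
        first = next((t for t, e in window if e == etype), None)
        vec.append(first - t0 if first is not None else missing)
    return vec

w = [(100, "CiscoDCDLinkUp"), (115, "MLMStatusUP"), (135, "TCPConnectionClose")]
print(window_to_vector(w))  # [0, 15, 35]
```

This vector satisfies the example query's constraints (15 ≤ 20 and 35 ≤ 40 seconds), so a range search over the spatial index would return this window.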

Research Trends in Record Management Using Unstructured Text Data Analysis (비정형 텍스트 데이터 분석을 활용한 기록관리 분야 연구동향)

  • Deokyong Hong;Junseok Heo
    • Journal of Korean Society of Archives and Records Management
    • /
    • v.23 no.4
    • /
    • pp.73-89
    • /
    • 2023
  • This study aims to analyze the frequency of keywords used in Korean abstracts, the unstructured text data of the domestic record management research field, using text mining techniques, and to identify domestic record management research trends through distance analysis between keywords. To this end, 1,157 articles were extracted from 7 journals (28 journal types) retrieved by major category (interdisciplinary studies) and middle category (library and information science) from the institutional statistics (registered and candidate journals) of the Korea Citation Index (KCI), and 1,157 keywords out of 77,578 were visualized. Analyses with t-Distributed Stochastic Neighbor Embedding (t-SNE) and Scattertext using Word2vec were performed. As a result, first, it was confirmed that keywords such as "record management" (889 occurrences), "analysis" (888), "archive" (742), "record" (562), and "utilization" (449) were treated as significant topics by researchers. Second, the Word2vec analysis generated vector representations of the keywords, whose similarity distances were investigated and visualized using t-SNE and Scattertext. In the visualization, the record management research area divided into two groups: keywords such as "archiving," "national record management," "standardization," "official documents," and "record management systems" occurred frequently in the first group (past), while keywords such as "community," "data," "record information service," "online," and "digital archives" garnered substantial focus in the second group (current).
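As a toy illustration of measuring similarity distances between keywords: the sketch below represents each keyword by its co-occurrence counts and compares them with cosine similarity (the study itself uses learned Word2vec embeddings with t-SNE; the documents and keywords here are invented):

```python
import math
from collections import Counter

# Invented keyword lists, one per abstract.
docs = [
    ["record management", "archive", "standardization"],
    ["record management", "archive", "digital archives"],
    ["community", "data", "digital archives"],
]

def vector(word):
    """Represent a keyword by its co-occurrence counts with other keywords."""
    c = Counter()
    for d in docs:
        if word in d:
            c.update(w for w in d if w != word)
    return c

def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

print(cosine(vector("record management"), vector("archive")))
print(cosine(vector("record management"), vector("community")))
```

Keywords that share contexts ("record management" and "archive") score higher than keywords that rarely co-occur, which is the intuition t-SNE then lays out in two dimensions.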

Degree Programs in Data Science at the School of Information in the States (미국 정보 대학의 데이터사이언스 학위 현황 연구)

  • Park, Hyoungjoo
    • Journal of Korean Library and Information Science Society
    • /
    • v.53 no.2
    • /
    • pp.305-332
    • /
    • 2022
  • This preliminary study examined the degree programs in data science at Schools of Information in the United States. The focus was on data science degrees offered at the Schools of Information within the 64 Library and Information Science (LIS) programs accredited by the American Library Association (ALA) in 2022. In addition, this study examined the degrees, majors, minors, specialized tracks, and certificates in data science, as well as potential careers after earning a data science degree. Overall, eight Schools of Information (iSchools) offered 12 data science degrees. Data science courses at these schools focus on topics such as introduction to data science, information retrieval, data mining, databases, data and the humanities, machine learning, metadata, research methods, data analysis and visualization, internship/capstone, ethics and security, users, policy, and curation and management. Most schools did not offer traditional LIS courses. After earning a data science degree at a School of Information, potential careers included data scientist, data engineer, and data analyst. The researcher hopes the findings of this study can serve as a starting point for discussing the direction of data science programs from the perspective of the information field, specifically the degrees, majors, minors, specialized tracks, and certificates in data science.

Analysis of Major Changes in Press Articles Related to 'High School Credit System'

  • Kwon, Choong-Hoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.7
    • /
    • pp.183-191
    • /
    • 2020
  • The purpose of this study is to objectively analyze the trend of media articles related to the 'high school credit system' (2017~2019), which has become the biggest concern among Korean education policies, through BIGKinds, a big data analysis service for news media articles. The main research methodologies were BIGKinds keyword news search, news trend analysis, keyword extraction with word cloud implementation, and network analysis with network diagrams. The research results are as follows. First, 3,649 articles related to the high school credit system appeared in major Korean media outlets over the 3 years from 2017 to 2019; the number of articles spiked sharply about four times, each around a government announcement of a related policy, and showed an overall increasing trend. Second, the top 20 keywords in press articles related to the high school credit system over the 3 analyzed years were presented, confirming that the keywords changed from year to year. Third, the network of media articles related to the high school credit system was visualized and presented separately by person, institution, and keyword. The results confirm that the high school credit system was adopted as the representative education policy of the Moon Jae-in government and is proceeding through the policy decision and policy implementation stages.

Analysis of a Compound-Target Network of Oryeong-san (오령산 구성성분-타겟 네트워크 분석)

  • Kim, Sang-Kyun
    • Journal of the Korea Knowledge Information Technology Society
    • /
    • v.13 no.5
    • /
    • pp.607-614
    • /
    • 2018
  • Oryeong-san is a prescription widely used for diseases in which water is stagnant, because it circulates water in the body and releases it into the urine. To investigate the mechanisms of oryeong-san, this paper constructs and analyzes the compound-target network of the medicinal materials constituting oryeong-san, based on a systems pharmacology approach. First, targets related to the 475 chemical compounds of oryeong-san were searched in the STITCH database, and the results for the interactions between compounds and targets were downloaded as XML files. The compound-target network of oryeong-san was visualized and explored using Gephi 0.8.2, an open-source tool for graphs and networks. In the network, nodes are compounds and targets, and edges are interactions between them; each edge is weighted according to the reliability of the interaction. To analyze the compound-target network, it was clustered using the MCL algorithm, which can cluster weighted networks. A total of 130 clusters were created, and the largest cluster contained 32 nodes. In the clustered network, the active compounds of the medicinal materials were found to be associated with targets that regulate blood pressure in the kidney. In the future, we will clarify the mechanisms of oryeong-san by linking information from disease databases with the network of this research.
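The compound-target network described above is a weighted graph; the sketch below builds such a graph and clusters it by connected components as a simple stand-in for MCL, which additionally splits components using the edge weights (the interaction triples and weights are invented, not taken from STITCH):

```python
from collections import defaultdict

# Hypothetical compound-target interactions with reliability weights,
# in the spirit of STITCH edges (all names invented for illustration).
interactions = [
    ("compound_A", "target_1", 0.9),
    ("compound_A", "target_2", 0.4),
    ("compound_B", "target_3", 0.8),
]

# Build a weighted, undirected adjacency structure.
graph = defaultdict(dict)
for c, t, w in interactions:
    graph[c][t] = w
    graph[t][c] = w

def components(g):
    """Cluster nodes by connected components via depth-first search."""
    seen, out = set(), []
    for start in g:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(g[node])
        seen |= comp
        out.append(comp)
    return out

print(components(graph))
```

On real data, the weights stored on each edge are what a weight-aware algorithm such as MCL uses to break large components into the 130 clusters reported above.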