• Title/Summary/Keyword: open data mining (개방 데이터 마이닝)

Finding Frequent Itemsets based on Open Data Mining in Data Streams (데이터 스트림에서 개방 데이터 마이닝 기반의 빈발항목 탐색)

  • Chang, Joong-Hyuk;Lee, Won-Suk
    • The KIPS Transactions:PartD / v.10D no.3 / pp.447-458 / 2003
  • Conventional data mining methods assume that the data set of a knowledge discovery process is fixed and available before the process begins. This assumption is valid only when the target of mining is the static knowledge embedded in a specific data set. In addition, a conventional data mining method requires considerable computing time to produce a result from a large data set. For these reasons, it is almost impossible to apply such a method to real-time analysis of a data stream, in which new transactions are generated continuously and an up-to-date mining result that includes the newly generated transactions is needed as quickly as possible. In this paper, a new mining concept, open data mining in a data stream, is proposed for this purpose. In open data mining, whenever a new transaction is generated, the updated mining result over all transactions, including the new one, is obtained instantly. To implement this mechanism efficiently, it is necessary to incorporate delayed insertion of newly identified information in recent transactions as well as pruning of insignificant information in the mining result of past transactions. The proposed algorithm is analyzed through a series of experiments to identify its various characteristics.
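
The abstract does not give the algorithm itself, so the following is only a rough illustration of the pruning idea, using a Lossy-Counting-style counter as a stand-in rather than the authors' method. It tracks single items instead of itemsets, does not model delayed insertion, and the class name `StreamingItemCounter` and the `epsilon` parameter are assumptions made for this sketch.

```python
from math import ceil


class StreamingItemCounter:
    """Lossy-Counting-style counter over single items.

    A stand-in for illustration only; not the algorithm from the paper.
    """

    def __init__(self, epsilon):
        self.epsilon = epsilon            # tolerated error in support (assumed parameter)
        self.width = ceil(1 / epsilon)    # number of transactions per bucket
        self.n = 0                        # transactions seen so far
        self.entries = {}                 # item -> [count, maximum possible undercount]

    def add(self, transaction):
        """Update the result immediately for one newly generated transaction."""
        self.n += 1
        bucket = ceil(self.n / self.width)
        for item in set(transaction):
            if item in self.entries:
                self.entries[item][0] += 1
            else:
                self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Prune items that cannot be significant given what has been seen so far.
            self.entries = {
                item: entry
                for item, entry in self.entries.items()
                if entry[0] + entry[1] > bucket
            }

    def frequent(self, min_support):
        """Return items whose estimated support reaches min_support (within epsilon)."""
        threshold = (min_support - self.epsilon) * self.n
        return sorted(item for item, (count, _) in self.entries.items() if count >= threshold)


counter = StreamingItemCounter(epsilon=0.25)
for tx in [["a", "b"], ["a", "c"], ["a", "b"], ["a", "d"]]:
    counter.add(tx)
print(counter.frequent(min_support=0.75))
```

Because pruning happens at bucket boundaries, memory stays bounded by the error tolerance instead of growing with the stream, which is the property the abstract attributes to its pruning step.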

A Sliding Window Technique for Open Data Mining over Data Streams (개방 데이터 마이닝에 효율적인 이동 윈도우 기법)

  • Chang Joong-Hyuk;Lee Won-Suk
    • The KIPS Transactions:PartD / v.12D no.3 s.99 / pp.335-344 / 2005
  • Recently, open data mining methods have been actively proposed for data streams, which are massive unbounded sequences of data elements continuously generated at a rapid rate. Knowledge embedded in a data stream is likely to change over time. Therefore, quickly identifying recent changes in this knowledge can provide valuable information for the analysis of the data stream. This paper proposes a sliding window technique for finding recently frequent itemsets that can be applied efficiently in open data mining. In the proposed technique, memory usage is kept small by delayed-insertion and pruning operations, and the mining result can be found in a short time because the data elements within the target range are not traversed repeatedly. Moreover, the proposed technique focuses on recent data elements, so it can capture recent changes in the data stream.
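
As a minimal sketch of the sliding-window idea, assuming single items rather than itemsets, the counts below are updated incrementally as transactions enter and leave the window, so the window is never rescanned. Unlike the technique described in the abstract, this simplified version keeps every transaction of the window in memory and has no delayed insertion or pruning; the class name `SlidingWindowCounter` is hypothetical.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Item counts over the most recent `window_size` transactions (illustrative only)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()     # transactions currently inside the window
        self.counts = Counter()   # item -> occurrences inside the window

    def add(self, transaction):
        items = frozenset(transaction)
        self.window.append(items)
        self.counts.update(items)
        if len(self.window) > self.window_size:
            # Remove the effect of the transaction that just left the window.
            expired = self.window.popleft()
            self.counts.subtract(expired)
            for item in expired:
                if self.counts[item] == 0:
                    del self.counts[item]

    def recently_frequent(self, min_support):
        threshold = min_support * len(self.window)
        return sorted(item for item, count in self.counts.items() if count >= threshold)


recent = SlidingWindowCounter(window_size=3)
for tx in [["a", "b"], ["a"], ["b", "c"], ["c"]]:
    recent.add(tx)
print(recent.recently_frequent(min_support=0.6))
```

Each arrival costs work proportional to the size of the incoming and expiring transactions only, which is what makes the result available without traversing the window again.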

A Study on the Method for Extracting the Purpose-Specific Customized Information from Online Product Reviews based on Text Mining (텍스트 마이닝 기반의 온라인 상품 리뷰 추출을 통한 목적별 맞춤화 정보 도출 방법론 연구)

  • Kim, Joo Young;Kim, Dong soo
    • The Journal of Society for e-Business Studies / v.21 no.2 / pp.151-161 / 2016
  • In the era of Web 2.0, characterized by openness, sharing, and participation, it is easy for internet users to produce and share data. The amount of unstructured data, which accounts for most of the digital world's data, has increased exponentially. Personal online product reviews, one kind of unstructured data, are important both to the companies that produce the products and to potential customers who are interested in them. Extracting useful information from a large volume of scattered review data requires a process of collecting, storing, preprocessing, and analyzing data and drawing conclusions. We therefore introduce a text-mining methodology that applies natural language processing technology to text data such as product reviews in order to extract structured data using R programming. We also introduce data mining to derive purpose-specific customized information from the structured review information produced by the text mining.

Open Platform for Improvement of e-Health Accessibility (의료정보서비스 접근성 향상을 위한 개방형 플랫폼 구축방안)

  • Lee, Hyun-Jik;Kim, Yoon-Ho
    • Journal of Digital Contents Society / v.18 no.7 / pp.1341-1346 / 2017
  • In this paper, we design an open service platform that integrates individually customized services with intelligent information technology, taking into account individuals' complex attributes and requests. First, the data collection phase proceeds quickly and accurately by repeating extraction, transformation, and loading. The data generated by the extraction-transformation-loading process module is stored in a distributed data system. The data analysis phase generates a variety of patterns using analysis algorithms from the field. The data processing phase uses distributed parallel processing to improve performance. Data provision should operate independently on a device-specific management platform. It is provided as a type of Open API.

A Study on the Research Trends on Open Innovation using Topic Modeling (토픽 모델링을 이용한 개방형 혁신 연구동향 분석 및 정책 방향 모색)

  • Cho, Sung-Bae;Shin, Shin-Ae;Kang, Dong-Seok
    • Informatization Policy / v.25 no.3 / pp.52-74 / 2018
  • In February 2018, the Korean government established the "Comprehensive Plans for Government Innovation" in order to realize a "people-centered government". The core of the comprehensive plans is participation of the people, which is very similar to open innovation, where social issues are solved by the ideas and capabilities of the private sector rather than those of the government. Therefore, this study extracted open innovation topics through topic modeling based on LDA (Latent Dirichlet Allocation), using English abstract data from 2003, when the plans for open innovation were first announced, to April 2018. Based on the extracted results, it also conducted a comparative analysis with the "Comprehensive Plans for Government Innovation." The study has significant implications in that it derives the relationships between the subjects, analyzes Korea's present policies on open innovation, and suggests directions for development.

A study of the vitalization strategy for public sports facility through big-data (빅데이터 분석을 활용한 기금지원 체육시설 활성화 방안)

  • Kim, Mi-ok;Ko, Jin-soo;Noh, Seung-Chul;Chung, Jae-Hoon
    • Journal of Digital Convergence / v.15 no.2 / pp.527-535 / 2017
  • As interest in health promotion through sports increases, demand for public sports facilities is steadily growing. However, research on the operation and management of public sports facilities is lacking compared with their supply planning. In this context, the aim of this study is to address problems in the management of public sports centers and suggest strategies for vitalizing the facilities through big data. The data were collected from web sources such as news, blogs, and cafes over one year, 2015. From the big data, we find that the national sports centers and the open gyms showed similar user behavior but different needs. Both types of facility have been used as sports and leisure areas and have a high percentage of visitors for other purposes such as walking and picnics. However, while the national sports facilities were used for more specialized programs, the open sports centers were used as leisure space.

Social graph visualization techniques for public data (공공데이터에 적합한 다양한 소셜 그래프 비주얼라이제이션 알고리즘 제안)

  • Lee, Manjai;On, Byung-Won
    • Journal of the HCI Society of Korea / v.10 no.1 / pp.5-17 / 2015
  • Nowadays, a variety of public data is made available to the public. Opening public data increases the transparency and effectiveness of public policy developed by governments, and users can drive the growth of industries related to public data. Since the end users of public data are citizens, it is very important that everyone can understand the meaning of public data through proper visualization techniques. In this work, to indicate the significance of widespread public data, we consider the UN voting record as public data in which many people may be interested. In general, it has high utilization value for diplomatic and educational purposes and is publicly available. With proper data mining and visualization algorithms, we can gain insight into the voting patterns of UN members. To visualize them, it is necessary to measure the voting similarity among UN members and then create a social graph from the similarity values. Next, the social graph is rendered on the screen using a graph layout algorithm. With the existing method for visualizing the social graph, it is hard to understand the meaning of the graph because it is usually dense. To address this weakness of existing social graph visualization, we propose the Friend-Matching, Friend-Rival Matching, and Bubble Heap algorithms in this paper. We also validate that our proposed algorithms can improve the quality of the social graph visualizations produced by the existing method. Finally, our prototype system has been released at http://datalab.kunsan.ac.kr/politiz/un/. Please see whether it is useful from the standpoint of public data utilization.
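
The similarity-then-graph step described above might be sketched as follows. The abstract does not state which similarity measure is used, so the fraction of shared resolutions on which two members cast the same vote is an assumption here, as are the function names, the threshold, and the toy voting records. The Friend-Matching, Friend-Rival Matching, and Bubble Heap algorithms are not reproduced.

```python
from itertools import combinations


def voting_similarity(votes_a, votes_b):
    """Fraction of commonly voted resolutions on which both members voted the same way."""
    shared = [resolution for resolution in votes_a if resolution in votes_b]
    if not shared:
        return 0.0
    agreements = sum(votes_a[r] == votes_b[r] for r in shared)
    return agreements / len(shared)


def build_social_graph(records, threshold):
    """Return {(member_a, member_b): similarity} for pairs at or above the threshold."""
    edges = {}
    for a, b in combinations(sorted(records), 2):
        similarity = voting_similarity(records[a], records[b])
        if similarity >= threshold:
            edges[(a, b)] = similarity
    return edges


# Hypothetical toy records: member -> {resolution: vote}
records = {
    "A": {"r1": "yes", "r2": "no", "r3": "yes"},
    "B": {"r1": "yes", "r2": "no", "r3": "abstain"},
    "C": {"r1": "no", "r2": "yes", "r3": "no"},
}
graph = build_social_graph(records, threshold=0.6)
print(graph)
```

Raising the threshold is one simple way to thin out a dense graph before layout, although it discards information that the proposed matching algorithms presumably handle differently.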

Generating and Controlling an Interlinking Network of Technical Terms to Enhance Data Utilization (데이터 활용률 제고를 위한 기술 용어의 상호 네트워크 생성과 통제)

  • Jeong, Do-Heon
    • Journal of the Korean Society for information Management / v.35 no.1 / pp.157-182 / 2018
  • As data management and processing techniques have developed rapidly in the era of big data, many companies and researchers have become interested in long tail data, which were ignored in the past. This study proposes methods for generating and controlling a network of technical terms based on text mining techniques to enhance data utilization in a long tail distribution. In particular, an edit distance technique from text mining provides an efficient way to automatically create an interlinking network of technical terms in the scholarly field. We also used a linked open data (LOD) system to gather experimental data to improve data utilization, and proposed effective methods for using the data of LOD systems and an algorithm for recognizing patterns of terms. Finally, a performance evaluation of the network of technical terms showed that the proposed methods were useful for enhancing the rate of data utilization.
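
To illustrate how edit distance can link variant spellings of technical terms, here is a small sketch using the standard Levenshtein distance. The abstract does not say which edit distance variant or threshold the study uses, so both are assumptions, and `link_terms` is a hypothetical helper rather than the study's method.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                        # delete char_a
                current[j - 1] + 1,                     # insert char_b
                previous[j - 1] + (char_a != char_b),   # substitute or match
            ))
        previous = current
    return previous[-1]


def link_terms(terms, max_distance):
    """Return pairs of terms whose case-insensitive edit distance is within max_distance."""
    links = []
    for i, first in enumerate(terms):
        for second in terms[i + 1:]:
            if edit_distance(first.lower(), second.lower()) <= max_distance:
                links.append((first, second))
    return links


terms = ["data mining", "datamining", "text mining", "ontology"]
links = link_terms(terms, max_distance=2)
print(links)
```

A small threshold links spelling and spacing variants while keeping distinct terms apart; the pairwise comparison is quadratic in the number of terms, so a real vocabulary would likely need blocking or indexing first.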

Multi-Agent Knowledge Discovery and Problem Solving Framework (다중 에이전트 기반 지식 탐사 및 문제 해결 프레임워크)

  • 강성희;박승수
    • Proceedings of the Korean Information Science Society Conference / 1999.10b / pp.101-103 / 1999
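  • With decentralized information, heterogeneous and independent information about multiple domains exists autonomously, and there is no global view of the whole that takes the relationships among these pieces of information into account, which makes inter-domain mining difficult. Taking an approach of intra-domain knowledge discovery and intra- and inter-domain problem solving, this study proposes a framework that, in a decentralized data environment, generates the knowledge needed for problem solving through data tailoring, which extracts the information required to solve a problem, and goal-oriented data mining over distributed data, and then solves the problem by searching for related information among the generated knowledge. In particular, cooperative problem solving among the generated knowledge is handled using the multi-agent paradigm. The proposed framework extracts knowledge-level information useful for problem solving from scattered data and, based on the generated knowledge, supports not only the individual use of each domain's information but also problem solving through cooperation among domains. It can therefore be regarded as a new form of problem-solving method that makes more efficient use of information from multiple domains that is decentralized in an open distributed environment.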

A Public Open Civil Complaint Data Analysis Model to Improve Spatial Welfare for Residents - A Case Study of Community Welfare Analysis in Gangdong District - (거주민 공간복지 향상을 위한 공공 개방 민원 데이터 분석 모델 - 강동구 공간복지 분석 사례를 중심으로 -)

  • Shin, Dongyoun
    • Journal of KIBIM / v.13 no.3 / pp.39-47 / 2023
  • This study aims to introduce a model for enhancing community well-being through the utilization of public open data. To objectively assess abstract notions of residential satisfaction, text data from complaints is analyzed. By leveraging accessible public data, costs related to data collection are minimized. Initially, relevant text data containing civic complaints is collected and refined by removing extraneous information. This processed data is then combined with meaningful datasets and subjected to topic modeling, a text mining technique. The insights derived are visualized using Geographic Information System (GIS) and Application Programming Interface (API) data. The efficacy of this analytical model was demonstrated in the Godeok/Gangil area. The proposed methodology allows for comprehensive analysis across time, space, and categories. This flexible approach involves incorporating specific public open data as needed, all within the overarching framework.