• Title/Summary/Keyword: 비정형 빅데이터 (unstructured big data)

Search Results: 238

A Comparative Analysis of Cognitive Change about Big Data Using Social Media Data Analysis (소셜 미디어 데이터 분석을 활용한 빅데이터에 대한 인식 변화 비교 분석)

  • Yun, Youdong;Jo, Jaechoon;Hur, Yuna;Lim, Heuiseok
    • KIPS Transactions on Software and Data Engineering / v.6 no.7 / pp.371-378 / 2017
  • Recently, with the spread of smart devices and the introduction of web services, data is increasing rapidly online and is being used in various fields. In particular, the emergence of social media has led to a rapid increase in the amount of unstructured data. Interest in big data technology has grown across many fields as a way to extract meaningful information from such unstructured data, and big data is becoming a key resource in many areas. Although the outlook for big data is positive, concerns about data breaches and privacy are constantly being raised. Positive and negative views of big data coexist, yet research analyzing people's opinions on the subject is currently lacking. In this study, we used text mining to compare changes in people's perception of big data, based on unstructured data collected from social media. As a result, we observed yearly keywords for big data in Korea, declining positive opinions, and increasing negative opinions. Based on these results, we could predict the direction of big data in Korea.
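The abstract does not detail the text-mining pipeline. The Python below is a minimal sketch of one simple approach. It assumes posts have already been collected and reduced to whitespace-separated tokens. For each year, it extracts the most frequent keywords and the share of positive versus negative lexicon hits. The lexicons `POSITIVE` and `NEGATIVE` and the function `yearly_profile` are hypothetical illustrations, not the authors' method.

```python
from collections import Counter, defaultdict

# Hypothetical polarity lexicon for illustration only; a real study of Korean
# posts would need a morphological analyzer and a curated sentiment dictionary.
POSITIVE = {"opportunity", "innovation", "growth"}
NEGATIVE = {"leak", "privacy", "surveillance"}


def yearly_profile(posts, top_n=3):
    """Summarize posts per year.

    posts: iterable of (year, text) pairs.
    Returns {year: (top_n most frequent tokens, positive share, negative share)},
    where the shares are computed over lexicon hits only.
    """
    counts = defaultdict(Counter)
    polarity = defaultdict(Counter)
    for year, text in posts:
        tokens = text.lower().split()  # naive whitespace tokenization
        counts[year].update(tokens)
        for tok in tokens:
            if tok in POSITIVE:
                polarity[year]["pos"] += 1
            elif tok in NEGATIVE:
                polarity[year]["neg"] += 1

    profile = {}
    for year in sorted(counts):
        pos, neg = polarity[year]["pos"], polarity[year]["neg"]
        hits = pos + neg
        profile[year] = (
            [word for word, _ in counts[year].most_common(top_n)],
            pos / hits if hits else 0.0,
            neg / hits if hits else 0.0,
        )
    return profile
```

Comparing the positive and negative shares year over year is one way to surface a trend like the declining-positive, rising-negative pattern reported above.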

A Study on Unstructured text data Post-processing Methodology using Stopword Thesaurus (불용어 시소러스를 이용한 비정형 텍스트 데이터 후처리 방법론에 관한 연구)

  • Won-Jo Lee
    • The Journal of the Convergence on Culture Technology / v.9 no.6 / pp.935-940 / 2023
  • Most text data collected through web scraping for artificial intelligence and big data analysis is large and unstructured, so it must be cleansed before big data analysis. It becomes analyzable structured data through a heuristic pre-processing refinement step followed by a machine post-processing refinement step. In this study, during the machine post-processing step, a Korean dictionary and a stopword dictionary are used to extract the vocabulary for the frequency analysis that underlies word cloud analysis. In this process, "user-defined stopwords" are used to efficiently remove stopwords that the dictionary failed to remove. We propose a methodology for applying this "thesaurus". The existing "stopword dictionary" method has problems, and the proposed "user-defined stopword thesaurus" technique is designed to complement it. Through a case analysis using R's word cloud technique, we examine the pros and cons of the proposed refining method. We present a comparative verification and suggest that the proposed methodology is effective in practical application.
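The authors work in R, and the abstract does not give the format of their thesaurus. As a minimal sketch under stated assumptions, the Python below applies a general stopword dictionary and then a hypothetical user-defined thesaurus to a token list. In the thesaurus, an empty replacement means "drop this term", and any other value normalizes the term to a canonical form. The resulting counts are what a word cloud would be drawn from.

```python
from collections import Counter


def refine(tokens, stopwords, thesaurus):
    """Machine post-processing step before word-cloud frequency analysis.

    stopwords: general stopword dictionary (terms always dropped).
    thesaurus: hypothetical user-defined stopword thesaurus mapping a term to
        its replacement; an empty replacement "" removes the term (a stopword
        the dictionary missed), any other value normalizes it (e.g. a variant
        spelling to one canonical form).
    Returns word frequencies for the refined vocabulary.
    """
    refined = []
    for tok in tokens:
        if tok in stopwords:
            continue
        tok = thesaurus.get(tok, tok)
        if tok:  # "" means the thesaurus marked it as a stopword
            refined.append(tok)
    return Counter(refined)
```

Folding both removal and normalization into one mapping keeps all of the analyst's corrections in a single editable table.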

Flood monitoring and prediction using online unstructured data (비정형데이터를 활용한 홍수 모니터링 및 예측)

  • Lee, Jeong Ha;Hwang, Seok Hwan
    • Proceedings of the Korea Water Resources Association Conference / 2019.05a / pp.118-118 / 2019
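  • Flood forecasting currently relies on structured data such as discharge and water level. However, the flood risk that people actually perceive differs from official forecast issuance, and casualties sometimes occur in areas where no flood forecast was issued. This happens frequently in small streams where water levels are not measured and in urban areas with high population mobility. To compensate, unstructured data that reflects people's perceptions and population mobility should be used. In particular, as more people use social network services (SNS), these services provide data beyond the structured sensor data used so far. In addition, posts written by individuals are available in real time, which makes them useful for capturing population mobility and spatio-temporal information; they are therefore highly valuable unstructured data. In this study, we extracted and analyzed SNS data and compared its patterns with rainfall events that occurred in 2018 to assess its usefulness for flood forecasting. We wrote a web crawler program that extracts spatio-temporal information centered on flood-related keywords and used it to collect data. Comparing the collected data with actual flood events showed that the volume of data for each region followed a pattern similar to rainfall and water level. If data are collected and analyzed in real time with sufficient lead time, this approach could be used for flood prediction. This research was carried out under the Korea Institute of Civil Engineering and Building Technology (KICT) seed project 19주요-대4-시드, 'Development of Technology for Predicting the Occurrence and Scale of Water-Related Disasters (水難) through Pattern Analysis of Community Big Data' (20190126-001).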

Suggestions on how to convert official documents to Machine Readable (공문서의 기계가독형(Machine Readable) 전환 방법 제언)

  • Yim, Jin Hee
    • The Korean Journal of Archival Studies / no.67 / pp.99-138 / 2021
  • In the era of big data, analyzing not only structured but also unstructured data is becoming an important task. Official documents produced by government agencies are large volumes of text-based unstructured data, so they are also subject to big data analysis. From the perspectives of internal work efficiency, knowledge management, and records management, official documents need to be analyzed as big data to derive useful implications. However, many of the official documents held by public institutions are not in an open format, so big data analysis requires a pre-processing step that extracts text from the bitstream. In addition, the document files do not store sufficient contextual metadata, so high-quality analysis requires separate efforts to secure metadata. In conclusion, current official documents have a low level of machine readability, which makes big data analysis expensive.
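For contrast with the non-open formats the abstract mentions, an open format makes text extraction cheap. An Office Open XML `.docx` file, for example, is a ZIP container whose `word/document.xml` holds the text in `w:p` (paragraph) and `w:t` (text run) elements. The Python sketch below uses only the standard library. The choice of `.docx` is an illustrative assumption, since the abstract names no specific format. Non-open formats would instead need format-specific parsers, and this sketch ignores headers, tables' layout, and metadata.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx files.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(data: bytes) -> list[str]:
    """Extract non-empty paragraph text from a .docx file given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{W_NS}p"):
        # A paragraph's text may be split across several runs (w:t elements).
        text = "".join(t.text or "" for t in p.iter(f"{W_NS}t"))
        if text:
            paragraphs.append(text)
    return paragraphs
```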

Study of Trust Bigdata Platform (신뢰성 빅데이터 플렛폼의 연구)

  • Kim, Jeong-Joon;Kwak, Kwang-Jin;Lee, Don-Hee;Lee, Yong-Soo
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.16 no.6 / pp.225-230 / 2016
  • Recently, the development of networks and the Internet has produced large amounts of data on the Web, and Big Data technology has emerged to process it. Big Data technologies have been studied with the aim of multifaceted and accurate analysis that combines existing structured data with a variety of other data, such as social data. However, social data lacks expertise and objectivity, and concerns have been raised about the manipulation, concealment, and distortion of information. Therefore, this paper proposes a trust big data platform and describes it in detail. The proposed platform consists of a data refiner, a data analyzer, a co-truster, a visualizer, a searcher, and other components.
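The abstract names the platform's components but not their internals. Everything in the Python sketch below is hypothetical and only shows how a refiner, a trust-scoring stage, and a searcher might be chained. In particular, the `co_trust` rule is not the paper's co-truster: it combines a made-up per-source weight with a bonus for each additional source reporting the same text. The analyzer and visualizer are omitted.

```python
from dataclasses import dataclass


@dataclass
class Item:
    source: str
    text: str
    trust: float = 0.0


# Hypothetical reliability weights per source type (illustrative only).
SOURCE_WEIGHT = {"official": 1.0, "news": 0.7, "sns": 0.3}


def refine_items(items):
    """Data refiner: drop empty texts and normalize whitespace."""
    return [Item(i.source, " ".join(i.text.split())) for i in items if i.text.strip()]


def co_trust(items, threshold=0.4):
    """Hypothetical co-truster: raise trust when several sources report the same text."""
    support = {}
    for i in items:
        support.setdefault(i.text, set()).add(i.source)
    kept = []
    for i in items:
        corroboration = len(support[i.text]) - 1  # other sources with the same text
        i.trust = min(1.0, SOURCE_WEIGHT.get(i.source, 0.0) + 0.2 * corroboration)
        if i.trust >= threshold:
            kept.append(i)
    return kept


def search(items, query):
    """Searcher: trusted items containing the query, most trusted first."""
    return sorted((i for i in items if query in i.text), key=lambda i: i.trust, reverse=True)
```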

A Study on Construction of Crime Prevention System using Big Data in Korea (한국에서 빅데이터를 활용한 범죄예방시스템 구축을 위한 연구)

  • Kim, SungJun
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.17 no.5 / pp.217-221 / 2017
  • Proactive prevention is important for crime. In the past, the focus has been on responding to crimes after the fact and punishing offenders. With Big Data technology, however, crime can be prevented proactively, because big data can predict the behavior of criminals or potential criminals. This article discusses how to build a big data system for crime prevention. Specifically, it describes how to combine the unstructured data in big data with basic structured data and, on that basis, designs a crime prevention system. Using fingerprints as an example, this study is expected to show the potential of big data for crime prevention and to support future crime prevention programs and research.
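The core idea of combining unstructured data with structured records can be sketched in Python by joining per-district incident counts with keyword mentions mined from free text. The district names, `RISK_WORDS`, and the additive ranking rule are all hypothetical simplifications, not the system the article designs.

```python
from collections import Counter

# Hypothetical risk keywords; not taken from the article.
RISK_WORDS = ("theft", "assault", "dark alley")


def combine(incidents, texts, districts):
    """Join structured incident counts with keyword mentions mined from free text.

    incidents: {district: reported incident count}  (structured data)
    texts: list of (district, free text)             (unstructured data)
    Returns rows (district, incidents, risk_mentions), highest combined total first.
    """
    mentions = Counter()
    for district, text in texts:
        lowered = text.lower()
        mentions[district] += sum(lowered.count(w) for w in RISK_WORDS)
    rows = [(d, incidents.get(d, 0), mentions[d]) for d in districts]
    # Simple additive score for illustration; a real system would model this properly.
    return sorted(rows, key=lambda r: r[1] + r[2], reverse=True)
```

The text signal can raise a district with few reported incidents above one with more, which is the kind of complementary information the combination is meant to add.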

Prediction of Onion Purchase Using Structured and Unstructured Big Data (정형 및 비정형 빅데이터를 이용한 양파 소비 예측)

  • Rah, HyungChul;Oh, Eunhwa;Yoo, Do-il;Cho, Wan-Sup;Nasridinov, Aziz;Park, Sungho;Cho, Youngbeen;Yoo, Kwan-Hee
    • The Journal of the Korea Contents Association / v.18 no.11 / pp.30-37 / 2018
  • Onion prices in 2018 were expected to plunge due to overproduction. There was therefore a need to analyze how social media and broadcast programs had affected onion purchases during a previous similar event, and to identify in advance factors that could promote onion consumption. We collected onion-related social media data and broadcast data, together with agri-food consumer panel data. We then examined whether the amount spent on onion purchases in 2014, the most recent year in which onion prices plunged, was correlated with the frequencies of onion-related keywords in social media and broadcast programs. We found the following: a) broadcast news programs mentioning the word "onion" were correlated with onion purchases 3 to 6 weeks later; b) broadcast entertainment programs mentioning the words "onion and health" were correlated with onion purchases 11 weeks later; and c) blogs mentioning the words "onion and efficacy" were correlated with onion purchases 5 weeks later. Our study provides a case of how big data collection and analysis can be used in agriculture to analyze the effects of social media and broadcast programs on consumer purchase behavior. We propose that the findings be applied to promote onion consumption.

Design of Streaming based Unstructured-Data Collecting Framework in IoT Environment (IoT 환경에서 스트리밍 기반의 비정형 데이터 수집 프레임워크 설계)

  • Lee, Hoo-Young;Park, Koo-Rack;Kim, Dong-Hyun
    • Proceedings of the Korean Society of Computer Information Conference / 2017.01a / pp.57-58 / 2017
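  • Devices in an Internet of Things (IoT) environment continuously generate data every second, such as system logs, temperature, humidity, illuminance, and location information. Most of this data is discarded within the device, and even when it is collected, it is used only partially, for system improvement. In this paper, we propose the design of a framework that streams the unstructured data generated by each IoT device to a collection server and loads it into a NoSQL database with a flexible schema. Through big data analysis, the log and sensing data collected from many devices in this way can be used to improve productivity in industrial settings. For public purposes, it can also help relieve urban traffic problems and predict disasters.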
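The key point of the design is that heterogeneous device messages are stored without a fixed schema. The Python sketch below simulates a stream of JSON messages and stores each message as-is in an in-memory, schemaless document collection. `DocumentStore` is a hypothetical stand-in for a real NoSQL database, and the message fields are assumptions. The network transport between device and server is not modeled.

```python
import json
from collections import defaultdict


def device_stream(lines):
    """Simulated stream: each line is one JSON message from a device."""
    for line in lines:
        yield json.loads(line)


class DocumentStore:
    """Hypothetical in-memory stand-in for a schemaless NoSQL collection."""

    def __init__(self):
        self.collections = defaultdict(list)

    def insert(self, collection, doc):
        self.collections[collection].append(doc)

    def find(self, collection, **criteria):
        return [d for d in self.collections[collection]
                if all(d.get(k) == v for k, v in criteria.items())]


def collect(stream, store):
    """Consume messages as they arrive and store each one unchanged."""
    count = 0
    for msg in stream:
        # No fixed schema: each device may send different fields.
        store.insert(msg.get("type", "unknown"), msg)
        count += 1
    return count
```

Because each message is stored unchanged, a new sensor field needs no schema migration, which is the flexibility the abstract attributes to NoSQL.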

A Study on the Use of Stopword Corpus for Cleansing Unstructured Text Data (비정형 텍스트 데이터 정제를 위한 불용어 코퍼스의 활용에 관한 연구)

  • Lee, Won-Jo
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.891-897 / 2022
  • In big data analysis, raw text data mostly exists in various unstructured forms. It becomes analyzable structured data only after heuristic pre-processing and computer post-processing cleansing. In this study, we apply the word cloud function of R, one of several text data analysis techniques. Unnecessary elements of the collected raw data are first removed in pre-processing, and stopwords are then removed in post-processing. We conducted a case study of word cloud analysis, which counts how often each word occurs and presents high-frequency words as key issues. The existing stopword processing method in R's word cloud technique nests the stopwords in the source code, which causes problems. To improve on this "nested stopword source code" method, we propose using a "general stopword corpus" and a "user-defined stopword corpus" and conduct a case analysis. We comparatively verify and present the advantages and disadvantages of the proposed "unstructured data cleansing process model". We also show the practical application of word cloud visualization analysis using the proposed "external corpus cleansing technique".
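The difference between nested stopwords and external corpora is where the list lives. In the first approach it is hard-coded in the script; in the second it is read from editable files. The Python sketch below assumes a hypothetical file format of one stopword per line, with blank lines and lines starting with `#` ignored. It merges any number of corpora (for example, a general one and a user-defined one) before counting word frequencies. It mirrors the idea, not the authors' R code.

```python
from collections import Counter
from pathlib import Path


def parse_corpus(text):
    """Parse a stopword corpus: one word per line; blank and '#' lines are ignored."""
    words = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            words.add(line)
    return words


def load_corpus(path):
    """Read an external stopword corpus file (UTF-8)."""
    return parse_corpus(Path(path).read_text(encoding="utf-8"))


def word_frequencies(tokens, *stopword_sets):
    """Count tokens after removing words found in any of the given corpora."""
    stopwords = set().union(*stopword_sets)
    return Counter(t for t in tokens if t not in stopwords)
```

With corpus files on disk (the file names here are hypothetical), the call would look like `word_frequencies(tokens, load_corpus("general_stopwords.txt"), load_corpus("user_stopwords.txt"))`. Analysts can then revise the stopword lists without editing the analysis script.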

Methodology of Local Government Policy Issues Through Big Data Analysis (빅데이터 분석을 통한 지방자치단체 정책이슈 도출 방법론)

  • Kim, Yong-Jin;Kim, Do-Young
    • The Journal of the Korea Contents Association / v.18 no.10 / pp.229-235 / 2018
  • The use of big data is becoming increasingly important for efficient and effective policymaking. The purpose of this study is to propose a method that uses big data analysis to find the policy issues of local governments. To this end, we analyzed 180,000 articles about Suwon city from the past three years, identified policy issues, and evaluated policy priorities through IPA (Importance-Performance Analysis). The results showed that analyzing semi-structured big data from newspaper articles is effective in deriving policy issues specific to each local autonomous body, as distinct from the main national issues. The proposed methodology therefore allows local governments to identify policy issues and understand their residents effectively. It is also expected to apply to deriving policy issues from other semi-structured and unstructured big data, such as the local government's online civil complaint data and residents' SNS posts.
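IPA places each item on an importance-by-performance grid and reads priorities from the quadrant it falls in. The Python sketch below uses the common convention of splitting each axis at its mean, with the standard quadrant labels. The abstract does not say how importance and performance were measured for Suwon's issues, so the scores in the example are made-up inputs, and `ipa_quadrants` is a hypothetical helper.

```python
from statistics import mean


def ipa_quadrants(scores):
    """Classify issues into IPA quadrants.

    scores: {issue: (importance, performance)}.
    Each axis is split at its mean (a common IPA convention).
    """
    imp_cut = mean(imp for imp, _ in scores.values())
    perf_cut = mean(perf for _, perf in scores.values())
    labels = {}
    for issue, (imp, perf) in scores.items():
        if imp >= imp_cut and perf < perf_cut:
            labels[issue] = "Concentrate here"       # important but underperforming
        elif imp >= imp_cut:
            labels[issue] = "Keep up the good work"  # important and performing well
        elif perf < perf_cut:
            labels[issue] = "Low priority"
        else:
            labels[issue] = "Possible overkill"      # performing well on a minor issue
    return labels
```

Issues in the "Concentrate here" quadrant would be the top policy priorities under this reading.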