• Title/Summary/Keyword: Big data collection


Keyword Analysis of Arboretums and Botanical Gardens Using Social Big Data

  • Shin, Hyun-Tak;Kim, Sang-Jun;Sung, Jung-Won
    • Journal of People, Plants, and Environment / v.23 no.2 / pp.233-243 / 2020
  • This study collected social big data from the past 9 years and analyzed the patterns of major keywords related to arboretums and botanical gardens, to serve as basic data for establishing operational strategies for future arboretums and botanical gardens. A total of 6,245,278 records were collected: 4,250,583 from blogs (68.1%), 1,843,677 from online cafes (29.5%), and 151,018 from knowledge search services (2.4%). After refining the data, 1,223,162 valid records were selected for analysis. Keywords were derived with the big data program Textom using text mining analysis, yielding terms such as 'travel', 'picnic', 'children', 'festival', 'experience', 'Garden of Morning Calm', 'program', 'recreation forest', 'healing', and 'museum'. The keyword analysis showed that terms such as 'healing', 'tree', 'experience', 'garden', and 'Garden of Morning Calm' attracted high public interest. A word cloud analysis was conducted on high-frequency keywords extracted from the 6,245,278 social media titles. The results showed that arboretums and botanical gardens are perceived as spaces for relaxation and leisure, with keywords such as 'travel', 'picnic', and 'recreation', and that the public shows strong interest in their educational aspects, with keywords such as 'experience' and 'field trip'. Demand for rest and leisure space, education, and things to see and enjoy at arboretums and botanical gardens has increased compared with the past. Future operational strategies should therefore include differentiation and specialization measures such as plant collection strategies, exhibition planning, and programs.
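
As a rough illustration of the frequency-based keyword extraction such studies rely on, the Python sketch below counts candidate keywords in post titles. It is a minimal stand-in for Textom's text mining step; the sample titles are invented.

```python
from collections import Counter
import re

# Hypothetical sample of collected post titles (the study itself refined
# ~6.2M titles from blogs, cafes, and knowledge search services).
titles = [
    "Family picnic at the arboretum",
    "Healing travel: Garden of Morning Calm",
    "Children's field trip experience program at the botanical garden",
]

# Tokenize, normalize case, and drop very short tokens -- a crude stand-in
# for the refinement step that reduced the raw records to valid ones.
tokens = []
for title in titles:
    tokens += [w.lower() for w in re.findall(r"[A-Za-z]+", title) if len(w) > 2]

# Frequency table of candidate keywords, the input to a word cloud.
for word, count in Counter(tokens).most_common(10):
    print(f"{word}: {count}")
```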

A Case Study of Basic Data Science Education using Public Big Data Collection and Spreadsheets for Teacher Education (교사교육을 위한 공공 빅데이터 수집 및 스프레드시트 활용 기초 데이터과학 교육 사례 연구)

  • Hur, Kyeong
    • Journal of The Korean Association of Information Education / v.25 no.3 / pp.459-469 / 2021
  • In this paper, a case study of basic data science education for in-service and pre-service teachers is presented. For this basic data science education, spreadsheet software was used as the data collection and analysis tool. The training then covered statistics for data processing, predictive hypotheses, and predictive model verification. In addition, an educational case for collecting and processing thousands of public big data records and verifying population prediction hypotheses and prediction models is proposed. A 34-hour, 17-week curriculum using a spreadsheet tool was presented with this basic data science content. As a tool for data collection, processing, and analysis, a spreadsheet, unlike Python, carries no burden of learning programming languages and data structures, and has the advantage of teaching the theory of processing and analyzing qualitative and quantitative data visually. As a result of this case study, three predictive hypothesis test cases were presented and analyzed. First, quantitative public data were collected to verify a hypothesis predicting the difference in mean values between groups of a population. Second, qualitative public data were collected to verify a hypothesis predicting an association within the qualitative data of the population. Third, quantitative public data were collected to verify a regression prediction model under a hypothesis of correlation within the quantitative data of the population. Finally, a satisfaction analysis of pre-service and in-service teachers was used to assess the effectiveness of this educational case in data science education.
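
The three hypothesis tests in this curriculum map onto standard statistics. The sketch below reproduces them in Python with scipy on synthetic data (the course itself used spreadsheet functions): an independent t-test for the group mean difference, a chi-square test for association in qualitative data, and a simple linear regression for the correlation-based prediction model.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# 1) Mean-difference hypothesis between two groups (quantitative data),
#    here via an independent-samples t-test on synthetic scores.
group_a, group_b = rng.normal(50, 5, 100), rng.normal(52, 5, 100)
t, p = stats.ttest_ind(group_a, group_b)
print("mean difference: t =", round(t, 3), "p =", round(p, 4))

# 2) Association hypothesis within qualitative data via chi-square
#    on a 2x2 contingency table of two categorical variables.
table = np.array([[30, 10], [20, 40]])
chi2, p, dof, _ = stats.chi2_contingency(table)
print("association: chi2 =", round(chi2, 3), "p =", round(p, 4))

# 3) Correlation/regression prediction model for quantitative data.
x = rng.uniform(0, 10, 100)
y = 2.0 * x + rng.normal(0, 1, 100)
res = stats.linregress(x, y)
print("regression: slope =", round(res.slope, 3), "R^2 =", round(res.rvalue**2, 3))
```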

Big Data Analysis of the Women Who Score Goal Sports Entertainment Program: Focusing on Text Mining and Semantic Network Analysis

  • Hyun-Myung, Kim;Kyung-Won, Byun
    • International Journal of Internet, Broadcasting and Communication / v.15 no.1 / pp.222-230 / 2023
  • The purpose of this study is to provide basic data on sports entertainment programs by collecting unstructured data generated on Naver and Google about the SBS entertainment program 'Kick a Goal', which began regular broadcast in June 2021, and analyzing public perceptions through data mining, semantic network, and CONCOR analysis. Data were collected with Textom: 27,911 records accumulated over 16 months, from June 16, 2021 to October 15, 2022. From the collected data, 80 key keywords related to 'Kick a Goal' were derived through simple frequency and TF-IDF analysis. Semantic network analysis was then conducted to examine the relationships among these top 80 keywords. Centrality was computed with the UCINET 6.0 program, and the network was visualized with UCINET's NetDraw to show its characteristics and the connections between keywords clearly. CONCOR analysis was conducted to derive clusters of words with similar characteristics based on the semantic network. The analysis yielded a 'Program' cluster related to the broadcast content of 'Kick a Goal', a 'Soccer' cluster for the sport featured in the show, an 'Everyday Life' cluster covering the cast's training and daily life beyond the match scenes, and a 'Broadcast Manipulation' cluster reflecting viewers' disappointment over manipulation of match content.
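
A minimal sketch of the TF-IDF keyword-scoring step, using scikit-learn on invented snippets; the study itself derived its 80 keywords with Textom before moving to UCINET for the network analysis.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical snippets standing in for the 27,911 collected posts.
docs = [
    "Kick a Goal soccer match was exciting",
    "training scenes and everyday life of the cast",
    "viewers disappointed by broadcast manipulation of the match",
]

# TF-IDF weights each term by its frequency in a document, discounted by
# how many documents contain it, so generic words score low.
vec = TfidfVectorizer(stop_words="english")
matrix = vec.fit_transform(docs)

scores = matrix.sum(axis=0).A1          # total TF-IDF weight per term
terms = vec.get_feature_names_out()
for term, score in sorted(zip(terms, scores), key=lambda t: -t[1])[:8]:
    print(f"{term}: {score:.3f}")
```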

A Study on Evaluation of the Analyzing and Collecting Method on Social Big Data Information (소셜 빅데이터 정보 수집 및 분석방법 평가에 대한 연구)

  • Song, Eun-Jee;Kang, Min-Sik
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.05a / pp.853-854 / 2014
  • In the service industry, customer feedback is more necessary than ever for efficient management, in order to grasp customer needs that change moment to moment. Conventional survey-based methods are limited in gathering voluntary, immediate customer opinions, so recently, to obtain immediate and factual feedback on services, customer feedback is being captured by collecting and analyzing posts that customers write actively and voluntarily on social media, without being aware of any survey. This study proposes a method for evaluating the suitability of technologies that analyze such big data on social media. The collection-suitability evaluation establishes a verification plan for the data gathered under preset collection rules and performs a sampling survey; if the target level of accuracy is not reached, the collection rules are reset (for example, by supplementing the collection engine's functions and resetting the collection cycle), the scope of the sample survey is expanded, and the evaluation is repeated, improving collection accuracy through this iterative process.
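
The iterative adequacy-evaluation loop described above can be sketched as follows. Everything here is a simulation under assumed numbers: `recollect` stands in for the collection engine, the 0.95 accuracy target is an arbitrary choice, and "retuning the rules" is reduced to a single precision parameter.

```python
import random

random.seed(0)
TARGET_ACCURACY = 0.95

def recollect(precision):
    """Hypothetical stand-in for the collection engine: higher rule
    precision yields a larger share of genuinely relevant posts."""
    return [{"relevant": random.random() < precision} for _ in range(5000)]

def sample_accuracy(collected, sample_size):
    """Sampling survey: inspect a random sample and return the fraction
    that actually satisfies the collection rules."""
    sample = random.sample(collected, min(sample_size, len(collected)))
    return sum(item["relevant"] for item in sample) / len(sample)

# Iterative adequacy evaluation: if the sampled accuracy misses the
# target, retune the collection rules, expand the sample, and repeat.
precision, sample_size = 0.80, 100
collected = recollect(precision)
while (acc := sample_accuracy(collected, sample_size)) < TARGET_ACCURACY:
    print(f"accuracy {acc:.3f} below target; retuning collection rules")
    precision = min(1.0, precision + 0.05)  # stand-in for rule refinement
    sample_size *= 2                        # expand the sampling range
    collected = recollect(precision)
print(f"final sampled accuracy: {acc:.3f}")
```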


Design of Log Management System based on Document Database for Big Data Management (빅데이터 관리를 위한 문서형 DB 기반 로그관리 시스템 설계)

  • Ryu, Chang-ju;Han, Myeong-ho;Han, Seung-jo
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.11 / pp.2629-2636 / 2015
  • Recently, interest in big data management has increased rapidly in the IT field, and much research is being conducted to solve the problem of processing big data in real time. Storing data in real time over the network requires substantial resources, and introducing an analysis system raises the problem of high cost. The need to redesign such systems for low cost and high efficiency has therefore been growing. In this paper, a document-oriented database, MongoDB, which is well suited to managing big data, is used to design a log management system. Performance evaluation shows that the proposed log management system is more efficient than other methods at log collection and processing, and is robust against data forgery.
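
A minimal pymongo sketch of the document-oriented log storage the paper describes; the connection string, database/collection names, and log fields are all assumptions, not the paper's schema.

```python
from datetime import datetime, timezone
from pymongo import MongoClient, ASCENDING

# Placeholder connection; a real deployment would point at the log cluster.
client = MongoClient("mongodb://localhost:27017")
logs = client["logdb"]["logs"]

# Schemaless documents let heterogeneous log records coexist in one
# collection without a fixed table definition.
logs.insert_one({
    "ts": datetime.now(timezone.utc),
    "level": "ERROR",
    "source": "web-01",
    "message": "connection timeout",
})

# Index the timestamp so range scans stay fast as log volume grows.
logs.create_index([("ts", ASCENDING)])

# Query recent error-level entries from one source, newest first.
for doc in logs.find({"level": "ERROR", "source": "web-01"}).sort("ts", -1).limit(10):
    print(doc["ts"], doc["message"])
```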

Analysis of CSR·CSV·ESG Research Trends - Based on Big Data Analysis - (CSR·CSV·ESG 연구 동향 분석 - 빅데이터 분석을 중심으로 -)

  • Lee, Eun Ji;Moon, Jaeyoung
    • Journal of Korean Society for Quality Management / v.50 no.4 / pp.751-776 / 2022
  • Purpose: The purpose of this paper is to present implications by analyzing research trends on CSR, CSV, and ESG through big data analyses, namely text analysis and visual analysis (comprehensive, by field, and by year), based on data collected from previous studies on CSR, CSV, and ESG. Methods: For data collection, the integrated search of the Academic Research Information Service (www.riss.kr) was queried with 'CSR', 'CSV', and 'ESG' as search terms, and the Korean abstracts and keywords were scraped from the retrieved papers and organized in Excel. In the final step, the 2,847 CSR papers, 395 CSV papers, and 555 ESG papers so derived were analyzed using the Rx64 4.0.2 program and RStudio with text mining, one of the big data analysis techniques, and Word Cloud for visualization. Results: CSR, CSV, and ESG research grew somewhat slowly before 2010 but increased rapidly through 2019. Research was concentrated in the fields of social science, arts and physical education, and engineering. The keywords 'corporate', 'social', and 'responsibility' were the most frequent, with similar results in the word cloud analysis. The frequent keywords and word clouds by field and by year were, overall, similar to the keywords across all years, although some differences appeared between fields. Conclusion: Government and expert support for CSR, CSV, and ESG should be activated, and research on technology-based strategies is needed. Various approaches will be necessary in the future; studies that also consider the environment or energy are expected to offer larger implications.
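
A small Python sketch of the two counting steps behind this kind of trend analysis: overall keyword frequencies (the input to a word cloud) and per-year frequencies. The study itself worked in R on thousands of scraped Korean abstracts; the records below are invented stand-ins.

```python
import re
from collections import Counter, defaultdict

# Invented stand-ins for abstracts scraped from www.riss.kr.
papers = [
    (2019, "corporate social responsibility and firm performance"),
    (2021, "ESG disclosure, corporate governance and social value"),
    (2022, "creating shared value strategies for corporate sustainability"),
]

def tokenize(text):
    # Lowercase word tokens, ignoring very short ones.
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3]

# Overall keyword frequencies -- what a word cloud visualizes.
overall = Counter(w for _, text in papers for w in tokenize(text))
print(overall.most_common(5))

# Per-year frequencies, mirroring the year-based trend analysis.
by_year = defaultdict(Counter)
for year, text in papers:
    by_year[year].update(tokenize(text))
for year in sorted(by_year):
    print(year, by_year[year].most_common(3))
```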

RHadoop platform for K-Means clustering of big data (빅데이터 K-평균 클러스터링을 위한 RHadoop 플랫폼)

  • Shin, Ji Eun;Oh, Yoon Sik;Lim, Dong Hoon
    • Journal of the Korean Data and Information Science Society / v.27 no.3 / pp.609-619 / 2016
  • RHadoop is a collection of R packages that allow users to manage and analyze data with Hadoop. In this paper, we implement the K-means algorithm on the MapReduce framework with RHadoop to make the clustering method applicable to large-scale data. The main idea is to introduce a combiner as a function of the map output, decreasing the amount of data that reducers must process. We show that our K-means algorithm using RHadoop with a combiner is faster than the regular algorithm without a combiner as the data set grows. We also implement the Elbow method with MapReduce for finding the optimal number of clusters for K-means clustering on large data sets. On small data, our MapReduce implementation of the Elbow method and the classical kmeans() in R gave similar results.
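
A plain-Python sketch of the combiner idea: each input split is collapsed to per-cluster (sum, count) pairs before the reduce step, so reducers merge a handful of partial sums rather than every raw point. This simulates the MapReduce dataflow in a single process and is not the paper's RHadoop code.

```python
import numpy as np

def map_assign(points, centroids):
    """Map step: emit (cluster_id, point) for each point's nearest centroid."""
    for p in points:
        yield int(np.argmin(((centroids - p) ** 2).sum(axis=1))), p

def combine(mapped):
    """Combiner: collapse one split's output into per-cluster (sum, count)
    pairs -- the trick that shrinks the data shuffled to reducers."""
    partial = {}
    for k, p in mapped:
        s, n = partial.get(k, (0.0, 0))
        partial[k] = (s + p, n + 1)
    return partial

def reduce_update(partials, centroids):
    """Reduce step: merge the partial sums and recompute each centroid."""
    totals = {}
    for partial in partials:
        for k, (s, n) in partial.items():
            ts, tn = totals.get(k, (0.0, 0))
            totals[k] = (ts + s, tn + n)
    return np.array([totals[k][0] / totals[k][1] if k in totals else centroids[k]
                     for k in range(len(centroids))])

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])
splits = np.array_split(data, 4)                 # input splits across mappers
centroids = data[rng.choice(len(data), 2, replace=False)]
for _ in range(10):                              # driver iterates MapReduce rounds
    partials = [combine(map_assign(s, centroids)) for s in splits]
    centroids = reduce_update(partials, centroids)
print(centroids)                                 # should sit near (0,0) and (5,5)
```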

Design and Implementation of a Web Crawler System for Collection of Structured and Unstructured Data (정형 및 비정형 데이터 수집을 위한 웹 크롤러 시스템 설계 및 구현)

  • Bae, Seong Won;Lee, Hyun Dong;Cho, DaeSoo
    • Journal of Korea Multimedia Society / v.21 no.2 / pp.199-209 / 2018
  • Recently, services provided to consumers, such as low-priced shopping, customized advertisement, and product recommendation, increasingly rely on big data. With the growing importance of big data, the web crawler that collects data from the web has also become important. However, existing web crawlers have two problems. First, if a URL is hidden behind a link, it cannot be reached by URL alone. Second, they inefficiently fetch more data than the user wants. In this paper, therefore, we use Casper.js, which can control the DOM in a headless browser, to generate DOM events that reach URLs hidden behind links. We also propose an intelligent web crawler system that lets users fine-tune, step by step, both structured and unstructured data so that only the desired data are collected. Finally, we show the superiority of the proposed crawler system through a performance evaluation against an existing web crawler.
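
A rough Python/Selenium analogue of the approach (the paper used Casper.js): fire DOM events on elements whose target URLs are hidden behind JavaScript handlers, then harvest the URLs the browser lands on. The start URL and CSS selectors are placeholders.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")       # headless browser, as in the paper
driver = webdriver.Chrome(options=options)

driver.get("https://example.com/list")       # placeholder start page
found = set()
count = len(driver.find_elements(By.CSS_SELECTOR, "a.more, button.detail"))
for i in range(count):
    # Re-locate each time: clicking navigates away and stales old handles.
    target = driver.find_elements(By.CSS_SELECTOR, "a.more, button.detail")[i]
    target.click()                           # fire the DOM event (JS handler)
    found.add(driver.current_url)            # URL that was hidden from the link
    driver.back()                            # return to the listing page
driver.quit()
print(found)
```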

De-identification Policy Comparison and Activation Plan for Big Data Industry (비식별화 정책 비교 및 빅데이터 산업 활성화 방안)

  • Lee, So-Jin;Jin, Chae-Eun;Jeon, Min-Ji;Lee, Jo-Eun;Kim, Su-Jeong;Lee, Sang-Hyun
    • The Journal of the Convergence on Culture Technology / v.2 no.4 / pp.71-76 / 2016
  • In this study, the de-identification policies of the US, the UK, Japan, China, and Korea are compared to suggest a future direction for de-identification regulations and a method for vitalizing the big data industry. Efficient use of de-identification technology and of standards for adequacy evaluation allows industry to use personal information to develop services and technology without violating privacy rights or the restrictions specified in the Personal Information Protection Act. A countervailing risk is re-identification: individuals may be re-identified from a collection of de-identified data. From the business perspective, it is necessary to relax some regulations to ease the use of big data; from the information security perspective, it is necessary to strengthen security and refine the regulations.
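
Two of the ideas in play, masking quasi-identifiers and checking k-anonymity, can be sketched briefly; the records and the choice of quasi-identifiers below are invented for illustration.

```python
from collections import Counter

# Invented records; name is a direct identifier, age and zip are
# quasi-identifiers that could enable re-identification by joining
# the data with outside sources.
records = [
    {"name": "Kim",  "age": 34, "zip": "04524", "diagnosis": "flu"},
    {"name": "Lee",  "age": 36, "zip": "04528", "diagnosis": "cold"},
    {"name": "Park", "age": 35, "zip": "04521", "diagnosis": "flu"},
]

def de_identify(rec):
    return {
        "age_band": f"{rec['age'] // 10 * 10}s",   # generalize age to a decade
        "zip": rec["zip"][:3] + "**",              # mask the zip code tail
        "diagnosis": rec["diagnosis"],             # keep the analytic payload
    }                                              # the name is dropped entirely

def is_k_anonymous(table, k):
    """k-anonymity check: every quasi-identifier combination must be
    shared by at least k records."""
    groups = Counter((r["age_band"], r["zip"]) for r in table)
    return all(count >= k for count in groups.values())

masked = [de_identify(r) for r in records]
print(masked)
print("2-anonymous:", is_k_anonymous(masked, 2))
```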

The Study of Patient Prediction Models on Flu, Pneumonia and HFMD Using Big Data (빅데이터를 이용한 독감, 폐렴 및 수족구 환자수 예측 모델 연구)

  • Yu, Jong-Pil;Lee, Byung-Uk;Lee, Cha-min;Lee, Ji-Eun;Kim, Min-sung;Hwang, Jae-won
    • The Journal of Bigdata / v.3 no.1 / pp.55-62 / 2018
  • In this study, we developed models for predicting the numbers of patients with flu, pneumonia, and hand-foot-and-mouth disease (HFMD) using big data, an approach so far pursued mainly overseas. The government's existing patient-count system relies on procedures that collect the actual numbers and percentages of patients from several large hospitals. The prediction model in this study, by contrast, was developed by combining real-time collection of disease-related search words with various climate data provided in real time, and patient numbers were predicted with machine learning algorithms. The advantage of this model is that if an epidemic spreads rapidly, the propagation rate can be grasped in real time. We also used various types of data to compensate for the failures of Google Flu Trends.
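
A hedged sketch of the modeling setup: regress weekly patient counts on search-word volume and climate features. The data are synthetic and the model is a plain linear regression, simpler than whatever machine-learning method the authors actually applied.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)

# Invented weekly features standing in for the paper's inputs: volume of
# disease-related search words plus real-time climate variables.
weeks = 120
search_volume = rng.poisson(200, weeks)          # e.g. mentions of "flu"
temperature = rng.normal(10, 8, weeks)
humidity = rng.uniform(30, 80, weeks)
X = np.column_stack([search_volume, temperature, humidity])

# Synthetic target: caseloads rise with search chatter, fall with warmth.
y = 0.5 * search_volume - 3.0 * temperature + rng.normal(0, 10, weeks) + 100

# Fit on past weeks, evaluate on held-out weeks -- the nowcasting idea of
# tracking an epidemic's spread from signals available in real time.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
print("R^2 on held-out weeks:", round(model.score(X_te, y_te), 3))
```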