• Title/Abstract/Keyword: big data mining

Search Result 679, Processing Time 0.032 seconds

An Insight Study on Keyword of IoT Utilizing Big Data Analysis (빅데이터 분석을 활용한 사물인터넷 키워드에 관한 조망)

  • Nam, Soo-Tai;Kim, Do-Goan;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.146-147 / 2017
  • Big data analysis is a technique for effectively analyzing unstructured data, such as web documents, e-mail, and social data generated on the Internet, social network services, and mobile environments, as well as well-formed structured data in a database. Most big data analysis techniques, including data mining, machine learning, natural language processing, and pattern recognition, were already in use in statistics and computer science. Global research institutes have identified big data analysis as the most noteworthy new technology since 2011, and companies in most industries are therefore working to create new value by applying big data. In this study, we used Social Metrics, a big data analysis tool from Daum Communications, to analyze public perceptions of the keyword "Internet of things" over one month as of October 8, 2017. The results are as follows. First, the top related search keyword for "Internet of things" was technology (995). This study suggests theoretical implications based on the results.

Study of the Activation Plan for Rural Tourism of the Jeollabuk-do Using Big Data Analysis (빅데이터 분석을 통한 농촌관광 실태와 활성화 방안 연구: 전라북도를 중심으로)

  • Park, Ro Un;Lee, Ki Hoon
    • The Korean Journal of Community Living Science / v.27 no.spc / pp.665-679 / 2016
  • This study examined the main factors for revitalizing rural tourism in Jeollabuk-do using big data analysis. Tourism big data was gathered from public open data sources and social network services (SNS), and three analysis tools were used: opinion mining, text mining, and social network analysis (SNA). Opinion mining and text mining identified the key local content of the 14 areas of Jeollabuk-do and customers' evaluations of rural tourism. Social network analysis detected the relationships among the content items and determined their importance. The results showed that each location in Jeollabuk-do had its own content that attracted visitors, and that the number of content items affected the number of tourists. In addition, visitor numbers tended to be large when a location's tourism content was strongly correlated with other content. Hence, strong connections among content items are key to revitalizing rural tourism. Social network analysis divided the content into several clusters and derived the eigenvector centrality of each content node, which indicates the node's importance in the network. Tourism was active when nodes with high eigenvector centrality were distributed evenly across the clusters; the opposite held when such nodes were concentrated in a few clusters. This study suggests an action plan for expanding rural tourism: develop valuable content and connect the content clusters properly.
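
The eigenvector centrality mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the graph, the node names, and the function `eigenvector_centrality` are hypothetical, and the scores are computed by plain power iteration on an undirected, unweighted content network.

```python
def eigenvector_centrality(adj, max_iter=1000, tol=1e-10):
    """Approximate eigenvector centrality by power iteration.

    adj maps each node to the set of its neighbours (undirected graph).
    """
    if not adj:
        return {}
    nodes = list(adj)
    score = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        # Multiply by (A + I): the identity shift avoids oscillation on
        # bipartite graphs and leaves the eigenvectors of A unchanged.
        new = {n: score[n] + sum(score[m] for m in adj[n]) for n in nodes}
        norm = sum(v * v for v in new.values()) ** 0.5
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score


# Hypothetical tourism-content network (names are illustrative only).
graph = {
    "festival": {"hanok", "bibimbap", "trail"},
    "hanok": {"festival", "bibimbap"},
    "bibimbap": {"festival", "hanok"},
    "trail": {"festival"},
}
centrality = eigenvector_centrality(graph)
for node, value in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {value:.3f}")
```

In this toy network the best-connected node receives the highest score, and a node linked only to one other node receives the lowest.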

A Development on a Predictive Model for Buying Unemployment Insurance Program Based on Public Data (공공데이터 기반 고용보험 가입 예측 모델 개발 연구)

  • Cho, Minsu;Kim, Dohyeon;Song, Minseok;Kim, Kwangyong;Jeong, Chungsik;Kim, Kidae
    • The Journal of Bigdata / v.2 no.2 / pp.17-31 / 2017
  • With the development of the big data environment, public institutions have also been providing big data infrastructure. Public data is a typical example, and numerous applications using public data have been provided. One such case relates to employment insurance. All employers must take out employment insurance for all employees to protect their rights. However, there are many cases where employers avoid buying insurance. Addressing this problem requires a data-driven approach, but methodologies for integrating, managing, and analyzing public data are lacking. In this paper, we propose a methodology for building a predictive model that identifies whether employers have taken out employment insurance, based on public data. The methodology covers the collection, integration, pre-processing, and analysis of data and the generation of prediction models based on process mining and data mining techniques. We also verify the methodology with case studies.

A Study on Continuous Monitoring Reinforcement for Sales Audit Using Process Mining Under Big Data Environment (빅데이터 환경에서 프로세스 마이닝을 이용한 영업감사 상시 모니터링 강화에 대한 연구)

  • Yoo, Young-Seok;Park, Han-Gyu;Back, Seung-Hoon;Hong, Sung-Chan
    • Journal of Internet Computing and Services / v.17 no.6 / pp.123-131 / 2016
  • Process mining in a big data environment utilizes the large volume of data generated by business processes. It yields knowledge and insights about the implementation and improvement of a process through the event log of the company's enterprise resource planning (ERP) system. In recent years, research on applying the strengths of process mining to corporate audit work has been actively pursued. However, domestic studies on sales auditing systems that apply process mining in a big data environment are insufficient. Therefore, we propose process-mining methods that can be optimally applied to online and traditional auditing systems. In addition, we propose a continuous monitoring information system that can detect and prevent risk early in a big data environment by monitoring risk factors in enterprise organizations. The scope of this paper is to design a pre-verification system for risk factors using practical examples from sales auditing. Furthermore, the proposed sales auditing system enhances preventive auditing, continuous monitoring of high risks, fraud reduction, and timely action on rule violations. The simulation results show avoidance of financial risks, a shorter audit period, and improved audit quality.

iSSD-Based Collaborative Processing for Big Data Mining (효율적인 빅 데이터 마이닝을 위한 iSSD 기반 협업 처리 방안)

  • Jo, Yong-Yoen;Kim, Sang-Wook;Bae, Duck-Ho
    • The Journal of Korean Institute of Communications and Information Sciences / v.42 no.2 / pp.460-470 / 2017
  • We address how to handle big data mining effectively using the intelligent SSD (iSSD). The iSSD is a storage device with computing power built into the SSD, which reduces data transfer cost by processing data close to where it is stored. We first introduce the structural characteristics of the iSSD for efficient data processing. Then, we present how to run data mining algorithms on the iSSD. Finally, we discuss how to significantly improve the performance of data mining algorithms by exploiting a heterogeneous computing environment in which the host CPUs and GPU coexist.

Analysis of 'Better Class' Characteristics and Patterns from College Lecture Evaluation by Longitudinal Big Data

  • Nam, Min-Woo;Cho, Eun-Soon
    • International Journal of Contents / v.15 no.3 / pp.7-12 / 2019
  • The purpose of this study was to analyze the characteristics and patterns of a 'better class' by applying longitudinal text mining big data analysis to subjective lecture evaluation comments. First, this study classified the top 30% of classes to deduce characteristics and patterns from subjective text data in five-year intervals over 10 years. A total of 47,177 courses (100%) from the spring semester of 2005 to the fall semester of 2014 at a university in a metropolitan city in the central region of South Korea were analyzed. For 2005-2009, the study extracted meaningful words, in order of frequency, such as good, course, professor, appreciation, lecture, interesting, useful, know, easy, improvement, progress, teaching material, passion, and concern. For 2010-2014, the words were class, appreciation, professor, good, course, interesting, understanding, useful, help, student, effort, thinking, not difficult, explanation, lecture, hard, pleasant, easy, study, examination, like, various, fun, and knowledge. This study suggests that the characteristics and patterns of a 'better class' at college should be analyzed by academic discipline, such as liberal arts, fine arts, social science, engineering, and math and science.

Understanding the Food Hygiene of Cruise through the Big Data Analytics using the Web Crawling and Text Mining

  • Shuting, Tao;Kang, Byongnam;Kim, Hak-Seon
    • Culinary science and hospitality research / v.24 no.2 / pp.34-43 / 2018
  • The objective of this study was to gain a general, text-based understanding of awareness and perceptions of cruise food hygiene through big data analytics. To this end, data were collected by searching for the keywords "food hygiene, cruise" on web pages and news on Google from October 1, 2015 to October 1, 2017 (two years). The data were collected and processed with SCTM, a data collection and processing program, yielding 899 KB of text, approximately 20,000 words. For the data analysis, UCINET 6.0, packaged with the visualization tool NetDraw, was used. Words such as jobs and news showed high frequency, while the rankings by centrality (Freeman's degree centrality and eigenvector centrality) and proximity differed from the ranking by frequency. The CONCOR analysis produced four segments: "food hygiene group", "person group", "location related group", and "brand group". This big data diagnosis of food hygiene in the cruise industry is expected to provide useful implications for both academic research and practical application.

Text Mining and Visualization of Unstructured Data Using Big Data Analytical Tool R (빅데이터 분석 도구 R을 이용한 비정형 데이터 텍스트 마이닝과 시각화)

  • Nam, Soo-Tai;Shin, Seong-Yoon;Jin, Chan-Yong
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.9 / pp.1199-1205 / 2021
  • In the era of big data, it is very important to effectively analyze not only structured data well organized in databases but also unstructured big data, such as web documents, e-mails, and social data generated in real time on the Internet, social network services, and mobile environments. Big data analysis is the process of creating new value by discovering meaningful new correlations, patterns, and trends in big data held in data storage. We summarize and visualize the results of a frequency analysis of unstructured article data using the R language, a big data analysis tool. The data analyzed in this study were a total of 104 papers from Mon-May 2021 among the journals of the Korea Institute of Information and Communication Engineering. In the final results, the most frequently mentioned keyword was "Data", which ranked first with 1,538 occurrences. Based on the results of the analysis, the limitations of the study and theoretical implications are suggested.
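
The study itself used R. As a rough illustration of keyword frequency analysis, the following Python sketch counts word occurrences across a set of texts. The sample texts, the stop-word list, and the function `keyword_frequencies` are assumptions made for this example, not data or code from the paper.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not from the paper).
STOPWORDS = {"the", "of", "and", "a", "in", "is", "to", "for"}


def keyword_frequencies(docs, top_n=10):
    """Return the top_n most frequent non-stop-word tokens across docs."""
    counts = Counter()
    for doc in docs:
        # Lowercase and keep alphabetic runs only.
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


# Hypothetical article snippets.
articles = [
    "Big data analysis of data",
    "Data mining and text mining",
]
top_keywords = keyword_frequencies(articles, top_n=2)
print(top_keywords)
```

The resulting (keyword, count) pairs could then be passed to a plotting or word-cloud tool for visualization.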

Analysis of Real Estate Market Trend Using Text Mining and Big Data (빅데이터와 텍스트마이닝을 이용한 부동산시장 동향분석)

  • Chun, Hae-Jung
    • Journal of Digital Convergence / v.17 no.4 / pp.49-55 / 2019
  • This study examines real estate market trends using text mining and big data. The data were collected from internet news posted on Naver from August 2016 to August 2017. In the TF-IDF analysis, the highest-ranked words were, in order, housing, sale, household, real estate market, and region. Many policy-related words, such as loan, government, countermeasures, and regulations, were extracted, and among region-related words, Seoul appeared most frequently. Among combinations of region-related words, 'Seoul - Gangnam', 'Seoul - Metropolitan area', 'Gangnam - reconstruction', and 'Seoul - reconstruction' appeared frequently. This suggests that public interest in and expectations for the reconstruction of the Gangnam area are high.
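
The abstract does not say which TF-IDF variant was used. The sketch below assumes one common form, term frequency normalized by document length multiplied by ln(N / document frequency), and applies it to hypothetical tokenized news items. The function name `tf_idf` and the sample tokens are illustrative only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF scores for each tokenized document.

    tf  = term count / document length
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in docs:
        term_counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return scores


# Hypothetical tokenized news items.
news = [
    ["housing", "loan", "regulation"],
    ["housing", "gangnam", "reconstruction"],
    ["housing", "sale", "loan"],
]
scores = tf_idf(news)
print(scores[1])
```

Under this variant, a term that appears in every document scores zero, so ranking by TF-IDF favors terms that are frequent in a few documents rather than common everywhere.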

A Study on the Characteristics of Amekaji Fashion Trends Using Big Data Text Mining Analysis (빅데이터 텍스트 마이닝 분석을 활용한 아메카지 패션 트렌드 특징 고찰)

  • Kim, Gihyung
    • Journal of Fashion Business / v.26 no.3 / pp.138-154 / 2022
  • The purpose of this study is to identify the characteristics of domestic American casual fashion trends using big data text mining analysis. A total of 108,524 posts and 2,038,999 extracted keywords related to American casual fashion from Naver and Daum over the past 5 years were collected and refined with the Textom program, and frequency analysis, word cloud, N-gram, centrality analysis, and CONCOR analysis were performed. In the frequency analysis, 'vintage', 'style', 'daily look', 'coordination', 'workwear', and 'men's wear' appeared as the main keywords. The most common nationality among the representative brands was Japanese, followed by American, Korean, and others. The CONCOR analysis derived four clusters: "general American casual trend", "vintage taste", "direct sales mania", and "American styling". The results showed that Japanese American casual clothes are influenced by American casual clothes, and that American casual fashion in Korea, which has been reinterpreted, is completed with varied coordination and creative styles such as workwear, street, military, and classic, with a focus on items and brands. Looks were worn and shared on social networks, and the study also confirmed the existence of an active consumer group and market potential for obtaining genuine products, ranging from second-hand transactions for limited-edition vintage items to individual transactions. The significance of this study is that it presents the characteristics of American casual fashion trends academically, based on online text data actually used by the public, through whom the trend has spread.
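
The study performed its N-gram analysis with the Textom program. As a rough illustration of the idea only, the following Python sketch counts adjacent word pairs (bigrams) in tokenized posts. The sample posts and the function `bigram_counts` are hypothetical.

```python
from collections import Counter


def bigram_counts(tokenized_posts):
    """Count adjacent token pairs (bigrams) across all posts."""
    counts = Counter()
    for tokens in tokenized_posts:
        # zip pairs each token with its successor within the same post.
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical tokenized posts.
posts = [
    ["vintage", "workwear", "style"],
    ["vintage", "workwear", "daily", "look"],
]
bigrams = bigram_counts(posts)
print(bigrams.most_common(1))
```

Pairs are counted only within a post, so the last word of one post is never paired with the first word of the next.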