• Title/Summary/Keyword: Bigdata analysis


Bigdata Analysis on Keyword by Generations through Text Mining: Focused on Board of Nate Pann in 10s, 20s, 30s (텍스트 마이닝을 활용한 세대별 키워드 빅데이터 분석: 네이트판 10대·20대·30대 게시판을 중심으로)

  • Jeong, Baek;Bae, Sungwon;Hwangbo, Yujeong
    • Proceedings of the Korean Society of Computer Information Conference / 2022.07a / pp.513-516 / 2022
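  • This paper uses text mining techniques to derive keywords for understanding the MZ generation. As the share of the MZ generation grows, many studies have set out to analyze it. To understand the MZ generation, this study collected big data by crawling the age-specific boards of Nate Pann. Text mining techniques were then used to derive keywords for each of the 10s, 20s, and 30s age groups. The keywords derived in this paper can be regarded as important for understanding the MZ generation. As future work, we plan additional crawling to carry out a comparative study between the MZ generation and older generations.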

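The per-generation keyword derivation described in the abstract above can be illustrated with a minimal term-frequency sketch. This is not the authors' pipeline: the posts, the stopword list, and the `top_keywords` helper are all hypothetical, and real Korean text would need a morphological analyzer instead of the simple regular-expression tokenizer assumed here.

```python
from collections import Counter
import re

# Hypothetical toy posts; the crawled Nate Pann data is not reproduced here.
POSTS = {
    "10s": ["exam stress and school friends", "school exam tomorrow"],
    "20s": ["job interview and dating", "first job salary"],
    "30s": ["marriage and housing loan", "housing prices and marriage"],
}

# Assumed stopword list, for illustration only.
STOPWORDS = {"and", "the", "a", "of"}


def top_keywords(posts, n=2):
    """Return the n most frequent non-stopword tokens across a list of posts."""
    counts = Counter()
    for text in posts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]


# One keyword list per age-group board.
keywords_by_group = {group: top_keywords(posts) for group, posts in POSTS.items()}

for group, keywords in keywords_by_group.items():
    print(group, keywords)
```

Raw frequency is only one possible scoring choice; a weighting such as TF-IDF would favor words that are distinctive to one age group over words common to all of them.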

A Comparative Analysis of Success Factors Between Social Commerce and Multichannel Distribution Using Text Mining Techniques (텍스트마이닝 기법을 이용한 소셜커머스와 멀티채널 유통업체 간 성공요인 비교 연구)

  • Choi, Hyun-Seung;Kim, Ye-Sol;Cho, Hyuk-Jun;Kang, Ju-Young
    • The Journal of Bigdata / v.1 no.2 / pp.35-44 / 2016
  • Social commerce and multichannel distribution compete fiercely in Korea today, so a comparative analysis of their success factors is needed. Unlike earlier studies that relied only on surveys, this study analyzes the success factors of social commerce and multichannel distribution using text mining techniques. We expect the results to offer practical implications for retailers' competitive strategies and to contribute to further research.


Post-Examination Analysis on the Student Dropout Prediction Index (학생 중도탈락 예측지수에 관한 사후검증 연구)

  • Lee, Ji-Eun
    • The Journal of Bigdata / v.4 no.2 / pp.175-183 / 2019
  • Dropout is one of the main challenges facing cyber universities. About 130,000 students are enrolled in cyber universities, but the dropout rate is very high. To lower it, cyber universities invest heavily in learning analytics. Some analyze each student's likelihood of dropping out and actively support those at higher risk. The purpose of this paper is to identify the learning data that affect the dropout prediction index. The analysis confirms that the number of lessons (progress), credits, achievement, and leave of absence have a significant effect on the dropout rate. The accuracy of the prediction model needs to be improved through post-examination of the student dropout prediction index.


Comparing Customer Reactions Before and After of a Smart Watch Release through Opinion Mining (오피니언 마이닝을 통한 스마트 워치 출시 전후 소비자 반응 분석)

  • Lee, Jongho;Park, Heejun
    • The Journal of Bigdata / v.1 no.1 / pp.1-7 / 2016
  • Social media such as Twitter have become popular with the diffusion of the internet, and big data analysis has become possible thanks to radical improvements in computing power. This research concerns the smart watch, which is attracting attention as a post-smartphone technology. Among the various smart watches, it focuses on the recently released Samsung Galaxy Gear S2. The main purpose is to analyze actual customer Twitter data produced before and after the smart watch's release. Through this analysis, the research provides practical marketing strategy guidelines, and its analysis framework can serve as a research framework for other areas and products.


A dimensional reduction method in cluster analysis for multidimensional data: principal component analysis and factor analysis comparison (다차원 데이터의 군집분석을 위한 차원축소 방법: 주성분분석 및 요인분석 비교)

  • Hong, Jun-Ho;Oh, Min-Ji;Cho, Yong-Been;Lee, Kyung-Hee;Cho, Wan-Sup
    • The Journal of Bigdata / v.5 no.2 / pp.135-143 / 2020
  • This paper proposes a pre-processing method and a dimension reduction method for shopping cart analysis, where many variables are correlated, when classifying consumer types in agri-food consumer panel data. Cluster analysis is widely used to divide observations into several clusters in multivariate data. However, when several variables are related, cluster analysis after dimension reduction may be more effective. In this paper, food consumption survey data from 1,987 households were clustered using the K-means method, and 17 variables were re-selected for the clustering. Principal component analysis and factor analysis were compared as remedies for multicollinearity and as ways to reduce dimensions for clustering. Both methods reduced the dataset to two dimensions. Principal component analysis divided the dataset into three clusters, but the characteristics of the clusters were not clearly differentiated. Under factor analysis, by contrast, the consumption patterns of the clusters were well distinguished.

The Factors Affecting Promotion Effects: SNS Analysis for Franchise Food Service Industry (프로모션 효과에 영향을 미치는 요인: 프랜차이즈 외식 산업의 SNS 버즈 분석을 중심으로)

  • Jeong, Min-Seo;Lee, Cheol-Jin;Yoon, Ji-Hee;Jung, Yoonhyuk
    • The Journal of Bigdata / v.2 no.2 / pp.57-66 / 2017
  • Companies have been investing enormous resources in promotion as the market keeps changing rapidly, so there is a growing need to measure the impact of promotions on revenue growth. To investigate the effect of promotion in the franchise food service industry, this study empirically analyzed text data from Twitter, one of the dominant social network services. Our findings show that the gap between promotions, promotion duration, and season significantly influence the volume of Twitter buzz, which represents the promotion effect in our study. We then analyzed why those factors were related to the promotion effect. Finally, we suggested promotion strategies for each influential factor, depending on the type of business in the food service industry.


The Analysis of HPAI Using CDR Data (CDR 자료를 이용한 고병원성 조류인플루엔자 분석)

  • Choi, Dae-Woo;Joo, Jae-Yun;Song, Yu-Han;Han, Ye-Ji
    • The Journal of Bigdata / v.4 no.2 / pp.13-22 / 2019
  • This study was funded by the government (Ministry of Agriculture, Food and Rural Affairs) in 2018 with support from the Agricultural, Food, and Rural Affairs Agency (318069-03-HD040), and is based on artificial intelligence-based analysis and patterning of HPAI spread. Highly pathogenic avian influenza (HPAI) is introduced by migratory birds from abroad, but the exact pathways by which the infection reaches farms are not known. Transmission between farms from the initially affected farms is only assumed to be caused mainly by vehicles, and the main cause of the spread is likewise not exactly known. Using call detail record (CDR) data provided by KT, the study examines how people who visit migratory bird-watching sites, the presumed sites of the outbreak, flow through infected farms.


Comparison of Missing Imputation Methods in Fine Dust Data (미세먼지 자료에서의 결측치 대체 방법 비교)

  • Kim, YeonJin;Park, HeonJin
    • The Journal of Bigdata / v.4 no.2 / pp.105-114 / 2019
  • Replacing missing values is a major issue in data analysis. Ignoring missing values and proceeding with the analysis can introduce bias and produce incorrect estimates. This paper seeks an appropriate replacement method for missing values in weather data. To that end, we ran simulations for various situations and compared existing R-based methods such as MICE and MissForest with time series-based models. Comparing the results for each variable, we found that the Kalman filter with the auto ARIMA model from the imputeTS package and the MissForest model gave good results on the weather data.

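The kind of simulation-based comparison described in the abstract above can be sketched by hiding known values, imputing them, and scoring each method on the hidden positions. The sketch below is a stand-in only: it uses mean imputation and linear interpolation rather than MICE, MissForest, or the Kalman filter, and the readings, the masked positions, and all function names are hypothetical.

```python
import math


def mean_impute(series):
    """Replace each None with the mean of the observed values."""
    observed = [x for x in series if x is not None]
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in series]


def linear_impute(series):
    """Replace each None by linear interpolation between the nearest observed
    neighbours; gaps at either end take the nearest observed value."""
    out = list(series)
    known = [i for i, x in enumerate(series) if x is not None]
    for i, x in enumerate(series):
        if x is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            out[i] = series[left] + weight * (series[right] - series[left])
    return out


def rmse(truth, imputed, masked):
    """Root mean squared error over the artificially masked positions only."""
    return math.sqrt(sum((truth[i] - imputed[i]) ** 2 for i in masked) / len(masked))


# Hypothetical readings with two values hidden to simulate missingness.
truth = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
masked = [2, 4]
observed = [None if i in masked else v for i, v in enumerate(truth)]

scores = {
    "mean": rmse(truth, mean_impute(observed), masked),
    "linear": rmse(truth, linear_impute(observed), masked),
}
print(scores)
```

Because the toy series is perfectly linear, interpolation recovers the hidden values exactly here; on real fine dust or weather data the ranking would depend on the missingness pattern and on each variable's behaviour.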

A Model of Predictive Movie 10 Million Spectators through Big Data Analysis (빅데이터 분석을 통한 천만 관객 영화 예측 모델)

  • Yu, Jong-Pil;Lee, Eung-hwan
    • The Journal of Bigdata / v.3 no.1 / pp.63-71 / 2018
  • We analyzed which factors influenced the Korean films that surpassed 10 million viewers over the last five years (2013~2017) in the Korean movie industry, where the total number of moviegoers exceeds 200 million. In general, many people consider the number of screens and ratings to be important factors in a film's success with audiences. In this study, we set up hypotheses on four additional factors, including the number of screens and ratings, and examined their correlation with reaching 10 million spectators through big data analysis. The results were significant, with 91 percent accuracy in predicting 10 million viewers and 99.4 percent accuracy in estimating cumulative attendance.

Comparative Analysis of Prediction Performance of Aperiodic Time Series Data using LSTM and Bi-LSTM (LSTM과 Bi-LSTM을 사용한 비주기성 시계열 데이터 예측 성능 비교 분석)

  • Lee, Ju-Hyung;Hong, Jun-Ki
    • The Journal of Bigdata / v.7 no.2 / pp.217-224 / 2022
  • Since online shopping has become common, people can easily buy fashion goods anytime, anywhere. Consumers therefore respond quickly to environmental variables such as weather and sales prices, and using big data for efficient inventory management has become very important in the fashion industry. In this paper, changes in the sales volume of fashion goods due to changes in temperature are analyzed with the proposed big data analysis algorithm, using actual big data from Korean fashion company 'A'. The simulation results confirm that Bidirectional LSTM (Bi-LSTM) takes more than 50% longer to simulate than LSTM (Long Short-Term Memory), but the prediction accuracy on non-periodic time series data such as clothing product sales data is the same.