• Title/Summary/Keyword: Learning Analytics


A Study on Application of Machine Learning Algorithms to Visitor Marketing in Sports Stadium (기계학습 알고리즘을 사용한 스포츠 경기장 방문객 마케팅 적용 방안)

  • Park, So-Hyun; Ihm, Sun-Young; Park, Young-Ho
    • Journal of Digital Contents Society / v.19 no.1 / pp.27-33 / 2018
  • In this study, we analyze big data on visitors to a sports stadium and investigate how to provide customized marketing services to consumers. To this end, we derive groups of similar visitors using the K-means clustering method, and we use the K-nearest neighbors method to predict the stores of interest for new visitors. Experimental results show that combining these two algorithms makes it possible to provide marketing services suited to each group's attributes and to recommend products and events to new visitors.
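As a rough illustration of the two-step approach this abstract describes (not the authors' actual pipeline; the visitor features, store labels, and data below are hypothetical), the following Python sketch clusters visitors with K-means and then predicts a store of interest for a new visitor with K-nearest neighbors using scikit-learn:

```python
# A minimal sketch of the paper's two-step idea using scikit-learn.
# The features (age, visits, spend) and store labels are hypothetical.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical visitor features: [age, visits per season, average spend]
visitors = np.array([
    [23, 12, 30.0], [31, 3, 80.0], [45, 8, 55.0],
    [19, 15, 25.0], [52, 2, 90.0], [28, 10, 40.0],
])

# Step 1: derive groups of similar visitors with K-means.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
groups = kmeans.fit_predict(visitors)
print("visitor groups:", groups)

# Hypothetical stores of interest observed for the existing visitors.
stores = ["snack_bar", "team_shop", "team_shop",
          "snack_bar", "team_shop", "snack_bar"]

# Step 2: predict the store of interest for a new visitor with KNN.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(visitors, stores)
new_visitor = np.array([[25, 11, 35.0]])
print("recommended store:", knn.predict(new_visitor)[0])
```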

Design of Anomaly Detection System Based on Big Data in Internet of Things (빅데이터 기반의 IoT 이상 장애 탐지 시스템 설계)

  • Na, Sung Il; Kim, Hyoung Joong
    • Journal of Digital Contents Society / v.19 no.2 / pp.377-383 / 2018
  • The Internet of Things (IoT) produces a wide variety of data as smart environments spread. Collected IoT data serve as important evidence for judging a system's status, so it is important to monitor sensors in real time and detect anomalous data. However, because IoT data come in many structures and protocols, they must first be converted into a normalized data structure for anomaly detection; doing so improves the quality of both the analysis and the resulting services. In this paper, we propose a big data based anomaly detection system for collected sensor data. The proposed system both detects anomalies and maintains data quality. In addition, we apply a support vector machine model for anomaly detection on time-series data. As a result, machine learning on the preprocessed data was able to detect and predict anomalies accurately.
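To make the time-series anomaly detection step concrete, here is a minimal sketch (not the authors' system; the sensor signal, window length, and thresholds are invented) that normalizes readings into fixed-length windows and applies a one-class support vector machine from scikit-learn:

```python
# A minimal sketch of SVM-based anomaly detection on time-series data.
# The sensor readings and window length are hypothetical.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Hypothetical normal sensor signal with one injected anomaly.
signal = np.sin(np.linspace(0, 20, 200)) + rng.normal(0, 0.05, 200)
signal[150:155] += 3.0  # anomalous spike

# Normalize and slice the series into fixed-length windows,
# mirroring the paper's preprocessing into a uniform structure.
window = 10
windows = np.array([signal[i:i + window] for i in range(len(signal) - window)])
windows = StandardScaler().fit_transform(windows)

# Train on the early (assumed normal) part, then flag anomalies.
model = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale")
model.fit(windows[:100])
flags = model.predict(windows)  # -1 = anomaly, +1 = normal
print("anomalous windows:", np.where(flags == -1)[0])
```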

Big Data Analytics for Countermeasure System Against GPS Jamming (빅데이터 분석을 활용한 GPS 전파교란 대응방안)

  • Choi, Young-Dong; Han, Kyeong-Seok
    • Journal of Advanced Navigation Technology / v.23 no.4 / pp.296-301 / 2019
  • Artificial intelligence is closely linked to our daily lives and is driving innovation in various fields. In particular, autonomous unmanned vehicles, a form of transportation equipped with artificial intelligence, are being actively researched and are expected to enter practical use soon. Autonomous cars and autonomous unmanned aerial vehicles require an accurate navigation system to determine their present position and move to their destination. At present, the navigation of most vehicles in operation depends on GPS. However, GPS is vulnerable to external interference. In fact, since 2010 North Korea has jammed GPS several times, causing serious disruptions to mobile communications and aircraft operations. Therefore, to ensure the safe operation of autonomous unmanned vehicles and to prevent serious accidents caused by such interference, rapid situation assessment and countermeasures are required. In this paper, based on big data and machine learning technology, we propose a countermeasure system against GPS interference that supports decision making by applying John Boyd's OODA loop (observe - orient - decide - act).
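The OODA loop itself can be sketched as a simple control cycle. The toy Python loop below is purely illustrative (the signal metric, threshold, and countermeasure actions are hypothetical, not the paper's system); it shows how a jamming response might cycle through observe, orient, decide, and act:

```python
# A toy sketch of an OODA-style decision cycle for GPS-jamming response.
# Signal values, the threshold, and the actions are all hypothetical.
from dataclasses import dataclass

@dataclass
class Observation:
    carrier_to_noise: float  # hypothetical GPS signal-quality metric

def observe(sample: float) -> Observation:
    return Observation(carrier_to_noise=sample)

def orient(obs: Observation) -> str:
    # Orient: interpret the observation against an assumed normal range.
    return "jammed" if obs.carrier_to_noise < 30.0 else "normal"

def decide(state: str) -> str:
    # Decide: choose a countermeasure for the assessed state.
    return "switch_to_inertial_navigation" if state == "jammed" else "continue"

def act(action: str) -> None:
    print("action:", action)

# One pass through the loop per incoming measurement.
for sample in [45.2, 44.8, 22.1, 21.5, 46.0]:
    act(decide(orient(observe(sample))))
```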

Stock News Dataset Quality Assessment by Evaluating the Data Distribution and the Sentiment Prediction

  • Alasmari, Eman; Hamdy, Mohamed; Alyoubi, Khaled H.; Alotaibi, Fahd Saleh
    • International Journal of Computer Science & Network Security / v.22 no.2 / pp.1-8 / 2022
  • This work provides a reliable, labeled stock dataset merged with Saudi stock news. The dataset allows researchers to analyze and better understand the realities, impacts, and relationships between stock news and stock fluctuations. The data were collected from the Saudi stock market via the Corporate News (CN) and Historical Data Stocks (HDS) datasets. As their names suggest, CN contains news, and HDS records how stock values change over time. Both datasets cover the period from 2011 to 2019, have 30,098 rows, and have 16 variables, four of which they share and 12 of which differ. The combined dataset presented here therefore includes 30,098 published news pieces and information about stock fluctuations across nine years. Because stock news polarity is interpreted in various ways by native Arabic speakers in the stock domain, the polarity was categorized manually based on Arabic semantics. As the Saudi stock market contributes substantially to the international economy, this dataset is valuable for stock investors and analysts. It was prepared for educational and scientific purposes, motivated by the scarcity of data describing the impact of Saudi stock news on stock activity, and should be useful across many areas, including stock market analytics, data mining, statistics, machine learning, and deep learning. The data are evaluated by examining the distribution of the polarity classes and by measuring sentiment prediction accuracy. The results show that the distribution of polarity over sectors is balanced. A Naive Bayes (NB) model was developed to evaluate data quality via sentiment classification, supporting the data's reliability by achieving 68% accuracy. The evaluation results thus indicate that the dataset is reliable, ready for use, and of high quality.
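For readers unfamiliar with this style of quality check, here is a minimal sketch of evaluating a labeled news dataset with a Naive Bayes sentiment classifier, as the abstract describes. The headlines and labels below are invented English stand-ins for the paper's Arabic news, and the pipeline is a generic scikit-learn one rather than the authors' exact setup:

```python
# A minimal sketch of dataset-quality evaluation via Naive Bayes
# sentiment classification. The headlines and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

headlines = [
    "company reports record quarterly profit",
    "regulator fines firm over disclosure failures",
    "board approves generous dividend increase",
    "shares plunge after earnings miss",
    "new contract boosts revenue outlook",
    "credit rating downgraded on debt concerns",
]
labels = ["positive", "negative", "positive",
          "negative", "positive", "negative"]

X = TfidfVectorizer().fit_transform(headlines)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.33, random_state=0, stratify=labels)

# Higher held-out accuracy suggests the labels are consistent enough
# for the dataset to be usable, which is the paper's evaluation idea.
model = MultinomialNB().fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))
```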

Genetic Programming-based Manufacturing Big Data Analytics (유전 프로그래밍을 활용한 제조 빅데이터 분석 방법 연구)

  • Oh, Sanghoun; Ahn, Chang Wook
    • Smart Media Journal / v.9 no.3 / pp.31-40 / 2020
  • Currently, black-box machine learning algorithms are used to analyze big data in manufacturing. Such algorithms have the advantage of high analytical consistency but the disadvantage that their results are difficult to interpret. In the manufacturing industry, however, it is important to verify the basis of the results and the validity of the analysis against manufacturing process principles. To overcome this limitation in explanatory power, we propose a manufacturing big data analysis method based on genetic programming. Genetic programming is a well-known evolutionary algorithm that searches for an optimal solution by repeatedly applying operators such as selection, crossover, and mutation, which mimic biological evolution. Each candidate solution is expressed as a relationship between variables using mathematical symbols, and the solution with the highest explanatory power is finally selected. Because the result is a formula relating input and output variables, the manufacturing mechanism can be interpreted intuitively, and manufacturing principles that a black-box model cannot reveal can be derived from the variable relationships expressed in the formula. In comparative experiments, the proposed technique showed performance equal or superior to that of a typical machine learning algorithm, verifying its potential for use in various manufacturing fields.
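The following sketch shows symbolic regression by genetic programming in the spirit of this abstract. It uses the gplearn library, which is an assumption on our part (the paper does not name its implementation), and the process data are synthetic:

```python
# A minimal symbolic-regression sketch using the gplearn library
# (an assumption; the authors' implementation is not specified).
# The "process" data here are synthetic.
import numpy as np
from gplearn.genetic import SymbolicRegressor

rng = np.random.default_rng(0)
# Synthetic process inputs and a known target relationship,
# y = x0**2 + 2*x1, which GP should rediscover as a readable formula.
X = rng.uniform(-1, 1, (200, 2))
y = X[:, 0] ** 2 + 2 * X[:, 1]

gp = SymbolicRegressor(
    population_size=500,
    generations=20,
    function_set=("add", "sub", "mul"),
    parsimony_coefficient=0.01,  # penalize overly long formulas
    random_state=0,
)
gp.fit(X, y)

# Unlike a black-box model, the result is an explicit formula over
# the input variables, which can be inspected directly.
print(gp._program)
```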

Experiencing with Splunk, a Platform for Analyzing Machine Data, for Improving Recruitment Support Services in WorldJob+ (머신 데이터 분석용 플랫폼 스플렁크를 이용한 취업지원 서비스 개선에 관한 연구 : 월드잡플러스 사례를 중심으로)

  • Lee, Jae Deug; Rhee, MoonKi Kyle; Kim, Mi Ryang
    • Journal of Digital Convergence / v.16 no.3 / pp.201-210 / 2018
  • WorldJob+, operated by the Human Resources Development Service of Korea, provides recruitment support services to overseas companies wanting to hire talented Korean applicants and interns, and supports young job-seekers through the entire course from checking information on overseas opportunities to enrollment, interviews, and learning. More than 300,000 young people have registered with WorldJob+, an integrated overseas job-information network, for job placement. To innovate WorldJob+'s services for young job-seekers, Splunk, a platform for analyzing machine data, was introduced to collate and view system log files collected from its website. Leveraging Splunk's built-in data visualization and analytical features, WorldJob+ has built custom tools to gain insight into the operation of its recruitment support system and to improve its integrity. Use cases include descriptive and predictive analytics for matching services, which pair employers and job seekers based on their respective needs and profiles and connect job seekers with the best recruiters and employers on the market, helping them secure suitable jobs quickly. This paper covers the numerous ways WorldJob+ has leveraged Splunk to improve its recruitment support services.
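For readers unfamiliar with querying Splunk programmatically, the following generic sketch uses the official splunk-sdk Python package to run a search over web access logs. The host, credentials, index name, and field names are placeholders, not WorldJob+'s actual configuration or queries:

```python
# A generic sketch of pulling web-access statistics out of Splunk
# with the official splunk-sdk package. Host, credentials, index,
# and fields are placeholders, not WorldJob+'s configuration.
import splunklib.client as client
import splunklib.results as results

service = client.connect(
    host="localhost", port=8089,
    username="admin", password="changeme")

# SPL search: count page views per URI over the last 24 hours.
query = ('search index=web_access earliest=-24h '
         '| stats count by uri_path | sort -count')
stream = service.jobs.oneshot(query)

for row in results.ResultsReader(stream):
    if isinstance(row, dict):  # skip informational messages
        print(row.get("uri_path"), row.get("count"))
```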

Performance Optimization Strategies for Fully Utilizing Apache Spark (아파치 스파크 활용 극대화를 위한 성능 최적화 기법)

  • Myung, Rohyoung; Yu, Heonchang; Choi, Sukyong
    • KIPS Transactions on Computer and Communication Systems / v.7 no.1 / pp.9-18 / 2018
  • Enhancing the performance of big data analytics in distributed environments has become an important issue because most big data applications, such as machine learning workloads and streaming services, run on distributed computing frameworks. Accordingly, optimizing the performance of such applications on Spark has been actively researched. Optimization in a distributed environment is challenging because it requires not only optimizing the applications themselves but also tuning the distributed system's configuration parameters. Although prior research has made great efforts to improve execution performance, most studies focused on only one of three optimization aspects: application design, system tuning, or hardware utilization, and therefore could not orchestrate all of them together. In this paper, we analyze and model Spark's application processing procedure in depth. Based on this analysis, we propose performance optimization schemes for each step of the procedure: the inner stage and the outer stage. We also propose an appropriate partitioning mechanism by analyzing the relationship between partitioning parallelism and application performance. We applied these three optimization schemes to WordCount, PageRank, and K-means, which are basic big data analytics workloads, and observed nearly 50% performance improvement when all of the schemes were applied.
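As a small illustration of the tuning dimensions discussed in this abstract (not the paper's actual schemes; the parameter values and file path are hypothetical), the following PySpark snippet sets system configuration parameters and explicitly controls partitioning parallelism for a WordCount job:

```python
# A small PySpark illustration of two tuning dimensions: system
# configuration parameters and partitioning parallelism. The
# parameter values and file path are hypothetical examples.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("wordcount-tuning-sketch")
         .config("spark.executor.memory", "2g")         # system tuning
         .config("spark.sql.shuffle.partitions", "64")  # shuffle parallelism
         .getOrCreate())
sc = spark.sparkContext

lines = sc.textFile("hdfs:///data/corpus.txt")  # placeholder path

counts = (lines.flatMap(lambda line: line.split())
          .map(lambda word: (word, 1))
          # Choosing the number of partitions for the shuffle stage is
          # the kind of parallelism decision the paper analyzes.
          .reduceByKey(lambda a, b: a + b, numPartitions=64))

for word, n in counts.take(10):
    print(word, n)
spark.stop()
```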

Mapping Categories of Heterogeneous Sources Using Text Analytics (텍스트 분석을 통한 이종 매체 카테고리 다중 매핑 방법론)

  • Kim, Dasom; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.193-215 / 2016
  • In recent years, the proliferation of diverse social networking services has led users to use many mediums simultaneously, depending on their individual purposes and tastes. While collecting information on a particular theme, users typically employ various mediums such as social networking services, Internet news, and blogs. However, each document circulated through these mediums is placed in different categories according to each source's policies and standards, hindering research on a specific category across different kinds of sources. For example, documents about "applying for foreign travel" can be classified into "Information Technology," "Travel," or "Life and Culture" according to the particular standard of each source. Likewise, with different definitions and levels of specificity, similar categories can be named and structured differently from source to source. To overcome these limitations, this study proposes a method for mapping categories between sources across various mediums while preserving each medium's existing category system. Specifically, by re-classifying individual documents from the viewpoint of other sources and storing the results as extra attributes, the study proposes a logical layer through which users can search for documents from multiple heterogeneous sources with different category names as if they belonged to the same source. In experiments on 6,000 news articles collected from two Internet news portals, classification accuracy was compared across sources, between supervised and semi-supervised learning, and between homogeneous and heterogeneous learning data. Interestingly, in some categories the accuracy of semi-supervised learning with heterogeneous learning data proved higher than that of supervised or semi-supervised learning with homogeneous learning data. This study is significant in two respects. First, it proposes a logical scheme for integrating and managing heterogeneous mediums with different classification systems while keeping the existing physical classification systems intact; the very different classification accuracies observed across heterogeneous learning data should spur further studies on improving the methodology through category-level analysis. Second, with the growing demand to search, collect, and analyze documents from diverse mediums, Internet search is no longer restricted to one medium, yet because each medium has a different category structure and naming, searching a specific category across heterogeneous mediums is very difficult; the proposed methodology addresses this by letting users query all documents according to the category standards of a chosen site, while preserving each site's characteristics and structure. The methodology needs further work in the following respects. First, since only an indirect comparison and evaluation of its performance was made, future studies need to test its accuracy more directly: after re-classifying documents of a target source according to the category system of an existing source, the accuracy of the classification should be verified through evaluation by actual users. In addition, classification accuracy should be increased by refining the methodology. Finally, the characteristics of the categories in which heterogeneous semi-supervised learning outperformed supervised learning deserve further investigation, as they may help in obtaining heterogeneous documents from diverse mediums and in improving document classification accuracy.
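The semi-supervised re-classification step can be sketched briefly. The example below uses scikit-learn's self-training wrapper to label documents from a second "source" using a few labeled documents from the first; the documents, the two categories (0 = travel, 1 = IT), and the pipeline are invented illustrations, not the paper's method:

```python
# A minimal sketch of semi-supervised cross-source classification:
# a few labeled documents from one source plus unlabeled documents
# from another, via scikit-learn's self-training wrapper. All
# documents and categories here are invented examples.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

docs = [
    "visa application tips for a foreign trip",   # labeled: travel (0)
    "new smartphone app update released",         # labeled: IT (1)
    "cheap flights and hotel booking advice",     # labeled: travel (0)
    "cloud service outage hits developers",       # labeled: IT (1)
    "packing checklist for overseas travel",      # unlabeled
    "open source framework gains popularity",     # unlabeled
]
# -1 marks unlabeled documents drawn from the heterogeneous source.
labels = np.array([0, 1, 0, 1, -1, -1])

X = TfidfVectorizer().fit_transform(docs)
clf = SelfTrainingClassifier(LogisticRegression(), threshold=0.5)
clf.fit(X, labels)

# Re-classify every document under this source's category system.
print(clf.predict(X))
```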

A Deep Learning Method for Cost-Effective Feed Weight Prediction of Automatic Feeder for Companion Animals (반려동물용 자동 사료급식기의 비용효율적 사료 중량 예측을 위한 딥러닝 방법)

  • Kim, Hoejung; Jeon, Yejin; Yi, Seunghyun; Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.263-278 / 2022
  • With the recent advent of IoT technology, automatic pet feeders that let owners feed their companion animals while they are out are becoming widespread. However, because of pets' behavior, a scale-based weight sensor, which is central to automatic feeding, is easily damaged or broken. A 3D camera avoids this problem but is costly, while a 2D camera is cheaper but comparatively less accurate. The purpose of this study is therefore to propose a deep learning approach that can accurately estimate feed weight using only a 2D camera. Various convolutional neural networks were evaluated, and among them a ResNet101-based model showed the best performance: a mean absolute error of 3.06 grams and a mean absolute percentage error of 3.40%, sufficient for commercial use in terms of technical and financial viability. These results can help practitioners predict the weight of a standardized object, such as feed, from a simple 2D image.
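A minimal sketch of this regression setup follows: a ResNet101 backbone with its classification head replaced by a single regression output. This is a generic PyTorch illustration of the idea rather than the authors' exact model or training procedure, and the batch below uses random placeholder images instead of the paper's feed photos:

```python
# A minimal sketch of 2D-image weight regression: a ResNet101
# backbone whose classifier head is replaced by one regression
# output (grams). Training data are random placeholders.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet101(weights=None)         # backbone CNN
model.fc = nn.Linear(model.fc.in_features, 1)  # regress one value: weight

criterion = nn.L1Loss()  # mean absolute error, as reported in grams
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Placeholder batch: 4 RGB images (224x224) and their true weights.
images = torch.randn(4, 3, 224, 224)
weights_g = torch.tensor([[120.0], [85.5], [200.2], [150.0]])

model.train()
optimizer.zero_grad()
pred = model(images)
loss = criterion(pred, weights_g)
loss.backward()
optimizer.step()
print("MAE on this batch (grams):", loss.item())
```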

Methodology for Identifying Key Factors in Sentiment Analysis by Customer Characteristics Using Attention Mechanism

  • Lee, Kwangho; Kim, Namgyu
    • Journal of the Korea Society of Computer and Information / v.25 no.3 / pp.207-218 / 2020
  • Recently, with the growth of online reviews and advances in analysis technology, interest in and demand for online review analysis continue to increase. However, previous studies have not considered that the sentiment conveyed by a given word may differ from one reviewer to another. Therefore, this study first classifies customers into groups according to their grade and then analyzes reviews separately for each group to examine the differences. We found that for high-grade customers, price significantly influenced product evaluations. In contrast, for low-grade customers, the degree of correspondence between the product description in the mall and the actual product significantly influenced evaluations. We expect the proposed methodology to be useful for establishing differentiated marketing strategies by identifying the factors that affect product evaluation for each customer group.
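To illustrate how attention weights can expose which words drive a sentiment prediction, here is a minimal PyTorch sketch of attention-pooled sentiment classification. The model structure, vocabulary size, and dimensions are hypothetical, not the paper's architecture:

```python
# A minimal PyTorch sketch of attention-based sentiment classification
# where per-word attention weights indicate which factors (words)
# drove the prediction. Vocabulary and dimensions are hypothetical.
import torch
import torch.nn as nn

class AttentionSentiment(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attn = nn.Linear(embed_dim, 1)   # scores each token
        self.out = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):
        emb = self.embed(token_ids)                # (batch, seq, dim)
        scores = self.attn(emb).squeeze(-1)        # (batch, seq)
        alpha = torch.softmax(scores, dim=-1)      # attention weights
        context = (alpha.unsqueeze(-1) * emb).sum(dim=1)  # weighted sum
        return self.out(context), alpha

model = AttentionSentiment()
review = torch.randint(0, 1000, (1, 6))  # one hypothetical 6-token review
logits, alpha = model(review)

# Inspecting alpha per customer group would show which words
# (e.g., price-related terms) most influenced each group's ratings.
print("predicted class:", logits.argmax(dim=-1).item())
print("attention over tokens:", alpha.detach().numpy().round(3))
```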