• Title/Abstract/Keywords: machine learning algorithm

Search results: 1,531 (processing time: 0.032 s)

머신러닝을 활용한 지역축제 방문객 수 예측모형 개발 (Development of a Model to Predict the Number of Visitors to Local Festivals Using Machine Learning)

  • 이인지;윤현식
    • 한국정보시스템학회지:정보시스템연구 (The Journal of Information Systems) / Vol. 29, No. 3 / pp. 35-52 / 2020
  • Purpose: Local governments actively hold local festivals to promote their regions and revitalize local economies. Existing studies of local festivals have been conducted mainly in tourism and related academic fields, and most of them are either empirical studies of how latent variables affect local festivals or analyses of the regional economic impact of festivals. Despite the practical need, few studies have tried to predict the number of visitors, one of the criteria for evaluating a local festival's performance. This study therefore developed a model that predicts the number of visitors from various observed variables using machine learning algorithms, and derived its implications. Design/methodology/approach: For a total of 593 festivals held in 2018, the training data comprised 6 region-related variables (covering population size, administrative division, and accessibility) and 15 festival-related variables (such as the degree of publicity and word of mouth, invited singers, weather, and budget). Because the number of visitors is continuous numerical data, three machine learning algorithms capable of regression were used: random forest, AdaBoost, and linear regression. Findings: This study confirmed that the number of visitors to local festivals can be predicted with machine learning algorithms and showed that machine learning can be used in tourism and related academic fields, including the study of local festivals. From a practical point of view, the model can predict the number of visitors to future festivals, so that a festival can be evaluated in advance and the prediction used to estimate demand for related facilities. In addition, the RReliefF ranking of the variables can be used to improve existing local festivals or to inform the planning of new ones.
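Of the three regressors the study names, linear regression is the simplest to show. The following is a minimal sketch in plain Python, not the authors' code: it fits ordinary least squares by solving the normal equations with Gaussian elimination (random forest and AdaBoost are not shown). The two feature names in the example (budget and publicity score) and all numbers are hypothetical toy values, not data from the 593 festivals.

```python
def fit_ols(X, y):
    """Fit y ~ b0 + b1*x1 + ... + bp*xp by ordinary least squares.

    Solves the normal equations (X^T X) b = X^T y with Gaussian
    elimination and returns [b0, b1, ..., bp].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    c = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    for col in range(p):
        # partial pivoting for numerical stability
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for k in range(col + 1, p):
            f = A[k][col] / A[col][col]
            for j in range(col, p):
                A[k][j] -= f * A[col][j]
            c[k] -= f * c[col]
    b = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b


def predict(b, x):
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))


# Hypothetical toy data: [budget, publicity score] -> visitors (thousands)
X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [3, 4, 6, 8, 12]  # generated as 1 + 2*budget + 3*publicity
coef = fit_ols(X, y)
print([round(v, 6) for v in coef])  # [1.0, 2.0, 3.0]
```

Because the toy targets are an exact linear function of the features, the fit recovers the generating coefficients; with real festival data the residuals would be nonzero and the model would be compared against the tree ensembles on held-out festivals.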

머신러닝 기반 골프 퍼팅 방향 예측 모델을 활용한 중요 변수 분석 방법론 (Method of Analyzing Important Variables using Machine Learning-based Golf Putting Direction Prediction Model)

  • Kim, Yeon Ho;Cho, Seung Hyun;Jung, Hae Ryun;Lee, Ki Kwang
    • 한국운동역학회지 (Korean Journal of Sport Biomechanics) / Vol. 32, No. 1 / pp. 1-8 / 2022
  • Objective: This study proposes a methodology for analyzing the variables that most strongly affect putting direction prediction, using a machine learning-based putting direction prediction model trained on IMU sensor data. Method: Putting data for 12 variables were collected with an IMU sensor from 6 adult males in their 20s at K University who had no golf experience. The data were preprocessed for machine learning, and models were built with five machine learning algorithms. The best-performing model was selected as the proposed model, and the 12 IMU variables were then applied one at a time to analyze which variables affect learning performance. Results: Among the five algorithms compared (K-NN, Naive Bayes, Decision Tree, Random Forest, and LightGBM), the LightGBM-based model achieved the highest prediction accuracy. Using LightGBM, an experiment was performed to rank the variables by their importance to the model's direction prediction. Conclusion: Of the five machine learning algorithms, LightGBM predicted putting direction best. The variable with the greatest influence on the model's prediction was the left-right inclination (Roll).
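The variable analysis described above (applying the IMU variables one at a time and comparing performance) can be sketched as a single-feature ranking loop. This is a hedged illustration, not the authors' pipeline: to stay dependency-free it swaps LightGBM for a 1-nearest-neighbor classifier scored by leave-one-out accuracy, and the two feature columns and labels are hypothetical.

```python
def loo_1nn_accuracy(values, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier on one feature."""
    correct = 0
    for i, (v, lab) in enumerate(zip(values, labels)):
        # nearest other sample; ties go to the lowest index
        j = min((k for k in range(len(values)) if k != i),
                key=lambda k: abs(values[k] - v))
        correct += labels[j] == lab
    return correct / len(values)


def rank_features(columns, labels):
    """Score each feature on its own and return (feature index, accuracy), best first."""
    scores = [(f, loo_1nn_accuracy(col, labels)) for f, col in enumerate(columns)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical data: feature 0 separates the classes, feature 1 does not
labels = [0, 1, 0, 1, 0, 1]
columns = [
    [0.0, 10.1, 0.2, 10.3, 0.4, 10.5],
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
]
print(rank_features(columns, labels))  # [(0, 1.0), (1, 0.0)]
```

Any classifier with a fit/score cycle could replace the 1-NN scorer; the ranking loop itself does not depend on the model.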

소프트웨어 비용산정을 위한 면역 알고리즘 기반의 서포트 벡터 회귀 (Support Vector Regression based on Immune Algorithm for Software Cost Estimation)

  • 권기태;이준길
    • 한국컴퓨터정보학회논문지 (Journal of the Korea Society of Computer and Information) / Vol. 14, No. 7 / pp. 17-24 / 2009
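  • As the use of information systems grows, both the demand for software development and its cost have increased. Software development cost was traditionally estimated with regression analysis based on statistical algorithms, but today machine learning methods are widely studied for this task. In this paper, software cost is estimated with support vector regression (SVR), a machine learning technique, and the optimal combination of SVR parameters is found with an immune algorithm that applies the working principles of the immune system. For software cost estimation, the immune-algorithm-based SVR was applied while varying the number of generations, memory cells, and alleles, and the experimental results were compared with those of other machine learning methods from previous studies.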
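The parameter search can be pictured with a clonal-selection loop, one common family of immune algorithms. This is a hedged sketch, not the authors' algorithm: cells are real-valued (C, epsilon) vectors rather than allele strings, the memory cell is simply the best solution seen so far, and `toy_cv_error` is a hypothetical stand-in for the SVR validation error one would actually minimize.

```python
import random


def clonal_selection(cost, bounds, pop_size=20, n_clones=5, generations=50, seed=0):
    """Minimize `cost` over a box with a simple clonal-selection scheme."""
    rng = random.Random(seed)

    def random_cell():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def mutate(cell, rate):
        # Gaussian hypermutation, clipped to the search box
        return [min(hi, max(lo, x + rng.gauss(0.0, rate * (hi - lo))))
                for x, (lo, hi) in zip(cell, bounds)]

    population = [random_cell() for _ in range(pop_size)]
    memory = min(population, key=cost)
    n_selected = pop_size // 2
    for _ in range(generations):
        ranked = sorted(population, key=cost)
        clones = []
        for rank, cell in enumerate(ranked[:n_selected]):
            # better (lower-cost) cells receive smaller mutations
            rate = 0.1 * (rank + 1) / n_selected
            clones.extend(mutate(cell, rate) for _ in range(n_clones))
        survivors = sorted(ranked + clones, key=cost)[: pop_size - 2]
        # a few random newcomers keep the repertoire diverse
        population = survivors + [random_cell() for _ in range(2)]
        best = min(population, key=cost)
        if cost(best) < cost(memory):
            memory = best
    return memory


# Hypothetical stand-in for SVR validation error, minimized at C=10, epsilon=0.1
def toy_cv_error(params):
    c, eps = params
    return (c - 10.0) ** 2 / 100.0 + (eps - 0.1) ** 2


best = clonal_selection(toy_cv_error, bounds=[(0.1, 100.0), (0.0, 1.0)])
print(best, toy_cv_error(best))
```

The generation count, clone count, and population size here play roughly the role of the generation and memory-cell settings the abstract says were varied; the concrete values are illustrative assumptions.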

Stroke Disease Identification System by using Machine Learning Algorithm

  • K. Veena Kumari; K. Siva Kumar; M. Sreelatha
    • International Journal of Computer Science & Network Security / Vol. 23, No. 11 / pp. 183-189 / 2023
  • A stroke is a medical condition in which a blood vessel in the brain ruptures, causing damage to the brain. Symptoms may occur when the flow of blood and nutrients to the brain is interrupted. Stroke is another major cause of death and widespread disability. Its prevalence is high in developing countries, with ischemic stroke being the most common type. If the warning signs of stroke are recognized early, the severity of a stroke can be reduced. Most earlier stroke detection and prediction models rely on imaging tools such as CT (computed tomography) or MRI (magnetic resonance imaging), which are costly and difficult to use for real-time detection. Machine learning (ML) is a branch of artificial intelligence (AI) that enables software applications to predict outcomes accurately without being explicitly programmed to do so. In recent years, ML algorithms have attracted considerable attention for their accurate results in the medical field. This work therefore presents a stroke disease identification system based on a machine learning algorithm, an artificial neural network (ANN). Its results are compared with those of other ML algorithms to identify the better-performing algorithm for stroke identification.

Estimating Regression Function with $\varepsilon$-Insensitive Supervised Learning Algorithm

  • Hwang, Chang-Ha
    • Journal of the Korean Data and Information Science Society / Vol. 15, No. 2 / pp. 477-483 / 2004
  • One of the major paradigms for supervised learning in the neural network community is back-propagation learning. Standard implementations of back-propagation learning are optimal under the assumption of independent and identically distributed Gaussian noise. In this paper, for regression function estimation, we introduce an $\varepsilon$-insensitive back-propagation learning algorithm, which corresponds to minimizing the least absolute error. We compare this algorithm with the support vector machine (SVM), another $\varepsilon$-insensitive supervised learning algorithm that has been very successful in pattern recognition and function estimation problems. For the comparison, we consider a more realistic model that allows the noise variance itself to depend on the input variables.
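A minimal sketch of the epsilon-insensitive idea, assuming a single linear unit trained by full-batch subgradient descent. The paper trains a back-propagation network; this degenerate one-layer case only illustrates the loss. Residuals inside the epsilon-tube cost nothing and residuals outside it cost linearly, so epsilon = 0 reduces to the absolute error. The step-size schedule, epsilon, and data are illustrative assumptions.

```python
import math


def eps_insensitive(residual, eps):
    """Epsilon-insensitive loss: max(0, |r| - eps)."""
    return max(0.0, abs(residual) - eps)


def fit_linear_eps(xs, ys, eps=0.01, steps=5000, lr0=0.5):
    """Fit y ~ w*x + b by minimizing the mean epsilon-insensitive loss.

    Uses full-batch subgradient descent with step lr0/sqrt(t+1) and returns
    (loss, w, b) for the best iterate, since subgradient steps are not monotone.
    """
    n = len(xs)
    w = b = 0.0

    def mean_loss(w, b):
        return sum(eps_insensitive(w * x + b - y, eps) for x, y in zip(xs, ys)) / n

    best = (mean_loss(w, b), w, b)
    for t in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            r = w * x + b - y
            if abs(r) > eps:  # zero subgradient inside the tube
                s = 1.0 if r > 0 else -1.0
                gw += s * x / n
                gb += s / n
        step = lr0 / math.sqrt(t + 1)
        w, b = w - step * gw, b - step * gb
        loss = mean_loss(w, b)
        if loss < best[0]:
            best = (loss, w, b)
    return best


xs = [i / 10 for i in range(11)]
ys = [2 * x + 1 for x in xs]
loss, w, b = fit_linear_eps(xs, ys)
print(loss, w, b)
```

Because the loss grows only linearly outside the tube, a single large outlier shifts the fit far less than it would under squared error, which is the motivation for comparing against Gaussian-noise back-propagation.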

Sequential Minimal Optimization with Random Forest Algorithm (SMORF) Using Twitter Classification Techniques

  • J. Uma; K. Prabha
    • International Journal of Computer Science & Network Security / Vol. 23, No. 4 / pp. 116-122 / 2023
  • Sentiment classification techniques are commonly divided into three main categories: machine learning (ML) methods, lexicon-based (LB) methods, and hybrid methods. ML methods extract linguistic features and apply well-known ML algorithms. In this paper, classification and identification are performed with an optimization technique called sequential minimal optimization with Random Forest (SMORF) to improve the performance and efficiency of the sentiment classification framework. Three existing classification algorithms are compared with the proposed SMORF algorithm. The simulation results in the experimental setup are evaluated with precision (P), recall (R), F-measure (F), and accuracy. The proposed SMORF algorithm achieves high accuracy.
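The evaluation metrics named above have standard definitions in terms of true positives, false positives, and false negatives. A minimal sketch for a binary task follows; treating positive sentiment as the positive class is an assumption, and the labels are made up.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F-measure (F1), and accuracy for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": correct / len(pairs)}


# Hypothetical labels: 1 = positive sentiment, 0 = negative
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
print(classification_metrics(y_true, y_pred))
# {'precision': 0.75, 'recall': 0.75, 'f1': 0.75, 'accuracy': 0.75}
```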

머신러닝 자동화를 위한 개발 환경에 관한 연구 (A Study on Development Environments for Machine Learning)

  • 김동길;박용순;박래정;정태윤
    • 대한임베디드공학회논문지 (IEMEK Journal of Embedded Systems and Applications) / Vol. 15, No. 6 / pp. 307-316 / 2020
  • Machine learning model performance is strongly affected by the data, and preprocessing is needed to analyze various types of data, such as letters, numbers, and special characters. This paper proposes a three-stage development environment: stage 1 processes categorical and continuous data according to the type of missing values, stage 2 selects the best-performing algorithm, and stage 3 automates the checking of model performance. With this environment, machine learning models can be created without prior knowledge of data preprocessing.
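The three-stage idea can be sketched as follows. This is an assumption-laden illustration rather than the authors' environment: stage 1 fills missing continuous values with the column mean and missing categorical values with the column mode, and stages 2 and 3 score every candidate and report the best. The column names and the candidate scoring functions are hypothetical stand-ins for real training and validation.

```python
from collections import Counter


def impute(column):
    """Stage 1: fill None by column type (mean for numbers, mode otherwise)."""
    present = [v for v in column if v is not None]
    if not present:  # nothing to learn a fill value from
        return list(column)
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                  for v in present)
    if numeric:
        fill = sum(present) / len(present)  # continuous: column mean
    else:
        fill = Counter(present).most_common(1)[0][0]  # categorical: most frequent value
    return [fill if v is None else v for v in column]


def select_best(candidates, data):
    """Stages 2-3: score every candidate and return (name, score) of the best."""
    scores = {name: score_fn(data) for name, score_fn in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


table = {
    "age": [20, None, 30, 40],
    "city": ["Seoul", "Busan", None, "Seoul"],
}
clean = {name: impute(col) for name, col in table.items()}
print(clean)  # {'age': [20, 30.0, 30, 40], 'city': ['Seoul', 'Busan', 'Seoul', 'Seoul']}

candidates = {  # hypothetical validation scores standing in for trained models
    "baseline": lambda d: 0.61,
    "tree": lambda d: 0.78,
}
print(select_best(candidates, clean))  # ('tree', 0.78)
```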

Identifying the Optimal Machine Learning Algorithm for Breast Cancer Prediction

  • ByungJoo Kim
    • International Journal of Advanced Smart Convergence / Vol. 13, No. 3 / pp. 80-88 / 2024
  • Breast cancer remains a significant global health burden, necessitating accurate and timely detection for improved patient outcomes. Machine learning techniques have demonstrated remarkable potential in assisting breast cancer diagnosis by learning complex patterns from multi-modal patient data. This study comprehensively evaluates several popular machine learning models, including logistic regression, decision trees, random forests, support vector machines (SVMs), naive Bayes, k-nearest neighbors (KNN), XGBoost, and ensemble methods for breast cancer prediction using the Wisconsin Breast Cancer Dataset (WBCD). Through rigorous benchmarking across metrics like accuracy, precision, recall, F1-score, and area under the ROC curve (AUC), we identify the naive Bayes classifier as the top-performing model, achieving an accuracy of 0.974, F1-score of 0.979, and highest AUC of 0.988. Other strong performers include logistic regression, random forests, and XGBoost, with AUC values exceeding 0.95. Our findings showcase the significant potential of machine learning, particularly the robust naive Bayes algorithm, to provide highly accurate and reliable breast cancer screening from fine needle aspirate (FNA) samples, ultimately enabling earlier intervention and optimized treatment strategies.
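Naive Bayes, the top performer in this comparison, is simple enough to sketch from scratch. The version below is a Gaussian naive Bayes classifier, one common variant for continuous features such as the WBCD cell measurements. Whether the study used this exact variant is an assumption, the class name `SimpleGaussianNB` is hypothetical, and the toy data are not from WBCD.

```python
import math
from collections import defaultdict


class SimpleGaussianNB:
    """Gaussian naive Bayes: features assumed independent and normal within each class."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing  # added to every variance to avoid division by zero

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.params = {}
        for label, rows in groups.items():
            n = len(rows)
            means = [sum(col) / n for col in zip(*rows)]
            variances = [sum((v - m) ** 2 for v in col) / n + self.var_smoothing
                         for col, m in zip(zip(*rows), means)]
            self.params[label] = (math.log(n / len(y)), means, variances)
        return self

    def _log_posterior(self, row, label):
        # log prior + sum of per-feature Gaussian log-likelihoods (up to a shared constant)
        log_prior, means, variances = self.params[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))

    def predict(self, X):
        return [max(self.params, key=lambda c: self._log_posterior(row, c)) for row in X]


X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [5.0, 7.9], [5.3, 8.2], [4.9, 8.0]]
y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
model = SimpleGaussianNB().fit(X, y)
print(model.predict([[1.1, 2.0], [5.1, 8.1]]))  # ['benign', 'malignant']
```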

Prediction of the DO concentration using the machine learning algorithm: case study in Oncheoncheon, Republic of Korea

  • Lim, Heesung;An, Hyunuk;Choi, Eunhyuk;Kim, Yeonsu
    • 농업과학연구 (Korean Journal of Agricultural Science) / Vol. 47, No. 4 / pp. 1029-1037 / 2020
  • Machine learning algorithms have been widely used in water-related fields such as water resources, water management, hydrology, atmospheric science, water quality, water level prediction, weather forecasting, water discharge prediction, and water quality forecasting. However, water quality prediction studies based on machine learning remain limited compared with other water-related applications because water quality data are scarce. Most previous water quality prediction studies have predicted monthly water quality, which is useful but not sufficient from a practical standpoint. In this study, we predicted dissolved oxygen (DO) using a recurrent neural network with long short-term memory (RNN-LSTM) on hourly and daily datasets. Bugok Bridge on Oncheoncheon in Busan, where data are collected in real time, was selected as the DO prediction site. Ten months of data (temperature, wind speed, and relative humidity) were used as inputs for hourly prediction, and five years of data (temperature, wind speed, relative humidity, and rainfall) were used as inputs for daily forecasting. Missing data were filled by linear interpolation. The prediction model was implemented with TensorFlow, an open-source library developed by Google. The performance of the RNN-LSTM algorithm for hourly and daily water quality prediction was tested and analyzed. The results showed that hourly water quality data are useful for machine learning and that the RNN-LSTM algorithm has potential for hourly or daily water quality forecasting.
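Two preprocessing steps mentioned above can be sketched in plain Python: filling gaps by linear interpolation, and turning a series into (past window, next value) pairs of the kind a recurrent model is trained on. This is not the authors' TensorFlow code; the window length, the DO values, and leaving leading or trailing gaps as None are assumptions.

```python
def interpolate_linear(series):
    """Fill interior None gaps by linear interpolation between known neighbors."""
    out = list(series)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):  # only runs when there is a gap between a and b
            frac = (i - a) / (b - a)
            out[i] = out[a] + frac * (out[b] - out[a])
    return out  # leading/trailing None are left untouched


def make_windows(series, window):
    """Pair each run of `window` past values with the next value as the target."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]


do_mg_per_l = [8.0, None, 9.0, 9.5, None, None, None, 7.5]  # hypothetical DO readings
filled = interpolate_linear(do_mg_per_l)
print(filled)  # [8.0, 8.5, 9.0, 9.5, 9.0, 8.5, 8.0, 7.5]
print(make_windows(filled, 3)[0])  # ([8.0, 8.5, 9.0], 9.5)
```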