• Title/Summary/Keyword: machine learning classification models


Malaysian Name-based Ethnicity Classification using LSTM

  • Hur, Youngbum
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.12 / pp.3855-3867 / 2022
  • Name separation (splitting full names into surnames and given names) is not a trivial task in a multiethnic country because the procedure for splitting surnames and given names is ethnicity-specific. Malaysia has multiple main ethnic groups; therefore, separating Malaysian full names into surnames and given names is challenging. In this study, we develop a two-phase framework for Malaysian name separation using deep learning. In the first phase, we predict the ethnicity of full names using a proposed recurrent neural network model based on long short-term memory (LSTM) with character embeddings. In the second phase, we use a rule-based algorithm to split full names into surnames and given names based on the predicted ethnicity. We evaluate the performance of the proposed model against various machine learning models and demonstrate that it outperforms them by an average of 9%. Moreover, transfer learning and fine-tuning of the proposed model with an additional dataset result in an improvement of up to 7% on average.
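
As a rough illustration of the input side of a character-embedding model like the one described above, the following sketch maps the characters of a name to integer ids and pads them to a fixed length. The function names, the padding id 0, the shared unknown-character id, and the placeholder strings are assumptions made for this example; the abstract does not describe the authors' actual preprocessing.

```python
def build_vocab(names):
    """Map each character seen in the names to an integer id (0 is reserved for padding)."""
    chars = sorted({ch for name in names for ch in name.lower()})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(name, vocab, max_len):
    """Turn a name into a fixed-length list of ids.

    Characters missing from the vocabulary share one extra id (an assumption
    of this sketch), and short names are right-padded with 0.
    """
    unk = len(vocab) + 1
    ids = [vocab.get(ch, unk) for ch in name.lower()][:max_len]
    return ids + [0] * (max_len - len(ids))


# Hypothetical placeholder strings, not real training data.
vocab = build_vocab(["ab", "ba c"])
encoded = encode("Abz", vocab, max_len=5)
```

An embedding layer would then look up a vector for each id before the sequence is passed to the LSTM.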

Diagnosis of Valve Internal Leakage for Ship Piping System using Acoustic Emission Signal-based Machine Learning Approach (선박용 밸브의 내부 누설 진단을 위한 음향방출신호의 머신러닝 기법 적용 연구)

  • Lee, Jung-Hyung
    • Journal of the Korean Society of Marine Environment & Safety / v.28 no.1 / pp.184-192 / 2022
  • Valve internal leakage is caused by damage to the internal parts of the valve, resulting in accidents and shutdowns of the piping system. This study investigated the possibility of a real-time leak detection method using the acoustic emission (AE) signal generated from the piping system during the internal leakage of a butterfly valve. Datasets of raw time-domain AE signals were systematically collected and postprocessed for each operation mode of the valve to develop a data-driven model for the detection and classification of internal leakage by applying machine learning algorithms. The aim of this study was to determine whether leak detection can be treated as a classification problem by applying two classification algorithms: support vector machine (SVM) and convolutional neural network (CNN). The results showed different performances for the algorithms and datasets used. The SVM-based binary classification models, based on feature extraction of data, achieved an overall accuracy of 83% to 90%, while the accuracy of the SVM-based multi-class classification model dropped to 66%. By contrast, the CNN-based classification model achieved an accuracy of 99.85%, higher than that of any of the SVM-based models. The results revealed that the SVM classification model requires effective feature extraction of the AE signals to improve the accuracy of multi-class classification. Moreover, the CNN-based classification can be a promising approach to detect both leakage and valve opening as long as the performance of the processor does not degrade.

An Empirical Comparison of Machine Learning Models for Classifying Emotions in Korean Twitter (한국어 트위터의 감정 분류를 위한 기계학습의 실증적 비교)

  • Lim, Joa-Sang;Kim, Jin-Man
    • Journal of Korea Multimedia Society / v.17 no.2 / pp.232-239 / 2014
  • As online texts have grown rapidly, their automatic classification with machine learning methods has attracted increasing interest. Nevertheless, comparatively little research has addressed Korean texts, and evaluations using statistical methods are also rare. This study took a sample of tweets and used machine learning methods to classify emotions with morpheme and n-gram features. As a result, about 76% of the emotions contained in the tweets were correctly classified. Of the two methods compared in this study, Support Vector Machines (SVM) were found to be more accurate than Naïve Bayes. The linear SVM model was not inferior to the non-linear one. Morphological features did not contribute more to accuracy than n-grams did.
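
To make the n-gram features mentioned above concrete, here is a minimal sketch that counts token n-grams in a text. Whitespace tokenization, the choice of unigrams plus bigrams, and the function names are assumptions for illustration; the abstract does not say whether the study used word-level or character-level n-grams, and Korean text would normally need a morphological analyzer rather than a plain split.

```python
from collections import Counter


def token_ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, n_values=(1, 2)):
    """Count n-grams of each requested size in a whitespace-tokenized text."""
    tokens = text.split()
    counts = Counter()
    for n in n_values:
        counts.update(token_ngrams(tokens, n))
    return counts


features = ngram_features("so happy so far")
```

Count vectors of this kind are the sort of input that an SVM or Naïve Bayes classifier can be trained on.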

Application of Deep Learning to the Forecast of Flare Classification and Occurrence using SOHO MDI data

  • Park, Eunsu;Moon, Yong-Jae;Kim, Taeyoung
    • The Bulletin of The Korean Astronomical Society / v.42 no.2 / pp.60.2-61 / 2017
  • A Convolutional Neural Network (CNN) is one of the well-known deep learning methods in image processing and computer vision. In this study, we apply CNNs to two kinds of flare forecasting models: flare classification and occurrence. For this, we consider several pre-trained models (e.g., AlexNet, GoogLeNet, and ResNet) and customize them by changing several options such as the number of layers, activation function, and optimizer. Our inputs are the same number of SOHO/MDI images for each flare class (None, C, M, and X) at 00:00 UT from Jan 1996 to Dec 2010 (1600 images in total). The outputs are the results of daily flare forecasting for flare class and occurrence. We build, train, and test the models on TensorFlow, a well-known machine learning software library developed by Google. Our major results are as follows. First, most of the models have accuracies above 0.7. Second, ResNet, developed by Microsoft, has the best accuracies: 0.77 for flare classification and 0.83 for flare occurrence. Third, the accuracies of these models vary greatly with changing parameters. We discuss several possibilities to improve the models.


A Deep Learning Model for Extracting Consumer Sentiments using Recurrent Neural Network Techniques

  • Ranjan, Roop;Daniel, AK
    • International Journal of Computer Science & Network Security / v.21 no.8 / pp.238-246 / 2021
  • The rapid rise of the Internet and social media has resulted in a large number of text-based reviews being posted online. In the age of social media, using machine learning technologies to analyze the emotional context of comments aids in understanding the Quality of Service (QoS) of any product or service, and the classification and analysis of user reviews aids in improving QoS. Unlike traditional categorization models, which are based on a set of rules, machine learning algorithms have evolved into a powerful tool for analyzing user sentiment. In sentiment categorization, Bidirectional Long Short-Term Memory (BiLSTM) has shown significant results, and the Convolutional Neural Network (CNN) has shown promising results. Using convolution and pooling layers, a CNN can successfully extract local information. BiLSTM uses two LSTM directions to increase the amount of context available to deep learning models. The suggested hybrid model combines the benefits of these two deep learning-based algorithms. The data source for analysis and classification was user reviews of Indian Railway Services on Twitter. The suggested hybrid model uses the Keras Embedding technique as its input source. The model takes in data and generates lower-dimensional features that lead to a categorization result. The suggested hybrid model's performance was compared using Keras and Word2Vec embeddings, and the proposed model showed a significant improvement, with an accuracy of 95.19 percent.

Developing Models for Patterns of Road Surface Temperature Change using Road and Weather Conditions (도로 및 기상조건을 고려한 노면온도변화 패턴 추정 모형 개발)

  • Kim, Jin Guk;Yang, Choong Heon;Kim, Seoung Bum;Yun, Duk Geun;Park, Jae Hong
    • International Journal of Highway Engineering / v.20 no.2 / pp.127-135 / 2018
  • PURPOSES: This study develops various models that can estimate the pattern of road surface temperature changes using machine learning methods. METHODS: Both a thermal mapping system and weather forecast information were employed to collect data for developing the models. In previous studies, the authors defined road surface temperature data as the response, while vehicular ambient temperature, air temperature, and humidity were considered as predictors. In this research, two additional factors, road type and weather forecasts, were considered for the estimation of the road surface temperature change pattern. Finally, a total of six models for estimating the pattern of road surface temperature changes were developed using the MATLAB program, which provides the Classification Learner as a machine learning tool. RESULTS: Model 5 was considered the best owing to its high accuracy. The accuracy of the model could increase when weather forecasts (e.g., Sky Status) were applied. A comparison between Models 4 and 5 showed that the influence of humidity on road surface temperature changes is negligible. CONCLUSIONS: Even though Models 4, 5, and 6 demonstrated the same performance in terms of average absolute error (AAE), Model 5 can be considered the optimal one from the point of view of accuracy.
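
The average absolute error (AAE) used above to compare the models can be sketched as the mean of the absolute differences between observed and predicted values. This is the usual definition and is assumed here, since the abstract does not spell out the formula; the function name and the sample temperatures are hypothetical.

```python
def average_absolute_error(observed, predicted):
    """Mean of |observed - predicted| over paired values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical road surface temperatures in degrees Celsius.
aae = average_absolute_error([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
```

Two models can share the same AAE while differing in classification accuracy, which is the situation the conclusions describe for Models 4, 5, and 6.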

Applications of Machine Learning Models on Yelp Data

  • Ruchi Singh;Jongwook Woo
    • Asia pacific journal of information systems / v.29 no.1 / pp.35-49 / 2019
  • The paper documents the application of relevant Machine Learning (ML) models to the dataset of Yelp (a crowd-sourced local business review and social networking site) to analyze, predict, and recommend businesses. Two cloud platforms are used strategically to minimize the effort and time required for this project. Seven machine learning algorithms are used in Azure ML, of which four are implemented in Databricks Spark ML. The analyzed Yelp business dataset contained 70 business attributes for more than 350,000 registered businesses. Additionally, review tips and likes from 500,000 users were processed for the project. A recommendation model is built to provide Yelp users with recommendations for business categories based on their previous business ratings, as well as the business ratings of other users. A classification model is implemented to predict the popularity of a business, defining a popular business as one with more than 3 stars and an unpopular business as one with fewer than 3 stars. A text analysis model is developed by comparing two algorithms, uni-gram feature extraction and n-feature extraction, in Azure ML Studio and a logistic regression model in Spark. Comparative conclusions are drawn about the efficiency of Spark ML and Azure ML for these models.
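
The popularity labeling rule described above can be written as a small function. This is only a sketch of the stated thresholds: the function name and label strings are hypothetical, and because the abstract does not say how a rating of exactly 3 stars is handled, this sketch returns None for that case.

```python
def popularity_label(stars):
    """Label a business from its star rating using the thresholds in the abstract."""
    if stars > 3:
        return "popular"
    if stars < 3:
        return "unpopular"
    # Exactly 3 stars is not covered by the abstract; left unlabeled here.
    return None


labels = [popularity_label(s) for s in (4.5, 2.0, 3.0)]
```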

Finding a plan to improve recognition rate using classification analysis

  • Kim, SeungJae;Kim, SungHwan
    • International journal of advanced smart convergence / v.9 no.4 / pp.184-191 / 2020
  • With the emergence of the 4th Industrial Revolution, the core technologies expected to lead it, such as artificial intelligence (AI), big data, and the Internet of Things (IoT), have become a focus of public attention. In particular, there are growing attempts to present future visions by applying these technologies to big data analysis of data collected in a specific field, discovering new models, and using those models to infer and predict new values. To obtain reliable and refined statistics from big data analysis, it is necessary to analyze the meaning of each variable, the correlations between variables, and multicollinearity. If the data are classified from the outset in a way that does not match the hypothesis being tested, the results will be unreliable even if the analysis itself is performed well. In other words, prior to big data analysis, it is necessary to ensure that the data are well classified according to the purpose of the analysis. Therefore, in this study, data are classified using the decision tree and random forest techniques, which are classification analysis methods in machine learning, a way of implementing AI technology. By evaluating how well the data are classified, we seek a way to improve the classification and analysis rate of the data.

KOMPSAT-3A Urban Classification Using Machine Learning Algorithm - Focusing on Yang-jae in Seoul - (기계학습 기법에 따른 KOMPSAT-3A 시가화 영상 분류 - 서울시 양재 지역을 중심으로 -)

  • Youn, Hyoungjin;Jeong, Jongchul
    • Korean Journal of Remote Sensing / v.36 no.6_2 / pp.1567-1577 / 2020
  • Urban land cover classification plays a role in urban planning and management, so it is important to improve classification accuracy in urban areas. In this paper, two machine learning models, Support Vector Machine (SVM) and Artificial Neural Network (ANN), are proposed for urban land cover classification based on high-resolution satellite imagery (KOMPSAT-3A). Training data were created from the satellite image using a 25 m rectangular grid, and the trained models were used to classify the test area. During the validation process, we presented a confusion matrix for each result using 250 Ground Truth Points (GTP). Of the four SVM kernels and the two ANN activation functions, the SVM polynomial kernel model had the highest accuracy, 86%. In comparing the SVM and ANN using GTP, the SVM model was more effective than the ANN model for KOMPSAT-3A classification. Among the four classes (building, road, vegetation, and bare soil), the building class showed the lowest classification accuracy because of shadows cast by high-rise buildings.
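
The validation step described above, comparing predicted classes against ground truth points in a confusion matrix, can be sketched as follows. The function names and the handful of sample labels are hypothetical and stand in for the 250 GTP used in the study; overall accuracy is taken here as the share of points on the matrix diagonal.

```python
from collections import Counter


def confusion_matrix(truth, predicted, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(truth, predicted))
    return [[counts[(t, p)] for p in classes] for t in classes]


def overall_accuracy(matrix):
    """Fraction of points whose predicted class matches the true class."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


classes = ["building", "road"]
truth = ["building", "building", "road", "road"]
predicted = ["building", "road", "road", "road"]
matrix = confusion_matrix(truth, predicted, classes)
accuracy = overall_accuracy(matrix)
```

Reading along a row shows where a class is being confused, for example buildings misclassified because of shadow.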

A Study on a Wearable Smart Airbag Using Machine Learning Algorithm (머신러닝 알고리즘을 사용한 웨어러블 스마트 에어백에 관한 연구)

  • Kim, Hyun Sik;Baek, Won Cheol;Baek, Woon Kyung
    • Journal of the Korean Society of Safety / v.35 no.2 / pp.94-99 / 2020
  • Bikers can be injured in unexpected accidents even if they wear basic helmets. A properly designed airbag can efficiently protect the critical areas of the human body. This study introduces a wearable smart airbag system that uses machine learning techniques to protect the human neck and shoulders. When a bicycle accident happens, a microprocessor analyzes the biker's motion data and compares it with accident classification models to recognize whether it is a critical accident. These models are trained on a variety of possible accidents through machine learning techniques such as the k-means and SVM methods. When the microprocessor decides that a critical accident has occurred, it issues an actuation signal for the gas inflator to inflate the airbag. A prototype of the wearable smart airbag with the machine learning techniques is developed, and its performance is tested using a human dummy mounted on a moving cart.