• Title/Summary/Keyword: Scikit-learn

Income prediction of apple and pear farmers in Chungnam area by automatic machine learning with H2O.AI

  • Hyundong, Jang;Sounghun, Kim
    • Korean Journal of Agricultural Science / v.49 no.3 / pp.619-627 / 2022
  • In Korea, apples and pears are among the most important agricultural products for farmers' income. Farmers make decisions at various stages to maximize their income, but they do not always know which option is best. Many previous studies addressed this problem by predicting farmers' income structure, but researchers are still exploring better approaches. Machine learning is currently gaining attention as one such approach to income prediction. Machine learning is a methodology that uses algorithms able to learn independently from data, and its performance is improving as computer science advances. The purpose of this study is to predict the income structure of apple and pear farmers using the automatic machine learning solution H2O.AI and to present some implications for these farmers. Because it searches for the best solution automatically, H2O.AI can save time and effort compared to conventional machine learning tools such as scikit-learn. The findings are as follows. First, apple farmers should increase their gross income to maximize their income, rather than reduce the cost of growing apples. In particular, they should mainly increase production in order to obtain more gross income. As a second-best option, apple farmers should decrease labor and other costs. Second, pear farmers should also increase their gross income to maximize their income, but by raising the price of pears rather than increasing production. As a second-best option, pear farmers can decrease labor and other costs.

Development of Multilayer Perceptron Model for the Prediction of Alcohol Concentration of Makgeolli

  • Kim, JoonYong;Rho, Shin-Joung;Cho, Yun Sung;Cho, EunSun
    • Journal of Biosystems Engineering / v.43 no.3 / pp.229-236 / 2018
  • Purpose: Makgeolli is a traditional alcoholic beverage made from rice with a fermentation starter called "nuruk." The concentration of alcohol in makgeolli depends on the temperature of the fermentation tank. It is important to monitor the alcohol concentration to manage the makgeolli production process. Methods: Data were collected from 84 makgeolli fermentation tanks over a one-year period. Independent variables included the temperatures of the tanks and the room where the tanks were located, as well as the quantity, acidity, and water concentration of the source. Software for the multilayer perceptron (MLP) model was written in Python using the Scikit-learn library. Results: Many models were created for which the optimization converged within 100 iterations, and their coefficients of determination $R^2$ were high. The coefficients of determination $R^2$ of the best model on the training set and the test set were 0.94 and 0.93, respectively. The small difference between them indicated that the model was not overfitted. The maximum and minimum error was approximately 2% and the total MSE was 0.078%. Conclusions: The MLP model could help to predict the alcohol concentration and to control the production process of makgeolli. In future research, the optimization of the production process will be studied based on the model.
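
The train/test comparison of $R^2$ described in this abstract can be sketched with Scikit-learn's MLPRegressor. This is only an illustrative sketch: the data are synthetic stand-ins for the tank measurements, and the hidden-layer size, the `lbfgs` solver, and the split ratio are assumptions, not the authors' settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): five inputs loosely mirroring
# tank temperature, room temperature, quantity, acidity, water concentration.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 6.0 + 1.5 * X[:, 0] - 0.8 * X[:, 3] + rng.normal(scale=0.2, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale the inputs, then fit a small MLP; all hyperparameters are illustrative.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16,), solver="lbfgs",
                 max_iter=1000, random_state=0),
)
model.fit(X_train, y_train)

# score() returns the coefficient of determination R^2 for regressors.
r2_train = model.score(X_train, y_train)
r2_test = model.score(X_test, y_test)
print(f"R^2 train={r2_train:.3f}  test={r2_test:.3f}")
```

A small gap between the two scores is the same overfitting check the abstract reports.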

Emotion Recognition of Low Resource (Sindhi) Language Using Machine Learning

  • Ahmed, Tanveer;Memon, Sajjad Ali;Hussain, Saqib;Tanwani, Amer;Sadat, Ahmed
    • International Journal of Computer Science & Network Security / v.21 no.8 / pp.369-376 / 2021
  • Emotion recognition is one of the most active areas of research in affective computing and signal processing. This paper proposes emotion recognition for a low-resource language, Sindhi. The work is unique in that it examines emotions in a language for which there is currently no publicly accessible dataset. It provides the academic community with a dataset named MAVDESS (Mehran Audio-Visual Database of Emotional Speech in Sindhi) for Sindhi, a significant language mainly spoken in Pakistan; with few exceptions, no generic data for such languages is available for machine learning. Furthermore, the emotions in MAVDESS were analyzed and annotated using line features such as pitch, volume, and base, toolkits such as OpenSmile and Scikit-Learn, and classification schemes such as LR, SVC, DT, and KNN, implemented in Python to train the models. The dataset will be accessible at https://doi.org/10.5281/zenodo.5213073.
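
As a rough illustration of comparing the four classifier families named in this abstract (LR, SVC, DT, KNN), the sketch below cross-validates Scikit-learn's implementations on synthetic feature vectors. The data, the four-class setup, and all hyperparameters are assumptions; it does not use MAVDESS or OpenSmile features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for acoustic feature vectors with four emotion classes.
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=6,
    n_classes=4, random_state=0,
)

classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "SVC": SVC(),
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# Mean 5-fold cross-validated accuracy for each classifier.
cv_accuracy = {}
for name, clf in classifiers.items():
    pipeline = make_pipeline(StandardScaler(), clf)
    cv_accuracy[name] = cross_val_score(pipeline, X, y, cv=5).mean()

for name, acc in cv_accuracy.items():
    print(f"{name}: {acc:.3f}")
```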

Model Transformation and Inference of Machine Learning using Open Neural Network Format (오픈신경망 포맷을 이용한 기계학습 모델 변환 및 추론)

  • Kim, Seon-Min;Han, Byunghyun;Heo, Junyeong
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.3 / pp.107-114 / 2021
  • Artificial intelligence technology has recently been introduced in many fields, and as academic interest has grown, machine learning models have come to be run on a variety of frameworks. However, these frameworks use different data formats and therefore lack interoperability; to overcome this, the Open Neural Network Exchange format, ONNX, has been proposed. In this paper we describe how to transform multiple machine learning models to ONNX, and propose algorithms and inference systems that can determine machine learning techniques in an integrated ONNX format. Furthermore, we compare the inference results of the models before and after the ONNX transformation, showing that the transformation causes no loss or performance degradation in the learning results.

Comparative Analysis of Vectorization Techniques in Electronic Medical Records Classification (의무 기록 문서 분류를 위한 자연어 처리에서 최적의 벡터화 방법에 대한 비교 분석)

  • Yoo, Sung Lim
    • Journal of Biomedical Engineering Research / v.43 no.2 / pp.109-115 / 2022
  • Purpose: Medical records classification using vectorization techniques plays an important role in natural language processing. The purpose of this study was to investigate proper vectorization techniques for electronic medical records classification. Material and methods: 403 electronic medical documents were extracted retrospectively and classified using the cosine similarity calculated by Scikit-learn (a Python module for machine learning) in Jupyter Notebook. Vectors for the medical documents were produced by three different vectorization techniques (TF-IDF, latent semantic analysis (LSA), and Word2Vec), and the classification precision of each technique was evaluated. The Kruskal-Wallis test was used to determine whether there was a significant difference among the three techniques. Results: The 403 medical documents were relevant to 41 different diseases, and the average number of documents per diagnosis was 9.83 (standard deviation=3.46). The classification precisions were 0.78 (TF-IDF), 0.87 (LSA), and 0.79 (Word2Vec). There was a statistically significant difference among the three vectorization techniques. Conclusions: The results suggest that removing irrelevant information (LSA) is a more efficient vectorization technique than modifying the weights of vectorization models (TF-IDF, Word2Vec) for medical document classification.
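
The cosine-similarity comparison of TF-IDF and LSA vectors mentioned in this abstract can be sketched as follows. The four toy documents, the choice of two LSA components, and the `nearest` helper are hypothetical and only illustrate the mechanics; Word2Vec is omitted because it is not part of Scikit-learn.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy documents (assumption): two cardiac notes and two fracture notes.
docs = [
    "chest pain and shortness of breath, suspected myocardial infarction",
    "acute myocardial infarction with chest pain",
    "fracture of the left femur after a fall",
    "femur fracture, surgical fixation planned",
]

# TF-IDF vectors and their pairwise cosine similarity.
tfidf = TfidfVectorizer().fit_transform(docs)
sim_tfidf = cosine_similarity(tfidf)

# LSA: truncated SVD applied to the TF-IDF matrix, then cosine similarity.
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
sim_lsa = cosine_similarity(lsa)


def nearest(sim, i):
    """Index of the document most similar to document i, excluding itself."""
    row = sim[i].copy()
    row[i] = -1.0
    return int(row.argmax())


print("TF-IDF nearest to doc 0:", nearest(sim_tfidf, 0))
print("LSA nearest to doc 0:", nearest(sim_lsa, 0))
```

Classifying a document by the label of its nearest neighbor under each similarity matrix is one simple way to compare the vectorizations.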

Prediction of the Number of Crimes according to Urban Environmental Factors in the Metropolitan Area (수도권 도시 환경 요인에 따른 범죄 발생 건수 예측)

  • Ye-Won Jang;Ye-Lim Kim;Si-Hyeon Park;Jae-Young Lee;Yoo-Jin Moon
    • Proceedings of the Korean Society of Computer Information Conference / 2023.01a / pp.321-322 / 2023
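  • This paper proposes a model that predicts the number of crimes according to urban environmental factors in the metropolitan area, using the LinearRegression model of the Scikit-learn package and a Keras deep learning model. As the research method, datasets for each autonomous district in the metropolitan area that were judged to be meaningfully related to crime were analyzed, confirming that the numbers of CCTV cameras, police substations, and streetlights have a significant effect on crime occurrence. Normalization was applied to reduce scale differences among the independent variables, and a log transformation was applied to the dependent variable to secure normality. The 'relu' function, which is used in regression problems, was used as the loss function, and MSE (Mean Squared Error) was used as the metric for checking model performance. The program designed in this paper is expected to provide a basis for practical measures, such as deploying additional police personnel and expanding safety facilities, in districts with high crime rates.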
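
The preprocessing this abstract describes (normalized independent variables, log-transformed dependent variable, MSE as the metric) might look roughly like the following with Scikit-learn's LinearRegression. The district data are synthetic, and the use of `MinMaxScaler`, `TransformedTargetRegressor`, and `log1p` are assumptions about how such a pipeline could be written; the Keras model is not covered.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic district-level features (assumption): counts of CCTV cameras,
# police substations, and streetlights for 60 hypothetical districts.
rng = np.random.default_rng(0)
X = rng.uniform(low=[100, 5, 1000], high=[3000, 40, 20000], size=(60, 3))
log_y = (5.0 + 3e-4 * X[:, 0] + 2e-2 * X[:, 1] + 5e-5 * X[:, 2]
         + rng.normal(scale=0.1, size=60))
y = np.exp(log_y)  # skewed, strictly positive crime counts

# Normalize the features, fit on log1p(y), and map predictions back with expm1.
model = TransformedTargetRegressor(
    regressor=make_pipeline(MinMaxScaler(), LinearRegression()),
    func=np.log1p,
    inverse_func=np.expm1,
)
model.fit(X, y)
pred = model.predict(X)

# Evaluate on the log scale, where the linear model was actually fitted.
mse_log = mean_squared_error(np.log1p(y), np.log1p(pred))
r2_log = r2_score(np.log1p(y), np.log1p(pred))
print(f"MSE (log scale)={mse_log:.4f}  R^2 (log scale)={r2_log:.3f}")
```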

Creating a Smartphone User Recommendation System Using Clustering (클러스터링을 이용한 스마트폰 사용자 추천 시스템 만들기)

  • Jin Hyoung AN
    • Journal of Korea Artificial Intelligence Association / v.2 no.1 / pp.1-6 / 2024
  • In this paper, we develop an AI-based recommendation system that matches the specifications of smartphones from company 'S'. The system aims to simplify the complex decision-making process of consumers and guide them to choose the smartphone that best suits their daily needs. The recommendation system analyzes five specifications of smartphones (price, battery capacity, weight, camera quality, capacity) to help users make informed decisions without searching for extensive information. This approach not only saves time but also improves user satisfaction by ensuring that the selected smartphone closely matches the user's lifestyle and needs. The system utilizes unsupervised learning, i.e. clustering (K-MEANS, DBSCAN, Hierarchical Clustering), and provides personalized recommendations by evaluating them with silhouette scores, ensuring accurate and reliable grouping of similar smartphone models. By leveraging advanced data analysis techniques, the system can identify subtle patterns and preferences that might not be immediately apparent to consumers, enhancing the overall user experience. The ultimate goal of this AI recommendation system is to simplify the smartphone selection process, making it more accessible and user-friendly for all consumers. This paper discusses the data collection, preprocessing, development, implementation, and potential impact of the system using Pandas, crawling, scikit-learn, etc., and highlights the benefits of helping consumers explore the various options available and confidently choose the smartphone that best suits their daily lives.
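
Evaluating the three clustering methods named in this abstract with silhouette scores can be sketched as below. The data are synthetic blobs standing in for the five smartphone specifications, and the cluster count and DBSCAN parameters are illustrative assumptions.

```python
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption): 150 "phones" with 5 numeric specifications.
X, _ = make_blobs(n_samples=150, centers=3, n_features=5,
                  cluster_std=1.0, random_state=0)
X = StandardScaler().fit_transform(X)

models = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "dbscan": DBSCAN(eps=1.0, min_samples=5),
    "hierarchical": AgglomerativeClustering(n_clusters=3),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    n_labels = len(set(labels))
    # silhouette_score needs between 2 and n_samples - 1 distinct labels;
    # DBSCAN's noise label (-1) is counted as its own group here.
    if 2 <= n_labels < len(X):
        scores[name] = silhouette_score(X, labels)
    else:
        scores[name] = float("nan")

for name, score in scores.items():
    print(f"{name}: silhouette={score:.3f}")
```

Scores closer to 1 indicate tighter, better-separated clusters, which is the basis for choosing among the methods.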

Anomaly Detection System in Mechanical Facility Equipment: Using Long Short-Term Memory Variational Autoencoder (LSTM-VAE를 활용한 기계시설물 장치의 이상 탐지 시스템)

  • Seo, Jaehong;Park, Junsung;Yoo, Joonwoo;Park, Heejun
    • Journal of Korean Society for Quality Management / v.49 no.4 / pp.581-594 / 2021
  • Purpose: The purpose of this study is to compare machine learning models for anomaly detection in mechanical facility equipment and to suggest an anomaly detection system for such equipment in subway stations. The system helps to predict failures and plan facility maintenance, and ultimately aims to improve the quality of facility equipment. Methods: Data collected from Daejeon Metropolitan Rapid Transit Corporation were used in this experiment. The experiment was performed using Python, Scikit-learn, and TensorFlow 2.0 for preprocessing and machine learning, and was conducted for two failure states of the equipment. We compared and analyzed five unsupervised machine learning models, focusing on the Long Short-Term Memory Variational Autoencoder (LSTM-VAE). Results: In both experiments, changes in the vibration and current data were observed when a defect was present. When a rotating body failure occurred, the magnitude of vibration increased but the current decreased. In the case of axis alignment failure, both vibration and current increased. In addition, the LSTM-VAE model showed higher accuracy than the other four baseline models. Conclusion: According to the results, the LSTM-VAE model showed outstanding performance, with more than 97% accuracy in the experiments. Thus, the quality of mechanical facility equipment will be improved if the proposed anomaly detection system is established using this model.

Covid19 trends predictions using time series data (시계열 데이터를 활용한 코로나19 동향 예측)

  • Kim, Jae-Ho;Kim, Jang-Young
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.7 / pp.884-889 / 2021
  • The number of people infected with Covid-19 in Korea seemed to be gradually decreasing thanks to various efforts such as social distancing and vaccines. However, just as the number of infected people increased after a particular incident on February 20, 2020, it has been increasing rapidly since December 2020, by approximately 500 per day. Therefore, future Covid-19 trends are predicted with the Prophet algorithm using a Kaggle dataset, and the explanatory power of this prediction is supported by the coefficient of determination, mean absolute error, mean percent error, mean square difference, and mean square deviation computed with Scikit-learn. Moreover, assuming no specific incident that rapidly increases Covid-19 cases, the proposed method predicts the number of infected people in Korea and emphasizes the importance of implementing epidemic prevention and quarantine rules for future diseases.
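
The evaluation metrics listed in this abstract correspond, under one plausible reading, to R^2, MAE, MAPE, MSE, and RMSE, all of which can be computed with Scikit-learn. The mapping of the abstract's metric names to these functions is an assumption, and the four observed/predicted values are made up for illustration; the Prophet forecast itself is not shown.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)

# Made-up daily case counts and forecasts (assumption, not the paper's data).
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 310.0, 390.0])

r2 = r2_score(y_true, y_pred)                           # coefficient of determination
mae = mean_absolute_error(y_true, y_pred)               # mean absolute error
mape = mean_absolute_percentage_error(y_true, y_pred)   # returned as a fraction
mse = mean_squared_error(y_true, y_pred)                # mean squared error
rmse = float(np.sqrt(mse))                              # root mean squared error

print(f"R^2={r2:.3f} MAE={mae:.1f} MAPE={mape:.4f} MSE={mse:.1f} RMSE={rmse:.1f}")
```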

A Machine Learning-Based Encryption Behavior Cognitive Technique for Ransomware Detection (랜섬웨어 탐지를 위한 머신러닝 기반 암호화 행위 감지 기법)

  • Yoon-Cheol Hwang
    • Journal of Industrial Convergence / v.21 no.12 / pp.55-62 / 2023
  • Recent ransomware attacks employ various techniques and pathways, posing significant challenges for early detection and defense. Consequently, the scale of damage is continually growing. This paper introduces a machine learning-based approach for effective ransomware detection that focuses on file encryption and encryption patterns, which are pivotal functionalities of ransomware. Ransomware is identified by analyzing encryption behavior and encryption patterns, making it possible to detect both specific ransomware variants and new types of ransomware, thereby mitigating ransomware attacks effectively. The proposed encryption behavior detection technique extracts encryption and encryption pattern characteristics and uses them to train a machine learning classifier. The final outcome is an ensemble of the results from two classifiers. The classifier plays a key role in determining the presence or absence of ransomware, leading to enhanced accuracy. The proposed technique is implemented using Python's numpy, pandas, and Scikit-Learn libraries. Evaluation indicators reveal an average accuracy of 94%, precision of 95%, recall of 93%, and an F1 score of 95%. These results validate the feasibility of ransomware detection through encryption behavior analysis, and further research is encouraged to enhance the technique for proactive ransomware detection.
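
An ensemble of two classifiers evaluated with accuracy, precision, recall, and F1, as this abstract describes, could be sketched with Scikit-learn's VotingClassifier. The abstract does not name the two classifiers, so the random forest and logistic regression used here, the soft-voting scheme, and the synthetic features are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for encryption-behavior features (1 = ransomware).
X, y = make_classification(n_samples=600, n_features=10,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Soft voting averages the predicted class probabilities of both models.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
y_pred = ensemble.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The F1 score is the harmonic mean of precision and recall, so it always lies between the two.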