• Title/Summary/Keyword: ML-based Data Analysis

A Study on PCS for ML-Based Electrical Propulsion System (ML 기반의 전기추진시스템을 위한 PCS에 관한 연구)

  • Lee, Jong-Hak; Lee, Hun-Seok; Oh, Jin-Seok
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.9 / pp.1025-1031 / 2019
  • This study proposes a PCS that enables efficient operation of seawater pumps for ships by implementing ML-based algorithms. Seawater temperature, RPM, and power consumption data are acquired from two ships equipped with PCS and analyzed by regression analysis, and new algorithms are presented. Using the algorithms presented, Ship A saved about 36% compared to the PCS application, and the ML-based algorithms were about 1% lower than Ship A's PCS at sea temperatures of 19 to 27 degrees Celsius and above 32 degrees Celsius. Ship B saved about 50% compared to operation without PCS, and about 2% more than Ship B's PCS in waters above 19 degrees Celsius, a specified sea temperature. The derived data can be used to suggest the optimum pump speed and sea route. In addition, the trend of the acquired data can be used to infer the performance of the pump or the timing of elimination of the MGPS when efficiency becomes poor.
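Illustrative sketch (not from the paper): the abstract describes regression analysis over seawater temperature, RPM, and power consumption. A minimal ordinary least squares fit of pump RPM against seawater temperature could look like the following. The readings are synthetic placeholders, and `fit_line` is a hypothetical helper name; the paper's actual model and variables may differ.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Synthetic, illustrative readings only (assumption, not data from the study).
sea_temp_c = [19.0, 21.0, 23.0, 25.0, 27.0]
pump_rpm = [900.0, 960.0, 1020.0, 1080.0, 1140.0]

slope, intercept = fit_line(sea_temp_c, pump_rpm)

# Use the fitted line to suggest a pump speed at an unseen temperature.
predicted_rpm = slope * 24.0 + intercept
```

A fitted line of this kind is one simple way to turn logged operating data into a suggested pump speed for a given sea temperature.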

Towards Effective Analysis and Tracking of Mozilla and Eclipse Defects using Machine Learning Models based on Bugs Data

  • Hassan, Zohaib; Iqbal, Naeem; Zaman, Abnash
    • Soft Computing and Machine Intelligence / v.1 no.1 / pp.1-10 / 2021
  • Analysis and tracking of bug reports is a challenging field in software repository mining. It is one of the fundamental ways to explore the large amount of data acquired from defect tracking systems and to discover patterns and valuable knowledge about the process of bug triaging. Furthermore, bug data is publicly accessible from systems such as Bugzilla and JIRA. Moreover, with robust machine learning (ML) techniques, it is possible to process and analyze a massive amount of data to extract underlying patterns, knowledge, and insights. Therefore, it is an interesting area in which to propose innovative and robust solutions to analyze and track bug reports originating from different open source projects, including Mozilla and Eclipse. This research study presents an ML-based classification model to analyze and track bug defects for enhancing software engineering management (SEM) processes. In this work, Artificial Neural Network (ANN) and Naive Bayesian (NB) classifiers are implemented using open-source bug datasets, such as Mozilla and Eclipse. Furthermore, different evaluation measures are employed to analyze and evaluate the experimental results. Moreover, a comparative analysis of the experimental results of ANN and NB is given. The experimental results indicate that the ANN achieved higher accuracy than the NB. The proposed research study will enhance SEM processes and contribute to the body of knowledge of the data mining field.

The inference and estimation for latent discrete outcomes with a small sample

  • Choi, Hyung; Chung, Hwan
    • Communications for Statistical Applications and Methods / v.23 no.2 / pp.131-146 / 2016
  • In research on behavioral studies, significant attention has been paid to the stage-sequential process for longitudinal data. Latent class profile analysis (LCPA) is a useful method to study sequential patterns of behavioral development through a two-step identification process: identifying a small number of latent classes at each measurement occasion, and then two or more homogeneous subgroups in which individuals exhibit a similar sequence of latent class membership over time. Maximum likelihood (ML) estimates for LCPA are easily obtained by the expectation-maximization (EM) algorithm, and Bayesian inference can be implemented via Markov chain Monte Carlo (MCMC). However, unusual properties in the likelihood of LCPA can cause difficulties in ML and Bayesian inference as well as estimation in small samples. This article describes and addresses erratic problems that involve conventional ML and Bayesian estimates for LCPA with small samples. We argue that these problems can be alleviated with a small amount of prior input. This study evaluates the performance of likelihood and MCMC-based estimates with the proposed prior in drawing inference over repeated sampling. Our simulation shows that estimates from the proposed methods perform better than those from the conventional ML and Bayesian methods.

Design of Distributed Cloud System for Managing large-scale Genomic Data

  • Seine Jang; Seok-Jae Moon
    • International Journal of Internet, Broadcasting and Communication / v.16 no.2 / pp.119-126 / 2024
  • The volume of genomic data is constantly increasing in various modern industries and research fields. This growth presents new challenges and opportunities in terms of the quantity and diversity of genetic data. In this paper, we propose a distributed cloud system for integrating and managing large-scale gene databases. By introducing a distributed data storage and processing system based on the Hadoop Distributed File System (HDFS), various formats and sizes of genomic data can be efficiently integrated. Furthermore, by leveraging Spark on YARN, efficient management of distributed cloud computing tasks and optimal resource allocation are achieved. This establishes a foundation for the rapid processing and analysis of large-scale genomic data. Additionally, by utilizing BigQuery ML, machine learning models are developed to support genetic search and prediction, enabling researchers to more effectively utilize data. It is expected that this will contribute to driving innovative advancements in genetic research and applications.

Xperanto: A Web-Based Integrated System for DNA Microarray Data Management and Analysis

  • Park, Ji Yeon; Park, Yu Rang; Park, Chan Hee; Kim, Ji Hoon; Kim, Ju Ha
    • Genomics & Informatics / v.3 no.1 / pp.39-42 / 2005
  • DNA microarray is a high-throughput biomedical technology that monitors gene expression for thousands of genes in parallel. The abundance and complexity of gene expression data have given rise to a requirement for their systematic management and analysis to support the many laboratories performing microarray research. To meet these demands, we developed Xperanto for integrated data management and analysis using a user-friendly web-based interface. Xperanto provides an integrated environment for management and analysis by linking computational tools and rich sources of biological annotation. With the growing need for data sharing, it is designed to be compliant with MGED (Microarray Gene Expression Data) standards for microarray data annotation and exchange. Xperanto enables fast and efficient management of vast amounts of data, and serves as a communication channel among multiple researchers within an emerging interdisciplinary field.

Applications of Machine Learning Models on Yelp Data

  • Ruchi Singh; Jongwook Woo
    • Asia Pacific Journal of Information Systems / v.29 no.1 / pp.35-49 / 2019
  • The paper documents the application of relevant Machine Learning (ML) models to the Yelp dataset (Yelp is a crowd-sourced local business review and social networking site) to analyze, predict, and recommend businesses. Two cloud platforms are used strategically to minimize the effort and time required for the project. Seven machine learning algorithms are implemented in Azure ML, four of which are also implemented in Databricks Spark ML. The analyzed Yelp business dataset contains 70 business attributes for more than 350,000 registered businesses. Additionally, review tips and likes from 500,000 users have been processed for the project. A Recommendation Model is built to provide Yelp users with recommendations for business categories based on their previous business ratings, as well as the business ratings of other users. A Classification Model is implemented to predict the popularity of a business, defining a popular business as one with more than 3 stars and an unpopular business as one with fewer than 3 stars. A Text Analysis model is developed by comparing two algorithms, uni-gram feature extraction and n-feature extraction, in Azure ML Studio and a logistic regression model in Spark. Comparative conclusions are drawn about the efficiency of Spark ML and Azure ML for these models.
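Illustrative sketch (not from the paper): the popularity rule stated in the abstract can be written as a small labeling function. The function name `popularity_label` is hypothetical, and returning `None` for exactly 3 stars is an assumption, since the abstract does not say how that case is handled.

```python
def popularity_label(stars):
    """Label a business from its star rating using the abstract's thresholds."""
    if stars > 3:
        return "popular"
    if stars < 3:
        return "unpopular"
    # Exactly 3 stars is not covered by the stated rule (assumption: leave unlabeled).
    return None


# Hypothetical ratings, for illustration only.
labels = [popularity_label(s) for s in (4.5, 2.0, 3.0)]
```

Writing the rule out makes the gap at exactly 3 stars visible; a real pipeline would need to drop or assign those rows explicitly.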

Revolutionizing Brain Tumor Segmentation in MRI with Dynamic Fusion of Handcrafted Features and Global Pathway-based Deep Learning

  • Faizan Ullah; Muhammad Nadeem; Mohammad Abrar
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.1 / pp.105-125 / 2024
  • Gliomas are the most common malignant brain tumor and cause the most deaths. Manual brain tumor segmentation is expensive, time-consuming, error-prone, and dependent on the radiologist's expertise and experience. Manual segmentation outcomes by different radiologists for the same patient may differ. Thus, more robust and dependable methods are needed. Medical imaging researchers have produced numerous semi-automatic and fully automatic brain tumor segmentation algorithms using ML pipelines and accurate (handcrafted feature-based, etc.) or data-driven strategies. Current methods use CNNs or handcrafted features such as symmetry analysis, alignment-based feature analysis, or textural qualities. CNN approaches provide unsupervised features, while handcrafted features model domain knowledge. Cascaded algorithms may outperform purely feature-based methods or data-driven methods such as CNNs. A cascaded strategy is presented that supplies the CNN with prior information from handcrafted feature-based ML algorithms. Each patient has a manual ground truth and four MRI modalities (T1, T1c, T2, and FLAIR). Handcrafted features and deep learning are used to segment brain tumors in a Global Convolutional Neural Network (GCNN). The proposed GCNN architecture with two parallel CNNs, CSPathways CNN (CSPCNN) and MRI Pathways CNN (MRIPCNN), segmented BraTS brain tumors with high accuracy. The proposed model achieved a Dice score of 87%, higher than the state of the art. This research could improve brain tumor segmentation, helping clinicians diagnose and treat patients.
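Illustrative sketch (not from the paper): the Dice score reported above is the standard overlap measure 2|A ∩ B| / (|A| + |B|) between a predicted mask A and a ground-truth mask B. The flat binary masks below are made-up examples, and returning 1.0 when both masks are empty is a common convention assumed here.

```python
def dice_score(pred, truth):
    """Dice coefficient between two equal-length binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0
    return 2.0 * intersection / total


# Hypothetical flattened masks: 1 = tumor voxel, 0 = background.
predicted_mask = [1, 1, 0, 0, 1]
ground_truth_mask = [1, 0, 0, 1, 1]

dice = dice_score(predicted_mask, ground_truth_mask)
```

Here two voxels overlap out of three predicted and three true tumor voxels, so the score is 2 * 2 / 6.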

Surface-Engineered Graphene surface-enhanced Raman scattering Platform with Machine-learning Enabled Classification of Mixed Analytes

  • Jae Hee Cho; Garam Bae; Ki-Seok An
    • Journal of Sensor Science and Technology / v.33 no.3 / pp.139-146 / 2024
  • Surface-enhanced Raman scattering (SERS) enables the detection of various types of π-conjugated biological and chemical molecules owing to its exceptional sensitivity in obtaining unique spectra, offering nondestructive classification capabilities for target analytes. Herein, we demonstrate an innovative strategy that provides significant machine learning (ML)-enabled predictive SERS platforms through surface-engineered graphene via complementary hybridization with Au nanoparticles (NPs). The hybridized Au NPs/graphene SERS platforms showed exceptional sensitivity (10⁻⁷ M) due to the collaborative strong correlation between the localized electromagnetic effect and the enhanced chemical bonding reactivity. The chemical and physical properties of the demonstrated SERS platform were systematically investigated using microscopy and spectroscopic analysis. Furthermore, an innovative strategy employing ML is proposed to predict various analytes based on a featured Raman spectral database. Using a customized data-preprocessing algorithm, the feature data for ML were extracted from the Raman peak characteristic information, such as intensity, position, and width, from the SERS spectrum data. Additionally, sophisticated evaluations of various types of ML classification models were conducted using k-fold cross-validation (k = 5), showing 99% prediction accuracy.
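Illustrative sketch (not from the paper): k-fold cross-validation with k = 5, as mentioned above, partitions the samples into five folds and uses each fold once as the held-out test set. The helper `k_fold_indices` is a hypothetical name; it splits contiguously without shuffling, which is a simplification (shuffled or stratified splits are common in practice).

```python
def k_fold_indices(n_samples, k=5):
    """Return k (train_indices, test_indices) pairs covering all samples."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


# Hypothetical dataset of 12 spectra split into 5 folds.
folds = k_fold_indices(12, k=5)
```

Each model would be trained on `train` and scored on `test` for every fold, and the five scores averaged to give a single accuracy estimate.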

An AutoML-driven Antenna Performance Prediction Model in the Autonomous Driving Radar Manufacturing Process

  • So-Hyang Bak; Kwanghoon Pio Kim
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.12 / pp.3330-3344 / 2023
  • This paper proposes an antenna performance prediction model for the autonomous driving radar manufacturing process. Our work is based on a challenge dataset, the Driving Radar Manufacturing Process Dataset, and a typical AutoML workflow engine, the open-source Python library PyCaret. The dataset contains a total of 70 data items, of which 54 are used as input features and 16 as output features, and it is suited to a multi-output regression problem. During the data regression analysis and preprocessing phase, we identified several input features with similar correlations and removed some of them, since they could become a serious cause of multicollinearity that affects overall model performance. In the training phase, we train a regression model for each output feature using the AutoML approach. Next, we select the top 5 models with the highest performance in the AutoML result reports and apply an ensemble method to improve on the selected models' performance. In the experimental evaluation of the regression prediction model, we used two metrics, MAE and RMSE, which came to 0.6928 and 1.2065, respectively. Additionally, we carried out a series of experiments to verify the proposed model's performance by comparing it with that of existing models. In conclusion, the AutoML-PyCaret and ensemble machine learning model enhances accuracy for safer autonomous vehicles, reduces manufacturing costs, and prevents the production of faulty radar systems, conserving resources. Ultimately, the proposed model holds significant promise not only for antenna performance but also for improving manufacturing quality and advancing radar systems in autonomous vehicles.
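Illustrative sketch (not from the paper): the two metrics named above, MAE and RMSE, follow their standard definitions. The values below are made-up and only show why RMSE is never smaller than MAE and grows faster when a single prediction is far off.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical targets and predictions with one large miss.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]

mae_value = mae(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
```

With one error of 4 across four samples, MAE is 1.0 while RMSE is 2.0, so a gap between the two metrics hints at a few large errors rather than uniformly small ones.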

Optimizing Input Parameters of Paralichthys olivaceus Disease Classification based on SHAP Analysis (SHAP 분석 기반의 넙치 질병 분류 입력 파라미터 최적화)

  • Kyung-Won Cho; Ran Baik
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.18 no.6 / pp.1331-1336 / 2023
  • In text-based fish disease classification using machine learning, the model has too many input parameters, yet they cannot be reduced arbitrarily without hurting performance. To solve this problem, this paper proposes a method of optimizing input parameters specialized for Paralichthys olivaceus disease classification using SHAP analysis. The proposed method includes preprocessing the disease information extracted from the halibut disease questionnaire by applying SHAP analysis, and evaluating a machine learning model using AutoML. In this way, the performance of the AutoML input parameters is evaluated and the optimal input parameter combination is derived. The proposed method is expected to maintain the existing performance while reducing the number of input parameters required, which will contribute to enhancing the efficiency and practicality of text-based Paralichthys olivaceus disease classification.
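Illustrative sketch (not from the paper): SHAP attributions are based on Shapley values, which average each feature's marginal contribution over all orderings of the features. The exact computation below is only feasible for a handful of features and does not use the SHAP library; `toy_value` is a hypothetical stand-in for a model's score given a subset of known inputs.

```python
from itertools import permutations


def shapley_values(value_fn, n_features):
    """Exact Shapley values: mean marginal contribution over all feature orderings."""
    totals = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        present = set()
        previous = value_fn(frozenset(present))
        for feature in order:
            present.add(feature)
            current = value_fn(frozenset(present))
            totals[feature] += current - previous
            previous = current
    return [total / len(orderings) for total in totals]


def toy_value(subset):
    """Hypothetical score: features 0 and 1 matter and interact; feature 2 is irrelevant."""
    score = 0.0
    if 0 in subset:
        score += 3.0
    if 1 in subset:
        score += 1.0
    if 0 in subset and 1 in subset:
        score += 2.0  # interaction term, shared equally between features 0 and 1
    return score


values = shapley_values(toy_value, 3)

# Rank features by attribution magnitude; near-zero ones are candidates for removal.
ranking = sorted(range(3), key=lambda f: abs(values[f]), reverse=True)
```

In this toy case feature 2 receives zero attribution, which is the kind of signal that would support dropping an input parameter while keeping model performance.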