• Title/Summary/Keyword: machine learning framework

A Hybrid Multi-Level Feature Selection Framework for prediction of Chronic Disease

  • G.S. Raghavendra;Shanthi Mahesh;M.V.P. Chandrasekhara Rao
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.12
    • /
    • pp.101-106
    • /
    • 2023
  • Chronic illnesses are among the most common serious problems affecting human health. Early diagnosis of chronic diseases can help avoid or mitigate their consequences, potentially decreasing mortality rates. Using machine learning algorithms to identify risk factors is an exciting strategy. The issue with existing feature selection approaches is that each method yields a different set of features, which affects model accuracy, and present methods do not perform well on large multidimensional datasets. We introduce a novel model with a feature selection approach that selects optimal features from large multidimensional datasets to provide reliable predictions of chronic illnesses without sacrificing data uniqueness.[1] To support the proposed model, we balanced the classes by applying hybrid balanced class sampling methods to the original dataset, together with data pre-processing and data transformation, to provide credible data for training. We ran and assessed our model on datasets with binary and multivalued classifications, using multiple datasets (Parkinson's, arrhythmia, breast cancer, kidney, diabetes). Suitable features are selected by a hybrid feature model consisting of LassoCV, decision tree, random forest, gradient boosting, AdaBoost, and stochastic gradient descent, followed by a vote over the attributes that these methods output in common. The accuracy on the original dataset before applying the framework is recorded and compared with the accuracy on the reduced set of attributes. The results are shown separately to provide comparisons. Based on the result analysis, we conclude that our proposed model achieved higher accuracy on multivalued class datasets than on binary class datasets.[1]
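
The voting step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function name `vote_features`, the `min_votes` threshold, and the toy feature names are hypothetical and do not come from the paper, which does not state its exact voting rule here.

```python
from collections import Counter


def vote_features(selections, min_votes):
    """Keep features chosen by at least `min_votes` selectors.

    `selections` maps a selector name to the features it picked.
    The threshold rule is an assumption, not the paper's stated rule.
    """
    # Count each feature once per selector, even if a selector lists it twice.
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical outputs of three selectors on a toy dataset.
selections = {
    "lasso": ["age", "bp", "glucose"],
    "tree": ["age", "glucose", "bmi"],
    "forest": ["age", "bp", "glucose"],
}

# Requiring all three votes keeps only the features common to every selector.
selected = vote_features(selections, min_votes=3)
print(selected)  # ['age', 'glucose']
```

Setting `min_votes` to the number of selectors keeps only the features common to all of them; a lower value gives a majority-style vote.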

Machine Learning-based Estimation of the Concentration of Fine Particulate Matter Using Domain Adaptation Method (Domain Adaptation 방법을 이용한 기계학습 기반의 미세먼지 농도 예측)

  • Kang, Tae-Cheon;Kang, Hang-Bong
    • Journal of Korea Multimedia Society
    • /
    • v.20 no.8
    • /
    • pp.1208-1215
    • /
    • 2017
  • Recently, public attention to and concern about fine particulate matter have been increasing. Because of construction and maintenance costs, there are too few air quality monitoring stations. As a result, people have limited information about the concentration of fine particulate matter, depending on their location. Studies have been undertaken to estimate fine particle concentrations in areas without a measurement station, yet these estimates cannot take account of other factors that affect the concentration of fine particles. To solve these problems, we propose a framework for estimating the concentration of fine particulate matter in a specific area using meteorological data and traffic data. Since there are more grids without a monitoring station than grids with one, we used a domain adversarial neural network based on the domain adaptation method. The features extracted from meteorological data and traffic data are learned by the network, and the air quality index of the corresponding area is then predicted by the resulting model. Experimental results demonstrate that, as the amount of source data increases, the proposed method outperforms the method using conditional random fields.

An evolutionary system for the prediction of high performance concrete strength based on semantic genetic programming

  • Castelli, Mauro;Trujillo, Leonardo;Goncalves, Ivo;Popovic, Ales
    • Computers and Concrete
    • /
    • v.19 no.6
    • /
    • pp.651-658
    • /
    • 2017
  • High-performance concrete, besides aggregate, cement, and water, incorporates supplementary cementitious materials, such as fly ash and blast furnace slag, and chemical admixtures, such as superplasticizer. Hence, it is a highly complex material, and modeling its behavior is a difficult task. This paper presents an evolutionary system for the prediction of high-performance concrete strength. The proposed framework blends a recently developed version of genetic programming with a local search method. The resulting system enables us to build a model that produces an accurate estimate of the considered parameter. Experimental results show the suitability of the proposed system for the prediction of concrete strength. The proposed method produces a lower error than the state-of-the-art technique. The paper provides two contributions: from the point of view of high-performance concrete strength prediction, a system able to outperform existing state-of-the-art techniques is defined; from the machine learning perspective, this case study shows that including a local searcher in the geometric semantic genetic programming system can speed up the convergence of the search process.

An Improvement of Accuracy for NaiveBayes by Using Large Word Sets (빈발단어집합을 이용한 NaiveBayes의 정확도 개선)

  • Lee Jae-Moon
    • Journal of Internet Computing and Services
    • /
    • v.7 no.3
    • /
    • pp.169-178
    • /
    • 2006
  • In this paper, we define large word sets, which are novel variations of the large item sets used in mining association rules, and improve the accuracy of NaiveBayes based on them. To use them, a document is divided into several paragraphs, and each paragraph is then transformed into a transaction by extracting the words in it. The proposed method was implemented using the AI::Categorizer framework, and its accuracy was measured experimentally on the Reuters-21578 data set. The experimental results show that the proposed method improves the accuracy of conventional NaiveBayes.

Removing Out-Of-Distribution Samples on Classification Task

  • Dang, Thanh-Vu;Vo, Hoang-Trong;Yu, Gwang-Hyun;Lee, Ju-Hwan;Nguyen, Huy-Toan;Kim, Jin-Young
    • Smart Media Journal
    • /
    • v.9 no.3
    • /
    • pp.80-89
    • /
    • 2020
  • Out-of-distribution (OOD) samples are frequently encountered when deploying a classification model in many real-world machine learning-based applications. Those samples are normally drawn from far outside the training distribution, but many classifiers still assign them a high confidence of belonging to one of the training categories. In this study, we address the problem of removing OOD examples by estimating the marginal density using a variational autoencoder (VAE). We also investigate other suitable methods, such as temperature scaling, Gaussian discriminant analysis, and label smoothing. We use the Chonnam National University (CNU) weeds dataset as the in-distribution dataset and CIFAR-10 and CalTeach as the OOD datasets. Quantitative results show that the proposed framework can reject the OOD test samples with a suitable threshold.
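
The thresholding idea in this abstract can be illustrated with a small sketch. It does not reproduce the paper's VAE density estimate; it assumes instead a temperature-scaled softmax confidence as the score, which is only one of the alternatives the abstract mentions. The names `softmax` and `is_ood`, the logits, and the threshold value are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature flattens the output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def is_ood(logits, threshold, temperature=1.0):
    """Flag a sample as OOD when its top class probability is below `threshold`."""
    return max(softmax(logits, temperature)) < threshold


# A peaked prediction is kept, a nearly flat one is rejected.
print(is_ood([5.0, 0.0, 0.0], threshold=0.5))  # False
print(is_ood([0.1, 0.0, 0.0], threshold=0.5))  # True
```

In practice the threshold would be chosen on held-out data; the value 0.5 here is arbitrary.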

Convolutional Neural Network-based System for Vehicle Front-Side Detection (컨볼루션 신경망 기반의 차량 전면부 검출 시스템)

  • Park, Young-Kyu;Park, Je-Kang;On, Han-Ik;Kang, Dong-Joong
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.21 no.11
    • /
    • pp.1008-1016
    • /
    • 2015
  • This paper proposes a method for detecting the front side of vehicles. The method can find the side of a car that carries a license plate even against complicated and cluttered backgrounds. A convolutional neural network (CNN) is used to solve the detection problem as a unified framework combining feature detection, classification, searching, and localization estimation, which improves the reliability of the system while keeping it simple to use. The proposed CNN structure avoids a sliding-window search for vehicle locations and reduces the computing time to achieve real-time processing. Multiple responses of the network for vehicle position are further processed by weighted clustering and a probabilistic threshold decision method. Experiments using real images from parking lots show the reliability of the method.

Prototype-based Classifier with Feature Selection and Its Design with Particle Swarm Optimization: Analysis and Comparative Studies

  • Park, Byoung-Jun;Oh, Sung-Kwun
    • Journal of Electrical Engineering and Technology
    • /
    • v.7 no.2
    • /
    • pp.245-254
    • /
    • 2012
  • In this study, we introduce a prototype-based classifier with feature selection that relies on Particle Swarm Optimization (PSO), a biologically inspired optimization technique. The design comprises two main phases. In the first phase, PSO selects P% of patterns to be treated as prototypes of c classes. During the second phase, PSO forms a core set of features comprising the most meaningful and highly discriminative coordinates of the original feature space. The proposed feature selection scheme is developed in the wrapper mode, with performance evaluated by the nearest prototype classifier. The study offers a complete algorithmic framework and demonstrates the effectiveness (quality of solution) and efficiency (computing cost) of the approach when applied to a collection of selected data sets. We also include a comparative study involving genetic algorithms (GAs). Numerical experiments show that a suitable selection of prototypes and a substantial reduction of the feature space can be accomplished, and that the classifier formed in this manner has a low classification error. In addition, the advantage of PSO is quantified in detail by running a number of experiments using Machine Learning datasets.
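
The nearest prototype classifier used for evaluation in this abstract can be sketched as below. The PSO search itself is not shown; the prototypes and the feature mask are simply given as inputs, and the squared Euclidean distance, the function name `nearest_prototype`, and the toy data are assumptions made for illustration.

```python
def nearest_prototype(x, prototypes, feature_mask):
    """Return the label of the closest prototype, using only masked-in features.

    `prototypes` is a list of (vector, label) pairs and `feature_mask` is a
    list of booleans marking which coordinates are kept. Squared Euclidean
    distance is an assumption; the abstract does not name the metric.
    """
    def distance(a, b):
        return sum(
            (ai - bi) ** 2
            for ai, bi, keep in zip(a, b, feature_mask)
            if keep
        )

    return min(prototypes, key=lambda p: distance(x, p[0]))[1]


# Hypothetical prototypes; the third coordinate acts as a distracting feature.
prototypes = [([0, 0, 9], "a"), ([5, 5, 0], "b")]
sample = [1, 1, 0]

print(nearest_prototype(sample, prototypes, [True, True, False]))  # a
print(nearest_prototype(sample, prototypes, [True, True, True]))   # b
```

The two calls show why the feature mask matters: dropping the third coordinate changes which prototype is nearest. In a wrapper scheme, an optimizer would score each candidate mask by the classification error this function produces.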

Face Annotation System for Social Network Environments (소셜 네트웍 환경에서의 얼굴 주석 시스템)

  • Chai, Kwon-Taeg;Byun, Hye-Ran
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.15 no.8
    • /
    • pp.601-605
    • /
    • 2009
  • Recently, photo-sharing and publishing-based Social Network Sites (SNSs) have increasingly attracted the attention of academic and industry researchers. Millions of users have integrated these sites into their daily practices to communicate with people online. In this paper, we propose an efficient face annotation and retrieval system for SNSs. Since the system needs to deal with a huge database consisting of an increasing number of users and images, both effectiveness and efficiency are required. To deal with this problem, we propose a face annotation classifier that adopts an online learning and social decomposition approach. The proposed method is shown to have comparable accuracy to, and better efficiency than, the widely used Support Vector Machine. Consequently, the proposed framework can reduce the tedious effort users spend annotating face images and provide a fast response to millions of users.

OryzaGP: rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre;Do, Huy;Wang, Yue
    • Genomics & Informatics
    • /
    • v.17 no.2
    • /
    • pp.17.1-17.3
    • /
    • 2019
  • Text mining has become an important research method in biology. Its original purpose was to extract biological entities, such as genes, proteins, and phenotypic traits, to extend knowledge from scientific papers. However, few thorough studies on text mining and application development have been performed for plant molecular biology data, especially for rice, resulting in a lack of datasets for named-entity recognition tasks for this species. Since few benchmarks are available for rice, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches to automatically extracting information about gene/protein entities, we built a new dataset for rice as a benchmark. This dataset is composed of a set of titles and abstracts, extracted from scientific papers focusing on the rice species and downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task of rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using the dataset, to facilitate an open comparison and evaluation of different approaches to the task.

Development of water distribution systems performance evaluation framework using machine learning technique (머신러닝을 이용한 상수도시스템 성능평가 프레임워크 개발)

  • Min Jun Kim;Ryul Kim;Hui Geun Kwon;Young Hwan Choi
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2023.05a
    • /
    • pp.204-204
    • /
    • 2023
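  • According to the 2020 waterworks statistics, the national water supply coverage rate is high, at about 99%, but pipe breaks and water quality incidents caused by aging pipes make effective operation difficult. To address these problems, systematic maintenance regulations such as technical diagnosis and precision safety diagnosis have been introduced and applied, and a scoring method consisting of indirect and direct evaluation is used for quantitative performance evaluation of the system. Indirect evaluation estimates the deterioration of buried pipes from deterioration factors such as installation year, pipe diameter, and pipe length. For pipes rated grade 3 by the indirect evaluation, direct evaluation, such as specimen sampling and internal pipe inspection, is performed to assess the pipe condition objectively. However, because of constraints such as diagnosis cost and time, direct evaluation cannot be carried out at every location for all pipes rated grade 3. Therefore, in this study, we developed an integrated evaluation technique for water distribution systems to overcome this limitation of pipe performance evaluation methods. The developed technique applies machine learning to predict, from indirect and direct evaluation results, the outcome at locations where direct evaluation is needed. This is expected to improve the evaluation performance of water distribution systems and to support decision makers in prioritizing reinforcement.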
