Investigating Non-Laboratory Variables to Predict Diabetic and Prediabetic Patients from Electronic Medical Records Using Machine Learning

  • Mukhtar, Hamid;Al Azwari, Sana
    • International Journal of Computer Science & Network Security / v.21 no.9 / pp.19-30 / 2021
  • Diabetes mellitus (DM) is one of the common chronic diseases, leading to severe health complications that may cause death. The disease affects individuals, the community, and the government because of the continuous monitoring, lifelong commitment, and cost of treatment it entails. The World Health Organization (WHO) ranks Saudi Arabia among the top 10 countries in diabetes prevalence worldwide. Since most medical services are provided by the government, the cost of treatment in terms of hospital and clinic visits and lab tests represents a real burden due to the large scale of the disease. The ability to predict a patient's diabetic status without laboratory tests, by screening on some personal features, can lessen the health and economic burden caused by diabetes alone. The goal of this paper is to investigate the prediction of diabetic and prediabetic patients by considering factors other than the laboratory tests generally required by physicians. With data obtained from local hospitals, medical records were processed to obtain a dataset that classifies patients into three classes: diabetic, prediabetic, and non-diabetic. After applying three machine learning algorithms, we established good performance for accuracy, precision, and recall of the models on the dataset. Further analysis was performed on the data to identify important non-laboratory variables related to the patients for diabetes classification. The importance of five variables (gender, physical activity level, hypertension, BMI, and age) from a person's basic health data was investigated to find their contribution to a patient being diabetic, prediabetic, or normal. Our analysis showed strong agreement with the risk factors for diabetes and prediabetes stated by the American Diabetes Association (ADA) and other health institutions worldwide. We conclude that by performing class-specific analysis of the disease, important factors specific to the Saudi population can be identified, whose management can help control the disease. We also provide some recommendations learnt from this research.

Feature Selection Methodology in Quality Data Mining

  • Soo, Nam-Ho;Halim, Yulius
    • Proceedings of the Korean Operations and Management Science Society Conference / 2004.05a / pp.698-701 / 2004
  • In much of the literature, data mining has been used to exploit data warehouses and data collections. Its biggest applications are in marketing and research, largely because the data available in these fields is usually abundant. Data mining can also be extended to the production process. While the objects of study in marketing data mining are customers and products, data mining in the production field is concerned with the so-called 4M1E: man, machine, materials, method (recipe), and environment. All of these elements are important to the production process, which determines the quality of the product. Because the final aim of data mining in the production field is the quality of production, it is commonly known as quality data mining. As the variables studied in quality data mining can number in the hundreds or more, it can take a long time to extract information from the data warehouse. A feature selection methodology is proposed to help achieve the best performance in a relatively short time. Using readily available simple statistical tools in this method can help speed up the mining.

Value Weighted Regularized Logistic Regression Model (속성값 기반의 정규화된 로지스틱 회귀분석 모델)

  • Lee, Chang-Hwan;Jung, Mina
    • Journal of KIISE / v.43 no.11 / pp.1270-1274 / 2016
  • Logistic regression is widely used for predicting and estimating the relationships among variables. We propose a new logistic regression model, value weighted logistic regression, which uses a fine-grained weighting method that assigns an adapted weight to each feature value. A gradient approach is used to obtain the optimal weights of the feature values. Experiments were conducted on several data sets from the UCI machine learning repository, and the results revealed that the proposed method achieves meaningful improvement in prediction accuracy.
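
The idea of one weight per feature value, fitted by a gradient method, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the abstract does not give the exact weighting or regularization scheme, so the per-(feature, value) weight table, the L2 penalty `l2`, the learning rate, and the toy data are all hypothetical.

```python
import math

def train_value_weighted_lr(rows, labels, epochs=500, lr=0.1, l2=0.01):
    """Fit one weight per (feature index, feature value) pair by stochastic gradient descent.

    Assumption: categorical features, log-loss, and a plain L2 penalty on the weights.
    """
    weights = {}  # (feature_index, value) -> weight
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            z = bias + sum(weights.get((j, v), 0.0) for j, v in enumerate(x))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # derivative of the log-loss with respect to z
            bias -= lr * g
            for j, v in enumerate(x):
                w = weights.get((j, v), 0.0)
                weights[(j, v)] = w - lr * (g + l2 * w)
    return weights, bias

def predict_proba(weights, bias, x):
    """Probability of class 1; feature values never seen in training contribute 0."""
    z = bias + sum(weights.get((j, v), 0.0) for j, v in enumerate(x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data: the label depends only on the first feature.
rows = [("sunny", "high"), ("sunny", "low"), ("rainy", "high"), ("rainy", "low")]
labels = [1, 1, 0, 0]
weights, bias = train_value_weighted_lr(rows, labels)
p_sunny = predict_proba(weights, bias, ("sunny", "high"))
p_rainy = predict_proba(weights, bias, ("rainy", "low"))
```

In this sketch each value of each feature gets its own learned weight, rather than one coefficient per feature.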

Research on Purchase Decision Factors to TV Home Shopping Product: Digital·Home Appliance

  • Lee, Kwang-Keun;Jang, Si-Nam;Kim, Pan-Jin
    • Asian Journal of Business Environment / v.2 no.2 / pp.13-21 / 2012
  • Purpose - The purpose of this research was to suggest purchasing decision factors by understanding the context of purchasing behavior, and to identify variables related to purchasing decisions, purchasing cognition, and attitude. Research design / data / methodology - By random sampling, 200 consumers aged over 20 who had purchased digital/home appliances through TV home shopping and lived in the Seoul area were chosen as sample subjects. Questionnaire data were obtained from all subjects by self-administration. Results - The results of the analysis can be summarized as follows. For the cognition of digital/home appliance product features and their influence on purchasing intention, the features ranked in the following order: price (3.50), diversity (3.10), brand (3.00). For the cognition of TV home shopping features and their influence on the purchase of digital/home appliances, the features ranked in the following order: awareness (3.63), safety of delivery (3.38), safety of transaction (3.28), product test (3.27). Conclusions - Purchasing attention to TV home shopping features differed across the awareness, safety of delivery, safety of transaction, and product testing factors. To vitalize home shopping, the impossibility of confirming quality should be overcome and reinforcement of brand power should be considered.

A Prediction of Chip Quality using OPTICS (Ordering Points to Identify the Clustering Structure)-based Feature Extraction at the Cell Level (셀 레벨에서의 OPTICS 기반 특질 추출을 이용한 칩 품질 예측)

  • Kim, Ki Hyun;Baek, Jun Geol
    • Journal of Korean Institute of Industrial Engineers / v.40 no.3 / pp.257-266 / 2014
  • The semiconductor manufacturing industry is managed through a large number of parameters, from FAB, the initial step of production, to the package test, the final step. Various methods for predicting quality and yield are required to reduce the production costs caused by the complicated manufacturing process. To increase the accuracy of quality prediction, significant features must be extracted from a large amount of data. In this study, we propose a method for extracting features from the cell-level data of the probe test process using OPTICS, a density-based clustering algorithm, to improve the accuracy of predicting the quality of the assembled chips that will go on to the package test. Two features extracted using OPTICS are used as input variables of the quality prediction model because they carry position information about cell defects. Running the package test on chips classified into the correct quality grade by the improved prediction method is expected to reduce production costs.

Cumulative Sum Control Charts for Simultaneously Monitoring Means and Variances of Multiple Quality Variables

  • Chang, Duk-Joon;Heo, Sunyeong
    • Journal of Integrative Natural Science / v.5 no.4 / pp.246-252 / 2012
  • Multivariate cumulative sum (CUSUM) control charts for simultaneously monitoring both means and variances of a multivariate normal process are investigated. The performance of the multivariate CUSUM schemes is evaluated for matched fixed sampling interval (FSI) and variable sampling interval (VSI) features in terms of average time to signal (ATS) and average number of samples to signal (ANSS). Multivariate Shewhart charts are also considered for comparison with the properties of the multivariate CUSUM charts. Numerical results show that the presented CUSUM charts are more efficient than the corresponding Shewhart charts for small or moderate shifts, and that the VSI feature with two sampling intervals is more efficient than the FSI feature. When small changes in the production process have occurred, a CUSUM chart with small reference values is recommended in terms of time to signal.
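
As a rough illustration of the CUSUM and VSI ideas in this abstract, the sketch below runs a one-sided, univariate CUSUM for an upward mean shift and switches between two sampling intervals. It is a deliberately simplified stand-in for the multivariate schemes studied in the paper. The function name, the reference value `k`, decision limit `h`, warning limit `g`, the interval lengths, and the sample data are all assumptions made for the example.

```python
def cusum_vsi(samples, target, k, h, g, d_short, d_long):
    """One-sided upper CUSUM with a two-interval variable sampling scheme.

    Returns (index of the first signalling sample or None, elapsed time).
    The wait before each sample is d_short if the current statistic is above
    the warning limit g, otherwise d_long.
    """
    s = 0.0        # CUSUM statistic
    elapsed = 0.0  # total time waited so far
    for i, x in enumerate(samples):
        elapsed += d_short if s > g else d_long
        # Accumulate deviations above target + k; reset at zero.
        s = max(0.0, s + (x - target - k))
        if s > h:
            return i, elapsed
    return None, elapsed

# Hypothetical data: four in-control samples, then an upward shift of about 1.
samples = [0.1, -0.2, 0.0, 0.3, 1.2, 1.1, 1.3, 0.9, 1.0]
signal_index, signal_time = cusum_vsi(
    samples, target=0.0, k=0.5, h=2.0, g=0.5, d_short=0.5, d_long=1.5
)
```

In this sketch, setting `d_short` equal to `d_long` recovers a fixed sampling interval (FSI) chart, and averaging `signal_time` and the number of samples taken over many simulated runs would give estimates of ATS and ANSS.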

Fine-Grain Weighted Logistic Regression Model (가중치 세분화 기반의 로지스틱 회귀분석 모델)

  • Lee, Chang-Hwan
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.9 / pp.77-81 / 2016
  • Logistic regression (LR) has been widely used for predicting the relationships among variables in various fields. We propose a new logistic regression model with a fine-grained weighting method, called value weighted logistic regression, which assigns a different weight to each feature value. A gradient approach is used to obtain the optimal weights of the feature values. We conduct experiments on several data sets, and the results show that the proposed method yields meaningful improvement in prediction accuracy.

HMM-Based Automatic Speech Recognition using EMG Signal

  • Lee, Ki-Seung
    • Journal of Biomedical Engineering Research / v.27 no.3 / pp.101-109 / 2006
  • It is known that there is a strong relationship between human voices and the movements of the articulatory facial muscles. In this paper, we use this knowledge to implement an automatic speech recognition scheme that relies solely on surface electromyogram (EMG) signals. The EMG signals were acquired from three articulatory facial muscles. As a preliminary study, 10 Korean digits were used as recognition variables. Various feature parameters, including filter bank outputs, linear predictive coefficients, and cepstrum coefficients, were evaluated to find appropriate parameters for EMG-based speech recognition. The sequence of EMG signals for each word is modelled within a hidden Markov model (HMM) framework. A continuous word recognition approach was investigated in this work; hence, the model for each word is obtained by concatenating subword models, and embedded re-estimation techniques were employed in the training stage. The findings indicate that such a system may be able to recognize speech signals with an accuracy of up to 90% when mel-filter bank outputs are used as the feature parameters for recognition.

Feature extraction for Power Quality analysis (전력품질 분석을 위한 특징 추출)

  • Lee, Jin-Mok;Hong, Duc-Pyo;Choi, Jae-Ho
    • Proceedings of the KIEE Conference / 2005.07e / pp.94-96 / 2005
  • Power quality (PQ) problems are varied, owing to a wide variety of causes, so detecting and classifying the many kinds of PQ problems is difficult. Almost all previous studies sought good results with neural networks (NN) that take input features from sources such as random variables, FFT, and the wavelet transform. However, their results are unsatisfactory because it is very difficult to classify all PQ items, so a study of feature extraction is needed. This paper therefore suggests an effective way of using Principal Component Analysis (PCA) for PQ problem classification. PCA found the more effective features among all features, which will help achieve better classification results.
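
To make the PCA step concrete, here is a minimal sketch that finds the first principal component of a small feature matrix by power iteration and projects each sample onto it. The feature values are invented for illustration and have nothing to do with the paper's PQ data. Power iteration is one simple way to get the leading component, not necessarily what the authors used.

```python
import math

def first_principal_component(rows, iters=100):
    """Return (feature means, unit vector of the leading principal component)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered features.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # arbitrary non-zero starting vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(rows, means, v):
    """Score of each sample along the component v (a single extracted feature)."""
    return [sum((r[j] - means[j]) * v[j] for j in range(len(v))) for r in rows]

# Hypothetical features: columns 0 and 1 move together, column 2 is near-constant noise.
rows = [(1.0, 2.0, 0.1), (2.0, 4.0, 0.0), (3.0, 6.0, 0.1), (4.0, 8.0, 0.0)]
means, pc1 = first_principal_component(rows)
scores = project(rows, means, pc1)
```

Here the three raw features collapse to one score per sample, which could then be fed to a classifier in place of the full feature set.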

Feature Analysis for Fisheries Electronic Catalog's Standards (수산물 전자카탈로그 표준화를 위한 속성 분석)

  • 김진백
    • The Journal of Fisheries Business Administration / v.33 no.1 / pp.19-41 / 2002
  • Recently, the number of Internet shopping malls has increased dramatically. Internet shopping malls offer direct sales through electronic catalogs. Compared with physical stores and paper catalogs, electronic catalogs differ in the varieties and types of products offered, promotional efforts, service, interface, ordering and delivery, and so on. This paper analysed the features of electronic catalogs for fisheries using 45 variables. Descriptive statistics showed that most electronic catalogs had sufficient product-related information, but promotion and transaction-security features were scarce, and some of the development technologies used were obsolete. Factor analysis identified 9 factors of electronic catalogs for fisheries: design of product pages, transaction information, playfulness, convenience of product selection, interface design, design of homepages, product information, learning capability, and other electronic-catalog-related factors. Thus, in standardizing electronic catalogs for fisheries products, these 9 factors should be reflected significantly.
