Search | Korea Science

On sampling algorithms for imbalanced binary data: performance comparison and some caveats (불균형적인 이항 자료 분석을 위한 샘플링 알고리즘들: 성능비교 및 주의점)

Kim, HanYong;Lee, Woojoo
- The Korean Journal of Applied Statistics / v.30 no.5 / pp.681-690 / 2017
Imbalanced binary classification problems arise in many settings, such as fraud detection in banking operations, spam mail detection, and prediction of defective products. Several sampling methods, such as over-sampling, under-sampling, and SMOTE, have been developed to overcome the poor prediction performance of binary classifiers when one group is dominant. In this study, we investigate the prediction performance of logistic regression, Lasso, random forest, boosting, and support vector machine in combination with these sampling methods for imbalanced binary data. Four real data sets are analyzed to see whether there is a substantial improvement in prediction performance. We also emphasize some precautions to take when the sampling methods are implemented.
https://doi.org/10.5351/KJAS.2017.30.5.681
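The two simplest sampling methods named in this abstract, random over-sampling and random under-sampling, can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation; the function names and the toy data are hypothetical, and SMOTE (which synthesizes new minority points) is not shown.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority rows until both classes have equal size."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_extra = counts[majority] - counts[minority]
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]
    return list(X) + extra, list(y) + [minority] * n_extra


def random_undersample(X, y, seed=0):
    """Keep every minority row and an equally sized random subset of the other class."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    n_keep = counts[minority]
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(idx if label == minority else rng.sample(idx, n_keep))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy imbalanced data (hypothetical): 8 negatives, 2 positives.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over), Counter(y_under))
```

One commonly cited precaution, stated here as an assumption rather than as the specific caveat of the paper, is to resample only the training split so that the test set keeps the original class proportions.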

Image Quality Assessment by Combining Masking Texture and Perceptual Color Difference Model

Tang, Zhisen;Zheng, Yuanlin;Wang, Wei;Liao, Kaiyang
- KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.7 / pp.2938-2956 / 2020
Objective image quality assessment (IQA) models have been developed using effective features to imitate the characteristics of the human visual system (HVS). The HVS is extremely sensitive to color degradation and complex texture changes. In this paper, we first show that many existing full-reference image quality assessment (FR-IQA) methods can hardly measure image quality under contrast and masking texture changes. To solve this problem, we propose a novel FR-IQA method that accounts for the texture masking effect, called the Texture and Color Quality Index (TCQI). The proposed method considers both the texture masking effect and the visual perceptual threshold of color, and adopts three kinds of features to reflect masking texture, color difference, and structural information. Furthermore, random forest (RF) is used to address the drawbacks of existing pooling technologies. Compared with other traditional learning-based tools (support vector regression and neural networks), RF achieves better prediction performance. Experiments conducted on five large-scale databases demonstrate that our approach is highly consistent with subjective perception, outperforms twelve state-of-the-art IQA models in terms of prediction accuracy, and keeps a moderate computational complexity. Cross-database validation also confirms that our approach maintains high robustness.
https://doi.org/10.3837/tiis.2020.07.012

No-reference Image Blur Assessment Based on Multi-scale Spatial Local Features

Sun, Chenchen;Cui, Ziguan;Gan, Zongliang;Liu, Feng
- KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.10 / pp.4060-4079 / 2020
Blur is an important type of image distortion. How to evaluate the quality of blurred images accurately and efficiently has been a research hotspot in the field of image processing in recent years. Inspired by the multi-scale perceptual characteristics of the human visual system (HVS), this paper presents a no-reference image blur/sharpness assessment method based on multi-scale local features in the spatial domain. First, because different image content has different sensitivity to blur distortion, the image is divided block by block into smooth, edge, and texture regions. Then, the Gaussian scale space of the image is constructed, and the categorized contrast features between the original image and the Gaussian scale space images are calculated to express the blur degree of different image contents. To simulate the impact of viewing distance on blur distortion, the distribution characteristics of the local maximum gradient of multi-resolution images are also calculated in the spatial domain. Finally, the image blur assessment model is obtained by fusing all features and learning the mapping from features to quality scores with support vector regression (SVR). The performance of the proposed method is evaluated on four synthetically blurred databases and one real blurred database. The experimental results demonstrate that our method produces quality scores more consistent with subjective evaluations than other methods, especially for real blurred images.
https://doi.org/10.3837/tiis.2020.10.008

A Study on Automatic Learning of Weight Decay Neural Network (가중치감소 신경망의 자동학습에 관한 연구)

Hwang, Chang-Ha;Na, Eun-Young;Seok, Kyung-Ha
- Journal of the Korean Data and Information Science Society / v.12 no.2 / pp.1-10 / 2001
Neural networks are increasingly being seen as an addition to the statistics toolkit that should be considered alongside both classical and modern statistical methods. Neural networks are usually useful for classification and function estimation. In this paper we concentrate on function estimation using neural networks with a weight decay factor. The use of weight decay seems both to help the optimization process and to avoid overfitting. In this type of neural network, the problem of deciding the number of hidden nodes, the weight decay parameter, and the number of learning iterations is very important. It is called the optimization of weight decay neural networks. In this paper we propose an automatic optimization based on genetic algorithms. Moreover, we compare the weight decay neural network learned through this automatic optimization with an ordinary neural network, projection pursuit regression, and support vector machines.

A Study on the Prediction Model Considering the Multicollinearity of Independent Variables in the Seawater Reverse Osmosis (역삼투압 해수담수화(SWRO) 플랜트에서 독립변수의 다중공선성을 고려한 예측모델에 관한 연구)

Han, In sup;Yoon, Yeon-Ah;Chang, Tai-Woo;Kim, Yong Soo
- Journal of Korean Society for Quality Management / v.48 no.1 / pp.171-186 / 2020
Purpose: The purpose of this study is to build predictive models that account for multicollinearity among independent variables in order to make more efficient and reliable predictions of differential pressure in seawater reverse osmosis. Methods: The main variables of each RO system are extracted through factor analysis. Common variables are derived by comparing RO system #1 and RO system #2. To model the differential pressure, which is the target variable, we constructed prediction models using regression analysis, an artificial neural network, and a support vector machine in R, and identified the superior model by comparing RMSE. Results: The number of factors extracted by factor analysis is the same for RO system #1 and RO system #2. The value of variability (% Var) increased as the steps of the analysis procedure proceeded. In terms of the average RMSE of the models, the overall prediction of the SVM was superior to that of the other models. Conclusion: This study is meaningful in that it presents a demonstration study that considers the multicollinearity of independent variables. A predictive model for a target variable would be more accurate if the relevant variables were derived and reflected before the model is established.
https://doi.org/10.7469/JKSQM.2020.48.1.171

DESIGN OF A LOAD FOLLOWING CONTROLLER FOR APR+ NUCLEAR PLANTS

Lee, Sim-Won;Kim, Jae-Hwan;Na, Man-Gyun;Kim, Dong-Su;Yu, Keuk-Jong;Kim, Han-Gon
- Nuclear Engineering and Technology / v.44 no.4 / pp.369-378 / 2012
A load-following operation in APR+ nuclear plants is necessary to reduce the need to adjust the boric acid concentration and to efficiently control the control rods for flexible operation. In particular, a disproportion in the axial flux distribution, which is normally caused by a load-following operation in a reactor core, causes xenon oscillation because the absorption cross-section of xenon is extremely large and its effects in a reactor are delayed by the iodine precursor. A model predictive control (MPC) method was used to design an automatic load-following controller for integrated control of the thermal power level and axial shape index (ASI) in APR+ nuclear plants. Some tracking controllers employ the current tracking command only. The MPC, on the other hand, can achieve better tracking performance because it considers future commands in addition to the current tracking command. The basic concept of the MPC is to solve, at the current time, an optimization problem that generates a finite sequence of future control inputs, and to implement only the first input of that sequence as the current control input. At the next time step, the procedure to solve the optimization problem is repeated. A support vector regression (SVR) model, which is widely used for function approximation problems, is used to predict the future outputs based on previous inputs and outputs. In addition, a genetic algorithm is employed to minimize the objective function of the MPC algorithm under multiple constraints. The power level and ASI are controlled by regulating the control banks and part-strength control banks together with an automatic adjustment of the boric acid concentration. The 3-dimensional MASTER code, which models APR+ nuclear plants, is interfaced to the proposed controller to confirm its performance in controlling the reactor power level and ASI. Numerical simulations showed that the proposed controller exhibits very fast tracking responses.
https://doi.org/10.5516/NET.04.2012.509
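The receding-horizon idea described in this abstract (optimize a finite sequence of future inputs, apply only the first one, then repeat) can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the scalar plant model `predict`, its coefficients, the candidate input set, and the cost weights are hypothetical, and an exhaustive search stands in for the genetic algorithm and SVR predictor that the paper uses.

```python
from itertools import product


def predict(x, u, a=0.9, b=0.5):
    """Toy one-step plant model (assumed); the paper predicts outputs with SVR instead."""
    return a * x + b * u


def mpc_step(x, future_refs, candidates=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Search every input sequence over the horizon and return only its first input."""
    best_cost, best_seq = float("inf"), None
    for seq in product(candidates, repeat=len(future_refs)):
        x_pred, cost = x, 0.0
        for u, ref in zip(seq, future_refs):
            x_pred = predict(x_pred, u)
            # Tracking error on future commands plus a small input penalty.
            cost += (x_pred - ref) ** 2 + 0.01 * u ** 2
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq[0]


def simulate(x0, reference, horizon=3):
    """Re-solve the optimization at every time step (receding horizon)."""
    x, states = x0, [x0]
    for k in range(len(reference) - horizon):
        u = mpc_step(x, reference[k + 1 : k + 1 + horizon])
        x = predict(x, u)
        states.append(x)
    return states


# Track a constant command of 1.0 starting from 0.0.
states = simulate(0.0, [1.0] * 15)
print([round(s, 3) for s in states])
```

Exhaustive search is only workable because the toy input set and horizon are tiny; a heuristic optimizer such as the genetic algorithm mentioned in the abstract is one way to handle larger constrained problems.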

Feature Selection Using Submodular Approach for Financial Big Data

Attigeri, Girija;Manohara Pai, M.M.;Pai, Radhika M.
- Journal of Information Processing Systems / v.15 no.6 / pp.1306-1325 / 2019
As the world moves towards digitization, data is generated from various sources at an ever faster rate. This data has grown enormous and is termed big data. The financial sector is one domain that needs to leverage the big data being generated to identify financial risks, fraudulent activities, and so on. The design of predictive models for such financial big data is imperative for maintaining the health of the country's economy. Financial data has many features, such as transaction history, repayment data, purchase data, investment data, and so on. The main problem in designing a predictive algorithm is finding the right subset of representative features from which the predictive model can be constructed for a particular task. This paper proposes a correlation-based method using submodular optimization for selecting the optimum number of features, thereby reducing the dimensions of the data for faster and better prediction. The key proposition is that the optimal feature subset should contain features that are highly correlated with the class label but not correlated with each other. Experiments are conducted to understand the effect of the various subsets on different classification algorithms for loan data. The IBM Bluemix BigData platform is used for experimentation along with the Spark notebook. The results indicate that the proposed approach achieves considerable accuracy with optimal subsets in significantly less execution time. The algorithm is also compared with existing feature selection and extraction algorithms.
https://doi.org/10.3745/JIPS.04.0149
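The selection principle stated in this abstract (features highly correlated with the class label but not with each other) can be illustrated with a simple greedy rule. This sketch is an assumption made for illustration, not the submodular objective of the paper: the score "relevance minus mean redundancy", the function names, and the toy loan-style data are all hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0


def greedy_select(features, label, k):
    """Pick k features, each maximizing |corr with label| minus mean |corr with chosen ones|."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = abs(pearson(features[name], label))
            if not selected:
                return relevance
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: "income_copy" duplicates "income" up to scale.
label = [0, 0, 0, 1, 1, 1]
features = {
    "income": [1, 2, 3, 7, 8, 9],
    "income_copy": [2, 4, 6, 14, 16, 18],
    "age": [5, 1, 4, 6, 3, 8],
}
print(greedy_select(features, label, k=2))
```

In this toy run the redundant copy is passed over in favor of a less relevant but less correlated feature, which is the behavior the abstract's proposition asks for.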

Discriminant analysis of grain flours for rice paper using fluorescence hyperspectral imaging system and chemometric methods

Seo, Youngwook;Lee, Ahyeong;Kim, Bal-Geum;Lim, Jongguk
- Korean Journal of Agricultural Science / v.47 no.3 / pp.633-644 / 2020
Rice paper is an element of Vietnamese cuisine that can be used to wrap vegetables and meat. Rice and starch are the main ingredients of rice paper, and their mixing ratio is important for quality control. In a commercial factory, assessment of food safety and quantitative supply is a challenging issue. A rapid and non-destructive monitoring system is therefore necessary in commercial production systems to ensure the food safety of rice and starch flour for the rice paper wrap. In this study, fluorescence hyperspectral imaging technology was applied to classify grain flours. Using the 3D hypercube of fluorescence hyperspectral imaging (fHSI, 420 - 730 nm), spectral and spatial data and chemometric methods were applied to detect and classify flours. Eight flours (rice: 4, starch: 4) were prepared, and hyperspectral images were acquired in a 5 (L) × 5 (W) × 1.5 (H) cm container. Linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), support vector machine (SVM), classification and regression tree (CART), and random forest (RF) with a few preprocessing methods (multivariate scatter correction [MSC], 1st and 2nd derivatives, and moving average) were applied to classify grain flours, and the accuracy was compared using a confusion matrix (accuracy and kappa coefficient). LDA with moving average showed the highest accuracy at A = 0.9362 (K = 0.9270). A 1D convolutional neural network (CNN) demonstrated a classification result of A = 0.94 and showed improved classification results between mimyeon flour (MF)1 and MF2 of 0.72 and 0.87, respectively. In this study, the potential of non-destructive detection and classification of grain flours using fHSI technology and machine learning methods was demonstrated.
https://doi.org/10.7744/kjoas.20200051

Differences of Cold-heat Patterns between Healthy and Disease Group (건강군과 질환군의 한열지표 차이에 관한 고찰)

Kim Ji-Eun;Lee Seung-Gi;Ryu Hwa-Seung;Park Kyung-Mo
- Journal of Physiology & Pathology in Korean Medicine / v.20 no.1 / pp.224-228 / 2006
Pattern identification of the exterior-interior syndrome and the cold-heat syndrome is one of the most frequently used diagnostic methods in Oriental medicine. There have been no systematic studies analyzing the characteristics of 'exterior-interior and cold-heat' patterns in healthy and disease groups. In this study, cold-heat pattern, blood pressure, pulse rate, height, and weight were recorded from 100 healthy subjects and 196 disease subjects aged 30 to 59 years. Descriptive statistics were used to analyze the differences between the healthy and disease groups, and a linear regression function, a linear support vector machine, and a Bayesian classifier were used to distinguish the healthy group from the disease group. The scores for both exterior-heat and interior-cold are higher in the healthy group than in the disease group. This means that a person in the disease group tends to have a cold exterior and a hot interior. These results have no relevance to age. However, the attempt to separate the healthy group from the disease group using exterior-interior and cold-heat patterns together with other vital signs did not perform well. This means that, even though the two groups show different trends, this kind of information alone could not classify the healthy group and the disease group.

Landslide susceptibility assessment using feature selection-based machine learning models

Liu, Lei-Lei;Yang, Can;Wang, Xiao-Mi
- Geomechanics and Engineering / v.25 no.1 / pp.1-16 / 2021
Machine learning models have been widely used for landslide susceptibility assessment (LSA) in recent years. The large number of inputs or conditioning factors for these models, however, can reduce computational efficiency and increase the difficulty of collecting data. Feature selection is a good tool to address this problem by selecting the most important features among all factors to reduce the number of input variables. However, two important questions need to be answered: (1) how do feature selection methods affect the performance of machine learning models? and (2) which feature selection method is the most suitable for a given machine learning model? This paper aims to address these two questions by comparing the predictive performance of 13 feature selection-based machine learning (FS-ML) models and 5 ordinary machine learning models on LSA. First, five commonly used machine learning models (i.e., logistic regression, support vector machine, artificial neural network, Gaussian process, and random forest) and six typical feature selection methods in the literature are adopted to constitute the proposed models. Then, fifteen conditioning factors are chosen as input variables and 1,017 landslides are used as recorded data. Next, feature selection methods are used to obtain the importance of the conditioning factors to create feature subsets, based on which 13 FS-ML models are constructed. For each of the machine learning models, the best optimized FS-ML model is selected according to the area under the curve value. Finally, five optimal FS-ML models are obtained and applied to the LSA of the studied area. The predictive abilities of the FS-ML models on LSA are verified and compared through the receiver operating characteristic curve and statistical indicators such as sensitivity, specificity, and accuracy. The results showed that different feature selection methods have different effects on the performance of LSA machine learning models. FS-ML models generally outperform the ordinary machine learning models. The best FS-ML model is the recursive feature elimination (RFE) optimized RF, and RFE is an optimal method for feature selection.
https://doi.org/10.12989/gae.2021.25.1.001

