• Title/Summary/Keyword: Machine classification

Automatic Identification of Database Workloads by using SVM Workload Classifier (SVM 워크로드 분류기를 통한 자동화된 데이터베이스 워크로드 식별)

  • Kim, So-Yeon;Roh, Hong-Chan;Park, Sang-Hyun
    • The Journal of the Korea Contents Association / v.10 no.4 / pp.84-90 / 2010
  • DBMSs are used for a range of applications, from data warehousing to on-line transaction processing. As a result of this demand, DBMSs have continued to grow in size. This growth makes manual performance tuning a critical problem. DBMS tuning should adapt to the type of workload placed on the system, but identifying workloads in mixed database applications can be quite difficult. A method for identifying workloads in a mixed database environment is therefore necessary. In this paper, we propose an SVM workload classifier that automatically identifies a DBMS workload. Database workloads are collected from the TPC-C and TPC-W benchmarks while varying the resource parameters. The parameters of the SVM workload classifier, C and the kernel parameter, were chosen experimentally. The experiments showed that the accuracy of the proposed SVM workload classifier is about 9% higher than that of the decision tree, naive Bayes, multilayer perceptron, and k-NN classifiers.

On sampling algorithms for imbalanced binary data: performance comparison and some caveats (불균형적인 이항 자료 분석을 위한 샘플링 알고리즘들: 성능비교 및 주의점)

  • Kim, HanYong;Lee, Woojoo
    • The Korean Journal of Applied Statistics / v.30 no.5 / pp.681-690 / 2017
  • Imbalanced binary classification problems arise in many settings, such as fraud detection in banking operations, spam mail detection, and prediction of defective products. Several sampling methods, such as over-sampling, under-sampling, and SMOTE, have been developed to overcome the poor prediction performance of binary classifiers when one group is dominant. In this study, we investigate the prediction performance of logistic regression, Lasso, random forest, boosting, and support vector machine in combination with these sampling methods for imbalanced binary data. Four real data sets are analyzed to see whether there is a substantial improvement in prediction performance. We also emphasize some precautions to take when the sampling methods are implemented.
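
The two simplest sampling methods named in this abstract can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names and the toy data are hypothetical, and SMOTE (which synthesizes new minority points rather than copying existing ones) is not covered.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out

# Toy data (hypothetical): 2 positive rows, 8 negative rows.
X = [[i] for i in range(10)]
y = [1, 1] + [0] * 8
X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
```

One widely cited precaution, which may or may not be among those the paper emphasizes, is to resample only the training portion of the data so that the test set keeps the original class proportions.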

Energy Expenditure of Male Blue Collar Workers (생산직 남성근로자의 작업 중 에너지 소모량)

  • Woo, Ji Hoon;Kang, Dongmug;Shin, Yong Chul;Kim, Myeong Ock;Son, Min Jung;Kim, Boo Wook;Cho, Byung Mann;Lee, Su Ill
    • Journal of Korean Society of Occupational and Environmental Hygiene / v.16 no.2 / pp.183-192 / 2006
  • Predicting energy expenditure (EE) is important for preventing work-related musculoskeletal disorders (WMSDs). A problem in predicting EE is that the EE standard is based on Western data. The authors measured average EE by job category to provide basic data for suggesting proper work intensity for Korean workers. This study was conducted from 2003 to 2005. Study subjects were recruited from 4 car parts assembly plants, 2 car assembly plants, 2 heavy machine manufacturing plants, and 2 shipyards. In total, 515 male workers were studied. To estimate VO2max, a sub-maximal test was conducted to measure 75% VO2max with a bicycle ergometer (Combi Co, Aerobike 75XL II). Heartbeats were recorded during work with a heartbeat recorder (Polar Electro Co, Finland, S810). EE during work was calculated from the recorded heartbeat and an individual regression equation derived from the sub-maximal test. Subjects were classified into 4 industry, 8 work posture, and 23 job task categories. Mean EEs (S.D.) according to industry classification (kcal/min) were 4.9 (0.7), 4.8 (0.7), 4.9 (0.7), 5.0 (0.9), and 4.0 (0.5) for Car Part Manufacture, Car Assembly, Ship Building, Heavy Machinery Manufacture, and Hospital Office, respectively. The results suggest that job rescheduling will need to be planned for Korean male workers whose EE exceeds the NIOSH criteria, in order to maintain workers' health. Further study to establish a Korean work intensity standard is needed.
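
The calculation described above, an individual regression from the sub-maximal test applied to heart rate recorded during work, might look like the sketch below. It assumes a linear VO2-versus-heart-rate relation and a caloric equivalent of about 5 kcal per liter of oxygen; neither the conversion factor nor the data points come from the abstract, and all names are hypothetical.

```python
KCAL_PER_LITER_O2 = 5.0  # assumed caloric equivalent of oxygen, not stated in the abstract

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def energy_expenditure(heart_rates, slope, intercept):
    """Convert heart rates (beats/min) to EE (kcal/min) via the individual regression."""
    vo2 = [max(0.0, intercept + slope * hr) for hr in heart_rates]  # L/min
    return [v * KCAL_PER_LITER_O2 for v in vo2]

# Hypothetical sub-maximal test for one subject: heart rate vs. measured VO2 (L/min).
test_hr = [90, 110, 130, 150]
test_vo2 = [0.8, 1.2, 1.6, 2.0]
slope, intercept = fit_line(test_hr, test_vo2)

# Hypothetical heart rates recorded during work.
work_hr = [100, 105, 95]
ee = energy_expenditure(work_hr, slope, intercept)
mean_ee = sum(ee) / len(ee)  # kcal/min
```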

A Novel Network Anomaly Detection Method based on Data Balancing and Recursive Feature Addition

  • Liu, Xinqian;Ren, Jiadong;He, Haitao;Wang, Qian;Sun, Shengting
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.7 / pp.3093-3115 / 2020
  • Network anomaly detection systems play an essential role in detecting network anomalies and ensuring network security. Anomaly detection systems based on machine learning have become an increasingly popular solution. However, because network traffic is imbalanced and high-dimensional, existing methods are unable to achieve both high accuracy and a low false alarm rate. To address this problem, a new network anomaly detection method based on data balancing and recursive feature addition is proposed. First, a data balancing algorithm based on improved KNN outlier detection is designed to select a representative part of the data in each category. The parameters of the improved KNN outlier detection are jointly optimized by a genetic algorithm. Next, a recursive feature addition algorithm based on correlation analysis is proposed to select effective features, in which a cross contingency test is used to analyze correlation and obtain a strongly correlated feature subset. Then, a random forest model is used as the classification model to detect anomalies. Finally, the proposed algorithm is evaluated on the benchmark datasets KDD Cup 1999 and UNSW_NB15. The results show that the proposed strategies improve accuracy and recall and decrease the false alarm rate. Compared with other algorithms, the proposed algorithm also achieves significant improvements, especially in recall for the small categories.
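
The three measures reported in this abstract have standard definitions in terms of confusion-matrix counts. The sketch below treats anomalies as the positive class; the counts are invented for illustration and are not results from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, recall, and false alarm rate, with anomalies as the positive class."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,       # share of all samples classified correctly
        "recall": tp / (tp + fn),            # share of true anomalies that were detected
        "false_alarm_rate": fp / (fp + tn),  # share of normal samples flagged as anomalies
    }

# Hypothetical counts: 100 anomalies among 1000 samples.
m = detection_metrics(tp=80, fp=10, tn=890, fn=20)
```

With these invented counts accuracy is 0.97 while recall is only 0.80, which shows why accuracy alone says little on imbalanced traffic.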

Solving Multi-class Problem using Support Vector Machines (Support Vector Machines을 이용한 다중 클래스 문제 해결)

  • Ko, Jae-Pil
    • Journal of KIISE:Software and Applications / v.32 no.12 / pp.1260-1270 / 2005
  • The Support Vector Machine (SVM) is well known as a representative learner among the kernel methods. SVM, which is based on statistical learning theory, shows good generalization performance and has been applied to various pattern recognition problems. However, SVM is fundamentally a two-class classifier, so a multi-class problem cannot be solved directly with a binary SVM. One-Per-Class (OPC) and All-Pairs have been applied to solve the face recognition problem, which is a multi-class problem, with SVM. Both are output coding methods, a general approach for solving a multi-class problem with multiple binary classifiers: a complex multi-class problem is decomposed into a set of binary problems, and the outputs of the binary classifiers are then recombined. In this paper, we introduce the output coding methods as an approach for extending binary SVM to multi-class SVM and propose new output coding schemes based on Error-Correcting Output Codes (ECOC), which is a dominant theoretical foundation of the output coding methods. From experiments on face recognition, we give empirical results on the properties of output coding methods, including our proposed ones.
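
Output coding can be illustrated with code matrices and Hamming-distance decoding, as in the sketch below. Each row is a class codeword and each column corresponds to one binary classifier (+1 and -1 are the two sides of that classifier, 0 means the class is not used). The binary classifier outputs are supplied by hand here instead of coming from trained SVMs, and `ECOC_4` is the standard exhaustive code for four classes, not one of the schemes the paper proposes.

```python
from itertools import combinations

def one_per_class_code(k):
    """One column per class: that class against all the others."""
    return [[1 if c == j else -1 for j in range(k)] for c in range(k)]

def all_pairs_code(k):
    """One column per pair of classes; classes outside the pair get 0."""
    pairs = list(combinations(range(k), 2))
    return [[1 if c == a else -1 if c == b else 0 for a, b in pairs]
            for c in range(k)]

def decode(code, outputs):
    """Return the class whose codeword disagrees with the outputs in the fewest used columns."""
    def distance(word):
        return sum(1 for w, o in zip(word, outputs) if w != 0 and w != o)
    return min(range(len(code)), key=lambda c: distance(code[c]))

# Exhaustive 7-column code for 4 classes; any two rows differ in 4 columns,
# so a single wrong binary classifier can be corrected.
ECOC_4 = [
    [ 1,  1,  1,  1,  1,  1,  1],
    [-1, -1, -1, -1,  1,  1,  1],
    [-1, -1,  1,  1, -1, -1,  1],
    [-1,  1, -1,  1, -1,  1, -1],
]

pairs_prediction = decode(all_pairs_code(3), [-1, 1, 1])
noisy_outputs = [1, -1, 1, 1, -1, -1, 1]  # codeword of class 2 with the first output flipped
ecoc_prediction = decode(ECOC_4, noisy_outputs)
```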

A layered-wise data augmenting algorithm for small sampling data (적은 양의 데이터에 적용 가능한 계층별 데이터 증강 알고리즘)

  • Cho, Hee-chan;Moon, Jong-sub
    • Journal of Internet Computing and Services / v.20 no.6 / pp.65-72 / 2019
  • Data augmentation is a method that increases the amount of data through various algorithms, starting from a small amount of sample data. When machine learning and deep learning techniques are used to solve real-world problems, data sets are often insufficient. A lack of data increases the risk of underfitting and overfitting, and the characteristics of the data set are poorly reflected when a model is trained. In this paper, we propose a layer-wise data augmentation method that is applied at each layer of a deep neural network and produces substantially meaningful augmented data. Experiments that measure the improvement in classification accuracy show that the proposed method is effective for training the model.

Document Clustering using Term reweighting based on NMF (NMF 기반의 용어 가중치 재산정을 이용한 문서군집)

  • Lee, Ju-Hong;Park, Sun
    • Journal of the Korea Society of Computer and Information / v.13 no.4 / pp.11-18 / 2008
  • Document clustering is an important method for document analysis and is used in many different information retrieval applications. This paper proposes a new document clustering model that uses re-weighted terms based on NMF (non-negative matrix factorization) to cluster documents relevant to a user's requirement. The proposed model re-weights terms using user feedback to reduce the gap between the user's requirement for document classification and the document clusters produced by the machine. The proposed method can improve the quality of document clustering because the re-weighted terms, the semantic feature matrix, and the semantic variable matrix, which are used in document clustering, can better represent the inherent structure of the document set. The experimental results demonstrate that applying the proposed method to document clustering methods achieves better performance than the document clustering methods alone.

PREDICTION OF SEVERE ACCIDENT OCCURRENCE TIME USING SUPPORT VECTOR MACHINES

  • KIM, SEUNG GEUN;NO, YOUNG GYU;SEONG, POONG HYUN
    • Nuclear Engineering and Technology / v.47 no.1 / pp.74-84 / 2015
  • If a transient occurs in a nuclear power plant (NPP), operators will try to protect the NPP by estimating the kind of abnormality and mitigating it based on recommended procedures. Similarly, operators take actions based on severe accident management guidelines when there is the possibility of a severe accident occurring in an NPP. In any such situation, information about the occurrence time of severe accident-related events can be very important for operators in setting up severe accident management strategies. Therefore, support systems that can quickly provide this kind of information will be very useful when operators try to manage severe accidents. In this research, the occurrence times of several events that could happen during a severe accident were predicted using support vector machines, with short-time variations of plant status variables as inputs. As a preliminary step, the break location and size of a loss of coolant accident (LOCA) were identified. Training and testing data sets were obtained using the MAAP5 code. The results show that the proposed algorithm can correctly classify the break location of the LOCA and can estimate the break size of the LOCA very accurately. In addition, the occurrence times of major severe accident events were predicted under various severe accident paths with reasonable error. With these results, it is expected that it will be possible to apply the proposed algorithm to real NPPs because the algorithm uses only the early phase data after the reactor SCRAM, which can be obtained accurately for accident simulations.

Feasibility of Using Similar Electrocardiography Measured around the Ears to Develop a Personal Authentication System (귀 주변에서 측정한 유사 심전도 기반 개인 인증 시스템 개발 가능성)

  • Choi, Ga-Young;Park, Jong-Yoon;Kim, Da-Yeong;Kim, Yeonu;Lim, Ji-Heon;Hwang, Han-Jeong
    • Journal of Biomedical Engineering Research / v.41 no.1 / pp.42-47 / 2020
  • Personal authentication systems based on biosignals have received increasing attention because of their relatively high security compared to traditional authentication systems based on a key and password. Electrocardiography (ECG) measured from the chest or wrist is one of the biosignals widely used to develop a personal authentication system. In this study, we investigated the feasibility of using similar ECG measured behind the ears to develop a personal authentication system. To this end, similar ECGs were measured from thirty subjects in a resting state using a pair of three electrodes attached behind each of the ears, while the standard Lead-I ECG was simultaneously measured from both wrists as a baseline ECG. The three ECG components, Q, R, and S, were extracted for each subject as classification features, and authentication accuracy was estimated using a support vector machine (SVM) based on a 5×5-fold cross-validation. The mean authentication accuracies of Lead-I ECG and similar ECG were 90.41 ± 8.26% and 81.15 ± 7.54%, respectively. Considering a chance level of 3.33% (=1/30), the mean authentication performance of similar ECG demonstrates the feasibility of using similar ECG measured behind the ears to develop a personal authentication system.
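
The evaluation setup can be sketched as index splits, reading "5×5-fold" as five repetitions of 5-fold cross-validation; that reading is an assumption, since the abstract does not spell it out. The sample count and function name are hypothetical, and the SVM training step is omitted.

```python
import random

def repeated_kfold(n_samples, k=5, repeats=5, seed=0):
    """Yield (train_indices, test_indices) for `repeats` reshuffled rounds of k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]  # k nearly equal, disjoint folds
        for i in range(k):
            test = folds[i]
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train, test

splits = list(repeated_kfold(n_samples=100))  # 5 repeats x 5 folds

# Chance level for identifying one subject out of thirty, as stated in the abstract.
n_subjects = 30
chance_level = 1 / n_subjects
```

Each of the resulting splits would be used to train on the `train` indices and score on the `test` indices, and the reported accuracy would be the mean over all splits.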

A Defect Inspection Method in TFT-LCD Panel Using LS-SVM (LS-SVM을 이용한 TFT-LCD 패널 내의 결함 검사 방법)

  • Choi, Ho-Hyung;Lee, Gun-Hee;Kim, Ja-Geun;Joo, Young-Bok;Choi, Byung-Jae;Park, Kil-Houm;Yun, Byoung-Ju
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.6 / pp.852-859 / 2009
  • In a TFT-LCD inspection system, defects are normally extracted from an image acquired by a line scan camera or an area scan camera with a CCD or CMOS sensor. Because of the limited dynamic range of the CCD or CMOS sensor and the effect of illumination, these images are frequently degraded, and the important features are hard for a human viewer to discern. To overcome this problem, feature vectors are obtained from the image using the average intensity difference between the defect and the background, based on Weber's law, and the standard deviation of the background region. The defect detection method applies a non-linear SVM (Support Vector Machine) to the extracted feature vectors. The experimental results show that the proposed method yields better defect classification performance than the conventional method.
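
One plausible reading of the two features described above is the Weber contrast of a candidate region against its background, together with the background standard deviation. The sketch below follows that reading; the exact formula used in the paper is not given in the abstract, and the pixel values and function name are hypothetical.

```python
from statistics import mean, pstdev

def defect_features(defect_pixels, background_pixels):
    """Return (Weber contrast, background standard deviation) for one candidate region."""
    bg_mean = mean(background_pixels)
    # Weber contrast: intensity difference relative to the background intensity.
    weber_contrast = (mean(defect_pixels) - bg_mean) / bg_mean
    return weber_contrast, pstdev(background_pixels)

# Hypothetical gray levels for a candidate defect and its surrounding background.
contrast, bg_std = defect_features([120, 130], [90, 110, 100, 100])
```

A feature pair like this would then be passed to the non-linear SVM as the input vector for each candidate region.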