• Title/Abstract/Keywords: support vector machines (SVM)

Improving the Performance of a Fast Text Classifier with Document-side Feature Selection

  • 이재윤
    • 정보관리연구
    • /
    • Vol. 36, No. 4
    • /
    • pp.51-69
    • /
    • 2005
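  • Improving classification speed has become an important research problem in text categorization. The recently developed feature value voting technique is very fast at automatic text categorization, but its classification accuracy is unsatisfactory. This paper proposes a new feature selection technique, document-side feature selection, and applies it to feature value voting. Unlike conventional feature selection for classification, document-side feature selection selects only a subset of the features of the document to be classified, rather than of the training set, and uses that subset for classification. In experiments with document-side feature selection, the simple and fast feature value voting classifier performed as well as an SVM classifier.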
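
To make the idea of document-side feature selection concrete, here is a minimal sketch rather than the paper's implementation. The abstract does not give the vote weights or the selection criterion, so both are assumptions: each term votes for a label with the share of its training occurrences in that label, and the document being classified keeps only its `top_k` terms with the strongest single vote. The names `train_votes`, `classify`, and `top_k` are hypothetical.

```python
from collections import defaultdict


def train_votes(docs):
    """Learn per-term votes from (tokens, label) training pairs.

    Assumed vote weight: the share of a term's training occurrences that
    fall in each label. The paper's actual weighting may differ.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for tokens, label in docs:
        for term in tokens:
            counts[term][label] += 1
    votes = {}
    for term, by_label in counts.items():
        total = sum(by_label.values())
        votes[term] = {label: n / total for label, n in by_label.items()}
    return votes


def classify(tokens, votes, top_k=None):
    """Sum the votes of the document's terms; optionally keep only top_k terms."""
    known = sorted(t for t in set(tokens) if t in votes)
    if top_k is not None:
        # Document-side selection (assumed criterion): rank the terms of the
        # document being classified by their strongest vote and keep top_k.
        known = sorted(known, key=lambda t: -max(votes[t].values()))[:top_k]
    scores = defaultdict(float)
    for term in known:
        for label, weight in votes[term].items():
            scores[label] += weight
    if not scores:
        return None  # no known terms, so no votes were cast
    # Ties are broken alphabetically by label name.
    return max(sorted(scores), key=lambda label: scores[label])


# Toy training set, invented for illustration only.
training = [
    (["ball", "goal", "team"], "sports"),
    (["vote", "party", "team"], "politics"),
    (["goal", "match"], "sports"),
]
votes = train_votes(training)
print(classify(["team", "goal", "ball", "vote"], votes))
print(classify(["team", "goal", "ball", "vote"], votes, top_k=2))
```

Note that the selection happens at classification time and touches only the terms of the incoming document; the trained vote table is left unchanged, which is what distinguishes this from ordinary training-side feature selection.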

De-Noising and Contour Preserving Digit Enhancement for Meter Digit Recognition

  • 이은규;고재필
    • 한국정보과학회:학술대회논문집
    • /
    • Proceedings of the 한국정보과학회 2006 Fall Conference, Vol.33 No.2 (B)
    • /
    • pp.515-520
    • /
    • 2006
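  • Meter digit recognition is a technology in which a camera is attached to a commonly used analog meter, an image of the digit panel is transmitted at reading time, and the digits are extracted from that image and recognized. In meter digit recognition, the camera's installation condition and other environmental factors make it difficult to acquire digit panel images consistently. To compensate for the variations in acquired image condition that harm digit recognition, this paper proposes noise removal and contour-preserving digit enhancement. For noise removal, noise is divided into three types according to where it is distributed, and each type is removed separately. In the contour-preserving digit enhancement step, the brightness of the central part of each digit is enhanced while the intensity of the digit border is preserved, so as to minimize the loss of border information that general binarization techniques incur. An SVM (Support Vector Machine) was used to compare recognition rates before and after preprocessing, with 1,409 training images and 1,782 test images acquired under different lighting conditions as the experimental data. The experiments confirmed a performance improvement of 81.09%, which demonstrates that the proposed preprocessing technique contributes greatly to recognition performance by resolving the lighting-induced variation in the data.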

A Multivariate Decision Tree using Support Vector Machines

  • 강선구;이병우;나용찬;조현성;윤철민;양지훈
    • 한국정보과학회:학술대회논문집
    • /
    • Proceedings of the 한국정보과학회 2006 Fall Conference, Vol.33 No.2 (B)
    • /
    • pp.278-283
    • /
    • 2006
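  • Decision trees have a large hypothesis space, which can give them flexible and robust performance. However, decision trees tend to overfit the training data. Several pruning algorithms have been developed to remove this tendency. Pruning, however, cannot solve the problem of wasted space that arises when the data are not parallel to the attribute axes. This paper therefore presents a method that addresses this problem with linear classifiers at multivariate nodes, and introduces support vector machines to improve the performance of the decision tree (SVMDT). The proposed algorithm consists of three parts: first, selecting the attributes to use at each node; second, a version of ID3 modified for this purpose; and third, a basic pruning algorithm. In comparisons with OC1, C4.5, and SVM on UCI data sets, SVMDT showed improved results.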

Machine learning of LWR spent nuclear fuel assembly decay heat measurements

  • Ebiwonjumi, Bamidele;Cherezov, Alexey;Dzianisau, Siarhei;Lee, Deokjung
    • Nuclear Engineering and Technology
    • /
    • Vol. 53, No. 11
    • /
    • pp.3563-3579
    • /
    • 2021
  • Measured decay heat data of light water reactor (LWR) spent nuclear fuel (SNF) assemblies are adopted to train machine learning (ML) models. The measured data are available for fuel assemblies irradiated in commercial reactors operated in the United States and Sweden. The data come from calorimetric measurements of discharged pressurized water reactor (PWR) and boiling water reactor (BWR) fuel assemblies. 91 and 171 measurements of PWR and BWR assembly decay heat data are used, respectively. Due to the small size of the measurement dataset, we propose (i) to use the method of multiple runs and (ii) to generate and use synthetic data as a large dataset with statistical characteristics similar to those of the original dataset. Three ML models are developed based on Gaussian process (GP), support vector machines (SVM), and neural networks (NN), with four inputs: the fuel assembly averaged enrichment, assembly averaged burnup, initial heavy metal mass, and cooling time after discharge. The outcomes of this work are (i) development of ML models that predict LWR fuel assembly decay heat from the four inputs, (ii) generation and application of synthetic data that improve the performance of the ML models, and (iii) uncertainty analysis of the ML models and their predictions.

Language Matters: A Systemic Functional Linguistics-Enhanced Machine Learning Framework for Cyberbullying Detection

  • Raghad Altowairgi;Ala Eshamwi;Lobna Hsairi
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 23, No. 9
    • /
    • pp.192-198
    • /
    • 2023
  • Cyberbullying is a growing problem among adolescents and can have serious psychological and emotional consequences for the victims. In recent years, machine learning techniques have emerged as a promising approach for detecting instances of cyberbullying in online communication. This research paper focuses on developing machine learning models that can detect cyberbullying, including support vector machines, naïve Bayes, and random forests. The study uses a dataset of real-world examples of cyberbullying collected from Twitter and extracts features that represent the ideational metafunction, then evaluates the performance of each algorithm before and after considering the theory of systemic functional linguistics in terms of precision, recall, and F1-score. The results indicate that all three algorithms are effective at detecting cyberbullying, with an accuracy of 92% for naïve Bayes and 93% for both SVM and random forests. However, the study also highlights the challenges of accurately detecting cyberbullying, particularly given the nuanced and context-dependent nature of online communication. The paper concludes by discussing the implications of these findings for future research and for the development of practical tools for cyberbullying prevention and intervention.

A robust approach in prediction of RCFST columns using machine learning algorithm

  • Van-Thanh Pham;Seung-Eock Kim
    • Steel and Composite Structures
    • /
    • Vol. 46, No. 2
    • /
    • pp.153-173
    • /
    • 2023
  • The rectangular concrete-filled steel tubular (RCFST) column, a type of concrete-filled steel tubular (CFST) column, is widely used in compression members of structures because of its advantages. This paper proposes a robust machine learning-based framework for predicting the ultimate compressive strength of RCFST columns under both concentric and eccentric loading. The gradient boosting neural network (GBNN), an efficient and up-to-date ML algorithm, is used to develop a predictive model in the proposed framework. A total of 890 experimental data points on RCFST columns, categorized into two datasets of concentric and eccentric compression, are carefully collected for training and testing. The accuracy of the proposed model is demonstrated by comparing its performance with seven state-of-the-art machine learning methods: decision tree (DT), random forest (RF), support vector machines (SVM), deep learning (DL), adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), and categorical gradient boosting (CatBoost). Four available design codes, namely the European (EC4), American Concrete Institute (ACI), American Institute of Steel Construction (AISC), and Australian/New Zealand (AS/NZS) codes, are referenced in another comparison. The results demonstrate that the proposed GBNN method is a robust and powerful approach to obtaining the ultimate strength of RCFST columns.

A novel method for vehicle load detection in cable-stayed bridge using graph neural network

  • Van-Thanh Pham;Hye-Sook Son;Cheol-Ho Kim;Yun Jang;Seung-Eock Kim
    • Steel and Composite Structures
    • /
    • Vol. 46, No. 6
    • /
    • pp.731-744
    • /
    • 2023
  • Vehicle load information plays an important role in operating cable-stayed bridges and ensuring their structural health. In this regard, an efficient and economical method is proposed for vehicle load detection based on the observed cable tension and vehicle position using a graph neural network (GNN). Datasets are first generated using the practical advanced analysis program (PAAP), a robust program that models both geometric and material nonlinearities of bridge structures subjected to vehicle load at low computational cost. Owing to the strengths of the GNN, the proposed model is shown to capture precisely the complex nonlinear correlations between the input features and the vehicle load in the output. Four popular machine learning methods, namely artificial neural network (ANN), decision tree (DT), random forest (RF), and support vector machines (SVM), are referenced in a comparison. A case study of a cable-stayed bridge with a typical truck is considered to evaluate the model's performance. The results demonstrate that the GNN-based model provides high accuracy and efficiency in prediction, with satisfactory correlation coefficients, coefficients of determination, and very small errors, and that it is a novel approach for vehicle load detection using the input data of the existing monitoring system.

Research on Oriental Medicine Diagnosis and Classification System by Using Neck Pain Questionnaire

  • 송인;이건목;홍권의
    • Journal of Acupuncture Research
    • /
    • Vol. 28, No. 3
    • /
    • pp.85-100
    • /
    • 2011
  • Objectives: The purpose of this thesis is to help prepare oriental medicine clinical guidelines by drawing up standards for oriental medicine demonstration and diagnostic classification of neck pain. Methods: Using an oriental medicine diagnosis questionnaire, statistical analysis was conducted on Gyeonghangtong (頸項痛), Nakchim (落枕), Sagyeong (斜頸), and Hanggang (項强), categories derived from experts' opinions on neck pain patients by the Delphi method. The results were classified using linear discriminant analysis (LDA), diagonal linear discriminant analysis (DLDA), diagonal quadratic discriminant analysis (DQDA), K-nearest neighbor classification (KNN), classification and regression trees (CART), and support vector machines (SVM). Results: The results are summarized as follows. 1. The LDA analysis has a hit rate of 84.47% in comparison with the original diagnosis. 2. The hit rate was high when the test was conducted on three categories: a combined Gyeonghangtong and Hanggang category, a Sagyeong category, and a Nakchim category. 3. The DLDA analysis has a hit rate of 58.25% in comparison with the original diagnosis. The DQDA analysis has an accuracy of 57.28% in comparison with the original diagnosis. 4. The KNN analysis has a hit rate of 69.90% in comparison with the original diagnosis. 5. The CART analysis has a hit rate of 69.60% in comparison with the original diagnosis. The hit rate was 70.87% when the test was performed on 8 significant questions selected by analysis of variance. 6. The SVM analysis has a hit rate of 80.58% in comparison with the original diagnosis. Conclusions: Statistical analysis using the oriental medicine diagnosis questionnaire on neck pain generally produced significant results.

Hand Tracking and Hand Gesture Recognition for Human Computer Interaction

  • Bai, Yu;Park, Sang-Yun;Kim, Yun-Sik;Jeong, In-Gab;Ok, Soo-Yol;Lee, Eung-Joo
    • 한국멀티미디어학회논문지
    • /
    • Vol. 14, No. 2
    • /
    • pp.182-193
    • /
    • 2011
  • The aim of this paper is to present a methodology for hand tracking and hand gesture recognition. The detected hand and gesture can be used to implement a non-contact mouse. We developed an MP3 player that uses this technology to control the computer in place of a mouse. In this algorithm, we first pre-process every frame, including lighting compensation and background filtering, to reduce adverse effects on the accuracy of hand tracking and hand gesture recognition. Second, a YCbCr skin-color likelihood algorithm is used to detect the hand area. Then, we use the Continuously Adaptive Mean Shift (CAMSHIFT) algorithm to track the hand. Because the formula-based region of interest is square whereas the hand is closer to a rectangle, we improved the search window formula to obtain a more suitable search window for the hand. The Support Vector Machines (SVM) algorithm is then used for hand gesture recognition. To train the system, we collected 1500 pictures of 5 hand gestures. Finally, we performed extensive experiments on a Windows XP system to evaluate the efficiency of the proposed scheme. The correct rate of hand tracking is 96%, and the average correct rate of hand gesture recognition is 95%.

Cluster Analysis of SNPs with Entropy Distance and Prediction of Asthma Type Using SVM

  • 이중섭;신기섭;위규범
    • 정보처리학회논문지B
    • /
    • Vol. 18B, No. 2
    • /
    • pp.67-72
    • /
    • 2011
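  • Single nucleotide polymorphisms (SNPs) are an important tool for studying the structure of the human genome. Cluster analysis of large volumes of gene phenotype data is useful for discovering biologically related groups of genes and for building gene-gene interaction networks. In this paper, we used hierarchical clustering based on entropy distance to form and compare clusters for an asthma patient group and a normal control group, and showed that a meaningful difference between the two groups appears with five clusters. Using a support vector machine, we measured the disease prediction accuracy of combinations of representative SNPs from each cluster of the asthma patient group and found the best combination for diagnosing the two types of asthma. The best combination is a model of five SNPs, including a SNP in the gene ALOX12, and it has a prediction accuracy of 66.41% for aspirin-tolerant asthma.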
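
As an illustration of what an entropy distance between two SNPs can look like, the sketch below uses one common definition, d(X, Y) = 2H(X, Y) - H(X) - H(Y), which equals H(X|Y) + H(Y|X). This is an assumption: the abstract does not say which definition or normalization the paper uses. Encoding genotypes as 0, 1, 2 and the function names are also hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def entropy_distance(x, y):
    """Assumed definition: d(X, Y) = 2*H(X, Y) - H(X) - H(Y).

    The distance is 0 when each variable determines the other and grows
    as the two variables share less information.
    """
    if len(x) != len(y):
        raise ValueError("x and y must list genotypes for the same samples")
    joint = entropy(list(zip(x, y)))  # H(X, Y) from paired observations
    return 2 * joint - entropy(x) - entropy(y)


# Invented genotype vectors (one entry per individual), for illustration only.
snp_a = [0, 0, 1, 1, 2, 2]
snp_b = [2, 2, 0, 0, 1, 1]  # a relabelling of snp_a
snp_c = [0, 1, 0, 1, 0, 1]
print(entropy_distance(snp_a, snp_b))
print(entropy_distance(snp_a, snp_c))
```

A pairwise matrix of such distances is the kind of input a hierarchical clustering routine takes; the linkage method is a further choice that the abstract leaves open.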