• Title/Summary/Keyword: Classification algorithms

Comparison of Performance Factors for Automatic Classification of Records Utilizing Metadata (메타데이터를 활용한 기록물 자동분류 성능 요소 비교)

  • Young Bum Gim;Woo Kwon Chang
    • Journal of the Korean Society for Information Management / v.40 no.3 / pp.99-118 / 2023
  • The objective of this study is to identify performance factors in the automatic classification of records by utilizing metadata that contains the contextual information of records. For this study, we collected 97,064 records of original textual information from Korean central administrative agencies in 2022. Various classification algorithms, data selection methods, and feature extraction techniques were applied and compared to identify the best-performing technique. The results showed that, among classification algorithms, Random Forest displayed higher performance than the other algorithms, and among feature extraction techniques, the TF method proved to be the most effective. The minimum data quantity of unit tasks had a minimal influence on performance; adding features improved performance, while removing them had a discernible negative impact.
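
The abstract names the TF method without defining it. The sketch below assumes TF means term frequency and shows one common way to turn a metadata string into such features. The whitespace tokenizer, the lowercasing, the normalization by token count, and the sample string are all illustrative assumptions, not details taken from the study.

```python
from collections import Counter


def term_frequency(text):
    """Return relative term frequencies for lowercased, whitespace-split tokens.

    Assumption: TF is the count of a term divided by the number of tokens.
    The study may use raw counts or a different tokenizer.
    """
    tokens = text.lower().split()
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


# Hypothetical metadata string for a single record.
metadata = "budget report budget plan"
tf = term_frequency(metadata)
print(tf)
```

Feature dictionaries of this kind are typically converted to fixed-length vectors before being passed to a classifier such as Random Forest.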

Evaluation of Machine Learning Algorithm Utilization for Lung Cancer Classification Based on Gene Expression Levels

  • Podolsky, Maxim D;Barchuk, Anton A;Kuznetcov, Vladimir I;Gusarova, Natalia F;Gaidukov, Vadim S;Tarakanov, Sergey A
    • Asian Pacific Journal of Cancer Prevention / v.17 no.2 / pp.835-838 / 2016
  • Background: Lung cancer remains one of the most common cancers in the world, both in terms of new cases (about 13% of total per year) and deaths (nearly one cancer death in five), because of the high case fatality. Errors in determining lung cancer type or malignant growth degrade treatment efficacy, because anticancer strategy depends on tumor morphology. Materials and Methods: We evaluated the effectiveness of machine learning algorithms in the task of lung cancer classification based on gene expression levels. We processed four publicly available data sets. The Dana-Farber Cancer Institute data set contains 203 samples and the task was to classify four cancer types and sound tissue samples. With the University of Michigan data set of 96 samples, the task was a binary classification of adenocarcinoma and non-neoplastic tissues. The University of Toronto data set contains 39 samples and the task was to detect recurrence, while with the Brigham and Women's Hospital data set of 181 samples the task was a binary classification of malignant pleural mesothelioma and adenocarcinoma. We used the k-nearest neighbor algorithm (k=1, k=5, k=10), a naive Bayes classifier assuming either a normal distribution of attributes or a distribution through histograms, a support vector machine and a C4.5 decision tree. The effectiveness of the machine learning algorithms was evaluated with the Matthews correlation coefficient. Results: The support vector machine method showed the best results on the data sets from the Dana-Farber Cancer Institute and Brigham and Women's Hospital. All algorithms with the exception of the C4.5 decision tree showed maximum potential effectiveness on the University of Michigan data set. However, the C4.5 decision tree showed the best results for the University of Toronto data set. Conclusions: Machine learning algorithms can be used for lung cancer morphology classification and similar tasks based on gene expression level evaluation.
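
For readers unfamiliar with the Matthews correlation coefficient used above, here is a minimal sketch of its binary form, computed from confusion-matrix counts. The function name `mcc` and the example counts are illustrative assumptions. The four-class Dana-Farber task would need the multiclass generalization, which this sketch does not cover.

```python
import math


def mcc(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts.

    Returns a value in [-1, 1]. By a common convention, 0.0 is returned
    when any marginal sum is zero and the ratio is undefined.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts, not results from the paper.
perfect = mcc(tp=50, tn=50, fp=0, fn=0)
chance = mcc(tp=25, tn=25, fp=25, fn=25)
inverted = mcc(tp=0, tn=0, fp=50, fn=50)
print(perfect, chance, inverted)
```

Unlike plain accuracy, the coefficient uses all four cells of the confusion matrix, which makes it a reasonable choice for small and imbalanced data sets such as the ones listed.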

Determination of the stage and grade of periodontitis according to the current classification of periodontal and peri-implant diseases and conditions (2018) using machine learning algorithms

  • Kubra Ertas;Ihsan Pence;Melike Siseci Cesmeli;Zuhal Yetkin Ay
    • Journal of Periodontal and Implant Science / v.53 no.1 / pp.38-53 / 2023
  • Purpose: The current Classification of Periodontal and Peri-Implant Diseases and Conditions, published and disseminated in 2018, involves some difficulties and causes diagnostic conflicts due to its criteria, especially for inexperienced clinicians. The aim of this study was to design a decision system based on machine learning algorithms by using clinical measurements and radiographic images in order to determine and facilitate the staging and grading of periodontitis. Methods: In the first part of this study, machine learning models were created using the Python programming language based on clinical data from 144 individuals who presented to the Department of Periodontology, Faculty of Dentistry, Süleyman Demirel University. In the second part, panoramic radiographic images were processed and classification was carried out with deep learning algorithms. Results: Using clinical data, the accuracy of staging with the tree algorithm reached 97.2%, while the random forest and k-nearest neighbor algorithms reached 98.6% accuracy. The best staging accuracy for processing panoramic radiographic images was provided by a hybrid network model algorithm combining the proposed ResNet50 architecture and the support vector machine algorithm. For this, the images were preprocessed, and high success was obtained, with a classification accuracy of 88.2% for staging. However, in general, it was observed that the radiographic images provided a low level of success, in terms of accuracy, for modeling the grading of periodontitis. Conclusions: The machine learning-based decision system presented herein can facilitate periodontal diagnoses despite its current limitations. Further studies are planned to optimize the algorithm and improve the results.

Algorithms for Classifying the Results at the Baccalaureate Exam-Comparative Analysis of Performances

  • Marcu, Daniela;Danubianu, Mirela;Barila, Adina;Simionescu, Corina
    • International Journal of Computer Science & Network Security / v.21 no.8 / pp.35-42 / 2021
  • In the current context of the digitalization of education, modern methods and techniques of data analysis and processing play an important role in improving students' school results. In our paper, we performed a comparative study of the classification performance of the AdaBoost, SVM, Naive Bayes, Neural Network and kNN algorithms in classifying the results obtained at the Baccalaureate by students from a college in Suceava during 2012-2019. To evaluate the results we used the metrics AUC, CA, F1, Precision and Recall. The AdaBoost algorithm achieves the highest performance in classifying the results into two categories: promoted / rejected. Next in terms of performance is Naive Bayes, with a score of 0.999 for the AUC metric. The Neural Network and kNN algorithms obtain scores of 0.998 and 0.996 for AUC, respectively. SVM shows poorer performance, with a score of 0.987 for AUC. With the help of the HeatMap and DataTable visualization tools we identified possible correlations between classification results and some characteristics of the data.
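
As a rough illustration of the threshold-based metrics listed above, the sketch below computes classification accuracy (assumed to be what CA stands for), precision, recall and F1 for a two-class problem. The function name, the 0/1 label encoding and the example labels are hypothetical. AUC is left out because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"CA": accuracy, "Precision": precision, "Recall": recall, "F1": f1}


# Hypothetical labels: 1 = promoted, 0 = rejected.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```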

Study of oversampling algorithms for soil classifications by field velocity resistivity probe

  • Lee, Jong-Sub;Park, Junghee;Kim, Jongchan;Yoon, Hyung-Koo
    • Geomechanics and Engineering / v.30 no.3 / pp.247-258 / 2022
  • A field velocity resistivity probe (FVRP) can measure compressional waves, shear waves and electrical resistivity in boreholes. The objective of this study is to classify soils with a machine learning technique using the elastic wave velocity and electrical resistivity measured by the FVRP. Field and laboratory tests are performed, and the measured values are used as input variables to classify silty sand, sand, silty clay, and clay-sand mixture layers. Four algorithms are selected for the classification: k-nearest neighbors (KNN), naive Bayes (NB), random forest (RF), and support vector machine (SVM). Their hyperparameters are optimized and their accuracy is evaluated. The accuracies are 0.76, 0.91, 0.94, and 0.88 for the KNN, NB, RF, and SVM algorithms, respectively. To increase the amount of data for each soil layer and overcome imbalance in the dataset, the synthetic minority oversampling technique (SMOTE) and a conditional tabular generative adversarial network (CTGAN) are applied. The CTGAN provides improved accuracy in the KNN, NB, RF and SVM algorithms. The results demonstrate that the values measured by the FVRP can classify soil layers through three kinds of data with machine learning algorithms.
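
The core idea of SMOTE, as mentioned in the abstract, is to create synthetic minority samples by interpolating between a real sample and one of its nearest minority-class neighbors. The sketch below is a simplified, standard-library version under stated assumptions: the function name, the neighbor count `k`, the fixed seed and the sample values are all made up for illustration and do not come from the study.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples.

    Simplified SMOTE: pick a random minority point, pick one of its k nearest
    minority neighbors, and place a new point on the segment between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical two-feature samples for an under-represented soil layer.
samples = [(1.0, 10.0), (2.0, 12.0), (3.0, 11.0), (4.0, 14.0)]
new_points = smote_sketch(samples, n_new=5)
print(new_points)
```

Because every synthetic point lies on a segment between two real samples, the method cannot produce values outside the range already observed for the minority class. CTGAN, by contrast, learns a generative model of the tabular data and is not sketched here.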

Classification Strategies for High Resolution Images of Korean Forests: A Case Study of Namhansansung Provincial Park, Korea

  • Park, Chong-Hwa;Choi, Sang-Il
    • Proceedings of the KSRS Conference / 2002.10a / pp.708-708 / 2002
  • Recent developments in sensor technologies have provided remotely sensed data with very high spatial resolution. To fully utilize the potential of high resolution images, new image classification strategies are necessary. Unfortunately, high resolution images increase the within-field spectral variability, so the accuracy of traditional pixel-based classification algorithms such as the Maximum-Likelihood method may decrease (Schiewe 2001). Recent developments in Object Oriented Classification based on image segmentation algorithms can be used for the classification of forest patches on the rugged terrain of Korea. The objectives of this paper are as follows. First, to compare the pros and cons of pixel-based and object oriented classification algorithms for forest patch classification. Second, to investigate ways to increase the classification accuracy of forest patches. Supplemental data such as a DTM and a Forest Type Map of 1:25,000 scale are used for topographic correction and image segmentation. Third, to propose the best classification strategy for forest patch classification in terms of accuracy and data requirements. The research site is Namhansansung Provincial Park, located in the eastern suburbs of Seoul Metropolitan City, chosen for its diverse forest patch types and data availability. Both Landsat ETM+ and IKONOS data are used for the classification. Preliminary results can be summarized as follows. First, topographic correction of reflectance is essential for the classification of forest patches on rugged terrain. Second, object oriented classification of IKONOS data achieves higher classification accuracy than Landsat ETM+ data and pixel-based classification. Third, multi-stage segmentation is very useful for investigating the landscape ecological aspects of forest communities in Korea.

Neural Network Model Compression Algorithms for Image Classification in Embedded Systems (임베디드 시스템에서의 객체 분류를 위한 인공 신경망 경량화 연구)

  • Shin, Heejung;Oh, Hyondong
    • The Journal of Korea Robotics Society / v.17 no.2 / pp.133-141 / 2022
  • This paper introduces model compression algorithms that make a deep neural network smaller and faster for embedded systems. Model compression algorithms can be largely categorized into pruning, quantization and knowledge distillation. In this study, gradual pruning, quantization aware training, and knowledge distillation that learns the activation boundary in the hidden layer of the teacher neural network are integrated. Because a large deep neural network is compressed and accelerated by these algorithms, embedded computing boards can run it much faster and with less memory usage while preserving reasonable accuracy. To evaluate the performance of the compressed neural networks, we measure the size, latency and accuracy of the deep neural network DenseNet201 for image classification with the CIFAR-10 dataset on the NVIDIA Jetson Xavier.
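
To make the quantization idea concrete, the sketch below maps floating-point weights onto 8-bit integer codes with a linear (affine) scheme and then reconstructs them. This is a plain post-hoc mapping, not the quantization aware training used in the paper, which simulates such rounding during training. The function names, the 8-bit width and the sample weights are illustrative assumptions.

```python
def quantize_uint8(values):
    """Map floats linearly onto integer codes 0..255; return codes, scale, offset."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0  # avoid a zero scale when all values are equal
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Reconstruct approximate float values from integer codes."""
    return [lo + c * scale for c in codes]


# Hypothetical layer weights.
weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
codes, scale, lo = quantize_uint8(weights)
restored = dequantize(codes, scale, lo)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

Each weight then needs one byte instead of four, at the cost of a rounding error of at most half a quantization step per value.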

Multiscale Clustering and Profile Visualization of Malocclusion in Korean Orthodontic Patients : Cluster Analysis of Malocclusion

  • Jeong, Seo-Rin;Kim, Sehyun;Kim, Soo Yong;Lim, Sung-Hoon
    • International Journal of Oral Biology / v.43 no.2 / pp.101-111 / 2018
  • Understanding the classification of malocclusion is a crucial issue in orthodontics. It helps to diagnose, treat, and understand malocclusion and to establish a standard for definite classes of patients. Principal component analysis (PCA) and k-means algorithms have been emerging as data analytic methods for cephalometric measurements, due to their intuitive concepts and application potential. This study analyzed the macro- and meso-scale classification structure and feature basis vectors of 1020 orthodontic patients (415 male, 605 female; mean age, 25 years) using statistical preprocessing, PCA, random matrix theory (RMT) and k-means algorithms. RMT results show that 7 principal components (PCs) are significant for feature extraction. Using k-means algorithms, 3 and 6 clusters were identified, and the axes of PC1 to PC3 were determined to be significant for patient classification. The macro-scale classification corresponds to skeletal Class I, II and III, and PC1 represents the anteroposterior discrepancy of the maxilla and mandible and the mandibular position. PC2 and PC3 represent the vertical pattern and the maxillary position, respectively; they played significant roles in the meso-scale classification. In conclusion, the typical patient profile (TPP) of each class showed that the data-based classification corresponds with the clinical classification of orthodontic patients. This data-based study can provide insight into the development of new diagnostic classifications.

Comparison between Possibilistic c-Means (PCM) and Artificial Neural Network (ANN) Classification Algorithms in Land use/ Land cover Classification

  • Ganbold, Ganchimeg;Chasia, Stanley
    • International Journal of Knowledge Content Development & Technology / v.7 no.1 / pp.57-78 / 2017
  • There are several statistical classification algorithms available for land use/land cover classification. However, each has a certain bias or compromise. Some methods, like the parallelepiped approach in supervised classification, cannot classify continuous regions within a feature. On the other hand, while the unsupervised classification method takes maximum advantage of spectral variability in an image, the maximally separable clusters in spectral space may not correspond well to the classes perceived as important in a given study area. In this research, the output of an ANN algorithm was compared with that of Possibilistic c-Means (PCM), an improvement of fuzzy c-Means (FCM), on both moderate resolution Landsat 8 and high resolution Formosat 2 images. The Formosat 2 multispectral data come with an 8 m spatial resolution. This multispectral image data was resampled to 10 m in order to maintain a uniform ratio of 1:3 against the Landsat 8 image. Six classes were chosen for analysis: dense forest, eucalyptus, water, grassland, wheat and riverine sand. Using a standard false color composite (FCC), the six features reflected differently in the infrared region, with wheat producing the brightest pixel values. Signature collection per class was therefore easily obtained for all classifications. The outputs of ANN and PCM were analyzed separately for accuracy, and an error matrix was generated to assess the quality and accuracy of the classification algorithms. Compared on a per-class basis, ANN had a crisper output than PCM, which yielded clusters with pixels especially on the moderate resolution Landsat 8 imagery.

Discriminating Eggs from Two Local Breeds Based on Fatty Acid Profile and Flavor Characteristics Combined with Classification Algorithms

  • Dong, Xiao-Guang;Gao, Li-Bing;Zhang, Hai-Jun;Wang, Jing;Qiu, Kai;Qi, Guang-Hai;Wu, Shu-Geng
    • Food Science of Animal Resources / v.41 no.6 / pp.936-949 / 2021
  • This study discriminated the fatty acid profile and flavor characteristics of eggs from Beijing You Chicken (BYC), a precious local breed, and Dwarf Beijing You Chicken (DBYC). Fatty acid profile and flavor characteristics were analyzed to identify differences between BYC and DBYC eggs. Four classification algorithms were used to build classification models. Gas chromatography-mass spectrometry (GC-MS) showed significant differences (p<0.05) in arachidic acid, oleic acid (OA), eicosatrienoic acid, docosapentaenoic acid (DPA), hexadecenoic acid, monounsaturated fatty acids (MUFA), polyunsaturated fatty acids (PUFA), unsaturated fatty acids (UFA) and 35 volatile compounds. For fatty acid data, k-nearest neighbor (KNN) and support vector machine (SVM) achieved 91.7% classification accuracy. Classification models built on SPME-GC-MS data failed. For electronic nose data, the classification accuracy of KNN, linear discriminant analysis (LDA), SVM and decision tree was 100% in all cases. The overall results indicated that BYC and DBYC eggs could be discriminated based on electronic nose data with suitable classification algorithms. This research compared the differentiation of the fatty acid profile and volatile compounds of various egg yolks. The results could be applied to evaluate egg nutrition and distinguish avian eggs.