• Title/Summary/Keyword: Data classification


A Study on Utilizing 1:1,000 Digital Topographic Data for Urban Landuse Classification (도시지역 토지이용분류를 위한 1:1,000 수치지형도 활용에 관한 연구)

  • Min, Sookjoo;Kim, Kyehyun
    • KSCE Journal of Civil and Environmental Engineering Research / v.26 no.1D / pp.149-156 / 2006
  • Existing land-use classification methods based on aerial photographs or field surveys require considerable time and cost because of the manual work involved. In urban areas in particular, where land-use patterns are densely aggregated, classification from satellite imagery is more complex. Against this background, this study proposes a land-use classification method that uses 1:1,000 digital topographic data together with IKONOS satellite imagery. To test its feasibility, the method was applied to the Seoul metropolitan area. The results show an overall accuracy of approximately 95%, with 14 land-use classes extracted. Based on these pilot-study results, the method is applicable to land-use classification in urban areas.

Machine learning application to seismic site classification prediction model using Horizontal-to-Vertical Spectral Ratio (HVSR) of strong-ground motions

  • Francis G. Phi;Bumsu Cho;Jungeun Kim;Hyungik Cho;Yun Wook Choo;Dookie Kim;Inhi Kim
    • Geomechanics and Engineering / v.37 no.6 / pp.539-554 / 2024
  • This study explores the development of a prediction model for seismic site classification by integrating machine learning techniques with horizontal-to-vertical spectral ratio (HVSR) methodologies. To improve model accuracy, the research employs outlier detection methods and the synthetic minority over-sampling technique (SMOTE) for data balancing, and evaluates seven machine learning models on seismic data from KiK-net. Notably, the light gradient boosting method (LGBM), gradient boosting, and decision tree models exhibit improved performance when coupled with SMOTE, while multiple linear regression (MLR) and support vector machine (SVM) models show reduced efficacy. Outlier detection techniques significantly enhance accuracy, particularly for LGBM, gradient boosting, and voting boosting. The ensemble of LGBM with the isolation forest and SMOTE achieves the highest accuracy of 0.91, with LGBM and local outlier factor yielding the highest F1-score of 0.79. Consistently outperforming other models, LGBM proves most efficient for seismic site classification when supported by appropriate preprocessing procedures. These findings show the significance of outlier detection and data balancing for precise seismic soil classification prediction, offering insights and highlighting the potential of machine learning in optimizing site classification accuracy.
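
The data-balancing step named in this abstract can be illustrated with a small sketch. It shows only the core idea of SMOTE, which is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The function name `smote_sample`, its parameters, and the toy data are assumptions for illustration, not the authors' pipeline.

```python
import random

def smote_sample(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def dist2(a, b):
        # Squared Euclidean distance between two points.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base among the other minority points.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist2(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical minority-class feature vectors.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sample(minority)
```

Every synthetic point lies on a segment between two existing minority points, so the oversampled class stays inside the region the real samples already occupy.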

Semi-Supervised SAR Image Classification via Adaptive Threshold Selection (선별적인 임계값 선택을 이용한 준지도 학습의 SAR 분류 기술)

  • Jaejun Do;Minjung Yoo;Jaeseok Lee;Hyoi Moon;Sunok Kim
    • Journal of the Korea Institute of Military Science and Technology / v.27 no.3 / pp.319-328 / 2024
  • Semi-supervised learning is an effective way to train a classification model using a small number of labeled samples and a large number of unlabeled samples. We applied semi-supervised learning to a synthetic aperture radar (SAR) image classification model, for which only a limited number of datasets are available because they are difficult to create. To address this difficulty, semi-supervised learning uses a model trained on a small amount of labeled data to generate pseudo labels and learn from them. Many papers use a single fixed threshold to create pseudo labels. In this paper, we present a semi-supervised SAR image classification method that applies a different threshold to each class, instead of having all classes share one fixed threshold, to improve SAR classification performance with a small number of labeled datasets.
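
The difference between a shared threshold and per-class thresholds can be sketched as follows. The abstract does not say how the thresholds are chosen, so the values below, the function name `pseudo_label`, and the toy probabilities are all assumptions; the sketch only shows the selection rule.

```python
def pseudo_label(probabilities, class_thresholds):
    """Return (sample_index, class_index) pairs for confident predictions.

    probabilities: one list of class probabilities per unlabeled sample.
    class_thresholds: one confidence threshold per class.
    """
    selected = []
    for i, probs in enumerate(probabilities):
        best = max(range(len(probs)), key=probs.__getitem__)
        # Accept the prediction only if it clears its own class's threshold.
        if probs[best] >= class_thresholds[best]:
            selected.append((i, best))
    return selected

# Hypothetical model outputs for three unlabeled samples and three classes.
probs = [
    [0.96, 0.03, 0.01],
    [0.20, 0.75, 0.05],
    [0.10, 0.30, 0.60],
]
per_class = pseudo_label(probs, [0.95, 0.70, 0.80])
shared = pseudo_label(probs, [0.95, 0.95, 0.95])
```

With these toy numbers the per-class rule keeps the first two samples, while the single shared threshold of 0.95 keeps only the first.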

Ecological land cover classification of the Korean peninsula

  • Kim, Won-Joo;Lee, Seung-Gu;Kim, Sang-Wook;Park, Chong-Hwa
    • Proceedings of the KSRS Conference / 2003.11a / pp.679-681 / 2003
  • The objectives of this research are as follows. First, to investigate methods for producing a national-scale land cover map based on multi-temporal classification of MODIS data and multi-spectral classification of Landsat TM data. Second, to investigate methods to produce ecological zone maps of Korea based on vegetation, climate, and topographic characteristics. The results of this research can be summarized as follows. First, NDVI and EVI of MODIS can be used for ecological mapping of the country by using monthly phenological characteristics. Second, EVI was found to be better than NDVI in terms of atmospheric correction and vegetation mapping of the country's dense forests. Third, several ecological zones of the country can be identified from the VI maps, but exact labeling requires extensive field work, along with sufficient field data and macro-environmental data of the country. Finally, relationships between land cover types and natural environmental factors such as temperature, precipitation, elevation, and slope could be identified.
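
For readers unfamiliar with the two vegetation indices compared here, a short sketch of their standard definitions follows. The abstract does not give formulas; the EVI coefficients below are the ones commonly used for MODIS, and the reflectance values are made up for illustration.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from surface reflectances."""
    return (nir - red) / (nir + red)

def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy=1.0):
    """Enhanced Vegetation Index; the blue band corrects for aerosol
    effects and the canopy term adjusts for the soil background."""
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + canopy)

# Hypothetical reflectances for one densely vegetated pixel.
pixel_ndvi = ndvi(nir=0.5, red=0.1)
pixel_evi = evi(nir=0.5, red=0.1, blue=0.05)
```

The blue-band term in EVI is what gives it the atmospheric-correction advantage the abstract mentions.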

Gender Classification of Speakers Using SVM

  • Han, Sun-Hee;Cho, Kyu-Cheol
    • Journal of the Korea Society of Computer and Information / v.27 no.10 / pp.59-66 / 2022
  • This study classifies the gender of speakers by analyzing feature vectors extracted from voice data. It offers the convenience of automatically recognizing the gender of customers, without a manual classification process, when they request a service by voice, such as over a phone call. The study is also significant in that, after gender classification with a learning model, frequently requested services can be analyzed for each gender and customized recommendation services offered according to the analysis. Using male and female voice data with blank (silent) sections excluded, the study extracts feature vectors from each sample using MFCC (Mel Frequency Cepstral Coefficients) and trains SVM (Support Vector Machine) models. Gender classification of the voice data with the learning model achieved a recognition rate of 94%.
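
MFCC extraction warps the frequency axis onto the mel scale before the cepstral coefficients are computed. A minimal sketch of that single step is shown below, assuming the commonly used formula mel = 2595 log10(1 + f/700). The abstract does not say which variant the authors used, and `hz_to_mel` is a hypothetical helper name.

```python
import math

def hz_to_mel(freq_hz):
    """Convert a frequency in hertz to mels (2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

# Equal steps in hertz shrink on the mel scale as frequency rises.
low_step = hz_to_mel(200.0) - hz_to_mel(100.0)
high_step = hz_to_mel(4100.0) - hz_to_mel(4000.0)
```

The scale is compressed at high frequencies, which matches how pitch differences are perceived and is why it is used for speech features.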

Fuzzy SVM for Multi-Class Classification

  • Na, Eun-Young;Hong, Dug-Hun;Hwang, Chang-Ha
    • Proceedings of the Korean Data and Information Science Society Conference (한국데이터정보과학회:학술대회논문집) / 2003.10a / pp.123-123 / 2003
  • More elaborate methods that allow binary classifiers to be used for multi-class classification problems are briefly presented. To learn a K-class classification problem with fuzzy support vector classifiers (FSVC), K FSVCs are trained on a one-per-class decomposition of the general problem, and the class is chosen by taking the maximum over their outputs.
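
The maximum-over-outputs rule can be sketched in a few lines. The scoring functions below are hypothetical stand-ins for the decision functions of K trained FSVCs; the abstract gives no details of the classifiers themselves, so only the combination step is shown.

```python
def predict_one_per_class(scorers, x):
    """Return the index of the class whose binary scorer gives the
    largest output for input x."""
    scores = [score(x) for score in scorers]
    return max(range(len(scores)), key=scores.__getitem__)

# Stand-in decision functions: each one scores how well x fits its class.
scorers = [
    lambda x: -abs(x - 0.0),   # class 0
    lambda x: -abs(x - 5.0),   # class 1
    lambda x: -abs(x - 10.0),  # class 2
]
predictions = [predict_one_per_class(scorers, x) for x in (1.0, 4.2, 9.0)]
```

Each scorer only has to separate its own class from the rest, and the final decision compares their outputs directly.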


A Note on Fuzzy Support Vector Classification

  • Lee, Sung-Ho;Hong, Dug-Hun
    • Communications for Statistical Applications and Methods / v.14 no.1 / pp.133-140 / 2007
  • The support vector machine has been well developed as a powerful tool for solving classification problems. In many real-world applications, each training point has a different effect on the construction of the classification rule. Lin and Wang (2002) proposed fuzzy support vector machines for this kind of classification problem, which assign fuzzy memberships to the input data and reformulate the support vector classification. In this paper, another intuitive approach is proposed that uses the fuzzy $\alpha$-cut set. It shows the trend of the classification functions as $\alpha$ changes.
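
The $\alpha$-cut of a fuzzy set is the set of elements whose membership is at least $\alpha$. The sketch below applies that standard definition to training-point memberships. How the paper feeds the resulting subsets into the classifier is not stated in the abstract, so the membership values and the use shown here are assumptions.

```python
def alpha_cut(memberships, alpha):
    """Indices of points whose fuzzy membership is at least alpha."""
    return [i for i, m in enumerate(memberships) if m >= alpha]

# Hypothetical fuzzy memberships for five training points.
memberships = [1.0, 0.8, 0.35, 0.6, 0.1]
cuts = {alpha: alpha_cut(memberships, alpha) for alpha in (0.2, 0.5, 0.9)}
```

Raising $\alpha$ shrinks the retained set, so a classifier fitted on each cut would trace out a family of decision functions indexed by $\alpha$.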

Evaluation of Machine Learning Algorithm Utilization for Lung Cancer Classification Based on Gene Expression Levels

  • Podolsky, Maxim D;Barchuk, Anton A;Kuznetcov, Vladimir I;Gusarova, Natalia F;Gaidukov, Vadim S;Tarakanov, Segrey A
    • Asian Pacific Journal of Cancer Prevention / v.17 no.2 / pp.835-838 / 2016
  • Background: Lung cancer remains one of the most common cancers in the world, both in terms of new cases (about 13% of total per year) and deaths (nearly one cancer death in five), because of the high case fatality. Errors in determining lung cancer type or malignant growth degrade treatment efficacy, because anticancer strategy depends on tumor morphology. Materials and Methods: We attempted to evaluate the effectiveness of machine learning algorithms in the task of lung cancer classification based on gene expression levels. We processed four publicly available data sets. The Dana-Farber Cancer Institute data set contains 203 samples and the task was to classify four cancer types and sound tissue samples. With the University of Michigan data set of 96 samples, the task was to perform a binary classification of adenocarcinoma and non-neoplastic tissues. The University of Toronto data set contains 39 samples and the task was to detect recurrence, while with the Brigham and Women's Hospital data set of 181 samples it was to make a binary classification of malignant pleural mesothelioma and adenocarcinoma. We used the k-nearest neighbor algorithm (k=1, k=5, k=10), a naive Bayes classifier with the assumption of both a normal distribution of attributes and a distribution through histograms, a support vector machine, and a C4.5 decision tree. The effectiveness of the machine learning algorithms was evaluated with the Matthews correlation coefficient. Results: The support vector machine method showed the best results on the data sets from the Dana-Farber Cancer Institute and Brigham and Women's Hospital. All algorithms with the exception of the C4.5 decision tree showed maximum potential effectiveness on the University of Michigan data set. However, the C4.5 decision tree showed the best results for the University of Toronto data set. Conclusions: Machine learning algorithms can be used for lung cancer morphology classification and similar tasks based on gene expression level evaluation.
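
The evaluation metric used in this study can be written out for the binary case. The sketch below is the standard two-class Matthews correlation coefficient computed from confusion-matrix counts. The counts are made up, and the multi-class variant needed for the four-type task is not shown.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts for a two-class tissue classifier.
mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
perfect = matthews_corrcoef(tp=10, tn=10, fp=0, fn=0)
```

The coefficient ranges from -1 to 1 and uses all four cells of the confusion matrix, which makes it more informative than plain accuracy on small or unbalanced data sets such as the 39-sample one mentioned above.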

Performance Comparison of Decision Trees of J48 and Reduced-Error Pruning

  • Jin, Hoon;Jung, Yong Gyu
    • International journal of advanced smart convergence / v.5 no.1 / pp.30-33 / 2016
  • With the advent of big data, data mining is increasingly used in various decision-making fields to extract hidden and meaningful information from large amounts of data. As demand for uncovering the hidden meaning behind data grows exponentially, deciding which data mining algorithm to select and how to use it becomes more and more important. Several data mining algorithms are mainly used in biology and clinical practice: logistic regression, neural networks, support vector machines, and a variety of statistical techniques. This paper compares the classification performance of two representative ML algorithms, J48 and REPTree. The performance comparison confirms which algorithm classifies more accurately, so that more accurate prediction is possible with the algorithm suited to the goal of the experiment. Based on this, the approach is expected to help with detailed classification and distinction that is relatively difficult to do visually.

Data Classification Using the Robbins-Monro Stochastic Approximation Algorithm (로빈스-몬로 확률 근사 알고리즘을 이용한 데이터 분류)

  • Lee, Jae-Kook;Ko, Chun-Taek;Choi, Won-Ho
    • Proceedings of the KIPE Conference / 2005.07a / pp.624-627 / 2005
  • This paper presents a new data classification method using the Robbins-Monro stochastic approximation algorithm, k-nearest neighbor, and distribution analysis. To cluster the data set, we determine the centroid of the test data set and the local area of the data set using the k-nearest neighbor algorithm. To decide the class of each data point, the Robbins-Monro stochastic approximation algorithm is applied to the determined local area of the data set. To evaluate performance, the proposed classification method is compared with the conventional fuzzy c-means method and the k-NN algorithm. The simulation results show that the proposed method is more accurate than the fuzzy c-means method, the k-NN algorithm, and the discriminant analysis algorithm.
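
The Robbins-Monro iteration itself is short enough to sketch. It updates an estimate with a shrinking step size using one noisy observation per step. The example below uses step size 1/n to locate the mean of a few samples. It illustrates the basic recursion only, not the paper's combination with k-nearest neighbor, and the names and data are hypothetical.

```python
def robbins_monro(observe, x0, steps):
    """Iterate x_{n+1} = x_n - a_n * observe(x_n, n) with a_n = 1 / n."""
    x = x0
    for n in range(1, steps + 1):
        x -= (1.0 / n) * observe(x, n)
    return x

samples = [2.0, 4.0, 6.0, 8.0]
# Noisy observation of g(x) = x - mean: one sample is used per step.
estimate = robbins_monro(lambda x, n: x - samples[n - 1],
                         x0=0.0, steps=len(samples))
```

With this step size the recursion reduces to a running average, so the estimate lands on the sample mean; other step-size schedules trade convergence speed against sensitivity to noise.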
