• Title/Summary/Keyword: Misclassification Error

Performance Improvement in Speech Recognition by Weighting HMM Likelihood (은닉 마코프 모델 확률 보정을 이용한 음성 인식 성능 향상)

  • 권태희;고한석
    • The Journal of the Acoustical Society of Korea / v.22 no.2 / pp.145-152 / 2003
  • In this paper, assuming that the score of a speech utterance is the product of the HMM log likelihood and an HMM weight, we propose a new method in which the HMM weights are adapted iteratively, as in general MCE training. The proposed method adjusts the HMM weights for better performance using a delta coefficient defined in terms of the misclassification measure. The parameter estimation and Viterbi algorithms of conventional HMMs can be easily applied to the proposed model by constraining the sum of the HMM weights to the number of HMMs in an HMM set. Compared with the general segmental MCE training approach, computing time decreases because fewer parameters are estimated and gradient calculation through the optimal state sequence is avoided. To evaluate the performance of an HMM-based speech recognizer that weights the HMM likelihood, we perform Korean isolated digit recognition experiments. The experimental results show better performance than the MCE algorithm with state weighting.
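The scoring rule described in this abstract can be illustrated with a minimal sketch. It assumes the commonly used MCE misclassification measure (a smoothed comparison of the correct model's score against its rivals); the abstract does not give the authors' delta-coefficient update, so none is implemented here. The function names, the smoothing parameter `eta`, and all numeric values are hypothetical.

```python
import math

def weighted_scores(log_likelihoods, weights):
    """Score of each HMM: its weight times its log likelihood."""
    return [w * ll for w, ll in zip(weights, log_likelihoods)]

def normalize_weights(weights):
    """Rescale the weights so that they sum to the number of HMMs."""
    scale = len(weights) / sum(weights)
    return [w * scale for w in weights]

def misclassification_measure(scores, correct, eta=1.0):
    """Standard MCE measure: -g_k + (1/eta) * log(mean(exp(eta * g_j))), j != k."""
    rivals = [eta * s for j, s in enumerate(scores) if j != correct]
    peak = max(rivals)  # subtract the peak for a numerically stable log-sum-exp
    log_mean_exp = peak + math.log(
        sum(math.exp(r - peak) for r in rivals) / len(rivals)
    )
    return -scores[correct] + log_mean_exp / eta

# Hypothetical log likelihoods of one utterance under three digit models;
# model 0 is taken to be the correct one.
log_likelihoods = [-120.0, -118.0, -125.0]
uniform = normalize_weights([1.0, 1.0, 1.0])
adapted = normalize_weights([0.9, 1.05, 1.05])

d_uniform = misclassification_measure(
    weighted_scores(log_likelihoods, uniform), correct=0)
d_adapted = misclassification_measure(
    weighted_scores(log_likelihoods, adapted), correct=0)
```

A positive measure indicates that a rival model scores higher than the correct one. Because log likelihoods are negative, lowering the weight of the correct model raises its score, which is why the hypothetical `adapted` weights turn the measure negative in this toy case.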

Image Thresholding Based on Within-Class Standard Deviation (클래스 내 표준편차 기반의 문턱치 처리에 의한 영상분할)

  • Sung, Jung-Min;Ha, Ho-Gun;Choi, Bong-Yeol
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.7 / pp.216-224 / 2013
  • The within-class variance used in Otsu's method is a reasonable criterion but does not properly express the statistical distribution of each class. Otsu's method represents the distribution of each class by its variance, which is based on the squared distance from the class mean to each data point. Because of this squaring, the variance does not properly describe the actual class distribution. In this paper, to express the class statistical distributions more exactly, the within-class standard deviation is proposed as the criterion for threshold selection, and the optimal threshold is determined by minimizing it. Experimental results show that the proposed method outperformed its counterparts.
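The difference between the two criteria can be sketched on a grey-level histogram. This is an illustration under an assumption: the within-class standard deviation is taken here as the class-probability-weighted sum of the two class standard deviations, which the abstract does not spell out. The function names and the toy histogram are hypothetical.

```python
import math

def class_stats(hist, lo, hi):
    """Pixel count and variance of the grey levels lo..hi-1."""
    n = sum(hist[lo:hi])
    if n == 0:
        return 0, 0.0
    mean = sum(i * hist[i] for i in range(lo, hi)) / n
    var = sum(hist[i] * (i - mean) ** 2 for i in range(lo, hi)) / n
    return n, var

def best_threshold(hist, use_std):
    """Return t minimizing the within-class criterion (levels < t form class 0)."""
    total = sum(hist)
    best_t, best_cost = None, math.inf
    for t in range(1, len(hist)):
        n0, v0 = class_stats(hist, 0, t)
        n1, v1 = class_stats(hist, t, len(hist))
        if n0 == 0 or n1 == 0:
            continue  # skip splits that leave one class empty
        if use_std:
            # assumed form of the within-class standard deviation
            cost = (n0 * math.sqrt(v0) + n1 * math.sqrt(v1)) / total
        else:
            # Otsu's within-class variance
            cost = (n0 * v0 + n1 * v1) / total
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

# Toy bimodal histogram over 10 grey levels.
hist = [5, 5, 0, 0, 0, 0, 0, 0, 5, 5]
t_variance = best_threshold(hist, use_std=False)
t_std = best_threshold(hist, use_std=True)
```

On a cleanly bimodal histogram both criteria pick a threshold in the empty gap between the modes; they differ on skewed or unbalanced histograms, where squaring lets the more spread-out class dominate the variance criterion.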

A Comparative Study of Classification Methods Using Data with Label Noise (레이블 노이즈가 존재하는 자료의 판별분석 방법 비교연구)

  • Kwon, So Young;Kim, Kyoung Hee
    • Journal of the Korean Data Analysis Society / v.20 no.6 / pp.2853-2864 / 2018
  • Discriminant analysis predicts the class label of a new observation with an unknown label, using information from existing labeled data. Hence, the observed labels play a critical role in the analysis, and we usually assume that these labels are correct. If the observed labels contain errors, the data have label noise. Label noise occurs frequently in real data and can affect classification performance. To examine this, a comparative study was carried out using simulated data with label noise. In particular, we considered four classification techniques: LDA (linear discriminant analysis), QDA (quadratic discriminant analysis), KNN (k-nearest neighbour), and SVM (support vector machine). We evaluated each method by its average accuracy on data generated under various scenarios. The effect of label noise was investigated in terms of its occurrence rate and type (noise location). We confirmed that label noise is a significant factor influencing classification performance.
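A small simulation in the spirit of this study is sketched below. It is not the authors' design: it uses only a k-nearest-neighbour classifier, two hypothetical Gaussian classes, and uniformly random label flips in the training set (the noise-location scenarios of the paper are not reproduced). All names and parameters are assumptions.

```python
import random

def make_data(n, rng):
    """Two 2-D Gaussian classes whose means differ by 4 along the first axis."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        x = (rng.gauss(0.0, 1.0) + 4.0 * label, rng.gauss(0.0, 1.0))
        data.append((x, label))
    return data

def flip_labels(data, rate, rng):
    """Flip each binary label independently with probability `rate`."""
    return [(x, (1 - y) if rng.random() < rate else y) for x, y in data]

def knn_predict(train, x, k=5):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - x[0]) ** 2 + (item[0][1] - x[1]) ** 2,
    )[:k]
    votes = sum(y for _, y in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train, test, k=5):
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def noise_experiment(rates, seed=0):
    """Accuracy on clean test labels for each training label-noise rate."""
    rng = random.Random(seed)
    train, test = make_data(200, rng), make_data(200, rng)
    return {r: accuracy(flip_labels(train, r, rng), test) for r in rates}

results = noise_experiment([0.0, 0.1, 0.3, 0.5])
```

Only the training labels are corrupted; accuracy is measured against clean test labels, so the result isolates the effect of noise on what the classifier learns.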

Finding the Optimal Data Classification Method Using LDA and QDA Discriminant Analysis

  • Kim, SeungJae;Kim, SungHwan
    • Journal of Integrative Natural Science / v.13 no.4 / pp.132-140 / 2020
  • With the recent introduction of artificial intelligence (AI) technology, the use of data and the volume of newly generated data are both increasing rapidly. To obtain meaningful results from these data, the first step is to classify the data well. However, if only a single machine learning classification technique is applied, errors such as overfitting can occur. To reduce or minimize the problems caused by misclassification, such as overfitting, it is necessary to derive an optimal classification by applying several classification techniques and comparing their results. Interpreting the data with only one classification technique leads to poor reasoning and poor predictions. This study seeks a method for optimally classifying data by viewing the data from various perspectives and applying linear and nonlinear classification techniques such as LDA and QDA as a preprocessing step before data analysis. To obtain reliable and refined statistics from big data analysis, it is necessary to analyze the meaning of each variable and the correlations between variables. If the data are classified in a way that does not match the hypothesis being tested, the results will be unreliable even if the analysis itself is performed well. In other words, before big data analysis, the data must be classified well enough to suit the purpose of the analysis. This step must be performed before any result is drawn from the data, and it may serve as a method of optimal data classification.
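Why comparing a linear and a nonlinear classifier can matter is shown by the following one-dimensional sketch of Gaussian discriminant analysis. It is an illustration only, not the study's procedure: LDA is modelled with a pooled variance, QDA with class-specific variances (maximum-likelihood estimates), and accuracy is measured on the training data. The toy data and function names are hypothetical.

```python
import math

def fit(samples_by_class, pooled):
    """Return (mean, variance, prior) per class; pooled=True gives LDA."""
    total = sum(len(xs) for xs in samples_by_class)
    means = [sum(xs) / len(xs) for xs in samples_by_class]
    sq = [sum((x - m) ** 2 for x in xs)
          for xs, m in zip(samples_by_class, means)]
    if pooled:
        variances = [sum(sq) / total] * len(samples_by_class)
    else:
        variances = [s / len(xs) for s, xs in zip(sq, samples_by_class)]
    priors = [len(xs) / total for xs in samples_by_class]
    return list(zip(means, variances, priors))

def predict(model, x):
    """Pick the class with the largest Gaussian discriminant score."""
    scores = [-0.5 * math.log(v) - (x - m) ** 2 / (2 * v) + math.log(p)
              for m, v, p in model]
    return scores.index(max(scores))

def accuracy(model, samples_by_class):
    pairs = [(x, k) for k, xs in enumerate(samples_by_class) for x in xs]
    return sum(predict(model, x) == k for x, k in pairs) / len(pairs)

# Two classes with the same mean but very different spread.
data = [[-1.0, -0.5, 0.0, 0.0, 0.5, 1.0],
        [-6.0, -5.0, -4.0, 4.0, 5.0, 6.0]]
lda_acc = accuracy(fit(data, pooled=True), data)
qda_acc = accuracy(fit(data, pooled=False), data)
```

With equal class means a pooled-variance (linear) rule cannot tell the classes apart, while the class-specific variances give QDA a quadratic boundary that separates the tight class from the wide one.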

A Meta-Analysis of Air Pollution in Relation to Daily Mortality in Seven Major Cities of Korea, 1998-2001 (메타분석을 적용한 전국 7개 대도시의 대기오염과 일일사망발생의 상관성 연구(1998년$\sim$2001년))

  • Cho, Yong-Sung;Lee, Jong-Tae;Son, Ji-Young;Kim, Yoon-Shin
    • Journal of Environmental Health Sciences / v.32 no.4 s.91 / pp.304-315 / 2006
  • This study reexamines the association between ambient air pollution and daily mortality in seven major cities of Korea using a method of meta-analysis with the data filed for the period 1998-2001. These cities account for half of the Korean population (about 23 million). The observed concentrations of carbon monoxide (CO, mean=1.08 ppm), ozone ($O_3$, mean=33.97 ppb), particulate matter less than 10 ${\mu}m$ ($PM_{10}$, mean=57.11 ${\mu}g/m^3$), nitrogen dioxide ($NO_2$, mean=25.09 ppb), and sulfur dioxide ($SO_2$, mean=9.14 ppb) during the study period were below Korea's current ambient air quality standards. Generalized additive models were applied to allow highly flexible fitting of seasonal and long-term time trends in air pollution as well as nonlinear associations with weather variables, such as air temperature and relative humidity. We also calculated a weighted mean as a meta-analysis summary of the estimates and its standard error. In city-specific analyses, an increase of 41.17 ${\mu}g/m^3$ (IQR) in $PM_{10}$ corresponded to $1{\sim}12\%$ more deaths, given constant weather conditions. Like most air pollution epidemiologic studies, this meta-analysis cannot avoid measurement misclassification, since no personal measurement was taken. However, we can expect measurement bias to be reduced in a district-specific estimate, since a monitoring station is more representative of the air quality of its matched district. Significant heterogeneity was found for the effect of all pollutants. The relative risks estimated from the meta-analysis were higher than those from the pooled analysis. The results, similar to those of previous studies, indicate that air pollution at current levels has health effects in many industrialized countries, including Korea.
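The weighted-mean summary and the "percent more deaths per IQR" figures can be illustrated as follows. This sketch assumes inverse-variance weights and a log-linear model (so a coefficient `beta` per unit of pollutant maps to a percent change of `exp(beta * increment) - 1`); the abstract does not state the weighting scheme, and the city-specific coefficients below are invented for illustration.

```python
import math

def pooled_estimate(betas, ses):
    """Inverse-variance weighted mean of the estimates and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se

def percent_increase(beta, increment):
    """Percent change in expected deaths for a given pollutant increment."""
    return (math.exp(beta * increment) - 1.0) * 100.0

# Hypothetical city-specific log-rate coefficients per ug/m^3 of PM10
# and their standard errors (not taken from the paper).
betas = [0.0004, 0.0011, 0.0019]
ses = [0.0003, 0.0005, 0.0008]

IQR = 41.17  # ug/m^3, the interquartile range quoted in the abstract
beta, se = pooled_estimate(betas, ses)
summary = percent_increase(beta, IQR)
low = percent_increase(beta - 1.96 * se, IQR)
high = percent_increase(beta + 1.96 * se, IQR)
```

Estimates with smaller standard errors dominate the weighted mean, so the summary sits closest to the most precisely estimated city.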

Detection of Surface Water Bodies in Daegu Using Various Water Indices and Machine Learning Technique Based on the Landsat-8 Satellite Image (Landsat-8 위성영상 기반 수분지수 및 기계학습을 활용한 대구광역시의 지표수 탐지)

  • CHOUNG, Yun-Jae;KIM, Kyoung-Seop;PARK, In-Sun;CHUNG, Youn-In
    • Journal of the Korean Association of Geographic Information Studies / v.24 no.1 / pp.1-11 / 2021
  • Detection of surface water features, including rivers, wetlands, and reservoirs, from satellite imagery can be used for the sustainable management and survey of water resources. This research compared water indices derived from multispectral bands with a machine learning technique for detecting surface water features in a Landsat-8 satellite image acquired over Daegu, through the following steps. First, the NDWI (Normalized Difference Water Index) image and the MNDWI (Modified Normalized Difference Water Index) image were generated separately from the multispectral bands of the Landsat-8 image, and a binary image was generated from each of them. Then SVM (Support Vector Machine), a widely used machine learning technique, was employed to generate a land cover image, from which a third binary image was generated. Finally, error matrices were used to measure the accuracy of the three binary images in detecting surface water features. The statistical results showed that the binary image generated from the MNDWI image had lower accuracy (84%) than the binary image generated from the NDWI image (94%) and the one generated by SVM (96%). In addition, misclassification errors occurred in all three binary images where land features were misclassified as surface water because of shadow effects.
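The index-based part of this workflow can be sketched in a few lines. The sketch uses the usual definitions NDWI = (Green − NIR) / (Green + NIR) and MNDWI = (Green − SWIR) / (Green + SWIR); for Landsat-8 these correspond to bands 3, 5, and 6. The threshold of 0 for the binary image, the function names, and the tiny reflectance grids are assumptions, not values from the paper.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), with 0.0 returned where the sum is zero."""
    return (a - b) / (a + b) if (a + b) != 0 else 0.0

def water_mask(green, other, threshold=0.0):
    """Binary image: 1 where the index exceeds the threshold (water), else 0.

    Pass the NIR band as `other` for NDWI or the SWIR band for MNDWI.
    """
    return [
        [1 if normalized_difference(g, o) > threshold else 0
         for g, o in zip(green_row, other_row)]
        for green_row, other_row in zip(green, other)
    ]

def error_matrix(predicted, reference):
    """2x2 matrix indexed as [reference class][predicted class]."""
    matrix = [[0, 0], [0, 0]]
    for pred_row, ref_row in zip(predicted, reference):
        for p, r in zip(pred_row, ref_row):
            matrix[r][p] += 1
    return matrix

def overall_accuracy(matrix):
    """Correctly classified pixels divided by all pixels."""
    return (matrix[0][0] + matrix[1][1]) / sum(sum(row) for row in matrix)

# Hypothetical 2x2 reflectance grids and reference labels (1 = water).
green = [[0.10, 0.08], [0.05, 0.09]]
nir = [[0.03, 0.30], [0.25, 0.02]]
reference = [[1, 0], [1, 1]]

ndwi_mask = water_mask(green, nir)
matrix = error_matrix(ndwi_mask, reference)
acc = overall_accuracy(matrix)
```

The off-diagonal cells of the matrix separate the two error types, so land pixels wrongly labelled as water (the shadow errors noted in the abstract) would be counted in `matrix[0][1]`.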

A Meta-analysis of Ambient Air Pollution in Relation to Daily Mortality in Seoul, $1991\sim1995$ (메타분석 방법을 적용한 서울시 대기오염과 조기사망의 상관성 연구 (1991년$\sim$1995년))

  • Dockery, Douglas W.;Kim, Chun-Bae;Jee, Sun-Ha;Chung, Yong;Lee, Jong-Tae
    • Journal of Preventive Medicine and Public Health / v.32 no.2 / pp.177-182 / 1999
  • Objectives: To reexamine the association between air pollution and daily mortality in Seoul, Korea, using a method of meta-analysis with the data filed for 1991 through 1995. Methods: A separate Poisson regression analysis was conducted for each district within the metropolitan area of Seoul to regress daily death counts on the level of each ambient air pollutant, namely total suspended particulates (TSP), sulfur dioxide $(SO_2)$, and ozone $(O_3)$, controlling for variability in weather conditions. We calculated a weighted mean as a meta-analysis summary of the estimates and its standard error. Results: The p value for the test of the homogeneity assumption was small (p<0.01) in each pollutant model because of the large disparity among district-specific estimates. Therefore, all results reported here were estimated from the random effect model. Using the weighted mean that we calculated, the relative risk of mortality for a $100{\mu}g/m^3$ increment in the 3-day moving average of TSP levels was 1.034 (95% CI 1.009-1.059). Mortality was estimated to increase by 6% (95% CI 3-10%) and 3% (95% CI 0-6%) with each 50 ppb increase in the 9-day moving average of $SO_2$ and the 1-hr maximum $O_3$, respectively. Conclusions: Like most air pollution epidemiologic studies, this meta-analysis cannot avoid measurement misclassification, since no personal measurement was taken. However, we can expect measurement bias to be reduced in a district-specific estimate, since a monitoring station is more representative of the air quality of its matched district. The results, similar to those of previous studies, indicate that air pollution at current levels has health effects in many industrialized countries, including Korea.
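The move from a homogeneity test to a random effect model can be sketched as below. The abstract does not name the estimator, so this example assumes Cochran's Q statistic for homogeneity and the DerSimonian-Laird estimate of the between-district variance; the district-specific coefficients are invented for illustration.

```python
import math

def random_effects(betas, ses):
    """Return (pooled estimate, its SE, Cochran's Q, between-study variance)."""
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # DerSimonian-Laird estimate, floored at 0
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * b for wi, b in zip(w_star, betas)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), q, tau2

def relative_risk(beta, se, increment):
    """Relative risk and 95% interval for a given pollutant increment."""
    return tuple(math.exp(increment * v)
                 for v in (beta, beta - 1.96 * se, beta + 1.96 * se))

# Hypothetical district-specific log-rate coefficients per ug/m^3 of TSP.
betas = [0.0001, 0.0006, 0.0002, 0.0005]
ses = [0.0001, 0.0001, 0.00015, 0.0001]

pooled, se, q, tau2 = random_effects(betas, ses)
rr, rr_low, rr_high = relative_risk(pooled, se, 100.0)
```

Under homogeneity Q is expected to be close to its degrees of freedom; a much larger Q yields a positive `tau2`, which evens out the weights and widens the interval compared with the fixed-effect summary.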
