• Title/Summary/Keyword: Mixture of Gaussian

Computer Vision Approach for Phenotypic Characterization of Horticultural Crops (컴퓨터 비전을 활용한 토마토, 파프리카, 멜론 및 오이 작물의 표현형 특성화)

  • Seungri Yoon;Minju Shin;Jin Hyun Kim;Ho Jeong Jeong;Junyoung Park;Tae In Ahn
    • Journal of Bio-Environment Control / v.33 no.1 / pp.63-70 / 2024
  • This study explored computer vision methods using the OpenCV open-source library to characterize the phenotypes of various horticultural crops. In the case of tomatoes, image color was examined to assess ripeness, while support vector machine (SVM) and histogram of oriented gradients (HOG) methods effectively identified ripe tomatoes. For sweet pepper, we visualized the color distribution and used the Gaussian mixture model for clustering to analyze its post-harvest color characteristics. For the quality assessment of netted melons, the LAB (lightness, a, b) color space, binary images, and depth mapping were used to measure the net patterns of the melon. In addition, a combination of depth and color data proved successful in identifying flowers of different sizes and distances in cucumber greenhouses. This study highlights the effectiveness of these computer vision strategies in monitoring the growth and development, ripening, and quality assessment of fruits and vegetables. For broader applications in agriculture, future researchers and developers should enhance these techniques with plant physiological indicators to promote their adoption in both research and practical agricultural settings.
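
The sweet-pepper analysis clusters a color distribution with a Gaussian mixture model. The Python below is a minimal sketch of that idea, not the authors' OpenCV pipeline. It fits a two-component 1-D mixture to hue values by expectation-maximization (EM) and labels each sample with its most probable component. The hue values, the use of a single hue channel, and the helper names `fit_gmm_1d` and `assign_clusters` are hypothetical. In practice a library routine such as scikit-learn's `GaussianMixture` would usually be used instead.

```python
import math


def fit_gmm_1d(xs, k=2, iters=100, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture to xs by expectation-maximization."""
    n = len(xs)
    lo, hi = min(xs), max(xs)
    # Deterministic start: means spread evenly over the data range, shared variance.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    mu_all = sum(xs) / n
    var_all = max(sum((x - mu_all) ** 2 for x in xs) / n, min_var)
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            dens = [w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                               min_var)
    return weights, means, variances


def assign_clusters(xs, weights, means, variances):
    """Label each sample with the index of its most probable component."""
    def log_score(x, j):
        return (math.log(weights[j]) - 0.5 * math.log(2 * math.pi * variances[j])
                - (x - means[j]) ** 2 / (2 * variances[j]))
    return [max(range(len(means)), key=lambda j: log_score(x, j)) for x in xs]


# Hypothetical hue values (OpenCV hue scale 0-179) from two groups of harvested peppers.
hues = [20, 21, 22, 19, 20, 60, 61, 59, 62, 60]
w, mu, var = fit_gmm_1d(hues, k=2)
labels = assign_clusters(hues, w, mu, var)
print(sorted(round(m, 1) for m in mu), labels)
```

With well-separated color groups the fitted means settle on the group averages. Overlapping color classes would need more careful initialization, for example several random restarts.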

The Study on the Verification of Speaker Change using GMM-UBM based KL distance (GMM-UBM 기반 KL 거리를 활용한 화자변화 검증에 대한 연구)

  • Cho, Joon-Beom;Lee, Ji-eun;Lee, Kyong-Rok
    • Journal of Convergence Society for SMB / v.6 no.4 / pp.71-77 / 2016
  • In this paper, we propose verifying speaker changes with a KL distance based on GMM-UBM to improve the performance of conventional BIC-based Speaker Change Detection (SCD). We verified the output of conventional BIC-based SCD with KL-distance-based SCD, which is more robust than BIC-based SCD to differences in information volume between segments, and we applied GMM-UBM to compensate for this asymmetric information volume. Conventional BIC-based SCD consists of two steps. Step 1 detects Speaker Change Candidate Points (SCCPs); an SCCP is a positive local maximum of the dissimilarity d. Step 2 determines the Speaker Change Points (SCPs): if the $\Delta\mathrm{BIC}$ of an SCCP is positive, it is accepted as an SCP. We then verified each SCP using the GMM-UBM-based KL distance D: if the value of D at an SCP is higher than a threshold, the point is accepted as a final SCP. Under the experimental condition in which the MDR (Missed Detection Rate) is 0, the FAR (False Alarm Rate) improved to 60.7% at a threshold of 0.028.
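
The verification step compares two speech segments through a KL distance between GMMs derived from a shared UBM. A common way to do this, assumed here rather than taken from the paper, is to MAP-adapt only the UBM means to each segment (relevance factor 16) and then use the closed-form upper bound on the KL divergence between two GMMs that share weights and diagonal covariances, $D = \tfrac{1}{2} \sum_k w_k \sum_d (\mu^{a}_{kd} - \mu^{b}_{kd})^2 / \sigma^2_{kd}$. The toy UBM, the segments, and all function names are hypothetical, and the toy threshold is unrelated to the 0.028 reported for the paper's features.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def map_adapt_means(frames, ubm_w, ubm_mu, ubm_var, relevance=16.0):
    """MAP-adapt only the UBM means to one segment; weights and variances stay shared."""
    k, dim = len(ubm_w), len(ubm_mu[0])
    occ = [0.0] * k                          # soft frame counts per component
    first = [[0.0] * dim for _ in range(k)]  # posterior-weighted sums of frames
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in zip(ubm_w, ubm_mu, ubm_var)]
        top = max(logs)
        post = [math.exp(l - top) for l in logs]
        total = sum(post)
        for c in range(k):
            p = post[c] / total
            occ[c] += p
            for d in range(dim):
                first[c][d] += p * x[d]
    adapted = []
    for c in range(k):
        a = occ[c] / (occ[c] + relevance)  # data-dependent adaptation coefficient
        ex = [f / occ[c] if occ[c] > 0 else 0.0 for f in first[c]]
        adapted.append([a * e + (1 - a) * m for e, m in zip(ex, ubm_mu[c])])
    return adapted


def kl_distance(mu_a, mu_b, ubm_w, ubm_var):
    """Upper bound on KL between two GMMs sharing weights and diagonal covariances."""
    return 0.5 * sum(w * sum((a - b) ** 2 / v for a, b, v in zip(ma, mb, var))
                     for w, ma, mb, var in zip(ubm_w, mu_a, mu_b, ubm_var))


def verify_change_point(left, right, ubm, threshold):
    """Keep a BIC-detected change point only if D between its two segments exceeds threshold."""
    w, mu, var = ubm
    d = kl_distance(map_adapt_means(left, w, mu, var),
                    map_adapt_means(right, w, mu, var), w, var)
    return d > threshold, d


# Toy 1-D "UBM" with two components (hypothetical values).
ubm = ([0.5, 0.5], [[-1.0], [1.0]], [[1.0], [1.0]])
spk_a = [[-1.2], [-0.8], [-1.1], [-0.9]] * 25
spk_b = [[1.1], [0.9], [1.2], [0.8]] * 25
same, d_same = verify_change_point(spk_a, spk_a, ubm, threshold=0.05)
diff, d_diff = verify_change_point(spk_a, spk_b, ubm, threshold=0.05)
print(same, d_same, diff, d_diff)
```

Because both segments are adapted from the same UBM, the weights and covariances cancel out of the comparison and only the mean shifts matter, which is what makes the closed form usable.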

Development of the Topography Restoration Method for Debris Flow Area Using Airborne LiDAR Data (항공 라이다 자료를 이용한 토석류 발생지역의 지형복원기법 개발)

  • Woo, Choong-Shik;Youn, Ho-Joong;Lee, Chang-Woo;Lee, Kyu-Sung
    • Journal of the Korean Association of Geographic Information Studies / v.14 no.3 / pp.174-187 / 2011
  • The volume of soil displaced by a debris flow can be estimated from topographic data acquired before and after the event. However, airborne LiDAR data acquired before a debris flow are often unavailable. This study therefore develops a topographic restoration method that uses airborne LiDAR data to reconstruct the pre-event topography and provide the spatial distribution of the displaced soil. The method extracts linear or non-linear cross sections across the debris-flow area and expresses each one with a numerical formula derived from a Gaussian mixture model. The method was verified in two ways using airborne LiDAR data acquired before and after the debris flow. First, each cross section restored at the debris-flow sites was compared with the pre-event airborne LiDAR data. Second, the topographic data produced by applying the method to the debris-flow sites were checked against an airborne LiDAR DEM. The overall fitting accuracy was high, at approximately 0.5 m.
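
The abstract does not give the restoration formula itself. One plausible reading, assumed here, is that each cross section is modelled as a base elevation plus a sum of Gaussian terms. The sketch evaluates such a profile, measures the fit against a reference profile with RMSE, and integrates the elevation loss between the restored and post-event profiles to get an eroded cross-sectional area. Fitting the Gaussian parameters (for example by nonlinear least squares) is not shown, and the parameters and helper names below are invented for illustration.

```python
import math


def gaussian_sum_profile(x, base, components):
    """Elevation at distance x along a section: base + sum_k a_k * exp(-(x - c_k)^2 / (2 s_k^2)).

    components holds hypothetical (amplitude, centre, width) triples; a negative
    amplitude carves a valley, a positive one builds a ridge.
    """
    return base + sum(a * math.exp(-((x - c) ** 2) / (2 * s * s)) for a, c, s in components)


def rmse(pred, ref):
    """Root-mean-square difference between two elevation profiles (same units as input)."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def eroded_area(xs, restored, post_event):
    """Trapezoidal integral of elevation loss (restored minus post-event, floored at 0)."""
    loss = [max(r - p, 0.0) for r, p in zip(restored, post_event)]
    return sum((xs[i + 1] - xs[i]) * (loss[i] + loss[i + 1]) / 2 for i in range(len(xs) - 1))


# Hypothetical section: a gully 5 m deep centred at 10 m, on a 100 m base surface.
xs = [float(i) for i in range(21)]
restored = [gaussian_sum_profile(x, 100.0, [(-5.0, 10.0, 3.0)]) for x in xs]
post_event = [z - 1.0 for z in restored]  # pretend 1 m of soil was scoured everywhere
print(eroded_area(xs, restored, post_event))
```

Multiplying such areas by the spacing between neighbouring cross sections would give a rough volume of displaced soil.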

Statistical Characteristics of Hourly Tidal Levels around the Korean Peninsula (한반도 연안 1시간 조위자료의 통계적 특성)

  • Ko, Dong Hui;Jeong, Shin Taek;Cho, Hongyeon
    • Journal of Korean Society of Coastal and Ocean Engineers / v.25 no.6 / pp.365-373 / 2013
  • Representative tide gauge (TG) stations were selected to cover the tidal characteristics of the coastal seas around the Korean Peninsula, and the statistical parameters of the data were analysed in terms of the probability distribution at each station. At the Incheon and Gunsan stations, which lie in tide-dominated areas, the distribution has two clear modes at HWONT and LWONT, and at the Mokpo station it has an asymmetric double peak. In contrast, the frequency distribution has a smooth, flat peak at the Jeju, Yeosu, and Busan stations and a single peak at the Pohang and Sokcho stations. The emersion and submersion equations proposed in this study as 6-parameter Gaussian mixture models are accurate and fit the observed tidal elevation data well. The parameters $\mu_1$ and $\mu_2$ are highly correlated with LWONT and HWONT, and $\sigma_1$ and $\sigma_2$ are closely correlated with the mean tidal range. The $\mu_1$ and $\mu_2$ parameters coincide with the modes of the proposed probability distribution of the hourly tidal level data.
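
The exact parameterization of the 6-parameter model is not stated in the abstract. Assuming it is two weights, two means, and two standard deviations, the emersion curve can be taken as the mixture CDF (the fraction of time the water level lies below a given elevation) and the submersion curve as its complement. The station values and function names below are hypothetical.

```python
import math


def mixture_pdf(z, w1, mu1, s1, w2, mu2, s2):
    """Two-component Gaussian mixture density of tidal level z (6 parameters)."""
    def g(m, s):
        return math.exp(-((z - m) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
    return w1 * g(mu1, s1) + w2 * g(mu2, s2)


def emersion(z, w1, mu1, s1, w2, mu2, s2):
    """Fraction of time the water level is below z (the mixture CDF)."""
    def phi(m, s):
        return 0.5 * (1.0 + math.erf((z - m) / (s * math.sqrt(2))))
    return w1 * phi(mu1, s1) + w2 * phi(mu2, s2)


def submersion(z, *params):
    """Fraction of time the water level is above z; assumes w1 + w2 = 1."""
    return 1.0 - emersion(z, *params)


# Hypothetical macrotidal station: modes near LWONT (-2 m) and HWONT (+2 m) about mean sea level.
params = (0.5, -2.0, 0.8, 0.5, 2.0, 0.8)
print(emersion(0.0, *params))  # about 0.5 by symmetry
print(mixture_pdf(2.0, *params) > mixture_pdf(0.0, *params))  # a mode sits at mu2, not at 0
```

In this parameterization $\mu_1$ and $\mu_2$ sit at the two modes when the components are well separated, which matches the abstract's observation that they track LWONT and HWONT.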

Research on Classification of Sitting Posture with a IMU (하나의 IMU를 이용한 앉은 자세 분류 연구)

  • Kim, Yeon-Wook;Cho, Woo-Hyeong;Jeon, Yu-Yong;Lee, Sangmin
    • Journal of rehabilitation welfare engineering & assistive technology / v.11 no.3 / pp.261-270 / 2017
  • Bad sitting postures are known to cause a variety of diseases and physical deformations. However, it is not easy to maintain a correct sitting posture for long periods. Therefore, methods for distinguishing and encouraging good sitting posture have been steadily proposed, including image processing, pressure sensors attached to the chair, and IMUs (Inertial Measurement Units). The IMU-based approach has the advantages of a simple hardware configuration and freedom from various measurement constraints. In this paper, we studied how to distinguish sitting postures from a small amount of data using a single IMU. A feature extraction method was used to find the data whose contribution to classification is the smallest, and machine learning algorithms were used to find the best sensor position for classification and the best-performing algorithm. The feature extraction method was PCA (Principal Component Analysis). Five machine learning models were used: SVM (Support Vector Machine), KNN (K-Nearest Neighbors), K-means (K-means algorithm), GMM (Gaussian Mixture Model), and HMM (Hidden Markov Model). The back of the neck proved to be the most suitable position for classification, because it had the highest classification rate in every model. PCA confirmed that the yaw data, one of the IMU outputs, contributes least to the classification rate, and removing it did not change the classification rate. SVM and KNN are suitable for classification because their classification rates are higher than those of the other models.
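
Of the five models compared, KNN is the simplest to show. The sketch below classifies a posture from a single IMU's orientation angles by majority vote among the nearest training samples; the `features` argument shows how a channel such as yaw can be left out, as the abstract reports doing after PCA. The readings, labels, and the helper name `knn_predict` are hypothetical, and real data would normally be standardized first.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3, features=None):
    """Majority vote among the k training samples nearest to query (Euclidean distance).

    features selects which columns to use, e.g. (0, 1) to drop a third column.
    """
    cols = list(features) if features is not None else list(range(len(query)))
    nearest = sorted(
        (math.dist([x[c] for c in cols], [query[c] for c in cols]), y)
        for x, y in zip(train_x, train_y)
    )[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


# Hypothetical (roll, pitch, yaw) readings in degrees from one IMU on the back of the neck.
train_x = [(0, 0, 40), (1, -1, -30), (-1, 1, 10),    # upright
           (0, 25, -20), (2, 28, 35), (-1, 22, 0),   # slouched forward
           (20, 2, 15), (22, -1, -25), (18, 1, 30)]  # leaning sideways
train_y = ["upright"] * 3 + ["slouched"] * 3 + ["leaning"] * 3

print(knn_predict(train_x, train_y, (0, 24, 5), features=(0, 1)))  # roll and pitch only
```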

A study on recognition improvement of velopharyngeal insufficiency patient's speech using various types of deep neural network (심층신경망 구조에 따른 구개인두부전증 환자 음성 인식 향상 연구)

  • Kim, Min-seok;Jung, Jae-hee;Jung, Bo-kyung;Yoon, Ki-mu;Bae, Ara;Kim, Wooil
    • The Journal of the Acoustical Society of Korea / v.38 no.6 / pp.703-709 / 2019
  • This paper proposes speech recognition systems that combine Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) structures with a Hidden Markov Model (HMM) to effectively recognize the speech of VeloPharyngeal Insufficiency (VPI) patients, and compares their recognition performance with that of Gaussian Mixture Model-HMM (GMM-HMM) and fully connected Deep Neural Network-HMM (DNN-HMM) speech recognition systems. The initial model is trained on normal speakers' speech, and simulated VPI speech is used to generate a prior model for speaker adaptation. For VPI speaker adaptation, selected layers are trained in the CNN-HMM model, and dropout regularization is applied in the LSTM-HMM model, yielding a 3.68% improvement in recognition accuracy. The experimental results demonstrate that, with a small amount of speech data, the proposed LSTM-HMM system is effective for VPI speech compared with the conventional GMM-HMM and fully connected DNN-HMM systems.

Compromised feature normalization method for deep neural network based speech recognition (심층신경망 기반의 음성인식을 위한 절충된 특징 정규화 방식)

  • Kim, Min Sik;Kim, Hyung Soon
    • Phonetics and Speech Sciences / v.12 no.3 / pp.65-71 / 2020
  • Feature normalization reduces the effect of environmental mismatch between training and test conditions by normalizing the statistical characteristics of acoustic feature parameters. It yields large performance improvements in traditional Gaussian mixture model-hidden Markov model (GMM-HMM)-based speech recognition systems. However, in a deep neural network (DNN)-based speech recognition system, minimizing the effects of environmental mismatch does not necessarily lead to the best performance. In this paper, we attribute this phenomenon to information loss caused by excessive feature normalization. We investigate whether a feature normalization method exists that maximizes speech recognition performance by adequately reducing the impact of environmental mismatch while preserving information useful for training acoustic models. To this end, we introduce mean and exponentiated variance normalization (MEVN), a compromise between mean normalization (MN) and mean and variance normalization (MVN), and compare the performance of a DNN-based speech recognition system in noisy and reverberant environments according to the degree of variance normalization. Experimental results show that MEVN yields a slight performance improvement over MN and MVN, depending on the degree of variance normalization.
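
The abstract describes MEVN only as a compromise between MN and MVN controlled by the degree of variance normalization. A natural reading, assumed here and not confirmed by the abstract, is to divide the mean-subtracted features by the standard deviation raised to an exponent $\alpha \in [0, 1]$, so that $\alpha = 0$ gives MN and $\alpha = 1$ gives MVN. The function name `mevn` and the toy frames are hypothetical.

```python
import math


def mevn(frames, alpha, eps=1e-8):
    """Per-utterance normalization (x - mean) / std**alpha for each feature dimension.

    alpha = 0 reduces to mean normalization (MN); alpha = 1 to mean and variance
    normalization (MVN); values in between only partially equalize the variance.
    """
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n) for d in range(dim)]
    scale = [max(s, eps) ** alpha for s in stds]  # eps avoids dividing by a zero std
    return [[(f[d] - means[d]) / scale[d] for d in range(dim)] for f in frames]


frames = [[0.0, 10.0], [4.0, 10.0]]  # two frames of a hypothetical 2-dim feature
for a in (0.0, 0.5, 1.0):
    print(a, mevn(frames, a))
```

Intermediate exponents keep part of each dimension's original spread, which is the trade-off between removing mismatch and preserving information that the study explores.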

Performance Improvement of Speaker Recognition by MCE-based Score Combination of Multiple Feature Parameters (MCE기반의 다중 특징 파라미터 스코어의 결합을 통한 화자인식 성능 향상)

  • Kang, Ji Hoon;Kim, Bo Ram;Kim, Kyu Young;Lee, Sang Hoon
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.6 / pp.679-686 / 2020
  • In this paper, an enhanced feature extraction method for vocal source signals and a score combination method that uses MCE-based weight estimation over the scores of multiple feature vectors are proposed to improve the performance of speaker recognition systems. The proposed feature vector consists of perceptual linear predictive cepstral coefficients, skewness, and kurtosis extracted from lowpass-filtered glottal flow signals; the filtering eliminates the flat spectral region, which carries no useful information. The proposed feature was used to improve a conventional speaker recognition system that applies Gaussian mixture models to mel-frequency cepstral coefficients and perceptual linear predictive cepstral coefficients extracted from the speech signals. In addition, to increase the reliability of the estimated scores, instead of estimating the weights from the probability distribution of the conventional scores, the scores obtained with the conventional vocal tract features and with the proposed feature are fused by the MCE-based score combination method to find the optimal speaker. The experimental results showed that the proposed feature vectors contain valid information for recognizing speakers. Moreover, when speaker recognition was performed by combining the MCE-based multiple feature parameter scores, the system outperformed the conventional one, particularly when the number of Gaussian mixtures was small.
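
MCE (minimum classification error) training adjusts fusion weights to reduce a smoothed count of recognition errors. The sketch below fuses per-speaker scores from several feature streams with learned weights, using the sigmoid of the margin between the best competing speaker and the true speaker as the loss. The smoothing constant, learning rate, and the clip-and-renormalize constraint on the weights are assumptions rather than details from the paper, and the toy scores and helper names are hypothetical.

```python
import math


def fuse(scores, weights):
    """Fused score per speaker: sum over streams of weight * stream score."""
    n_spk = len(scores[0])
    return [sum(w * s[j] for w, s in zip(weights, scores)) for j in range(n_spk)]


def train_mce_weights(samples, n_streams, lr=1.0, gamma=1.0, epochs=50):
    """Learn stream-fusion weights by gradient descent on a sigmoid MCE loss.

    samples: list of (true_speaker, scores) where scores[m][j] is the score of
    speaker j from feature stream m (e.g. GMM log-likelihoods).
    """
    weights = [1.0 / n_streams] * n_streams
    for _ in range(epochs):
        grad = [0.0] * n_streams
        for label, scores in samples:
            g = fuse(scores, weights)
            # Best competing speaker and misclassification measure d (> 0 means an error).
            rival = max((j for j in range(len(g)) if j != label), key=lambda j: g[j])
            d = g[rival] - g[label]
            loss = 1.0 / (1.0 + math.exp(-gamma * d))
            slope = gamma * loss * (1.0 - loss)  # derivative of the sigmoid w.r.t. d
            for m in range(n_streams):
                grad[m] += slope * (scores[m][rival] - scores[m][label])
        # Gradient step, then keep weights non-negative and summing to one.
        weights = [max(w - lr * gm / len(samples), 0.0) for w, gm in zip(weights, grad)]
        total = sum(weights)
        weights = [w / total for w in weights] if total > 0 else [1.0 / n_streams] * n_streams
    return weights


# Toy data: stream 0 ranks the true speaker first, stream 1 is misleading.
samples = [(0, [[1.0, 0.0], [0.0, 2.0]]),
           (1, [[0.0, 1.0], [2.0, 0.0]])]
learned = train_mce_weights(samples, n_streams=2)
print(learned)
```

With equal initial weights both toy samples are misrecognized; the loss gradient shifts weight onto the reliable stream until both are recognized correctly.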

On-Road Car Detection System Using VD-GMM 2.0 (차량검출 GMM 2.0을 적용한 도로 위의 차량 검출 시스템 구축)

  • Lee, Okmin;Won, Insu;Lee, Sangmin;Kwon, Jangwoo
    • The Journal of Korean Institute of Communications and Information Sciences / v.40 no.11 / pp.2291-2297 / 2015
  • This paper presents a vehicle detection system that takes video of moving vehicles as its input. The input is constrained: it must come from a fixed camera looking obliquely down on the road from above. Road detection is applied so that only the road area of the input image is used. In the introduction, we present experimental results and the limitations of detecting vehicles with motion history image extraction, the SIFT (Scale-Invariant Feature Transform) algorithm, and histogram analysis. To address these problems, we propose the Vehicle Detection GMM (VDGMM), an adaptation of the Gaussian Mixture Model (GMM). We further optimized VDGMM to detect more vehicles and named the result VDGMM 2.0. In the experiments, precision, recall, and F1 score were 9%, 53%, and 15%, respectively, for GMM without road detection, and 85%, 77%, and 80% for VDGMM 2.0 with road detection.
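
The abstract does not describe what VDGMM changes in the GMM. For reference, the sketch below shows only the generic per-pixel adaptive mixture background model (Stauffer-Grimson style) that GMM-based vehicle detectors build on: each pixel keeps a few Gaussians, the heaviest and narrowest ones are treated as background, and a pixel value that matches none of them is foreground. The class name and parameter defaults are assumptions, and the update uses a fixed learning rate in place of the original method's density-weighted one. OpenCV's `createBackgroundSubtractorMOG2` is a production implementation of a related model.

```python
import math


class PixelGMM:
    """Adaptive per-pixel Gaussian mixture background model (generic Stauffer-Grimson style)."""

    def __init__(self, k=3, alpha=0.05, init_var=225.0, min_var=4.0,
                 match_sigmas=2.5, bg_ratio=0.7):
        self.k, self.alpha = k, alpha
        self.init_var, self.min_var = init_var, min_var
        self.match_sigmas, self.bg_ratio = match_sigmas, bg_ratio
        self.w, self.mu, self.var = [], [], []

    def _ranked(self):
        # Stable, heavy components (large w / sigma) come first.
        return sorted(range(len(self.w)),
                      key=lambda i: self.w[i] / math.sqrt(self.var[i]), reverse=True)

    def apply(self, x):
        """Update the model with intensity x and return True if x is foreground."""
        ranked = self._ranked()
        background, acc = set(), 0.0
        for i in ranked:  # top components whose weights first exceed bg_ratio
            if acc > self.bg_ratio:
                break
            background.add(i)
            acc += self.w[i]
        matched = next((i for i in ranked
                        if abs(x - self.mu[i]) <= self.match_sigmas * math.sqrt(self.var[i])),
                       None)
        foreground = matched not in background
        # Decay all weights and reinforce the matched component.
        self.w = [(1 - self.alpha) * w + (self.alpha if i == matched else 0.0)
                  for i, w in enumerate(self.w)]
        if matched is not None:
            rho = self.alpha  # simplification of the density-weighted learning rate
            self.mu[matched] += rho * (x - self.mu[matched])
            self.var[matched] = max((1 - rho) * self.var[matched]
                                    + rho * (x - self.mu[matched]) ** 2, self.min_var)
        elif len(self.w) < self.k:
            self.w.append(self.alpha)
            self.mu.append(float(x))
            self.var.append(self.init_var)
        else:  # replace the least probable component with a new one centred on x
            worst = ranked[-1]
            self.w[worst], self.mu[worst], self.var[worst] = self.alpha, float(x), self.init_var
        total = sum(self.w)
        self.w = [w / total for w in self.w]
        return foreground


pixel = PixelGMM()
for _ in range(50):
    pixel.apply(100)       # static road surface
print(pixel.apply(200))    # a bright vehicle passes: foreground
print(pixel.apply(100))    # road visible again: background
```

Restricting the model to pixels inside a detected road mask, as the paper's road detection step does, would simply mean skipping pixels outside the mask.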

Semi-supervised domain adaptation using unlabeled data for end-to-end speech recognition (라벨이 없는 데이터를 사용한 종단간 음성인식기의 준교사 방식 도메인 적응)

  • Jeong, Hyeonjae;Goo, Jahyun;Kim, Hoirin
    • Phonetics and Speech Sciences / v.12 no.2 / pp.29-37 / 2020
  • Recently, neural network-based deep learning algorithms have dramatically improved performance over classical Gaussian mixture model-hidden Markov model (GMM-HMM) automatic speech recognition (ASR) systems. In addition, research on end-to-end (E2E) speech recognition systems that integrate language modeling and decoding has been actively conducted to better exploit the advantages of deep learning. E2E ASR systems generally consist of multi-layer attention-based encoder-decoder structures and therefore require a large amount of paired speech-text data to achieve good performance. Obtaining paired speech-text data requires substantial human labor and time, which is a high barrier to building E2E ASR systems. Previous studies have improved E2E ASR performance with relatively small amounts of paired speech-text data, but most have used either speech-only or text-only data. In this study, we propose a semi-supervised training method that enables an E2E ASR system to perform well on corpora from different domains by using both speech-only and text-only data. The proposed method adapts effectively to different domains, performing well in the target domain without degrading much in the source domain.