• Title/Summary/Keyword: Combined dataset

Speaker verification with ECAPA-TDNN trained on new dataset combined with Voxceleb and Korean (Voxceleb과 한국어를 결합한 새로운 데이터셋으로 학습된 ECAPA-TDNN을 활용한 화자 검증)

  • Keumjae Yoon;Soyoung Park
    • The Korean Journal of Applied Statistics
    • /
    • v.37 no.2
    • /
    • pp.209-224
    • /
    • 2024
  • Speaker verification is becoming popular as a method of non-face-to-face identity authentication. It involves determining whether two voice recordings belong to the same speaker. When a criminal's voice is left at a crime scene, it is vital to have a speaker verification system that can accurately compare the two pieces of voice evidence. To this end, this study built a new speaker verification system for the Korean language using a deep learning model. Because voice data is high-dimensional and highly variable, for example due to background noise, deep learning-based methods were needed for speaker matching. The ECAPA-TDNN model, a well-known deep learning system for speaker verification, was selected to construct the matching algorithm. Voxceleb, a large voice dataset collected from people of various nationalities, does not include Korean. To determine what form of dataset is appropriate for learning Korean, experiments were carried out to examine how Korean voice data affects matching performance. Comparing models trained only on Voxceleb with models trained on datasets combining Voxceleb and Korean data to maximize language and speaker diversity, the models whose training data included Korean performed better on all test sets.
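
The abstract does not say how two recordings are scored once the model has produced an embedding for each. A minimal sketch, assuming the embeddings are compared by cosine similarity against a fixed threshold (a common choice, but an assumption here); the vectors and the threshold value are invented for illustration:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.5):
    """Accept the pair when the score reaches the (hypothetical) threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold


# Toy 4-dimensional embeddings, invented for illustration only.
enrolled = [0.9, 0.1, 0.3, 0.2]
probe_same = [0.8, 0.2, 0.25, 0.15]
probe_other = [-0.1, 0.9, -0.4, 0.3]

print(same_speaker(enrolled, probe_same))
print(same_speaker(enrolled, probe_other))
```

In practice the threshold would be tuned on a held-out set of trial pairs rather than fixed in advance.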

Implementation of handwritten digit recognition CNN structure using GPGPU and Combined Layer (GPGPU와 Combined Layer를 이용한 필기체 숫자인식 CNN구조 구현)

  • Lee, Sangil;Nam, Kihun;Jung, Jun Mo
    • The Journal of the Convergence on Culture Technology
    • /
    • v.3 no.4
    • /
    • pp.165-169
    • /
    • 2017
  • A CNN (Convolutional Neural Network) is one of the machine learning algorithms that show superior performance in image recognition and classification. A CNN is structurally simple, but it requires a large amount of computation and takes a long time to run. In this paper, we therefore parallelize the convolution, pooling, and fully connected layers, which account for much of the processing time in a CNN, using the SIMT (Single Instruction Multiple Thread) structure of GPGPU (General-Purpose computing on Graphics Processing Units). We also aim to improve performance by reducing the number of memory accesses: the output of the convolution layer is used directly by the pooling layer instead of being stored first. We verify the approach on the MNIST dataset and confirm that the proposed CNN structure is 12.38% better than the existing structure.
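
The idea of feeding convolution outputs straight into pooling can be illustrated without a GPU. The sketch below is a sequential, single-channel illustration only, not the paper's GPGPU kernel; the function names and the 2x2 max-pooling choice are assumptions. It compares a reference path that stores the whole feature map with a combined path that never materializes it:

```python
def conv_at(image, kernel, r, c):
    """Valid cross-correlation value at output position (r, c)."""
    return sum(
        kernel[i][j] * image[r + i][c + j]
        for i in range(len(kernel))
        for j in range(len(kernel[0]))
    )


def conv_then_pool(image, kernel, pool=2):
    """Reference path: store the full convolution map, then max-pool it."""
    out_h = len(image) - len(kernel) + 1
    out_w = len(image[0]) - len(kernel[0]) + 1
    fmap = [[conv_at(image, kernel, r, c) for c in range(out_w)]
            for r in range(out_h)]
    return [
        [max(fmap[r + i][c + j] for i in range(pool) for j in range(pool))
         for c in range(0, out_w - pool + 1, pool)]
        for r in range(0, out_h - pool + 1, pool)
    ]


def fused_conv_pool(image, kernel, pool=2):
    """Combined path: each pooled value consumes convolution outputs
    directly, so no intermediate feature map is written to memory."""
    out_h = len(image) - len(kernel) + 1
    out_w = len(image[0]) - len(kernel[0]) + 1
    return [
        [max(conv_at(image, kernel, r + i, c + j)
             for i in range(pool) for j in range(pool))
         for c in range(0, out_w - pool + 1, pool)]
        for r in range(0, out_h - pool + 1, pool)
    ]


# 5x5 toy image and 2x2 kernel, invented for illustration.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
kernel = [[1, 0], [0, 1]]
print(fused_conv_pool(image, kernel))
```

Both paths give the same result; the combined one trades the stored feature map for computing each window on the fly, which is where the reduction in memory accesses would come from.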

Texture Descriptor for Texture-Based Image Retrieval and Its Application in Computer-Aided Diagnosis System (질감 기반 이미지 검색을 위한 질감 서술자 및 컴퓨터 조력 진단 시스템의 적용)

  • Saipullah, Khairul Muzzammil;Peng, Shao-Hu;Kim, Deok-Hwan
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.47 no.4
    • /
    • pp.34-43
    • /
    • 2010
  • Texture information plays an important role in object recognition and classification. To perform an accurate classification, the texture feature used must be highly discriminative. This paper presents a novel texture descriptor for texture-based image retrieval and its application in a Computer-Aided Diagnosis (CAD) system for emphysema classification. The texture descriptor is based on the combination of local surrounding neighborhood difference and centralized neighborhood difference and is named Combined Neighborhood Difference (CND). The surrounding and centralized neighborhood differences between pixels are compared and converted into binary codewords. A binomial factor is then assigned to the codewords to convert them into highly discriminative unique values. The distribution of these unique values is computed and used as the texture feature vector. Texture classification accuracies on the Outex and Brodatz datasets show that CND achieves an average of 92.5%, whereas LBP, LND, and Gabor filter achieve 89.3%, 90.7%, and 83.6%, respectively. The implementation of CND in the computer-aided diagnosis of emphysema is also presented.
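
The abstract does not give enough detail to reproduce CND itself. As a related illustration, the sketch below implements the basic 3x3 LBP descriptor, one of the baselines named above, which uses the same pipeline of binary comparisons, binomial (power-of-two) weights, and a histogram of the resulting values. The neighbor ordering and the toy image are assumptions:

```python
# Clockwise offsets of the 8 neighbors, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """Binary codeword for pixel (r, c): bit k is set when neighbor k
    is at least as bright as the center; bit k carries weight 2**k."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Normalized 256-bin distribution of codes over interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy 3x3 grayscale patch, invented for illustration.
patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 8, 3]]
print(lbp_code(patch, 1, 1))
```

The histogram is the feature vector that would be compared between images in a retrieval or classification setting.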

Molecular Phylogenetic Analysis of Botrytis cinerea Occurring in Korea (우리나라에 발생하는 잿빛곰팡이병균 Botrytis cinerea의 분자계통학적 유연관계)

  • Back, Chang-Gi;Lee, Seung-Yeol;Jung, Hee-Young
    • The Korean Journal of Mycology
    • /
    • v.42 no.2
    • /
    • pp.138-143
    • /
    • 2014
  • Several isolates were collected from apple, pepper, strawberry, cucumber, and tomato plants showing typical gray mold symptoms. All the isolates were identified as Botrytis cinerea using morphological characteristics and the PCR-RFLP method. It was difficult to analyze the phylogenetic relationships of these isolates using the ITS region, HSP60, and G3PDH because these sequences were highly homologous at the nucleotide level both within B. cinerea and among species of the genus Botrytis. However, phylogenetic analysis using combined sequences (RPB2, HSP60, and G3PDH genes) clearly showed that all isolates of B. cinerea were distinct from other Botrytis spp. Furthermore, it was also confirmed that the strawberry isolate was distantly related to the apple, pepper, cucumber, and tomato isolates, which were closely related to each other at the nucleotide level.

A comparison of single-epoch black hole masses at z>0.5

  • Karouzos, M.;Woo, Jong-Hak;Matsuoka, Kenta;Onken, Christopher;Kollmeier, Juna;Park, Dawoo;Nagao, Tohru
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.40 no.1
    • /
    • pp.42.1-42.1
    • /
    • 2015
  • Accurately estimating black hole (BH) masses at high redshifts is imperative in the current and future era of large-area extragalactic spectroscopic surveys. We present an extension of existing comparisons between rest-frame UV and optical virial BH mass estimators to intermediate redshifts, lower luminosities, and lower BH masses, comparable to the local H$\beta$ reverberation-mapping sample. We use data from the AGES survey and newly acquired near-infrared spectra from the FMOS instrument on the Subaru telescope for 89 broad-lined active galaxies at redshifts between 0.5 and 1.6. We focus on the MgII, CIV, and CIII broad emission lines and compare them to both H$\alpha$ and H$\beta$, using two different prescriptions to describe their emission profile width. We confirm that MgII shows a tight correlation with H$\alpha$, with a scatter of $\sim$0.25 dex. The CIV and CIII estimators can be considered viable virial mass estimators, despite large scatter values. We combine our dataset with previous high-redshift and high-luminosity CIV and CIII measurements from the literature and calculate a scatter of $\sim$0.4 dex and an offset to the 1:1 relation consistent with 0 for the combined sample. This updated comparison spans a total of 4 decades in BH mass, a much wider range than any previous individual study.

First Report on Isolation of Penicillium adametzioides and Purpureocillium lilacinum from Decayed Fruit of Cheongsoo Grapes in Korea

  • Deng, Jian Xin;Paul, Narayan Chandra;Sang, Hyun-Kyu;Lee, Ji-Hye;Hwang, Yong-Soo;Yu, Seung-Hun
    • Mycobiology
    • /
    • v.40 no.1
    • /
    • pp.66-70
    • /
    • 2012
  • Two species, Penicillium adametzioides and Purpureocillium lilacinum, were isolated from decayed grapes (cv. Cheongsoo) in Korea. Each species was initially identified by phylogenetic analysis of a combined dataset of two genes. Internal transcribed spacer (ITS) and $\beta$-tubulin (BT2) genes were used for identification of Penicillium adametzioides, and ITS and partial translation elongation factor 1-$\alpha$ (TEF) genes were used for identification of Purpureocillium lilacinum. Morphologically, they were found to be identical to previous descriptions. The two species presented here have not been previously reported in Korea.

Residual Polar Motion excluding Chandler and Annual components

  • Na, Sung-Ho;Baek, Jeong-Ho;Kwak, Young-Hee;Yoo, Sung-Moon;Cho, Jung-Ho;Cho, Sung-Ki;Park, Jong-Uk;Park, Pil-Ho
    • Bulletin of the Korean Space Science Society
    • /
    • 2011.04a
    • /
    • pp.22.1-22.1
    • /
    • 2011
  • The two dominant components of polar motion are the Chandler and annual components. Recently, a 500-day period component in the Earth's polar motion has been reported, but it does not appear clearly in the Fourier spectrum. One source of difficulty is that the amplitudes of the two main components vary slightly over time (Chandler: 0.15 to 0.28 arcsec, annual: 0.09 to 0.15 arcsec). A residual polar motion time series excluding the two main components was constructed by least-squares fitting of the IERS C04 time series dataset for the span from January 1962 to November 2010. For a faithful fit, 43 time segments of 6.8 years each, starting on January 1 of successive years, were processed separately and later combined. The dominant peak in the spectrum of this residual time series has a period of 490 days. The next peaks have semi-annual, 300 to 330 day, about 560 day, 670 day, and 1360 day periods.
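
Removing the two main components from one segment by least-squares fitting can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the periods (about 433 days for Chandler, 365.25 days for annual) are held fixed, a constant offset is included, one coordinate is fitted on its own, and the input series is synthetic rather than IERS C04 data:

```python
import math


def design_row(t, periods):
    """Row of the design matrix: offset plus a cos/sin pair per period."""
    row = [1.0]
    for p in periods:
        w = 2.0 * math.pi * t / p
        row += [math.cos(w), math.sin(w)]
    return row


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_and_residual(times, values, periods):
    """Least-squares fit via the normal equations; returns the
    coefficients and the residual series (data minus fitted model)."""
    rows = [design_row(t, periods) for t in times]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           for i in range(n)]
    atb = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(n)]
    coef = solve(ata, atb)
    fitted = [sum(c * x for c, x in zip(coef, r)) for r in rows]
    return coef, [v - f for v, f in zip(values, fitted)]


# Synthetic 6.8-year segment sampled every 10 days (values in arcsec).
PERIODS = [433.0, 365.25]  # assumed Chandler and annual periods, in days
times = list(range(0, 2484, 10))
values = [0.05
          + 0.2 * math.cos(2 * math.pi * t / 433.0)
          + 0.1 * math.sin(2 * math.pi * t / 365.25)
          for t in times]
coef, residual = fit_and_residual(times, values, PERIODS)
```

With real data the residual would retain whatever the two fitted components do not explain, and the residuals of the individual segments would then be combined before computing a spectrum.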

Enhancing the Performance of Blog Retrieval by User Tagging and Social Network Analysis (사용자 태그와 중심성 지수를 이용한 블로그 검색 성능 향상에 관한 연구)

  • Kim, Eun-Hee;Chung, Young-Mee
    • Journal of the Korean Society for Information Management
    • /
    • v.27 no.1
    • /
    • pp.61-77
    • /
    • 2010
  • Blogs are now one of the major information resources on the web. The purpose of this study is to enhance the performance of blog retrieval by means of user-assigned tags and trackback information. To this end, retrieval experiments were performed with a dataset of 4,908 blog pages together with their associated trackback URLs. In the experiments, text terms, user tags, and network centrality values based on trackbacks were combined in various ways as retrieval features. The experimental results showed that employing user tags and network centrality values as retrieval features in addition to text terms could improve the performance of blog retrieval.

Performance Improvement of Deep Clustering Networks for Multi Dimensional Data (다차원 데이터에 대한 심층 군집 네트워크의 성능향상 방법)

  • Lee, Hyunjin
    • Journal of Korea Multimedia Society
    • /
    • v.21 no.8
    • /
    • pp.952-959
    • /
    • 2018
  • Clustering is one of the most fundamental algorithms in machine learning. Clustering performance is affected by the distribution of the data and degrades as the amount of data or the number of dimensions grows. For this reason, we use a stacked autoencoder, a deep learning algorithm, to reduce the dimensionality of the data and generate a feature vector that best represents the input. For clustering we use k-means, a well-known algorithm. Since the reduced feature vectors are still multidimensional, we use cosine similarity in addition to the Euclidean distance when computing the similarity between a cluster center and a data vector, in order to improve performance. The deep clustering network, which combines a stacked autoencoder and k-means, re-trains the network whenever the k-means result changes. During re-training, the loss function of the stacked autoencoder and the loss function of k-means are combined to improve the performance and stability of the network. Experiments on benchmark image and document datasets empirically validated the effectiveness of the proposed algorithm.
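
The abstract does not state how the Euclidean distance and cosine similarity are combined. One plausible reading, sketched here purely as an assumption, is a weighted sum of the Euclidean distance and the cosine distance in the k-means assignment step; the weight `alpha` and the toy vectors are hypothetical:

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; defined as 1.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def combined_distance(a, b, alpha=1.0):
    """Hypothetical combination: Euclidean plus alpha times cosine distance."""
    return euclidean(a, b) + alpha * cosine_distance(a, b)


def assign(points, centers, alpha=1.0):
    """k-means assignment step: index of the nearest center per point."""
    return [
        min(range(len(centers)),
            key=lambda k: combined_distance(p, centers[k], alpha))
        for p in points
    ]


# Toy 2-D feature vectors and cluster centers, invented for illustration.
centers = [[0.5, 0.0], [2.0, 1.8]]
point = [3.0, 0.3]
print(assign([point], centers, alpha=0.0))  # Euclidean only
print(assign([point], centers, alpha=5.0))  # direction weighted heavily
```

The example point is closer to the second center in Euclidean terms but points in nearly the same direction as the first, so the assignment changes once the cosine term carries enough weight.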

Study on the Relationship between Weather Conditions, Sewage and Operational Variables of WWTPs using Multivariate Statistical Methods (기상조건이 하수발생량 및 하수처리장 운전인자에 미치는 영향에 관한 통계적 분석)

  • Lee, Jae-Hyun
    • Journal of Korean Society on Water Environment
    • /
    • v.28 no.2
    • /
    • pp.285-291
    • /
    • 2012
  • In general, rainfall and the influent of wastewater treatment plants (WWTPs) are strongly related in the case of combined sewers. Because variations in influent quantity and sewage quality are the most common and significant disturbance, the factors that affect sewage characteristics need to be identified. In this paper, the relationship between weather conditions (humidity, temperature, and rainfall) and influent flow rate and contaminant concentration was analyzed using factor analysis. In addition, three influent types were derived using cluster analysis, and the distributions of operational variables were compared across the groups by one-way ANOVA. The dataset was clustered into three groups with similar weather and influent conditions. These different conditions can lead to different operating conditions at WWTPs. Group 1 corresponds to high humidity and rainfall: the DO concentration in the reactor was very high, but the MLSS concentration was very low because of the excessive flow rate. Group 3, by contrast, corresponds to low humidity, temperature, and rainfall: the SRT was the longest and the SVI was the highest, reflecting the poorest settleability of the year, in winter.
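
The one-way ANOVA comparison across the three groups reduces to an F statistic: the between-group mean square divided by the within-group mean square. A minimal sketch with invented values (not data from the study) for one operational variable:

```python
def one_way_anova_f(groups):
    """F statistic for one-way ANOVA over a list of sample groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more samples than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group sum of squares: spread of samples around their group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements of one variable in three influent groups.
group1 = [1.0, 2.0, 3.0]
group2 = [2.0, 3.0, 4.0]
group3 = [5.0, 6.0, 7.0]
f_stat = one_way_anova_f([group1, group2, group3])
print(f_stat)
```

A large F value relative to the F distribution with (k - 1, n - k) degrees of freedom indicates that at least one group mean differs from the others.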