Search | Korea Science

Shin, Hyun-seo;Kim, Ju-ho;Heo, Jungwoo;Shim, Hye-jin;Yu, Ha-Jin
- The Journal of the Acoustical Society of Korea, v.41 no.3, pp.319-325, 2022
Variation in utterance length is a representative factor that can degrade the performance of speaker verification systems. To handle this issue, previous studies have attempted either to extract speaker features from multiple branches or to use convolution layers with different receptive fields. Combining the advantages of these two approaches for variable-length input, this paper proposes integrated receptive field diversification, which extracts speaker features through more diverse receptive fields. The proposed method processes the input features with convolutional layers that have different receptive fields at multiple time-axis branches, and extracts the speaker embedding by dynamically aggregating the processed features according to the length of the input utterance. The deep neural networks in this study were trained on the VoxCeleb2 dataset and tested on the VoxCeleb1 evaluation dataset, which was divided into 1 s, 2 s, 5 s, and full-length utterances. Experimental results demonstrated that the proposed method reduces the equal error rate by 19.7% compared to the baseline.
https://doi.org/10.7776/ASK.2022.41.3.319
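The abstract above reports its result as an equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER can be approximated from verification scores. The function name `equal_error_rate`, the threshold sweep, and the example scores are illustrative assumptions, not the evaluation code used in the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    Hypothetical helper: a trial is accepted when its score is at or
    above the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a same-speaker trial scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scoring at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Made-up scores for illustration only.
same_speaker = [0.9, 0.8, 0.7, 0.4]
different_speaker = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(same_speaker, different_speaker)
```

With a finite set of trials the two error rates rarely cross exactly, so this sketch takes the midpoint at the closest threshold; evaluation toolkits often interpolate instead.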

Ju, Youngho;Babukaji, Baniya;Lee, Joonwhan
- The Journal of the Korea Contents Association, v.12 no.11, pp.9-19, 2012
The time-varying tempo of a song is one source of error when identifying note durations in automatic music recognition. This paper proposes an improved music transcription scheme that identifies note durations while accounting for the time-varying tempo. In the proposed scheme, the measures are found first, and the tempo, defined as the playing time of each measure, is then estimated. The tempo is then used to rescale each inter-onset interval (IOI) so that the note duration can be identified accurately, which increases the degree of correspondence to the music piece. In the experiment, the proposed scheme found the accurate measure positions for 14 of 16 monophonic children's songs recorded by men and women. It also achieved degrees of matching to the original music piece of about 89.4% for note duration and 84.8% for pitch.
https://doi.org/10.5392/JKCA.2012.12.11.009
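The rescaling step described in the abstract above can be sketched as follows: each IOI is divided by the seconds-per-beat of the measure it starts in, then snapped to the nearest allowed note length. This is only an illustration under stated assumptions. The function `quantize_durations`, the 4-beat measure, and the `NOTE_BEATS` grid are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right

# Assumed grid of allowed note lengths, in beats (not from the paper).
NOTE_BEATS = [0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0, 4.0]


def quantize_durations(onsets, measure_starts, beats_per_measure=4):
    """Map each inter-onset interval (IOI) to the nearest note length in beats.

    onsets: note onset times in seconds, ascending.
    measure_starts: measure boundary times in seconds, ascending; the last
        value closes the final measure.
    """
    durations = []
    for start, end in zip(onsets, onsets[1:]):
        # Locate the measure that contains this onset.
        m = bisect_right(measure_starts, start) - 1
        m = max(0, min(m, len(measure_starts) - 2))
        measure_time = measure_starts[m + 1] - measure_starts[m]
        # Rescale the IOI by the local tempo (seconds per beat in this measure).
        beats = (end - start) / (measure_time / beats_per_measure)
        durations.append(min(NOTE_BEATS, key=lambda b: abs(b - beats)))
    return durations


# Two measures: the first lasts 2 s, the second slows down to 3 s.
measures = [0.0, 2.0, 5.0]
onsets = [0.0, 0.5, 1.0, 2.0, 2.75, 3.5, 5.0]
durations = quantize_durations(onsets, measures)
```

In this made-up example the 0.75 s intervals in the slower second measure are still read as one beat each, whereas a single global tempo taken from the first measure would read them as 1.5 beats.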

Sin, Chan-Hu;Lee, Hui-Jeong;Park, Byeong-Cheol
- The Journal of the Acoustical Society of Korea, v.6 no.4, pp.21-30, 1987
Length normalization by variable frame size is proposed as a novel approach to the problem that variation in the length of spoken words lowers recognition accuracy. This method has the advantage of shortening recognition time in the recognition stage, because it can reduce the number of frames that make up a word compared with length normalization by a fixed frame size. In this paper, variable frame length normalization is applied to multisection vector quantization, and the efficiency of the method is evaluated in terms of recognition time and accuracy through practical recognition experiments.
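One way to picture length normalization by variable frame size is to fix the number of frames per word and let the frame size grow or shrink with the word's length. The sketch below illustrates that idea only. The function `variable_size_frames` and its integer boundary rule are assumptions, and the paper's actual framing and feature extraction are not described in the abstract.

```python
def variable_size_frames(samples, n_frames):
    """Split a word into exactly n_frames frames whose size scales with its length.

    Hypothetical helper: frame boundaries are spread evenly over the word,
    so longer words get longer frames rather than more frames.
    """
    if n_frames <= 0 or len(samples) < n_frames:
        raise ValueError("need at least one sample per frame")
    # Evenly spaced integer boundaries from 0 to len(samples).
    bounds = [i * len(samples) // n_frames for i in range(n_frames + 1)]
    return [samples[a:b] for a, b in zip(bounds, bounds[1:])]


# A short and a long "word" both normalize to four frames.
short_word = list(range(10))
long_word = list(range(20))
short_frames = variable_size_frames(short_word, 4)
long_frames = variable_size_frames(long_word, 4)
```

Because every word yields the same number of frames, a downstream matcher compares sequences of equal length regardless of how long the word was spoken.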