Search | Korea Science

A Study on Duration Length and Place of Feature Extraction for Phoneme Recognition (음소 인식을 위한 특징 추출의 위치와 지속 시간 길이에 관한 연구)

Kim, Bum-Koog;Chung, Hyun-Yeol
- The Journal of the Acoustical Society of Korea / v.13 no.4 / pp.32-39 / 1994
As basic research toward a Korean speech recognition system, phoneme recognition experiments were carried out to find 1) the position within each phoneme that best represents its characteristics and 2) a reasonable duration for obtaining the best recognition rates. The experiments used multi-speaker dependent recognition with a Bayesian decision rule and 21st-order cepstral coefficients as the feature parameter. The best positions for feature extraction, giving the highest recognition rates, were 10~50ms in vowels, 40~100ms in fricatives and affricates, 10~50ms in nasals and liquids, and 10~50ms in plosives. A duration of about 70ms was sufficient for recognizing all 35 phonemes.
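
The abstract above names a Bayesian decision rule but does not say which class-conditional model was used. As a minimal sketch, assuming diagonal-covariance Gaussian models per phoneme and priors estimated from class frequencies, the decision step might look like the following. The function names `fit_gaussian_classes` and `classify`, the labels, and the toy 2-dimensional vectors are hypothetical; the study itself used 21 cepstral coefficients per feature vector.

```python
import math
from collections import defaultdict

def fit_gaussian_classes(samples):
    """samples: list of (label, feature_vector).
    Returns {label: (log_prior, means, variances)} under an assumed
    diagonal-covariance Gaussian model per class."""
    by_label = defaultdict(list)
    for label, vec in samples:
        by_label[label].append(vec)
    total = len(samples)
    models = {}
    for label, vecs in by_label.items():
        n = len(vecs)
        dim = len(vecs[0])
        means = [sum(v[d] for v in vecs) / n for d in range(dim)]
        # A small variance floor keeps the log-likelihood finite on tiny training sets.
        variances = [
            max(sum((v[d] - means[d]) ** 2 for v in vecs) / n, 1e-6)
            for d in range(dim)
        ]
        models[label] = (math.log(n / total), means, variances)
    return models

def classify(models, vec):
    """Bayes decision: pick the class with the largest log prior + log likelihood."""
    def log_posterior(model):
        log_prior, means, variances = model
        log_likelihood = 0.0
        for x, m, var in zip(vec, means, variances):
            log_likelihood += -0.5 * (math.log(2 * math.pi * var) + (x - m) ** 2 / var)
        return log_prior + log_likelihood
    return max(models, key=lambda label: log_posterior(models[label]))

# Toy training data standing in for cepstral vectors of two phoneme classes.
training = [
    ("a", [1.0, 0.2]), ("a", [1.2, 0.1]), ("a", [0.9, 0.3]),
    ("s", [-1.0, 2.0]), ("s", [-1.1, 2.2]), ("s", [-0.8, 1.9]),
]
models = fit_gaussian_classes(training)
predicted = classify(models, [1.1, 0.2])
```

In this framing, the study's two questions correspond to choosing where in the phoneme the feature vectors are taken from and how long a stretch of signal they cover before the classifier is applied.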

On the Transmission Quality of Wide-Band Telephony (전화 대역 확장에 따른 통화품질의 변화)

김정환
- Proceedings of the Acoustical Society of Korea Conference / 1995.06a / pp.155-158 / 1995
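To provide design guidelines for the transmission characteristics of wide-band (150~7,000Hz) telephony, this study compared and analyzed transmission quality evaluation results for wide-band telephony and narrow-band (300~3,400Hz) telephony. The quality evaluation consisted of preferred loudness level and equal loudness level adjustment experiments using the method of adjustment, and a monosyllable intelligibility test. In the preferred loudness level experiment, the subjects' preferred levels for narrow-band and wide-band speech were 70.7dB and 68.6dB respectively, a difference of about 2dB, and the between-subject variances were 2.12 and 6.11, a meaningful difference. Because the spread among users grows as the speech band is widened, this result shows that a receive volume control is needed in wide-band telephones. In the intelligibility test on 100 monosyllables under narrow-band and wide-band conditions, the overall intelligibility scores showed no statistically significant difference. However, the partial intelligibility for the 20 monosyllables beginning with the plosive 'ㅌ', the affricates 'ㅈ', 'ㅉ', 'ㅊ', and the fricatives 'ㅅ', 'ㅆ', which carry much of their energy above 3,400Hz, differed by 20% between the narrow-band and wide-band conditions. In addition, the comparative loudness level adjustment experiment showed an average loudness level difference of about 3.4dB (A) between narrow band and wide band; this result will be used as a guideline for setting the receive loudness rating of domestic wide-band telephones.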

An Acoustic and Aerodynamic Study of Korean Fricatives, Affricates, Alveolar Plosives (한국어 마찰음, 파찰음, 치조 파열음의 음향학적 및 공기역학적 특성에 관한 연구)

Choi Jae-Nam;Nam Do Hyun;Choi Hong-Shik
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.16 no.2 / pp.152-157 / 2005
Background and Objectives: Ten normal native Korean speakers participated as subjects in an acoustic and aerodynamic study of Korean fricatives, affricates, and plosives, with the aim of applying the results to patients with articulation problems. Materials and Method: Their productions of [asa], [as'a], [aca], $[ac^ha]$, [ac'a], [ata], $[at^ha]$, and [at'a] were analyzed with the Lx Speech Studio program (Laryngograph Ltd, UK) for acoustic analysis and a Phonatory Function Analyzer (Nagashima Ltd., Model PS 77H, Tokyo, Japan) for aerodynamic analysis. Results: 1) Plosives showed higher Qx1 in vocal fold closure ratio than fricatives and affricates. 2) Tense fricatives, affricates, and plosives showed higher Qx2 in vocal fold closure ratio than aspirated and lax ones. 3) Aspirated consonants showed higher Qx1 in vocal fold closure ratio than tense and lax ones. 4) Aspirated consonants showed a higher peak flow rate than tense and lax ones. Conclusion: These results may be helpful for the treatment of articulation disorders.

Speech Recognition of Korean Phonemes 'ㅅ', 'ㅈ', 'ㅊ' based on Volatility and Turning Points (변동성과 전환점에 기반한 한국어 음소 'ㅅ', 'ㅈ', 'ㅊ' 음성 인식)

Lee, Jae Won
- KIISE Transactions on Computing Practices / v.20 no.11 / pp.579-585 / 2014
A phoneme is the minimal unit of speech, and it plays a very important role in speech recognition. This paper proposes a novel method for recognizing 'ㅅ', 'ㅈ', and 'ㅊ' among Korean phonemes. The proposed method is based on a volatility indicator and a turning point indicator that are calculated for each constituent block of the input speech signal. The volatility indicator is the sum of the differences between the values of each pair of adjacent samples in a block, and the turning point indicator is the number of extremal points in a block at which the sample values switch between increasing and decreasing. A phoneme recognition algorithm combines the two indicators, using optimized thresholds on them, to determine the positions at which the three target phonemes are recognized. The experimental results show that the proposed method markedly reduces the error rate of existing methods in terms of both the false reject rate and the false accept rate.
https://doi.org/10.5626/KTCP.2014.20.11.579
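
The two indicators described in the abstract above can be sketched in a few lines. This is an illustrative reading, not the paper's implementation: taking absolute differences for the volatility, requiring both indicators to exceed their thresholds, and the names `flag_blocks`, `vol_threshold`, and `tp_threshold` are all assumptions, and the threshold values are arbitrary rather than the optimized ones the paper refers to.

```python
def volatility(block):
    # Sum of differences between adjacent samples. Using the absolute value
    # is an assumption; a signed sum would collapse to last - first.
    return sum(abs(b - a) for a, b in zip(block, block[1:]))

def turning_points(block):
    # Count interior samples where the slope changes sign (local extrema).
    # Flat stretches (zero slope) are not counted in this sketch.
    count = 0
    for prev, cur, nxt in zip(block, block[1:], block[2:]):
        if (cur - prev) * (nxt - cur) < 0:
            count += 1
    return count

def flag_blocks(signal, block_size, vol_threshold, tp_threshold):
    # Hypothetical combination rule: flag a block when both indicators
    # reach their thresholds. Returns the start index of each flagged block.
    flagged = []
    for start in range(0, len(signal) - block_size + 1, block_size):
        block = signal[start:start + block_size]
        if volatility(block) >= vol_threshold and turning_points(block) >= tp_threshold:
            flagged.append(start)
    return flagged

smooth = [0, 1, 2, 3, 4, 5, 6, 7]        # vowel-like slow ramp
noisy = [0, 5, -4, 6, -5, 4, -6, 5]      # fricative-like rapid oscillation
signal = smooth + noisy
flagged = flag_blocks(signal, block_size=8, vol_threshold=20, tp_threshold=4)
```

Noise-like consonants oscillate rapidly, so both indicators rise together on such blocks, which is the intuition behind combining them.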

Phoneme Recognition and Error in Postlingually Deafened Adults with Cochlear Implantation (언어습득 이후 난청 성인 인공와우이식자의 음소 지각과 오류)

Choi, A.H.;Heo, S.D.
- Journal of rehabilitation welfare engineering & assistive technology / v.8 no.3 / pp.227-232 / 2014
The aim of this study was to investigate phoneme recognition in postlingually deafened adults with cochlear implants. Twenty-one cochlear implant users participated, all of whom had used their implants for more than one year. To measure consonant performance, subjects were tested on 18 Korean consonants in an "aCa" context with audition alone. Scores ranged from 11% to 86% ($60{\pm}17$%). Consonant performance correlated significantly with the implanted hearing threshold level (p<.046). This result suggests that the consonant performance of postlingually deafened adult cochlear implant users is closely related to implanted hearing. Subjects had higher correct rates for fricatives and affricates, which have distinctive frequency bands, than for plosives, liquids, and nasals, which share the same or adjacent frequency bands. All subjects showed confusion patterns among consonants with the same manner of articulation. These confusions arose because subjects could not distinguish the different intensities and durations of consonants with the same or adjacent frequency bands.

Acoustic model training using self-attention for low-resource speech recognition (저자원 환경의 음성인식을 위한 자기 주의를 활용한 음향 모델 학습)

Park, Hosung;Kim, Ji-Hwan
- The Journal of the Acoustical Society of Korea / v.39 no.5 / pp.483-489 / 2020
This paper proposes acoustic model training using self-attention for low-resource speech recognition. In low-resource speech recognition, it is difficult for the acoustic model to distinguish certain phones, for example the plosives /d/ and /t/, the plosives /g/ and /k/, and the affricates /z/ and /ch/. In acoustic model training, self-attention generates attention weights from the deep neural network model. In this study, these weights are used to handle errors between similar pronunciations in low-resource speech recognition. When the proposed method was applied to a Time Delay Neural Network-Output gate Projected Gated Recurrent Unit (TDNN-OPGRU)-based acoustic model, the proposed model achieved a 5.98 % word error rate, an absolute improvement of 0.74 % over the TDNN-OPGRU model.
https://doi.org/10.7776/ASK.2020.39.5.483

LARYNGEAL ADJUSTMENTS FOR KOREAN CONSONANTS (한국어 파열음에 대한 후두내근의 역할)

;H. Hirose
- Proceedings of the KOR-BRONCHOESO Conference / 1991.06a / pp.15-15 / 1991
Physiologically, Korean consonants can be subdivided by place and manner of articulation; by manner of articulation they are classified into plosives, fricatives, affricates, nasals, and others. Plosives in particular are divided, according to how they are released, into lenis, glottalized, and aspirated sounds. Visual observation of the vocal fold movement made for the consonant, before the vocal fold vibration for the following vowel begins, shows that the glottal opening is largest for aspirated sounds, large but smaller than that for lenis sounds, and smallest for glottalized sounds. This could be confirmed easily by laryngeal endoscopy. Although the phenomenon can be examined scientifically in several ways, we expected that it could be clarified by comparing the roles of the vocal fold adductor and abductor muscles using laryngeal electromyography, and carried out this study accordingly. All but one of the sentences or words used were meaningful words. The muscles used for EMG recording were the vocalis muscle, a laryngeal adductor, and the posterior cricoarytenoid muscle, a laryngeal abductor, and the electrical signals were analyzed with a computer data processing system. The results are based on the analysis of endoscopic measurements of glottal opening distance together with the EMG analysis of the intrinsic laryngeal muscles. In brief, many researchers have so far explained the characteristic phenomena of each Korean consonant mainly through the role of the vocal fold adductors; our results indicate that, while the adductors are important, the vocal fold abductor also plays a very important, interrelated role.

Adaptive Noise Reduction using Standard Deviation of Wavelet Coefficients in Speech Signal (웨이브렛 계수의 표준편차를 이용한 음성신호의 적응 잡음 제거)

황향자;정광일;이상태;김종교
- Science of Emotion and Sensibility / v.7 no.2 / pp.141-148 / 2004
This paper proposes a new time-adaptive threshold based on the standard deviations of wavelet coefficients obtained by applying the wavelet transform frame by frame. The threshold is set using the sum of the standard deviations of the wavelet coefficients in cA3 and weighted cD1. The cA3 coefficients represent voiced sounds with low frequency, and the cD1 coefficients represent unvoiced sounds with high frequency. Simulation results show that the proposed algorithm improves SNR and MSE performance more than the wavelet transform and the wavelet packet transform do. Moreover, signals reconstructed by the proposed algorithm resemble the original signal for plosive, fricative, and affricate sounds, whereas the wavelet transform and the wavelet packet transform severely attenuate those sounds.
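
The per-frame threshold described in the abstract above can be illustrated as follows. This is only a sketch under stated assumptions: the abstract does not name the wavelet family or the weight, so the Haar wavelet, the `weight=0.5` default, and the function names `haar_step` and `frame_threshold` are placeholders, and the subsequent thresholding and reconstruction steps are not shown.

```python
import math
import statistics

def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail).
    The Haar wavelet is an assumption; the abstract does not name the family."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    detail = [(a - b) / math.sqrt(2) for a, b in pairs]
    return approx, detail

def frame_threshold(frame, weight=0.5):
    """Threshold for one frame: std(cA3) + weight * std(cD1).
    The frame length must be a multiple of 8 so three levels can be taken."""
    cA1, cD1 = haar_step(frame)   # level 1: cD1 holds the highest-frequency detail
    cA2, _ = haar_step(cA1)       # level 2
    cA3, _ = haar_step(cA2)       # level 3: cA3 holds the low-frequency content
    return statistics.pstdev(cA3) + weight * statistics.pstdev(cD1)

quiet_frame = [1.0] * 16                  # constant frame: no variation at any scale
hissy_frame = [1.0, -1.0, 0.0, 0.0] * 4   # intermittent high-frequency bursts
quiet_threshold = frame_threshold(quiet_frame)
hissy_threshold = frame_threshold(hissy_frame)
```

Because the threshold is recomputed for each frame, frames dominated by high-frequency consonant energy get a different threshold from steady frames, which is consistent with the abstract's claim that plosive, fricative, and affricate sounds are better preserved.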

A Study on the Intelligibility of Esophageal Speech (식도발성 발화의 명료도에 대한 연구)

Pyo, Hwa-Young
- The Journal of the Acoustical Society of Korea / v.26 no.5 / pp.182-187 / 2007
The present study investigated the intelligibility of esophageal speech, a method by which people who have lost their voices through total laryngectomy phonate using an airstream driven into the esophagus rather than the trachea. Three normal listeners transcribed the CVV and VCV syllables produced by 10 esophageal speakers. The overall intelligibility of esophageal speech was 27%. Affricates showed the highest intelligibility and fricatives the lowest. With respect to place of articulation, palatals were the most intelligible and alveolars the least. Most of the aspirated consonants showed low intelligibility. The consonants in VCV syllables were more intelligible than those in CVV syllables. The low intelligibility of esophageal speakers is due to insufficient airflow intake into the esophagus. Therefore, training to increase airflow intake, together with correct articulation training, should improve their intelligibility.
https://doi.org/10.7776/ASK.2007.26.5.182

Korean Phoneme Recognition Using Self-Organizing Feature Map (SOFM 신경회로망을 이용한 한국어 음소 인식)

Jeon, Yong-Koo;Yang, Jin-Woo;Kim, Soon-Hyob
- The Journal of the Acoustical Society of Korea / v.14 no.2 / pp.101-112 / 1995
Constructing a feature map-based phoneme classification system for speech recognition usually requires two procedures: clustering and labeling. In this paper, we present a phoneme classification system that uses Kohonen's Self-Organizing Feature Map (SOFM) as both clusterer and labeler. The SOFM is known to perform a self-organizing process that produces an optimal local topographical mapping of the signal space and yields reasonably high accuracy in recognition tasks. Consequently, the SOFM can be applied effectively to phoneme recognition. In addition, to improve the performance of the phoneme classification system, we propose a learning algorithm combined with the classical K-means clustering algorithm in the fine-tuning stage. To evaluate the proposed algorithm, we first use a total of 43 phonemes to construct six intra-class feature maps for six different phoneme classes. In speaker-dependent phoneme classification tests using these six feature maps, we obtain a recognition rate of $87.2\%$ and confirm that the proposed algorithm is an efficient method for improving recognition performance and convergence speed.
