Search | Korea Science

Auto-Segmentation of Unsegmented Speech based on HMM and Time-Synchronous Viterbi Algorithm (시간동기형 Viterbi 알고리즘과 HMM에 기반한 음성의 자동 세그멘테이션)

오세진;황철준;김범국;정호열;정현열
- Proceedings of the Korean Information Science Society Conference / 2001.04b / pp.592-594 / 2001
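To improve the precision of acoustic models for speech recognition, this study investigated the automatic segmentation of unsegmented speech based on the HMM, a statistical method, and the time-synchronous Viterbi algorithm. A continuous-density HMM base model is first built from a small amount of segmented speech and used as the reference pattern. For the feature parameters of the unsegmented input speech, the point at which the time-synchronous Viterbi score is maximal at each frame is set as the optimal boundary, and the speech is then segmented using this optimal boundary information together with pronunciation dictionary information, which serves as linguistic knowledge. For comparison, the same procedure was carried out with HTK. Using the segmentation information obtained in this way, a continuous-density HMM base model and an HTK CHMM base model were each built, and word recognition performance was evaluated on the word data of the Korean Language Engineering Center (KLE). In the experiments on the KLE 452 male and female data, our laboratory's recognition system achieved speaker-independent word recognition rates of 89.4% and 85.1%, and HTK achieved 85.1% and 81.9%, respectively.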
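The boundary search described in this abstract can be illustrated with a generic Viterbi forced alignment. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `viterbi_align` and the input `loglik` (per-frame log-likelihoods for a left-to-right chain of states, as would be obtained from the pronunciation dictionary and the HMM base model) are hypothetical, and transition probabilities are assumed uniform and therefore omitted.

```python
import math

def viterbi_align(loglik):
    """Align T frames to S left-to-right states.

    loglik[t][s] is an assumed, precomputed log-likelihood of frame t
    under state s. Returns (path, boundaries), where path[t] is the state
    assigned to frame t and boundaries lists the frames where a new
    state begins.
    """
    T, S = len(loglik), len(loglik[0])
    if T < S:
        raise ValueError("need at least one frame per state")
    neg_inf = -math.inf
    score = [[neg_inf] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = loglik[0][0]  # the alignment must start in the first state
    for t in range(1, T):  # time-synchronous: one pass per frame
        for s in range(S):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else neg_inf
            if move > stay:
                score[t][s] = move + loglik[t][s]
                back[t][s] = s - 1
            else:
                score[t][s] = stay + loglik[t][s]
                back[t][s] = s
    # Backtrace from the last state at the last frame.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    boundaries = [t for t in range(1, T) if path[t] != path[t - 1]]
    return path, boundaries

# Toy input (made up for illustration): 6 frames, 3 states.
toy = [[0, -5, -5], [0, -5, -5], [-5, 0, -5],
       [-5, 0, -5], [-5, -5, 0], [-5, -5, 0]]
path, boundaries = viterbi_align(toy)
print(path, boundaries)  # prints [0, 0, 1, 1, 2, 2] [2, 4]
```

A real system would add transition log-probabilities to `stay` and `move` and would compute `loglik` from the continuous-density HMM output distributions; both are left out here to keep the sketch short.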

Subjective Evaluation of Computer Noise for Improving the Acoustical Environment of Open-plan Offices (사무음환경 개선을 위한 컴퓨터 소음의 감성적 평가)

정정호;송희수;전진용;조문재
- Proceedings of the Korean Society for Emotion and Sensibility Conference / 2002.05a / pp.101-106 / 2002
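As computer use in offices has recently increased sharply, demand for low-noise computers has grown because of the annoyance caused by computer noise. Accordingly, to support efficient noise control, a questionnaire survey was conducted to investigate the actual state of computer noise, and various psychoacoustic parameters were calculated along with physical measurements. In addition, an auditory experiment was conducted to establish workers' upper and lower limits for computer noise while they performed a general office task. The survey showed that the noises perceived as loudest were fan noise (from the user's own computer) and keyboard noise (from other people's computers), and about 55% of users usually felt that computer noise was loud. Only about 20% of users felt that the computer they currently used was quiet, and about 35% responded that computer noise reduces work efficiency. The auditory experiments on fan noise, CD-ROM drive noise, and HDD operating noise showed that the lower and upper limits for these noises were 31-51 dB(A), 34-54 dB(A), and 34-58 dB(A), respectively.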

An Objective Speech Quality Measure using Masking Effect under Digital Mobile Telephone Network Environment (디지털 이동통신망 환경 하에서 마스킹 효과를 이용한 객관적 음질 평가 척도)

김광수;김민정;석수영;정호열;정현일
- Journal of Korea Multimedia Society / v.5 no.4 / pp.405-414 / 2002
In this paper, we propose a new objective speech quality measure that uses the noise masking threshold to assess speech quality in mobile telephone network environments, and we verify the effectiveness of the proposed method through experiments. For this purpose, well-known objective speech quality measures such as BSD and PSQM are first evaluated in digital mobile telephone network environments. However, these conventional methods do not perform as well in mobile network environments as the results reported in the literature suggest. To be a more effective objective speech quality measure in mobile telephone environments, the proposed method employs the human psychoacoustic masking effect. The DMOS, instead of the MOS, is used as the subjective speech quality measure for performance evaluation. The performance comparisons are carried out with speech data collected from digital mobile telephone environments. As a result, the proposed measure shows, on average, 4% higher performance in terms of correlation than existing objective speech quality measures such as BSD and PSQM.

A Study on the Efficiency Evaluation of Ultrasound Therapy Using Varicose Vein Simulated Tissue Phantom and Tissue Equivalent Phantom (하지정맥류 모사 생체조직 팬텀과 조직등가 팬텀을 이용한 초음파 치료효과 평가에 관한 연구)

Kim, Ju-Young;Jung, Tae-Woong;Shin, Kyoung-Won;Noh, Si-Cheol;Choi, Heung-Ho
- Journal of the Korean Society of Radiology / v.12 no.3 / pp.427-433 / 2018
Because focused ultrasound is expected to offer a non-invasive treatment effect, various studies on treating varicose veins with it have been reported. In this study, a bio-tissue phantom and a tissue-equivalent phantom that can be applied to estimating the effect of ultrasonic varicose vein treatment are proposed. The usefulness of each phantom was assessed by evaluating its acoustic characteristics and its shrinkage rate under ultrasonic irradiation. A multi-layer phantom with three layers (skin, fat, and muscle) was constructed to reflect the structure of the tissue where varicose veins occur. The materials constituting each layer were made to have characteristics similar to those of the human body. In addition, multi-layered phantoms with a blood vessel mimicking tube, with a bovine blood vessel, and with animal tissue were fabricated. The degree of shrinkage of the blood vessel mimicking material and of the vascular tissue under ultrasonic irradiation was evaluated using B-mode images. The results suggest that the proposed phantoms could be used effectively to evaluate ultrasonic varicose vein treatment. In addition, these phantoms could be applied to the development of varicose vein treatment devices that use focused ultrasound and to the verification of their therapeutic effect.
https://doi.org/10.7742/jksr.2018.12.3.427

Physical factors Affecting Sound Sensation for Korean Traditional Silk Fabrics with Similar Sound Pressure Levels (유사 음압 전통 견직물의 소리 감각에 영향을 미치는 물리적 요인)

Cho Su-Min;Cho Gil-Soo;Yi Eun-Jou
- Science of Emotion and Sensibility / v.9 no.1 / pp.39-48 / 2006
This study was carried out to investigate the sound sensation of Korean traditional silk fabrics with similar sound pressure levels (SPL) and to identify secondary physical factors, other than SPL, that determine the sound sensation of the fabrics. The sounds of the silk fabrics tended to be perceived differently from one another for some sensations, such as clearness and roughness. They were felt more strongly in terms of loudness, roughness, and highness than in terms of softness, sharpness, clearness, and pleasantness. Subjective clearness, roughness, and highness were significantly correlated with some of the sound parameters, including roughness(z), $\Delta L$, and $\Delta f$. In particular, both clearness and roughness, which varied among the fabrics, were found to be determined by $\Delta L$. This result means that $\Delta L$, as well as roughness(z) and $\Delta f$, could be used secondarily to SPL to satisfy some aspects of human sensibility for the sound of traditional silk fabrics without changing physical loudness.

Sound Sensation and Its Related Objective Parameters of Nylon Fabrics for Sports Outerwear (스포츠 아우터웨어용 나일론 직물의 소리 감각과 이와 관련된 객관적 파라미터들)

Yi, Eunjou;Cho, Gilsoo
- Journal of the Korean Society of Clothing and Textiles / v.25 no.9 / pp.1593-1602 / 2001
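To identify the subjective sensations evoked by the sound of nylon fabrics for sports outerwear and the objective measures related to them, this study examined the sound spectra of eight different nylon fabrics. As sound parameters, it calculated the level pressure of total sound (LPT), three autoregressive (AR) coefficients, and loudness(Z) and sharpness(Z) according to Zwicker's psychoacoustic model, and it measured the physical properties of the fabrics with the Kawabata Evaluation System (KES). For the subjective evaluation, participants listened to the recorded sound of each fabric and rated seven sound sensations (softness, loudness, sharpness, clearness, roughness, highness, and pleasantness) on semantic differential scales, and stepwise linear regression was then used to propose prediction models for the subjective sensations of fabric sound. The taffeta nylon fabrics, excluding Ultrasuede, showed higher sound pressure values in their spectra than fabrics of other fiber compositions, with an LPT of around 60 dB, and were therefore expected to be unpleasant for wearers. In the subjective evaluation they likewise received negative scores for softness, clearness, and pleasantness and positive scores for loudness, sharpness, roughness, and highness. In the prediction models, LPT had a positive effect on loudness and roughness and a negative effect on pleasantness, indicating that the sound of nylon fabrics is perceived as pleasant when the LPT is 50 dB or lower.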

Diagnosis and Evaluation of Humanities Therapy: The Phonetic Analysis of Speech Rates and Fundamental Frequency According to Preferred Sensation Type (인문치료의 진단 및 평가: 감각유형에 따른 말속도와 기본주파수의 실험음성학적 분석)

Lee, Chan-Jong;Heo, Yun-Ju
- The Journal of the Acoustical Society of Korea / v.30 no.4 / pp.231-237 / 2011
The purpose of this study is to examine the correlation between the preferred sensation type and speech sounds, focusing on $F_0$ and speech rate. Data on sensation types and speech sounds were collected from 36 undergraduate and graduate students (17 male, 19 female). Subjects were asked to read a given text (400 syllables), describe a drawing, and answer some questions. We measured the speakers' $F_0$ and speech rates. The results show that type V (Visual) is correlated with speech rate when type D (Digital) is excluded, and type A (Auditory) is correlated with speech rate when type D is included. Furthermore, the analysis of the mean values of V, A, K (Visual, Auditory, Kinesthetic) indicates that type V is characterized by faster speech rates and higher $F_0$ in all parts except the interview, and the same is true for V, A, K, D (Visual, Auditory, Kinesthetic, Digital) in all parts. In conclusion, this study found that the preferred sensation type is correlated with $F_0$ and speech rate. Based on these results, $F_0$ and speech rate can be used to analyze sensation types for individualized education as well as consultation. In addition, this study is significant in that it lays a foundation for research on the correlation between preferred sensation type and speech sounds.
https://doi.org/10.7776/ASK.2011.30.4.231

Fabrication and Evaluation of High Frequency Ultrasound Receive Transducers for Intravascular Photoacoustic Imaging (혈관내 광음향 영상을 위한 고주파수 초음파 수신 변환기 제작 및 평가)

Lee, Jun-Su;Chang, Jin Ho
- The Journal of the Acoustical Society of Korea / v.33 no.5 / pp.300-308 / 2014
Photoacoustic imaging is a useful tool for the diagnosis of atherosclerosis because it can provide anatomical and pathological information at the same time. The photoacoustic signal detector is a pivotal element for achieving high spatial resolution, so it should have a broadband spectrum with a high center frequency. Since a photoacoustic imaging probe is inserted directly into a blood vessel to diagnose atherosclerosis, the total size of the photoacoustic signal detector should be less than 1 mm. The main purpose of this paper is to demonstrate that PVDF can be used as an active material for a photoacoustic signal detector with high-frequency and broadband characteristics. The photoacoustic signal detector developed in this study was a single-element ultrasound transducer with an aperture of $0.5 \times 0.5$ mm and a total size of 1 mm. In the design stage, the natural focal depth was adjusted so that the effective focal area covers the region of interest, i.e., 1~5 mm in depth. This was because geometrical focusing could not be used due to the small aperture. A pulse-echo test confirmed that the developed photoacoustic signal detector has a -6 dB bandwidth ranging from 40.1 to 112.8 MHz and a center frequency of 76.83 MHz.
https://doi.org/10.7776/ASK.2014.33.5.300

Analysis of bed change based on the geometric characteristics of channel cross-sections (유로 단면의 기하학적 특성을 이용한 하상변화량 분석)

Ko, Joo Suk;Lee, Kyungsu;Kwak, Sunghyun;Lyu, Siwan
- Journal of Korea Water Resources Association / v.53 no.12 / pp.1097-1107 / 2020
A methodology is proposed for understanding the spatiotemporal changes in river topography through the longitudinal change in the geometric characteristics of channel cross-sections and their related properties. Three-dimensional spatial information on the riverbed was obtained through a detailed bathymetric survey using an acoustic echo sounder for the reach from Gumi Weir to Chilgok Weir on the Nakdong River. Geometric information for the reference sections was extracted from the acquired bathymetric survey data. By comparing the geometric properties of the reference sections, it was possible to capture the topographic characteristics of a channel reach and their changes. Through comparison with past survey data, it was also possible to quantify the change in cross-sectional area and the volumetric change of the riverbed. The method proposed in this study is expected to enable a quantitative evaluation of changes in river topography.
https://doi.org/10.3741/JKWRA.2020.53.12.1097

The Study on Korean Prosody Generation using Artificial Neural Networks (인공 신경망의 한국어 운율 발생에 관한 연구)

Min Kyung-Joong;Lim Un-Cheon
- Proceedings of the Acoustical Society of Korea Conference / spring / pp.337-340 / 2004
Accurately reproduced prosody is one of the key factors affecting the naturalness of speech synthesized by a TTS system. In general, prosody rules have been gathered either from linguistic knowledge or by analyzing the prosodic information in natural speech. However, these rules cannot be perfect, and some of them may be incorrect. We therefore propose artificial neural networks (ANNs) that can be trained to learn the prosody of natural speech and generate it. In the learning phase, the ANNs learn the pitch and energy contours of the center phoneme: a string of phonemes from a sentence is applied to the ANNs, the output pattern is compared with the target pattern, and the weights are adjusted to minimize the mean square error between them. In the test phase, the estimation rates were computed. We found that the ANNs could generate the prosody of a sentence.