• Title/Summary/Keyword: overall speech rate

A Study on Word Vector Models for Representing Korean Semantic Information

  • Yang, Hejung;Lee, Young-In;Lee, Hyun-jung;Cho, Sook Whan;Koo, Myoung-Wan
    • Phonetics and Speech Sciences / v.7 no.4 / pp.41-47 / 2015
  • This paper examines whether the Global Vector model (GloVe) is applicable to Korean data as a universal learning algorithm. The main purpose of this study is to compare GloVe with word2vec models, namely the continuous bag-of-words (CBOW) model and the skip-gram (SG) model. For this purpose, we conducted an experiment using an evaluation corpus consisting of 70 target words and 819 pairs of Korean words for word similarities and analogies, respectively. In the word similarity task, the Pearson correlation coefficients with human judgement were 0.3133 for GloVe, 0.2637 for CBOW and 0.2177 for SG. In the word analogy task, the overall accuracy rate across semantic and syntactic relations was 67% for GloVe, 66% for CBOW and 57% for SG.
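The two evaluation measures named in this abstract can be illustrated with a minimal sketch. It assumes cosine similarity as the model score and simple vector-offset analogies; the abstract does not state either choice, and the toy vectors, ratings, and function names below are hypothetical and are not the authors' data or code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def similarity_score(embeddings, rated_pairs):
    """Correlate model cosine similarities with human ratings."""
    model = [cosine(embeddings[a], embeddings[b]) for a, b, _ in rated_pairs]
    human = [rating for _, _, rating in rated_pairs]
    return pearson(model, human)

def analogy_accuracy(embeddings, questions):
    """Fraction of 'a is to b as c is to ?' questions answered correctly."""
    correct = 0
    for a, b, c, expected in questions:
        # Vector-offset target: b - a + c
        target = [vb - va + vc for va, vb, vc
                  in zip(embeddings[a], embeddings[b], embeddings[c])]
        # Exclude the three question words from the candidate answers.
        candidates = [w for w in embeddings if w not in (a, b, c)]
        best = max(candidates, key=lambda w: cosine(embeddings[w], target))
        correct += best == expected
    return correct / len(questions)

# Hypothetical toy data, for illustration only.
TOY_EMBEDDINGS = {
    "man": [1.0, 0.0, 0.0],
    "woman": [0.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [0.0, 1.0, 1.0],
    "apple": [1.0, 1.0, 0.0],
}
TOY_RATINGS = [("king", "queen", 8.0), ("man", "woman", 7.0), ("king", "apple", 1.0)]
TOY_QUESTIONS = [("man", "king", "woman", "queen")]

print(similarity_score(TOY_EMBEDDINGS, TOY_RATINGS))
print(analogy_accuracy(TOY_EMBEDDINGS, TOY_QUESTIONS))
```

With real embeddings, the same two functions would be run once per model (GloVe, CBOW, SG) on the same evaluation set, so that the scores are directly comparable.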

Syllable Structure Constraints and the Perception of Biconsonantal Clusters by Korean EFL Learners

  • Lee, Shinsook
    • Journal of English Language & Literature / v.55 no.6 / pp.1193-1220 / 2009
  • This study examined the impact of sonority profiles, positional differences and L2 proficiency on Korean EFL learners' perception of English biconsonantal clusters, using nonce words. The overall results supported the major predictions of sonority-based typological markedness for consonant clusters: obstruent plus sonorant and sonorant plus obstruent sequences were better perceived than obstruent-only or sonorant-only sequences. Yet some consonant clusters did not show a preference for sonority profiles. Positional effects were also confirmed, as word-initial biconsonantal clusters were better perceived than word-final ones across all the participant groups. Participants' English proficiency also turned out to be important in the perception of consonant clusters: university students' mean rate of accuracy was highest, followed by that of high school students, which in turn was followed by that of middle school students. Further, the effects of other factors such as frequency and stimuli on speech perception were addressed, along with some implications for future research.

The Role of Post-lexical Intonational Patterns in Korean Word Segmentation

  • Kim, Sa-Hyang
    • Speech Sciences / v.14 no.1 / pp.37-62 / 2007
  • The current study examines the role of post-lexical tonal patterns of a prosodic phrase in word segmentation. In a word-spotting experiment, native Korean listeners were asked to spot a disyllabic or trisyllabic word in a twelve-syllable speech stream composed of three Accentual Phrases (AP). Words occurred with various post-lexical intonation patterns. The results showed that listeners spotted more words in phrase-initial than in phrase-medial position, suggesting that the AP-final H tone from the preceding AP helped listeners to segment the phrase-initial word in the target AP. Results also showed that listeners' error rates were significantly lower when words occurred with an initial rising tonal pattern, which is the most frequent intonational pattern imposed upon multisyllabic words in Korean, than with non-rising patterns. This result was observed in both AP-initial and AP-medial positions, regardless of the frequency and legality of overall AP tonal patterns. Tonal cues other than the initial rising tone did not positively influence the error rate. These results not only indicate that a rising tone in AP-initial and AP-final position is a reliable cue for word boundary detection for Korean listeners, but further suggest that phrasal intonation contours serve as a possible word boundary cue in languages without lexical prominence.

Robust, Low Delay Multi-tree Speech Coding at 9.6Kbits/sec (견실, 저지연 멀티트리 9.6Kbits/s 음성부호기에 관한 연구)

  • 우홍체;문병현;이채욱
    • The Journal of Korean Institute of Communications and Information Sciences / v.18 no.3 / pp.348-354 / 1993
  • In this research, a multi-tree coder at 9.6 Kbits/sec using a novel scheme for adaptation of the short-term coefficients is developed. The overall delay of the tree coder is maintained at 2.5 msec (16 samples at the 6.4 kHz sampling frequency). This coder produces good-quality speech over ideal channels, and it is very robust to channel errors up to a bit error rate (BER) of $10^{-3}$. This robustness is achieved by using a parallel adaptation scheme in combination with a smoothed version of the received excitation sequence for adaptation of the short-term prediction coefficients. For the multi-tree coder, reconstructed output speech is evaluated using signal-to-quantization noise ratios (SNR), segmental SNRs, and informal listening tests.
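The delay figure and the objective measures mentioned in this abstract can be illustrated with a short sketch. It assumes the common textbook definitions (SNR as the ratio of signal energy to error energy in dB, segmental SNR as the mean of per-frame SNRs). The frame length of 16 samples and the function names are hypothetical choices, and any clamping or silence handling the authors may have used is not reproduced.

```python
import math

def delay_ms(samples, sampling_rate_hz):
    """Delay in milliseconds of a buffer of `samples` at the given rate."""
    return 1000.0 * samples / sampling_rate_hz

def snr_db(reference, decoded):
    """Overall SNR in dB: signal energy over quantization-error energy."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    return 10.0 * math.log10(signal / noise)

def segmental_snr_db(reference, decoded, frame_len=16):
    """Mean of per-frame SNRs in dB (frame_len is an assumed value)."""
    frame_snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(x * x for x in ref)
        noise = sum((x - y) ** 2 for x, y in zip(ref, dec))
        # Skip frames where the ratio is undefined (silent or error-free).
        if signal > 0.0 and noise > 0.0:
            frame_snrs.append(10.0 * math.log10(signal / noise))
    if not frame_snrs:
        raise ValueError("no frame with a defined SNR")
    return sum(frame_snrs) / len(frame_snrs)

# 16 samples at 6.4 kHz, the figures quoted in the abstract.
print(delay_ms(16, 6400))
```

Segmental SNR weights quiet and loud frames equally, which is why it is often reported alongside the overall SNR for speech coders.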

Delayless MDCT for Scalable Speech Codec (계층구조 음성 부호화기를 위한 지연 없는 MDCT 구조)

  • Sung, Ho-Sang;Park, Ho-Chong
    • The Journal of the Acoustical Society of Korea / v.26 no.3 / pp.102-108 / 2007
  • A high-performance scalable speech codec generally requires a very low-rate first layer and a fine-granule second layer, and such a codec can be implemented with a harmonic codec and an MDCT-based transform codec for the respective layers. In this structure, however, each codec requires an independent frequency transform and the time delay of each codec accumulates, resulting in a long time delay for the overall codec. In this paper, a new MDCT structure for the second layer is proposed, in which the MDCT is forced to share the look-ahead region of the first layer in order to prevent the accumulation of time delay, and the resulting functional error of the MDCT is analyzed and removed after the IMDCT. The proposed delayless MDCT requires no additional bits and provides equivalent coding performance with reduced time delay, yielding a meaningful enhancement of the overall codec.

Evaluating Impressions of Robots According to the Robot's Embodiment Level and Response Speed (로봇의 외형 구체화 정도 및 반응속도에 따른 로봇 인상 평가)

  • Kang, Dahyun;Kwak, Sonya S.
    • Design Convergence Study / v.16 no.6 / pp.153-167 / 2017
  • Nowadays, as many robots are developed for desktop use, users interact with the robots through speech. However, due to technical limitations of speech-based interaction, an alternative is needed. This research aimed to design a robot that interacts with the user by using unconditional reflection of biological signals. In order to apply bio-signals to robots more effectively, we evaluated the robots' overall service evaluation, perceived intelligence, appropriateness, trustworthiness, and sociability according to the robot's embodiment level and response speed. The results showed that, in terms of intelligence and appropriateness, the 3D robot with a higher embodiment level was evaluated more positively than the 2D robot with a lower embodiment level. Also, the robot with the faster response speed was evaluated more favorably in overall service evaluation, intelligence, appropriateness, trustworthiness, and sociability than the robot with the slower response speed. In addition, in service evaluation, trustworthiness, and sociability, there were interaction effects between the robot's embodiment level and response speed.

Effects of Neonatal Hearing Screening Program (NHSP) Information on Parental Satisfaction (신생아 청각선별검사 프로그램에 관한 정보제공이 부모 만족도에 미치는 영향)

  • Ahn, Hyun-Sook;Cho, Soo-Jin
    • Phonetics and Speech Sciences / v.1 no.2 / pp.51-59 / 2009
  • This study was designed to investigate the effects of neonatal hearing screening program (NHSP) information on parental satisfaction, using the Parent Satisfaction Questionnaire with Neonatal Hearing Screening Program (PSQ-NHSP) by Mazlan et al. (2006). The PSQ-NHSP consists of four aspects: information, personnel in charge of the hearing test, appointment activity, and overall satisfaction with the neonatal hearing screening program. A total of 106 parents (50 in the experimental group and 56 in the control group) from one general hospital and two delivery clinics participated in this study. The fifty parents in the experimental group received information and counseling with educational materials before filling out the PSQ-NHSP, whereas the fifty-six parents in the control group did not receive any counseling or educational materials before completing the PSQ-NHSP. The PSQ-NHSP demonstrated excellent internal consistency reliability ($\sigma = 0.914$). The results of the study were as follows. First, the overall satisfaction ($3.77 \pm 0.81$) and personnel in charge of hearing test ($3.52 \pm 0.79$) aspects showed higher rates of satisfaction than the appointment activity aspect ($3.51 \pm 0.80$) for all subjects. Second, the overall parental satisfaction rate of the experimental group ($4.15 \pm 0.50$) was significantly higher than that of the control group ($3.09 \pm 0.53$) in all items. Lastly, thirty-two participants (30%) made at least one comment in response to the open-set items. A total of 29 comments were related to satisfaction with participating in the NHSP and 11 comments were related to dissatisfaction. In conclusion, to improve parental satisfaction it is important to provide parents with education and information about the NHSP before the test. In addition, the PSQ-NHSP was found to be a useful instrument for identifying the benefits and shortfalls of the NHSP.

Automatic speech recognition using acoustic doppler signal (초음파 도플러를 이용한 음성 인식)

  • Lee, Ki-Seung
    • The Journal of the Acoustical Society of Korea / v.35 no.1 / pp.74-82 / 2016
  • In this paper, a new automatic speech recognition (ASR) method is proposed in which ultrasonic Doppler signals are used instead of conventional speech signals. The proposed method has advantages over conventional speech/non-speech-based ASR, including robustness against acoustic noise and user comfort associated with the use of a non-contact sensor. In the proposed method, a 40 kHz ultrasonic signal was radiated toward the mouth and the reflected ultrasonic signals were then received. The frequency shift caused by the Doppler effect was used to implement ASR. The proposed method employed multi-channel ultrasonic signals acquired from various locations, unlike the previous method, in which a single-channel ultrasonic signal was employed. PCA (Principal Component Analysis) coefficients were used as the ASR features, and a hidden Markov model (HMM) with a left-right structure was adopted. To verify the feasibility of the proposed ASR, a speech recognition experiment was carried out on 60 Korean isolated words obtained from six speakers. The experimental results showed that the overall word recognition rates were comparable with those of conventional speech-based ASR methods and that the performance of the proposed method was superior to that of the conventional single-channel ASR method. In particular, an average recognition rate of 90% was maintained in noisy environments.
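As a rough illustration of the size of the frequency shift this abstract relies on, the sketch below uses the standard small-velocity approximation for a wave reflected from a moving surface, Δf ≈ 2·v·f0/c. The speed of sound (343 m/s) and the example articulator velocity (0.1 m/s) are assumed values, not figures from the paper, and the function name is hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 °C

def doppler_shift_hz(reflector_velocity_m_s, carrier_hz=40_000.0,
                     speed_of_sound_m_s=SPEED_OF_SOUND_M_S):
    """Approximate Doppler shift of a wave reflected from a moving surface.

    Valid when the reflector moves much more slowly than the wave; positive
    velocity means the surface moves toward the co-located emitter/receiver.
    """
    return 2.0 * reflector_velocity_m_s * carrier_hz / speed_of_sound_m_s

# Hypothetical articulator velocity of 0.1 m/s with the 40 kHz carrier.
print(doppler_shift_hz(0.1))
```

Under these assumptions the shift is only a few tens of hertz around the 40 kHz carrier, so the band of interest is narrow.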

Channel Coding Design Combined with Source Coder for Mobile Communication Systems (이동통신시스템을 위한 소스 코더와 결합된 채널코딩 방법 연구)

  • 김종현;이인성;강석봉;이정구
    • Proceedings of the IEEK Conference / 1998.06a / pp.19-22 / 1998
  • In this study, an efficient channel coding method combined with CS-ACELP is proposed. The same convolutional coder and Viterbi decoder as in the CDMA mobile communication system are used as the channel coder. To make the best use of the limited channel coding redundancy, unequal error protection with a punctured convolutional coder is used for variable rate allocation. However, the overall code rate is given by 2. The performance of the proposed coder is analyzed and simulated in a Rayleigh fading channel. Experimental results show that the objective and subjective speech quality of the variable-rate channel coding methods is superior to that of the non-variable channel coding method.

A study of the prosodic patterns of autism and normal children in the imitating declarative and interrogative sentences (따라말하기 과제를 통한 자폐범주성 장애 아동과 일반 아동의 평서문과 의문문의 음향학적 특성 비교)

  • Lee, Jinhyung;Seong, Cheoljae
    • Phonetics and Speech Sciences / v.12 no.2 / pp.39-49 / 2020
  • The prosody of children with autism spectrum disorders (ASD) has several abnormal features, including monotonous speech. The purpose of this study was to compare acoustic features between an ASD group and a typically developing (TD) group and within the ASD group. The study also examined audience perceptions of the lengthening effect of increasing the number of syllables. Fifty participants were divided into two groups (20 with ASD and 30 TD) and were asked to imitate a total of 28 sentences. In the auditory-perceptual evaluation, seven participants judged the sentence type of 115 sentences. Pitch, intensity, speech rate, and pitch slope were analyzed for significant differences. In conclusion, the ASD group showed higher pitch and intensity and a lower overall speaking rate than the TD group. Moreover, there were significant differences in the s2 slope of interrogative sentences. Finally, based on the auditory-perceptual evaluation, only 4.3% of interrogative sentences produced by participants with ASD were perceived as declarative sentences. The cause of this abnormal prosody has not been clearly identified; however, pragmatic ability and other characteristics of autism are related to ASD prosody. This study identified prosodic patterns in ASD and suggested the need to develop treatments to improve prosody.