• Title/Summary/Keyword: Korean speakers

958 search results

A Study of the Noise Level in Hospitals and Countermeasures against the Noise (병실내 소음도와 환자와의 관계)

  • Kim, Myung-Ho; Cha, Il-Whan
    • Journal of Preventive Medicine and Public Health / v.6 no.1 / pp.43-49 / 1973
  • In this study, the noise sources in wards at four general hospitals in the Seoul area were investigated and analyzed. The reactions to noise of 171 randomly selected patients were also examined. The results show that at the two hospitals located in residential areas, the main sources of noise were the loudspeakers of the wired broadcasting system and visiting guests. The patients at the two other hospitals, located in commercial districts, suffered more from traffic noise. However, because hospital life separates them from their ordinary home life, sixty-one percent of the inpatients wished for music at around 60 dB(A). After considering the results of the investigation and the wishes of the inpatients, the following suggestions were made: 1. Reduce the number of guests or the length of their stay. 2. The wired broadcasting system should be replaced by a wireless one or, if that is unavoidable, used in office rooms only. 3. Since vehicles stopping and starting induce much noise, the Seoul City Government should be requested to prepare appropriate administrative measures for vehicles around hospital areas and to prevent the establishment of new hospitals along highways. 4. Using earphones, inpatients can choose a wireless channel according to their individual taste; through the masking effect, this would cover up the noise sources. 5. Rooms facing the streets should be used as offices; otherwise, double windows should be installed in the inpatient wards.


A Visualization Technique of Inter-Device Packet Exchanges to Test DLNA Device Interoperability (DLNA 기기의 상호운용성 시험을 위한 패킷교환정보 시각화 방법)

  • Kim, Mijung; Jin, Feng; Yoon, Ilchul
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.10a / pp.531-534 / 2014
  • DLNA is an established industry standard that supports content sharing among smart devices in home wired- and wireless-network environments; it is well known in Korea as Allshare or Smartshare. The DLNA standard is implemented as a built-in service in most Android smartphones and tablets. Beyond handheld devices, DLNA services can also be employed in speakers, printers, and so on. However, users have reported many interoperability issues between DLNA devices. Developers typically identify the causes by analyzing the packet exchange information between devices. This approach, however, requires additional effort to filter the relevant packets and to reconstruct the packet exchange history and protocol flow, which increases development time. In this paper, we demonstrate a technique to automatically analyze and visualize the packet exchange history. We modified a router firmware to capture and store packets exchanged between DLNA devices, and then analyzed and visualized the stored packet exchange history for developers. We believe that a visualized packet exchange history can help developers test the interoperability between DLNA devices with less effort, and ultimately improve their productivity.
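
As a rough illustration of the kind of analysis the paper automates, the sketch below filters DLNA discovery (SSDP) traffic from a packet capture and prints a per-device-pair exchange timeline. It assumes the scapy package and a hypothetical capture.pcap taken on the home router; it is a minimal stand-in, not the authors' tool.

```python
from collections import defaultdict
from scapy.all import rdpcap, IP, UDP

packets = rdpcap("capture.pcap")  # hypothetical capture from the router
history = defaultdict(list)       # (src, dst) -> ordered message summaries

for pkt in packets:
    # SSDP, the UPnP/DLNA discovery protocol, runs over UDP port 1900.
    if IP in pkt and UDP in pkt and 1900 in (pkt[UDP].sport, pkt[UDP].dport):
        payload = bytes(pkt[UDP].payload)
        request_line = payload.split(b"\r\n", 1)[0].decode(errors="replace")
        history[(pkt[IP].src, pkt[IP].dst)].append((float(pkt.time), request_line))

# A textual timeline per device pair; a real tool would render a sequence diagram.
for (src, dst), events in sorted(history.items()):
    print(f"{src} -> {dst}")
    for t, line in events:
        print(f"  {t:.3f}s  {line}")
```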


Automatic severity classification of dysarthria using voice quality, prosody, and pronunciation features (음질, 운율, 발음 특징을 이용한 마비말장애 중증도 자동 분류)

  • Yeo, Eun Jung; Kim, Sunhee; Chung, Minhwa
    • Phonetics and Speech Sciences / v.13 no.2 / pp.57-66 / 2021
  • This study focuses on the automatic severity classification of dysarthric speakers based on speech intelligibility. Speech intelligibility is a complex measure affected by features from multiple speech dimensions. However, most previous studies are restricted to features from a single speech dimension. To effectively capture the characteristics of the speech disorder, we extracted features from multiple speech dimensions: voice quality, prosody, and pronunciation. Voice quality consists of jitter, shimmer, Harmonics-to-Noise Ratio (HNR), number of voice breaks, and degree of voice breaks. Prosody includes speech rate (total duration, speech duration, speaking rate, articulation rate), pitch (F0 mean/std/min/max/median/25th quartile/75th quartile), and rhythm (%V, deltas, Varcos, rPVIs, nPVIs). Pronunciation contains Percentage of Correct Phonemes (Percentage of Correct Consonants/Vowels/Total phonemes) and degree of vowel distortion (Vowel Space Area, Formant Centralization Ratio, Vowel Articulatory Index, F2-Ratio). Experiments were conducted using various feature combinations. The experimental results indicate that using features from all three speech dimensions gives the best result, an F1-score of 80.15, compared to using features from just one or two speech dimensions. This implies that voice quality, prosody, and pronunciation features should all be considered in the automatic severity classification of dysarthria.
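
As a concrete illustration, the voice-quality features listed above (jitter, shimmer, HNR) can be extracted with the Praat-based parselmouth package. This is a minimal sketch under assumed settings (a 75-500 Hz pitch range and a hypothetical utterance.wav), not the paper's exact pipeline.

```python
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("utterance.wav")  # hypothetical input file
point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)

# Local jitter and shimmer with Praat's standard default parameters.
jitter = call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
shimmer = call([snd, point_process], "Get shimmer (local)",
               0, 0, 0.0001, 0.02, 1.3, 1.6)

# Mean harmonics-to-noise ratio over the whole utterance, in dB.
harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
hnr = call(harmonicity, "Get mean", 0, 0)

print(f"jitter={jitter:.4f}  shimmer={shimmer:.4f}  HNR={hnr:.2f} dB")
```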

A study on the predictability of acoustic power distribution of English speech for English academic achievement in a Science Academy (과학영재학교 재학생 영어발화 주파수 대역별 음향 에너지 분포의 영어 성취도 예측성 연구)

  • Park, Soon; Ahn, Hyunkee
    • Phonetics and Speech Sciences / v.14 no.3 / pp.41-49 / 2022
  • The average acoustic power distribution of American English speakers was statistically compared with the English-speaking patterns of gifted students at a Science Academy in Korea. By analyzing speech recordings whose total duration is much longer than in previous studies, this research identified the degree of acoustic proximity between the two groups and its predictability of the English academic achievement of gifted high school students. Long-term spectral acoustic power distribution vectors were obtained for 2,048 center frequencies in the range of 20 Hz to 20,000 Hz by applying a long-term average speech spectrum (LTASS) MATLAB code. Three more variables were statistically compared to discover additional indices that can predict future English academic achievement: receptive vocabulary size test scores, cumulative vocabulary scores from English formative assessments, and English Speaking Proficiency Test scores. Linear regression and correlational analyses of the four variables showed that the receptive vocabulary size test and the low-frequency vocabulary formative assessments, which require both lexical and domain-specific science background knowledge, are relatively more significant predictors of the gifted students' academic achievement than basic suprasegmental-level English fluency.
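
For illustration, a long-term average spectrum of the kind described can be approximated in a few lines. The sketch below is a Python stand-in for the MATLAB code, with an assumed speech.wav and linearly spaced center frequencies; it averages the power spectral density over a long recording.

```python
import numpy as np
import soundfile as sf
from scipy.signal import welch

x, fs = sf.read("speech.wav")     # long speech recording; fs >= 40 kHz assumed
if x.ndim > 1:                    # mix down to mono if needed
    x = x.mean(axis=1)

# Averaged power spectral density over the whole recording (Welch's method).
f, psd = welch(x, fs=fs, nperseg=4096)

# Keep the 20 Hz - 20 kHz band, resampled onto 2,048 center frequencies.
centers = np.linspace(20, 20_000, 2048)
ltass = np.interp(centers, f, 10 * np.log10(psd + 1e-12))  # dB scale
print(ltass.shape)  # (2048,) acoustic power distribution vector per speaker
```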

Developing a New Algorithm for Conversational Agent to Detect Recognition Error and Neologism Meaning: Utilizing Korean Syllable-based Word Similarity (대화형 에이전트 인식오류 및 신조어 탐지를 위한 알고리즘 개발: 한글 음절 분리 기반의 단어 유사도 활용)

  • Jung-Won Lee; Il Im
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.267-286 / 2023
  • Conversational agents such as AI speakers use voice conversation for human-computer interaction, and voice recognition errors often occur in conversational situations. Recognition errors in user utterance records can be categorized into two types. The first is misrecognition errors, where the agent fails to recognize the user's speech entirely. The second is misinterpretation errors, where the user's speech is recognized and a service is provided, but the interpretation differs from the user's intention. Of the two, misinterpretation errors require separate error detection because they are recorded as successful service interactions. In this study, various text separation methods were applied to detect misinterpretation. For each text separation method, the similarity of consecutive speech pairs was computed using word embedding and document embedding techniques, which convert words and documents into vectors. This approach goes beyond simple word-based similarity calculation and explores a new method for detecting misinterpretation errors. Real user utterance records were used to train and develop a detection model based on patterns of misinterpretation error causes. The results revealed that the most significant detection performance was obtained through initial consonant extraction for misinterpretation errors caused by the use of unregistered neologisms. Comparison with other separation methods revealed different error types. This study has two main implications. First, for misinterpretation errors that are difficult to detect because they are not recorded as failures, the study proposed diverse text separation methods and found a novel method that improved performance remarkably. Second, if applied to conversational agents or voice recognition services that require neologism detection, patterns of errors occurring from the voice recognition stage onward can be specified. The study proposed and verified that, even for interactions not categorized as errors, services can be provided according to the user's desired results.
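
The syllable decomposition the study builds on is straightforward in code: each precomposed Hangul syllable encodes its initial consonant (choseong) arithmetically. The sketch below extracts initial consonants and scores word similarity; the difflib ratio is an illustrative stand-in for the paper's embedding-based similarity, and the example words are placeholders.

```python
from difflib import SequenceMatcher

# The 19 initial consonants (choseong) in Unicode order.
CHOSEONG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")

def initial_consonants(word: str) -> str:
    """Extract the initial consonant of each precomposed Hangul syllable."""
    out = []
    for ch in word:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:                  # Hangul syllable block
            out.append(CHOSEONG[code // 588])  # 588 = 21 medials * 28 finals
        else:
            out.append(ch)                     # pass non-Hangul through
    return "".join(out)

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, initial_consonants(a), initial_consonants(b)).ratio()

print(initial_consonants("신조어"))    # -> ㅅㅈㅇ
print(similarity("신조어", "신주어"))  # 1.0: identical initial consonants
```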

The role of voice onset time (VOT) and post-stop fundamental frequency (F0) in the perception of Tohoku Japanese stops (도호쿠 일본어의 폐쇄음 지각에 있어서 voice onset time(VOT)과 후속모음 fundamental frequency(F0)의 역할)

  • Hi-Gyung Byun
    • Phonetics and Speech Sciences / v.15 no.1 / pp.35-45 / 2023
  • Tohoku Japanese is known to have voiced stops without pre-voicing in word-initial position, whereas traditional or conservative Japanese has voiced stops with pre-voicing in the same position. One problem with this devoicing of voiced stops is that it affects the distinction between voiced and voiceless stops because their voice onset time (VOT) values overlap. Previous studies have confirmed that Tohoku speakers use post-stop fundamental frequency (F0) as an acoustic cue along with VOT to avoid overlap. However, the role of post-stop F0 as a perceptual cue in this region has barely been investigated. Therefore, this study explored the role of post-stop F0 in stop voicing perception along with VOT. Several perception tests were conducted using resynthesized stimuli, which were manipulated along a VOT continuum orthogonal to an F0 continuum. The results showed no significant regional difference (Tohoku vs. Chubu) for nonsense words (/ta-da/). However, for meaningful words (/pari/ 'Paris' vs. /bari/ 'Bali,' /piza/ 'pizza' vs. /biza/ 'visa'), a significant word effect was found, and it was confirmed that some listeners utilized the post-stop F0 more consistently and steadily than others. Based on these results, we discuss innovative listeners who may lead the change in the perception of stop voicing.
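
For illustration, identification responses from such a perception test are typically modeled with a logistic (psychometric) function of the two cues. The sketch below, with invented placeholder responses, shows how a listener's relative weighting of VOT and post-stop F0 could be estimated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row is one stimulus: (VOT in ms, post-stop F0 in Hz).
# Labels: 1 = identified as voiced, 0 = voiceless. Values are invented.
X = np.array([[-30, 110], [-10, 115], [0, 120], [10, 140], [25, 150], [40, 155]])
y = np.array([1, 1, 1, 0, 0, 0])

model = LogisticRegression().fit(X, y)

# The relative magnitude of the coefficients indicates how strongly this
# listener weights VOT versus post-stop F0 as a voicing cue.
print(dict(zip(["VOT", "F0"], model.coef_[0])))
```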

Personalized Speech Classification Scheme for the Smart Speaker Accessibility Improvement of the Speech-Impaired people (언어장애인의 스마트스피커 접근성 향상을 위한 개인화된 음성 분류 기법)

  • SeungKwon Lee; U-Jin Choe; Gwangil Jeon
    • Smart Media Journal / v.11 no.11 / pp.17-24 / 2022
  • With the spread of smart speakers based on voice recognition and deep learning technology, not only non-disabled people but also blind or physically handicapped people can easily control home appliances such as lights and TVs by voice through linked home network services, which has greatly improved their quality of life. For speech-impaired people, however, the useful services of a smart speaker are out of reach because articulation or speech disorders make their pronunciation inaccurate. In this paper, we propose a personalized voice classification technique that lets speech-impaired people use some of the functions a smart speaker provides. The goal is to raise the recognition rate and accuracy for sentences spoken by speech-impaired people even with a small amount of data and a short learning time, so that the smart speaker's services can actually be used. We fine-tuned a ResNet18 model with data augmentation and the one-cycle learning-rate optimization technique. In an experiment in which each of 30 smart speaker commands was recorded 10 times and the model was trained within 3 minutes, the speech classification recognition rate was about 95.2%.
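
A minimal sketch of the training setup described, fine-tuning ResNet18 with a one-cycle learning-rate schedule in PyTorch, is shown below. The spectrogram inputs, batch size, and learning-rate values are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1")        # start from pretrained weights
model.fc = nn.Linear(model.fc.in_features, 30)   # 30 smart-speaker commands

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
epochs, steps_per_epoch = 10, 20
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, epochs=epochs, steps_per_epoch=steps_per_epoch)
criterion = nn.CrossEntropyLoss()

for epoch in range(epochs):
    for _ in range(steps_per_epoch):
        # Stand-in batch: spectrograms as 3-channel images, command labels.
        inputs = torch.randn(8, 3, 224, 224)
        labels = torch.randint(0, 30, (8,))
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
        scheduler.step()  # one-cycle schedule steps once per batch
```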

Automatic detection and severity prediction of chronic kidney disease using machine learning classifiers (머신러닝 분류기를 사용한 만성콩팥병 자동 진단 및 중증도 예측 연구)

  • Jihyun Mun; Sunhee Kim; Myeong Ju Kim; Jiwon Ryu; Sejoong Kim; Minhwa Chung
    • Phonetics and Speech Sciences / v.14 no.4 / pp.45-56 / 2022
  • This paper proposes an optimal methodology for automatically diagnosing chronic kidney disease (CKD) and predicting its severity from patients' utterances. In patients with CKD, the voice changes due to the weakening of respiratory and laryngeal muscles and vocal fold edema. Previous studies have phonetically analyzed the voices of patients with CKD, but none have classified them. In this paper, the utterances of patients with CKD were classified using various utterance types (sustained vowel, sentence, general sentence), feature sets [handcrafted features, the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS), CNN-extracted features], and classifiers (SVM, XGBoost). A total of 1,523 utterances, amounting to 3 hours, 26 minutes, and 25 seconds, were used. F1-scores of 0.93 for automatic diagnosis, 0.89 for a 3-class problem, and 0.84 for a 5-class problem were achieved. The highest performance was obtained with the combination of general sentence utterances, the handcrafted feature set, and XGBoost. The results suggest that general sentence utterances, which can reflect all of a speaker's speech characteristics, together with an appropriate feature set extracted from them, are adequate for the automatic classification of CKD patients' utterances.
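
One of the feature/classifier combinations above, eGeMAPS functionals with XGBoost, can be sketched with the opensmile and xgboost packages. The file names and labels below are placeholders, and this binary-diagnosis toy is not the paper's full pipeline.

```python
import numpy as np
import opensmile
from xgboost import XGBClassifier

# 88 eGeMAPS functionals per utterance, extracted by openSMILE.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

wavs = ["patient_01.wav", "patient_02.wav", "control_01.wav"]  # placeholders
labels = np.array([1, 1, 0])  # 1 = CKD, 0 = healthy (binary diagnosis task)

features = np.vstack([smile.process_file(w).to_numpy() for w in wavs])
clf = XGBClassifier(n_estimators=200).fit(features, labels)
print(clf.predict(features))
```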

A study on English vowel duration with respect to the various characteristics of the following consonant (후행하는 자음의 여러 특성에 따른 영어 모음 길이에 관한 연구)

  • Yoo, Hyunbin; Rhee, Seok-Chae
    • Phonetics and Speech Sciences / v.14 no.1 / pp.1-11 / 2022
  • The purpose of this study is to investigate the difference in vowel duration due to the voicing of word-final consonants in English and its relation to the type of word-final consonant (stop vs. fricative), (partial) devoicing, and stop release. Additionally, this study attempts to interpret the findings from the functional view that vowels before voiced consonants are produced with a longer duration in order to enhance the salience of the voicing of word-final consonants. A recording experiment was conducted with native English speakers, measuring vowel duration, the degree of (partial) devoicing of word-final voiced consonants, and the release of word-final stops. First, the results showed that the ratio of the duration difference was not influenced by the type of word-final consonant. Second, the higher the degree of (partial) devoicing of a word-final voiced consonant, the longer the vowel duration before it, which was compatible with the prediction of the functional view. Lastly, the ratio of the duration difference was greater when word-final stops were uttered with a release than without one, which was not consistent with the functional view. These results suggest that the voicing effect cannot be sufficiently explained by its function of distinguishing the voicing of word-final consonants.
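
The core measurement, the ratio of vowel duration before voiced versus voiceless word-final consonants by consonant type, reduces to a simple aggregation. The sketch below uses invented placeholder durations purely to show the computation.

```python
import pandas as pd

# Placeholder measurements: one row per token with its measured vowel duration.
df = pd.DataFrame({
    "word":     ["bead", "beat", "peas", "peace"],
    "c_type":   ["stop", "stop", "fricative", "fricative"],
    "voiced":   [True, False, True, False],
    "vowel_ms": [182.0, 131.0, 205.0, 158.0],
})

# Mean vowel duration per (consonant type, voicing), then the voiced/voiceless ratio.
means = df.groupby(["c_type", "voiced"])["vowel_ms"].mean().unstack()
means["ratio"] = means[True] / means[False]
print(means)
```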

QRAS-based Algorithm for Omnidirectional Sound Source Determination Without Blind Spots (사각영역이 없는 전방향 음원인식을 위한 QRAS 기반의 알고리즘)

  • Kim, Youngeon; Park, Gooman
    • Journal of Broadcast Engineering / v.27 no.1 / pp.91-103 / 2022
  • Determining sound source characteristics such as volume, direction, and distance is one of the important techniques for unmanned systems like autonomous vehicles, robot systems, and AI speakers. There are multiple methods of determining the direction and distance to a sound source, e.g., using a radar, a lidar, an ultrasonic wave, or an RF signal together with sound. These methods require the transmission of signals and cannot accurately identify sound sources generated in regions obstructed by obstacles. In this paper, we implement and evaluate a method that detects sound in the audible frequency band and recognizes the volume, direction, and distance of a sound source generated in the surroundings, including the invisible region. A cross-shaped sound source recognition algorithm, which is commonly used to identify a sound source, can measure the volume and locate the direction of the sound source, but it suffers from blind spots. A further serious limitation of this type of algorithm is its inability to determine the distance to the sound source. To overcome the limitations of the existing method, we propose a QRAS-based algorithm that uses a rectangular-shaped array. This method can determine the volume, direction, and distance to the sound source, which is an improvement over the cross-shaped algorithm. The QRAS-based algorithm for OSSD uses six AITDs derived from four microphones deployed in a rectangular configuration. It solves the existing problems of cross-shaped algorithms, such as blind spots, and can determine the distance to the sound source. Experiments have demonstrated that the proposed QRAS-based algorithm for OSSD can reliably determine the sound volume along with the direction and distance to the sound source while avoiding blind spots.
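
The building block behind such an array algorithm is the time difference of arrival (TDOA) for each of the six microphone pairs of a four-microphone rectangle. The sketch below estimates pairwise TDOAs by cross-correlation on synthetic signals; the sample rate and delays are invented, and it is not the paper's QRAS implementation.

```python
import numpy as np
from itertools import combinations

fs = 48_000  # sample rate in Hz, assumed

def tdoa(a: np.ndarray, b: np.ndarray) -> float:
    """Time offset (s) by which signal a lags signal b, via cross-correlation."""
    corr = np.correlate(a, b, mode="full")
    lag = np.argmax(corr) - (len(b) - 1)  # index of zero lag is len(b)-1
    return lag / fs

# Four synthetic mic signals: the same noise burst at different integer delays.
rng = np.random.default_rng(0)
burst = rng.standard_normal(1024)
delays = [0, 5, 12, 9]  # samples; chosen arbitrarily for the demo
mics = [np.concatenate([np.zeros(d), burst, np.zeros(20 - d)]) for d in delays]

# Six pairwise TDOAs; with known microphone geometry these constrain the
# source direction and, for a non-degenerate array, its distance.
for i, j in combinations(range(4), 2):
    print(f"mics {i}-{j}: {tdoa(mics[i], mics[j]) * 1e3:.3f} ms")
```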