• Title/Summary/Keyword: Speech recognition

Effects of Articulator-distance and Tense in Phonological Awareness in Korean: The case of Korean Infants and Toddlers (한국어 음운인식에서의 조음거리와 긴장성 자질의 특성 연구: 영·유아를 중심으로)

  • Kim, Choong-Myung
    • The Journal of the Korea Contents Association / v.15 no.8 / pp.424-433 / 2015
  • This study investigated differences in auditory preference among Korean infants and toddlers in a discrimination task using minimal pairs of syllables that differ in onset but share the same nucleus, organized by articulator distance. We found main effects of articulator distance and age, but no effect of phonation type, particularly tenseness. The former result is in line with previous studies reporting that the order of consonant acquisition follows place of articulation, suggesting that more sensitive responses to contiguous but different phonemes may lead to earlier acquisition of speech sounds at the same place of articulation. Specifically, bilabial sounds are followed by alveolar and then palatal sounds. The latter result also showed that tense consonants had a high recognition rate beside lax consonants, depending on age and sex.

Determinants of Safety and Satisfaction with In-Vehicle Voice Interaction : With a Focus of Agent Persona and UX Components (자동차 음성인식 인터랙션의 안전감과 만족도 인식 영향 요인 : 에이전트 퍼소나와 사용자 경험 속성을 중심으로)

  • Kim, Ji-hyun;Lee, Ka-hyun;Choi, Jun-ho
    • The Journal of the Korea Contents Association / v.18 no.8 / pp.573-585 / 2018
  • Services for navigation and entertainment through AI-based voice user interface (VUI) devices are becoming popular in connected car systems. Given that VUI agent developers can be classified as either IT companies or automakers, this study explores the attributes of agent persona and user experience that affect the driver's perceived safety and satisfaction. Participants in a car simulator experiment performed entertainment and navigation tasks and evaluated their perceived safety and satisfaction. Regression analysis showed that the credibility of the agent developer, the warmth and attractiveness of the agent persona, and efficiency and care in the UX dimension had a significant impact on perceived safety. The determinants of perceived satisfaction were the unity of auto-agent makers and gender as predisposing factors, distance in the agent persona, and convenience, efficiency, ease of use, and care in the UX dimension. The contribution of this study lies in identifying the factors required for developing conversational VUIs for the autonomous driving environment.

A Comparative Performance Analysis of Spark-Based Distributed Deep-Learning Frameworks (스파크 기반 딥 러닝 분산 프레임워크 성능 비교 분석)

  • Jang, Jaehee;Park, Jaehong;Kim, Hanjoo;Yoon, Sungroh
    • KIISE Transactions on Computing Practices / v.23 no.5 / pp.299-303 / 2017
  • By stacking hidden layers in artificial neural networks, deep learning delivers outstanding performance on high-level abstraction problems such as object and speech recognition and natural language processing. However, deep-learning users often struggle with the tremendous amounts of time and resources required to train deep neural networks. To alleviate this computational challenge, many approaches have been proposed in a variety of areas. In this work, two existing Apache Spark-based acceleration frameworks for deep learning (SparkNet and DeepSpark) are compared and analyzed in terms of training accuracy and training time. In the authors' experiments with the CIFAR-10 and CIFAR-100 benchmark datasets, SparkNet showed more stable convergence behavior than DeepSpark, but DeepSpark delivered a classification accuracy approximately 15% higher. In some cases, DeepSpark also outperformed the sequential implementation running on a single machine in both accuracy and running time.

Contextual In-Video Advertising Using Situation Information (상황 정보를 활용한 동영상 문맥 광고)

  • Yi, Bong-Jun;Woo, Hyun-Wook;Lee, Jung-Tae;Rim, Hae-Chang
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.8 / pp.3036-3044 / 2010
  • With the rapid growth of video data services, demand is increasing for advertisements or additional information tied to a particular video scene. However, the direct use of automated visual analysis or speech recognition on videos has practical limitations at the current level of technology, and video metadata such as the title, category information, or summary does not reflect the content of continuously changing scenes. This work presents a new video contextual advertising system that serves relevant advertisements for a given scene by leveraging the scene's situation information inferred from video scripts. Experimental results show that using situation information extracted from scripts leads to better performance and to the display of more relevant advertisements to the user.

A Study on Classification of Waveforms Using Manifold Embedding Based on Commute Time (컴뮤트 타임 기반의 다양체 임베딩을 이용한 파형 신호 인식에 관한 연구)

  • Hahn, Hee-Il
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.2 / pp.148-155 / 2014
  • In this paper, a commute time embedding is implemented by organizing patches according to a graph-based metric, and its properties are investigated by varying the number of nodes in the graph. It is shown that manifold embedding methods recover the intrinsic geometric structure when waveforms such as speech or musical instrument sounds are embedded in a low-dimensional Euclidean space. Manifold embedding algorithms basically project only the training samples on the graph into an embedding subspace and cannot generalize the learned results to test samples. They are therefore very effective for data clustering but not appropriate for classification or recognition. In this paper, a commute-time-guided transform is adopted to enhance the generalization ability, and its performance is analyzed by applying it to the classification of six kinds of musical instrument sounds.

The Prosodic Changes of Korean English Learners in Robot Assisted Learning (로봇보조언어교육을 통한 초등 영어 학습자의 운율 변화)

  • In, Jiyoung;Han, JeongHye
    • Journal of The Korean Association of Information Education / v.20 no.4 / pp.323-332 / 2016
  • A robot's recognition and diagnosis of pronunciation, and its own speech, are the most important interactions in RALL (Robot Assisted Language Learning). This study aims to verify the effectiveness of robot TTS (text-to-speech) technology in helping Korean learners of English acquire a native-like accent by correcting the prosodic errors they commonly make. The F0 range and speaking rate of 4th-grade child learners of English, both prosodic variables, are measured and analyzed for changes in accent. From an acoustic-phonetic viewpoint, we compare whether a robot with currently available TTS technology is effective for 4th graders and for 1st graders who had not received formal English instruction with a native speaker. Both groups, after repeating the TTS output in RALL, responded in speaking rate rather than in F0 range.

Deep Neural Network Model For Short-term Electric Peak Load Forecasting (단기 전력 부하 첨두치 예측을 위한 심층 신경회로망 모델)

  • Hwang, Heesoo
    • Journal of the Korea Convergence Society / v.9 no.5 / pp.1-6 / 2018
  • In a smart grid, accurate load forecasting is crucial for planning resources, which helps improve operational efficiency and reduce the dynamic uncertainties of energy systems. Research in this area has included the use of shallow neural networks and other machine learning techniques. Recent research in computer vision and speech recognition has shown great promise for deep neural networks (DNNs). To improve the performance of daily electric peak load forecasting, the paper presents a new deep neural network model whose architecture consists of two multi-layer neural networks connected in series. The proposed network is progressively pre-trained layer by layer before the whole network is trained. For both one-day-ahead and two-day-ahead peak load forecasting, the proposed models are trained and tested using four years of hourly load data obtained from the Korea Power Exchange (KPX).

Classification of Consonants by SOM and LVQ (SOM과 LVQ에 의한 자음의 분류)

  • Lee, Chai-Bong;Lee, Chang-Young
    • The Journal of the Korea institute of electronic communication sciences / v.6 no.1 / pp.34-42 / 2011
  • As a step toward the practical realization of a phonetic typewriter, this paper concentrates on the classification of consonants. Since many consonants do not show periodic behavior in the time domain, and the validity of Fourier analysis for them is therefore not convincing, vector quantization (VQ) via LBG clustering is first performed to check whether MFCC and LPCC feature vectors are meaningful for consonants at all. The VQ experiments showed that it is not easy to draw a clear-cut conclusion about the validity of Fourier analysis for consonants. For classification, two kinds of neural networks are employed: the self-organizing map (SOM) and learning vector quantization (LVQ). Results from SOM revealed that some pairs of phonemes are not resolved. Although LVQ is inherently free from this difficulty, its classification accuracy was found to be low. This suggests that, as far as consonant classification by LVQ is concerned, feature vectors other than MFCC should be deployed in parallel. However, the MFCC/LVQ combination was not found to be inferior to phoneme classification by a language-model-based approach. In all of our work, LPCC performed worse than MFCC.
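
The LVQ step mentioned in this abstract can be illustrated with a minimal sketch of the basic LVQ1 update rule. This is not the authors' implementation: the function names (`nearest`, `train_lvq1`, `classify`), the learning-rate schedule, and the toy 2-D vectors standing in for MFCC features are all assumptions made for illustration.

```python
import random

def nearest(codebook, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i][0], x)))

def train_lvq1(samples, labels, codebook, lr=0.1, epochs=20, seed=0):
    """LVQ1: move the winning prototype toward a sample of its own class,
    and away from a sample of a different class.

    codebook is a list of (vector, label) pairs; a new list is returned.
    """
    rng = random.Random(seed)
    book = [(list(vec), lab) for vec, lab in codebook]
    order = list(range(len(samples)))
    for epoch in range(epochs):
        alpha = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate (assumed schedule)
        rng.shuffle(order)
        for idx in order:
            x, y = samples[idx], labels[idx]
            w = nearest(book, x)
            vec, lab = book[w]
            sign = 1.0 if lab == y else -1.0  # attract on match, repel on mismatch
            book[w] = ([c + sign * alpha * (v - c) for c, v in zip(vec, x)], lab)
    return book

def classify(codebook, x):
    """Label of the nearest prototype."""
    return codebook[nearest(codebook, x)][1]

# Toy 2-D vectors standing in for MFCC features of two hypothetical consonant classes.
samples = [[0.0, 0.0], [0.2, -0.1], [-0.3, 0.1], [0.1, 0.3],
           [3.0, 3.0], [3.2, 2.9], [2.8, 3.1], [3.1, 3.3]]
labels = ["s", "s", "s", "s", "k", "k", "k", "k"]
initial = [([1.0, 1.0], "s"), ([2.0, 2.0], "k")]

trained = train_lvq1(samples, labels, initial)
print(classify(trained, [0.1, 0.0]), classify(trained, [2.9, 3.0]))
```

Because each prototype carries a class label, every input is assigned to exactly one class, which is one way to read the abstract's remark that LVQ does not suffer from SOM's unresolved phoneme pairs.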

A Study on Speech Recognition using DMS Model (DMS 모델을 이용한 음성인식에 관한 연구)

  • An, Tae-Ock;Byun, Yong-Kyu
    • The Journal of the Acoustical Society of Korea / v.13 no.2E / pp.41-50 / 1994
  • This paper proposes a DMS (Dynamic Multi-Section) model based on information about similar features in a word pattern. The model represents each word as a time series of several sections, and each section carries duration information and typical feature vectors. The model is built by repeatedly matching the word pattern against the model, with the typical feature vectors and the duration information reflected in the distance. As a result, the accumulated matching distance is minimized.

A Study on the Perception of the Right to Vote of Persons with Developmental Disabilities in College Students: Based on the Experience of Disability Related Education (발달장애인 선거권에 대한 대학생의 인식 연구: 장애관련 교육경험 유무를 중심으로)

  • Lee, Woo-Jin;Kim, Tae-Gang
    • Journal of Digital Convergence / v.16 no.2 / pp.65-71 / 2018
  • The purpose of this study is to examine college students' perception of the right to vote of persons with developmental disabilities. College students attending A University and B University in Gwangju Metropolitan City were selected by convenience sampling, and 370 samples were finally analyzed. The results were as follows. First, subjects who had taken a disability course had a high perception of the right to vote of persons with developmental disabilities. Second, the more often people had participated in disability-related volunteer work, the greater their recognition of the voting rights of persons with developmental disabilities. Third, subjects who responded that persons with developmental disabilities need political rights had a more positive perception of their right to vote than those who did not respond so. Based on these findings, it is suggested that methods be investigated for establishing positive attitudes toward and perceptions of the right to vote of persons with developmental disabilities among persons without developmental disabilities, including college students.