• Title/Summary/Keyword: speech recognition

Search Result 2,040, Processing Time 0.025 seconds

Determinants of Safety and Satisfaction with In-Vehicle Voice Interaction : With a Focus of Agent Persona and UX Components (자동차 음성인식 인터랙션의 안전감과 만족도 인식 영향 요인 : 에이전트 퍼소나와 사용자 경험 속성을 중심으로)

  • Kim, Ji-hyun;Lee, Ka-hyun;Choi, Jun-ho
    • The Journal of the Korea Contents Association
    • /
    • v.18 no.8
    • /
    • pp.573-585
    • /
    • 2018
  • Services for navigation and entertainment through AI-based voice user interface devices are becoming popular in the connected car system. Given the classification of VUI agent developers as IT companies and automakers, this study explores attributes of agent persona and user experience that impact the driver's perceived safety and satisfaction. Participants of a car simulator experiment performed entertainment and navigation tasks, and evaluated the perceived safety and satisfaction. Results of regression analysis showed that credibility of the agent developer, warmth and attractiveness of agent persona, and efficiency and care of the UX dimension showed significant impact on the perceived safety. The determinants of perceived satisfaction were unity of auto-agent makers and gender as predisposing factors, distance in the agent persona, and convenience, efficiency, ease of use, and care in the UX dimension. The contributions of this study lie in the discovery of the factors required for developing conversational VUI into the autonomous driving environment.

A Comparative Performance Analysis of Spark-Based Distributed Deep-Learning Frameworks (스파크 기반 딥 러닝 분산 프레임워크 성능 비교 분석)

  • Jang, Jaehee;Park, Jaehong;Kim, Hanjoo;Yoon, Sungroh
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.5
    • /
    • pp.299-303
    • /
    • 2017
  • By piling up hidden layers in artificial neural networks, deep learning is delivering outstanding performances for high-level abstraction problems such as object/speech recognition and natural language processing. Alternatively, deep-learning users often struggle with the tremendous amounts of time and resources that are required to train deep neural networks. To alleviate this computational challenge, many approaches have been proposed in a diversity of areas. In this work, two of the existing Apache Spark-based acceleration frameworks for deep learning (SparkNet and DeepSpark) are compared and analyzed in terms of the training accuracy and the time demands. In the authors' experiments with the CIFAR-10 and CIFAR-100 benchmark datasets, SparkNet showed a more stable convergence behavior than DeepSpark; but in terms of the training accuracy, DeepSpark delivered a higher classification accuracy of approximately 15%. For some of the cases, DeepSpark also outperformed the sequential implementation running on a single machine in terms of both the accuracy and the running time.

Contextual In-Video Advertising Using Situation Information (상황 정보를 활용한 동영상 문맥 광고)

  • Yi, Bong-Jun;Woo, Hyun-Wook;Lee, Jung-Tae;Rim, Hae-Chang
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.11 no.8
    • /
    • pp.3036-3044
    • /
    • 2010
  • With the rapid growth of video data service, demand to provide advertisements or additional information with regard to a particular video scene is increasing. However, the direct use of automated visual analysis or speech recognition on videos virtually has limitations with current level of technology; the metadata of video such as title, category information, or summary does not reflect the content of continuously changing scenes. This work presents a new video contextual advertising system that serves relevant advertisements on a given scene by leveraging the scene's situation information inferred from video scripts. Experimental results show that the use of situation information extracted from scripts leads to better performance and display of more relevant advertisements to the user.

A Study on Classification of Waveforms Using Manifold Embedding Based on Commute Time (컴뮤트 타임 기반의 다양체 임베딩을 이용한 파형 신호 인식에 관한 연구)

  • Hahn, Hee-Il
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.2
    • /
    • pp.148-155
    • /
    • 2014
  • In this paper a commute time embedding is implemented by organizing patches according to the graph-based metric, and its properties are investigated via changing the number of nodes on the graph.. It is shown that manifold embedding methods generate the intrinsic geometric structures when waveforms such as speech or music instrumental sound signals are embedded on the low dimensional Euclidean space. Basically manifold embedding algorithms only project the training samples on the graph into an embedding subspace but can not generalize the learning results to test samples. They are very effective for data clustering but are not appropriate for classification or recognition. In this paper a commute time guided transform is adopted to enhance the generalization ability and its performance is analyzed by applying it to the classification of 6 kinds of music instrumental sounds.

The Prosodic Changes of Korean English Learners in Robot Assisted Learning (로봇보조언어교육을 통한 초등 영어 학습자의 운율 변화)

  • In, Jiyoung;Han, JeongHye
    • Journal of The Korean Association of Information Education
    • /
    • v.20 no.4
    • /
    • pp.323-332
    • /
    • 2016
  • A robot's recognition and diagnosis of pronunciation and its speech are the most important interactions in RALL(Robot Assisted Language Learning). This study is to verify the effectiveness of robot TTS(Text to Sound) technology in assisting Korean English language learners to acquire a native-like accent by correcting the prosodic errors they commonly make. The child English language learners' F0 range and speaking rate in the 4th grade, a prosodic variable, will be measured and analyzed for any changes in accent. We compare whether robot with the currently available TTS technology appeared to be effective for the 4th graders and 1st graders who were not under the formal English learning with native speaker from the acoustic phonetic viewpoint. Two groups by repeating TTS of RALL responded to the speaking rate rather than F0 range.

Deep Neural Network Model For Short-term Electric Peak Load Forecasting (단기 전력 부하 첨두치 예측을 위한 심층 신경회로망 모델)

  • Hwang, Heesoo
    • Journal of the Korea Convergence Society
    • /
    • v.9 no.5
    • /
    • pp.1-6
    • /
    • 2018
  • In smart grid an accurate load forecasting is crucial in planning resources, which aids in improving its operation efficiency and reducing the dynamic uncertainties of energy systems. Research in this area has included the use of shallow neural networks and other machine learning techniques to solve this problem. Recent researches in the field of computer vision and speech recognition, have shown great promise for Deep Neural Networks (DNN). To improve the performance of daily electric peak load forecasting the paper presents a new deep neural network model which has the architecture of two multi-layer neural networks being serially connected. The proposed network model is progressively pre-learned layer by layer ahead of learning the whole network. For both one day and two day ahead peak load forecasting the proposed models are trained and tested using four years of hourly load data obtained from the Korea Power Exchange (KPX).

Classification of Consonants by SOM and LVQ (SOM과 LVQ에 의한 자음의 분류)

  • Lee, Chai-Bong;Lee, Chang-Young
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.6 no.1
    • /
    • pp.34-42
    • /
    • 2011
  • In an effort to the practical realization of phonetic typewriter, we concentrate on the classification of consonants in this paper. Since many of consonants do not show periodic behavior in time domain and thus the validity for Fourier analysis of them are not convincing, vector quantization (VQ) via LBG clustering is first performed to check if the feature vectors of MFCC and LPCC are ever meaningful for consonants. Experimental results of VQ showed that it's not easy to draw a clear-cut conclusion as to the validity of Fourier analysis for consonants. For classification purpose, two kinds of neural networks are employed in our study: self organizing map (SOM) and learning vector quantization (LVQ). Results from SOM revealed that some pairs of phonemes are not resolved. Though LVQ is free from this difficulty inherently, the classification accuracy was found to be low. This suggests that, as long as consonant classification by LVQ is concerned, other types of feature vectors than MFCC should be deployed in parallel. However, the combination of MFCC/LVQ was not found to be inferior to the classification of phonemes by language-moded based approach. In all of our work, LPCC worked worse than MFCC.

A Study on Speech Recognition using DMS Model (DMS 모델을 이용한 음성인식에 관한 연구)

  • An, Tae-Ock;Byun, Yong-Kyu
    • The Journal of the Acoustical Society of Korea
    • /
    • v.13 no.2E
    • /
    • pp.41-50
    • /
    • 1994
  • This paper proposes a DMS(Dynamic Multi-Section) model based on the information of the similar features in word pattern. This model represents each word as a time series of several sections and each section implies duration time information and typical feature vectors. The procedure to make a model in the word pattern is that typical feature vector and duration time information are reflected in the distance, when matching between word pattern and model is repeated. As the result of it, the accumulated distance by matching is to be minimized.

  • PDF

A Study on the Perception of the Right to Vote of Persons with Developmental Disabilities in College Students: Based on the Experience of Disability Related Education (발달장애인 선거권에 대한 대학생의 인식 연구: 장애관련 교육경험 유무를 중심으로)

  • Lee, Woo-Jin;Kim, Tae-Gang
    • Journal of Digital Convergence
    • /
    • v.16 no.2
    • /
    • pp.65-71
    • /
    • 2018
  • The purpose of this study is to examine the perception of the right to vote of persons with developmental disabilities in college students. College students attending in A University and B University in Gwangju Metropolitan City were selected using convenience sampling and 370 samples were finally analyzed. The results were as follows. First, the subjects who took a disability course had high perception of the right to vote of persons with developmental disabilities. Second, it was discovered that the more times people participated in the volunteer work related to disabilities, the more recognition they had on the voting rights of people with developmental disabilities. Third, the subjects who responded for the need for the political rights of persons with developmental disabilities had more positive perception of the right to vote of persons with developmental disabilities than those who did not responded. Based on the findings, it was suggested that methods should be investigated to establish positive attitude and perception of the right to vote of persons with developmental disabilities in persons without developmental disabilities including college students.

Speech Recognition in the Pager System displaying Defined Sentences (문자출력 무선호출기를 위한 음성인식 시스템)

  • Park, Gyu-Bong;Park, Jeon-Gue;Suh, Sang-Weon;Hwang, Doo-Sung;Kim, Hyun-Bin;Han, Mun-Sung
    • Annual Conference on Human and Language Technology
    • /
    • 1996.10a
    • /
    • pp.158-162
    • /
    • 1996
  • 본 논문에서는 문자출력이 가능한 무선호출기에 음성인식 기술을 접목한, 특성화된 한 음성인식 시스템에 대하여 설명하고자 한다. 시스템 동작 과정은, 일단 호출자가 음성인식 서버와 접속하게 되면 서버는 호출자의 자연스런 입력음성을 인식, 그 결과를 문장 형태로 피호출자의 호출기 단말기에 출력시키는 방식으로 되어 있다. 본 시스템에서는 통계적 음성인식 기법을 도입하여, 각 단어를 연속 HMM으로 모델링하였다. 가우시안 혼합 확률밀도함수를 사용하는 각 모델은 전통적인 HMM 학습법들 중의 하나인 Baum-Welch 알고리듬에 의해 학습되고 인식시에는 이들에 비터비 빔 탐색을 적용하여 최선의 결과를 얻도록 한다. MFCC와 파워를 혼용한 26 차원 특징벡터를 각 프레임으로부터 추출하여, 최종적으로, 83 개의 도메인 어휘들 및 무음과 같은 특수어휘들에 대한 모델링을 완성하게 된다. 여기에 구문론적 기능과 의미론적 기능을 함께 수행하는 FSN을 결합시켜 자연발화음성에 대한 연속음성인식 시스템을 구성한다. 본문에서는 이상의 사항들 외에도 음성 데이터베이스, 레이블링 등과 갈이 시스템 성능과 직결되는 시스템의 외적 요소들에 대해 고찰하고, 시스템에 구현되어 있는 다양한 특성들에 대해 밝히며, 실험 결과 및 앞으로의 개선 방향 등에 대해 논의하기로 한다.

  • PDF