• Title/Summary/Keyword: voice recognition

A Comparative Study of the Speech Signal Parameters for the Consonants of Pyongyang and Seoul Dialects - Focused on "ㅅ/ㅆ" (평양 지역어와 서울 지역어의 자음에 대한 음성신호 파라미터들의 비교 연구 - "ㅅ/ ㅆ"을 중심으로)

  • So, Shin-Ae;Lee, Kang-Hee;You, Kwang-Bock;Lim, Ha-Young
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.8 no.6 / pp.927-937 / 2018
  • This paper compares the consonants of the Pyongyang and Seoul dialects of Korean from a signal-processing perspective, which can serve as a basis for engineering applications. To date, most speech-signal studies have focused on vowels, which play an important role in language evolution. In any language, however, consonants outnumber vowels, so research on consonants is also important. Building on an earlier study of Pyongyang-dialect vowels conducted with phonological and experimental phonetic methods, this paper analyzes consonants with an engineering approach. The alveolar consonants, which show many differences in phonetic value between the Pyongyang and Seoul dialects, were used as the experimental data. The major parameters of speech-signal analysis (formant frequency, pitch, and spectrogram) were measured, and the phonetic values of the two dialects were compared for the Korean syllables /시/ and /씨/. The results can serve as a basis for future voice recognition and voice synthesis.

Age classification of emergency callers based on behavioral speech utterance characteristics (발화행태 특징을 활용한 응급상황 신고자 연령분류)

  • Son, Guiyoung;Kwon, Soonil;Baik, Sungwook
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.6 / pp.96-105 / 2017
  • In this paper, we investigate speaker age classification by analyzing voice calls to an emergency center. We classified callers as adult or elderly using behavioral speech utterance features and a support vector machine (SVM), a machine learning classifier. Through analysis of the emergency center's call data, we selected two behavioral features: silent pause and turn-taking latency. The criteria for age classification were first selected based on these features and were then shown to be statistically significant (p < 0.05). We analyzed 200 samples (adult: 100, elderly: 100) using 5-fold cross-validation with the SVM classifier. Using both features, we achieved 70% accuracy, which is higher than the accuracy obtained with a single feature. These results suggest behavioral speech utterance features as a new method for age classification; in future work, we will combine acoustic information (MFCC) with these features on real voice data. The work should also contribute to the development of emergency-situation judgment systems that rely on age classification.
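
The evaluation protocol described above (a two-feature SVM scored with 5-fold cross-validation on 100 adult and 100 elderly samples) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic stand-ins, the feature values and their scale are invented, the linear SVM is trained with a simple hinge-loss gradient method chosen only to keep the example dependency-free, and all function names are hypothetical. The abstract does not say which kernel or library was used.

```python
import random

def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=100, seed=0):
    """Train a linear SVM by stochastic gradient descent on the hinge loss.
    Labels must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # L2 shrinkage on the weights (the bias is not regularized).
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:  # sample violates the margin: hinge-loss step
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def stratified_folds(y, k=5, seed=0):
    """Split sample indices into k folds with equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def cross_validate(X, y, k=5):
    """Return the mean held-out accuracy over k stratified folds."""
    accuracies = []
    for held_out in stratified_folds(y, k):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k

# Synthetic stand-in data: [silent pause, turn-taking latency] per call.
# The values are invented for illustration and are NOT from the paper.
rng = random.Random(42)
X = [[rng.gauss(0.4, 0.1), rng.gauss(0.3, 0.1)] for _ in range(100)]   # "adult"
X += [[rng.gauss(0.9, 0.1), rng.gauss(0.8, 0.1)] for _ in range(100)]  # "elderly"
y = [-1] * 100 + [1] * 100
mean_accuracy = cross_validate(X, y)
print(f"mean 5-fold accuracy on synthetic data: {mean_accuracy:.2f}")
```

Because the synthetic classes are well separated, the accuracy printed here says nothing about the 70% figure reported for real emergency calls; only the protocol (stratified folds, train on four, test on one) carries over.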

Speech/Music Signal Classification Based on Spectrum Flux and MFCC For Audio Coder (오디오 부호화기를 위한 스펙트럼 변화 및 MFCC 기반 음성/음악 신호 분류)

  • Sangkil Lee;In-Sung Lee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.5 / pp.239-246 / 2023
  • In this paper, we propose an open-loop algorithm that classifies speech and music signals for an audio coder using spectral flux and Mel-frequency cepstral coefficient (MFCC) parameters. To increase responsiveness, MFCCs were used as short-term feature parameters; to improve accuracy, spectral flux was used as a long-term feature parameter. The overall speech/music classification decision is made by combining the short-term and long-term classification methods. A Gaussian mixture model (GMM) was used for pattern recognition, and the optimal GMM parameters were estimated with the expectation-maximization (EM) algorithm. The proposed combined long-term and short-term method showed an average classification error rate of 1.5% on various audio sources, improving the error rate by 0.9% compared with the short-term method alone and by 0.6% compared with the long-term method alone. Compared with the Unified Speech and Audio Coding (USAC) audio classification method, the proposed method improved the classification error rate by 9.1% for percussive music signals with attacks and by 5.8% for speech signals.
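
As a rough illustration of the spectral flux feature mentioned above, the sketch below computes one common definition of it: the sum of squared differences between the magnitude spectra of successive frames. The abstract does not give the authors' exact formula, frame length, or long-term aggregation, so the definition, the 64-sample frames, the test tones, and the function names are all assumptions made for this example. The GMM stage is not shown.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitude of a naive DFT for bins 0..N/2 (fine for short demo frames)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def spectral_flux(frames):
    """Squared change in magnitude spectrum between each pair of successive frames."""
    spectra = [magnitude_spectrum(f) for f in frames]
    return [
        sum((cur_k - prev_k) ** 2 for cur_k, prev_k in zip(cur, prev))
        for prev, cur in zip(spectra, spectra[1:])
    ]

def tone(freq_bin, n=64, start=0):
    """A sine wave with an integer number of cycles per n-sample frame."""
    return [math.sin(2 * math.pi * freq_bin * (start + t) / n) for t in range(n)]

# A sustained tone: every frame has the same spectrum, so the flux is near zero.
steady = [tone(4, start=i * 64) for i in range(4)]
# A tone that jumps to a new frequency each frame: the flux is large.
changing = [tone(b) for b in (4, 9, 15, 22)]

steady_flux = spectral_flux(steady)
changing_flux = spectral_flux(changing)
print("steady:", steady_flux)
print("changing:", changing_flux)
```

A stationary signal yields almost no flux while a rapidly changing spectrum yields a large value, which is why flux statistics over a longer window can help separate signal classes. How the authors aggregate it is not stated in the abstract.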

Speech Recognition Using Linear Discriminant Analysis and Common Vector Extraction (선형 판별분석과 공통벡터 추출방법을 이용한 음성인식)

  • 남명우;노승용
    • The Journal of the Acoustical Society of Korea / v.20 no.4 / pp.35-41 / 2001
  • This paper describes the use of linear discriminant analysis and common vector extraction for speech recognition. A voice signal carries the psychological and physiological properties of the speaker as well as dialect differences, acoustic environment effects, and phase differences. For these reasons, the same word spoken by different speakers can sound very different. This property makes it very difficult to extract the common properties of a speech class (word or phoneme). Linear algebra methods such as the KLT (Karhunen-Loeve Transformation) are generally used to extract common properties from speech signals, but this paper uses the common vector extraction method suggested by M. Bilginer et al. Their method extracts an optimized common vector from the speech signals used for training and achieves 100% recognition accuracy on that training data. Despite these characteristics, the method has some drawbacks: a large number of speech signals cannot be used for training, and the discriminant information among common vectors is not defined. This paper proposes an improved method that reduces the error rate by maximizing the discriminant information among common vectors, and adds a novel method for normalizing the size of the common vector. The results show improved performance, with recognition accuracy 2% higher than that of the conventional method.

Data Modeling for Cyber Security of IoT in Artificial Intelligence Technology (인공지능기술의 IoT 통합보안관제를 위한 데이터모델링)

  • Oh, Young-Taek;Jo, In-June
    • The Journal of the Korea Contents Association / v.21 no.12 / pp.57-65 / 2021
  • A hyper-connected intelligent information society is emerging, one that creates new value by converging IoT, AI, and big data, the new technologies of the fourth industrial revolution, across all industrial fields. Everything is connected to the network, data volumes are exploding, and artificial intelligence can learn on its own and even make intellectual judgments. In particular, the Internet of Things provides a new communication environment in which anything can be connected anytime and anywhere, enabling hyper-connectivity. Artificial intelligence technology is implemented so that computers can perform human perception, learning, reasoning, and natural language processing. It is advancing through technologies such as machine learning, deep learning, natural language processing, voice recognition, and visual recognition, and includes software, machine learning, and cloud technologies specialized for applications such as safety, medicine, defense, finance, and welfare. As a result, it is used in fields throughout industry to provide convenience and new value to people. At the same time, intelligent and sophisticated cyber threats are increasing, accompanied by potential adverse effects such as concerns over the technical safety of new technologies, and a response is now needed. In this paper, we propose a new data modeling method that enables integrated IoT security control using artificial intelligence technology as a way to address these adverse effects.

Robust Speech Segmentation Method in Noise Environment for Speech Recognizer (음성인식기 구현을 위한 잡음에 강인한 음성구간 검출기법)

  • 김창근;박정원;권호민;허강인
    • Journal of the Institute of Convergence Signal Processing / v.4 no.2 / pp.18-24 / 2003
  • One of the most important issues in implementing a real-time speech recognizer is designing both reliable voice activity detection (VAD) and a suitable speech feature vector. However, because reliable VAD is difficult in environments with surrounding noise, a suitable speech feature vector may not be obtained. To solve this problem, we use not only the commonly used short-time power spectrum but also two additional parameters: a spectral density comparison measure that is robust to noise, and a linear discriminant function obtained by linear regression. VAD is then performed with a weighted combination of these parameters, with weights suited to the magnitude of the surrounding noise. Recognition experiments using dynamic time warping (DTW) confirm that the proposed parameters are robust in noisy environments.

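The recognition experiment above relies on dynamic time warping (DTW), which aligns two sequences that differ in speaking rate before comparing them. The following is a minimal sketch of the textbook DTW recurrence applied to template matching. It uses scalar frames for readability, whereas a real recognizer would compare per-frame feature vectors; the template labels and function names are hypothetical, and nothing here reproduces the authors' VAD parameters.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two scalar sequences, with steps
    (i-1, j), (i, j-1), (i-1, j-1) and absolute difference as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(query, templates):
    """Return the label of the template with the smallest DTW cost to the query."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))

# Hypothetical reference patterns, one per "word".
templates = {
    "rise-fall": [0, 1, 2, 1, 0],
    "fall-rise": [2, 1, 0, 1, 2],
}

# The same shape spoken at half speed still matches its template exactly.
slow_query = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
print(dtw_distance(templates["rise-fall"], slow_query))
print(recognize(slow_query, templates))
```

The quadratic cost table is the simplest form. Practical systems often add a warping-window constraint to limit how far the alignment can drift.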

NUI/NUX framework based on intuitive hand motion (직관적인 핸드 모션에 기반한 NUI/NUX 프레임워크)

  • Lee, Gwanghyung;Shin, Dongkyoo;Shin, Dongil
    • Journal of Internet Computing and Services / v.15 no.3 / pp.11-19 / 2014
  • The natural user interface/experience (NUI/NUX) provides a natural motion interface that works without devices or tools such as mice, keyboards, pens, and markers. Until now, typical motion recognition methods have used markers, receiving the coordinates of each marker as relative data and storing each coordinate value in a database. However, recognizing motion accurately requires more markers, and attaching the markers and processing the data take considerable time. In addition, because NUI/NUX frameworks have been developed without regard for intuitiveness, the most important factor, usability problems arise and users are forced to learn how to use many different NUI/NUX frameworks. To address these problems, we did not use markers and built the framework so that anyone can operate it. We also designed a multi-modal NUI/NUX framework that handles voice, body motion, and facial expression simultaneously, and proposed a new mouse-operation algorithm that recognizes intuitive hand gestures and maps them onto the monitor. The implementation lets users perform the "hand mouse" operation easily and intuitively.
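
One simple way to map a tracked hand position onto the monitor, as the "hand mouse" idea above requires, is to normalize the hand coordinates within an interaction box and scale them to the screen. The sketch below shows only that mapping. The interaction box, the screen size, and the function name are assumptions for illustration, and the abstract does not describe the authors' actual algorithm or gesture recognition.

```python
def hand_to_screen(hand_x, hand_y, box, screen_w=1920, screen_h=1080):
    """Map a hand position inside an interaction box to pixel coordinates.

    box is (left, top, right, bottom) in the sensor's coordinate units.
    Positions outside the box are clamped to the nearest screen edge.
    """
    left, top, right, bottom = box
    u = (hand_x - left) / (right - left)   # 0.0 at the left edge, 1.0 at the right
    v = (hand_y - top) / (bottom - top)    # 0.0 at the top edge, 1.0 at the bottom
    u = min(max(u, 0.0), 1.0)
    v = min(max(v, 0.0), 1.0)
    return round(u * (screen_w - 1)), round(v * (screen_h - 1))

# Hypothetical interaction box spanning the unit square in sensor coordinates.
box = (0.0, 0.0, 1.0, 1.0)
print(hand_to_screen(0.0, 0.0, box))
print(hand_to_screen(1.0, 1.0, box))
print(hand_to_screen(2.0, -1.0, box))  # outside the box, clamped to an edge
```

A practical system would usually smooth successive positions to suppress sensor jitter before moving the cursor.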

The Study on Automatic Speech Recognizer Utilizing Mobile Platform on Korean EFL Learners' Pronunciation Development (자동음성인식 기술을 이용한 모바일 기반 발음 교수법과 영어 학습자의 발음 향상에 관한 연구)

  • Park, A Young
    • Journal of Digital Contents Society / v.18 no.6 / pp.1101-1107 / 2017
  • This study explored the effect of ASR-based pronunciation instruction on a mobile platform on EFL learners' pronunciation development. In particular, this quasi-experimental study examined whether mobile ASR, which provides voice-to-text feedback, can enhance Korean EFL learners' perception and production of target English consonant minimal pairs (V-B, R-L, and G-Z). Three intact classes of 117 Korean university students were assigned to three groups: a) ASR Group: ASR-based pronunciation instruction with textual feedback from the mobile ASR; b) Conventional Group: conventional face-to-face pronunciation instruction with individual oral feedback from the instructor; and c) Hybrid Group: ASR-based pronunciation instruction plus conventional pronunciation instruction. The ANCOVA results showed that the adjusted mean score on the pronunciation production post-test was significantly higher for the Hybrid group (M=82.71, SD=3.3) than for the Conventional group (M=62.6, SD=4.05) (p<.05).

A Study on a Feedback-Centric Piano Education System Using Kinect Sensors (키넥트를 활용한 피드백 중심의 피아노 교육 방안 연구)

  • Park, So Hyun;Ihm, Sun Young;Park, Eun Young;Son, Jong Seo;Park, Young Ho
    • KIPS Transactions on Software and Data Engineering / v.4 no.9 / pp.403-408 / 2015
  • Kinect sensors can recognize a user's movements and voice. Because of their low cost and high accessibility, Kinect sensors have been used in various fields, including healthcare and education. In this paper, we propose using Kinect in piano education. Specifically, the proposed method first recognizes the coordinate values of the user's posture, compares them with the coordinate values of the teacher's posture, and provides real-time feedback to the user. This enables users to keep the correct posture even when they are learning piano without a teacher. However, since piano education is a long process, it is difficult for a learner to immediately achieve a posture as correct as the teacher's. Thus, we propose a user-oriented method for measuring the error tolerance rate. The proposed method is the first feedback-based piano education system that uses Kinect sensors.
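
The comparison step described above (user joint coordinates against teacher joint coordinates, with a per-user tolerance) could look roughly like the following. This is a sketch under assumptions: the joint names, the coordinate units, the Euclidean distance measure, and the tolerance values are all invented for illustration, and the abstract does not specify how the authors compute their error tolerance rate.

```python
import math

def joint_errors(user, teacher):
    """Euclidean distance between matching joints of two postures.
    Each posture maps a joint name to an (x, y, z) coordinate."""
    return {
        name: math.dist(user[name], teacher[name])
        for name in teacher
        if name in user
    }

def posture_feedback(user, teacher, tolerance):
    """Return the joints whose deviation from the teacher exceeds the tolerance.
    A larger tolerance could be used for beginners and tightened over time."""
    errors = joint_errors(user, teacher)
    return [name for name, err in errors.items() if err > tolerance]

# Hypothetical postures (coordinates in meters, invented for illustration).
teacher = {"wrist": (0.00, 0.00, 0.00), "elbow": (0.30, 0.00, 0.00)}
user = {"wrist": (0.02, 0.00, 0.00), "elbow": (0.30, 0.20, 0.00)}

print(posture_feedback(user, teacher, tolerance=0.05))  # strict tolerance
print(posture_feedback(user, teacher, tolerance=0.25))  # lenient tolerance
```

Running the check on every sensor frame would give the real-time feedback loop; making the tolerance a per-user parameter is one way to read the "user-oriented" error tolerance the authors mention.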

An EPG Configuration Constructing Method and Structure for Dynamically Implementing Viewer Chosen EPG Configurations (시청자 선택 기반의 EPG 형상의 동적 구현을 위한 EPG형상 제작 방법과 구조)

  • Ko, Kwang-Il
    • Convergence Security Journal / v.11 no.4 / pp.51-58 / 2011
  • Driven by digital technology, the TV broadcasting platform is evolving into digital TV, which supports data broadcasting services. Although data broadcasting services (e.g., games, weather information, and stock trading) provide rich entertainment to viewers, they make digital TV so complex to operate that some viewers have difficulty using their TV sets. Several studies have addressed this problem by improving EPG functions such as searching for and reserving programs, applying gesture and voice recognition technologies to EPG operation, providing guidelines for designing the EPG user interface, and developing agents that help the EPG behave intelligently. However, no study has yet addressed the problem that viewers differ in their familiarity with IT services. This paper tackles that problem by letting a viewer choose an EPG configuration (from among several configurations provided by a broadcasting network) and by designing an EPG that implements a configuration based on that choice.