• Title/Summary/Keyword: Voice Feature

Search Result 232, Processing Time 0.029 seconds

A Study on Optical internet Transmission technic Using DWDM based on network (네트워크 기반에서의 DWDM을 이용한 광 인터넷 전송 기술에 관한 연구)

  • 장우순;정진호
    • Journal of Internet Computing and Services / v.2 no.1 / pp.87-96 / 2001
  • This article proposes traffic dispersion using optical transmission techniques and an improved transmission rate for safe multicast computer communication over high-bandwidth links. Multicast traffic such as teleconferencing and Internet broadcasting has recently increased, so traffic dispersion and transmission rate are becoming more important; this article ultimately offers a way to achieve both. First, the paper points out the traffic problems that occur in voice- and text-centered transmission. Next, it shows how the transmission rate can be controlled with optical transmission techniques to resolve these difficulties for multimedia and Internet services. We investigated the characteristics and output of an Add-Drop Mux/Demux and also present how interference changes with length at each stage. The experimental results provide the best design data.

Design and Analysis of Mobile-IPv6 Multicasting Algorithm Supporting Smooth Handoff in the All-IP Network (All-IP망에서 Smooth Handoff를 지원하는 Mobile-IP v6 멀티캐스팅 알고리즘의 설계 및 분석)

  • 박병섭
    • The Journal of the Korea Contents Association / v.2 no.3 / pp.119-126 / 2002
  • The QoS (Quality of Service) guarantee mechanism is one of the critical issues in wireless networks. Real-time applications such as VoIP (Voice over IP) in All-IP networks need smooth handoffs in order to minimize or eliminate packet loss as a Mobile Host (MH) transitions between network links. In this paper, we design a new multicasting algorithm for Mobile-IPv6 using a DB (Dynamic Buffering) mechanism. A key feature of the new protocol is the use of the DB and an MRA (Multicast Routing Agent) to reduce the delivery path length of multicast datagrams. In particular, the number of tunneling operations and the average routing length of datagrams are reduced, and the multicast traffic load is also decreased.
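The DB idea above can be sketched as a small buffer at the previous access router that replays missed multicast datagrams after handoff; the class and method names below are illustrative, not from the paper:

```python
from collections import deque

class DynamicBuffer:
    """Sketch of the DB mechanism: the old access router keeps recent
    multicast datagrams so the mobile host can recover the ones it
    missed while transitioning between network links."""

    def __init__(self, capacity: int = 32):
        self.buf = deque(maxlen=capacity)   # oldest entries drop automatically

    def on_datagram(self, seq: int, payload: str) -> None:
        """Record each multicast datagram as it is forwarded."""
        self.buf.append((seq, payload))

    def flush_after_handoff(self, last_seq_received: int):
        """Return only the datagrams the mobile host has not yet seen."""
        return [p for s, p in self.buf if s > last_seq_received]
```

A lossless handoff then amounts to the new point of attachment asking the old router to `flush_after_handoff` with the last sequence number the host acknowledged.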

Synthesis of Expressive Talking Heads from Speech with Recurrent Neural Network (RNN을 이용한 Expressive Talking Head from Speech의 합성)

  • Sakurai, Ryuhei;Shimba, Taiki;Yamazoe, Hirotake;Lee, Joo-Ho
    • The Journal of Korea Robotics Society / v.13 no.1 / pp.16-25 / 2018
  • A talking head (TH) is a speaking-face animation generated from text and voice input. In this paper, we propose a method for generating a TH with facial expression and intonation from speech input alone. Generating a TH from speech can be regarded as a regression problem from the acoustic feature sequence to the facial code sequence, a low-dimensional vector representation that can efficiently encode and decode a face image. This regression was modeled with a bidirectional RNN and trained using the SAVEE database of frontal utterance face animations as training data. The proposed method generates a TH with facial expression and intonation from acoustic features such as MFCCs, the dynamic (delta) elements of the MFCCs, energy, and F0. In our experiments, the configuration with BLSTM layers as the first and second layers of the bidirectional RNN predicted the face codes best. For evaluation, a questionnaire survey was conducted with 62 people who watched TH animations generated by the proposed method and a previous method. As a result, 77% of the respondents answered that the proposed method generated a TH that matches the speech well.
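The dynamic (delta) elements of the MFCCs mentioned above are conventionally computed with a regression formula over a small frame window; this sketch assumes that standard formulation (function name and window width are our own choices, not the paper's):

```python
from typing import List

def delta_features(frames: List[List[float]], n: int = 2) -> List[List[float]]:
    """Regression-based delta (dynamic) features over a +-n frame window:
    d_t = sum_{k=1..n} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..n} k^2).
    Edge frames are repeated so the output has the same length as the input."""
    denom = 2.0 * sum(k * k for k in range(1, n + 1))
    T, dims = len(frames), len(frames[0])
    out = []
    for t in range(T):
        row = []
        for i in range(dims):
            acc = 0.0
            for k in range(1, n + 1):
                c_plus = frames[min(t + k, T - 1)][i]   # clamp at the edges
                c_minus = frames[max(t - k, 0)][i]
                acc += k * (c_plus - c_minus)
            row.append(acc / denom)
        out.append(row)
    return out
```

Concatenating each MFCC frame with its delta row (plus energy and F0) yields the per-frame acoustic feature vector fed to the bidirectional RNN.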

Classification of Diphthongs using Acoustic Phonetic Parameters (음향음성학 파라메터를 이용한 이중모음의 분류)

  • Lee, Suk-Myung;Choi, Jeung-Yoon
    • The Journal of the Acoustical Society of Korea / v.32 no.2 / pp.167-173 / 2013
  • This work examines classification of diphthongs, as part of a distinctive feature-based speech recognition system. Acoustic measurements related to the vocal tract and the voice source are examined, and analysis of variance (ANOVA) results show that vowel duration, energy trajectory, and formant variation are significant. A balanced error rate of 17.8% is obtained for 2-way diphthong classification on the TIMIT database, and error rates of 32.9%, 29.9%, and 20.2% are obtained for /aw/, /ay/, and /oy/, for 4-way classification, respectively. Adding the acoustic features to widely used Mel-frequency cepstral coefficients also improves classification.
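The balanced error rate reported above averages the per-class error rates so that each diphthong class counts equally regardless of how often it occurs; a minimal sketch:

```python
def balanced_error_rate(y_true, y_pred) -> float:
    """Mean of the per-class error rates. Unlike plain accuracy, a rare
    class (e.g. /oy/) weighs as much as a common one in the final score."""
    classes = sorted(set(y_true))
    per_class = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        errors = sum(1 for i in idx if y_pred[i] != c)
        per_class.append(errors / len(idx))
    return sum(per_class) / len(per_class)
```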

The Design of User-Authentication technique using QR-Code recognition (스마트폰의 QR-Code의 인식 기법을 이용한 사용자 인증 기법 설계)

  • Lee, Yong Jae;Kim, Young Gon;Park, Tae Sung;Jun, Moon Seog
    • Journal of Korea Society of Digital Industry and Information Management / v.7 no.3 / pp.85-95 / 2011
  • Smartphones, whose share of the mobile market is growing rapidly, are equipped with various features compared to existing feature phones and provide conveniences in several ways. The camera, one of these features, creates digital content such as photos and videos and serves as a medium for transmitting information, for example in video calls and as a bar code reader. QR-Code recognition is also one of the camera's features: a QR-Code holds a variety of information in a two-dimensional matrix bar code, and smartphones can read it to obtain that information. This paper analyzes QR-Code recognition alongside existing user-authentication techniques such as passwords, smart cards, biometrics, and voice recognition, and then designs a new user-authentication technique. In the proposed technique, a QR-Code, which can be issued simply, is read by the smartphone and transmitted to a server for authentication. Its advantages are that it simplifies the authentication process and counteracts threats such as brute-force attacks, man-in-the-middle attacks, and keyboard hacking that can occur with other authentication techniques.
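One plausible realization of this flow (the server shows a one-time code as a QR image, the phone scans it and answers over the network) is a challenge-response with an HMAC; everything below, including the shared user secret, is an illustrative assumption rather than the paper's exact protocol:

```python
import hashlib
import hmac
import secrets
import time

class QRAuthServer:
    """Sketch: issues one-time nonces (encoded into the on-screen QR code)
    and verifies the smartphone's HMAC response."""

    def __init__(self):
        self.pending = {}                      # nonce -> issue time

    def issue_challenge(self) -> str:
        nonce = secrets.token_hex(16)          # unguessable, defeats brute force
        self.pending[nonce] = time.time()
        return nonce

    def verify(self, nonce: str, tag: str, user_secret: bytes,
               ttl: float = 60.0) -> bool:
        issued = self.pending.pop(nonce, None) # single use: replay is rejected
        if issued is None or time.time() - issued > ttl:
            return False
        expected = hmac.new(user_secret, nonce.encode(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, tag)

def phone_respond(nonce: str, user_secret: bytes) -> str:
    """The smartphone scans the QR code and answers with an HMAC over the nonce,
    so no password is typed (nothing for a keylogger to capture)."""
    return hmac.new(user_secret, nonce.encode(), hashlib.sha256).hexdigest()
```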

Content-Based Retrieval System Design over the Internet (인터넷에 기반한 내용기반 검색 시스템 설계)

  • Kim Young Ho;Kang Dae-Seong
    • Journal of Institute of Control, Robotics and Systems / v.11 no.5 / pp.471-475 / 2005
  • Recently, with the development of digital technology, multimedia information such as text, voice, image, and video occupies a large share of content, and research on video indexing and retrieval is especially active. This paper proposes a novel method for retrieving MPEG video, the international standard for moving-picture encoding. To realize the retrieval system, we extract the DCT DC coefficients and obtain shots by applying MVC (Mean Value Comparison) to the images constructed from the DC coefficients. We choose the start frame of each shot as its key frame and generate a codebook index by applying PCA (Principal Component Analysis) to features of the key frame's DC image. We then perform retrieval by similarity after indexing. Compared with conventional shot-detection algorithms, the method reduces false shot detections. Indexing with PCA is also faster because it is performed in the compressed domain, and generating the codebook from statistical features is a further advantage. Finally, we realized an efficient retrieval system by using MVC and PCA for shot detection and indexing, the key steps of a retrieval system, and made the system available over the Internet.
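The MVC step can be sketched as a simple threshold on the change in the mean DC value between consecutive frames, with the start frame of each shot taken as its key frame; the threshold value here is an arbitrary illustration, not the paper's tuned parameter:

```python
def detect_shot_boundaries(dc_means, threshold: float = 20.0):
    """MVC-style cut detection: declare a shot boundary wherever the mean
    DC value (computed per frame from the DCT DC coefficients, i.e. in the
    compressed domain) jumps by more than a threshold between frames."""
    return [t for t in range(1, len(dc_means))
            if abs(dc_means[t] - dc_means[t - 1]) > threshold]

def key_frames(dc_means, threshold: float = 20.0):
    """The start frame of each shot serves as that shot's key frame."""
    return [0] + detect_shot_boundaries(dc_means, threshold)
```

In the full system each key frame's DC image would then be projected with PCA to produce its codebook index.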

Emotion Recognition Algorithm Based on Minimum Classification Error incorporating Multi-modal System (최소 분류 오차 기법과 멀티 모달 시스템을 이용한 감정 인식 알고리즘)

  • Lee, Kye-Hwan;Chang, Joon-Hyuk
    • Journal of the Institute of Electronics Engineers of Korea SP / v.46 no.4 / pp.76-81 / 2009
  • We propose an effective emotion recognition algorithm based on the minimum classification error (MCE) criterion incorporating a multi-modal system. Emotion recognition is performed with a Gaussian mixture model (GMM) trained by the MCE method on the log-likelihood. In particular, the proposed technique fuses feature vectors derived from the voice signal and from the galvanic skin response (GSR) measured by a body sensor. The experimental results indicate that the proposed MCE-based approach incorporating the multi-modal system outperforms the conventional approach.
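The MCE criterion replaces plain maximum-likelihood training with a smoothed misclassification measure: the correct class's log-likelihood is compared against a soft maximum over the competing classes and passed through a sigmoid. A minimal sketch of that loss for one sample (the smoothing parameters `eta` and `gamma` are illustrative values, not those used in the paper):

```python
import math

def mce_loss(scores, correct: int, eta: float = 2.0, gamma: float = 1.0) -> float:
    """Smoothed MCE loss for one sample, given per-class log-likelihoods.
    d = -g_correct + soft-max of the competitors; loss = sigmoid(gamma * d),
    so the loss is near 0 when the correct class clearly wins and near 1
    when it clearly loses."""
    g_k = scores[correct]
    others = [s for i, s in enumerate(scores) if i != correct]
    # log-domain soft maximum of the competing log-likelihoods
    G = (1.0 / eta) * math.log(sum(math.exp(eta * g) for g in others) / len(others))
    d = -g_k + G
    return 1.0 / (1.0 + math.exp(-gamma * d))
```

Summing this loss over the training set and differentiating with respect to the GMM parameters gives the discriminative update MCE training uses.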

Semantic Ontology Speech Recognition Performance Improvement using ERB Filter (ERB 필터를 이용한 시맨틱 온톨로지 음성 인식 성능 향상)

  • Lee, Jong-Sub
    • Journal of Digital Convergence / v.12 no.10 / pp.265-270 / 2014
  • Existing speech recognition algorithms have several problems: they do not distinguish the order of vocabulary, voice detection becomes inaccurate under noise as the recognition environment changes, and retrieval systems mismatch user requests because keywords carry multiple meanings. In this article, we propose an event-based semantic ontology inference model, and the proposed system extracts speech recognition features using an ERB filter. The proposed model was evaluated with train-station and train noise; noise removal was performed on signals at SNRs of -10 dB and -5 dB. Distortion measures confirmed improved performance of 2.17 dB and 1.31 dB, respectively.
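ERB filter banks are conventionally built from the Glasberg-Moore equivalent rectangular bandwidth formulas, with filter centers spaced uniformly on the ERB-rate scale; a sketch assuming that standard parameterization (the paper's exact filter design may differ):

```python
import math

def erb_bandwidth(f_hz: float) -> float:
    """Glasberg & Moore ERB (in Hz) of an auditory filter centered at f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def hz_to_erb_rate(f_hz: float) -> float:
    """Position of f_hz on the ERB-rate (ERB-number) scale."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_rate_to_hz(e: float) -> float:
    """Inverse mapping from ERB-rate back to Hz."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def erb_center_freqs(f_lo: float, f_hi: float, n: int):
    """n filter center frequencies equally spaced on the ERB-rate scale,
    mimicking the frequency resolution of the human cochlea."""
    e_lo, e_hi = hz_to_erb_rate(f_lo), hz_to_erb_rate(f_hi)
    return [erb_rate_to_hz(e_lo + i * (e_hi - e_lo) / (n - 1)) for i in range(n)]
```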

Design and Implementation of the Voice Feature Elimination Technique to Protect Speaker's Privacy (사용자 프라이버시 보호를 위한 음성 특징 제거 기법 설계 및 구현)

  • Yu, Byung-Seok;Lim, SuHyun;Park, Mi-so;Lee, Yoo-Jin;Yun, Sung-Hyun
    • Annual Conference of KIPS / 2012.11a / pp.672-675 / 2012
  • Voice is our most familiar and convenient means of communication, and it is well suited as an input interface for small mobile devices such as smartphones. Server-based speech recognition builds voice models from the many users who access the server, so it can achieve high recognition rates and is commercially viable; Google voice recognition and the iPhone's Siri are representative examples, and demand for such services has surged with the recent growth in smartphone use. In server-based speech recognition, recognition is performed on a remote server connected to the smartphone over the Internet, so the user must send the voice data stored on the smartphone to the recognition server over the Internet [1, 2]. Voice data carries user-specific information, so it can be used for personal authentication and identification, and even the user's emotional state can be inferred from the tone of the voice, the pitch of the voice signal, and the speaking rate [3]. Because the user's voice data transmitted over the network in server-based speech recognition is easily exposed to third parties, the speaker's identity and emotions can be revealed, resulting in a privacy violation. In this paper, to protect the speaker's privacy, we design and implement a technique that removes from the user's voice data the personal characteristics and the emotional information that reveals the speaker's current state.

Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism

  • Liu, Min;Tang, Jun
    • Journal of Information Processing Systems / v.17 no.4 / pp.754-771 / 2021
  • In continuous-dimension emotion recognition, the parts that highlight emotional expression differ across modes, and the influences of different modes on the emotional state also differ. This paper therefore studies the fusion of the two most important modes in emotion recognition, voice and visual expression, and proposes a dual-modal emotion recognition method that combines an improved AlexNet network with an attention mechanism. After simple preprocessing of the audio and video signals, audio features are first extracted using prior knowledge. Facial expression features are then extracted with the improved AlexNet network. Finally, a multimodal attention mechanism fuses the facial expression features and the audio features, and an improved loss function mitigates the modal-missing problem, improving the robustness of the model and the emotion recognition performance. The experimental results show that the concordance correlation coefficients of the proposed model in the arousal and valence dimensions were 0.729 and 0.718, respectively, which are superior to several comparison algorithms.
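The concordance correlation coefficient (CCC) reported above combines correlation with agreement in mean and variance, so a prediction must track the label and match its scale to score well; a minimal reference implementation:

```python
def ccc(x, y) -> float:
    """Concordance correlation coefficient between two sequences:
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2).
    Equals 1 only for perfect agreement (not merely perfect correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)
```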