• Title/Summary/Keyword: speaker tracking

Speaker Tracking Using Eigendecomposition and an Index Tree of Reference Models

  • Moattar, Mohammad Hossein;Homayounpour, Mohammad Mehdi
    • ETRI Journal / v.33 no.5 / pp.741-751 / 2011
  • This paper focuses on online speaker tracking for telephone conversations and broadcast news. Because online operation imposes limitations on the tracking strategy, such as data insufficiency, a reliable approach is needed to compensate for this shortage. In this framework, a set of reference speaker models is used as side information to facilitate online tracking. To improve the indexing accuracy, adaptation approaches in the eigenvoice decomposition space are proposed. We believe that eigenvoice adaptation techniques help to embed the speaker space in the models and hence enrich the generality of the selected speaker models. In addition, an index structure over the reference models is proposed to speed up the search in the model space. The proposed framework is evaluated on the 2002 Rich Transcription Broadcast News and Conversational Telephone Speech corpora as well as a synthetic dataset. The indexing errors of the proposed framework on telephone conversations, broadcast news, and the synthetic dataset are 8.77%, 9.36%, and 12.4%, respectively. Using the index tree structure, the run time of the proposed framework is improved by 22%.

Segmentation and Tracking Algorithm for Moving Speaker in the Video Conference Image (화상회의 영상에서 움직이는 화자의 분할 및 추적 알고리즘)

  • Choi Woo-Young;Kim Han-Me
    • Journal of IKEEE / v.6 no.1 s.10 / pp.54-64 / 2002
  • In this paper, we propose an algorithm for segmenting a moving speaker and tracking the speaker's movement in video conference images. For real-time processing, we simplify the algorithm into a segmentation step followed by a tracking step. In the segmentation step, the speaker object is segmented from the image using both the motion information obtained by the difference method and the illuminance information of the image. A reference mask image is created from the segmented speaker object. In the tracking step, the moving speaker is tracked with a simple block matching algorithm whose computation time is reduced by discarding blocks classified as non-useful. In simulation, applying the proposed algorithm to several test images gives good results for segmenting and tracking the moving speaker.
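
The block matching step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `sad` and `match_block`, the sum-of-absolute-differences cost, and the search radius are all assumptions made for illustration, and the paper's rule for discarding non-useful blocks is not reproduced.

```python
def sad(frame, ref_block, top, left):
    """Sum of absolute differences between ref_block and the frame patch at (top, left)."""
    return sum(
        abs(frame[top + r][left + c] - ref_block[r][c])
        for r in range(len(ref_block))
        for c in range(len(ref_block[0]))
    )


def match_block(frame, ref_block, top, left, radius=2):
    """Search a (2*radius+1)^2 window around (top, left).

    Returns the offset (dy, dx) with the lowest SAD cost. Candidate
    positions that fall outside the frame are skipped.
    """
    h, w = len(ref_block), len(ref_block[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= len(frame) - h and 0 <= x <= len(frame[0]) - w:
                cost = sad(frame, ref_block, y, x)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best[1], best[2]


# Hypothetical 8x8 frame with a bright 2x2 object whose top-left corner is at (3, 4).
frame = [[0] * 8 for _ in range(8)]
for r in (3, 4):
    for c in (4, 5):
        frame[r][c] = 9
ref_block = [[9, 9], [9, 9]]

# The object was previously at (2, 3), so the search starts there.
offset = match_block(frame, ref_block, top=2, left=3)
```

An exhaustive window search is the simplest form of block matching; skipping blocks early, as the abstract describes, would reduce the number of `sad` calls.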

Speaker Tracking System for Autonomous Mobile Robot (자율형 이동로봇을 위한 전방위 화자 추종 시스템)

  • Lee, Chang-Hoon;Kim, Yong-Hoh
    • Proceedings of the KIEE Conference / 2002.11c / pp.142-145 / 2002
  • This paper describes an omnidirectional speaker tracking system for a mobile robot interface in a real environment. Its purpose is to robustly detect a sound source over 360 degrees and to recognize voice commands at long distance (60-300 cm). We consider spatial features, namely the relation between position and interaural time differences, and realize the speaker tracking system using a fuzzy inference process based on inference rules generated from these spatial features.
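
As a rough illustration of how an interaural time difference relates to source direction, the sketch below estimates the delay between two microphone signals by cross-correlation and converts it to an azimuth with a far-field model. This is a generic textbook approach and not the fuzzy inference method of the paper. The function names, the two-microphone setup, and the default speed of sound are assumptions; a single microphone pair also cannot resolve front/back ambiguity, so it does not cover 360 degrees on its own.

```python
import math


def best_lag(left, right, max_lag):
    """Lag (in samples) that maximizes the cross-correlation of two signals.

    A positive lag means the right signal is delayed relative to the left.
    """
    def corr(lag):
        return sum(
            left[n] * right[n + lag]
            for n in range(len(left))
            if 0 <= n + lag < len(right)
        )

    return max(range(-max_lag, max_lag + 1), key=corr)


def itd_to_azimuth(lag, sample_rate, mic_distance, speed_of_sound=343.0):
    """Far-field model: sin(theta) = c * tau / d, clamped to [-1, 1].

    Returns the azimuth in degrees; 0 means the source is straight ahead.
    """
    tau = lag / sample_rate
    s = max(-1.0, min(1.0, speed_of_sound * tau / mic_distance))
    return math.degrees(math.asin(s))


# Hypothetical impulse that reaches the right microphone two samples late.
left = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
lag = best_lag(left, right, max_lag=4)
angle = itd_to_azimuth(lag, sample_rate=16000, mic_distance=0.2)
```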

A Speaker Change Detection Experiment that Uses a Statistical Method (통계적 기법을 이용한 화자변화 검출 실험)

  • Lee, Kyong-Rok;Kim, Jin-Young
    • Speech Sciences / v.8 no.4 / pp.59-72 / 2001
  • In this paper, we experimented with speaker change detection using a statistical method for an NOD (News On Demand) service. In news data, a change of speaker signals a change of content, so analysing speaker changes helps to identify the content of each segment of the speech. Speaker change detection acts as a preprocessor that divides the input speech by speaker, and it is an important preprocessing phase for speaker tracking. Among the matrix methods, we detected speaker changes using segmentation based on the GLR (generalized likelihood ratio) distance and segmentation based on the BIC (Bayesian information criterion). In the experiment, the speech was first divided into speaker units with the GLR distance-based method, and the speaker change points were then verified with BIC-based segmentation. In the experimental results, at an MDR (Missed Detection Rate) of around 15%, the FAR (False Alarm Rate) was 63.29 in a high-noise environment and 54.28 in a low-noise environment.
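
To make the BIC criterion named in the abstract above concrete, here is a minimal sketch of the standard delta-BIC test for a single change point. It is simplified to one-dimensional features modelled by a single Gaussian, and it does not reproduce the paper's GLR pre-segmentation. The function names, the `margin` parameter, and the penalty weight `lam` are assumptions for illustration.

```python
import math


def variance(xs):
    """Maximum-likelihood variance, floored to avoid log(0)."""
    m = sum(xs) / len(xs)
    return max(sum((x - m) ** 2 for x in xs) / len(xs), 1e-8)


def delta_bic(xs, i, lam=1.0):
    """Delta-BIC for splitting xs at index i (1-D Gaussian models).

    A positive value favours two separate models, i.e. a change at i.
    """
    n = len(xs)
    gain = 0.5 * (
        n * math.log(variance(xs))
        - i * math.log(variance(xs[:i]))
        - (n - i) * math.log(variance(xs[i:]))
    )
    # General penalty is 0.5 * (d + d*(d+1)/2) * log(n); with d = 1 it is log(n).
    penalty = lam * math.log(n)
    return gain - penalty


def best_change_point(xs, margin=5, lam=1.0):
    """Return (index, score) of the best split, or (None, score) if no split is favoured."""
    candidates = range(margin, len(xs) - margin + 1)
    score, idx = max((delta_bic(xs, i, lam), i) for i in candidates)
    return (idx, score) if score > 0 else (None, score)


# Hypothetical feature track: one "speaker" near 0, then another near 5.
two_speakers = [0.0, 0.1] * 20 + [5.0, 5.1] * 20
one_speaker = [0.0, 0.1] * 40

change_idx, _ = best_change_point(two_speakers)
no_change_idx, _ = best_change_point(one_speaker)
```

In practice the features would be multi-dimensional (for example cepstral vectors), in which case the variances become covariance determinants and the penalty grows with the feature dimension.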

Development of a flood prevention system scenario using IoT Directional speaker Seamless-tracking technology (인명지킴이 시스템 기반 사회재난 대응 실증 연구 - IDS 기술을 활용한 수난 방지 시스템 시나리오 개발 -)

  • Lee, Yongsuk;Im, Sua;Shin, Jongkyun
    • Journal of the Society of Disaster Information / v.13 no.1 / pp.106-117 / 2017
  • This study presents an efficient demonstration of life protection systems developed for the prevention of, and prompt response to, social disasters. It proposes prompt accident prevention and response according to the type of accident, and the development of life protection system technology for social disasters using convergence technologies such as directional speaker systems.

Invisible Messenger: A System to Whisper in a Person's Ear Remotely by integrating Visual Tracking and Speaker Array

  • Mizoguchi, Hiroshi;Kanamori, Tomohiko;Okabe, Kosuke;Hiraoka, Kazuyuki;Tanaka, Masaru;Shigehara, Takaomi;Mishima, Taketoshi
    • Proceedings of the IEEK Conference / 2002.07c / pp.1897-1900 / 2002
  • This paper proposes a novel computer-human interface named Invisible Messenger. It integrates face detection and tracking with speaker array signal processing. With a speaker array, it is possible to form an acoustic focus at an arbitrary location, which is measured by the face tracking. The proposed system can thus whisper in a person's ear as if an invisible virtual messenger were standing beside the person. Beyond speculative discussion, the authors have implemented a working prototype based on the proposed idea, and this paper also describes that prototype. To confirm the effectiveness of the proposed idea, the authors conduct experiments using the implemented system. The experimental results demonstrate the effectiveness of the proposed idea.

Microsoft-Kinect Sensor utilizing People Tracking System (Microsoft-Kinect 센서를 활용한 화자추적 시스템)

  • Ban, Tae-Hak;Lee, Sang-Won;Kim, Jae-Min;Jung, Hoe-Kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.05a / pp.611-613 / 2015
  • In multimedia classroom teaching, automatic camera tracking is used to record and save lectures automatically. Existing tracking systems either attach a separate sensor to the body or place a sensor in front of the subject, which causes discomfort and tracking errors. This paper describes an unmanned speaker tracking solution that uses a Microsoft Kinect sensor to analyze the position and behavior of the speaker (instructor), together with PTZ cameras and recording and storage systems, so that classroom lessons and lectures can be recorded effectively for content production.

Development of a Cost-Effective Tele-Robot System Delivering Speaker's Affirmative and Negative Intentions (화자의 긍정·부정 의도를 전달하는 실용적 텔레프레즌스 로봇 시스템의 개발)

  • Jin, Yong-Kyu;You, Su-Jeong;Cho, Hye-Kyung
    • The Journal of Korea Robotics Society / v.10 no.3 / pp.171-177 / 2015
  • A telerobot offers a more engaging and enjoyable interaction with people at a distance by communicating via audio, video, expressive gestures, body pose, and proxemics. To provide these potential benefits at a reasonable cost, this paper presents a telepresence robot system for video communication that can deliver a speaker's head motion through its display stanchion. Head gestures such as nodding and head-shaking can give crucial information during conversation. We can also infer a speaker's eye gaze, which is known as one of the key non-verbal signals for interaction, from his or her head pose. To develop an efficient head tracking method, a 3D cylinder-like head model is employed, and the Harris corner detector is combined with Lucas-Kanade optical flow, which is known to be suitable for extracting 3D motion information of the model. In particular, a skin color-based face detection algorithm is proposed to achieve robust performance under varying directions while maintaining a reasonable computational cost. The performance of the proposed head tracking algorithm is verified through experiments using BU's standard data sets. The design of the robot platform is also described, along with the design of supporting systems such as the video transmission and robot control interfaces.

Optimizations of Air-trap Locations in the Speaker Encloser of Mobile Phone by Injection Molding Simulations (사출성형 시뮬레이션에 의한 휴대폰 스피커 인클로저의 에어트랩 위치 최적화)

  • Park, Ki-Yoon;Park, Jong-Cheon
    • Journal of the Korean Society of Manufacturing Process Engineers / v.10 no.5 / pp.85-90 / 2011
  • In this paper, a design procedure based on computer-aided molding simulation is presented to optimize the air-trap locations in the speaker enclosure of a mobile phone. The mold flow simulation reveals that the race-tracking phenomenon is the dominant feature of the current mold design. To obtain an optimal filling pattern, local modifications of the wall thickness, such as attaching a flow leader, are treated as the primary control factor, and the gate position and the filling time as secondary control factors. Using a one-at-a-time approach, the last location to be filled in the mold cavity was successfully moved to the extremities of the part, allowing natural venting of entrapped air through the mold parting plane.

Design of High-efficiency Power Amplifier System for High-directional Speaker (고지향성 스피커를 위한 새로운 전력 증폭기 설계)

  • Kim, Jin-Young;Kim, In-Dong;Moon, Wonkyu
    • The Transactions of The Korean Institute of Electrical Engineers / v.66 no.8 / pp.1215-1221 / 2017
  • Parametric array transducers are used as highly directional speakers in air. Piezoelectric micromachined ultrasonic transducers used in parametric arrays need DC-biased driving voltage signals to achieve highly directional, high-quality sound. Existing power amplifiers, such as class A amplifiers, have low efficiency and require large heatsinks. To overcome these disadvantages of conventional amplifiers, this paper proposes a new power amplifier system. The proposed system ensures high linearity of the output characteristic by using a push-pull class B amplifier. It also achieves high efficiency because it includes a DC-DC converter-type power supply that performs energy recovery and envelope tracking. The paper also presents the detailed circuit topology, and its characteristics are verified by detailed experimental results.