• Title/Summary/Keyword: Speaker Detection

108 search results

A Name Recognition Based Call-and-Come Service for Home Robots (가정용 로봇의 호출음 등록 및 인식 시스템)

  • Oh, Yoo-Rhee;Yoon, Jae-Sam;Park, Ji-Hun;Kim, Min-A;Kim, Hong-Kook;Kong, Dong-Geon;Myung, Hyun;Bang, Seok-Won
    • 한국HCI학회:학술대회논문집 / 2008.02a / pp.360-365 / 2008
  • We propose an efficient robot-name registration and recognition method to enable a call-and-come service for home robots. In the proposed registration method, the search space is first restricted using monophone-based acoustic models; the registration of robot names is then completed using triphone-based acoustic models within the restricted search space. Next, a parameter for utterance verification is calculated to reduce the acceptance rate of false calls. In addition, the acoustic models are adapted with a distant-speech database to improve the performance of distant-speech recognition. Moreover, the location of the user is estimated using a microphone array. Experiments on the registration and recognition of robot names show a word accuracy of 98.3%.
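A minimal sketch of the utterance-verification step, assuming per-frame log-likelihoods from the registered-name model and from a filler (anti) model are available; the frame-averaged likelihood ratio and the threshold value are illustrative, not the paper's exact parameter:

```python
def verify_utterance(name_loglik, filler_loglik, threshold=0.5):
    """Accept a detected robot-name call only if the average per-frame
    log-likelihood ratio against a filler model clears a threshold.

    name_loglik / filler_loglik: per-frame log-likelihoods (floats).
    The 0.5 threshold is illustrative, not taken from the paper.
    """
    if len(name_loglik) != len(filler_loglik) or not name_loglik:
        raise ValueError("need equal-length, non-empty frame scores")
    llr = sum(n - f for n, f in zip(name_loglik, filler_loglik)) / len(name_loglik)
    return llr > threshold, llr

# A call whose name model clearly beats the filler model is accepted.
accepted, score = verify_utterance([-2.0, -1.5, -1.8], [-3.0, -2.9, -3.1])
```

Raising the threshold trades missed calls for fewer false acceptances, which is exactly the tuning knob the verification parameter provides.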


Design of Smart Device Assistive Emergency WayFinder Using Vision Based Emergency Exit Sign Detection

  • Lee, Minwoo;Mariappan, Vinayagam;Mfitumukiza, Joseph;Lee, Junghoon;Cho, Juphil;Cha, Jaesang
    • Journal of Satellite, Information and Communications / v.12 no.1 / pp.101-106 / 2017
  • Emergency exit signs are installed in buildings such as shopping malls, hospitals, industrial complexes, and government offices to mark escape routes and help people evacuate safely during emergencies. In emergency conditions such as smoke, fire, poor lighting, or crowd stampedes, it is difficult for people to recognize the exit signs and emergency doors and escape from the building. This paper proposes automatic emergency exit sign recognition on a smart device to find the exit direction. The proposed approach is a computer-vision-based smartphone application that detects exit signs with the device camera and guides the escape direction in visible and audible form. A CAMShift object-tracking approach is used to detect the exit sign, and the direction information is extracted with a template-matching method. The extracted direction is stored as text and synthesized into an audible acoustic signal via text-to-speech, which is rendered on the smart device speaker as escape guidance for the user. The results are analyzed and discussed with respect to the selection of visual elements, the appearance design of EXIT signs, and their placement in buildings, which can serve as a common reference for wayfinder systems.
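The direction-extraction step can be sketched as follows. This toy version shows only the matching criterion on tiny binary grids; the actual system runs CAMShift tracking and template matching (e.g. via OpenCV) on camera frames, and the 3×3 "arrow" templates here are invented for illustration:

```python
def match_direction(region, templates):
    """Pick the exit-sign direction whose template best matches the
    tracked region, using sum of squared differences (lower = better).
    Grids are lists of 0/1 rows; the templates dict is illustrative."""
    def ssd(a, b):
        return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return min(templates, key=lambda name: ssd(region, templates[name]))

# Toy 3x3 "arrow" patterns -- invented, not the paper's data.
TEMPLATES = {
    "left":  [[1, 0, 0],
              [1, 1, 1],
              [1, 0, 0]],
    "right": [[0, 0, 1],
              [1, 1, 1],
              [0, 0, 1]],
}
```

The matched direction string is what would then be handed to the text-to-speech stage.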

Investigation of the Lateral Acoustic Signal Detection Using by Two Fabry-Perot Fiber Optic Sensor Array (두 개의 Fabry-Perot 광섬유 센서 배열을 이용한 횡방향 음압 감지 특성 연구)

  • Lee, Jong kil
    • 대한공업교육학회지 / v.31 no.1 / pp.185-199 / 2006
  • In this paper, a fiber optic sensor array based on Fabry-Perot interferometric sensors was fabricated and tested to detect sound pressure in the lateral direction. The parallel sensor array uses a single light source, split into each sensor through a directional coupler, so the array can be read out without a digital signal processor. As lateral sound sources, tones of 100 Hz, 200 Hz, and 655 Hz from a nondirectional speaker were applied to the array, which was installed on a 60 cm × 60 cm × 60 cm latticed structure. The signals detected by the two sensors were analyzed in the time and frequency domains. The array detected the applied sound sources well, although there were small amplitude differences between the sensors. Because each sensor is simply supported at both ends, a theoretical analysis was performed and its solution is presented. To compare the theoretical and experimental results, a 2 kHz tone was also applied to the sensor array; the experimental results agreed well with the theoretical ones.
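The frequency-domain check that the array picked up the applied tone can be sketched with a plain DFT peak search. The sampling rate, tone length, and brute-force DFT below are illustrative, not the paper's analysis setup:

```python
import math

def dominant_frequency(samples, fs):
    """Return the frequency (Hz) of the largest DFT magnitude bin below
    Nyquist -- a toy stand-in for the spectral analysis used to verify
    which tone the fiber optic sensor picked up."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        re = sum(samples[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(samples[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_bin, best_mag = k, mag
    return best_bin * fs / n

# 400 samples at 8 kHz -> 20 Hz bin width, so a 200 Hz tone lands on bin 10.
fs = 8000
tone = [math.sin(2 * math.pi * 200 * t / fs) for t in range(400)]
```

In practice an FFT would replace the O(n²) loop, but the peak-picking logic is the same.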

Electroglottographic Measurements of Glottal Function in Voice according to Gender and Age

  • Ko, Do-Heung
    • Phonetics and Speech Sciences / v.3 no.1 / pp.97-102 / 2011
  • Electroglottography (EGG) is a common method for non-invasive measurement of glottal activity and has been used in vocal pathology as a clinical and research tool to measure vocal fold contact. This paper presents pitch, jitter, and closed quotient (CQ) measurements from electroglottographic signals of young (mean = 22.7 years) and elderly (mean = 74.3 years) male and female subjects. The sustained corner vowels /i/, /a/, and /u/ were measured at around 70 dB SPL, since phonation intensity is the most notable of the EGG variables and shows a positive correlation with the closed phase. The aim of this paper was to measure EGG data according to age and gender. In CQ, there was a significant difference between young and elderly female subjects, but no significant difference between young and elderly male subjects. The mean value for young males was higher than that for elderly males, whereas the mean value for young females was lower than that for elderly females; that is, in mean values, CQ increased with age for females and decreased with age for males. Although laryngeal degeneration with increasing age seems to occur to a lesser extent in females, the significant increase of CQ in elderly female voices could not be explained in terms of age-related physiological changes. In the standard deviation of pitch and in jitter, the mean values for young and elderly males were higher than those for young and elderly females; that is, male subjects showed higher mean values of these voice variables than female subjects, which could be considered a sign of vocal instability in males. These results may provide powerful insights into the control and regulation of normal phonation and into the detection and characterization of pathology.
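As a rough sketch of how a closed quotient can be read off one EGG cycle: CQ is the fraction of the period during which the signal indicates vocal fold contact. The 50%-of-peak-to-peak criterion below is one common convention and an assumption here, not necessarily the paper's method:

```python
def closed_quotient(egg_cycle, level=0.5):
    """Estimate CQ for a single EGG cycle as the fraction of samples
    above a contact threshold set at `level` of the peak-to-peak range.
    The 50% criterion is one common convention, assumed for illustration."""
    lo, hi = min(egg_cycle), max(egg_cycle)
    thresh = lo + level * (hi - lo)
    closed = sum(1 for s in egg_cycle if s > thresh)
    return closed / len(egg_cycle)

# Toy cycle: contact ("closed") for 4 of 10 samples -> CQ = 0.4.
cycle = [0.0] * 6 + [1.0] * 4
```

Different threshold levels shift the absolute CQ values, which is why the criterion must be fixed before comparing groups.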


Implementation of Prevention and Eradication System for Harmful Wild Animals Based on YOLO (YOLO에 기반한 유해 야생동물 피해방지 및 퇴치 시스템 구현)

  • Min-Uk Chae;Choong-Ho Lee
    • Journal of the Institute of Convergence Signal Processing / v.23 no.3 / pp.137-142 / 2022
  • Every year the number of wild animals appearing in human settlements increases, causing growing damage to property and human life; the damage is especially severe when wild animals appear on highways or at farms. To address the highway problem, ecological corridors and guide fences are being installed, while farms rely on sensor-triggered horns, nets, and repelling by the smell of excrement. However, these methods are expensive and not very effective. In this paper, we use YOLO (You Only Look Once), an AI-based image analysis method, to detect harmful animals in real time and reduce false alarms, with high-brightness LEDs and ultrasonic speakers as the repelling devices. The speaker outputs frequencies that only animals can hear, so that only the wild animals are driven away. The proposed system is built on a general-purpose board so it can be installed economically, and its detection performance is higher than that of existing sensor-based devices.
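The decision logic downstream of the detector can be sketched as a small filter over YOLO-style detections. The class names and the confidence threshold are assumptions for illustration, not values from the paper:

```python
# Illustrative target classes -- the paper's class list is not specified here.
TARGET_CLASSES = {"wild_boar", "water_deer"}

def should_activate(detections, conf_threshold=0.5):
    """Decide whether to fire the LED/ultrasonic deterrent, given a list
    of (class_name, confidence) pairs such as a YOLO model would emit.
    Only confident detections of target animal classes trigger it, which
    is what keeps false alarms (people, pets) from firing the device."""
    return any(cls in TARGET_CLASSES and conf >= conf_threshold
               for cls, conf in detections)
```

Gating on both class and confidence is what distinguishes this from a simple motion sensor, which fires on anything that moves.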

Improvement of Character-net via Detection of Conversation Participant (대화 참여자 결정을 통한 Character-net의 개선)

  • Kim, Won-Taek;Park, Seung-Bo;Jo, Geun-Sik
    • Journal of the Korea Society of Computer and Information / v.14 no.10 / pp.241-249 / 2009
  • Recently, a number of studies on video annotation and representation have been proposed to analyze video for search and abstraction. In this paper, we present picture elements for identifying conversational participants in video, and an enhanced representation of the characters using those elements, collectively called Character-net. Because the previous Character-net decides the participants simply as the characters detected while a script (subtitle) is displayed, it suffers a serious limitation: some listeners are not detected as participants. The participants who carry the story in a video are a very important factor for understanding the context of a conversation. The picture elements for detecting conversational participants consist of six items: subtitle, scene, order of appearance, characters' eyes, patterns, and lip motion. We present how to use these elements to detect conversational participants and how to improve the representation of the Character-net. The conversational participants can be detected accurately when the proposed elements are combined and satisfy certain conditions. The experimental evaluation shows that the proposed method brings significant advantages in both detecting the conversational participants and enhancing the representation of Character-net.
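One way to picture combining the six cues is a weighted score over boolean evidence. The equal weights and the 0.5 threshold below are purely illustrative; the paper combines the cues through specific conditions rather than a weighted sum:

```python
def is_participant(cues, weights=None, threshold=0.5):
    """Decide whether a character is a conversation participant from the
    six picture cues (subtitle, scene, appearance order, eyes, pattern,
    lip motion).  Equal weights and a 0.5 threshold are assumptions for
    illustration; the paper uses hand-crafted combination conditions."""
    names = ("subtitle", "scene", "order", "eyes", "pattern", "lip_motion")
    if weights is None:
        weights = {n: 1.0 / len(names) for n in names}
    score = sum(weights[n] for n in names if cues.get(n, False))
    return score >= threshold
```

The point of using several cues is that a silent listener (no subtitle cue) can still be admitted via eyes, scene, and lip-motion evidence.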

A Study on Lip-reading Enhancement Using Time-domain Filter (시간영역 필터를 이용한 립리딩 성능향상에 관한 연구)

  • 신도성;김진영;최승호
    • The Journal of the Acoustical Society of Korea / v.22 no.5 / pp.375-382 / 2003
  • Lip reading based on bimodal input is used to raise the speech recognition rate in noisy environments, and detecting the correct lip image is the most important step. It is hard to guarantee stable performance in dynamic environments, however, because many factors degrade lip-reading performance: illumination changes, speakers' pronunciation habits, the variability of lip shape, and rotation or size changes of the lips. In this paper, we propose IIR filtering in the time domain for stable performance; digital filtering in the time domain is well suited to removing noise and improving recognition performance. While lip reading over the whole lip image produces massive data, Principal Component Analysis as a pre-processing step reduces the data quantity by extracting features without loss of image information. To observe the recognition performance using image information only, we ran recognition experiments on 22 words available in car services, using Hidden Markov Models as the recognition algorithm to compare the words' recognition performance. As a result, while the recognition rate of lip reading using PCA alone is 64%, applying the time-domain filter raises the recognition rate to 72.4%.
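The time-domain smoothing idea can be sketched as a first-order IIR low-pass filter run along each lip-feature trajectory over the video frames. The filter order and coefficient below are assumptions; the paper's exact IIR design is not given in the abstract:

```python
def iir_smooth(sequence, alpha=0.3):
    """First-order IIR low-pass filter along the time axis of one
    per-frame lip feature: y[t] = alpha * x[t] + (1 - alpha) * y[t-1].
    alpha=0.3 is illustrative; the paper's coefficients are not stated."""
    out = []
    prev = sequence[0]  # initialize state at the first frame's value
    for x in sequence:
        prev = alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out
```

Frame-to-frame jitter in the PCA coefficients (from lighting flicker or tracking noise) is exactly the high-frequency component such a filter suppresses before HMM decoding.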

Development of Street Crossing Assistive Embedded System for the Visually-Impaired Using Machine Learning Algorithm (머신러닝을 이용한 시각장애인 도로 횡단 보조 임베디드 시스템 개발)

  • Oh, SeonTaek;Jeong, Kidong;Kim, Homin;Kim, Young-Keun
    • Journal of the HCI Society of Korea / v.14 no.2 / pp.41-47 / 2019
  • In this study, a smart assistive device is designed to recognize pedestrian signals and provide audio instructions so that visually impaired people can cross streets safely. Walking alone is one of the biggest challenges for the visually impaired and deteriorates their quality of life. The proposed device has a camera attached to a pair of glasses; it detects traffic lights, recognizes pedestrian signals in real time using a machine learning algorithm on a GPU board, and provides audio instructions to the user. For portability, the device is designed to be compact and light but with sufficient battery life. The embedded processor is wired to the small camera attached to the glasses, and on the inner side of one temple of the glasses a bone-conduction speaker is installed, which delivers audio instructions without blocking external sounds, for safety reasons. The performance of the proposed device was validated in experiments: it showed 87.0% recall and 100% precision for detecting the pedestrian green light, and 94.4% recall and 97.1% precision for detecting the pedestrian red light.
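The reported figures are standard precision and recall; a minimal sketch of the computation follows. The raw counts used in the example are illustrative, chosen only to reproduce the green-light numbers, and are not from the paper:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN) -- the metrics
    used to report the green/red pedestrian-light detection results.
    Returns (precision, recall), with 0.0 on empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative counts: 87 hits, 0 false alarms, 13 misses would yield
# the reported 100% precision / 87.0% recall for the green light.
p, r = precision_recall(tp=87, fp=0, fn=13)
```

High precision with lower recall is the safer trade-off here: a missed green light only delays the user, while a false "walk" instruction would be dangerous.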