  • Title/Summary/Keyword: Voice, Sound

Search results: 338

Cyber Threats Analysis of AI Voice Recognition-based Services with Automatic Speaker Verification (화자식별 기반의 AI 음성인식 서비스에 대한 사이버 위협 분석)

  • Hong, Chunho;Cho, Youngho
    • Journal of Internet Computing and Services / v.22 no.6 / pp.33-40 / 2021
  • Automatic speech recognition (ASR) is a technology that converts human speech sounds into speech signals and then automatically transcribes them into character strings that humans can understand. Speech recognition has evolved from recognizing single words to recognizing sentences made up of multiple words. In real-time voice conversation, a high recognition rate makes information delivery more natural and convenient and widens the scope of voice-based applications. At the same time, as speech recognition is deployed more widely, concerns about related cyber attacks and threats are growing. Existing research concentrates on developing the technology itself, such as designing automatic speaker verification (ASV) techniques and improving their accuracy, while in-depth and varied analyses of attacks and threats remain scarce. In this study, we propose a cyber attack model that bypasses voice authentication simply by manipulating voice frequency and voice speed against AI voice recognition services equipped with automated identification technology, and we analyze the resulting threats through extensive experiments on the automated identification systems of commercial smartphones. With this work we aim to draw attention to the seriousness of these threats and to stimulate research on effective countermeasures.

A Study on the Acoustic Characteristics of the Pansori by Voice Signals Analysis (음성신호 분석에 의한 판소리의 음성학적 특징 연구)

  • Kim, HyunSook
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.7 / pp.3218-3222 / 2013
  • Pansori is a traditional Korean vocal art whose originality and excellence as a comprehensive art of sound, dialogue, and gesture have earned it global recognition as a world intangible heritage. In particular, pansori is valued for its high artistic merit, with shrewd and humorous expression and audience participation, and for its role in social integration as an art enjoyed across all social classes. In this paper, we apply speech signal analysis techniques to the five madang (works) of pansori to extract its acoustic features and study how they relate to the society and era it represents. The experiments analyze the spectrogram, pitch, stability, and intensity of the five madang. The results indicate that pansori is delivered in a stable, loud voice with high voice-wave energy and a wide range of vocal cord tremor variation, characteristics suited to telling a comical story while keeping the audience focused and interested.

A Study on the Sasang Constitutional Symptom of Taeumin by Voice Characteristics (음향특성에 따른 태음인 체질병증(體質病證) 연구(硏究))

  • Kim, Dal-Rae
    • Journal of Sasang Constitution and Immune Medicine / v.19 no.1 / pp.90-97 / 2007
  • 1. Objectives and Methods: This study investigated how sound parameters differ between the Liver Heat Symptom and the Esophagus Symptom of Taeumin, using PSSC (Phonetic System of Sasang Constitution) on a spoken sentence. The participants were 20 Korean adult males: 10 with the Liver Heat Symptom and 10 with the Esophagus Symptom of Taeumin. 2. Results: In the Pitch, APQ, and Shimmer segments, there were no significant differences between the two groups. In the Octave segment, Octave 1, Octave 3, Octave 4, and Octave 6 were significantly higher in the Liver Heat Symptom group than in the Esophagus Symptom group. In the Energy segment, FreQ Domain Total Sum / cnt(0), 0k-2k Total Sum, 0k-2k sum dev., 2k-4k Total Sum, 2k-4k sum dev., A# Tot E, B__TOT_E, C__TOT_E, C# Tot E, D__TOT_E, A sum dev., A# sum dev., B sum dev., C sum dev., C# sum dev., D sum dev., D# sum dev., E sum dev., F sum dev., F# sum dev., G sum dev., and G# sum dev. were significantly higher in the Liver Heat Symptom group. In the Voice Recording Time segment, Total Voice Recording Time, Voice Recording Time, Divide By Time3, Divide By Energy10, Total Unit, Max Unit Position, and U_0 TO 3 were significantly higher in the Liver Heat Symptom group. 3. Conclusion: These results suggest that voice characteristics may serve as an effective guide to the constitutional symptoms of Taeumin. Further study of Soeumin, Soyangin, and Taeyangin symptoms is needed to determine Sasang constitution using PSSC and to make PSSC effective.

An Implementation of Multimodal Speaker Verification System using Teeth Image and Voice on Mobile Environment (이동환경에서 치열영상과 음성을 이용한 멀티모달 화자인증 시스템 구현)

  • Kim, Dong-Ju;Ha, Kil-Ram;Hong, Kwang-Seok
    • Journal of the Institute of Electronics Engineers of Korea CI / v.45 no.5 / pp.162-172 / 2008
  • In this paper, we propose a multimodal speaker verification method that uses teeth images and voice as biometric traits for personal verification on mobile terminal equipment. The proposed method acquires the biometric traits with the image and sound input devices of a smartphone, one kind of mobile terminal, and performs verification with them. To improve overall performance, the method combines the two biometric authentication scores in a multimodal fashion. For fusion it uses a weighted-summation method, which has a comparatively simple structure and good performance given the limited resources of the system. The proposed multimodal speaker authentication system was evaluated on a database collected with a smartphone from 40 subjects. The experiments show an EER of 8.59% for teeth verification, 11.73% for voice verification, and 4.05% for multimodal speaker authentication. The simple weighted-summation fusion thus performs better than either teeth or voice verification alone.
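
The weighted-summation fusion and the EER metric mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes both matchers output scores on a common scale where higher means a better match. The function names, the default weight, and the threshold sweep are hypothetical choices for illustration, not the authors' implementation.

```python
def fuse_scores(teeth_score, voice_score, weight=0.5):
    """Weighted-sum fusion of two match scores on a common scale.

    `weight` is the share given to the teeth score (assumed value, not
    taken from the paper); the voice score gets the remainder.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * teeth_score + (1.0 - weight) * voice_score


def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over the observed scores.

    A trial is accepted when score >= threshold. The EER is estimated as
    the mean of FAR and FRR at the threshold where the two are closest.
    """
    best_gap, eer = None, None
    for threshold in sorted(set(genuine) | set(impostor)):
        frr = sum(s < threshold for s in genuine) / len(genuine)
        far = sum(s >= threshold for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

In practice the weight would be tuned on held-out data, and scores from the two matchers would be normalized before fusion. Both steps are omitted here.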

The Therapeutic Effects of SKTCLP(R) in Patients with Mutational Dysphonia (생리적 발성 기법의 변성발성장애 치료 적용 효과)

  • Kim, Seong-Tae;Nam, Soon-Yuhl
    • Phonetics and Speech Sciences / v.3 no.2 / pp.99-105 / 2011
  • Vegetative phonation is typically considered useful in treating patients with mutational dysphonia, but its effect has not yet been studied. This study attempts to identify the effect of SKTCLP(R), which uses throat clearing and laughing, in patients with mutational dysphonia. The study, which was designed by the author, included 26 patients aged 14 to 32 years (mean: 18.7 years) who had been diagnosed with mutational dysphonia between January 2007 and June 2010. Voice therapy for these patients consisted of SKTCLP(R) over two to seven sessions (mean: 3.8 sessions). Results were evaluated by videostroboscopy, perceptual evaluation on the GRBAS scale, aerodynamic testing, and acoustic analysis before and after therapy. Most patients could phonate at a low pitch from the first session and sustain a normal-pitch sound by the last session. We found that the glottic gap and the anterior-posterior compression of the superior laryngeal part observed at the first visit were reduced after therapy, and these patients showed complete closure of the glottis after treatment. Acoustic and aerodynamic measures after treatment showed significant decreases in F0, Jitter, Shimmer, SFF, and SPI, and significant increases in MPT, Psub, and vocal efficiency (p<.05). SKTCLP(R) may be a useful treatment method for managing mutational dysphonia. We suggest that the technique may also help improve voice quality in other functional dysphonias with a glottal chink and in functional aphonia.
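
Jitter and shimmer, two of the acoustic measures reported in this abstract, are cycle-to-cycle perturbation measures. The sketch below assumes the common "local" definition (mean absolute difference between consecutive cycles divided by the mean, in percent). The abstract does not say which variant or analysis software the authors used, so the function name and definition are assumptions.

```python
def local_perturbation_percent(values):
    """Mean absolute difference between consecutive cycles, relative to the mean.

    Pass glottal cycle periods to get local jitter (%), or per-cycle peak
    amplitudes to get local shimmer (%).
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_abs_diff / mean_value
```

A perfectly periodic voice gives 0% on both measures, and a decrease after therapy indicates steadier vocal fold vibration.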

Development of a Monitoring and Forecasting System for the Delivery of Pregnant Sow (임신돈의 분만 감시 및 예측 시스템 개발)

  • 임영일
    • Journal of Animal Environmental Science / v.6 no.1 / pp.15-22 / 2000
  • A monitoring and forecasting system for swine delivery was developed using a CCD camera, a multi-function board, a microphone, and a data recorder attached to a personal computer. Four monitoring and forecasting factors were selected: genitalia, body shape, breast color, and sound. Images of the physical changes in body shape, the shape and color of the genital area, and the breast color of the pregnant sow were captured with the CCD color camera and the multi-function board, and changes in the sow's voice were acquired with the microphone and data recorder. The acquired image and voice information was analyzed with a custom-developed algorithm and program. The forecasting efficiency for swine delivery was 89%, 71%, and 100% using the variation of the genital area, the body shape, and the voice of the pregnant sow, respectively. Image processing detected delivery with 100% efficiency once half of the piglet's body had emerged from the sow's genitalia. The system immediately notified the farm manager of the estimated delivery time when the estimated time was the same as or earlier than the time set by the farm manager, and when the system detected the delivery.

Comparison and Analysis of Response of Premature Infants to Auditory Stimulus (일변량 분산 분석과 이변량 시계열 분석을 이용한 미숙아의 목소리 자극에 대한 심박동수와 호흡수 반응의 비교)

  • Lee, Hye-Jung
    • Child Health Nursing Research / v.15 no.3 / pp.261-270 / 2009
  • Purpose: This study compared the results of one-way ANOVA with those of cross-correlation time series analysis in evaluating the physiologic responses of premature infants to human voices. Methods: Four premature infants born before 32 weeks of gestational age were included. The Gould 4000TA Recording System recorded each infant's heart rate and respiratory rate while the infant listened to a pre-recorded voice. Each infant listened to both a male and a female voice (1 min each) at each testing session. Results: For some infants, the results of one-way ANOVA and of cross-correlation time series analysis of the heart and respiratory rate data were not consistent. Cross-correlation time series analysis revealed that the infants' responses to vocal stimulation began a varying number of seconds after the stimulus was presented and lasted for over 20-30 sec. Conclusion: The results indicate that time series analysis can provide more detailed information on the rapidly changing physiologic status of premature infants in response to auditory stimuli. In addition, the results offer insight into the auditory responsivity of premature infants to a naturally occurring sound, the human voice, in the neonatal intensive care unit.
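
The idea behind the cross-correlation analysis in this abstract, finding how many seconds after a stimulus a response appears, can be sketched as a lagged Pearson correlation. This is a simplified illustration: the function names are hypothetical, both series are assumed to be equally long and sampled once per second, and the prewhitening step that a full time series analysis would normally include is left out.

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def cross_correlation(stimulus, response, max_lag):
    """Correlate stimulus[t] with response[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        aligned_stimulus = stimulus[:len(stimulus) - lag]
        aligned_response = response[lag:]
        result[lag] = pearson(aligned_stimulus, aligned_response)
    return result
```

The lag with the largest absolute correlation estimates the response delay, for example `max(result, key=lambda lag: abs(result[lag]))`. A one-way ANOVA on averaged blocks cannot show this timing, which is the contrast the abstract draws.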

A Study on the underwater communication system of ultrasonic transducer (압전 초음파 센서를 이용한 수중통신에 관한 연구)

  • Kim, Dong-Hyun;Woo, Hyoung-Gwan;Hwang, Hyun-Suk;Jin, Hong-Bum;Song, Joon-Tae
    • Proceedings of the KIEE Conference / 2000.07c / pp.1658-1660 / 2000
  • Underwater communication has usually relied on exchanging simple signs. As people now need more information for underwater activities, the need for underwater communication systems that carry the human voice is increasing. The purpose of this paper is to understand the general characteristics of underwater communication and to investigate the conditions required for a good underwater communication system by building a basic communication module. The experiment applies AM (amplitude modulation), which is widely used in underwater communication systems, with common ultrasonic transducers. Ultrasonic transducers usually have a narrow bandwidth for converting electrical energy into mechanical energy. To improve sound reconstruction, the transducers need a wider bandwidth that covers the frequency range of the voice, and good linearity within that range. Because underwater transmission involves many factors that distort signals, amplitude modulation is not well suited to underwater communication. Transmitting a digital signal obtained by sampling the human voice should suit such systems better, because digital communication simplifies the transmitted signals.
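
The amplitude modulation scheme described in this abstract can be sketched in a few lines of Python. The sketch assumes a message normalized to [-1, 1], a sample rate that is an integer multiple of the carrier frequency, and coherent (synchronous) demodulation with a one-period moving average. The abstract does not state the carrier frequency, modulation depth, or demodulator used, so all of these are illustrative assumptions.

```python
import math


def am_modulate(message, carrier_hz, sample_rate, depth=0.5):
    """Return s[n] = (1 + depth * m[n]) * cos(2*pi*fc*n/fs)."""
    step = 2.0 * math.pi * carrier_hz / sample_rate
    return [(1.0 + depth * m) * math.cos(step * n)
            for n, m in enumerate(message)]


def am_demodulate(signal, carrier_hz, sample_rate, depth=0.5):
    """Coherent demodulation: mix with the carrier, then average over one period.

    Averaging cos^2 over a whole carrier period gives 1/2, so
    2 * average - 1 recovers depth * m for a slowly varying message.
    """
    step = 2.0 * math.pi * carrier_hz / sample_rate
    period = round(sample_rate / carrier_hz)  # samples per carrier cycle
    mixed = [s * math.cos(step * n) for n, s in enumerate(signal)]
    recovered = []
    for start in range(len(mixed) - period + 1):
        average = sum(mixed[start:start + period]) / period
        recovered.append((2.0 * average - 1.0) / depth)
    return recovered
```

Any gain change or distortion added by the channel appears directly in the recovered amplitude. This matches the abstract's observation that AM copes poorly with underwater distortion, and it is one reason a sampled digital signal is suggested instead.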

Plan for the Development of a Standardized Dummy for Persons in Need of Rescue in a Confined Space (밀폐공간 구조 요구자를 위한 더미 표준화 개발 방안)

  • Choi, Seo-Yeon;Rie, Dong-Ho;Kim, Hyung-Jun
    • Journal of the Korea Safety Management & Science / v.18 no.4 / pp.99-105 / 2016
  • This study developed a dummy that reproduces conditions similar to the human body, prepared an evaluation standard, and presented the production process, in order to evaluate the performance of robots that detect persons needing rescue in confined spaces, where fire-fighting officials have difficulty rescuing them during fires and disasters. As a result, an evaluation standard was developed and standardized into four stages, 'Normal,' 'Risk Stage 1,' 'Risk Stage 2,' and 'Risk Stage 3,' based on the number of breath cycles, carbon dioxide concentration, core temperature, and hearing criteria for recognizing a voice. In addition, the heat generation, breathing capacity, and voice output functions needed to produce the dummy were compared and analyzed. The study is significant in that it establishes basic data on how to produce an actual dummy by presenting the characteristics and control methods of a waterproof insulated heating coil for the heat function, a solenoid valve for continuous output of breathing capacity, and a USB program sound board for voice output.

Ortho-phonic Alphabet Creation by the Musical Theory and its Segmental Algorithm (악리론으로 본 정음창제와 정음소 분절 알고리즘)

  • Chin, Yong-Ohk;Ahn, Cheong-Keung
    • Speech Sciences / v.8 no.2 / pp.49-59 / 2001
  • Phoneme segmentation is a very difficult problem in speech sound processing because a segmentation algorithm has to be worked out across many kinds of allophones and coarticulation trees. As a result, systems for speech recognition and voice retrieval have complex structures. To address this, we discuss the possibility of a new segmentation algorithm based on the tripartite rule of subtracting or adding one third (삼분손익) in the twelve-tone temperament (12 율려), first proposed by Prof. T. S. Han. The approach is close to both Eastern and Western musical theory. He also suggested that Hunminjungum (훈민정음), invented by King Sejong in the 15th century, contains 3 consonant and 3 vowel phonemes. In this paper, we propose naming this unit the ortho-phonic phoneme (OPP/정음소), which carries the meaning of 'absoluteness and independence'. OPP is also applicable to other languages, for example through the IPA. Finally, we find that the algorithm applies consistently to languages worldwide and is very useful for building voice recognition and retrieval systems.
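
The rule of subtracting or adding one third (삼분손익) named in this abstract is the classical way of generating the twelve pitches from a reference pipe or string length. The Python sketch below shows only that classical rule, with lengths kept within one octave. It is not the authors' segmentation algorithm, which the abstract does not describe in enough detail to reproduce, and the function name is hypothetical.

```python
from fractions import Fraction


def twelve_pitch_lengths(count=12):
    """Generate relative lengths by repeatedly removing or adding one third.

    Start from length 1. At each step remove one third (x 2/3); if that
    would drop below half the reference length (past the octave), add one
    third instead (x 4/3). Frequency ratios are the reciprocals.
    """
    lengths = [Fraction(1)]
    while len(lengths) < count:
        candidate = lengths[-1] * Fraction(2, 3)      # subtract one third
        if candidate < Fraction(1, 2):
            candidate = lengths[-1] * Fraction(4, 3)  # add one third
        lengths.append(candidate)
    return lengths
```

The first five lengths are 1, 2/3, 8/9, 16/27, and 64/81, which correspond to stacking perfect fifths as in Western Pythagorean tuning. This is consistent with the abstract's remark that the rule sits close to both Eastern and Western musical theory.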
