• Title/Summary/Keyword: Speech Detection

Search Result 472, Processing Time 0.025 seconds

Automatic detection and severity prediction of chronic kidney disease using machine learning classifiers (머신러닝 분류기를 사용한 만성콩팥병 자동 진단 및 중증도 예측 연구)

  • Jihyun Mun;Sunhee Kim;Myeong Ju Kim;Jiwon Ryu;Sejoong Kim;Minhwa Chung
    • Phonetics and Speech Sciences
    • /
    • v.14 no.4
    • /
    • pp.45-56
    • /
    • 2022
  • This paper proposes an optimal methodology for automatically diagnosing and predicting the severity of the chronic kidney disease (CKD) using patients' utterances. In patients with CKD, the voice changes due to the weakening of respiratory and laryngeal muscles and vocal fold edema. Previous studies have phonetically analyzed the voices of patients with CKD, but no studies have been conducted to classify the voices of patients. In this paper, the utterances of patients with CKD were classified using the variety of utterance types (sustained vowel, sentence, general sentence), the feature sets [handcrafted features, extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS), CNN extracted features], and the classifiers (SVM, XGBoost). Total of 1,523 utterances which are 3 hours, 26 minutes, and 25 seconds long, are used. F1-score of 0.93 for automatically diagnosing a disease, 0.89 for a 3-classes problem, and 0.84 for a 5-classes problem were achieved. The highest performance was obtained when the combination of general sentence utterances, handcrafted feature set, and XGBoost was used. The result suggests that a general sentence utterance that can reflect all speakers' speech characteristics and an appropriate feature set extracted from there are adequate for the automatic classification of CKD patients' utterances.

Generalized cross correlation with phase transform sound source localization combined with steered response power method (조정 응답 파워 방법과 결합된 generalized cross correlation with phase transform 음원 위치 추정)

  • Kim, Young-Joon;Oh, Min-Jae;Lee, In-Sung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.36 no.5
    • /
    • pp.345-352
    • /
    • 2017
  • We propose a methods which is reducing direction estimation error of sound source in the reverberant and noisy environments. The proposed algorithm divides speech signal into voice and unvoice using VAD. We estimate the direction of source when current frame is voiced. TDOA (Time-Difference of Arrival) between microphone array using the GCC-PHAT (Generalized Cross Correlation with Phase Transform) method will be estimated in that frame. Then, we compare the peak value of cross-correlation of two signals applied to estimated time-delay with other time-delay in time-table in order to improve the accuracy of source location. If the angle of current frame is far different from before and after frame in successive voiced frame, the angle of current frame is replaced with mean value of the estimated angle in before and after frames.

A Driving Information Centric Information Processing Technology Development Based on Image Processing (영상처리 기반의 운전자 중심 정보처리 기술 개발)

  • Yang, Seung-Hoon;Hong, Gwang-Soo;Kim, Byung-Gyu
    • Convergence Security Journal
    • /
    • v.12 no.6
    • /
    • pp.31-37
    • /
    • 2012
  • Today, the core technology of an automobile is becoming to IT-based convergence system technology. To cope with many kinds of situations and provide the convenience for drivers, various IT technologies are being integrated into automobile system. In this paper, we propose an convergence system, which is called Augmented Driving System (ADS), to provide high safety and convenience of drivers based on image information processing. From imaging sensor, the image data is acquisited and processed to give distance from the front car, lane, and traffic sign panel by the proposed methods. Also, a converged interface technology with camera for gesture recognition and microphone for speech recognition is provided. Based on this kind of system technology, car accident will be decreased although drivers could not recognize the dangerous situations, since the system can recognize situation or user context to give attention to the front view. Through the experiments, the proposed methods achieved over 90% of recognition in terms of traffic sign detection, lane detection, and distance measure from the front car.

A Study on Area Detection Using Transfer-Learning Technique (Transfer-Learning 기법을 이용한 영역검출 기법에 관한 연구)

  • Shin, Kwang-seong;Shin, Seong-yoon
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2018.10a
    • /
    • pp.178-179
    • /
    • 2018
  • Recently, methods of using machine learning in artificial intelligence such as autonomous navigation and speech recognition have been actively studied. Classical image processing methods such as classical boundary detection and pattern recognition have many limitations in order to recognize a specific object or area in a digital image. However, when a machine learning method such as deep-learning is used, Can be obtained. However, basically, a large amount of learning data must be secured for machine learning such as deep-learning. Therefore, it is difficult to apply the machine learning for area classification when the amount of data is very small, such as aerial photographs for environmental analysis. In this study, we apply a transfer-learning technique that can be used when the dataset size of the input image is small and the shape of the input image is not included in the category of the training dataset.

  • PDF

Fast Algorithm for Recognition of Korean Isolated Words (한국어 고립단어인식을 위한 고속 알고리즘)

  • 남명우;박규홍;정상국;노승용
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.1
    • /
    • pp.50-55
    • /
    • 2001
  • This paper presents a korean isolated words recognition algorithm which used new endpoint detection method, auditory model, 2D-DCT and new distance measure. Advantages of the proposed algorithm are simple hardware construction and fast recognition time than conventional algorithms. For comparison with conventional algorithm, we used DTW method. At result, we got similar recognition rate for speaker dependent korean isolated words and better it for speaker independent korean isolated words. And recognition time of proposed algorithm was 200 times faster than DTW algorithm. Proposed algorithm had a good result in noise environments too.

  • PDF

Implementation of Adaptive Multi Rate (AMR) Vocoder for the Asynchronous IMT-2000 Mobile ASIC (IMT-2000 비동기식 단말기용 ASIC을 위한 적응형 다중 비트율 (AMR) 보코더의 구현)

  • 변경진;최민석;한민수;김경수
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.1
    • /
    • pp.56-61
    • /
    • 2001
  • This paper presents the real-time implementation of an AMR (Adaptive Multi Rate) vocoder which is included in the asynchronous International Mobile Telecommunication (IMT)-2000 mobile ASIC. The implemented AMR vocoder is a multi-rate coder with 8 modes operating at bit rates from 12.2kbps down to 4.75kbps. Not only the encoder and the decoder as basic functions of the vocoder are implemented, but VAD (Voice Activity Detection), SCR (Source Controlled Rate) operation and frame structuring blocks for the system interface are also implemented in this vocoder. The DSP for AMR vocoder implementation is a 16bit fixed-point DSP which is based on the TeakLite core and consists of memory block, serial interface block, register files for the parallel interface with CPU, and interrupt control logic. Through the implementation, we reduce the maximum operating complexity to 24MIPS by efficiently managing the memory structure. The AMR vocoder is verified throughout all the test vectors provided by 3GPP, and stable operation in the real-time testing board is also proved.

  • PDF

A Study on the Robust Sound Localization System Using Subband Filter Bank (서브밴드 필터 뱅크를 이용한 강인한 음원 추적시스템에 대한 연구)

  • 박규식;박재현;온승엽;오상헌
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.1
    • /
    • pp.36-42
    • /
    • 2001
  • This paper propose new sound localization algorithm that detects the sound source bearing in a closed office environment using two microphone array. The proposed Subband CPSP (Cross Power Spectrum Phase) algorithm is a development of previously Down CPSP method using subband approach. It first split the received microphone signals into subbands and then calculates subband CPSP which result in possible source bearings. This type of algorithm, Subband CPSP, can provide more robust and reliable sound localization system because it limits the effects of environmental noise within each subband. To verify the performance of the proposed Subband CPSP algorithm, a real time simulation was conducted and it was compared with previous CPSP method. From the simulation results, the proposed Subband CPSP is superior to previous CPSP algorithm more than 5% average accuracy for sound source detection.

  • PDF

An User-Friendly Kiosk System Based on Deep Learning (딥러닝 기반 사용자 친화형 키오스크 시스템)

  • Su Yeon Kang;Yu Jin Lee;Hyun Ah Jung;Seung A Cho;Hyung Gyu Lee
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.29 no.1
    • /
    • pp.1-13
    • /
    • 2024
  • This study aims to provide a customized dynamic kiosk screen that considers user characteristics to cope with changes caused by increased use of kiosks. In order to optimize the screen composition according to the characteristics of the digital vulnerable group such as the visually impaired, the elderly, children, and wheelchair users, etc., users are classified into nine categories based on real-time analysis of user characteristics (wheelchair use, visual impairment, age, etc.). The kiosk screen is dynamically adjusted according to the characteristics of the user to provide efficient services. This study shows that the system communication and operation were performed in the embedded environment, and the used object detection, gait recognition, and speech recognition technologies showed accuracy of 74%, 98.9%, and 96%, respectively. The proposed technology was verified for its effectiveness by implementing a prototype, and through this, this study showed the possibility of reducing the digital gap and providing user-friendly "barrier-free kiosk" services.

Analysis of auditory temporal processing in within- and cross-channel gap detection thresholds for low-frequency pure tones (저주파수 순음에 대한 within- 및 cross-channel gap detectin thresholds를 이용한 auditory temporal processing 특성 연구)

  • Koo, Sungmin;Lim, Dukhwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.1
    • /
    • pp.58-63
    • /
    • 2022
  • This study was conducted to examine the characteristics of pitch perception and temporal resolution through Within-/Cross-Channel Gap Detection Thresholds (WC/CC GDTs) using low-frequency pure tones (such as 264 Hz, 373 Hz and 528 Hz related to C4, C4#, and C5 musical tones. 40 young people and 20 elderly people with normal hearing participated in this study. The results of WC GDTs were approximately 2 ms ~ 4 ms threshold values regardless of frequencies in two groups. There was no statistically significant difference in WC GDTs between groups. In both groups, CC GDTs were larger than WC GDTs, and as the frequency difference increased, the CC GDTs also increased. In particular, in the comparison between groups of CC GDTs, the results of the elderly group were 8 times ~ 10 times larger than that of the young group, and there was a statistically significant difference between the groups. These data also showed a different trend of GDTs in comparison with the previous data obtained from musical stimuli.This study suggests that GDTs may influence pitch perception mechanisms and can be used as psychoacoustic evidence for nonlinear responses of auditory nervous system.

Text Extraction from Complex Natural Images

  • Kumar, Manoj;Lee, Guee-Sang
    • International Journal of Contents
    • /
    • v.6 no.2
    • /
    • pp.1-5
    • /
    • 2010
  • The rapid growth in communication technology has led to the development of effective ways of sharing ideas and information in the form of speech and images. Understanding this information has become an important research issue and drawn the attention of many researchers. Text in a digital image contains much important information regarding the scene. Detecting and extracting this text is a difficult task and has many challenging issues. The main challenges in extracting text from natural scene images are the variation in the font size, alignment of text, font colors, illumination changes, and reflections in the images. In this paper, we propose a connected component based method to automatically detect the text region in natural images. Since text regions in mages contain mostly repetitions of vertical strokes, we try to find a pattern of closely packed vertical edges. Once the group of edges is found, the neighboring vertical edges are connected to each other. Connected regions whose geometric features lie outside of the valid specifications are considered as outliers and eliminated. The proposed method is more effective than the existing methods for slanted or curved characters. The experimental results are given for the validation of our approach.