• Title/Summary/Keyword: Speech Processing

The Low Cost Implementation of Speech Recognition System for the Web (웹에서의 저가 음성인식 시스템의 구현)

  • Park, Yong-Beom;Park, Jong-Il
    • The Transactions of the Korea Information Processing Society / v.6 no.4 / pp.1129-1135 / 1999
  • Isolated word recognition using the dynamic time warping (DTW) algorithm has shown a good recognition rate in speaker-dependent environments. In practice, however, it is hard to implement because the search time of DTW increases rapidly as the amount of search data grows. In a context-dependent short-query system, such as an educational children's workbook on the Web, the number of possible responses to a specific question is limited, so the search space for the answers can be reduced depending on the question. This paper proposes a low-cost implementation method using DTW for the Web. To cover the weakness of DTW, the search space is reduced by context: the search space, which depends on the specific question, is chosen from the searchable candidates of interest. In a real implementation, the proposed method shows better performance in both time and recognition rate.

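The context-restricted search described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-dimensional feature sequences, an absolute-difference local cost, and hypothetical names (`TEMPLATES`, `QUESTION_CANDIDATES`, `recognize`). The point it shows is that the query is compared only against the templates allowed for the current question, so the number of DTW evaluations grows with the candidate list rather than with the whole vocabulary.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(query, templates, candidates):
    """Return the candidate word whose template is closest to the query."""
    return min(candidates, key=lambda w: dtw_distance(query, templates[w]))


# Hypothetical toy templates; a real system would store feature vectors.
TEMPLATES = {
    "yes": [1.0, 3.0, 4.0, 3.0, 1.0],
    "no": [4.0, 2.0, 1.0, 1.0],
    "one": [1.0, 1.0, 2.0, 5.0],
    "two": [5.0, 4.0, 4.0, 2.0],
}

# Hypothetical mapping from a question to its admissible answers.
QUESTION_CANDIDATES = {
    "q_yes_no": ["yes", "no"],
    "q_number": ["one", "two"],
}

if __name__ == "__main__":
    query = [1.0, 3.0, 3.0, 4.0, 3.0, 1.0]  # a time-stretched "yes"
    print(recognize(query, TEMPLATES, QUESTION_CANDIDATES["q_yes_no"]))
```

With two candidates per question, each query costs two DTW evaluations instead of four; the saving scales with the ratio of vocabulary size to candidate-list size.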

Training Network Design Based on Convolution Neural Network for Object Classification in few class problem (소 부류 객체 분류를 위한 CNN기반 학습망 설계)

  • Lim, Su-chang;Kim, Seung-Hyun;Kim, Yeon-Ho;Kim, Do-yeon
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.1 / pp.144-150 / 2017
  • Recently, deep learning has been used for intelligent processing and for improving accuracy on data. It is a computational model composed of multiple data-processing layers that learn data representations through various levels of abstraction. The convolutional neural network (CNN), a category of deep learning, is used in various research fields such as human pose estimation, face recognition, image classification, and speech recognition. With deep layers and many classes, CNNs that perform well on image classification obtain a higher classification rate, but they suffer from overfitting when only a small amount of data is used. We therefore design a training network based on a CNN and train it on our image data set for object classification in a few-class problem. In the experiments, the classification rate is 7.06% higher on average than that of previous networks designed to classify objects in a 1000-class problem.

Design of Programmable SC Filter (프로그램 가능한 SC Filter의 설계)

  • 이병수;이종악
    • The Journal of Korean Institute of Communications and Information Sciences / v.11 no.3 / pp.172-178 / 1986
  • The recent interest in the design of filters is motivated by the fact that such filters can be fully integrated using standard metal-oxide-semiconductor processing technology. This is possible because all the resistors in the active RC filter network are replaced by switched capacitors. The voltage gain of an SC filter depends only on capacitance ratios, and these ratios can be obtained and maintained to high accuracy. Therefore, a switched capacitor is known to be much better than a resistor in temperature and linearity characteristics. This paper proposes a programmable SC filter and shows that $\omega_0$, Q, and G of the circuit can be controlled by a digital signal. Experiments show that the SC filter retains low sensitivities but cannot avoid a slight influence of parasitic capacitance. Because the transfer characteristic of the SC filter varies with the sampling frequency and the resistor array, the SC filtering technique can be applied to digital processing, speech analysis and synthesis, and so on.

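The dependence on capacitance ratios can be made concrete with the standard textbook approximation for a switched capacitor. This is general background rather than a formula from the paper, and the symbols $C_1$, $C_2$, and $f_s$ are introduced here for illustration. A capacitor $C_1$ switched at clock frequency $f_s$ between a voltage $V$ and ground moves a charge $C_1 V$ per cycle, so for signals well below $f_s$ it behaves like a resistor:

$$
I_{\text{avg}} = C_1 V f_s
\quad\Rightarrow\quad
R_{\text{eq}} = \frac{V}{I_{\text{avg}}} = \frac{1}{f_s C_1}
$$

Replacing the resistor of an RC integrator with feedback capacitor $C_2$ then gives a characteristic frequency set by the clock and a capacitor ratio only:

$$
\omega = \frac{1}{R_{\text{eq}} C_2} = f_s \, \frac{C_1}{C_2}
$$

Under this approximation, changing $f_s$ or switching capacitors in and out of an array changes the filter response without relying on absolute component values.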

Design of The Loudness Ratings And Talker Echo For ISDN Telephone (ISDN 전화기의 음량 정격 및 송화자 에코설계)

  • Hong, Jin-Woo;Kang, Kyeong-Ok;Kang, Seong-Hoon
    • The Journal of the Acoustical Society of Korea / v.13 no.2E / pp.32-40 / 1994
  • The purpose of this paper is to describe methods for establishing the loudness ratings and talker echo, among the transmission quality factors, of an ISDN telephone connected to a fully digital network. To design desirable loudness ratings and talker echo for an ISDN telephone, a model system of digital speech communication for subjective tests is developed. Using this model system, opinion tests are performed to decide the optimal CODEC input level, the range of the overall loudness rating, the sidetone masking rating, and the talker echo. From the test results, we determined loudness ratings of 6 to 8 dB for sending and 0 to 2 dB for receiving, and a sidetone masking rating of 8 to 12 dB. In addition, a terminal coupling loss (TCLw) of at least 40 dB is necessary to provide echo-free telephone communication to telephone users when the overall loudness rating of the ISDN telephone is normalized to 10 dB.

An Adaptive AEC Based on the Wavelet Transform Using M-channel Subband QMF Filter Banks (M-채널 서브밴드 QMF 필터뱅크를 이용한 웨이브릿변환기반 적응 음향반향제거기)

  • 안주원;권기룡;문광석;김문수
    • Journal of Korea Multimedia Society / v.3 no.4 / pp.347-355 / 2000
  • This paper presents an adaptive acoustic echo canceller (AEC) based on the wavelet transform using M-channel subband QMF filter banks. The proposed algorithm improves real-time AEC performance through the low complexity of the wavelet transform filter banks, subband processing, and the orthogonality of the wavelet subband filters. The adaptive filter coefficients of each subband are updated using the LMS algorithm, which has low complexity and is easy to realize, for real-time processing and reduced hardware cost. White Gaussian noise and a real speech signal with environmental noise are used as input signals to evaluate the performance of the proposed algorithm. Computer simulation shows that the proposed AEC has a low asymptotic error, low computational complexity, and robust performance.

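The LMS coefficient update mentioned in this abstract can be sketched as follows. This is a plain full-band LMS echo canceller, not the paper's wavelet-domain M-channel subband structure; in a subband design the same update would run independently on each subband signal. The function name, the echo path, the tap count, and the step size `mu` are all assumptions chosen for illustration, and the microphone signal is noise-free white Gaussian input passed through a short hypothetical echo path.

```python
import random


def lms_echo_cancel(far_end, mic, num_taps=4, mu=0.05):
    """Adapt FIR weights w so that the filtered far-end signal tracks the echo.

    Returns the final weights and the residual error after cancellation.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent far-end samples, newest first
    errors = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))        # echo estimate
        e = d - y                                         # residual echo
        w = [wi + mu * e * xi for wi, xi in zip(w, buf)]  # LMS update
        errors.append(e)
    return w, errors


# Hypothetical echo path and white Gaussian far-end signal (fixed seed).
rng = random.Random(0)
echo_path = [0.5, -0.3, 0.1, 0.0]
far_end = [rng.gauss(0.0, 1.0) for _ in range(5000)]
mic = [
    sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far_end))
]

w, errors = lms_echo_cancel(far_end, mic)

if __name__ == "__main__":
    print("estimated echo path:", [round(wi, 3) for wi in w])
```

In this noise-free setting the weights approach the echo path and the residual error decays toward zero. With real speech and ambient noise the error settles at a nonzero floor, which is the asymptotic error the abstract refers to.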

Development for Estimation Model of Runway Visual Range using Deep Neural Network (심층신경망을 활용한 활주로 가시거리 예측 모델 개발)

  • Ku, SungKwan;Hong, SeokMin
    • Journal of Advanced Navigation Technology / v.21 no.5 / pp.435-442 / 2017
  • The runway visual range, which is affected by fog and other conditions, is one of the important indicators for determining whether aircraft can take off and land at an airport. At airports where transport aircraft operate, major weather forecasts including the runway visual range for the local area are released and provided to aviation workers so that they are aware of it. This paper proposes a runway visual range estimation model using a deep neural network, which has recently been applied to various fields such as image processing, speech recognition, and natural language processing. The model is developed and implemented to estimate the runway visual range of a local airport. It uses past actual weather observation data from the airfield concerned to train the neural network. Compared with existing observation data, it gives comparatively accurate estimation results. The proposed model can be used to generate weather information for airfields where no other forecasting function is available.

A Contrast Enhancement Method using the Contrast Measure in the Laplacian Pyramid for Digital Mammogram (디지털 맘모그램을 위한 라플라시안 피라미드에서 대비 척도를 이용한 대비 향상 방법)

  • Jeon, Geum-Sang;Lee, Won-Chang;Kim, Sang-Hee
    • Journal of the Institute of Convergence Signal Processing / v.15 no.2 / pp.24-29 / 2014
  • Digital mammography is the most common technique for the early detection of breast cancer. To diagnose breast cancer in its early stages and treat it efficiently, many image enhancement methods have been developed. This paper presents a multi-scale contrast enhancement method in the Laplacian pyramid for digital mammograms. The proposed method decomposes the image into contrast measures using the Gaussian and Laplacian pyramids, and the pyramid coefficients of the decomposed multi-resolution image are defined as frequency-limited local contrast measures given by the ratio of high-frequency components to low-frequency components. The decomposed pyramid coefficients are modified by the contrast measure to enhance the contrast, and the final enhanced image is obtained by recomposing the pyramid from the modified coefficients. The proposed method is compared with other existing methods and is shown to have quantitatively good performance in terms of the contrast measure.

A study on performance improvement of neural network using output probability of HMM (HMM의 출력확률을 이용한 신경회로망의 성능향상에 관한 연구)

  • Pyo Chang Soo;Kim Chang Keun;Hur Kang In
    • Journal of the Institute of Convergence Signal Processing / v.1 no.1 / pp.1-6 / 2000
  • In this paper, a hybrid system of an HMM (hidden Markov model) and a neural network is proposed. With a post-processing procedure that minimizes recognition errors, it shows a better recognition rate than the HMM alone. After the HMM is trained on the training data, test data that did not take part in the training are sent to the HMM. The output probabilities that the HMM produces for these data are used as training data for the neural network, which acts as a post-processor. Once the neural network is trained, the hybrid system is complete. This hybrid system improves the recognition rate by about 4.5% with an MLP and about 2% with an RBFN, and it offers a solution to the training time of conventional hybrid systems and to the decrease in recognition rate caused by a lack of training data in real-time speech recognition systems.

Multi-Modal Instruction Recognition System using Speech and Gesture (음성 및 제스처를 이용한 멀티 모달 명령어 인식 시스템)

  • Kim, Jung-Hyun;Rho, Yong-Wan;Kwon, Hyung-Joon;Hong, Kwang-Seok
    • Proceedings of the Korea Institute of Convergence Signal Processing / 2006.06a / pp.57-62 / 2006
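  • With the miniaturization and growing intelligence of portable terminals, and with rising interest in ubiquitous computing based on next-generation PCs, research on multi-modal interaction (MMI), which provides several dialogue modes such as pen, voice input, and multimedia, has recently been active. Accordingly, with the aim of studying an interface for clear communication in noisy environments and for integrated speech-gesture recognition on portable terminals, this paper proposes and implements a Multi-Modal Instruction Recognition System (MMIRS) that integrates a speech recognizer and an embedded sign language recognizer based on Voice-XML and the Wearable Personal Station (WPS). For sentence-level and word-level instruction recognition models corresponding to the Korean Standard Sign Language (KSSL), the proposed MMIRS recognizes and uses the speaker's sign language gesture instructions together with speech, so improved recognition performance on the defined instruction models can be expected even in noisy environments. To evaluate the recognition performance of MMIRS, 15 subjects continuously expressed speech and sign language gestures for 62 sentence-type recognition models and 104 word recognition models, and the average recognition rates of the individual instruction recognizers and of MMIRS were compared and analyzed. For the sentence-type instruction recognition models, MMIRS showed an average recognition rate of 93.45% in noisy environments and 95.26% in noise-free environments.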

A Study on Area Detection Using Transfer-Learning Technique (Transfer-Learning 기법을 이용한 영역검출 기법에 관한 연구)

  • Shin, Kwang-seong;Shin, Seong-yoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.10a / pp.178-179 / 2018
  • Recently, methods that use machine learning in artificial intelligence applications such as autonomous navigation and speech recognition have been actively studied. Classical image processing methods such as boundary detection and pattern recognition have many limitations in recognizing a specific object or area in a digital image. When a machine learning method such as deep learning is used, however, better results can be obtained. Machine learning of this kind fundamentally requires a large amount of training data, so it is difficult to apply to area classification when the amount of data is very small, as with aerial photographs for environmental analysis. In this study, we apply a transfer-learning technique that can be used when the dataset of input images is small and the shape of the input image is not included in the categories of the training dataset.
