• Title/Summary/Keyword: Speech & image processing system

An Adaptive Utterance Verification Framework Using Minimum Verification Error Training

  • Shin, Sung-Hwan;Jung, Ho-Young;Juang, Biing-Hwang
    • ETRI Journal / v.33 no.3 / pp.423-433 / 2011
  • This paper introduces an adaptive and integrated utterance verification (UV) framework using minimum verification error (MVE) training as a new set of solutions suitable for real applications. UV is traditionally considered an add-on procedure to automatic speech recognition (ASR) and thus treated separately from the ASR system model design. This traditional two-stage approach often fails to cope with a wide range of variations, such as a new speaker or a new environment which is not matched with the original speaker population or the original acoustic environment that the ASR system is trained on. In this paper, we propose an integrated solution to enhance the overall UV system performance in such real applications. The integration is accomplished by adapting and merging the target model for UV with the acoustic model for ASR based on the common MVE principle at each iteration in the recognition stage. The proposed iterative procedure for UV model adaptation also involves revision of the data segmentation and the decoded hypotheses. Under this new framework, remarkable enhancement in not only recognition performance, but also verification performance has been obtained.

Auxiliary Stacked Denoising Autoencoder based Collaborative Filtering Recommendation

  • Mu, Ruihui;Zeng, Xiaoqin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.6 / pp.2310-2332 / 2020
  • In recent years, deep learning techniques have achieved tremendous success in natural language processing, speech recognition, and image processing. Collaborative filtering (CF) is one of the most widely used recommendation methods and is effective in implementing new recommendation functions, but it is limited by poor scalability, cold start, and data sparsity. Combining traditional recommendation algorithms with deep learning models offers a great opportunity for building new recommender systems. In this paper, we propose a novel collaborative recommendation model based on an auxiliary stacked denoising autoencoder (ASDAE), which effectively learns user preferences from auxiliary information. First, we integrate auxiliary information with rating information. Then, we design a stacked denoising autoencoder based collaborative recommendation model to learn user preferences from the auxiliary and rating information. Finally, we conduct comprehensive experiments on three real datasets to compare the proposed model with state-of-the-art methods. Experimental results demonstrate that the proposed model is superior to other recommendation methods.

Future Trends of AI-Based Smart Systems and Services: Challenges, Opportunities, and Solutions

  • Lee, Daewon;Park, Jong Hyuk
    • Journal of Information Processing Systems / v.15 no.4 / pp.717-723 / 2019
  • Smart systems and services aim to support growing urban populations and their prospects of virtual-real social behaviors, gig economies, factory automation, knowledge-based workforces, integrated societies, and modern living, among many others. To satisfy these objectives, smart systems and services must comprise a complex set of features such as security, ease of use and user friendliness, manageability, scalability, adaptivity, intelligent behavior, and personalization. Recently, artificial intelligence (AI) has been recognized as a data-driven technology that provides efficient knowledge representation and semantic modeling and can support the cognitive behavior of a system. In this paper, an integration of AI with smart systems and services is presented to mitigate the existing challenges. Several novel research works, in terms of frameworks, architectures, paradigms, and algorithms, are discussed to provide possible solutions to the existing challenges in AI-based smart systems and services. These works involve efficient shape image retrieval, speech signal processing, dynamic thermal rating, advanced persistent threat tactics, user authentication, and so on.

Lip Reading Method Using CNN for Utterance Period Detection (발화구간 검출을 위해 학습된 CNN 기반 입 모양 인식 방법)

  • Kim, Yong-Ki;Lim, Jong Gwan;Kim, Mi-Hye
    • Journal of Digital Convergence / v.14 no.8 / pp.233-243 / 2016
  • Because speech recognition degrades in noisy environments, Audio Visual Speech Recognition (AVSR) systems, which combine speech information and visual information, have been proposed since the mid-1990s, and lip reading has played a significant role in AVSR systems. This study aims to enhance the recognition rate of uttered words using only lip shape detection for an efficient AVSR system. After preprocessing for lip region detection, Convolutional Neural Network (CNN) techniques are applied for utterance period detection and lip shape feature vector extraction, and Hidden Markov Models (HMMs) are then used for recognition. The utterance period detection achieves a 91% success rate, which is higher than that of general threshold methods. In lip reading recognition, the user-dependent experiment records a recognition rate of 88.5% and the user-independent experiment 80.2%, both improvements over previous studies.

A Contrast Enhancement Method using the Contrast Measure in the Laplacian Pyramid for Digital Mammogram (디지털 맘모그램을 위한 라플라시안 피라미드에서 대비 척도를 이용한 대비 향상 방법)

  • Jeon, Geum-Sang;Lee, Won-Chang;Kim, Sang-Hee
    • Journal of the Institute of Convergence Signal Processing / v.15 no.2 / pp.24-29 / 2014
  • Digital mammography is the most common technique for the early detection of breast cancer. To diagnose breast cancer in its early stages and treat it efficiently, many image enhancement methods have been developed. This paper presents a multi-scale contrast enhancement method in the Laplacian pyramid for digital mammograms. The proposed method decomposes the image into contrast measures using the Gaussian and Laplacian pyramids, and the pyramid coefficients of the decomposed multi-resolution image are defined as frequency-limited local contrast measures given by the ratio of high-frequency components to low-frequency components. The decomposed pyramid coefficients are modified by the contrast measure to enhance the contrast, and the final enhanced image is obtained by recomposing the pyramid from the modified coefficients. The proposed method is compared with other existing methods and is shown to have quantitatively good performance in the contrast measure algorithm.
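
The decomposition described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes 2x2 block averaging in place of a true Gaussian kernel, nearest-neighbour upsampling, image sides divisible by 2**levels, and a single hypothetical `gain` factor applied to the contrast measure at every level. The names `downsample`, `upsample`, and `enhance_contrast` are illustrative only.

```python
import numpy as np

def downsample(img):
    """Halve each dimension by averaging 2x2 blocks (stand-in for Gaussian blur + decimation)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double each dimension by nearest-neighbour repetition."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def enhance_contrast(img, levels=2, gain=1.5, eps=1e-6):
    """Scale the local contrast (high-frequency / low-frequency ratio) at every pyramid level."""
    img = np.asarray(img, dtype=np.float64)
    # Low-pass pyramid: gaussians[0] is the image, gaussians[-1] the coarsest level.
    gaussians = [img]
    for _ in range(levels):
        gaussians.append(downsample(gaussians[-1]))
    # Recompose from coarse to fine, amplifying the contrast measure by `gain` (assumed rule).
    out = gaussians[-1]
    for k in range(levels - 1, -1, -1):
        low = upsample(gaussians[k + 1])                # low-frequency component
        contrast = (gaussians[k] - low) / (low + eps)   # Laplacian detail relative to the low band
        low_out = upsample(out)
        out = low_out + gain * contrast * (low_out + eps)
    return out
```

With `gain=1.0` the recomposition returns the input image unchanged, which is a convenient sanity check; values above 1 amplify detail relative to the local background.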

Recent Advances in Examination of Vocal Fold Vibration (성대진동검사의 최신 지견)

  • Lee, Jin-Choon;Bae, Inho
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.32 no.1 / pp.1-8 / 2021
  • Human vocal cords vibrate as quickly as 100-250 times per second, so it is impossible to observe them with normal endoscopic diagnostic equipment. High-speed videolaryngoscopy (HSV) allows visualization of the non-periodic vibratory motion of the vocal folds beyond the limitations of videostroboscopy. Newly developed post-processing methods that convert HSV to two-dimensional videokymography (2D VKG) using U-medical image-processing software can provide quantitative information on vocal fold mucosal vibration. The multifunctional laryngeal examination system consists of three kinds of examinations: HSV, 2D scanning digital kymography (2D DKG), and line scanning digital kymography (DKG). Using 2D DKG, the entire vibratory pattern of each vocal cord can be evaluated, and quantitative information can be obtained faster and more reliably. As this system comes into use in clinical practice and research, it is expected to bring considerable advances to the diagnosis of voice disorders. In this review, I introduce the principles and advantages of the examination of vocal fold vibration, which has recently been in the spotlight, and review the literature.

A Study on the Learning Efficiency of Multilayered Neural Networks using Variable Slope (기울기 조정에 의한 다층 신경회로망의 학습효율 개선방법에 대한 연구)

  • 이형일;남재현;지선수
    • Journal of Korean Society of Industrial and Systems Engineering / v.20 no.42 / pp.161-169 / 1997
  • A variety of learning methods are used for neural networks. Among them, the backpropagation algorithm is the most widely used in areas such as image processing, speech recognition, and pattern recognition. Despite its popularity in these applications, its main problem is its running time: too much time is spent on learning. This paper suggests a method that maximizes the convergence speed of learning. The learning time of the backpropagation algorithm is reduced by adaptively adjusting the slope of the activation function depending on the total error, a method named the variable slope algorithm. Experimental results using the variable slope algorithm are compared against the conventional backpropagation algorithm and other variations, and show an improvement in performance over previous algorithms.
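
The idea of a slope-adjustable activation can be sketched as follows. The abstract does not give the actual update rule, so `slope_from_error` below is a purely hypothetical schedule (steeper slope for larger total error, capped at `max_slope`), and all function and parameter names are assumptions made for illustration.

```python
import math

def sigmoid(x, slope=1.0):
    """Logistic activation whose steepness is controlled by `slope`."""
    return 1.0 / (1.0 + math.exp(-slope * x))

def sigmoid_grad(x, slope=1.0):
    """Derivative of the sloped sigmoid with respect to x, as used in backpropagation."""
    y = sigmoid(x, slope)
    return slope * y * (1.0 - y)

def slope_from_error(total_error, base=1.0, gain=0.5, max_slope=4.0):
    """Hypothetical schedule: raise the slope with the total error, up to a cap."""
    return min(max_slope, base + gain * total_error)
```

In a training loop, one would recompute the slope from the epoch's total error and pass it to both `sigmoid` and `sigmoid_grad`, so that the gradient magnitude follows the current error level.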

Optimization of Memristor Devices for Reservoir Computing (축적 컴퓨팅을 위한 멤리스터 소자의 최적화)

  • Kyeongwoo Park;HyeonJin Sim;HoBin Oh;Jonghwan Lee
    • Journal of the Semiconductor & Display Technology / v.23 no.1 / pp.1-6 / 2024
  • Recently, artificial neural networks have been playing a crucial role and advancing across various fields. Artificial neural networks are typically categorized into feedforward neural networks and recurrent neural networks. However, feedforward neural networks are primarily used for processing static spatial patterns such as image recognition and object detection. They are not suitable for handling temporal signals. Recurrent neural networks, on the other hand, face the challenges of complex training procedures and requiring significant computational power. In this paper, we propose memristors suitable for an advanced form of recurrent neural networks called reservoir computing systems, utilizing a mask processor. Using the characteristic equations of Ti/TiOx/TaOy/Pt, Pt/TiOx/Pt, and Ag/ZnO-NW/Pt memristors, we generated current-voltage curves to verify their memristive behavior through the confirmation of hysteresis. Subsequently, we trained and inferred reservoir computing systems using these memristors with the NIST TI-46 database. Among these systems, the accuracy of the reservoir computing system based on Ti/TiOx/TaOy/Pt memristors reached 99%, confirming the Ti/TiOx/TaOy/Pt memristor structure's suitability for inferring speech recognition tasks.
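
For readers unfamiliar with reservoir computing, the following is a minimal software sketch of the general principle: a fixed random recurrent layer whose states feed a trained linear readout. It is a conventional echo state network, not the memristor-based system with a mask processor described in the abstract, and it does not use the NIST TI-46 data. All names and hyperparameters (`n_units`, `spectral_radius`, `input_scale`, `ridge`) are assumptions.

```python
import numpy as np

def run_reservoir(inputs, n_units=50, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Drive a fixed random recurrent layer with a 1-D input sequence and collect its states."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-input_scale, input_scale, size=n_units)
    w = rng.normal(size=(n_units, n_units))
    # Rescale the recurrent weights so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    state = np.zeros(n_units)
    states = np.empty((len(inputs), n_units))
    for t, u in enumerate(inputs):
        state = np.tanh(w @ state + w_in * u)
        states[t] = state
    return states

def train_readout(states, targets, ridge=1e-6):
    """Fit the linear readout by ridge regression; only this layer is trained."""
    a = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(a, states.T @ targets)
```

The design point this illustrates is that the recurrent weights are never trained, which is what makes simple physical devices candidates for the reservoir role.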

Volume Control using Gesture Recognition System

  • Shreyansh Gupta;Samyak Barnwal
    • International Journal of Computer Science & Network Security / v.24 no.6 / pp.161-170 / 2024
  • With technological advances, humans have made great progress in ease of living and now use sight, motion, sound, speech, and so on for various applications and software controls. In this paper, we explore a project in which gestures play a very significant role. Gesture control is a topic that has been researched extensively and continues to evolve every day. This project uses computer vision, and its main objective, which we achieved, is controlling computer settings with hand gestures. We create a module that acts as a volume-control program in which hand gestures control the computer system volume. The module uses OpenCV to implement hand gestures in computer controls. In execution, the module uses the computer's web camera to record images or videos, processes them to find the needed information, and then, based on the input, performs the action on the volume settings of that computer. The program can increase and decrease the volume of the computer. The only setup needed to run the program is a web camera to record the input images and videos given by the user. The program performs gesture recognition with the help of OpenCV, Python, and its libraries; it then recognizes the specified human gestures and uses them to carry out changes in the device settings. The objective is to adjust the volume of a computer device without the need for physical interaction using a mouse or keyboard. OpenCV is a widely used and extremely popular tool for image processing and computer vision applications. The OpenCV community consists of over 47,000 individuals, and according to a survey conducted in 2020, the estimated number of downloads exceeds 18 million.
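
The final step of such a pipeline, turning a detected gesture into a volume level, might look like the sketch below. The abstract does not say which gesture is used, so this assumes the common choice of the pixel distance between thumb tip and index fingertip; the landmark coordinates are taken as already detected, and the calibration bounds `min_dist` and `max_dist` are hypothetical. No OpenCV or system-volume calls are shown.

```python
import math

def fingertip_distance(thumb, index):
    """Euclidean distance in pixels between two (x, y) landmark points."""
    return math.hypot(index[0] - thumb[0], index[1] - thumb[1])

def distance_to_volume(distance, min_dist=30.0, max_dist=200.0):
    """Map a fingertip distance linearly onto a 0-100 volume percentage, clamped at both ends."""
    clamped = max(min_dist, min(max_dist, distance))
    return round(100.0 * (clamped - min_dist) / (max_dist - min_dist))
```

Clamping keeps the output stable when the hand moves closer to or farther from the camera than the calibrated range.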

Detection of Dangerous Things to Infants through Image Analysis and Deep Learning (이미지 분석과 딥 러닝을 통한 영유아 위험물 탐지)

  • Kim, Hui-Joon;Park, Kil-Seop;Seo, Yeong-Hak;Kim, Kyung-Sup
    • Proceedings of the Korea Information Processing Society Conference / 2017.11a / pp.845-848 / 2017
  • In this paper, we implement a system that detects dangerous situations by recognizing elements dangerous to infants in 2D images of children's houses, parks, playgrounds, and living rooms where infants are present, using Faster R-CNN. The detection model is based on data that can be easily obtained from real life. Currently, machine learning is commercialized on the basis of speech recognition and behavior data. However, this model can be applied to various service fields.