• Title/Summary/Keyword: Query-by-speech

A Query-by-Speech Scheme for Photo Albuming (음성 질의 기반 디지털 사진 검색 기법)

  • Kim Tae-Sung;Suh Young-Joo;Lee Yong-Ju;Kim Hoi-Rin
    • MALSORI / no.57 / pp.99-112 / 2006
  • In this paper, we introduce two methods for retrieving photos annotated with speech documents. We compare the pattern of a spoken query with those of the speech documents recorded on digital cameras, measure their similarities, and retrieve the photos whose speech documents have high similarity scores. The first approach uses a phoneme recognition scheme as a pre-processor for pattern matching; the second applies vector quantization (VQ) and dynamic time warping (DTW) to match the speech query with the documents directly in the signal domain. Experimental results show that the first approach has a short processing time but that its performance depends heavily on the accuracy of phoneme recognition. The second method greatly improves performance. Its processing time is longer than that of the first method because of DTW, but it can be reduced by using approximate methods.
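
The DTW matching step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each recording has already been reduced to a one-dimensional sequence of feature values, uses the absolute difference as the local cost, and the names `dtw_distance` and `rank_photos` are hypothetical.

```python
from math import inf


def dtw_distance(query, document):
    """Classic DTW between two numeric sequences (local cost: absolute difference)."""
    n, m = len(query), len(document)
    # cost[i][j] = minimal accumulated cost of aligning query[:i] with document[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - document[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def rank_photos(query, memos):
    """Return photo ids sorted by ascending DTW distance to the query.

    memos maps a (hypothetical) photo id to the feature sequence of its voice memo.
    """
    return sorted(memos, key=lambda photo_id: dtw_distance(query, memos[photo_id]))
```

The full table costs O(n·m) time per comparison, which is consistent with the abstract's remark that DTW lengthens processing time; restricting the search to a band around the diagonal is one common approximation.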

Study on the song title query by humming melody information (허밍 운율정보를 이용한 곡목 검색 기술)

  • Lee Ji-Yeoun;Hahn Min-Soo
    • MALSORI / no.44 / pp.131-143 / 2002
  • Music query by humming is a challenging problem because the humming signal inevitably contains much variation and inaccuracy. In this paper, we suggest an algorithm for retrieving a desired song from a music database by humming its melody. To accommodate people's inaccurate humming, a new melody representation technique is proposed. Our algorithm is based on pitch and duration information and performs fairly well: a correct retrieval rate of 85% is achieved for the top 3 matches when tested with 20 songs.
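
The abstract does not spell out its melody representation, so the following is only an illustrative sketch of one common pitch-and-duration encoding, not the paper's technique. It assumes the humming has already been transcribed into (MIDI pitch, duration) pairs, and `relative_melody` is a hypothetical name. Storing pitch intervals and duration ratios makes the representation insensitive to the key and tempo in which a user hums.

```python
def relative_melody(notes):
    """Encode a melody as (pitch interval, duration ratio) pairs between consecutive notes.

    notes: list of (midi_pitch, duration_in_seconds) tuples; durations must be positive.
    """
    return [
        (pitch_b - pitch_a, dur_b / dur_a)
        for (pitch_a, dur_a), (pitch_b, dur_b) in zip(notes, notes[1:])
    ]


# A reference melody and the same melody hummed three semitones higher at half speed.
reference = [(60, 0.5), (62, 0.5), (64, 1.0)]
hummed = [(63, 1.0), (65, 1.0), (67, 2.0)]
same_contour = relative_melody(reference) == relative_melody(hummed)
```

Real humming would not match exactly, so a practical system would compare these sequences with a tolerant distance rather than strict equality.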

Retrieval of Player Event in Golf Videos Using Spoken Content Analysis (음성정보 내용분석을 통한 골프 동영상에서의 선수별 이벤트 구간 검색)

  • Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.28 no.7 / pp.674-679 / 2009
  • This paper proposes a method of player event retrieval that combines two functions: detection of player names in speech and detection of sound events in the audio of golf videos. The system consists of an indexing module and a retrieval module. At indexing time, audio segmentation and noise reduction are applied to the audio stream demultiplexed from the golf videos. The noise-reduced speech is then fed into a speech recognizer, which outputs spoken descriptors. The player names and sound events are indexed by these spoken descriptors. At search time, a text query is converted into phoneme sequences. The lists for each query term are retrieved through a description matcher to identify full and partial phrase hits. For player name retrieval, this paper compares the results of word-based, phoneme-based, and hybrid approaches.
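
The abstract does not describe how its description matcher scores phoneme sequences. As a hedged illustration of phoneme-based matching in general, the sketch below uses the Levenshtein edit distance between a query phoneme sequence and a recognized one; the function name and the phoneme symbols in the usage lines are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute cost 1 each)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a to each prefix of b
    for i, x in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


# Hypothetical phoneme symbols: the recognizer confused one phoneme.
query = ["k", "i", "m"]
recognized = ["g", "i", "m"]
errors = edit_distance(query, recognized)
```

A small distance relative to the query length can be treated as a partial hit, which tolerates the recognizer's phoneme confusions.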

Semantic-oriented Error Correction for Spoken Query Processing (음성 질의 처리를 위한 의미 기반 오류 수정)

  • Jeong Minwoo;Kim Byeongchang;Lee Gary Geunbae
    • Proceedings of the KSPS conference / 2003.10a / pp.153-156 / 2003
  • Voice input is often required in many new application environments such as telephone-based information retrieval, car navigation systems, and user-friendly interfaces, but the low success rate of speech recognition makes it difficult to extend its application to new fields. Popular approaches to improving recognition accuracy post-process the recognition results, but previous approaches to this post-hoc error correction were mainly lexical-oriented. We suggest a new semantic-oriented approach that corrects both semantic-level and lexical errors and is also more accurate, especially for domain-specific speech error correction. Through extensive experiments using a speech-driven in-vehicle telematics information application, we demonstrate the superior performance of our approach and some advantages over previous lexical-oriented approaches.

Speech Query Recognition for Tamil Language Using Wavelet and Wavelet Packets

  • Iswarya, P.;Radha, V.
    • Journal of Information Processing Systems / v.13 no.5 / pp.1135-1148 / 2017
  • Speech recognition is one of the fascinating fields in computer science. The accuracy of a speech recognition system may be reduced by noise present in the speech signal. Noise removal is therefore an essential step in an Automatic Speech Recognition (ASR) system, and this paper proposes a new noise removal technique called combined thresholding. Feature extraction is the process of converting an acoustic signal into the most valuable set of parameters. This paper also concentrates on improving Mel Frequency Cepstral Coefficients (MFCC) features by introducing the Discrete Wavelet Packet Transform (DWPT) in place of the Discrete Fourier Transform (DFT) block to provide efficient signal analysis. Because the feature vector varies in size, a Self Organizing Map (SOM) is used to choose the correct feature vector length. Because a single classifier does not provide enough accuracy, this research proposes an Ensemble Support Vector Machine (ESVM) classifier that takes the fixed-length feature vector from the SOM as input, termed ESVM_SOM. The experimental results show that the proposed methods provide better results than the existing methods.

Music Recognition Using Audio Fingerprint: A Survey (오디오 Fingerprint를 이용한 음악인식 연구 동향)

  • Lee, Dong-Hyun;Lim, Min-Kyu;Kim, Ji-Hwan
    • Phonetics and Speech Sciences / v.4 no.1 / pp.77-87 / 2012
  • Interest in music recognition has grown dramatically since NHN and Daum released their mobile applications for music recognition in 2010. Music recognition methods based on audio analysis fall into two categories: music recognition using audio fingerprints and Query-by-Singing/Humming (QBSH). While music recognition using audio fingerprints receives music as its input, QBSH takes a user-hummed melody. In this paper, research trends are described for music recognition using audio fingerprints, focusing on two methods: one based on fingerprint generation using the energy difference between consecutive bands and the other based on hash key generation between peak points. Details presented in the representative papers of each method are introduced.
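
The first method named in this abstract, fingerprints built from energy differences between consecutive bands, can be sketched as follows. The sketch assumes the per-frame band energies have already been computed and follows the widely used sign-of-difference formulation, in which a bit is set when the energy difference between adjacent bands increases from one frame to the next; the surveyed papers may differ in detail, and the function names are hypothetical.

```python
def fingerprint_bits(band_energy):
    """Derive fingerprint bits from band energies.

    band_energy[n][m] is the energy of frequency band m in frame n.
    Returns one list of bits per frame, starting from the second frame.
    """
    frames = []
    for prev, cur in zip(band_energy, band_energy[1:]):
        bits = []
        for m in range(len(cur) - 1):
            # Difference between adjacent bands, differenced again across time.
            diff = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            bits.append(1 if diff > 0 else 0)
        frames.append(bits)
    return frames


def bit_error_rate(a, b):
    """Fraction of differing bits between two equally shaped, non-empty fingerprints."""
    total = sum(len(row) for row in a)
    errors = sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    return errors / total
```

Because only the sign of the difference is kept, scaling every energy by the same positive factor (for example, a change in playback volume) leaves the bits unchanged.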

A Study on Robust Speech Emotion Feature Extraction Under the Mobile Communication Environment (이동통신 환경에서 강인한 음성 감성특징 추출에 대한 연구)

  • Cho Youn-Ho;Park Kyu-Sik
    • The Journal of the Acoustical Society of Korea / v.25 no.6 / pp.269-276 / 2006
  • In this paper, we propose an emotion recognition system that classifies a speaker's emotional state as neutral or angry, in real time, from speech captured by a cellular phone. In general, speech transmitted over a mobile network contains environmental noise and network noise, which distort the emotional features of the query speech and can seriously degrade system performance. To minimize the effect of this noise and thereby improve system performance, we adopt a simple MA (Moving Average) filter, which has a relatively simple structure and low computational complexity, to alleviate the distortion in the emotional feature vector. An SFS (Sequential Forward Selection) feature optimization method is then applied to further improve and stabilize system performance. Two pattern recognition methods, k-NN and SVM, are compared for emotional state classification. The experimental results indicate that the proposed method provides very stable and successful emotion classification, with an accuracy of 86.5%, so it should be very useful in application areas such as customer call centers.
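
The MA filtering step can be illustrated with a minimal sketch. The abstract does not give the filter length or state whether the filter is causal, so the causal form and the default window of 3 below are assumptions, and `moving_average` is a hypothetical name. The filter would be applied to each feature's trajectory over time.

```python
def moving_average(values, window=3):
    """Causal moving average over a sequence of numbers.

    Each output is the mean of the current value and up to window - 1 previous values.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


# A single noisy spike in an otherwise flat feature trajectory is spread out and reduced.
smoothed = moving_average([3, 3, 3, 9, 3, 3], window=3)
```

Keeping a running sum makes the cost one addition and one subtraction per sample, which matches the low computational complexity the abstract cites as the reason for choosing this filter.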

Real-time Data Integration using Ontology and Semantic Mediators (온톨로지와 시맨틱 중재 에이전트를 이용한 실시간 통합 환경 구축에 관한 연구)

  • Park, Jin-Soo
    • Asia Pacific Journal of Information Systems / v.16 no.4 / pp.151-178 / 2006
  • The objective of this research is to develop a formal framework and methodology to facilitate real-time data integration, thus enabling semantic interoperability among distributed and heterogeneous information systems. The proposed approach is based on the concepts of "ontology" and "semantic mediators." An ontology is developed and used to capture the intension (including structure, integrity rules and meta-properties) of the database schema. We also develop the agent communication protocol for semantic reconciliation, which is based on the theory of speech acts and agent communication language. This protocol is used by a set of semantic mediators, which automatically detect and resolve various semantic conflicts at the data- and schema-levels by referring to the ontology. A mediation-based query processing technique is developed to provide uniform and integrated access to the multiple heterogeneous information sources. Prototype tools are being implemented to provide proof of concept for this work.

Generating Audio Adversarial Examples Using a Query-Efficient Decision-Based Attack (질의 효율적인 의사 결정 공격을 통한 오디오 적대적 예제 생성 연구)

  • Seo, Seong-gwan;Mun, Hyunjun;Son, Baehoon;Yun, Joobeom
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.1 / pp.89-98 / 2022
  • As deep learning technology has been applied to various fields, adversarial attack techniques, which pose a security problem for deep learning models, have been actively studied. Adversarial attacks have mainly been studied in the image domain, where researchers have recently developed fully decision-based attack techniques that need only the classification results of the model. In the audio domain, however, research has progressed relatively slowly. In this paper, we apply several decision-based attack techniques to the audio domain and improve on state-of-the-art attack techniques. State-of-the-art decision-based attack techniques have the disadvantage of requiring many queries for gradient approximation. We improve query efficiency by proposing a method that reduces the vector search space required for gradient approximation. Experimental results showed that the attack success rate was increased by 50% and the difference between the original audio and the adversarial examples was reduced by 75%, demonstrating that our method can generate adversarial examples with smaller noise.

Multimodal Approach for Summarizing and Indexing News Video

  • Kim, Jae-Gon;Chang, Hyun-Sung;Kim, Young-Tae;Kang, Kyeong-Ok;Kim, Mun-Churl;Kim, Jin-Woong;Kim, Hyung-Myung
    • ETRI Journal / v.24 no.1 / pp.1-11 / 2002
  • A video summary abstracts the gist from an entire video and also enables efficient access to the desired content. In this paper, we propose a novel method for summarizing news video based on multimodal analysis of the content. The proposed method exploits the closed caption data to locate semantically meaningful highlights in a news video and speech signals in an audio stream to align the closed caption data with the video in a time-line. Then, the detected highlights are described using MPEG-7 Summarization Description Scheme, which allows efficient browsing of the content through such functionalities as multi-level abstracts and navigation guidance. Multimodal search and retrieval are also within the proposed framework. By indexing synchronized closed caption data, the video clips are searchable by inputting a text query. Intensive experiments with prototypical systems are presented to demonstrate the validity and reliability of the proposed method in real applications.
