• Title/Summary/Keyword: Audio Data

Search speed improved minimum audio fingerprinting using the difference of Gaussian (가우시안의 차를 이용하여 검색속도를 향상한 최소 오디오 핑거프린팅)

  • Kwon, Jin-Man;Ko, Il-Ju;Jang, Dae-Sik
    • Journal of the Korea Society of Computer and Information / v.14 no.12 / pp.75-87 / 2009
  • This paper presents a method for creating audio fingerprints and comparing them against audio data, so that music can be identified from the characteristics of the audio signal. The method applies the Difference of Gaussian (DoG), generally used in image recognition, to the audio data to extract the points where the music changes sharply, and uses these points as fingerprint locations. The resulting fingerprint is insensitive to changes in the sound, and the same fingerprint locations as in the original can be extracted from only a portion of the music data. By reducing the fingerprint data and the amount of computation, the system is more efficient than previous systems that work in the frequency domain. The method can be used to identify copyrighted music distributed on the Internet, or to provide music meta information to users.
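
The DoG step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 1-D amplitude envelope as input, and the function names, the two sigma values, and the threshold are hypothetical choices made for illustration only.

```python
import math

def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = int(3 * sigma) + 1
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma):
    """Convolve the signal with a Gaussian, clamping indices at the edges."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def dog_keypoints(envelope, sigma_small=1.0, sigma_large=2.0, threshold=0.05):
    """Return indices where the Difference of Gaussian has a strong local extremum.

    The sigma values and threshold are assumed defaults, not values from the paper.
    """
    narrow = smooth(envelope, sigma_small)
    wide = smooth(envelope, sigma_large)
    dog = [a - b for a, b in zip(narrow, wide)]
    points = []
    for i in range(1, len(dog) - 1):
        is_max = dog[i] > dog[i - 1] and dog[i] > dog[i + 1]
        is_min = dog[i] < dog[i - 1] and dog[i] < dog[i + 1]
        if (is_max or is_min) and abs(dog[i]) >= threshold:
            points.append(i)
    return points
```

For a toy envelope with one abrupt change, such as `[0.0] * 20 + [1.0] * 20`, the keypoints cluster around the jump. Running the function on a cropped copy of the envelope yields the same locations up to the crop offset, which mirrors the abstract's claim that a portion of the music suffices.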

A Content-based Audio Retrieval System Supporting Efficient Expansion of Audio Database (음원 데이터베이스의 효율적 확장을 지원하는 내용 기반 음원 검색 시스템)

  • Park, Ji Hun;Kang, Hyunchul
    • Journal of Digital Contents Society / v.18 no.5 / pp.811-820 / 2017
  • For content-based audio retrieval, one of the main functions of audio services, techniques that extract fingerprints from audio sources and then store and index them in a database are widely used. However, as the fingerprints of new audio sources are continually inserted into the database, both space efficiency and retrieval performance gradually deteriorate. Techniques are therefore needed that support efficient expansion of the audio database without periodic reorganization, which would increase system operation cost. In this paper, we design a content-based audio retrieval system that solves this problem by using MapReduce and a NoSQL database in a cluster computing environment, based on Shazam's fingerprinting algorithm, and we evaluate its performance through a detailed set of experiments on real-world audio data.
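
As a rough illustration of the Shazam-style indexing this abstract builds on, the sketch below hashes pairs of spectrogram peaks and votes on time offsets. It keeps everything in an in-memory dictionary rather than MapReduce or a NoSQL store, and every name and parameter in it (`pair_hashes`, `fan_out`, `max_dt`, and so on) is a hypothetical choice, not taken from the paper.

```python
from collections import defaultdict

def pair_hashes(peaks, fan_out=3, max_dt=50):
    """Pair each peak with a few later peaks.

    peaks: list of (time_frame, freq_bin) tuples sorted by time.
    Returns a list of ((f1, f2, dt), t1) entries.
    """
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            dt = t2 - t1
            if 0 < dt <= max_dt:
                hashes.append(((f1, f2, dt), t1))
    return hashes

def build_index(tracks):
    """Map each hash key to the (track_id, time) positions where it occurs."""
    index = defaultdict(list)
    for track_id, peaks in tracks.items():
        for key, t in pair_hashes(peaks):
            index[key].append((track_id, t))
    return index

def best_match(index, query_peaks):
    """Vote on (track, time offset); a true match piles votes on one offset."""
    votes = defaultdict(int)
    for key, t_query in pair_hashes(query_peaks):
        for track_id, t_db in index.get(key, []):
            votes[(track_id, t_db - t_query)] += 1
    if not votes:
        return None
    (track_id, offset), count = max(votes.items(), key=lambda kv: kv[1])
    return track_id, offset, count
```

Adding a new track only appends entries to the index, which is the kind of insert-heavy workload whose long-term cost the paper sets out to control.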

DNN based Speech Detection for the Media Audio (미디어 오디오에서의 DNN 기반 음성 검출)

  • Jang, Inseon;Ahn, ChungHyun;Seo, Jeongil;Jang, Younseon
    • Journal of Broadcast Engineering / v.22 no.5 / pp.632-642 / 2017
  • In this paper, we propose a DNN-based speech detection system that uses the acoustic characteristics and context information of media audio. Speech detection, which discriminates between the speech and non-speech segments in media audio, is a necessary preprocessing step for effective speech processing. However, because media audio signals contain various types of sound sources, it has been difficult to achieve high performance with conventional signal processing techniques. The proposed method improves speech detection performance by separating the harmonic and percussive components of the media audio and constructing a DNN input vector that reflects its acoustic characteristics and context information. To verify the performance of the proposed system, a speech detection data set was built from more than 20 hours of drama, and a publicly available 8-hour Hollywood movie data set was additionally used for the experiments. Cross-validation on the two data sets shows that the proposed system performs better than the conventional method.

A Hybrid QoS Guarantee Scheme for High-Quality Audio Streaming Services on the Internet (인터넷에서 고품질 오디오 스트리밍 서비스를 위한 복합적 QoS 보장 기법)

  • 손주영;유성일
    • Journal of Korea Multimedia Society / v.7 no.1 / pp.54-63 / 2004
  • This paper describes a hybrid QoS guarantee scheme for high-quality audio streaming services on the Internet. Continuous playback of audio data requires isochronous transmission of the audio data packets through the Internet. To keep the QoS at the final destination (the client) the same as the server provides, the transmission protocol should handle error conditions such as packet loss and out-of-order delivery. In general, protocols that support the transmission of continuous media data do not try to recover from such errors. These protocols work adequately for toll-quality multimedia streaming services, but not for high-quality streaming services such as DVD sound or music playback. The hybrid QoS guarantee scheme combines three mechanisms to overcome this problem: selective retransmission of lost packets, adaptive buffering at the client side, and an adaptive transmission rate at the server side. Together they recover from packet loss with minimal overhead, prevent buffer starvation during retransmission, and maintain isochronous transmission even after retransmission. The experiments have shown good results for high-quality audio streaming services on the Internet.
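
The selective-retransmission idea in this abstract can be sketched as a client-side buffer that reorders packets by sequence number and reports the gaps it would ask the server to resend. This is only an illustration under assumed conventions (integer sequence numbers starting at 0, a hypothetical `ReceiveBuffer` class). It does not model the adaptive buffering or adaptive transmission rate mechanisms, and it is not the protocol from the paper.

```python
class ReceiveBuffer:
    """Reorders packets and tracks which sequence numbers are still missing."""

    def __init__(self, first_seq=0):
        self.next_to_play = first_seq   # lowest sequence number not yet played
        self.pending = {}               # seq -> payload, waiting for playback

    def receive(self, seq, payload):
        """Store a packet (late or duplicate packets are ignored).

        Returns the sequence numbers that would be requested again.
        """
        if seq >= self.next_to_play:
            self.pending.setdefault(seq, payload)
        return self.missing()

    def missing(self):
        """List the gaps below the highest sequence number received so far."""
        if not self.pending:
            return []
        highest = max(self.pending)
        return [s for s in range(self.next_to_play, highest)
                if s not in self.pending]

    def pop_playable(self):
        """Release the contiguous run of packets that is ready for playback."""
        out = []
        while self.next_to_play in self.pending:
            out.append(self.pending.pop(self.next_to_play))
            self.next_to_play += 1
        return out
```

Only the missing packets are requested again, so the retransmission overhead grows with the number of losses rather than with the size of the stream.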

A Case Study of the Audio-Visual Archives System Development and Management (시청각(사진/동영상) 기록물 관리를 위한 시스템 구축과 운영 사례 연구)

  • Shin, Dong-Hyeon;Jung, Se-Young;Kim, Seon-Heon
    • Journal of Korean Society of Archives and Records Management / v.9 no.1 / pp.33-50 / 2009
  • The Agency for Defense Development (ADD) has developed a digital audio-visual archives management system to ensure easy access to, and long-term preservation of, digital audio-visual archives. This paper covers the entire process of system development and database management from the standpoints of preservation and of utilization, where digitizing the audio-visual archives makes them easy for users to search. In detail, it covers system design for handling image and video data, establishment of a standard workflow, data quality, and metadata settings for the database when converting analog data into digital format. This study also emphasizes the importance of the audio-visual archives management system through a cost-effectiveness analysis.

Musician Search in Time-Series Pattern Index Files using Features of Audio (오디오 특징계수를 이용한 시계열 패턴 인덱스 화일의 뮤지션 검색 기법)

  • Kim, Young-In
    • Journal of the Korea Society of Computer and Information / v.11 no.5 s.43 / pp.69-74 / 2006
  • Recent developments in multimedia content-based retrieval have drawn considerable attention to musician retrieval using features of digital audio data, one of several music information retrieval technologies. However, indexing techniques for music databases have not been studied thoroughly. In this paper, we present a musician retrieval technique based on audio features that uses space split methods in a time-series pattern index file. We use audio features to retrieve the musician and a time-series pattern index file to search for candidate musicians. Experimental results show that the time-series pattern index file using the rotational split method is efficient for musician retrieval in time-series pattern files.

Audio Steganography Method Using Least Significant Bit (LSB) Encoding Technique

  • Alarood, Alaa Abdulsalm;Alghamdi, Ahmed Mohammed;Alzahrani, Ahmed Omar;Alzahrani, Abdulrahman;Alsolami, Eesa
    • International Journal of Computer Science & Network Security / v.22 no.7 / pp.427-442 / 2022
  • MP3 is one of the most widely used file formats for encoding and representing audio data. One reason for this popularity is its ability to significantly reduce audio file sizes compared with other encoding techniques. Other reasons include ease of implementation, availability, and good technical support. Steganography is the art of shielding the communication between two parties from the eyes of attackers. In steganography, a secret message in the form of a copyright mark, concealed communication, or serial number can be embedded in an innocuous file (e.g., computer code, video film, or audio recording), making it impossible for the wrong party to access the hidden message during the exchange of data. This paper describes a new steganography algorithm for encoding secret messages in MP3 audio files using an improved least significant bit (LSB) technique with high embedding capacity. Test results show that the efficiency of this technique is higher than that of other LSB techniques.
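
For readers unfamiliar with LSB encoding, the following minimal sketch shows the basic (unimproved) technique on a list of integer PCM samples. It does not operate on MP3 frames and is not the improved algorithm the paper proposes. The function names and the one-bit-per-sample layout are assumptions made for illustration.

```python
def embed_lsb(samples, message):
    """Hide the bytes of `message` in the least significant bit of each sample."""
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(samples):
        raise ValueError("message too long for the cover audio")
    stego = list(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego

def extract_lsb(samples, n_bytes):
    """Read back `n_bytes` bytes from the least significant bits."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (samples[b * 8 + i] & 1)
        out.append(value)
    return bytes(out)
```

Each sample changes by at most one quantization step, which is why plain LSB embedding is hard to hear. The embedding capacity here is one bit per sample.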

A System of Audio Data Analysis and Masking Personal Information Using Audio Partitioning and Artificial Intelligence API (오디오 데이터 내 개인 신상 정보 검출과 마스킹을 위한 인공지능 API의 활용 및 음성 분할 방법의 연구)

  • Kim, TaeYoung;Hong, Ji Won;Kim, Do Hee;Kim, Hyung-Jong
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.5 / pp.895-907 / 2020
  • With the recent growth in influence of multimedia content other than text, services that help process the information in such content bring great convenience. Representative features of these services are searching for and masking sensitive data. Solutions that provide search and masking for text and images are not difficult to find. However, although the need for technology that searches and masks parts of audio data is recognized, such solutions are hard to find because the technology is difficult. In this study, we propose a web application that provides search and masking functions for audio data using an audio partitioning method. In pursuing this goal, we evaluated several speech-to-text conversion APIs to choose one suited to our purpose, and we developed regular expressions for finding sensitive information. Lastly, we evaluated the accuracy of the developed search and masking feature. The contribution of this work is the design and implementation of searching for and masking sensitive information in audio data, validated through experiments on its functionality.
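
The regular-expression step mentioned in this abstract might look roughly like the sketch below, applied to the output of a speech-to-text service. The two patterns, the `(token, start_sec, end_sec)` word format, and the function names are hypothetical. The paper's actual expressions and API output format are not given in the abstract.

```python
import re

# Hypothetical patterns for two kinds of sensitive information.
PATTERNS = {
    "phone": re.compile(r"\b\d{2,3}-\d{3,4}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

def mask_text(text):
    """Replace every match in a transcript with asterisks of the same length."""
    for pattern in PATTERNS.values():
        text = pattern.sub(lambda m: "*" * len(m.group()), text)
    return text

def spans_to_mute(words):
    """Return the (start, end) times of words that contain sensitive data.

    words: list of (token, start_sec, end_sec) tuples, an assumed format
    for word-level timestamps from a speech-to-text API.
    """
    return [(start, end) for token, start, end in words
            if any(p.search(token) for p in PATTERNS.values())]
```

The time spans returned by `spans_to_mute` are what an audio-side masking step would silence or overwrite in the corresponding audio partition.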

Comparison of environmental sound classification performance of convolutional neural networks according to audio preprocessing methods (오디오 전처리 방법에 따른 콘벌루션 신경망의 환경음 분류 성능 비교)

  • Oh, Wongeun
    • The Journal of the Acoustical Society of Korea / v.39 no.3 / pp.143-149 / 2020
  • This paper presents the effect of the feature extraction methods used in audio preprocessing on the classification performance of Convolutional Neural Networks (CNN). We extract the mel spectrogram, log mel spectrogram, Mel Frequency Cepstral Coefficient (MFCC), and delta MFCC from the UrbanSound8K dataset, which is widely used in environmental sound classification studies. We then scale the data to 3 distributions. Using the data, we test four CNNs, VGG16, and MobileNetV2 networks to assess performance according to the audio features and scaling. The highest recognition rate is achieved when the unscaled log mel spectrum is used as the audio feature. Although this result does not apply to all audio recognition problems, it is useful for classifying the environmental sounds included in UrbanSound8K.
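
To make the preprocessing terms in this abstract concrete, the sketch below shows one common (HTK-style) mel-scale formula, a log compression step, and two generic scalings. The abstract does not say which three distributions the paper scales to, so min-max scaling and standardization are assumed examples only, and all function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels using the HTK-style formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def log_compress(power, floor=1e-10):
    """Turn mel-band power values into decibels (the 'log mel' step)."""
    return [10.0 * math.log10(max(p, floor)) for p in power]

def min_max_scale(values):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Scale values to zero mean and unit variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]
```

In the paper's comparison, the unscaled output of the log compression step is the feature that gave the highest recognition rate.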

The Digital Redundancy Design for Back-up Mode Operation of Aviation Intercom (항공용 인터콤의 백업 모드 운용을 위한 디지털 방식의 이중화 설계)

  • Jeong, Seong-jae;Cho, Kyung-hak;Kim, Dong-hyouk;Lee, Seong-woo
    • Journal of Advanced Navigation Technology / v.26 no.5 / pp.358-364 / 2022
  • The Inter Communication System for avionics is in charge of processing all voice signals: internal calls between the pilot and co-pilot; internal calls between pilots and crew; external calls through communication equipment such as the Ultra/Very High Frequency Receiver/Transmitter (U/VHF RT); audio signal monitoring for navigation and mission equipment such as the VHF Omnidirectional Range/Instrument Landing System (VOR/ILS) and Tactical Air Navigation (TACAN); audio signal output for voice recording to the Flight Data Recorder (FDR) and Data Transfer System (DTS); and generation of warning/caution audio signals about the status of, and threats to, the aircraft. Because an Inter Communication System for avionics that uses analog audio signals is sensitive to noise, the missions of pilots and crew require a redundant design that can protect the audio signal from electromagnetic noise inside and outside the aircraft. This paper describes the Normal/Back-up operation modes and a digital redundancy design plan for the digital Inter Communication System for avionics, together with manufacturing and verification results.