• Title/Summary/Keyword: Audio Data


Design and Implementation of an Embedded Audio Video Bridging Platform for Multichannel Multimedia Transmission (다채널 멀티미디어 전송용 임베디드 Audio Video Bridging 플랫폼 설계 및 구현)

  • Wee, Jungwook;Park, Kyoungwon;Kwon, Kiwon;Song, Byoungchul;Kang, Mingoo
    • Journal of Internet Computing and Services / v.16 no.2 / pp.1-6 / 2015
  • In this paper, we designed an embedded audio video bridging (AVB) platform based on IEEE 802.1BA for real-time multimedia transmission in smart cars, smart homes, and smart theaters, and then evaluated the performance of the implemented platform by analyzing IEEE 802.1AS (time synchronization protocol) and IEEE 802.1Qat (stream reservation protocol). In particular, the AVB Layer-2 protocols, including MRP (Multiple Registration Protocol), MAAP (MAC Address Acquisition Protocol), IEEE 1722, and IEEE 1722.1, were designed and implemented on a Linux-based operating system. Interoperability tests with commercial products show that the implemented platform transmits real-time multichannel AV data over AVB networks.
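The time synchronization evaluated above rests on the IEEE 802.1AS (gPTP) peer-delay exchange. The sketch below shows only the basic arithmetic, assuming syntonized clocks (a neighbor rate ratio of exactly 1) and nanosecond timestamps; it is an illustration of the standard's formula, not the authors' implementation, and the function and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PdelayTimestamps:
    """Timestamps (ns) from one peer-delay exchange, named as in IEEE 802.1AS."""
    t1: int  # Pdelay_Req transmitted by the initiator (initiator clock)
    t2: int  # Pdelay_Req received by the responder (responder clock)
    t3: int  # Pdelay_Resp transmitted by the responder (responder clock)
    t4: int  # Pdelay_Resp received by the initiator (initiator clock)


def mean_link_delay(ts: PdelayTimestamps) -> float:
    # Round trip seen by the initiator minus the responder's turnaround time,
    # halved. Assumes both clocks tick at the same rate (rate ratio = 1).
    return ((ts.t4 - ts.t1) - (ts.t3 - ts.t2)) / 2


def offset_from_master(sync_rx_ns: int, precise_origin_ns: int,
                       correction_ns: float, link_delay_ns: float) -> float:
    # Local receive time of Sync minus the grandmaster time at that instant:
    # origin timestamp + accumulated correction + delay of the last link.
    return sync_rx_ns - (precise_origin_ns + correction_ns + link_delay_ns)
```

For example, with a true link delay of 500 ns and a 2,000 ns responder turnaround, `mean_link_delay(PdelayTimestamps(0, 10_500, 12_500, 3_000))` recovers 500 ns even though the responder's clock is offset; the slave then subtracts that delay when computing its offset from the grandmaster.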

Deep Learning-Based User Emergency Event Detection Algorithms Fusing Vision, Audio, Activity and Dust Sensors (영상, 음성, 활동, 먼지 센서를 융합한 딥러닝 기반 사용자 이상 징후 탐지 알고리즘)

  • Jung, Ju-ho;Lee, Do-hyun;Kim, Seong-su;Ahn, Jun-ho
    • Journal of Internet Computing and Services / v.21 no.5 / pp.109-118 / 2020
  • Recently, people have been spending much more time at home because of various diseases. Members of single-person households who are injured at home or infected with a disease, and who need help, can find it difficult to ask others for it. In this study, we propose an algorithm to detect emergency events, that is, situations such as injuries or disease infections in which single-person households need help from others at home. We propose vision pattern detection algorithms using home CCTVs, audio pattern detection algorithms using artificial intelligence speakers, activity pattern detection algorithms using the acceleration sensors in smartphones, and dust pattern detection algorithms using air purifiers. Where home CCTVs cannot be used because of security concerns, we propose a fusion method combining the audio, activity, and dust pattern sensors. Data for each algorithm were collected from YouTube and from experiments, and were used to measure accuracy.
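The abstract does not state how the sensor outputs are combined. One simple possibility is weighted late fusion of per-sensor emergency probabilities, with the weights renormalized when the CCTV channel is unavailable. The sketch below assumes exactly that; the weights, threshold, and function name are hypothetical and are not taken from the paper.

```python
from typing import Mapping, Optional

# Hypothetical per-sensor weights; the paper's actual fusion rule is not given.
DEFAULT_WEIGHTS = {"vision": 0.4, "audio": 0.3, "activity": 0.2, "dust": 0.1}


def fuse_scores(scores: Mapping[str, Optional[float]],
                weights: Mapping[str, float] = DEFAULT_WEIGHTS,
                threshold: float = 0.5) -> tuple[float, bool]:
    """Weighted late fusion of per-sensor emergency probabilities in [0, 1].

    Sensors whose score is None (e.g. CCTV disabled for security reasons)
    are skipped, and the remaining weights are renormalized to sum to 1.
    Returns (fused probability, emergency flag).
    """
    available = {k: v for k, v in scores.items()
                 if v is not None and k in weights}
    if not available:
        raise ValueError("no sensor scores available")
    total_w = sum(weights[k] for k in available)
    fused = sum(weights[k] * v for k, v in available.items()) / total_w
    return fused, fused >= threshold
```

With CCTV disabled, `fuse_scores({"vision": None, "audio": 0.9, "activity": 0.6, "dust": 0.0})` fuses only the three remaining sensors (fused score 0.65, flagged as an emergency). Renormalizing keeps the decision threshold meaningful regardless of which sensors are present.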

Comparisons of stream activation mechanisms in computer-based teleconferencing systems for low delay (지연 축소를 위한 컴퓨터 영상회의 시스템의 스트림 동작 구조 비교)

  • Lee, Gyeong-Hui;Kim, Du-Hyeon;Gang, Min-Gyu;Jeong, Chan-Geun
    • The Transactions of the Korea Information Processing Society / v.4 no.2 / pp.363-376 / 1997
  • In this paper, we present a hardware architecture and a software architecture for computer-based teleconferencing systems, and we analyze their stream activation mechanisms from the viewpoint of delay. MuX, a multimedia I/O server, provides various processing elements for data I/O, synchronization, interleaving, and mixing. We describe methods for building teleconferencing systems from these elements and compare the technique using a master clock with the technique using a self clock. In the data input phase, the self-clock technique performs better than the master-clock technique. When an interleaved stream is generated from audio and video streams and the channel objects are activated using the periodic audio stream as the activation clock, the delay from the input audio stream to the interleaved stream is reduced, but the delay for the video stream is not reduced as much as for the audio stream.

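Why audio-clocked activation helps audio more than video can be seen in a toy simulation. The sketch below assumes that each audio block triggers an interleaved packet immediately on arrival and that any video frames which arrived since the previous trigger ride along in that packet. It does not model MuX's actual interfaces, and the function name is hypothetical.

```python
def audio_clocked_interleave(audio_period_ms: float,
                             video_arrivals_ms: list[float],
                             n_audio_blocks: int):
    """Toy model of an interleaver activated by a periodic audio stream.

    Audio block k arrives at k * audio_period_ms and immediately triggers an
    interleaved packet, so audio sees no extra delay. A video frame must wait
    for the next audio trigger, so its delay is up to one audio period.
    Returns (packets, video_delays_ms).
    """
    packets = []
    video_delays = []
    pending = sorted(video_arrivals_ms)
    i = 0
    for k in range(n_audio_blocks):
        tick = k * audio_period_ms
        frames = []
        # Collect every video frame that arrived at or before this audio tick.
        while i < len(pending) and pending[i] <= tick:
            frames.append(pending[i])
            video_delays.append(tick - pending[i])
            i += 1
        packets.append({"time_ms": tick, "audio_block": k,
                        "video_frames": frames})
    return packets, video_delays
```

With a 20 ms audio period and video frames arriving at 5, 33, and 67 ms, the video frames wait 15, 7, and 13 ms respectively, while each audio block leaves as soon as it arrives. This matches the abstract's observation that audio-clocked activation reduces audio delay more than video delay.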

A Comparison of Speech/Music Discrimination Features for Audio Indexing (오디오 인덱싱을 위한 음성/음악 분류 특징 비교)

  • 이경록;서봉수;김진영
    • The Journal of the Acoustical Society of Korea / v.20 no.2 / pp.10-15 / 2001
  • In this paper, we compare combinations of features for speech/music discrimination, that is, classifying audio signals as speech or music. Audio signals are classified into 3 classes (speech, music, speech and music) and into 2 classes (speech, music). Experiments were carried out on three types of features (Mel-cepstrum, energy, and zero-crossings) to find the best feature combination for speech/music discrimination. We use a Gaussian Mixture Model (GMM) as the discrimination algorithm and combine the different features into a single vector before modeling the data with the GMM. For 3 classes, the best result is achieved using Mel-cepstrum, energy, and zero-crossings in a single feature vector (speech: 95.1%, music: 61.9%, speech & music: 55.5%). For 2 classes, the best result is achieved by both the Mel-cepstrum + energy vector and the Mel-cepstrum + energy + zero-crossings vector (speech: 98.9%, music: 100%).

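The feature-combination idea above (computing several features per frame and concatenating them into one vector before modeling) can be sketched with two of the three features: short-time energy and zero-crossing rate. For brevity, a one-component diagonal Gaussian stands in for the paper's multi-component GMM, and Mel-cepstral coefficients would be appended as a further part of the vector. All names here are hypothetical.

```python
import math
from typing import Sequence


def frame_features(frame: Sequence[float]) -> list[float]:
    """Short-time energy and zero-crossing rate of one audio frame."""
    n = len(frame)
    energy = sum(x * x for x in frame) / n
    # Count sign changes between adjacent samples (zero treated as positive).
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (n - 1)
    return [energy, zcr]


def concat_features(*parts: Sequence[float]) -> list[float]:
    # Stack several feature sets into one vector before modeling.
    out: list[float] = []
    for p in parts:
        out.extend(p)
    return out


class DiagGaussian:
    """One-component, diagonal-covariance GMM (a stand-in for a full GMM)."""

    def __init__(self, vectors: Sequence[Sequence[float]], floor: float = 1e-6):
        d, n = len(vectors[0]), len(vectors)
        self.mean = [sum(v[j] for v in vectors) / n for j in range(d)]
        # Variance floor avoids division by zero for constant features.
        self.var = [max(sum((v[j] - self.mean[j]) ** 2 for v in vectors) / n, floor)
                    for j in range(d)]

    def log_likelihood(self, x: Sequence[float]) -> float:
        return sum(-0.5 * (math.log(2 * math.pi * s) + (xi - m) ** 2 / s)
                   for xi, m, s in zip(x, self.mean, self.var))


def classify(x: Sequence[float], models: dict) -> str:
    # Pick the class whose model gives the feature vector the highest likelihood.
    return max(models, key=lambda label: models[label].log_likelihood(x))
```

In practice, a library GMM such as scikit-learn's `GaussianMixture` with several components per class would replace `DiagGaussian`. The decision rule, choosing the class with the highest likelihood, stays the same.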

The Development of Multimedia Player Platform for Terrestrial Digital Multimedia Broadcasting (DMB) (지상파 이동 멀티미디어방송용 멀티미디어 재생기 개발)

  • 기명석;서정일;강경옥
    • Journal of Broadcast Engineering / v.8 no.4 / pp.465-472 / 2003
  • In this paper, we propose the structure of an MPEG-4 multimedia player platform for the Terrestrial Digital Multimedia Broadcasting (DMB) service. Korea will launch the DMB service in 2004 based on the Eureka-147 Digital Audio Broadcasting (DAB) system. This new mobile multimedia broadcasting service will provide not only high-quality digital audio broadcasting but also various multimedia data broadcasting services, including high-quality video. Through MPEG-4 Systems technologies, it will also provide interactive services to users in the near future. The terminal must therefore offer various functionalities beyond playing audio-visual content. However, there is no precedent standard for such a mobile interactive multimedia broadcasting system. It is therefore very important to provide a multimedia player platform for the DMB service, both to accelerate the development of commercial terminals and to indicate a direction for the structure of next-generation DMB terminals.

Implementation of MP3 decoder with TMS320C541 DSP (TMS320C541 DSP를 이용한 MP3 디코더 구현)

  • 윤병우
    • Journal of the Institute of Convergence Signal Processing / v.4 no.3 / pp.7-14 / 2003
  • The MPEG-1 audio standard is an algorithm for compressing high-quality digital audio signals. The standard specifies the functions of the encoder and decoder pair and includes three layers that differ in the complexity and performance of the encoder and decoder. In this paper, we implemented a real-time MPEG-1 Audio Layer III (MP3) decoder on the TMS320C541 fixed-point DSP chip. The MP3 algorithm exploits the psychoacoustic characteristics of the human auditory system and reduces the amount of data by eliminating signal components that are hard for humans to hear. Implementing an MP3 decoder on a fixed-point DSP is difficult because of MP3's broad dynamic range. We implemented a real-time system on the fixed-point DSP chip by using weighted look-up tables, which reduce the amount of calculation and solve the problem of broad dynamic range.

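The abstract does not detail its "weighted look-up tables". One common way to tame the dynamic range of MP3 requantization, where each Huffman-decoded magnitude `is` is raised to the power 4/3, is to tabulate that power as a 16-bit normalized mantissa plus a separate exponent. The sketch below assumes that approach and uses Python in place of DSP code. The table layout and names are hypothetical, and the maximum magnitude of 8206 follows from Layer III's largest Huffman value (15) plus 13 linbits.

```python
import math

MAX_IS = 8206  # largest Huffman-decoded magnitude in MPEG-1 Layer III


def build_pow43_table(max_is: int = MAX_IS) -> list[tuple[int, int]]:
    """Tabulate is**(4/3) as (Q15 mantissa, exponent) pairs.

    Each entry satisfies is**(4/3) ~= mantissa * 2**(exponent - 15). Every
    mantissa fits a signed 16-bit word, while the exponent carries the
    dynamic range that a plain 16-bit table could not hold.
    """
    table = []
    for i in range(max_is + 1):
        m, e = math.frexp(i ** (4.0 / 3.0))  # m in [0.5, 1); (0.0, 0) for i = 0
        mq = round(m * 32768)
        if mq == 32768:  # rounding pushed the mantissa to 1.0: renormalize
            mq, e = 16384, e + 1
        table.append((mq, e))
    return table


def pow43_lookup(table: list[tuple[int, int]], i: int) -> float:
    # On a DSP this would be a multiply by the mantissa and a shift by the exponent.
    mq, e = table[i]
    return math.ldexp(mq, e - 15)
```

Because each normalized mantissa lies between 16384 and 32767, its rounding error stays below 2^-15 relative to the true value across the whole range. This is the property that lets a 16-bit datapath handle values up to about 1.65 × 10^5.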

Improving Fidelity of Synthesized Voices Generated by Using GANs (GAN으로 합성한 음성의 충실도 향상)

  • Back, Moon-Ki;Yoon, Seung-Won;Lee, Sang-Baek;Lee, Kyu-Chul
    • KIPS Transactions on Software and Data Engineering / v.10 no.1 / pp.9-18 / 2021
  • Although Generative Adversarial Networks (GANs) have gained great popularity in computer vision and related fields, their use for generating audio signals directly has been far less explored. Unlike an image, an audio signal is a one-dimensional sequence of discrete samples, so it is not easy to learn with the CNN architectures that are widely used in image generation tasks. To overcome this difficulty, GAN researchers proposed applying time-frequency representations of audio to existing image-generating GANs. Following this strategy, we propose an improved method for increasing the fidelity of synthesized audio signals generated by GANs. Our method is demonstrated on a public speech dataset and evaluated by Fréchet Inception Distance (FID). Our method achieved an FID of 10.504, compared with 11.973 for the existing state-of-the-art method (a lower FID indicates better fidelity).
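FID fits a Gaussian to network embeddings of real and of generated samples and measures the Fréchet distance between the two, ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½). The sketch below assumes diagonal covariances, where the matrix square root reduces to an elementwise one. The standard metric uses full covariance matrices (for example via `scipy.linalg.sqrtm`), so this only illustrates the formula and is not the evaluation code used in the paper.

```python
import math
from typing import Sequence


def frechet_distance_diag(mu1: Sequence[float], var1: Sequence[float],
                          mu2: Sequence[float], var2: Sequence[float]) -> float:
    """Frechet distance between two Gaussians with diagonal covariances.

    General form: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)).
    With diagonal S1 and S2, (S1 S2)^(1/2) is the elementwise sqrt(v1 * v2).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2)
                   for v1, v2 in zip(var1, var2))
    return mean_term + cov_term
```

Identical distributions give 0. Shifting every mean by 1 in two dimensions with unit variances gives 2.0. This is why a lower FID, such as 10.504 versus 11.973, indicates generated samples whose feature statistics lie closer to those of real speech.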

Shooting sound analysis using convolutional neural networks and long short-term memory (합성곱 신경망과 장단기 메모리를 이용한 사격음 분석 기법)

  • Kang, Se Hyeok;Cho, Ji Woong
    • The Journal of the Acoustical Society of Korea / v.41 no.3 / pp.312-318 / 2022
  • This paper proposes a deep neural network model that classifies the type of gun and estimates information about the sound source location. The proposed classification model is composed of convolutional neural networks (CNN) and long short-term memory (LSTM). To train and test the model, we use the Gunshot Audio Forensic Dataset, generated by a project supported by the National Institute of Justice (NIJ). The acoustic signals are transformed into Mel-spectrograms, which serve as training and test data for the proposed model. The model is compared with a control model consisting of convolutional neural networks only. The proposed model achieves an accuracy of over 90%.
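The Mel-spectrogram step above maps each FFT power spectrum onto triangular filters spaced evenly on the mel scale. The sketch below assumes the HTK-style mel formula and arbitrary example parameters, since the paper's settings are not given here. Libraries such as librosa default to a different (Slaney) mel variant, so their values will differ slightly.

```python
import math
from typing import Optional, Sequence


def hz_to_mel(f: float) -> float:
    # HTK-style mel scale
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sample_rate: float,
                   f_min: float = 0.0,
                   f_max: Optional[float] = None) -> list[list[float]]:
    """Triangular filters mapping n_fft // 2 + 1 FFT bins onto n_mels bands."""
    f_max = sample_rate / 2 if f_max is None else f_max
    n_bins = n_fft // 2 + 1
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels + 2 equally spaced mel points give each triangle's three edges.
    edges = [mel_to_hz(lo + (hi - lo) * i / (n_mels + 1))
             for i in range(n_mels + 2)]
    bank = []
    for m in range(1, n_mels + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if left <= f <= center:
                w = (f - left) / (center - left)    # rising slope
            elif center < f <= right:
                w = (right - f) / (right - center)  # falling slope
            else:
                w = 0.0
            row.append(w)
        bank.append(row)
    return bank


def mel_spectrogram_frame(power_spectrum: Sequence[float],
                          bank: list[list[float]]) -> list[float]:
    # One column of a mel-spectrogram: filter-weighted sums of bin powers.
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
```

Stacking one such column per STFT frame, usually followed by a logarithm, yields the time-by-mel image that a CNN front end such as the one described above consumes, with the LSTM then modeling its evolution over time.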

Low-Latency Implementation of Multi-channel in AoIP/UDP-based Audio Communication (AoIP/UDP 기반 오디오 통신의 다중 채널 Low-Latency 구현)

  • Seung-Do Yang;Jin-ku Choi
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.3 / pp.59-64 / 2023
  • Fire and disaster broadcasting systems are divided into analog, digital, and network-based digital public address systems. The key requirements of network-based digital public address systems are low-latency audio, a high sampling rate, and multichannel input and output. In the past, the AoE (Audio over Ethernet) method, which distinguishes devices by their MAC addresses at the data link layer, was widely used. However, this method increases complexity and cost. The proposed AoIP/UDP method distinguishes devices easily by IP address, without requiring a separate redundant network, so the network can be freely used and configured, and cost can be reduced through lower complexity. After implementing the AoIP/UDP method, experimental results showed reduced cost with equivalent performance, at a latency of 2.66 ms.
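An AoIP/UDP sender places interleaved multichannel PCM samples into UDP datagrams addressed by IP rather than by MAC address. The paper's packet format is not given, so the 4-byte header below (sequence number and channel count) and the 16-bit big-endian samples are hypothetical choices made only to illustrate packing, unpacking, and the packetization component of latency.

```python
import struct

HEADER = struct.Struct("!HH")  # hypothetical header: sequence number, channel count


def pack_audio_packet(seq: int, frames: list[list[int]]) -> bytes:
    """Interleave 16-bit PCM frames (one list of channel samples per frame)."""
    n_ch = len(frames[0])
    samples = [s for frame in frames for s in frame]
    payload = struct.pack(f"!{len(samples)}h", *samples)
    return HEADER.pack(seq & 0xFFFF, n_ch) + payload


def unpack_audio_packet(data: bytes) -> tuple[int, list[list[int]]]:
    seq, n_ch = HEADER.unpack_from(data)
    n = (len(data) - HEADER.size) // 2
    samples = struct.unpack_from(f"!{n}h", data, HEADER.size)
    # De-interleave back into per-frame channel lists.
    frames = [list(samples[i:i + n_ch]) for i in range(0, n, n_ch)]
    return seq, frames


def packet_time_ms(frames_per_packet: int, sample_rate: int) -> float:
    # Audio duration carried by one packet: a lower bound on packetization delay.
    return 1000.0 * frames_per_packet / sample_rate
```

A datagram built this way would be sent with the standard `socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(data, (host, port))`, where host and port are deployment-specific. At 48 kHz, 48 frames per packet already add 1 ms of packetization delay before network transit and receive buffering. This illustrates why small packets matter for a total latency on the order of the 2.66 ms reported above.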

A Research on the Audio Utilization Method for Generating Movie Genre Metadata (영화 장르 메타데이터 생성을 위한 오디오 활용 방법에 대한 연구)

  • Yong, Sung-Jung;Park, Hyo-Gyeong;You, Yeon-Hwi;Moon, Il-Young
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.284-286 / 2021
  • With the continuous development of the Internet and digital technology, platforms are emerging that store large amounts of media data and provide customized services to individuals online. Companies that provide these services recommend movies suited to personal tastes to promote media consumption, and each is actively researching algorithms to recommend media that users prefer. Movies are divided into genres such as action, melodrama, horror, and drama, and a film's audio (music, sound effects, voice) is an important production element. In this research, based on movie trailers, we extract the audio for each genre, identify commonalities in the audio of each genre, distinguish movie genres through supervised machine learning, and propose a method for using audio to generate metadata in the future.

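The abstract does not name the supervised learner used to distinguish genres. As a minimal illustration of supervised genre classification from per-trailer audio feature vectors, the sketch below uses a nearest-centroid classifier. The feature values, genre labels, and function names are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Sequence


def train_centroids(features: Sequence[Sequence[float]],
                    labels: Sequence[str]) -> dict[str, list[float]]:
    """Average the labeled feature vectors of each genre into one centroid."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for j, v in enumerate(x):
            sums[y][j] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict(centroids: dict[str, list[float]], x: Sequence[float]) -> str:
    # Assign the genre whose centroid is closest in Euclidean distance.
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))
```

The predicted label could then be written into a movie's metadata record. Any stronger classifier trained on the same labeled audio features would slot into the same pipeline.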