• Title/Summary/Keyword: Audio Data Processing

A VLSI Design of CD Signal Processor for High-Speed CD-ROM

  • Kim, Jae-Won;Kim, Jae-Seok;Lee, Jaeshin
    • Proceedings of the IEEK Conference / 2002.07b / pp.1296-1299 / 2002
  • We implemented a CD signal processor for a CAV 48-speed CD-ROM drive as a VLSI chip. The CD signal processor is a mixed-mode monolithic IC that includes a servo processor, data recovery, a data processor, and a 1-bit DAC. For servo signal processing, we included a DSP core, while for CAV-mode playback we adopted a PLL with a wide recovery range. The data processor (DP) was designed to meet the Yellow Book specification [2], so the DP block consists of an EFM demodulator, a C1/C2 ECC block, an audio processor, and a block that transfers data to an ATAPI chip. A modified Euclid's algorithm was used as the key equation solver for the ECC block. To achieve high-speed decoding, the RS decoder is pipelined. Audio playability is increased by playing a CD-DA disc at 12X or 16X speed. For this, subcode sync and data are processed in the same way as the main data. The overall performance of the IC was verified by measuring the transfer rate from the innermost area of the disc to the outermost area. At 48-speed, the operating frequency is 210 MHz, and the chip is fabricated with the 0.35 µm STD90 cell library of Samsung Electronics.
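The reason the transfer rate is measured from the innermost to the outermost area can be illustrated with a little arithmetic: under CAV the disc rotates at a constant angular velocity, so the read speed grows with the radius. The sketch below is not taken from the paper. It assumes the nominal 1x CD-ROM Mode 1 rate (75 sectors per second, 2048 user bytes per sector) and a program area spanning radii of roughly 25 mm to 58 mm, with the rated 48x reached at the outer edge.

```python
# Assumed general CD parameters (not from the paper).
SECTORS_PER_SECOND_1X = 75      # nominal 1x sector rate
USER_BYTES_PER_SECTOR = 2048    # CD-ROM Mode 1 user data
INNER_RADIUS_MM = 25.0          # approximate start of the program area
OUTER_RADIUS_MM = 58.0          # approximate end of the program area


def cav_speed_factor(radius_mm, max_speed=48.0):
    """Speed multiple at a given radius when max_speed is reached at the outer edge."""
    return max_speed * radius_mm / OUTER_RADIUS_MM


def transfer_rate_bytes_per_s(speed_factor):
    """User-data transfer rate for a given speed multiple."""
    return speed_factor * SECTORS_PER_SECOND_1X * USER_BYTES_PER_SECTOR


if __name__ == "__main__":
    for label, radius in (("inner", INNER_RADIUS_MM), ("outer", OUTER_RADIUS_MM)):
        speed = cav_speed_factor(radius)
        rate = transfer_rate_bytes_per_s(speed)
        print(f"{label}: {speed:.1f}x, {rate / 1e6:.2f} MB/s")
```

Under these assumptions the drive delivers well under half of its rated speed at the inner tracks, which is why a single transfer-rate figure does not characterize a CAV drive.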

Energy-Aware Data-Preprocessing Scheme for Efficient Audio Deep Learning in Solar-Powered IoT Edge Computing Environments (태양 에너지 수집형 IoT 엣지 컴퓨팅 환경에서 효율적인 오디오 딥러닝을 위한 에너지 적응형 데이터 전처리 기법)

  • Yeontae Yoo;Dong Kun Noh
    • IEMEK Journal of Embedded Systems and Applications / v.18 no.4 / pp.159-164 / 2023
  • Because solar energy is recharged periodically, solar energy harvesting IoT devices prioritize maximizing the utilization of collected energy rather than minimizing energy consumption. Meanwhile, edge AI, which performs machine learning near the data source instead of in the cloud, is being actively researched for reasons such as data confidentiality and privacy, response time, and cost. One such research area involves running various audio AI applications on audio data collected from multiple IoT devices in an IoT edge computing environment. However, in most studies, IoT devices only transmit sensing data to the edge server, and all processing, including data preprocessing, is performed on the edge server. This not only overloads the edge server but also causes network congestion, because data that is unnecessary for learning is transmitted. Conversely, if data preprocessing is delegated to each IoT device to address this issue, blackout time increases due to energy shortages in the devices. In this paper, we aim to alleviate the increase in device blackout time while mitigating the issues of server-centric edge AI environments by determining where the data is preprocessed based on the energy state of each IoT device. In the proposed method, an IoT device performs the preprocessing, which includes sound discrimination and noise removal, and transmits the result to the server only if it has more energy available than the threshold required for its basic operation.
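The placement rule described in the abstract can be sketched as a simple threshold check. This is a minimal illustration, not the authors' implementation: the `Device` fields, the joule units, and the function name `preprocessing_site` are hypothetical, and the fallback of sending raw data to the server when energy is low is an assumption drawn from the server-centric baseline the abstract describes.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical view of a solar-powered IoT node."""
    name: str
    energy_j: float          # energy currently stored (joules, assumed unit)
    base_threshold_j: float  # energy reserved for basic operation


def preprocessing_site(device: Device) -> str:
    """Return where sound discrimination and noise removal should run.

    The device preprocesses locally only when it holds more energy than
    the threshold needed for basic operation; otherwise the raw data is
    assumed to go to the edge server.
    """
    if device.energy_j > device.base_threshold_j:
        return "device"
    return "server"


if __name__ == "__main__":
    sunny = Device("node-a", energy_j=120.0, base_threshold_j=50.0)
    cloudy = Device("node-b", energy_j=30.0, base_threshold_j=50.0)
    for node in (sunny, cloudy):
        print(node.name, "->", preprocessing_site(node))
```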

MPEG-2 AAC Encoder Implementation Using a Floating-Point DSP (부동 소수점 DSP를 이용한 MPEG-2 AAC 부호화기 구현)

  • Kim Seung-Woo
    • Journal of Korea Multimedia Society / v.8 no.7 / pp.882-888 / 2005
  • MPEG-2 Advanced Audio Coding (AAC) has already been standardized as a sophisticated next-generation technology. AAC provides CD-quality audio at 96-128 kbps/stereo. This paper describes a high-quality and efficient software implementation of an MPEG-2 AAC LC Profile encoder. Common scalefactor computation and noiseless coding are accelerated by 45% and 27%, respectively, through the use of TMS320C30 instructions. The implemented encoder uses 7.5 kWords of program memory, 18 kWords of data ROM, and 92 kBytes of data RAM. The results of a subjective quality test showed that the sound quality achieved at 96 kbps/stereo was equivalent to that of MP3 at 128 kbps/stereo.
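To put the quoted bit rates in perspective, the following sketch compares them with uncompressed CD audio. It assumes the standard CD-DA format (44.1 kHz, 16 bits, 2 channels), which is not stated in the abstract; the function names are illustrative only.

```python
def pcm_bitrate_bps(sample_rate_hz=44_100, bits_per_sample=16, channels=2):
    """Bit rate of uncompressed PCM audio (defaults: assumed CD-DA format)."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(coded_bps, pcm_bps=None):
    """How many times smaller the coded stream is than the PCM stream."""
    if pcm_bps is None:
        pcm_bps = pcm_bitrate_bps()
    return pcm_bps / coded_bps


if __name__ == "__main__":
    for kbps in (96, 128):
        ratio = compression_ratio(kbps * 1000)
        print(f"{kbps} kbps stereo is about {ratio:.1f}:1 versus CD PCM")
```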

Collision Hazards Detection for Construction Workers Safety Using Equipment Sound Data

  • Elelu, Kehinde;Le, Tuyen;Le, Chau
    • International conference on construction engineering and project management / 2022.06a / pp.736-743 / 2022
  • Construction workers experience a high rate of fatal incidents involving mobile equipment. One of the major causes is the decline in workers' hearing ability due to constant exposure to construction noise. Previous studies have proposed various ways in which audio sensing and machine learning techniques can be used to track equipment movement on the construction site, but they have not addressed the audibility of safety signals. This study develops a novel framework to help automate safety surveillance on the construction site. This is done by detecting equipment sounds at signal-to-noise ratios of -10 dB, -5 dB, 0 dB, 5 dB, and 10 dB in order to notify the worker of imminent danger from mobile equipment. The scope of this study is to develop a signal processing model that helps improve workers' ability to hear mobile equipment. The study comprises three phases: (a) collecting audio data of construction equipment, (b) developing a novel audio-based machine learning model for automated detection of collision hazards, to be integrated into intelligent hearing protection devices, and (c) conducting field experiments to investigate the system's efficiency and latency. The outcomes showed that the proposed model detects equipment correctly and can notify workers of hazardous situations in a timely manner.
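Test data at a fixed signal-to-noise ratio is commonly produced by scaling a noise recording before adding it to the target sound. The sketch below shows that standard scaling with plain Python lists; it is an assumption about how such mixtures could be built, not the authors' pipeline, and the sine tone and pseudo-random noise stand in for real equipment and site recordings.

```python
import math
import random


def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that signal power / noise power equals the target SNR, then add."""
    target_noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [s + gain * n for s, n in zip(signal, noise)]


def measured_snr_db(signal, mixture):
    """SNR of a mixture, given the clean signal it was built from."""
    residual = [m - s for m, s in zip(mixture, signal)]
    return 10 * math.log10(mean_power(signal) / mean_power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    tone = [math.sin(2 * math.pi * 440 * t / 16_000) for t in range(1600)]
    site_noise = [rng.uniform(-1.0, 1.0) for _ in range(1600)]
    for snr in (-10, -5, 0, 5, 10):
        mixed = mix_at_snr(tone, site_noise, snr)
        print(snr, round(measured_snr_db(tone, mixed), 3))
```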

Efficient DSP Architecture for High-Quality Audio Algorithms (고음질 오디오 알고리즘을 위한 효율적인 DSP 설계)

  • Moon, Jong-Ha;SunWoo, Myung-Hoon
    • Journal of the Institute of Electronics Engineers of Korea SP / v.44 no.5 / pp.112-117 / 2007
  • This paper presents specialized DSP instructions and their hardware architecture for audio coding algorithms such as MPEG-2/4 Advanced Audio Coding (AAC), Dolby AC-3, and MPEG-2 Backward Compatible (BC). The proposed architecture is specially designed and optimized for the MDCT/IMDCT (Modified Discrete Cosine Transform and its inverse) and the Huffman decoding of the AAC decoding algorithm. Performance comparisons show a significant improvement over the TMS320C62x and ADSP-21060 for the MDCT/IMDCT computation. In addition, the dedicated Huffman decoding accelerator performs decoding and operand preparation in only one cycle. The proposed DPU (Data Processing Unit) consists of 107,860 gates and achieves 150 MIPS.
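For readers unfamiliar with the transform this architecture accelerates, here is a direct (unoptimized, O(N²)) sketch of the textbook MDCT/IMDCT pair with a sine window and 50% overlap-add. It illustrates the mathematics only and says nothing about the proposed instructions or hardware; the 2/N scaling in `imdct` is the usual choice when the window is applied at both analysis and synthesis, and all function names are illustrative.

```python
import math


def sine_window(length):
    """Sine window of the given length; satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame):
    """Map 2N windowed samples to N coefficients."""
    n_half = len(frame) // 2
    shift = 0.5 + n_half / 2
    return [
        sum(x * math.cos(math.pi / n_half * (n + shift) * (k + 0.5))
            for n, x in enumerate(frame))
        for k in range(n_half)
    ]


def imdct(coeffs):
    """Map N coefficients back to 2N time-aliased samples (2/N scaling for windowed use)."""
    n_half = len(coeffs)
    shift = 0.5 + n_half / 2
    return [
        (2.0 / n_half) * sum(c * math.cos(math.pi / n_half * (n + shift) * (k + 0.5))
                             for k, c in enumerate(coeffs))
        for n in range(2 * n_half)
    ]


def analysis_synthesis(signal, n_half):
    """Window, transform, inverse-transform, window again, and overlap-add with hop N."""
    window = sine_window(2 * n_half)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        frame = [w * s for w, s in zip(window, signal[start:start + 2 * n_half])]
        rebuilt = imdct(mdct(frame))
        for i, (w, r) in enumerate(zip(window, rebuilt)):
            out[start + i] += w * r
    return out


if __name__ == "__main__":
    x = [math.sin(0.3 * n) for n in range(32)]
    y = analysis_synthesis(x, n_half=4)
    # The first and last N samples are covered by only one frame, so compare the interior.
    print(max(abs(a - b) for a, b in zip(x[4:-4], y[4:-4])))
```

The time-domain aliasing introduced by each frame cancels between neighbouring frames, so the interior of the signal is recovered up to rounding error.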

Area-wise relational knowledge distillation

  • Sungchul Cho;Sangje Park;Changwon Lim
    • Communications for Statistical Applications and Methods / v.30 no.5 / pp.501-516 / 2023
  • Knowledge distillation (KD) refers to extracting knowledge from a large and complex model (teacher) and transferring it to a relatively small model (student). This can be done by training the teacher model to obtain the activation values of its hidden or output layers and then training the student model on the same training data together with the obtained values. Recently, relational KD (RKD) has been proposed to extract knowledge about relative differences in the training data. This method improved the performance of the student model compared to conventional KD methods. In this paper, we propose a new RKD method by introducing a new loss function. The proposed loss function is defined using the area difference between the teacher model and the student model in a specific hidden layer, and we show that the model can be successfully compressed and that its generalization performance can be improved. In the model compression study on audio data, the accuracy of the model using the proposed method is up to 1.8% higher than that of the existing method. In the model generalization study, the model achieves up to 0.5% higher accuracy when the RKD method is introduced into self-KD using image data.
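As background for the teacher-to-student transfer described above, the following sketch computes the conventional soft-target distillation loss on output logits: a temperature-softened KL divergence between teacher and student distributions. It is the widely used baseline formulation, not the area-based RKD loss proposed in the paper, whose exact definition the abstract does not give. The temperature value and function names are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_kd_loss(teacher_logits, student_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on temperature-softened outputs.

    Assumes moderate logit values so that no student probability underflows to zero.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    good_student = [3.8, 1.1, 0.4]
    poor_student = [0.5, 4.0, 1.0]
    print(soft_target_kd_loss(teacher, good_student))
    print(soft_target_kd_loss(teacher, poor_student))
```

A student whose outputs track the teacher's yields a smaller loss than one that ranks the classes differently.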

Comparisons of stream activation mechanisms in computer based teleconferencing systems for low delay (지연 축소를 위한 컴퓨터 영상회의 시스템의 스트림 동작 구조 비교)

  • Lee, Gyeong-Hui;Kim, Du-Hyeon;Gang, Min-Gyu;Jeong, Chan-Geun
    • The Transactions of the Korea Information Processing Society / v.4 no.2 / pp.363-376 / 1997
  • In this paper, we present a hardware architecture and a software architecture for computer-based teleconferencing systems. We also analyze stream activation mechanisms for them from the viewpoint of delay. MuX, a multimedia I/O server, provides various processing elements for data I/O, synchronization, interleaving, and mixing. We describe methods to build teleconferencing systems with these elements and compare the technique using a master clock with the technique using a self clock. In the data input phase, the technique using a self clock is better than the technique using a master clock. When we generate an interleaved stream from audio and video streams and activate channel objects with the periodic audio stream as the activation clock, the delay from the input audio stream to the interleaved stream is reduced, but the delay for the video stream is not reduced as much as for the audio stream.

A Multimedia Conference System with a Hybrid Infrastructure (혼합형 하부 구조를 가진 멀티미디어 회의 시스템)

  • Seong, Mi-Yeong
    • The Transactions of the Korea Information Processing Society / v.4 no.2 / pp.377-383 / 1997
  • This paper presents the design and implementation of a Multiuser Multimedia Conference System for synchronous groupwork. The infrastructure of this system is a hybrid of centralized and replicated architectures, which maintains shared information consistently and reduces the overhead of network traffic to the central part. The communication control of data for groupwork management is centralized at the virtual node, and the communication control of real data such as audio, video, and text is replicated. To provide real-time audio and video processing, this system uses dynamic queues and multithreading.
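The combination of dynamic queues and multiple threads mentioned at the end of the abstract is commonly realized as a producer-consumer pipeline. The sketch below shows that general pattern with Python's standard `queue` and `threading` modules; the paper does not describe its queues at this level of detail, so the function names, the sentinel convention, and the frame labels are all hypothetical.

```python
import queue
import threading

END_OF_STREAM = None  # sentinel telling the consumer to stop (assumed convention)


def capture(frames, out_queue):
    """Producer thread: push captured media frames into an unbounded queue."""
    for frame in frames:
        out_queue.put(frame)
    out_queue.put(END_OF_STREAM)


def play(in_queue, played):
    """Consumer thread: pull frames as they arrive and hand them to playback."""
    while True:
        frame = in_queue.get()
        if frame is END_OF_STREAM:
            break
        played.append(frame)


def run_pipeline(frames):
    """Run capture and playback concurrently and return frames in playback order."""
    buffer = queue.Queue()  # grows as needed, so the producer never blocks
    played = []
    workers = [
        threading.Thread(target=capture, args=(frames, buffer)),
        threading.Thread(target=play, args=(buffer, played)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return played


if __name__ == "__main__":
    print(run_pipeline(["audio-0", "audio-1", "audio-2"]))
```

Decoupling capture from playback through a queue lets each side run at its own pace, which is the usual motivation for this design in real-time media handling.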

Architecture Design for MPEG-2 AAC Filter bank Decoder using Recursive Structure (Recursive 구조를 이용한 MPEG-2 AAC 복호화기의 필터뱅크 구현)

  • 박세기;강명수;오신범;이채욱
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.6C / pp.865-873 / 2004
  • MPEG-2 Advanced Audio Coding (AAC) is widely used among multi-channel audio compression standards. It combines a high-resolution filter bank, prediction techniques, and the Huffman coding algorithm to achieve broadcast-quality audio at very low data rates. The forward and inverse modified discrete cosine transforms (MDCT/IMDCT), which operate in the filter banks of the encoder and the decoder, require many computations. In this paper, we propose a suitable recursive structure for IMDCT processing in an MPEG-2 AAC real-time decoder. We evaluate the memory, computation speed, and complexity of the proposed structure.
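One generic way to make an IMDCT kernel recursive is to generate the cosine terms with a second-order recurrence instead of evaluating each one separately. The sketch below does this for a single unscaled IMDCT output sample, using the identity cos((k + 1.5)θ) = 2 cos θ · cos((k + 0.5)θ) − cos((k − 0.5)θ). It illustrates the general idea only and is not claimed to be the structure proposed in the paper; function names are illustrative.

```python
import math


def imdct_sample_direct(coeffs, n):
    """Unscaled IMDCT output sample n, evaluating every cosine explicitly."""
    size = len(coeffs)
    theta = math.pi / size * (n + 0.5 + size / 2)
    return sum(c * math.cos(theta * (k + 0.5)) for k, c in enumerate(coeffs))


def imdct_sample_recursive(coeffs, n):
    """Same sample, with cosines produced by a two-term recurrence.

    Only cos(theta) and cos(theta / 2) are evaluated; every further kernel
    value costs one multiplication and one subtraction.
    """
    size = len(coeffs)
    theta = math.pi / size * (n + 0.5 + size / 2)
    two_cos_theta = 2.0 * math.cos(theta)
    # The kernel is cos((k + 0.5) * theta); its values at k = -1 and k = 0
    # coincide because cosine is an even function.
    previous = current = math.cos(theta / 2)
    total = 0.0
    for c in coeffs:
        total += c * current
        previous, current = current, two_cos_theta * current - previous
    return total


if __name__ == "__main__":
    spectrum = [0.5, -1.0, 0.25, 2.0, -0.75, 1.5, 0.0, -0.5]
    worst = max(
        abs(imdct_sample_direct(spectrum, n) - imdct_sample_recursive(spectrum, n))
        for n in range(2 * len(spectrum))
    )
    print(worst)
```

Such recurrences trade table memory for a small amount of arithmetic, at the cost of rounding error that accumulates with the transform length.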

Noise Robust Automatic Speech Recognition Scheme with Histogram of Oriented Gradient Features

  • Park, Taejin;Beack, SeungKwan;Lee, Taejin
    • IEIE Transactions on Smart Processing and Computing / v.3 no.5 / pp.259-266 / 2014
  • In this paper, we propose a novel technique for noise-robust automatic speech recognition (ASR). The development of ASR techniques has made it possible to recognize isolated words with near-perfect accuracy. However, in a highly noisy environment, a distinct mismatch between the trained speech and the test data results in significantly degraded word recognition accuracy (WRA). Unlike conventional ASR systems employing Mel-frequency cepstral coefficients (MFCCs) and a hidden Markov model (HMM), this study employs histogram of oriented gradient (HOG) features and a support vector machine (SVM) for ASR tasks to overcome this problem. Our proposed ASR system is less vulnerable to external interference noise and achieves a higher WRA than a conventional ASR system equipped with MFCCs and an HMM. The performance of our proposed ASR system was evaluated using a phonetically balanced word (PBW) set mixed with artificially added noise.
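HOG features treat a time-frequency representation of speech as an image and summarize local gradient directions. The sketch below computes a single-cell orientation histogram for a small 2D patch in plain Python. It is a simplified illustration of the general HOG idea (central-difference gradients, unsigned orientation, magnitude-weighted bins); the paper's cell layout, normalization, and SVM stage are not reproduced, and the 9-bin default is an assumption.

```python
import math


def hog_cell(patch, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0 to 180 degrees).

    `patch` is a list of equal-length rows, e.g. a tile of a spectrogram.
    Border pixels are skipped because central differences need both neighbours.
    """
    histogram = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for r in range(1, len(patch) - 1):
        for c in range(1, len(patch[0]) - 1):
            gx = patch[r][c + 1] - patch[r][c - 1]  # change along the column axis
            gy = patch[r + 1][c] - patch[r - 1][c]  # change along the row axis
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            histogram[int(angle // bin_width) % n_bins] += magnitude
    return histogram


if __name__ == "__main__":
    # Values rise from left to right, so all gradient energy lands in the first bin.
    ramp = [[float(c) for c in range(4)] for _ in range(4)]
    print(hog_cell(ramp))
```

In a full system, histograms from many cells would be concatenated into one feature vector and passed to a classifier such as an SVM.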