• Title/Summary/Keyword: Digital Audio


Design of an Inverter-Based 3rd Order ΔΣ Modulator Using 1.5bit Comparators (1.5비트 비교기를 이용한 인버터 기반 3차 델타-시그마 변조기)

  • Choi, Jeong Hoon;Seong, Jae Hyeon;Yoon, Kwang Sub
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.7
    • /
    • pp.39-46
    • /
    • 2016
  • This paper describes a third-order feedforward delta-sigma modulator with inverter-based integrators and a 1.5-bit comparator for audio signal processing applications. The proposed third-order delta-sigma modulator is a multi-bit structure that uses a 1.5-bit comparator and replaces operational amplifiers with inverters. Compared with a single-bit fourth-order delta-sigma modulator, it achieves a higher SNR at a low OSR. It also minimizes power consumption and simplifies the circuit structure by using inverter-based integrators, one of which also serves as the analog adder. The modulator was designed in a 0.18 µm standard CMOS process with a total chip area of 0.36 mm². The measured power consumption is 28.8 µW from a 0.8 V analog supply and 66.6 µW from a 1.8 V digital supply. Measurements show a peak SNDR of 80.7 dB, an ENOB of 13.1 bits, and a dynamic range of 86.1 dB with an input signal frequency of 2.5 kHz, a sampling frequency of 2.56 MHz, and an oversampling ratio of 64. From these measurements, the Walden FOM is 269 fJ/step and the Schreier FOM is 169.3 dB.
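
As a quick sanity check, the reported figures of merit can be recomputed from the measured values above using the standard Walden and Schreier definitions (signal bandwidth taken as fs / (2 · OSR), total power as the sum of the analog and digital supply contributions):

```python
import math

p_total = 28.8e-6 + 66.6e-6        # W, analog + digital power
enob = 13.1                         # bits
dr_db = 86.1                        # dB, dynamic range
fs = 2.56e6                         # Hz, sampling frequency
osr = 64
bw = fs / (2 * osr)                 # 20 kHz signal bandwidth

fom_walden = p_total / (2 ** enob * 2 * bw)           # J/step
fom_schreier = dr_db + 10 * math.log10(bw / p_total)  # dB

print(f"Walden FOM:   {fom_walden * 1e15:.0f} fJ/step")  # ~272, matching the reported 269 within ENOB rounding
print(f"Schreier FOM: {fom_schreier:.1f} dB")            # 169.3, matching the reported value
```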

Research for Characteristics of Sound Localization at Monaural System Using Acoustic Energy (청각에너지를 이용한 모노럴 시스템에서의 음상 정위 특성 연구)

  • Koo, Kyo-Sik;Cha, Hyung-Tai
    • The Journal of the Acoustical Society of Korea
    • /
    • v.30 no.4
    • /
    • pp.181-189
    • /
    • 2011
  • With developments in digital signal processing, 3D sound has come into focus in multimedia systems. Many studies on 3D sound have proposed cues for creating realistic sound, but these cues assume a binaural system in which both ears are normal; if they are applied in a monaural system, performance degrades dramatically. To adapt the cues to monaural systems, algorithms based on the duplex theory have been studied. According to the duplex theory, the sounds we hear are affected by the body, pinna, and shoulder, so sound localization performance can be enhanced by exploiting these characteristics. In this paper, we propose a new method based on psychoacoustic theory that creates realistic 3D audio in monaural systems. To improve the 3D sound, we calculate the excitation energy ratios of each pair of symmetric HRTFs and extract a weight for each Bark band. These weights are then applied to emphasize the characteristics associated with each direction. Informal listening tests show that the proposed method improves sound localization performance considerably over conventional methods.
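
An illustrative sketch of the per-band weighting idea (not the authors' exact algorithm): compute the energy ratio between a pair of symmetric HRTF magnitude responses in each Bark band and use the ratios as direction-emphasis weights. The Bark mapping here uses Traunmueller's approximation; frequencies below Bark 0 are ignored.

```python
import numpy as np

def bark_band_weights(h_target, h_mirror, fs=44100):
    """Energy-ratio weight per Bark band for two HRTF magnitude spectra."""
    n = len(h_target)
    freqs = np.linspace(0, fs / 2, n)
    # Traunmueller's approximation of the Bark scale
    bark = 26.81 * freqs / (1960.0 + freqs) - 0.53
    weights = []
    for b in range(int(bark.max()) + 1):
        idx = (bark >= b) & (bark < b + 1)
        e_t = np.sum(h_target[idx] ** 2) + 1e-12   # epsilon avoids divide-by-zero
        e_m = np.sum(h_mirror[idx] ** 2) + 1e-12
        weights.append(e_t / e_m)  # >1 emphasizes this band for the target direction
    return np.array(weights)
```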

A Novel Query-by-Singing/Humming Method by Estimating Matching Positions Based on Multi-layered Perceptron

  • Pham, Tuyen Danh;Nam, Gi Pyo;Shin, Kwang Yong;Park, Kang Ryoung
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.7 no.7
    • /
    • pp.1657-1670
    • /
    • 2013
  • The growing number of music files on smartphones and MP3 players makes it difficult to find the files people want. Query-by-Singing/Humming (QbSH) systems have therefore been developed to retrieve music from a user's humming or singing without detailed information about the title or singer of a song. Most previous research on QbSH has used musical instrument digital interface (MIDI) files as reference songs. However, producing MIDI files is time-consuming, and more and more music is published as the music market develops. Consequently, using the more common MPEG-1 audio layer 3 (MP3) format for reference songs is considered as an alternative. There is little previous research on QbSH with MP3 files, however, because an MP3 file has a different waveform from the humming/singing query due to background music and multiple (polyphonic) melodies. To overcome these problems, we propose a new QbSH method using MP3 files on a mobile device. This research is novel in four ways. First, it is the first research on QbSH that uses MP3 files as reference songs. Second, the start and end positions on the MP3 file to be matched are estimated with a multi-layered perceptron (MLP) before matching against the humming/singing query. Third, for more accurate results, four MLPs are used, producing the start and end positions for the dynamic time warping (DTW) matching algorithm and for a chroma-based DTW algorithm, respectively. Fourth, the two matching scores from the DTW and chroma-based DTW algorithms are combined with the PRODUCT rule, which yields higher matching accuracy. Experimental results on the AFA MP3 database show that the accuracy of the proposed method (Top-1 accuracy of 98%, MRR of 0.989) is much higher than that of other methods. We also demonstrate the effectiveness of the proposed system on a consumer mobile device.
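
The matching stage described above rests on dynamic time warping. A minimal sketch of a DTW distance over 1-D feature sequences and of the PRODUCT-rule fusion (the MLP position estimation and chroma feature extraction are not reproduced here) might look like:

```python
import numpy as np

def dtw_distance(query, reference):
    """Classic O(n*m) DTW with absolute-difference local cost."""
    n, m = len(query), len(reference)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - reference[j - 1])
            d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return d[n, m]

def product_fusion(score_a, score_b):
    """PRODUCT-rule combination of two matcher scores."""
    return score_a * score_b
```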

Content Based Video Retrieval by Example Considering Context (문맥을 고려한 예제 기반 동영상 검색 알고리즘)

  • 박주현;낭종호;김경수;하명환;정병희
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.12
    • /
    • pp.756-771
    • /
    • 2003
  • A digital video library system that manages a large amount of multimedia information requires efficient and effective retrieval methods. In this paper, we propose and implement a new video search and retrieval algorithm that compares the query video shot with the shots in the archive in terms of foreground object, background image, audio, and context. The foreground object is the region of the video image that changes across the successive frames of a shot, the background image is the remaining region, and the context is the relationship between the low-level features of adjacent shots. Comparing these features reflects the process by which a moving picture is filmed, and it lets the user focus a query on the desired features of the target video clips by adjusting their weights in the comparison. Although the proposed algorithm cannot fully capture the high-level semantics of the submitted query video, it reflects the user's requirements as far as possible by considering the context of video clips and weighting it in the comparison.
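
A minimal sketch of the weighted comparison described above (the per-feature similarity functions and weight values are illustrative, not the paper's): the overall shot similarity is a user-weighted sum over the four feature scores.

```python
def shot_similarity(scores, weights):
    """Combine per-feature similarities (each assumed precomputed in [0, 1])
    with user-adjustable weights that sum to 1."""
    feats = ("foreground", "background", "audio", "context")
    return sum(weights[f] * scores[f] for f in feats)
```

Raising the weight of, say, `"context"` biases retrieval toward shots whose neighboring-shot relationships resemble the query's, which is the mechanism the abstract describes for reflecting user intent.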

Multiplexing of UHDTV Based on MPEG-2 TS (MPEG-2 TS 기반의 UHDTV 다중화)

  • Jang, Euy-Doc;Park, Dong-Il;Kim, Jae-Gon;Lee, Eung-Don;Cho, Suk-Hee;Choi, Jin-Soo
    • Journal of Broadcast Engineering
    • /
    • v.15 no.2
    • /
    • pp.205-216
    • /
    • 2010
  • This paper describes a method of MPEG-2 Transport Stream (TS) multiplexing for Ultra HDTV (UHDTV) and its design and implementation as a software tool. In practice, a UHD video may be divided into several HD videos, each encoded in parallel, so the multiple bitstreams encoding the HD videos must be synchronized and multiplexed for transmitting and storing the UHD video. Here it is assumed that four HD videos, spatially partitioning a UHD frame, are encoded with H.264/AVC and that two 5.0-channel audios are encoded with AC-3; the TS multiplexing of UHD therefore mainly considers four H.264/AVC elementary streams (ESs) and two AC-3 ESs. For the carriage of H.264/AVC and AC-3 over MPEG-2 TS, PES packetization and TS multiplexing are designed and implemented based on the extended MPEG-2 Systems specification and the ATSC digital audio compression standard, respectively. The implemented UHD TS multiplexing tool emulates real-time hardware operation in time units corresponding to the transmission of one TS packet at a given TS rate. In particular, to satisfy the timing model, the buffers defined in the TS System Target Decoder (T-STD) are monitored, and their status is considered in the scheduling of TS multiplexing. Two multiplexing structures, UHD re-multiplexing and UHD program multiplexing, are implemented, and their strengths and weaknesses are investigated. The developed tool is tested and verified for syntax and semantics conformance and functionality with a commercial analyzer and real-time presentation tools.
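
For reference, the unit in which such a multiplexer schedules output is the fixed 188-byte TS packet defined in ISO/IEC 13818-1. A minimal header builder (stuffing is simplified here by padding the payload with 0xFF; a conformant muxer would use the adaptation field instead):

```python
def ts_header(pid, continuity_counter, payload_start=False):
    """4-byte MPEG-2 TS packet header: sync byte, PUSI flag, 13-bit PID,
    adaptation_field_control = '01' (payload only), 4-bit continuity counter."""
    b1 = 0x47                                         # sync byte
    b2 = ((1 if payload_start else 0) << 6) | ((pid >> 8) & 0x1F)
    b3 = pid & 0xFF
    b4 = 0x10 | (continuity_counter & 0x0F)
    return bytes([b1, b2, b3, b4])

def ts_packet(pid, cc, payload):
    """One 188-byte packet; short payloads are 0xFF-padded for illustration."""
    header = ts_header(pid, cc, payload_start=True)
    body = payload[:184].ljust(184, b'\xff')
    return header + body
```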

Research about Imaginary Line Extension Application in Composition of TV News - With Special Quality of Imaginary Line in Focus - (TV News 영상구성에서 Imaginary Line 확대 적용에 관한 연구 - 이미지너리 라인의 특성을 중심으로 -)

  • Lim, Pyung-Jong;Kwak, Hoon-Sung
    • The Journal of the Korea Contents Association
    • /
    • v.8 no.9
    • /
    • pp.55-65
    • /
    • 2008
  • In this information age, when news is of particular importance, the field of news image production is making rapid progress through technologies such as multimedia and multi-channel digital systems. Even experts who have long worked in broadcasting are perplexed by the rapid development of broadcasting equipment and expression techniques. Television is characterized by the speed of change and by viewers' desire for new and interesting images. The image-expression convention built on the imaginary line has long existed as one of the conventional methods, but obsolete image expressions are paling into insignificance for viewers who want more information in a short time, and the changing stream of image-expression systems has forced the existing imaginary-line convention to change continually to accommodate viewers' shifting interests and expectations. Therefore, this paper argues for a broader interpretation of the imaginary line for TV news images, given that existing systems of image production have not adapted to the times. Today an image is defined not only by video but also by audio. By extending and applying the concept of the imaginary line to image production, we aim to reduce confusion concerning the imaginary line and contribute to a correct understanding of TV news images for both viewers and producers.

Implementation of Character and Object Metadata Generation System for Media Archive Construction (미디어 아카이브 구축을 위한 등장인물, 사물 메타데이터 생성 시스템 구현)

  • Cho, Sungman;Lee, Seungju;Lee, Jaehyeon;Park, Gooman
    • Journal of Broadcast Engineering
    • /
    • v.24 no.6
    • /
    • pp.1076-1084
    • /
    • 2019
  • In this paper, we introduce a system that extracts metadata by recognizing characters and objects in media using deep learning. In broadcasting, multimedia contents such as video, audio, image, and text have long been converted to digital content, but vast unconverted resources still remain. Building media archives requires a great deal of manual work, which is time-consuming and costly, so a deep learning-based metadata generation system can save both time and cost. The whole system consists of four elements: a training data generation module, an object recognition module, a character recognition module, and an API server. The deep learning network module and the face recognition module recognize characters and objects in the media and describe them as metadata. The training data generation module was designed separately to ease the construction of data for training the neural networks, and the face and object recognition functions were exposed through an API server. We trained the two neural networks on 1,500 persons and 80 kinds of objects, and confirmed an accuracy of 98% on the character test data and 42% on the object data.
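
The shape of the metadata records such a system might emit per detection could look like the following (the field names are assumptions for illustration, not the paper's schema):

```python
from dataclasses import dataclass

@dataclass
class MediaMetadata:
    media_id: str        # identifier of the archived media item
    timestamp_s: float   # position of the detection within the media
    label: str           # recognized character name or object class
    kind: str            # "character" or "object"
    confidence: float    # recognizer score in [0, 1]
```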

Development of Hardware for the Architecture of A Remote Vital Sign Monitor (무선 체온 모니터기 아키텍처 하드웨어 개발)

  • Jang, Dong-Wook;Jang, Sung-Whan;Jeong, Byoung-Jo;Cho, Hyun-Seob
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.11 no.7
    • /
    • pp.2549-2558
    • /
    • 2010
  • The Remote Vital Sign Monitor is an in-home healthcare system designed to wirelessly monitor core-body temperature. It provides accuracy and features comparable to hospital equipment while minimizing cost and remaining easy to use. It has two parts, a bandage and a monitor, both built around the Chipcon CC2430, which contains an integrated 2.4 GHz direct-sequence spread-spectrum radio; the CC2430 allows the Remote Vital Sign Monitor to operate over a 100-foot indoor radius. A simple user interface lets the user set upper and lower temperature limits against which the core-body temperature is monitored; if the core-body temperature crosses either limit, an alarm sounds. The alarm is driven by a low-voltage audio amplifier circuit connected to a speaker. To calculate the core-body temperature accurately, the monitor must use an accurate temperature-sensing device; the thermistor selected from GE Sensing satisfies the need for a sensitive and accurate temperature reading. The LCD monitor has a screen measuring 64.5 mm long by 16.4 mm wide with a backlight, allowing the user to clearly read the monitor from at least 3 feet away in both light and dark conditions.
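
A minimal sketch of the sensing and alarm logic described above: a standard beta-model NTC thermistor conversion followed by the out-of-band check. The thermistor parameters here are generic illustrative values, not those of the GE Sensing part used in the paper.

```python
import math

def thermistor_temp_c(r_ohm, r0=10_000.0, t0_c=25.0, beta=3435.0):
    """Beta-model NTC conversion from resistance to temperature (Celsius)."""
    t0_k = t0_c + 273.15
    inv_t = 1.0 / t0_k + math.log(r_ohm / r0) / beta
    return 1.0 / inv_t - 273.15

def alarm_should_sound(temp_c, lower_c, upper_c):
    """Alarm when the core-body temperature leaves the user-set band."""
    return temp_c < lower_c or temp_c > upper_c
```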

Development of a Listener Position Adaptive Real-Time Sound Reproduction System (청취자 위치 적응 실시간 사운드 재생 시스템의 개발)

  • Lee, Ki-Seung;Lee, Seok-Pil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.7
    • /
    • pp.458-467
    • /
    • 2010
  • In this paper, a new audio reproduction system is developed in which cross-talk signals are cancelled at an arbitrary listener position. To adaptively remove the cross-talk signals according to the listener's position, a listener tracking method was employed: two microphones are used, and the listener direction is estimated from the time delay between their two signals. Room reverberation effects are also taken into account through linear prediction analysis. To remove the cross-talk signals at the left and right ears, the paths between the sources and the ears are represented by KEMAR head-related transfer functions (HRTFs) measured from an artificial dummy head. To evaluate the usefulness of the proposed listener tracking, the cross-talk cancellation performance was measured at the estimated listener positions in terms of the channel separation ratio (CSR); a CSR of -10 dB was achieved experimentally even when the listener positions deviated somewhat. A real-time system was implemented on a floating-point digital signal processor (DSP). The average error of the estimated listener direction was 5 degrees, and subjects perceived 80% of the stimuli in the correct directions.
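
A sketch of the two-microphone direction estimate described above: find the inter-channel delay by cross-correlation and convert it to an arrival angle with the far-field relation sin(θ) = c·τ/d. The microphone spacing is an illustrative value, not taken from the paper, and the sign convention is arbitrary.

```python
import numpy as np

def estimate_direction_deg(left, right, fs, mic_dist=0.2, c=343.0):
    """Estimate source direction (degrees from the median plane) from the
    time delay between two microphone signals."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)   # samples; + means left channel is delayed
    tau = lag / fs                              # time delay in seconds
    s = np.clip(c * tau / mic_dist, -1.0, 1.0)  # clamp for numerical safety
    return np.degrees(np.arcsin(s))
```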

Sound Engine for Korean Traditional Instruments Using General Purpose Digital Signal Processor (범용 디지털 신호처리기를 이용한 국악기 사운드 엔진 개발)

  • Kang, Myeong-Su;Cho, Sang-Jin;Kwon, Sun-Deok;Chong, Ui-Pil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.3
    • /
    • pp.229-238
    • /
    • 2009
  • This paper describes a sound engine for Korean traditional instruments, the Gayageum and the Taepyeongso, using a TMS320F2812. Gayageum and Taepyeongso models based on commuted waveguide synthesis (CWS) are required to synthesize each sound. An instrument selection button chooses one of the instruments in the proposed sound engine, and the corresponding model produces the sound. Every synthesized sound sample is transmitted to a DAC (TLV5638) over SPI and played through a speaker via an audio interface. The length of the delay line determines the fundamental frequency of the desired sound. To determine the delay-line length, the time needed to synthesize one sound sample was measured with a GPIO: 28.6 µs for the Gayageum and 21 µs for the Taepyeongso. Each sound sample is synthesized and transferred to the DAC in an interrupt service routine (ISR). A timer of the TMS320F2812 has four events for generating interrupts; here the interrupt is generated by the period-matching event, and the ISR is called every 60 µs. Compared with the original sounds and their spectra, the results represent the timbres of the instruments well, except for the 'Mu, Hwang, Tae, Joong' notes of the Taepyeongso. Moreover, only one sound is produced when playing the Taepyeongso, and it takes 21 µs for real-time playing; for the Gayageum, players usually use two fingers (thumb and middle finger or thumb and index finger), so it takes 57.2 µs for real-time playing.