• Title/Summary/Keyword: sound information

Search results: 1,718

CNN based Complex Spectrogram Enhancement in Multi-Rotor UAV Environments (멀티로터 UAV 환경에서의 CNN 기반 복소 스펙트로그램 향상 기법)

  • Kim, Young-Jin;Kim, Eun-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.4
    • /
    • pp.459-466
    • /
    • 2020
  • Sound collected through a multi-rotor unmanned aerial vehicle (UAV) includes ego noise generated by the motors and propellers and wind noise generated during flight, so its quality is greatly impaired. In a multi-rotor UAV environment, both the magnitude and phase of the target sound are heavily corrupted, so the sound must be enhanced with both magnitude and phase taken into account. However, the phase is difficult to improve because it does not show clear structural characteristics. In this study, we propose a CNN-based complex spectrogram enhancement method that removes noise based on the complex spectrogram, which can represent both magnitude and phase. Experimental results reveal that the proposed method improves enhancement performance by considering both the magnitude and phase of the complex spectrogram.
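
A minimal sketch of the idea described in this abstract, assuming PyTorch: the real and imaginary parts of the STFT are stacked as two CNN input channels so that magnitude and phase are enhanced jointly. The layer sizes, network structure, and one-second placeholder clip are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ComplexSpecDenoiser(nn.Module):
    """Toy CNN mapping a noisy complex spectrogram (real/imag channels)
    to an enhanced complex spectrogram of the same shape."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, kernel_size=3, padding=1),   # back to real/imag channels
        )

    def forward(self, noisy):                  # noisy: (batch, 2, freq, time)
        return self.net(noisy)

# STFT of a noisy clip -> stack real and imaginary parts as channels.
wave = torch.randn(1, 16000)                               # placeholder 1-second clip
spec = torch.stft(wave, n_fft=512, hop_length=128,
                  window=torch.hann_window(512), return_complex=True)
x = torch.stack([spec.real, spec.imag], dim=1)             # (1, 2, freq, time)
enhanced = ComplexSpecDenoiser()(x)                        # magnitude and phase modeled together
```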

Real Time Monitoring of Smart Baby Bed using Sound Sensor (사운드 센서 이용한 Smart 아기 침대의 실시간 모니터링)

  • Kwon, Mi-Rae;Park, Hwa-Jung;Kim, Nam-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2021.05a
    • /
    • pp.230-232
    • /
    • 2021
  • As the proportion of double-income households and the use of parental leave increase, demand is growing for products that help parents raise children on their own. In particular, there is strong demand for baby beds that make it easier to care for a child alone. In this paper, we therefore propose real-time monitoring of a smart baby bed using a sound sensor. The proposed bed uses a sound sensor to detect the child's crying and condition, and the measured sensor output can be checked in a mobile application. When the sound sensor output exceeds a threshold, a voice file such as a lullaby recorded in the parents' voice is played, and when the output falls below the threshold, playback stops. If the output remains above the threshold after a certain period of time, a pop-up notification is sent to the mobile application. The recorded voices of the parents help the child calm down quickly with a sense of stability and comfort, and the parents can remotely monitor the child's condition in real time.
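
A minimal sketch of the threshold logic this abstract describes, with hypothetical callables (read_sound_level, play_lullaby, stop_lullaby, notify_app) standing in for the actual sensor and mobile-app interfaces; the threshold and timeout values are illustrative.

```python
import time

CRY_THRESHOLD = 600     # illustrative sensor ADC value, not from the paper
NOTIFY_AFTER_S = 120    # notify the parents if crying persists this long

def monitor(read_sound_level, play_lullaby, stop_lullaby, notify_app):
    crying_since = None
    while True:
        level = read_sound_level()
        if level >= CRY_THRESHOLD:
            play_lullaby()                          # lullaby recorded in the parents' voice
            crying_since = crying_since or time.time()
            if time.time() - crying_since > NOTIFY_AFTER_S:
                notify_app("Baby is still crying")  # pop-up in the mobile application
        else:
            stop_lullaby()
            crying_since = None
        time.sleep(1.0)                             # poll the sound sensor once per second
```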


Wild Bird Sound Classification Scheme using Focal Loss and Ensemble Learning (Focal Loss와 앙상블 학습을 이용한 야생조류 소리 분류 기법)

  • Jaeseung Lee;Jehyeok Rew
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.29 no.2
    • /
    • pp.15-25
    • /
    • 2024
  • For effective analysis of animal ecosystems, technology that can automatically identify the current status of animal habitats is crucial. In particular, animal sound classification, which identifies species from their sounds, is gaining attention for situations where video-based identification is impractical. Traditional studies have relied on a single deep learning model to classify animal sounds. However, sounds collected outdoors often include substantial background noise, complicating the task for a single model. In addition, data imbalance among species may lead to biased model training. To address these challenges, we propose an animal sound classification scheme that combines the predictions of multiple models trained with focal loss, which adjusts the penalty according to the amount of data per class. Experiments on public datasets demonstrate that our scheme improves recall by up to 22.6% compared with the average of the single models.
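
A minimal sketch of the two ingredients named in the abstract, focal loss and soft-voting across several trained models, assuming PyTorch; the focusing parameter gamma and the model list are placeholders.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Cross-entropy scaled by (1 - p_t)^gamma, so well-classified (typically
    majority-class) examples contribute less, easing class imbalance."""
    log_p = F.log_softmax(logits, dim=-1)
    p_t = log_p.gather(1, targets.unsqueeze(1)).squeeze(1).exp()
    ce = F.nll_loss(log_p, targets, reduction="none")
    return ((1.0 - p_t) ** gamma * ce).mean()

def ensemble_predict(models, spectrogram_batch):
    """Average the softmax outputs of several trained models (soft voting)."""
    probs = [F.softmax(m(spectrogram_batch), dim=-1) for m in models]
    return torch.stack(probs).mean(dim=0).argmax(dim=-1)
```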

Psychological Reduction Effect of Road Traffic Noise Perception by the Visual Information of Landscape components (조경요소의 영상을 이용한 도로교통소음 인지도의 심리적인 저감효과에 대한 연구)

  • Kook, Chan;Jang, Gil-Soo;Shin, Yong-kyu
    • KIEAE Journal
    • /
    • v.3 no.2
    • /
    • pp.33-36
    • /
    • 2003
  • Visual information can have a considerable influence on sound perception. Furthermore, when sound perception is judged in terms of noisiness or annoyance rather than loudness alone, it depends even more strongly on the visual information presented. This paper estimates the influence of several kinds of visual information on the perception of road traffic noise by means of a psychoacoustic test. The findings on the influence of visual information on subjective noise perception are summarized as follows. Presenting images of mild and comfortable scenery reduced the perceived noise in less noisy environments not exceeding 65 dB(A). In highly noisy environments exceeding 65 dB(A), however, noise perception could be reduced by a strong image such as a waterfall, and even eliminating the road traffic image was helpful. The waterfall image reduced noise perception at all levels. It is inferred that road traffic noise perception can be effectively ameliorated by presenting strong and realistic landscape images in any noisy environment.

A Study on an Intelligent GIS Monitoring System using Preventive Diagnostic Technology (예방진단기술을 이용한 지능형 GIS 감시시스템에 관한 연구)

  • Park, Kee-Young;Lee, Jong-Ha;Cho, Sook-Jin;Choi, Hyung-Ki;Jung, Eui-Bung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.6
    • /
    • pp.244-251
    • /
    • 2014
  • In this study, we give a detailed account of the normal and abnormal states of GIS (Gas Insulated Switchgear) using preventive diagnostic technology. The analysis and diagnosis are based on data stored by an intelligent GIS monitoring system. The waveform of GIS sound is noise-like and is systematically generated by discharge and the accompanying corona sound. Therefore, in this paper, we classify normal and abnormal GIS sounds, discriminating between the two cases using the level crossing rate (LCR) and the spectrogram energy rate.
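
A minimal sketch of the two features named in the abstract, the level crossing rate (LCR) and a spectrogram (spectral) energy rate, using NumPy; the crossing level and the frequency band are illustrative assumptions.

```python
import numpy as np

def level_crossing_rate(frame, level=0.0):
    """Fraction of consecutive sample pairs that cross the given level."""
    above = frame > level
    crossings = np.count_nonzero(above[1:] != above[:-1])
    return crossings / (len(frame) - 1)

def spectral_energy_rate(frame, fs, band=(1_000, 5_000), n_fft=1024):
    """Share of the frame's spectral energy inside a band of interest."""
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return power[in_band].sum() / (power.sum() + 1e-12)

# A simple rule-based classifier would compare both features against
# thresholds learned from labeled normal/abnormal GIS recordings.
```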

Wireless Digital Stethoscope Diagnosis System using Heart Rate (심박수를 이용한 무선 디지털 청진 진단시스템)

  • Park, Kee-Young;Lee, Jong-Ha;Cho, Sook-Jin;Lee, Chul-Hee;Jung, Eui-Bung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.6
    • /
    • pp.237-243
    • /
    • 2014
  • Heart sounds from a patient's chest can be heard using an analog stethoscope. However, the same heart sound may be interpreted differently by each doctor, so the patient's condition is judged subjectively, based on the hearing ability and years of experience of the physician. In this paper, we describe in detail how to diagnose a patient's condition with a wireless digital stethoscope diagnosis system through analysis of the patient's heart sounds and heart rate. By applying the level crossing rate (LCR), the system makes an objective medical diagnosis possible and shows the relationship of the measurements to disease.
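
As an assumption-labeled illustration of deriving heart rate from a digitized stethoscope signal, the sketch below uses an amplitude envelope and autocorrelation, a common approach that is not necessarily the paper's exact method.

```python
import numpy as np

def estimate_heart_rate(signal, fs):
    """Estimate beats per minute from a digitized heart-sound recording."""
    envelope = np.abs(signal).astype(float)            # crude amplitude envelope
    envelope -= envelope.mean()
    ac = np.correlate(envelope, envelope, mode="full")[len(envelope) - 1:]
    lo, hi = int(fs * 60 / 200), int(fs * 60 / 40)     # search range: 40-200 bpm
    period = lo + int(np.argmax(ac[lo:hi]))            # dominant beat period in samples
    return 60.0 * fs / period
```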

Auditory Spatial Arrangement of Object's Position in Virtual and Augmented Environment (가상환경에서의 위치정보 제시를 위한 청각적 공간배열)

  • Lee, Ju-Hwan
    • Journal of Advanced Navigation Technology
    • /
    • v.15 no.2
    • /
    • pp.326-333
    • /
    • 2011
  • In the present study, we measured user performance (accuracy and reaction time) in a virtual environment with a see-through head-mounted display system that includes 3D sound generated through a head-related transfer function (HRTF), in order to investigate the feasibility of an auditory display for an object's spatial information. Summing up the results of two experiments, when presenting an object's location with 3D sound, it is desirable to arrange the information around the user in an orthogonal pattern (at right angles) rather than a diagonal pattern. These results suggest that presenting spatial information with 3D sound makes an optimal object arrangement in the virtual environment possible.

Nonnegative Matrix Factorization Based Direction-of-Arrival Estimation of Multiple Sound Sources Using Dual Microphone Array (이중 마이크로폰을 이용한 비음수 행렬분해 기반 다중음원 도래각 예측)

  • Jeon, Kwang Myung;Kim, Hong Kook;Yu, Seung Woo
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.54 no.2
    • /
    • pp.123-129
    • /
    • 2017
  • This paper proposes a new nonnegative matrix factorization (NMF) based direction-of-arrival (DOA) estimation method for multiple sound sources using a dual microphone array. First, the sound signals from the dual microphone array are segmented into consecutive analysis frames, and a steered-response power phase transform (SRP-PHAT) beamformer is applied to each frame so that the stereo signals of each frame are represented in a time-direction domain. The time-direction outputs of SRP-PHAT are stored for a pre-defined number of frames, referred to as a time-direction block. Next, in order to estimate DOAs robustly against noise, each time-direction block is normalized along the time axis using a block subtraction technique. After that, an unsupervised NMF method is applied to the normalized time-direction block in order to cluster the directions of each sound source in a multiple-sound-source environment. In particular, the activation and basis matrices are used to estimate the number of sound sources and their DOAs, respectively. The DOA estimation performance of the proposed method is evaluated by measuring the mean absolute error (MAE) and the standard deviation of the errors between the oracle and estimated DOAs under a three-source condition, where the sources are located at [$-35^{\circ}$, 5 m], [$12^{\circ}$, 4 m], and [$38^{\circ}$, 4 m] from the dual microphone array. The experiment shows that the proposed method relatively reduces the MAE by 56.83% compared to a conventional SRP-PHAT based DOA estimation method.
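
A minimal sketch of the clustering step described above: NMF is applied to a (frames x directions) block of SRP-PHAT responses, and each basis vector's peak is read off as a DOA estimate. scikit-learn's NMF and the random placeholder block are assumptions for illustration; the paper's source-count estimation from the activation matrix is omitted here.

```python
import numpy as np
from sklearn.decomposition import NMF

# Placeholder time-direction block: rows = analysis frames, columns = candidate DOAs.
# In practice these would be the normalized SRP-PHAT responses of the dual mic array.
n_frames, n_directions, n_sources = 100, 181, 3
block = np.random.rand(n_frames, n_directions)

# Factorize into activations (frames x sources) and bases (sources x directions).
nmf = NMF(n_components=n_sources, init="nndsvda", max_iter=500)
activation = nmf.fit_transform(block)   # per-frame activity of each source
basis = nmf.components_                 # per-source distribution over directions

doa_grid = np.linspace(-90, 90, n_directions)
estimated_doas = doa_grid[np.argmax(basis, axis=1)]   # peak of each basis = DOA estimate
print(estimated_doas)
```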

Multi-Core Processor for Real-Time Sound Synthesis of Gayageum (가야금의 실시간 음 합성을 위한 멀티코어 프로세서 구현)

  • Choi, Ji-Won;Cho, Sang-Jin;Kim, Cheol-Hong;Kim, Jong-Myon;Chong, Ui-Pil
    • The KIPS Transactions: Part A
    • /
    • v.18A no.1
    • /
    • pp.1-10
    • /
    • 2011
  • Physical modeling has been widely used for sound synthesis because it produces high-quality sound that is close to the real sound of musical instruments. However, physical modeling requires many parameters to synthesize a large number of sounds simultaneously, which prevents real-time processing. To solve this problem, this paper proposes a single instruction, multiple data (SIMD) based multi-core processor that supports real-time sound synthesis of the gayageum, a representative Korean traditional musical instrument. The proposed SIMD-based multi-core processor consists of 12 processing elements (PEs) that control the 12 strings of the gayageum, where each PE models the corresponding string. The proposed processor can generate the synthesized sounds of all 12 strings simultaneously after receiving the excitation signal and parameters of each string as input. Experimental results using a 44.1 kHz sampling rate and 16-bit quantization show that the sound synthesized by the proposed multi-core processor is very similar to the original sound. In addition, the proposed multi-core processor outperforms commercial processors (TI's TMS320C6416, ARM926EJ-S, ARM1020E) in terms of execution time ($5.6\sim11.4\times$ better) and energy efficiency (about $553\sim1,424\times$ better).
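
The abstract does not name the string model, so as an assumption the sketch below uses the classic Karplus-Strong plucked-string algorithm to synthesize one string; on the proposed processor, each of the 12 processing elements would run such a model for its own string in parallel.

```python
import numpy as np

def karplus_strong(freq_hz, duration_s, fs=44_100, decay=0.996):
    """Karplus-Strong plucked string: a noise burst circulating through a
    delay line with an averaging (low-pass) feedback filter."""
    n_samples = int(duration_s * fs)
    delay = int(fs / freq_hz)                      # delay-line length sets the pitch
    buf = np.random.uniform(-1.0, 1.0, delay)      # excitation: white-noise burst
    out = np.empty(n_samples)
    for i in range(n_samples):
        out[i] = buf[i % delay]
        buf[i % delay] = decay * 0.5 * (buf[i % delay] + buf[(i + 1) % delay])
    return out

# One string per processing element; 12 calls like this would run in parallel.
string_sound = karplus_strong(freq_hz=261.6, duration_s=1.0)   # roughly middle C
```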

Audio Watermarking through Modification of Tonal Maskers

  • Lee, Hee-Suk;Lee, Woo-Sun
    • ETRI Journal
    • /
    • v.27 no.5
    • /
    • pp.608-616
    • /
    • 2005
  • Watermarking has become a technology of choice for a broad range of multimedia copyright protection applications. This paper proposes an audio watermarking scheme that uses a modified tonal masker as the embedding carrier for imperceptible and robust audio watermarking. The embedding method selects one of the tonal maskers using a secret key and then modifies the frequency components that constitute the tonal masker without changing the sound pressure level. The modified tonal masker can be found again with the same secret key, without the original sound, and the embedded information can then be extracted. The results show that the frequency components are stable enough to retain the embedded watermarks under various common signal processing operations and that the proposed scheme performs robustly.
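
A minimal sketch of the key-driven carrier selection described above: local spectral peaks are treated as tonal-masker candidates and one is chosen with a key-seeded RNG. The peak-picking rule and the embedding step (commented at the end) are simplified assumptions; the paper's psychoacoustic masker detection is more involved.

```python
import numpy as np

def select_tonal_masker(frame, secret_key, n_fft=1024):
    """Pick one tonal-masker candidate (a prominent local spectral peak)
    deterministically from a secret key."""
    mag = np.abs(np.fft.rfft(frame, n_fft))
    peaks = [k for k in range(2, len(mag) - 2)
             if mag[k] > mag[k - 1] and mag[k] > mag[k + 1]
             and mag[k] > 1.5 * max(mag[k - 2], mag[k + 2])]
    rng = np.random.default_rng(secret_key)        # same key -> same masker at detection
    return int(peaks[rng.integers(len(peaks))]) if peaks else None

# The watermark bit would then be embedded by slightly re-balancing the frequency
# components around the chosen bin while keeping their total energy, and hence
# the sound pressure level, unchanged.
```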
