• Title/Summary/Keyword: 스펙트로그램 (spectrogram)


A study on improving the performance of the machine-learning based automatic music transcription model by utilizing pitch number information (음고 개수 정보 활용을 통한 기계학습 기반 자동악보전사 모델의 성능 개선 연구)

  • Daeho Lee;Seokjin Lee
    • The Journal of the Acoustical Society of Korea / v.43 no.2 / pp.207-213 / 2024
  • In this paper, we study how to improve the performance of a machine learning-based automatic music transcription model by adding musical information to the input data. The added information is the number of pitches occurring in each time frame, obtained by counting the notes activated in the ground-truth score. This pitch-count information is concatenated to the log mel-spectrogram that forms the input of the existing model. We use an automatic music transcription model composed of four blocks, each predicting a different type of musical information, and show that simply appending to the existing input the pitch-count information corresponding to the information each block predicts helps the model train. To evaluate the improvement, we ran experiments on the MIDI Aligned Piano Sounds (MAPS) dataset; when all pitch-count information was used, performance improved by 9.7 % in the frame-based F1 score and by 21.8 % in the note-based F1 score including offset.
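
The concatenation step described in this abstract, appending a per-frame pitch count taken from the ground truth to the log mel-spectrogram, can be sketched roughly as follows; the shapes and the 229-bin mel resolution are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative sketch only: exact feature dimensions and encoding are assumed.
n_frames, n_mels = 1000, 229                      # assumed time-frequency resolution
log_mel = np.random.randn(n_frames, n_mels)       # stand-in for a real log mel-spectrogram

# Ground-truth piano roll (frames x 88 keys) taken from the aligned score
piano_roll = (np.random.rand(n_frames, 88) > 0.95).astype(np.float32)

# Pitch count per frame = number of simultaneously active notes in that frame
pitch_count = piano_roll.sum(axis=1, keepdims=True)           # shape (n_frames, 1)

# Concatenate along the feature axis to form the augmented model input
model_input = np.concatenate([log_mel, pitch_count], axis=1)  # (n_frames, n_mels + 1)
```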

A Study on Underwater Source Localization Using the Wideband Interference Pattern Matching (수중에서 광대역 간섭 패턴 정합을 이용한 음원의 위치 추정 연구)

  • Chun, Seung-Yong;Kim, Se-Young;Kim, Ki-Man
    • The Journal of the Acoustical Society of Korea / v.26 no.8 / pp.415-425 / 2007
  • This paper proposes a method of underwater source localization using wideband interference pattern matching. By matching two interference patterns in the spectrogram, the ratio of the ranges from the source to two sensors is estimated, and this ratio is applied to the Apollonius circle. The Apollonius circle is defined as the locus of all points whose distances from two fixed points are in a constant ratio, so it can represent the locus of potential source locations. The Apollonius circle alone, however, still leaves ambiguity about the correct source location, so another equation is needed to obtain a unique estimate. By estimating the time differences of signal arrival between the source and the sensors, a hyperbola equation is obtained, and the intersection of the two curves is taken as the source position. Simulations were performed to evaluate the proposed algorithm, and comparisons with real sea experiment data were made to prove its applicability in a real environment. The results show that the proposed algorithm estimates the source position within an error bound of 10 %.
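
For reference, a standard formulation of the two loci combined here (the paper's own parameterisation may differ): with sensors at $\mathbf{s}_1$, $\mathbf{s}_2$, estimated range ratio $k$, propagation speed $c$, and estimated arrival-time difference $\Delta t$, the source position $\mathbf{p}$ satisfies

$$
\frac{\lVert \mathbf{p}-\mathbf{s}_1 \rVert}{\lVert \mathbf{p}-\mathbf{s}_2 \rVert} = k
\;\Longrightarrow\;
\left\lVert \mathbf{p} - \frac{\mathbf{s}_1 - k^2 \mathbf{s}_2}{1 - k^2} \right\rVert
= \frac{k\,\lVert \mathbf{s}_1 - \mathbf{s}_2 \rVert}{\lvert 1 - k^2 \rvert}
\quad (k \neq 1),
\qquad
\lVert \mathbf{p}-\mathbf{s}_1 \rVert - \lVert \mathbf{p}-\mathbf{s}_2 \rVert = c\,\Delta t .
$$

The first relation is the Apollonius circle, with centre and radius in closed form; the second is the hyperbola obtained from the arrival-time difference, and their intersection gives the estimated source position.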

A Diagnosis system of misalignments of linear motion robots using transfer learning (전이 학습을 이용한 선형 이송 로봇의 정렬 이상진단 시스템)

  • Su-bin Hong;Young-dae Lee;Arum Park;Chanwoo Moon
    • The Journal of the Convergence on Culture Technology / v.10 no.3 / pp.801-807 / 2024
  • Linear motion robots are devices that transfer parts or position equipment, and they require high precision. In companies that develop linear-robot application systems, human workers are in charge of quality control and fault diagnosis, so the result and accuracy of a diagnosis vary with the skill of the person in charge. Recently, there have been many attempts to use artificial intelligence to diagnose faults in industrial equipment. In this paper, we present a system that automatically diagnoses linear-rail and ball-screw misalignment of a linear robot using transfer learning. In industrial systems it is difficult to obtain large amounts of training data, which causes a data imbalance problem; in this case, a transfer-learning model built by retraining an established model is widely used. Signals obtained from an acceleration sensor and a torque sensor were used, and their usefulness was evaluated for each case. After converting the sensor signals into spectrogram images, the type of abnormality was diagnosed with an image-recognition classifier. The proposed method is expected to be applicable not only to linear robots but also to the diagnosis of other industrial robots.
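
A minimal transfer-learning sketch along the lines described in this abstract, not the authors' exact configuration: the backbone, class labels, and hyperparameters below are assumptions. A CNN pretrained on ImageNet is reused as a frozen feature extractor and only a new classification head is trained on the spectrogram images.

```python
import torch
import torch.nn as nn
import torchvision.models as models

NUM_CLASSES = 3  # e.g., normal / rail misalignment / ball-screw misalignment (assumed)

# Load an ImageNet-pretrained backbone and freeze its feature extractor
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)   # new classification head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Training loop over spectrogram images (N, 3, 224, 224) with integer labels (N,):
# for spectrogram_batch, labels in dataloader:
#     loss = criterion(model(spectrogram_batch), labels)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```

Retraining only the head is one common way to work around the small, imbalanced datasets the abstract mentions; unfreezing the last backbone stages is another option when more data are available.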

Voice onset time in children with bilateral cochlear implants (양측 인공와우이식 아동의 성대진동시작시간 특성)

  • Jeon, Yesol;Lee, Youngmee
    • Phonetics and Speech Sciences / v.14 no.2 / pp.77-86 / 2022
  • This study aimed to investigate the voice onset time (VOT) of plosives in VCV syllables by place of articulation and phonation type, as spoken by children with bilateral cochlear implants (CIs), in comparison with children with typical hearing (TH). In all, 15 children with bilateral CIs and 15 children with TH, aged 5 to 10 years, participated in this study. All children produced 9 VCV syllables, and their VOTs were analyzed with the Praat software. There was no significant difference in mean VOT between the children with bilateral CIs and the children with TH. However, there was a significant difference in mean VOT by place of articulation: the VOTs for velars were longer than those for bilabials and alveolars. Additionally, there was a significant difference in mean VOT by phonation type: the VOTs of aspirated consonants were longer than those of lenis and fortis consonants. The results suggest that children with bilateral CIs can distinguish the acoustic properties of plosive consonants and control the speech timing between the larynx and the oral cavity at a level similar to that of children with TH.
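
VOT itself is a simple interval measure: once the burst and the onset of voicing have been located (in Praat, typically from the waveform and spectrogram), it reduces to a time difference. A toy sketch with made-up landmark times, not the study's data:

```python
# Illustrative only: landmark times (in seconds) would come from Praat annotation.
tokens = [
    # (place, phonation, burst_onset_s, voicing_onset_s)  -- invented values
    ("velar",    "aspirated", 0.412, 0.497),
    ("bilabial", "lenis",     1.130, 1.172),
]

for place, phonation, burst, voicing in tokens:
    vot_ms = (voicing - burst) * 1000.0   # positive VOT = voicing lag after the burst
    print(f"{place:9s} {phonation:9s} VOT = {vot_ms:.1f} ms")
```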

Characteristics of Vocalizations of Laying Hen Related with Space in Battery Cage (케이지 내 사육 공간의 차이에 따른 산란계의 음성 특성)

  • Son, Seung-Hun;Shin, Ji-Hye;Kim, Min-Jin;Kang, Jeong-Hoon;Rhim, Shin-Jae;Paik, In-Kee
    • Journal of Animal Science and Technology / v.51 no.5 / pp.421-426 / 2009
  • This study was conducted to clarify the characteristics of the vocalizations of laying hens in relation to space in battery cages. The cages were classified into control (0.30 m × 0.14 m × 0.55 m, length × width × height), small (0.21 m × 0.14 m × 0.55 m), and large (0.30 m × 0.30 m × 0.55 m) sizes. The vocalizations of 16 Hy-Line Brown laying hens (80 weeks old) in each group were recorded for 3 hours per day (10:00-11:00 am, 3:00-4:00 pm, and 7:00-8:00 pm) with a digital recorder and microphone between October 2008 and February 2009. The frequency, intensity, and duration of the vocalizations were analyzed by GLM (general linear model) and Duncan's multiple range test. Based on analysis of the spectrogram and spectrum, there were differences in fundamental and maximum frequency and in intensity among the cage sizes. The vocalizations of laying hens could serve as an indicator of the stress caused by rearing space in battery cages.
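
The call features compared in the study (frequency, intensity, duration) can be extracted from a spectrogram in many ways; the sketch below is one generic possibility, not the authors' measurement procedure, and the file name, window sizes, and energy threshold are assumptions.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

fs, x = wavfile.read("hen_call.wav")                 # hypothetical mono recording
x = x.astype(np.float64)

f, t, Sxx = spectrogram(x, fs=fs, nperseg=1024, noverlap=512)

dominant_freq = f[np.argmax(Sxx.max(axis=1))]        # frequency bin with peak energy
intensity_db = 10 * np.log10(Sxx.max() + 1e-12)      # peak level (relative dB)
frame_energy = Sxx.sum(axis=0)
active = frame_energy > 0.1 * frame_energy.max()     # crude energy gate for call activity
duration_s = (t[active][-1] - t[active][0]) if active.any() else 0.0

print(dominant_freq, intensity_db, duration_s)
```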

A Study on the Classification of Fault Motors using Sound Data (소리 데이터를 이용한 불량 모터 분류에 관한 연구)

  • Il-Sik, Chang;Gooman, Park
    • Journal of Broadcast Engineering / v.27 no.6 / pp.885-896 / 2022
  • Motor failure in manufacturing plays an important role in future after-sales service and reliability. Motor faults are detected by measuring sound, current, and vibration. The data used in this paper are sounds of a car side-mirror motor gearbox, divided into three classes. The sound data are converted into mel spectrograms and fed into the network model. To improve the performance of classifying faulty motors, various methods were applied: data augmentation and, to address class imbalance, resampling, re-weighting, changes of the loss function, and two-stage representation learning and classification. In addition, curriculum learning and self-paced learning were compared across five network models (Bidirectional LSTM Attention, Convolutional Recurrent Neural Network, Multi-Head Attention, Bidirectional Temporal Convolution Network, and Convolutional Neural Network), and the optimal configuration for motor-sound classification was found.
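
Two of the ingredients mentioned in this abstract, the mel-spectrogram front end and re-weighting against class imbalance, can be sketched as follows; the file name, transform parameters, and class counts are assumptions, not the paper's values.

```python
import torch
import torchaudio

# Convert a motor-sound clip to a log-mel spectrogram for the classifier input
waveform, sr = torchaudio.load("motor_clip.wav")            # hypothetical file
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sr, n_fft=1024, hop_length=256, n_mels=64)(waveform)
log_mel = torchaudio.transforms.AmplitudeToDB()(mel)

# Re-weighting: inverse-frequency class weights for three motor classes (assumed counts)
class_counts = torch.tensor([900.0, 60.0, 40.0])
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = torch.nn.CrossEntropyLoss(weight=weights)
```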

Effective PPG Signal Processing Method for Detecting Emotional Stimulus (감성 자극 판단을 위한 효과적인 PPG 신호 처리 방법)

  • Oh, Dong-Gi;Min, Byung-Seok;Kwon, Sung-Oh;Kim, Hyun-Joong
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.5C / pp.393-402 / 2012
  • In this study, we propose a signal-processing algorithm to measure the arousal level of a human subject using a PPG (photoplethysmography) sensor. From the measured PPG signals, the arousal level is determined from the PPI (pulse-to-pulse interval) using discrete-time signal processing. We ran psychophysical experiments displaying visual stimuli on a TV while measuring the PPG signal from a finger; natural landscape scenes were used for their restorative effect, and urban environments were used to induce stress. The measured PPG signals, however, may contain noise from subject movement and measurement error, which leads to incorrect detections. To mitigate the impact of noise on stimulus detection, we propose a detection algorithm based on digital signal processing and the statistics of the measured signals. A filter is adopted to remove high-frequency noise and is designed adaptively, taking the statistics of the measured PPG signals into account. Moreover, we employ a hysteresis method to reduce the distortion of the PPI in the emotion decision. Experiments show that the proposed scheme reduces signal noise and improves stimulus detection.
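
A much-simplified sketch of this kind of pipeline; the sampling rate, cutoff frequency, refractory gap, and plausibility bounds are illustrative assumptions, not the paper's adaptive design.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

fs = 100.0                                   # assumed PPG sampling rate [Hz]
ppg = np.loadtxt("ppg_trace.txt")            # hypothetical raw PPG samples

# Low-pass filter to remove high-frequency noise
b, a = butter(4, 5.0 / (fs / 2), btype="low")
ppg_filt = filtfilt(b, a, ppg)

# Peak detection with a minimum inter-peak distance acting as a simple refractory gap
peaks, _ = find_peaks(ppg_filt, distance=int(0.4 * fs),
                      prominence=0.3 * np.std(ppg_filt))
ppi_s = np.diff(peaks) / fs                  # pulse-to-pulse intervals in seconds

# Discard physiologically implausible intervals caused by motion artefacts
ppi_clean = ppi_s[(ppi_s > 0.4) & (ppi_s < 1.5)]
print(ppi_clean.mean())
```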

A Study on Fuzziness Parameter Selection in Fuzzy Vector Quantization for High Quality Speech Synthesis (고음질의 음성합성을 위한 퍼지벡터양자화의 퍼지니스 파라메타선정에 관한 연구)

  • 이진이
    • Journal of the Korean Institute of Intelligent Systems / v.8 no.2 / pp.60-69 / 1998
  • This paper proposes a speech synthesis method using fuzzy vector quantization (FVQ) and studies how to choose the fuzziness value that optimizes the performance of FVQ so that the synthesized speech is closer to the original speech. When FVQ is used to synthesize speech, the analysis stage generates membership values that represent the degree to which an input speech pattern matches each pattern in the codebook, and the synthesis stage reproduces the speech using these membership values, the fuzziness value, and the fuzzy c-means operation. Comparing the performance of the FVQ and VQ synthesizers in simulation, we show that the performance of FVQ is almost equal to that of VQ even though the FVQ codebook is half the size of the VQ codebook; this implies that, to achieve the same performance as VQ in speech synthesis, FVQ can halve the memory required for codebook storage. We also find that, to maximize the SQNR of the synthesized speech, the fuzziness value should be small when the variance of the analysis frame is relatively large, and large when it is small. Comparing the spectrograms of speech synthesized by VQ and FVQ, we find that the spectral bands (formant and pitch frequencies) of the FVQ-synthesized speech are closer to those of the original speech than those obtained with VQ.
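
The membership computation at the heart of the analysis stage follows the usual fuzzy-c-means form; the sketch below (codebook size, feature dimension, and fuzziness value are illustrative) shows how the fuzziness m controls how sharply the memberships concentrate on the nearest codeword.

```python
import numpy as np

def fvq_memberships(x, codebook, m=1.5):
    """Fuzzy-c-means style memberships of frame x to each codeword (m > 1)."""
    d = np.linalg.norm(codebook - x, axis=1) + 1e-12          # distances to codewords
    ratio = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)                            # sums to 1 over codewords

codebook = np.random.randn(64, 12)      # 64 codewords of 12-dim spectral features (assumed)
frame = np.random.randn(12)

u = fvq_memberships(frame, codebook, m=1.5)
reconstruction = u @ codebook           # membership-weighted synthesis of the frame
```

As m approaches 1 the memberships approach hard VQ, with all weight on the nearest codeword; larger m spreads the weight across more codewords.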

Design and Implementation of CW Radar-based Human Activity Recognition System (CW 레이다 기반 사람 행동 인식 시스템 설계 및 구현)

  • Nam, Jeonghee;Kang, Chaeyoung;Kook, Jeongyeon;Jung, Yunho
    • Journal of Advanced Navigation Technology / v.25 no.5 / pp.426-432 / 2021
  • Continuous-wave (CW) Doppler radar has the advantage of avoiding the privacy problems of cameras and obtains signals in a non-contact manner. This paper therefore proposes a human activity recognition (HAR) system using CW Doppler radar and presents the hardware design and implementation results for its acceleration. CW Doppler radar measures signals from continuous human motion; to obtain a spectrogram of a single motion from the continuous signal, an algorithm for counting the number of movements is proposed. In addition, to minimize computational complexity and memory usage, a binarized neural network (BNN) was used to classify human motions, achieving an accuracy of 94 %. To accelerate the complex operations of the BNN, an FPGA-based BNN accelerator was designed and implemented. The proposed HAR system was implemented using 7,673 logic elements, 12,105 registers, 10,211 combinational ALUTs, and 18.7 Kb of block memory. Performance evaluation shows that the operation speed was improved by 99.97 % compared with the software implementation.
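
A rough sketch of the signal front end implied here; the sampling rate, STFT parameters, and the sign-based input binarization are assumptions, not the authors' design. It forms a micro-Doppler spectrogram from the CW baseband signal and maps it to the ±1 values a binarized network operates on.

```python
import numpy as np
from scipy.signal import stft

fs = 2000                                     # assumed baseband sampling rate [Hz]
iq = np.load("cw_baseband.npy")               # hypothetical complex I/Q samples

# Two-sided STFT of the complex baseband gives the micro-Doppler spectrogram
f, t, Z = stft(iq, fs=fs, nperseg=256, noverlap=192, return_onesided=False)
spec_db = 20 * np.log10(np.abs(Z) + 1e-9)

# Per-spectrogram normalization followed by sign binarization (+1/-1),
# one possible binary-friendly encoding for a BNN-style classifier input
norm = (spec_db - spec_db.mean()) / (spec_db.std() + 1e-9)
binary_input = np.where(norm >= 0.0, 1, -1).astype(np.int8)
```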

Acoustic features of diphthongs produced by children with speech sound disorders (말소리장애 아동이 산출한 이중모음의 음향학적 특성)

  • Cho, Yoon Soo;Pyo, Hwa Young;Han, Jin Soon;Lee, Eun Ju
    • Phonetics and Speech Sciences / v.13 no.1 / pp.65-72 / 2021
  • The aim of this study is to provide basic data that can be used for evaluation and intervention by investigating the characteristics of diphthongs produced by children with speech sound disorders. Two groups of 10 children each, with and without speech sound disorders, were asked to imitate meaningless two-syllable 'diphthong + da' sequences. The slopes of F1 and F2, the amount of formant change, and the duration of the glide were analyzed with Praat (version 6.1.16). A difference between the two groups was found in the F1 slope of /ju/. Children with speech sound disorders showed smaller formant changes and shorter durations than typically developing children, and these differences were statistically significant. Group differences in the amount of formant change in the glide were found in F1 of /ju, jɛ/ and in F2 of /jɑ, jɛ/, and there were significant differences in glide duration for /ju, jɛ/. These results show that the articulatory range of diphthongs in children with speech sound disorders is relatively smaller than that of typically developing children, and that the time taken to articulate them is accordingly reduced. They suggest that the articulatory range and acoustic characteristics should be investigated further for the evaluation of and intervention in the diphthongs of children with speech sound disorders.
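
Since the measurements above come from Praat, the same quantities can be scripted through its Python interface, parselmouth; the sketch below is illustrative only, and the file name and glide boundary times are assumptions (in practice they would come from hand annotation).

```python
import parselmouth

snd = parselmouth.Sound("ju_da.wav")              # hypothetical /ju/ + "da" token
formants = snd.to_formant_burg(time_step=0.005, max_number_of_formants=5)

t_start, t_end = 0.10, 0.18                       # assumed glide onset/offset [s]
f1_start = formants.get_value_at_time(1, t_start)
f1_end = formants.get_value_at_time(1, t_end)
f2_start = formants.get_value_at_time(2, t_start)
f2_end = formants.get_value_at_time(2, t_end)

duration = t_end - t_start                        # glide duration [s]
f1_slope = (f1_end - f1_start) / duration         # F1 slope in Hz per second
f2_change = abs(f2_end - f2_start)                # amount of F2 change in Hz
print(f1_slope, f2_change, duration)
```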