• Title/Summary/Keyword: spectrograms

Quality Improvement of Low-Bitrate HE-AAC Encoder (HE-AAC 부호화의 저비트율에서 음질향상 기법)

  • Kim, Jeong-Geun;Lee, Jae-Seong;Lee, Tae-Jin;Kang, Kyeong-Ok;Park, Young-Cheol
    • The Journal of the Acoustical Society of Korea / v.27 no.2 / pp.66-74 / 2008
  • In this paper, we propose new techniques that improve the quality of the AAC and SBR encoders that make up low-bitrate HE-AAC. To reduce the pre-echo artifacts that often occur for transient blocks in AAC, we propose an extended Temporal Noise Shaping (sTNS) in which the frequency range is selectively extended down to the low-frequency region. For the high-frequency region coded by the SBR encoder, tones are identified through sinusoidal modeling and their frequencies are adjusted within the QMF band in order to reduce the noise floor caused by aliasing. Spectrograms of the decoded signals were compared and listening tests were conducted to evaluate the proposed algorithm. The results confirmed its effectiveness.

Studies on Photosensitive Polymers (X). Studies on Photosensitivity and Spectral Sensitivity of Naphthoquinone-1,2-diazide-5-sulfonyl Esters (感光性 樹脂에 關한 硏究 (第10報). Naphthoquinone-1,2-diazide-5-sulfonyl Esters의 感光性과 分光感度)

  • Shim Jyong Sup;Kang Doo Whan
    • Journal of the Korean Chemical Society / v.19 no.4 / pp.269-279 / 1975
  • Photosensitive properties of naphthoquinone-1,2-diazide-5-sulfonyl esters (PGND, BEND and PVAND) of polyglyceryl phthalate (PG), bisphenol A-epichlorohydrin condensate (BE) and polyvinyl alcohol (PVA) were investigated through the change in solubility before and after exposure to light. Various samples coated on glass or quartz plates were exposed to light under various conditions and steeped in aqueous alkali solution, and then the yield of residual film (W/W0) was determined. The yield of residual film, which was closely related to the sensitivity of the film, was affected by the degree of polymerization of the backbone resin, the sensitizers, and their concentration. Among polymer homologs, the sensitivity increased with the degree of polymerization. Sensitization was most effective when 5 % of sensitizer relative to the ester was used. The minimum exposure time was 0.6 min for PGND-1, 1.0 min for BEND-1, and 3.0 min for PVAND-1. The most effective sensitizers for PGND, BEND and PVAND among those used here were benzanthrone, 5-nitroacenaphthene and picramide, respectively. The spectral sensitivities of PGND, BEND and PVAND were examined by comparing their spectrograms with UV spectra in the solid state. The sensitization and spectral sensitivity of the above polymers were also studied. All the polymers containing the sensitizers showed optical sensitization. Because the absorption-maximum wavelength ranges nearly coincided with the sensitivity-maximum wavelengths in both sensitized and unsensitized samples, it was shown that the light absorbed by a sample served efficiently for photochemical reactions. Benzanthrone was found to be an excellent sensitizer for PGND.

Study on Discrimination between Natural Earthquakes and Man-made Explosions using Wonju KSRS Data (원주 KSRS 자료를 이용한 자연지진과 인공지진 구별에 관한 연구)

  • Kang, Ik-Bum;Kim, Sung-Bae;Suh, Man-Cheol;Jun, Myung-Soon
    • Journal of the Korean Geophysical Society / v.3 no.1 / pp.25-36 / 2000
  • 3-D spectrograms for 22 events are drawn to determine whether the events are earthquakes or explosions. In general, the amplitude of the P phase is more dominant for explosions than for earthquakes. The logarithm of the spectral ratio P (Pn, Pg)/Lg, computed after removing free-surface effects from the 3-D (U-D, N-S, E-W) seismogram, is $-1.2{\sim}-0.9$ for earthquakes and $-0.7{\sim}-0.1$ for explosions. This result is consistent with previous research (Kim Park, 1997) suggesting that a log spectral ratio between P and Lg of -0.6 may serve as the criterion for discriminating between earthquakes and explosions in Korea. In addition, Complexity is applied to two events as another discrimination method. The Complexity value of the explosion is much smaller than that of the earthquake, which may be due to the well-developed P wave in the explosion compared with that in the earthquake. This result agrees with that of the 3-D spectrogram.
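
The P/Lg criterion described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not state the logarithm base (base 10 is used here), the frequency band, the windowing, or how the P and Lg windows are picked, and the function names are hypothetical. The default threshold of -0.6 is the criterion the abstract attributes to earlier work; note that the explosion range reported in this study reaches down to -0.7, so a threshold between -0.9 and -0.7 would be needed to separate the two reported ranges exactly.

```python
import numpy as np

def log_spectral_ratio(p_window, lg_window, fs, band=(1.0, 10.0)):
    """log10 of the mean P amplitude spectrum over the mean Lg amplitude
    spectrum inside `band` (Hz). Band and log base are assumptions."""
    def band_mean(x):
        x = np.asarray(x, dtype=float)
        window = np.hanning(len(x))
        # Normalise by the window sum so windows of different length compare fairly.
        spec = np.abs(np.fft.rfft(x * window)) / window.sum()
        freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
        mask = (freqs >= band[0]) & (freqs <= band[1])
        return spec[mask].mean()
    return np.log10(band_mean(p_window) / band_mean(lg_window))

def classify(ratio, threshold=-0.6):
    """Label an event from its log P/Lg ratio (threshold is a tunable assumption)."""
    return "explosion" if ratio > threshold else "earthquake"
```

In practice the two windows would be cut from a recorded seismogram around the picked P and Lg arrivals; the sketch leaves that step out because the abstract does not describe it.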

Acoustic Analysis for Thermal Environment-related Vocalizations in Laying Hens (산란계의 열환경별 특이음에 대한 음성학적 분석)

  • Jeon, J.H.;Yeon, S.C.;Ha, J.K.;Lee, S.J.;Chang, H.H.
    • Journal of Animal Science and Technology / v.47 no.4 / pp.697-702 / 2005
  • The aim of this study was to divide vocalizations of laying hens (Hy-Line Brown) into general vocalizations (GVs), heat stress-related vocalization (HSV), and cold stress-related vocalizations (CSVs) and to determine whether they can be classified by discriminant function analysis. Thirty 65-wk-old laying hens were recorded using digital video recorders 2 times from 10:00 to 14:00 h in each thermal environment (thermoneutral: $22.0{\pm}1.8^{\circ}C$, too hot: $32.0{\pm}2.0^{\circ}C$, too cold: $8.0{\pm}1.9^{\circ}C$) after a 7-day acclimation period. When the laying hens were not being recorded, they were kept in thermoneutral conditions. The GVs, HSV, and CSVs were divided based on the shapes of their spectra and spectrograms, and were identified as 5, 1, and 3 types, respectively. Pitch, intensity, duration, formant 1, formant 2, formant 3, and formant 4 differed significantly among the thermal environment-related vocalizations (P<0.001). The discrimination rate determined by discriminant function analysis was 86.2%. These results suggest that HSV and CSVs are present and may be used as indicators of the thermal environment.

Differentiation of Adductor-Type Spasmodic Dysphonia from Muscle Tension Dysphonia Using Spectrogram (스펙트로그램을 이용한 내전형 연축성 발성 장애와 근긴장성 발성 장애의 감별)

  • Noh, Seung Ho;Kim, So Yean;Cho, Jae Kyung;Lee, Sang Hyuk;Jin, Sung Min
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.28 no.2 / pp.100-105 / 2017
  • Background and Objectives: Adductor-type spasmodic dysphonia (ADSD) is a neurogenic disorder and a focal laryngeal dystonia, while muscle tension dysphonia (MTD) is a functional voice disorder. Both ADSD and MTD may be associated with excessive supraglottic contraction and compensation, resulting in a strained voice quality with spastic voice breaks. The aim of this study was to determine the utility of spectrogram analysis in differentiating ADSD from MTD. Materials and Methods: From 2015 through 2017, 17 patients with ADSD and 20 with MTD who underwent acoustic recording and phonatory function studies were enrolled. Jitter (frequency perturbation) and Shimmer (amplitude perturbation) were obtained using MDVP (Multi-dimensional Voice Program), and the GRBAS scale was used for perceptual evaluation. Two speech therapists evaluated a wide-band (11,250 Hz) spectrogram in a blind test, rating each of four spectral findings on a 4-point scale (0-3): abrupt voice breaks, irregular wide-spaced vertical striations, well-defined formants, and high-frequency spectral noise. Results: Jitter, Shimmer and GRBAS did not differ between the two groups, with no significant correlation (p>0.05). Abrupt voice breaks and irregular wide-spaced vertical striations were significantly higher in ADSD than in MTD, with strong correlation (p<0.01). High-frequency spectral noise was higher in MTD than in ADSD, with strong correlation (p<0.01). Well-defined formants did not differ between the two groups. Conclusion: Wide-band spectrograms provide visual perceptual information that can differentiate ADSD from MTD. Spectrogram analysis is a useful diagnostic tool for differentiating ADSD from MTD where perceptual analysis and clinical evaluation alone are insufficient.

Influence of standard Korean and Gyeongsang regional dialect on the pronunciation of English vowels (표준어와 경상 지역 방언의 한국어 모음 발음에 따른 영어 모음 발음의 영향에 대한 연구)

  • Jang, Soo-Yeon
    • Phonetics and Speech Sciences / v.13 no.4 / pp.1-7 / 2021
  • This study aims to enhance English pronunciation education for Korean students by examining the impact of standard Korean and the Gyeongsang regional dialect on the articulation of English vowels. Data were obtained from the Korean-Spoken English Corpus (K-SEC). Seven Korean words and ten English monosyllabic words were uttered by adult male speakers of standard Korean and the Gyeongsang regional dialect; speakers with little to no experience living abroad were selected. Formant frequencies of the recorded corpus data were measured using spectrograms provided by the speech analysis program Praat. The recorded data were analyzed using the articulatory graph for formants. The results show that, compared with speakers of standard Korean, speakers of the Gyeongsang regional dialect articulated both Korean and English vowels further back. Moreover, the contrast between standard Korean and the Gyeongsang regional dialect in the pronunciation of Korean vowels (/으/, /어/) affected how the corresponding English vowels (/ə/, /ʊ/) were articulated. Regardless of regional dialect, a general feature of vowel pronunciation among Korean speakers is that they show narrower articulatory movements than native English speakers. Korean speakers generally have difficulty discriminating tense and lax vowels, whereas native English speakers make clear distinctions in vowel articulation.

A study on loss combination in time and frequency for effective speech enhancement based on complex-valued spectrum (효과적인 복소 스펙트럼 기반 음성 향상을 위한 시간과 주파수 영역 손실함수 조합에 관한 연구)

  • Jung, Jaehee;Kim, Wooil
    • The Journal of the Acoustical Society of Korea / v.41 no.1 / pp.38-44 / 2022
  • Speech enhancement is performed to improve the intelligibility and quality of noise-corrupted speech. In this paper, speech enhancement performance was compared using different loss functions in the time and frequency domains. This study proposes a combination of loss functions that exploits the advantages of each domain by considering both the details of the spectrum and the speech waveform. In our study, the Scale-Invariant Source-to-Noise Ratio (SI-SNR) is used as the time-domain loss function, and Mean Squared Error (MSE), calculated over the complex-valued spectrum and the magnitude spectrum, is used for the frequency domain. The phase loss is obtained using the sine function. Speech enhancement results are evaluated using Source-to-Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI). To confirm the speech enhancement results, the resulting spectrograms are also compared. The experimental results on the TIMIT database show the highest performance when a combination of the SI-SNR and magnitude loss functions is used.
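
As a rough illustration of the combination this abstract reports as best (SI-SNR in the time domain plus MSE on the magnitude spectrum), the sketch below uses the commonly used SI-SNR definition. It is not the authors' implementation: the frame length, hop size, Hann window, the weighting factor `weight`, and all function names are assumptions, and plain NumPy is used rather than a differentiable training framework.

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant source-to-noise ratio in dB (higher is better)."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the scaled target component.
    s_target = np.dot(estimate, target) * target / (np.dot(target, target) + eps)
    e_noise = estimate - s_target
    return 10.0 * np.log10((np.dot(s_target, s_target) + eps) /
                           (np.dot(e_noise, e_noise) + eps))

def magnitude_spectrogram(x, frame_len=256, hop=128):
    """Magnitude STFT with a Hann window; frame settings are assumptions."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def combined_loss(estimate, target, weight=1.0):
    """Negative SI-SNR plus weighted magnitude-spectrum MSE (lower is better)."""
    mag_mse = np.mean((magnitude_spectrogram(estimate) -
                       magnitude_spectrogram(target)) ** 2)
    return -si_snr(estimate, target) + weight * mag_mse
```

Because SI-SNR ignores the overall gain of the estimate, the magnitude term is what ties the output level to the target in this sketch.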

Multiple damage detection of maglev rail joints using time-frequency spectrogram and convolutional neural network

  • Wang, Su-Mei;Jiang, Gao-Feng;Ni, Yi-Qing;Lu, Yang;Lin, Guo-Bin;Pan, Hong-Liang;Xu, Jun-Qi;Hao, Shuo
    • Smart Structures and Systems / v.29 no.4 / pp.625-640 / 2022
  • Maglev rail joints are vital components serving as connections between adjacent F-type rail sections in a maglev guideway. Damage to maglev rail joints, such as bolt looseness, may result in rough suspension gap fluctuation, failure of suspension control, and even sudden clash between the electromagnets and the F-type rail. Condition monitoring of maglev rail joints is therefore highly desirable for maintaining safe maglev operation. In this paper, an online damage detection approach based on a three-dimensional (3D) convolutional neural network (CNN) and time-frequency characterization is developed for simultaneous detection of multiple damage of maglev rail joints. The training and testing data used for condition evaluation of maglev rail joints consist of two months of acceleration recordings, which were acquired in situ from different rail joints by an integrated online monitoring system while a maglev train was running on a test line. The short-time Fourier transform (STFT) is applied to transform the raw monitoring data into time-frequency spectrograms (TFS). Three CNN architectures, i.e., small-sized CNN (S-CNN), middle-sized CNN (M-CNN), and large-sized CNN (L-CNN), are configured for trial calculation, and the M-CNN model, with excellent prediction accuracy and high computational efficiency, is finally selected for multiple damage detection of maglev rail joints. Results show that rail joints in three different conditions (bolt-looseness-caused rail step, misalignment-caused lateral dislocation, and normal condition) are successfully identified by the proposed approach, even when using data collected from rail joints from which no data were used in the CNN training. The capability of the proposed method is further examined using data collected after the loosened bolts were replaced. In addition, by comparison with the results of a CNN using the frequency spectrum and a traditional neural network using TFS, the proposed TFS-CNN framework is shown to be more accurate and robust for multiple damage detection of maglev rail joints.
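
The STFT step mentioned in this abstract, which turns a raw acceleration record into a time-frequency spectrogram, might look roughly like the sketch below. The window type, frame length, hop size, dB scaling, and the function name are assumptions; the abstract does not give the settings used in the study.

```python
import numpy as np

def stft_spectrogram(signal, fs, frame_len=256, hop=64):
    """Return (freqs, times, S_db), where S_db has shape (freq bins, frames)."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the record into overlapping, windowed frames.
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs
    # Small offset avoids log(0) for silent bins.
    return freqs, times, 20.0 * np.log10(magnitude + 1e-12)
```

The resulting 2D array can be treated as an image-like input for a CNN; a longer frame gives finer frequency resolution at the cost of time resolution.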

Comparative study of data augmentation methods for fake audio detection (음성위조 탐지에 있어서 데이터 증강 기법의 성능에 관한 비교 연구)

  • KwanYeol Park;Il-Youp Kwak
    • The Korean Journal of Applied Statistics / v.36 no.2 / pp.101-114 / 2023
  • Data augmentation is used to mitigate model overfitting by allowing the training dataset to be viewed from various perspectives. In addition to image augmentation techniques such as rotation, cropping, horizontal flip, and vertical flip, occlusion-based data augmentation methods such as Cutmix and Cutout have been proposed. For models based on speech data, an occlusion-based data augmentation technique can be used after converting the 1D speech signal into a 2D spectrogram. In particular, SpecAugment is an occlusion-based augmentation technique for speech spectrograms. In this study, we compare data augmentation techniques that can be used for fake audio detection. Using data from the ASVspoof2017 and ASVspoof2019 competitions, which were held to detect fake audio, an LCNN model was trained on datasets augmented with the occlusion-based methods Cutout, Cutmix, and SpecAugment. All three augmentation techniques, Cutout, Cutmix, and SpecAugment, generally improved the performance of the model. Cutmix showed the best performance on ASVspoof2017, Mixup on ASVspoof2019 LA, and SpecAugment on ASVspoof2019 PA. In addition, increasing the number of masks for SpecAugment helps to improve performance. In conclusion, the appropriate augmentation technique differs depending on the situation and data.
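
The occlusion idea behind SpecAugment can be sketched as random frequency and time masks applied to a spectrogram. This is a simplified illustration under stated assumptions, not the configuration used in the study: masked cells are set to zero, time warping is omitted, and the mask counts, maximum widths, and function name are hypothetical.

```python
import numpy as np

def spec_augment(spec, n_freq_masks=1, n_time_masks=1,
                 max_freq_width=8, max_time_width=16, rng=None):
    """Return a copy of `spec` (freq bins x frames) with random bands zeroed."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(spec, dtype=float, copy=True)
    n_freq, n_time = out.shape
    for _ in range(n_freq_masks):
        width = int(rng.integers(0, max_freq_width + 1))
        start = int(rng.integers(0, max(1, n_freq - width + 1)))
        out[start:start + width, :] = 0.0   # mask a band of frequency bins
    for _ in range(n_time_masks):
        width = int(rng.integers(0, max_time_width + 1))
        start = int(rng.integers(0, max(1, n_time - width + 1)))
        out[:, start:start + width] = 0.0   # mask a run of time frames
    return out
```

Raising `n_freq_masks` or `n_time_masks` corresponds to the "number of masks" that the abstract reports as helpful.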

Underwater Target Localization Using the Interference Pattern of Broadband Spectrogram Estimated by Three Sensors (3개 센서의 광대역 신호 스펙트로그램에 나타나는 간섭패턴을 이용한 수중 표적의 위치 추정)

  • Kim, Se-Young;Chun, Seung-Yong;Kim, Ki-Man
    • The Journal of the Acoustical Society of Korea / v.26 no.4 / pp.173-181 / 2007
  • In this paper, we propose a moving-target localization algorithm using acoustic spectrograms. A time-versus-frequency spectrogram provides information on the trajectory of a target moving underwater. For a source at sufficiently long range from a receiver, the broadband striation patterns seen in the spectrogram represent the mutual interference between modes reflected by the surface and the bottom. The slope of the maximum-intensity striation is influenced by the waveguide invariant parameter ${\beta}$ and the distance between target and sensor. When two or more sensors are used to measure the noise radiated by a moving ship, the slope and frequency of the maximum-intensity striation depend on the distance between target and receiver. With two sensors at fixed points, we form a circle of Apollonius, which is the set of all points whose distances from the two fixed points are in a constant ratio. When three sensors are used, two such circles intersect, and the coordinates of the intersection point can be taken as the estimated target position. To evaluate the performance of the proposed localization algorithm, a simulation is performed using an acoustic propagation program.
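
The geometric step in this abstract can be sketched as follows. For sensors at A and B and a distance ratio k = |PA|/|PB| (k not equal to 1), the circle of Apollonius has center (A - k²B)/(1 - k²) and radius k|A - B|/|1 - k²|. The sketch assumes the distance ratios are already known; how the paper derives them from the striation slopes is not described in the abstract, and the function names are hypothetical. Two circles generally meet in two points, so both candidates are returned and further information is needed to choose between them.

```python
import numpy as np

def apollonius_circle(a, b, k):
    """Circle of points P with |P - a| / |P - b| = k (requires k != 1)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if np.isclose(k, 1.0):
        raise ValueError("k = 1 gives a straight line, not a circle")
    center = (a - k**2 * b) / (1.0 - k**2)
    radius = k * np.linalg.norm(a - b) / abs(1.0 - k**2)
    return center, radius

def circle_intersections(c1, r1, c2, r2):
    """Intersection points of two circles (empty list if they do not meet)."""
    d = np.linalg.norm(c2 - c1)
    if d == 0.0 or d > r1 + r2 or d < abs(r1 - r2):
        return []
    along = (r1**2 - r2**2 + d**2) / (2.0 * d)   # distance from c1 to the chord
    half_chord = np.sqrt(max(r1**2 - along**2, 0.0))
    direction = (c2 - c1) / d
    normal = np.array([-direction[1], direction[0]])
    mid = c1 + along * direction
    return [mid + half_chord * normal, mid - half_chord * normal]

def locate(s1, s2, s3, k12, k13):
    """Candidate target positions from ratios k12 = d1/d2 and k13 = d1/d3."""
    c12, r12 = apollonius_circle(s1, s2, k12)
    c13, r13 = apollonius_circle(s1, s3, k13)
    return circle_intersections(c12, r12, c13, r13)
```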