• Title/Summary/Keyword: spectrograms

Search Result 60

The Effects of the Speaking Rate on the Duration of Syllable before Boundary (발화속도가 경계앞 음절 길이에 미치는 영향)

  • Lee, Soon-Hyang;Koo, Hee-San
    • Speech Sciences / v.1 / pp.103-111 / 1997
  • The purpose of this study was to investigate the effect of speaking rate on the duration of the syllable before a boundary. The materials were four types of syllable-boundary sequences (Go-'Ga' Boundary-Gu) in a paragraph. The duration of the 'Ga' syllable before four levels of boundary was measured, and all measurements were taken from signals and spectrograms made with Signalyze™ 3.04 for Power Mac 7200. The subjects were six female speakers who read the materials at fast, normal, and slow speeds five times. The results show that (1) the slower the speaking rate, the longer the syllable before the boundary; (2) the duration rank of the syllable before each boundary does not correspond to the level of the boundary, e.g., at fast speed, = < #, + < $; at normal speed, +, #, = < $; at slow speed, + < =, #, $; and (3) the syllable before a sentence boundary is less influenced by speaking rate than the syllable before other boundaries.

  • PDF

Acoustic, Intraoral Air Pressure and EMG Studies of Vowel Devoicing in Korean

  • Kim, Hyun-Gi;Niimi, Sei-Ji
    • Speech Sciences / v.10 no.1 / pp.3-13 / 2003
  • Vowel devoicing is a phonological process in which the contrast in sonority is lost or reduced in a particular phonetic environment. Phonetically, vocal fold vibration originates from the abduction/adduction of the glottis in relation to supraglottal articulatory movements. The purpose of this study is to investigate Korean vowel devoicing by means of experimental instruments. The interrelated laryngeal adjustments and aerodynamic effects for this voicing can clarify the redundant articulatory gestures relevant to the distinctive feature of sonority. Five test words were selected, each containing the high vowel /i/ between a fricative and a strongly aspirated or lenis affricated consonant. The subjects uttered the test words successively at a normal or a faster speed. EMG, the Gaeltec S7b sensing tube, the High-Speech Analysis system, and MSL II were used in these studies. Acoustically, three different types of speech waveforms and spectrograms were classified, based on the voicing variation. The intraoral air pressure curves differed depending on the voicing variation. The activity patterns of the PCA and the CT for devoiced vowels differed from those for partially devoiced vowels and voiced vowels.

  • PDF

Spectral Sensitization and Photographic Characteristics of 9-Phenyl-5,5'-Diphenyl-3,3'-Bis(3-sulfopropyl)Benzoxazolo Carbocyanine Triethyl Ammonium Salt (9-Phenyl-5,5'-Diphenyl-3,3'-Bis(3-sulfopropyl)Benzoxazolo Carbocyanine Triethyl Ammonium Salt의 분광증감과 사진특성)

  • Kim, Yeoung-Chan
    • The Journal of Information Technology / v.8 no.3 / pp.51-56 / 2005
  • In this paper, we describe a study of the relationship between spectral sensitization and the photographic characteristics of color paper. The photographic emulsion used in this study is a green-sensitizing emulsion. The UV maximum absorption peak of 9-phenyl-5,5'-diphenyl-3,3'-bis(3-sulfopropyl)benzoxazolo carbocyanine triethyl ammonium salt was observed at 507 nm, and the wedge-spectrogram maximum peak of color paper with the dye added to the photographic emulsion appeared at 553 nm. Compared with the absorption peak of the dye in methanol solution, the sensitizing peak of 9-phenyl-5,5'-diphenyl-3,3'-bis(3-sulfopropyl)benzoxazolo carbocyanine triethyl ammonium salt shows a red shift of 46 nm. The photographic characteristics obtained for the green-sensitizing emulsion were contrast (2.6), speed (48-57), and fog (0.08). Therefore, the benzoxazolo carbocyanine dye is of industrial importance as a green-sensitizing dye in the spectral sensitization of photographic emulsions.

  • PDF

A Study of response Spectrums and characteristics of Time-Frequency Domain of Microearthquakes in the Central Part of South Korea (남한 중부지역 미소지진들의 응답 스펙트럼 및 시간-주파수 영역에서의 특성에 관한 연구)

  • 이전희
    • Proceedings of the Earthquake Engineering Society of Korea Conference / 1999.10a / pp.72-82 / 1999
  • The microearthquake and explosion events recorded by the KNUE (Korea National University of Education) seismic network were analyzed. The seismic data were recorded from Dec. 1997 to Dec. 1998. A total of 118 records from 24 earthquake and 4 explosion events were obtained at 6 stations. Spectral values increase as magnitude increases, and the predominant frequency band expands toward the low-frequency zone as magnitude increases. Three-dimensional spectrograms (time, frequency, amplitude) were also synthesized in order to discriminate microearthquakes from artificial underground explosions. The waves from microearthquakes show that the frequency content of dominant amplitude appears above 10 Hz, and the discrimination can be performed in almost all of the frequency domain of the 3-D spectrogram.

  • PDF
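
The three-dimensional spectrogram (time, frequency, amplitude) mentioned in the abstract above can be illustrated with a short-time Fourier transform. This is a minimal sketch in plain Python, not the authors' processing chain: the function names, the Hann window, and the frame and hop sizes are all assumptions chosen for illustration.

```python
import cmath
import math


def spectrogram_triples(signal, sample_rate, frame_len, hop):
    """Return (time_s, freq_hz, amplitude) triples from a short-time DFT.

    Hypothetical helper for illustration; window and sizes are assumptions.
    """
    # Periodic Hann window to reduce spectral leakage in each frame.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    triples = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT over the non-negative frequency bins.
        for k in range(frame_len // 2 + 1):
            coeff = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            triples.append((start / sample_rate,
                            k * sample_rate / frame_len,
                            abs(coeff)))
    return triples


def dominant_frequency(triples):
    """Frequency (Hz) carrying the largest amplitude over all frames."""
    return max(triples, key=lambda t: t[2])[1]
```

Each triple is one point of the time-frequency-amplitude surface; checking whether `dominant_frequency` lies above or below a cutoff such as 10 Hz is one simple way to summarize where the dominant amplitude sits.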

Feasibility of Deep Learning-Based Analysis of Auscultation for Screening Significant Stenosis of Native Arteriovenous Fistula for Hemodialysis Requiring Angioplasty

  • Jae Hyon Park;Insun Park;Kichang Han;Jongjin Yoon;Yongsik Sim;Soo Jin Kim;Jong Yun Won;Shina Lee;Joon Ho Kwon;Sungmo Moon;Gyoung Min Kim;Man-deuk Kim
    • Korean Journal of Radiology / v.23 no.10 / pp.949-958 / 2022
  • Objective: To investigate the feasibility of using a deep learning-based analysis of auscultation data to predict significant stenosis of arteriovenous fistulas (AVF) in patients undergoing hemodialysis requiring percutaneous transluminal angioplasty (PTA). Materials and Methods: Forty patients (24 male and 16 female; median age, 62.5 years) with dysfunctional native AVF were prospectively recruited. Digital sounds from the AVF shunt were recorded using a wireless electronic stethoscope before (pre-PTA) and after PTA (post-PTA), and the audio files were subsequently converted to mel spectrograms, which were used to construct various deep convolutional neural network (DCNN) models (DenseNet201, EfficientNetB5, and ResNet50). The performance of these models for diagnosing ≥ 50% AVF stenosis was assessed and compared. The ground truth for the presence of ≥ 50% AVF stenosis was obtained using digital subtraction angiography. Gradient-weighted class activation mapping (Grad-CAM) was used to produce visual explanations for DCNN model decisions. Results: Eighty audio files were obtained from the 40 recruited patients and pooled for the study. Mel spectrograms of "pre-PTA" shunt sounds showed patterns corresponding to abnormal high-pitched bruits with systolic accentuation observed in patients with stenotic AVF. The ResNet50 and EfficientNetB5 models yielded an area under the receiver operating characteristic curve of 0.99 and 0.98, respectively, at optimized epochs for predicting ≥ 50% AVF stenosis. However, Grad-CAM heatmaps revealed that only ResNet50 highlighted areas relevant to AVF stenosis in the mel spectrogram. Conclusion: Mel spectrogram-based DCNN models, particularly ResNet50, successfully predicted the presence of significant AVF stenosis requiring PTA in this feasibility study and may potentially be used in AVF surveillance.
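
The conversion of audio to mel spectrograms described above rests on mapping frequency in hertz onto the mel scale. The abstract does not state which mel formula or filterbank settings were used, so this sketch assumes one widely used formula, m = 2595 log10(1 + f/700), and hypothetical helper names.

```python
import math


def hz_to_mel(freq_hz):
    """Map a frequency in Hz to mels (assumed formula, see lead-in)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(f_min, f_max, n_bands):
    """Center frequencies (Hz) of n_bands spaced evenly on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_bands)]
```

Because the bands are evenly spaced in mels, their centers grow further apart in hertz as frequency rises, which gives a mel spectrogram finer resolution at low frequencies than at high ones.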

Noise source localization using comparison between candidate signal and beamformer output in time domain (시간 영역의 빔출력과 후보 신호 사이의 비교를 통한 소음원의 위치 추정)

  • Kim, Koo-Hwan;Kim, Yang-Hann
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2010.10a / pp.543-543 / 2010
  • The objective of this research is to estimate the location of a sound source of interest by using the similarity between a beamformer output in the time domain and a candidate signal. The waveform of the beamformer output at the location of a sound source is similar to the waveform emitted by that source. To estimate the location of the sound source using this feature, we define a quantified similarity between the candidate signal and the beamformer output. The candidate signal is the signal generated by the source of interest. In this paper, similarity is defined by four methods. Two methods use time-vector comparison, and the other two use a time-frequency map or linear prediction coefficients. To assess the results and performance of localization using these similarities, we examine two conditions: one in which two pure-tone sources exist, and one in which several bird sounds exist. As a result, the inner product of two time vectors and the structural similarity of spectrograms can estimate the location of the sound source of interest.

  • PDF
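
One of the similarity measures named in the abstract above, the inner product of two time vectors, can be sketched as follows. The abstract gives no formulas, so the normalization (cosine similarity), the function names, and the dictionary of per-location beamformer outputs are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Normalized inner product of two equal-length time vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat an all-zero vector as having no similarity to anything.
    return dot / norm if norm else 0.0


def locate_source(beam_outputs, candidate):
    """Pick the scan location whose beamformer output best matches candidate.

    beam_outputs: dict mapping a scan location to the beamformer output
    time vector computed at that location (hypothetical input format).
    """
    return max(
        beam_outputs,
        key=lambda loc: cosine_similarity(beam_outputs[loc], candidate),
    )
```

Normalizing by the vector norms makes the score independent of the overall level of the beamformer output, so a quieter but correctly shaped waveform still scores close to 1.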

Features Extraction and Mechanism Analysis of Partial Discharge Development under Protrusion Defect

  • Dong, Yu-Lin;Tang, Ju;Zeng, Fu-Ping;Liu, Min
    • Journal of Electrical Engineering and Technology / v.10 no.1 / pp.344-354 / 2015
  • In order to study the development of partial discharge (PD) under typical protrusion defects in gas-insulated switchgear, we applied step voltages on the defect and obtained the φ-u and φ-n spectrograms of ultra-high frequency (UHF) PD signals in various PD stages. Furthermore, we extracted seven kinds of features to characterize the degree of deterioration of insulation and analyzed their values, variation trends, and change rates. These characteristics were inconsistent with the development of PD. Hence, the differences of these features could describe the severity of PD. In addition, these characteristics could provide integrated characteristics regarding PD development and improve the reliability of PD severity assessment because these characteristics were extracted from different angles. To explain the variation laws of these seven kinds of parameters, we analyzed the relevant physical mechanism by considering the microphysical process of PD formation and development as well as the distortion effect generated by the space charges on the initial field. The relevant physical mechanism effectively allocated PD severity among these features for assessment, and the effectiveness and reliability of using these features to assess PD severity were proved by testing a large number of PD samples.

An Acoustic Study on the Pronunciation of English [kw] Sequences by Korean EFL Students

  • Kim, Jung-Eun;Cho, Mi-Hui
    • Speech Sciences / v.9 no.1 / pp.193-206 / 2002
  • The aim of this study is to find out, based on acoustic evidence, how the labiovelar onglide /w/ in English kwV sequences that have minimal pairs with kV sequences is pronounced differently among Korean EFL learners. This study tries to identify the /w/ sound in English kwV sequences through spectrograms and to examine the duration ratios of each segment in kwV words in order to compare the patterns of an English native speaker with those of Korean speakers of English. In the spectrographic analyses, the complete deletion of /w/ and a partial pronunciation of /w/, dubbed [kʷ], were identified, as well as the target-appropriate production of /w/. The general production patterns with respect to the duration ratios in English [kw] sequence words showed that the subjects who produced /w/ had ratio patterns similar to those of the native speaker, in that the vowel duration ratio in kwV sequences was shorter than that in kV sequences. By contrast, the subjects who deleted [w] had a long ratio of the onset [kʰ], while the speaker with a partial pronunciation of /w/ had a long ratio of the following vowel.

  • PDF

A Study on Speaker Identification by Difference Sum and Correlation Coefficients of Narrow-band Spectrum (좁은대역 스펙트럼의 차이값과 상관계수에 의한 화자확인 연구)

  • Yang, Byung-Gon;Kang, Sun-Mee
    • Speech Sciences / v.9 no.3 / pp.3-16 / 2002
  • We examined some problems in speaker identification procedures: transformation of acoustic parameters into auditory scales, invalid measurement values, and comparability of spectral energy values across the frequency range. To resolve these problems, we analyzed the acoustic spectral energy of three Korean numbers produced by ten female students, taken from narrow-band spectrograms at 19 proportional time points of each voiced segment. Then, the cells of the first five spectral matrices were averaged to form a matrix model for each speaker. The correlation coefficients and the sum of the absolute amplitude differences for each pair of the spectral models of the ten subjects were obtained. Also, some individual matrix models were compared to those of the same subject or of another subject with a similar spectral model. Results showed that for the numbers '2' and '9' subjects could not be clearly distinguished from the others, but for the number '4' there was some possibility of setting threshold values for speaker identification using the correlation coefficients and the sum of absolute differences. Further studies would be desirable on various combinations of the range of long-term average spectra and the degree of signal pre-emphasis.

  • PDF
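
The comparison described in the abstract above (averaging spectral matrices into a speaker model, then comparing models by correlation coefficient and by sum of absolute differences) can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' code: the function names, the use of the Pearson coefficient over all cells, and the nested-list matrix layout are assumptions.

```python
import math


def average_matrices(matrices):
    """Cell-wise mean of equally shaped matrices (lists of rows)."""
    count = len(matrices)
    return [
        [sum(m[r][c] for m in matrices) / count
         for c in range(len(matrices[0][0]))]
        for r in range(len(matrices[0]))
    ]


def _cells(matrix):
    """Flatten a matrix into a single list of cell values."""
    return [value for row in matrix for value in row]


def correlation(model_a, model_b):
    """Pearson correlation over all cells of two speaker models."""
    xs, ys = _cells(model_a), _cells(model_b)
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = math.sqrt(sum((x - mean_x) ** 2 for x in xs)
                       * sum((y - mean_y) ** 2 for y in ys))
    # A constant matrix has no variance, so report zero correlation.
    return cov / spread if spread else 0.0


def abs_difference_sum(model_a, model_b):
    """Sum of absolute cell-wise amplitude differences."""
    return sum(abs(x - y) for x, y in zip(_cells(model_a), _cells(model_b)))
```

The two measures are complementary: correlation compares the shape of the spectral pattern regardless of overall level, while the absolute difference sum also penalizes a uniform offset in amplitude.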

Phonetic Factors Conditioning the Release of English Sentence-Final Stops (영어 문장 말 폐쇄음의 파열 양상)

  • Kim, Da-Hee
    • MALSORI / no.53 / pp.1-16 / 2005
  • This experimental study aims to test the hypothesis that the occurrence of English sentence-final stop release is at least partly predictable from its phonetic context. Ten native speakers of American English (5 male and 5 female) recorded, in a sound-proof booth, sentences excerpted from novels and from natural documents on the World Wide Web. Judgements of the release of a sentence-final stop were made on the basis of the waveforms and spectrograms of the recorded sentences. If the aperiodic energy of a given final stop lasted more than 0.015 seconds, the stop was considered to be "released." The results reveal that English sentence-final stops tend to be released when they are 1) velar consonants, 2) preceded by tense vowels, and 3) coda consonants of content words. The phonetic environment in which final stops are often released can be characterized by articulatory comfort and by the need for release burst noise, without which the final stops may not be correctly perceived. From this examination of the release of English final stops, it is concluded that phonological events that had been considered to occur rather "randomly" in fact reflect a universal tendency of human speech: to minimize the speakers' and hearers' effort.

  • PDF
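
The release criterion stated in the abstract above (aperiodic energy lasting more than 0.015 seconds) amounts to a simple threshold rule. The sketch below assumes that the duration of aperiodic energy has already been measured from the waveform and spectrogram; the function and constant names are hypothetical.

```python
RELEASE_THRESHOLD_S = 0.015  # threshold taken from the abstract, in seconds


def is_released(aperiodic_duration_s, threshold_s=RELEASE_THRESHOLD_S):
    """True when aperiodic energy lasts strictly longer than the threshold."""
    return aperiodic_duration_s > threshold_s


def release_rate(durations_s):
    """Fraction of final stops classified as released."""
    if not durations_s:
        return 0.0
    released = sum(1 for d in durations_s if is_released(d))
    return released / len(durations_s)
```

Grouping the measured durations by place of articulation, preceding vowel, or word class before calling `release_rate` would give per-context release rates of the kind the abstract compares.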