• Title/Summary/Keyword: Spectrogram

A study on improving the performance of the machine-learning based automatic music transcription model by utilizing pitch number information (음고 개수 정보 활용을 통한 기계학습 기반 자동악보전사 모델의 성능 개선 연구)

  • Daeho Lee;Seokjin Lee
    • The Journal of the Acoustical Society of Korea / v.43 no.2 / pp.207-213 / 2024
  • In this paper, we study how to improve the performance of a machine learning-based automatic music transcription model by adding musical information to the input data. The added musical information is the number of pitches that occur in each time frame, obtained by counting the number of notes activated in the answer sheet (the ground-truth annotation). The resulting pitch number information was concatenated to the log mel-spectrogram, which is the input of the existing model. We use an automatic music transcription model that includes four types of blocks predicting four types of musical information, and we demonstrate that the simple method of adding, to the existing input, the pitch number information corresponding to the musical information predicted by each block helps in training the model. To evaluate the performance improvement, we conducted an experiment using the MIDI Aligned Piano Sounds (MAPS) data. When all pitch number information was used, the frame-based F1 score improved by 9.7 % and the note-based F1 score including offset improved by 21.8 %.
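
The input augmentation described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the binary piano-roll representation of the answer sheet, and the toy values are all assumptions, and the per-block pitch counts mentioned in the abstract are simplified to a single count per frame.

```python
def pitch_counts(piano_roll):
    """Count the active notes in each time frame of a binary piano roll.

    piano_roll: list of frames, each a list of 0/1 flags (one per pitch).
    This representation of the ground truth is an assumption.
    """
    return [sum(frame) for frame in piano_roll]


def append_pitch_count(log_mel, piano_roll):
    """Concatenate the per-frame pitch count to each log-mel feature vector."""
    if len(log_mel) != len(piano_roll):
        raise ValueError("log_mel and piano_roll must have the same number of frames")
    counts = pitch_counts(piano_roll)
    return [list(feat) + [float(c)] for feat, c in zip(log_mel, counts)]


# Toy data (made up): 3 frames, 4 mel bins, 5 candidate pitches.
log_mel = [[-1.0, -2.0, -0.5, -3.0],
           [-0.8, -1.5, -0.4, -2.5],
           [-4.0, -4.0, -4.0, -4.0]]
piano_roll = [[1, 0, 1, 0, 0],
              [1, 1, 1, 0, 0],
              [0, 0, 0, 0, 0]]

augmented = append_pitch_count(log_mel, piano_roll)
print(augmented[1])  # the 4 mel values of frame 1 followed by its pitch count
```

Each frame gains one extra feature, so a model that expects 4 input bins would need its input width changed to 5 in this toy setting.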

Abnormal State Detection using Memory-augmented Autoencoder technique in Frequency-Time Domain

  • Haoyi Zhong;Yongjiang Zhao;Chang Gyoon Lim
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.2 / pp.348-369 / 2024
  • With the advancement of Industry 4.0 and Industrial Internet of Things (IIoT), manufacturing increasingly seeks automation and intelligence. Temperature and vibration monitoring are essential for machinery health. Traditional abnormal state detection methodologies often overlook the intricate frequency characteristics inherent in vibration time series and are susceptible to erroneously reconstructing temperature abnormalities due to the highly similar waveforms. To address these limitations, we introduce synergistic, end-to-end, unsupervised Frequency-Time Domain Memory-Enhanced Autoencoders (FTD-MAE) capable of identifying abnormalities in both temperature and vibration datasets. This model is adept at accommodating time series with variable frequency complexities and mitigates the risk of overgeneralization. Initially, the frequency domain encoder processes the spectrogram generated through Short-Time Fourier Transform (STFT), while the time domain encoder interprets the raw time series. This results in two disparate sets of latent representations. Subsequently, these are subjected to a memory mechanism and a limiting function, which numerically constrain each memory term. These processed terms are then amalgamated to create two unified, novel representations that the decoder leverages to produce reconstructed samples. Furthermore, the model employs Spectral Entropy to dynamically assess the frequency complexity of the time series, which, in turn, calibrates the weightage attributed to the loss functions of the individual branches, thereby generating definitive abnormal scores. Through extensive experiments, FTD-MAE achieved an average ACC and F1 of 0.9826 and 0.9808 on the CMHS and CWRU datasets, respectively. Compared to the best representative model, the ACC increased by 0.2114 and the F1 by 0.1876.
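
Spectral entropy, which this abstract uses to assess the frequency complexity of a time series, can be sketched as the normalised Shannon entropy of the power spectrum. The code below is an illustrative sketch using a naive DFT. The `branch_weights` helper is purely hypothetical: the abstract does not state how the entropy is mapped to the loss weights of the two branches, so the linear split shown here is an assumption.

```python
import cmath
import math


def power_spectrum(signal):
    """Naive O(N^2) DFT power spectrum over the non-negative frequency bins."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_entropy(signal):
    """Shannon entropy of the normalised power spectrum, scaled to [0, 1]."""
    power = power_spectrum(signal)
    total = sum(power)
    if total == 0 or len(power) < 2:
        return 0.0
    probs = [p / total for p in power]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(power))


def branch_weights(signal):
    """Hypothetical mapping (an assumption, not from the paper): weight the
    frequency branch by the entropy and the time branch by its complement."""
    h = spectral_entropy(signal)
    return h, 1.0 - h


# A pure tone has a concentrated spectrum; an impulse has a flat one.
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
impulse = [1.0] + [0.0] * 63
print(spectral_entropy(tone), spectral_entropy(impulse))
```

A single sinusoid gives an entropy close to 0 and an impulse gives an entropy close to 1, which is the sense in which the measure tracks frequency complexity.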

A Study on Underwater Source Localization Using the Wideband Interference Pattern Matching (수중에서 광대역 간섭 패턴 정합을 이용한 음원의 위치 추정 연구)

  • Chun, Seung-Yong;Kim, Se-Young;Kim, Ki-Man
    • The Journal of the Acoustical Society of Korea / v.26 no.8 / pp.415-425 / 2007
  • This paper proposes a method of underwater source localization using wideband interference pattern matching. By matching two interference patterns in the spectrogram, the ratio of the ranges from the source to the sensors is estimated, and this ratio is then applied to the Apollonius circle. The Apollonius circle is defined as the locus of all points whose distances from two fixed points are in a constant ratio, so it can represent the locus of potential source locations. The Apollonius circle alone, however, leaves the source location ambiguous, so another equation is necessary to determine a unique source location. By estimating the time differences of signal arrival between the source and the sensors, a hyperbola equation is obtained, and the intersection of the two equations is taken as the source position. Simulations are performed to evaluate the performance of the proposed algorithm, and comparisons with real sea experiment data are made to demonstrate its applicability in a real environment. The results show that the proposed algorithm successfully estimates the source position within an error bound of 10%.
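
The Apollonius circle used in this abstract has a closed form. For two sensors A and B and a range ratio k = |PA| / |PB| (k > 0, k ≠ 1), the centre is (A − k²B) / (1 − k²) and the radius is k|AB| / |1 − k²|. The sketch below computes both; the function name and the example coordinates are assumptions, and the hyperbola intersection step from the abstract is not reproduced.

```python
import math


def apollonius_circle(sensor_a, sensor_b, ratio):
    """Return (centre, radius) of the points P with |P-A| / |P-B| == ratio."""
    if ratio <= 0 or ratio == 1:
        raise ValueError("ratio must be positive and different from 1")
    ax, ay = sensor_a
    bx, by = sensor_b
    k2 = ratio * ratio
    centre = ((ax - k2 * bx) / (1 - k2), (ay - k2 * by) / (1 - k2))
    radius = ratio * math.hypot(bx - ax, by - ay) / abs(1 - k2)
    return centre, radius


# Example (made-up geometry): sensors 3 units apart, source twice as far from A as from B.
sensor_a, sensor_b, k = (0.0, 0.0), (3.0, 0.0), 2.0
centre, radius = apollonius_circle(sensor_a, sensor_b, k)

# Check a few points on the circle against the defining ratio.
for angle in (0.0, 1.0, 2.5, 4.0):
    px = centre[0] + radius * math.cos(angle)
    py = centre[1] + radius * math.sin(angle)
    dist_a = math.hypot(px - sensor_a[0], py - sensor_a[1])
    dist_b = math.hypot(px - sensor_b[0], py - sensor_b[1])
    print(round(dist_a / dist_b, 6))
```

When the ratio equals 1 the locus degenerates into the perpendicular bisector of AB, which is why that case is rejected here.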

A Diagnosis system of misalignments of linear motion robots using transfer learning (전이 학습을 이용한 선형 이송 로봇의 정렬 이상진단 시스템)

  • Su-bin Hong;Young-dae Lee;Arum Park;Chanwoo Moon
    • The Journal of the Convergence on Culture Technology / v.10 no.3 / pp.801-807 / 2024
  • Linear motion robots are devices that perform functions such as transferring parts or positioning devices, and require high precision. In companies that develop linear robot application systems, human workers are in charge of quality control and fault diagnosis of linear robots, and the result and accuracy of a fault diagnosis varies depending on the skill level of the person in charge. Recently, there have been many attempts to utilize artificial intelligence to diagnose faults in industrial devices. In this paper, we present a system that automatically diagnoses linear rail and ball screw misalignment of a linear robot using transfer learning. In industrial systems, it is difficult to obtain a lot of learning data, and this causes a data imbalance problem. In this case, a transfer learning model configured by retraining an established model is widely used. The information obtained by using an acceleration sensor and torque sensor was used, and its usefulness was evaluated for each case. After converting the signal obtained from the sensor into a spectrogram image, the type of abnormality was diagnosed using an image recognition artificial intelligence classifier. It is expected that the proposed method can be used not only for linear robots but also for diagnosing other industrial robots.
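
The step of converting a sensor signal into a spectrogram image can be sketched as a short-time Fourier transform followed by scaling to grey levels. This is an assumption-laden illustration: the abstract does not give the window, frame length, hop size, or scaling, so the Hann window, `frame_len=32`, `hop=16`, and the linear 0 to 255 mapping are placeholders, and the transfer-learning classifier itself is not shown.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=32, hop=16):
    """Magnitude spectrogram: one list of frequency-bin magnitudes per frame."""
    # Periodic Hann window to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(segment))
            mags.append(abs(coeff))
        frames.append(mags)
    return frames


def to_grayscale(spec):
    """Linearly scale magnitudes to integer grey levels in 0..255."""
    peak = max(max(row) for row in spec)
    if peak == 0:
        return [[0] * len(row) for row in spec]
    return [[round(255 * v / peak) for v in row] for row in spec]


# Synthetic "sensor" signal (made up): a tone with 4 cycles per 32 samples.
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(128)]
spec = stft_magnitude(signal)
image = to_grayscale(spec)
print(len(spec), len(spec[0]))  # 7 frames, 17 frequency bins
```

The resulting 2-D array of grey levels is the kind of input an image classifier could consume. In practice a library FFT would replace the naive DFT used here for self-containment.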

Orthographic Influence in the Perception and Production of English Intervocalic Consonants: A Pilot Study (영어 모음사이 자음의 인지와 발화에서 철자의 영향: 파일럿 연구)

  • Cho, Mi-Hui;Chung, Ju-Yeon
    • The Journal of the Korea Contents Association / v.9 no.12 / pp.459-466 / 2009
  • While Korean allows the same consonant at the coda of the preceding syllable and at the onset of the following syllable, English does not allow geminate consonants in the same intervocalic position. Due to this difference between Korean and English, Korean learners of English tend to incorrectly produce geminate consonants for English geminate graphemes, as in $su\underline{mm}er$. Based on this observation, a pilot study was designed to investigate how Korean learners of English perceive and produce English doubleton graphemes and singleton graphemes. Twenty Korean college students were asked to perform a forced-choice perception test as well as a production test for 36 real-word stimuli consisting of (near) minimal pairs of singleton and doubleton graphemes. The results showed that the accuracy rates for words with singleton graphemes were higher than those for words with doubleton graphemes in both perception and production, because the subjects misperceived and misproduced the doubleton graphemes as geminates due to orthographic influence. In addition, the low error rates for words with voiced stops were accounted for by Korean language transfer. Further, spectrographic analyses showed more production errors in doubleton grapheme words than in singleton grapheme words. Finally, pedagogical implications are provided.

Clinical Acoustic Study of Acupuncture Therapy Effects on Post-Stroke Dysarthria (침치료가 뇌졸중으로 인한 구음장애에 미치는 음향적 특성에 대한 증례보고)

  • Lee, Min-Goo;Park, Sae-Wook;Lee, Sun-Woo;Ryu, Hyun-Hee;Lee, Seung-Eon;Kim, Yong-Jeong;Son, Ji-Woo;Rhim, Eun-Kyung;Kim, Sung-Nam;Lee, In;Moon, Byung-Soon;Yun, Jong-Min
    • The Journal of Internal Korean Medicine / v.26 no.3 / pp.660-669 / 2005
  • Objectives: The aim of this study is to find the acoustic characteristics of the effects of acupuncture therapy on post-stroke dysarthria. Methods: Acupuncture therapy was applied for four to six weeks by inserting needles into eight acupuncture points: CV23, CV24, bilateral 'Sheyu', ipsilateral ST4 and ST6, and contralateral LI4 and ST36, with respect to the facial palsy side. Speech samples were collected before and after treatment using the Computerized Speech Lab. The VOT and TD of each speech sample and the vowel formants (F1 and F2) were analyzed on the spectrogram. Results: VOT and TD decreased after treatment. F1 decreased and F2 increased after treatment. Conclusions: This suggests that acupuncture therapy improves the symptoms of post-stroke dysarthria by stimulating articulation organs such as the tongue, lips, cheeks, larynx and pharynx.

Inhibition of Human Neutrophil Elastase by NSAIDs and Inhibitors, and Molecular Pharmacological Mechanism of the Inhibition (비스테로이드성 항염증제와 효소 억제제에 의한 사람 중성구 Elastase의 활성도 억제 및 분자약리학적 기전)

  • Kang, Koo-Il;Kim, Woo-Mi;Hong, In-Sik;Lee, Moo-Sang
    • The Korean Journal of Pharmacology / v.32 no.3 / pp.425-431 / 1996
  • Human neutrophil elastases (HNElastase, EC 3.4.21.37), a causative factor of inflammatory diseases, are regulated by plasma proteinase inhibitors, alpha-proteinase inhibitor and $\alpha_2$-macroglobulin. Under certain pathological conditions, however, released enzymes or abnormal function of the inhibitors may cause various inflammatory diseases. NSAIDs have been clinically applied for the treatment of inflammatory diseases. Inhibition of cyclooxygenase is a known mechanism of action of NSAIDs in the treatment of inflammatory disease. In in vitro experiments, HNElastase was inhibited by naproxen, phenylbutazone, and oxyphenbutazone, but ibuprofen, ketoprofen, aspirin, salicylic acid, and tolmetin did not inhibit elastase. HNElastase was also inhibited by the chelating agents EDTA and EGTA, and by tetracyclines. Removal of divalent metal ions by EDTA caused inhibition of elastase, and reconstitution of the metal ions recovered the enzyme activity to a certain level. Frequencies and contours in the Raman spectra of human neutrophil elastase under various conditions undergo drastic changes upon partial removal and/or reconstitution of calcium and zinc ions. The metal ion content-dependent activities and the change in the contour of the Raman spectrogram suggest that the mechanism of action of a chelator or chelator-like agents on neutrophil elastase may be related to a conformational change at or near the active site, especially at the -C=O radical or -COOH radical.

A STUDY ON THE INFLUENCE OF THE PALATAL PLATES UPON THE DURATION OF KOREAN SOUNDS (구개상 장착에 따른 한국어 어음의 조음시간 변화에 관한 연구)

  • Koh, Yeo-Joon;Kim, Chang-Whe;Kim, Yong-Soo
    • The Journal of Korean Academy of Prosthodontics / v.32 no.1 / pp.77-102 / 1994
  • Many studies have been made on the masticatory and esthetic effects of prosthodontic treatments, but few on the restoration of pronunciation, especially in complete denture wearers. The purpose of this study is to provide a basis that could help complete denture wearers' speech adaptation by analyzing the influence of palatal coverage upon the duration of consonants and vowels with the methods of experimental phonetics. For this study, metal plates and resin plates were made for 3 male subjects in their twenties who have good occlusion and no speech or hearing disorders. Then 8 Korean consonants and 4 Korean vowels were selected, systematically considering phonetic variables such as the place and manner of articulation, lenis/fortis, the mutual effect of each phoneme, etc. They were combined into meaningless test words in the form /VCV/ and were included in carrier sentences. Each informant uttered the sentences 1) without the plate, 2) with the metal plate, and 3) with the resin plate. The recorded data were analyzed through the waveform and spectrogram of the sounds using the programs SoundEdit, Signalize, and Statview 512+ for the Macintosh computer. The duration of each segment was measured by searching for the boundaries between the preceding vowels and consonants, and between the consonants and the following vowels. The study led to the following conclusions. 1. With the palatal plate, the duration of all the tested words increased, and the duration increased more with the resin plate than with the metal plate. 2. With the palatal plate, the duration of all the preceding vowels, consonants, and following vowels increased, but the temporal structure of the tested words was maintained. 3. As for the manner of articulation, the fricative /s/(ㅅ) was greatly influenced by both kinds of palatal plates. 4. As for the place of articulation, the alveolar sounds /d/(ㄷ), /n/(ㄴ) were greatly influenced by the kind of palatal plate, and the velar sounds /ŋ/(ㅇ), /g/(ㄱ) were influenced by the palatal plates, but the kind of palatal plate did not show any significance. 5. As for lenis/fortis, lenis was influenced more by the kind of palatal plate. 6. As for the influence of vowels upon each segment in the tested words, the palatal vowel /i/(ㅣ) had greater influence than the pharyngeal vowel /a/(ㅏ), and following vowels had greater influence than preceding vowels.

A Visual Study of the Quality of English Pronunciation Using the Praat Program (Praat을 활용한 영어발음특성의 시각적 연구)

  • Park, Heesuk
    • Journal of Digital Contents Society / v.14 no.3 / pp.323-331 / 2013
  • This study aims to investigate and compare the diphthongs, words, and sentences produced by two groups of Korean high school students using the Praat program. To do this, English words and sentences were uttered and recorded by twenty Korean subjects, ten in each group. All the subjects are female, and their grades range from freshman to sophomore. Acoustic features were measured from a sound spectrogram with the help of the Praat software program and analyzed statistically. The results showed that the lengths of diphthongs and words differed between the two groups, but the difference was not significant. However, in the length of sentence utterances, the group of grade 5 to 6 students in the current grading system took longer than the group of grade 1 to 2 students. The difference was significant especially in the pronunciation of the first two sentences with more than five words. From the overall sums for the two subject groups, we found that the differences in the lengths of the words with diphthongs were not significant, but those of the sentences with more than five words were significant. Comparing the pronunciation of the words coat and code, the diphthong in coat was shorter than that in code.

The Characteristics of Voice Onset Time of the Korean Stops in the Benign Laryngeal Disorders (후두질환에 따른 자음의 음성발현시간의 특성)

  • Hong, Ki-Hwan;Lee, Hwa-Uk;Kim, Jin-Sung;Lee, Eun-Jung;So, Sang-Soo;Choi, Dong-Il;Ynng, Yoon-Soo
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.17 no.2 / pp.98-102 / 2006
  • Background and Objectives: Voice onset time (VOT) is defined as the time interval from the oral release of a stop consonant to the onset of glottal pulsing in the following vowel. VOT is a temporal characteristic of stop consonants that reflects the complex timing of glottal articulation relative to supraglottal articulation. Stop consonants are characterized by the creation of a pressure difference across a complete occlusion in the vocal tract, followed by a sudden release 'burst' due to the opening of that occlusion. The objective of this study is to evaluate the usefulness of voice onset time in the assessment of voice-disordered patients. Subjects: Subjects were 20 adults with normal voice and with benign laryngeal disorders. Subjects with voice disorders represented the following vocal pathologies: vocal polyp, vocal nodule, Reinke's edema and unilateral vocal fold paralysis (UVFP). Control subjects were matched for age (21-40 years old) and sex (male) with the voice-disordered subjects and had normal vocal qualities with no history of voice disorders. Methods: Each voice-disordered and matched control subject read test passages containing three types of Korean bilabial consonants. VOT measurements were made for the initial /p/, /pʰ/, and /p'/. VOT was measured using the acoustic waveform or a wide-band spectrogram. Results: For each voiceless stop consonant, there was a significant difference in VOT between the voice-disordered and normal subjects. The mean VOTs of the lax stops in UVFP were significantly shorter than those of control subjects. The mean VOTs of the aspirated stops in vocal polyp and nodule were longer than those of control subjects, but not significantly. The mean VOTs of the glottalized stops in the voice-disordered groups were longer than those of control subjects, and statistically significant in UVFP. Conclusions: VOT may be a clinically useful acoustic parameter in the assessment of voice-disordered patients, especially in unilateral vocal fold paralysis.
