Music Transcription

Reducing latency of neural automatic piano transcription models (인공신경망 기반 저지연 피아노 채보 모델)

  • Dasol Lee; Dasaem Jeong
    • The Journal of the Acoustical Society of Korea, v.42 no.2, pp.102-111, 2023
  • Automatic Music Transcription (AMT) is the task of detecting and recognizing musical note events in a given audio recording. In this paper, we focus on reducing the latency of real-time AMT systems for piano music. Although neural AMT models have been adapted for real-time piano transcription, they suffer from high latency, which hinders their usefulness in interactive scenarios. To tackle this issue, we explore several techniques for reducing the intrinsic latency of a neural piano-transcription network: reducing the window and hop sizes of the Fast Fourier Transform (FFT), modifying the convolutional layers' kernel sizes, and shifting labels along the time axis so that the model learns to predict onsets earlier. Our experiments demonstrate that combining these approaches lowers latency while maintaining high transcription accuracy. Specifically, our modified models achieved note F1 scores of 92.67% and 90.51% at latencies of 96 ms and 64 ms, respectively, compared to the baseline model's note F1 score of 93.43% at a latency of 160 ms. This methodology has potential for training AMT models for various interactive scenarios, including real-time feedback for piano education.
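The latency arithmetic behind these modifications can be illustrated with a small sketch. The sample rate, the latency formula, and the function name below are assumptions for illustration, not the authors' implementation:

```python
# Hypothetical sketch of the intrinsic-latency trade-off described in the
# abstract. SAMPLE_RATE and the formula are assumptions, not from the paper.
SAMPLE_RATE = 16000  # Hz (assumed)

def intrinsic_latency_ms(window_size, hop_size, lookahead_frames, label_shift_frames):
    """Delay from a note sounding until the model can report it.

    window_size, hop_size -- FFT parameters, in samples
    lookahead_frames      -- future frames the convolutional kernels consume
    label_shift_frames    -- frames by which onset labels were moved earlier
    """
    samples = (window_size // 2            # half the analysis window
               + lookahead_frames * hop_size   # convolutional look-ahead
               - label_shift_frames * hop_size)  # earlier-onset training
    return 1000.0 * samples / SAMPLE_RATE

# Shrinking the window and hop, trimming look-ahead, and shifting labels
# each reduce the figure:
baseline = intrinsic_latency_ms(2048, 512, 3, 0)
reduced = intrinsic_latency_ms(1024, 256, 1, 2)
```

Each of the three techniques in the abstract maps to one term of the sum, which is why they can be combined freely.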

An Improved Automatic Music Transcription Method Using TV-Filter and Optimal Note Combination (TV-필터와 최적 음표조합을 이용한 개선된 가변템포 음악채보방법)

  • Ju, Young-Ho; Lee, Joonwhoan
    • Journal of the Korean Institute of Intelligent Systems, v.23 no.4, pp.371-377, 2013
  • This paper proposes three methods for improving the accuracy of automatic music transcription of monophonic sound with time-varying tempo. First, smoothing the pitch data with a TV (Total Variation) filter reduces fragmentation in the pitch-segmentation result. Second, a measure-finding method that combines three approaches, based respectively on pitch, on the energy of the sound data, and on rules, produces a more stable result. Third, the note-length encoding is corrected optimally, so that the resulting encoding minimizes the sum of quantization errors within a measure while the note lengths sum to the number of beats. In experiments on 16 children's songs, measure finding was complete, and the encoding accuracy for note length and pitch was about 91.3% and 86.7%, respectively.
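The optimal note-combination step can be sketched as a small search: pick quantized note values whose sum fills the measure exactly while minimizing the total quantization error. The note-value set, function name, and brute-force search below are illustrative assumptions, not the paper's algorithm:

```python
# Sketch of the optimal note-combination objective: minimize quantization
# error subject to the note lengths summing to the beats in the measure.
# NOTE_VALUES and the exhaustive search are illustrative assumptions.
from itertools import product

NOTE_VALUES = [0.25, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0]  # sixteenth ... whole, in beats

def quantize_measure(durations, beats_per_measure):
    """durations: raw note lengths (in beats) detected within one measure."""
    best, best_err = None, float("inf")
    for combo in product(NOTE_VALUES, repeat=len(durations)):
        if abs(sum(combo) - beats_per_measure) > 1e-9:
            continue  # constraint: lengths must fill the measure exactly
        err = sum(abs(c - d) for c, d in zip(combo, durations))
        if err < best_err:  # objective: minimal total quantization error
            best, best_err = combo, err
    return list(best)
```

The brute-force search is exponential in the number of notes per measure; it is only meant to show the constraint and objective, not an efficient formulation.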

Automatic Music Transcription System Using SIDE (SIDE를 이용한 자동 음악 채보 시스템)

  • Hyoung, A-Young; Lee, Joon-Whoan
    • The KIPS Transactions:PartB, v.16B no.2, pp.141-150, 2009
  • This paper proposes a system that automatically transcribes a singing voice into musical notes. First, the system uses the Stabilized Inverse Diffusion Equation (SIDE) to divide the song into a series of syllabic segments based on pitch detection. Given this segmentation, the method recognizes the duration of each segment through clustering based on a genetic algorithm. The study also introduces the concept of a 'relative interval' to recognize intervals with respect to the singer's pitch, and adopts a measure-extraction algorithm that uses pause information to achieve more precise transcription. In experiments on 16 nursery songs, the measure recognition rate was 91.5% and the DMOS score reached 3.82, demonstrating the effectiveness of the system.
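The 'relative interval' idea, expressing each note relative to the singer's own pitch so that recognition does not depend on the absolute key, can be sketched as follows (the choice of the first note as reference and the function names are assumptions for illustration):

```python
# Illustrative sketch of a 'relative interval' representation: each note is
# expressed as semitones from a reference pitch taken from the singer.
import math

def hz_to_semitones(f, ref):
    # 12 semitones per octave; an octave doubles the frequency
    return 12 * math.log2(f / ref)

def relative_intervals(pitches_hz):
    """Semitone offsets of each note from the singer's first pitch."""
    ref = pitches_hz[0]  # assumed reference: the opening note
    return [round(hz_to_semitones(f, ref)) for f in pitches_hz]

# The same melody sung in a different key yields identical relative intervals:
a = relative_intervals([220.0, 246.94, 277.18])  # A3 B3 C#4
b = relative_intervals([293.66, 329.63, 369.99])  # D4 E4 F#4
```

Transposition invariance is the point: both renditions map to the same interval sequence, so the transcription does not depend on which key the singer happens to use.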

Finding Measure Position Using Combination Rules of Musical Notes in Monophonic Song (단일 음원 노래에서 음표의 조합 규칙을 이용한 마디 위치 찾기)

  • Park, En-Jong; Shin, Song-Yi; Lee, Joon-Whoan
    • The Journal of the Korea Contents Association, v.9 no.10, pp.1-12, 2009
  • When notes are combined within one measure, their durations stand in regular multiple relations. This paper presents a method for finding exact measure positions in a monophonic song based on those relations. In the proposed method, the individual note durations are first segmented, and rules stating the multiple relations are then used to locate the measure positions. The measures can serve as foundational information for extracting the beat and tempo of a song, which in turn can serve as background knowledge for an automatic music transcription system. The proposed method exactly detected the measure positions in 11 of 12 monophonic songs sung by men and women. The extracted measure positions, combined with music theory, also allow the beat and tempo of a song to be derived.
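The multiple-relation idea can be sketched as a running-total check: if note durations within a measure stand in simple multiples of the beat, candidate bar lines fall where the cumulative duration reaches a whole number of measures. The tolerance value and function name below are illustrative assumptions, not the paper's rule set:

```python
# Illustrative sketch of finding candidate measure positions from the
# multiple relations of note durations. The tolerance is an assumption.
def measure_positions(durations, measure_len, tol=0.05):
    """durations: note lengths in beats; measure_len: beats per measure.
    Returns note indices after which a bar line can be placed."""
    positions, total = [], 0.0
    for i, d in enumerate(durations):
        total += d
        ratio = total / measure_len
        # a bar line is possible where the running total is a whole
        # number of measures (within tolerance)
        if round(ratio) > 0 and abs(ratio - round(ratio)) < tol:
            positions.append(i + 1)  # bar line after note i
    return positions
```

For example, the durations 1, 1, 2, 2, 1, 1, 4 in 4/4 time admit bar lines after the third, sixth, and seventh notes, where the running totals reach 4, 8, and 12 beats.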