Search | Korea Science

Dasol Lee;Dasaem Jeong
- The Journal of the Acoustical Society of Korea / v.42 no.2 / pp.102-111 / 2023
Automatic Music Transcription (AMT) is the task of detecting and recognizing musical note events in a given audio recording. In this paper, we focus on reducing the latency of real-time AMT systems for piano music. Although neural AMT models have been adapted for real-time piano transcription, they suffer from high latency, which limits their usefulness in interactive scenarios. To tackle this issue, we explore several techniques for reducing the intrinsic latency of a neural network for piano transcription: reducing the window and hop sizes of the Fast Fourier Transform (FFT), modifying the kernel size of the convolutional layers, and shifting the labels along the time axis to train the model to predict onsets earlier. Our experiments demonstrate that combining these approaches can lower latency while maintaining high transcription accuracy. Specifically, our modified models achieved note F1 scores of 92.67 % and 90.51 % with latencies of 96 ms and 64 ms, respectively, compared to the baseline model's note F1 score of 93.43 % with a latency of 160 ms. This methodology has potential for training AMT models for various interactive scenarios, including real-time feedback for piano education.
https://doi.org/10.7776/ASK.2023.42.2.102
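The label-shifting idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that onset labels are stored as a (frames, pitches) array and that "shifting" means moving every label a fixed number of frames toward earlier time. The function name `shift_labels_earlier`, the array layout, and the zero-padding at the end are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def shift_labels_earlier(labels: np.ndarray, n_frames: int) -> np.ndarray:
    """Shift a (frames, pitches) label matrix n_frames toward earlier time.

    Hypothetical helper: rows that fall off the start are dropped and the
    vacated rows at the end are filled with zeros (no onset).
    """
    if n_frames < 0:
        raise ValueError("n_frames must be non-negative")
    if n_frames == 0:
        return labels.copy()
    shifted = np.zeros_like(labels)
    if n_frames < labels.shape[0]:
        # Row t of the result takes the label that was at row t + n_frames.
        shifted[:-n_frames] = labels[n_frames:]
    return shifted


# Toy example: 5 frames, 3 pitches, one onset at frame 3 for pitch 1.
onsets = np.zeros((5, 3), dtype=np.int64)
onsets[3, 1] = 1
shifted_onsets = shift_labels_earlier(onsets, 2)  # onset now sits at frame 1
```

Shifting by n frames moves each training target earlier by n times the hop duration, so the amount of shift trades look-ahead context against latency.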

Park, Sang-Uk;Park, Si-Hyun;Park, Chun-Su
- Journal of IKEEE / v.23 no.1 / pp.249-253 / 2019
Most commercial music-transcription systems operate on audio information. However, these conventional systems suffer from dependence on the recording environment and equipment, as well as from time latency. This paper studies a vision-based music-transcription system that uses video information instead of the audio information traditionally used by music-transcription programs. Computer vision is widely used to analyze and apply information captured by equipment such as cameras. In this paper, we created a program that generates a MIDI file, an electronic representation of musical notes, from a piano performance recorded with a smartphone camera.
https://doi.org/10.7471/ikeee.2019.23.1.249

Daeho Lee;Seokjin Lee
- The Journal of the Acoustical Society of Korea / v.43 no.2 / pp.207-213 / 2024
In this paper, we study how to improve the performance of a machine-learning-based automatic music transcription model by adding musical information to the input data. The added information is the number of pitches occurring in each time frame, obtained by counting the notes that are active in the ground-truth labels. This pitch-count information is concatenated to the log mel-spectrogram that serves as the input of the existing model. We use an automatic music transcription model composed of four types of blocks that predict four types of musical information, and we demonstrate that simply adding to the existing input the pitch-count information corresponding to the musical information predicted by each block helps train the model. To evaluate the performance improvement, we conducted experiments on the MIDI Aligned Piano Sounds (MAPS) dataset. When all pitch-count information was used, performance improved by 9.7 % in frame-based F1 score and by 21.8 % in note-based F1 score including offsets.
https://doi.org/10.7776/ASK.2024.43.2.207
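The pitch-count feature described in the abstract above can be sketched as follows. The sketch assumes a binary piano roll of shape (frames, pitches) as the ground-truth label and a log mel-spectrogram of shape (frames, mel bins). The function name `add_pitch_count`, these shapes, and appending the count as a single extra feature column are assumptions made for illustration; the paper's actual layout may differ.

```python
import numpy as np


def add_pitch_count(log_mel: np.ndarray, piano_roll: np.ndarray) -> np.ndarray:
    """Append the per-frame number of active pitches to a log mel-spectrogram.

    log_mel:    (frames, n_mels) input features.
    piano_roll: (frames, n_pitches) ground-truth activations (nonzero = active).
    Returns an array of shape (frames, n_mels + 1).
    """
    if log_mel.shape[0] != piano_roll.shape[0]:
        raise ValueError("log_mel and piano_roll must have the same frame count")
    # Count the active notes in every frame; keep a column shape for concatenation.
    count = (piano_roll > 0).sum(axis=1, keepdims=True).astype(log_mel.dtype)
    return np.concatenate([log_mel, count], axis=1)


# Toy example: 4 frames, 6 mel bins, 88 piano keys.
toy_log_mel = np.zeros((4, 6), dtype=np.float32)
toy_roll = np.zeros((4, 88), dtype=np.int64)
toy_roll[0, [39, 43]] = 1  # two notes sounding in frame 0
toy_roll[2, 51] = 1        # one note sounding in frame 2
features = add_pitch_count(toy_log_mel, toy_roll)
```

Because the count comes from the labels, a feature built this way is only directly available when ground truth exists.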

Shin, Ok Keun;Ryu, Da Hyun
- Journal of Advanced Marine Engineering and Technology / v.36 no.8 / pp.1129-1135 / 2012
To transcribe music with non-negative matrix factorization (NMF) by extracting the feature matrix and the weight matrix at the same time, it is necessary to know the dimension of the feature matrix in advance and to determine the pitch of each extracted feature vector. Another drawback of this approach is that it becomes more difficult to extract the feature matrix accurately as the number of pitches in the target music increases. In this study, we prepare a feature matrix database and apply it to transcribe real music. Transcription experiments are conducted by applying the feature matrix to music played on the same piano from which the feature matrix was extracted, as well as to music played on another piano. These results are also compared with those of another experiment in which the feature matrix and the weight matrix are extracted simultaneously, without using the database. We observed that the proposed method outperforms the method in which the two matrices are extracted at the same time.
https://doi.org/10.5916/jkosme.2012.36.8.1129
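The idea of transcribing with a pre-built feature matrix, as described in the abstract above, can be sketched by holding the feature matrix fixed and estimating only the weight matrix. The sketch below uses the standard multiplicative update for the Euclidean NMF objective; the abstract does not state which update rule or cost function the authors used, so this choice, the function name `estimate_weights`, and the matrix shapes are assumptions.

```python
import numpy as np


def estimate_weights(V: np.ndarray, W: np.ndarray,
                     n_iter: int = 200, eps: float = 1e-12) -> np.ndarray:
    """Estimate non-negative weights H such that V is approximately W @ H.

    V: (n_freq, n_frames) non-negative magnitude spectrogram.
    W: (n_freq, n_pitches) fixed, pre-computed feature (basis) matrix.
    Only H is updated; W is never modified.
    """
    H = np.ones((W.shape[1], V.shape[1]), dtype=float)
    WtV = W.T @ V    # constant across iterations because W is fixed
    WtW = W.T @ W
    for _ in range(n_iter):
        # Multiplicative update keeps H non-negative at every step.
        H *= WtV / (WtW @ H + eps)
    return H


# Toy example with a synthetic basis and known weights.
rng = np.random.default_rng(0)
W_fixed = rng.uniform(0.1, 1.0, size=(12, 3))
H_true = rng.uniform(0.0, 1.0, size=(3, 5))
V_obs = W_fixed @ H_true
H_est = estimate_weights(V_obs, W_fixed)
```

In such a setup each column of the fixed matrix would correspond to a known pitch, so the rows of the estimated weight matrix can be read directly as per-pitch activations over time.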