• Title/Summary/Keyword: Automatic music transcription


A Threshold Adaptation based Voice Query Transcription Scheme for Music Retrieval (음악검색을 위한 가변임계치 기반의 음성 질의 변환 기법)

  • Han, Byeong-Jun; Rho, Seung-Min; Hwang, Een-Jun
    • The Transactions of The Korean Institute of Electrical Engineers, v.59 no.2, pp.445-451, 2010
  • This paper presents a threshold adaptation based voice query transcription scheme for music information retrieval. The proposed scheme analyzes a monophonic voice signal and generates its transcription for diverse music retrieval applications. For accurate transcription, we propose several advanced features: (i) the Energetic Feature eXtractor (EFX) for onset, peak, and transient area detection; (ii) Modified Windowed Average Energy (MWAE), which defines multiple small but coherent windows with local threshold values as an offset detector; and (iii) the Circular Average Magnitude Difference Function (CAMDF) for accurate acquisition of the fundamental frequency (F0) of each frame. To evaluate the performance of the proposed scheme, we implemented a prototype music transcription system called AMT2 (Automatic Music Transcriber version 2) and carried out various experiments on the QBSH corpus [1], adopted in the MIREX 2006 contest. Experimental results show that the proposed scheme improves transcription performance.
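
  The CAMDF named in (iii) has a standard textbook form: for a frame of length N, CAMDF(tau) = sum over n of |x[(n + tau) mod N] - x[n]|, and F0 corresponds to the lag of the deepest valley. A minimal NumPy sketch under that definition follows; the frame length and pitch search range are illustrative assumptions, not values from the paper.

      import numpy as np

      def camdf(x: np.ndarray) -> np.ndarray:
          """CAMDF(tau) = sum_n |x[(n + tau) mod N] - x[n]| for tau = 0..N-1."""
          return np.array([np.abs(np.roll(x, -tau) - x).sum()
                           for tau in range(len(x))])

      def estimate_f0(x: np.ndarray, fs: float,
                      fmin: float = 80.0, fmax: float = 1000.0) -> float:
          """Pick the lag with the deepest CAMDF valley inside the vocal range."""
          d = camdf(x)
          lo, hi = int(fs / fmax), int(fs / fmin)
          tau = lo + int(np.argmin(d[lo:hi]))
          return fs / tau

      # A 220 Hz sine frame should come back as roughly 220 Hz.
      fs = 16000
      t = np.arange(1024) / fs
      print(estimate_f0(np.sin(2 * np.pi * 220 * t), fs))  # ~219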

Humming based High Quality Music Creation (허밍을 이용한 고품질 음악 생성)

  • Lee, Yoonjae; Kim, Sunmin
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference, 2014.10a, pp.146-149, 2014
  • In this paper, a humming-based automatic music creation method is described. Composing music is difficult for the general public without a background in music theory, but almost anyone can produce a main melody by humming. With this motivation, melody and chord sequences are estimated by analyzing the humming, which is recorded without a metronome. Based on the estimated chord sequence, an accompaniment is then generated using MIDI templates matched to each chord; five genres are supported. The melody transcription is evaluated in terms of onset and pitch estimation accuracy, and an MOS evaluation is used to assess the created music.
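
  As a rough illustration of the template idea, the sketch below picks, for each bar of estimated melody notes, the triad whose pitch classes cover the most notes; the triad table and scoring rule are my assumptions, not the paper's actual estimator.

      from collections import Counter

      TRIADS = {  # pitch classes (0 = C) of a few common triads
          "C": {0, 4, 7}, "Dm": {2, 5, 9}, "Em": {4, 7, 11},
          "F": {5, 9, 0}, "G": {7, 11, 2}, "Am": {9, 0, 4},
      }

      def chord_for_bar(midi_notes):
          """Pick the triad covering the most melody notes in the bar."""
          pcs = Counter(n % 12 for n in midi_notes)
          return max(TRIADS, key=lambda c: sum(pcs[p] for p in TRIADS[c]))

      print(chord_for_bar([60, 64, 67, 72]))  # a bar outlining C-E-G -> "C"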

A study on improving the performance of the machine-learning based automatic music transcription model by utilizing pitch number information (음고 개수 정보 활용을 통한 기계학습 기반 자동악보전사 모델의 성능 개선 연구)

  • Daeho Lee; Seokjin Lee
    • The Journal of the Acoustical Society of Korea, v.43 no.2, pp.207-213, 2024
  • In this paper, we study how to improve the performance of a machine learning-based automatic music transcription model by adding musical information to the input data. The added information is the number of pitches sounding in each time frame, obtained by counting the notes activated in the ground-truth annotations. This pitch-count information is concatenated to the log mel-spectrogram that forms the input of the existing model. We use an automatic music transcription model comprising four blocks, each predicting one of four types of musical information, and demonstrate that simply concatenating the pitch-count information corresponding to each block's prediction target to the existing input helps train the model. To evaluate the performance improvement, we conducted experiments on the MIDI Aligned Piano Sounds (MAPS) data; when all pitch-count information was used, the frame-based F1 score improved by 9.7 % and the note-based F1 score including offsets by 21.8 %.
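
  The input augmentation described above amounts to appending one extra per-frame feature to the spectrogram. A minimal sketch follows, with array shapes of my own choosing (88 piano pitches, 229 mel bins); the paper's exact layout may differ.

      import numpy as np

      def add_pitch_count(log_mel, piano_roll):
          """log_mel: (frames, mels); piano_roll: (frames, 88) binary targets.
          Returns (frames, mels + 1) with the active-note count appended."""
          counts = piano_roll.sum(axis=1, keepdims=True).astype(log_mel.dtype)
          return np.concatenate([log_mel, counts], axis=1)

      frames, mels = 100, 229
      x = add_pitch_count(np.random.randn(frames, mels).astype(np.float32),
                          (np.random.rand(frames, 88) > 0.95).astype(np.float32))
      print(x.shape)  # (100, 230)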

Structural Analysis Algorithm for Automatic Transcription of 'Pansori' (판소리 자동채보를 위한 구조분석 알고리즘)

  • Ju, Young-Ho; Kim, Joon-Cheol; Seo, Kyoung-Suk; Lee, Joon-Whoan
    • The Journal of the Korea Contents Association, v.14 no.2, pp.28-38, 2014
  • For Western music there has been a considerable volume of research on music information analysis for automatic transcription and content-based music retrieval, but comparable research on Korean traditional music is hard to find. In this paper we propose several algorithms to automatically analyze the structure of the Korean traditional music 'Pansori'. The proposed algorithm automatically distinguishes the sung part ('sori') from the spoken part ('aniri') using the ratio of phonetic to pause time intervals. To classify the rhythmic patterns called 'jangdan', it makes a robust decision by majority voting over template-matching results. An algorithm based on a Kalman filter is also suggested to detect the bar positions within the 'sori' part. All of the proposed algorithms work well enough on the sample 'Pansori' recordings that their results can be used for automatic 'Pansori' transcription.
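
  A hedged sketch of the sori/aniri idea follows: a window is labeled sung when pauses are rare, spoken when they are frequent. The energy threshold, window length, and cut-off ratio are illustrative assumptions, not the paper's values.

      import numpy as np

      def voiced_ratio(frame_energy, threshold):
          """Fraction of frames louder than the pause threshold."""
          return float(np.mean(frame_energy > threshold))

      def label_segments(frame_energy, win=200, threshold=0.01, cut=0.7):
          """Label each window 'sori' (sung, few pauses) or 'aniri' (spoken)."""
          return ["sori" if voiced_ratio(frame_energy[s:s + win], threshold) >= cut
                  else "aniri"
                  for s in range(0, len(frame_energy) - win + 1, win)]

      # Example: sustained singing vs. speech with frequent pauses.
      rng = np.random.default_rng(1)
      sung = 0.1 * np.abs(rng.standard_normal(400))       # rarely silent
      spoken = np.where(rng.random(400) < 0.4, 0.0, 0.1)  # ~40 % pauses
      print(label_segments(np.r_[sung, spoken]))  # expect sori, sori, aniri, aniri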

Automatic Music Transcription Considering Time-Varying Tempo (가변 템포를 고려한 자동 음악 채보)

  • Ju, Youngho; Babukaji, Baniya; Lee, Joonwhan
    • The Journal of the Korea Contents Association, v.12 no.11, pp.9-19, 2012
  • The time-varying tempo of a song is one of the error sources in identifying note durations in automatic music transcription. This paper proposes an improved music transcription scheme that identifies note durations while accounting for time-varying tempo. In the proposed scheme, the measures are found first, and the tempo, i.e., the playing time of each measure, is then estimated. The tempo is used to rescale each IOI (Inter-Onset Interval) before identifying the note duration, which increases the degree of correspondence to the original piece. In the experiments, the proposed scheme found the exact measure positions for 14 of 16 monophonic children's songs recorded by male and female singers. It also achieved about 89.4 % and 84.8 % agreement with the original scores for note-duration and pitch identification, respectively.
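
  The IOI-rescaling step can be illustrated with a small sketch: each IOI is divided by the beat length implied by its measure's playing time, then snapped to the nearest common note value. The note-value grid is an assumption for illustration.

      import numpy as np

      NOTE_VALUES = np.array([0.25, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0])  # in beats

      def quantize_iois(iois_sec, measure_sec, beats_per_measure=4):
          """Convert IOIs in seconds to note values in beats for one measure."""
          beat_sec = measure_sec / beats_per_measure  # local tempo of this measure
          beats = np.asarray(iois_sec) / beat_sec
          idx = np.argmin(np.abs(beats[:, None] - NOTE_VALUES[None, :]), axis=1)
          return NOTE_VALUES[idx]

      # A measure played in 2.1 s (faster than nominal) still quantizes its
      # quarter-quarter-half pattern correctly.
      print(quantize_iois([0.53, 0.51, 1.06], measure_sec=2.1))  # [1. 1. 2.]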

Extraction of Chord and Tempo from Polyphonic Music Using Sinusoidal Modeling

  • Kim, Do-Hyoung; Chung, Jae-Ho
    • The Journal of the Acoustical Society of Korea, v.22 no.4E, pp.141-149, 2003
  • As music in digital form has become widespread, interest has grown in automatically extracting intrinsic information from music itself, such as key, chord progression, melody, and tempo. Although several studies have been attempted, consistent and reliable extraction of such information has not been achieved. In this paper, we propose a method to extract chord and tempo information from general polyphonic music signals. A chord can be expressed as a combination of musical notes, and each note in turn consists of several frequency components, so extracting chord information requires analyzing the frequency components of the music signal. In this study, we use sinusoidal modeling, which represents the signal with sinusoids at the frequencies of musical tones, and show that it yields reliable chord extraction results. We also found that tempo, one of the notable features of a music signal, supports chord extraction when the two are used together. The proposed feature-extraction scheme can be used in many applications, such as digital music services that accept musical-feature queries, music database management, and music players with a chord-display function.
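
  To illustrate how detected sinusoids can vote for a chord, the sketch below folds sinusoid frequencies onto the 12 pitch classes and matches the resulting chroma against a few triad templates; the template set and scoring are my assumptions, not the paper's model.

      import numpy as np

      TEMPLATES = {"C": [0, 4, 7], "F": [5, 9, 0], "G": [7, 11, 2], "Am": [9, 0, 4]}

      def pitch_class(freq_hz):
          """MIDI-style pitch class of a frequency (A4 = 440 Hz -> class 9)."""
          midi = 69 + 12 * np.log2(freq_hz / 440.0)
          return int(round(midi)) % 12

      def identify_chord(freqs, amps):
          """Accumulate amplitude per pitch class, return best-matching template."""
          chroma = np.zeros(12)
          for f, a in zip(freqs, amps):
              chroma[pitch_class(f)] += a
          return max(TEMPLATES, key=lambda c: chroma[TEMPLATES[c]].sum())

      # Sinusoids near C4, E4, G4 should be labeled a C major chord.
      print(identify_chord([261.6, 329.6, 392.0], [1.0, 0.8, 0.9]))  # "C"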

Reducing latency of neural automatic piano transcription models (인공신경망 기반 저지연 피아노 채보 모델)

  • Dasol Lee; Dasaem Jeong
    • The Journal of the Acoustical Society of Korea, v.42 no.2, pp.102-111, 2023
  • Automatic Music Transcription (AMT) is a task that detects and recognizes musical note events from a given audio recording. In this paper, we focus on reducing the latency of real-time AMT systems for piano music. Although neural AMT models have been adapted for real-time piano transcription, they suffer from high latency, which hinders their usefulness in interactive scenarios. To tackle this issue, we explore several techniques for reducing the intrinsic latency of a neural network for piano transcription: reducing the window and hop sizes of the Fast Fourier Transform (FFT), modifying the convolutional layers' kernel sizes, and shifting labels along the time axis to train the model to predict onsets earlier. Our experiments demonstrate that combining these approaches can lower latency while maintaining high transcription accuracy. Specifically, our modified models achieved note F1 scores of 92.67 % and 90.51 % at latencies of 96 ms and 64 ms, respectively, compared to the baseline model's note F1 score of 93.43 % at a latency of 160 ms. This methodology has potential for training AMT models for various interactive scenarios, including real-time feedback for piano education.
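
  Of the techniques listed, label shifting is easy to illustrate: training targets are moved earlier along the time axis so the network learns to fire before the full acoustic evidence arrives. The sketch below assumes a binary onset piano roll; the shift amount is illustrative, not the paper's setting.

      import numpy as np

      def shift_labels(onset_roll, shift_frames):
          """Move each onset target shift_frames earlier along the time axis."""
          shifted = np.zeros_like(onset_roll)
          shifted[:-shift_frames] = onset_roll[shift_frames:]
          return shifted

      roll = np.zeros((8, 1)); roll[5] = 1   # onset at frame 5
      print(shift_labels(roll, 2).ravel())   # onset now labeled at frame 3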

An Improved Automatic Music Transcription Method Using TV-Filter and Optimal Note Combination (TV-필터와 최적 음표조합을 이용한 개선된 가변템포 음악채보방법)

  • Ju, Young-Ho; Lee, Joonwhoan
    • Journal of the Korean Institute of Intelligent Systems, v.23 no.4, pp.371-377, 2013
  • This paper proposes three methods for improving the accuracy of automatic music transcription of monophonic sound with time-varying tempo. The first uses a Total Variation (TV) filter to smooth the pitch data, reducing fragmentation in the pitch segmentation result. Second, a measure-finding method that combines three approaches, based on pitch, on energy, and on rules, respectively, produces a more stable result. Third, the tentative note-length encoding is corrected optimally so that it minimizes the sum of quantization errors within a measure while the note lengths sum to the number of beats. In experiments with 16 children's songs, measure finding was complete, and the encoding accuracies for note length and pitch were about 91.3 % and 86.7 %, respectively.
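
  As a crude illustration of the TV-filtering step, the sketch below smooths a pitch contour by subgradient descent on 0.5 * ||u - f||^2 + lam * sum_i |u[i] - u[i-1]|; the weight, step size, and iteration count are assumptions, and the paper's actual filter may be implemented differently.

      import numpy as np

      def tv_smooth(f, lam=1.0, step=0.05, iters=2000):
          """Piecewise-constant smoothing of a 1-D contour (e.g., semitones)."""
          u = f.astype(float).copy()
          for _ in range(iters):
              fwd = np.sign(np.diff(u, append=u[-1:]))   # sign(u[i+1] - u[i])
              bwd = np.sign(np.diff(u, prepend=u[:1]))   # sign(u[i] - u[i-1])
              u -= step * ((u - f) + lam * (bwd - fwd))
          return u

      # A noisy two-note contour flattens toward plateaus near 60 and 64,
      # which reduces spurious fragment boundaries in pitch segmentation.
      rng = np.random.default_rng(0)
      f = np.r_[np.full(50, 60.0), np.full(50, 64.0)]
      u = tv_smooth(f + 0.3 * rng.standard_normal(100))
      print(np.round(u[:3], 1), np.round(u[-3:], 1))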

Finding Measure Position Using Combination Rules of Musical Notes in Monophonic Song (단일 음원 노래에서 음표의 조합 규칙을 이용한 마디 위치 찾기)

  • Park, En-Jong; Shin, Song-Yi; Lee, Joon-Whoan
    • The Journal of the Korea Contents Association, v.9 no.10, pp.1-12, 2009
  • When notes are combined within one measure, their time intervals exhibit regular integer-multiple relations. This paper presents a method to find the exact measure positions in a monophonic song based on these relations. In the proposed method, the individual intervals are segmented first, and rules stating the multiple relations are used to find the measure positions. The measures can then serve as foundational information for extracting the beat and tempo of a song, which in turn can be used as background knowledge for an automatic music transcription system. The proposed method exactly detected the measure positions in 11 of 12 monophonic songs sung by men and women. One can also extract the beat and tempo of a song from the extracted measure positions together with music theory.
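
  One way to read the multiple-relation idea is that a correct measure length makes the note intervals fit integer subdivisions of it. The sketch below scores candidate measure lengths that way; the subdivision grid and candidate set are my assumptions, so treat it as a toy illustration.

      import numpy as np

      def fit_error(iois, measure_len, subdivisions=8):
          """Mean distance of each IOI from the nearest grid point."""
          grid = measure_len / subdivisions
          iois = np.asarray(iois)
          return float(np.mean(np.abs(iois - np.round(iois / grid) * grid)))

      def best_measure_len(iois, candidates):
          return min(candidates, key=lambda m: fit_error(iois, m))

      # Quarter/eighth IOIs at ~0.5 s and ~0.25 s fit a 2 s measure best.
      iois = [0.5, 0.25, 0.26, 0.49, 0.51]
      print(best_measure_len(iois, candidates=[1.5, 2.0, 2.5, 3.0]))  # 2.0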

Automatic Music Transcription System Using SIDE (SIDE를 이용한 자동 음악 채보 시스템)

  • Hyoung, A-Young; Lee, Joon-Whoan
    • The KIPS Transactions: Part B, v.16B no.2, pp.141-150, 2009
  • This paper proposes a system that automatically transcribes a singing voice into musical notation. First, the system uses the Stabilized Inverse Diffusion Equation (SIDE) to divide the song into a series of syllabic segments based on pitch detection. Given this segmentation, the method recognizes the duration of each fragment through clustering based on a genetic algorithm. The study also introduces a concept called 'relative interval' to recognize intervals relative to the singer's own pitch, and adopts a measure-extraction algorithm that uses pause data to achieve higher transcription precision. In experiments on 16 nursery songs, the measure recognition rate was 91.5 % and the DMOS score reached 3.82, demonstrating the effectiveness of the system.
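
  The 'relative interval' notion can be illustrated by expressing each note as a semitone offset from the singer's own reference pitch, which makes the result key-independent. In the sketch below the reference is the median F0, which is my assumption rather than the paper's definition.

      import numpy as np

      def relative_intervals(f0_hz):
          """Semitone offsets of each note's F0 from the singer's median F0."""
          semitones = 12 * np.log2(f0_hz / np.median(f0_hz))
          return np.round(semitones).astype(int)

      # The same melody sung a fourth higher yields identical offsets.
      print(relative_intervals(np.array([220.0, 246.9, 277.2])))  # [-2  0  2]
      print(relative_intervals(np.array([293.7, 329.6, 370.0])))  # [-2  0  2]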