Search | Korea Science

A study on improving the performance of the machine-learning based automatic music transcription model by utilizing pitch number information (음고 개수 정보 활용을 통한 기계학습 기반 자동악보전사 모델의 성능 개선 연구)

Daeho Lee;Seokjin Lee
- The Journal of the Acoustical Society of Korea / v.43 no.2 / pp.207-213 / 2024
In this paper, we study how to improve the performance of a machine learning-based automatic music transcription model by adding musical information to the input data. The added musical information is the number of pitches that occur in each time frame, obtained by counting the number of notes activated in the ground-truth labels. This pitch number information is concatenated to the log mel-spectrogram, which is the input of the existing model. We use an automatic music transcription model comprising four types of blocks, each predicting one of four types of musical information, and demonstrate that the simple method of adding, to the existing input, the pitch number information corresponding to the musical information predicted by each block helps in training the model. To evaluate the performance improvement, we conducted an experiment using the MIDI Aligned Piano Sounds (MAPS) data; when all pitch number information was used, the frame-based F1 score improved by 9.7 % and the note-based F1 score including offset improved by 21.8 %.
https://doi.org/10.7776/ASK.2024.43.2.207
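The input construction described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' code: the function name `add_pitch_count`, the list-of-frames layout, and the choice to append the count as one extra unscaled feature per frame are all assumptions.

```python
def add_pitch_count(log_mel, piano_roll):
    """Append a per-frame pitch count to each log mel-spectrogram frame.

    log_mel:    list of frames, each a list of mel-band values
    piano_roll: list of frames, each a list of 0/1 ground-truth note activations
    Both layouts are assumed for illustration; the abstract does not specify them.
    """
    if len(log_mel) != len(piano_roll):
        raise ValueError("log_mel and piano_roll must have the same number of frames")
    features = []
    for mel_frame, active in zip(log_mel, piano_roll):
        # Number of pitches sounding in this frame.
        count = float(sum(1 for a in active if a > 0))
        # Concatenate the count as one extra feature after the mel bands.
        features.append(list(mel_frame) + [count])
    return features


log_mel = [[0.0] * 4 for _ in range(3)]          # 3 frames, 4 mel bands
piano_roll = [[0, 0, 0], [1, 0, 1], [1, 1, 1]]   # 3 frames, 3 pitches
features = add_pitch_count(log_mel, piano_roll)
print(len(features), len(features[0]))           # 3 5
print([row[-1] for row in features])             # [0.0, 2.0, 3.0]
```

Because the count comes from the ground-truth labels, a sketch like this applies to training data as described in the abstract; how the count would be supplied at inference time is not stated there.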

A Study on Improving Pitch Search by Varying the number of Subframes for Vocoder (보코더에서 서브프레임 수의 변화를 이용한 피치검색 성능 개선에 관한 연구)

Baek, Geum-Ran;Bae, Myung-Jin
- Journal of the Institute of Electronics and Information Engineers / v.49 no.10 / pp.83-88 / 2012
Pitch searching is a very important process in a vocoder. Generally, pitch searching is performed by emphasizing the periodicity of the signal: the correlation with the signal is computed while changing the interval between two pulses. When the correlation value is highest, the pitch can be found from the pulse interval, because that is the repetition interval with the most pronounced periodicity. Many methods address this problem by dividing a frame into many subframes and searching for the pitch in each, but they require too much computation. This paper proposes varying the number of subframes by predicting the amplitude change rate within a frame. With this method, overall pitch searching performance improves, because accuracy may be enhanced without affecting the sound quality of the synthesized signal after parameter transmission, and the pitch searching time may be reduced.
https://doi.org/10.5573/ieek.2012.49.10.083
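The correlation-based search that this abstract starts from can be illustrated with a generic normalized autocorrelation pitch search. This is a textbook-style sketch, not the paper's adaptive-subframe algorithm; the function name `autocorr_pitch` and the 60-400 Hz search range are assumptions.

```python
import math


def autocorr_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Return (lag, f0) for the lag with the highest normalized autocorrelation.

    The search range f_min..f_max is an assumed default, not taken from the paper.
    """
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        a = frame[:len(frame) - lag]   # signal
        b = frame[lag:]                # signal delayed by `lag` samples
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, sample_rate / best_lag


# A 100 Hz sine sampled at 8 kHz has a period of exactly 80 samples.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(400)]
lag, f0 = autocorr_pitch(frame, sample_rate)
print(lag, f0)   # 80 100.0
```

Running the same search on each subframe separately multiplies the cost by the number of subframes, which is the computational burden the abstract aims to reduce.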

A Study on Number sounds Speaker recognition using the Pitch detection and the Fuzzified pattern (피치 검출과 퍼지화 패턴을 이용한 숫자음 화자 인식에 관한 연구)

김연숙;김희주;김경재
- Journal of the Korea Society of Computer and Information / v.8 no.3 / pp.73-79 / 2003
This paper proposes a speaker recognition algorithm that combines pitch detection and fuzzified pattern matching. The study uses a pitch pattern derived from the pitch, and a binary spectrum as the speech parameter. A reference pattern is built with a fuzzy membership function to accommodate the time variation width for non-utterance time, and vocal tract recognition of common characters is performed using fuzzified pattern matching.

Differences in High Pitch Accents between News Speech and Natural Speech (영어 뉴스와 자연발화에 나타나는 고성조 피치액센트의 차이점)

Choi, Yun-Hui;Lee, Joo-Kyeong
- Speech Sciences / v.12 no.2 / pp.17-28 / 2005
This paper argues that news speech has an intonational pattern distinct from that of natural speech, reflecting its primary focus on providing new information. We conducted a phonetic experiment to compare the tonal contours of news speech and natural speech, examining the distributions of pitch accents and the overall pitch ranges. We used 70 American Press (AP) radio news utterances and 70 natural utterances extracted from TV dramas. Results show that news speech involves 3.38 H*'s (including L+H* and !H*) within an intonational phrase (IP) or intermediate phrase (ip), whereas natural speech involves 1.8 on average. The most frequent number of IP/ip's per sentence is 3 in news speech, accounting for 32.07% of the news speech, but only 1 in natural speech, accounting for 41.42%. Next, declination tends to be prevented in news speech, and the pitch range is much greater in news speech than in natural speech. Finally, a secondary-stress syllable is given a pitch accent comparatively frequently in news speech, which clearly distinguishes it from natural speech. These results can be interpreted as showing that news has the particular purpose of providing new information; every content word tends to be given an H* or a related pitch accent such as L+H* or !H*, because news speech assumes that every word conveys new information. This brings about more IP/ip's per sentence because of a human physiological constraint; that is, more H*'s cause more respiratory breaks. Also, the greater pitch ranges and the pitch accents imposed on secondary stress may be attributed to the exaggeration of new information.

A Study on Speaker Recognition using the Peak and valley pitch detection and the Fuzzy (국부 봉우리와 골에 의한 피치 검출과 퍼지를 이용한 화자 인식에 관한 연구)

김연숙;김희주;김경재
- Journal of the Korea Institute of Information and Communication Engineering / v.8 no.1 / pp.213-219 / 2004
This paper proposes a speaker recognition algorithm that uses a pitch parameter obtained from local peaks and valleys. The time-frequency hybrid method for pitch extraction is valuable in that it can improve resolution in the time domain and accuracy in the frequency domain at the same time. The algorithm builds a reference pattern using a membership function, to accommodate the time variation width of non-linear utterances, and performs vocal tract recognition of common characters using fuzzy pattern matching. Speaker recognition experiments with the proposed method are carried out using vowels and number sounds.

Changes in Features of Korean Vowels with Age and Sex of Speakers and Their Recognition (한국어 단모음의 성별, 연령별 특징변화 및 인식)

이용주;김경태;차균현
- Journal of the Korean Institute of Telematics and Electronics / v.25 no.12 / pp.1503-1512 / 1988
As a basic analysis for addressing within- and cross-speaker variability in phoneme-based speech recognition, changes in the pitch and formant frequencies of 8 Korean vowels with the age and sex of the speaker have been investigated by analyzing a large number of samples. The conclusions are as follows: 1) For children, changes in pitch frequency with age and sex are hard to distinguish, and the difference between before and after the voice change is approximately 0.2 oct. for females and 0.9 oct. for males. 2) While most vowel formants change considerably with the age of the speaker, the change becomes smaller as age increases. 3) While there is an indirect correlation between pitch and formant with change in age, it is hard to see a direct correlation. 4) When the recognition experiment using pitch and formants covers various speakers of each age and sex, pitch also works as an efficient recognition parameter.

Pitch Estimation Method in an Integrated Time and Frequency Domain by Applying Linear Interpolation (선형 보간법을 이용한 시간과 주파수 조합영역에서의 피치 추정 방법)

Kim, Ki-Chul;Park, Sung-Joo;Lee, Seok-Pil;Kim, Moo-Young
- Journal of the Institute of Electronics Engineers of Korea SP / v.47 no.5 / pp.100-108 / 2010
An autocorrelation method is used for pitch estimation. Autocorrelation values in the time and frequency domains, which have different characteristics, correspond to the pitch period and the fundamental frequency, respectively. We use an integrated autocorrelation method in the time and frequency domains, which can remove pitch doubling and halving errors. The pitch period and the fundamental frequency are reciprocals of each other. In particular, fundamental frequency estimation is prone to error because of the limited resolution of the FFT. To reduce these artifacts, interpolation methods are applied in the integrated autocorrelation domain, which decreases pitch errors. Moreover, the frequency-domain autocorrelation values are calculated only for the pitch candidates found in the time domain, which reduces computational complexity. Using linear interpolation, we can decrease the required number of FFT coefficients by a factor of 8. Thus, compared to the conventional methods, computational complexity can be reduced by a factor of 9.5.
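The reciprocal relation and the FFT resolution issue can be made concrete with a small sketch. It converts a time-domain pitch candidate (a lag in samples) to a frequency, locates that frequency between FFT bins, and linearly interpolates a bin-indexed quantity there. The function names, the 8 kHz / 256-point setting, and the idea of interpolating a generic per-bin value are assumptions; the abstract does not spell out the exact quantity being interpolated.

```python
def lag_to_frequency(lag, sample_rate):
    """Pitch period (in samples) and fundamental frequency are reciprocals."""
    return sample_rate / lag


def interp_at_frequency(bin_values, freq_hz, sample_rate, n_fft):
    """Linearly interpolate a per-bin quantity at an arbitrary frequency.

    bin_values[k] is assumed to hold the value at frequency k * sample_rate / n_fft.
    """
    pos = freq_hz * n_fft / sample_rate        # fractional bin index
    lo = int(pos)
    if lo >= len(bin_values) - 1:
        return bin_values[-1]                  # clamp at the last available bin
    frac = pos - lo
    return (1.0 - frac) * bin_values[lo] + frac * bin_values[lo + 1]


sample_rate, n_fft = 8000, 256
bin_spacing = sample_rate / n_fft              # 31.25 Hz between FFT bins
f0 = lag_to_frequency(55, sample_rate)         # about 145.45 Hz, between bins 4 and 5

# Toy per-bin values equal to the bin index, so the result is the fractional index.
bin_values = [float(k) for k in range(n_fft // 2 + 1)]
value = interp_at_frequency(bin_values, f0, sample_rate, n_fft)
print(bin_spacing, round(f0, 2), round(value, 4))   # 31.25 145.45 4.6545
```

With a 31.25 Hz bin spacing, neighbouring lags such as 54, 55, and 56 samples (about 148.1, 145.5, and 142.9 Hz) all fall between the same two bins, which is why reading values only at bin centres cannot separate them.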

The method to minimize the number of sample modules in the electronic musical instrument using pitch shifting technique (Pitch Shifting 기법을 사용하는 전자악기에서 Sample Module의 개수를 최소화하는 방안)

박진원;최제헌;김규년
- Proceedings of the Korean Information Science Society Conference / 2001.04b / pp.439-441 / 2001
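Current electronic musical instruments store several sample modules per octave in memory and generate the other notes in the octave by pitch shifting those sample modules [1]. As a result, a single instrument uses many sample modules and requires a large amount of memory. This paper studies a method that saves memory by using fewer sample modules. Rather than restricting the pitch-shifting range to a single octave, the method finds the most suitable sample module notes while reducing the average error between the pitch-shifted sound and the original sound. The piano was chosen from among the instrument sounds of an electronic instrument, and among the 88 piano notes, those that produce the sound closest to the original when pitch-shifted are used as sample modules. Selecting sample module notes in this way guarantees the same sound quality with far fewer sample modules than existing electronic instruments use, and also saves memory.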
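The idea of covering all 88 piano keys from a small set of stored samples can be sketched as below. The sketch assumes equal temperament, so shifting by n semitones changes frequency by a factor of 2^(n/12), and it uses semitone distance as a stand-in for the paper's error measure, which the abstract does not define. The function names and the example sample keys are hypothetical.

```python
def shift_ratio(semitones):
    """Frequency ratio for a pitch shift of `semitones` in equal temperament."""
    return 2.0 ** (semitones / 12.0)


def assign_samples(sample_keys, n_keys=88):
    """Map every key 1..n_keys to (nearest sampled key, shift in semitones).

    Nearest-in-semitones is an assumed proxy; the paper selects samples by the
    error between the pitch-shifted sound and the original sound.
    """
    mapping = {}
    for key in range(1, n_keys + 1):
        # Ties are broken toward the lower sampled key.
        source = min(sample_keys, key=lambda s: (abs(key - s), s))
        mapping[key] = (source, key - source)
    return mapping


sample_keys = [10, 30, 50, 70]                 # hypothetical stored sample modules
mapping = assign_samples(sample_keys)
print(mapping[49])                             # (50, -1)
print(max(abs(shift) for _, shift in mapping.values()))   # 18
print(shift_ratio(12))                         # 2.0
```

Here four stored samples cover the keyboard with shifts of up to 18 semitones, beyond a single octave; whether such large shifts keep the sound quality acceptable is exactly what the paper's error criterion has to decide.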

A study on the gain-scheduling of missile autopilot (유도탄 제어기의 이득-스케듈링에 관한 연구)

송찬호;김윤식
- 제어로봇시스템학회:학술대회논문집 / 1991.10a / pp.355-360 / 1991
A method of autopilot gain-scheduling is presented for missiles that have heavy aerodynamic coupling between the pitch and yaw channels due to high maneuverability. The pitch and yaw autopilots are cross-coupled, and their feedback gains are scheduled by total acceleration and bank angle for a given Mach number and height. Bank angle information is obtained using a simple estimator. Computer simulation shows that the proposed method is superior to other existing methods.

A Study on Improving Pitch Search for Vocoder (보코더에서 피치검색 성능개선에 관한 연구)

Baek, Geum-Ran;Bae, Myung-Jin
- The Journal of the Acoustical Society of Korea / v.31 no.7 / pp.419-426 / 2012
Pitch searching is a vital process in a vocoder. Generally, pitch searching is performed after emphasizing the periodicity of the signal: the correlation with the signal is computed while changing the interval between two pulses. When the correlation value reaches its peak, the pitch can be found from the pulse interval, because that is the repetition interval with the most pronounced periodicity. However, if the identified period is a half, double, or triple period, it cannot be taken as the pitch period. Many methods have been suggested to solve this problem. An inaccurate pitch can also be obtained when the signal amplitude is not constant but varies abruptly within the frame. To address this, the pitch can be searched by dividing a frame into several subframes, but this requires a great deal of computation even though it yields the correct value. This paper suggests an algorithm to resolve these two problems. First, the signal energy level is corrected in advance, using an estimated overall energy change ratio in the frame, before the pitch search, in order to reduce half-, double-, and triple-period errors. Second, the number of subframes is varied by predicting the amplitude change rate in the frame from the energy ratio obtained in the first step. When these two methods are applied, the pitch searching time can be reduced and overall pitch searching performance can be improved without affecting the sound quality of the synthesized signal.
https://doi.org/10.7776/ASK.2012.31.7.419
