• Title/Summary/Keyword: 음색 (timbre)

146 search results

Salience of Envelope Interaural Time Difference of High Frequency as Spatial Feature (공간감 인자로서의 고주파 대역 포락선 양이 시간차의 유효성)

  • Seo, Jeong-Hun;Chon, Sang-Bae;Sung, Koeng-Mo
    • The Journal of the Acoustical Society of Korea, v.29 no.6, pp.381-387, 2010
  • Both timbral and spatial features are important in the assessment of multichannel audio coding systems. Choi et al. proposed a prediction model extending ITU-R Rec. BS.1387-1 to multichannel audio coding systems by using spatial features such as ITDDist (Interaural Time Difference Distortion), ILDDist (Interaural Level Difference Distortion), and IACCDist (Interaural Cross-correlation Coefficient Distortion). In that model, following classical duplex theory, ITDDists were computed only for low frequency bands (below 1500 Hz) and ILDDists only for high frequency bands (above 2500 Hz). In the high frequency range, however, the temporal envelope also carries information important for spatial perception, especially sound localization. This paper introduces a new model that computes the ITD distortions of the temporal envelopes of high frequency components, in order to investigate quantitatively the role of such ITDs in spatial perception. The computed envelope ITD distortions were highly correlated with the perceived sound quality of multichannel audio.
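The envelope ITD that this model targets can be illustrated with a minimal sketch: a high-frequency carrier whose amplitude envelope is delayed between the two ears, with the envelope recovered via the Hilbert transform and the delay estimated by cross-correlation. The 4 kHz carrier, 50 Hz envelope, and 0.5 ms delay below are illustrative assumptions, not the paper's actual distortion measure.

```python
import numpy as np
from scipy.signal import hilbert

fs = 48_000                                    # sample rate (Hz)
t = np.arange(0, 0.1, 1 / fs)
carrier = np.sin(2 * np.pi * 4000 * t)         # 4 kHz: above the ~1500 Hz fine-structure ITD limit
env = 0.5 * (1 + np.sin(2 * np.pi * 50 * t))   # 50 Hz temporal envelope

delay = 24                                     # 24 samples = 0.5 ms envelope ITD
left = env * carrier
right = np.roll(env, delay) * carrier          # same carrier, delayed envelope

# Envelope extraction via the analytic signal, then cross-correlation
env_l = np.abs(hilbert(left))
env_r = np.abs(hilbert(right))
env_l -= env_l.mean()
env_r -= env_r.mean()
xc = np.correlate(env_r, env_l, mode="full")
lag = int(np.argmax(xc)) - (len(env_l) - 1)    # samples by which the right envelope lags
itd_ms = 1000 * lag / fs                       # recovered envelope ITD in milliseconds
```

A distortion measure in the spirit of the paper would then compare such envelope ITDs between a reference and a coded signal.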

Sound Engine for Korean Traditional Instruments Using General Purpose Digital Signal Processor (범용 디지털 신호처리기를 이용한 국악기 사운드 엔진 개발)

  • Kang, Myeong-Su;Cho, Sang-Jin;Kwon, Sun-Deok;Chong, Ui-Pil
    • The Journal of the Acoustical Society of Korea, v.28 no.3, pp.229-238, 2009
  • This paper describes a sound engine, built on a TMS320F2812, for the Korean traditional instruments Gayageum and Taepyeongso. Gayageum and Taepyeongso models based on commuted waveguide synthesis (CWS) are used to synthesize each sound. The proposed sound engine has an instrument selection button, and the corresponding model produces the selected instrument's sound. Every synthesized sound sample is transmitted over SPI to a DAC (TLV5638) and played through a speaker via an audio interface. The length of the delay line determines the fundamental frequency of the desired sound. To size the delay line, the time required to synthesize one sound sample was measured using a GPIO pin: 28.6 μs for the Gayageum and 21 μs for the Taepyeongso. Each sound sample is synthesized and transferred to the DAC inside an interrupt service routine (ISR). A timer of the TMS320F2812 offers four interrupt-generating events; here the period-match event generates the interrupt, and the ISR is invoked every 60 μs. Compared with the original sounds and their spectra, the synthesized results represent the instruments' timbres well, except for the 'Mu, Hwang, Tae, Joong' notes of the Taepyeongso. When playing the Taepyeongso only one sound is produced at a time, so real-time playing takes 21 μs per sample; Gayageum players usually use two fingers (thumb and middle finger, or thumb and index finger), so real-time playing takes 57.2 μs.
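The relation between delay-line length and pitch described above can be sketched with a simple plucked-string loop. This is an illustrative Karplus-Strong-style synthesis, not the paper's commuted waveguide models; the sample rate, target pitch, and two-point loss filter are assumptions.

```python
import numpy as np

fs = 44_100                       # sample rate (Hz)
f0 = 220.0                        # desired fundamental (Hz)
N = int(round(fs / f0))           # delay-line length in samples sets the pitch

rng = np.random.default_rng(0)
line = rng.uniform(-1.0, 1.0, N)  # noise burst "plucks" the string
out = np.empty(fs)                # synthesize one second of sound
for n in range(fs):
    out[n] = line[n % N]
    # two-point average acts as the loop's loss (damping) filter
    line[n % N] = 0.5 * (line[n % N] + line[(n + 1) % N])
```

The fundamental lands near fs / N (here about 220 Hz). In a fixed-point ISR implementation like the paper's, the body of this loop is what would have to complete well within the 60 μs interrupt period.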

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

  • Sohee Han;Jisub Um;Hoirin Kim
    • Phonetics and Speech Sciences, v.16 no.1, pp.67-76, 2024
  • Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, reaching a level where it can closely imitate natural human speech. In particular, TTS models that offer diverse voice characteristics and personalized speech are widely used in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, this paper proposes a one-shot multi-speaker TTS system that ensures acoustic diversity and synthesizes personalized voices by generating speech from unseen target speakers' utterances. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. The proposed approach covers not only English but also Korean one-shot multi-speaker TTS. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. In the objective evaluation, the proposed English and Korean systems showed a predicted MOS (P-MOS) of 2.54 and 3.74, respectively. These results indicate that the proposed model improves over the baseline models in both naturalness and speaker similarity.
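The speaker-similarity comparison that a speaker encoder enables can be illustrated with cosine similarity between fixed-dimensional embeddings. The 256-dimensional random vectors below are placeholders standing in for encoder outputs, not actual RawNet3 embeddings.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
ref = rng.standard_normal(256)                       # reference speaker embedding (placeholder)
same_speaker = ref + 0.1 * rng.standard_normal(256)  # small perturbation: "same" speaker
other_speaker = rng.standard_normal(256)             # independent draw: different speaker
```

In a one-shot system, the embedding extracted from a single target utterance conditions the acoustic model, and similarity scores of this kind underlie objective speaker-similarity evaluation.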

Optical Properties of Sea Water in the Western Channel of the Korea Strait (대한해협에서의 해수의 광학적 성질)

  • YANG Yong-Rhim
    • Korean Journal of Fisheries and Aquatic Sciences, v.15 no.2, pp.171-177, 1982
  • Optical properties of sea water were studied in the western channel of the Korea Strait, based on data obtained from fifteen oceanographic stations in July 1980. Submarine daylight intensity was measured at 5 m intervals in the upper 70 m layer using an underwater irradiance meter (Kahlsico #268WA360). The mean absorption coefficients of the sea water were 0.098 (0.063-0.183), 0.129 (0.090-0.270), 0.081 (0.044-0.142), and 0.087 (0.036-0.142) for clear, red, green, and blue light, respectively. The transparency ranged from 11.5 to 24 m (mean 18.3 m). The mean water color in this area was 3.5 (3-4) on the Forel scale. The relation between the absorption coefficient (κ) and transparency (D) was κ = 1.72/D, κ = 2.33/D, κ = 1.41/D, and κ = 1.44/D for clear, red, green, and blue light, respectively. The rates of light penetration at four depths were computed relative to surface light intensity. The mean rates by depth were: clear: 57.90% (5 m), 23.40% (15 m), 6.23% (30 m), 1.00% (50 m); red: 48.95% (5 m), 14.81% (15 m), 2.76% (30 m), 0.28% (50 m); green: 63.20% (5 m), 30.47% (15 m), 10.03% (30 m), 2.24% (50 m); blue: 62.70% (5 m), 30.00% (15 m), 9.75% (30 m), 1.70% (50 m).
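The quantities above are tied together by the standard exponential attenuation (Beer-Lambert) model, I(z) = I0 * exp(-κz). A small sketch using the abstract's mean values for clear light; the computed percentages differ slightly from the measured ones, since real irradiance profiles are not perfectly exponential.

```python
import math

def penetration_rate(kappa: float, depth_m: float) -> float:
    """Fraction of surface irradiance reaching a given depth (Beer-Lambert)."""
    return math.exp(-kappa * depth_m)

kappa_clear = 0.098                        # mean absorption coefficient, clear light
mean_transparency = 18.3                   # mean transparency D (m)
kappa_from_D = 1.72 / mean_transparency    # the paper's empirical relation for clear light

rate_5m = penetration_rate(kappa_clear, 5)  # ~0.61, vs. the measured 57.90 %
```

The empirical κ = 1.72/D gives about 0.094 for the mean transparency, close to the directly measured mean of 0.098.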


A Study of Sound Expression in Webtoon (웹툰의 사운드 표현에 관한 연구)

  • Mok, Hae Jung
    • Cartoon and Animation Studies, s.36, pp.469-491, 2014
  • Webtoon has developed methods for expressing sound visually, and with advances in web technology we can now also hear sound in webtoons. It is natural to analyze the sound we can hear, but the sound we cannot hear can be analyzed as well. This study is based on 'dual coding' in cognitive psychology: cartoonists create visual expressions from auditory impressions and memories, and readers recall the sound through memory and memory retrieval. The study therefore analyzes both audible and inaudible sound, borrowing its analytical method from film sound theory. Three main factors, volume, pitch, and tone, are characterized acoustically by frequency; in comics they are expressed through the thickness and placement of lines and through the image of the sound source. The visual expression of in-screen and off-screen sound is related to the comics frame: generally the outside of the frame signifies off-screen sound, though some off-screen sound appears inside the frame. Horror comics also use abundant sound for genre effect, as horror films do. Analyzing comics sound with these methods of film sound analysis shows that webtoon has developed creative expressive methods compared with the simple ones of early comics. In particular, frame arrangement and the expression of sound following vertical scrolling are new to webtoon, and the types and arrangement of frames have diversified. BGM was the first audible sound to be used; recently BGM mixed with sound effects has appeared, along with programs that let readers hear sound in sync with scrolling, a technology the horror genre in particular uses to heighten its effects. Various methods of visualizing sound are being created, and this change shows that webtoon can be a model of convergence content.

The actual aspects of North Korea's 1950s Changgeuk through the Chunhyangjeon in the film Moranbong (1958) and the album Corée Moranbong (1960) (영화 <모란봉>(1958)과 음반 (1960) 수록 <춘향전>을 통해 본 1950년대 북한 창극의 실제적 양상)

  • Song, Mi-Kyoung
    • (The) Research of the performance art and culture, no.43, pp.5-46, 2021
  • The film Moranbong was the product of a 1958 trip to North Korea by Armand Gatti, Chris Marker, Claude Lanzmann, Francis Lemarque, and Jean-Claude Bonnardot at the invitation of Joseon Film. For political reasons the film was not released at the time; only in 2010 was it rediscovered and given attention. The movie folds the narrative of Young-ran and Dong-il, set during the Korean War, into the narrative of Chunhyang and Mongryong from the Joseon classic Chunhyangjeon. The classic is reproduced as the Changgeuk Chunhyangjeon, which shares its time frame with the two protagonists, and the two narratives are covered in a total of six scenes. The movie carries two layers of framed stories: within the 1950s North Korean setting, there is the narrative of the producers and actors of the Changgeuk Chunhyangjeon, and the Changgeuk Chunhyangjeon itself as a complete work. In the outermost frame Dong-il is the main character, but in the inner double frame the center is Young-ran, an actor growing up with the Changgeuk Chunhyangjeon and a character within it. Three OST albums followed: Corée Moranbong, released in France in 1960; Musique de Corée, released in 1970; and 朝鮮の伝統音樂-唱劇「春香伝」と伝統樂器-, released in Japan in 1968. While Corée Moranbong consists only of music from the film Moranbong, the two later albums added songs collected and recorded by Pyongyang National Broadcasting. The album released in Japan, however, contains no information about the film Moranbong; its liner-note author most likely did not know of the film's existence, or may have intentionally excluded related content because the film had been banned from release.
Analysis of the detailed scenes of the Changgeuk Chunhyangjeon in the film Moranbong and the OST albums (the Farewell Song, Sipjang-ga, Chundangsigwa, Bakseokti, and the Prison Song) yields the following findings. First, the process of establishing the North Korean Changgeuk Chunhyangjeon in the 1950s was confirmed. The play, compiled in 1955 in the Joseon Changgeuk Collection, settled between 1956 and 1958 into a form of Changgeuk that could be performed in the late 1950s. Since the 1960s, Chunhyangjeon has no longer been performed as a traditional pansori-style Changgeuk, so the film Moranbong and the album Corée Moranbong are almost the last records to capture the Changgeuk Chunhyangjeon and its music. Second, the responses of Changgeuk actors to the controversy over Takseong, the husky "turbid" pansori vocal timbre, in the 1950s North Korean Changgeuk community were confirmed. Until 1959 there were both voices criticizing Takseong and voices defending it as a national characteristic. Shin Woo-sun all but eliminated Takseong with clear, high-pitched vocals; Gong Gi-nam chose or removed it depending on the situation; Cho Sang-sun and Lim So-hyang accepted some of the modern vocalization required by the Party while trying to maintain their own tones. Although Cho Sang-sun and Lim So-hyang were guaranteed roles that let their voices continue, the patterns of selection and exclusion in Moranbong were linked to the Takseong-removal guidelines demanded, in the name of the Party and the People, of the Gugak musicians who had crossed to North Korea in the 1950s.