• Title/Summary/Keyword: 화자확인 (speaker verification)

The usefulness of the depth images in image-based speech synthesis (영상 기반 음성합성에서 심도 영상의 유용성)

  • Ki-Seung Lee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.42 no.1
    • /
    • pp.67-74
    • /
    • 2023
  • Images acquired from a speaker's mouth region reveal patterns unique to the corresponding voices. Based on this principle, several methods have been proposed in which speech signals are recognized or synthesized from images of the speaker's lower face. In this study, an image-based speech synthesis method was proposed in which depth images are used cooperatively. Because depth images provide depth information that cannot be obtained from optical images, they can supplement flat optical images. This paper evaluates the usefulness of depth images from the perspective of speech synthesis. A validation experiment was carried out on 60 Korean isolated words, and it confirmed that performance in terms of both subjective and objective evaluation was comparable to the optical-image-based method. When the two image types were used in combination, performance improvements were observed compared with either image used alone.
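
As an illustration of the kind of cooperative use of optical and depth images described above, here is a minimal sketch in PyTorch. The architecture, image size, and mel dimension are assumptions for illustration and are not taken from the paper: each image stream is encoded by a small CNN, the two feature vectors are concatenated, and acoustic features are regressed from the fused representation.

```python
# Minimal sketch (not the paper's model): late fusion of optical and depth
# mouth-region images for predicting acoustic features.
import torch
import torch.nn as nn

class ImageEncoder(nn.Module):
    """Small CNN mapping a 1x64x64 mouth-region image to a feature vector."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

class FusionSynthesizer(nn.Module):
    """Concatenates optical and depth features and regresses 80-dim mel frames."""
    def __init__(self, feat_dim=128, mel_dim=80):
        super().__init__()
        self.optical_enc = ImageEncoder(feat_dim)
        self.depth_enc = ImageEncoder(feat_dim)
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, mel_dim),
        )

    def forward(self, optical, depth):
        z = torch.cat([self.optical_enc(optical), self.depth_enc(depth)], dim=-1)
        return self.head(z)

if __name__ == "__main__":
    model = FusionSynthesizer()
    optical = torch.randn(4, 1, 64, 64)   # batch of optical mouth images
    depth = torch.randn(4, 1, 64, 64)     # aligned depth images
    print(model(optical, depth).shape)    # torch.Size([4, 80])
```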

Pronunciation of the Korean diphthong /jo/: Phonetic realizations and acoustic properties (한국어 /ㅛ/의 발음 양상 연구: 발음형 빈도와 음향적 특징을 중심으로)

  • Hyangwon Lee
    • Phonetics and Speech Sciences
    • /
    • v.15 no.1
    • /
    • pp.9-17
    • /
    • 2023
  • The purpose of this study is to determine how the Korean diphthong /jo/ shows phonetic variation across linguistic environments. The pronunciation of /jo/ is discussed, focusing on the relationship between phonetic variation and the distribution range of vowels. Position in the word (monosyllable, word-initial, word-medial, word-final) and word class (content word, function word) were analyzed using the speech of 10 female speakers from the Seoul Corpus. Counting the occurrences of /jo/ in each environment showed that pronunciation type and word class were affected by position in the word. In the acoustic analysis, frequent phonetic reduction was observed for the function word /jo/. Word class did not change the average acoustic values of /jo/, but it did change the distribution of individual tokens. These results indicate that the linguistic environment affects the phonetic distribution of vowels.
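
As a rough illustration only, the sketch below measures F1 and F2 at the midpoint of a vowel interval, the kind of measurement typically used to characterize vowel distributions. It assumes the praat-parselmouth package and a hypothetical wav file and token boundaries; none of these come from the paper.

```python
# Minimal sketch (assumed tooling, not the study's actual pipeline):
# midpoint formant measurement for a vowel token.
import parselmouth  # pip install praat-parselmouth

def midpoint_formants(wav_path, start_s, end_s):
    """Return (F1, F2) in Hz at the temporal midpoint of the given interval."""
    snd = parselmouth.Sound(wav_path)
    formant = snd.to_formant_burg()          # default Praat settings
    t = (start_s + end_s) / 2.0
    return (formant.get_value_at_time(1, t),
            formant.get_value_at_time(2, t))

# Hypothetical token of /jo/ located between 0.84 s and 0.97 s.
print(midpoint_formants("seoul_corpus_token.wav", 0.84, 0.97))
```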

A case study of Digital humanities lecture on Marcel Proust's À La Recherche du temps perdu (마르셀 프루스트의 『잃어버린 시간을 찾아서』에 대한 디지털인문학적 강의 운영 사례 연구)

  • Jinyoung MIN
    • The Journal of the Convergence on Culture Technology
    • /
    • v.9 no.4
    • /
    • pp.269-275
    • /
    • 2023
  • With the 150th anniversary of Proust's birth in 2021 and the 100th anniversary of his death in 2022, interest in À la recherche du temps perdu increased. We took a digital humanities approach to make this seven-volume novel, known for being difficult, easily accessible to Korean students majoring in French literature. The students analyzed the work with big-data analysis tools and found clues for understanding it through visualized data. We extracted the main characters and places appearing in the work with word clouds, and examined domestic and international awareness of Proust through big-data analysis sites such as Big Kinds and Textom. Through this digital humanities methodology, the students commented that they gradually broadened their understanding of Proust's 『In Search of Lost Time』 rather than giving it up as too difficult. This study confirms that applying big-data analysis and digital humanities is an appropriate teaching method for helping students broaden their understanding of French literature.
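
For readers unfamiliar with the word-cloud step mentioned above, a minimal sketch follows. It assumes the Python wordcloud package and a hypothetical plain-text excerpt; the course's actual tools and preprocessing are not specified in the abstract.

```python
# Minimal sketch (assumed workflow): build a word cloud of frequent tokens
# from a plain-text excerpt of the novel.
from collections import Counter
from wordcloud import WordCloud  # pip install wordcloud

with open("recherche_excerpt.txt", encoding="utf-8") as f:  # hypothetical file
    text = f.read()

# Naive tokenization; real coursework would add proper French tokenization
# and a stop-word list before counting character and place names.
tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
counts = Counter(w for w in tokens if len(w) > 3)

wc = WordCloud(width=800, height=400, background_color="white")
wc.generate_from_frequencies(counts)
wc.to_file("proust_wordcloud.png")
```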

Method of Automatically Generating Metadata through Audio Analysis of Video Content (영상 콘텐츠의 오디오 분석을 통한 메타데이터 자동 생성 방법)

  • Sung-Jung Young;Hyo-Gyeong Park;Yeon-Hwi You;Il-Young Moon
    • Journal of Advanced Navigation Technology
    • /
    • v.25 no.6
    • /
    • pp.557-561
    • /
    • 2021
  • Metadata has become an essential element for recommending video content to users. However, it is currently generated manually by video content providers. This paper studies a method for automatically generating metadata in place of the existing manual input process. In addition to the emotion-tag extraction method of the previous study, we investigated automatically generating genre and country-of-production metadata from movie audio. The genre was extracted from the audio spectrogram using a ResNet34 artificial neural network with transfer learning, and the language of the speakers in the movie was detected through speech recognition. This confirmed the feasibility of automatically generating metadata through artificial intelligence.
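
As a rough sketch of the transfer-learning setup described above, a pretrained ResNet34 can be repurposed for spectrogram-based genre classification by freezing its backbone and replacing the final layer. The label set, input rendering, and training details here are assumptions, not taken from the paper.

```python
# Minimal sketch: transfer learning with a pretrained ResNet34 on
# mel-spectrograms rendered as 3-channel images.
import torch
import torch.nn as nn
from torchvision import models

NUM_GENRES = 5  # hypothetical label set

model = models.resnet34(weights=models.ResNet34_Weights.DEFAULT)
for p in model.parameters():          # freeze the pretrained backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_GENRES)  # new classification head

spectrograms = torch.randn(8, 3, 224, 224)  # batch of spectrogram "images"
logits = model(spectrograms)
print(logits.shape)  # torch.Size([8, 5])
```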

Speech/Music Discrimination Using Spectrum Analysis and Neural Network (스펙트럼 분석과 신경망을 이용한 음성/음악 분류)

  • Keum, Ji-Soo;Lim, Sung-Kil;Lee, Hyon-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.5
    • /
    • pp.207-213
    • /
    • 2007
  • In this research, we propose an efficient speech/music discrimination method that uses spectrum analysis and a neural network. The proposed method extracts a duration feature parameter (MSDF) from the spectral peak track obtained by spectrum analysis and uses it, combined with the MFSC, as features for the speech/music discriminator. A neural network was used as the discriminator, and we performed various experiments to evaluate the proposed method according to training pattern selection, training set size, and network architecture. The results showed improved performance and stability, depending on training pattern selection and model composition, compared with the previous method. Using MSDF and MFSC as feature parameters with more than 50 seconds of training patterns, we obtained a discrimination rate of 94.97% for speech and 92.38% for music. This corresponds to improvements of 1.25% for speech and 1.69% for music compared with using MFSC alone.
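
The MSDF feature itself is specific to the paper, but the overall pipeline (frame-level spectral features pooled per clip, then a neural-network discriminator) can be sketched as follows. This sketch uses log mel filter-bank (MFSC-like) statistics and scikit-learn's MLPClassifier; the file names and labels are hypothetical.

```python
# Minimal sketch (not the paper's exact features): clip-level log mel
# filter-bank statistics fed to a small neural network for a
# speech-vs-music decision.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def mfsc_features(path, n_mels=40):
    """Clip-level feature: mean and std of log mel filter-bank energies."""
    y, sr = librosa.load(path, sr=16000)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel)
    return np.concatenate([logmel.mean(axis=1), logmel.std(axis=1)])

# Hypothetical labelled clips: 1 = speech, 0 = music.
paths, labels = ["speech1.wav", "music1.wav"], [1, 0]
X = np.stack([mfsc_features(p) for p in paths])
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, labels)
print(clf.predict(X))
```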

Flora of Hangyeong Gotjawall Forest Genetic Resource Reserve Area in Jeju-do (한경 곶자왈 산림유전자원보호구역의 식물상)

  • Park, Ji-Hyun;Hyun, Hwa-Ja;Lim, Eun-Young;Kim, Chang-Uk;Chung, Jun-Ho;Kang, Shin-Ho;Song, Gwanpil
    • Proceedings of the Plant Resources Society of Korea Conference
    • /
    • 2018.10a
    • /
    • pp.61-61
    • /
    • 2018
  • The Hangyeong Gotjawal Forest Genetic Resource Reserve is an evergreen broad-leaved forest on rocky terrain that includes 개가시나무 and 백서향, and its importance has been recognized by its designation and management as a forest genetic resource ecological conservation area. This survey of the area's flora was therefore conducted to identify the plants distributed in the Gotjawal. Specimens were collected and recorded through about 20 field surveys carried out from June 2017 to September 2017 in the area designated as a Gotjawal forest genetic resource reserve among the Gotjawal areas west of Hallasan. The flora of the survey area comprised 261 taxa in 83 families, 181 genera, 236 species, 23 varieties, and 2 forms: 32 taxa of pteridophytes (7 families, 18 genera, 30 species, 2 varieties), 2 taxa of gymnosperms (1 family, 1 genus), and 227 taxa of angiosperms (74 families, 162 genera, 204 species, 21 varieties, 2 forms). Among the angiosperms, dicotyledons accounted for 196 taxa (67 families, 139 genera, 179 species, 16 varieties, 1 form) and monocotyledons for 31 taxa (7 families, 23 genera, 25 species, 5 varieties, 1 form). One endangered wild plant designated by the Ministry of Environment, 개가시나무, was confirmed in the survey. A total of 97 taxa of floristic regional indicator plants were recorded: 2 taxa of grade V (검정개관중 and 개가시나무), 12 taxa of grade IV (including 백서향 and 녹나무), 37 taxa of grade III (including 아왜나무 and 호자나무), 13 taxa of grade II (including 올벚나무 and 개산초), and 33 taxa of grade I (including 푸조나무 and 자금우). According to the Korean Red List, 3 taxa were Vulnerable (VU), including 개가시나무 and 섬다래; 2 taxa, 백서향 and 약난초, were Near Threatened (NT); 9 taxa were of Least Concern (LC), including 골고사리 and 새우난초; and 1 taxon, 빌레나무, was Not Evaluated (NE). These results provide ecological data on the Cheongsu and Jeoji Gotjawal areas; compared with other Gotjawal areas on Jeju Island, the site is dominated by coppice-grown 종가시나무, and as succession toward evergreen forest proceeds, deciduous broad-leaved trees are expected to be replaced by plants that grow beneath the evergreens, so continued surveys are required.

Augmented Presentation Framework Design and System Implementation for Immersive Information Visualization and Delivery (몰입적 정보 표현과 전달을 위한 증강 프레젠테이션 디자인 및 시스템 구현)

  • Kim, Minju;Wohn, Kwangyun
    • Journal of the HCI Society of Korea
    • /
    • v.12 no.1
    • /
    • pp.5-13
    • /
    • 2017
  • Interactive intervention by a human presenter is one of the important factors that make a visualization more effective. Rather than just showing content, the presenter enhances information delivery by providing the context of the visualization. In this paper, we define this as augmented presentation. In the augmented presentation concept, the presenter can facilitate the presentation more actively by being fully immersed in the visualization space and by reaching into and interacting with digital information. To concretize the concept, we design a presentation space that allows the presenter to be seamlessly immersed in the visualization. We also expand the presenter's roles as storyteller, controller, and augmenter, allowing the presenter to fully support the communicative process between the audience and the visualization. We then present an augmented presentation system to verify the proposed concept. We render a 3D visualization through a half-mirror film and a wall projection screen placed in parallel, apply stereoscopic images, and spatially align the presenter inside the virtual visualization space. Finally, we conduct a controlled experiment to investigate the audience's subjective level of immersion and engagement with HoloStation compared with a traditional presentation system. Our initial investigation suggests that the proposed augmented presentation has the potential not only to enhance information presentation but also to support the delivery of visualization.

On the speaker's position estimation using TDOA algorithm in vehicle environments (자동차 환경에서 TDOA를 이용한 화자위치추정 방법)

  • Lee, Sang-Hun;Choi, Hong-Sub
    • Journal of Digital Contents Society
    • /
    • v.17 no.2
    • /
    • pp.71-79
    • /
    • 2016
  • This study compares the performance of sound source localization methods used for reliable automobile control by improving the voice recognition rate in a vehicle environment, and suggests how to improve them. Sound source localization methods generally employ the TDOA algorithm, and there are two approaches: one uses a cross-correlation function in the time domain, and the other uses GCC-PHAT computed in the frequency domain. GCC-PHAT is known to be more robust to echo and noise than the cross-correlation function. This study compared the performance of the two methods in a vehicle environment full of echo and vibration noise, and additionally proposed the use of a median filter. We found that the median filter helps both estimation methods perform well and reduces their variance. According to the experimental results, there is almost no difference between the two methods when using voice signals; however, with a song signal, GCC-PHAT achieves a recognition rate about 10% higher than the cross-correlation function. When the median filter was added, the cross-correlation function's recognition rate improved by up to 11%. Regarding variance, both methods showed stable performance.
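
A minimal GCC-PHAT sketch is shown below, using NumPy/SciPy only; the windowing, microphone geometry, and the exact median-filter settings used in the paper are not specified in the abstract and are assumed here. The PHAT weighting whitens the cross-spectrum so that the correlation peak depends mainly on phase, which is what makes it more robust to reverberation.

```python
# Minimal sketch: GCC-PHAT time-delay estimation between two microphone
# signals, with an optional median filter over successive delay estimates.
import numpy as np
from scipy.signal import medfilt

def gcc_phat(x, y, fs, max_tau=None, interp=1):
    """Estimate the relative delay between x and y (in seconds) via GCC-PHAT."""
    n = x.shape[0] + y.shape[0]
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-15                      # PHAT weighting
    cc = np.fft.irfft(R, n=interp * n)
    max_shift = interp * n // 2
    if max_tau is not None:
        max_shift = min(int(interp * fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(interp * fs)

def smooth_delays(delays, kernel=5):
    """Median-filter frame-by-frame delay estimates (the smoothing step above)."""
    return medfilt(np.asarray(delays, dtype=float), kernel_size=kernel)

if __name__ == "__main__":
    fs = 16000
    x = np.random.randn(fs)
    y = np.roll(x, 20)                           # second channel shifted by 20 samples
    # Estimated delay in samples; magnitude ~ 20 (sign depends on which channel leads).
    print(gcc_phat(x, y, fs) * fs)
```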

Evaluation of Treatment Response Using Diffusion-Weighted MRI in Metastatic Spines (척추 전이암에서 확산강조 자기공명 영상을 이용한 치료반응의 평가)

  • Lee, Jang-Jin;Shin, Sei-One
    • Journal of Yeungnam Medical Science
    • /
    • v.18 no.1
    • /
    • pp.30-38
    • /
    • 2001
  • Background: The purpose of this study was to evaluate the usefulness of diffusion-weighted magnetic resonance imaging (MRI) for monitoring the response to radiation therapy in metastatic bone marrow of the spine. Materials and Methods: Twenty-one patients with metastatic bone marrow of the spine were examined with MRI. Diffusion-weighted and spin-echo MRI were performed in 10 patients before and after radiation therapy with or without systemic chemotherapy, and in 11 patients after radiation therapy alone. Follow-up spin-echo and diffusion-weighted MRI were obtained 1 to 6 months after radiation therapy, depending on the patients' condition. The diffusion-weighted imaging sequence was based on reversed fast imaging with steady-state precession (PSIF). Signal intensity changes of the metastatic bone marrow before and after radiation therapy were evaluated on conventional spin-echo and diffusion-weighted MRI, and bone marrow contrast ratios and signal-to-noise ratios before and after radiation therapy were analyzed on diffusion-weighted MRI. Results: All metastatic bone marrow of the vertebral bodies was hyperintense relative to normal bone marrow on pretreatment diffusion-weighted MRI, with positive bone marrow contrast ratios (p<0.001), and hypointense relative to normal vertebral bodies on posttreatment diffusion-weighted MRI, with negative bone marrow contrast ratios (p<0.001). Signal-to-noise ratios decreased after treatment compared with pretreatment values. Decreased signal intensity of the metastatic bone marrow on diffusion-weighted MRI was first observed, on average, more than one month after the initiation of radiation therapy. Conclusion: These results suggest that diffusion-weighted MRI is an excellent method for monitoring the response of metastatic bone marrow of the vertebral bodies to therapy; however, this must be investigated in a larger series of patients with a longer follow-up period.
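
The abstract does not give the exact formulas, but commonly used definitions of the bone-marrow contrast ratio and signal-to-noise ratio can be sketched as below; under this definition the contrast ratio flips from positive to negative when the lesion changes from hyperintense to hypointense, matching the sign change reported above. The ROI values are hypothetical.

```python
# Minimal sketch (common definitions, not necessarily those used in the paper):
# bone-marrow contrast ratio and signal-to-noise ratio from mean ROI signals.
def contrast_ratio(si_lesion, si_normal):
    """(S_lesion - S_normal) / (S_lesion + S_normal): positive when the
    metastatic marrow is hyperintense to normal marrow, negative when hypointense."""
    return (si_lesion - si_normal) / (si_lesion + si_normal)

def snr(si_roi, sd_background_noise):
    """Mean ROI signal divided by the standard deviation of background noise."""
    return si_roi / sd_background_noise

# Hypothetical ROI measurements before and after radiation therapy.
print(contrast_ratio(320.0, 180.0))   # > 0: hyperintense before treatment
print(contrast_ratio(140.0, 185.0))   # < 0: hypointense after treatment
print(snr(320.0, 12.5))
```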

A realization of pauses in utterance across speech style, gender, and generation (과제, 성별, 세대에 따른 휴지의 실현 양상 연구)

  • Yoo, Doyoung;Shin, Jiyoung
    • Phonetics and Speech Sciences
    • /
    • v.11 no.2
    • /
    • pp.33-44
    • /
    • 2019
  • This paper examined how the realization of pauses in utterances is affected by speech style, gender, and generation. For this purpose, we analyzed the frequency and duration of pauses. Pauses were categorized into four types: pause with breath, pause with no breath, utterance-medial pause, and utterance-final pause. Forty-eight subjects living in Seoul were chosen from the Korean Standard Speech Database. All subjects performed both reading and spontaneous speech, which also allowed us to compare realization between the two speech styles. The results showed that utterance-final pauses were longer than utterance-medial pauses, which suggests that the utterance-final pause functions to signal the end of an utterance to the listener. Regarding task differences, spontaneous speech contained longer and more frequent pauses, for cognitive reasons. With regard to gender, women produced shorter and less frequent pauses, while for male speakers the duration of pauses with breath was significantly longer. Finally, for the generation variable, older speakers produced more frequent pauses. The results also showed several interaction effects: male speakers produced longer pauses, and this gender effect was more prominent at the utterance-final position.
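
As a small illustration of the kind of frequency and duration analysis described above, pause statistics can be computed per pause type from labelled intervals. The annotation format and label codes here are hypothetical, not those of the Korean Standard Speech Database.

```python
# Minimal sketch (hypothetical annotation format): compute pause frequency
# and mean duration by pause type for one utterance.
from collections import defaultdict
from statistics import mean

# (label, start_s, end_s); assumed codes: "BP" = pause with breath,
# "NP" = pause with no breath.
intervals = [
    ("speech", 0.00, 1.42),
    ("BP",     1.42, 1.95),
    ("speech", 1.95, 3.10),
    ("NP",     3.10, 3.28),
    ("speech", 3.28, 4.70),
]

durations = defaultdict(list)
for label, start, end in intervals:
    if label in ("BP", "NP"):
        durations[label].append(end - start)

for label, values in durations.items():
    print(label, "count:", len(values), "mean duration (s):", round(mean(values), 3))
```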