• Title/Abstract/Keywords: news speech

72 search results

Differences in High Pitch Accents between News Speech and Natural Speech

  • 최윤희;이주경
    • Speech Sciences / Vol. 12, No. 2 / pp. 17-28 / 2005
  • This paper argues that news speech has an intonational pattern distinct from that of natural speech, reflecting its primary purpose of conveying new information. We conducted a phonetic experiment comparing the tonal contours of news speech and natural speech, examining the distribution of pitch accents and the overall pitch range. We used 70 Associated Press (AP) radio news utterances and 70 natural utterances extracted from TV dramas. Results show that news speech contains on average 3.38 H* accents (including L+H* and !H*) per intonational phrase (IP) or intermediate phrase (ip), whereas natural speech contains 1.8. The most frequent number of IPs/ips per sentence is 3 in news speech (32.07% of sentences) but only 1 in natural speech (41.42%). In addition, declination tends to be suppressed in news speech, and the pitch range is much wider in news speech than in natural speech. Finally, secondary-stressed syllables receive pitch accents comparatively often in news speech, which clearly distinguishes it from natural speech. These results can be interpreted as reflecting the particular purpose of news, which is to provide new information: because news speech assumes that every word conveys new information, every content word tends to receive an H* or a related pitch accent such as L+H* or !H*. This in turn yields more IPs/ips per sentence because of a physiological constraint: more H*s require more breath breaks. The wider pitch range and the pitch accents on secondary-stressed syllables may likewise be attributed to the exaggeration of new information.

A Study on Noise-Robust Methods for Broadcast News Speech Recognition

  • 정용주
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / No. 50 / pp. 71-83 / 2004
  • Broadcast news speech recognition has recently become one of the most active research areas. If broadcast news can be transcribed automatically and its content stored as text rather than as video or audio signals, searching multimedia databases for the needed information becomes much easier. However, the target speech signal in broadcast news is often corrupted by interfering signals such as background noise and music. In addition, the speech of a reporter speaking over the telephone or through a poor-quality microphone is severely distorted by channel effects. Such interference and distortion may be the main cause of poor recognition performance on broadcast news. In this paper, we investigate several methods for coping with these problems and observe performance improvements in recognizing noisy broadcast news speech.

Introduction of ETRI Broadcast News Speech Recognition System

  • 박준
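    • Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings / Proceedings of the 2006 Spring Conference of the Korean Society of Phonetic Sciences and Speech Technology / pp. 89-93 / 2006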
  • This paper presents the ETRI broadcast news speech recognition system. Broadcast news speech recognition involves two major issues: 1) real-time processing and 2) out-of-vocabulary (OOV) handling. For real-time processing, we devised a dual-decoder architecture. The input speech signal is segmented at long pauses between utterances, and the two decoders process the segments alternately. One decoder can start recognizing the current segment without waiting for the other decoder to finish the previous segment, so the processing delay does not accumulate. For OOV handling, we updated both the vocabulary and the language model using recent news articles from the internet. Updating the language model along with the vocabulary improves performance by up to 17.2% ERR.

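The benefit of the dual-decoder design can be illustrated with a small timing simulation. This is a sketch under assumptions that are not from the paper: segments are assigned to decoders round-robin, each segment becomes available when the pause ending it is detected, and decoding takes a fixed time per segment. The function name `finish_times` and the example timings are hypothetical.

```python
def finish_times(arrivals, decode_costs, num_decoders):
    """Return the time at which each speech segment finishes decoding.

    Hypothetical model: segment i becomes available at arrivals[i] (when the
    long pause ending it is detected) and needs decode_costs[i] seconds of
    decoder time. Segments are assigned round-robin, so with two decoders
    consecutive segments alternate between them.
    """
    free_at = [0] * num_decoders  # time at which each decoder becomes idle
    done = []
    for i, (arrive, cost) in enumerate(zip(arrivals, decode_costs)):
        d = i % num_decoders
        start = max(arrive, free_at[d])  # wait for the audio and for this decoder
        free_at[d] = start + cost
        done.append(free_at[d])
    return done


# Four 10 s segments, each taking 12 s to decode (slower than real time).
arrivals = [10, 20, 30, 40]
costs = [12, 12, 12, 12]
for n in (1, 2):
    done = finish_times(arrivals, costs, n)
    delays = [f - a for f, a in zip(done, arrivals)]
    print(n, "decoder(s):", delays)  # 1 -> [12, 14, 16, 18], 2 -> [12, 12, 12, 12]
```

With a single decoder the delay grows by 2 s per segment, while with two alternating decoders it stays constant, which matches the non-accumulating delay described in the abstract. In this simplified model the dual design keeps up only while each decoding time is at most twice the spacing between segments.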

Korean Broadcast News Transcription Using Morpheme-based Recognition Units

  • Kwon, Oh-Wook;Alex Waibel
    • The Journal of the Acoustical Society of Korea / Vol. 21, No. 1E / pp. 3-11 / 2002
  • Broadcast news transcription is one of the hardest tasks in speech recognition because broadcast speech signals vary widely in speech quality, channel, and background conditions. We developed a Korean broadcast news speech recognizer. We used a morpheme-based dictionary and language model to reduce the out-of-vocabulary (OOV) rate. We merged pairs of original morphemes that were short or frequent in order to reduce insertion and deletion errors caused by short morphemes. We used a lexicon with multiple pronunciations to reflect inter-morpheme pronunciation variations without severely modifying the search tree. Using the merged morphemes as recognition units, we achieved an OOV rate of 1.7%, comparable to that of European languages with a 64k vocabulary. We implemented a hidden Markov model (HMM)-based recognizer with vocal tract length normalization and online speaker adaptation by maximum likelihood linear regression (MLLR). Experimental results showed that the recognizer yielded a morpheme error rate of 21.8% for anchor speech and 31.6% for reporter speech, which was mostly noisy.
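
The morpheme-merging step and the OOV rate can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the selection thresholds (`max_len`, `min_count`), the greedy left-to-right merge, the `+` joiner, the function names, and the example morphemes are all hypothetical.

```python
from collections import Counter


def select_pairs(corpus, max_len=1, min_count=3):
    """Pick adjacent morpheme pairs to merge (hypothetical criteria).

    A pair qualifies if it occurs at least min_count times, or if both
    morphemes are at most max_len characters long.
    """
    counts = Counter()
    for sent in corpus:
        counts.update(zip(sent, sent[1:]))
    return {
        pair
        for pair, c in counts.items()
        if c >= min_count or all(len(m) <= max_len for m in pair)
    }


def merge(sent, pairs, joiner="+"):
    """Greedily merge selected pairs from left to right."""
    out, i = [], 0
    while i < len(sent):
        if i + 1 < len(sent) and (sent[i], sent[i + 1]) in pairs:
            out.append(sent[i] + joiner + sent[i + 1])
            i += 2
        else:
            out.append(sent[i])
            i += 1
    return out


def oov_rate(tokens, vocab):
    """Fraction of test tokens not covered by the recognition vocabulary."""
    return sum(t not in vocab for t in tokens) / len(tokens)


corpus = [
    ["정부", "는", "발표", "하", "었", "다"],
    ["경찰", "은", "수사", "하", "었", "다"],
]
pairs = select_pairs(corpus)
print(merge(corpus[0], pairs))  # ['정부', '는', '발표', '하+었', '다']
```

Here the one-character morphemes `하` and `었` are joined into a longer unit; the merged units would then form the vocabulary whose coverage `oov_rate` measures on held-out text.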

Korean LVCSR for Broadcast News Speech

  • Lee, Gang-Seong
    • The Journal of the Acoustical Society of Korea / Vol. 20, No. 2E / pp. 3-8 / 2001
  • This paper examines a Korean large vocabulary continuous speech recognition (LVCSR) system for broadcast news speech. A combined vowel-plus-implosive unit is included in the phone set together with other short phone units in order to obtain acoustic models of longer units. The effect of this unit is compared with that of conventional phone units. The dictionary units for language processing are extracted automatically from the eojeols appearing in the transcriptions. Triphone models are used for acoustic modeling and a trigram model for language modeling. Of the three major speaker groups in news broadcasts (anchors, journalists, and people other than anchors or journalists who are being interviewed), the speech of anchors and journalists, which contains considerable noise, was used for recognition tests.

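The triphone models mentioned above are context-dependent phone units. A minimal sketch of expanding a phone string into word-internal triphones, assuming HTK-style `left-centre+right` names and a hypothetical romanized phone sequence (the paper's actual phone set, including its vowel-plus-implosive units, is not reproduced here):

```python
def to_triphones(phones):
    """Expand a phone list into word-internal triphone names.

    Uses the HTK-style notation l-c+r; at word edges the missing context is
    simply dropped, giving biphones such as s+a or a-m.
    """
    units = []
    for i, centre in enumerate(phones):
        name = centre
        if i > 0:
            name = f"{phones[i - 1]}-{name}"
        if i + 1 < len(phones):
            name = f"{name}+{phones[i + 1]}"
        units.append(name)
    return units


print(to_triphones(["s", "a", "r", "a", "m"]))
# ['s+a', 's-a+r', 'a-r+a', 'r-a+m', 'a-m']
```

A single-phone input yields just that phone, since it has no context on either side.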

Application of Speech Recognition with Closed Caption for Content-Based Video Segmentations

  • Son, Jong-Mok;Bae, Keun-Sung
    • Speech Sciences / Vol. 12, No. 1 / pp. 135-142 / 2005
  • An important aspect of video indexing is the ability to divide video into meaningful segments, i.e., content-based video segmentation. Since the audio signal in the sound track is synchronized with the image sequences of a video program, the speech signal in the sound track can be used to segment the video into meaningful segments. In this paper, we propose a new approach to content-based video segmentation that uses closed captions to construct the recognition network for speech recognition. Accurate timing information for video segmentation is then obtained from the speech recognition process. In a video segmentation experiment on TV news programs, we successfully produced 56 video summaries from 57 TV news stories. This demonstrates that the proposed scheme is very promising for content-based video segmentation.

Robust speech recognition in car environment with echo canceller

  • 박철호;허원철;배건성
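    • Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings / Proceedings of the 2005 Fall Conference of the Korean Society of Phonetic Sciences and Speech Technology / pp. 147-150 / 2005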
  • Speech recognition performance in a car environment degrades severely when music or news is playing from a radio or CD player. Since the reference signals are available from the car's audio unit, they can be removed with an adaptive filter. In this paper, we present experimental results of speech recognition in a car environment using an echo canceller. For this purpose, we generated test speech signals by adding music or news to noisy in-car speech from the Aurora2 database. An HTK-based continuous HMM system was built as the recognizer. In addition, the MMSE-STSA method was applied to the output of the echo canceller to further remove residual noise.

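The echo-cancellation idea, subtracting an adaptively filtered copy of the known radio/CD reference from the microphone signal, can be sketched with a normalized LMS (NLMS) filter. The abstract does not say which adaptive algorithm was used; NLMS, the tap count, the step size, the function name, and the synthetic echo path below are all assumptions, and the MMSE-STSA post-filter is not shown.

```python
import numpy as np


def nlms_echo_cancel(ref, mic, taps=32, mu=0.5, eps=1e-8):
    """Remove the filtered reference (e.g. radio audio) from the mic signal.

    Returns the error signal, i.e. the echo-cancelled output, and the final
    filter weights. NLMS is an assumed choice of adaptive algorithm.
    """
    w = np.zeros(taps)        # estimate of the loudspeaker-to-mic path
    buf = np.zeros(taps)      # buf[k] holds ref[n - k]
    out = np.empty(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = ref[n]
        e = mic[n] - w @ buf                   # mic minus estimated echo
        w += mu * e * buf / (eps + buf @ buf)  # normalized LMS update
        out[n] = e
    return out, w


rng = np.random.default_rng(0)
ref = rng.standard_normal(5000)        # stand-in for the radio signal
path = 0.5 * rng.standard_normal(8)    # hypothetical in-car echo path
mic = np.convolve(ref, path)[:5000]    # echo only, no driver speech
out, w = nlms_echo_cancel(ref, mic)
print(np.max(np.abs(w[:8] - path)))    # small once the filter has converged
```

In the recognition setting the microphone also carries the driver's speech and car noise, so `out` approximates that speech plus residual noise, which is the signal a post-filter such as MMSE-STSA would then process.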

A Speaker Change Detection Experiment that Uses a Statistical Method

  • 이경록;김진영
    • Speech Sciences / Vol. 8, No. 4 / pp. 59-72 / 2001
  • In this paper, we conducted speaker change detection experiments using a statistical method for a News On Demand (NOD) service. In news data, a change of speaker indicates a change of content, so analyzing speaker changes makes it possible to identify the content of each part of the speech. Speaker change detection serves as a preprocessor that divides input speech by speaker, an important preprocessing stage for speaker tracking. We detected speaker changes using two metric-based methods: segmentation based on the generalized likelihood ratio (GLR) distance and segmentation based on the Bayesian information criterion (BIC). In the experiment, the input speech was first divided into speaker units with the GLR distance-based method, and the resulting speaker change points were then verified with BIC-based segmentation. At a missed detection rate (MDR) of around 15%, the false alarm rate (FAR) was 63.29% in a high-noise environment and 54.28% in a low-noise environment.

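The BIC criterion is commonly written, for a window of N feature frames of dimension d split at frame i into parts of N1 and N2 frames modeled by full-covariance Gaussians, as ΔBIC(i) = 0.5 * (N log|Σ| - N1 log|Σ1| - N2 log|Σ2|) - λ * 0.5 * (d + d(d+1)/2) * log N, with a speaker change hypothesized where ΔBIC is positive. The first term is the Gaussian log generalized likelihood ratio, the same quantity GLR-based segmentation builds on. The sketch below is a generic single-change version on synthetic 2-D features; the penalty weight λ, the edge `margin`, the function names, and the data are assumptions, and the paper's two-stage GLR-then-BIC pipeline is not reproduced.

```python
import numpy as np


def logdet_cov(x):
    """Log-determinant of the maximum-likelihood covariance of frames x (n, d)."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(x, i, lam=1.0):
    """ΔBIC for splitting x at frame i; positive values favour a change."""
    n, d = x.shape
    r = 0.5 * (n * logdet_cov(x)
               - i * logdet_cov(x[:i])
               - (n - i) * logdet_cov(x[i:]))
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return r - lam * penalty


def detect_change(x, margin=20, lam=1.0):
    """Return (best split frame, ΔBIC), or (None, ΔBIC) if no split is positive."""
    scores = {i: delta_bic(x, i, lam) for i in range(margin, len(x) - margin)}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores[best]


rng = np.random.default_rng(1)
speaker_a = rng.normal(0.0, 1.0, size=(200, 2))  # synthetic "speaker A" frames
speaker_b = rng.normal(3.0, 1.0, size=(200, 2))  # synthetic "speaker B" frames
frames = np.vstack([speaker_a, speaker_b])
print(detect_change(frames))  # split frame close to 200, large positive ΔBIC
```

The `margin` keeps each side of a candidate split long enough for its covariance to be estimable; raising `lam` trades missed detections for fewer false alarms, the same MDR/FAR trade-off reported in the abstract.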

Quantifying and Analyzing Vocal Emotion of COVID-19 News Speech Across Broadcasters in South Korea and the United States Based on CNN

  • 남영자;채선규
    • Journal of the Korea Institute of Information and Communication Engineering / Vol. 26, No. 2 / pp. 306-312 / 2022
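  • During the unprecedented COVID-19 pandemic, the public's demand for information has encouraged excessive consumption of COVID-19 news. Because news also affects the public's psychological well-being, particular care is required in how news is reported. This study therefore analyzed the vocal emotion patterns of COVID-19 news from major news media in South Korea and the United States using a convolutional neural network (CNN). The analysis showed that neutral emotion was detected in most news media, but sadness and anger were also detected. This pattern was prominent in Korean news media but did not appear in U.S. news media. As the first vocal emotion analysis of COVID-19 news, this study not only suggests a new direction for emotion analysis of news but also has broad implications for improving understanding of the pandemic.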

Statistical Analysis Between Size and Balance of Text Corpus by Evaluation of the effect of Interview Sentence in Language Modeling

  • 정의정;이영직
    • Acoustical Society of Korea: Conference Proceedings / Proceedings of the 2002 Summer Conference of the Acoustical Society of Korea, Vol. 21, No. 1 / pp. 87-90 / 2002
  • This paper statistically analyzes the relationship between the size and balance of a text corpus by evaluating the effect of interview sentences on the language model of a Korean broadcast news transcription system. The ultimate purpose of our Korean broadcast news transcription system is to recognize the speech of anchors and reporters in broadcast news shows, not interview speech. However, interview sentences make up approximately 15% of the text corpus collected for building the language model. Interview sentences differ from those of anchors and reporters in various respects and therefore interfere with anchor- and reporter-oriented language modeling. In this paper, we evaluate the effect of interview sentences on the language model of a Korean broadcast news transcription system and statistically analyze the relationship between the size and balance of the text corpus by repeating the same experimental procedure while varying the corpus size.
