Integrated Search | Korea Science

1-Pass Semi-Dynamic Network Decoding Using a Subnetwork-Based Representation for Large Vocabulary Continuous Speech Recognition

정민화;안동훈
대한음성학회지:말소리, No. 50, pp. 51-69, 2004
In this paper, we present a one-pass semi-dynamic network decoding framework that inherits both the fast decoding speed of static network decoders and the memory efficiency of dynamic network decoders. Our method is based on a novel language model network representation that is essentially a finite state machine (FSM). The static network derived from the language model network [1][2] is partitioned into smaller subnetworks that are static by nature or self-structured. The whole network is managed dynamically so that only the subnetworks required for decoding are cached in memory. The network is near-minimized by applying the tail-sharing algorithm. Our decoder is evaluated on a 25k-word Korean broadcast news transcription task. The tail-sharing algorithm reduces the search network itself by 73.4%. Compared with the equivalent static network decoder, the semi-dynamic network decoder increases decoding time by at most 6%, while it adapts flexibly to various memory configurations, with a minimum memory usage of 37.6% of the complete network size.
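The idea of keeping only the subnetworks needed for decoding in memory can be illustrated with a small cache. This is a minimal sketch, not the authors' implementation: the abstract does not state an eviction policy, so least-recently-used (LRU) eviction is an assumption here, and `SubnetworkCache`, `load_fn`, and the string stand-ins for subnetworks are hypothetical names chosen for illustration.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List


class SubnetworkCache:
    """Holds at most `capacity` subnetworks in memory.

    LRU eviction is an assumption made for this sketch; the source
    abstract only says that required subnetworks are cached.
    """

    def __init__(self, capacity: int, load_fn: Callable[[Hashable], object]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.load_fn = load_fn  # hypothetical loader, e.g. reads a subnetwork from disk
        self._cache: "OrderedDict[Hashable, object]" = OrderedDict()
        self.loads = 0  # number of cache misses so far

    def get(self, subnet_id: Hashable) -> object:
        """Return the subnetwork, loading it on a miss and evicting the LRU entry if full."""
        if subnet_id in self._cache:
            self._cache.move_to_end(subnet_id)  # mark as most recently used
            return self._cache[subnet_id]
        subnet = self.load_fn(subnet_id)
        self.loads += 1
        self._cache[subnet_id] = subnet
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # drop the least recently used entry
        return subnet

    def resident(self) -> List[Hashable]:
        """IDs currently in memory, least recently used first."""
        return list(self._cache)


# Usage: a cache that holds two subnetworks at a time.
demo = SubnetworkCache(capacity=2, load_fn=lambda sid: f"subnetwork-{sid}")
for sid in ["a", "b", "a", "c"]:
    demo.get(sid)
```

Shrinking `capacity` trades memory for more reloads, which is the kind of speed/memory trade-off the abstract reports between the static and semi-dynamic decoders.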

A Study on the Characteristics of the Intonational Slope of the Korean Broadcasting News Utterances

인지영;성철재
대한음성학회지:말소리, No. 66, pp. 21-39, 2008
The purpose of this study is to analyze the intonational slope characteristics of Korean news utterances. Prosodic phrases were analyzed using the K-ToBI labeling system. In addition, the change of the intonation contour over the course of a sentence was examined by media type and gender. Results showed that the overall declination of the intonation contour was gentler for radio than for TV, and gentler for male speakers than for female speakers. In the regression on the top line slope, male speakers showed a higher $R^2$ with the number of words, whereas the base line slope for radio and for female speakers was strongly influenced by the numbers of syllables, words, and prosodic phrases. Many more independent variables had a statistically significant effect on the base line slope. This suggests that the base line slope is strongly related to these variables, whereas the top line slope can fluctuate more freely because it is only weakly correlated with them.

Multimodal Approach for Summarizing and Indexing News Video

Kim, Jae-Gon;Chang, Hyun-Sung;Kim, Young-Tae;Kang, Kyeong-Ok;Kim, Mun-Churl;Kim, Jin-Woong;Kim, Hyung-Myung
ETRI Journal, Vol. 24, No. 1, pp. 1-11, 2002
A video summary abstracts the gist of an entire video and enables efficient access to the desired content. In this paper, we propose a novel method for summarizing news video based on multimodal analysis of the content. The proposed method exploits the closed caption data to locate semantically meaningful highlights in a news video, and uses speech signals in the audio stream to align the closed caption data with the video on a timeline. The detected highlights are then described using the MPEG-7 Summarization Description Scheme, which allows efficient browsing of the content through such functionalities as multi-level abstracts and navigation guidance. Multimodal search and retrieval are also supported within the proposed framework: by indexing the synchronized closed caption data, video clips can be searched with a text query. Intensive experiments with prototype systems are presented to demonstrate the validity and reliability of the proposed method in real applications.

Speaker Tracking Using Eigendecomposition and an Index Tree of Reference Models

Moattar, Mohammad Hossein;Homayounpour, Mohammad Mehdi
ETRI Journal, Vol. 33, No. 5, pp. 741-751, 2011
This paper focuses on online speaker tracking for telephone conversations and broadcast news. Since online operation imposes limitations on the tracking strategy, such as data insufficiency, a reliable approach is needed to compensate for this shortage. In this framework, a set of reference speaker models is used as side information to facilitate online tracking. To improve the indexing accuracy, adaptation approaches in the eigenvoice decomposition space are proposed. We believe that the eigenvoice adaptation techniques help to embed the speaker space in the models and hence enrich the generality of the selected speaker models. In addition, an index structure over the reference models is proposed to speed up the search in the model space. The proposed framework is evaluated on the 2002 Rich Transcription Broadcast News and Conversational Telephone Speech corpus as well as a synthetic dataset. The indexing errors of the proposed framework on telephone conversations, broadcast news, and the synthetic dataset are 8.77%, 9.36%, and 12.4%, respectively. Using the index tree structure, the run time of the proposed framework is improved by 22%.
https://doi.org/10.4218/etrij.11.0110.0686

Comparison of Korean Speech De-identification Performance of Speech De-identification Model and Broadcast Voice Modulation

김승민;박대얼;최대선
스마트미디어저널, Vol. 12, No. 2, pp. 56-65, 2023
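In broadcasts such as news and investigative programs, voices are modulated to protect the identity of informants. The most widely used modulation method adjusts the pitch, but speech close to the original can easily be restored by readjusting the pitch. Broadcast voice modulation therefore cannot properly protect the speaker's identity and is vulnerable from a security standpoint, so a new voice modulation method is needed to replace it. In this paper, we use the Lightweight speech de-identification model, whose de-identification performance was verified in the Voice Privacy Challenge, as the comparison model, and we conduct experiments that compare and evaluate its de-identification performance against broadcast voice modulation based on pitch adjustment. Of the six modulation methods of the Lightweight speech de-identification model, we used the three with good de-identification performance, McAdams, Resampling, and Vocal Tract Length Normalization (VTLN), and conducted a human test and an EER (Equal Error Rate) test to compare de-identification performance on Korean speech. In both the human test and the EER test, the VTLN modulation method showed higher de-identification performance than broadcast modulation. Consequently, the modulation methods of the Lightweight model have sufficient de-identification performance for Korean speech and could replace broadcast voice modulation, which is vulnerable from a security standpoint.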
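For readers unfamiliar with the EER metric mentioned in the abstract, the following is a minimal sketch of how an equal error rate can be approximated from verification scores. It is not the evaluation code used in the paper; the function name and the toy scores are hypothetical. In de-identification evaluations, a higher EER for the attacking speaker verifier is usually read as stronger protection.

```python
from typing import Sequence


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate the EER by sweeping thresholds over all observed scores.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where the false acceptance rate (FAR) and the false
    rejection rate (FRR) are closest, as the mean of the two rates.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best_gap = float("inf")
    eer = 0.0
    for threshold in sorted(set(genuine) | set(impostor)):
        far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < threshold for s in genuine) / len(genuine)     # genuine trials rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (hypothetical): same-speaker trials vs. different-speaker trials.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine_scores, impostor_scores)  # 0.25 for these toy scores
```

With these toy scores, FAR and FRR are both 0.25 at a threshold of 0.6, so the function returns 0.25.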
https://doi.org/10.30693/SMJ.2023.12.2.56

Design and Implementation of a News Archive System Using Shot Types

한근주;낭종호;하명환;정병희;김경수
한국정보과학회논문지:컴퓨팅의 실제 및 레터, Vol. 7, No. 5, pp. 416-428, 2001
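Building a news archive system first requires indexing the news video stream by article, together with an abstraction method that lets users understand an article's content without watching the entire article video. This paper proposes a new article boundary detection method that uses shot types to index a news video stream by article, and an article abstraction method. The proposed indexing method classifies the shots of a news video into anchor shots, interview shots, speech shots, report shots, graphic shots, and so on. Every article begins with an anchor shot, and anchor shots are longer than other shots and have a distinctive screen layout, so these properties are used to perform article-level indexing. For effective abstraction of each article, we also propose a method of composing an article poster from the graphic data in the upper right of the anchor shot and the key frames of the other shots that make up the article. Experiments on several kinds of news video streams show that the recall and precision of the proposed article boundary detection algorithm are at least 0.92 and 0.96, respectively. The paper also describes the design and implementation of a prototype news archive system that runs on the WWW.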
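The recall and precision figures reported for article boundary detection can be made concrete with a short sketch. The matching rule below (a detected boundary counts only if it coincides exactly with a reference boundary) is an assumption, since the abstract does not state how detections were matched, and the shot indices are invented for illustration.

```python
from typing import Iterable, Tuple


def precision_recall(detected: Iterable[int], reference: Iterable[int]) -> Tuple[float, float]:
    """Precision and recall for detected boundaries against reference boundaries.

    Exact-match scoring is an assumption of this sketch.
    """
    detected_set, reference_set = set(detected), set(reference)
    hits = len(detected_set & reference_set)
    precision = hits / len(detected_set) if detected_set else 0.0
    recall = hits / len(reference_set) if reference_set else 0.0
    return precision, recall


# Hypothetical shot indices at which an article starts.
detected_boundaries = [0, 12, 30, 47]
reference_boundaries = [0, 12, 30, 55, 70]
precision, recall = precision_recall(detected_boundaries, reference_boundaries)
# Three of four detections are correct (precision 0.75);
# three of five true boundaries are found (recall 0.6).
```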

Performance Evaluation and Usefulness of an Automatic Labeler for Building a Prosody Analysis DB

강상훈;이항섭;김회린
대한음성학회:학술대회논문집, 대한음성학회 October 1996 conference proceedings, pp. 468-471, 1996
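This paper evaluates, on a variety of speech data, the performance of an automatic labeler that uses a speech translation system, with the aim of making it easier to build a large prosody DB for speech synthesis. Experiments show that for FM radio news sentences, conversational sentences, and read sentences, about 80% or more of the target phonemes are labeled with an error within 30 msec, while for isolated words the figure is about 60%. Our laboratory is currently using the automatic labeler to build a prosody DB and synthesis units for speech synthesis. Using the automatic labeler not only yields consistent labeling results but also reduces the time required to build them.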

Enhancement of a language model using two separate corpora of distinct characteristics

조세형;정태선
한국지능시스템학회논문지, Vol. 14, No. 3, pp. 357-362, 2004
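Language models improve recognition rates in applications such as speech recognition and handwritten character recognition by predicting the next word. However, language models differ from domain to domain, and collecting a sufficiently large corpus for each domain is nearly impossible. To overcome the limits of a size-constrained corpus when building an N-gram language model, this paper presents a method that uses the statistics of two corpora: a small spoken-style corpus and a large written-style corpus. To validate the approach, we combined a broadcast corpus of several hundred thousand words with a newspaper corpus of several million words or more, and obtained a 30% improvement in perplexity on broadcast scripts.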
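One common way to combine statistics from a small in-domain corpus and a large out-of-domain corpus is linear interpolation of the two models. The abstract does not say which combination scheme the authors used, so the sketch below is an illustration of the general idea only. It uses add-one smoothed unigram models for brevity, and the tiny corpora, the function names, and the weight `lam` are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def unigram_model(tokens: List[str], vocab: List[str]) -> Dict[str, float]:
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_small: Dict[str, float], p_large: Dict[str, float], lam: float) -> Dict[str, float]:
    """Linear interpolation: lam * P_small(w) + (1 - lam) * P_large(w)."""
    return {w: lam * p_small[w] + (1 - lam) * p_large[w] for w in p_small}


def perplexity(model: Dict[str, float], tokens: List[str]) -> float:
    """Perplexity of the model on a token sequence (lower is better)."""
    log_sum = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_sum / len(tokens))


# Toy corpora (hypothetical): a small spoken-style one and a larger written-style one.
spoken = "the anchor said the news".split()
written = "the paper reported the news and the weather".split()
held_out = "the anchor reported the weather".split()
vocab = sorted(set(spoken) | set(written) | set(held_out))

p_spoken = unigram_model(spoken, vocab)
p_written = unigram_model(written, vocab)
p_mixed = interpolate(p_spoken, p_written, lam=0.5)

pp_spoken = perplexity(p_spoken, held_out)
pp_written = perplexity(p_written, held_out)
pp_mixed = perplexity(p_mixed, held_out)
```

Because the logarithm is concave, the interpolated model's perplexity can never exceed the worse of the two component perplexities on the same text. In practice the weight would be tuned on held-out in-domain data.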

Detecting Prominent Content in Unstructured Audio Using Intensity-based Attack/Release Patterns

김사무엘
전자공학회논문지, Vol. 50, No. 12, pp. 224-231, 2013
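This paper proposes a method for detecting the type of the prominent signal, that is, the signal intended to be conveyed to the listener, in an unstructured mixed audio signal. The prominent signal types are defined as speech, music, and sound effects, and we introduce a way to distinguish them using features extracted from intensity-based attack/release patterns. To reflect listeners' subjective judgments of which signal is prominent in a given audio signal, we introduced a web-based evaluation system and evaluated 18 hours of audio from videos of various genres. The experiments showed accuracy that varies by video genre but is promising (86.7% for speech-oriented talk shows and 49.3% for action movies).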
https://doi.org/10.5573/ieek.2013.50.12.224

Infodemic: The New Informational Reality of the Present Times

Araujo, Carlos Alberto Avila
Journal of Information Science Theory and Practice, Vol. 10, No. 1, pp. 59-72, 2022
This text discusses elements and characteristics of contemporary informational reality, that is, the ways of producing, circulating, organizing, using, and appropriating information in the current context. Initially, seven terms and concepts used to describe this reality are discussed: fake news, false testimonials, hate speech, scientific negationism, disinformation, post-truth, and infodemic. Next, an attempt is made to present a framework for such phenomena as an object of study in information science. Therefore, this scenario is characterized based on the three main models of information science study: physical, cognitive, and social. The contribution of each of them to the study of contemporary informational reality is analyzed, identifying aspects such as the bubble effect, clickbaits, confirmation bias, cults of amateurism, and post-truth culture. Finally, it presents the discussion of a possible veritistic turn in the field, in order to think about elements not covered so far by information science in its task and challenge of producing adequate understanding and diagnoses of current phenomena. In conclusion, it is argued that only accurate and comprehensive diagnoses of such phenomena will allow information science to develop services and systems capable of combating their harmful effects.
https://doi.org/10.1633/JISTaP.2022.10.1.5

