• Title/Abstract/Keywords: Decoding model


Phonetic Tied-Mixture Syllable Model for Efficient Decoding in Korean ASR

  • 김봉완;이용주
    • 대한음성학회지:말소리, No. 50, pp. 139-150, 2004
  • A Phonetic Tied-Mixture (PTM) model has been proposed as a means of efficient decoding in large vocabulary continuous speech recognition (LVCSR) systems. PTM models have been reported to achieve better decoding performance than triphones by sharing a set of mixture components among states at the same topological location[5]. In this paper we propose a Phonetic Tied-Mixture Syllable (PTMS) model, which extends the PTM technique to syllables. The proposed PTMS model decodes 13% faster than PTM. Despite the difference in context-dependent modeling (PTM: cross-word context-dependent modeling; PTMS: word-internal left-phone-dependent modeling), the proposed model loses less than 1% in word accuracy relative to PTM at the same beam width. With a different beam width, it achieves better word accuracy than PTM at the same or higher speed.

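The speed argument for PTM rests on sharing one Gaussian codebook among all states at the same topological position of a phone, so each Gaussian is evaluated once per frame no matter how many context-dependent states reuse it with their own mixture weights. The sketch below illustrates that effect under stated assumptions: 1-D Gaussians, and hypothetical codebook values, weights, and state names (`TiedMixturePool`, `state_loglik`, and the triphone labels are illustrative, not from the paper). It covers plain PTM state scoring, not the PTMS syllable extension.

```python
import math
from typing import Dict, List, Tuple

Gaussian = Tuple[float, float]  # (mean, variance); 1-D for illustration only
Key = Tuple[str, int]           # (base phone, state position), e.g. ("a", 1)


def log_gauss(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


class TiedMixturePool:
    """Hypothetical shared codebooks: one Gaussian set per (phone, state position)."""

    def __init__(self, codebooks: Dict[Key, List[Gaussian]]):
        self.codebooks = codebooks
        self.evaluations = 0  # number of Gaussian densities actually computed
        self._cache: Dict[Key, List[float]] = {}
        self._x = 0.0

    def start_frame(self, x: float) -> None:
        self._x = x
        self._cache.clear()

    def component_logs(self, key: Key) -> List[float]:
        # Each shared codebook is evaluated at most once per frame.
        if key not in self._cache:
            self._cache[key] = [log_gauss(self._x, m, v) for m, v in self.codebooks[key]]
            self.evaluations += len(self.codebooks[key])
        return self._cache[key]


def state_loglik(pool: TiedMixturePool, key: Key, weights: List[float]) -> float:
    """log sum_j w_j N(x; mu_j, var_j): state-specific weights over a shared codebook."""
    terms = [math.log(w) + l for w, l in zip(weights, pool.component_logs(key)) if w > 0]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Hypothetical example: three context-dependent states of /a/ (position 1) share 4 Gaussians.
pool = TiedMixturePool({("a", 1): [(-1.0, 0.5), (0.0, 1.0), (1.0, 0.5), (2.0, 1.0)]})
states = {
    "k-a+n": [0.4, 0.3, 0.2, 0.1],
    "s-a+m": [0.1, 0.2, 0.3, 0.4],
    "m-a+k": [0.25, 0.25, 0.25, 0.25],
}
pool.start_frame(0.7)
scores = {name: state_loglik(pool, ("a", 1), w) for name, w in states.items()}
print(pool.evaluations)  # 4 shared evaluations instead of 3 * 4 = 12
```

Only the weight vectors differ between states, which is what keeps per-frame cost proportional to the number of codebooks rather than the number of context-dependent states.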

Study on Decoding Strategies in Neural Machine Translation

  • 서재형;박찬준;어수경;문현석;임희석
    • 한국융합학회논문지, Vol. 12, No. 11, pp. 69-80, 2021
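  • As neural machine translation (NMT) using deep learning models has emerged as a mainstream field, much investment and research have been devoted to models and language-pair data to achieve the best performance. However, most recent NMT studies leave decoding strategies, which govern the natural language generation that maximizes the quality of translated sentences, as future work, and diverse experiments and detailed analyses of them are lacking. In machine translation, a decoding strategy optimizes the search path while the translated sentence is generated, and it can improve performance without changing the model or expanding the data. This paper compares and analyzes decoding strategies for neural machine translation with sequence-to-sequence models, from classic greedy decoding to the recent Dynamic Beam Allocation (DBA) method, to reveal the effects and significance of decoding strategies.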
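The claim that a decoding strategy alone can change output quality is easy to see with greedy decoding versus beam search. The sketch below assumes a hypothetical toy next-token distribution (`TOY`, `toy_model`) in place of a trained sequence-to-sequence model; it shows only these two classic strategies, not DBA or the paper's experimental setup.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"
Hyp = Tuple[float, Tuple[str, ...]]  # (log-probability, token sequence)


def beam_search(next_log_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
                beam_width: int, max_len: int) -> Hyp:
    """Beam search over a next-token model; beam_width=1 reduces to greedy decoding."""
    beams: List[Hyp] = [(0.0, ())]
    finished: List[Hyp] = []
    for _ in range(max_len):
        candidates: List[Hyp] = []
        for score, seq in beams:
            for tok, lp in next_log_probs(seq).items():
                hyp = (score + lp, seq + (tok,))
                # Hypotheses ending in EOS leave the beam; others compete for a slot.
                (finished if tok == EOS else candidates).append(hyp)
        beams = sorted(candidates, key=lambda h: h[0], reverse=True)[:beam_width]
        if not beams:
            break
    return max(finished + beams, key=lambda h: h[0])


# Hypothetical toy "model": next-token probabilities given the prefix.
TOY = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    return {tok: math.log(p) for tok, p in TOY.get(prefix, {EOS: 1.0}).items()}


greedy = beam_search(toy_model, beam_width=1, max_len=5)
beam = beam_search(toy_model, beam_width=2, max_len=5)
print(greedy[1], round(math.exp(greedy[0]), 2))  # ('a', 'x', '</s>') 0.33
print(beam[1], round(math.exp(beam[0]), 2))      # ('b', 'x', '</s>') 0.36
```

Greedy decoding commits to the locally best first token "a" and ends with probability 0.33, while a beam of width 2 keeps "b" alive and finds the sequence with probability 0.36, with no change to the model.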

1-Pass Semi-Dynamic Network Decoding Using a Subnetwork-Based Representation for Large Vocabulary Continuous Speech Recognition

  • 정민화;안동훈
    • 대한음성학회지:말소리, No. 50, pp. 51-69, 2004
  • In this paper, we present a one-pass semi-dynamic network decoding framework that combines the fast decoding speed of static network decoders with the memory efficiency of dynamic network decoders. Our method is based on a novel language model network representation that is essentially a finite state machine (FSM). The static network derived from the language model network [1][2] is partitioned into smaller subnetworks, which are either static by nature or self-structured. The whole network is managed dynamically so that the subnetworks required for decoding are cached in memory. The network is near-minimized by applying the tail-sharing algorithm. Our decoder is evaluated on a 25k-word Korean broadcast news transcription task. The tail-sharing algorithm reduces the search network itself by 73.4%. Compared with the equivalent static network decoder, the semi-dynamic network decoder increases decoding time by at most 6%, while it can be flexibly adapted to various memory configurations, using as little as 37.6% of the complete network size.

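Tail sharing merges network states whose remaining futures are identical, which is what lets a prefix-structured network shrink toward a minimal one. The sketch below applies that idea to a hypothetical phone-level pronunciation trie rather than to the paper's language-model network, so it illustrates the principle only; `build_trie`, `tail_share`, and the toy lexicon are assumptions for illustration, and the 73.4% figure is the paper's result, not something this code reproduces.

```python
from typing import Dict, List, Sequence, Tuple


def build_trie(words: Sequence[Sequence[str]]) -> Tuple[List[Dict[str, int]], List[bool]]:
    """Prefix tree: trans[s] maps a label to a child state; final[s] marks word ends."""
    trans: List[Dict[str, int]] = [{}]
    final: List[bool] = [False]
    for word in words:
        s = 0
        for label in word:
            if label not in trans[s]:
                trans.append({})
                final.append(False)
                trans[s][label] = len(trans) - 1
            s = trans[s][label]
        final[s] = True
    return trans, final


def tail_share(trans: List[Dict[str, int]], final: List[bool]) -> Dict[int, int]:
    """Merge states with identical futures; returns state -> representative state."""
    rep: Dict[int, int] = {}
    registry: Dict[Tuple[bool, Tuple[Tuple[str, int], ...]], int] = {}

    def visit(s: int) -> int:
        # Post-order: children are canonicalised first, so equal signatures
        # (finality + outgoing labels + child representatives) mean equal futures.
        if s not in rep:
            edges = tuple(sorted((lab, visit(c)) for lab, c in trans[s].items()))
            rep[s] = registry.setdefault((final[s], edges), s)
        return rep[s]

    visit(0)
    return rep


# Hypothetical phone-level lexicon sharing the tail /a n/.
words = [("s", "a", "n"), ("m", "a", "n"), ("p", "a", "n")]
trans, final = build_trie(words)
rep = tail_share(trans, final)
print(len(trans), len(set(rep.values())))  # 10 4
```

Because the trie is acyclic, one bottom-up pass with a signature registry is enough; networks with cycles, such as back-off language-model networks, need a more general minimization procedure.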

Fano Decoding with Timeout: Queuing Analysis

  • Pan, W. David;Yoo, Seong-Moo
    • ETRI Journal, Vol. 28, No. 3, pp. 301-310, 2006
  • In mobile communications, sequential decoding algorithms, a class of variable-complexity algorithms for convolutional decoding, are of interest because their computation time can vary with changing channel conditions. The Fano algorithm is one well-known sequential decoding algorithm. Since the decoding time of a Fano decoder follows a Pareto distribution, a heavy-tailed distribution parameterized by the channel signal-to-noise ratio (SNR), buffers are required to absorb the variable decoding delays of Fano decoders. Furthermore, since a decoding time drawn from a Pareto distribution can be unbounded, practical decoders often impose a maximum limit on the worst-case decoding time. In this paper, we investigate the relations among buffer occupancy, decoding time, and channel conditions in a system where the Fano decoder is not allowed to run with unbounded decoding time. A timeout limit is imposed so that decoding is terminated if the decoding time reaches the limit. We use discrete-time semi-Markov models to describe such a Fano decoding system with timeout limits. Our queuing analysis provides expressions characterizing the average buffer occupancy as a function of channel conditions and timeout limits. Both numerical and simulation results are provided to validate the analytical results.

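The interaction between Pareto-distributed decoding time, a timeout limit, and buffering can be explored with a small Monte Carlo model. This is a simulation sketch, not the paper's semi-Markov analysis, and it rests on assumptions: frames arrive at a fixed interval, decoding time is Pareto with minimum 1 (in the same time units) and shape `alpha` standing in for the channel condition (larger `alpha` means a lighter tail), and mean buffer occupancy is obtained from the mean waiting time through Little's law. The function name `simulate_fano_buffer` and its parameters are hypothetical.

```python
import random
from typing import Tuple


def simulate_fano_buffer(alpha: float, timeout: float, interarrival: float = 3.0,
                         n_frames: int = 100_000, seed: int = 0) -> Tuple[float, float]:
    """Return (mean buffer occupancy, fraction of frames stopped by the timeout).

    Assumed model: decoding time ~ Pareto(alpha) with minimum 1, truncated at
    `timeout`; one frame arrives every `interarrival` time units (FIFO buffer).
    """
    rng = random.Random(seed)
    wait = 0.0          # waiting time of the current frame before decoding starts
    total_wait = 0.0
    timeouts = 0
    for _ in range(n_frames):
        total_wait += wait
        service = rng.paretovariate(alpha)  # heavy-tailed decoding time
        if service >= timeout:              # decoder is stopped at the limit
            service = timeout
            timeouts += 1
        # Lindley recursion: waiting time of the next frame.
        wait = max(0.0, wait + service - interarrival)
    mean_wait = total_wait / n_frames
    # Little's law: mean number of waiting frames = arrival rate * mean wait.
    return mean_wait / interarrival, timeouts / n_frames


for limit in (5.0, 20.0, 100.0):
    occupancy, frac_timeout = simulate_fano_buffer(alpha=2.0, timeout=limit)
    print(limit, round(occupancy, 3), round(frac_timeout, 4))
```

With a fixed seed every run draws the same decoding times, so raising the timeout can only lengthen the truncated service times; the trade-off appears as higher buffer occupancy against fewer frames cut off by the limit.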

Landmark-Guided Segmental Speech Decoding for Continuous Mandarin Speech Recognition

  • Chao, Hao;Song, Cheng
    • Journal of Information Processing Systems, Vol. 12, No. 3, pp. 410-421, 2016
  • In this paper, we propose a framework that incorporates landmarks into a segment-based Mandarin speech recognition system. In this method, landmarks provide boundary information and phonetic class information, which are used to direct the decoding process. To validate the method, two kinds of landmarks that can be reliably detected are used to direct the decoding process of a segment model (SM) based Mandarin LVCSR (large vocabulary continuous speech recognition) system. Experimental results show that about 30% of decoding time can be saved without a noticeable decrease in recognition accuracy, demonstrating the potential of the method.
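One way landmark boundary information can cut decoding time in a segment model is by restricting where segments may start and end, which shrinks the set of segment hypotheses the dynamic program must score. The sketch below assumes a toy 1-D feature track and a hypothetical scoring function (fit error plus a per-segment penalty) in place of real segment models, and it uses landmarks only as boundary constraints, ignoring the phonetic class information the paper also exploits. `segmental_viterbi`, `toy_score`, and the landmark positions are illustrative assumptions.

```python
from typing import Callable, Iterable, List, Tuple

Segment = Tuple[int, int]  # [start, end) frame indices


def segmental_viterbi(n_frames: int, boundaries: Iterable[int],
                      score: Callable[[int, int], float],
                      max_dur: int) -> Tuple[List[Segment], int]:
    """Best segmentation using only the given candidate boundaries.

    Returns (segments, number of segment hypotheses scored). Assumes the
    last frame is reachable from frame 0 under max_dur."""
    allowed = sorted(set(boundaries) | {0, n_frames})
    best = {0: 0.0}
    back = {}
    scored = 0
    for t in allowed[1:]:
        for s in allowed:
            if s >= t:
                break
            if t - s > max_dur or s not in best:
                continue
            scored += 1
            cand = best[s] + score(s, t)
            if t not in best or cand > best[t]:
                best[t], back[t] = cand, s
    segments: List[Segment] = []
    t = n_frames
    while t != 0:  # backtrace from the last frame
        segments.append((back[t], t))
        t = back[t]
    return segments[::-1], scored


# Hypothetical 1-D feature track with true boundaries at frames 3 and 7.
feats = [0, 0, 0, 5, 5, 5, 5, 1, 1, 1]


def toy_score(s: int, t: int) -> float:
    seg = feats[s:t]
    mean = sum(seg) / len(seg)
    return -sum((x - mean) ** 2 for x in seg) - 1.0  # fit error + segment penalty


full = segmental_viterbi(len(feats), range(len(feats) + 1), toy_score, max_dur=10)
guided = segmental_viterbi(len(feats), [3, 7], toy_score, max_dur=10)
print(full)    # ([(0, 3), (3, 7), (7, 10)], 55)
print(guided)  # ([(0, 3), (3, 7), (7, 10)], 6)
```

Both searches return the same segmentation here, but the landmark-guided one scores 6 segment hypotheses instead of 55; in practice the saving depends on how reliable the detected landmarks are, since a missed landmark removes the correct boundary from the search.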

MPEG4 Decoding System Modeling in SystemC

  • 이미영;이승준;배영환
    • 대한전자공학회:학술대회논문집, 대한전자공학회 2001 Summer Conference Proceedings (2), pp. 109-112, 2001
  • This paper presents a model of an MPEG4 decoding system in SystemC, a new C/C++-based system simulation approach. In the model, the MPEG4 decoding behavior is modeled and verified. The MPEG4 decoding system is then partitioned into several hardware components, which will be implemented in a low-level hardware design flow, and synchronized communication between hardware blocks through data ports is modeled.

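Synchronized block-to-block communication through data ports is commonly modeled with blocking FIFO channels (SystemC provides `sc_fifo` for this). The sketch below is a Python analog of that pattern, not SystemC code and not the paper's model: each hypothetical "hardware block" runs in its own thread, reads from a depth-1 input queue, and writes to a depth-1 output queue, so a block stalls whenever its upstream is empty or its downstream is full. The stage names and the arithmetic inside them are placeholders, not MPEG4 algorithms.

```python
import queue
import threading
from typing import Callable, List


def hw_block(name: str, fn: Callable[[int], int],
             in_port: "queue.Queue", out_port: "queue.Queue") -> threading.Thread:
    """A 'hardware block' that reads its input port and writes its output port."""
    def run() -> None:
        while True:
            item = in_port.get()      # blocks until the upstream block writes
            if item is None:          # end-of-stream token, forwarded downstream
                out_port.put(None)
                return
            out_port.put(fn(item))    # blocks while the downstream FIFO is full
    thread = threading.Thread(target=run, name=name)
    thread.start()
    return thread


# Depth-1 FIFOs force the blocks to run in lock step, like small hardware buffers.
ports = [queue.Queue(maxsize=1) for _ in range(3)]
stages = [
    hw_block("entropy_decode", lambda x: x * 2, ports[0], ports[1]),     # placeholder
    hw_block("inverse_transform", lambda x: x + 1, ports[1], ports[2]),  # placeholder
]

results: List[int] = []


def sink() -> None:
    while (item := ports[2].get()) is not None:
        results.append(item)


sink_thread = threading.Thread(target=sink)
sink_thread.start()
for macroblock in range(5):  # producer: hypothetical macroblock IDs
    ports[0].put(macroblock)
ports[0].put(None)
for t in stages + [sink_thread]:
    t.join()
print(results)  # [1, 3, 5, 7, 9]
```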

A Differential SFBC-OFDM for a DMB System with Multiple Antennas

  • Woo, Kyung-Soo;Lee, Kyu-In;Paik, Jong-Ho;Park, Kyung-Won;Yang, Won-Young;Cho, Yong-Soo
    • 한국통신학회논문지, Vol. 32, No. 2A, pp. 195-202, 2007
  • This paper proposes a differential space-frequency block code orthogonal frequency division multiplexing (SFBC-OFDM) scheme as a multiple-input multiple-output (MIMO) transmission technique for next-generation digital multimedia broadcasting (DMB). A linear decoding method for differential SFBC, which performs comparably to maximum-likelihood (ML) decoding, is derived for two or four transmit antennas. A simple table lookup method is proposed to improve the efficiency of DSFBC encoding and decoding for non-constant-modulus constellations. A DMB MIMO channel model, developed by extending the 3GPP MIMO model to fit DMB environments, is used to compare the BER performance of differential space block code schemes in various channel environments. Simulation results show that the differential SFBC-16QAM scheme, using either four transmit antennas with one receive antenna or two transmit antennas with two receive antennas, achieves a 12 dB performance gain over the conventional DQPSK scheme, even at twice the data rate.
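Why a differential Alamouti-type code admits linear decoding can be shown in a few lines: with unitary blocks and an unchanged channel, R_t ≈ V_t R_{t-1}, and the ML metric Re tr(V^H R_t R_{t-1}^H) splits into one term in a and one term in b, so each symbol is detected separately. The sketch below assumes QPSK (constant modulus), two transmit antennas, one receive antenna, a noiseless channel that is identical on the two subcarriers of a pair and across consecutive blocks, and a hypothetical channel value. It is the standard unitary differential construction, not the paper's 16QAM table-lookup scheme or its four-antenna case; all function names are illustrative.

```python
import cmath
import math
import random
from typing import List, Tuple

Matrix = List[List[complex]]
QPSK = [cmath.exp(1j * (math.pi / 4 + k * math.pi / 2)) for k in range(4)]


def alamouti(a: complex, b: complex) -> Matrix:
    """Unitary Alamouti block: rows = the two subcarriers of a pair, cols = tx antennas."""
    s = 1 / math.sqrt(2)
    return [[s * a, s * b], [-s * b.conjugate(), s * a.conjugate()]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def herm(x: Matrix) -> Matrix:
    """Conjugate transpose."""
    return [[x[i][j].conjugate() for i in range(len(x))] for j in range(len(x[0]))]


def transmit(pairs: List[Tuple[complex, complex]], h: Matrix) -> List[Matrix]:
    """Differential encoding C_t = V_t C_{t-1}; received R_t = C_t H (noiseless)."""
    c: Matrix = [[1 + 0j, 0j], [0j, 1 + 0j]]  # reference block C_0
    received = [matmul(c, h)]
    for a, b in pairs:
        c = matmul(alamouti(a, b), c)
        received.append(matmul(c, h))
    return received


def diff_decode(r_prev: Matrix, r_cur: Matrix) -> Tuple[complex, complex]:
    """Linear decoding: Re tr(V^H M), M = R_t R_{t-1}^H, separates into a and b terms."""
    m = matmul(r_cur, herm(r_prev))
    ua = m[0][0] + m[1][1].conjugate()
    ub = m[0][1] - m[1][0].conjugate()
    a = max(QPSK, key=lambda q: (q.conjugate() * ua).real)
    b = max(QPSK, key=lambda q: (q.conjugate() * ub).real)
    return a, b


rng = random.Random(1)
pairs = [(rng.choice(QPSK), rng.choice(QPSK)) for _ in range(20)]
h = [[complex(0.8, -0.3)], [complex(-0.2, 0.6)]]  # hypothetical 2 tx x 1 rx channel
rx = transmit(pairs, h)
decoded = [diff_decode(rx[t - 1], rx[t]) for t in range(1, len(rx))]
print(decoded == pairs)  # True
```

The receiver never estimates `h`; it only compares consecutive received blocks, which is the point of differential transmission.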

On the (n, m, k)-Cast Capacity of Wireless Ad Hoc Networks

  • Kim, Hyun-Chul;Sadjadpour, Hamid R.;Garcia-Luna-Aceves, Jose Joaquin
    • Journal of Communications and Networks, Vol. 13, No. 5, pp. 511-517, 2011
  • The capacity of wireless ad hoc networks is analyzed for all kinds of information dissemination, based on single and multiple packet reception schemes under the physical model. To represent the general information dissemination scheme, we use the (n, m, k)-cast model [1], where n, m, and k (k ≤ m) are, respectively, the number of nodes, the number of destinations, and the number of closest destinations that actually receive packets from the source in each (n, m, k)-cast group. We first consider point-to-point communication, which implies single packet reception between transmitter-receiver pairs, and compute the capacity of (n, m, k)-cast communications. Next, the achievable throughput capacity is computed when receiver nodes are endowed with multipacket reception (MPR) capability. We adopt maximum likelihood decoding (MLD) and successive interference cancellation (SIC) as the optimal and suboptimal decoding schemes for MPR, respectively. We also demonstrate that the physical and protocol models for MPR yield the same capacity when MLD is used for decoding.
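Successive interference cancellation, the suboptimal MPR scheme named above, can be sketched at a single receiver under an SINR-threshold (physical) model: decode the strongest remaining packet if its SINR meets the threshold, subtract it, and repeat. The received powers, noise level, and threshold below are hypothetical, perfect cancellation is assumed, and the sketch does not cover MLD or the paper's capacity derivation.

```python
from typing import List


def sic_decode_count(powers: List[float], noise: float, beta: float) -> int:
    """Number of simultaneous packets recovered by SIC with SINR threshold beta."""
    remaining = sorted(powers, reverse=True)
    decoded = 0
    while remaining:
        signal, interference = remaining[0], sum(remaining[1:])
        if signal / (noise + interference) < beta:
            break  # strongest remaining packet is below threshold: stop
        decoded += 1
        remaining = remaining[1:]  # perfect cancellation of the decoded packet
    return decoded


print(sic_decode_count([10.0, 3.0, 1.0], noise=0.1, beta=1.0))  # 3
print(sic_decode_count([10.0, 3.0, 1.0], noise=0.1, beta=3.0))  # 0
```

With threshold 1 the SINRs along the cancellation order are about 2.44, 2.73, and 10, so all three packets are recovered; with threshold 3 the first step already fails and SIC recovers nothing.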

Voice-Pishing Detection Algorithm Based on 3GPP2 SMV

  • 이계환;장준혁
    • 대한전자공학회논문지SP, Vol. 45, No. 4, pp. 92-99, 2008
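  • This paper proposes an algorithm for preventing voice phishing based on the coding parameters of the 3GPP2 Selectable Mode Vocoder (SMV). A Gaussian Mixture Model (GMM) is built using only the key feature vectors that are automatically extracted during SMV decoding of the signal transmitted from the other party's mobile phone, and a detection algorithm for voice phishing prevention is proposed on this basis. Experimental results show that the proposed coding-parameter-based voice phishing algorithm performs well at preventing telephone fraud.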
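The abstract does not say exactly how the GMM is used for detection; a common pattern is to score a call's feature vectors under a target GMM and a background GMM and compare the average log-likelihood ratio with a threshold. The sketch below assumes that pattern, diagonal covariances, and hypothetical 2-D features and model parameters; it does not reproduce the SMV parameters or the trained models from the paper.

```python
import math
from typing import List, Sequence, Tuple

# One diagonal-covariance component: (weight, means, variances).
Component = Tuple[float, Sequence[float], Sequence[float]]


def gmm_loglik(x: Sequence[float], gmm: List[Component]) -> float:
    """log p(x) for a diagonal-covariance GMM, via log-sum-exp."""
    terms = []
    for w, mu, var in gmm:
        ll = sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                 for xi, m, v in zip(x, mu, var))
        terms.append(math.log(w) + ll)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def detect(frames: List[Sequence[float]], target: List[Component],
           background: List[Component], threshold: float = 0.0) -> Tuple[bool, float]:
    """Average per-frame log-likelihood ratio; flag the call if it exceeds threshold."""
    llr = sum(gmm_loglik(f, target) - gmm_loglik(f, background) for f in frames) / len(frames)
    return llr > threshold, llr


# Hypothetical models and per-frame feature vectors.
target = [(0.5, [1.0, 1.0], [0.5, 0.5]), (0.5, [2.0, 0.0], [0.5, 0.5])]
background = [(1.0, [-1.0, -1.0], [0.5, 0.5])]
frames = [[1.1, 0.9], [0.8, 1.2], [1.9, 0.1]]
flag, score = detect(frames, target, background)
print(flag)  # True
```

Averaging the ratio over frames keeps the decision threshold independent of call length; the threshold itself would have to be tuned on labeled calls.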

Subsidiary Maximum Likelihood Iterative Decoding Based on Extrinsic Information

  • Yang, Fengfan;Le-Ngoc, Tho
    • Journal of Communications and Networks, Vol. 9, No. 1, pp. 1-10, 2007
  • This paper proposes a multimodal generalized Gaussian distribution (MGGD) to effectively model the varying statistical properties of the extrinsic information. A subsidiary maximum likelihood decoding (MLD) algorithm is then developed to dynamically select the most suitable MGGD parameters for the component maximum a posteriori (MAP) decoders at each decoding iteration, yielding more reliable metrics and thus better performance. Simulation results show that, for a wide range of block lengths, the proposed approach can enhance the overall turbo decoding performance for both parallel and serially concatenated codes in additive white Gaussian noise (AWGN), Rician, and Rayleigh fading channels.
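A generalized Gaussian density f(x) = β / (2αΓ(1/β)) · exp(-(|x - μ|/α)^β) covers Laplacian (β = 1) and Gaussian (β = 2) shapes, and a mixture of such densities can describe multimodal extrinsic values. The sketch below assumes a symmetric two-mode mixture and selects, from a hypothetical fixed candidate set, the parameters with the highest likelihood on observed samples; this is only one plausible reading of "selecting the most suitable MGGD parameters," and it does not model the MAP decoders or the paper's exact selection rule. Sample data and candidate values are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

GGD = Tuple[float, float, float, float]  # (weight, mean, scale alpha, shape beta)


def ggd_pdf(x: float, mean: float, alpha: float, beta: float) -> float:
    """Generalized Gaussian density with scale alpha and shape beta."""
    return beta / (2 * alpha * math.gamma(1 / beta)) * math.exp(-(abs(x - mean) / alpha) ** beta)


def mggd_loglik(samples: Sequence[float], mixture: List[GGD]) -> float:
    return sum(math.log(sum(w * ggd_pdf(x, m, a, b) for w, m, a, b in mixture))
               for x in samples)


def select_mggd(samples: Sequence[float], candidates: List[List[GGD]]) -> int:
    """Index of the candidate mixture with the highest likelihood on the samples."""
    return max(range(len(candidates)), key=lambda i: mggd_loglik(samples, candidates[i]))


rng = random.Random(0)
# Hypothetical bimodal samples: modes at +/-2 plus Laplacian noise (shape beta = 1).
samples = [rng.choice((-2.0, 2.0)) + rng.choice((-1, 1)) * rng.expovariate(1.0)
           for _ in range(5000)]
candidates = [
    [(0.5, -2.0, math.sqrt(2), 2.0), (0.5, 2.0, math.sqrt(2), 2.0)],  # Gaussian-shaped modes
    [(0.5, -2.0, 1.0, 1.0), (0.5, 2.0, 1.0, 1.0)],                    # Laplacian-shaped modes
]
print(select_mggd(samples, candidates))  # 1
```

With β = 2 and α = √2 the density reduces to a unit-variance Gaussian, so the candidate set contains the Gaussian model as a special case; the likelihood criterion prefers the heavier-tailed shape when the samples call for it.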