• Title/Summary/Keyword: Decoding model


Phonetic Tied-Mixture Syllable Model for Efficient Decoding in Korean ASR (효율적 한국어 음성 인식을 위한 PTM 음절 모델)

  • Kim Bong-Wan;Lee Yong-Ju
    • MALSORI / no.50 / pp.139-150 / 2004
  • The Phonetic Tied-Mixture (PTM) model has been proposed for efficient decoding in large vocabulary continuous speech recognition (LVCSR) systems. PTM models are reported to decode more efficiently than triphone models by sharing a set of mixture components among states at the same topological location [5]. In this paper we propose a Phonetic Tied-Mixture Syllable (PTMS) model, which extends the PTM technique to syllables. The proposed PTMS model decodes 13% faster than PTM. Despite the difference in context-dependent modeling (PTM: cross-word context-dependent modeling; PTMS: word-internal left-phone-dependent modeling), the proposed model loses less than 1% in word accuracy relative to PTM at the same beam width. With a different beam width, it achieves better word accuracy than PTM at the same or higher speed.


Study on Decoding Strategies in Neural Machine Translation (인공신경망 기계번역에서 디코딩 전략에 대한 연구)

  • Seo, Jaehyung;Park, Chanjun;Eo, Sugyeong;Moon, Hyeonseok;Lim, Heuiseok
    • Journal of the Korea Convergence Society / v.12 no.11 / pp.69-80 / 2021
  • Neural machine translation using deep neural networks has become a mainstream research area, and considerable investment and study have gone into model structures and parallel language pairs to achieve the best performance. However, most recent neural machine translation studies leave the decoding strategy to future work, and they lack varied experiments and specific analysis of how to maximize the quality of generated language during decoding. In machine translation, decoding strategies optimize the search path followed while generating translated sentences, so performance can be improved without modifying the model or expanding the data. This paper compares and analyzes the significant effects of decoding strategies, from classical greedy decoding to the latest Dynamic Beam Allocation (DBA), in neural machine translation using a sequence-to-sequence model.
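
The contrast between greedy decoding and beam search mentioned in this abstract can be illustrated with a minimal sketch. The toy probability table `TOY_MODEL` and the function names below are hypothetical and invented for illustration; they are not taken from the paper, and Dynamic Beam Allocation is not implemented here. The sketch only shows that a wider beam can find a higher-probability sequence without changing the model.

```python
import math

# Hypothetical toy "model": next-token probabilities depend only on the
# previous token. The numbers are made up for illustration.
TOY_MODEL = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.4, "y": 0.3, "z": 0.3},
    "b": {"x": 0.9, "y": 0.1},
}


def next_token_log_probs(prev):
    """Return log-probabilities of the next token given the previous one."""
    # Tokens without an entry are assumed to end the sentence with certainty.
    dist = TOY_MODEL.get(prev, {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in dist.items()}


def beam_search(beam_width, steps):
    """Keep the `beam_width` best partial sequences at every step."""
    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, tokens so far)
    for _ in range(steps):
        candidates = []
        for score, seq in beams:
            for tok, lp in next_token_log_probs(seq[-1]).items():
                candidates.append((score + lp, seq + [tok]))
        # Prune to the highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    best_score, best_seq = beams[0]
    return best_seq[1:], math.exp(best_score)


def greedy_decode(steps):
    """Greedy decoding is beam search with a beam width of 1."""
    return beam_search(beam_width=1, steps=steps)


if __name__ == "__main__":
    print("greedy:", greedy_decode(steps=2))
    print("beam=2:", beam_search(beam_width=2, steps=2))
```

In this toy table, greedy decoding commits to the locally best first token and ends with a sequence of probability about 0.24, while a beam of width 2 keeps the second-best first token alive and reaches a sequence of probability about 0.36.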

1-Pass Semi-Dynamic Network Decoding Using a Subnetwork-Based Representation for Large Vocabulary Continuous Speech Recognition (대어휘 연속음성인식을 위한 서브네트워크 기반의 1-패스 세미다이나믹 네트워크 디코딩)

  • Chung Minhwa;Ahn Dong-Hoon
    • MALSORI / no.50 / pp.51-69 / 2004
  • In this paper, we present a one-pass semi-dynamic network decoding framework that inherits both the fast decoding speed of static network decoders and the memory efficiency of dynamic network decoders. Our method is based on a novel language model network representation that is essentially a finite state machine (FSM). The static network derived from the language model network [1][2] is partitioned into smaller subnetworks which are static by nature or self-structured. The whole network is dynamically managed so that the subnetworks required for decoding are cached in memory. The network is near-minimized by applying the tail-sharing algorithm. Our decoder is evaluated on a 25k-word Korean broadcast news transcription task. The tail-sharing algorithm reduces the search network itself by 73.4%. Compared with the equivalent static network decoder, the semi-dynamic network decoder increases decoding time by at most 6%, while it can be flexibly adapted to various memory configurations, with a minimum memory usage of 37.6% of the complete network size.


Fano Decoding with Timeout: Queuing Analysis

  • Pan, W. David;Yoo, Seong-Moo
    • ETRI Journal / v.28 no.3 / pp.301-310 / 2006
  • In mobile communications, a class of variable-complexity algorithms for convolutional decoding known as sequential decoding algorithms is of interest because their computational time can vary with changing channel conditions. The Fano algorithm is one well-known sequential decoding algorithm. Since the decoding time of a Fano decoder follows the Pareto distribution, which is a heavy-tailed distribution parameterized by the channel signal-to-noise ratio (SNR), buffers are required to absorb the variable decoding delays of Fano decoders. Furthermore, since a decoding time drawn from a Pareto distribution can become unbounded, a practical decoder often imposes a maximum limit to bound the worst-case decoding time. In this paper, we investigate the relations between buffer occupancy, decoding time, and channel conditions in a system where the Fano decoder is not allowed to run with unbounded decoding time. A timeout limit is thus imposed so that decoding is terminated if the decoding time reaches the limit. We use discrete-time semi-Markov models to describe such a Fano decoding system with timeout limits. Our queuing analysis provides expressions characterizing the average buffer occupancy as a function of channel conditions and timeout limits. Both numerical and simulation results are provided to validate the analytical results.
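
The interaction between heavy-tailed decoding times, a timeout limit, and buffer occupancy described in this abstract can be explored with a rough Monte Carlo sketch. This is not the semi-Markov model of the paper: it assumes one frame arrives every `arrival_period` time units, that decoding times are Pareto-distributed with a shape parameter `alpha` supplied directly (rather than derived from the SNR), and that a frame is abandoned once `timeout` is reached. All names and parameter values are hypothetical.

```python
import random


def simulate_fano_queue(alpha, timeout, arrival_period, n_frames, seed=0):
    """Simulate a single decoder fed by periodic arrivals.

    Decoding times are Pareto(alpha) with minimum 1, capped at `timeout`.
    Waiting times follow the Lindley recursion:
        W[n+1] = max(0, W[n] + S[n] - T)
    """
    rng = random.Random(seed)
    wait = 0.0          # time the current frame waits before decoding starts
    total_wait = 0.0
    timeouts = 0
    for _ in range(n_frames):
        total_wait += wait
        raw = rng.paretovariate(alpha)   # heavy-tailed decoding time, >= 1
        service = min(raw, timeout)      # decoding is cut off at the timeout
        if raw >= timeout:
            timeouts += 1
        wait = max(0.0, wait + service - arrival_period)
    mean_wait = total_wait / n_frames
    return {
        "mean_wait": mean_wait,
        # Little's law with arrival rate 1 / arrival_period.
        "mean_queue_len": mean_wait / arrival_period,
        "timeout_rate": timeouts / n_frames,
    }


if __name__ == "__main__":
    for limit in (2.0, 8.0, 32.0):
        stats = simulate_fano_queue(alpha=1.5, timeout=limit,
                                    arrival_period=4.0, n_frames=100_000)
        print(limit, stats)
```

Under these assumptions, raising the timeout lowers the fraction of abandoned frames but lets long decoding times hold up the frames behind them, so the mean queue length cannot decrease.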


Landmark-Guided Segmental Speech Decoding for Continuous Mandarin Speech Recognition

  • Chao, Hao;Song, Cheng
    • Journal of Information Processing Systems / v.12 no.3 / pp.410-421 / 2016
  • In this paper, we propose a framework that incorporates landmarks into a segment-based Mandarin speech recognition system. In this method, landmarks provide boundary information and phonetic class information, which are used to direct the decoding process. To validate the method, two kinds of landmarks that can be reliably detected are used to direct the decoding process of a segment model (SM) based Mandarin LVCSR (large vocabulary continuous speech recognition) system. Our experimental results show that about 30% of decoding time can be saved without an obvious decrease in recognition accuracy, which demonstrates the potential of the method.

MPEG4 decoding system modeling in SystemC (SystemC를 이용한 MPEG4 복호화 시스템 모델링)

  • 이미영;이승준;배영환
    • Proceedings of the IEEK Conference / 2001.06b / pp.109-112 / 2001
  • In this paper, I present a model of an MPEG4 decoding system in SystemC, a new C/C++-based system simulation approach. In the model, MPEG4 decoding behavior is modeled and verified. I also partition the MPEG4 decoding system into several hardware components, which will be implemented in a low-level hardware design flow, and I model synchronized communication between hardware blocks through data ports.


A Differential SFBC-OFDM for a DMB System with Multiple Antennas

  • Woo, Kyung-Soo;Lee, Kyu-In;Paik, Jong-Ho;Park, Kyung-Won;Yang, Won-Young;Cho, Yong-Soo
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.2A / pp.195-202 / 2007
  • This paper proposes a differential space-frequency block code orthogonal frequency division multiplexing (SFBC-OFDM) scheme as a multiple-input multiple-output (MIMO) transmission technique for next-generation digital multimedia broadcasting (DMB). A linear decoding method for differential SFBC (DSFBC), which performs comparably to the maximum likelihood (ML) decoding method, is derived for the cases of two or four transmit antennas. A simple table lookup method is proposed to improve the efficiency of the DSFBC encoding/decoding process for non-constant modulus constellations. A DMB MIMO channel model, developed by extending the 3GPP MIMO model to fit DMB environments, is used to compare the BER performance of differential space block code schemes in various channel environments. Simulation results show that the differential SFBC-16QAM scheme, using either four transmit antennas with one receive antenna or two transmit antennas with two receive antennas, achieves a performance gain of 12 dB over the conventional DQPSK scheme, even at twice the data rate.

On the (n, m, k)-Cast Capacity of Wireless Ad Hoc Networks

  • Kim, Hyun-Chul;Sadjadpour, Hamid R.;Garcia-Luna-Aceves, Jose Joaquin
    • Journal of Communications and Networks / v.13 no.5 / pp.511-517 / 2011
  • The capacity of wireless ad-hoc networks is analyzed for all kinds of information dissemination based on single and multiple packet reception schemes under the physical model. To represent the general information dissemination scheme, we use the (n, m, k)-cast model [1], where n, m, and k ($k \leq m$) are, respectively, the number of nodes, the number of destinations, and the number of closest destinations that actually receive packets from the source in each (n, m, k)-cast group. We first consider point-to-point communication, which implies single packet reception between transmitter-receiver pairs, and compute the (n, m, k)-cast communications. Next, the achievable throughput capacity is computed when receiver nodes are endowed with multipacket reception (MPR) capability. We adopt maximum likelihood decoding (MLD) and successive interference cancellation as the optimal and suboptimal decoding schemes for MPR, respectively. We also demonstrate that the physical and protocol models for MPR yield the same capacity when MLD is used for decoding.

Voice-Pishing Detection Algorithm Based on 3GPP2 SMV (3GPP2 SMV 기반의 보이스 피싱 검출 알고리즘)

  • Lee, Kye-Hwan;Chang, Joon-Hyuk
    • Journal of the Institute of Electronics Engineers of Korea SP / v.45 no.4 / pp.92-99 / 2008
  • We propose an effective voice-phishing detection algorithm based on the 3GPP2 selectable mode vocoder (SMV). Voice phishing is detected with a Gaussian mixture model (GMM) that uses SMV decoding parameters extracted directly from the decoding process of the transmitted speech information in the mobile phone. The experimental results indicate that SMV decoding parameters are effective in discriminating between a general voice and a phisher's voice, and that performance is acceptable when the proposed technique is applied.

Subsidiary Maximum Likelihood Iterative Decoding Based on Extrinsic Information

  • Yang, Fengfan;Le-Ngoc, Tho
    • Journal of Communications and Networks / v.9 no.1 / pp.1-10 / 2007
  • This paper proposes a multimodal generalized Gaussian distribution (MGGD) to effectively model the varying statistical properties of the extrinsic information. A subsidiary maximum likelihood decoding (MLD) algorithm is then developed to dynamically select the most suitable MGGD parameters for the component maximum a posteriori (MAP) decoders at each decoding iteration, in order to derive more reliable metrics for performance enhancement. Simulation results show that, for a wide range of block lengths, the proposed approach can enhance the overall turbo decoding performance for both parallel and serially concatenated codes in additive white Gaussian noise (AWGN), Rician, and Rayleigh fading channels.