• Title/Summary/Keyword: Decoding model

Conformer with lexicon transducer for Korean end-to-end speech recognition (Lexicon transducer를 적용한 conformer 기반 한국어 end-to-end 음성인식)

  • Son, Hyunsoo; Park, Hosung; Kim, Gyujin; Cho, Eunsoo; Kim, Ji-Hwan
    • The Journal of the Acoustical Society of Korea / v.40 no.5 / pp.530-536 / 2021
  • Recently, owing to advances in deep learning, end-to-end speech recognition, which directly maps speech signals to graphemes, has shown good performance. Among end-to-end models, the conformer shows the best performance. However, end-to-end models focus only on the probability of which grapheme appears at each time step. The decoding process uses greedy search or beam search, and this decoding method is easily affected by the final probabilities output by the model. In addition, end-to-end models cannot use external pronunciation and language information because of structural limitations. Therefore, this paper proposes a conformer with a lexicon transducer. We compare a phoneme-based model with a lexicon transducer against a grapheme-based model with beam search. The test set consists of words that do not appear in the training data. The grapheme-based conformer with beam search achieves a CER of 3.8 %, and the phoneme-based conformer with the lexicon transducer achieves a CER of 3.4 %.
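
The abstract reports results as character error rate (CER). As a rough illustration of how such a figure is commonly computed, the sketch below divides the Levenshtein edit distance between reference and hypothesis by the reference length. The function names are hypothetical, and the unit of comparison (Python code points, with spaces counted) is an assumption; the paper may normalize text differently.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `cer("abcd", "abed")` counts one substitution over four reference characters.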

High-Capacity Robust Image Steganography via Adversarial Network

  • Chen, Beijing; Wang, Jiaxin; Chen, Yingyue; Jin, Zilong; Shim, Hiuk Jae; Shi, Yun-Qing
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.1 / pp.366-381 / 2020
  • Steganography has been successfully employed in various applications, e.g., copyright control of materials, smart identity cards, and video error correction during transmission. Deep learning-based steganography models can hide information adaptively through network learning, and they have drawn increasing attention. However, the capacity, security, and robustness of existing deep learning-based steganography models are still not fully satisfactory. In this paper, three models are proposed for different cases: a basic model, a secure model, and a secure and robust model. In the basic model, high-capacity secret information hiding and extraction are realized through an encoding network and a decoding network, respectively. High-capacity steganography is implemented by hiding a secret image in a carrier image of the same resolution with the help of concat operations, InceptionBlock, and convolutional layers. Moreover, the secret image is hidden only in channel B of the carrier image to resolve the problem of color distortion. In the secure model, a steganalysis network is added to the basic model to form an adversarial network and enhance security. In the secure and robust model, an attack network is inserted into the secure model to further improve its robustness. The experimental results demonstrate that the proposed secure model and the secure and robust model perform better overall than some existing high-capacity deep learning-based steganography models. The secure model performs best in invisibility and security. The secure and robust model is the most robust against some attacks.

Large Vocabulary Continuous Speech Recognition Based on Language Model Network (언어 모델 네트워크에 기반한 대어휘 연속 음성 인식)

  • 안동훈; 정민화
    • The Journal of the Acoustical Society of Korea / v.21 no.6 / pp.543-551 / 2002
  • In this paper, we present an efficient decoding method that runs in real time for a 20k-word continuous speech recognition task. The basic search method is a one-pass Viterbi decoder operating on a search space constructed from the novel language model network. With the consistent search space representation that the LM network derives from various language models, we incorporate basic pruning strategies, and the surviving tokens constitute a dynamic search space. To facilitate post-processing, the decoder then produces a word graph and an N-best list. The decoder is tested on a 20k-word database and evaluated with respect to accuracy and RTF.

Comparison Research of Non-Target Sentence Rejection on Phoneme-Based Recognition Networks (음소기반 인식 네트워크에서의 비인식 대상 문장 거부 기능의 비교 연구)

  • Kim, Hyung-Tai; Ha, Jin-Young
    • MALSORI / no.59 / pp.27-51 / 2006
  • For speech recognition systems, a rejection function as well as a decoding function is necessary to improve reliability. There have been many research efforts on out-of-vocabulary word rejection; however, little attention has been paid to non-target sentence rejection. Recently, pronunciation approaches using speech recognition have increased the need for non-target sentence rejection to provide more accurate and robust results. In this paper, we propose a filler model method and a word/phoneme detection ratio method to implement a non-target sentence rejection system. We evaluate the filler model method with word-level, phoneme-level, and sentence-level filler models. We also perform a similar experiment using the word-level and phoneme-level word/phoneme detection ratio methods. For the performance evaluation, the minimized average of FAR and FRR is used to compare the effectiveness of each method as a function of the number of words in the given sentences. The experimental results show that the word-level methods outperform the other methods, and that the word-level filler model gives slightly better results than the word detection ratio method.
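
The evaluation metric named above, the minimized average of FAR (false acceptance rate) and FRR (false rejection rate), can be sketched as a threshold sweep. This is only an illustration under stated assumptions: each sentence is assumed to receive a scalar confidence score, it is accepted when the score is at or above a threshold, and the function names and sample scores are hypothetical rather than taken from the paper.

```python
def far_frr(target_scores, nontarget_scores, threshold):
    """Accept an utterance when its score >= threshold.

    FAR: fraction of non-target utterances wrongly accepted.
    FRR: fraction of target utterances wrongly rejected.
    """
    far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    frr = sum(s < threshold for s in target_scores) / len(target_scores)
    return far, frr


def min_average_error(target_scores, nontarget_scores):
    """Sweep candidate thresholds; return (best_average, best_threshold)."""
    candidates = sorted(set(target_scores) | set(nontarget_scores))
    candidates.append(candidates[-1] + 1.0)  # threshold that rejects everything
    best = None
    for t in candidates:
        far, frr = far_frr(target_scores, nontarget_scores, t)
        avg = (far + frr) / 2
        if best is None or avg < best[0]:
            best = (avg, t)  # ties keep the lowest threshold
    return best
```

Sweeping only the observed scores is sufficient here because FAR and FRR change only when the threshold crosses one of them.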

Performance and Energy Consumption Analysis of 802.11 with FEC Codes over Wireless Sensor Networks

  • Ahn, Jong-Suk; Yoon, Jong-Hyuk; Lee, Kang-Woo
    • Journal of Communications and Networks / v.9 no.3 / pp.265-273 / 2007
  • This paper extends an analytical performance model of 802.11 to accurately estimate the throughput and energy demand of an 802.11-based wireless sensor network (WSN) when sensor nodes employ Reed-Solomon (RS) codes, a block forward error correction (FEC) technique. The model evaluates these two metrics as a function of the channel bit error rate (BER) and the RS symbol size. Since the basic recovery unit of RS codes is a symbol rather than a bit, the symbol size affects WSN performance even if each packet carries the same number of FEC check bits. A larger symbol size is more effective at recovering long-lasting error bursts, although it increases the computational complexity of encoding and decoding RS codes. To apply the extended model to WSNs, this paper collects traffic traces from a WSN consisting of two TIP50CM sensor nodes and measures its energy consumption for processing RS codes. Based on the traces, it approximates WSN channels with Gilbert models. The computational analyses confirm that adopting RS codes in 802.11 significantly improves the throughput and energy efficiency of WSNs with a high BER. They also predict that the choice of an appropriate RS symbol size makes a large difference in throughput and power waste over short-term durations, while the symbol size rarely affects the long-term average of these metrics.
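
To make the interaction between bursty channels and RS symbol size more concrete, here is a minimal sketch of a two-state Gilbert channel together with a count of corrupted symbols. The transition probabilities, the starting state, and the function names are illustrative assumptions; they are not the parameters fitted from the paper's traces.

```python
import random


def gilbert_errors(n_bits, p_gb, p_bg, ber_bad, seed=None):
    """Simulate a two-state Gilbert channel; return a list of 0/1 error flags.

    p_gb: probability of moving from the good to the bad state per bit.
    p_bg: probability of moving from the bad to the good state per bit.
    ber_bad: bit error probability while in the bad state
             (the good state is error-free in the Gilbert model).
    """
    rng = random.Random(seed)
    bad = False  # assumption: the channel starts in the good state
    errors = []
    for _ in range(n_bits):
        errors.append(1 if bad and rng.random() < ber_bad else 0)
        # State transition for the next bit.
        if bad:
            bad = rng.random() >= p_bg
        else:
            bad = rng.random() < p_gb
    return errors


def symbol_errors(bit_errors, symbol_bits):
    """Count symbols that contain at least one bit error."""
    return sum(
        any(bit_errors[i:i + symbol_bits])
        for i in range(0, len(bit_errors), symbol_bits)
    )
```

Because a burst of adjacent bit errors tends to fall within fewer large symbols than small ones, comparing `symbol_errors` for different `symbol_bits` values on the same trace gives a feel for why symbol size matters under bursty errors.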

Performance of speech recognition unit considering morphological pronunciation variation (형태소 발음변이를 고려한 음성인식 단위의 성능)

  • Bang, Jeong-Uk; Kim, Sang-Hun; Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.10 no.4 / pp.111-119 / 2018
  • This paper proposes a method to improve speech recognition performance by extracting various pronunciations of pseudo-morpheme units from an eojeol-unit corpus and generating a new recognition unit that accounts for pronunciation variation. In the proposed method, we first align the pronunciations of the eojeol units and the pseudo-morpheme units, and then expand the pronunciation dictionary by extracting new pronunciations of the pseudo-morpheme units from the pronunciations of the eojeol units. Then, we propose a new pronunciation-dependent recognition unit by tagging the obtained phoneme symbols according to the pseudo-morpheme units. The proposed units and their extended pronunciations are incorporated into the lexicon and language model of the speech recognizer. Experiments for performance evaluation are performed using a Korean speech recognizer with a trigram language model trained on a 100-million pseudo-morpheme corpus and an acoustic model trained on 445 hours of multi-genre broadcast speech data. The proposed method is shown to reduce the word error rate by a relative 13.8% on the news-genre evaluation data and by 4.5% on the total evaluation data.

CRFNet: Context ReFinement Network used for semantic segmentation

  • Taeghyun An; Jungyu Kang; Dooseop Choi; Kyoung-Wook Min
    • ETRI Journal / v.45 no.5 / pp.822-835 / 2023
  • Recent semantic segmentation frameworks usually combine low-level and high-level context information to achieve improved performance. In addition, postlevel context information is also considered. In this study, we present a Context ReFinement Network (CRFNet) and its training method to improve the semantic predictions of segmentation models with an encoder-decoder structure. Our study is based on postprocessing methods, such as Markov and conditional random fields, which directly consider the relationship between spatially neighboring pixels of a label map. CRFNet comprises two modules: a refiner, which refines the context information from the output features of a conventional semantic segmentation network model, and a combiner, which combines the refined features with the intermediate features from the decoding process of the segmentation model to produce the final output. To train CRFNet to refine the semantic predictions more accurately, we propose a sequential training scheme. Using various backbone networks (ENet, ERFNet, and HyperSeg), we extensively evaluated our model on three large-scale, real-world datasets to demonstrate the effectiveness of our approach.

Performance of Tactics Mobile Communication System Based on UWB with Double Binary Turbo Code in Multi-User Interference Environments (다중 사용자 간섭이 존재하는 환경에서 이중이진 터보부호를 이용한 UWB 기반의 전술이동통신시스템 성능)

  • Kim, Eun-Cheol; Seo, Sung-Il; Kim, Jin-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.10 no.1 / pp.39-50 / 2010
  • In this paper, we analyze and simulate the performance of a tactics mobile communication system based on ultra wide band (UWB) in multi-user interference (MUI) environments. The system adopts a double binary turbo code for forward error correction (FEC). The wireless channel is modeled with a modified Saleh and Valenzuela (SV) model. We employ a space time block coding (STBC) scheme to enhance system performance. System performance is evaluated in terms of bit error probability. The simulation results confirm that the UWB-based tactics mobile communication system encoded with the double binary turbo code can achieve a remarkable coding gain with reasonable encoding and decoding complexity in multi-user interference environments. They also show that the bit error probability of the system can be substantially improved by increasing the number of iterations in the decoding process for a fixed code rate. In addition, we demonstrate that the double binary turbo coding scheme is very effective for increasing the number of simultaneous users for a given bit error probability requirement.

A Study on the Analysis and Detection Method for Protecting Malware Spreading via E-mail (전자우편을 이용한 악성코드 유포방법 분석 및 탐지에 관한 연구)

  • Yang, Kyeong-Cheol; Lee, Su-Yeon; Park, Won-Hyung; Park, Kwang-Cheol; Lim, Jong-In
    • Journal of the Korea Institute of Information Security & Cryptology / v.19 no.1 / pp.93-101 / 2009
  • This paper proposes a method for detecting e-mails through which hackers inject malicious code to steal information. I developed an 'Analysis model' that decodes the traffic that hackers encode in order to steal information. I also studied the 'Methodology of intrusion detection techniques' used in computer network monitoring. As a result of this simulation, I developed more efficient rules for detecting PCs infected with malicious code delivered by hacking mail. By proposing a security policy that is applicable to the computer network environments of government agencies and companies, I aim to help minimize the damage caused by hacking mail carrying malicious code.

Design and Implementation of a Reusable and Extensible HL7 Encoding/Decoding Framework (재사용성과 확장성 있는 HL7 인코딩/디코딩 프레임워크의 설계 및 구현)

  • Kim, Jung-Sun; Park, Seung-Hun; Nah, Yun-Mook
    • Journal of KIISE:Computing Practices and Letters / v.8 no.1 / pp.96-106 / 2002
  • In this paper, we propose a flexible, reusable, and extensible HL7 encoding and decoding framework using a Message Object Model (MOM) and a Message Definition Repository (MDR). The MOM provides an abstract HL7 message form represented by a group of objects and their relationships. It reflects the logical relationships among standard HL7 message elements such as segments, fields, and components, while enforcing the key structural constraints imposed by the standard. Since the MOM completely eliminates the dependency of the HL7 encoder and decoder on platform-specific data formats, it makes it possible to build the encoder and decoder as reusable standalone software components, enabling the interconnection of arbitrary heterogeneous hospital information systems (HISs) with little effort. Moreover, the MDR, an external database of key definitions for HL7 messages, helps make the encoder and decoder as resilient as possible to future modifications of the standard HL7 message formats. It is also used by the encoder and decoder to perform a well-formedness check on their respective inputs (i.e., HL7 message objects expressed in the MOM and encoded HL7 message strings). Although we implemented a prototype version of the encoder and decoder in Java, they can easily be packaged and delivered as standalone components using standard component frameworks such as ActiveX, JavaBeans, or CORBA components.
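
As a rough illustration of the segment, field, and component hierarchy mentioned above, the sketch below decodes a pipe-delimited HL7 v2 string into nested Python lists and encodes it back. It is not the paper's MOM or its Java prototype: the function names and sample message are hypothetical, the delimiters `|` and `^` are hard-coded as an assumption (a real message declares them in its MSH segment), and repetitions, subcomponents, and escape sequences are ignored.

```python
def parse_hl7(message: str):
    """Split an HL7 v2 message into segments -> fields -> components.

    Returns a list of (segment_id, fields) pairs, where each field is a
    list of component strings.
    """
    segments = []
    for line in message.replace("\n", "\r").split("\r"):
        if not line:
            continue
        raw_fields = line.split("|")
        seg_id = raw_fields[0]
        if seg_id == "MSH":
            # The encoding-characters field contains the component
            # separator itself, so keep it as one unsplit component.
            fields = [[raw_fields[1]]] + [f.split("^") for f in raw_fields[2:]]
        else:
            fields = [f.split("^") for f in raw_fields[1:]]
        segments.append((seg_id, fields))
    return segments


def encode_hl7(segments) -> str:
    """Inverse of parse_hl7: rebuild the pipe-delimited message string."""
    return "\r".join(
        "|".join([seg_id] + ["^".join(f) for f in fields])
        for seg_id, fields in segments
    )


sample = "MSH|^~\\&|HIS|HOSP\rPID|1||12345^^^HOSP||DOE^JOHN"
decoded = parse_hl7(sample)
```

Keeping the decoded form as plain nested lists means the encoder and decoder share one platform-neutral structure, which loosely mirrors the role the abstract assigns to the object model.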