• Title/Summary/Keyword: Encoder Model

Optimization of MPEG-4 AAC Codec on PDA (휴대 단말기용 MPEG-4 AAC 코덱의 최적화)

  • 김동현;김도형;정재호
    • The Journal of the Acoustical Society of Korea / v.21 no.3 / pp.237-244 / 2002
  • This paper describes the optimization of the MPEG-4 VM (Moving Picture Experts Group-4 Verification Model) GA (General Audio) AAC (Advanced Audio Coding) encoder and the design of a decoder for a PDA (Personal Digital Assistant) based on the MPEG-4 VM source. We profiled the VMC source and applied several optimization methods to the functions selected from the profiling. On an Intel Pentium III 600 MHz PC running Windows 98, encoding takes about 20 times the duration of the input sample with additional options, and about 10 times without any options. Decoding on the PDA took over 35 seconds for a 17-second input sample. After optimization, the encoding time was reduced to 50% and real-time decoding was achieved on the PDA.
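
The timing figures in this abstract can be read as real-time factors (processing time divided by audio duration). The following is a minimal sketch of that arithmetic; the function name `real_time_factor` is hypothetical, and the only inputs are the approximate numbers quoted in the abstract.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Ratio of processing time to audio duration; values <= 1 mean real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# The abstract reports "over 35 seconds" for a 17-second sample, so this is a
# lower bound on the decoder's ratio before optimization.
decode_rtf_before = real_time_factor(35.0, 17.0)

# Encoder ratios of about 20 (with options) and 10 (without), scaled by the
# reported reduction of encoding time to 50%.
encode_rtf_after = [ratio * 0.5 for ratio in (20.0, 10.0)]

print(f"decoder RTF before optimization: > {decode_rtf_before:.2f}")
print(f"encoder RTF after optimization: about {encode_rtf_after}")
```

A decoder meets the real-time requirement once this ratio drops to 1 or below, which is what the abstract reports for the optimized PDA decoder.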

An Efficient Computation of FFT for MPEG/Audio Psycho-Acoustic Model (MPEG 심리음향모델의 고속 구현을 위한 효율적 FFT 연산)

  • 송건호;이근섭;박영철;윤대희
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.6 / pp.261-269 / 2004
  • This paper proposes an efficient algorithm for computing the FFT of the psychoacoustic model in the MPEG/audio Layer Ⅲ (MP3) encoder. The proposed algorithm performs a full-band 1024-point FFT by computing 32-point FFTs of the 32 subband outputs. To reduce the aliasing caused by the analysis filter bank, an aliasing cancellation butterfly is developed. The major benefit of the proposed algorithm is its computational saving: it reduces the FFT computations by 40~50%, which lowers the complexity of PAM-2 (psychoacoustic model 2) by about 20%.
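
The abstract does not give the details of the subband-domain algorithm or its aliasing cancellation butterfly, so the sketch below shows only a related textbook idea: the Cooley-Tukey decomposition, which builds a length n1*n2 DFT from shorter DFTs and twiddle factors. It operates on time-domain samples, not on filter-bank outputs, and the names `dft` and `two_stage_dft` are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT, used as a building block and as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


def two_stage_dft(x, n1, n2):
    """Length n1*n2 DFT built from n2 DFTs of length n1, twiddle factors,
    and n1 DFTs of length n2 (Cooley-Tukey index mapping)."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Stage 1: n1-point DFT of each decimated sequence x[r], x[r + n2], ...
    inner = [dft(x[r::n2]) for r in range(n2)]
    out = [0j] * n
    for k1 in range(n1):
        # Twiddle factors W_N^(r * k1) align the n2 partial spectra.
        column = [
            inner[r][k1] * cmath.exp(-2j * cmath.pi * r * k1 / n)
            for r in range(n2)
        ]
        # Stage 2: n2-point DFT across the partial spectra.
        for k2, value in enumerate(dft(column)):
            out[k1 + n1 * k2] = value
    return out


# Small check against the naive DFT; n1 = n2 = 32 gives the 1024-point case.
signal = [complex((7 * i) % 11, 0) for i in range(64)]
max_error = max(
    abs(a - b) for a, b in zip(two_stage_dft(signal, 8, 8), dft(signal))
)
print(f"max deviation from naive DFT: {max_error:.2e}")
```

Calling `two_stage_dft(x, 32, 32)` on 1024 samples mirrors the 32 x 32 structure mentioned in the abstract, though the savings reported there come from reusing the encoder's subband outputs, which this sketch does not model.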

Understanding recurrent neural network for texts using English-Korean corpora

  • Lee, Hagyeong;Song, Jongwoo
    • Communications for Statistical Applications and Methods / v.27 no.3 / pp.313-326 / 2020
  • Deep learning is the most important key to the development of artificial intelligence (AI). There are several distinct neural network architectures, such as MLP, CNN, and RNN. Among them, we try to understand the recurrent neural network (RNN), which differs from other networks in how it handles sequential data, including time series and texts. As one of the main recent tasks in natural language processing (NLP), we consider neural machine translation (NMT) using RNNs. We also summarize the fundamental structures of recurrent networks and several topics on representing natural-language words as meaningful numeric vectors. We organize these topics to explain the estimation procedure, from representing input source sequences to predicting translated target sequences. In addition, we apply multiple translation models with Gated Recurrent Units (GRUs) in Keras to English-Korean sentences comprising about 26,000 sentence pairs in total from two different corpora, colloquialism and news. We verified some crucial factors that influence the quality of training. We found that the loss decreases with more recurrent dimensions and with a bidirectional RNN in the encoder when dealing with short sequences. We also computed BLEU scores, the main measure of translation performance, and compared them with the scores of Google Translate on the same test sentences. We summarize some difficulties in training a proper translation model and in dealing with the Korean language. Using Keras in Python for the whole workflow, from processing raw texts to evaluating the translation model, also allows us to include some useful functions and vocabulary libraries.
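
BLEU, the metric named in this abstract, combines clipped n-gram precisions with a brevity penalty. The sketch below is a minimal single-reference, sentence-level version without smoothing; the paper's exact tokenization and BLEU settings are not stated in the abstract, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in the reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def sentence_bleu(candidate, reference, max_n=4):
    """Geometric mean of the 1..max_n precisions times the brevity penalty."""
    precisions = [
        modified_precision(candidate, reference, n) for n in range(1, max_n + 1)
    ]
    if min(precisions) == 0.0:
        return 0.0  # no smoothing in this sketch
    log_mean = sum(math.log(p) for p in precisions) / max_n
    if len(candidate) > len(reference):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(reference) / len(candidate))
    return penalty * math.exp(log_mean)


reference = "the cat is on the mat".split()
print(modified_precision("the the the the the the the".split(), reference, 1))
print(sentence_bleu(reference, reference))
```

Without smoothing, short sentences often score zero because a single missing 4-gram zeroes the geometric mean, which is one reason corpus-level BLEU is usually reported instead.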

A Study on Performance of Pragmatic Coding and TCM in Rayleigh Fading Environment (Rayleigh 페이딩하에서 pragmatic 부호와 TCM의 성능에 관한 연구)

  • 강민정;방성일;진년강
    • The Proceeding of the Korean Institute of Electromagnetic Engineering and Science / v.4 no.1 / pp.20-27 / 1993
  • In this paper, a model of TCM / M-PSK with set partitioning and a model of the combined M-ary PSK system with pragmatic coding are realized for digital radio communication. The error probability equations for the TCM / M-PSK system and for the combined M-ary PSK system with pragmatic coding are derived under Rayleigh fading with AWGN. It is found that the combined M-ary PSK system with pragmatic coding designed for the AWGN channel cannot be applied to the fading channel, since the uncoded bits cause parallel paths in the trellis diagram that degrade system performance. However, using pragmatic coding in the AWGN channel could simplify the system, since only a single convolutional encoder / decoder is required.

Analysis and extraction method of noise parameters for short channel MOSFET thermal noise modeling (단채널 MOSFET의 열잡음 모델링을 위한 잡음 파라메터의 분석과 추출방법)

  • Kim, Gue-Chol
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.12 / pp.2655-2661 / 2009
  • In this paper, accurate noise parameters for thermal noise modeling of short channel MOSFETs are derived and extracted. The Fukui model for calculating the noise parameters of a MOSFET is modified to account for the effects of parasitic elements in short channel devices, and it is compared with the conventional noise model equation. In addition, to obtain the intrinsic noise sources of the devices, the noise parameters (minimum noise figure $F_{min}$, equivalent noise resistance $R_n$, and optimized source admittance $Y_{opt}=G_{opt}+jB_{opt}$) of submicron MOSFETs are extracted. With this extraction method, the intrinsic noise parameters of a MOSFET can be obtained directly from RF noise measurements, without the effects of the probe pads and extrinsic parasitic elements.
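
The three noise parameters listed in this abstract enter the standard two-port relation $F = F_{min} + (R_n/G_s)\,|Y_s - Y_{opt}|^2$, where $Y_s = G_s + jB_s$ is the source admittance. The sketch below evaluates that general relation only; it is not the paper's modified Fukui model, and the numeric values are hypothetical placeholders, not measured data.

```python
import math


def noise_factor(f_min, r_n, y_opt, y_s):
    """Linear noise factor of a two-port for source admittance y_s.

    f_min : minimum noise factor (linear, not dB)
    r_n   : equivalent noise resistance in ohms
    y_opt : optimum source admittance in siemens (complex)
    y_s   : actual source admittance in siemens (complex)
    """
    g_s = complex(y_s).real
    if g_s <= 0:
        raise ValueError("source conductance must be positive")
    return f_min + (r_n / g_s) * abs(complex(y_s) - complex(y_opt)) ** 2


def to_db(linear):
    """Convert a linear noise factor to a noise figure in dB."""
    return 10.0 * math.log10(linear)


# Hypothetical illustration values, with a 50-ohm source (0.02 S).
f = noise_factor(f_min=1.2, r_n=20.0, y_opt=0.01 + 0.005j, y_s=0.02 + 0j)
print(f"noise factor {f:.3f}, noise figure {to_db(f):.2f} dB")
```

When the source admittance equals $Y_{opt}$, the second term vanishes and the function returns $F_{min}$, which is why these parameters are extracted together.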

Towards Improving Causality Mining using BERT with Multi-level Feature Networks

  • Ali, Wajid;Zuo, Wanli;Ali, Rahman;Rahman, Gohar;Zuo, Xianglin;Ullah, Inam
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.10 / pp.3230-3255 / 2022
  • Causality mining in NLP is a significant area of interest that benefits many everyday applications, including decision making, business risk management, question answering, future event prediction, scenario generation, and information retrieval. Mining such causalities was a challenging and open problem for prior non-statistical and statistical techniques using web sources, which required hand-crafted linguistic patterns for feature engineering, depended on domain knowledge, and demanded much human effort. Those studies focused on explicit causality mining and overlooked implicit, ambiguous, and heterogeneous causality. In contrast to the statistical and non-statistical approaches, we present Bidirectional Encoder Representations from Transformers (BERT) integrated with Multi-level Feature Networks (MFN), called BERT+MFN, for causality recognition in noisy and informal web datasets without human-designed features. In our model, MFN consists of a three-column knowledge-oriented network (TC-KN), a bi-LSTM, and a Relation Network (RN) that mine causality information at the segment level, while BERT captures semantic features at the word level. We perform experiments on the Alternative Lexicalization (AltLexes) datasets. The experimental outcomes show that our model outperforms baseline causality and text mining techniques.

Boundary and Reverse Attention Module for Lung Nodule Segmentation in CT Images (CT 영상에서 폐 결절 분할을 위한 경계 및 역 어텐션 기법)

  • Hwang, Gyeongyeon;Ji, Yewon;Yoon, Hakyoung;Lee, Sang Jun
    • IEMEK Journal of Embedded Systems and Applications / v.17 no.5 / pp.265-272 / 2022
  • As the risk of lung cancer has increased, early-stage detection and treatment of cancers have received a lot of attention. Among various medical imaging approaches, computed tomography (CT) has been widely used to examine the size and growth rate of lung nodules. However, manual examination is time-consuming and causes physical and mental fatigue for medical professionals. Recently, many computer-aided diagnostic methods have been proposed to reduce this workload. In recent studies, encoder-decoder architectures have shown reliable performance in medical image segmentation and have been adopted to predict lesion candidates. However, localizing nodules in lung CT images is challenging because of the extremely small sizes and unstructured shapes of nodules. To address these problems, we add atrous spatial pyramid pooling (ASPP) to a general U-Net baseline model to minimize the loss of information and extract rich representations from various receptive fields. Moreover, we propose a mixed attention mechanism that combines reverse attention, boundary attention, and the convolutional block attention module (CBAM) to improve the accuracy of segmenting small nodules of various shapes. The proposed model is compared with several previous attention mechanisms on the LIDC-IDRI dataset, and the experimental results demonstrate that reverse, boundary, and CBAM (RB-CBAM) attention is effective in the segmentation of small nodules.
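
The "various receptive fields" in ASPP come from atrous (dilated) convolutions applied in parallel at several dilation rates. The sketch below illustrates the idea in one dimension with a shared kernel and no padding; ASPP in a real segmentation network uses 2-D feature maps, separately learned kernels per branch, and padding so that the branches can be concatenated. All names here are hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1-D atrous convolution (cross-correlation form):
    kernel taps are spaced `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * signal[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilation):
    """Number of input samples covered by one output sample."""
    return (kernel_size - 1) * dilation + 1


def aspp_1d(signal, kernel, rates):
    """Apply the same kernel at several dilation rates, one branch per rate."""
    return {rate: dilated_conv1d(signal, kernel, rate) for rate in rates}


samples = [1, 2, 3, 4, 5, 6, 7]
branches = aspp_1d(samples, kernel=[1, 1, 1], rates=(1, 2, 3))
for rate, output in branches.items():
    print(rate, receptive_field(3, rate), output)
```

Larger rates widen the receptive field without adding weights, which is how ASPP gathers context at several scales from the same feature map.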

A medium-range streamflow forecasting approach over South Korea using Double-encoder-based transformer model (다중 인코더 기반의 트랜스포머 모델을 활용한 한반도 대규모 유역에 중장기 유출량 예측 전망 방법 제시)

  • Dong Gi Lee;Sung-Hyun Yoon;Kuk-Hyun Ahn
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.101-101 / 2023
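  • Over the past several decades, a variety of deep learning methods have been developed, and in hydrology it has been suggested that such deep learning models could take over the role of conventional hydrological models. This study examines the feasibility of medium-range streamflow forecasting for South Korea, at lead times of 1 to 10 days, using a transformer model with multiple encoders. The transformer model consists of an encoder-decoder structure and uses the attention mechanism to compensate for the information loss of earlier models. The multi-encoder transformer model used in this study adds one more encoder to the transformer's encoder-decoder structure. For comparison, a forecasting model based on a stacking ensemble model that uses existing hydrological models was also built. The models were evaluated by dividing all of South Korea into 469 large grid cells and comparing the streamflow in each cell. As a result, the multi-encoder transformer model, a deep learning model, outperformed the hydrological models at longer lead times, confirming that deep learning models can to some extent take over the role of hydrological models while achieving high performance.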
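
The attention mechanism mentioned in this abstract is, in the standard transformer, scaled dot-product attention: each query is compared with all keys, the scores are normalized with a softmax, and the values are averaged with those weights. The sketch below shows only that generic operation in plain Python; how the paper combines its two encoders is not described in the abstract, so that part is not modeled, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Two identical keys receive equal weight, so the output is the mean value.
print(attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
))
```

Because every output is a weighted mix of all input positions, information from distant time steps is not squeezed through a single fixed-size state, which is the information-loss issue the abstract refers to.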

Reference-based Utterance Generation Model using Multi-turn Dialogue (멀티턴 대화를 활용한 레퍼런스 기반의 발화 생성 모델)

  • Sangmin Park;Yuri Son;Bitna Keum;Hongjin Kim;Harksoo Kim;Jaieun Kim
    • Annual Conference on Human and Language Technology / 2022.10a / pp.88-91 / 2022
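  • As the use of and demand for chit-chat grows in areas such as digital humans, civil complaint counseling, and ARS, various studies are under way to improve chit-chat performance. In particular, auto-encoder-based generative models show high performance and continue to be studied, but they have difficulty reflecting sufficient contextual information from previous dialogue turns and can generate grammatically inappropriate responses. To address this, retrieval-based generative models are being studied, but there is little work on retrieving dialogues in a way that reflects the multi-turn characteristic that, even when the current sentences are similar, the intent and the response differ depending on the preceding sentences. This paper proposes a retrieval method that takes this characteristic of multi-turn dialogue into account, and a training scheme in which the retrieved reference (a near-correct answer sentence) is used together with the multi-turn dialogue as input to the generative model. In a comparative evaluation against an existing model, the utterance generation model trained with the proposed scheme improved by 13.11 points in Rouge-1, 10.09 points in Rouge-2, and 13.2 points in Rouge-L, which demonstrates the superiority of the proposed scheme.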
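
The Rouge-1, Rouge-2, and Rouge-L scores reported in this abstract measure n-gram overlap and longest-common-subsequence overlap with a reference sentence. The sketch below computes the recall variants on whitespace tokens; the abstract does not say which variant (recall, precision, or F-score) or which tokenizer the authors used, so this is an illustration of the metrics only, with hypothetical function names.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """Fraction of reference n-grams that also appear in the candidate."""
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """LCS length divided by the reference length."""
    if not reference:
        return 0.0
    return lcs_length(candidate, reference) / len(reference)


generated = "the cat sat on the mat".split()
gold = "the cat is on the mat".split()
print(rouge_n_recall(generated, gold, 1))
print(rouge_n_recall(generated, gold, 2))
print(rouge_l_recall(generated, gold))
```

Rouge-L rewards words that appear in the same order without requiring them to be adjacent, so it is less sensitive than Rouge-2 to a single substituted word.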

Hybrid model-based and deep learning-based metal artifact reduction method in dental cone-beam computed tomography

  • Jin Hur;Yeong-Gil Shin;Ho Lee
    • Nuclear Engineering and Technology / v.55 no.8 / pp.2854-2863 / 2023
  • Objective: To present a hybrid approach that incorporates a constrained beam-hardening estimator (CBHE) and deep learning (DL)-based post-refinement for metal artifact reduction in dental cone-beam computed tomography (CBCT). Methods: The CBHE is derived from a polychromatic X-ray attenuation model with respect to X-ray transmission length, and its associated parameters are calculated numerically. DL-based post-refinement with an artifact disentanglement network (ADN) is performed to mitigate the remaining dark shading regions around metal. The ADN supports an unsupervised learning approach, in which no paired CBCT images are required. The network consists of an encoder that separates artifacts and content and a decoder for the content. Additionally, the ADN with data normalization replaces metal regions with values from bone or soft tissue regions. Finally, the metal regions obtained from the CBHE are blended into the reconstructed images. The proposed approach is systematically assessed using a dental phantom with two types of metal objects for qualitative and quantitative comparisons. Results: The proposed hybrid scheme provides improved image quality in areas surrounding the metal while preserving native structures. Conclusion: This study may significantly improve the detection of areas of interest in many dentomaxillofacial applications.