• Title/Summary/Keyword: New encoder/decoder

Search Result 68, Processing Time 0.023 seconds

A H.264 based Selective Fine Granular Scalable Coding Scheme (H.264 기반 선택적인 미세입자 스케일러블 코딩 방법)

  • 박광훈;유원혁;김규헌
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.10 no.4
    • /
    • pp.309-318
    • /
    • 2004
  • This paper proposes the H.264-based selective fine granular scalable (FGS) coding scheme that selectively uses the temporal prediction data in the enhancement layer. The base layer of the proposed scheme is basically coded by the H.264 (MPEG-4 Part 10 AVC) visual coding scheme that is the state-of-art in codig efficiency. The enhancement layer is basically coded by the same bitplane-based algorithm of the MPEG-4 (Part 2) fine granular scalable coding scheme. In this paper, we introduce a new algorithm that uses the temproal prediction mechanism inside the enhancement layer and the effective selection mechanism to decide whether the temporally-predicted data would be sent to the decoder or not. Whenever applying the temporal prediction inside the enhancement layer, the temporal redundancies may be effectively reduced, however the drift problem would be severly occurred especially at the low bitrate transmission, due to the mismatch bewteen the encoder's and decoder's reference frame images. Proposed algorithm selectively uses the temporal-prediction data inside the enhancement layer only in case those data could siginificantly reduce the temporal redundancies, to minimize the drift error and thus to improve the overall coding efficiency. Simulation results, based on several test image sequences, show that the proposed scheme has 1∼3 dB higher coding efficiency than the H.264-based FGS coding scheme, even 3∼5 dB higher coding efficiency than the MPEG-4 FGS international standard.

A PDWZ Encoder Using Code Conversion and Bit Interleaver (코드변환과 비트 인터리버를 이용한 화소영역 Wyner-Ziv 부호화 기법)

  • Kim, Jin-Soo;Kim, Jae-Gon;Seo, Kwang-Deok
    • Journal of Broadcast Engineering
    • /
    • v.15 no.1
    • /
    • pp.52-62
    • /
    • 2010
  • Recently, DVC (Distributed Video Coding) is attracting a lot of research works since this enables us to implement a light-weight video encoder by distributing the high complex tasks such as motion estimation into the decoder side. In order to improve the coding efficiency of the DVC, the existing works have been focused on the efficient generation of side information (SI) or the virtual channel modeling which can describe the statistical channel noise well. But, in order to improve the overall performance, this paper proposes a new scheme that can be implemented with simple bit operations without introducing complex operation. That is, the performance of the proposed scheme is enhanced by using the fact that the Wyner-Ziv (WZ) frame and side information are highly correlated, and by reducing the effect of virtual channel noise which tends to be clustered in some regions. For this aim, this paper proposes an efficient pixel-domain WZ (PDWZ) CODEC which effectively exploits the statistical redundancy by using the code conversion and Gray code, and then reduces the channel noise by using the bit interleaver. Through computer simulations, it is shown that the proposed scheme can improve the performance up to 0.5 dB in objective visual quality.

Latent Shifting and Compensation for Learned Video Compression (신경망 기반 비디오 압축을 위한 레이턴트 정보의 방향 이동 및 보상)

  • Kim, Yeongwoong;Kim, Donghyun;Jeong, Se Yoon;Choi, Jin Soo;Kim, Hui Yong
    • Journal of Broadcast Engineering
    • /
    • v.27 no.1
    • /
    • pp.31-43
    • /
    • 2022
  • Traditional video compression has developed so far based on hybrid compression methods through motion prediction, residual coding, and quantization. With the rapid development of technology through artificial neural networks in recent years, research on image compression and video compression based on artificial neural networks is also progressing rapidly, showing competitiveness compared to the performance of traditional video compression codecs. In this paper, a new method capable of improving the performance of such an artificial neural network-based video compression model is presented. Basically, we take the rate-distortion optimization method using the auto-encoder and entropy model adopted by the existing learned video compression model and shifts some components of the latent information that are difficult for entropy model to estimate when transmitting compressed latent representation to the decoder side from the encoder side, and finally compensates the distortion of lost information. In this way, the existing neural network based video compression framework, MFVC (Motion Free Video Compression) is improved and the BDBR (Bjøntegaard Delta-Rate) calculated based on H.264 is nearly twice the amount of bits (-27%) of MFVC (-14%). The proposed method has the advantage of being widely applicable to neural network based image or video compression technologies, not only to MFVC, but also to models using latent information and entropy model.

Korean Abbreviation Generation using Sequence to Sequence Learning (Sequence-to-sequence 학습을 이용한 한국어 약어 생성)

  • Choi, Su Jeong;Park, Seong-Bae;Kim, Kweon-Yang
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.3
    • /
    • pp.183-187
    • /
    • 2017
  • Smart phone users prefer fast reading and texting. Hence, users frequently use abbreviated sequences of words and phrases. Nowadays, abbreviations are widely used from chat terms to technical terms. Therefore, gathering abbreviations would be helpful to many services, including information retrieval, recommendation system, and so on. However, manually gathering abbreviations needs to much effort and cost. This is because new abbreviations are continuously generated whenever a new material such as a TV program or a phenomenon is made. Thus it is required to generate of abbreviations automatically. To generate Korean abbreviations, the existing methods use the rule-based approach. The rule-based approach has limitations, in that it is unable to generate irregular abbreviations. Another problem is to decide the correct abbreviation among candidate abbreviations generated rules. To address the limitations, we propose a method of generating Korean abbreviations automatically using sequence-to-sequence learning in this paper. The sequence-to-sequence learning can generate irregular abbreviation and does not lead to the problem of deciding correct abbreviation among candidate abbreviations. Accordingly, it is suitable for generating Korean abbreviations. To evaluate the proposed method, we use dataset of two type. As experimental results, we prove that our method is effective for irregular abbreviations.

A Real-time H.264 to MPEG-2 Transcoding for Ship to Shore Communication (선박-육지간 통신을 위한 실시간 H.264 to MPEG-2 트랜스코딩)

  • Son, Nam-Rye;Jeong, Min-A;Lee, Seong-Ro
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.48 no.1
    • /
    • pp.90-102
    • /
    • 2011
  • Recently, the grade of users using wireless communication services which transmits and re-transmits to the signal via the broadcasting satellite have a variety. However the ships not preparing of H.264 standard devices should not received the realtime data because the broadcasting stations have transmitted the compressed video data through the satellite communication. Therefore this paper proposes H.264 to MPEG-2 transcoding for the ships using MPEG-2 devices. Proposed method improves a speed and object quality in H.264 to MPEG-2 transcoding by analysis features of macroblock modes in H.264. In the Intra mode of P-frame, it proposes new method by computing coincidence proportion after comparing of Intra mode methods of H.264 and MPEG-2. In the Inter mode, it proposes a PMV(predictive motion vector) considering movement of motion vectors in H.264 decoder. we reuses a PMV directly as like the final MV in MPEG-2 encoder and refinements the MV after coincidence ratio comparing of variable motion vectors of H.264 and these of MPEG-2. The experimental results from proposed method show a considerable reduction in processing time, as much as 70% and 67% respectively, with a small objective quality reduction in PSNR.

Automatic Text Summarization based on Selective Copy mechanism against for Addressing OOV (미등록 어휘에 대한 선택적 복사를 적용한 문서 자동요약)

  • Lee, Tae-Seok;Seon, Choong-Nyoung;Jung, Youngim;Kang, Seung-Shik
    • Smart Media Journal
    • /
    • v.8 no.2
    • /
    • pp.58-65
    • /
    • 2019
  • Automatic text summarization is a process of shortening a text document by either extraction or abstraction. The abstraction approach inspired by deep learning methods scaling to a large amount of document is applied in recent work. Abstractive text summarization involves utilizing pre-generated word embedding information. Low-frequent but salient words such as terminologies are seldom included to dictionaries, that are so called, out-of-vocabulary(OOV) problems. OOV deteriorates the performance of Encoder-Decoder model in neural network. In order to address OOV words in abstractive text summarization, we propose a copy mechanism to facilitate copying new words in the target document and generating summary sentences. Different from the previous studies, the proposed approach combines accurate pointing information and selective copy mechanism based on bidirectional RNN and bidirectional LSTM. In addition, neural network gate model to estimate the generation probability and the loss function to optimize the entire abstraction model has been applied. The dataset has been constructed from the collection of abstractions and titles of journal articles. Experimental results demonstrate that both ROUGE-1 (based on word recall) and ROUGE-L (employed longest common subsequence) of the proposed Encoding-Decoding model have been improved to 47.01 and 29.55, respectively.

Development of Fender Segmentation System for Port Structures using Vision Sensor and Deep Learning (비전센서 및 딥러닝을 이용한 항만구조물 방충설비 세분화 시스템 개발)

  • Min, Jiyoung;Yu, Byeongjun;Kim, Jonghyeok;Jeon, Haemin
    • Journal of the Korea institute for structural maintenance and inspection
    • /
    • v.26 no.2
    • /
    • pp.28-36
    • /
    • 2022
  • As port structures are exposed to various extreme external loads such as wind (typhoons), sea waves, or collision with ships; it is important to evaluate the structural safety periodically. To monitor the port structure, especially the rubber fender, a fender segmentation system using a vision sensor and deep learning method has been proposed in this study. For fender segmentation, a new deep learning network that improves the encoder-decoder framework with the receptive field block convolution module inspired by the eccentric function of the human visual system into the DenseNet format has been proposed. In order to train the network, various fender images such as BP, V, cell, cylindrical, and tire-types have been collected, and the images are augmented by applying four augmentation methods such as elastic distortion, horizontal flip, color jitter, and affine transforms. The proposed algorithm has been trained and verified with the collected various types of fender images, and the performance results showed that the system precisely segmented in real time with high IoU rate (84%) and F1 score (90%) in comparison with the conventional segmentation model, VGG16 with U-net. The trained network has been applied to the real images taken at one port in Republic of Korea, and found that the fenders are segmented with high accuracy even with a small dataset.

Prediction of Music Generation on Time Series Using Bi-LSTM Model (Bi-LSTM 모델을 이용한 음악 생성 시계열 예측)

  • Kwangjin, Kim;Chilwoo, Lee
    • Smart Media Journal
    • /
    • v.11 no.10
    • /
    • pp.65-75
    • /
    • 2022
  • Deep learning is used as a creative tool that could overcome the limitations of existing analysis models and generate various types of results such as text, image, and music. In this paper, we propose a method necessary to preprocess audio data using the Niko's MIDI Pack sound source file as a data set and to generate music using Bi-LSTM. Based on the generated root note, the hidden layers are composed of multi-layers to create a new note suitable for the musical composition, and an attention mechanism is applied to the output gate of the decoder to apply the weight of the factors that affect the data input from the encoder. Setting variables such as loss function and optimization method are applied as parameters for improving the LSTM model. The proposed model is a multi-channel Bi-LSTM with attention that applies notes pitch generated from separating treble clef and bass clef, length of notes, rests, length of rests, and chords to improve the efficiency and prediction of MIDI deep learning process. The results of the learning generate a sound that matches the development of music scale distinct from noise, and we are aiming to contribute to generating a harmonistic stable music.