• Title/Summary/Keyword: Encoder Model

ViStoryNet: Neural Networks with Successive Event Order Embedding and BiLSTMs for Video Story Regeneration (ViStoryNet: 비디오 스토리 재현을 위한 연속 이벤트 임베딩 및 BiLSTM 기반 신경망)

  • Heo, Min-Oh;Kim, Kyung-Min;Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices / v.24 no.3 / pp.138-144 / 2018
  • Video is a vivid medium that resembles human visual-linguistic experience, since it conveys a sequence of situations, actions, or dialogues that can be told as a story. In this study, we propose story learning and regeneration frameworks for videos that use successive event order supervision for contextual coherence. The supervision induces each episode to form a trajectory in the latent space, which yields a composite representation of ordering and semantics. We use kids' videos as training data. Their advantages include an omnibus style, short and simple/explicit storylines, chronological narrative order, and a relatively limited number of characters and spatial environments. We build an encoder-decoder structure with successive event order embedding (SEOE) and train bi-directional LSTMs as sequence models for multi-step sequence prediction. Using approximately 200 episodes of the kids' video series 'Pororo the Little Penguin', we give empirical results for story regeneration tasks and SEOE. In addition, each episode shows a trajectory-like shape in the latent space of the model, which provides geometric information for the sequence models.

Fast Decision Method of Adaptive Motion Vector Resolution (적응적 움직임 벡터 해상도 고속 결정 기법)

  • Park, Sang-hyo
    • Journal of Broadcast Engineering / v.25 no.3 / pp.305-312 / 2020
  • As demand grows for a new video coding standard with higher coding efficiency than existing standards, MPEG and VCEG have been developing and standardizing the next-generation video coding project, Versatile Video Coding (VVC). Many inter prediction techniques have been introduced to increase coding efficiency; among them, adaptive motion vector resolution (AMVR) has contributed to increasing the efficiency of VVC. However, the best motion vector can only be determined by computing many rate-distortion costs, which increases encoding complexity. Reducing this complexity is necessary for real-time video broadcasting and streaming services, yet reducing the complexity of AMVR remains an open research topic. This paper therefore proposes an efficient technique that reduces the encoding complexity of AMVR. The proposed method exploits a VVC-specific tree structure (the multi-type tree structure) to accelerate the AMVR decision process. Experimental results show that the proposed decision method reduces the encoding complexity of the VVC test model by 10% with a negligible loss of coding efficiency.
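The rate-distortion decision mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard Lagrangian cost J = D + λR and picks the candidate with the lowest cost. The function name, the candidate labels, and the sample numbers in the usage note are hypothetical, and this is the exhaustive baseline decision, not the fast method proposed in the paper.

```python
def best_by_rd_cost(candidates, lam):
    """Return (label, cost) of the candidate with the lowest Lagrangian
    rate-distortion cost J = D + lam * R.

    candidates: iterable of (label, distortion, rate_bits) tuples.
    lam: Lagrange multiplier trading distortion against rate.
    """
    best_label, best_cost = None, float("inf")
    for label, distortion, rate_bits in candidates:
        cost = distortion + lam * rate_bits
        # Strict comparison keeps the first candidate when costs tie.
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label, best_cost
```

For example, calling `best_by_rd_cost([("quarter", 100, 20), ("integer", 110, 12), ("four", 150, 8)], lam=2)` evaluates one cost per candidate resolution. An encoder that repeats this for every block and every candidate pays the complexity the abstract describes.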

A Case Study on Integrated Surveillance System Field Implement with Intelligent Video Analytic Software (지능형 영상 분석 소프트웨어를 탑재한 종합 감시 시스템 현장 구축에 관한 사례 연구)

  • Jeon, Ji-Hye;Ahn, Tae-Ki;Park, Kwang-Young;Park, Goo-Man
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.11 no.6 / pp.255-260 / 2011
  • Security in urban transit systems is widely regarded as a common concern. A safe urban transit system is in high demand because of the vast number of daily passengers, and providing safety is one of the most challenging tasks. We introduce a test model of an integrated security system for urban transit and built it at a subway station to demonstrate its performance. The system consists of cameras, a sensor network, and central monitoring software. We describe the smart camera functionality in more detail. The proposed smart camera includes a moving-object recognition module, video analytics, a video encoder, and a server module that transmits video and audio information. We demonstrated the system's excellent performance.

Fast CU Encoding Schemes Based on Merge Mode and Motion Estimation for HEVC Inter Prediction

  • Wu, Jinfu;Guo, Baolong;Hou, Jie;Yan, Yunyi;Jiang, Jie
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.3 / pp.1195-1211 / 2016
  • The emerging video coding standard High Efficiency Video Coding (HEVC) has shown almost 40% bit-rate reduction over the state-of-the-art Advanced Video Coding (AVC) standard, but at about 40% computational complexity overhead. The main source of HEVC's computational complexity is inter prediction, which accounts for 60%-70% of the total encoding time. In this paper, we propose several fast coding unit (CU) encoding schemes based on the Merge mode and motion estimation information to reduce the computational complexity caused by HEVC inter prediction. First, an early Merge mode decision method based on motion estimation (EMD) is proposed for each CU size. Then, a Merge mode based early termination method (MET) is developed to determine the CU size at an early stage. To provide a better balance between computational complexity and coding efficiency, several fast CU encoding schemes are surveyed according to the rate-distortion-complexity characteristics of the EMD and MET methods as a function of CU size. These fast CU encoding schemes can be seamlessly incorporated into the existing control structures of the HEVC encoder without limiting its potential for parallelization and hardware acceleration. Experimental results demonstrate that the proposed schemes achieve a 19%-46% computational complexity reduction over the HEVC test model reference software, HM 16.4, at the cost of a 0.2%-2.4% bit-rate increase under the random access coding configuration. The respective values under the low-delay B coding configuration are 17%-43% and 0.1%-1.2%.

A fast block-matching algorithm using the slice-competition method (슬라이스 경쟁 방식을 이용한 고속 블럭 정합 알고리즘)

  • Jeong, Yeong-Hun;Kim, Jae-Ho
    • Journal of the Institute of Electronics Engineers of Korea SP / v.38 no.6 / pp.692-702 / 2001
  • In this paper, a new block-matching algorithm for standard video encoders is proposed. The algorithm finds a motion vector using the increasing sum of absolute differences (SAD) transition curve of each predefined candidate, rather than the coarse-to-fine approach of conventional methods. To remove low-probability candidates at an early stage of accumulation, a dispersed accumulation matrix is also proposed. This matrix guarantees high linearity of the SAD transition curve. Based on this method, we present a new fast block-matching algorithm with the slice competition technique. The Candidate Selection Step and the Candidate Competition Step together yield a better-performing model that considerably reduces the computational load and avoids being trapped in local minima. The computational load is reduced by 10%~70% compared with conventional BMAs, and computation time is reduced by 18%~35%. Finally, the average MAD is consistently low across various bit-streams and is very similar to the MAD of the full-search block-matching algorithm.
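For context, the full-search baseline that the abstract above compares against can be sketched as follows. This is a minimal pure-Python illustration, assuming frames are lists of rows of integer pixel values. The function names and parameters are hypothetical, and the sketch does not implement the slice-competition method or the dispersed accumulation matrix from the paper.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in
    `cur` and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Test every displacement within +/- search_range and return
    (best_dx, best_dy, best_sad). Displacements that would read outside
    the reference frame are skipped; returns None if none are valid."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

Full search accumulates the complete SAD for every candidate, which is the cost that fast algorithms try to avoid, for instance by discarding unlikely candidates before their accumulation finishes.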

Design of a Variable Bit Rate Speech Coder Based on One-dimensional SPIHT (1차원 SPIHT를 이용한 가변 비트율 음성 부호기의 설계)

  • Na, Hoon;Jeong, Dae-Gwon
    • The Journal of the Acoustical Society of Korea / v.22 no.6 / pp.443-451 / 2003
  • Because a codebook-based CELP coder models its excitation signal according to one of several bit rates pre-assigned to codebooks and synthesizes the speech signal using those codebooks, it cannot encode speech at an arbitrary bit rate within a single encoder. The proposed variable bit rate speech coder encodes the excitation signal according to the bit rate assigned to the current speech frame, using one-dimensional SPIHT and the wavelet transform. It does not need to model the excitation signal (or codebook) as one of a fixed set of types, as a CELP coder does, and it can encode the excitation signal at various bit rates according to user requirements without exact pitch information. Because the coder has no codebook structure, it has relatively low complexity and provides speech quality equal to or better than that of the G.729 and G.723.1 coders.

A Deep Neural Network Model Based on a Mutation Operator (돌연변이 연산 기반 효율적 심층 신경망 모델)

  • Jeon, Seung Ho;Moon, Jong Sub
    • KIPS Transactions on Software and Data Engineering / v.6 no.12 / pp.573-580 / 2017
  • A Deep Neural Network (DNN) is a large neural network consisting of many layers of non-linear units. Deep learning, as represented by DNNs, has been applied very successfully in various applications. However, past research has identified many issues with DNNs, of which generalization is the best known. A recent technique, Dropout, successfully addressed this problem. Dropout also acts as noise and thus helps the network learn robust features during training, as in a Denoising AutoEncoder. However, because Dropout requires a large amount of computation, training takes a long time. Since Dropout keeps changing the inter-layer representation during training, the learning rate must be small, which lengthens training further. In this paper, we use a mutation operation to reduce computation and improve generalization performance compared with Dropout. We compared the proposed method with Dropout experimentally and showed that our method is superior.
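As a reference point for the Dropout baseline discussed in the abstract above, here is a minimal sketch of the common "inverted dropout" formulation applied to one layer's activations at training time. The function name and signature are hypothetical, and the paper's mutation operator is not shown because the abstract does not describe it.

```python
import random


def dropout(activations, drop_prob, rng):
    """Zero each activation with probability drop_prob and scale the
    survivors by 1 / (1 - drop_prob) so the expected value is unchanged.

    rng: a random.Random instance, passed in so runs are reproducible.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]
```

A fresh random mask is drawn on every call, so the representation passed to the next layer changes from step to step, which is the behaviour the abstract refers to.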

Position estimation method based on the optical displacement sensor for an autonomous hull cleaning robot (선체 청소로봇 자동화를 위한 광 변위센서 기반의 위치추정 방법)

  • Kang, Hoon;Ham, Youn-jae;Oh, Jin-seok
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.2 / pp.385-393 / 2016
  • This paper presents a new position estimation method that combines an optical displacement sensor with a dead-reckoning-based position estimation algorithm for the automation of a hull cleaning robot. To evaluate the feasibility of the proposed method, it was applied to a small-scale robot model that uses the same drive method as the hull cleaning robot, and a set of position estimation experiments was performed. The experimental results demonstrate that estimates obtained with the optical displacement sensors are more accurate than those obtained with the previously used rotary encoder method. In addition, the method continuously calculated a robot position quite close to the real driving path. In a follow-up study, the proposed method will be extended and applied to the actual hull cleaning robot by adding sensor modules that correct measurement errors.
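The dead-reckoning idea named in the abstract above can be sketched in a few lines. This is a generic planar illustration that assumes each sensor reading has already been converted to a body-frame displacement (dx, dy) and a heading change dtheta in radians. The function name and the input format are hypothetical, and how the paper derives these quantities from its optical displacement sensors is not stated in the abstract.

```python
import math


def dead_reckon(start, increments):
    """Integrate body-frame increments into a world-frame path.

    start: initial pose (x, y, theta), theta in radians.
    increments: iterable of (dx, dy, dtheta) measured in the robot frame.
    Returns the list of poses, beginning with the start pose.
    """
    x, y, theta = start
    path = [(x, y, theta)]
    for dx, dy, dtheta in increments:
        # Rotate the body-frame displacement into the world frame using
        # the heading held at the start of this step.
        x += dx * math.cos(theta) - dy * math.sin(theta)
        y += dx * math.sin(theta) + dy * math.cos(theta)
        theta += dtheta
        path.append((x, y, theta))
    return path
```

Because each pose is built on the previous one, measurement errors accumulate along the path, which is consistent with the abstract's plan to add sensor modules that correct measurement errors.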

Comparison Analysis of Four Face Swapping Models for Interactive Media Platform COX (인터랙티브 미디어 플랫폼 콕스에 제공될 4가지 얼굴 변형 기술의 비교분석)

  • Jeon, Ho-Beom;Ko, Hyun-kwan;Lee, Seon-Gyeong;Song, Bok-Deuk;Kim, Chae-Kyu;Kwon, Ki-Ryong
    • Journal of Korea Multimedia Society / v.22 no.5 / pp.535-546 / 2019
  • Recently, there has been a lot of research on whole-face replacement systems, but stable results are not easy to obtain because of varied poses, angles, and facial diversity. To produce a natural synthesis result when replacing a face shown in a video image, technologies such as face area detection, feature extraction, face alignment, face area segmentation, 3D pose adjustment, and facial transposition must all operate precisely, and each technology must be able to be combined interdependently with the others. Our analysis shows that, within facial replacement technology, implementation difficulty and contribution to the system are high for facial feature point extraction and facial alignment. In contrast, the facial transposition and three-dimensional pose adjustment techniques were less difficult, but still showed a need for development. In this paper, we propose four facial replacement models suitable for the COX platform: 2-D Faceswap, OpenPose, Deepfake, and Cycle GAN. The set includes a model suited to frontal face pose image conversion, a model for face pose images with active body movement, a model for face movement of 15 degrees to the left and right, and a Generative Adversarial Network.

End-to-end speech recognition models using limited training data (제한된 학습 데이터를 사용하는 End-to-End 음성 인식 모델)

  • Kim, June-Woo;Jung, Ho-Young
    • Phonetics and Speech Sciences / v.12 no.4 / pp.63-71 / 2020
  • Speech recognition is one of the areas being actively commercialized using deep learning and machine learning techniques. However, most speech recognition systems on the market are developed on data with limited speaker diversity and tend to perform well only on typical adult speakers, because most speech recognition models are trained on speech databases collected from adult males and females. This tends to cause problems in recognizing the speech of the elderly, children, and dialect speakers. Solving these problems may require maintaining a large database or collecting data for speaker adaptation. Instead, this paper proposes a new end-to-end speech recognition method consisting of an acoustic augmented recurrent encoder and a transformer decoder with linguistic prediction. The proposed method can deliver reliable acoustic and language model performance under limited-data conditions. It was evaluated on recognizing the speech of Korean elderly speakers and children with a limited amount of training data and performed better than a conventional method.