• Title/Summary/Keyword: contextual mode information


Enhanced Inter Mode Decision Based on Contextual Prediction for P-Slices in H.264/AVC Video Coding

  • Kim, Byung-Gyu;Song, Suk-Kyu
    • ETRI Journal
    • /
    • v.28 no.4
    • /
    • pp.425-434
    • /
    • 2006
  • We propose a fast macroblock mode prediction and decision algorithm based on contextual information for P-slices in the H.264/AVC video standard, in which the mode prediction part is composed of intra and inter modes. There are nine $4{\times}4$ and four $16{\times}16$ modes in the intra mode prediction, and seven block types exist for the best coding gain based on rate-distortion optimization. This scheme gives rise to an exhaustive search in the coding procedure. To overcome this problem, a fast inter mode prediction scheme is applied that uses contextual mode information for P-slices. We verify the performance of the proposed scheme through a comparative analysis of experimental results. The suggested mode search procedure increased speed by more than 57% compared to a full mode search and by more than 20% compared to the other methods.


Speech Recognition by Integrating Audio, Visual and Contextual Features Based on Neural Networks (신경망 기반 음성, 영상 및 문맥 통합 음성인식)

  • 김명원;한문성;이순신;류정우
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.41 no.3
    • /
    • pp.67-77
    • /
    • 2004
  • Recent research has focused on the fusion of audio and visual features for reliable speech recognition in noisy environments. In this paper, we propose a neural network-based model of robust speech recognition that integrates audio, visual, and contextual information. The Bimodal Neural Network (BMNN) is a multi-layer perceptron of four layers, each of which performs a certain level of abstraction of the input features. In the BMNN, the third layer combines the audio and visual features of speech to compensate for the loss of audio information caused by noise. To improve the accuracy of speech recognition in noisy environments, we also propose a post-processing step based on contextual information, which consists of sequential patterns of words spoken by a user. Our experimental results show that our model outperforms any single-mode model. In particular, when we use the contextual information, we obtain over 90% recognition accuracy even in noisy environments, which is a significant improvement over the state of the art in speech recognition. Our research demonstrates that diverse sources of information need to be integrated to improve the accuracy of speech recognition, particularly in noisy environments.

A Word Dictionary Structure for the Postprocessing of Hangul Recognition (한글인식 후처리용 단어사전의 기억구조)

  • ;Yoshinao Aoki
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.19 no.9
    • /
    • pp.1702-1709
    • /
    • 1994
  • In the postprocessing stage of a Hangul recognition system, the storage structure of contextual information strongly affects the recognition rate and speed of the entire system. A trie is generally used to represent the context as a word dictionary, but the memory space efficiency of this structure is low. We therefore propose a new word dictionary structure that has better space efficiency while keeping the merits of a trie. Because Hangul is a compound language, it can be represented by phonemes or by characters. In the representation by phonemes (P-mode) the retrieval is fast, but the space efficiency is low. In the representation by characters (C-mode) the space efficiency is high, but the retrieval is slow. In this paper the two representation methods are combined to form a hybrid representation (H-mode). First, an optimal level for the combination is selected using two characteristic curves, node utilization and dispersion. Then the input words are represented with a trie structure in P-mode from the first level to the optimal level, and the rest are represented with a sequentially linked list structure in C-mode. The experimental results for six kinds of word sets show that the proposed structure is more efficient, because retrieval in H-mode is as fast as in P-mode and its space efficiency is as good as that of C-mode.
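
The hybrid idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `HybridDictionary` and the `level` parameter are hypothetical, the cut-off level is supplied by the caller rather than derived from node utilization and dispersion curves, each string symbol stands in for a phoneme or character (no Hangul decomposition is done), and a Python list stands in for the sequentially linked list.

```python
class HybridDictionary:
    """Trie for the first `level` symbols, flat suffix list beyond that."""

    def __init__(self, level):
        # Assumed cut-off depth; the abstract selects it from two curves,
        # here the caller simply passes it in.
        self.level = level
        self.root = {"children": {}, "suffixes": []}

    def insert(self, word):
        node = self.root
        # Trie part: one node per symbol up to the cut-off level.
        for symbol in word[:self.level]:
            node = node["children"].setdefault(
                symbol, {"children": {}, "suffixes": []})
        # List part: the remainder is stored whole and scanned sequentially.
        suffix = word[self.level:]
        if suffix not in node["suffixes"]:
            node["suffixes"].append(suffix)

    def contains(self, word):
        node = self.root
        for symbol in word[:self.level]:
            node = node["children"].get(symbol)
            if node is None:
                return False
        return word[self.level:] in node["suffixes"]


dictionary = HybridDictionary(level=2)
for entry in ["trie", "tried", "tree", "list"]:
    dictionary.insert(entry)
```

With `level=2`, the words "trie", "tried", and "tree" share the two trie nodes for "t" and "r", and their remainders ("ie", "ied", "ee") sit in one list. A larger `level` trades memory for fewer sequential comparisons.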


Risk Analysis on Various Contextual Situations and Progressive Authentication Method based on Contextual-Situation-based Risk Degree on Android Devices (안드로이드 단말에서의 상황별 위험도 분석 및 상황별 위험도 기반 지속인증 기법)

  • Kim, Jihwan;Kim, SeungHyun;Kim, Soo-Hyung;Lee, Younho
    • Journal of KIISE
    • /
    • v.43 no.10
    • /
    • pp.1154-1164
    • /
    • 2016
  • To prevent the use of one's smartphone by another user, authentication verifies the owner in several ways. However, because this authentication requires an extra action whenever the owner uses his/her smartphone, some owners eventually decide not to use an authentication method at all. This can cause a fatal problem in the smartphone's security. We propose a sustainable Android platform-based authentication model to solve this security issue and to facilitate secure authentication. In the proposed model, a smartphone identifies the current situation and then performs the authentication. In order to define the risk of the situation, we conducted a survey and analyzed the survey results by age, location, behavior, etc. Finally, a demonstration program was implemented to show the relationship between risk and security authentication methods.

L2 proficiency and effect of auditory source in processing L2 stops

  • Kong, Eun Jong;Kang, Jieun
    • Phonetics and Speech Sciences
    • /
    • v.7 no.3
    • /
    • pp.99-105
    • /
    • 2015
  • The current study investigates whether Korean-speaking adults show differential sensitivities to the sources of auditory stimuli (L1 Korean and L2 English) in utilizing VOT and f0 in the perceptual mode of L2 stops, and how L2 proficiency interacts with the learners' low-level phonetic sensitivities in the L2 perceptual mode. Forty-eight Korean learners of English participated in perception experiments in which they rated the goodness of English /t/ and /d/ using an analogue scale. Two sets of stimuli (English and Korean sources) were prepared by manipulating the VOT (6 steps) and f0 (5 steps) values of productions by an English male (L2 source condition) and a Korean male (L1 source condition). Findings showed that, in judging /t/-likeness, the listeners responded differently to the two auditory stimulus conditions by relying on VOT significantly more in the English source condition than in the Korean source condition. The listeners' English proficiency did not interact with these differential sensitivities to the auditory stimulus source along either the VOT dimension or the f0 dimension. The results of the current study suggest that low-level contextual information of the auditory source can affect how faithfully learners remain in the L2 perceptual mode.

The Archi-Semiotic Characteristics of Spatial Modality in Interactive Space - Focus on Gilbert Simondon's Information of Technology - (상호작용 공간 모달리티의 건축기호적 특징 - 질베르 시몽동의 기술의 정보·형태화 관점 -)

  • Suh, Juneho
    • Korean Institute of Interior Design Journal
    • /
    • v.22 no.1
    • /
    • pp.75-84
    • /
    • 2013
  • This study focuses on Gilbert Simondon's individuation theory, a core concept of his technological philosophy, and spatial modality in interactive space as the schema of interactive operation. The study examines spatial modality as the technology of an interaction-enabler that has archi-semiotic characteristics in the designed space by aspects of examples. They are based on ideas and properties of a combined environment and the concept of information, which form Simondon's individuation theory. In the process of technological individuation, spatial modality has the characteristics of archi-semiotics from a combined environment and information. The first of the three properties is representation through semiosis and the information surface. Second is the context by relation works and perception, and third are the symbolic aspects, which could create Placeness by meaning. Combining meaningful constructive and deconstructive spaces could result in space for interactive communication. Spatial modality makes it possible to interact with users and spaces. In fact, it could have a particular semiotic mode of address and become a semiotic and contextual base. As a basic investigation of spatial modality, this study will contribute to interactive space design research.

Context-Adaptive Intra Prediction Model Training and Its Coding Performance Analysis (문맥적응적 화면내 예측 모델 학습 및 부호화 성능분석)

  • Moon, Gihwa;Park, Dohyeon;Kim, Jae-Gon
    • Journal of Broadcast Engineering
    • /
    • v.27 no.3
    • /
    • pp.332-340
    • /
    • 2022
  • Recently, with the development of deep learning and artificial neural network technologies, research on the application of neural networks has been actively conducted in the field of video coding. In particular, deep learning-based intra prediction is being studied as a way to overcome the performance limitations of existing intra prediction techniques. This paper presents a method of training a context-adaptive neural network-based intra prediction model and an analysis of its coding performance. Specifically, we implement and train a known intra prediction model based on a convolutional neural network (CNN) that predicts a current block using contextual information from reference blocks. Then, we integrate the trained model into HM16.19 as an additional intra prediction mode and evaluate its coding performance. Experimental results show that the trained model gives 0.28% BD-rate bit saving over HEVC in All Intra (AI) coding mode. In addition, the change in coding performance when training takes block partitioning into account is also presented.

A Study on the Characteristics of Urban Public Transportation Information Services Use (도시 대중교통정보 이용 행동 특성 연구)

  • Joh, Chang-Hyeon;Lee, Back-Jin;Bin, Mi-Young
    • Journal of the Economic Geographical Society of Korea
    • /
    • v.12 no.1
    • /
    • pp.56-66
    • /
    • 2009
  • As the amount of information grows rapidly and ubiquitous urban environments emerge, the question of which information type to provide and which communication media to support is a major challenge for commercial and public travel-information service providers. The current research reports the first findings of analyses of recent data, collected in metropolitan Seoul, about the acquisition of travel information and the communication media used. The study is based on the assumption that information acquisition and the choice of communication medium are strongly context-driven. The study applies CHAID analysis to find homogeneous segments in information acquisition and use of communication media. Findings indicate that transport mode and activity are important determinants of information acquisition and choice of media. The type of travel information acquired co-varies strongly with transport mode and activity. In addition, we found evidence of time-of-day effects. Similarly, the choice of communication medium depends on the type of travel information searched for, transport mode, and activity. The results suggest important implications for managerial and policy measures, in particular dynamic, contextual market segmentation.


Early Termination Algorithm of Merge Mode Search for Fast High Efficiency Video Coding (HEVC) Encoder (HEVC 인코더 고속화를 위한 병합 검색 조기 종료 결정 알고리즘)

  • Park, Chan Seob;Kim, Byung Gyu;Jun, Dong San;Jung, Soon Heung;Kim, Youn Hee;Seok, Jin Wook;Choi, Jin Soo
    • Journal of Broadcast Engineering
    • /
    • v.18 no.5
    • /
    • pp.691-701
    • /
    • 2013
  • In this paper, an early termination algorithm for the merge process is proposed to reduce the computational complexity of the High Efficiency Video Coding (HEVC) encoder. In HEVC, the same candidate modes from the merge candidate list (MCL) are shared to predict a merge or merge SKIP mode. This search process is performed once per obtained candidate for both the merge and SKIP modes, which may cause redundant search operations. To reduce these redundant search operations, we use previously encoded neighboring blocks to check the contextual information. In this study, the spatial, temporal, and depth neighboring blocks are considered to compute correlation information. With this correlation information, an early termination algorithm for the merge process is suggested. When the modes of all neighboring blocks are SKIP, the merge process evaluates only SKIP mode. Otherwise, the usual merge process of HEVC is performed. In experiments, the proposed method achieves a time saving of about 21.25% on average with a small loss of BD-rate compared to the original HM 10.0 encoder.
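
The decision rule stated in this abstract can be sketched in a few lines of Python. This is only an illustration under assumptions, not the HM encoder's code: the function name `merge_modes_to_evaluate` and the string labels are hypothetical, and the choice to fall back to the full process when no neighbor is available is an assumption the abstract does not address.

```python
SKIP = "SKIP"
MERGE = "MERGE"


def merge_modes_to_evaluate(neighbor_modes):
    """Return the merge-related modes to evaluate for the current block.

    `neighbor_modes` holds the coded modes of the spatial, temporal, and
    depth neighboring blocks (hypothetical string labels).
    """
    # Early termination: every available neighbor was coded as SKIP,
    # so only SKIP is evaluated for the current block.
    if neighbor_modes and all(mode == SKIP for mode in neighbor_modes):
        return [SKIP]
    # Otherwise (or with no neighbors, by assumption) run the usual process.
    return [SKIP, MERGE]
```

For example, `merge_modes_to_evaluate(["SKIP", "SKIP", "SKIP"])` selects only SKIP, while a single non-SKIP neighbor restores the full evaluation.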

A Study on Performance Evaluation of Hidden Markov Network Speech Recognition System (Hidden Markov Network 음성인식 시스템의 성능평가에 관한 연구)

  • 오세진;김광동;노덕규;위석오;송민규;정현열
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.4 no.4
    • /
    • pp.30-39
    • /
    • 2003
  • In this paper, we evaluate the performance of an HM-Net (Hidden Markov Network) speech recognition system on Korean speech databases. The acoustic models are constructed using HM-Nets, a modification of HMMs (Hidden Markov Models), which are widely used as a statistical modeling method. HM-Nets perform state splitting in the contextual and temporal domains using the PDT-SSS (Phonetic Decision Tree-based Successive State Splitting) algorithm, a modification of the original SSS algorithm. In particular, contextual-domain state splitting adopts a phonetic decision tree to effectively express context information that does not appear in the training speech data. Temporal-domain state splitting is carried out to effectively represent how long each phoneme is maintained, and the optimal model network of triphone types is then constructed. Speech recognition was performed using the one-pass Viterbi beam search algorithm with phone-pair and word-pair grammars for phoneme and word recognition, respectively, and using the multi-pass search algorithm with n-gram language models for sentence recognition. A tree-structured lexicon was used to decrease the number of nodes by sharing the same prefixes among words. The performance of the HM-Net speech recognition system is evaluated under various recognition conditions. Through the experiments, we verified that it achieves superior recognition performance compared with the previously introduced recognition system.
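
The node saving from a tree-structured lexicon, as mentioned in this abstract, can be shown with a small Python sketch. It is a generic prefix-tree illustration rather than the system described in the paper; the function names and the phone sequences in `lexicon` are made up for the example.

```python
def count_nodes_flat(pronunciations):
    """Nodes needed when every word keeps its own chain of phones."""
    return sum(len(phones) for phones in pronunciations)


def count_nodes_tree(pronunciations):
    """Nodes needed when words with a common prefix share those nodes."""
    root = {}
    count = 0
    for phones in pronunciations:
        node = root
        for phone in phones:
            if phone not in node:
                node[phone] = {}  # new node only where the prefix diverges
                count += 1
            node = node[phone]
    return count


# Hypothetical phone sequences for three words.
lexicon = [("s", "a", "n"), ("s", "a", "r", "a", "m"), ("s", "o")]
flat_nodes = count_nodes_flat(lexicon)
tree_nodes = count_nodes_tree(lexicon)
```

Here the flat layout needs 10 nodes and the tree needs 7, because the prefix "s a" is stored once and the initial "s" is shared by all three words.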
