• Title/Summary/Keyword: 음향 정보 (acoustic information)

A Study on Performance Evaluation of Hidden Markov Network Speech Recognition System (Hidden Markov Network 음성인식 시스템의 성능평가에 관한 연구)

  • 오세진;김광동;노덕규;위석오;송민규;정현열
    • Journal of the Institute of Convergence Signal Processing / v.4 no.4 / pp.30-39 / 2003
  • In this paper, we evaluate the performance of an HM-Net (Hidden Markov Network) speech recognition system on Korean speech databases. The acoustic models are built with HM-Nets, a modification of HMMs (Hidden Markov Models), which are widely used for statistical modeling. The HM-Nets are obtained by state splitting in the contextual and temporal domains using the PDT-SSS (Phonetic Decision Tree-based Successive State Splitting) algorithm, a modification of the original SSS algorithm. In particular, contextual-domain state splitting uses a phonetic decision tree to effectively express context information that does not appear in the training speech data. Temporal-domain state splitting is carried out to effectively represent how long each phoneme is sustained, and the optimal network of triphone-type models is then constructed. Speech recognition was performed using the one-pass Viterbi beam search algorithm with phone-pair and word-pair grammars for phoneme and word recognition, respectively, and using a multi-pass search algorithm with n-gram language models for sentence recognition. A tree-structured lexicon was used to reduce the number of nodes by sharing common prefixes among words. The HM-Net speech recognition system is evaluated under various recognition conditions. The experiments show that it achieves considerably better recognition performance than the previously introduced recognition system.
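
The prefix sharing behind a tree-structured lexicon can be illustrated with a small sketch. The words and phoneme strings below are made up for illustration and are not taken from the paper's lexicon; the nested-dict layout is just one convenient assumption.

```python
# Illustrative only: hypothetical words and phoneme sequences,
# not the lexicon used in the paper.

def build_prefix_tree(pronunciations):
    """Build a nested-dict prefix tree from word -> phoneme-sequence entries."""
    root = {}
    for word, phones in pronunciations.items():
        node = root
        for phone in phones:
            # Reuse the child if this prefix already exists, else create it.
            node = node.setdefault(phone, {})
        # Mark the end of a word; this marker is not a phoneme node.
        node.setdefault("#words", []).append(word)
    return root

def count_nodes(node):
    """Count phoneme nodes below `node`, skipping the '#words' markers."""
    return sum(
        1 + count_nodes(child)
        for key, child in node.items()
        if key != "#words"
    )

lexicon = {
    "start": ["s", "t", "aa", "r", "t"],
    "star": ["s", "t", "aa", "r"],
    "stop": ["s", "t", "aa", "p"],
}

tree = build_prefix_tree(lexicon)
linear_nodes = sum(len(phones) for phones in lexicon.values())
tree_nodes = count_nodes(tree)
print(linear_nodes, tree_nodes)  # 13 6
```

With a flat lexicon every word carries its own phoneme chain (13 nodes here); sharing the common prefix "s t aa" brings this down to 6 nodes, which is the effect the abstract describes.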

Analysis of Precision of Interpolation of Reservoir bed Through Comparison of Data Acquired by Using UAV and Echo Sounder (UAV와 Echo Sounder 취득 자료의 비교를 통한 저수지 하상의 공간 보간별 정확도 분석)

  • Roh, Tae-Ho
    • Journal of the Korean Association of Geographic Information Studies / v.23 no.3 / pp.85-99 / 2020
  • Reservoirs are important infrastructure because they store large amounts of water for many uses: manufacturing, agriculture, drinking, power generation, tourism, and so on. Reservoirs are maintained through periodic administrative and technical efforts, and monitoring the condition of the reservoir bed is the first priority. To check the condition of the reservoir bed, we measured the depth of the reservoir with an echo sounder, which is relatively reliable, before the stored water was discharged, and surveyed the topography of the reservoir with a UAV after the water was discharged. We then interpolated the measured water depths using inverse distance weighting, Kriging, minimum curvature, and radial basis function interpolation, and calculated the reservoir volume for each method. Comparing these volumes with the volume calculated from the UAV survey after discharge gave the following results. First, Kriging interpolation reproduced the reservoir volume with 97% accuracy. Second, when cross sections were selected and the cross-sectional areas of the surveyed topography and the interpolated surfaces were compared, Kriging interpolation showed the smallest difference in cross-sectional area and therefore the shape closest to the surveyed topography. We conclude that optimal modeling of the reservoir bed from echo sounder depth data can provide basic information for efficient reservoir maintenance.
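
As a rough illustration of the workflow described above (interpolate point soundings, then integrate to a volume), here is a minimal sketch of inverse distance weighting, the simplest of the four methods named. The function names, the power of 2, the grid layout, and the sample soundings are all assumptions for illustration; the paper's software and parameters are not given in the abstract.

```python
import math

def idw_depth(x, y, soundings, power=2.0):
    """Inverse-distance-weighted depth at (x, y).

    `soundings` is a list of (xi, yi, depth) tuples; `power` is an
    assumed weighting exponent (2 is a common default).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, depth in soundings:
        dist = math.hypot(x - xi, y - yi)
        if dist == 0.0:
            return depth  # query point lies exactly on a sounding
        weight = 1.0 / dist ** power
        weighted_sum += weight * depth
        weight_total += weight
    return weighted_sum / weight_total

def reservoir_volume(soundings, x0, y0, nx, ny, cell):
    """Approximate volume as the sum of interpolated depth times cell area,
    evaluated at the centers of an nx-by-ny grid of square cells."""
    total = 0.0
    for i in range(nx):
        for j in range(ny):
            cx = x0 + (i + 0.5) * cell
            cy = y0 + (j + 0.5) * cell
            total += idw_depth(cx, cy, soundings) * cell * cell
    return total

# Hypothetical soundings (x, y in metres, depth in metres).
sample = [(0.0, 0.0, 1.0), (10.0, 0.0, 3.0), (0.0, 10.0, 2.0), (10.0, 10.0, 4.0)]
print(reservoir_volume(sample, 0.0, 0.0, 10, 10, 1.0))
```

Swapping `idw_depth` for another interpolator while keeping `reservoir_volume` fixed is one way to compare methods by the volume they produce, in the spirit of the comparison the abstract reports.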

A Study on a Real Time Presentation Method for Playing of a Multimedia mail on Internet (인터넷상의 동영상 메일을 재생하기 위한 실시간 연출 기법 연구)

  • Im, Yeong-Hwan;Lee, Seon-Hye
    • The Transactions of the Korea Information Processing Society / v.6 no.4 / pp.877-890 / 1999
  • This paper proposes a multimedia mail that includes video, sound, and graphic data as the next generation of text-based mail. The most prominent problem in developing multimedia mail is that the multimedia data are too large to send directly to the receiving end; their size causes problems in both transferring and storing the mail. Our main idea is to separate the control program for the multimedia presentation from the multimedia data. Since a control program is about as small as a plain-text mail, it can be sent to the receiver directly as an attachment to Internet mail. The large multimedia data themselves remain on the sender's computer or are sent to a designated server, and are transferred to the receiver only when the receiver starts playing the multimedia mail. Within this scheme, our research focuses on buffer management and thread scheduling for real-time playback of multimedia mail over the Internet. Another problem is providing an easy way to edit a multimedia presentation for ordinary users with no programming knowledge. For this purpose, VIP (Visual Interface Player) was used, and the results of a multimedia mail implemented on a LAN are described.

Comparison of Head-related Transfer Function Models Based on Principal Components Analysis (주성분 분석법을 이용한 머리전달함수 모형화 기법의 성능 비교)

  • Hwang, Sung-Mok;Park, Young-Jin;Park, Youn-Sik
    • Transactions of the Korean Society for Noise and Vibration Engineering / v.18 no.6 / pp.642-653 / 2008
  • This study deals with modeling of head-related transfer functions (HRTFs) using principal components analysis (PCA) in the time and frequency domains. Four PCA models, based on head-related impulse responses (HRIRs), complex-valued HRTFs, augmented HRTFs, and log-magnitudes of HRTFs, are investigated. The objective is to compare the modeling performance of the PCA models in the least-squares sense and to show the theoretical relationship between them. In terms of the number of principal components needed for modeling, the PCA models based on HRIRs or augmented HRTFs modeled more efficiently than the PCA model based on complex-valued HRTFs. The PCA model based on HRIRs in the time domain and that based on augmented HRTFs in the frequency domain are shown to be theoretically equivalent. The modeling performance of the PCA model based on log-magnitudes of HRTFs cannot be compared with that of the other PCA models, because it deals with log-scaled magnitude components only, whereas the other models consider both magnitude and phase components on a linear scale.

Research on the Variation of Deposition & Accumulation on the Shorelines using Ortho Areial Photos (수치항공사진을 이용한 해안선 침퇴적변화에 관한 연구)

  • Choi, Chul-Uong;Lee, Chang-Hun;Oh, Che-Young;Son, Jung-Woo
    • Journal of Korean Society for Geospatial Information Science / v.17 no.3 / pp.23-31 / 2009
  • A nation's shoreline is an important factor in determining the boundary of its territory, but Korea's shorelines are changing rapidly because of the recent rise in sea level from global warming and decades of growth-centered economic policy. This study focuses on areas with well-preserved natural shorelines and areas whose shorelines have been damaged by nearby artificial structures, at two neighboring beaches with similar ocean conditions. First, shorelines were derived from aerial photographs taken from 1947 to 2007, and tidal levels were corrected using sounding data from an automated hydrographic survey system consisting of an echo sounder (Echotrac 3100) and a Differential Global Positioning System (Beacon), together with topographic data obtained on land and by ship using the post-processed kinematic GPS method. In addition, the changes in shoreline and area over the last 60 years were evaluated by dividing the derived shorelines into 5 sections. As a result, the area of Haewundae Beach had decreased by a total of 29% in 2007 compared with 1947, owing to a rapid decline centered on its western part, while the area of Gwanganri Beach had increased by a total of 69%, because artificial structures built at both ends of the beach reduced the flow velocity and caused accumulation. Thus, deposition and accumulation tendencies differed greatly depending on the neighboring environment despite the similar ocean conditions.

The Development of Robot and Augmented Reality Based Contents and Instructional Model Supporting Childrens' Dramatic Play (로봇과 증강현실 기반의 유아 극놀이 콘텐츠 및 교수.학습 모형 개발)

  • Jo, Miheon;Han, Jeonghye;Hyun, Eunja
    • Journal of The Korean Association of Information Education / v.17 no.4 / pp.421-432 / 2013
  • The purpose of this study is to develop contents and an instructional model that support children's dramatic play by integrating robot and augmented reality technology. To support dramatic play, the robot shows various facial expressions and actions, serves as a narrator and sound manager, supports simultaneous interaction by using its camera to recognize markers and children's motions, and records children's activities as photos and videos that can be used in later activities. The robot also uses a projector to let children interact directly with video objects. Augmented reality, for its part, offers a variety of character changes and props and allows various background and foreground effects. It also allows natural interaction between the contents and children through a real-type interface, and provides opportunities for interaction between actors and audience. In addition, augmented reality provides an experience-based learning environment that induces sensory immersion by letting children manipulate or choose the learning situation and experience the results. The instructional model supporting dramatic play consists of 4 stages (teachers' preparation; introducing and understanding a story; action plan and play; evaluation and wrapping up). For each stage, detailed activities to decide on or carry out are suggested.

Compromised feature normalization method for deep neural network based speech recognition (심층신경망 기반의 음성인식을 위한 절충된 특징 정규화 방식)

  • Kim, Min Sik;Kim, Hyung Soon
    • Phonetics and Speech Sciences / v.12 no.3 / pp.65-71 / 2020
  • Feature normalization is a method to reduce the effect of environmental mismatch between the training and test conditions through the normalization of statistical characteristics of acoustic feature parameters. It demonstrates excellent performance improvement in the traditional Gaussian mixture model-hidden Markov model (GMM-HMM)-based speech recognition system. However, in a deep neural network (DNN)-based speech recognition system, minimizing the effects of environmental mismatch does not necessarily lead to the best performance improvement. In this paper, we attribute the cause of this phenomenon to information loss due to excessive feature normalization. We investigate whether there is a feature normalization method that maximizes the speech recognition performance by properly reducing the impact of environmental mismatch, while preserving useful information for training acoustic models. To this end, we introduce the mean and exponentiated variance normalization (MEVN), which is a compromise between the mean normalization (MN) and the mean and variance normalization (MVN), and compare the performance of DNN-based speech recognition system in noisy and reverberant environments according to the degree of variance normalization. Experimental results reveal that a slight performance improvement is obtained with the MEVN over the MN and the MVN, depending on the degree of variance normalization.
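
The abstract does not give the MEVN formula. One plausible reading, shown here purely as an assumption, is to subtract the mean and divide by the standard deviation raised to an exponent `alpha` between 0 and 1, so that `alpha = 0` reduces to MN and `alpha = 1` to MVN. The function name and the per-dimension, per-utterance statistics are likewise illustrative choices.

```python
import math

def mevn(values, alpha):
    """Normalize one feature dimension over an utterance.

    Assumed form (not confirmed by the abstract):
        (x - mean) / std**alpha
    alpha = 0.0 gives mean normalization (MN),
    alpha = 1.0 gives mean and variance normalization (MVN).
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    std = math.sqrt(variance)
    # Guard against a constant feature, where std is zero.
    scale = std ** alpha if std > 0.0 else 1.0
    return [(x - mean) / scale for x in values]

frames = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical feature values
print(mevn(frames, 0.0))             # MN: only the mean is removed
print(mevn(frames, 0.5))             # partial variance normalization
print(mevn(frames, 1.0))             # MVN: zero mean, unit variance
```

Under this reading, `alpha` plays the role of the "degree of variance normalization" that the abstract varies in its experiments.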

Underwater Experiment on CSMA/CA Protocol Using Commercial Modems (상용 모뎀 제어를 통한 수중 CSMA/CA 프로토콜 시험)

  • Cho, Junho;Lee, Sang-Kug;Shin, Jungchae;Lee, Tae-Jin;Cho, Ho-Shin
    • The Journal of Korean Institute of Communications and Information Sciences / v.39C no.6 / pp.457-465 / 2014
  • This paper introduces a test bed for communication protocols in underwater acoustic sensor networks and presents experimental results obtained from it. As the protocol under test, carrier sense multiple access/collision avoidance (CSMA/CA) is evaluated on an underwater acoustic channel. Each sensor node is equipped with a DSP control board based on the ATmega2560 and a commercial underwater modem produced by Benthos. The control board not only processes a GPS signal to acquire location and time information, but also controls the underwater modem so that it follows the procedure designed for the protocol under test. Whenever an event occurs, such as an exchange of control or data packets between underwater modems or the acquisition of location and timing information, the sensor node reports it over a radio frequency (RF) air interface to a central station on the ground. The four kinds of CSMA/CA packets, RTS (Request To Send), CTS (Clear To Send), DATA, and ACK (Acknowledgement), are designed for the underwater communication environment and analyzed in a lake experiment to assess the feasibility of CSMA/CA in underwater acoustic communications.
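
The four-packet exchange named above can be sketched as a generic, textbook-style CSMA/CA sender. This is not the paper's implementation and does not use any Benthos modem API; the function name, the attempt limit, the backoff window, and the simplification that CTS and ACK always arrive once the channel is idle are all assumptions.

```python
import random

HANDSHAKE = ("RTS", "CTS", "DATA", "ACK")

def csma_ca_send(channel_busy, max_attempts=5, base_window=4, rng=None):
    """Try to deliver one DATA packet.

    `channel_busy` is a callable returning True while a carrier is sensed.
    Returns (success, log), where log lists backoffs and exchanged packets.
    Simplification: once the channel is idle, CTS and ACK are assumed to arrive.
    """
    rng = rng or random.Random(0)
    log = []
    for attempt in range(max_attempts):
        if channel_busy():
            # Carrier sensed: wait a random number of slots, with a window
            # that doubles on each attempt (binary exponential backoff).
            slots = rng.randint(1, base_window * 2 ** attempt)
            log.append(f"BACKOFF:{slots}")
            continue
        # Channel idle: run the RTS/CTS/DATA/ACK exchange.
        log.extend(HANDSHAKE)
        return True, log
    return False, log

# Hypothetical channel that is busy for the first two sensing attempts.
busy_pattern = iter([True, True, False])
ok, events = csma_ca_send(lambda: next(busy_pattern))
print(ok, events)
```

In an underwater channel the long propagation delay would make the slot length and timeout values the interesting design parameters; they are left out here because the abstract does not state them.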

UA Tree-based Reduction of Speech DB in a Large Corpus-based Korean TTS (대용량 한국어 TTS의 결정트리기반 음성 DB 감축 방안)

  • Lee, Jung-Chul
    • Journal of the Korea Society of Computer and Information / v.15 no.7 / pp.91-98 / 2010
  • Large corpus-based concatenative Text-to-Speech (TTS) systems can generate natural synthetic speech without additional signal processing. Because improving the naturalness, personality, speaking style, and emotion of synthetic speech requires a larger speech DB, redundant speech segments in a large speech segment DB need to be pruned. In this paper, we propose a new method of constructing the segmental speech DB for a Korean TTS system, based on a clustering algorithm, to downsize the DB. For the performance test, synthetic speech was generated using a Korean TTS system consisting of a language processing module, a prosody processing module, a segment selection module, a speech concatenation module, and the segmental speech DB. A MOS test was carried out on a set of synthetic speech generated with 4 different segmental speech DBs, constructed by combining the CM1 (or CM2) tree clustering method with the full DB (or reduced DB). Experimental results show that the proposed method can reduce the size of the speech DB by 23% while obtaining a high MOS in the perception test. The proposed method can therefore be applied to build a small-sized TTS system.

Acoustic 2-D Full-waveform Inversion with Initial Guess Estimated by Traveltime Tomography (주시 토모그래피와 음향 2차원 전파형 역산의 적용성에 관한 연구)

  • Han Hyun Chul;Cho Chang Soo;Suh Jung Hee;Lee Doo Sung
    • Geophysics and Geophysical Exploration / v.1 no.1 / pp.49-56 / 1998
  • Seismic tomography has been widely used as a high-resolution subsurface imaging technique in engineering applications. Although most such techniques have used traveltime inversion, waveform methods are being advanced thanks to progress in computing environments. Full-waveform inversion is known as the best method in terms of model resolving power, since it requires neither the high-frequency restriction nor the weak-scattering approximation, but it has the practical disadvantages that it tends to get stuck in a local minimum if the initial guess is far from the actual model and that it takes a long time to compute. In this study, a 2-D full-waveform inversion algorithm for acoustic media is developed that uses the result of traveltime tomography as the initial model. Application to synthetic data shows that this approach efficiently reduces the problems of conventional approaches: the algorithm converges much faster and improves model resolution. Application to physical modeling data also shows considerable improvement. The algorithm is expected to be applicable to real data.
