• Title/Summary/Keyword: synthesis algorithm

Search Result 668, Processing Time 0.022 seconds

Implementation of Web-based Remote Multi-View 3D Imaging Communication System Using Adaptive Disparity Estimation Scheme (적응적 시차 추정기법을 이용한 웹 기반의 원격 다시점 3D 화상 통신 시스템의 구현)

  • Ko Jung-Hwan;Kim Eun-Soo
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.31 no.1C
    • /
    • pp.55-64
    • /
    • 2006
  • In this paper, a new web-based remote 3D imaging communication system employing an adaptive matching algorithm is suggested. In the proposed method, feature values are extracted from the stereo image pair through estimation of the disparity and similarities between each pixel of the stereo image. And then, the matching window size for disparity estimation is adaptively selected depending on the magnitude of this feature value. Finally, the detected disparity map and the left image is transmitted into the client region through the network channel. And then, in the client region, right image is reconstructed and intermediate views be synthesized by a linear combination of the left and right images using interpolation in real-time. From some experiments on web based-transmission in real-time and synthesis of the intermediate views by using two kinds of stereo images of 'Joo' & 'Hoon' captured by real camera, it is analyzed that PSNRs of the intermediate views reconstructed by using the proposed transmission scheme are highly measured by 30dB for 'Joo', 27dB for 'Hoon' and the delay time required to obtain the intermediate image of 4 view is also kept to be very fast value of 67.2ms on average, respectively.

3D Track Models Generation and Applications Based on LiDAR Data for Railway Route Management (철도노선관리에서의 LIDAR 데이터 기반의 3차원 궤적 모델 생성 및 적용)

  • Yeon, Sang-Ho;Lee, Young-Dae
    • Proceedings of the KSR Conference
    • /
    • 2007.11a
    • /
    • pp.1099-1104
    • /
    • 2007
  • The visual implementation of 3-dimensional national environment is focused by the requirement and importance in the fields such as, national development plan, telecommunication facility deployment plan, railway construction, construction engineering, spatial city development, safety and disaster prevention engineering. The currently used DEM system using contour lines, which embodies national geographic information based on the 2-D digital maps and facility information has limitation in implementation in reproducing the 3-D spatial city. Moreover, this method often neglects the altitude of the rail way infrastructure which has narrow width and long length. There it is needed to apply laser measurement technique in the spatial target object to obtain accuracy. Currently, the LiDAR data which combines the laser measurement skill and GPS has been introduced to obtain high resolution accuracy in the altitude measurement. In this paper, we first investigate the LiDAR based researches in advanced foreign countries, then we propose data a generation scheme and an algorithm for the optimal manage and synthesis of railway facility system in our 3-D spatial terrain information. For this object, LiDAR based height data transformed to DEM, and the realtime unification of the vector via digital image mapping and raster via exactness evaluation is transformed to make it possible to trace the model of generated 3-dimensional railway model with long distance for 3D tract model generation.

  • PDF

The Effectiveness of Electroglottographic Parameters in Differential Diagnosis of Laryngeal Cancer (후두암 감별진단에 있어 성문전도(Electroglottograph) 파라미터의 유용성)

  • 송인무;고의경;전경명;권순복;김기련;전계록;김광년;정동근;조철우
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.14 no.1
    • /
    • pp.16-25
    • /
    • 2003
  • Background and Objectives : Electroglottography(EGG) is a non-invasive method of monitoring the vocal cord vibration by measuring the variation of physiological impedance across the vocal folds through the neck skin. It reveals especially the vocal fold contact area and is widely used for basic laryngeal researches, voice analysis and synthesis. The purpose of this study is to investigate the effectiveness of EGG parameters in differential diagnosis of laryngeal cancer. Materials and Methods : The author investigated 10 laryngeal cancer and 25 benign laryngeal disease patients who visited at the Department of Otolaryngology, Pusan National University Hospital. The EGG equipment was devised in the author's Department. Among various parameters of EGG, closed quotient(CQ), speed quotient(SQ), speed index(SI), Jitter, Shimmer, Fo were determined by an analysis program made with MATLAB 6.5$^{\circledR}$(Mathwork, Inc.). In order to differentiate various laryngeal diseases from pathologic voice signals, the author has used the electroglottographic parameters using the neural network of multilayer perceptron structure. Results : SQ, SI, Jitter and Shimmer values except those of CQ and Fo showed remarkable differences between benign and malignant laryngeal disease groups. From the artificial neural network, the percentage of differentiating the laryngeal cancer was over 80% in SQ, SI, Jitter, Shimmer except for CQ and Fo. These results indicated that it is possible to discriminate the benign and malignant laryngeal diseases by EGG parameters using the artificial neural network. Conclusion : If parameters of EGG which can reveal for the pathology of laryngeal diseases are additionally developed and the current classification algorithm is improved, the discrimination of laryngeal cancer will become much more accurate.

  • PDF

UA Tree-based Reduction of Speech DB in a Large Corpus-based Korean TTS (대용량 한국어 TTS의 결정트리기반 음성 DB 감축 방안)

  • Lee, Jung-Chul
    • Journal of the Korea Society of Computer and Information
    • /
    • v.15 no.7
    • /
    • pp.91-98
    • /
    • 2010
  • Large corpus-based concatenating Text-to-Speech (TTS) systems can generate natural synthetic speech without additional signal processing. Because the improvements in the natualness, personality, speaking style, emotions of synthetic speech need the increase of the size of speech DB, it is necessary to prune the redundant speech segments in a large speech segment DB. In this paper, we propose a new method to construct a segmental speech DB for the Korean TTS system based on a clustering algorithm to downsize the segmental speech DB. For the performance test, the synthetic speech was generated using the Korean TTS system which consists of the language processing module, prosody processing module, segment selection module, speech concatenation module, and segmental speech DB. And MOS test was executed with the a set of synthetic speech generated with 4 different segmental speech DBs. We constructed 4 different segmental speech DB by combining CM1(or CM2) tree clustering method and full DB (or reduced DB). Experimental results show that the proposed method can reduce the size of speech DB by 23% and get high MOS in the perception test. Therefore the proposed method can be applied to make a small sized TTS.

Time-Scale Modification of Polyphonic Audio Signals Using Sinusoidal Modeling (정현파 모델링을 이용한 폴리포닉 오디오 신호의 시간축 변화)

  • 장호근;박주성
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.2
    • /
    • pp.77-85
    • /
    • 2001
  • This paper proposes a method of time-scale modification of polyphonic audio signals based on a sinusoidal model. The signals are modeled with sinusoidal component and noise component. A multiresolution filter bank is designed which splits the input signal into six octave-spaced subbands without aliasing and sinusoidal modeling is applied to each subband signal. To alleviate smearing of transients in time-scale modification a dynamic segmentation method is applied to subbands which determines the analysis-synthesis frame size adaptively to fit time-frequency characteristics of the subband signal. For extracting sinusoidal components and calculating their parameters matching pursuit algorithm is applied to each analysis frame of subband signal. In accordance with spectrum analysis a psychoacoustic model implementing the effect of frequency masking is incorporated with matching pursuit to provide a resonable stop condition of iteration and reduce the number of sinusoids. The noise component obtained by subtracting the synthesized signal with sinusoidal components from the original signal is modeled by line-segment model of short time spectrum envelope. For various polyphonic audio signals the result of simulation shows suggested sinusoidal modeling can synthesize original signal without loss of perceptual quality and do more robust and high quality time-scale modification for large scale factor because of representing transients without any perceptual loss.

  • PDF

Architecture Design of High Performance H.264 CAVLC Encoder Using Optimized Searching Technique (최적화된 탐색기법을 이용한 고성능 H.264/AVC CAVLC 부호화기 구조 설계 기법)

  • Lee, Yang-Bok;Jung, Hong-Kyun;Kim, Chang-Ho;Myung, Je-Jin;Ryoo, Kwang-Ki
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2011.10a
    • /
    • pp.431-435
    • /
    • 2011
  • This paper presents optimized searching technique to improve the performance of H.264/AVC. The proposed CAVLC encoder uses forward and backward searching algorithm to compute the parameters. By zero-block skipping technique and pipelined scheduling, the proposed CAVLC encoder can obtain better performance. The experimental result shows that the proposed architecture needs only 66.6 cycles on average for each $16{\times}16$ macroblock encoding. The proposed architecture improves the performance by 13.8% than that of previous designs. The proposed CAVLC encoder was implemented using VerilogHDL and synthesized with Megnachip $0.18{\mu}m$ standard cell library. The synthesis result shows that the gate count is about 15.6K with 125Mhz clock frequency.

  • PDF

Video-to-Video Generated by Collage Technique (콜라주 기법으로 해석한 비디오 생성)

  • Cho, Hyeongrae;Park, Gooman
    • Journal of Broadcast Engineering
    • /
    • v.26 no.1
    • /
    • pp.39-60
    • /
    • 2021
  • In the field of deep learning, there are many algorithms mainly after GAN in research related to generation, but in terms of generation, there are similarities and differences with art. If the generation in the engineering aspect is mainly to judge the presence or absence of a quantitative indicator or the correct answer and the incorrect answer, the creation in the artistic aspect creates a creation that interprets the world and human life by cross-validating and doubting the correct answer and incorrect answer from various perspectives. In this paper, the video generation ability of deep learning was interpreted from the perspective of collage and compared with the results made by the artist. The characteristic of the experiment is to compare and analyze how much GAN reproduces the result of the creator made with the collage technique and the difference between the creative part, and investigate the satisfaction level by making performance evaluation items for the reproducibility of GAN. In order to experiment on how much the creator's statement and purpose of expression were reproduced, a deep learning algorithm corresponding to the statement keyword was found and its similarity was compared. As a result of the experiment, GAN did not meet much expectations to express the collage technique. Nevertheless, the image association showed higher satisfaction than human ability, which is a positive discovery that GAN can show comparable ability to humans in terms of abstract creation.

Cycle-Consistent Generative Adversarial Network: Effect on Radiation Dose Reduction and Image Quality Improvement in Ultralow-Dose CT for Evaluation of Pulmonary Tuberculosis

  • Chenggong Yan;Jie Lin;Haixia Li;Jun Xu;Tianjing Zhang;Hao Chen;Henry C. Woodruff;Guangyao Wu;Siqi Zhang;Yikai Xu;Philippe Lambin
    • Korean Journal of Radiology
    • /
    • v.22 no.6
    • /
    • pp.983-993
    • /
    • 2021
  • Objective: To investigate the image quality of ultralow-dose CT (ULDCT) of the chest reconstructed using a cycle-consistent generative adversarial network (CycleGAN)-based deep learning method in the evaluation of pulmonary tuberculosis. Materials and Methods: Between June 2019 and November 2019, 103 patients (mean age, 40.8 ± 13.6 years; 61 men and 42 women) with pulmonary tuberculosis were prospectively enrolled to undergo standard-dose CT (120 kVp with automated exposure control), followed immediately by ULDCT (80 kVp and 10 mAs). The images of the two successive scans were used to train the CycleGAN framework for image-to-image translation. The denoising efficacy of the CycleGAN algorithm was compared with that of hybrid and model-based iterative reconstruction. Repeated-measures analysis of variance and Wilcoxon signed-rank test were performed to compare the objective measurements and the subjective image quality scores, respectively. Results: With the optimized CycleGAN denoising model, using the ULDCT images as input, the peak signal-to-noise ratio and structural similarity index improved by 2.0 dB and 0.21, respectively. The CycleGAN-generated denoised ULDCT images typically provided satisfactory image quality for optimal visibility of anatomic structures and pathological findings, with a lower level of image noise (mean ± standard deviation [SD], 19.5 ± 3.0 Hounsfield unit [HU]) than that of the hybrid (66.3 ± 10.5 HU, p < 0.001) and a similar noise level to model-based iterative reconstruction (19.6 ± 2.6 HU, p > 0.908). The CycleGAN-generated images showed the highest contrast-to-noise ratios for the pulmonary lesions, followed by the model-based and hybrid iterative reconstruction. The mean effective radiation dose of ULDCT was 0.12 mSv with a mean 93.9% reduction compared to standard-dose CT. Conclusion: The optimized CycleGAN technique may allow the synthesis of diagnostically acceptable images from ULDCT of the chest for the evaluation of pulmonary tuberculosis.