• Title/Summary/Keyword: Video Synthesis


European Experience in Implementing Innovative Educational Technologies in the Field of Culture and the Arts: Current Problems and Vectors of Development

  • Kdyrova, I.O.; Grynyshyna, M.O.; Yur, M.V.; Osadcha, O.A.; Varyvonchyk, A.
    • International Journal of Computer Science & Network Security, v.22 no.5, pp.39-48, 2022
  • The main purpose of this work is to analyze modern innovative educational practices in the field of culture and art and their effectiveness in the context of spreading digitalization trends. The study used general scientific theoretical methods (analysis, synthesis, analogy, comparison, induction, deduction, reductionism, and others) to trace the pattern of modern modernization processes over a long historical development and to demonstrate how setting aside the negative side of progress allows talented artists to realize their own potential. Using European experience as an example, the study established the advantages and disadvantages of involving innovative technologies in the educational process, outlined possible ways of implementing digitalization in Ukrainian institutions of higher education, and formulated the main difficulties encountered by teachers and students when using technological innovations during the pandemic. The rapid development of digital technologies has had a great impact on the sphere of culture and art, visual, scenic, and musical alike, in all its processes: creation, reproduction, perception, learning, etc. In art education there is a synthesis of creative practices with digital technologies. In music education, these processes are now supported by specially developed software (programs for composing and typesetting musical text, for recording and correcting sound, and for high-quality listening to a whole work or its fragments) used both in institutional education and in non-institutional learning as a means of independently mastering the theory and practice of music-making, as well as by other programs and technical tools without which contemporary art cannot be imagined.
In modern stage education, video technologies and means of remote communication that allow real-time adjustment of the educational process are increasingly used. In the fine arts, the communicative forms of interaction between teacher and students are being transformed; under pandemic conditions they rely on two-way communication through information and communication technologies. At this stage, transformation processes in education in the areas of culture and art are intensifying.

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

  • Sohee Han; Jisub Um; Hoirin Kim
    • Phonetics and Speech Sciences, v.16 no.1, pp.67-76, 2024
  • Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, reaching a level where it can closely imitate natural human speech. In particular, TTS models that offer varied voice characteristics and personalized speech are widely utilized in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper we propose a one-shot multi-speaker TTS system that ensures acoustic diversity and synthesizes personalized voices by generating speech from unseen target speakers' utterances. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. Furthermore, the proposed approach includes not only an English one-shot multi-speaker TTS but also a Korean one. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. The objective evaluation of the proposed English and Korean one-shot multi-speaker TTS showed a predicted MOS (P-MOS) of 2.54 and 3.74, respectively. These results indicate that the proposed model improves on the baseline models in terms of both naturalness and speaker similarity.
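The conditioning step described above, injecting an utterance-level speaker embedding into the acoustic model, can be sketched as a broadcast addition over the text encoder's output frames. This is a minimal illustration, not the paper's implementation; the function and parameter names are hypothetical, and the projection matrix stands in for the learned interface between RawNet3 embeddings and FastSpeech2.

```python
import numpy as np

def condition_on_speaker(encoder_out, speaker_embedding, proj):
    """Broadcast-add a projected utterance-level speaker embedding to every
    frame of the text-encoder output (illustrative multi-speaker conditioning;
    all names are hypothetical)."""
    spk = speaker_embedding @ proj       # project embedding to encoder width
    return encoder_out + spk[None, :]    # same vector added at each time step
```

Here `encoder_out` has shape (frames, width), `speaker_embedding` is a fixed-size vector extracted once per utterance, and `proj` maps it to the encoder width; the same projected vector conditions every frame.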

Real-Time Image Mosaic Using DirectX (DirectX를 이용한 실시간 영상 모자익)

  • Chong, Min-Yeong; Choi, Seung-Hyun; Bae, Ki-Tae; Lee, Chil-Woo
    • The KIPS Transactions: Part B, v.10B no.7, pp.803-810, 2003
  • In this paper, we describe a fast image mosaic method for constructing a large-scale image from video images captured by cameras arranged in a radial shape. In the first step, we adopt the phase correlation algorithm to estimate the horizontal and vertical displacement between two adjacent images. Secondly, we calculate the accurate transform matrix among the cameras with the Levenberg-Marquardt method. In the last step, the images are stitched into one large-scale image in real time by applying the transform matrix through the texture mapping function of DirectX. The feature of the method is that no special hardware devices or machine-level programs are needed to implement a real-time mosaic system, since a conventional graphics API (application programming interface), DirectX, is used for the image synthesis process.
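The first step above, phase correlation, estimates the displacement between two adjacent images from the normalized cross-power spectrum of their Fourier transforms. A minimal NumPy sketch, assuming same-size grayscale images related by a pure translation:

```python
import numpy as np

def phase_correlation(img_a, img_b):
    """Estimate the (dy, dx) translation of img_b relative to img_a via the
    normalized cross-power spectrum (phase correlation)."""
    Fa = np.fft.fft2(img_a)
    Fb = np.fft.fft2(img_b)
    cross = Fb * np.conj(Fa)
    cross /= np.abs(cross) + 1e-12           # keep only the phase
    corr = np.fft.ifft2(cross).real          # impulse at the displacement
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = img_a.shape
    if dy > h // 2:                          # wrap large shifts to negatives
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

Because only the Fourier phase is used, the estimate is robust to uniform brightness changes, which is one reason the technique suits stitching frames from separately exposed cameras.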

A Vision-based Approach for Facial Expression Cloning by Facial Motion Tracking

  • Chun, Jun-Chul; Kwon, Oryun
    • KSII Transactions on Internet and Information Systems (TIIS), v.2 no.2, pp.120-133, 2008
  • This paper presents a novel approach to facial motion tracking and facial expression cloning for creating realistic facial animation of a 3D avatar. Exact head pose estimation and facial expression tracking are critical issues that must be solved when developing vision-based computer animation, and this paper deals with both. The proposed approach consists of two phases: dynamic head pose estimation and facial expression cloning. Dynamic head pose estimation robustly estimates the 3D head pose from input video images: given an initial reference template of a face image and the corresponding 3D head pose, the full head motion is recovered by projecting a cylindrical head model onto the face image, and by updating the template dynamically the head pose can be recovered despite lighting variations and self-occlusion. In the facial expression synthesis phase, the variations of the major facial feature points in the face images are tracked with optical flow and retargeted to the 3D face model, while an RBF (radial basis function) is exploited to deform the local area of the face model around the major feature points. Consequently, facial expression synthesis directly tracks the variations of the major feature points and indirectly estimates the variations of the regional feature points. Experiments show that the proposed vision-based facial expression cloning method automatically estimates the 3D head pose and produces realistic 3D facial expressions in real time.
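The RBF deformation step can be illustrated as scattered-data interpolation: displacements measured at the major feature points are propagated to the surrounding vertices by radial basis functions. A simplified 2-D sketch with a Gaussian kernel (the abstract does not specify the kernel, so that choice and all names here are illustrative):

```python
import numpy as np

def rbf_deform(controls, displacements, points, eps=1.0):
    """Propagate control-point displacements to arbitrary query points by
    Gaussian RBF interpolation (a simplified stand-in for deforming the
    face mesh around tracked feature points)."""
    # Kernel matrix between control points, then solve for the RBF weights.
    d = np.linalg.norm(controls[:, None, :] - controls[None, :, :], axis=-1)
    K = np.exp(-(eps * d) ** 2)
    w = np.linalg.solve(K, displacements)    # one weight column per axis
    # Evaluate the interpolant at the query points.
    d2 = np.linalg.norm(points[:, None, :] - controls[None, :, :], axis=-1)
    return np.exp(-(eps * d2) ** 2) @ w
```

By construction the interpolant reproduces the measured displacements exactly at the control points, while points far from any control point move only slightly, which gives the local deformation behavior the method relies on.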

VLSI Architecture of General-purpose Memory Controller with High-Performance for Multiple Master (다중 마스터를 위한 고성능의 범용 메모리 제어기의 구조)

  • Choi, Hyun-Jun; Seo, Young-Ho; Kim, Dong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering, v.15 no.1, pp.175-182, 2011
  • In this paper, we implemented a high-performance memory controller that can accommodate multiple processing blocks (multiple masters) in an SoC for video signal processing. Access to memory is arbitrated by an internal arbiter, which receives request signals from the masters and returns grant and data signals to them. The designed memory controller consists of a Master Selector, Master Arbiter, Memory Signal Generator, and Command Decoder. It was designed in VHDL and verified using a memory model from Samsung. For FPGA synthesis and verification, Quartus II from Altera was used, targeting a Cyclone II device; simulation was done with ModelSim from Mentor Graphics. Since the designed hardware operates stably at 174.28 MHz, it satisfies the SDRAM timing specification.
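The request/grant handshake that the arbiter implements in hardware can be modeled in software. The sketch below uses a round-robin policy; the abstract does not state which policy the Master Arbiter uses, so the policy, function name, and interface are illustrative assumptions:

```python
def round_robin_arbiter(requests, last_grant):
    """Grant the next requesting master after last_grant, wrapping around.
    requests  : list of 0/1 request flags, one per master
    last_grant: index of the master granted in the previous cycle
    Returns the granted master's index, or None if nobody is requesting."""
    n = len(requests)
    for i in range(1, n + 1):
        master = (last_grant + i) % n
        if requests[master]:
            return master
    return None
```

Round-robin is a common choice for a multi-master memory controller because it bounds each master's waiting time; a fixed-priority scheme is simpler in hardware but can starve low-priority masters.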

Stereoscopic Video Display System Based on H.264/AVC (H.264/AVC 기반의 스테레오 영상 디스플레이 시스템)

  • Kim, Tae-June; Kim, Jee-Hong; Yun, Jung-Hwan; Bae, Byung-Kyu; Kim, Dong-Wook; Yoo, Ji-Sang
    • The Journal of Korean Institute of Communications and Information Sciences, v.33 no.6C, pp.450-458, 2008
  • In this paper, we propose a real-time stereoscopic display system based on H.264/AVC. We first acquire stereo-view images from a stereo webcam using the OpenCV library. The captured images are converted to YUV 4:2:0 format as a preprocessing step. The input is encoded by a stereo encoder with the proposed estimation structure at more than 30 fps. The encoded bitstream is decoded by a stereo decoder, reconstructing the left and right images. The reconstructed stereo images are post-processed with a stereoscopic image synthesis technique to offer users more realistic images with a 3D effect. Experimental results show that the proposed system has better encoding efficiency than a conventional stereo codec (coder and decoder) and operates in real time with low complexity, making it suitable for applications in a mobile environment.
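The preprocessing step, converting captured RGB frames to YUV 4:2:0, can be sketched as a color-matrix transform followed by 2x2 averaging of the chroma planes. BT.601 full-range coefficients are assumed below; the abstract does not specify the system's exact conversion settings:

```python
import numpy as np

def rgb_to_yuv420(rgb):
    """Convert an HxWx3 float RGB image (values in 0..1, H and W even) to
    Y, U, V planes with 4:2:0 chroma subsampling (BT.601 full-range,
    chosen here for illustration)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.5 * b + 0.5   # offset chroma to 0..1
    v = 0.5 * r - 0.419 * g - 0.081 * b + 0.5
    # 4:2:0 keeps full-resolution luma but averages each 2x2 chroma block.
    def subsample(c):
        h, w = c.shape
        return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return y, subsample(u), subsample(v)
```

Halving the chroma resolution in both directions cuts the raw data rate to half of RGB with little visible loss, which is why 4:2:0 is the standard input format for H.264/AVC encoders.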

Eye Contact System Using Depth Fusion for Immersive Videoconferencing (실감형 화상 회의를 위해 깊이정보 혼합을 사용한 시선 맞춤 시스템)

  • Jang, Woo-Seok; Lee, Mi Suk; Ho, Yo-Sung
    • Journal of the Institute of Electronics and Information Engineers, v.52 no.7, pp.93-99, 2015
  • In this paper, we propose a gaze correction method for realistic video teleconferencing. Typically, the cameras used in teleconferencing are installed at the side of the display monitor rather than at its center, which makes it difficult for users to make eye contact; eye contact is essential for immersive videoconferencing. In the proposed method, we use a stereo camera and a depth camera to correct the gaze. The depth camera is a Kinect, which is relatively cheap and estimates depth information efficiently, but it has some inherent disadvantages; we therefore fuse the Kinect with the stereo camera to compensate for them. Then, to obtain the gaze-corrected image, view synthesis is performed by 3D warping according to the depth information. Experimental results verify that the proposed system is effective in generating natural gaze-corrected images.
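The view-synthesis step, 3D warping according to depth, can be illustrated with a minimal depth-image-based-rendering sketch that shifts each pixel horizontally by its disparity d = f·B/Z for a rectified virtual camera. Occlusion handling and hole filling, which a real system needs, are omitted, and all names are illustrative:

```python
import numpy as np

def dibr_warp(image, depth, baseline, focal):
    """Forward-warp a grayscale image to a horizontally shifted virtual
    viewpoint: disparity = focal * baseline / depth. Pixels warped to the
    same target simply overwrite; unfilled targets stay 0 (holes)."""
    h, w = image.shape
    out = np.zeros_like(image)
    disparity = np.round(focal * baseline / depth).astype(int)
    for y in range(h):
        for x in range(w):
            nx = x + disparity[y, x]
            if 0 <= nx < w:
                out[y, nx] = image[y, x]
    return out
```

Near pixels (small depth) shift farther than distant ones, which is exactly the parallax a camera virtually moved to the monitor's center would see; the holes this leaves behind are why the paper fuses Kinect and stereo depth before warping.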

The Effect of Hydrocarbon Content and Temperature Distribution on The Morphology of Diamond Film Synthesized by Combustion Flame Method (연소 화염법에 의해 합성된 다이아몬드형상에 미치는 탄화수소량과 온도분포의 영향)

  • Kim, Seong-Yeong; Go, Myeong-Wan; Lee, Jae-Seong
    • Korean Journal of Materials Research, v.4 no.5, pp.566-573, 1994
  • Diamond synthesis by the combustion flame method is considerably affected by the substrate surface temperature and its distribution, which are mainly controlled by the mixed-gas ratio $O_2/C_2H_2$. To elucidate the role of the gas ratio in the combustion-flame diamond synthesis process, the substrate temperature was measured with a thermal video system under various gas ratios (R = 0.87~0.98, where R is the $O_2/C_2H_2$ flow-rate ratio), and the morphological change of the diamond crystals was analyzed by SEM, Raman spectroscopy, and X-ray diffraction. With increasing gas ratio, i.e., decreasing hydrocarbon content, the nucleation rate of the diamond crystals was lowered. It was also found that the morphology of the diamond crystals changed from a cubo-octahedron type consisting of (100) and (111) planes to an octahedron type of (111) planes. An increase in substrate temperature consistently increased both the nucleation rate and the growth rate of the diamond crystals, whose surfaces then consisted dominantly of the (100) plane.


Comparison Analysis of Four Face Swapping Models for Interactive Media Platform COX (인터랙티브 미디어 플랫폼 콕스에 제공될 4가지 얼굴 변형 기술의 비교분석)

  • Jeon, Ho-Beom; Ko, Hyun-kwan; Lee, Seon-Gyeong; Song, Bok-Deuk; Kim, Chae-Kyu; Kwon, Ki-Ryong
    • Journal of Korea Multimedia Society, v.22 no.5, pp.535-546, 2019
  • Recently, there has been much research on whole-face replacement systems, but it is not easy to obtain stable results because of varied poses, angles, and facial diversity. To produce a natural synthesis result when replacing the face shown in a video image, technologies such as face area detection, feature extraction, face alignment, face area segmentation, 3D pose adjustment, and facial transposition must all operate at a precise level, and each must be able to be combined interdependently with the others. Our analysis shows that, among facial replacement technologies, facial feature point extraction and facial alignment are the most difficult to implement and contribute most to the system; facial transposition and 3D pose adjustment are less difficult but still need development. In this paper, we compare four facial replacement models suitable for the COX platform: 2-D Faceswap, OpenPose, Deepfake, and CycleGAN. These models respectively suit front-face pose image conversion, face pose images with active body movement, face movement of up to 15 degrees to the left and right, and generative adversarial network-based conversion.

A Method of Detection of Deepfake Using Bidirectional Convolutional LSTM (Bidirectional Convolutional LSTM을 이용한 Deepfake 탐지 방법)

  • Lee, Dae-hyeon; Moon, Jong-sub
    • Journal of the Korea Institute of Information Security & Cryptology, v.30 no.6, pp.1053-1065, 2020
  • With the recent development of hardware performance and artificial intelligence technology, sophisticated fake videos that are difficult to distinguish with the human eye are increasing. Face synthesis technology using artificial intelligence is called Deepfake, and anyone with a little programming skill and deep learning knowledge can use it to produce sophisticated fake videos. The number of indiscriminate fake videos has increased significantly, which may lead to problems such as privacy violations, fake news, and fraud. It is therefore necessary to detect fake video clips that cannot be discriminated by the human eye. Thus, in this paper, we propose a deepfake detection model that applies a bidirectional convolutional LSTM and an attention module. Unlike an LSTM, which considers only the forward sequence, the proposed model also processes the frames in reverse order. The attention module is used with a convolutional neural network to exploit the characteristics of each frame during feature extraction. Experiments show that the proposed model achieves 93.5% accuracy and an AUC up to 50% higher than the results of pre-existing studies.
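The bidirectional idea, aggregating per-frame features in both temporal orders, can be shown with a stripped-down tanh RNN in NumPy. This is a stand-in for the paper's bidirectional convolutional LSTM with attention, not a reproduction of it; the weights and names are illustrative:

```python
import numpy as np

def bidirectional_rnn(frames, w_in, w_rec):
    """Run a simple tanh RNN over per-frame feature vectors in both temporal
    directions and concatenate the two final hidden states. `frames` is a
    (time, features) array; w_in and w_rec are the input and recurrent
    weight matrices shared by both directions."""
    def run(seq):
        h = np.zeros(w_rec.shape[0])
        for x in seq:
            h = np.tanh(w_in @ x + w_rec @ h)
        return h
    # Forward pass sees earlier frames first; backward pass sees later
    # frames first, so artifacts near either end influence the summary.
    return np.concatenate([run(frames), run(frames[::-1])])
```

A classifier head over the concatenated state can then score the clip as real or fake; the paper's convolutional LSTM additionally keeps spatial structure in the hidden state, which this flat sketch deliberately drops for brevity.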