• Title/Summary/Keyword: Audio Generation

Recent R&D Trends for 3D Deep Learning (3D 딥러닝 기술 동향)

  • Lee, S.W.;Hwang, B.W.;Lim, S.J.;Yoon, S.U.;Kim, T.J.;Choi, J.S.;Park, C.J.
    • Electronics and Telecommunications Trends / v.33 no.5 / pp.103-110 / 2018
  • Artificial intelligence has been studied over the past couple of decades. After several cycles of prosperity and recession, a new machine learning method known as deep learning has emerged, driven by high-quality big data, increased computing power, and the development of new algorithms. The main targets for deep learning have been 1D audio and 2D images, and its application domain is expanding from discriminative models, such as classification and segmentation, to generative models. Deep learning is now also being used to process 3D data. However, unlike 2D data, 3D training data is not easy to acquire. Although low-cost 3D data acquisition sensors have become more popular owing to advances in 3D vision technology, generating and acquiring 3D data remains a very difficult problem. Moreover, it is not easy to directly apply an existing network model, such as a convolutional network, owing to the variety of 3D data representations. In this paper, we summarize the 3D deep learning technologies that have begun to be developed within the last two years.

Performance Analysis of Combining Method for PAR Reduction in OFDM (OFDM에서 PAR을 제거하기 위한 혼합방법의 성능 해석)

  • 김병주;변건식
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2002.11a / pp.163-166 / 2002
  • OFDM should be used in fourth-generation systems to achieve high-speed communication. Because of its high spectral efficiency and high tolerance to fading channels, OFDM is applied to many high-speed wired and wireless communication systems such as DAB (Digital Audio Broadcast), DVB (Digital Video Broadcast), and IMT-2000. However, the high PAR (Peak-to-Average Power Ratio) of OFDM signals causes intermodulation. This paper describes PTS (Partial Transmit Sequence) and SLM (Selected Mapping), two existing methods for reducing PAR, and then introduces a new method, called the "Combine PAR method", which combines PTS and SLM. Simulation results show that the Combine PAR method outperforms the existing methods.
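
As a rough illustration of the PAR problem and of SLM alone (not the combined PTS/SLM method the abstract proposes), the sketch below measures the peak-to-average power ratio of one OFDM symbol and then picks the lowest-PAR candidate among several randomly phase-rotated versions. The 64 subcarriers, QPSK mapping, 8 candidates, and all function names are assumptions made for this example.

```python
import cmath
import math
import random


def idft(freq_symbols):
    """Naive inverse DFT: frequency-domain subcarrier symbols -> time samples."""
    n = len(freq_symbols)
    return [
        sum(x * cmath.exp(2j * math.pi * k * t / n) for k, x in enumerate(freq_symbols)) / n
        for t in range(n)
    ]


def papr_db(time_samples):
    """Peak-to-average power ratio of one OFDM symbol, in dB."""
    powers = [abs(s) ** 2 for s in time_samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


def slm_select(freq_symbols, num_candidates, rng):
    """Selected mapping: try several phase sequences, keep the lowest-PAPR one."""
    phases = [1, -1, 1j, -1j]
    n = len(freq_symbols)
    # Candidate 0 is the unmodified symbol, so the result is never worse than it.
    candidates = [[1] * n] + [
        [rng.choice(phases) for _ in range(n)] for _ in range(num_candidates - 1)
    ]
    best = min(
        candidates,
        key=lambda seq: papr_db(idft([x * p for x, p in zip(freq_symbols, seq)])),
    )
    return best, papr_db(idft([x * p for x, p in zip(freq_symbols, best)]))


rng = random.Random(0)
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
symbols = [rng.choice(qpsk) for _ in range(64)]  # assumed 64 QPSK subcarriers
original_papr = papr_db(idft(symbols))
best_sequence, slm_papr = slm_select(symbols, num_candidates=8, rng=rng)
print(f"original PAPR: {original_papr:.2f} dB, after SLM: {slm_papr:.2f} dB")
```

In a real SLM system the receiver must also learn which phase sequence was chosen, typically through side information; that step is omitted here.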

A study on ultrasound analysis of the transformer strange signal (변압기 이상음의 초음파 분석에 관한 연구)

  • 백화종;지석근
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2002.11a / pp.835-838 / 2002
  • Operating high-voltage equipment produces ultrasonic waves with a distinctive sound determined by the characteristics of the electrical phenomena involved. These ultrasonic waves are generated by electrical phenomena such as arcing, corona, and tracking. Electrical failures of high-voltage equipment cause mechanical losses and fatal injuries. To prevent and diagnose such faults, ultrasonic measurement has become prominent; however, standardized data are still lacking. This paper measures the ultrasonic waves emitted by transformers in actual operation and converts them to audio frequencies. The measured data are represented in the time and frequency domains using the FFT (Fast Fourier Transform). The purpose of this paper is to standardize the analyzed data.
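
The frequency-domain step can be pictured with a minimal sketch that locates the dominant component of a sampled signal. It uses a naive DFT rather than an FFT to stay dependency-free, and the 200 kHz sampling rate, the 40 kHz synthetic tone, and the function name are assumptions for illustration, not values from the paper.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude spectrum for bins 0..N/2 using a naive O(N^2) DFT."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


SAMPLE_RATE_HZ = 200_000   # hypothetical sampling rate
TONE_HZ = 40_000           # hypothetical ultrasonic component
N = 200                    # 1 ms of samples, so each bin spans 1 kHz

signal = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE_HZ) for t in range(N)]
spectrum = dft_magnitudes(signal)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE_HZ / N
print(f"dominant component near {peak_hz:.0f} Hz")
```

An FFT routine produces the same spectrum far faster on long recordings; the naive form is used here only to keep the example self-contained.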

Elimination of Discontinuity Phenomenon for Repeated Play of Finite DTV Stream (유한 DTV 스트림의 반복 재생시 불연속 현상 제거)

  • Han, Chan-Ho;Sohng, Kyu-Ik
    • The Journal of Korean Institute of Communications and Information Sciences / v.27 no.10A / pp.951-961 / 2002
  • In general, repeated playback of a finite digital stream causes discontinuities such as a black screen and irregular sound. In this paper, we analyze the relation between source time and stream time that causes this phenomenon during repeated playback. We derive the time relation among the video frame rate, the audio frame rate, and the TS packet transmission rate needed to eliminate it. Using this time relation, we propose a new method of generating the elementary stream (ES) and transport stream (TS) that eliminates the discontinuity. Test results for the ES and TS generated with the proposed method show that the discontinuity can be eliminated during repeated playback of a finite stream.
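
The abstract does not give the time relation itself. One plausible reading is that a seamless loop must contain a whole number of video frames and a whole number of audio frames. The sketch below computes the shortest such duration with exact rational arithmetic; the 29.97 fps video rate and the 1536-sample audio frame at 48 kHz are assumed values, and the TS packet rate is left out.

```python
from fractions import Fraction
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def common_period(p, q):
    """Smallest positive duration that is an integer multiple of both p and q."""
    # Fractions are stored in lowest terms, so lcm(numerators) / gcd(denominators)
    # is the least common multiple of the two durations.
    return Fraction(lcm(p.numerator, q.numerator), gcd(p.denominator, q.denominator))


video_frame = Fraction(1001, 30000)   # assumed 29.97 fps video frame duration (s)
audio_frame = Fraction(1536, 48000)   # assumed 1536 samples per audio frame at 48 kHz

loop = common_period(video_frame, audio_frame)
video_frames = loop / video_frame
audio_frames = loop / audio_frame
print(float(loop), video_frames, audio_frames)
```

Under these assumptions the shortest aligned loop is 32.032 s, holding 960 video frames and 1001 audio frames; a stream cut at any other length would leave a partial frame at the loop point.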

Implementation of Character and Object Metadata Generation System for Media Archive Construction (미디어 아카이브 구축을 위한 등장인물, 사물 메타데이터 생성 시스템 구현)

  • Cho, Sungman;Lee, Seungju;Lee, Jaehyeon;Park, Gooman
    • Journal of Broadcast Engineering / v.24 no.6 / pp.1076-1084 / 2019
  • In this paper, we introduce a system that extracts metadata by recognizing characters and objects in media using deep learning technology. In the field of broadcasting, multimedia content such as video, audio, images, and text has long been converted to digital form, but a vast amount of unconverted material remains. Building media archives requires a lot of manual work, which is time-consuming and costly. A deep learning-based metadata generation system can therefore save time and cost in constructing media archives. The whole system consists of four elements: a training data generation module, an object recognition module, a character recognition module, and an API server. The deep learning network module and the face recognition module are implemented to recognize characters and objects in the media and describe them as metadata. The training data generation module was designed separately to facilitate the construction of data for training the neural networks, and the face recognition and object recognition functions were configured as an API server. We trained the two neural networks using data on 1500 persons and 80 kinds of objects, and confirmed an accuracy of 98% on the character test data and 42% on the object data.

Research on Generative AI for Korean Multi-Modal Montage App (한국형 멀티모달 몽타주 앱을 위한 생성형 AI 연구)

  • Lim, Jeounghyun;Cha, Kyung-Ae;Koh, Jaepil;Hong, Won-Kee
    • Journal of Service Research and Studies / v.14 no.1 / pp.13-26 / 2024
  • Multi-modal generation is the process of generating results based on a variety of information, such as text, images, and audio. With the rapid development of AI technology, there is a growing number of multi-modal based systems that synthesize different types of data to produce results. In this paper, we present an AI system that uses speech and text recognition to describe a person and generate a montage image. While the existing montage generation technology is based on the appearance of Westerners, the montage generation system developed in this paper learns a model based on Korean facial features. Therefore, it is possible to create more accurate and effective Korean montage images based on multi-modal voice and text specific to Korean. Since the developed montage generation app can be utilized as a draft montage, it can dramatically reduce the manual labor of existing montage production personnel. For this purpose, we utilized persona-based virtual person montage data provided by the AI-Hub of the National Information Society Agency. AI-Hub is an AI integration platform aimed at providing a one-stop service by building artificial intelligence learning data necessary for the development of AI technology and services. The image generation system was implemented using VQGAN, a deep learning model used to generate high-resolution images, and the KoDALLE model, a Korean-based image generation model. It can be confirmed that the learned AI model creates a montage image of a face that is very similar to what was described using voice and text. To verify the practicality of the developed montage generation app, 10 testers used it and more than 70% responded that they were satisfied. The montage generator can be used in various fields, such as criminal detection, to describe and image facial features.

Audio Stream Delivery Using AMR(Adaptive Multi-Rate) Coder with Forward Error Correction in the Internet (인터넷 환경에서 FEC 기능이 추가된 AMR음성 부호화기를 이용한 오디오 스트림 전송)

  • 김은중;이인성
    • The Journal of Korean Institute of Communications and Information Sciences / v.26 no.12A / pp.2027-2035 / 2001
  • In this paper, we present an audio stream delivery scheme for the Internet that uses the AMR (Adaptive Multi-Rate) coder, adopted by ETSI and 3GPP as a standard vocoder for next-generation IMT-2000 services, and combines sender-based forward error correction (FEC) with a receiver-based reconstruction technique. With the media-specific FEC scheme, repair data is added to the main data stream so that the contents of lost packets can be recovered, which greatly increases the chance of recovering lost packets. Because the AMR codec is based on the code-excited linear predictive (CELP) coding model, we use frame erasure concealment for CELP-based coders. The proposed scheme is evaluated with the ITU-T G.729 (CS-ACELP) coder and AMR at 12.2 kbit/s using SNR (Signal-to-Noise Ratio) and MOS (Mean Opinion Score) tests. At 10% packet loss, the proposed scheme achieves a MOS 1.1 higher and an SNR 5.61 dB higher than AMR at 12.2 kbit/s, and maintains communicable speech quality at frame erasure rates up to 20%.
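
The idea of adding repair data to the main stream can be sketched as follows: each packet carries its own frame plus a redundant copy of the previous frame, so a single lost packet can be rebuilt from its successor. This is a generic illustration of media-specific FEC, not the paper's AMR-based scheme; the packet layout, function names, and loss pattern are assumptions. In practice the redundant copy is often a lower-bit-rate encoding rather than a full duplicate.

```python
def packetize(frames):
    """Attach a redundant copy of the previous frame to every packet."""
    packets = []
    previous = None
    for seq, frame in enumerate(frames):
        packets.append({"seq": seq, "primary": frame, "redundant": previous})
        previous = frame
    return packets


def reconstruct(received, total_frames):
    """Rebuild the frame sequence; None marks frames that must be concealed."""
    frames = [None] * total_frames
    for pkt in received:
        frames[pkt["seq"]] = pkt["primary"]
    # Fill gaps from the redundant copy carried by the following packet.
    for pkt in received:
        prev_seq = pkt["seq"] - 1
        if prev_seq >= 0 and frames[prev_seq] is None:
            frames[prev_seq] = pkt["redundant"]
    return frames


frames = [f"frame{i}" for i in range(6)]
packets = packetize(frames)
# Simulate loss of packets 2 and 5 (hypothetical loss pattern).
received = [p for p in packets if p["seq"] not in (2, 5)]
recovered = reconstruct(received, len(frames))
print(recovered)
```

Frame 2 is recovered from packet 3, while frame 5 has no successor and stays missing; such residual gaps are where a receiver-side technique like frame erasure concealment takes over.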

Implementation of LabVIEW based Testbed for MHA FTSR (LabVIEW 기반의 MHA 명령방식 비행종단수신기 점검장비 구현)

  • Kim, Myung-Hwan;Hwang, Soo-Sul;Lim, You-Cheol;Ma, Keun-Su
    • Aerospace Engineering and Technology / v.13 no.1 / pp.55-62 / 2014
  • An FTSR (Flight Termination System Receiver) is a device that receives a ground command signal to abort a flight mission when abnormal conditions occur in a space launch vehicle. The secure tone command message consists of a sequence of 11 characters, each encoded as a tone pattern. Each character is the sum of two tones taken from a set of 7 tones defined by IRIG (Inter-Range Instrumentation Group) in the audio frequency range. The MHA (Modified High Alphabet) command adds a security feature to the secure tone command by using a predefined difference code. To check the function and performance of the MHA FTSR, which is under development for KSLV-II, the testbed should provide RF signal generation, monitoring of the receiver's output ports, RS-422 communication, and test data management. In this paper, we first briefly introduce the MHA command and the FTSR interface, and then present the LabVIEW-based testbed, including its H/W configuration, S/W implementation, and test results.
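
The two-tone character structure can be sketched as below: seven tones yield 21 distinct unordered pairs, and one character is synthesized as the sum of two sinusoids. The tone frequencies, the sampling rate, and the character duration are placeholders, not the IRIG-defined values, and the abstract does not say whether every pair is actually used as a character.

```python
import math
from itertools import combinations

# Placeholder tone frequencies in Hz; NOT the IRIG-defined values.
TONES_HZ = [1000, 1100, 1200, 1300, 1400, 1500, 1600]
SAMPLE_RATE_HZ = 48_000  # assumed sampling rate


def character_samples(tone_a_hz, tone_b_hz, duration_s):
    """One character: the sum of two equal-amplitude sinusoids."""
    count = round(duration_s * SAMPLE_RATE_HZ)
    return [
        0.5 * math.sin(2 * math.pi * tone_a_hz * n / SAMPLE_RATE_HZ)
        + 0.5 * math.sin(2 * math.pi * tone_b_hz * n / SAMPLE_RATE_HZ)
        for n in range(count)
    ]


tone_pairs = list(combinations(TONES_HZ, 2))
samples = character_samples(*tone_pairs[0], duration_s=0.01)
print(len(tone_pairs), len(samples))
```

A testbed would concatenate 11 such characters to form one command message before modulating it onto the RF carrier.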

A Study on the Effect of Next-Generation Mobile Advertising Service for TV Advertising Effectiveness (TV 광고 효과 향상을 위한 차세대 모바일 광고서비스 효과 연구)

  • Choi, Minkyung;Lee, Ook
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.18 no.3 / pp.17-22 / 2018
  • This study asks: "Are viewers using smartphones more while watching TV commercials? If so, does the effect of TV advertising decrease because attention is divided?" A preliminary study confirmed that about 63% of users watching TV commercials use smartphones at the same time. In this study, we explore the possibility of a new mobile advertising service by experimentally analyzing the effect of a mobile advertising service linked to TV advertising through high-frequency technology. Experimental results show that when an advertisement with the same content is delivered in this TV-linked form, the response rate improves about 9 times compared with that of a general mobile advertisement. This confirms that it is quite effective to deliver an advertisement with the same content to customers who are in front of the TV when the TV advertisement airs. Therefore, various services based on high-frequency technology are expected to be adopted as new advertising services that preserve the effect of TV commercials.

Implementation of Visible Light Communication System Modulated by a Switching Driver Circuit of Lighting LED (조명용 LED의 스위칭 구동 회로로 변조되는 가시광 통신 시스템의 구현)

  • Cho, Sang-Ho;Han, Sang-Kyoo;Roh, Chung-Wook;Hong, Sung-Soo;Jang, Byung-Jun
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.21 no.8 / pp.905-910 / 2010
  • In this paper, a visible light communication (VLC) system modulated by the switching driver circuit of a lighting light-emitting diode (LED), serving both illumination and optical wireless communication, is implemented. The presented system overcomes the drawbacks of the prior linear modulation technique, such as low efficiency, heat generation, and limits on realizing high-power lighting LEDs. Experimental results from the realized digital audio system are presented to confirm the superiority of the proposed circuit. Our prototype achieves a transmission data rate of 10 Mbps within a radius of 1.5 meters using 20 W of output power, and the signals were detected successfully.