• Title/Summary/Keyword: Audio Data

Search Result 879, Processing Time 0.03 seconds

Application of Virtual Studio Technology and Digital Human Monocular Motion Capture Technology -Based on <Beast Town> as an Example-

  • YuanZi Sang;KiHong Kim;JuneSok Lee;JiChu Tang;GaoHe Zhang;ZhengRan Liu;QianRu Liu;ShiJie Sun;YuTing Wang;KaiXing Wang
    • International Journal of Internet, Broadcasting and Communication / v.16 no.1 / pp.106-123 / 2024
  • This article uses the talk show "Beast Town" as an example to introduce the overall technical solution, the technical difficulties, and the countermeasures involved in combining cartoon virtual characters with virtual studio technology, providing reference and experience for multi-scenario applications of digital humans. Compared with live broadcasts that combine virtual and real elements, we further upgraded our virtual production and digital human driving technology, adopting industry-leading real-time virtual production and monocular camera driving technology to launch a virtual cartoon character talk show, "Beast Town". The show combines the real and the virtual, enhances program immersion and the audio-visual experience, and expands the boundaries of virtual production. In the talk show, motion capture shooting is used for final picture synthesis. The virtual scene must present dynamic effects while the digital human is driven and moves with the push, pull, and pan of the overall picture. This places very high demands on multi-party data synchronization, real-time driving of digital humans, and rendering of the composite picture. We focus on issues such as the docking of virtual and real data and the quality of monocular camera motion capture. We combine camera outward tracking, multi-scene picture perspective, multi-machine rendering, and other solutions to solve picture linkage and rendering quality problems in a deeply immersive space environment, presenting users with visual effects in which digital humans and live guests are linked.

Real data-based active sonar signal synthesis method (실데이터 기반 능동 소나 신호 합성 방법론)

  • Yunsu Kim;Juho Kim;Jongwon Seok;Jungpyo Hong
    • The Journal of the Acoustical Society of Korea / v.43 no.1 / pp.9-18 / 2024
  • The importance of active sonar systems is growing because underwater targets are becoming quieter and ambient noise is increasing with maritime traffic. However, the low signal-to-noise ratio of the echo signal, caused by multipath propagation, various clutter, ambient noise, and reverberation, makes it difficult to identify underwater targets using active sonar. Attempts have been made to apply data-driven methods such as machine learning and deep learning to improve underwater target recognition, but it is difficult to collect enough training data because of the nature of sonar datasets. Methods based on mathematical modeling have mainly been used to compensate for insufficient active sonar data, but they are limited in how accurately they can simulate complex underwater phenomena. Therefore, in this paper, we propose a sonar signal synthesis method based on a deep neural network. To apply the neural network model to sonar signal synthesis, the proposed method adapts to sonar signals the attention-based encoder and decoder, which are the main modules of the Tacotron model widely used in speech synthesis. Training the proposed model on a dataset collected by placing a simulated target in an actual marine environment makes it possible to synthesize a signal closer to the actual signal. To verify the performance of the proposed method, a Perceptual Evaluation of Audio Quality test was conducted, and the score difference from the actual signal was within -2.3 across a total of four different environments. These results indicate that the active sonar signal generated by the proposed method approximates the actual signal.

Place Recognition Using Ensemble Learning of Mobile Multimodal Sensory Information (모바일 멀티모달 센서 정보의 앙상블 학습을 이용한 장소 인식)

  • Lee, Chung-Yeon;Lee, Beom-Jin;On, Kyoung-Woon;Ha, Jung-Woo;Kim, Hong-Il;Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices / v.21 no.1 / pp.64-69 / 2015
  • Place awareness is essential for the location-based services that are widely provided to smartphone users. However, traditional GPS-based methods are valid only outdoors, where the GPS signal is strong, and they also require symbolic place information for the physical location. In this paper, environmental sounds and images are used to recognize important aspects of each place. The proposed method extracts feature vectors from visual, auditory, and location data recorded by a smartphone with built-in camera, microphone, and GPS sensor modules. The heterogeneous feature vectors are then learned by an ensemble method in which each classifier learns one group of feature vectors and the classifiers vote to produce the highest-weighted result. The proposed method is evaluated for place recognition on a data set of 3000 samples from six places, and the experimental results show remarkably improved recognition accuracy when all kinds of sensory data are used, compared with results using data from a single sensor or audio-visual integrated data only.
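
The voting step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the modality names, the class labels, and the weights below are hypothetical, and the abstract does not state how the weights are obtained. Each per-modality classifier is assumed to have already produced one predicted place label.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the highest total weight across classifiers.

    predictions: dict mapping a modality name to its predicted label.
    weights: dict mapping a modality name to a non-negative weight
             (hypothetical values; a missing modality counts as weight 0).
    """
    scores = defaultdict(float)
    for modality, label in predictions.items():
        scores[label] += weights.get(modality, 0.0)
    # The winning label is the one whose supporting weights sum highest.
    return max(scores, key=scores.get)


# Hypothetical per-modality outputs for one smartphone recording.
predictions = {"visual": "cafe", "audio": "library", "location": "cafe"}
weights = {"visual": 0.5, "audio": 0.3, "location": 0.2}
result = weighted_vote(predictions, weights)
```

Here the visual and location classifiers together outweigh the audio classifier, so `result` is `"cafe"`. Keeping one classifier per feature group lets each modality use features suited to it before the votes are merged.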

Implementation of Character and Object Metadata Generation System for Media Archive Construction (미디어 아카이브 구축을 위한 등장인물, 사물 메타데이터 생성 시스템 구현)

  • Cho, Sungman;Lee, Seungju;Lee, Jaehyeon;Park, Gooman
    • Journal of Broadcast Engineering / v.24 no.6 / pp.1076-1084 / 2019
  • In this paper, we introduce a system that extracts metadata by recognizing characters and objects in media using deep learning technology. In the field of broadcasting, multimedia content such as video, audio, image, and text has long been converted to digital form, but a vast amount of unconverted material remains. Building media archives requires a lot of manual work, which is time-consuming and costly. Implementing a deep learning-based metadata generation system can therefore save time and cost in constructing media archives. The whole system consists of four elements: a training data generation module, an object recognition module, a character recognition module, and an API server. The deep learning network module and the face recognition module are implemented to recognize characters and objects in the media and describe them as metadata. The training data generation module was designed separately to facilitate the construction of data for training the neural networks, and the face recognition and object recognition functions were configured as an API server. We trained the two neural networks using data for 1500 persons and 80 kinds of objects and confirmed an accuracy of 98% on the character test data and 42% on the object data.

Digital watermarking algorithm for authentication and detection of manipulated positions in MPEG-2 bit-stream (MPEG-2비트열에서의 인증 및 조작위치 검출을 위한 디지털 워터마킹 기법)

  • 박재연;임재혁;원치선
    • Journal of the Institute of Electronics Engineers of Korea SP / v.40 no.5 / pp.378-387 / 2003
  • Digital watermarking is a technique that embeds invisible signals, including owner identification information, a specific code, or a pattern, into multimedia data such as image, video, and audio. Watermarking techniques can be classified into two groups: robust watermarking and fragile (semi-fragile) watermarking. The main purpose of robust watermarking is copyright protection, whereas fragile (semi-fragile) watermarking protects image or video data from illegal modification. To achieve this goal, the watermark should survive unintentional modifications such as random noise or compression, but it should be fragile to malicious manipulation. In this paper, an invertible semi-fragile watermarking algorithm for authentication and detection of manipulated locations in an MPEG-2 bit-stream is proposed. The proposed algorithm embeds two kinds of watermarks into quantized DCT coefficients, so it can be applied directly to the compressed bit-stream. The first watermark is used for authentication of the video data. The second is used for detection of malicious manipulation: it can distinguish transcoding in the bit-stream domain from malicious manipulation and detect the block-wise locations of manipulations in the video data. In addition, because the proposed algorithm is invertible, the original video data can be recovered if the watermarked video is authentic.

A Feasibility Study of AMT Application to Tidal Flat Sedimentary Layer (갯벌 지역의 하부퇴적층에 대한 AMT 탐사의 적용 가능성 평가)

  • Kwon, Byung-Doo;Lee, Choon-Ki;Park, Gye-Soon;Choi, Su-Young;Yoo, Hee-Young;Choi, Jong-Keun;Eom, Joo-Young
    • Journal of the Korean earth science society / v.28 no.1 / pp.64-74 / 2007
  • Marine seismic prospecting using a research vessel in the shallow sea near the coast is limited by the water depth and the survey environment. Likewise, an electrical resistivity survey in a seashore area may need a specially designed high-voltage source to penetrate the very conductive surface layer. Therefore, we conducted a feasibility study on applying the magnetotelluric (MT) method, a passive geophysical method, to the investigation of shallow marine geology. Our study involves both theoretical modeling and a field survey at a tidal flat area that represents a very shallow marine environment. We applied the audio-frequency magnetotelluric (AMT) method to the intertidal deposits of Gunhung Bay, west coast of Korea, and analysed the field data both qualitatively and quantitatively to investigate the morphology and sedimentary stratigraphy of the tidal flat. The inversion of the AMT data clearly reveals the upper sedimentary layer of Holocene intertidal sediments, which is 13-20 m thick, and the erosional patterns at the unconformable contact boundary. However, the AMT inversion results tend to overestimate the depth of the basement (30-50 m) when compared with the seismic section (27-33 m). Because MT responses are not very sensitive to the resistivity of the middle layer or the depth of the basement, the AMT inversion result for the basement may have to be adjusted, where possible, by comparison with other geophysical information such as seismic sections or logging data. Nevertheless, the AMT method can be an effective alternative for investigating seashore areas to obtain important basic information such as the depositional environment of the tidal flat, sea-water intrusion, and the basement structure near the shore.
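
As background for the depth ranges quoted above, the sketch below computes the standard electromagnetic skin depth for a uniform half-space, which is often used as a first-order guide to how deep a given frequency can sense. The resistivity and frequency values are illustrative assumptions, not data from the paper, and the function name is hypothetical.

```python
import math

MU_0 = 4e-7 * math.pi  # magnetic permeability of free space, in H/m


def skin_depth_m(resistivity_ohm_m, frequency_hz):
    """Skin depth in metres for a uniform half-space.

    delta = sqrt(rho / (pi * f * mu_0)), roughly 503 * sqrt(rho / f).
    """
    return math.sqrt(resistivity_ohm_m / (math.pi * frequency_hz * MU_0))


# Illustrative values only: a conductive, seawater-saturated sediment
# (1 ohm-m, assumed) probed at an audio-band frequency of 100 Hz.
depth = skin_depth_m(1.0, 100.0)
print(f"skin depth: {depth:.1f} m")
```

With these assumed values the skin depth is about 50 m. Lower resistivity or higher frequency shortens it, which is why a very conductive surface layer limits how deep the signal penetrates.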

A Study on the Educational Effects on Child-Raising Knowledge and Satisfaction with Out-Patient Care of Mothers with Ill-Child (환아 어머니 교육이 육아지식 정도 및 외래간호 만족도에 미치는 영향)

  • Lee So Yeon;Choi Mi Hye;Kwon Hye Jin
    • Child Health Nursing Research / v.3 no.1 / pp.83-98 / 1997
  • The purpose of this study was to find practical ways to increase child-raising knowledge and to enhance satisfaction with out-patient care by evaluating how effective nurse-led education is for mothers of ill children and how their satisfaction with out-patient care changes. The study used a nonequivalent control group design. The subjects consisted of an experimental group and a control group, each made up of 50 mothers of ill children in the pediatric department of one university hospital in Seoul. The study period was from May 20, 1996 to June 28, 1996. The first data were collected from both the experimental and control groups when the mothers came to the hospital for the first time. The experimental group was then educated using the planned program, after which the second data were collected from them. The control group received no education, and the second data were collected by the same method. The data were analyzed with the SPSS program. The results of this study are as follows: 1. The child-raising knowledge level of mothers with education was higher than that of mothers with no education (t=18.84, df=49, p=0.000). 2. The level of satisfaction with out-patient care of mothers with education was higher than that of mothers with no education (t=10.51, df=49, p=0.000). Based on these results, I suggest the following: 1. Research on patients and their families should be carried out not only in the pediatric department but in every out-patient department. 2. For more effective education, all out-patient nurses need to investigate the educational needs of patients and their families. 3. To increase the effect of education, there must be a consultation room in the out-patient department. 4. Meetings for mothers of children with the same illness should be established, and periodic education must be provided. 5. Audio-visual education programs such as video tapes are needed to make use of the waiting time for medical treatment. 6. On-line consulting programs are needed.

The QoS Filtering and Scalable Transmission Scheme of MPEG Data to Adapt Network Bandwidth Variation (통신망 대역폭 변화에 적응하는 MPEG 데이터의 QoS 필터링 기법과 스케일러블 전송 기법)

  • 유우종;김두현;유관종
    • Journal of Korea Multimedia Society / v.3 no.5 / pp.479-494 / 2000
  • Although the proliferation of real-time multimedia services over the Internet might indicate its success in dealing with heterogeneous environments, it is also clear that the Internet now has to cope with a flood of multimedia data, in which video and audio streams consume most of the network's communication channels. Therefore, to use network resources efficiently and appropriately, a new scalable transmission technique needs to be developed and deployed that takes into account each network environment and each client's computing power. Such a technique can also eliminate the wasted storage and data transmission overhead that arise when the same video stream is duplicated for each QoS level. The purpose of this paper is to develop a technology that can adjust the amount of data transmitted as an MPEG video stream according to the available communication bandwidth, and a technique that can reflect dynamic bandwidth changes while a video stream is playing. For this purpose, we introduce a scalable media decomposer working on the server side and a scalable media composer working on the client side, and then propose a scalable transmission method together with a media sender and a media receiver that take dynamic QoS into account. The methods proposed here can facilitate effective use of network resources and provide real-time MPEG video services suited to each client's computing environment.
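
One common way to reduce the data sent for an MPEG stream is to drop frames by picture type. The sketch below illustrates that idea under stated assumptions; the abstract does not specify that the paper's QoS filter works this way. The `Frame` type, the drop order, and the bit sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    kind: str        # picture type: "I", "P", or "B"
    size_bits: int   # encoded size of the frame


# Assumed drop order: B-frames first, since in MPEG-2 no other frame is
# predicted from them, then P-frames. I-frames are always kept.
DROP_ORDER = ["B", "P"]


def filter_gop(frames, budget_bits):
    """Drop whole picture types until the group of pictures fits the budget.

    If even the I-frames exceed the budget, they are still returned,
    because nothing can be decoded without them.
    """
    kept = list(frames)
    for kind in DROP_ORDER:
        if sum(f.size_bits for f in kept) <= budget_bits:
            break
        kept = [f for f in kept if f.kind != kind]
    return kept


# Hypothetical group of pictures totalling 230 bits.
gop = [Frame("I", 100), Frame("B", 20), Frame("B", 20),
       Frame("P", 50), Frame("B", 20), Frame("B", 20)]
kept_kinds = [f.kind for f in filter_gop(gop, 160)]
```

With a budget of 160 bits, the four B-frames are dropped and `kept_kinds` is `["I", "P"]`. Dropping a whole picture type at once is a coarse choice made for brevity; a finer filter could drop individual frames.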

A Longitudinal Case Study of Late Babble and Early Speech in Southern Mandarin

  • Chen, Xiaoxiang
    • Cross-Cultural Studies / v.20 / pp.5-27 / 2010
  • This paper studies the relation between canonical/variegated babble (CB/VB) and early speech in an infant acquiring Mandarin Chinese from 9 to 17 months. The infant was audio- and video-taped in her home almost every week. The data analyzed here come from 1,621 utterances extracted from 23 sessions ranging from 30 minutes to one hour, from age 00:09;07 to 01:05;27. The data were digitized, and segments from the 23 sessions were transcribed in narrow IPA and coded for analysis. Babble was coded from age 00:09;07 to 01:00;00, and words were coded from 01:00;00 to 01:05;27; proto-words appeared at 11 months, and some babble was still present after 01:10;00. In total, 3,821 segments were counted in CB/VB utterances, plus the segments found in 899 word tokens. The transcription was completed and checked by the author and rechecked by two other researchers who majored in Chinese phonetics to ensure reliability; agreement reached 95.65%. Mandarin Chinese is phonetically very rich in consonants, especially affricates: it has aspirated and unaspirated stops in labial, alveolar, and velar places of articulation; affricates and fricatives in alveolar, retroflex, and palatal places; /f/; labial, alveolar, and velar nasals; a lateral; [h]; and labiovelar and palatal glides. In the child's pre-speech phonetic repertoire, 7 different consonants and 10 vowels were transcribed at 00:09;07. By 00:10;16, the number of phones had more than doubled (17 consonants, 25 vowels), but the rate of increase slowed after 11 months of age. The phones from babbling remained active throughout the child's early and subsequent speech. The rank order of occurrence of the major class types for both CB and early speech was: stops, approximants, nasals, affricates, fricatives, and lateral. As expected, unaspirated stops outnumbered aspirated stops, and front stops and nasals were more frequent than back sounds in both types of utterances. The fact that affricates outnumbered fricatives in the child's late babble indicates the pre-speech influence of the ambient language. The analysis of the data also showed that: 1) the phonetic characteristics of CB/VB and early meaningful speech are extremely similar, and these similarities show that the two are deeply related; 2) the infant demonstrated similar preferences for certain types of sounds in the two stages; 3) the infant's babbling was patterned at the segmental level, and this regularity was similarly evident in the child's early speech. Three combination types (coronal plus front vowel, labial plus central vowel, and dorsal plus back vowel) showed much overlap in the phonetic forms of CB/VB and early speech. The child's CB/VB at this stage therefore already shared the basic architecture, composition, and representation of early speech. The evidence of similarity between CB/VB and early speech leaves no doubt that phones present in CB/VB are indeed precursors to early speech.

Prediction of Music Generation on Time Series Using Bi-LSTM Model (Bi-LSTM 모델을 이용한 음악 생성 시계열 예측)

  • Kwangjin, Kim;Chilwoo, Lee
    • Smart Media Journal / v.11 no.10 / pp.65-75 / 2022
  • Deep learning is used as a creative tool that can overcome the limitations of existing analysis models and generate various types of results such as text, images, and music. In this paper, we propose a method for preprocessing audio data, using the Niko's MIDI Pack sound source files as a data set, and for generating music with a Bi-LSTM. Based on the generated root note, the hidden layers are stacked in multiple layers to create a new note suited to the musical composition, and an attention mechanism is applied to the output gate of the decoder to weight the factors that affect the data input from the encoder. Settings such as the loss function and the optimization method are applied as parameters for improving the LSTM model. The proposed model is a multi-channel Bi-LSTM with attention that takes as input note pitches obtained by separating the treble clef and bass clef, note lengths, rests, rest lengths, and chords, in order to improve the efficiency and prediction of the MIDI deep learning process. The trained model generates sound that follows the development of a musical scale and is distinct from noise, and we aim to contribute to generating harmonically stable music.