• Title/Summary/Keyword: visual-audio

Audio Generative AI Usage Pattern Analysis by the Exploratory Study on the Participatory Assessment Process

  • Hanjin Lee;Yeeun Lee
    • Journal of the Korea Society of Computer and Information / v.29 no.4 / pp.47-54 / 2024
  • Cultural arts education that uses digital tools is growing in importance as a way to enhance tech literacy and self-expression and to develop convergent capabilities. Creating with and evaluating innovative multi-modal AI gives users expanded creative audio-visual experiences. In particular, creating music with AI offers innovative experiences at every stage, from generating musical ideas to improving lyrics, editing, and producing variations. In this study, we empirically analyzed how learners performed tasks on audio and music generative AI platforms and discussed them with fellow learners. Through voluntary participation, 12 services and 10 types of evaluation criteria were collected and classified by usage pattern and purpose. We present academic, technological, and policy implications for AI-powered liberal arts education from the learners' perspective.

A Scene Boundary Detection Scheme using Audio Information in MPEG System Stream (MPEG 시스템 스트림상에서 오디오 정보를 이용한 장면 경계 검출 방법)

  • Kim, Jae-Hong;Nang, Jong-Ho;Park, Soo-Yong
    • Journal of KIISE:Software and Applications / v.27 no.8 / pp.864-876 / 2000
  • This paper proposes a new scene boundary detection scheme for the MPEG System stream that uses MPEG Audio information, and demonstrates its usefulness through extensive experiments. At a scene boundary, both the audio and the video information change rapidly. The paper first classifies scene boundaries into three cases with respect to the audio changes: Radical, Gradual, and Micro changes. A Radical change shows a large change in decibel and pitch values at the scene boundary; a Gradual change shows a long transition of decibel and pitch values from maximum to minimum or vice versa; and a Micro change shows some change in pitch or frequency distribution without a change in decibel value. Based on this analysis, a new scene change detection algorithm that detects these three cases is proposed, in which a progressive window along the timeline is used to trace changes in the audio information. Experiments with various movies show that the proposed algorithm produces a high detection ratio for Radical changes, the most common type of scene change in movies, and a moderate detection ratio for Gradual and Micro changes. The proposed scene boundary detection scheme could be used to build a database for visual information such as MPEG System streams.
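
A minimal sketch of how the three audio change types described in this abstract might be told apart within one analysis window. It is not the authors' algorithm: the function name `classify_window`, the threshold values, and the half-window comparison are all assumptions chosen for illustration, and the inputs are assumed to be per-frame decibel and pitch values already extracted from the audio stream.

```python
from statistics import mean


def classify_window(db, pitch, db_jump=10.0, pitch_jump=50.0):
    """Label one window of per-frame decibel and pitch values.

    Hypothetical thresholds: db_jump (dB) and pitch_jump (Hz) are
    illustrative values, not taken from the paper.
    """
    half = len(db) // 2
    # Overall change between the first and second half of the window.
    db_delta = abs(mean(db[half:]) - mean(db[:half]))
    pitch_delta = abs(mean(pitch[half:]) - mean(pitch[:half]))
    # Largest single frame-to-frame decibel step inside the window.
    db_step = max(abs(b - a) for a, b in zip(db, db[1:]))

    if db_delta >= db_jump and pitch_delta >= pitch_jump:
        # Both values change: abrupt step -> radical, slow drift -> gradual.
        return "radical" if db_step >= db_jump / 2 else "gradual"
    if pitch_delta >= pitch_jump:
        # Pitch changes while the decibel level stays roughly constant.
        return "micro"
    return "none"
```

Sliding such a window along the timeline and recording where the label is not `"none"` would give candidate scene boundaries; any real thresholds would need tuning on actual material.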

Crossmodal Perception of Mismatched Emotional Expressions by Embodied Agents (에이전트의 표정과 목소리 정서의 교차양상지각)

  • Cho, Yu-Suk;Suk, Ji-He;Han, Kwang-Hee
    • Science of Emotion and Sensibility / v.12 no.3 / pp.267-278 / 2009
  • Embodied agents attract considerable interest today because of their vital role in human-human and human-computer interactions in virtual worlds. A number of researchers have found that people can recognize and distinguish between various emotions expressed by an embodied agent. In addition, many studies have found that people respond to simulated emotions in much the same way as to human emotions. This study investigates how people interpret mismatched emotions expressed by an embodied agent (e.g., a happy face with a sad voice), and whether audio-visual channel integration occurs or one channel dominates when participants judge the emotion. The study employed a 4 (visual: happy, sad, warm, cold) $\times$ 4 (audio: happy, sad, warm, cold) within-subjects repeated-measures design. The results suggest that people perceive emotions by relying on both channels rather than on just one. Additionally, the facial expression (happy face vs. sad face) changes the relative influence of the two channels: the audio channel has more influence on the interpretation of emotions when the facial expression is happy. Participants also perceived emotions that were expressed by neither the face nor the voice in the mismatched expressions, which suggests that an embodied agent could express varied and subtle emotions using only a few kinds of emotions.
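
To make the 4 $\times$ 4 design concrete, the stimulus conditions can be enumerated as below. This is only an illustrative sketch: the variable names are assumptions, and the abstract does not say how trials were ordered or repeated.

```python
from itertools import product

# The four emotion levels named in the abstract, used for both channels.
EMOTIONS = ["happy", "sad", "warm", "cold"]

# Every (visual, audio) pairing in a full 4 x 4 within-subjects design.
conditions = list(product(EMOTIONS, EMOTIONS))

# Pairings where the face and the voice express different emotions.
mismatched = [(face, voice) for face, voice in conditions if face != voice]
```

Of the 16 pairings, 12 are mismatched and 4 are congruent, so most cells in such a design probe how the two channels are combined.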

A Study on Lip Detection based on Eye Localization for Visual Speech Recognition in Mobile Environment (모바일 환경에서의 시각 음성인식을 위한 눈 정위 기반 입술 탐지에 대한 연구)

  • Gyu, Song-Min;Pham, Thanh Trung;Kim, Jin-Young;Taek, Hwang-Sung
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.4 / pp.478-484 / 2009
  • Automatic speech recognition (ASR) is an attractive technology amid the current trend toward a more convenient life. Although many approaches to ASR have been proposed, performance is still poor in noisy environments. State-of-the-art speech recognition now uses not only audio information but also visual information. In this paper, we present a novel lip detection method for visual speech recognition in mobile environments. To apply visual information to speech recognition, the lip region must be extracted accurately. Because eye detection is easier than lip detection, we first detect the positions of the left and right eyes and then roughly locate the lip region. We then apply K-means clustering to divide that region into groups, and detect the two lip corners and the lip center by choosing the largest of the clustered groups. Finally, we show the effectiveness of the proposed method through experiments on the Samsung AVSR database.
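
The pipeline outlined in this abstract (eyes, then a rough lip box, then K-means, then the largest cluster) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' method: the geometric ratios in `rough_lip_box`, the use of a single scalar value per pixel, the deterministic cluster initialisation, and all function names are hypothetical.

```python
def rough_lip_box(left_eye, right_eye):
    """Guess a lip bounding box (x0, y0, x1, y1) from two eye centres.

    The 0.6 / 0.8 / 1.4 ratios are assumed values for an upright face.
    """
    (lx, ly), (rx, ry) = left_eye, right_eye
    d = rx - lx  # inter-eye distance
    cx, cy = (lx + rx) / 2, (ly + ry) / 2
    return (cx - 0.6 * d, cy + 0.8 * d, cx + 0.6 * d, cy + 1.4 * d)


def kmeans_1d(values, k=2, iters=20):
    """Plain K-means on scalars (k >= 2), centres spread evenly at start."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest centre.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Move each centre to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels


def lip_landmarks(pixels):
    """pixels: list of (x, y, value) taken from inside the rough lip box.

    Returns (left corner, right corner, centre) of the largest cluster.
    """
    labels = kmeans_1d([p[2] for p in pixels])
    biggest = max(set(labels), key=labels.count)
    pts = [(x, y) for (x, y, _), lab in zip(pixels, labels) if lab == biggest]
    left, right = min(pts), max(pts)  # leftmost and rightmost points
    center = ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2)
    return left, right, center
```

A real system would cluster on colour features from camera frames; the scalar value here simply stands in for whatever feature separates lip pixels from skin.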

A survey on the nonpharmacologic nursing intervention for children in pain (통증 환아를 위한 비약물적 간호 중재 방법 조사)

  • Yoon Hea Bong;Cho Kyoul Ja
    • Child Health Nursing Research / v.6 no.2 / pp.144-157 / 2000
  • This study was conducted to understand nonpharmacologic pain management for pediatric patients and nurses' knowledge of and attitudes toward it. The aim was to identify which methods nurses used according to the patient's age and how effectively they used these methods in practice. The subjects were 77 nurses working in the pediatric unit of the Kyung Medical Center, surveyed by questionnaire from September 2 to 15, 1999. The results were as follows: 1. The patients were divided into four age groups: younger than one year, 1-6 years, 6-12 years, and 12-18 years. For the group younger than one year, most of the participating nurses used speaking in soft, quiet tones, supportive touch, toys, and pacifiers. For the 1-6 years group, they used speaking in soft, quiet tones, toys, distraction, storytelling, and visual stimuli. For the 6-12 years group, they used pop-up books, providing information, cold therapy, speaking in soft, quiet tones, and supportive touch. For the 12-18 years group, most of them used providing information, controlling respiration, and supportive touch. 2. The nursing interventions found effective in practice for the group younger than one year were speaking in soft, quiet tones, pacifiers, and nesting with a blanket. For the 1-6 years group, speaking in soft, quiet tones, toys, and supportive touch were effective methods of nonpharmacologic pain management. For the 6-12 years group, storytelling, supportive touch, and speaking in soft, quiet tones were effective, and for the 12-18 years group, providing information, cold therapy, and supportive touch were used effectively for nonpharmacologic pain management. 3. Comparing general characteristics with nonpharmacologic pain nursing interventions, tactile stimuli were widely used in the group younger than one year. In the 1-6 and 6-12 years groups, visual and audio methods were widely used. In the 12-18 years group, sensitive interventions were used as well as education, information, and guided imagery. In conclusion, there were no significant differences in nonpharmacologic pain management by nurses' demographic characteristics or child's age. The only significant difference was by the nurses' working area: nurses working in the surgical department used more audio-visual-tactile pain management methods than those in the medical department.

Speech Animation Synthesis based on a Korean Co-articulation Model (한국어 동시조음 모델에 기반한 스피치 애니메이션 생성)

  • Jang, Minjung;Jung, Sunjin;Noh, Junyong
    • Journal of the Korea Computer Graphics Society / v.26 no.3 / pp.49-59 / 2020
  • In this paper, we propose a speech animation synthesis method specialized for Korean, based on a rule-based co-articulation model. Speech animation is widely used in cultural industries such as movies, animations, and games that require natural and realistic motion. However, because techniques for audio-driven speech animation have been developed mainly for English, the animation results for domestic content are often visually very unnatural. For example, a voice actor's dubbing is played with no mouth motion at all or, at best, with an unsynchronized loop of simple mouth shapes. Although there are language-independent speech animation models, they are not specialized for Korean and do not yet deliver the quality needed for domestic content production. Therefore, we propose a natural speech animation synthesis method, driven by input audio and text, that reflects the linguistic characteristics of Korean. Reflecting the fact that vowels mostly determine the mouth shape in Korean, we define a co-articulation model that separates the lips and the tongue to solve the earlier problems of lip distortion and occasionally missing phoneme characteristics. Our model also reflects differences in prosodic features to improve the dynamics of the speech animation. Through user studies, we verify that the proposed model can synthesize natural speech animation.

DECODE: A Novel Method of DEep CNN-based Object DEtection using Chirps Emission and Echo Signals in Indoor Environment (실내 환경에서 Chirp Emission과 Echo Signal을 이용한 심층신경망 기반 객체 감지 기법)

  • Nam, Hyunsoo;Jeong, Jongpil
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.3 / pp.59-66 / 2021
  • Among the five senses (sight, hearing, smell, touch, taste), humans mainly use visual and auditory information to recognize surrounding objects. Most recent object recognition research focuses on analysis of image sensor information. In this paper, we emitted various chirp audio signals into the observation space, collected the echoes through a 2-channel receiving sensor, converted them into spectral images, and conducted an object recognition experiment in 3D space using a deep-learning-based image learning algorithm. The experiment was conducted with the noise and echo of a general indoor environment rather than under the ideal conditions of an anechoic room, and echo-based object recognition estimated the position of the object with 83% accuracy. In addition, by mapping the inference result to the observation space and a 3D sound spatial signal and outputting it as sound, visual information could be obtained through sound via learning of 3D sound. This suggests that object recognition research should use various echo information along with image information, and the technology may be applicable to augmented reality through 3D sound.
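
As a rough illustration of the first two steps in this abstract (emitting a chirp and turning a recording into a spectral image), the sketch below generates a linear chirp and computes a magnitude spectrogram with a naive DFT. The sample rate, frequency range, frame size, and function names are assumptions for illustration; the abstract does not state the paper's actual signal parameters, and echo capture and the CNN stage are not modelled here.

```python
import cmath
import math


def linear_chirp(f0, f1, duration, fs):
    """Sine sweep whose frequency rises linearly from f0 to f1 (Hz)."""
    n = int(duration * fs)
    k = (f1 - f0) / duration  # sweep rate in Hz per second
    return [math.sin(2 * math.pi * (f0 * t + 0.5 * k * t * t))
            for t in (i / fs for i in range(n))]


def spectrogram(signal, frame=128, hop=128):
    """Magnitude spectrogram: one row per frame, one column per DFT bin."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame - 1))
              for i in range(frame)]
    rows = []
    for start in range(0, len(signal) - frame + 1, hop):
        seg = [s * w for s, w in zip(signal[start:start + frame], window)]
        # Naive DFT over the non-negative frequency bins only.
        rows.append([abs(sum(x * cmath.exp(-2j * math.pi * b * i / frame)
                             for i, x in enumerate(seg)))
                     for b in range(frame // 2)])
    return rows


# Hypothetical parameters: 0.1 s sweep from 500 Hz to 3 kHz at 8 kHz.
chirp = linear_chirp(500.0, 3000.0, 0.1, 8000)
spec = spectrogram(chirp)
# Index of the strongest frequency bin in each frame.
peak_bins = [row.index(max(row)) for row in spec]
```

For an upward sweep the strongest bin moves to higher frequencies from frame to frame, which is the diagonal stripe one would see in the spectral image. In practice an FFT routine would replace the naive DFT used here for clarity.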

A Study on the Creative, Conceptual Using of Digital Technique (디지털 기법의 창조적, 개념적 활용의 유형에 관한 사례 연구 - 공간디자인 프로세스를 중심으로 -)

  • 박영태
    • Korean Institute of Interior Design Journal / no.28 / pp.158-166 / 2001
  • The e-revolution is changing methodologies all over the world. Real-time presentation lets people access audio and visual content wherever and whenever they are. In the past, computers were regarded only as tools that made work easier. However, the meaning of the computer is changing with the e-revolution. Computers are no longer what they once were; they have done many things we thought impossible, and they will continue to do so in the future. This new wave encourages design educators to use computers in everything they do. For example, instead of a pencil and drafting board, most people in the design field now work with monitors, a mouse, and a plotter. Therefore, most people in the design field need computer skills, and they have to use computers not only in class but also in the office. However, using computers only for visual presentation in class is not enough to keep up with the e-revolution. We should also work with computers in creative and conceptual design, for example by using design information and applying digital techniques in the early stage of the work. The purpose of this study is to show how to work with computers in the spatial design process, especially the use of the DIS (Design Information System) and the application of digital techniques in the early stage of the work.

A DATABASE FOR HUMAN PERFORMANCE UNDER SIMULATED EMERGENCIES OF NUCLEAR POWER PLANTS

  • Park, Jin-Kyun;Jung, Won-Dea
    • Nuclear Engineering and Technology / v.37 no.5 / pp.491-502 / 2005
  • Reliable human performance is a prerequisite for securing the safety of complicated process systems such as nuclear power plants. However, the knowledge available to explain why operators deviate from an expected performance level is very limited because real accidents are infrequent. Therefore, in this study, a database containing useful information extracted from simulated emergencies was developed to provide important clues for understanding changes in operators' performance under stressful conditions (i.e., real accidents). The database was developed in the Microsoft Windows™ environment using Microsoft Access 97™ and Microsoft Visual Basic 6.0™. Operators' performance data obtained from the analysis of over 100 audio-visual records of simulated emergencies were stored in the database using twenty kinds of distinctive data fields. A total of ten kinds of operators' performance data are available from the developed database. Although it is still difficult to predict operators' performance under stressful conditions from the results of simulated emergencies, simulation studies remain the most feasible way to scrutinize performance. Accordingly, the performance data of this study are expected to provide a concrete foundation for understanding changes in operators' performance in emergency situations.

Emotional Evaluation According to the Changes of Visual and Auditory Landscape Elements in Residential Areas (주거단지의 시청각 조경요소 변화에 따른 감성평가)

  • Shin, Yong-Gyu;Jeon, Ji-Hyeon;Jang, Gil-Soo;Kim, Sun-Woo;Kook, Chan
    • Transactions of the Korean Society for Noise and Vibration Engineering / v.17 no.7 s.124 / pp.611-616 / 2007
  • This study aims to clarify how users' responses differ depending on variations in the audio-visual landscape elements used to create amenities in residential areas. For this purpose, a laboratory experiment was performed to evaluate the emotions of subjects. The subjective evaluation showed that the subjects' emotions were promoted significantly more when sounds and images were provided together than when images were provided alone. In addition, a comparison of the relative energy of alpha waves, obtained by measuring the subjects' brain waves, showed that alpha waves increased when harmonious sound sources were provided with images, except for specific sound sources. Thus, providing sound sources capable of promoting human emotions can contribute greatly to improving the value of a space for a comfortable housing environment.