• Title/Summary/Keyword: audio database


Design of Image Retrieval System Based on XML Database Using Embedded System (임베디드 시스템을 이용한 XML 데이터베이스를 기반으로 이미지 검색 시스템의 설계)

  • Kim, Kyung-Soo
    • Convergence Security Journal / v.9 no.2 / pp.85-89 / 2009
  • To solve these problems, this study designs a search system that combines the two methods. So that handheld devices such as PDAs and smartphones can search and manage image data, the system is designed to run on an embedded system. With it, multimedia data can be searched and used efficiently on handheld devices.


Design and Implementation of a Real-time Audio Service System Using Database (데이터베이스를 이용한 실시간 오디오 서비스 시스템의 설계 및 구현)

  • 배진욱;이태원;홍석진;용환승;이석호
    • Proceedings of the Korean Information Science Society Conference / 1998.10b / pp.24-26 / 1998
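  • Existing systems that serve audio data in real time over the web rely on a web server to provide the service. However, a web server designed for general-purpose web service cannot perform the transmission control that real-time data delivery requires. To address this drawback, we propose a real-time audio service system built around an audio server (the AEAP server) that is capable of transmission control. In this system, audio data is split into small fixed-size pieces and stored in a database; when a user request arrives, the pieces are sent at fixed time intervals, which implements transmission control. As a result, user latency decreased and the number of concurrent users increased.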
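
The chunk-and-pace idea in this abstract can be sketched as follows. This is a minimal illustration, not the AEAP server itself: the `chunks` table layout, the 4096-byte piece size, the 0.1-second interval, and the function names `store_audio` and `stream_audio` are all assumptions made for the example, and SQLite stands in for whatever database the authors used.

```python
import sqlite3
import time
from typing import Callable, Iterator

CHUNK_SIZE = 4096  # assumed piece size in bytes (not given in the abstract)


def store_audio(conn: sqlite3.Connection, track_id: str, audio: bytes,
                chunk_size: int = CHUNK_SIZE) -> int:
    """Split audio into fixed-size pieces and store one row per piece."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "track_id TEXT, seq INTEGER, data BLOB, "
        "PRIMARY KEY (track_id, seq))"
    )
    pieces = [audio[i:i + chunk_size]
              for i in range(0, len(audio), chunk_size)]
    conn.executemany(
        "INSERT INTO chunks VALUES (?, ?, ?)",
        [(track_id, seq, piece) for seq, piece in enumerate(pieces)],
    )
    conn.commit()
    return len(pieces)


def stream_audio(conn: sqlite3.Connection, track_id: str,
                 interval_s: float = 0.1,
                 sleep: Callable[[float], None] = time.sleep
                 ) -> Iterator[bytes]:
    """Yield the stored pieces in order, pausing after each one.

    The pause is the (assumed) transmission control: the server, not
    the client, decides how fast data leaves.
    """
    rows = conn.execute(
        "SELECT data FROM chunks WHERE track_id = ? ORDER BY seq",
        (track_id,),
    )
    for (data,) in rows:
        yield data
        sleep(interval_s)
```

A caller would write each yielded piece to the client socket; passing a different `sleep` callable makes the pacing easy to test without real delays.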

Speech Emotion Recognition Using 2D-CNN with Mel-Frequency Cepstrum Coefficients

  • Eom, Youngsik;Bang, Junseong
    • Journal of information and communication convergence engineering / v.19 no.3 / pp.148-154 / 2021
  • With the advent of context-aware computing, many attempts have been made to understand emotions. Among them, Speech Emotion Recognition (SER) is a method of recognizing the speaker's emotions from speech information. Successful SER depends on selecting distinctive features and classifying them appropriately. In this paper, the performance of SER using neural network models (e.g., fully connected network (FCN), convolutional neural network (CNN)) with Mel-Frequency Cepstral Coefficients (MFCC) is examined in terms of the accuracy and distribution of emotion recognition. On the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset, after tuning model parameters, a two-dimensional Convolutional Neural Network (2D-CNN) model with MFCC showed the best performance, with an average accuracy of 88.54% for five emotions (anger, happiness, calm, fear, and sadness) across male and female speakers. In addition, examining the distribution of emotion recognition accuracies across the neural network models indicates that the 2D-CNN with MFCC can be expected to reach an overall accuracy of 75% or more.

Development and validation of a Korean Affective Voice Database (한국형 감정 음성 데이터베이스 구축을 위한 타당도 연구)

  • Kim, Yeji;Song, Hyesun;Jeon, Yesol;Oh, Yoorim;Lee, Youngmee
    • Phonetics and Speech Sciences / v.14 no.3 / pp.77-86 / 2022
  • In this study, we report the validation results of the Korean Affective Voice Database (KAV DB), an affective voice database available for scientific and clinical use, comprising a total of 113 validated affective voice stimuli. The KAV DB includes audio recordings of two actors (one male and one female), each uttering 10 semantically neutral sentences with the intention of conveying six different affective states (happiness, anger, fear, sadness, surprise, and neutral). To validate the KAV DB, the database was organized into three separate voice stimulus sets. Participants rated the stimuli on six rating scales corresponding to the six targeted affective states by using a 100 horizontal visual analog scale. The KAV DB showed high internal consistency for voice stimuli (Cronbach's α=.847). The database had high sensitivity (mean=82.8%) and specificity (mean=83.8%). The KAV DB is expected to be useful for both academic research and clinical purposes in the field of communication disorders. The KAV DB is available for download at https://kav-db.notion.site/KAV-DB-7539a36abe2e414ebf4a50d80436b41a.
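
For readers unfamiliar with the statistics quoted in this abstract, the sketch below shows the standard textbook definitions of Cronbach's alpha, α = k/(k−1) · (1 − Σ sᵢ² / s_total²), and of sensitivity and specificity. The data layout (one row per rater, one column per item) and the function names are assumptions for illustration; the abstract does not say how the authors arranged their ratings, and no data from the study is reproduced here.

```python
from statistics import variance
from typing import Sequence, Tuple


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a ratings matrix.

    scores[r][i] is the score that rater r gave to item i
    (hypothetical layout). Uses sample variances throughout.
    """
    k = len(scores[0])  # number of items
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def sensitivity_specificity(tp: int, fn: int,
                            tn: int, fp: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)
```

With perfectly consistent items, such as `[[1, 1], [2, 2], [3, 3]]`, the item variances sum to half of the total-score variance and alpha comes out as 1.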

Additive Data Insertion into MP3 Bitstream Using linbits Characteristics (Linbits 특성을 이용하여 MP3 비트스트림에 부가적인 정보를 삽입하는 방법에 관한 연구)

  • 김도형;양승진;정재호
    • The Journal of the Acoustical Society of Korea / v.22 no.7 / pp.612-621 / 2003
  • As the use of MP3 audio compression has increased, demand has grown for inserting additional data, such as copyright or content information, into music files, and related research has been actively pursued. When additional data is inserted into an MP3 bitstream, modifying the bitstream structure should neither distort the music quality nor change the file size. To satisfy these conditions, we inserted additional data into the bitstream by modifying some bits of the linbits of quantized integer coefficients with large values, taking into account the characteristics of linbits and their distributions. A subjective sound quality evaluation using a MOS test confirmed that a quality of MOS 4.6 can be achieved at a data insertion rate of 60 bytes/sec. With the proposed method, additional data such as copyright information or information about the media itself can be inserted effectively, enabling applications such as audio database management.
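
The general idea of hiding data in linbits can be sketched as below. In MP3 Huffman coding, a quantized magnitude of 15 or more is coded as the escape value 15 plus a fixed-width linbits remainder, so altering a remainder bit leaves the bitstream length unchanged. Everything beyond that is an assumption: this sketch overwrites only the least significant bit of the remainder and uses every escaped coefficient as a carrier, whereas the abstract says the authors chose bits based on linbits characteristics and distributions that it does not detail. The names `embed_bits` and `extract_bits` are hypothetical, and the code works on a plain list of integers rather than a real bitstream.

```python
from typing import Iterable, List, Sequence, Tuple

ESCAPE = 15  # magnitudes >= 15 are coded as 15 plus a linbits remainder


def embed_bits(coeffs: Sequence[int],
               bits: Iterable[int]) -> Tuple[List[int], int]:
    """Write payload bits into the LSB of each linbits remainder.

    Returns the modified coefficients and the number of bits embedded.
    Each carrier changes by at most 1 and stays >= ESCAPE in magnitude,
    so the same coefficients are carriers when extracting.
    """
    out = list(coeffs)
    remaining = iter(bits)
    used = 0
    for i, c in enumerate(out):
        mag = abs(c)
        if mag < ESCAPE:
            continue  # no linbits field for this coefficient
        try:
            bit = next(remaining)
        except StopIteration:
            break  # payload exhausted
        lin = ((mag - ESCAPE) & ~1) | (bit & 1)
        new_mag = ESCAPE + lin
        out[i] = new_mag if c >= 0 else -new_mag
        used += 1
    return out, used


def extract_bits(coeffs: Sequence[int], count: int) -> List[int]:
    """Read back `count` payload bits from the linbits LSBs."""
    bits: List[int] = []
    for c in coeffs:
        if len(bits) == count:
            break
        if abs(c) >= ESCAPE:
            bits.append((abs(c) - ESCAPE) & 1)
    return bits
```

Because only the remainder's lowest bit changes, the perceptual impact is limited to a one-step change in an already large coefficient, which is consistent with the abstract's goal of avoiding audible distortion.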

Energy and Statistical Filtering for a Robust Audio Fingerprinting System (강인한 오디오 핑거프린팅 시스템을 위한 에너지와 통계적 필터링)

  • Jeong, Byeong-Jun;Kim, Dae-Jin
    • The Journal of the Korea Contents Association / v.12 no.5 / pp.1-9 / 2012
  • The popularity of digital music and smartphones has led to the development of noise-robust, real-time audio fingerprinting systems in various forms. In particular, Multiple Hashing (MLH), one of the fingerprint algorithms, is robust to noise and has an elaborate structure. In this paper, we propose a filter engine based on MLH to achieve better performance. In this approach, we compose an energy-intensive filter to improve the accuracy of Q/R from the music database and a statistical filter to remove continuity and redundancy. The energy-intensive filter uses the property of the Discrete Cosine Transform (DCT) of concentrating energy in the low-order bits, and the statistical filter uses the correlation among the information of the retrieved fingerprints. Experimental results show the superiority of the proposed algorithm, which combines energy and statistical filtering, in noisy environments. The proposed filter engine is more robust to noise than Philips Robust Hash (PRH) and more compact than MLH.

Robust Feature Extraction Based on Image-based Approach for Visual Speech Recognition (시각 음성인식을 위한 영상 기반 접근방법에 기반한 강인한 시각 특징 파라미터의 추출 방법)

  • Gyu, Song-Min;Pham, Thanh Trung;Min, So-Hee;Kim, Jing-Young;Na, Seung-You;Hwang, Sung-Taek
    • Journal of the Korean Institute of Intelligent Systems / v.20 no.3 / pp.348-355 / 2010
  • Despite advances in speech recognition technology, speech recognition in noisy environments remains a difficult task. To solve this problem, researchers have proposed various methods that use visual information in addition to audio information for visual speech recognition. However, visual information contains visual noise, just as audio information contains audio noise, and this visual noise degrades visual speech recognition. How to extract visual feature parameters that enhance visual speech recognition performance is therefore a topic of interest. In this paper, we propose an image-based method of visual feature parameter extraction to enhance the recognition performance of an HMM-based visual speech recognizer. For the experiments, we constructed an audio-visual database of 105 speakers, each of whom uttered 62 words. We applied histogram matching, lip folding, RASTA filtering, a linear mask, DCT, and PCA. The experimental results show that the proposed method improves recognition performance by about 21% over the baseline method.

Design and Implementation of Wireless RFID Assistant System for Activity Monitoring of Elderly Living Alone (독거노인 활동 모니터링을 위한 보조 시스템의 설계 및 구현)

  • Jung, Kyung-Kwon;Lee, Yong-Gu;Kim, Yong-Joong
    • 전자공학회논문지 IE / v.46 no.3 / pp.55-61 / 2009
  • This paper describes an assistant system for elderly people who live alone. The developed system is composed of a wearable RFID system, a gateway system, and a server system. The wearable RFID system is installed in a glove and can be regarded as a wireless sensor network that has a sink node and a sensor node with an RFID reader. The sensor node reads RFID tags on various objects used in daily living, such as furniture, medicines, sugar and salt bottles, and so on, and transmits wireless packets to the sink node. The sink node immediately sends each received packet to the server system via the gateway system. The gateway provides users with audio-visual information about objects. The server system is composed of a database server and a web server. The data from each wearable RFID system is collected into a database and then processed to visualize measurements of the users' daily living activities. The processed data can be provided to those who want to know about a user's daily living patterns at home, such as family, caregivers, and medical staff.

Design Of an Advertising I-Commerce Server Using Push Technology (푸시 기술을 이용한 광고형 전자상거래 서버의 설계)

  • 박은영;장시웅
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2000.10a / pp.355-359 / 2000
  • This paper presents the design of an advertising E-commerce (electronic commerce) server using push technology, which provides clients with multimedia information such as text, audio, and video according to their settings. In most existing E-commerce systems, users visit a web site themselves, browse, and buy goods. In this paper we show a new E-commerce format: clients select on the browser the field of advertisements they want to see, and the EC server then sends them moving advertisements at regular intervals. The users then view the advertisements and buy goods. The server of this system is designed as a push server that stores moving advertisements in the database and sends user-specific advertisements to users. In contrast to existing passive E-commerce systems, the system designed in this study is an active one.


Music Search Algorithm for Automotive Infotainment System (자동차 환경의 인포테인먼트 시스템을 위한 음악 검색 알고리즘)

  • Kim, Hyoung-Gook;Kim, Jae-Man
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.12 no.1 / pp.81-87 / 2013
  • In this paper, we propose a music search algorithm for automotive infotainment systems. The proposed method extracts fingerprints from the high peaks of the log-spectrum of the music signal and stores the extracted fingerprints in a cloud server under a hash value. In the cloud server, the most similar music is retrieved by comparing the user's query music with the fingerprints stored in the server's hash table. To evaluate the proposed algorithm, we measure the accuracy of the retrieved results for various query lengths and the retrieval time as a function of the number of music items stored in the hash table.
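
The hash-table lookup described in this abstract can be sketched as follows. The sketch assumes the spectral peaks have already been extracted as `(frame, frequency_bin)` pairs, and it hashes pairs of nearby peaks and votes on a consistent time offset, which is a common peak-pairing scheme swapped in for illustration; the abstract does not say how the authors form their hash values or score matches. `FingerprintIndex`, `fingerprints`, and `fan_out` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Peak = Tuple[int, int]       # (frame index, frequency bin)
Key = Tuple[int, int, int]   # (bin 1, bin 2, frame gap)


def fingerprints(peaks: Sequence[Peak],
                 fan_out: int = 3) -> Iterator[Tuple[Key, int]]:
    """Pair each peak with the next few peaks and yield (key, frame).

    `peaks` must be sorted by frame index.
    """
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            yield (f1, f2, t2 - t1), t1


class FingerprintIndex:
    """In-memory stand-in for the server-side hash table."""

    def __init__(self) -> None:
        self.table: Dict[Key, List[Tuple[str, int]]] = defaultdict(list)

    def add(self, song_id: str, peaks: Sequence[Peak]) -> None:
        for key, t in fingerprints(peaks):
            self.table[key].append((song_id, t))

    def query(self, peaks: Sequence[Peak]) -> Optional[str]:
        """Return the song with the most offset-consistent key matches."""
        votes: Counter = Counter()
        for key, t_query in fingerprints(peaks):
            for song_id, t_song in self.table.get(key, ()):
                votes[(song_id, t_song - t_query)] += 1
        if not votes:
            return None
        (song_id, _offset), _count = votes.most_common(1)[0]
        return song_id
```

Voting on the time offset rather than on raw key hits means a short excerpt taken from the middle of a track still matches, since all of its keys agree on the same offset.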