• Title/Summary/Keyword: Audio Data


Development of a Mobile Application for Disease Prediction Using Speech Data of Korean Patients with Dysarthria (한국인 구음장애 환자의 발화 데이터 기반 질병 예측을 위한 모바일 애플리케이션 개발)

  • Changjin Ha;Taesik Go
    • Journal of Biomedical Engineering Research / v.45 no.1 / pp.1-9 / 2024
  • Communication with others plays an important role in human social interaction and information exchange in modern society. However, some individuals have difficulty communicating due to dysarthria, so effective diagnostic techniques are needed to enable early treatment. In the present study, we propose a mobile device-based methodology that automatically classifies the type of dysarthria. A lightweight CNN model was trained on an open audio dataset of Korean patients with dysarthria. The trained CNN model classifies dysarthria into the related disease subtype with 78.8% to 96.6% accuracy. In addition, a user-friendly mobile application was developed based on the trained CNN model. Users can record their voices according to the selected inspection type (e.g., word, sentence, paragraph, or semi-free speech) and evaluate the recorded voice data on their mobile device with the developed application. The proposed technique should be helpful for personal management of dysarthria and for clinical decision making.

Analysis of Livestock Vocal Data using Lightweight MobileNet (경량화 MobileNet을 활용한 축산 데이터 음성 분석)

  • Se Yeon Chung;Sang Cheol Kim
    • Smart Media Journal / v.13 no.6 / pp.16-23 / 2024
  • Pigs express their reactions to their environment and health status through a variety of sounds, such as grunting, coughing, and screaming. Given the significance of pig vocalizations, their study has recently become a vital source of data for livestock industry workers. To facilitate this, we propose a lightweight deep learning model based on MobileNet that analyzes pig vocal patterns to distinguish pig voices from farm noise and differentiate between vocal sounds and coughing. This model was able to accurately identify pig vocalizations amidst a variety of background noises and cough sounds within the pigsty. Test results demonstrated that this model achieved a high accuracy of 98.2%. Based on these results, future research is expected to address issues such as analyzing pig emotions and identifying stress levels.

Design and Implementation of the Endoscope Image Store System in the Orthopedics (정형외과 관절경 영상 저장 시스템의 설계 및 구현)

  • 심갑식;정태영
    • Journal of the Korea Society of Computer and Information / v.7 no.4 / pp.8-15 / 2002
  • This paper proposes the design and implementation of a database system for storing medical images. The system collects medical images while doctors operate on and diagnose patients using an endoscope in orthopedics, then stores the image data in a database. As a result, the system avoids duplicated medical data and retrieves and updates medical data effectively. The medical image data can be shared among multiple users and application programs. The system consists of five components: an input module that acquires medical images from the endoscope; a module that stores the medical images; a database, designed and implemented to store each patient's disease history and medical image data; a user-friendly interface; and a simple data retrieval engine. The features of the system are as follows. The image catcher program, built on DirectShow, is portable to any image catcher board. In addition, because the image catcher algorithm is implemented as a public module, throughput can be increased during the development of video and audio content on the Internet.


Data Transmission Method using Broadcasting in Bluetooth Low Energy Environment (저전력 블루투스 환경에서 브로드캐스팅을 이용한 데이터전송 방법)

  • Jang, Rae-Young;Lee, Jae-Ung;Jung, Sung-Jae;Soh, Woo-Young
    • Journal of Digital Contents Society / v.19 no.5 / pp.963-969 / 2018
  • Wi-Fi and Bluetooth are perhaps the most prominent wireless communication technologies used in the Internet of Things (IoT) environment. Compared to the widely used Wi-Fi, Bluetooth has several drawbacks, including 1:1 (one-way) connections between Master and Slave, slow transmission, and limited connection range; it is mainly used for connecting audio devices. The release of Bluetooth Low Energy (BLE) addressed some of these drawbacks, but BLE has still not become a competitive alternative to Wi-Fi. This paper presents a method of transmitting data through broadcasting in BLE and demonstrates its performance with one-to-many data transfer results. The Connection-Free Data Transmission proposed in this paper is expected to be useful in special circumstances that require 1:N data transmission or in disaster security networks.
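
The abstract does not describe the packet layout used for connection-free transmission, so the following is only a minimal sketch of one way to split a message across BLE advertising payloads. It assumes legacy advertising (31-byte payload limit), a single manufacturer-specific data structure per packet, the test company identifier 0xFFFF, and a hypothetical two-byte chunk header (index, total); none of these details come from the paper.

```python
# Sketch: chunk a message into legacy BLE advertising payloads (assumed layout).
ADV_PAYLOAD_MAX = 31          # legacy advertising data limit in bytes
AD_TYPE_MANUFACTURER = 0xFF   # "Manufacturer Specific Data" AD type
COMPANY_ID = 0xFFFF           # assumption: identifier reserved for testing
# length byte + AD type + company ID (2) + chunk index + chunk total
OVERHEAD = 1 + 1 + 2 + 1 + 1
CHUNK_SIZE = ADV_PAYLOAD_MAX - OVERHEAD  # 25 bytes of user data per packet


def to_adv_payloads(data: bytes) -> list:
    """Split data into advertising payloads, each carrying (index, total)."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    if len(chunks) > 255:
        raise ValueError("message too long for a one-byte chunk counter")
    payloads = []
    for index, chunk in enumerate(chunks):
        body = (bytes([AD_TYPE_MANUFACTURER])
                + COMPANY_ID.to_bytes(2, "little")
                + bytes([index, len(chunks)])
                + chunk)
        # The AD length byte counts everything after itself.
        payloads.append(bytes([len(body)]) + body)
    return payloads


def reassemble(payloads) -> "bytes | None":
    """Rebuild the message from payloads received in any order, or None if incomplete."""
    parts, total = {}, None
    for p in payloads:
        index, total = p[4], p[5]
        parts[index] = p[6:1 + p[0]]
    if total is None or len(parts) != total:
        return None
    return b"".join(parts[i] for i in range(total))


message = bytes(range(60))
packets = to_adv_payloads(message)
print(len(packets), [len(p) for p in packets])
print(reassemble(reversed(packets)) == message)
```

Because broadcasting has no acknowledgements, any number of scanners can run `reassemble` on whatever packets they hear; a receiver that misses a chunk simply waits for the broadcaster to repeat the cycle.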

Temporal attention based animal sound classification (시간 축 주의집중 기반 동물 울음소리 분류)

  • Kim, Jungmin;Lee, Younglo;Kim, Donghyeon;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.39 no.5 / pp.406-413 / 2020
  • In this paper, to improve the classification accuracy of bird and amphibian sounds, we use a GLU (Gated Linear Unit) and self-attention, which encourage the network to extract important features from the data and to pick out the relevant frames from the whole input sequence. To use the acoustic data, we first convert the 1-D acoustic signal to a log-Mel spectrogram. Undesirable components such as background noise in the log-Mel spectrogram are then reduced by the GLU, and the proposed temporal self-attention is applied to improve classification accuracy. The data consist of 6 species of birds and 8 species of amphibians, including endangered species in the natural environment. The proposed method achieves an accuracy of 91% on the bird data and 93% on the amphibian data, an improvement of about 6% to 7% over existing algorithms.
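
As a rough illustration of the two ideas named in this abstract, the sketch below applies GLU-style gating to per-frame features and then pools the frames with softmax attention weights over time. It is not the authors' network: the frame values, gate values, and the fixed `query` vector are made-up stand-ins for quantities a trained model would learn, and the pooling is a simplification of full self-attention.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def glu(values, gates):
    """Gated Linear Unit: each value is scaled by the sigmoid of its gate."""
    return [v * sigmoid(g) for v, g in zip(values, gates)]


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frames, query):
    """Score each frame against a query, then return the weighted average frame."""
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


# Three time frames with two features each (illustrative values only).
frames = [[0.1, 0.0], [2.0, 1.5], [0.2, 0.1]]
# Hypothetical gates: negative gates suppress noisy frames, positive ones pass signal.
gates = [[-4.0, -4.0], [4.0, 4.0], [-4.0, -4.0]]
gated = [glu(f, g) for f, g in zip(frames, gates)]

pooled, weights = temporal_attention_pool(gated, query=[1.0, 1.0])
print(weights)
print(pooled)
```

The gate acts per feature (suppressing noisy bins), while the attention weights act per frame (emphasizing the frames that matter for the class decision), which is why the two mechanisms are complementary.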

Audio Stream Delivery Using AMR(Adaptive Multi-Rate) Coder with Forward Error Correction in the Internet (인터넷 환경에서 FEC 기능이 추가된 AMR음성 부호화기를 이용한 오디오 스트림 전송)

  • 김은중;이인성
    • The Journal of Korean Institute of Communications and Information Sciences / v.26 no.12A / pp.2027-2035 / 2001
  • In this paper, we present audio stream delivery over the Internet using the AMR (Adaptive Multi-Rate) coder, adopted by ETSI and 3GPP as a standard vocoder for the next-generation IMT-2000 service, combined with sender-based FEC and receiver-based reconstruction techniques. With the media-specific FEC scheme, repair data is added to the main data stream so that the contents of lost packets can be recovered, which greatly increases the chance of recovering lost packets. Because the AMR codec is based on the code-excited linear prediction (CELP) coding model, we use frame erasure concealment for CELP-based coders. The proposed scheme is evaluated with the ITU-T G.729 (CS-ACELP) coder and AMR at 12.2 kbit/s through SNR (Signal-to-Noise Ratio) and MOS (Mean Opinion Score) tests. At 10% packet loss, the proposed scheme scores 1.1 higher in MOS and 5.61 dB higher in SNR than AMR at 12.2 kbit/s, and it maintains communicable speech quality at frame erasure rates up to 20%.
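
The general idea of media-specific FEC with receiver-side concealment can be sketched as follows. This is an illustration under simplifying assumptions, not the scheme evaluated in the paper: each packet piggybacks an exact copy of the previous frame (real systems usually send a lower-bit-rate encoding), and concealment simply repeats the last decoded frame instead of using CELP parameter extrapolation. The `Packet` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    seq: int
    primary: bytes               # encoded frame number seq
    redundant: Optional[bytes]   # repair data: copy of frame seq - 1, if any


def packetize(frames):
    """Attach the previous frame to each packet as repair data."""
    return [Packet(i, frame, frames[i - 1] if i > 0 else None)
            for i, frame in enumerate(frames)]


def receive(packets, total):
    """Return (frame, source) per slot; source is 'primary', 'fec', or 'concealed'."""
    by_seq = {p.seq: p for p in packets}
    out, last = [], b""
    for seq in range(total):
        nxt = by_seq.get(seq + 1)
        if seq in by_seq:
            frame, source = by_seq[seq].primary, "primary"
        elif nxt is not None and nxt.redundant is not None:
            frame, source = nxt.redundant, "fec"   # recovered from the next packet
        else:
            frame, source = last, "concealed"      # repeat the last good frame
        out.append((frame, source))
        last = frame
    return out


frames = [bytes([i]) * 4 for i in range(6)]
sent = packetize(frames)
arrived = [p for p in sent if p.seq not in (2, 4, 5)]  # simulate three losses
result = receive(arrived, len(frames))
print([source for _, source in result])
```

An isolated loss (frame 2) is repaired from the next packet, while a burst at the end of the stream (frames 4 and 5) leaves no repair data and falls back to concealment, which is the case where the receiver-side technique matters.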


Spectral Perturbation of Theta and Alpha Wave for the Affective Auditory Stimuli (청각자극에 따른 세타파와 알파파의 스펙트럼적 반응)

  • Du, Ruoyu;Lee, Hyo Jong
    • KIPS Transactions on Software and Data Engineering / v.3 no.10 / pp.451-456 / 2014
  • The correlations between electroencephalographic (EEG) spectral power and emotional responses during listening to affective sound clips are important parameters. Hemispheric asymmetry in prefrontal activation, as measured by power values, was proposed about two decades ago to be related to reactivity to affectively pleasant audio stimuli. In this study, we designed an emotional audio stimulus experiment to verify frontal EEG asymmetry by analyzing Event-Related Spectral Perturbation (ERSP) results. Thirty healthy male college students volunteered for the stimulus experiment, which used standard IADS (International Affective Digital Sounds) clips. The affective sound clips are classified into three emotional states: high pleasure-high arousal (happy), middle pleasure-low arousal (neutral), and low pleasure-high arousal (fear). The data were analyzed in both the theta (4-8 Hz) and alpha (8-13 Hz) bands. ERSP maps in the alpha band revealed stronger power responses to high pleasure (happy) in the right frontal lobe and stronger power responses to middle and low pleasure (neutral and fear) in the left frontal lobe. ERSP maps in the theta band revealed stronger power responses to high arousal (fear and happy) in the left prefrontal lobe and stronger responses to low arousal (neutral) in the right prefrontal lobe. Overall, high pleasure emotions (happy) elicit greater relative right EEG activity, while low and middle pleasure emotions (fear and neutral) elicit greater relative left EEG activity. Additionally, the largest differences in the theta band were found in the medial frontal lobe, which corresponds to the frontal midline theta. The strongest responses in the alpha band, to happy sounds, appeared across the whole frontal region. These results are well suited to emotion recognition and provide evidence that theta and alpha power may play a more important role in emotion processing than previously believed.

Comparison of Multi-channel Terrestrial Broadcasting Service Method Focused on MMS and KoreaView (지상파 다채널방송 서비스 방식 비교 연구 (MMS와 KoreaView 방식을 중심으로))

  • Lee, Chang-Hyung;Park, Sung-Kyu
    • The Journal of the Korea Contents Association / v.12 no.6 / pp.78-91 / 2012
  • Terrestrial DTV service compliant with ATSC has been advancing for years. The KBA (Korean Broadcasters Association) broadcast a multi-channel service on air during the 2006 FIFA World Cup Germany, using various types of MMS (Multi Mode Service) with MPEG-2 encoding. MMS can provide not only one HD channel but also several additional services within a 6 MHz bandwidth. Using digital video compression technology (MPEG-2), a variety of programs such as HDTV, SDTV, audio, and data can be transmitted within the same bandwidth. Since November 2009, KBS has been preparing an advanced MMS service, 'Korea-View', which uses both MPEG-2 and H.264 encoding, the latter compliant with the ATSC mobile standard, A/153. Korea-View is a multi-channel broadcast service that provides one HD and three SD programs within a 6 MHz bandwidth. Terrestrial multi-channel service needs to focus on expanding services for viewers. Such services will contribute to the transition to digital broadcasting and to improving viewers' welfare. Owing to advances in digital technology, the number of pay-TV channels has increased to hundreds. Even though the digital switchover is under way, terrestrial broadcasters have been unable to deliver multi-channel services. In this paper, the technical features of and differences between MMS and Korea-View are analyzed with regard to terrestrial multi-channel broadcasting services, and a policy direction is proposed for the introduction of future services.

Real-Time Implementation of MPEG-1 Layer III Audio Decoder Using TMS320C6201 (TMS320C6201을 이용한 MPEG-1 Layer III 오디오 디코더의 실시간 구현)

  • 권홍석;김시호;배건성
    • The Journal of Korean Institute of Communications and Information Sciences / v.25 no.8B / pp.1460-1468 / 2000
  • The goal of this research is a real-time implementation of the MPEG-1 Layer III audio decoder on the TMS320C6201 fixed-point digital signal processor. The work is twofold: first, converting the floating-point operations in the decoder to fixed-point operations while maintaining high resolution; second, optimizing the program so that it runs in real time with as little memory as possible. We devote particular effort to the descaling module of the decoder, to convert its floating-point operations into fixed-point operations with high accuracy. The inverse modified discrete cosine transform (IMDCT) and synthesis polyphase filter bank modules are optimized to reduce the amount of computation and the memory size. After optimization, the implemented decoder uses about 26% of the maximum computation capacity of the TMS320C6201. The program memory, data ROM, and data RAM used by the decoder are about 6.77 kwords, 3.13 kwords, and 9.94 kwords, respectively. Comparing the PCM output of the fixed-point computation with that of the floating-point computation, we achieve a signal-to-noise ratio of more than 60 dB. Real-time operation is demonstrated on a PC using the sound I/O and host communication functions of the EVM board.
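
The kind of comparison reported in this abstract, fixed-point output measured against a floating-point reference by SNR, can be illustrated with a small sketch. It assumes a Q15 format (16-bit words with 15 fractional bits) and a synthetic 1 kHz test tone; the abstract does not state which fixed-point formats the decoder actually uses, so this shows only the measurement idea, not the decoder itself.

```python
import math

Q15_SCALE = 1 << 15  # assumption: Q15, i.e. 1 sign bit and 15 fractional bits


def to_q15(x: float) -> int:
    """Quantize a float in [-1, 1) to a 16-bit Q15 integer, saturating at the limits."""
    value = int(round(x * Q15_SCALE))
    return max(-Q15_SCALE, min(Q15_SCALE - 1, value))


def from_q15(q: int) -> float:
    return q / Q15_SCALE


def snr_db(reference, test) -> float:
    """SNR in dB, treating (reference - test) as the noise."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    if noise == 0:
        return math.inf
    return 10.0 * math.log10(signal / noise)


# Floating-point reference: 0.1 s of a 1 kHz sine sampled at 44.1 kHz.
reference = [0.5 * math.sin(2 * math.pi * 1000 * n / 44100) for n in range(4410)]
fixed = [from_q15(to_q15(x)) for x in reference]

print(f"SNR of Q15 round trip: {snr_db(reference, fixed):.1f} dB")
```

A single quantization step leaves far more headroom than 60 dB; in a full decoder the error accumulates through many multiply and scale stages, which is why careful scaling in modules such as descaling matters for the final figure.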


Theoretical Investigation on Molecular Diffusion and Conceptual Change of Preservice Teachers by Inquiry Experiment (분자확산에 대한 이론적 고찰과 탐구실험을 통한 예비교사의 개념변화)

  • Seong, Suk-Kyoung;Baek, Jong-Ho;Jeong, Dea-Hong
    • Journal of The Korean Association For Science Education / v.30 no.1 / pp.80-93 / 2010
  • The scope of this study is: (1) to review or summarize the theoretical explanations of diffusion; (2) to investigate the preservice teachers' understanding of diffusion utilizing the inquiry experiment of diffusion that was developed in this study. The data was collected through questionnaires given to 41 preservice teachers in 3 universities and interviews with 20 subjects from this population, who conducted the inquiry experiment. During the experiment, the data was collected from the students' reports and 3 small groups' audio/video recordings. To understand preservice teachers' conceptions, reports, audio/video recordings, questionnaires and interviews were analyzed and discussed with co-workers. The results follow: (1) The differences between effusion and diffusion as well as equal-pressure experiment and equal-flux one on diffusion were discussed; (2) Most preservice teachers understood effusion and diffusion connected to Graham's law of diffusion by rote and have misconceptions about the diffusion process; (3) They observed two kinds of diffusion experiments (equal-pressure and equal-flux) by inquiry experiment, but the majority of them failed to find conceptual differences between these experiments. After the inquiry experiment, about 40% of the samples modified their conceptions about diffusion.