• Title/Summary/Keyword: 오디오 인식 (audio recognition)

A Digital Library Prototype for Access to Diverse Collections (다양한 장서 접근을 위한 디지털 도서관의 프로토타입 구축)

  • Choi Won-Tae
    • Journal of the Korean Society for Library and Information Science / v.32 no.2 / pp.295-307 / 1998
  • This article is an overview of the digital library project, indicating what roles Korea's diverse digital collections may play. Our digital library prototype has a simple architecture consisting of digital repositories, filters, indexing and searching, and clients. Digital repositories include various types of materials and databases. The role of the filters is to recognize the format of a document collection and mark the structural components of each of its documents. We use a database management system (ORACLE and ConText) that supports user-defined functions and access methods, which allows us to easily incorporate new object analysis, structuring, and indexing technology into a repository. Clients can be considered browsers or viewers designed for different document data types, such as image, audio, video, SGML, PDF, and KORMARC. The combination of navigational tools supports a variety of approaches to identifying collections and browsing or searching for individual items. The search interface was implemented using HTML forms and the World Wide Web's CGI mechanism.

A Study on the Classification of Podcasting Users in the Smartphone Era - Podcasting of Terrestrial Radio Programs (스마트폰 시대의 팟캐스팅 이용자 유형화 연구 - 지상파 프로그램의 팟캐스팅을 중심으로)

  • Kim, Cheol-Young
    • The Journal of the Korea Contents Association / v.14 no.11 / pp.628-643 / 2014
  • This study stemmed from the following question: what is the appropriate interpretation of the recent change in the behavior of radio listeners? For some terrestrial radio programs, traditional listeners who mainly listen in real time through terrestrial radio broadcasting have been outnumbered by those who access the programs on smartphones and other mobile devices in a nonlinear way, outside the programs' scheduled time slots. This research used Q methodology to examine how terrestrial radio listeners use the new service called podcasting to access and consume audio content. As a result, three different types of user behavior and perception were modeled. This shows a prominent change in radio content use, which is moving away from the conventional usage pattern of radio, one of the key media for mass communication in the 20th century. Such a development opens up new opportunities to build a user base equal to or even greater than the existing one for terrestrial radio programs through users' newly gained mobile access, and to replace current radio content by using podcasting as a new service.

A Real-time Audio Surveillance System Detecting and Localizing Dangerous Sounds for PTZ Camera Surveillance (PTZ 카메라 감시를 위한 실시간 위험 소리 검출 및 음원 방향 추정 소리 감시 시스템)

  • Nguyen, Viet Quoc; Kang, HoSeok; Chung, Sun-Tae; Cho, Seongwon
    • Journal of Korea Multimedia Society / v.16 no.11 / pp.1272-1280 / 2013
  • In this paper, we propose an audio surveillance system that can detect and localize dangerous sounds in real time. The location information about a dangerous sound can be used to direct a PTZ camera so that it captures a snapshot image of the sound source area and sends it to clients instantly. The proposed system first detects foreground sounds based on an adaptive Gaussian mixture background sound model and classifies them into one of the pre-trained classes of dangerous foreground sounds. For detected dangerous sounds, a sound source localization algorithm based on the dual delay-line algorithm is applied to localize the sound sources. Finally, the proposed system orients a PTZ camera towards the dangerous sound source region and takes a snapshot of that region. Experimental results show that the proposed system detects dangerous foreground sounds stably and classifies them into the correct classes with a precision of 79%, while the sound source localization estimates the orientation of the sound source with an acceptably small error.
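
The localization step in this abstract can be illustrated with a rough sketch. The abstract does not describe the dual delay-line algorithm in detail, so the code below swaps in a simpler, generic technique: it estimates the arrival-time difference between two microphones by brute-force cross-correlation and converts it to a bearing under a far-field assumption. The function names, the two-microphone layout, and the speed-of-sound constant are all assumptions made for illustration, not details from the paper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def estimate_delay_samples(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.

    A positive lag means the sound reached the right microphone later.
    Lags are tried in order of increasing magnitude, so ties favor zero lag.
    """
    best_lag, best_score = 0, float("-inf")
    n = len(left)
    for lag in sorted(range(-max_lag, max_lag + 1), key=abs):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_degrees(delay_samples, sample_rate, mic_spacing):
    """Convert an inter-microphone delay to a far-field bearing in degrees.

    0 degrees is straight ahead (broadside); positive angles point towards
    the left microphone, which hears the sound first when the delay is positive.
    """
    tau = delay_samples / sample_rate
    ratio = SPEED_OF_SOUND * tau / mic_spacing
    ratio = max(-1.0, min(1.0, ratio))  # clamp against measurement noise
    return math.degrees(math.asin(ratio))
```

A camera controller could then take the returned bearing as its pan target; how the real system maps sound direction to PTZ commands is not stated in the abstract.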

Implementation of MPEG-U part2 Reference Software (MPEG-U part2 참조 소프트웨어 설계 및 구현)

  • Han, Gukhee; Baek, A-Ram; Choi, Haechul
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2012.07a / pp.202-205 / 2012
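  • With the recent development of diverse input/output devices in the multimedia field, methods for advanced user interaction (AUI) between input/output devices and users are being studied. The data defined in AUI serve as a medium for exchanging information between input/output devices and a scene description represented by various objects (video, audio, 2D graphic objects, animation, etc.). A common AUI data format must therefore be defined to enable advanced interaction between diverse input/output devices and users. To this end, the ISO/IEC JTC1/SC29/WG11 Moving Picture Experts Group (MPEG) is carrying out the MPEG-U project to standardize the AUI data format as XML (Extensible Markup Language) documents. This paper introduces the MPEG-U standard and designs MPEG-U reference software to verify its validity. The MPEG-U reference software consists largely of a user interface input/output part, which processes data from a UID (User Interaction Device), and an MPEG-U XML generation/parsing part, which processes XML documents. The user interface input part recognizes the user's hand gestures and stores them as AUI parameters, and the MPEG-U XML generation part describes these parameters according to the MPEG-U standard XML schema structure to produce a standardized AUI data format. The standardized XML document is then read back, the MPEG-U XML parsing part extracts the parameters, and the user interface output part renders them as graphic objects in a GUI (Graphical User Interface). In this study, we present an application example of MPEG-U using the reference software and show that the implemented software conforms to the standard.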
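
The generate-then-parse round trip described in this abstract can be sketched as follows. This is only an illustration of the data flow: the element and attribute names (`AUIData`, `HandGesture`, `x`, `y`) are hypothetical placeholders and are not taken from the MPEG-U schema, and the helper names are likewise assumptions.

```python
import xml.etree.ElementTree as ET


def params_to_xml(gesture):
    """Serialize recognized gesture parameters to an XML string.

    Element names are hypothetical placeholders, not the MPEG-U schema.
    """
    root = ET.Element("AUIData")
    node = ET.SubElement(root, "HandGesture", {"type": gesture["type"]})
    for name in ("x", "y"):
        ET.SubElement(node, name).text = str(gesture[name])
    return ET.tostring(root, encoding="unicode")


def xml_to_params(document):
    """Parse the XML string back into the parameter dictionary."""
    node = ET.fromstring(document).find("HandGesture")
    return {
        "type": node.get("type"),
        "x": float(node.findtext("x")),
        "y": float(node.findtext("y")),
    }
```

In this sketch the input side would call `params_to_xml` after gesture recognition, and the output side would call `xml_to_params` before drawing the graphic object.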

A Study on the Woman Oriented Sensibility in Product Design (여성적 감성을 반영한 제품디자인에 관한 연구)

  • Seo Hong-Seok
    • Science of Emotion and Sensibility / v.8 no.3 / pp.231-240 / 2005
  • As women's social participation and economic standing rise, they are emerging in the market as core consumers and as trendsetters who lead fashion. Women must therefore be recognized as one axis of product development, and product development and strategies that build woman-oriented sensibility into product design are needed. This study reviews the sociocultural and economic/marketing background through recent digital products and analyzes the features of feminine designs that stand out in those products. It also connects this analysis to actual product development by developing a woman-oriented sensibility audio product that reflects women's consumption trends, lifestyles, and preferred product styles. Ultimately, it argues for the necessity of woman-oriented product development and examines product design strategies that reflect women's sensibility.

Abnormal Behavior Pattern Identifications of One-person Households using Audio, Vision, and Dust Sensors (음성, 영상, 먼지 센서를 활용한 1인 가구 이상 행동 패턴 탐지)

  • Kim, Si-won; Ahn, Jun-ho
    • Journal of Internet Computing and Services / v.20 no.6 / pp.95-103 / 2019
  • The number of one-person households has grown steadily in recent years, and lonely, unnoticed deaths are increasingly observed. As one-person households have become widespread, a remarkable number of lonely and unnoticed deaths are being reported across age groups on the dark side of society. We propose an unusual event detection method that may help reduce the number of people who die alone and remain undiscovered for a long period of time. The proposed method identifies abnormal user behavior in daily life using vision pattern, audio pattern, and dust pattern algorithms. Each pattern algorithm on its own has the disadvantage of being unable to detect users once they leave its coverage area. We used a fusion method to improve on the accuracy of each individual pattern algorithm and evaluated the technique with multiple user behavior patterns in indoor areas.

Video Adaptation Model for User-Centric Contents Delivery in Mobile Computing (모바일 환경에서 맞춤형 콘텐츠 전달을 위한 비디오 적응성 모델)

  • Kim, Svetlana; Yoon, Yong-Ik
    • The KIPS Transactions:PartA / v.16A no.5 / pp.389-394 / 2009
  • Lately, the use of multimedia equipment with small LCD displays has been increasing rapidly. Although many people use such devices, videos intended for TV or HDTV are sent to these mobile devices. As a result, cases where it is hard for the user to view the desired scenes are becoming more frequent. Currently, most services simply reduce the size of the content to fit the screen when offering it for mobile devices. However, especially with sports broadcasts, many areas cannot be seen well because the video was simply reduced in size. To address this weakness, we are researching how to let the user choose an area of interest and then deliver it in a way that fits the device. In this paper, we address the problem of video delivery and personalization. For the delivered video content, we suggest the UP-SAM (User Personalized Context-Aware Service Adaptation Middleware) model, which uses the video content description and the MPEG-21 multimedia framework.

A Study on Perceptions for the Establishment of Collection Development Policy in Court Libraries (법원도서관 장서개발정책 수립을 위한 이용자 인식조사 연구)

  • Kwak, Seung-Jin; Noh, Younghee; Chang, Inho; Kim, Jeong-Taek; Shin, Youngji
    • Journal of Korean Library and Information Science Society / v.52 no.3 / pp.1-20 / 2021
  • This study conducted a perception survey of users, the most important component in establishing the court library as the best legal library in Korea responsible for professional legal services. First, regarding the collection direction based on the needs of general users, preference by collection type should be considered in the order of books, electronic materials, and non-book materials, and a collection development policy reflecting the high preference for books appears necessary. In addition, among non-book materials, the preference for video over audio is much higher, and in terms of language, mainly domestic books should be collected. Second, regarding the collection direction based on the needs of experts, expert satisfaction is generally low, so a collection development policy should be established to improve it. For the type of information source, preference was shown in the order of electronic materials, books, and non-book materials, and this needs to be reflected. The future collection direction should follow the preference shown in the order of procedural law, specialized fields, basic substantive law, and legal series. Also, when collecting the same title, the electronic form of legal materials should be considered over print. In addition, collections should consist mainly of domestic books, after which the scope should be expanded to prioritize English and American books, Japanese books, and German books.

Performance Evaluation of CoMirror System with Video Call and Messaging Function between Smart Mirrors (스마트 미러간 화상 통화와 메시징 기능을 가진 CoMirror 시스템의 성능평가)

  • Kitae Hwang; Kyung-Mi Kim; Yu-Jin Kim; Chae-Won Park; Song-Yeon Yoo; In-Hwan Jung; Jae-Moon Lee
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.3 / pp.51-57 / 2023
  • A smart mirror is an IoT device that attaches a display and an embedded computer to a mirror and provides various information to the user along with the mirror function. This paper presents a performance evaluation of the CoMirror system, extending previous research that proposed and implemented the CoMirror system, which connects smart mirrors over a network. First, the login performance using face recognition was evaluated. The evaluation concluded that 40 face images are most suitable for face learning and that a single face image is most suitable for face recognition at login. Second, the average message transmission time was 0.5 seconds for text, 0.63 seconds for audio, and 2.9 seconds for images. Third, for video communication, the average setup time was 1.8 seconds and the average video reception time was 1.9 seconds. Finally, based on these results, we conclude that the CoMirror system is highly practical.

A Deep Learning Based Approach to Recognizing Accompanying Status of Smartphone Users Using Multimodal Data (스마트폰 다종 데이터를 활용한 딥러닝 기반의 사용자 동행 상태 인식)

  • Kim, Kilho; Choi, Sangwoo; Chae, Moon-jung; Park, Heewoong; Lee, Jaehong; Park, Jonghun
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.163-177 / 2019
  • As smartphones become widely used, human activity recognition (HAR) tasks that recognize the personal activities of smartphone users from multimodal data have been actively studied. The research area is expanding from recognizing the simple body movements of an individual user to recognizing low-level and high-level behavior. However, HAR tasks for recognizing interaction behavior with other people, such as whether the user is accompanying or communicating with someone else, have received less attention so far. Previous research on recognizing interaction behavior has usually depended on audio, Bluetooth, and Wi-Fi sensors, which are vulnerable to privacy issues and require much time to collect enough data. In contrast, physical sensors, including accelerometer, magnetic field, and gyroscope sensors, are less vulnerable to privacy issues and can collect a large amount of data within a short time. In this paper, we propose a method for detecting accompanying status with a deep learning model using only multimodal physical sensor data from the accelerometer, magnetic field sensor, and gyroscope. The accompanying status was defined as a redefinition of part of the user interaction behavior, covering whether the user is accompanying an acquaintance at close distance and whether the user is actively communicating with that acquaintance. A framework based on convolutional neural networks (CNN) and long short-term memory (LSTM) recurrent networks was proposed for classifying accompanying and conversation. First, a data preprocessing method was introduced, consisting of time synchronization of multimodal data from different physical sensors, data normalization, and sequence data generation. We applied nearest interpolation to synchronize the timestamps of data collected from different sensors. Normalization was performed on each x, y, and z axis value of the sensor data, and the sequence data was generated with the sliding window method. The sequence data then became the input to the CNN, which extracted feature maps representing local dependencies of the original sequence. The CNN consisted of 3 convolutional layers and had no pooling layer, in order to preserve the temporal information of the sequence data. Next, the LSTM recurrent networks received the feature maps, learned long-term dependencies from them, and extracted features. The LSTM recurrent networks consisted of two layers, each with 128 cells. Finally, the extracted features were classified by a softmax classifier. The loss function of the model was the cross-entropy function, and the weights of the model were randomly initialized from a normal distribution with a mean of 0 and a standard deviation of 0.1. The model was trained using the adaptive moment estimation (ADAM) optimization algorithm with a mini-batch size of 128. We applied dropout to the input values of the LSTM recurrent networks to prevent overfitting. The initial learning rate was set to 0.001 and decayed exponentially by a factor of 0.99 at the end of each training epoch. An Android smartphone application was developed and released to collect data. We collected smartphone data from a total of 18 subjects. Using the data, the model classified accompanying and conversation with 98.74% and 98.83% accuracy, respectively. Both the F1 score and the accuracy of the model were higher than those of the majority vote classifier, support vector machine, and deep recurrent neural network. In future research, we will focus on more rigorous multimodal sensor data synchronization methods that minimize timestamp differences. In addition, we will further study transfer learning methods that enable models tailored to the training data to be transferred to evaluation data that follows a different distribution. It is expected that this will yield a model capable of robust recognition performance against changes in data that were not considered in the model learning stage.
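
The preprocessing steps and learning-rate schedule named in this abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which normalization was used (zero-mean, unit-variance scaling is assumed here), it does not give the window length or stride, and all function names are hypothetical.

```python
import bisect


def nearest_resample(timestamps, values, target_times):
    """Nearest interpolation: for each target time, take the sample whose
    timestamp is closest. `timestamps` must be sorted in ascending order."""
    resampled = []
    for t in target_times:
        i = bisect.bisect_left(timestamps, t)
        candidates = [k for k in (i - 1, i) if 0 <= k < len(timestamps)]
        best = min(candidates, key=lambda k: abs(timestamps[k] - t))
        resampled.append(values[best])
    return resampled


def normalize(axis_values):
    """Scale one sensor axis to zero mean and unit variance (assumed scheme)."""
    mean = sum(axis_values) / len(axis_values)
    variance = sum((v - mean) ** 2 for v in axis_values) / len(axis_values)
    std = variance ** 0.5 or 1.0  # avoid dividing by zero on constant input
    return [(v - mean) / std for v in axis_values]


def sliding_windows(sequence, window, stride):
    """Cut a sequence into fixed-length, possibly overlapping windows."""
    return [sequence[s:s + window]
            for s in range(0, len(sequence) - window + 1, stride)]


def learning_rate(epoch, initial=0.001, decay=0.99):
    """Learning rate after `epoch` completed epochs, multiplied by `decay`
    at the end of each epoch (values taken from the abstract)."""
    return initial * decay ** epoch
```

Each resulting window would form one input sequence for the CNN; the network itself is omitted here because the abstract does not give kernel sizes or filter counts.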