• Title/Summary/Keyword: audio extraction (오디오 추출)

Search Results: 170, Processing Time: 0.026 seconds

Software Downloading for Digital TV Settop Boxes (디지털 TV 수신장치를 위한 소프트웨어 다운로드 기능)

  • Jung Moon-Ryul;Park Youn-Sun;Ryu Il-Kyoun;Kim Jin-Goo;Ahn Byoung-Kyu;Choi Seung-Pil;Kim Jung-Hwan;Choi Jin-Soo;Bang Gun
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2004.11a / pp.271-276 / 2004
  • With the start of digital broadcasting, receivers that handle high-quality A/V (audio/video) programs and data broadcasting services offering diverse multimedia content are growing in importance. Whenever new technologies and services appear in the data broadcasting environment, new receivers equipped with software that can accommodate them are needed. Since it is generally difficult to upgrade the software of digital receivers once they are deployed in homes, the upgrade is delivered over the broadcast channel. This paper describes the implementation of a TV set-top box (STB) that can download over the air and update the middleware and native application software residing in the STB. The software update system consists of a downloader that receives and parses the data carousel stream carrying the software, an update loader that installs the extracted software, and a recoverer that, when an exceptional situation occurs, uses a log file to restore the software to its previous state when the set-top box reboots. The downloader is implemented to conform to the ATSC terrestrial digital broadcasting standard and is being tested in an ATSC STB environment.
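
The log-based recovery scheme sketched above (journal the old software image, then roll back from the log after an unexpected reboot) can be illustrated with a minimal, purely hypothetical sketch; all function and field names here are illustrative, not taken from the paper's implementation:

```python
def apply_update(storage: dict, log: dict, name: str, new_image: bytes) -> None:
    """Install new_image, journaling the previous image first."""
    log[name] = storage.get(name)      # journal the old software image
    log["in_progress"] = name          # mark the critical section
    storage[name] = new_image          # write the new software image
    del log["in_progress"]             # commit: update completed cleanly

def recover_on_boot(storage: dict, log: dict) -> bool:
    """On reboot, roll back any interrupted update using the log."""
    name = log.get("in_progress")
    if name is None:
        return False                   # nothing to recover
    storage[name] = log[name]          # restore the journaled image
    del log["in_progress"]
    return True
```

If the box loses power between the "mark" and "commit" steps, the next boot finds `in_progress` set and restores the journaled image, which is the essence of the recoverer's role.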


Implementation of Character and Object Metadata Generation System for Media Archive Construction (미디어 아카이브 구축을 위한 등장인물, 사물 메타데이터 생성 시스템 구현)

  • Cho, Sungman;Lee, Seungju;Lee, Jaehyeon;Park, Gooman
    • Journal of Broadcast Engineering / v.24 no.6 / pp.1076-1084 / 2019
  • In this paper, we introduce a system that extracts metadata by recognizing characters and objects in media using deep learning technology. In the broadcasting field, multimedia contents such as video, audio, images, and text have long been converted to digital content, but vast unconverted resources still remain. Building media archives requires a great deal of manual work, which is time-consuming and costly. By implementing a deep learning-based metadata generation system, the time and cost of constructing media archives can be reduced. The whole system consists of four elements: a training data generation module, an object recognition module, a character recognition module, and an API server. The object recognition and face recognition modules recognize characters and objects in the media and describe them as metadata. The training data generation module was designed separately to facilitate the construction of data for training the neural networks, and the face recognition and object recognition functions were exposed through an API server. We trained the two neural networks on data for 1,500 persons and 80 kinds of objects, and confirmed an accuracy of 98% on the character test data and 42% on the object data.
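
As a rough illustration of the flow the abstract describes — per-frame recognizer outputs merged into archive metadata records — here is a minimal sketch. The record layout and the stubbed-out recognizers are assumptions for illustration, not the paper's API:

```python
def build_metadata(frames, recognize_faces, recognize_objects):
    """Return one archive metadata record per frame of a media asset.

    recognize_faces / recognize_objects stand in for the deep-network
    modules (behind an API server in the paper); here they are any
    callables that map a frame to a list of labels.
    """
    records = []
    for t, frame in enumerate(frames):
        records.append({
            "timestamp": t,                          # frame index as time
            "characters": recognize_faces(frame),    # e.g. ["Alice"]
            "objects": recognize_objects(frame),     # e.g. ["car", "dog"]
        })
    return records
```

In a real archive pipeline the timestamp would come from the media timeline and the recognizers would be remote API calls, but the merge step has this shape.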

Transport Overhead Analysis in Terrestrial UHD Broadcast A/V Stream (지상파 UHD 방송 AV 스트림 오버헤드 분석)

  • Kim, Nayeon;Bae, Byungjun
    • Journal of Broadcast Engineering / v.22 no.6 / pp.744-754 / 2017
  • This paper compares the transport overhead of MPEG-2 TS, MMT, and ROUTE in order to compare the transport efficiency of DTV and UHDTV. The widely used MPEG-2 TS standard was established for multiplexing and synchronizing encoded audio, video, and additional information. More recently, MMT and ROUTE were established as next-generation multimedia transport standards for the new broadcasting and communication environment. In this paper, we compare and analyze the transport overhead of the three protocols. For the analysis, we captured UHD A/V streams from real-time broadcasting services using ROUTE and MMT, and calculated and analyzed the transport overhead using an overhead analysis program developed in our laboratory. Furthermore, for comparison under the same conditions, we constructed an equivalent MPEG-2 TS stream by extracting the elementary streams (ES) of the UHD A/V stream based on the DTV standard. We show the protocol transport efficiency for a basic A/V stream without additional services. The results show that MMT and ROUTE have similar overhead, while MPEG-2 TS has relatively small overhead. However, since the MPEG-2 TS result does not account for null packets, the relative overhead difference is expected to shrink.
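
As a toy version of the kind of overhead calculation described above: an MPEG-2 TS packet is 188 bytes with a 4-byte header (per the MPEG-2 Systems standard), so counting every header byte as overhead gives a fixed ratio. Like the paper's TS figure, this simplification ignores null packets, adaptation fields, and PES/section overhead:

```python
def ts_overhead(num_packets: int, header_bytes: int = 4,
                packet_bytes: int = 188) -> float:
    """Fraction of transmitted bytes that is protocol overhead rather
    than payload, for a fixed-size-packet stream such as MPEG-2 TS."""
    total = num_packets * packet_bytes
    payload = num_packets * (packet_bytes - header_bytes)
    return (total - payload) / total

ratio = ts_overhead(1000)   # 4 header bytes per 188 → ≈ 2.13 % overhead
```

MMT and ROUTE carry variable-size headers (MMTP packet headers, LCT headers, signaling), so a comparable calculation there has to sum per-packet header sizes rather than use a constant.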

The Study on the Common Definition of Knowledge and its Development Relation -Focused on the General Information Systems, Knowledge Management, DSS and EIS- (지식의 공통적 정의와 발전적 연관 관계에 관한 연구 -일반적 정보시스템과 지식경영, DSS, EIS를 중심으로-)

  • Roh, Jeong-Ran
    • Journal of the Korean Society for Library and Information Science / v.38 no.2 / pp.239-259 / 2004
  • The purpose of this study is to review the established research practices and managerial methods concerning the range of knowledge that have been studied independently in the conventional information system (libraries) and the management information system (MIS, DSS, and EIS), from both quantitative and non-quantitative perspectives. These information systems have developed toward their own purposes since the 1950s, while corporate environments have become integrated due to the rapid creation and expansion of information. To support fast decision-making in this situation, the two systems, libraries and management information systems, should therefore be dealt with within the same category. In other words, not only the quantitative data that are the main sources of DSS or EIS, but also qualitative data such as text documents and video and audio data, which have been managed in libraries and information centers and cannot be extracted from the former, can be used as new knowledge sources. DSS/EIS can also provide a strong infrastructure for Knowledge Management (KM), while libraries/information centers manage the comprehensive range of explicit and tacit knowledge, which can be a facilitator or main driver for KM.

Development of MPEG-4 IPMP Authoring Tool (MPEG-4 IPMP 저작 도구 개발)

  • Kim Kwangyong;Hong Jinwoo
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2003.11a / pp.75-78 / 2003
  • The MPEG-4 standard allows authors to compose various types of objects individually, such as still images, text, 2D/3D graphics, audio, and even arbitrarily shaped video, and to manipulate them easily in space and time. Because of this object-based coding property, it can be considered the most useful approach for producing interactive broadcasting content. However, when content production, delivery, and consumption are considered together, protection and management on behalf of content producers and rights holders becomes necessary. Accordingly, international standards bodies such as OPIMA (Open Platform Initiative for Multimedia Access), SDMI (Secure Digital Music Initiative), and MPEG (Moving Picture Experts Group), with MPEG-4 IPMP (Intellectual Property Management & Protection), have taken an interest in content protection and management. In particular, MPEG has standardized MPEG-4 IPMP and has been the most active in studying systematic and effective handling of digital content and copyright protection. In this paper, we propose an MPEG-4 IPMP authoring tool that allows MPEG-4 content authors to produce protected, object-based broadcasting content easily and conveniently in conformance with the MPEG-4 specification. The proposed authoring tool for MPEG-4 content and copyright protection provides an author-friendly user interface and generates IPMP-enabled XMT (eXtensible MPEG-4 Textual format) files, a text format that is easy to edit and modify. It can also generate IPMP-enabled MP4 files, a binary format, for efficient content delivery and storage.


Design of Pattern Classifier for Electrical and Electronic Waste Plastic Devices Using LIBS Spectrometer (LIBS 분광기를 이용한 폐소형가전 플라스틱 패턴 분류기의 설계)

  • Park, Sang-Beom;Bae, Jong-Soo;Oh, Sung-Kwun;Kim, Hyun-Ki
    • Journal of the Korean Institute of Intelligent Systems / v.26 no.6 / pp.477-484 / 2016
  • Small household appliances such as fans, audio equipment, and electric rice cookers mostly consist of ABS, PP, and PS materials. Colored plastics can be classified by near-infrared (NIR) spectroscopy, but black plastics are very difficult to classify because the black material absorbs the light. An RBFNN pattern classifier is therefore introduced for sorting electrical and electronic waste plastics measured with a LIBS (Laser-Induced Breakdown Spectroscopy) spectrometer. In the preprocessing part, PCA (Principal Component Analysis), a dimension-reduction algorithm, is used to improve processing speed as well as to extract the effective data characteristics. In the condition part, FCM (Fuzzy C-Means) clustering is exploited. In the conclusion part, the coefficients of polynomial-type linear functions are used as connection weights. PSO and 5-fold cross-validation are used to improve the reliability of the performance as well as to enhance the classification rate. The performance of the proposed classifier is reported both with and without optimization.
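
The condition/conclusion structure described above can be sketched in miniature: FCM-style memberships in the condition part weight linear consequent functions in the conclusion part. The centers, fuzzifier m, and coefficients below are illustrative values, not parameters learned (e.g. via PSO) in the paper, and the sketch is 1-D for brevity:

```python
def fcm_memberships(x, centers, m=2.0):
    """FCM membership of a 1-D point x in each cluster center.
    Memberships sum to 1; the small epsilon avoids division by zero."""
    dists = [abs(x - c) + 1e-12 for c in centers]
    return [1.0 / sum((di / dj) ** (2 / (m - 1)) for dj in dists)
            for di in dists]

def rbfnn_output(x, centers, coeffs):
    """Membership-weighted sum of linear consequents a + b*x,
    mirroring the condition-part / conclusion-part split."""
    u = fcm_memberships(x, centers)
    return sum(ui * (a + b * x) for ui, (a, b) in zip(u, coeffs))
```

In the paper the inputs are PCA-reduced LIBS spectra rather than scalars, but the rule structure is the same: clusters gate local linear models.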

Reducing latency of neural automatic piano transcription models (인공신경망 기반 저지연 피아노 채보 모델)

  • Dasol Lee;Dasaem Jeong
    • The Journal of the Acoustical Society of Korea / v.42 no.2 / pp.102-111 / 2023
  • Automatic Music Transcription (AMT) is the task of detecting and recognizing musical note events from a given audio recording. In this paper, we focus on reducing the latency of real-time AMT systems for piano music. Although neural AMT models have been adapted for real-time piano transcription, they suffer from high latency, which hinders their usefulness in interactive scenarios. To tackle this issue, we explore several techniques for reducing the intrinsic latency of a neural network for piano transcription: reducing the window and hop sizes of the Fast Fourier Transform (FFT), modifying the convolutional layers' kernel sizes, and shifting labels along the time axis to train the model to predict onsets earlier. Our experiments demonstrate that combining these approaches can lower latency while maintaining high transcription accuracy. Specifically, our modified models achieved note F1 scores of 92.67 % and 90.51 % at latencies of 96 ms and 64 ms, respectively, compared to the baseline model's note F1 score of 93.43 % at a latency of 160 ms. This methodology can help train AMT models for various interactive scenarios, including providing real-time feedback for piano education.
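
Two of the ideas above, label shifting and the window/hop contribution to intrinsic latency, can be sketched as simple frame-level helpers. The half-window latency formula and all parameter values are simplifying assumptions for illustration, not the paper's exact accounting:

```python
def shift_labels(onset_frames, shift):
    """Move each onset label `shift` frames earlier (clamped at 0),
    so the model is trained to fire before the acoustic evidence peaks."""
    return [max(0, f - shift) for f in onset_frames]

def frame_latency_ms(window, hop, sr=16000, lookahead_frames=0):
    """Rough intrinsic latency in milliseconds: half the analysis
    window (center-aligned frames) plus any look-ahead frames."""
    return 1000.0 * (window / 2 + lookahead_frames * hop) / sr
```

Under this rough model, shrinking the window and hop directly shrinks the latency floor, which is why the paper varies the FFT parameters alongside the network architecture.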

Comprehensive analysis of deep learning-based target classifiers in small and imbalanced active sonar datasets (소량 및 불균형 능동소나 데이터세트에 대한 딥러닝 기반 표적식별기의 종합적인 분석)

  • Geunhwan Kim;Youngsang Hwang;Sungjin Shin;Juho Kim;Soobok Hwang;Youngmin Choo
    • The Journal of the Acoustical Society of Korea / v.42 no.4 / pp.329-344 / 2023
  • In this study, we comprehensively analyze the generalization performance of various deep learning-based active sonar target classifiers applied to small and imbalanced active sonar datasets. To generate the datasets, we use data from two oceanic experiments conducted at different times and in different ocean areas. Each sample is a time-frequency domain image extracted from the audio signal of a contact after the detection process. For the comprehensive analysis, we utilize 22 Convolutional Neural Network (CNN) models. The two datasets are used alternately as the train/validation dataset and the test dataset. To estimate the variance in the classifiers' output, the train/validation/test procedure is repeated 10 times. Training hyperparameters are optimized using Bayesian optimization. The results demonstrate that shallow CNN models show superior robustness and generalization performance compared to most of the deep CNN models. The results of this paper can serve as a valuable reference for future research directions in deep learning-based active sonar target classification.
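
The time-frequency images used as classifier input can be illustrated with a naive spectrogram sketch: magnitude spectra of successive windows, stacked into a 2-D array. The window and hop sizes are illustrative, and a real system would use an FFT library rather than this direct DFT:

```python
import cmath
import math

def spectrogram(signal, window=64, hop=32):
    """Naive time-frequency image: for each hop-spaced window, compute
    the magnitude of the first window//2 DFT bins (direct DFT, O(N^2))."""
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        seg = signal[start:start + window]
        mags = [abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / window)
                        for n in range(window)))
                for k in range(window // 2)]
        frames.append(mags)
    return frames  # shape: (num_frames, window // 2)
```

Each contact's detection-gated audio would yield one such image, which is then resized or cropped to the CNN's input shape.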

Development of Music Recommendation System based on Customer Sentiment Analysis (소비자 감성 분석 기반의 음악 추천 알고리즘 개발)

  • Lee, Seung Jun;Seo, Bong-Goon;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.197-217 / 2018
  • Music is one of the most creative acts, expressing human sentiment through sound. Because music easily invokes listeners' sentiment and empathy, what people listen to can either encourage or discourage their mood. Sentiment is thus a primary factor in searching for or recommending music. Existing music recommendation systems, however, rarely take customer sentiment into account. The algorithms used in previous systems are mostly user-based, relying on, for example, a user's play history and playlists. Based on play histories or playlists across multiple users, distances between pieces of music are calculated from basic information such as genre, singer, and beat, and similar music is filtered out and recommended. However, such methods have limitations such as the filter bubble: a user who listens only to rock music, for example, is unlikely to be recommended hip-hop or R&B music with a similar sentiment. In this study, we focus on the sentiment of the music itself and develop a methodology that defines a new index for music recommendation. Concretely, we propose the "SWEMS" index and, using it, extract a "Sentiment Pattern" for each piece of music used in this research. We expect the "SWEMS" index and "Sentiment Pattern" to be useful not only for music recommendation but also as building blocks for predictive models and other applications. To develop a recommendation system based on the emotional adjectives people commonly feel when listening to music, we first needed to collect as many such adjectives as possible. Emotional adjectives were collected from related previous studies, and more were gathered through social metrics and qualitative interviews, yielding 134 adjectives in total. Through several selection steps, 60 final adjectives were chosen. Based on these, a survey was conducted in which each song's sentiment was evaluated adjective by adjective. The surveys were taken by expert panels who enjoy listening to music; all questions were based on the emotional adjectives, and no other information was collected. The evaluated music was divided into popular and unpopular songs, and the variables most relevant to popularity were derived. These variables were reclassified through factor analysis, and weights were assigned to the adjectives belonging to each factor. We define the extracted factors as the "SWEMS" index, which expresses the sentiment of a song as a numeric score. We applied Case-Based Reasoning (CBR) to implement the algorithm because, compared with other methodologies, it resembles human problem solving. Using each song's "SWEMS" index, the algorithm recommends songs whose emotion values are close in Euclidean distance to those of a given song. The "SWEMS" index also allows us to draw a "Sentiment Pattern" for each song, and we found that songs evoking similar emotions show similar Sentiment Patterns. Through these patterns we could also suggest new groupings of music that differ from the conventional notion of genre. This research helps quantify qualitative data, and the algorithm can be used to quantify content itself, helping users find similar content more quickly.
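
The Euclidean-distance recommendation step described above can be sketched as follows; the catalog layout and the toy SWEMS vectors are assumptions for illustration, not the paper's learned factors:

```python
import math

def recommend(query, catalog, k=2):
    """Return the ids of the k songs whose SWEMS factor vectors are
    nearest (in Euclidean distance) to the query song's vector."""
    return sorted(catalog, key=lambda sid: math.dist(query, catalog[sid]))[:k]
```

With vectors of factor scores in place of genre tags, two songs from different genres but with similar sentiment land close together, which is how this sidesteps the filter-bubble problem the abstract describes.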

A Deep Learning Based Approach to Recognizing Accompanying Status of Smartphone Users Using Multimodal Data (스마트폰 다종 데이터를 활용한 딥러닝 기반의 사용자 동행 상태 인식)

  • Kim, Kilho;Choi, Sangwoo;Chae, Moon-jung;Park, Heewoong;Lee, Jaehong;Park, Jonghun
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.163-177 / 2019
  • As smartphones have become widely used, human activity recognition (HAR) tasks that recognize the personal activities of smartphone users from multimodal data have been actively studied. The research area is expanding from recognizing an individual user's simple body movements to recognizing low-level and high-level behavior. However, HAR tasks for recognizing interaction behavior with other people, such as whether the user is accompanying or communicating with someone else, have received less attention so far, and previous research on interaction behavior has usually depended on audio, Bluetooth, and Wi-Fi sensors, which are vulnerable to privacy issues and require much time to collect enough data. Physical sensors such as the accelerometer, magnetic field sensor, and gyroscope, by contrast, are less privacy-sensitive and can collect a large amount of data in a short time. In this paper, we propose a deep-learning-based method for detecting accompanying status using only multimodal physical sensor data from the accelerometer, magnetic field sensor, and gyroscope. Accompanying status is defined as a subset of user interaction behavior: whether the user is accompanied by an acquaintance at close distance, and whether the user is actively communicating with that acquaintance. We propose a framework based on convolutional neural networks (CNN) and long short-term memory (LSTM) recurrent networks for classifying accompanying and conversation. First, we introduce a data preprocessing method consisting of time synchronization of multimodal data from different physical sensors, data normalization, and sequence data generation. Nearest-neighbor interpolation is applied to synchronize the timestamps of data collected from different sensors. Normalization is performed on each x, y, and z axis of the sensor data, and sequence data are generated with a sliding-window method. The sequence data then become the input to the CNN, which extracts feature maps representing local dependencies of the original sequence. The CNN consists of 3 convolutional layers and has no pooling layer, in order to preserve the temporal information of the sequence data. Next, LSTM recurrent networks receive the feature maps, learn long-term dependencies from them, and extract features. The LSTM network consists of two layers with 128 cells each. Finally, the extracted features are classified by a softmax classifier. The loss function is cross entropy, and the weights are randomly initialized from a normal distribution with mean 0 and standard deviation 0.1. The model is trained with the adaptive moment estimation (ADAM) optimizer with a mini-batch size of 128, and dropout is applied to the inputs of the LSTM network to prevent overfitting. The initial learning rate is 0.001 and decays exponentially by a factor of 0.99 at the end of each training epoch. An Android smartphone application was developed and released to collect data from a total of 18 subjects. Using these data, the model classified accompanying and conversation with 98.74% and 98.83% accuracy, respectively. Both the F1 score and the accuracy of the model were higher than those of a majority-vote classifier, a support vector machine, and a deep recurrent neural network. In future research, we will focus on more rigorous multimodal sensor data synchronization methods that minimize timestamp differences, and will further study transfer learning methods that allow models trained on the training data to transfer to evaluation data following a different distribution. We expect this will yield a model with robust recognition performance against changes in the data that were not considered during training.
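
Two of the preprocessing steps described above, per-axis normalization and sliding-window sequence generation, can be sketched as follows; the window and stride values are illustrative, not the paper's settings:

```python
def normalize_axes(samples):
    """Standardize each axis (column) of sensor rows to zero mean and
    unit variance; a constant axis is left unscaled (std treated as 1)."""
    cols = list(zip(*samples))
    stats = []
    for col in cols:
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        stats.append((mean, var ** 0.5 or 1.0))
    return [[(v - m) / s for v, (m, s) in zip(row, stats)]
            for row in samples]

def sliding_windows(samples, window, stride):
    """Cut a sensor stream into fixed-length, overlapping sequences,
    each of which becomes one input example for the CNN."""
    return [samples[i:i + window]
            for i in range(0, len(samples) - window + 1, stride)]
```

In the full pipeline these windows would be built after nearest-neighbor time synchronization across sensors, with the accelerometer, magnetic field, and gyroscope axes concatenated per row.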