• Title/Summary/Keyword: Audio retrieval

Search Result 102, Processing Time 0.026 seconds

Content-based Image Retrieval Using HSI Color Space and Neural Networks (HSI 컬러 공간과 신경망을 이용한 내용 기반 이미지 검색)

  • Kim, Kwang-Baek;Woo, Young-Woon
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.5 no.2
    • /
    • pp.152-157
    • /
    • 2010
  • The development of computer and internet has introduced various types of media - such as, image, audio, video, and voice - to the traditional text-based information. However, most of the information retrieval systems are based only on text, which results in the absence of ability to use available information. By utilizing the available media, one can improve the performance of search system, which is commonly called content-based retrieval and content-based image retrieval system specifically tries to incorporate the analysis of images into search systems. In this paper, a content-based image retrieval system using HSI color space, ART2 algorithm, and SOM algorithm is introduced. First, images are analyzed in the HSI color space to generate several sets of features describing the images and an SOM algorithm is used to provide candidates of training features to a user. The features that are selected by a user are fed to the training part of a search system, which uses an ART2 algorithm. The proposed system can handle the case in which an image belongs to several groups and showed better performance than other systems.

Story-based Information Retrieval (스토리 기반의 정보 검색 연구)

  • You, Eun-Soon;Park, Seung-Bo
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.4
    • /
    • pp.81-96
    • /
    • 2013
  • Video information retrieval has become a very important issue because of the explosive increase in video data from Web content development. Meanwhile, content-based video analysis using visual features has been the main source for video information retrieval and browsing. Content in video can be represented with content-based analysis techniques, which can extract various features from audio-visual data such as frames, shots, colors, texture, or shape. Moreover, similarity between videos can be measured through content-based analysis. However, a movie that is one of typical types of video data is organized by story as well as audio-visual data. This causes a semantic gap between significant information recognized by people and information resulting from content-based analysis, when content-based video analysis using only audio-visual data of low level is applied to information retrieval of movie. The reason for this semantic gap is that the story line for a movie is high level information, with relationships in the content that changes as the movie progresses. Information retrieval related to the story line of a movie cannot be executed by only content-based analysis techniques. A formal model is needed, which can determine relationships among movie contents, or track meaning changes, in order to accurately retrieve the story information. Recently, story-based video analysis techniques have emerged using a social network concept for story information retrieval. These approaches represent a story by using the relationships between characters in a movie, but these approaches have problems. First, they do not express dynamic changes in relationships between characters according to story development. Second, they miss profound information, such as emotions indicating the identities and psychological states of the characters. Emotion is essential to understanding a character's motivation, conflict, and resolution. Third, they do not take account of events and background that contribute to the story. As a result, this paper reviews the importance and weaknesses of previous video analysis methods ranging from content-based approaches to story analysis based on social network. Also, we suggest necessary elements, such as character, background, and events, based on narrative structures introduced in the literature. We extract characters' emotional words from the script of the movie Pretty Woman by using the hierarchical attribute of WordNet, which is an extensive English thesaurus. WordNet offers relationships between words (e.g., synonyms, hypernyms, hyponyms, antonyms). We present a method to visualize the emotional pattern of a character over time. Second, a character's inner nature must be predetermined in order to model a character arc that can depict the character's growth and development. To this end, we analyze the amount of the character's dialogue in the script and track the character's inner nature using social network concepts, such as in-degree (incoming links) and out-degree (outgoing links). Additionally, we propose a method that can track a character's inner nature by tracing indices such as degree, in-degree, and out-degree of the character network in a movie through its progression. Finally, the spatial background where characters meet and where events take place is an important element in the story. We take advantage of the movie script to extracting significant spatial background and suggest a scene map describing spatial arrangements and distances in the movie. Important places where main characters first meet or where they stay during long periods of time can be extracted through this scene map. In view of the aforementioned three elements (character, event, background), we extract a variety of information related to the story and evaluate the performance of the proposed method. We can track story information extracted over time and detect a change in the character's emotion or inner nature, spatial movement, and conflicts and resolutions in the story.

Audio Fingerprint Extraction Method Using Multi-Level Quantization Scheme (다중 레벨 양자화 기법을 적용한 오디오 핑거프린트 추출 방법)

  • Song Won-Sik;Park Man-Soo;Kim Hoi-Rin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.4
    • /
    • pp.151-158
    • /
    • 2006
  • In this paper, we proposed a new audio fingerprint extraction method, based on Philips' music retrieval algorithm, which uses the energy difference of neighboring filter-bank and probabilistic characteristics of music. Since Philips method uses too many filter-banks in limited frequency band, it may cause audio fingerprints to be highly sensitive to additive noises and to have too high correlation between neighboring bands. The proposed method improves robustness to noises by reducing the number of filter-banks while it maintains the discriminative power by representing the energy difference of bands with 2 bits where the quantization levels are determined by probabilistic characteristics. The correlation which exists among 4 different levels in 2 bits is not only utilized in similarity measurement. but also in efficient reduction of searching area. Experiments show that the proposed method is not only more robust to various environmental noises (street, department, car, office, and restaurant), but also takes less time for database search than Philips in the case where music is highly degraded.

A New Tempo Feature Extraction Based on Modulation Spectrum Analysis for Music Information Retrieval Tasks

  • Kim, Hyoung-Gook
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.6 no.2
    • /
    • pp.95-106
    • /
    • 2007
  • This paper proposes an effective tempo feature extraction method for music information retrieval. The tempo information is modeled by the narrow-band temporal modulation components, which are decomposed into a modulation spectrum via joint frequency analysis. In implementation, the tempo feature is directly extracted from the modified discrete cosine transform coefficients, which is the output of partial MP3(MPEG 1 Layer 3) decoder. Then, different features are extracted from the amplitudes of modulation spectrum and applied to different music information retrieval tasks. The logarithmic scale modulation frequency coefficients are employed in automatic music emotion classification and music genre classification. The classification precision in both systems is improved significantly. The bit vectors derived from adaptive modulation spectrum is used in audio fingerprinting task That is proved to be able to achieve high robustness in this application. The experimental results in these tasks validate the effectiveness of the proposed tempo feature.

  • PDF

Multifunctional communication terminal on ATM networ (서비스 통합형 ATM 멀티미디어 통신단말)

  • 황대환;이종형;박영덕;조규섭
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.23 no.4
    • /
    • pp.873-892
    • /
    • 1998
  • In this paper, we propose an architecture of multimedia communication terminal that can e used in broadband ISDN environments. To design and implement the multimedia communication terminal, we analyzed the structure of multimedia terminals and the types of services which is recommended by public and private standard or ganization, such as ITU-T, Digital Audio-Visual Council(davic) and ATM Forum. The multifunctional communication terminal designed in this paper could allow inter-working between existing communication terminals on the heterogeneousnetwork and accept current and advanced multimedia communication application flexible. An implemented terminal is consisted of the multimedia processing board and the ATm interface board that is installed in PCI bus on personal computer. The integrated service multimedia communication terminal that is implemented supports retrieval, distributive and conversational communication service simultaneously. And then we performaned functional module test according to the individual communication services.

  • PDF

A Study on the Music Retrieval System using MPEG-7 Audio Low-Level Descriptors (MPEG-7 오디오 하위 서술자를 이용한 음악 검색 방법에 관한 연구)

  • Park Mansoo;Park Chuleui;Kim Hoi-Rin;Kang Kyeongok
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2003.11a
    • /
    • pp.215-218
    • /
    • 2003
  • 본 논문에서는 MPEG-7에 정의된 오디오 서술자를 이용한 오디오 특징을 기반으로 한 음악 검색 알고리즘을 제안한다. 특히 timbral 특징들은 음색 구분을 용이하게 할 수 있어 음악 검색뿐만 아니라 음악 장르 분류 또는 Query by humming에 이용 될 수 있다. 이러한 연구를 통하여 오디오 신호의 대표적인 특성을 표현 할 수 있는 특징벡터를 구성 할 수 있다면 추후에 멀티모달 시스템을 이용한 검색 알고리즘에도 오디오 특징으로 이용 될 수 있을 것이다 본 논문에서는 방송 시스템에 적용 할 수 있도록 검색 범위를 특정 컨텐츠의 O.S.T 앨범으로 제한하였다. 즉, 사용자가 임의로 선택한 부분적인 오디오 클립만을 이용하여 그 컨텐츠 전체의 O.S.T 앨범 내에서 음악을 검색할 수 있도록 하였다. 오디오 특징벡터를 구성하기 위한 MPEG-7 오디오 서술자의 조합 방법을 제안하고 distance 또는 ratio 계산 방식을 통해 성능 향상을 추구하였다. 또한 reference 음악의 템플릿 구성 방식의 변화를 통해 성능 향상을 추구하였다. Classifier로 k-NN 방식을 사용하여 성능 평가를 수행한 결과 timbral spectral feature들의 비율을 이용한 IFCR(Intra-Feature Component Ratio) 방식이 Euclidean distance 방식보다 우수한 성능을 보였다.

  • PDF

A Study on Modeling of Bibliographic Framework Based on FRBR for Television Program Materials (방송영상자료의 FRBR기반 서지구조모형에 관한 연구)

  • Chung, Jin-Gyoo
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.41 no.1
    • /
    • pp.185-214
    • /
    • 2007
  • This study intends to design the bibliographic framework based on IFLA-FRBR model for television program materials and to evaluate this in terms of effectiveness of retrieval and usability of the system. The FRBR model supplies mote suitable bibliographic framework of audio-visual material which has a sufficient hierarchical relations and relative bibliographical records. The followings are research methods designed by this study; (1) The experimental metadata system named it FbCS based on FRBR was developed by using the entity-related database and composed of multi-layed and hierarchy. FbCS is developed through benchmarking of a case study for iMMix model in Netherlands based on FRBR. (2) To evaluate effectiveness of retrieval and usability of FbCS, this study made a experiment and survey by user groups of professionals.

A Study on Elemental Technology Identification of Sound Data for Audio Forensics (오디오 포렌식을 위한 소리 데이터의 요소 기술 식별 연구)

  • Hyejin Ryu;Ah-hyun Park;Sungkyun Jung;Doowon Jeong
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.34 no.1
    • /
    • pp.115-127
    • /
    • 2024
  • The recent increase in digital audio media has greatly expanded the size and diversity of sound data, which has increased the importance of sound data analysis in the digital forensics process. However, the lack of standardized procedures and guidelines for sound data analysis has caused problems with the consistency and reliability of analysis results. The digital environment includes a wide variety of audio formats and recording conditions, but current audio forensic methodologies do not adequately reflect this diversity. Therefore, this study identifies Life-Cycle-based sound data elemental technologies and provides overall guidelines for sound data analysis so that effective analysis can be performed in all situations. Furthermore, the identified elemental technologies were analyzed for use in the development of digital forensic techniques for sound data. To demonstrate the effectiveness of the life-cycle-based sound data elemental technology identification system presented in this study, a case study on the process of developing an emergency retrieval technology based on sound data is presented. Through this case study, we confirmed that the elemental technologies identified based on the Life-Cycle in the process of developing digital forensic technology for sound data ensure the quality and consistency of data analysis and enable efficient sound data analysis.

A study on searching image by cluster indexing and sequential I/O (연속적 I/O와 클러스터 인덱싱 구조를 이용한 이미지 데이타 검색 연구)

  • Kim, Jin-Ok;Hwang, Dae-Joon
    • The KIPS Transactions:PartD
    • /
    • v.9D no.5
    • /
    • pp.779-788
    • /
    • 2002
  • There are many technically difficult issues in searching multimedia data such as image, video and audio because they are massive and more complex than simple text-based data. As a method of searching multimedia data, a similarity retrieval has been studied to retrieve automatically basic features of multimedia data and to make a search among data with retrieved features because exact match is not adaptable to a matrix of features of multimedia. In this paper, data clustering and its indexing are proposed as a speedy similarity-retrieval method of multimedia data. This approach clusters similar images on adjacent disk cylinders and then builds Indexes to access the clusters. To minimize the search cost, the hashing is adapted to index cluster. In addition, to reduce I/O time, the proposed searching takes just one I/O to look up the location of the cluster containing similar object and one sequential file I/O to read in this cluster. The proposed schema solves the problem of multi-dimension by using clustering and its indexing and has higher search efficiency than the content-based image retrieval that uses only clustering or indexing structure.

Content Based Video Retrieval by Example Considering Context (문맥을 고려한 예제 기반 동영상 검색 알고리즘)

  • 박주현;낭종호;김경수;하명환;정병희
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.12
    • /
    • pp.756-771
    • /
    • 2003
  • Digital Video Library System which manages a large amount of multimedia information requires efficient and effective retrieval methods. In this paper, we propose and implement a new video search and retrieval algorithm that compares the query video shot with the video shots in the archives in terms of foreground object, background image, audio, and its context. The foreground object is the region of the video image that has been changed in the successive frames of the shot, the background image is the remaining region of the video image, and the context is the relationship between the low-level features of the adjacent shots. Comparing these features is a result of reflecting the process of filming a moving picture, and it helps the user to submit a query focused on the desired features of the target video clips easily by adjusting their weights in the comparing process. Although the proposed search and retrieval algorithm could not totally reflect the high level semantics of the submitted query video, it tries to reflect the users' requirements as much as possible by considering the context of video clips and by adjusting its weight in the comparing process.