• 제목/요약/키워드: retrieval method

검색결과 1,557건 처리시간 0.026초

Query Expansion Based on Word Graphs Using Pseudo Non-Relevant Documents and Term Proximity (잠정적 부적합 문서와 어휘 근접도를 반영한 어휘 그래프 기반 질의 확장)

  • Jo, Seung-Hyeon;Lee, Kyung-Soon
    • The KIPS Transactions:PartB
    • /
    • 제19B권3호
    • /
    • pp.189-194
    • /
    • 2012
  • In this paper, we propose a query expansion method based on word graphs using pseudo-relevant and pseudo non-relevant documents to achieve performance improvement in information retrieval. The initially retrieved documents are classified into a core cluster when a document includes core query terms extracted by query term combinations and the degree of query term proximity. Otherwise, documents are classified into a non-core cluster. The documents that belong to a core query cluster can be seen as pseudo-relevant documents, and the documents that belong to a non-core cluster can be seen as pseudo non-relevant documents. Each cluster is represented as a graph which has nodes and edges. Each node represents a term and each edge represents proximity between the term and a query term. The term weight is calculated by subtracting the term weight in the non-core cluster graph from the term weight in the core cluster graph. It means that a term with a high weight in a non-core cluster graph should not be considered as an expanded term. Expansion terms are selected according to the term weights. Experimental results on TREC WT10g test collection show that the proposed method achieves 9.4% improvement over the language model in mean average precision.

Content based Video Segmentation Algorithm using Comparison of Pattern Similarity (장면의 유사도 패턴 비교를 이용한 내용기반 동영상 분할 알고리즘)

  • Won, In-Su;Cho, Ju-Hee;Na, Sang-Il;Jin, Ju-Kyong;Jeong, Jae-Hyup;Jeong, Dong-Seok
    • Journal of Korea Multimedia Society
    • /
    • 제14권10호
    • /
    • pp.1252-1261
    • /
    • 2011
  • In this paper, we propose the comparison method of pattern similarity for video segmentation algorithm. The shot boundary type is categorized as 2 types, abrupt change and gradual change. The representative examples of gradual change are dissolve, fade-in, fade-out or wipe transition. The proposed method consider the problem to detect shot boundary as 2-class problem. We concentrated if the shot boundary event happens or not. It is essential to define similarity between frames for shot boundary detection. We proposed 2 similarity measures, within similarity and between similarity. The within similarity is defined by feature comparison between frames belong to same shot. The between similarity is defined by feature comparison between frames belong to different scene. Finally we calculated the statistical patterns comparison between the within similarity and between similarity. Because this measure is robust to flash light or object movement, our proposed algorithm make contribution towards reducing false positive rate. We employed color histogram and mean of sub-block on frame image as frame feature. We performed the experimental evaluation with video dataset including set of TREC-2001 and TREC-2002. The proposed algorithm shows the performance, 91.84% recall and 86.43% precision in experimental circumstance.

Component Analysis for Constructing an Emotion Ontology (감정 온톨로지의 구축을 위한 구성요소 분석)

  • Yoon, Ae-Sun;Kwon, Hyuk-Chul
    • Korean Journal of Cognitive Science
    • /
    • 제21권1호
    • /
    • pp.157-175
    • /
    • 2010
  • Understanding dialogue participant's emotion is important as well as decoding the explicit message in human communication. It is well known that non-verbal elements are more suitable for conveying speaker's emotions than verbal elements. Written texts, however, contain a variety of linguistic units that express emotions. This study aims at analyzing components for constructing an emotion ontology, that provides us with numerous applications in Human Language Technology. A majority of the previous work in text-based emotion processing focused on the classification of emotions, the construction of a dictionary describing emotion, and the retrieval of those lexica in texts through keyword spotting and/or syntactic parsing techniques. The retrieved or computed emotions based on that process did not show good results in terms of accuracy. Thus, more sophisticate components analysis is proposed and the linguistic factors are introduced in this study. (1) 5 linguistic types of emotion expressions are differentiated in terms of target (verbal/non-verbal) and the method (expressive/descriptive/iconic). The correlations among them as well as their correlation with the non-verbal expressive type are also determined. This characteristic is expected to guarantees more adaptability to our ontology in multi-modal environments. (2) As emotion-related components, this study proposes 24 emotion types, the 5-scale intensity (-2~+2), and the 3-scale polarity (positive/negative/neutral) which can describe a variety of emotions in more detail and in standardized way. (3) We introduce verbal expression-related components, such as 'experiencer', 'description target', 'description method' and 'linguistic features', which can classify and tag appropriately verbal expressions of emotions. (4) Adopting the linguistic tag sets proposed by ISO and TEI and providing the mapping table between our classification of emotions and Plutchik's, our ontology can be easily employed for multilingual processing.

  • PDF

Efficient Management of Statistical Information of Keywords on E-Catalogs (전자 카탈로그에 대한 효율적인 색인어 통계 정보 관리 방법)

  • Lee, Dong-Joo;Hwang, In-Beom;Lee, Sang-Goo
    • The Journal of Society for e-Business Studies
    • /
    • 제14권4호
    • /
    • pp.1-17
    • /
    • 2009
  • E-Catalogs which describe products or services are one of the most important data for the electronic commerce. E-Catalogs are created, updated, and removed in order to keep up-to-date information in e-Catalog database. However, when the number of catalogs increases, information integrity is violated by the several reasons like catalog duplication and abnormal classification. Catalog search, duplication checking, and automatic classification are important functions to utilize e-Catalogs and keep the integrity of e-Catalog database. To implement these functions, probabilistic models that use statistics of index words extracted from e-Catalogs had been suggested and the feasibility of the methods had been shown in several papers. However, even though these functions are used together in the e-Catalog management system, there has not been enough consideration about how to share common data used for each function and how to effectively manage statistics of index words. In this paper, we suggest a method to implement these three functions by using simple SQL supported by relational database management system. In addition, we use materialized views to reduce the load for implementing an application that manages statistics of index words. This brings the efficiency of managing statistics of index words by putting database management systems optimize statistics updating. We showed that our method is feasible to implement three functions and effective to manage statistics of index words with empirical evaluation.

  • PDF

A Parameter-Free Approach for Clustering and Outlier Detection in Image Databases (이미지 데이터베이스에서 매개변수를 필요로 하지 않는 클러스터링 및 아웃라이어 검출 방법)

  • Oh, Hyun-Kyo;Yoon, Seok-Ho;Kim, Sang-Wook
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • 제47권1호
    • /
    • pp.80-91
    • /
    • 2010
  • As the volume of image data increases dramatically, its good organization of image data is crucial for efficient image retrieval. Clustering is a typical way of organizing image data. However, traditional clustering methods have a difficulty of requiring a user to provide the number of clusters as a parameter before clustering. In this paper, we discuss an approach for clustering image data that does not require the parameter. Basically, the proposed approach is based on Cross-Association that finds a structure or patterns hidden in data using the relationship between individual objects. In order to apply Cross-Association to clustering of image data, we convert the image data into a graph first. Then, we perform Cross-Association on the graph thus obtained and interpret the results in the clustering perspective. We also propose the method of hierarchical clustering and the method of outlier detection based on Cross-Association. By performing a series of experiments, we verify the effectiveness of the proposed approach. Finally, we discuss the finding of a good value of k used in k-nearest neighbor search and also compare the clustering results with symmetric and asymmetric ways used in building a graph.

Efficient Methods for Detecting Frame Characteristics and Objects in Video Sequences (내용기반 비디오 검색을 위한 움직임 벡터 특징 추출 알고리즘)

  • Lee, Hyun-Chang;Lee, Jae-Hyun;Jang, Ok-Bae
    • Journal of KIISE:Software and Applications
    • /
    • 제35권1호
    • /
    • pp.1-11
    • /
    • 2008
  • This paper detected the characteristics of motion vector to support efficient content -based video search of video. Traditionally, the present frame of a video was divided into blocks of equal size and BMA (block matching algorithm) was used, which predicts the motion of each block in the reference frame on the time axis. However, BMA has several restrictions and vectors obtained by BMA are sometimes different from actual motions. To solve this problem, the foil search method was applied but this method is disadvantageous in that it has to make a large volume of calculation. Thus, as an alternative, the present study extracted the Spatio-Temporal characteristics of Motion Vector Spatio-Temporal Correlations (MVSTC). As a result, we could predict motion vectors more accurately using the motion vectors of neighboring blocks. However, because there are multiple reference block vectors, such additional information should be sent to the receiving end. Thus, we need to consider how to predict the motion characteristics of each block and how to define the appropriate scope of search. Based on the proposed algorithm, we examined motion prediction techniques for motion compensation and presented results of applying the techniques.

A Korean Community-based Question Answering System Using Multiple Machine Learning Methods (다중 기계학습 방법을 이용한 한국어 커뮤니티 기반 질의-응답 시스템)

  • Kwon, Sunjae;Kim, Juae;Kang, Sangwoo;Seo, Jungyun
    • Journal of KIISE
    • /
    • 제43권10호
    • /
    • pp.1085-1093
    • /
    • 2016
  • Community-based Question Answering system is a system which provides answers for each question from the documents uploaded on web communities. In order to enhance the capacity of question analysis, former methods have developed specific rules suitable for a target region or have applied machine learning to partial processes. However, these methods incur an excessive cost for expanding fields or lead to cases in which system is overfitted for a specific field. This paper proposes a multiple machine learning method which automates the overall process by adapting appropriate machine learning in each procedure for efficient processing of community-based Question Answering system. This system can be divided into question analysis part and answer selection part. The question analysis part consists of the question focus extractor, which analyzes the focused phrases in questions and uses conditional random fields, and the question type classifier, which classifies topics of questions and uses support vector machine. In the answer selection part, the we trains weights that are used by the similarity estimation models through an artificial neural network. Also these are a number of cases in which the results of morphological analysis are not reliable for the data uploaded on web communities. Therefore, we suggest a method that minimizes the impact of morphological analysis by using character features in the stage of question analysis. The proposed system outperforms the former system by showing a Mean Average Precision criteria of 0.765 and R-Precision criteria of 0.872.

The Measurements of Biomass Burning Aerosols from GLI Data (GLI 자료를 이용한 생체 소각 에어러솔 측정에 대한 연구)

  • Lee Hyun Jin;Fukushima Hajime;Ha Kyung-Ja;Kim Jae Hwan
    • Korean Journal of Remote Sensing
    • /
    • 제21권4호
    • /
    • pp.273-285
    • /
    • 2005
  • This study has investigated the suitable wavelength for detecting biomass burning aerosols. We have performed the analysis of the wavelength at 380nm in near-UV, 400nm, 412nm, 460nm, and 490nm in visible, and 2100nm in shortwave infrared regions from the Global Imager measurements. It is well known that the UV bands have the advantage of the aerosols retrieval due to the low surface reflectance and a weak effect of Bidirectional Reflectivity Distribution Function. However, the pure surface reflectances of shortwave visible bands, except 412nm, are as low as that of 380nm in near-UV over northeast Asia. In order to detect the aerosol signal, we have retrieved the aerosol reflectance as a function of wavelength based on the surface reflectivity contrast method for the period of May 2003. It is interesting that the retrieved aerosol reflectance with 460nm is slightly more sensitive than that with 380nm. Additionally, we have applied the TOMS aerosol index method to determine the best pair for biomass burning aerosols and found that the pair of 380 and 460nm results in the best signal for retrieving aerosols.

Construction of the Terminology Dictionary for National R&D Information Utilization (국가R&D정보활용을 위한 전문용어사전 구축)

  • Kim, Tae-Hyun;Yang, Myung-Seok;Choi, Kwang-Nam
    • The Journal of the Korea Contents Association
    • /
    • 제19권10호
    • /
    • pp.217-225
    • /
    • 2019
  • National research and development(R&D) information is information generated in the process of performing R&D based on programs and projects issued by national government departments, and includes information from various research fields as ordered by various departments. Therefore, for efficient R&D information retrieval, it is necessary to build a national R&D terminology dictionary that can reflect the characteristics of such national R&D information. In this study, we propose a method for constructing a national R&D terminology dictionary by applying the classification of science and technology standards used to specify the research field in national R&D information. We will discuss the structural characteristics of national R&D project information and the usefulness of the project keyword, and explain the status of national R&D information by the National Standard Science and Technology Classification(NSSTC) Codes and the characteristics of the national R&D terminologies. Based on this, a method for building a national R&D terminology dictionary is defined in terms of the type and structure of the terminology dictionary, preliminary construction procedures, and refining rules. The national R&D terminology dictionary built on the basis of this study can be used in various ways such as expansion of search terms using Korean-English equivalent words and synonyms when searching national R&D information, clarifying the scope of search using NSSTC, and providing user convenience functions using term explanation information.

Calculations of the Single-Scattering Properties of Non-Spherical Ice Crystals: Toward Physically Consistent Cloud Microphysics and Radiation (비구형 빙정의 단일산란 특성 계산: 물리적으로 일관된 구름 미세물리와 복사를 향하여)

  • Um, Junshik;Jang, Seonghyeon;Kim, Jeonggyu;Park, Sungmin;Jung, Heejung;Han, Suji;Lee, Yunseo
    • Atmosphere
    • /
    • 제31권1호
    • /
    • pp.113-141
    • /
    • 2021
  • The impacts of ice clouds on the energy budget of the Earth and their representation in climate models have been identified as important and unsolved problems. Ice clouds consist almost exclusively of non-spherical ice crystals with various shapes and sizes. To determine the influences of ice clouds on solar and infrared radiation as required for remote sensing retrievals and numerical models, knowledge of scattering and microphysical properties of ice crystals is required. A conventional method for representing the radiative properties of ice clouds in satellite retrieval algorithms and numerical models is to combine measured microphysical properties of ice crystals from field campaigns and pre-calculated single-scattering libraries of different shapes and sizes of ice crystals, which depend heavily on microphysical and scattering properties of ice crystals. However, large discrepancies between theoretical calculations and observations of the radiative properties of ice clouds have been reported. Electron microscopy images of ice crystals grown in laboratories and captured by balloons show varying degrees of complex morphologies in sub-micron (e.g., surface roughness) and super-micron (e.g., inhomogeneous internal and external structures) scales that may cause these discrepancies. In this study, the current idealized models representing morphologies of ice crystals and the corresponding numerical methods (e.g., geometric optics, discrete dipole approximation, T-matrix, etc.) to calculate the single-scattering properties of ice crystals are reviewed. Current problems and difficulties in the calculations of the single-scattering properties of atmospheric ice crystals are addressed in terms of cloud microphysics. Future directions to develop physically consistent ice-crystal models are also discussed.