• Title/Summary/Keyword: 군집색인

Search Result 36, Processing Time 0.01 seconds

Content based Image retrieval using Object Shape Token Clustering (객체 외형의 토큰 군집화를 통한 내용 기반 영상 검색)

  • Jeong Seok-hyun;KIM Gae-Young
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2005.07b
    • /
    • pp.880-882
    • /
    • 2005
  • 내용기반 영상 검색 시스템은 데이터베이스에 저장된 정지영상의 색이나, 질감, 형태 등의 특징을 이용한다. 본 연구는 실험 영상 집합에서 주요 객체를 추출하여, 객체들의 외형으로부터 분리된 토큰들을 군집화 한 후, 그 군집단위를 색인어로 사용하여 검색하는 방법이다. 기존의 내용기반 영상 검색 시스템에서 모양 정보는 그 표현과 색인 정합 등의 문제로 처리 방법이 명확하지 않았고, 회전, 크기 변화, 폐색 등에 민감했다. 따라서 기존 방법의 문제점을 해결하기 위해서 토큰을 이용한 색인을 이용하여 지역 정보와, 이들 지역 정보들의 관계에 의한 전역 정보를 복합적으로 이용한 방법을 제안한다.

  • PDF

Term Clustering and Interleaving for Parallel Information Retrieval (색인어 군집화를 이용한 효율적인 병렬정보검색시스템)

  • 강재호;양재완;정성원;류광렬;권혁철;정상화
    • Proceedings of the Korea Inteligent Information System Society Conference
    • /
    • 2002.05a
    • /
    • pp.401-409
    • /
    • 2002
  • 인터넷과 같은 대량의 정보에 대응할 수 있는 고성능 정보검색시스템을 구축하기 위해서는 지금까지 고가의 중대형 컴퓨터를 주로 활용하여 왔으나, 최근 가격대 성능비가 높은 PC 클러스터 시스템을 활용하는 방안이 경제적인 대안으로 떠오르고 있다. PC 클러스터 상에서의 병렬정보검색시스템을 효율적으로 운영하기 위해서는 사용자가 입력한 질의를 처리하는데 요구되는 개별 PC의 디스크 I/O 및 검색관련 연산을 모든 PC에 가능한 균등하게 분배할 필요가 있다. 본 논문에서는 같은 질의에 동시에 등장할 가능성이 높은 색인어들끼리 군집 화하고 생성된 군집을 활용하여 색인어들을 각 PC에 분산저장함으로써 보다 높은 수준의 병렬화를 달성할 수 있는 방안을 제시한다. 대용량 말뭉치를 활용한 실험결과 본 논문에서 제시하는 분산저장기법이 충분한 효율성을 가지고 있음을 확인하였다.

  • PDF

A Comparative Study using Bibliometric Analysis Method on the Reformed Theology and Evangelicalism (개혁신학과 복음주의에 관한 계량서지학적 비교 연구)

  • Yoo, Yeong Jun;Lee, Jae Yun
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.29 no.3
    • /
    • pp.41-63
    • /
    • 2018
  • This study aimed at analyzing journals and index terms, authors of the reformed theology and evangelicalism, neutral theological position by using bibliometrical analyzing methods. The analyzing methods are average linkage and neighbor centralities, profile cosine similarities. Especially, when analyzing the relationship between authors, we interpreted the research topic by finding the key shared index terms between the authors. In the journal analysis results, 9 journals were largely clustered together in the two clusters of the reformed theology and evangelicalism, but Presbyterian Theological Quarterly that is thought to be a reformed journal was clustered in evangelical cluster. In the index terms analysis results of the clusters, the reformed theology and evangelicalism were key words representing the two clusters. In the authors' analysis results, we had 9 clusters and the Presbyterian theologian studying the reformed theology had the four clusters and the non-Presbyterian theologian had the 5 clusters. Therefore, we consistently had the two clusters of the reformed theology and evangelicalism in all the analysis of the journals and the index terms, the authors.

Index Structure for Efficient Similarity Search of Multi-Dimensional Data (다차원 데이터의 효과적인 유사도 검색을 위한 색인구조)

  • 복경수;허정필;유재수
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2004.04b
    • /
    • pp.97-99
    • /
    • 2004
  • 본 논문에서는 다차원 데이터의 유사도 검색을 효과적으로 수행하기 위한 색인 구조를 제안한다. 제안하는 색인 구조는 차원의 저주 현상을 극복하기 위한 벡터 근사 기반의 색인 구조이다. 제안하는 색인 구조는 부모 노드를 기준으로 KDB-트리와 유사한 영역 분할 방식으로 분할하고 분할된 각 영역은 데이터의 분포 특성에 따라 동적 비트를 할당하여 벡터 근사화된 영역을 표현한다. 따라서, 하나의 노드 안에 않은 영역 정보를 저장하여 트리의 깊이를 줄일 수 있다. 또한 다차원의 특징 벡터 공간에 상대적인 비트를 할당하기 때문에 군집화되어 있는 데이터에 대해서 효과적이다 제안하는 색인 구조의 우수성을 보이기 위해 다양한 실험을 통하여 성능의 우수성을 입증한다.

  • PDF

Towards Next Generation Multimedia Information Retrieval by Analyzing User-centered Image Access and Use (이용자 중심의 이미지 접근과 이용 분석을 통한 차세대 멀티미디어 검색 패러다임 요소에 관한 연구)

  • Chung, EunKyung
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.51 no.4
    • /
    • pp.121-138
    • /
    • 2017
  • As information users seek multimedia with a wide variety of information needs, information environments for multimedia have been developed drastically. More specifically, as seeking multimedia with emotional access points has been popular, the needs for indexing in terms of abstract concepts including emotions have grown. This study aims to analyze the index terms extracted from Getty Image Bank. Five basic emotion terms, which are sadness, love, horror, happiness, anger, were used when collected the indexing terms. A total 22,675 index terms were used for this study. The data are three sets; entire emotion, positive emotion, and negative emotion. For these three data sets, co-word occurrence matrices were created and visualized in weighted network with PNNC clusters. The entire emotion network demonstrates three clusters and 20 sub-clusters. On the other hand, positive emotion network and negative emotion network show 10 clusters, respectively. The results point out three elements for next generation of multimedia retrieval: (1) the analysis on index terms for emotions shown in people on image, (2) the relationship between connotative term and denotative term and possibility for inferring connotative terms from denotative terms using the relationship, and (3) the significance of thesaurus on connotative term in order to expand related terms or synonyms for better access points.

Term Clustering and Duplicate Distribution for Efficient Parallel Information Retrieval (효율적인 병렬정보검색을 위한 색인어 군집화 및 분산저장 기법)

  • 강재호;양재완;정성원;류광렬;권혁철;정상화
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.1_2
    • /
    • pp.129-139
    • /
    • 2003
  • The PC cluster architecture is considered as a cost-effective alternative to the existing supercomputers for realizing a high-performance information retrieval (IR) system. To implement an efficient IR system on a PC cluster, it is essential to achieve maximum parallelism by having the data appropriately distributed to the local hard disks of the PCs in such a way that the disk I/O and the subsequent computation are distributed as evenly as possible to all the PCs. If the terms in the inverted index file can be classified to closely related clusters, the parallelism can be maximized by distributing them to the PCs in an interleaved manner. One of the goals of this research is the development of methods for automatically clustering the terms based on the likelihood of the terms' co-occurrence in the same query. Also, in this paper, we propose a method for duplicate distribution of inverted index records among the PCs to achieve fault-tolerance as well as dynamic load balancing. Experiments with a large corpus revealed the efficiency and effectiveness of our method.

Recommender Systems using Structural Hole and Collaborative Filtering (구조적 공백과 협업필터링을 이용한 추천시스템)

  • Kim, Mingun;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.107-120
    • /
    • 2014
  • This study proposes a novel recommender system using the structural hole analysis to reflect qualitative and emotional information in recommendation process. Although collaborative filtering (CF) is known as the most popular recommendation algorithm, it has some limitations including scalability and sparsity problems. The scalability problem arises when the volume of users and items become quite large. It means that CF cannot scale up due to large computation time for finding neighbors from the user-item matrix as the number of users and items increases in real-world e-commerce sites. Sparsity is a common problem of most recommender systems due to the fact that users generally evaluate only a small portion of the whole items. In addition, the cold-start problem is the special case of the sparsity problem when users or items newly added to the system with no ratings at all. When the user's preference evaluation data is sparse, two users or items are unlikely to have common ratings, and finally, CF will predict ratings using a very limited number of similar users. Moreover, it may produces biased recommendations because similarity weights may be estimated using only a small portion of rating data. In this study, we suggest a novel limitation of the conventional CF. The limitation is that CF does not consider qualitative and emotional information about users in the recommendation process because it only utilizes user's preference scores of the user-item matrix. To address this novel limitation, this study proposes cluster-indexing CF model with the structural hole analysis for recommendations. In general, the structural hole means a location which connects two separate actors without any redundant connections in the network. The actor who occupies the structural hole can easily access to non-redundant, various and fresh information. Therefore, the actor who occupies the structural hole may be a important person in the focal network and he or she may be the representative person in the focal subgroup in the network. Thus, his or her characteristics may represent the general characteristics of the users in the focal subgroup. In this sense, we can distinguish friends and strangers of the focal user utilizing the structural hole analysis. This study uses the structural hole analysis to select structural holes in subgroups as an initial seeds for a cluster analysis. First, we gather data about users' preference ratings for items and their social network information. For gathering research data, we develop a data collection system. Then, we perform structural hole analysis and find structural holes of social network. Next, we use these structural holes as cluster centroids for the clustering algorithm. Finally, this study makes recommendations using CF within user's cluster, and compare the recommendation performances of comparative models. For implementing experiments of the proposed model, we composite the experimental results from two experiments. The first experiment is the structural hole analysis. For the first one, this study employs a software package for the analysis of social network data - UCINET version 6. The second one is for performing modified clustering, and CF using the result of the cluster analysis. We develop an experimental system using VBA (Visual Basic for Application) of Microsoft Excel 2007 for the second one. This study designs to analyzing clustering based on a novel similarity measure - Pearson correlation between user preference rating vectors for the modified clustering experiment. In addition, this study uses 'all-but-one' approach for the CF experiment. In order to validate the effectiveness of our proposed model, we apply three comparative types of CF models to the same dataset. The experimental results show that the proposed model outperforms the other comparative models. In especial, the proposed model significantly performs better than two comparative modes with the cluster analysis from the statistical significance test. However, the difference between the proposed model and the naive model does not have statistical significance.

Clustered Hash Index-based Skyline Query (해시 색인 군집화 기반 스카이라인 질의)

  • Choi, Jong-Hyeok;Nasridinov, Aziz
    • Proceedings of The KACE
    • /
    • 2018.01a
    • /
    • pp.45-48
    • /
    • 2018
  • 스카이라인 질의는 지배라는 개념을 활용, 주어진 데이터로부터 데이터를 대표할 수 있는 데이터들을 탐색하기 때문에 사용자의 요청에 부합하는 최적의 결과를 탐색하거나 기업에서 의사결정을 이루기 위해 사용되는 등 넓은 활용을 보이고 있다. 하지만 스카이라인 질의는 데이터의 차원이 증가하는 경우 전체적인 성능의 감소와 함께 스카이라인으로 선택되는 데이터의 수가 급증하여 사용자에게 유용한 결과를 반환하지 못하게 된다. 이러한 문제를 해결하기 위해 최근에는 Top-k 질의 기반의 방식이나 군집화 기반의 기법을 적용한 방식의 스카이라인 질의들이 새롭게 제안되고 있지만 이들은 데이터의 편향이나 사용자로부터 입력된 k에 큰 영향을 받는 등 해당 질의 결과가 데이터들을 충분히 대표하거나 다양성을 만족시키지 못했다. 이러한 문제를 해결하기 위해 본 논문에서는 해시 색인 기법과 군집화 기법인 DBSCAN을 통해 주어진 데이터들을 충분히 대표함과 동시에 다양성을 만족할 수 있는 새로운 방식의 스카이라인인 CHI-SQ의 이론적 배경을 제안하고자 한다.

  • PDF

A Study on Intellectual Structure of Library and Information Science in Korea (문헌정보학의 지식 구조에 관한 연구)

  • Yoo, Yeong-Jun
    • Journal of the Korean Society for information Management
    • /
    • v.20 no.3
    • /
    • pp.277-297
    • /
    • 2003
  • This study was conducted upon the premise that index terms display the intellectual structure of a specific subject field. In this study, and attempt was made to grasp the intellectual structure of Library and Information. Science by clustering the index terms of the journals of the related academic societies at the Library of National Assembly - such as the Journal of the Korean Society for Information Management, the Journal of the Korean Library and Information Science Society, and the Journal of the Korean Society for Library and Information Science. Through the course of the study, index term clusters were generated based on the linkage of the index terms and the frequency of co-occurrence, and moreover, time periods analysis was conducted along with studies on first-appearing terms, in order to clarify the trend and development process of the Library and Information Science. This study also analysed the difference between two intellectual structure by comparing the structure generated by index term clusters with the existing structure of traditional classification systems.