• Title/Summary/Keyword: Semantic similarity search

Search Result 56, Processing Time 0.018 seconds

A Categorization Scheme of Tag-based Folksonomy Images for Efficient Image Retrieval (효과적인 이미지 검색을 위한 태그 기반의 폭소노미 이미지 카테고리화 기법)

  • Ha, Eunji;Kim, Yongsung;Hwang, Eenjun
    • KIISE Transactions on Computing Practices
    • /
    • v.22 no.6
    • /
    • pp.290-295
    • /
    • 2016
  • Recently, folksonomy-based image-sharing sites where users cooperatively make and utilize tags of image annotation have been gaining popularity. Typically, these sites retrieve images for a user request using simple text-based matching and display retrieved images in the form of photo stream. However, these tags are personal and subjective and images are not categorized, which results in poor retrieval accuracy and low user satisfaction. In this paper, we propose a categorization scheme for folksonomy images which can improve the retrieval accuracy in the tag-based image retrieval systems. Consequently, images are classified by the semantic similarity using text-information and image-information generated on the folksonomy. To evaluate the performance of our proposed scheme, we collect folksonomy images and categorize them using text features and image features. And then, we compare its retrieval accuracy with that of existing systems.

Region Based Image Similarity Search using Multi-point Relevance Feedback (다중점 적합성 피드백방법을 이용한 영역기반 이미지 유사성 검색)

  • Kim, Deok-Hwan;Lee, Ju-Hong;Song, Jae-Won
    • The KIPS Transactions:PartD
    • /
    • v.13D no.7 s.110
    • /
    • pp.857-866
    • /
    • 2006
  • Performance of an image retrieval system is usually very low because of the semantic gap between the low level feature and the high level concept in a query image. Semantically relevant images may exhibit very different visual characteristics, and may be scattered in several clusters. In this paper, we propose a content based image rertrieval approach which combines region based image retrieval and a new relevance feedback method using adaptive clustering together. Our main goal is finding semantically related clusters to narrow down the semantic gap. Our method consists of region based clustering processes and cluster-merging process. All segmented regions of relevant images are organized into semantically related hierarchical clusters, and clusters are merged by finding the number of the latent clusters. This method, in the cluster-merging process, applies r: using v principal components instead of classical Hotelling's $T_v^2$ [1] to find the unknown number of clusters and resolve the singularity problem in high dimensions and demonstrate that there is little difference between the performance of $T^2$ and that of $T_v^2$. Experiments have demonstrated that the proposed approach is effective in improving the performance of an image retrieval system.

WordNet-Based Category Utility Approach for Author Name Disambiguation (저자명 모호성 해결을 위한 개념망 기반 카테고리 유틸리티)

  • Kim, Je-Min;Park, Young-Tack
    • The KIPS Transactions:PartB
    • /
    • v.16B no.3
    • /
    • pp.225-232
    • /
    • 2009
  • Author name disambiguation is essential for improving performance of document indexing, retrieval, and web search. Author name disambiguation resolves the conflict when multiple authors share the same name label. This paper introduces a novel approach which exploits ontologies and WordNet-based category utility for author name disambiguation. Our method utilizes author knowledge in the form of populated ontology that uses various types of properties: titles, abstracts and co-authors of papers and authors' affiliation. Author ontology has been constructed in the artificial intelligence and semantic web areas semi-automatically using OWL API and heuristics. Author name disambiguation determines the correct author from various candidate authors in the populated author ontology. Candidate authors are evaluated using proposed WordNet-based category utility to resolve disambiguation. Category utility is a tradeoff between intra-class similarity and inter-class dissimilarity of author instances, where author instances are described in terms of attribute-value pairs. WordNet-based category utility has been proposed to exploit concept information in WordNet for semantic analysis for disambiguation. Experiments using the WordNet-based category utility increase the number of disambiguation by about 10% compared with that of category utility, and increase the overall amount of accuracy by around 98%.

Personalized Recommendation System for IPTV using Ontology and K-medoids (IPTV환경에서 온톨로지와 k-medoids기법을 이용한 개인화 시스템)

  • Yun, Byeong-Dae;Kim, Jong-Woo;Cho, Yong-Seok;Kang, Sang-Gil
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.3
    • /
    • pp.147-161
    • /
    • 2010
  • As broadcasting and communication are converged recently, communication is jointed to TV. TV viewing has brought about many changes. The IPTV (Internet Protocol Television) provides information service, movie contents, broadcast, etc. through internet with live programs + VOD (Video on demand) jointed. Using communication network, it becomes an issue of new business. In addition, new technical issues have been created by imaging technology for the service, networking technology without video cuts, security technologies to protect copyright, etc. Through this IPTV network, users can watch their desired programs when they want. However, IPTV has difficulties in search approach, menu approach, or finding programs. Menu approach spends a lot of time in approaching programs desired. Search approach can't be found when title, genre, name of actors, etc. are not known. In addition, inserting letters through remote control have problems. However, the bigger problem is that many times users are not usually ware of the services they use. Thus, to resolve difficulties when selecting VOD service in IPTV, a personalized service is recommended, which enhance users' satisfaction and use your time, efficiently. This paper provides appropriate programs which are fit to individuals not to save time in order to solve IPTV's shortcomings through filtering and recommendation-related system. The proposed recommendation system collects TV program information, the user's preferred program genres and detailed genre, channel, watching program, and information on viewing time based on individual records of watching IPTV. To look for these kinds of similarities, similarities can be compared by using ontology for TV programs. The reason to use these is because the distance of program can be measured by the similarity comparison. TV program ontology we are using is one extracted from TV-Anytime metadata which represents semantic nature. Also, ontology expresses the contents and features in figures. Through world net, vocabulary similarity is determined. All the words described on the programs are expanded into upper and lower classes for word similarity decision. The average of described key words was measured. The criterion of distance calculated ties similar programs through K-medoids dividing method. K-medoids dividing method is a dividing way to divide classified groups into ones with similar characteristics. This K-medoids method sets K-unit representative objects. Here, distance from representative object sets temporary distance and colonize it. Through algorithm, when the initial n-unit objects are tried to be divided into K-units. The optimal object must be found through repeated trials after selecting representative object temporarily. Through this course, similar programs must be colonized. Selecting programs through group analysis, weight should be given to the recommendation. The way to provide weight with recommendation is as the follows. When each group recommends programs, similar programs near representative objects will be recommended to users. The formula to calculate the distance is same as measure similar distance. It will be a basic figure which determines the rankings of recommended programs. Weight is used to calculate the number of watching lists. As the more programs are, the higher weight will be loaded. This is defined as cluster weight. Through this, sub-TV programs which are representative of the groups must be selected. The final TV programs ranks must be determined. However, the group-representative TV programs include errors. Therefore, weights must be added to TV program viewing preference. They must determine the finalranks.Based on this, our customers prefer proposed to recommend contents. So, based on the proposed method this paper suggested, experiment was carried out in controlled environment. Through experiment, the superiority of the proposed method is shown, compared to existing ways.

Hierarchical Overlapping Clustering to Detect Complex Concepts (중복을 허용한 계층적 클러스터링에 의한 복합 개념 탐지 방법)

  • Hong, Su-Jeong;Choi, Joong-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.1
    • /
    • pp.111-125
    • /
    • 2011
  • Clustering is a process of grouping similar or relevant documents into a cluster and assigning a meaningful concept to the cluster. By this process, clustering facilitates fast and correct search for the relevant documents by narrowing down the range of searching only to the collection of documents belonging to related clusters. For effective clustering, techniques are required for identifying similar documents and grouping them into a cluster, and discovering a concept that is most relevant to the cluster. One of the problems often appearing in this context is the detection of a complex concept that overlaps with several simple concepts at the same hierarchical level. Previous clustering methods were unable to identify and represent a complex concept that belongs to several different clusters at the same level in the concept hierarchy, and also could not validate the semantic hierarchical relationship between a complex concept and each of simple concepts. In order to solve these problems, this paper proposes a new clustering method that identifies and represents complex concepts efficiently. We developed the Hierarchical Overlapping Clustering (HOC) algorithm that modified the traditional Agglomerative Hierarchical Clustering algorithm to allow overlapped clusters at the same level in the concept hierarchy. The HOC algorithm represents the clustering result not by a tree but by a lattice to detect complex concepts. We developed a system that employs the HOC algorithm to carry out the goal of complex concept detection. This system operates in three phases; 1) the preprocessing of documents, 2) the clustering using the HOC algorithm, and 3) the validation of semantic hierarchical relationships among the concepts in the lattice obtained as a result of clustering. The preprocessing phase represents the documents as x-y coordinate values in a 2-dimensional space by considering the weights of terms appearing in the documents. First, it goes through some refinement process by applying stopwords removal and stemming to extract index terms. Then, each index term is assigned a TF-IDF weight value and the x-y coordinate value for each document is determined by combining the TF-IDF values of the terms in it. The clustering phase uses the HOC algorithm in which the similarity between the documents is calculated by applying the Euclidean distance method. Initially, a cluster is generated for each document by grouping those documents that are closest to it. Then, the distance between any two clusters is measured, grouping the closest clusters as a new cluster. This process is repeated until the root cluster is generated. In the validation phase, the feature selection method is applied to validate the appropriateness of the cluster concepts built by the HOC algorithm to see if they have meaningful hierarchical relationships. Feature selection is a method of extracting key features from a document by identifying and assigning weight values to important and representative terms in the document. In order to correctly select key features, a method is needed to determine how each term contributes to the class of the document. Among several methods achieving this goal, this paper adopted the $x^2$�� statistics, which measures the dependency degree of a term t to a class c, and represents the relationship between t and c by a numerical value. To demonstrate the effectiveness of the HOC algorithm, a series of performance evaluation is carried out by using a well-known Reuter-21578 news collection. The result of performance evaluation showed that the HOC algorithm greatly contributes to detecting and producing complex concepts by generating the concept hierarchy in a lattice structure.

Context Sharing Framework Based on Time Dependent Metadata for Social News Service (소셜 뉴스를 위한 시간 종속적인 메타데이터 기반의 컨텍스트 공유 프레임워크)

  • Ga, Myung-Hyun;Oh, Kyeong-Jin;Hong, Myung-Duk;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.4
    • /
    • pp.39-53
    • /
    • 2013
  • The emergence of the internet technology and SNS has increased the information flow and has changed the way people to communicate from one-way to two-way communication. Users not only consume and share the information, they also can create and share it among their friends across the social network service. It also changes the Social Media behavior to become one of the most important communication tools which also includes Social TV. Social TV is a form which people can watch a TV program and at the same share any information or its content with friends through Social media. Social News is getting popular and also known as a Participatory Social Media. It creates influences on user interest through Internet to represent society issues and creates news credibility based on user's reputation. However, the conventional platforms in news services only focus on the news recommendation domain. Recent development in SNS has changed this landscape to allow user to share and disseminate the news. Conventional platform does not provide any special way for news to be share. Currently, Social News Service only allows user to access the entire news. Nonetheless, they cannot access partial of the contents which related to users interest. For example user only have interested to a partial of the news and share the content, it is still hard for them to do so. In worst cases users might understand the news in different context. To solve this, Social News Service must provide a method to provide additional information. For example, Yovisto known as an academic video searching service provided time dependent metadata from the video. User can search and watch partial of video content according to time dependent metadata. They also can share content with a friend in social media. Yovisto applies a method to divide or synchronize a video based whenever the slides presentation is changed to another page. However, we are not able to employs this method on news video since the news video is not incorporating with any power point slides presentation. Segmentation method is required to separate the news video and to creating time dependent metadata. In this work, In this paper, a time dependent metadata-based framework is proposed to segment news contents and to provide time dependent metadata so that user can use context information to communicate with their friends. The transcript of the news is divided by using the proposed story segmentation method. We provide a tag to represent the entire content of the news. And provide the sub tag to indicate the segmented news which includes the starting time of the news. The time dependent metadata helps user to track the news information. It also allows them to leave a comment on each segment of the news. User also may share the news based on time metadata as segmented news or as a whole. Therefore, it helps the user to understand the shared news. To demonstrate the performance, we evaluate the story segmentation accuracy and also the tag generation. For this purpose, we measured accuracy of the story segmentation through semantic similarity and compared to the benchmark algorithm. Experimental results show that the proposed method outperforms benchmark algorithms in terms of the accuracy of story segmentation. It is important to note that sub tag accuracy is the most important as a part of the proposed framework to share the specific news context with others. To extract a more accurate sub tags, we have created stop word list that is not related to the content of the news such as name of the anchor or reporter. And we applied to framework. We have analyzed the accuracy of tags and sub tags which represent the context of news. From the analysis, it seems that proposed framework is helpful to users for sharing their opinions with context information in Social media and Social news.