• Title/Summary/Keyword: Semantic similarity search


An Image Retrieving Scheme Using Salient Features and Annotation Watermarking

  • Wang, Jenq-Haur;Liu, Chuan-Ming;Syu, Jhih-Siang;Chen, Yen-Lin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.1 / pp.213-231 / 2014
  • Existing image search systems allow users to search images by keywords, or by example images through content-based image retrieval (CBIR). On the other hand, users might learn more relevant textual information about an image from its text captions or surrounding contexts within documents or Web pages. Without such contexts, it is difficult to extract a semantic description directly from the image content. In this paper, we propose an annotation watermarking system for users to embed text descriptions, and retrieve more relevant textual information from similar images. First, tags associated with an image are converted into a two-dimensional code and embedded into the image by discrete wavelet transform (DWT). Next, for images without annotations, similar images can be obtained by CBIR techniques and the embedded annotations can be extracted. Specifically, we use global features such as color ratios and dominant sub-image colors for preliminary filtering. Then, local features such as Scale-Invariant Feature Transform (SIFT) descriptors are extracted for similarity matching. This design can achieve good effectiveness with reasonable processing time in practical systems. Our experimental results showed good accuracy in retrieving similar images and extracting relevant tags from similar images.
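
The two-stage retrieval described above (global color features for preliminary filtering, then SIFT descriptors for similarity matching) can be illustrated with a minimal OpenCV sketch; the thresholds, histogram settings, and ratio test below are illustrative assumptions rather than the authors' parameters.

```python
import cv2

def color_histogram(img, bins=8):
    # Global color feature: a normalized HSV histogram used for preliminary filtering.
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [bins, bins], [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def sift_similarity(img_a, img_b, ratio=0.75):
    # Local feature matching: count SIFT matches that pass Lowe's ratio test.
    sift = cv2.SIFT_create()
    _, desc_a = sift.detectAndCompute(cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY), None)
    _, desc_b = sift.detectAndCompute(cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY), None)
    if desc_a is None or desc_b is None:
        return 0
    matches = cv2.BFMatcher().knnMatch(desc_a, desc_b, k=2)
    return sum(1 for pair in matches
               if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance)

def retrieve(query, candidates, color_threshold=0.5, top_k=5):
    # Stage 1: keep candidates whose color histograms correlate with the query.
    q_hist = color_histogram(query)
    filtered = [img for img in candidates
                if cv2.compareHist(q_hist, color_histogram(img), cv2.HISTCMP_CORREL) > color_threshold]
    # Stage 2: rank the surviving candidates by SIFT match count.
    return sorted(filtered, key=lambda img: sift_similarity(query, img), reverse=True)[:top_k]
```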

Improving Performance of Search Engine Using Category based Evaluation (범주 기반 평가를 이용한 검색시스템의 성능 향상)

  • Kim, Hyung-Il;Yoon, Hyun-Nim
    • The Journal of the Korea Contents Association / v.13 no.1 / pp.19-29 / 2013
  • In the current Internet environment, where the space complexity of information is high, search engines aim to provide the accurate information that users want. However, the content-based method adopted by most search engines is not an effective tool in this environment. Because the content-based method weights each web page using the morphological characteristics of its vocabulary, it is not effective at distinguishing web pages from one another. To resolve this problem and provide useful information to users, this paper proposes an evaluation method based on categories. The category-based evaluation method expands the query with semantic relations and measures its similarity to web pages. When weighting web pages, the method utilizes user responses to retrieved pages and the categories of the query, and thus distinguishes web pages more effectively. The proposed method has the advantage of effectively providing the information users want through search engines, and the utility of the category-based evaluation technique has been confirmed through various experiments.
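
A rough, hypothetical sketch of category-based scoring follows: the query is expanded with terms from its category, and pages whose category agrees with the query receive extra weight. The toy category thesaurus, the boost factor, and the scoring function are invented for illustration and are not the paper's actual formulas.

```python
from collections import Counter
import math

CATEGORY_TERMS = {                      # assumed toy category thesaurus
    "laptop": {"notebook", "ultrabook", "computer"},
}

def expand_query(terms):
    # Extend the query with semantically related terms from its category.
    expanded = set(terms)
    for t in terms:
        expanded |= CATEGORY_TERMS.get(t, set())
    return expanded

def cosine(a: Counter, b: Counter):
    common = set(a) & set(b)
    num = sum(a[t] * b[t] for t in common)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def score(query_terms, page_tokens, page_category=None, query_category=None, boost=1.2):
    q = Counter(expand_query(query_terms))
    p = Counter(page_tokens)
    s = cosine(q, p)
    # Category agreement (e.g., inferred from user responses) boosts the page weight.
    return s * boost if page_category and page_category == query_category else s
```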

The MapDS-Onto Framework for Matching Formula Factors of KPIs and Database Schema: A Case Study of the Prince of Songkla University

  • Kittisak Kaewninprasert;Supaporn Chai-Arayalert;Narueban Yamaqupta
    • Journal of Information Science Theory and Practice / v.12 no.3 / pp.49-62 / 2024
  • Strategy monitoring is essential for business management and for administrators, including managers and executives, to build a data-driven organization. Having a tool that is able to visualize strategic data is significant for business intelligence. Unfortunately, there are gaps between business users and information technology departments or business intelligence experts that need to be filled to meet user requirements. For example, business users want to be self-reliant when using business intelligence systems, but they are too inexperienced to deal with the technical difficulties of these systems. This research aims to create an automatic matching framework between the key performance indicator (KPI) formula and the data in database systems, based on ontology concepts, in a case study of Prince of Songkla University. The mapping data schema with ontology (MapDS-Onto) framework is created through knowledge adaptation from the literature review and is evaluated using sample data from the case study. String similarity methods are compared to find the best fit for this framework. The research results reveal that the "fuzz.token_set_ratio" method is suitable for this study, with a similarity score of 91.50. The two main algorithms, database schema mapping and domain schema mapping, present the process of the MapDS-Onto framework using the "fuzz.token_set_ratio" method and a database structure ontology to match the correct data to each factor in the KPI formula. The MapDS-Onto framework contributes to increasing self-reliance by reducing the amount of database knowledge that business users need in order to use semantic business intelligence.
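
The string matching step named in the abstract can be sketched with the fuzz.token_set_ratio function (available in the thefuzz and rapidfuzz Python packages); the KPI factors, column names, and threshold below are made up for illustration.

```python
from thefuzz import fuzz  # pip install thefuzz; rapidfuzz offers the same token_set_ratio

kpi_factors = ["number of graduate students", "total research budget"]   # invented examples
schema_columns = ["graduate_student_count", "research_budget_total", "staff_headcount"]

def best_match(factor, columns, threshold=80):
    # Compare the factor against every column label and keep the highest-scoring one.
    scored = [(col, fuzz.token_set_ratio(factor, col.replace("_", " "))) for col in columns]
    col, score = max(scored, key=lambda x: x[1])
    return (col, score) if score >= threshold else (None, score)

for factor in kpi_factors:
    print(factor, "->", best_match(factor, schema_columns))
```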

3D Visualization of Compound Knowledge using SOM(Self-Organizing Map) (SOM을 이용한 복합지식의 3D 가시화 방법)

  • Kim, Gui-Jung;Han, Jung-Soo
    • The Journal of the Korea Contents Association / v.11 no.5 / pp.50-56 / 2011
  • This paper proposes a 3D visualization method for compound knowledge that makes it easy to identify and search compound knowledge objects based on their multidimensional relationships. For this, we structured compound knowledge as a semantic network of nodes and links, and suggested a 3D visualization method using a SOM. Also, to arrange compound knowledge in 3D space and to give the user a realistic and intuitive information retrieval experience, we proposed 3D clustering methods for compound knowledge using object similarity. 3D visualization and clustering of compound knowledge using a SOM is well suited to presenting the context and connectivity of compound knowledge in space.
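
As a rough illustration of the clustering step, the sketch below trains a SOM (using the third-party MiniSom library) on assumed feature vectors of knowledge objects and groups objects by their best-matching unit; how the paper derives the final 3D layout from the map is not reproduced here, and the grid size and training settings are assumptions.

```python
import numpy as np
from minisom import MiniSom  # pip install minisom

rng = np.random.default_rng(0)
features = rng.random((100, 16))          # assumed feature vectors of 100 knowledge objects

som = MiniSom(8, 8, input_len=16, sigma=1.5, learning_rate=0.5, random_seed=0)
som.train_random(features, num_iteration=1000)

# Each object is assigned to its best-matching unit; objects that share a unit
# (or land on nearby units) form a cluster that can then be laid out in 3D space.
clusters = {}
for i, vec in enumerate(features):
    clusters.setdefault(som.winner(vec), []).append(i)
```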

Multi-class Support Vector Machines Model Based Clustering for Hierarchical Document Categorization in Big Data Environment (빅 데이터 환경에서 계층적 문서 유형 분류를 위한 클러스터링 기반 다중 SVM 모델)

  • Kim, Young Soo;Lee, Byoung Yup
    • The Journal of the Korea Contents Association / v.17 no.11 / pp.600-608 / 2017
  • Recently, data has been growing exponentially with the rapid expansion of the Internet. Since users need only some of all this information, they carry a heavy workload in examining and discovering the necessary content. Therefore, information retrieval must provide hierarchical class information and a priority for examination through the evaluation of similarity between queries and documents. In this paper we propose a multi-class support vector machine model based on clustering for hierarchical document categorization that makes semantic search possible by considering word co-occurrence measures. The combination of hierarchical document categorization and an SVM classifier gives high performance for the analytical classification of web documents, which increase exponentially as the document hierarchy is extended. We expect more information retrieval systems to use our proposed model in their development and to perform accurate and rapid information retrieval services.
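
A hedged sketch of the general idea with scikit-learn: documents are first clustered into top-level groups, then a multi-class SVM is trained within each group. The TF-IDF features, cluster count, and SVM settings are illustrative, not the paper's configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def train_hierarchical(docs, labels, n_clusters=3):
    # Vectorize the documents and form top-level clusters.
    vec = TfidfVectorizer()
    X = vec.fit_transform(docs)
    clusterer = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    # Train one multi-class SVM per cluster (the second level of the hierarchy).
    models = {}
    for c in range(n_clusters):
        idx = [i for i, cid in enumerate(clusterer.labels_) if cid == c]
        if len({labels[i] for i in idx}) > 1:      # need at least two classes to fit an SVM
            models[c] = LinearSVC().fit(X[idx], [labels[i] for i in idx])
    return vec, clusterer, models

def predict(vec, clusterer, models, doc):
    # Route the document to its cluster, then classify within that cluster.
    x = vec.transform([doc])
    c = int(clusterer.predict(x)[0])
    return models[c].predict(x)[0] if c in models else None
```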

A Semantic-Based Mashup Development Tool Supporting Various Open API Types (다양한 Open API 타입들을 지원하는 시맨틱 기반 매쉬업 개발 툴)

  • Lee, Yong-Ju
    • Journal of Internet Computing and Services / v.13 no.3 / pp.115-126 / 2012
  • Mashups have become very popular over the last few years, and their use also varies across IT convergence services. In spite of their popularity, there are several challenging issues when combining Open APIs into mashups. First, since portal sites may have a large number of APIs available for mashups, manually searching for and finding compatible APIs can be a tedious and time-consuming task. Second, none of the existing portal sites provides a way to leverage the semantic techniques that have been developed to assist users in locating and integrating APIs, like those seen in traditional SOAP-based web services. Third, even when suitable APIs have been discovered, integrating them requires in-depth programming knowledge. To solve these issues, we first show that existing techniques and algorithms used for finding and matching SOAP-based web services can be reused with only minor changes. Next, we show how the characteristics of APIs can be syntactically defined and semantically described, and how the syntactic and semantic descriptions can be used to aid the easy discovery and composition of Open APIs. Finally, we propose a goal-directed interactive approach for the dynamic composition of APIs, where the final mashup is gradually generated by forward chaining of APIs; at each step, a new API is added to the composition.
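
The forward-chaining composition step can be sketched in a simplified, hypothetical form: each API is described by the inputs it requires and the outputs it produces, and APIs are added one by one until the goal output becomes reachable. The API descriptions below are invented for illustration.

```python
APIS = [
    {"name": "geocode",  "inputs": {"address"},     "outputs": {"coordinates"}},
    {"name": "weather",  "inputs": {"coordinates"}, "outputs": {"forecast"}},
    {"name": "map_view", "inputs": {"coordinates"}, "outputs": {"map"}},
]

def compose(available, goal, apis=APIS):
    # Forward chaining: repeatedly add any API whose inputs are already available,
    # stopping as soon as the goal output is produced.
    known, plan = set(available), []
    changed = True
    while changed and goal not in known:
        changed = False
        for api in apis:
            if api not in plan and api["inputs"] <= known:
                plan.append(api)
                known |= api["outputs"]
                changed = True
                if goal in known:
                    break
    return plan if goal in known else None

print([a["name"] for a in compose({"address"}, "forecast")])  # -> ['geocode', 'weather']
```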

An Ontology-Driven Mapping Algorithm between Heterogeneous Product Classification Taxonomies (이질적인 쇼핑몰 환경을 위한 온톨로지 기반 상품 매핑 방법론)

  • Kim Woo-Ju;Choi Nam-Hyuk;Choi Dae-Woo
    • Journal of Intelligence and Information Systems / v.12 no.2 / pp.33-48 / 2006
  • The Semantic Web and its related technologies have been opening the era of information sharing via the Web. There are, however, several hurdles still to overcome in the new era, and one of the major hurdles is the issue of information integration, unless a single, unified, and huge ontology could be built and used that addresses everything in the world. Particularly in the e-business area, the problem of information integration is of great concern for product search and comparison across various Internet shopping sites and e-marketplaces. To overcome this problem, we proposed an ontology-driven mapping algorithm between heterogeneous product classification and description frameworks. We also performed a comparative evaluation of the proposed mapping algorithm against a well-known ontology mapping tool, PROMPT.


Genealogy grouping for services of message post-office box based on fuzzy-filtering (퍼지필터링 기반의 메시지 사서함 서비스를 위한 genealogy 그룹화)

  • Lee Chong-Deuk;Ahn Jeong-Yong
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.6 / pp.701-708 / 2005
  • A structuring mechanism, which is important for serving messages in a post-office box structure, constructs a hierarchy of classes according to the contents of message objects. This paper proposes an $\alpha$-cut based genealogy grouping method to cluster many structured objects in an application domain. The proposed method first decides the relationships by semantic similarity relations and fuzzy relations, and then performs the grouping using the operations search(), insert(), and hierarchy(). This hierarchy structure makes it easy to process group-related tasks such as answering queries, discriminating objects, and finding similarities among objects. The proposed post-office box structure may be efficiently used to serve and manage message objects through the creation of groups. The proposed method is tested on 5500 message objects and compared with other methods such as non-grouping, BGM, RGM, and OGM.
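
An illustrative sketch of the $\alpha$-cut idea: pairs of objects whose fuzzy similarity meets the cut level $\alpha$ are linked, and linked objects are merged into the same group. The similarity matrix and the value of $\alpha$ are made-up examples, and the paper's search(), insert(), and hierarchy() operations are not reproduced.

```python
import numpy as np

similarity = np.array([          # fuzzy similarity relation over 4 message objects (invented)
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
])

def alpha_cut_groups(sim, alpha=0.6):
    n = len(sim)
    group_of = list(range(n))                      # start with each object in its own group

    def find(i):
        while group_of[i] != i:
            i = group_of[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= alpha:                 # alpha-cut: keep only sufficiently similar pairs
                group_of[find(j)] = find(i)        # merge the two groups (union-find)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

print(alpha_cut_groups(similarity))                # [[0, 1], [2, 3]]
```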

Ranked Web Service Retrieval by Keyword Search (키워드 질의를 이용한 순위화된 웹 서비스 검색 기법)

  • Lee, Kyong-Ha;Lee, Kyu-Chul;Kim, Kyong-Ok
    • The Journal of Society for e-Business Studies / v.13 no.2 / pp.213-223 / 2008
  • The efficient discovery of services from a large-scale collection of services has become an important issue [7, 24]. We studied a syntactic method for Web service discovery, rather than a semantic method. We regarded service discovery as a retrieval problem over the proprietary XML formats that serve as service descriptions in a registry DB. We modeled services and queries as probabilistic values and devised similarity-based retrieval techniques. The benefits of our approach are as follows. First, our system supports ranked service retrieval by keyword search. Second, we consider both the UDDI data and the WSDL definitions of services at query evaluation time. Last, our technique can be easily implemented on an off-the-shelf DBMS and also utilizes the DBMS's maintenance features.
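
A rough sketch of keyword-based ranked retrieval over service descriptions: the UDDI metadata and WSDL text of each service are concatenated into one document and ranked by similarity to the keyword query. Plain TF-IDF cosine similarity is used here as a stand-in for the probabilistic model described in the abstract, and the registry entries are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

services = {   # invented example registry entries: UDDI metadata plus WSDL terms
    "WeatherService": "uddi: weather forecast provider; wsdl: getForecast getTemperature city",
    "StockService":   "uddi: stock quote provider; wsdl: getQuote getHistory symbol",
}

vec = TfidfVectorizer()
matrix = vec.fit_transform(services.values())

def search(query, top_k=5):
    # Rank all registered services by cosine similarity to the keyword query.
    scores = cosine_similarity(vec.transform([query]), matrix)[0]
    ranked = sorted(zip(services, scores), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]

print(search("weather forecast for a city"))
```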


Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.105-122 / 2019
  • Dimensionality reduction is one of the methods for handling big data in text mining. For dimensionality reduction, we should consider the density of the data, which has a significant influence on the performance of sentence classification. Higher-dimensional data requires a large amount of computation and can eventually cause high computational cost and overfitting in the model. Thus, a dimension reduction process is necessary to improve the performance of the model. Diverse methods have been proposed, from merely lessening the noise of the data, such as misspellings or informal text, to including semantic and syntactic information. In addition, the expression and selection of text features affect the performance of the classifier for sentence classification, which is one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data in observation space. Existing methods utilize various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, which learn low-dimensional vector space representations of words that capture semantic and syntactic information, are also utilized. To improve performance, recent studies have suggested methods in which the word dictionary is modified according to the positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations. Once a feature selection algorithm marks certain words as unimportant, we assume that words similar to those words also have no impact on sentence classification. This study proposes two ways to achieve more accurate classification: conducting selective word elimination under specific rules and constructing word embeddings based on Word2Vec. To select words of low importance from the text, we use an information gain algorithm to measure importance and cosine similarity to search for similar words. First, we eliminate words that have comparatively low information gain values from the raw text and form word embeddings. Second, we additionally select words that are similar to the words with low information gain values and build word embeddings. Finally, the filtered text and word embeddings are applied to the deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews of Kindle products on Amazon.com, IMDB, and Yelp as datasets, and classifies each dataset using the deep learning models. Reviews that received more than five helpful votes and whose ratio of helpful votes was over 70% were classified as helpful reviews. Since Yelp only shows the number of helpful votes, we extracted 100,000 reviews that received more than five helpful votes from 750,000 reviews using random sampling. Minimal preprocessing, such as removing numbers and special characters, was applied to each dataset. To evaluate the proposed methods, we compared their performance with that of Word2Vec and GloVe word embeddings that used all the words. One of the proposed methods performed better than the embeddings that used all the words; by removing unimportant words, we can get better performance. However, removing too many words lowered the performance.
For future research, diverse preprocessing methods and an in-depth analysis of word co-occurrence for measuring similarity among words should be considered. Also, we only applied the proposed method with Word2Vec; other embedding methods such as GloVe, fastText, and ELMo can be applied with the proposed methods, and the possible combinations between word embedding methods and elimination methods can be identified.
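
A hedged sketch of the two filtering steps described above: words with low information gain are dropped, and words that a Word2Vec model finds similar to them are additionally dropped before the embedding is rebuilt. Mutual information is used here as a proxy for information gain, and the thresholds, corpus handling, and model parameters are illustrative only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif
from gensim.models import Word2Vec

def filter_vocabulary(texts, labels, ig_threshold=0.01, sim_topn=3):
    texts = [t.lower() for t in texts]            # keep tokenization roughly consistent
    vec = CountVectorizer()
    X = vec.fit_transform(texts)
    # Mutual information stands in for information gain when scoring word importance.
    ig = mutual_info_classif(X, labels, discrete_features=True)
    vocab = vec.get_feature_names_out()
    low_ig = {w for w, g in zip(vocab, ig) if g < ig_threshold}

    # Train Word2Vec on the tokenized corpus and expand the removal set with the
    # nearest neighbours of each low-information word.
    sentences = [t.split() for t in texts]
    w2v = Word2Vec(sentences, vector_size=50, min_count=1, epochs=20)
    similar = set()
    for w in low_ig:
        if w in w2v.wv:
            similar |= {s for s, _ in w2v.wv.most_similar(w, topn=sim_topn)}

    removed = low_ig | similar
    # Return the filtered texts, ready for re-embedding and the downstream classifier.
    return [" ".join(tok for tok in t.split() if tok not in removed) for t in texts]
```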