• Title/Summary/Keyword: Similarity search

Design and Implementation of Intelligent Web Search Agent using Case Based Reasoning (사례기반 추론을 이용한 지능형 웹 검색 에이전트의 설계 및 구현)

  • 하창승;류길수
    • Journal of the Korea Society of Computer and Information / v.8 no.1 / pp.20-29 / 2003
  • As the quantity of information on the World Wide Web grows rapidly, users spend more time finding the correct information. To solve this problem, a search agent should offer a search function personalized to the user's preferences or search objectives. In this research, therefore, the search agent accumulates experiential knowledge from the user's past searches. When a new query is given, the agent uses this knowledge to estimate similarity and decide the category group, which gives it an intelligent learning function. The paper shows that case-based search can achieve a higher correctness rate than other search methods.

A Similarity-based Inference System for Identifying Insects in the Ubiquitous Environments (유비쿼터스 환경에서의 유사도 기반 곤충 종 추론검색시스템)

  • Jun, Eung-Sup;Chang, Yong-Sik;Kwon, Young-Dae;Kim, Yong-Nam
    • Journal of the Korea Society of Computer and Information / v.16 no.3 / pp.175-187 / 2011
  • Because insects play important roles in the existence of plants and other animals in the natural environment, they are considered necessary biological resources from the perspectives of biodiversity conservation and national utilization strategy. To conserve and utilize insect species, non-experts such as citizens and students need an observational learning environment in which they can take an interest in insects in the natural ecosystem. Insect identification is a main factor in observational learning. The current search method based on insect classification is inefficient because non-experts who lack insect knowledge need a long time to identify insect species. To solve this problem, we propose a smartphone-based insect identification inference system that helps non-experts identify insect species from observational characteristics in the natural environment. The system is based on the similarity between the information recorded by an observer and the biological characteristics of insects. For this system, we classified the observational characteristics of insects into 27 elements according to order, family, and species, and proposed similarity indexes for searching for similar insects. In addition, we developed a prototype of the insect identification inference system to show the study's viability and compared it experimentally with a general insect classification search method. The results show that our system is more effective at identifying insect species and can be more efficient in search time.

Construction of Local Document Management System based on Associative Search

  • Kasagi, Yoshimasa;Yamaguchi, Toru;Takama, Yasufumi
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.146-149 / 2003
  • As the amount of information that can be collected from the web into a local database increases, we propose a system that suggests related local documents when a new document arrives. We also propose constructing an association dictionary using web search engines for similarity calculation. A prototype system has been developed and is described in detail.

An Index-Based Search Method for Performance Improvement of Set-Based Similar Sequence Matching (집합 유사 시퀀스 매칭의 성능 향상을 위한 인덱스 기반 검색 방법)

  • Lee, Juwon;Lim, Hyo-Sang
    • KIPS Transactions on Software and Data Engineering / v.6 no.11 / pp.507-520 / 2017
  • The set-based similar sequence matching method measures similarity not for an individual data item but for a set that groups multiple data items. In this method, the similarity of two sets is represented as the size of their intersection. However, the method has two critical performance issues: 1) calculating the intersection size is time-consuming, and 2) the number of set pairs whose intersection size must be calculated is quite large. In this paper, we propose an index-based search method that improves the performance of set-based similar sequence matching by addressing these issues. Our method consists of two parts. First, we convert the set similarity problem into an intersection size comparison problem and provide an index structure that accelerates the intersection size calculation. Second, we propose an efficient set-based similar sequence matching method that exploits the proposed index structure. Through experiments, we show that the proposed method reduces the execution time by a factor of 30 to 50 compared with the existing methods. We also show that the proposed method is scalable, since the performance gap becomes larger as the number of data sequences increases.
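
The idea of accelerating intersection-size calculation with an index can be illustrated with a plain inverted index. This is only a minimal sketch under stated assumptions: the abstract does not describe the authors' index structure, so the inverted index, the function names (`build_inverted_index`, `intersection_sizes`, `similar_sets`), and the `min_overlap` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def build_inverted_index(sets):
    """Map each element to the ids of the sets that contain it."""
    index = defaultdict(list)
    for set_id, items in enumerate(sets):
        for item in set(items):
            index[item].append(set_id)
    return index


def intersection_sizes(index, query):
    """Return |S ∩ query| for every indexed set S that shares an element with query."""
    counts = Counter()
    for item in set(query):
        # Each posting-list hit contributes one shared element.
        for set_id in index.get(item, ()):
            counts[set_id] += 1
    return counts


def similar_sets(index, query, min_overlap):
    """Ids of sets whose intersection with query has at least min_overlap elements."""
    sizes = intersection_sizes(index, query)
    return sorted(set_id for set_id, size in sizes.items() if size >= min_overlap)


# Hypothetical data: four small sets of item ids.
data_sets = [{1, 2, 3, 4}, {3, 4, 5}, {7, 8}, {2, 3, 4, 9}]
index = build_inverted_index(data_sets)
result = similar_sets(index, {2, 3, 4}, min_overlap=3)
```

With this layout, sets that share no element with the query are never touched, which is one way to avoid computing the intersection for every set pair.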

Learning Similarity with Probabilistic Latent Semantic Analysis for Image Retrieval

  • Li, Xiong;Lv, Qi;Huang, Wenting
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.4 / pp.1424-1440 / 2015
  • Searching for the intended images among a large number of candidates is a challenging problem. Content-based image retrieval (CBIR) is the most promising way to tackle it, and its most important topic is measuring the similarity of images in a way that covers variation in shape, color, pose, illumination, and so on. While previous works have made significant progress, their ability to adapt to a dataset has not been fully explored. In this paper, we propose a similarity learning method based on a probabilistic generative model, namely probabilistic latent semantic analysis (PLSA). The method first derives a Fisher kernel, a function over the parameters and variables, based on PLSA. The parameters are then determined by simultaneously maximizing the log-likelihood function of PLSA and the retrieval performance over the training dataset. The main advantages of this work are twofold: (1) it derives a similarity measure based on PLSA that fully exploits the data distribution and Bayes inference; (2) it learns model parameters by simultaneously maximizing the fit of the model to the data and the retrieval performance. The proposed method (PLSA-FK) is empirically evaluated on three datasets, and the results show promising performance.

Cross-architecture Binary Function Similarity Detection based on Composite Feature Model

  • Xiaonan Li;Guimin Zhang;Qingbao Li;Ping Zhang;Zhifeng Chen;Jinjin Liu;Shudan Yue
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.8 / pp.2101-2123 / 2023
  • Recent studies have shown that neural network-based binary code similarity detection performs well in vulnerability mining, plagiarism detection, and malicious code analysis. However, existing cross-architecture methods still suffer from insufficient feature characterization and low discrimination accuracy. To address these issues, this paper proposes a cross-architecture binary function similarity detection method based on a composite feature model (SDCFM). First, the binary function is converted into a vector representation according to the proposed composite feature model, which is composed of instruction statistical features, control flow graph structural features, and application program interface calling behavioral features. Then, the composite features are embedded by the proposed hierarchical embedding network, which is based on a graph neural network. In this network, the block-level features and the function-level features are processed separately and finally fused into the embedding. In addition, to make the trained model more accurate and stable, our method uses the embeddings of predecessor nodes to modify the node embedding during the iterative updating process of the graph neural network. To assess the effectiveness of the composite feature model, we compare SDCFM with the state-of-the-art method on benchmark datasets. The experimental results show that SDCFM performs well both on the area under the curve in the binary function similarity detection task and on the ranking of vulnerable candidate functions in the vulnerability search task.

An Empirical Evaluation of Color Distribution Descriptor for Image Search (이미지 검색을 위한 칼라 분포 기술자의 성능 평가)

  • Lee, Choon-Sang;Lee, Yong-Hwan;Kim, Young-Seop;Rhee, Sang-Burm
    • Journal of the Semiconductor & Display Technology / v.5 no.2 s.15 / pp.27-31 / 2006
  • As more and more digital images are produced by various applications, image retrieval has become a primary concern in multimedia technology. This paper presents a color-based descriptor that uses the color distribution of color images, the most basic element for image search, and evaluates the performance of the proposed visual feature through simulation. In designing the image search descriptor, which uses a color histogram, the HSV color space, the Daubechies 9/7 filter, and 2-level wavelet decomposition provide better results than other parameters in terms of computation time and performance. The histogram quadratic matrix also outperforms the sum of absolute differences as a similarity measure, but it requires more than 60 times the computation time.
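
The contrast between the two similarity measures can be sketched as follows. This is an illustrative sketch, not the paper's implementation: it assumes the "histogram quadratic matrix" refers to the common quadratic-form histogram distance sqrt((h - g)^T A (h - g)), and the bin-similarity matrix `A` with entries 1 - |i - j| / (n - 1) is a hypothetical choice for one-dimensional bins.

```python
import math


def sad(h, g):
    """Sum of absolute differences between two histograms (linear in the bin count)."""
    return sum(abs(a - b) for a, b in zip(h, g))


def bin_similarity(n):
    """Assumed similarity matrix for n >= 2 bins: neighboring bins count as similar."""
    return [[1 - abs(i - j) / (n - 1) for j in range(n)] for i in range(n)]


def quadratic_form_distance(h, g, A):
    """sqrt(d^T A d) with d = h - g (quadratic in the bin count)."""
    d = [a - b for a, b in zip(h, g)]
    n = len(d)
    total = sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(max(total, 0.0))  # clamp tiny negative rounding errors


# Hypothetical 4-bin histograms: g_near differs from h by one bin, g_far by three.
h = [1, 0, 0, 0]
g_near = [0, 1, 0, 0]
g_far = [0, 0, 0, 1]
A = bin_similarity(4)
```

Here `sad` rates `g_near` and `g_far` as equally distant from `h`, while `quadratic_form_distance` rates `g_near` as closer because it accounts for cross-bin similarity. The double loop over bins also shows why the quadratic form costs more to compute.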

Atom Number and Bounding Sphere Based Search Speedup Technique for Similar Proteins Screening (원자개수와 경계구에 기반한 유사 단백질 스크리닝을 위한 검색 가속 기법)

  • Lee, Jaeho;Park, JoonYoung
    • Korean Journal of Computational Design and Engineering
    / v.20 no.4 / pp.321-327 / 2015
  • In protein database search, 3D structural shape comparison plays an important role in protein screening. Protein databases are large and growing rapidly, so exhaustive search methods cannot provide satisfactory performance. Because a protein is composed of a set of spheres, calculating the similarity of two sets of spheres is very expensive. A reasonable filtering method could therefore speed up protein screening. In this paper, we suggest a speedup method for protein screening based on atom number and bounding sphere. We also present experimental results that support the validity of our method.
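
A cheap pre-filter of this kind might look like the sketch below. The abstract does not give the actual filtering criteria, so everything here is an assumption: atoms are reduced to their center coordinates, the bounding sphere is a simple centroid-centered sphere rather than a minimal one, and the tolerances `count_tol` and `radius_tol` are hypothetical parameters.

```python
import math


def bounding_radius(atoms):
    """Radius of a simple (non-minimal) bounding sphere centered at the centroid."""
    n = len(atoms)
    centroid = (
        sum(a[0] for a in atoms) / n,
        sum(a[1] for a in atoms) / n,
        sum(a[2] for a in atoms) / n,
    )
    return max(math.dist(centroid, a) for a in atoms)


def passes_filter(query, candidate, count_tol=0.2, radius_tol=0.2):
    """Keep a candidate only if its atom count and bounding radius are close to the query's."""
    n_query, n_cand = len(query), len(candidate)
    if abs(n_query - n_cand) > count_tol * n_query:
        return False  # rejected without any geometry
    r_query, r_cand = bounding_radius(query), bounding_radius(candidate)
    return abs(r_query - r_cand) <= radius_tol * r_query


# Hypothetical toy "proteins" given as lists of atom centers.
query = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
close = [(1.1, 0, 0), (-1.1, 0, 0), (0, 1.1, 0), (0, -1.1, 0)]
scaled = [(3, 0, 0), (-3, 0, 0), (0, 3, 0), (0, -3, 0)]
more_atoms = query + [(0, 0, 1), (0, 0, -1)]
```

Only candidates that pass both checks would go on to the expensive sphere-set similarity calculation, so the filter trades a small risk of false rejections for far fewer full comparisons.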

Engineering Information Search based on Ontology Mapping (온톨로지 매핑 기반 엔지니어링 정보 검색)

  • Jung Min;Suh Hyo-Won
    • Proceedings of the Korean Society of Precision Engineering Conference / 2006.05a / pp.617-618 / 2006
  • Participants in a collaborative environment want to retrieve the documents they intend to find. A general search system returns only documents that contain the given keywords. To search for different word expressions that have the same meaning, we perform mapping before searching. Our mapping logic consists of three steps. First, character matching maps two terminologies that have identical character strings. Second, definition comparing compares the definitions of two terminologies. Third, similarity checking pairs terminologies that were not mapped by the two prior steps. In this paper, we propose an Engineering Information Search System based on ontology mapping.
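
The three mapping steps can be sketched as a simple cascade. This is a rough illustration under assumptions: the abstract does not say how definitions are compared or how similarity is checked, so the word-overlap measure, the use of `difflib.SequenceMatcher`, the thresholds, and the names `token_overlap`, `string_similarity`, and `map_term` are all hypothetical.

```python
from difflib import SequenceMatcher


def token_overlap(def_a, def_b):
    """Jaccard overlap of the word sets of two definitions (assumed measure)."""
    a, b = set(def_a.lower().split()), set(def_b.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0


def string_similarity(a, b):
    """Character-level similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a, b).ratio()


def map_term(term, definition, ontology, def_threshold=0.6, sim_threshold=0.8):
    """Map a term to a non-empty {term: definition} ontology; return (match, step)."""
    # Step 1: character matching, identical strings only.
    if term in ontology:
        return term, "character matching"
    # Step 2: definition comparing.
    best = max(ontology, key=lambda t: token_overlap(definition, ontology[t]))
    if token_overlap(definition, ontology[best]) >= def_threshold:
        return best, "definition comparing"
    # Step 3: similarity checking for terms the first two steps did not map.
    best = max(ontology, key=lambda t: string_similarity(term, t))
    if string_similarity(term, best) >= sim_threshold:
        return best, "similarity checking"
    return None, None


# Hypothetical engineering ontology.
ontology = {
    "bolt": "threaded fastener with a head",
    "gear": "toothed wheel that transmits torque",
    "bearing": "machine element that supports a rotating shaft",
}
```

For example, `map_term("screw", "threaded fastener with a head", ontology)` is resolved in the second step, because the definitions coincide even though the strings differ.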

Design of IG-based Fuzzy Models Using Improved Space Search Algorithm (개선된 공간 탐색 알고리즘을 이용한 정보입자 기반 퍼지모델 설계)

  • Oh, Sung-Kwun;Kim, Hyun-Ki
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.6 / pp.686-691 / 2011
  • This study is concerned with the identification of fuzzy models. To optimize a fuzzy model, we propose an improved space search evolutionary algorithm (ISSA), which combines the space search algorithm with Gaussian mutation. The proposed ISSA is used here as the optimization vehicle for the design of fuzzy models. For the design of fuzzy models, we developed a hybrid identification method that uses information granulation and the ISSA. Information granules are treated as collections of objects (e.g., data) brought together by the criteria of proximity, similarity, or functionality. The overall hybrid identification takes the form of two optimization mechanisms: structure identification and parameter identification. Structure identification is supported by the ISSA and C-Means, while parameter estimation is realized via the ISSA and the weighted least squares error method. A suite of comparative studies shows that the proposed model performs better than some existing models.