• Title/Summary/Keyword: sentence-query similarity

Search Result 5, Processing Time 0.021 seconds

SSF: Sentence Similar Function Based on word2vector Similar Elements

  • Yuan, Xinpan;Wang, Songlin;Wan, Lanjun;Zhang, Chengyuan
    • Journal of Information Processing Systems
    • /
    • v.15 no.6
    • /
    • pp.1503-1516
    • /
    • 2019
  • In this paper, to improve the accuracy of long sentence similarity calculation, we proposed a sentence similarity calculation method based on a system similarity function. The algorithm uses word2vector as the system elements to calculate the sentence similarity. The higher accuracy of our algorithm is derived from two characteristics: one is the negative effect of penalty item, and the other is that sentence similar function (SSF) based on word2vector similar elements doesn't satisfy the exchange rule. In later studies, we found the time complexity of our algorithm depends on the process of calculating similar elements, so we build an index of potentially similar elements when training the word vector process. Finally, the experimental results show that our algorithm has higher accuracy than the word mover's distance (WMD), and has the least query time of three calculation methods of SSF.

Performance Improvement of Web Information Retrieval Using Sentence-Query Similarity (문장-질의 유사성을 이용한 웹 정보 검색의 성능 향상)

  • Park Eui-Kyu;Ra Dong-Yul;Jang Myung-Gil
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.5
    • /
    • pp.406-415
    • /
    • 2005
  • Prosperity of Internet led to the web containing huge number of documents. Thus increasing importance is given to the web information retrieval technology that can provide users with documents that contain the right information they want. This paper proposes several techniques that are effective for the improvement of web information retrieval. Similarity between a document and the query is a major source of information exploited by conventional systems. However, we suggest a technique to make use of similarity between a sentence and the query. We introduce a technique to compute the approximate score of the sentence-query similarity even without a mature technology of natural language processing. It was shown that the amount of computation for this task is linear to the number of documents in the total collection, which implies that practical systems can make use of this technique. The next important technique proposed in this paper is to use stratification of documents in re-ranking the documents to output. It was shown that it can lead to significant improvement in performance. We furthermore showed that using hyper links, anchor texts, and titles can result in enhancement of performance. To justify the proposed techniques we developed a large scale web information retrieval system and used it for experiments.

The Method of Searching Unified Medical Language System Using Automatic Modified a Query (자동 질의수정을 통한 통합의학언어 시스템 검색)

  • 김종광;하원식;이정현
    • Proceedings of the IEEK Conference
    • /
    • 2003.11b
    • /
    • pp.129-132
    • /
    • 2003
  • The metathesaurus(UMLS, 2003AA edition) supports multi language and includes 875, 233 concepts, 2, 146, 897 concept names. It is impossible for PubMed or NLM serve searching of the metatheaurus to retrieval using a query that is not to be text, a fault sentence structure or a part of concept name. That means the user notice correctly suitable medical words in order to get correct answer, otherwise she or he can't find information that they want to find I propose that the method of searching unified medical language system using automatic modified a query for problem that I mentioned. This method use dictionary that is standard for automation of modified query gauge similarity between query and dictionary using string comparison algorithm. And then, the tested term converse the form of metathesaurus for optimized result. For the evaluation of method, I select some query and I contrast NLM method that renewed Aug. 2003.

  • PDF

Implementation of a Spam Message Filtering System using Sentence Similarity Measurements (문장유사도 측정 기법을 통한 스팸 필터링 시스템 구현)

  • Ou, SooBin;Lee, Jongwoo
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.1
    • /
    • pp.57-64
    • /
    • 2017
  • Short message service (SMS) is one of the most important communication methods for people who use mobile phones. However, illegal advertising spam messages exploit people because they can be used without the need for friend registration. Recently, spam message filtering systems that use machine learning have been developed, but they have some disadvantages such as requiring many calculations. In this paper, we implemented a spam message filtering system using the set-based POI search algorithm and sentence similarity without servers. This algorithm can judge whether the input query is a spam message or not using only letter composition without any server computing. Therefore, we can filter the spam message although the input text message has been intentionally modified. We added a specific preprocessing option which aims to enable spam filtering. Based on the experimental results, we observe that our spam message filtering system shows better performance than the original set-based POI search algorithm. We evaluate the proposed system through extensive simulation. According to the simulation results, the proposed system can filter the text message and show high accuracy performance against the text message which cannot be filtered by the 3 major telecom companies.

A New Similarity Measure for Improving Ranking in QA Systems (질의응답시스템 응답순위 개선을 위한 새로운 유사도 계산방법)

  • Kim Myung-Gwan;Park Young-Tack
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.10 no.6
    • /
    • pp.529-536
    • /
    • 2004
  • The main idea of this paper is to combine position information in sentence and query type classification to make the documents ranking to query more accessible. First, the use of conceptual graphs for the representation of document contents In information retrieval is discussed. The method is based on well-known strategies of text comparison, such as Dice Coefficient, with position-based weighted term. Second, we introduce a method for learning query type classification that improves the ability to retrieve answers to questions from Question Answering system. Proposed methods employ naive bayes classification in machine learning fields. And, we used a collection of approximately 30,000 question-answer pairs for training, obtained from Frequently Asked Question(FAQ) files on various subjects. The evaluation on a set of queries from international TREC-9 question answering track shows that the method with machine learning outperforms the underline other systems in TREC-9 (0.29 for mean reciprocal rank and 55.1% for precision).