• Title/Summary/Keyword: Text Retrieval

Search Result 342, Processing Time 0.023 seconds

Combining Multiple Sources of Evidence to Enhance Web Search Performance

  • Yang, Kiduk
    • Journal of Korean Library and Information Science Society
    • /
    • v.45 no.3
    • /
    • pp.5-36
    • /
    • 2014
  • The Web is rich with various sources of information that go beyond the contents of documents, such as hyperlinks and manually classified directories of Web documents such as Yahoo. This research extends past fusion IR studies, which have repeatedly shown that combining multiple sources of evidence (i.e. fusion) can improve retrieval performance, by investigating the effects of combining three distinct retrieval approaches for Web IR: the text-based approach that leverages document texts, the link-based approach that leverages hyperlinks, and the classification-based approach that leverages Yahoo categories. Retrieval results of text-, link-, and classification-based methods were combined using variations of the linear combination formula to produce fusion results, which were compared to individual retrieval results using traditional retrieval evaluation metrics. Fusion results were also examined to ascertain the significance of overlap (i.e. the number of systems that retrieve a document) in fusion. The analysis of results suggests that the solution spaces of text-, link-, and classification-based retrieval methods are diverse enough for fusion to be beneficial while revealing important characteristics of the fusion environment, such as effects of system parameters and relationship between overlap, document ranking and relevance.

The Prefix Array for Multimedia Information Retrieval in the Real-Time Stenograph (실시간 속기 자막 환경에서 멀티미디어 정보 검색을 위한 Prefix Array)

  • Kim, Dong-Joo;Kim, Han-Woo
    • Proceedings of the KIEE Conference
    • /
    • 2006.10c
    • /
    • pp.521-523
    • /
    • 2006
  • This paper proposes an algorithm and its data structure to support real-time full-text search for the streamed or broadcasted multimedia data containing real-time stenograph text. Since the traditional indexing method used at information retrieval area uses the linguistic information, there is a heavy cost. Therefore, we propose the algorithm and its data structure based on suffix array, which is a simple data structure and has low space complexity. Suffix array is useful frequently to search for huge text. However, subtitle text of multimedia data is to get longer by time. Therefore, suffix array must be reconstructed because subtitle text is continually changed. We propose the data structure called prefix array and search algorithm using it.

  • PDF

A Study on Natural Language Document and Query Processor for Information Retrieval in Digital Library (디지털 도서관 환경에서의 정보 검색을 위한 자연어 문서 및 질의 처리기에 관한 연구)

  • 윤성희
    • Journal of the Korea Computer Industry Society
    • /
    • v.2 no.12
    • /
    • pp.1601-1608
    • /
    • 2001
  • Digital library is the most important database system that needs information retrieval engine for natural language documents and multimedia data. This paper describes the experimental results of information retrieval engine and browser based on natural language processing. It includes lexical analysis, syntax processing, stemming, and keyword indexing for the natural language text. With the experimental database ‘Earth and Space Science’ that has lots of images and titles and their descriptive text in natural language, text-based search engine was tested. Combined with content-based image search engine, it is expected to be a multimedia information retrieval system in digital library

  • PDF

Text Partitioned Indexing Method for Educational Documents (교육용 문서의 텍스트분할 색인)

  • Kang, Mu-Yeong;Lee, Sang-Gu
    • Journal of The Korean Association of Information Education
    • /
    • v.3 no.2
    • /
    • pp.72-84
    • /
    • 2000
  • Information retrieval system plays a key role in the information society to store digital documents with efficiency and to provide user with the information through the retrieval very fast. Especially, indexing is a prerequisite function for the information retrieval system in order to retrieve the information of the documents effectively which are saved in database. In this paper, we propose an indexing method using text partition. This method can retrieve educational documents in short processing time. We applied the suggested indexing method to real information retrieval system, and proved its excellent functions through the demonstration.

  • PDF

The Historical Development of Information Retrieval Systems (정보검색 발전사)

  • 사공철;서경주
    • Journal of the Korean Society for information Management
    • /
    • v.13 no.2
    • /
    • pp.19-37
    • /
    • 1996
  • The development of information retrieval between 1950s and 1990s is described chronologically. For each decade, the following information retrieval systems are examined : post-coordinate and KWIC indexing methods for the 1950s ; off-line and experimental on-line systems for the 1960s ; on-line and full-text retrieval systems for the 1970s ; full-text databases, on-line interfaces, and overseas and domestic on-line databases for the 1980s ; and finally for the 1990s, CD-ROM, multimedia, hypertext, and Internet. The prospects for the future are also discussed.

  • PDF

The Extraction of Effective Index Database from Voice Database and Information Retrieval (음성 데이터베이스로부터의 효율적인 색인데이터베이스 구축과 정보검색)

  • Park Mi-Sung
    • Journal of Korean Library and Information Science Society
    • /
    • v.35 no.3
    • /
    • pp.271-291
    • /
    • 2004
  • Such information services source like digital library has been asked information services of atypical multimedia database like image, voice, VOD/AOD. Examined in this study are suggestions such as word-phrase generator, syllable recoverer, morphological analyzer, corrector for voice processing. Suggested voice processing technique transform voice database into tort database, then extract index database from text database. On top of this, the study suggest a information retrieval model to use in extracted index database, voice full-text information retrieval.

  • PDF

Improving Transformer with Dynamic Convolution and Shortcut for Video-Text Retrieval

  • Liu, Zhi;Cai, Jincen;Zhang, Mengmeng
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.7
    • /
    • pp.2407-2424
    • /
    • 2022
  • Recently, Transformer has made great progress in video retrieval tasks due to its high representation capability. For the structure of a Transformer, the cascaded self-attention modules are capable of capturing long-distance feature dependencies. However, the local feature details are likely to have deteriorated. In addition, increasing the depth of the structure is likely to produce learning bias in the learned features. In this paper, an improved Transformer structure named TransDCS (Transformer with Dynamic Convolution and Shortcut) is proposed. A Multi-head Conv-Self-Attention module is introduced to model the local dependencies and improve the efficiency of local features extraction. Meanwhile, the augmented shortcuts module based on a dual identity matrix is applied to enhance the conduction of input features, and mitigate the learning bias. The proposed model is tested on MSRVTT, LSMDC and Activity-Net benchmarks, and it surpasses all previous solutions for the video-text retrieval task. For example, on the LSMDC benchmark, a gain of about 2.3% MdR and 6.1% MnR is obtained over recently proposed multimodal-based methods.

An Optimized e-Lecture Video Search and Indexing framework

  • Medida, Lakshmi Haritha;Ramani, Kasarapu
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.8
    • /
    • pp.87-96
    • /
    • 2021
  • The demand for e-learning through video lectures is rapidly increasing due to its diverse advantages over the traditional learning methods. This led to massive volumes of web-based lecture videos. Indexing and retrieval of a lecture video or a lecture video topic has thus proved to be an exceptionally challenging problem. Many techniques listed by literature were either visual or audio based, but not both. Since the effects of both the visual and audio components are equally important for the content-based indexing and retrieval, the current work is focused on both these components. A framework for automatic topic-based indexing and search depending on the innate content of the lecture videos is presented. The text from the slides is extracted using the proposed Merged Bounding Box (MBB) text detector. The audio component text extraction is done using Google Speech Recognition (GSR) technology. This hybrid approach generates the indexing keywords from the merged transcripts of both the video and audio component extractors. The search within the indexed documents is optimized based on the Naïve Bayes (NB) Classification and K-Means Clustering models. This optimized search retrieves results by searching only the relevant document cluster in the predefined categories and not the whole lecture video corpus. The work is carried out on the dataset generated by assigning categories to the lecture video transcripts gathered from e-learning portals. The performance of search is assessed based on the accuracy and time taken. Further the improved accuracy of the proposed indexing technique is compared with the accepted chain indexing technique.

Implementation of Text Summarize Automation Using Document Length Normalization (문서 길이 정규화를 이용한 문서 요약 자동화 시스템 구현)

  • 이재훈;김영천;이성주
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2001.12a
    • /
    • pp.51-55
    • /
    • 2001
  • With the rapid growth of the World Wide Web and electronic information services, information is becoming available on-Line at an incredible rate. One result is the oft-decried information overload. No one has time to read everything, yet we often have to make critical decisions based on what we are able to assimilate. The technology of automatic text summarization is becoming indispensable for dealing with this problem. Text summarization is the process of distilling the most important information from a source to produce an abridged version for a particular user or task. Information retrieval(IR) is the task of searching a set of documents for some query-relevant documents. On the other hand, text summarization is considered to be the task of searching a document, a set of sentences, for some topic-relevant sentences. In this paper, we show that document information, that is more reliable and suitable for query, using document length normalization of which is gained through information retrieval . Experimental results of this system in newspaper articles show that document length normalization method superior to other methods use query itself.

  • PDF

A Study in the Preference of e-Learning Contents Delivery Types on Web Information Search Literacy in the case of Agricultural High School (농업계 고등학교 학생들의 정보검색 능력에 따른 이러닝 콘텐츠 유형 선호도 연구)

  • Yu, Byeong-Min;Kim, Su-Wook;Park, Sung-Youl;Choi, Jun-Sik
    • Journal of Agricultural Extension & Community Development
    • /
    • v.16 no.2
    • /
    • pp.463-486
    • /
    • 2009
  • The purpose of this study was to find out the differences of preferences in e-Learning contents delivery types according to information searching retrieval ability in agricultural high school students. Contents delivery types are limited three kinds which are HTML type, video type, and text type and need to know about differences. The following summarizes the results of this study. On the preference of e-Learning contents delivery type on information searching retrieval ability had differences. High level group of information searching retrieval ability showed that they mostly preferred text contents delivery type. However, low level group of information searching retrieval ability showed that they preferred video contents delivery type. The results support our belief that there could be the differences in preferences in e-Learning delivery types with students' information searching retrieval abilities. We suggest that delivery types of e-Learning should be based on the students not on designers and developers.

  • PDF