• Title/Summary/Keyword: text indexing

Search Result 111, Processing Time 0.022 seconds

Theory and Practice of Automatic Indexing (자동색인의 이론과 실제)

    • Journal of Korean Library and Information Science Society
    • /
    • v.30 no.3
    • /
    • pp.27-51
    • /
    • 1999
  • This paper deals with the methods as well as the problems associated with automatic extraction indexing and assignment indexing, expert systems for indexing, and major approaches currently used to index the Internet resources. It also briefly reviews basic methods for establishing hypertext/hypermedia links automatically. The methods used in much of text processing today are not particularly new. Most of the them were used, perhaps in a more rudimentary form, 30 or more years ago by Luhn and many other investigators. Better results can be achieved today because much greater bodies of electronic text are now avaliable and the power of present-day computers allows the processing of such text with reasonable efficiency.

  • PDF

An Optimized e-Lecture Video Search and Indexing framework

  • Medida, Lakshmi Haritha;Ramani, Kasarapu
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.8
    • /
    • pp.87-96
    • /
    • 2021
  • The demand for e-learning through video lectures is rapidly increasing due to its diverse advantages over the traditional learning methods. This led to massive volumes of web-based lecture videos. Indexing and retrieval of a lecture video or a lecture video topic has thus proved to be an exceptionally challenging problem. Many techniques listed by literature were either visual or audio based, but not both. Since the effects of both the visual and audio components are equally important for the content-based indexing and retrieval, the current work is focused on both these components. A framework for automatic topic-based indexing and search depending on the innate content of the lecture videos is presented. The text from the slides is extracted using the proposed Merged Bounding Box (MBB) text detector. The audio component text extraction is done using Google Speech Recognition (GSR) technology. This hybrid approach generates the indexing keywords from the merged transcripts of both the video and audio component extractors. The search within the indexed documents is optimized based on the Naïve Bayes (NB) Classification and K-Means Clustering models. This optimized search retrieves results by searching only the relevant document cluster in the predefined categories and not the whole lecture video corpus. The work is carried out on the dataset generated by assigning categories to the lecture video transcripts gathered from e-learning portals. The performance of search is assessed based on the accuracy and time taken. Further the improved accuracy of the proposed indexing technique is compared with the accepted chain indexing technique.

Automatic Name Line Detection for Person Indexing Based on Overlay Text

  • Lee, Sanghee;Ahn, Jungil;Jo, Kanghyun
    • Journal of Multimedia Information System
    • /
    • v.2 no.1
    • /
    • pp.163-170
    • /
    • 2015
  • Many overlay texts are artificially superimposed on the broadcasting videos by humans. These texts provide additional information to the audiovisual content. Especially, the overlay text in news videos contains concise and direct description of the content. Therefore, it is most reliable clue for constructing a news video indexing system. To make the automatic person indexing of interview video in the TV news program, this paper proposes the method to only detect the name text line among the whole overlay texts in one frame. The experimental results on Korean television news videos show that the proposed framework efficiently detects the overlaid name text line.

Issues and Empirical Results for Improving Text Classification

  • Ko, Young-Joong;Seo, Jung-Yun
    • Journal of Computing Science and Engineering
    • /
    • v.5 no.2
    • /
    • pp.150-160
    • /
    • 2011
  • Automatic text classification has a long history and many studies have been conducted in this field. In particular, many machine learning algorithms and information retrieval techniques have been applied to text classification tasks. Even though much technical progress has been made in text classification, there is still room for improvement in text classification. In this paper, we will discuss remaining issues in improving text classification. In this paper, three improvement issues are presented including automatic training data generation, noisy data treatment and term weighting and indexing, and four actual studies and their empirical results for those issues are introduced. First, the semi-supervised learning technique is applied to text classification to efficiently create training data. For effective noisy data treatment, a noisy data reduction method and a robust text classifier from noisy data are developed as a solution. Finally, the term weighting and indexing technique is revised by reflecting the importance of sentences into term weight calculation using summarization techniques.

A Study on the Indexing System Using a Controlled Vocabulary and Natural Language in the Secondary Legal Information Full-Text Databases : an Evaluation and Comparison of Retrieval Effectiveness (2차 법률정보 전문데이터베이스에 있어서 통제어 색인시스템과 자연어 색인시스템의 검색효율 평가에 관한 연구)

  • Roh Jeong-Ran
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.32 no.4
    • /
    • pp.69-86
    • /
    • 1998
  • The purpose of velop the indexing algorithm of secondary legal information by the study of characteristics of legal information, to compare the indexing system using controlled vocabulary to the indexing system using natural language in the secondary legal information full-text databases, and to prove propriety and superiority of the indexing system using controlled vocabulary. The results are as follows; 1)The indexing system using controlled vocabulary in the secondary legal information full-text databases has more effectiveness than the indexing system using natural language, in the recall rate, the precision rate, the distribution of propriety, and the faculty of searching for the unique proper-records which the indexing system using natural language fans to find 2)The indexing system which adds more words to the controlled vocabulary in the secondary legal information full-text databases does not better effectiveness in the retail rate, the precision rate, comparing to the indexing system using controlled vocabulary. 3)The indexing system using word-added controlled vocabulary with an extra weight in the secondary legal information full-text databases does not better effectiveness in the recall rate, the precision rate, comparing to the indexing system using word-added controlled vocabulary without an extra weight. This study indicates that it is necessary to have characteristic information the information experts recognize - that is to say, experimental and inherent knowledge only human being can have built-in into the system rather than to approach the information system by the linguistic, statistic or structuralistic way, and it can be more essential and intelligent information system.

  • PDF

On The Full-Text Database Retrieval and Indexing Language

  • Chang, Hye-Rhan
    • Journal of the Korean Society for information Management
    • /
    • v.4 no.1
    • /
    • pp.24-46
    • /
    • 1987
  • The recent growth of full-text database operations has brought new opportunities for subject access. The fundamental problem of subject access in the online environment is the indexing language and technology. The purpose of this paper is to identify the characteristics and capabilities of full-text retrieval as compared to traditional bibliographic retrieval. Retrieval performance of indexing languages, full-text systems features achieved so far, and the new role of a controlled vocabulary, are examined. This paper also includes a review of the research on full-text retrieval performance.

  • PDF

A Study on Information Resource Evaluation for Text Categorization (문서범주화 효율성 제고를 위한 정보원 평가에 관한 연구)

  • Chung, Eun-Kyung
    • Journal of the Korean Society for information Management
    • /
    • v.24 no.4
    • /
    • pp.305-321
    • /
    • 2007
  • The purpose of this study is to examine whether the information resources referenced by human indexers during indexing process are effective on Text Categorization. More specifically, information resources from bibliographic information as well as full text information were explored in the context of a typical scientific journal article data set. The experiment results pointed out that information resources such as citation, source title, and title were not significantly different with full text. Whereas keyword was found to be significantly different with full text. The findings of this study identify that information resources referenced by human indexers can be considered good candidates for text categorization for automatic subject term assignment.

A Study of Automatic Indexing Technique based on Logical Structure of SGML Hangul Document (SGML 한글문서의 논리적 구조에 근거한 색인기법에 관한 연구)

  • 유석종
    • Journal of the Korean Society for information Management
    • /
    • v.12 no.2
    • /
    • pp.85-101
    • /
    • 1995
  • Conventional indexing sytstems support only full-text indexing method for electronic documents and do not use logical structure of documents in retrieval. Most electronic documents are in different formats depending on various systems. Also, they only indicate physical style of the document without considering any logical structure. Thus, in the effort to standardize the exchange of documents. IS0 developed SGML(Stadard Generalized Markup Language) which contains information about logical structure of the documents. In this paper, to resolve the disadvantages of full-text indexing method and to use standard document format. indexing system for SGML document is designed and implemented. In this system, user can assign indexing domain on elements, thus the logical structure of document is reflected in retrieving information. Various retrieval methods can be implemented by using the structural information of the document. In addition, automatic indexing for SGML Hangul document is supported in this system

  • PDF

Text-based Image Indexing and Retrieval using Formal Concept Analysis

  • Ahmad, Imran Shafiq
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.2 no.3
    • /
    • pp.150-170
    • /
    • 2008
  • In recent years, main focus of research on image retrieval techniques is on content-based image retrieval. Text-based image retrieval schemes, on the other hand, provide semantic support and efficient retrieval of matching images. In this paper, based on Formal Concept Analysis (FCA), we propose a new image indexing and retrieval technique. The proposed scheme uses keywords and textual annotations and provides semantic support with fast retrieval of images. Retrieval efficiency in this scheme is independent of the number of images in the database and depends only on the number of attributes. This scheme provides dynamic support for addition of new images in the database and can be adopted to find images with any number of matching attributes.