• Title/Summary/Keyword: Text Title

Title Extraction from Book Cover Images Using Histogram of Oriented Gradients and Color Information

  • Do, Yen;Kim, Soo Hyung;Na, In Seop
    • International Journal of Contents / v.8 no.4 / pp.95-102 / 2012
  • In this paper, we present a technique to extract the title areas from book cover images. A typical book cover image may contain text, pictures, and diagrams as well as a complex and irregular background. In addition, the high variability of character features such as thickness, font, position, background, and tilt of the text makes text extraction even more complicated. Therefore, we propose an efficient two-step method that uses Histogram of Oriented Gradients and color information to find the title areas. First, text localization is carried out to find title candidates. Then, a refinement process is performed to recover all components of the title areas. To obtain the best result, we also apply constraints on the size and the length-to-width ratio of the title. We achieve encouraging results in extracting title regions from book cover images, which demonstrate the advantages and efficiency of the proposed method.
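The Histogram of Oriented Gradients feature named in the abstract above is built from per-cell histograms of gradient orientation. Below is a minimal sketch of one cell histogram, assuming a grayscale cell given as nested lists; the function name `cell_hog`, the 9 unsigned bins, and single-bin voting are illustrative choices, not the authors' implementation. A full HOG descriptor would also normalize these histograms over overlapping blocks.

```python
import math


def cell_hog(gray, n_bins=9):
    """Orientation histogram for one HOG cell (unsigned, 0-180 degrees).

    gray: 2D list of pixel intensities. Gradients are central differences
    on interior pixels only, and each pixel adds its gradient magnitude to
    the bin containing its angle (no interpolation between bins).
    """
    height, width = len(gray), len(gray[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat pixels carry no orientation
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold to unsigned
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


# Usage: a vertical edge votes into bin 0, a horizontal edge into bin 4 (80-100 degrees).
vertical_edge = [[0, 0, 10, 10]] * 4
horizontal_edge = [[0] * 4, [0] * 4, [10] * 4, [10] * 4]
print(cell_hog(vertical_edge))  # [40.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Text strokes produce strong, regularly oriented gradients, which is why such histograms are a common cue for separating text from pictures and background.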

Automatic Document Title Generation with RNN and Reinforcement Learning (RNN과 강화 학습을 이용한 자동 문서 제목 생성)

  • Cho, Sung-Min;Kim, Wooseng
    • Journal of Information Technology Applications and Management / v.27 no.1 / pp.49-58 / 2020
  • Lately, a large amount of textual data has been pouring onto the Internet, and techniques to refine it are needed. Most of these data are long texts that often have no title. Therefore, in this paper, we propose a technique that combines an RNN sequence-to-sequence model with the REINFORCE algorithm to generate titles for long texts automatically. In addition, because a sequence-to-sequence model loses information when given long texts, the TextRank algorithm was applied to extract a summarized text that minimizes this information loss. Experiments show that the proposed techniques outperform existing ones.
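The TextRank step described above can be illustrated with a small extractive-summary sketch in the style of the original TextRank sentence graph: sentences are nodes, edges are weighted by word overlap normalized by log sentence length, and a weighted PageRank iteration scores them. Whitespace tokenization, the damping factor 0.85, the iteration count, and the name `textrank_summary` are assumptions; the paper's preprocessing and parameters are not given in the abstract.

```python
import math


def textrank_summary(sentences, k=2, d=0.85, iterations=50):
    """Return the k highest-scoring sentences, kept in original order."""
    words = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    # Edge weight: shared words divided by the sum of log distinct-word counts.
    weight = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or not words[i] or not words[j]:
                continue
            denom = math.log(len(words[i])) + math.log(len(words[j]))
            if denom > 0:
                weight[i][j] = len(words[i] & words[j]) / denom
    out_strength = [sum(row) for row in weight]
    scores = [1.0] * n
    # Weighted PageRank: each sentence receives score from its neighbours
    # in proportion to the edge weight relative to the neighbour's total.
    for _ in range(iterations):
        scores = [
            (1 - d) + d * sum(
                weight[j][i] / out_strength[j] * scores[j]
                for j in range(n) if out_strength[j] > 0
            )
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


docs = ["the cat sat on the mat", "the cat ate the fish", "stocks rose sharply today"]
print(textrank_summary(docs, k=2))  # the two overlapping "cat" sentences
```

The shortened text returned here would then be the input to the sequence-to-sequence title generator, which is the role the abstract gives TextRank.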

Automatic Title Detection by Spatial Feature and Projection Profile for Document Images (공간 정보와 투영 프로파일을 이용한 문서 영상에서의 타이틀 영역 추출)

  • Park, Hyo-Jin;Kim, Bo-Ram;Kim, Wook-Hyun
    • Journal of the Institute of Convergence Signal Processing / v.11 no.3 / pp.209-214 / 2010
  • This paper proposes an algorithm for segmentation and title detection in document images. The proposed automated title detection method consists of two phases: segmentation and title area detection. In the first phase, the document image is segmented; its binary map is segmented by a combination of morphological operations and CCA (connected component algorithm). This phase provides the segmented regions from which the title area is detected in the second phase. Candidate title areas are detected using geometric information, and the title region is then extracted by removing non-title regions. After a classification step that removes non-text regions, projection is performed to detect the title region. Because the title is usually set in the largest font in the document, horizontal projection is performed within text areas. In summary, we propose a method of segmentation and title detection for various forms of document images using geometric features and projection profile analysis. The proposed system is expected to have various applications, such as document title recognition, multimedia data searching, and real-time image processing.
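The horizontal projection step relies on the observation that the title usually uses the largest font. A minimal sketch, assuming a binarized text region given as rows of 0/1 pixels and treating the tallest band of ink-containing rows as the title line; this is a simplification of the paper's method, and the helper names and `min_ink` parameter are hypothetical.

```python
def horizontal_profile(binary):
    """Count ink pixels in each row of a 0/1 image."""
    return [sum(row) for row in binary]


def tallest_text_line(binary, min_ink=1):
    """Return (start_row, end_row_exclusive) of the tallest band of rows
    whose ink count is at least min_ink, or None if there is no ink.
    Ties keep the topmost band."""
    profile = horizontal_profile(binary)
    best, start = None, None
    for y, ink in enumerate(profile + [0]):  # trailing 0 closes a band at the bottom
        if ink >= min_ink and start is None:
            start = y
        elif ink < min_ink and start is not None:
            if best is None or (y - start) > (best[1] - best[0]):
                best = (start, y)
            start = None
    return best


page = [
    [0, 1, 1, 0],  # 1-row line
    [0, 0, 0, 0],
    [1, 1, 1, 1],  # 3-row line (largest font)
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],  # 2-row line
    [0, 1, 0, 0],
]
print(tallest_text_line(page))  # (2, 5)
```

Gaps of zero ink in the profile separate text lines, so band height serves as a rough proxy for font size.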

How the Title of Investment Strategy Report Affects Stock Price Forecast: Using Text Mining Method (투자전략 보고서의 제목이 주가 예측에 미치는 영향: 텍스트마이닝 중심으로)

  • Jang, Joon-Kyu;Lee, Kyu Hyun;Lee, Zoonky
    • The Journal of Bigdata / v.1 no.2 / pp.21-34 / 2016
  • Many financial analysts publish investment strategy reports online. If a correlation between a report's title and the analyst's forecast can be found, the title alone could indicate whether the analyst's forecast is likely to be reliable. The objective of this study is to examine the correlation between the titles of analysts' investment strategy reports and the actual outcomes of their forecasts using text mining. The analysis showed that the appearance of "strong buy and sell call" in the title was associated with higher analyst forecast accuracy and a higher fulfillment ratio. These results suggest that potential investors can obtain better information by reading the titles of analyst reports. We hope that this study can serve as a basis for new methodologies in this area.
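The abstract does not detail the text-mining procedure. As a hedged illustration of the kind of comparison it describes, the sketch below splits reports by whether a phrase occurs in the title and compares forecast hit rates. The `(title, forecast_was_correct)` record format, the sample data, and the function name are hypothetical, and a real study would also test whether the difference is statistically significant.

```python
def hit_rate_by_title_phrase(reports, phrase):
    """Split (title, forecast_was_correct) records by whether the title
    contains `phrase` (case-insensitive) and return each group's hit rate.
    A group with no reports maps to None."""
    groups = {True: [], False: []}
    for title, correct in reports:
        groups[phrase.lower() in title.lower()].append(bool(correct))
    return {
        contains: (sum(outcomes) / len(outcomes) if outcomes else None)
        for contains, outcomes in groups.items()
    }


# Hypothetical sample records, for illustration only.
sample = [
    ("Strong buy call on chipmakers", True),
    ("Strong Buy: autos", True),
    ("Weekly market outlook", False),
    ("Monthly outlook", True),
]
print(hit_rate_by_title_phrase(sample, "strong buy"))  # {True: 1.0, False: 0.5}
```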

A Study on the Utility of Relevance/Non-relevance Information in Homogeneous Documents (유사문헌집단에서 적합/부적합정보의 유용성에 관한 연구)

  • Moon, Sung-Been
    • Journal of the Korean Society for Information Management / v.32 no.3 / pp.277-293 / 2015
  • This study compared retrieval effectiveness after relevance feedback between two systems (Title/Abstract and Full-text) using four different sets of relevance judgments. Four relevance levels (not relevant, marginally relevant, relevant, highly relevant) were used, each determined by referees assigning a relevance score to documents. The study also investigated how much average precision improved after relevance feedback when "marginally relevant" documents were included in the relevant class, for both the Title/Abstract system and the Full-text retrieval system. The Title/Abstract system benefited from relevance feedback with the marginally relevant documents: it consistently showed a higher percentage of improvement when marginally relevant documents were included in the relevant class, whereas the opposite held for the Full-text retrieval system. This implies that marginally relevant documents in the relevant class introduced noise into the Full-text retrieval system.
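Whether "marginally relevant" documents count as relevant changes the average precision that is reported. Below is a minimal sketch of non-interpolated average precision with a graded-relevance threshold, assuming grades 0 to 3 for the four levels named above; the grade encoding, names, and toy ranking are assumptions, and relevance feedback itself is not implemented here.

```python
NOT_RELEVANT, MARGINAL, RELEVANT, HIGHLY_RELEVANT = 0, 1, 2, 3


def average_precision(ranking, grades, threshold):
    """Non-interpolated AP: mean of precision@rank over every judged
    document whose grade >= threshold (unretrieved ones contribute 0)."""
    relevant = {doc for doc, g in grades.items() if g >= threshold}
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


grades = {"a": HIGHLY_RELEVANT, "b": MARGINAL, "c": NOT_RELEVANT, "d": RELEVANT}
ranking = ["a", "b", "c", "d"]
strict = average_precision(ranking, grades, RELEVANT)   # marginal excluded: 0.75
lenient = average_precision(ranking, grades, MARGINAL)  # marginal included: about 0.917
```

The same ranking scores differently under the two definitions of the relevant class, which is the effect the study measures before and after feedback.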

A Study on Information Resource Evaluation for Text Categorization (문서범주화 효율성 제고를 위한 정보원 평가에 관한 연구)

  • Chung, Eun-Kyung
    • Journal of the Korean Society for Information Management / v.24 no.4 / pp.305-321 / 2007
  • The purpose of this study is to examine whether the information resources that human indexers consult during indexing are effective for text categorization. More specifically, information resources from bibliographic information as well as full-text information were explored using a typical scientific journal article data set. The experimental results showed that information resources such as citations, source titles, and titles did not differ significantly from full text, whereas keywords differed significantly from full text. The findings indicate that information resources consulted by human indexers can be considered good candidates for text categorization in automatic subject term assignment.

A Study on 『HaeHokByeonUi』 by Lee, ByungHa (이병하(李炳夏)의 『해혹변의(解惑辨疑)』 연구)

  • Park, Hun-pyeong
    • Journal of Korean Medical Classics / v.34 no.1 / pp.1-25 / 2021
  • Objectives: The purpose of this paper is to analyze the text of the 『HaeHokByeonUi(解惑辨疑)』 in detail and to collect information on its author, Lee, ByungHa. Methods: The family and life of Lee, ByungHa were reconstructed from genealogies and historical records published by the government. The contents and frequency of title items were analyzed. Results: 1. The text is estimated to have been written between 1827 and 1831. 2. At that time, there were one JeonUigam(典醫監)-bujigjang(副直長) and four medical officers belonging to the Chijongcheong(治腫廳). 3. There were 2,434 title items in total, of which 472 were overlaps. 4. General vocabulary accounts for a higher proportion than other vocabulary. 5. The overlapping title items are presumed to be important basic concepts in the medical texts of that time. Conclusions: The 『HaeHokByeonUi(解惑辨疑)』 was likely an introductory text for those preparing for the National Medical Examination in the 19th century. It still provides useful basic medical vocabulary to learners of Korean Medicine today.

A Study on the Compilation and Revision of Texts in the Nam-chungjanggong-sigo ("남충장공시고"의 편차와 산절에 관한 연구)

  • 박문열
    • Journal of Korean Library and Information Science Society / v.34 no.1 / pp.195-215 / 2003
  • This study is a bibliographical analysis of the Nam-Chungjanggong-sigo(남충장공시고), a woodblock-printed book. From the viewpoint of physical bibliography, the Nam-Chungjanggong-sigo consists of a preface, Chungjanggong's poetical works, appendices, and an epilogue, and its printing blocks comprise 52 plates. From the viewpoint of textual bibliography, the text of Chungjanggong's poetical works was revised twice, with changes to the title alone, the text alone, or both the title and the text, and some verses were shortened. Some verses were also prepared for revision in the same way during compilation but were omitted in the final compilation.

The Effect of Text Consistency between the Review Title and Content on Review Helpfulness (온라인 리뷰의 제목과 내용의 일치성이 리뷰 유용성에 미치는 영향)

  • Li, Qinglong;Kim, Jaekyeong
    • Knowledge Management Research / v.23 no.3 / pp.193-212 / 2022
  • Many studies have proposed factors that affect review helpfulness. Previous studies have investigated the effects of quantitative factors (e.g., star ratings) and affective factors (e.g., sentiment scores) on review helpfulness. Online reviews contain titles and contents, but existing studies focus on the review content, and investigating the factors that affect review helpfulness from the content alone, without considering the title, is limited. Even previous studies that considered both investigated the effects of review content and title independently, which may ignore the potential impact of the similarity between review titles and content on review helpfulness. Based on mere exposure effect theory, this study examined how text consistency between review titles and content affects review helpfulness. We also considered the roles of information clearness, review length, and source reliability. The results show that text consistency between the review title and the content negatively affects review helpfulness. Furthermore, information clearness and source reliability weaken the negative effect of text consistency on review helpfulness.
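The abstract does not state how text consistency between a title and its content is measured. One common, simple proxy is the cosine similarity of term-frequency vectors; the sketch below uses it purely as an assumption (the paper may use a different representation, such as embeddings), with whitespace tokenization and the hypothetical name `title_content_consistency`.

```python
import math
from collections import Counter


def title_content_consistency(title, content):
    """Cosine similarity of whitespace-token term-frequency vectors,
    in [0, 1]; returns 0.0 if either text has no tokens."""
    a = Counter(title.lower().split())
    b = Counter(content.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


score = title_content_consistency("great phone", "great phone great battery")
```

A score near 1 means the title largely repeats the content's vocabulary; under the study's finding, such high consistency would be associated with lower helpfulness.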

Development of Retrieval Model Using Structure Information and Term Information (구조적 정보와 색인어 정보를 결합한 검색 모델 개발)

  • 임성신;한기덕;권혁철
    • Proceedings of the Korean Information Science Society Conference / 2004.10a / pp.799-801 / 2004
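  • As the amount of information accumulated on the Internet grows, it has become harder for users to find the information they want. Accordingly, information retrieval systems that effectively find the desired information among vast numbers of documents have become more important, and they have been actively researched. Information that can be extracted from Internet documents includes link information, anchor text, title text, and body text, and many information retrieval systems and models using this information have been developed or studied. Building on previous work that studied how to improve performance by refining and processing commonly used extracted information, this paper presents experimental results and a model that adds site weights.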
