Search | Korea Science

Main Content Extraction from Web Pages Based on Node Characteristics

Liu, Qingtang;Shao, Mingbo;Wu, Linjing;Zhao, Gang;Fan, Guilin;Li, Jun
- Journal of Computing Science and Engineering / v.11 no.2 / pp.39-48 / 2017
Main content extraction from web pages is widely used in search engines, web content aggregation, and mobile Internet browsing. However, web pages contain a large amount of irrelevant information, such as advertisements, irrelevant navigation, and junk content. Such irrelevant information reduces the efficiency of web content processing in content-based applications. The purpose of this paper is to propose an automatic method for extracting the main content of web pages. In this method, we use two indicators to describe the characteristics of web pages: text density and hyperlink density. Because similar content is distributed continuously on a page, we use an estimation algorithm to judge whether a node is a content node or a noisy node based on the characteristics of the node and its neighboring nodes. This algorithm enables us to filter out advertisement nodes and irrelevant navigation. Experimental results on 10 news websites showed that our algorithm achieved a 96.34% average acceptable rate.
https://doi.org/10.5626/JCSE.2017.11.2.39
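
The two indicators named in the abstract above, text density and hyperlink density, can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so the definitions here are assumptions: text density is taken as text characters per tag in a subtree, and hyperlink density as the share of a subtree's text that sits inside `<a>` elements. The names `Node`, `parse`, `is_content` and both thresholds are hypothetical, and the neighbor-based estimation step the abstract mentions is not modeled.

```python
from html.parser import HTMLParser


class Node:
    """One element of a simplified DOM tree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text_len = 0        # text characters anywhere in this subtree
        self.link_text_len = 0   # text characters inside <a> descendants


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes reasonably well-formed HTML."""

    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root
        self.link_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag in self.VOID or self.current is self.root:
            return
        if tag == "a" and self.link_depth:
            self.link_depth -= 1
        self.current = self.current.parent

    def handle_data(self, data):
        n = len(data.strip())
        node = self.current
        while node is not None:  # credit the text to every ancestor
            node.text_len += n
            if self.link_depth:
                node.link_text_len += n
            node = node.parent


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def count_tags(node):
    return 1 + sum(count_tags(child) for child in node.children)


def text_density(node):
    # Assumed definition: text characters per tag in the subtree.
    return node.text_len / count_tags(node)


def hyperlink_density(node):
    # Assumed definition: fraction of the subtree's text that is link text.
    return node.link_text_len / node.text_len if node.text_len else 0.0


def is_content(node, min_text_density=20.0, max_link_density=0.5):
    # Hypothetical thresholds, chosen only for illustration.
    return (text_density(node) >= min_text_density
            and hyperlink_density(node) <= max_link_density)
```

Under these assumed definitions, a long paragraph scores high on text density and low on hyperlink density, while a navigation list made only of links scores the opposite way, which is the contrast a content/noise classifier of this kind relies on.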

Font Change Blindness Triggered by the Text Difficulty in Moving Window Technique (움직이는 창 기법에서의 덩이글 난이도에 따른 글꼴 변화맹)

Seong-Jun Bak;Joo-Seok Hyun
- Korean Journal of Cognitive Science / v.34 no.4 / pp.259-275 / 2023
The aim of this study was to investigate font change blindness as a function of text difficulty in the "Moving Window Task", originally introduced by McConkie and Rayner (1975). While participants read with the moving window applied, target words were presented in a font style different from that of the surrounding text. When a participant's gaze reached the position of a target word, the font of the target word was changed to match the text font. Before the change, the target word was sans-serif when the text font was serif, and serif when the text font was sans-serif. After completing the reading task, more than half of the participants (62.5%) reported not detecting the font change. Eye movements at the target word positions showed that when the content of the text was difficult to understand, the number of regressions increased, gaze duration lengthened, and saccade length decreased. Specifically, the increase in the number of regressions was evident only when the text font was serif, that is, when the font of the target word changed from sans-serif to serif. These results suggest that sensory interference unrelated to content understanding is not easily detected during reading. However, the possibility of detection increases when comprehension of the content becomes challenging. Furthermore, this possibility of detection may be higher when the text font is serif than when it is sans-serif.
https://doi.org/10.19066/cogsci.2023.34.4.001

Comparison of Text Beginning Frame Detection Methods in News Video Sequences (뉴스 비디오 시퀀스에서 텍스트 시작 프레임 검출 방법의 비교)

Lee, Sanghee;Ahn, Jungil;Jo, Kanghyun
- Journal of Broadcast Engineering / v.21 no.3 / pp.307-318 / 2016
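Overlay text in video frames provides information that supplements the audio and visual content. In news video in particular, this text gives a compact and direct description of the video content. It is therefore the most reliable cue for building a news video indexing system. Detecting and recognizing text is important for building an indexing system for television news programs. This paper proposes identifying the frame at which overlay text begins, which helps in detecting and recognizing overlay text in news video. Because not every frame of a video sequence contains overlay text, extracting overlay text from every frame is unnecessary and wastes time. Focusing only on the frames that contain overlay text can therefore improve the accuracy of overlay text detection. We conduct comparative experiments on methods for identifying text beginning frames in news video and propose a suitable processing method.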
https://doi.org/10.5909/JBE.2016.21.3.307

Case Analysis of Bible Visualization based on Text Data Traits -Focused on Content, Structure, Quotation of Text- (텍스트 데이터의 특성에 따른 성경 시각화 사례 분석 -텍스트의 내용적, 구조적 특성 및 인용 정보를 중심으로-)

Kim, Hyoyoung;Park, Jin Wan
- The Journal of the Korea Contents Association / v.13 no.8 / pp.83-92 / 2013
Text visualization begins with understanding the text itself, which is the material of visual expression. To visualize any text data, a sufficient understanding of the characteristics of the text is needed first, and the expressive approaches can then be decided depending on the unique characteristics derived from the text. In this research, we aimed to establish a theoretical foundation for approaches to text visualization by examining diverse examples of text visualization derived from various characteristics of the text. To do this, we chose the 'Bible' text, which is well known globally and whose digital data can be accessed easily, so that diverse text visualization examples exist, and we analyzed examples of Bible text visualization. We derived the unique characteristics of text (content, structure, quotation) as criteria for analysis and supported the validity of the analysis by adopting at least 2-3 examples for each criterion. As a result, we found that the goals and expressive approaches are decided depending on the unique characteristics of the Bible text. We expect to build a theoretical method for choosing materials and approaches by analyzing more diverse examples from various points of view on the basis of this research.
https://doi.org/10.5392/JKCA.2013.13.08.083

Entrepreneur Speech and User Comments: Focusing on YouTube Contents (기업가 연설문의 주제와 시청자 댓글 간의 관계 분석: 유튜브 콘텐츠를 중심으로)

Kim, Sungbum;Lee, Junghwan
- The Journal of the Korea Contents Association / v.20 no.5 / pp.513-524 / 2020
Recently, YouTube's growth has started drawing attention. YouTube is not only a content-consumption channel but also provides a space for consumers to express their intentions. Consumers share their opinions on YouTube through comments. The study focuses on the text of global entrepreneurs' speeches and the comments in response to those speeches on YouTube. A content analysis was conducted for each speech and comment using the text mining software Leximancer. We analyzed the theme of each entrepreneurial speech and derived topics related to the propensity and characteristics of individual entrepreneurs. In the comments, we found the themes of money, work, and need to be common regardless of the content of each speech. Taking into account the different lengths of text, we additionally performed a Prominence Index analysis. We derived time, future, better, best, change, life, business, and need as common keywords for speech contents and viewer comments. Users who watched an entrepreneur's speech on YouTube responded equally to the topics of life, time, future, customer needs, and positive change.
https://doi.org/10.5392/JKCA.2020.20.05.513

The Effects of User Involvement on Internet Ad Preference Based on Presentation Type and Content

Joo Hoo Kim
- The Journal of Society for e-Business Studies / v.8 no.4 / pp.33-51 / 2003
The primary objectives of this study were, using data from Internet users in Korea, to determine users' preference for banner ads through two ad properties, ad presentation type (text vs. image) and ad content (product information vs. prize information), by incorporating the level of involvement into the research design. Using a within-group experimental design with subjects participating over the web, the study showed that image-based banner ads were significantly preferred to text-based banner ads. The level of ad involvement had a significant impact on the preference for banner ads. Image-based banner ads also had a greater effect on ad preference than text-based banner ads in the low-involvement situation only. Finally, image-based banner ads were consistently preferred to text-based banner ads regardless of involvement level when the banner ad was product oriented. The study findings suggest that adoption decisions regarding banner ad presentation type and banner ad content should be based on knowledge of both the level of the consumer's ad involvement and the interactive effects between ad presentation and ad content.

Audience Cognitive Reconstruction of the Extended Meaning of Complex Mechanism Text : For Communication Education using Story Media Expressions (복합기제 텍스트의 확장 의미에 대한 수용자의 인지적 재구성 : 서사적 미디어 표현을 활용한 의사소통 교육을 위해)

Lim, Ji-Won
- Journal of Korea Entertainment Industry Association / v.15 no.7 / pp.137-143 / 2021
This discussion can be said to be a qualitative study on the possibility of linking communication education for college students and literacy education for Korean language-linked educators based on the theory of interpretation of cognitive meaning of media text containing complex mechanisms. The implicit meaning of media content expression used as an interactive communication strategy will be accepted as a multilateral interpretation according to the individual learner's cognitive environment. If so, how is the general media content meaning intended by the content creator being accepted? These doubts are the starting point for discussion. To solve the problem, I leaned on the experimental pragmatic methodology of cognitive aesthetics and applied a model of relevance of cognitive linguistics to connect learners' creative cognitive environment and present content to find a contrast. As a result of the discussion, it was possible to establish a basic framework for learners to express their subjectivity and creative thinking that could connect the cognitive environment and present content themselves. In particular, active and positive learners also revealed direct descriptive expressions to build a new cognitive environment, such as suggesting a third alternative to argue the ability to question produced media texts and the validity of the meaning implied in the text. In the future, since media text containing complex mechanisms is an indirect and persuasive communication behavior that occurs easily through various media in modern society, the universal communication principle of reliable conversation between media text creators and audiences should exist.
https://doi.org/10.21184/jkeia.2021.10.15.7.137

Design and Development of a Multimodal Biomedical Information Retrieval System

Demner-Fushman, Dina;Antani, Sameer;Simpson, Matthew;Thoma, George R.
- Journal of Computing Science and Engineering / v.6 no.2 / pp.168-177 / 2012
The search for relevant and actionable information is a key to achieving clinical and research goals in biomedicine. Biomedical information exists in different forms: as text and illustrations in journal articles and other documents, in images stored in databases, and as patients' cases in electronic health records. This paper presents ways to move beyond conventional text-based searching of these resources, by combining text and visual features in search queries and document representation. A combination of techniques and tools from the fields of natural language processing, information retrieval, and content-based image retrieval allows the development of building blocks for advanced information services. Such services enable searching by textual as well as visual queries, and retrieving documents enriched by relevant images, charts, and other illustrations from the journal literature, patient records and image databases.
https://doi.org/10.5626/JCSE.2012.6.2.168

Intention Classification for Retrieval of Health Questions

Liu, Rey-Long
- International Journal of Knowledge Content Development & Technology / v.7 no.1 / pp.101-120 / 2017
Healthcare professionals have edited many health questions (HQs) and their answers for healthcare consumers on the Internet. The HQs provide both readable and reliable health information, and hence retrieval of those HQs that are relevant to a given question is essential for health education and promotion through the Internet. However, retrieval of relevant HQs needs to be based on recognizing the intention of each HQ, which is difficult to do by predefining syntactic and semantic rules. We thus model the intention recognition problem as a text classification problem, and develop two techniques to improve a learning-based text classifier for the problem. The two techniques improve the classifier by location-based and area-based feature weightings, respectively. Experimental results show that the two techniques can work together to significantly improve a Support Vector Machine classifier in both the recognition of HQ intentions and the retrieval of relevant HQs.
https://doi.org/10.5865/IJKCT.2017.7.1.101

Design and Implementation of Web Crawler with Real-Time Keyword Extraction based on the RAKE Algorithm

Zhang, Fei;Jang, Sunggyun;Joe, Inwhee
- Proceedings of the Korea Information Processing Society Conference / 2017.11a / pp.395-398 / 2017
We propose a web crawler system with a keyword extraction function in this paper. Research on keyword extraction in existing text mining is mostly based on databases that have already been collected from documents or corpora, but the purpose of this paper is to establish a real-time keyword extraction system that can extract the keywords of a text and store them in the database together with the text while crawling the web page. In this paper, we design and implement a crawler combined with the RAKE keyword extraction algorithm. It can extract keywords from the corresponding content while crawling the content of a web page. In addition, the performance of the RAKE algorithm is improved by increasing the weight of important features (such as nouns appearing in the title). The experimental results show that this method is superior to the existing method and that it can extract keywords satisfactorily.
https://doi.org/10.3745/PKIPS.y2017m11a.395
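
For readers unfamiliar with RAKE, the scoring that the abstract above builds on can be sketched roughly as follows. This is the commonly described form of the algorithm (candidate phrases split at stopwords and punctuation, each word scored as degree divided by frequency, each phrase scored as the sum of its word scores), not the authors' implementation. The `title` and `title_boost` parameters are a hypothetical stand-in for the title weighting the abstract mentions: they boost any title word, with no part-of-speech tagging, and the short `STOPWORDS` set is illustrative only.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {
    "a", "an", "the", "of", "and", "or", "in", "on", "for", "to", "is",
    "are", "with", "by", "from", "that", "this", "it", "as", "at", "be",
    "can", "which",
}


def candidate_phrases(text):
    """Split text into word runs delimited by punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text, title="", title_boost=1.0):
    """Return (phrase, score) pairs sorted from highest to lowest score."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting itself

    # Hypothetical boost: multiply the score of words found in the title.
    title_words = set(re.findall(r"[\w-]+", title.lower())) - STOPWORDS
    word_score = {
        word: degree[word] / freq[word]
        * (title_boost if word in title_words else 1.0)
        for word in freq
    }

    scores = {}
    for phrase in phrases:
        scores[" ".join(phrase)] = sum(word_score[w] for w in phrase)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rake(page_text, title=page_title, title_boost=3.0)` on crawled text would rank phrases containing title words higher than plain RAKE does; the boost factor is arbitrary and would need tuning against real data.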
