Search | Korea Science

Main Content Extraction from Web Pages Based on Node Characteristics

Liu, Qingtang;Shao, Mingbo;Wu, Linjing;Zhao, Gang;Fan, Guilin;Li, Jun
- Journal of Computing Science and Engineering, v.11 no.2, pp.39-48, 2017
Main content extraction from web pages is widely used in search engines, web content aggregation, and mobile Internet browsing. However, web pages contain a large amount of irrelevant information, such as advertisements, unrelated navigation, and junk content, which reduces the efficiency of web content processing in content-based applications. The purpose of this paper is to propose an automatic method for extracting the main content of web pages. The method uses two indicators to describe the characteristics of web pages: text density and hyperlink density. Because similar content tends to be distributed contiguously on a page, we use an estimation algorithm that judges whether a node is a content node or a noise node based on the characteristics of the node and its neighboring nodes. This algorithm enables us to filter out advertisement nodes and irrelevant navigation. Experimental results on 10 news websites showed that our algorithm achieved an average acceptable rate of 96.34%.
https://doi.org/10.5626/JCSE.2017.11.2.39
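
The node test described in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: the definitions of text density (visible characters per tag) and hyperlink density (share of text that is anchor text), the neighbour weighting `self_weight`, and the thresholds `td_min` and `hd_max` are all assumptions chosen for illustration, and `Node` is a hypothetical stand-in for a DOM block.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Simplified stand-in for a DOM block (hypothetical fields)."""
    text_chars: int       # characters of visible text inside the node
    link_text_chars: int  # characters of that text that sit inside <a> tags
    tag_count: int        # number of descendant tags


def text_density(n: Node) -> float:
    # Assumed definition: visible characters per tag.
    return n.text_chars / max(n.tag_count, 1)


def hyperlink_density(n: Node) -> float:
    # Assumed definition: share of the text that is anchor text.
    # A node with no text is treated as pure link noise.
    return n.link_text_chars / n.text_chars if n.text_chars else 1.0


def smoothed(values: List[float], i: int, self_weight: float = 0.6) -> float:
    """Blend a node's value with the mean of its immediate siblings."""
    neighbours = [values[j] for j in (i - 1, i + 1) if 0 <= j < len(values)]
    if not neighbours:
        return values[i]
    return self_weight * values[i] + (1 - self_weight) * sum(neighbours) / len(neighbours)


def classify(nodes: List[Node], td_min: float = 10.0, hd_max: float = 0.5) -> List[bool]:
    """Label sibling nodes as content (True) or noise (False).

    Neighbour smoothing reflects the observation that similar content is
    laid out contiguously; the weights and thresholds are hypothetical.
    """
    td = [text_density(n) for n in nodes]
    hd = [hyperlink_density(n) for n in nodes]
    return [
        smoothed(td, i) >= td_min and smoothed(hd, i) <= hd_max
        for i in range(len(nodes))
    ]
```

Under these assumptions, a navigation bar made almost entirely of anchor text is still rejected when it sits next to an article paragraph, because the node's own value carries the larger weight, while a paragraph flanked by noise can still be kept.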

Web contents-comparative analysis: Online shopping agencies that retail foreign apparel

Kim, Sung-Hee
- Journal of Fashion Business, v.13 no.6, pp.99-110, 2009
The purposes of this study are 1) to investigate the Web content of online shopping agencies and determine whether their contents differ, and 2) to suggest strategies for online shopping agencies. For the study, the online shopping agency components of the websites, namely product, information, and customer service, were investigated; each component had subcategories. To analyze the Web content of online shopping agencies, nine agencies were chosen based on the rankings of www.rankey.com and www.100hot.co.kr. The content analysis was conducted from January 5 to January 30, 2009. The results showed that the basic Web content of online shopping agencies (i.e., product, information, and customer service) was present on most sites. However, products were similar across sites, and effective information and labor-saving functions were rarely used. Areas in which customers actively participated were limited. Therefore, most sites need to reinforce their content by providing well-articulated product, information, and customer service content for shoppers, and to differentiate their website identities.

User Centric Content Management System for Open IPTV Over SNS

Jeon, Seung Hyun;An, Sanghong;Yoon, Changwoo;Lee, Hyun-woo;Choi, Junkyun
- Journal of Communications and Networks, v.17 no.3, pp.296-305, 2015
Schemes coupling service-oriented architecture (SOA) and Web 2.0 have recently been studied. Web-based content providers and telecommunications company (Telecom)-based Internet protocol television (IPTV) providers have competed with each other to attract more three-screen service subscribers. Since the advent of Web 2.0, an increasing amount of reproduced content has been circulated. However, because IPTV providers transcode content in advance to keep up with increasing device resolutions and content formats, the network bandwidth, storage, and operating costs of content management systems (CMSs) are wasted. In this paper, we present a user-centric CMS for open IPTV that integrates SOA and Web 2.0. To address these problems, and considering content popularity based on a Zipf-like distribution, we compare the performance of the user-centric CMS and the conventional Web syndication system in terms of normalized costs. Based on the user-centric CMS, we implement a social Web TV with a device-aware function, which can aggregate, transcode, and deploy content over social networking services independently.
https://doi.org/10.1109/JCN.2015.000052
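
The abstract models content popularity with a Zipf-like distribution, in which the item at popularity rank i is requested with probability proportional to i^-alpha. The sketch below only illustrates that popularity model; it does not reproduce the paper's normalized-cost comparison, and the exponent `alpha` and the catalogue size are assumed values.

```python
from typing import List


def zipf_popularity(n_items: int, alpha: float = 0.8) -> List[float]:
    """Request probability of the item at each popularity rank 1..n_items.

    p_i = i^-alpha / sum_j j^-alpha (a Zipf-like law); alpha is an assumed
    parameter, not a value taken from the paper.
    """
    weights = [rank ** -alpha for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_k_share(n_items: int, k: int, alpha: float = 0.8) -> float:
    """Fraction of all requests that go to the k most popular items."""
    return sum(zipf_popularity(n_items, alpha)[:k])
```

With a skewed popularity law, a small head of the catalogue receives a large share of requests, which is one way to see why the popularity assumption bears on the cost of transcoding every item in advance.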

Optimization Model on the World Wide Web Organization with respect to Content Centric Measures (월드와이드웹의 내용기반 구조최적화)

Lee Wookey;Kim Seung;Kim Hando;Kang Sukho
- Journal of the Korean Operations Research and Management Science Society, v.30 no.1, pp.187-198, 2005
The structure of a Web site can keep search robots and crawling agents from becoming confused in the huge forest of Web pages. We formalize a view of the World Wide Web and generalize it as a hierarchy of Web objects: the Web as a set of Web sites, and a Web site as a directed graph of Web nodes and Web edges. Our approach yields the optimal hierarchical structure that maximizes the tf-idf (term frequency-inverse document frequency) weight, one of the most widely accepted content-centric measures in the information retrieval community, so that the measure can be used to embody the semantics of search queries. The experimental results indicate that the optimization model is an effective alternative to conventional heuristic approaches in the dynamically changing Web environment.
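
For reference, the tf-idf weight named in this abstract can be computed as below, treating each Web page as a list of tokens. The sketch uses one common variant (raw term counts and a natural-log idf) and shows only the term weighting, not the paper's structure optimization model, which the abstract does not specify.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight every term of every tokenised document by tf * log(N / df).

    tf is the raw term count in the document, df the number of documents
    containing the term, and N the number of documents. Other tf-idf
    variants (log-scaled tf, smoothed idf) are equally common.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights
```

A term that appears on every page gets weight 0, while terms concentrated on few pages are weighted up, which is what makes the measure content-centric.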

The Effects of Web Site Architecture, Web Site Content Quantity and User Task Complexity on Usability (웹사이트의 구조와 정보량 및 사용자 과업 복잡도가 사용성에 미치는 영향)

Koh, Seok-Ha;Kim, Ju-Sung;Kim, Young-Ki
- Journal of Information Technology Applications and Management, v.12 no.2, pp.145-161, 2005
In this paper, we present an experiment examining how a website's architecture, its content quantity, and user task complexity affect its usability. The experiment was performed in two phases, with college students visiting existing websites that are readily accessible via the Internet. The results show that website architecture significantly affects efficiency and effectiveness, interacting with content quantity and user task complexity. On the other hand, none of these three factors had a significant effect on satisfaction or learnability; in particular, content quantity had no statistically significant effect on either. This implies that the factors affecting satisfaction and learnability differ from those affecting efficiency and effectiveness. The analysis reaffirms that it is essential to consider the context of use when designing a website.

Web 2.0 and Web novels -Focusing on Web-based Romance Novels (웹 2.0 시대와 웹소설 -웹 로맨스 서사를 중심으로)

Ryu, Su-Yun
- Journal of Popular Narrative, v.25 no.4, pp.9-43, 2019
Web novels are one of the genre-novel forms most actively adapted to a new medium, the Internet. Research on cultural content implemented on digital media is naturally closely related to environmental changes in digital media, and the same goes for web novels, which arose from the identity of Web platforms. In the case of web novels especially, the platforms that provide them have themselves triggered direct changes in genre codes and reading patterns. From this perspective, this thesis examines the formation process and strategic features of web novels, which have become content and products in the web platform environment. First, by tracing the formation process from communication novels to Internet novels and web novels, I outline the transition to digital media and the changes in the genre novel market. This was an attempt to show that web novels not only have continuity as genre novels but also mark a turning point as digital content. Web novels are digital content that internalizes the values of the Web 2.0 era, and they should also be a core product that grows the market in its own right. This paper notes that web novels are content that embodies these consumption values. Accordingly, this thesis considers the visualization and commercialization strategies of web-based novels as currently formed, and the current status of web-based romance novels as the content and product driving OSMU (One Source Multi-Use) most actively in the process of commercialization. Through this process, I found that the greatest characteristic of web novels, as genre novels that have evolved into digital content, is the division and fragmentation of genre.
https://doi.org/10.18856/jpn.2019.25.4.001

A Study on Content Characteristics of Hyperrealism Web Drama (하이퍼리얼리즘형 웹드라마 <좋좋소>의 콘텐츠 특성 연구)

Lee, Jun Seok;Jung, Won Sik
- Journal of Korea Multimedia Society, v.25 no.1, pp.114-123, 2022
Korean web dramas have been changing with the popularization of OTT platforms. Earlier web dramas mainly targeted teenagers, with school settings as a genre feature, and were characterized in production by the active casting of new idols and the use of a B-grade sensibility. Since the popularization of OTT services such as YouTube and Netflix, there have been signs of significant change. This study examines the hyperrealistic content characteristics of the web drama <좋좋소>, which can be seen as the center of this change. Through this analysis, it finds that <좋좋소> differs greatly from previous web dramas through its stereoscopic typification, emphasis on detailed expression, active use of experiential narrative characteristics, and new media creator-centered production method.
https://doi.org/10.9717/kmms.2022.25.1.114

Korean Web Content Extraction using Tag Rank Position and Gradient Boosting (태그 서열 위치와 경사 부스팅을 활용한 한국어 웹 본문 추출)

Mo, Jonghoon;Yu, Jae-Myung
- Journal of KIISE, v.44 no.6, pp.581-586, 2017
For automatic web scraping, unnecessary components such as menus and advertisements need to be removed from web pages, and the main content should be extracted automatically. A content block tends to be located in the middle of a web page. Korean web documents in particular rarely include metadata and have complex designs, so a suitable content extraction method is needed. Existing content extraction algorithms use textual and structural features of content blocks, because processing visual features requires heavy computation for rendering and image processing. In this paper, we propose a new content extraction method that uses tag positions in HTML as a quasi-visual feature. In addition, we develop the tag rank position, a type of tag position that is not affected by text length, and show that gradient boosting with the tag rank position is a very accurate content extraction method. The results show that this method can be used to collect high-quality text data automatically from various web pages.
https://doi.org/10.5626/JOK.2017.44.6.581
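
The abstract does not define the tag rank position precisely. One plausible reading, used here purely as an assumption, is a tag's rank among all start tags in document order, normalized to [0, 1]. The sketch computes that feature with Python's standard `html.parser`; `StartTagCollector` and `tag_rank_positions` are hypothetical names, and the gradient boosting classifier that would consume such features is not shown.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class StartTagCollector(HTMLParser):
    """Records start tags in document order (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs) -> None:
        self.tags.append(tag)


def tag_rank_positions(html: str) -> List[Tuple[str, float]]:
    """Return (tag, position) pairs with position in [0, 1].

    Assumed definition: a tag's ordinal index among all start tags divided by
    (number of tags - 1). Unlike a character-offset position, this value does
    not move when the text between tags gets longer or shorter.
    """
    parser = StartTagCollector()
    parser.feed(html)
    parser.close()
    n = len(parser.tags)
    if n <= 1:
        return [(tag, 0.0) for tag in parser.tags]
    return [(tag, i / (n - 1)) for i, tag in enumerate(parser.tags)]
```

Because this position depends only on how many tags precede an element, padding a paragraph with more text leaves every value unchanged, which is the property the abstract attributes to the tag rank position.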

Development of Web-based Multimedia Content for a Physical Examination and Health Assessment Course (웹기반의 건강사정 멀티미디어 컨텐츠 개발)

Oh Pok-Ja;Kim Il-Ok;Shin Sung-Rae;Jung Hoe-Kyung
- Journal of Korean Academy of Nursing, v.34 no.6, pp.994-1003, 2004
Purpose: The purpose of this study was to develop Web-based multimedia content for a Physical Examination and Health Assessment course. Method: The multimedia content was developed based on Jung's teaching and learning structure plan model, using the following five processes: 1) Analysis Stage, 2) Planning Stage, 3) Storyboard Framing and Production Stage, 4) Program Operation Stage, and 5) Final Evaluation Stage. Results: The web-based multimedia content consisted of an intro movie, a main page, and sub-pages. The main page had six menu bars (Announcement Center, Information on Professors, Lecture Guide, Cyber Lecture, Q&A, and Data Center) and a site map introducing the 15 weeks of lectures. HTML, JavaScript, Flash, and multimedia technologies (audio and video) were used to run the web-based multimedia content, which consisted of text content, interactive content, animation, and audio and video. Experts in content, computer engineering, and educational technology were consulted throughout these development processes. Conclusions: Web-based multimedia content is expected to offer individualized, tailored learning opportunities that maximize and facilitate the effectiveness of the teaching and learning process. Therefore, this multimedia content should be used alongside lectures in Physical Examination and Health Assessment classes as a vital teaching aid to compensate for the weaknesses of face-to-face teaching and learning.

DMB News Application Creation System for DMB Based on Web Content (DMB 환경에서 웹 콘텐츠를 활용한 뉴스 어플리케이션 생성 시스템 설계)

Jang, Yun-Yong;Choy, Yoon-Chul;Lim, Soon-Bum
- Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집), 2008.02a, pp.612-617, 2008
To develop broadcasting applications for DMB, programmers have to aggregate content. For content such as news, it is hard to provide continuously updated content. This paper introduces a creation system that automatically creates news applications for data broadcasting on DMB from web news content that is updated immediately. The designed system aggregates news content using RSS-based XML and produces the news application by transcoding the web content into a form that can be used on DMB.
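
The aggregation step, reading news items from RSS-based XML, can be sketched with Python's standard XML parser. The sketch assumes feeds follow the common RSS 2.0 `channel`/`item` layout and keeps only `title`, `link`, and `pubDate`; `parse_rss_items` is a hypothetical helper, and the transcoding into a DMB data-broadcasting application is not shown because the abstract does not describe it.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_rss_items(xml_text: str) -> List[Dict[str, str]]:
    """Extract title, link and pubDate from every <item> of an RSS 2.0 feed.

    Missing or empty child elements are returned as empty strings.
    """
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "pubDate": item.findtext("pubDate", default="").strip(),
        }
        for item in root.iter("item")
    ]
```

Only `<item>` elements are read, so the channel's own `<title>` is not mistaken for a news story.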
