• Title/Summary/Keyword: Web news page

Design and Adaptation for Internet News Data Extraction Middleware(INDEM) System

  • Sun, Bok-Keun
    • Journal of the Korea Society of Computer and Information / v.21 no.4 / pp.55-62 / 2016
  • In this paper, we propose the INDEM (Internet News Data Extraction Middleware) system for removing unnecessary data from internet news. Although data on the internet can be used in various fields, such as a data source for IR (Information Retrieval), data mining, and knowledge information services, it contains a lot of unnecessary information. Removing this unnecessary data is a problem that must be solved before studying knowledge-based information services built on web page data. The INDEM system parses HTML, explores the XPaths, and performs the analysis. Users can use INDEM simply by implementing an abstract class that INDEM provides, and can then obtain the analysis information. Through this process, the INDEM system delivers analysis information, including the main content of the news site, to users. In this paper, the INDEM system was deployed as both a stand-alone system and a web service system, and it was evaluated on 16 news sites. The results show that the performance of the INDEM system is affected more by the size of the HTML source data and the complexity of the HTML grammar used than by the size of the main news data.

A Study of Main Contents Extraction from Web News Pages based on XPath Analysis

  • Sun, Bok-Keun
    • Journal of the Korea Society of Computer and Information / v.20 no.7 / pp.1-7 / 2015
  • Although data on the internet can be used in various fields, such as a data source for IR (Information Retrieval), data mining, and knowledge information services, it contains a lot of unnecessary information. Removing this unnecessary data is a problem that must be solved before studying knowledge-based information services built on web page data. In this paper, we address the problem by implementing XTractor (XPath Extractor). Since XPath is used to navigate the elements and attributes of an XML document, XTractor carries out an XPath analysis. XTractor extracts the main text by parsing the HTML, grouping XPaths, and detecting the XPath that contains the main data. In experiments on a large amount of data, the recognition and precision rates were 97.9% and 93.9%, respectively, apart from a few exceptional cases, confirming that the main text of news pages can be extracted properly.
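
The parse, group, and detect steps named in this abstract can be illustrated with a minimal sketch. It is not the XTractor implementation: the simplified paths (tag names only, no indices), the "most text wins" detection rule, and the names `PathTextCollector` and `extract_main_text` are all assumptions made for illustration, using only Python's standard `html.parser` module.

```python
from collections import defaultdict
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # tags with no closing tag
SKIP_TAGS = {"script", "style"}                           # never part of the main text


class PathTextCollector(HTMLParser):
    """Groups visible text by a simplified XPath (tag names only, no indices)."""

    def __init__(self):
        super().__init__()
        self.stack = []                  # currently open tags
        self.groups = defaultdict(list)  # simplified XPath -> list of text fragments

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; unmatched end tags are ignored.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and not (set(self.stack) & SKIP_TAGS):
            self.groups["/" + "/".join(self.stack)].append(text)


def extract_main_text(html):
    """Returns (path, text) for the path group holding the most text (assumed heuristic)."""
    collector = PathTextCollector()
    collector.feed(html)
    collector.close()
    if not collector.groups:
        return "", ""
    best = max(collector.groups, key=lambda p: sum(len(t) for t in collector.groups[p]))
    return best, " ".join(collector.groups[best])


SAMPLE_HTML = """<html><body>
<div id="nav"><a href="/">Home</a><a href="/world">World</a></div>
<div id="article"><p>First paragraph of the news story, long enough to dominate.</p>
<p>Second paragraph with more details about the event.</p></div>
<div id="ad"><span>Buy now</span></div>
</body></html>"""

if __name__ == "__main__":
    path, text = extract_main_text(SAMPLE_HTML)
    print(path)
    print(text)
```

On the sample page, the two article paragraphs share one simplified path and outweigh the navigation and advertisement groups. A real news page would need a more careful detection rule than total text length.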

Main Content Extraction from Web Pages Based on Node Characteristics

  • Liu, Qingtang;Shao, Mingbo;Wu, Linjing;Zhao, Gang;Fan, Guilin;Li, Jun
    • Journal of Computing Science and Engineering / v.11 no.2 / pp.39-48 / 2017
  • Main content extraction of web pages is widely used in search engines, web content aggregation, and mobile Internet browsing. However, web pages include a mass of irrelevant information such as advertisements, irrelevant navigation, and junk content. Such irrelevant information reduces the efficiency of web content processing in content-based applications. The purpose of this paper is to propose an automatic main content extraction method for web pages. In this method, we use two indicators to describe the characteristics of web pages: text density and hyperlink density. Based on the continuous distribution of similar content on a page, we use an estimation algorithm to judge whether a node is a content node or a noisy node from the characteristics of the node and its neighboring nodes. This algorithm enables us to filter out advertisement nodes and irrelevant navigation. Experimental results on 10 news websites revealed that our algorithm could achieve a 96.34% average acceptable rate.
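
The two indicators and the neighbor-based judgment can be sketched as follows. The abstract does not give the formulas, so everything concrete here is an assumption: text density is taken as text characters per tag, hyperlink density as the share of text inside links, and the thresholds, the `neighbor_weight` parameter, and the `Block` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text_chars: int  # characters of visible text in the node
    link_chars: int  # characters of that text that sit inside <a> elements
    tag_count: int   # number of tags inside the node


def text_density(block):
    # Assumed definition: characters of text per tag.
    return block.text_chars / max(block.tag_count, 1)


def link_density(block):
    # Assumed definition: fraction of the node's text that is link text.
    return block.link_chars / max(block.text_chars, 1)


def raw_score(block, min_text_density=20.0, max_link_density=0.3):
    # Hypothetical thresholds: dense text with few links looks like content.
    is_content = (text_density(block) >= min_text_density
                  and link_density(block) <= max_link_density)
    return 1.0 if is_content else 0.0


def classify(blocks, neighbor_weight=0.25):
    """Labels each block as content (True) or noise (False).

    Each block's own score is blended with its neighbors' scores, reflecting
    the idea that similar content tends to be contiguous on a page.
    """
    raw = [raw_score(b) for b in blocks]
    labels = []
    for i, own in enumerate(raw):
        left = raw[i - 1] if i > 0 else own
        right = raw[i + 1] if i + 1 < len(raw) else own
        smoothed = (1 - 2 * neighbor_weight) * own + neighbor_weight * (left + right)
        labels.append(smoothed >= 0.5)
    return labels


PAGE = [
    Block(text_chars=40, link_chars=40, tag_count=8),    # navigation bar
    Block(text_chars=400, link_chars=10, tag_count=2),   # article paragraph
    Block(text_chars=30, link_chars=0, tag_count=2),     # short caption inside the article
    Block(text_chars=350, link_chars=0, tag_count=1),    # article paragraph
    Block(text_chars=25, link_chars=25, tag_count=3),    # advertisement
    Block(text_chars=60, link_chars=50, tag_count=10),   # footer links
]

if __name__ == "__main__":
    print(classify(PAGE))
```

In this sample the short caption fails the density threshold by itself but is kept as content because both of its neighbors are content blocks, while the navigation, advertisement, and footer blocks are filtered out.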

Realization of a Remote Management System for Process Inspection of Chip-Mounter

  • Lim, Sun-Jong;Joon Lyon
    • Institute of Control, Robotics and Systems: Conference Proceedings / 2002.10a / pp.91.4-91 / 2002
  • Today, the Internet offers WWW (World Wide Web), remote control, file transfer, and e-mail services. Among these services, the WWW takes a large share because of its convenient GUI, easy information search, and unlimited information registration. WWW services bring convenience to daily life through goods purchasing, information search, real-time news, internet TV, and medical diagnosis. A Remote Monitoring Server (RMS) system that uses the internet and the WWW is constructed for a chip mounter. The hardware base consists of the RMS, the chip mounter, and C/S (Customer Service). The software includes a DBMS and various modules in the server home page. The web browser provides the product number, bad product number, troubl...

Content-based Recommendation Based on Social Network for Personalized News Services (개인화된 뉴스 서비스를 위한 소셜 네트워크 기반의 콘텐츠 추천기법)

  • Hong, Myung-Duk;Oh, Kyeong-Jin;Ga, Myung-Hyun;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.57-71 / 2013
  • Over a billion people in the world generate news minute by minute. Some news can be anticipated, but most news arises from unexpected events such as natural disasters, accidents, and crimes. People spend much time following the huge amount of news delivered by many media outlets because they want to understand what is happening now, to predict what might happen in the near future, and to share and discuss the news. People make better daily decisions by obtaining useful information from the news they see. However, it is difficult for people to choose news that suits them and to obtain useful information from it, because there are so many news media, such as portal sites and broadcasters, and most news articles consist of gossip and breaking news. User interest changes over time, and many people have no interest in outdated news. A personalized news service therefore needs to reflect users' recent interests, which means that it should manage user profiles dynamically. In this paper, a content-based news recommendation system is proposed to provide a personalized news service. Personalization requires the user's personal information, so a social network service is used to extract user information. The proposed system constructs a dynamic user profile based on recent user information from Facebook, one of the social network services. This user information contains personal information, recent articles, and Facebook Page information. Facebook Pages are used by businesses, organizations, and brands to share their content and connect with people, and Facebook users can add a Facebook Page to indicate their interest in it. The proposed system uses this Page information to create the user profile and to match user preferences to news topics. However, some Pages cannot be matched directly to a news topic, because a Page deals with an individual object and does not provide topic information suitable for news. Freebase, a large collaborative database of well-known people, places, and things, is used to match Pages to news topics through the hierarchy information of its objects. By using the recent Page information and articles of Facebook users, the proposed system can maintain a dynamic user profile. The generated user profile is used to measure user preferences for news. To generate a news profile, the news categories predefined by the news media are used, and keywords of news articles are extracted after analyzing the news content, including the title, category, and script. The TF-IDF technique, which reflects how important a word is to a document in a corpus, is used to identify the keywords of each news article. The same format is used for user profiles and news profiles so that the similarity between user preferences and news can be measured efficiently. The proposed system calculates the similarity between every user profile and every news profile. Existing similarity calculations in the vector space model do not cover synonyms, hypernyms, and hyponyms, because they handle only the given words. The proposed system applies WordNet to the similarity calculation to overcome this limitation. The top-N news articles with the highest similarity values for a target user are recommended to that user. To evaluate the proposed news recommendation system, user profiles were generated from Facebook accounts with the participants' consent, and a Web crawler was implemented to extract news information from PBS, a non-profit public broadcasting television network in the United States, and to construct news profiles. We compare the performance of the proposed method with that of two benchmark algorithms: a traditional method based on TF-IDF, and the 6Sub-Vectors method, which divides the points used to obtain keywords into six parts. Experimental results demonstrate that, in terms of the prediction error of recommended news, the proposed system provides useful news to users by applying their social network information and WordNet functions.
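
The TF-IDF keyword weighting and top-N similarity ranking described in this abstract can be sketched in a few lines. This corresponds roughly to the plain TF-IDF baseline, not to the proposed system: the WordNet and Freebase steps are omitted, cosine similarity is an assumed choice of similarity measure, and the function names and sample data are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user_profile, news_tokens, top_n=2):
    """Returns indices of the top_n articles most similar to the user profile."""
    vectors = tf_idf_vectors(news_tokens)
    scored = sorted(((cosine(user_profile, v), i) for i, v in enumerate(vectors)),
                    reverse=True)
    return [i for score, i in scored[:top_n] if score > 0]


# Hypothetical data: tokenized articles and a keyword-weight user profile
# in the same {term: weight} format as the news profiles.
NEWS = [
    ["election", "senate", "vote", "budget"],
    ["hurricane", "storm", "coast", "evacuation"],
    ["senate", "budget", "tax", "vote"],
    ["football", "league", "final", "goal"],
]
USER = {"senate": 1.0, "tax": 0.5, "storm": 0.2}

if __name__ == "__main__":
    print(recommend(USER, NEWS, top_n=2))
```

Because matching is on exact terms only, a user interested in "storm" gets no credit for an article about a "hurricane" unless the word itself appears, which is the limitation the abstract says WordNet is applied to overcome.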

Personalized Search Service in Semantic Web (시멘틱 웹 환경에서의 개인화 검색)

  • Kim, Je-Min;Park, Young-Tack
    • The KIPS Transactions:PartB / v.13B no.5 s.108 / pp.533-540 / 2006
  • The semantic web environment promises semantic search of heterogeneous data from distributed web pages. As semantic search can return an overwhelming number of results to users, the need for appropriate personalized ranking schemes increases. Culture Finder helps semantic web agents obtain personalized culture information. It extracts metadata for each web page (culture news, culture performances, culture exhibitions), performs semantic search, and computes result ranking points based on the user profile. To work efficiently, Culture Finder uses five major techniques: a machine learning technique for generating user profiles from user search behavior and the metadata repository, an efficient semantic search system for semantic web agents, query analysis for representing queries and query results, a personalized ranking method to provide suitable search results to the user, and an upper ontology for generating metadata. In this paper, we also present the structure used in Culture Finder to support the personalized search service.

Wrapper-based Economy Data Collection System Design And Implementation (래퍼 기반 경제 데이터 수집 시스템 설계 및 구현)

  • Piao, Zhegao;Gu, Yeong Hyeon;Yoo, Seong Joon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.05a / pp.227-230 / 2015
  • To analyze and predict economic trends, it is necessary to collect specific economic news and stock data. A typical Web crawler analyzes page content, collects documents, and extracts URLs automatically. There are also crawlers that collect only documents on a particular topic. To collect economic news from a particular Web site, we need a crawler that can directly analyze the site's structure and gather data from it, which calls for a wrapper-based web crawler design. In this paper, we design a crawler wrapper for a big-data-based economic news analysis system and implement it to collect data. With the wrapper-based crawler, we collect stock data and sales data from the USA auto market since 2000, as well as economic news data from the USA and South Korea. We determine how frequently the data on each site is updated and refresh the collected data periodically. We remove duplicate data and build a structured data set for subsequent analysis. The primary aim is to remove noise data such as advertising and public relations material.
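
The duplicate-removal step mentioned in this abstract could look like the sketch below. The abstract does not say how duplicates are detected, so matching on a hash of the normalized title and body is an assumption, as are the record fields `title` and `body` and the function names.

```python
import hashlib


def fingerprint(record):
    """Hashes the normalized title and body so trivially different copies collide."""
    combined = record["title"] + " " + record["body"]
    normalized = " ".join(combined.lower().split())  # lowercase, collapse whitespace
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(records):
    """Keeps the first occurrence of each distinct article, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = fingerprint(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical crawled records; the second is a re-crawl of the first
# that differs only in letter case and whitespace.
CRAWLED = [
    {"title": "Auto sales rise", "body": "Sales rose in March."},
    {"title": "AUTO SALES RISE", "body": "Sales  rose in March.\n"},
    {"title": "Stocks fall", "body": "Markets closed lower."},
]

if __name__ == "__main__":
    for record in deduplicate(CRAWLED):
        print(record["title"])
```

Exact hashing only catches copies that are identical after normalization. Near-duplicates with edited text would need a fuzzier comparison.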

Realtime Digital Information Display System based on Web Server (웹 서버 연동의 실시간 디지털 정보 디스플레이 시스템)

  • Lee, Se-Hoon
    • Journal of the Korea Society of Computer and Information / v.14 no.1 / pp.153-161 / 2009
  • In this paper, we design and implement a real-time DID (digital information display) system based on a web server that displays multimedia content. The content consists of weather and news information from internet web sites, and public relations or advertisement data on local systems. The DID system has a client/server architecture in which the server sends the client schedule information and multimedia content received from the web server, and the client displays the content according to the schedule information. The system can therefore ride out network faults for the time being. The system also provides a real-time web page filtering function that extracts partial information from specific web pages.

Information Sharing and Evaluation as Determinants of Spread of Fake News on Social Media among Nigerian Youths: Experience from COVID-19 Pandemic

  • Sulaiman, Kabir Alabi;Adeyemi, Ismail Olatunji;Ayegun, Ibrahim
    • International Journal of Knowledge Content Development & Technology / v.10 no.4 / pp.65-82 / 2020
  • This study examined information sharing and evaluation as determinants of the spread of fake news among Nigerian youths on social media, using experience from the COVID-19 pandemic. A descriptive survey design was adopted, and a Web-based questionnaire (Google Forms) was used to collect data. A total of 278 responses were collected from the participants, which represent the unit of analysis. The findings revealed that most Nigerian youths used Facebook, Twitter, WhatsApp, and Instagram to share information on COVID-19, while only a few used LinkedIn and other types of social media to do so. It was also found that building relationships with social media communities, enjoyment and risk taking, and political inclination influenced the sharing behavior of Nigerian youths during the COVID-19 pandemic. Results show that social media handles/pages were found sharing fake news on COVID-19, especially about treatment, vaccines, numbers of cases, and symptoms. The study concludes that there is a positive relationship between information evaluation and the spreading of fake news on COVID-19 among Nigerians. Information sharing and evaluation should be done with the utmost level of objectivity and sincerity.

The Influence of the Introduction of Smart Phone on Using Portal Sites: An Exploratory Study by the Analysis on Smart Phone Users' Web Traffic (스마트폰 도입이 포털사이트 이용에 미친 영향: 스마트폰 이용자의 웹 트래픽 분석을 통한 탐색적 연구)

  • Kim, Wi-Geun
    • Korean journal of communication and information / v.64 / pp.109-135 / 2013
  • This study empirically verifies the influence of the introduction of smartphones on the use of portal sites, which were affected the most in the previous media environment. To this end, Web traffic data resulting from smartphone users' actual Web use were collected longitudinally and analyzed. The results are as follows. First, over the past two years, during which smartphone use spread and became habitual in earnest, the hours of portal site use decreased by about 15% and page views decreased by about 35%. Use of the community, news media, video, mobile, and game sections of portal sites declined. Second, portal sites account for a much larger share of smartphone Web use than of PC Web use: more than two thirds of smartphone Web traffic occurs on portal sites, compared with more than one third of PC Web traffic. On smartphones, the news media section is the most used portal site section. Third, since the introduction of smartphones, use of the news media, communication, and life sections of portal sites has greatly increased, while use of the community, mobile, and game sections has greatly decreased in the aggregate.
