• Title/Summary/Keyword: web classification

Search Result 543, Processing Time 0.019 seconds

An Automatic Web Page Classification System Using Meta-Tag (메타 태그를 이용한 자동 웹페이지 분류 시스템)

  • Kim, Sang-Il;Kim, Hwa-Sung
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.38B no.4
    • /
    • pp.291-297
    • /
    • 2013
  • Recently, the amount of web pages, which include various information, has been drastically increased according to the explosive increase of WWW usage. Therefore, the need for web page classification arose in order to make it easier to access web pages and to make it possible to search the web pages through the grouping. Web page classification means the classification of various web pages that are scattered on the web according to the similarity of documents or the keywords contained in the documents. Web page classification method can be applied to various areas such as web page searching, group searching and e-mail filtering. However, it is impossible to handle the tremendous amount of web pages on the web by using the manual classification. Also, the automatic web page classification has the accuracy problem in that it fails to distinguish the different web pages written in different forms without classification errors. In this paper, we propose the automatic web page classification system using meta-tag that can be obtained from the web pages in order to solve the inaccurate web page retrieval problem.

A Study on Function Requirements for the Development of a Web Version of Korean Decimal Classification (한국십진분류법 웹 버전 개발을 위한 기능요건 연구)

  • Jeong-Yun Yang
    • Journal of the Korean Society for information Management
    • /
    • v.40 no.4
    • /
    • pp.147-165
    • /
    • 2023
  • New technologies representing the Fourth Industrial Revolution are already being realized in library services. There is not, however, active research on measures to increase work efficiency by introducing a new technology in the work of "classification" that is part of the traditional librarian jobs they should continue in the future. The Dewey Decimal Classification (DDC) has not issued a print version since 2018. This study analyzes cases of WebDewey, Classification Web, and UDC Online. The functions required for the development of the Korean Decimal Classification (KDC) web version were derived, and the final functions suitable for the development of the KDC web version were proposed through AHP analysis.

Document Classification Model Using Web Documents for Balancing Training Corpus Size per Category

  • Park, So-Young;Chang, Juno;Kihl, Taesuk
    • Journal of information and communication convergence engineering
    • /
    • v.11 no.4
    • /
    • pp.268-273
    • /
    • 2013
  • In this paper, we propose a document classification model using Web documents as a part of the training corpus in order to resolve the imbalance of the training corpus size per category. For the purpose of retrieving the Web documents closely related to each category, the proposed document classification model calculates the matching score between word features and each category, and generates a Web search query by combining the higher-ranked word features and the category title. Then, the proposed document classification model sends each combined query to the open application programming interface of the Web search engine, and receives the snippet results retrieved from the Web search engine. Finally, the proposed document classification model adds these snippet results as Web documents to the training corpus. Experimental results show that the method that considers the balance of the training corpus size per category exhibits better performance in some categories with small training sets.

Improving Malicious Web Code Classification with Sequence by Machine Learning

  • Paik, Incheon
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.3 no.5
    • /
    • pp.319-324
    • /
    • 2014
  • Web applications make life more convenient. Many web applications have several kinds of user input (e.g. personal information, a user's comment of commercial goods, etc.) for the activities. On the other hand, there are a range of vulnerabilities in the input functions of Web applications. Malicious actions can be attempted using the free accessibility of many web applications. Attacks by the exploitation of these input vulnerabilities can be achieved by injecting malicious web code; it enables one to perform a variety of illegal actions, such as SQL Injection Attacks (SQLIAs) and Cross Site Scripting (XSS). These actions come down to theft, replacing personal information, or phishing. The existing solutions use a parser for the code, are limited to fixed and very small patterns, and are difficult to adapt to variations. A machine learning method can give leverage to cover a far broader range of malicious web code and is easy to adapt to variations and changes. Therefore, this paper suggests the adaptable classification of malicious web code by machine learning approaches for detecting the exploitation user inputs. The approach usually identifies the "looks-like malicious" code for real malicious code. More detailed classification using sequence information is also introduced. The precision for the "looks-like malicious code" is 99% and for the precise classification with sequence is 90%.

A Classification of Web Business Models (웹 비즈니스 모델의 분류에 관한 연구)

  • Jeong, Hai-Sung;Lee, Yang-Kyu
    • Journal of Applied Reliability
    • /
    • v.10 no.3
    • /
    • pp.183-197
    • /
    • 2010
  • Web businesses are one of the most dynamic industries where lots of new business models are emerging while the other obsoleted ones are fading away almost every day. It is, therefore, difficult to establish a classification scheme for ever-changing web businesses. Previous researches on business models focus on classifying web businesses in one dimension which made some web sites difficult to fit into one category. We propose two dimensional classification scheme based on the means and the sources of revenue. The two dimensional classification provides more clear and broad perspectives of the web businesses and ways to identify web sites in combinations of several business models.

The Informative Support and Emotional Support Classification Model for Medical Web Forums using Text Analysis (의료 웹포럼에서의 텍스트 분석을 통한 정보적 지지 및 감성적 지지 유형의 글 분류 모델)

  • Woo, Jiyoung;Lee, Min-Jung;Ku, Yungchang
    • Journal of Information Technology Services
    • /
    • v.11 no.sup
    • /
    • pp.139-152
    • /
    • 2012
  • In the medical web forum, people share medical experience and information as patients and patents' families. Some people search medical information written in non-expert language and some people offer words of comport to who are suffering from diseases. Medical web forums play a role of the informative support and the emotional support. We propose the automatic classification model of articles in the medical web forum into the information support and emotional support. We extract text features of articles in web forum using text mining techniques from the perspective of linguistics and then perform supervised learning to classify texts into the information support and the emotional support types. We adopt the Support Vector Machine (SVM), Naive-Bayesian, decision tree for automatic classification. We apply the proposed model to the HealthBoards forum, which is also one of the largest and most dynamic medical web forum.

The Basic Concepts Classification as a Bottom-Up Strategy for the Semantic Web

  • Szostak, Rick
    • International Journal of Knowledge Content Development & Technology
    • /
    • v.4 no.1
    • /
    • pp.39-51
    • /
    • 2014
  • The paper proposes that the Basic Concepts Classification (BCC) could serve as the controlled vocabulary for the Semantic Web. The BCC uses a synthetic approach among classes of things, relators, and properties. These are precisely the sort of concepts required by RDF triples. The BCC also addresses some of the syntactic needs of the Semantic Web. Others could be added to the BCC in a bottom-up process that carefully evaluates the costs, benefits, and best format for each rule considered.

Context-based Web Application Design (컨텍스트 기반의 웹 애플리케이션 설계 방법론)

  • Park, Jin-Soo
    • The Journal of Society for e-Business Studies
    • /
    • v.12 no.2
    • /
    • pp.111-132
    • /
    • 2007
  • Developing and managing Web applications are more complex than ever because of their growing functionalities, advancing Web technologies, increasing demands for integration with legacy applications, and changing content and structure. All these factors call for a more inclusive and comprehensive Web application design method. In response, we propose a context-based Web application design methodology that is based on several classification schemes including a Webpage classification, which is useful for identifying the information delivery mechanism and its relevant Web technology; a link classification, which reflects the semantics of various associations between pages; and a software component classification, which is helpful for pinpointing the roles of various components in the course of design. The proposed methodology also incorporates a unique Web application model comprised of a set of information clusters called compendia, each of which consists of a theme, its contextual pages, links, and components. This view is useful for modular design as well as for management of ever-changing content and structure of a Web application. The proposed methodology brings together all the three classification schemes and the Web application model to arrive at a set of both semantically cohesive and syntactically loose-coupled design artifacts.

  • PDF

Performance Improvement of Web Document Classification through Incorporation of Feature Selection and Weighting (특징선택과 특징가중의 융합을 통한 웹문서분류 성능의 개선)

  • Lee, Ah-Ram;Kim, Han-Joon;Man, Xuan
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.13 no.4
    • /
    • pp.141-148
    • /
    • 2013
  • Automated classification systems which utilize machine learning develops classification models through learning process, and then classify unknown data into predefined set of categories according to the model. The performance of machine learning-based classification systems relies greatly upon the quality of features composing classification models. For textual data, we can use their word terms and structure information in order to generate the set of features. Particularly, in order to extract feature from Web documents, we need to analyze tag and hyperlink information. Recent studies on Web document classification focus on feature engineering technology other than machine learning algorithms themselves. Thus this paper proposes a novel method of incorporating feature selection and weighting which can improves classification models effectively. Through extensive experiments using Web-KB document collections, the proposed method outperforms conventional ones.

Web Image Classification using Semantically Related Tags and Image Content (의미적 연관태그와 이미지 내용정보를 이용한 웹 이미지 분류)

  • Cho, Soo-Sun
    • Journal of Internet Computing and Services
    • /
    • v.11 no.3
    • /
    • pp.15-24
    • /
    • 2010
  • In this paper, we propose an image classification which combines semantic relations of tags with contents of images to improve the satisfaction of image retrieval on application domains as huge image sharing sites. To make good use of image retrieval or classification algorithms on huge image sharing sites as Flickr, they are applicable to real tagged Web images. To classify the Web images by 'bag of visual word' based image content, our algorithm includes training the category model by utilizing the preliminary retrieved images with semantically related tags as training data and classifying the test images based on PLSA. In the experimental results on the Flickr Web images, the proposed method produced the better precision and recall rates than those from the existing method using tag information.