• Title/Summary/Keyword: Web Mining

A New Approach to Web Data Mining Based on Cloud Computing

  • Zhu, Wenzheng;Lee, Changhoon
    • Journal of Computing Science and Engineering / v.8 no.4 / pp.181-186 / 2014
  • Web data mining aims at discovering useful knowledge from various Web resources. Companies, organizations, and individuals alike increasingly gather information through Web data mining and use it in their own best interest. In science, cloud computing is used as a synonym for distributed computing over a network: it relies on sharing resources to achieve coherence and economies of scale, much like a utility delivered over a network, and it means the ability to run a program or application on many connected computers at the same time. In this paper, we propose a new system framework based on the Hadoop platform to collect useful information from Web resources. The framework is built on the Map/Reduce programming model of cloud computing. We also propose a new data mining algorithm for use within this framework. Finally, we demonstrate the feasibility of the approach through a simulation experiment.
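The abstract names the Map/Reduce programming model but does not describe the authors' algorithm. The following is only a minimal single-process sketch of the map, shuffle, and reduce phases, applied to counting page requests. The log format (`<user> <url>`) and all function names are assumptions made for illustration; nothing here uses Hadoop or reproduces the paper's method.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit (url, 1) for one log line of the assumed form '<user> <url>'."""
    parts = record.split()
    if len(parts) == 2:  # silently skip malformed lines
        yield parts[1], 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one URL."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence inside a single process."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["u1 /a", "u2 /a", "u1 /b", "malformed"]))
```

On a real cluster the map and reduce functions would run on different machines and the framework would perform the shuffle; the sketch only shows how the three phases divide the work.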

User modeling based on fuzzy category and interest for web usage mining

  • Lee, Si-Hun;Lee, Jee-Hyong
    • International Journal of Fuzzy Logic and Intelligent Systems / v.5 no.1 / pp.88-93 / 2005
  • Web usage mining is a research field concerned with finding potentially useful and valuable information in web log files. A web log file is a simple list of the pages that users access, so it is not easy to determine a user's current field of interest from it. This paper presents a web usage mining method for finding users' current interests based on fuzzy categories. We consider not only how many times a user visits pages but also when the visits occur. We describe a user's current interest as a fuzzy interest degree for each category. Based on fuzzy categories and fuzzy interest degrees, we also propose a method for clustering users according to their interests for user modeling. For user clustering, we define a category vector space. Experiments show that our method properly reflects the time factor of users' web visits as well as the number of visits.
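The abstract does not give the paper's formulas, so the sketch below is only one plausible reading of "how many times" and "when". Each visit contributes a weight that halves every `half_life` days, the weights are normalized to a degree in [0, 1] per category, and users are compared in the resulting category vector space by cosine similarity. The decay function, the normalization, the similarity measure, and every name are assumptions.

```python
import math


def interest_degrees(visits, now, half_life=7.0):
    """Return category -> interest degree in [0, 1].

    visits is a list of (category, day) pairs; recent visits weigh more
    because each visit's weight halves every half_life days (an assumed rule).
    """
    raw = {}
    for category, day in visits:
        age = now - day
        raw[category] = raw.get(category, 0.0) + 0.5 ** (age / half_life)
    peak = max(raw.values(), default=0.0)
    if peak == 0.0:
        return {}
    return {category: weight / peak for category, weight in raw.items()}


def cosine(u, v):
    """Cosine similarity of two sparse category vectors stored as dicts."""
    dot = sum(u[c] * v.get(c, 0.0) for c in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    alice = interest_degrees([("sports", 0), ("news", 7)], now=7)
    bob = interest_degrees([("news", 6), ("news", 7)], now=7)
    print(alice, bob, cosine(alice, bob))
```

A clustering step could then group users whose vectors have high similarity; which clustering algorithm the authors used is not stated in the abstract.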

Web Navigation Mining by Integrating Web Usage Data and Hyperlink Structures (웹 사용 데이타와 하이퍼링크 구조를 통합한 웹 네비게이션 마이닝)

  • Gu Heummo;Choi Joongmin
    • Journal of KIISE:Software and Applications / v.32 no.5 / pp.416-427 / 2005
  • Web navigation mining is a method of discovering Web navigation patterns by analyzing Web access log data. However, the log data is known to contain noise that leads to incorrect recognition of a user's navigation path on the Web's hyperlink structure. As a result, previous Web navigation mining systems that relied solely on the log data have not performed well at discovering correct navigation patterns efficiently, mainly because of the complex pre-processing procedure. To resolve this problem, this paper proposes a technique that combines the Web's hyperlink structure information with the Web access log data to discover navigation patterns correctly and efficiently. Our implemented Web navigation mining system, called SPMiner, produces a WebTree from the hyperlink structure of a Web site, which is used to eliminate possible noise in the Web log data caused by the user's abnormal navigational activities. By using the structure of the Web, SPMiner remarkably reduces the pre-processing overhead and, as a result, can analyze the user's search patterns efficiently.
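How SPMiner builds and applies its WebTree is not specified in the abstract. As a rough illustration of the general idea of checking logged transitions against the site's hyperlink structure, the sketch below splits a logged session wherever two consecutive requests are not connected by a hyperlink. The `links` mapping and the function name are hypothetical, and this is not the SPMiner algorithm.

```python
def split_by_links(session, links):
    """Split a logged page sequence into paths in which every step follows a hyperlink.

    session: list of page URLs in request order.
    links:   dict mapping a page to the set of pages it links to (assumed input).
    """
    paths, current = [], []
    for page in session:
        # A transition with no matching hyperlink is treated as a break in navigation.
        if current and page not in links.get(current[-1], set()):
            paths.append(current)
            current = []
        current.append(page)
    if current:
        paths.append(current)
    return paths


if __name__ == "__main__":
    site = {"/": {"/a", "/b"}, "/a": {"/a1"}}
    print(split_by_links(["/", "/a", "/a1", "/b"], site))
```

In this example the jump from `/a1` to `/b` has no corresponding link, so the session is split into two paths.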

An Extended Dynamic Web Page Recommendation Algorithm Based on Mining Frequent Traversal Patterns (빈발 순회패턴 탐사에 기반한 확장된 동적 웹페이지 추천 알고리즘)

  • Lee KeunSoo;Lee Chang Hoon;Yoon Sun-Hee;Lee Sang Moon;Seo Jeong Min
    • Journal of Korea Multimedia Society / v.8 no.9 / pp.1163-1176 / 2005
  • The Web is the largest distributed information space, but an individual's capacity to read and digest content is essentially fixed. In this environment, mining traversal patterns is an important problem in Web mining, with a host of application domains including system design and information services. Conventional traversal pattern mining systems use the inter-page associations within sessions with only a very restricted mechanism (based on vectors or matrices) for generating frequent K-Pagesets. We extend a family of novel algorithms (termed WebPR - Web Page Recommend) for mining frequent traversal patterns and then the pagesets to recommend. We add a WebPR(A) algorithm to the WebPR family and propose a new winWebPR(T) algorithm that introduces a window concept on top of WebPR(T). Our experiments with the two extended algorithms on two real data sets, from the LadyAsiana and KBS media server sites, clearly show that our method outperforms conventional methods.

Analysis of Web Log Using Clementine Data Mining Solution (클레멘타인 데이터마이닝 솔루션을 이용한 웹 로그 분석)

  • Kim, Jae-Kyeong;Lee, Kun-Chang;Chung, Nam-Ho;Kwon, Soon-Jae;Cho, Yoon-Ho
    • Information Systems Review / v.4 no.1 / pp.47-67 / 2002
  • Since the mid-1990s, most firms that use the web as a communication vehicle with customers have been keenly interested in web log files, which contain many of the traces customers leave on the web, such as IP address, reference address, cookie file, and duration time. An appropriate analysis of the web log file therefore leads to an understanding of customers' behavior on the web, and the results can serve as effective marketing information for locating potential target customers. In this study, we introduce a web mining technique using Clementine from SPSS and analyze a set of real web log data from an Internet hub site. We also suggest a process for building various strategies based on the web mining results.

Merchandise Management Using Web Mining in Business To Customer Electronic Commerce (기업과 소비자간 전자상거래에서의 웹 마이닝을 이용한 상품관리)

  • 임광혁;홍한국;박상찬
    • Journal of Intelligence and Information Systems / v.7 no.1 / pp.97-121 / 2001
  • Until now, one advantage of the cyber market has been thought to be that it can display and sell goods virtually, because it does not need to maintain expensive physical shops and inventories. In a highly competitive environment, however, a business model that does away with goods in stock must be modified. As the case of AMAZON shows, leading companies already consider merchandise management a critical success factor in their business model. That is, merchandise management is a way to compete in a highly competitive environment, just as in the traditional retail market. Unlike the traditional retail market, the cyber market has not only past sales data but also pre-sale web log data, which records the paths customers follow as they search and purchase. So if we can correctly analyze the characteristics of pre-sale patterns using web log data, we can better prepare for potential customers and manage inventory and merchandise effectively. We introduce a systematic analysis method that uses web mining, the application of data mining techniques to the World Wide Web, to extract data useful for merchandise management tasks such as demand forecasting and merchandise evaluation and selection. We use various web mining techniques, such as clustering, association rule mining, and sequential pattern mining.

Web Recommendation Mechanism Based on Case-Based Reasoning and Web Data Mining

  • Kim, Jin-Sung
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2002.12a / pp.443-446 / 2002
  • In this research, we suggest a Web-based hybrid recommendation mechanism using CBR (Case-Based Reasoning) and web data mining. Data mining is used as an efficient mechanism for reasoning about the relationships among goods, customers' preferences, and future behavior. CBR systems are normally used for problems in which it is difficult to define rules. We use CBR as an AI tool to recommend similar purchase cases. Web log data gathered from a real-world Internet shopping mall are used to illustrate the quality of the proposed mechanism. The results show that the hybrid recommendation mechanism based on CBR and web data mining can reflect both association knowledge and purchase information about former customers.

User Access Patterns Discovery based on Apriori Algorithm under Web Logs (웹 로그에서의 Apriori 알고리즘 기반 사용자 액세스 패턴 발견)

  • Ran, Cong-Lin;Joung, Suck-Tae
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.12 no.6 / pp.681-689 / 2019
  • Web usage pattern discovery is an advanced technique that uses Web log data, and it is a specific application of data mining technology to Web logs. Educational Data Mining is the application of data mining techniques to educational data (such as university Web logs and data from e-learning, adaptive hypermedia, and intelligent tutoring systems), and its objective is to analyze these types of data in order to resolve educational research issues. In this paper, the Web log data of a university are the object of data mining. Using database OLAP technology, the Web log data are preprocessed into a format suitable for data mining, and the results are stored in MSSQL. Basic statistics and analysis are also computed from the processed Web log records. In addition, we introduce the Apriori algorithm for Web usage pattern mining and its implementation process, develop an Apriori program in a Python development environment, report the performance of the algorithm, and mine Web user access patterns. The results have important theoretical significance for applying such patterns in the development of teaching systems. Future research will explore improving the Apriori algorithm in a distributed computing environment.
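The authors' Python program is not reproduced in the abstract. The block below is a compact, textbook-style sketch of Apriori over sets of pages visited per session, using an absolute support count. The session data, page names, and function name are invented for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(pages): support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of sessions that contain every page in the itemset.
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


if __name__ == "__main__":
    sessions = [
        {"/home", "/courses", "/grades"},
        {"/home", "/courses"},
        {"/home", "/library"},
        {"/courses", "/grades"},
    ]
    for pages, count in apriori(sessions, min_support=2).items():
        print(sorted(pages), count)
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is discarded without counting its support if any of its subsets is already known to be infrequent.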

Development of Active Data Mining Component for Web Database Applications (웹 데이터베이스 응용을 위한 액티브데이터마이닝 컴포넌트 개발)

  • Choi, Yong-Goo
    • Journal of Information Technology Applications and Management / v.15 no.2 / pp.1-14 / 2008
  • Over the last decade, the remarkable growth of information technology driven by the progress of e-business has made it necessary to develop software for active data mining that discovers hidden predictive information about business trends and behavior in very large databases. This paper therefore develops an active mining object (ADMO) component, which provides real-time predictive information from web databases. The ADMO component extends the ADO (ActiveX Data Object) component into an active data mining component based on COM (Component Object Model) for the application program interface (API). The ADMO component was developed using the Windows Script Component (WSC), which is based on XML (eXtensible Markup Language). To investigate the application environments and practical schemes of the ADMO component, experiments on diverse practical applications were performed. The results confirmed that the ADMO component could effectively extract analytic information for classification and aggregation from very large databases for Web services.

Design and Implementation of a Web Mining System Using WMSQL (WMSQL을 이용한 Web Mining System의 설계 및 구현)

  • 최성경;박민호;이근호;백인구;한기준
    • Proceedings of the Korean Information Science Society Conference / 2000.04b / pp.166-168 / 2000
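  • As the World-Wide Web (WWW) has grown, information retrieval methodologies for effectively finding the information users want on the Web have become an important issue among researchers, and a number of commercial information retrieval systems have emerged on that basis. However, because of the unstructured and diverse nature of Web data, the diversity of users, and problems with the quality and quantity of information, these systems have difficulty obtaining information that matches the user's intent and needs. Moreover, they merely obtain and use general information from the large amount of data on the Web and lack effective knowledge discovery or management functions. In this paper, we analyze the problems of earlier information retrieval systems and, to address them, design and implement a Web mining system based on related research on Web mining, a new approach to knowledge discovery on the Web. In particular, to convey the user's intent accurately, we develop Web content mining that performs Web mining directly on the content of Web documents using WMSQL, a query language similar in form to conventional SQL, so that meaningful and implicit knowledge can be extracted from the Web's unstructured data.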
