• Title/Summary/Keyword: Web Mining

Discovering Web Page Association Rules & Evaluating Web Site Performance To Improve Web Site Structure (웹사이트 구조 개선을 위한 웹페이지 연관 규칙 발견과 웹사이트 성능 평가)

  • 김민정;박승수
    • Proceedings of the Korean Information Science Society Conference / 2001.10b / pp.46-48 / 2001
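  • Countless websites currently exist and provide services on the Web. Because users tend to choose websites that are easy to access and well organized, running a well-structured website is a survival strategy for that site and essential to retaining visitors. To this end, the web server log data that records users' visits to a website (hereafter, web log files) can be analyzed to identify users' browsing patterns and access tendencies, as well as information about errors occurring on the web server. In this paper, we perform log file analysis and website structure analysis through Web Usage Mining and Web Structure Mining, discover association relationships among pages and structural information about the website, and propose a way to improve the website's structure.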

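As a rough illustration of the page-association step described above, the sketch below counts pages that co-occur in user sessions and reports rules A → B with their support and confidence. The session data, thresholds, and the function name `page_rules` are hypothetical, and only pairwise rules are considered; the paper's actual procedure is not given in this summary.

```python
from collections import Counter
from itertools import combinations

def page_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Mine pairwise association rules A -> B from sessions of visited pages.

    support(A -> B)    = fraction of sessions containing both A and B
    confidence(A -> B) = sessions with A and B / sessions with A
    """
    n = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        pages = sorted(set(session))          # ignore repeat visits and order
        page_counts.update(pages)
        pair_counts.update(combinations(pages, 2))

    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):     # check both rule directions
            confidence = both / page_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical sessions reconstructed from a web log file
sessions = [
    ["/index", "/products", "/cart"],
    ["/index", "/products"],
    ["/index", "/about"],
    ["/products", "/cart"],
]
for lhs, rhs, s, c in page_rules(sessions):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

A rule such as `/cart -> /products` with high confidence suggests the two pages are visited together, which is the kind of evidence that could motivate adding or moving a link between them.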
Web Contents Mining System for Opinion Information Searching Engine (의견정보 검색엔진을 위한 웹 콘텐츠 마이닝 시스템)

  • Joo, Hae-Jong;Park, Young-Bae;Choi, Hae-Gil
    • The Journal of Information Technology / v.12 no.3 / pp.7-17 / 2009
  • This research designs an opinion extraction and analysis system based on statistical Web Mining of web contents, in which opinion data are automatically extracted and analyzed from web documents scattered across the many web sites on the Internet. The system also provides a search service that classifies opinions as positive or negative and supplies search results and statistical information. Users who want to find opinions can enter a specific keyword to easily see what others think. In addition, the system has the advantage of being implementable as a monitoring system.

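The abstract does not name its statistical model. As one common statistical approach to positive/negative classification, the sketch below trains a multinomial naive Bayes classifier with Laplace smoothing. The training sentences, the labels, whitespace tokenization, and the function names `train_nb` and `classify` are all assumptions for illustration, not the authors' method.

```python
import math
from collections import Counter

def train_nb(docs):
    """Train a multinomial naive Bayes model from (tokens, label) pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    doc_counts = Counter()    # label -> number of training documents
    vocab = set()
    for tokens, label in docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def classify(model, tokens):
    """Return the label with the highest log posterior (Laplace smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[label] / total_docs)   # log prior
        for t in tokens:
            if t in vocab:                                  # skip unseen words
                score += math.log((counts[t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled opinions (whitespace-tokenized)
training = [
    ("great phone love the screen".split(), "positive"),
    ("battery is great and fast".split(), "positive"),
    ("terrible battery and slow".split(), "negative"),
    ("screen broke terrible support".split(), "negative"),
]
model = train_nb(training)
print(classify(model, "love the fast screen".split()))
print(classify(model, "slow and terrible".split()))
```

Working in log space avoids floating-point underflow when many word probabilities are multiplied; a real opinion search engine would also need language-specific tokenization.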
A Process of Digital Design using Web-based CRM(eCRM) (웹기반 CRM(eCRM)을 이용한 디지털디자인 프로세스)

  • 이유리;양종열;정성환;오민권;이옥희
    • Archives of design research / v.14 no.4 / pp.109-116 / 2001
  • In recent years, the advent of information technology has transformed how design is done and how companies manage information about their customers. The large volumes of customer data made available by new information technology tools have created both opportunities and challenges for businesses seeking to apply that data and gain a competitive advantage. Under these conditions, an eCRM solution built on web data mining tools can surface hidden information (customer needs or preferences) that helps us understand customers better, while a systematic information management effort can channel that information into effective digital design content strategies. Therefore, in this study, after reviewing the definitions of web data mining and eCRM and developing a research program, we provide guidelines for digital design content through the eCRM solution program we developed.

Towards Improving Causality Mining using BERT with Multi-level Feature Networks

  • Ali, Wajid;Zuo, Wanli;Ali, Rahman;Rahman, Gohar;Zuo, Xianglin;Ullah, Inam
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.10 / pp.3230-3255 / 2022
  • Causality mining in NLP is a significant area of interest that benefits many everyday applications, including decision making, business risk management, question answering, future event prediction, scenario generation, and information retrieval. Mining such causalities from web sources has been a challenging, open problem for prior non-statistical and statistical techniques, which required hand-crafted linguistic patterns for feature engineering; these patterns depended on domain knowledge and demanded much human effort. Those studies focused on explicit causality mining and overlooked implicit, ambiguous, and heterogeneous causality. In contrast to statistical and non-statistical approaches, we present Bidirectional Encoder Representations from Transformers (BERT) integrated with Multi-level Feature Networks (MFN), called BERT+MFN, for causality recognition in noisy and informal web datasets without human-designed features. In our model, MFN consists of a three-column knowledge-oriented network (TC-KN), a bi-LSTM, and a Relation Network (RN) that mine causality information at the segment level, while BERT captures semantic features at the word level. We perform experiments on the Alternative Lexicalization (AltLexes) datasets. The experimental results show that our model outperforms baseline causality and text mining techniques.

Page Logging System for Web Mining Systems (웹마이닝 시스템을 위한 페이지 로깅 시스템)

  • Yun, Seon-Hui;O, Hae-Seok
    • The KIPS Transactions:PartC / v.8C no.6 / pp.847-854 / 2001
  • The Web continues to grow at a fast rate in both traffic volume and the size and complexity of Web sites. Along with this growth, the complexity of tasks such as Web site design, Web server design, and simply navigating through a Web site has increased. An important input to these design tasks is an analysis of how a Web site is being used. This paper proposes a Page Logging System (PLS) that reliably identifies the user sessions required in Web mining systems. PLS consists of a Page Logger that captures all of a user's page accesses, a Log Processor that produces user sessions from these data, and statements that incorporate a call to the page logger applet. PLS eliminates several preprocessing tasks that must otherwise be performed in Web mining systems and that consume a lot of time and effort. In particular, it simplifies the transaction identification phase by directly capturing the amount of time a user stays on a page. PLS also resolves the problems that local cache hits and proxy IPs cause when identifying user sessions from Web server logs.

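Because the Page Logger records how long each page stays on screen, a log processor can cut a user's page views into sessions without guessing dwell times from server timestamps. The sketch below assumes hypothetical records of (user ID, page, start time, dwell time) and a 30-minute inactivity timeout, a common heuristic that the abstract does not specify; the record layout and the function name `sessionize` are illustrative only.

```python
from dataclasses import dataclass
from itertools import groupby

@dataclass
class PageView:
    user_id: str     # hypothetical client identifier assigned by the logger
    page: str
    start: float     # seconds since epoch when the page was displayed
    dwell: float     # seconds the user stayed on the page

def sessionize(views, timeout=30 * 60):
    """Group page views into per-user sessions.

    A new session starts when the idle gap between leaving one page
    (start + dwell) and opening the next exceeds `timeout` seconds.
    """
    sessions = []
    ordered = sorted(views, key=lambda v: (v.user_id, v.start))
    for _, user_views in groupby(ordered, key=lambda v: v.user_id):
        current = []
        for view in user_views:
            if current:
                prev = current[-1]
                idle = view.start - (prev.start + prev.dwell)
                if idle > timeout:
                    sessions.append(current)
                    current = []
            current.append(view)
        if current:
            sessions.append(current)
    return sessions

views = [
    PageView("u1", "/index", 0, 40),
    PageView("u1", "/news", 60, 300),
    PageView("u1", "/index", 5000, 10),   # idle gap of 4640 s -> new session
    PageView("u2", "/index", 100, 20),
]
for s in sessionize(views):
    print(s[0].user_id, [v.page for v in s])
```

Measuring idle time from the end of the previous view rather than from its start is only possible because dwell time is logged directly; with server logs alone, the gap would have to include the unknown reading time.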
A Study on the Implementation of an optimized Algorithm for association rule mining system using Fuzzy Utility (Fuzzy Utility를 활용한 연관규칙 마이닝 시스템을 위한 알고리즘의 구현에 관한 연구)

  • Park, In-Kyu;Choi, Gyoo-Seok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.1 / pp.19-25 / 2020
  • In frequent pattern mining, the uncertainty of each item is accompanied by a loss of information. Also, in real environments, the importance of patterns changes over time, so fuzzy logic must be applied to meet these requirements, and the dynamic nature of pattern importance should be considered. In this paper, we propose a fuzzy utility mining technique that extracts frequent web page sets from web log databases through fuzzy utility-based web page set mining. The downward closure property of the fuzzy set is applied to prune a large portion of the search space using the minimum fuzzy utility threshold (MFUT) and a user-defined percentile (UDP). Extensive performance analyses show that our algorithm is very efficient and scalable for fuzzy utility mining with dynamic weights.

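The fuzzy-utility algorithm itself is not given in this summary. To illustrate the kind of downward-closure pruning it relies on, the sketch below implements a crisp (non-fuzzy) two-phase high-utility itemset search over web sessions, using the transaction-weighted utility (TWU) upper bound. It deliberately omits fuzzy membership functions, dynamic weights, MFUT, and UDP; the utilities, threshold, and the function name `high_utility_pagesets` are hypothetical.

```python
from itertools import combinations

def high_utility_pagesets(transactions, min_util):
    """Crisp two-phase high-utility itemset mining over web sessions.

    transactions: list of dicts mapping page -> utility of that page in
    one session (e.g. a hypothetical weighted dwell time).
    Phase 1 grows candidates level by level and keeps only those whose
    transaction-weighted utility (TWU) reaches min_util. TWU is
    downward closed, so supersets of a pruned set never need checking.
    Phase 2 computes exact utilities and filters the candidates.
    """
    tu = [sum(t.values()) for t in transactions]  # total utility per session

    def twu(itemset):
        # Sum of session utilities over sessions that contain the itemset
        return sum(u for t, u in zip(transactions, tu) if itemset.issubset(t))

    items = sorted({page for t in transactions for page in t})
    level = [frozenset([p]) for p in items if twu(frozenset([p])) >= min_util]
    candidates = list(level)
    k = 2
    while level:
        survivors = set(level)
        # Join: merge pairs of (k-1)-sets that differ by exactly one page
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: all (k-1)-subsets must survive, and the TWU bound must hold
        level = [c for c in joined
                 if all(frozenset(s) in survivors for s in combinations(c, k - 1))
                 and twu(c) >= min_util]
        candidates.extend(level)
        k += 1

    # Phase 2: exact utility = sum of the itemset's page utilities per session
    result = {}
    for c in candidates:
        util = sum(t[p] for t in transactions if c.issubset(t) for p in c)
        if util >= min_util:
            result[c] = util
    return result

sessions = [
    {"/a": 5, "/b": 2},
    {"/a": 3, "/c": 4},
    {"/a": 2, "/b": 6, "/c": 1},
]
result = high_utility_pagesets(sessions, min_util=10)
for pages, util in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pages), util)
```

In this toy data, {/b} alone has utility 8 and fails the threshold while {/a, /b} reaches 15, so plain utility is not downward closed and cannot drive Apriori-style pruning; the TWU bound can, and it removes {/b, /c} (TWU 9) in phase 1.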
PubMine: An Ontology-Based Text Mining System for Deducing Relationships among Biological Entities

  • Kim, Tae-Kyung;Oh, Jeong-Su;Ko, Gun-Hwan;Cho, Wan-Sup;Hou, Bo-Kyeng;Lee, Sang-Hyuk
    • Interdisciplinary Bio Central / v.3 no.2 / pp.7.1-7.6 / 2011
  • Background: Published manuscripts are the main source of biological knowledge. Because manual examination is almost impossible given the huge volume of literature (approximately 19 million abstracts in PubMed), intelligent text mining systems are of great utility for knowledge discovery. However, most current text mining tools have limited applicability because they i) provide abstract-based rather than sentence-based search, ii) use ontology terms improperly or not at all, iii) are designed for specific subjects, or iv) respond too slowly for web services and real-time applications. Results: We introduce an advanced text mining system called PubMine that supports intelligent knowledge discovery based on diverse bio-ontologies. PubMine improves query accuracy and flexibility with advanced search capabilities: fuzzy search, wildcard search, proximity search, range search, and their Boolean combinations. Furthermore, PubMine allows users to extract multi-dimensional relationships among genes, diseases, and chemical compounds using OLAP (On-Line Analytical Processing) techniques. The current version of PubMine includes the HUGO gene symbols and the MeSH ontology for diseases, chemical compounds, and anatomy, and is freely available at http://pubmine.kobic.re.kr. Conclusions: PubMine is a unique bio-text mining system that provides flexible searches and analysis of biological entity relationships. We believe PubMine will serve as a key bioinformatics utility, owing to its rapid response, which enables web services for the community, and its flexibility in accommodating general ontologies.

Tree-based Navigation Pattern Analysis

  • Choi, Hyun-Jip
    • Communications for Statistical Applications and Methods / v.8 no.1 / pp.271-279 / 2001
  • Sequential pattern discovery is one of the main interests in web usage mining. Sequential pattern discovery attempts to find inter-session patterns in which the presence of a set of items is followed by another item in a time-ordered set of server sessions. In this paper, a tree-based sequential pattern finding method is proposed to discover navigation patterns in server sessions. At each learning step, the proposed method learns the navigation patterns of each server session and summarizes them into a modified Rymon's tree.

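The modified Rymon's tree is not described in the abstract. As a simpler stand-in that captures the same idea of summarizing sessions into a tree, the sketch below builds a prefix tree (trie) of page sequences and reads off navigation paths shared by at least a minimum number of sessions. The node layout, the function names `build_tree` and `frequent_paths`, and the threshold are hypothetical; unlike full sequential pattern mining, this only counts contiguous paths that start at each session's entry page.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    count: int = 0
    children: dict = field(default_factory=dict)   # page -> Node

def build_tree(sessions):
    """Insert each session's page sequence into a prefix tree.

    A node's count is the number of sessions whose navigation path
    begins with the page sequence leading to that node.
    """
    root = Node()
    for path in sessions:
        node = root
        for page in path:
            node = node.children.setdefault(page, Node())
            node.count += 1
    return root

def frequent_paths(node, min_count, prefix=()):
    """Yield (path, count) for every prefix shared by at least min_count sessions."""
    for page, child in sorted(node.children.items()):
        # A child's count never exceeds its parent's, so an infrequent
        # node's whole subtree can be skipped.
        if child.count >= min_count:
            path = prefix + (page,)
            yield path, child.count
            yield from frequent_paths(child, min_count, path)

sessions = [
    ["/index", "/news", "/sports"],
    ["/index", "/news", "/weather"],
    ["/index", "/shop"],
    ["/login", "/index"],
]
tree = build_tree(sessions)
for path, count in frequent_paths(tree, min_count=2):
    print(" -> ".join(path), count)
```

Because sessions sharing a prefix share nodes, the tree stores the common part of many navigation paths only once, which is what makes a tree a compact summary of server sessions.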
Pre-Processing of Query Logs in Web Usage Mining

  • Abdullah, Norhaiza Ya;Husin, Husna Sarirah;Ramadhani, Herny;Nadarajan, Shanmuga Vivekanada
    • Industrial Engineering and Management Systems / v.11 no.1 / pp.82-86 / 2012
  • For the past few years, query log data has been collected to study users' behavior on websites. Many studies have used query logs to extract user preferences, support personalized recommendations, improve caching and pre-fetching of Web objects, build better adaptive user interfaces, and improve Web search in search engine applications. A query log contains data such as the client's IP address, the time and date of the request, the resource or page requested, the status of the request, the HTTP method used, and the type of browser and operating system. A query log can offer valuable insight into web site usage: properly compiled and interpreted, it provides a baseline of statistics that indicate the usage levels of a website and can serve as a tool to assist decision making in management activities. In this paper, we discuss the tasks performed on query logs in the pre-processing stage of web usage mining, using query logs from an online newspaper company. In the pre-processing stage, the clickstream data is cleaned and partitioned into sets of user interactions that represent each user's activities during their visits to the site. The essential pre-processing tasks applied to the query logs are data cleaning and user identification.

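The two pre-processing tasks named above, data cleaning and user identification, can be sketched as follows, assuming the log is in the Apache Combined Log Format (the abstract lists the same fields but not the exact layout). The filtering rules (successful GET requests only, no images, stylesheets, or scripts, no user agents containing "bot") and the one-user-per-(IP, user agent) heuristic are common choices assumed here, not the paper's stated rules; `clean` and `identify_users` are hypothetical names.

```python
import re
from datetime import datetime

# Combined Log Format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
IGNORED_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")

def clean(lines):
    """Data cleaning: keep successful GET requests for pages only."""
    for line in lines:
        m = LOG_PATTERN.match(line)
        if not m:
            continue                                   # malformed line
        if m["method"] != "GET" or not m["status"].startswith("2"):
            continue                                   # non-GET or failed request
        path = m["path"].split("?", 1)[0].lower()
        if path.endswith(IGNORED_SUFFIXES) or "bot" in m["agent"].lower():
            continue                                   # embedded resource or crawler
        yield {
            "ip": m["ip"],
            "agent": m["agent"],
            "path": m["path"],
            "time": datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        }

def identify_users(records):
    """User identification: one user per distinct (IP, user agent) pair."""
    user_ids = {}
    for r in records:
        key = (r["ip"], r["agent"])
        r["user"] = user_ids.setdefault(key, len(user_ids) + 1)
    return records

logs = [
    '10.0.0.1 - - [10/Oct/2011:13:55:36 +0900] "GET /news/index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '10.0.0.1 - - [10/Oct/2011:13:55:37 +0900] "GET /img/logo.png HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '10.0.0.1 - - [10/Oct/2011:13:56:10 +0900] "GET /news/sports.html HTTP/1.1" 200 4100 "-" "Mozilla/5.0 (Macintosh)"',
    '10.0.0.2 - - [10/Oct/2011:13:57:02 +0900] "GET /missing.html HTTP/1.1" 404 0 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '10.0.0.3 - - [10/Oct/2011:13:58:00 +0900] "GET /news/index.html HTTP/1.1" 200 2326 "-" "Googlebot/2.1"',
]
for r in identify_users(list(clean(logs))):
    print(r["user"], r["ip"], r["path"])
```

The (IP, user agent) heuristic separates two browsers on one address, but it merges different people who share a proxy and the same browser, a known limitation of identifying users from server logs alone.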
Knowledge Discovery Process from the Web for Effective Knowledge Creation: Application to the Stock Market (효과적인 지식창출을 위한 웹 상의 지식채굴과정 : 주식시장에의 응용)

  • Kim, Kyoung-Jae;Hong, Tae-Ho;Han, In-Goo
    • Knowledge Management Research / v.1 no.1 / pp.81-90 / 2000
  • This study proposes a knowledge discovery process for effectively mining knowledge on the web. The proposed process uses a prior knowledge base and a prior knowledge management system to reflect tacit knowledge in addition to explicit knowledge. The prior knowledge management system constructs the prior knowledge base using a fuzzy cognitive map and defines the information to be extracted from the web. It also transforms the extracted information into the form handled in the mining process. Experiments using case-based reasoning and neural networks are performed to verify the usefulness of the proposed model. The experimental results are encouraging and prove the usefulness of the proposed model.