• Title/Summary/Keyword: web data mining


Consumer behavior prediction using Airbnb web log data (에어비앤비(Airbnb) 웹 로그 데이터를 이용한 고객 행동 예측)

  • An, Hyoin;Choi, Yuri;Oh, Raeeun;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.32 no.3 / pp.391-404 / 2019
  • Customers' fixed characteristics have often been used to predict customer behavior. As customer activities move from offline to online, it has recently become possible to track customer web logs and collect large amounts of web log data; however, researchers have mostly focused on organizing the log data or describing its technical characteristics. In this study, we predict the decision-making time until each customer makes a first reservation, using Airbnb customer data provided on the Kaggle website. The data set includes basic customer information, such as gender and age, as well as web logs. We use various methodologies to find the optimal model and compare prediction errors with and without web log data. We consider six models, including Lasso, SVM, Random Forest, and XGBoost, to explore the effectiveness of the web log data. As a result, we choose Random Forest as our optimal model, with a misclassification rate of about 20%. In addition, we confirm that using web log data in our study doubles the prediction accuracy for customer behavior compared to not using it.

Hybrid Product Recommendation for e-Commerce : A Clustering-based CF Algorithm

  • Ahn, Do-Hyun;Kim, Jae-Sik;Kim, Jae-Kyeong;Cho, Yoon-Ho
    • Proceedings of the Korea Intelligent Information System Society Conference / 2003.05a / pp.416-425 / 2003
  • Recommender systems are a personalized information filtering technology that helps customers find the products they would like to purchase. Collaborative filtering (CF) is known as the most successful recommendation technology. However, its widespread use in e-commerce has exposed two research issues: sparsity and scalability. In this paper, we propose several hybrid recommender procedures based on web usage mining, clustering techniques, and collaborative filtering to address these issues. Experimental evaluation of the suggested procedures on real e-commerce data shows an interesting relation between the characteristics of the procedures and diverse situations.
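
To illustrate how clustering can narrow collaborative filtering, the following Python sketch restricts the neighbor search to users in the same cluster, which is one way to reduce the scalability cost. It is not the authors' procedure: the Jaccard similarity, the `cluster_of` labels (assumed to come from an earlier clustering step on web usage data), and all names and data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of purchased items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, purchases, cluster_of, top_n=2):
    """Rank unseen items by summed similarity of same-cluster neighbors."""
    own = purchases[user]
    scores = {}
    for other, items in purchases.items():
        # Only compare against users that share the target user's cluster.
        if other == user or cluster_of[other] != cluster_of[user]:
            continue
        sim = jaccard(own, items)
        for item in items - own:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


# Hypothetical purchase histories and precomputed cluster labels.
purchases = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"a", "d"},
    "u4": {"x", "y"},
}
cluster_of = {"u1": 0, "u2": 0, "u3": 0, "u4": 1}

print(recommend("u1", purchases, cluster_of))
```

Because each user is compared only with members of the same cluster, the number of similarity computations per recommendation depends on the cluster size rather than on the total number of users.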


Systematic Review on Chatbot Techniques and Applications

  • Park, Dong-Min;Jeong, Seong-Soo;Seo, Yeong-Seok
    • Journal of Information Processing Systems / v.18 no.1 / pp.26-47 / 2022
  • Chatbots were an important research subject in the past. A chatbot is a computer program or an artificial intelligence program that participates in a conversation via auditory or textual methods. As research on chatbots has progressed, some important issues regarding them have changed over time. Therefore, it is necessary to review the technology with a focus on recent advancements and core research technologies. In this paper, we introduce five different chatbot technologies: natural language processing, pattern matching, semantic web, data mining, and context-aware computing. We also introduce the latest technologies so that chatbot researchers can recognize the present situation and steer their work in the right direction.

Emerging Data Management Tools and Their Implications for Decision Support

  • Eorm, Sean B.;Novikova, Elena;Yoo, Sangjin
    • Journal of Korea Society of Industrial Information Systems / v.2 no.2 / pp.189-207 / 1997
  • Recently, we have witnessed a host of emerging tools in the management support systems (MSS) area, including data warehouses/multidimensional databases (MDDB), data mining, on-line analytical processing (OLAP), intelligent agents, World Wide Web (WWW) technologies, the Internet, and corporate intranets. These tools are reshaping MSS developments in organizations. This article reviews a set of emerging data management technologies in the knowledge discovery in databases (KDD) process and analyzes their implications for decision support. Furthermore, today's MSS are equipped with a plethora of AI techniques (artificial neural networks, genetic algorithms, etc.), fuzzy sets, modeling by example, geographical information systems (GIS), logic modeling, and visual interactive modeling (VIM). All these developments suggest that we are shifting the corporate decision-making paradigm from information-driven decision making in the 1980s to knowledge-driven decision making in the 1990s.


Analyzing RDF Data in Linked Open Data Cloud using Formal Concept Analysis

  • Hwang, Suk-Hyung;Cho, Dong-Heon
    • Journal of the Korea Society of Computer and Information / v.22 no.6 / pp.57-68 / 2017
  • The Linked Open Data (LOD) cloud is quickly becoming one of the largest collections of interlinked datasets and the de facto standard for publishing, sharing, and connecting pieces of data on the Web. Data publishers from diverse domains publish their data using the Resource Description Framework (RDF) data model and provide SPARQL endpoints for querying their data, which enables the creation of a global, distributed, and interconnected dataspace on the LOD cloud. Although it is possible to extract structured data as query results using SPARQL, users have very limited means of analyzing and visualizing RDF data from SPARQL query results. To tackle this issue, we propose a novel approach, based on Formal Concept Analysis, for analyzing and visualizing useful information from the LOD cloud. The RDF data analysis and visualization technique proposed in this paper can be used in the field of semantic web data mining by extracting and analyzing the information and knowledge inherent in LOD and by supporting classification and visualization.
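
As a rough illustration of what Formal Concept Analysis computes, the Python sketch below derives all formal concepts (pairs of an object set and the attribute set they share) from a small binary context. The context is hypothetical and stands in for resource/type pairs that might be extracted from SPARQL results; the brute-force enumeration is chosen for clarity and is not the method used in the paper.

```python
from itertools import combinations


def extent(attrs, context):
    """Objects that have every attribute in `attrs`."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def intent(objs, context, all_attrs):
    """Attributes shared by every object in `objs` (all attributes if empty)."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def formal_concepts(context):
    """Enumerate concepts by closing every attribute subset.

    Exponential in the number of attributes, so only suitable for tiny contexts.
    """
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for attrs in combinations(sorted(all_attrs), r):
            objs = extent(frozenset(attrs), context)
            concepts.add((objs, intent(objs, context, all_attrs)))
    return concepts


# Hypothetical context: resource -> set of types (e.g. from rdf:type triples).
context = {
    "Seoul": {"City", "Capital"},
    "Busan": {"City"},
    "Korea": {"Country"},
}

for objs, attrs in sorted(formal_concepts(context), key=lambda c: -len(c[0])):
    print(sorted(objs), sorted(attrs))
```

Ordering the resulting concepts by set inclusion of their extents yields the concept lattice, which is the structure typically drawn when FCA results are visualized.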

Clustering Analysis of Films on Box Office Performance : Based on Web Crawling (영화 흥행과 관련된 영화별 특성에 대한 군집분석 : 웹 크롤링 활용)

  • Lee, Jai-Ill;Chun, Young-Ho;Ha, Chunghun
    • Journal of Korean Society of Industrial and Systems Engineering / v.39 no.3 / pp.90-99 / 2016
  • Forecasting box office performance after a film's release is very important for increasing profitability by reducing production and marketing costs. Analysis of psychological factors such as word-of-mouth and expert assessment is essential, but hard to perform because of the difficulty of data collection. Information technologies such as web crawling and text mining can help overcome this problem. For effective text mining, categorization of objects is required. From this perspective, the objective of this study is to provide a framework for classifying films according to their characteristics. Data including psychological factors are collected from web sites using web crawling. A cluster analysis is conducted to classify films, and a series of one-way ANOVAs is conducted to statistically verify the differences in characteristics among groups. The result of the cluster analysis based on reviews and revenues shows that the films can be categorized into four distinct groups and that the differences in characteristics are statistically significant. The first group has high box office sales, and its number of clicks on reviews is higher than that of the other groups. The second group is similar to the first, but its reviews are longer and its box office sales are poor. The third group's audiences prefer documentaries and animation, and its numbers of comments and interests are significantly lower than those of the other groups. The last group's audiences prefer the crime, thriller, and suspense genres. Correspondence analysis is also conducted to match the groups with intrinsic characteristics of films such as genre, movie rating, and nation.
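
The verification step described above, testing whether a characteristic differs across clusters, can be sketched with a one-way ANOVA F statistic. The Python below is a minimal illustration using the textbook formula; the cluster values are made up and do not come from the study.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)                      # number of groups (clusters)
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical review lengths for films in three clusters.
review_lengths = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(review_lengths)
print(f_stat)
```

A large F value relative to the F distribution with (k - 1, n - k) degrees of freedom indicates that the cluster means differ more than within-cluster variation would explain.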

Association Rule by Considering Users Web Site Visiting Time (사용자 웹 사이트 방문 시간을 고려한 연관 규칙)

  • Kang, Hyung-Chang;Kim, Chul-Soo;Lee, Dong-Cheol
    • Journal of Korean Society of Industrial and Systems Engineering / v.29 no.2 / pp.104-109 / 2006
  • We can offer suitable information to users by analyzing their usage patterns. Association rules are a data mining technique that can discover such patterns. We use association rules that take web page visiting time into account to analyze users' patterns. The proposed method assigns weights based on each user's web page visiting time and generates association rules. The weight of a page is its visiting time divided by the total web page visiting time. The proposed method yields more meaningful results than association rules generated by the Apriori algorithm alone.
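
The weighting described above (a page's visiting time divided by the total visiting time) can be sketched in Python as follows. How the paper combines these weights into rule support is not stated in the abstract, so the `weighted_support` definition below is an assumption made for illustration, as are the session data and function names.

```python
def page_weights(session):
    """Map each page to its share of the session's total visiting time."""
    total = sum(session.values())
    return {page: t / total for page, t in session.items()}


def weighted_support(itemset, sessions):
    """Assumed definition: average over all sessions of the time share spent
    on the itemset's pages, counting 0 for sessions missing any of them."""
    score = 0.0
    for session in sessions:
        if all(page in session for page in itemset):
            w = page_weights(session)
            score += sum(w[page] for page in itemset)
    return score / len(sessions)


# Hypothetical sessions: page -> visiting time in seconds.
sessions = [
    {"home": 10, "cart": 30, "help": 60},
    {"home": 50, "cart": 50},
    {"home": 20, "help": 80},
]

print(weighted_support({"home", "cart"}, sessions))
print(weighted_support({"home", "help"}, sessions))
```

Under plain Apriori both itemsets appear in two of three sessions and have equal support; with time weighting, the pair on which users spent a larger share of their session scores higher.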

Unstructured Data Processing Using Keyword-Based Topic-Oriented Analysis (키워드 기반 주제중심 분석을 이용한 비정형데이터 처리)

  • Ko, Myung-Sook
    • KIPS Transactions on Software and Data Engineering / v.6 no.11 / pp.521-526 / 2017
  • Big data comes in diverse formats and vast volumes and is generated very quickly, so it requires new management and analysis methods rather than traditional data processing methods. Text mining techniques can be used to extract useful information from unstructured text written in human language in online documents on social networks. Identifying trends in the political, economic, and cultural messages left on social media helps in understanding what topics people are interested in. In this study, text mining was performed on online news related to a given keyword using a topic-oriented analysis technique. We use Latent Dirichlet Allocation (LDA) to extract information from web documents and analyze which topics are of interest for a given keyword and which topics are related to which core values.

Mining Frequent Sequential Patterns over Sequence Data Streams with a Gap-Constraint (순차 데이터 스트림에서 발생 간격 제한 조건을 활용한 빈발 순차 패턴 탐색)

  • Chang, Joong-Hyuk
    • Journal of the Korea Society of Computer and Information / v.15 no.9 / pp.35-46 / 2010
  • Sequential pattern mining is one of the essential data mining tasks, and it is widely used to analyze data generated in various application fields such as web-based applications, e-commerce, bioinformatics, and USN environments. Recently, data generated in these fields has been taking the form of continuous data streams rather than finite stored data sets. In response to this change, much research has been conducted on efficiently finding sequential patterns over data streams. However, conventional research focuses on reducing processing time and memory usage when mining sequential patterns over a target data stream, so research on mining more interesting and useful sequential patterns that reflect the characteristics of the data stream has attracted little attention. This paper proposes a method for mining sequential patterns over data streams with a gap constraint, which can help find more interesting sequential patterns. First, the meanings of the gap for a sequential pattern and of gap-constrained sequential patterns are defined; subsequently, a mining method for finding gap-constrained sequential patterns over a data stream is proposed.
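
To make the idea of a gap constraint concrete, the Python sketch below checks whether a pattern occurs in a sequence with at most `max_gap` items skipped between consecutive matched items, and computes the fraction of sequences that contain it. This gap definition is an assumption for illustration; the paper gives its own formal definition, and the sketch works on a stored batch of sequences rather than a true data stream.

```python
def occurs_with_gap(pattern, sequence, max_gap):
    """True if `pattern` is a subsequence of `sequence` with at most
    `max_gap` items skipped between consecutive matched items."""
    if not pattern:
        return True
    # Positions where the first pattern item can be matched.
    frontier = {i for i, x in enumerate(sequence) if x == pattern[0]}
    for item in pattern[1:]:
        # Extend every partial match to positions within the allowed gap.
        frontier = {
            j
            for i in frontier
            for j in range(i + 1, min(len(sequence), i + max_gap + 2))
            if sequence[j] == item
        }
        if not frontier:
            return False
    return bool(frontier)


def gap_support(pattern, sequences, max_gap):
    """Fraction of sequences that contain the gap-constrained pattern."""
    hits = sum(occurs_with_gap(pattern, s, max_gap) for s in sequences)
    return hits / len(sequences)


# Hypothetical click sequences.
sequences = [
    ["a", "x", "x", "b"],
    ["a", "b"],
    ["a", "x", "b"],
    ["b", "a"],
]

print(gap_support(["a", "b"], sequences, max_gap=1))
print(gap_support(["a", "b"], sequences, max_gap=2))
```

Tracking all candidate match positions, rather than greedily taking the first match, avoids missing occurrences where an earlier match of an item violates the gap but a later one satisfies it.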

An Optimized User Behavior Prediction Model Using Genetic Algorithm On Mobile Web Structure

  • Hussan, M.I. Thariq;Kalaavathi, B.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.5 / pp.1963-1978 / 2015
  • With the advancement of mobile web environments, the identification and analysis of user behavior play a significant role and remain challenging to implement given the variations observed in the model. This paper presents an efficient method for mining an optimized user behavior prediction model using a genetic algorithm on mobile web structure. The framework of the optimized user behavior prediction model integrates temporary and permanent register information, which is stored immediately in the form of integrated logs that have higher precision and minimize the time needed to determine user behavior. Then, by applying temporal characteristics, a suitable time interval table is obtained by segmenting the logs. The time interval table that splits the huge data logs is obtained using a genetic algorithm. Existing cluster-based temporal mobile sequential arrangements provide efficiency without reducing accuracy but compromise precision when predicting user behavior. To efficiently discover mobile users' behavior, the prediction model is associated with region and requested services, and a method called optimized user behavior Prediction Model using Genetic Algorithm (PM-GA) on mobile web structure is introduced. This paper also provides a technique called MAA for cases where an increase in the number of models related to the region and requested services is observed. Based on our analysis, we contend that PM-GA provides improved performance in terms of precision, number of mobile models generated, execution time, and prediction accuracy. Experiments are conducted with different parameters on a real dataset in a mobile web environment. Analytical and empirical results show efficient and effective mining of the user behavior prediction model on mobile web structure.