• Title/Summary/Keyword: web data mining

Application of Market Basket Analysis to Personalized advertisements on Internet Storefront (인터넷 상점에서 개인화 광고를 위한 장바구니 분석 기법의 활용)

  • 김종우;이경미
    • Korean Management Science Review / v.17 no.3 / pp.19-30 / 2000
  • Customization and personalization services are considered a critical success factor for Internet stores and web service providers. As a representative personalization technique, personalized recommendation has been studied and commercialized to suggest products or services to customers of Internet storefronts, based either on customer demographics or on an analysis of the customer's past purchasing behavior. The underlying theories of recommendation techniques are statistics, data mining, artificial intelligence, and/or rule-based matching. In the rule-based approach to personalized recommendation, marketing rules are usually collected from marketing experts and used for inference over customer data. However, it is difficult to extract marketing rules from marketing experts, and also difficult to validate and maintain the resulting knowledge base. In this paper, we propose a marketing rule extraction technique for personalized recommendation on Internet storefronts using market basket analysis, a well-known data mining technique. Using market basket analysis, marketing rules for cross-selling are extracted and used to select personalized advertisements when a customer visits an Internet store. An experiment was performed to evaluate the effectiveness of the proposed approach in comparison with a preference scoring approach and random selection.
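
The rule-extraction idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mine_pair_rules` and `pick_advertisement`, the thresholds, and the sample baskets are all hypothetical, and only two-item rules are mined. Each rule "A → B" is kept when its support (the share of baskets containing both items) and its confidence (the share of baskets with A that also contain B) pass the assumed thresholds, and an advertisement is then chosen from the rules triggered by the visitor's cart.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) rules for item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


def pick_advertisement(rules, cart):
    """Advertise the consequent of the highest-confidence rule the cart triggers."""
    candidates = [r for r in rules if r[0] in cart and r[1] not in cart]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r[3])[1]


# Hypothetical purchase history, for illustration only.
baskets = [
    ["printer", "ink", "paper"],
    ["printer", "ink"],
    ["laptop", "mouse"],
    ["printer", "paper"],
    ["laptop", "mouse", "bag"],
]
rules = mine_pair_rules(baskets)
print(pick_advertisement(rules, {"laptop"}))
```

A production system would mine larger itemsets and break ties between equally confident rules deliberately; the sketch simply takes the first best match.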

Distributed and Scalable Intrusion Detection System Based on Agents and Intelligent Techniques

  • El-Semary, Aly M.;Mostafa, Mostafa Gadal-Haqq M.
    • Journal of Information Processing Systems / v.6 no.4 / pp.481-500 / 2010
  • The Internet explosion and the growth of critical web applications such as e-banking and e-commerce make network security tools essential. One such tool is the intrusion detection system, which can be classified by detection approach as signature-based or anomaly-based. Even though intrusion detection systems are well defined, their cooperation with each other to detect attacks still needs to be addressed. Consequently, a new architecture that allows them to cooperate in detecting attacks is proposed. The architecture uses software agents to provide scalability and distributability. It works in two modes: learning and detection. In learning mode, it generates a profile for each individual system using a fuzzy data mining algorithm. In detection mode, each system uses FuzzyJess to match network traffic against its profile. The architecture was tested against a standard data set produced by MIT's Lincoln Laboratory, and the primary results show its efficiency and its capability to detect attacks. Finally, two new methods, the memory-window and the memoryless-window, were developed for extracting useful parameters from raw packets. These parameters are used as detection metrics.

Analysis of Online Behavior and Prediction of Learning Performance in Blended Learning Environments

  • JO, Il-Hyun;PARK, Yeonjeong;KIM, Jeonghyun;SONG, Jongwoo
    • Educational Technology International / v.15 no.2 / pp.71-88 / 2014
  • A variety of studies to predict students' performance have been conducted, as educational data such as web-log files traced from Learning Management Systems (LMS) are increasingly used to analyze students' learning behaviors. However, it is still challenging to predict students' learning achievement in blended learning environments, where online and offline learning are combined. In higher education, blended learning takes diverse forms, from simple use of an LMS for administrative purposes to full use of LMS functions for online distance learning classes. As a result, a single generalized model for predicting students' academic success does not fit the diverse cases of blended learning. This study compares two blended learning classes, each with its own prediction model. For the first class, which involved online discussion-based learning, a linear regression model explained 70% of the variance in total score through six variables: total log-in time, log-in frequency, log-in regularity, visits to boards, visits to repositories, and number of postings. However, for the second class, a lecture-based class that provided online lecture notes in Moodle on a regular basis, the same linear regression model showed weaker results, mainly due to non-linearity of the variables. To investigate the non-linear relations between online activities and total score, Random Forest (RF) was used. The results indicate that the two distinct types of blended learning have different sets of important variables. They suggest that prediction models and data mining techniques should take into account the diverse pedagogical characteristics of blended learning classes.

Research of Knowledge Management and Reusability in Streaming Big Data with Privacy Policy through Actionable Analytics (스트리밍 빅데이터의 프라이버시 보호 동반 실용적 분석을 통한 지식 활용과 재사용 연구)

  • Paik, Juryon;Lee, Youngsook
    • Journal of Korea Society of Digital Industry and Information Management / v.12 no.3 / pp.1-9 / 2016
  • The current meaning of "Big Data" covers all the techniques for value extraction and actionable analytics as well as management tools. In particular, advances in wireless sensor networks yield diverse patterns of digital records. These records are mostly semi-structured and unstructured data, which are usually beyond the capabilities of the management tools. Such data are growing rapidly due to their complex data structures. The complex type effectively supports data exchangeability and heterogeneity, which is the main reason their volumes keep growing in sensor networks. However, many errors and problems arise in applications because management solutions for the complex data model are rarely available in current big data environments. To solve such problems and differentiate our approach, we aim to provide a solution for actionable analytics and semantic reusability in sensor-web-based streaming big data using a new data structure, and thereby to strengthen competitiveness.

Analysis of domestic and foreign research trends of Tricholoma matsutake using text mining techniques

  • Choi, Ah Hyeon;Kang, Jun Won
    • Korean Journal of Agricultural Science / v.48 no.3 / pp.505-514 / 2021
  • Among non-timber forest products, Tricholoma matsutake is a high-value-added item. Many countries, including Korea, China, and Japan, are conducting research and technology development to increase its artificial cultivation and productivity. However, the production of T. matsutake is declining due to global warming, abnormal temperatures, and pine tree pest problems. Therefore, it is necessary to identify trends in domestic and foreign research on T. matsutake and to respond with preemptive research and development that preserves the genetic resources of T. matsutake and increases its productivity. Based on correlations among the high-frequency keywords, it was observed that keywords related to microbial clusters of T. matsutake appear mainly in Korea. The main focus in China has been pharmacological studies on the ingredients of T. matsutake. The main focus in Japan has been on preserving the genetic diversity and species of T. matsutake. Thus, future domestic studies of T. matsutake will require pharmacological studies on its ingredients as well as studies on its genetic diversity and species conservation. In addition, unlike in China and Japan, genetic keywords did not appear at high frequency in Korea. Therefore, Korea will have to proceed with research using modern molecular biology techniques.

Text-Mining Analyses of News Articles on Schizophrenia (조현병 관련 주요 일간지 기사에 대한 텍스트 마이닝 분석)

  • Nam, Hee Jung;Ryu, Seunghyong
    • Korean Journal of Schizophrenia Research / v.23 no.2 / pp.58-64 / 2020
  • Objectives: In this study, we conducted an exploratory analysis of current media trends on schizophrenia using text-mining methods. Methods: First, web-crawling techniques were used to extract text data from 575 news articles published in 10 major newspapers between 2018 and 2019, selected by searching for "schizophrenia" in Naver News. We then built a document-term matrix (DTM) and/or term-document matrix (TDM) through pre-processing. Using the DTM and TDM, frequency analysis, co-occurrence network analysis, and topic model analysis were conducted. Results: Frequency analysis showed that keywords such as "police," "mental illness," "admission," "patient," "crime," "apartment," "lethal weapon," "treatment," "Jinju," and "residents" were frequently mentioned in news articles on schizophrenia. Within the article text, many of these keywords were highly correlated with the term "schizophrenia" and were also interconnected with each other in the co-occurrence network. The latent Dirichlet allocation model presented 10 topics comprising a combination of keywords: "police-Jinju," "hospital-admission," "research-finding," "care-center," "schizophrenia-symptom," "society-issue," "family-mind," "woman-school," and "disabled-facilities." Conclusion: The results of the present study highlight that in recent years, the media has been reporting on violence in patients with schizophrenia, thereby raising the important issue of hospitalization and community management of patients with schizophrenia.

Identifying the Interests of Web Category Visitors Using Topic Analysis (토픽 분석을 활용한 웹 카테고리별 방문자 관심 이슈 식별 방안)

  • Choi, Seongi;Kim, Namgyu
    • Journal of Information Technology Applications and Management / v.21 no.4_spc / pp.415-429 / 2014
  • With the advent of smart devices, users are able to connect to each other through the Internet without the constraints of time and space. Because the Internet has become increasingly important to users in their everyday lives, reliance on it has grown. As a result, the number of web sites constantly increases and the competition between these sites becomes more intense. Even those sites that operate successfully struggle to establish new strategies for customer retention and customer development in order to survive. Many companies use various kinds of customer information to establish marketing strategies based on customer group segmentation. A method commonly used to determine the customer groups of individual sites is to infer customer characteristics from the customers' demographic information. However, such information cannot sufficiently represent the real characteristics of customers. For example, users who have similar demographic characteristics could nonetheless have different interests and, therefore, different buying needs. Hence, in this study, customers' interests are first identified through an analysis of their Internet news inquiry records. This information is then integrated in order to identify each web category. The study then analyzes the possibilities for the practical use of the proposed methodology through its application to actual Internet news inquiry records and web site browsing histories.

An Efficient Data Mining Algorithm based on the Database Characteristics (데이터 베이스 특성에 따른 효율적인 데이터 마이닝 알고리즘)

  • Park, Ji-Hyun;Koh, Chan
    • Journal of the Korean Society for Industrial and Applied Mathematics / v.10 no.1 / pp.107-119 / 2006
  • Recently, with the development of Internet and web technologies, the amount of data stored in databases has been increasing rapidly. As a result, the range of database applications has expanded, and research on data mining techniques for finding useful information in huge databases has progressed. Many existing algorithms reduce the item set and the database size so that the whole database is not required throughout the process of generating frequent item sets. Although these techniques save time in some phases, they require too much time in others. In this paper, an algorithm is proposed for transaction databases in which the transactions are short or the number of items is relatively small. The algorithm scans the database once using a hashing technique and, at the same time, stores in a hash table all the subsets that can appear in each transaction. As a result, regardless of the minimum support percentage, it can discover the frequent item sets in less time than existing algorithms.
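
The single-scan idea described in the abstract above can be sketched roughly as follows. This is only an illustration under assumptions, not the paper's algorithm: a Python dictionary stands in for the hash table, and the names `count_all_subsets` and `frequent_itemsets` and the sample transactions are hypothetical. Every non-empty subset of each transaction is counted during one pass, so frequent item sets for any minimum support can afterwards be read from the table without rescanning.

```python
from collections import defaultdict
from itertools import combinations


def count_all_subsets(transactions):
    """Scan the transactions once, counting every non-empty subset in a hash table."""
    counts = defaultdict(int)
    for transaction in transactions:
        items = sorted(set(transaction))     # canonical order so keys are comparable
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
    return counts


def frequent_itemsets(counts, n_transactions, min_support):
    """Filter the stored counts; no rescan is needed when min_support changes."""
    threshold = min_support * n_transactions
    return {itemset: c for itemset, c in counts.items() if c >= threshold}


# Hypothetical transaction database, for illustration only.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
counts = count_all_subsets(transactions)
print(frequent_itemsets(counts, len(transactions), 0.5))
print(frequent_itemsets(counts, len(transactions), 0.75))
```

A transaction with k distinct items contributes 2^k - 1 subsets, which is why this approach is only practical under the abstract's condition of short transactions or a small number of items.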

Ontology Construction of Technological Knowledge for R&D Trend Analysis (연구 개발 트렌드 분석을 위한 기술 지식 온톨로지 구축)

  • Hwang, Mi-Nyeong;Lee, Seungwoo;Cho, Minhee;Kim, Soon Young;Choi, Sung-Pil;Jung, Hanmin
    • The Journal of the Korea Contents Association / v.12 no.12 / pp.35-45 / 2012
  • Researchers and scientists spend a huge amount of time analyzing previous studies and their results. To gain an advantageous position in a timely manner, they usually analyze various resources such as papers, patents, and Web documents on recent research issues in order to get ahead in newly emerging technologies. However, it is difficult to select investment-worthy research fields from a huge corpus using traditional information search based on keywords and bibliographic information. In this paper, we propose a method for the efficient creation, storage, and utilization of semantically relevant information among technologies, products, and research agents extracted from 'big data' by text mining. To implement the proposed method, we designed an ontology that creates technological knowledge for a semantic web environment based on the relationships extracted by text mining techniques. The ontology was used in InSciTe Adaptive, an R&D trend analysis and forecast service that supports the search for relevant technological knowledge.

OLAP System and Performance Evaluation for Analyzing Web Log Data (웹 로그 분석을 위한 OLAP 시스템 및 성능 평가)

  • 김지현;용환승
    • Journal of Korea Multimedia Society / v.6 no.5 / pp.909-920 / 2003
  • Nowadays, IT for CRM has been growing and developing rapidly. Typical techniques are statistical analysis tools, on-line multidimensional analytical processing (OLAP) tools, and data mining algorithms (such as neural networks, decision trees, and association rules). Among customer data, web log data is very important, and OLAP technology can be applied to analyze it multidimensionally and use it efficiently. To build an OLAP cube, multidimensional summary results must be precalculated in order to obtain fast responses. However, as the number of dimensions and sparse cells increases, severe data explosion occurs and OLAP performance decreases. In this paper, we explain why sparsity occurs in web log data and what kinds of sparsity patterns arise in two and three dimensions for OLAP. Based on this research, we set up multidimensional data models and query models for benchmarking each sparsity pattern. Finally, we evaluated the performance of three OLAP systems (MS SQL 2000 Analysis Service, Oracle Express, and C-MOLAP).
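
The sparsity problem described in the abstract above can be made concrete with a small sketch. It does not reproduce the paper's data models or benchmark: the function `cube_density`, the dimension names (`page`, `hour`, `referrer`), and the log records are hypothetical. Density is taken here as the number of non-empty cells divided by the number of possible cells, that is, the product of the dimension cardinalities.

```python
from collections import Counter
from math import prod


def cube_density(records, dimensions):
    """Return (non-empty cells, possible cells, density) for the chosen dimensions."""
    # One cell per distinct combination of dimension values seen in the log.
    cells = Counter(tuple(r[d] for d in dimensions) for r in records)
    cardinalities = [len({r[d] for r in records}) for d in dimensions]
    possible = prod(cardinalities)
    return len(cells), possible, len(cells) / possible


# Hypothetical web log records, for illustration only.
log = [
    {"page": "/home", "hour": 9, "referrer": "search"},
    {"page": "/home", "hour": 10, "referrer": "direct"},
    {"page": "/cart", "hour": 9, "referrer": "search"},
    {"page": "/help", "hour": 21, "referrer": "mail"},
]
print(cube_density(log, ["page", "hour"]))
print(cube_density(log, ["page", "hour", "referrer"]))
```

Adding the third dimension multiplies the number of possible cells while the number of non-empty cells stays the same, so the density drops; this is the growth in empty cells that a precalculated cube has to cope with.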
