• Title/Summary/Keyword: Web Mining


Arab Spring Effects on Meanings for Islamist Web Terms and on Web Hyperlink Networks among Muslim-Majority Nations: A Naturalistic Field Experiment

  • Danowski, James A.;Park, Han Woo
    • Journal of Contemporary Eastern Asia / v.13 no.2 / pp.15-39 / 2014
  • This research conducted a before/after naturalistic field experiment, with the early Arab Spring as the treatment. Compared to before the early Arab Spring, after the observation period the associations became stronger among the Web terms 'Jihad', 'Sharia', 'innovation', 'democracy', and 'civil society'. The Western concept of civil society transformed into a central Islamist ideological component. At another level, in the inter-nation network based on Jihad-weighted Web hyperlinks between pairs of 46 Muslim Majority (MM) nations, Iran held one of the top two positions in flow betweenness centrality, a measure of network power, both before and after the early Arab Spring. In contrast, Somalia, UAE, Egypt, Libya, and Sudan increased most in network flow betweenness centrality. The MM 'Jihad'-centric word co-occurrence network more than tripled in size, and the semantic structure became more entropic. This media "cloud" perhaps billowed as Islamist groups changed their material-level relationships and the corresponding media representations of Jihad among them changed after the early Arab Spring. Future research could investigate various rival explanations for this naturalistic field experiment's findings.

International Scientific and Scholarly Communication Networks on World Wide Web (월드와이드웹에 나타난 국제 학술 커뮤니케이션 네트워크에 대한 탐사적 연구)

  • Park, Han-Woo
    • Journal of the Korean Society for Library and Information Science / v.37 no.2 / pp.153-168 / 2003
  • Hyperlinks on the academic World Wide Web have started to be recognized as a form of collaborative communication network that connects individual researchers and research groups and expands their collaborative relations by enabling easy, direct online contact among people or groups anywhere in the world. This paper describes the structure of academic hyperlinks embedded in university Web sites hosted in 10 Asian countries and examines the association between the structure of the hyperlink network and the collaborative communication pattern among those countries, measured by their frequency of co-authoring articles. The research found that the number of inter-hyperlinks among university Web sites was significantly correlated with the frequency of co-authored articles across the 10 countries.

Mining Semantically Similar Tags from Delicious (딜리셔스에서 유사태그 추출에 관한 연구)

  • Yi, Kwan
    • Journal of the Korean Society for Information Management / v.26 no.2 / pp.127-147 / 2009
  • The synonym issue is an inherent barrier in human-computer communication, and it is more challenging in Web 2.0 applications, especially social tagging applications. In an effort to resolve the issue, this study tests the feasibility of a Web 2.0 application as a potential source of synonyms. It investigates a way of identifying similar tags from a popular collaborative tagging application, Delicious. Specifically, we propose an algorithm (FolkSim) for measuring the similarity of social tags from Delicious. We compared FolkSim to a cosine-based similarity method (CosSim) and observed that the top-ranked tags on the similar-tag lists generated by FolkSim tend to be among the best possible similar tags among the given choices. The FolkSim lists also appear to be relatively better than the ones created by CosSim. We also observed that the tag folksonomy and the similar-tag list resemble each other to a certain degree, so the folksonomy could possibly serve as an alternative, especially when the FolkSim-based list is unavailable or infeasible.
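
The cosine-based baseline mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes each tag is represented by counts of the bookmarked resources it was applied to; that representation, the function names, and the toy data are all hypothetical, and the sketch does not reproduce FolkSim, whose details the abstract does not give.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_tags(target, tag_vectors, top_n=3):
    """Rank all other tags by cosine similarity to the target tag."""
    scores = [
        (other, cosine_similarity(tag_vectors[target], vec))
        for other, vec in tag_vectors.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical toy data: tag -> how often it was applied to each resource.
tag_vectors = {
    "python": Counter({"url1": 3, "url2": 1}),
    "programming": Counter({"url1": 2, "url2": 1, "url3": 1}),
    "recipes": Counter({"url4": 5}),
}

ranked = similar_tags("python", tag_vectors)
```

With this toy data, "programming" ranks above "recipes" for the tag "python" because the two tags share resources while "recipes" shares none.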

Ontology Construction of Technological Knowledge for R&D Trend Analysis (연구 개발 트렌드 분석을 위한 기술 지식 온톨로지 구축)

  • Hwang, Mi-Nyeong;Lee, Seungwoo;Cho, Minhee;Kim, Soon Young;Choi, Sung-Pil;Jung, Hanmin
    • The Journal of the Korea Contents Association / v.12 no.12 / pp.35-45 / 2012
  • Researchers and scientists spend a huge amount of time analyzing previous studies and their results. To gain an advantageous position in a timely manner, they usually analyze various resources such as papers, patents, and Web documents on recent research issues in order to get ahead in newly emerging technologies. However, it is difficult to select research fields worth investing in out of a huge corpus using traditional information search based on keywords and bibliographic information. In this paper, we propose a method for efficient creation, storage, and utilization of semantically relevant information among technologies, products, and research agents extracted from 'big data' using text mining. To implement the proposed method, we designed an ontology that creates technological knowledge for a semantic web environment based on the relationships extracted by text mining techniques. The ontology was used in InSciTe Adaptive, an R&D trend analysis and forecast service that supports the search for relevant technological knowledge.

An SNS and Web based BDAS design for On-Line Marketing Strategy (온라인 마케팅 전략을 위한 SNS와 Web기반 BDAS(Big data Data Analysis Scheme) 설계)

  • Jeong, Yi-Na;Lee, Byung-Kwan;Park, Seok-Gyu
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.1 / pp.141-148 / 2015
  • This paper proposes the BDAS (Big Data Analysis Scheme) design, which extracts information shared in real time on SNS and the Web, rapidly analyzes the extracted data for customers, and efficiently produces an on-line marketing strategy. First, the BDAS collects the data shared on SNS and the Web. Second, it analyzes the semantics of the collected data as positive or negative and provides a visualization of the result. Because the BDAS achieves an average accuracy of 90% in judging the semantics of the shared SNS and Web data, it can judge customers' propensities accurately and be used efficiently for on-line marketing strategy.

Web Structure Mining by Extracting Hyperlinks from Web Documents and Access Logs (웹 문서와 접근로그의 하이퍼링크 추출을 통한 웹 구조 마이닝)

  • Lee, Seong-Dae;Park, Hyu-Chan
    • Journal of the Korea Institute of Information and Communication Engineering / v.11 no.11 / pp.2059-2071 / 2007
  • If the correct structure of a Web site is known, the information provider can discover users' behavior patterns and characteristics to provide better services, and users can find useful information easily and precisely. It can be difficult, however, to extract the exact structure of a Web site because documents on the Web tend to change frequently. This paper proposes a new method for extracting such Web structure automatically. The method consists of two phases. The first phase extracts the hyperlinks among Web documents and then constructs a directed graph to represent the structure of the Web site. It is limited, however, in discovering hyperlinks embedded in Flash and Java applets. The second phase finds such hidden hyperlinks by using the Web access log. It first extracts the click streams from the access log and then extracts the hidden hyperlinks by comparing the click streams with the directed graph. Several experiments have been conducted to evaluate the proposed method.
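
The two-phase idea in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the hyperlinks have already been extracted per page and that the access log has already been split into per-session click streams (ordered lists of visited URLs). The function names and sample URLs are hypothetical.

```python
def build_link_graph(pages):
    """Phase 1: directed graph as {source URL: set of target URLs}.

    `pages` maps each URL to the hyperlinks extracted from its HTML.
    """
    return {src: set(targets) for src, targets in pages.items()}


def find_hidden_links(graph, click_streams):
    """Phase 2: transitions seen in click streams but absent from the graph.

    Each consecutive pair (src, dst) in a click stream is treated as a
    traversed link; pairs missing from the graph are hidden-link candidates
    (for example, links embedded in Flash or Java applets).
    """
    hidden = set()
    for stream in click_streams:
        for src, dst in zip(stream, stream[1:]):
            if src != dst and dst not in graph.get(src, set()):
                hidden.add((src, dst))
    return hidden


# Hypothetical toy site and sessions.
pages = {"/": ["/a"], "/a": []}
click_streams = [["/", "/a", "/b"], ["/", "/flash"]]

graph = build_link_graph(pages)
hidden = find_hidden_links(graph, click_streams)
```

Here the transition from "/" to "/a" is already in the graph, while "/a" to "/b" and "/" to "/flash" are reported as candidates. A real access log would also need filtering for back-button navigation and typed-in URLs, which this sketch ignores.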

A study on unstructured text mining algorithm through R programming based on data dictionary (Data Dictionary 기반의 R Programming을 통한 비정형 Text Mining Algorithm 연구)

  • Lee, Jong Hwa;Lee, Hyun-Kyu
    • Journal of Korea Society of Industrial Information Systems / v.20 no.2 / pp.113-124 / 2015
  • Unlike structured data, which are gathered and saved in a predefined structure, unstructured text data, which are mostly written in natural language, have recently found wider application due to the emergence of Web 2.0. Text mining, which extracts meaningful information from text, is one of the most important big data analysis techniques, not only because the amount of text data has increased but also because text directly expresses human emotion. In this study, we used R, an open source software environment for statistical analysis, and studied algorithm implementations for conducting analyses such as frequency analysis, cluster analysis, word clouds, and social network analysis. In particular, to focus on our research scope, we used a keyword extraction method based on a data dictionary. By applying the approach to real cases, we found that R is very useful as statistical analysis software that runs on a variety of operating systems and interfaces with other languages.

Discovering Temporal Relation Rules from Temporal Interval Data (시간간격을 고려한 시간관계 규칙 탐사 기법)

  • Lee, Yong-Joon;Seo, Sung-Bo;Ryu, Keun-Ho;Kim, Hye-Kyu
    • Journal of KIISE:Databases / v.28 no.3 / pp.301-314 / 2001
  • Data mining refers to a set of techniques for discovering implicit and useful knowledge from large databases. Many studies on data mining have been pursued, and some of them have addressed temporal data mining for discovering knowledge from temporal databases, such as sequential patterns, similar time sequences, and cyclic and temporal association rules. However, all of these works address the discovery of temporal patterns from data stamped with time points and do not consider the discovery of knowledge from temporal interval data. There are many examples of temporal interval data from which useful knowledge can be discovered, including patient histories, purchaser histories, and web logs. Allen introduced relationships between intervals and operators for reasoning about relations between intervals. We present a new data mining technique that can discover temporal relation rules in temporal interval data by using Allen's theory. In this paper, we present two new algorithms for discovering temporal relation rules from temporal interval data. This technique can discover more useful knowledge compared with conventional data mining techniques.
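
Allen's interval relations, on which this abstract builds, can be illustrated with a short Python sketch that classifies a pair of intervals into one of the 13 relations. It assumes closed intervals given as (start, end) tuples with start < end; the function name and relation labels are illustrative choices, and the sketch does not reproduce the rule-discovery algorithms of the paper.

```python
def allen_relation(a, b):
    """Return the Allen relation that interval `a` holds to interval `b`.

    Intervals are (start, end) tuples with start < end (an assumption of
    this sketch).
    """
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    return "overlaps" if a_start < b_start else "overlapped-by"


# Hypothetical example: two hospital stays from a patient history.
stay_1 = (1, 5)
stay_2 = (3, 8)
relation = allen_relation(stay_1, stay_2)
```

Counting how often each relation occurs between pairs of event types across many records is one plausible starting point for mining temporal relation rules, although the abstract does not specify how the paper does this.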


Feature Extraction of Web Document using Association Word Mining (연관 단어 마이닝을 사용한 웹문서의 특징 추출)

  • 고수정;최준혁;이정현
    • Journal of KIISE:Databases / v.30 no.4 / pp.351-361 / 2003
  • Previous studies that extract document features through word association have the problems of updating profiles periodically, dealing with noun phrases, and calculating the probability for indices. We propose a more effective feature extraction method that uses association word mining. The association word mining method, by using the Apriori algorithm, represents a document's features not as single words but as association-word vectors. Association words extracted from a document by the Apriori algorithm depend on confidence, support, and the number of words composing them. This paper proposes an effective method to determine confidence, support, and the number of words composing association words. Since the feature extraction method using association word mining does not use a profile, it need not update one, and it automatically generates noun phrases by using confidence and support in the Apriori algorithm without calculating the probability for indices. We apply the proposed method to document classification using a Naive Bayes classifier and compare it with methods based on information gain and TF-IDF. In addition, we compare the proposed method with document classification methods using index association and word association based on a probability model, respectively.
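
The roles of support and confidence described in this abstract can be illustrated with a small Python sketch. It is a simplified, two-word version of the Apriori idea (frequent single words are found first, and only pairs built from them are counted) and is not the paper's method. The thresholds, function name, and toy documents are hypothetical.

```python
from itertools import combinations


def association_words(docs, min_support=0.5, min_confidence=0.8):
    """Return rules (x, y, support, confidence) meaning "x implies y".

    support    = fraction of documents containing both words
    confidence = documents containing both / documents containing x
    """
    n = len(docs)
    doc_sets = [set(doc) for doc in docs]

    # Level 1: document frequency of single words.
    word_count = {}
    for words in doc_sets:
        for w in words:
            word_count[w] = word_count.get(w, 0) + 1
    frequent = {w for w, c in word_count.items() if c / n >= min_support}

    # Level 2: count only pairs made of frequent words (Apriori pruning).
    pair_count = {}
    for words in doc_sets:
        for pair in combinations(sorted(words & frequent), 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = []
    for (a, b), c in pair_count.items():
        support = c / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = c / word_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Hypothetical toy corpus: each document is a list of extracted nouns.
docs = [
    ["web", "mining", "data"],
    ["web", "mining"],
    ["web", "log"],
    ["data", "mining", "web"],
]
rules = association_words(docs)
```

With these thresholds the sketch keeps the rules data → mining, data → web, and mining → web. Raising `min_support` or `min_confidence` shrinks the set of association words, which is the trade-off the abstract says the paper tunes.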

Consumer behavior prediction using Airbnb web log data (에어비앤비(Airbnb) 웹 로그 데이터를 이용한 고객 행동 예측)

  • An, Hyoin;Choi, Yuri;Oh, Raeeun;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.32 no.3 / pp.391-404 / 2019
  • Customers' fixed characteristics have often been used to predict customer behavior. It has recently become possible to track customer web logs as customer activities move from offline to online. Large amounts of web log data can now be collected; however, researchers have focused only on organizing the log data or describing its technical characteristics. In this study, we predict the decision-making time until each customer makes the first reservation, using Airbnb customer data provided by the Kaggle website. This data set includes basic customer information, such as gender and age, as well as web logs. We use various methodologies to find the optimal model and compare prediction errors for cases with web log data and without it. We consider six models, including Lasso, SVM, Random Forest, and XGBoost, to explore the effectiveness of the web log data. As a result, we choose Random Forest as our optimal model, with a misclassification rate of about 20%. In addition, we confirm that using web log data in our study doubles the prediction accuracy in predicting customer behavior compared to not using it.