• Title/Summary/Keyword: Web data mining


A Sentiment Analysis Algorithm for Automatic Product Reviews Classification in On-Line Shopping Mall (온라인 쇼핑몰의 상품평 자동분류를 위한 감성분석 알고리즘)

  • Chang, Jae-Young
    • The Journal of Society for e-Business Studies / v.14 no.4 / pp.19-33 / 2009
  • With the continuously increasing volume of e-commerce transactions, it is now common to buy products and evaluate them on the World Wide Web. Product reviews are very useful to customers because they can make better decisions based on the indirect experience that reviews provide. Product reviews express customers' sentiments and can therefore be divided into positive and negative reviews. However, as the number of reviews in online shopping grows, it is inefficient or sometimes impossible for users to read all the relevant review documents. In this paper, we present a sentiment analysis algorithm that uses opinion mining technology to automatically classify the subjective opinions in customer reviews. The proposed algorithm focuses on product reviews in online shopping and summarizes large volumes of review data by determining whether each review is positive or negative. This paper also introduces an automatic review analysis system implemented on the basis of the proposed algorithm and presents experimental results verifying the algorithm's efficiency.
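The abstract does not spell out the classification rules, so the following is only a minimal lexicon-based sketch of how reviews might be labeled positive or negative and then summarized. The word lists and the one-token negation rule are hypothetical illustrations, not the author's opinion lexicon or algorithm, and the example reviews are in English rather than Korean for readability.

```python
# Hypothetical toy lexicons; the paper's actual opinion lexicon is not given.
POSITIVE = {"good", "great", "excellent", "fast", "satisfied", "recommend"}
NEGATIVE = {"bad", "poor", "slow", "broken", "disappointed", "refund"}
NEGATORS = {"not", "never", "no"}


def polarity(review: str) -> str:
    """Label a review 'positive', 'negative' or 'neutral' by summing lexicon hits."""
    tokens = [t.strip(".,!?") for t in review.lower().split()]
    score = 0
    for i, word in enumerate(tokens):
        value = 1 if word in POSITIVE else -1 if word in NEGATIVE else 0
        # Flip the sign when the immediately preceding token is a simple negator.
        if value and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(reviews):
    """Count how many reviews fall into each polarity class."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for review in reviews:
        counts[polarity(review)] += 1
    return counts


print(summarize(["Great product, fast delivery!", "Not good, very slow.", "It arrived on Monday"]))
```

Summing plus-or-minus-one hits is the simplest possible scoring rule; a system for Korean reviews would presumably also need morphological analysis and a much larger, domain-specific lexicon.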


The Implementation of eCRM Solution for Design Development (디자인개발을 위한 eCRM솔루션구현)

  • 홍정표;양종열;이유리;오민권
    • Archives of design research / v.15 no.3 / pp.271-280 / 2002
  • Information technology and the Internet have recently made startling progress. In this environment, strategies and marketing based on existing offline channels are becoming less effective at improving business competitiveness, and digital networking between companies and consumers is bringing many changes to how consumer information is managed and how marketing is carried out. These developments and changes also diversify the methodology of design research. Therefore, considering design, social, and environmental aspects, this study first developed a consumer response framework for product design based on Bloch (1995), namely a relationship model among preference degree, design image adjectives, and design factors, and then built a design information extraction solution combined with IT-based real-time interaction as an eCRM system. The proposed solution will give product designers useful information for finding the design factors that consumers prefer.


Perceptions of Residents in Relation to Smartphone Applications to Promote Understanding of Radiation Exposure after the Fukushima Accident: A Cross-Sectional Study within and outside Fukushima Prefecture

  • Kuroda, Yujiro;Goto, Jun;Yoshida, Hiroko;Takahashi, Takeshi
    • Journal of Radiation Protection and Research / v.47 no.2 / pp.67-76 / 2022
  • Background: We conducted a cross-sectional study of residents within and outside Fukushima Prefecture to clarify their perceptions of the need for smartphone applications (apps) that explain exposure doses. The results will help researchers and municipalities identify target groups for future app development more effectively, which will promote residents' understanding of radiological situations. Materials and Methods: In November 2019, 400 people in Fukushima Prefecture and 400 people outside it were surveyed via a web-based questionnaire. In addition to basic characteristics, survey items included concerns about radiation levels and intention to use a smartphone app to keep track of exposure. Responses were stratified by region, and concerns about radiation levels and intention to use an app were cross-tabulated by demographic variables. The intention to use an app was analyzed by binomial logistic regression, and text-mining analyses were conducted in the KH Coder software. Results and Discussion: Outside Fukushima Prefecture, concerns about the medical exposure of women to radiation exceeded 30%. Within the prefecture, the main concerns were the medical exposure of women, purchasing food products, and consumption of home-grown food. Within the prefecture, having children under the age of 18, experience of measurement, and experience of evacuation were significantly related to the intention to use an app. Conclusion: Regional and individual differences were evident. Because respondents' needs differ, apps should be developed and promoted according to those needs and to the phase of reconstruction. We expect that a suitable app will not only collect data but also connect local service providers and residents while protecting personal information.

Analysis of Text Mining of Consumer's Personality Implication Words in Review of Used Transaction Application <Daangn Market> (중고거래 어플리케이션 <당근마켓> 리뷰텍스트에 나타난 소비자의 인성 함축단어 텍스트마이닝 분석)

  • Jung, Yea-Rin;Ju, Young-Ae
    • The Journal of the Korea Contents Association / v.21 no.11 / pp.1-10 / 2021
  • This study analyzes the use and meaning of words implying consumer personality in the review texts of the used-goods trading application <Daangn Market>. As of May 2021, reviews from the preceding six months in Seoul and Gyeonggi Province were collected with our web crawler; 1,368 cases were first collected by random sampling, and 570 cases remained after preprocessing. The results are as follows. First, 48.2% of review texts were related to consumers' personality even though the application is a commercial platform for products. Second, the review texts were mainly positive and formed a text network structure centered on the keyword 'gratitude'. Third, review texts implying consumer character fell into two groups, 'extroverted' and 'introverted' consumer personalities, and the traits of the two groups worked together on the platform. In conclusion, we suggest that consumer personality plays an important role in the platform transaction process, that it will play a role in the platform's services in the future, and that it should be studied from various perspectives.

Analysis of media trends related to spent nuclear fuel treatment technology using text mining techniques (텍스트마이닝 기법을 활용한 사용후핵연료 건식처리기술 관련 언론 동향 분석)

  • Jeong, Ji-Song;Kim, Ho-Dong
    • Journal of Intelligence and Information Systems / v.27 no.2 / pp.33-54 / 2021
  • With the fourth industrial revolution and the arrival of the New Normal era due to COVID-19, non-contact technologies such as artificial intelligence and big data research have become increasingly important. Convergence research is being conducted in earnest to keep up with these trends, but few studies in nuclear research have used artificial intelligence and big data techniques such as natural language processing and text mining. This study was conducted to confirm the applicability of data science analysis techniques to nuclear research. Identifying trends in the perception of spent nuclear fuel is also critical for setting the direction of nuclear industry policy and for responding in advance to policy changes. For these reasons, this study analyzed media trends concerning pyroprocessing, a spent nuclear fuel treatment technology, applying text mining techniques to objectively analyze changes in media perception of spent nuclear fuel dry treatment technology. To track changes in perception over time, Naver web news articles containing the keywords "Pyroprocessing" and "Sodium Cooled Reactor" were collected with Python code. The analysis period ran from 2007, when the first article was published, to 2020, and the text data were analyzed in detail and at multiple levels using word clouds based on frequency analysis, TF-IDF, and degree centrality. Keyword frequency analysis showed that media perception of spent nuclear fuel dry treatment technology changed in the mid-2010s, influenced by the 2016 Gyeongju earthquake and the new government's energy transition policy in 2017.
Trend analysis was therefore conducted for each of these periods, deriving word frequencies, TF-IDF and degree centrality values, and semantic network graphs. The results show that before the 2010s, media perception of spent nuclear fuel dry treatment technology was diplomatic and positive. Over time, however, keywords such as "safety", "reexamination", "disposal", and "disassembly" became more frequent, indicating that the sustainability of the technology is being seriously questioned. Social awareness also changed as the technology, previously regarded as political and diplomatic, became ambiguous in status owing to changes in domestic policy. This means that domestic policy changes, such as nuclear power policy, affect media perception more than issues of "spent nuclear fuel processing technology" itself, apparently because nuclear policy is more widely discussed and more accessible to the public than spent nuclear fuel. Therefore, to improve social awareness of spent nuclear fuel processing technology, sufficient information about it should be provided, and linking it to nuclear policy issues would also help. The study also highlights the importance of social science research on nuclear power: social science approaches should be applied widely to nuclear engineering, and taking national policy changes into account could help keep the nuclear industry sustainable. However, this study is limited in that it applied big data analysis only to a narrow research area, "Pyroprocessing", a spent nuclear fuel dry processing technology. Furthermore, it established no clear basis for the cause of the change in social perception, and only news articles were analyzed to gauge that perception.
In future work, a media trend analysis covering nuclear power more broadly is expected to produce more reliable results that can be used efficiently in nuclear policy research. The academic significance of this study is that it confirms the applicability of data science techniques such as natural language processing and text mining to nuclear research, a field in which few such studies have been conducted even as non-contact technologies such as artificial intelligence and big data are developing rapidly in the COVID-19 New Normal era. Furthermore, because current government energy policies such as the reduction of nuclear power plants have prompted a re-evaluation of spent fuel treatment research, analyzing key keywords in the field can help orient future research. It is important to consider outside views, not just the safety technology and engineering integrity of nuclear power, and to reconsider whether nuclear engineering technology should be discussed only within the field. In addition, multidisciplinary research on nuclear power could yield reasonable alternatives for maintaining the nuclear industry.
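The abstract names frequency analysis, TF-IDF, and degree centrality as its main measures. As a rough illustration only, the sketch below computes TF-IDF and degree centrality on a few tokenized toy "articles". It assumes the articles are already tokenized, uses plain tf × log(N/df) weighting, and links two terms whenever they occur in the same article; none of these choices are confirmed by the study, which used its own Python code on Naver news data.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document,
    using term frequency normalized by document length times log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights


def degree_centrality(docs):
    """Normalized degree centrality in a co-occurrence network where two terms
    are connected if they appear together in at least one document."""
    neighbors = {}
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    return {term: len(nbrs) / (n - 1) for term, nbrs in neighbors.items()}


# Hypothetical toy articles, not data from the study.
docs = [
    ["pyroprocessing", "diplomacy", "agreement"],
    ["pyroprocessing", "safety", "reexamination"],
    ["pyroprocessing", "safety", "disposal"],
]
weights = tf_idf(docs)
centrality = degree_centrality(docs)
print(sorted(centrality.items(), key=lambda kv: -kv[1])[:2])
# [('pyroprocessing', 1.0), ('safety', 0.6)]
```

A term that appears in every article (here "pyroprocessing") gets a TF-IDF weight of zero yet the highest centrality, which is one reason the two measures are usually reported side by side.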

An Improvement in K-NN Graph Construction using re-grouping with Locality Sensitive Hashing on MapReduce (MapReduce 환경에서 재그룹핑을 이용한 Locality Sensitive Hashing 기반의 K-Nearest Neighbor 그래프 생성 알고리즘의 개선)

  • Lee, Inhoe;Oh, Hyesung;Kim, Hyoung-Joo
    • KIISE Transactions on Computing Practices / v.21 no.11 / pp.681-688 / 2015
  • Constructing a k-nearest-neighbor (k-NN) graph is an important operation with many web-related applications in data mining and machine learning, including collaborative filtering and similarity search. Despite its many elegant properties, the brute-force k-NN graph construction method has a computational complexity of $O(n^2)$, which is prohibitive for large-scale data sets. Thus, the (key, value)-based distributed framework MapReduce is increasingly used together with Locality Sensitive Hashing (LSH), which is efficient for high-dimensional and sparse data. Following a two-stage strategy, we use LSH to divide users into small subsets and then compute the similarity between pairs within each subset by brute force on MapReduce. The candidate-group generation stage is especially important because the following step performs brute-force computation within each group; however, existing methods do not prevent large candidate groups from forming. In this paper, we propose an efficient algorithm for approximate k-NN graph construction that regroups candidate groups. Experimental results show that our approach is more effective than existing methods in terms of graph accuracy and scan rate.
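The two-stage strategy described above can be sketched on a single machine as follows. This assumes random-hyperplane LSH for cosine similarity, which is one common LSH family; the paper's exact hash family, its MapReduce job layout, and its regrouping step are not reproduced, and the comments only indicate where the map and reduce steps would sit.

```python
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lsh_groups(vectors, num_planes, seed=0):
    """Stage 1: hash every vector to a bit signature with random hyperplanes.
    Vectors sharing a signature form one candidate group. On MapReduce this
    would be the map step, emitting (signature, vector id) pairs."""
    rng = random.Random(seed)
    dim = len(next(iter(vectors.values())))
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]
    groups = defaultdict(list)
    for key, vec in vectors.items():
        signature = tuple(sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in planes)
        groups[signature].append(key)
    return list(groups.values())


def approx_knn_graph(vectors, k, num_planes=4, seed=0):
    """Stage 2: brute-force similarity only among members of the same group.
    On MapReduce this would run in the reduce step, one group per key."""
    graph = {}
    for group in lsh_groups(vectors, num_planes, seed):
        for u in group:
            scored = [(cosine(vectors[u], vectors[v]), v) for v in group if v != u]
            scored.sort(reverse=True)  # most similar first
            graph[u] = [v for _, v in scored[:k]]
    return graph


vectors = {
    "a": [1.0, 0.0, 0.0],
    "a2": [2.0, 0.0, 0.0],  # same direction as "a", so it always lands in a's group
    "b": [0.0, 1.0, 0.0],
    "c": [0.0, 0.0, 1.0],
}
graph = approx_knn_graph(vectors, k=1)
print(graph["a"], graph["a2"])
```

Group sizes depend on how the random planes happen to split the data; a single oversized group brings the quadratic cost back inside that group, which is the problem the paper's regrouping step is aimed at.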

A Study on the Improvement of Recommendation Accuracy by Using Category Association Rule Mining (카테고리 연관 규칙 마이닝을 활용한 추천 정확도 향상 기법)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.27-42 / 2020
  • Traditional companies with offline stores could not secure large display spaces because of cost. This limitation inevitably restricted the kinds of products displayed on shelves, depriving consumers of the opportunity to experience various items. Taking advantage of the virtual space of the Internet, online shopping overcomes the physical space limitations of offline shopping and can display numerous products on web pages to satisfy consumers with a variety of needs. Paradoxically, however, this can also make it difficult for consumers to compare and evaluate too many alternatives during purchase decision-making. To address this side effect, various consumer purchase decision support systems have been studied, such as keyword-based item search services and recommender systems. These systems can reduce the time spent searching for items, keep consumers from leaving while browsing, and increase the seller's sales. Among them, recommender systems based on association rule mining can effectively detect interrelated products from transaction data such as orders. The associations between products obtained by statistical analysis offer clues for predicting how interested consumers will be in another product. However, because the algorithm is based on transaction counts, products that have not yet sold enough in the early days after launch may be left out of the recommendation list even though they are highly likely to sell. Such omitted items may not get enough exposure to consumers to record sufficient sales and can then fall into a vicious cycle of declining sales and omission from the recommendation list.
This is an inevitable outcome when recommendations are based on past transaction histories rather than on an assessment of potential future sales. This study started from the idea that reflecting an indirect indicator of this potential would help select products worth recommending. Because a product's attributes affect consumers' purchasing decisions, this study incorporated them into the recommender system. In other words, consumers who visit a product page have shown interest in that product's attributes and would also be interested in other products with the same attributes. On this assumption, the recommender system can use these attributes to select products with a higher expected acceptance rate. Since category is one of a product's main attributes, it can be a good indicator not only of direct associations between two items but also of potential associations that have yet to be revealed. Based on this idea, the study devised a recommender system that reflects associations between categories as well as between products. Through regression analysis, the two kinds of associations were combined into a model that predicts the hit rate of a recommendation. To evaluate the performance of the proposed model, another regression model was developed based only on associations between products. The comparative experiments were designed to resemble the environment in which online shopping malls actually recommend products. First, association rules for all possible combinations of antecedent and consequent items were generated from the order data. Then, the hit rate of each association rule was predicted from the support and confidence values calculated by each model.
The comparative experiments using order data collected from an online shopping mall show that the recommendation accuracy can be improved by further reflecting not only the association between products but also categories in the recommendation of related products. The proposed model showed a 2 to 3 percent improvement in hit rates compared to the existing model. From a practical point of view, it is expected to have a positive effect on improving consumers' purchasing satisfaction and increasing sellers' sales.
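The two inputs the study combines, item-level and category-level association rules, can be illustrated with the standard definitions of support and confidence. In the sketch below the orders, product names, and category mapping are hypothetical, and the regression model that the study fits on top of these values to predict hit rates is not reproduced.

```python
from collections import Counter
from itertools import permutations


def pair_rules(transactions):
    """Support and confidence of every rule X -> Y for two items bought together.

    support(X -> Y)    = (#transactions containing X and Y) / (#transactions)
    confidence(X -> Y) = (#transactions containing X and Y) / (#transactions containing X)
    """
    n = len(transactions)
    item_count, pair_count = Counter(), Counter()
    for t in transactions:
        items = set(t)
        item_count.update(items)
        pair_count.update(permutations(sorted(items), 2))  # ordered (X, Y), X != Y
    return {
        (x, y): {"support": c / n, "confidence": c / item_count[x]}
        for (x, y), c in pair_count.items()
    }


def to_categories(orders, category_of):
    """Replace each item by its category so rules can be mined at category level."""
    return [{category_of[item] for item in order} for order in orders]


# Hypothetical order data and category mapping.
orders = [["mouse", "keyboard"], ["mouse", "pad"], ["mouse", "keyboard", "pad"], ["monitor"]]
category_of = {"mouse": "input", "keyboard": "input", "pad": "accessory",
               "monitor": "display", "wrist_rest": "accessory"}

item_rules = pair_rules(orders)
category_rules = pair_rules(to_categories(orders, category_of))

# A newly launched "wrist_rest" has no item-level rule yet, but its category does.
print(("mouse", "wrist_rest") in item_rules)                 # False
print(category_rules[("input", "accessory")]["confidence"])  # 0.6666666666666666
```

This mirrors the paper's motivation: an unsold item contributes nothing at the item level, while its category still carries an association signal that a combined model can use.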

Sentiment Analyses of the Impacts of Online Experience Subjectivity on Customer Satisfaction (감성분석을 이용한 온라인 체험 내 비정형데이터의 주관도가 고객만족에 미치는 영향 분석)

  • Yeeun Seo;Sang-Yong Tom Lee
    • Information Systems Review / v.25 no.1 / pp.233-255 / 2023
  • The development of information technology (IT) has brought about so-called "online experiences" that satisfy our daily needs, and the market for online experiences grew further during the COVID-19 pandemic. This study therefore analyzes how the features of online experience services affect customer satisfaction, using structured and unstructured data crawled from the online experience website that Airbnb newly launched after COVID-19. The analysis found that structured data generated by service users on a C2C online sharing platform had a positive effect on other users' satisfaction. In addition, unstructured text generated by service providers, such as experience introductions and host introductions, had different subjectivity scores depending on the purpose of the text. A subjective host introduction and an objective experience introduction were both confirmed to affect customer satisfaction positively. The results offer various implications for stakeholders of online sharing economy platforms and for researchers interested in online experience knowledge management.
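The abstract compares texts by subjectivity score without saying how the scores were computed. Libraries such as TextBlob report subjectivity on a scale from 0.0 (very objective) to 1.0 (very subjective); the sketch below imitates that idea with a deliberately tiny, hypothetical cue-word list, scoring a text as the share of its tokens that are subjective cues. It is an illustration of the concept, not the study's measure.

```python
# Hypothetical cue words; a real subjectivity lexicon is far larger and weights each word.
SUBJECTIVE_CUES = {"i", "my", "love", "amazing", "wonderful", "favorite", "passionate", "believe"}


def subjectivity(text: str) -> float:
    """Share of tokens that are subjective cue words, in [0.0, 1.0]."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]  # drop tokens that were pure punctuation
    if not tokens:
        return 0.0
    return sum(t in SUBJECTIVE_CUES for t in tokens) / len(tokens)


host_intro = "I love sharing my favorite recipes!"
experience_intro = "The class lasts 90 minutes and covers three dishes."
print(subjectivity(host_intro), subjectivity(experience_intro))
```

Under this toy scoring the personal host introduction comes out far more subjective than the factual experience description, which is the kind of contrast the study relates to customer satisfaction.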

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.21-44 / 2018
  • In recent years, the rapid development of Internet technology and the popularization of smart devices have produced massive amounts of text data, distributed through media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization, which has drawn many researchers to the problem of managing it. The problem also calls for professionals capable of classifying relevant information, which is where text classification comes in. Text classification, which assigns a text document to one or more predefined categories or classes, is a challenging task in modern data analysis. Available techniques include K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, with huge amounts of text data, model performance and accuracy become a challenge, and the performance of a text classification model can vary with the type of words used in the corpus and the type of features created for classification. Most attempts at improvement have proposed a new algorithm or modified an existing one, and such research may already have reached its limits. In this study, instead of proposing or modifying an algorithm, we focus on changing how the data are used. It is widely known that classifier performance depends on the quality of the training data on which the classifier is built. Real-world datasets usually contain noise, that is, noisy data, which can affect the decisions made by classifiers built from them.
In this study, we consider that data from different domains, that is, heterogeneous data, may have noise-like characteristics that can be exploited in the classification process. Machine learning algorithms build classifiers on the assumption that the training data and target data have the same or very similar characteristics. For unstructured data such as text, however, the features are determined by the vocabulary contained in the documents, so if the training data and target data come from different viewpoints, their features may differ. We therefore attempt to improve classification accuracy by strengthening the robustness of the document classifier, artificially injecting noise into the process of constructing it. Data coming from various sources are likely to be formatted differently, which causes difficulties for traditional machine learning algorithms because they are not designed to recognize different data representations at once and generalize over them together. To use heterogeneous data in training the document classifier, we apply semi-supervised learning. However, unlabeled data can degrade the performance of the document classifier, so we further propose the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA), which selects only the documents that contribute to improving the classifier's accuracy. RSESLA creates multiple views by manipulating the features with different types of classification models and different types of heterogeneous data, and the most confident classification rules are selected and applied for the final decision.
In this paper, three types of real-world data sources were used: news, Twitter, and blogs.
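RSESLA is not described here in enough detail to reproduce. As a point of reference, the sketch below shows plain confidence-threshold self-training, the basic semi-supervised loop that methods like RSESLA refine: a classifier trained on labeled documents pseudo-labels unlabeled documents and keeps only predictions above a confidence threshold. The multinomial Naive Bayes classifier, the 0.8 threshold, and the toy documents are all assumptions; RSESLA's multiple views and rule selection are not modeled.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing over whitespace tokens."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.word_counts = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(doc.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_proba(self, doc):
        """Posterior class probabilities; unseen words are ignored."""
        v = len(self.vocab)
        log_scores = {}
        for y in self.classes:
            total = sum(self.word_counts[y].values())
            score = math.log(self.prior[y] / self.n)
            for w in doc.split():
                if w in self.vocab:
                    score += math.log((self.word_counts[y][w] + 1) / (total + v))
            log_scores[y] = score
        top = max(log_scores.values())  # subtract the max for numerical stability
        exp = {y: math.exp(s - top) for y, s in log_scores.items()}
        z = sum(exp.values())
        return {y: e / z for y, e in exp.items()}


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Repeatedly add confidently pseudo-labeled documents and retrain.
    Returns the final model and the documents that were never confident enough."""
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    model = NaiveBayes().fit(docs, labels)
    for _ in range(max_rounds):
        confident, rest = [], []
        for doc in pool:
            y, p = max(model.predict_proba(doc).items(), key=lambda kv: kv[1])
            (confident if p >= threshold else rest).append((doc, y))
        if not confident:
            break
        docs += [d for d, _ in confident]
        labels += [y for _, y in confident]
        pool = [d for d, _ in rest]
        model = NaiveBayes().fit(docs, labels)
    return model, pool


labeled = [
    ("stock market rally", "economy"),
    ("interest rate hike", "economy"),
    ("league final goal", "sports"),
    ("coach transfer season", "sports"),
]
unlabeled = ["market rate rally", "goal season final"]
model, remaining = self_train(labeled, unlabeled, threshold=0.8)
print(remaining)  # [] : both documents were pseudo-labeled in the first round
```

The threshold is the lever that guards against the degradation the abstract mentions: documents the model is unsure about stay unlabeled instead of reinforcing its mistakes.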

Re-ranking the Results from Two Image Retrieval System in Cooperative Manner (두 영상검색 시스템의 협력적 이용을 통한 재순위화)

  • Hwang, Joong-Won;Kim, Hyunwoo;Kim, Junmo
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.1 / pp.7-15 / 2014
  • Image retrieval has become a major part of computer vision and data mining. Although commercial image retrieval systems such as Google show great performance, further improvements are constantly in demand because of the rapid growth of data on the web. To meet this demand, many re-ranking methods have been proposed that improve performance by reordering retrieved results with independent algorithms. Conventional re-ranking algorithms assume that visual patterns are not used in the initial image retrieval stage. However, current image search engines have begun to use visual features, so this assumption needs to be reconsidered. Also, although integrating multiple retrieval systems could plausibly improve overall performance, the topic has not been researched sufficiently. In this paper, we set up a condition under which the ranking result cannot be improved by any means other than cooperation between the two systems. We evaluate the algorithm on a toy model and show that the proposed module can improve the retrieval results.
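The abstract does not describe how the two systems cooperate. For comparison, a common baseline for merging the ranked outputs of two retrieval systems is reciprocal rank fusion (RRF), in which each result earns 1/(k + rank) from every list it appears in. The sketch below is that generic baseline, not the authors' re-ranking module; the image IDs are hypothetical, and k = 60 is the constant conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank of d).
    Ties are broken alphabetically so the output is deterministic."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


system_a = ["img3", "img1", "img7"]  # hypothetical results from the first system
system_b = ["img1", "img9", "img3"]  # hypothetical results from the second system
print(reciprocal_rank_fusion([system_a, system_b]))
# ['img1', 'img3', 'img9', 'img7']
```

Items returned by both systems (img1, img3) rise above items that only one system found, which is the basic intuition behind using two retrieval systems cooperatively.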