• Title/Summary/Keyword: Web data mining


A study on the User Experience at Unmanned Checkout Counter Using Big Data Analysis (빅데이터 분석을 통한 무인계산대 사용자 경험에 관한 연구)

  • Kim, Ae-sook;Jung, Sun-mi;Ryu, Gi-hwan;Kim, Hee-young
    • The Journal of the Convergence on Culture Technology / v.8 no.2 / pp.343-348 / 2022
  • This study analyzes consumers' perceived user experience of unmanned checkout counters using SNS big data. Blogs, news, Q&A posts (Knowledge iN), cafes, tips, and web documents on Naver and Daum were analyzed, with 'unmanned checkout counter' as the search keyword. The analysis period covered two years, from January 1, 2020 to December 31, 2021. For data collection and analysis, frequency and matrix data were extracted through Textom, and network analysis and visualization were conducted using the NetDraw function of the UCINET 6 program. As a result, perceptions of the checkout counter were clustered into accessibility, usability, continuous use intention, and others, according to the definition of consumers' experience factors. From a supplier's point of view, if unmanned checkout counters spread indiscriminately as a response to minimum wage increases and shorter working hours, a bigger employment problem will arise from a social point of view. In addition, institutionalization is needed to supply easy and convenient unmanned checkout counters for the elderly, younger generations, children, and foreigners who are not familiar with unmanned checkout.

Sentiment Analysis of Product Reviews to Identify Deceptive Rating Information in Social Media: A SentiDeceptive Approach

  • Marwat, M. Irfan;Khan, Javed Ali;Alshehri, Dr. Mohammad Dahman;Ali, Muhammad Asghar;Hizbullah;Ali, Haider;Assam, Muhammad
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.3 / pp.830-860 / 2022
  • [Introduction] Many companies are shifting their businesses online because customers increasingly prefer to purchase products online. [Problem] Users share a vast amount of information about products, which makes it difficult for end-users to make decisions. [Motivation] Therefore, we need a mechanism that automatically analyzes end-user opinions, thoughts, or feelings about products on social media platforms, which might help customers make or change their purchasing decisions. [Proposed Solution] For this purpose, we propose an automated SentiDeceptive approach, which classifies end-user reviews into negative, positive, and neutral sentiments and identifies deceptive crowd-user rating information on social media platforms to help users in decision-making. [Methodology] We first collected 11781 end-user comments from the Amazon store and the Flipkart web application covering distinct products, such as watches, mobiles, shoes, clothes, and perfumes. Next, we developed a coding guideline used as a base for the comment annotation process. We then applied the content analysis approach and the existing VADER library to annotate the end-user comments in the data set with the identified codes, which results in a labelled data set used as input to the machine learning classifiers. Finally, we applied the sentiment analysis approach to identify end-user opinions and overcome deceptive rating information on social media platforms. This involved preprocessing the input data to remove irrelevant data (stop words, special characters, etc.), employing two standard resampling approaches (oversampling and under-sampling) to balance the data set, extracting different features (TF-IDF and BOW) from the textual data, and then training and testing the machine learning algorithms with standard cross-validation approaches (KFold and Shuffle Split). [Results/Outcomes] Furthermore, to support our research study, we developed an automated tool that analyzes each customer's feedback and displays the collective sentiments of customers about a specific product in a graph, which helps customers make decisions. In a nutshell, our proposed sentiment approach produces good results when identifying customer sentiments from online user feedback, i.e., it obtained an average of 94.01% precision, 93.69% recall, and 93.81% F-measure for classifying positive sentiments.
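The feature extraction and cross-validation steps named in this abstract (TF-IDF and KFold) can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the toy reviews, the function names `tf_idf` and `k_fold_indices`, the TF-IDF variant (relative term frequency times `log(N / df)`), and the unshuffled contiguous folds are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenised, non-empty document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def k_fold_indices(n_items, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        yield train, test
        start += size


# Hypothetical, already-tokenised reviews (not data from the study).
reviews = [
    ["good", "watch"],
    ["bad", "watch"],
    ["good", "shoes"],
    ["bad", "perfume"],
]
vectors = tf_idf(reviews)
folds = list(k_fold_indices(len(reviews), k=2))
print(vectors[2])
print(folds)
```

Terms that occur in fewer reviews (here "shoes") receive a larger weight than terms shared across reviews (here "good"), which is the property a classifier trained on each fold's training indices would rely on.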

Weighted Subject - Method Network Analysis of Library and Information Science Studies (문헌정보학 분야 핵심 학술지들의 가중 주제-방법 네트워크 분석)

  • Lee, Keehoen;Jung, Hyojung;Song, Min
    • Journal of the Korean Society for Library and Information Science / v.49 no.3 / pp.457-488 / 2015
  • In this study, we analyzed the current state of research in Library and Information Science in the top 20 journals from 1990 to 2015, from subject and method perspectives. We developed a weighted subject-method network to investigate the centralities of subjects and methods as well as their relations. This network is composed of subject nodes and method nodes and assigns a weight to each node by topic occurrence. As a result, over the 25 years, management information systems, information need analysis, bibliometrics, and information policy were the top topics. Modeling, literature review, scientific research impact analysis, and web data analysis were the top methods. A recent rise of text mining is highlighted. We also analyzed communities formed over the past 25 years and the most recent 5 years. Bibliometrics is extending its field by applying various network analysis algorithms. Text mining is specialized in medical information systems and user interfaces. This result identifies the interests of excellent studies in Library and Information Science. It can also serve as a fundamental resource for the development of Library and Information Science.

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.1-13 / 2015
  • As opinion mining in big data applications has been highlighted, a lot of research on unstructured data has been carried out. Social media on the Internet generate unstructured or semi-structured data every second, and these data are often written in the natural human languages we use in daily life. Many words in human languages have multiple meanings or senses. As a result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, which can produce incorrect results that are far from users' intentions. Even though much progress has been made over recent years in enhancing the performance of search engines to provide users with appropriate results, there is still much room for improvement. Word sense disambiguation can play a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based approaches. This paper presents a method that automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries, and thereby avoids expensive sense tagging processes. It evaluates the effectiveness of the method with the Naïve Bayes model, which is one of the supervised learning algorithms, using the Korean standard unabridged dictionary and the Sejong Corpus. The Korean standard unabridged dictionary has approximately 57,000 sentences. The Sejong Corpus has about 790,000 sentences tagged with both part-of-speech and senses. For the experiment of this study, the Korean standard unabridged dictionary and the Sejong Corpus were tested both in combination and as separate entities using cross validation. Only nouns, the target subjects in word sense disambiguation, were selected. 93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were additionally combined in the corpus. The Sejong Corpus was easily merged with the Korean standard unabridged dictionary because the Sejong Corpus was tagged based on sense indices defined by the Korean standard unabridged dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating sense vectors were added to the named entity dictionary of a Korean morphological analyzer. By using the extended named entity dictionary, term vectors were extracted from the input sentences. Given the extracted term vector and the sense vector model made during the pre-processing stage, the sense-tagged terms were determined by vector space model based word sense disambiguation. In addition, this study shows the effectiveness of the merged corpus built from examples in the Korean standard unabridged dictionary and the Sejong Corpus. The experiment shows that the merged corpus yields better precision and recall. This study suggests that the method can practically enhance the performance of internet search engines and help us understand the meaning of a sentence more accurately in natural language processing pertinent to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem. The Naïve Bayes classifier assumes that all senses are independent. Even though this assumption is not realistic and ignores the correlation between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research needs to be carried out to consider all possible combinations and/or partial combinations of all senses in a sentence. Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
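The supervised step described in this abstract, a Naïve Bayes classifier trained on sense-tagged example sentences, can be sketched as follows. This is a generic multinomial Naïve Bayes with add-one (Laplace) smoothing, not the authors' implementation: the English toy examples for "bank", the sense labels, and the function names `train` and `predict` are hypothetical, and the Korean morphological analysis and sense-vector construction from the paper are not reproduced.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (sense, context_words) pairs from a sense-tagged corpus."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, words in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab


def predict(model, context):
    """Return the sense with the highest log posterior for the context words."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, count in sense_counts.items():
        score = math.log(count / total)  # log prior of the sense
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context:
            # Add-one smoothing keeps unseen words from zeroing the product.
            score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical dictionary-style examples for the ambiguous noun "bank".
examples = [
    ("bank/finance", ["deposit", "money", "account"]),
    ("bank/finance", ["loan", "money", "interest"]),
    ("bank/river", ["river", "water", "shore"]),
    ("bank/river", ["fishing", "river", "mud"]),
]
model = train(examples)
print(predict(model, ["money", "loan"]))
print(predict(model, ["river", "water"]))
```

Each context word contributes its own log-probability term independently of the others, which is the simplifying independence assumption the abstract calls unrealistic but effective in practice.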

Challenges in Construction of Omics data integration, and its standardization (농생명 오믹스데이터 통합 및 표준화)

  • Kim, Do-Wan;Lee, Tae-Ho;Kim, Chang-Kug;Seol, Young-Joo;Lee, Dong-Jun;Oh, Jae-Hyeon;Beak, Jung-Ho;Kim, Juna;Lee, Hong-Ro
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.05a / pp.768-770 / 2015
  • We integrated and standardized agriculture-related omics data. This requires advanced computational methods and bioinformatics infrastructure for integration, standardization, mining, and analysis, which make biological knowledge easier to find. We enabled the registration of raw and processed data in NABIC (National Agricultural Biotechnology Information Center), and the processed analysis results were offered to related researchers. We also provided various analysis pipelines: NGS analysis (reference assembly, RNA-seq), GWAS, and microbial community analysis. In addition, our system was based on designing and building quality assurance into the management of the omics information system, and we constructed the infrastructure for using the omics analysis system. We made major quality improvements to the omics information system. The first is improved quality of the registration categories for omics-based information. The second is data processing and a development platform for the web UI for related omics data. The third is the development of proprietary management information for the omics registration database. The fourth is the management and development of the statistics module producers for omics data. The last is improvement of the standard upload/download module for large omics registration information.


Informatics analysis of consumer reviews for 「Frozen 2」 fashion collaboration products - Semantic networks and sentiment analysis - (「겨울왕국2」의 콜라보레이션 패션제품에 대한 소비자 리뷰 - 의미 네트워크와 감성분석 -)

  • Choi, Yeong-Hyeon;Lee, Kyu-Hye
    • The Research Journal of the Costume Culture / v.28 no.2 / pp.265-284 / 2020
  • This study aimed to analyze the performance of Disney-collaborated fashion lines based on online consumer reviews. To do so, the researchers employed text mining and network analysis to identify key words in the reviews of these products. Blogs, internet cafes, and web documents provided by Naver, Daum, and YouTube were selected as subjects for the analysis. The analysis period was limited to one year after for the 2019. Data collection and analysis were conducted using Python 3.7, Textom, and NodeXL. The search terms were 'Disney fashion collaboration' and 'Frozen fashion collaboration'. Preliminary survey results indicated that 'Elsa's dress' was the most frequently mentioned term and that the domestic fashion brand Eland Retail was the most active in selling Disney branded clothing through its own brand. The writers of reviews for Disney-collaborated fashion products were primarily mothers with daughters. Their decision to purchase these products was based on the following factors: price, size, stability of decoration, shipping, laundry, and retailer. The motives for purchasing the product were the positive response of the consumer's child and the satisfaction of the parents due to the child's response. The problems to be solved included insufficient supply, delayed delivery, high prices considering how few times children's clothes are worn, poor glitter decoration, faded color, contamination from laundry, and undesirable smells immediately after purchase.

A Usage Pattern Analysis of the Academic Database Using Social Network Analysis in K University Library (사회 네트워크 분석에 기반한 도서관 학술DB 이용 패턴 연구: K대학도서관 학술DB 이용 사례)

  • Choi, Il-Young;Lee, Yong-Sung;Kim, Jae-Kyeong
    • Journal of the Korean Society for Information Management / v.27 no.1 / pp.25-40 / 2010
  • The purpose of this study is to analyze usage patterns among academic databases through social network analysis, and to support academic databases suited to users' needs. For this purpose, we extracted log data from the proxy server of K university library to construct academic database networks, and analyzed the usage patterns by research area and by social position. Our results indicate that academic databases specialized for a research area show more cohesion than generalized academic databases in the full-time professors' network and the doctoral students' network, and that the density, degree centrality, and degree centralization of the full-time professors' network and the doctoral students' network are higher than those of the other social position networks.
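The three network measures reported in this abstract (density, degree centrality, and degree centralization) have standard definitions for an undirected, unweighted graph, sketched below. The database names and edges are hypothetical, and the abstract does not state how edges were derived from the proxy logs, so the toy graph only illustrates the arithmetic; the centralization formula used is Freeman's degree centralization.

```python
def network_stats(nodes, edges):
    """Density, per-node degree centrality, and Freeman degree centralization.

    Assumes an undirected simple graph with at least three nodes.
    """
    n = len(nodes)
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Share of the n(n-1)/2 possible ties that are actually present.
    density = 2 * len(edges) / (n * (n - 1))
    # Degree normalised by the maximum possible degree, n - 1.
    centrality = {node: d / (n - 1) for node, d in degree.items()}
    # Freeman centralization: 1.0 for a star, 0.0 when all degrees are equal.
    max_degree = max(degree.values())
    centralization = sum(max_degree - d for d in degree.values()) / ((n - 1) * (n - 2))
    return density, centrality, centralization


# Hypothetical star-shaped usage network: DB_A is co-used with every other database.
nodes = ["DB_A", "DB_B", "DB_C", "DB_D"]
edges = [("DB_A", "DB_B"), ("DB_A", "DB_C"), ("DB_A", "DB_D")]
density, centrality, centralization = network_stats(nodes, edges)
print(density, centrality, centralization)
```

A star graph is the most centralized shape possible, so it gives a centralization of 1.0 while its density is only 0.5, which shows that the two measures capture different properties of a usage network.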

Comparison of Readability between Documents in the Community Question-Answering (질의응답 커뮤니티에서 문서 간 이독성 비교)

  • Mun, Gil-Seong
    • The Journal of the Korea Contents Association / v.20 no.10 / pp.25-34 / 2020
  • Community question answering services are one of the main sources of information and knowledge on the Web. The quality of information in question and answer documents is determined by the clarity of the question and the relevance of the answers, and the readability of a document is a key factor in evaluating that quality. This study measures the quality of documents used in a community question answering service. For this purpose, we compare the frequency of occurrence by vocabulary level in community documents and measure the readability index of documents by the author's institution. To measure the readability index, we used the Dale-Chall formula, which is calculated from vocabulary level and sentence length. The results show that the vocabulary used in the answers is more difficult than in the questions and that the sentences are longer. A gap in readability between questions and answers is also found across writing institutions. The results of this study can be used as basic data for improving online counseling services.
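As a rough illustration of how a Dale-Chall style index combines vocabulary level and sentence length, the sketch below applies the commonly cited New Dale-Chall coefficients for English text. The abstract does not say how the formula was adapted to the study's documents, so the tiny `easy_words` set (a stand-in for the full familiar-word list), the regular-expression tokenisation, and the sample sentences are assumptions.

```python
import re


def dale_chall(text, easy_words):
    """Readability score from percent of difficult words and mean sentence length."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    difficult = [w for w in words if w not in easy_words]
    pct_difficult = 100 * len(difficult) / len(words)
    avg_sentence_len = len(words) / len(sentences)
    score = 0.1579 * pct_difficult + 0.0496 * avg_sentence_len
    # The adjustment constant applies only when difficult words exceed 5%.
    if pct_difficult > 5:
        score += 3.6365
    return score


# Hypothetical familiar-word list; the real list has about 3,000 entries.
easy_words = {"the", "cat", "sat", "on", "mat", "dog", "ran"}
easy_score = dale_chall("The cat sat on the mat. The dog ran.", easy_words)
hard_score = dale_chall("The physician sat.", easy_words)
print(easy_score, hard_score)
```

Higher scores indicate harder text, so a document with rarer vocabulary or longer sentences, as the study reports for answers relative to questions, would score higher.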

Evaluation of Thyroid Cancer Medical Information Sites using HONCODE (HONCODE를 근거로 한 갑상선암에 대한 의료정보 제공사이트의 질 평가)

  • Heo, Jun;Jung, Yong Gyu;Sihn, Sung Chul;Kim, Jang Il
    • Journal of Service Research and Studies / v.3 no.2 / pp.45-52 / 2013
  • With the development of information and communication technology, the Internet is rapidly growing in social and economic influence, and the field of health care is no exception. As health information on the Internet increases, its availability becomes more important to health care professionals and information specialists. However, health information on the Internet is continually presented without any guarantee or assessment of its quality. Access to quality-assured health information through the Internet needs to be provided. HONCODE was established and is managed by the HON (Health On the Net) Foundation. In this paper, websites that provide domestic medical information on thyroid cancer are evaluated using HONCODE. Through this evaluation, more accurate and vetted information about thyroid cancer could be provided on the Internet.


Statistical Profiles of Users' Interactions with Videos in Large Repositories: Mining of Khan Academy Repository

  • Yassine, Sahar;Kadry, Seifedine;Sicilia, Miguel Angel
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.5 / pp.2101-2121 / 2020
  • The rapid growth of instructional video repositories and their widespread use as a tool to support education have raised the need for studies that assess the quality of those educational resources and their impact on the quality of the learning process that depends on them. The Khan Academy (KA) repository is one of the prominent educational video repositories. It is well known and widely used by different types of learners, students, and teachers. To better understand its characteristics and the impact of such repositories on education, we gathered a large amount of KA data using its API and different web scraping techniques, and then analyzed it. This paper reports the first quantitative and descriptive analysis of the Khan Academy repository (KA repository) of open video lessons. First, we described the structure of the repository. Then, we presented analyses highlighting content-based growth and evolution. Those descriptive analyses highlighted the main findings about the KA repository. Finally, we focused on users' interactions with video lessons. Those interactions consisted of questions and answers posted on videos. We developed interaction profiles for those videos based on the number of users' interactions. We conducted regression analysis and statistical tests to mine the relation between those profiles and some proposed quality-related metrics. The results of the analysis showed that all interaction profiles are highly affected by video length and reuse rate in different subjects. We believe that the study presented in this paper provides valuable information for understanding the logic and the learning mechanism inside learning repositories, which can have major impacts on the education field in general, and particularly on the informal learning process and the instructional design process. This study can be considered one of the first quantitative studies to shed light on Khan Academy as an open educational resources (OER) repository. The results presented in this paper are crucial for understanding the KA video repository, its characteristics, and its impact on education.