• Title/Summary/Keyword: web data mining


Trend Analysis on Clothing Care System of Consumer from Big Data (빅데이터를 통한 소비자의 의복관리방식 트렌드 분석)

  • Koo, Young Seok
    • Fashion & Textile Research Journal / v.22 no.5 / pp.639-649 / 2020
  • This study investigates consumer opinions on clothing care and provides fundamental data to support decision-making in the future development of clothing care systems. Textom, a web-matrix program, was used to analyze big data collected from Naver and Daum with the keyword "clothing care" from March 2019 to February 2020. A total of 22,187 texts were collected. The collected data were analyzed using text mining, network analysis, and CONCOR analysis. The results were as follows. First, frequency analysis showed many keywords related to clothing care, such as Style, Dryer, LG Electronics, Product, Customer, Clothing, and Styler, indicating that consumers were aware of and interested in recent information about clothing care systems. Second, keywords such as product, function, brand, and performance, all fundamentally related to clothing care, were linked to one another. Interest in clothing care products was linked to product brands, which were in turn linked to consumer interest. Third, CONCOR analysis classified keywords with similar attributes in the network into four groups: characteristics of purchase, product, performance, and interest. Lastly, sentiment analysis showed that positive emotions toward clothing care systems, including goodwill, interest, and joy, were strongly expressed.

A Design and Implementation of Intelligent Image Retrieval System using Hybrid Image Metadata (혼합형 이미지 메타데이타를 이용한 지능적 이미지 검색 시스템 설계 및 구현)

  • 홍성용;나연묵
    • Journal of Korea Multimedia Society / v.3 no.3 / pp.209-223 / 2000
  • As the importance and use of multimedia data increase, it becomes necessary to represent and manage multimedia data within database systems. In this paper, we design and implement an image retrieval system that supports efficient management and intelligent retrieval of image data using a concept hierarchy and data mining techniques. Image information is stored in databases using the concept hierarchy. To support intelligent retrieval and efficient web services, the system automatically extracts and stores user information, the user's query information, and the feature data of images. The proposed system integrates user metadata and image metadata to support various retrieval methods on image data.

An Efficient Large Graph Clustering Technique based on Min-Hash (Min-Hash를 이용한 효율적인 대용량 그래프 클러스터링 기법)

  • Lee, Seok-Joo;Min, Jun-Ki
    • Journal of KIISE / v.43 no.3 / pp.380-388 / 2016
  • Graph clustering is widely used to analyze a graph and identify its properties by generating clusters of similar vertices. Recently, large graph data have been generated in diverse applications such as Social Network Services (SNS), the World Wide Web (WWW), and telephone networks. The importance of graph clustering algorithms that process large graph data efficiently has therefore increased. In this paper, we propose a clustering algorithm that generates clusters for large graph data efficiently. The proposed algorithm estimates similarities between clusters in graph data using Min-Hash and constructs clusters according to the computed similarities. In experiments with real-world data sets, we demonstrate the efficiency of the proposed algorithm by comparing it with existing algorithms.
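
The Min-Hash similarity estimate mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it only shows how Min-Hash signatures approximate the Jaccard similarity of two sets. Representing each cluster by a set of vertex IDs, and the names `minhash_signature`, `estimate_jaccard`, and `num_hashes`, are assumptions made for illustration.

```python
import hashlib


def _hash(seed: int, item) -> int:
    # Deterministic 64-bit hash of (seed, item); each seed acts as one hash function.
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=256):
    # For every hash function, keep the smallest hash value over the set.
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    # The fraction of positions where two signatures agree estimates
    # the Jaccard similarity of the underlying sets.
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def exact_jaccard(a, b):
    # Exact Jaccard similarity, used here only to check the estimate.
    return len(a & b) / len(a | b)
```

Comparing fixed-length signatures avoids intersecting large vertex sets directly, which is what makes the estimate attractive for large graphs. More hash functions give a tighter estimate at the cost of longer signatures.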

Internet Information Orientation: The Link to National Competitiveness on Internet

  • Song, In Kuk;Kang, Mingoo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.8 / pp.3028-3039 / 2015
  • Recently, Korea's web index ranked in the top 10 among eighty-six countries, making Korea the only Asian country at the top level. Korea has also led in Internet penetration rate, in terms of both high-speed broadband and wireless Internet. However, these achievements have not guaranteed a comparable national level of effective information use over the Internet. According to the OECD, Korea's national informatization index has remained in the middle of the OECD countries. Despite heightened pressure to improve the effective use of information over the Internet, there has been little research interest or effort to develop an Internet-related framework or to identify Internet capabilities. This study proposes a framework, named "Internet Information Orientation", that illustrates the relationship between Internet capabilities and national competitiveness on the Internet. The research identified specific Internet capabilities, reclassified them based on the research issues presented at the 6th international conference on Internet held in December 2014, and described the research efforts on those issues. As a result, the 16 papers selected as outstanding papers at the conference address the following issues: Wireless Network, Internet of Things, Green Computing, Multimedia Processing, Big Data and Text Mining, Database in Cloud Environment, Business Intelligence, Software Engineering, IT Strategy & Policy, and Social Network Services.

Clustering Representative Annotations for Image Browsing (이미지 브라우징 처리를 위한 전형적인 의미 주석 결합 방법)

  • Zhou, Tie-Hua;Wang, Ling;Lee, Yang-Koo;Ryu, Keun-Ho
    • Proceedings of the Korean Information Science Society Conference / 2010.06c / pp.62-65 / 2010
  • Image annotations allow users to access a large image database with textual queries. However, since the surrounding text of Web images is generally noisy, an efficient image annotation and retrieval system is highly desirable, and this requires effective image search techniques. Data mining techniques can be adopted to de-noise the search results and identify salient terms or phrases in them. Clustering algorithms make it possible to represent visual features of images with finite symbols. Annotation-based image search engines can obtain thousands of images for a given query, but their results also contain visual noise. In this paper, we present a new algorithm, Double-Circles, that allows a user to remove noisy results and characterize more precise representative annotations. We demonstrate our approach on images collected from Flickr image search. Experiments conducted on real Web images show the effectiveness and efficiency of the proposed model.

Rating Individual Food Items of Restaurant Menu based on Online Customer Reviews using Text Mining Technique (신뢰성있는 온라인 고객 리뷰 텍스트 마이닝 기반 식당 개별 음식 아이템 평가)

  • Syed, Muzamil Hussain;Chung, Sun-Tae
    • Proceedings of the Korea Information Processing Society Conference / 2020.05a / pp.389-392 / 2020
  • The growth in social media, blogs, and restaurant listing directories has led to an increasing number of customer reviews on the Internet about restaurants and the quality of their food items and services. These user reviews offer a massive amount of valuable information that can be used for various decision-making purposes. Currently, most food recommendation sites provide recommendation scores for restaurants rather than for individual food items, and these scores may be biased because they are calculated only from the user reviews listed on each site. Usually, people want a reliable recommendation about foods, not restaurants. In this paper, we present a reliable rating method for Korean food items. We first extract food items by applying a NER technique to restaurant reviews collected from many Korean restaurant recommendation web sites, blogs, and web data. Then, we apply lexicon-based sentiment analysis to the collected user reviews and predict people's opinions as sentiment polarity scores (+1 for positive; -1 for negative; 0 for neutral). Finally, by averaging all calculated polarity scores for a food item, we obtain a rating for individual menu items of the restaurant. The proposed food item rating is more reliable since it does not depend on the reviews of only one site.
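
The scoring and averaging steps described in this abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: the NER step is skipped by assuming each review already comes paired with a food item, and the tiny English word lists `POSITIVE` and `NEGATIVE` are hypothetical stand-ins for a real sentiment lexicon.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon; a real system would use a full sentiment lexicon.
POSITIVE = {"delicious", "tasty", "great", "fresh"}
NEGATIVE = {"bland", "salty", "bad", "cold"}


def polarity(text):
    # Count lexicon hits and map the net count to +1, -1, or 0.
    tokens = re.findall(r"[a-z']+", text.lower())
    net = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return (net > 0) - (net < 0)


def rate_items(reviews):
    # reviews: iterable of (food_item, review_text) pairs, assumed to be
    # the output of an upstream item-extraction step.
    scores = defaultdict(list)
    for item, text in reviews:
        scores[item].append(polarity(text))
    # The rating of an item is the average of its polarity scores.
    return {item: sum(vals) / len(vals) for item, vals in scores.items()}
```

Because reviews from several sites can be pooled into the same `reviews` list, the average does not depend on any single site, which is the property the abstract emphasizes.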

Clustering Corporate Brands based on Opinion Mining: A Case Study of the Automobile Industry (오피니언 마이닝을 통한 브랜드 클러스터링: 자동차 산업 사례연구)

  • Hwang, Hyun-Seok
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.11 / pp.453-462 / 2016
  • Since the Internet provides a way of expressing and sharing Internet users' mindsets, corporate marketers want to acquire measurable and actionable insights from web data. In the past, companies used to analyze the attitude, satisfaction, and loyalty of consumers toward their brands using survey data, whereas nowadays this is done using the big data extracted from Social Network Services. In this study, we propose a framework for clustering brand names using the social metrics gathered on social media. We also conduct a case study of the automobile industry to verify the feasibility of the proposed framework. We calculate the brand name distance for each pair of brand names based on the total number of times that they are mentioned together. These distances are used to project the brand name onto a 3-dimensional space using multidimensional scaling. After the projection, we found the clusters of brand names and identified the characteristics of each cluster. Furthermore, we concluded this paper with a discussion of the limitations and future directions of this research.
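
The pairwise distance step in this abstract can be sketched under stated assumptions. The abstract says only that distances are based on how often two brand names are mentioned together. The substring matching and the conversion `1 / (1 + count)` below are illustrative choices, not the paper's formula.

```python
from collections import Counter
from itertools import combinations


def co_mention_counts(posts, brands):
    # Count how many posts mention each pair of brands together.
    counts = Counter()
    for post in posts:
        text = post.lower()
        present = sorted(b for b in brands if b.lower() in text)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


def brand_distances(counts, brands):
    # Assumed conversion: more co-mentions give a smaller distance.
    return {
        pair: 1.0 / (1 + counts[pair])
        for pair in combinations(sorted(brands), 2)
    }
```

The resulting distances could then be arranged into a symmetric matrix and passed to any multidimensional scaling routine that accepts precomputed distances to obtain the low-dimensional projection.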

An Exploratory Study on Key Attributes of Specialty Coffee by Online Big Data Analysis (온라인 빅 데이터 분석을 활용한 스페셜티 커피 속성에 대한 탐색적 연구)

  • Lim, Miri;Wun, Daiyeol;Ryu, Gihwan
    • The Journal of the Convergence on Culture Technology / v.6 no.3 / pp.275-282 / 2020
  • Social interest in high-quality specialty coffee has increased due to customers' growing experience with coffee and a recent change in coffee culture, which now emphasizes psychological satisfaction in addition to price and quality. As coffee drinking becomes a culture that places great value on taste and flavor, more customers demand coffee that suits their own taste. Likewise, the number of specialty coffee shops is increasing along with the quality of their coffee. Therefore, the purpose of this study is to analyze the main attributes of specialty coffee and to build a marketing system for specialty coffee shops. Text mining of domestic web portal sites through online big data analysis is used to extract the components of the attributes of specialty coffee and to analyze the degree to which these elements affect the attributes. According to the results, words related to coffee taste, coffee beans, and baristas were found to play a central role in the attributes of specialty coffee.

A Topic Modeling Analysis for Online News Article Comments on Nurses' Workplace Bullying (간호사의 직장 내 괴롭힘 관련 온라인 뉴스기사 댓글에 대한 토픽 모델링 분석)

  • Kang, Jiyeon;Kim, Soogyeong;Roh, Seungkook
    • Journal of Korean Academy of Nursing / v.49 no.6 / pp.736-747 / 2019
  • Purpose: This study aimed to explore public opinion on workplace bullying in the nursing field, by analyzing the keywords and topics of online news comments. Methods: This was a text-mining study that collected, processed, and analyzed text data. A total of 89,951 comments on 650 online news articles, reported between January 1, 2013 and July 31, 2018, were collected via web crawling. The collected unstructured text data were preprocessed and keyword analysis and topic modeling were performed using R programming. Results: The 10 most important keywords were "work" (37121.7), "hospital" (25286.0), "patients" (24600.8), "woman" (24015.6), "physician" (20840.6), "trouble" (18539.4), "time" (17896.3), "money" (16379.9), "new nurses" (14056.8), and "salary" (13084.1). The 22,572 preprocessed key words were categorized into four topics: "poor working environment", "culture among women", "unfair oppression", and "society-level solutions". Conclusion: Public interest in workplace bullying among nurses has continued to increase. The public agreed that negative work environment and nursing shortage could cause workplace bullying. They also considered nurse bullying as a problem that should be resolved at a societal level. It is necessary to conduct further research through gender discrimination perspectives on nurse workplace bullying and the social value of nursing work.

Logistic Regression Ensemble Method for Extracting Significant Information from Social Texts (소셜 텍스트의 주요 정보 추출을 위한 로지스틱 회귀 앙상블 기법)

  • Kim, So Hyeon;Kim, Han Joon
    • KIPS Transactions on Software and Data Engineering / v.6 no.5 / pp.279-284 / 2017
  • Currently, in the era of big data, text mining and opinion mining are used in many domains, and one of their most important research issues is extracting significant information from social media. In this paper, we propose a logistic regression ensemble method for finding the main body text in blog HTML. First, we extract structural features and text features from blog HTML tags. Then we construct a classification model with logistic regression and ensemble learning that decides whether a given tag contains main body text. One of our important findings is that the main body text can be found through 'depth' features extracted from HTML tags. In our experiment using blog data on diverse topics collected from the web, our tag classification model achieved 99% accuracy, and it recalled 80.5% of documents that have tags containing the main body text.
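
The 'depth' feature mentioned in this abstract can be illustrated with a minimal sketch that records, for each HTML element, its nesting depth and the length of its direct text. The exact feature set used in the paper is not given here, so the `(tag, depth, text_length)` rows and the names `DepthFeatureParser` and `tag_features` are assumptions. The logistic regression ensemble itself is not shown.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag and so do not add nesting depth.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class DepthFeatureParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self._stack = []  # open elements as [tag, depth, text_length]
        self.rows = []    # finished elements as (tag, depth, text_length)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            # Depth is the element's position in the stack of open elements.
            self._stack.append([tag, len(self._stack) + 1, 0])

    def handle_data(self, data):
        if self._stack:
            # Attribute text only to the innermost open element.
            self._stack[-1][2] += len(data.strip())

    def handle_endtag(self, tag):
        # Close the innermost element when the tag matches; ignore stray end tags.
        if self._stack and self._stack[-1][0] == tag:
            self.rows.append(tuple(self._stack.pop()))


def tag_features(html):
    parser = DepthFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Each row could serve as one input example for a tag-level classifier, with a label marking whether the tag holds main body text.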