• Title/Summary/Keyword: data analytics

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.39-70 / 2017
  • Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to research the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and then accurately determine customer preferences. In existing data-based survey methods, the sentiment lexicon for a particular domain is first constructed by domain experts, who judge the positive, neutral, or negative meanings of the words frequently used in the collected text documents. To research the preference for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after pre-processing steps such as stemming and stop-word removal; (3) classifies the polarity (positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) that carry positive or negative meaning about the product. As a motivating example, given a product like the Sonata made by Hyundai Motors, customers often want to see a summary note that includes the positive points in the 'car design' aspect as well as the negative points in the same aspect. They also want more useful information regarding other aspects such as 'car quality', 'car performance', and 'car service.' Such information will enable customers to make good choices when they purchase brand-new vehicles. 
In addition, automobile makers will be able to determine the preference and the positive and negative points for new models on the market. In the near future, the weak points of the models can be improved through sentiment analysis. For this, the existing approach computes the sentiment score of each sentence (or phrase) and then selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings that limit its use in real applications. The main disadvantages of the existing approach are as follows: (1) The main aspects (e.g., car design, quality, performance, and service) of a product (e.g., Hyundai Sonata) are not considered. As a result, sentiment analysis that ignores aspects reports to customers and car makers only a summary note containing the positive and negative ratios of the product and the top-k sentences (or phrases) with the highest sentiment scores in the entire corpus. This is not enough; the main aspects of the target product need to be considered in the sentiment analysis. (2) In general, since the same word has different meanings across domains, a sentiment lexicon suited to each domain needs to be constructed. An efficient way to construct the sentiment lexicon per domain is required because lexicon construction is labor intensive and time consuming. To address the above problems, in this article, we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using the aspects; and (4) presents a digest in which a few important sentences with positive and negative meanings are listed for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly. 
Furthermore, by reinforcing topic semantics, we can improve the accuracy of the product reputation mining algorithm beyond that of the existing approach. In the experiments, we collected a large set of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing method.
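
The lexicon-based ratio estimate that the abstract attributes to the existing approach (classify each sentence, then divide the positive and negative counts by the total) might be sketched as follows. This is only an illustration, not the authors' implementation: the toy lexicon, the function names `sentence_polarity` and `polarity_ratios`, and the sample reviews are hypothetical, and stemming and stop-word removal are left out.

```python
import re

# Hypothetical toy lexicon; in the described approach, domain experts build it.
LEXICON = {"good": 1, "great": 1, "sleek": 1, "bad": -1, "noisy": -1, "poor": -1}


def sentence_polarity(sentence, lexicon=LEXICON):
    """Label a sentence 'pos', 'neg', or 'neu' by summing lexicon scores."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(lexicon.get(word, 0) for word in words)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "neu"


def polarity_ratios(sentences, lexicon=LEXICON):
    """Return (positive ratio, negative ratio) over all sentences."""
    labels = [sentence_polarity(s, lexicon) for s in sentences]
    total = len(labels)
    if total == 0:
        return 0.0, 0.0
    return labels.count("pos") / total, labels.count("neg") / total


if __name__ == "__main__":
    reviews = [
        "The design is sleek and great",
        "The engine is noisy",
        "Delivery took two weeks",
        "Poor service",
    ]
    print(polarity_ratios(reviews))  # (0.25, 0.5)
```

Because the ratios are taken over the whole corpus, the result says nothing about individual aspects such as design or service, which is the limitation the abstract raises.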

A Study on Policy Priorities for Implementing Big Data Analytics in the Social Security Sector : Adopting AHP Methodology (AHP분석을 활용한 사회보장부문 빅 데이터 활용가능 영역 탐색 연구)

  • Ham, Young-Jin;Ahn, Chang-Won;Kim, Ki-Ho;Park, Gyu-Beom;Kim, Kyoung-June;Lee, Dae-Young;Park, Sun-Mi
    • Journal of Digital Convergence / v.12 no.8 / pp.49-60 / 2014
  • The primary purpose of this paper is to find out which issues are important in the Social Security sector and then, through the AHP methodology, to analyze what kinds of big data methodologies and projects can be implemented to solve these issues. To this end, this paper first identified 8 big data projects by reviewing all issues in the Social Security sector, such as administrative work and social policies. According to the pairwise comparison results, policy validity is a more important factor than effectiveness and practicability. With regard to the priorities among the big data sub-projects, the project on preventing improper recipients came out as the most important in terms of validity, effectiveness, and practicability. The results also showed that the project on outreach and reducing blind spots in the welfare sector is weighted as a significant project. The results of this paper, in particular the 8 big data sub-projects, will be useful to anyone who is interested in using big data and its methodologies for the social welfare sector.
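
As a rough illustration of how AHP turns pairwise comparisons into priority weights, the sketch below uses the row geometric mean, a common approximation to the principal-eigenvector method. The abstract does not say which derivation the authors used, and the comparison values in the matrix are hypothetical, chosen only so that validity outranks the other two criteria.

```python
import math


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric mean."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


if __name__ == "__main__":
    # Hypothetical judgments; rows and columns are ordered as
    # validity, effectiveness, practicability.
    # matrix[i][j] states how much more important criterion i is than j.
    matrix = [
        [1.0, 2.0, 4.0],
        [1 / 2, 1.0, 2.0],
        [1 / 4, 1 / 2, 1.0],
    ]
    weights = ahp_weights(matrix)
    print([round(w, 3) for w in weights])  # [0.571, 0.286, 0.143]
```

A full AHP analysis would also check the consistency ratio of each comparison matrix, which this sketch omits.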

A Study on the Use of Location Data for Exploring Infant's Peer Relationships in Free-Choice Play Activities (자유선택놀이 활동에서 유아 또래관계 탐색을 위한 위치데이터 활용 방안 연구)

  • Kim, Jeong Kyoum;Lee, Sang-Seon
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.9 / pp.466-472 / 2020
  • The purpose of this study is to explore how location data can be used to examine infants' peer relationships in free-choice play activities. For this study, location data was collected using wearable devices from 14 students in one class at an early childhood education institution in Chungnam. In pre-processing, a smoothing technique was applied to recover values that went missing during collection, and the data was visualized using Python's Matplotlib. Subsequently, the movement distance, the distance between infants, and the interaction types of infants were extracted from the location data using formulas. As a result, it was possible to derive 1) the change, cumulative value, and average value of moving distance, 2) the change in distance and the average distance between infants, and 3) the change and trend in interaction type over time. These results can provide valuable information on how infants form peer groups in situations where it is difficult for a teacher to closely observe all members, and can be used as meaningful information for the design and operation of educational programs.
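
The abstract does not give the formulas used, so the following is only a sketch under the assumption that positions are 2-D coordinates sampled at common time steps and that distances are Euclidean. The function names and the sample tracks are hypothetical.

```python
import math


def path_length(track):
    """Cumulative movement distance along a list of (x, y) positions."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))


def mean_pair_distance(track_a, track_b):
    """Average distance between two children over aligned time steps."""
    distances = [math.dist(a, b) for a, b in zip(track_a, track_b)]
    if not distances:
        raise ValueError("tracks must contain at least one aligned sample")
    return sum(distances) / len(distances)


if __name__ == "__main__":
    # Hypothetical positions for two children at three time steps.
    child_a = [(0.0, 0.0), (3.0, 4.0), (3.0, 8.0)]
    child_b = [(0.0, 3.0), (3.0, 0.0), (3.0, 4.0)]
    print(path_length(child_a))  # 9.0
    print(mean_pair_distance(child_a, child_b))
```

Classifying interaction types would need an additional rule, for example a distance threshold held over time, which the abstract does not specify.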

Big-Data Traffic Analysis for the Campus Network Resource Efficiency (학내 망 자원 효율화를 위한 빅 데이터 트래픽 분석)

  • An, Hyun-Min;Lee, Su-Kang;Sim, Kyu-Seok;Kim, Ik-Han;Jin, Seo-Hoon;Kim, Myung-Sup
    • The Journal of Korean Institute of Communications and Information Sciences / v.40 no.3 / pp.541-550 / 2015
  • The importance of efficient enterprise network management has been emphasized continuously because of the rapid growth in Internet use in a resource-limited environment. Efficient network management requires a management policy that reflects the characteristics of a specific network, extracted from long-term traffic analysis. In the past, however, long-term traffic data could not be handled, and only simple analysis of short-term traffic data was possible. With the development of big data analytics platforms, long-term traffic data can now be analyzed easily, and improving enterprise network resource efficiency through long-term traffic analysis has recently become a requirement. In this paper, we propose methods for collecting, storing, and managing long-term enterprise traffic data. We define several classification categories and propose a novel approach to network resource efficiency through multidirectional statistical analysis of the classified long-term traffic. The proposed method was applied to a campus network for evaluation. The analysis results show that, for efficient enterprise network management, the QoS policy must be applied with different rules tuned by time, space, and purpose.

On Building the Solar Dataset Form using the Kaggle Platform: The applicability of Machine Learning (캐글 플랫폼 활용한 태양광 데이터셋 형태 구축: 머신 러닝의 적용 가능성)

  • Ko, Ju-won;Park, Jung-jin;Park, Jin-woo;Oh, Do-hee;Kim, Mincheol
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.255-258 / 2022
  • As environmental pollution continues, attention to renewable energy has been rising steadily. Although various kinds of renewable energy, such as solar, wind, and biomass energy, are generated in Jeju, cases of opening and analyzing the related data seem insufficient. Therefore, this study is being conducted to identify the variables that are highly related to a solar panel's output and to understand the machine learning methods that can be applied to solar power generation data, using the Kaggle platform, which is actively used by many scientists. It then plans to propose a form for a solar power generation dataset by researching machine learning methods that could be applied to the data. Specifically, by analyzing solar power generation data on the Kaggle platform, this study will suggest ways to complement the collection of solar power data in Jeju. This study is expected to be used in data analysis for developing the solar power industry in Jeju. That is, it is expected to reveal the room for improvement in existing open datasets in Jeju so that they can be constructed in a form suitable for machine learning and AI analytics. Through this process, a method for increasing the efficiency of solar power generation is expected to be prepared.

Big Data Analytics in RNA-sequencing (RNA 시퀀싱 기법으로 생성된 빅데이터 분석)

  • Sung-Hun WOO;Byung Chul JUNG
    • Korean Journal of Clinical Laboratory Science / v.55 no.4 / pp.235-243 / 2023
  • As next-generation sequencing has been developed and used widely, RNA-sequencing (RNA-seq) has rapidly emerged as the first choice of tools to validate global transcriptome profiling. With the significant advances in RNA-seq, various types of RNA-seq have evolved in conjunction with the progress in bioinformatic tools. On the other hand, it is difficult to interpret the complex data underlying the biological meaning without a general understanding of the types of RNA-seq and bioinformatic approaches. In this regard, this paper discusses the two main sections of RNA-seq. First, two major variants of RNA-seq are described and compared with the standard RNA-seq. This provides insights into which RNA-seq method is most appropriate for their research. Second, the most widely used RNA-seq data analyses are discussed: (1) exploratory data analysis and (2) pathway enrichment analysis. This paper introduces the most widely used exploratory data analysis for RNA-seq, such as principal component analysis, heatmap, and volcano plot, which can provide the overall trends in the dataset. The pathway enrichment analysis section introduces three generations of pathway enrichment analysis and how they generate enriched pathways with the RNA-seq dataset.

A Study on the Usage Behavior of Universities Library Website Before and After COVID-19: Focusing on the Library of C University (COVID-19 전후 대학도서관 홈페이지 이용행태에 관한 연구: C대학교 도서관을 중심으로)

  • Lee, Sun Woo;Chang, Woo Kwon
    • Journal of the Korean Society for Information Management / v.38 no.3 / pp.141-174 / 2021
  • This study examined actual usage data of a university library website before and after the COVID-19 outbreak, analyzed user behavior, and compared the data from the two periods in order to suggest improvements that help university libraries provide more efficient information services in a pandemic situation. Using Google Analytics, user traffic on the website of University C was compared between January 2018 to December 2018, before the COVID-19 outbreak, and January 2020 to December 2020, after the outbreak. Web traffic variables were analyzed by classifying them into three characteristics, 'User information', 'Path', and 'Site behavior', based on metrics such as session, user, number of pageviews, number of pages per session time, and bounce rate. To summarize the study results, first, when compared with data from January 1 to January 20, before the outbreak of COVID-19, users, new visitors, and sessions all increased compared to the previous year, and the number of sessions per user, number of pageviews, and number of pages per session, which showed an upward trend before the virus outbreak in 2020, increased significantly. Second, as social distancing was upgraded to the second stage, there was also a change in the use of university library websites. Comparing the periods of 2020 and 2018 when the number of students was lowest, the number of page views increased by 100,000 in 2020 compared to 2018, and the number of pages per session recorded 10.46, about 2 more pages than in 2018. The bounce rate recorded 14.38 in 2018 and 2019 but decreased by about 1 percentage point to 13.05 in 2020, indicating more active use of the website at a time when social distancing was raised.
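
To make two of the reported metrics concrete, the sketch below computes pages per session and bounce rate from a list of per-session pageview counts. It assumes the common definition of a bounce as a single-page session; the function names and the sample data are hypothetical and are not taken from the study.

```python
def pages_per_session(pageviews_by_session):
    """Average number of pageviews per session."""
    if not pageviews_by_session:
        return 0.0
    return sum(pageviews_by_session) / len(pageviews_by_session)


def bounce_rate(pageviews_by_session):
    """Percentage of sessions that viewed exactly one page."""
    if not pageviews_by_session:
        return 0.0
    bounces = sum(1 for views in pageviews_by_session if views == 1)
    return 100.0 * bounces / len(pageviews_by_session)


if __name__ == "__main__":
    # Hypothetical pageview counts for four sessions.
    sessions = [1, 5, 10, 4]
    print(pages_per_session(sessions))  # 5.0
    print(bounce_rate(sessions))        # 25.0
```

Read this way, a falling bounce rate together with a rising pages-per-session figure means that fewer visits ended after a single page.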

The Effect of Paid YouTube Channel Membership Motivation on Usage Satisfaction and Continuance Intention: Based on Consumption Value Theory (유료 유튜브 채널멤버십 이용동기가 이용만족과 지속이용의도에 미치는 영향: 소비가치이론을 기반으로)

  • Chengnan Jiang;Ji Yoon Kwon;Sung-Byung Yang
    • Journal of Service Research and Studies / v.13 no.2 / pp.181-203 / 2023
  • YouTube exhibits a hybrid personality, incorporating traits of both over-the-top (OTT) and personal broadcasting platforms. However, limited research has investigated these hybrid characteristics, particularly in the context of paid YouTube channel memberships. Therefore, building upon consumption value theory and prior literature, this study examines the influence of consumption value factors associated with paid YouTube channel memberships on usage satisfaction and continuance intention. Specifically, the study identifies four perceived consumption value factors (functional, social, emotional, and epistemic values) within the paid YouTube channel membership context and assesses their impact on usage satisfaction and continuance intention. Additionally, the study explores the moderating role of conditional value (the experience of watching live streams on paid YouTube channels) in these relationships. Data was collected via an online survey from Korean adults who subscribed to multiple paid YouTube channel memberships, resulting in 274 responses. The proposed hypotheses were tested using structural equation modeling (SEM). The SEM results indicate that all four consumption value factors significantly influence usage satisfaction, with usage satisfaction in turn positively affecting continuance intention. Furthermore, the study reveals that conditional value moderates the relationships between functional/emotional values and usage satisfaction, as well as between usage satisfaction and continuance intention. This study is the first to focus on YouTube channel paid memberships, which encompass characteristics from both OTT and personal broadcasting platforms. It is anticipated that this research will offer insights to personal broadcasters and stakeholders regarding the motivational factors that impact user satisfaction and encourage subscriptions to channel memberships.

Application of Web Query Information for Forecasting Korean Unemployment Rate (실업률 예측을 위한 인터넷 검색 정보의 활용)

  • Kwon, Chi-Myung;Hwang, Sung-Won;Jung, Jae-Un
    • Journal of the Korea Society for Simulation / v.24 no.2 / pp.31-39 / 2015
  • Unemployment is related to social issues as well as personal economic activity, so many countries have made various policies to reduce the unemployment rate. Because of the delay inherent in the survey mechanism for collecting unemployment data, it takes a long time to acquire survey-based unemployment data. To develop proper policies for reducing the unemployment rate at the right time, it is critical to obtain faster and more accurate information about the unemployment level. To remedy this problem, advanced analytics utilizing internet queries have recently been suggested. To examine the potential of Web query information, this research investigates the usefulness of internet activity data for predicting the Korean unemployment rate. One of the selected web-query series (unemployment claim) has a quite strong correlation with the unemployment rate. This research employs a time series approach, the ARIMA model, that uses the keyword query information provided by Naver (a representative Korean portal site) Trend together with unemployment rate data provided by Statistics Korea. With respect to the model selection criteria of mean squared error and prediction error, the model that uses the web query information shows better results than the model without such information. This suggests that the method has strong potential, which needs to be further explored.
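
The correlation check mentioned in the abstract can be illustrated with a plain Pearson coefficient between a query-volume series and an unemployment-rate series. The sketch does not reproduce the paper's ARIMA model, and both series below are made-up numbers used only to show the calculation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical monthly values: query index and unemployment rate (%).
    query_index = [40.0, 55.0, 52.0, 61.0, 70.0, 66.0]
    unemployment = [3.1, 3.5, 3.4, 3.6, 3.9, 3.8]
    print(round(pearson(query_index, unemployment), 3))
```

A coefficient near 1 would motivate adding the query series as an extra input to the forecasting model, which is the comparison the abstract reports.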

Mobile App Analytics using Media Repertoire Approach (미디어 레퍼토리를 이용한 스마트폰 애플리케이션 이용 패턴 유형 분석)

  • Kwon, Sung Eun;Jang, Shu In;Hwangbo, Hyunwoo
    • The Journal of Society for e-Business Studies / v.26 no.4 / pp.133-154 / 2021
  • Today the smartphone is the most common medium, with the 'application' as its vehicle. To understand how media users select applications and build their repertoires, this study applied a two-step approach to big data from smartphone logs collected over 4 weeks in November 2019 and finally classified 8 media repertoire groups. Each of the eight media repertoire groups differed from the other groups in the time spent per mobile application category and in demographic distribution. In addition to the academic contribution of identifying mobile application repertoires with large-scale behavioral data, this study is significant in proposing a two-step approach that overcomes the 'outlier issue' in behavioral data by extracting prototype vectors using a SOM (Self-Organizing Map) and applying them to k-means clustering to optimize the classification. The study is also meaningful in that it categorizes customers using e-commerce services, identifies customer structure based on behavioral data, and provides practical guidance to e-commerce communities that make service or marketing decisions for each customer group.