On Building the Solar Dataset Form using the Kaggle Platform: The applicability of Machine Learning (캐글 플랫폼 활용한 태양광 데이터셋 형태 구축: 머신 러닝의 적용 가능성)

  • Ko, Ju-won;Park, Jung-jin;Park, Jin-woo;Oh, Do-hee;Kim, Mincheol
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.255-258 / 2022
  • As environmental pollution continues, attention to renewable energy has been rising steadily. Although various kinds of renewable energy, such as solar, wind, and biomass energy, are generated in Jeju, cases of publishing and analyzing the related data appear insufficient. Therefore, this study aims to identify the variables most strongly related to solar panel output and to examine machine learning methods that can be applied to solar power generation data, using the Kaggle platform, which is actively used by many scientists. Based on the machine learning methods found to be applicable, it then proposes a form for solar power generation datasets. Specifically, by analyzing solar power generation data on the Kaggle platform, this study suggests improvements to how solar power data are gathered in Jeju. The study is expected to support data analysis for developing the solar power industry in Jeju: it should reveal the room for improvement in Jeju's existing open datasets so that they can be constructed in a form suitable for machine learning and AI analytics. Through this process, a method to increase the efficiency of solar power generation is expected to be prepared.
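
Ranking candidate variables by their correlation with panel output is one simple way to carry out the variable-selection step described above. The Python sketch below is illustrative only: the column names and values are hypothetical, not taken from the study's Jeju or Kaggle data, and the study's own selection procedure may differ.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def rank_features(columns, target):
    """Return (name, r) pairs sorted by |r| against the target, strongest first."""
    scores = [(name, pearson(values, target)) for name, values in columns.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical hourly records (irradiance, module temperature, humidity).
columns = {
    "irradiance_wm2": [120, 340, 560, 780, 650, 300],
    "module_temp_c": [18, 24, 31, 38, 35, 22],
    "humidity_pct": [80, 72, 60, 55, 58, 75],
}
output_kw = [0.9, 2.6, 4.3, 6.0, 5.0, 2.3]

for name, r in rank_features(columns, output_kw):
    print(f"{name}: r = {r:+.3f}")
```

Pearson's r only captures linear relationships, so a variable that affects output non-linearly may rank low; model-based importance scores are a common complement.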

Big Data Analytics in RNA-sequencing (RNA 시퀀싱 기법으로 생성된 빅데이터 분석)

  • Sung-Hun WOO;Byung Chul JUNG
    • Korean Journal of Clinical Laboratory Science / v.55 no.4 / pp.235-243 / 2023
  • As next-generation sequencing has been developed and widely adopted, RNA sequencing (RNA-seq) has rapidly become the tool of first choice for global transcriptome profiling. With significant advances in RNA-seq, various types of RNA-seq have evolved alongside progress in bioinformatic tools. However, it is difficult to interpret the biological meaning underlying the complex data without a general understanding of the types of RNA-seq and of bioinformatic approaches. This paper therefore discusses RNA-seq in two main sections. First, two major variants of RNA-seq are described and compared with standard RNA-seq, giving readers insight into which RNA-seq method is most appropriate for their research. Second, the most widely used RNA-seq data analyses are discussed: (1) exploratory data analysis and (2) pathway enrichment analysis. The paper introduces the most widely used exploratory analyses for RNA-seq, such as principal component analysis, heatmaps, and volcano plots, which show the overall trends in a dataset. The pathway enrichment analysis section introduces three generations of pathway enrichment analysis and how each derives enriched pathways from an RNA-seq dataset.
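
Of the exploratory plots mentioned above, the volcano plot is the simplest to sketch: each gene is placed at (log2 fold change, -log10 p-value). The sketch below assumes p-values already come from an upstream differential-expression test; the gene names, counts, pseudocount, and cut-offs (|log2FC| >= 1, p < 0.05) are hypothetical choices, not values from the paper.

```python
import math


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 of mean(B) / mean(A); the pseudocount avoids division by zero."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


def volcano_points(genes, lfc_cutoff=1.0, p_cutoff=0.05):
    """Map each gene to (log2FC, -log10 p, label) for a volcano plot."""
    points = {}
    for name, (control, treated, p_value) in genes.items():
        lfc = log2_fold_change(control, treated)
        neg_log_p = -math.log10(p_value)
        if p_value < p_cutoff and lfc >= lfc_cutoff:
            label = "up"
        elif p_value < p_cutoff and lfc <= -lfc_cutoff:
            label = "down"
        else:
            label = "n.s."
        points[name] = (lfc, neg_log_p, label)
    return points


# Hypothetical normalized counts (control replicates, treated replicates)
# and p-values assumed to come from an upstream differential-expression test.
genes = {
    "GENE_A": ([10, 12, 11], [45, 50, 47], 0.001),
    "GENE_B": ([80, 75, 82], [20, 18, 22], 0.004),
    "GENE_C": ([30, 32, 29], [33, 31, 35], 0.60),
}
for name, (lfc, nlp, label) in volcano_points(genes).items():
    print(f"{name}: log2FC={lfc:+.2f}, -log10(p)={nlp:.2f}, {label}")
```

In practice the significance cut-off is usually applied to multiple-testing-adjusted p-values rather than raw ones.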

A Study on the Usage Behavior of Universities Library Website Before and After COVID-19: Focusing on the Library of C University (COVID-19 전후 대학도서관 홈페이지 이용행태에 관한 연구: C대학교 도서관을 중심으로)

  • Lee, Sun Woo;Chang, Woo Kwon
    • Journal of the Korean Society for Information Management / v.38 no.3 / pp.141-174 / 2021
  • This study analyzed users' behavior from actual usage data of a university library website before and after the COVID-19 outbreak and compared the two periods, in order to suggest ways for university libraries to provide more efficient information services during a pandemic. User traffic on the website of University C was collected with Google Analytics for January to December 2018, before the onset of COVID-19, and for January to December 2020, after the outbreak, and the two periods were compared. Web traffic variables were classified into three characteristics, 'User information', 'Path', and 'Site behavior', and analyzed using metrics such as sessions, users, pageviews, pages per session, session time, and bounce rate. The results are as follows. First, in the period before the onset of COVID-19 (January 1 to January 20), users, new visitors, and sessions all increased compared with the previous year, and the number of sessions per user, pageviews, and pages per session, which had shown an upward trend before the outbreak in 2020, increased significantly. Second, use of the university library website also changed when social distancing was raised to the second stage. Comparing the 2018 and 2020 periods in which the number of students was lowest, pageviews were about 100,000 higher in 2020 than in 2018, and pages per session reached 10.46, about two pages more than in 2018. The bounce rate, which was 14.38 in 2018 and 2019, fell by about 1.3 percentage points to 13.05 in 2020, indicating more active use of the website once social distancing was raised.
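
The traffic metrics listed above can be computed from session-level records. This minimal sketch assumes a hypothetical log format with one record per session and treats a bounce as a single-page session, which approximates the traditional Universal Analytics definition; Google Analytics computes these figures internally, and GA4 defines bounce rate differently (as the share of non-engaged sessions).

```python
from dataclasses import dataclass


@dataclass
class Session:
    user_id: str
    pageviews: int
    duration_s: float


def traffic_metrics(sessions):
    """Aggregate session records into common web-traffic metrics."""
    if not sessions:
        raise ValueError("no sessions")
    n = len(sessions)
    users = {s.user_id for s in sessions}
    pageviews = sum(s.pageviews for s in sessions)
    bounces = sum(1 for s in sessions if s.pageviews == 1)  # single-page sessions
    return {
        "users": len(users),
        "sessions": n,
        "sessions_per_user": n / len(users),
        "pageviews": pageviews,
        "pages_per_session": pageviews / n,
        "avg_session_duration_s": sum(s.duration_s for s in sessions) / n,
        "bounce_rate_pct": 100.0 * bounces / n,
    }


# Hypothetical log: three users, four sessions.
log = [
    Session("a", 1, 5.0),
    Session("a", 5, 240.0),
    Session("b", 12, 610.0),
    Session("c", 2, 45.0),
]
for metric, value in traffic_metrics(log).items():
    print(metric, value)
```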

The Effect of Paid YouTube Channel Membership Motivation on Usage Satisfaction and Continuance Intention: Based on Consumption Value Theory (유료 유튜브 채널멤버십 이용동기가 이용만족과 지속이용의도에 미치는 영향: 소비가치이론을 기반으로)

  • Chengnan Jiang;Ji Yoon Kwon;Sung-Byung Yang
    • Journal of Service Research and Studies / v.13 no.2 / pp.181-203 / 2023
  • YouTube has a hybrid character, combining traits of both over-the-top (OTT) and personal broadcasting platforms. However, little research has investigated these hybrid characteristics, particularly in the context of paid YouTube channel memberships. Building on consumption value theory and prior literature, this study therefore examines how consumption value factors associated with paid YouTube channel memberships influence usage satisfaction and continuance intention. Specifically, the study identifies four perceived consumption value factors (functional, social, emotional, and epistemic values) in the paid YouTube channel membership context and assesses their impact on usage satisfaction and continuance intention. It also explores the moderating role of conditional value (the experience of watching live streams on paid YouTube channels) in these relationships. Data were collected via an online survey of Korean adults who subscribed to multiple paid YouTube channel memberships, yielding 274 responses. The proposed hypotheses were tested using structural equation modeling (SEM). The SEM results indicate that all four consumption value factors significantly influence usage satisfaction, and usage satisfaction in turn positively affects continuance intention. Furthermore, conditional value moderates the relationships between functional and emotional values and usage satisfaction, as well as the relationship between usage satisfaction and continuance intention. This study is the first to focus on paid YouTube channel memberships, which combine characteristics of both OTT and personal broadcasting platforms. It is expected to offer personal broadcasters and other stakeholders insights into the motivational factors that affect user satisfaction and encourage subscriptions to channel memberships.

Application of Web Query Information for Forecasting Korean Unemployment Rate (실업률 예측을 위한 인터넷 검색 정보의 활용)

  • Kwon, Chi-Myung;Hwang, Sung-Won;Jung, Jae-Un
    • Journal of the Korea Society for Simulation / v.24 no.2 / pp.31-39 / 2015
  • Unemployment is tied to social issues as well as to individual economic activity, so many countries have introduced policies to reduce the unemployment rate. Because of delays inherent in the survey mechanism used to collect unemployment data, obtaining survey-based unemployment figures takes a long time. To develop appropriate policies for reducing unemployment at the right time, it is critical to obtain faster and more accurate information about the unemployment level. To address this problem, analytics based on internet search queries have recently been proposed. To examine the potential of web query information, this research investigates the usefulness of internet activity data for predicting the Korean unemployment rate. One of the selected web-query series ('unemployment claim') has a fairly strong correlation with the unemployment rate. This research employs an ARIMA time series model that uses keyword query information provided by Naver trend (Naver is Korea's representative portal site) together with unemployment rate data provided by Statistics Korea. Under the model selection criteria of mean squared error and prediction error, the model that uses web query information performs better than the model without it. This suggests that the method has strong potential and merits further exploration.
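
The core comparison, a forecasting model with versus without a web-query regressor, can be illustrated with a simplified stand-in. Instead of the paper's ARIMA model, this sketch fits an AR(1) model with an optional exogenous regressor by ordinary least squares and compares one-step-ahead holdout mean squared error. The series are synthetic, and the "unemployment claim" search index is hypothetical.

```python
import random


def ols(rows, targets):
    """Least-squares coefficients via the normal equations (X^T X) b = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("singular design matrix")
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def design(rate, query, use_query):
    """Rows [1, y[t-1]] or [1, y[t-1], q[t]] with targets y[t], for t >= 1."""
    rows = []
    for t in range(1, len(rate)):
        row = [1.0, rate[t - 1]]
        if use_query:
            row.append(query[t])  # search volume is available before the official figure
        rows.append(row)
    return rows, rate[1:]


def holdout_mse(rate, query, use_query, n_test):
    """Fit on the early part of the series, then score one-step-ahead forecasts."""
    rows, y = design(rate, query, use_query)
    split = len(rows) - n_test
    beta = ols(rows[:split], y[:split])
    errors = [sum(b * x for b, x in zip(beta, row)) - target
              for row, target in zip(rows[split:], y[split:])]
    return sum(e * e for e in errors) / len(errors)


# Synthetic monthly series (not Naver or Statistics Korea data): the rate
# partly tracks a hypothetical "unemployment claim" search index.
rng = random.Random(42)
query = [rng.uniform(50, 150) for _ in range(72)]
rate = [3.5]
for t in range(1, 72):
    rate.append(0.5 * rate[-1] + 0.02 * query[t] + rng.gauss(0, 0.05))

print("AR(1) only:     ", holdout_mse(rate, query, False, n_test=12))
print("AR(1) + queries:", holdout_mse(rate, query, True, n_test=12))
```

For a full ARIMA specification with exogenous regressors, the statsmodels `SARIMAX` class (via its `exog` argument) is one option.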

Mobile App Analytics using Media Repertoire Approach (미디어 레퍼토리를 이용한 스마트폰 애플리케이션 이용 패턴 유형 분석)

  • Kwon, Sung Eun;Jang, Shu In;Hwangbo, Hyunwoo
    • The Journal of Society for e-Business Studies / v.26 no.4 / pp.133-154 / 2021
  • Today the smartphone is the most common medium, and applications are its vehicle. To understand how media users select applications and build their repertoires, this study applied a two-step approach to big data from four weeks of smartphone logs in November 2019 and ultimately classified users into eight media repertoire groups. Each of the eight groups differed from the others in time spent per mobile application category, and the groups also differed in demographic distribution. Beyond the academic contribution of identifying mobile application repertoires from large-scale behavioral data, this study is significant for proposing a two-step approach that overcomes the 'outlier issue' in behavioral data by extracting prototype vectors with a SOM (Self-Organizing Map) and applying k-means clustering to them to optimize the classification. The study is also meaningful in that it categorizes customers using e-commerce services, identifies customer structure from behavioral data, and provides practical guidance to e-commerce communities that carry out appropriate services or marketing decisions for each customer group.
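
The two-step idea can be sketched as follows: train a small self-organizing map so that many users are summarized by a few prototype vectors, run k-means on the prototypes, and give each user the cluster of its best-matching prototype. The map size, learning-rate schedule, and synthetic app-time shares are assumptions; the abstract does not give the study's configuration, which produced eight groups.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(data, n_nodes, epochs, rng, lr0=0.5):
    """Train a 1-D self-organizing map and return its prototype vectors."""
    nodes = [list(rng.choice(data)) for _ in range(n_nodes)]
    radius0 = n_nodes / 2
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1 - frac)                    # learning rate decays linearly
            radius = max(radius0 * (1 - frac), 0.5)  # and so does the neighborhood
            bmu = min(range(n_nodes), key=lambda i: sq_dist(nodes[i], x))
            for i in range(n_nodes):
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                nodes[i] = [w + lr * h * (xi - w) for w, xi in zip(nodes[i], x)]
            step += 1
    return nodes


def kmeans(points, k, iters, rng):
    """Plain Lloyd's k-means; returns the k cluster centers."""
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: sq_dist(centers[j], p))].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def two_step_cluster(data, n_nodes, k, rng):
    """SOM prototypes first, k-means on the prototypes, then label each user."""
    prototypes = train_som(data, n_nodes, epochs=20, rng=rng)
    centers = kmeans(prototypes, k, iters=50, rng=rng)
    labels = []
    for x in data:
        bmu = min(prototypes, key=lambda w: sq_dist(w, x))
        labels.append(min(range(k), key=lambda j: sq_dist(centers[j], bmu)))
    return labels


# Hypothetical users: share of app time in (social, video, shopping).
rng = random.Random(7)
video_heavy = [[0.2 + rng.uniform(-0.05, 0.05), 0.7 + rng.uniform(-0.05, 0.05), 0.1]
               for _ in range(20)]
shop_heavy = [[0.2 + rng.uniform(-0.05, 0.05), 0.1, 0.7 + rng.uniform(-0.05, 0.05)]
              for _ in range(20)]
data = video_heavy + shop_heavy + [[1.0, 0.0, 0.0]]  # last row is an outlier
print(two_step_cluster(data, n_nodes=6, k=2, rng=rng))
```

Because k-means sees only the prototypes, a single extreme user can pull at most one map node rather than a cluster center directly, which is one intuition for the robustness to outliers described above.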

Personalized Session-based Recommendation for Set-Top Box Audience Targeting (셋톱박스 오디언스 타겟팅을 위한 세션 기반 개인화 추천 시스템 개발)

  • Jisoo Cha;Koosup Jeong;Wooyoung Kim;Jaewon Yang;Sangduk Baek;Wonjun Lee;Seoho Jang;Taejoon Park;Chanwoo Jeong;Wooju Kim
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.323-338 / 2023
  • Set-top box audience targeting requires TV advertising grounded in in-depth analysis of audience viewing patterns. Session-based recommendation (SBR) models have proved effective in previous studies for internet commerce and for recommendations based on users' search histories, but applying SBR to TV advertising in South Korea has been difficult because the data were unavailable. Traditional SBR also has limitations in handling user preferences, especially for data that include user identification information. To tackle these problems, we first obtained set-top box data from three major broadcasting companies in South Korea (SKB, KT, LGU+) through a collaboration with the Korea Broadcast Advertising Corporation (KOBACO); the data contain six months of viewing sequences from 4,847 anonymized users. Second, we developed a personalized session-based recommendation model to handle the hierarchical user-session-item structure of the data. Experiments were conducted on the set-top box audience dataset and on two other public datasets for validation. The proposed model outperformed the baseline model on some criteria.
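
The user-session-item hierarchy can be illustrated with a simple non-neural baseline: score candidate next items by blending session-level transition counts (what viewers watched after the current item) with the user's long-term viewing frequencies. This is not the authors' model; the blending weight `alpha`, the genre names, and the logs are hypothetical.

```python
from collections import Counter, defaultdict


def build_model(history):
    """history maps user -> list of sessions, each session a list of item ids."""
    transitions = defaultdict(Counter)  # item -> counts of the next item, all users pooled
    user_pref = defaultdict(Counter)    # user -> item frequencies over all their sessions
    for user, sessions in history.items():
        for session in sessions:
            user_pref[user].update(session)
            for current, nxt in zip(session, session[1:]):
                transitions[current][nxt] += 1
    return transitions, user_pref


def recommend(model, user, current_session, k=3, alpha=0.7):
    """Blend session-level next-item evidence with the user's long-term preference."""
    transitions, user_pref = model
    next_counts = transitions.get(current_session[-1], Counter())
    pref = user_pref.get(user, Counter())
    n_next = sum(next_counts.values()) or 1
    n_pref = sum(pref.values()) or 1
    seen = set(current_session)
    scores = {
        item: alpha * next_counts[item] / n_next + (1 - alpha) * pref[item] / n_pref
        for item in set(next_counts) | set(pref)
        if item not in seen
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


# Hypothetical viewing logs: user -> sessions -> program genres.
history = {
    "u1": [["news", "drama", "sports"], ["news", "drama", "movie"]],
    "u2": [["news", "sports"], ["kids", "cartoon"]],
}
model = build_model(history)
print(recommend(model, "u1", ["news"]))
print(recommend(model, "u2", ["news"]))
```

Both users share the same session-level signal after "news", but their histories reorder the tail of the list, which is the personalization effect in miniature.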

Exploratory Study on Child Abuse Reduction Plan through the Big Data Convergence Analysis (빅데이터 융합분석을 통한 아동학대 감소방안에 관한 탐색적 연구)

  • Hwang, Jun-Soo;Lim, Jong-Yun;Gwon, Sun-young;Noh, Kyoo-Sung;Lee, Joo-Yeoun
    • Journal of Digital Convergence / v.14 no.10 / pp.95-105 / 2016
  • Child abuse has recently become a major social issue. According to the national statistics data portal, the population under 19 is shrinking, yet the number of child abuse cases keeps increasing. However, the number of counseling sessions following calls has remained at a constant level without large fluctuations. The problem is serious: child abuse has grown worse despite research and countermeasures. This study designed a research model of child abuse based on a preliminary study and suggested plans for reducing child abuse through big data analytics. Hypothesis testing showed that abuser characteristics, child characteristics, and employment type have a significant impact on child abuse. Based on this analysis, the study suggests ways to reduce child abuse, including educational and economic support measures.

Using Big Data and Small Data to Understand Linear Parks - Focused on the 606 Trail, USA and Gyeongchun Line Forest, Korea - (빅데이터와 스몰데이터로 본 선형공원 - 시카고 606 트레일과 서울 경춘선 숲길을 중심으로 -)

  • Sim, Ji-Soo;Oh, Chang Song
    • Journal of the Korean Institute of Landscape Architecture / v.48 no.5 / pp.28-41 / 2020
  • This study selects two linear parks, one representing each culture, and reveals the differences between them using a visitor survey as small data and social media analytics as big data, based on the three components of the model of landscape perception. The 606 in Chicago, U.S., and the Gyeongchun Line Forest in Seoul, Korea, are representative parks built on former railroads. A total of 505 surveys were collected from the two parks, and the responses were analyzed using descriptive statistics, principal component analysis, and linear regression. In addition, more than 20,000 tweets mentioning each park were collected, and the authors used them to conduct cluster analysis and draw bigram network diagrams to identify and compare the placeness of each park. The results suggest that a more diverse design concept is associated with less diverse behavior, that half of the park users use the park as a shortcut, and that the same physical exercise provides different benefits depending on the park. The social media analysis showed that the 606 is more closely tied to its neighborhoods than the Gyeongchun Line Forest is, while the Gyeongchun Line Forest is a more event-related place than the 606.
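
Bigram network diagrams like those described above are typically built by counting adjacent word pairs and treating frequent pairs as weighted edges. This is a minimal sketch with hypothetical English tweets and a toy stopword list; the study's corpus is not reproduced, and Korean tweets would need a Korean tokenizer rather than this regular expression.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text):
    return re.findall(r"[a-z0-9#']+", text.lower())


def bigram_edges(tweets, min_count=2):
    """Count adjacent word pairs per tweet, dropping pairs that contain a stopword."""
    counts = Counter()
    for tweet in tweets:
        words = tokenize(tweet)
        counts.update(
            (w1, w2) for w1, w2 in zip(words, words[1:])
            if w1 not in STOPWORDS and w2 not in STOPWORDS
        )
    # Pairs seen at least min_count times become weighted edges of the network.
    return {pair: n for pair, n in counts.items() if n >= min_count}


tweets = [
    "Morning run on the 606 trail",
    "Bike ride on the 606 trail with friends",
    "Weekend flea market at the Gyeongchun Line forest",
    "Flea market again at Gyeongchun Line forest",
]
for (w1, w2), n in sorted(bigram_edges(tweets).items()):
    print(f"{w1} -> {w2}: {n}")
```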

Genetic Programming based Manufacturing Big Data Analytics (유전 프로그래밍을 활용한 제조 빅데이터 분석 방법 연구)

  • Oh, Sanghoun;Ahn, Chang Wook
    • Smart Media Journal / v.9 no.3 / pp.31-40 / 2020
  • Black-box machine learning algorithms are currently used to analyze big data in manufacturing. These algorithms offer high analytical consistency, but their results are difficult to interpret. In the manufacturing industry, however, it is important to verify the basis of the results and the validity of the analysis algorithms through analysis grounded in the principles of the manufacturing process. To overcome the limited explanatory power of such machine learning algorithms, we propose a manufacturing big data analysis method using genetic programming. Genetic programming is a well-known evolutionary algorithm that repeatedly applies operators mimicking biological evolution, such as selection, crossover, and mutation, to find an optimal solution. Each solution is expressed as a relationship between variables using mathematical symbols, and the solution with the highest explanatory power is finally selected. Because the relationships between input and output variables are derived as formulas, the underlying manufacturing mechanism can be interpreted intuitively, and manufacturing principles that could not otherwise be interpreted can be derived from the relationships the formulas express. In a comparison with a typical machine learning algorithm, the proposed technique showed equal or superior performance. The results also indicate that the technique could be applied in various manufacturing fields in the future.
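
A toy version of genetic programming for symbolic regression shows how selection, crossover, and mutation produce a readable formula relating inputs to an output. The operator set, rates, population size, depth limit, and data below are assumptions for illustration, not the paper's settings.

```python
import math
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def random_tree(rng, n_vars, max_depth):
    """Grow a tree of (op, left, right), ("x", var_index) and ("c", constant) nodes."""
    if max_depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_vars))
        return ("c", round(rng.uniform(-2, 2), 2))
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, n_vars, max_depth - 1),
            random_tree(rng, n_vars, max_depth - 1))


def evaluate(tree, row):
    if tree[0] == "x":
        return row[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], row), evaluate(tree[2], row))


def to_str(tree):
    """Render a tree as a human-readable formula."""
    if tree[0] == "x":
        return f"x{tree[1]}"
    if tree[0] == "c":
        return str(tree[1])
    return f"({to_str(tree[1])} {tree[0]} {to_str(tree[2])})"


def tree_depth(tree):
    return 1 + max(tree_depth(tree[1]), tree_depth(tree[2])) if tree[0] in OPS else 1


def subtrees(tree, path=()):
    """Yield (path, node) for every node; a path lists child positions (1 or 2)."""
    yield path, tree
    if tree[0] in OPS:
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace_at(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)


def mse(tree, rows, targets):
    diffs = [evaluate(tree, r) - t for r, t in zip(rows, targets)]
    err = sum(d * d for d in diffs) / len(diffs)
    return err if math.isfinite(err) else math.inf


def tournament(scored, rng, size=5):
    """Return the fittest tree among `size` randomly drawn (error, tree) pairs."""
    return min(rng.sample(scored, size), key=lambda s: s[0])[1]


def evolve(rows, targets, rng, pop_size=100, generations=30, max_depth=6):
    n_vars = len(rows[0])
    pop = [random_tree(rng, n_vars, rng.randint(2, 4)) for _ in range(pop_size)]
    scored = [(mse(t, rows, targets), t) for t in pop]
    for _ in range(generations):
        children = [min(scored, key=lambda s: s[0])[1]]  # elitism: keep the best tree
        while len(children) < pop_size:
            parent = tournament(scored, rng)
            path, _ = rng.choice(list(subtrees(parent)))
            if rng.random() < 0.8:  # crossover: graft a subtree from a second parent
                _, graft = rng.choice(list(subtrees(tournament(scored, rng))))
            else:  # mutation: graft a freshly grown random subtree
                graft = random_tree(rng, n_vars, 2)
            child = replace_at(parent, path, graft)
            children.append(child if tree_depth(child) <= max_depth else parent)
        scored = [(mse(t, rows, targets), t) for t in children]
    best_err, best = min(scored, key=lambda s: s[0])
    return best, best_err


# Hypothetical process data generated from target = x0 * x1 + x0.
rows = [(a, b) for a in (1.0, 2.0, 3.0, 4.0) for b in (0.5, 1.0, 1.5)]
targets = [a * b + a for a, b in rows]
best, err = evolve(rows, targets, random.Random(0))
print(to_str(best), "MSE:", err)
```

Because the result is an explicit expression (rendered by `to_str`), it can be inspected directly, which is the interpretability argument made in the abstract; elitism guarantees the best error never increases from one generation to the next.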