• Title/Summary/Keyword: Data sparsity


Web Log Data Sparsity Analysis for OLAP (웹 로그 데이터의 OLAP 연산을 위한 희박성 분석)

  • 김지현;용환승
    • Proceedings of the Korean Information Science Society Conference / 2001.10a / pp.58-60 / 2001
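  • Applying OLAP is necessary to enable real-time multidimensional analysis of web log data, which grows by tens to hundreds of megabytes per day. However, when precomputation is performed to obtain fast response times in OLAP, severe data sparsity causes a data explosion. In this paper, we use real web log data to identify the causes of sparsity when OLAP is applied, and we analyze the forms of sparsity in two and three dimensions, providing a basis for sparsity-handling methods and performance evaluation for web log data.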
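
To make the notion of cube sparsity concrete, the following is a minimal sketch that counts how many cells of a small cube are actually populated. The log records, the dimension names (`page`, `hour`, `country`), and the helper `cube_sparsity` are hypothetical illustrations and are not taken from the paper.

```python
def cube_sparsity(records, dims):
    """Return (populated_cells, possible_cells, sparsity) for the given dimensions."""
    # Distinct values observed along each dimension.
    domains = [{r[d] for r in records} for d in dims]
    possible = 1
    for dom in domains:
        possible *= len(dom)
    # Cells of the cube that hold at least one record.
    populated = {tuple(r[d] for d in dims) for r in records}
    sparsity = 1.0 - len(populated) / possible
    return len(populated), possible, sparsity

# Hypothetical web log records (illustrative only).
logs = [
    {"page": "/home", "hour": 9, "country": "KR"},
    {"page": "/home", "hour": 10, "country": "KR"},
    {"page": "/cart", "hour": 9, "country": "US"},
    {"page": "/help", "hour": 23, "country": "JP"},
]

two_d = cube_sparsity(logs, ["page", "hour"])
three_d = cube_sparsity(logs, ["page", "hour", "country"])
print(two_d, three_d)
```

In this toy data the number of populated cells stays the same while the number of possible cells triples when a third dimension is added, which is the kind of growth that makes precomputing every cell expensive.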


A Study on the Real-Time Preference Prediction for Personalized Recommendation on the Mobile Device (모바일 기기에서 개인화 추천을 위한 실시간 선호도 예측 방법에 대한 연구)

  • Lee, Hak Min;Um, Jong Seok
    • Journal of Korea Multimedia Society / v.20 no.2 / pp.336-343 / 2017
  • We propose a real-time personalized recommendation algorithm for mobile devices. We use unified collaborative filtering with reduced data. Fuzzy C-means clustering is used to obtain the reduced data, and a Kohonen SOM is applied to obtain initial values of the cluster centers. The proposed algorithm overcomes data sparsity because it extends the data to similar users and similar items. It also enables real-time service on the mobile device because data clustering reduces computing time. Applying the suggested algorithm to the MovieLens data, we show that it has reasonable performance in comparison with collaborative filtering. We developed an Android-based smartphone application that recommends restaurants with coupons and restaurant information.
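
As a small illustration of the Fuzzy C-means step mentioned above, the sketch below computes the standard membership degrees of one point given fixed cluster centers. It covers only the membership update, not the full algorithm or the SOM initialization; the one-dimensional data, the centers, and the function name `fcm_memberships` are assumptions made for the example.

```python
def fcm_memberships(point, centers, m=2.0):
    """Fuzzy C-means membership of one point in each cluster (fuzzifier m > 1)."""
    dists = [abs(point - c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster.
    if 0.0 in dists:
        hits = dists.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]

centers = [0.0, 10.0]  # hypothetical 1-D cluster centers
u = fcm_memberships(2.0, centers)
print(u)
```

The memberships sum to one, so a point near a cluster boundary contributes to several clusters at once rather than to a single one.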

The research of new algorithm to improve prediction accuracy of recommender system in electronic commerce

  • Kim, Sun-Ok
    • Journal of the Korean Data and Information Science Society / v.21 no.1 / pp.185-194 / 2010
  • In recommender systems, which are widely used in e-commerce, collaborative filtering needs user ratings and neighbor-user ratings. These are important values for recommendation. We investigate the rating information in the neighbor-based collaborative filtering algorithm (NBCFA) and suggest a new algorithm that improves the prediction accuracy of the recommender system. After analyzing the relationship between two variables and the error value (EV), we propose the new algorithm and apply it to a fitted line. The fitted line is obtained by the least squares method (LSM) in exploratory data analysis (EDA). To compute the prediction value of the new algorithm, the fitted line is applied to the experimental data through the fitted function. To confirm the prediction accuracy, we applied the new algorithm to data with increased sparsity and to the full data. As a result, the new algorithm achieved better prediction accuracy than the current algorithm.
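
The least squares fit of a line that the abstract refers to can be sketched as follows. This is ordinary least squares for a single predictor; the abstract does not say which two variables are fitted, so `xs`, `ys`, and the function name `fit_line` are placeholders chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of x and cross-products of x and y about their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Placeholder data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```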

A Recommender System Using Factorization Machine (Factorization Machine을 이용한 추천 시스템 설계)

  • Jeong, Seung-Yoon;Kim, Hyoung Joong
    • Journal of Digital Contents Society / v.18 no.4 / pp.707-712 / 2017
  • As the amount of data increases exponentially, recommender systems are attracting interest and being studied in industries such as movies, books, and music. A recommender system aims to propose an appropriate item to the user based on the user's past preferences and click stream. Typical examples include Netflix's movie recommendation system and Amazon's book recommendation system. Previous studies can be categorized into three types: collaborative filtering, content-based recommendation, and hybrid recommendation. However, existing recommender systems have disadvantages such as sparsity, cold start, and scalability problems. To address these shortcomings and to develop a more accurate recommender system, we designed a recommender system based on a factorization machine using actual online product purchase data.
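
For readers unfamiliar with factorization machines, the sketch below evaluates the standard second-order model: a bias, linear terms, and pairwise interactions whose weights are inner products of per-feature latent vectors. The parameter values and the one-hot feature layout are invented for the example and do not come from the paper's purchase data.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction in O(k * n) time."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        # Identity: sum_{i<j} v_if v_jf x_i x_j
        #         = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise

def fm_predict_naive(x, w0, w, V):
    """Same model written as an explicit double loop, for checking."""
    total = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total

# Hypothetical encoding: features 0-1 are users, 2-3 are items (one-hot).
x = [1.0, 0.0, 1.0, 0.0]
w0 = 0.1
w = [0.2, 0.0, -0.1, 0.3]
V = [[0.5, 0.1], [0.2, 0.4], [0.3, -0.2], [0.0, 0.6]]
score = fm_predict(x, w0, w, V)
print(score)
```

Because every pair of features shares latent vectors, an interaction weight can be estimated even for a user-item pair that never co-occurs in the training data, which is why the model is often suggested for sparse inputs.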

Clustering Method of Weighted Preference Using K-means Algorithm and Bayesian Network for Recommender System (추천시스템을 위한 k-means 기법과 베이시안 네트워크를 이용한 가중치 선호도 군집 방법)

  • Park, Wha-Beum;Cho, Young-Sung;Ko, Hyung-Hwa
    • Journal of Information Technology Applications and Management / v.20 no.3_spc / pp.219-230 / 2013
  • Real-time accessibility and agility are required in ubiquitous commerce under a ubiquitous computing environment. Research in e-commerce has actively sought to improve the accuracy of recommendation. Existing collaborative filtering (CF) cannot reflect the contents of items, and it suffers from problems in selecting the neighborhood user group as well as from sparsity and scalability. Although a system has been used in practice to address these defects, it still does not reflect the attributes of the item. In this paper, to solve this problem, we use an implicit method based on customer data and purchase history data. We propose a new clustering method of weighted preference for customers using k-means clustering and a Bayesian network to improve the accuracy of recommendation. To verify the improved performance of the proposed system, we conduct experiments with a dataset collected from an online cosmetics shopping mall.

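An Efficient Implementation of the Modified Forrest-Tomlin Method in the LU-Decomposition Simplex Method (상하분해 단체법에서 수정 Forrest-Tomlin 방법의 효율적인 구현)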

  • 김우제;임성묵;박순달
    • Proceedings of the Korean Operations and Management Science Society Conference / 1998.10a / pp.63-66 / 1998
  • In implementing a simplex method program, the representation and maintenance of the basis matrix are very important. In this experimental study, we investigate Suhl's ideas for the LU factorization and LU update of the basis matrix. First, the triangularization of the basis matrix is implemented and its efficiency is shown. Second, various techniques for dynamic Markowitz ordering and threshold pivoting are presented. Third, a modified Forrest-Tomlin LU update method that exploits sparsity is presented. Fourth, the Gustavson data structure is explained as a storage scheme for the LU factors. Fifth, an efficient timing of reinversion is developed. Finally, we show that the modified Forrest-Tomlin method with the Gustavson data structure outperforms the Reid method with a linked-list data structure by more than 30%.
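
The combination of Markowitz ordering and threshold pivoting mentioned above can be illustrated with a minimal pivot-selection sketch. It picks the nonzero with the smallest Markowitz count (r - 1)(c - 1) among entries that are large enough relative to the biggest entry in their row. The dict-of-rows storage, the threshold value 0.1, and the row-wise stability test are assumptions for illustration; they are not the Gustavson structure or the specific rules used in the paper.

```python
def markowitz_pivot(rows, threshold=0.1):
    """Pick a pivot (cost, row, col) from a sparse matrix.

    rows maps row index -> {column index: nonzero value}.
    """
    # Number of nonzeros in each column.
    col_counts = {}
    for cols in rows.values():
        for j in cols:
            col_counts[j] = col_counts.get(j, 0) + 1

    best = None
    for i, cols in rows.items():
        if not cols:
            continue
        row_max = max(abs(v) for v in cols.values())
        for j, v in cols.items():
            # Threshold pivoting: skip entries that are too small for stability.
            if abs(v) < threshold * row_max:
                continue
            # Markowitz count bounds the fill-in this pivot can create.
            cost = (len(cols) - 1) * (col_counts[j] - 1)
            if best is None or cost < best[0]:
                best = (cost, i, j)
    return best

# Hypothetical 3x3 sparse matrix.
matrix = {
    0: {0: 4.0, 1: 1.0, 2: 2.0},
    1: {1: 3.0},
    2: {0: 1.0, 2: 5.0},
}
pivot = markowitz_pivot(matrix)
print(pivot)
```

Here the singleton row is chosen because pivoting on it creates no fill-in, which is also the idea behind triangularizing the basis before general elimination.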


Adaptive ridge procedure for L0-penalized weighted support vector machines

  • Kim, Kyoung Hee;Shin, Seung Jun
    • Journal of the Korean Data and Information Science Society / v.28 no.6 / pp.1271-1278 / 2017
  • Although the $L_0$-penalty is the most natural choice for identifying the sparsity structure of a model, it has not been widely used because of its computational bottleneck. Recently, the adaptive ridge procedure was developed to efficiently approximate an $L_q$-penalized problem by an iterative $L_2$-penalized one. In this article, we propose applying the adaptive ridge procedure to solve the $L_0$-penalized weighted support vector machine (WSVM) and thereby facilitate the corresponding optimization. Our numerical investigation shows the advantageous performance of the $L_0$-penalized WSVM compared to the conventional WSVM with an $L_2$ penalty for both simulated and real data sets.
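
The idea of approximating an $L_0$ penalty by repeatedly reweighted ridge problems can be sketched on a much simpler problem than the WSVM: least squares with an orthonormal design, where each ridge step has a closed form per coordinate. The weight rule $w_j = 1/(\beta_j^2 + \delta^2)$, the penalty scaling $\lambda/2$, and all numbers below are assumptions for this toy setting, not the paper's formulation.

```python
def adaptive_ridge_orthonormal(z, lam, delta=1e-4, iters=100):
    """Adaptive ridge iterations for min 0.5*(z_j - b_j)^2 + (lam/2)*w_j*b_j^2.

    z holds the unpenalized least squares estimates (orthonormal design).
    """
    beta = list(z)
    for _ in range(iters):
        # Reweight: small coefficients receive a very large ridge penalty.
        weights = [1.0 / (b * b + delta * delta) for b in beta]
        # Closed-form weighted ridge solution, one coordinate at a time.
        beta = [zj / (1.0 + lam * wj) for zj, wj in zip(z, weights)]
    return beta

z = [3.0, 0.1, -2.0, 0.05]  # hypothetical unpenalized estimates
beta = adaptive_ridge_orthonormal(z, lam=0.5)
support = [j for j, b in enumerate(beta) if abs(b) > 1e-4]
print(beta, support)
```

In this toy run the two small coefficients are driven to numerically zero while the two large ones are only mildly shrunk, which mimics the selection behavior of an $L_0$ penalty using nothing but $L_2$-penalized updates.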

Support Vector Quantile Regression Using Asymmetric e-Insensitive Loss Function

  • Shim, Joo-Yong;Seok, Kyung-Ha;Hwang, Chang-Ha;Cho, Dae-Hyeon
    • Communications for Statistical Applications and Methods / v.18 no.2 / pp.165-170 / 2011
  • Support vector quantile regression (SVQR) is capable of providing a good description of the linear and nonlinear relationships among random variables. In this paper, we propose a sparse SVQR to overcome a limitation of SVQR, namely its nonsparsity. The asymmetric e-insensitive loss function is used to provide sparsity efficiently. Experimental results illustrate the performance of the proposed method in comparison with nonsparse SVQR.

Sparse Kernel Regression using IRWLS Procedure

  • Park, Hye-Jung
    • Journal of the Korean Data and Information Science Society / v.18 no.3 / pp.735-744 / 2007
  • The support vector machine (SVM) is capable of providing a more complete description of the linear and nonlinear relationships among random variables. In this paper, we propose a sparse kernel regression (SKR) to overcome a weak point of SVM, namely the steep growth in the number of support vectors as the number of training data increases. The iteratively reweighted least squares (IRWLS) procedure is used to solve the optimization problem of SKR with a Laplacian prior. Furthermore, the generalized cross validation (GCV) function is introduced to select the hyperparameters that affect the performance of SKR. Experimental results are then presented to illustrate the performance of the proposed procedure.


Big IoT Healthcare Data Analytics Framework Based on Fog and Cloud Computing

  • Alshammari, Hamoud;El-Ghany, Sameh Abd;Shehab, Abdulaziz
    • Journal of Information Processing Systems / v.16 no.6 / pp.1238-1249 / 2020
  • Throughout the world, aging populations and doctor shortages have helped drive the increasing demand for smart healthcare systems. Recently, these systems have benefited from the evolution of the Internet of Things (IoT), big data, and machine learning. However, these advances result in the generation of large amounts of data, making healthcare data analysis a major issue. These data have a number of complex properties, such as high dimensionality, irregularity, and sparsity, which make efficient processing difficult to implement. These challenges are met by big data analytics. In this paper, we propose an innovative analytic framework for big healthcare data that are collected either from IoT wearable devices or from archived patient medical images. The proposed method would efficiently address the data heterogeneity problem using middleware between heterogeneous data sources and MapReduce Hadoop clusters. Furthermore, the proposed framework enables the use of both fog computing and cloud platforms to handle the problems faced in online and offline data processing, data storage, and data classification. Additionally, it guarantees robust and secure knowledge of patient medical data.