• Title/Summary/Keyword: data science department


Micro marketing using a cosmetic transaction data (화장품 고객 정보를 이용한 마이크로 마케팅)

  • Seok, Kyoung-Ha;Cho, Dae-Hyeon;Kim, Byung-Soo;Lee, Jong-Un;Paek, Seung-Hun;Jeon, Yu-Joong;Lee, Young-Bae;Kim, Jae-Gil
    • Journal of the Korean Data and Information Science Society / v.21 no.3 / pp.535-546 / 2010
  • There are two ways to group customers for a micro marketing promotion: by how much they paid and by how many times they purchased. In this study we are interested in the repurchase probability of customers. By analysing each customer's transaction data and demographic data, we develop a forecasting model of repurchase and construct repurchase indexes, using the logistic regression model as the modeling tool. Finally, we categorize the customers into five groups according to their repurchase indexes, so that customers can be managed effectively for higher profit.
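The workflow described above (fit a logistic regression on transaction and demographic features, use the predicted probability as a repurchase index, then cut it into five groups) can be sketched as follows. The features and synthetic data are illustrative assumptions, not the paper's actual variables:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
# Hypothetical features: total amount paid and purchase count per customer
amount = rng.gamma(2.0, 50.0, n)
freq = rng.poisson(3, n)
# Simulated repurchase outcomes (in practice, observed from transaction history)
logit = -2.0 + 0.01 * amount + 0.4 * freq
repurchase = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([amount, freq])
model = LogisticRegression(max_iter=1000).fit(X, repurchase)
index = model.predict_proba(X)[:, 1]  # repurchase index per customer

# Categorize customers into five groups by repurchase-index quintile
groups = np.digitize(index, np.quantile(index, [0.2, 0.4, 0.6, 0.8]))
```

Marketing effort can then be allocated per group, e.g. targeting the top quintile first.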

A Scalable Data Integrity Mechanism Based on Provable Data Possession and JARs

  • Zafar, Faheem;Khan, Abid;Ahmed, Mansoor;Khan, Majid Iqbal;Jabeen, Farhana;Hamid, Zara;Ahmed, Naveed;Bashir, Faisal
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.6 / pp.2851-2873 / 2016
  • Cloud storage as a service provides high scalability and availability on demand, without a large investment in infrastructure. However, data security risks such as confidentiality, privacy, and integrity of the outsourced data are associated with the cloud-computing model. Over the years, techniques such as remote data checking (RDC), data integrity protection (DIP), provable data possession (PDP), proof of storage (POS), and proof of retrievability (POR) have been devised to frequently and securely check the integrity of outsourced data. In this paper, we improve the efficiency of the PDP scheme in terms of computation, storage, and communication cost for large data archives. By utilizing the capabilities of JAR and ZIP technology, the cost of searching the metadata in the proof generation process is reduced from O(n) to O(1). Moreover, direct access to the metadata reduces disk I/O cost, resulting in 50 to 60 times faster proof generation for large datasets. Furthermore, the proposed scheme achieves a 50% reduction in the storage size of the data and its metadata, providing storage and communication efficiency.
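The O(1) metadata lookup rests on a property of the JAR/ZIP format: the archive's central directory maps each member name to its offset, so the tag for a challenged block can be read directly instead of scanning all n entries. A minimal sketch using Python's zipfile module (the member names and tag contents are hypothetical, not the paper's layout):

```python
import io
import zipfile

# Build an archive of data blocks plus per-block metadata (tags),
# mimicking a JAR-based layout where metadata travels with the data
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    for i in range(100):
        z.writestr(f"blocks/{i}.dat", b"x" * 64)
        z.writestr(f"meta/{i}.tag", f"tag-{i}".encode())

# Proof generation: the central directory lets us jump straight to the
# challenged block's tag; no linear scan over the n stored entries
with zipfile.ZipFile(buf) as z:
    tag = z.read("meta/42.tag")
```

The same named-member access applies to JAR files, which are ZIP archives by specification.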

Spatial Cluster Analysis for Earthquake on the Korean Peninsula

  • Kang, Chang-Wan;Moon, Sung-Ho;Cho, Jang-Sik;Lee, Jeong-Hyeong;Choi, Seung-Bae;Beum, Soo-Gyun
    • Journal of the Korean Data and Information Science Society / v.17 no.4 / pp.1141-1150 / 2006
  • In this study, we performed a spatial cluster analysis that incorporates spatial information, using data on earthquakes that occurred on the Korean peninsula from 1978 to 2005. We also examined how regions cluster by earthquake magnitude and frequency, based on the spatial scan statistic. On the basis of the results, we constructed an earthquake map by outbreak risk and offered a possible explanation for the results of the spatial cluster analysis.
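As a rough illustration of the spatial scan statistic idea (not the authors' exact procedure), the sketch below scores circular windows over synthetic event locations with Kulldorff's Poisson log-likelihood ratio and keeps the highest-scoring window; the coordinates, cluster location, and radii are all invented:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic epicenters: uniform background events plus a denser cluster
bg = rng.uniform(0, 10, size=(80, 2))
cluster = rng.normal([7, 7], 0.5, size=(40, 2))
pts = np.vstack([bg, cluster])
C = len(pts)
area_total = 100.0  # 10 x 10 study region

def llr(c, e, C):
    # Kulldorff Poisson log-likelihood ratio for a window with c observed
    # events against e expected; only windows with an excess (c > e) score
    if c <= e or c == 0 or c == C:
        return 0.0
    return c * np.log(c / e) + (C - c) * np.log((C - c) / (C - e))

best = (0.0, None)
for cx, cy in pts:                    # candidate centers at event locations
    d = np.hypot(pts[:, 0] - cx, pts[:, 1] - cy)
    for r in (0.5, 1.0, 1.5):         # candidate window radii
        c = int((d <= r).sum())
        e = C * (np.pi * r**2) / area_total  # expected under spatial uniformity
        score = llr(c, e, C)
        if score > best[0]:
            best = (score, (cx, cy, r))
```

In a full analysis the best window's significance is assessed by Monte Carlo replication under the null, which is omitted here.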

Building a computing infrastructure in the era of data science (데이터과학 시대에 적합한 컴퓨팅 인프라 구축)

  • Sookhee Choi;Kyungsoo Han;Zhe Wang
    • The Korean Journal of Applied Statistics / v.37 no.1 / pp.49-59 / 2024
  • The popularity of data science, influenced by trends from the United States around 2010, has significantly impacted the education offered by statistics departments at domestic universities. However, it is difficult to find papers in domestic academic journals that address how to teach data science topics efficiently in relation to the computing environment. This article discusses and proposes the establishment of a computing infrastructure suitable for education and research in the statistics and data science departments of domestic universities.

On the Aggregation of Multi-dimensional Data using Data Cube and MDX

  • Ahn, Jeong-Yong;Kim, Seok-Ki
    • Journal of the Korean Data and Information Science Society / v.14 no.1 / pp.37-44 / 2003
  • One of the characteristics of both on-line analytical processing (OLAP) applications and decision support systems is that they provide aggregated source data. The purpose of this study is to discuss the aggregation of multi-dimensional data. In this paper, we (1) examine the SQL aggregate functions and the GROUP BY operator, (2) introduce the Data Cube and MDX, and (3) present an example of the practical usage of the Data Cube and MDX using sample data.
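To make the GROUP BY vs. Data Cube distinction concrete, here is a small sketch in pandas (a stand-in for the paper's SQL/MDX examples; the sales table is invented): `groupby` yields one aggregation level at a time, while `pivot_table` with `margins=True` adds the cube-style subtotals and grand total:

```python
import pandas as pd

# Hypothetical sales fact table: two dimensions, one measure
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "product": ["A", "B", "A", "B"],
    "amount":  [100, 150, 200, 50],
})

# SQL-style GROUP BY: a single aggregation level
by_region = sales.groupby("region")["amount"].sum()

# Data-cube-style: every region x product cell plus "All" totals,
# analogous to what CUBE / MDX queries return
cube = sales.pivot_table(index="region", columns="product",
                         values="amount", aggfunc="sum",
                         margins=True, margins_name="All")
```

The "All" row and column correspond to the higher aggregation levels a cube precomputes for OLAP queries.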

Obesity Level Prediction Based on Data Mining Techniques

  • Alqahtani, Asma;Albuainin, Fatima;Alrayes, Rana;Al muhanna, Noura;Alyahyan, Eyman;Aldahasi, Ezaz
    • International Journal of Computer Science & Network Security / v.21 no.3 / pp.103-111 / 2021
  • Obesity affects individuals of all genders and ages worldwide; consequently, several studies have worked to identify the factors causing it. This study develops an effective method to predict obesity levels based on supervised data mining techniques such as Random Forest and the Multi-Layer Perceptron (MLP), so as to tackle this universal epidemic. The dataset was collected in countries including Mexico, Peru, and Colombia from subjects aged 14 to 61 with varying eating habits and physical conditions. The data comprise 2111 instances and 17 attributes labelled with NObesity, which categorizes the data into Insufficient Weight, Normal Weight, Overweight Levels I and II, and Obesity Types I to III. This study found that the Random Forest algorithm achieved higher accuracy than the MLP algorithm, with an overall classification rate of 96.7%.
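The Random Forest vs. MLP comparison can be sketched with scikit-learn. Since the obesity dataset itself is not reproduced here, the code uses a synthetic stand-in of the same shape (2111 instances, 17 features, 7 classes), so the resulting accuracies are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the 2111 x 17, seven-class obesity dataset
X, y = make_classification(n_samples=2111, n_features=17,
                           n_informative=10, n_classes=7, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fit both classifiers on the same training split
rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
mlp = MLPClassifier(max_iter=500, random_state=0).fit(X_tr, y_tr)

# Held-out accuracy for each model
acc_rf = accuracy_score(y_te, rf.predict(X_te))
acc_mlp = accuracy_score(y_te, mlp.predict(X_te))
```

On real data, a per-class confusion matrix would be reported alongside overall accuracy, since the seven obesity levels are ordered categories.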

Reliability Estimation in Bivariate Pareto Model with Bivariate Type I Censored Data

  • Cho, Jang-Sik;Cho, Kil-Ho;Kang, Sang-Gil
    • Journal of the Korean Data and Information Science Society / v.14 no.4 / pp.837-844 / 2003
  • In this paper, we obtain an estimator of system reliability for the bivariate Pareto model with bivariate Type I censored data. We obtain estimators and approximate confidence intervals for the reliability of the parallel system, based on the likelihood function and the relative frequency, respectively. We also present a numerical example using a computer-generated data set.
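For intuition, the relative-frequency approach to parallel-system reliability can be sketched as below, using the shared Gamma frailty construction of a bivariate Pareto model; the parameters are arbitrary and censoring is omitted for brevity, so this is not the paper's estimator:

```python
import numpy as np

rng = np.random.default_rng(2)
a, l1, l2 = 3.0, 1.0, 0.5   # hypothetical shape and component rate parameters
n = 10_000

# Bivariate Pareto via shared frailty: exponential lifetimes whose
# common rate multiplier theta is Gamma-distributed
theta = rng.gamma(a, 1.0, n)
x = rng.exponential(1.0 / (l1 * theta))
y = rng.exponential(1.0 / (l2 * theta))

t = 1.0
# A parallel system survives while either component survives, so its
# reliability at t is P(max(X, Y) > t); estimate it by relative frequency
r_hat = np.mean(np.maximum(x, y) > t)

# Closed form under this construction, for comparison:
# P(X > t) = (1 + l1*t)^(-a), and similarly for Y and the joint survival
r_true = (1 + l1*t)**-a + (1 + l2*t)**-a - (1 + (l1 + l2)*t)**-a
```

With censored data the naive relative frequency must be adjusted for observations cut off before t, which is where the paper's likelihood-based estimator comes in.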

Large Sample Test for Independence in the Bivariate Pareto Model with Censored Data

  • Cho, Jang-Sik;Lee, Jea-Man;Lee, Woo-Dong
    • Journal of the Korean Data and Information Science Society / v.14 no.2 / pp.377-383 / 2003
  • In this paper, we consider a two-component system in which the lifetimes follow the bivariate Pareto model with randomly censored data. We assume that the censoring time is independent of the lifetimes of the two components. We develop large sample tests of independence between the two components, and present a simulation study of the test based on the asymptotic normal distribution.
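As a generic illustration of a large-sample independence test (not the authors' statistic for the bivariate Pareto model), one can apply the Fisher z-transform to the sample correlation, which is approximately standard normal under independence; the lifetimes here are simulated and uncensored:

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n = 400
# Simulated component lifetimes, generated independently so the
# null hypothesis of independence actually holds
x = rng.exponential(1.0, n)
y = rng.exponential(1.0, n)

r = np.corrcoef(x, y)[0, 1]           # sample correlation
z = math.atanh(r) * math.sqrt(n - 3)  # Fisher z, approx N(0, 1) under H0
# Two-sided p-value from the standard normal tail
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
```

The papers' tests replace this generic statistic with one derived from the bivariate Pareto likelihood, but the large-sample logic (standardize, compare to N(0, 1)) is the same.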

Test for Independence in Bivariate Pareto Model with Bivariate Random Censored Data

  • Cho, Jang-Sik;Kwon, Yong-Man;Choi, Seung-Bae
    • Journal of the Korean Data and Information Science Society / v.15 no.1 / pp.31-39 / 2004
  • In this paper, we consider a two-component system in which the lifetimes follow the bivariate Pareto model with bivariate randomly censored data. We assume that the censoring times are independent of the lifetimes of the two components. We develop a large sample test of independence between the two components, and present a simulation study of the test based on the asymptotic normal distribution.
