• Title/Summary/Keyword: Generate Data

Search Results: 3,066

Conditional Generative Adversarial Network based Collaborative Filtering Recommendation System (Conditional Generative Adversarial Network(CGAN) 기반 협업 필터링 추천 시스템)

  • Kang, Soyi;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.157-173 / 2021
  • With the development of information technology, the amount of available information increases daily, but this abundance makes it difficult for users to find the information they actually seek. Users want a system that reduces retrieval and learning time, saving them from personally reading and judging all available information. Recommendation systems are therefore increasingly important technologies that are essential to business. Collaborative filtering is used in various fields with excellent performance because recommendations are made based on similar users' interests and preferences. However, it has limitations. Sparsity, which occurs when user-item preference information is insufficient, is its main limitation. The rating values in the user-item matrix may be distorted depending on the popularity of a product, and new users may not yet have rated anything. This lack of historical data for identifying consumer preferences is referred to as data sparsity, and various methods have been studied to address it. However, most attempts to solve the sparsity problem are suboptimal because they apply only when additional data are available, such as users' personal information, social networks, or item characteristics. Another problem is that real-world rating data are mostly biased toward high scores, resulting in severe imbalance. One cause of this imbalanced distribution is purchasing bias: users who rate a product highly tend to purchase it, while those with low ratings are less likely to purchase it and thus do not leave negative reviews. Because of this, reviews by purchasing users are more likely to be positive than most users' actual preferences would suggest.
Consequently, models over-learn the high-incidence classes in the biased rating data, distorting the picture of the market. Applying collaborative filtering to such imbalanced data leads to poor recommendation performance due to excessive learning of the biased classes. Traditional oversampling techniques for this problem are prone to overfitting because they repeat the same data, which acts as noise during learning and reduces recommendation performance. In addition, most existing pre-processing methods for data imbalance are designed for binary classes. Binary-class imbalance techniques are difficult to apply to multi-class problems because they cannot model phenomena such as objects at cross-class boundaries or objects overlapping multiple classes. Research has therefore converted multi-class problems into binary-class problems, but this simplification can cause classification errors when the results of classifiers learned on sub-problems are combined, losing important information about relationships beyond the selected items. More effective methods for multi-class imbalance are therefore needed. We propose a collaborative filtering model that uses a CGAN to generate realistic virtual data to populate the empty user-item matrix. The conditional vector y identifies the distributions of minority classes and generates data reflecting their characteristics. Collaborative filtering then maximizes recommendation performance via hyperparameter tuning. This process improves model accuracy by addressing the sparsity problem of collaborative filtering while mitigating the imbalance found in real data. Our model shows superior recommendation performance over existing oversampling techniques on sparse real-world data. SMOTE, Borderline-SMOTE, SVM-SMOTE, ADASYN, and GAN were used as comparative models, and our model achieved the highest prediction accuracy on the RMSE and MAE metrics. Building on this study, deep-learning-based oversampling can further refine the performance of recommendation systems trained on actual data and be used to build business recommendation systems.
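The abstract does not specify the CGAN architecture, but the classic SMOTE baseline it compares against can be sketched in a few lines: a synthetic minority sample is an interpolation between a minority point and one of its k nearest neighbours. This is a minimal pure-Python sketch with illustrative data, not the paper's implementation:

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate synthetic minority samples by interpolating between a
    random minority point and one of its k nearest neighbours (SMOTE)."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: sq_dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic

# Toy minority class (e.g. rare low-rating feature vectors, illustrative).
minority = [(1.0, 1.2), (1.1, 0.9), (0.9, 1.0), (1.2, 1.1)]
new_points = smote(minority, n_new=5)
```

Because each synthetic point lies on a segment between two real minority points, it stays inside the minority region rather than duplicating existing rows, which is what distinguishes SMOTE from naive oversampling.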

Production Data Analysis to Predict Production Performance of Horizontal Well in a Hydraulically Fractured CBM Reservoir (수압파쇄된 CBM 저류층에서 수평정의 생산 거동예측을 위한 생산자료 분석)

  • Kim, Young-Min;Park, Jin-Young;Han, Jeong-Min;Lee, Jeong-Hwan
    • Journal of the Korean Institute of Gas / v.20 no.3 / pp.1-11 / 2016
  • Production data from a hydraulically fractured well in coalbed methane (CBM) reservoirs were analyzed using decline curve analysis (DCA), flow regime analysis, and flowing material balance to forecast production performance and to determine the estimated ultimate recovery (EUR) and the timing for applying DCA. To generate synthetic production data, reservoir models were built based on CBM properties of the Appalachian Basin, USA. Production data analysis shows that transient flow (TF) occurs for 6~16 years before boundary-dominated flow (BDF) is reached. In the TF period, forecasting production performance is impossible due to the significant errors between predicted and synthetic data. Prediction can be conducted using more than a year of production data after BDF is reached, with an EUR error of approximately 5%.
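The DCA step above conventionally fits an Arps-type decline to the rate history and integrates it to an economic limit to obtain EUR. A minimal sketch of the standard Arps hyperbolic form follows; the parameter values are illustrative, not taken from the paper:

```python
def arps_hyperbolic(qi, di, b, t):
    """Arps hyperbolic decline: production rate at time t, given initial
    rate qi, initial decline rate di and decline exponent b (0 < b < 1)."""
    return qi / (1.0 + b * di * t) ** (1.0 / b)

def eur_trapezoid(qi, di, b, t_end, dt=0.01):
    """Trapezoidal integration of the rate curve up to t_end gives an
    estimated ultimate recovery (EUR) for that producing life."""
    n = int(t_end / dt)
    total = 0.0
    for i in range(n):
        t0, t1 = i * dt, (i + 1) * dt
        total += 0.5 * (arps_hyperbolic(qi, di, b, t0)
                        + arps_hyperbolic(qi, di, b, t1)) * dt
    return total

# Illustrative fit: qi in Mscf/d-like units, di per year, 10-year life.
rate_year5 = arps_hyperbolic(qi=1000.0, di=0.5, b=0.8, t=5.0)
eur = eur_trapezoid(qi=1000.0, di=0.5, b=0.8, t_end=10.0)
```

Fitting (qi, di, b) during the TF period is what produces the large forecast errors the abstract reports, since the hyperbolic form presumes boundary-dominated behaviour.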

Development of Data Warehouse for Construction Material Management (건설공사 자재 관리를 위한 데이터 웨어하우스 개발)

  • Ryu, Han-Guk
    • Journal of the Korea Institute of Building Construction / v.11 no.3 / pp.319-325 / 2011
  • During a construction project, construction managers must be provided with material information that helps them make decisions efficiently without delaying material delivery. Construction work proceeds smoothly only with a proper material supply, and construction duration depends on several material-related decisions, including the ordering, delivery, and allocation of material to the correct work location. Hence, it is worthwhile to introduce data warehouse techniques, which generate subject-oriented and integrated data, to construction material management. A data warehouse for construction material management can perform multidimensional analysis and then define KPIs (Key Performance Indicators) in order to provide construction managers with material information such as lead time, material delivery rate, and material installation rate. This research proposes a method of effectively exploiting the large amounts of data held in operational systems during the construction management process. In other words, the proposed method can supply structured, multi-perspective material-related information using data warehouse techniques.
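The KPIs named above reduce to simple aggregates over a warehouse fact table of material orders. A toy sketch with hypothetical records (the column layout is assumed, not from the paper):

```python
from datetime import date

# Hypothetical fact-table rows: (order date, delivery date, installed?).
orders = [
    (date(2011, 3, 1), date(2011, 3, 8), True),
    (date(2011, 3, 2), date(2011, 3, 12), True),
    (date(2011, 3, 5), None, False),              # not yet delivered
]

delivered = [(o, d, i) for o, d, i in orders if d is not None]

# Lead time: days from order to delivery, averaged over delivered orders.
avg_lead_time = sum((d - o).days for o, d, _ in delivered) / len(delivered)

# Delivery and installation rates as fractions of all orders.
delivery_rate = len(delivered) / len(orders)
installation_rate = sum(1 for *_, i in orders if i) / len(orders)
```

In a real warehouse these would be OLAP aggregates sliced by project, supplier, or period rather than a flat list comprehension.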

An Implementation of XML Database System for Semantic-Based E-Catalog Image Retrieval (의미기반 전자 카탈로그 이미지 검색을 위한 XML 데이타베이스 시스템 구현)

  • Hong Sungyong;Nah Yunmook
    • Journal of Korea Multimedia Society / v.7 no.9 / pp.1219-1232 / 2004
  • Recently, web sites such as e-business and shopping mall sites deal with large amounts of catalog image information and contents, so efficient semantic-based retrieval over such image data is required. This paper presents a semantic-based image retrieval system that adopts XML and fuzzy technology. To support semantic-based retrieval on product catalog images containing multiple objects, we use a multi-level metadata structure that represents the product information and the semantics of the image data. To enable semantic-based retrieval, we design an XML database for storing the proposed metadata and study how to apply fuzzy data. The proposed system automatically generates fuzzy data from the image metadata and supports semantic-based image retrieval by utilizing the generated fuzzy data. It will therefore contribute to improving retrieval correctness and user satisfaction in semantic-based e-catalog image retrieval.
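The abstract does not give its fuzzification scheme, but generating fuzzy data from numeric metadata typically means mapping a field to membership degrees in linguistic terms. A minimal sketch with hypothetical term boundaries:

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership: rises from a to a peak at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical mapping from a numeric metadata field (e.g. object size in
# pixels) to linguistic terms usable in semantic queries.
terms = {
    "small":  lambda x: triangular(x, 0, 50, 120),
    "medium": lambda x: triangular(x, 80, 150, 250),
    "large":  lambda x: triangular(x, 200, 320, 400),
}

def fuzzify(x):
    """Membership degree of x in each linguistic term."""
    return {t: round(f(x), 3) for t, f in terms.items()}
```

A query such as "medium-sized product" can then rank catalog images by their membership degree rather than by an exact attribute match.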

Predictive Convolutional Networks for Learning Stream Data (스트림 데이터 학습을 위한 예측적 컨볼루션 신경망)

  • Heo, Min-Oh;Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices / v.22 no.11 / pp.614-618 / 2016
  • As information on the internet and data from smart devices grow, the amount of stream data in the real world is also increasing. Stream data, which are potentially unbounded, require models and algorithms capable of online learning. In this paper, we propose a novel class of models, predictive convolutional neural networks, that can perform online learning. These models are designed to capture longer patterns at higher layers by stacking convolutional operations: detection and max-pooling on the time axis. As a preliminary check of the concept, we chose a GPS data sequence gathered over two months as the observation sequence. After learning it with the proposed method, we compared the original sequence with the sequence regenerated from the models' abstract information. The results show that the models can encode long-range patterns and can regenerate a raw observation sequence with low error.
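The layered "detection and max-pooling on the time axis" can be illustrated with the two primitive operations involved; this is a toy sketch of the mechanism, not the paper's model, and the signal and kernel are made up:

```python
def conv1d(seq, kernel):
    """Valid 1-D convolution (detection) along the time axis."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def max_pool(seq, width):
    """Non-overlapping max-pooling along the time axis."""
    return [max(seq[i:i + width]) for i in range(0, len(seq) - width + 1, width)]

# A toy GPS-like 1-D signal: each conv+pool layer widens the temporal
# receptive field, so higher layers summarise longer patterns.
signal = [0.0, 0.1, 0.9, 1.0, 0.8, 0.1, 0.0, 0.2]
edge_kernel = [-1.0, 0.0, 1.0]        # crude change detector (illustrative)
layer1 = max_pool(conv1d(signal, edge_kernel), 2)
```

Stacking this pair again on `layer1` would halve the temporal resolution once more, which is how higher layers come to encode the long-range patterns the abstract describes.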

Efficient Quantitative Association Rules with Parallel Processing (병렬처리를 이용한 효율적인 수량 연관규칙)

  • Lee, Hye-Jung;Hong, Min;Park, Doo-Soon
    • Journal of Korea Multimedia Society / v.10 no.8 / pp.945-957 / 2007
  • Quantitative association rules apply binary association to data with relatively strong quantitative attributes in a large database system. When the domain of quantitative data carrying significant meaning for an association is too broad, the domain must be divided into proper intervals that satisfy the minimum support in order to generate large interval items. The reliability of the formulated rules is strongly influenced by how the large interval items are generated. Therefore, this paper proposes a new method to generate large interval items efficiently. The proposed method does not lose any meaningful intervals compared with existing methods, produces accurate large interval items that are close to the minimum support, and minimizes the loss of data characteristics. In addition, since our method merges data where the frequency is sufficiently high, it runs faster than other methods on broad quantitative domains. To verify the superiority of the proposed method, real national census data were used for the performance analysis, and a Clunix HPC system was used for the parallel processing.
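The general idea of forming large interval items (though not this paper's specific algorithm) can be sketched as a greedy merge of adjacent value buckets until each merged interval meets the minimum support; the bucket counts below are hypothetical:

```python
def merge_intervals(counts, total, min_support):
    """Greedily merge adjacent value buckets until each merged interval
    meets the minimum support (fraction of total records)."""
    threshold = min_support * total
    merged, cur_lo, cur_hi, cur_n = [], None, None, 0
    for (lo, hi), n in sorted(counts.items()):
        if cur_lo is None:
            cur_lo, cur_hi, cur_n = lo, hi, n
        else:
            cur_hi, cur_n = hi, cur_n + n
        if cur_n >= threshold:
            merged.append(((cur_lo, cur_hi), cur_n))
            cur_lo, cur_n = None, 0
    if cur_lo is not None:            # tail that never reached support
        merged.append(((cur_lo, cur_hi), cur_n))
    return merged

# Hypothetical age buckets with record counts (100 records, 20% support).
buckets = {(0, 10): 5, (10, 20): 8, (20, 30): 12, (30, 40): 40, (40, 50): 35}
large_items = merge_intervals(buckets, total=100, min_support=0.2)
```

Merging only until the threshold is just exceeded keeps each large interval item close to the minimum support, which preserves more of the data's quantitative structure than merging into few wide intervals.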

A Study of Inverse Modeling from Micro Gas Turbine Experimental Test Data (소형 가스터빈 엔진 실험 데이터를 이용한 역모델링 연구)

  • Kong, Chang-Duk;Lim, Se-Myeong;Koo, Young-Ju;Kim, Keon-Woo;Oh, Seong-Hwan;Kim, Ji-Hyun
    • Proceedings of the Korean Society of Propulsion Engineers Conference / 2009.11a / pp.537-541 / 2009
  • Gas turbine engine performance relies greatly on the performance characteristics of its components. Acquiring component maps is generally not easy for engine purchasers because they are expensive intellectual property of the engine supplier. In previous work, the maps were inversely generated from engine performance deck data, but this approach is limited in producing realistic maps because the deck data are themselves calculated. This work therefore proposes generating a more realistic compressor map from experimental performance test data. A realistic compressor map can be generated from the processed test data using the proposed extended scaling method at each rotational speed. Evaluation is made by comparing performance analysis results, obtained with a performance simulation program that includes the generated compressor map, against on-condition monitoring performance data.
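The extended scaling method itself is not given in the abstract, but the conventional map-scaling step it builds on shifts a reference map so it passes through a measured design point. A minimal sketch with illustrative numbers:

```python
def scale_map_point(ref_point, design, ref_design):
    """Scale one reference-map operating point by the ratios between the
    measured design point and the reference design point (classic
    map scaling; mass flow, pressure ratio and efficiency scaled
    independently)."""
    f_mass = design["mdot"] / ref_design["mdot"]
    f_pr = (design["pr"] - 1.0) / (ref_design["pr"] - 1.0)
    f_eff = design["eff"] / ref_design["eff"]
    return {
        "mdot": ref_point["mdot"] * f_mass,
        "pr": 1.0 + (ref_point["pr"] - 1.0) * f_pr,
        "eff": ref_point["eff"] * f_eff,
    }

# Illustrative values: reference map design point vs. measured test data.
ref_design = {"mdot": 1.0, "pr": 4.0, "eff": 0.80}
design = {"mdot": 0.9, "pr": 3.6, "eff": 0.78}
scaled = scale_map_point({"mdot": 1.1, "pr": 4.5, "eff": 0.79},
                         design, ref_design)
```

Applying the same idea with speed-dependent factors, as the abstract's "at each rotational speed" suggests, lets the scaled map follow the measured behaviour away from the design point as well.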

Refining massive event logs to evaluate performance measures of the container terminal (컨테이너 터미널 성능평가를 위한 대용량 이벤트 로그 정제 방안 연구)

  • Park, Eun-Jung;Bae, Hyerim
    • The Journal of Bigdata / v.4 no.1 / pp.11-27 / 2019
  • The earnings rate of container terminals is gradually decreasing because of a worsened business environment. To enhance the global competitiveness of a terminal, container terminal operators have been attempting to resolve operational problems by analyzing terminal operations as a whole. To improve operations, operators make efforts to analyze and utilize data from the database that collects and stores, in real time, the data generated during terminal operation. In this paper, we analyze the characteristics of the operating processes and define the event log data needed to generate container processes and CKO processes from data stored in the TOS (terminal operating system). We explain how imperfect event logs that create non-normal processes can be refined effectively by analyzing the container and CKO processes. We also propose a framework to refine the event logs easily and quickly. To validate the proposed framework, we implemented it using python2.7 and tested it using data collected from a real container terminal as input. As a result, we verified that the non-normal processes in the terminal operations were greatly improved.
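The paper's refinement rules are specific to TOS data, but the general kind of filtering involved, dropping incomplete records and out-of-order events within a container's trace, can be sketched generically (field names and the sample log are hypothetical):

```python
from datetime import datetime

def refine(events):
    """Drop imperfect events (missing fields, non-chronological timestamps
    within one container's trace) before process analysis."""
    ok = []
    last_seen = {}
    for e in events:
        if not e.get("container") or not e.get("activity") or not e.get("ts"):
            continue                      # incomplete record
        ts = datetime.fromisoformat(e["ts"])
        cid = e["container"]
        if cid in last_seen and ts < last_seen[cid]:
            continue                      # out-of-order event in this trace
        last_seen[cid] = ts
        ok.append(e)
    return ok

log = [
    {"container": "C1", "activity": "load", "ts": "2019-01-01T10:00:00"},
    {"container": "C1", "activity": "move", "ts": "2019-01-01T09:00:00"},  # out of order
    {"container": "", "activity": "load", "ts": "2019-01-01T11:00:00"},    # no id
    {"container": "C2", "activity": "load", "ts": "2019-01-01T12:00:00"},
]
clean = refine(log)
```

In a process-mining setting, each surviving (container, activity, timestamp) triple then contributes one event to the reconstructed container process.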

Bayesian Approaches to Zero Inflated Poisson Model (영 과잉 포아송 모형에 대한 베이지안 방법 연구)

  • Lee, Ji-Ho;Choi, Tae-Ryon;Wo, Yoon-Sung
    • The Korean Journal of Applied Statistics / v.24 no.4 / pp.677-693 / 2011
  • In this paper, we consider Bayesian approaches to the zero-inflated Poisson model, one of the popular models for analyzing zero-inflated count data. To generate posterior samples, we use both a Markov chain Monte Carlo method based on a Gibbs sampler and an exact sampling method using the Inverse Bayes Formula (IBF). The posterior sampling algorithms of the two methods are compared, and convergence checking for the Gibbs sampler is discussed, in particular using posterior samples from IBF sampling. Based on these sampling methods, a real data analysis is performed on the Trajan data (Marin et al., 1993), and our results are compared with an existing analysis of the Trajan data. We also discuss model selection for the Trajan data between the Poisson model and the zero-inflated Poisson model using various criteria. In addition, we complement the previous work by Rodrigues (2003) via further data analysis using a hierarchical Bayesian model.
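A standard Gibbs sampler for the zero-inflated Poisson model augments the data with indicators for structural zeros and alternates conjugate updates. This is a minimal sketch under illustrative priors (Beta(1,1) and Gamma(1,1)), not the paper's exact specification or data:

```python
import math
import random

def zip_gibbs(y, n_iter=2000, seed=1):
    """Gibbs sampler for the zero-inflated Poisson model: with probability p
    an observation is a structural zero, otherwise y_i ~ Poisson(lam).
    Priors (illustrative): p ~ Beta(1,1), lam ~ Gamma(1,1)."""
    rng = random.Random(seed)
    n = len(y)
    p, lam = 0.5, 1.0
    samples = []
    for it in range(n_iter):
        # Latent indicators: z_i = 1 marks a structural zero
        # (only possible when y_i == 0).
        z = []
        for yi in y:
            if yi > 0:
                z.append(0)
            else:
                w = p / (p + (1 - p) * math.exp(-lam))
                z.append(1 if rng.random() < w else 0)
        nz = sum(z)
        # Conjugate full-conditional updates.
        p = rng.betavariate(1 + nz, 1 + n - nz)
        lam = rng.gammavariate(1 + sum(yi for yi, zi in zip(y, z) if zi == 0),
                               1.0 / (1 + n - nz))
        if it >= n_iter // 2:          # discard first half as burn-in
            samples.append((p, lam))
    return samples

# Toy zero-inflated counts: many zeros plus Poisson-like positive counts.
y = [0] * 40 + [3] * 30 + [2] * 20 + [4] * 10
post = zip_gibbs(y)
```

The IBF approach in the paper generates exact posterior samples without such iteration, which is why the authors use it as a benchmark for checking the Gibbs chain's convergence.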

Is it suitable to Use Rainfall Runoff Model with Observed Data for Climate Change Impact Assessment? (관측자료로 추정한 강우유출모형을 기후변화 영향평가에 그대로 활용하여도 되는가?)

  • Poudel, Niroj;Kim, Young-Oh;Kim, Cho-Rong
    • Proceedings of the Korea Water Resources Association Conference / 2011.05a / pp.252-252 / 2011
  • Rainfall-runoff models are calibrated and validated using the same data set, namely observations. Past climate change affects the present rainfall pattern and will also affect the future. To predict rainfall-runoff more precisely, we have to consider climate change patterns in the past, present, and future. In this study, climate change is represented as changes in mean precipitation and standard deviation under different patterns. Some river basins do not have records long enough for the analysis, so synthetic precipitation data must be generated from a proper distribution fitted to the observed data. In this study, the Kajiyama model is used to analyze runoff in the dry and wet periods separately. The mean and standard deviation are used to generate precipitation from the gamma distribution. Twenty hypothetical scenarios are considered to represent climate change conditions. The mean precipitation is changed by -20%, -10%, 0%, +10%, and +20% for the data generation while keeping the standard deviation constant, in the wet and dry periods respectively. Similarly, the standard deviation of precipitation is changed by -20%, -10%, 0%, +10%, and +20% while keeping the mean precipitation constant, for the wet and dry periods in turn. In the wet period, when the standard deviation varies, the mean NSE ratio fluctuates more than in the dry period. On the other hand, when the mean precipitation varies at constant standard deviation, the mean NSE ratio fluctuates to some extent more in the wet period and sometimes in the dry period.
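Generating precipitation from a gamma distribution given a target mean and standard deviation follows from the method of moments: shape k = (m/s)^2 and scale theta = s^2/m. A minimal sketch of one scenario; the base mean and standard deviation are illustrative, not the study's values:

```python
import random

def gamma_precip(mean, std, n, seed=0):
    """Draw n synthetic precipitation values from a gamma distribution
    matched to a target mean and standard deviation (method of moments)."""
    shape = (mean / std) ** 2        # k = (m / s)^2
    scale = std ** 2 / mean          # theta = s^2 / m
    rng = random.Random(seed)
    return [rng.gammavariate(shape, scale) for _ in range(n)]

# One hypothetical scenario: mean +10% with the standard deviation held
# constant (base values are illustrative, not from the paper).
base_mean, base_std = 120.0, 40.0
wet_series = gamma_precip(base_mean * 1.1, base_std, n=1000)
```

Repeating this for the five mean factors and five standard deviation factors in each of the wet and dry periods yields the twenty scenarios the study feeds into the Kajiyama model.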
