• Title/Abstract/Keywords: Big data Problem

574 search results, processing time 0.024 s

GEase-K: Linear and Nonlinear Autoencoder-based Recommender System with Side Information (GEase-K: 부가 정보를 활용한 선형 및 비선형 오토인코더 기반의 추천시스템)

  • Taebeom Lee;Seung-hak Lee;Min-jeong Ma;Yoonho Cho
    • Journal of Intelligence and Information Systems / Vol. 29, No. 3 / pp.167-183 / 2023
  • In the recent field of recommender systems, various studies have sought to model sparse data effectively. Among these, GLocal-K (Global and Local Kernels for Recommender Systems) combines global and local kernels to provide personalized recommendations that account for both global data patterns and individual user characteristics. However, because it relies on kernel tricks, GLocal-K performs poorly on highly sparse data, and because it uses no side information, it struggles to offer recommendations for new users or items. To address these limitations, we propose the GEase-K (Global and EASE kernels for Recommender Systems) model, which incorporates the EASE (Embarrassingly Shallow Autoencoders for Sparse Data) model and leverages side information. First, we substitute EASE for the local kernel in GLocal-K to improve recommendation performance on highly sparse data. EASE is an autoencoder with a simple linear structure that performs well on extremely sparse data through regularization and by learning item similarity. Second, we use side information to alleviate the cold-start problem, incorporating it during training through a conditional autoencoder structure to improve the model's understanding of user-item similarities. In conclusion, GEase-K is robust to highly sparse data and cold-start situations because it combines linear and nonlinear structures and uses side information. Experimental results show that GEase-K outperforms GLocal-K in RMSE and MAE on the highly sparse GoodReads and ModCloth datasets. Furthermore, in cold-start experiments that divide the GoodReads and ModCloth datasets into four groups, GEase-K outperforms GLocal-K.
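
The closed-form training step of a standalone EASE model, as commonly described in the literature, can be sketched as follows. This is a minimal illustration only, not the GEase-K model itself: the function name `ease_item_weights`, the toy matrix `X`, and the regularization value `lam` are assumptions made for the example.

```python
import numpy as np

def ease_item_weights(X, lam=1.0):
    """Closed-form EASE item-item weights for an interaction matrix X (users x items)."""
    gram = X.T @ X + lam * np.eye(X.shape[1])  # L2-regularized Gram matrix
    P = np.linalg.inv(gram)
    B = P / (-np.diag(P))                      # B[i, j] = -P[i, j] / P[j, j]
    np.fill_diagonal(B, 0.0)                   # an item may not predict itself
    return B

# Toy data (made up for illustration): 4 users x 3 items, 1 = observed interaction.
X = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 0]], dtype=float)

B = ease_item_weights(X, lam=0.5)
scores = X @ B  # predicted preference score for every user-item pair
```

Because the weights come from a single matrix inversion, there is no iterative training loop; the zero diagonal is what keeps the model from trivially copying each item onto itself.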

Development of a Post-Processor for Three-Dimensional Forging Analysis (3차원 단조해석용 후처리기 개발)

  • 정완진;최석우
    • Transactions of Materials Processing / Vol. 12, No. 6 / pp.542-549 / 2003
  • Three-dimensional forging analysis has become an indispensable tool for making the design process more reliable and more producible. In this study, a new post-processor was developed to make the investigation of three-dimensional forging analysis results more convenient and accurate. For post-processing of multi-stage forging simulations, an efficient data structure was proposed and implemented using STL. A new file architecture was developed to efficiently handle the large, successive data sets common in three-dimensional forging analysis. Since sectioning and flow tracing play an important role in investigating analysis results, we developed an algorithm suitable for 4-node and 10-node tetrahedral elements. This flow tracing algorithm can trace and reverse-trace flow across remeshing. The developed program shows good performance and functionality. In particular, large problems can be handled easily thanks to the proposed data structure and file architecture.

An Enhanced University Registration Model Using Distributed Database Schema

  • Maabreh, Khaled Saleh
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 7 / pp.3533-3549 / 2019
  • Big databases, built on established network technology, have become an emerging trend in the computing field. There is therefore a need for an optimal and effective data distribution approach to deal with this trend. This research presents a practical perspective on designing and implementing distributed database features. The proposed system establishes a satisfying, reliable, scalable, and standardized use of information. Furthermore, the proposed scheme reduces the vast and recurring effort of designing an individual system for each university, and it contributes effectively to solving the course equivalence problem. The empirical findings show that the distributed system outperforms the centralized system in average response time and average waiting time. Its throughput also exceeds that of the centralized system because of data distribution and replication. The analyzed data show that the centralized system thrashes when the workload exceeds 60%, while the distributed system begins to thrash only after an 81% workload.

A study of creative human judgment through the application of machine learning algorithms and feature selection algorithms

  • Kim, Yong Jun;Park, Jung Min
    • International journal of advanced smart convergence / Vol. 11, No. 2 / pp.38-43 / 2022
  • Defining and judging creative people is difficult because there is no systematic analysis method based on accurate standards or numerical values. In the previous study, "A study on the application of rule success cases through machine learning algorithm extraction", a case study was conducted to help verify or confirm psychological personality tests and aptitude tests. That study proposed a solution to a research problem in psychology using machine learning algorithms and the Cross Industry Standard Process for Data Mining (CRISP-DM). Building on it, this study proposes a solution that helps to judge creative people by applying feature selection algorithms. Accuracy was measured using seven feature selection algorithms: the feature groups chosen by each algorithm were classified with the support vector machine algorithm, and the best classification result was obtained.

Forecasting Energy Consumption of Steel Industry Using Regression Model (회귀 모델을 활용한 철강 기업의 에너지 소비 예측)

  • Sung-Ho KANG;Hyun-Ki KIM
    • Journal of Korea Artificial Intelligence Association / Vol. 1, No. 2 / pp.21-25 / 2023
  • The purpose of this study was to compare the performance of multiple regression models in predicting the energy consumption of the steel industry. Independent variables were selected by considering the correlations among attributes such as CO2 concentration, NSM, Week Status, Day of week, and Load Type, and preprocessing was performed to address multicollinearity. During preprocessing, we evaluated linear and nonlinear relationships between attributes through correlation analysis. In particular, we selected variables with high correlation and included only appropriate variables in the final model to prevent multicollinearity. Among the regression models trained, Boosted Decision Tree Regression showed the best predictive performance. By combining multiple decision trees, the ensemble learning in this model effectively learned complex patterns while preventing overfitting. Consequently, these predictive models are expected to provide important information for improving energy efficiency and management decision-making in the steel industry. In the future, we plan to improve model performance by collecting more data and extending the variables, and we will also consider applying the model with interactions with external factors taken into account.
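
One common way to limit multicollinearity through correlation analysis is to drop any feature that is almost perfectly correlated with a feature already kept. The sketch below illustrates that idea under stated assumptions; it is not confirmed to be the procedure used in the study, and the helper `drop_correlated`, the 0.9 threshold, the feature names, and the synthetic data are all hypothetical.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.9):
    """Greedily keep features whose |correlation| with every kept feature is below threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature correlation matrix
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]

# Synthetic data (made up): f1 is nearly a linear copy of f0, f2 is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, 2.0 * a + 0.01 * rng.normal(size=200), b])

kept = drop_correlated(X, ["f0", "f1", "f2"])
```

The greedy order matters: the first feature of a correlated pair is kept, so in practice the columns would be ordered by how useful each one is for predicting the target.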

Design of a High-Speed Data Packet Allocation Circuit for Network-on-Chip (NoC 용 고속 데이터 패킷 할당 회로 설계)

  • Kim, Jeonghyun;Lee, Jaesung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / Korean Institute of Information and Communication Sciences 2022 Fall Conference / pp.459-461 / 2022
  • One of the major differences between Network-on-Chip (NoC) and existing parallel processing systems based on an off-chip network is that data packet routing is performed using a centralized control scheme. In such an environment, the best-effort packet routing problem becomes a real-time assignment problem in which data packet arrival time and processing time are the costs. In this paper, the Hungarian algorithm, a representative algorithm for reducing the computational complexity of the linear algebraic formulation of the assignment problem, is implemented as a hardware accelerator. Logic synthesis using the TSMC 0.18 µm standard cell library shows that, compared to a circuit implementing the original operation sequence of the Hungarian algorithm, the circuit designed through case analysis of the cost distribution reduces area by about 16% and propagation delay by about 52%.
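
The assignment problem that the Hungarian algorithm solves can be stated in a few lines. The sketch below is a brute-force reference solver, not the Hungarian algorithm and not the hardware design from the paper: it simply tries every one-to-one assignment, which is only practical for very small inputs. The function name `best_assignment` and the cost values are made up for illustration.

```python
from itertools import permutations

def best_assignment(cost):
    """Return (assignment, total) minimizing the summed cost; assignment[i] is the column for row i."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best), sum(cost[i][best[i]] for i in range(n))

# Hypothetical cost matrix: cost[i][j] = cost of sending packet i to output j.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]

assignment, total = best_assignment(cost)
```

Brute force examines n! assignments, whereas the Hungarian algorithm reaches the same optimum in polynomial time, which is why it is the natural candidate for a real-time accelerator.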

Exploratory Study on Child Abuse Reduction Plan through the Big Data Convergence Analysis (빅데이터 융합분석을 통한 아동학대 감소방안에 관한 탐색적 연구)

  • Hwang, Jun-Soo;Lim, Jong-Yun;Gwon, Sun-young;Noh, Kyoo-Sung;Lee, Joo-Yeoun
    • Journal of Digital Convergence / Vol. 14, No. 10 / pp.95-105 / 2016
  • Recently, child abuse has become a major social issue. According to the national statistics data portal, the population under 19 years old is shrinking, yet the number of child abuse cases keeps increasing. However, the number of counseling cases following calls has stayed at a constant level without large fluctuations. Despite research and countermeasures prompted by the seriousness of the problem, child abuse continues to worsen. This study designed a research model of child abuse based on a preliminary study and suggested plans for reducing child abuse through big data analytics. Hypothesis testing showed that the characteristics of the abuser, the characteristics of the child, and employment type have a significant impact on child abuse. Based on this analysis, the study suggests ways to reduce child abuse, including educational and economic support measures.

Developing a Deep Learning-based Restaurant Recommender System Using Restaurant Categories and Online Consumer Review (레스토랑 카테고리와 온라인 소비자 리뷰를 이용한 딥러닝 기반 레스토랑 추천 시스템 개발)

  • Haeun Koo;Qinglong Li;Jaekyeong Kim
    • Information Systems Review / Vol. 25, No. 1 / pp.27-46 / 2023
  • Research on restaurant recommender systems has grown with the development of the food service industry and the increasing demand for restaurants. Existing restaurant recommendation studies extracted consumer preference information from quantitative information or sentiment analysis of online reviews, but they are limited in that they cannot reflect consumers' semantic preference information. In addition, few recommendation studies reflect the detailed attributes of restaurants. To solve these problems, this study proposes a model that learns the interaction between consumer preferences and restaurant attributes by applying deep learning techniques. First, a convolutional neural network was applied to online reviews to extract consumers' semantic preference information, and embedding techniques were applied to restaurant information to extract detailed restaurant attributes. Finally, the interaction between consumer preferences and restaurant attributes was learned through element-wise products to predict consumer preference ratings. Experiments on online reviews from Yelp.com confirmed that the proposed model shows excellent recommendation performance. By proposing a customized restaurant recommender system that uses big data from the restaurant industry, this study is expected to provide various academic and practical implications.

Method for the evaluation of Unit Load of Road-Section CO2 Emission Based on Individual Speed Data (개별 속도자료기반 도로구간 CO2 배출량 원단위 산정 방안)

  • Park, Chahgwha;Yoon, Byoungjo;Chang, Hyunho
    • Journal of the Society of Disaster Information / Vol. 13, No. 1 / pp.96-105 / 2017
  • Global warming, mainly caused by CO2, is one of the ongoing catastrophes facing the human race. A nationwide policy to reduce greenhouse gases (GHG) has been enforced, for which reliable estimation of GHG emissions is crucial. The unit load of road-section CO2 emission (URSCE) is a prerequisite for evaluating GHG emissions from road mobile sources, and it is computed mainly from vehicle speed data. Unfortunately, there are real-world limitations in collecting and analysing representative speed data for a nationwide road network. To tackle this problem, this study proposes a method for evaluating URSCE in a disaggregated way using big GPS vehicle data. The method yields a more accurate URSCE than the current aggregated-data-based approach and can be directly applied to nationwide road systems.
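
Why a disaggregated calculation can differ from an aggregated one is easy to see with a toy example. The sketch below assumes a purely hypothetical nonlinear emission-factor curve (`emission_factor`) and made-up individual speeds; neither comes from the study. It compares applying the curve to the average speed with averaging the curve over individual vehicle speeds.

```python
def emission_factor(speed_kmh):
    """HYPOTHETICAL U-shaped CO2 emission factor (g/km) as a function of speed (km/h)."""
    return 1500.0 / speed_kmh + 0.02 * speed_kmh ** 2 + 100.0

# Made-up individual vehicle speeds on one road section (km/h).
speeds = [10.0, 20.0, 60.0, 90.0]

# Aggregated approach: one emission factor evaluated at the mean speed.
aggregated = emission_factor(sum(speeds) / len(speeds))

# Disaggregated approach: one emission factor per vehicle, then averaged.
disaggregated = sum(emission_factor(v) for v in speeds) / len(speeds)

print(f"aggregated: {aggregated:.1f} g/km, disaggregated: {disaggregated:.1f} g/km")
```

With any nonlinear curve the two estimates diverge whenever speeds vary, so the mean speed alone cannot reproduce the per-vehicle result; for a convex curve like this one, the aggregated value underestimates it.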

Application of access control policy in ScienceDMZ-based network configuration (ScienceDMZ 기반의 네트워크 구성에서 접근제어정책 적용)

  • Kwon, Woo Chang;Lee, Jae Kwang;Kim, Ki Hyeon
    • Convergence Security Journal / Vol. 21, No. 2 / pp.3-10 / 2021
  • Data-driven scientific research is now the norm, and the transmission of large amounts of data greatly influences research productivity. To address this, a separate network structure for transmitting large-scale scientific big data is required. ScienceDMZ is a network structure designed to transmit such data. In such a network configuration, it is essential to establish an access control list (ACL) for users and resources. In this paper, we describe the R&E Together project and the network structure implemented in an actual ScienceDMZ network, and define the users and services to which access control policies are applied for safe data transmission and service provision. In addition, we present a method by which the network administrator can apply the access control policy to all network resources and users collectively, which made it possible to automate the application of the access control policy.