• Title/Summary/Keyword: Big data processing

Search Result 1,063, Processing Time 0.027 seconds

A Study on the Real-time user purchase pattern analysis User Product Recommendation System in E-Commerce Environment (E-commerce 환경에서 실시간 사용자 구매 패턴 분석을 통한 사용자 상품 추천 시스템 연구)

  • Beom Jung Kim;Ji Hye Huh;Hyeopgeon Lee;Young Woon Kim
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.05a
    • /
    • pp.413-414
    • /
    • 2023
  • IT 기술의 발달로 E-Commerce 분야는 실시간으로 발생되는 데이터양이 증가하고 있으며, 발생된 데이터는 개인화 맞춤 서비스에 많이 활용되고 있다. 그러나 신생 E-commerce 기업은 신규 상품 및 기존 상품에 대한 정보와 고객 간의 상호 작용 데이터가 존재하지 않아 콜드 스타트 문제가 발생한다. 이에 본 논문에서는 E-commerce 환경에서 실시간 사용자 구매패턴 분석을 통한 사용자 상품 추천 시스템을 제안한다. 제안하는 시스템은 Kafka와 Spark를 사용해 실시간 스트림을 데이터를 처리한다. 주요 기능은 ALS 알고리즘과, FP-Growth 알고리즘을 적용해 콜트 스타트 문제를 해결하며, 사용자 구매 패턴 분석을 통한 분석 결과에 맞는 상품을 사용자에게 추천한다.

Research on Implementing Digital Diary Minting Application By Using DALL-E2 and Blockchain (DALL-E2와 블록체인을 활용한 일기 작성 및 NFT 민팅 애플리케이션 구현)

  • Ha-Yoon Kim;Woo-Jung Park;You-Jeen Lee;So-Young Kim;Min-Jae Kim
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.11a
    • /
    • pp.888-889
    • /
    • 2023
  • 문서를 통해 기록을 남기는 과거의 기록 방식은 현대에 이르러 블로그, 인스타그램 등 다양한 SNS를 활용하는 방식으로 변모하고 있다. SNS의 발달과 대중화는 현대인에게 일반적인 일기 작성 포맷으로 자리 잡고 있다. 증가하는 수요와 디지털 기술 혁신에 대비되는 기존의 수동적인 일기 작성 애플리케이션을 대체하기 위해 본 논문은 DALL-E2와 블록체인을 활용한 일기 작성 및 민팅 애플리케이션 구현을 제안한다. 사용자는 제안하는 애플리케이션을 통해 음성인식, 광학 문자인식을 통한 다양한 일기 작성 방식을 제공받고, 완성된 일기 이미지를 디지털 자산으로서 보존할 수 있다.

Analyzing Operation Deviation in the Deasphalting Process Using Multivariate Statistics Analysis Method

  • Park, Joo-Hwang;Kim, Jong-Soo;Kim, Tai-Suk
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.7
    • /
    • pp.858-865
    • /
    • 2014
  • In the case of system like MES, various sensors collect the data in real time and save it as a big data to monitor the process. However, if there is big data mining in distributed computing system, whole processing process can be improved. In this paper, system to analyze the cause of operation deviation was built using the big data which has been collected from deasphalting process at the two different plants. By applying multivariate statistical analysis to the big data which has been collected through MES(Manufacturing Execution System), main cause of operation deviation was analyzed. We present the example of analyzing the operation deviation of deasphalting process using the big data which collected from MES by using multivariate statistics analysis method. As a result of regression analysis of the forward stepwise method, regression equation has been found which can explain 52% increase of performance compare to existing model. Through this suggested method, the existing petrochemical process can be replaced which is manual analysis method and has the risk of being subjective according to the tester. The new method can provide the objective analysis method based on numbers and statistic.

Design of Distributed Processing Framework Based on H-RTGL One-class Classifier for Big Data (빅데이터를 위한 H-RTGL 기반 단일 분류기 분산 처리 프레임워크 설계)

  • Kim, Do Gyun;Choi, Jin Young
    • Journal of Korean Society for Quality Management
    • /
    • v.48 no.4
    • /
    • pp.553-566
    • /
    • 2020
  • Purpose: The purpose of this study was to design a framework for generating one-class classification algorithm based on Hyper-Rectangle(H-RTGL) in a distributed environment connected by network. Methods: At first, we devised one-class classifier based on H-RTGL which can be performed by distributed computing nodes considering model and data parallelism. Then, we also designed facilitating components for execution of distributed processing. In the end, we validate both effectiveness and efficiency of the classifier obtained from the proposed framework by a numerical experiment using data set obtained from UCI machine learning repository. Results: We designed distributed processing framework capable of one-class classification based on H-RTGL in distributed environment consisting of physically separated computing nodes. It includes components for implementation of model and data parallelism, which enables distributed generation of classifier. From a numerical experiment, we could observe that there was no significant change of classification performance assessed by statistical test and elapsed time was reduced due to application of distributed processing in dataset with considerable size. Conclusion: Based on such result, we can conclude that application of distributed processing for generating classifier can preserve classification performance and it can improve the efficiency of classification algorithms. In addition, we suggested an idea for future research directions of this paper as well as limitation of our work.

The Method for Real-time Complex Event Detection of Unstructured Big data (비정형 빅데이터의 실시간 복합 이벤트 탐지를 위한 기법)

  • Lee, Jun Heui;Baek, Sung Ha;Lee, Soon Jo;Bae, Hae Young
    • Spatial Information Research
    • /
    • v.20 no.5
    • /
    • pp.99-109
    • /
    • 2012
  • Recently, due to the growth of social media and spread of smart-phone, the amount of data has considerably increased by full use of SNS (Social Network Service). According to it, the Big Data concept is come up and many researchers are seeking solutions to make the best use of big data. To maximize the creative value of the big data held by many companies, it is required to combine them with existing data. The physical and theoretical storage structures of data sources are so different that a system which can integrate and manage them is needed. In order to process big data, MapReduce is developed as a system which has advantages over processing data fast by distributed processing. However, it is difficult to construct and store a system for all key words. Due to the process of storage and search, it is to some extent difficult to do real-time processing. And it makes extra expenses to process complex event without structure of processing different data. In order to solve this problem, the existing Complex Event Processing System is supposed to be used. When it comes to complex event processing system, it gets data from different sources and combines them with each other to make it possible to do complex event processing that is useful for real-time processing specially in stream data. Nevertheless, unstructured data based on text of SNS and internet articles is managed as text type and there is a need to compare strings every time the query processing should be done. And it results in poor performance. Therefore, we try to make it possible to manage unstructured data and do query process fast in complex event processing system. And we extend the data complex function for giving theoretical schema of string. It is completed by changing the string key word into integer type with filtering which uses keyword set. In addition, by using the Complex Event Processing System and processing stream data at real-time of in-memory, we try to reduce the time of reading the query processing after it is stored in the disk.

Real-time Processing of Manufacturing Facility Data based on Big Data for Smart-Factory (스마트팩토리를 위한 빅데이터 기반 실시간 제조설비 데이터 처리)

  • Hwang, Seung-Yeon;Shin, Dong-Jin;Kwak, Kwang-Jin;Kim, Jeong-Joon;Park, Jeong-Min
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.19 no.5
    • /
    • pp.219-227
    • /
    • 2019
  • Manufacturing methods have been changed from labor-intensive methods to technological intensive methods centered on manufacturing facilities. As manufacturing facilities replace human labour, the importance of monitoring and managing manufacturing facilities is emphasized. In addition, Big Data technology has recently emerged as an important technology to discover new value from limited data. Therefore, changes in manufacturing industries have increased the need for smart factory that combines IoT, information and communication technologies, sensor data, and big data. In this paper, we present strategies for existing domestic manufacturing factory to becom big data based smart-factory through technologies for distributed storage and processing of manufacturing facility data in MongoDB in real time and visualization using R programming.

Data Processing Architecture for Cloud and Big Data Services in Terms of Cost Saving (비용절감 측면에서 클라우드, 빅데이터 서비스를 위한 대용량 데이터 처리 아키텍쳐)

  • Lee, Byoung-Yup;Park, Jae-Yeol;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association
    • /
    • v.15 no.5
    • /
    • pp.570-581
    • /
    • 2015
  • In recent years, many institutions predict that cloud services and big data will be popular IT trends in the near future. A number of leading IT vendors are focusing on practical solutions and services for cloud and big data. In addition, cloud has the advantage of unrestricted in selecting resources for business model based on a variety of internet-based technologies which is the reason that provisioning and virtualization technologies for active resource expansion has been attracting attention as a leading technology above all the other technologies. Big data took data prediction model to another level by providing the base for the analysis of unstructured data that could not have been analyzed in the past. Since what cloud services and big data have in common is the services and analysis based on mass amount of data, efficient operation and designing of mass data has become a critical issue from the early stage of development. Thus, in this paper, I would like to establish data processing architecture based on technological requirements of mass data for cloud and big data services. Particularly, I would like to introduce requirements that must be met in order for distributed file system to engage in cloud computing, and efficient compression technology requirements of mass data for big data and cloud computing in terms of cost-saving, as well as technological requirements of open-source-based system such as Hadoop eco system distributed file system and memory database that are available in cloud computing.

Spatial Big Data Query Processing System Supporting SQL-based Query Language in Hadoop (Hadoop에서 SQL 기반 질의언어를 지원하는 공간 빅데이터 질의처리 시스템)

  • Joo, In-Hak
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.10 no.1
    • /
    • pp.1-8
    • /
    • 2017
  • In this paper we present a spatial big data query processing system that can store spatial data in Hadoop and query the data with SQL-based query language. The system stores large-scale spatial data in HDFS-based storage system, and supports spatial queries expressed in SQL-based query language extended for spatial data processing. It supports standard spatial data types and functions defined in OGC simple feature model in the query language. This paper presents the development of core functions of the system including query language parsing, query validation, query planning, and connection with storage system. We compares the performance of the suggested system with an existing system, and our experiments show that the system shows about 58% performance improvement of query execution time over the existing system when executing region query for spatial data stored in Hadoop.

Parallelization of Genome Sequence Data Pre-Processing on Big Data and HPC Framework (빅데이터 및 고성능컴퓨팅 프레임워크를 활용한 유전체 데이터 전처리 과정의 병렬화)

  • Byun, Eun-Kyu;Kwak, Jae-Hyuck;Mun, Jihyeob
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.8 no.10
    • /
    • pp.231-238
    • /
    • 2019
  • Analyzing next-generation genome sequencing data in a conventional way using single server may take several tens of hours depending on the data size. However, in order to cope with emergency situations where the results need to be known within a few hours, it is required to improve the performance of a single genome analysis. In this paper, we propose a parallelized method for pre-processing genome sequence data which can reduce the analysis time by utilizing the big data technology and the highperformance computing cluster which is connected to the high-speed network and shares the parallel file system. For the reliability of analytical data, we have chosen a strategy to parallelize the existing analytical tools and algorithms to the new environment. Parallelized processing, data distribution, and parallel merging techniques have been developed and performance improvements have been confirmed through experiments.

An Automatic Urban Function District Division Method Based on Big Data Analysis of POI

  • Guo, Hao;Liu, Haiqing;Wang, Shengli;Zhang, Yu
    • Journal of Information Processing Systems
    • /
    • v.17 no.3
    • /
    • pp.645-657
    • /
    • 2021
  • Along with the rapid development of the economy, the urban scale has extended rapidly, leading to the formation of different types of urban function districts (UFDs), such as central business, residential and industrial districts. Recognizing the spatial distributions of these districts is of great significance to manage the evolving role of urban planning and further help in developing reliable urban planning programs. In this paper, we propose an automatic UFD division method based on big data analysis of point of interest (POI) data. Considering that the distribution of POI data is unbalanced in a geographic space, a dichotomy-based data retrieval method was used to improve the efficiency of the data crawling process. Further, a POI spatial feature analysis method based on the mean shift algorithm is proposed, where data points with similar attributive characteristics are clustered to form the function districts. The proposed method was thoroughly tested in an actual urban case scenario and the results show its superior performance. Further, the suitability of fit to practical situations reaches 88.4%, demonstrating a reasonable UFD division result.