• Title/Summary/Keyword: HADOOP

A Study on the Customized Food Menu Recommendation System Based on ICT and Big Data (ICT 및 빅데이터기반 맞춤형 음식메뉴 추천시스템 연구)

  • Ryoo, Hee-Soo;Lee, Man-ting
    • The Journal of the Korea institute of electronic communication sciences / v.16 no.2 / pp.339-346 / 2021
  • In this paper, we implemented an interface that provides a better food-ordering mechanism and lets global customers select recipe ingredient ratios in real time when placing customized food orders. Instead of a system that only lets users select and order menu items, the order screen arranges the menu selection to show the basic ratio of each recipe ingredient and uses a recipe graph to offer a customized ingredient composition ratio, so that appropriate food can be provided to global customers. Through this interaction, the food-menu ordering device lets users receive a customized service by adjusting the ratios of the various recipe ingredients.
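The ratio-adjustment interaction can be illustrated with a minimal Python sketch. It assumes, which the abstract does not state, that ingredient ratios are percentages summing to 100 and that changing one ingredient rescales the others proportionally; the function name `adjust_ratio` and the ingredient names are hypothetical.

```python
def adjust_ratio(ratios: dict[str, float], ingredient: str, new_share: float) -> dict[str, float]:
    """Set one ingredient's share (percent) and rescale the remaining
    ingredients proportionally so the shares still total 100."""
    if ingredient not in ratios:
        raise KeyError(ingredient)
    if not 0.0 <= new_share <= 100.0:
        raise ValueError("share must lie between 0 and 100")
    others = [name for name in ratios if name != ingredient]
    remaining = 100.0 - new_share
    other_total = sum(ratios[name] for name in others)
    adjusted = {}
    for name, share in ratios.items():
        if name == ingredient:
            adjusted[name] = new_share
        elif other_total > 0:
            # Keep the relative proportions among the other ingredients.
            adjusted[name] = share / other_total * remaining
        else:
            # All other shares were zero: split the remainder evenly.
            adjusted[name] = remaining / len(others)
    return adjusted


# Hypothetical basic ratios for one menu item.
base = {"rice": 50.0, "beef": 30.0, "vegetables": 20.0}
print(adjust_ratio(base, "beef", 40.0))
```

Proportional rescaling keeps the customer's other choices in the same relation to each other; a real interface might instead lock some ingredients, which this sketch does not model.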

Efficient K-Anonymization Implementation with Apache Spark

  • Kim, Tae-Su;Kim, Jong Wook
    • Journal of the Korea Society of Computer and Information / v.23 no.11 / pp.17-24 / 2018
  • Today, we are living in the era of data and information. With the advent of the Internet of Things (IoT), the popularity of social networking sites, and the development of mobile devices, large amounts of data are being produced in diverse areas. The collection of such data generated in various areas is called big data. As the importance of big data grows, there is a growing need to share big data containing information about individual entities. Because big data contains sensitive information about individuals, releasing it directly for public use may violate existing privacy requirements. Privacy-preserving data publishing (PPDP) has therefore been actively studied as a way to share big data containing personal information for public use while preserving individual privacy. K-anonymity, the most popular method in PPDP, transforms the records in a table so that at least k records share the same values for the given quasi-identifier attributes; each record is thus indistinguishable from the other records in its equivalence class. As big data keeps growing, there is increasing demand for methods that can anonymize vast amounts of data efficiently. In this paper, we therefore develop an efficient k-anonymity method using the Spark distributed framework. Experimental results show that the developed method achieves significant gains in processing time.
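The k-anonymity condition itself is simple to check: group records by their quasi-identifier values and verify that every group has at least k members. The sketch below is single-machine Python, not the authors' Spark implementation; the generalization rules (10-year age bands, masking the last two ZIP digits), the function names, and the sample records are hypothetical. On Spark, the grouping step would typically be expressed as a group-by and count over the quasi-identifier columns.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in
    at least k records (i.e., every equivalence class has size >= k)."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


def generalize(record, age_width=10, zip_digits=3):
    """Coarsen the quasi-identifiers: bucket age into a range and mask
    the trailing ZIP digits. The sensitive attribute is left unchanged."""
    lo = record["age"] // age_width * age_width
    out = dict(record)
    out["age"] = f"{lo}-{lo + age_width - 1}"
    out["zip"] = record["zip"][:zip_digits] + "*" * (len(record["zip"]) - zip_digits)
    return out


QI = ["age", "zip"]
records = [
    {"age": 23, "zip": "13053", "disease": "flu"},
    {"age": 27, "zip": "13068", "disease": "cold"},
    {"age": 35, "zip": "14850", "disease": "flu"},
    {"age": 38, "zip": "14853", "disease": "asthma"},
]

print(is_k_anonymous(records, QI, 2))                            # raw table
print(is_k_anonymous([generalize(r) for r in records], QI, 2))   # generalized
```

In the raw table every record is unique on (age, zip), so the check fails for k = 2; after generalization the four records fall into two classes of two, which satisfies k = 2 but not k = 3.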

Design of Search System Based on Lucene for Minimum Price Products (루씬 기반의 최저가 상품 검색 시스템 설계)

  • Kim, A-Yong;Jeong, Dae-Jin;Gye, Min-Suk;Kim, Chang-Su;Jung, Hoe-kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.10a / pp.603-605 / 2014
  • With the spread of the Internet and the increasing use of smart devices, consumers have shifted from physical stores to the online shopping market, changing users' consumption patterns and consumer culture. To attract consumers and expand their distribution channels over the web and mobile, open markets offer a wide variety of events, lowest-price policies, safe transactions, and so on. In this paper, we design a search system that collects and analyzes information on products sold in open markets and provides users with information on the lowest-priced products.
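A lowest-price product search of this kind rests on an inverted index, the data structure Lucene maintains. The toy in-memory index below only illustrates that idea: it does not use Lucene's API, and the class name `PriceIndex`, the market names, product titles, and prices are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PriceIndex:
    """Minimal inverted index: term -> set of listing ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.items = []

    def add(self, market, title, price):
        doc_id = len(self.items)
        self.items.append({"market": market, "title": title, "price": price})
        for term in tokenize(title):
            self.postings[term].add(doc_id)

    def cheapest(self, query, limit=3):
        """Listings containing every query term, cheapest first."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get avoids inserting empty postings for unknown terms.
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        ranked = sorted(hits, key=lambda i: self.items[i]["price"])
        return [self.items[i] for i in ranked[:limit]]


idx = PriceIndex()
idx.add("MarketA", "Wireless Mouse M100", 15900)
idx.add("MarketB", "wireless mouse m100", 14500)
idx.add("MarketC", "Wireless Keyboard K200", 29900)
for item in idx.cheapest("wireless mouse"):
    print(item["market"], item["price"])
```

Because the same product title appears in several open markets, matching on terms and then sorting by price surfaces the lowest offer first; a production system would add normalization of product names and relevance scoring.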

Performance Factor of Distributed Processing of Machine Learning using Spark (스파크를 이용한 머신러닝의 분산 처리 성능 요인)

  • Ryu, Woo-Seok
    • The Journal of the Korea institute of electronic communication sciences / v.16 no.1 / pp.19-24 / 2021
  • In this paper, we study the performance factors of machine learning in a distributed environment using Apache Spark and present an efficient distributed processing method through experiments. We first present the performance factors for machine learning on a distributed cluster, classified into cluster performance, data size, and Spark engine configuration. We then study the performance of regression analysis using Spark MLlib on a Hadoop cluster while varying the node and Spark executor configurations. The experiments confirmed that the effective number of executors depended on the number of data blocks, but that, depending on the cluster size, its maximum was limited by the number of cores and its minimum by the number of worker nodes.
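The reported trend can be written as a toy formula: parallelism follows the number of input blocks, is capped by the total core count, and does not fall below the number of worker nodes. This is our reading of the abstract, not a formula from the paper and not a Spark setting; the function names are hypothetical, and the 128 MB default matches HDFS's default `dfs.blocksize` in Hadoop 2 and later.

```python
import math


def hdfs_blocks(file_size_mb: float, block_size_mb: float = 128) -> int:
    """Number of HDFS blocks a file of the given size occupies."""
    if file_size_mb <= 0 or block_size_mb <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(file_size_mb / block_size_mb)


def effective_executors(num_blocks: int, num_workers: int, total_cores: int) -> int:
    """Hypothetical model of the observed trend: track the block count,
    capped above by total cores and bounded below by worker nodes."""
    if min(num_blocks, num_workers, total_cores) <= 0:
        raise ValueError("all arguments must be positive")
    return max(num_workers, min(num_blocks, total_cores))


# A 4-worker cluster with 32 cores in total, for three input sizes (MB).
for size_mb in (256, 1024, 10240):
    blocks = hdfs_blocks(size_mb)
    print(size_mb, blocks, effective_executors(blocks, 4, 32))
```

Under this model, small inputs leave cores idle because there are fewer blocks than cores, while very large inputs saturate the cores; both regimes match the qualitative finding in the abstract.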

Big IoT Healthcare Data Analytics Framework Based on Fog and Cloud Computing

  • Alshammari, Hamoud;El-Ghany, Sameh Abd;Shehab, Abdulaziz
    • Journal of Information Processing Systems / v.16 no.6 / pp.1238-1249 / 2020
  • Throughout the world, aging populations and doctor shortages have helped drive the increasing demand for smart healthcare systems. Recently, these systems have benefited from the evolution of the Internet of Things (IoT), big data, and machine learning. However, these advances generate large amounts of data, making healthcare data analysis a major issue. These data have several complex properties, such as high dimensionality, irregularity, and sparsity, which make efficient processing difficult. Big data analytics can address these challenges. In this paper, we propose an innovative analytic framework for big healthcare data collected either from IoT wearable devices or from archived patient medical images. The proposed method addresses the data heterogeneity problem efficiently by placing middleware between heterogeneous data sources and MapReduce Hadoop clusters. Furthermore, the proposed framework uses both fog computing and cloud platforms to handle the problems that arise in online and offline data processing, data storage, and data classification. Additionally, it is designed to keep patients' medical data robust and secure.

Crowd Psychological and Emotional Computing Based on PSMU Algorithm

  • Bei He
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.8 / pp.2119-2136 / 2024
  • The rapid progress of social media allows more people to express their feelings and opinions online. Much of the data on social media contains emotional information that can be used for psychological analysis and emotion computing. This research is based on a simplified psychological scale algorithm that integrates multiple theories, and it aims to analyze people's psychological emotions accurately. A comparative analysis of algorithm performance shows that the highest recall of the proposed algorithm is 95%, whereas the highest recall of the item response theory algorithm and of the social network analysis algorithm is 68% and 87%, respectively. The acceleration ratio and data volume of the algorithm were also analyzed: when 400,000 data items are processed on a Hadoop cluster with 8 nodes, the maximum acceleration ratio is 40%, and when the data volume is 8 GB, the maximum scale ratio with 8 nodes is 43%. Finally, we carried out an empirical analysis of the model that computes the population's psychological and emotional conditions, using the simplified psychological scale algorithm while taking multiple theories into account. We collected negative microblog comments and expressions about Japan's discharge of radioactive water and compared them with the trend derived by the model; the results were consistent. The model therefore achieves good results in classifying the emotions of microblog comments.

Research on Regional Smart Farm Data Linkage and Service Utilization (지역 스마트팜 데이터 연계 및 서비스 활용에 대한 연구)

  • Won-Goo Lee;Hyun Jung Koo;Cheol-Joo Chae
    • Journal of Practical Agriculture & Fisheries Research / v.26 no.2 / pp.14-24 / 2024
  • To enhance the usability of smart agriculture, methods for utilizing smart farm data are required. Therefore, this study proposes a scheme for utilizing regional smart farm data by linking it to services. The current status of domestic and foreign smart farm data collection and linkage services is analyzed. To collect and link regional smart farm data, necessary data collection, data cleaning, data storage structure and schema, and data storage and linkage systems are proposed. Based on the standards currently being implemented for regional smart farm internal data storage, a farm schema, environmental information schema, facility control information schema, and growth information schema are designed by extending the crop schema and crop main environmental factor information database schema. A data collection and management system structure based on the Hadoop Ecosystem is designed for data collection and management at regional smart farm data centers. Strategies are proposed for utilizing regional smart farm data to provide smart farm productivity improvement and revenue optimization services, image-based crop analysis services, and virtual reality-based smart farm simulation services.

Management Architecture With Multi-modal Ensemble AI Models for Worker Safety

  • Dongyeop Lee;Daesik Lim;Jongseok Park;Soojeong Woo;Youngho Moon;Aesol Jung
    • Safety and Health at Work / v.15 no.3 / pp.373-378 / 2024
  • Introduction: Following the site-specific safety management system of the Republic of Korea's electric power industry, this paper proposes a novel safety autonomous platform (SAP) architecture that can manage on-site safety automatically and precisely through ensemble artificial intelligence (AI) models. The ensemble AI model was trained on video data and workers' biometric data, and its estimation results are based on the workplace's standard operating procedures and safety rules. Methods: The ensemble AI model is designed and implemented on the Hadoop ecosystem with Kafka/NiFi, Spark/Hive, HUE, and ELK (Elasticsearch, Logstash, Kibana). Results: The functional evaluation shows that the main functions of the SAP architecture operated successfully. Discussion: The proposed model was confirmed to work well with safety mobility gateways to provide several safety applications.

Update Frequency Reducing Method of Spatio-Temporal Big Data based on MapReduce (MapReduce와 시공간 데이터를 이용한 빅 데이터 크기의 이동객체 갱신 횟수 감소 기법)

  • Choi, Youn-Gwon;Baek, Sung-Ha;Kim, Gyung-Bae;Bae, Hae-Young
    • Spatial Information Research / v.20 no.2 / pp.137-153 / 2012
  • Many indexing methods that reduce update cost have been proposed for managing massive numbers of moving objects, because indexes for moving objects must be updated periodically as the objects change their locations frequently. However, these indexing methods impose a load that exceeds system capacity when the number of moving objects increases dramatically. In this paper, we propose an update frequency reduction method that combines MapReduce with existing indexes. Using MapReduce, we group the update requests of each moving object. We decide whether to update by comparing the latest and the oldest data in each group, and we reduce the update frequency by applying only the latest data. So that data are not lost when an update is delayed and are still updated periodically, we store the data for a certain period in a hash table that keeps previous update data. A performance evaluation shows that the proposed method reduces the update frequency compared with methods that do not apply it.
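The grouping idea can be sketched as a map, shuffle, and reduce written in plain Python rather than on Hadoop. Several details are assumptions not stated in the abstract: each report is `(object_id, timestamp, x, y)`; the hash table holds each object's last reported entry; an object's latest report is compared against that stored entry, and objects with no entry are always applied. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def map_phase(updates):
    """Emit (object_id, (timestamp, x, y)) for each raw location update."""
    for obj_id, ts, x, y in updates:
        yield obj_id, (ts, x, y)


def shuffle(pairs):
    """Group reports by object id, as the MapReduce shuffle would."""
    grouped = defaultdict(list)
    for obj_id, report in pairs:
        grouped[obj_id].append(report)
    return grouped


def reduce_phase(grouped, history):
    """Keep only each object's latest report and decide whether the index
    needs a write. `history` plays the role of the hash table."""
    to_apply = {}
    for obj_id, reports in grouped.items():
        latest = max(reports)  # tuples compare by timestamp first
        previous = history.get(obj_id)
        # Assumption: skip the write if the position is unchanged since the
        # stored entry; objects never seen before are always inserted.
        if previous is None or latest[1:] != previous[1:]:
            to_apply[obj_id] = latest
        history[obj_id] = latest  # retained so reported data is not lost
    return to_apply


def run_batch(updates, history):
    return reduce_phase(shuffle(map_phase(updates)), history)


history = {}
batch1 = [("car1", 1, 0, 0), ("car1", 2, 1, 0), ("car1", 3, 2, 0),
          ("car2", 1, 5, 5), ("car2", 2, 5, 5), ("bus7", 4, 9, 9)]
print(run_batch(batch1, history))
batch2 = [("car1", 4, 2, 0), ("car1", 5, 2, 0), ("car2", 3, 6, 5)]
print(run_batch(batch2, history))
```

In the first batch, six raw updates collapse into three index writes; in the second, car1 reports twice without moving, so only car2 is written.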

A reviews on the social network analysis using R (R을 이용한 사회연결망 분석에 대한 고찰)

  • Choi, Kyoungho;Yoo, Jin Ah
    • Journal of the Korea Convergence Society / v.6 no.1 / pp.77-83 / 2015
  • Although social network analysis (SNA) has been used in various fields, especially the social sciences (e.g., politics, journalism, and public administration) as well as the natural sciences, few studies have introduced its analysis tools. Performing SNA requires collecting data that fit the purpose and deriving statistics and visualizations with an analysis tool, but studies that explain these steps systematically are still lacking. In this study, therefore, we introduce the analytic process from data input to interpretation with real data, using the free R program, to help researchers who plan to use SNA. The data are citations in domestic food-science journals, supplied by the citation index DB of Korean scientific journals. As a research methodology, SNA is a new paradigm that can replace existing research methods as well as complement statistical analysis. This study is therefore expected to contribute to the wider use of SNA.
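As an example of the statistics such an analysis derives, normalized degree centrality (a node's number of distinct neighbors divided by n - 1, where n is the number of nodes) can be computed from an edge list. The paper works in R; this Python sketch only illustrates the measure, treats citations as undirected ties, and uses hypothetical journal labels.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality of an undirected graph:
    distinct neighbors / (n - 1). Self-loops are ignored."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Hypothetical citation ties between four journals.
edges = [("J1", "J2"), ("J1", "J3"), ("J1", "J4"), ("J2", "J3")]
for journal, score in sorted(degree_centrality(edges).items()):
    print(journal, round(score, 3))
```

Here J1 is tied to every other journal and so reaches the maximum centrality of 1.0; a directed citation analysis would separate in-degree (being cited) from out-degree (citing).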