• Title/Summary/Keyword: Data Preprocessing

Implementation of Recipe Recommendation System Using Ingredients Combination Analysis based on Recipe Data (레시피 데이터 기반의 식재료 궁합 분석을 이용한 레시피 추천 시스템 구현)

  • Min, Seonghee;Oh, Yoosoo
    • Journal of Korea Multimedia Society / v.24 no.8 / pp.1114-1121 / 2021
  • In this paper, we implement a recipe recommendation system that uses ingredient combination analysis based on recipe data. The proposed system takes an image of a grocery purchase receipt as input and recommends ingredients and recipes to the user. It preprocesses the receipt images and extracts text from them using an OCR algorithm. Recipes are recommended based on ingredient combination data: the system collects recipe data to calculate the combinations for each ingredient and extracts the ingredients of the collected recipes as training data. It then learns vector representations of the ingredients with a natural language processing algorithm and recommends recipes based on ingredients with high similarity. To improve the accuracy of the results, the system can also recommend recipes using replaceable ingredients through preprocessing and postprocessing. For the evaluation, we created a random input dataset to evaluate the performance of the proposed recipe recommendation system and calculated the accuracy of each algorithm. Among the algorithms evaluated, Word2Vec achieved the highest accuracy.
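The abstract says the system calculates combinations for each ingredient from collected recipes. As a minimal sketch, assuming that "combination" can be approximated by how often two ingredients appear in the same recipe, the code below counts ingredient pairs and ranks the most frequent partners of a given ingredient. The function names and the co-occurrence scoring are hypothetical; the paper's system learns ingredient vectors (Word2Vec performed best), which this sketch does not reproduce.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List

def pair_counts(recipes: Iterable[Iterable[str]]) -> Counter:
    """Count how many recipes contain each unordered ingredient pair (hypothetical helper)."""
    counts: Counter = Counter()
    for ingredients in recipes:
        # Deduplicate and sort so ("egg", "rice") and ("rice", "egg") map to one key.
        for a, b in combinations(sorted(set(ingredients)), 2):
            counts[(a, b)] += 1
    return counts

def best_partners(counts: Counter, ingredient: str, k: int = 3) -> List[str]:
    """Return up to k ingredients that co-occur most often with `ingredient`."""
    scores: Counter = Counter()
    for (a, b), c in counts.items():
        if a == ingredient:
            scores[b] += c
        elif b == ingredient:
            scores[a] += c
    # Sort by descending count, then alphabetically to break ties deterministically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]

recipes = [["egg", "rice", "onion"], ["egg", "rice"], ["egg", "tomato"]]
print(best_partners(pair_counts(recipes), "egg", k=2))  # ['rice', 'onion']
```

Raw counts favor very common ingredients; a learned embedding or a normalized score such as PMI would reduce that bias.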

A Study for Snoring Detection Based Artificial Neural Network (신경망 기반의 코골이 검출 알고리즘 개발에 관한 연구)

  • Jang, Won-Kyu;Cho, Sung-Pil;Lee, Kyung-Joung
    • The Transactions of the Korean Institute of Electrical Engineers D / v.51 no.7 / pp.327-333 / 2002
  • In this study, we developed an algorithm that detects snoring automatically. It consists of a preprocessing part and a snoring detection part. The preprocessing part performs noise removal using spectral subtraction, segmentation, and computation of temporal and spectral features. The snoring detection part then decides whether each detected block is a snore using a BPNN (Back-Propagation Neural Network). The BPNN, which has one hidden layer and one output layer, was trained with data from 7 of the 18 subjects and tested with data from the remaining 11. The proposed algorithm showed a sensitivity of 90.41% and a positive predictive value of 84.95%.
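The preprocessing stage removes noise with spectral subtraction. The sketch below shows magnitude spectral subtraction under stated assumptions: the noise magnitude spectrum is averaged over noise-only frames, subtracted bin by bin from each frame's magnitude, negative results are clamped, and the noisy phase is kept. It uses a naive O(N²) DFT so that it needs only the standard library; real code would use an FFT. The function names and the `floor` parameter are hypothetical, not the paper's settings.

```python
import cmath
import math
from typing import List, Sequence

def dft(x: Sequence[complex]) -> List[complex]:
    """Naive discrete Fourier transform (O(N^2)); an FFT would be used in practice."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]

def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def noise_magnitude(noise_frames: Sequence[Sequence[float]]) -> List[float]:
    """Average magnitude spectrum over frames assumed to contain only background noise."""
    spectra = [[abs(c) for c in dft(frame)] for frame in noise_frames]
    return [sum(bins) / len(spectra) for bins in zip(*spectra)]

def spectral_subtract(frame: Sequence[float], noise_mag: Sequence[float],
                      floor: float = 0.0) -> List[float]:
    """Subtract the noise magnitude per bin, keep the noisy phase, return a time-domain frame."""
    cleaned = []
    for c, nm in zip(dft(frame), noise_mag):
        # Clamp so magnitudes never go negative; `floor` keeps a fraction of the original.
        mag = max(abs(c) - nm, floor * abs(c))
        cleaned.append(cmath.rect(mag, cmath.phase(c)))
    return [v.real for v in idft(cleaned)]
```

A small positive `floor` is a common way to limit the "musical noise" that plain clamping to zero can introduce.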

Tools for Echelle Spectrograph of NYSC 1m Telescope

  • Kang, Wonseok;Kim, Taewoo;Kim, Jeongeun;Shin, Yong Cheol;Yoo, Jihyun;Jeong, Shinu;Choi, Yoonho;Kwon, Sun-gill
    • The Bulletin of The Korean Astronomical Society / v.43 no.1 / pp.50.1-50.1 / 2018
  • We present the development of tools for the Echelle spectrograph of the NYSC 1-m telescope. The eShel spectrograph (Shelyak) has been operating at Deokheung Optical Astronomy Observatory since 2016. We carried out test observations in 2016 and completed the preprocessing and wavelength calibration of the spectroscopic data using IRAF. Based on the IRAF reduction process, PySpecW, a set of tools for spectroscopic data, was developed in 2017. PySpecW is optimized for the NYSC 1-m telescope and written in Python so that young users can use it easily on any OS. It consists of preprocessing, aperture tracing, aperture extraction, wavelength calibration, and dispersion correction of the extracted spectra.
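In CCD spectroscopy, the preprocessing step usually means bias subtraction and flat-field correction before aperture tracing. The sketch below assumes that convention; PySpecW's actual functions are not described in the abstract, so every name here is hypothetical. Frames are plain nested lists, master frames are built by per-pixel median combination, and the flat is normalized by its median level.

```python
from statistics import median
from typing import List, Sequence

Frame = List[List[float]]

def median_combine(frames: Sequence[Frame]) -> Frame:
    """Per-pixel median of equally sized 2-D frames (rejects outliers such as cosmic rays)."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(f[r][c] for f in frames) for c in range(cols)] for r in range(rows)]

def normalized_flat(flats: Sequence[Frame], master_bias: Frame) -> Frame:
    """Bias-subtract the combined flat and scale it so its median pixel equals 1."""
    flat = median_combine(flats)
    debiased = [[v - b for v, b in zip(frow, brow)] for frow, brow in zip(flat, master_bias)]
    level = median(v for row in debiased for v in row)
    return [[v / level for v in row] for row in debiased]

def reduce_frame(raw: Frame, master_bias: Frame, norm_flat: Frame) -> Frame:
    """Apply (raw - bias) / flat pixel by pixel."""
    return [
        [(v - b) / f for v, b, f in zip(rrow, brow, frow)]
        for rrow, brow, frow in zip(raw, master_bias, norm_flat)
    ]
```

For an echelle flat, a single median level ignores the blaze function and inter-order gaps; production pipelines normalize each order separately.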

R-to-R Extraction and Preprocessing Procedure for an Automated Diagnosis of Various Diseases from ECG Data

  • Timothy, Vincentius;Prihatmanto, Ary Setijadi;Rhee, Kyung-Hyune
    • Journal of Multimedia Information System / v.3 no.2 / pp.1-8 / 2016
  • In this paper, we propose a method to automatically diagnose various diseases from electrocardiograph (ECG) recordings. We extract R-to-R interval (RRI) signals from the ECG recordings and preprocess them to remove trends and ectopic beats and to keep the signal stationary. We then analyze the signal to extract its time-domain, frequency-domain, and nonlinear parameters. These parameters are unique to each disease and can serve as its statistical symptoms. Next, we perform feature selection to improve the performance of the diagnostic classifier and use the selected features to diagnose various diseases with machine learning. Finally, we measure the performance of the classifier to make sure that it does not misdiagnose the diseases. The first two steps, R-to-R extraction and preprocessing, have been implemented with satisfactory results.
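The R-to-R extraction and preprocessing steps can be sketched as follows, assuming R-peak times have already been detected. Ectopic beats are rejected with a simple rule assumed here (drop intervals more than 20% away from the median RR), and the trend is removed with a least-squares straight line; the paper does not state which ectopic filter or detrending method it uses, and all function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence

def rr_intervals(r_peak_times: Sequence[float]) -> List[float]:
    """Successive differences of R-peak times (seconds) give the RR interval series."""
    return [b - a for a, b in zip(r_peak_times, r_peak_times[1:])]

def remove_ectopic(rr: Sequence[float], tolerance: float = 0.2) -> List[float]:
    """Drop intervals deviating from the median RR by more than `tolerance` (a fraction)."""
    ref = median(rr)
    return [x for x in rr if abs(x - ref) <= tolerance * ref]

def detrend_linear(y: Sequence[float]) -> List[float]:
    """Subtract the least-squares straight line from the series (needs at least 2 points)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sxx
    intercept = y_mean - slope * t_mean
    return [v - (intercept + slope * t) for t, v in enumerate(y)]
```

Dropping intervals shortens the series; HRV pipelines that need uniform sampling often interpolate over removed beats instead.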

A study on the increase of user gesture recognition rate using data preprocessing (데이터 전처리를 통한 사용자 제스처 인식률 증가 방안)

  • Kim, Jun Heon;Song, Byung Hoo;Shin, Dong Ryoul
    • Proceedings of the Korean Society of Computer Information Conference / 2017.07a / pp.13-16 / 2017
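  • Gesture recognition is actively studied in the fields of HCI (Human-Computer Interaction) and HRI (Human-Robot Interaction), and accurately identifying a user's gesture by extracting features from gesture data and classifying them has become a key challenge. This paper describes a method for analyzing users' hand-gesture data measured with an EMG (Electromyography) sensor. To remove noise from the collected data and maximize its features, the data were preprocessed by converting them into continuous data and were then classified with a machine learning algorithm. The performance on the raw data and on the preprocessed data was compared using a decision-tree algorithm.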
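The abstract does not specify how noise is removed or how the data are converted into continuous data. A common EMG preprocessing chain, assumed here, removes the sensor's DC offset and computes a moving RMS envelope, which turns a spiky raw signal into a smooth continuous feature that a decision tree can split on. The function name and window length are hypothetical.

```python
import math
from typing import List, Sequence

def emg_envelope(samples: Sequence[float], window: int = 50) -> List[float]:
    """Moving RMS envelope of a raw EMG signal (hypothetical preprocessing step)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    offset = sum(samples) / len(samples)  # DC offset of the sensor
    centered = [s - offset for s in samples]
    envelope = []
    for i in range(len(centered)):
        # Trailing window; it is shorter at the start of the recording.
        segment = centered[max(0, i - window + 1): i + 1]
        # Squaring rectifies the signal; the mean and square root give the RMS amplitude.
        envelope.append(math.sqrt(sum(v * v for v in segment) / len(segment)))
    return envelope
```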

The research of preprocessing technique of Data Compaction customized to network packet data (네트워크 패킷 데이터 마이닝을 위한 데이터 압축 전처리 기법에 관한 연구)

  • Na, Sang-Hyuck;Lee, Won-Suk
    • Korea Society of IT Services: Conference Proceedings (한국IT서비스학회:학술대회논문집) / 2009.05a / pp.341-344 / 2009
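  • Countless packets pass through network routers and switches. When 20 computers are connected to a network, the average daily packet transfer volume reaches about 400 GB. Analyzing this packet data requires large-scale disk storage for the collected data, as well as periodic backups. The raw collected data contain not only the information users want but also scattered unnecessary information. Therefore, rather than storing the collected data in raw form, it is more effective to process the data and keep summarized information so that the desired information and knowledge are preserved and easily identifiable. As the volume of packet data crossing networks worldwide grows beyond measure and Internet penetration rises, the need to analyze information about Internet users and consumers is becoming more prominent. In this paper, we propose a data preprocessing technique suited to packet data collected from networks.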
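The abstract does not describe the proposed compaction technique itself. To illustrate the general idea of keeping summarized information instead of raw packets, the sketch below aggregates packets into per-flow records keyed by the 5-tuple (source and destination address, source and destination port, protocol), keeping only packet counts, byte counts, and first/last timestamps. This flow aggregation is a swapped-in, widely used technique similar in spirit to NetFlow records, not the authors' method; the `Packet` fields and function names are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

FlowKey = Tuple[str, str, int, int, str]

@dataclass(frozen=True)
class Packet:
    timestamp: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    length: int  # bytes on the wire

@dataclass
class FlowRecord:
    packets: int
    byte_count: int
    first_seen: float
    last_seen: float

def summarize(packets: Iterable[Packet]) -> Dict[FlowKey, FlowRecord]:
    """Collapse individual packets into one summary record per 5-tuple flow."""
    flows: Dict[FlowKey, FlowRecord] = {}
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        rec = flows.get(key)
        if rec is None:
            flows[key] = FlowRecord(1, p.length, p.timestamp, p.timestamp)
        else:
            rec.packets += 1
            rec.byte_count += p.length
            rec.first_seen = min(rec.first_seen, p.timestamp)
            rec.last_seen = max(rec.last_seen, p.timestamp)
    return flows
```

Storage then grows with the number of distinct flows rather than the number of packets, at the cost of discarding payloads and per-packet timing.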

A Study of the Use of Step by Preprocessing and Dynamic Programming for the Exact Depth Map (정확한 깊이 맵을 위한 전처리 과정과 다이나믹 프로그래밍에 관한 연구)

  • Kim, Young-Seop;Song, Eung-Yeol
    • Journal of the Semiconductor & Display Technology / v.9 no.3 / pp.65-69 / 2010
  • A stereoscopic vision system obtains the depth of a target object from stereo images. This paper presents an efficient disparity matching method using the Nagao filter, octree color quantization, and a dynamic programming algorithm. We describe how to perform color quantization on full-color RGB images using an octree data structure, which has the advantage of greatly reducing the amount of data. We also propose a preprocessing step for stereo matching based on the Nagao filter that uses color information. Using the Nagao filter, we obtained an effective depth map, and using octree color quantization, we reduced the computation time.
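The dynamic programming step can be illustrated with the classic scanline formulation, assumed here because the abstract does not give the paper's cost function: each pixel of a rectified left scanline is either matched to a right pixel (cost = absolute intensity difference) or marked occluded (a fixed penalty), and backtracking through the cost table yields a disparity per left pixel. The function name and the default `occlusion_cost` are hypothetical; in the paper, the Nagao-filtered and quantized colors would feed this kind of matcher.

```python
from typing import List, Optional, Sequence

def dp_scanline_disparity(left: Sequence[float], right: Sequence[float],
                          occlusion_cost: float = 20.0) -> List[Optional[int]]:
    """Match one rectified scanline pair by DP; None marks an occluded left pixel."""
    n, m = len(left), len(right)
    # cost[i][j]: minimal cost of aligning left[:i] with right[:j].
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * occlusion_cost
    for j in range(1, m + 1):
        cost[0][j] = j * occlusion_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = cost[i - 1][j - 1] + abs(left[i - 1] - right[j - 1])
            cost[i][j] = min(match,
                             cost[i - 1][j] + occlusion_cost,  # left pixel unmatched
                             cost[i][j - 1] + occlusion_cost)  # right pixel unmatched
    # Backtrack from the full alignment to recover which pixels were matched.
    disparity: List[Optional[int]] = [None] * n
    i, j = n, m
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + abs(left[i - 1] - right[j - 1]):
            disparity[i - 1] = (i - 1) - (j - 1)  # d = x_left - x_right
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + occlusion_cost:
            i -= 1
        else:
            j -= 1
    return disparity
```

Each scanline is solved independently in O(n·m), which is why preprocessing that reduces color detail can noticeably cut the total matching time.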

Data preprocessing for efficient machine learning (효율적인 기계학습을 위한 데이터 전처리)

  • Kim, Dong-Hyun;Yoo, Seung-Eon;Lee, Byung-Jun;Kim, Kyung-Tae;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference / 2019.01a / pp.49-50 / 2019
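  • Data-driven machine learning is sensitive to various conditions, such as the amount of data, the learning model, and the characteristics of the data, so it requires a data preprocessing step to be more efficient. Data preprocessing covers methods such as feature selection, noise removal, dimension reduction, and clustering that make machine learning more efficient. This paper therefore describes the types of data preprocessing techniques for more efficient machine learning in various environments, along with a brief outline of their characteristics.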
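To make one of the listed steps concrete, here is a minimal sketch that drops constant (zero-variance) features, a simple form of feature selection, and then z-score standardizes the remaining columns. It is an illustration chosen for this listing, not a technique taken from the paper; the function name and the `min_std` threshold are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

def select_and_standardize(rows: Sequence[Sequence[float]],
                           min_std: float = 1e-12) -> Tuple[List[int], List[List[float]]]:
    """Drop near-constant columns, then scale each kept column to mean 0 and std 1."""
    columns = list(zip(*rows))
    # A column whose population std is (almost) zero carries no information for a model.
    kept = [i for i, col in enumerate(columns) if pstdev(col) > min_std]
    stats = {i: (mean(columns[i]), pstdev(columns[i])) for i in kept}
    scaled = [[(row[i] - stats[i][0]) / stats[i][1] for i in kept] for row in rows]
    return kept, scaled
```

In practice the mean and standard deviation should be computed on the training split only and reused for validation and test data.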

Image Classification Model using web crawling and transfer learning (웹 크롤링과 전이학습을 활용한 이미지 분류 모델)

  • Lee, JuHyeok;Kim, Mi Hui
    • Journal of IKEEE / v.26 no.4 / pp.639-646 / 2022
  • In this paper, to address the problem of obtaining large datasets, we collect images through web crawling and build datasets for image classification models through a data preprocessing process. We also propose a lightweight image classification model that incorporates transfer learning so that images can be classified automatically as category values are added, and that reduces training time while achieving high accuracy.
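The abstract does not detail the dataset-building step that follows web crawling. A common minimal version, assumed here, removes exact duplicate downloads by hashing the file bytes and then makes a reproducible train/validation split per category. The function names, the SHA-256 choice, and the split ratio are hypothetical; image decoding and resizing are omitted so the sketch needs only the standard library.

```python
import hashlib
import random
from typing import Dict, List, Sequence, Tuple

Item = Tuple[str, bytes]  # (category label, raw image bytes)

def dedupe(images: Sequence[Item]) -> List[Item]:
    """Keep the first occurrence of each distinct image payload."""
    seen = set()
    unique: List[Item] = []
    for label, data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((label, data))
    return unique

def split_by_label(images: Sequence[Item], val_ratio: float = 0.2,
                   seed: int = 0) -> Tuple[List[Item], List[Item]]:
    """Shuffle each category with a fixed seed and hold out `val_ratio` of it for validation."""
    by_label: Dict[str, List[bytes]] = {}
    for label, data in images:
        by_label.setdefault(label, []).append(data)
    rng = random.Random(seed)
    train: List[Item] = []
    val: List[Item] = []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_val = int(len(items) * val_ratio)
        val += [(label, d) for d in items[:n_val]]
        train += [(label, d) for d in items[n_val:]]
    return train, val
```

Byte hashing only catches exact copies; resized or re-encoded duplicates from different sites would need a perceptual hash.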

Identifying research trends in the emergency medical technician field using topic modeling (토픽모델링을 활용한 응급구조사 관련 연구동향)

  • Lee, Jung Eun;Kim, Moo-Hyun
    • The Korean Journal of Emergency Medical Services / v.26 no.2 / pp.19-35 / 2022
  • Purpose: This study aimed to identify research topics in the emergency medical technician (EMT) field and to examine research trends. Methods: We collected 261 research papers published between January 2000 and May 2022 and analyzed EMT research topics and trends using topic modeling. The study applied a text mining approach consisting of data collection, keyword preprocessing, and analysis. Keyword preprocessing and data analysis were performed with RStudio Version 4.0.0. Results: Keywords were derived through topic modeling analysis, and eight topics were ultimately identified: patient treatment, various roles, performance of duties, cardiopulmonary resuscitation, triage systems, job stress, disaster management, and education programs. Conclusion: Based on these results, research is needed on developing and applying education programs that can effectively increase the emergency care capabilities of EMTs.
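The study performed keyword preprocessing in R before topic modeling. For readers working in Python, a minimal sketch of a typical keyword-preprocessing step follows: lowercasing, tokenizing, removing stopwords and very short tokens, and building per-document term counts that a topic model can consume. The stopword list, token pattern, and function names are assumptions, not the study's settings, and the Latin-letter regex would not tokenize Korean text.

```python
import re
from collections import Counter
from typing import Iterable, List

# Hypothetical stopword list; real projects use a curated, domain-specific one.
STOPWORDS = {"the", "of", "and", "in", "for", "a", "an", "on", "to", "with"}

def preprocess(text: str, min_len: int = 2) -> List[str]:
    """Lowercase, keep alphabetic tokens, and drop stopwords and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) >= min_len]

def document_term_counts(docs: Iterable[str]) -> List[Counter]:
    """One bag-of-words Counter per document, the usual input to LDA-style topic models."""
    return [Counter(preprocess(d)) for d in docs]
```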