• Title/Summary/Keyword: Preprocessed data


Development of a Personalized Music Recommendation System Using MBTI Personality Types and KNN Algorithm

  • Chun-Ok Jang
    • International Journal of Advanced Culture Technology / v.12 no.3 / pp.427-433 / 2024
  • This study aims to develop a personalized music digital therapeutic based on MBTI personality types and apply it to depression treatment. In the data collection stage, participants' MBTI personality types and music preferences were surveyed to build a database, which was then preprocessed as input data for the KNN model. The KNN model calculates the distance between personality types using Euclidean distance and recommends music suitable for the user's MBTI type based on the nearest K neighbors' data. The developed system was tested with new participants, and the system and algorithm were improved based on user feedback. In the final validation stage, the system's effectiveness in alleviating depression was evaluated. The results showed that the MBTI personality type-based music recommendation system provides a personalized music therapy experience, positively impacting emotional stability and stress reduction. This study suggests the potential of nonpharmacological treatments and demonstrates that a personalized treatment experience can offer more effective and safer methods for treating depression.
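The abstract does not say how MBTI types are turned into coordinates for the Euclidean distance. A minimal sketch, assuming each of the four dichotomies becomes one 0/1 coordinate and each survey row pairs a respondent's type with a single preferred genre (both hypothetical simplifications), could look like this. `encode_mbti`, `recommend`, and the genre labels are illustrative names, not the authors' code.

```python
import math
from collections import Counter

# Hypothetical encoding: one 0/1 coordinate per MBTI dichotomy (first letter -> 1.0).
AXES = (("E", "I"), ("S", "N"), ("T", "F"), ("J", "P"))


def encode_mbti(mbti):
    """Map a four-letter type such as 'INFP' to a 4-dimensional 0/1 vector."""
    mbti = mbti.upper()
    if len(mbti) != 4:
        raise ValueError(f"expected 4 letters, got {mbti!r}")
    vec = []
    for letter, (first, second) in zip(mbti, AXES):
        if letter == first:
            vec.append(1.0)
        elif letter == second:
            vec.append(0.0)
        else:
            raise ValueError(f"invalid letter {letter!r} in {mbti!r}")
    return vec


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(user_type, survey, k=3):
    """Return the genre preferred most often by the k respondents nearest to user_type.

    survey is a list of (mbti_type, preferred_genre) pairs (hypothetical format).
    Ties in distance keep survey order; ties in votes go to the genre seen first.
    """
    target = encode_mbti(user_type)
    ranked = sorted(survey, key=lambda row: euclidean(target, encode_mbti(row[0])))
    votes = Counter(genre for _, genre in ranked[:k])
    return votes.most_common(1)[0][0]
```

With a pure 0/1 encoding, the Euclidean distance reduces to the square root of the number of differing letters, so many respondents tie; a real system would probably add richer preference features to separate them.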

Big Data Smoothing and Outlier Removal for Patent Big Data Analysis

  • Choi, JunHyeog;Jun, Sunghae
    • Journal of the Korea Society of Computer and Information / v.21 no.8 / pp.77-84 / 2016
  • In general statistical analysis, a normality assumption is required; if it is not satisfied, good results from statistical data analysis cannot be expected. Most statistical methods for handling outliers and noise also require this assumption. However, big data often fails to satisfy it because of its large volume and heterogeneity. We therefore propose a methodology based on box plots and data smoothing for controlling outliers and noise in big data analysis; the proposed methodology does not depend on the normality assumption. We select patent documents as the target domain of big data, because patent big data analysis is an important issue in the management of technology, and we analyze patent documents using big data learning methods for technology analysis. Patent data collected from patent databases around the world are preprocessed and analyzed by text mining and statistics. However, most research on patent big data analysis has not considered the outlier and noise problem, which decreases prediction accuracy and increases the variance of parameter estimates. In this paper, we check for the existence of outliers and noise in patent big data, using box plots and smoothing visualization to determine whether outliers are present. We use patent documents related to three-dimensional printing technology to illustrate how the proposed methodology can detect noise in the retrieved patent big data.
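The abstract names box plots and smoothing but not their exact rules. A minimal sketch, assuming Tukey's usual 1.5 × IQR box-plot fences and a centered moving average as the smoothing step (both are common choices, not necessarily the authors'), shows why neither step needs a normality assumption: both rely only on order statistics and local averages.

```python
import statistics


def boxplot_fences(values, whisker=1.5):
    """Return Tukey's (lower, upper) box-plot fences: Q1 - w*IQR and Q3 + w*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr


def flag_outliers(values, whisker=1.5):
    """Indices of values lying outside the box-plot fences."""
    low, high = boxplot_fences(values, whisker)
    return [i for i, v in enumerate(values) if v < low or v > high]


def moving_average(values, window=3):
    """Centered moving average; the window is truncated at the series ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed
```

For example, in a hypothetical series of yearly patent counts `[3, 4, 5, 4, 6, 5, 40, 4, 5]`, the fences are 2.5 and 6.5, so only the value 40 (index 6) is flagged.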

Standard-based Integration of Heterogeneous Large-scale DNA Microarray Data for Improving Reusability

  • Jung, Yong;Seo, Hwa-Jeong;Park, Yu-Rang;Kim, Ji-Hun;Bien, Sang Jay;Kim, Ju-Han
    • Genomics & Informatics / v.9 no.1 / pp.19-27 / 2011
  • The Gene Expression Omnibus (GEO) holds the largest collection of gene-expression microarray data, which has grown exponentially. Microarray data in GEO have been generated in many different formats and often lack standardized annotation and documentation, so it is hard to know whether, and in what way, preprocessing has been applied to a dataset. Standard-based integration of heterogeneous data formats and metadata is necessary for comprehensive data query, analysis and mining. We attempted to integrate the heterogeneous microarray data in GEO based on the Minimum Information About a Microarray Experiment (MIAME) standard. We unified the data fields of the GEO Data table and mapped the attributes of GEO metadata onto MIAME elements. We also distinguished non-preprocessed raw datasets from processed ones using a two-step classification method. Most of the procedures were developed as semi-automated algorithms with some degree of text mining. We localized 2,967 Platforms, 4,867 Series and 103,590 Samples covering 279 organisms, integrated them into a standard-based relational schema, and developed a comprehensive query interface to extract them. Our tool, GEOQuest, is available at http://www.snubi.org/software/GEOQuest/.

Three Dimensional Last Data Generation System Design Utilizing SFFD and LFFD (LFFD 및 SFFD를 이용한 3차원 라스트 데이터 생성시스템 개발)

  • Kim, Si-Kyung;Park, In-Duck
    • Journal of Institute of Control, Robotics and Systems / v.12 no.2 / pp.113-118 / 2006
  • This paper presents a new last design approach based on Limb line FFD (LFFD) and Scale factor FFD (SFFD). The proposed last design method uses dynamic trimmed parametric patches for the measured 3D foot data and 3D last data. Furthermore, the proposed last data generation system uses cross-sectional data extracted from the measured 3D foot data. First, the LFFD last design rule is constructed on the FFD lattice based on an analysis of the foot last shape. Second, SFFD is constructed on the new LFFD lattice based on scale factor deformation. The scale factor is constructed from the boundary edges of the polygonized patch and the boundary edge of the polygon object given by the cross-sectional last data. The two boundary curves are assumed to have been preprocessed so that they run in the same direction and together form the SF (scale factor). In addition, the control points of the FFD lattice are derived from a finite set of 3D foot data using cross-sectional data interpolation methods.
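Both LFFD and SFFD are built on top of a free-form deformation (FFD) lattice, but the abstract does not give the limb-line or scale-factor constructions. As a hedged sketch of the shared FFD step only, the classic Sederberg and Parry trivariate Bernstein formulation maps local coordinates $(s,t,u)$ to $X(s,t,u)=\sum_i\sum_j\sum_k B_i^l(s)\,B_j^m(t)\,B_k^n(u)\,P_{ijk}$. The per-axis `scale_lattice` edit below is a hypothetical stand-in for a scale-factor deformation, not the paper's SFFD.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein basis polynomial B_i^n(t)."""
    return comb(n, i) * t ** i * (1 - t) ** (n - i)


def uniform_lattice(l, m, n, size=(1.0, 1.0, 1.0)):
    """Undeformed (l+1) x (m+1) x (n+1) control lattice spanning a box at the origin."""
    return [[[(size[0] * i / l, size[1] * j / m, size[2] * k / n)
              for k in range(n + 1)]
             for j in range(m + 1)]
            for i in range(l + 1)]


def ffd_point(stu, lattice):
    """Map local coordinates (s, t, u) in [0, 1]^3 through a trivariate Bernstein FFD."""
    s, t, u = stu
    l, m, n = len(lattice) - 1, len(lattice[0]) - 1, len(lattice[0][0]) - 1
    x = y = z = 0.0
    for i in range(l + 1):
        bi = bernstein(l, i, s)
        for j in range(m + 1):
            bij = bi * bernstein(m, j, t)
            for k in range(n + 1):
                w = bij * bernstein(n, k, u)
                px, py, pz = lattice[i][j][k]
                x += w * px
                y += w * py
                z += w * pz
    return (x, y, z)


def scale_lattice(lattice, factors):
    """Hypothetical scale-factor edit: scale every control point along each axis."""
    fx, fy, fz = factors
    return [[[(p[0] * fx, p[1] * fy, p[2] * fz) for p in row] for row in plane]
            for plane in lattice]
```

Because Bernstein polynomials reproduce linear functions, an undeformed uniform lattice maps every point to itself; moving control points (here, scaling them by 1.1 along x) deforms embedded foot or last geometry smoothly.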

A Container Orchestration System for Process Workloads

  • Jong-Sub Lee;Seok-Jae Moon
    • International Journal of Internet, Broadcasting and Communication / v.15 no.4 / pp.270-278 / 2023
  • We propose a container orchestration system for process workloads that combines big data and machine learning technologies to integrate enterprise process-centric workloads. The proposed system analyzes big data generated by industrial automation to identify hidden patterns and build a machine learning prediction model. For each machine learning case, training data is loaded into a data store and preprocessed for model training. Next, the training data is used to select and fit an appropriate model, which is then evaluated on test data. This step, called model construction, can be performed in a deployment framework. Additionally, a visual hierarchy is constructed to display prediction results and facilitate big data analysis. To implement parallel computation of PCA in the proposed system, the evaluation built the required big data cluster from multiple virtual machines. The proposed system is modeled as layers of individual components that can be connected together; the advantage of this design is that components can be added, replaced, or reused without affecting the rest of the system.
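The abstract does not describe how PCA is split across the virtual machines. One common decomposition, sketched here under that assumption, has each worker compute partial sums (row count, column sums, and sums of outer products) over its chunk; the partials are merged into a covariance matrix, and the leading principal component is found by power iteration. Threads stand in for cluster nodes purely for illustration (in CPython they do not give true CPU parallelism), and all function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Per-worker partial sums: (row count, column sums, sums of outer products)."""
    d = len(chunk[0])
    s = [0.0] * d
    ss = [[0.0] * d for _ in range(d)]
    for row in chunk:
        for a in range(d):
            s[a] += row[a]
            for b in range(d):
                ss[a][b] += row[a] * row[b]
    return len(chunk), s, ss


def merge_stats(parts):
    """Combine partial sums into the mean vector and population covariance matrix."""
    d = len(parts[0][1])
    n, s, ss = 0, [0.0] * d, [[0.0] * d for _ in range(d)]
    for cnt, ps, pss in parts:
        n += cnt
        for a in range(d):
            s[a] += ps[a]
            for b in range(d):
                ss[a][b] += pss[a][b]
    mean = [v / n for v in s]
    cov = [[ss[a][b] / n - mean[a] * mean[b] for b in range(d)] for a in range(d)]
    return mean, cov


def first_component(cov, iters=200):
    """Leading eigenvector by power iteration (fails if the start is orthogonal to it)."""
    v = [1.0] * len(cov)
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v
        v = [x / norm for x in w]
    return v


def parallel_pca_first_component(data, n_chunks=4):
    """Split rows into chunks, reduce partial statistics in parallel, then run PCA."""
    size = math.ceil(len(data) / n_chunks)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(partial_stats, chunks))
    mean, cov = merge_stats(parts)
    return mean, first_component(cov)
```

The sum-of-squares covariance formula is simple to merge but can lose precision on large, uncentered data; a production version would likely use a numerically stabler pairwise update.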

Improving Efficiency of Food Hygiene Surveillance System by Using Machine Learning-Based Approaches (기계학습을 이용한 식품위생점검 체계의 효율성 개선 연구)

  • Cho, Sanggoo;Cho, Seung Yong
    • The Journal of Bigdata / v.5 no.2 / pp.53-67 / 2020
  • This study employs a supervised learning prediction model to detect nonconformity in processed-food manufacturing and processing businesses in advance. The study followed the standard machine learning procedure: definition of the objective function, data preprocessing and feature engineering, and model selection and evaluation. The dependent variable was the number of detections in supervisory inspections over the five years from 2014 to 2018, and the objective function was to maximize the probability of detecting nonconforming companies. The data were preprocessed to reflect not only basic attributes such as revenue, years in operation, and number of employees, but also inspection track records and external climate data. After feature extraction, company risk, item risk, environmental risk, and past violation history were derived as the feature variables affecting the determination of nonconformity, and machine learning algorithms were applied to the data. The F1-score of the decision tree, one of the ensemble models, was much higher than those of the other models. Based on these results, official food control for food safety management is expected to be strengthened and to move toward data- and evidence-based management and a scientific administrative system.
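Models are compared here by F1-score rather than accuracy, which matters because nonconforming companies are typically a small minority: a model that labels every company as conforming can score high accuracy while detecting nothing. A minimal sketch of the F1 computation, assuming a hypothetical labeling where `1` marks a nonconforming company:

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class.

    Returns 0.0 when precision and recall are both undefined or zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For instance, with true labels `[1, 0, 1, 1, 0, 0]` and predictions `[1, 0, 0, 1, 1, 0]` there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and F1 is 2/3.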

Effectiveness of Normalization Pre-Processing of Big Data to the Machine Learning Performance (빅데이터의 정규화 전처리과정이 기계학습의 성능에 미치는 영향)

  • Jo, Jun-Mo
    • The Journal of the Korea institute of electronic communication sciences / v.14 no.3 / pp.547-552 / 2019
  • Recently, the massive growth in the scale of data has become a major issue in Big Data. Because Big Data also serves as the input to Machine Learning, it should be normalized during preprocessing for Machine Learning to perform well. Performance varies with many factors, such as which columns of the Big Data are normalized and which normalization preprocessing method is used. In this paper, various normalization preprocessing methods and Big Data column scopes are applied to a Support Vector Machine (SVM) as the Machine Learning method, to find an efficient environment for normalization preprocessing. The Machine Learning experiments were implemented in Python using Jupyter Notebook.
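The abstract does not list which normalization methods were compared. As a hedged sketch, two widely used ones are min-max scaling to [0, 1] and z-score standardization; the "column scope" factor can be modeled as choosing which column indices get normalized while the rest pass through unchanged. All names here are illustrative.

```python
import statistics


def min_max(column):
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def z_score(column):
    """Standardize to zero mean and unit population standard deviation."""
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column)
    if sigma == 0:
        return [0.0] * len(column)
    return [(v - mu) / sigma for v in column]


def normalize_columns(rows, columns, method):
    """Apply method to the listed column indices only; other columns are copied as-is."""
    out = [list(r) for r in rows]
    for c in columns:
        scaled = method([r[c] for r in rows])
        for r, v in zip(out, scaled):
            r[c] = v
    return out
```

Scale-sensitive learners such as an SVM with an RBF kernel compute distances between feature vectors, so a column measured in much larger units can dominate the kernel unless it is rescaled; that is one reason the choice of method and column scope can change performance.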

A data fusion method for bridge displacement reconstruction based on LSTM networks

  • Duan, Da-You;Wang, Zuo-Cai;Sun, Xiao-Tong;Xin, Yu
    • Smart Structures and Systems / v.29 no.4 / pp.599-616 / 2022
  • Bridge displacement contains vital information about bridge condition and performance. Because of the limits of direct displacement measurement methods, indirect displacement reconstruction methods based on strain or acceleration data have also been developed for engineering applications, but these methods still have some deficiencies in practice. This paper proposes a novel method based on long short-term memory (LSTM) networks to reconstruct bridge dynamic displacements from strain and acceleration data. LSTM networks with three hidden layers are used to map the relationships between the measured responses and the bridge displacement. To achieve data fusion, the input strain and acceleration data are first preprocessed by normalization, and the corresponding dynamic displacement responses are then reconstructed by the LSTM networks. In the numerical simulation, the displacement reconstruction errors are below 9% for different load cases, and the proposed method is robust when the input strain and acceleration data contain additive noise. The effect of the hyper-parameters is analyzed, and the displacement reconstruction accuracies of different machine learning methods are compared. In experimental verification, the errors are below 6% for the simply supported beam and continuous beam cases. Both the numerical and experimental results indicate that the proposed data fusion method can accurately reconstruct the displacement.
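The abstract says the strain and acceleration inputs are normalized before being fused for the LSTM, but not which normalization, window length, or target alignment is used. A minimal preprocessing sketch, assuming per-channel min-max scaling, fixed-length sliding windows, and a target equal to the displacement at each window's last time step (all assumptions), is shown below; the LSTM itself is omitted because it would require a deep learning framework.

```python
def min_max_params(channel):
    """Return (minimum, range) for min-max scaling; a flat channel gets range 1.0."""
    lo, hi = min(channel), max(channel)
    return lo, (hi - lo) or 1.0


def make_samples(strain, accel, displacement, window):
    """Fuse normalized strain and acceleration into (window, 2) input sequences.

    Each sequence is paired with the displacement at its last time step
    (a hypothetical alignment choice).
    """
    if not (len(strain) == len(accel) == len(displacement)):
        raise ValueError("all channels must have the same length")
    if window < 1:
        raise ValueError("window must be at least 1")
    s_lo, s_rng = min_max_params(strain)
    a_lo, a_rng = min_max_params(accel)
    fused = [((s - s_lo) / s_rng, (a - a_lo) / a_rng) for s, a in zip(strain, accel)]
    return [(fused[i:i + window], displacement[i + window - 1])
            for i in range(len(fused) - window + 1)]
```

Normalizing each channel separately keeps strain (often in microstrain) and acceleration (in m/s² or g) on comparable scales. In practice the minimum and range would be fitted on training records only and reused unchanged for test data.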

Pallet speed control in a sintering plant using neural networks (신경회로망을 이용한 소결기 팰릿 속도 제어)

  • Jang, Min;Cho, Sung-Jun
    • Proceedings of the Korea Intelligent Information System Society Conference / 1999.03a / pp.261-270 / 1999
  • Sintering transforms powdered ore into lumped ore so that it can be used in a blast furnace. The powdered ore, combined with coke and other materials, is loaded into a container and moved along by a pallet while the ignited coke burns. The speed at which the pallet moves determines how much sintering takes place. Because the process is complicated and lacks an accurate mathematical model, human operators control the speed manually by monitoring various factors in the plant. In this paper, we propose a neural network-based pallet speed controller that imitates human operator knowledge. Actual process data were collected from a sintering plant over eight months and preprocessed to remove noisy and inconsistent data. A multilayer perceptron was trained using the back-propagation learning algorithm. In on-line testing at the sintering plant, the proposed model reliably controlled pallet speed during normal operation without the help of human operators. Moreover, quality and productivity were as good as with human operators.
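The abstract does not give the network size, input variables, or activation functions. A toy sketch of a multilayer perceptron trained by back-propagation, assuming one sigmoid hidden layer, a linear output standing in for pallet speed, per-sample gradient descent on squared error, and hypothetical normalized process measurements as inputs, could look like this:

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One sigmoid hidden layer and one linear output, trained by per-sample backprop."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def train_step(self, x, target, lr=0.1):
        """One gradient step on 0.5 * (y - target)^2; returns the pre-update loss."""
        h, y = self.forward(x)
        err = y - target
        for j, hj in enumerate(h):
            # Back-propagate through the sigmoid using the pre-update output weight.
            delta = err * self.w2[j] * hj * (1.0 - hj)
            self.w2[j] -= lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta * xi
            self.b1[j] -= lr * delta
        self.b2 -= lr * err
        return 0.5 * err * err


def mean_loss(net, data):
    """Average squared-error loss over (inputs, target) pairs."""
    return sum(0.5 * (net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)
```

In the study's setting, each training pair would be a preprocessed plant-state vector and the speed an operator chose in that state, so the network learns to imitate the operators rather than a physical model of sintering.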

Analysis of Main Design Factors for Developing a Soil Water Content Sensor Using Impedance Spectroscopy (Impedance Spectroscopy를 이용한 토양 수분함량 센서의 주요 설계인자 분석)

  • Lee, Dong-Hoon;Cho, Yong-Jin;Chang, Young-Chang;Lee, Kyou-Seung
    • Journal of Biosystems Engineering / v.33 no.4 / pp.269-275 / 2008
  • This study was conducted to design an impedance sensor that can measure soil water content. Partial least squares regression (PLSR) was applied to soil impedance data preprocessed with a smoothing method. An optimal sub-spectrum size and frequency range were determined by comparing the coefficient of determination ($R^2$) and root mean square error (RMSE) of PLSR models obtained from various PLS analyses of the soil impedance data. Based on the PLSR analysis, it was concluded that the optimal spectrum measurement range was 32.0 to 50.0 MHz, with an optimal sub-spectrum size of about 18.5 MHz.
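The sub-spectrum size and range were chosen by comparing $R^2$ and RMSE across candidate PLSR models. A minimal sketch of that comparison step, assuming each candidate range already has measured and predicted water contents (the PLSR fit itself is omitted, and ranking by $R^2$ with ties broken by lower RMSE is an assumption):

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


def pick_best(candidates):
    """candidates maps a label (e.g. a hypothetical frequency range) to (y_true, y_pred).

    Returns the label with the highest R^2, breaking ties by the lower RMSE.
    """
    return max(candidates,
               key=lambda k: (r_squared(*candidates[k]), -rmse(*candidates[k])))
```

As a worked check, true values `[1, 2, 3, 4]` with predictions `[1.5, 2, 3, 3.5]` give $SS_{res}=0.5$ and $SS_{tot}=5$, so $R^2=0.9$ and RMSE $=\sqrt{0.5/4}\approx 0.354$.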