• Title/Summary/Keyword: Decision-tree

A Study on the Classification of Unstructured Data through Morpheme Analysis

  • Kim, SungJin;Choi, NakJin;Lee, JunDong
    • Journal of the Korea Society of Computer and Information / v.26 no.4 / pp.105-112 / 2021
  • In the era of big data, interest in data is exploding. In particular, the development of the Internet and social media has led to the creation of new data, enabling the realization of the era of big data and artificial intelligence and opening a new chapter in convergence technology. There is also growing demand for analysis of data that programs could not handle in the past. In this paper, an analysis model was designed and verified for the classification of unstructured data, which is often required in the era of big data. The data were prepared by crawling thesis summaries, main words, and sub-keywords from DBPia, creating a database using KoNLP's data dictionary, and tokenizing words through morpheme analysis. In addition, nouns were extracted using KAIST's 9 part-of-speech classification system, TF-IDF values were generated, and an analysis dataset was created by combining training data and Y values. Finally, the adequacy of the classification was measured by applying three analysis algorithms (random forest, SVM, decision tree) to the generated analysis dataset. The classification technique proposed in this paper can be applied in various fields, such as civil complaint classification and other text-related analysis, in addition to thesis classification.
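The TF-IDF step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the documents have already been reduced to lists of noun tokens; the token lists and the function name `tf_idf` are hypothetical, and the plain `tf * log(N / df)` weighting shown here is one common variant, not necessarily the one the authors used.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses tf = count / document length and idf = log(N / df).
    This is an illustrative variant, not the paper's exact formula.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical noun tokens standing in for morpheme-analysis output.
docs = [
    ["data", "classification", "model"],
    ["data", "tree", "forest"],
    ["data", "morpheme", "analysis", "morpheme"],
]
weights = tf_idf(docs)
```

A term that appears in every document (here `data`) receives weight zero, while a term concentrated in one document (here `morpheme`) receives a high weight, which is why such values are useful as classifier features.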

Data-driven Co-Design Process for New Product Development: A Case Study on Smart Heating Jacket (신제품 개발을 위한 데이터 기반 공동 디자인 프로세스: 스마트 난방복 사례 연구)

  • Leem, Sooyeon;Lee, Sang Won
    • Journal of the Korea Convergence Society / v.12 no.1 / pp.133-141 / 2021
  • This research proposes a design process that complements human-centered design with an objective, data-driven approach. The subjective human-centered design process can lack objectivity, and data-driven approaches can supplement it by uncovering hidden user needs. This research combines data mining analysis with a co-design process and verifies its applicability through a case study on a smart heating jacket. In the data mining process, clustering groups the users, which is the basis for selecting the target groups, and decision tree analysis identifies the important user perception attributes and values. The broad view obtained from the data analysis is then refined through the co-design process, a deeper human-centered design process that uses the developed workbook. In the co-design process, the journey maps, needs and pain points, ideas, and values for the target user groups are identified and finalized. These can become the basis for starting new product development.

Improving Efficiency of Food Hygiene Surveillance System by Using Machine Learning-Based Approaches (기계학습을 이용한 식품위생점검 체계의 효율성 개선 연구)

  • Cho, Sanggoo;Cho, Seung Yong
    • The Journal of Bigdata / v.5 no.2 / pp.53-67 / 2020
  • This study employs a supervised learning prediction model to detect nonconformity in advance at processed food manufacturing and processing businesses. The study followed the standard machine learning procedure: definition of the objective function, data preprocessing and feature engineering, and model selection and evaluation. The dependent variable was set as the number of supervised inspection detections over the five years from 2014 to 2018, and the objective function was to maximize the probability of detecting nonconforming companies. The data were preprocessed to reflect not only basic attributes such as revenue, operating duration, and number of employees, but also inspection track records and external climate data. After feature extraction, the machine learning algorithms were applied to the data using company risk, item risk, environmental risk, and past violation history as the feature variables that affect the determination of nonconformity. The F1-score of the decision tree, one of the ensemble models, was much higher than those of the other models. Based on the results of this study, official food control for food safety management is expected to be strengthened and geared toward data- and evidence-based management and a scientific administrative system.
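For readers unfamiliar with the F1-score used to compare the models here, the following is a minimal sketch of how it is computed for a binary task. The labels are made-up illustration data (1 = nonconforming, 0 = conforming), not results from the study, and the function name `f1_score` is a hypothetical helper.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up labels: 1 = nonconforming business, 0 = conforming business.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
```

Because it balances precision against recall, the F1-score is a more informative yardstick than plain accuracy when nonconforming cases are rare.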

Establishment of WBS·CBS-based Construction Information Classification System for Efficient Construction Cost Analysis and Prediction of High-tech Facilities (하이테크 공장의 효율적 건설 사업비 분석 및 예측을 위한 WBS·CBS 기반 건설정보 분류체계 구축)

  • Choi, Seong Hoon;Kim, Jinchul;Kwon, Soonwook
    • The Journal of the Korea Contents Association / v.21 no.8 / pp.356-366 / 2021
  • The high-tech industry, a leader in the national economy, involves larger investment costs and shorter construction periods than general buildings and requires continuous investment. Therefore, accurate construction cost prediction and quick decision-making are important factors for efficient cost and process management. Overseas, the construction information classification system has been standardized since 1980 and has been continuously developed, improving construction productivity by systematically collecting and utilizing project life cycle information. At domestic construction sites, attempts have been made to standardize the classification system of construction information, but continuous standardization and systematization are difficult to achieve because there is no standardization body and cost and process management methods differ from one construction company to another. In particular, the level of standardization and systematization of the construction information classification system for high-tech facility construction is very low because of large project scale, numerous types of work, complex construction, and security concerns. Therefore, the purpose of this study is to build a construction information classification system suitable for high-tech facility construction by collecting, classifying, and analyzing data from related projects constructed in Korea. Based on the WBS (Work Breakdown Structure) and CBS (Cost Breakdown Structure) classified and analyzed in this study, a hierarchical code system was proposed, a three-dimensional building cost model linking WBS and CBS was constructed, and a method for using it was presented. This makes it possible to develop an information classification system based on inter-relationships that goes beyond the one-way tree structure of general construction information classification systems, and it is expected to maximize effects such as shorter construction periods and reduced costs.

A Development of Defeat Prediction Model Using Machine Learning in Polyurethane Foaming Process for Automotive Seat (머신러닝을 활용한 자동차 시트용 폴리우레탄 발포공정의 불량 예측 모델 개발)

  • Choi, Nak-Hun;Oh, Jong-Seok;Ahn, Jong-Rok;Kim, Key-Sun
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.6 / pp.36-42 / 2021
  • With recent developments in the Fourth Industrial Revolution, the manufacturing industry has changed rapidly. Through super-connectivity and super-intelligence, key aspects of the Fourth Industrial Revolution, machine learning can be used to predict faults during the foam-making process. Polyol and isocyanate are the components of polyurethane foam. Much research has indicated that the specific mixture ratio and temperature can affect the characteristics of the products. Based on these characteristics, this study collects data on each factor during the foam-making process and applies machine learning to predict faults. The algorithms used are the decision tree, kNN, and an ensemble algorithm, trained on 5,147 cases. On 1,000 validation cases, the ensemble algorithm achieved up to 98.5% accuracy. The results therefore indicate that faults in currently produced parts can be identified by collecting real-time data on each factor during the foam-making process. Furthermore, controlling each of the factors may reduce the fault rate.

Development of prediction model identifying high-risk older persons in need of long-term care (장기요양 필요 발생의 고위험 대상자 발굴을 위한 예측모형 개발)

  • Song, Mi Kyung;Park, Yeongwoo;Han, Eun-Jeong
    • The Korean Journal of Applied Statistics / v.35 no.4 / pp.457-468 / 2022
  • In an aged society, it is important to prevent older people from developing disabilities that require long-term care. The purpose of this study is to develop a prediction model to identify high-risk groups who are likely to become beneficiaries of Long-Term Care Insurance. This is a retrospective study using past records of the study subjects from the National Health Insurance Service (NHIS) database. The study subjects are the 7,724,101 people over 65 years of age registered for medical insurance. To develop the prediction model, we used logistic regression, decision tree, random forest, and multi-layer perceptron neural network. Finally, random forest was selected as the prediction model based on model performance in internal and external validation. Random forest could predict about 90% of the older people in need of long-term care using the database, without any information from the assessment of eligibility for long-term care. The findings might be useful in evidence-based health management for prevention services and can contribute to preemptively identifying older people who need preventive services.

Metabolic Diseases Classification Models according to Food Consumption using Machine Learning (머신러닝을 활용한 식품소비에 따른 대사성 질환 분류 모델)

  • Hong, Jun Ho;Lee, Kyung Hee;Lee, Hye Rim;Cheong, Hwan Suk;Cho, Wan-Sup
    • The Journal of the Korea Contents Association / v.22 no.3 / pp.354-360 / 2022
  • Metabolic disease has a prevalence of 26% among Koreans and is defined as having three of the following five conditions at the same time: abdominal obesity, hypertension, impaired fasting glucose, high triglycerides, and low HDL cholesterol. This paper links the consumer panel data of the Rural Development Agency (RDA) with the medical care data of the National Health Insurance Service (NHIS) to build a classification model that separates a metabolic disease group from a control group based on food consumption characteristics, and compares the differences between the groups. Many existing domestic and foreign studies on metabolic diseases and food consumption characteristics examine the correlation between disease and specific food groups or specific ingredients, whereas this paper considers all food groups included in the general diet. We created a classification model using logistic regression, a decision tree-based classification model, and a classification model using XGBoost. Of the three, the XGBoost classification model performed best, but its accuracy was not high, at less than 0.7. As future work, it is necessary to extend the observation period for food consumption in the patient group to more than 5 years and to study the metabolic disease classification model after converting the food consumed into nutritional characteristics.

Experimental Comparison of Network Intrusion Detection Models Solving Imbalanced Data Problem (데이터의 불균형성을 제거한 네트워크 침입 탐지 모델 비교 분석)

  • Lee, Jong-Hwa;Bang, Jiwon;Kim, Jong-Wouk;Choi, Mi-Jung
    • KNOM Review / v.23 no.2 / pp.18-28 / 2020
  • With the development of the virtual community, the benefits that IT provides to people in fields such as healthcare, industry, communication, and culture are increasing, and the quality of life is also improving. At the same time, various malicious attacks target the developed network environment. Firewalls and intrusion detection systems exist to detect these attacks in advance, but they are limited in detecting malicious attacks that evolve day by day. To solve this problem, intrusion detection research using machine learning is being actively conducted, but false positives and false negatives occur because of imbalance in the learning dataset. In this paper, a Random Oversampling method is used to solve the imbalance problem of the UNSW-NB15 dataset used for network intrusion detection. Through experiments, we compared and analyzed the accuracy, precision, recall, F1-score, training and prediction time, and hardware resource consumption of the models. Building on this study, we will pursue more efficient network intrusion detection models using other methods and high-performance models that can solve the imbalanced data problem.
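The Random Oversampling idea named in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `random_oversample`, the toy rows, and the labels `normal` and `attack` are hypothetical stand-ins for UNSW-NB15 records.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Toy, hypothetical data: four "normal" flows and one "attack" flow.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = ["normal", "normal", "normal", "normal", "attack"]
new_rows, new_labels = random_oversample(rows, labels)
counts = Counter(new_labels)
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the data used to measure precision, recall, and F1-score.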

Analysis of the Causes for Continuous Employment of Employed Students after Graduation from Characterization High School -Focusing on the Commercial High Schools (특성화고등학교 졸업 후 취업자의 근속 원인 분석 연구 -상업계 고등학교를 중심으로)

  • Jeong, Kyu-Han;Lee, Jang-Hee
    • Journal of Practical Engineering Education / v.14 no.1 / pp.165-177 / 2022
  • The purpose of this study is to present a direction for employment guidance that supports long-term service by analyzing the causes of continued employment among employed graduates of specialized high schools. In particular, it aims to present student guidance plans for long-term service by analyzing the personal reasons of students graduating from commercial high schools and the individual, school, company, and government policy factors that affect service after employment. To this end, a survey was conducted among graduates of commercial high schools nationwide, and the validity, reliability, and causality of the survey data were analyzed using Exploratory Factor Analysis, Cronbach's Alpha, and decision tree analysis. We found that personal goal setting for employment is an important factor for working for more than 1 year, and that personal relationships at work and personal characteristics are important factors for working for more than 3 years. In addition, we found that personal reasons and school recommendations are major reasons for getting a job; that special lectures on employment, camps, and 'advice from seniors and teachers' programs are helpful in finding a job; and that accounting- and computer-related subjects are helpful for long-term employment. Accordingly, specialized high schools need to prepare specific instructional measures for education in areas such as setting personal goals and forming the human relationships that are the basis of social life, and to actively operate the above subjects and programs to support employment and long-term service.

A Study on the Improvement of Service for the Revitalization of Natural Burial (자연장 활성화를 위한 서비스 개선방안 연구)

  • Lee, Jeung-Sun;Ahn, Jin-Ho
    • Journal of Service Research and Studies / v.13 no.3 / pp.70-81 / 2023
  • Choosing a burial method is a necessary decision at the last moment of life, and several criteria are used to make it. Our funeral methods were dominated by ancestral worship culture and religion, not nature. Until recently, nature was used only as a means from a human perspective, but natural burial methods that reflect consideration for nature and symbiosis with nature have now emerged. The recent high public preference for natural burial reflects today's strong zeitgeist and nature-friendly values. According to 2021 statistics, Korea's national cremation rate exceeded 92%; compared with a cremation rate of less than 20% just 20 years earlier, our burial method has changed rapidly. As the cremation promotion movement and government policies, which began in the early 1990s, were systematically developed, enshrinement facilities became established in our surroundings. However, these were also criticized for causing national damage, and the burial method called natural burial was introduced into the system in 2008. About 15 years have passed since then, but the revitalization of natural burial has been slower than expected. One of the reasons for this stagnation is that the basic spirit of natural burial (returning to the forest) has been forgotten and natural burial is thought of like a grave in a graveyard. Accordingly, this study aims to identify the background of the introduction and the current operation of natural burial and to present development measures that improve memorial services so that natural burial is embraced by the public.