• Title/Summary/Keyword: Deviation Tree

Predictors of Protective Factors for Depression in Adolescent using Decision Making Tree Analysis (의사결정나무분석을 이용한 청소년 우울의 보호요인 예측모형)

  • Kim, Bo-Young
    • The Journal of the Korea Contents Association / v.15 no.5 / pp.375-385 / 2015
  • This study aims to develop specific strategies for preventing adolescent depression and for supporting early detection and intervention services. It is a descriptive study that identifies predictors of protective factors for depression in adolescents using decision tree analysis. The subjects were 485 students in G city. Data were collected between September 23 and September 26, 2013, and analyzed with frequency analysis, percentages, means and standard deviations, the ${\chi}^2$-test, the t-test, and a decision tree using SPSS 20.0. The analysis yielded a predictive model for protective factors related to adolescent depression with 4 pathways and 12 nodes. The common predictors of adolescent depression were characteristics, family cohesion, parent-adolescent communication, and peer communication. Specificity was 76.0% for the training data and 65.4% for the test data, and sensitivity was 78.2% and 63.7%, respectively. Classification accuracy was 70.1% for the training data and 69.7% for the test data. Parent-adolescent communication and peer communication are necessary to reduce depression among Korean middle and high school students. This study should serve as baseline data for planning intervention strategies for depression prevention in adolescents.
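The specificity, sensitivity, and accuracy figures quoted above are all ratios taken from a confusion matrix. A minimal sketch of those definitions follows; the function name and the counts are hypothetical and are not data from the study.

```python
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Return sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of actual positives predicted positive
    specificity = tn / (tn + fp)  # share of actual negatives predicted negative
    accuracy = (tp + tn) / (tp + fn + fp + tn)  # share of all cases predicted correctly
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only; not taken from the paper.
metrics = classification_metrics(tp=40, fn=10, fp=20, tn=30)
```

With these illustrative counts, sensitivity is 40/50, specificity is 30/50, and accuracy is 70/100.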

Industrial Safety Risk Analysis Using Spatial Analytics and Data Mining (공간분석·데이터마이닝 융합방법론을 통한 산업안전 취약지 등급화 방안)

  • Ko, Kyeongseok;Yang, Jaekyung
    • Journal of Korean Society of Industrial and Systems Engineering / v.40 no.4 / pp.147-153 / 2017
  • The mortality rate from industrial accidents in South Korea was 11 per 100,000 workers in 2015, five times higher than the OECD average. Economic losses due to industrial accidents continue to grow, reaching 19 trillion won, far more than the 1.1 trillion won lost to natural disasters. This calls for fundamental changes in industrial safety management. In this study, we classified the risk of accidents in the industrial complex of Ulju-gun using spatial analytics and data mining. We collected 119 accident data, factory characteristics data, company information such as sales amount and capital stock, building information, weather information, official land prices, etc. The analysis dataset was constructed through pre-processing and data convergence. We then conducted geographically weighted regression with spatial factors affecting fire incidents and calculated the risk of fire accidents with an analytical model combining Boosting and CART (Classification and Regression Tree). We derived the main factors that affect fire accidents: deterioration of buildings, capital stock, number of employees, officially assessed land price, and building height. Finally, the predicted accident rates were divided into four classes (risk categories: alert, hazard, caution, and attention) with Jenks Natural Breaks Classification, which seeks to minimize each class's average deviation from the class mean while maximizing each class's deviation from the means of the other classes. Because the analysis results were also visualized on maps, danger zones can be checked intuitively. The results are judged to be applicable to policy decisions that differ by type, such as by risk rating.
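The natural-breaks idea described above can be sketched as a small dynamic program. This is one common formulation (minimizing the total within-class sum of squared deviations, often called Fisher-Jenks) and is not necessarily the exact procedure the authors ran; the function name and the sample rates are hypothetical.

```python
def jenks_classes(values, n_classes):
    """Split values into n_classes contiguous groups with minimal within-class
    sum of squared deviations from each group's mean."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums let us compute the squared deviation of any slice in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations of data[i:j] from its own mean."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[k][j]: best total deviation when the first j values form k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i

    # Walk the stored split points backwards to recover the classes.
    classes, j = [], n
    for k in range(n_classes, 0, -1):
        i = split[k][j]
        classes.append(data[i:j])
        j = i
    classes.reverse()
    return classes


# Hypothetical predicted accident rates, for illustration only.
rates = [0.1, 0.2, 0.15, 5.0, 5.2, 9.9, 10.1, 20.0]
four_classes = jenks_classes(rates, 4)
```

The cubic-time loop is fine for a sketch; a library implementation would be preferable for large datasets.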

The Object Image Detection Method using statistical properties (통계적 특성에 의한 객체 영상 검출방안)

  • Kim, Ji-hong
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.7 / pp.956-962 / 2018
  • As a study of object feature detection from images, we explain methods to identify tree species in a forest using pictures taken from a drone. Generally, object features are extracted with methods such as GLCM (Gray Level Co-occurrence Matrix) and Gabor filters. Because the leaves are similar, this research proposes an object extraction method that uses the statistical properties of trees. After extracting sample images from the original images, we detect the objects using cross-correlation between the original image and the sample images. Through this experiment, we found that the mean value and standard deviation of the sample images are very important factors for identifying the object. Analysis of the color components of the RGB and HSV models is also used to identify the object.
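To illustrate how the mean and standard deviation enter a cross-correlation match, here is a minimal sketch of normalized cross-correlation on small grayscale grids. It assumes images are plain lists of rows; the function names and the toy image are hypothetical, and the paper's actual pipeline may differ.

```python
from math import sqrt


def mean_std(patch):
    """Mean and (population) standard deviation of a 2-D list of pixel values."""
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, sqrt(variance)


def ncc(window, sample):
    """Normalized cross-correlation between two equally sized patches."""
    w_mean, w_std = mean_std(window)
    s_mean, s_std = mean_std(sample)
    if w_std == 0 or s_std == 0:
        return 0.0  # a flat patch carries no pattern to correlate
    count = len(sample) * len(sample[0])
    total = sum(
        (w - w_mean) * (s - s_mean)
        for w_row, s_row in zip(window, sample)
        for w, s in zip(w_row, s_row)
    )
    return total / (count * w_std * s_std)


def best_match(image, sample):
    """Slide the sample over the image; return (score, (row, col)) of the best fit."""
    height, width = len(sample), len(sample[0])
    best = (-2.0, (0, 0))
    for r in range(len(image) - height + 1):
        for c in range(len(image[0]) - width + 1):
            window = [row[c:c + width] for row in image[r:r + height]]
            score = ncc(window, sample)
            if score > best[0]:
                best = (score, (r, c))
    return best


# Toy data for illustration: the sample is embedded at row 1, column 2.
sample = [[9, 1], [1, 9]]
image = [
    [0, 0, 0, 0],
    [0, 0, 9, 1],
    [0, 0, 1, 9],
    [0, 0, 0, 0],
]
score, position = best_match(image, sample)
```

Subtracting the mean and dividing by the standard deviation makes the score insensitive to overall brightness and contrast, which is one reason those two statistics matter for the match.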

Detection of Forest Areas using Airborne LIDAR Data (항공 라이다데이터를 이용한 산림영역 탐지)

  • Hwang, Se-Ran;Kim, Seong-Joon;Lee, Im-Pyeong
    • Spatial Information Research / v.18 no.3 / pp.23-32 / 2010
  • LIDAR data are useful for forest applications such as bare-earth DEM generation for forest areas and estimation of tree height and forest biomass. As a core preprocessing procedure for most forest applications, this study attempts to develop an efficient method to detect forest areas from LIDAR data. First, we suggest three perceptual cues, based on multiple-return characteristics, height deviation, and spatial distribution, that are expected to be reliable for forest area detection from LIDAR data. We then classify the potential forest areas based on each individual cue and refine them with a bi-morphological process to eliminate falsely detected areas and smooth the boundaries. The final refined forest areas were compared with reference data manually generated from an aerial image. All the methods based on the three types of cues show an accuracy of more than 90%. In particular, the method based on multiple returns is slightly better than the other two in terms of simplicity and accuracy. The combination of the individual results from each cue is also shown to enhance the classification accuracy.

Imbalanced Data Improvement Techniques Based on SMOTE and Light GBM (SMOTE와 Light GBM 기반의 불균형 데이터 개선 기법)

  • Young-Jin, Han;In-Whee, Joe
    • KIPS Transactions on Computer and Communication Systems / v.11 no.12 / pp.445-452 / 2022
  • The class distribution of imbalanced data is an important issue in the digital world and a significant part of cybersecurity. Abnormal activity in imbalanced data must be detected and the resulting problems solved. Although a system capable of tracking patterns in all transactions is needed, machine learning with imbalanced data, which typically contains abnormal patterns, can ignore minority classes and degrade performance on them, and predictive models can become biased and inaccurate. In this paper, we predict target variables and improve accuracy by combining estimates using the Synthetic Minority Oversampling Technique (SMOTE) and the Light GBM algorithm as an approach to imbalanced datasets. Experimental results were compared with the logistic regression, decision tree, KNN, Random Forest, and XGBoost algorithms. Performance was similar in accuracy and recall, but in precision Random Forest reached 80.76% and Light GBM 97.16%, and in F1-score Random Forest reached 84.67% and Light GBM 91.96%. The experiment confirmed that Light GBM's performance was either similar, without deviation, or improved by up to 16% compared with the five algorithms.
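The core of SMOTE is interpolation between a minority-class sample and one of its nearest minority-class neighbours. The sketch below shows only that oversampling step in plain Python; the function name, parameters, and points are hypothetical, and the classifier stage (Light GBM in the paper) is deliberately left out.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples
    and their k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        # Rank the other minority samples by Euclidean distance to the base point.
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != idx),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class points, for illustration only.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_points, n_new=20, k=2, seed=1)
```

Because every synthetic point lies on a segment between two real minority samples, the new data stay inside the region the minority class already occupies.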

Conserved Metabolic Pathways of 471 Species of Archaebacteria (고세균 471종의 보존적 대사경로)

  • Dong-Geun Lee;Andre Kim;Sang-Hyeon Lee
    • Journal of Life Science / v.34 no.8 / pp.588-593 / 2024
  • An extensive analysis of 3,490 metabolic pathways in 471 archaebacterial species was conducted using the MetaCyc database. The number of metabolic pathways in these species varied significantly, ranging from 13 to 184 per species. Notably, no single metabolic pathway was found to be common to all archaebacteria. However, the "UTP and CTP de novo biosynthesis" and "tRNA charging" pathways were present in 470 species. Among the top 12 most prevalent metabolic pathways in archaebacteria, five were associated with nucleic acids and five with proteins. The remaining pathways included the "synthetic pathway of S-adenosyl-L-methionine (SAM)," a critical cofactor in various bioreactions, and "phosphopantothenate biosynthesis III (archaea)," which is required for essential post-translational modifications. These findings underscore the importance of nucleic acid and protein metabolism in archaeal biology. When the average and standard deviation of the distance values obtained from the phylogenetic tree of metabolic pathways were examined, each class of archaebacteria was divided into two main groups and the others, showing that the distribution of metabolic pathways was diverse. This study's insights hold potential applications in both foundational science and drug development.

Ensemble of Nested Dichotomies for Activity Recognition Using Accelerometer Data on Smartphone (Ensemble of Nested Dichotomies 기법을 이용한 스마트폰 가속도 센서 데이터 기반의 동작 인지)

  • Ha, Eu Tteum;Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.123-132 / 2013
  • As smartphones are equipped with various sensors such as an accelerometer, GPS, gravity sensor, gyroscope, ambient light sensor, proximity sensor, and so on, there has been much research on using these sensors to create valuable applications. Human activity recognition is one such application, motivated by various welfare applications such as support for the elderly, measurement of calorie consumption, analysis of lifestyles, analysis of exercise patterns, and so on. One of the challenges faced when using smartphone sensors for activity recognition is that the number of sensors used should be minimized to save battery power. When the number of sensors is restricted, it is difficult to realize a highly accurate activity recognizer or classifier because it is hard to distinguish between subtly different activities relying on only limited information. The difficulty becomes especially severe when the number of activity classes to be distinguished is very large. In this paper, we show that a fairly accurate classifier can be built that distinguishes ten different activities using data from only a single sensor, i.e., the smartphone accelerometer. Our approach to this ten-class problem is to use the ensemble of nested dichotomies (END) method, which transforms a multi-class problem into multiple two-class problems. END builds a committee of binary classifiers in a nested fashion using a binary tree. At the root of the binary tree, the set of all classes is split into two subsets of classes by a binary classifier. At a child node of the tree, a subset of classes is again split into two smaller subsets by another binary classifier. Continuing in this way, we obtain a binary tree in which each leaf node contains a single class. This binary tree can be viewed as a nested dichotomy that can make multi-class predictions.
Depending on how a set of classes is split into two subsets at each node, the final tree can differ. Since some classes may be correlated, a particular tree may perform better than the others. However, we can hardly identify the best tree without deep domain knowledge. The END method copes with this problem by building multiple dichotomy trees randomly during learning and then combining the predictions made by each tree during classification. The END method is generally known to perform well even when the base learner is unable to model complex decision boundaries. As the base classifier at each node of the dichotomy, we have used another ensemble classifier called the random forest. A random forest is built by repeatedly generating a decision tree, each time with a different random subset of features, using a bootstrap sample. By combining bagging with random feature subset selection, a random forest has more diverse ensemble members than simple bagging. Overall, our ensemble of nested dichotomies can be seen as a committee of committees of decision trees that can handle a multi-class problem with high accuracy. The ten classes of activities that we distinguish in this paper are 'Sitting', 'Standing', 'Walking', 'Running', 'Walking Uphill', 'Walking Downhill', 'Running Uphill', 'Running Downhill', 'Falling', and 'Hobbling'. The features used for classifying these activities include not only the magnitude of the acceleration vector at each time point but also the maximum, the minimum, and the standard deviation of the vector magnitude within a time window of the last 2 seconds, etc. For experiments comparing the performance of END with that of other methods, accelerometer data were collected every 0.1 second for 2 minutes for each activity from 5 volunteers.
Of the 5,900 ($=5{\times}(60{\times}2-2)/0.1$) data points collected for each activity (the data for the first 2 seconds are discarded because they lack time window data), 4,700 have been used for training and the rest for testing. Although 'Walking Uphill' is often confused with some other similar activities, END has been found to classify all ten activities with a fairly high accuracy of 98.4%. By comparison, the accuracies achieved by a decision tree, a k-nearest neighbor classifier, and a one-versus-rest support vector machine were 97.6%, 96.5%, and 97.6%, respectively.
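The nested-dichotomy structure described above can be sketched without any classifier attached. The code below only builds one random binary tree over the ten activity labels and checks the sample count from the abstract; the splitting rule (shuffle, then cut at a random position) and all function names are assumptions made for illustration, not the authors' implementation.

```python
import random

ACTIVITIES = [
    "Sitting", "Standing", "Walking", "Running", "Walking Uphill",
    "Walking Downhill", "Running Uphill", "Running Downhill",
    "Falling", "Hobbling",
]


def build_dichotomy(classes, rng):
    """Recursively split a list of classes into two non-empty subsets.
    A leaf is a single class label; an internal node is a (left, right) tuple."""
    if len(classes) == 1:
        return classes[0]
    shuffled = classes[:]
    rng.shuffle(shuffled)
    cut = rng.randint(1, len(shuffled) - 1)  # both sides stay non-empty
    return (
        build_dichotomy(shuffled[:cut], rng),
        build_dichotomy(shuffled[cut:], rng),
    )


def leaves(tree):
    """Collect the class labels stored at the leaves."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]


def count_internal(tree):
    """Count internal nodes, i.e. the binary classifiers one tree would need."""
    if isinstance(tree, tuple):
        return 1 + count_internal(tree[0]) + count_internal(tree[1])
    return 0


tree = build_dichotomy(ACTIVITIES, random.Random(42))

# Sample count from the abstract: 5 volunteers, 2 minutes minus the first
# 2 seconds, one reading every 0.1 s (10 readings per second).
samples_per_activity = 5 * (60 * 2 - 2) * 10
```

Any such tree over ten classes has nine internal nodes, so each member of the ensemble trains nine binary classifiers; an ensemble would repeat `build_dichotomy` with different seeds and combine the resulting predictions.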

An Actual Measurement on Safety of Play Equipments in the Outdoor Playground (어린이 놀이터 놀이시설의 안전도에 관한 조사)

  • 석주영;안옥희;박인전
    • Journal of the Korean housing association / v.13 no.2 / pp.47-53 / 2002
  • The purpose of this study is to measure the dimensions and material quality of play equipment, to examine whether they meet safety standards, and ultimately to offer basic data for proposing proper safety standards for the dimensions of play equipment. The subjects were 59 outdoor playgrounds, 30 of them located in apartment sites and the remainder in residential districts. The measurements were taken in June 1999 and June 2000. Data were analyzed for frequency, percentage, mean, and standard deviation using the SPSSWIN program. The main results were as follows. First, more than half of the playgrounds were assessed as traffic hazards because of adjacent streets. They were rarely equipped with toilets and drinking water facilities, but almost all had tree shade and benches. Second, because the play equipment was made of wood or metal, it was inconvenient for children to use and difficult to maintain. Third, the standards for swings and slides were established in detail and the measured values conformed to them, whereas the standards for seesaws and climbers were not detailed and the equipment was not suitably designed or installed.

A Numerical Method to Calculate Drainage Time in Large Transmission Pipelines (대구경 관로의 배수시간 산정을 위한 수치해석 기법)

  • Shin, Byoung-Ho;Choi, Doo-Yong;Jeong, Kwansue
    • Journal of Korean Society of Water and Wastewater / v.31 no.6 / pp.511-519 / 2017
  • A multi-regional water supply system, which is installed to serve multiple water demands, is characterized by a large-sized, long-distance, tree-type layout. Such a system is vulnerable to prolonged service interruption when a pipe break occurs. In this study, a numerical method is proposed to calculate the drainage time, which directly affects the duration of service interruption. First, governing equations are derived from the continuity and energy equations to capture the delay in drainage caused by friction loss and to resolve complicated pipeline connections. The nonlinear hydraulic equations are solved using an explicit time integration method and the Newton-Raphson method. The developed model is verified by comparing its results with an analytical solution. Furthermore, the model's applicability is validated with examples of pipelines in series, in parallel, and in complex layouts. Finally, the model is used to suggest appropriate actions to reduce the deviation of draining time in the C transmission line of the B multi-regional water supply system.
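The verification step described above (explicit time integration checked against an analytical solution) can be illustrated on a much simpler problem than the paper's pipeline network. The sketch below assumes a single vertical tank draining through a bottom outlet, with all losses lumped into a hypothetical discharge coefficient `cd`; it uses forward Euler only and does not reproduce the authors' governing equations or their Newton-Raphson step.

```python
from math import sqrt

G = 9.81  # gravitational acceleration in m/s^2


def drain_time_numeric(h0, tank_area, outlet_area, cd=1.0, dt=0.01):
    """Forward-Euler integration of dh/dt = -cd * (a / A) * sqrt(2 g h)
    until the water level h reaches zero. Returns elapsed time in seconds."""
    h, t = h0, 0.0
    while h > 0.0:
        outflow_velocity = cd * sqrt(2 * G * h)  # from the energy equation
        h -= outflow_velocity * outlet_area / tank_area * dt  # continuity
        t += dt
    return t


def drain_time_analytic(h0, tank_area, outlet_area, cd=1.0):
    """Closed-form drainage time for the same idealized tank."""
    return tank_area / (cd * outlet_area) * sqrt(2 * h0 / G)


# Hypothetical geometry, for illustration only.
numeric = drain_time_numeric(h0=10.0, tank_area=1.0, outlet_area=0.01)
analytic = drain_time_analytic(h0=10.0, tank_area=1.0, outlet_area=0.01)
```

Lowering `cd` below 1 lengthens the drainage time, which is a crude stand-in for the delay that friction loss introduces in a real pipeline.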

The Effect of Input Variables Clustering on the Characteristics of Ensemble Machine Learning Model for Water Quality Prediction (입력자료 군집화에 따른 앙상블 머신러닝 모형의 수질예측 특성 연구)

  • Park, Jungsu
    • Journal of Korean Society on Water Environment / v.37 no.5 / pp.335-343 / 2021
  • Water quality prediction is essential for the proper management of water supply systems. Increased suspended sediment concentration (SSC) has various effects on water supply systems, such as increased treatment cost; consequently, there have been various efforts to develop models for predicting SSC. However, SSC is affected by both the natural and the anthropogenic environment, which makes it challenging to predict. Recently, advanced machine learning models have increasingly been used for water quality prediction. This study developed an ensemble machine learning model to predict SSC using the XGBoost (XGB) algorithm. The observed discharge (Q) and SSC at two field monitoring stations were used to develop the model. The input variables were clustered into two groups with low and high ranges of Q using the k-means clustering algorithm, and each group of data was then used separately to optimize XGB (Model 1). The model performance was compared with that of the XGB model trained on the entire data (Model 2). The models were evaluated by the root mean squared error-observation standard deviation ratio (RSR) and the root mean squared error. The RSR values were 0.51 and 0.57 at the two monitoring stations for Model 2, while the model performance improved to RSR 0.46 and 0.55, respectively, for Model 1.
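RSR divides the root mean squared error by the standard deviation of the observations, so 0 means a perfect fit and 1 means the model does no better than predicting the observed mean. A minimal sketch of both evaluation measures follows; the function names and the values are hypothetical and are not data from the study.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean squared error between observations and predictions."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def rsr(observed, predicted):
    """RMSE divided by the standard deviation of the observations.
    The sample-size factors cancel, leaving a ratio of two square roots."""
    mean_obs = sum(observed) / len(observed)
    error_ss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    deviation_ss = sum((o - mean_obs) ** 2 for o in observed)
    return sqrt(error_ss) / sqrt(deviation_ss)


# Hypothetical SSC values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.5, 3.5, 4.5, 5.5]
example_rmse = rmse(observed, predicted)
example_rsr = rsr(observed, predicted)
```

Because RSR is scaled by the spread of the observations, it allows the two monitoring stations to be compared even if their SSC ranges differ.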