• Title/Summary/Keyword: Outliers

Semi-automatic Extraction of 3D Building Boundary Using DSM from Stereo Images Matching (영상 매칭으로 생성된 DSM을 이용한 반자동 3차원 건물 외곽선 추출 기법 개발)

  • Kim, Soohyeon;Rhee, Sooahm
    • Korean Journal of Remote Sensing / v.34 no.6_1 / pp.1067-1087 / 2018
  • Studies of LiDAR-based building boundary extraction usually cluster the building rooftop area from a dense point cloud and then extract the building outline. However, when a DSM generated by stereo image matching is used to extract building boundaries, automatically clustering rooftop areas is not trivial because of outliers and large holes in the point cloud. We therefore propose a technique that extracts building boundaries semi-automatically from a DSM created from stereo images. The technique combines watershed segmentation, which uses user input as markers, with a recursive MBR (minimum bounding rectangle) algorithm. Because the user only has to supply simple markers indicating building areas within the DSM, the method creates building boundaries efficiently while minimizing user input.
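The marker-driven segmentation step can be illustrated with a small standard-library Python sketch. It is a stand-in built on assumptions, not the authors' implementation: instead of a classic gradient-image watershed it runs a marker-seeded priority flood in which each step is weighted by the height difference between neighbouring DSM cells (close in spirit to minimum-spanning-forest formulations of the watershed), and it replaces the recursive MBR step with a single axis-aligned bounding rectangle. The function names, the toy DSM, and the marker labels are hypothetical.

```python
import heapq
from typing import List, Optional, Tuple

Grid = List[List[float]]
Labels = List[List[int]]

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity


def seeded_flood(dsm: Grid, markers: Labels) -> Labels:
    """Grow user markers (label > 0, 0 = unlabelled) across a DSM grid.

    Candidate cells are claimed in order of the height step from an already
    labelled neighbour, so each marker fills its own flat surface before any
    label crosses a height discontinuity such as a roof edge.
    """
    rows, cols = len(dsm), len(dsm[0])
    labels: Labels = [[0] * cols for _ in range(rows)]
    heap: List[Tuple[float, int, int, int, int]] = []
    order = 0  # tie-breaker: first-in, first-out among equal steps
    for r in range(rows):
        for c in range(cols):
            if markers[r][c] > 0:
                heapq.heappush(heap, (0.0, order, r, c, markers[r][c]))
                order += 1
    while heap:
        _, _, r, c, label = heapq.heappop(heap)
        if labels[r][c]:
            continue  # already claimed by an earlier, no-larger step
        labels[r][c] = label
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not labels[nr][nc]:
                step = abs(dsm[nr][nc] - dsm[r][c])
                heapq.heappush(heap, (step, order, nr, nc, label))
                order += 1
    return labels


def bounding_rectangle(labels: Labels, label: int) -> Optional[Tuple[int, int, int, int]]:
    """Axis-aligned (row_min, col_min, row_max, col_max) of one label, or None."""
    cells = [(r, c) for r, row in enumerate(labels) for c, v in enumerate(row) if v == label]
    if not cells:
        return None
    rs = [r for r, _ in cells]
    cs = [c for _, c in cells]
    return min(rs), min(cs), max(rs), max(cs)


# Toy 6 x 6 DSM: ground at 0 m, a 10 m flat roof on rows 1-3, columns 2-4.
dsm = [[10.0 if 1 <= r <= 3 and 2 <= c <= 4 else 0.0 for c in range(6)] for r in range(6)]
markers = [[0] * 6 for _ in range(6)]
markers[0][0] = 1  # hypothetical user click on the ground
markers[2][3] = 2  # hypothetical user click on the roof
labels = seeded_flood(dsm, markers)
print(bounding_rectangle(labels, 2))  # (1, 2, 3, 4)
```

On a real stereo DSM the holes and outliers mentioned above would create spurious height steps, so more background markers (or pre-filtering) would likely be needed; the paper's recursive MBR refinement is not reproduced here.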

Verification of Harmonization of Dose Assessment Results According to Internal Exposure Scenarios

  • Kim, Bong-Gi;Ha, Wi-Ho;Kwon, Tae-Eun;Lee, Jun-Ho;Jung, Kyu-Hwan
    • Journal of Radiation Protection and Research / v.43 no.4 / pp.143-153 / 2018
  • Background: Estimates of radionuclide intake and internal dose for workers who may have taken in radionuclides vary because of uncertainties in measurement data and intake information. As a result, different assessors may arrive at considerably different internal dose estimates for the same exposure scenario. To reduce this variation, internal exposure scenarios for nuclear facilities were developed, and an intercomparison was conducted to determine how well dose assessment results agreed (were harmonized) among assessors. Materials and Methods: Seven cases of internal exposure incidents that have occurred or may occur were prepared with reference to the intercomparison exercise scenarios carried out by the NRC and the IAEA. Sixteen nuclear facilities in Korea that deal with internal exposure were then asked to evaluate the scenarios. Each result was assessed statistically against the harmonization criteria developed by IDEAS/IAEA. Results and Discussion: No outliers were found in any of the 7 cases. However, the results were spread out for various reasons, which fall into two broad categories: dispersion caused by the assumptions made about intake and evaluation parameters, and dispersion caused by misapplying calculation methods and internal-exposure parameters. Conclusion: To satisfy the harmonization criteria and ensure accurate internal dose assessment, precise guidelines should be established for low doses, and further intercomparison cases, including high-dose exposures, are needed together with specialized training. Although the blind test was designed to evaluate harmonization, the discussion among participants will also help build expertise and secure high-quality dose assessment data.
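The idea of screening several assessors' results for outliers can be sketched as follows. This is a generic robust screen and explicitly not the IDEAS/IAEA harmonization criteria, whose exact rules are not given here: it assumes dose estimates are roughly log-normal, log-transforms them, and flags values whose modified z-score (based on the median and the median absolute deviation) exceeds 3.5, a common rule of thumb. The function name and the example doses are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def flag_outlying_assessments(doses: Sequence[float], cutoff: float = 3.5) -> List[int]:
    """Return indices of dose estimates whose log-scale modified z-score exceeds cutoff.

    Generic robust screen (median/MAD on log doses); NOT the IDEAS/IAEA criteria.
    """
    if any(d <= 0 for d in doses):
        raise ValueError("dose estimates must be positive to take logarithms")
    logs = [math.log(d) for d in doses]
    med = statistics.median(logs)
    mad = statistics.median([abs(x - med) for x in logs])
    if mad == 0:
        return []  # at least half the log-doses equal the median: no robust scale
    return [i for i, x in enumerate(logs) if 0.6745 * abs(x - med) / mad > cutoff]


# Hypothetical committed effective doses (mSv) reported by six assessors.
print(flag_outlying_assessments([1.0, 1.2, 0.9, 1.1, 1.0, 5.0]))  # [5]
```

Using the median and MAD rather than the mean and standard deviation keeps a single extreme result from inflating the spread it is compared against.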

Analysing Risk Factors of 5-Year Survival Colorectal Cancer Using the Network Model

  • Park, Won Jun;Lee, Young Ho;Kang, Un Gu
    • Journal of the Korea Society of Computer and Information / v.24 no.9 / pp.103-108 / 2019
  • The purpose of this study is to identify factors that may affect 5-year survival of colorectal cancer through a network model and to use the model as a clinical decision support system for colorectal cancer patients. The study used data from 2,540 patients who underwent colorectal cancer surgery between 1996 and 2018. Eleven factors related to colorectal cancer survival were selected in consultation with medical experts and on the basis of previous studies. After missing values and outliers were excluded, data from 1,839 patients were analyzed. Logistic regression analysis to identify factors affecting 5-year survival showed that age, BMI, and heart disease were statistically significant. A correlation analysis further showed that age, BMI, heart disease, diabetes, and other diseases were correlated with 5-year survival. Sex was related to BMI, lung disease, and liver disease. Age was associated with heart disease, hypertension, diabetes, and other diseases, and BMI with hypertension, diabetes, and other diseases. Heart disease was associated with hypertension, diabetes, and other diseases. In addition, diabetes and kidney disease were associated. The network model was then built from the correlation analysis, using as edge weights the network correlation coefficients significant at p < 0.001. The network model showed that the factors directly affecting survival were age, BMI, and heart disease, whereas the indirectly influencing factors were diabetes, hypertension, liver disease, and other diseases. If the network model is used as an auxiliary indicator in colorectal cancer treatment, it could help increase patient survival rates.
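The direct/indirect distinction drawn by the network model can be illustrated with a short sketch. It assumes, hypothetically, that each variable pair comes with a correlation coefficient and a p-value; it keeps only edges with p < 0.001 (the threshold in the text), weights them by the coefficient, and then calls a factor direct if it is adjacent to the outcome and indirect if it reaches the outcome only through other factors. The factor names and numbers are placeholders, not the study's results.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, float, float]  # (variable, variable, correlation r, p-value)
Graph = Dict[str, Dict[str, float]]


def build_network(edges: List[Edge], alpha: float = 0.001) -> Graph:
    """Undirected weighted graph keeping only correlations with p < alpha."""
    graph: Graph = {}
    for a, b, r, p in edges:
        if p < alpha:
            graph.setdefault(a, {})[b] = r
            graph.setdefault(b, {})[a] = r
    return graph


def direct_and_indirect(graph: Graph, outcome: str) -> Tuple[Set[str], Set[str]]:
    """Split factors by hop distance from the outcome: 1 = direct, >= 2 = indirect."""
    dist = {outcome: 0}
    queue = deque([outcome])
    while queue:  # breadth-first search gives shortest hop counts
        node = queue.popleft()
        for nb in graph.get(node, {}):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    direct = {n for n, d in dist.items() if d == 1}
    indirect = {n for n, d in dist.items() if d >= 2}
    return direct, indirect


# Placeholder edges (made-up values): factor E is not significant and drops out.
edges = [
    ("survival", "A", -0.30, 1e-6),
    ("survival", "B", 0.12, 1e-4),
    ("A", "C", 0.20, 1e-5),
    ("C", "D", 0.15, 2e-4),
    ("B", "E", 0.05, 0.2),
]
direct, indirect = direct_and_indirect(build_network(edges), "survival")
print(sorted(direct), sorted(indirect))  # ['A', 'B'] ['C', 'D']
```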

Outlier Detection of Real-Time Reservoir Water Level Data Using Threshold Model and Artificial Neural Network Model (임계치 모형과 인공신경망 모형을 이용한 실시간 저수지 수위자료의 이상치 탐지)

  • Kim, Maga;Choi, Jin-Yong;Bang, Jehong;Lee, Jaeju
    • Journal of The Korean Society of Agricultural Engineers / v.61 no.1 / pp.107-120 / 2019
  • Reservoir water level data indicate the current storage of a reservoir and are used as primary data for agricultural water management and research. For reservoir storage management, the Korea Rural Community Corporation (KRC) has installed water level stations at around 1,600 agricultural reservoirs and collects water level data every 10 minutes. However, environmental and physical factors frequently introduce various kinds of outliers caused by noise and measurement errors. Outliers must therefore be detected, and the quality of the water level data improved, before the data can be used for their intended purposes. This study detected outliers and separated them from normal data using two different models, a threshold model and an artificial neural network (ANN) model, and compared the results to evaluate the performance of each model. The threshold model identifies outliers by setting upper and lower bounds on both the water level and its variation, and by setting a bandwidth on the water level as the threshold beyond which a reading is regarded as erroneous. The ANN model was trained on a prepared dataset labeled as normal data (T) and outliers (F) and was then used to identify outliers. Both models were evaluated against reference data consisting of daily reservoir water levels collected by KRC. The threshold model detected outliers better than the ANN model, but the ANN model was better at avoiding misclassifying normal data as outliers.
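The bound checks of the threshold model can be sketched as below. The thresholds are hypothetical parameters that would have to be set per reservoir, "variation" is assumed to mean the change between consecutive 10-minute readings, and the bandwidth rule from the text is not reproduced.

```python
from typing import List, Optional, Sequence


def threshold_flags(
    levels: Sequence[Optional[float]],
    lower: float,
    upper: float,
    max_step: float,
) -> List[bool]:
    """Flag readings outside [lower, upper] or jumping more than max_step.

    Jumps are measured against the last *accepted* reading, so a single spike
    does not also get the normal reading after it flagged. Missing readings
    (None) are flagged as well.
    """
    flags: List[bool] = []
    last_ok: Optional[float] = None
    for x in levels:
        bad = (
            x is None
            or not (lower <= x <= upper)
            or (last_ok is not None and abs(x - last_ok) > max_step)
        )
        flags.append(bad)
        if not bad:
            last_ok = x
    return flags


# Hypothetical 10-minute water levels (m) with two out-of-range values and one spike.
levels = [10.0, 10.02, 10.05, 25.0, 10.06, -1.0, 10.07, 15.0, 10.08]
print(threshold_flags(levels, lower=0.0, upper=20.0, max_step=0.5))
# [False, False, False, True, False, True, False, True, False]
```

One caveat of comparing against the last accepted reading: a genuine, sudden level change larger than `max_step` would keep being flagged until the reference is reset, so the step threshold has to be chosen with real operating events in mind.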

Comparative Analysis of Essential Tasks and Delegable Tasks among Kindergarten Dietitians (유치원 유형에 따른 영양(교)사의 필수 업무 및 위임 가능 업무 비교·분석)

  • Kyung, Min Sook;Shin, Yu Lee;Ham, Sunny
    • Journal of the Korean Dietetic Association / v.27 no.4 / pp.209-231 / 2021
  • The purpose of this study was to compare the essential tasks and the delegable tasks of public kindergarten dietitians by kindergarten type. A self-administered online survey was conducted from November 18 to December 28, 2019. The survey covered essential and delegable tasks organized into 6 Duties, 25 Tasks, and 94 Task Elements. It was distributed to a sample of 500 kindergartens in Korea; after incomplete surveys and outliers were excluded, 224 responses were used for the analysis. Descriptive statistics were used to compare essential and delegable tasks. The results show that 'Duty A. Nutrition Management', 'Duty B. Foodservice Management Practices', 'Duty C. Hygiene Management of Kindergarten Foodservice', 'Duty D. Nutrition-Diet Education and Counseling', and 'Duty F. Professionalism Enhancement' were recognized as essential tasks to be performed by kindergarten dietitians. All 16 task elements (100.0%) in 'Duty E. Managing Snacks during Semesters, and Lunch/Snack during Breaks' were identified as delegable. In conclusion, most tasks were recognized as essential tasks to be performed by kindergarten dietitians, whereas 'Duty E' was considered delegable by dietitians at public-attached kindergartens. Public-attached kindergartens should therefore consider additional staff for 'Duty E'. This study is expected to provide baseline data for laws and regulations on the duties of kindergarten dietitians.

Mean-shortfall optimization problem with perturbation methods (퍼터베이션 방법을 활용한 평균-숏폴 포트폴리오 최적화)

  • Won, Hayeon;Park, Seyoung
    • The Korean Journal of Applied Statistics / v.34 no.1 / pp.39-56 / 2021
  • Portfolio optimization has been studied extensively since Markowitz (1952) published his model of diversified investment. Markowitz's mean-variance portfolio optimization problem assumes that returns follow a normal distribution. In practice, however, returns are not normally distributed, and variance is not a robust statistic because it is heavily influenced by outliers. To overcome these issues, the mean-shortfall portfolio model was proposed, which uses a downside risk measure, the shortfall, as its risk index. In this paper, we propose a perturbation method that uses the shortfall as the portfolio's risk index. The proposed portfolio uses an adaptive Lasso to obtain sparse and stable asset selection, which can reduce management and transaction costs. The proposed optimization is easy to apply because it can be solved efficiently by linear programming. A real-data analysis demonstrates the validity of the proposed perturbation method.
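Why a shortfall objective with a Lasso-type penalty remains a linear program can be seen from one standard formulation. The block below assumes that "shortfall" means expected shortfall (CVaR) at tail level $\alpha$ and uses the Rockafellar-Uryasev linearization; the paper's exact shortfall definition, constraints, and perturbation scheme may differ. Here $r_i \in \mathbb{R}^p$ are the asset returns in period $i = 1,\dots,T$, $\bar r$ is their sample mean, $\mu_0$ is a target return, and $\omega_j$ are adaptive-Lasso weights (for example $\omega_j = 1/\lvert \tilde w_j \rvert^{\gamma}$ from an initial estimate $\tilde w$).

```latex
\begin{aligned}
\min_{w,\,t,\,u,\,s}\quad & t + \frac{1}{\alpha T}\sum_{i=1}^{T} u_i + \lambda \sum_{j=1}^{p} \omega_j s_j \\
\text{subject to}\quad & u_i \ge -r_i^{\top} w - t,\quad u_i \ge 0, \qquad i = 1,\dots,T,\\
& -s_j \le w_j \le s_j, \qquad j = 1,\dots,p,\\
& \bar r^{\top} w \ge \mu_0,\qquad \mathbf{1}^{\top} w = 1.
\end{aligned}
```

For fixed $w$, minimizing the first two terms over $t$ gives the empirical expected shortfall of the loss $-r_i^{\top} w$, with $t$ playing the role of the Value-at-Risk; the auxiliary variables $s_j$ stand in for $\lvert w_j \rvert$, so the objective and every constraint stay linear.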

Quality Prediction Model for Manufacturing Process of Free-Machining 303-series Stainless Steel Small Rolling Wire Rods (쾌삭 303계 스테인리스강 소형 압연 선재 제조 공정의 생산품질 예측 모형)

  • Seo, Seokjun;Kim, Heungseob
    • Journal of Korean Society of Industrial and Systems Engineering / v.44 no.4 / pp.12-22 / 2021
  • This article proposes a machine learning classifier that predicts the production quality of free-machining 303-series stainless steel (STS303) small rolling wire rods from the operating conditions of the manufacturing process. To develop the classifier, manufacturing data on 37 operating variables were collected from the manufacturing execution system (MES) of Company S, and 12 derived variables were generated on the basis of a literature review and interviews with field experts. The research consisted of data preprocessing, exploratory data analysis, feature selection, machine learning modeling, and evaluation of alternative models. In the preprocessing stage, missing values and outliers were removed, and SMOTE (Synthetic Minority Over-sampling Technique) oversampling was applied to resolve the class imbalance. Features were selected using the variable importance of LASSO (least absolute shrinkage and selection operator) regression, extreme gradient boosting (XGBoost), and random forest models. Finally, logistic regression, support vector machine (SVM), random forest, and XGBoost classifiers were developed to predict whether products made under new operating conditions would be adequate or defective. The optimal hyperparameters of each model were searched with grid search and random search based on k-fold cross-validation. In the experiment, XGBoost showed higher predictive performance than the other models, with an accuracy of 0.9929, specificity of 0.9372, F1-score of 0.9963, and logarithmic loss of 0.0209. The classifier developed in this study is expected to improve productivity by enabling effective management of the manufacturing process for STS303 small rolling wire rods.
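The SMOTE step named above can be sketched in standard-library Python. This is a simplified variant written for illustration: it picks a random minority sample for each synthetic point (rather than generating a fixed number per sample) and uses a brute-force Euclidean neighbour search. The function name and toy data are hypothetical, and the study does not state which implementation it used.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Brute-force k nearest neighbours of `base` within the minority class.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class (e.g. defective products) with two features.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=10, k=2, seed=1)
print(len(new_points))  # 10
```

In practice oversampling should be applied only to the training folds inside cross-validation, so that synthetic points do not leak into the data used for evaluation.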

Analysis of Success and Failure Factors of OTT Service Contents According to the Rating: Focus on Netflix (평점에 따른 OTT 서비스 콘텐츠의 성공과 실패 요인 분석: 넷플릭스를 중심으로)

  • Hong, Ji-Soo;Park, Jin-Soo;Kang, Sung-Woo
    • Journal of Korean Society of Industrial and Systems Engineering / v.44 no.4 / pp.65-75 / 2021
  • This study examines multiple variables of OTT service content to uncover hidden relationships between ratings and the other variables of successful and failed content. To extract key variables strongly correlated with ratings, it analyzes 170 Netflix original dramas and 419 movies. Each title is classified as a success or a failure according to its rating on the rating site IMDb. The correlations between this rating-based classification and variables such as violence, lewdness, and running time are analyzed to determine whether particular variables are characteristic of successful or failed content. Regression analysis is used as the main method for discovering correlations among the variables. Because the independent variables should not be strongly correlated with each other, multicollinearity is checked before variables are selected. Cook's distance is used to detect and remove outliers. To improve the accuracy of the model, variable selection based on the AIC (Akaike Information Criterion) is performed. Finally, the basic assumptions of regression analysis are checked by residual diagnostics and the Durbin-Watson test. The analysis shows that movies whose directors have won more awards and that contain less imitable behavior tend to succeed, whereas movies with less fear content tend to fail. For dramas, failure is closely associated with less violence, more fear, and more drug content.
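The two regression diagnostics named above, Cook's distance for outlier screening and the Durbin-Watson statistic for residual autocorrelation, can be sketched for the single-predictor case in standard-library Python. The one-predictor restriction, the function names, and the toy data are assumptions for illustration; the study fits a multiple regression. A common rule of thumb (also an assumption here) treats observations with $D_i > 4/n$ as influential.

```python
from typing import List, Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Intercept and slope of a simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x: Sequence[float], y: Sequence[float]) -> List[float]:
    b0, b1 = ols_fit(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def cooks_distance(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2 with p = 2 parameters."""
    n, p = len(x), 2
    e = residuals(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    s2 = sum(ei * ei for ei in e) / (n - p)  # residual variance estimate
    leverage = [1 / n + (xi - mx) ** 2 / sxx for xi in x]  # hat-matrix diagonal
    return [ei * ei / (p * s2) * h / (1 - h) ** 2 for ei, h in zip(e, leverage)]


def durbin_watson(e: Sequence[float]) -> float:
    """Sum of squared successive residual differences over the residual sum of squares."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(ei * ei for ei in e)


# Toy data: y = 2x + 1 except the last point, which is a gross outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3, 5, 7, 9, 11, 13, 15, 40]
d = cooks_distance(x, y)
print(max(range(len(d)), key=d.__getitem__))  # 7
print(round(durbin_watson(residuals(x, y)), 2))
```

Values of the Durbin-Watson statistic near 2 suggest little first-order autocorrelation in the residuals; values toward 0 or 4 suggest positive or negative autocorrelation.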

Pairwise fusion approach to cluster analysis with applications to movie data (영화 데이터를 위한 쌍별 규합 접근방식의 군집화 기법)

  • Kim, Hui Jin;Park, Seyoung
    • The Korean Journal of Applied Statistics / v.35 no.2 / pp.265-283 / 2022
  • The MovieLens data consist of recorded movie ratings and are widely used for rating evaluation in recommender-system research. In this paper, we derive additional information by clustering users' genre preferences, obtained from the movie rating data and movie genre data. Because each user has rated very few movies relative to the total number of movies, the missing rate in these data is very high, which limits the applicability of existing clustering methods. Motivated by the analysis of the MovieLens data, we propose a convex clustering method with a pairwise fused penalty. The proposed method performs missing-value imputation and, at the same time, uses the movie ratings and per-movie genre weights to cluster each individual's genre preferences. The optimization problem is solved with the alternating direction method of multipliers (ADMM) algorithm. Simulations and an application to the MovieLens data show that the proposed clustering method is less sensitive to noise and outliers than existing methods.
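For readers unfamiliar with convex clustering, the generic objective with a pairwise fused penalty and a missing-data mask can be written as below. This is the standard form from the convex clustering literature, adapted by assumption to observed entries only; the paper's exact loss, weights $w_{ij}$, and use of genre weights may differ. Here $x_i$ is user $i$'s preference vector, $\Omega$ is the set of observed entries, $u_i$ is the centroid assigned to user $i$, and $\lambda \ge 0$ is the tuning parameter.

```latex
\min_{U = (u_1,\dots,u_n)}\;
\frac{1}{2}\sum_{(i,k)\in\Omega} \left(x_{ik} - u_{ik}\right)^2
\;+\; \lambda \sum_{i<j} w_{ij}\,\lVert u_i - u_j \rVert_2
```

Users whose fitted centroids coincide ($u_i = u_j$) form one cluster, and a larger $\lambda$ fuses more centroids. Entries $u_{ik}$ with $(i,k) \notin \Omega$ are not tied to any data and are determined only through the fusion penalty, which is one way imputation and clustering can happen in the same optimization. ADMM handles the non-smooth penalty by introducing splitting variables $v_{ij} = u_i - u_j$.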

Hydrochemical Investigation for Site Characterization: Focusing on the Application of Principal Component Analysis (부지특성화을 위한 지하수의 수리화학 특성 연구: 주성분 분석을 중심으로)

  • Yu, Soonyoung;Kim, Han-Suk;Jun, Seong-Chun;Yi, Jong Hwa;Yun, Seong-Taek;Kwon, Man Jae;Jo, Ho Young
    • Journal of Soil and Groundwater Environment / v.27 no.spc / pp.34-50 / 2022
  • Principal component analysis (PCA) was conducted on hydrochemical data from four testbeds (A to D), built for the development of site characterization technologies, to assess the processes controlling the hydrochemistry at each site. The PCA results indicated nitrogen loading to deep bedrock aquifers through permeable fractures in Testbed A, chemical weathering enhanced by the biodegradation of petroleum hydrocarbons in Testbed B, reductive dechlorination in Testbed C, and hydrochemistry that varied with the depth to bedrock in Testbed D, consistent with the characteristics of each site. In Testbeds B and D, outliers appeared to affect the PCA results, probably because of the small number of samples, although the results remained consistent with the site characteristics. These results indicate that PCA is widely applicable to hydrochemical data for assessing the major hydrochemical processes at contaminated sites and is useful for site characterization when combined with other site characterization technologies, e.g., geological surveys, geophysical investigations, and borehole logging. We suggest applying PCA at contaminated sites to interpret hydrochemical data, not only for the distribution of contamination levels but also for the assessment of major hydrochemical processes and contamination sources.
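A minimal PCA computation can be sketched in standard-library Python: standardize each hydrochemical variable, form the correlation matrix, and extract the first principal component by power iteration. Using the correlation matrix (rather than log-transformed or raw concentrations), the power-iteration solver, and the toy data are assumptions for illustration, not details taken from the study. With few samples, a single outlying sample can dominate this matrix, which is consistent with the sensitivity reported for Testbeds B and D.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def correlation_matrix(data: Sequence[Sequence[float]]) -> Matrix:
    """Pearson correlation matrix of the columns (samples in rows, variables in columns)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1)) for j in range(p)]
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]  # standardized
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)] for a in range(p)]


def first_principal_component(
    data: Sequence[Sequence[float]], iterations: int = 500, seed: int = 0
) -> Tuple[List[float], float]:
    """Loadings of PC1 and the share of total variance it explains (power iteration)."""
    corr = correlation_matrix(data)
    p = len(corr)
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in range(p)]  # random start vector
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("start vector is orthogonal to all non-zero components")
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(corr[a][b] * v[b] for b in range(p)) for a in range(p))
    return v, eigenvalue / p  # the trace of a correlation matrix equals p


# Toy samples: three variables that are exact linear functions of one another.
samples = [[1, 3, -1], [2, 5, -2], [3, 7, -3], [4, 9, -4], [5, 11, -5]]
loadings, explained = first_principal_component(samples)
print([round(abs(l), 3) for l in loadings], round(explained, 3))  # [0.577, 0.577, 0.577] 1.0
```

The sign of a principal component is arbitrary, which is why the example compares absolute loadings.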