• Title/Abstract/Keyword: random sets

Search results: 277 items

P2P 대부 우수 대출자 예측을 위한 합성 소수집단 오버샘플링 기법 성과에 관한 탐색적 연구 (Exploring the Performance of Synthetic Minority Over-sampling Technique (SMOTE) to Predict Good Borrowers in P2P Lending)

  • 프란시스 조셉 코스텔로;이건창
    • 디지털융복합연구 / 제17권9호 / pp.71-78 / 2019
  • This study proposes the synthetic minority over-sampling technique (SMOTE) as a useful method for predicting good borrowers on P2P lending platforms and empirically validates its performance. One of the difficulties in estimating good borrowers in P2P lending is the severe class imbalance, which makes reliable prediction of good borrowers hard unless it is addressed. To resolve this problem, this study applies SMOTE to the LendingClub dataset and verifies its performance. The validation results show that the SMOTE method achieved statistically superior performance in comparisons with support vector machine, k-nearest neighbors, logistic regression, random forest, and deep neural network classifiers.
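
For readers who want to see the core idea in code, below is a minimal sketch of SMOTE-style oversampling ahead of a classifier, using the imbalanced-learn and scikit-learn packages on synthetic data; the data and settings are placeholders, not the study's LendingClub preprocessing or model configuration.

```python
# Sketch: SMOTE oversampling before training a classifier on an imbalanced data set.
# Synthetic data stand in for the LendingClub table; this is not the paper's pipeline.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Imbalanced two-class problem (roughly 5% minority class, assumed to be label 1).
X, y = make_classification(n_samples=10_000, n_features=20, weights=[0.95, 0.05],
                           random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Oversample only the training split so the test set keeps its natural imbalance.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

clf = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_res, y_res)
print(classification_report(y_test, clf.predict(X_test)))
```

The same resampled training split could be reused with the other classifiers compared in the study (support vector machine, k-nearest neighbors, logistic regression, deep neural network).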

Ecological Momentary Assessment Using Smartphone-Based Mobile Application for Affect and Stress Assessment

  • Yang, Yong Sook;Ryu, Gi Wook;Han, Insu;Oh, Seojin;Choi, Mona
    • Healthcare Informatics Research / 제24권4호 / pp.381-386 / 2018
  • Objectives: This study aimed to describe the process of using a mobile application for ecological momentary assessment (EMA) to collect data on stress and mood in daily-life settings. Methods: A mobile application for the Android operating system was developed, loaded with a set of questions on momentary mood and stress, and installed on each participant's smartphone. The application set alarms at semi-random intervals within 60-minute blocks, four times a day for 7 days. After all momentary affect and stress responses were obtained, questions assessing the usability of the mobile EMA application were also administered. Results: Data were collected from 97 police officers working in Gyeonggi Province, South Korea. The mean completion rate was 60.0%, ranging from 3.5% to 100%. The mean positive and negative affect scores were 18.34 out of 28 and 19.09 out of 63, and the mean stress score was 17.92 out of 40. Participants responded that the mobile application correctly measured their affect (4.34 ± 0.83) and stress (4.48 ± 0.62) on a 5-point Likert scale. Conclusions: This study examined the process of using a mobile application to assess momentary affect and stress repeatedly over time. We found challenges in adherence to the research protocol, such as completion rates and delays in answering after alarm notification. Despite this inherent adherence issue, EMA retains the advantages of reducing recall bias and capturing the actual moments of interest at multiple time points, which improves ecological validity.
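
As an illustration of the alarm scheme described above (four alarms per day at semi-random times inside 60-minute blocks, for 7 days), a small sketch follows; the block start hours are assumptions, since the abstract does not state them.

```python
# Sketch of a semi-random EMA alarm schedule: one alarm at a random minute inside
# each 60-minute block, four blocks per day, for seven days.
# The block start hours (9, 12, 15, 18) are assumed, not taken from the paper.
import random
from datetime import datetime, timedelta

BLOCK_START_HOURS = [9, 12, 15, 18]   # hypothetical block anchors
STUDY_DAYS = 7

def build_schedule(first_day: datetime) -> list:
    alarms = []
    for day in range(STUDY_DAYS):
        for hour in BLOCK_START_HOURS:
            offset = random.randint(0, 59)   # semi-random minute within the block
            alarms.append(first_day.replace(hour=hour, minute=0, second=0, microsecond=0)
                          + timedelta(days=day, minutes=offset))
    return alarms

for alarm in build_schedule(datetime(2018, 6, 4)):
    print(alarm.strftime("%Y-%m-%d %H:%M"))
```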

Pathway enrichment and protein interaction network analysis for milk yield, fat yield and age at first calving in a Thai multibreed dairy population

  • Laodim, Thawee;Elzo, Mauricio A.;Koonawootrittriron, Skorn;Suwanasopee, Thanathip;Jattawa, Danai
    • Asian-Australasian Journal of Animal Sciences / 제32권4호 / pp.508-518 / 2019
  • Objective: This research aimed to determine biological pathways and protein-protein interaction (PPI) networks for 305-d milk yield (MY), 305-d fat yield (FY), and age at first calving (AFC) in the Thai multibreed dairy population. Methods: Genotypic information consisted of 75,776 imputed and actual single nucleotide polymorphisms (SNP) from 2,661 animals. Single-step genomic best linear unbiased prediction was used to estimate SNP genetic variances for MY, FY, and AFC. Fixed effects included herd-year-season, breed regression, and heterosis regression effects; random effects were animal additive genetic and residual effects. Individual SNP explaining at least 0.001% of the genetic variance for each trait were used to identify nearby genes in the National Center for Biotechnology Information database. Pathway enrichment analysis was then performed, and the PPI of these genes were identified and visualized as a PPI network. Results: The identified genes were involved in 16 enriched pathways related to MY, FY, and AFC. Most genes had two or more connections with other genes in the PPI network. Genes associated with MY, FY, and AFC through the biological pathways and PPI were primarily involved in cellular processes. The genes in the enriched pathways (303) explained 2.63% of the genetic variance for MY, 2.59% for FY, and 2.49% for AFC, while the genes in the PPI network (265) explained 2.28%, 2.26%, and 2.12%, respectively. Conclusion: These sets of SNP associated with genes in the enriched pathways and the PPI network could be used as genomic selection targets in the Thai multibreed dairy population. This work should be continued in this and other populations because predicted SNP values will likely differ across populations under different environmental conditions and may change over time.
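
A rough sketch of the SNP filtering step mentioned above (retaining SNP that explain at least 0.001% of the genetic variance for a trait) is shown below; the table and column names are hypothetical stand-ins, and the gene mapping against the NCBI database is only indicated in a comment.

```python
# Sketch: keep SNPs whose share of the genetic variance for a trait is at least 0.001%.
# The per-SNP variance shares are synthetic; column names are hypothetical.
import pandas as pd

snp_var = pd.DataFrame({
    "snp_id": [f"SNP{i}" for i in range(1, 7)],
    "pct_genetic_variance_MY": [0.0004, 0.0021, 0.0110, 0.0007, 0.0015, 0.0032],
})

selected = snp_var[snp_var["pct_genetic_variance_MY"] >= 0.001]
print(f"{len(selected)} of {len(snp_var)} SNPs retained for gene annotation")
print(selected["snp_id"].tolist())
# Retained SNPs would then be mapped to nearby genes (e.g., via NCBI annotation)
# before pathway enrichment and PPI network analysis.
```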

게이트심장혈액풀검사에서 딥러닝 기반 좌심실 영역 분할방법의 유용성 평가 (Evaluating Usefulness of Deep Learning Based Left Ventricle Segmentation in Cardiac Gated Blood Pool Scan)

  • 오주영;정의환;이주영;박훈희
    • 대한방사선기술학회지:방사선기술과학 / 제45권2호 / pp.151-158 / 2022
  • The cardiac gated blood pool (GBP) scan, a nuclear medicine imaging study, calculates the left ventricular ejection fraction (EF) by segmenting the left ventricle from the heart. However, accurately segmenting the substructures of the heart requires specialized knowledge of cardiac anatomy, and the calculated left ventricular EF may differ depending on the expert's processing. In this study, a DeepLabV3 architecture with a ResNet-50 backbone was trained on 93 GBP training images. The trained model was then applied to a separate test set of 23 GBP studies to evaluate the reproducibility of the region of interest (ROI) and the left ventricular EF. Pixel accuracy, Dice coefficient, and IoU for the ROI were 99.32±0.20%, 94.65±1.45%, and 89.89±2.62% at the diastolic phase, and 99.26±0.34%, 90.16±4.19%, and 82.33±6.69% at the systolic phase, respectively. The left ventricular EF averaged 60.37±7.32% for the human-set ROI and 58.68±7.22% for the ROI set by the deep learning segmentation model (p<0.05). The automated segmentation method presented in this study predicts ROIs and left ventricular EFs similar to the human-set values when an arbitrary GBP image is given as input. If such automatic segmentation methods are further developed and applied to nuclear medicine cardiac examinations that require ROI definition, they are expected to greatly improve the efficiency and accuracy of processing and analysis by nuclear medicine specialists.
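
To make the reported metrics concrete, the following sketch computes pixel accuracy, Dice coefficient, IoU, and a count-based ejection fraction from binary left-ventricle masks; it is a generic illustration on toy arrays, not the study's processing chain (background correction is omitted).

```python
# Sketch: segmentation metrics and a count-based ejection fraction from binary LV masks.
# Generic illustration only; the study's background correction and exact ROI handling
# are not reproduced here.
import numpy as np

def pixel_accuracy(pred: np.ndarray, gt: np.ndarray) -> float:
    return float((pred == gt).mean())

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    inter = np.logical_and(pred, gt).sum()
    return float(2.0 * inter / (pred.sum() + gt.sum()))

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    inter = np.logical_and(pred, gt).sum()
    return float(inter / np.logical_or(pred, gt).sum())

def ejection_fraction(ed_counts: float, es_counts: float) -> float:
    """EF (%) from counts inside the LV ROI at end-diastole and end-systole."""
    return 100.0 * (ed_counts - es_counts) / ed_counts

rng = np.random.default_rng(0)
gt_mask = rng.random((64, 64)) > 0.7          # toy "ground-truth" LV mask
pred_mask = gt_mask.copy()
pred_mask[:2, :] = False                       # simulate a small segmentation error
print(dice(pred_mask, gt_mask), iou(pred_mask, gt_mask), pixel_accuracy(pred_mask, gt_mask))
print(ejection_fraction(ed_counts=12_000.0, es_counts=4_800.0))   # -> 60.0 (%)
```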

Privacy-Preserving Traffic Volume Estimation by Leveraging Local Differential Privacy

  • Oh, Yang-Taek;Kim, Jong Wook
    • 한국컴퓨터정보학회논문지 / 제26권12호 / pp.19-27 / 2021
  • This paper presents a method for predicting traffic volume using deep learning together with vehicle location data collected in a privacy-preserving manner through local differential privacy (LDP). The proposed method consists of a data collection phase and a phase that predicts traffic volume from the collected data. In the first phase, LDP is applied when collecting vehicle location data to address the privacy violations that can occur during data collection: noise is added to the original data so that users' sensitive data are not exposed, allowing vehicle location data to be gathered while preserving drivers' privacy. In the second phase, deep learning is applied to the data collected in the first phase to predict traffic volume. To demonstrate the effectiveness of the proposed method, a performance evaluation using real data is also conducted; the results show that the proposed method can effectively predict traffic volume from data collected while protecting users' privacy.
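
The abstract does not specify which LDP mechanism was applied, so as one illustrative possibility the sketch below uses k-ary randomized response on grid-cell location reports, together with the standard unbiased frequency estimator on the collector side.

```python
# Sketch: k-ary randomized response as an example LDP mechanism for location reports.
# Each vehicle perturbs its grid-cell index locally; the server estimates per-cell counts.
# The specific mechanism is an illustrative assumption, not the paper's exact protocol.
import numpy as np

def perturb(cell: int, k: int, epsilon: float, rng: np.random.Generator) -> int:
    p_true = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return cell
    # otherwise report one of the other k-1 cells uniformly at random
    other = rng.integers(0, k - 1)
    return other if other < cell else other + 1

def estimate_counts(reports: np.ndarray, k: int, epsilon: float) -> np.ndarray:
    n = len(reports)
    p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    q = 1.0 / (np.exp(epsilon) + k - 1)
    observed = np.bincount(reports, minlength=k)
    return (observed - n * q) / (p - q)        # unbiased estimate of true per-cell counts

rng = np.random.default_rng(7)
k, epsilon = 25, 2.0                           # 5x5 grid of regions, privacy budget
true_cells = rng.integers(0, k, size=50_000)   # synthetic "true" vehicle locations
reports = np.array([perturb(c, k, epsilon, rng) for c in true_cells])
print(np.round(estimate_counts(reports, k, epsilon))[:5])
```

The estimated per-cell counts can then serve as the traffic-volume features fed to the downstream prediction model.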

Cloud Removal Using Gaussian Process Regression for Optical Image Reconstruction

  • Park, Soyeon;Park, No-Wook
    • 대한원격탐사학회지 / 제38권4호 / pp.327-341 / 2022
  • Cloud removal is often required to construct time-series sets of optical images for environmental monitoring. In regression-based cloud removal, the choice of regression model and the impact of the input images significantly affect prediction performance. This study evaluates the potential of Gaussian process (GP) regression for cloud removal and analyzes the effects of cloud-free optical images and spectral bands on prediction performance. Unlike other machine learning-based regression models, GP regression provides uncertainty information and automatically optimizes its hyperparameters. An experiment using Sentinel-2 multi-spectral images was conducted for cloud removal in two agricultural regions. The prediction performance of GP regression was compared with that of random forest (RF) regression, and various combinations of input images and multi-spectral bands were considered for quantitative evaluation. The experimental results showed that using multi-temporal images with multi-spectral bands as inputs achieved the best prediction accuracy; highly correlated adjacent multi-spectral bands and temporally correlated multi-temporal images improved prediction accuracy. GP regression was significantly better than RF regression at predicting the near-infrared band, as estimating the distribution of the input data allows GP regression to reflect variations of the considered spectral band over a broader range. In particular, GP regression was superior to RF regression in reproducing structural patterns at both sites in terms of structural similarity. In addition, the uncertainty information provided by GP regression showed reasonable similarity to the prediction errors for some sub-areas, indicating that uncertainty estimates may be used to measure the quality of the prediction results. These findings suggest that GP regression could be beneficial for cloud removal and optical image reconstruction, and the impact analysis of the input images provides guidelines for selecting optimal images for regression-based cloud removal.
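
A compact sketch of regression-based gap filling with scikit-learn's GaussianProcessRegressor (which returns a predictive standard deviation) alongside a random forest baseline is given below; the synthetic inputs merely stand in for the Sentinel-2 bands and are not the paper's experimental setup.

```python
# Sketch: Gaussian process vs. random forest regression for predicting a cloud-affected
# band from cloud-free reference bands. Synthetic data stand in for Sentinel-2 imagery.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 3))                    # stand-ins for cloud-free reference bands
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.05, size=n)   # target band (e.g., NIR)

X_train, y_train, X_test, y_test = X[:800], y[:800], X[800:], y[800:]

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X_train, y_train)
gp_mean, gp_std = gpr.predict(X_test, return_std=True)   # gp_std gives per-pixel uncertainty

rf = RandomForestRegressor(n_estimators=200, random_state=1).fit(X_train, y_train)
rf_pred = rf.predict(X_test)

print("GP RMSE:", np.sqrt(np.mean((gp_mean - y_test) ** 2)))
print("RF RMSE:", np.sqrt(np.mean((rf_pred - y_test) ** 2)))
```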

ACA: Automatic search strategy for radioactive source

  • Jianwen Huo;Xulin Hu;Junling Wang;Li Hu
    • Nuclear Engineering and Technology / 제55권8호 / pp.3030-3038 / 2023
  • Nowadays, mobile robots are used to search for uncontrolled radioactive sources in indoor environments so that technicians can avoid radiation exposure. However, in indoor environments, especially in the presence of obstacles, making robots with limited sensing capabilities search for a radioactive source automatically remains a major challenge, and the search efficiency of such robots needs to be further improved to meet practical constraints such as limited exploration time. This paper proposes an automatic source search strategy, abbreviated ACA: the location of the source is estimated by a convolutional neural network (CNN), and the path is planned by the A-star algorithm. First, the search area is represented as an occupancy grid map. Then, the radiation dose distribution of the radioactive source in the occupancy grid map is obtained by Monte Carlo (MC) simulation, and multiple sets of radiation data are collected with the eight-neighborhood self-avoiding random walk (ENSAW) algorithm to form the radiation data set. This data set is fed into the designed CNN architecture to train the network model in advance. When the searcher enters a search area containing a radioactive source, the location of the source is estimated by the network model and the search path is planned by the A-star algorithm; this process is iterated until the searcher reaches the location of the radioactive source. The experimental results show that the average number of radiometric measurements and the average number of moving steps of the ACA algorithm are only 2.1% and 33.2% of those of the gradient search (GS) algorithm in an indoor environment without obstacles. In an indoor environment shielded by concrete walls, the GS algorithm fails to find the source, while the ACA algorithm successfully finds it with fewer moving steps and sparser radiometric data.
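
The path-planning half of ACA is standard A* search on an occupancy grid; a self-contained sketch follows (4-connected moves, Manhattan heuristic). The CNN-based source-location estimator is not reproduced here, and the grid below is a made-up example.

```python
# Sketch: A* path planning on a binary occupancy grid (1 = obstacle, 0 = free).
# Only the planning step of ACA is illustrated; the CNN location estimator is omitted.
import heapq
import itertools

def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])   # Manhattan heuristic
    tie = itertools.count()                                    # tie-breaker for the heap
    open_heap = [(h(start), next(tie), 0, start, None)]
    came_from, g_cost = {}, {start: 0}
    while open_heap:
        _, _, g, cur, parent = heapq.heappop(open_heap)
        if cur in came_from:                                   # already expanded
            continue
        came_from[cur] = parent
        if cur == goal:                                        # reconstruct the path
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):      # 4-connected moves
            nxt = (cur[0] + dr, cur[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                ng = g + 1
                if ng < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    heapq.heappush(open_heap, (ng + h(nxt), next(tie), ng, nxt, cur))
    return None                                                # no path found

occupancy = [[0, 0, 0, 0],
             [1, 1, 0, 1],
             [0, 0, 0, 0],
             [0, 1, 1, 0]]
print(astar(occupancy, start=(0, 0), goal=(3, 3)))
```

In the full strategy, the goal cell would be the source location currently estimated by the CNN, and planning would be repeated as that estimate is refined.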

석씨성경과 천상열차분야지도의 이십팔수 수거성 관측 연도의 통계적 추정 (Statistical estimation of the epochs of observation for the 28 determinative stars in the Shi Shi Xing Jing and the table in Cheonsang Yeolcha Bunyajido)

  • 안상현
    • 천문학회보 / 제44권2호 / pp.61.3-61.3 / 2019
  • We estimated, using two methods, the epochs at which the coordinates of the determinative stars of the 28 lunar lodges listed in the Shi Shi Xing Jing and in the explanatory table of Cheonsang Yeolcha Bunyajido were measured. The coordinates in these two tables are thought to have been measured with a meridian instrument, so the values contain errors caused by a misaligned rotation axis of the instrument as well as random errors. We adopted a Fourier method and also devised a new least-squares method, and we performed bootstrap resampling to obtain the variance of the estimated epochs. As a result, we found that both star catalogs were compiled in the first century BCE, that is, in the later Former Han period. The epoch of observation of the Shi Shi Xing Jing appears to precede that of the coordinates in Cheonsang Yeolcha Bunyajido by about 15-20 years. However, because the variances of the two epoch estimates are too large, we could not confirm the estimates that the Shi Shi Xing Jing was measured around 77 BCE and the star table of Cheonsang Yeolcha Bunyajido around 52 BCE. With more data, or with measurement errors roughly half as large, such a test could be decisive. In light of these points, we also discuss the coordinates of the 120 stars listed in the Shi Shi Xing Jing.
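
The bootstrap step can be illustrated generically: resample the stars with replacement, re-run the epoch estimator, and take the spread of the re-estimates as the variance. In the sketch below, estimate_epoch is only a placeholder for the paper's Fourier or least-squares estimator, and the input values are synthetic.

```python
# Sketch: bootstrap resampling to attach a variance to an observation-epoch estimate.
# estimate_epoch() is a placeholder for the paper's Fourier / least-squares estimator,
# and the synthetic per-star values stand in for the 28 determinative-star coordinates.
import numpy as np

def estimate_epoch(per_star_values: np.ndarray) -> float:
    """Placeholder estimator returning a single epoch (year, negative = BCE)."""
    return float(per_star_values.mean())

def bootstrap_epoch(values: np.ndarray, n_boot: int = 2000, seed: int = 0):
    rng = np.random.default_rng(seed)
    n = len(values)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        resample = values[rng.integers(0, n, size=n)]   # draw n stars with replacement
        estimates[b] = estimate_epoch(resample)
    return estimates.mean(), estimates.std(ddof=1)

values = np.random.default_rng(1).normal(loc=-77.0, scale=60.0, size=28)
mean_epoch, epoch_sd = bootstrap_epoch(values)
print(f"epoch = {mean_epoch:.0f} +/- {epoch_sd:.0f} (bootstrap SD)")
```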


A Hybrid Multi-Level Feature Selection Framework for prediction of Chronic Disease

  • G.S. Raghavendra;Shanthi Mahesh;M.V.P. Chandrasekhara Rao
    • International Journal of Computer Science & Network Security / 제23권12호 / pp.101-106 / 2023
  • Chronic illnesses are among the most common serious problems affecting human health. Early diagnosis of chronic diseases can help avoid or mitigate their consequences, potentially decreasing mortality rates. Using machine learning algorithms to identify risk factors is a promising strategy. The issue with existing feature selection approaches is that each method provides a distinct set of properties that affect model correctness, and current methods do not perform well on large multidimensional datasets. We introduce a novel model containing a feature selection approach that selects optimal characteristics from large multidimensional data sets to provide reliable predictions of chronic illnesses without sacrificing data uniqueness.[1] To ensure the success of the proposed model, we employed balanced classes through hybrid balanced-class sampling methods on the original dataset, along with data pre-processing and data transformation methods, to provide credible data for the training model. We ran and assessed the model on datasets with binary and multi-valued classifications, using multiple datasets (Parkinson, arrhythmia, breast cancer, kidney, diabetes). Suitable features are selected with a hybrid feature model consisting of LassoCV, decision tree, random forest, gradient boosting, AdaBoost, and stochastic gradient descent, followed by voting on the attributes that these methods select in common. The accuracy on the original dataset before applying the framework is recorded and evaluated against the accuracy on the reduced attribute set, and the results are shown separately for comparison. Based on the analysis of the results, we conclude that the proposed model produced higher accuracy on multi-valued class datasets than on binary class datasets.[1]
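
A condensed sketch of the voting idea described above follows: each selector (LassoCV, decision tree, random forest, gradient boosting, AdaBoost, SGD) nominates features via SelectFromModel, and features nominated by a simple majority are kept. The demo dataset and thresholds are assumptions, not the paper's configuration.

```python
# Sketch: majority-vote feature selection across several selectors, as described above.
# The demo dataset (breast cancer) and default thresholds are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV, SGDClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

selectors = [
    LassoCV(cv=5),
    DecisionTreeClassifier(random_state=0),
    RandomForestClassifier(n_estimators=200, random_state=0),
    GradientBoostingClassifier(random_state=0),
    AdaBoostClassifier(random_state=0),
    SGDClassifier(penalty="l1", random_state=0),
]

votes = np.zeros(X.shape[1], dtype=int)
for est in selectors:
    mask = SelectFromModel(est).fit(X, y).get_support()   # features this selector keeps
    votes += mask.astype(int)

selected = np.where(votes >= len(selectors) // 2 + 1)[0]  # keep features with a majority
print(f"{len(selected)} features selected:", selected)
```

The reduced attribute set produced this way would then be compared against the full attribute set in terms of downstream classification accuracy, as the abstract describes.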

통계적 품질관리를 위한 왜도의 활용 (Utilization of Skewness for Statistical Quality Control)

  • 김훈태;임성욱
    • 품질경영학회지 / 제51권4호 / pp.663-675 / 2023
  • Purpose: Skewness is an indicator used to measure the asymmetry of data distribution. In the past, product quality was judged only by mean and variance, but in modern management and manufacturing environments, various factors and volatility must be considered. Therefore, skewness helps accurately understand the shape of data distribution and identify outliers or problems, and skewness can be utilized from this new perspective. Therefore, we would like to propose a statistical quality control method using skewness. Methods: In order to generate data with the same mean and variance but different skewness, data was generated using normal distribution and gamma distribution. Using Minitab 18, we created 20 sets of 1,000 random data of normal distribution and gamma distribution. Using this data, it was proven that the process state can be sensitively identified by using skewness. Results: As a result of the analysis of this study, if the skewness is within ± 0.2, there is no difference in judgment from management based on the probability of errors that can be made in the management state as discussed in quality control. However, if the skewness exceeds ±0.2, the control chart considering only the standard deviation determines that it is in control, but it can be seen that the data is out of control. Conclusion: By using skewness in process management, the ability to evaluate data quality is improved and the ability to detect abnormal signals is excellent. By using this, process improvement and process non-sub-stitutability issues can be quickly identified and improved.