• Title/Summary/Keyword: data sampling

Improvement of SOM using Stratification

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems / v.9 no.1 / pp.36-41 / 2009
  • The self-organizing map (SOM) is an unsupervised method based on competitive learning. Many clustering studies have used SOM, and its output provides a visualization of the data. This visualization has been used to support decision making in descriptive data mining, as a form of exploratory data analysis. In this paper we propose an improvement of SOM using stratified sampling from statistics. Stratification improves the performance of SOM. To verify the improvement, we conduct comparative experiments using data sets from the UCI machine learning repository and simulated data.
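    To make the idea of stratified sampling concrete, here is a minimal sketch using proportional allocation. It is not the authors' procedure: the function name `stratified_sample`, the toy records, and the 20% fraction are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, fraction, seed=0):
    """Draw the same fraction from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Keep at least one record so small strata are never dropped.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))  # without replacement
    return sample

# Hypothetical toy data: 90 records in stratum "a", 10 in stratum "b".
data = [("a", i) for i in range(90)] + [("b", i) for i in range(10)]
subset = stratified_sample(data, stratum_of=lambda r: r[0], fraction=0.2)
counts = {label: sum(1 for r in subset if r[0] == label) for label in ("a", "b")}
print(counts)
```

    The subset keeps the 9:1 ratio of the full data, which a simple random sample of the same size would not guarantee.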

Sampling Study on Environmental Observations: Precipitation, Soil Moisture and Land Cover Information

  • 유철상
    • Journal of Environmental Science International / v.5 no.2 / pp.103-112 / 1996
  • Observational data is integral to our understanding of the present climate, its natural variability, and any change due to anthropogenic effects. This study gives a brief overview of sampling requirements using data from the first ISLSCP Field Experiment (FIFE) in 1987, a multi-disciplinary field experiment over a 15 km grid in Konza Prairie, USA. Sampling strategies were designed for precipitation and soil moisture measurements and for detecting land cover type. It was concluded that up to 8 rain gages would be needed for valuable precipitation measurements covering the whole FIFE catchment, but only one soil moisture station. Results show that the sampling error is reduced as new gages or stations are added to the catchment, but the improvement in error performance diminishes as the number of gages or stations increases. Sampling from remotely sensed instruments shows different results: the sampling error at larger resolution sizes is small because of competing error contributions from commission and omission errors.
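    The diminishing return from adding gages can be illustrated with a deliberately simplified model. The sketch below assumes independent gages with equal error variance, so the error of the areal mean is sigma / sqrt(n). This is an assumption made for illustration only: real rainfall fields are spatially correlated, the study's own error model is not given in the abstract, and the value of `sigma` is hypothetical.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean of n independent measurements."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical single-gage error, arbitrary units
errors = [standard_error(sigma, n) for n in range(1, 9)]
# Reduction in error gained by adding one more gage.
gains = [errors[i] - errors[i + 1] for i in range(len(errors) - 1)]

for n, err in enumerate(errors, start=1):
    print(f"{n} gages: sampling error {err:.2f}")
```

    Under this model each added gage still lowers the error, but by less than the one before it.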

Super-Resolution Image Processing Algorithm Using Hybrid Up-sampling (하이브리드 업샘플링을 이용한 베이시안 초해상도 영상처리)

  • Park, Jong-Hyun;Kang, Moon-Gi
    • The Transactions of The Korean Institute of Electrical Engineers / v.57 no.2 / pp.294-302 / 2008
  • In this paper, we present a new image up-sampling method that registers low-resolution images to the high-resolution grid when Bayesian super-resolution image processing is performed. The proposed method interpolates high-resolution pixels using high-frequency data from all the low-resolution images, instead of up-sampling each low-resolution image separately. The interpolation is based on B-spline non-uniform re-sampling, adjusted for super-resolution image processing. The experimental results demonstrate the effects of applying commonly used up-sampling methods, such as zero-padding or bilinear interpolation, to super-resolution image reconstruction. We then show that the proposed hybrid up-sampling method generates high-resolution images more accurately than conventional methods, using quantitative and qualitative assessment measures.
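    For readers unfamiliar with the two conventional baselines named above, the following sketch shows them on a single image row. It does not implement the proposed hybrid method. "Zero-padding" is read here as inserting zeros between samples on the finer grid, which is one common interpretation and an assumption; linear interpolation is the 1-D analogue of bilinear interpolation. The function names and the sample row are hypothetical.

```python
def zero_insert_upsample(signal, factor):
    """Place each sample on the finer grid and fill the gaps with zeros."""
    out = [0.0] * (len(signal) * factor)
    for i, value in enumerate(signal):
        out[i * factor] = float(value)
    return out

def linear_upsample(signal, factor):
    """Linearly interpolate between neighbouring samples."""
    out = []
    for i in range(len(signal) - 1):
        left, right = signal[i], signal[i + 1]
        for step in range(factor):
            t = step / factor
            out.append((1 - t) * left + t * right)
    out.append(float(signal[-1]))  # keep the final sample
    return out

row = [0.0, 4.0, 8.0]
print(zero_insert_upsample(row, 2))  # [0.0, 0.0, 4.0, 0.0, 8.0, 0.0]
print(linear_upsample(row, 2))       # [0.0, 2.0, 4.0, 6.0, 8.0]
```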

Comparison of Sampling and Wall-to-Wall Methodologies for Reporting the GHG Inventory of the LULUCF Sector in Korea (LULUCF 부문 산림 온실가스 인벤토리 구축을 위한 Sampling과 Wall-to-Wall 방법론 비교)

  • Park, Eunbeen;Song, Cholho;Ham, Boyoung;Kim, Jiwon;Lee, Jongyeol;Choi, Sol-E;Lee, Woo-Kyun
    • Journal of Climate Change Research / v.9 no.4 / pp.385-398 / 2018
  • Although the importance of developing a reliable and systematic GHG inventory has increased, GIS/RS-based national-scale analysis of the LULUCF (Land Use, Land-Use Change and Forestry) sector is insufficient in the context of the Paris Agreement. In this study, the change in $CO_2$ storage of forest land due to land use change from 2000 to 2010 is estimated using two GIS/RS methodologies, the Sampling and Wall-to-Wall methods. The Sampling method uses various imagery with sampling data, and the Wall-to-Wall method uses land cover maps. The land use matrices of these methodologies and the national cadastral statistics are classified into six land-use categories (Forest land, Cropland, Grassland, Wetlands, Settlements, and Other land). The difference in area between the result of the Sampling method and the cadastral statistics decreases as the sample plot distance decreases. However, the difference is not significant for sample plot distances under 2 km. In the 2000s, the Wall-to-Wall method showed similar results to sampling under a 2 km distance except for the Settlements category. With the Wall-to-Wall method, $CO_2$ storage is higher than with the Sampling method. Accordingly, the Wall-to-Wall method would be more advantageous than the Sampling method when sufficient spatial data is available for GHG inventory assessment. These results can contribute to establishing an annual reporting system for the national greenhouse gas inventory in the LULUCF sector.

CHAID Algorithm by Cube-based Proportional Sampling

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Journal of the Korean Data and Information Science Society / v.15 no.4 / pp.803-816 / 2004
  • The decision tree approach is most useful in classification problems and divides the search space into rectangular regions. Decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing, fraud detection, data reduction and variable screening, and category merging. CHAID uses the chi-squared statistic to determine splits and is an exploratory method for studying the relationship between a dependent variable and a series of predictor variables. In this paper we propose a CHAID algorithm based on cube-based proportional sampling and examine its accuracy and speed as the number of variables changes.
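    The chi-squared statistic that CHAID relies on can be computed from a contingency table as sketched below. This is only the core statistic: a full CHAID implementation also merges categories and compares adjusted p-values, which this sketch omits. The predictor names and counts are hypothetical.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the predictor and the class were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows are predictor categories, columns are classes.
candidates = {
    "region": [[30, 10], [10, 30]],
    "gender": [[21, 19], [19, 21]],
}
scores = {name: chi_squared(table) for name, table in candidates.items()}
# Both tables have the same degrees of freedom, so the larger statistic
# marks the stronger association with the dependent variable.
best = max(scores, key=scores.get)
print(scores, best)
```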

Application of Random Over Sampling Examples(ROSE) for an Effective Bankruptcy Prediction Model (효과적인 기업부도 예측모형을 위한 ROSE 표본추출기법의 적용)

  • Ahn, Cheolhwi;Ahn, Hyunchul
    • The Journal of the Korea Contents Association / v.18 no.8 / pp.525-535 / 2018
  • If the frequency of one class is far higher than that of the other classes in a classification problem, a data imbalance problem occurs, which distorts machine learning. Corporate bankruptcy prediction often suffers from data imbalance because the ratio of insolvent companies is generally very low, whereas the ratio of solvent companies is very high. To mitigate these problems, a proper sampling technique must be applied. Until now, oversampling techniques, which adjust the class distribution of a data set by sampling the minority class with replacement, have been widely used. However, they carry a risk of overfitting. Against this background, this study applies the ROSE (Random Over Sampling Examples) technique, proposed by Menardi and Torelli in 2014, to effective corporate bankruptcy prediction. The ROSE technique creates new learning samples by synthesizing the existing samples, so it leads to better prediction accuracy of the classifiers while avoiding the risk of overfitting. Specifically, our study proposes combining the ROSE method with SVM (support vector machine), which is known as the best binary classifier. We applied the proposed method to a real-world bankruptcy prediction case from a major Korean bank and compared its performance with other sampling techniques. Experimental results showed that ROSE improved the prediction accuracy of SVM in bankruptcy prediction compared to other techniques, with statistical significance. These results suggest that ROSE can be a good alternative for resolving data imbalance in social science prediction problems other than bankruptcy prediction.
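    The contrast between conventional oversampling and synthetic-sample generation can be sketched as follows. The first function resamples minority rows with replacement, as described above. The second is a simplified smoothed bootstrap in the spirit of ROSE, not the published algorithm or the ROSE package: the fixed `bandwidth`, the function names, and the toy rows are assumptions.

```python
import random

def random_oversample(minority, target_size, seed=0):
    """Conventional oversampling: resample minority rows with replacement."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra

def smoothed_oversample(minority, target_size, bandwidth=0.1, seed=0):
    """Pick a minority row, then jitter each feature with Gaussian noise,
    so the new rows are near the originals but not exact duplicates."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(target_size - len(minority)):
        base = rng.choice(minority)
        synthetic.append(tuple(x + rng.gauss(0.0, bandwidth) for x in base))
    return minority + synthetic

# Hypothetical minority-class rows with two features each.
minority = [(1.0, 2.0), (1.5, 1.8), (0.9, 2.2)]
duplicated = random_oversample(minority, 10)
synthesized = smoothed_oversample(minority, 10)
print(len(set(duplicated)), len(set(synthesized)))
```

    Exact duplicates let a classifier memorize a few points, which is the overfitting risk the abstract mentions; jittered rows spread the minority class over a neighbourhood instead.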

Probability Sampling to Select Polling Places in Exit Poll (출구조사를 위한 투표소 확률추출 방법)

  • Kim, Young-Won;Uhm, Yoon-Hee
    • Survey Research / v.6 no.2 / pp.1-32 / 2005
  • The accuracy of an exit poll depends mainly on how voting places are sampled. For exit polls, we propose a probability sampling method for selecting voting places as an alternative to bellwether polling place sampling. In an empirical study based on the 2004 general election data, the efficiency of the suggested systematic sampling from ordered voting places was evaluated in terms of mean prediction error, and the proposed method outperformed bellwether polling place sampling. We also calculated the variance of the estimator under the proposed sampling and considered the sample size needed to guarantee the target precision, using the design effect of the proposed sample design.
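    Systematic sampling from an ordered list can be sketched as below. The abstract does not say how the voting places were ordered or how many were drawn, so the ordering, the sample size, and the name `systematic_sample` are assumptions for illustration.

```python
import random

def systematic_sample(ordered_units, n, seed=0):
    """Take every k-th unit from an ordered list after a random start."""
    rng = random.Random(seed)
    interval = len(ordered_units) / n      # sampling interval k
    start = rng.random() * interval        # random start in [0, k)
    last = len(ordered_units) - 1
    return [ordered_units[min(int(start + i * interval), last)]
            for i in range(n)]

# Hypothetical frame: 100 voting places already sorted by some
# ordering variable (for example, a past vote share).
voting_places = list(range(100))
chosen = systematic_sample(voting_places, 10)
print(chosen)
```

    Because the picks are spread evenly along the ordering, every part of the ordered range is represented in the sample.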

Time-Balanced Quota Sampling for Telephone Survey (전화조사를 위한 시간균형할당표본추출)

  • Huh, Myung-Hoe;Hwang, Jin-Mo
    • Survey Research / v.7 no.2 / pp.39-52 / 2006
  • Most Korean survey institutions adopt quota sampling for telephone surveys based on region, gender and age band. It is well known that on weekdays the daytime at-home rate differs substantially by individuals' socio-demographic attributes, so quota sampling may induce systematic respondent selection bias. To solve the problem, we propose "time-balanced quota sampling", in which the interviewer's call time band is added as a quota variable. Furthermore, we propose "time-balanced quasi-quota sampling", which is derived by partially relaxing the evening time quotas in time-balanced quota sampling. We compare the conventional and the newly proposed quota sampling schemes by drawing Monte Carlo samples from a hypothetical population based on the Korea 2004 time use survey data.
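    The mechanics of adding call time band as a quota variable can be sketched as follows. This is an illustrative reduction, not the authors' design: it uses only gender and time band as quota cells, and the respondent stream, cell labels, and quota sizes are hypothetical.

```python
def quota_sample(call_stream, quotas):
    """Accept respondents in call order until each quota cell is full."""
    remaining = dict(quotas)
    accepted = []
    for respondent in call_stream:
        cell = respondent["cell"]
        if remaining.get(cell, 0) > 0:
            accepted.append(respondent)
            remaining[cell] -= 1
    return accepted

# Hypothetical call stream: each respondent has a (gender, time band) cell.
calls = [
    {"id": i, "cell": (("F", "M")[i % 2], ("day", "evening")[(i // 2) % 2])}
    for i in range(20)
]
# Time-balanced quotas: the same number of interviews per gender in each band.
quotas = {(g, t): 2 for g in ("F", "M") for t in ("day", "evening")}
accepted = quota_sample(calls, quotas)
filled = {cell: sum(1 for r in accepted if r["cell"] == cell) for cell in quotas}
print(filled)
```

    Without the time band in the cell key, the gender quotas could be filled entirely by daytime calls, which is the selection bias the abstract describes.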

A Comparative Analysis on Medical and Korean Medical Service Tendency of Total Knee Arthroplasty Patients Using Patients Sample Data of Health Insurance Review and Assessment Service (슬관절 전치환술 환자의 의과 및 한의과 의료기관 이용 현황 비교 분석: 건강보험심사평가원 표본 데이터를 이용하여)

  • Park, Joo-sung;Kim, Nam-Kwen;Song, Yun-kyung
    • Journal of Korean Medicine Rehabilitation / v.29 no.1 / pp.31-39 / 2019
  • Objectives: To provide a basis for future Korean Medicine research on total knee arthroplasty patients by analyzing medical and Korean Medical service utilization and treatment duration. Methods: Data sampling was performed on Health Insurance Review and Assessment Service patient data of 2015 (confidence level of 97%) to analyze patients' medical and Korean Medical service tendency. Patients were divided into two sampling groups: i) patients who completed their treatment within 5 months of total knee arthroplasty, and ii) patients who continued their treatment after 5 months of total knee arthroplasty. Service tendency and individual characteristics were examined for each group. Results: A total of 1,655 patients had undergone total knee arthroplasty out of the 1,453,486 patients gathered for sampling. The first sampling group (patients who completed their treatment within 5 months of total knee arthroplasty) had 287 patients and the second sampling group (patients who continued their treatment after 5 months of total knee arthroplasty) had 385 patients. The proportion of patients who visited Korean Medical service in the first sampling group was lower than that in the second sampling group. Conclusions: Medical and Korean Medical service use and cost in the second group (patients who continued their treatment after 5 months of total knee arthroplasty) were higher than in the first group (patients who completed their treatment within 5 months of total knee arthroplasty). Further study is recommended on efficient medical and Korean Medical service and reduced cost.

Novel Compressed Sensing Techniques for Realistic Image (실감 영상을 위한 압축 센싱 기법)

  • Lee, Sun Yui;Jung, Kuk Hyun;Kim, Jin Young;Park, Gooman
    • Journal of Satellite, Information and Communications / v.9 no.3 / pp.59-63 / 2014
  • This paper describes the basic principles of a 3D broadcast system and proposes a new 3D broadcast technology that reduces the amount of data by applying compressed sensing (CS). The differences between sampling theory and the CS concept are described, as are two recently proposed CS algorithms, AMP (Approximate Message Passing) and CoSaMP (Compressive Sampling Matched Pursuit). The paper compares the accuracy of the two algorithms and the computation time needed to compress and restore image data with each. As a result, a low-complexity algorithm for the 3D broadcast system is identified.