Search | Korea Science

Heterogeneous Ensemble of Classifiers from Under-Sampled and Over-Sampled Data for Imbalanced Data

Kang, Dae-Ki;Han, Min-gyu
- International journal of advanced smart convergence / v.8 no.1 / pp.75-81 / 2019
Data imbalance is a common problem that seriously degrades machine learning. Sampling is one of the effective methods for addressing it. Over-sampling increases the number of instances, so on imbalanced data it is applied to the minority class. Under-sampling reduces the number of instances and is usually performed on the majority class. We apply under-sampling and over-sampling to imbalanced data to generate sampled data sets. From these sampled data sets and the original data set, we construct a heterogeneous ensemble of classifiers, applying five different algorithms to the ensemble. Experimental results on an intrusion detection dataset, used as an imbalanced dataset, show that our approach is effective.
https://doi.org/10.7236/IJASC.2019.8.1.75
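
The abstract above does not say which sampling algorithms were used, so the following is only a minimal sketch of plain random under-sampling and over-sampling. The function names, the toy labels, and the seed are assumptions made for illustration; the ensemble step (training several classifiers on the original and sampled sets) is not shown.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows under their class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(members) for members in groups.values())
    out = []
    for label, members in groups.items():
        out.extend((row, label) for row in rng.sample(members, target))
    return out


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(members) for members in groups.values())
    out = []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out.extend((row, label) for row in members + extra)
    return out


# Hypothetical toy data: 10 "normal" rows and 2 "attack" rows.
rows = list(range(12))
labels = ["normal"] * 10 + ["attack"] * 2

under_counts = Counter(label for _, label in random_undersample(rows, labels))
over_counts = Counter(label for _, label in random_oversample(rows, labels))
```

Under-sampling discards majority rows and over-sampling repeats minority rows, so the two sampled sets expose a classifier to different views of the same data, which is the property a heterogeneous ensemble can exploit.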

Effective sampling of estuarine fauna by a passive net in the West Sea of Korea occurring strong tide (조류가 강한 서해 하구에서 수동어구를 이용한 하구역 유영생물의 효율적 채집)

Hwang, Sun-Do;Im, Yang-Jae
- Journal of the Korean Society of Fisheries and Ocean Technology / v.47 no.4 / pp.338-343 / 2011
To identify an effective method for sampling estuarine fauna with a passive net in the West Sea of Korea, where tides are strong, catches were collected with bag nets in various sampling trials off Ganghwa Island in November 2009. We compared community structures (spring tide vs. neap tide, total sample vs. subsample, and 4 nets vs. 1 net), with each species composition as a sampling unit, using the Pearson chi-square test. The number of individuals was greater at spring tide than at neap tide (p<0.0001), although the number of species did not differ significantly between the two (p=0.174). Neither the number of species (p=0.138) nor the number of individuals (p=0.096) differed significantly between the total sample and a random subsample. The number of species did not differ significantly between the 1-net subsample and the 4-net subsample (p=0.515), but the number of individuals differed slightly (p=0.024). In conclusion, we suggest a 1-net subsample at spring tide as an effective way to sample estuarine fauna with a passive net in the strongly tidal West Sea.
https://doi.org/10.3796/KSFT.2011.47.4.338
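
For readers unfamiliar with the test named in the abstract above, here is a minimal sketch of the Pearson chi-square statistic for a contingency table of counts. The counts are hypothetical and are not taken from the paper; the sketch assumes every expected count is positive and leaves the p-value lookup to a statistics library.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows are spring tide and neap tide,
# columns are two species.
counts = [[10, 20], [20, 10]]
statistic, dof = pearson_chi_square(counts)
```

Here every expected count is 15, so the statistic is 4 × (5² / 15) = 20/3 with one degree of freedom.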

A Cost Effective Reference Data Sampling Algorithm Using Fractal Analysis

Lee, Byoung-Kil;Eo, Yang-Dam;Jeong, Jae-Joon;Kim, Yong-Il
- ETRI Journal / v.23 no.3 / pp.129-137 / 2001
Random sampling or systematic sampling is commonly used to assess the accuracy of classification results. In remote sensing, these sampling methods require much time and tedious work to acquire sufficient ground truth data, so a more effective sampling method that can represent the characteristics of the population is required. In this study, fractal analysis is adopted as an index for reference sampling. The fractal dimensions of the whole study area and the sub-regions are calculated to select sub-regions that have the most similar dimensionality to that of the whole area. Then the whole area's classification accuracy is compared with those of the sub-regions, and it is verified that the accuracies of the selected sub-regions are similar to that of the whole area. A new kind of reference sampling method using the above procedure is proposed. The results show that it is possible to reduce the sampling area and sample size while keeping the same level of accuracy as the existing methods.

A Cost Effective Reference Data Sampling Algorithm Using Fractal Analysis (프랙탈 분석을 통한 비용효과적인 기준 자료추출알고리즘에 관한 연구)

김창재
- Spatial Information Research / v.8 no.1 / pp.171-182 / 2000
Random sampling or systematic sampling is commonly used to assess the accuracy of classification results. In remote sensing, these sampling methods require much time and tedious work to acquire sufficient ground truth data. A more effective sampling method that can retain the characteristics of the population is therefore required. In this study, fractal analysis is adopted as an index for reference sampling. The fractal dimensions of the whole study area and of the sub-regions are calculated to choose the sub-regions whose dimensionality is most similar to that of the whole area. The whole area's classification accuracy is then compared with that of each sub-region, and it is verified that the accuracies of the selected sub-regions are similar to that of the whole area. Using the above procedure, a new kind of reference sampling method is proposed. The results show that it is possible to reduce the sampling area and sample size while obtaining the same results as existing methods in accuracy tests. Thus, the proposed method proves cost-effective for reference data sampling.

Construction of variable sampling rate model and its evaluation

Imoto, Fumio;Nakamura, Masatoshi
- Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 1994.10a / pp.106-111 / 1994
We propose a new variable sampling rate model that expresses phenomena with both rapid and slow components. A method for determining the variable sampling rate and the order of the time series model is explained. The proposed variable sampling rate model was evaluated based on an information criterion (AIC). The variable sampling rate model yielded a smaller information criterion value than a conventional constant sampling rate model, and proved effective as a prediction model for systems with both rapid and slow components.
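
The abstract above does not give the exact form of AIC that was used. As a rough illustration of how two candidate models can be ranked, the sketch below uses the common least-squares form under Gaussian errors, n·ln(RSS/n) + 2k, with additive constants dropped. The residuals and parameter counts are hypothetical.

```python
import math


def aic_least_squares(residuals, n_params):
    """AIC for a least-squares fit with Gaussian errors (constants dropped)."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    return n * math.log(rss / n) + 2 * n_params


# Hypothetical one-step prediction residuals from two candidate models.
residuals_a = [0.5, -0.5, 0.5, -0.5]   # tighter fit, 2 parameters
residuals_b = [1.0, -1.0, 1.0, -1.0]   # looser fit, 1 parameter

aic_a = aic_least_squares(residuals_a, n_params=2)
aic_b = aic_least_squares(residuals_b, n_params=1)
preferred = "a" if aic_a < aic_b else "b"
```

The model with the smaller AIC is preferred: the 2k term penalizes extra parameters, so a more complex model wins only if it reduces the residual sum of squares enough to offset that penalty.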

ON COMPARISON OF PERFORMANCES OF SYNTHETIC AND NON-SYNTHETIC GENERALIZED REGRESSION ESTIMATIONS FOR ESTIMATING LOCALIZED ELEMENTS

SARA AMITAVA
- Journal of the Korean Statistical Society / v.34 no.1 / pp.73-83 / 2005
Thompson's (1990) adaptive cluster sampling is a promising sampling technique to ensure effective representation of rare or localized population units in the sample. We consider the problem of simultaneously estimating the numbers of earners through a number of rural unorganized industries, some of which are concentrated in specific geographic locations. We demonstrate how the performance of a conventional Rao-Hartley-Cochran (RHC, 1962) estimator can be improved by using auxiliary information in the form of generalized regression (greg) estimators, and how further improvement can be achieved by adopting adaptive cluster sampling.

Design of the Variable Sampling Rates X-chart with Average Time to Signal Adjusted by the Sampling Cost

Park, Chang-Soon;Song, Moon-Sup
- Journal of the Korean Statistical Society / v.26 no.2 / pp.181-198 / 1997
A variable sampling rates scheme is proposed in which the sample size and sampling interval are taken as random during the process. The performance of the scheme is measured in terms of the average time to signal adjusted by the sampling cost when the process is out of control. This measurement evaluates the effectiveness of the scheme in terms of the cost incurred due to nonconformance as well as sampling. The variable sampling rates scheme is shown to be effective, especially for small and moderate shifts of the mean, when compared to the standard scheme.

SMCS/SMPS Simulation Algorithms for Estimating Network Reliability (네트워크 신뢰도를 추정하기 위한 SMCS/SMPS 시뮬레이션 기법)

서재준
- Journal of Korean Society of Industrial and Systems Engineering / v.24 no.63 / pp.33-43 / 2001
To estimate the reliability of a large and complex network with a small variance, we propose two dynamic Monte Carlo sampling methods: the sequential minimal cut set (SMCS) and the sequential minimal path set (SMPS) methods. These methods do not require all minimal cut sets or path sets to be given in advance and do not simulate all arcs at each trial, which can decrease the variance of the network reliability estimate. Based on the proposed methods, we develop importance sampling estimators, namely the total hazard (or safety) estimator and the hazard (or safety) importance sampling estimator, and compare the performance of these simulation estimators. We find that these estimators can significantly reduce the variance relative to the raw simulation estimator and the usual importance sampling estimator. In particular, the SMCS algorithm is very effective when the failure probabilities of arcs are low, whereas the SMPS algorithm is effective when the success probabilities of arcs are low.
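
The sketch below is not the SMCS or SMPS method. It shows only the raw (crude) Monte Carlo estimator that the abstract above uses as a baseline: every arc is sampled at each trial, and the network counts as working if the source can still reach the sink. The four-arc network, the failure probabilities, the trial count, and the seed are assumptions made for illustration, and arcs are treated as undirected.

```python
import random
from collections import deque


def is_connected(n_nodes, arcs, arc_up, source, sink):
    """Breadth-first search over the arcs that are currently working."""
    neighbors = {v: [] for v in range(n_nodes)}
    for (u, v), up in zip(arcs, arc_up):
        if up:
            neighbors[u].append(v)
            neighbors[v].append(u)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            return True
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def raw_reliability(n_nodes, arcs, fail_probs, source, sink, trials=20000, seed=1):
    """Fraction of trials in which source and sink stay connected."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # An arc works with probability 1 - q, independently of the others.
        arc_up = [rng.random() >= q for q in fail_probs]
        if is_connected(n_nodes, arcs, arc_up, source, sink):
            successes += 1
    return successes / trials


# Hypothetical network: two parallel two-arc paths from node 0 to node 3.
arcs = [(0, 1), (1, 3), (0, 2), (2, 3)]
fail_probs = [0.1, 0.1, 0.1, 0.1]
estimate = raw_reliability(4, arcs, fail_probs, source=0, sink=3)
exact = 1 - (1 - 0.9 ** 2) ** 2  # series-parallel closed form
```

For this small network the closed form is available for comparison. When arc failures are rare, most raw trials end in success and contribute little information about the failure probability, which is the situation variance-reduction estimators are designed to address.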

Sensitivity Approach of Sequential Sampling for Kriging Model (민감도법을 이용한 크리깅모델의 순차적 실험계획)

Lee, Tae-Hee;Jung, Jae-Jun;Hwang, In-Kyo;Lee, Chang-Seob
- Transactions of the Korean Society of Mechanical Engineers A / v.28 no.11 / pp.1760-1767 / 2004
Sequential sampling approaches, in which the sampling points of a metamodel are updated sequentially, have become a significant consideration in metamodeling. Sequential sampling design is more effective than classical all-at-once space-filling design because it adds new sampling points based on the distance between sampling points or on the prediction error obtained from the metamodel. However, although extremum points strongly reflect the behavior of the responses, existing sequential sampling designs approximate the extremum points of the original model inefficiently. In this research, a new sequential sampling approach using the sensitivity of the Kriging model is proposed so that the behavior of the response is reflected sequentially. Various sequential sampling designs are reviewed, and the performance of the proposed approach is compared with that of existing sequential sampling approaches using mean squared error. The accuracy of the proposed approach is checked against optimization results for test problems, verifying the superiority of the sensitivity approach.
https://doi.org/10.3795/KSME-A.2004.28.11.1760
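
The sketch below illustrates only the distance-based criterion that the abstract above attributes to existing sequential designs, not the proposed sensitivity-based approach. It picks, from a candidate set, the point whose nearest existing sample is farthest away (a maximin rule). The points and the function name are hypothetical.

```python
import math


def next_sample(existing, candidates):
    """Return the candidate that maximizes the distance to its nearest existing sample."""
    return max(
        candidates,
        key=lambda c: min(math.dist(c, p) for p in existing),
    )


# Hypothetical two-dimensional design space.
existing = [(0.0, 0.0), (1.0, 1.0)]
candidates = [(0.1, 0.1), (0.5, 0.5), (1.0, 0.0)]
chosen = next_sample(existing, candidates)
```

A purely distance-based rule fills empty regions evenly but ignores the response, which is why it can place few points near extrema; criteria that use metamodel information, such as prediction error or sensitivity, are meant to address that.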

Standardization of Sample Handling Methods to Reduce the Rate of Inadequate Sampling

Yo-Han Seo
- Quality Improvement in Health Care / v.29 no.2 / pp.85-93 / 2023
Purpose: The predominant approach for mitigating inadequate sampling rates has primarily involved bolstering the volume of education. This study aimed to curtail inadequate sampling rates through the implementation of continuous quality improvement (CQI) activities, tailoring effective methods to the unique needs of each institution. Methods: We developed a sample handling guidebook and implemented QI activities to address this issue. Results: These measures resulted in a 4.7% decrease in inadequate sampling rates, concurrently improving knowledge of sample handling and overall nurse satisfaction. We addressed the root causes of inadequate sampling before laboratory pre-processing by: 1) focusing on systematic rather than erratic errors through CQI activities, 2) revising the sample handling guide, and 3) delivering face-to-face education based on the specific needs of the nursing department. These changes resulted in an additional 0.6% decrease in the inadequate sampling rate. Conclusion: This study demonstrates that the implementation of CQI activities based on evidence derived from a multifaceted causal analysis significantly reduced the inadequate sampling rate compared to previous studies.
https://doi.org/10.14371/QIH.2023.29.2.85

Search Result 924, Processing Time 0.029 seconds
