• Title/Summary/Keyword: data sampling


Heterogeneous Ensemble of Classifiers from Under-Sampled and Over-Sampled Data for Imbalanced Data

  • Kang, Dae-Ki;Han, Min-gyu
    • International journal of advanced smart convergence / v.8 no.1 / pp.75-81 / 2019
  • The data imbalance problem is common and causes serious problems in machine learning. Sampling is one of the effective methods for addressing it. Over-sampling increases the number of instances, so when it is applied to imbalanced data, it is applied to the minority instances. Under-sampling reduces the number of instances and is usually performed on the majority data. We apply both under-sampling and over-sampling to imbalanced data to generate sampled data sets. From these sampled data sets and the original data set, we construct a heterogeneous ensemble of classifiers using five different algorithms. Experimental results on an intrusion detection dataset, used as an imbalanced dataset, show that our approach is effective.
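The ensemble construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which sampling variants, five algorithms, or combination rule were used, so this sketch assumes plain random under-sampling and random over-sampling, substitutes two simple stand-in learners (nearest centroid and 1-nearest-neighbour), and combines members by majority vote. All function names are hypothetical.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return groups


def random_undersample(X, y, rng):
    """Randomly drop instances so every class shrinks to the size of the smallest class."""
    groups = _group_by_class(X, y)
    n_min = min(len(rows) for rows in groups.values())
    Xs, ys = [], []
    for label, rows in groups.items():
        for xi in rng.sample(rows, n_min):
            Xs.append(xi)
            ys.append(label)
    return Xs, ys


def random_oversample(X, y, rng):
    """Randomly duplicate instances so every class grows to the size of the largest class."""
    groups = _group_by_class(X, y)
    n_max = max(len(rows) for rows in groups.values())
    Xs, ys = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for xi in rows + extra:
            Xs.append(xi)
            ys.append(label)
    return Xs, ys


def _sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fit_nearest_centroid(X, y):
    """Stand-in learner: predict the class whose centroid is closest."""
    groups = _group_by_class(X, y)
    centroids = {c: [sum(col) / len(rows) for col in zip(*rows)] for c, rows in groups.items()}
    return lambda x: min(centroids, key=lambda c: _sq_dist(x, centroids[c]))


def fit_1nn(X, y):
    """Stand-in learner: predict the label of the single nearest training instance."""
    data = list(zip(X, y))
    return lambda x: min(data, key=lambda d: _sq_dist(x, d[0]))[1]


def build_ensemble(X, y, learners, seed=0):
    """Train every learner on the original, under-sampled and over-sampled data; majority vote."""
    rng = random.Random(seed)
    datasets = [(X, y), random_undersample(X, y, rng), random_oversample(X, y, rng)]
    members = [fit(Xd, yd) for Xd, yd in datasets for fit in learners]

    def predict(x):
        return Counter(member(x) for member in members).most_common(1)[0][0]

    return predict


# Toy imbalanced data: six majority-class points near the origin, two minority points near (5, 5).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (1, 0.5), (5, 5), (5, 6)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
predict = build_ensemble(X, y, [fit_nearest_centroid, fit_1nn])
print(predict((4.5, 5.5)), predict((0.2, 0.3)))  # 1 0
```

Heterogeneity here comes from crossing different learners with differently sampled training sets; adding more learner functions to the `learners` list grows the ensemble accordingly.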

An improved linear sampled-data output regulator (개선된 선형 샘플치 출력 조절기)

  • 정선태
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 1997.10a / pp.1726-1729 / 1997
  • In general, the solvability of the linear robust output regulation problem is not preserved under time sampling. Consequently, a digital regulator implemented by time-sampling an analog output regulator designed from the continuous-time linear system model is merely a first-order approximation with respect to the sampling time. However, an improved sampled-data regulator, with respect to the sampling time, can be designed by exploiting the intrinsic structure of the system. In this paper, we study the system structures for which such an improved sampled-data regulator can be designed.
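To make "first-order approximation with respect to time-sampling" concrete, the sketch below compares, for an assumed scalar plant dx/dt = a·x + b·u, the exact zero-order-hold discretization with the forward-Euler model that keeps only the linear term in T. This is an illustration of the approximation order only, not the regulator design studied in the paper; the plant and the function names are hypothetical.

```python
import math


def zoh_discretize(a, b, T):
    """Exact zero-order-hold discretization of the scalar plant dx/dt = a*x + b*u (a != 0)."""
    ad = math.exp(a * T)
    return ad, (ad - 1.0) / a * b


def euler_discretize(a, b, T):
    """First-order approximation in T: keep only the linear term of exp(a*T)."""
    return 1.0 + a * T, b * T


a, b = -2.0, 1.0
for T in (0.1, 0.05, 0.025):
    ad, bd = zoh_discretize(a, b, T)
    ae, be = euler_discretize(a, b, T)
    # Halving T roughly quarters the mismatch: the one-step model error is O(T^2).
    print(f"T={T}: |Ad - Ae| = {abs(ad - ae):.2e}, |Bd - Be| = {abs(bd - be):.2e}")
```

The shrinking but nonzero mismatch is the sense in which a naively time-sampled continuous-time design is only accurate to first order in the sampling time.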

Fast Volume Visualization Techniques for Ultrasound Data

  • Kwon Koo-Joo;Shin Byeong-Seok
    • Journal of Biomedical Engineering Research / v.27 no.1 / pp.6-13 / 2006
  • Ultrasound visualization is a common diagnostic method for examining organs, soft tissue, and fetuses. Ultrasound data are difficult to visualize because their quality may be degraded by artifacts and speckle noise, and because they are acquired with non-linear sampling. Rendering is also slow, since additional data structures or procedures cannot be used in the rendering stage. In this paper, we use several techniques for fast rendering of ultrasound data. The first, adaptive ray sampling, reduces the number of samples by widening the sampling interval in empty space. The second is early ray termination with a sufficiently wide sampling interval and a low opacity threshold during color compositing. Finally, we use bilinear instead of trilinear interpolation when sampling in transparent regions. Compared with conventional methods, our method reduces rendering time without loss of image quality.
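The first two techniques above can be sketched on a single ray. This is a simplified illustration under assumptions, not the authors' renderer: the volume is replaced by a hypothetical function `density(t)` giving an already-interpolated value at distance `t` along the ray, the transfer function and step sizes are made up, per-sample opacity is not corrected for step length, and the bilinear/trilinear interpolation choice is not shown.

```python
def transfer(value):
    """Hypothetical transfer function mapping a sample value to (opacity, intensity).
    Values below 0.2 are treated as empty (fully transparent) space."""
    if value < 0.2:
        return 0.0, 0.0
    return min(1.0, (value - 0.2) * 0.5), value


def cast_ray(density, length, fine_step=0.5, coarse_step=2.0, opacity_threshold=0.95):
    """Front-to-back compositing along one ray with adaptive sampling and early termination."""
    color, alpha, t, samples = 0.0, 0.0, 0.0, 0
    while t < length:
        a, c = transfer(density(t))
        samples += 1
        if a == 0.0:
            t += coarse_step          # adaptive ray sampling: wide interval in empty space
            continue
        color += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
        if alpha >= opacity_threshold:
            break                     # early ray termination: later samples would be hidden
        t += fine_step
    return color, alpha, samples


# Hypothetical ray crossing one opaque slab between t = 40 and t = 60.
slab = lambda t: 1.0 if 40.0 <= t < 60.0 else 0.0
print(cast_ray(slab, 100.0))
```

For this ray, 20 coarse samples cross the empty region and 6 fine samples saturate the opacity, so 26 samples are taken instead of the 200 needed at the fine step over the whole length. A coarse step can skip structures thinner than `coarse_step`, which is the usual trade-off of empty-space skipping.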

Output regulation of nonlinear sampled-data systems (비선형 샘플치 시스템의 출력조절)

  • 정선태
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 1996.10b / pp.391-394 / 1996
  • The effects of time sampling on the nonlinear output regulation problem are investigated. As in linear systems, output regulatedness is preserved under time sampling; however, output regulatability is not robust with respect to time sampling, so an approximate nonlinear sampled-data output regulator must be sought.

Scheduling algorithm of data sampling times in the real-time distributed control systems

  • Hong, Seung-Ho
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 1992.10b / pp.112-117 / 1992
  • Real-time distributed control systems (RDCS) consist of several distributed control processes that share a network medium to exchange data. The performance of feedback control loops in an RDCS is affected by network-induced delays from sensor to controller and from controller to actuator. These delays depend directly on the data sampling times of the control components sharing the network medium. In this study, a scheduling algorithm for determining data sampling times is developed using a window concept, in which the sampled data from the control components dynamically share a limited number of windows.
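How sampling times drive network-induced delay when components share a limited number of windows can be illustrated with a small simulation. This is a hypothetical sketch, not the paper's scheduling algorithm: it assumes time is divided into slots, each slot offers a fixed number of transmission windows, component `i` produces a sample every `periods[i]` slots, and waiting samples are served oldest first. The function names are made up for illustration.

```python
from math import lcm


def utilization(periods):
    """Average number of windows demanded per slot; it must not exceed the window count."""
    return sum(1.0 / p for p in periods)


def simulate_windows(periods, windows, cycles=2):
    """Serve periodic samples through `windows` shared windows per slot (FIFO, ties by
    component index) and return the worst observed delay, in slots, for each component."""
    horizon = lcm(*periods) * cycles
    pending = []                       # (release_slot, component)
    worst = [0] * len(periods)
    for slot in range(horizon):
        for i, p in enumerate(periods):
            if slot % p == 0:
                pending.append((slot, i))
        pending.sort()
        served, pending = pending[:windows], pending[windows:]
        for release, i in served:
            worst[i] = max(worst[i], slot - release)
    return worst


periods = [2, 2, 4]
print(utilization(periods))          # 1.25
print(simulate_windows(periods, 2))  # [0, 0, 1]
```

With two windows per slot the demand of 1.25 windows per slot fits and the third component waits at most one slot; with a single window the demand exceeds capacity and delays grow without bound, which is why sampling times must be chosen jointly with the window budget.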

  • PDF

SPATIAL AND TEMPORAL INFLUENCES ON SOIL MOISTURE ESTIMATION

  • Kim, Gwang-seob
    • Water Engineering Research / v.3 no.1 / pp.31-44 / 2002
  • The effects of the diurnal cycle, intermittent satellite revisits, sensor installation, partial remote-sensing coverage, and heterogeneity of soil properties and precipitation on soil moisture estimation error were analyzed in order to propose a global sampling strategy for soil moisture. Three models were used to construct sufficient two-dimensional soil moisture data under different scenarios: a theoretical soil moisture model; the WGR model proposed by Waymire et al. (1984) to generate rainfall; and the Turning Band Method to generate two-dimensional fields of soil porosity, active soil depth, and loss coefficient. The sampling error is dominated by the sampling interval and the design scheme. The effects of heterogeneity in soil properties and rainfall on sampling error are smaller than those of temporal and spatial gaps. Selecting a small sampling interval can dramatically reduce the sampling error generated by other factors such as heterogeneity of rainfall, soil properties, topography, and climatic conditions. If the annual mean coverage fraction is about 90%, the effect of partial coverage on sampling error can be disregarded. The water retention capacity of the field strongly affects the sampling error: the smaller the water retention capacity (small soil porosity and thin active soil depth), the greater the sampling error, indicating that the sampling error is very sensitive to water retention capacity. Block random installation of soil moisture gages yields more accurate data than random installation. The Walnut Gulch soil moisture data show that the diurnal variation of soil moisture causes a sampling error of between 1 and 4% in daily estimates.
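The comparison between block random and random gage installation can be reproduced qualitatively with a Monte Carlo sketch. It assumes a hypothetical one-dimensional transect of 100 cells whose moisture rises steadily downslope plus small local noise; it is not the paper's two-dimensional model chain, and the field, gage count, and function names are illustrative only.

```python
import random
import statistics


def simple_random(n_cells, n_gages, rng):
    """Random installation: gage locations drawn without replacement from the whole field."""
    return rng.sample(range(n_cells), n_gages)


def block_random(n_cells, n_gages, rng):
    """Block random installation: split the field into equal contiguous blocks and
    place one gage at a random cell inside each block (leftover cells are ignored)."""
    size = n_cells // n_gages
    return [b * size + rng.randrange(size) for b in range(n_gages)]


def mean_squared_error(field, n_gages, design, trials=2000, seed=0):
    """Monte Carlo mean squared error of the field-mean estimate for a placement design."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(field)
    total = 0.0
    for _ in range(trials):
        cells = design(len(field), n_gages, rng)
        total += (statistics.fmean(field[i] for i in cells) - true_mean) ** 2
    return total / trials


# Hypothetical 1-D transect: moisture rises steadily downslope, plus small local noise.
gen = random.Random(42)
field = [0.10 + 0.30 * i / 99 + gen.gauss(0.0, 0.02) for i in range(100)]
print(mean_squared_error(field, 10, simple_random), mean_squared_error(field, 10, block_random))
```

Because each block forces one gage into every part of the gradient, block random placement removes most of the between-block variation from the estimate; the gain shrinks when the field has little large-scale structure.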

A Comparison of Systematic Sampling Designs for Forest Inventory

  • Yim, Jong Su;Kleinn, Christoph;Kim, Sung Ho;Jeong, Jin-Hyun;Shin, Man Yong
    • Journal of Korean Society of Forest Science / v.98 no.2 / pp.133-141 / 2009
  • This study was conducted to support the selection of an efficient sampling design for forest resource assessments in South Korea with respect to statistical efficiency. To this end, different systematic sampling designs were simulated and compared on an artificial forest population built from field sample data and satellite data for Yang-Pyeong County, Korea. Using the k-NN technique, two thematic maps (growing stock and forest cover type per pixel) were generated across the test area, with field data (n=191) and Landsat ETM+ imagery as source data. Four sampling designs (systematic sampling, systematic sampling for post-stratification, systematic cluster sampling, and stratified systematic sampling) were considered as candidates for the optimum design. Error variance was computed by Monte Carlo simulation (k=1,000), and sampling error and relative efficiency were then compared. When the objective of an inventory was to obtain estimates for the entire population, systematic cluster sampling was superior to the other designs. When the objective is to obtain estimates for each sub-population, post-stratification gave better estimates. To perform this procedure successfully, the strata of interest must be clearly defined per field observation unit for efficient stratification.
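The Monte Carlo comparison of designs can be sketched as below. This is a simplified illustration, not the study's setup: it uses a hypothetical one-dimensional population with a linear trend instead of the k-NN maps, implements only systematic sampling and systematic cluster sampling (plus simple random sampling as the reference), and defines relative efficiency as Var(SRS)/Var(design) for the same number of plots, ignoring field costs. Function names are illustrative.

```python
import random
import statistics


def srs(N, n, rng):
    """Simple random sampling without replacement (reference design)."""
    return rng.sample(range(N), n)


def systematic(N, n, rng):
    """Systematic sampling: every k-th unit after a random start, k = N // n."""
    k = N // n
    start = rng.randrange(k)
    return [start + j * k for j in range(n)]


def systematic_cluster(N, n, m, rng):
    """Systematic cluster sampling: n // m cluster origins spaced systematically,
    each contributing m consecutive units (clusters never run past the population)."""
    n_clusters = n // m
    k = N // n_clusters
    start = rng.randrange(k - m + 1)
    return [start + c * k + j for c in range(n_clusters) for j in range(m)]


def error_variance(pop, draw, k=1000, seed=1):
    """Monte Carlo (k repetitions) mean squared error of the sample mean."""
    rng = random.Random(seed)
    mu = statistics.fmean(pop)
    return statistics.fmean(
        (statistics.fmean(pop[i] for i in draw(rng)) - mu) ** 2 for _ in range(k)
    )


# Hypothetical population with a spatial trend plus unit-variance noise.
gen = random.Random(42)
pop = [i / 100 + gen.gauss(0.0, 1.0) for i in range(1000)]
N, n = len(pop), 50
v_srs = error_variance(pop, lambda r: srs(N, n, r))
v_sys = error_variance(pop, lambda r: systematic(N, n, r))
v_clu = error_variance(pop, lambda r: systematic_cluster(N, n, 5, r))
print(f"RE(systematic) = {v_srs / v_sys:.2f}, RE(systematic cluster) = {v_srs / v_clu:.2f}")
```

Which design wins depends on the population's spatial structure and on whether efficiency is measured per plot or per unit cost, so this toy population is not expected to reproduce the study's ranking.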

Comparison of two sampling intervals and three sampling intervals VSI charts for monitoring both means and variances

  • Chang, Duk-Joon
    • Journal of the Korean Data and Information Science Society / v.26 no.4 / pp.997-1006 / 2015
  • In industrial quality control, engineers using a VSI control procedure should consider both the time required to signal and the switching behavior when the production process changes. To date, many researchers have studied fixed sampling interval (FSI) and variable sampling interval (VSI) charts in terms of the average number of samples to signal (ANSS) and the average time to signal (ATS). However, ANSS and ATS provide no information about switching between the different sampling intervals of a VSI scheme. In this study, the performance of a VSI chart with two sampling intervals and a VSI chart with three sampling intervals is evaluated and compared. The numerical results show that the two charts have similar ANSS and ATS values regardless of the size of the shift. However, in terms of switching behavior, including the average number of switches to signal (ANSW), the three-interval VSI chart is less efficient than the two-interval chart.
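ANSS, ATS, and the number of interval switches can be estimated for a VSI scheme by straightforward simulation. The sketch below is a hypothetical, simplified version: it monitors only the mean with a Shewhart-type X-bar statistic (the paper monitors means and variances), assumes the shift is present from the first sample, and assumes the first interval used is the longest one. Limits and intervals in the usage lines are arbitrary and are not matched for in-control performance, as a proper comparison would require.

```python
import random


def simulate_vsi(shift, limits, intervals, runs=5000, seed=0):
    """Monte Carlo estimate of (ANSS, ATS, average number of switches) for a VSI X-bar chart.

    limits:    increasing |Z| boundaries; the last one is the control limit,
               e.g. [w, k] for two intervals or [w1, w2, k] for three.
    intervals: interval used after a non-signalling sample whose |Z| falls in the
               matching region, longest first; len(intervals) == len(limits).
    shift:     mean shift in standard-error units.
    """
    rng = random.Random(seed)
    control_limit = limits[-1]
    total_n = total_t = total_sw = 0
    for _ in range(runs):
        interval, n, t, switches = intervals[0], 0, 0.0, 0
        while True:
            t += interval
            n += 1
            z = abs(rng.gauss(shift, 1.0))
            if z > control_limit:
                break                                  # out-of-control signal
            region = next(i for i, lim in enumerate(limits) if z <= lim)
            if intervals[region] != interval:
                switches += 1                          # sampling interval changed
                interval = intervals[region]
        total_n += n
        total_t += t
        total_sw += switches
    return total_n / runs, total_t / runs, total_sw / runs


for name, limits, intervals in [
    ("two intervals", [0.67, 3.0], [1.9, 0.1]),
    ("three intervals", [0.43, 1.15, 3.0], [1.9, 1.0, 0.1]),
]:
    anss, ats, switches = simulate_vsi(1.0, limits, intervals)
    print(f"{name}: ANSS={anss:.1f}, ATS={ats:.1f}, switches={switches:.1f}")
```

A fixed-interval chart is the special case of a single limit and a single interval, in which case no switches occur and ATS equals ANSS times the interval.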

Introduction to the Strategic Sampling Approaches to Construct Optimal Conceptual Model of a Contaminated Site (오염부지 최적 개념모델 수립을 위한 전략적 샘플링 기법 소개)

  • Park, Hyun Ji;Kim, Han-Suk;Yun, Seong-Taek;Jo, Ho Young;Kwon, Man Jae
    • Journal of Soil and Groundwater Environment / v.25 no.2_spc / pp.28-54 / 2020
  • Although a systematic sampling approach is crucial in both the general and detailed investigation phases for producing the best conceptual site model of a contaminated site, the concept has not yet been established in South Korea. The U.S. Environmental Protection Agency (EPA) issued the 'Strategic Sampling Approaches Technical guide' in 2018 to help environmental professionals choose which sampling approaches may be needed and most effective under given site conditions. The EPA guide broadly defines strategic sampling as the application of focused data collection across targeted areas of the conceptual site model (CSM) to provide the appropriate amount and type of information needed for decision-making. Strategic sampling approaches can prevent essential data from being missed, minimize project uncertainty, and secure the data needed for important site decisions. They also provide collaborative data sets throughout the project life cycle, which can provide stronger evidence for site decisions. The approaches are classified by site condition, and the technical guide defines eight conditions: High-resolution site characterization in unconsolidated environments, High-resolution site characterization in fractured sedimentary rock environments, Incremental sampling, Contaminant source definition, Passive groundwater sampling, Passive sampling for surface water and sediment, Groundwater to surface water interaction, and Vapor intrusion. This commentary introduces the specific sampling methods used for each site condition when the strategic sampling approaches are applied.

A New Statistical Sampling Method for Reducing Computing time of Machine Learning Algorithms (기계학습 알고리즘의 컴퓨팅시간 단축을 위한 새로운 통계적 샘플링 기법)

  • Jun, Sung-Hae
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.2 / pp.171-177 / 2011
  • Accuracy and computing time are important issues in machine learning. In general, the computing time for data analysis increases in proportion to the size of the data, so a sampling approach is needed to reduce the size of the training data. However, reducing the data size also lowers the accuracy of the resulting model. To solve this problem, we propose a new statistical sampling method whose performance is similar to that obtained with the full data set, together with a rule for selecting the optimal sampling technique according to the structure of the given data. The method reduces computing time while retaining most of the accuracy, using cluster sampling, stratified sampling, and systematic sampling. We verify the improved performance of the proposed method by comparing the accuracy and computing time obtained with the sampled data and with the full data on machine learning data sets.
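One of the three techniques named above, stratified sampling, can be applied to a training set as sketched below. This is an assumption-laden illustration, not the proposed method or its selection rule (neither is detailed in the abstract): it treats class labels as strata and keeps the same fraction of every class, so a reduced training set preserves the original class proportions. `fraction` is assumed to lie in (0, 1], and the function name is hypothetical.

```python
import random
from collections import defaultdict


def stratified_subsample(X, y, fraction, seed=0):
    """Proportional stratified sampling of a training set, using class labels as strata.
    Keeps round(fraction * class size) rows per class, and at least one row per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    chosen = []
    for idxs in by_class.values():
        k = max(1, round(fraction * len(idxs)))
        chosen.extend(rng.sample(idxs, k))
    chosen.sort()  # keep the original row order
    return [X[i] for i in chosen], [y[i] for i in chosen]


X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10
Xs, ys = stratified_subsample(X, y, 0.2)
print(len(ys), ys.count(1))  # 20 2
```

The model is then trained on `Xs, ys` instead of the full data, trading some accuracy for a training set one fifth the size.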