• Title/Summary/Keyword: data sampling

Hierarchical sampling optimization of particle filter for global robot localization in pervasive network environment

  • Lee, Yu-Cheol;Myung, Hyun
    • ETRI Journal / v.41 no.6 / pp.782-796 / 2019
  • This paper presents a hierarchical framework for managing the sampling distribution of a particle filter (PF) that estimates the global positions of mobile robots in a large-scale area. The key concept is to gradually improve the accuracy of the global localization by fusing sensor information with different characteristics. The sensor observations are the received signal strength indications (RSSIs) of Wi-Fi devices as network facilities and the range of a laser scanner. First, the RSSI data used for determining certain global areas within which the robot is located are represented as RSSI bins. In addition, the results of the RSSI bins contain the uncertainty of localization, which is utilized for calculating the optimal sampling size of the PF to cover the regions of the RSSI bins. The range data are then used to estimate the precise position of the robot in the regions of the RSSI bins using the core process of the PF. The experimental results demonstrate superior performance compared with other approaches in terms of the success rate of the global localization and the amount of computation for managing the optimal sampling size.

On inference of multivariate means under ranked set sampling

  • Rochani, Haresh;Linder, Daniel F.;Samawi, Hani;Panchal, Viral
    • Communications for Statistical Applications and Methods / v.25 no.1 / pp.1-13 / 2018
  • In many studies, a researcher attempts to describe a population where units are measured for multiple outcomes, or responses. In this paper, we present an efficient procedure based on ranked set sampling to estimate and perform hypothesis testing on a multivariate mean. The method is based on ranking on an auxiliary covariate, which is assumed to be correlated with the multivariate response, in order to improve the efficiency of the estimation. We show that the proposed estimators developed under this sampling scheme are unbiased, have smaller variance in the multivariate sense, and are asymptotically Gaussian. We also demonstrate that the efficiency of the multivariate regression estimator can be improved by using ranked set sampling. A bootstrap routine is developed in the statistical software R to perform inference when the sample size is small. We use a simulation study to investigate the performance of the method under known conditions and apply the method to the biomarker data collected in the China Health and Nutrition Survey (CHNS 2009).
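
The ranking step described in this abstract can be illustrated with a minimal sketch of balanced ranked set sampling. This is not the authors' R routine: the population layout, the function names `ranked_set_sample` and `mean_vector`, and the synthetic data are all assumptions made for illustration.

```python
import random


def ranked_set_sample(population, set_size, cycles, rng):
    """Draw a balanced ranked set sample.

    population: list of (covariate, response_vector) pairs (hypothetical layout).
    Returns set_size * cycles response vectors.
    """
    measured = []
    for _ in range(cycles):
        for rank in range(set_size):
            # Draw a fresh set and rank it on the cheap auxiliary covariate only.
            candidates = rng.sample(population, set_size)
            candidates.sort(key=lambda unit: unit[0])
            # Measure the full response only for the unit holding this rank.
            measured.append(candidates[rank][1])
    return measured


def mean_vector(responses):
    """Component-wise mean of equal-length response vectors."""
    n = len(responses)
    return [sum(r[j] for r in responses) / n for j in range(len(responses[0]))]


# Synthetic population (assumption): two responses correlated with covariate x.
rng = random.Random(0)
population = []
for _ in range(2000):
    x = rng.gauss(0.0, 1.0)
    population.append((x, (x + rng.gauss(0.0, 0.5), 2.0 * x + rng.gauss(0.0, 0.5))))

sample = ranked_set_sample(population, set_size=4, cycles=50, rng=rng)
print(mean_vector(sample))
```

Each cycle spreads the measured units across all ranks of the covariate, which is the source of the variance reduction the abstract refers to.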

The Effects of Varying Sampling Flow Rates on the Measurements of Total Nitrate and Sulfate in Dry Acid Deposition

  • Park, Jong-Kil;Kim, Jo-Chun
    • Journal of Korean Society for Atmospheric Environment / v.18 no.E1 / pp.1-12 / 2002
  • One technique for determining dry acid deposition fluxes involves measurement of time-averaged ambient concentrations of dry acid deposition species using filter packs (FP) coupled with estimates of mean deposition velocities for the exposure period. A critical problem associated with filter pack data comparisons between various field sampling networks is the use of diverse sampling flow rates and duration protocols. Field experiments were conducted to evaluate the effects of varying sampling flow rates, from 1.5 to 10 standard liters per minute, on total nitrate and sulfate measurements of specific dry acid deposition species. Collocated FP samplers were used to determine sampling and analysis data reproducibility and representativeness. Ambient air samples were simultaneously collected using groups of filter packs operated at various flow rates over identical 7-day periods. The species measured were sulfur dioxide, particulate sulfate, nitric acid and particulate nitrate. Statistical results (ANOVA; alpha level 5%) showed that neither the low nor high sampling flow rates caused a significant difference in the measurements of total sulfate and adjusted total nitrate (ATN). However, it was concluded that for high flow rate sampling measurements, total nitrate (TN) could be affected during extended sampling durations because of potential nitric acid overloading and breakthrough. Although the previous workers (Costello, 1990; Quillian, 1990) used much higher sampling flow rates (~17 sLpm) than employed here, it was assumed that for a high loading (> 50 $\mu\textrm{g}$ HNO$_3$) of nitric acid on the Nylon filters, a significant fraction (~10%) of nitric acid could pass through the Nylon filters and be collected on the carbonate impregnated filters. It was concluded that even at the highest sampling flow rate employed (10 sLpm) at the Cary Forest site, nitric acid breakthrough was less than 10% of the total HNO$_3$ collected. However, for a heavily polluted urban airshed or with longer sampling times, higher filter loadings could result in substantial nitric acid breakthrough and HNO$_3$ concentrations would be underestimated.

Does Different Performance of Sampling Gears (Cast Net versus Gill Net) Bring the Inappropriate Estimation of Freshwater Fish in a Large River?

  • Kim, Jeong-Hui;Park, Sang-Hyeon;Baek, Seung-Ho;Jang, Min-Ho;Lee, Hae-Jin;Yoon, Ju-Duk
    • Korean Journal of Ecology and Environment / v.53 no.2 / pp.156-164 / 2020
  • The accurate estimation of fish assemblages is highly dependent on the sampling gear used. We used data from 15 sampling sites along the Nakdong River, which is a large river in South Korea, to identify differences in assemblages and sizes of freshwater fishes collected with either cast nets or gill nets, the two most commonly used sampling gears in South Korea. The two gears differed in the fish assemblages they captured, with more species caught by gill nets. Further, due to its tighter mesh size, the cast net caught significantly smaller fishes than the gill nets (independent t-test, p<0.05). We found the cast net to be appropriate for species that inhabit shallow (less than 2 m) and open water, but inappropriate for deep water, habitats with plant beds, and nocturnal species. Thus, cast net sampling is not efficient in a large river environment, and a combination of sampling methods is more suitable for understanding fish assemblages in such habitats. In general, appropriate matching of fishing methods to specific habitats is necessary to improve data quality and minimize the misrepresentation of environmental conditions.

A Study on Measuring the Similarity Among Sampling Sites in Lake (저수지 수질조사 지점간 유사성 분석)

  • Lee, Yo-Sang;Koh, Deuk-Koo;Lee, Hyun-Seok
    • Proceedings of the Korea Water Resources Association Conference / 2010.05a / pp.957-961 / 2010
  • Multivariate statistical approaches are used to classify sampling sites by measuring their similarity in water quality data. For the empirical study, two years of data were collected in a reservoir at 9 sampling sites, combining 2 depth levels and 7 important water quality variables. The similarity among sampling sites is measured with Euclidean distances over the water quality variables, and the sites are classified by a hierarchical clustering method. The clustered sites are discussed using principal component variables, in view of their geographical characteristics and of reducing the number of measuring sites. The nine sampling sites are clustered as follows. Sites 5, 6, and 7 form one cluster characterized by shallow water depth and main-stream flow. Sites 2 and 4 are clustered into the same group by hydraulic characteristics derived from the main stream, but their patterns of water quality change appear to differ because site 2 is near the dam. Sites 3, 8, and 9 are each positioned individually because they lie on different tributaries.
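
The combination of Euclidean distances and hierarchical clustering described in this abstract can be sketched as follows. The abstract does not state which linkage rule was used, so average linkage is an assumption here, as are the function names and the toy coordinates, which are not the reservoir data.

```python
from math import dist


def average_linkage(cluster_a, cluster_b, points):
    """Mean Euclidean distance between all cross-cluster pairs of points."""
    total = sum(dist(points[i], points[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns clusters as lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy "sites" described by two made-up variables each (illustrative only).
sites = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (50.0, 50.0)]
print(agglomerate(sites, n_clusters=3))
```

With real water quality variables measured in different units, standardizing each variable before computing distances would usually be advisable so that no single variable dominates.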

An application and development of an activity lesson guessing a population ratio by sampling with replacement in 'Closed box' ('닫힌 상자'에서의 복원추출에 의한 모비율 추측 활동수업 개발 및 적용)

  • Lee, Gi Don
    • The Mathematical Education / v.57 no.4 / pp.413-431 / 2018
  • In this study, I developed an activity-oriented lesson to support the understanding of probabilistic and quantitative estimation of population ratios according to standard statistical principles, and discussed its didactical implications. The lesson, an efficient physical simulation activity based on sampling with replacement, simulates unknown populations and real problem situations through a completely closed 'Closed Box', whose balls can be neither seen nor taken out, and provides teaching and learning devices that highlight the representativeness of sample ratios and sampling variability. I applied the lesson to gifted students who had not yet learned to estimate population ratios, collected research data such as activity sheets and recordings and transcripts of the students' presentations, and analyzed them by Qualitative Content Analysis. The lesson was effective in helping students recognize and reflect on the representativeness of sample ratios and recognize random sampling variability. In addition, to show sampling variability more clearly, I discussed appropriately increasing the total number of balls placed in the 'Closed Box' and the active involvement of teachers in making students attend to controlling possible selection bias in the sampling process.
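
The 'Closed Box' activity can be mimicked with a short simulation of sampling with replacement. The box contents (30 red and 70 white balls), the number of repetitions, and the function name `sample_ratio` are illustrative assumptions, not details taken from the lesson.

```python
import random
from statistics import mean, pstdev


def sample_ratio(box, draws, rng):
    """Draw with replacement and return the observed share of red balls."""
    red = sum(1 for _ in range(draws) if rng.choice(box) == "red")
    return red / draws


rng = random.Random(1)
box = ["red"] * 30 + ["white"] * 70  # hypothetical population, true ratio 0.3

# Repeat the experiment many times for several sample sizes and compare
# how closely the sample ratios cluster around the population ratio.
for draws in (10, 100, 1000):
    ratios = [sample_ratio(box, draws, rng) for _ in range(200)]
    print(draws, round(mean(ratios), 3), round(pstdev(ratios), 3))
```

The spread of the sample ratios shrinks as the number of draws grows, which is the sampling variability the lesson is designed to make visible.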

Effects of Call-back Rules and Random Selection of Respondents: Statistical Re-analysis of R&R’s Ulsan Survey Data. (전화조사에서 재통화 규칙준수와 응답자 임의선택의 영향 - R&R 울산 사례의 통계적 재분석 -)

  • 허명회;임여주;노규형
    • The Korean Journal of Applied Statistics / v.16 no.2 / pp.247-259 / 2003
  • In Korea, quota sampling is mainly adopted in telephone surveys, instead of random sampling, which requires a call-back procedure and random selection of a respondent within each household. The contact mode based on sex × age quotas is economically more advantageous and less time-consuming. However, it lacks theoretical grounds for valid statistical inference, so it is hardly accepted in academic circles despite its widespread use. Subsequently, survey theoreticians argued that random sampling-based telephone surveys should be tried. In response, Research & Research (R&R), a private research company in Seoul, executed a telephone survey by random sampling mode for the prediction of the 2002 Ulsan City Mayor Election. The aim of this case study is to find out various effects of the call-back rule with random selection of respondents by statistically re-analyzing R&R’s Ulsan Survey Data.

The Study on Effect of sEMG Sampling Frequency on Learning Performance in CNN based Finger Number Recognition (CNN 기반 한국 숫자지화 인식 응용에서 표면근전도 샘플링 주파수가 학습 성능에 미치는 영향에 관한 연구)

  • Gerelbat BatGerel;Chun-Ki Kwon
    • Journal of the Institute of Convergence Signal Processing / v.24 no.1 / pp.51-56 / 2023
  • This study investigates the effect of sEMG sampling frequency on CNN learning performance in a Korean finger number recognition application. A higher sampling frequency of sEMG signals generates larger input data and lengthens the CNN's learning time, which makes real-time system implementation more difficult and more costly. Thus, there may be an appropriate sampling frequency for collecting sEMG signals. To this end, this work chooses five different sampling frequencies, 1,024 Hz, 512 Hz, 256 Hz, 128 Hz and 64 Hz, and investigates CNN learning performance with sEMG data taken at each sampling frequency. The comparative study shows that all CNNs recognized Korean finger numbers one to five with 100% accuracy, and that the CNN with sEMG signals collected at a 256 Hz sampling frequency takes the shortest learning time to reach the epoch at which Korean finger number gestures are recognized with 100% accuracy.
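
The link between sampling frequency and input size can be made concrete with a small sketch. The abstract does not say how the lower-rate signals were obtained, so plain decimation of a 1,024 Hz recording (keeping every k-th sample, with no anti-aliasing filter) is an assumption, as are the synthetic sine window and the function name `decimate`.

```python
import math

BASE_RATE_HZ = 1024  # highest sampling frequency named in the abstract


def decimate(signal, base_rate_hz, target_rate_hz):
    """Keep every k-th sample, where k = base_rate_hz / target_rate_hz.

    Naive decimation: a real pipeline would low-pass filter first.
    """
    if base_rate_hz % target_rate_hz != 0:
        raise ValueError("target rate must divide the base rate")
    step = base_rate_hz // target_rate_hz
    return signal[::step]


# One-second synthetic window standing in for a single sEMG channel.
window = [math.sin(2 * math.pi * 20 * t / BASE_RATE_HZ) for t in range(BASE_RATE_HZ)]

# Number of samples the CNN would receive per one-second window at each rate.
sizes = {rate: len(decimate(window, BASE_RATE_HZ, rate))
         for rate in (1024, 512, 256, 128, 64)}
print(sizes)
```

Halving the sampling frequency halves the per-window input length, which is the cost side of the trade-off the study examines.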

Classification Analysis for Unbalanced Data (불균형 자료에 대한 분류분석)

  • Kim, Dongah;Kang, Suyeon;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.28 no.3 / pp.495-509 / 2015
  • We study a classification problem in which the proportions of two groups differ significantly, known as the unbalanced classification problem. It is usually more difficult to classify classes accurately in unbalanced data than in balanced data. Most observations are likely to be classified into the bigger group if we apply classification methods to unbalanced data, because doing so minimizes the misclassification loss. However, misclassifying the smaller group as the larger group can cause a bigger loss in most real applications. We compare several classification methods for unbalanced data using sampling techniques (up and down sampling). We also check the total loss of different classification methods when an asymmetric loss is applied to simulated and real data. We use the misclassification rate, G-mean, ROC and AUC (area under the curve) for the performance comparison.
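
Up sampling, down sampling, and the G-mean mentioned in this abstract can be sketched in a few lines. This uses simple random resampling, which is only one possible reading of "up and down sampling". The function names, the 95/5 class split, and the labels are illustrative assumptions.

```python
import random
from math import sqrt


def upsample(majority, minority, rng):
    """Randomly duplicate minority rows until both classes have equal size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority, minority + extra


def downsample(majority, minority, rng):
    """Randomly keep only as many majority rows as there are minority rows."""
    return rng.sample(majority, len(minority)), minority


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


rng = random.Random(0)
majority = [("row", i, 0) for i in range(95)]  # hypothetical rows, label 0
minority = [("row", i, 1) for i in range(5)]   # hypothetical rows, label 1

up_major, up_minor = upsample(majority, minority, rng)
down_major, down_minor = downsample(majority, minority, rng)
print(len(up_major), len(up_minor), len(down_major), len(down_minor))

# A classifier that always predicts the majority class has 95% accuracy
# on this data but a G-mean of zero.
y_true = [0] * 95 + [1] * 5
print(g_mean(y_true, [0] * 100))
```

The last two lines show why the abstract reports G-mean alongside the misclassification rate: accuracy alone rewards ignoring the smaller group.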

A study on intrusion detection performance improvement through imbalanced data processing (불균형 데이터 처리를 통한 침입탐지 성능향상에 관한 연구)

  • Jung, Il Ok;Ji, Jae-Won;Lee, Gyu-Hwan;Kim, Myo-Jeong
    • Convergence Security Journal / v.21 no.3 / pp.57-66 / 2021
  • As the detection performance of deep learning and machine learning in the intrusion detection field has been verified, their use is increasing day by day. However, it is difficult to collect the data required for learning, and it is difficult to carry machine learning performance over to practice because of the imbalance of the collected data. Therefore, in this paper, a mixed sampling technique using t-SNE visualization for imbalanced data processing is proposed as a solution to this problem. To do this, intrusion detection events, including the payload, are separated into fields according to their characteristics, and TF-IDF-based features are extracted for the separated fields. After applying the mixed sampling technique based on the extracted features, a data set optimized for intrusion detection with imbalanced data is obtained through data visualization using t-SNE. Nine sampling techniques were applied to the open intrusion detection dataset CSIC2012, and it was verified through the F-score and G-mean evaluation indicators that the proposed sampling technique improves detection performance.