• Title/Abstract/Keywords: data sampling


Training Data Sets Construction from Large Data Set for PCB Character Recognition

  • NDAYISHIMIYE, Fabrice;Gang, Sumyung;Lee, Joon Jae
    • Journal of Multimedia Information System / v.6 no.4 / pp.225-234 / 2019
  • Deep learning has become increasingly popular in both academic and industrial settings. Various domains, including pattern recognition and computer vision, have witnessed the power of deep neural networks. However, current studies on deep learning mainly focus on quality data sets with balanced class labels, while training on poor-quality and imbalanced data sets continues to pose great challenges for classification tasks. In this paper we propose a data reduction method, based on data analysis, for selecting good and diverse data samples from a large data set for a deep learning model. Data sampling techniques can reduce the size of large raw data by retaining representative samples that carry its useful knowledge. Therefore, instead of dealing with the full raw data, we can use data reduction techniques to sample data without losing important information. We group PCB characters into classes and train the ResNet56 v2 and SENet deep learning models in order to improve the classification performance of an optical character recognition (OCR) character classifier.

A New Speech Waveform Coding Based on the Nonuniform Sampling Method with Separated to High-Low Band (대역분리-비균일표본화 방법을 이용한 새로운 음성신호의 파형부호화 연구)

  • Bae, Myung-Jin;Lee, Joo-Hun;Im, Sung-Bin;Lee, Won-Cheol
    • The Journal of the Acoustical Society of Korea / v.14 no.5 / pp.89-93 / 1995
  • To reduce the redundancy among samples that results from uniform sampling, nonuniform sampling or nonredundant-sample coding methods can be considered. However, it is well known that when conventional nonuniform sampling methods are applied directly to a speech signal, the required amount of data is comparable to or greater than that of a uniform sampling method such as PCM. To overcome this problem, a new nonuniform sampling method is proposed, in which nonuniform sampling is applied to the low-pass filtered speech signal and the higher band is compensated by 8 colored Gaussian random noises with various noise levels. With this method, the speech waveform can be encoded with a compression ratio 1.8 times larger than that of the conventional nonuniform sampling method.


Which Endometrial Pathologies Need Intraoperative Frozen Sections?

  • Balik, Gulsah;Kagitci, Mehmet;Ustuner, Isik;Akpinar, Funda;Guven, Emine Seda Guvendag
    • Asian Pacific Journal of Cancer Prevention / v.14 no.10 / pp.6121-6125 / 2013
  • Background: Endometrial cancers are the most common gynecologic cancers. Endometrial sampling is a preferred procedure for diagnosing endometrial pathology. It is performed routinely in many clinics prior to surgery in order to exclude an endometrial malignancy. We aimed to investigate the accuracy of endometrial sampling in the diagnosis of endometrial pathologies and to determine which findings need intraoperative frozen sections. Materials and Methods: Three hundred nine women who presented to a university hospital and underwent endometrial sampling and hysterectomy between 2010 and 2012 were included in this retrospective study. Data were retrieved from patient files and pathology archives. Results: There were 17 patients with malignancy, but endometrial sampling detected it in only 10 of them. The sensitivity and specificity of endometrial sampling for detecting cancer were 58.8% and 100%, with negative and positive predictive values of 97.6% and 100%, respectively. In 7 patients, endometrial sampling failed to detect malignancy; 4 of these patients had a preoperative diagnosis of complex atypical endometrial hyperplasia, 2 had post-menopausal endometrial polyps, and 1 had simple endometrial hyperplasia. Conclusions: There is an increased risk of malignancy in post-menopausal women, especially those with endometrial polyps and complex atypical hyperplasia. Endometrial sampling is a good choice for the diagnosis of endometrial pathologies. However, the diagnosis should be confirmed by frozen section in patients with post-menopausal endometrial polyps and complex atypical hyperplasia.
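
The accuracy figures in this abstract can be reproduced from the counts it reports. The sketch below is only a check of that arithmetic: it assumes zero false positives (which the reported 100% specificity and positive predictive value imply), and the function name `diagnostic_metrics` is a hypothetical helper, not something from the paper. With these counts the negative predictive value comes to 292/299, about 0.9766, in line with the reported 97.6% up to rounding.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return sensitivity, specificity, PPV and NPV as fractions."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts stated in the abstract: 309 women, 17 malignancies, 10 detected.
total, malignant, detected = 309, 17, 10

tp = detected                 # malignancies found by endometrial sampling
fn = malignant - detected     # malignancies missed by endometrial sampling
fp = 0                        # ASSUMPTION: implied by 100% specificity and PPV
tn = total - malignant - fp   # benign cases correctly reported as benign

metrics = diagnostic_metrics(tp, fn, fp, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```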

A Proposed Algorithm and Sampling Conditions for Nonlinear Analysis of EEG (뇌파의 비선형 분석을 위한 신호추출조건 및 계산 알고리즘)

  • Shin, Chul-Jin;Lee, Kwang-Ho;Choi, Sung-Ku;Yoon, In-Young
    • Sleep Medicine and Psychophysiology / v.6 no.1 / pp.52-60 / 1999
  • Objectives: To find appropriate conditions and algorithms for dimensional analysis of the human EEG, we calculated correlation dimensions under various sampling rates and data acquisition times, and improved the computation algorithm by using bit operations instead of log operations. Methods: EEG signals from 13 scalp leads of one man were digitized with an A-D converter at 12-bit resolution and a sampling rate of 1000 Hz for 32 seconds. From the original data, we made 15 time series with sampling rates of 62.5, 125, 250, 500, and 1000 Hz and data acquisition times of 10, 20, and 30 seconds. A new algorithm that shortens the calculation time using bit operations, together with the Least Trimmed Squares (LTS) estimator for obtaining the optimal slope, was applied to these data. Results: The correlation dimension increased as the data acquisition time became longer. The data sampled at 62.5 Hz showed the highest correlation dimension regardless of acquisition time, whereas the correlation dimensions at the other sampling rates were similar to one another. Computation with bit operations instead of log operations shortened the calculation time by a statistically significant amount, and the LTS method estimated the slope of the correlation dimension more stably than the Least Squares estimator. Conclusion: The bit operation and LTS methods were successfully used for time-saving and efficient calculation of the correlation dimension. In addition, a 20-second time series with a sampling rate of 125 Hz was adequate for estimating the dimensional complexity of the human EEG.
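
The 15 time series described above (5 sampling rates by 3 acquisition times) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the abstract does not say how the lower rates were derived, so plain decimation by striding (no anti-alias filter) is assumed, the input is a synthetic sine wave, and `make_series` is a hypothetical helper. The `floor_log2` function shows one common way an integer logarithm can be replaced by a bit operation; the paper's actual bit-operation trick is not described in the abstract.

```python
import math

BASE_RATE_HZ = 1000
TARGET_RATES_HZ = [62.5, 125, 250, 500, 1000]
DURATIONS_S = [10, 20, 30]


def make_series(signal, base_rate=BASE_RATE_HZ):
    """Build one series per (rate, duration) pair by truncating and striding."""
    series = {}
    for rate in TARGET_RATES_HZ:
        step = int(base_rate / rate)           # 16, 8, 4, 2, 1
        for duration in DURATIONS_S:
            n_base = int(duration * base_rate)  # samples at the base rate
            series[(rate, duration)] = signal[:n_base:step]
    return series


def floor_log2(n):
    """Floor of log2(n) for a positive integer, using bit length only."""
    return n.bit_length() - 1


# Synthetic stand-in for one 32-second EEG channel sampled at 1000 Hz.
signal = [math.sin(2 * math.pi * 10 * t / BASE_RATE_HZ)
          for t in range(32 * BASE_RATE_HZ)]

series = make_series(signal)
for (rate, duration), values in sorted(series.items()):
    print(rate, duration, len(values))
```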


K-means Clustering using a Grid-based Sampling

  • Park, Hee-Chang;Lee, Sun-Myung
    • Korean Data and Information Science Society: Conference Proceedings / 2003.10a / pp.249-258 / 2003
  • K-means clustering has been widely used in many applications, such as pattern analysis or recognition, data analysis, image processing, and market research. It can identify dense and sparse regions among data attributes or object attributes. However, the k-means algorithm can take many hours to produce the desired k clusters, because it is a primitive, exploratory method. In this paper we propose a new k-means clustering method that uses a grid-based sample. It is faster than traditional clustering methods and maintains their accuracy.
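
The abstract does not spell out how the grid-based sample is built, so the following is only one plausible reading: bin the points into square grid cells, keep each cell's mean and point count as a weighted representative, and run ordinary Lloyd-style k-means on those representatives. The function names, the 2-D restriction, the cell size, and the synthetic data are all assumptions made for illustration.

```python
import random


def grid_sample(points, cell_size):
    """Replace the points in each grid cell by their mean and their count."""
    cells = {}
    for x, y in points:
        key = (int(x // cell_size), int(y // cell_size))
        sx, sy, n = cells.get(key, (0.0, 0.0, 0))
        cells[key] = (sx + x, sy + y, n + 1)
    return [((sx / n, sy / n), n) for sx, sy, n in cells.values()]


def weighted_kmeans(samples, k, iters=50, seed=0):
    """Lloyd's algorithm on (point, weight) pairs; returns k centers."""
    rng = random.Random(seed)
    centers = [p for p, _ in rng.sample(samples, k)]
    for _ in range(iters):
        sums = [[0.0, 0.0, 0.0] for _ in range(k)]  # sum_x, sum_y, weight
        for (x, y), w in samples:
            j = min(range(k),
                    key=lambda c: (x - centers[c][0]) ** 2
                    + (y - centers[c][1]) ** 2)
            sums[j][0] += w * x
            sums[j][1] += w * y
            sums[j][2] += w
        # Keep the old center if a cluster received no samples.
        new_centers = [
            (sx / w, sy / w) if w > 0 else centers[j]
            for j, (sx, sy, w) in enumerate(sums)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Synthetic data: two well-separated blobs of 500 points each.
rng = random.Random(42)
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in [(0.0, 0.0), (10.0, 10.0)]
          for _ in range(500)]

samples = grid_sample(points, cell_size=1.0)
centers = sorted(weighted_kmeans(samples, k=2))
print(len(points), "points reduced to", len(samples), "grid samples")
print(centers)
```

Because each cell is weighted by its point count, the centers found on the reduced sample stay close to those of the full data, while each iteration touches far fewer items. A coarser grid trades accuracy for speed.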


Exploration of CHAID Algorithm by Sampling Proportion

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Korean Data and Information Science Society: Conference Proceedings / 2003.10a / pp.215-228 / 2003
  • Decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing, fraud detection, data reduction and variable screening, interaction effect identification, and category merging and discretization of continuous variables. CHAID (Chi-square Automatic Interaction Detector) is an exploratory method used to study the relationship between a dependent variable and a series of predictor variables. CHAID modeling selects a set of predictors and their interactions that optimally predict the dependent measure. In this paper we explore how the accuracy and speed of the CHAID algorithm vary with the sampling proportion.


Bayesian Methods for Generalized Linear Models

  • Paul E. Green;Kim, Dae-Hak
    • Communications for Statistical Applications and Methods / v.6 no.2 / pp.523-532 / 1999
  • Generalized linear models have various applications for data arising from many kinds of statistical studies. Although the response variable is generally assumed to be generated from a wide class of probability distributions, we focus on count data, which are most often analyzed using binomial models for proportions or Poisson models for rates. The methods and results presented here also apply to many other categorical data models, owing to the relationship between multinomial and Poisson sampling. The novelty of the approach suggested here is that all conditional distributions can be specified directly, so that straightforward Gibbs sampling is possible. The prior distribution consists of two stages. We rely on a normal nonconjugate prior at the first stage and a vague prior for hyperparameters at the second stage. The methods are demonstrated with an illustrative example using data collected by Rosenkranz and Raftery (1994) concerning the number of hospital admissions due to back pain in Washington state.


Design of a Clock and Data Recovery Circuit for High-Speed Serial Data Link Application (고속 시리얼 데이터 링크용 클럭 및 데이터 복원회로 설계)

  • 오운택;이흥배;소병춘;황원석;김수원
    • Proceedings of the IEEK Conference / 2003.07b / pp.1193-1196 / 2003
  • This paper proposes a 2x oversampling method with smart sampling for a clock and data recovery (CDR) circuit in a 2.5 Gbps serial data link. In the conventional 2x oversampling method, the "bang-bang" operation of the phase detection produces systematic jitter in the CDR. Smart sampling in phase detection helps the CDR remove the "bang-bang" operation and improve jitter performance. The CDR with the proposed 2x oversampling method is designed using Samsung 0.25 $\mu$m process parameters and verified by simulation. Simulation results show that the proposed 2x oversampling method removes the systematic jitter.


Bayesian analysis of an exponentiated half-logistic distribution under progressively type-II censoring

  • Kang, Suk Bok;Seo, Jung In;Kim, Yongku
    • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1455-1464 / 2013
  • This paper develops maximum likelihood estimators (MLEs) of the unknown parameters in an exponentiated half-logistic distribution based on a progressively type-II censored sample. We obtain approximate confidence intervals for the parameters from the asymptotic variance-covariance matrix of the MLEs. Using importance sampling, we obtain Bayes estimators and corresponding credible intervals with the highest posterior density, as well as Bayes predictive intervals, for the unknown parameters based on progressively type-II censored data from an exponentiated half-logistic distribution. For illustration, we examine the validity of the proposed estimation method using real and simulated data.
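
To illustrate the importance sampling step in general terms, here is a minimal sketch of a self-normalized importance sampling estimate of a posterior mean. It deliberately uses a toy model, not the paper's: uncensored exponential data with a gamma prior on the rate, chosen because the exact posterior mean is known and can be compared. The prior serves as the proposal, so each weight is just the likelihood. The function name, the data values, and the prior parameters are all assumptions for illustration.

```python
import math
import random


def posterior_mean_is(data, prior_shape, prior_rate, n_draws=50000, seed=0):
    """Self-normalized importance sampling estimate of E[rate | data].

    Toy model (ASSUMPTION, not the paper's): data ~ Exponential(rate),
    rate ~ Gamma(prior_shape, prior_rate). The proposal is the prior,
    so the importance weight of each draw is its likelihood.
    """
    rng = random.Random(seed)
    n, total = len(data), sum(data)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    draws = [rng.gammavariate(prior_shape, 1.0 / prior_rate)
             for _ in range(n_draws)]
    # Exponential log-likelihood: n * log(rate) - rate * sum(data).
    log_w = [n * math.log(lam) - lam * total for lam in draws]
    shift = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - shift) for lw in log_w]
    return sum(w * lam for w, lam in zip(weights, draws)) / sum(weights)


data = [0.2, 0.4, 0.5, 0.6, 0.8]  # made-up observations
prior_shape, prior_rate = 2.0, 1.0

estimate = posterior_mean_is(data, prior_shape, prior_rate)
# Conjugacy gives the exact answer for this toy model.
exact = (prior_shape + len(data)) / (prior_rate + sum(data))
print("importance sampling:", estimate, "exact:", exact)
```

For a censored-data likelihood such as the one in the paper, only the log-likelihood line and the choice of proposal would change; the weighting and normalization steps stay the same.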

THE DEVELOPMENT OF AN OBESITY INDEX MODEL AS A COMPLEMENT TO BMI FOR ADULT: USING THE BLOOD DATA OF KNHANES

  • Ko, Kwanghee;Oh, Chunyoung
    • Honam Mathematical Journal / v.43 no.4 / pp.717-739 / 2021
  • We used blood data to predict obesity as a complement to BMI-based risk, because some blood factors are significantly associated with obesity. The data, covering sixteen blood factors, were collected by the Korea National Health and Nutrition Examination Survey (KNHANES) using a two-stage stratified cluster sampling method. The final model retains 6 to 8 blood factors that are effective predictors of obesity, with the exact set differing somewhat by age and gender. In addition, the coefficient of determination, which represents the predictive power of the regression model for obesity, is highest for both men and women aged 19 and in their 20s and 30s, and the predictive power decreases with increasing age.