• Title/Abstract/Keywords: Non-Gaussian data

Search results: 158 items (processing time 0.023 s)

Study on Anomaly Detection Method of Improper Foods Using Import Food Big Data

  • 조상구;최경현
    • 한국빅데이터학회지
    • /
    • Vol. 3, No. 2
    • /
    • pp.19-33
    • /
    • 2018
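  • With the growing number of FTAs, rising food trade, and consumers' increasingly diverse food preferences, imports of agricultural, livestock, and fishery products and of processed foods are increasing every year. Detailed inspections that verify the safety of imported food account for about 20% of all import cases and continue to grow, while the government budget and personnel needed for import safety management are reaching their limits. Because an imported-food safety incident can cause enormous social and economic losses, accurately predicting whether an imported food should be admitted and responding preemptively can greatly improve the efficiency and cost-effectiveness of import safety management. A vast amount of structured data has already accumulated in the food sector, but it has not yet been sufficiently analyzed and put to use. Processed foods account for an average of 75% of total import cases and weight, so there is a pressing need for research on a scientific, automated nonconformity detection system that extracts meaningful information from large volumes of data through big data analysis and the application of analytical techniques. Against this background, this study applied various nonconformity prediction models from machine learning and, as a way to improve their accuracy, proposed a data preprocessing approach based on generating new derived variables. The study also compared the performance of prediction models built on common base classifiers from machine learning; among them, the Gaussian Naïve Bayes model performed best at detecting and predicting nonconforming imported foods. Applying a nonconformity detection model based on Gaussian Naïve Bayes is expected to lower the share of detailed inspections and raise the nonconformity rate, thereby greatly improving the efficiency of national import safety management and the speed of import clearance.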
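As a rough illustration of the Gaussian Naïve Bayes classifier named in the abstract above, the sketch below fits per-class feature means and variances and picks the class with the highest log posterior. The class `TinyGaussianNB`, the two features, the class sizes, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np


class TinyGaussianNB:
    """Minimal Gaussian Naive Bayes: independent normal features per class."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small constant keeps every variance strictly positive.
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)[:, None, :]  # shape (n, 1, d)
        # Sum of per-feature Gaussian log densities, shape (n, n_classes).
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)
                          + (X - self.mean_) ** 2 / self.var_).sum(axis=2)
        return self.classes_[np.argmax(log_lik + self.log_prior_, axis=1)]


# Synthetic stand-in data (assumption): label 0 = conforming, 1 = nonconforming.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
                     rng.normal(3.0, 1.0, size=(40, 2))])
y_train = np.array([0] * 200 + [1] * 40)

model = TinyGaussianNB().fit(X_train, y_train)
predictions = model.predict([[0.0, 0.0], [3.0, 3.0]])
```

The class prior enters through `log_prior_`, so an imbalanced training set (few nonconforming cases) shifts the decision boundary toward the majority class.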

Development of a novel fatigue damage model for Gaussian wide band stress responses using numerical approximation methods

  • Jun, Seock-Hee;Park, Jun-Bum
    • International Journal of Naval Architecture and Ocean Engineering
    • /
    • Vol. 12, No. 1
    • /
    • pp.755-767
    • /
    • 2020
  • A significant development has been made on a new fatigue damage model applicable to Gaussian wide band stress response spectra using numerical approximation methods such as data processing, time simulation, and regression analysis. So far, most of the alternative approximate models provide slightly underestimated or overestimated damage results compared with the rain-flow counting distribution. A more reliable approximate model that can minimize the damage differences between exact and approximate solutions is required for the practical design of ships and offshore structures. The present paper provides a detailed description of the development process of a new fatigue damage model. Based on the principle of the Gaussian wide band model, this study aims to develop the best approximate fatigue damage model. To obtain highly accurate damage distributions, this study deals with some prominent research findings, i.e., the moment of rain-flow range distribution MRR(n), the special bandwidth parameter μk, the empirical closed form model consisting of four probability density functions, and the correction factor QC. Sequential prerequisite data processes, such as creation of various stress spectra, extraction of stress time history, and the rain-flow counting stress process, are conducted so that these research findings provide much better results. Through comparison studies, the proposed model shows more reliable and accurate damage distributions, very close to those of the rain-flow counting solution. Several significant achievements and findings obtained from this study are suggested. Further work is needed to apply the newly developed model to crack growth prediction under a random stress process in view of the engineering critical assessment of offshore structures. The formulation and procedure developed here also need to be extended to non-Gaussian wide band processes.

IMPLEMENTATION OF DATA ASSIMILATION METHODOLOGY FOR PHYSICAL MODEL UNCERTAINTY EVALUATION USING POST-CHF EXPERIMENTAL DATA

  • Heo, Jaeseok;Lee, Seung-Wook;Kim, Kyung Doo
    • Nuclear Engineering and Technology
    • /
    • Vol. 46, No. 5
    • /
    • pp.619-632
    • /
    • 2014
  • The Best Estimate Plus Uncertainty (BEPU) method has been widely used to evaluate the uncertainty of a best-estimate thermal hydraulic system code against a figure of merit. This uncertainty is typically evaluated based on the physical model's uncertainties determined by expert judgment. This paper introduces the application of data assimilation methodology to determine the uncertainty bands of the physical models, e.g., the mean value and standard deviation of the parameters, based upon the statistical approach rather than expert judgment. Data assimilation suggests a mathematical methodology for the best estimate bias and the uncertainties of the physical models which optimize the system response following the calibration of model parameters and responses. The mathematical approaches include deterministic and probabilistic methods of data assimilation to solve both linear and nonlinear problems with the a posteriori distribution of parameters derived based on Bayes' theorem. The inverse problem was solved analytically to obtain the mean value and standard deviation of the parameters assuming Gaussian distributions for the parameters and responses, and a sampling method was utilized to illustrate the non-Gaussian a posteriori distributions of parameters. SPACE is used to demonstrate the data assimilation method by determining the bias and the uncertainty bands of the physical models employing Bennett's heated tube test data and Becker's post critical heat flux experimental data. Based on the results of the data assimilation process, the major sources of the modeling uncertainties were identified for further model development.
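The analytic Gaussian case mentioned in the abstract above can be illustrated with a textbook conjugate update for a single parameter. This is a minimal sketch under assumed conditions (a scalar parameter `theta`, a linear response `y_i = a_i * theta + noise`, known noise level); the function name and all numbers are hypothetical and do not come from the paper or from SPACE.

```python
import numpy as np


def gaussian_posterior(prior_mean, prior_std, a, y, noise_std):
    """Posterior mean and std of theta for y_i = a_i * theta + e_i,
    with e_i ~ N(0, noise_std^2) and prior theta ~ N(prior_mean, prior_std^2)."""
    a, y = np.asarray(a, dtype=float), np.asarray(y, dtype=float)
    prior_prec = 1.0 / prior_std ** 2
    # Precisions add; the data contribute sum(a_i^2) / noise_std^2.
    post_prec = prior_prec + np.sum(a ** 2) / noise_std ** 2
    post_mean = (prior_prec * prior_mean + np.sum(a * y) / noise_std ** 2) / post_prec
    return post_mean, np.sqrt(1.0 / post_prec)


# One observation y = 2 with unit sensitivity, unit noise, and a N(0, 1) prior.
post_mean, post_std = gaussian_posterior(0.0, 1.0, a=[1.0], y=[2.0], noise_std=1.0)
```

With equal prior and noise precision the posterior mean lands halfway between the prior mean and the observation, and the variance is halved. Non-Gaussian posteriors, as the abstract notes, need a sampling method instead of this closed form.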

Pseudo Complex Correlation Coefficient: with Application to Correlated Information Sources for NOMA in 5G systems

  • Chung, Kyuhyuk
    • International journal of advanced smart convergence
    • /
    • Vol. 9, No. 4
    • /
    • pp.42-51
    • /
    • 2020
  • In this paper, the authors propose the pseudo complex correlation coefficient (PCCC) of two complex random variables (RVs), because the four real correlation coefficients (RCCs) of the corresponding four real RVs cannot be obtained from the complex correlation coefficient (CCC) of the two given complex RVs alone. This observation is motivated by the general statement: "The complex jointly-Gaussian random M-vector cannot be completely described by the complex covariance matrix, even though the real Gaussian random 2M-vector can be completely described by the real covariance matrix. Therefore, in order to describe completely the complex jointly-Gaussian random M-vector, we need an additional matrix, namely the complex pseudo-covariance matrix, along with the complex covariance matrix." Then, we apply the PCCC to correlated information sources (CIS) for non-orthogonal multiple access (NOMA) in 5G systems, and investigate the impact of the proposed PCCC on the achievable data rate of the stronger channel user in conventional successive interference cancellation (SIC) NOMA with CIS. It is shown that, for the same given CCC, the achievable data rates with different PCCCs are different, because the corresponding RCCs are different. We also show that as the absolute value of the same CCC increases, the impact of the different PCCCs becomes more significant.
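The quoted statement about the covariance and pseudo-covariance can be checked numerically. The sketch below, under assumed synthetic data, builds two complex random variables with the same variance but different pseudo-variance; the helper names `complex_cov` and `pseudo_cov` are illustrative and are not the paper's PCCC definition.

```python
import numpy as np


def complex_cov(z, w):
    """Sample complex covariance E[(z - mz) * conj(w - mw)]."""
    z, w = np.asarray(z, dtype=complex), np.asarray(w, dtype=complex)
    return np.mean((z - z.mean()) * np.conj(w - w.mean()))


def pseudo_cov(z, w):
    """Sample pseudo-covariance E[(z - mz) * (w - mw)], without the conjugate."""
    z, w = np.asarray(z, dtype=complex), np.asarray(w, dtype=complex)
    return np.mean((z - z.mean()) * (w - w.mean()))


rng = np.random.default_rng(0)
n = 200_000
x, y = rng.standard_normal(n), rng.standard_normal(n)

circular = (x + 1j * y) / np.sqrt(2)  # real and imaginary parts i.i.d.
real_only = x.astype(complex)         # all variance sits in the real part

var_circular = complex_cov(circular, circular)   # close to 1
var_real = complex_cov(real_only, real_only)     # close to 1
pvar_circular = pseudo_cov(circular, circular)   # close to 0
pvar_real = pseudo_cov(real_only, real_only)     # close to 1
```

Both variables have nearly identical variance, yet only the pseudo-variance reveals how that variance is split between the real and imaginary parts.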

A Simple Tandem Method for Clustering of Multimodal Dataset

  • Cho C.;Lee J.W.;Lee J.W.
    • Korean Operations Research and Management Science Society: Conference Proceedings
    • /
    • Korean Operations Research and Management Science Society / Korean Institute of Industrial Engineers 2003 Spring Joint Conference
    • /
    • pp.729-733
    • /
    • 2003
  • Local features within clusters, caused by the multi-modal nature of the data, prevent many conventional clustering techniques from working properly. In particular, clustering datasets with non-Gaussian distributions within a cluster can be problematic when the technique implicitly assumes a Gaussian distribution. The current study proposes a simple tandem clustering method composed of a k-means type algorithm and a hierarchical method to solve such problems. The multi-modal dataset is first divided into many small pre-clusters by the k-means or fuzzy k-means algorithm. The pre-clusters found in the first step are then clustered again using agglomerative hierarchical clustering with the Kullback-Leibler divergence as the measure of dissimilarity. This method is not only effective at extracting multi-modal clusters but also fast and simple in terms of computational complexity, and relatively robust in the presence of outliers. The performance of the proposed method was evaluated on three generated datasets and six publicly known real-world datasets.
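The two-step procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each pre-cluster is summarized by a fitted Gaussian, dissimilarity is the symmetrised Kullback-Leibler divergence between those Gaussians, and average linkage is used for merging. The abstract does not specify any of these choices; the function names, the regularisation `reg`, and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.cluster.vq import kmeans2
from scipy.spatial.distance import squareform


def sym_kl(m0, c0, m1, c1):
    """Symmetrised KL divergence between N(m0, c0) and N(m1, c1)."""
    ic0, ic1 = np.linalg.inv(c0), np.linalg.inv(c1)
    dm = m1 - m0
    return 0.5 * (np.trace(ic1 @ c0) + np.trace(ic0 @ c1)
                  - 2 * m0.size + dm @ (ic0 + ic1) @ dm)


def tandem_cluster(X, n_pre, n_final, reg=1e-3, seed=0):
    # Step 1: split the data into many small pre-clusters with k-means.
    _, pre = kmeans2(X, n_pre, minit="++", seed=seed)
    ids = np.unique(pre)  # ignore pre-clusters that ended up empty
    d = X.shape[1]
    means = [X[pre == i].mean(axis=0) for i in ids]
    # Regularised covariances stay invertible even for tiny pre-clusters.
    covs = [np.atleast_2d(np.cov(X[pre == i], rowvar=False, bias=True))
            + reg * np.eye(d) for i in ids]
    # Step 2: agglomerative clustering of pre-clusters on the KL dissimilarity.
    k = len(ids)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            D[i, j] = D[j, i] = sym_kl(means[i], covs[i], means[j], covs[j])
    Z = linkage(squareform(D, checks=False), method="average")
    merged = fcluster(Z, t=n_final, criterion="maxclust")
    # Map every point from its pre-cluster to the merged cluster label.
    return merged[np.searchsorted(ids, pre)]


# Two well-separated clusters, each bimodal (synthetic data, assumption).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(100, 2))
               for c in [(0, 0), (3, 0), (50, 50), (53, 50)]])
labels = tandem_cluster(X, n_pre=6, n_final=2)
```

Because merging operates on a handful of pre-clusters rather than on all points, the hierarchical step stays cheap even for large datasets.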


Estimation on the Turbulence Characteristics of Daily Instantaneous Maximum Wind Velocity

  • 오종섭
    • 한국방재안전학회논문집
    • /
    • Vol. 10, No. 1
    • /
    • pp.75-84
    • /
    • 2017
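  • In wind-resistant design, Korea uses the 10-minute mean wind speed as the basic wind speed. However, it is becoming known that instantaneous maximum wind speeds, driven by climate change and by the direct and indirect effects and growing intensity of typhoons, have a greater effect on structures, and some other countries use a 3-second mean wind speed to account for this instantaneous effect. In this paper, representative sites (17 sites) were selected to evaluate the stochastic process, statistical properties, and turbulence characteristics of the daily instantaneous maximum wind velocity from 1973 to 2016. The daily instantaneous maximum wind velocity data for each selected site were obtained from the Korea Meteorological Administration. Analysis of the data led to the following conclusions. 1. Jeju, Seogwipo, Yeosu, and Busan showed values of 0.2-0.35% in July, August, and September, and Seoul and Daegwallyeong showed 0.25% in March, April, and May. 2. In the skewness evaluation of the stochastic process, inland regions showed greater non-Gaussianity than coastal regions. 3. In the evaluation of correlation coefficients between adjacent sites, the values ranked Seoul-Incheon (0.8), Daejeon-Cheongju (0.75), and Jeju-Seogwipo (0.72), whereas Daegwallyeong-Gangneung (-0.07) and Jeonju-Gunsan (0.0) showed almost no influence between adjacent sites.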
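Skewness as an indicator of non-Gaussianity, as used in the second conclusion above, can be computed from central moments. The sketch below uses the standard moment-based sample skewness; the input values are made-up numbers, not wind records from the paper.

```python
import numpy as np


def skewness(x):
    """Moment-based sample skewness m3 / m2**1.5 (zero for symmetric data)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5


symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]  # illustrative values only
right_tailed = [0.0, 0.0, 0.0, 4.0]      # one large value stretches the right tail

skew_symmetric = skewness(symmetric)
skew_right = skewness(right_tailed)
```

A Gaussian process has zero skewness, so the further the estimate is from zero, the stronger the departure from Gaussianity in that respect.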

STATISTICAL STUDY ON PERSONAL REDUCTION COEFFICIENTS OF SUNSPOT NUMBERS SINCE 1981

  • Cho, Il-Hyun;Bong, Su-Chan;Cho, Kyung-Suk;Lee, Jaejin;Kim, Rok-Soon;Park, Young-Deuk;Kim, Yeon-Han
    • 천문학회지
    • /
    • Vol. 47, No. 6
    • /
    • pp.255-258
    • /
    • 2014
  • Using sunspot number data from 270 historical stations for the period 1981-2013, we investigate their personal reduction coefficients (k) statistically. Chang & Oh (2012) perform a simulation showing that the k varies with the solar cycle. We try to verify their results using observational data. For this, a weighted mean and weighted standard deviation of monthly sunspot number are used to estimate the error from observed data. We find that the observed error (noise) is much smaller than that used in the simulation. Thus no distinct k-variation with the solar cycle is observed contrary to the simulation. In addition, the probability distribution of k is determined to be non-Gaussian with a fat-tail on the right side. This result implies that the relative sunspot number after 1981 might be overestimated since the mean value of k is less than that of the Gaussian distribution.
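A weighted mean and weighted standard deviation of the kind mentioned above can be computed as in the sketch below. The abstract does not say which weights or normalisation the authors used, so the formula here (weights normalised to sum to one, no small-sample correction) and the sample numbers are assumptions.

```python
import numpy as np


def weighted_mean_std(values, weights):
    """Weighted mean and weighted (population-style) standard deviation."""
    v, w = np.asarray(values, dtype=float), np.asarray(weights, dtype=float)
    mean = np.average(v, weights=w)
    var = np.average((v - mean) ** 2, weights=w)
    return mean, np.sqrt(var)


# Hypothetical monthly sunspot numbers from two stations with weights 1 and 3.
w_mean, w_std = weighted_mean_std([10.0, 20.0], [1.0, 3.0])
```

The weighted standard deviation then serves as an estimate of the scatter between stations for that month.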

A Study for Recent Development of Generalized Linear Mixed Model

  • 이준영
    • 응용통계연구
    • /
    • Vol. 13, No. 2
    • /
    • pp.541-562
    • /
    • 2000
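  • Generalized linear mixed models (GLMMs) are used to model categorical data in the form of counts, clustered or overdispersed non-normal data, and data that follow nonlinear models. This study gives an overview of the model and surveys, among the statistical techniques proposed for fitting it, estimation methods based on quasi-likelihood (QL) and estimation methods based on Monte Carlo techniques. It also discusses current research directions for GLMMs and possible topics for future research.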


Rao-Blackwellized Multiple Model Particle Filter Data Fusion Algorithm

  • 김도형
    • 한국항행학회논문지
    • /
    • Vol. 15, No. 4
    • /
    • pp.556-561
    • /
    • 2011
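  • Particle filters are generally known to outperform the Kalman filter in target tracking for nonlinear systems. However, particle filters have the drawback of requiring a large amount of computation. The Rao-Blackwellized particle filter achieves the same performance as a particle filter with fewer particles and thus less computation. This paper introduces the Rao-Blackwellized Multiple Model Particle Filter (RBMMPF) algorithm, which reduces the model sensitivity of the Rao-Blackwellized particle filter, and applies to it a data fusion technique that fuses multi-sensor information. Through simulation, the target tracking performance of the RBMMPF using single-sensor information is compared with that of the RBMMPF using fused multi-sensor information.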

Towards high-accuracy data modelling, uncertainty quantification and correlation analysis for SHM measurements during typhoon events using an improved most likely heteroscedastic Gaussian process

  • Qi-Ang Wang;Hao-Bo Wang;Zhan-Guo Ma;Yi-Qing Ni;Zhi-Jun Liu;Jian Jiang;Rui Sun;Hao-Wei Zhu
    • Smart Structures and Systems
    • /
    • Vol. 32, No. 4
    • /
    • pp.267-279
    • /
    • 2023
  • Data modelling and interpretation for structural health monitoring (SHM) field data are critical for evaluating structural performance and quantifying the vulnerability of infrastructure systems. In order to improve data modelling accuracy and extend the application range from data regression analysis to out-of-sample forecasting analysis, an improved most likely heteroscedastic Gaussian process (iMLHGP) methodology is proposed in this study by incorporating an out-of-sample forecasting algorithm. The proposed iMLHGP method overcomes the constant-variance limitation of the Gaussian process (GP) and can be used for estimating non-stationary typhoon-induced response statistics with high volatility. A first attempt at performing data regression and forecasting analysis on structural responses using the proposed iMLHGP method is presented by applying it to real-world field SHM data from an instrumented cable-stayed bridge during typhoon events. Uncertainty quantification and correlation analysis were also carried out to investigate the influence of typhoons on bridge strain data. Results show that the iMLHGP method has high accuracy in both regression and out-of-sample forecasting. The iMLHGP framework takes into account both data heteroscedasticity and accurate analytical processing of the noise variance (replaced with a point estimate at the most likely value) to avoid intensive computational effort. According to the uncertainty quantification and correlation analysis results, the uncertainties of strain measurements are affected by both traffic and wind speed. The overall change of bridge strain is affected by temperature, and the local fluctuation is greatly affected by wind speed in typhoon conditions.