• Title/Abstract/Keywords: statistical data processing

679 search results; processing time 0.039 s

Improved Statistical Language Model for Context-sensitive Spelling Error Candidates

  • 이정훈;김민호;권혁철
    • Journal of Korea Multimedia Society / Vol. 20, No. 2 / pp. 371-381 / 2017
  • The performance of statistical context-sensitive spelling error correction depends on the quality and quantity of the data behind the statistical language model. In general, the quality of a statistical language model is proportional to the size of its data. However, as the amount of data increases, processing becomes slower and more storage space is required. To solve this problem, we suggest an improved statistical language model and propose an effective method for generating spelling error candidates based on it. The proposed model and the correction method built on it improve both the accuracy and the processing speed of spelling error correction.
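
As a rough illustration of how a statistical language model can rank context-sensitive spelling candidates, the sketch below scores each candidate by an add-one smoothed bigram probability. This is a generic textbook approach, not the improved model proposed in the paper; the toy corpus, the candidate sets, and all function names are assumptions made for the example.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count unigrams and bigrams, with a sentence-start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def best_candidate(prev, candidates, unigrams, bigrams):
    """Pick the candidate that is most probable after the previous word."""
    return max(candidates, key=lambda w: bigram_prob(prev, w, unigrams, bigrams))


# Hypothetical toy corpus and confusion set, for illustration only.
corpus = [
    "i want a piece of cake",
    "peace and quiet",
    "a piece of paper",
    "world peace now",
]
unigrams, bigrams = train_bigrams(corpus)
after_a = best_candidate("a", ["piece", "peace"], unigrams, bigrams)
after_world = best_candidate("world", ["piece", "peace"], unigrams, bigrams)
```

A real system would use far larger counts, which is where the trade-off between model size, storage, and speed described in the abstract arises.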

On-Line Analytical Processing and Research Problems for Statisticians

  • Ahn, JeongYong;Han, Kyung Soo
    • Communications for Statistical Applications and Methods / Vol. 7, No. 2 / pp. 457-463 / 2000
  • Recently, statistical analysis tools have shifted toward World Wide Web applications that access data stored in databases. On-line analytical processing (OLAP) is a class of technologies that gives users statistical information through multidimensional views of the data in databases. In this paper, we introduce the concept and requisites of an OLAP system and propose some research issues.
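
The idea of a multidimensional view can be sketched as a roll-up: the same fact records are summed along whichever dimensions the user chooses. The example below is a minimal sketch in plain Python; the records, dimension names, and the `roll_up` helper are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def roll_up(facts, dimensions, measure):
    """Sum a measure over all facts, grouped by the chosen dimensions."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Hypothetical fact table: each record carries dimensions plus one measure.
facts = [
    {"region": "East", "year": 2000, "product": "A", "sales": 10.0},
    {"region": "East", "year": 2000, "product": "B", "sales": 5.0},
    {"region": "West", "year": 2000, "product": "A", "sales": 7.0},
    {"region": "West", "year": 1999, "product": "A", "sales": 3.0},
]

by_region = roll_up(facts, ["region"], "sales")
by_region_year = roll_up(facts, ["region", "year"], "sales")
```

Changing the list of dimensions gives a coarser or finer view of the same data, which is the kind of interactive summary an OLAP system provides.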

A Study for the Features of Data Analysis Methods Used in Medical Research

  • 신재경;장덕준;문승호
    • Journal of the Korean Data and Information Science Society / Vol. 14, No. 2 / pp. 257-264 / 2003
  • In Korea's medical research, both awareness of the importance of statistical methods for processing medical data and the practical use of such methods are insufficient. From this standpoint, to examine the features of the data analysis methods used in the medical journals of Korea and America, we examined research papers published in representative medical journals of both countries. The results showed a large difference in quantity and quality between Korea and America. In Korean medical research in particular, the use of statistical methods was comparatively low. Researchers in the medical field are therefore encouraged to make greater use of statistical methods when processing medical data.

Practical Guide to NMR-based Metabolomics - III : NMR Spectrum Processing and Multivariate Analysis

  • Jung, Young-Sang
    • Journal of the Korean Magnetic Resonance Society / Vol. 22, No. 3 / pp. 46-53 / 2018
  • On the technical side, elucidating metabolic perturbation with NMR-based metabolomics requires knowledge of several areas: NMR experiments, NMR spectrum processing, raw data processing, metabolite identification, statistical analysis, and metabolic pathway analysis. Among these, some concepts in raw data processing and multivariate analysis are not easy to understand but are important for correctly interpreting a metabolic profile. This article introduces NMR spectrum processing, raw data processing, and multivariate analysis.

On the Bayesian Statistical Inference

  • 이호석
    • Korean Institute of Information Scientists and Engineers: Conference Proceedings / Proceedings of the Korea Computer Congress 2007, Vol. 34, No. 1 (C) / pp. 263-266 / 2007
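  • This paper discusses Bayesian statistical inference. It proceeds in the following order: Bayesian inference, Markov chains and Monte Carlo integration, MCMC (Markov Chain Monte Carlo) methods, the Metropolis-Hastings algorithm, Gibbs sampling, maximum likelihood estimation, the EM algorithm, techniques for imputing missing data, and BMA (Bayesian Model Averaging). These statistical techniques are used in fields that process large volumes of data, such as biology, medicine, biotechnology, science and engineering, and general data surveying and processing, and they provide important methods for drawing optimal inferences. Finally, the paper discusses PC (Principal Component) analysis, which is also widely used in data analysis and research.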
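
To make one of the listed techniques concrete, the sketch below implements a random-walk Metropolis sampler, the special case of Metropolis-Hastings in which the proposal is symmetric. It is a minimal illustration rather than code from the paper; the standard normal target, the step size, the burn-in length, and the function names are all assumptions.

```python
import math
import random
from statistics import mean, pvariance


def log_target(x):
    """Unnormalized log density of a standard normal (assumed target)."""
    return -0.5 * x * x


def metropolis_hastings(log_density, n_samples, step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis: propose x' = x + N(0, step), accept with
    probability min(1, p(x') / p(x)); the symmetric proposal cancels out."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_density(proposal) - log_density(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal  # accept the move
        samples.append(x)  # on rejection the current state is repeated
    return samples


samples = metropolis_hastings(log_target, 20000)
kept = samples[2000:]  # discard an initial burn-in portion
sample_mean = mean(kept)
sample_var = pvariance(kept)
```

With enough iterations, the mean and variance of the retained draws should approach 0 and 1, the moments of the target distribution.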

Comparison of different post-processing techniques in real-time forecast skill improvement

  • Jabbari, Aida;Bae, Deg-Hyo
    • Korea Water Resources Association: Conference Proceedings / Korea Water Resources Association 2018 Conference / p. 150 / 2018
  • Numerical Weather Prediction (NWP) models provide information for weather forecasts. Meteorological models simplify the highly nonlinear and complex interactions in the atmosphere through approximations and parameterization, and these simplifications may lead to biases and errors in model results. Although the models have improved over time, their biased outputs are still a matter of concern in meteorological and hydrological studies. Bias removal is thus an essential step before using the outputs of atmospheric models. The main idea of statistical bias correction methods is to develop a statistical relationship between modeled and observed variables over the same historical period. Model Output Statistics (MOS) is a desirable way to better match real-time forecast data with observation records. Statistical post-processing methods relate model outputs to the observed values at the sites of interest. In this study, three methods are used to remove the possible biases in the real-time outputs of the Weather Research and Forecast (WRF) model in the Imjin basin (North and South Korea): Linear Regression (LR), Linear Scaling (LS), and Power Scaling (PS). The MOS techniques used in this study include three main steps: preprocessing the historical data in the training set, developing the equations, and applying the equations to the validation set. The expected results show the accuracy of the real-time forecast data before and after bias correction. Comparing the different methods will identify the best method for enhancing forecast skill in a real-time case study.
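
The train-then-apply pattern described above can be sketched for two of the three methods. The abstract does not give the equations, so the forms below are common textbook versions and should be read as assumptions: linear scaling multiplies forecasts by the ratio of observed to modeled means, and linear regression fits observed values on modeled values by ordinary least squares. The numbers are made up for illustration.

```python
from statistics import mean


def linear_scaling_factor(observed, modeled):
    """Multiplicative factor that matches the modeled mean to the observed mean."""
    return mean(observed) / mean(modeled)


def fit_linear_regression(observed, modeled):
    """Ordinary least squares fit of observed = slope * modeled + intercept."""
    mx, my = mean(modeled), mean(observed)
    sxx = sum((x - mx) ** 2 for x in modeled)
    sxy = sum((x - mx) * (y - my) for x, y in zip(modeled, observed))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical training period: the model overestimates by a factor of two.
train_modeled = [2.0, 4.0, 6.0, 8.0]
train_observed = [1.0, 2.0, 3.0, 4.0]

# Steps 1-2: develop the correction equations on the training set.
factor = linear_scaling_factor(train_observed, train_modeled)
slope, intercept = fit_linear_regression(train_observed, train_modeled)

# Step 3: apply the equations to new forecasts (validation set).
validation_modeled = [10.0, 12.0]
ls_corrected = [factor * v for v in validation_modeled]
lr_corrected = [slope * v + intercept for v in validation_modeled]
```

Power scaling, which corrects the variance as well as the mean, needs an additional exponent to be fitted and is left out of this sketch.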

Quantitative Analysis for Plasma Etch Modeling Using Optical Emission Spectroscopy: Prediction of Plasma Etch Responses

  • Jeong, Young-Seon;Hwang, Sangheum;Ko, Young-Don
    • Industrial Engineering and Management Systems / Vol. 14, No. 4 / pp. 392-400 / 2015
  • Monitoring plasma etch processes for fault detection is one of the hallmark procedures in semiconductor manufacturing. Optical emission spectroscopy (OES) is considered a gold standard for modeling plasma etching processes for on-line diagnosis and monitoring. However, statistical quantitative methods for processing OES data are still lacking, and there is an urgent need for such a method to handle high-dimensional OES data and improve the quality of etched wafers. We therefore propose a robust relevance vector machine (RRVM) for regression with statistical quantitative features to model etch rate and uniformity in plasma etch processes using OES data. To deal effectively with the complexity of OES data, we reduce its dimensionality by identifying seven statistical features to extract from the raw OES data. The experimental results demonstrate that the proposed approach is more suitable for high-accuracy monitoring of plasma etch responses obtained from OES.

A Study on the Development of Index for Food Safety Status based on the Statistical Data

  • 양성범
    • Korean Journal of Organic Agriculture / Vol. 30, No. 1 / pp. 21-35 / 2022
  • Measurement of food safety has focused only on consumers' psychological perception of food safety. An actual measurement tool should consist of evidence-based statistical data so that the level of national food safety can be assessed from a scientific perspective. This paper describes the development of a concept for measuring the food safety of the food chain based on the OECD PSR framework. It discusses the elaboration of a set of 8 food safety related data items published as statistical data, which were weighted equally. These food safety statistical data (FSDs) served as the basis for measuring the variation in food safety during 2013-2019. The values of the primary production indicator (PPI), the processing and manufacturing indicator (PMI), and the distribution and consumption indicator (DCI) are 0.558-0.859, 0.533-0.691, and 0.979-0.982, respectively. The food safety status (FSS) derived from the safety indicator values of the three stages is 0.700-0.810. To raise the level of food safety, attention must be paid to PMI and PPI management. In the future, continuously calculating the level of food safety, managing it alongside the level of psychological safety, and extending it to comparisons of food safety between countries will help establish policies to improve the level of food safety in Korea.

A Data Mining Approach for a Dynamic Development of an Ontology-Based Statistical Information System

  • Mohamed Hachem Kermani;Zizette Boufaida;Amel Lina Bensabbane;Besma Bourezg
    • Journal of Information Science Theory and Practice / Vol. 11, No. 2 / pp. 67-81 / 2023
  • This paper presents the dynamic development of an ontology-based statistical information system that supports the collection, storage, processing, analysis, and presentation of statistical knowledge at the national scale. To accomplish this, we propose a data mining technique that dynamically collects data relating to citizens from publicly available data sources; the collected data are then structured, classified, categorized, and integrated into an ontology. Moreover, an intelligent platform is proposed to generate quantitative and qualitative statistical information based on the knowledge stored in the ontology. The main aims of the proposed system are to digitize administrative tasks and to provide reliable statistical information to governmental, economic, and social actors. The authorities will use the ontology-based statistical information system for strategic decision-making, as it easily collects, produces, analyzes, and provides both quantitative and qualitative knowledge that will help improve the administration and management of national political, social, and economic life.

A Study on Data Mining Using the Spline Basis

  • Lee, Sun-Geune;Sim, Songyong;Koo, Ja-Yong
    • Communications for Statistical Applications and Methods / Vol. 11, No. 2 / pp. 255-264 / 2004
  • Because data processing is now computerized, we often encounter huge data sets. At the same time, advances in computing technologies make it possible to deal with such data sets. One important area is data mining. In this paper, we consider data mining when the dependent variable is binary. The proposed method uses the poly-class model when the independent variables consist of both continuous and discrete variables. An example is provided.