• Title/Summary/Keyword: statistical data processing

Improved Statistical Language Model for Context-sensitive Spelling Error Candidates (문맥의존 철자오류 후보 생성을 위한 통계적 언어모형 개선)

  • Lee, Jung-Hun;Kim, Minho;Kwon, Hyuk-Chul
    • Journal of Korea Multimedia Society / v.20 no.2 / pp.371-381 / 2017
  • The performance of statistical context-sensitive spelling error correction depends on the quality and quantity of the data behind the statistical language model. In general, the quality of a statistical language model increases with the size of its data. However, as the amount of data increases, processing becomes slower and more storage space is required. To solve this problem, we suggest an improved statistical language model and propose an effective spelling error candidate generation method based on it. The proposed statistical model and the correction method based on it improve both the performance of spelling error correction and the processing speed.
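
As a minimal sketch of how a statistical language model can rank spelling correction candidates by context: the toy corpus, the confusion pair, and the add-one smoothed bigram model below are illustrative assumptions, not the improved model proposed in the paper.

```python
from collections import Counter

# Hypothetical toy corpus standing in for large training data.
corpus = "i want a piece of cake . we want peace on earth . a piece of paper".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab_size = len(unigrams)

def bigram_prob(prev, word):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def rank_candidates(prev, candidates, nxt):
    """Score each candidate c by P(c | prev) * P(nxt | c), best first."""
    scored = [(bigram_prob(prev, c) * bigram_prob(c, nxt), c) for c in candidates]
    return [c for _, c in sorted(scored, reverse=True)]

# Which word fits the context "a ___ of"?
best = rank_candidates("a", ["peace", "piece"], "of")
```

In this sketch the counts grow with the corpus, which is the speed and storage trade-off the abstract describes.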

On-Line Analytical Processing and Research Problems for Statisticians

  • Ahn, JeongYong;Han, Kyung Soo
    • Communications for Statistical Applications and Methods / v.7 no.2 / pp.457-463 / 2000
  • Recently, statistical analysis tools have been moving to World Wide Web applications that access data stored in databases. On-line analytical processing (OLAP) is a class of technologies that gives users statistical information through multidimensional views of data in databases. In this paper, we introduce the concept and requirements of an OLAP system and propose some research issues.
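
The idea of a multidimensional view can be illustrated with a small roll-up. This is only a sketch: the fact table, the dimension names, and the `roll_up` helper are hypothetical and do not come from the paper.

```python
from collections import defaultdict

# Hypothetical fact table: (region, year, product, sales).
facts = [
    ("Seoul", 1999, "A", 120),
    ("Seoul", 1999, "B", 80),
    ("Seoul", 2000, "A", 150),
    ("Busan", 1999, "A", 60),
    ("Busan", 2000, "B", 90),
]
DIMS = ("region", "year", "product")

def roll_up(rows, keep):
    """Sum the sales measure over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    cube = defaultdict(int)
    for row in rows:
        cube[tuple(row[i] for i in idx)] += row[3]
    return dict(cube)

by_region_year = roll_up(facts, ("region", "year"))  # two-dimensional view
by_region = roll_up(facts, ("region",))              # rolled up over year
total = roll_up(facts, ())                           # grand total
```

Each call aggregates the same facts at a different level of detail, which is the kind of view an OLAP system serves interactively.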

A Study for the Features of Data Analysis Methods Used in Medical Research

  • Sin, Jae-Gyeong;Jang, Deok-Jun;Mun, Seung-Ho
    • Journal of the Korean Data and Information Science Society / v.14 no.2 / pp.257-264 / 2003
  • In Korean medical research, the importance of statistical methods for processing medical data is insufficiently recognized, and the analysis methods are insufficiently used in practice. From this standpoint, to examine the features of the data analysis methods used in the medical journals of Korea and America, we examined research papers published in exemplary medical journals of both countries. The results showed a large difference in quantity and quality between Korea and America. In particular, the use of statistical methods in Korean medical research was comparatively low. Hence, researchers in the medical field are encouraged to use more statistical methods in processing medical data.

Practical Guide to NMR-based Metabolomics - III : NMR Spectrum Processing and Multivariate Analysis

  • Jung, Young-Sang
    • Journal of the Korean Magnetic Resonance Society / v.22 no.3 / pp.46-53 / 2018
  • Elucidating metabolic perturbation with NMR-based metabolomics requires knowledge of several technical aspects, such as NMR experiments, NMR spectrum processing, raw data processing, metabolite identification, statistical analysis, and metabolic pathway analysis. Among them, some concepts of raw data processing and multivariate analysis are not easy to understand but are important for correctly interpreting a metabolic profile. This article introduces NMR spectrum processing, raw data processing, and multivariate analysis.

On the Bayesian Statistical Inference (베이지안 통계 추론)

  • Lee, Ho-Suk
    • Proceedings of the Korean Information Science Society Conference / 2007.06c / pp.263-266 / 2007
  • This paper discusses Bayesian statistical inference, covering Bayesian inference, MCMC (Markov Chain Monte Carlo) integration, the MCMC method, the Metropolis-Hastings algorithm, Gibbs sampling, maximum likelihood estimation, the Expectation Maximization algorithm, missing data processing, and BMA (Bayesian Model Averaging). Bayesian statistical inference is used to process large amounts of data in biology, medicine, bioengineering, science and engineering, and general data analysis and processing, and it provides an important method for drawing optimal inference results. Lastly, this paper discusses principal component analysis (PCA), which is also used for data analysis and inference.
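
Of the methods listed, the Metropolis-Hastings algorithm is compact enough to sketch. The example below is a generic random-walk sampler, not code from the paper; the standard normal target, the step size, the seed, and the burn-in length are all assumptions chosen for illustration.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # Symmetric proposal: accept with probability min(1, p_new / p_old).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)  # a rejected proposal repeats the current state
    return samples

def log_std_normal(x):
    """Unnormalized log density of a standard normal (assumed target)."""
    return -0.5 * x * x

samples = metropolis_hastings(log_std_normal, x0=0.0, n_samples=20000)
kept = samples[1000:]  # discard burn-in
est_mean = sum(kept) / len(kept)
est_var = sum((s - est_mean) ** 2 for s in kept) / len(kept)
```

The estimated mean and variance should land near 0 and 1. Only the ratio of target densities is needed, so the normalizing constant can be left unknown, which is what makes the method useful for Bayesian posteriors.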

Comparison of different post-processing techniques in real-time forecast skill improvement

  • Jabbari, Aida;Bae, Deg-Hyo
    • Proceedings of the Korea Water Resources Association Conference / 2018.05a / pp.150-150 / 2018
  • Numerical Weather Prediction (NWP) models provide information for weather forecasts. The highly nonlinear and complex interactions in the atmosphere are simplified in meteorological models through approximations and parameterization. These simplifications may lead to biases and errors in model results. Although the models have improved over time, their biased outputs are still a matter of concern in meteorological and hydrological studies. Thus, bias removal is an essential step prior to using the outputs of atmospheric models. The main idea of statistical bias correction methods is to develop a statistical relationship between modeled and observed variables over the same historical period. Model Output Statistics (MOS) is desirable for better matching real-time forecast data with observation records. Statistical post-processing methods relate model outputs to the observed values at the sites of interest. In this study, three methods are used to remove possible biases from the real-time outputs of the Weather Research and Forecast (WRF) model in the Imjin basin (North and South Korea). The post-processing techniques are the Linear Regression (LR), Linear Scaling (LS), and Power Scaling (PS) methods. The MOS techniques used in this study include three main steps: preprocessing of the historical data in the training set, development of the equations, and application of the equations to the validation set. The expected results show the accuracy improvement of the real-time forecast data before and after bias correction. The comparison of the different methods will clarify the best method for forecast skill enhancement in a real-time case study.
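
Two of the three techniques, Linear Scaling and Linear Regression, can be sketched in a few lines. This assumes a multiplicative form for linear scaling and an ordinary least squares fit for the regression; the data values and function names are hypothetical, and Power Scaling is omitted because the abstract does not give its fitting procedure.

```python
from statistics import mean

def fit_linear_scaling(model_hist, obs_hist):
    """Multiplicative factor that matches the historical means (assumed form)."""
    return mean(obs_hist) / mean(model_hist)

def fit_linear_regression(model_hist, obs_hist):
    """Ordinary least squares fit of obs = slope * model + intercept."""
    mx, my = mean(model_hist), mean(obs_hist)
    sxx = sum((x - mx) ** 2 for x in model_hist)
    sxy = sum((x - mx) * (y - my) for x, y in zip(model_hist, obs_hist))
    slope = sxy / sxx
    return slope, my - slope * mx

# Step 1: hypothetical training set (modeled vs. observed values).
model_hist = [2.0, 4.0, 6.0, 8.0]
obs_hist = [2.5, 4.5, 6.5, 8.5]

# Step 2: develop the correction equations from the training set.
factor = fit_linear_scaling(model_hist, obs_hist)
slope, intercept = fit_linear_regression(model_hist, obs_hist)

# Step 3: apply the equations to new (validation) forecasts.
new_forecast = [10.0, 1.0]
ls_corrected = [factor * x for x in new_forecast]
lr_corrected = [slope * x + intercept for x in new_forecast]
```

On this toy data the model underestimates by a constant offset, so the regression recovers it exactly while linear scaling only matches the mean. That is the kind of difference a comparison of methods would surface.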

Quantitative Analysis for Plasma Etch Modeling Using Optical Emission Spectroscopy: Prediction of Plasma Etch Responses

  • Jeong, Young-Seon;Hwang, Sangheum;Ko, Young-Don
    • Industrial Engineering and Management Systems / v.14 no.4 / pp.392-400 / 2015
  • Monitoring of plasma etch processes for fault detection is one of the hallmark procedures in semiconductor manufacturing. Optical emission spectroscopy (OES) has been considered a gold standard for modeling plasma etching processes for on-line diagnosis and monitoring. However, statistical quantitative methods for processing OES data are still lacking. There is an urgent need for a statistical quantitative method that can deal with high-dimensional OES data to improve the quality of etched wafers. Therefore, we propose a robust relevance vector machine (RRVM) for regression with statistical quantitative features for modeling etch rate and uniformity in plasma etch processes using OES data. To deal effectively with the complexity of OES data, we identify seven statistical features to extract from raw OES data, thereby reducing the data dimensionality. The experimental results demonstrate that the proposed approach is more suitable for high-accuracy monitoring of plasma etch responses obtained from OES.

A Study on the Development of Index for Food Safety Status based on the Statistical Data (식품안전수준에 대한 지수 개발 연구)

  • Yang, Sung-Bum
    • Korean Journal of Organic Agriculture / v.30 no.1 / pp.21-35 / 2022
  • The measurement of food safety has focused only on consumers' psychological perception of food safety. An actual measurement tool should consist of evidence-based statistical data so that the level of national food safety can be assessed from a scientific perspective. This paper describes the development of a concept for measuring the food safety of the food chain based on the OECD PSR framework. It elaborates a set of 8 food safety related data items published as statistical data, which were weighted equally. These food safety statistical data (FSDs) served as the basis for measuring the variation of food safety during 2013-2019. The values of the primary production indicator (PPI), the processing and manufacturing indicator (PMI), and the distribution and consumption indicator (DCI) are 0.558-0.859, 0.533-0.691, and 0.979-0.982, respectively. The food safety status (FSS) derived from the safety indicator values of the three stages is 0.700-0.810. To raise the level of food safety, attention must be paid to PMI and PPI management. In the future, continuously calculating the level of food safety, managing it like the level of psychological safety, and extending it to comparisons of food safety between countries will help establish policies to improve the level of food safety in Korea.

A Data Mining Approach for a Dynamic Development of an Ontology-Based Statistical Information System

  • Mohamed Hachem Kermani;Zizette Boufaida;Amel Lina Bensabbane;Besma Bourezg
    • Journal of Information Science Theory and Practice / v.11 no.2 / pp.67-81 / 2023
  • This paper presents a dynamic development of an ontology-based statistical information system supporting the collection, storage, processing, analysis, and the presentation of statistical knowledge at the national scale. To accomplish this, we propose a data mining technique to dynamically collect data relating to citizens from publicly available data sources; the collected data will then be structured, classified, categorized, and integrated into an ontology. Moreover, an intelligent platform is proposed in order to generate quantitative and qualitative statistical information based on the knowledge stored in the ontology. The main aims of our proposed system are to digitize administrative tasks and to provide reliable statistical information to governmental, economic, and social actors. The authorities will use the ontology-based statistical information system for strategic decision-making as it easily collects, produces, analyzes, and provides both quantitative and qualitative knowledge that will help to improve the administration and management of national political, social, and economic life.

A Study on Data Mining Using the Spline Basis

  • Lee, Sun-Geune;Sim, Songyong;Koo, Ja-Yong
    • Communications for Statistical Applications and Methods / v.11 no.2 / pp.255-264 / 2004
  • Owing to computerized data processing, we often encounter huge data sets. At the same time, advances in computing technologies make it possible to deal with them. One important area is data mining. In this paper we consider data mining when the dependent variable is binary. The proposed method uses the poly-class model when the independent variables consist of both continuous and discrete variables. An example is provided.