• Title/Summary/Keyword: 통계 오류 (statistical error)

Search Result 384, Processing Time 0.027 seconds

Two Statistical Models for Automatic Word Spacing of Korean Sentences (한글 문장의 자동 띄어쓰기를 위한 두 가지 통계적 모델)

  • 이도길;이상주;임희석;임해창
    • Journal of KIISE: Software and Applications / v.30 no.3_4 / pp.358-371 / 2003
  • Automatic word spacing is the process of deciding the correct boundaries between words in a sentence that contains spacing errors. It is important for improving readability and for conveying the accurate meaning of a text to the reader. Previous statistical approaches to automatic word spacing do not consider the previous spacing state and therefore inevitably estimate inaccurate probabilities. In this paper, we propose two statistical word spacing models that solve this problem. The proposed models are based on the observation that automatic word spacing can be regarded as a classification problem, like POS tagging. By generalizing hidden Markov models, the models can consider a broader context and estimate more accurate probabilities. We evaluated the proposed models under a wide range of experimental conditions in order to compare them with the current state of the art, and we also provide a detailed error analysis of our models. The experimental results show that the proposed models achieve a syllable-unit accuracy of 98.33% and an Eojeol-unit precision of 93.06% under an evaluation method that takes compound nouns into account.

A comparison of imputation methods for the consecutive missing temperature data (연속적 결측이 존재하는 기온 자료에 대한 결측복원 기법의 비교)

  • Kim, Hee-Kyung;Kang, In-Kyeong;Lee, Jae-Won;Lee, Yung-Seop
    • The Korean Journal of Applied Statistics / v.29 no.3 / pp.549-557 / 2016
  • Consecutive missing values are likely to occur in long climate records because of system errors or defective equipment, and such values are difficult to impute. However, the problem can be overcome by imputing the missing values from reference time series, which must be composed of time series similar to the series that contains the missing values. We performed a simulation to compare three imputation methods (the adjusted normal ratio method, the regression method and the IDW method) for completing the missing values of a time series. A comparison of the three methods for daily mean temperatures at 14 climatological stations indicated that the IDW method was better than the others at south seaside stations. We also found that the regression method was better than the others at most stations (except the south seaside stations).
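
As a rough illustration of the IDW (inverse distance weighting) idea named in this abstract, the sketch below estimates one missing value from nearby reference stations. It is not the authors' implementation: the function name `idw_impute`, the planar coordinates, and the default exponent `power=2.0` are all assumptions made for the example.

```python
import math


def idw_impute(target_xy, neighbors, power=2.0):
    """Estimate a missing value as an inverse-distance-weighted mean.

    target_xy: (x, y) of the station with the missing value (assumed planar).
    neighbors: list of ((x, y), value) pairs from reference stations.
    power:     distance exponent; 2.0 is a common choice, assumed here.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in neighbors:
        distance = math.hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0.0:
            # A co-located reference station determines the value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one reference station is required")
    return weighted_sum / weight_total
```

For example, with reference values 10.0 at distance 1 and 20.0 at distance 2, the weights are 1 and 0.25, so the estimate is 12.0: closer stations dominate the result.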

The NHPP Bayesian Software Reliability Model Using Latent Variables (잠재변수를 이용한 NHPP 베이지안 소프트웨어 신뢰성 모형에 관한 연구)

  • Kim, Hee-Cheul;Shin, Hyun-Cheul
    • Convergence Security Journal / v.6 no.3 / pp.117-126 / 2006
  • Bayesian inference and model selection methods for software reliability growth models are studied. Software reliability growth models are used in the testing stages of software development to model the error content and the time intervals between software failures. In this paper, multiple integration is avoided by using Gibbs sampling, a Markov chain Monte Carlo method, to compute the posterior distribution. Bayesian inference for general order statistics models in software reliability with diffuse prior information is studied, together with a model selection method. For model determination and selection, goodness of fit (the error sum of squares) and trend tests are explored. The methodology developed in this paper is illustrated with a random software reliability data set generated from a Weibull distribution (shape 2, scale 5) with the Minitab (version 14) statistical package.

Reanalysis of 2002 Donation Frequency Data: Corrections and Supplements (2002년 기부횟수 자료의 재분석: 수정 및 보완)

  • Kim, Byung Soo;Lee, Juhyung;Kim, Inyoung;Park, Su-Bum;Park, Tae-Kyu
    • The Korean Journal of Applied Statistics / v.27 no.5 / pp.743-753 / 2014
  • Kim et al. (2006) and Kim et al. (2009) reported a set of explanatory variables affecting donation frequency when they analyzed nationwide survey data on donations collected in 2002 by Volunteer 21, a nonprofit organization in Korea. The primary purpose of this paper is to correct computational errors found in Kim et al. (2006) and Kim et al. (2009), to rectify major results in the Tables and Figures and to supplement Kim et al. (2009) by providing new results. We add two logistic regressions to the ZIP and a mixture of two Poisson regressions of Kim et al. (2009). Through these two logistic regressions we could detect a set of explanatory variables affecting donation activity (0 or 1) and another set of explanatory variables, in which the volunteer (0, 1) variable is common, discriminating the infrequent donor group from the frequent donor group.

A Study on Online Detection Schemes of Earthquake Induced Shifts in Coordinate Time Series of GNSS Continuous Operation Reference Station by Kalman Filtering (칼만필터에 기반한 GNSS 상시관측소 좌표 시계열의 지진에 따른 편의검출 기법에 관한 연구)

  • Lee, Hungkyu
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.9 / pp.662-671 / 2020
  • It is crucial to manage and maintain the geodetic reference coordinates of GNSS continuously operating reference stations (CORSs), given their fundamental role in geodetic control and in positioning and navigation infrastructure. Earthquake-induced crustal displacement directly affects the reference coordinates, so such events should be detected promptly and appropriate action, including an update of the geodetic coordinates, should be taken to maintain the target accuracy. To this end, this paper deals with online schemes for detecting persistent shifts in the coordinate time series produced by an automatic GNSS processing system. Algorithms were implemented to test the filtered results, such as hypothesis tests of the innovation sequence of a Kalman filter and a cumulative sum (CUSUM) test. The schemes were assessed with two years of coordinate time series from 14 CORSs, a period that includes the 2011 Tohoku earthquake. The results show that the global hypothesis test is practical for detecting abrupt jumps, whereas CUSUM is effective for identifying persistent shifts.
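
The CUSUM test mentioned in this abstract can be sketched in a generic form as follows. This is a textbook two-sided tabular CUSUM on a plain series, not the paper's Kalman-filter-based scheme. The function name `cusum_alarm` and the parameters `target`, `slack` and `threshold` are assumptions for illustration.

```python
def cusum_alarm(values, target, slack, threshold):
    """Return the index of the first CUSUM alarm, or None if there is none.

    values:    observed series (e.g. a coordinate component per epoch).
    target:    expected level of the series when no shift is present.
    slack:     allowance subtracted at each step so small noise does not accumulate.
    threshold: decision limit; an alarm is raised when either sum exceeds it.
    """
    upper = 0.0  # accumulates evidence of an upward shift
    lower = 0.0  # accumulates evidence of a downward shift
    for index, value in enumerate(values):
        upper = max(0.0, upper + (value - target - slack))
        lower = max(0.0, lower + (target - value - slack))
        if upper > threshold or lower > threshold:
            return index
    return None
```

Because the sums accumulate small deviations over several epochs, a sustained offset eventually crosses the threshold even when no single sample looks unusual, which fits the abstract's observation that CUSUM suits persistent shifts.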

Core Demand Market by Visitor's Characteristics of Mountain Types of a National Park -focused on Demographic and Social Economical Factors- (국립공원 방문객 특성을 이용한 핵심수요시장연구 -인구통계학적 변인과 사회경제학적 변인을 중심으로-)

  • Gwak, Gang-Hee
    • The Journal of the Korea Contents Association / v.13 no.7 / pp.361-368 / 2013
  • This research aims to provide the information needed to increase demand at the level of marketing strategy by investigating the demographic characteristics and socioeconomic variables of visitors to Mudeungsan. An appropriate analysis model is needed, because applying a regression model suited to a continuous random variable leads to serious errors in the parameter estimates when the dependent variable is a discrete random variable with a discrete probability distribution. The data were therefore analyzed with a Poisson model. However, because the data showed overdispersion, the parameters were estimated with the Binomial Poisson model, which can handle this problem. As a result, several explanatory variables turned out to be significant, including the visitor's age, occupation, preferred season to visit, type of company, five-day work week, and preferred type of tourism. Based on these influential variables, the author offers the national park information about the characteristics of its core market and a marketing strategy for it.
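
Overdispersion, the reason this abstract gives for moving beyond the plain Poisson model, can be screened for with a simple variance-to-mean ratio. The sketch below is only an informal check under the assumption that the counts are treated as one sample without covariates. The function name `dispersion_index` is hypothetical and the abstract does not say which diagnostic the author used.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the sample variance divided by the sample mean of count data.

    A Poisson variable has variance equal to its mean, so a ratio well above 1
    suggests overdispersion relative to the Poisson model.
    """
    average = mean(counts)
    if average == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / average
```

For the counts `[0, 0, 1, 1, 2, 8]` the mean is 2 and the sample variance is 9.2, giving a ratio of 4.6, which would point away from a plain Poisson fit.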

Study on algorithm of blind modulation detector in EDGE systems (EDGE 시스템에서 블라인드 변조 검출기의 알고리즘에 관한 연구)

  • Park, Hong-Won;Moon, Hong-Youl;Woo, Sung-Hyun;Kim, Jin-Hee
    • Aerospace Engineering and Technology / v.9 no.1 / pp.67-71 / 2010
  • In this study, an algorithm for blind modulation detection in EDGE systems is presented. EDGE adds 8PSK modulation to the existing GSM system to provide high data rates. A transmitter may dynamically switch the modulation and coding scheme used for data transmission according to the channel quality. To decode the data correctly, the receiver has to detect which modulation is in use from the training sequence alone. The algorithm exploits the fact that one radio block is composed of four bursts in order to detect the modulation scheme effectively even under severe conditions. More specifically, the reference value calculated for each received burst is accumulated with the previous reference values to statistically minimize the false detection probability within one radio block. In addition, the data of any burst whose modulation differs from that of the fourth burst are set to zero to improve decoding performance, because the reference value of the fourth burst has the highest reliability.

Efficient strategy for the genetic analysis of related samples with a linear mixed model (선형혼합모형을 이용한 유전체 자료분석방안에 대한 연구)

  • Lim, Jeongmin;Sung, Joohon;Won, Sungho
    • Journal of the Korean Data and Information Science Society / v.25 no.5 / pp.1025-1038 / 2014
  • Linear mixed models have often been used for genetic association analysis with family-based samples. The correlation matrix for family-based samples is constructed from kinship coefficients, which assumes that parental phenotypes are independent and that the correlation between parent and offspring is the same as the correlation between siblings. However, there are, for instance, positive correlations between parental heights, which indicates that the assumption behind the correlation matrix is often violated. Statistical validity and power are affected by the appropriateness of the assumed variance-covariance matrix, and in this thesis we provide a linear mixed model with a flexible variance-covariance matrix. Our results show that the proposed method is usually more efficient than existing approaches, and its application to a genome-wide association study of body mass index illustrates its practical value in real data analysis.

Web-based microarray analysis using the virtual chip viewer and bioconductor. (MicroArray의 직관적 시각적 분석을 위한 웹 기반 분석 도구)

  • Lee, Seung-Won;Park, Jun-Hyung;Kim, Hyun-Jin;Kang, Byeong-Chul;Park, Hee-Kyung;Kim, In-Ju;Kim, Cheol-Min
    • Proceedings of the Korea Intelligent Information System Society Conference / 2005.05a / pp.198-201 / 2005
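  • DNA microarray chips are widely used in new drug development, the diagnosis of genetic diseases, studies of bio-molecular interactions, and studies of gene function. This paper describes the development of a web-based system for analyzing cDNA microarray data. A single cDNA microarray carries from several hundred to tens of thousands of genes, and because of the large volume of data and the various kinds of error, analysis requires tools and statistical techniques that correct for differences between data sets. In this paper, a virtual chip viewer generates a normalized chip image by removing the background intensity from the foreground intensity of real microarray data. The virtual chip viewer applies several filter effects and a global normalization technique that adjusts for the difference between the two fluorescent dyes, so that expressed genes can be analyzed visually, and it uses replicated microarray chip data to check the validity of a chip before the time-consuming analysis. For the normalization of chip data, the R statistical tool and a linear model are used to analyze the gene expression patterns of the microarray chip. By extracting the data to which no statistical method has been applied, plotting pattern graphs of these data, and classifying expression levels, the accuracy of the validity check for each microarray spot was improved. The system improved the ease and accuracy of analysis for chip validation, spot validation, and gene selection.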
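
The two preprocessing steps named in this abstract, background removal and global normalization between the two fluorescent channels, can be sketched as below. This is an assumption-laden illustration rather than the system's actual code: the function name `global_normalize` is hypothetical, and median centering of log2 ratios is one common form of global normalization, swapped in here because the abstract does not specify the exact method.

```python
import math
from statistics import median


def global_normalize(red_fg, red_bg, green_fg, green_bg):
    """Background-correct two channels and median-center their log2 ratios.

    Each argument is a list of per-spot intensities. Spots whose corrected
    intensity is not positive in either channel are returned as None.
    """
    log_ratios = []
    for rf, rb, gf, gb in zip(red_fg, red_bg, green_fg, green_bg):
        red = rf - rb      # foreground minus background, red channel
        green = gf - gb    # foreground minus background, green channel
        if red <= 0 or green <= 0:
            log_ratios.append(None)  # the log ratio is undefined for this spot
        else:
            log_ratios.append(math.log2(red / green))
    valid = [m for m in log_ratios if m is not None]
    if not valid:
        raise ValueError("no spot has positive intensity in both channels")
    center = median(valid)  # global shift applied to every spot
    return [None if m is None else m - center for m in log_ratios]
```

After centering, a typical spot has a log ratio near zero, so any remaining departures reflect spot-level differences rather than an overall imbalance between the two dyes.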

A Comparison of Reduction of Dental Plaque Control and Oral Malodor according to Hardness of Detergent Food (일부 청정식품의 경도 차이에 따른 치면세균막 제거 및 구취감소 효과 비교)

  • Kim, Min-Ji
    • The Journal of the Korea Contents Association / v.17 no.9 / pp.324-330 / 2017
  • The aim of this study was to compare dental plaque control and the reduction of oral malodor according to the hardness of detergent food. The subjects were 1 male (5.0%) and 19 females (95.0%), with an average age of 20.8 years. The study was conducted from March 6 to April 24, 2014. The detergent foods selected for the experiment were cucumber, cabbage and tomato. The data were analyzed with SPSS: the PHP index, plaque rate, $H_2S$, $(CH_3)_2S$, oral gas and expiration gas were analyzed by non-parametric statistics and compared with the results of the mean comparison, whereas the factors before and after ingestion of each detergent food were analyzed by paired t-test. For the degree of dental plaque control before and after ingestion, statistically significant differences were found in the PHP index for cucumber, the PHP index and plaque rate for tomato, and the plaque rate for cabbage.