• Title/Summary/Keyword: probability.statistics


A Study on the Characteristics of Opinion Retrieval Using Term Statistical Analysis in Opinion Documents (의견 문서의 단어 통계 분석을 통한 의견 검색 특성에 관한 연구)

  • Han, Kyoung-Soo
    • Journal of the Korea Society of Computer and Information / v.15 no.11 / pp.21-29 / 2010
  • Opinion retrieval, which searches for opinions expressed by users in documents, does not yet significantly outperform traditional topical retrieval, which searches for facts. Therefore, this paper focuses on identifying statistical characteristics that can be applied to opinion retrieval by comparing and analyzing the term statistics of opinion and non-opinion documents in the blog domain. The TREC Blogs06 collection and 150 TREC topics are used in the experiments. The difference between term probability distributions in opinion documents is measured by the Jensen-Shannon (JS) divergence, and differences according to topic type and topic domain are also investigated. In addition, the term probabilities of opinion terms are analyzed comparatively. The main findings of this study are as follows: topic-specific characteristics must be considered for opinion detection; it is effective to extract positive and negative opinion terms per topic; topic types are complementary to topic domains; and special attention must be paid to the usage of positive opinion terms.
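As a rough illustration of how term distributions can be compared, the sketch below estimates maximum-likelihood term probabilities from tokenized documents and computes the JS divergence between two such distributions. The tokenization, the absence of smoothing, the base-2 logarithm, and the function names are assumptions for the example, not details taken from the paper.

```python
import math
from collections import Counter

def term_distribution(docs):
    """Maximum-likelihood term probabilities P(t) pooled over tokenized documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint supports)."""
    vocab = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl_to_m(a):
        # Terms with a(t) = 0 contribute nothing to KL(a || m).
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)

# Toy usage with made-up documents:
opinion = [["great", "phone", "love"], ["bad", "battery"]]
non_opinion = [["phone", "battery", "released"]]
print(js_divergence(term_distribution(opinion), term_distribution(non_opinion)))
```

Because the JS divergence is symmetric and bounded (here by 1 bit), it is convenient for comparing distributions whose vocabularies only partly overlap.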

A Study on the Prediction of Power Consumption in the Air-Conditioning System by Using the Gaussian Process (정규 확률과정을 사용한 공조 시스템의 전력 소모량 예측에 관한 연구)

  • Lee, Chang-Yong;Song, Gensoo;Kim, Jinho
    • Journal of Korean Society of Industrial and Systems Engineering / v.39 no.1 / pp.64-72 / 2016
  • In this paper, we use a Gaussian process to predict power consumption in an air-conditioning system. Because the power consumption of an air-conditioning system takes the form of a time series, and predicting it is very important for efficient energy management, it is worth investigating time-series models for this prediction. To this end, we apply a Gaussian process, which assigns a prior probability to every possible function and gives higher probabilities to functions that are more consistent with the empirical data. We also discuss how to estimate the hyperparameters, i.e., the parameters of the covariance function of the Gaussian process model. We estimated the hyperparameters with two different methods (marginal likelihood and leave-one-out cross-validation) and obtained a model that describes the data well; the results are largely independent of the hyperparameter estimation method. We validated the predictions using the mean relative error and the mean absolute error. The mean relative error was about 3.4% of the predicted value, and the mean absolute error analysis confirmed that the error is within the standard deviation of the predicted value. We also applied the nonparametric Wilcoxon signed-rank test to assess the fit of the proposed model, and the null hypothesis of uniformity could not be rejected at the 5% significance level. These results can be applied to more elaborate control of power consumption in air-conditioning systems.
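The following is a minimal pure-Python sketch of zero-mean Gaussian process regression, assuming a single input (e.g., a time index), centered data, and a squared-exponential covariance function. It shows where the hyperparameters enter, how the log marginal likelihood can score candidate length scales, and the two error measures mentioned above. The kernel, the grid search, and all function names are assumptions; the paper's covariance function and its leave-one-out procedure are not reproduced here.

```python
import math

def se_kernel(x1, x2, signal_var, length_scale):
    """Squared-exponential covariance k(x1, x2)."""
    return signal_var * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(xs, ys, x_new, signal_var=1.0, length_scale=1.0, noise_var=1e-2):
    """Posterior means and variances at x_new, plus the log marginal likelihood."""
    n = len(xs)
    K = [[se_kernel(xs[i], xs[j], signal_var, length_scale) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = backward(L, forward(L, ys))  # alpha = K^{-1} y
    # log p(y) = -1/2 y^T K^{-1} y - 1/2 log|K| - n/2 log(2 pi), with log|K| = 2 sum log L_ii
    log_ml = (-0.5 * sum(y * a for y, a in zip(ys, alpha))
              - sum(math.log(L[i][i]) for i in range(n))
              - 0.5 * n * math.log(2 * math.pi))
    means, variances = [], []
    for x in x_new:
        k_star = [se_kernel(x, xi, signal_var, length_scale) for xi in xs]
        v = forward(L, k_star)
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        variances.append(signal_var - sum(vi * vi for vi in v))
    return means, variances, log_ml

def pick_length_scale(xs, ys, grid, **kw):
    """Choose the length scale that maximizes the log marginal likelihood."""
    return max(grid, key=lambda ell: gp_predict(xs, ys, [], length_scale=ell, **kw)[2])

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_relative_error(actual, predicted):
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)
```

The Cholesky factorization is reused for both the predictive mean and the log determinant, which is the usual way to keep marginal-likelihood evaluation cheap during hyperparameter search.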

Nonparametric Bayesian Statistical Models in Biomedical Research (생물/보건/의학 연구를 위한 비모수 베이지안 통계모형)

  • Noh, Heesang;Park, Jinsu;Sim, Gyuseok;Yu, Jae-Eun;Chung, Yeonseung
    • The Korean Journal of Applied Statistics / v.27 no.6 / pp.867-889 / 2014
  • Nonparametric Bayesian (np Bayes) statistical models are widely used in a variety of research areas because of their flexibility and computational convenience. This paper reviews np Bayes models with a focus on biomedical research applications. We review key probability models for np Bayes inference and illustrate, with biomedical examples, how each model is used to answer different types of research questions. The examples are chosen to highlight problems that are challenging for standard parametric inference but can be solved with nonparametric inference. We discuss np Bayes inference under four topics: (1) density estimation, (2) clustering, (3) random effects distributions, and (4) regression.
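The Dirichlet process is a standard building block for np Bayes density estimation and clustering. As an illustration, assuming it is among the models the review covers, the sketch below draws truncated stick-breaking weights and generates data from a Dirichlet process mixture of normals. The truncation level, the N(0, 3²) base measure for component means, and the unit component variance are arbitrary choices for the example.

```python
import random

def stick_breaking_weights(alpha, n_atoms, seed=0):
    """Truncated stick-breaking: w_k = v_k * prod_{j<k} (1 - v_j), with v_k ~ Beta(1, alpha)."""
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights, remaining  # remaining = stick mass left beyond the truncation

def sample_dp_mixture(alpha, n, n_atoms=50, seed=0):
    """Draw n observations from a truncated DP mixture of N(mean_k, 1) components."""
    rng = random.Random(seed + 1)
    weights, _ = stick_breaking_weights(alpha, n_atoms, seed)
    means = [rng.gauss(0.0, 3.0) for _ in range(n_atoms)]  # atoms from the base measure
    labels = rng.choices(range(n_atoms), weights=weights, k=n)
    return [rng.gauss(means[z], 1.0) for z in labels], labels
```

The labels returned with the data show the clustering behavior: observations sharing an atom form a cluster, and a smaller concentration parameter `alpha` tends to produce fewer clusters.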

Estimating Effects of Attributes on Choice of Pizza Restaurants by Purchase Frequency (구매빈도별 피자전문점 선택에 미치는 속성의 영향 평가)

  • Kang, Jong-Heon;Jeong, In-Suk
    • Korean Journal of Human Ecology / v.15 no.3 / pp.491-499 / 2006
  • The purpose of this study is to measure respondents' pizza purchasing behavior characteristics and the importance of factors affecting pizza purchase, to estimate the effects of attributes on the choice of pizza restaurant, and to predict the probability of selecting a particular pizza restaurant. The questionnaire consisted of two parts: the paired experimental profiles, and questions on purchasing behavior and the importance of factors affecting pizza purchase. This study generated profiles of 16 hypothetical pizza restaurants based on seven attributes. The profiles comprised 16 discrete sets of variables, each of which had two levels. The researchers randomly selected 150 university students as respondents. Twenty-one students did not complete the survey instrument, resulting in a final sample size of 129. All analyses were carried out using frequencies, $\chi^2$ tests, independent-samples t-tests, and the PHREG procedure of the SAS package. The results were as follows. Some purchasing behavior characteristics and the importance of factors affecting pizza purchase differed significantly by purchase frequency. For the estimated models developed for the two purchase frequency groups, the chi-square statistics were significant at p<0.001. In the frequent-purchase group, the parameter estimates for late delivery time and for price were the highest. Both purchase frequency groups favored pizza restaurants that charged 20,000 won, offered a 100% discount on the eleventh pizza, promised to deliver pizza within 20 min, usually delivered the pizza as promised, offered two or more types of pizza crust, delivered steaming hot pizza, and did not offer a money-back guarantee.
The results of this study suggest an opportunity to increase market share and profit by improving operations so that customers can receive a discount and a money-back guarantee simultaneously, and by reducing price and delivery time.
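The SAS PHREG procedure is commonly used to fit conditional logit models to paired-choice data. Assuming such a model here, the probability of choosing a restaurant within a choice set follows from its attribute utilities, as in the sketch below. The attribute names and coefficient values are hypothetical and are not the study's estimates.

```python
import math

def choice_probabilities(profiles, coefs):
    """Conditional logit: P(i) = exp(x_i . b) / sum_j exp(x_j . b) within one choice set."""
    utils = [sum(coefs[name] * value for name, value in prof.items()) for prof in profiles]
    top = max(utils)  # subtract the max utility for numerical stability
    exps = [math.exp(u - top) for u in utils]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients (illustrative only); attributes coded 1 = level present, 0 = absent.
coefs = {"price_20000_won": 0.8, "delivery_20_min": 0.6, "money_back_guarantee": -0.2}
restaurant_a = {"price_20000_won": 1, "delivery_20_min": 1, "money_back_guarantee": 0}
restaurant_b = {"price_20000_won": 0, "delivery_20_min": 1, "money_back_guarantee": 1}
print(choice_probabilities([restaurant_a, restaurant_b], coefs))
```

With two alternatives per pair, this reduces to a logistic function of the utility difference between the two profiles.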


Probabilistic Modeling of Photovoltaic Power Systems with Big Learning Data Sets (대용량 학습 데이터를 갖는 태양광 발전 시스템의 확률론적 모델링)

  • Cho, Hyun Cheol;Jung, Young Jin
    • Journal of the Korean Institute of Intelligent Systems / v.23 no.5 / pp.412-417 / 2013
  • Analytical modeling of photovoltaic power systems has received significant attention in recent years because it can readily be applied to predicting system dynamics and to fault detection and diagnosis in advanced engineering technologies. This paper presents a novel probabilistic modeling approach for such power systems using a big data sequence. First, we express the input/output function of a photovoltaic power system, in which solar irradiation and ambient temperature are the input variables and electric power is the output variable. Based on this functional relationship, the conditional probability of these three random variables (irradiation, temperature, and electric power) is defined mathematically and estimated from ratios of sample counts for the cases defined by the two input variables, which is particularly efficient for big data sequences of photovoltaic power systems. Finally, we predict the output values from the probabilistic model by computing the expected value of the output. Two case studies are carried out to test the reliability of the proposed modeling methodology.
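A minimal sketch of the count-based idea, assuming all three variables have already been discretized into bins: the conditional probability P(power | irradiation, temperature) is a ratio of counts, and the prediction is the conditional expectation of power. The binning helper, bin widths, and representative power values are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict

def discretize(value, width):
    """Map a continuous reading to a bin index (the bin width is an assumption)."""
    return int(value // width)

def fit_conditional(samples):
    """Estimate P(power | irradiation, temperature) from counts of discretized samples."""
    joint = defaultdict(Counter)
    for irr, temp, power in samples:
        joint[(irr, temp)][power] += 1
    model = {}
    for cond, cnt in joint.items():
        n = sum(cnt.values())  # number of samples with this input combination
        model[cond] = {p: c / n for p, c in cnt.items()}
    return model

def expected_power(model, irr, temp):
    """Predict output as the conditional expectation E[power | irradiation, temperature]."""
    dist = model.get((irr, temp))
    if dist is None:
        return None  # input combination never observed
    return sum(p * prob for p, prob in dist.items())
```

Since fitting is a single counting pass, it scales linearly with the number of samples, which fits the abstract's point about efficiency for big data sequences. Unseen input combinations return `None` here; a real model would need some fallback for them.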

Complex Segregation Analysis of Categorical Traits in Farm Animals: Comparison of Linear and Threshold Models

  • Kadarmideen, Haja N.;Ilahi, H.
    • Asian-Australasian Journal of Animal Sciences / v.18 no.8 / pp.1088-1097 / 2005
  • The main objectives of this study were to investigate the accuracy, bias, and power of linear and threshold model segregation analysis methods for detecting major genes in categorical traits in farm animals. The Maximum Likelihood Linear Model (MLLM), Bayesian Linear Model (BALM), and Bayesian Threshold Model (BATM) were applied to simulated data on normal, categorical, and binary scales, as well as to disease data in pigs. Simulated data on the underlying normally distributed liability (NDL) were used to create categorical and binary data. The MLLM method was applied to data on all scales (normal, categorical, and binary), and the BATM method was developed and applied only to binary data. The MLLM analyses underestimated parameters for binary as well as categorical traits compared with normal traits, with the bias being very severe for binary traits. The accuracy of major gene and polygene parameter estimates was also very low for binary data compared with categorical data; the latter gave results similar to normal data. When disease incidence (on the binary scale) is close to 50%, segregation analysis has higher accuracy and less bias than for diseases with rare incidence. NDL data were always better than categorical data. Under the MLLM method, the test statistics for categorical and binary data were consistently and unusually high (whereas the opposite is expected because of the loss of information in categorical data), indicating high false discovery rates of major genes if linear models are applied to categorical traits. In the Bayesian segregation analysis, the 95% highest probability density regions of major gene variances were checked for whether they included zero (a boundary parameter); because of this difference between the likelihood and Bayesian approaches, the Bayesian methods are likely to be more reliable for categorical data. The BATM segregation analysis of binary data also showed a significant advantage over MLLM in terms of higher accuracy.
Based on the results, threshold models are recommended when trait distributions are discontinuous. Further, segregation analysis could be used in an initial scan of the data for evidence of major genes before embarking on molecular genome mapping.
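The data-generation step described above (an underlying normally distributed liability cut into categorical or binary scores) can be sketched as follows. The single biallelic major gene with additive effects, Hardy-Weinberg genotype frequencies, independent polygenic and residual normal terms (no pedigree structure), and thresholds placed at empirical quantiles are all assumptions for illustration, not the study's simulation design.

```python
import random

def simulate_liability(n, p_allele=0.2, a=1.0, polygenic_sd=0.7, residual_sd=0.7, seed=1):
    """Liability = major gene effect + polygenic effect + residual (no pedigree structure)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        copies = (rng.random() < p_allele) + (rng.random() < p_allele)  # 0, 1 or 2 alleles (HWE)
        major = a * (copies - 1)  # additive genotype effects -a, 0, +a
        out.append(major + rng.gauss(0.0, polygenic_sd) + rng.gauss(0.0, residual_sd))
    return out

def to_categories(liability, cum_props):
    """Cut liability at empirical quantiles; category = number of thresholds reached.

    cum_props=[0.5] gives a binary trait with about 50% incidence,
    cum_props=[0.95] a rare binary trait, [0.3, 0.7] three ordered categories.
    """
    srt = sorted(liability)
    thresholds = [srt[int(p * len(srt))] for p in cum_props]
    return [sum(x >= t for t in thresholds) for x in liability]
```

Feeding the same liabilities through different threshold sets gives normal, categorical, and binary versions of one trait, which is what makes the scale comparison possible.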

Using the Sample IQR for Calculating Sample Size (표본크기 결정을 위한 IQR의 활용방법)

  • 홍종선;김현태;윤상호;정민정
    • The Korean Journal of Applied Statistics / v.16 no.1 / pp.181-193 / 2003
  • When a sample standard deviation is not available as an estimator of the population standard deviation $\sigma$ in sample size computations, we often use some function of the sample range (R) or the interquartile range (IQR) as an estimator of $\sigma$. To avoid under-powered studies, these estimates must have a high probability of being greater than or equal to $\sigma$. In this paper, these probabilities are estimated for the IQR under various parent distributions and compared with the probabilities for R/4 (Browne 2001). Alternative divisors (K) are explored and discussed for which the probabilities of R/K and IQR/K being greater than or equal to $\sigma$ are at least 95%.
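The probabilities discussed above can be approximated by Monte Carlo simulation. The sketch below, assuming a normal parent distribution by default and Python's default ("exclusive") quartile method for the IQR, estimates P(stat/K ≥ σ) and finds the largest divisor K for which that probability is at least 95%. The sample size, number of replications, and quartile definition are illustrative assumptions; the paper's own settings may differ.

```python
import random
import statistics

def sample_range(s):
    return max(s) - min(s)

def sample_iqr(s):
    q1, _, q3 = statistics.quantiles(s, n=4)  # default 'exclusive' quartile method
    return q3 - q1

def simulate_stat(stat, n, reps=5000, sigma=1.0, seed=0, draw=None):
    """Monte Carlo draws of stat(sample) for samples of size n from a parent distribution."""
    rng = random.Random(seed)
    draw = draw or (lambda r: r.gauss(0.0, sigma))  # normal parent by default
    return [stat([draw(rng) for _ in range(n)]) for _ in range(reps)]

def prob_at_least_sigma(values, divisor, sigma=1.0):
    """Estimated P(stat / K >= sigma)."""
    return sum(v / divisor >= sigma for v in values) / len(values)

def divisor_for(values, target=0.95, sigma=1.0):
    """Largest K with P(stat / K >= sigma) >= target: the (1 - target) quantile of stat / sigma."""
    srt = sorted(v / sigma for v in values)
    return srt[int((1.0 - target) * len(srt))]

ranges = simulate_stat(sample_range, n=20)
print(prob_at_least_sigma(ranges, 4.0), divisor_for(ranges))
```

Other parent distributions can be plugged in through `draw`, e.g. `draw=lambda r: r.expovariate(1.0)` with `sigma=1.0`, since an exponential distribution with rate 1 has standard deviation 1.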

Job-Matching Function Analysis Using Social Network Analysis (사회연결망분석을 이용한 잡매칭함수 분석)

  • Cho, Jang-Sik;Park, Sung-Ik
    • Communications for Statistical Applications and Methods / v.18 no.6 / pp.675-685 / 2011
  • This paper proposes a job matching function that calculates the probability of a job-seeker matching an employer, taking the working conditions of both into account. In addition, this study uses social network analysis to analyze degree centrality, which represents the interactions between job-seekers and employers. The results are as follows. First, degree centrality is found to be heavily concentrated in certain job-seekers and certain employers; in addition, many job-seekers and employers have no matching results. Second, according to decision tree analysis, the job-seeker characteristics that influence degree centrality are, in order of importance, gender, age, and education level. The employer characteristics that influence degree centrality are, in order of importance, proposed salary, industry classification, and firm size.
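Degree centrality in a job-seeker/employer match network can be computed directly from the matched pairs. The sketch below assumes that a pair counts as a match when a hypothetical matching probability reaches a cutoff, and normalizes each node's degree by the number of nodes on the other side of the bipartite graph. The cutoff, the normalization, and the function names are assumptions, not the paper's job matching function.

```python
from collections import Counter

def matches_above(match_prob, cutoff=0.5):
    """Keep (seeker, employer) pairs whose hypothetical matching probability reaches the cutoff."""
    return [pair for pair, p in match_prob.items() if p >= cutoff]

def degree_centrality(matches, seekers, employers):
    """Normalized degree centrality in a bipartite seeker-employer match graph.

    A seeker's degree is divided by the number of employers, and an employer's
    by the number of seekers, so every score lies in [0, 1].
    """
    deg = Counter()
    for seeker, employer in set(matches):  # ignore duplicate pairs
        deg[("seeker", seeker)] += 1
        deg[("employer", employer)] += 1
    seeker_c = {s: deg[("seeker", s)] / len(employers) for s in seekers}
    employer_c = {e: deg[("employer", e)] / len(seekers) for e in employers}
    return seeker_c, employer_c

def unmatched(centrality):
    """Nodes with no matches at all (degree 0)."""
    return sorted(node for node, c in centrality.items() if c == 0)
```

The `unmatched` helper corresponds to the finding that many job-seekers and employers have no matching results, and sorting the centrality scores shows how concentrated the matches are.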

Relationship between the Sample Quantiles and Sample Quantile Ranks (표본분위수와 표본분위의 관계)

  • Ahn, Sung-Jin
    • Communications for Statistical Applications and Methods / v.18 no.6 / pp.707-716 / 2011
  • Quantiles and quantile ranks (or plotting positions) are widely used in academia and industry. At least seven sample quantile methods, and at least seven sample quantile rank methods, are implemented in major statistical software. Seemingly small differences between the methods can make big differences in the outcomes of decisions based on them. We discussed the characteristics and differences of the basic plotting position based on the empirical cumulative probability and of the six plotting positions derived from the suggestion of Blom (1958). After discussing the characteristics and differences of the seven quantile methods used in major statistical software, we suggested a general expression covering all seven quantile methods. Using the insight obtained from this general expression, we proposed four propositions that make it possible to find the plotting position method that corresponds to each of the seven quantile methods. These correspondences may help us understand and apply quantile methodology.
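To make the link between plotting positions and quantile methods concrete, here is a sketch assuming Blom's family $p_k = (k - a)/(n + 1 - 2a)$ for the k-th order statistic. Familiar members include a = 0 (k/(n+1)), a = 1/2 (Hazen), a = 1 ((k-1)/(n-1)), a = 1/3, and a = 3/8 (Blom). Inverting the plotting position with linear interpolation gives a sample quantile method. With a = 0 and a = 1 this reproduces the "exclusive" and "inclusive" methods of Python's `statistics.quantiles`. This is a standard construction and is not necessarily the paper's general expression or its four propositions.

```python
import math

def plotting_position(k, n, a):
    """Blom-type plotting position p_k = (k - a) / (n + 1 - 2a) for the k-th order statistic (1-based)."""
    return (k - a) / (n + 1 - 2 * a)

def sample_quantile(data, p, a):
    """Quantile obtained by inverting the plotting position with linear interpolation."""
    x = sorted(data)
    n = len(x)
    h = (n + 1 - 2 * a) * p + a       # fractional order-statistic index solving p_h = p
    h = min(max(h, 1.0), float(n))    # clamp to the observed range
    lo = math.floor(h)
    if lo >= n:
        return float(x[-1])
    return x[lo - 1] + (h - lo) * (x[lo] - x[lo - 1])

data = [3.1, 0.4, 2.7, 5.9, 1.2, 4.4, 2.0]
for a in (0.0, 1 / 3, 3 / 8, 0.5, 1.0):
    print(a, [round(sample_quantile(data, p, a), 4) for p in (0.25, 0.5, 0.75)])
```

Printing the quartiles for several values of `a` shows how methods that look almost the same can still disagree noticeably on small samples.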

Analysis of the contents of Practice and Synthetic Application area in Yanbian Textbooks (중국 연변 수학 교과서의 실천과 종합응용 영역에 나타난 학습내용 분석)

  • Lee, Daehyun
    • Journal of the Korean School Mathematics Society / v.16 no.2 / pp.319-335 / 2013
  • The Chinese mathematics curriculum is divided into four areas (number and algebra, space and figure, statistics and probability, and practice and synthetic application). The purpose of this paper is to analyze the contents of the practice and synthetic application area in Yanbian elementary textbooks. To this end, 12 textbooks published by a Yanbian publishing company were analyzed by topic, mathematical process, content area, and mathematical activity. The following results were drawn from this study. First, the contextual backgrounds of practice are restricted to the classroom. Second, the contents of synthetic application are limited in their connections across mathematical areas. Third, problem solving dominates the mathematical processes, whereas reasoning activities are few. Fourth, experiential activities dominate the mathematical activities, whereas synthetic activities are few. The suggestions of this paper can be used for textbook development and for the content of mathematical processes.
