• Title/Summary/Keyword: probability model for ranking

A probabilistic information retrieval model by document ranking using term dependencies (용어간 종속성을 이용한 문서 순위 매기기에 의한 확률적 정보 검색)

  • You, Hyun-Jo;Lee, Jung-Jin
    • The Korean Journal of Applied Statistics, v.32 no.5, pp.763-782, 2019
  • This paper proposes a probabilistic document ranking model that incorporates term dependencies. Document ranking is a fundamental information retrieval task: sorting the documents in a collection by their relevance to the user's query (Qin et al., Information Retrieval Journal, 13, 346-374, 2010). A probabilistic model computes the conditional probability that each document is relevant given the query. Most widely used models assume term independence because the joint probabilities of multiple terms are difficult to compute, yet words in natural language text are clearly highly correlated. In this paper, we assume a multinomial distribution model to calculate the relevance probability of a document while accounting for the dependency structure of words, and we propose an information retrieval model that ranks documents by estimating this probability with the maximum entropy method. Ranking simulation experiments under various multinomial settings show better retrieval results than a model that assumes word independence, and document ranking experiments on the real-world LETOR OHSUMED dataset also show better retrieval results.
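
The abstract contrasts term independence with a maximum-entropy estimate that respects word dependencies. One way to see the connection: among all joint distributions that match only the single-term marginals, the maximum-entropy one is exactly the independence (product) model, whereas also matching term-pair co-occurrence marginals lets dependencies in. The sketch below fits such a pairwise maximum-entropy model by iterative proportional fitting over three binary term-presence indicators. The toy counts and all names are hypothetical, and this illustrates the general idea only; it is not the authors' multinomial model.

```python
from itertools import product

Cell = tuple[int, int, int]  # presence (1) or absence (0) of three query terms

# Hypothetical co-occurrence counts in relevant documents; terms 0 and 1 co-occur strongly.
TOY_COUNTS: dict[Cell, float] = {
    (0, 0, 0): 30, (0, 0, 1): 10, (0, 1, 0): 5, (0, 1, 1): 5,
    (1, 0, 0): 5, (1, 0, 1): 5, (1, 1, 0): 20, (1, 1, 1): 20,
}
PAIRS = [(0, 1), (0, 2), (1, 2)]


def normalize(counts):
    total = sum(counts.values())
    return {cell: v / total for cell, v in counts.items()}


def marginal(p, axes):
    """Sum the joint table over every term not listed in `axes`."""
    out = {}
    for cell, v in p.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0.0) + v
    return out


def independence_model(p):
    """Max-entropy joint that matches only single-term marginals:
    the product of those marginals (the term-independence model)."""
    m = [marginal(p, (a,)) for a in range(3)]
    return {c: m[0][(c[0],)] * m[1][(c[1],)] * m[2][(c[2],)]
            for c in product((0, 1), repeat=3)}


def pairwise_maxent(p, sweeps=500):
    """Iterative proportional fitting started from the uniform table.
    It converges to the max-entropy joint whose term-pair marginals
    equal those of p, so pairwise dependencies are preserved."""
    q = {c: 1 / 8 for c in product((0, 1), repeat=3)}
    targets = {ax: marginal(p, ax) for ax in PAIRS}
    for _ in range(sweeps):
        for ax in PAIRS:
            current = marginal(q, ax)
            q = {c: v * targets[ax][tuple(c[a] for a in ax)]
                    / current[tuple(c[a] for a in ax)]
                 for c, v in q.items()}
    return q


if __name__ == "__main__":
    p = normalize(TOY_COUNTS)
    ind, dep = independence_model(p), pairwise_maxent(p)
    # Probability that a relevant document contains all three terms, per model
    print("empirical", p[(1, 1, 1)], "independence", ind[(1, 1, 1)], "pairwise", dep[(1, 1, 1)])
```

With these toy counts the independence model gives 0.1 for the all-terms pattern against an empirical 0.2, while the pairwise model gives about 0.21, because it reproduces the term-pair co-occurrences but not the three-way interaction.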

Revisiting the Bradley-Terry model and its application to information retrieval

  • Jeon, Jong-June;Kim, Yongdai
    • Journal of the Korean Data and Information Science Society, v.24 no.5, pp.1089-1099, 2013
  • The Bradley-Terry model is widely used to analyze pairwise preference data. We explain that its popularity stems not only from easy computation but also from favorable asymptotic properties when the model is misspecified. For information retrieval, which requires analyzing large-scale ranking data, we propose using a pseudo-likelihood based on the Bradley-Terry model even when the true model differs from it. We justify this choice by proving that the ranking estimated from the proposed pseudo-likelihood is consistent when the true model belongs to the class of Thurstone models, which is much larger than the Bradley-Terry model.
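
In the Bradley-Terry model each item has a positive strength $\pi_i$, and $P(i \text{ beats } j) = \pi_i / (\pi_i + \pi_j)$. The sketch below fits the strengths by ordinary maximum likelihood using the classical minorization-maximization (Zermelo) iteration and then ranks items by fitted strength. The win counts are hypothetical, and this is the plain likelihood, not the pseudo-likelihood construction proposed in the paper.

```python
def fit_bradley_terry(wins, iters=1000, tol=1e-12):
    """Maximum-likelihood Bradley-Terry strengths via the MM (Zermelo) update
    pi_i <- W_i / sum_j n_ij / (pi_i + pi_j), renormalized to sum to 1.
    wins[i][j] counts comparisons in which item i beat item j. The MLE exists
    when the directed "who beat whom" graph is strongly connected."""
    n = len(wins)
    strength = [1.0 / n] * n
    total_wins = [sum(row) for row in wins]
    for _ in range(iters):
        new = []
        for i in range(n):
            denom = sum((wins[i][j] + wins[j][i]) / (strength[i] + strength[j])
                        for j in range(n) if j != i)
            new.append(total_wins[i] / denom)
        s = sum(new)
        new = [v / s for v in new]
        converged = max(abs(a - b) for a, b in zip(new, strength)) < tol
        strength = new
        if converged:
            break
    return strength


def rank_items(strength):
    """Item indices sorted from strongest to weakest."""
    return sorted(range(len(strength)), key=lambda i: strength[i], reverse=True)


def win_probability(strength, i, j):
    return strength[i] / (strength[i] + strength[j])


if __name__ == "__main__":
    wins = [[0, 7, 8], [3, 0, 6], [2, 4, 0]]  # hypothetical pairwise preference counts
    s = fit_bradley_terry(wins)
    print("ranking:", rank_items(s))
    print("P(item 0 beats item 2) =", round(win_probability(s, 0, 2), 3))
```

The MM update is used here because each step increases the likelihood and needs no step size; a Newton solver would converge faster on large item sets.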

Protein Interaction Possibility Ranking Method based on Domain Combination (도메인 조합 기반 단백질 상호작용 가능성 순위 부여 기법)

  • Han Dong-Soo;Kim Hong-Song;Jong Woo-Hyuk;Lee Sung-Doke
    • Journal of KIISE:Computing Practices and Letters, v.11 no.5, pp.427-435, 2005
  • With the accumulation of protein-related data on the Internet, many domain-based computational techniques for predicting protein interactions have been developed. However, most of these techniques still have limitations that hinder their use in practice: they usually suffer from low prediction accuracy and provide no method for ranking multiple protein pairs by interaction possibility. In this paper, we reevaluate a domain combination based protein interaction prediction method and develop an interaction possibility ranking method for multiple protein pairs. Probability equations are derived within the framework of the domain combination based prediction method. Using the ranking method, one can determine which of several protein pairs are more likely to interact. In validating the ranking method, we found some correlation between the interaction probability and the prediction precision for groups of protein pairs whose PIP (Primary Interaction Probability) values match in the interacting or non-interacting PIP distributions.
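
To make the ranking idea concrete, here is a hedged sketch that scores each candidate protein pair by combining per-domain-pair interaction probabilities. It assumes, as some earlier domain-based models do, that a protein pair interacts if at least one of its domain pairs interacts and that domain pairs interact independently, so $P = 1 - \prod (1 - p_{dd'})$. This combination rule, the protein and domain names, and the probabilities are illustrative assumptions; they are not the probability equations or PIP values derived in the paper.

```python
from itertools import product


def pair_probability(domains_a, domains_b, domain_prob):
    """P(pair interacts) = 1 - prod over domain pairs of (1 - p_dd'),
    assuming independent domain-pair interactions (illustrative assumption).
    Domain pairs missing from domain_prob are treated as p = 0."""
    no_interaction = 1.0
    for d1, d2 in product(domains_a, domains_b):
        key = tuple(sorted((d1, d2)))  # domain pairs are unordered
        no_interaction *= 1.0 - domain_prob.get(key, 0.0)
    return 1.0 - no_interaction


def rank_pairs(pairs, domains, domain_prob):
    """Sort candidate protein pairs by estimated interaction probability."""
    scored = [(a, b, pair_probability(domains[a], domains[b], domain_prob))
              for a, b in pairs]
    return sorted(scored, key=lambda t: t[2], reverse=True)


# Hypothetical domain annotations and domain-pair interaction probabilities
DOMAINS = {"P1": ["D1", "D2"], "P2": ["D3"], "P3": ["D4"], "P4": ["D5"]}
DOMAIN_PROB = {("D1", "D3"): 0.6, ("D2", "D3"): 0.1, ("D4", "D5"): 0.3}

if __name__ == "__main__":
    for a, b, p in rank_pairs([("P1", "P3"), ("P3", "P4"), ("P1", "P2")], DOMAINS, DOMAIN_PROB):
        print(a, b, round(p, 3))
```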

An Experimental Analysis of a Probabilistic DDHV Estimation Model (확률적인 중방향 설계시간 교통량 산정 모형에 관한 실험적 해석)

  • Jo, Jun-Han;Kim, Seong-Ho;No, Jeong-Hyeon
    • Journal of Korean Society of Transportation, v.27 no.2, pp.23-34, 2009
  • This paper presents an experimental analysis of probabilistic directional design hour volume (DDHV) estimation. Its main objective is to derive acceptable design rankings, PK factors, and PD factors. To determine an appropriate distribution for acceptable design rankings, 12 probability distribution functions were considered. Their parameters were estimated by maximum likelihood, and goodness of fit was assessed with the Kolmogorov-Smirnov test. Among these distributions, the Beta General distribution was selected as appropriate for 2-lane roadways, whereas the Weibull distribution performed best for 4-lane roadways. Using the inverse cumulative distribution function method, acceptable design rankings were derived for LOS D: 190 for 2 lanes and 164 for 4 lanes. For 2 lanes, the PK and PD factors were estimated as 0.119 (0.100-0.139) and 0.568 (0.545-0.590), respectively; for 4 lanes, they were 0.106 (0.097-0.114) and 0.571 (0.544-0.598), respectively.
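
The fit, test, and invert workflow in the abstract (maximum-likelihood fitting, a Kolmogorov-Smirnov check, then the inverse CDF) can be sketched for one candidate, the two-parameter Weibull distribution. This is a minimal pure-Python illustration under stated assumptions: the sample is simulated, the 0.90 target probability is arbitrary, and the paper's data, its other 11 candidate distributions, and its LOS D criterion are not reproduced. Function names are hypothetical.

```python
import math
import random


def weibull_mle(xs, lo=1e-3, hi=1e3, iters=200):
    """Two-parameter Weibull MLE for positive data. The shape k solves
    sum(x^k ln x)/sum(x^k) - 1/k - mean(ln x) = 0, found by bisection
    (the left side increases in k). Data are scaled by max(xs) to avoid
    overflow, which leaves k unchanged; the scale is rescaled back."""
    m = max(xs)
    ys = [x / m for x in xs]
    logs = [math.log(y) for y in ys]
    mean_log = sum(logs) / len(ys)

    def g(k):
        w = [y ** k for y in ys]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    scale = m * (sum(y ** k for y in ys) / len(ys)) ** (1.0 / k)
    return k, scale


def weibull_cdf(x, k, scale):
    return 1.0 - math.exp(-((x / scale) ** k)) if x > 0 else 0.0


def weibull_ppf(p, k, scale):
    """Inverse CDF: the value not exceeded with probability p."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / k)


def ks_statistic(xs, cdf):
    """Kolmogorov-Smirnov D = sup |F_n(x) - F(x)|. With parameters estimated
    from the same data, standard KS p-value tables are only approximate."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


if __name__ == "__main__":
    random.seed(0)
    sample = [random.weibullvariate(150.0, 3.0) for _ in range(500)]  # simulated data
    k, scale = weibull_mle(sample)
    d = ks_statistic(sample, lambda x: weibull_cdf(x, k, scale))
    print(f"shape={k:.3f} scale={scale:.1f} KS D={d:.4f}")
    print("value at p=0.90:", round(weibull_ppf(0.90, k, scale), 1))
```

Comparing candidates would repeat the fit for each distribution and keep the one with the smallest KS statistic, as the abstract describes.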

Statistical Probability Analysis of Storage Temperatures of Domestic Refrigerator as a Risk Factor of Foodborne Illness Outbreak (식중독 발생 위해인자로서 가정용 냉장고의 온도에 대한 확률분포 분석)

  • Bahk, Gyung-Jin
    • Korean Journal of Food Science and Technology, v.42 no.3, pp.373-376, 2010
  • The objective of this study was to present an appropriate probability distribution model based on survey data on food storage temperatures in domestic refrigerators. Domestic refrigerator temperatures were identified as risk factors for foodborne disease outbreaks in microbial risk assessment (MRA). Temperatures were measured with data loggers during direct visits to 139 homes from May to September 2009. The overall mean temperature of all surveyed refrigerators was $3.53 \pm 2.96\,^{\circ}\mathrm{C}$, and 23.6% of the refrigerators measured above $5\,^{\circ}\mathrm{C}$. Probability distributions were fitted to the measured temperature data with the @RISK program, and candidate distributions were ranked statistically by goodness of fit (GOF), i.e., by the Kolmogorov-Smirnov (KS) or Anderson-Darling (AD) test, to select the appropriate model. The LogLogistic(-10.407, 13.616, 8.6107) distribution was found to be the most appropriate for the MRA model. These results could be used directly as input variables in exposure assessment for MRA.
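
As a worked example of how the fitted distribution could feed an exposure model, the sketch below evaluates the reported LogLogistic(-10.407, 13.616, 8.6107). It assumes @RISK's parameter order (location $\gamma$, scale $\beta$, shape $\alpha$), under which $F(x) = 1 / (1 + ((x - \gamma)/\beta)^{-\alpha})$ for $x > \gamma$. Under that assumption the implied mean is about 3.52 °C and the probability of exceeding 5 °C is about 0.26, close to the survey's 3.53 °C mean and 23.6% share. Function names are illustrative.

```python
import math


def loglogistic_cdf(x, gamma, beta, alpha):
    """Three-parameter log-logistic CDF (location gamma, scale beta, shape alpha)."""
    if x <= gamma:
        return 0.0
    return 1.0 / (1.0 + ((x - gamma) / beta) ** (-alpha))


def loglogistic_ppf(p, gamma, beta, alpha):
    """Inverse CDF, from solving F(x) = p for x."""
    return gamma + beta * (p / (1.0 - p)) ** (1.0 / alpha)


def loglogistic_mean(gamma, beta, alpha):
    """Mean, defined only for alpha > 1: gamma + beta * b / sin(b), b = pi / alpha."""
    b = math.pi / alpha
    return gamma + beta * b / math.sin(b)


# Parameters as reported in the abstract; the (gamma, beta, alpha) order is an assumption.
PARAMS = (-10.407, 13.616, 8.6107)

if __name__ == "__main__":
    p_above_5 = 1.0 - loglogistic_cdf(5.0, *PARAMS)
    print(f"mean = {loglogistic_mean(*PARAMS):.2f} C, P(T > 5 C) = {p_above_5:.3f}")
    print(f"95th percentile = {loglogistic_ppf(0.95, *PARAMS):.2f} C")
```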

Ranking by Inductive Inference in Collaborative Filtering Systems (협력적 여과 시스템에서 귀납 추리를 이용한 순위 결정)

  • Ko, Su-Jeong
    • Journal of KIISE:Software and Applications, v.37 no.9, pp.659-668, 2010
  • To recommend interesting items to a new user, collaborative filtering systems must capture the user's behavior and obtain new information about that user. To acquire this information, collaborative filtering systems learn user behavior from previous data and obtain new information from the results. In this paper, we propose an inductive inference method that obtains new information about users and uses it to rank items. The proposed method clusters users into groups with NMF (non-negative matrix factorization), an inductive machine learning method, and selects features for each group using the chi-square statistic. It then assigns a new user to a group with a Bayesian probability model, an inductive inference method, based on the new user's ratings and the group features. Finally, it ranks items by applying the Rocchio algorithm to items with missing values.
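
The Bayesian step, assigning a new user to a learned group from the user's ratings, can be illustrated with a naive Bayes classifier over binary group features (for example, whether the user rated a group's characteristic item highly). This is a sketch under assumptions: the feature encoding, the Laplace smoothing, the group statistics, and all names are hypothetical, and the NMF clustering, chi-square feature selection, and Rocchio ranking steps are not shown.

```python
import math


def assign_group(user_features, groups, alpha=1.0):
    """Naive Bayes posterior over groups:
    log P(g) + sum_f log P(feature f takes the user's value | g).
    groups maps name -> {"size": members, "liked": {feature: members with it}}
    (hypothetical encoding). Laplace pseudo-count alpha avoids log(0)."""
    total = sum(g["size"] for g in groups.values())
    log_scores = {}
    for name, g in groups.items():
        score = math.log(g["size"] / total)
        for f, has in user_features.items():
            p1 = (g["liked"].get(f, 0) + alpha) / (g["size"] + 2 * alpha)
            score += math.log(p1 if has else 1.0 - p1)
        log_scores[name] = score
    # Convert log scores to normalized posterior probabilities
    top = max(log_scores.values())
    z = sum(math.exp(v - top) for v in log_scores.values())
    return {name: math.exp(v - top) / z for name, v in log_scores.items()}


# Hypothetical group statistics and a new user's binary features
GROUPS = {
    "action_fans": {"size": 40, "liked": {"f_action": 36, "f_romance": 4}},
    "romance_fans": {"size": 60, "liked": {"f_action": 6, "f_romance": 50}},
}
NEW_USER = {"f_action": True, "f_romance": False}

if __name__ == "__main__":
    posterior = assign_group(NEW_USER, GROUPS)
    print(max(posterior, key=posterior.get), posterior)
```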

A Study on the Optimal Discriminant Model Predicting the likelihood of Insolvency for Technology Financing (기술금융을 위한 부실 가능성 예측 최적 판별모형에 대한 연구)

  • Sung, Oong-Hyun
    • Journal of Korea Technology Innovation Society, v.10 no.2, pp.183-205, 2007
  • This study investigated the optimal discriminant model for predicting, in advance, the likelihood of insolvency of medium-sized firms based on technology evaluation. The explanatory variables in the discriminant model were selected by both factor analysis and stepwise discriminant analysis. Five explanatory variables were selected by factor analysis on the basis of the proportion of variance explained and communality, and six were selected by stepwise discriminant analysis. The effectiveness of the linear and logistic discriminant models was assessed using the critical probability and the correct classification rate. The results showed that both models had similar correct classification rates, and the linear discriminant model was preferred to the logistic discriminant model in terms of the critical probability criterion. For the linear discriminant model with a critical probability of 0.5, the total-group correct classification rate was 70.4%, and the correct classification rates of the insolvent and solvent groups were 73.4% and 69.5%, respectively. The correct classification rate estimates the probability that the estimated discriminant function correctly classifies the present sample, whereas the actual correct classification rate estimates the probability that it correctly classifies a future observation. Because the same data set is used both to estimate and to evaluate the discriminant function, the correct classification rate overestimates the actual correct classification rate. Cross-validation was used to estimate this bias: the estimated bias was 2.9%, giving a predicted actual correct classification rate of 67.5%. In addition, a threshold value was set to establish an in-doubt category. The results of the linear discriminant model can be used by technology financing banks to evaluate the possibility of insolvency and to rank applicant firms.
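
The gap between the apparent (resubstitution) correct classification rate and the rate expected on future firms can be estimated by leave-one-out cross-validation. The sketch below does this for a one-variable linear discriminant rule, assuming normal class densities with equal variance and classifying a firm as insolvent when its posterior probability exceeds a critical probability of 0.5. The single predictor, the toy data, and all names are hypothetical; the study itself used several selected variables.

```python
import math


def fit_lda(xs, ys):
    """One-variable LDA: class means, pooled variance, prior of class 1 (insolvent)."""
    g0 = [x for x, y in zip(xs, ys) if y == 0]
    g1 = [x for x, y in zip(xs, ys) if y == 1]
    m0, m1 = sum(g0) / len(g0), sum(g1) / len(g1)
    ss = sum((x - m0) ** 2 for x in g0) + sum((x - m1) ** 2 for x in g1)
    return m0, m1, ss / (len(xs) - 2), len(g1) / len(xs)


def posterior_insolvent(x, model):
    """P(y = 1 | x) under equal-variance normal class densities."""
    m0, m1, var, p1 = model
    log_odds = ((m1 - m0) * x - (m1 ** 2 - m0 ** 2) / 2) / var + math.log(p1 / (1 - p1))
    if log_odds >= 0:  # numerically stable logistic function
        return 1 / (1 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1 + z)


def correct_rate(xs, ys, critical=0.5):
    """Apparent rate (fit and evaluate on the same data) and leave-one-out rate."""
    model = fit_lda(xs, ys)
    apparent = sum((posterior_insolvent(x, model) > critical) == bool(y)
                   for x, y in zip(xs, ys)) / len(xs)
    hits = 0
    for i in range(len(xs)):
        model_i = fit_lda(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        hits += (posterior_insolvent(xs[i], model_i) > critical) == bool(ys[i])
    return apparent, hits / len(xs)


# Hypothetical single risk score per firm (1 = insolvent)
XS = [1.0, 1.5, 2.0, 2.2, 2.8, 3.1, 2.5, 3.4, 3.9, 4.2, 4.8, 5.0]
YS = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    apparent, loo = correct_rate(XS, YS)
    print(f"apparent={apparent:.3f} leave-one-out={loo:.3f} bias={apparent - loo:.3f}")
```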

Application of FMECA with Stochastic Approach to Reliability-Centered Maintenance of Electric Power Plants in Korean Power Systems (RCM 수립을 위해 발전설비의 고장확률을 고려한 확률론적 FMECA 평가 기법)

  • Joo, Jae-Myung;Lee, Seung-Hyuk;Kim, Jin-O;Lee, Hyo-Sang
    • Proceedings of the KIEE Conference, 2006.07a, pp.196-197, 2006
  • Preventive maintenance can help generation utilities reduce costs and increase profit in a competitive supply-side power market, so reliability analysis is necessary for systems in which reliability is essential. In this paper, an RCM (Reliability-Centered Maintenance) analysis method is applied using actual historical failure data from Korean power plants. A reliability-based probability model for predicting component failures in power plants is established and applied to an FMECA (Failure Mode, Effects and Criticality Analysis) that takes failure probability into account, yielding a weighted ranking of generating equipment based on the failure probabilities estimated through FMECA. FMECA is an engineering analysis, and a core activity of reliability engineers, that reviews how probable failure modes of generating equipment and assemblies affect power system performance. The results show that applying FMECA with a stochastic approach to preventive maintenance can effectively reduce maintenance costs and thereby improve the total benefit.
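
A minimal sketch of a probability-weighted criticality ranking: estimate each equipment's failure rate from historical failures and operating hours, convert it to a failure probability over a maintenance interval with a constant-rate (exponential) model, and weight it by a severity score. The exponential assumption, the severity weights, the equipment names, and the numbers are all hypothetical; this is not the paper's stochastic FMECA procedure.

```python
import math
from dataclasses import dataclass


@dataclass
class Equipment:
    name: str
    failures: int           # observed failures in the history window
    operating_hours: float  # operating time over the same window
    severity: float         # hypothetical severity weight, e.g. on a 1-10 scale


def failure_probability(eq: Equipment, interval_hours: float) -> float:
    """P(at least one failure within the interval), assuming a constant failure rate."""
    rate = eq.failures / eq.operating_hours  # MLE of the rate for an exponential model
    return 1.0 - math.exp(-rate * interval_hours)


def criticality_ranking(items, interval_hours):
    """Rank equipment by failure probability times severity weight."""
    scored = [(eq.name, failure_probability(eq, interval_hours) * eq.severity) for eq in items]
    return sorted(scored, key=lambda t: t[1], reverse=True)


# Hypothetical failure history for three pieces of generating equipment
PLANT = [
    Equipment("boiler feed pump", 6, 40_000, 8),
    Equipment("coal mill", 12, 40_000, 4),
    Equipment("generator exciter", 1, 40_000, 10),
]

if __name__ == "__main__":
    for name, score in criticality_ranking(PLANT, interval_hours=8760.0):
        print(f"{name:20s} {score:.2f}")
```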

Interconnected Characteristics of Innovation Networks of Farmers Employing Ranked Logit Model (순위형 로짓모형을 이용한 농업인의 혁신네트워크 연계 특성)

  • Choi, Sang-Ho;Lee, Seong-Woo;Choe, Young-Chan
    • Journal of Korean Society of Rural Planning, v.13 no.4, pp.53-67, 2007
  • This study analyzed the probability that experiment stations, agricultural technology and extension centers, provincial agricultural research and extension services, central government agencies, or civilian and other related organizations would be the first choice of the actors that make up local innovation networks. While the effect of gender was statistically insignificant, educational level, income, the main type of information acquired, sources of necessary information, and frequency of information acquisition were significant, and the preference ranking model was highly relevant. According to the analysis, highly academic and business-related information was most likely to be acquired from the civilian sector; information on agricultural technology, such as techniques, crops/plants, storage, and distribution, was most likely to be acquired from experiment stations and provincial agricultural research and extension services; and information on agricultural production was most likely to be acquired from agricultural technology centers.
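
Assuming that "ranked logit" here refers to the rank-ordered (or "exploded") logit, an observed preference ranking is treated as a sequence of choices: the first choice is made from all alternatives, the second from those remaining, and so on, each with multinomial-logit probabilities. The sketch computes the probability of a respondent's ranking from hypothetical utilities; in the study, utilities would depend on covariates such as education and income, which are not modeled here.

```python
import math
from itertools import permutations


def ranking_probability(ranking, utility):
    """Rank-ordered logit: the k-th ranked alternative is chosen by a
    multinomial logit among the alternatives not yet ranked. Works for
    full rankings and for top-k partial rankings."""
    remaining = list(utility)
    prob = 1.0
    for choice in ranking:
        denom = sum(math.exp(utility[j]) for j in remaining)
        prob *= math.exp(utility[choice]) / denom
        remaining.remove(choice)
    return prob


def first_choice_probabilities(utility):
    """Probability that each alternative is ranked first (plain multinomial logit)."""
    z = sum(math.exp(v) for v in utility.values())
    return {k: math.exp(v) / z for k, v in utility.items()}


# Hypothetical utilities for one respondent profile
UTILITY = {"experiment station": 0.8, "technology center": 1.2, "civilian sector": 0.3}

if __name__ == "__main__":
    for order in permutations(UTILITY):
        print(" > ".join(order), round(ranking_probability(order, UTILITY), 3))
    print(first_choice_probabilities(UTILITY))
```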

An Evaluation Model of the Aviation Industry Development Strategies in Korea using Cross-impact Hierarchy Process (상호영향계층분석기법(Cross-impact Hierarchy Process)를 이용한 항공 산업 발전전략 평가체계 개발)

  • Kim, Seon-Tae;Song, Ki-Han
    • Journal of the Korean Society for Aviation and Aeronautics, v.19 no.4, pp.74-82, 2011
  • To strengthen the aviation industry in Korea, many strategies have been proposed by researchers as well as the government. However, given real-world constraints, and because the relative importance of these strategies has not yet been ranked, they are difficult for decision makers to implement. This paper therefore develops an evaluation model that meets the need to determine the significance of each strategy. The Cross-impact Hierarchy Process (CHP), which links the Analytic Hierarchy Process (AHP) with Cross Impact Analysis (CIA), was selected as the most suitable model because the strategies are not independent of one another: whether one strategy is realized can affect the others, and CHP can take this into account. First, the strategies were categorized and arranged according to the evaluation structure. Second, parameters such as conditional probabilities and weights were estimated from a survey of 16 experts in the aviation field. Finally, the assessment results were discussed and further studies were suggested.
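
CHP combines AHP weights with cross-impact conditional probabilities. The AHP half can be sketched as follows: priority weights are the normalized principal eigenvector of a pairwise comparison matrix (computed here by power iteration), and Saaty's consistency ratio checks the judgments. The comparison matrix is hypothetical, the random-index values are Saaty's commonly cited table, and the cross-impact adjustment that distinguishes CHP is not shown.

```python
# Saaty's commonly cited random consistency index by matrix size
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(a, iters=1000, tol=1e-12):
    """Principal eigenvector of a positive reciprocal matrix by power iteration,
    normalized so the priority weights sum to 1."""
    n = len(a)
    w = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(v)
        v = [x / s for x in v]
        converged = max(abs(x - y) for x, y in zip(v, w)) < tol
        w = v
        if converged:
            break
    return w


def consistency_ratio(a, w):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1); CR < 0.1 is the usual cutoff."""
    n = len(a)
    aw = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    return ci / RANDOM_INDEX[n] if RANDOM_INDEX[n] > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical judgments: strategy A vs B = 3, A vs C = 5, B vs C = 2
    judgments = [[1, 3, 5], [1 / 3, 1, 2], [1 / 5, 1 / 2, 1]]
    w = ahp_weights(judgments)
    print([round(x, 3) for x in w], "CR =", round(consistency_ratio(judgments, w), 3))
```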