• Title/Summary/Keyword: Maximum Entropy Model

Part-Of-Speech Tagging using multiple sources of statistical data (이종의 통계정보를 이용한 품사 부착 기법)

  • Cho, Seh-Yeong
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.4
    • /
    • pp.501-506
    • /
    • 2008
  • Statistical POS tagging is prone to error because of the inherent limitations of statistical data, especially when only a single source of data is used. It is therefore widely agreed that further improvement lies in exploiting multiple knowledge sources. However, these sources are bound to be inconsistent with one another. This paper shows that a maximum entropy model can be applied to Korean POS tagging. We use n-gram data and trigger pair data as the knowledge sources, and show how the perplexity measure varies when the two sources are combined using the maximum entropy method. In the experiment, a trigram model achieved 94.9% accuracy with a Hidden Markov Model, and accuracy rose to 95.6% when the trigram data were combined with trigger pair data using the maximum entropy (ME) method. This indicates that further improvement is possible as additional knowledge sources are developed and combined using the ME method.
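
  As a rough illustration of how a maximum entropy (log-linear) model can combine evidence from two knowledge sources, the sketch below scores each candidate tag by summing the weights of the active features and then normalizes. The feature names, tags, and weights are hypothetical and are not taken from the paper; the abstract does not describe the actual feature set or training procedure.

```python
import math

def maxent_tag_probs(active_features, weights, tags):
    """Return p(tag | context) proportional to exp(sum of active feature weights)."""
    scores = {
        tag: sum(weights.get((name, tag), 0.0) for name in active_features)
        for tag in tags
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exp_scores = {tag: math.exp(s - max_score) for tag, s in scores.items()}
    z = sum(exp_scores.values())
    return {tag: v / z for tag, v in exp_scores.items()}

# Hypothetical example: one n-gram feature and one trigger-pair feature.
tags = ["NOUN", "VERB"]
active_features = ["prev_tags=DET,ADJ", "trigger=eat"]
weights = {
    ("prev_tags=DET,ADJ", "NOUN"): 1.2,   # assumed weight from n-gram evidence
    ("prev_tags=DET,ADJ", "VERB"): -0.3,
    ("trigger=eat", "NOUN"): 0.8,         # assumed weight from trigger-pair evidence
}
probs = maxent_tag_probs(active_features, weights, tags)
```

  Because both kinds of evidence enter as feature weights in the same exponent, the model does not need the sources to be mutually consistent; in practice the weights would be estimated from data rather than set by hand.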

Overfitting Reduction of Intelligence Web Search based on Enforcement Learning (강화학습에 기초한 지능형 웹 검색의 과잉적합 감소방안)

  • Han, Song-Yi;Jung, Yong-Gyu
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.9 no.3
    • /
    • pp.25-30
    • /
    • 2009
  • Intelligent systems using reinforcement learning have recently been studied in various fields, including games and web search applications. A good model should fit the training data and also classify new records accurately. A model that is overfitted to the training data may commit the fallacy of hasty generalization, yet overfitting is often unavoidable in practice. This paper proposes an entropy and mutation model to reduce overfitting. It explains the variation of entropy and the artificial development of entropy in data mining, by analogy with the development of mutations for survival in nature. Periodic generation of maximum entropy is introduced to reduce overfitting. The maximum entropy model can be regarded as a periodic generalization step within the reinforcement process of intelligent web search.

Application of Generalized Maximum Entropy Estimator to the Two-way Nested Error Component Model with Ill-Posed Data

  • Cheon, Soo-Young
    • Communications for Statistical Applications and Methods
    • /
    • v.16 no.4
    • /
    • pp.659-667
    • /
    • 2009
  • Recently, Song and Cheon (2006) and Cheon and Lim (2009) developed the generalized maximum entropy (GME) estimator to solve ill-posed problems for the regression coefficients in the simple panel model. The models they discussed consider individual effects and spatial autoregressive disturbance effects. However, in many applications in economics the data may contain nested groupings. This paper considers a two-way error component model with nested groupings for ill-posed data and proposes a GME estimator of the unknown parameters. The performance of this estimator is compared with existing methods on a simulated dataset. The results indicate that the GME method performs best in terms of estimation quality for the unknown parameters when the data are ill-posed.

Intra-Sentence Segmentation using Maximum Entropy Model for Efficient Parsing of English Sentences (효율적인 영어 구문 분석을 위한 최대 엔트로피 모델에 의한 문장 분할)

  • Kim Sung-Dong
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.5
    • /
    • pp.385-395
    • /
    • 2005
  • Long sentence analysis has been a critical problem in machine translation because of its high complexity. Intra-sentence segmentation methods have been proposed to reduce parsing complexity. This paper presents an intra-sentence segmentation method based on a maximum entropy probability model to increase the coverage and accuracy of the segmentation. We construct rules for choosing candidate segmentation positions with a learning method that uses the lexical context of words tagged as segmentation positions. We also build a model that assigns a probability to each candidate segmentation position. The lexical contexts are extracted from a corpus tagged with segmentation positions and are incorporated into the probability model. We construct training data from Wall Street Journal sentences and evaluate intra-sentence segmentation on sentences from four different domains. The experiments show about 88% accuracy and about 98% coverage of the segmentation. The proposed method also improves parsing efficiency by 4.8 times in speed and 3.6 times in space.

Comparison of models for estimating surplus productions and methods for estimating their parameters (잉여생산량을 추정하는 모델과 파라미터 추정방법의 비교)

  • Kwon, Youjung;Zhang, Chang Ik;Pyo, Hee Dong;Seo, Young Il
    • Journal of the Korean Society of Fisheries and Ocean Technology
    • /
    • v.49 no.1
    • /
    • pp.18-28
    • /
    • 2013
  • We compared parameter estimates from three kinds of surplus production models: three types (Schaefer, Gulland, and Schnute) of the traditional surplus production model, a stock production model incorporating covariates (ASPIC), and a maximum entropy (ME) model. We also evaluated the performance of the models in estimating their parameters. The maximum sustainable yield (MSY) of small yellow croaker (Pseudosciaena polyactis) in Korean waters ranged from 35,061 metric tons (mt) with the Gulland model to 44,844 mt with the ME model, and fishing effort at MSY ($f_{MSY}$) ranged from 262,188 hauls with the Schnute model to 355,200 hauls with the ME model. For small yellow croaker, the lowest root mean square error (RMSE) was obtained from the Gulland surplus production model and the highest from the Schnute model, whereas the highest coefficient of determination ($R^2$) came from the ME model and the lowest from the ASPIC model. The MSY of Kapenta (Limnothrissa miodon) ranged from 16,880 mt with the ASPIC model to 25,373 mt with the ME model, and $f_{MSY}$ ranged from 94,580 hauls with the ASPIC model to 225,490 hauls with the Schnute model. In this case, both the lowest RMSE and the highest $R^2$ were obtained from the ME model, which fitted the data relatively better, suggesting that the ME model is statistically more stable and robust than the other models. Moreover, the ME model can provide additional ecologically useful parameters such as biomass at MSY ($B_{MSY}$), carrying capacity of the population (K), catchability coefficient (q), and the intrinsic rate of population growth (r).
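
  To make the relationship between the parameters named above concrete, the sketch below computes reference points under the standard Schaefer (logistic) surplus production model, where MSY = rK/4, $B_{MSY}$ = K/2, and $f_{MSY}$ = r/(2q). This is only one of the models the abstract compares, the abstract does not state the equations used, and the parameter values are hypothetical, not the paper's estimates.

```python
import math

def schaefer_reference_points(r, K, q):
    """Reference points of the Schaefer (logistic) surplus production model.

    r: intrinsic rate of population growth
    K: carrying capacity of the population
    q: catchability coefficient
    """
    return {
        "MSY": r * K / 4.0,        # maximum sustainable yield
        "B_MSY": K / 2.0,          # biomass at MSY
        "f_MSY": r / (2.0 * q),    # fishing effort at MSY
    }

def equilibrium_yield(f, r, K, q):
    """Equilibrium yield at constant effort f under the Schaefer model."""
    return q * f * K * (1.0 - q * f / r)

# Hypothetical parameter values chosen only for illustration.
points = schaefer_reference_points(r=0.4, K=100_000.0, q=1e-6)
yield_at_f_msy = equilibrium_yield(points["f_MSY"], r=0.4, K=100_000.0, q=1e-6)
```

  Evaluating the equilibrium yield at $f_{MSY}$ recovers MSY, which is a quick consistency check on the three formulas.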

A Comparative Analysis of Maximum Entropy and Analytical Models for Assessing Kapenta (Limnothrissa miodon) Stock in Lake Kariba (카리브호수 카펜타 자원량 추정을 위한 최대엔트피모델과 분석적 모델의 비교분석)

  • Tendaupenyu, Itai Hilary;Pyo, Hee-Dong
    • Environmental and Resource Economics Review
    • /
    • v.26 no.4
    • /
    • pp.613-639
    • /
    • 2017
  • A Maximum Entropy (ME) Model and an Analytical Model are used to assess the Kapenta stock in Lake Kariba. The ME model estimates a Maximum Sustainable Yield (MSY) of 25,372 tons and a corresponding effort of 109,731 fishing nights, suggesting overcapacity in the lake at the current effort level. The model estimates a declining stock from 1988 to 2009. The Analytical Model estimates an annual Acceptable Biological Catch (ABC) and a corresponding fishing mortality (F) of 1.210/year, which is higher than the prevailing fishing mortality of 0.927/year. The ME and Analytical Models estimate a similar biomass in the reference year 1982, confirming that both models are applicable to the stock. The ME model estimates that annual biomass has gradually declined to less than one third of its maximum (156,047 tons in 1988). This implies that the stock has been overexploited because catches have exceeded the ABC level, even though recent catch levels have not reached the MSY. The Analytical Model thus provides a more conservative ABC value than the MSY estimated by the ME model. Conservative management policies should be adopted to reduce the aggregate annual catch through a total allowable catch system and an effort reduction program.

SAMPLE ENTROPY IN ESTIMATING THE BOX-COX TRANSFORMATION

  • Rahman, Mezbahur;Pearson, Larry M.
    • Journal of the Korean Data and Information Science Society
    • /
    • v.12 no.1
    • /
    • pp.103-125
    • /
    • 2001
  • The Box-Cox transformation is a well-known family of power transformations that brings a set of data into agreement with the normality assumption for the residuals, and hence for the response variable, of a postulated model in regression analysis. This paper proposes a new method for estimating the Box-Cox transformation by maximizing the Sample Entropy statistic, which brings the data as close to normality as possible. The proposed procedure is compared with the maximum likelihood procedure, the procedure via artificial regression estimation, and the recently introduced procedure that maximizes the Shapiro-Francia W' statistic. In addition, we provide a table of the optimal spacings parameter for computing the Sample Entropy statistic.
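
  The idea can be sketched as follows, assuming a Vasicek-style spacing estimator of entropy applied to the standardized transformed data; since the normal distribution has maximum entropy for a given variance, the transformation parameter that maximizes this statistic gives the most normal-looking data. The estimator choice, the spacing `m`, the grid of candidate values, and the simulated data are all assumptions for illustration, and the paper's exact statistic may differ.

```python
import math
import random

def box_cox(values, lam):
    """Box-Cox power transformation of positive data."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def vasicek_entropy(sample, m):
    """Spacing-based entropy estimate using order statistics m apart."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(n):
        upper = xs[min(i + m, n - 1)]   # clamp at the largest order statistic
        lower = xs[max(i - m, 0)]       # clamp at the smallest order statistic
        total += math.log(n / (2.0 * m) * (upper - lower))
    return total / n

def standardized_entropy(sample, m):
    """Entropy estimate of the sample after scaling to unit standard deviation."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    return vasicek_entropy(sample, m) - math.log(sd)

def estimate_lambda(values, m, grid):
    """Grid search for the Box-Cox parameter that maximizes the entropy statistic."""
    return max(grid, key=lambda lam: standardized_entropy(box_cox(values, lam), m))

# Simulated log-normal data, for which the log transform (lambda = 0) yields normality.
random.seed(0)
data = [math.exp(random.gauss(0.0, 1.0)) for _ in range(200)]
grid = [i / 10 for i in range(-20, 21)]
best_lambda = estimate_lambda(data, m=10, grid=grid)
```

  A grid search keeps the sketch simple; a numerical optimizer could replace it, and the choice of `m` affects the estimate, which is presumably why the paper tabulates optimal spacings.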

Modeling potential habitats for Pergularia tomentosa using maximum entropy model and effect of environmental variables on its quantitative characteristics in arid rangelands, southeastern Iran

  • Hosseini, Seyed Hamzeh;Azarnivand, Hossein;Ayyari, Mahdi;Chahooki, Mohammad Ali Zare;Erfanzadeh, Reza;Piacente, Sonia;Kheirandish, Reza
    • Journal of Ecology and Environment
    • /
    • v.42 no.4
    • /
    • pp.227-239
    • /
    • 2018
  • Background: Predicting the potential habitat of plants in arid regions, especially medicinal plants, is very important. Although Pergularia tomentosa is a key medicinal species, it occurs at very low density in the arid rangelands of Iran and needs urgent ecological attention. In this study, we modeled and predicted the potential habitat of P. tomentosa using maximum entropy, and determined the effects of environmental factors (geology, geomorphology, altitude, and soil properties) on some characteristics of the species. Results: P. tomentosa was absent from igneous formations but present in conglomerate formations. Among geomorphological units, the best quantitative characteristics of P. tomentosa were found in the conglomerate formation-small hill area (plant aerial parts = 57.63 and root length = 30.68 cm), which had the highest electrical conductivity, silt, and $CaCO_3$ content. Conversely, the species was not found in the mountainous area with igneous formation. Moreover, plant density, root length, and aerial parts of the species were negatively correlated with soil sand content and positively correlated with $CaCO_3$, EC, potassium, and silt content. Maximum entropy was found to be a reliable method (ROC = 0.91) for predicting suitable habitats for P. tomentosa. Conclusion: These results suggest that, notwithstanding the importance of topography, environmental variables such as geomorphology and geology can play the main role in determining habitat suitability for rangeland plants in arid regions.

A Spam Filter System Based on Maximum Entropy Model Using Co-training with Spamminess Features and URL Features (스팸성 자질과 URL 자질의 공동 학습을 이용한 최대 엔트로피 기반 스팸메일 필터 시스템)

  • Gong, Mi-Gyoung;Lee, Kyung-Soon
    • The KIPS Transactions:PartB
    • /
    • v.15B no.1
    • /
    • pp.61-68
    • /
    • 2008
  • This paper presents a spam filter system that uses co-training with spamminess features and URL features, based on the maximum entropy model. Spamminess features are the emphasizing or abnormal patterns that spammers use in spam messages to express their intention and to avoid being filtered. Because spammers use URLs to provide details and alter the URL format to evade blacklists, normal and abnormal URLs can be key features for detecting spam messages. Co-training with spamminess features and URL features uses two feature sets that are independent of each other in training, so the filter system can learn from each of them independently. Experimental results on the TREC spam test collection show that the proposed approach improves accuracy by 9.1% and 6.9% compared to the base system and the bogo filter system, respectively. Analysis of the results shows that the proposed spamminess features and URL features are helpful. An experiment on co-training further shows that the two feature sets are useful, since the number of training documents is reduced while accuracy remains close to that of batch learning.

Habitat Analysis of Hyla suweonensis in the Breeding Season Using Species Distribution Modeling (종분포모형을 이용한 수원청개구리의 번식기 서식지 분석)

  • Song, Wonkyong
    • Journal of the Korean Society of Environmental Restoration Technology
    • /
    • v.18 no.1
    • /
    • pp.71-82
    • /
    • 2015
  • Hyla suweonensis is an endemic species and was designated in 2012 by the Ministry of Environment as the only Class I endangered species among amphibians; however, studies of its habitat are lacking. This study analyzed the habitat of H. suweonensis from spatial information using Maxent (maximum entropy model) as a species distribution model. For the model, we identified 45 presence points recorded through 2013 and selected 10 environmental variables through a literature review. The results showed that $429\,\mathrm{km}^2$ (0.95%) of the study area, about 7.75% of the total agricultural area, was highly probable habitat for H. suweonensis. The habitat of H. suweonensis was characterized by rice paddy fields larger than $1\,\mathrm{km}^2$ at low elevations, with flat slopes and little fragmentation. Distance from forests and rivers was identified as a factor affecting habitat suitability. To conserve H. suweonensis, large areas of rice paddy fields should be preserved, and areas around forests and rivers in particular require more intensive management. In addition, to compensate for degraded habitats of H. suweonensis in urban areas such as Suwon city, an integrated watershed management strategy could be effective from the perspective of an ecological habitat network for the species.