• Title/Summary/Keyword: Maximum entropy model


A Spam Filter System based on Maximum Entropy Model Using Spamness Features and URL Features (스팸성 자질과 URL 자질을 이용한 최대엔트로피모델 기반 스팸메일 필터 시스템)

  • Gong, Mi-Gyoung;Lee, Kyung-Soon
    • Annual Conference on Human and Language Technology / 2006.10e / pp.213-219 / 2006
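  • This paper proposes a spam filter system based on the maximum entropy model that uses the spamness features and URL features found in spam mail. Spamness features are emphasis patterns that spammers deliberately insert into spam mail, and words that have been abnormally altered to get past filter systems. In addition to spamness features, repeatedly occurring URLs and abnormal links are also used as features. The reason is that, among URLs that are hyperlinked or typed directly into a mail to give the recipient additional information, invalid and abnormal URLs crafted to evade filter systems can help filter out spam mail. We also combine two classifiers, one applying spamness features and the other applying URL features. Combining classifiers has the advantage that the features used by each classifier can be used independently. The experimental results confirm that using spamness features and URLs improves the performance of the spam filter system.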
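
The classifier combination described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: a binary maximum entropy model is equivalent to logistic regression, the toy feature vectors are invented, and averaging log-odds in `combine` is an assumed combination rule (the abstract does not say how the two classifiers are merged).

```python
import math

def predict_proba(weights, bias, x):
    """P(spam | x) under a binary maximum entropy (logistic) model."""
    z = bias + sum(w * xj for w, xj in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_maxent(samples, labels, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = y - predict_proba(weights, bias, x)  # gradient w.r.t. the logit
            weights = [w + lr * err * xj for w, xj in zip(weights, x)]
            bias += lr * err
    return weights, bias

def combine(p_a, p_b, eps=1e-12):
    """Assumed rule: average the log-odds of the two classifiers."""
    def logit(p):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        return math.log(p / (1.0 - p))
    z = 0.5 * (logit(p_a) + logit(p_b))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical binary features; label 1 = spam, 0 = legitimate mail.
# Spamness features: [emphasis pattern present, obfuscated word present]
# URL features:      [repeated URL present, invalid URL present]
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = [1, 1, 1, 0]
spamness_model = train_maxent(X, y)
url_model = train_maxent(X, y)

p_spam = combine(predict_proba(*spamness_model, [1, 1]),
                 predict_proba(*url_model, [1, 0]))
p_ham = combine(predict_proba(*spamness_model, [0, 0]),
                predict_proba(*url_model, [0, 0]))
```

Because each model is trained on its own feature set, either one can be retrained or replaced without touching the other, which is the independence advantage the abstract mentions.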


Music Recommender System based on Lyrics Information (가사정보를 이용한 음악 추천 시스템)

  • Chang, Geun-Tak;Seo, Jung-Yun
    • Annual Conference on Human and Language Technology / 2010.10a / pp.42-45 / 2010
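  • This study proposes a system that analyzes the lyrics of Korean popular songs at the morpheme level, classifies the emotion of each song based on that information, and recommends songs accordingly. To build the system, the collected lyrics are morphologically analyzed, each morpheme is taken as a feature, and the classifier is trained with an ME (maximum entropy) model. The performance of the trained classifier is analyzed as a function of the number of features, and the recommender system built on the classifier is evaluated on randomly generated data sets to measure how accurately it recommends songs.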


Hardware Implementation of Context Modeler in HEVC CABAC Decoder (HEVC CABAC 복호기의 문맥 모델러 설계)

  • Kim, Sohyun;Kim, Doohwan;Lee, Seongsoo
    • Journal of IKEEE / v.21 no.3 / pp.280-283 / 2017
  • HEVC (high efficiency video coding) uses CABAC (context-based adaptive binary arithmetic coding) for entropy coding, where a context model estimates the probability of each syntax element. In this paper, a context modeler was designed and implemented for CABAC decoding. A lookup table was used to reduce computation and increase speed. Twelve simulations covering HEVC standard test sequences and encoder configurations were performed, and the context modeler was verified to operate correctly. The designed context modeler was synthesized in 0.18 µm technology. The maximum frequency, maximum throughput, and gate count are 200 MHz, 200 Mbin/s, and 29,268 gates, respectively.
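
To illustrate what a table-driven context model does, here is a simplified sketch. It is an assumption-laden toy, not the paper's hardware design and not the normative HEVC tables: the probability states follow the published CABAC design rule p_s = 0.5 * alpha^s, while `NEXT_LPS` is derived here by nearest-state rounding and may differ from the standard's transition table. All names are hypothetical.

```python
# Decay factor from the CABAC design: the LPS probability falls from 0.5
# towards 0.01875 over 63 steps.
ALPHA = (0.01875 / 0.5) ** (1.0 / 63)
NUM_STATES = 63  # adaptive states 0..62 in this sketch

# Lookup table 1: LPS probability estimate for each state.
P_LPS = [0.5 * ALPHA ** s for s in range(NUM_STATES)]

# Lookup table 2: next state after observing the most probable symbol (MPS).
NEXT_MPS = [min(s + 1, NUM_STATES - 1) for s in range(NUM_STATES)]

def _nearest_state(p):
    """Index of the table entry closest to probability p."""
    return min(range(NUM_STATES), key=lambda k: abs(P_LPS[k] - p))

# Lookup table 3: next state after observing the least probable symbol (LPS).
# Derived, not copied from the standard: p_new = alpha * p + (1 - alpha),
# capped at 0.5 and rounded to the nearest state.
NEXT_LPS = [_nearest_state(min(0.5, ALPHA * p + (1.0 - ALPHA))) for p in P_LPS]

class ContextModel:
    """Adaptive probability estimate for one syntax-element context."""

    def __init__(self, state=0, mps=0):
        self.state = state
        self.mps = mps

    def p_lps(self):
        return P_LPS[self.state]

    def update(self, bin_val):
        if bin_val == self.mps:
            self.state = NEXT_MPS[self.state]
        else:
            if self.state == 0:
                self.mps = 1 - self.mps  # LPS and MPS swap roles at p = 0.5
            self.state = NEXT_LPS[self.state]

ctx = ContextModel()
for b in [0, 0, 0, 0, 0]:
    ctx.update(b)
state_after_mps_run = ctx.state
ctx.update(1)  # one LPS pushes the estimate back towards 0.5
```

Each update is a single table read, which is why a lookup table replaces the multiplications that a direct probability update would need.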

Applicability of Composite Beads, Spent Coffee Grounds/Chitosan, for the Adsorptive Removal of Pb(II) from Aqueous Solutions

  • Choi, Hee-Jeong
    • Applied Chemistry for Engineering / v.30 no.5 / pp.536-545 / 2019
  • An experiment was conducted to evaluate the adsorptive removal of Pb(II) from an aqueous solution using beads made from a mixture of spent coffee grounds and chitosan (CC-beads). Various parameters affecting the adsorption of Pb(II) onto CC-beads were investigated. Based on the experimental data, the adsorption kinetics and adsorption isotherms were analyzed to determine the adsorption rate, maximum adsorption capacity, adsorption energy, and adsorption strength. The entropy, enthalpy, and free energy were also calculated by thermodynamic analysis. According to the FT-IR analysis, CC-beads have a structure well suited to heavy metal adsorption. The adsorption of Pb(II) onto CC-beads was well described by the pseudo-second-order kinetic model and the Langmuir isotherm model, with a maximum adsorption capacity of 163.51 mg/g. The adsorption was closer to chemical adsorption than to physical adsorption, and it was exothermic and spontaneous in nature. CC-beads are economical because they are inexpensive and made from recyclable waste, which is significant for the continuous circulation of resources. Thus, CC-beads can compete with other adsorbents.
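
For readers unfamiliar with the models named in this abstract, the sketch below writes out their standard textbook forms. Only the maximum capacity of 163.51 mg/g comes from the abstract; the Langmuir constant, the rate constant, and the thermodynamic values are hypothetical placeholders chosen for illustration.

```python
def langmuir_qe(c_e, q_max, k_l):
    """Langmuir isotherm: equilibrium uptake q_e (mg/g) at concentration c_e (mg/L)."""
    return q_max * k_l * c_e / (1.0 + k_l * c_e)

def pseudo_second_order_qt(t, q_e, k2):
    """Pseudo-second-order kinetics: uptake q_t (mg/g) after contact time t."""
    return k2 * q_e ** 2 * t / (1.0 + k2 * q_e * t)

def gibbs_free_energy(delta_h, delta_s, temperature_k):
    """Delta G = Delta H - T * Delta S (kJ/mol if inputs use kJ/mol and kJ/(mol K))."""
    return delta_h - temperature_k * delta_s

Q_MAX = 163.51  # mg/g, reported in the abstract
K_L = 0.05      # L/mg, hypothetical
K2 = 0.001      # g/(mg min), hypothetical

# At c_e = 1 / K_L the Langmuir model predicts exactly half of Q_MAX.
half_capacity = langmuir_qe(1.0 / K_L, Q_MAX, K_L)

# Uptake approaches the equilibrium value as contact time grows.
uptake_late = pseudo_second_order_qt(10_000.0, Q_MAX, K2)

# Hypothetical exothermic case (Delta H < 0): Delta G < 0 means spontaneous.
delta_g = gibbs_free_energy(delta_h=-20.0, delta_s=0.01, temperature_k=298.15)
```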

Design of an Efficient Binary Arithmetic Encoder for H.264/AVC (H.264/AVC를 위한 효율적인 이진 산술 부호화기 설계)

  • Moon, Jeon-Hak;Kim, Yoon-Sup;Lee, Seong-Soo
    • Journal of the Institute of Electronics Engineers of Korea SD / v.46 no.12 / pp.66-72 / 2009
  • This paper proposes an efficient binary arithmetic encoder for CABAC, one of the entropy coding methods used in H.264/AVC. The existing binary arithmetic encoding algorithm has high computational complexity and data dependencies between steps, which make fast operation difficult. Therefore, renormalization uses a 2-stage pipeline architecture, which reduces the computational complexity and data dependency. The context model updater is implemented with a simple expression in place of the transIdxMPS table and with the transIdxLPS and rangeTabLPS tables merged, which reduces hardware size. The arithmetic calculator supports regular mode, bypass mode, and termination mode, according to the occurrence probability of the binary value. It can operate at maximum speed. The proposed binary arithmetic encoder has a gate count of 7,282 in a 0.18 µm standard cell library and processes about one input symbol per cycle.
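
The data dependency mentioned in this abstract is easiest to see in the sequential renormalization loop of a CABAC-style encoder. The sketch below follows the general shape of the H.264/AVC reference procedure (9-bit range, 10-bit low, outstanding-bit handling) but omits the first-bit suppression flag for brevity. It is a software illustration, not the pipelined hardware the paper proposes, and the function and variable names are hypothetical.

```python
def renormalize(low, rng, outstanding, bits):
    """Rescale (low, rng) until rng >= 256, appending resolved bits to `bits`.

    `outstanding` counts bits whose value is still undecided because a carry
    may yet propagate into them.
    """
    def put_bit(b):
        nonlocal outstanding
        bits.append(b)
        bits.extend([1 - b] * outstanding)  # resolve pending bits as the inverse
        outstanding = 0

    while rng < 256:
        if low < 256:
            put_bit(0)
        elif low >= 512:
            low -= 512
            put_bit(1)
        else:
            low -= 256
            outstanding += 1  # carry still unknown; defer the decision
        rng <<= 1
        low <<= 1
    return low, rng, outstanding

out = []
state_a = renormalize(600, 64, 0, out)    # two iterations
bits_a = list(out)

out_b = []
state_b = renormalize(300, 128, 0, out_b)  # one deferred bit
state_c = renormalize(88, 128, state_b[2], out_b)
```

Every iteration depends on the `low` and `rng` produced by the previous one, so the number of iterations varies per symbol; this is the kind of serial dependency that a pipelined design has to break up.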

Prediction of potential habitats and distribution of the marine invasive sea squirt, Herdmania momus

  • Park, Ju-Un;Lee, Taekjun;Kim, Dong Gun;Shin, Sook
    • Korean Journal of Environmental Biology / v.38 no.1 / pp.179-188 / 2020
  • The influx of marine exotic and alien species is disrupting marine ecosystems and aquaculture. Herdmania momus, reported as an invasive species, is distributed all along the coast of Jeju Island and has been confirmed to have spread to Busan. The potential habitats and distribution of H. momus were estimated using the maximum entropy (MaxEnt) model, the quantum geographic information system (QGIS), and Bio-ocean rasters for analysis of climate and environment (Bio-ORACLE). As a species distribution model (SDM), MaxEnt can predict distribution and spread from species occurrence data alone. Temperature and salinity were selected as environmental variables based on previous literature. Additionally, two representative concentration pathway (RCP) scenarios (RCP 4.5 and RCP 8.5) were set up to estimate future potential habitats under climate change. The MaxEnt prediction identified maximum temperature as the highest contributor (77.1%) and mean salinity as the lowest (0%). Habitat suitability for H. momus was highest on Jeju Island, and no potential habitat or distribution was found in the Yellow Sea. Under RCP 4.5, H. momus would be distributed along the coast of Jeju Island in 2050, and the distribution would expand to parts of the Korea Strait by 2100. Under RCP 8.5, the distribution in 2050 is predicted to be similar to that under RCP 4.5; however, by 2100, the distribution is predicted to expand to parts of the Korea Strait and the East Sea. This study can serve as basic data for effectively controlling the ecological damage caused by H. momus by predicting its spread and distribution both at present and in the future.
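
The idea of fitting a distribution from occurrence data alone can be sketched in a few lines. This is a bare-bones illustration of the maximum entropy principle, not the MaxEnt software used in the study: it has a single linear feature, no regularization, and an invented five-cell grid. All names and values are hypothetical.

```python
import math

def gibbs(features, lam):
    """Distribution over grid cells: p(cell) proportional to exp(lam . f(cell))."""
    scores = [math.exp(sum(l * x for l, x in zip(lam, f))) for f in features]
    z = sum(scores)
    return [s / z for s in scores]

def fit_maxent(features, presence_cells, iterations=2000, lr=2.0):
    """Gradient ascent so that expected features match the presence-site means."""
    n_feat = len(features[0])
    target = [sum(features[c][j] for c in presence_cells) / len(presence_cells)
              for j in range(n_feat)]
    lam = [0.0] * n_feat
    for _ in range(iterations):
        probs = gibbs(features, lam)
        expected = [sum(p * f[j] for p, f in zip(probs, features))
                    for j in range(n_feat)]
        lam = [l + lr * (t - e) for l, t, e in zip(lam, target, expected)]
    return lam, gibbs(features, lam)

# Hypothetical grid: one rescaled temperature value per cell.
cells = [[0.0], [0.25], [0.5], [0.75], [1.0]]
presences = [3, 4, 4]  # indices of cells with occurrence records (repeats allowed)

lam, probs = fit_maxent(cells, presences)
mean_temp_under_model = sum(p * f[0] for p, f in zip(probs, cells))
```

Among all distributions over the grid whose mean temperature equals the mean at the occurrence sites, the fitted one has the largest entropy; no absence records are needed.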

Magnetic properties and magnetocaloric effect of Sr-doped Pr0.7Ca0.3MnO3 compounds

  • Yen, Pham Duc Huyen;Dung, Nguyen Thi;Thanh, Tran Dang;Yu, Seong-Cho
    • Current Applied Physics / v.18 no.11 / pp.1280-1288 / 2018
  • In this work, we show that substituting Sr for Ca modifies the magnetic and magnetocaloric properties of $Pr_{0.7}Ca_{0.3-x}Sr_xMnO_3$ compounds. Analysis of the temperature dependence of magnetization, M(T), shows that the Curie temperature ($T_C$) increases with increasing Sr content (x); $T_C$ ranges from 130 to 260 K for x = 0.0-0.3. Using the phenomenological model and M(T,H) data measured at several applied magnetic fields, the magnetocaloric effect of the $Pr_{0.7}Ca_{0.3-x}Sr_xMnO_3$ compounds was investigated through the temperature and magnetic field dependences of the magnetic entropy change ${\Delta}S_m$(T,H) and the specific heat change ${\Delta}C_P$(T,H). Under an applied magnetic field change of 10 kOe, the maximum value of $-{\Delta}S_m$ is about $3J/kg{\cdot}K$, and the maximum and minimum values of ${\Delta}C_P$(T) are calculated to be about ${\pm}60J/kg{\cdot}K$ for the x = 0.3 sample. Additionally, the critical behavior of the $Pr_{0.7}Ca_{0.3-x}Sr_xMnO_3$ compounds around their $T_C$ was analyzed. The results suggest a coexistence of short- and long-range ferromagnetic interactions in the samples. Moreover, Sr doping favors the establishment of short-range interactions.
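
For context, the two quantities named in this abstract are conventionally defined by the standard thermodynamic relations below. These are general textbook definitions, given here on the assumption that the abstract uses the usual conventions; they are not the specific phenomenological model applied in the paper.

```latex
% Magnetic entropy change from the Maxwell relation, for a field change 0 -> H
\Delta S_m(T,H) = \int_0^{H} \left( \frac{\partial M}{\partial T} \right)_{H'} \, dH'

% Corresponding change in specific heat
\Delta C_P(T,H) = T \, \frac{\partial\, \Delta S_m(T,H)}{\partial T}
```

Since M falls steeply with T near $T_C$, ${\Delta}S_m$ is negative and peaks in magnitude there, which is why the abstract reports the maximum of $-{\Delta}S_m$; ${\Delta}C_P$ changes sign across that peak, giving the paired maximum and minimum.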

Comparative Study of Citizen Science and Expert Based Survey Data Using the Species Distribution Model of Rana uenoi (큰산개구리(Rana uenoi ) 종분포모형을 활용한 시민과학 및 전문가 기반 조사자료의 비교연구)

  • Woncheol Lee;Jeongwoo Yoo;Paikho Rho
    • Journal of Environmental Science International / v.32 no.6 / pp.429-440 / 2023
  • Quantitative habitat models are built from species occurrence and spatial abundance data, which are usually acquired by professional field ecologists and citizen scientists. The importance of citizen science data is increasing, but the quality of these data needs to be evaluated. This study aims to compare expert-based data and citizen science data based on the performance of quantitative models derived from each data set. A Maximum Entropy (MaxENT) model was developed using eight environmental variables, including climate, topography, landcover, and distance to forest edge. The AUC values derived from the MaxENT model were 0.842 and 0.809, respectively, indicating a high level of explanatory power. All environmental variables had similar values for both data sets, except for the distances to forest edge and rice paddy, for which the expert-based survey data showed relatively higher values than the citizen science data as the distances increased. This result suggests that the habitat model derived from expert-based survey data captures a broader ecological niche, including wider ranges from forest edges and isolated habitat patches of rice paddy. This is presumably because citizen scientists focus on direct observation, whereas professional field surveys use a wider variety of methods.
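
The AUC reported in this abstract can be read as the probability that a randomly chosen occurrence site receives a higher model score than a randomly chosen comparison site. A minimal sketch of that rank-based calculation follows; the scores are invented, and in presence-only modelling the comparison sites are typically background points rather than confirmed absences.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical habitat-suitability scores.
occurrence_scores = [0.9, 0.8, 0.4]
background_scores = [0.7, 0.3, 0.2]
example_auc = auc(occurrence_scores, background_scores)  # 8 of 9 pairs ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.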

A Study on the Species Distribution Modeling using National Ecosystem Survey Data (전국자연환경조사 자료를 이용한 종분포모형 연구)

  • Kim, Jiyeon;Seo, Changwan;Kwon, Hyuksoo;Ryu, Jieun;Kim, Myungjin
    • Journal of Environmental Impact Assessment / v.21 no.4 / pp.593-607 / 2012
  • The Ministry of Environment has conducted the 'National Ecosystem Survey' since 1986. It is carried out nationwide every ten years and is the largest survey project in Korea. The second and third surveys produced a GIS-based inventory of species. The three surveys used different survey methods. Few studies in Korea have modeled species distribution using national survey data. The purposes of this study are to test species distribution models in order to find the most suitable modeling method for the National Ecosystem Survey data, and to investigate how the modeling results vary with survey method and taxonomic group. Occurrence data for nine species were extracted from the National Ecosystem Survey by taxonomic group (plant, mammal, and bird). The plants are Korean winter hazel (Corylopsis coreana), Iris odaesanensis (Iris odaesanensis), and Berchemia (Berchemia berchemiaefolia). The mammals are Korean Goral (Nemorhaedus goral), Marten (Martes flavigula koreana), and Leopard cat (Felis bengalensis). The birds are Black Woodpecker (Dryocopus martius), Eagle Owl (Bubo bubo), and Common Buzzard (Buteo buteo). The environmental variables consisted of climate, topography, soil, and vegetation structure. Two modeling methods (GAM and Maxent) were tested across the nine species, and predictive maps of the target species were produced. The results of this study are as follows. First, Maxent showed a 5-fold cross-validated AUC similar to that of GAM. Because the National Ecosystem Survey data are presence-only, Maxent is easier to develop than GAM and is therefore the more useful species distribution model for these data. Second, the modeling results for the second and third surveys sometimes differed because of their different survey methods, so the two data sets need to be combined to produce a reasonable result. Lastly, the predicted distribution patterns differed by taxonomic group. These results should be considered when developing a species distribution model from the National Ecosystem Survey and applying it to nationwide biodiversity research.

Application of Species Distribution Model for Predicting Areas at Risk of Highly Pathogenic Avian Influenza in the Republic of Korea (종 분포 모형을 이용한 국내 고병원성 조류인플루엔자 발생 위험지역 추정)

  • Kim, Euttm;Pak, Son-Il
    • Journal of Veterinary Clinics / v.36 no.1 / pp.23-29 / 2019
  • While research findings suggest that highly pathogenic avian influenza (HPAI) is the leading cause of economic loss in the Korean poultry industry, with an estimated cumulative impact of $909 million since 2003, identifying the environmental and anthropogenic risk factors involved remains a challenge. The objective of this study was to identify areas at high risk for potential HPAI outbreaks according to the likelihood of HPAI virus detection in wild birds. This study integrates spatial information on HPAI surveillance with relevant demographic and environmental factors collected between 2003 and 2018. Maximum Entropy (Maxent) species distribution modeling with presence-only data was used to model the spatial risk of HPAI virus. We used historical data on HPAI occurrence in wild birds during the period 2003-2018, collected by the National Quarantine Inspection Agency of Korea. The database contains a total of 1,065 HPAI cases (farms) tied to 168 unique locations for wild birds. Among the environmental variables, the most effective predictors of the potential distribution of HPAI in wild birds were (in order of importance) altitude, number of HPAI outbreaks at farm level, daily amount of manure processed, and number of wild birds migrating into Korea. The area under the receiver operating characteristic curve for the 10 Maxent replicate runs of the model with twelve variables was 0.855 with a standard deviation of 0.012, which indicates excellent model performance. The results revealed that the geographic areas at risk of HPAI are heterogeneously distributed throughout the country, with higher likelihood in the west and in coastal areas. The results may help biosecurity authorities design risk-based surveillance and implement control interventions optimized for the areas at highest risk of HPAI outbreaks.