• Title/Summary/Keyword: Entropy-based Model

A Spam Filter System based on Maximum Entropy Model Using Spamness Features and URL Features (스팸성 자질과 URL 자질을 이용한 최대엔트로피모델 기반 스팸메일 필터 시스템)

  • Gong, Mi-Gyoung;Lee, Kyung-Soon
    • Annual Conference on Human and Language Technology / 2006.10e / pp.213-219 / 2006
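  • This paper proposes a spam filter system based on a maximum entropy model that uses the spamness features and URL features found in spam mail. Spamness features are the emphasis patterns that spammers artificially insert into spam mail and the words they abnormally alter to get past filter systems. In addition to spamness features, repeatedly occurring URLs and abnormal links are used as features. The reason is that, among the URLs that are hyperlinked or typed directly into a message to give the recipient additional information, invalid and abnormal URLs crafted to evade filter systems can help screen out spam mail. The two classifiers, one using spamness features and one using URLs, are also combined. Combining classifiers has the advantage that the features used by each classifier can be used independently. The experimental results confirm that using spamness features and URLs improves the performance of the spam filter system.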

ModifiedFAST: A New Optimal Feature Subset Selection Algorithm

  • Nagpal, Arpita;Gaur, Deepti
    • Journal of information and communication convergence engineering / v.13 no.2 / pp.113-122 / 2015
  • Feature subset selection is used as a pre-processing step in learning algorithms. In this paper, we propose an efficient algorithm, ModifiedFAST, for feature subset selection. The algorithm is suitable for text datasets and uses the concept of information gain to remove irrelevant and redundant features. A new optimal value is found for the symmetric uncertainty threshold that identifies relevant features. The thresholds used by previous feature selection algorithms such as FAST, Relief, and CFS were not optimal. It has been shown that the threshold value greatly affects the percentage of selected features and the classification accuracy. A new unified performance metric that combines accuracy and the number of selected features is proposed and applied in the algorithm. Experiments showed that, on most of the datasets, the proposed algorithm selected a lower percentage of features than existing algorithms. The effectiveness of the algorithm at the optimal threshold was statistically validated against other algorithms.
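
    Illustrative sketch (not taken from the paper): symmetric uncertainty is commonly defined as SU(X, Y) = 2·IG(X; Y) / (H(X) + H(Y)). The snippet below computes it for discrete features and keeps those whose SU with the class label exceeds a threshold. The helper names, the toy data, and the threshold value 0.1 are assumptions for illustration; the abstract does not state the optimal threshold, and this is not the ModifiedFAST algorithm itself.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)); lies in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    info_gain = h_x + h_y - h_xy      # IG(X; Y) = H(X) + H(Y) - H(X, Y)
    return 2.0 * info_gain / (h_x + h_y)


def select_relevant(features, labels, threshold):
    """Names of features whose SU with the class label exceeds the threshold."""
    return [name for name, column in features.items()
            if symmetric_uncertainty(column, labels) > threshold]


# Hypothetical toy data: binary term-presence features and class labels.
labels = [1, 1, 0, 0]
features = {
    "copies_label": [1, 1, 0, 0],  # fully determines the label
    "independent": [1, 0, 1, 0],   # carries no information about the label
}
relevant = select_relevant(features, labels, threshold=0.1)  # assumed threshold
print(relevant)
```

    A relevance filter of this kind would normally be followed by a redundancy-removal step, which is omitted here.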

Design of Behavioral Classification Model Based on Skeleton Joints (Skeleton Joints 기반 행동 분류 모델 설계)

  • Cho, Jae-hyeon;Moon, Nam-me
    • Proceedings of the Korea Information Processing Society Conference / 2019.10a / pp.1101-1104 / 2019
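  • Kinect, an RGB-D camera, has made it possible to collect skeleton data describing the bones and joints of the human body in 3D space. Action classification from skeleton data is being approached with a variety of artificial neural networks such as RNNs and CNNs. This study collects Skeleton Joints with Kinect and classifies actions by training a DNN-based skeleton model. In the Skeleton Joints Processing stage, the coordinates of 25 Skeleton Joints are obtained with Kinect's Depth Map based Skeleton Tracker. As preprocessing for training, each coordinate is converted to a relative coordinate, the amount of data is limited, and exception handling is performed for joints that were not tracked. In the skeleton modeling training stage, a three-layer DNN is built and, using the softmax_cross_entropy function, the Skeleton Joints are classified into four actions: picking up, putting down, crossing the arms, and bringing the face close.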
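
    Illustrative sketch (not the authors' code): the preprocessing and loss described above can be pictured roughly as follows. The choice of joint 0 as the reference point, the zero-offset fill for untracked joints, and the class names are assumptions; the abstract does not specify them. The loss is a plain softmax cross-entropy written out by hand rather than a call into any particular framework.

```python
import math

# Hypothetical labels for the four actions named in the abstract.
ACTIONS = ["pick_up", "put_down", "cross_arms", "face_close"]


def to_relative(joints, root_index=0):
    """Convert absolute (x, y, z) joints to offsets from a root joint.

    Untracked joints are given as None. Assumption: a frame with an
    untracked root is dropped (returns None), and any other untracked
    joint is replaced by a zero offset.
    """
    root = joints[root_index]
    if root is None:
        return None
    rx, ry, rz = root
    return [(0.0, 0.0, 0.0) if j is None else (j[0] - rx, j[1] - ry, j[2] - rz)
            for j in joints]


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy between softmax(logits) and a one-hot target."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target_index]


# Toy frame with three joints (a real Kinect frame would have 25).
frame = [(1.0, 2.0, 3.0), (2.0, 4.0, 6.0), None]
relative = to_relative(frame)

# Loss for made-up network outputs when the true action is "cross_arms".
loss = softmax_cross_entropy([0.0, 0.0, 0.0, 0.0], ACTIONS.index("cross_arms"))
print(relative, loss)
```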

Application of Liquid Theory to Sodium-Ammonia Solution

  • Lee, Jong-Myung;Jhon, Mu-Shik
    • Bulletin of the Korean Chemical Society / v.2 no.3 / pp.90-96 / 1981
  • The significant structure theory of liquids has been successfully applied to the sodium-ammonia solution. In applying the theory, we assumed that four species exist in solution, i.e., sodium cation, solvated electron, triple ion, and free electron, and that equilibria exist among them. Based on these assumptions, we set up a model that explains the anomalous properties of the sodium-ammonia solution. The partition function for the solution is composed of the partition functions for these four species together with the Debye-Hückel excess free energy term. Agreement between calculated and experimental values of the thermodynamic quantities, such as molar volume, vapor pressure, partial molar enthalpy and entropy, and chemical potential, as well as viscosity, is quite satisfactory.

Music Recommender System based on Lyrics Information (가사정보를 이용한 음악 추천 시스템)

  • Chang, Geun-Tak;Seo, Jung-Yun
    • Annual Conference on Human and Language Technology / 2010.10a / pp.42-45 / 2010
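  • This study proposes a system that analyzes the lyrics of Korean popular songs at the morpheme level and, based on this information, classifies the emotion of each song and recommends songs. To build the system, the collected lyrics are morphologically analyzed and each morpheme is taken as a feature; the classifier is trained with an ME model. The performance of the trained classifier is analyzed as a function of the number of features, and the recommender system built on the classifier is evaluated on how accurately it recommends songs for a randomly generated data set.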

Flow Mechanism of Dilatant Systems. (Ⅰ) Starch Suspension in Water

  • Bang, Jeong-Hwang;Kim, Eung-Ryul;Hahn, Sang-Joon;Ree, Tai-kyue
    • Bulletin of the Korean Chemical Society / v.4 no.5 / pp.212-217 / 1983
  • Depending on the range of shear rates, temperatures and concentrations, the potato starch suspension in water behaves as a typical dilatant system. The flow curves of the suspension at various concentrations and temperatures were obtained by using a Couette type rotational viscometer. The flow mechanism of the suspension is explained by a structure model of starch granules in the suspension. Based on the experimental results, a general flow equation for the dilatant system is proposed. By analyzing the temperature dependency of the relaxation time, the activation enthalpy and activation entropy for flow in the starch-water suspension were calculated, the former being about 10 kcal/mol.

A Two-Phase Shallow Semantic Parsing System Using Clause Boundary Information and Tree Distance (절 경계와 트리 거리를 사용한 2단계 부분 의미 분석 시스템)

  • Park, Kyung-Mi;Hwang, Kyu-Baek
    • Journal of KIISE:Computing Practices and Letters / v.16 no.5 / pp.531-540 / 2010
  • In this paper, we present a two-phase shallow semantic parsing method based on a maximum entropy model. The first phase recognizes semantic arguments, i.e., argument identification. The second phase assigns appropriate semantic roles to the recognized arguments, i.e., argument classification. The performance of the first phase is crucial for the success of the entire system, because the second phase operates on the regions recognized at the identification stage. To improve the performance of argument identification, we incorporate syntactic knowledge into its pre-processing step. More precisely, the boundaries of the immediate clause and the upper clauses of a predicate, obtained from clause identification, are used to reduce the search space. Further, the distance on the parse tree from the parent node of a predicate to the parent node of a parse constituent is exploited. Experimental results show that incorporating syntactic knowledge and separating argument identification from the rest of the procedure improve the performance of the shallow semantic parsing system.

An Improved ViBe Algorithm of Moving Target Extraction for Night Infrared Surveillance Video

  • Feng, Zhiqiang;Wang, Xiaogang;Yang, Zhongfan;Guo, Shaojie;Xiong, Xingzhong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.12 / pp.4292-4307 / 2021
  • In night infrared surveillance video, target imaging is easily affected by light because of the characteristics of the active infrared camera, and the classical ViBe algorithm has problems in moving target extraction such as background misjudgment, noise interference, and ghost shadows. Therefore, this paper proposes an improved ViBe algorithm (I-ViBe) for moving target extraction in night infrared surveillance video. First, the video frames are sampled and judged by the degree of light influence, and each frame is assigned to one of three situations: no light change, small light change, and severe light change. Second, when there is no light change, the ViBe algorithm is used to extract the moving target. When the light change is small, the segmentation factor of the ViBe algorithm is adaptively changed to reduce the impact of the light. When the illumination changes drastically, the moving target is extracted from the differential image of the current frame and the background model using a region growing algorithm improved by the image entropy. The simulation results show that the proposed I-ViBe algorithm is more robust to the influence of illumination. When extracting moving targets at night, the I-ViBe algorithm can make target extraction more accurate and provide more effective data for subsequent night behavior recognition and target tracking.
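
    Illustrative sketch (not the I-ViBe implementation): the image entropy mentioned above is usually the Shannon entropy of the grey-level histogram. The snippet computes it for the absolute difference between a frame and a background model. Images are plain nested lists of integers, and the function names and toy values are assumptions; how the entropy steers the region growing step is not described in the abstract and is not modelled here.

```python
import math
from collections import Counter


def abs_difference(frame, background):
    """Pixel-wise absolute difference of two equally sized grey images."""
    return [[abs(f - b) for f, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


def image_entropy(image):
    """Shannon entropy (in bits) of the grey-level histogram of an image."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


# Toy 2x2 images: half of the pixels differ from the background.
background = [[10, 10], [10, 10]]
frame = [[10, 10], [200, 200]]
diff = abs_difference(frame, background)
print(diff, image_entropy(diff))
```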

Development of Online Machine Learning Model for AHU Supply Air Temperature Prediction using Progressive Sampling and Normalized Mutual Information (점진적 샘플링과 정규 상호정보량을 이용한 온라인 기계학습 공조기 급기온도 예측 모델 개발)

  • Chu, Han-Gyeong;Shin, Han-Sol;Ahn, Ki-Uhn;Ra, Seon-Jung;Park, Cheol Soo
    • Journal of the Architectural Institute of Korea Structure & Construction / v.34 no.6 / pp.63-69 / 2018
  • Machine learning models can capture the dynamics of building systems with fewer inputs than first-principles-based simulation models. The training data for developing a machine learning model are usually selected heuristically. In this study, the authors developed a machine learning model that describes the supply air temperature of an AHU in a real office building. To reduce the training data rationally, the progressive sampling method was used. Even though progressive sampling requires far less training data (n=60) than offline regular sampling (n=1,799), the MBEs of the two models are similar (2.6% vs. 5.4%). In addition, the normalized mutual information (NMI) was applied to decide when to update the machine learning model: if the NMI between the simulation output and the measured data is less than 0.2, the model has to be updated. With the NMI-based update, the model can predict better ($5.4\% \rightarrow 1.3\%$).
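
    Illustrative sketch (not the authors' implementation): the update rule above can be pictured as a check of the NMI between model output and measurements against the 0.2 threshold from the abstract. The abstract does not say how NMI is estimated, so the equal-width binning, the number of bins, and the normalization by the geometric mean of the two entropies are assumptions, as are all function names and sample values.

```python
import math
from collections import Counter


def _entropy(symbols):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def _bin(values, n_bins):
    """Discretize continuous values into equal-width bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def normalized_mutual_information(predicted, measured, n_bins=4):
    """NMI = I(X; Y) / sqrt(H(X) * H(Y)) on binned data (assumed variant)."""
    x, y = _bin(predicted, n_bins), _bin(measured, n_bins)
    h_x, h_y = _entropy(x), _entropy(y)
    if h_x == 0 or h_y == 0:
        return 0.0  # a constant series shares no information
    mutual_info = h_x + h_y - _entropy(list(zip(x, y)))
    return mutual_info / math.sqrt(h_x * h_y)


def needs_update(predicted, measured, threshold=0.2):
    """True when the NMI falls below the threshold, i.e. retrain the model."""
    return normalized_mutual_information(predicted, measured) < threshold


# Made-up supply air temperatures in degrees Celsius.
measured = [20.0, 22.0, 24.0, 26.0]
tracking_model = [10.0, 12.0, 14.0, 16.0]  # offset, but moves with the data
stale_model = [15.0, 15.0, 15.0, 15.0]     # constant output
print(needs_update(tracking_model, measured), needs_update(stale_model, measured))
```

    Note that NMI measures statistical dependence rather than error: the first toy model is consistently offset from the measurements yet shares all of their information, so a check like this complements an error metric such as MBE rather than replacing it.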

Natural language processing techniques for bioinformatics

  • Tsujii, Jun-ichi
    • Proceedings of the Korean Society for Bioinformatics Conference / 2003.10a / pp.3-3 / 2003
  • With biomedical literature expanding so rapidly, there is an urgent need to discover and organize knowledge extracted from texts. Although factual databases contain crucial information, the overwhelming amount of new knowledge remains in textual form (e.g. MEDLINE). In addition, new terms are constantly coined as the relationships linking new genes, drugs, proteins, etc. As the size of biomedical literature is expanding, more systems are applying a variety of methods to automate the process of knowledge acquisition and management. In my talk, I focus on GENIA, the project of our group at the University of Tokyo, the objective of which is to construct an information extraction system for protein-protein interaction from MEDLINE abstracts. The talk includes (1) techniques we use for named entity recognition: (1-a) SOHMM (Self-organized HMM), (1-b) Maximum Entropy Model, (1-c) Lexicon-based Recognizer; (2) treatment of term variants and acronym finders; (3) event extraction using a full parser; and (4) linguistic resources for text mining (GENIA corpus): (4-a) Semantic Tags, (4-b) Structural Annotations, (4-c) Co-reference tags, (4-d) GENIA ontology. I will also talk about a possible extension of our work that links the findings of molecular biology with clinical findings, and claim that text-based or concept-based biology would be a viable alternative to systems biology, which tends to emphasize the role of simulation models in bioinformatics.
