• Title/Summary/Keyword: Threshold Models


Semi-supervised learning for sentiment analysis in mass social media (대용량 소셜 미디어 감성분석을 위한 반감독 학습 기법)

  • Hong, Sola; Chung, Yeounoh; Lee, Jee-Hyong
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.5 / pp.482-488 / 2014
  • This paper aims to analyze users' emotions automatically by analyzing Twitter, a representative social network service (SNS). Creating sentiment analysis models with machine learning techniques requires sentiment labels that represent positive/negative emotions. However, obtaining sentiment labels for tweets is very expensive. In this paper, we therefore propose a sentiment analysis model that uses a self-training technique to utilize "data without sentiment labels" as well as "data with sentiment labels". In self-training, the labels of "data without sentiment labels" are determined using "data with sentiment labels", and the model is then updated using the "data with sentiment labels" together with the newly labeled data. This technique improves sentiment analysis performance gradually. However, because the labels of unlabeled data never change once they are determined, misclassifications of unlabeled data at an early stage affect model updating throughout the whole learning process. Thus, the labels of "data without sentiment labels" need to be determined carefully. To obtain high performance with self-training, we propose three policies for updating the "data with sentiment labels" and conduct a comparative analysis. The first policy selects, among the newly labeled data, those whose confidence is higher than a given threshold. The second policy chooses equal numbers of positive and negative data from the newly labeled data to avoid the imbalanced class learning problem. The third policy chooses no more than a given maximum number of newly labeled data, avoiding updates with a large amount of data at once so that the model is updated gradually. Experiments were conducted on the Stanford data set, whose data are classified as positive or negative.
As a result, the learned model performs better than models learned using "data with sentiment labels" only and than self-training with a regular model update policy.
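The three update policies can be read as a single selection step inside each self-training round. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the function name `select_pseudo_labels`, the 0.9 confidence threshold, and the cap of 100 items per round are hypothetical, and the classifier that produces P(positive) is omitted.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    probs: Sequence[float],    # predicted P(positive) for each unlabeled tweet
    threshold: float = 0.9,    # policy 1: minimum confidence (hypothetical value)
    balance: bool = True,      # policy 2: equal numbers of positive and negative
    max_per_round: int = 100,  # policy 3: cap on additions per round (hypothetical value)
) -> List[Tuple[int, int]]:
    """Return sorted (index, label) pairs to move into the labeled set this round."""
    pos: List[Tuple[float, int]] = []
    neg: List[Tuple[float, int]] = []
    for i, p in enumerate(probs):
        confidence = max(p, 1.0 - p)
        if confidence < threshold:  # policy 1: drop low-confidence predictions
            continue
        (pos if p >= 0.5 else neg).append((confidence, i))
    pos.sort(reverse=True)  # most confident candidates first
    neg.sort(reverse=True)
    if balance:
        # policies 2 and 3: same count per class, total at most max_per_round
        k = min(len(pos), len(neg), max_per_round // 2)
        chosen = [(i, 1) for _, i in pos[:k]] + [(i, 0) for _, i in neg[:k]]
    else:
        # policy 3 only: the most confident predictions regardless of class
        merged = sorted([(c, i, 1) for c, i in pos] + [(c, i, 0) for c, i in neg], reverse=True)
        chosen = [(i, label) for _, i, label in merged[:max_per_round]]
    return sorted(chosen)


print(select_pseudo_labels([0.95, 0.2, 0.99, 0.03, 0.6, 0.92, 0.91]))  # [(2, 1), (3, 0)]
```

In a full self-training loop, the selected tweets would be moved from the unlabeled pool into the labeled set and the classifier retrained before the next round. Because selected labels are never revisited, the threshold and the cap bound how far an early mistake can propagate.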

Temperature-dependent Development Model of Paromius exiguus (Distant) (Heteroptera: Lygaeidae) (흑다리긴노린재[Paromius exiguus (Distant)] 온도발육 모형)

  • Park, Chang-Gyu; Park, Hong-Hyun; Uhm, Ki-Baik; Lee, Joon-Ho
    • Korean journal of applied entomology / v.49 no.4 / pp.305-312 / 2010
  • The developmental time of immature stages of Paromius exiguus (Distant) was investigated at nine constant temperatures (15, 17.5, 20, 22.5, 25, 27.5, 30, 32.5, and $35{\pm}1^{\circ}C$), 20-30% RH, and a photoperiod of 14:10 h (L:D). Eggs did not develop at $15^{\circ}C$, and egg developmental time decreased with increasing temperature: it was longest at $17.5^{\circ}C$ (28.2 days) and shortest at $35^{\circ}C$ (5.9 days). First-instar nymphs failed to reach the next nymphal stage at 17.5 and $35^{\circ}C$. Nymphal developmental time decreased with increasing temperature between $20^{\circ}C$ and $32.5^{\circ}C$, and the developmental rate decreased at temperatures above $30^{\circ}C$ in all stages except the fourth nymphal stage. The relationship between developmental rate and temperature was fitted with a linear model and three nonlinear models (Briere 1, Lactin 2, and Logan 6). The lower threshold temperatures of the egg and total nymphal stages were $13.8^{\circ}C$ and $15.3^{\circ}C$, respectively. The thermal constants required to complete the egg and total nymphal stages were 109.9 and 312.5 DD, respectively. Among the three nonlinear models, the Logan 6 model fitted best ($r^2$=0.94-0.99). The distribution of completion of each developmental stage was well described by the 3-parameter Weibull function ($r^2$=0.91-0.99).
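Under the linear model, the developmental rate $r = 1/D$ (with $D$ the developmental time in days) is regressed on temperature as $r = a + bT$; the lower threshold is the temperature at which the rate reaches zero, $T_0 = -a/b$, and the thermal constant is $K = 1/b$ degree-days. A minimal sketch, assuming ordinary least squares over the linearly responding temperature range; `linear_threshold_and_constant` is a hypothetical name, and the check data are synthetic values generated from the reported egg-stage figures, not the measured development times.

```python
from typing import Sequence, Tuple


def linear_threshold_and_constant(temps: Sequence[float], days: Sequence[float]) -> Tuple[float, float]:
    """Fit rate = a + b*T by least squares; return (lower threshold T0, thermal constant K)."""
    rates = [1.0 / d for d in days]  # developmental rate = 1 / developmental time
    n = len(temps)
    mean_t = sum(temps) / n
    mean_r = sum(rates) / n
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(temps, rates))
    sxx = sum((t - mean_t) ** 2 for t in temps)
    b = sxy / sxx
    a = mean_r - b * mean_t
    return -a / b, 1.0 / b  # T0: temperature where rate = 0; K: degree-days to complete


# Synthetic check (not measured data): days = K / (T - T0) with the reported egg values.
temps = [20.0, 22.5, 25.0, 27.5, 30.0]
days = [109.9 / (t - 13.8) for t in temps]
t0, k = linear_threshold_and_constant(temps, days)
print(round(t0, 1), round(k, 1))  # 13.8 109.9
```

With real data the points scatter around the line, and temperatures near or above the optimum (where the rate falls off) would typically be excluded before fitting.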

Temperature-dependent Development Model and Forecasting of Adult Emergence of Overwintered Small Brown Planthopper, Laodelphax striatellus Fallen, Population (애멸구 온도 발육 모델과 월동 개체군의 성충 발생 예측)

  • Park, Chang-Gyu; Park, Hong-Hyun; Kim, Kwang-Ho
    • Korean journal of applied entomology / v.50 no.4 / pp.343-352 / 2011
  • The developmental period of Laodelphax striatellus Fallen, a vector of rice stripe virus (RSV), was investigated at ten constant temperatures from 12.5 to $35{\pm}1^{\circ}C$, 30 to 40% RH, and a photoperiod of 14:10 (L:D) h. Eggs developed successfully at every temperature tested, and their developmental time decreased as temperature increased. Egg development was fastest at $35^{\circ}C$ (5.8 days) and slowest at $12.5^{\circ}C$ (44.5 days). Nymphs could not develop to the adult stage at 32.5 or $35^{\circ}C$. The mean total developmental times of the nymphal stages at 12.5, 15, 17.5, 20, 22.5, 25, 27.5, and $30^{\circ}C$ were 132.7, 55.9, 37.7, 26.9, 20.2, 15.8, 14.9, and 17.4 days, respectively. One linear model and four nonlinear models (Briere 1, Lactin 2, Logan 6, and Poikilotherm rate) were used to determine the response of developmental rate to temperature. The lower threshold temperatures of the egg and total nymphal stages of L. striatellus were $10.2^{\circ}C$ and $10.7^{\circ}C$, respectively. The thermal constants (degree-days) for eggs and nymphs were 122.0 and 238.1 DD, respectively. Among the four nonlinear models, the Poikilotherm rate model had the best fit for all developmental stages ($r^2$=0.98~0.99). The distribution of completion of each developmental stage was well described by the two-parameter Weibull function ($r^2$=0.84~0.94). The emergence rate of L. striatellus adults was predicted with DYMEX® under the assumptions that the physiological age of overwintered nymphs was 0.2 and that the Poikilotherm rate model described temperature-dependent development. This scenario yielded higher predictability than the other conditions.
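The reported lower threshold and thermal constant support a simple degree-day forecast: a stage is complete once the accumulated heat above $T_0$ reaches $K$. The sketch below assumes the daily-mean method with no upper cutoff and uses the reported egg values for L. striatellus ($T_0$ = 10.2°C, $K$ = 122.0 DD). It is a simplification, not the DYMEX® / Poikilotherm rate simulation used in the study, and `days_to_complete` is a hypothetical name.

```python
from typing import Optional, Sequence


def days_to_complete(
    daily_means: Sequence[float],
    t0: float = 10.2,  # reported lower threshold for L. striatellus eggs (deg C)
    k: float = 122.0,  # reported thermal constant for eggs (degree-days)
) -> Optional[int]:
    """Return the 1-based day on which accumulated degree-days first reach k, else None."""
    total = 0.0
    for day, t in enumerate(daily_means, start=1):
        total += max(0.0, t - t0)  # only heat above the lower threshold accumulates
        if total >= k:
            return day
    return None


print(days_to_complete([20.0] * 30))  # 13: 9.8 DD/day, 12 days give 117.6 DD, 13 give 127.4 DD
```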

An Integrated Model based on Genetic Algorithms for Implementing Cost-Effective Intelligent Intrusion Detection Systems (비용효율적 지능형 침입탐지시스템 구현을 위한 유전자 알고리즘 기반 통합 모형)

  • Lee, Hyeon-Uk; Kim, Ji-Hun; Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.18 no.1 / pp.125-141 / 2012
  • These days, malicious attacks and hacks on networked systems are increasing dramatically, and their patterns are changing rapidly. Consequently, handling these attacks and hacks appropriately has become more important, and there is considerable interest in, and demand for, effective network security systems such as intrusion detection systems. Intrusion detection systems are network security systems for detecting, identifying, and responding appropriately to unauthorized or abnormal activities. Conventional intrusion detection systems have generally been designed using experts' implicit knowledge of network intrusions or hackers' abnormal behaviors. However, although they perform very well under normal situations, they cannot handle new or unknown patterns of network attacks. As a result, recent studies on intrusion detection systems use artificial intelligence techniques, which can proactively respond to unknown threats. For a long time, researchers have adopted and tested various artificial intelligence techniques, such as artificial neural networks, decision trees, and support vector machines, to detect network intrusions. However, most of them have applied these techniques individually, even though combining the techniques may lead to better detection. For this reason, we propose a new integrated model for intrusion detection. Our model combines the prediction results of four different binary classification models that may complement each other: logistic regression (LOGIT), decision trees (DT), artificial neural networks (ANN), and support vector machines (SVM). Genetic algorithms (GA) are used as a tool for finding the optimal combining weights. The proposed model is built in two steps. In the first step, the optimal integration model, whose prediction error (i.e., misclassification rate) is the lowest, is generated.
After that, in the second step, the model explores the optimal classification threshold for determining intrusions, which minimizes the total misclassification cost. To calculate the total misclassification cost of an intrusion detection system, we need to understand its asymmetric error cost scheme. Generally, there are two common forms of errors in intrusion detection. The first error type is the False-Positive Error (FPE), in which normal activity is wrongly judged as an intrusion; this may result in unnecessary fixes. The second error type is the False-Negative Error (FNE), in which malware is misjudged as normal. An FNE is more fatal than an FPE, so the total misclassification cost is affected more by FNE than by FPE. To validate the practical applicability of our model, we applied it to a real-world dataset for network intrusion detection. The experimental dataset was collected from the IDS sensor of an official institution in Korea from January to June 2010. We collected 15,000 log records in total and selected 10,000 samples from them by random sampling. We also compared the results of our model with those of the single techniques to confirm the superiority of the proposed model. LOGIT and DT were run using PASW Statistics v18.0, and ANN was run using Neuroshell R4.0. For SVM, LIBSVM v2.90, a freeware library for training SVM classifiers, was used. Empirical results showed that the proposed GA-based model outperformed all the other comparative models in detecting network intrusions, both from the accuracy perspective and from the total misclassification cost perspective. Consequently, our study is expected to contribute to building cost-effective intelligent intrusion detection systems.
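The two steps can be illustrated with a weighted combination of the four models' outputs followed by a scan for the cut-off with the lowest total cost. A minimal sketch under stated assumptions: the GA search for the weights is not shown (the weights are simply passed in), the function names are hypothetical, and the default 10:1 cost ratio between a missed intrusion and a false alarm is illustrative only, since the abstract gives no cost values.

```python
from typing import List, Sequence, Tuple


def combine(prob_rows: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of the LOGIT, DT, ANN and SVM intrusion probabilities per record."""
    total_w = sum(weights)
    return [sum(w * p for w, p in zip(weights, row)) / total_w for row in prob_rows]


def best_threshold(
    scores: Sequence[float],
    labels: Sequence[int],  # 1 = intrusion, 0 = normal
    cost_fn: float = 10.0,  # hypothetical cost of a false negative (missed intrusion)
    cost_fp: float = 1.0,   # hypothetical cost of a false positive (false alarm)
) -> Tuple[float, float]:
    """Return (cut-off, total cost); a record is flagged as an intrusion if score >= cut-off."""
    candidates = sorted(set(scores)) + [float("inf")]  # inf means "flag nothing"
    best = (candidates[0], float("inf"))
    for thr in candidates:
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < thr)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= thr)
        cost = cost_fn * fn + cost_fp * fp
        if cost < best[1]:  # strict "<" keeps the lowest cut-off among ties
            best = (thr, cost)
    return best


scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]
print(best_threshold(scores, labels))                            # (0.4, 1.0)
print(best_threshold(scores, labels, cost_fn=1.0, cost_fp=5.0))  # (0.8, 1.0)
```

The two calls show why the asymmetric cost scheme matters: making misses expensive pushes the cut-off down so that fewer intrusions slip through, while making false alarms expensive pushes it up.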

Growth and Fresh Bulb Weight Model in Harvest Time of Southern Type Garlic Var. 'Namdo' based on Temperature (온도에 따른 난지형 마늘 '남도'의 생육과 수확기 구생체중 모델 개발)

  • Wi, Seung Hwan; Moon, Kyung Hwan; Song, Eun Young; Son, In Chang; Oh, Soon Ja; Cho, Young Yeol
    • Journal of Bio-Environment Control / v.26 no.1 / pp.13-18 / 2017
  • This study was conducted to investigate the optimal temperature for garlic and to develop a model of bulb weight at harvest time. Day/night temperatures in the chambers were set to $11/7^{\circ}C$, $14/10^{\circ}C$, $17/12^{\circ}C$, $20/15^{\circ}C$, $23/18^{\circ}C$, and $28/23^{\circ}C$ (16/8 h). Bulb fresh and dry weights were heaviest at $20/15^{\circ}C$. At $11/7^{\circ}C$ and $14/10^{\circ}C$, leaf number and total leaf area increased slowly, but at harvest the differences in leaf number and total leaf area were not significant, except at $28/23^{\circ}C$. Models were developed for fresh bulb weight. Analysis of the models identified $18{\sim}20^{\circ}C$ as the optimal mean temperature. The base temperature for growing degree days was estimated at $7.1^{\circ}C$, and the upper temperature threshold at $31.7^{\circ}C$. To verify the model, mean temperatures in a temperature gradient tunnel were applied to the growth rate model. The linear function, quadratic, and logistic distribution models showed accuracies of 79.0~95.0%, 77.2~92.3%, and 85.0~95.8%, respectively. The logistic distribution model had the highest accuracy and was good at explaining the optimal temperature, the growing degree day base temperature, and the upper temperature threshold.
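The estimated base temperature (7.1°C) and upper threshold (31.7°C) can be used to accumulate growing degree days from daily mean temperatures. The abstract does not state how the upper threshold enters the calculation, so this minimal sketch assumes one common convention, capping the daily mean at the upper threshold; the function names are hypothetical.

```python
from typing import Iterable


def daily_gdd(t_mean: float, t_base: float = 7.1, t_upper: float = 31.7) -> float:
    """Degree-days for one day: mean temperature capped at t_upper, minus t_base, floored at 0."""
    return max(0.0, min(t_mean, t_upper) - t_base)


def accumulated_gdd(daily_means: Iterable[float], t_base: float = 7.1, t_upper: float = 31.7) -> float:
    """Sum of daily degree-days over a growing season."""
    return sum(daily_gdd(t, t_base, t_upper) for t in daily_means)


print(round(accumulated_gdd([5.0, 17.1, 35.0]), 1))  # 0 + 10.0 + 24.6 = 34.6
```

Another convention treats days above the upper threshold as contributing nothing at all; which one matches the fitted logistic distribution model would need to be checked against the full paper.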

Three-dimensional analysis of soft and hard tissue changes after mandibular setback surgery in skeletal Class III patients (골격성 3급 부정교합 환자의 하악골 후퇴술 시행후 안모변화에 대한 3차원적 연구)

  • Park, Jae-Woo; Kim, Nam-Kug; Kim, Myung-Jin; Chang, Young-Il
    • The korean journal of orthodontics / v.35 no.4 s.111 / pp.320-329 / 2005
  • The three-dimensional (3D) changes of bone and soft tissue, and the ratio of soft tissue to bony movement, were investigated in 8 skeletal Class III patients treated by mandibular setback surgery. Pre- and post-operative CT scans were taken of each patient. Each scan was segmented by a threshold value and registered to a universal three-dimensional coordinate system consisting of the FH plane, the mid-sagittal plane, and a coronal plane defined by PNS. A grid parallel to the coronal plane was proposed for comparing the changes: a line projected from each grid point intersected the bone or soft tissue surface, and the coordinate values of the intersection points were measured and compared between the pre- and post-operative models. The facial surface changes after setback surgery occurred not only in the mandible but also in the mouth corner region. The soft tissue changes in the mandibular area were expressed as ratios relative to the bone changes. The ratios at the mid-sagittal plane were $77{\sim}102\%$ (p<0.05). The ratios at all other sagittal planes followed patterns similar to the mid-sagittal plane but with smaller values. The changes in the maxillary region were calculated as ratios relative to the movement of a point representing mandibular movement. When B point was used as the representative point, the ratios were $14{\sim}29\%$, and when Pog was used, they were $17{\sim}37\%$ (p<0.05). For the 83rd point of the grid, the ratios were $11{\sim}22\%$ (p<0.05).

Application of Hydro-Cartographic Generalization on Buildings for 2-Dimensional Inundation Analysis (2차원 침수해석을 위한 수리학적 건물 일반화 기법의 적용)

  • PARK, In-Hyeok; JIN, Gi-Ho; JEON, Ka-Young; HA, Sung-Ryong
    • Journal of the Korean Association of Geographic Information Studies / v.18 no.2 / pp.1-15 / 2015
  • Urban flooding has threatened human beings and facilities with chemical and physical hazards since the beginning of human civilization. Recent studies have emphasized the integration of data and models for effective urban flood inundation modeling. However, the model set-up process tends to be time-consuming and to require a high level of data processing skill. Furthermore, even when high-resolution grid data are used, inundation depth and velocity vary with the building treatment method in 2-D inundation models, because undesirable grid cells are generated, which lowers the reliability of the simulation results. Thus, a building generalization process, or enhancement of building orthogonality, is required to minimize building distortion before building footprints are converted into grid data. This study aims to develop a building generalization method for 2-dimensional inundation analysis that enhances model reliability, and to investigate the effect of building generalization on urban inundation from both geographical and hydraulic engineering perspectives. The results indicate that, to improve the reliability of 2-dimensional inundation analysis in urban areas, the building generalization method developed in this study should be applied using the Digital Building Model (DBM) before model implementation. The proposed building generalization sequence was aggregation followed by simplification, and the threshold for each method should be determined by considering spatial characteristics and should not exceed the sum of the average building gap and its standard deviation.
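The threshold rule for the aggregation step can be sketched as follows: compute the cap (mean building gap plus its standard deviation), choose a threshold at or below it, and group buildings whose gap is under the threshold. This is a minimal sketch under loud assumptions: buildings are reduced to hypothetical precomputed pairwise gaps rather than footprint polygons, the sample standard deviation is assumed, grouping uses union-find, and the polygon merging and simplification steps are not shown.

```python
import statistics
from typing import Dict, List, Tuple

Gaps = Dict[Tuple[int, int], float]  # hypothetical: (building i, building j) -> gap distance


def gap_threshold_cap(gaps: Gaps) -> float:
    """Upper bound for the aggregation threshold: mean gap + sample standard deviation."""
    values = list(gaps.values())
    return statistics.mean(values) + statistics.stdev(values)


def aggregate(n: int, gaps: Gaps, threshold: float) -> List[List[int]]:
    """Group buildings 0..n-1 connected by gaps below threshold (union-find)."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for (i, j), g in gaps.items():
        if g < threshold:
            parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for b in range(n):
        groups.setdefault(find(b), []).append(b)
    return sorted(groups.values())


gaps = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 9.0, (3, 4): 2.0}  # hypothetical gaps in metres
cap = gap_threshold_cap(gaps)        # 3.5 + about 3.70, i.e. about 7.20
threshold = min(5.0, cap)            # a site-specific choice, kept at or below the cap
print(aggregate(5, gaps, threshold)) # [[0, 1, 2], [3, 4]]
```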

The Flood Water Stage Prediction based on Neural Networks Method in Stream Gauge Station (하천수위표지점에서 신경망기법을 이용한 홍수위의 예측)

  • Kim, Seong-Won; Salas, Jose-D.
    • Journal of Korea Water Resources Association / v.33 no.2 / pp.247-262 / 2000
  • In this paper, the WSANN (Water Stage Analysis with Neural Network) model is presented to predict the flood water stage at Jindong, the major stream gauging station in the Nakdong River basin. The WSANN model uses an improved backpropagation training algorithm complemented by the momentum method, improved initial conditions, and an adaptive learning rate, and the data used in this study were divided into training and testing data sets. An empirical equation relating the number of hidden layer nodes to the threshold iteration number was derived to determine the optimal number of hidden layer nodes. The WSANN model was then calibrated using four training data sets. As a result of calibration, the WSANN22 and WSANN32 models were selected as the optimal models for model verification. Model verification was carried out to evaluate model fitness with two untrained testing data sets, and statistical analysis of the results showed that flood water stages were reasonably predicted. Further research is needed on real-time warning of impending floods and on controlling flood water stage with neural network methods in river basins.
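The abstract names the momentum method and an adaptive learning rate but does not give the exact update rules. The following minimal sketch assumes the classic momentum term together with a "bold driver" style rate adaptation (grow the rate after an accepted step, halve it and reset momentum after a step that raises the error), applied to a single weight on a toy quadratic error rather than to a full network; all names and constants are hypothetical.

```python
from typing import Callable, Tuple


def train_momentum_adaptive(
    loss: Callable[[float], float],
    grad: Callable[[float], float],
    w: float,
    lr: float = 0.1,        # initial learning rate (hypothetical)
    momentum: float = 0.9,  # fraction of the previous update carried forward
    grow: float = 1.05,     # rate multiplier after an accepted step
    shrink: float = 0.5,    # rate multiplier after a rejected step
    epochs: int = 200,
) -> Tuple[float, float]:
    """Return (final weight, final learning rate)."""
    velocity = 0.0
    err = loss(w)
    for _ in range(epochs):
        step = momentum * velocity - lr * grad(w)  # momentum reuses the previous update
        new_w = w + step
        new_err = loss(new_w)
        if new_err <= err:  # error did not rise: accept the step and speed up
            w, err, velocity = new_w, new_err, step
            lr *= grow
        else:               # error rose: reject the step, slow down, reset momentum
            lr *= shrink
            velocity = 0.0
    return w, lr


w, lr = train_momentum_adaptive(lambda w: (w - 3.0) ** 2, lambda w: 2.0 * (w - 3.0), w=0.0)
print(w)  # close to the minimum at 3.0
```

Because rejected steps are discarded, the training error never increases under this scheme; in a real network the same rule would be applied to every weight, with the gradient supplied by backpropagation.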


Immersive Visualization of Casting Solidification by Mapping Geometric Model to Reconstructed Model of Numerical Simulation Result (주물 응고 수치해석 복원모델의 설계모델 매핑을 통한 몰입형 가시화)

  • Park, Ji-Young; Suh, Ji-Hyun; Kim, Sung-Hee; Rhee, Seon-Min; Kim, Myoung-Hee
    • The KIPS Transactions:PartA / v.15A no.3 / pp.141-149 / 2008
  • In this research, we present a novel method that combines and visualizes the design model and the FDM-based simulation result of solidification. Moreover, we employ VR displays and visualize stereoscopic images to provide an effective analysis environment. First, we reconstruct the solidification simulation result into a rectangular mesh model using conventional simulation software. The color of each point of the reconstructed model represents the temperature value at its position. Next, we map the two models by finding, for each point of the design model, the nearest point of the reconstructed model, and then assigning that point's color to the design-model point. Before this mapping, we apply mesh subdivision, because the design model consists of a minimal number of points, which makes its point distribution nonuniform compared with the reconstructed model. In this process, the original shape is preserved by adding points to mesh edges whose length exceeds a predefined threshold value. The implemented system visualizes the solidification simulation data on the design model, which allows the user to understand the object geometry precisely. The immersive and realistic working environment built with VR displays can help the user discover defects faster and more effectively.
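The subdivision rule and the nearest-point color mapping can be sketched in a few lines. This is a simplification under stated assumptions: `subdivide_edge` splits a long edge into equal pieces no longer than the threshold instead of remeshing the surrounding triangles, `map_colors` uses a brute-force nearest-neighbour search (a spatial index would be faster for large meshes), and both names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def subdivide_edge(a: Point, b: Point, max_len: float) -> List[Point]:
    """Points along edge a->b (endpoints included) so that no piece is longer than max_len."""
    pieces = max(1, math.ceil(math.dist(a, b) / max_len))
    return [tuple(a[k] + (b[k] - a[k]) * i / pieces for k in range(3)) for i in range(pieces + 1)]


def map_colors(design: Sequence[Point], recon: Sequence[Point], values: Sequence[float]) -> List[float]:
    """Give each design-model point the value (e.g. temperature) of its nearest reconstructed point."""
    out = []
    for p in design:
        nearest = min(range(len(recon)), key=lambda i: math.dist(p, recon[i]))
        out.append(values[nearest])
    return out


print(subdivide_edge((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.3))  # 5 points spaced 0.25 apart
print(map_colors([(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
                 [(0.1, 0.0, 0.0), (0.9, 1.0, 1.0)], [20.0, 650.0]))  # [20.0, 650.0]
```

Edges already shorter than the threshold keep only their two endpoints, which is how the original shape is left untouched where the design mesh is dense enough.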

TAR and M-TAR Error Correction Models for Asymmetric Gasoline Price in Korea (TAR와 M-TAR 오차수정모형을 이용한 국내 휘발유가격의 비대칭성 분석)

  • Lee, Yang Seob
    • Environmental and Resource Economics Review / v.17 no.4 / pp.813-843 / 2008
  • This paper investigates the presence of long-run and short-run price asymmetries in weekly gasoline prices from January 1997 to July 2008. Following the distribution channels, the wholesale and retail stages are analyzed separately. An approach based on TAR and M-TAR cointegration tests, with matching asymmetric ECMs, is employed. For wholesale prices, asymmetries in the links with crude oil prices and exchange rates are found in both the long-run and short-run ECMs. Exchange rates appear to play a more significant role than crude oil prices in explaining the short-run price asymmetry. A rise in crude oil prices or exchange rates has a statistically significant major impact on wholesale price increases in the second week, rather than immediately as the concept of 'rockets and feathers' would suggest. Asymmetrically, a fall has no statistically significant effect over the same period. This finding seems somewhat unusual. For retail prices, however, asymmetry in connection with wholesale prices is revealed only in the long run; symmetric price adjustment can be assumed in the short run. Contrary to the long-run asymmetry found in the wholesale stage, in the retail stage the speed of adjustment toward long-run equilibrium is faster for negative deviations than for positive ones, a phenomenon that is not favorable to consumers.
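The threshold adjustment behind TAR and M-TAR tests, in the style of Enders and Siklos, regresses the change in the long-run residual $\mu_t$ on its lagged level, with separate speeds $\rho_1$ and $\rho_2$ above and below a threshold $\tau$: the indicator uses $\mu_{t-1}$ for TAR and $\Delta\mu_{t-1}$ for M-TAR. A minimal sketch, assuming $\tau = 0$, no lagged-difference augmentation, and no tests of $\rho_1 = \rho_2$; the residuals here are simulated with known speeds, not the gasoline price data, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def threshold_adjustment(resid: Sequence[float], tau: float = 0.0, momentum: bool = False) -> Tuple[float, float]:
    """OLS estimates of (rho1, rho2) in
        d_mu[t] = I[t] * rho1 * mu[t-1] + (1 - I[t]) * rho2 * mu[t-1] + e[t],
    where I[t] = 1 if mu[t-1] >= tau (TAR) or d_mu[t-1] >= tau (M-TAR), else 0.
    The two regressors never overlap, so each rho is a separate no-intercept OLS slope."""
    s_xx = [0.0, 0.0]
    s_xy = [0.0, 0.0]
    for t in range(2 if momentum else 1, len(resid)):
        x = resid[t - 1]
        y = resid[t] - resid[t - 1]
        signal = resid[t - 1] - resid[t - 2] if momentum else resid[t - 1]
        regime = 0 if signal >= tau else 1  # regime 0 -> rho1, regime 1 -> rho2
        s_xx[regime] += x * x
        s_xy[regime] += x * y
    return s_xy[0] / s_xx[0], s_xy[1] / s_xx[1]


def simulate(n: int, rho1: float, rho2: float, momentum: bool = False, seed: int = 0) -> List[float]:
    """Synthetic residual series with known adjustment speeds (threshold fixed at 0)."""
    rng = random.Random(seed)
    mu = [0.0, 0.0]
    for _ in range(n):
        signal = mu[-1] - mu[-2] if momentum else mu[-1]
        rho = rho1 if signal >= 0.0 else rho2
        mu.append(mu[-1] + rho * mu[-1] + rng.gauss(0.0, 1.0))
    return mu


print(threshold_adjustment(simulate(20000, -0.2, -0.5)))  # estimates close to (-0.2, -0.5)
```

A more negative $\rho$ means faster adjustment toward equilibrium, so a retail-stage finding like the one above would correspond to a larger $|\rho|$ in the regime for negative deviations.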
