• Title/Summary/Keyword: 확률 모델 (probability model)

Anomaly Detection for User Action with Generative Adversarial Networks (적대적 생성 모델을 활용한 사용자 행위 이상 탐지 방법)

  • Choi, Nam woong;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.43-62
    • /
    • 2019
  • In the past, anomaly detection was dominated by methods that judged abnormality from statistics derived from specific data. This worked because data dimensions were simple, so classical statistical methods were effective. In the era of big data, however, data have become more complex, and it has become harder to accurately analyze and predict industrial data with conventional methods. Supervised learning algorithms based on SVMs and decision trees were therefore adopted. However, a supervised model predicts test data accurately only when the abnormal class has roughly as many samples as the normal class, whereas most data generated in industry are class-imbalanced, so the predictions of supervised models are not always valid. To overcome this drawback, many studies now use unsupervised models that are not influenced by class distribution, such as autoencoders and generative adversarial networks (GANs). In this paper, we propose a method to detect anomalies using GANs. AnoGAN, introduced by Schlegl et al. (2017), detects anomalies in medical images and is built from convolutional neural networks. By contrast, GAN-based anomaly detection for sequence data has been studied far less than for image data. Li et al. (2018) proposed a model based on LSTM, a type of recurrent neural network, to detect anomalies in numerical sequence data, but it has not been applied to categorical sequence data, nor does it use the feature matching method of Salimans et al. (2016).
This suggests that much remains to be explored in the anomaly classification of sequence data with GANs. To learn sequence data, both networks of the proposed GAN are built from LSTMs: the generator is a 2-layer stacked LSTM with hidden layers of 32 and 64 units, and the discriminator is an LSTM with a 64-unit hidden layer. Whereas existing work on anomaly detection for sequence data derives anomaly scores from the entropy of the probabilities of the actual data, this paper derives them with feature matching, as mentioned above. In addition, the latent-variable optimization step was designed with an LSTM to improve model performance. The modified GAN achieved higher precision than the autoencoder in all experiments and approximately 7% higher accuracy. The GAN was also more robust than the autoencoder: because it learns the data distribution from real categorical sequence data, it is not affected by a single normal data point, whereas the autoencoder is. In the robustness test, the accuracy of the autoencoder was 92% and that of the GAN was 96%; the sensitivity of the autoencoder was 40% and that of the GAN was 51%. Experiments were also conducted to measure how performance changes with the structure used to optimize the latent variables; sensitivity improved by about 1%. These results offer a new perspective on latent-variable optimization, which has received relatively little attention.
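
The feature-matching anomaly score described in this abstract can be illustrated with a toy sketch in the spirit of AnoGAN: search for a latent vector z whose generated sample G(z) best matches the input, then score the input by a weighted sum of a residual loss and a feature-matching loss. This is not the authors' model. It assumes a hypothetical fixed linear generator G(z) = W z and a hypothetical feature map f(x) = tanh(V x) standing in for the discriminator's intermediate layer, uses squared-error losses with an assumed weight lam = 0.1, runs plain gradient descent on z instead of the paper's LSTM-based latent optimizer, and works on continuous vectors rather than categorical sequences.

```python
import numpy as np

def anomaly_score(x, W, V, lam=0.1, steps=1000, lr=0.05, seed=0):
    """Toy AnoGAN-style score: (1 - lam) * residual + lam * feature-matching loss.

    W and V are hypothetical fixed weights: G(z) = W @ z plays the generator and
    f(x) = tanh(V @ x) plays the discriminator's intermediate feature layer.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=W.shape[1])      # random initial latent vector
    fx = np.tanh(V @ x)                  # features of the real input
    for _ in range(steps):
        g = W @ z                        # generated sample G(z)
        fg = np.tanh(V @ g)              # features of G(z)
        # Gradient of the combined loss with respect to g (chain rule through tanh).
        d_res = -2.0 * (x - g)
        d_feat = V.T @ (-2.0 * (fx - fg) * (1.0 - fg ** 2))
        grad_g = (1 - lam) * d_res + lam * d_feat
        z -= lr * (W.T @ grad_g)         # update only the latent vector
    g = W @ z
    residual = np.sum((x - g) ** 2)
    feature = np.sum((fx - np.tanh(V @ g)) ** 2)
    return (1 - lam) * residual + lam * feature

# Usage with random hypothetical weights.
rng = np.random.default_rng(1)
W = rng.normal(size=(8, 4)) / np.sqrt(8)
V = rng.normal(size=(6, 8)) / np.sqrt(8)
x_normal = W @ rng.normal(size=4)                    # lies in the generator's range
off = rng.normal(size=8)
off -= W @ np.linalg.lstsq(W, off, rcond=None)[0]    # keep only the out-of-range part
x_anomal = x_normal + 5.0 * off / np.linalg.norm(off)
score_normal = anomaly_score(x_normal, W, V)
score_anomal = anomaly_score(x_anomal, W, V)
print(score_normal, score_anomal)
```

An input that lies off the generator's range keeps a large residual however z is tuned, so its score stays high; that is the property a reconstruction-plus-feature-matching score relies on.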

Predicting the Potential Habitat and Future Distribution of Brachydiplax chalybea flavovittata Ris, 1911 (Odonata: Libellulidae) (기후변화에 따른 남색이마잠자리 잠재적 서식지 및 미래 분포예측)

  • Soon Jik Kwon;Yung Chul Jun;Hyeok Yeong Kwon;In Chul Hwang;Chang Su Lee;Tae Geun Kim
    • Journal of Wetlands Research
    • /
    • v.25 no.4
    • /
    • pp.335-344
    • /
    • 2023
  • Brachydiplax chalybea flavovittata, a climate-sensitive biological indicator species, was first recorded in Korea on Jeju Island in 2010, and overwintering was recently confirmed in the Yeongsan River area. This study aimed to predict the potential distribution of B. chalybea flavovittata larvae and to understand the species' ecological characteristics and population changes under global climate change. Data were collected from the Global Biodiversity Information Facility (GBIF) and from field surveys conducted from May 2019 to May 2023. Variables selected from the 19 bioclimatic variables downloaded from the WorldClim database were used for the distribution model, and a MaxEnt model was used to predict the potential and future distribution of B. chalybea flavovittata. Larvae were found between latitudes 33.318096°N (Jeju-si, Jeju Special Self-Governing Province) and 37.366734°N (Yeoju-si, Gyeonggi-do) and between longitudes 126.054925°E (Jindo-gun, Jeollanam-do) and 129.016472°E (Yangsan-si, Gyeongsangnam-do). Under Ramsar's wetland classification system, M-type wetlands (permanent rivers, streams, and creeks) were the most common habitat, followed by Tp type (permanent freshwater marshes and pools, 45.8%) and F type (estuarine waters, 4.2%). The MaxEnt model indicated that areas with a high probability of occurrence include Ulsan and Daegu Metropolitan City in addition to the currently known habitats. Under the future scenarios of the Intergovernmental Panel on Climate Change (IPCC), the potential distribution was predicted to expand in the 2050s and 2090s to cover the southern and western coastal regions, the southern Daegu metropolitan area, and the eastern coastal regions. This study suggests that B. chalybea flavovittata can serve as an effective indicator species for climate change when its distribution range is monitored.
Our findings also provide basic information for the conservation and management of co-existing native species.
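
The MaxEnt step can be pictured as fitting a Gibbs distribution over background grid cells whose expected environmental features match the feature means observed at presence cells. The sketch below is a minimal illustration under stated assumptions, not the MaxEnt software used in the study: it uses linear features only, an L2 penalty instead of MaxEnt's L1 regularization, plain gradient ascent, and a hypothetical single standardized covariate in the usage example.

```python
import numpy as np

def gibbs(features, lam):
    """Relative suitability of each background cell, p(cell) ∝ exp(features @ lam)."""
    logits = features @ lam
    logits -= logits.max()                            # numerical stability
    w = np.exp(logits)
    return w / w.sum()

def fit_maxent(features, presence_idx, l2=0.01, steps=3000, lr=0.2):
    """Gradient ascent on the L2-penalized log-likelihood of the presence cells."""
    features = np.asarray(features, float)
    target = features[presence_idx].mean(axis=0)      # empirical feature means at presences
    lam = np.zeros(features.shape[1])
    for _ in range(steps):
        p = gibbs(features, lam)
        grad = target - p @ features - l2 * lam       # observed minus expected features
        lam += lr * grad
    return lam

# Usage: one hypothetical standardized covariate (e.g. winter temperature) on 101 cells.
x = np.linspace(-2.0, 2.0, 101)
features = x[:, None]
presence_idx = np.where(x > 1.0)[0]                   # hypothetical warm-cell presences
lam = fit_maxent(features, presence_idx)
suit = gibbs(features, lam)
print(lam, suit[0], suit[-1])
```

At convergence the model's expected covariate approximately equals the mean at presence cells, which is the maximum-entropy constraint; projecting under a climate scenario amounts to re-evaluating `gibbs` with future covariate values.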

Test of Independence Between Variables to Estimate the Frequency of Damage in Heat Pipe (열수송관 파손빈도 추정을 위한 변수간 독립성 검정)

  • Myeongsik Kong;Jaemo Kang;Sungyeol Lee
    • Journal of the Korean GEO-environmental Society
    • /
    • v.24 no.12
    • /
    • pp.61-67
    • /
    • 2023
  • Heat pipes located underground in urban areas and operated under high temperature and pressure can cause large-scale human and economic damage if they fail. To predict damage in advance, damage and construction records of heat pipes are analyzed to derive independent variables correlated with the frequency of damage, and a modified simple regression model using each variable is applied in the field. However, as the correlation among the model's independent variables increases, their independence is compromised and the reliability of the model decreases. In this study, the independence of pipe diameter, burial depth, monitoring-system insulation level, and detection-line disconnection or short circuit, which are judged to be interrelated, was tested in order to derive a method for combining variables and setting the categories needed for the damage frequency estimation model. For the test of independence, the continuous variables pipe diameter and burial depth were each converted into three categories, the monitoring-system insulation level was converted into two categories, and the categorical variable detection-line status (disconnection or short circuit) was kept at two categories. The p-values for pipe diameter versus burial depth and for monitoring-system insulation level versus detection-line disconnection or short circuit were below the significance level (α = 0.05), indicating that each pair of variables is not independent. Therefore, pipe diameter and burial depth were combined into one variable with 9 categories (3 × 3), based on the previously set categories. The monitoring-system insulation level and the detection-line status were also combined into one variable.
Since the insulation level is unreliable when the detection line is disconnected or short-circuited, the combined variable was given 3 categories.
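
The test and the variable combination described above can be sketched as a Pearson chi-square test of independence on a contingency table, followed by simple category-merging rules. The counts in the usage line are hypothetical, the test omits the Yates continuity correction, and the encodings (0..8 for the 3 × 3 combination, 2 for a faulty detection line) are illustrative assumptions rather than the authors' coding.

```python
# Chi-square critical values at alpha = 0.05 for df = 1..4 (standard table values).
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df

def is_dependent(table):
    """Reject independence at alpha = 0.05 (only tables with df <= 4)."""
    stat, df = chi_square_independence(table)
    return stat > CHI2_CRIT_05[df]

def combine_diameter_depth(diameter_cat, depth_cat):
    """3 diameter x 3 depth categories (each 0..2) -> one 9-category variable (0..8)."""
    return 3 * diameter_cat + depth_cat

def combine_insulation_line(insulation_cat, line_faulty):
    """Insulation level (0 or 1) is unreliable when the detection line is
    disconnected or short-circuited, so that case becomes its own category (2)."""
    return 2 if line_faulty else insulation_cat

# Usage with hypothetical counts (rows: diameter class, columns: depth class).
counts = [[30, 5, 5], [5, 30, 5], [5, 5, 30]]
print(chi_square_independence(counts), is_dependent(counts))
```

A small p-value only says the two variables are associated, not how strongly; merging them into one categorical variable removes the dependence from the model's inputs either way.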

Software Reliability Growth Modeling in the Testing Phase with an Outlier Stage (하나의 이상구간을 가지는 테스팅 단계에서의 소프트웨어 신뢰도 성장 모형화)

  • Park, Man-Gon;Jung, Eun-Yi
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.10
    • /
    • pp.2575-2583
    • /
    • 1998
  • The production of highly reliable software systems and their performance evaluation have become important interests in the software industry. Software evaluation has mainly been carried out in terms of both the reliability and the performance of software systems. Software reliability is the probability that no software error occurs during a fixed time interval in the software testing phase. Theoretical software reliability models are sometimes unsuitable for the practical testing phase, in which a software error at a certain testing stage can occur because of imperfect debugging, abnormal software correction, and so on. Such a testing stage needs to be considered an outlying stage, and we can assume that software reliability does not improve in this outlying stage because of a nuisance factor. In this paper, we discuss Bayesian software reliability growth modeling and an estimation procedure in the presence of an unidentified outlying software testing stage, based on a modification of the Jelinski-Moranda model. We also derive the Bayes estimators of the software reliability parameters under the squared-error loss function, assuming prior information. In addition, we evaluate the proposed software reliability growth model with an unidentified outlying stage in an exchangeable model according to the values of the nuisance parameter, using accuracy, bias, trend, and noise metrics as quantitative evaluation criteria through computer simulation.
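
For reference, the baseline Jelinski-Moranda model assumes N initial faults and a failure rate φ(N - i + 1) before the i-th failure. If N is treated as known and φ is given a Gamma(a, b) prior, the posterior is Gamma(a + n, b + Σ (N - i + 1) t_i), and the Bayes estimator under squared-error loss is the posterior mean. The sketch below covers only this unmodified baseline; it does not model the paper's unidentified outlying stage or nuisance parameter, and the prior values a = b = 1 are hypothetical.

```python
import math

def jm_bayes_phi(times, n_faults, a=1.0, b=1.0):
    """Posterior mean of phi in the Jelinski-Moranda model (N assumed known).

    times[i] is the i-th inter-failure time (0-based); during it n_faults - i
    faults remain, so the failure rate is phi * (n_faults - i).
    The prior on phi is Gamma(shape=a, rate=b); a and b are hypothetical.
    """
    if n_faults < len(times):
        raise ValueError("n_faults must be at least the number of observed failures")
    exposure = sum((n_faults - i) * t for i, t in enumerate(times))
    return (a + len(times)) / (b + exposure)   # mean of Gamma(a + n, b + exposure)

def jm_reliability(t, phi, n_faults, n_observed):
    """Probability of no failure during the next t time units."""
    return math.exp(-phi * (n_faults - n_observed) * t)

phi_hat = jm_bayes_phi([1.0, 1.0, 1.0], n_faults=5)
print(phi_hat, jm_reliability(1.0, phi_hat, n_faults=5, n_observed=3))
```

The posterior mean is used because it minimizes expected squared-error loss; an outlying stage, as the abstract describes it, would break the assumption that each repair lowers the rate by the same φ.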

Setting Criteria of Suitable Site for Southern-type Garlic Using Non-linear Regression Model (비선형회귀 분석을 통한 난지형 마늘의 적지기준 설정연구)

  • Choi, Won Jun;Kim, Yong Seok;Shim, Kyo Moon;Hur, Jina;Jo, Sera;Kang, Mingu
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.23 no.4
    • /
    • pp.366-373
    • /
    • 2021
  • This study attempted to establish field-data-based criteria for suitable sites by analyzing field observation data for southern-type garlic, which are nonlinear. Five regions (Goheung, Namhae, Sinan, Changnyeong, and Haenam) were selected for analysis. Values for each observation station were extracted from regional farmland temperature data using inverse distance weighting (IDW). Southern-type garlic production and temperature data were collected for 10 years, from 2010 to 2019. Kernel local regression was performed on these data, and the resulting optimum growth temperature depended on the bandwidth: 18.781℃ (0.8), 18.930℃ (0.9), 19.542℃ (1.0), 20.165℃ (1.1), and 21.042℃ (1.2). These optimum temperatures, together with the growth temperature limits (4℃/25℃), were applied in a temperature response model to derive growth temperature values for each case. Regression and correlation analyses were then performed between the derived growth temperature values and production data. The coefficient of determination (R²) ranged from 0.325 to 0.438, and the correlation coefficients ranged from 0.57 to 0.66 at the 0.001 significance level. Overall, the coefficient of determination increased with bandwidth. However, in all analyses except bandwidth 1.0, not all data could be used because of bias. Since the purpose of this study was to accommodate all data through nonlinear modeling, bandwidth 1.0, which has a high coefficient of determination while accommodating all data in the model, was judged the most suitable.
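
The two analysis steps above, kernel local regression to locate an optimum temperature and a temperature response model with 4℃/25℃ limits, can be sketched as follows. The abstract names neither the kernel nor the response function, so this sketch assumes a Gaussian Nadaraya-Watson (local-constant) smoother and a beta-type response curve that is 0 at the limits and 1 at the optimum (defaulting to the bandwidth-1.0 optimum of 19.542℃); the data in the usage lines are synthetic.

```python
import numpy as np

def kernel_regression(x_train, y_train, x_eval, bandwidth):
    """Nadaraya-Watson (local-constant) regression with a Gaussian kernel."""
    x_train = np.asarray(x_train, float)
    y_train = np.asarray(y_train, float)
    x_eval = np.atleast_1d(np.asarray(x_eval, float))
    w = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / bandwidth) ** 2)
    return (w @ y_train) / w.sum(axis=1)

def optimum_temperature(temps, production, bandwidth, grid):
    """Temperature on `grid` where the smoothed production curve peaks."""
    fitted = kernel_regression(temps, production, grid, bandwidth)
    return grid[int(np.argmax(fitted))]

def beta_response(t, t_min=4.0, t_opt=19.542, t_max=25.0):
    """Assumed beta-type response: 0 at t_min and t_max, 1 at t_opt."""
    t = np.asarray(t, float)
    tc = np.clip(t, t_min, t_max)
    alpha = (t_opt - t_min) / (t_max - t_opt)
    r = ((t_max - tc) / (t_max - t_opt)) * ((tc - t_min) / (t_opt - t_min)) ** alpha
    return np.where((t > t_min) & (t < t_max), r, 0.0)

# Usage with synthetic (hypothetical) data peaking near 20 degrees C.
temps = np.linspace(10.0, 28.0, 50)
production = -(temps - 20.0) ** 2
grid = np.linspace(10.0, 28.0, 181)
print(optimum_temperature(temps, production, 1.0, grid), beta_response([4.0, 19.542, 25.0]))
```

A larger bandwidth averages over a wider temperature window, which is why the estimated optimum shifts with bandwidth as in the values reported above.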

The prediction of the stock price movement after IPO using machine learning and text analysis based on TF-IDF (증권신고서의 TF-IDF 텍스트 분석과 기계학습을 이용한 공모주의 상장 이후 주가 등락 예측)

  • Yang, Suyeon;Lee, Chaerok;Won, Jonggwan;Hong, Taeho
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.237-262
    • /
    • 2022
  • There has been a growing interest in IPOs (Initial Public Offerings) due to the profitable returns that IPO stocks can offer to investors. However, IPOs can be speculative investments that may involve substantial risk as well because shares tend to be volatile, and the supply of IPO shares is often highly limited. Therefore, it is crucially important that IPO investors are well informed of the issuing firms and the market before deciding whether to invest or not. Unlike institutional investors, individual investors are at a disadvantage since there are few opportunities for individuals to obtain information on the IPOs. In this regard, the purpose of this study is to provide individual investors with the information they may consider when making an IPO investment decision. This study presents a model that uses machine learning and text analysis to predict whether an IPO stock price would move up or down after the first 5 trading days. Our sample includes 691 Korean IPOs from June 2009 to December 2020. The input variables for the prediction are three tone variables created from IPO prospectuses and quantitative variables that are either firm-specific, issue-specific, or market-specific. The three prospectus tone variables indicate the percentage of positive, neutral, and negative sentences in a prospectus, respectively. We considered only the sentences in the Risk Factors section of a prospectus for the tone analysis in this study. All sentences were classified into 'positive', 'neutral', and 'negative' via text analysis using TF-IDF (Term Frequency - Inverse Document Frequency). Measuring the tone of each sentence was conducted by machine learning instead of a lexicon-based approach due to the lack of sentiment dictionaries suitable for Korean text analysis in the context of finance. 
For this reason, the training set was created by randomly selecting 10% of the sentences from each prospectus, and each training sentence was read and labeled manually. Then, based on the training set, a Support Vector Machine model was used to predict the tone of the sentences in the test set, and the percentages of positive, neutral, and negative sentences in each prospectus were calculated from these predictions. To predict the price movement of an IPO stock, four machine learning techniques were applied: Logistic Regression, Random Forest, Support Vector Machine, and Artificial Neural Network. According to the results, models that use both quantitative variables (including technical-analysis variables) and prospectus tone variables are more accurate than models that use quantitative variables only. Specifically, prediction accuracy improved by 1.45 percentage points in the Random Forest model, 4.34 percentage points in the Artificial Neural Network model, and 5.07 percentage points in the Support Vector Machine model. Among these techniques, the Artificial Neural Network model using both quantitative and prospectus tone variables achieved the highest prediction accuracy, 61.59%. The results indicate that the tone of a prospectus is a significant factor in predicting the price movement of an IPO stock. In addition, the McNemar test was used to verify whether the differences between models were statistically significant. Comparing the model using only quantitative variables with the model using both quantitative and prospectus tone variables confirmed that predictive performance improved significantly at the 1% significance level.
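
Three pieces of the pipeline above can be sketched without the classifier: TF-IDF sentence vectors, the per-prospectus tone percentages, and the McNemar statistic used to compare two models. The SVM tone classifier is omitted. Whitespace tokenization is a placeholder assumption (Korean text would normally go through a morphological analyzer), the TF-IDF variant (tf = count / length, idf = log(N / df)) is one common choice and not necessarily the authors', and the McNemar statistic uses a continuity correction.

```python
import math
from collections import Counter

def tfidf(sentences):
    """TF-IDF weights per sentence with tf = count / length and idf = log(N / df)."""
    docs = [s.split() for s in sentences]          # placeholder tokenizer
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    return [{term: (c / len(doc)) * idf[term] for term, c in Counter(doc).items()}
            for doc in docs]

def tone_percentages(labels):
    """Share (%) of positive, neutral and negative sentences in one prospectus."""
    counts = Counter(labels)
    return {tone: 100.0 * counts[tone] / len(labels)
            for tone in ("positive", "neutral", "negative")}

CHI2_CRIT_01_DF1 = 6.635  # chi-square critical value, alpha = 0.01, df = 1

def mcnemar(b, c):
    """Continuity-corrected McNemar statistic. b: cases only model A classifies
    correctly; c: cases only model B classifies correctly (b + c must be > 0)."""
    return max(abs(b - c) - 1, 0) ** 2 / (b + c)

vectors = tfidf(["risk may rise", "risk may fall"])
print(vectors[0], tone_percentages(["positive", "negative", "negative", "neutral"]))
print(mcnemar(20, 5) > CHI2_CRIT_01_DF1)
```

McNemar looks only at the discordant cases (where exactly one model is right), which suits comparing two classifiers evaluated on the same IPO test set.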

Probability-based Pre-fetching Method for Multi-level Abstracted Data in Web GIS (웹 지리정보시스템에서 다단계 추상화 데이터의 확률기반 프리페칭 기법)

  • 황병연;박연원;김유성
    • Spatial Information Research
    • /
    • v.11 no.3
    • /
    • pp.261-274
    • /
    • 2003
  • In Web GISs (Geographical Information Systems), an effective probability-based tile pre-fetching algorithm and a collaborative cache replacement algorithm can reduce the response time for user requests by transferring tiles that will be used in advance and by choosing, based on future access probabilities, which tiles to remove from a client's limited cache space. Web GISs hold multi-level abstracted data to respond quickly to zoom-in and zoom-out queries. However, the previous pre-fetching algorithm applies only to a two-dimensional pre-fetching space and does not consider the expanded pre-fetching space needed for multi-level abstracted data in Web GISs. In this thesis, a probability-based pre-fetching algorithm for multi-level abstracted data in Web GISs is proposed. It expands the previous two-dimensional pre-fetching space into a three-dimensional one so that tiles of upper or lower levels can be pre-fetched. We evaluated the proposed pre-fetching algorithm by simulation; the experimental results show that the response time for user requests improved by 1.8% to 21.6% on average. Consequently, in Web GISs with multi-level abstracted data, the proposed pre-fetching algorithm and the collaborative cache replacement algorithm can substantially reduce the response time for user requests.
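
The three-dimensional pre-fetching space described above (pan within a level, zoom in to a lower level, zoom out to an upper level) can be sketched as follows. The quadtree tiling (2^level tiles per axis), the move-frequency probability estimate with additive smoothing, and all function names are hypothetical illustrations, not the thesis's algorithm.

```python
from collections import Counter

def candidate_moves(tile, max_level):
    """Tiles reachable in one step: pan to 8 neighbours, zoom in to 4 children,
    zoom out to the parent (hypothetical quadtree, 2**level tiles per axis)."""
    x, y, level = tile
    size = 2 ** level
    moves = {}
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if (dx, dy) != (0, 0) and 0 <= x + dx < size and 0 <= y + dy < size:
                moves[("pan", dx, dy)] = (x + dx, y + dy, level)
    if level < max_level:
        for i in (0, 1):
            for j in (0, 1):
                moves[("in", i, j)] = (2 * x + i, 2 * y + j, level + 1)
    if level > 0:
        moves[("out",)] = (x // 2, y // 2, level - 1)
    return moves

def access_probabilities(tile, history, max_level, alpha=1.0):
    """Next-access probability of each candidate tile, estimated from the user's
    past move types with additive smoothing alpha."""
    moves = candidate_moves(tile, max_level)
    counts = Counter(history)
    weights = {move: counts[move] + alpha for move in moves}
    total = sum(weights.values())
    return {moves[move]: w / total for move, w in weights.items()}

def prefetch(tile, history, cache, k, max_level):
    """Up to k most probable candidate tiles that are not already cached."""
    probs = access_probabilities(tile, history, max_level)
    ranked = sorted(probs, key=probs.get, reverse=True)
    return [t for t in ranked if t not in cache][:k]

def evict(cache, probs, capacity):
    """Keep the `capacity` cached tiles with the highest estimated probability."""
    return set(sorted(cache, key=lambda t: probs.get(t, 0.0), reverse=True)[:capacity])

history = [("in", 0, 0)] * 10 + [("pan", 1, 0)] * 3
tile = (1, 1, 2)
print(prefetch(tile, history, cache=set(), k=2, max_level=5))
```

Using the same probability estimate for both pre-fetching and eviction keeps the two policies consistent: a tile worth fetching early is also one worth keeping.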

Improvement and Validation of Convective Rainfall Rate Retrieved from Visible and Infrared Image Bands of the COMS Satellite (COMS 위성의 가시 및 적외 영상 채널로부터 복원된 대류운의 강우강도 향상과 검증)

  • Moon, Yun Seob;Lee, Kangyeol
    • Journal of the Korean earth science society
    • /
    • v.37 no.7
    • /
    • pp.420-433
    • /
    • 2016
  • The purpose of this study is to improve the calibration matrices of 2-D and 3-D convective rainfall rates (CRR) using the brightness temperature of the infrared 10.8 μm channel (IR), the brightness temperature difference between the infrared 10.8 μm and water vapor 6.7 μm channels (IR-WV), and the normalized reflectance of the visible channel (VIS) from the COMS satellite, together with weather radar rainfall rates, for 75 rainy days from April 22, 2011 to October 22, 2011 in Korea. In particular, weather radar rainfall rates are used to validate the new 2-D and 3-D CRR calibration matrices suited to the Korean peninsula for 24 rainy days in 2011. The 2-D and 3-D calibration matrices provide the basic and maximum CRR values (mm h⁻¹) by multiplying the rain probability matrix by the mean and maximum rainfall rate matrices, respectively. The rain probability matrix is calculated from the numbers of rainy and non-rainy pixels in the associated 2-D (IR, IR-WV) and 3-D (IR, IR-WV, VIS) matrices; the mean and maximum rainfall rate matrices are calculated, respectively, by dividing the accumulated rainfall rate by the number of rainy pixels and from the product of the maximum rain rate for the calibration period and the number of rain occurrences. Finally, the new 2-D and 3-D CRR calibration matrices are obtained experimentally from regression analysis of both the basic and maximum rainfall rate matrices. As a result, compared with the existing matrices, the new ones enlarge the area with rainfall rates above 10 mm/h and place CRR in lower class ranges of the matrix spanned by IR brightness temperature and IR-WV brightness temperature difference. Accuracy and categorical statistics are computed for the CRR events that occurred during the study period.
The mean error (ME), mean absolute error (MAE), and root mean square error (RMSE) of the new 2-D and 3-D CRR calibrations were smaller than those of the existing ones; the false alarm ratio decreased, the probability of detection increased slightly, and critical success index scores improved. To account for strong rainfall rates in weather events such as thunderstorms and typhoons, a moisture correction factor is applied. This factor is defined as the product of the total precipitable water and the relative humidity (PW × RH), averaged between the surface and the 500 hPa level, obtained from a numerical model or from COMS retrieval data. In this study, when the IR cloud-top brightness temperature is lower than 210 K and the relative humidity is greater than 40%, the moisture correction factor is empirically scaled from 1.0 to 2.0 based on PW × RH values. Consequently, when this factor is applied to the new 2-D and 3-D CRR calibrations, the ME, MAE, and RMSE are smaller than those of the new calibrations without it.
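
The basic CRR construction and the verification scores above can be sketched for the 2-D (IR, IR-WV) case: bin pixels into a matrix, compute the rain probability and mean rain rate per cell, and multiply them. The bin edges and pixel values in the usage lines are hypothetical, the maximum-rate matrix and the moisture correction factor are not reproduced, and the POD, FAR, and CSI formulas are the standard contingency-table definitions.

```python
import numpy as np

def calibration_matrices(ir, ir_wv, rain, ir_edges, diff_edges):
    """2-D (IR, IR-WV) rain probability, mean rain rate, and basic CRR matrices."""
    ir, ir_wv, rain = (np.asarray(a, float) for a in (ir, ir_wv, rain))
    i = np.digitize(ir, ir_edges) - 1            # IR brightness-temperature class
    j = np.digitize(ir_wv, diff_edges) - 1       # IR-WV difference class
    shape = (len(ir_edges) - 1, len(diff_edges) - 1)
    ok = (i >= 0) & (i < shape[0]) & (j >= 0) & (j < shape[1])
    rainy = ok & (rain > 0)
    n_all, n_rain, acc = np.zeros(shape), np.zeros(shape), np.zeros(shape)
    np.add.at(n_all, (i[ok], j[ok]), 1.0)                    # all pixels per cell
    np.add.at(n_rain, (i[rainy], j[rainy]), 1.0)             # rainy pixels per cell
    np.add.at(acc, (i[rainy], j[rainy]), rain[rainy])        # accumulated rain rate
    prob = np.divide(n_rain, n_all, out=np.zeros(shape), where=n_all > 0)
    mean_rate = np.divide(acc, n_rain, out=np.zeros(shape), where=n_rain > 0)
    return prob, mean_rate, prob * mean_rate     # basic CRR = probability x mean rate

def error_metrics(pred, obs):
    d = np.asarray(pred, float) - np.asarray(obs, float)
    return {"ME": d.mean(), "MAE": np.abs(d).mean(), "RMSE": np.sqrt((d ** 2).mean())}

def categorical_scores(hits, misses, false_alarms):
    return {"POD": hits / (hits + misses),
            "FAR": false_alarms / (hits + false_alarms),
            "CSI": hits / (hits + misses + false_alarms)}

# Usage with hypothetical pixels: IR in K, IR-WV in K, radar rain rate in mm/h.
prob, mean_rate, basic = calibration_matrices(
    [200, 200, 200, 250], [1, 1, 1, 5], [10, 0, 20, 0], [190, 220, 260], [0, 3, 6])
print(basic, error_metrics([1, 3], [2, 2]), categorical_scores(6, 2, 2))
```

Cold, high cloud tops with a small IR-WV difference fall in the low-IR, low-difference cells, so those cells accumulate the highest rain probability and mean rate.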

A Management Plan According to the Estimation of Nutria (Myocastor coypus) Distribution Density and Potential Suitable Habitat (뉴트리아(Myocastor coypus) 분포밀도 및 잠재적 서식가능지역 예측에 따른 관리방향)

  • Kim, Areum;Kim, Young-Chae;Lee, Do-Hun
    • Journal of Environmental Impact Assessment
    • /
    • v.27 no.2
    • /
    • pp.203-214
    • /
    • 2018
  • The purpose of this study is to estimate the areas of concentrated distribution and the potential suitable habitat of nutria (Myocastor coypus) and to provide useful data for setting effective management directions. Based on nationwide nutria distribution data, a cross-validation value was applied to analyze the distribution density. As a result, areas of concentrated distribution requiring priority removal were found in 14 administrative areas: Busan Metropolitan City, Daegu Metropolitan City, 11 cities and counties in Gyeongsangnam-do, and 1 county in Gyeongsangbuk-do. In estimating potential suitable habitat with a MaxEnt (Maximum Entropy) model, possible occurrence was predicted in the middle and lower Nakdong River, the lower Seomjin River, and the Gahwacheon River area. The variables contributing most to the model were, in descending order, DEM (elevation), precipitation of the driest month, minimum temperature of the coldest month, and distance from rivers. The probability of occurrence exceeded the threshold in areas with an altitude below 34 m, a minimum temperature of the coldest month of -5.7℃ to -0.6℃, precipitation of the driest month of 15 to 30 mm, and a distance of less than 1,373 m from rivers. Considering these results and the physiological and ecological characteristics of nutria, altitude, the presence of water, and winter temperature affect the settlement and expansion of nutria, so they need to be reflected as important variables in future modeling of habitable areas and expansion. To control invasive alien species such as nutria permanently, it is essential to distinguish areas of concentrated distribution from management areas and to establish and apply a suitable management strategy for each site.
The results of this study can support strategic management, such as rapid control in priority management areas and preemptive, preventive management in areas where the species may spread.
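
The abstract states only that a cross-validation value was applied to analyze distribution density. One common reading is kernel density estimation with the bandwidth chosen by cross-validation; the sketch below assumes exactly that, with an isotropic 2-D Gaussian kernel, leave-one-out likelihood cross-validation, and hypothetical point coordinates. It is an illustration of the technique, not the study's procedure.

```python
import numpy as np

def _gaussian_kernel_matrix(a, b, h):
    """Isotropic 2-D Gaussian kernel values between rows of a and rows of b."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * h * h)) / (2.0 * np.pi * h * h)

def loo_log_likelihood(points, h):
    """Leave-one-out log-likelihood of the points under a KDE with bandwidth h."""
    k = _gaussian_kernel_matrix(points, points, h)
    np.fill_diagonal(k, 0.0)                      # leave each point out of its own estimate
    dens = k.sum(axis=1) / (len(points) - 1)
    return float(np.sum(np.log(dens + 1e-300)))   # tiny offset avoids log(0)

def select_bandwidth(points, candidates):
    """Candidate bandwidth with the highest leave-one-out log-likelihood."""
    scores = [loo_log_likelihood(points, h) for h in candidates]
    return candidates[int(np.argmax(scores))]

def kde(grid, points, h):
    """Density estimate at each grid location."""
    return _gaussian_kernel_matrix(grid, points, h).mean(axis=1)

rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),    # hypothetical dense cluster
                    rng.normal(8.0, 1.0, size=(10, 2))])   # hypothetical sparse cluster
h = select_bandwidth(points, [0.1, 0.3, 1.0, 3.0])
dens = kde(np.array([[0.0, 0.0], [8.0, 8.0], [20.0, 20.0]]), points, h)
print(h, dens)
```

Cells whose estimated density exceeds a chosen cutoff would then mark concentrated-distribution areas for priority removal.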

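A Study on Coordinate Transformation of the Railway Centerline Alignment Using Railway Control Points: Focusing on the Planned Route of the Honam High-Speed Railway (철도기준점을 이용한 철도중심선형 좌표변환에 관한연구 - 호남고속철도 계획노선을 중심으로 -)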

  • Moon, Cheung-Kyun;Heo, Joon;Kang, Sang-Du;Kim, Sang-Hoon
    • Proceedings of the KSR Conference
    • /
    • 2007.11a
    • /
    • pp.1141-1151
    • /
    • 2007
  • Using the planned Honam high-speed railway, which runs along a north-south axis, this paper verifies the feasibility of coordinate conversion with railway control points, treating the currently planned railway as the linear central axis. The analysis showed distortions of 21 cm to 40 cm along the Y axis, decreasing along gentle straight sections, and 14 cm to 29 cm along the X axis. After correction, the deviations between coordinates were 6 mm to 9 mm, which satisfies both the allowable error of the National Geographic Information Institute, which follows the ITRF (International Terrestrial Reference Frame), and the cadastral boundary survey tolerance (10 cm). Consequently, coordinate conversion is possible using railway control points as common control points.
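
The abstract does not say which transformation model was fitted to the common control points. A common choice for this kind of conversion is a 2-D four-parameter similarity (Helmert) transformation estimated by least squares, so the sketch below assumes that model, uses hypothetical control-point coordinates in metres, and checks the residual deviations against the 10 cm tolerance cited above.

```python
import numpy as np

def fit_helmert_2d(src, dst):
    """Least-squares 4-parameter similarity (Helmert) transform from src to dst:
    X = a*x - b*y + tx,  Y = b*x + a*y + ty."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    x, y = src[:, 0], src[:, 1]
    one, zero = np.ones_like(x), np.zeros_like(x)
    A = np.vstack([np.column_stack([x, -y, one, zero]),    # equations for X
                   np.column_stack([y, x, zero, one])])    # equations for Y
    rhs = np.concatenate([dst[:, 0], dst[:, 1]])
    params, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return params                                          # [a, b, tx, ty]

def apply_helmert_2d(params, pts):
    a, b, tx, ty = params
    pts = np.asarray(pts, float)
    return np.column_stack([a * pts[:, 0] - b * pts[:, 1] + tx,
                            b * pts[:, 0] + a * pts[:, 1] + ty])

def deviations(params, src, dst):
    """Planimetric distance between transformed src points and dst points (metres)."""
    return np.linalg.norm(apply_helmert_2d(params, src) - np.asarray(dst, float), axis=1)

# Hypothetical control points (metres) and a known transform to recover.
src = np.array([[0.0, 0.0], [1000.0, 0.0], [0.0, 1000.0], [1000.0, 1000.0], [500.0, 300.0]])
true_params = np.array([1.00001 * np.cos(0.001), 1.00001 * np.sin(0.001), 100.0, -50.0])
dst = apply_helmert_2d(true_params, src)
params = fit_helmert_2d(src, dst)
dev = deviations(params, src, dst)
print(params, dev.max(), bool(np.all(dev <= 0.10)))   # 0.10 m = 10 cm tolerance
```

With real survey data the residuals would not vanish; they are what would be compared against the 10 cm tolerance.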