• Title/Summary/Keyword: Probability Distribution Model (확률분포모델)


Effective Harmony Search-Based Optimization of Cost-Sensitive Boosting for Improving the Performance of Cross-Project Defect Prediction (교차 프로젝트 결함 예측 성능 향상을 위한 효과적인 하모니 검색 기반 비용 민감 부스팅 최적화)

  • Ryu, Duksan;Baik, Jongmoon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.7 no.3
    • /
    • pp.77-90
    • /
    • 2018
  • Software Defect Prediction (SDP) is a field of study that identifies defective modules. With insufficient local data, a company can exploit Cross-Project Defect Prediction (CPDP), a way to build a classifier using datasets collected from other companies. Most machine learning algorithms for SDP use one or more parameters whose values significantly affect prediction performance. The objective of this study is to propose a parameter selection technique to enhance the performance of CPDP. Using the Harmony Search (HS) algorithm, our approach tunes the parameters of cost-sensitive boosting, a method to tackle the class imbalance that makes prediction difficult. Based on distributional characteristics, parameter ranges and constraint rules between parameters are defined and applied to HS. The proposed approach is compared with three CPDP methods and a Within-Project Defect Prediction (WPDP) method over fifteen target projects. The experimental results indicate that the proposed model outperforms the other CPDP methods under class imbalance. Unlike previous studies, which showed a high probability of false alarm (PF) or a low probability of detection (PD), our approach provides acceptably high PD and low PF while maintaining high overall performance. It also performs comparably to WPDP.
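
The parameter search described above can be sketched with a bare-bones Harmony Search loop. The toy objective below merely stands in for "train cost-sensitive boosting with these parameters and score the CPDP model"; all settings (HMS, HMCR, PAR, bandwidth) and the two-parameter search space are illustrative, not the paper's tuned values.

```python
import random

def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3,
                   bw=0.05, iters=200, seed=0):
    """Minimise `objective` over the box `bounds` with a basic Harmony Search.

    hms  : harmony memory size
    hmcr : harmony memory considering rate
    par  : pitch adjusting rate
    bw   : pitch-adjustment bandwidth (fraction of each parameter's range)
    """
    rng = random.Random(seed)
    # Initialise harmony memory with random solutions.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]
    for _ in range(iters):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:                # reuse a memorised pitch
                x = memory[rng.randrange(hms)][d]
                if rng.random() < par:             # small pitch adjustment
                    x += rng.uniform(-1, 1) * bw * (hi - lo)
            else:                                  # brand-new random pitch
                x = rng.uniform(lo, hi)
            new.append(min(max(x, lo), hi))        # clamp to bounds
        s = objective(new)
        worst = max(range(hms), key=scores.__getitem__)
        if s < scores[worst]:                      # replace the worst harmony
            memory[worst], scores[worst] = new, s
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]

# Toy stand-in for "evaluate CPDP performance at these boosting parameters":
# best at cost_ratio = 5.0 and a second (scaled) parameter of 0.5.
loss = lambda p: (p[0] - 5.0) ** 2 + (p[1] - 0.5) ** 2
params, score = harmony_search(loss, [(1.0, 10.0), (0.0, 1.0)])
```

In the paper's setting, the objective would instead train and evaluate the cost-sensitive boosting model, and the constraint rules between parameters would be enforced when a new harmony is generated.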

Characteristics of the Graded Wildlife Dose Assessment Code K-BIOTA and Its Application (단계적 야생동식물 선량평가 코드 K-BIOTA의 특성 및 적용)

  • Keum, Dong-Kwon;Jun, In;Lim, Kwang-Muk;Kim, Byeong-Ho;Choi, Yong-Ho
    • Journal of Radiation Protection and Research
    • /
    • v.40 no.4
    • /
    • pp.252-260
    • /
    • 2015
  • This paper describes the technical background of the Korean wildlife radiation dose assessment code, K-BIOTA, and summarizes its application. The K-BIOTA applies a graded approach with three levels: screening assessment (Levels 1 & 2) and detailed assessment based on site-specific data (Level 3). The screening-level assessment is a preliminary step to determine whether the detailed assessment is needed, and calculates the dose rate for grouped organisms rather than for an individual biota. In the Level 1 assessment, the risk quotient (RQ) is calculated by comparing the actual media concentration with the environmental media concentration limit (EMCL) derived from a benchmark screening reference dose rate. If the RQ for the Level 1 assessment is less than 1, it can be concluded that the ecosystem maintains its integrity, and the assessment is terminated. If the RQ is greater than 1, the Level 2 assessment, which recalculates the RQ using the average values of the concentration ratio (CR) and equilibrium distribution coefficient (Kd) for the grouped organisms, is carried out for a more realistic result; the Level 2 assessment is thus less conservative than Level 1. If the RQ for the Level 2 assessment is less than 1, the ecosystem is again judged to maintain its integrity, and the assessment is terminated. If the RQ is greater than 1, the Level 3 assessment is performed, in which the radiation dose for the representative organism of a site is calculated using site-specific data on occupancy factor, CR, and Kd. In addition, in the Level 3 assessment the K-BIOTA optionally allows uncertainty analysis of the dose rate with respect to CR, Kd, and environmental medium concentration among the input parameters. Four probability density functions can be applied: normal, lognormal, uniform, and exponential. The applicability of the code was tested through participation in the IAEA EMRAS II (Environmental Modelling for Radiation Safety) model intercomparison study, and as a result the K-BIOTA proved very useful for assessing the radiation risk of wildlife living in various contaminated environments.
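
The graded Level 1/2/3 logic lends itself to a compact sketch. The function below is a plain restatement of the RQ decision flow described in the abstract, not K-BIOTA code; the concentrations and EMCL values in the usage example are invented for illustration.

```python
def graded_assessment(media_conc, emcl_screen, emcl_average):
    """Sketch of K-BIOTA's graded logic: RQ = media concentration / EMCL.

    Level 1 uses a conservative screening EMCL; if RQ >= 1, Level 2
    recomputes RQ with an EMCL based on average CR/Kd values; if RQ is
    still >= 1, the site-specific Level 3 dose calculation is required.
    """
    rq1 = media_conc / emcl_screen
    if rq1 < 1:
        return "Level 1: ecosystem integrity maintained", rq1
    rq2 = media_conc / emcl_average
    if rq2 < 1:
        return "Level 2: ecosystem integrity maintained", rq2
    return "Level 3: detailed site-specific assessment required", rq2

# Illustrative runs (units and magnitudes are placeholders):
label_a, rq_a = graded_assessment(0.5, 1.0, 2.0)   # passes at Level 1
label_b, rq_b = graded_assessment(3.0, 1.0, 4.0)   # passes at Level 2
label_c, rq_c = graded_assessment(3.0, 1.0, 2.0)   # escalates to Level 3
```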

Prediction of a hit drama with a pattern analysis on early viewing ratings (초기 시청시간 패턴 분석을 통한 대흥행 드라마 예측)

  • Nam, Kihwan;Seong, Nohyoon
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.33-49
    • /
    • 2018
  • The impact of a TV drama's success on ratings and on channel promotion is very high, and its cultural and business impact has been demonstrated through the Korean Wave. Early prediction of a blockbuster TV drama is therefore very important from the strategic perspective of the media industry. Previous studies have tried to predict drama ratings and success by various methods, but most made simple predictions using intuitive factors such as the lead actor and time slot, which limits their predictive power. In this study, we propose a model for predicting the popularity of a drama by analyzing customers' viewing patterns on the basis of established theories. This is not only a theoretical contribution but also a practical one, since the model can be used by actual broadcasting companies. We collected data on 280 TV mini-series dramas broadcast over terrestrial channels during the 10 years from 2003 to 2012. From these data, we selected the most highly ranked and the least highly ranked 45 dramas and analyzed their viewing patterns in 11 steps. The assumptions and conditions for modeling are based on existing studies, on the opinions of actual broadcasters, and on data mining techniques. We then developed a prediction model by measuring the viewing-time distance (difference) using Euclidean and correlation methods, termed similarity (the sum of distances) in our study. Through this similarity measure, we predicted the success of dramas from viewers' initial viewing-time pattern distributions over episodes 1~5. To confirm that the model is not sensitive to the choice of measure, various distance measures were applied and the model was checked for robustness. Once the model was established, we further improved its predictive power using a grid search.
Furthermore, when a new drama was broadcast, we classified viewers who had watched more than 70% of the total airtime as "passionate viewers" and compared the percentage of passionate viewers between the most highly ranked and the least highly ranked dramas, so as to determine the possibility of a blockbuster TV mini-series. We find that the initial viewing-time pattern is the key factor in predicting blockbuster dramas: with our model, blockbuster dramas were correctly classified with 75.47% accuracy from the initial viewing-time pattern analysis. This paper thus shows a high prediction rate while suggesting a rating-forecast method different from existing ones. Broadcasters currently rely heavily on a few famous actors (the so-called star system) and face more severe competition than ever because of rising production costs, a long-term recession, aggressive investment by comprehensive programming channels, and large corporations; everyone is in a financially difficult situation. The basic revenue model of broadcasters is advertising, and the execution of advertising is based on audience ratings as the basic index. The drama market carries demand uncertainty inherent to the nature of the product, while dramas contribute heavily to the financial success of a broadcaster's content, so minimizing the risk of failure matters. Analyzing the distribution of initial viewing time can therefore provide practical help in establishing a company's response strategy (scheduling, marketing, story changes, etc.). We also found that audience behavior is crucial to the success of a program, and we define passionate viewing as a measure of how enthusiastically a program is watched.
We can predict the success of a program by calculating the loyalty of these passionate viewers. This way of calculating loyalty can also be applied to other platforms, and to marketing programs such as highlights, script previews, making-of videos, characters, games, and other marketing projects.
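
The similarity-based classification can be illustrated with a minimal sketch: Euclidean and correlation distances between 11-step viewing-time patterns, with a new drama labeled by the nearer group centroid. The centroids and the test pattern below are invented for illustration; they are not the study's data.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two viewing-time pattern vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation_distance(a, b):
    """1 - Pearson correlation: 0 for identically shaped patterns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1 - cov / (sa * sb)

def predict_hit(pattern, hit_centroid, flop_centroid, dist=euclidean):
    """Label a new drama by whichever centroid its early pattern is closer to."""
    return ("hit" if dist(pattern, hit_centroid) < dist(pattern, flop_centroid)
            else "flop")

# Hypothetical 11-step viewing-time distributions (shares summing to ~1):
# hits build audience over the steps, flops front-load it.
hit_c  = [0.02, 0.03, 0.05, 0.07, 0.09, 0.10, 0.11, 0.12, 0.13, 0.14, 0.14]
flop_c = [0.30, 0.20, 0.13, 0.09, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01]

new_pattern = [0.05, 0.05, 0.06, 0.08, 0.09, 0.10, 0.11, 0.12, 0.12, 0.11, 0.11]
label = predict_hit(new_pattern, hit_c, flop_c)   # -> "hit"
```

Swapping `dist=correlation_distance` into `predict_hit` mirrors the robustness check across distance measures mentioned in the abstract.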

Quantitative Microbial Risk Assessment Model for Staphylococcus aureus in Kimbab (김밥에서의 Staphylococcus aureus에 대한 정량적 미생물위해평가 모델 개발)

  • Bahk, Gyung-Jin;Oh, Deog-Hwan;Ha, Sang-Do;Park, Ki-Hwan;Joung, Myung-Sub;Chun, Suk-Jo;Park, Jong-Seok;Woo, Gun-Jo;Hong, Chong-Hae
    • Korean Journal of Food Science and Technology
    • /
    • v.37 no.3
    • /
    • pp.484-491
    • /
    • 2005
  • Quantitative microbial risk assessment (QMRA) analyzes the potential hazard of microorganisms to public health and offers a structured approach to assessing risks associated with microorganisms in foods. This paper addresses specific risk-management questions associated with Staphylococcus aureus in kimbab, along with the improvement and dissemination of QMRA methodology. The QMRA model was developed by constructing four nodes along the retail-to-table pathway. A predictive microbial growth model and survey data were combined with probabilistic modeling to simulate levels of S. aureus in kimbab at the time of consumption. For lack of a dose-response model, the final level of S. aureus in kimbab was used as a proxy for the potential hazard level; on this basis, the probability of contamination above this level and the consumption level of S. aureus through kimbab were estimated as 30.7% and 3.67 log CFU/g, respectively. Regression sensitivity results showed that time-temperature conditions during storage at the point of sale were the most significant factor. These results suggest that temperature control below 10°C is a critical control point in kimbab production to prevent the growth of S. aureus, and show that QMRA is useful for evaluating the factors influencing potential risk and can be applied directly to risk management.
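
The retail-to-table Monte Carlo chain can be sketched as follows. The distributions and the growth-rate rule below are invented placeholders, not the paper's fitted models; only the overall structure (sample an initial level, grow it with time and temperature, count exceedances of a proxy hazard level) mirrors the abstract.

```python
import random

def simulate_final_level(n=20000, seed=1):
    """Monte Carlo sketch of a retail-to-table QMRA chain.

    Returns (probability of exceeding the proxy hazard level,
             mean final level in log CFU/g). All parameters illustrative.
    """
    rng = random.Random(seed)
    threshold = 3.0                      # proxy hazard level, log CFU/g
    exceed, total = 0, 0.0
    for _ in range(n):
        level = rng.gauss(1.0, 0.5)      # initial contamination, log CFU/g
        temp = rng.uniform(5, 25)        # storage temperature at selling, deg C
        hours = rng.uniform(1, 8)        # storage time before consumption, h
        # Toy growth rule: no growth below 10 deg C, linear increase above it.
        mu = max(0.0, 0.02 * (temp - 10.0))   # log CFU/g per hour
        level += mu * hours
        total += level
        if level > threshold:
            exceed += 1
    return exceed / n, total / n

p_exceed, mean_level = simulate_final_level()
```

A regression-style sensitivity check, as in the paper, would then correlate each sampled input (initial level, temperature, time) with the simulated final level.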

Ecological Network on Benthic Diatom in Estuary Environment by Bayesian Belief Network Modelling (베이지안 모델을 이용한 하구수생태계 부착돌말류의 생태 네트워크)

  • Kim, Keonhee;Park, Chaehong;Kim, Seung-hee;Won, Doo-Hee;Lee, Kyung-Lak;Jeon, Jiyoung
    • Korean Journal of Ecology and Environment
    • /
    • v.55 no.1
    • /
    • pp.60-75
    • /
    • 2022
  • The Bayesian network is a model algorithm that calculates probabilities based on input data and is mainly used for complex disasters, water-quality management, and the ecological structure between organisms or between living and non-living factors. In this study, we analyzed the main factors affecting change in the Korean Estuary Trophic Diatom Index (KETDI) by Bayesian network analysis, using the diatom community and physicochemical factors of domestic estuarine aquatic ecosystems. For the Bayesian analysis, estuarine diatom habitat data and estuarine diatom health assessment data (2008~2019) were used, classified into habitat, physical, chemical, and biological factors. Each dataset was input to the Bayesian network model (GeNIE), and the estuarine network was analyzed nationwide and for each coast. From 2008 to 2019, a total of 625 diatom taxa were identified, comprising 2 orders, 5 suborders, 18 families, 141 genera, 595 species, 29 varieties, and 1 form. Nitzschia inconspicua had the highest cumulative cell density, followed by Nitzschia palea, Pseudostaurosira elliptica, and Achnanthidium minutissimum. Analyzing the ecological network of diatom health assessment in the estuarine ecosystem with the Bayesian network model showed that the biological factor was the most sensitive factor influencing the health assessment score, whereas the habitat and physicochemical factors had relatively low sensitivity. The diatom taxa most sensitive to the assessment of estuarine aquatic health were Nitzschia inconspicua, N. fonticola, Achnanthes convergens, and Pseudostaurosira elliptica. In addition, the proportions of industrial area and cattle sheds near the habitat were sensitively linked to the health assessment, and the major sensitive taxa differed by coast.
Bayesian network analysis was thus useful for identifying the major variables, including diatom taxa, that affect aquatic health even in ecological structures as complex as estuarine ecosystems, and it makes it possible to identify restoration targets accurately when restoring a damaged estuarine aquatic ecosystem.
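
The kind of sensitivity ranking GeNIE reports can be illustrated on a toy three-node chain (Habitat → Biology → Health), scoring each upstream node by its mutual information with the health node. All conditional probabilities below are invented; and because habitat influences health only through biology in this toy chain, the data-processing inequality guarantees the biological node scores at least as high, echoing the study's finding.

```python
import math

# Hypothetical CPTs for a 3-node chain: Habitat -> Biology -> Health.
p_habitat = {"good": 0.6, "poor": 0.4}
p_biology = {"good": {"intact": 0.8, "degraded": 0.2},
             "poor": {"intact": 0.3, "degraded": 0.7}}
p_health  = {"intact":   {"high": 0.9, "low": 0.1},
             "degraded": {"high": 0.2, "low": 0.8}}

def joint():
    """Enumerate the full joint distribution of (habitat, biology, health)."""
    for h in p_habitat:
        for b in p_biology[h]:
            for s in p_health[b]:
                yield (h, b, s), p_habitat[h] * p_biology[h][b] * p_health[b][s]

def mutual_information(var_index):
    """MI between one upstream variable (0 = habitat, 1 = biology) and Health."""
    px, ps, pxs = {}, {}, {}
    for (h, b, s), p in joint():
        x = (h, b)[var_index]
        px[x] = px.get(x, 0.0) + p
        ps[s] = ps.get(s, 0.0) + p
        pxs[(x, s)] = pxs.get((x, s), 0.0) + p
    return sum(p * math.log(p / (px[x] * ps[s]))
               for (x, s), p in pxs.items() if p > 0)

mi_biology = mutual_information(1)   # larger: directly upstream of Health
mi_habitat = mutual_information(0)   # smaller: acts only through Biology
```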

Anomaly Detection for User Action with Generative Adversarial Networks (적대적 생성 모델을 활용한 사용자 행위 이상 탐지 방법)

  • Choi, Nam woong;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.43-62
    • /
    • 2019
  • At one time, the anomaly detection field was dominated by methods that judged abnormality from statistics derived from specific data. This methodology worked because data were low-dimensional in the past, so classical statistical methods were effective. However, as data characteristics have grown complex in the era of big data, it has become difficult to accurately analyze and predict industrial data in the conventional way. Supervised learning algorithms such as SVM and decision trees were therefore adopted. However, a supervised model can predict test data accurately only when the classes in the training data are sufficiently balanced, whereas most data generated in industry have imbalanced class distributions, so the predictions of a supervised model are not always valid. To overcome these drawbacks, many studies now use unsupervised models that are not influenced by the class distribution, such as autoencoders or generative adversarial networks. In this paper, we propose a method to detect anomalies using generative adversarial networks. AnoGAN, introduced by Schlegl et al. (2017), is a model built from convolutional neural networks that performs anomaly detection on medical images. By contrast, anomaly detection for sequence data using generative adversarial networks has received far less attention than for image data. Li et al. (2018) proposed an LSTM-based model (LSTM being a type of recurrent neural network) to classify anomalies in numerical sequence data, but it was not applied to categorical sequence data, nor did it use the feature-matching method of Salimans et al. (2016).
This suggests that much remains to be explored in anomaly classification of sequence data with generative adversarial networks. To learn the sequence data, the generative adversarial network is built from LSTMs: the generator is a two-layer stacked LSTM with 32-dimensional and 64-dimensional hidden-unit layers, and the discriminator is an LSTM with a 64-dimensional hidden-unit layer. Existing work on anomaly detection for sequence data derives anomaly scores from the entropy of the probability of the actual data; in this paper, as mentioned above, anomaly scores are instead derived using feature matching. In addition, the process of optimizing the latent variables was designed with an LSTM to improve model performance. The modified generative adversarial model was more precise than the autoencoder in all experiments and approximately 7% higher in accuracy. In terms of robustness, the generative adversarial network also outperformed the autoencoder: because it learns the data distribution from real categorical sequences, it is not skewed by any single normal example, whereas the autoencoder is. In the robustness test, the accuracy of the autoencoder was 92% and that of the adversarial network 96%; in terms of sensitivity, the autoencoder reached 40% and the adversarial network 51%. We also ran experiments to measure how much performance changes with the optimization structure of the latent variables; sensitivity improved by about 1%. These results offer a new perspective on optimizing latent variables, which had previously received little attention.
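
The feature-matching anomaly score can be sketched independently of any deep-learning framework: given intermediate discriminator features for a test sequence and for its best GAN reconstruction, the score is simply their distance. The 4-dimensional feature vectors below are invented stand-ins for LSTM hidden states, not outputs of the paper's trained networks.

```python
import math

def feature_matching_score(f_real, f_generated):
    """Anomaly score as the distance between discriminator features of the
    test sequence and of its reconstruction G(z*), following the
    feature-matching idea of Salimans et al. (2016)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_real, f_generated)))

def detect(scores, threshold):
    """Flag sequences whose feature-matching score exceeds a threshold."""
    return [s > threshold for s in scores]

# Hypothetical intermediate-layer features (e.g. a 4-dim LSTM hidden state):
# a normal sequence reconstructs well, an anomalous one does not.
normal_feats  = ([0.10, 0.20, 0.10, 0.00], [0.11, 0.19, 0.12, 0.01])
anomaly_feats = ([0.90, 0.10, 0.70, 0.50], [0.12, 0.18, 0.11, 0.02])

s_norm = feature_matching_score(*normal_feats)    # small: well reconstructed
s_anom = feature_matching_score(*anomaly_feats)   # large: poorly reconstructed
flags = detect([s_norm, s_anom], threshold=0.1)
```

In the full method, `f_real` and `f_generated` would come from the discriminator's LSTM layer after optimizing the latent vector z* for the test sequence.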

A Study on the Prediction of Residual Probability of Fine Dust in Complex Urban Area (복잡한 도심에서의 유입된 미세먼지 잔류 가능성 예보 연구)

  • Park, Sung Ju;Seo, You Jin;Kim, Dong Wook;Choi, Hyun Jeong
    • Journal of the Korean earth science society
    • /
    • v.41 no.2
    • /
    • pp.111-128
    • /
    • 2020
  • This study presents the possibility that fine dust mass concentration intensifies owing to complex urban structure, using data mining techniques and clustering analysis. The data mining technique showed no significant correlation between fine dust concentration and regional-use public urban data over Seoul. However, clustering analysis based on nationwide-use public data showed that building heights (floors) have a strong correlation particularly with PM10. Modeling analyses using a single canopy model and the micro-atmospheric modeling program ENVI-Met 4 showed that suppressed atmospheric convection in the urban area led to congested flow patterns that depend on the building distribution and height. The complex urban building structure restricts convective activity, resulting in stagnant conditions and an increase of fine dust near the surface. Consequently, the residual effect of changes in the thermal environment caused by the shape and structure of urban buildings must be considered in the fine dust distribution. Notably, atmospheric congestion carries important implications for providing information about the residual probability of fine dust mass concentration in complex urban areas.
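
The building-height-versus-PM10 association reported above can be illustrated with a plain Pearson correlation; the station records below are invented for illustration, not the study's public data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient, a minimal stand-in for the study's
    data-mining correlation step."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical station records: mean building floors vs. PM10 (ug/m3).
floors = [3, 5, 8, 12, 15, 20, 25]
pm10   = [38, 41, 47, 52, 55, 61, 66]
r = pearson(floors, pm10)   # strongly positive for this illustrative data
```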

Effect of Soft Handoff Technique on CDMA Cell Coverage in a Lognormally Shadowed Channel (전파음영 채널 환경에서 소프트 핸드오프 기법이 CDMA 셀룰러 시스템의 셀 커버리지에 미치는 영향)

  • Oh, Hyon-Kyu;Kim, Hang-Rae;Kim, Nam
    • The Journal of Korean Institute of Electromagnetic Engineering and Science
    • /
    • v.12 no.6
    • /
    • pp.871-881
    • /
    • 2001
  • In this paper, the effect of the soft handoff technique on CDMA cell coverage is analyzed in a lognormally shadowed channel using the Hata propagation model. The rate of increase of the cell coverage is also analyzed by calculating the hard and soft handoff margins. When the outage probability is 0.02 and the standard deviation of the received signal is 2.5 dB in a lognormally shadowed channel, the transmit power of a mobile station located at the cell boundary is increased by a hard handoff margin of 5.13 dB or by a soft handoff margin of 3.68 dB, respectively, so the rate of increase of the cell coverage is 1.39 when the soft handoff technique is used. It is shown that if the required (E_b/N_0) value is 7 dB, the cell coverage of a CDMA cellular system with the soft handoff technique in an urban area is 3.33 km at 850 MHz and 1.36 km at 1900 MHz. An accurate estimate is also provided of the cell coverage that a base station can serve with the soft handoff technique in a CDMA cellular system.
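
The 5.13 dB hard-handoff figure is consistent with the standard log-normal fade-margin formula, margin = sigma * Q^-1(P_out); below is a quick check using the standard library's inverse normal CDF. The soft-handoff margin reduction from macrodiversity (two base stations) is not reproduced in this sketch.

```python
from statistics import NormalDist

def shadow_margin_db(outage_prob, sigma_db):
    """Log-normal shadowing margin: extra link budget (dB) needed so the
    received level stays above threshold with probability 1 - outage_prob."""
    return NormalDist().inv_cdf(1.0 - outage_prob) * sigma_db

# Outage probability 0.02 and shadowing sigma 2.5 dB, as in the abstract:
m = shadow_margin_db(0.02, 2.5)   # approx 5.13 dB (the hard handoff margin)
```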


Life Prediction of Composite Pressure Vessels Using Multi-Scale Approach (멀티 스케일 접근법을 이용한 복합재 압력용기의 수명 예측)

  • Jin, Kyo-Kook;Ha, Sung-Kyu;Kim, Jae-Hyuk;Han, Hoon-Hee;Kim, Seong-Jong
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.11 no.9
    • /
    • pp.3176-3183
    • /
    • 2010
  • A multi-scale fatigue life prediction methodology for composite pressure vessels subjected to multi-axial loading is proposed in this paper. The multi-scale approach starts from the constituents (fiber, matrix, and interface), leading to the prediction of the behavior of plies, laminates, and eventually the composite structure. The methodology is composed of two steps: macro stress analysis and micromechanics-of-failure-based fatigue analysis. In the macro stress analysis, the multi-axial fatigue loading acting on the laminate is determined from finite element analysis of the composite pressure vessel, and ply stresses are computed using classical laminate theory. The micro stresses are then calculated in each constituent from the ply stresses using a micromechanical model. Three methods are employed to predict the fatigue life of each constituent: a maximum stress method for the fiber, an equivalent stress method for the multi-axially loaded matrix, and a critical plane method for the interface. A modified Goodman diagram is used to take generic mean stresses into account, and damage from each loading cycle is accumulated using Miner's rule. Monte Carlo simulation is performed to predict the overall fatigue life of a composite pressure vessel, considering the statistical distributions of the material properties of each constituent, the fiber volume fraction, and the manufacturing winding angle.
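
The damage-accumulation step can be sketched as a modified-Goodman mean-stress correction feeding Miner's rule. The ultimate strength and the Basquin-type S-N curve below are invented placeholders, not the paper's material data.

```python
def goodman_equivalent_amplitude(sigma_a, sigma_m, sigma_u):
    """Modified-Goodman correction: equivalent fully reversed stress amplitude
    for a cycle with amplitude sigma_a and mean sigma_m (ultimate sigma_u)."""
    return sigma_a / (1.0 - sigma_m / sigma_u)

def miner_damage(cycles, sn_life, sigma_u=900.0):
    """Accumulate Miner's-rule damage D = sum(n_i / N_i); failure when D >= 1.

    cycles       : maps (amplitude, mean) in MPa -> applied cycle count n_i
    sn_life(s)   : allowable cycles N_i at equivalent amplitude s (S-N curve)
    sigma_u      : ultimate strength, MPa (illustrative value)
    """
    total = 0.0
    for (sigma_a, sigma_m), n in cycles.items():
        sigma_eq = goodman_equivalent_amplitude(sigma_a, sigma_m, sigma_u)
        total += n / sn_life(sigma_eq)
    return total

# Hypothetical Basquin-type S-N curve: N = (C / sigma)^k.
sn = lambda s: (2000.0 / s) ** 9

# Two illustrative load blocks: (amplitude, mean) -> applied cycles.
D = miner_damage({(300.0, 100.0): 1e4, (200.0, 50.0): 1e5}, sn)
```

In the full methodology this accumulation would run per constituent (fiber, matrix, interface), inside a Monte Carlo loop over sampled material properties, fiber volume fraction, and winding angle.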

Impact Assessment of Sea-Level Rise based on Coastal Vulnerability Index (연안 취약성 지수를 활용한 해수면 상승 영향평가 방안 연구)

  • Lee, Haemi;Kang, Tae soon;Cho, Kwangwoo
    • Journal of Korean Society of Coastal and Ocean Engineers
    • /
    • v.27 no.5
    • /
    • pp.304-314
    • /
    • 2015
  • We review the current status of coastal vulnerability indices (CVIs) to guide the development of a CVI appropriate for the Korean coast, and apply the methodology to the east coast of Korea to quantify coastal vulnerability under future sea-level rise. The CVIs reviewed include the USGS CVI, a sea-level rise CVI, a compound CVI, and a multi-scale CVI. The USGS CVI, expressed in terms of the external forcings of sea-level rise, waves, and tide, and the adaptive capacity represented by morphology, erosion, and slope, is adopted here for CVI quantification. The CVI ranges from 1.826 to 22.361 with a mean of 7.085 under present conditions, and increases to 2.887~30.619 with a mean of 12.361 for the year 2100 (1 m sea-level rise). The "VERY HIGH" index currently covers 8.57% of the coast and will occupy 35.56% in 2100. The pattern of CVI change due to sea-level rise differs by local area, with Gangneung, Yangyang, and Goseong showing the highest increases. Land use in the "VERY HIGH" areas is dominated both by human systems (housing complexes, roads, cropland, etc.) and by natural systems (sand, wetland, forestry, etc.), which suggests that existing land utilization should be reframed in the era of climate change. Although the CVI approach is highly efficient for handling the large set of climate scenarios entailed in climate impact assessment under uncertainty, we also propose a three-level assessment for applying the CVI methodology to site-specific adaptation: a first screening assessment by CVI, a second scoping assessment by an impact model, and a final risk quantification using the impact-model results.
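
The USGS CVI adopted here is conventionally computed as the square root of the product of the six ranked variables divided by their number; a minimal sketch follows, with illustrative rank assignments (the six 1~5 ranks below are invented, not the study's gradings).

```python
import math

def usgs_cvi(ranks):
    """USGS coastal vulnerability index: sqrt of the product of the ranked
    variables divided by their number."""
    prod = 1
    for r in ranks:
        prod *= r
    return math.sqrt(prod / len(ranks))

# Six variables ranked 1 (very low) .. 5 (very high): sea-level change,
# wave height, tide range, geomorphology, erosion rate, coastal slope.
low  = usgs_cvi([1, 1, 2, 1, 2, 1])   # approx 0.816
high = usgs_cvi([5, 4, 5, 4, 5, 5])   # approx 40.82
```

Because the formula multiplies ranks before the square root, a single "very high" variable raises the index sharply, which is why reclassifying one forcing under sea-level rise can move a coastal segment into the "VERY HIGH" class.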