• Title/Summary/Keyword: Improvement of prediction performance


A Spatial Entropy based Decision Tree Method Considering Distribution of Spatial Data (공간 데이터의 분포를 고려한 공간 엔트로피 기반의 의사결정 트리 기법)

  • Jang, Youn-Kyung; You, Byeong-Seob; Lee, Dong-Wook; Cho, Sook-Kyung; Bae, Hae-Young
    • The KIPS Transactions: Part B / v.13B no.7 s.110 / pp.643-652 / 2006
  • Decision trees are mainly used for classification and prediction in data mining. The distribution of spatial data and the relationships with their neighborhoods are very important when classifying data in real-world spatial data mining. Spatial decision trees in previous work reflect the characteristics of spatial data by weighting Euclidean distance, but distance alone only describes how far apart objects are in space, so it cannot represent the distribution of spatial data and their relationships. This paper proposes a decision tree based on spatial entropy, which represents the distribution of spatial data through dispersion and dissimilarity. Dispersion describes the distribution of spatial objects within their own class, while dissimilarity indicates the distribution of, and relationship with, other classes. The ratio of dispersion to dissimilarity expresses how closely the spatial distribution is related to the classification by non-spatial attributes. Our experiments evaluate the accuracy and building time of the decision tree against previous methods, achieving improvements of about 18% and 11%, respectively.
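The abstract does not give the exact entropy formula, but a minimal sketch of a spatial-entropy-style split measure can illustrate the idea: weight each class's Shannon term by the ratio of its intra-class dispersion to its inter-class dissimilarity, so compact, well-separated classes contribute less entropy. The formulation below is an assumption in the spirit of Claramunt-style spatial entropy, not the authors' own definition.

```python
import numpy as np

def spatial_entropy(points, labels):
    """points: (n, 2) coordinates; labels: (n,) class ids (hypothetical API)."""
    classes, n, h = np.unique(labels), len(labels), 0.0
    for c in classes:
        inside = points[labels == c]
        outside = points[labels != c]
        p = len(inside) / n
        # Dispersion: mean pairwise distance among objects of the same class.
        if len(inside) > 1:
            d = np.linalg.norm(inside[:, None] - inside[None, :], axis=-1)
            dispersion = d.sum() / (len(inside) * (len(inside) - 1))
        else:
            dispersion = 0.0
        # Dissimilarity: mean distance from class members to other classes.
        dissimilarity = (np.linalg.norm(inside[:, None] - outside[None, :],
                                        axis=-1).mean()
                         if len(outside) else 1.0)
        ratio = dispersion / dissimilarity if dissimilarity > 0 else 0.0
        h -= ratio * p * np.log2(p)  # spatially weighted Shannon term
    return h
```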

Performance Improvement of Collaborative Filtering System Using Associative User′s Clustering Analysis for the Recalculation of Preference and Representative Attribute-Neighborhood (선호도 재계산을 위한 연관 사용자 군집 분석과 Representative Attribute -Neighborhood를 이용한 협력적 필터링 시스템의 성능향상)

  • Jung, Kyung-Yong; Kim, Jin-Su; Kim, Tae-Yong; Lee, Jung-Hyun
    • The KIPS Transactions: Part B / v.10B no.3 / pp.287-296 / 2003
  • There has been much research focused on collaborative filtering techniques in recommender systems. However, these studies suffer from the first-rater problem and the sparsity problem, and the main purpose of this paper is to solve them. We suggest a method for predicting user preference using Bayesian estimated values and associative user clustering for the recalculation of preference. To complement a shortcoming of this method, namely that it does not consider item attributes, we use the Representative Attribute-Neighborhood method, which finds similar neighbors by extracting the representative attributes that most affect preference and uses them for prediction. We improve efficiency by using associative user clustering analysis to calculate the preference of a specific item within the cluster item vector for the collaborative filtering algorithm. To address sparsity and the first-rater problem, associative users are clustered by genre with the Association Rule Hypergraph Partitioning algorithm, and new users are classified into one of these genres by a naive Bayes classifier. In addition, to compute similarity between new users and the users belonging to the classified genre, the method assigns a different estimated value to each item a user has evaluated, learned through naive Bayes. Applying preferences weighted by these estimated values to the Pearson correlation coefficient yields higher accuracy, because fewer errors arise from missing values. We evaluate our method on a large collaborative filtering database of user ratings, and it significantly outperforms previously proposed methods.
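For context on the Pearson step the abstract builds on, here is a minimal sketch of classic memory-based prediction with the Pearson correlation coefficient. The paper's Bayesian estimated values, associative user clustering, and Representative Attribute-Neighborhood steps are not reproduced; the function name and matrix layout are assumptions.

```python
import numpy as np

def predict(ratings, active, item):
    """ratings: (users, items) matrix with np.nan for missing entries."""
    means = np.nanmean(ratings, axis=1)  # each user's mean rating
    num, den = 0.0, 0.0
    for u in range(ratings.shape[0]):
        if u == active or np.isnan(ratings[u, item]):
            continue
        both = ~np.isnan(ratings[active]) & ~np.isnan(ratings[u])
        if both.sum() < 2:
            continue
        a = ratings[active, both] - means[active]
        b = ratings[u, both] - means[u]
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        if denom == 0:
            continue
        w = (a * b).sum() / denom  # Pearson correlation on co-rated items
        num += w * (ratings[u, item] - means[u])
        den += abs(w)
    return means[active] + num / den if den > 0 else means[active]
```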

DEVELOPMENT OF SAFETY-BASED LEVEL-OF-SERVICE CRITERIA FOR ISOLATED SIGNALIZED INTERSECTIONS (독립신호 교차로에서의 교통안전을 위한 서비스수준 결정방법의 개발)

  • Dr. Tae-Jun Ha
    • Proceedings of the KOR-KST Conference / 1995.02a / pp.3-32 / 1995
  • The Highway Capacity Manual specifies procedures for evaluating intersection performance in terms of delay per vehicle. What is lacking in the current methodology is a comparable quantitative procedure for assessing the safety-based level of service provided to motorists. The objective of the research described herein was to develop a computational procedure for evaluating the safety-based level of service of signalized intersections based on the relative hazard of alternative intersection designs and signal timing plans. Conflict opportunity models were developed for those crossing, diverging, and stopping maneuvers which are associated with left-turn and rear-end accidents. Safety-based level-of-service criteria were then developed based on the distribution of conflict opportunities computed from the developed models. A case study evaluation of the level-of-service analysis methodology revealed that the developed safety-based criteria were not as sensitive to changes in prevailing traffic, roadway, and signal timing conditions as the traditional delay-based measure. However, the methodology did permit a quantitative assessment of the trade-off between delay reduction and safety improvement. The Highway Capacity Manual (HCM) specifies procedures for evaluating intersection performance in terms of a wide variety of prevailing conditions such as traffic composition, intersection geometry, traffic volumes, and signal timing (1). At the present time, however, performance is only measured in terms of delay per vehicle. This is a parameter which is widely accepted as a meaningful and useful indicator of the efficiency with which an intersection is serving traffic needs. What is lacking in the current methodology is a comparable quantitative procedure for assessing the safety-based level of service provided to motorists. For example, it is well-known that the change from permissive to protected left-turn phasing can reduce left-turn accident frequency. However, the HCM only permits a quantitative assessment of the impact of this alternative phasing arrangement on vehicle delay. It is left to the engineer or planner to subjectively judge the level of safety benefits, and to evaluate the trade-off between the efficiency and safety consequences of the alternative phasing plans. Numerous examples of other geometric design and signal timing improvements could also be given. At present, the principal methods available to the practitioner for evaluating the relative safety at signalized intersections are: a) the application of engineering judgement, b) accident analyses, and c) traffic conflicts analysis. Reliance on engineering judgement has obvious limitations, especially when placed in the context of the elaborate HCM procedures for calculating delay. Accident analyses generally require some type of before-after comparison, either for the case study intersection or for a large set of similar intersections. In either situation, there are problems associated with compensating for regression-to-the-mean phenomena (2), as well as obtaining an adequate sample size. Research has also pointed to potential bias caused by the way in which exposure to accidents is measured (3, 4). Because of the problems associated with traditional accident analyses, some have promoted the use of the traffic conflicts technique (5). However, this procedure also has shortcomings in that it requires extensive field data collection and trained observers to identify the different types of conflicts occurring in the field.
The objective of the research described herein was to develop a computational procedure for evaluating the safety-based level of service of signalized intersections that would be compatible and consistent with that presently found in the HCM for evaluating efficiency-based level of service as measured by delay per vehicle (6). The intent was not to develop a new set of accident prediction models, but to design a methodology to quantitatively predict the relative hazard of alternative intersection designs and signal timing plans.
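As a purely hypothetical illustration of what a conflict opportunity model can look like (the paper's actual crossing, diverging, and stopping models are not given here), one might approximate permissive left-turn conflict opportunities as the expected number of left-turn arrivals that coincide with an opposing through vehicle inside a critical time window, assuming Poisson arrivals. Every name and parameter below is illustrative, not taken from the source.

```python
import math

def left_turn_conflict_opportunities(q_left, q_opp, window_s, period_s=3600.0):
    """q_left, q_opp: hourly flows (veh/h); window_s: critical time window (s)."""
    lam_opp = q_opp / period_s                        # opposing arrival rate (veh/s)
    p_conflict = 1.0 - math.exp(-lam_opp * window_s)  # P(opposing vehicle in window)
    return q_left * p_conflict                        # expected opportunities per hour
```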


Analysis of Service Factors on the Management Performance of Korea Railroad Corporation - Based on the railroad statistical yearbook data - (한국철도공사 경영성과에 미치는 서비스 요인분석 -철도통계연보 데이터를 대상으로-)

  • Koo, Kyoung-Mo; Seo, Jeong-Tek; Kang, Nak-Jung
    • Journal of Korea Port Economic Association / v.37 no.4 / pp.127-144 / 2021
  • The purpose of this study is to derive service factors from the "Rail Statistical Yearbook" data of railroad service providers from 1990 to 2019 and to analyze their effect on the operating profit ratio (OPR), a representative management performance variable of railroad transport service providers. In particular, the study has academic significance as empirical research evaluating whether the management innovation of KoRail has progressed in line with the purpose of establishing the corporation, by dividing the research period into a first period (1990-2003) and a latter period (2004-2019). The study reviews previous research on the quality of railway passenger transportation service and analyzes government publications related to KoRail's management performance evaluation. In the empirical model, OPR is the dependent variable and service factor variables of infrastructure, economy, safety, connectivity, and business diversity are the explanatory variables, based on operation and management activity information over the 30-year analysis period. The results show that OPR improves with the infrastructure factor through structural reform or efficiency gains, and with the economic factor through cost reduction. The safety factor's regression coefficient was not significant, although its sign matched the prediction. The connectivity factor shows different effects between the two periods: its impact on OPR changes from negative in the first period to positive in the latter, reflecting an environment in which connectivity was actually realized only in the latter period. For the diversity factor, neither the investment share in subsidiaries nor government subsidies had an effect on OPR.
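A minimal sketch of the kind of two-period OLS regression the study describes, assuming annual yearbook rows and hypothetical column names; the actual variable construction from the Rail Statistical Yearbook is not reproduced here.

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("korail_yearbook.csv")  # hypothetical file with annual rows
cols = ["infrastructure", "economy", "safety", "connectivity", "diversity"]
# Mirror the study design: fit the model separately for the two periods.
for name, part in (("1990-2003", df[df["year"] <= 2003]),
                   ("2004-2019", df[df["year"] >= 2004])):
    X = sm.add_constant(part[cols])     # explanatory service factors
    fit = sm.OLS(part["opr"], X).fit()  # OPR as the dependent variable
    print(name, fit.params.round(3).to_dict())
```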

A Complexity Reduction Method of MPEG-4 Audio Lossless Coding Encoder by Using the Joint Coding Based on Cross Correlation of Residual (여기신호의 상관관계 기반 joint coding을 이용한 MPEG-4 audio lossless coding 인코더 복잡도 감소 방법)

  • Cho, Choong-Sang; Kim, Je-Woo; Choi, Byeong-Ho
    • Journal of the Institute of Electronics Engineers of Korea SP / v.47 no.3 / pp.87-95 / 2010
  • Portable multimedia products that deliver the highest audio quality using lossless audio codecs have been released, and the international lossless codecs MPEG-4 Audio Lossless Coding (ALS) and MPEG-4 Scalable Lossless Coding (SLS) were standardized by MPEG in 2006. The simple profile of MPEG-4 ALS, which supports up to stereo, was defined by MPEG in 2009. A lossless audio codec should have low complexity in stereo coding to be widely used in portable multimedia products, but previous research on MPEG-4 ALS has focused on improving the compression ratio, reducing complexity in multi-channel coding, and selecting the order of the linear prediction coefficients (LPCs). In this paper, the complexity and compression ratio of the MPEG-4 ALS encoder are analyzed for the simple profile, and a method to reduce encoder complexity is proposed. Based on this analysis, the complexity of the encoder's short-term prediction filter is reduced using the low-complexity filter proposed in previous research for reducing the complexity of the MPEG-4 ALS decoder. We also propose a joint coding decision method that reduces complexity while maintaining the encoder's compression ratio: the decision whether to run joint coding is based on the relationship between the cross-correlation of the residuals and the compression ratio of joint coding. The performance of the MPEG-4 ALS encoder with the proposed method and the low-complexity filter is evaluated on the MPEG-4 ALS conformance test files and ordinary music files. The encoder's complexity is reduced by about 24% compared with the MPEG-4 ALS reference encoder, while the compression ratio remains comparable to that of the reference encoder.
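A minimal sketch of the proposed decision step, assuming a simple threshold rule: compute the normalized cross-correlation of the two channels' prediction residuals for a frame and run joint (difference) coding only when the channels look correlated enough to pay off. The threshold value is a hypothetical tuning parameter, not taken from the paper.

```python
import numpy as np

def use_joint_coding(res_left, res_right, threshold=0.6):
    """res_left/res_right: one frame of residuals after LPC filtering."""
    num = np.dot(res_left, res_right)
    den = np.sqrt(np.dot(res_left, res_left) * np.dot(res_right, res_right))
    if den == 0:
        return False
    corr = abs(num / den)  # normalized cross-correlation of the residuals
    # Highly correlated residuals suggest difference coding will compress well;
    # otherwise skip joint coding and save the encoder the extra work.
    return corr >= threshold
```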

Improvement in facies discrimination using multiple seismic attributes for permeability modelling of the Athabasca Oil Sands, Canada (캐나다 Athabasca 오일샌드의 투수도 모델링을 위한 다양한 탄성파 속성들을 이용한 상 구분 향상)

  • Kashihara, Koji; Tsuji, Takashi
    • Geophysics and Geophysical Exploration / v.13 no.1 / pp.80-87 / 2010
  • This study was conducted to develop a reservoir modelling workflow that reproduces the heterogeneous distribution of effective permeability, which governs the performance of SAGD (Steam Assisted Gravity Drainage), the in-situ bitumen recovery technique used in the Athabasca Oil Sands. The distribution of lithologic facies is the main cause of heterogeneity in bitumen reservoirs in the study area. The target formation consists of sand with mudstone facies in a fluvial-to-estuary channel system, where the mudstone interrupts fluid flow and reduces effective permeability. In this study, the lithologic facies are classified into three classes with different effective-permeability characteristics, depending on the shapes of the mudstones. The reservoir modelling workflow consists of two main modules: facies modelling and permeability modelling. The facies modelling identifies, with a stochastic approach, the three lithologic facies that mainly control the effective permeability. The permeability modelling first populates the mudstone volume fraction and then transforms it into effective permeability; a series of flow simulations applied to mini-models of the lithologic facies yields the transformation functions from mudstone volume fraction to effective permeability. Seismic data contribute to the facies modelling by providing the prior probability of facies, which is incorporated into the facies models by geostatistical techniques. In particular, this study employs a probabilistic neural network utilising multiple seismic attributes for facies prediction, which improves the prior probability of facies. The result of using the improved prior probability in facies modelling is compared with the conventional method using a single seismic attribute to demonstrate the improvement in facies discrimination. Using P-wave velocity in combination with density among the multiple seismic attributes is the essence of the improved facies discrimination. The paper also discusses the sand matrix porosity that makes P-wave velocity differ between the facies in the study area; the sand matrix porosity is uniquely evaluated using log-derived porosity, P-wave velocity, and photographically-predicted mudstone volume.
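A minimal sketch of a probabilistic neural network, i.e. a Parzen-window classifier, over multiple seismic attributes, of the kind used here to derive facies prior probabilities; the attribute layout, labels, and smoothing parameter are assumptions, not the authors' configuration.

```python
import numpy as np

def pnn_prior(train_x, train_y, query_x, sigma=1.0):
    """train_x: (n, attrs) seismic attributes; train_y: (n,) facies labels;
    query_x: (attrs,) attributes at the prediction location."""
    classes = np.unique(train_y)
    scores = []
    for c in classes:
        members = train_x[train_y == c]
        d2 = ((members - query_x) ** 2).sum(axis=1)
        # Gaussian kernel density of the query under this facies class.
        scores.append(np.exp(-d2 / (2.0 * sigma ** 2)).mean())
    scores = np.array(scores)
    return classes, scores / scores.sum()  # normalized prior per facies
```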

A Study on Risk Parity Asset Allocation Model with XGBoost (XGBoost를 활용한 리스크패리티 자산배분 모형에 관한 연구)

  • Kim, Younghoon; Choi, HeungSik; Kim, SunWoong
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.135-149 / 2020
  • Artificial intelligence is changing the world, and financial markets are no exception. Robo-advisors are actively being developed, compensating for the weaknesses of traditional asset allocation methods and taking over the parts that are difficult for those methods. A robo-advisor makes automated investment decisions with artificial intelligence algorithms and is used with various asset allocation models such as the mean-variance model, the Black-Litterman model, and the risk parity model. The risk parity model is a typical risk-based asset allocation model focused on the volatility of assets; because it avoids investment risk structurally, it is stable for managing large funds and has been widely used in finance. XGBoost is a parallel tree-boosting method: an optimized gradient boosting model designed to be highly efficient and flexible. It not only scales to billions of examples in limited-memory environments but also trains much faster than traditional boosting methods, and it is frequently used across many fields of data analysis. In this study, we therefore propose a new asset allocation model that combines the risk parity model with the XGBoost machine learning model. The model uses XGBoost to predict the risk of assets and applies the predicted risk to the covariance estimation step. Because an optimized asset allocation model estimates investment proportions from historical data, estimation errors arise between the estimation period and the actual investment period, and these errors degrade the optimized portfolio's performance. This study aims to improve the stability and performance of the model by predicting the volatility of the next investment period and thereby reducing the estimation errors of the optimized asset allocation model, narrowing the gap between theory and practice. For the empirical test of the suggested model, we used Korean stock market price data for 17 years, from 2003 to 2019, composed of the energy, finance, IT, industrial, material, telecommunication, utility, consumer, health care, and staples sectors. We accumulated predictions with a moving-window method of 1,000 in-sample and 20 out-of-sample observations, producing a total of 154 rebalancing back-tests, and analyzed portfolio performance in terms of cumulative rate of return over this long period. Compared with the traditional risk parity model, the experiment recorded improvements in both cumulative return and reduction of estimation errors: the total cumulative return is 45.748%, about 5% higher than that of the risk parity model, and the estimation errors are reduced in 9 of the 10 industry sectors. Reducing the estimation errors increases the stability of the model and makes it easier to apply in practical investment. Many financial and asset allocation models are limited in practical investment by the fundamental question of whether the past characteristics of assets will persist into the future in changing financial markets.
However, this study not only takes advantage of traditional asset allocation models but also supplements their limitations and increases stability by predicting asset risk with a recent algorithm. Various studies address parametric estimation methods for reducing estimation errors in portfolio optimization; we suggest a new way to reduce them in an optimized asset allocation model using machine learning. The study is therefore meaningful in proposing an advanced artificial-intelligence asset allocation model for fast-developing financial markets.
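A minimal sketch of the suggested pipeline, under stated assumptions: an XGBoost regressor predicts next-period volatility per asset from hypothetical features, the predictions are plugged into a covariance via the sample correlation matrix, and risk parity weights are found numerically by equalizing risk contributions. Feature construction, hyperparameters, and the window sizes are placeholders, not the paper's settings.

```python
import numpy as np
from scipy.optimize import minimize
from xgboost import XGBRegressor

def risk_parity_weights(returns, X_train, y_train, X_next):
    """returns: (T, n) recent returns; X_train/y_train: pooled feature rows and
    realized-volatility targets; X_next: (n, k) one feature row per asset."""
    model = XGBRegressor(n_estimators=200, max_depth=3)  # hypothetical settings
    model.fit(X_train, y_train)
    vol_pred = model.predict(X_next)            # predicted next-period volatility
    corr = np.corrcoef(returns, rowvar=False)   # sample correlation of assets
    cov = corr * np.outer(vol_pred, vol_pred)   # plug predictions into covariance

    n = returns.shape[1]

    def objective(w):
        rc = w * (cov @ w)                      # per-asset risk contributions
        return ((rc[:, None] - rc[None, :]) ** 2).sum()  # equalize them

    res = minimize(objective, np.full(n, 1.0 / n),
                   bounds=[(0.0, 1.0)] * n,
                   constraints=({"type": "eq", "fun": lambda w: w.sum() - 1.0},))
    return res.x                                # fully invested risk parity weights
```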

The Impact of Milk Production Level on Profit Traits of Holstein Dairy Cattle in Korea (국내 Holstein종 젖소의 생산수준이 젖소의 수익형질에 미치는 효과)

  • Do, Changhee; Park, Suhun; Cho, Kwang-Hyun; Choi, Yunho; Choi, Taejeong; Park, Byungho; Yun, Hobaek; Lee, Donghee
    • Journal of Animal Science and Technology / v.55 no.5 / pp.343-349 / 2013
  • Data including 1,372,050 milk records pertaining to 438,019 cows from 1983 to 2011, collected during performance tests conducted by the National Livestock Cooperative Dairy Improvement Center, were used to calculate the milk income and profit of individual cows and to investigate the effects of production level in early lactation (parity 1 and 2). Individuals with a moderate level of early-lactation production stayed longer in herds. Among parity-1 cows, the 9,000 kg or higher group had a lower mean number of lactations than the overall mean of 3.13, and the 7,000 kg or lower and 10,000 kg or higher groups had lower mean lifetime milking days than the overall mean of 1,076.8 days. Standard deviations of lifetime traits tended to decrease as production level increased. For parity 2, the 11,000 kg or higher group had a lower mean number of lactations than the overall mean of 3.43, and lifetime milking days were highest in the 12,000 kg group (1,212.0 days) and generally smaller in the lower groups. Profit increased with production level for both parity 1 and 2. In groups with low production levels, the profit of parity 1 was higher than that of parity 2, while the reverse was true in groups with high production levels. These results suggest that individuals in the low-production groups were more likely to be culled due to reproductive or other problems. Furthermore, the accuracy of predicting lifetime profit from 305-day milk yield appears to be higher for parity 2 than for parity 1; therefore, it is desirable to predict lifetime profit using the 305-day milk yield of parity 2. In conclusion, although breeding goals rest on many factors in profit estimation functions, production level during early lactation (parity 1 and 2) can be used as an indicator of profit and hence of profitability.

Simulation and model validation of Biomass Fast Pyrolysis in a fluidized bed reactor using CFD (전산유체역학(CFD)을 이용한 유동층반응기 내부의 목질계 바이오매스 급속 열분해 모델 비교 및 검증)

  • Ju, Young Min; Euh, Seung Hee; Oh, Kwang cheol; Lee, Kang Yol; Lee, Beom Goo; Kim, Dae Hyun
    • Journal of Energy Engineering / v.24 no.4 / pp.200-210 / 2015
  • Modeling of biomass fast pyrolysis in a fluidized bed reactor has been developed for accurate prediction of bio-oil and gas products and for yield improvement. The purpose of this study is to analyze and compare CFD (Computational Fluid Dynamics) simulation results with the experimental data from the reference (Mellin et al., 2014) for the gas products generated during fast pyrolysis of biomass in a fluidized bed reactor. CFD (ANSYS FLUENT v.15.0) was used for the simulation. A complex pyrolysis reaction scheme of the biomass subcomponents, covering the reactions of cellulose, hemicellulose, and lignin in detail, was applied to simulate the pyrolysis reactions; the gas products obtained from pyrolysis were mainly $CO_2$, CO, $CH_4$, $H_2$, and $C_2H_4$. The deviation between the simulation results of this study and the experimental data from the reference was about 3.7%p, 4.6%p, and 3.9%p for $CH_4$, $H_2$, and $C_2H_4$, respectively, but relatively high at 9.6%p and 6.7%p for $CO_2$ and CO. This study shows that gas products can be predicted accurately using the CFD simulation approach; moreover, the modeling approach should be developed further to predict fluidized bed reactor performance and other product yields.
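A small worked check of how the percentage-point (%p) deviations quoted above are computed; the yield values below are hypothetical placeholders chosen only so the arithmetic reproduces the quoted deviations.

```python
# Hypothetical simulated vs. experimental gas mass fractions (not the paper's data).
sim = {"CO2": 0.300, "CO": 0.400, "CH4": 0.050}
exp = {"CO2": 0.396, "CO": 0.467, "CH4": 0.087}
for gas in sim:
    dev = abs(sim[gas] - exp[gas]) * 100  # absolute deviation in percentage points
    print(f"{gas}: {dev:.1f} %p")         # CO2: 9.6 %p, CO: 6.7 %p, CH4: 3.7 %p
```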

The NCAM Land-Atmosphere Modeling Package (LAMP) Version 1: Implementation and Evaluation (국가농림기상센터 지면대기모델링패키지(NCAM-LAMP) 버전 1: 구축 및 평가)

  • Lee, Seung-Jae; Song, Jiae; Kim, Yu-Jung
    • Korean Journal of Agricultural and Forest Meteorology / v.18 no.4 / pp.307-319 / 2016
  • A Land-Atmosphere Modeling Package (LAMP) for supporting agricultural and forest management was developed at the National Center for AgroMeteorology (NCAM). The package comprises two components: one is the Weather Research and Forecasting modeling system (WRF) coupled with the Noah-Multiparameterization options (Noah-MP) Land Surface Model (LSM), and the other is an offline one-dimensional LSM. The objective of this paper is to briefly describe the two components of NCAM-LAMP and to evaluate their initial performance. The coupled WRF/Noah-MP system is configured with a parent domain over East Asia and three nested domains, with the finest horizontal grid size of 810 m. The innermost domain covers two Gwangneung deciduous and coniferous KoFlux sites (GDK and GCK). The model is integrated for about 8 days, with initial and boundary conditions taken from the National Centers for Environmental Prediction (NCEP) Final Analysis (FNL) data. The verification variables for the WRF/Noah-MP coupled system are 2-m air temperature, 10-m wind, 2-m humidity, and surface precipitation. Skill scores are calculated for each domain and for two dynamic vegetation options, using the difference between the observed data from the Korea Meteorological Administration (KMA) and the simulated data from the coupled system. The accuracy of the precipitation simulation is examined using a contingency table, from which the Probability of Detection (POD) and the Equitable Threat Score (ETS) are computed. The standalone LSM simulation is run for one year with the original settings and compared with the KoFlux site observations of net radiation, sensible heat flux, latent heat flux, and soil moisture. According to the results, the innermost domain (810 m resolution) showed the minimum root mean square error for 2-m air temperature, 10-m wind, and 2-m humidity among all domains. Turning on dynamic vegetation tended to reduce 10-m wind simulation errors in all domains. The first nested domain (7,290 m resolution) showed the highest precipitation score but gained little from the dynamic vegetation option. On the other hand, the offline one-dimensional Noah-MP LSM simulation captured the observed pattern and magnitude of radiative fluxes and soil moisture, leaving room for further improvement by supplementing the leaf area index input and finding a proper combination of model physics options.
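For reference, a minimal sketch of the two precipitation skill scores named above, computed from the standard 2x2 contingency table of hits, false alarms, misses, and correct negatives; the function names are illustrative.

```python
def pod(hits, misses):
    """Probability of Detection: fraction of observed events that were forecast."""
    return hits / (hits + misses)

def ets(hits, false_alarms, misses, correct_negatives):
    """Equitable Threat Score: threat score corrected for chance hits."""
    total = hits + false_alarms + misses + correct_negatives
    hits_random = (hits + false_alarms) * (hits + misses) / total  # chance hits
    return (hits - hits_random) / (hits + false_alarms + misses - hits_random)
```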