• Title/Summary/Keyword: random variable

Estimation of Annual Trends and Environmental Effects on the Racing Records of Jeju Horses (제주마 주파기록에 대한 연도별 추세 및 환경효과 분석)

  • Lee, Jongan;Lee, Soo Hyun;Lee, Jae-Gu;Kim, Nam-Young;Choi, Jae-Young;Shin, Sang-Min;Choi, Jung-Woo;Cho, In-Cheol;Yang, Byoung-Chul
    • Journal of Life Science / v.31 no.9 / pp.840-848 / 2021
  • This study was conducted to estimate annual trends and the environmental effects in the racing records of Jeju horses. The Korean Racing Authority (KRA) collected 48,645 observations for 2,167 Jeju horses from 2002 to 2019. Racing records were preprocessed to eliminate errors that occur during the data collection. Racing times were adjusted for comparison between race distances. A stepwise Akaike information criterion (AIC) variable selection method was applied to select appropriate environment variables affecting racing records. The annual improvement of the race time was -0.242 seconds. The model with the lowest AIC value was established when variables were selected in the following order: year, budam classification, jockey ranking, trainer ranking, track condition, weather, age, and gender. The most suitable model was constructed when the jockey ranking and age variables were considered as random effects. Our findings have potential for application as basic data when building models for evaluating genetic abilities of Jeju horses.
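The stepwise AIC selection described in this abstract can be sketched roughly as follows. This is not the authors' code or data: it is a minimal forward-selection sketch on synthetic data, using ordinary least squares and the Gaussian AIC up to an additive constant. The variable names, coefficients, and noise level are hypothetical, and the random effects the study fitted for jockey ranking and age are not modeled here.

```python
import numpy as np

def ols_aic(design, y):
    """AIC of a Gaussian OLS fit, up to an additive constant: n*ln(RSS/n) + 2k."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]

def forward_stepwise_aic(columns, y):
    """Greedy forward selection: add the variable that lowers AIC most,
    and stop as soon as no remaining variable lowers it."""
    design = np.ones((len(y), 1))  # start from an intercept-only model
    best = ols_aic(design, y)
    selected, remaining = [], list(columns)
    while remaining:
        scores = {name: ols_aic(np.column_stack([design, columns[name]]), y)
                  for name in remaining}
        name = min(scores, key=scores.get)
        if scores[name] >= best:
            break
        best = scores[name]
        design = np.column_stack([design, columns[name]])
        selected.append(name)
        remaining.remove(name)
    return selected, best

# Synthetic stand-in data (hypothetical effects, not the KRA records).
rng = np.random.default_rng(0)
n = 500
columns = {
    "year": rng.normal(size=n),
    "track_condition": rng.normal(size=n),
    "noise": rng.normal(size=n),
}
y = -0.3 * columns["year"] + 0.1 * columns["track_condition"] \
    + rng.normal(scale=0.05, size=n)

selected, aic = forward_stepwise_aic(columns, y)
print(selected)
```

The order in which variables enter reflects how much each one lowers the AIC, which is the sense in which the abstract lists year first.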

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.167-181 / 2018
  • Over the past decade, deep learning has been in the spotlight among machine learning algorithms. In particular, the convolutional neural network (CNN), known as an effective solution for recognizing and classifying images or voices, has been widely applied to classification and prediction problems. In this study, we investigate how to apply CNN to business problem solving. Specifically, this study proposes applying CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNN is strong at interpreting images. Thus, the model proposed in this study adopts CNN as a binary classifier that predicts stock market direction (upward or downward) using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements and predict future financial price movements. Our proposed model, named 'CNN-FG (Convolutional Neural Network using Fluctuation Graph)', consists of five steps. In the first step, it divides the dataset into intervals of 5 days. In step 2, it creates time series graphs for the divided dataset. The size of the image in which each graph is drawn is $40{\times}40$ pixels, and the graph of each independent variable is drawn in a different color. In step 3, the model converts the images into matrices. Each image is converted into a combination of three matrices in order to express the color values on the R (red), G (green), and B (blue) scales. In the next step, it splits the dataset of graph images into training and validation datasets. We used 80% of the total dataset as the training dataset and the remaining 20% as the validation dataset. In the final step, CNN classifiers are trained using the images of the training dataset.
Regarding the parameters of CNN-FG, we adopted two convolution filters ($5{\times}5{\times}6$ and $5{\times}5{\times}9$) in the convolution layer. In the pooling layer, a $2{\times}2$ max pooling filter was used. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the number of nodes in the output layer was set to 2 (one for the prediction of an upward trend and the other for a downward trend). The activation functions for the convolution layer and the hidden layers were set to ReLU (Rectified Linear Unit), and the one for the output layer was set to the Softmax function. To validate our model, CNN-FG, we applied it to the prediction of KOSPI200 for 2,026 days in eight years (from 2009 to 2016). To match the proportions of the two groups in the dependent variable (i.e., tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset using 80% of the total dataset (1,560 samples) and the validation dataset using 20% (390 samples). The independent variables of the experimental dataset comprised twelve technical indicators widely used in previous studies. They include Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry Williams' %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), CCI (commodity channel index), and so on. To confirm the superiority of CNN-FG, we compared its prediction accuracy with that of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models using these graphs can be effective from the perspective of prediction accuracy. Thus, this paper sheds light on how to apply deep learning techniques to the domain of business problem solving.
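Steps 1 to 3 of CNN-FG (windowing, drawing, and conversion to RGB matrices) might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the windows are taken as non-overlapping 5-day blocks, each series is min-max scaled within its window, and the function names, colors, and random data are all hypothetical.

```python
import numpy as np

def split_windows(data, days=5):
    """Cut a (time, n_vars) array into non-overlapping windows of `days` rows
    (assumption: the abstract does not say whether windows overlap)."""
    usable = (len(data) // days) * days
    return data[:usable].reshape(-1, days, data.shape[1])

def window_to_image(window, colors, size=40):
    """Rasterize one window as a size x size RGB line graph, one color per variable."""
    days, n_vars = window.shape
    image = np.zeros((size, size, 3), dtype=np.uint8)
    cols = np.arange(size)
    x_days = np.linspace(0, days - 1, size)  # fractional day behind each pixel column
    for j in range(n_vars):
        series = window[:, j]
        lo, hi = series.min(), series.max()
        scaled = (series - lo) / (hi - lo) if hi > lo else np.full(days, 0.5)
        y = np.interp(x_days, np.arange(days), scaled)
        rows = np.round((1.0 - y) * (size - 1)).astype(int)  # row 0 is the top edge
        image[rows, cols] = colors[j]
    return image

# Hypothetical data: two indicator series over 23 days (random walks).
rng = np.random.default_rng(0)
indicators = rng.normal(size=(23, 2)).cumsum(axis=0)
colors = [(255, 0, 0), (0, 0, 255)]  # red and blue, chosen arbitrarily

images = np.stack([window_to_image(w, colors) for w in split_windows(indicators)])
print(images.shape)  # (windows, 40, 40, 3): three 40x40 matrices per graph
```

Each image is a stack of three $40{\times}40$ matrices, which is the input form a convolution layer expects. The 80/20 split and the classifier itself are left out of this sketch.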

Service Quality, Customer Satisfaction and Customer Loyalty of Mobile Communication Industry in China (중국이동통신산업중적복무질량(中国移动通信产业中的服务质量), 고객만의도화고객충성도(顾客满意度和顾客忠诚度))

  • Zhang, Ruijin;Li, Xiangyang;Zhang, Yunchang
    • Journal of Global Scholars of Marketing Science / v.20 no.3 / pp.269-277 / 2010
  • Previous studies have shown that the most important factor affecting customer loyalty in the service industry is service quality. However, scholars' views vary on whether service quality has a direct or indirect effect on customer loyalty. Some studies suggest that service quality has a direct and fundamental influence on customer loyalty (Bai and Liu, 2002). Others have shown that service quality not only directly affects customer loyalty but also has an indirect impact on customer loyalty by influencing customer satisfaction and perceived value (Cronin, Brady, and Hult, 2000). Currently, there are few domestic articles that specifically address the relationship between service quality and customer loyalty in the mobile communication industry. Moreover, research has studied customer loyalty as a single variable, rather than breaking it down further into multiple dimensions. Based on this analysis, this paper summarizes previous study results, establishes an effect mechanism model among service quality, customer satisfaction, and customer loyalty in the mobile communication industry, and presents a statistical test of the model assumptions using customer survey data from Heilongjiang Mobile Company. It provides theoretical guidance for mobile service management based on the discussion of the hypothesis test results. For data collection, the sample comprised mobile users in Harbin city, and the survey was conducted by random sampling. Out of a total of 300 questionnaires, 276 (92.9%) were recovered. After excluding invalid questionnaires, 249 remained, for an effective rate of 82.6 percent for the study. Cronbach's ${\alpha}$ coefficient was adopted to assess the scale reliability, and validity testing was conducted on the questionnaire from three aspects: content validity, construct validity, and convergent validity. The study tested for goodness of fit mainly using the absolute and relative fit indexes.
From the hypothesis testing results, overall, four assumptions were not supported. The final relationship among service quality, customer satisfaction, and customer loyalty is shown in Figure 2. On the whole, the service quality of the communication industry not only has a direct, significant positive effect on customer loyalty, it also has an indirect, significant positive effect on customer loyalty through customer satisfaction; the mechanism and extent of the effect on customer loyalty differ for each dimension of service quality. This study used questionnaires from existing domestic and international literature and tested them in empirical research, with all questions adapted to seven-point Likert scales. With the SERVQUAL scale of Parasuraman, Zeithaml, and Berry (1988), or PZB, as a reference point, service quality was divided into five dimensions (tangibility, reliability, responsiveness, assurance, and empathy), and the questions were reduced to nineteen. The measurement of customer satisfaction was based mainly on Fornell (1992) and Wang and Han (2003), ending up with four questions. Three indicators (price tolerance, first choice, and complaint reaction) were used to measure attitudinal loyalty, while repurchase intention, recommendation, and reputation measured behavioral loyalty. The collection and collation of literature data produced a model of the relationship among service quality, customer satisfaction, and customer loyalty in mobile communications, and China Mobile in the city of Harbin in Heilongjiang province was used to test the model empirically and obtain some useful conclusions. First, service quality in mobile communication is formed by the five factors mentioned earlier: tangibility, reliability, responsiveness, assurance, and empathy.
On the basis of PZB SERVQUAL, the study designed a measurement scale of service quality for the mobile communications industry, and obtained these five factors through exploratory factor analysis. The factors basically match the five elements, supporting the five-element concept of service quality for the mobile communications industry. Second, service quality in mobile communications has both direct and indirect positive effects on attitudinal loyalty, with the indirect effect being produced through the intermediary variable, customer satisfaction. There are also both direct and indirect positive effects on behavioral loyalty, with the indirect effect produced through two intermediary variables: customer satisfaction and attitudinal loyalty. This shows that better service quality and higher customer satisfaction make customers' attitudes toward service providers more positive and make loyal behavior toward them more likely. In addition, the effect mechanism of each dimension of service quality on each dimension of customer loyalty is different. Third, customer satisfaction plays a significant intermediary role between service quality and both attitudinal and behavioral loyalty, indicating that improving service quality can boost customer satisfaction and make it easier for satisfied customers to become loyal customers. Moreover, attitudinal loyalty plays a significant intermediary role between service quality and behavioral loyalty, indicating that only attitudinally and behaviorally loyal customers are truly loyal customers. The research conclusions have some implications for Chinese telecom operators and others seeking to upgrade their service quality. Two limitations to the study are also mentioned. First, all data were collected in the Heilongjiang area, so there might be a common method bias that skews the results.
Second, the discussion addresses the relationship between service quality and customer loyalty, setting customer satisfaction as the mediator, but does not consider other factors, such as customer value and consumer features. This research will be continued in the future.
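For readers unfamiliar with the reliability check mentioned in this abstract, Cronbach's ${\alpha}$ for a block of Likert items can be computed as in the sketch below. The formula is the standard one; the answer matrix is a made-up example, not the Harbin survey data.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents, items) matrix of item scores:
    alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances / total_variance)

# Hypothetical seven-point answers: 4 respondents x 4 perfectly consistent items.
demo = np.tile(np.array([[1], [3], [5], [7]]), (1, 4))
alpha_demo = cronbach_alpha(demo)
print(alpha_demo)
```

Items that move together perfectly give the maximum value of 1; real scales fall below that.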

Prevalence and Related Factors of Knee Osteoarthritis in Rural Women (농촌여성의 무릎 골관절염 유병률 및 관련요인)

  • Seo, Joong-Hwan;Kang, Pock-Soo;Lee, Kyeong-Soo;Yun, Sung-Ho;Hwang, Tae-Yoon;Park, Jong-Seo
    • Journal of agricultural medicine and community health / v.30 no.2 / pp.167-182 / 2005
  • Objectives: This study was performed to investigate the prevalence of knee osteoarthritis in rural women according to the diagnostic criteria for knee osteoarthritis, and the factors related to this disease. Methods: Data were obtained from 200 women older than 40 years of age residing in 5 Ri's in Goryeong-gun, Gyeongsangbuk-do, selected by random cluster sampling from September to October 2002. Knee osteoarthritis was determined positive according to the Kellgren and Lawrence classification and knee pain. Results: Among these subjects, 71.0% showed more than grade 2 in radiologic findings, and the rate of knee pain according to the survey was 67.0%. The rate of subjects meeting the criteria of knee osteoarthritis was 54.0%. According to univariate analysis, the prevalence of knee osteoarthritis increased with age and was significantly higher, at 58.9%, among women who farmed or worked in household industry than among others. The prevalence of knee osteoarthritis showed a significant relationship with family history, past history of knee injury and knee surgery (p<0.01), and diabetes mellitus (p<0.05). The ADL score was significantly different in the subjects with knee osteoarthritis compared with the normal group (p<0.05). When the presence of knee osteoarthritis was compared with the length of time spent living a floor-sitting lifestyle, a significant difference was found between the osteoarthritis group and the normal group. As for metabolic factors, the blood sugar level, bone density, and body mass index (BMI) were significantly different in the osteoarthritis group compared with the normal group. When multiple logistic regression analysis was performed with the presence of knee osteoarthritis as the dependent variable, the prevalence of knee osteoarthritis was significantly affected by older age, farming or working in household industry, a history of knee injury, a history of surgery, higher blood sugar level, and higher BMI.
Conclusions: These subjects need an intervention through self-care programs such as exercise for preventing osteoarthritis, weight control programs, other exercise programs strengthening knee joints, and guidelines when working in vinyl houses.

The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms (다중 클래스 데이터셋의 메타특징이 판별 알고리즘의 성능에 미치는 영향 연구)

  • Kim, Jeonghun;Kim, Min Yong;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.23-45 / 2020
  • Big data is being created in a wide variety of fields such as medical care, manufacturing, logistics, sales sites, and SNS, and dataset characteristics are also diverse. In order to secure the competitiveness of companies, it is necessary to improve decision-making capacity using classification algorithms. However, most companies do not have sufficient knowledge of what kind of classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm is appropriate for the characteristics of a dataset has been a task that requires expertise and effort. This is because the relationship between the characteristics of datasets (called meta-features) and the performance of classification algorithms has not been fully understood. Moreover, there has been little research on meta-features reflecting the characteristics of multi-class datasets. Therefore, the purpose of this study is to empirically analyze whether meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. In this study, meta-features of multi-class datasets were grouped into two factors (data structure and data complexity), and seven representative meta-features were selected. Among those, we included the Herfindahl-Hirschman Index (HHI), originally a market concentration measurement index, in the meta-features to replace IR (Imbalanced Ratio). We also developed a new index called the Reverse ReLU Silhouette Score and added it to the meta-feature set. From the UCI Machine Learning Repository, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality (red), Contraceptive Method Choice) were selected. The classes of each dataset were predicted using the classification algorithms selected in the study (KNN, Logistic Regression, Naïve Bayes, Random Forest, and SVM). For each dataset, we applied the 10-fold cross validation method.
Oversampling of 10% to 100% was applied to each fold, and the meta-features of the dataset were measured. The meta-features selected are HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, and Hub Score. F1-score was selected as the dependent variable. The results showed that six meta-features, including the Reverse ReLU Silhouette Score and HHI proposed in this study, have a significant effect on classification performance. (1) The meta-feature HHI proposed in this study had a significant effect on classification performance. (2) The number of variables has a significant effect on classification performance and, unlike the number of classes, the effect is positive. (3) The number of classes has a negative effect on classification performance. (4) Entropy has a significant effect on classification performance. (5) The Reverse ReLU Silhouette Score also significantly affects classification performance at a significance level of 0.01. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. In addition, the results of the analysis by classification algorithm were also consistent. In the regression analysis by classification algorithm, the number of variables does not have a significant effect for the Naïve Bayes algorithm, unlike the other classification algorithms. This study makes two theoretical contributions: (1) two new meta-features (HHI, Reverse ReLU Silhouette Score) were shown to be significant; (2) the effects of data characteristics on classification performance were investigated using meta-features. The practical contributions are as follows. (1) The results can be utilized in developing a classification algorithm recommendation system according to the characteristics of datasets. (2) Because the characteristics of data differ, many data scientists repeatedly test and adjust algorithm parameters to find the optimal algorithm for the situation. In this process, hardware, cost, time, and manpower are wasted excessively. This study is expected to be useful for machine learning and data mining researchers, practitioners, and developers of machine learning-based systems. This study consists of an introduction, related research, research model, experiment, and conclusion and discussion.
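To make the HHI meta-feature concrete, the sketch below computes it from class labels as the sum of squared class shares, the usual definition of the index. Treating class shares as "market shares" is how this abstract describes the idea; the function names and toy labels are hypothetical, and the imbalance ratio shown for comparison uses the common max/min class-count definition, which the abstract does not spell out.

```python
from collections import Counter

def herfindahl_hirschman_index(labels):
    """HHI of a class distribution: sum of squared class proportions.
    Ranges from 1/k (k perfectly balanced classes) up to 1 (a single class)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sum((count / total) ** 2 for count in counts.values())

def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)

balanced = ["a", "b", "c", "d"] * 25          # four equal classes
skewed = ["a"] * 85 + ["b"] * 5 + ["c"] * 5 + ["d"] * 5

print(herfindahl_hirschman_index(balanced), imbalance_ratio(balanced))
print(herfindahl_hirschman_index(skewed), imbalance_ratio(skewed))
```

Unlike the imbalance ratio, which looks only at the largest and smallest classes, HHI uses every class share, which is one plausible reason to prefer it for multi-class data.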

Real Option Analysis to Value Government Risk Share Liability in BTO-a Projects (손익공유형 민간투자사업의 투자위험분담 가치 산정)

  • KU, Sukmo;LEE, Sunghoon;LEE, Seungjae
    • Journal of Korean Society of Transportation / v.35 no.4 / pp.360-373 / 2017
  • BTO-a projects are a type of PPP project in Korea that carries demand risk. When demand risk is realized, the private investor encounters financial difficulties because revenue falls below expectations, and the government may also have difficulty operating the infrastructure stably. In this regard, the government has applied various risk sharing policies in response to demand risk. However, the amount of the government's risk sharing is a contingent liability of the government arising from demand uncertainty, and it cannot be quantified by the conventional NPV method used in the text of the concession agreement. The purpose of this study is to estimate the value of investment risk sharing by the government, considering the demand risk in the profit sharing system (BTO-a) introduced in 2015 as one of the demand risk sharing policies. The investment risk sharing takes the form of an option in finance. Private investors have the right to claim subsidies from the government when their revenue declines, while the government has the obligation to pay subsidies under certain conditions. In this study, we established a methodology for estimating the value of investment risk sharing using the Black-Scholes option pricing model and examined the appropriateness of the results through case studies. As a result of the analysis, the value of investment risk sharing is estimated to be 12 billion won, which is about 4% of the private investment cost. In other words, the government can be seen as providing 12 billion won in financial support by sharing the investment risk. When the traffic volume risk is treated as a random variable in the case studies, the option value has a mean of 12.2 billion won and a standard deviation of 3.67 billion won. Based on the cumulative distribution, the option value falls within the range of 6.9 to 18.8 billion won with 90% probability. The method proposed in this study is expected to help the government and private investors better understand the risk analysis and economic value of investment risk sharing under the uncertainty of future demand.
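A right to claim a subsidy when revenue falls short resembles a put option, so a generic Black-Scholes put is sketched below for orientation. This is the textbook European formula, not the paper's calibrated model: mapping `spot` to projected revenue and `strike` to a guaranteed revenue level is an assumption made here for illustration, and the numeric inputs are a standard textbook case rather than values from the study.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes(spot, strike, rate, sigma, maturity):
    """Return (call, put) prices of European options under Black-Scholes."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    discount = math.exp(-rate * maturity)
    call = spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    put = strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)
    return call, put

# Textbook inputs (hypothetical): at-the-money, 5% rate, 20% volatility, 1 year.
call, put = black_scholes(spot=100.0, strike=100.0, rate=0.05, sigma=0.2, maturity=1.0)
print(round(call, 4), round(put, 4))
```

Drawing the volatility or revenue input at random and re-pricing many times would give a distribution of option values, which is the kind of mean, standard deviation, and 90% range the abstract reports.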

Genetic Analysis of Carcass Traits in Hanwoo with Different Slaughter End-points (세가지 도축 종료 시점을 공변량으로 하는 한우 도체형질에 대한 유전능력 분석모형)

  • Choy, Y.H.;Yoon, H.B.;Choi, S.B.;Chung, H.W.
    • Journal of Animal Science and Technology / v.47 no.5 / pp.703-710 / 2005
  • Data from Hanwoo steers and bull calves were analyzed to examine the phenotypic and genetic relationships between carcass traits under four different covariance models. The four models fit test station and test period as fixed effects of contemporary group and sire as a random effect, assuming paternal half-sib relationships among animals. Each model fits one of the linear covariates for different slaughter end points: age at slaughter in the first order, age at slaughter in the first and second order, slaughter weight, or back fat thickness at the 12-13th rib of the cold carcass. Age at slaughter in its second order was not significant. Age at slaughter accounted for a significant amount of the genetic variances and covariances of carcass traits. Heritability estimates of back fat thickness, rib eye area, carcass weight, marbling score, and dressing percentage were 0.34, 0.22, 0.24, 0.42, and 0.18, respectively, on a constant age basis. The genetic correlations between carcass weight and the other variables were all positive and low to high in magnitude. Genetic correlations between back fat thickness and rib eye area and between marbling score and dressing percentage were low but negative. The variance and covariance structure between these traits shifted to a great extent when these variables were regressed on slaughter weight or on back fat thickness. These two covariates counteracted each other, but they adjusted each carcass variable or their interrelationship according to the differential growth of body components: bone, muscle, and fat. Slaughter weight tended to decrease the genetic variances and covariances of carcass weight and between component traits, whereas back fat thickness tended to increase those of rib eye area and between rib eye area and carcass weight.
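As background for the heritability figures, the conventional sire-model estimators are given here. The abstract does not state which estimators were used, so this is the textbook form under the paternal half-sib assumption, where the sire variance $\sigma_s^2$ captures one quarter of the additive genetic variance and $\sigma_e^2$ is the residual variance:

$$h^2 = \frac{4\,\sigma_s^2}{\sigma_s^2 + \sigma_e^2}, \qquad r_g(x, y) = \frac{\sigma_{s(x,y)}}{\sqrt{\sigma_{s(x)}^2\,\sigma_{s(y)}^2}}$$

Here $\sigma_{s(x,y)}$ is the sire covariance between traits $x$ and $y$. Because the covariate in each model changes the estimated sire and residual components, both $h^2$ and $r_g$ can shift with the choice of slaughter end point, as the abstract reports.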

Wind Stability of Commercially Important Tree Species and Silvicultural Implications, Daegwallyeong Korea (대관령 지역 경제림에 대한 내풍 안정성 분석 및 임업적 적용)

  • Moktan, Mani Ram;Kwon, Jino;Lim, Joo-Hoon;Shin, Moon-Hyun;Park, Chan-Woo;Bae, Sang-Won
    • Korean Journal of Agricultural and Forest Meteorology / v.17 no.1 / pp.58-68 / 2015
  • This study compares the wind stability of Larix kaempferi (Lamb.) Carr., Pinus koraiensis Sie. & Zucc. and Abies holophylla Maxim. to understand and inform wind risk management of these plantation trees at Daegwallyeong, Korea. Temporary square plots of $20m{\times}20m$ ($400m^2$) were laid out, and DBH (Diameter at Breast Height) and height for trees greater than 10 cm in DBH were measured by species. A total of 15 plots, with 5 plots each in L. kaempferi, P. koraiensis and A. holophylla stands, were sampled at random. Among the species, A. holophylla and P. koraiensis have comparatively lower h/d (Height/DBH) ratios than L. kaempferi. These results indicate that the former two species are more wind firm than the latter species. About 9% of the L. kaempferi trees have h/d ratios higher than the critical threshold limit of 80. These trees are vulnerable to wind damage and should be removed in the next thinning regime. The analysis of variance detected significant differences (p < 0.05) in the h/d ratios and the Gini coefficient, indicating species differences and DBH size variation, respectively. The Gini coefficient was 16.4% in A. holophylla, 15.9% in P. koraiensis and 14% in L. kaempferi stands, indicating limited DBH size variation. The lower h/d ratios are attributed to thinning in these stands and to tree morphological differences. To increase wind firmness, low thinning should concentrate on removing trees with an h/d ratio above 80, timed to coincide with the stand distinction phase. Forest managers and practitioners should measure and maintain the h/d ratios of trees below the critical threshold limit of 80 through stand density management. A variable density thinning approach should be tested to increase tree DBH sizes in the even-aged stands.
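The two stand metrics used in this abstract can be sketched as follows. The assumptions are that the h/d ratio is height divided by DBH in the same length unit, and that the Gini coefficient is the usual inequality index applied to DBH values (returned as a fraction, where the abstract reports percentages). The tree measurements are invented for illustration.

```python
def hd_ratio(height_m, dbh_cm):
    """Slenderness ratio: tree height over DBH, both expressed in metres."""
    return height_m / (dbh_cm / 100.0)

def gini(values):
    """Gini coefficient of non-negative values (0 = all equal), computed from
    the sorted-rank formula G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n."""
    xs = sorted(values)
    n = len(xs)
    ranked_sum = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * ranked_sum / (n * sum(xs)) - (n + 1) / n

# Hypothetical plot: (height in m, DBH in cm) per tree.
trees = [(20.0, 25.0), (22.0, 24.0), (18.0, 30.0), (21.0, 20.0)]
ratios = [hd_ratio(h, d) for h, d in trees]
at_risk = [tree for tree, r in zip(trees, ratios) if r > 80]  # thinning candidates
dbh_gini = gini([d for _, d in trees])
print(ratios, at_risk, round(dbh_gini, 3))
```

A tree exactly at the threshold (20 m tall, 25 cm DBH) has a ratio of 80 and is not flagged; only ratios above 80 mark a tree as vulnerable in the sense used above.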

Kriging of Daily PM10 Concentration from the Air Korea Stations Nationwide and the Accuracy Assessment (베리오그램 최적화 기반의 정규크리깅을 이용한 전국 에어코리아 PM10 자료의 일평균 격자지도화 및 내삽정확도 검증)

  • Jeong, Yemin;Cho, Subin;Youn, Youjeong;Kim, Seoyeon;Kim, Geunah;Kang, Jonggu;Lee, Dalgeun;Chung, Euk;Lee, Yangwon
    • Korean Journal of Remote Sensing / v.37 no.3 / pp.379-394 / 2021
  • Air pollution data in South Korea has been provided on a real-time basis by Air Korea stations since 2005. Previous studies have shown the feasibility of gridding air pollution data, but they were confined to a few cities. This paper examines the creation of nationwide gridded maps of PM10 concentration using 333 Air Korea stations with variogram optimization and ordinary kriging. The accuracy of the spatial interpolation was evaluated with various sampling schemes to avoid a too dense or too sparse distribution of the validation points. Using 114,745 matchups, a four-round blind test was conducted by extracting random validation points for each of the 365 days in 2019. The overall accuracy was stably high, with an MAE of 5.697 μg/m³ and a CC of 0.947. Approximately 1,500 cases of high PM10 concentration also showed an MAE of about 12 μg/m³ and a CC over 0.87, which means that the proposed method was effective and applicable to various situations. The gridded maps of daily PM10 concentration at a resolution of 0.05° also showed a reasonable spatial distribution, and they can be used as an input variable for a gridded prediction of tomorrow's PM10 concentration.
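The sketch below shows the ordinary kriging step and the two accuracy measures (MAE and correlation coefficient) in their simplest form. It assumes a fixed exponential variogram with no nugget and planar coordinates. The variogram optimization described in the abstract is not reproduced, and the station layout and PM10 values are invented.

```python
import numpy as np

def exp_variogram(h, sill, range_):
    """Exponential variogram without nugget, so gamma(0) = 0."""
    return sill * (1.0 - np.exp(-np.asarray(h) / range_))

def ordinary_kriging(coords, values, target, sill=1.0, range_=1.0):
    """Solve the ordinary kriging system [[Gamma, 1], [1^T, 0]] [w, mu] = [gamma_0, 1]
    and return the estimate at `target` together with the weights w."""
    n = len(values)
    pair_dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    system = np.ones((n + 1, n + 1))
    system[:n, :n] = exp_variogram(pair_dist, sill, range_)
    system[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = exp_variogram(np.linalg.norm(coords - target, axis=1), sill, range_)
    weights = np.linalg.solve(system, rhs)[:n]
    return float(weights @ values), weights

def mae_and_cc(observed, predicted):
    """Mean absolute error and Pearson correlation between two series."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    mae = float(np.mean(np.abs(observed - predicted)))
    cc = float(np.corrcoef(observed, predicted)[0, 1])
    return mae, cc

# Hypothetical stations on a unit square with made-up PM10 readings.
stations = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
pm10 = np.array([30.0, 40.0, 50.0, 60.0])

center_estimate, center_weights = ordinary_kriging(stations, pm10, np.array([0.5, 0.5]))
at_station, _ = ordinary_kriging(stations, pm10, stations[0])
print(center_estimate, center_weights, at_station)
```

The weights always sum to one, and with no nugget the estimate at a station reproduces that station's reading. A blind test holds stations out, predicts them from the rest, and then compares the two series with `mae_and_cc`.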

Preference and Loyalty Evaluation Using Sentiment Analysis for Promotion and Consumption Expansion of Paprika (감성분석을 이용한 파프리카 소비 확대와 홍보를 위한 선호도와 충성도 평가)

  • Jang, Hye Sook;Lee, Jung Sup;Bang, Ji Wong;Lee, Jae Han
    • Journal of Bio-Environment Control / v.31 no.4 / pp.343-355 / 2022
  • This study investigated the consumption tendency and awareness of paprika in order to expand and promote the consumption of Capsicum annuum L. It examined the relationship between preference and loyalty based on emotional responses to paprika measured on the semantic differential scale. The survey was conducted from January to February 2022 using a random sampling method targeting 155 members of the general public, and a total of 142 questionnaires were analyzed after excluding 13 invalid responses. Factor analysis showed that the nine items on awareness of paprika consisted of three factors: 'Food taste', 'Usability', and 'Economics'. Regarding awareness of paprika, the positive answer 'I think paprika is good for health' was the highest among the nine questions at 92.3%. In terms of shape, the blocky type was the most preferred, followed by the mini and conical types (p < 0.001). As for color preference, yellow paprika was the most preferred, followed by orange, red, and green, showing statistical significance. The emotional response to paprika images showed a statistically significant difference among the four colors. Words such as 'bright', 'clean', and 'spirited' appeared as representative emotional vocabulary for paprika. Multiple regression analysis was performed to examine how the three awareness factors, preference, and quality of life relate to paprika awareness and loyalty. As a result, the higher the paprika preference and quality of life, and the higher the taste and availability factors, the higher the paprika awareness and loyalty. Among the variables influencing the loyalty of the survey respondents, preference had the highest explanatory power at 43%. From these results, preference for the shape and color of paprika was judged to be a very important factor.
Therefore, the recent increase in awareness that paprika is good for health is thought to act as a positive factor in revitalizing the domestic market and increasing consumption of paprika in the future. Also, among the three types of paprika, the yellow blocky type showed the highest preference. Therefore, in order to produce and promote this type of paprika, it is also important to expand cultivation to suit the purchasing propensity of consumers.