• Title/Summary/Keyword: variable selection

Hail Risk Map based on Multidisciplinary Data Fusion (다학제적 데이터 융합에 기초한 우박위험지도)

  • Suhyun, Kim;Seung-Jae, Lee;Kyo-Moon, Shim
    • Korean Journal of Agricultural and Forest Meteorology / v.24 no.4 / pp.234-243 / 2022
  • In Korea, hail damage occurs every year, and in agriculture it causes severe losses of field crops and cultivation facilities. It is therefore necessary to develop a hail information service system tailored to Korea's main production and crop-growing areas in order to minimize hail damage. However, hail is harder to observe than other meteorological variables, and the available data vary in spatial and temporal coverage. A hail information service system was developed to understand the temporal and spatial distribution of hail occurrence. As part of this work, a hail observation database was established that integrated observation data from the Korea Meteorological Administration with information from newspaper reports, and a hail risk map was produced from this database. The risk map presents the nationwide distribution and characteristics of hail showers from 1970 to 2018 and shows that the northeastern region of South Korea is relatively hazardous. Overall, hail occurred nationwide, especially in the northeast and some inland areas (Gangwon, Gyeongbuk, and Chungbuk provinces); in winter, it occurred mainly as graupel (small, soft hail) on the north coast and in some inland areas. An analysis of the time of day, frequency, and hailstone size of hail showers by region revealed that the incidence of large hailstones (e.g., 10 cm at Damyang-gun) has increased in recent years and that showers occurred mainly in the afternoon, when updrafts are well developed. By integrating multidisciplinary data, the temporal and spatial gaps in hail data could be filled. The hail risk map produced in this study will be helpful for selecting suitable crops and growth management strategies under changing climate conditions.
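The fusion step described above (merging KMA observations with newspaper reports into one hail database, then mapping relative frequency by region) can be sketched as follows. This is a minimal illustration under stated assumptions: each record is reduced to a hypothetical `(date, region)` pair, an event reported by both sources is counted once, and the risk index is simply each region's count divided by the largest count. The study's actual database schema and risk formula are not given in the abstract.

```python
from collections import Counter

# Hypothetical record format: (date, region). A real database would carry more
# fields (hailstone size, time of day, source); this schema is illustrative only.
Record = tuple[str, str]


def fuse_hail_records(kma: list[Record], news: list[Record]) -> set[Record]:
    """Union of both sources; an event reported by both is counted once."""
    return set(kma) | set(news)


def hail_frequency_by_region(events: set[Record]) -> Counter:
    """Number of distinct hail events per region."""
    return Counter(region for _, region in events)


def relative_risk(freq: Counter) -> dict[str, float]:
    """Assumed risk index: each region's count divided by the largest count."""
    top = max(freq.values())
    return {region: count / top for region, count in freq.items()}
```

Deduplicating on a shared key is what lets the newspaper archive fill gaps in the observation record without double-counting events that both sources captured.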

Estimation of Spatial Distribution Using the Gaussian Mixture Model with Multivariate Geoscience Data (다변량 지구과학 데이터와 가우시안 혼합 모델을 이용한 공간 분포 추정)

  • Kim, Ho-Rim;Yu, Soonyoung;Yun, Seong-Taek;Kim, Kyoung-Ho;Lee, Goon-Taek;Lee, Jeong-Ho;Heo, Chul-Ho;Ryu, Dong-Woo
    • Economic and Environmental Geology / v.55 no.4 / pp.353-366 / 2022
  • Spatial estimation of geoscience data (geo-data) is challenging because of spatial heterogeneity, data scarcity, and high dimensionality, so a new spatial estimation method that accounts for these characteristics is needed. In this study, we proposed applying the Gaussian Mixture Model (GMM), a machine learning algorithm, to multivariate data for robust spatial prediction. The performance of the proposed approach was tested on soil chemical concentration data from a former smelting area. The concentrations of As and Pb determined by ex-situ ICP-AES were the primary variables to be interpolated, while the other metal concentrations determined by ICP-AES and all data obtained by in-situ portable X-ray fluorescence (PXRF) were used as auxiliary variables in GMM and ordinary cokriging (OCK). Among the multidimensional auxiliary variables, important variables were selected using a random-forest-based variable selection method. Compared with ordinary kriging and OCK using univariate or bivariate data, GMM with the important multivariate auxiliary data decreased the root mean-squared error (RMSE) to 0.11 for As and 0.33 for Pb and increased the correlation coefficient (r) to 0.31 for As and 0.46 for Pb. The use of GMM thus improved the spatial interpretation of anthropogenic metals in soil, and the multivariate spatial approach can be applied to understand complex and heterogeneous geological and geochemical features.
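One way a GMM can serve as a predictor with auxiliary data is Gaussian mixture regression: fit a mixture over the joint (auxiliary, primary) space, then predict the primary value as its conditional mean given the auxiliary value. The sketch below assumes a single auxiliary variable and component parameters already estimated (for example by EM, not shown); the abstract does not specify the authors' exact formulation, and the names `Component` and `gmr_predict` are hypothetical. It also includes the RMSE and Pearson r used for the comparison.

```python
import math
from typing import NamedTuple


class Component(NamedTuple):
    """One bivariate Gaussian component over (x = auxiliary, y = primary)."""
    weight: float  # mixing proportion
    mx: float      # mean of the auxiliary variable
    my: float      # mean of the primary variable
    vxx: float     # variance of the auxiliary variable
    vxy: float     # covariance between auxiliary and primary variables


def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmr_predict(x: float, components: list[Component]) -> float:
    """Conditional mean E[y | x] of a bivariate Gaussian mixture."""
    # Responsibility of each component for the observed auxiliary value.
    resp = [c.weight * normal_pdf(x, c.mx, c.vxx) for c in components]
    total = sum(resp)
    # Within one component, E[y | x] is a linear regression of y on x.
    local = [c.my + c.vxy / c.vxx * (x - c.mx) for c in components]
    return sum(r / total * m for r, m in zip(resp, local))


def rmse(observed: list[float], predicted: list[float]) -> float:
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def pearson_r(a: list[float], b: list[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)
```

Because each component carries its own local regression, the mixture can follow relationships between auxiliary and primary variables that change across sub-populations, which a single global linear model (as in cokriging's linear combination) cannot.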

Segregation Mode of Plant Height in Crosses of Rice Cultivars Ⅸ. Crosses between Semi-dwarf Japonicas and Semi-dwarf (d-t) Gene Testers (수도 품종간 교잡에 있어서 간장의 유전분리 Ⅸ. 단간 Japonica 품종과 Semi-dwarf (d-t) gene 검정친과의 조합)

  • Kim, Yong-Kwon;Kim, Hong-Yeol;Nam, Yeong-Woo;Park, Sun-Zik;Heu, Mun-Hue
    • KOREAN JOURNAL OF CROP SCIENCE / v.30 no.4 / pp.449-454 / 1985
  • To search for semi-dwarf japonica varieties allelic to the semi-dwarf rice cultivar controlled by the d-t gene, seven semi-dwarf japonica varieties (Reimei, Hoyoku, Shiranui, Kokumasari, M 7, S.224, and S.295) were crossed with the semi-dwarf cultivar wx 817, which is known to carry the semi-dwarf gene d-t. The F$_1$, F$_2$, and F$_3$ generations were grown in 1984 and 1985, and culm lengths were measured at harvest. The results are summarized as follows. 1. The F$_2$ populations of all 7 cross combinations showed a normal distribution with no segregation. 2. The range of culm-length variation in the F$_3$ differed among cross combinations, but the general pattern was similar in all 7 crosses. 3. The F$_3$ line means were similar to the means of their parental F$_2$ plants, which had been selected into short, medium, and tall groups, and showed no segregation, implying that selection in the F$_2$ was effective. 4. From the F$_2$ and F$_3$ segregation results, it is concluded that the culm length of the 7 semi-dwarf japonicas tested here is controlled by the same major gene, d-t, although it is modified by different minor genes.

A Study on Social Issues and Consumption Behavior Using Big Data (빅데이터를 활용한 사회적 이슈와 소비행동 연구)

  • Baek, Seung-Heon;Kim, Gi-Tak
    • Journal of Korea Entertainment Industry Association / v.13 no.8 / pp.377-389 / 2019
  • This study conducted a social network big data analysis to investigate consumers' perceptions of Japanese sporting goods in connection with the boycott of Japanese products, and to extract problems and variables from these perceptions. The analysis covered two topics, "Japanese boycott" and "Japanese sporting goods". Months of data were collected and examined. The research procedure was as follows: identifying current issues; setting keywords through social network analysis; clustering by CONCOR analysis using the TEXTOM and Ucinet 6 programs; selecting variables through expert meetings; preparing and administering the questionnaire; verifying the validity and reliability of the questionnaire; and testing hypotheses with structural equation modeling. Based on the social network big data, four variables were extracted: boycott-related characteristics, nationality, attitude, and consumption behavior. A total of 30 items and 292 questionnaires were used for the final hypothesis testing. The analysis showed, first, that the boycott-related characteristics had a positive relationship with nationality. Specifically, all of the boycott-related characteristics (necessity of the boycott, sense of the boycott, and perceived benefits of the boycott) were positively related to nationality. In addition, nationality had a positive relationship with consumption behavior.
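CONCOR (convergence of iterated correlations), used above to cluster keywords, repeatedly correlates the rows of a matrix until every entry approaches +1 or -1, then splits the nodes into two blocks. Below is a minimal sketch of one split, assuming a square keyword co-occurrence matrix whose rows all have nonzero variance; TEXTOM and Ucinet 6 may differ in details such as the stopping rule and how further splits are made. The function names are hypothetical.

```python
import math


def _pearson(a: list[float], b: list[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)  # assumes no row has zero variance


def correlate_rows(matrix: list[list[float]]) -> list[list[float]]:
    return [[_pearson(r1, r2) for r2 in matrix] for r1 in matrix]


def concor_split(matrix, max_iter=100, tol=1e-9):
    """One CONCOR split of the nodes (rows) of a square co-occurrence matrix."""
    corr = correlate_rows(matrix)
    for _ in range(max_iter):
        # Stop once every correlation has converged to +1 or -1.
        if all(abs(abs(v) - 1.0) < tol for row in corr for v in row):
            break
        corr = correlate_rows(corr)
    # Nodes positively correlated with node 0 form one block; the rest form the other.
    block_a = [i for i, v in enumerate(corr[0]) if v > 0]
    block_b = [i for i, v in enumerate(corr[0]) if v <= 0]
    return block_a, block_b
```

Applying `concor_split` again to the sub-matrix of each block gives the hierarchical partition that CONCOR tools usually report.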

The Effect on the Switching Intention to the Blockchain-based Supply Chain Management Information System (블록체인 기반 공급망관리 정보시스템으로의 전환의도에 영향을 미치는 요인)

  • Kyoung Sang Oh;Dong Myung Lee
    • Journal of Industrial Convergence / v.20 no.12 / pp.11-25 / 2022
  • This study aims to verify the factors that affect the intention to switch to a blockchain-based supply chain management information system. To this end, variables were selected and a research model was constructed through a review of previous studies, and an empirical analysis was conducted using the TOE framework and the PPM model. The effects of push and pull factors on the intention to switch to the blockchain system, and the moderating effect of switching cost as a mooring factor, were verified. The hypotheses were tested with a structural equation model on 320 responses from a questionnaire survey of small and medium-sized enterprises in Korea. The results show that social influence, a push factor, and management's will to innovate, a pull factor, had a significant effect on switching intention. A moderating effect was also confirmed between the groups with high and low perceived switching cost. This study is significant in that it presents the concept and research direction of SCBM (supply chain & blockchain management), which can enhance a company's competitiveness through the implementation of a blockchain-based supply chain management information system.
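The abstract reports a moderating effect confirmed by comparing groups with high and low perceived switching cost within a structural equation model. As a much simpler, hypothetical analogue, the sketch below splits respondents at an assumed switching-cost threshold and compares the least-squares slope of switching intention on one predictor in each group; a clear difference between the two slopes is what a moderating effect looks like. It illustrates the idea only and is not a substitute for multigroup SEM.

```python
def ols_slope(x: list[float], y: list[float]) -> float:
    """Least-squares slope of y on x (simple regression)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def group_slopes(predictor, intention, switching_cost, threshold):
    """Slope of intention on predictor in the low- and high-switching-cost groups.

    `threshold` (e.g. the sample median) is an assumed cut point.
    """
    rows = list(zip(predictor, intention, switching_cost))
    low = [(p, y) for p, y, s in rows if s < threshold]
    high = [(p, y) for p, y, s in rows if s >= threshold]
    slope_low = ols_slope([p for p, _ in low], [y for _, y in low])
    slope_high = ols_slope([p for p, _ in high], [y for _, y in high])
    return slope_low, slope_high
```

A weaker slope in the high-cost group would mean that switching cost dampens the effect of the predictor on switching intention.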

Prediction of Dormant Customer in the Card Industry (카드산업에서 휴면 고객 예측)

  • DongKyu Lee;Minsoo Shin
    • Journal of Service Research and Studies / v.13 no.2 / pp.99-113 / 2023
  • In customer-based industries, customer retention underpins a company's competitiveness, and improving retention improves that competitiveness. Accurately predicting and managing potential dormant customers is therefore paramount. In particular, the domestic card industry has numerous competitors, and the government is introducing an automatic closing system for dormant cards. Because of these changes, the card industry must focus on better predicting and managing potential dormant cards, and predicting dormant customers more accurately is emerging as an important challenge. In this study, a Recurrent Neural Network (RNN) methodology was used to predict potential dormant customers in the card industry; in particular, Long Short-Term Memory (LSTM) was used to learn efficiently from long data sequences. In addition, the Unified Theory of Acceptance and Use of Technology (UTAUT), an integrated technology acceptance theory, was applied to redefine and group the variables the model uses to predict dormant customers. As a result, stable model accuracy and F1 scores were obtained, and the hit ratio showed that models using LSTM produce stable results compared with other algorithms. It was also found that the moderating effect of demographic information that can occur in UTAUT, as pointed out in previous studies, was absent. Therefore, among the variable selection models using UTAUT, the LSTM-based dormant customer prediction model was shown to produce unbiased, stable results. This study may make an academic contribution to dormant customer prediction by using LSTM algorithms, which learn well from time series data not previously used for this purpose.
In addition, because the model predicts dormancy ahead of the time it is actually observed, it shows that customers can be managed preemptively before they become dormant, and it is expected to contribute greatly to the industry.
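For readers unfamiliar with LSTM, the sketch below shows one step of a single-unit LSTM cell with scalar weights: the forget gate decides how much of the previous cell state is kept, which is what lets the model carry information across long sequences. The parameter layout (a hypothetical `(w, u, b)` triple per gate) and the example input ("monthly card usage") are illustrative assumptions, not the study's architecture. An F1 helper is included because the abstract reports F1 scores.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a single-unit LSTM cell.

    `params` maps each gate name ("i", "f", "o", "g") to an assumed
    (input weight, recurrent weight, bias) triple.
    """
    def pre(gate):
        w, u, b = params[gate]
        return w * x + u * h_prev + b

    i = sigmoid(pre("i"))    # input gate: how much new information to write
    f = sigmoid(pre("f"))    # forget gate: how much old cell state to keep
    o = sigmoid(pre("o"))    # output gate: how much of the cell state to expose
    g = math.tanh(pre("g"))  # candidate cell content
    c = f * c_prev + i * g   # long-term memory (cell state)
    h = o * math.tanh(c)     # short-term output (hidden state)
    return h, c


def run_lstm(sequence, params):
    """Feed a sequence (e.g. hypothetical monthly card usage) and return the final hidden state."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h


def f1_score(actual, predicted, positive=1):
    tp = sum(1 for a, p in zip(actual, predicted) if a == p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if p == positive and a != positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In practice the weights are vectors and matrices learned by backpropagation through time; the scalar version only makes the gating arithmetic visible.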

Logistic Regression-based Analysis of Factors in Elderly Pedestrian Traffic Accidents (로지스틱 회귀분석 기반 노인 보행자 교통사고 요인 분석)

  • Siwon Kim;Jeongwon Gil;Jaekyung Kwon;Jae seong Hwang;Choul ki Lee
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.23 no.2 / pp.15-31 / 2024
  • The characteristics of elderly pedestrian traffic accidents were identified in light of the situation of Korea's elderly population as the country enters a super-aged society. Elderly pedestrian traffic accidents were classified into a binary variable (serious or more severe vs. minor or less severe), and the relationship between the independent variables and this dependent variable was analyzed. Data on elderly pedestrian accidents over the past 10 years (2013 to 2022) were acquired from the Traffic Accident Analysis System (TAAS); data collection, processing, and variable selection were performed, followed by basic statistics and analysis by accident factor. Applying a logistic regression model yielded a total of 15 influencing variables, from which the variables with the greatest influence on the probability of a serious or more severe accident involving an elderly pedestrian were identified. Statistical tests were then performed to assess the goodness of fit of the logistic model, and a method for predicting accident probability with the constructed prediction model was presented.
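Once coefficients are estimated, the binary logistic model turns an accident's attributes into a probability of a serious-or-worse outcome, and the exponential of each coefficient is an odds ratio. A minimal sketch; the function names are hypothetical, and any coefficient values passed to them would be placeholders, since the abstract does not report the 15 estimated coefficients.

```python
import math


def predict_probability(intercept: float, coefs: list[float], x: list[float]) -> float:
    """P(serious-or-worse accident | x) under a fitted binary logistic model."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(coef: float) -> float:
    """Multiplicative change in the odds for a one-unit increase in a predictor."""
    return math.exp(coef)
```

For example, a coefficient of ln 2 on a dummy variable (such as night-time occurrence) would mean the odds of a serious accident double when that condition holds, with all other variables fixed.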

The Prediction of DEA based Efficiency Rating for Venture Business Using Multi-class SVM (다분류 SVM을 이용한 DEA기반 벤처기업 효율성등급 예측모형)

  • Park, Ji-Young;Hong, Tae-Ho
    • Asia pacific journal of information systems / v.19 no.2 / pp.139-155 / 2009
  • For the last few decades, many studies have tried to explore and unveil venture companies' success factors and unique features in order to identify the sources of their competitive advantages over rivals. Such venture companies, which generally make the best use of information technology, have tended to give investors high returns, and for this reason many of them are keen on attracting investors' attention. Investors generally make investment decisions by carefully examining the evaluation criteria of the alternatives. For them, credit rating information provided by international rating agencies such as Standard and Poor's, Moody's, and Fitch is a crucial source on such pivotal concerns as a company's stability, growth, and risk status. However, this information is generated only for companies issuing corporate bonds, not for venture companies. Therefore, this study proposes a method for evaluating venture businesses and presents empirical results using financial data of Korean venture companies listed on KOSDAQ in the Korea Exchange. In addition, this paper uses a multi-class SVM to predict the DEA-based efficiency ratings of venture businesses derived from the proposed method. Our approach sheds light on ways to locate efficient companies that generate high profits. Above all, to determine effective ways of evaluating a venture firm's efficiency, it is important to understand the major factors contributing to that efficiency. This paper therefore rests on two ideas for classifying which venture companies are more efficient: i) making DEA-based multi-class ratings for sample companies and ii) developing a multi-class SVM-based efficiency prediction model for classifying all companies.
First, Data Envelopment Analysis (DEA) is a non-parametric multiple-input, multiple-output efficiency technique that measures the relative efficiency of decision making units (DMUs) using a linear-programming-based model. It is non-parametric because it requires no assumption about the shape or parameters of the underlying production function. DEA has already been widely applied to evaluate the relative efficiency of DMUs; recently, a number of DEA-based studies have evaluated the efficiency of various types of companies, such as internet and venture companies, and it has also been applied to corporate credit ratings. In this study, we used DEA to sort venture companies into efficiency-based ratings. The Support Vector Machine (SVM), on the other hand, is a popular technique for solving data classification problems, and in this paper we employed it to classify the efficiency ratings of IT venture companies according to the DEA results. The SVM method was first developed by Vapnik (1995). As one of many machine learning techniques, SVM is grounded in statistical learning theory. It has shown good performance, especially in its generalization capacity in classification tasks, resulting in numerous applications in many areas of business. SVM is basically an algorithm that finds the maximum-margin hyperplane, which gives the maximum separation between classes; the support vectors are the training points closest to this hyperplane. If the classes cannot be separated linearly, a kernel function can be used: in the case of nonlinear class boundaries, the inputs are mapped from the original input space into a high-dimensional dot-product feature space. Many studies have applied SVM to bankruptcy prediction, financial time-series forecasting, and credit rating estimation. In this study, we employed SVM to develop a data-mining-based efficiency prediction model.
We used the Gaussian radial basis function as the SVM kernel. For multi-class SVM, we compared three approaches: the one-against-one approach, which combines binary classifiers, and two all-together methods proposed by Weston and Watkins (1999) and Crammer and Singer (2000), respectively. We used corporate information on 154 companies listed on the KOSDAQ market of the Korea Exchange, obtaining their 2005 financial information from KIS (Korea Information Service, Inc.). Using these data, we made multi-class ratings with DEA efficiency and built a data-mining-based multi-class prediction model. Among the three multi-classification methods, the Weston and Watkins method achieved the best hit ratio on the test data set. In multi-class problems such as efficiency ratings of venture businesses, when it is difficult to identify the exact class in the actual market, it is very useful for investors to know whether a prediction is off by only one class. We therefore presented accuracy within a one-class error, and the Weston and Watkins method showed 85.7% accuracy on the test samples. We conclude that the DEA-based multi-class approach for venture businesses provides more information than the binary classification problem, regardless of efficiency level. We believe this model can help investors make decisions, as it provides a reliable tool for evaluating venture companies in the financial domain. For future research, we see a need to improve the variable selection process, the parameter selection of the kernel function, generalization, and the sample size for multi-class.
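Several pieces of the method above can be illustrated compactly. In the special case of one input and one output under constant returns to scale, the CCR DEA linear program reduces to each DMU's output/input ratio divided by the best ratio in the sample; the study's DEA model is described as multiple-input, multiple-output, which instead requires solving a linear program per DMU. The sketch also shows the Gaussian RBF kernel, one-against-one voting given any pairwise classifier (here a hypothetical callable), and accuracy within one rating class. All function names and the tie-breaking rule are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def dea_ccr_single(inputs: list[float], outputs: list[float]) -> list[float]:
    """CCR efficiency for the one-input, one-output case (constant returns to scale)."""
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


def rbf_kernel(a: list[float], b: list[float], gamma: float) -> float:
    """Gaussian RBF kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


def ovo_vote(pairwise_winner, classes):
    """One-against-one: each binary classifier votes; the most-voted class wins.

    `pairwise_winner(a, b)` is a hypothetical callable returning a or b.
    Ties go to the smallest class label (an assumed rule).
    """
    votes = Counter(pairwise_winner(a, b) for a, b in combinations(classes, 2))
    top = max(votes.values())
    return min(c for c, v in votes.items() if v == top)


def accuracy_within(actual: list[int], predicted: list[int], tolerance: int = 1) -> float:
    """Share of predictions at most `tolerance` rating classes away from the truth."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)
```

With `tolerance=0` the last function is the ordinary hit ratio; with `tolerance=1` it is the within-one-class accuracy reported for the Weston and Watkins method.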

Simultaneous Optimization of KNN Ensemble Model for Bankruptcy Prediction (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.139-157 / 2016
  • Bankruptcy involves considerable costs and can significantly affect a country's economy, so bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics related to it. Early research employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression; later, many studies began using artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are used to enhance the accuracy of bankruptcy prediction. Ensemble classification combines multiple classifiers to obtain more accurate predictions than individual models, and ensemble learning techniques are known to be very useful for improving a classifier's generalization ability. To enhance that ability, the base classifiers in the ensemble must be as accurate and diverse as possible. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method diversifies the base classifiers by selecting a random feature subset from the original feature space for each classifier: each ensemble member is trained on a randomly chosen feature subspace, and the members' predictions are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust to variations in the dataset but very sensitive to changes in the feature space, which makes it a good classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective at improving on an individual KNN model.
The k parameter of the KNN base classifiers and the feature subsets selected for them play an important role in determining the performance of the KNN ensemble model, yet few studies have focused on optimizing them. This study proposed a new ensemble method that improves on the KNN ensemble model by optimizing both the k parameters and the feature subsets of the base classifiers, using a genetic algorithm to improve the prediction accuracy of the ensemble. The proposed model was applied to a bankruptcy prediction problem using a real dataset of Korean companies. The research data included 1800 externally non-audited firms, of which 900 filed for bankruptcy and 900 did not. The dataset initially consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected by an independent-sample t-test of each financial ratio as an input variable against bankruptcy or non-bankruptcy as the output variable, and 24 of these were then selected by logistic regression backward feature selection. The complete dataset was split into training and validation sets. The training set was further divided into two portions: one for training the model and the other for avoiding overfitting, with the prediction accuracy on this second portion used as the fitness value. The validation set was used to evaluate the effectiveness of the final model. A 10-fold cross-validation was implemented to compare the classification accuracy of the proposed model with that of other models, and the Q-statistic values and average classification accuracies of the base classifiers were investigated.
The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.
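The random subspace KNN ensemble described above can be sketched as below: each member is a `(k, feature_subset)` pair, members vote by majority, and Yule's Q-statistic measures pairwise diversity from per-sample correctness. In the proposed model a genetic algorithm searches over these `(k, subset)` pairs; here they are drawn at random, which corresponds to the plain random subspace baseline, not the paper's optimized model. Function names are hypothetical.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, x, k, features):
    """KNN majority vote using squared Euclidean distance on `features` only."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_x, train_y)
    )
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]


def random_subspace_members(n_features, n_members, subset_size, k, seed=0):
    """Each member is a (k, feature_subset) pair; a GA would tune these instead."""
    rng = random.Random(seed)
    return [(k, sorted(rng.sample(range(n_features), subset_size))) for _ in range(n_members)]


def ensemble_predict(members, train_x, train_y, x):
    """Aggregate member predictions by majority vote."""
    votes = Counter(knn_predict(train_x, train_y, x, k, feats) for k, feats in members)
    return votes.most_common(1)[0][0]


def q_statistic(correct_a, correct_b):
    """Yule's Q between two classifiers from per-sample correctness (True/False).

    Q near 1 means the classifiers err on the same samples (low diversity).
    Undefined when the denominator is zero.
    """
    pairs = list(zip(correct_a, correct_b))
    n11 = sum(1 for a, b in pairs if a and b)
    n00 = sum(1 for a, b in pairs if not a and not b)
    n10 = sum(1 for a, b in pairs if a and not b)
    n01 = sum(1 for a, b in pairs if not a and b)
    return (n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10)
```

A GA version would encode each member's k and a feature bit-mask in the chromosome and use accuracy on the held-out training portion as the fitness value, as the abstract describes.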

An empirical study on a firm's fail prediction model by considering whether there are embezzlement, malpractice and the largest shareholder changes or not (횡령.배임 및 최대주주변경을 고려한 부실기업예측모형 연구)

  • Moon, Jong Geon;Hwang Bo, Yun
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.9 no.1 / pp.119-132 / 2014
  • This study analyzed a failure prediction model for firms listed on KOSDAQ that considers whether embezzlement, malpractice, or changes in the largest shareholder occurred. A total of 166 firms were sampled using a paired sampling method. The failed-firm sample consists of 83 manufacturing firms delisted from the KOSDAQ market during the 4 years from 2009 to 2012. The normal-firm sample consists of 83 firms (in the same item or business as the failed firms) that were listed on the KOSDAQ market and conducted normal business activities during the same period (2009 to 2012). This study selected 80 financial ratios for the 5 years immediately preceding each sample firm's delisting, conducted t-tests to derive the 19 ratios that were significant in all five consecutive years, and used forward selection to estimate a logistic regression model. Whereas previous studies analyzed only the three years immediately preceding delisting, this study analyzes the five years immediately preceding it. This study also differs from previous studies in that it examines, with a time lag, which significant financial characteristics influence insolvency from the initial phase of an insolvent firm, and it empirically analyzes the usefulness of the data by building a failure prediction model that includes embezzlement/malpractice and largest-shareholder changes as dummy variables (non-financial characteristics). The classification accuracy of the prediction model with dummy variables was 95.2% in year T-1, 88.0% in year T-2, 81.3% in year T-3, 79.5% in year T-4, and 74.7% in year T-5. Accuracy increased as the year of delisting approached and was generally higher than in previous studies.
This study is expected to reduce damage not only to the firm but also to investors, financial institutions, and other stakeholders by identifying firms with high failure potential in advance.
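The screening step above (a t-test per financial ratio, keeping only ratios significant in every one of the five years before delisting) can be sketched as follows. Assumptions: Welch's unequal-variance t statistic is used, and a fixed `|t|` cutoff of 2.0 stands in for a proper p-value; the abstract does not say which t-test variant or significance level was used. The subsequent forward-selection logistic regression is not shown, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for the difference in means of two samples."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def screen_ratios(failed: dict, normal: dict, cutoff: float = 2.0) -> set:
    """Ratios whose |t| between failed and normal firms reaches the assumed cutoff."""
    return {name for name in failed if abs(welch_t(failed[name], normal[name])) >= cutoff}


def persistent_ratios(yearly: list[set]) -> set:
    """Ratios significant in every year (e.g. T-1 through T-5)."""
    return set.intersection(*yearly)
```

Running `screen_ratios` once per year and intersecting the results with `persistent_ratios` mirrors the "significant for five consecutive years" criterion used to narrow 80 ratios to 19.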