• Title/Summary/Keyword: Multivariable System (다변수 시스템)


Analysis of Correlation between Particulate Matter in the Atmosphere and Rainwater Quality During Spring and Summer of 2020 (봄·여름철 대기 중 미세먼지와 빗물 수질 상관성 분석)

  • Park, Hyemin; Kim, Taeyong; Heo, Junyong; Yang, Minjune
    • Korean Journal of Remote Sensing / v.37 no.6_2 / pp.1859-1867 / 2021
  • This study investigated the seasonal characteristics of particulate matter (PM) in the atmosphere and rainwater quality in Busan, South Korea, and evaluated the seasonal effect of atmospheric PM10 concentration on rainwater quality using multivariate statistical analysis. Atmospheric PM concentrations and meteorological observations (daily precipitation and rainfall intensity) were obtained from the automatic weather systems (AWS) of the Korea Meteorological Administration (KMA) from March 2020 to August 2020. Rainwater samples (n = 216, 13 rain events) were collected continuously from the beginning of each precipitation event using a rainwater collecting device at Pukyong National University. The samples were analyzed for pH, EC (electrical conductivity), water-soluble cations (Na+, Mg2+, K+, Ca2+, and NH4+), and anions (Cl-, NO3-, and SO42-). The atmospheric PM10 concentration was measured continuously before and after precipitation with a custom-built PM sensor node. The measured data were analyzed using principal component analysis (PCA) and Pearson correlation analysis to identify relationships between atmospheric PM10 concentration and rainwater quality. In spring, the daily average concentrations of PM10 (34.11 µg/m³) and PM2.5 (19.23 µg/m³) were relatively high, while daily precipitation and rainfall intensity were relatively low. In addition, the atmospheric PM10 concentration showed a significant positive correlation with the concentration of water-soluble ions (r = 0.99) and EC (r = 0.95), and a negative correlation with the pH (r = -0.84) of the rainwater samples. In summer, the daily average concentrations of PM10 (27.79 µg/m³) and PM2.5 (17.41 µg/m³) were relatively low, and the maximum rainfall intensity reached 81.6 mm/h, with large amounts of rain falling over long periods. The results indicated no statistically significant correlation between atmospheric PM10 concentration and rainwater quality in summer.
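
For readers who want to reproduce the kind of analysis this abstract describes, the sketch below pairs Pearson correlation tests with a PCA of the rainwater chemistry variables. It is a minimal illustration only: the CSV file and all column names are hypothetical placeholders, not the study's data.

```python
# Minimal sketch of the statistical workflow described above; the CSV file
# and column names are hypothetical placeholders, not the study's data.
import pandas as pd
from scipy.stats import pearsonr
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("rain_events.csv")  # one row per rain event

# Pearson correlation between atmospheric PM10 and rainwater quality measures
for col in ["total_ions", "EC", "pH"]:
    r, p = pearsonr(df["PM10"], df[col])
    print(f"PM10 vs {col}: r = {r:.2f}, p = {p:.3f}")

# PCA on the standardized chemistry variables to find dominant components
ions = ["Na", "Mg", "K", "Ca", "NH4", "Cl", "NO3", "SO4", "EC", "pH"]
pca = PCA(n_components=2)
scores = pca.fit_transform(StandardScaler().fit_transform(df[ions]))
print(pca.explained_variance_ratio_)  # share of variance on PC1 and PC2
```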

Extraction of Primary Factors Influencing Dam Operation Using Factor Analysis (요인분석 통계기법을 이용한 댐 운영에 대한 영향 요인 추출)

  • Kang, Min-Goo; Jung, Chan-Yong; Lee, Gwang-Man
    • Journal of Korea Water Resources Association / v.40 no.10 / pp.769-781 / 2007
  • Factor analysis is commonly employed to reduce the quantity of data and summarize information about a system or phenomenon. In this methodology, variables are grouped into several factors based on their statistical characteristics, and the results are used to drop variables that carry less weight than others. In this study, factor analysis was applied to extract the primary factors influencing multi-dam system operation in the Han River basin, which contains two multi-purpose dams, Soyanggang Dam and Chungju Dam, operated jointly to supply water during the water-use season. To perform the factor analysis, the variables related to the operation of the two dams were first gathered and divided into five groups each (Soyanggang Dam: inflow, hydropower production, storage management, storage, and past operation results; Chungju Dam: inflow, hydropower production, water demand, storage, and past operation results). Then, considering their statistical properties, selected variables were grouped into five factors: hydrological condition, past dam operation, dam operation in the normal season, water demand, and downstream dam operation. To check the appropriateness and applicability of the factors, a multiple regression equation was constructed using the factors as explanatory variables, and the factors were compared with the terms of an objective function used for optimal operation of water resources in a river basin. Both checks showed that the suggested approach provides satisfactory results, and the extracted primary factors are expected to be useful for scheduling dam operation in consideration of future conditions and past results.
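
A minimal sketch of the two-step check described in this abstract: factors are extracted from the candidate operation variables, and a multiple regression on the factor scores verifies their explanatory power. The file, column, and target names are assumptions for illustration, not the paper's actual dam-operation variables.

```python
# Illustrative sketch of the two-step check: extract five factors, then regress
# an operation target on the factor scores. Names are assumptions.
import pandas as pd
import statsmodels.api as sm
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("dam_variables.csv")               # candidate operation variables
X = StandardScaler().fit_transform(df.drop(columns=["release"]))

fa = FactorAnalysis(n_components=5, rotation="varimax")  # five factors, as in the study
scores = fa.fit_transform(X)                             # factor scores per observation

# Multiple regression with the factor scores as explanatory variables
ols = sm.OLS(df["release"], sm.add_constant(scores)).fit()
print(ols.summary())
```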

Comparison of Models for Stock Price Prediction Based on Keyword Search Volume According to the Social Acceptance of Artificial Intelligence (인공지능의 사회적 수용도에 따른 키워드 검색량 기반 주가예측모형 비교연구)

  • Cho, Yujung; Sohn, Kwonsang; Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.103-128 / 2021
  • Recently, investor interest and the dissemination of stock-related information have been considered significant factors explaining stock returns and trading volume. In addition, for companies that develop, distribute, or utilize innovative new technologies such as artificial intelligence, it is difficult to accurately predict future stock returns and volatility because of macro-environmental and market uncertainty. Market uncertainty is recognized as an obstacle to the adoption and spread of artificial intelligence technology, so research is needed to mitigate it. The purpose of this study is therefore to propose a machine learning model that predicts the volatility of a company's stock price using the internet search volume of artificial intelligence-related technology keywords as a measure of investor interest. To this end, we predict the stock market using VAR (Vector Auto Regression) and the deep neural network LSTM (Long Short-Term Memory), and compare stock price prediction performance based on keyword search volume across the technology's social acceptance stages. We also analyze sub-technologies of artificial intelligence to examine how the search volume of detailed technology keywords changes with the acceptance stage, and how interest in a specific technology affects the stock market forecast. For this purpose, the keywords artificial intelligence, deep learning, and machine learning were selected, and we counted how often each keyword appeared in online documents each week over five years, from January 1, 2015 to December 31, 2019. Stock price and transaction volume data of KOSDAQ-listed companies were also collected and used in the analysis. As a result, we found that keyword search volume for artificial intelligence technology increased as its social acceptance increased. In particular, starting with the AlphaGo shock, the search volume for artificial intelligence itself and for detailed technologies such as machine learning and deep learning increased. The prediction models showed high accuracy, and the acceptance stage yielding the best prediction performance differed for each keyword. In the stock price predictions based on keyword search volume for each social acceptance stage classified in this study, the awareness stage showed the highest prediction accuracy, and accuracy differed according to the keywords used in each stage's prediction model. Therefore, when constructing a stock price prediction model using technology keywords, the social acceptance of the technology and the sub-technology classification must be considered. The results of this study provide the following implications. First, to predict the return on investment in companies based on an innovative technology, it is most important to capture the awareness stage, in which public interest in the technology increases rapidly. Second, the fact that keyword search volume and prediction accuracy vary with the social acceptance of a technology should be considered when developing decision support systems for investment, such as the big-data-based robo-advisors recently introduced in the financial sector.
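
The sketch below illustrates the two model families the study compares: a VAR fitted to weekly keyword-search and stock series, and a small LSTM trained on sliding windows of the same series. The file name, column names, window length, and layer sizes are assumptions for illustration, not the paper's configuration.

```python
# Illustrative sketch only: a VAR and a small LSTM on weekly series, mirroring
# the study's two model families. File, column, and size choices are assumptions.
import numpy as np
import pandas as pd
import tensorflow as tf
from statsmodels.tsa.api import VAR

df = pd.read_csv("weekly.csv", parse_dates=["week"], index_col="week")
series = df[["search_volume", "stock_return", "volume"]]

# VAR with the lag order chosen by AIC
var_res = VAR(series).fit(maxlags=8, ic="aic")
print(var_res.summary())

# LSTM: predict next week's return from a 12-week window of all three series
def sliding_windows(a, w):
    return np.stack([a[i:i + w] for i in range(len(a) - w)])

X = sliding_windows(series.values, 12)          # shape (n_windows, 12, 3)
y = series["stock_return"].values[12:]          # value right after each window
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(12, 3)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20, batch_size=16, verbose=0)
```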

An Empirical Study on the Failure Factors of Startups Using Non-financial Information (비재무정보를 이용한 창업기업의 부실요인에 관한 실증연구)

  • Nam, Gi Joung; Lee, Dong Myung; Chen, Lu
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.14 no.1 / pp.139-149 / 2019
  • The purpose of this study is to help minimize the social cost of startup insolvency by improving the success rate of startups, providing useful information to founders and start-up support institutions through an analysis of the non-financial information that affects startup failure. The study targets startups, which credit guarantee institutions generally define as companies within five years of establishment. The data were sampled from companies that received start-up guarantee support beginning in January 2014, with insolvency status determined as of the end of December 2017. The sample comprises 2,826 firms in total: 2,267 normal companies (80.2%) and 559 non-performing companies (19.8%). The non-financial information was divided into founder characteristics, firm characteristics, founder asset information, and founder credit information, and cross-tabulation and logistic regression analyses were conducted. In the univariate cross-tabulation analysis, personal credit rating, experience in the same industry, ownership of residential housing, presence of employees, and availability of financial statements were selected as significant variables. In the logistic regression analysis, three variables were found to be important factors affecting startup failure: personal credit rating, experience in the same industry, and ownership of residential housing. These results show the importance of the founder's personal creditworthiness, experience, and assets in business management. Start-up support institutions should reflect these results in their credit evaluation systems for founders, and entrepreneurship education should cover the importance of personal credit and its management. The results of this analysis will contribute to minimizing startup failure by providing useful non-financial information to founders and start-up support organizations.
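
As a rough illustration of the methodology (cross-tabulation for univariate screening, then logistic regression on the surviving variables), the sketch below uses hypothetical variable names standing in for the paper's non-financial indicators.

```python
# Rough illustration of the paper's two analysis steps with hypothetical
# variable names; 'failed' is 1 for non-performing firms, 0 otherwise.
import pandas as pd
from scipy.stats import chi2_contingency
import statsmodels.api as sm

df = pd.read_csv("startups.csv")

# Step 1: univariate screening via cross-tabulation and a chi-square test
for col in ["credit_grade", "industry_experience", "owns_home"]:
    table = pd.crosstab(df[col], df["failed"])
    chi2, p, dof, _ = chi2_contingency(table)
    print(f"{col}: chi2 = {chi2:.1f} (dof = {dof}), p = {p:.4f}")

# Step 2: multivariate logistic regression on the screened variables
X = sm.add_constant(df[["credit_grade", "industry_experience", "owns_home"]])
logit = sm.Logit(df["failed"], X).fit()
print(logit.summary())
```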

Corporate Default Prediction Model Using Deep Learning Time Series Algorithm, RNN and LSTM (딥러닝 시계열 알고리즘 적용한 기업부도예측모형 유용성 검증)

  • Cha, Sungjae; Kang, Jungseok
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.1-32 / 2018
  • Corporate defaults affect not only stakeholders such as the managers, employees, creditors, and investors of bankrupt companies, but also ripple through the local and national economy. Before the Asian financial crisis, the Korean government analyzed only SMEs and tried to improve the forecasting power of a single default prediction model rather than developing a variety of corporate default models; as a result, even large 'chaebol' corporations went bankrupt. Even afterward, analysis of corporate defaults focused on specific variables, and when the government restructured companies immediately after the global financial crisis, it concentrated on a few main variables such as the debt ratio. A multifaceted study of corporate default prediction models is essential to protect diverse interests and to avoid sudden, total collapses such as the 'Lehman Brothers case' of the global financial crisis. The key variables in corporate defaults vary over time: Deakin's (1972) study, building on the analyses of Beaver (1967, 1968) and Altman (1968), shows that the major factors affecting corporate failure have changed, and Grice (2001) reached a similar finding on the importance of predictive variables using Zmijewski's (1984) and Ohlson's (1980) models. However, past studies use static models, and most do not consider changes that occur over time. To construct consistent prediction models, it is therefore necessary to compensate for time-dependent bias with a time series algorithm that reflects dynamic change. Centered on the global financial crisis, which had a significant impact on Korea, this study uses 10 years of annual corporate data from 2000 to 2009, divided into training, validation, and test sets of 7, 2, and 1 years, respectively. To construct a bankruptcy model that is consistent through time, we first train the deep learning time series models on data from before the financial crisis (2000~2006). Parameter tuning of the existing models and the deep learning time series algorithm is conducted on validation data that includes the financial crisis period (2007~2008); the resulting model shows patterns similar to the training results and excellent predictive power. Each bankruptcy prediction model is then retrained on the combined training and validation data (2000~2008), applying the optimal parameters found in validation. Finally, each corporate default prediction model is evaluated and compared on the test data (2009), demonstrating the usefulness of the corporate default prediction model based on the deep learning time series algorithm. In addition, by adding Lasso regression to the existing variable-selection methods (multiple discriminant analysis and the logit model), we show that the deep learning time series model built on the three bundles of selected variables is useful for robust corporate default prediction. The definition of bankruptcy follows Lee (2015). The independent variables include financial information, such as the financial ratios used in previous studies. Multivariate discriminant analysis, the logit model, and the Lasso regression model are used to select the optimal variable groups. The performance of the multivariate discriminant analysis model proposed by Altman (1968), the logit model proposed by Ohlson (1980), non-time-series machine learning algorithms, and the deep learning time series algorithms is compared. Corporate data present the challenges of nonlinear variables, multicollinearity among variables, and lack of data. The logit model handles nonlinearity, the Lasso regression model resolves the multicollinearity problem, and the deep learning time series algorithm, using a variable data generation method, compensates for the lack of data. Big data technology is moving from simple human analysis to automated AI analysis and, ultimately, toward intertwined AI applications. Although research on corporate default prediction models using time series algorithms is still in its early stages, the deep learning algorithm is much faster than regression analysis for corporate default prediction modeling and offers stronger predictive power. Amid the Fourth Industrial Revolution, the Korean government and governments overseas are working to integrate such systems into the everyday life of their nations, yet deep learning time series research for the financial industry remains insufficient. As an initial study on deep learning time series analysis of corporate defaults, it is hoped that this work will serve as comparative reference material for non-specialists beginning studies that combine financial data with deep learning time series algorithms.
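
A hedged sketch of the pipeline this abstract outlines: Lasso regression selects a bundle of financial ratios, and an LSTM is then trained on each firm's multi-year sequence of the selected ratios. The file name, column layout, sequence length, and network sizes are assumptions, not the study's specification.

```python
# Hedged sketch: Lasso screens financial ratios (treating default as a 0/1
# regression target for selection only), then an LSTM learns each firm's
# multi-year sequence of the selected ratios. All names are assumptions.
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.linear_model import LassoCV

panel = pd.read_csv("firm_years.csv").sort_values(["firm", "year"])
ratios = [c for c in panel.columns if c not in ("firm", "year", "default")]

# Step 1: keep ratios with non-zero Lasso coefficients
lasso = LassoCV(cv=5).fit(panel[ratios], panel["default"])
selected = [r for r, w in zip(ratios, lasso.coef_) if w != 0]

# Step 2: build per-firm sequences (last seq_len years of the selected ratios)
seq_len = 3
X = np.stack([g[selected].values[-seq_len:]
              for _, g in panel.groupby("firm") if len(g) >= seq_len])
y = np.array([g["default"].iloc[-1]
              for _, g in panel.groupby("firm") if len(g) >= seq_len])

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(16, input_shape=(seq_len, len(selected))),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=30, batch_size=32, verbose=0)
```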

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin; Lee, Jonghoon; Han, Sangjin; Park, Choong-Shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.57-73 / 2021
  • Anomaly detection is becoming important for maintaining ICT infrastructure and preventing failures. System monitoring data are multidimensional time series, which makes it difficult to account for the characteristics of multidimensional data and of time series data at the same time. With multidimensional data, the correlations between variables must be considered, and existing probability-based, linear, and distance-based methods degrade under the so-called curse of dimensionality. Time series data, meanwhile, are typically preprocessed with sliding windows and time series decomposition for autocorrelation analysis; these techniques increase the dimensionality of the data, so they need to be supplemented. Anomaly detection is a long-standing research field in which statistical methods and regression analysis were used early on, and there are now active efforts to apply machine learning and artificial neural networks. Statistical methods are difficult to apply when the data are non-homogeneous, and they do not detect local outliers well. Regression-based methods learn a regression equation using parametric statistics and detect anomalies by comparing predicted and actual values; their performance deteriorates when the model is not solid or the data contain noise or outliers, so they are restricted to training data free of noise and outliers. An autoencoder built on artificial neural networks is trained to reproduce its input as closely as possible at the output. It has many advantages over existing probabilistic and linear models, cluster analysis, and supervised learning: it can be applied to data that do not satisfy a probability distribution or linearity assumption, and it can be trained unsupervised, without labeled data. However, it remains limited in identifying local outliers in multidimensional data, and the characteristics of time series data still greatly increase the data's dimensionality. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that improves anomaly detection performance by considering local outliers and time series characteristics. First, we applied a Multimodal Autoencoder (MAE) to address the limits of local outlier identification in multidimensional data. Multimodal architectures are commonly used to learn different types of inputs, such as voice and images; the modalities share the autoencoder's bottleneck, through which their correlations are learned. In addition, a Conditional Autoencoder (CAE) was used to learn the characteristics of time series data effectively without increasing the data's dimensionality. Conditional inputs usually take categorical variables, but in this study time was used as the condition so that periodicity could be learned. The proposed CMAE model was verified by comparison with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). The reconstruction performance on 41 variables was checked for the proposed model and the comparison models. Reconstruction performance differed by variable: in all three autoencoders, the loss values for the Memory, Disk, and Network modalities were small and reconstruction worked well. The Process modality showed no significant difference across the three models, while the CPU modality showed excellent performance in the CMAE. ROC curves were prepared to evaluate anomaly detection performance, and AUC, accuracy, precision, recall, and F1-score were compared; on all indicators, the models ranked CMAE, MAE, then UAE. In particular, the recall of the CMAE was 0.9828, confirming that it detects nearly all anomalies. Its accuracy also improved, to 87.12%, and its F1-score was 0.8883, which is considered suitable for anomaly detection. From a practical standpoint, the proposed model offers advantages beyond raw performance: techniques such as time series decomposition and sliding windows add procedures that must be managed, and the resulting increase in dimensionality can slow inference, whereas the proposed model is easy to apply in practice in terms of inference speed and model management.
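
To make the architecture concrete, the sketch below builds a conditional multimodal autoencoder in the spirit of the CMAE described above: one encoder branch per modality, a time condition concatenated at the shared bottleneck, and one decoder head per modality. The per-modality dimensions and layer sizes are illustrative assumptions (the placeholder dimensions sum to the 41 monitored variables mentioned in the abstract).

```python
# Hedged sketch of a conditional multimodal autoencoder (CMAE) in the spirit
# of the model described above. All sizes are illustrative assumptions; the
# per-modality dimensions below are placeholders that sum to 41 variables.
import tensorflow as tf
from tensorflow.keras import layers, Model

modal_dims = {"cpu": 8, "mem": 9, "disk": 8, "net": 8, "proc": 8}

inputs, encoded = [], []
for name, dim in modal_dims.items():
    x = layers.Input(shape=(dim,), name=f"{name}_in")
    inputs.append(x)
    encoded.append(layers.Dense(8, activation="relu")(x))  # per-modality encoder

# Time condition (e.g. sin/cos encoding of hour of day) joins the shared code
t_in = layers.Input(shape=(2,), name="time_in")
bottleneck = layers.Dense(16, activation="relu")(
    layers.Concatenate()(encoded + [t_in]))

# One decoder head per modality reconstructs that modality's variables
outputs = [layers.Dense(dim, name=f"{name}_out")(
               layers.Dense(8, activation="relu")(bottleneck))
           for name, dim in modal_dims.items()]

cmae = Model(inputs=inputs + [t_in], outputs=outputs)
cmae.compile(optimizer="adam", loss="mse")
# Train to reconstruct each modality from all modalities plus the condition:
#   cmae.fit([X_cpu, X_mem, X_disk, X_net, X_proc, T],
#            [X_cpu, X_mem, X_disk, X_net, X_proc])
# At inference, flag samples whose per-modality reconstruction error is large.
```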