• Title/Summary/Keyword: 데이터 확장 기법

Search Result 833, Processing Time 0.03 seconds

Analysis of Research Trends in Tax Compliance using Topic Modeling (토픽모델링을 활용한 조세순응 연구 동향 분석)

  • Kang, Min-Jo;Baek, Pyoung-Gu
    • The Journal of the Korea Contents Association
    • /
    • v.22 no.1
    • /
    • pp.99-115
    • /
    • 2022
  • In this study, domestic academic journal papers on tax compliance, tax consciousness, and faithful tax payment (hereinafter referred to as "tax compliance") were comprehensively analyzed from an interdisciplinary perspective as a representative research topic in the field of tax science. To achieve the research purpose, topic modeling technique was applied as part of text mining. In the flow of data collection-keyword preprocessing-topic model analysis, potential research topics were presented from tax compliance related keywords registered by the researcher in a total of 347 papers. The results of this study can be summarized as follows. First, in the keyword analysis, keywords such as tax investigation, tax avoidance, and honest tax reporting system were included in the top 5 keywords based on simple term-frequency, and in the TF-IDF value considering the relative importance of keywords, they were also included in the top 5 keywords. On the other hand, the keyword, tax evasion, was included in the top keyword based on the TF-IDF value, whereas it was not highlighted in the simple term-frequency. Second, eight potential research topics were derived through topic modeling. The topics covered are (1) tax fairness and suppression of tax offenses, (2) the ideology of the tax law and the validity of tax policies, (3) the principle of substance over form and guarantee of tax receivables (4) tax compliance costs and tax administration services, (5) the tax returns self- assessment system and tax experts, (6) tax climate and strategic tax behavior, (7) multifaceted tax behavior and differential compliance intentions, (8) tax information system and tax resource management. The research comprehensively looked at the various perspectives on the tax compliance from an interdisciplinary perspective, thereby comprehensively grasping past research trends on tax compliance and suggesting the direction of future research.

An analysis of land displacements in terms of hydrologic aspect: satellite-based precipitation and groundwater levels (수문학적 관점에서의 지반 변위 분석: 인공위성 강우데이터와 지하수위 연계)

  • Oh, Seungcheol;Kim, Wanyub;Kang, Minsun;Yoon, Hongsic;Yang, Jungsuk;Choi, Minha
    • Journal of Korea Water Resources Association
    • /
    • v.55 no.12
    • /
    • pp.1031-1039
    • /
    • 2022
  • As one of the hydrological factors closely related to landslides, precipitation indirectly affects slope stability by generating external forces. Groundwater level fluctuations have attracted more attention lately as factors that directly affect slope stability have become more prominent. Therefore, this study attempted to analyze the relationship between variables through changes in precipitation, groundwater levels, and land displacement. A time series-based analysis was conducted using satellite-based precipitation and point-based groundwater levels in conjunction with the PSInSAR technique to simulate land displacement in urban and mountainous areas. There was a sharp rise in groundwater levels in both urban and mountain areas during heavy rainfall, and a continuous decrease in urban areas when rainfall was low. 6 mm of displacements was observed in the mountainous area as a results of soil outflow from the topsoil layer, which was accompanied by an increased groundwater level. Meanwhile, different results were found in urban area. In response to the rise in groundwater level, the land displacement increases due to the expansion of soil skeletons, while the decrease seems to be attributed to anthropogenic influences. Overall, there was no consistent relationship between groundwater levels and land displacement, which appears to be caused by factors other than hydrological factors. Additional consideration of environmental factors could contribute to a deeper understanding of the relationship between the two factors.

Design of Calibration and Validation Area for Forestry Vegetation Index from CAS500-4 (농림위성 산림분야 식생지수 검보정 사이트 설계)

  • Lim, Joongbin;Cha, Sungeun;Won, Myoungsoo;Kim, Joon;Park, Juhan;Ryu, Youngryel;Lee, Woo-Kyun
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.3
    • /
    • pp.311-326
    • /
    • 2022
  • The Compact Advanced Satellite 500-4 (CAS500-4) is under development to efficiently manage and monitor forests in Korea and is scheduled to launch in 2025. The National Institute of Forest Science is developing 36 types of forestry applications to utilize the CAS500-4 efficiently. The products derived using the remote sensing method require validation with ground reference data, and the quality monitoring results for the products must be continuously reported. Due to it being the first time developing the national forestry satellite, there is no official calibration and validation site for forestry products in Korea. Accordingly, the author designed a calibration and validation site for the forestry products following international standards. In addition, to install calibration and validation sites nationwide, the authors selected appropriate sensors and evaluated the applicability of the sensors. As a result, the difference between the ground observation data and the Sentinel-2 image was observed to be within ±5%, confirming that the sensor could be used for nationwide expansion.

A Study on the Prediction Model of Stock Price Index Trend based on GA-MSVM that Simultaneously Optimizes Feature and Instance Selection (입력변수 및 학습사례 선정을 동시에 최적화하는 GA-MSVM 기반 주가지수 추세 예측 모형에 관한 연구)

  • Lee, Jong-sik;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.4
    • /
    • pp.147-168
    • /
    • 2017
  • There have been many studies on accurate stock market forecasting in academia for a long time, and now there are also various forecasting models using various techniques. Recently, many attempts have been made to predict the stock index using various machine learning methods including Deep Learning. Although the fundamental analysis and the technical analysis method are used for the analysis of the traditional stock investment transaction, the technical analysis method is more useful for the application of the short-term transaction prediction or statistical and mathematical techniques. Most of the studies that have been conducted using these technical indicators have studied the model of predicting stock prices by binary classification - rising or falling - of stock market fluctuations in the future market (usually next trading day). However, it is also true that this binary classification has many unfavorable aspects in predicting trends, identifying trading signals, or signaling portfolio rebalancing. In this study, we try to predict the stock index by expanding the stock index trend (upward trend, boxed, downward trend) to the multiple classification system in the existing binary index method. In order to solve this multi-classification problem, a technique such as Multinomial Logistic Regression Analysis (MLOGIT), Multiple Discriminant Analysis (MDA) or Artificial Neural Networks (ANN) we propose an optimization model using Genetic Algorithm as a wrapper for improving the performance of this model using Multi-classification Support Vector Machines (MSVM), which has proved to be superior in prediction performance. In particular, the proposed model named GA-MSVM is designed to maximize model performance by optimizing not only the kernel function parameters of MSVM, but also the optimal selection of input variables (feature selection) as well as instance selection. In order to verify the performance of the proposed model, we applied the proposed method to the real data. The results show that the proposed method is more effective than the conventional multivariate SVM, which has been known to show the best prediction performance up to now, as well as existing artificial intelligence / data mining techniques such as MDA, MLOGIT, CBR, and it is confirmed that the prediction performance is better than this. Especially, it has been confirmed that the 'instance selection' plays a very important role in predicting the stock index trend, and it is confirmed that the improvement effect of the model is more important than other factors. To verify the usefulness of GA-MSVM, we applied it to Korea's real KOSPI200 stock index trend forecast. Our research is primarily aimed at predicting trend segments to capture signal acquisition or short-term trend transition points. The experimental data set includes technical indicators such as the price and volatility index (2004 ~ 2017) and macroeconomic data (interest rate, exchange rate, S&P 500, etc.) of KOSPI200 stock index in Korea. Using a variety of statistical methods including one-way ANOVA and stepwise MDA, 15 indicators were selected as candidate independent variables. The dependent variable, trend classification, was classified into three states: 1 (upward trend), 0 (boxed), and -1 (downward trend). 70% of the total data for each class was used for training and the remaining 30% was used for verifying. To verify the performance of the proposed model, several comparative model experiments such as MDA, MLOGIT, CBR, ANN and MSVM were conducted. MSVM has adopted the One-Against-One (OAO) approach, which is known as the most accurate approach among the various MSVM approaches. Although there are some limitations, the final experimental results demonstrate that the proposed model, GA-MSVM, performs at a significantly higher level than all comparative models.

Prediction of commitment and persistence in heterosexual involvements according to the styles of loving using a datamining technique (데이터마이닝을 활용한 사랑의 형태에 따른 연인관계 몰입수준 및 관계 지속여부 예측)

  • Park, Yoon-Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.4
    • /
    • pp.69-85
    • /
    • 2016
  • Successful relationship with loving partners is one of the most important factors in life. In psychology, there have been some previous researches studying the factors influencing romantic relationships. However, most of these researches were performed based on statistical analysis; thus they have limitations in analyzing complex non-linear relationships or rules based reasoning. This research analyzes commitment and persistence in heterosexual involvement according to styles of loving using a datamining technique as well as statistical methods. In this research, we consider six different styles of loving - 'eros', 'ludus', 'stroge', 'pragma', 'mania' and 'agape' which influence romantic relationships between lovers, besides the factors suggested by the previous researches. These six types of love are defined by Lee (1977) as follows: 'eros' is romantic, passionate love; 'ludus' is a game-playing or uncommitted love; 'storge' is a slow developing, friendship-based love; 'pragma' is a pragmatic, practical, mutually beneficial relationship; 'mania' is an obsessive or possessive love and, lastly, 'agape' is a gentle, caring, giving type of love, brotherly love, not concerned with the self. In order to do this research, data from 105 heterosexual couples were collected. Using the data, a linear regression method was first performed to find out the important factors associated with a commitment to partners. The result shows that 'satisfaction', 'eros' and 'agape' are significant factors associated with the commitment level for both male and female. Interestingly, in male cases, 'agape' has a greater effect on commitment than 'eros'. On the other hand, in female cases, 'eros' is a more significant factor than 'agape' to commitment. In addition to that, 'investment' of the male is also crucial factor for male commitment. Next, decision tree analysis was performed to find out the characteristics of high commitment couples and low commitment couples. In order to build decision tree models in this experiment, 'decision tree' operator in the datamining tool, Rapid Miner was used. The experimental result shows that males having a high satisfaction level in relationship show a high commitment level. However, even though a male may not have a high satisfaction level, if he has made a lot of financial or mental investment in relationship, and his partner shows him a certain amount of 'agape', then he also shows a high commitment level to the female. In the case of female, a women having a high 'eros' and 'satisfaction' level shows a high commitment level. Otherwise, even though a female may not have a high satisfaction level, if her partner shows a certain amount of 'mania' then the female also shows a high commitment level. Finally, this research built a prediction model to establish whether the relationship will persist or break up using a decision tree. The result shows that the most important factor influencing to the break up is a 'narcissistic tendency' of the male. In addition to that, 'satisfaction', 'investment' and 'mania' of both male and female also affect a break up. Interestingly, while the 'mania' level of a male works positively to maintain the relationship, that of a female has a negative influence. The contribution of this research is adopting a new technique of analysis using a datamining method for psychology. In addition, the results of this research can provide useful advice to couples for building a harmonious relationship with each other. This research has several limitations. First, the experimental data was sampled based on oversampling technique to balance the size of each classes. Thus, it has a limitation of evaluating performances of the predictive models objectively. Second, the result data, whether the relationship persists of not, was collected relatively in short periods - 6 months after the initial data collection. Lastly, most of the respondents of the survey is in their 20's. In order to get more general results, we would like to extend this research to general populations.

Change Detection for High-resolution Satellite Images Using Transfer Learning and Deep Learning Network (전이학습과 딥러닝 네트워크를 활용한 고해상도 위성영상의 변화탐지)

  • Song, Ah Ram;Choi, Jae Wan;Kim, Yong Il
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.37 no.3
    • /
    • pp.199-208
    • /
    • 2019
  • As the number of available satellites increases and technology advances, image information outputs are becoming increasingly diverse and a large amount of data is accumulating. In this study, we propose a change detection method for high-resolution satellite images that uses transfer learning and a deep learning network to overcome the limit caused by insufficient training data via the use of pre-trained information. The deep learning network used in this study comprises convolutional layers to extract the spatial and spectral information and convolutional long-short term memory layers to analyze the time series information. To use the learned information, the two initial convolutional layers of the change detection network are designed to use learned values from 40,000 patches of the ISPRS (International Society for Photogrammertry and Remote Sensing) dataset as initial values. In addition, 2D (2-Dimensional) and 3D (3-dimensional) kernels were used to find the optimized structure for the high-resolution satellite images. The experimental results for the KOMPSAT-3A (KOrean Multi-Purpose SATllite-3A) satellite images show that this change detection method can effectively extract changed/unchanged pixels but is less sensitive to changes due to shadow and relief displacements. In addition, the change detection accuracy of two sites was improved by using 3D kernels. This is because a 3D kernel can consider not only the spatial information but also the spectral information. This study indicates that we can effectively detect changes in high-resolution satellite images using the constructed image information and deep learning network. In future work, a pre-trained change detection network will be applied to newly obtained images to extend the scope of the application.

State of Health and State of Charge Estimation of Li-ion Battery for Construction Equipment based on Dual Extended Kalman Filter (이중확장칼만필터(DEKF)를 기반한 건설장비용 리튬이온전지의 State of Charge(SOC) 및 State of Health(SOH) 추정)

  • Hong-Ryun Jung;Jun Ho Kim;Seung Woo Kim;Jong Hoon Kim;Eun Jin Kang;Jeong Woo Yun
    • Journal of the Microelectronics and Packaging Society
    • /
    • v.31 no.1
    • /
    • pp.16-22
    • /
    • 2024
  • Along with the high interest in electric vehicles and new renewable energy, there is a growing demand to apply lithium-ion batteries in the construction equipment industry. The capacity of heavy construction equipment that performs various tasks at construction sites is rapidly decreasing. Therefore, it is essential to accurately predict the state of batteries such as SOC (State of Charge) and SOH (State of Health). In this paper, the errors between actual electrochemical measurement data and estimated data were compared using the Dual Extended Kalman Filter (DEKF) algorithm that can estimate SOC and SOH at the same time. The prediction of battery charge state was analyzed by measuring OCV at SOC 5% intervals under 0.2C-rate conditions after the battery cell was fully charged, and the degradation state of the battery was predicted after 50 cycles of aging tests under various C-rate (0.2, 0.3, 0.5, 1.0, 1.5C rate) conditions. It was confirmed that the SOC and SOH estimation errors using DEKF tended to increase as the C-rate increased. It was confirmed that the SOC estimation using DEKF showed less than 6% at 0.2, 0.5, and 1C-rate. In addition, it was confirmed that the SOH estimation results showed good performance within the maximum error of 1.0% and 1.3% at 0.2 and 0.3C-rate, respectively. Also, it was confirmed that the estimation error also increased from 1.5% to 2% as the C-rate increased from 0.5 to 1.5C-rate. However, this result shows that all SOH estimation results using DEKF were excellent within about 2%.

SKU recommender system for retail stores that carry identical brands using collaborative filtering and hybrid filtering (협업 필터링 및 하이브리드 필터링을 이용한 동종 브랜드 판매 매장간(間) 취급 SKU 추천 시스템)

  • Joe, Denis Yongmin;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.4
    • /
    • pp.77-110
    • /
    • 2017
  • Recently, the diversification and individualization of consumption patterns through the web and mobile devices based on the Internet have been rapid. As this happens, the efficient operation of the offline store, which is a traditional distribution channel, has become more important. In order to raise both the sales and profits of stores, stores need to supply and sell the most attractive products to consumers in a timely manner. However, there is a lack of research on which SKUs, out of many products, can increase sales probability and reduce inventory costs. In particular, if a company sells products through multiple in-store stores across multiple locations, it would be helpful to increase sales and profitability of stores if SKUs appealing to customers are recommended. In this study, the recommender system (recommender system such as collaborative filtering and hybrid filtering), which has been used for personalization recommendation, is suggested by SKU recommendation method of a store unit of a distribution company that handles a homogeneous brand through a plurality of sales stores by country and region. We calculated the similarity of each store by using the purchase data of each store's handling items, filtering the collaboration according to the sales history of each store by each SKU, and finally recommending the individual SKU to the store. In addition, the store is classified into four clusters through PCA (Principal Component Analysis) and cluster analysis (Clustering) using the store profile data. The recommendation system is implemented by the hybrid filtering method that applies the collaborative filtering in each cluster and measured the performance of both methods based on actual sales data. Most of the existing recommendation systems have been studied by recommending items such as movies and music to the users. In practice, industrial applications have also become popular. In the meantime, there has been little research on recommending SKUs for each store by applying these recommendation systems, which have been mainly dealt with in the field of personalization services, to the store units of distributors handling similar brands. If the recommendation method of the existing recommendation methodology was 'the individual field', this study expanded the scope of the store beyond the individual domain through a plurality of sales stores by country and region and dealt with the store unit of the distribution company handling the same brand SKU while suggesting a recommendation method. In addition, if the existing recommendation system is limited to online, it is recommended to apply the data mining technique to develop an algorithm suitable for expanding to the store area rather than expanding the utilization range offline and analyzing based on the existing individual. The significance of the results of this study is that the personalization recommendation algorithm is applied to a plurality of sales outlets handling the same brand. A meaningful result is derived and a concrete methodology that can be constructed and used as a system for actual companies is proposed. It is also meaningful that this is the first attempt to expand the research area of the academic field related to the existing recommendation system, which was focused on the personalization domain, to a sales store of a company handling the same brand. From 05 to 03 in 2014, the number of stores' sales volume of the top 100 SKUs are limited to 52 SKUs by collaborative filtering and the hybrid filtering method SKU recommended. We compared the performance of the two recommendation methods by totaling the sales results. The reason for comparing the two recommendation methods is that the recommendation method of this study is defined as the reference model in which offline collaborative filtering is applied to demonstrate higher performance than the existing recommendation method. The results of this model are compared with the Hybrid filtering method, which is a model that reflects the characteristics of the offline store view. The proposed method showed a higher performance than the existing recommendation method. The proposed method was proved by using actual sales data of large Korean apparel companies. In this study, we propose a method to extend the recommendation system of the individual level to the group level and to efficiently approach it. In addition to the theoretical framework, which is of great value.

A Simulation-Based Investigation of an Advanced Traveler Information System with V2V in Urban Network (시뮬레이션기법을 통한 차량 간 통신을 이용한 첨단교통정보시스템의 효과 분석 (도시 도로망을 중심으로))

  • Kim, Hoe-Kyoung
    • Journal of Korean Society of Transportation
    • /
    • v.29 no.5
    • /
    • pp.121-138
    • /
    • 2011
  • More affordable and available cutting-edge technologies (e.g., wireless vehicle communication) are regarded as a possible alternative to the fixed infrastructure-based traffic information system requiring the expensive infrastructure investments and mostly implemented in the uninterrupted freeway network with limited spatial system expansion. This paper develops an advanced decentralized traveler information System (ATIS) using vehicle-to-vehicle (V2V) communication system whose performance (drivers' travel time savings) are enhanced by three complementary functions (autonomous automatic incident detection algorithm, reliable sample size function, and driver behavior model) and evaluates it in the typical $6{\times}6$ urban grid network with non-recurrent traffic state (traffic incident) with the varying key parameters (traffic flow, communication radio range, and penetration ratio), employing the off-the-shelf microscopic simulation model (VISSIM) under the ideal vehicle communication environment. Simulation outputs indicate that as the three key parameters are increased more participating vehicles are involved for traffic data propagation in the less communication groups at the faster data dissemination speed. Also, participating vehicles saved their travel time by dynamically updating the up-to-date traffic states and searching for the new route. Focusing on the travel time difference of (instant) re-routing vehicles, lower traffic flow cases saved more time than higher traffic flow ones. This is because a relatively small number of vehicles in 300vph case re-route during the most system-efficient time period (the early time of the traffic incident) but more vehicles in 514vph case re-route during less system-efficient time period, even after the incident is resolved. Also, normally re-routings on the network-entering links saved more travel time than any other places inside the network except the case where the direct effect of traffic incident triggers vehicle re-routings during the effective incident time period and the location and direction of the incident link determines the spatial distribution of re-routing vehicles.

Optimization of Multiclass Support Vector Machine using Genetic Algorithm: Application to the Prediction of Corporate Credit Rating (유전자 알고리즘을 이용한 다분류 SVM의 최적화: 기업신용등급 예측에의 응용)

  • Ahn, Hyunchul
    • Information Systems Review
    • /
    • v.16 no.3
    • /
    • pp.161-177
    • /
    • 2014
  • Corporate credit rating assessment consists of complicated processes in which various factors describing a company are taken into consideration. Such assessment is known to be very expensive since domain experts should be employed to assess the ratings. As a result, the data-driven corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has received considerable attention from researchers and practitioners. In particular, statistical methods such as multiple discriminant analysis (MDA) and multinomial logistic regression analysis (MLOGIT), and AI methods including case-based reasoning (CBR), artificial neural network (ANN), and multiclass support vector machine (MSVM) have been applied to corporate credit rating.2) Among them, MSVM has recently become popular because of its robustness and high prediction accuracy. In this study, we propose a novel optimized MSVM model, and appy it to corporate credit rating prediction in order to enhance the accuracy. Our model, named 'GAMSVM (Genetic Algorithm-optimized Multiclass Support Vector Machine),' is designed to simultaneously optimize the kernel parameters and the feature subset selection. Prior studies like Lorena and de Carvalho (2008), and Chatterjee (2013) show that proper kernel parameters may improve the performance of MSVMs. Also, the results from the studies such as Shieh and Yang (2008) and Chatterjee (2013) imply that appropriate feature selection may lead to higher prediction accuracy. Based on these prior studies, we propose to apply GAMSVM to corporate credit rating prediction. As a tool for optimizing the kernel parameters and the feature subset selection, we suggest genetic algorithm (GA). GA is known as an efficient and effective search method that attempts to simulate the biological evolution phenomenon. By applying genetic operations such as selection, crossover, and mutation, it is designed to gradually improve the search results. Especially, mutation operator prevents GA from falling into the local optima, thus we can find the globally optimal or near-optimal solution using it. GA has popularly been applied to search optimal parameters or feature subset selections of AI techniques including MSVM. With these reasons, we also adopt GA as an optimization tool. To empirically validate the usefulness of GAMSVM, we applied it to a real-world case of credit rating in Korea. Our application is in bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. The experimental dataset was collected from a large credit rating company in South Korea. It contained 39 financial ratios of 1,295 companies in the manufacturing industry, and their credit ratings. Using various statistical methods including the one-way ANOVA and the stepwise MDA, we selected 14 financial ratios as the candidate independent variables. The dependent variable, i.e. credit rating, was labeled as four classes: 1(A1); 2(A2); 3(A3); 4(B and C). 80 percent of total data for each class was used for training, and remaining 20 percent was used for validation. And, to overcome small sample size, we applied five-fold cross validation to our dataset. In order to examine the competitiveness of the proposed model, we also experimented several comparative models including MDA, MLOGIT, CBR, ANN and MSVM. In case of MSVM, we adopted One-Against-One (OAO) and DAGSVM (Directed Acyclic Graph SVM) approaches because they are known to be the most accurate approaches among various MSVM approaches. GAMSVM was implemented using LIBSVM-an open-source software, and Evolver 5.5-a commercial software enables GA. Other comparative models were experimented using various statistical and AI packages such as SPSS for Windows, Neuroshell, and Microsoft Excel VBA (Visual Basic for Applications). Experimental results showed that the proposed model-GAMSVM-outperformed all the competitive models. In addition, the model was found to use less independent variables, but to show higher accuracy. In our experiments, five variables such as X7 (total debt), X9 (sales per employee), X13 (years after founded), X15 (accumulated earning to total asset), and X39 (the index related to the cash flows from operating activity) were found to be the most important factors in predicting the corporate credit ratings. However, the values of the finally selected kernel parameters were found to be almost same among the data subsets. To examine whether the predictive performance of GAMSVM was significantly greater than those of other models, we used the McNemar test. As a result, we found that GAMSVM was better than MDA, MLOGIT, CBR, and ANN at the 1% significance level, and better than OAO and DAGSVM at the 5% significance level.