Denoising Self-Attention Network for Mixed-type Data Imputation
The Journal of the Korea Contents Association, v.21 no.11, pp.135-144, 2021
Recently, data-driven decision-making has become a key technology leading the data industry, and the machine learning techniques behind it require high-quality training datasets. However, real-world data contain missing values for various reasons, which degrades the performance of prediction models trained on such data. Therefore, to build high-performance models from real-world datasets, many studies have sought to automatically impute missing values in the initial training data. Many conventional machine learning-based imputation techniques are very time-consuming and cumbersome because they apply only to numeric columns or build an individual predictive model for each column. This paper therefore proposes a new data imputation technique called the Denoising Self-Attention Network (DSAN), which can be applied to mixed-type datasets containing both numerical and categorical columns. DSAN learns robust feature representation vectors by combining self-attention and denoising techniques, and can impute multiple missing variables in parallel through multi-task learning. To verify the validity of the proposed technique, data imputation experiments were performed after arbitrarily generating missing values in several mixed-type training datasets. We then demonstrate the validity of the proposed technique by comparing the performance of binary classification models trained on the imputed data, together with the errors between the original and imputed values.
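DSAN's core idea pairs input corruption (denoising) with self-attention over per-column embeddings, so that each column's representation can draw on the others when a value is missing. The following is a minimal numpy sketch of that combination only, not the authors' implementation: the dimensions, the zero-vector corruption token, and the single attention head are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over column vectors."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)              # row-wise softmax
    return w @ V

n_cols, d = 5, 8                                   # assumed sizes
X = rng.normal(size=(n_cols, d))                   # embedded columns of one row
corrupt = rng.random(n_cols) < 0.3                 # denoising: corrupt ~30% of columns
X_noisy = np.where(corrupt[:, None], 0.0, X)       # zero vector marks "missing"
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
H = self_attention(X_noisy, Wq, Wk, Wv)            # context vector per column
print(H.shape)  # (5, 8)
```

In the paper's setting, each row of `H` would feed a per-column prediction head, giving the multi-task reconstruction of all columns in parallel.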
This study acquires NBA statistical information covering 32 years, from 1990 to 2022, using web crawling, observes variables of interest through exploratory data analysis, and generates related derived variables. Unused variables were removed through a cleaning process on the input data, and correlation analysis, t-tests, and ANOVA were performed on the remaining variables. For the variable of interest, the difference in means between the groups that did and did not advance to the playoffs was tested; to complement this, the mean differences among three groups (upper/middle/lower) based on ranking were also confirmed. Of the input data, only the current season's data was used as the test set, and 5-fold cross-validation was performed by dividing the remainder into training and validation sets for model training. Overfitting was addressed by comparing the cross-validation results with the final results on the test set and confirming that there was no difference in the performance metrics. Because the quality of the raw data is high and the statistical assumptions are satisfied, most of the models showed good results despite the small dataset. This study not only predicts NBA game results and classifies playoff advancement using machine learning, but also examines whether the variables of interest rank among the most important input attributes. Visualizing SHAP values made it possible to overcome the limitation that feature importance alone cannot be interpreted, and to compensate for the lack of consistency in importance calculations when variables are added or removed. A number of variables related to three-pointers and errors, classified as subjects of interest in this study, were found among the major variables affecting playoff advancement in the NBA.
Although this study is similar to existing work in sports data analytics in that it deals with topics such as match results, playoffs, and championship prediction and comparatively analyzes several machine learning models, it differs in that the features of interest are set in advance and statistically verified, then compared against the machine learning results. It is also differentiated from existing studies by presenting explanatory visualizations using SHAP, one of the XAI methods.
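The evaluation protocol described above (current season held out as a test set, 5-fold cross-validation on the remaining data) can be sketched as follows. The index-splitting helper is a generic illustration, not the study's code:

```python
def k_fold_indices(n, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    idx, start = list(range(n)), 0
    for size in sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size

# 20 training rows split into 5 folds; every row is validated exactly once.
folds = list(k_fold_indices(20, k=5))
print(len(folds))                                                     # 5
print(sorted(i for _, val in folds for i in val) == list(range(20)))  # True
```

Comparing the per-fold validation scores against the held-out test score, as the study does, is what flags (or rules out) overfitting.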
A Case Study on Venture and Small-Business Executives' Use of Strategic Intuition in the Decision Making Process. This paper is a case study on how venture and small-business executives can take advantage of their intuition when the business environment is increasingly uncertain, when a novel situation arises with no data to draw on, and when rational decision-making is not possible. The case study is based on a literature review, in-depth interviews with 16 business managers, and an analysis of Klein's (1998) generic mental simulation model. The "intuition" discussed in this analysis is classified into two types: expert intuition, which is based on one's own experiences, and strategic intuition, which is based on the experience of others. The case study results revealed that managers used expert intuition and strategic intuition differentially. More specifically, expert intuition was activated effortlessly, while strategic intuition required more time. Also, expert intuition was used mainly for judgments about events that have already happened, while strategic intuition was used more often for judgments regarding future events. The process of strategic intuition involved (1) strategic concerns, (2) the discovery of a medium, (3) primary mental simulation, (4) the offsetting of key parameters, (5) secondary mental simulation, and (6) the decision itself. These steps were used to develop a "Strategic Intuition Decision-making Model" for venture and small-business executives.
The case study results further showed that, first, the success of decision-making was determined at the "secondary mental simulation" stage; second, more management difficulty was encountered when expert intuition was used more than strategic intuition; and lastly, strategic intuition can be taught.
Personalized smart devices such as smartphones and smart pads are widely used. Unlike traditional feature phones, these smart devices allow users to choose from a variety of functions, which support not only daily activities but also business operations. A huge number of applications are accessible to smart device users in online and mobile application markets. Users can choose apps that fit their own tastes and needs, which was impossible for conventional phone users. With the increase in app demand, the tastes and needs of app users are becoming more diverse. To meet these requirements, numerous apps with diverse functions are being released on the market, which leads to fierce competition. Unlike offline markets, online markets have the limitation that purchasing decisions must be made without experiencing the items. Therefore, online customers rely more on the item-related information shown on the item page, where online markets commonly provide details about each item. Customers can gain confidence about the quality of an item through this online information and decide whether to purchase it. The same is true of online app markets. To win the sales competition against other apps that perform similar functions, app developers need to focus on writing app descriptions that attract the attention of customers. If we can measure the effect of app descriptions on sales, independent of an app's price and quality, the descriptions that facilitate sales can be identified. This study intends to provide such a quantitative result for app developers who want to promote the sales of their apps. For this purpose, we collected app details, including descriptions written in Korean, from one of the largest app markets in Korea, and then extracted keywords from the descriptions. Next, the impact of the keywords on sales performance was measured through our econometric model.
Through this analysis, we were able to measure the impact of each keyword itself, apart from that of design or quality. The keywords, comprising the attributes and evaluations of each app, were extracted by a morphological analyzer. Our model, with the keywords as its input variables, was established to analyze their impact on sales performance. A regression analysis was conducted for each category of apps, because we found that the keywords emphasized in app descriptions differ category by category. The analysis, conducted for both free and paid apps, showed which keywords have more impact on sales performance for each type of app. In the analysis of paid apps in the education category, keywords such as 'search+easy' and 'words+abundant' showed higher effectiveness. In the same category, free apps whose keywords emphasize quality showed higher sales performance. One interesting finding is that keywords describing not only the app but also the need for the app have a significant impact. Language learning apps, whether free or paid, showed higher sales performance when their descriptions included the keywords 'foreign language study+important'. This result shows that purchase motivation affected sales. While item reviews are widely researched in online markets, item descriptions are not actively studied. In mobile app markets, newly introduced apps may have few reviews because of low sales volume. In such cases, item descriptions become more important when customers decide whether to purchase. This study is the first attempt to quantitatively analyze the relationship between an item description and its impact on sales performance. The results show that our research framework successfully provides a list of the most effective sales key terms, along with estimates of their effectiveness.
Although this study was performed for a specific type of item (i.e., mobile apps), our model can be applied to almost all items traded in online markets.
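The econometric step can be illustrated with a toy regression of (log) sales performance on binary keyword indicators. Everything below is simulated for illustration: the keyword list is taken from the examples in the text, while the data and coefficients are assumptions, not the study's estimates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical design matrix: one binary indicator per extracted keyword,
# marking whether an app's description contains it.
keywords = ["search+easy", "words+abundant", "foreign language study+important"]
n_apps = 200
X = rng.integers(0, 2, size=(n_apps, len(keywords))).astype(float)
true_beta = np.array([0.4, 0.2, 0.6])             # assumed keyword effects
log_sales = 1.0 + X @ true_beta + rng.normal(0, 0.1, n_apps)

# OLS: regress log sales performance on the keyword indicators.
design = np.column_stack([np.ones(n_apps), X])
beta_hat, *_ = np.linalg.lstsq(design, log_sales, rcond=None)
print(beta_hat[1:])   # estimates close to the assumed effects
```

In the study, one such regression is run per app category (and separately for free and paid apps), and the fitted coefficients rank the keywords by their estimated effect on sales.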
This study investigates when and how disagreements in online customer ratings prompt more favorable product evaluations. Among the three metrics of volume, valence, and variance that feature in the research on online customer ratings, volume and valence have exhibited consistently positive effects on product sales or evaluations (e.g., Dellarocas, Zhang, and Awad 2007; Liu 2006). Ratings variance, or the degree of disagreement among reviewers, however, has shown rather mixed results, with some studies reporting positive effects on product sales (e.g., Clement, Proppe, and Rott 2007) and others finding negative effects on product evaluations (e.g., Zhu and Zhang 2010). This study aims to resolve these contradictory findings by introducing preference heterogeneity as a possible moderator and causal attribution as a mediator to account for the moderating effect. The main proposition of this study is that when preference heterogeneity is perceived as high, a disagreement in ratings is attributed more to reviewers' different preferences than to unreliable product quality, which in turn prompts better quality evaluations of a product. Because disagreements mostly result from differences in reviewers' tastes or the low reliability of a product's quality (Mizerski 1982; Sen and Lerman 2007), a greater level of attribution to reviewer tastes can mitigate the negative effect of disagreement on product evaluations. Specifically, if consumers infer that reviewers' heterogeneous preferences result in subjectively different experiences and thereby highly diverse ratings, they would not discount the overall quality of a product. However, if consumers infer that reviewers' preferences are quite homogeneous and thus the low reliability of the product quality contributes to such disagreements, they would discount the overall product quality.
Therefore, consumers would respond more favorably to disagreements in ratings when preference heterogeneity is perceived as high rather than low. This study furthermore extends this prediction to various levels of average ratings. The heuristic-systematic processing model indicates that engagement in effortful systematic processing occurs only when sufficient motivation is present (Hann et al. 2007; Maheswaran and Chaiken 1991; Martin and Davies 1998). One of the key factors affecting this motivation is the decision maker's aspiration level: only under conditions that meet or exceed this aspiration level does the decision maker tend to engage in systematic processing (Patzelt and Shepherd 2008; Stephanous and Sage 1987). Therefore, systematic causal attribution processing regarding ratings variance is likely more activated when the average rating is high enough to meet the aspiration level than when it is too low to meet it. Considering that the interaction between ratings variance and preference heterogeneity occurs through the mediation of causal attribution, this greater activation of causal attribution under high versus low average ratings would make the interaction between ratings variance and preference heterogeneity more pronounced. Overall, this study proposes that the interaction between ratings variance and preference heterogeneity is more pronounced when the average rating is high than when it is low. Two laboratory studies lend support to these predictions. Study 1 reveals that participants exposed to a high-preference-heterogeneity book title (i.e., a novel) attributed disagreement in ratings more to reviewers' tastes, and thereby evaluated books with such ratings more favorably, compared to those exposed to a low-preference-heterogeneity title (i.e., an English listening practice book).
Study 2 then extended these findings to the various levels of average ratings and found that this greater preference for disagreement options under high preference heterogeneity is more pronounced when the average rating is high compared to when it is low. This study makes an important theoretical contribution to the online customer ratings literature by showing that preference heterogeneity serves as a key moderator of the effect of ratings variance on product evaluations and that causal attribution acts as a mediator of this moderation effect. A more comprehensive picture of the interplay among ratings variance, preference heterogeneity, and average ratings is also provided by revealing that the interaction between ratings variance and preference heterogeneity varies as a function of the average rating. In addition, this work provides some significant managerial implications for marketers in terms of how they manage word of mouth. Because a lack of consensus creates some uncertainty and anxiety over the given information, consumers experience a psychological burden regarding their choice of a product when ratings show disagreement. The results of this study offer a way to address this problem. By explicitly clarifying that there are many more differences in tastes among reviewers than expected, marketers can allow consumers to speculate that differing tastes of reviewers rather than an uncertain or poor product quality contribute to such conflicts in ratings. Thus, when fierce disagreements are observed in the WOM arena, marketers are advised to communicate to consumers that diverse, rather than uniform, tastes govern reviews and evaluations of products.
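The proposed moderation can be illustrated as an interaction term in a toy regression of product evaluations on ratings disagreement and perceived preference heterogeneity. The data and coefficients below are simulated assumptions that merely mirror the predicted direction of effects, not the studies' results.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated consumers: variance = ratings disagreement (0/1),
# hetero = perceived preference heterogeneity (0/1).
n = 400
variance = rng.integers(0, 2, n).astype(float)
hetero = rng.integers(0, 2, n).astype(float)
# Assumed data-generating process: disagreement hurts evaluations,
# but less so when heterogeneity is high (positive interaction).
evaluation = (5.0 - 1.0 * variance + 0.9 * variance * hetero
              + rng.normal(0, 0.2, n))

X = np.column_stack([np.ones(n), variance, hetero, variance * hetero])
beta, *_ = np.linalg.lstsq(X, evaluation, rcond=None)
print(beta[3] > 0)   # positive interaction term, as hypothesized
```

A mediation analysis would add the causal-attribution measure between the interaction and the evaluation, which is what the studies test experimentally.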
As the era of space technology utilization approaches, the launch of the CAS (Compact Advanced Satellite) 500-1/2 satellites is scheduled for 2021 to acquire high-resolution images. Accordingly, increased image usability and processing efficiency have been emphasized as key design concepts of the CAS 500-1/2 ground station. In this regard, the "CAS 500-1/2 Image Acquisition and Utilization Technology Development" project has been carried out to develop core technologies and processing systems for collecting, processing, managing, and distributing CAS 500-1/2 data. In this paper, we introduce the results of this project. We developed an operation system that automatically generates precision images using a GCP (Ground Control Point) chip DB (database) and a DEM (Digital Elevation Model) DB covering the entire Korean peninsula. We also developed a system to produce ortho-rectified images indexed to 1:5,000 map grids, thereby laying a foundation for an ARD (Analysis Ready Data) system. In addition, we linked various application software to the operation system to systematically produce mosaic images, DSM (Digital Surface Model)/DTM (Digital Terrain Model) products, spatial feature thematic maps, and change detection thematic maps. The major contributions of the developed system and technologies are that precision images are generated automatically using a GCP chip DB for the first time in Korea, and that various utilization product technologies are incorporated into the operation system of a satellite ground station. The developed operation system has been installed at the Korea Land Observation Satellite Information Center of the NGII (National Geographic Information Institute). We expect the system to contribute greatly to the center's work and to provide a standard for future ground station systems of earth observation satellites.
I test the hypothesis that the gradual diffusion of information across asset markets leads to cross-asset return predictability in Korea. Using thirty-six industry portfolios and the broad market index as test assets, I establish several key results. First, a number of industries, such as semiconductors, electronics, metals, and petroleum, lead the stock market by up to one month. In contrast, the market, which is widely followed, leads only a few industries. Importantly, an industry's ability to lead the market is correlated with its propensity to forecast various indicators of economic activity, such as industrial production growth. Consistent with my hypothesis, these findings indicate that the market reacts with a delay to information in industry returns about its fundamentals because information diffuses only gradually across asset markets. Traditional theories of asset pricing assume that investors have unlimited information-processing capacity. However, this assumption does not hold for many traders, even the most sophisticated ones. Many economists recognize that investors are better characterized as only boundedly rational (see Shiller(2000), Sims(2001)). Even from casual observation, few traders can pay attention to all sources of information, much less understand their impact on the prices of the assets that they trade. Indeed, a large literature in psychology documents the extent to which attention is a precious cognitive resource (see, e.g., Kahneman(1973), Nisbett and Ross(1980), Fiske and Taylor(1991)). A number of papers have explored the implications of limited information-processing capacity for asset prices; I review this literature in Section II. For instance, Merton(1987) develops a static model of multiple stocks in which investors have information about only a limited number of stocks and trade only those. Related models of limited market participation include Brennan(1975) and Allen and Gale(1994).
As a result, stocks that are less recognized by investors have a smaller investor base (neglected stocks) and trade at a greater discount because of limited risk sharing. More recently, Hong and Stein(1999) develop a dynamic model of a single asset in which information gradually diffuses across the investing public and investors are unable to perform the rational-expectations trick of extracting information from prices. My hypothesis is that the gradual diffusion of information across asset markets leads to cross-asset return predictability. This hypothesis relies on two key assumptions. The first is that valuable information originating in one asset reaches investors in other markets only with a lag, i.e., news travels slowly across markets. The second is that, because of limited information-processing capacity, many (though not necessarily all) investors may not pay attention to, or be able to extract information from, the asset prices of markets that they do not participate in. These two assumptions taken together lead to cross-asset return predictability. This hypothesis is plausible for a few reasons. To begin with, as pointed out by Merton(1987) and the subsequent literature on segmented markets and limited market participation, few investors trade all assets. Put another way, limited participation is a pervasive feature of financial markets. Indeed, even among equity money managers, there is specialization along industries, as with sector or market-timing funds. Some reasons for this limited participation include tax, regulatory, or liquidity constraints. More plausibly, investors have to specialize because they have their hands full trying to understand the markets that they do participate in.
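The lead-lag pattern described here is typically tested with a predictive regression of this month's market return on last month's industry return. A toy version on simulated data follows; the 0.3 lead coefficient and the noise levels are assumptions for illustration, not estimates from the Korean data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated monthly returns in which an industry leads the market
# by one month (illustrative only; the study uses 36 industry portfolios).
T = 240
industry = rng.normal(0, 0.05, T)
market = 0.3 * np.roll(industry, 1) + rng.normal(0, 0.05, T)
market[0] = rng.normal(0, 0.05)                     # no valid lag at t = 0

# Predictive regression: market return at t on the industry return at t-1.
X = np.column_stack([np.ones(T - 1), industry[:-1]])
y = market[1:]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[1])   # slope estimate near the assumed lead coefficient of 0.3
```

A significantly positive slope is the signature of gradual information diffusion: the industry's news is impounded into the market index only with a delay.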
The wall shear stress in the vicinity of end-to-end anastomoses under steady flow conditions was measured using a flush-mounted hot-film anemometer (FMHFA) probe. The experimental measurements were in good agreement with numerical results except in flows with low Reynolds numbers. The wall shear stress increased proximal to the anastomosis in flow from the Penrose tubing (simulating an artery) to the PTFE graft. In flow from the PTFE graft to the Penrose tubing, low wall shear stress was observed distal to the anastomosis. Abnormal distributions of wall shear stress in the vicinity of the anastomosis, resulting from the compliance mismatch between the graft and the host artery, might be an important factor in anastomotic neointimal fibrous hyperplasia (ANFH) formation and graft failure. The present study suggests a correlation between regions of low wall shear stress and the development of ANFH in end-to-end anastomoses.
Air pressure decay (APD) rate and ultrafiltration rate (UFR) tests were performed on new and saline-rinsed dialyzers as well as those reused in patients several times. C-DAK 4000 (Cordis Dow) and CF IS-11 (Baxter Travenol) reused dialyzers obtained from the dialysis clinic were used in the present study. The new dialyzers exhibited a relatively flat APD, whereas saline-rinsed and reused dialyzers showed a considerable amount of decay. C-DAK dialyzers had a larger APD (11.70