Target-Aspect-Sentiment Joint Detection with CNN Auxiliary Loss for Aspect-Based Sentiment Analysis (CNN 보조 손실을 이용한 차원 기반 감성 분석)
-
- Journal of Intelligence and Information Systems
- /
- v.27 no.4
- /
- pp.1-22
- /
- 2021
Aspect Based Sentiment Analysis (ABSA), which analyzes sentiment based on aspects that appear in the text, is drawing attention because it can be used in various business industries. ABSA is a study that analyzes sentiment by aspects for multiple aspects that a text has. It is being studied in various forms depending on the purpose, such as analyzing all targets or just aspects and sentiments. Here, the aspect refers to the property of a target, and the target refers to the text that causes the sentiment. For example, for restaurant reviews, you could set the aspect into food taste, food price, quality of service, mood of the restaurant, etc. Also, if there is a review that says, "The pasta was delicious, but the salad was not," the words "steak" and "salad," which are directly mentioned in the sentence, become the "target." So far, in ABSA, most studies have analyzed sentiment only based on aspects or targets. However, even with the same aspects or targets, sentiment analysis may be inaccurate. Instances would be when aspects or sentiment are divided or when sentiment exists without a target. For example, sentences like, "Pizza and the salad were good, but the steak was disappointing." Although the aspect of this sentence is limited to "food," conflicting sentiments coexist. In addition, in the case of sentences such as "Shrimp was delicious, but the price was extravagant," although the target here is "shrimp," there are opposite sentiments coexisting that are dependent on the aspect. Finally, in sentences like "The food arrived too late and is cold now." there is no target (NULL), but it transmits a negative sentiment toward the aspect "service." Like this, failure to consider both aspects and targets - when sentiment or aspect is divided or when sentiment exists without a target - creates a dual dependency problem. To address this problem, this research analyzes sentiment by considering both aspects and targets (Target-Aspect-Sentiment Detection, hereby TASD). This study detected the limitations of existing research in the field of TASD: local contexts are not fully captured, and the number of epochs and batch size dramatically lowers the F1-score. The current model excels in spotting overall context and relations between each word. However, it struggles with phrases in the local context and is relatively slow when learning. Therefore, this study tries to improve the model's performance. To achieve the objective of this research, we additionally used auxiliary loss in aspect-sentiment classification by constructing CNN(Convolutional Neural Network) layers parallel to existing models. If existing models have analyzed aspect-sentiment through BERT encoding, Pooler, and Linear layers, this research added CNN layer-adaptive average pooling to existing models, and learning was progressed by adding additional loss values for aspect-sentiment to existing loss. In other words, when learning, the auxiliary loss, computed through CNN layers, allowed the local context to be captured more fitted. After learning, the model is designed to do aspect-sentiment analysis through the existing method. To evaluate the performance of this model, two datasets, SemEval-2015 task 12 and SemEval-2016 task 5, were used and the f1-score increased compared to the existing models. When the batch was 8 and epoch was 5, the difference was largest between the F1-score of existing models and this study with 29 and 45, respectively. Even when batch and epoch were adjusted, the F1-scores were higher than the existing models. It can be said that even when the batch and epoch numbers were small, they can be learned effectively compared to the existing models. Therefore, it can be useful in situations where resources are limited. Through this study, aspect-based sentiments can be more accurately analyzed. Through various uses in business, such as development or establishing marketing strategies, both consumers and sellers will be able to make efficient decisions. In addition, it is believed that the model can be fully learned and utilized by small businesses, those that do not have much data, given that they use a pre-training model and recorded a relatively high F1-score even with limited resources.
The wall shear stress in the vicinity of end-to end anastomoses under steady flow conditions was measured using a flush-mounted hot-film anemometer(FMHFA) probe. The experimental measurements were in good agreement with numerical results except in flow with low Reynolds numbers. The wall shear stress increased proximal to the anastomosis in flow from the Penrose tubing (simulating an artery) to the PTFE: graft. In flow from the PTFE graft to the Penrose tubing, low wall shear stress was observed distal to the anastomosis. Abnormal distributions of wall shear stress in the vicinity of the anastomosis, resulting from the compliance mismatch between the graft and the host artery, might be an important factor of ANFH formation and the graft failure. The present study suggests a correlation between regions of the low wall shear stress and the development of anastomotic neointimal fibrous hyperplasia(ANPH) in end-to-end anastomoses. 30523 T00401030523 ^x Air pressure decay(APD) rate and ultrafiltration rate(UFR) tests were performed on new and saline rinsed dialyzers as well as those roused in patients several times. C-DAK 4000 (Cordis Dow) and CF IS-11 (Baxter Travenol) reused dialyzers obtained from the dialysis clinic were used in the present study. The new dialyzers exhibited a relatively flat APD, whereas saline rinsed and reused dialyzers showed considerable amount of decay. C-DAH dialyzers had a larger APD(11.70
The wall shear stress in the vicinity of end-to end anastomoses under steady flow conditions was measured using a flush-mounted hot-film anemometer(FMHFA) probe. The experimental measurements were in good agreement with numerical results except in flow with low Reynolds numbers. The wall shear stress increased proximal to the anastomosis in flow from the Penrose tubing (simulating an artery) to the PTFE: graft. In flow from the PTFE graft to the Penrose tubing, low wall shear stress was observed distal to the anastomosis. Abnormal distributions of wall shear stress in the vicinity of the anastomosis, resulting from the compliance mismatch between the graft and the host artery, might be an important factor of ANFH formation and the graft failure. The present study suggests a correlation between regions of the low wall shear stress and the development of anastomotic neointimal fibrous hyperplasia(ANPH) in end-to-end anastomoses. 30523 T00401030523 ^x Air pressure decay(APD) rate and ultrafiltration rate(UFR) tests were performed on new and saline rinsed dialyzers as well as those roused in patients several times. C-DAK 4000 (Cordis Dow) and CF IS-11 (Baxter Travenol) reused dialyzers obtained from the dialysis clinic were used in the present study. The new dialyzers exhibited a relatively flat APD, whereas saline rinsed and reused dialyzers showed considerable amount of decay. C-DAH dialyzers had a larger APD(11.70
A number of Korean native chicken(KNC) populations were registered in FAO (Food and Agriculture Organization) DAD-IS (Domestic Animal Diversity Information Systems, http://www.fao.org/dad-is). But there is a lack of scientific basis to prove that they are unique population of Korea. For this reason, this study was conducted to prove KNC's uniqueness using 25 Microsatellite markers. A total of 548 chickens from 11 KNC populations (KNG, KNB, KNR, KNW, KNY, KNO, HIC, HYD, HBC, JJC, LTC) and 7 introduced populations (ARA: Araucana, RRC and RRD: Rhode Island Red C and D, LGF and LGK: White Leghorn F and K, COS and COH: Cornish brown and Cornish black) were used. Allele size per locus was decided using GeneMapper Software (v 5.0). A total of 195 alleles were observed and the range was 3 to 14 per locus. The MNA,
The objective of this research is to test the feasibility of developing a statewide truck traffic forecasting methodology for Wisconsin by using Origin-Destination surveys, traffic counts, classification counts, and other data that are routinely collected by the Wisconsin Department of Transportation (WisDOT). Development of a feasible model will permit estimation of future truck traffic for every major link in the network. This will provide the basis for improved estimation of future pavement deterioration. Pavement damage rises exponentially as axle weight increases, and trucks are responsible for most of the traffic-induced damage to pavement. Consequently, forecasts of truck traffic are critical to pavement management systems. The pavement Management Decision Supporting System (PMDSS) prepared by WisDOT in May 1990 combines pavement inventory and performance data with a knowledge base consisting of rules for evaluation, problem identification and rehabilitation recommendation. Without a r.easonable truck traffic forecasting methodology, PMDSS is not able to project pavement performance trends in order to make assessment and recommendations in the future years. However, none of WisDOT's existing forecasting methodologies has been designed specifically for predicting truck movements on a statewide highway network. For this research, the Origin-Destination survey data avaiiable from WisDOT, including two stateline areas, one county, and five cities, are analyzed and the zone-to'||'&'||'not;zone truck trip tables are developed. The resulting Origin-Destination Trip Length Frequency (00 TLF) distributions by trip type are applied to the Gravity Model (GM) for comparison with comparable TLFs from the GM. The gravity model is calibrated to obtain friction factor curves for the three trip types, Internal-Internal (I-I), Internal-External (I-E), and External-External (E-E). ~oth "macro-scale" calibration and "micro-scale" calibration are performed. The comparison of the statewide GM TLF with the 00 TLF for the macro-scale calibration does not provide suitable results because the available 00 survey data do not represent an unbiased sample of statewide truck trips. For the "micro-scale" calibration, "partial" GM trip tables that correspond to the 00 survey trip tables are extracted from the full statewide GM trip table. These "partial" GM trip tables are then merged and a partial GM TLF is created. The GM friction factor curves are adjusted until the partial GM TLF matches the 00 TLF. Three friction factor curves, one for each trip type, resulting from the micro-scale calibration produce a reasonable GM truck trip model. A key methodological issue for GM. calibration involves the use of multiple friction factor curves versus a single friction factor curve for each trip type in order to estimate truck trips with reasonable accuracy. A single friction factor curve for each of the three trip types was found to reproduce the 00 TLFs from the calibration data base. Given the very limited trip generation data available for this research, additional refinement of the gravity model using multiple mction factor curves for each trip type was not warranted. In the traditional urban transportation planning studies, the zonal trip productions and attractions and region-wide OD TLFs are available. However, for this research, the information available for the development .of the GM model is limited to Ground Counts (GC) and a limited set ofOD TLFs. The GM is calibrated using the limited OD data, but the OD data are not adequate to obtain good estimates of truck trip productions and attractions .. Consequently, zonal productions and attractions are estimated using zonal population as a first approximation. Then, Selected Link based (SELINK) analyses are used to adjust the productions and attractions and possibly recalibrate the GM. The SELINK adjustment process involves identifying the origins and destinations of all truck trips that are assigned to a specified "selected link" as the result of a standard traffic assignment. A link adjustment factor is computed as the ratio of the actual volume for the link (ground count) to the total assigned volume. This link adjustment factor is then applied to all of the origin and destination zones of the trips using that "selected link". Selected link based analyses are conducted by using both 16 selected links and 32 selected links. The result of SELINK analysis by u~ing 32 selected links provides the least %RMSE in the screenline volume analysis. In addition, the stability of the GM truck estimating model is preserved by using 32 selected links with three SELINK adjustments, that is, the GM remains calibrated despite substantial changes in the input productions and attractions. The coverage of zones provided by 32 selected links is satisfactory. Increasing the number of repetitions beyond four is not reasonable because the stability of GM model in reproducing the OD TLF reaches its limits. The total volume of truck traffic captured by 32 selected links is 107% of total trip productions. But more importantly, ~ELINK adjustment factors for all of the zones can be computed. Evaluation of the travel demand model resulting from the SELINK adjustments is conducted by using screenline volume analysis, functional class and route specific volume analysis, area specific volume analysis, production and attraction analysis, and Vehicle Miles of Travel (VMT) analysis. Screenline volume analysis by using four screenlines with 28 check points are used for evaluation of the adequacy of the overall model. The total trucks crossing the screenlines are compared to the ground count totals. L V/GC ratios of 0.958 by using 32 selected links and 1.001 by using 16 selected links are obtained. The %RM:SE for the four screenlines is inversely proportional to the average ground count totals by screenline .. The magnitude of %RM:SE for the four screenlines resulting from the fourth and last GM run by using 32 and 16 selected links is 22% and 31 % respectively. These results are similar to the overall %RMSE achieved for the 32 and 16 selected links themselves of 19% and 33% respectively. This implies that the SELINICanalysis results are reasonable for all sections of the state.Functional class and route specific volume analysis is possible by using the available 154 classification count check points. The truck traffic crossing the Interstate highways (ISH) with 37 check points, the US highways (USH) with 50 check points, and the State highways (STH) with 67 check points is compared to the actual ground count totals. The magnitude of the overall link volume to ground count ratio by route does not provide any specific pattern of over or underestimate. However, the %R11SE for the ISH shows the least value while that for the STH shows the largest value. This pattern is consistent with the screenline analysis and the overall relationship between %RMSE and ground count volume groups. Area specific volume analysis provides another broad statewide measure of the performance of the overall model. The truck traffic in the North area with 26 check points, the West area with 36 check points, the East area with 29 check points, and the South area with 64 check points are compared to the actual ground count totals. The four areas show similar results. No specific patterns in the L V/GC ratio by area are found. In addition, the %RMSE is computed for each of the four areas. The %RMSEs for the North, West, East, and South areas are 92%, 49%, 27%, and 35% respectively, whereas, the average ground counts are 481, 1383, 1532, and 3154 respectively. As for the screenline and volume range analyses, the %RMSE is inversely related to average link volume. 'The SELINK adjustments of productions and attractions resulted in a very substantial reduction in the total in-state zonal productions and attractions. The initial in-state zonal trip generation model can now be revised with a new trip production's trip rate (total adjusted productions/total population) and a new trip attraction's trip rate. Revised zonal production and attraction adjustment factors can then be developed that only reflect the impact of the SELINK adjustments that cause mcreases or , decreases from the revised zonal estimate of productions and attractions. Analysis of the revised production adjustment factors is conducted by plotting the factors on the state map. The east area of the state including the counties of Brown, Outagamie, Shawano, Wmnebago, Fond du Lac, Marathon shows comparatively large values of the revised adjustment factors. Overall, both small and large values of the revised adjustment factors are scattered around Wisconsin. This suggests that more independent variables beyond just 226; population are needed for the development of the heavy truck trip generation model. More independent variables including zonal employment data (office employees and manufacturing employees) by industry type, zonal private trucks 226; owned and zonal income data which are not available currently should be considered. A plot of frequency distribution of the in-state zones as a function of the revised production and attraction adjustment factors shows the overall " adjustment resulting from the SELINK analysis process. Overall, the revised SELINK adjustments show that the productions for many zones are reduced by, a factor of 0.5 to 0.8 while the productions for ~ relatively few zones are increased by factors from 1.1 to 4 with most of the factors in the 3.0 range. No obvious explanation for the frequency distribution could be found. The revised SELINK adjustments overall appear to be reasonable. The heavy truck VMT analysis is conducted by comparing the 1990 heavy truck VMT that is forecasted by the GM truck forecasting model, 2.975 billions, with the WisDOT computed data. This gives an estimate that is 18.3% less than the WisDOT computation of 3.642 billions of VMT. The WisDOT estimates are based on the sampling the link volumes for USH, 8TH, and CTH. This implies potential error in sampling the average link volume. The WisDOT estimate of heavy truck VMT cannot be tabulated by the three trip types, I-I, I-E ('||'&'||'pound;-I), and E-E. In contrast, the GM forecasting model shows that the proportion ofE-E VMT out of total VMT is 21.24%. In addition, tabulation of heavy truck VMT by route functional class shows that the proportion of truck traffic traversing the freeways and expressways is 76.5%. Only 14.1% of total freeway truck traffic is I-I trips, while 80% of total collector truck traffic is I-I trips. This implies that freeways are traversed mainly by I-E and E-E truck traffic while collectors are used mainly by I-I truck traffic. Other tabulations such as average heavy truck speed by trip type, average travel distance by trip type and the VMT distribution by trip type, route functional class and travel speed are useful information for highway planners to understand the characteristics of statewide heavy truck trip patternS. Heavy truck volumes for the target year 2010 are forecasted by using the GM truck forecasting model. Four scenarios are used. Fo~ better forecasting, ground count- based segment adjustment factors are developed and applied. ISH 90 '||'&'||' 94 and USH 41 are used as example routes. The forecasting results by using the ground count-based segment adjustment factors are satisfactory for long range planning purposes, but additional ground counts would be useful for USH 41. Sensitivity analysis provides estimates of the impacts of the alternative growth rates including information about changes in the trip types using key routes. The network'||'&'||'not;based GMcan easily model scenarios with different rates of growth in rural versus . . urban areas, small versus large cities, and in-state zones versus external stations. cities, and in-state zones versus external stations.
Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, the realization of flexible storage expansion functions for processing a massive amount of unstructured log data and executing a considerable number of functions to categorize and analyze the stored unstructured log data is difficult in existing computer environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for processing unstructured log data that are difficult to process using the existing computing infrastructure's analysis tools and management system. The proposed system uses the IaaS (Infrastructure as a Service) cloud environment to provide a flexible expansion of computing resources and includes the ability to flexibly expand resources such as storage space and memory under conditions such as extended storage or rapid increase in log data. Moreover, to overcome the processing limits of the existing analysis tool when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. Furthermore, because the HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions for the system to continually operate after it recovers from a malfunction. Finally, by establishing a distributed database using the NoSQL-based Mongo DB, the proposed system provides methods of effectively processing unstructured log data. Relational databases such as the MySQL databases have complex schemas that are inappropriate for processing unstructured log data. Further, strict schemas like those of relational databases cannot expand nodes in the case wherein the stored data are distributed to various nodes when the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases may provide but can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with an appropriate structure for processing unstructured data. The data models of the NoSQL are usually classified as Key-Value, column-oriented, and document-oriented types. Of these, the representative document-oriented data model, MongoDB, which has a free schema structure, is used in the proposed system. MongoDB is introduced to the proposed system because it makes it easy to process unstructured log data through a flexible schema structure, facilitates flexible node expansion when the amount of data is rapidly increasing, and provides an Auto-Sharding function that automatically expands storage. The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies data according to the type of log data and distributes it to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis of the MongoDB module, Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require a real-time log data analysis are stored in the MySQL module and provided real-time by the log graph generator module. The aggregated log data per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are parallel-distributed and processed by the Hadoop-based analysis module. A comparative evaluation is carried out against a log data processing system that uses only MySQL for inserting log data and estimating query performance; this evaluation proves the proposed system's superiority. Moreover, an optimal chunk size is confirmed through the log data insert performance evaluation of MongoDB for various chunk sizes.