A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems, v.21 no.1, pp.103-122, 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine every document in full to determine whether it might be useful. For this reason, some online documents are accompanied by a list of keywords specified by the authors to guide users by facilitating the filtering process. A set of keywords is thus often considered a condensed version of the whole document and therefore plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents, including Web pages, email messages, news reports, magazine articles, and business papers, do not benefit from keywords. Although the potential benefit is large, implementation itself is the obstacle: manually assigning keywords to all documents is a daunting, even impractical, task because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to this aim: the keyword assignment approach and the keyword extraction approach. Both use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its terms to the texts. In other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document.
Although this approach is domain dependent and not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. In the latter approach, on the other hand, the aim is to extract keywords according to their relevance in the text without a prior vocabulary. Here, automatic keyword generation is treated as a classification task, and keywords are commonly extracted using supervised learning techniques: keyword extraction algorithms classify candidate keywords in a document as positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as its keywords; as a result, keyword extraction is limited to terms that appear in the document and cannot generate implicit keywords that are not included in it. According to Turney's experimental results, about 64% to 90% of the keywords assigned by authors can be found in the full text of an article. Conversely, 10% to 36% of author-assigned keywords do not appear in the article and therefore cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of author-assigned keywords are not included in the full text. This is why we decided to adopt the keyword assignment approach. In this paper, we propose a new approach to automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets.
The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on its keyword weights; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating the keywords with high similarity scores. Two keyword generation systems applying IVSM were implemented: an IVSM system for a Web-based community service and a stand-alone IVSM system. The first is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers and has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. In our experiment, the precision of IVSM applied to the Web-based community service and to academic journals was 0.75 and 0.71, respectively. Both systems perform much better than baseline systems that generate keywords based on simple probability. IVSM also performs comparably to Extractor, a representative keyword extraction system developed by Turney. As the number of electronic documents grows, we expect that IVSM can be applied to many electronic documents in Web-based communities and digital libraries.
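
The five-step IVSM process can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each candidate keyword set is represented by a hypothetical term-weight vector (for example, weights learned from training documents carrying those keywords), preprocessing is naive whitespace tokenization with a stop-word list, and the names `keyword_sets`, `assign_keywords`, and `top_n` are ours.

```python
import math
from collections import Counter
from typing import Dict, List, Set

TermVector = Dict[str, float]


def vector_length(vec: TermVector) -> float:
    """Euclidean length of a sparse term-weight vector (steps 1 and 3)."""
    return math.sqrt(sum(w * w for w in vec.values()))


def cosine_similarity(a: TermVector, b: TermVector) -> float:
    """Cosine of the angle between two sparse vectors (step 4); 0.0 if either is empty."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    denom = vector_length(a) * vector_length(b)
    return dot / denom if denom else 0.0


def document_vector(text: str, stopwords: Set[str]) -> TermVector:
    """Naive preprocessing (step 2): strip punctuation, lowercase, drop stop words, count terms."""
    tokens = (t.strip(".,;:!?()\"'").lower() for t in text.split())
    return dict(Counter(t for t in tokens if t and t not in stopwords))


def assign_keywords(text: str, keyword_sets: Dict[str, TermVector],
                    stopwords: Set[str], top_n: int = 2) -> List[str]:
    """Rank keyword sets by similarity to the document; return the top ones with score > 0 (step 5)."""
    doc_vec = document_vector(text, stopwords)
    scored = sorted(((cosine_similarity(vec, doc_vec), label)
                     for label, vec in keyword_sets.items()), reverse=True)
    return [label for score, label in scored[:top_n] if score > 0]


if __name__ == "__main__":
    # Hypothetical keyword-set vectors, for illustration only.
    keyword_sets = {
        "logistics": {"port": 2.0, "shipping": 3.0, "cargo": 1.0},
        "credit rating": {"bond": 2.0, "rating": 3.0, "credit": 2.0},
        "text mining": {"keyword": 2.0, "document": 3.0, "text": 2.0},
    }
    doc = "Keyword generation assigns keywords to each document by text similarity."
    print(assign_keywords(doc, keyword_sets, stopwords={"to", "each", "by"}))
    # ['text mining']
```

The "inverse" framing shows up in the call: the target document plays the role a query normally plays, and the keyword sets play the role of the indexed documents.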

Optimization of Multiclass Support Vector Machine using Genetic Algorithm: Application to the Prediction of Corporate Credit Rating (유전자 알고리즘을 이용한 다분류 SVM의 최적화: 기업신용등급 예측에의 응용)

  • Ahn, Hyunchul
    • Information Systems Review, v.16 no.3, pp.161-177, 2014
  • Corporate credit rating assessment consists of complicated processes in which various factors describing a company are taken into consideration. Such assessment is known to be very expensive because domain experts must be employed to assess the ratings. As a result, data-driven corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has received considerable attention from researchers and practitioners. In particular, statistical methods such as multiple discriminant analysis (MDA) and multinomial logistic regression analysis (MLOGIT), and AI methods including case-based reasoning (CBR), artificial neural networks (ANN), and multiclass support vector machines (MSVM) have been applied to corporate credit rating. Among them, MSVM has recently become popular because of its robustness and high prediction accuracy. In this study, we propose a novel optimized MSVM model and apply it to corporate credit rating prediction to enhance accuracy. Our model, named 'GAMSVM (Genetic Algorithm-optimized Multiclass Support Vector Machine),' is designed to simultaneously optimize the kernel parameters and the feature subset selection. Prior studies such as Lorena and de Carvalho (2008) and Chatterjee (2013) show that proper kernel parameters may improve the performance of MSVMs. Also, the results of studies such as Shieh and Yang (2008) and Chatterjee (2013) imply that appropriate feature selection may lead to higher prediction accuracy. Based on these prior studies, we propose to apply GAMSVM to corporate credit rating prediction. As a tool for optimizing the kernel parameters and the feature subset selection, we use a genetic algorithm (GA). GA is known as an efficient and effective search method that simulates biological evolution. By applying genetic operators such as selection, crossover, and mutation, it is designed to improve the search results gradually.
In particular, the mutation operator helps prevent the GA from becoming trapped in local optima, so globally optimal or near-optimal solutions can be found. GA has been widely applied to search for the optimal parameters or feature subsets of AI techniques, including MSVM. For these reasons, we also adopt GA as an optimization tool. To empirically validate the usefulness of GAMSVM, we applied it to a real-world case of credit rating in Korea. Our application is bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. The experimental dataset was collected from a large credit rating company in South Korea. It contained 39 financial ratios of 1,295 companies in the manufacturing industry, along with their credit ratings. Using various statistical methods, including one-way ANOVA and stepwise MDA, we selected 14 financial ratios as the candidate independent variables. The dependent variable, i.e., credit rating, was labeled with four classes: 1 (A1), 2 (A2), 3 (A3), and 4 (B and C). Eighty percent of the data in each class was used for training, and the remaining 20 percent was used for validation. To compensate for the small sample size, we applied five-fold cross-validation to our dataset. To examine the competitiveness of the proposed model, we also tested several comparison models, including MDA, MLOGIT, CBR, ANN, and MSVM. For MSVM, we adopted the One-Against-One (OAO) and DAGSVM (Directed Acyclic Graph SVM) approaches because they are known to be the most accurate among the various MSVM approaches. GAMSVM was implemented using LIBSVM, an open-source package, and Evolver 5.5, a commercial software package that provides GA. The other comparison models were run using various statistical and AI packages such as SPSS for Windows, Neuroshell, and Microsoft Excel VBA (Visual Basic for Applications). The experimental results showed that the proposed model, GAMSVM, outperformed all the competing models.
In addition, the model was found to use fewer independent variables while achieving higher accuracy. In our experiments, five variables, X7 (total debt), X9 (sales per employee), X13 (years since founding), X15 (accumulated earnings to total assets), and X39 (an index related to cash flows from operating activities), were found to be the most important factors in predicting corporate credit ratings. However, the values of the finally selected kernel parameters were almost the same across the data subsets. To examine whether the predictive performance of GAMSVM was significantly greater than that of the other models, we used the McNemar test. As a result, we found that GAMSVM was better than MDA, MLOGIT, CBR, and ANN at the 1% significance level, and better than OAO and DAGSVM at the 5% significance level.
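
The GA search over kernel parameters and feature subsets can be sketched with the standard library alone. This is not GAMSVM itself (the authors used LIBSVM with the commercial Evolver 5.5); it is a minimal sketch assuming a chromosome that encodes `log2 C`, `log2 gamma`, and a 0/1 feature mask, tournament selection, blend/uniform crossover, and Gaussian/bit-flip mutation. The `fitness` argument is a placeholder: in a real setup it would return the cross-validated accuracy of an MSVM trained with those kernel parameters on the selected features. `toy_fitness` is purely hypothetical and exists only so the sketch runs.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical encoding: (log2 of SVM C, log2 of RBF gamma, 0/1 feature mask).
Chromosome = Tuple[float, float, Tuple[int, ...]]


def random_chromosome(n_features: int, rng: random.Random) -> Chromosome:
    """Draw kernel parameters from a log2 range and a random feature subset."""
    return (rng.uniform(-5, 15), rng.uniform(-15, 3),
            tuple(rng.randint(0, 1) for _ in range(n_features)))


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    """Blend the real-valued genes; take each mask bit from either parent."""
    w = rng.random()
    mask = tuple(x if rng.random() < 0.5 else y for x, y in zip(a[2], b[2]))
    return (w * a[0] + (1 - w) * b[0], w * a[1] + (1 - w) * b[1], mask)


def mutate(c: Chromosome, rng: random.Random, rate: float = 0.05) -> Chromosome:
    """Gaussian noise on kernel genes and bit flips on the mask, each with probability `rate`."""
    log_c, log_g, mask = c
    if rng.random() < rate:
        log_c += rng.gauss(0, 1)
    if rng.random() < rate:
        log_g += rng.gauss(0, 1)
    mask = tuple(1 - bit if rng.random() < rate else bit for bit in mask)
    return (log_c, log_g, mask)


def tournament(pop: List[Chromosome], scores: List[float],
               rng: random.Random, k: int = 3) -> Chromosome:
    """Selection: return the fittest of k randomly drawn chromosomes."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def run_ga(fitness: Callable[[Chromosome], float], n_features: int,
           pop_size: int = 30, generations: int = 40, seed: int = 0) -> Chromosome:
    rng = random.Random(seed)
    pop = [random_chromosome(n_features, rng) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        # Elitism: carry the current best chromosome into the next generation.
        children = [pop[max(range(pop_size), key=lambda i: scores[i])]]
        while len(children) < pop_size:
            parent1 = tournament(pop, scores, rng)
            parent2 = tournament(pop, scores, rng)
            children.append(mutate(crossover(parent1, parent2, rng), rng))
        pop = children
        candidate = max(pop, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best


def toy_fitness(c: Chromosome) -> float:
    """Hypothetical stand-in for cross-validated MSVM accuracy."""
    target_mask = (1, 0, 1, 1, 0, 0, 1, 0)
    matches = sum(int(b == t) for b, t in zip(c[2], target_mask))
    return matches - 0.1 * (abs(c[0] - 3) + abs(c[1] + 5))


if __name__ == "__main__":
    best = run_ga(toy_fitness, n_features=8)
    print("log2 C=%.2f, log2 gamma=%.2f, features=%s" % best)
```

Encoding C and gamma on a log2 scale is a common convention for RBF-kernel SVM tuning; elitism guarantees the best solution found never gets worse from one generation to the next.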

Forecasting Substitution and Competition among Previous and New products using Choice-based Diffusion Model with Switching Cost: Focusing on Substitution and Competition among Previous and New Fixed Charged Broadcasting Services (전환 비용이 반영된 선택 기반 확산 모형을 통한 신.구 상품간 대체 및 경쟁 예측: 신.구 유료 방송서비스간 대체 및 경쟁 사례를 중심으로)

  • Koh, Dae-Young;Hwang, Jun-Seok;Oh, Hyun-Seok;Lee, Jong-Su
    • Journal of Global Scholars of Marketing Science, v.18 no.2, pp.223-252, 2008
  • In this study, we propose a choice-based diffusion model with switching cost that can be used to forecast the dynamic substitution and competition between previous and new products at both the individual and aggregate levels, especially when market data for new products are insufficient. We apply the proposed model to the empirical case of substitution and competition among Analog Cable TV, which represents the previous fixed-charge broadcasting service, and Digital Cable TV and Internet Protocol TV (IPTV), which are the new services; we verify the validity of the proposed model and derive related empirical implications. For the empirical application, we obtained data from a survey conducted as follows. The survey was administered by Dongseo Research to 1,000 adults aged 20 to 60 living in Seoul, Korea, in May 2007, under the title 'Demand analysis of next generation fixed interactive broadcasting services'. A modified conjoint survey was used. First, following the traditional approach in conjoint analysis, we extracted 16 hypothetical alternative cards from an orthogonal design using the important attributes and levels of next generation interactive broadcasting services, which were determined through a literature review and experts' comments. We then divided the 16 conjoint cards into 4 groups, composing 4 choice sets with 4 alternatives each, so that each respondent faced 4 different hypothetical choice situations. In addition, we made two modifications. First, we asked respondents to include the broadcasting service they currently subscribe to as another alternative in each choice set. As a result, in each of the 4 choice sets, respondents chose the most preferred of 5 alternatives: their current subscription and 4 hypothetical alternatives.
Modifying the traditional conjoint survey in this way enabled us to estimate factors related to switching cost or switching threshold in addition to the effects of attributes. Also, by combining revealed preference data (the 1 alternative with the current subscription) and stated preference data (the 4 hypothetical alternatives), we gain additional advantages in estimation properties and obtain more conservative and realistic forecasts. Second, we asked respondents to choose the most preferred alternative while considering their expected adoption or switching timing. Respondents reported their expected adoption or switching timing among 14 half-year points after the introduction of next generation broadcasting services. As a result, for each respondent, 14 observations (one per period) with 5 alternatives each are obtained, yielding panel-type data. Finally, this panel-type data, consisting of $4 \times 14 \times 1000 = 56000$ observations, is used to estimate the individual-level consumer adoption model. The empirical results show that forecasting the demand for new products without considering the existence of previous products and/or switching cost factors can overestimate the speed of diffusion at the introductory stage or otherwise distort predictions; this supports the validity of our proposed model, which properly accounts for both the existence of previous products and switching cost factors. We also find that, rather than following an S-shaped curve assumed a priori, the proposed model can produce flexible patterns of market evolution depending on how strongly consumer preferences for the alternatives' attributes affect individual-level state transitions. Empirically, across various scenarios with diverse combinations of prices, IPTV is more likely than Digital Cable TV to gain an advantage in attracting subscribers.
Meanwhile, despite its inferiority in many technological attributes, Analog Cable TV, which is regarded as the previous product in our analysis, is likely to be substituted by the new services gradually rather than abruptly, thanks to its low service charge and the high switching costs in the fixed-charge broadcasting service market.
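
The role of switching cost in a single choice situation can be illustrated with a multinomial logit sketch. This is a simplification under explicit assumptions, not the authors' estimated model: utilities are linear in hypothetical attributes (`price`, `quality`) with made-up coefficients, and a single switching-cost constant is subtracted from every alternative except the respondent's current subscription (the status quo). The paper's model also handles adoption timing over 14 half-year periods and individual-level state transitions, which this sketch omits.

```python
import math
from typing import Dict


def choice_probabilities(alternatives: Dict[str, Dict[str, float]],
                         beta: Dict[str, float],
                         switching_cost: float,
                         status_quo: str) -> Dict[str, float]:
    """Multinomial logit choice probabilities with a switching-cost penalty.

    Every alternative other than `status_quo` has `switching_cost`
    subtracted from its deterministic utility.
    """
    utilities = {}
    for name, attrs in alternatives.items():
        u = sum(beta[k] * v for k, v in attrs.items())
        if name != status_quo:
            u -= switching_cost
        utilities[name] = u
    # Subtract the largest utility before exponentiating for numerical stability.
    u_max = max(utilities.values())
    exp_u = {name: math.exp(u - u_max) for name, u in utilities.items()}
    total = sum(exp_u.values())
    return {name: e / total for name, e in exp_u.items()}


if __name__ == "__main__":
    # Hypothetical attribute values and coefficients, for illustration only.
    alts = {
        "Analog Cable TV": {"price": 1.0, "quality": 0.0},
        "Digital Cable TV": {"price": 2.0, "quality": 1.0},
        "IPTV": {"price": 1.8, "quality": 1.2},
    }
    beta = {"price": -1.0, "quality": 1.5}
    for cost in (0.0, 1.0):
        probs = choice_probabilities(alts, beta, cost, status_quo="Analog Cable TV")
        print(cost, {k: round(v, 3) for k, v in probs.items()})
```

With these made-up numbers, raising the switching cost shifts probability toward the status-quo service, which is the mechanism the abstract credits for Analog Cable TV's gradual rather than abrupt substitution.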

Analysis of the Korea Traditional Colors within the Spatial Arrangement and Form of the Traditional Garden of Seyeonjeong (보길도 세연정(洗然庭)의 공간구조 형식에 내재한 전통색채 분석)

  • Han, Hee-Jeong;Cho, Se-Hwan
    • Journal of the Korean Institute of Traditional Landscape Architecture, v.32 no.4, pp.14-23, 2014
  • The purpose of this study is to contribute to the credibility of a methodology for analyzing the appearance of traditional colors and interpreting the meaning of those appearances, by analyzing the spatial construction and configuration of Seyeonjeong and the traditional colors that appear in the spatial elements of its scenery. We conducted a literature review on traditional colors, the background of the creation of Seyeonjeong, and related topics. For the empirical analysis, we took the scenery and spatial elements in poems such as Eobusasisa and O-u-ga, and the contents of poems related to ojeongsaek (the five Korean traditional colors) based on the Yin-Yang and Five Elements ideology. In particular, after dividing the spatial elements appearing in Seyeonjeong into visual, synesthetic, and symbolic/cognitive spatial elements, we further divided the visual space into the positions and directions of the spaces and the seasonal scenery; the synesthetic space into seasons, time, and the five senses; and the symbolic/cognitive space into chiljeong (the seven passions) and sadan (the four clues). We then analyzed the correlation between the intention behind the garden's creation and the meaning of its spaces by analyzing the ojeongsaek system for each spatial element. First, the spatial structure and form of Seyeonjeong can be divided into two directional axes, southeast and northwest, according to the plan of Seyeonjeong's rectangular palace, with Seyeonjeong as the center. Second, among the spatial component elements, the frequencies of appearance of the traditional colors of Seyeonjeong are 33.2% for white, 20.8% for blue, 20.8% for black, 18.7% for red, and 6.3% for yellow.
Third, based on the analysis of the traditional colors, the most frequent appearance of white leaves room for the interpretation that Seyeonjeong was created to enjoy secular life free of lingering political feelings, so that the high mountains remain clear and clean. Also, the high frequency of blue, the similar frequencies of black and red, and the least frequent appearance of yellow agree with, or can at least be interpreted as related to, Yun Seon-do's intention in creating Seyeonjeong: not for political rank or power, but as a place to enjoy nature, build on his knowledge, and lead the rest of his life as a noble person through pastimes such as dancing and writing poems. Fourth, these interpretations of the frequency of appearance of the traditional colors of Seyeonjeong show the reliability, validity, and consistency of the methodology for analyzing the frequency of traditional colors and interpreting their meanings, given that white also appears most frequently in Soswewon and that the life of Soswewon's creator, Yangsanbo, can be interpreted in a similar way. Above all, this study is significant in that it proposes a theory for the method of analyzing and interpreting traditional colors in a traditional landscape space. Moreover, it is highly significant to have discovered that traditional colors appear in traditional spaces and that this can serve as a methodological framework for interpreting things such as the intentions behind the creation of buildings and architecture.

The Variation of Natural Population of Pinus densiflora S. et Z. in Korea (V) -Characteristics of Needle and Wood of Injye, Jeongsun, Samchuk Populations- (소나무 천연집단(天然集團)의 변이(變異)에 관(關)한 연구(硏究)(V) -인제(麟蹄), 정선(旌善), 삼척집단(三陟集團)의 침엽(針葉) 및 재질형질(材質形質)-)

  • Yim, Kyong Bin;Kwon, Ki Won;Lee, Kyong Jae
    • Journal of Korean Society of Forest Science, v.36 no.1, pp.9-25, 1977
  • As a continuation of the variation studies of natural Pinus densiflora stands, some characteristics of individual trees in three natural populations selected from Kangwon Province, in the central-eastern part of the Korean peninsula, as shown in the location map, were investigated, and the statistical differences between individuals within populations and between populations were analysed. Twenty trees were selected from each population; in doing so, trees that lagged in growth, usually showing poorer form, were excluded. The results obtained are summarized as follows: 1. Although the average population ages ranged between 50 and 63 years, the growth in height and diameter was similar. Population No.9, however, appears at a glance to have better tree forms. Population No.8 showed the highest values in both the clear-stem-length ratio (0.53) and the crown index (0.91). These higher values can result from trees having long lateral branches and a relatively short crown height, which indicates an undesirable crown shape. In terms of fine branching and acute branching angles, population No.9 is considered the better one, whereas there was almost no difference in crown height among populations. 2. The frequency distributions of the ratio of clear-stem height to total height and of the crown indices show some differences between populations. These might be attributed to past stand management practices that altered stand density. 3. For serration density, which averaged 54 per 1 cm of needle length, significant differences exist between individual trees within a population but not between populations. A few trees with extremely high serration density were observed. The same tendencies were found in the number of stomatal rows and resin ducts. 4. Population No.8 had the highest resin duct index (0.074), two to three times that of the other populations. 5. The growth patterns of the average 10-year ring segments were dissimilar up to 30 years of age, but beyond this the trend lines converged.
6. Regarding the average summerwood ratio, there was no difference between populations, but the ranges differed: 23 to 30 in population No.8 and 16 to 36 in population No.9. With regard to the specific gravity of wood, hardly any difference was observed between populations, even in the ranges. Specific gravity increased with tree age, but the patterns of increase differed between populations. 7. No significant differences between populations were detected in the average tracheid length or its range. However, tracheid length increased with age, and the pattern of increase was the same across populations.
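
The between-population comparisons in items 3, 6, and 7 rest on testing whether group means differ. The abstract does not state which test was used; a one-way ANOVA is one standard choice, sketched below in Python under that assumption. The serration-density values in the usage example are made-up illustrative numbers, not the study's data, and the sketch ignores the within-population (between-tree) component, which would need repeated measurements per tree and a nested design.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F statistic, between-group df, within-group df) for a one-way ANOVA."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Made-up serration densities (per 1 cm of needle) for three hypothetical populations.
    pop_a = [52.0, 55.0, 54.0, 57.0, 53.0]
    pop_b = [54.0, 56.0, 51.0, 55.0, 54.0]
    pop_c = [53.0, 55.0, 56.0, 52.0, 54.0]
    f_stat, df1, df2 = one_way_anova([pop_a, pop_b, pop_c])
    print(f"F({df1}, {df2}) = {f_stat:.3f}")
```

The resulting F is compared with the F distribution with (df1, df2) degrees of freedom to obtain a p-value; that lookup is omitted to keep the sketch dependency-free.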

An Analysis on the Conditions for Successful Economic Sanctions on North Korea : Focusing on the Maritime Aspects of Economic Sanctions (대북경제제재의 효과성과 미래 발전 방향에 대한 고찰: 해상대북제재를 중심으로)

  • Kim, Sang-Hoon
    • Strategy21, s.46, pp.239-276, 2020
  • The failure of early economic sanctions aimed at hurting the overall economies of targeted states called for a more sophisticated design of economic sanctions. This paved the way for the advent of 'smart sanctions,' which target the supporters of the regime instead of the general public. Despite controversies over the effectiveness of economic sanctions as a coercive tool for changing the behavior of a targeted state, the transformation from 'comprehensive sanctions' to 'smart sanctions' is gaining the status of a legitimate method of punishing states that do not conform to international norms (in the context of this paper, the nonproliferation of weapons of mass destruction). The five permanent members of the United Nations Security Council have shown that they can agree on imposing economic sanctions rather than on adopting resolutions to wage war against targeted states. The North Korean nuclear issue has been the biggest security threat to countries in the region, even to China, which fears that further development of nuclear weapons in North Korea might trigger a 'domino effect' leading to nuclear proliferation in Northeast Asia. The UNSC adopted economic sanctions as early as 2006, after the first North Korean nuclear test, and has continually strengthened its sanctions measures at each stage of North Korean weapons development. While the effectiveness of early sanctions on North Korea is doubtful, recent sanctions that limit North Korea's exports of coal and imports of oil seem to have had an impact on the regime, inducing Kim Jong-un to commit to peaceful talks since 2018. The purpose of this paper is to add a variable to the factors determining the success of economic sanctions on North Korea: preventing North Korea's evasion efforts through illegal transshipments at sea.
I first analyze the causes of the recent success of the economic sanctions that led Kim Jong-un to engage in talks, and then add the maritime element to the argument. There are three conditions for the success of the sanctions regime: (1) smart sanctions targeting commodities and support groups (elites) vital to regime survival, (2) China's faithful participation in the sanctions regime, and (3) preventing North Korea's maritime evasion efforts.

Construction of Consumer Confidence index based on Sentiment analysis using News articles (뉴스기사를 이용한 소비자의 경기심리지수 생성)

  • Song, Minchae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems, v.23 no.3, pp.1-27, 2017
  • It is known that the economic sentiment index and macroeconomic indicators are closely related, because economic agents' judgments and forecasts of business conditions affect economic fluctuations. For this reason, consumer sentiment or confidence provides steady fodder for business and is treated as an important piece of economic information. In Korea, the consumer sentiment index is highly relevant to both private consumption and GDP, making it a very important indicator for evaluating and forecasting the domestic economic situation. However, despite offering relevant insights into private consumption and GDP, the traditional survey-based approach to measuring consumer confidence has several limits. One weakness is that it takes considerable time to research, collect, and aggregate the data: if urgent issues arise, timely information is not announced until the end of each month. In addition, the survey only contains information derived from its questionnaire items, which can make it difficult to capture the direct effects of newly arising issues. The survey also faces potential declines in response rates and erroneous responses. Therefore, a complementary approach is needed. For this purpose, we construct and assess an index that measures consumers' economic sentiment using sentiment analysis. Unlike survey-based measures, our index relies on textual analysis to extract sentiment from economic and financial news articles. In particular, text data such as news articles and SNS are timely and cover a wide range of issues; because such sources can quickly capture the economic impact of specific economic issues, they have great potential as economic indicators. Of the two main approaches to automatically extracting sentiment from text, we apply the lexicon-based approach, using sentiment lexicon dictionaries of words annotated with their semantic orientations.
In creating the sentiment lexicon dictionaries, we enter the semantic orientation of individual words manually, but we do not attempt a full linguistic analysis (one that involves analysis of word senses or argument structure); this is a limitation of our research, and further work in that direction remains possible. In this study, we generate a time series index of economic sentiment in the news. The construction of the index consists of three broad steps: (1) collecting a large corpus of economic news articles from the web; (2) applying lexicon-based sentiment analysis to each article to score it by sentiment orientation (positive, negative, or neutral); and (3) constructing a consumer economic sentiment index by aggregating monthly time series for each sentiment word. In line with existing scholarly assessments of the relationship between the consumer confidence index and macroeconomic indicators, any new index should be assessed for its usefulness. We examine the new index's usefulness by comparing it, alongside the CSI, with other economic indicators. Trend and cross-correlation analyses are carried out to analyze the relationships and lag structure. Finally, we analyze forecasting power using one-step-ahead out-of-sample prediction. In almost all experiments, the news sentiment index strongly correlates with related contemporaneous key indicators. Furthermore, in most cases, news sentiment shocks predict future economic activity; in head-to-head comparisons, the news sentiment measures outperform survey-based sentiment indices such as the CSI. Policy makers want to understand consumer or public opinion about existing or proposed policies.
Monitoring various web media, SNS, or news articles for such opinions enables government decision-makers to respond quickly. Textual data such as news articles and social networks (Twitter, Facebook, and blogs) are generated at high speed and cover a wide range of issues, which makes them promising economic indicators. Although research using unstructured data in economic analysis is in its early stages, the use of such data is expected to increase greatly once its usefulness is confirmed.
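
Steps (2) and (3) of the index construction can be sketched in Python under stated assumptions. The lexicon is a tiny hypothetical English word list (the study built its own dictionaries for news text), each article is labeled by the sign of its summed word orientations, and the monthly index uses a simple net-balance formula, (positive - negative) / (positive + negative) articles. That formula is our stand-in: the paper aggregates monthly time series for each sentiment word, and the abstract does not give its exact index definition.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def score_article(text: str, lexicon: Dict[str, int]) -> str:
    """Label an article by the sign of its summed word orientations."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def monthly_index(articles: Iterable[Tuple[str, str]],
                  lexicon: Dict[str, int]) -> Dict[str, float]:
    """Net-balance index per month: (pos - neg) / (pos + neg); 0.0 if no polar articles."""
    counts: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"positive": 0, "negative": 0, "neutral": 0})
    for month, text in articles:
        counts[month][score_article(text, lexicon)] += 1
    index = {}
    for month in sorted(counts):
        pos, neg = counts[month]["positive"], counts[month]["negative"]
        index[month] = (pos - neg) / (pos + neg) if pos + neg else 0.0
    return index


if __name__ == "__main__":
    # Hypothetical lexicon and headlines, for illustration only.
    lexicon = {"growth": 1, "recovery": 1, "rise": 1,
               "slump": -1, "decline": -1, "unemployment": -1}
    articles = [
        ("2017-01", "Exports rise as recovery continues"),
        ("2017-01", "Consumption slump deepens"),
        ("2017-02", "Growth outlook improves"),
        ("2017-02", "Markets quiet ahead of holiday"),
    ]
    print(monthly_index(articles, lexicon))
    # {'2017-01': 0.0, '2017-02': 1.0}
```

Neutral articles are counted but excluded from the denominator, so a month dominated by neutral coverage does not dilute the balance between positive and negative news.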

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems, v.18 no.2, pp.29-45, 2012
  • Bond rating is regarded as an important event for measuring financial risk of companies and for determining the investment returns of investors. As a result, it has been a popular research topic for researchers to predict companies' credit ratings by applying statistical and machine learning techniques. The statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have been traditionally used in bond rating. However, one major drawback is that it should be based on strict assumptions. Such strict assumptions include linearity, normality, independence among predictor variables and pre-existing functional forms relating the criterion variablesand the predictor variables. Those strict assumptions of traditional statistics have limited their application to the real world. Machine learning techniques also used in bond rating prediction models include decision trees (DT), neural networks (NN), and Support Vector Machine (SVM). Especially, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that can maximize the margin between two categories. SVM is simple enough to be analyzed mathematical, and leads to high performance in practical applications. SVM implements the structuralrisk minimization principle and searches to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum and thus, overfitting is unlikely to occur with SVM. In addition, SVM does not require too many data sample for training since it builds prediction models by only using some representative sample near the boundaries called support vectors. A number of experimental researches have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can be potential causes for degrading SVM's performance. 
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not perform as well on multi-class problems as SVM does on binary-class problems. Second, approximation algorithms (e.g., decomposition methods and the sequential minimal optimization algorithm) can be used to make multi-class computation efficient and reduce computation time, but they may deteriorate classification performance. Third, multi-class prediction problems often suffer from data imbalance, which occurs when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often cause a default classifier to be built because of a skewed boundary, which reduces the classification accuracy of such a classifier. SVM ensemble learning is one machine learning method for coping with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by training classifiers sequentially while increasing the weight on misclassified observations over the iterations. Observations incorrectly predicted by previous classifiers are chosen more often than those that are correctly predicted. Thus, boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble performs poorly. In this way, it can reinforce the training on misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve multiclass prediction problems.
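The AdaBoost reweighting described above can be sketched as follows. This is plain binary discrete AdaBoost with one-dimensional threshold stumps as weak learners, assuming labels in {-1, +1}; it is not MGM-Boost, whose geometric-mean weighting the abstract does not spell out, and the helper names `fit_stump`, `adaboost`, and `adaboost_predict` are hypothetical.

```python
import math


def fit_stump(xs, ys, w):
    """Return (weighted_error, threshold, polarity) of the best 1-D stump.
    The stump predicts +polarity when x >= threshold, else -polarity."""
    best = None
    for thr in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (pol if x >= thr else -pol) != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best


def adaboost(xs, ys, rounds=10):
    """Discrete AdaBoost: each round fits a stump on the current weights,
    then raises the weight of misclassified observations."""
    n = len(xs)
    w = [1.0 / n] * n                             # start with uniform weights
    ensemble = []
    for _ in range(rounds):
        err, thr, pol = fit_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)     # guard against log(0) / division by zero
        alpha = 0.5 * math.log((1 - err) / err)   # vote weight of this stump
        ensemble.append((alpha, thr, pol))
        # Misclassified points (y * pred = -1) are multiplied by e^alpha, correct ones by e^-alpha.
        for i, (x, y) in enumerate(zip(xs, ys)):
            pred = pol if x >= thr else -pol
            w[i] *= math.exp(-alpha * y * pred)
        total = sum(w)
        w = [wi / total for wi in w]              # renormalize to a distribution
    return ensemble


def adaboost_predict(ensemble, x):
    """Weighted majority vote of the stumps."""
    score = sum(alpha * (pol if x >= thr else -pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1
```

On the toy sequence xs = [1, 2, 3, 4, 5, 6] with labels [1, 1, -1, -1, 1, 1], no single stump is correct everywhere, but after three rounds the weighted vote classifies every training point correctly, because each round concentrates weight on the points the earlier stumps missed.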
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, it can carry out the learning process while considering geometric mean-based accuracy and errors across multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. Ten-fold cross-validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each round of 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is in turn used as the test set while the classifier is trained on the other nine sets. That is, the cross-validated folds were tested independently for each algorithm. Through these steps, we obtained results for each classifier on each of the 30 experiments. In terms of arithmetic mean-based prediction accuracy, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds is significantly different. The results indicate that the performance of MGM-Boost differs significantly from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
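The evaluation protocol above (three repetitions of 10-fold cross-validation with different seeds, arithmetic vs. geometric mean accuracy, and a t-test over the 30 results) can be sketched as below. Assumptions: both accuracy measures are taken here as the arithmetic and geometric means of per-class accuracy (recall), a common convention for imbalanced multi-class data that the abstract does not confirm; the t-test is the paired form, since every classifier is evaluated on the same folds; all function names are hypothetical.

```python
import math
import random


def repeated_kfold(n, k=10, seeds=(1, 2, 3)):
    """Yield (train_idx, test_idx) for len(seeds) rounds of k-fold CV,
    reshuffling the indices with a different seed in each round."""
    for seed in seeds:
        idx = list(range(n))
        random.Random(seed).shuffle(idx)
        folds = [idx[f::k] for f in range(k)]     # k folds whose sizes differ by at most 1
        for f in range(k):
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield train, folds[f]


def class_recalls(y_true, y_pred):
    """Per-class accuracy (recall) for every class present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return recalls


def arithmetic_mean_accuracy(y_true, y_pred):
    r = class_recalls(y_true, y_pred)
    return sum(r) / len(r)


def geometric_mean_accuracy(y_true, y_pred):
    # Becomes 0 as soon as any single class is never predicted correctly.
    r = class_recalls(y_true, y_pred)
    return math.prod(r) ** (1.0 / len(r))


def paired_t_statistic(a, b):
    """t statistic for paired samples (e.g., two classifiers' scores on the same folds)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)   # sample variance of the differences
    return mean / math.sqrt(var / n)
```

The metrics show why the geometric mean is harsher on imbalanced problems: with true labels [0, 0, 1, 1, 2, 2] and predictions [0, 0, 1, 0, 0, 0], class 2 is never recognized, so the geometric mean is 0 while the arithmetic mean is 0.5. Turning the t statistic into a p-value requires the t distribution, for example via `scipy.stats.ttest_rel`, which is not used here.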

Deep Learning-based Professional Image Interpretation Using Expertise Transplant (전문성 이식을 통한 딥러닝 기반 전문 이미지 해석 방법론)

  • Kim, Taejin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.79-104
    • /
    • 2020
  • Recently, as deep learning has attracted attention, its use is being considered as a method for solving problems in various fields. In particular, deep learning is known to perform excellently when applied to unstructured data such as text, sound, and images, and many studies have proven its effectiveness. Owing to the remarkable development of deep learning technology for text and images, interest in image captioning technology and its applications is increasing rapidly. Image captioning is a technique that automatically generates relevant captions for a given image by handling image comprehension and text generation simultaneously. Although image captioning has a high entry barrier, because analysts must be able to process both image and text data, it has established itself as one of the key fields of A.I. research owing to its wide applicability. In addition, many studies have been conducted to improve the performance of image captioning in various respects. Recent studies attempt to create advanced captions that not only describe an image accurately but also convey the information contained in the image in a more sophisticated way. Despite many recent efforts to improve the performance of image captioning, it is difficult to find studies that interpret images from the perspective of domain experts in each field rather than from the perspective of the general public. Even for the same image, the parts of interest may differ according to the professional field of the viewer. Moreover, the way of interpreting and expressing the image also differs according to the level of expertise. The public tends to recognize an image from a holistic and general perspective, that is, by identifying the image's constituent objects and their relationships.
On the contrary, domain experts tend to recognize an image by focusing on the specific elements needed to interpret it based on their expertise. This implies that the meaningful parts of an image differ depending on the viewer's perspective, even for the same image, and image captioning needs to reflect this phenomenon. Therefore, in this study, we propose a method to generate domain-specialized captions for an image by utilizing the expertise of experts in the corresponding domain. Specifically, after pre-training on a large amount of general data, expertise in the field is transplanted through transfer learning with a small amount of expertise data. However, simply applying transfer learning to expertise data may cause another type of problem. Simultaneous learning with captions of various characteristics may cause a so-called 'inter-observation interference' problem, which makes it difficult to learn each characteristic point of view purely. When learning with a vast amount of data, most of this interference is self-purified and has little impact on the learning results. In contrast, in fine-tuning, where learning is performed on a small amount of data, the impact of such interference on learning can be relatively large. To solve this problem, we propose a novel 'Character-Independent Transfer-learning' that performs transfer learning independently for each character. To confirm the feasibility of the proposed methodology, we performed experiments utilizing the results of pre-training on the MSCOCO dataset, which comprises 120,000 images and about 600,000 general captions. Additionally, following the advice of an art therapist, about 300 'image / expertise caption' pairs were created and used for the expertise transplantation experiments.
The experiments confirmed that the proposed methodology generates captions from the perspective of the implanted expertise, whereas captions generated through learning on general data contain much content irrelevant to expert interpretation. In this paper, we propose a novel approach to specialized image interpretation. To achieve this goal, we present a method that uses transfer learning to generate captions specialized in a specific domain. In the future, by applying the proposed methodology to expertise transplantation in various fields, we expect that many studies will be actively conducted to solve the problem of scarce expertise data and to improve the performance of image captioning.
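The core idea of 'Character-Independent Transfer-learning' (starting every character from the same pre-trained weights and fine-tuning each on its own small subset, instead of fine-tuning one copy on all characters at once) can be illustrated with a toy model. This is a hypothetical sketch, not the authors' captioning network: a two-weight linear model trained by squared-error gradient descent stands in for the pre-trained captioner, and `sgd_finetune`, `character_independent_transfer`, and the toy data are assumptions.

```python
def sgd_finetune(weights, data, lr=0.1, epochs=200):
    """Fine-tune a toy linear model (a list of weights) on (x, target) pairs
    with per-example squared-error gradient descent. Returns a new list."""
    w = list(weights)                             # copy, so the pre-trained weights stay intact
    for _ in range(epochs):
        for x, target in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - target
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def character_independent_transfer(pretrained, data_by_character, **kwargs):
    """Fine-tune a separate copy of the pre-trained weights for each character,
    using only that character's examples, so characters cannot interfere."""
    return {c: sgd_finetune(pretrained, d, **kwargs)
            for c, d in data_by_character.items()}
```

With two characters whose targets conflict (one expects outputs of +1, the other -1), independent fine-tuning reaches each character's target, while fine-tuning a single copy on the mixed data settles near 0, a small-scale analogue of the 'inter-observation interference' the abstract describes.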

A study on the Relationship between the Degree of Awareness on Low Carbon Green Growth and the Organizational Commitment Focused on the Traditional Retailers (전통시장 상인들의 저탄소 녹색성장에 대한 인식과 조직몰입의 관계에 대한 연구)

  • Yang, Hoe-Chang;Kim, Sung-Il;Park, Young-Ho;Lee, Shang-Nam
    • Journal of Distribution Science
    • /
    • v.9 no.3
    • /
    • pp.37-46
    • /
    • 2011
  • Since the Korean retail industry was opened to large conglomerates and foreign retail companies, local traditional markets have faced serious problems. To sustain these markets, the Korean government has established various remedial policies, and many scholars have published articles suggesting solutions to the problem. Unfortunately, the results have not been satisfactory. The purpose of this study is to find another way to help the Korean traditional retail market from the viewpoint of the Green Growth Policy, an initiative designed to promote environmentally balanced economic growth in Korea. To survive and maintain sustainable growth, retailers in traditional markets need to understand the concept of the Green Growth Policy. A survey was conducted to measure retailers' awareness of the Green Growth Policy and to determine the relationship between that awareness and the retailers' organizational commitment in local traditional markets. Interestingly, we detected distinguishing features in the traditional market retailers' demographic characteristics (e.g., between older and younger retailers, and between those with lower and higher levels of education). We used analysis of variance (ANOVA) to compare differences across these demographic groups. Overall, awareness of the Green Growth Policy, trust in the government's policy, self-efficacy, and organizational commitment were higher among older traditional market retailers than among younger ones.
Specifically, the differences in trust in government policies (F = 9.964, p < .05), self-efficacy (F = 5.532, p < .05), and organizational commitment (F = 5.697, p < .05) were statistically significant. Moreover, in the comparison across education levels, all variables had higher averages among retailers with higher education. Specifically, the differences in awareness of the Green Growth Policy (F = 8.564, p < .005) and self-efficacy (F = 6.754, p < .005) were statistically significant. These results indicate that the retailers' demographic characteristics should be considered important factors in realizing the policy. The results of the study showed the following: 1) Awareness of the government's Green Growth Policy was significantly related to traditional market retailers' organizational commitment. 2) Trust in the government's policy significantly moderated the relationship between awareness of the Green Growth Policy and organizational commitment. This result suggests that retailers who are aware of the Green Growth Policy show more organizational commitment when their trust in the government's policy is higher. 3) Retailers' self-efficacy fully mediated the relationship between awareness of the government's Green Growth Policy and organizational commitment. The results suggest that the government should take an interest in showing traditional market retailers how to enhance their markets. Implications and future research directions are also discussed.
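The F values reported above come from one-way ANOVA, which compares the between-group variance of a variable (for example, across age or education groups) with its within-group variance. A minimal sketch of the F statistic follows; the function name and the toy scores are illustrative assumptions, not the study's survey data, and obtaining a p-value would additionally require the F distribution (for example via `scipy.stats.f_oneway`).

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic: mean square between groups divided by
    mean square within groups. `groups` is a list of lists of scores."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within)
```

For the toy groups [1, 2, 3] and [4, 5, 6], the group means are 2 and 5, SS_between = 13.5 and SS_within = 4, so F = 13.5 / (4 / 4) = 13.5 with (1, 4) degrees of freedom.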
