• Title/Summary/Keyword: Information analysis system


Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.141-166
    • /
    • 2019
  • Recently, channels such as social media and SNS have been generating enormous amounts of data, and the portion of unstructured data represented as text has grown geometrically. Because it is impractical to read all of this text, it is important to access it quickly and grasp its key points. To meet this need for efficient understanding, many text summarization studies have been proposed for handling huge volumes of text data. In particular, many recent methods use machine learning and artificial intelligence algorithms to generate summaries objectively and effectively, an approach called "automatic summarization". However, most summarization methods proposed to date construct the summary based on the frequency of contents in the original documents. Such summaries tend to omit low-weight subjects that are mentioned less often in the original text. If a summary contains only the major subjects, bias occurs and information is lost, making it difficult to ascertain every subject the documents cover. To avoid this bias, one can summarize with balance across the topics a document contains so that every subject can be identified, but an unbalanced distribution across subjects still remains. To retain subject balance in the summary, it is necessary to consider the proportion of every subject the documents originally contain and to allocate portions to the subjects equally, so that even sentences on minor subjects are sufficiently included. In this study, we propose a "subject-balanced" text summarization method that maintains balance across all subjects and minimizes the omission of low-frequency subjects. For subject-balanced summarization, we use two summary evaluation criteria, "completeness" and "succinctness". Completeness means that the summary should fully cover the contents of the original documents, and succinctness means that the summary should contain minimal internal duplication. The proposed method has three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate how strongly each term is related to each topic. From these weights, highly related terms can be identified for every topic, and the subjects of the documents can be found from topics composed of terms with similar meanings. A few terms that represent each subject well are then selected; we call these "seed terms". However, the seed terms alone are too few to explain each subject, so additional terms similar to the seed terms are needed for a well-constructed subject dictionary. Word2Vec is used for this word expansion: after training a Word2Vec model, the similarity between any two terms can be derived from their word vectors using cosine similarity. The higher the cosine similarity between two terms, the stronger their relationship. Terms with high similarity to the seed terms of each subject are selected, and after filtering these expanded terms the subject dictionary is finally constructed. The next phase allocates a subject to every sentence in the original documents. To grasp the contents of each sentence, frequency analysis is first conducted on the terms in the subject dictionaries. A TF-IDF weight for each subject is then calculated, indicating how much each sentence is about each subject. Because TF-IDF weights can grow without bound, the subject weights of each sentence are normalized to values between 0 and 1. Each sentence is then assigned to the subject with the maximum TF-IDF weight, producing a sentence group for each subject. The last phase is summary generation. Sen2Vec is used to measure the similarity between the sentences of each subject, forming a similarity matrix. By iteratively selecting sentences, a summary is generated that fully covers the contents of the original documents while minimizing internal duplication. For evaluation of the proposed method, 50,000 TripAdvisor reviews were used to construct the subject dictionaries and 23,087 reviews were used to generate summaries. A comparison between summaries from the proposed method and frequency-based summaries verified that the proposed method better preserves the balance of the subjects that the documents originally contain.
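The subject-allocation step described above can be sketched as follows. This is a minimal illustration that assumes the per-subject TF-IDF weights have already been computed; the function and variable names are hypothetical, not from the paper:

```python
import numpy as np

def allocate_subjects(tfidf, subjects):
    """Assign each sentence to its highest-weight subject.

    tfidf: (n_sentences, n_subjects) array of per-subject TF-IDF weights.
    Weights are min-max normalized to [0, 1] per subject column, one
    reading of the paper's normalization of unbounded TF-IDF values.
    """
    w = np.asarray(tfidf, dtype=float)
    lo, hi = w.min(axis=0), w.max(axis=0)
    norm = (w - lo) / np.where(hi > lo, hi - lo, 1.0)
    groups = {s: [] for s in subjects}
    for i, row in enumerate(norm):
        # Each sentence joins the group of its maximum-weight subject.
        groups[subjects[int(row.argmax())]].append(i)
    return groups

# Toy example: 4 sentences scored against 2 hypothetical subjects.
weights = [[3.0, 0.5],
           [0.2, 2.0],
           [1.5, 0.1],
           [0.0, 4.0]]
print(allocate_subjects(weights, ["room", "food"]))
```

The resulting per-subject sentence groups would then feed the Sen2Vec-based selection phase.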

Analysis of Surveys to Determine the Real Prices of Ingredients used in School Foodservice (학교급식 식재료별 시장가격 조사 실태 분석)

  • Lee, Seo-Hyun;Lee, Min A;Ryoo, Jae-Yoon;Kim, Sanghyo;Kim, Soo-Youn;Lee, Hojin
    • Korean Journal of Community Nutrition
    • /
    • v.26 no.3
    • /
    • pp.188-199
    • /
    • 2021
  • Objectives: The purpose was to identify the ingredients that are usually surveyed for assessing real prices and to present the demand by nutrition teachers and dietitians for such surveys of ingredients used by school foodservice. Methods: A survey was conducted online from December 2019 to January 2020. The survey questionnaire was distributed to 1,158 nutrition teachers and dietitians from elementary, middle, and high schools nationwide, and 439 of the 1,158 (37.9% return rate) were collected and used for data analysis. Results: The ingredients investigated for price realities directly by schools were industrial products in 228 schools (51.8%), fruits in 169 schools (38.4%), and specialty crops in 166 schools (37.7%). Moreover, nutrition teachers and dietitians in elementary, middle, and high schools searched for the real prices of ingredients in different ways. In elementary schools, there was high demand for price information about grains, vegetables or root and tuber crops, special crops, fruits, eggs, fish, and organic and locally grown ingredients from the School Foodservice Support Centers. Real price information about meats, industrial products, and pickled processed products was sought from external specialized institutions. In addition, nutrition teachers and dietitians in middle and high schools wanted to obtain the prices of all ingredients from the Offices of Education or the District Offices of Education. Conclusions: Schools want to use the time and money spent researching the real prices of ingredients efficiently, either through reputable organizations or by co-working with other nutrition teachers and dietitians. The results of this study will be useful in understanding the current status of the surveys carried out to determine real price information for ingredients used by school foodservice.

Investigation of Study Items for the Patterns of Care Study in the Radiotherapy of Laryngeal Cancer: Preliminary Results (후두암의 방사선치료 Patterns of Care Study를 위한 프로그램 항목 개발: 예비 결과)

  • Chung Woong-Ki;Kim Il-Han;Ahn Sung-Ja;Nam Taek-Keun;Oh Yoon-Kyeong;Song Ju-Young;Nah Byung-Sik;Chung Gyung-Ai;Kwon Hyoung-Cheol;Kim Jung-Soo;Kim Soo-Kon;Kang Jeong-Ku
    • Radiation Oncology Journal
    • /
    • v.21 no.4
    • /
    • pp.299-305
    • /
    • 2003
  • Purpose: In order to develop national guidelines for the standardization of radiotherapy, we are planning to establish a web-based, on-line database system for laryngeal cancer. As a first step, this study was performed to accumulate basic clinical information on laryngeal cancer and to determine the items needed for the database system. Materials and Methods: We analyzed the clinical data of patients who were treated under the diagnosis of laryngeal cancer from January 1998 through December 1999 in the southwest area of Korea. Eligibility criteria for the patients were as follows: 18 years or older, currently diagnosed with primary epithelial carcinoma of the larynx, and no history of previous treatment for other cancers or other laryngeal diseases. The items were developed and filled out by radiation oncologists who are members of the Korean Southwest Radiation Oncology Group. SPSS v10.0 software was used for statistical analysis. Results: Data on forty-five patients were collected. The age of the patients ranged from 28 to 88 years (median, 61). Laryngeal cancer occurred predominantly in males (10:1 sex ratio). Twenty-eight patients (62%) had primary cancers in the glottis and 17 (38%) in the supraglottis. Most were diagnosed pathologically with squamous cell carcinoma (44/45, 98%). Twenty-four of 28 glottic cancer patients (86%) had AJCC (American Joint Committee on Cancer) stage I/II disease, compared with 50% (8/16) of supraglottic cancer patients (p=0.02). Most patients (89%) had the symptom of hoarseness. Indirect laryngoscopy was done in all patients and direct laryngoscopy was performed in 43 (98%) patients. Twenty-one of 28 (75%) glottic cancer cases and 6 of 17 (35%) supraglottic cancer cases were treated with radiation alone, respectively. Combined surgery and radiation was used in 5 (18%) glottic and 8 (47%) supraglottic patients, and chemotherapy and radiation in 2 (7%) glottic and 3 (18%) supraglottic patients. There was no statistically significant difference in the use of combined modality treatments between glottic and supraglottic cancers (p=0.20). In all patients, 6 MV X-rays were used with conventional fractionation. The fraction size was 2 Gy in 80% of glottic cancer patients, compared with 1.8 Gy in 59% of patients with supraglottic cancers. The mean total dose delivered to the primary lesion was 65.98 Gy in glottic and 70.15 Gy in supraglottic patients treated with radiation alone. Based on the collected data, 12 modules with 90 items were developed for the study of the patterns of care in laryngeal cancer. Conclusion: The study items for laryngeal cancer were developed. In the near future, a web system will be established based on the items investigated, and a nation-wide analysis of laryngeal cancer will then be carried out for the standardization and optimization of radiotherapy.

Estimation of GARCH Models and Performance Analysis of Volatility Trading System using Support Vector Regression (Support Vector Regression을 이용한 GARCH 모형의 추정과 투자전략의 성과분석)

  • Kim, Sun Woong;Choi, Heung Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.107-122
    • /
    • 2017
  • Volatility in stock market returns is a measure of investment risk. It plays a central role in portfolio optimization, asset pricing, and risk management, as well as in most theoretical financial models. Engle (1982) presented a pioneering paper on stock market volatility that explains the time-variant characteristics embedded in stock market return volatility. His model, Autoregressive Conditional Heteroscedasticity (ARCH), was generalized by Bollerslev (1986) as the GARCH models. Empirical studies have shown that GARCH models describe well the fat-tailed return distributions and the volatility clustering phenomenon appearing in stock prices. The parameters of GARCH models are generally estimated by maximum likelihood estimation (MLE) based on the standard normal density. Since Black Monday in 1987, however, stock market prices have become very complex and noisy, and recent studies have started to apply artificial intelligence approaches to estimating the GARCH parameters as a substitute for MLE. This paper presents an SVR-based GARCH estimation process and compares it with the MLE-based process for estimating the parameters of GARCH models, which are known to forecast stock market volatility well. The kernel functions used in the SVR estimation process are linear, polynomial, and radial. We analyzed the suggested models with the KOSPI 200 Index, which is composed of 200 blue-chip stocks listed on the Korea Exchange. We sampled KOSPI 200 daily closing values from 2010 to 2015, giving 1,487 daily observations; 1,187 days were used to train the suggested GARCH models and the remaining 300 days were used as testing data. First, symmetric and asymmetric GARCH models were estimated by MLE. We forecasted KOSPI 200 Index return volatility, and the MSE metric shows better results for the asymmetric GARCH models such as E-GARCH and GJR-GARCH. This is consistent with the documented non-normal return distribution characteristics of fat tails and leptokurtosis. Compared with the MLE estimation process, SVR-based GARCH models outperform the MLE methodology in KOSPI 200 Index return volatility forecasting, although the polynomial kernel shows exceptionally low forecasting accuracy. We suggest an Intelligent Volatility Trading System (IVTS) that utilizes the forecasted volatility. The IVTS entry rules are as follows: if tomorrow's forecasted volatility rises, buy volatility today; if it falls, sell volatility today; if the forecasted direction does not change, hold the existing buy or sell position. IVTS is assumed to buy and sell historical volatility values. This is somewhat unrealistic because we cannot trade historical volatility values themselves, but our simulation results are meaningful since the Korea Exchange introduced a volatility futures contract in November 2014 that traders can trade. The trading systems with SVR-based GARCH models show higher returns than MLE-based GARCH in the testing period. The profitable-trade percentages of MLE-based GARCH IVTS models range from 47.5% to 50.0%, while those of SVR-based GARCH IVTS models range from 51.8% to 59.7%. MLE-based symmetric S-GARCH shows a +150.2% return and SVR-based symmetric S-GARCH shows +526.4%; MLE-based asymmetric E-GARCH shows -72% and SVR-based E-GARCH shows +245.6%; MLE-based asymmetric GJR-GARCH shows -98.7% and SVR-based GJR-GARCH shows +126.3%. The linear kernel shows higher trading returns than the radial kernel. The best performance of the SVR-based IVTS is +526.4% and that of the MLE-based IVTS is +150.2%. SVR-based GARCH IVTS also shows higher trading frequency. This study has some limitations. Our models are based solely on SVR; other artificial intelligence models should be explored for better performance. We do not consider costs incurred in the trading process, including brokerage commissions and slippage, and the IVTS trading performance is unrealistic since we use historical volatility values as trading objects. Exact forecasting of stock market volatility is essential in real trading as well as in asset pricing models. Further studies on other machine learning-based GARCH models can give better information to stock market investors.
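The GARCH variance recursion being estimated and the IVTS entry rules can be sketched as follows. This is a minimal illustration with hypothetical names and fixed GARCH(1,1) parameters; in the paper the parameters are estimated by MLE or SVR rather than given:

```python
def garch11_variance(returns, omega, alpha, beta):
    """GARCH(1,1) conditional variance recursion:
    sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}.
    Starts from the unconditional variance (requires alpha + beta < 1)."""
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2

def ivts_signal(vol_today, vol_forecast, position):
    """IVTS entry rules from the paper: buy volatility if the forecast
    rises, sell if it falls, otherwise hold the existing position."""
    if vol_forecast > vol_today:
        return "buy"
    if vol_forecast < vol_today:
        return "sell"
    return position

print(ivts_signal(0.012, 0.015, "hold"))  # forecast rises -> "buy"
```

The forecasted sigma2 series would drive the daily signal in a backtest loop.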

Feasibility of Deep Learning Algorithms for Binary Classification Problems (이진 분류문제에서의 딥러닝 알고리즘의 활용 가능성 평가)

  • Kim, Kitae;Lee, Bomi;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.1
    • /
    • pp.95-108
    • /
    • 2017
  • Recently, AlphaGo, the Baduk (Go) artificial intelligence program by Google DeepMind, achieved a decisive victory against Lee Sedol. Many people thought that machines would not be able to beat a human at Go because, unlike chess, the number of possible move sequences is greater than the number of atoms in the universe, but the result was the opposite of what people predicted. After the match, artificial intelligence came into focus as a core technology of the fourth industrial revolution and attracted attention from various application domains. In particular, deep learning drew attention as the core artificial intelligence technique used in the AlphaGo algorithm. Deep learning is already being applied to many problems and shows especially good performance in the image recognition field. It also performs well on high-dimensional data such as voice, images, and natural language, where good performance was difficult to achieve with existing machine learning techniques. In contrast, however, it is difficult to find deep learning research on traditional business data and structured data analysis. In this study, we tried to find out whether the deep learning techniques studied so far can be used not only for the recognition of high-dimensional data but also for binary classification problems in traditional business data analysis, such as customer churn analysis, marketing response prediction, and default prediction, and we compared the performance of deep learning techniques with that of traditional artificial neural network models. The experimental data in this paper are the telemarketing response data of a bank in Portugal. It has input variables such as age, occupation, loan status, and the number of previous telemarketing contacts, and a binary target variable that records whether the customer intends to open an account. To evaluate the applicability of deep learning algorithms and techniques to binary classification problems, we compared the performance of various models using the CNN and LSTM algorithms and the dropout technique, which are widely used in deep learning, with that of MLP models, a traditional artificial neural network. However, since not all network design alternatives can be tested given the nature of artificial neural networks, the experiment was conducted with restricted settings for the number of hidden layers, the number of neurons per hidden layer, the number of output filters, and the application conditions of the dropout technique. The F1 score was used to evaluate the models, to show how well they classify the class of interest rather than overall accuracy. The detailed methods for applying each deep learning technique are as follows. The CNN algorithm reads adjacent values around a specific value and recognizes features, but the proximity of business data fields usually carries no meaning because each field is typically independent. In this experiment, we therefore set the filter size of the CNN to the number of fields, so that it learns the characteristics of a whole record at once, and added a hidden layer to make decisions based on the extracted features. For the model with two LSTM layers, the input direction of the second layer was reversed relative to the first in order to reduce the influence of each field's position. For the dropout technique, we set neurons to drop with a probability of 0.5 in each hidden layer. The experimental results show that the model with the highest F1 score was the CNN model using dropout, followed by the MLP model with two hidden layers using dropout. From the experiment we obtained several findings. First, models using dropout make slightly more conservative predictions than those without it, and generally show better classification performance. Second, CNN models show better classification performance than MLP models. This is interesting because the CNN performed well on a binary classification problem, to which it has rarely been applied, as well as in the fields where its effectiveness has already been proven. Third, the LSTM algorithm seems unsuitable for binary classification problems because its training time is too long relative to its performance improvement. From these results, we can confirm that some deep learning algorithms can be applied to solve business binary classification problems.
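As an illustration of the F1 criterion used above, here is a minimal, self-contained computation; the toy labels are invented for the example and are not from the paper's bank data:

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall on the class of interest,
    used instead of overall accuracy when classes are imbalanced."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# A classifier that finds 2 of 3 positives with 1 false alarm:
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(round(f1_score(y_true, y_pred), 3))  # precision = recall = 2/3, so F1 = 0.667
```

On a highly imbalanced target such as telemarketing response, this score stays low for a model that simply predicts the majority class, which plain accuracy would reward.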

Moderating Effect of Lifestyle on Consumer Behavior of Loungewear with Korean Traditional Fashion Design Elements (소비자대함유한국전통시상설계원소적편복적소비행위지우생활방식적조절작용(消费者对含有韩国传统时尚设计元素的便服的消费行为之于生活方式的调节作用))

  • Ko, Eun-Ju;Lee, Jee-Hyun;Kim, Angella Ji-Young;Burns, Leslie Davis
    • Journal of Global Scholars of Marketing Science
    • /
    • v.20 no.1
    • /
    • pp.15-26
    • /
    • 2010
  • Due to globalization across various industries and cultural trade among many countries, oriental concepts have been attracting worldwide attention. In the fashion industry, a traditional culture is often developed as a fashion theme for designers' creations and has become a strong strategy for standing out among competitors. Because of the increasing preference for oriental images, opportunities abound to introduce traditional fashion goods and expand culture-based business into global fashion markets. However, global fashion brands that incorporate Korean traditional culture are yet to be developed. In order to develop a global fashion brand with Korean taste, it is very important for native citizens to accept their own culture in the domestic apparel market prior to expansion into foreign markets. Loungewear is considered appropriate for adopting Korean traditional details into clothing, since this wardrobe category embraces various purposes, which easily leads to natural adaptation and widespread use. Also, this market is seeing increased demand for multipurpose wardrobes and fashionable underwear (Park et al. 2009). Despite rapid growth in the loungewear market, specific studies of loungewear are rare; and among research on developing modernized traditional clothing, fashion items and brands do not always include the loungewear category. Therefore, this study investigated the Korean loungewear market and consumer evaluation of loungewear with Korean traditional fashion design elements. Relationships among antecedents of purchase intention for Korean traditional fashion design elements were analyzed and compared between lifestyle groups for consumer-targeting purposes.
Product quality, retail service quality, perceived value, and preference for loungewear with Korean traditional design elements were chosen as antecedents of purchase intention, and a structural equation model was designed to examine their relationships as well as their influence on purchase intention. Product quality and retail service quality among the marketing mix were employed as factors affecting preference and perceived value of loungewear with Korean traditional fashion design elements. The effects of preference and perceived value on purchase intention were also examined through the same model. A total of 357 self-administered questionnaires were completed by female consumers via a web survey system. A questionnaire was developed to measure the samples' lifestyle, product and retail service quality as purchasing criteria, and perceived value, preference, and purchase intention of loungewear with Korean traditional fashion design elements. Loungewear purchasing and usage behavior were also surveyed in order to examine the status of the Korean loungewear market. Data were analyzed through descriptive analysis, factor analysis, cluster analysis, and ANOVA, and the structural equation model was tested via AMOS 7.0. As a result of the market status investigation, loungewear had been purchased by most of the consumers in our sample. Loungewear is currently recognized as clothing worn at home, and consumers show comparably low involvement with loungewear. Most consumers in this study purchase loungewear only two to three times a year and spend less than US$10. A total of 12 items and four factors of loungewear consumer lifestyle were found: traditional value oriented lifestyle, brand-affected lifestyle, pursuit of leisure lifestyle, and health oriented lifestyle. Drawing on these lifestyle factors, loungewear consumers were classified into two groups: Well-being and Conservative.
Relationships among the constructs of purchasing behavior related to loungewear with Korean traditional fashion design elements were estimated. Preference and perceived value of loungewear were affected by both product quality and retail service quality. This study showed that high product and retail service quality develop positive preference toward loungewear. Perceived value and preference of loungewear positively influenced purchase intention. The results indicated that high preference and perceived value of loungewear with Korean traditional fashion design elements strengthen purchase intention, and demonstrated the importance of developing preference and elevating perceived value in order to generate sales. In a model comparison between the two lifestyle groups, Well-being and Conservative, results showed that product quality and retail service quality had positive influences on both preference and perceived value in the case of the Well-being group. For the Conservative group, however, only retail service quality had a positive effect on preference and its influence on purchase intention. Since the Well-being group showed a more significant influence on purchase intention, loungewear brands with Korean traditional fashion design elements may want to focus on the characteristics of the Well-being group. However, the Conservative group's relationship between preference and purchase intention was stronger, so such brands should also focus on creating conservative consumers' positive preference toward loungewear. The results offer information on Korean loungewear consumers' lifestyles and provide useful information for fashion brands planning to enter the Korean loungewear market, particularly those targeting female consumers similar to the sample of the present study.
This study offers strategic and marketing insight for loungewear brands and also for fashion brands that are planning to create highly value-added fashion brands with Korean traditional fashion design elements. Considering the different lifestyle groups associated with loungewear or traditional fashion goods, brand managers and marketers can use the results of this paper as a reference for positioning, targeting, and marketing strategy building.

Analysis of the Time-dependent Relation between TV Ratings and the Content of Microblogs (TV 시청률과 마이크로블로그 내용어와의 시간대별 관계 분석)

  • Choeh, Joon Yeon;Baek, Haedeuk;Choi, Jinho
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.163-176
    • /
    • 2014
  • Social media is becoming the platform for users to communicate their activities, status, emotions, and experiences to other people. In recent years, microblogs such as Twitter have gained popularity because of their ease of use, speed, and reach. Compared to a conventional web blog, a microblog lowers users' effort and investment in content generation by encouraging shorter posts. There has been a lot of research into capturing social phenomena and analyzing the chatter of microblogs, but measuring television ratings has received little attention so far. Currently, the most common method of measuring TV ratings uses an electronic metering device installed in a small number of sampled households. Microblogs allow users to post short messages, share daily updates, and conveniently keep in touch; in a similar way, microblog users interact with each other while watching television or movies, or visiting a new place. For measuring TV ratings, some features are significant during certain hours of the day or days of the week, whereas the same features are meaningless during other time periods. Thus, the importance of features can change during the day, and a model capturing this time-sensitive relevance is required to estimate TV ratings. Modeling the time-related characteristics of features should therefore be key when measuring TV ratings through microblogs. We show that capturing the time-dependency of features is vitally necessary for improving the accuracy of TV ratings measurement. To explore the relationship between the content of microblogs and TV ratings, we collected Twitter data using the Get Search component of the Twitter REST API from January 2013 to October 2013. There are about 300 thousand posts in our data set. After excluding data such as advertising or promoted tweets, we selected 149 thousand tweets for analysis. The number of tweets reaches its maximum on the broadcasting day and increases rapidly around the broadcasting time. This result stems from the characteristics of a public channel, which broadcasts the program at a predetermined time. From our analysis, we find that count-based features such as the number of tweets or retweets have a low correlation with TV ratings, implying that a simple tweet rate does not reflect satisfaction with or response to the TV programs. Content-based features extracted from the text of tweets have a relatively high correlation with TV ratings. Further, some emoticons or newly coined words that are not tagged in the morpheme extraction process have a strong relationship with TV ratings. We also find a time-dependency in the correlation of features before and after the broadcasting time. Since the TV program is broadcast regularly at a predetermined time, users post tweets expressing their expectations for the program or their disappointment at not being able to watch it. The features highly correlated before the broadcast differ from those after it, which shows that the relevance of words to TV programs can change according to the time of the tweets. Among the 336 words that fulfill the minimum requirements for candidate features, 145 words reach their highest correlation before the broadcasting time, whereas 68 words reach it after broadcasting. Interestingly, some words that express the impossibility of watching the program show high relevance despite carrying a negative meaning. Understanding the time-dependency of features can help improve the accuracy of TV ratings measurement. This research contributes a basis for estimating the response to, or satisfaction with, broadcast programs using the time-dependency of words in Twitter chatter.
More research is needed to refine the methodology for predicting or measuring TV ratings.
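The word-versus-ratings correlation analysis described above can be sketched with a plain Pearson correlation. The counts and ratings below are invented solely to illustrate the before/after time-dependency idea, not taken from the paper's data:

```python
import math

def pearson(x, y):
    """Pearson correlation between a word's per-episode tweet counts
    and the corresponding TV ratings."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical counts of one content word, split before vs. after broadcast,
# against per-episode ratings: the same word can correlate differently
# depending on when the tweets were posted.
before = [12, 30, 25, 40, 33]
after = [5, 8, 4, 9, 7]
ratings = [10.1, 14.2, 12.8, 15.9, 13.7]
print(round(pearson(before, ratings), 2), round(pearson(after, ratings), 2))
```

In the paper's setting, each candidate word would get two such coefficients, and the stronger time window decides when that word is a useful ratings feature.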

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.29-45
    • /
    • 2012
  • Bond rating is regarded as an important event for measuring financial risk of companies and for determining the investment returns of investors. As a result, it has been a popular research topic for researchers to predict companies' credit ratings by applying statistical and machine learning techniques. The statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have been traditionally used in bond rating. However, one major drawback is that it should be based on strict assumptions. Such strict assumptions include linearity, normality, independence among predictor variables and pre-existing functional forms relating the criterion variablesand the predictor variables. Those strict assumptions of traditional statistics have limited their application to the real world. Machine learning techniques also used in bond rating prediction models include decision trees (DT), neural networks (NN), and Support Vector Machine (SVM). Especially, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that can maximize the margin between two categories. SVM is simple enough to be analyzed mathematical, and leads to high performance in practical applications. SVM implements the structuralrisk minimization principle and searches to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum and thus, overfitting is unlikely to occur with SVM. In addition, SVM does not require too many data sample for training since it builds prediction models by only using some representative sample near the boundaries called support vectors. A number of experimental researches have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can be potential causes for degrading SVM's performance. 
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not perform as well on multi-class problems as SVM does on binary-class classification. Second, approximation algorithms (e.g., decomposition methods or the sequential minimal optimization algorithm) can be used to reduce the computation time of multi-class training, but they can deteriorate classification performance. Third, a central difficulty in multi-class prediction is the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers that in another. Such data sets often produce a default classifier with a skewed boundary, and thus reduced classification accuracy. SVM ensemble learning is one machine learning approach for coping with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weights on misclassified observations through iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than those that are correctly predicted, so boosting attempts to produce new classifiers that better predict examples on which the current ensemble performs poorly. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem.
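The reweighting step that lets boosting focus on hard (often minority-class) observations can be sketched with a from-scratch binary AdaBoost over decision stumps. This is a generic illustration of the AdaBoost mechanism only, not the paper's MGM-Boost (which extends it to multiclass with a geometric-mean criterion):

```python
import numpy as np

def stump_fit(X, y, w):
    """Best axis-aligned threshold stump under sample weights w (labels in {-1, +1})."""
    best = (0, 0.0, 1, np.inf)  # (feature, threshold, polarity, weighted error)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = pol * np.where(X[:, j] <= t, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, pol, err)
    return best

def adaboost(X, y, rounds=5):
    n = len(y)
    w = np.full(n, 1.0 / n)  # start from uniform sample weights
    ensemble = []
    for _ in range(rounds):
        j, t, pol, err = stump_fit(X, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = pol * np.where(X[:, j] <= t, 1, -1)
        # Key step: weights of misclassified samples grow, correct ones shrink,
        # so the next stump concentrates on the hard observations.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, j, t, pol))
    return ensemble

def predict(ensemble, X):
    score = sum(a * p * np.where(X[:, j] <= t, 1, -1) for a, j, t, p in ensemble)
    return np.sign(score)
```

A quick check on a 1-D separable toy set confirms the weighted vote reproduces the labels:

```python
X = np.array([[0.0], [1.0], [2.0], [5.0], [6.0], [7.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
ens = adaboost(X, y, rounds=3)
print((predict(ens, X) == y).all())  # True
```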
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process considers the geometric mean-based accuracy and errors over all classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross-validation was performed three times with different random seeds to ensure that the comparison among the three classifiers does not arise by chance. In each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and each set is in turn used as the test set while the classifier trains on the other nine sets; the cross-validated folds are thus tested independently for each algorithm. Through these steps, results were obtained for each classifier on each of the 30 experiments. In terms of arithmetic mean-based prediction accuracy, MGM-Boost (52.95%) outperforms both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds differs significantly. The results indicate that the performance of MGM-Boost differs significantly from that of the AdaBoost and SVM classifiers at the 1% level. These results suggest that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
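The gap between the arithmetic and geometric mean-based figures reported above is easy to see on imbalanced data. The following sketch (function names are illustrative, not from the paper) computes the geometric mean of per-class recalls, which collapses to zero whenever a class is entirely missed:

```python
import numpy as np

def overall_accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of per-class recalls; zero if any class is entirely missed."""
    classes = np.unique(y_true)
    recalls = np.array([np.mean(y_pred[y_true == c] == c) for c in classes])
    return float(np.prod(recalls) ** (1.0 / len(recalls)))

# A majority-class-only classifier on imbalanced data: 8 samples of class 0, 2 of class 1.
y_true = np.array([0] * 8 + [1] * 2)
y_pred = np.zeros(10, dtype=int)  # predicts the majority class everywhere

print(overall_accuracy(y_true, y_pred))         # 0.8 -- looks acceptable
print(geometric_mean_accuracy(y_true, y_pred))  # 0.0 -- minority class is completely missed
```

This is why a boosting criterion built on the geometric mean penalizes classifiers that sacrifice minority classes, which plain accuracy rewards.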

Evaluation of Water Quality Impacts of Forest Fragmentation at Doam-Dam Watershed using GIS-based Modeling System (GIS 기반의 모형을 이용한 도암댐 유역의 산림 파편화에 따른 수(水)환경 영향 평가)

  • Heo, Sung-Gu;Kim, Ki-Sung;Ahn, Jae-Hun;Yoon, Jong-Suk;Lim, Kyoungjae;Choi, Joongdae;Shin, Yong-Chul;Lyou, Chang-Won
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.9 no.4
    • /
    • pp.81-94
    • /
    • 2006
  • The water quality impacts of forest fragmentation at the Doam-dam watershed were evaluated in this study. To this end, the watershed-scale Soil and Water Assessment Tool (SWAT) model was utilized. To exclude the effects of differing weather magnitudes and patterns, the same weather data (year 1985) was used for both scenarios, because precipitation differed significantly between 1985 and 2000. The water quality impacts of forest fragmentation were analyzed temporally and spatially. With forest fragmentation, the flow rates for Winter and Spring increased by 8,366 m³/month and 72,763 m³/month in the S1 subwatershed, which experienced the most forest fragmentation within the Doam-dam watershed. For Summer and Fall, the flow rate increased by 149,901 m³/month and 107,109 m³/month, respectively. It is believed that the increased flow rates carried significant amounts of eroded soil and diffuse nonpoint-source pollutants into the receiving water bodies. With the forest fragmentation in the S1 subwatershed, the average sediment concentration for Winter and Spring increased by 5.448 mg/L and 13.354 mg/L, respectively. It is believed that the agricultural areas, which were forest before the fragmentation, are responsible for the increased soil erosion and sediment yield during the spring thaw and snowmelt. For Summer and Fall, the sediment concentration increased by 20.680 mg/L and 24.680 mg/L, respectively. Compared with Winter and Spring, the higher precipitation during Summer and Fall caused more soil erosion and a higher sediment concentration in the stream. Based on the results of this analysis, both stream flow and sediment concentration increased with forest fragmentation within the S1 subwatershed, and the increased flow and soil erosion could contribute to eutrophication in the receiving water bodies.
These results show that natural functions of the forest, such as flood control, soil erosion protection, and water quality improvement, can easily be lost with ongoing forest fragmentation within the watershed. Thus, to minimize the negative impacts of forest fragmentation, comprehensive land-use planning at the watershed scale needs to be developed and implemented based on the results obtained in this research.


Quality Assurance of Patients for Intensity Modulated Radiation Therapy (세기조절방사선치료(IMRT) 환자의 QA)

  • Yoon Sang Min;Yi Byong Yong;Choi Eun Kyung;Kim Jong Hoon;Ahn Seung Do;Lee Sang-Wook
    • Radiation Oncology Journal
    • /
    • v.20 no.1
    • /
    • pp.81-90
    • /
    • 2002
  • Purpose : To establish and verify a proper and practical IMRT (intensity-modulated radiation therapy) patient QA (quality assurance) program. Materials and Methods : An IMRT QA program consisting of 3 steps and 16 items was designed, and its validity was examined by applying it to 9 patients and 12 IMRT cases at various sites. The three-step QA program consists of an RTP-related QA procedure, a treatment information flow QA procedure, and a treatment delivery QA procedure. Evaluation of organ constraints, the validity of the point dose, and the dose distribution are the major issues in the RTP-related QA procedure. Leaf sequence file generation, evaluation of the MLC control file, comparison with the dry-run film, and the IMRT field simulation image are included in the treatment information flow QA procedure. Patient setup QA, verification of the IMRT treatment fields on the patients, and examination of the data in the Record & Verify system make up the treatment delivery QA procedure. Results : The point dose measurements of 10 cases agreed with the RTP calculation within 3%. One case showed more than a 3% difference, and another showed more than 5%, which was outside the tolerance level. No differences of more than 2 mm were found between the RTP leaf sequence and the dry-run film. Film dosimetry and the dose distribution from the phantom plan showed the same tendency, but quantitative analysis was not possible because of the nature of film dosimetry. No error was found in the MLC control file, and one mis-registration case was found before treatment. Conclusion : This study shows the usefulness and the necessity of an IMRT patient QA program. The whole procedure of this program should be performed, especially by institutions that have just started to accumulate experience. However, the program is complex and time-consuming.
Therefore, we propose practical and essential QA items for institutions in which IMRT is performed as a routine procedure.
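The point-dose check described in the Results section reduces to a simple relative-deviation test against the 3% tolerance. The helper below is a hypothetical illustration of that criterion, not code from the study:

```python
def point_dose_deviation(measured, calculated):
    """Relative deviation (%) of a measured point dose from the RTP-calculated dose."""
    return abs(measured - calculated) / calculated * 100.0

def within_tolerance(measured, calculated, tolerance_pct=3.0):
    """True if the measured point dose agrees with the RTP calculation within tolerance."""
    return point_dose_deviation(measured, calculated) <= tolerance_pct

# Hypothetical point doses in Gy: a 2% deviation passes, a 6% deviation fails.
print(within_tolerance(2.04, 2.00))  # True
print(within_tolerance(2.12, 2.00))  # False
```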