A Study on the Establishment of Comparison System between the Statement of Military Reports and Related Laws (군(軍) 보고서 등장 문장과 관련 법령 간 비교 시스템 구축 방안 연구)

  • Jung, Jiin;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.26 no.3 / pp.109-125 / 2020
  • The Ministry of National Defense is pushing for the Defense Acquisition Program to build strong defense capabilities, and it spends more than 10 trillion won annually on defense improvement. As the Defense Acquisition Program is directly related to the security of the nation as well as the lives and property of the people, it must be carried out very transparently and efficiently by experts. However, the excessive diversification of laws and regulations related to the Defense Acquisition Program has made it challenging for many working-level officials to carry out the Defense Acquisition Program smoothly. It is even known that many people realize that there are related regulations that they were unaware of until they push ahead with their work. In addition, the statutory statements related to the Defense Acquisition Program have the tendency to cause serious issues even if only a single expression is wrong within the sentence. Despite this, efforts to establish a sentence comparison system to correct this issue in real time have been minimal. Therefore, this paper tries to propose a "Comparison System between the Statement of Military Reports and Related Laws" implementation plan that uses the Siamese Network-based artificial neural network, a model in the field of natural language processing (NLP), to observe the similarity between sentences that are likely to appear in the Defense Acquisition Program related documents and those from related statutory provisions to determine and classify the risk of illegality and to make users aware of the consequences. Various artificial neural network models (Bi-LSTM, Self-Attention, D_Bi-LSTM) were studied using 3,442 pairs of "Original Sentence"(described in actual statutes) and "Edited Sentence"(edited sentences derived from "Original Sentence"). 
Among the many statutes related to the Defense Acquisition Program, the DEFENSE ACQUISITION PROGRAM ACT, the ENFORCEMENT RULE OF THE DEFENSE ACQUISITION PROGRAM ACT, and the ENFORCEMENT DECREE OF THE DEFENSE ACQUISITION PROGRAM ACT were selected. The "Original Sentence" set consists of 83 provisions that actually appear in these statutes, chosen as the main clauses most accessible to working-level officials in their work. The "Edited Sentence" set comprises 30 to 50 similar sentences for each clause ("Original Sentence") that are likely to appear in modified form in a military report. The edited sentences were created by modifying the original sentences according to 12 rules, and they were produced in proportion to the number of such rules, as was the case for the original sentences. After conducting 1:1 sentence similarity performance evaluation experiments, it was possible to classify each "Edited Sentence" as legal or illegal with considerable accuracy. In addition, the "Edited Sentence" dataset used to train the neural network models contains a variety of actual statutory statements ("Original Sentence"), which are characterized by the 12 rules. On the other hand, the models are not able to effectively classify other sentences, which appear in actual military reports, when only the "Original Sentence" and "Edited Sentence" datasets have been fed to them. The dataset is not large enough for the model to recognize other incoming new sentences. Hence, the performance of the model was reassessed by writing an additional 120 new sentences that more closely resemble those in actual military reports and are still associated with the original sentences. Thereafter, we were able to confirm that the models' performance surpassed a certain level even when they were trained merely with "Original Sentence" and "Edited Sentence" data. 
If sufficient model learning is achieved through the improvement and expansion of the full set of learning data with the addition of the actual report appearance sentences, the models will be able to better classify other sentences coming from military reports as legal or illegal. Based on the experimental results, this study confirms the possibility and value of building "Real-Time Automated Comparison System Between Military Documents and Related Laws". The research conducted in this experiment can verify which specific clause, of several that appear in related law clause is most similar to the sentence that appears in the Defense Acquisition Program-related military reports. This helps determine whether the contents in the military report sentences are at the risk of illegality when they are compared with those in the law clauses.
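
The matching step described in this abstract (finding which statutory clause is most similar to a report sentence) can be sketched as follows. This is only an illustrative stand-in: it uses bag-of-words cosine similarity rather than the Siamese neural network the paper studies, and the clauses and the report sentence are invented for the example.

```python
import math
from collections import Counter


def bow_vector(sentence: str) -> Counter:
    """Lowercased whitespace token counts (a crude stand-in for a learned sentence encoder)."""
    return Counter(sentence.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_clause(report_sentence, clauses):
    """Return (index, score) of the clause closest to the report sentence."""
    query = bow_vector(report_sentence)
    scores = [cosine_similarity(query, bow_vector(c)) for c in clauses]
    best = max(range(len(clauses)), key=lambda i: scores[i])
    return best, scores[best]


# Hypothetical clauses and report sentence, invented for illustration only.
clauses = [
    "the contract shall be awarded through open competitive bidding",
    "test and evaluation results shall be reported to the minister",
]
idx, score = most_similar_clause(
    "the contract shall be awarded through limited bidding", clauses
)
print(idx, round(score, 3))
```

A high similarity to a clause combined with a changed key expression (here "limited" in place of "open competitive") is the kind of case the abstract describes as a potential risk of illegality; deciding that risk is left to the trained classifier in the paper.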

Product Community Analysis Using Opinion Mining and Network Analysis: Movie Performance Prediction Case (오피니언 마이닝과 네트워크 분석을 활용한 상품 커뮤니티 분석: 영화 흥행성과 예측 사례)

  • Jin, Yu;Kim, Jungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.49-65 / 2014
  • Word of Mouth (WOM) is a behavior used by consumers to transfer or communicate their product or service experience to other consumers. Due to the popularity of social media such as Facebook, Twitter, blogs, and online communities, electronic WOM (e-WOM) has become important to the success of products or services. As a result, most enterprises pay close attention to e-WOM for their products or services. This is especially important for movies, as these are experiential products. This paper aims to identify the network factors of an online movie community that impact box office revenue using social network analysis. In addition to traditional WOM factors (volume and valence of WOM), network centrality measures of the online community are included as influential factors in box office revenue. Based on previous research results, we develop five hypotheses on the relationships between potential influential factors (WOM volume, WOM valence, degree centrality, betweenness centrality, closeness centrality) and box office revenue. The first hypothesis is that the accumulated volume of WOM in online product communities is positively related to the total revenue of movies. The second hypothesis is that the accumulated valence of WOM in online product communities is positively related to the total revenue of movies. The third hypothesis is that the average of degree centralities of reviewers in online product communities is positively related to the total revenue of movies. The fourth hypothesis is that the average of betweenness centralities of reviewers in online product communities is positively related to the total revenue of movies. The fifth hypothesis is that the average of closeness centralities of reviewers in online product communities is positively related to the total revenue of movies. 
To verify our research model, we collect movie review data from the Internet Movie Database (IMDb), which is a representative online movie community, and movie revenue data from the Box-Office-Mojo website. The movies in this analysis include weekly top-10 movies from September 1, 2012, to September 1, 2013, with in total. We collect movie metadata such as screening periods and user ratings; and community data in IMDb including reviewer identification, review content, review times, responder identification, reply content, reply times, and reply relationships. For the same period, the revenue data from Box-Office-Mojo is collected on a weekly basis. Movie community networks are constructed based on reply relationships between reviewers. Using a social network analysis tool, NodeXL, we calculate the averages of three centralities including degree, betweenness, and closeness centrality for each movie. Correlation analysis of focal variables and the dependent variable (final revenue) shows that three centrality measures are highly correlated, prompting us to perform multiple regressions separately with each centrality measure. Consistent with previous research results, our regression analysis results show that the volume and valence of WOM are positively related to the final box office revenue of movies. Moreover, the averages of betweenness centralities from initial community networks impact the final movie revenues. However, both of the averages of degree centralities and closeness centralities do not influence final movie performance. Based on the regression results, three hypotheses, 1, 2, and 4, are accepted, and two hypotheses, 3 and 5, are rejected. This study tries to link the network structure of e-WOM on online product communities with the product's performance. Based on the analysis of a real online movie community, the results show that online community network structures can work as a predictor of movie performance. 
The results show that the betweenness centralities of the reviewer community are critical for the prediction of movie performance. However, degree centralities and closeness centralities do not influence movie performance. As future research topics, similar analyses are required for other product categories such as electronic goods and online content to generalize the study results.
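
As a rough illustration of the per-movie network measures mentioned in this abstract, the sketch below builds an undirected reply network and averages degree and closeness centrality over reviewers. The paper computed its measures with NodeXL; this pure-Python version, the reviewer identifiers, and the reply pairs are assumptions made for the example, and betweenness centrality is omitted for brevity.

```python
from collections import deque


def build_graph(replies):
    """Undirected adjacency sets from (responder, reviewer) reply pairs."""
    graph = {}
    for a, b in replies:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(graph)
    if n <= 1:
        return {v: 0.0 for v in graph}
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def closeness_centrality(graph):
    """(reachable nodes) / (sum of shortest-path distances), via BFS from each node."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical reply pairs for one movie's review thread.
replies = [("u1", "u2"), ("u3", "u2"), ("u4", "u2"), ("u4", "u5")]
graph = build_graph(replies)
avg_degree = mean(degree_centrality(graph).values())
avg_closeness = mean(closeness_centrality(graph).values())
print(avg_degree, avg_closeness)
```

The per-movie averages computed this way would then serve as explanatory variables in a regression against box office revenue, as the abstract describes.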

Study of East Asia Climate Change for the Last Glacial Maximum Using Numerical Model (수치모델을 이용한 Last Glacial Maximum의 동아시아 기후변화 연구)

  • Kim, Seong-Joong;Park, Yoo-Min;Lee, Bang-Yong;Choi, Tae-Jin;Yoon, Young-Jun;Suk, Bong-Chool
    • The Korean Journal of Quaternary Research / v.20 no.1 s.26 / pp.51-66 / 2006
  • The climate of the last glacial maximum (LGM) in northeast Asia is simulated with an atmospheric general circulation model of NCAR CCM3 at spectral truncation of T170, corresponding to a grid cell size of roughly 75 km. Modern climate is simulated by a prescribed sea surface temperature and sea ice provided from NCAR, and contemporary atmospheric CO2, topography, and orbital parameters, while LGM simulation was forced with the reconstructed CLIMAP sea surface temperatures, sea ice distribution, ice sheet topography, reduced $CO_2$, and orbital parameters. Under LGM conditions, surface temperature is markedly reduced in winter by more than $18^{\circ}C$ in the Korean west sea and continental margin of the Korean east sea, where the ocean exposed to land in the LGM, whereas in these areas surface temperature is warmer than present in summer by up to $2^{\circ}C$. This is due to the difference in heat capacity between ocean and land. Overall, in the LGM surface is cooled by $4{\sim}6^{\circ}C$ in northeast Asia land and by $7.1^{\circ}C$ in the entire area. An analysis of surface heat fluxes show that the surface cooling is due to the increase in outgoing longwave radiation associated with the reduced $CO_2$ concentration. The reduction in surface temperature leads to a weakening of the hydrological cycle. In winter, precipitation decreases largely in the southeastern part of Asia by about $1{\sim}4\;mm/day$, while in summer a larger reduction is found over China. Overall, annual-mean precipitation decreases by about 50% in the LGM. In northeast Asia, evaporation is also overall reduced in the LGM, but the reduction of precipitation is larger, eventually leading to a drier climate. The drier LGM climate simulated in this study is consistent with proxy evidence compiled in other areas. Overall, the high-resolution model captures the climate features reasonably well under global domain.

Studies on a Factor Affecting Composts Maturity During Composting of Swine Manure (돈분 퇴비화 중 부숙도에 미치는 영향인자 구명)

  • Kim, T.I.;Song, J. I.;Yang, C.B.;Kim, M.K.
    • Journal of Animal Science and Technology / v.46 no.2 / pp.261-272 / 2004
  • This study was conducted to investigate indices affecting compost maturity for swine manure compost produced in a commercial composting facility with forced aeration from the bottom. The compost was made of swine manure mixed with puffed rice hull (6:4) and turned by an escalating agitator twice a day. Composting samples were collected periodically during a 45-d composting cycle in that system, showing that the Ammonium-N to Nitrate-N ratio was a sensitive indicator of composting quality. Pile temperature was maintained above 62$^{\circ}C$ and water content decreased by about 20% over 25 days of composting. A great variety and high numbers of aerobic thermophilic heterotrophic microbes playing critical roles in the stability of composts were examined in the final composts, showing that they were detected at $10^8$ to $10^{10}$ $CFUg^{-1}$ for mesophilic bacteria, $10^3$ - $10^4$ for fungi and $10^6$ - $10^8$ for actinomycetes, respectively. The results of this study for determining a factor affecting compost stability evaluations based on composting steps were as follows: 1. Ammonium-N concentrations were highest at the beginning of composting, reaching approximately 421 mg/kg. However, Ammonium-N concentrations were lower during curing, reaching approximately 104 mg/kg just after 45 days. The ratio between $NH_4-N$ and $NO_3-N$ was above 11 at the beginning of composting and less than 2 at the final step (45 days). 2. The seed germination index (GI) was dependent upon the compost phytotoxicity and its nutrition. The phytotoxicity caused the GI to be low during the period of active composting (until 25 days of composting time) depending on the value of the undiluted. After 25 days of composting time, the GI was dependent upon compost nutrition. The germination index of the final step was calculated at over 80 without regard to treatments. 3. The E4:E6 ratio in humic acid of composts decreased from 8.86 to 6.76 during the period of active composting. 
After 25 days of composting time, the E4:E6 ratio decreased consistently from 6.76 to 4.67 ($r^2$ of the total composting period was 0.95). 4. Water soluble carbon tended to increase from 0.54% to 0.78% during the period of active composting. After 25 days of composting time, it decreased consistently from 0.78% to 0.42%. Water soluble nitrogen increased from 0.22% to 0.32% during the first 15 days of composting, while it decreased from 0.32% to 0.21% after 15 days of composting. In consequence, the correlation coefficient ($r^2$) between water soluble carbon and water soluble nitrogen was 0.12 during the period of active composting, while it was 0.50 after 25 days of composting time.

Evaluation of Air Quality in the Compost Pilot Plant with Livestock Manure by Operation Types (축분 퇴비화시스템 운용방식에 따른 실내 대기오염 평가)

  • Kim, K.Y.;Choi, H.L.;Ko, H.J.;Kim, C.N.
    • Journal of Animal Science and Technology / v.46 no.2 / pp.283-294 / 2004
  • In this study, air quality in the livestock waste compost pilot plant at the Collegiate Livestock Station was assessed to quantify the emissions of aerial contaminants and evaluate the degree of correlation between them for different operation strategies, namely the ventilation type and agitation of the compost pile. The parameters analyzed to reflect the level of air quality in the livestock waste compost pilot plant were the gaseous contaminants (ammonia, hydrogen sulfide, and odor concentration), the particulate contaminants (inhalable dust and respirable dust), and the biological contaminants (total airborne bacteria and fungi). The mean concentrations of ammonia, hydrogen sulfide, and odor concentration in the compost pilot plant without agitation were 2.45ppm, 19.96ppb, and 15.8 when it was naturally ventilated, and 7.61ppm, 31.36ppb, and 30.2 when mechanically ventilated. Those with agitation were 5.50ppm, 14.69ppb, and 46.4 when naturally ventilated, and 30.12ppm, 39.91ppb, and 205.5 when mechanically ventilated. The mean concentrations of inhalable and respirable dust in the compost pilot plant without agitation were 368.6${\mu}g$/$m^3$ and 96.0${\mu}g$/$m^3$ with natural ventilation, and 283.9${\mu}g$/$m^3$ and 119.5${\mu}g$/$m^3$ with mechanical ventilation. With agitation, they were 208.7${\mu}g$/$m^3$ and 139.8${\mu}g$/$m^3$ with natural ventilation, and 209.2${\mu}g$/$m^3$ and 131.7${\mu}g$/$m^3$ with mechanical ventilation. Average concentrations of total airborne bacteria and fungi in the compost pilot plant without agitation were 28,673cfu/$m^3$ and 22,507cfu/$m^3$ with natural ventilation, and 7,462cfu/$m^3$ and 3,228cfu/$m^3$ with mechanical ventilation. With agitation, they were 19,592cfu/$m^3$ and 26,376cfu/$m^3$ with natural ventilation, and 18,645cfu/$m^3$ and 24,581cfu/$m^3$ with mechanical ventilation. 
It showed that the emission rates of gaseous pollutants, such as ammonia, hydrogen sulfide, and odor concentration, in the compost pilot plant operated with the mechanical ventilation and with the agitation of compost pile were higher than those with the natural ventilation and without the agitation. While the concentrations of inhalable dust and total airborne bacteria in the compost pilot plant with the natural ventilation and with the agitation, the concentrations of respirable dust and total airborne fungi in the compost pilot plant with the mechanical ventilation and agitation were higher than those with the natural ventilation and without the agitation of compost pile. It was statistically proved that indoor temperature and relative humidity affected the release of particulates and biological pollutants, and ammonia and hydrogen sulfide were believed primary malodorous compounds emitted from the compost pilot plant.

Growth Efficiency, Carcass Quality Characteristics and Profitability of 'High'-Market Weight Pigs ('고체중' 출하돈의 성장효율, 도체 품질 특성 및 수익성)

  • Park, M.J.;Ha, D.M.;Shin, H.W.;Lee, S.H.;Kim, W.K.;Ha, S.H.;Yang, H.S.;Jeong, J.Y.;Joo, S.T.;Lee, C.Y.
    • Journal of Animal Science and Technology / v.49 no.4 / pp.459-470 / 2007
  • Domestically, finishing pigs are marketed at 110 kg on average. However, it is thought to be feasible to increase the market weight to 120kg or greater without decreasing the carcass quality, because most domestic pigs for pork production have descended from lean-type lineages. The present study was undertaken to investigate the growth efficiency and profitability of ‘high’-market wt pigs and the physicochemical characteristics and consumers' acceptability of the high-wt carcass. A total of 96 (Yorkshire × Landrace) × Duroc-crossbred gilts and barrows were fed a finisher diet ad libitum in 16 pens beginning from 90-kg BW, after which the animals were slaughtered at 110kg (control) or ‘high’ market wt (135 and 125kg in gilts & barrows, respectively) and their carcasses were analyzed. Average daily gain and gain:feed did not differ between the two sex or market wt groups, whereas average daily feed intake was greater in the barrow and high market wt groups than in the gilt and 110-kg market wt groups, respectively(P<0.01). Backfat thickness of the high-market wt gilts and barrows corrected for 135 and 125-kg live wt, which were 23.7 and 22.5 mm, respectively, were greater (P<0.01) than their corresponding 110-kg counterparts(19.7 & 21.1 mm). Percentages of the trimmed primal cuts per total trimmed lean (w/w), except for that of loin, differed statistically (P<0.05) between two sex or market wt groups, but their numerical differences were rather small. Crude protein content of the loin was greater in the high vs. 110-kg market group (P<0.01), but crude fat and moisture contents and other physicochemical characteristics including the color of this primal cut were not different between the two sexes or market weights. Aroma, marbling and overall acceptability scores were greater in the high vs. 
110-kg market wt group in sensory evaluation for fresh loin (P<0.01); however, overall acceptabilities for cooked loin, belly and ham were not different between the two market wt groups. Marginal profits of the 135- and 125-kg high-market wt gilt and barrow relative to their corresponding 110-kg ones were approximately -35,000 and 3,500 wons per head under the current carcass grading standard and price. However, if it had not been for the upper wt limits for the A- and B-grade carcasses, marginal profits of the high market wt gilt and barrow would have amounted to 22,000 and 11,000 wons per head, respectively. In summary, 120~125-kg market pigs are likely to meet the consumers' preference better than the 110-kg ones and also bring a profit equal to or slightly greater than that of the latter even under the current carcass grading standard. Moreover, if only the upper wt limits of the A- & B-grade carcasses were removed or increased to accommodate the high-wt carcass, the optimum market weights for the gilt and barrow would fall upon their target weights of the present study, i.e. 135 and 125 kg, respectively.

Studies on the ${\beta}-Tyrosinase$ -Part 2. On the Synthesis of Halo-tyrosine by ${\beta}-Tyrosinase$- (${\beta}-Tyrosinase$에 관한 연구 -제2보 ${\beta}-Tyrosinase$에 의한 Halogen화(化) Tyrosine의 합성(合成)-)

  • Kim, Chan-Jo;Nagasawa, Toru;Tani, Yoshiki;Yamada, Hideaki
    • Applied Biological Chemistry / v.22 no.4 / pp.198-209 / 1979
  • L-Tyrosine, 2-chloro-L-tyrosine, 2-bromo-L-tyrosine, and 2-iodo-L-tyrosine were synthesized by ${\beta}-tyrosinase$ obtained from cells of Escherichia intermedia A-21, through the reversal of the ${\alpha},{\beta}-elimination$ reaction, and their molecular structures were analyzed by element analysis, NMR spectroscopy, mass spectrometry and IR spectroscopy. Rates of synthesis and hydrolysis of halogenated tyrosines by ${\beta}-tyrosinase$, inhibition of the enzyme activity by halogenated phenols, and effects of addition of m-bromophenol on the synthesis of 2-bromotyrosine were determined. The results obtained were as follows: 1) In the synthesis of halogenated tyrosines, the yield of 2-chlorotyrosine from m-chlorophenol were approximately 15 per cent, that of 2-bromotyrosine from m-bromophenol 13.8 per cent, and that of 2-iodotyrosine from m-iodophenol 9.8 per cent. 2) Rate of synthesis of halogenated tyrosines by ${\beta}-tyrosinase$ was slower than that of tyrosine and the rates were decreased in the order of chlorine, bromine and iodine, that is, by increasing the atomic radius. Relative rate of 2-chlorotyrosine synthesis was determined to be 28.2, that of 2-bromotyrosine to be 8.13, and that of 2-iodotyrosine to be 0.98, respectively, against 100 of tyrosine. However 3-iodotyrosine was not synthesized by the enzyme. 3) The relative rate of 2-chlorotyrosine hydrolysis by ${\beta}-tyrosinase$ was 70.7, that of 2-bromotyrosine was 39.0, and that of 2-iodotyrosine was 12.6 against 100 of tyrosine, respectively. The rate of hydrolysis appeared to be decreased in the order of chlorine, bromine and iodine, that is, by increasing the atomic radius or by decreasing the electronegativity. But 3-iodotyrosine was not hydrolyzed by the enzyme. 4) The activity of ${\beta}-tyrosinase$ was inhibited by phenol markedly. 
Of the halogenated phenols, o-, or m-chlorophenol and o-bromophenol gave marked inhibition on the enzyme action, however inhibition by iodophenol was not strong. Plotting by Lineweaver-Burk method, a mixed-type inhibition by m-chlorophenol was observed and its Ki value was found to be $5.46{\times}10^{-4}M$. 5) During the synthesizing reaction of 2-bromotyrosine by the enzyme, sequential addition of substrate which was m-bromophenol with time intervals and in a small amount resulted in better yield of the product. 6) The halogenated tyrosines which were produced by ${\beta}-tyrosinase$ from pyruvate, ammonia and m-halogenated phenols were analysed to determine their molecular structures by element analysis, NMR spectroscopy, mass spectrometry, and IR spectroscopy. The result indicated that they were 2-chloro-L-tyrosine, 2-bromo-L-tyrosine, and 2-iodo-L-tyrosine, respectively.

A survey on the nutrient intake and food consumption of the students at the dormitories, College of Agriculture, Seoul National University (서울대학교(大學校) 농과대학(農科大學) 남녀(男女) 기숙사생(寄宿舍生)의 영양섭취(營養攝取) 조사(調査))

  • Mo, Su-Mi;Han, In-Kyu;Kim, Ze-Uook;Lee, Chun-Yung;Kim, Ho-Sik
    • Applied Biological Chemistry / v.7 / pp.92-104 / 1966
  • For the purpose of better dietary management and to emphasize the importance of nutrition education for 552 students at the dormitories, College of Agriculture, Seoul National University, a dietary survey was conducted for seven consecutive days at each dormitory, from March 7th to 13th at the boys' dormitory and from March 14th to 20th at the girls' dormitory. In comparing the average caloric and nutrient intake per caput per day at both the girls' and boys' dormitories with the recommended dietary allowances for age 25, the intake of calories and all nutrients except riboflavin was over the allowances for the boys, while the caloric intake by the girls was considerably below the allowance. However, only 150 calories were actually deficient in comparison with the average energy consumption determined for the girls at the dormitory of the Sook-myung Woman's University, whose pattern of living was quite similar to that of the girls at this college. Except for iron and ascorbic acid, all other nutrients were deficient for the girls. The calories in the form of protein in the diet taken by the boys was 12.9% and that by the girls was 12.8%. Protein quality of the diet taken by the boys scored 70 while that by the girls scored 79. NDp Cal% of the diet taken by the boys was 7 and that by the girls was calculated to be 8. Therefore, calculated reference protein taken by the boy was 55.8 grams and that by the girl was 36.9%. Though it is generally recommended that at least 1/3 of the protein should come from animal sources, it was apparent from this survey that providing 1/5 of the protein from animal sources, with the remaining part from high quality vegetable protein foods in an adequate mixed diet, would give satisfactory results for both girl and boy students. This was clearly demonstrated by the fact that the recommended reference protein and NDp Cal% were met. A significant difference between boys and girls in the average consumption of seasonings was found. 
In consumption of seasonings per day, the boys used 1.5 grams of red pepper powder, which means they used 15 times more red pepper than the girls did. Boy students used 13 grams of Kochujang, which was as much as 21 times that of the girls. Total salt intake by the boys was 34 grams while the girls consumed 23 grams. It is evident that boys prefer a more peppery and salty flavor than girls do. To reduce the amount of protein consumed and to improve the quality of protein food, an increase of riboflavin-rich food and an increase of fat intake in place of grain intake are recommended for the boys. For the girls' diet, consumption of grains, particularly a greater intake of barley, may be recommended to meet the allowances for the B group of vitamins as well as the caloric allowance. More servings of yellow-green vegetables are needed for the girls.

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings by applying statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rest on strict assumptions. Such strict assumptions include linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. Those strict assumptions of traditional statistics have limited their application to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and Support Vector Machine (SVM). In particular, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that can maximize the margin between two categories. SVM is simple enough to be analyzed mathematically, and leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum and thus, overfitting is unlikely to occur with SVM. In addition, SVM does not require many data samples for training, since it builds prediction models using only some representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can be potential causes for degrading SVM's performance. 
First, SVM is originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification such as One-Against-One, One-Against-All have been proposed, but they do not improve the performance in multi-class classification problem as much as SVM for binary-class classification. Second, approximation algorithms (e.g. decomposition methods, sequential minimal optimization algorithm) could be used for effective multi-class computation to reduce computation time, but it could deteriorate classification performance. Third, the difficulty in multi-class prediction problems is in data imbalance problem that can occur when the number of instances in one class greatly outnumbers the number of instances in the other class. Such data sets often cause a default classifier to be built due to skewed boundary and thus the reduction in the classification accuracy of such a classifier. SVM ensemble learning is one of machine learning methods to cope with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing weight on the misclassified observations through iterations. The observations that are incorrectly predicted by previous classifiers are chosen more often than examples that are correctly predicted. Thus Boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve multiclass prediction problem. 
Since MGM-Boost introduces the notion of geometric mean into AdaBoost, it can perform the learning process considering the geometric mean-based accuracy and errors of multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine the feasibility of MGM-Boost. 10-fold cross validation is performed three times with different random seeds in order to ensure that the comparison among the three different classifiers does not happen by chance. For each 10-fold cross validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we have obtained the results for the classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds is significantly different. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
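
The gap between the arithmetic and geometric mean-based accuracies reported above can be illustrated with a small sketch. It assumes that "geometric mean-based accuracy" means the geometric mean of per-class recall, which is a common definition but is not spelled out in the abstract; the labels below are invented.

```python
import math
from collections import defaultdict


def arithmetic_accuracy(y_true, y_pred):
    """Fraction of all observations that are predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition, see lead-in)."""
    total = defaultdict(int)
    hit = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    recalls = [hit[c] / total[c] for c in total]
    return math.prod(recalls) ** (1 / len(recalls))


# Hypothetical imbalanced rating labels: class "A" dominates class "B".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]

arith = arithmetic_accuracy(y_true, y_pred)
gmean = geometric_mean_accuracy(y_true, y_pred)
print(round(arith, 4), round(gmean, 4))
```

Because the geometric mean is pulled down sharply by any poorly predicted class, a classifier that neglects minority classes scores much lower on it than on plain accuracy, which is consistent with the lower geometric mean-based figures quoted in the abstract.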

Stock-Index Invest Model Using News Big Data Opinion Mining (뉴스와 주가 : 빅데이터 감성분석을 통한 지능형 투자의사결정모형)

  • Kim, Yoo-Sin;Kim, Nam-Gyu;Jeong, Seung-Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.143-156
    • /
    • 2012
  • People readily believe that news and the stock index are closely related. They think that securing news before anyone else can help them forecast stock prices and enjoy great profit, or at least capture an investment opportunity. However, it is no easy feat to determine to what extent the two are related, to make an investment decision based on news, or to find out whether such investment information is valid. If the significance of news and its impact on the stock market are analyzed, it will be possible to extract information that can assist investment decisions. The reality, however, is that the world is inundated with a massive wave of news in real time, and news is not patterned text. This study proposes a stock-index investment model based on "News Big Data" opinion mining that systematically collects, categorizes, and analyzes news and creates investment information. To verify the validity of the model, the relationship between the results of news opinion mining and the stock index was analyzed empirically using statistics. The mining steps that convert news into information for investment decision making are as follows. First, news is obtained from a news provider that collects it in real time, and its information is indexed. Not only the content of the news but also information such as the media outlet, time, and news type is collected and classified, and then reworked into variables from which investment decisions can be inferred. Second, the text of each news item is separated into morphemes to derive words that can indicate polarity, and each word is tagged with positive or negative polarity by comparing it with a sentiment dictionary. Third, the positive/negative polarity of the news is judged using the indexed classification information and a scoring rule, and the final investment decision information is derived according to daily scoring criteria.
For this study, the KOSPI index and its fluctuation range were collected for the 63 days on which the stock market was open during the three months from July to September 2011 at the Korea Exchange, and news data were collected by parsing 766 articles from economic news outlet M company among the articles carried under stock information > news > main news on the portal site Naver.com. Over the three months, the stock price index rose on 33 days and fell on 30 days, and the news comprised 197 articles published before the market opened, 385 during the session, and 184 after the market closed. Mining the collected news and comparing the results with stock prices showed that the positive/negative opinion of news had a significant relationship with the stock price, and that changes in the stock price index were better explained when news opinion was expressed as a positive/negative ratio rather than as a simple positive-or-negative judgment. To check whether news affected stock price fluctuation, or at least preceded it, the change in stock price was also compared only with news published before the market opened, and this relationship was likewise statistically significant. In addition, because news covers various types and information, such as social, economic, and overseas news, corporate earnings, industry conditions, market outlook, and market conditions, the influence on the stock market or the significance of the relationship was expected to differ by news type. Each type of news was therefore compared with stock price fluctuation, and the results showed that market condition, outlook, and overseas news were the most useful in explaining stock price fluctuation.
In contrast, news about individual companies was not statistically significant, and its opinion mining value tended to move opposite to the stock price; a possible reason is the appearance of promotional and planned news intended to keep the stock price from falling. Finally, multiple regression analysis and logistic regression analysis were carried out to derive an investment decision function based on the relationship between the positive/negative opinion of news and the stock price. The results showed that the regression equation using the variables of market conditions, outlook, and overseas news before the market opened was statistically significant, and the classification accuracy of the logistic regression was 70.0% for rises in the stock price, 78.8% for falls, and 74.6% on average. This study first analyzed the relationship between news and stock prices by analyzing and quantifying the sentiment of unstructured news content using opinion mining, one of the big data analysis techniques, and then proposed and verified a smart investment decision-making model that can systematically carry out opinion mining and derive and support investment information. This shows that news can be used as a variable to predict the stock price index for investment, and the model is expected to be usable as a real investment support system if it is implemented as a system and verified in the future.
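The dictionary-based polarity tagging and the daily positive/negative ratio described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the tiny `LEXICON`, the whitespace tokenizer standing in for morpheme analysis, and the function names are all hypothetical, and the study's actual scoring rule is not given here.

```python
# Hypothetical sentiment dictionary: word -> polarity (+1 positive, -1 negative).
LEXICON = {"surge": 1, "gain": 1, "rally": 1, "fall": -1, "loss": -1, "slump": -1}

def count_polarity(text, lexicon=LEXICON):
    """Count positive and negative dictionary words in one article.

    Whitespace splitting is a stand-in for the morpheme analysis
    mentioned in the abstract.
    """
    pos = neg = 0
    for token in text.lower().split():
        polarity = lexicon.get(token, 0)
        if polarity > 0:
            pos += 1
        elif polarity < 0:
            neg += 1
    return pos, neg

def daily_positive_ratio(articles, lexicon=LEXICON):
    """Share of positive words among all polar words in a day's articles.

    Returns None when no polar word is found, so that a day without
    signal is not mistaken for a neutral day.
    """
    pos = neg = 0
    for text in articles:
        p, n = count_polarity(text, lexicon)
        pos += p
        neg += n
    if pos + neg == 0:
        return None
    return pos / (pos + neg)

ratio = daily_positive_ratio(["Stocks surge on export gain", "Bank reports loss"])
```

A ratio of this kind keeps the strength of the day's opinion rather than collapsing it to a single positive or negative label, which matches the abstract's finding that the ratio explained index changes better than a simple binary judgment.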