Scalable Collaborative Filtering Technique based on Adaptive Clustering (적응형 군집화 기반 확장 용이한 협업 필터링 기법)

  • Lee, O-Joun;Hong, Min-Sung;Lee, Won-Jin;Lee, Jae-Dong
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.73-92 / 2014
  • An Adaptive Clustering-based Collaborative Filtering Technique is proposed to solve fundamental problems of collaborative filtering, such as the cold-start, scalability and data sparsity problems. Previous collaborative filtering techniques made recommendations based on a user's predicted preference for a particular item, using a similar-item subset and a similar-user subset composed from users' preferences for items. For this reason, if the density of the user preference matrix is low, the reliability of the recommendation system decreases rapidly, and similar-item and similar-user subsets become harder to create. In addition, as the scale of service increases, the time needed to create these subsets increases geometrically, which in turn increases the response time of the recommendation system. To solve these problems, this paper suggests a collaborative filtering technique that actively adapts the model to current conditions and adopts concepts from content-based filtering. The technique consists of four major methodologies. First, items and users are clustered according to their feature vectors, and an inter-cluster preference between each item cluster and each user cluster is then assumed. This saves the run-time needed to create similar-item or similar-user subsets, makes the recommendation system more reliable than one that uses only user preference information to create those subsets, and partially solves the cold-start problem. Second, recommendations are made using the previously composed item and user clusters and the inter-cluster preferences between them. 
In this phase, a list of items is produced for each user by examining the item clusters in descending order of inter-cluster preference for the user cluster to which the user belongs, and by selecting and ranking items according to predicted or recorded user preference information. With this method, the phase of creating the recommendation model bears the highest load of the recommendation system, which minimizes the load at run-time. Therefore, the scalability problem is addressed, and large-scale recommendation can be performed with highly reliable collaborative filtering. Third, missing user preference information is predicted using the item and user clusters, which mitigates the problems caused by the low density of the user preference matrix. Existing studies used either item-based or user-based prediction; this paper improves on Hao Ji's idea, which uses both. The reliability of the recommendation service is improved by combining the predicted values of both techniques according to the condition of the recommendation model. Because user preferences are predicted from the item or user clusters, the time required for prediction is reduced, and missing user preferences can be predicted at run-time. Fourth, the item and user feature vectors are made to learn from subsequent user feedback: normalized user feedback is applied to the item and user feature vectors. This mitigates the problems caused by borrowing concepts from content-based filtering, such as item and user feature vectors based on user profiles and item properties; those problems stem from the difficulty of quantifying the qualitative features of items and users. 
Therefore, the elements of the user and item feature vectors are made to correspond one to one, and when user feedback on a particular item is obtained, it is applied to each feature vector using its counterpart. The method was verified by comparing its performance with existing hybrid filtering techniques using two measures: MAE (Mean Absolute Error) and response time. The MAE results confirmed that the technique improves the reliability of the recommendation system, and the response times showed that it is suitable for large-scale recommendation systems. Although this paper suggests an Adaptive Clustering-based Collaborative Filtering Technique with high reliability and low time complexity, it has some limitations. The technique focused on reducing time complexity; hence, an improvement in reliability was not expected. Future work will improve the technique with rule-based filtering.
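The second and third steps above can be illustrated with a minimal Python sketch. It assumes the item clusters, user clusters and inter-cluster preferences have already been computed offline, and it blends cluster-restricted user-based and item-based estimates with a fixed weight `alpha`. The function names, the data layout, the toy data and the simple averaging are all hypothetical; the abstract does not give the paper's actual prediction formulas or its refinement of Hao Ji's method.

```python
from typing import List, Optional

# Hypothetical data layout (not taken from the paper):
#   ratings[(user, item)]   -> recorded preference score
#   user_cluster[user]      -> id of the user's cluster
#   item_cluster[item]      -> id of the item's cluster
#   inter_pref[(ucl, icl)]  -> assumed preference of a user cluster for an item cluster


def _mean(values: List[float]) -> Optional[float]:
    return sum(values) / len(values) if values else None


def predict_missing(user, item, ratings, user_cluster, item_cluster, alpha=0.5):
    """Blend a user-based and an item-based estimate, both restricted to clusters.

    user-based: mean rating of `item` by other users in the same user cluster.
    item-based: mean rating by `user` of other items in the same item cluster.
    `alpha` is a hypothetical fixed blending weight.
    """
    ucl, icl = user_cluster[user], item_cluster[item]
    user_based = _mean([r for (u, i), r in ratings.items()
                        if i == item and u != user and user_cluster[u] == ucl])
    item_based = _mean([r for (u, i), r in ratings.items()
                        if u == user and i != item and item_cluster[i] == icl])
    if user_based is None:
        return item_based  # may still be None if neither estimate exists
    if item_based is None:
        return user_based
    return alpha * user_based + (1 - alpha) * item_based


def recommend(user, ratings, user_cluster, item_cluster, inter_pref, k=3):
    """Return up to k items for `user`.

    Item clusters are visited in descending inter-cluster preference for the
    user's cluster; inside each cluster, items are ranked by the recorded
    preference or, if it is missing, by the predicted one.
    """
    ucl = user_cluster[user]
    clusters = sorted(sorted(set(item_cluster.values())),
                      key=lambda c: inter_pref.get((ucl, c), 0.0), reverse=True)
    result: List[str] = []
    for c in clusters:
        scored = []
        for item in sorted(i for i, ic in item_cluster.items() if ic == c):
            score = ratings.get((user, item))
            if score is None:
                score = predict_missing(user, item, ratings, user_cluster, item_cluster)
            if score is not None:  # skip items with no usable estimate
                scored.append((score, item))
        # Highest score first; the stable sort keeps alphabetical order on ties.
        for _, item in sorted(scored, key=lambda s: s[0], reverse=True):
            result.append(item)
            if len(result) == k:
                return result
    return result


# Toy usage with invented data.
ratings = {("u1", "a"): 5, ("u1", "b"): 3, ("u2", "a"): 4, ("u2", "c"): 2, ("u3", "d"): 1}
user_cluster = {"u1": 0, "u2": 0, "u3": 1}
item_cluster = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
inter_pref = {(0, "X"): 0.9, (0, "Y"): 0.4, (1, "X"): 0.2, (1, "Y"): 0.8}
print(recommend("u1", ratings, user_cluster, item_cluster, inter_pref))  # ['a', 'b', 'c']
```

Because the clustering and inter-cluster preferences are prepared before any request arrives, the run-time work reduces to scanning the user's preferred item clusters, which mirrors the load-shifting idea described in the abstract.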

A Comparative Study of Food Habits and Body Satisfaction of Middle School Students According to Clinical Symptoms (일부 남녀 중학생의 건강 관련 임상증상에 따른 식습관과 체형관심도에 관한 연구)

  • Sung, Chung-Ja
    • Journal of the Korean Society of Food Science and Nutrition / v.34 no.2 / pp.202-208 / 2005
  • This study was conducted to examine the food habits, nutrition knowledge and actual food intake of adolescent middle school students based on questionnaire answers. Questionnaires were completed by 524 students, who were divided into a healthy group (n=289) and an unhealthy group (n=235) according to clinical signs. The two groups were further questioned about food habits, nutrition knowledge and nutritional attitude. The results were as follows. The mean age of all subjects was 14 years; the heights of male and female students were 162.0 cm and 157.2 cm, and their weights were 53.4 kg and 49.4 kg, respectively. Male students were taller and heavier than female students. The body mass index (BMI) of male and female students was 20.3 kg/m² and 20.0 kg/m², respectively, and all data were within normal ranges. There were no significant differences in mean age, height, weight or BMI between the healthy and unhealthy groups. There was no significant difference in body image recognition between the two groups, although the rate of dissatisfaction with one's own body shape was significantly higher in the female unhealthy group (46.1%) than in the female healthy group (33.0%) (p<0.05). The proportion who had struggled to control body weight during the previous year was higher in the female unhealthy group (59.4%) than in the female healthy group (38.4%) (p<0.01). There was no significant difference between the two groups in nutrition knowledge or nutritional attitude scores. Regarding meal frequency and patterns, eating breakfast fewer than 4 times/week was significantly more common in the female unhealthy group (44.0%) than in the female healthy group (30.7%) (p<0.01). Eating supper fewer than 4 times/week was also more common in the female unhealthy group (18.8%) than in the female healthy group (10.7%). Therefore, the unhealthy group showed a higher rate of skipping both breakfast and supper. 
The male unhealthy group (16.7%) dined out more frequently than the male healthy group (12.3%) (p<0.01), and the female unhealthy group also snacked significantly more frequently than the female healthy group. The unhealthy group also ate only one food item per meal more frequently than the healthy group, although the difference was not significant. This study concludes that Korean adolescent middle school students with a higher incidence of clinical symptoms, representing an unhealthy status, skipped breakfast and supper and dined out and snacked more frequently. Their breakfast quality and body image satisfaction were also lower than those of the healthy group. These results indicate a high correlation between Korean adolescents' health status, food habits and body image satisfaction. It is recommended that a more intensive program of nutritional education and monitoring be introduced into the current Korean middle school system to support and maximize the health potential of Korean students.
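As an arithmetic check, BMI is weight in kilograms divided by the square of height in metres. Applying it to the group mean heights and weights reported above reproduces the reported BMIs. This is only an approximation, since a BMI computed from group means need not equal the mean of individual BMIs, and the abstract does not say which of the two was reported.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


# Group means taken from the abstract (not individual data).
print(round(bmi(53.4, 162.0), 1))  # male: 20.3
print(round(bmi(49.4, 157.2), 1))  # female: 20.0
```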

Study on Health Behavior of Hypertensive Patients and Compliance for Treatment of Antihypertensive Medication (고혈압 환자들의 순응도와 건강행태의 관계)

  • Kim, Joo-Yeon;Lee, Dong-Bae;Cho, Young-Chae;Lee, Sok-Goo;Chang, Seong-Sil;Kwon, Yun-Hyung;Lee, Tae-Yong
    • Journal of agricultural medicine and community health / v.25 no.1 / pp.29-49 / 2000
  • Objectives: To estimate the prevalence of hypertension, changes in health behavior, and compliance with drug treatment after a diagnosis of hypertension. Methods: 7,030 residents of Cheonan City, Chungnam Province, were selected by cluster sampling, and 5,372 of them were surveyed by questionnaire and health examination. The data were analyzed with chi-square tests on each variable. Results: 49.8% of men and 38.8% of women had been diagnosed with hypertension, and the prevalence of hypertension increased significantly with age in both genders. The prevalence tended to be lower among highly educated women. Unemployed or obese persons showed a relatively higher prevalence. The prevalence of hypertension was higher in groups with total cholesterol levels over 240 mg/dl and in groups with glucose levels over 200 mg/dl. 53.1% of male patients and 66.6% of female patients were compliant with antihypertensive treatment. Compliance was higher in older and less educated groups in both genders. Among men, the proportion of compliant subjects was higher in the unemployed group (49.3%) and lower among labor or primary-industry workers than among others, but among women there was no significant difference. Men who complied with treatment also had higher monthly incomes than the others, but women showed no such difference. Conclusion: This population had a high prevalence of hypertension, which may lead to cardiovascular disease. Therefore, health education programs and the distribution of information must be emphasized to increase compliance with treatment and to encourage health-promoting changes in health behavior.

CLINICAL STUDY OF THE ABUSE IN PSYCHIATRICALLY HOSPITALIZED CHILDREN AND ADOLESCENTS (소아청소년 정신과병동 입원아동의 학대에 대한 임상 연구)

  • Lee, Soo-Kyung;Hong, Kang-E
    • Journal of the Korean Academy of Child and Adolescent Psychiatry / v.10 no.2 / pp.145-157 / 1999
  • This study was performed on children and adolescents admitted to a child and adolescent psychiatric ward who had been physically or emotionally abused or neglected. We investigated the number of such cases among admitted children and adolescents, and also observed the characteristics of their symptoms, developmental history, abuse style, abusers, family dynamics and psychopathology. We hypothesized that all kinds of abuse would influence the victims' emotional and behavioral problems and developmental courses, with interactive effects on family dynamics and psychopathology. The subjects were 22 victims identified through clinical observation and clinical notes. The results were as follows. 1) Demographic characteristics of victims: the sex ratio was 1:6.3 (male:female), and the mean age was 11.1 ± 2.5 years. By birth order, 12 (54.5%) were first-born, 5 (23%) second-born, 2 (9%) third-born and 3 (13.5%) only children. 2) Characteristics of families: by socioeconomic status, 3 (13.5%) were middle-to-high class, 9 (41%) middle, 9 (41%) middle-to-low and 1 (4.5%) low. By family size, 3 (13.5%) had 3 or fewer members, 17 (77.5%) had 4-5 and 2 (9%) had 6-7. By parents' marital status, 5 (23%) were divorced or separated, 2 (9%) remarried, and 19 (86.5%) showed severe marital discord. Among fathers, 7 (32%) showed antisocial behavior and 10 (45.5%) alcohol dependence. Among mothers, 5 (23%) showed alcohol abuse, 17 (77.3%) depression and 6 (27%) a history of psychiatric management. 3) Characteristics of abuse: 18 (81.8%) experienced physical abuse, and 4 (18.2%) experienced physical and emotional abuse and neglect. By onset of abuse, 15 (54.5%) were abused before age 3, 5 (27.5%) at 3-6 years and 1 (15%) at school age. The father alone was the offender in 2 (19%) cases, the mother alone in 8 (35.4%) and both parents in 8 (35.4%); the abuse was accompanied by spouse abuse in 7 (27%) cases and by abuse of other siblings in 4 (18.2%). 
4) General characteristics and developmental history of victims: 12 (54.5%) were unwanted babies, 9 (41%) showed developmental delay before the abuse, and 15 (68%) had a comorbid developmental disorder; 6 (27.5%) showed no definite sign of developmental delay before the abuse. 5) Main and comorbid diagnoses: the main diagnoses were conduct disorder in 6 (27.3%), borderline child in 5 (23%), depression in 4 (18%), attention deficit hyperactivity disorder (ADHD) in 4 (18%), pervasive developmental disorder not otherwise specified in 2 (9%) and selective mutism in 1 (5%). Comorbid diagnoses included ADHD, borderline intelligence, mental retardation, learning disorder, developmental language disorder, oppositional defiant disorder, chronic tic disorder, functional enuresis and encopresis, anxiety disorder, dissociative disorder, and personality disorder due to a medical condition. 6) Course of treatment: the mean duration of admission was 2.4 ± 1.5 months. 11 (15%) showed improvement of symptoms, whereas 11 (50%) showed no change in symptoms.

Studies on the Improvement of the Cropping System (I) (작부체계(作付體系) 개선(改善)에 관(關)한 조사연구(調査硏究)(I))

  • Choi, Chang Yeol
    • Korean Journal of Agricultural Science / v.10 no.1 / pp.61-73 / 1983
  • This study was conducted to obtain fundamental information on improving cropping systems in order to increase land utilization rates and crop production. To group areas by their characteristics, Chungnam Province was classified into 4 classes: Suburb (Daedeog Gun, Cheonwon Gun), Plain (Nonsan Gun, Dangjin Gun), Coastal (Seosan Gun, Boryeong Gun) and Hilly region (Gongju Gun, Cheongyang Gun). 100 farm households were sampled from each region, and their cropping systems and the utilization of paddy and upland fields in 1982 were surveyed. The results are summarized as follows. 1. The average utilization rate of upland was 161.9%. The utilization rate of upland was highest in the plain region (188.9%) and lowest in the suburb (152.0%). 2. 32 kinds of crops were cultivated on upland. By share of planting area, soybean had the highest rate (18.8%), followed by barley (15.4%), red pepper (13.1%) and Chinese cabbage (10.1%); however, red pepper had the highest planting-area rate in the suburb, barley in the hilly region, and soybean in the plain and coastal regions. 3. The average utilization rate of paddy was 115.6%; it was highest in the suburb (140.0%) and lowest in the coastal region (108.2%). 4. 12 kinds of crops were cultivated in paddies before or after rice. Among these, barley had the highest area rate of cultivation (5.0%), followed by strawberry, but strawberry had the highest area rate in the suburb and barley in the other regions. 5. Cropping systems on upland were divided into single cropping and double cropping. Double cropping on upland was classified into 38 types by crop combination. 
Among the double-cropping types, soybean after barley accounted for 35.0% of the cultivated area, but in the suburb this type was uncommon and double cropping of vegetable combinations showed a high rate. 6. Double cropping in paddies was classified into 6 types. Overall, barley after rice showed the highest rate of cultivation area (42.8%) among crop combinations, but in the suburb this type was uncommon and fruit vegetables after rice showed the highest rate. Post-cropping to rice accounted for 76.3% of the whole double-cropping area in paddies, significantly more than pre-cropping to rice. 7. Some crop combinations consisted of crops of the same family or closely related crops whose crop-rotation characteristics are almost the same. The area cultivated with these unreasonable crop combinations was 19.09 ha. 8. On upland, the planting areas of cereal crops, vegetable crops and industrial crops were 88.92 ha, 93.70 ha and 21.80 ha, respectively. The planting area of cereal crops was significantly smaller than that of vegetable crops. 9. Most of the research reports on cropping systems from 1910 to 1980 concerned post-cropping after the rice harvest. The research objectives could be classified into 14 kinds, the most important being planting time, amount of manure, seeding rate, transplanting time, ridging method, sowing method and variety testing.

Clinical Applications and Efficacy of Korean Ginseng (고려인삼의 주요 효능과 그 임상적 응용)

  • Nam, Ki-Yeul
    • Journal of Ginseng Research / v.26 no.3 / pp.111-131 / 2002
  • Korean ginseng (Panax ginseng C.A. Meyer) has received a great deal of attention in the Orient and the West as a tonic agent, health food and/or alternative herbal therapeutic agent. However, controversy over the scientific evidence for its pharmacological effects, especially the evaluation of clinical efficacy and the methodological approach, remains to be resolved. The author reviewed articles published since 1980, when intensive pharmacodynamic studies on ginseng began. Special attention was paid to metabolic disorders including diabetes mellitus, circulatory disorders, malignant tumors, sexual dysfunction, and physical and mental performance, in order to give clear information to those interested in the pharmacological study of ginseng and to promote its clinical use. For chronic diseases such as diabetes mellitus, atherosclerosis, high blood pressure, malignant disorders and sexual disorders, ginseng seems to play a preventive and restorative role rather than a therapeutic one. In particular, ginseng plays a significant role in ameliorating subjective symptoms and in preventing the deterioration of quality of life caused by long-term exposure to chemical therapeutic agents. Its potency also seems mild, so it could be more effective when used concomitantly with conventional therapy. Clinical studies on the tonic effect of ginseng on work performance demonstrated that physical and mental dysfunction induced by various stresses improves as the adaptability of the physical condition increases. However, the results of clinical studies cannot yet be cited as indications, because they vary depending on the investigators who performed them. Therefore, standardized ginseng products and systematically planned double-blind randomized controlled trials are needed to assess ginseng's real efficacy before proposing indications. 
The pharmacological mode of action of ginseng has not yet been fully elucidated. Pharmacodynamic and pharmacokinetic research reveals that the role of ginseng does not seem to be confined to a single organ. Ginseng is known to play a beneficial role in general systems such as the central nervous, endocrine, metabolic and immune systems, which means that it improves general physical and mental conditions. Such multivalent effects can be attributed to ginseng's main active components, the ginsenosides, or to non-saponin compounds, which have recently been suggested as additional active ingredients. As is generally the case with other herbal medicines, the effects of ginseng cannot be attributed to a single compound or group of components; its diverse ingredients act synergistically or antagonistically with one another in a harmonized manner. A few adverse effects have been reported in clinical use; however, they have not been observed when standardized ginseng products were used at the recommended dose. Unfavorable interactions with other drugs have also been suggested, but in those reports information on the products and administered doses was not available. Nevertheless, the efficacy, safety, interactions and contraindications of ginseng with other medicines must be investigated more intensively to promote its clinical application. For example, recommended daily doses are not in agreement: 1-2 g in the West versus 3-6 g in the Orient. The duration of administration also seems to vary with the purpose. Two to three months are generally recommended to feel the benefit, but the time- and dose-dependent effects of ginseng remain to be resolved. Furthermore, the effects of ginsenosides transformed by the intestinal microflora, and differential effects associated with ginsenoside content and composition, should also be clinically evaluated in the future. 
In conclusion, the increasingly widespread use of ginseng as a herbal medicine or nutraceutical supplement warrants more rigorous investigation of its efficacy and safety. In addition, careful quality control of ginseng preparations is needed to ensure acceptable standardization of commercial products.

An Analysis on the Priority of Educational Needs of Teachers in Charge of Educational Contents of Invention Intellectual Property in Secondary Vocational Education (중등단계 직업교육에서의 발명·지식재산 교육내용에 대한 담당 교사의 교육요구도 우선 순위 분석)

  • Lee, Sang-hyun;Lee, Chan-joo;Lee, Byung-Wook
    • 대한공업교육학회지 (Journal of the Korean Institute of Industrial Educators) / v.40 no.2 / pp.155-174 / 2015
  • The purposes of this study were to analyze the priority of teachers' educational needs for educational contents on invention and intellectual property in secondary vocational education, and to provide fundamental data for developing job-training programs that build the teacher capabilities on which effective invention and intellectual property education in secondary vocational education depends. To achieve these purposes, the educational needs for invention and intellectual property contents in secondary vocational education, and their priority, were analyzed based on teachers' perceptions. The concrete results are as follows. First, the teachers' average educational need for the educational contents of invention and intellectual property in secondary vocational education was 5.02. There were 23 content items whose educational needs were higher than the overall average; these items and their averages were, in descending order: F4 (Scope of patent claims) 6.72, F5 (Modification and supplementation of specifications) 6.46, F2 (Preparing patent drawings) 6.39, F3 (Writing patent specifications and abstracts) 6.31, A5 (Invention methods and activities) 6.27, E6 (Invention design project) 6.15, H3 (Invention commercialization) 5.97, F1 (Patent information and application) 5.90, E5 (Design obligation) 5.78, E3 (Invention design process) 5.77, A4 (Invention and problem solving) 5.57, G2 (Patent investigation and classification) 5.47, C2 (Thinking methods for inventive problem solving) 5.45, E4 (Production of invention design products) 5.45, B5 (Invention patent project) 5.42, A2 (Creativity development) 5.26, C4 (Inventive problem solving project) 5.26, H4 (Invention marketing) 5.26, H2 (Analysis of invention commercialization) 5.20, D4 (Invention and management) 5.16, C3 (Problem solving activity) 5.14, E2 (Invention design devising and expression) 5.11 and B3 (Practice of invention methods) 5.08. Second, regarding the priority of the teachers' educational needs, 13 content items were in the first rank, 10 in the second rank and 17 in the third rank. The first-rank items were A4 (Invention and problem solving), A5 (Invention methods and activities), B5 (Invention patent project), C2 (Thinking methods for inventive problem solving), C4 (Inventive problem solving project), E3 (Invention design process), E4 (Production of invention design products), E5 (Design obligation), E6 (Invention design project), F1 (Patent information and application), F2 (Preparing patent drawings), F3 (Writing patent specifications and abstracts) and H3 (Invention commercialization). The second-rank items were A2 (Creativity development), B3 (Practice of invention methods), C3 (Problem solving activity), D4 (Invention and management), E2 (Invention design devising and expression), F4 (Scope of patent claims), F5 (Modification and supplementation of specifications), G2 (Patent investigation and classification), H2 (Analysis of invention commercialization) and H4 (Invention marketing). The third-rank items were all remaining content items.

Aesthetics of Samjae and Inequilateral Triangle Found in Ancient Triad of Buddha Carved on Rock - Centering on Formative Characteristics of Triad of Buddha Carved on Rock in Seosan - (고대(古代) 마애삼존불(磨崖三尊佛)에서 찾는 삼재(三才)와 부등변삼각(不等邊三角)의 미학(美學) - 서산마애삼존불의 형식미를 중심으로 -)

  • Rho, Jae-Hyun;Lee, Kyu-Wan;Jang, Il-Young;Goh, Yeo-Bin
    • Journal of the Korean Institute of Traditional Landscape Architecture / v.28 no.3 / pp.72-84 / 2010
  • This study was undertaken to provide basic data for implementing and applying Samjonseokjo (三尊石造), a traditional stone construction method, by confirming how constructive principles such as proportional beauty are expressed in the modeling of rock-carved Buddha triads formed in the Three Kingdoms period, centering on the Triad of Buddha Carved on Rock in Seosan. The findings are summarized as follows. 1. An analysis of the size and proportion of a total of 17 rock-carved Buddha triads showed that the average total height was 2.96 m for the Bonjonbul (本尊佛, main Buddha), 2.19 m for the right Hyeopsi (右挾侍, attendant) and 2.16 m for the left Hyeopsi (左挾侍). The resulting height ratio of 100:75:75 shows a left-right symmetrical balance. The area ratio of the left and right Hyeopsi was 13.4:13.7, so the two areas were evenly matched. 2. The Triad of Buddha Carved on Rock in Seosan is carved on Inam (印岩) rock beyond Sambulgyo bridge in the Yonghyeon valley. Its orientation was measured as S47°E, which is judged to aim at changing the image and aesthetic impression of the Buddhist statue with the direction of sunlight while preventing glare for worshipers. 3. As for the iconic characteristics of the Seosan triad, the Hyeopsi consist of a Bangasang (半跏像) and a Bongjiboju (捧持寶珠)-type Bosangipsang. Although the triad is composed asymmetrically from left to right, unity is achieved by repeating the same lines and shapes, which produces a stable visual balance. 4. For the Seosan triad, the total heights of the Bonjonbul, left Hyeopsi and right Hyeopsi were 2.80 m, 1.66 m and 1.70 m, respectively. The height ratio of the left and right Hyeopsibul was 0.60:0.62, almost equal; the area ratio, however, was 28.8:25.2, a larger difference. 
The planar area ratio was found to come closer to the Samjae aesthetic proportion. 5. The axial angles of the triangle centered on the Gwangbae (光背, halos) were 84:46:50, close to a right triangle, whereas the axial angles of the triangle centered on the Yeonhwajwa (蓮華坐, lotus position) were 135:25:20, an inequilateral triangle close to an obtuse triangle. Accordingly, the upper and lower parts of the Seosan triad achieve a stable sense of proportion while maintaining a corresponding relationship through the angular proportions of a right-angled and an obtuse-angled inequilateral triangle. 6. The distance ratio in the upper half was 0.51:0.36:0.38, whereas that in the lower half was 0.53:0.33:0.27. Thus, up-down and left-right symmetrical balance is formed while the overall image approaches an inequilateral triangle. 7. An examination of the Samjae-mi (三才美) relationship in the Seosan triad showed that the angular ratios forming the area ratio or the triangular form were more notable than the length ratios. The inequilateral triangles formed by the Gwangbae (光背) in the upper part and the Yeonhwajwa (lotus position) in the lower part are an important internal motif that enhances the constructive beauty of Samjae, no less than the height and area ratios among the three Buddhas.

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining investors' returns. As a result, predicting companies' credit ratings with statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT) and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rely on strict assumptions, including linearity, normality, independence among predictor variables and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN) and Support Vector Machines (SVM). In particular, SVM is recognized as a new and promising method for classification and regression analysis. SVM learns a separating hyperplane that maximizes the margin between two categories. It is simple enough to be analyzed mathematically and leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, so overfitting is unlikely to occur. SVM also does not require many training samples, since it builds prediction models using only some representative samples near the boundaries, called support vectors. A number of experimental studies have shown that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can degrade SVM's performance. 
First, SVM was originally proposed for binary classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not perform as well on multi-class problems as SVM does on binary problems. Second, approximation algorithms (e.g. decomposition methods and the sequential minimal optimization algorithm) can be used for efficient multi-class computation to reduce computation time, but they can degrade classification performance. Third, multi-class prediction problems suffer from data imbalance, which occurs when the number of instances in one class greatly outnumbers that in another class. Such data sets often lead to a default classifier with a skewed boundary, which reduces its classification accuracy. SVM ensemble learning is one machine learning method for coping with these drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms, and AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by training classifiers sequentially while increasing the weight of misclassified observations at each iteration. Observations incorrectly predicted by previous classifiers are chosen more often than correctly predicted ones. Thus boosting attempts to produce new classifiers that better predict the examples on which the current ensemble performs poorly. In this way, it can reinforce the training of misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem. 
Because MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can take into account geometric mean-based accuracy and errors across multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. Ten-fold cross-validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. In each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and each set is in turn used as the test set while the classifier is trained on the other nine. That is, the cross-validation folds were tested independently for each algorithm. Through these steps, we obtained results for each classifier on all 30 experiments. In arithmetic mean-based prediction accuracy, MGM-Boost (52.95%) outperforms both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also outperforms AdaBoost (24.65%) and SVM (15.42%) in geometric mean-based prediction accuracy. A t-test is used to examine whether the classifiers' performance over the 30 folds differs significantly. The results indicate that MGM-Boost's performance differs significantly from that of the AdaBoost and SVM classifiers at the 1% level. These results indicate that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
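The two accuracy measures and the evaluation protocol can be sketched as follows. Since the abstract does not define them, we assume that arithmetic and geometric mean-based accuracy are the arithmetic and geometric means of per-class accuracy (recall), and that the t-test is a paired test over the 30 folds. All function names and the fold-assignment scheme are hypothetical.

```python
import math
import random
import statistics
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Fraction of each true class that was predicted correctly (per-class recall)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {c: hits[c] / totals[c] for c in totals}


def arithmetic_mean_accuracy(y_true, y_pred):
    acc = per_class_accuracy(y_true, y_pred)
    return sum(acc.values()) / len(acc)


def geometric_mean_accuracy(y_true, y_pred):
    acc = per_class_accuracy(y_true, y_pred)
    return math.prod(acc.values()) ** (1.0 / len(acc))


def repeated_kfold(n, k=10, repeats=3, seed=0):
    """Yield (train, test) index lists: k folds per repeat, reshuffled with seed + r."""
    for r in range(repeats):
        idx = list(range(n))
        random.Random(seed + r).shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # near-equal-sized folds
        for i in range(k):
            train = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield train, folds[i]


def paired_t_statistic(a, b):
    """t = mean(d) / (sd(d) / sqrt(n)) for per-fold score differences d = a - b."""
    d = [x - y for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))
```

A classifier that always predicts the majority class shows why the geometric mean is stricter: with class counts of 6, 3, and 1, it scores 1/3 on the arithmetic mean of class accuracies but 0 on the geometric mean, because any class with zero recall zeroes the product.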

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Mode (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.141-154 / 2019
  • Internet technology and social media are growing rapidly, and data mining technology has evolved to represent unstructured documents in a variety of applications. Sentiment analysis is an important technology that can distinguish poor-quality from high-quality content in text data about products, and it has spread widely within text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning them to predefined categories such as positive and negative. It has been studied in various directions with respect to accuracy, from simple rule-based approaches to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active research areas in natural language processing and is widely studied in text mining. Because real online reviews are openly available to others, they are not only easy to collect but also affect businesses. In marketing, real-world information from customers is gathered on websites rather than through surveys. Depending on whether posts on a website are positive or negative, customer responses are reflected in sales, so companies try to identify this information. However, reviews on a website are not always favorable, and their sentiment can be difficult to identify. Earlier studies in this area used review data from the Amazon.com shopping mall, whereas recent studies have used data on stock market trends, blogs, news articles, weather forecasts, IMDB, Facebook, and so on. However, a lack of accuracy has been recognized, because sentiment calculations change with the subject, the paragraph, the direction of the sentiment lexicon, and sentence strength. This study aims to classify sentiment polarity into positive and negative categories and to increase the prediction accuracy of polarity analysis using the preprocessed IMDB review data set.
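The dictionary-based approach mentioned above, and why the direction of lexicon entries matters, can be illustrated with a tiny polarity scorer. The lexicon, its weights, the negation words, and the rule that a negator flips the next sentiment-bearing word are all hypothetical choices for illustration; the study itself uses learned models on IMDB data, not this scorer.

```python
import re

# Hypothetical toy lexicon: word -> polarity weight (not taken from the study).
LEXICON = {"great": 2.0, "good": 1.0, "moving": 1.0,
           "bad": -1.0, "boring": -1.5, "awful": -2.0}
# Hypothetical negators that flip the sign of the next sentiment-bearing word.
NEGATORS = {"not", "never", "no"}


def lexicon_score(text: str) -> float:
    """Sum lexicon weights over the tokens, flipping the next hit after a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score, negate = 0.0, False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        weight = LEXICON.get(tok, 0.0)
        if weight:
            score += -weight if negate else weight
            negate = False
    return score


def polarity(text: str) -> str:
    """Map the score to a label; a zero score is reported as neutral."""
    s = lexicon_score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"
```

For example, `polarity("The plot was not bad")` returns `"positive"` because the negator flips "bad", whereas "not good, just boring" scores -2.5 and is labeled negative. Fixed rules like these are exactly what breaks down when sentiment depends on subject and context, which motivates the learned models below.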
First, popular machine learning algorithms for sentiment-related text classification, namely NB (naive Bayes), SVM (support vector machines), XGBoost, RF (random forests), and Gradient Boosting, are adopted as comparative models. Second, deep learning models can learn discriminative features that capture complex structure in data. Representative algorithms are CNN (convolutional neural networks), RNN (recurrent neural networks), and LSTM (long short-term memory). When a sentence is processed in vector form, a CNN can be used much like a bag-of-words (BoW) model, but it does not consider the sequential nature of the data. An RNN handles order well because it takes the temporal information in the data into account, but it suffers from the long-term dependency problem. LSTM is used to solve this problem. For comparison, CNN and LSTM were chosen as simple deep learning models, and they and the integrated model were analyzed in addition to the classical machine learning algorithms. Because the algorithms have many parameters, we examined the relationship between parameter values and precision to find the best combination. We also sought to understand how well these models work for sentiment analysis and how they work. This study proposes an integrated CNN and LSTM algorithm to extract the positive and negative features of text. The two algorithms are combined for the following reasons. A CNN can extract features for classification automatically by applying convolution layers and massively parallel processing, whereas an LSTM is not capable of highly parallel processing. Like faucets, the LSTM's input, output, and forget gates can be opened and closed at the desired time. These gates have the advantage of placing memory blocks on the hidden nodes. Although the LSTM memory block may not store all the data, it can solve the CNN's long-term dependency problem.
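The faucet analogy corresponds to the standard LSTM cell with a forget gate: sigmoid gates with values in (0, 1) scale how much new input enters the memory cell (input gate), how much of the old cell state is kept (forget gate), and how much of the cell is exposed as the hidden state (output gate). The sketch below computes one time step in plain Python; the parameter layout (`W`, `U`, `b` per gate), the helper `gate_params`, and the hand-set bias values are hypothetical, not trained weights from the study.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step.

    params maps each of "input", "forget", "output", "cell" to (W, U, b):
    W is hidden x input, U is hidden x hidden, b has length hidden.
    """
    def pre_activation(name):
        W, U, b = params[name]
        return [sum(w * xi for w, xi in zip(W[k], x))
                + sum(u * hi for u, hi in zip(U[k], h_prev))
                + b[k]
                for k in range(len(b))]

    i = [sigmoid(z) for z in pre_activation("input")]   # how much new content flows in
    f = [sigmoid(z) for z in pre_activation("forget")]  # how much old memory is kept
    o = [sigmoid(z) for z in pre_activation("output")]  # how much memory is exposed
    g = [math.tanh(z) for z in pre_activation("cell")]  # candidate new content
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def gate_params(forget_bias, input_bias, cell_bias=0.0):
    """Hypothetical 1-unit, 1-input parameters with zero weights, so only biases act."""
    def gate(bias):
        return ([[0.0]], [[0.0]], [bias])  # W, U, b
    return {"input": gate(input_bias), "forget": gate(forget_bias),
            "output": gate(0.0), "cell": gate(cell_bias)}
```

With `gate_params(forget_bias=20.0, input_bias=-20.0)`, the forget "faucet" is open and the input one is closed, so `lstm_step([1.0], [0.0], [0.7], ...)` returns a cell state of about 0.7; swapping the two biases discards the old state and replaces it with the candidate `tanh(cell_bias)`.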
Furthermore, when the LSTM is applied to the output of the CNN's pooling layer, the model has an end-to-end structure, so spatial and temporal features can be modeled simultaneously. The combined CNN-LSTM model achieved 90.33% accuracy. It is slower than the CNN but faster than the LSTM, and it was more accurate than the other models. In addition, the word embedding layer can be improved as the kernels are trained step by step. CNN-LSTM can compensate for the weaknesses of each model, and the end-to-end structure of the LSTM has the advantage of improving learning layer by layer. For these reasons, this study seeks to improve the classification accuracy of movie reviews using the integrated CNN-LSTM model.
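The pipeline in which the LSTM reads the CNN's pooled output can be sketched as a forward pass in plain Python. The layer sizes, the single-unit LSTM, the pooling window of 2, and the random untrained weights are all assumptions for illustration, not the configuration or results reported in the study; all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def conv1d_relu(seq, kernels, biases):
    """Valid 1-D convolution over time followed by ReLU.

    seq: T embedding vectors of size d; kernels: F filters, each `width` vectors of size d.
    Returns T - width + 1 feature vectors of size F.
    """
    width = len(kernels[0])
    out = []
    for t in range(len(seq) - width + 1):
        window = seq[t:t + width]
        out.append([max(0.0, b + sum(w * x for kv, xv in zip(kern, window)
                                     for w, x in zip(kv, xv)))
                    for kern, b in zip(kernels, biases)])
    return out


def max_pool(seq, size=2):
    """Non-overlapping max pooling over time; a trailing partial window is dropped."""
    return [[max(col) for col in zip(*seq[t:t + size])]
            for t in range(0, len(seq) - size + 1, size)]


def lstm_last_hidden(seq, p):
    """Single-unit LSTM over the pooled feature vectors; returns the final hidden state.

    p maps gate "i", "f", "o", "g" to (input weights, recurrent weight, bias).
    """
    h = c = 0.0
    for x in seq:
        z = {k: sum(w * xi for w, xi in zip(p[k][0], x)) + p[k][1] * h + p[k][2]
             for k in ("i", "f", "o", "g")}
        c = sigmoid(z["f"]) * c + sigmoid(z["i"]) * math.tanh(z["g"])
        h = sigmoid(z["o"]) * math.tanh(c)
    return h


def init_params(vocab, dim, n_filters, width, seed=0):
    """Hypothetical random (untrained) weights, just to run the forward pass."""
    rng = random.Random(seed)

    def vec(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]

    return {
        "emb": [vec(dim) for _ in range(vocab)],
        "kernels": [[vec(dim) for _ in range(width)] for _ in range(n_filters)],
        "biases": vec(n_filters),
        "lstm": {k: (vec(n_filters), rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5))
                 for k in ("i", "f", "o", "g")},
        "out": (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)),
    }


def cnn_lstm_positive_prob(token_ids, params):
    """Embedding -> convolution + ReLU -> max pooling -> LSTM -> sigmoid probability."""
    seq = [params["emb"][t] for t in token_ids]
    pooled = max_pool(conv1d_relu(seq, params["kernels"], params["biases"]))
    w_out, b_out = params["out"]
    return sigmoid(w_out * lstm_last_hidden(pooled, params["lstm"]) + b_out)
```

In this sketch pooling halves the sequence length before the recurrent step, which is one plausible reason such a combination can run faster than an LSTM over the full sequence, while the LSTM still reads the pooled features in order.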