• Title/Summary/Keyword: Third-order


A Folksonomy Ranking Framework: A Semantic Graph-based Approach (폭소노미 사이트를 위한 랭킹 프레임워크 설계: 시맨틱 그래프기반 접근)

  • Park, Hyun-Jung;Rho, Sang-Kyu
    • Asia pacific journal of information systems / v.21 no.2 / pp.89-116 / 2011
  • In collaborative tagging systems such as Delicious.com and Flickr.com, users assign keywords or tags to their uploaded resources, such as bookmarks and pictures, for future use or for sharing. The collection of resources and tags generated by a user is called a personomy, and the collection of all personomies constitutes the folksonomy. The most significant need of folksonomy users is to efficiently find useful resources or experts on specific topics. An excellent ranking algorithm would assign higher rankings to more useful resources or experts. What resources are considered useful in a folksonomic system? Does a standard superior to frequency or freshness exist? A resource recommended by more users with more expertise should be worthy of attention. This ranking paradigm can be implemented through a graph-based ranking algorithm. Two well-known representatives of such a paradigm are PageRank by Google and HITS (Hypertext Induced Topic Selection) by Kleinberg. Both PageRank and HITS assign a higher evaluation score to pages that are linked to more high-scoring pages. HITS differs from PageRank in that it utilizes two kinds of scores: authority and hub scores. The ranking objects of these algorithms are limited to Web pages, whereas the ranking objects of a folksonomic system are heterogeneous (i.e., users, resources, and tags). Therefore, uniform application of the link-based voting notion of PageRank and HITS to a folksonomy would be unreasonable. In a folksonomic system, each link corresponding to a property can have an opposite direction, depending on whether the property is expressed in the active or the passive voice. The current research stems from the idea that a graph-based ranking algorithm could be applied to the folksonomic system using the concept of mutual interactions between entities, rather than the voting notion of PageRank or HITS.
The concept of mutual interactions, proposed for ranking Semantic Web resources, enables the calculation of importance scores of various resources unaffected by link directions. The weights of a property representing the mutual interaction between classes are assigned depending on the relative significance of the property to the resource importance of each class. This class-oriented approach is based on the fact that, in the Semantic Web, there are many heterogeneous classes; thus, applying a different appraisal standard to each class is more reasonable. This is similar to human evaluation, where different items are assigned specific weights, which are then combined into a weighted average. We can check for missing properties more easily with this approach than with predicate-oriented approaches. A user of a tagging system usually assigns more than one tag to the same resource, and there can be more than one tag with the same subject and object. When many users assign similar tags to the same resource, grading the users differently depending on the assignment order becomes necessary. This idea comes from studies in psychology in which expertise involves the ability to select the most relevant information for achieving a goal. An expert should be someone who not only has a large collection of documents annotated with a particular tag, but also tends to add documents of high quality to his/her collections. Such documents are identified by the number, as well as the expertise, of users who have the same documents in their collections. In other words, there is a relationship of mutual reinforcement between the expertise of a user and the quality of a document. In addition, there is a need to rank entities related more closely to a certain entity. Considering that the popularity of a topic in social media is temporary, recent data should have more weight than old data.
We propose a comprehensive folksonomy ranking framework in which all these considerations are dealt with and which can be easily customized to each folksonomy site for ranking purposes. To examine the validity of our ranking algorithm and show the mechanism of adjusting property, time, and expertise weights, we first use a dataset designed for analyzing the effect of each ranking factor independently. We then show the ranking results of a real folksonomy site, with the ranking factors combined. Because the ground truth of a given dataset is not known when it comes to ranking, we inject simulated data whose ranking results can be predicted into the real dataset and compare the ranking results of our algorithm with those of a previous HITS-based algorithm. Our semantic ranking algorithm based on the concept of mutual interaction seems to be preferable to the HITS-based algorithm as a flexible folksonomy ranking framework. Some concrete points of difference are as follows. First, with the time concept applied to the property weights, our algorithm shows superior performance in lowering the scores of older data and raising the scores of newer data. Second, by applying the time concept to the expertise weights, as well as to the property weights, our algorithm controls the conflicting influence of expertise weights and enhances the overall consistency of time-valued ranking. The expertise weights of the previous study can act as an obstacle to time-valued ranking because the number of followers increases as time goes on. Third, many new properties and classes can be included in our framework. The previous HITS-based algorithm, based on the voting notion, loses ground when the domain consists of more than two classes, or when other important properties, such as "sent through twitter" or "registered as a friend," are added to the domain. Fourth, there is a big difference in calculation time and memory use between the two kinds of algorithms.
While the multiplication of two matrices has to be executed twice for the previous HITS-based algorithm, this is unnecessary with our algorithm. In our ranking framework, various folksonomy ranking policies can be expressed by combining the ranking factors, and our approach can work even if the folksonomy site is not implemented with Semantic Web languages. Above all, the time weight proposed in this paper will be applicable to various domains, including social media, where time value is considered important.
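
For orientation, the authority and hub scores of the HITS baseline mentioned in this abstract can be sketched as follows. This is the generic textbook HITS iteration, not the authors' mutual-interaction framework or the specific HITS-based algorithm they compare against; the function name `hits`, the edge-list input, and the iteration count are assumptions made for illustration.

```python
def hits(edges, iterations=50):
    """Generic HITS sketch: return (authority, hub) score dicts.

    `edges` is a list of (source, target) pairs, e.g. (user, resource).
    The two accumulation passes per iteration correspond to the two
    matrix products of classic HITS.
    """
    nodes = {node for edge in edges for node in edge}
    auth = {node: 1.0 for node in nodes}
    hub = {node: 1.0 for node in nodes}

    for _ in range(iterations):
        # Authority score: sum of hub scores of nodes pointing at the node.
        new_auth = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]

        # Hub score: sum of updated authority scores of nodes it points to.
        new_hub = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]

        # Normalize to unit Euclidean length so scores stay bounded.
        auth_norm = sum(v * v for v in new_auth.values()) ** 0.5 or 1.0
        hub_norm = sum(v * v for v in new_hub.values()) ** 0.5 or 1.0
        auth = {node: v / auth_norm for node, v in new_auth.items()}
        hub = {node: v / hub_norm for node, v in new_hub.items()}

    return auth, hub


# Hypothetical toy data: two users bookmarking two resources.
toy_edges = [("u1", "r1"), ("u2", "r1"), ("u1", "r2")]
authority, hub_score = hits(toy_edges)
```

Because every edge here points from a user to a resource, only resources receive authority and only users receive hub scores, which illustrates why the abstract argues that a direction-dependent voting notion fits a heterogeneous folksonomy poorly.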

Mesothermal Gold Mineralization in the Boseong-Jangheung area, Chollanamdo-province (전라남도 보성-장흥지역의 중열수 금광화작용)

  • 허철호;윤성택;소칠섭
    • Economic and Environmental Geology / v.35 no.5 / pp.379-393 / 2002
  • Within the Boseong-Jangheung area of Korea, five hydrothermal gold(-silver) quartz vein deposits occur. They share the following characteristic features: the relatively gold-rich nature of electrum; the absence of Ag-Sb(-As) sulfosalt minerals; and the massive and simple mineralogy of the veins. These features suggest that gold mineralization in this area is correlated with the Late Jurassic to Early Cretaceous mesothermal-type gold deposits in Korea. Fluid inclusions in stage I quartz of the mine area homogenize over a wide temperature range of 200° to 460°C, with salinities of 0.0 to 13.8 equiv. wt. % NaCl. The homogenization temperature of fluid inclusions in stage II calcite of the mine area ranges from 150° to 254°C, with salinities of 1.2 to 7.9 equiv. wt. % NaCl. This indicates cooling of the hydrothermal fluid with time toward the waning of hydrothermal activity. Evidence of fluid boiling, including CO$_2$ effervescence, indicates that pressures during entrapment of auriferous fluids in this area ranged up to 770 bars. The calculated sulfur isotope composition of auriferous fluids in this mine area ($\delta^{34}$S$_{\Sigma S}$, ‰) indicates an igneous source of sulfur in the auriferous hydrothermal fluids. Within the Sobaegsan Massif, two representative mesothermal-type gold mine areas (the Youngdong and Boseong-Jangheung areas) occur. The $\delta^{34}$S values of sulfide minerals from the Youngdong area range from -6.6 to 2.3‰ (average = -1.4‰, N = 66), and those from the Boseong-Jangheung area range from -0.7 to 3.6‰ (average = 1.6‰, N = 39). The $\delta^{34}$S values of both areas are comparatively lower than those of most Korean metallic ore deposits (3 to 7‰). In addition, within the Sobaegsan Massif, the $\delta^{34}$S values of the Youngdong area are lower than those of the Boseong-Jangheung area.
It is inferred that the difference in $\delta^{34}$S values within the Sobaegsan Massif can be caused by any of the following mechanisms: (1) the presence of at least two distinct reservoirs (both igneous, with $\delta^{34}$S values of < -6‰ and 2±2‰) for Jurassic mesothermal-type gold deposits in both areas; (2) different degrees of mixing (assimilation) of $^{32}$S-enriched sulfur (possibly sulfur in Precambrian pelitic basement rocks) during the generation and/or subsequent ascent of magma; and/or (3) different degrees of oxidation of an H$_2$S-rich, magmatically derived sulfur source ($\delta^{34}$S = 2±2‰) during the ascent to mineralization sites. According to the observed differences in ore mineralogy (especially iron-bearing ore minerals) and fluid inclusions of quartz from the mesothermal-type deposits in both areas, we conclude that the pyrrhotite-rich mesothermal-type deposits in the Youngdong area formed from higher-temperature and more reducing fluids than did the pyrite(-arsenopyrite)-rich mesothermal-type deposits in the Boseong-Jangheung area. We therefore prefer the third mechanism to the others, because the $\delta^{34}$S values of the Precambrian gneisses and Paleozoic sedimentary rocks occurring in both areas are not yet known. In future, in order to elucidate the provenance of ore sulfur more systematically, we need to determine the $\delta^{34}$S values of the Precambrian metamorphic rocks and Paleozoic sedimentary rocks constituting the basement of the Korean Peninsula, including the Sobaegsan Massif.
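
For readers unfamiliar with the notation, the per mil values quoted in this abstract follow the conventional delta definition of a sulfur isotope ratio relative to a reference standard. The abstract does not name the standard used, so the subscript "standard" below is left generic as an assumption:

```latex
\[
\delta^{34}\mathrm{S}
  = \left(
      \frac{\left({}^{34}\mathrm{S}/{}^{32}\mathrm{S}\right)_{\text{sample}}}
           {\left({}^{34}\mathrm{S}/{}^{32}\mathrm{S}\right)_{\text{standard}}}
      - 1
    \right) \times 1000 \ \text{(per mil)}
\]
```

Under this definition a lower $\delta^{34}$S value, such as the Youngdong average of -1.4‰ against the Boseong-Jangheung average of 1.6‰, means the sample is relatively enriched in the lighter isotope $^{32}$S.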

An Analysis of the Home Economics Education Discipline Items in the Teacher Recruitment Examination for Secondary School (중등교사 신규임용 후보자 선정 경쟁시험 가정과 교과교육학 출제 문항 분석)

  • Kim, Sung-Sook;Chae, Jung-Hyun
    • Journal of Korean Home Economics Education Association / v.19 no.3 / pp.149-168 / 2007
  • The purpose of this study was to analyze the home economics education items in the teacher recruitment examination for secondary schools. To this end, all the home economics education items from the seven examinations held from school year 2001 to the most recent year, 2007, were compared and analyzed. The form of the items was analyzed by frequency and rate. The behavioral domain of the items was analyzed by content analysis. Recommendations for improving the quality of home economics education items were suggested through discussion of science education and society education items drawn from the same period (school years 2001 to 2007). The results of this study were as follows. First, the score ratio of home economics education items fluctuated between 20% and 30% from school year 2001 to 2004 but has been fixed at 30-35% since school year 2005. Among the subcategories of home economics education, curriculum items accounted for the highest ratio (43%), followed in order by items on teaching-learning methods (35%), evaluation (19%), and philosophy (3%). Second, single items and subordinate items coexisted from school year 2001 to 2004, but since school year 2005 all items have been single items. Third, regarding content, most of the curriculum items were related to the content of the 7th National Curriculum. Teaching-learning method items were taken mostly from models of teaching-learning. Evaluation items were taken mostly from performance assessment. Philosophy items related to home economics education were taken only from Habermas's three systems of action, in school year 2005. Fourth, regarding the behavioral domain, most of the curriculum items were at the level of 'simple knowledge or memory'.
Therefore, it was suggested that the behavioral domain of curriculum items be changed to 'complex knowledge or comprehension and application'. The behavioral domain of teaching-learning method items and education evaluation items was mostly 'complex knowledge or comprehension and application'. However, to improve these items, it was suggested that their behavioral domain be shifted from 'comprehension' toward 'application'. Fifth, regarding coverage, curriculum items were limited to the superficial content of the 7th National Curriculum. Therefore, it was suggested that the coverage of curriculum items be extended to theoretical content, namely the philosophical background and various principles of the curriculum. It was suggested that the coverage of teaching-learning method items be extended to include various teaching-learning theories and practical reasoning home economics instruction, which has recently proved effective. Evaluation items were taken mostly from performance assessment. Therefore, it was suggested that the coverage of evaluation items be extended to the analysis of evaluation results, item validity and reliability, and the evaluator's philosophical perspective.


Lower Lung Field Tuberculosis (폐 하야 결핵)

  • Moon, Doo-Seop;Lim, Byung-Sung;Kim, Yeon-Soo;Kim, Seong-Min;Lee, Jae-Young;Lee, Dong-Suck;Sohn, Jang-Won;Lee, Kyung-Sang;Yang, Suck-Chul;Yoon, Ho-Joo;Shin, Dong-Ho;Park, Sung-Soo;Lee, Jung-Hee
    • Tuberculosis and Respiratory Diseases / v.44 no.2 / pp.232-240 / 1997
  • Background: Postprimary pulmonary tuberculosis is located mainly in the upper lobes. A tuberculous lesion involving the lower lobes usually arises from an upper lobe cavity through endobronchial spread. When tuberculosis is confined to the lower lung field, it often masquerades as pneumonia, lung cancer, bronchiectasis, or lung abscess. Thus the correct diagnosis may sometimes be delayed for a long time. Methods: We retrospectively carried out a clinical study of 50 patients with confirmed lower lung field tuberculosis who visited the Department of Pulmonary Medicine at Hanyang University Hospital from January 1992 to December 1994. Results: Lower lung field tuberculosis without concomitant upper lobe disease occurred in 50 patients, representing 6.9% of all admissions with active pulmonary tuberculosis over a period of 3 years. It occurred most frequently in the third decade, but the age distribution was relatively even. The mean age was 43 years. Females were affected more frequently than males (male to female ratio 1 : 1.9). The most common symptom was cough (68%), followed by sputum (52%), fever (38%), and chest discomfort (30%). On chest X-ray of the 50 patients, consolidation was the most common finding (52%), followed in decreasing order by solitary nodule (22%), collapse (16%), and cavitary lesion (10%). The disease was confined to the right side in 25 cases, to the left side in 20 cases, and involved both sides in 5 cases. Endobronchial tuberculosis: (1) Endobronchial involvement was proved by bronchoscopic examination in 20 of the 50 patients. (2) The mean age was 44 years, and females were affected more often than males (male to female ratio 1 : 3). Unlike in upper lobe tuberculosis, sputum AFB stain and Mycobacterium tuberculosis culture were positive in only 50% of cases, so additional diagnostic methods were needed. In our study, bronchoscopic examination and percutaneous fine needle aspiration biopsy increased the diagnostic yield by 18% and 32%, respectively.
The most common associated condition was diabetes mellitus (18%); others were anemia, anorexia nervosa, stomach cancer, and systemic steroid usage. Conclusion: When we find a lower lung field lesion, we should suspect tuberculosis if the patient has diabetes mellitus, anemia, systemic steroid usage, malignancy, or another immunosuppressed state. Because the diagnostic yield of sputum AFB smear and Mycobacterium tuberculosis culture was low, additional diagnostic methods such as bronchoscopy and fine needle aspiration biopsy were needed.


An Intelligent Decision Support System for Selecting Promising Technologies for R&D based on Time-series Patent Analysis (R&D 기술 선정을 위한 시계열 특허 분석 기반 지능형 의사결정지원시스템)

  • Lee, Choongseok;Lee, Suk Joo;Choi, Byounggu
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.79-96 / 2012
  • As the pace of competition dramatically accelerates and the complexity of change grows, a variety of research has been conducted to improve firms' short-term performance and to enhance their long-term survival. In particular, researchers and practitioners have paid attention to identifying promising technologies that give a firm a competitive advantage. The discovery of promising technologies depends on how a firm evaluates the value of technologies; thus, many evaluation methods have been proposed. Approaches based on experts' opinions have been widely accepted for predicting the value of technologies. Whereas this approach provides in-depth analysis and ensures the validity of analysis results, it is usually cost- and time-ineffective and is limited to qualitative evaluation. A considerable number of studies attempt to forecast the value of technology by using patent information to overcome the limitations of the expert-opinion-based approach. Patent-based technology evaluation has served as a valuable assessment approach in technological forecasting because patents contain a full and practical description of technology in a uniform structure. Furthermore, they provide information that is not divulged in any other source. Although the patent-information-based approach has contributed to our understanding of the prediction of promising technologies, it has some limitations because predictions have been made based on past patent information, and the interpretations of patent analyses are not consistent. To fill this gap, this study proposes a technology forecasting methodology that integrates the patent information approach with an artificial intelligence method. The methodology consists of three modules: evaluation of technology promise, implementation of a technology value prediction model, and recommendation of promising technologies. In the first module, technology promise is evaluated from three different and complementary dimensions: impact, fusion, and diffusion.
The impact of technologies refers to their influence on the development and improvement of future technologies, and is also clearly associated with their monetary value. The fusion of technologies denotes the extent to which a technology fuses different technologies, and represents the breadth of search underlying the technology. The fusion of technologies can be calculated per technology or per patent; thus, this study measures two types of fusion index: fusion index per technology and fusion index per patent. Finally, the diffusion of technologies denotes their degree of applicability across scientific and technological fields. In the same vein, a diffusion index per technology and a diffusion index per patent are considered. In the second module, the technology value prediction model is implemented using an artificial intelligence method. This study uses the values of five indexes (i.e., impact index, fusion index per technology, fusion index per patent, diffusion index per technology, and diffusion index per patent) at different times (e.g., t-n, t-n-1, t-n-2, $\cdots$) as input variables. The output variables are the values of the five indexes at time t, which are used for learning. The learning method adopted in this study is the backpropagation algorithm. In the third module, this study recommends the final promising technologies based on the analytic hierarchy process (AHP). AHP provides the relative importance of each index, leading to a final promising index for each technology. The applicability of the proposed methodology is tested using U.S. patents in international patent class G06F (i.e., electronic digital data processing) from 2000 to 2008. The results show that the mean absolute error of predictions produced by the proposed methodology is lower than that produced by multiple regression analysis in the case of the fusion indexes. However, mean absolute error value of the proposed methodology is slightly higher than the value of multiple regression analysis.
These unexpected results may be explained, in part, by the small number of patents. Since this study only uses patent data in class G06F, the number of sample patents is relatively small, leading to learning that is insufficient for a complex artificial intelligence structure. In addition, the fusion index per technology and the impact index are found to be important criteria for predicting promising technologies. This study attempts to extend existing knowledge by proposing a new methodology for predicting technology value that integrates patent information analysis with an artificial neural network. By providing a quantitative prediction methodology, it helps managers who want to plan technology development and policy makers who want to implement technology policy. In addition, this study could help other researchers by providing a deeper understanding of the complex technological forecasting field.
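
The data flow described in this abstract (past index values as inputs, the value at time t as the target, mean absolute error for comparison, and an AHP-weighted final index) can be sketched as below. This is a minimal illustration under stated assumptions: the function names, the window length, and the weights are hypothetical, and neither the backpropagation network nor the authors' actual AHP weights are reproduced.

```python
def lagged_samples(series, n_lags):
    """Turn one index time series into (past_values, target) pairs.

    Each sample uses the `n_lags` values before time t as inputs and the
    value at time t as the target, mirroring the input/output split the
    abstract describes.
    """
    return [
        (series[t - n_lags:t], series[t])
        for t in range(n_lags, len(series))
    ]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def promising_index(index_values, weights):
    """Weighted sum of index values; the weights stand in for AHP output."""
    if len(index_values) != len(weights):
        raise ValueError("one weight is needed per index")
    return sum(w * v for w, v in zip(weights, index_values))


# Hypothetical yearly values of a single index.
samples = lagged_samples([1, 2, 3, 4, 5], n_lags=2)
error = mean_absolute_error([1, 2, 3], [1, 3, 5])
score = promising_index([2.0, 4.0], [0.5, 0.5])
```

Comparing two models then amounts to computing `mean_absolute_error` for each on the same held-out targets, which is how the abstract contrasts the proposed method with multiple regression.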

Scalable Collaborative Filtering Technique based on Adaptive Clustering (적응형 군집화 기반 확장 용이한 협업 필터링 기법)

  • Lee, O-Joun;Hong, Min-Sung;Lee, Won-Jin;Lee, Jae-Dong
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.73-92 / 2014
  • An Adaptive Clustering-based Collaborative Filtering Technique was proposed to solve the fundamental problems of collaborative filtering, such as the cold-start, scalability, and data sparsity problems. Previous collaborative filtering techniques made recommendations based on the predicted preference of a user for a particular item, using a similar-item subset and a similar-user subset composed from users' preferences for items. For this reason, if the density of the user preference matrix is low, the reliability of the recommendation system decreases rapidly, and creating the similar-item and similar-user subsets becomes more difficult. In addition, as the scale of the service increases, the time needed to create the similar-item and similar-user subsets increases geometrically, and the response time of the recommendation system increases accordingly. To solve these problems, this paper suggests a collaborative filtering technique that actively adapts conditions to the model and adopts the concepts of a context-based filtering technique. This technique consists of four major methodologies. First, items and users are clustered according to their feature vectors, and an inter-cluster preference between each item cluster and user cluster is then assumed. With this method, the run time for creating a similar-item or similar-user subset can be reduced, the reliability of the recommendation system can be made higher than when only user preference information is used to create these subsets, and the cold-start problem can be partially solved. Second, recommendations are made using the previously composed item and user clusters and the inter-cluster preference between each item cluster and user cluster.
In this phase, a list of items is made for a user by examining the item clusters in order of the inter-cluster preference of the user cluster to which the user belongs, and by selecting and ranking the items according to the predicted or recorded user preference information. With this method, the model-creation phase bears most of the load of the recommendation system, which minimizes the load at run time. Therefore, the scalability problem can be addressed, and a large-scale recommendation system can be operated with highly reliable collaborative filtering. Third, missing user preference information is predicted using the item and user clusters. With this method, the problem caused by the low density of the user preference matrix can be mitigated. Existing studies used either an item-based prediction or a user-based prediction. In this paper, Hao Ji's idea, which uses both an item-based prediction and a user-based prediction, was improved. The reliability of the recommendation service can be improved by combining the predictive values of both techniques according to the condition of the recommendation model. By predicting user preference based on the item or user clusters, the time required to predict user preference can be reduced, and missing user preferences can be predicted at run time. Fourth, the item and user feature vectors are made to learn from user feedback as it is input. This phase applies normalized user feedback to the item and user feature vectors. This method can mitigate the problems caused by the use of concepts from context-based filtering, such as item and user feature vectors based on the user profile and item properties. The problems with using item and user feature vectors are due to the limitations of quantifying the qualitative features of items and users.
Therefore, the elements of the user and item feature vectors are made to match one to one, and if user feedback on a particular item is obtained, it is applied to each feature vector using the opposite one. This method was verified by comparing its performance with that of existing hybrid filtering techniques. Two measures were used for verification: MAE (Mean Absolute Error) and response time. In terms of MAE, this technique was confirmed to improve the reliability of the recommendation system. In terms of response time, this technique was found to be suitable for a large-scale recommendation system. This paper suggested an Adaptive Clustering-based Collaborative Filtering Technique with high reliability and low time complexity, but it had some limitations. This technique focused on reducing the time complexity. Hence, an improvement in reliability was not expected. The next topic will be to improve this technique by rule-based filtering.
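
The third step of this abstract, predicting a missing preference by combining a user-based and an item-based estimate within clusters, can be sketched as follows. This is only an illustration under assumptions: the function names, the dictionary-based data layout, the fixed blending weight `alpha`, and the simple cluster-mean estimators are all hypothetical and are not taken from the paper.

```python
def _mean(values):
    """Mean of an iterable, or None when it is empty."""
    values = list(values)
    return sum(values) / len(values) if values else None


def predict_preference(ratings, user_cluster, item_cluster, user, item, alpha=0.5):
    """Blend a user-cluster estimate with an item-cluster estimate.

    ratings:      {(user, item): rating}
    user_cluster: {user: cluster id}
    item_cluster: {item: cluster id}
    alpha:        hypothetical weight on the user-based estimate
    """
    # User-based: how users in the same cluster rated this item.
    user_based = _mean(
        r for (u, i), r in ratings.items()
        if i == item and u != user and user_cluster[u] == user_cluster[user]
    )
    # Item-based: how this user rated other items in the same item cluster.
    item_based = _mean(
        r for (u, i), r in ratings.items()
        if u == user and i != item and item_cluster[i] == item_cluster[item]
    )
    if user_based is None:
        return item_based
    if item_based is None:
        return user_based
    return alpha * user_based + (1 - alpha) * item_based


# Hypothetical toy data.
toy_ratings = {("a", "x"): 4.0, ("b", "x"): 2.0, ("c", "y"): 5.0, ("c", "z"): 1.0}
toy_user_cluster = {"a": 0, "b": 0, "c": 0}
toy_item_cluster = {"x": 0, "y": 0, "z": 1}
estimate = predict_preference(toy_ratings, toy_user_cluster, toy_item_cluster, "c", "x")
```

Restricting both estimates to cluster members is what keeps the run-time cost low in this sketch: only ratings inside the relevant clusters contribute, rather than a freshly computed neighborhood over all users or items.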

A Study on Corrosion according to Distance between Amalgam and Dissimilar Metals (아말감과 이종(異種)금속의 거리에 따른 부식에 대한 고찰)

  • Kim, Ju-won;Jeong, Eun-gyeong
    • Journal of dental hygiene science / v.4 no.3 / pp.103-109 / 2004
  • The present study prepared 72 test samples - 24 made of amalgam alloy, 24 of Verabond (Ni-Cr alloy) for crown and 24 of Talladium™ alloy for denture - according to the manufacturers' manuals and the general method, in consideration of the mesial-distal width of the dental crown of the lower first molar and the MOD cavity in clinics, put them in a 200 ml beaker containing 80 ml of artificial saliva, and measured their galvanic corrosion at distances of 0 mm, 7 mm and 40 mm after 7 days. Metals isolated in the electrolyte, such as Cu, Ag, Ni, Cr, Sn, Zn and Hg, were quantitatively analyzed with an Inductively Coupled Plasma - Atomic Emission Spectrometer (ICP-AES, JY-50P, VG Elemental Co. France), and the following conclusions were drawn from the results. First, Cu, Sn, Ag, Hg and Zn were highly advantageous when amalgam contacted gold alloy compared to Ni-Cr alloy for crown and Talladium alloy for denture. In addition, although gold alloy was best in terms of oral tissue and biocompatibility, it was most disadvantageous when used with amalgam. Second, when amalgam contacted gold alloy, heavy metals such as Ni and Cr were not isolated at all because gold alloy does not contain such elements, but Sn was isolated in an amount of $227.1 \pm 18.0035\ \mu g/cm^2$ although it is not included in the composition either. Hg was also isolated. These elements are assumed to have been isolated from the amalgam itself. Third, when amalgam alloy was 0 mm, 7 mm and 40 mm apart from gold alloy, Cu and Ag showed significance but Hg did not. This suggests that gold alloy must not be used together with amalgam, and must not be used between dissimilar prostheses regardless of distance. Fourth, when amalgam alloy contacted Ni-Cr alloy for crown, Ag was not isolated from the amalgam, but Zn, Ni, Sn, Hg and Cu were isolated in order of quantity. Significance was observed according to distance - 0 mm, 7 mm and 40 mm. Hg was not isolated but heavy metals Ni and Cr were isolated.
If amalgam alloy was in the opposite arch or was apart from Ni-Cr alloy for crown, the isolation of Hg was less than when amalgam alloy contacted Ni-Cr alloy for crown. Fifth, when amalgam alloy contacted Talladium alloy for denture, significance was observed at distances of 0 mm, 7 mm and 40 mm. Hg was not isolated but heavy metals Ni and Cr were isolated. If amalgam alloy was in the opposite arch or was apart from Talladium alloy for denture, the isolation of Hg was less than when amalgam alloy contacted Talladium alloy for denture. Sixth, according to the result of the ICP-AES test on Cu, Sn, Ag, Hg, Zn, Ni and Cr of amalgam alloy, gold alloy, Verabond and Talladium alloy when these alloys contacted artificial saliva, significance was observed in Cu and Hg. Seventh, when amalgam alloy contacted the two non-precious metals, Ni-Cr alloy for crown and Talladium alloy for denture, in artificial saliva, significance was observed in the isolated by-products of Hg, Ni and Cr according to distance.


Development of the Accident Prediction Model for Enlisted Men through an Integrated Approach to Datamining and Textmining (데이터 마이닝과 텍스트 마이닝의 통합적 접근을 통한 병사 사고예측 모델 개발)

  • Yoon, Seungjin;Kim, Suhwan;Shin, Kyungshik
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.1-17 / 2015
  • In this paper, we report what we have observed with regard to a prediction model for the military based on enlisted men's internal data (cumulative records) and external data (SNS data). This work is significant for the military's efforts to supervise enlisted men. In spite of their efforts, many commanders have failed to prevent accidents by their subordinates. One of the important duties of officers is to take care of their subordinates in order to prevent unexpected accidents. However, accidents are hard to prevent, so we must attempt to determine a proper method. Our motivation for presenting this paper is to make it possible to predict accidents using enlisted men's internal and external data. The biggest issue facing the military is the occurrence of accidents by enlisted men related to maladjustment and the relaxation of military discipline. The core method of preventing accidents by soldiers is to identify problems and manage them quickly. Commanders predict accidents by interviewing their soldiers and observing their surroundings. This requires considerable time and effort, and the results differ significantly depending on the capabilities of the commanders. In this paper, we seek to predict accidents with objective data that can easily be obtained. Recently, records of enlisted men, as well as SNS communication between commanders and soldiers, have made it possible to predict and prevent accidents. This paper concerns the application of data mining to identify enlisted men's interests and predict accidents, making use of internal and external (SNS) data. We propose both a topic analysis and a decision tree method. The study is conducted in two steps. First, topic analysis is conducted on the SNS data of enlisted men. Second, the decision tree method is used to analyze the internal data together with the results of the first analysis. The dependent variable for these analyses is the presence of any accidents.
In order to analyze their SNS data, we require tools such as text mining and topic analysis. We used SAS Enterprise Miner 12.1, which provides a text miner module. Our approach for finding their interests is composed of three main phases: collecting, topic analysis, and converting topic analysis results into points for use as independent variables. In the first phase, we collect enlisted men's SNS data by commander's ID. After gathering unstructured SNS data, the topic analysis phase extracts issues from them. For simplicity, 5 topics (vacation, friends, stress, training, and sports) are extracted from 20,000 articles. In the third phase, using these 5 topics, we quantify them as personal points. After quantifying their topics, we include these results in the independent variables, which are composed of 15 internal data sets. Then, we build two decision trees. The first tree is composed of their internal data only. The second tree is composed of their external data (SNS) as well as their internal data. After that, we compare the misclassification rates reported by SAS E-miner. The first model's misclassification rate is 12.1%, whereas the second model's is 7.8%, so the second model predicts accidents with an accuracy of approximately 92%. The gap between the two models is 4.3 percentage points. Finally, we test whether the difference between them is meaningful using the McNemar test. The result of the test indicates that the difference is significant (p-value: 0.0003). This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of enlisted men's data. Additionally, various independent variables in the decision tree model are used as categorical variables instead of continuous variables, which causes a loss of information. In spite of extensive efforts to provide prediction models for the military, commanders' predictions are accurate only when they have sufficient data about their subordinates. 
Our proposed methodology can provide support to decision-making in the military. This study is expected to contribute to the prevention of accidents in the military based on scientific analysis of enlisted men and proper management of them.
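
The comparison step in the abstract above (two models scored on the same cases, then a McNemar test on their disagreements) can be sketched as follows. This is a minimal illustration and not the authors' SAS Enterprise Miner workflow: the function names are hypothetical, and the exact (binomial) form of the McNemar test is an assumption, since the abstract does not say which variant was used.

```python
from math import comb


def misclassification_rate(y_true, y_pred):
    """Share of cases where the prediction differs from the observed label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def discordant_counts(y_true, pred_a, pred_b):
    """Count the cases where exactly one of the two models is correct."""
    only_a = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    only_b = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    return only_a, only_b


def mcnemar_exact(only_a, only_b):
    """Two-sided exact (binomial) McNemar p-value from the discordant counts."""
    n = only_a + only_b
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence of a difference
    k = min(only_a, only_b)
    # Probability of a split at least this uneven under H0: each side has p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical labels (1 = accident occurred); not data from the study.
    y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
    tree_internal = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
    tree_with_sns = [1, 0, 0, 1, 0, 1, 0, 1, 1, 0]
    print(misclassification_rate(y_true, tree_internal))
    print(misclassification_rate(y_true, tree_with_sns))
    print(mcnemar_exact(*discordant_counts(y_true, tree_internal, tree_with_sns)))
```

Only the discordant cases enter the test, which is why two models with similar overall error rates can still differ significantly when they are evaluated on the same observations.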

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.29-45
    • /
    • 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings by applying statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rely on strict assumptions. Such assumptions include linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions of traditional statistics have limited their application to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and Support Vector Machine (SVM). In particular, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically, and leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, and thus overfitting is unlikely to occur with SVM. Furthermore, SVM does not require many data samples for training, since it builds prediction models using only the representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can be potential causes for degrading SVM's performance. 
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not perform as well in multi-class classification problems as SVM does in binary-class classification. Second, approximation algorithms (e.g., decomposition methods and the sequential minimal optimization algorithm) could be used to reduce computation time in multi-class problems, but they could deteriorate classification performance. Third, a difficulty in multi-class prediction problems is the data imbalance problem, which can occur when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often cause a default classifier to be built due to a skewed boundary, which reduces the classification accuracy of such a classifier. SVM ensemble learning is one of the machine learning methods for coping with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on the misclassified observations through iterations. The observations that are incorrectly predicted by previous classifiers are chosen more often than those that are correctly predicted. Thus, boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem. 
Since MGM-Boost introduces the notion of geometric mean into AdaBoost, its learning process can take into account the geometric mean-based accuracy and errors across the multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine the feasibility of MGM-Boost. Ten-fold cross-validation is performed three times with different random seeds in order to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we have obtained the results for the classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds is significantly different. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
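
The gap between the arithmetic and geometric figures reported above can be illustrated with a short sketch. It assumes that both measures are averages of per-class accuracy (recall), which is a common definition but is not spelled out in the abstract; the function names and the sample labels are hypothetical, and the snippet does not reproduce MGM-Boost itself.

```python
import math
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that is predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def arithmetic_mean_accuracy(recalls):
    """Plain average of the per-class recalls."""
    return sum(recalls.values()) / len(recalls)


def geometric_mean_accuracy(recalls):
    """K-th root of the product of the K per-class recalls."""
    return math.prod(recalls.values()) ** (1 / len(recalls))


if __name__ == "__main__":
    # Hypothetical imbalanced ratings; class "C" is rare and always missed.
    y_true = ["A"] * 6 + ["B"] * 3 + ["C"]
    y_pred = ["A"] * 6 + ["B", "B", "A"] + ["A"]
    recalls = per_class_recall(y_true, y_pred)
    print(recalls)
    print(arithmetic_mean_accuracy(recalls))
    print(geometric_mean_accuracy(recalls))
```

Because the geometric mean is a product, a single poorly predicted class pulls it down sharply, so it penalizes classifiers that ignore minority classes far more than the arithmetic mean does.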

An Empirical Study on Motivation Factors and Reward Structure for User's Creative Contents Generation: Focusing on the Mediating Effect of Commitment (창의적인 UCC 제작에 영향을 미치는 동기 및 보상 체계에 대한 연구: 몰입에 매개 효과를 중심으로)

  • Kim, Jin-Woo;Yang, Seung-Hwa;Lim, Seong-Taek;Lee, In-Seong
    • Asia pacific journal of information systems
    • /
    • v.20 no.1
    • /
    • pp.141-170
    • /
    • 2010
  • User created content (UCC) is created and shared by common users online. From the user's perspective, the increase of UCCs has led to an expansion of alternative means of communication, while from the business perspective UCCs have formed an environment in which an abundant amount of new content can be produced. Despite outward quantitative growth, however, many aspects of UCCs do not meet the expectations of general users in terms of quality, and this can be observed through pirated contents and user-copied contents. The purpose of this research is to investigate effective methods for fostering production of creative user-generated content. This study proposes two core elements, namely reward and motivation, which are believed to enhance content creativity, as well as a mediating factor, users' commitment, which is expected to bridge motivation and content creativity. Based on this perspective, this research takes an in-depth look at issues related to constructing the dimensions of reward and motivation in UCC services for creative content production, which are identified in three phases. First, three dimensions of rewards have been proposed: task dimension, social dimension, and organizational dimension. The task dimension rewards are related to the inherent characteristics of a task such as writing blog articles and posting photos. Four concrete ways of providing task-related rewards in UCC environments are suggested in this study: skill variety, task significance, task identity, and autonomy. The social dimension rewards are related to the connected relationships among users. The organizational dimension consists of monetary payoff and recognition from others. Second, two types of motivation are suggested to be affected by the diverse reward schemes: intrinsic motivation and extrinsic motivation. 
Intrinsic motivation occurs when people create new UCC contents for its own sake, whereas extrinsic motivation occurs when people create new contents for other purposes such as fame and money. Third, commitments are suggested to work as important mediating variables between motivation and content creativity. We believe commitments are especially important in online environments because they have been found to exert stronger impacts on Internet users than other relevant factors do. Two types of commitments are suggested in this study: emotional commitment and continuity commitment. Finally, content creativity is proposed as the final dependent variable in this study. We provide a systematic method to measure the creativity of UCC content based on prior studies in creativity measurement. The method includes expert evaluation of blog pages posted by Internet users. In order to test the theoretical model of our study, 133 active blog users were recruited to participate in a group discussion as well as a survey. They were asked to fill out a questionnaire on their commitment, motivation, and rewards for creating UCC contents. At the same time, their creativity was measured by independent experts using the Torrance Tests of Creative Thinking. Finally, two independent users visited the study participants' blog pages and evaluated their content creativity using the Creative Products Semantic Scale. All the data were compiled and analyzed through structural equation modeling. We first conducted a confirmatory factor analysis to validate the measurement model of our research. It was found that the measures used in our study satisfied the requirements of reliability, convergent validity, and discriminant validity. Given that our measurement model is valid and reliable, we proceeded to conduct a structural model analysis. The results indicated that all the variables in our model had higher than necessary explanatory power in terms of R-square values. 
The study results identified several important reward schemes. First of all, skill variety, task significance, task identity, and autonomy were all found to have significant influences on the intrinsic motivation for creating UCC contents. Also, the relationship with other users was found to have strong influences upon both intrinsic and extrinsic motivation. Finally, the opportunity to get recognition for their UCC work was found to have a significant impact on the extrinsic motivation of UCC users. However, contrary to our expectation, monetary compensation was found not to have a significant impact on extrinsic motivation. It was also found that commitment was an important mediating factor between motivation and content creativity in the UCC environment. A fully mediated model was found to have the highest explanatory power compared to the no-mediation and partially mediated models. This paper ends with implications of the study results. First, from the theoretical perspective, this study proposes and empirically validates commitment as an important mediating factor between motivation and content creativity. This result reflects the characteristics of the online environment, in which UCC creation activities occur voluntarily. Second, from the practical perspective, this study proposes several concrete reward factors that are germane to the UCC environment, and estimates their effectiveness for content creativity. In addition to the quantitative results on the relative importance of the reward factors, this study also proposes concrete ways to provide the rewards in the UCC environment based on the focus group interview (FGI) data collected after our participants finished answering the survey questions. Finally, from the methodological perspective, this study suggests and implements a way to measure UCC content creativity independently of the content generators' creativity, which can be used by future research on UCC creativity. 
In sum, this study proposes and validates important reward features and their relations to the motivation, commitment, and the content creativity in UCC environment, which is believed to be one of the most important factors for the success of UCC and Web 2.0. As such, this study can provide significant theoretical as well as practical bases for fostering creativity in UCC contents.