• Title/Summary/Keyword: complex training

A Study on Knowledge Entity Extraction Method for Individual Stocks Based on Neural Tensor Network (뉴럴 텐서 네트워크 기반 주식 개별종목 지식개체명 추출 방법에 관한 연구)

  • Yang, Yunseok;Lee, Hyun Jun;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.25-38
    • /
    • 2019
  • Selecting high-quality information that meets users' interests and needs from overflowing content is becoming ever more important. In this flood of information, efforts are being made to better reflect the user's intention in search results rather than treating an information request as a simple string. Large IT companies such as Google and Microsoft also focus on developing knowledge-based technologies, including search engines, that provide users with satisfaction and convenience. Finance in particular is a field where text data analysis is expected to be useful and promising, because it constantly generates new information, and the earlier the information is, the more valuable it is. Automatic knowledge extraction can be effective in areas such as the financial sector, where the information flow is vast and new information continues to emerge. However, automatic knowledge extraction faces several practical difficulties. First, it is difficult to build corpora from different fields with the same algorithm and to extract good-quality triples. Second, producing labeled text data by hand becomes more difficult as the extent and scope of knowledge increase and patterns are constantly updated. Third, performance evaluation is difficult due to the characteristics of unsupervised learning. Finally, defining the problem for automatic knowledge extraction is not easy because of the ambiguous conceptual characteristics of knowledge. To overcome the limits described above and improve the semantic performance of stock-related information search, this study attempts to extract knowledge entities using a neural tensor network and to evaluate their performance. Unlike other studies, the purpose of this study is to extract knowledge entities related to individual stock items. 
Various but relatively simple data processing methods are applied in the presented model to solve the problems of previous research and to enhance the effectiveness of the model. From these processes, this study has the following three points of significance. First, it presents a practical and simple automatic knowledge extraction method that can be applied. Second, the possibility of performance evaluation is presented through a simple problem definition. Finally, the expressiveness of the knowledge is increased by generating input data on a sentence basis without complex morphological analysis. The results of the empirical analysis and an objective performance evaluation method are also presented. For the empirical study to confirm the usefulness of the presented model, experts' reports about 30 individual stocks, the top 30 items by publication frequency from May 30, 2017 to May 21, 2018, are used. The total number of reports is 5,600; 3,074 reports, about 55% of the total, are designated as the training set, and the remaining 45% are designated as the testing set. Before constructing the model, all reports in the training set are classified by stock, and their entities are extracted using the KKMA named entity recognition tool. For each stock, the top 100 entities by appearance frequency are selected and vectorized using one-hot encoding. After that, using the neural tensor network, the same number of score functions as stocks are trained. Thus, when a new entity from the testing set appears, its score is calculated with every score function, and the stock whose function gives the highest score is predicted as the item related to the entity. To evaluate the presented model, we confirm its predictive power and determine whether the score functions are well constructed by calculating the hit ratio over all reports in the testing set. 
As a result of the empirical study, the presented model shows 69.3% hit accuracy on the testing set, which consists of 2,526 reports. This hit ratio is meaningfully high despite some constraints in conducting the research. Looking at the prediction performance of the model for each stock, only 3 stocks, LG ELECTRONICS, KiaMtr, and Mando, show far lower performance than average. This result may be due to interference from other similar items and the generation of new knowledge. In this paper, we propose a methodology to find the key entities or their combinations that are necessary to search for related information in accordance with the user's investment intention. Graph data is generated using only the named entity recognition tool and applied to the neural tensor network without learning a corpus or word vectors for the field. From the empirical test, we confirm the effectiveness of the presented model as described above. However, some limitations and points to complement remain. Most notably, the fact that model performance is especially poor for only some stocks shows the need for further research. Finally, through the empirical study, we confirmed that the learning method presented in this study can be used to semantically match new text information with the related stocks.
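
The prediction and evaluation steps described in this abstract (one score function per stock, predict the stock whose function scores highest, then compute the hit ratio) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the score functions are hypothetical keyword-weight stand-ins for trained neural tensor network score functions, the stock names and entities are invented, and the hit ratio is computed over (entity, stock) pairs rather than over reports.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a trained score function: entity -> score.
ScoreFn = Callable[[str], float]


def make_keyword_scorer(weights: Dict[str, float]) -> ScoreFn:
    """Build a toy score function from entity weights.

    Assumption: in the study, this role is played by a score function
    trained with a neural tensor network, one per stock.
    """
    return lambda entity: weights.get(entity, 0.0)


def predict_stock(entity: str, score_fns: Dict[str, ScoreFn]) -> str:
    """Score the entity with every stock's function; return the top stock."""
    return max(score_fns, key=lambda stock: score_fns[stock](entity))


def hit_ratio(samples: List[Tuple[str, str]], score_fns: Dict[str, ScoreFn]) -> float:
    """Fraction of (entity, true_stock) pairs predicted correctly."""
    hits = sum(predict_stock(entity, score_fns) == stock for entity, stock in samples)
    return hits / len(samples)


# Invented example data (not from the paper).
score_fns = {
    "StockA": make_keyword_scorer({"battery": 0.9, "display": 0.2}),
    "StockB": make_keyword_scorer({"display": 0.8, "battery": 0.1}),
}
test_samples = [("battery", "StockA"), ("display", "StockB"), ("display", "StockA")]
ratio = hit_ratio(test_samples, score_fns)  # two of the three pairs are hits
```

Keeping one independent score function per stock means a new stock can be added by training one more function, at the cost of scoring every entity against all functions at prediction time.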

A Study on the Realities and the Subject of Environmental Management for Small and Medium-Sized Companies in Gangwon Area (강원지역 중소기업의 환경경영 실태와 과제)

  • Jeon, Yeong-Seung;Park, Eun-Jeong
    • Korean Business Review
    • /
    • v.17
    • /
    • pp.53-81
    • /
    • 2004
  • The purpose of this study is to understand the realities and challenges of environmental management for small and medium-sized companies in the Gangwon area by surveying the present status of ISO14001 certification, and to seek a plan to facilitate environmental management. The key results are summarized as follows. First, while the number of companies in Korea that had acquired ISO14001 certification amounted to 1,215 as of April 2003, only 26 small and medium-sized companies in the Gangwon area had obtained the certification, the lowest level among metropolitan municipalities. Second, the reason companies without the certification gave for not pursuing it was that 'the costs needed to acquire and maintain the certification are larger than the practical benefit.' Third, for both companies that had not acquired ISO14001 certification and companies that had acquired (or were trying to acquire) it, the biggest reason for certification was 'enhancement of a corporate image,' and the effect reported by certified companies after introducing the environmental management system was likewise 'the improvement of a corporate image.' Fourth, many companies that had acquired ISO14001 certification pointed to 'burden of document creation and costs' and 'lack of manpower' as problems when introducing the environmental management system. On the basis of these major results, the tasks and a plan for activating environmental management in small and medium-sized companies in the Gangwon area are presented as follows. First, because most companies that have not obtained ISO14001 certification have low awareness of ISO14001, continuous and active publicity, education, and a training system are needed. 
Second, because manpower relevant to environmental management is lacking, it is necessary to carry out educational programs to nurture professional manpower, expand the payment of subsidies, open a dedicated department and consulting contact point, build a database of the relevant information, and develop software. Third, so that the certification can be obtained at low cost and through simple procedures, the creation of a public approval system for small and medium-sized companies, a group approval system, an industrial-complex approval system, and others should be actively considered.

Comparison of Association Rule Learning and Subgroup Discovery for Mining Traffic Accident Data (교통사고 데이터의 마이닝을 위한 연관규칙 학습기법과 서브그룹 발견기법의 비교)

  • Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.4
    • /
    • pp.1-16
    • /
    • 2015
  • Traffic accidents have been one of the major causes of death worldwide for the last several decades. According to World Health Organization statistics, approximately 1.24 million deaths occurred on the world's roads in 2010. To reduce future traffic accidents, multipronged approaches have been adopted, including traffic regulations, injury-reducing technologies, and driver training programs. Records on traffic accidents are generated and maintained for this purpose. To make these records meaningful and effective, it is necessary to analyze the relationship between traffic accidents and related factors, including vehicle design, road design, weather, and driver behavior. Insight derived from such analyses can be used in accident prevention. Traffic accident data mining is the activity of finding useful knowledge about such relationships that is not well known and that users may be interested in. Many studies on mining accident data have been reported over the past two decades. Most of them focused on predicting the risk of accidents from accident-related factors. Supervised learning methods such as decision trees, logistic regression, k-nearest neighbors, and neural networks are used for this prediction. However, the prediction models derived from these algorithms are too complex for humans to understand, because the main purpose of these algorithms is prediction, not explanation of the data. Some studies use unsupervised clustering algorithms to divide the data into several groups, but the derived groups are still not easy for humans to understand, so additional analysis is necessary. Rule-based learning methods are adequate when we want to derive a comprehensible form of knowledge about the target domain. They derive a set of if-then rules that represent the relationship between the target feature and other features. 
Rules are fairly easy for humans to understand, so they can help provide insight and comprehensible results. Association rule learning methods and subgroup discovery methods are representative rule-based learning methods for descriptive tasks. These two kinds of algorithms have been used in a wide range of areas, including transaction analysis, accident data analysis, detection of statistically significant patient risk groups, and discovery of key persons in social communities. We use both the association rule learning method and the subgroup discovery method to discover useful patterns from a traffic accident dataset consisting of many features, including the driver's profile, the location of the accident, the type of accident, vehicle information, and violations of regulations. The association rule learning method, which is an unsupervised learning method, searches for frequent item sets in the data and translates them into rules. In contrast, the subgroup discovery method is a kind of supervised learning method that discovers rules for user-specified concepts satisfying a certain degree of generality and unusualness. Depending on which aspect of the data we focus on, we may combine multiple relevant features of interest into a synthetic target feature and give it to the rule learning algorithms. After a set of rules is derived, postprocessing steps are taken to make the rule set more compact and easier to understand by removing uninteresting or redundant rules. We conducted a set of experiments mining our traffic accident data in both unsupervised and supervised modes to compare these rule-based learning algorithms. Experiments with the traffic accident data reveal that association rule learning, in its pure unsupervised mode, can discover some hidden relationships among the features. 
In a supervised learning setting with a combinatorial target feature, however, the subgroup discovery method finds good rules much more easily than the association rule learning method, which requires considerable effort to tune its parameters.
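
To make the contrast between the two rule types concrete, the sketch below computes support and confidence for an association rule and weighted relative accuracy (WRAcc) for a subgroup. It is a minimal illustration under assumptions: the abstract does not say which quality measures the authors used, WRAcc is chosen here only because it is a common way to combine generality (coverage) with unusualness, and the accident records are invented.

```python
from typing import FrozenSet, List

Record = FrozenSet[str]


def support(records: List[Record], items: Record) -> float:
    """Fraction of records that contain every item in `items`."""
    return sum(items <= record for record in records) / len(records)


def confidence(records: List[Record], antecedent: Record, consequent: Record) -> float:
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(records, antecedent | consequent) / support(records, antecedent)


def wracc(records: List[Record], condition: Record, target: Record) -> float:
    """Weighted relative accuracy of the subgroup defined by `condition`.

    coverage * (target rate inside the subgroup - overall target rate):
    large only when the subgroup is both general and unusual.
    """
    coverage = support(records, condition)
    if coverage == 0:
        return 0.0
    return coverage * (confidence(records, condition, target) - support(records, target))


# Invented toy accident records (not from the paper's dataset).
records = [
    frozenset({"rain", "night", "severe"}),
    frozenset({"rain", "night", "severe"}),
    frozenset({"rain", "day", "minor"}),
    frozenset({"clear", "day", "minor"}),
    frozenset({"clear", "night", "minor"}),
]
condition = frozenset({"rain", "night"})
target = frozenset({"severe"})

rule_support = support(records, condition | target)
rule_confidence = confidence(records, condition, target)
subgroup_quality = wracc(records, condition, target)
```

In association rule learning, any item may appear on either side of a rule, so thresholds on support and confidence must be tuned to keep the rule set manageable; in subgroup discovery the target is fixed in advance and a single quality score ranks candidate conditions.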

A Study on Integrated Logistic Support (통합병참지원에 관한 연구)

  • 나명환;김종걸;이낙영;권영일;홍연웅;전영록
    • Proceedings of the Korean Reliability Society Conference
    • /
    • 2001.06a
    • /
    • pp.277-278
    • /
    • 2001
  • The successful operation of a product in service depends upon the effective provision of logistic support in order to achieve and maintain the required levels of performance and customer satisfaction. Logistic support encompasses the activities and facilities required to maintain a product (hardware and software) in service. Logistic support covers maintenance, manpower and personnel, training, spares, technical documentation, packaging, handling, storage and transportation, and support facilities. The cost of logistic support is often a major contributor to the Life Cycle Cost (LCC) of a product, and increasingly customers are making purchase decisions based on life cycle cost rather than initial purchase price alone. Logistic support considerations can therefore have a major impact on product sales by ensuring that the product can be easily maintained at a reasonable cost and that all the necessary facilities have been provided to fully support the product in the field so that it meets the required availability. Quantification of support costs allows the manufacturer to estimate the support cost elements and evaluate possible warranty costs. This reduces risk and allows support costs to be set at competitive rates. Integrated Logistic Support (ILS) is a management method by which all the logistic support services required by a customer can be brought together in a structured way and in harmony with a product. In essence, the application of ILS: causes logistic support considerations to be integrated into product design; develops logistic support arrangements that are consistently related to the design and to each other; and provides the necessary logistic support at the beginning and during customer use at optimum cost. The method by which ILS achieves much of the above is through the application of Logistic Support Analysis (LSA). 
This is a series of support analysis tasks that are performed throughout the design process in order to ensure that the product can be supported efficiently in accordance with the requirements of the customer. The successful application of ILS will result in a number of customer and supplier benefits. These should include some or all of the following: greater product uptime; fewer product modifications due to supportability deficiencies and hence less supplier rework; better adherence to production schedules in process plants through reduced maintenance and better support; lower supplier product costs; lower customer support costs; better visibility of support costs; reduced product LCC; a better and more saleable product; improved safety; increased overall customer satisfaction; increased product purchases; and potential for purchase or upgrade of the product sooner through customer savings on support of the current product. ILS should be an integral part of the total management process, with an ongoing improvement activity that uses monitoring of achieved performance to tailor existing support and influence future design activities. For many years, ILS was predominantly applied to military procurement, primarily using standards generated by the US Government Department of Defense (DoD). The military standards refer to specialized government infrastructures and are too complex for commercial application. The methods and benefits of ILS, however, have potential for much wider application in commercial and civilian use. The concept of ILS is simple and depends on a structured procedure that assures that logistic aspects are fully considered throughout the design and development phases of a product, in close cooperation with the designers. 
The ability to effectively support the product is given equal weight to performance and is fully considered in relation to its cost. The application of ILS provides improvements in availability, maintenance support, and long-term logistic cost savings. Logistic costs are significant through the life of a system and can often amount to many times the initial purchase cost of the system. This study provides guidance on the minimum activities necessary to implement effective ILS for a wide range of commercial suppliers. The guide supplements IEC 60706-4, Guide on maintainability of equipment, Part 4: Section Eight, Maintenance and maintenance support planning, which emphasizes the maintenance aspects of the support requirements and refers to other existing standards where appropriate. The use of Reliability and Maintainability (R&M) studies is also mentioned in this study, as R&M is an important interface area to ILS.

NUI/NUX of the Virtual Monitor Concept using the Concentration Indicator and the User's Physical Features (사용자의 신체적 특징과 뇌파 집중 지수를 이용한 가상 모니터 개념의 NUI/NUX)

  • Jeon, Chang-hyun;Ahn, So-young;Shin, Dong-il;Shin, Dong-kyoo
    • Journal of Internet Computing and Services
    • /
    • v.16 no.6
    • /
    • pp.11-21
    • /
    • 2015
  • With growing interest in Human-Computer Interaction (HCI), research on HCI has been actively conducted. Along with it, research on Natural User Interface/Natural User eXperience (NUI/NUX), which uses the user's gestures and voice, has also been actively conducted. NUI/NUX requires recognition algorithms such as gesture recognition or voice recognition. However, these recognition algorithms have the weakness that their implementation is complex and training takes a lot of time, because they have to go through steps including preprocessing, normalization, and feature extraction. Recently, Microsoft launched Kinect as an NUI/NUX development tool, which has attracted attention, and studies using Kinect have been conducted. In a previous study, the authors of this paper implemented a hand-mouse interface with outstanding intuitiveness using the physical features of the user. However, it had weaknesses such as unnatural mouse movement and low accuracy of mouse functions. In this study, we designed and implemented a hand-mouse interface that introduces a new concept called the 'virtual monitor', extracting the user's physical features through Kinect in real time. The virtual monitor is a virtual space that can be controlled by the hand mouse. A coordinate on the virtual monitor can be accurately mapped onto a coordinate on the real monitor. The hand-mouse interface based on the virtual monitor concept maintains the outstanding intuitiveness that was the strength of the previous study and enhances the accuracy of mouse functions. Further, we increased the accuracy of the interface by recognizing the user's unnecessary actions using the concentration indicator from the user's electroencephalogram (EEG) data. To evaluate the intuitiveness and accuracy of the interface, we tested it with 50 people aged from their teens to their 50s. In the intuitiveness experiment, 84% of subjects learned how to use it within 1 minute. 
In the accuracy experiment, the accuracy of the mouse functions was 80.4% for drag, 80% for click, and 76.7% for double-click. Since the intuitiveness and accuracy of the proposed hand-mouse interface were confirmed through experiments, it is expected to be a good example of an interface for controlling a system by hand in the future.
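
The mapping from a virtual-monitor coordinate to a real-monitor coordinate mentioned in this abstract could look roughly like the sketch below. It is an assumption-laden illustration, not the authors' method: the `VirtualMonitor` rectangle, its units, and the `to_screen` helper are hypothetical names, the rectangle is taken as given (the paper derives it from the user's physical features via Kinect), and a plain linear mapping with clamping is assumed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VirtualMonitor:
    """Hypothetical rectangle in sensor space that stands in for the screen."""
    left: float    # x of the left edge, in sensor units
    top: float     # y of the top edge, in sensor units
    width: float   # horizontal extent, must be > 0
    height: float  # vertical extent, must be > 0


def to_screen(vm: VirtualMonitor, hand_x: float, hand_y: float,
              screen_w: int, screen_h: int) -> Tuple[int, int]:
    """Linearly map a hand position inside `vm` to a pixel on the real screen.

    Positions outside the virtual monitor are clamped to its border so the
    cursor never leaves the screen.
    """
    u = min(max((hand_x - vm.left) / vm.width, 0.0), 1.0)
    v = min(max((hand_y - vm.top) / vm.height, 0.0), 1.0)
    return round(u * (screen_w - 1)), round(v * (screen_h - 1))


# Invented example: a 0.5 x 0.25 virtual monitor mapped onto a 1920x1080 screen.
vm = VirtualMonitor(left=0.0, top=0.0, width=0.5, height=0.25)
top_left = to_screen(vm, 0.0, 0.0, 1920, 1080)
bottom_right = to_screen(vm, 0.5, 0.25, 1920, 1080)
```

Sizing the rectangle per user (for example from arm length) is what would let the same hand movement cover the whole screen for users of different builds; the mapping itself stays the same.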

Analysis on On-line Q&A Cases regarding Landscape Trees Management - Focused on Online Consultation Board at Tree Diagnostic Center - (조경수 관리에 관한 온라인 질의응답 사례 분석 - 수목진단센터 온라인 상담 사례를 대상으로 -)

  • Lim, Byoung-Eul;Lee, Sae-Hee
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.41 no.1
    • /
    • pp.44-50
    • /
    • 2013
  • People in charge of landscape tree management request diagnosis and prescription from tree hospitals in order to get consultation about problems such as blight. This study aims to analyze the main problems and questions raised by landscape gardeners and others involved in landscape tree management. This is done by investigating landscape tree-related questions and answers posted on the online consultation boards of plant diagnostic centers approved in Korea, including the Seoul National University Plant Clinic, the Chungbuk National University Plant Hospital, and the Kangwon Diagnostic Center. As a result, people working in the landscape field made up the largest share of questioners, at 81.4%. However, only 11.5% explained the plant management history or surrounding environment when asking questions, which is essential for landscape tree diagnosis. This shows that those working in the landscape field lack basic knowledge of or interest in plant diagnosis. Among the 263 questions about landscape trees, questions about physiological damage were the most common, with 94 cases (35.8%), followed by insect damage and disease damage, in that order. Because physiological problems arise from various kinds of stress and show no signs, they seem to prompt the most requests for diagnosis or prescription. The most frequent causes of physiological damage are water stress and temperature stress. As for disease damage, there are many types of diseases, and many cases involve complex damage accompanied by physiological causes. As for insect damage, the most common is damage by moths. In consideration of these results, universities or technician training centers should provide education on landscape tree management so that landscape technicians and students can acquire essential knowledge and information about it and increase their interest in it. 
In particular, it is necessary to provide in-depth learning opportunities in plant physiology, and the technicians should also make efforts themselves. In addition, organizations are needed to which they can direct technical questions about landscape planting and management, in order to understand the landscape industry in general and the actual status of landscape planting techniques in the field. Moreover, to raise the level of systematization and expertise in landscape tree management, an area that does not yet have a solid foundation, it is necessary to train technicians intensively and for those concerned in both academic and industrial circles to conduct research.

The Stakeholder's Response and Future of Mountain Community Development Program in Rep. of Korea (한국 산촌개발사업에 대한 이해관계자의 의식과 향후 발전방안)

  • Yoo, Byoung Il;Kim, So Heui;Seo, Jeong-Weon
    • Journal of Korean Society of Forest Science
    • /
    • v.94 no.4 s.161
    • /
    • pp.214-225
    • /
    • 2005
  • The mountain village development program in Korea started in 1995 in mountain villages, which cover 45.9% of the total land and are typical marginal regions, to achieve balanced development of the national land and the sustainable mountain development set out in Chapter 13 of Agenda 21. It has since been accelerated, through expansion by province and improvement of related laws and regulations, to increase the happiness and quality of life of mountain community residents. This study aims to analyze the responses of the main stakeholders, mountain village residents and local government officials, to mountain village development, and to propose a future plan for it as community development. The survey and interview data were collected from 59 mountain villages that had already been developed and 15 villages under development in 2003. The mountain village development program has achieved positive results as a community development plan in several fields: the voluntary participation of residents, the establishment of a spirit of self-support as democratic citizens, the development of a base for income growth, the creation of a comfortable living environment, and balanced development with other regions. In particular, both mountain residents and local government officials are highly satisfied with the development of a base for income growth and the creation of a comfortable living environment, which are the main concerns of both stakeholders. However, the program has not been satisfactory in maintaining the local community and strengthening the traditional values of mountain villages. Also, to sustain the income improvement effects, it is necessary to develop income items and technical extension suited to each region. In the era of decentralization, local governments need to take more active and multilateral action on these matters. 
In addition, introducing methods by which mountain community residents and local government officials can jointly participate in mountain village development from the initial stages, and reforming the related local government organizations and cooperatives, will greatly help make the mountain development program more substantial. Assistance from the central government is also essential for establishing a comprehensive plan and a mountain village network covering all mountain areas, for exchanging information, for educating and training mountain village leaders, who are the core factor in maintaining developed mountain villages, and for forming a national body of mountain village representatives. If development proposals based on the interests of the main stakeholders in mountain communities are positively accepted, the prospects for mountain village development as a form of community development will improve in the future.

The Characteristics and Landscape Meanings of Letters Carved on the Rocks of Mt. Sangdu (상두산(象頭山) 바위글씨의 특징과 경관의미)

  • Rho, Jae-Hyun;Lee, Jung-Han;Huh, Joon;Kim, Jeong-Moon
    • Journal of the Korean Institute of Traditional Landscape Architecture
    • /
    • v.30 no.2
    • /
    • pp.1-13
    • /
    • 2012
  • This study aimed to identify the values and meanings of the letters carved on the rocks all over Mt. Sangdu, located at the boundary between Kimje-si and Jeongeup-si in Jeollabuk-do, by grasping their current state, investigating their patterns and contents, and understanding the spatial and landscape properties of the region where the rocks are scattered. The results of this study are as follows. The name of Mt. Sangdu came from the mountain of the same name in India where Buddha was seeking the truth, and it means auspicious. Along with its recognition in ancient maps and books, various propitious spots also solidified the landscape symbols of Mt. Sangdu. Whoam, Chaangsuk-Kim, Weolgye Young-Cho Song and the members of Cheonggye Society like Dongcho Seok-Gon Kim led the creation of the carvings. In total, 41 letter-carved rocks were found across four water systems, and all of them were carved with Chinese characters. The letters were usually carved on flat and broad rocks, which mainly took the form of small or wide waterfalls under 1 meter in height. Of the carved letters, 25 (60.9%) were about moral training, and it seems that the carvers wanted to protect their pride under the shackles of the Japanese colonization of Korea. The styles of handwriting are Hangseo and Jeonseo except for names, and they show various and complex styles. The mixed composition of the carved letters 'Yusubulbu(流水不腐)' in Choseo and the rocks of Takjok(濯足) is extraordinary, and the letters carved in the shape of Nakkwan(落款) have artistic value and a high degree of finish. It seems that intellectuals during the Japanese colonization of Korea in the 1930s considered Mt. Sangdu a highly valuable region, because they expressed their hopes and wishes for a new world on the rocks. The letters on the rocks of Mt. Sangdu are invaluable cultural landscape elements for the improvement of the landscape symbolism of Mt. 
Sangdu because of colliding values and spirits of the time of 'the anguish and pain of intellectuals' and 'the status of living joyfully outside of the mundane world.'

A Study on Users' Resistance toward ERP in the Pre-adoption Context (ERP 도입 전 구성원의 저항)

  • Park, Jae-Sung;Cho, Yong-Soo;Koh, Joon
    • Asia pacific journal of information systems
    • /
    • v.19 no.4
    • /
    • pp.77-100
    • /
    • 2009
  • Information Systems (IS) is an essential tool for any organizations. The last decade has seen an increasing body of knowledge on IS usage. Yet, IS often fails because of its misuse or non-use. In general, decisions regarding the selection of a system, which involve the evaluation of many IS vendors and an enormous initial investment, are made not through the consensus of employees but through the top-down decision making by top managers. In situations where the selected system does not satisfy the needs of the employees, the forced use of the selected IS will only result in their resistance to it. Many organizations have been either integrating dispersed legacy systems such as archipelago or adopting a new ERP (Enterprise Resource Planning) system to enhance employee efficiency. This study examines user resistance prior to the adoption of the selected IS or ERP system. As such, this study identifies the importance of managing organizational resistance that may appear in the pre-adoption context of an integrated IS or ERP system, explores key factors influencing user resistance, and investigates how prior experience with other integrated IS or ERP systems may change the relationship between the affecting factors and user resistance. This study focuses on organizational members' resistance and the affecting factors in the pre-adoption context of an integrated IS or ERP system rather than in the context of an ERP adoption itself or ERP post-adoption. Based on prior literature, this study proposes a research model that considers six key variables, including perceived benefit, system complexity, fitness with existing tasks, attitude toward change, the psychological reactance trait, and perceived IT competence. They are considered as independent variables affecting user resistance toward an integrated IS or ERP system. 
This study also introduces the concept of prior experience (i.e., whether a user has prior experience with an integrated IS or ERP system) as a moderating variable to examine the impact of perceived benefit and attitude toward change in user resistance. As such, we propose eight hypotheses with respect to the model. For the empirical validation of the hypotheses, we developed relevant instruments for each research variable based on prior literature and surveyed 95 professional researchers and the administrative staff of the Korea Photonics Technology Institute (KOPTI). We examined the organizational characteristics of KOPTI, the reasons behind their adoption of an ERP system, process changes caused by the introduction of the system, and employees' resistance/attitude toward the system at the time of the introduction. The results of the multiple regression analysis suggest that, among the six variables, perceived benefit, complexity, attitude toward change, and the psychological reactance trait significantly influence user resistance. These results further suggest that top management should manage the psychological states of their employees in order to minimize their resistance to the forced IS, even in the new system pre-adoption context. In addition, the moderating variable-prior experience was found to change the strength of the relationship between attitude toward change and system resistance. That is, the effect of attitude toward change in user resistance was significantly stronger in those with prior experience than those with no prior experience. This result implies that those with prior experience should be identified and provided with some type of attitude training or change management programs to minimize their resistance to the adoption of a system. This study contributes to the IS field by providing practical implications for IS practitioners. 
This study identifies the stimuli of users' system resistance, focusing on the pre-adoption context in a forced ERP system environment. We have empirically validated the proposed research model by examining several significant factors affecting user resistance against the adoption of an ERP system. In particular, we find a clear and significant role of the moderating variable, prior ERP usage experience, in the relationship between the affecting factors and user resistance. The results of the study suggest the importance of appropriately managing the factors that affect user resistance in organizations that plan to introduce a new ERP system or integrate legacy systems. Moreover, this study offers practitioners several specific strategies (in particular, the categorization of users by their prior usage experience) for alleviating the resistant behaviors of users during ERP adoption, before a system becomes available to them. Despite the valuable contributions of this study, there are also some limitations, which are discussed in the paper to make the study more complete and consistent.
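
The moderation effect described above can be illustrated with a regression that includes an interaction term. The following is only a minimal sketch on synthetic data: the coefficient values, variable names, and the `fit_moderated_regression` helper are assumptions made for illustration, not the study's instruments, data, or results. Only the sample size of 95 is taken from the abstract.

```python
import numpy as np

def fit_moderated_regression(attitude, experience, resistance):
    """Ordinary least squares for
    resistance ~ attitude + experience + attitude * experience.
    (Hypothetical helper; not taken from the paper.)"""
    X = np.column_stack([
        np.ones_like(attitude),   # intercept
        attitude,                 # main effect of attitude toward change
        experience,               # main effect of prior experience (0 or 1)
        attitude * experience,    # interaction term that captures moderation
    ])
    coef, *_ = np.linalg.lstsq(X, resistance, rcond=None)
    names = ["intercept", "attitude", "experience", "interaction"]
    return dict(zip(names, coef))

# Synthetic data only: the "true" coefficients below are invented.
rng = np.random.default_rng(0)
n = 95
attitude = rng.normal(size=n)
experience = rng.integers(0, 2, size=n).astype(float)
resistance = (3.0 - 0.2 * attitude + 0.1 * experience
              - 0.5 * attitude * experience
              + rng.normal(scale=0.1, size=n))

coefs = fit_moderated_regression(attitude, experience, resistance)

# Slope of attitude on resistance within each experience group.
slope_without_experience = coefs["attitude"]
slope_with_experience = coefs["attitude"] + coefs["interaction"]
```

A non-zero interaction coefficient means the slope of attitude toward change differs between the two groups, which is the pattern the abstract reports for users with and without prior experience.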

Knowledge Extraction Methodology and Framework from Wikipedia Articles for Construction of Knowledge-Base (지식베이스 구축을 위한 한국어 위키피디아의 학습 기반 지식추출 방법론 및 플랫폼 연구)

  • Kim, JaeHun;Lee, Myungjin
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.43-61
    • /
    • 2019
  • The development of artificial intelligence technologies has accelerated with the Fourth Industrial Revolution, and AI research has been actively conducted in a variety of fields such as autonomous vehicles, natural language processing, and robotics. Since the 1950s, this research has focused on solving cognitive problems related to human intelligence, such as learning and problem solving. The field of artificial intelligence has achieved more technological advances than ever, due to recent interest in the technology and research on various algorithms. The knowledge-based system is a sub-domain of artificial intelligence, and it aims to enable artificial intelligence agents to make decisions by using machine-readable and processible knowledge constructed from complex and informal human knowledge and rules in various fields. A knowledge base is used to optimize information collection, organization, and retrieval, and recently it has been used together with statistical artificial intelligence such as machine learning. More recently, knowledge bases have also been used to express, publish, and share knowledge on the web by describing and connecting web resources such as pages and data. These knowledge bases are used for intelligent processing in various fields of artificial intelligence, such as the question answering systems of smart speakers. However, building a useful knowledge base is a time-consuming task and still requires a lot of effort from experts. In recent years, much research and technology in knowledge-based artificial intelligence has used DBpedia, one of the biggest knowledge bases, which aims to extract structured content from the various information in Wikipedia. DBpedia contains various information extracted from Wikipedia, such as titles, categories, and links, but the most useful knowledge comes from Wikipedia infoboxes, which present a user-created summary of some unifying aspect of an article. 
This knowledge is created by the mapping rules between infobox structures and the DBpedia ontology schema defined in the DBpedia Extraction Framework. In this way, DBpedia can expect high reliability in terms of the accuracy of knowledge, because it generates knowledge from semi-structured infobox data created by users. However, since only about 50% of all wiki pages in Korean Wikipedia contain an infobox, DBpedia has limitations in terms of knowledge scalability. This paper proposes a method to extract knowledge from text documents according to the ontology schema using machine learning. In order to demonstrate the appropriateness of this method, we explain a knowledge extraction model that follows the DBpedia ontology schema and is learned from Wikipedia infoboxes. Our knowledge extraction model consists of three steps: classification of documents into ontology classes, classification of sentences suitable for triple extraction, and value selection and transformation into the RDF triple structure. The structure of Wikipedia infoboxes is defined by infobox templates that provide standardized information across related articles, and the DBpedia ontology schema can be mapped to these infobox templates. Based on these mapping relations, we classify the input document according to infobox categories, which correspond to ontology classes. After determining the classification of the input document, we classify the appropriate sentences according to the attributes belonging to that class. Finally, we extract knowledge from sentences that are classified as appropriate, and we convert the knowledge into the form of triples. In order to train the models, we generated a training data set from a Wikipedia dump by adding BIO tags to sentences, and we trained models for about 200 classes and about 2,500 relations for extracting knowledge. Furthermore, we conducted comparative experiments with CRF and Bi-LSTM-CRF for the knowledge extraction process. 
Through the proposed process, it is possible to utilize structured knowledge by extracting knowledge from text documents according to the ontology schema. In addition, this methodology can significantly reduce the effort experts need to construct instances according to the ontology schema.
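
The BIO tagging and triple conversion steps mentioned in the abstract can be sketched roughly as follows. This is an illustrative assumption of how such training data and output might look, not the authors' implementation: the function names, the whitespace tokenization, the English example sentence, and the `country` attribute are all hypothetical, and the learned tagger (CRF or Bi-LSTM-CRF) that would predict the tags is omitted.

```python
def bio_tag(tokens, value_tokens, label):
    """Tag the first occurrence of value_tokens in tokens with B-/I- tags;
    every other token is tagged O. (Hypothetical helper.)"""
    tags = ["O"] * len(tokens)
    n = len(value_tokens)
    if n == 0:
        return tags
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == value_tokens:
            tags[i] = f"B-{label}"
            for j in range(i + 1, i + n):
                tags[j] = f"I-{label}"
            break
    return tags

def decode_spans(tokens, tags):
    """Collect (label, value) pairs from a BIO-tagged token sequence."""
    spans, label, buffer = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if label is not None:
                spans.append((label, " ".join(buffer)))
            label, buffer = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            buffer.append(token)
        else:
            if label is not None:
                spans.append((label, " ".join(buffer)))
            label, buffer = None, []
    if label is not None:
        spans.append((label, " ".join(buffer)))
    return spans

def to_triples(subject, spans):
    """Turn decoded spans into (subject, predicate, object) triples."""
    return [(subject, label, value) for label, value in spans]

# Invented example: the article subject and the infobox value are assumed.
tokens = "Seoul is the capital of South Korea".split()
tags = bio_tag(tokens, ["South", "Korea"], "country")
triples = to_triples("Seoul", decode_spans(tokens, tags))
```

In this sketch the infobox value supplies the span to be tagged, which mirrors the idea of generating training data from infoboxes. At prediction time, a trained sequence tagger would produce the tags in place of `bio_tag`.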