Search | Korea Science

Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

Choi, Hyunseung;Kim, Mintae;Kim, Wooju;Shin, Dongwook;Lee, Yong Hun
- Journal of Intelligence and Information Systems
- v.24 no.4
- pp.111-136
- 2018
In this paper, we propose a methodology for extracting answer information for queries from various types of unstructured documents collected from multiple web sources, in order to expand a knowledge base. The proposed methodology consists of the following steps. 1) Collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news for queries separated into "subject-predicate" form, and classify the proper documents. 2) Determine whether each sentence is suitable for information extraction and derive a confidence. 3) Based on the predicate feature, extract the information from the proper sentences and derive the overall confidence of the information extraction result. To evaluate the performance of the information extraction system, we selected 400 queries from the artificial intelligence speaker of SK-Telecom. The proposed system showed higher performance indices than the baseline model. The contribution of this study is a sequence tagging model based on bi-directional LSTM-CRF that uses the predicate feature of the query; with it, we developed a robust model that maintains high recall even on various types of unstructured documents collected from multiple sources. Information extraction for knowledge base expansion must take into account the heterogeneous characteristics of source-specific document types. The proposed methodology extracted information from various types of unstructured documents more effectively than the baseline model. Previous research has the limitation that performance is poor when extracting information from document types that differ from the training data.
In addition, by predicting the suitability of documents and sentences for information extraction before the extraction step, this study prevents unnecessary extraction attempts on documents that do not contain the answer information. It is meaningful that we provide a method by which precision can be maintained even in an actual web environment. Information extraction for knowledge base expansion targets unstructured documents on the real web, so it cannot be guaranteed that a document contains the correct answer. When question answering is performed on the real web, previous machine reading comprehension studies show low precision because they frequently attempt to extract an answer even from documents that contain no correct answer. The policy of predicting the suitability of documents and sentences for information extraction is meaningful in that it helps maintain extraction performance even in a real web environment. The limitations of this study and future research directions are as follows. The first concerns data preprocessing. In this study, the unit of knowledge extraction is determined through morphological analysis based on the open source KoNLPy Python package, and information extraction can fail when the morphological analysis is not performed properly. To improve the information extraction results, a more advanced morpheme analyzer needs to be developed. The second is entity ambiguity. The information extraction system of this study cannot distinguish between entities that share a name but are intended differently. If several people with the same name appear in the news, the system may not extract information about the one the query intends.
In future research, measures are needed to identify which person with a shared name is meant. The third concerns the evaluation query data. In this study, we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker to evaluate the performance of the information extraction system. We developed an evaluation data set of 2,800 documents (400 questions × 7 articles per question: 1 Wikipedia, 3 Naver encyclopedia, 3 Naver news) by judging whether each document contains a correct answer. To ensure the external validity of the study, it is desirable to use more queries to determine the performance of the system, but this is a costly activity that must be done manually. Future research needs to evaluate the system on more queries. It is also necessary to develop a Korean benchmark data set for information extraction from multi-source web documents, to build an environment in which results can be evaluated more objectively.
https://doi.org/10.13088/jiis.2018.24.4.111
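
The three-step flow described in the abstract above (document suitability, sentence suitability, then extraction with an overall confidence) can be sketched roughly as follows. This is only an illustration, not the authors' system: the scorer and tagger callables, the 0.5 thresholds, the naive split on periods, and the choice to multiply the three confidences are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class Extraction:
    answer: str
    confidence: float


def extract_answer(
    documents: Iterable[str],
    doc_score: Callable[[str], float],    # hypothetical document-suitability model
    sent_score: Callable[[str], float],   # hypothetical sentence-suitability model
    tagger: Callable[[str], Optional[Tuple[str, float]]],  # hypothetical sequence tagger
    doc_threshold: float = 0.5,           # assumed cut-off, not from the paper
    sent_threshold: float = 0.5,          # assumed cut-off, not from the paper
) -> Optional[Extraction]:
    """Return the highest-confidence extraction, or None if nothing is suitable."""
    best: Optional[Extraction] = None
    for doc in documents:
        d = doc_score(doc)
        if d < doc_threshold:
            continue  # skip documents unlikely to contain the answer
        for sentence in doc.split("."):  # naive sentence split, for illustration only
            sentence = sentence.strip()
            if not sentence:
                continue
            s = sent_score(sentence)
            if s < sent_threshold:
                continue  # skip sentences unsuitable for extraction
            tagged = tagger(sentence)
            if tagged is None:
                continue
            answer, t = tagged
            overall = d * s * t  # assumed way of combining the confidences
            if best is None or overall > best.confidence:
                best = Extraction(answer, overall)
    return best
```

Gating on suitability before tagging is what lets such a pipeline return no answer for a document that lacks one, instead of forcing an extraction.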

Aspect-Based Sentiment Analysis Using BERT: Developing Aspect Category Sentiment Classification Models (BERT를 활용한 속성기반 감성분석: 속성카테고리 감성분류 모델 개발)

Park, Hyun-jung;Shin, Kyung-shik
- Journal of Intelligence and Information Systems
- v.26 no.4
- pp.1-25
- 2020
Sentiment Analysis (SA) is a Natural Language Processing (NLP) task that analyzes, from written texts, the sentiments consumers or the public feel about an arbitrary object. Aspect-Based Sentiment Analysis (ABSA) is a finer-grained analysis of the sentiments towards each aspect of an object. Because it has more practical value for business, ABSA is drawing attention from both academic and industrial organizations. For example, given a review that says "The restaurant is expensive but the food is really fantastic", general SA evaluates the overall sentiment towards the 'restaurant' as 'positive', while ABSA identifies the restaurant's 'price' aspect as 'negative' and its 'food' aspect as 'positive'. Thus, ABSA enables a more specific and effective marketing strategy. To perform ABSA, it is necessary to identify the aspect terms or aspect categories included in the text and to judge the sentiments towards them. Accordingly, there are four main areas in ABSA: aspect term extraction, aspect category detection, Aspect Term Sentiment Classification (ATSC), and Aspect Category Sentiment Classification (ACSC). ABSA is usually conducted by extracting aspect terms and then performing ATSC to analyze sentiments for the given aspect terms, or by extracting aspect categories and then performing ACSC to analyze sentiments for the given aspect category. Here, an aspect category is expressed in one or more aspect terms, or indirectly inferred from other words. In the preceding example sentence, 'price' and 'food' are both aspect categories, and the aspect category 'food' is expressed by the aspect term 'food' included in the review. If the review sentence includes 'pasta', 'steak', or 'grilled chicken special', these can all be aspect terms for the aspect category 'food'. An aspect category referred to by one or more specific aspect terms in this way is called an explicit aspect.
On the other hand, an aspect category like 'price', which has no specific aspect term but can be indirectly inferred from an emotional word such as 'expensive', is called an implicit aspect. So far, the term 'aspect category' has been used to avoid confusion with 'aspect term'. From now on, we treat 'aspect category' and 'aspect' as the same concept and mostly use the word 'aspect' for convenience. Note that ATSC analyzes the sentiment towards given aspect terms, so it deals only with explicit aspects, whereas ACSC treats both explicit and implicit aspects. This study seeks answers to the following issues, ignored in previous studies, that arise when applying the BERT pre-trained language model to ACSC, and derives superior ACSC models. First, is it more effective to reflect the output vectors of the tokens for aspect categories than to use only the final output vector of the [CLS] token as the classification vector? Second, is there any performance difference between the QA (Question Answering) and NLI (Natural Language Inference) types in the sentence-pair configuration of the input data? Third, is there any performance difference according to the position of the sentence containing the aspect category in the QA or NLI type sentence-pair configuration? To achieve these research objectives, we implemented 12 ACSC models and conducted experiments on 4 English benchmark datasets. As a result, we derived ACSC models that outperform existing studies without expanding the training dataset. In addition, we found that it is more effective to reflect the output vector of the aspect category token than to use only the output vector of the [CLS] token as the classification vector. We also found that QA type input generally performs better than NLI, and that in the QA type the order of the sentence containing the aspect category does not affect performance.
There may be some differences depending on the characteristics of the dataset, but when using NLI type sentence-pair input, placing the sentence containing the aspect category second seems to provide better performance. The new methodology for designing the ACSC model used in this study could be similarly applied to other studies such as ATSC.
https://doi.org/10.13088/jiis.2020.26.4.001
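
To make the sentence-pair configurations discussed in the abstract above concrete, here is a minimal sketch of how QA type and NLI type inputs might be assembled, with the aspect sentence placed first or second. The question template and the function name are assumptions for illustration (the abstract does not give the exact wording), and a real BERT tokenizer would add the [CLS] and [SEP] tokens itself.

```python
def build_pair(review: str, aspect: str, style: str = "QA",
               aspect_first: bool = False) -> str:
    """Assemble a BERT-style sentence pair for aspect category sentiment classification."""
    if style == "QA":
        # Assumed question template; the paper's exact wording is not given here.
        auxiliary = f"what do you think of the {aspect} of it ?"
    elif style == "NLI":
        # NLI type: the auxiliary sentence is just the aspect as a pseudo-sentence.
        auxiliary = aspect
    else:
        raise ValueError(f"unknown style: {style!r}")
    # The position of the aspect sentence is one of the variables the study compares.
    first, second = (auxiliary, review) if aspect_first else (review, auxiliary)
    return f"[CLS] {first} [SEP] {second} [SEP]"
```

For the review "The restaurant is expensive but the food is really fantastic" and the aspect 'price', the NLI type with the aspect sentence second yields "[CLS] The restaurant is expensive but the food is really fantastic [SEP] price [SEP]".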

Homonym Disambiguation based on Mutual Information and Sense-Tagged Compound Noun Dictionary (상호정보량과 복합명사 의미사전에 기반한 동음이의어 중의성 해소)

Heo, Jeong;Seo, Hee-Cheol;Jang, Myung-Gil
- Journal of KIISE: Software and Applications
- v.33 no.12
- pp.1073-1089
- 2006
The goal of Natural Language Processing (NLP) is to make a computer understand natural language and deliver its meaning to humans. Word Sense Disambiguation (WSD) is a very important technology for achieving this goal. In this paper, we describe a technology for automatic homonym disambiguation using both Mutual Information (MI) and a Sense-Tagged Compound Noun Dictionary. Previous research using word definitions in dictionaries suffered from data sparseness because it relied on exact word matching. Our work overcomes this problem by using MI, which is an association measure between words. To reflect language features, the rate of word-pairs with MI values, sense frequency and site of word definitions are used as weights in our system. We constructed a Sense-Tagged Compound Noun Dictionary for high-frequency compound nouns and used it to resolve homonym senses. Experimental data for testing and evaluating our system was constructed from QA (Question Answering) test data consisting of about 200 query sentences and answer paragraphs. We performed 4 types of experiments. When only MI was used, the experiment showed a precision of 65.06%. When we used the weighted values, we achieved a precision of 85.35%, and when we used the Sense-Tagged Compound Noun Dictionary, we achieved a precision of 88.82%.
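
As a rough illustration of mutual information as an association measure between words, the sketch below computes pointwise mutual information (PMI) from sentence-level co-occurrence counts. It is not the authors' formulation: the sentence-level co-occurrence window, the base-2 logarithm, and the function name are assumptions.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def pmi_table(sentences: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), estimated per sentence."""
    word_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for tokens in sentences:
        unique = sorted(set(tokens))  # count each word once per sentence
        word_counts.update(unique)
        pair_counts.update(combinations(unique, 2))  # keys are alphabetically ordered pairs
    n = len(sentences)
    return {
        (x, y): math.log2((c / n) / ((word_counts[x] / n) * (word_counts[y] / n)))
        for (x, y), c in pair_counts.items()
    }
```

Because association scores can be computed for any co-occurring pair, a sense whose definition words are merely associated with the context, rather than identical to it, can still receive support; this is how MI can ease the sparseness of exact matching.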

Chatbot Design Method Using Hybrid Word Vector Expression Model Based on Real Telemarketing Data

Zhang, Jie;Zhang, Jianing;Ma, Shuhao;Yang, Jie;Gui, Guan
- KSII Transactions on Internet and Information Systems (TIIS)
- v.14 no.4
- pp.1400-1418
- 2020
In commercial promotion, the chatbot is known as one of the significant applications of natural language processing (NLP). Conventional design methods use the bag-of-words (BOW) model alone, based on Google database and other online corpora. For one thing, in the bag-of-words model the vectors are irrelevant to one another. Although this method suits discrete features, it does not help the machine understand continuous statements, because the connection between words is lost in the encoded word vector. For another, existing methods are tested on state-of-the-art online corpora but are hard to apply in real applications such as telemarketing data. In this paper, we propose an improved chatbot design method using a hybrid of the bag-of-words model and the skip-gram model, based on real telemarketing data. Specifically, we first collect real data in the telemarketing field and perform data cleaning and data classification on the constructed corpus. Second, the word representation adopts the hybrid bag-of-words and skip-gram model. The skip-gram model maps synonyms close together in vector space. Because the correlation between words is expressed, the amount of information contained in the word vector increases, making up for the shortcomings of using the bag-of-words model alone. Third, we use the term frequency-inverse document frequency (TF-IDF) weighting method to increase the weight of key words, then output the final word expression. Finally, the answer is produced using a hybrid of a retrieval model and a generative model. The retrieval model can accurately answer questions in the field. The generative model supplements it by answering open-domain questions, where the final reply is produced by long short-term memory (LSTM) training and prediction.
Experimental results show that the hybrid word vector expression model can improve the accuracy of the response and that the whole system can communicate with humans.
https://doi.org/10.3837/tiis.2020.04.001
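
The TF-IDF weighting step mentioned in the abstract above can be illustrated with a small self-contained sketch. It uses the plain textbook form (relative term frequency times the natural log of inverse document frequency); the paper's exact variant and any smoothing are not stated, so this form and the function name are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document."""
    n = len(docs)
    df: Counter = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per document
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights
```

A term that appears in every document gets weight 0 under this form, while a term concentrated in a few documents is boosted, which is the sense in which TF-IDF raises the weight of key words.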

A Method to Solve the Entity Linking Ambiguity and NIL Entity Recognition for efficient Entity Linking based on Wikipedia (위키피디아 기반의 효과적인 개체 링킹을 위한 NIL 개체 인식과 개체 연결 중의성 해소 방법)

Lee, Hokyung;An, Jaehyun;Yoon, Jeongmin;Bae, Kyoungman;Ko, Youngjoong
- Journal of KIISE
- v.44 no.8
- pp.813-821
- 2017
Entity linking finds the meaning of an entity mention in a user's query, where the mention may refer to the entity using different expressions, by linking the mention to the corresponding entity in the knowledge base. This task has four challenges: the difficulty of knowledge base construction, multiple representations of an entity mention, ambiguity of entity linking, and NIL entity recognition. In this paper, we first construct an entity name dictionary based on Wikipedia to build a knowledge base and solve the multiple representation problem. We then propose various methods for NIL entity recognition, and resolve the ambiguity of entity linking by training a support vector machine on several features, including context similarity, semantic relevance, clue word score, named entity type similarity of the mention, entity name matching score, and object popularity score. We apply the two proposed methods sequentially on the constructed knowledge base to obtain good entity linking performance. In the experiments, our system achieved F1 scores of 83.66% and 90.81% on NIL entity recognition and entity linking disambiguation.
https://doi.org/10.5626/JOK.2017.44.8.813

Construction of Test Collection for Evaluation of Scientific Relation Extraction System (과학기술분야 용어 간 관계추출 시스템의 평가를 위한 테스트컬렉션 구축)

Choi, Yun-Soo;Choi, Sung-Pil;Jeong, Chang-Hoo;Yoon, Hwa-Mook;You, Beom-Jong
- Proceedings of the Korea Contents Association Conference
- 2009.05a
- pp.754-758
- 2009
Extracting information from large-scale document collections would be very useful not only for information retrieval but also for question answering and summarization. Although relation extraction is a very important area, it is difficult to develop and evaluate a machine learning based system without a test collection. This study shows how to build a test collection (KREC2008) for relation extraction systems. We extracted technology terms from journal abstracts and selected several relation candidates between them using WordNet. Judges who were well trained in the evaluation process assigned a relation from the candidates. The process provides a method by which even non-experts can build a test collection easily. KREC2008 is open to the public for researchers and developers and will be used for the development and evaluation of relation extraction systems.

A Study on the Development of Ability Women Specialist -Focused on the nursing specialist- (여성전문인의 능력발전에 관한 연구 - 전문간호사를 중심으로 -)

Kwon Ill Zoo
- Journal of Korean Public Health Nursing
- v.3 no.1
- pp.101-119
- 1989
Since the five-year economic development plans that began in the early 1960s have been carried out successfully for a quarter century, and with the consolidation of social welfare in our country, women's participation in economic society is required now more than ever. As professional occupations for women are increasing through high-level specialization, I think it is very important to foster productive women experts who have strong achievement motivation as developed persons, as well as the technical and skilled capacity to contribute to the growth of their organizations. In this study, I therefore evaluate the professional disposition of professional nurses working in hospitals and try to provide basic data for extending nurses' professional ability by identifying the important factors related to it. As the material for analysis, 527 structured questionnaires were used, distributed to 7 relatively large university and general hospitals located in Seoul. The survey was carried out between October 11, 1988 and October 18, 1988. An instrument devised by Cho Moo-Sung for measuring professional disposition was used after being amended and supplemented. The disposition questionnaire consists of 26 items in total (10 for Open-disposition, 11 for Active-disposition, and 5 for Wise-disposition), each answered 'Yes' or 'No', and the hospital organizational environment is measured by 12 questions scored from 1 point for 'Very Good' to 5 points for 'Very Bad'. The collected data were analysed by computer for means, standard deviations, percentages, Pearson correlation coefficients, the χ²-test, and stepwise multiple regression. The results of this research are summarized as follows.
1. The average age of the subjects of this investigation is 28.6 and the average career as a nurse is 6.0 years. 2. The Open-disposition shown by the professional nurses is roughly half and half. 3. The Active-disposition of the professional nurses was found to be largely affirmative; their answers indicate that they are very active, with 88.2% affirmative for the disposition of self-control and 87.3% regarding training as more important. 4. For Wise-disposition, 90.7% of the professional nurses answered 'Yes' to seeking new alternatives to questions they inquire into, which shows a progressive and inquiring attitude. 5. As for the professional dispositions of nurses, the correlation between Active-disposition and Wise-disposition was a clearly positive 0.42, and the correlations of Open-disposition with Active-disposition and Wise-disposition were 0.27 and 0.20, respectively. 6. Nurses perceive the hospital organizational environment as reasonably acceptable overall, but rate rewards and punishments worst, at an average of 3.1, compared with an average of 2.2 for time control. Given the strength of Active-disposition in the results above, the potential to contribute to the productivity of the hospital organization is very great, and I think these nurses can develop further as professional women with strong achievement motivation.

Analysis of the Policy Network for the “Feed-in Tariff Law” in Japan: Evidence from the GEPON Survey

Okura, Sae;Tkach-Kawasaki, Leslie;Kobashi, Yohei;Hartwig, Manuela;Tsujinaka, Yutaka
- Journal of Contemporary Eastern Asia
- v.15 no.1
- pp.41-63
- 2016
Energy policy is known to have higher path dependency among policy fields (Kuper and van Soest, 2003; OECD, 2012; Kikkawa, 2013) and is a critical component of the infrastructure development undertaken in the early stages of nation building. Actor roles, such as those played by interest groups, are firmly formed, making it unlikely that institutional change can be implemented. In resource-challenged Japan, energy policy is an especially critical policy area for the Japanese government. In comparing energy policy making in Japan and Germany, Japan’s policy community is relatively firm (Hartwig et al., 2015), and it is improbable that institutional change can occur. The Japanese government’s approach to energy policy has shifted incrementally in the past half century, with the most recent being the 2012 implementation of the “Feed-In Tariff Law” (Act on Special Measures Concerning Procurement of Renewable Electric Energy by Operators of Electric Utilities), which encourages new investment in renewable electricity generation and promotes the use of renewable energy. Yet, who were the actors involved and the factors that influenced the establishment of this new law? This study attempts to assess the factors associated with implementing the law as well as the roles of the relevant major actors. In answering this question, we focus on identifying the policy networks among government, political parties, and interest groups, which suggests that success in persuading key economic groups could be a factor in promoting the law. Our data is based on the “Global Environmental Policy Network Survey 2012-2013 (GEPON2)” which was conducted immediately after the March 11, 2011 Great East Japan Earthquake with respondents including political parties, the government, interest groups, and civil society organizations. 
Our results suggest that the Feed-in Tariff (FIT) Law's network structure is similar to the information network and support network, and that the actors at the center of the network support the FIT Law. The strength of our research lies in our focus on political networks and their contributing mechanism to the law's implementation through analysis of the political process. From an academic perspective, identifying the key actors and factors may be significant in explaining institutional change in policy areas with high path dependency. Close examination of this issue also has implications for a society that can promote renewable and sustainable energy resources.
https://doi.org/10.17477/jcea.2016.15.1.041

A Study on the Development and Measurement of Logistics Partners Cooperation Index(LPCI): Focused on the Joint Logistics (물류협력지수의 개발 및 측정에 관한 연구: 공동물류사업을 중심으로)

Suh, Sang-Sok;Song, Gwang-Suk;Park, Jong-Woo
- Journal of Distribution Science
- v.14 no.6
- pp.107-118
- 2016
Purpose - More than 90% of firms in the domestic logistics industry are small enterprises, and they are experiencing stagnant growth because of a price-based competition structure rather than building high value-added logistics services. To overcome this situation and develop the logistics industry, strengthening competitiveness by building inter-enterprise cooperative networks would be a key alternative. Therefore, this study develops an index for measuring the inter-enterprise cooperation level of joint logistics business, a typical collaborative business model in the logistics industry. As a way of strengthening competitiveness, it also suggests developmental steps and key management indices for the logistics industry to mature. Research Design, Data, Methodology - This study develops an index for measuring the level of inter-enterprise cooperation in the logistics industry. The level was measured through a survey of enterprises that participated in joint logistics business, which are typical cooperative models in the logistics industry. Measurement items were developed based on items presented in existing research. Question items consisted of Yes/No selection questions measuring the implementation status of corporate activities, and detailed activity items measuring the qualitative level. The sample comprised 116 enterprises, including 90 logistics enterprises and 26 shippers. In addition, the weight of each measurement variable was derived by having working-level personnel evaluate its importance for joint logistics business. This study is meaningful for other studies in that it built an assessment tool (LPCI) for the cooperation level of joint logistics business where no previous studies on joint logistics business existed.
Results - Analysis of the LPCI presented in this study gave a score of 59.9 points out of 100 for logistics enterprises and 47.2 points for shippers, revealing that the cooperation level among enterprises participating in joint logistics business is very low. In particular, measuring the importance ratings of logistics enterprises and shippers showed differences between them on each measurement standard. This difference is considered a key factor in the low cooperative operational conformity between logistics enterprises and shippers. Conclusions - Most joint logistics business currently being promoted consists of sharing facilities and information, so it is hard to find joint logistics business in practice that is based on a cooperative business model among the main cooperating agents. Therefore, the competitiveness of the logistics industry could be strengthened by promoting joint logistics business based on mutual cooperation among enterprises. In other words, the aim is to secure sustainable competitiveness for joint logistics business, together with the creation of new markets, through inter-enterprise cooperation based on the integration of basic logistics business.
https://doi.org/10.15722/jds.14.6.201606.107
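
The abstract above describes an index built from Yes/No items, weighted by importance and reported on a 100-point scale. One simple way such a score could be computed is sketched below; the normalised weighted-sum formula, the item names, and the weights are hypothetical and are not taken from the paper.

```python
from typing import Dict


def cooperation_index(answers: Dict[str, bool], weights: Dict[str, float]) -> float:
    """Weighted share of 'Yes' answers, scaled to a 100-point score.

    Assumed formula: 100 * sum(weight of Yes items) / sum(all item weights).
    """
    total = sum(weights[item] for item in answers)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    achieved = sum(weights[item] for item, yes in answers.items() if yes)
    return 100.0 * achieved / total
```

With two hypothetical items weighted 3 and 1, answering 'Yes' only to the heavier one gives 75 points, so items that practitioners rate as more important move the score more.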

Characteristics of Piet Oudolf's Garden Design from the Viewpoint of the Contemporary Trends in the Use of Grasses (그라스(Grasses)류의 현대 활용추세 관점에서 본 피에트 우돌프(Piet Oudolf)의 정원 디자인 특징)

Park, Eun-Yeong
- Journal of the Korean Institute of Traditional Landscape Architecture
- v.33 no.3
- pp.66-71
- 2015
Given the recent trend toward natural planting, the recognized need for new landscaping plants with advantages in terms of climate change and maintenance, and the expected increase in demand for grasses in Korea, this study investigates, from a design point of view, the techniques for using grasses and their significance. It does so through the garden designs of Piet Oudolf, who is attracting international interest with his use of perennial plants and grasses and is leading trends in modern planting design, thereby answering the question: how can grasses best be used in landscaping spaces? The characteristics of Oudolf's garden design using grasses are summarized in the following conclusions. First, Oudolf combines perennial plants and grasses to make one-to-one correspondences or to express expanded drifts. Here grasses mainly serve as an element for changing over to other spaces or as a connecting element between image transitions. Second, the brown color and texture of grasses represent Oudolf's consideration of the temporal continuity of gardens. They express the lyricism and pictorialism of autumn and winter. Third, grasses serve to set layers in wide areas, resulting in discordance between viewpoints and circulations. Oudolf repeatedly crosses perennial plants and grasses using matrices, islands and distributed layering. Here grasses are used to express abstract meanings in the settings of scenes.
https://doi.org/10.14700/KITLA.2015.33.3.066
