• Title/Summary/Keyword: multiple entity model


Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo; Park, Byeonghwa
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.1-13 / 2015
  • As opinion mining in big data applications has been highlighted, a great deal of research on unstructured data has been conducted. Social media services on the Internet generate unstructured or semi-structured data every second, mostly written in the natural languages we use in daily life, and many words in human languages have multiple meanings, or senses; as a result, it is very difficult for computers to extract useful information from such datasets. Traditional web search engines are usually based on keyword search, which can produce incorrect results far from users' intentions. Even though much progress has been made over the years in enhancing search engine performance, there is still considerable room for improvement. Word sense disambiguation plays a very important role in natural language processing and is considered one of the most difficult problems in the area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based. This paper presents a method that automatically generates a corpus for word sense disambiguation from the examples in existing dictionaries, avoiding expensive sense-tagging processes, and evaluates its effectiveness with the Naïve Bayes model, a supervised learning algorithm, using the Korean standard unabridged dictionary and the Sejong Corpus. The dictionary contains approximately 57,000 sentences; the Sejong Corpus contains about 790,000 sentences tagged with both part-of-speech and sense information. In the experiments, the two resources were evaluated both combined and separately, using cross-validation. Only nouns, the targets of word sense disambiguation, were selected: 93,522 word senses among 265,655 nouns, along with 56,914 sentences from related proverbs and examples, were combined into the corpus. The Sejong Corpus merged easily with the dictionary because it is tagged with the sense indices defined by the Korean standard unabridged dictionary. Sense vectors were formed after the merged corpus was created, and the terms used in creating them were added to the named-entity dictionary of a Korean morphological analyzer. Using the extended named-entity dictionary, term vectors were extracted from the input sentences; given an extracted term vector and the sense-vector model built during preprocessing, sense-tagged terms were determined by vector-space-model-based word sense disambiguation. The experiments show better precision and recall with the merged corpus, suggesting that the approach can practically enhance the performance of Internet search engines and support more accurate understanding of sentence meaning in natural language processing tasks such as search, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem and assumes that all senses are independent. Even though this assumption is unrealistic and ignores correlations between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in applications such as text classification and medical diagnosis. However, further research needs to consider all possible combinations and/or partial combinations of the senses in a sentence, and the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis. (A toy sketch of the vector-space matching step follows this abstract.)
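
As an editorial illustration of the vector-space matching step described in this abstract, here is a minimal Python sketch: sense vectors are pooled term counts from sense-tagged example sentences, and an input context is assigned the sense with the highest cosine similarity. The toy sense IDs, Korean context words, and function names are invented for the example; they are not the authors' data or code.

```python
# A minimal sketch of vector-space WSD over a toy sense-tagged corpus.
from collections import Counter
import math

# Hypothetical sense-tagged examples for the Korean noun "배" (pear / ship).
tagged_examples = [
    ("bae_01", ["과일", "사과", "맛"]),        # pear
    ("bae_01", ["과수원", "수확", "과일"]),
    ("bae_02", ["바다", "항구", "출항"]),       # ship
    ("bae_02", ["선원", "바다", "항해"]),
]

# Build one term-frequency "sense vector" per sense by pooling the
# context words of all examples tagged with that sense.
sense_vectors = {}
for sense, context in tagged_examples:
    sense_vectors.setdefault(sense, Counter()).update(context)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def disambiguate(context_terms):
    """Pick the sense whose pooled vector is closest to the input context."""
    query = Counter(context_terms)
    return max(sense_vectors, key=lambda s: cosine(sense_vectors[s], query))

print(disambiguate(["바다", "출항"]))  # -> "bae_02" (ship)
```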

A Development of SCM Model in Chemical Industry Including Batch Mode Operations (회분식 공정이 포함된 화학산업에서의 공급사슬 관리 모델 개발)

  • Park, Jeung Min; Ha, Jin-Kuk; Lee, Euy Soo
    • Korean Chemical Engineering Research / v.46 no.2 / pp.316-329 / 2008
  • Recently, increased attention to the processing of multiple, relatively low-quantity, high-value-added products has led to the adoption of batch processes in chemical process industries such as pharmaceuticals, polymers, bio-chemicals, and foods. Because batch processes offer more opportunities for operational improvement than continuous processes, much effort has been made to enhance their productivity and operability. However, the chemical process industry faces a range of uncertainties, such as product demand, product prices, and lead times in raw material supply, production, and product distribution, and global competition has made it imperative for process industries to manage their supply chains optimally. Supply chain management aims to integrate plants with their suppliers and customers so that they can be managed as a single entity, and to coordinate all input and output flows of materials and information so that products are produced and distributed in the right quantities, to the right locations, at the right time. The objective of this study is to develop an SCM model for the chemical industry that includes batch-mode operations and solves the purchase, distribution, production planning, and scheduling problem, minimizing the total costs of production, inventory, and transportation under uncertainty; this integrated planning and scheduling model for the manufacturing supply chain allows the enterprise to respond to uncertainty. The results show that the advantages of supply chain integration include quality improvements as seen by customers and suppliers, better order schedules, flexibility, cost reduction, and increases in sales and profits. An integrated supply chain (production and distribution system) also generates significant savings by trading off the costs associated with the whole, rather than minimizing supply chain costs separately. (A toy sketch of the cost-minimization core follows this abstract.)
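
The cost-minimization core of such an SCM model can be illustrated with a toy linear program. The sketch below, assuming hypothetical plant capacities, market demands, and unit costs, minimizes production-plus-transportation cost with scipy; the paper's actual model also covers batch scheduling, inventory, and uncertainty, which this sketch omits.

```python
# A toy two-plant / two-market cost-minimization LP, illustrative only.
from scipy.optimize import linprog

# Decision variables x = [x11, x12, x21, x22]: units shipped from plant i
# to market j. Costs combine production and transportation per unit.
unit_cost = [4, 6, 5, 3]

# Plant capacities: x11 + x12 <= 80,  x21 + x22 <= 70
A_ub = [[1, 1, 0, 0],
        [0, 0, 1, 1]]
b_ub = [80, 70]

# Market demands met exactly: x11 + x21 = 60,  x12 + x22 = 50
A_eq = [[1, 0, 1, 0],
        [0, 1, 0, 1]]
b_eq = [60, 50]

res = linprog(unit_cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 4)
print(res.x, res.fun)  # optimal shipment plan and minimum total cost
```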

A New Efficient Private Key Reissuing Model for Identity-based Encryption Schemes Including Dynamic Information (동적 ID 정보가 포함된 신원기반 암호시스템에서 효율적인 키 재발급 모델)

  • Kim, Dong-Hyun; Kim, Sang-Jin; Koo, Bon-Seok; Ryu, Kwon-Ho; Oh, Hee-Kuck
    • Journal of the Korea Institute of Information Security & Cryptology / v.15 no.2 / pp.23-36 / 2005
  • The main obstacle hindering the wide deployment of identity-based cryptosystems is that the entity responsible for creating private keys has too much power; as a result, private keys are no longer truly private. One obvious solution to this problem is to apply a threshold technique, but this increases the authentication computation and communication cost during the key-issuing phase. In this paper, we propose a new efficient model for issuing multiple private keys in identity-based encryption schemes based on the Weil pairing that also alleviates the key escrow problem. In our system, the private key of a user is divided into two components, the KGK (Key Generation Key) and the KUD (Key Usage Descriptor), which are issued separately by different parties. The KGK is issued in a threshold manner by the KIC (Key Issuing Center), whereas the KUD is issued by a single authority called the KUM (Key Usage Manager). Changing the KUD results in a different private key, so a user can efficiently obtain a new private key by interacting with the KUM. We can also adapt Gentry's time-slot-based private key revocation approach to our scheme more efficiently than existing schemes. We show the security of the system and demonstrate its efficiency by analysis against existing systems. (A toy illustration of the threshold issuance idea follows this abstract.)
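
The threshold issuance of the KGK can be illustrated with plain Shamir secret sharing: no single KIC node ever holds the complete key, and any t of n nodes can jointly issue it. The sketch below shows only this generic threshold idea over a toy prime field, not the paper's Weil-pairing construction; all parameters are illustrative.

```python
# Toy Shamir secret sharing: illustrates threshold key issuance only.
import random

P = 2**127 - 1  # a prime modulus (illustrative size only)

def split(secret: int, n: int, t: int):
    """Split `secret` into n shares, any t of which reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(123456789, n=5, t=3)          # 5 issuing nodes, threshold 3
assert reconstruct(shares[:3]) == 123456789  # any 3 shares suffice
```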

Utility-Based Video Adaptation in MPEG-21 for Universal Multimedia Access (UMA를 위한 유틸리티 기반 MPEG-21 비디오 적응)

  • Kim, Jae-Gon; Kim, Hyung-Myung; Kang, Kyeong-Ok; Kim, Jin-Woong
    • Journal of Broadcast Engineering / v.8 no.4 / pp.325-338 / 2003
  • Video adaptation in response to dynamic resource conditions and user preferences is a key technology for enabling universal multimedia access (UMA) through heterogeneous networks by a multitude of devices in a seamless way. Although many adaptation techniques exist, the selection of an appropriate adaptation among multiple choices that satisfy given constraints is often ad hoc. To provide a systematic solution, we present a general conceptual framework that models video entities, adaptations, resources, utilities, and the relations among them. The framework allows various adaptation problems to be formulated as resource-constrained utility maximization. We apply it to a practical case, dynamic bit-rate adaptation of MPEG-4 video streams, employing a combination of frame dropping and DCT coefficient dropping. Furthermore, we present a descriptor, which has been accepted as part of MPEG-21 Digital Item Adaptation (DIA), for supporting terminal and network quality of service (QoS) in an interoperable manner. Experiments demonstrate the feasibility of the presented framework using the descriptor. (A toy sketch of the constrained selection step follows this abstract.)
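
The resource-constrained utility maximization framing can be illustrated with a toy selection problem: among operating points combining frame dropping and DCT coefficient dropping, pick the one with the highest utility that fits the bitrate budget. The operating points and utility values below are invented for illustration, not measurements from the paper.

```python
# Toy resource-constrained utility maximization over adaptation choices.
operating_points = [
    # (frames dropped per GOP, % DCT coeffs dropped, bitrate kbps, utility)
    (0, 0,  1000, 36.0),
    (0, 30,  750, 34.1),
    (2, 0,   700, 33.5),
    (2, 30,  520, 31.8),
    (4, 30,  380, 29.2),
]

def adapt(bitrate_budget_kbps: float):
    """Return the feasible operating point with maximum utility."""
    feasible = [p for p in operating_points if p[2] <= bitrate_budget_kbps]
    return max(feasible, key=lambda p: p[3]) if feasible else None

print(adapt(800))  # -> (0, 30, 750, 34.1): drop coefficients, keep frames
```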

Improving Bidirectional LSTM-CRF model Of Sequence Tagging by using Ontology knowledge based feature (온톨로지 지식 기반 특성치를 활용한 Bidirectional LSTM-CRF 모델의 시퀀스 태깅 성능 향상에 관한 연구)

  • Jin, Seunghee; Jang, Heewon; Kim, Wooju
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.253-266 / 2018
  • This paper proposes a sequence-tagging methodology to improve the performance of the NER (Named Entity Recognition) used in QA systems. To retrieve the correct answers stored in a database, the user's query must be translated into a database language such as SQL (Structured Query Language) so that the computer can process it; this requires identifying the classes and data names from the database that the query mentions. Simply matching query words against the database cannot resolve homonyms or multi-word phrases, because it ignores the context of the user's query; when there are multiple matches, all of them are returned, so the query admits many interpretations and the computational cost grows large. To overcome this, this study reflects the contextual meaning of the query using a Bidirectional LSTM-CRF, and addresses a weakness of neural models, their inability to identify untrained words, by adding a feature based on ontology knowledge. Experiments were conducted on an ontology knowledge base in the music domain and the performance was evaluated. To evaluate the proposed model accurately, words included in the training queries were converted into untrained words, testing whether words that appear in the database but not in training could still be correctly identified. As a result, the model recognizes entities in context, recognizes untrained words without retraining the Bidirectional LSTM-CRF model, and improves overall entity recognition performance. (A toy sketch of the ontology-feature idea follows this abstract.)
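
The ontology-feature idea can be sketched in a few lines of PyTorch: each token's embedding is concatenated with a binary "matches an ontology entity" flag before the BiLSTM, so an unseen word that exists in the knowledge base still carries a signal. The dimensions, gazetteer flags, and class name are illustrative; the paper's full model also adds a CRF decoding layer, omitted here.

```python
# Minimal BiLSTM tagger with an ontology-match input feature (no CRF).
import torch
import torch.nn as nn

class OntologyBiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, n_tags, emb_dim=50, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # +1 input feature: 1.0 if the token matches an ontology entity.
        self.lstm = nn.LSTM(emb_dim + 1, hidden, bidirectional=True,
                            batch_first=True)
        self.out = nn.Linear(2 * hidden, n_tags)  # per-token tag scores

    def forward(self, token_ids, onto_flags):
        x = self.emb(token_ids)                           # (B, T, emb_dim)
        x = torch.cat([x, onto_flags.unsqueeze(-1)], dim=-1)
        h, _ = self.lstm(x)                               # (B, T, 2*hidden)
        return self.out(h)  # emissions; a CRF would decode these jointly

model = OntologyBiLSTMTagger(vocab_size=1000, n_tags=5)
tokens = torch.randint(0, 1000, (1, 7))                # one 7-token query
flags = torch.tensor([[0., 1., 1., 0., 0., 0., 0.]])   # gazetteer matches
print(model(tokens, flags).shape)                      # torch.Size([1, 7, 5])
```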

Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

  • Choi, Hyunseung; Kim, Mintae; Kim, Wooju; Shin, Dongwook; Lee, Yong Hun
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.111-136 / 2018
  • In this paper, we propose a methodology for extracting answer information for queries from various types of unstructured documents collected from multiple web sources, in order to expand a knowledge base. The proposed methodology consists of the following steps: 1) collect relevant documents from Wikipedia, Naver Encyclopedia, and Naver News for "subject-predicate" queries and classify the suitable documents; 2) determine whether each sentence is suitable for information extraction and derive a confidence score; 3) based on predicate features, extract the information from suitable sentences and derive an overall confidence for the extraction result. To evaluate the performance of the information extraction system, we selected 400 queries from SK Telecom's artificial intelligence speaker; the proposed system shows higher performance indices than the baseline model. The contribution of this study is a sequence-tagging model based on a bidirectional LSTM-CRF that uses the predicate feature of the query, yielding a robust model that maintains high recall across the various types of unstructured documents collected from multiple sources. Information extraction for knowledge base expansion must account for the heterogeneous characteristics of source-specific document types; prior work performs poorly when extracting from document types that differ from the training data, whereas the proposed methodology extracts information effectively from various document types compared to the baseline model. In addition, by predicting the suitability of documents and sentences for information extraction before the extraction step, this study prevents unnecessary extraction attempts on documents that do not contain the answer information, providing a way to maintain precision even in a real web environment. Because the target is unstructured documents on the real web, there is no guarantee that a document contains the correct answer; previous machine reading comprehension studies show low precision in this setting because they frequently attempt to extract an answer even from documents with no correct answer. The policy of predicting document and sentence suitability is therefore meaningful in that it maintains extraction performance in a real web environment. (A toy sketch of this two-stage policy follows this abstract.) The limitations of this study and future research directions are as follows. First, data preprocessing: the unit of knowledge extraction is determined by morphological analysis based on the open-source KoNLPy Python package, and extraction can be performed improperly when the morphological analysis is wrong; a more advanced morphological analyzer is needed to improve extraction results. Second, entity ambiguity: the system cannot distinguish different entities that share the same name, so if several people with the same name appear in the news, it may not extract information about the intended subject of the query; future research should take measures to identify which same-named person is intended. Third, evaluation data: we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker and built an evaluation data set of 2,800 documents (400 questions × 7 articles per question: 1 Wikipedia, 3 Naver Encyclopedia, 3 Naver News), judging whether each contains a correct answer. To ensure the external validity of the study, it is desirable to evaluate the system on more queries, which is a costly manual activity; future research should evaluate the system on more queries and develop a Korean benchmark data set for information extraction over multi-source web documents, to build an environment in which results can be evaluated more objectively.
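
The two-stage suitability policy can be sketched as a simple pipeline: score each sentence's suitability for the query's predicate and attempt extraction only above a threshold, so answerless documents yield no answer instead of a wrong one. The stub scoring and extraction functions below stand in for the paper's trained classifiers and BiLSTM-CRF extractor; the names and example document are invented.

```python
# Toy two-stage extraction pipeline: suitability gate, then extraction.
from typing import Optional

def sentence_suitability(sentence: str, predicate: str) -> float:
    """Stub: confidence that `sentence` answers `predicate`."""
    return 0.9 if predicate in sentence else 0.1

def extract_answer(sentence: str, predicate: str) -> str:
    """Stub standing in for the sequence-tagging extractor."""
    return sentence.split(predicate)[-1].strip(" .")

def pipeline(sentences, subject, predicate, threshold=0.5) -> Optional[str]:
    for s in sentences:
        if subject in s and sentence_suitability(s, predicate) >= threshold:
            return extract_answer(s, predicate)
    return None  # no suitable sentence: better no answer than a wrong one

doc = ["BTS debuted in 2013.", "The group's agency is Big Hit."]
print(pipeline(doc, subject="BTS", predicate="debuted in"))  # -> "2013"
```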

The Gains To Bidding Firms' Stock Returns From Merger (기업합병의 성과에 영향을 주는 요인에 대한 실증적 연구)

  • Kim, Yong-Kap
    • Management & Information Systems Review / v.23 / pp.41-74 / 2007
  • In Korea, corporate merger activity has been vigorous since 1980, and changes in domestic and international economic circumstances (particularly since 1986) have given corporate managers a strong interest in mergers. Korea and America have different business environments, so there are presumably many differences in the motives, methods, and effects of mergers between the two countries. According to recent American studies of takeover bids, takeover bids have information effects, tax implications, and co-insurance effects, and the form of payment (cash versus securities), the relative size of target and bidder, the leverage effect, Tobin's q, the number of bidders (single versus multiple), the time period (before 1968, 1968-1980, 1981 and later), and the target firm's reaction (hostile versus friendly) are important determinants of the magnitude of takeover gains and of their distribution between targets and bidders at the announcement of a bid. This study reviews the theory of takeover bids and the status and problems of mergers in Korea, then investigates how merger announcements are reflected in the common stock returns of bidding firms, and finally explores empirically the factors influencing the abnormal returns of bidding firms' stock prices. The hypotheses of this study are as follows: shareholders of bidding firms benefit from mergers, and the common stock returns of bidding firms at the announcement of takeover bids show significant differences according to the ratio of target size to bidding-firm size; whether the target belongs to the same conglomerate as the bidder; whether the target is a listed company; the time period (before 1986, 1986 and later); the number of the bidding firm's shares exchanged for each share of the target; whether the merger is horizontal/vertical or conglomerate; and the debt-to-equity ratios of target and bidder. The data were drawn from public announcements of proposals to acquire a target firm by means of merger. The sample contains all bidding firms that were listed on the stock market and engaged in successful mergers in the period 1980 through 1992 for which daily stock returns are available; a merger bid was considered successful if it resulted in a completed merger and the target firm disappeared as a separate entity. The final sample contains 113 acquiring firms. The research hypotheses are tested by applying an event-study methodology similar to that described by Dodd and Warner. The ordinary least squares coefficients of the market-model regression were estimated over the period $t=-135$ to $t=-16$ relative to the date of the proposal's initial announcement, $t=0$. Daily abnormal common stock returns were calculated for each firm $i$ over the interval $t=-15$ to $t=+15$, and a daily average abnormal return ($AR_t$) was computed for each day $t$. Average cumulative abnormal returns ($CAR_{T_1,T_2}$) were derived by summing the $AR_t$ over various intervals. The expected values of $AR_t$ and $CAR_{T_1,T_2}$ are zero in the absence of abnormal performance. The test statistics for $AR_t$ and $CAR_{T_1,T_2}$ are based on the average standardized abnormal return ($ASAR_t$) and the average standardized cumulative abnormal return ($ASCAR_{T_1,T_2}$), respectively. Assuming that the individual abnormal returns are normal and independent across $t$ and across securities, the statistics $Z_t$ and $Z_{T_1,T_2}$, which follow a unit-normal distribution (Dodd and Warner), are used to test the hypotheses that the average standardized abnormal returns and the average cumulative standardized abnormal returns equal zero. (A toy sketch of the event-study computation follows this abstract.)
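
The event-study computation can be sketched in numpy for a single firm with simulated returns, using the windows given in the abstract (estimation $t=-135$ to $-16$, event $t=-15$ to $+15$). The market-model fit, abnormal returns, CAR, and a unit-normal test statistic follow the Dodd-Warner outline; the data and the simplified standardization are illustrative.

```python
# Toy single-firm event study with simulated returns (Dodd-Warner style).
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(-135, 16)                      # trading days around t = 0
market = rng.normal(0.0005, 0.01, t.size)    # simulated market returns
stock = 0.001 + 0.9 * market + rng.normal(0, 0.01, t.size)

est = (t >= -135) & (t <= -16)               # estimation window
evt = (t >= -15) & (t <= 15)                 # event window

# Market-model OLS on the estimation window: R_it = a + b * R_mt + e_it
b, a = np.polyfit(market[est], stock[est], 1)
resid = stock[est] - (a + b * market[est])
s = resid.std(ddof=2)                        # residual standard deviation

AR = stock[evt] - (a + b * market[evt])      # abnormal returns
CAR = AR.cumsum()                            # cumulative abnormal returns
SAR = AR / s                                 # standardized ARs
Z = SAR.sum() / np.sqrt(SAR.size)            # ~unit-normal test statistic
print(CAR[-1], Z)
```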
