• Title/Summary/Keyword: multiple entity data model


Object Oriented Spatial Data Model using Geographic Relationship Role (지리 관계 역할을 이용한 객체 지향 공간 데이터 모델)

  • Lee, Hong-Ro
    • Journal of Internet Computing and Services / v.1 no.1 / pp.47-62 / 2000
  • Geographic Information Systems (GIS) deal with data that can potentially be useful for a wide range of applications. However, the information needs of each application usually vary, especially in resolution, detail level, and representation style, as defined in the modeling phase of geographic database design. To deal with such diverse needs, a GIS must offer features that allow multiple representations for each geographic entity or phenomenon. This paper addresses the problem of formally defining objects and their relationships in geographical information systems. Geographical data is divided into two main classes: geo-objects and geo-fields, which describe discrete and continuous representations of spatial reality, respectively. I will study the classes and the roles of relationships over geo-fields, geo-objects, and non-geo-objects. This paper thereby contributes to the efficient design of geographical class hierarchy schemas by formalizing the attribute domains of classes.

  • PDF
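The geo-object / geo-field division the abstract describes can be sketched as a small class hierarchy. This is a minimal illustration under assumed names (GeoEntity, GeoObject, GeoField, NonGeoObject), not the paper's actual formalism:

```python
# Minimal sketch of a geographic class hierarchy in the spirit of the
# geo-object / geo-field division; all names and data are illustrative,
# not the author's formal model.
from dataclasses import dataclass, field


@dataclass
class GeoEntity:
    """Common superclass for geographic representations."""
    name: str


@dataclass
class GeoObject(GeoEntity):
    """Discrete representation: an identifiable entity with a geometry."""
    geometry: tuple  # e.g. a point (x, y); real systems use richer geometry types


@dataclass
class GeoField(GeoEntity):
    """Continuous representation: a value sampled over space."""
    samples: dict = field(default_factory=dict)  # (x, y) -> value

    def value_at(self, xy):
        return self.samples.get(xy)


@dataclass
class NonGeoObject:
    """Conventional (non-spatial) object, related to geo-objects by roles."""
    name: str


# A relationship role links a geo-object to a non-geo-object.
city = GeoObject(name="Seoul", geometry=(126.98, 37.57))
mayor = NonGeoObject(name="mayor")
roles = {("governed_by", city.name): mayor.name}
```

The point of the sketch is that discrete and continuous representations share a superclass, while relationship roles are kept as first-class data rather than hard-wired attributes.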

An Object Oriented Spatial Data Model Based on Geometric attributes and the Role of Spatial Relationships in Geo-objects and Geo-fields (지리-객체와 지리-필드에서 기하 속성과 공간관계 역할에 기반한 객체 지향 공간 데이터 모델)

  • Lee, Hong-Ro
    • The KIPS Transactions:PartD / v.8D no.5 / pp.516-572 / 2001
  • Geographic Information Systems (GIS) deal with data that can potentially be useful for a wide range of applications. The information needed by each application can vary, especially in resolution, detail level, application view, and representation style, as defined in the modeling phase of geographic database design. To deal with such diverse needs, a GIS must offer features that allow multiple representations for each geographic entity or phenomenon. This paper addresses the problem of formally defining objects and their relationships in geographical information systems. Geographical data is divided into two main classes: geo-objects and geo-fields, which describe discrete and continuous representations of spatial reality, respectively. I studied the attributes and the relationship roles over geo-objects and non-geo-objects. This paper thereby contributes to the efficient design of geographical class hierarchy schemas by formalizing the attribute domains of classes.

  • PDF

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.1-13 / 2015
  • As opinion mining in big data applications has been highlighted, a great deal of research on unstructured data has been carried out. Social media services on the Internet generate unstructured or semi-structured data every second, mostly in the natural languages we use in daily life. Many words in human languages have multiple meanings, or senses; as a result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, which yields incorrect results far from users' intentions. Even though much progress has been made over recent years in enhancing the performance of search engines, there is still considerable room for improvement. Word sense disambiguation plays a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based. This paper presents a method that automatically generates a corpus for word sense disambiguation by taking advantage of the examples in existing dictionaries, avoiding expensive sense-tagging processes. It evaluates the effectiveness of the method with a Naïve Bayes model, a supervised learning algorithm, using the Korean standard unabridged dictionary and the Sejong Corpus. The Korean standard unabridged dictionary has approximately 57,000 sentences. The Sejong Corpus has about 790,000 sentences tagged with both part-of-speech and senses. For the experiments, the Korean standard unabridged dictionary and the Sejong Corpus were evaluated both combined and separately, using cross validation. Only nouns, the target subjects in word sense disambiguation, were selected.
93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were additionally combined into the corpus. The Sejong Corpus was easily merged with the Korean standard unabridged dictionary because it was tagged with the sense indices defined by the dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating the sense vectors were added to the named entity dictionary of a Korean morphological analyzer. Using the extended named entity dictionary, term vectors were extracted from the input sentences, and term vectors for the sentences were created. Given the extracted term vector and the sense vector model built during preprocessing, the sense-tagged terms were determined by vector-space-model-based word sense disambiguation. In addition, this study shows the effectiveness of the corpus merged from the examples in the Korean standard unabridged dictionary and the Sejong Corpus: the experiments show that better precision and recall are obtained with the merged corpus. This suggests the method can practically enhance the performance of Internet search engines and help capture the meaning of a sentence more accurately in natural language processing tasks relevant to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem. It assumes that all attributes are independent. Even though this assumption is not realistic and ignores correlations between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research needs to consider all possible combinations, or partial combinations, of the senses in a sentence.
Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
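The vector-space-model disambiguation step the abstract describes can be illustrated with a toy sketch: sense vectors are built from example sentences (standing in for the dictionary examples), and an input sentence is assigned the sense whose vector it is most similar to. The words, senses, and examples below are illustrative, not from the actual corpora:

```python
# Toy vector-space-model word sense disambiguation: one sense vector
# per sense, built from example sentences; classification by cosine
# similarity. Data is illustrative only.
from collections import Counter
import math

# Example sentences per sense of the ambiguous word "bank".
sense_examples = {
    "bank/finance": ["deposit money in the bank", "the bank raised interest rates"],
    "bank/river": ["fishing on the river bank", "the bank of the stream eroded"],
}

def vectorize(text):
    """Bag-of-words term vector."""
    return Counter(text.split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Sense vector = merged term counts of that sense's example sentences.
sense_vectors = {
    sense: vectorize(" ".join(examples))
    for sense, examples in sense_examples.items()
}

def disambiguate(sentence):
    """Return the sense whose vector is closest to the sentence vector."""
    v = vectorize(sentence)
    return max(sense_vectors, key=lambda s: cosine(v, sense_vectors[s]))

print(disambiguate("deposit your money at the bank"))  # → bank/finance
```

The real system works at a much larger scale and adds morphological analysis with an extended named entity dictionary, but the matching principle is the same.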

Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

  • Choi, Hyunseung;Kim, Mintae;Kim, Wooju;Shin, Dongwook;Lee, Yong Hun
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.111-136 / 2018
  • In this paper, we propose a methodology to extract answer information for queries from various types of unstructured documents collected from multiple sources on the web, in order to expand a knowledge base. The proposed methodology consists of the following steps. 1) Collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news for queries separated into "subject-predicate" form, and classify the proper documents. 2) Determine whether each sentence is suitable for extracting information, and derive a confidence score. 3) Based on the predicate feature, extract the information from the proper sentences and derive the overall confidence of the extraction result. To evaluate the performance of the information extraction system, we selected 400 queries from SK Telecom's artificial intelligence speaker. Compared with the baseline, the proposed system shows a higher performance index. The contribution of this study is a sequence tagging model based on a bidirectional LSTM-CRF that uses the predicate feature of the query; with it we developed a robust model that maintains high recall even on the various types of unstructured documents collected from multiple sources. Information extraction for knowledge base expansion must take into account the heterogeneous characteristics of source-specific document types. The proposed methodology proved to extract information effectively from various document types compared to the baseline model, whereas previous research suffered poor performance when extracting information from document types different from the training data.
In addition, this study prevents unnecessary extraction attempts on documents that do not include the answer, through a step that predicts the suitability of documents and sentences for information extraction before extraction is performed. It is meaningful that we provide a method by which precision can be maintained even in a real web environment. Information extraction for knowledge base expansion targets unstructured documents on the real web, so there is no guarantee that a document contains the correct answer. When question answering is performed on the real web, previous machine reading comprehension studies show low precision because they frequently attempt to extract an answer even from documents in which no correct answer exists. The policy of predicting the suitability of documents and sentences for extraction is meaningful in that it helps maintain extraction performance in this setting. The limitations of this study and future research directions are as follows. First, data preprocessing: the unit of knowledge extraction is determined through morphological analysis based on the open-source KoNLPy Python package, and extraction can go wrong when morphological analysis is not performed properly. To enhance extraction results, a more advanced morphological analyzer is needed. Second, entity ambiguity: the information extraction system of this study cannot distinguish between different referents of the same name. If several people with the same name appear in the news, the system may not extract information about the intended query.
In future research, measures are needed to identify the intended person among those sharing a name. Third, evaluation query data: in this study, we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker to evaluate the performance of the information extraction system, and developed the evaluation data set from 2,800 documents (400 questions × 7 articles per question: 1 from Wikipedia, 3 from Naver encyclopedia, 3 from Naver news), judging whether each includes a correct answer. To ensure the external validity of the study, it is desirable to use more queries to assess the system's performance, but this is a costly activity that must be done manually. Future research needs to evaluate the system on more queries. It is also necessary to develop a Korean benchmark data set for information extraction from multi-source web documents, to build an environment in which results can be evaluated more objectively.
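The extraction step described above assigns a tag to every token of a suitable sentence and reads the answer span off the tag sequence (BIO-style tagging is the usual convention for such sequence taggers). The sketch below shows only that decoding step; the bidirectional LSTM-CRF that produces the tags is omitted, and the tokens, tag names, and example sentence are illustrative:

```python
# Decode (label, phrase) spans from a BIO-tagged token sequence, as a
# sequence-tagging information extractor would after the tagger runs.
def spans_from_bio(tokens, tags):
    """Collect (label, phrase) spans from BIO-tagged tokens."""
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:                      # close the previous span
                spans.append((label, " ".join(current)))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)            # continue the open span
        else:                                # "O" or an inconsistent tag
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:                              # flush a span ending the sentence
        spans.append((label, " ".join(current)))
    return spans

tokens = ["Yi", "Sun-sin", "was", "born", "in", "Seoul"]
tags = ["B-ANSWER", "I-ANSWER", "O", "O", "O", "B-ANSWER"]
print(spans_from_bio(tokens, tags))
# → [('ANSWER', 'Yi Sun-sin'), ('ANSWER', 'Seoul')]
```

In the paper's pipeline each extracted span would additionally carry the confidence derived from the document- and sentence-suitability steps.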

Improving Bidirectional LSTM-CRF model Of Sequence Tagging by using Ontology knowledge based feature (온톨로지 지식 기반 특성치를 활용한 Bidirectional LSTM-CRF 모델의 시퀀스 태깅 성능 향상에 관한 연구)

  • Jin, Seunghee;Jang, Heewon;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.253-266 / 2018
  • This paper proposes applying a sequence tagging methodology to improve the performance of NER (Named Entity Recognition) used in QA systems. To retrieve the correct answers stored in a database, the user's query must be translated into a database language such as SQL (Structured Query Language) so that the computer can process it; this requires identifying the class or data names contained in the database. The existing method of looking up the words of the query in the database and recognizing entities cannot distinguish homonyms or multiword phrases, because it does not consider the context of the query. When there are multiple search results, all of them are returned, so the query admits many interpretations and the time complexity of the computation becomes large. To overcome these problems, this study reflects the contextual meaning of the query using a bidirectional LSTM-CRF. We also address a disadvantage of neural network models, namely that they cannot identify untrained words, by using an ontology-knowledge-based feature. Experiments were conducted on an ontology knowledge base of the music domain and the performance was evaluated. To accurately evaluate the performance of the bidirectional LSTM-CRF proposed in this study, words included in the training queries were converted into untrained words, testing whether words that are in the database but unseen in training are still correctly identified. As a result, the model recognizes entities in context and recognizes untrained words without retraining, and the overall entity recognition performance is confirmed to improve.
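The intuition behind the ontology-based feature can be shown with a gazetteer lookup: even when a token never appeared in training, a lookup in domain knowledge supplies a class feature the tagger can rely on. The tiny music-domain entries, class names, and two-token matching rule below are illustrative assumptions, not the paper's actual knowledge base:

```python
# Per-token ontology-class features from a toy music-domain gazetteer.
# An unseen token like "Imagine" still yields a SONG feature, which is
# what lets the tagger handle untrained words.
gazetteer = {
    "imagine": "SONG",
    "yesterday": "SONG",
    "beatles": "ARTIST",
    "abbey road": "ALBUM",
}

def ontology_features(tokens):
    """One feature per token: its ontology class, or 'O' if unknown."""
    feats = []
    i = 0
    while i < len(tokens):
        # Prefer a two-token match so multiword names are covered.
        bigram = " ".join(tokens[i:i + 2]).lower() if i + 1 < len(tokens) else None
        if bigram in gazetteer:
            feats += [gazetteer[bigram]] * 2
            i += 2
        else:
            feats.append(gazetteer.get(tokens[i].lower(), "O"))
            i += 1
    return feats

print(ontology_features(["Play", "Imagine", "by", "Beatles"]))
# → ['O', 'SONG', 'O', 'ARTIST']
```

In the paper's model these class features are concatenated with the word embeddings fed into the bidirectional LSTM-CRF, rather than used on their own.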

The Gains To Bidding Firms' Stock Returns From Merger (기업합병의 성과에 영향을 주는 요인에 대한 실증적 연구)

  • Kim, Yong-Kap
    • Management & Information Systems Review / v.23 / pp.41-74 / 2007
  • In Korea, corporate merger activity has been brisk since 1980, and recently (particularly since 1986) changes in domestic and international economic circumstances have given corporate managers strong interest in mergers. Korea and America have different business environments, and it is easy to imagine many differences in the motives, methods, and effects of mergers between the two countries. According to recent American studies, takeover bids carry information effects, tax implications, and co-insurance effects; and the form of payment (cash versus securities), the relative size of target and bidder, the leverage effect, Tobin's q, the number of bidders (single versus multiple), the time period (before 1968, 1968-1980, 1981 and later), and the target firm's reaction (hostile versus friendly) are important determinants of the magnitude of takeover gains and their distribution between targets and bidders at the announcement of takeover bids. This study reviews the theory of takeover bids and the status and problems of mergers in Korea, then investigates how merger announcements are reflected in the common stock returns of bidding firms, and finally explores empirically the factors influencing abnormal returns on bidding firms' stock. The hypotheses of this study are as follows: shareholders of bidding firms benefit from mergers; and the common stock returns of bidding firms at the announcement of takeover bids show significant differences according to the ratio of target size to bidding firm size, whether the target belongs to the same conglomerate as the bidding firm, whether the target is a listed company, the time period (before 1986, and 1986 and later), the number of bidding-firm shares exchanged for each target share, whether the merger is horizontal/vertical or conglomerate, and the debt-to-equity ratios of target and bidding firm.
The data analyzed in this study were drawn from public announcements of proposals to acquire a target firm by merger. The sample contains all bidding firms that were listed on the stock market, completed successful mergers in the period 1980 through 1992, and have daily stock returns available. A merger bid was considered successful if it resulted in a completed merger and the target firm disappeared as a separate entity. The final sample contains 113 acquiring firms. The research hypotheses are tested by applying an event-study methodology similar to that described in Dodd and Warner. The ordinary-least-squares coefficients of the market-model regression were estimated over the period t=-135 to t=-16 relative to the date of the proposal's initial announcement, t=0. Daily abnormal common stock returns were calculated for each firm i over the interval t=-15 to t=+15, and a daily average abnormal return ($AR_t$) was computed for each day t. Average cumulative abnormal returns ($CAR_{T_1,T_2}$) were derived by summing the $AR_t$'s over various intervals. The expected values of $AR_t$ and $CAR_{T_1,T_2}$ are zero in the absence of abnormal performance. The test statistics for $AR_t$ and $CAR_{T_1,T_2}$ are based on the average standardized abnormal return ($ASAR_t$) and the average standardized cumulative abnormal return ($ASCAR_{T_1,T_2}$), respectively. Assuming that the individual abnormal returns are normal and independent across t and across securities, the statistics $Z_t$ and $Z_{T_1,T_2}$, which follow a unit-normal distribution (Dodd and Warner), are used to test the hypotheses that the average standardized abnormal returns and the average cumulative standardized abnormal returns equal zero.

  • PDF
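The abnormal-return computation at the heart of this event-study methodology can be sketched numerically: the market model gives an expected return, the daily abnormal return $AR_t$ is the residual, and $CAR$ is the sum over the event window. The returns and coefficients below are made-up toy values; in the study, alpha and beta come from an OLS fit over t=-135 to t=-16:

```python
# Market-model abnormal returns AR_t and their cumulative sum CAR over
# a (-1, +1) event window. All numbers are illustrative toy data, not
# from the study's sample.
alpha, beta = 0.0002, 1.1          # market-model coefficients (assumed, not fitted here)

market = [0.010, -0.005, 0.002]    # market returns over t = -1, 0, +1
stock = [0.018, -0.001, 0.004]     # bidder stock returns, same window

# AR_t = R_t - (alpha + beta * R_mt): actual return minus expected return.
ar = [r - (alpha + beta * m) for r, m in zip(stock, market)]
car = sum(ar)                      # CAR over the (-1, +1) window

for t, a in zip(range(-1, 2), ar):
    print(f"AR_{t:+d} = {a:+.4f}")
print(f"CAR(-1,+1) = {car:+.4f}")
```

The study's test statistics then standardize these quantities by their estimation-period standard errors and average across the 113 firms; that aggregation is omitted here.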