• Title/Summary/Keyword: Query Extension

The National Standard Real Situation Conformance Test System for a Nation-wide Interoperable Transportation Card (전국호환 교통카드 국가 표준 실환경 적합성 평가)

  • Nam, Na-kyung;Lee, Soo-kyung;Lee, Ki-han
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.15 no.3 / pp.68-76 / 2016
  • The pre-paid nation-wide interoperable transportation card, which can pay bus, subway, train, and highway fares with a single card, was released in June 2014. Service started in Seoul, Gyeonggi, and the major metropolitan areas. In this paper, we evaluate the conformance and interoperability of the nation-wide interoperable transportation card system in a real environment after the start of service, and thereby check the status of its technical operation. To do so, we chose 6 regions, including Seoul and Gyeonggi, that are served by different transport vendors and checked the recognition and billing results at transportation card terminals in the field. As a result, we found that the major nation-wide interoperable transportation cards operate normally and deliver the CONFIG DF query command. This means that the nation-wide interoperable transportation card system, which uses only one card, has adapted stably to the public transport system and can make public transport more convenient for users through the extension of the service area.

Multi-class Support Vector Machines Model Based Clustering for Hierarchical Document Categorization in Big Data Environment (빅 데이터 환경에서 계층적 문서 유형 분류를 위한 클러스터링 기반 다중 SVM 모델)

  • Kim, Young Soo;Lee, Byoung Yup
    • The Journal of the Korea Contents Association / v.17 no.11 / pp.600-608 / 2017
  • Recently, data volumes have been growing exponentially with the rapid expansion of the Internet. Since users need only part of the available information, they carry a heavy workload in examining and discovering the necessary content. Therefore, information retrieval must provide hierarchical class information and a priority for examination by evaluating the similarity between queries and documents. In this paper, we propose a clustering-based multi-class support vector machine model for hierarchical document categorization that makes semantic search possible by considering word co-occurrence measures. The combination of hierarchical document categorization and an SVM classifier gives high performance for the analytical classification of web documents, which increase exponentially as the document hierarchy is extended. More information retrieval systems are expected to use our proposed model in their development and to provide accurate and rapid information retrieval services.

A design and implementation of DIDL mapping system preserving semantic constraints (의미적 제약조건을 보존하는 DIDL 매핑 시스템의 설계 및 구현)

  • 송정석;김우생
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.5B / pp.482-490 / 2004
  • Recently, XML has been emerging as a standard for storing and exchanging data for various distributed applications based on the Internet. Since there are increasing demands to store and manage XML documents, much research is under way in this area to develop new tools and techniques based on XML. However, most of this research concentrates on mapping techniques based on instances or DTDs, and the main focus is on structural transformation. The current research trend is toward the use of XML documents based on XML Schema, which demands not only conversion of structure but also preservation of semantic constraints. This paper takes DIDL, which is based on XML Schema from MPEG-21, as an application domain and proposes a mapping model that additionally preserves semantic constraints. We extend previous research techniques in the preprocessing step for the specific domain and then apply various new mapping methods in the postprocessing step. We present and discuss the system architecture for implementation, and describe the algorithms, the implementation environment, and the semantic extension methodology in detail. Finally, we show actual tables and query processing based on our proposal.

Wordnet Extension for IT terminology Using Web Search (웹 검색을 활용한 워드넷에서의 IT 전문 용어 확장)

  • Park, Kyeong-Kook;Lee, Kwang-Mo;Kim, Yu-Seop
    • Annual Conference on Human and Language Technology / 2007.10a / pp.189-193 / 2007
  • In this paper, we designed a methodology to expand WordNet. We added unknown terms, such as IT technical terms, to the existing WordNet by using web search. WordNet is an online taxonomy representing the relationships among terms, but it is usually limited in its coverage of new technical terminology, which is why we tried to expand it. First, when we met terms not registered in WordNet, we built a query from those terms for web search. Given the web search results, we tried to find terms with a high degree of relatedness to the unregistered terms. We used a Korean morphological analyzer to score the relatedness between terms and placed each unregistered term as a hyponym of the terms with high relatedness scores.

Development of an Editor for Reference Data Library Based on ISO 15926 (ISO 15926 기반의 참조 데이터 라이브러리 편집기의 개발)

  • Jeon, Youngjun;Byon, Su-Jin;Mun, Duhwan
    • Korean Journal of Computational Design and Engineering / v.19 no.4 / pp.390-401 / 2014
  • ISO 15926 is an international standard for the integration of lifecycle data for process plants, including oil and gas facilities. From the viewpoint of information modeling, ISO 15926 Part 2 provides the general data model that is designed to be used in conjunction with reference data. Reference data are standard instances that represent classes, objects, properties, and templates common to a number of users, process plants, or both. ISO 15926 Parts 4 and 7 provide the initial set of classes, objects, and properties and the initial set of templates, respectively. User-defined reference data specific to companies or organizations are defined by inheriting from the initial reference data and the initial set of templates. To support the extension of reference data and templates, an editor that provides creation, deletion, and modification functions for user-defined reference data is needed. In this study, an editor for reference data based on ISO 15926 was developed. Sample reference data were encoded in OWL (Web Ontology Language) according to the specification of ISO 15926 Part 8. iRINGTools and dot15926Editor were benchmarked for the design of the GUI (graphical user interface). Reference data search, creation, modification, and deletion functions were implemented with XML (Extensible Markup Language) DOM (Document Object Model) and SPARQL (SPARQL Protocol and RDF Query Language).
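
The search, creation, modification, and deletion functions named in the abstract above can be illustrated with Python's standard `xml.dom.minidom` module. This is a minimal sketch, not the authors' editor: the element name `class`, the attributes `id`, `parent`, and `label`, and the sample class names are hypothetical, and the toy document does not follow the OWL encoding of ISO 15926 Part 8.

```python
from xml.dom import minidom

# Hypothetical, simplified library; real ISO 15926 reference data is encoded in OWL.
doc = minidom.parseString('<library><class id="Pump" parent="Equipment"/></library>')
root = doc.documentElement


def find_class(class_id):
    """Search: return the <class> element with the given id, or None."""
    for node in root.getElementsByTagName("class"):
        if node.getAttribute("id") == class_id:
            return node
    return None


def create_class(class_id, parent_id):
    """Create: add a user-defined class that inherits from an existing one."""
    if find_class(parent_id) is None:
        raise ValueError(f"unknown parent class: {parent_id}")
    node = doc.createElement("class")
    node.setAttribute("id", class_id)
    node.setAttribute("parent", parent_id)
    root.appendChild(node)
    return node


def delete_class(class_id):
    """Delete: remove the class element if it exists; return True on success."""
    node = find_class(class_id)
    if node is None:
        return False
    root.removeChild(node)
    return True


create_class("CentrifugalPump", "Pump")                    # create
find_class("CentrifugalPump").setAttribute("label", "CP")  # modify
ids_before = [n.getAttribute("id") for n in root.getElementsByTagName("class")]
delete_class("Pump")                                       # delete
ids_after = [n.getAttribute("id") for n in root.getElementsByTagName("class")]
```

Requiring an existing parent in `create_class` mirrors the idea that user-defined reference data inherit from the initial set; a real editor would also need to decide what happens to child classes when a parent is deleted.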

Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

  • Choi, Hyunseung;Kim, Mintae;Kim, Wooju;Shin, Dongwook;Lee, Yong Hun
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.111-136 / 2018
  • In this paper, we propose a methodology for extracting answer information for queries from various types of unstructured documents collected from multiple sources on the web in order to expand a knowledge base. The proposed methodology is divided into the following steps. 1) Collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news sources for "subject-predicate" separated queries and classify the proper documents. 2) Determine whether each sentence is suitable for extracting information and derive the confidence. 3) Based on the predicate feature, extract the information from the proper sentences and derive the overall confidence of the information extraction result. To evaluate the performance of the information extraction system, we selected 400 queries from the artificial intelligence speaker of SK Telecom. The proposed system showed a higher performance index than the baseline model. The contribution of this study is a sequence tagging model based on bi-directional LSTM-CRF that uses the predicate feature of the query; with it, we developed a robust model that maintains high recall even on various types of unstructured documents collected from multiple sources. Information extraction for knowledge base extension must take into account the heterogeneous characteristics of source-specific document types. The proposed methodology extracted information effectively from various types of unstructured documents compared to the baseline model, whereas previous research performs poorly when extracting information from document types that differ from the training data.
In addition, by predicting the suitability of documents and sentences for information extraction before the extraction step, this study can prevent unnecessary extraction attempts on documents that do not contain the answer. It is meaningful that we provide a method by which precision can be maintained even in an actual web environment. Information extraction for knowledge base expansion targets unstructured documents on the real web, so it cannot be guaranteed that a document contains the correct answer. When question answering is performed on the real web, previous machine reading comprehension studies show low precision because they frequently attempt to extract an answer even from documents that contain no correct answer. The policy of predicting the suitability of documents and sentences for information extraction is meaningful in that it helps maintain extraction performance even in a real web environment. The limitations of this study and future research directions are as follows. The first is a problem of data preprocessing. In this study, the unit of knowledge extraction is determined through morphological analysis based on the open-source KoNLPy Python package, and the extraction result can be wrong when the morphological analysis is not performed properly. To improve the information extraction results, a more advanced morpheme analyzer needs to be developed. The second is a problem of entity ambiguity. The information extraction system of this study cannot distinguish identical names that refer to different entities. If several people with the same name appear in the news, the system may not extract information about the one intended by the query.
Future research needs to take measures to identify people with the same name. The third is a problem of evaluation query data. In this study, we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker to evaluate the performance of the information extraction system. We developed an evaluation data set of 2,800 documents (400 questions * 7 articles per question: 1 Wikipedia, 3 Naver encyclopedia, 3 Naver news) by judging whether each contains a correct answer. To ensure the external validity of the study, it is desirable to use more queries to determine the performance of the system, but this is a costly activity that must be done manually. Future research needs to evaluate the system on more queries. It is also necessary to develop a Korean benchmark data set for information extraction systems that answer queries from multi-source web documents, to build an environment in which results can be evaluated more objectively.
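
The suitability check described in the abstract above, which skips extraction for documents and sentences unlikely to contain an answer, can be sketched as a simple two-stage gate. Everything here is a hypothetical stand-in: the keyword-overlap scorer replaces the paper's learned suitability models, the threshold of 0.5 is arbitrary, and the `extractor` argument is a placeholder for the bi-directional LSTM-CRF tagger.

```python
import re


def suitability(text, query_terms):
    """Hypothetical scorer: fraction of query terms that appear in the text."""
    words = set(re.findall(r"\w+", text.lower()))
    hits = sum(1 for term in query_terms if term.lower() in words)
    return hits / len(query_terms) if query_terms else 0.0


def extract_answers(documents, query_terms, extractor, threshold=0.5):
    """Run the extractor only on sentences that pass both suitability gates."""
    results = []
    for doc in documents:
        # Gate 1: skip whole documents unlikely to contain the answer.
        if suitability(doc, query_terms) < threshold:
            continue
        for sentence in doc.split("."):
            sentence = sentence.strip()
            if not sentence:
                continue
            # Gate 2: skip unsuitable sentences inside a suitable document.
            confidence = suitability(sentence, query_terms)
            if confidence < threshold:
                continue
            results.append((extractor(sentence), confidence))
    return results


docs = [
    "Seoul is the capital of South Korea. It lies on the Han River.",
    "The weather was mild yesterday. Traffic was light.",
]
# Placeholder extractor: returns the first word of the sentence.
answers = extract_answers(docs, ["capital", "Korea"], lambda s: s.split()[0])
```

Gating before extraction trades a small risk of missing an answer for fewer spurious extractions, which is the precision argument the abstract makes.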

A Data Model for an Object-based Faceted Thesaurus System Supporting Multiple Dimensions of View in a Visualized Environment (시각화된 환경에서 다차원 관점을 지원하는 객체기반 패싯 시소러스 관리 시스템 모델의 정형화 및 구현)

  • Kim, Won-Jung;Yang, Jae-Dong
    • Journal of KIISE:Software and Applications / v.34 no.9 / pp.828-847 / 2007
  • In this paper, we propose a formal data model of an object-based thesaurus system supporting multi-dimensional facets. According to facets reflecting respective user perspectives, it supports systematic construction, browsing, navigation, and referencing of thesauri. Unlike other faceted thesaurus systems, it systematically manages its complexity by appropriately abstracting sophisticated conceptual structure through visualized browsing and navigation as well as construction. Browsing and navigation are performed by dynamically generating multi-dimensional virtual thesaurus hierarchies called "faceted thesaurus hierarchies." The hierarchies are automatically constructed by combining facets, each representing a dimension of view. Such automatic construction makes flexible extension of thesauri possible, because they can be easily upgraded by pure insertion or deletion of facets. With a well-defined set of self-referential queries, the thesauri can also be effectively referenced from multiple viewpoints, since they are structured by appropriately interpreting the semantics of instances based on facets. We first formalize the underlying model and then implement its prototype to demonstrate its feasibility.

DGR-Tree : An Efficient Index Structure for POI Search in Ubiquitous Location Based Services (DGR-Tree : u-LBS에서 POI의 검색을 위한 효율적인 인덱스 구조)

  • Lee, Deuk-Woo;Kang, Hong-Koo;Lee, Ki-Young;Han, Ki-Joon
    • Journal of Korea Spatial Information System Society / v.11 no.3 / pp.55-62 / 2009
  • Location-based services in the ubiquitous computing environment, namely u-LBS, use very large and skewed sets of spatial objects that are closely related to locational information. It is especially essential to achieve fast search for POIs (Points of Interest) related to the location of users. This paper examines how to search large and skewed POI data efficiently in the u-LBS environment. We propose the Dynamic-level Grid based R-Tree (DGR-Tree), an index for point data that can reduce the cost of stationary POI search. DGR-Tree uses an R-Tree as the primary index and a Dynamic-level Grid as the secondary index. DGR-Tree is optimized for point data and solves the overlapping problem among leaf nodes. The Dynamic-level Grid of DGR-Tree is created dynamically according to the density of POIs. Each cell in the Dynamic-level Grid has a leaf node pointer for direct access to the corresponding leaf node of the primary index. Therefore, index access performance is greatly improved by accessing the leaf node directly through the Dynamic-level Grid. We also propose a K-Nearest Neighbor (KNN) algorithm for DGR-Tree, which uses the Dynamic-level Grid for fast access to candidate cells. The KNN algorithm provides a mechanism that directly accesses the cells enclosing a given query point and their adjacent cells without tree traversal. It also minimizes the cost of sorting candidate lists by minimum distance and provides an NEB (Non Extensible Boundary), which removes the need to consider the extension of candidate nodes for KNN search.
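
The idea of reaching candidate cells directly through a grid, without tree traversal, can be sketched with a fixed-resolution grid. This is a simplified illustration under stated assumptions, not the DGR-Tree itself: the grid has a single level rather than density-dependent levels, cells hold points directly instead of leaf-node pointers into an R-Tree, and the class name `GridIndex` and its stopping rule are hypothetical.

```python
import heapq
import math
from collections import defaultdict


class GridIndex:
    """Uniform grid over 2-D points; each cell stores the points inside it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, x, y):
        self.cells[self._cell(x, y)].append((x, y))

    def knn(self, qx, qy, k):
        """Return the k nearest points, expanding ring by ring around the query cell."""
        if not self.cells:
            return []
        cx, cy = self._cell(qx, qy)
        # Farthest ring that still contains an occupied cell.
        max_ring = max(max(abs(ix - cx), abs(iy - cy)) for ix, iy in self.cells)
        best = []  # max-heap via negated distance: (-distance, point)
        for ring in range(max_ring + 1):
            # Any point in this ring is at least (ring - 1) * cell_size away,
            # so stop once the current k-th best distance is within that bound.
            if len(best) == k and (ring - 1) * self.cell_size >= -best[0][0]:
                break
            for ix in range(cx - ring, cx + ring + 1):
                for iy in range(cy - ring, cy + ring + 1):
                    if max(abs(ix - cx), abs(iy - cy)) != ring:
                        continue  # visit only the border cells of this ring
                    for p in self.cells.get((ix, iy), ()):
                        d = math.hypot(p[0] - qx, p[1] - qy)
                        if len(best) < k:
                            heapq.heappush(best, (-d, p))
                        elif d < -best[0][0]:
                            heapq.heapreplace(best, (-d, p))
        return [p for _, p in sorted(best, key=lambda item: -item[0])]


index = GridIndex(cell_size=1.0)
for point in [(0.5, 0.5), (1.5, 0.5), (5.0, 5.0), (0.2, 0.9)]:
    index.insert(*point)
nearest = index.knn(0.4, 0.6, k=2)
```

The early exit plays a role loosely similar to the boundary the abstract calls NEB: once no unvisited ring can hold a closer point, the search stops expanding.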

Error-Tolerant Music Information Retrieval Method Using Query-by-Humming (허밍 질의를 이용한 오류에 강한 악곡 정보 검색 기법)

  • 정현열;허성필
    • The Journal of the Acoustical Society of Korea / v.23 no.6 / pp.488-496 / 2004
  • This paper describes a music information retrieval system that uses humming as the key for retrieval. Humming is an easy way for the user to input a melody. However, there are several problems with humming that degrade the retrieval of information. One problem is a human factor: people sometimes do not sing accurately, especially if they are inexperienced or unaccompanied. Another problem arises from signal processing. Therefore, a music information retrieval method should be sufficiently robust to surmount various humming errors and signal processing problems. A retrieval system has to extract pitch from the user's humming. However, pitch extraction is not perfect. It often captures half or double pitches, even if the extraction algorithms take the continuity of the pitch into account. Considering these problems, we propose a system that takes multiple pitch candidates into account. In addition to the frequencies of the pitch candidates, the confidence measures obtained from their powers are taken into consideration as well. We also propose the use of a three-dimensional algorithm that is an extension of the conventional DP algorithm, so that multiple pitch candidates can be treated. Moreover, in the proposed algorithm, DP paths are changed dynamically to take the deltaPitches and IOIratios of input and reference notes into account in order to handle notes being split or unified. We carried out an evaluation experiment to compare the proposed system with a conventional system. In the experiment, the proposed method gave better retrieval performance than the conventional system.
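
The idea of keeping several pitch candidates per hummed note, weighted by confidence, can be sketched with a dynamic-programming alignment. This is a simplified illustration, not the authors' three-dimensional DP: it uses plain DTW-style transitions, folds candidate selection into the local cost, ignores IOI ratios and note splitting or merging, and the cost formula (pitch difference in semitones divided by confidence) is a hypothetical choice.

```python
def local_cost(candidates, ref_pitch):
    """Cheapest candidate for one hummed note against one reference note.

    candidates: list of (pitch_in_semitones, confidence in (0, 1]).
    Low-confidence candidates are penalised by dividing by confidence.
    """
    return min(abs(pitch - ref_pitch) / conf for pitch, conf in candidates)


def match_cost(query, reference):
    """DTW-style alignment cost between hummed notes and a reference melody."""
    n, m = len(query), len(reference)
    inf = float("inf")
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Allow a diagonal step, or repeating a query or reference note.
            step = min(dp[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1])
            dp[i][j] = step + local_cost(query[i - 1], reference[j - 1])
    return dp[n][m]


# Second hummed note: the tracker reports a double-pitch error (74) as its
# top candidate, with the true pitch (62) as a lower-confidence alternative.
query = [[(60, 1.0)], [(74, 0.8), (62, 0.5)], [(64, 1.0)]]
song_a = [60, 62, 64]
song_b = [60, 67, 72]
cost_a = match_cost(query, song_a)
cost_b = match_cost(query, song_b)
```

With only the top candidate for the second note, `song_a` would no longer match at zero cost, which is the kind of half or double pitch error that multiple candidates are meant to absorb.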

Geocoding Scheme for Multimedia in Indoor Space Based on IndoorGML (IndoorGML을 활용한 실내공간 멀티미디어 위치 인코딩 방법)

  • Li, Ki Joune
    • Spatial Information Research / v.21 no.4 / pp.35-45 / 2013
  • Most multimedia contains location information, whether implicit or explicit, which is very useful for several purposes. In particular, we may use location information in defining query conditions to retrieve relevant multimedia. For this reason, a number of works have addressed organizing and retrieving geo-referenced multimedia data. However, they mostly focus on outdoor space, where position is identified by (x, y, z) coordinates. In this paper, we focus on multimedia in an alternative space, indoor space, which differs from outdoor space in several aspects. First, indoor space is considered a symbolic space, where location is identified by a symbolic code such as a room number rather than coordinates. Second, topological information is a crucial element in providing indoor spatial information services. Third, indoor space is at a finer scale than outdoor space, which influences how the visibility of cameras is determined. Based on these characteristics of indoor space, we survey the requirements of management systems for indoor geo-referenced multimedia. Then we propose a geo-coding scheme for multimedia in indoor space as an extension of IndoorGML, an OGC (Open Geospatial Consortium) candidate standard for indoor spatial information. We also present a prototype system called IngC (INdoor Geo-Coding), developed to store and manage indoor geo-referenced multimedia.