• Title/Summary/Keyword: Semi-Structured Data


An Efficient Schema Extracting Technique Using DTD in XML Documents (DTD를 이용한 XML문서의 효율적인 스키마 추출 기법)

  • Ahn, Sung-Eun;Choi, Hwang-Kyu
    • Journal of Industrial Technology / v.21 no.A / pp.141-146 / 2001
  • XML is fast emerging as the dominant standard for representing and exchanging data on the Web. As the amount of data available on the Web has increased dramatically in recent years, that data resides in different forms, ranging from semi-structured data to highly structured data in relational databases. As semi-structured data will be represented in XML, XML will improve the usability of semi-structured data. In this paper, we propose an idea for extracting a schema from an XML document using its DTD.
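
As a rough illustration of what extracting a schema from a DTD can look like, the following minimal Python sketch reads `<!ELEMENT ...>` declarations and maps each element to the child elements named in its content model. It is not the authors' technique, which the abstract does not detail. The function name `extract_schema` and the sample DTD are hypothetical, and the regular expression assumes simple declarations without parameter entities or comments.

```python
import re

# Matches declarations such as <!ELEMENT book (title, author+)>.
# Assumption: no parameter entities, comments, or '>' inside the content model.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+(.*?)>", re.DOTALL)


def extract_schema(dtd_text):
    """Map each declared element name to the child element names in its content model."""
    schema = {}
    for name, model in ELEMENT_DECL.findall(dtd_text):
        # Keep element names only; skip keywords such as #PCDATA, EMPTY, and ANY.
        children = [
            token
            for token in re.findall(r"#?[\w.:-]+", model)
            if not token.startswith("#") and token not in ("EMPTY", "ANY")
        ]
        schema[name] = children
    return schema


# Hypothetical sample DTD, used only for illustration.
SAMPLE_DTD = """
<!ELEMENT library (book*)>
<!ELEMENT book (title, author+, year?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
"""

schema = extract_schema(SAMPLE_DTD)
for element, children in schema.items():
    print(element, "->", children)
```

The sketch discards occurrence indicators (`*`, `+`, `?`), so it recovers only the parent-child structure, not cardinalities.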

An effective approach to generate Wikipedia infobox of movie domain using semi-structured data

  • Bhuiyan, Hanif;Oh, Kyeong-Jin;Hong, Myung-Duk;Jo, Geun-Sik
    • Journal of Internet Computing and Services / v.18 no.3 / pp.49-61 / 2017
  • Wikipedia infoboxes have emerged as an important structured information source on the web. Composing an infobox for an article requires a considerable amount of manual effort from the author. Because of this manual involvement, infoboxes suffer from inconsistency, data heterogeneity, incompleteness, schema drift, and similar problems. Prior work attempted to solve these problems by generating infoboxes automatically from the corresponding article text. However, many Wikipedia articles do not have enough text content to generate an infobox. In this paper, we present an automated approach that generates infoboxes for the movie domain of Wikipedia by extracting information from several web sources instead of relying on article text alone. The proposed methodology uses semantic relations in the article content together with semi-structured information available on the web. It passes the article text through several classification steps to identify the template from a large pool of templates. Finally, it extracts the information for the corresponding template attributes from the web and generates the infobox. A comprehensive experimental evaluation showed the proposed scheme to be an effective and efficient approach to generating Wikipedia infoboxes.

An Extended Dynamic Schema for Storing Semi-structured Data

  • Nakata, Mitsuru;Ge, Qi-Wei;Hochin, Teruhisa;Tsuji, Tatsuo
    • Proceedings of the IEEK Conference / 2002.07a / pp.301-304 / 2002
  • Database technologies have recently come into common use. However, ordinary technologies are not suitable for constructing a complicated database such as a classical literature database or an archaeological relics database. Because these kinds of data are semi-structured and have no regular structure, a database schema cannot be defined before the database is built. We have proposed the DREAM model for semi-structured databases. In this model, a database consists of five elements, and the model has operations similar to those of set theory. We have also introduced a dynamic schema, the "shape", which shows the structure of each element. We have already realized a prototype DBMS that adopts the DREAM model (DREAM DBMS) and a function for constructing shapes. However, a shape is insufficient for describing database structures because it cannot express nested structures of elements. In this paper, we propose the "shape graph", a dynamic schema that shows database structures more exactly, and extend the DREAM DBMS accordingly. We also evaluate the performance of the functions that construct shapes and shape graphs.

DISSECTION TECHNIQUE FOR EFFICIENT JOIN OPERATION ON SEMI-STRUCTURED DOCUMENT STREAM

  • Seo, Dong-Hyeok;Lee, Dong-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2007.10a / pp.11-13 / 2007
  • There has been much interest in stream query processing. Various index techniques and advanced join techniques have been proposed to process data stream queries efficiently, and previous proposals support rapid responses to such queries. However, the volume of stream data is increasing, and data stream query processing needs to be faster than before. In this paper, we propose novel query processing techniques for a large incoming stream of documents. We propose the Dissection Technique for efficient query processing in the data stream environment, focusing on its use in join query processing. Our technique shows efficient performance compared with other proposals for data streams. The proposed technique is applied to a sensor network system and an XML database.

The Preliminary Feasibility on Big Data Analytic Application in Construction

  • Ko, Yongho;Han, Seungwoo
    • International conference on construction engineering and project management / 2015.10a / pp.276-279 / 2015
  • As the quantity of data has increased across industries, the construction industry has also developed various systems for collecting data related to construction performance, such as the productivity and costs achieved at construction job sites. Numerous researchers worldwide have focused on developing efficient methodologies to analyze such data. However, these methodologies have shown serious limitations in practice, owing to a lack of data and the difficulty of finding appropriate analytic methodologies capable of yielding significant insights. With the development of information technology, a new trend in analytic methodologies has been introduced and has grown rapidly under the name "big data analysis" in various fields of academia and industry. The concept of big data can be applied to the analysis of construction data in various formats: structured, semi-structured, or unstructured. This study investigates preliminary application methods based on data collected from an actual construction site. This preliminary investigation is expected to assess the fundamental feasibility of big data analytic applications in construction.

New Data Model for Efficient Search and Reusability of XML Documents (XML 문서의 효율적인 검색과 재사용성을 지원하는 데이터 모델)

  • Kim Eun-Young;Chun Se-Hak
    • Journal of Intelligence and Information Systems / v.10 no.3 / pp.27-37 / 2004
  • XML has been proposed as a document standard for representing and exchanging data on the WWW, and it is also becoming a standard for searching and reusing scattered documents. When implementing an XML content management system, special consideration should be given to how to model the data and how to store the modelled data so that semi-structured data can be managed effectively and efficiently. In this paper, we propose a new data model for the storage and search of XML document data. The proposed data model can represent both the data view and the structure view of XML documents, and can be applied to new data systems for XML documents as well as to existing data systems.

An Efficient Disk Block Allocation Method for XML Data (XML 데이타를 위한 효율적인 디스크 블록 할당 방법)

  • Kim, Jung-Hoon;Son, Jin-Hyun;Chung, Yon-Dohn;Kim, Myoung-Ho
    • Journal of KIISE:Databases / v.34 no.5 / pp.465-472 / 2007
  • With the recent proliferation of semi-structured data such as XML, it has become more important to store and manage such data efficiently. XML data can be logically modelled as a rooted tree, e.g., the DOM tree. To process a query on XML data, we traverse the tree structure. In this paper, we present an algorithm that places XML data in disk blocks. The proposed algorithm assigns a number to each node of the tree in a bottom-up fashion. The nodes are then allocated to disk blocks using the assigned numbers. The proposed algorithm does not need access pattern information and provides good performance for any access pattern. The characteristics of the proposed method are presented with analysis, and its performance is evaluated through experiments.
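
The two steps named in the abstract (bottom-up numbering, then allocation by number) can be sketched as follows. The abstract does not give the paper's actual numbering rule, so this sketch substitutes a plain post-order traversal, which is one simple bottom-up numbering, and fills fixed-capacity blocks in number order. The function names, the `nodes_per_block` parameter, and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET


def number_bottom_up(root):
    """Return the elements in post-order, so each child precedes its parent."""
    order = []

    def visit(node):
        for child in node:
            visit(child)
        order.append(node)

    visit(root)
    return order


def allocate_blocks(root, nodes_per_block):
    """Group nodes into fixed-capacity blocks according to their bottom-up number.

    Assumption: every node has the same size, so a block holds a fixed node count.
    """
    blocks = {}
    for number, node in enumerate(number_bottom_up(root)):
        blocks.setdefault(number // nodes_per_block, []).append(node.tag)
    return blocks


# Hypothetical sample document, used only for illustration.
doc = ET.fromstring("<a><b><d/><e/></b><c><f/></c></a>")
blocks = allocate_blocks(doc, nodes_per_block=2)
for block_id, tags in blocks.items():
    print(block_id, tags)
```

With post-order numbering, sibling leaves tend to share a block, which is the kind of locality a placement scheme of this sort aims for; whether it matches the paper's results is not something this sketch can show.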

Querying of XML Documents using Oracle8i XDK (Oracle8i XDK를 이용한 XML 문서의 질의)

  • 하상호;이강석
    • Proceedings of the IEEK Conference / 2000.06c / pp.71-74 / 2000
  • New methods for storing and retrieving semi-structured data such as XML documents have recently been studied. In this paper, we are concerned with querying XML data stored in relational tables. We first describe a method of querying XML documents using the Oracle XDK, in particular the XSQL Servlet. We then apply the method to XML documents describing book information.
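
The general idea of querying XML data held in relational tables can be sketched without Oracle tooling. The example below does not use the Oracle XDK or the XSQL Servlet. It assumes Python's standard `sqlite3` and `xml.etree.ElementTree` modules as stand-ins, shreds book records into a table, runs an SQL query, and renders the rows back as XML. The table layout, function names, and sample data are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical book data, used only for illustration.
BOOKS_XML = """
<books>
  <book><title>XML Basics</title><author>Kim</author><price>20</price></book>
  <book><title>Databases</title><author>Lee</author><price>35</price></book>
</books>
"""


def load_books(conn, xml_text):
    """Shred each <book> element into one row of a relational table."""
    conn.execute("CREATE TABLE book (title TEXT, author TEXT, price REAL)")
    rows = [
        (b.findtext("title"), b.findtext("author"), float(b.findtext("price")))
        for b in ET.fromstring(xml_text).iter("book")
    ]
    conn.executemany("INSERT INTO book VALUES (?, ?, ?)", rows)


def query_as_xml(conn, max_price):
    """Run an SQL query and render the matching rows as an XML string."""
    result = ET.Element("result")
    cursor = conn.execute(
        "SELECT title, author FROM book WHERE price <= ? ORDER BY title",
        (max_price,),
    )
    for title, author in cursor:
        item = ET.SubElement(result, "book")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "author").text = author
    return ET.tostring(result, encoding="unicode")


conn = sqlite3.connect(":memory:")
load_books(conn, BOOKS_XML)
xml_out = query_as_xml(conn, 25)
print(xml_out)
```

Here the query itself is ordinary SQL and XML appears only at the input and output boundaries, which is the division of labour the abstract describes for relationally stored XML.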

A Study on Application of Web 3.0 Technologies in Small and Medium Enterprises of India

  • Potluri, Rajasekhara Mouly;Vajjhala, Narasimha Rao
    • The Journal of Asian Finance, Economics and Business / v.5 no.2 / pp.73-79 / 2018
  • The purpose of this study is to explore how small and medium enterprises (SMEs) in India have identified the opportunities and challenges in adopting Web 3.0 technologies to improve their productivity and efficiency. After an in-depth literature review, the researchers framed a semi-structured questionnaire with open-ended questions to collect responses from managers working in 40 Indian SMEs representing five key economic sectors. The collected data was analyzed, and themes were encoded using the NVivo 11 computer-assisted qualitative data analysis software. Content analysis was used to analyze the data collected in the semi-structured interviews. This study identified five key themes and 12 subthemes illustrating the key advantages and challenges as perceived by the managerial leadership of SMEs. The five key themes are integration of data and services, the creation of new functionalities, privacy and security, financial and technological challenges, and organizational challenges. The results of this study will benefit the organizational leadership of SMEs in planning and developing their short-term and long-term information systems strategies and will enable SME leaders to make optimal use of their information technology assets, improving the productivity and competitiveness of their firms. Web 3.0 technologies are considered emerging technologies, so the advantages and challenges of using them for SMEs have not been explored in the context of emerging economies such as India.

A Study on the Data-based WBS Model for Train Control System to Improve a Maintenance work (열차제어시스템 유지관리 업무 개선을 위한 데이터 기반 WBS 모델 연구)

  • Jeon, Jo Won;Kim, Young Min;Park, Bum
    • Journal of the Korean Society of Systems Engineering / v.18 no.1 / pp.99-104 / 2022
  • In this paper, to increase the maintenance efficiency of the urban railway train control system and to build a standard data system, we collect as much structured, unstructured, and semi-structured data as possible, including data obtained by sensing and monitoring the system status. We study data pre-processing functions (identification, purification, integration, transformation) based on effective data classification, together with a work classification system for maintenance activities. The purpose is to define a data matrix model, together with the WBS model, by considering its relationship with the data generated and managed in the O&M stage of the train control system operated by the urban railway, and to reflect and use it in practice.