• Title/Abstract/Keywords: Semi-Structured Data

An Efficient Schema Extracting Technique Using DTD in XML Documents

  • 안성은;최황규
    • 산업기술연구 / Vol. 21, No. A / pp. 141-146 / 2001
  • XML is fast emerging as the dominant standard for representing and exchanging data on the Web. As the amount of data available on the Web has increased dramatically in recent years, the data resides in different forms, ranging from semi-structured data to highly structured data in relational databases. Because semi-structured data will be represented in XML, XML will extend what can be done with such data. In this paper, we propose a technique for extracting a schema from an XML document using its DTD.
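The abstract does not describe the extraction procedure itself, so the following is only a minimal sketch of the general idea of reading structure out of a DTD, not the authors' technique. It assumes that a "schema" can be approximated as a map from each declared element to the child element names in its content model; the sample DTD and the function name `extract_schema` are hypothetical.

```python
import re

# Hypothetical sample DTD for illustration; not taken from the paper.
SAMPLE_DTD = """
<!ELEMENT library (book+)>
<!ELEMENT book (title, author*, year?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
"""

# Matches one element type declaration: <!ELEMENT name content-model>
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+(.*?)>", re.DOTALL)
# Matches an XML-style name inside a content model.
NAME = re.compile(r"[A-Za-z_][\w.:-]*")


def extract_schema(dtd_text):
    """Map each declared element to the child element names it may contain.

    Assumption: occurrence indicators (?, *, +) and grouping are ignored;
    only the parent-to-child relationship is kept.
    """
    schema = {}
    for name, model in ELEMENT_DECL.findall(dtd_text):
        # Remove #PCDATA so it is not mistaken for a child element name.
        cleaned = model.replace("#PCDATA", "")
        # EMPTY and ANY are content-model keywords, not child elements.
        children = [c for c in NAME.findall(cleaned) if c not in ("EMPTY", "ANY")]
        schema[name] = children
    return schema


if __name__ == "__main__":
    for element, children in extract_schema(SAMPLE_DTD).items():
        print(element, "->", children)
```

A regular expression is enough for simple declarations like these; a DTD that uses parameter entities or conditional sections would need a real DTD parser.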

An effective approach to generate Wikipedia infobox of movie domain using semi-structured data

  • Bhuiyan, Hanif;Oh, Kyeong-Jin;Hong, Myung-Duk;Jo, Geun-Sik
    • 인터넷정보학회논문지 / Vol. 18, No. 3 / pp. 49-61 / 2017
  • Wikipedia infoboxes have emerged as an important structured information source on the web. Composing an infobox for an article requires a considerable amount of manual effort from the author. Because of this manual involvement, infoboxes suffer from inconsistency, data heterogeneity, incompleteness, schema drift, and similar problems. Prior work attempted to solve these problems by generating infoboxes automatically from the corresponding article text. However, many Wikipedia articles do not have enough text content to generate an infobox. In this paper, we present an automated approach that generates infoboxes for the movie domain of Wikipedia by extracting information from several web sources instead of relying on article text alone. The proposed methodology uses semantic relations in the article content together with semi-structured information available on the web. It passes the article text through several classification steps to identify the template from a large pool of templates. Finally, it extracts the information for the corresponding template attributes from the web and generates the infobox. A comprehensive experimental evaluation demonstrated that the proposed scheme is an effective and efficient approach to generating Wikipedia infoboxes.

An Extended Dynamic Schema for Storing Semi-structured Data

  • Nakata, Mitsuru;Ge, Qi-Wei;Hochin, Teruhisa;Tsuji, Tatsuo
    • 대한전자공학회:학술대회논문집 / 대한전자공학회 2002 ITC-CSCC -1 / pp. 301-304 / 2002
  • Database technologies have recently come into common use. However, ordinary technologies are not suitable for constructing a complicated database such as a classical literature database or a database of archaeological relics. Because these kinds of data are semi-structured and lack regular structures, a database schema cannot be defined before the database is built. We have proposed the DREAM model for semi-structured databases. In this model, a database consists of five elements, and the model has operations similar to those of set theory. We have also introduced a dynamic schema, "shape", that shows the structure of each element. We have already implemented a prototype DBMS adopting the DREAM model (DREAM DBMS), together with a function for constructing shapes. However, a shape cannot fully describe database structures because it cannot express nested structures of elements. In this paper, we propose a "shape graph", a dynamic schema that shows database structures more exactly, and extend the DREAM DBMS accordingly. We also evaluate the performance of the functions for constructing shapes and shape graphs.

DISSECTION TECHNIQUE FOR EFFICIENT JOIN OPERATION ON SEMI-STRUCTURED DOCUMENT STREAM

  • Seo, Dong-Hyeok;Lee, Dong-Gyu;Ryu, Keun-Ho
    • 대한원격탐사학회:학술대회논문집 / 대한원격탐사학회 2007 Proceedings of ISRS 2007 / pp. 11-13 / 2007
  • There has been much interest in stream query processing. Various index techniques and advanced join techniques have been proposed to process data stream queries efficiently. Previous proposals support rapid responses to data stream queries. However, the volume of data streams is increasing, and data stream query processing needs to be faster than before. In this paper, we propose a novel query processing technique for large numbers of incoming document streams. We propose the Dissection Technique for efficient query processing in the data stream environment, focusing on join query processing. Our technique performs efficiently compared with other proposals for data streams. The proposed technique is applied to a sensor network system and an XML database.

The Preliminary Feasibility on Big Data Analytic Application in Construction

  • Ko, Yongho;Han, Seungwoo
    • 국제학술발표논문집 / The 6th International Conference on Construction Engineering and Project Management / pp. 276-279 / 2015
  • As the quantity of data has grown across industries, the construction industry has also developed various systems for collecting data on construction performance, such as the productivity and costs achieved at construction job sites. Numerous researchers worldwide have focused on developing efficient methodologies for analyzing such data. However, these methodologies have shown serious limitations in practical application because of a lack of data and the difficulty of finding analytic methodologies capable of yielding significant insights. With the development of information technology, a new trend in analytic methodologies has been introduced and has developed rapidly under the name "big data analysis" in various fields of academia and industry. The concept of big data can be applied to meaningful analysis of construction data in various formats, whether structured, semi-structured, or unstructured. This study investigates preliminary application methods based on data collected from an actual construction site. The preliminary investigation is intended to assess the fundamental feasibility of big data analytic applications in construction.

New Data Model for Efficient Search and Reusability of XML Documents

  • 김은영;천세학
    • 지능정보연구 / Vol. 10, No. 3 / pp. 27-37 / 2004
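  • XML has been proposed as a document standard for representing and exchanging data on the Internet. XML is also gaining attention as a document standard that supports easy search and reuse of documents scattered across the Web. When implementing an XML content management system, XML data must be modeled with two questions in mind: how efficiently and effectively semi-structured data can be searched and managed, and how well reusability, a characteristic of XML, can be supported. How the modeled data should actually be stored must also be considered. In this paper, we propose a new data model for storing XML document data in a data system and retrieving it. The proposed data model represents both the data view and the structure view of XML documents, and it takes into account both new data systems for XML documents and existing relational systems.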

An Efficient Disk Block Allocation Method for XML Data

  • 김정훈;손진현;정연돈;김명호
    • 한국정보과학회논문지:데이타베이스 / Vol. 34, No. 5 / pp. 465-472 / 2007
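  • As semi-structured data such as XML becomes widely used, storing and managing it effectively is becoming important. XML data can be modeled as a tree, and query processing is basically performed by traversing the tree. In this paper, we propose an algorithm for storing XML data in disk blocks. The proposed algorithm assigns a number to each node of the tree from the bottom up and uses these numbers to map nodes to disk blocks. The algorithm does not require access pattern information and performs well for any access pattern. We prove several properties of the proposed method and evaluate its performance through experiments.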
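The abstract gives only the outline of the algorithm (bottom-up numbering, then a number-based mapping to blocks), so the sketch below illustrates that outline under assumptions of its own and is not the paper's method. It assumes post-order numbering as the "bottom-up" order and a fixed number of nodes per block; `Node`, `assign_blocks`, and `block_capacity` are hypothetical names, and node names are assumed to be unique.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node standing in for an XML element (hypothetical structure)."""
    name: str
    children: list = field(default_factory=list)


def assign_blocks(root, block_capacity):
    """Number nodes in post-order and map each number to a disk block.

    Assumption: every child receives its number before its parent, and
    block id = number // block_capacity. Returns {name: (number, block_id)}.
    """
    if block_capacity < 1:
        raise ValueError("block_capacity must be at least 1")
    placement = {}
    counter = 0
    # Iterative post-order traversal; the flag marks a node whose
    # children have already been pushed.
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            placement[node.name] = (counter, counter // block_capacity)
            counter += 1
        else:
            stack.append((node, True))
            for child in reversed(node.children):
                stack.append((child, False))
    return placement


if __name__ == "__main__":
    # Tree: a has children b and c; b has children d and e.
    tree = Node("a", [Node("b", [Node("d"), Node("e")]), Node("c")])
    for name, (number, block) in assign_blocks(tree, block_capacity=2).items():
        print(name, number, block)
```

With this numbering, sibling leaves tend to share a block and a parent lands in the same or the next block, which keeps short downward traversals within few blocks. Whether that matches the locality properties proved in the paper cannot be determined from the abstract.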

Querying of XML Documents using Oracle8i XDK

  • 하상호;이강석
    • 대한전자공학회:학술대회논문집 / 대한전자공학회 2000 Summer Conference Proceedings (3) / pp. 71-74 / 2000
  • New methods for storing and retrieving semi-structured data such as XML documents have recently been studied. In this paper, we address the querying of XML data stored in relational tables. We first describe a method for querying XML documents using Oracle XDK, in particular the XSQL Servlet. We then apply the method to XML documents describing book information.

A Study on Application of Web 3.0 Technologies in Small and Medium Enterprises of India

  • Potluri, Rajasekhara Mouly;Vajjhala, Narasimha Rao
    • The Journal of Asian Finance, Economics and Business / Vol. 5, No. 2 / pp. 73-79 / 2018
  • The purpose of this study is to explore how small and medium enterprises (SMEs) in India have identified the opportunities and challenges in adopting Web 3.0 technologies to improve their productivity and efficiency. After an in-depth literature review, the researchers framed a semi-structured questionnaire with open-ended questions to collect responses from managers working in 40 Indian SMEs representing five key economic sectors. The collected data were analyzed, and themes were encoded using the NVivo 11 computer-assisted qualitative data analysis software. Content analysis was used to analyze the data collected in the semi-structured interviews. This study identified five key themes and 12 subthemes illustrating the key advantages and challenges as perceived by the managerial leadership of SMEs. The five key themes are integration of data and services, the creation of new functionalities, privacy and security, financial and technological challenges, and organizational challenges. The results will benefit the organizational leadership of SMEs in planning and developing their short-term and long-term information systems strategies and will enable SME leaders to make optimal use of their information technology assets, improving the productivity and competitiveness of their firms. Because Web 3.0 technologies are considered emerging technologies, the advantages and challenges of using them in SMEs have not yet been explored in the context of emerging economies such as India.

A Study on the Data-based WBS Model for Train Control System to Improve a Maintenance work

  • 전조원;김영민;박범
    • 시스템엔지니어링학술지 / Vol. 18, No. 1 / pp. 99-104 / 2022
  • In this paper, to increase the maintenance efficiency of the urban railway train control system and to build a standard data system, we collect as much structured, unstructured, and semi-structured data as possible, including data obtained by sensing and monitoring the system status. We studied data pre-processing functions (identification, purification, integration, transformation) based on effective data classification and a work classification system for maintenance activities. The purpose is to define a data matrix model, together with the WBS model, by considering its relationship with the data generated and managed in the O&M stage of the train control system operated by the urban railway, and to reflect and use it in practice.