• Title/Summary/Keyword: structured document

Adaptive Path Index for Efficient XML Query Processing (효율적인 XML 질의 처리를 위한 적응형 경로 인덱스)

  • 민준기;심규석;정진완
    • Journal of KIISE:Databases / v.31 no.1 / pp.61-71 / 2004
  • XML can describe a wide range of data, from regular to irregular and from flat to deeply nested. Because XML supports efficient data exchange and integration, it is rapidly emerging as the de facto standard format for Web documents. Several XML query languages have also been proposed for retrieving data represented in XML. XML query languages such as XPath and XQuery use path expressions to traverse irregularly structured data composed of XML elements, and various path indexes have been proposed to evaluate these path expressions. However, traditional path indexes are constructed using only the structure of the XML data. In this paper, we therefore propose an adaptive path index that utilizes query workloads as well as the XML data structure. To improve query performance, the proposed adaptive path index manages frequently used paths and the structural summary of the XML data using a hash tree and a graph structure. Experimental results show that the adaptive path index typically improves query performance by a factor of 2 to 69 compared with existing indexes.
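
As a rough illustration, the sketch below pairs a structural summary with a workload-driven table of frequent paths. It assumes that the hash tree holds the frequent paths and the graph holds the summary. A plain Python dict stands in for the hash tree, the summary is a trie of distinct label paths (DataGuide-style), and `AdaptivePathIndex`, `threshold`, and the `/a/b` and `//a/b` query forms are hypothetical simplifications, not the authors' design.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class SummaryNode:
    """One node of the structural summary: a distinct root-to-node label path."""

    def __init__(self):
        self.children = {}  # child tag -> SummaryNode
        self.ids = []       # ids of the elements reached by this label path


class AdaptivePathIndex:
    """Sketch: structural summary plus a workload-driven frequent-path table."""

    def __init__(self, xml_text, threshold=2):
        self.threshold = threshold      # hypothetical promotion threshold
        self.summary = SummaryNode()    # virtual node above the document root
        self.elements = []              # element id (document order) -> Element
        self._build(ET.fromstring(xml_text), self.summary)
        self.counts = defaultdict(int)  # how many times each path was queried
        self.frequent = {}              # stand-in for the hash tree: path -> ids

    def _build(self, elem, parent):
        # Elements that share a label path share one summary node.
        node = parent.children.setdefault(elem.tag, SummaryNode())
        node.ids.append(len(self.elements))
        self.elements.append(elem)
        for child in elem:
            self._build(child, node)

    def _walk(self, node, path):
        """Yield (label_path, summary_node) for every node below `node`."""
        for tag, child in node.children.items():
            yield path + (tag,), child
            yield from self._walk(child, path + (tag,))

    def _evaluate(self, expr):
        """Match '/a/b' exactly, or '//a/b' as a suffix of the label path."""
        descendant = expr.startswith("//")
        steps = tuple(expr.strip("/").split("/"))
        ids = []
        for label_path, node in self._walk(self.summary, ()):
            if descendant:
                matched = label_path[-len(steps):] == steps
            else:
                matched = label_path == steps
            if matched:
                ids.extend(node.ids)
        return sorted(ids)

    def query(self, expr):
        if expr in self.frequent:
            return self.frequent[expr]  # answered without walking the summary
        ids = self._evaluate(expr)
        self.counts[expr] += 1
        if self.counts[expr] >= self.threshold:
            self.frequent[expr] = ids   # promote a frequently used path
        return ids


XML = ("<lib><book><title>A</title></book><book><title>B</title></book>"
       "<mag><title>C</title></mag></lib>")
idx = AdaptivePathIndex(XML)
print(idx.query("/lib/book/title"))  # → [2, 4]
```

Once a path has been queried `threshold` times, later queries for it are served from the table instead of walking the summary. This is one way to read the idea of exploiting the query workload; a real index would also bound the table's size and support predicates.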

An Approach to Structuralizing Business Information for Internet Shopping Malls (인터넷쇼핑몰의 사업자신원정보 구조화 방안)

  • 장용식
    • Journal of Intelligence and Information Systems / v.10 no.1 / pp.27-45 / 2004
  • As online shopping grows, the "Consumer Protection Law in Electronic Commerce" obliges each Internet shopping mall to provide its business information. Although most Internet shopping malls present their business information in a semi-structured format at the bottom of their home pages, the attributes and forms of expression differ from mall to mall. This makes it difficult for consumers to identify the business information and lowers public confidence. This study therefore proposes three approaches to structuring business information for correct expression: an HTML-based structure, an XML-based structure, and an XML data island-based structure. The experimental results show that business information extraction time with the XML data island-based structure is independent of the size of the web document, whereas extraction time with the HTML-based structure depends on it. By comparing the extraction times, we show that the XML data island-based structure is more efficient and effective than the HTML-based structure.
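
To make the XML data island idea concrete, here is a minimal sketch in which a page carries its business information as an embedded `<xml>` block that a program can read field by field. The id `bizinfo`, the element names, and the sample values are hypothetical, not the schema used in the study, and locating the island with a regular expression is a simplification of how a browser exposes data islands.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page: business information is embedded as an XML data island
# instead of free text in the page footer.
PAGE = """<html><body>
<p>... product listings ...</p>
<xml id="bizinfo">
  <business>
    <name>Example Mall</name>
    <ceo>Hong Gil-dong</ceo>
    <registrationNo>000-00-00000</registrationNo>
  </business>
</xml>
</body></html>"""

ISLAND = re.compile(r'<xml id="bizinfo">(.*?)</xml>', re.DOTALL)


def extract_business_info(html):
    """Return the island's fields as a dict, or None if the page has none."""
    match = ISLAND.search(html)
    if match is None:
        return None
    business = ET.fromstring(match.group(1).strip())
    return {child.tag: child.text for child in business}


print(extract_business_info(PAGE))
# → {'name': 'Example Mall', 'ceo': 'Hong Gil-dong', 'registrationNo': '000-00-00000'}
```

With the HTML-based structure, by contrast, the same fields would have to be recognized from labels in free footer text, which varies from mall to mall.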

A study on the Classification Schemes of Internet Resources for Industry (산업 분야 인터넷 자원의 분류체계에 관한 연구)

  • 한상길
    • Journal of the Korean Society for information Management / v.18 no.3 / pp.285-309 / 2001
  • Industry information grows faster than any other information resource in the Internet age. Unfortunately, however, there is no consensus on a classification standard among the information providers in the industry fields. This may be a problem not only for the continuous and systematic development of industry information but also for its use by users. This study aims to propose a well-structured and efficient classification scheme for industry information that helps users retrieve Internet resources easily. To do this, we analyzed the subject classification schemes of domestic industry information websites, most of which adopt the "Korean Standard for the Industry Classification". In addition, we suggest principles of subject classification and a hierarchical structure derived from an analysis of knowledge and document classification schemes. As a result, we propose an optimized industry classification scheme based on validity tests of the classification items, measured by a quantitative analysis of the industry information currently accessible through the Internet.

A Comparative Study of XML and HTML: Focusing on Their Characteristics and Retrieval Functions (디지털도서관 문서양식으로서의 XML과 HTML의 특성 및 검색 기능 비교 연구)

  • 김현희;장혜원
    • Journal of the Korean Society for information Management / v.16 no.2 / pp.105-134 / 1999
  • For efficient and precise searches in the Web environment, resources should be coded in a structured way. HTML cannot represent semantic structure because its tag set is fixed. XML, which has emerged as an alternative standard markup language, uses custom tags that allow structural searching. This study therefore compares XML with HTML in terms of their characteristics and retrieval functions. To test the retrieval functions of XML- and HTML-based systems, we constructed an experimental XML-based system. The XML-based system has several advantages over the HTML-based system. However, some improvements are needed to make the XML system more comprehensive and effective. First, XML document search engines with user-friendly interfaces are needed. Second, popular Web browsers such as Internet Explorer and Netscape Communicator need to support the XML 1.0 specification completely. Third, an open DTD format, which would allow information retrieval systems to retrieve documents and compress them into a single format, is also needed to control Web documents more efficiently.
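
The difference between structural and flat searching can be sketched with Python's standard ElementTree, whose limited XPath support includes `[tag='text']` predicates. The catalog below is a hypothetical example, and the keyword scan only stands in for searching HTML, whose fixed tags say nothing about which text is an author or a title.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: custom tags name the meaning of each field.
CATALOG = """<library>
  <book><title>Digital Libraries</title><author>Kim</author><year>1999</year></book>
  <book><title>Markup Languages</title><author>Lee</author><year>1998</year></book>
  <book><title>Kim's Guide to Indexing</title><author>Park</author><year>1999</year></book>
</library>"""

root = ET.fromstring(CATALOG)

# Structural search: books whose <author> element is exactly "Kim".
by_author = [b.findtext("title") for b in root.findall("book[author='Kim']")]

# Flat keyword search, as over untyped text: any book mentioning "Kim".
by_keyword = [b.findtext("title") for b in root.findall("book")
              if "Kim" in "".join(b.itertext())]

print(by_author)   # → ['Digital Libraries']
print(by_keyword)  # → ['Digital Libraries', "Kim's Guide to Indexing"]
```

The keyword scan also returns a book that merely mentions the name in its title, which is the kind of imprecision that structural search avoids.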

Design and Implementation of XML-based Cyber Counseling System Supporting Counseling Analysis Information (상담 분석 정보를 지원하는 XML 기반 사이버 상담 시스템의 설계 및 구현)

  • Choi, Sook-Young;Back, Hyon-Ki
    • Journal of The Korean Association of Information Education / v.7 no.3 / pp.341-352 / 2003
  • While most research on cyber counseling to date has addressed counseling methods and the effects of teenagers' use of cyber counseling, there have been no efforts to store counseling content, analyze it using the features and technologies of the web, and use it effectively to guide students. We therefore propose a cyber counseling system that provides counseling information so that teachers can grasp students' various interests and problems, which helps them guide students. For this, we used XML. Because XML documents can systematically structure information and represent it in meaningful information units, unlike existing file-based information, they can be used effectively to manage, search, and store documents. We thus implement a cyber counseling system using XML that can effectively manage and represent the various analysis information of counseling.
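
A minimal sketch of how counseling records stored as XML could yield analysis information for a teacher, such as how often each topic comes up. The element and attribute names (`session`, `student`, `topic`) and the sample data are hypothetical; the paper's actual document structure is not given in the abstract.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical counseling log; element and attribute names are illustrative.
LOG = """<counseling>
  <session student="s01" date="2003-05-02"><topic>friendship</topic><topic>study</topic></session>
  <session student="s02" date="2003-05-03"><topic>study</topic></session>
  <session student="s01" date="2003-05-09"><topic>family</topic><topic>study</topic></session>
</counseling>"""


def topic_summary(xml_text, student=None):
    """Count counseling topics across sessions, optionally for one student."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for session in root.iter("session"):
        if student is None or session.get("student") == student:
            counts.update(t.text for t in session.findall("topic"))
    return counts


print(topic_summary(LOG).most_common(1))  # → [('study', 3)]
```

Because each topic is its own element, this kind of aggregation needs no text parsing, which is the practical benefit the abstract attributes to structured XML over file-based records.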

The Development of the Korean Medicine Symptom Diagnosis System Using Morphological Analysis to Refine Difficult Medical Terminology (전문용어 정제를 위한 형태소 분석을 이용한 한의학 증상 진단 시스템 개발)

  • Lee, Sang-Baek;Son, Yun-Hee;Jang, Hyun-Chul;Lee, Kyu-Chul
    • KIISE Transactions on Computing Practices / v.22 no.2 / pp.77-82 / 2016
  • This paper presents the development of a Korean medicine symptom diagnosis system. In Korean medicine diagnosis, patients explain their symptoms and an oriental medicine doctor makes a diagnosis based on those symptoms. Natural language processing is required to make a diagnosis automatically from patients' reports of their symptoms, and we use morphological analysis to extract understandable information from the natural language text. We built the diagnosis system on MongoDB, a NoSQL document-oriented database, because NoSQL databases handle unstructured and semi-structured data better than relational databases. We collect patient symptom reports in MongoDB to refine difficult medical terminology and provide understandable terminology to patients.
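
The refinement step can be sketched as a pass over morphological-analysis output that swaps difficult terms for plain ones and produces a document ready for a document store. This is only an illustration under stated assumptions: the analyzer is not shown and its output (`TOKENS`) is given by hand, English tokens replace Korean text for readability, and the glossary and document fields are hypothetical.

```python
import json

# Hypothetical output of a morphological analyzer: (surface form, POS tag).
# A real system would obtain these from a Korean analyzer.
TOKENS = [("patient", "NOUN"), ("reports", "VERB"), ("pyrexia", "NOUN"),
          ("and", "CONJ"), ("cephalalgia", "NOUN")]

# Hypothetical glossary mapping difficult terms to plain-language terms.
GLOSSARY = {"pyrexia": "fever", "cephalalgia": "headache"}


def refine_terms(tokens, glossary):
    """Replace glossary nouns with plain terms and build a storable document."""
    refined = []
    symptoms = []
    for surface, pos in tokens:
        if pos == "NOUN" and surface in glossary:
            refined.append(glossary[surface])
            symptoms.append(glossary[surface])
        else:
            refined.append(surface)
    # A plain dict shaped like a document for a document-oriented database.
    return {"text": " ".join(refined), "symptoms": symptoms}


doc = refine_terms(TOKENS, GLOSSARY)
print(json.dumps(doc))
```

With the pymongo driver, a dict like `doc` could be stored with `collection.insert_one(doc)`; no database connection is made here.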

A Method of Service Reuse using Analysis of Process Similarity and Meta Repository (프로세스 유사도 분석과 메타 저장소를 이용한 서비스 재사용 기법)

  • Hwang, Chi-Gon;Yoon, Chang-Pyo;Jung, Kye-Dong
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.6 / pp.1375-1380 / 2014
  • SaaS in cloud computing is a framework for providing software as a service. If a service provider rebuilds a service for each tenant and use, doing so requires resources in terms of cost and management. We therefore propose a technique that analyzes software structure with process algebra so that existing software can be reused. Process algebra analyzes the structure of software, expresses it in different languages, and verifies that it can be reused. CCS, a process algebra, is useful for converting business processes or XML; using it, we structure a process as a process view and propose a meta repository for comparing and managing structured documents.
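
As a hedged illustration of comparing processes written in CCS, the sketch below encodes a small CCS fragment (nil, action prefix, and choice; parallel composition and restriction are omitted), computes each process's trace set, and scores similarity with the Jaccard index. Trace-set Jaccard similarity is our own illustrative choice, not the similarity measure from the paper, and trace equivalence is weaker than the bisimulation usually studied for CCS.

```python
# A tiny CCS fragment: nil (0), action prefix (a.P) and choice (P + Q).
# Encoding processes as nested tuples is an illustrative assumption.
NIL = ("nil",)


def prefix(action, proc):
    return ("prefix", action, proc)


def choice(left, right):
    return ("choice", left, right)


def traces(proc):
    """Return the prefix-closed set of action sequences `proc` can perform."""
    kind = proc[0]
    if kind == "nil":
        return {()}
    if kind == "prefix":
        _, action, rest = proc
        return {()} | {(action,) + t for t in traces(rest)}
    if kind == "choice":
        return traces(proc[1]) | traces(proc[2])
    raise ValueError(f"unknown process form: {kind!r}")


def similarity(p, q):
    """Jaccard similarity of trace sets; 1.0 means trace-equivalent."""
    tp, tq = traces(p), traces(q)
    return len(tp & tq) / len(tp | tq)


# Hypothetical order processes: the second adds a refund branch.
order = prefix("receive", prefix("check", prefix("ship", NIL)))
order_v2 = prefix("receive", prefix("check",
                  choice(prefix("ship", NIL), prefix("refund", NIL))))
print(similarity(order, order_v2))  # → 0.8
```

A high score suggests that an existing service covers most of the behavior a new tenant needs, which is the kind of reuse decision the abstract describes.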

Inverted Indexes for XML Updates and Full-Text Retrievals in Relational Model (관계형 모델에서 XML 변경과 전문 검색을 지원하기 위한 역 인덱스 구축 기법)

  • Cheon, Yun-Woo;Hong, Dong-Kweon
    • The KIPS Transactions:PartD / v.11D no.3 / pp.509-518 / 2004
  • Recently there have been efforts to add XML full-text retrieval and XML updates to the new standardization of XML queries. XML full-text retrieval plays an important role in XML query languages. Unlike tables in the relational model, an XML document is complex and unstructured in nature. We believe that when we try to get information from unstructured XML documents, a full-text retrieval query is a much more convenient approach than a regular structured query. XML update is another core function that an XML query language must have. In this paper, we propose an inverted index that supports XML updates and XML full-text queries in a relational environment. Performance comparisons show that our approach keeps inverted indexes at a comparable size and supports many full-text retrieval functions well. It also shows very stable retrieval performance, especially for large XML documents. Above all, our approach handles XML updates efficiently by removing cascading effects.
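
A minimal sketch of an inverted index kept in relational tables, using Python's built-in `sqlite3`. The table layout is a hypothetical illustration, not the authors' scheme. Postings point to stable element ids rather than document offsets, so updating one element deletes and re-inserts only that element's rows. That is one simple way to avoid cascading changes, and the paper's own technique may differ.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical layout: one posting row per word occurrence, keyed by a
# stable element id rather than by a position in the whole document.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE element (id INTEGER PRIMARY KEY, tag TEXT, content TEXT);
CREATE TABLE posting (term TEXT, element_id INTEGER, pos INTEGER);
CREATE INDEX posting_term ON posting (term);
""")


def add_postings(elem_id, text):
    rows = [(w.lower(), elem_id, pos) for pos, w in enumerate(text.split())]
    conn.executemany("INSERT INTO posting VALUES (?, ?, ?)", rows)


def index_element(elem_id, tag, text):
    conn.execute("INSERT INTO element VALUES (?, ?, ?)", (elem_id, tag, text))
    add_postings(elem_id, text)


def update_element(elem_id, new_text):
    """Replace one element's text; postings of other elements are untouched."""
    conn.execute("DELETE FROM posting WHERE element_id = ?", (elem_id,))
    conn.execute("UPDATE element SET content = ? WHERE id = ?", (new_text, elem_id))
    add_postings(elem_id, new_text)


def search(*terms):
    """Return ids of elements containing every term (a full-text AND query)."""
    marks = ",".join("?" * len(terms))
    sql = (f"SELECT element_id FROM posting WHERE term IN ({marks}) "
           "GROUP BY element_id HAVING COUNT(DISTINCT term) = ? "
           "ORDER BY element_id")
    params = [t.lower() for t in terms] + [len(terms)]
    return [row[0] for row in conn.execute(sql, params)]


doc = ET.fromstring("<paper><title>XML full text retrieval</title>"
                    "<abstract>inverted index for XML updates</abstract></paper>")
for elem_id, elem in enumerate(doc.iter()):
    if elem.text and elem.text.strip():
        index_element(elem_id, elem.tag, elem.text)

print(search("xml", "index"))  # → [2]
```

The `GROUP BY ... HAVING COUNT(DISTINCT term)` clause turns the posting table into a conjunctive keyword query; phrase or proximity queries could additionally compare the `pos` column.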

An Efficient Design Pattern Framework for Automatic Code Generation based on XML (코드 자동 생성을 위한 XML 기반의 효율적인 디자인패턴 구조)

  • Kim, Un-Yong;Kim, Yeong-Cheol;Ju, Bok-Gyu;Choe, Yeong-Geun
    • The KIPS Transactions:PartD / v.8D no.6 / pp.753-760 / 2001
  • Design patterns capture design knowledge for solving extensibility and maintainability issues independently of application-specific problems. Despite vast interest in design patterns, however, specifying and applying them is generally assumed to rely on manual implementation. As a result, developing software takes a lot of time, not only because patterns are difficult to analyze and apply consistently but also because programming faults occur frequently. In this paper, we propose an XML notation for describing design patterns and an application framework based on design patterns. We also present a source code generation support system and show an example application using this notation and the application framework. By applying structured XML documentation, more stable systems can be constructed and compact source code can be generated for the user.
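
The general idea of generating code from an XML pattern description can be sketched as follows. The XML vocabulary (`pattern`, `participant`, `role`, `class`, `method`) is hypothetical rather than the paper's notation. Every generated method is simplified to a `void` method with no parameters, so this produces only class skeletons, not a full Observer implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML notation for one pattern instance (not the paper's schema).
PATTERN = """<pattern name="Observer">
  <participant role="Subject" class="WeatherStation">
    <method>attach</method><method>notifyObservers</method>
  </participant>
  <participant role="Observer" class="Display">
    <method>update</method>
  </participant>
</pattern>"""


def generate_java(xml_text):
    """Emit one Java class skeleton per participant, keyed by class name."""
    pattern = ET.fromstring(xml_text)
    sources = {}
    for part in pattern.findall("participant"):
        cls = part.get("class")
        lines = [f"// {pattern.get('name')} pattern: {part.get('role')} role",
                 f"public class {cls} {{"]
        for method in part.findall("method"):
            lines.append(f"    public void {method.text}() {{")
            lines.append("        // TODO: application-specific logic")
            lines.append("    }")
        lines.append("}")
        sources[cls] = "\n".join(lines)
    return sources


for name, code in generate_java(PATTERN).items():
    print(f"--- {name}.java ---\n{code}")
```

Keeping the pattern description in XML means the same document can be validated, searched, or fed to generators for other languages. That reading is consistent with, but not confirmed by, the abstract.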

Suggestions on how to convert official documents to Machine Readable (공문서의 기계가독형(Machine Readable) 전환 방법 제언)

  • Yim, Jin Hee
    • The Korean Journal of Archival Studies / no.67 / pp.99-138 / 2021
  • In the era of big data, analyzing not only structured data but also unstructured data is emerging as an important task. Official documents produced by government agencies, as a large body of text-based unstructured data, are also subject to big data analysis. From the perspective of internal work efficiency, knowledge management, records management, and so on, public documents need to be analyzed as big data to derive useful implications. However, since many of the public documents currently held by public institutions are not in an open format, big data analysis requires a preprocessing step that extracts text from the bitstream. In addition, since contextual metadata is not sufficiently stored in the document file, separate efforts to secure metadata are required for high-quality analysis. In conclusion, current official documents have a low level of machine readability, which makes big data analysis expensive.
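
The abstract does not name specific file formats. As an assumption for illustration, the sketch below uses OOXML (`.docx`), an open format, to show that extracting text from an open-format bitstream amounts to reading XML out of a ZIP archive with the standard library. `make_sample_docx` builds a minimal stand-in archive for testing that contains only `word/document.xml`, so it is not a valid Word file (it lacks `[Content_Types].xml` and other parts). Closed formats would instead need a format-specific parser.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w:p (paragraph) and w:t (text) elements.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(data):
    """Return the non-empty paragraphs of a .docx file given as raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W + "p"):
        text = "".join(t.text or "" for t in para.iter(W + "t"))
        if text:
            paragraphs.append(text)
    return paragraphs


def make_sample_docx():
    """Build a minimal stand-in archive containing only word/document.xml."""
    body = ('<w:document xmlns:w="http://schemas.openxmlformats.org/'
            'wordprocessingml/2006/main"><w:body>'
            '<w:p><w:r><w:t>Records management </w:t></w:r>'
            '<w:r><w:t>plan</w:t></w:r></w:p>'
            '<w:p><w:r><w:t>Approved</w:t></w:r></w:p>'
            '</w:body></w:document>')
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", body)
    return buf.getvalue()


print(extract_docx_text(make_sample_docx()))  # → ['Records management plan', 'Approved']
```

Note that one visible sentence can be split across several text runs (`w:r`), so the runs of each paragraph are joined before analysis.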