• Title/Summary/Keyword: Document Databases


A Unification Algorithm for DTDs of XML Documents having a Similar Structure (유사 구조를 가지는 XML 문서들의 DTD 통합 알고리즘)

  • 유춘식;우선미;김용성
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.10
    • /
    • pp.1400-1411
    • /
    • 2004
  • Many XML documents have different DTDs even though they share a similar structure and are logically the same kind of document. As a result, such documents end up with different database schemas and are stored in separate databases. In this paper, we therefore propose an algorithm that unifies the DTDs of such XML documents using finite automata and a tree structure. A finite automaton is well suited to representing the repetition operators and connectors of a DTD and provides a simple representation of it; using finite automata also reduces the complexity of the algorithm. We apply the proposed algorithm to unify the DTDs of science journals.
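
The representation the abstract relies on is easy to make concrete: a DTD content model built from connectors ("," for sequence, "|" for choice) and repetition operators ("*", "+") maps directly onto a finite automaton. The sketch below is only a minimal Python illustration of that mapping; the function names and data layout are assumptions, not the authors' implementation.

    from itertools import count

    _ids = count()  # fresh state ids across all fragments

    def symbol(name):
        # automaton fragment accepting exactly one element <name>
        s, t = next(_ids), next(_ids)
        return {"start": s, "end": t, "trans": {s: [(name, t)]}}

    def _merge(a, b):
        trans = {k: list(v) for k, v in a["trans"].items()}
        for k, v in b["trans"].items():
            trans.setdefault(k, []).extend(v)
        return trans

    def seq(a, b):
        # DTD connector "," : a followed by b
        trans = _merge(a, b)
        trans.setdefault(a["end"], []).append((None, b["start"]))  # epsilon
        return {"start": a["start"], "end": b["end"], "trans": trans}

    def alt(a, b):
        # DTD connector "|" : a or b
        s, t = next(_ids), next(_ids)
        trans = _merge(a, b)
        trans[s] = [(None, a["start"]), (None, b["start"])]
        trans.setdefault(a["end"], []).append((None, t))
        trans.setdefault(b["end"], []).append((None, t))
        return {"start": s, "end": t, "trans": trans}

    def star(a):
        # DTD repetition operator "*" : zero or more occurrences of a
        s, t = next(_ids), next(_ids)
        trans = {k: list(v) for k, v in a["trans"].items()}
        trans[s] = [(None, a["start"]), (None, t)]
        trans.setdefault(a["end"], []).extend([(None, a["start"]), (None, t)])
        return {"start": s, "end": t, "trans": trans}

    def accepts(nfa, children):
        # does the child-element sequence satisfy the content model?
        def eclose(states):
            stack, seen = list(states), set(states)
            while stack:
                for label, nxt in nfa["trans"].get(stack.pop(), []):
                    if label is None and nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
            return seen
        cur = eclose({nfa["start"]})
        for name in children:
            cur = eclose({nxt for st in cur
                          for label, nxt in nfa["trans"].get(st, [])
                          if label == name})
        return nfa["end"] in cur

    # <!ELEMENT article (title, author+, section*)> as an automaton
    # ("+" expressed as one occurrence followed by "*"):
    author_plus = seq(symbol("author"), star(symbol("author")))
    model = seq(symbol("title"), seq(author_plus, star(symbol("section"))))
    print(accepts(model, ["title", "author", "section"]))  # True
    print(accepts(model, ["author", "title"]))             # False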

A Signature Method for Efficient Preprocessing of XML Queries (XML 질의의 효율적인 전처리를 위한 시그너처 방법)

  • 정연돈;김종욱;김명호
    • Journal of KIISE:Databases
    • /
    • v.30 no.5
    • /
    • pp.532-539
    • /
    • 2003
  • This paper proposes a preprocessing method for the efficient processing of XML queries in information retrieval systems that manage a large number of XML documents. For the preprocessing, we use a signature-based approach. In conventional (flat, document-based) information retrieval systems, user queries consist of keywords and Boolean operators, so signatures are structured in a flat manner. In XML-based information retrieval systems, however, user queries take the form of path queries, and a flat signature is therefore not effective for XML documents. In this paper, we propose a structured signature for XML documents and evaluate the performance of the proposed method through experiments.
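
As a rough illustration of the idea, the sketch below implements a generic superimposed-coding signature kept per depth level, so a path query can be filtered level by level instead of against one flat bit string. The signature width, hash choice, and per-level layout are assumptions for illustration, not the paper's actual design.

    import hashlib

    SIG_BITS = 64      # signature width per level (assumed)
    BITS_PER_TAG = 3   # bits set per element name (assumed)

    def tag_mask(tag):
        # superimposed coding: hash an element name onto a few bit positions
        digest = hashlib.md5(tag.encode()).digest()
        mask = 0
        for i in range(BITS_PER_TAG):
            mask |= 1 << (digest[i] % SIG_BITS)
        return mask

    def path_signature(path):
        # one signature word per depth level: /a/b/c -> [sig(a), sig(b), sig(c)]
        return [tag_mask(tag) for tag in path]

    def may_match(doc_sig, query_sig):
        # necessary condition only: all query bits present at the same level.
        # False means "definitely not a match"; True means "run the real query"
        # (signatures allow false positives, never false negatives).
        if len(query_sig) > len(doc_sig):
            return False
        return all(q & d == q for q, d in zip(query_sig, doc_sig))

    doc = path_signature(["article", "section", "title"])
    print(may_match(doc, path_signature(["article", "section"])))  # True: candidate
    print(may_match(doc, path_signature(["figure"])))              # almost surely False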

Implementation of One-Stop Service System on Domestic & Foreign Technology Information (국내외 기술정보의 연계 서비스 체제 구축)

  • Seo, Jin-Ny;Noh, Kyung-Ran
    • Journal of Information Management
    • /
    • v.32 no.1
    • /
    • pp.1-22
    • /
    • 2001
  • In the traditional environment, users must search each journal OPAC, bibliographic database, full-text database, and e-journal separately until they find the scientific and technical information they need. The purpose of this study is to build a one-click journal service system that supports integrated search. By integrating databases and electronic journals, the system provides functions such as journal browsing, journal search, article search, alerts, a personal library, and document delivery. Users search all information sources through the journal OPAC and obtain journal full text through a single interface, as sketched below.
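
The integration pattern described above amounts to a single interface that fans a query out to every source and merges the results. A minimal sketch, with stand-in backends rather than the system's real OPAC and database connections:

    def search_opac(query):
        # stand-in for the journal OPAC backend
        return [{"source": "OPAC", "title": "Journal of Information Management"}]

    def search_biblio(query):
        # stand-in for the bibliographic database
        return [{"source": "bibliographic DB", "title": "Journal of Information Management"}]

    def search_ejournal(query):
        # stand-in for the e-journal full-text service
        return [{"source": "E-Journal", "title": "Journal of KIISE:Databases"}]

    def one_stop_search(query):
        # single interface: query every source, merge, de-duplicate by title
        results, seen = [], set()
        for backend in (search_opac, search_biblio, search_ejournal):
            for hit in backend(query):
                if hit["title"] not in seen:
                    seen.add(hit["title"])
                    results.append(hit)
        return results

    for hit in one_stop_search("document databases"):
        print(hit["source"], "->", hit["title"])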

Capturing Data from Untapped Sources using Apache Spark for Big Data Analytics (빅데이터 분석을 위해 아파치 스파크를 이용한 원시 데이터 소스에서 데이터 추출)

  • Nichie, Aaron;Koo, Heung-Seo
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.65 no.7
    • /
    • pp.1277-1282
    • /
    • 2016
  • The term "Big Data" has been defined to encapsulate a broad spectrum of data sources and data formats. It is often described to be unstructured data due to its properties of variety in data formats. Even though the traditional methods of structuring data in rows and columns have been reinvented into column families, key-value or completely replaced with JSON documents in document-based databases, the fact still remains that data have to be reshaped to conform to certain structure in order to persistently store the data on disc. ETL processes are key in restructuring data. However, ETL processes incur additional processing overhead and also require that data sources are maintained in predefined formats. Consequently, data in certain formats are completely ignored because designing ETL processes to cater for all possible data formats is almost impossible. Potentially, these unconsidered data sources can provide useful insights when incorporated into big data analytics. In this project, using big data solution, Apache Spark, we tapped into other sources of data stored in their raw formats such as various text files, compressed files etc and incorporated the data with persistently stored enterprise data in MongoDB for overall data analytics using MongoDB Aggregation Framework and MapReduce. This significantly differs from the traditional ETL systems in the sense that it is compactible regardless of the data formats at source.

An Incremental Clustering Technique of XML Documents using Cluster Histograms (클러스터의 히스토그램을 이용한 XML 문서의 점진적 클러스터링 기법)

  • Hwang, Jeong-Hee
    • Journal of KIISE:Databases
    • /
    • v.34 no.3
    • /
    • pp.261-269
    • /
    • 2007
  • As basic research toward integrating and retrieving XML documents efficiently, this paper proposes a method for clustering XML documents by structure. We apply an algorithm designed for processing large volumes of transaction data to the clustering of XML documents, which differs considerably from previous algorithms that measure structural similarity. Our method clusters XML documents using cluster histograms, which represent the distribution of items within clusters, while also taking global cluster cohesion into account. We compare the proposed method with existing techniques through experiments, which show that our method not only creates good-quality clusters but also improves processing time.
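
The general shape of such a histogram-based incremental pass is easy to sketch: each document is reduced to a set of structural items, each cluster keeps an item histogram, and a document joins the cluster whose histogram it overlaps best, subject to a threshold. The overlap measure and threshold below are assumptions; the paper's cohesion criterion may differ.

    from collections import Counter

    THRESHOLD = 0.5  # minimum overlap to join an existing cluster (assumed)

    class Cluster:
        def __init__(self):
            self.hist = Counter()  # histogram: item -> frequency in the cluster
            self.size = 0          # number of member documents

        def overlap(self, items):
            # fraction of the document's items already present in the cluster
            if self.size == 0:
                return 0.0
            return sum(1 for it in items if self.hist[it] > 0) / len(items)

        def add(self, items):
            self.hist.update(items)
            self.size += 1

    def cluster_incrementally(docs):
        clusters = []
        for items in docs:  # single pass: each document is placed on arrival
            best = max(clusters, key=lambda c: c.overlap(items), default=None)
            if best is None or best.overlap(items) < THRESHOLD:
                best = Cluster()
                clusters.append(best)
            best.add(items)
        return clusters

    # documents reduced beforehand to sets of structural items (element paths)
    docs = [{"article/title", "article/author"},
            {"article/title", "article/author", "article/abstract"},
            {"book/isbn", "book/publisher"}]
    print(len(cluster_incrementally(docs)))  # 2 structural clusters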

W3C XQuery Update facility on SQL hosts (관계형 테이블을 이용한 W3C XQuery 변경 기능의 지원)

  • Hong, Dong-Kweon
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.3
    • /
    • pp.306-310
    • /
    • 2008
  • XQuery is a recent W3C recommendation for querying XML. As part of efforts to extend XQuery's capabilities, XML insertion and deletion are being studied and their standardization is under way. XML databases were initially developed simply for XML document management, but their functions are now extending to OLTP. In this paper, we add update functions to an XQuery processing system that was originally developed only for XQuery retrieval. We present the table structures, the numbering schemes for hierarchical structures, and the methods for translating XQuery updates into SQL.
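
A hedged sketch of the general approach: XML nodes are shredded into a relational table keyed by a hierarchical numbering scheme (Dewey-style order keys here), so that XQuery insert and delete operations translate into plain SQL. The schema and numbering below are illustrative assumptions, not necessarily the paper's design.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE node (
        order_key TEXT PRIMARY KEY,  -- Dewey key: '1.1.2' = 2nd child of 1st child of root
        tag       TEXT,
        text      TEXT)""")

    # document <lib><book><title>XQuery</title></book></lib>
    con.executemany("INSERT INTO node VALUES (?,?,?)",
                    [("1",     "lib",   None),
                     ("1.1",   "book",  None),
                     ("1.1.1", "title", "XQuery")])

    # XQuery: insert node <year>2008</year> as last into /lib/book
    # translation: count the parent's direct children, append the next number
    parent = "1.1"
    (n,) = con.execute(
        "SELECT COUNT(*) FROM node "
        "WHERE order_key LIKE ? AND order_key NOT LIKE ?",
        (parent + ".%", parent + ".%.%")).fetchone()
    con.execute("INSERT INTO node VALUES (?,?,?)",
                (f"{parent}.{n + 1}", "year", "2008"))

    # XQuery: delete node /lib/book/title
    # translation: delete the node and its whole subtree by key prefix
    con.execute("DELETE FROM node WHERE order_key = '1.1.1' "
                "OR order_key LIKE '1.1.1.%'")

    for row in con.execute("SELECT * FROM node ORDER BY order_key"):
        print(row)  # ('1','lib',None) ('1.1','book',None) ('1.1.2','year','2008')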

The Competency in Disaster Nursing of Korean Nurses: Scoping Review (국내 간호사의 재난간호 역량: 주제범위 문헌고찰)

  • Lee, Eunja;Yang, Jungeun
    • Journal of East-West Nursing Research
    • /
    • v.27 no.2
    • /
    • pp.153-165
    • /
    • 2021
  • Purpose: The aim of this study was to identify the scope of Korean nurses' competency in disaster nursing. Methods: A scoping review was conducted using the Joanna Briggs Institute methodology. The review drew on four databases: RISS, ScienceON, EBSCO Discovery Service, and CINAHL. The keywords were 'disaster', 'nurs*', 'competenc*', 'ability', and 'preparedness'. Inclusion and exclusion criteria were defined as review strategies: the inclusion criteria focused on Korean nurses and on peer-reviewed articles related to disaster nursing competency published in full text in Korean or English, while review articles were excluded. Results: Nineteen studies were eligible for data extraction. A total of 10 categories of disaster nursing competency were identified: knowledge of disaster nursing; crisis management; disaster preparation; information collection and sharing; nursing record and document management; communication; disaster planning; nursing activities in disaster response; infection management; and chemical, biological, radiological, nuclear, and explosive management. Conclusion: It is necessary to distinguish among Korean nurses' common disaster nursing competency, professional disaster nursing competency, and the disaster nursing competency required in nursing practice. Future research should therefore explore and describe disaster nursing competency.

Accounting Education in the Era of Information and Technology : Suggestions for Adopting IT Related Curriculum (기술정보화(IT) 시대의 회계 교육 : IT교과와의 융합교육의 제안)

  • Yoon, Sora
    • Journal of Information Technology Services
    • /
    • v.20 no.2
    • /
    • pp.91-109
    • /
    • 2021
  • Recently, the social and economic environment has changed rapidly. In particular, the development of IT accelerated the introduction of databases, communication networks, and information processing and analysis systems, making the use of such information and communication technology an essential factor in corporate management innovation. This change has also affected accounting. The purpose of this study is to document the changes in accounting brought about by the adoption of IT, to define the accounting professionals required in this era, and to present efficient educational methodologies for training such experts. An accounting expert suited to the era of technology and information possesses not only basic accounting knowledge, competence, independence, reliability, communication skills, and flexible interpersonal skills, but also IT skills, data utilization and analysis skills, and an understanding of big data, artificial intelligence, and blockchain-based accounting information systems. To educate future accounting experts, the accounting curriculum should be reorganized to strengthen IT capabilities and should provide a wide variety of learning opportunities. It is also important to provide practice-level education through cooperation between industry and academia. Distance learning, web-based learning, discussion-type classes, TBL, PBL, and flipped learning are suitable methodologies for fostering future accounting experts. This study is meaningful because it can motivate the reconsideration of accounting curricula and educational systems to enhance IT capabilities.

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services
    • /
    • v.14 no.6
    • /
    • pp.71-84
    • /
    • 2013
  • Log data, which record the multitude of information created when operating computer systems, are used in many processes, from system inspection and process optimization to customized user services. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for handling the massive amounts of log data generated by banks. Most of the log data generated during banking operations come from handling clients' business, so a separate system is needed to gather, store, categorize, and analyze the log data generated while processing that business. However, existing computing environments make it difficult to realize the flexible storage expansion required for massive amounts of unstructured log data and to execute the many functions needed to categorize and analyze them. In this study, we therefore use cloud computing technology to realize a cloud-based log data processing system for unstructured log data that are difficult to handle with the analysis tools and management systems of existing computing infrastructures. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources, such as storage space and memory, under conditions such as extended storage or a rapid increase in log data. Moreover, to overcome the processing limits of existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for fast and reliable parallel distributed processing of the massive log data. Furthermore, because HDFS (Hadoop Distributed File System) stores data by replicating the blocks of the aggregated log data, the proposed system offers automatic restore functions that allow it to continue operating after recovering from a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system processes unstructured log data effectively. Relational databases such as MySQL have rigid schemas that are ill suited to unstructured log data, and such strict schemas cannot expand across nodes when rapidly growing data must be distributed to many nodes. NoSQL databases do not provide the complex computations of relational databases, but they can easily expand through node dispersion when the amount of data grows rapidly; they are non-relational databases with a structure appropriate for unstructured data. NoSQL data models are usually classified as key-value, column-oriented, or document-oriented. Of these, the proposed system uses MongoDB, a representative document-oriented model with a free schema structure. MongoDB is adopted because its flexible schema makes it easy to process unstructured log data, it facilitates flexible node expansion when data grow rapidly, and it provides an Auto-Sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated across each bank's entire client business process are sent to the cloud server, the log collector module collects the data, classifies them by type, and distributes them to the MongoDB module and the MySQL module. The log graph generator module produces the results of the log analyses of the MongoDB module, the Hadoop-based analysis module, and the MySQL module, per analysis time and per type of aggregated log data, and presents them to the user through a web interface. Log data requiring real-time analysis are stored in the MySQL module and served in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted as graphs according to the user's analysis conditions, and the aggregated log data in the MongoDB module are processed in a parallel distributed manner by the Hadoop-based analysis module. A comparative evaluation against a log data processing system that uses only MySQL, measuring log insertion and query performance, demonstrates the proposed system's superiority. Moreover, an optimal chunk size is identified through an evaluation of MongoDB's log insertion performance for various chunk sizes.
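
A small sketch of the MongoDB side of such a pipeline: free-schema log documents are inserted as-is, and an aggregation groups them per time unit, which is the kind of input the log graph generator module would plot. The connection details, field names, and collection layout are assumptions for illustration.

    from datetime import datetime, timezone
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # assumed endpoint
    logs = client["bank"]["logs"]                      # hypothetical names

    # free schema: the documents need not share one structure
    logs.insert_many([
        {"ts": datetime(2013, 6, 1, 9, 0, tzinfo=timezone.utc),
         "branch": "A", "event": "login", "client_id": 42},
        {"ts": datetime(2013, 6, 1, 9, 3, tzinfo=timezone.utc),
         "branch": "A", "event": "transfer", "amount": 1000, "currency": "KRW"},
        {"ts": datetime(2013, 6, 1, 10, 1, tzinfo=timezone.utc),
         "branch": "B", "event": "error", "trace": "stack trace here"},
    ])

    # aggregate log volume per hour and event type (input for one graph)
    per_hour = logs.aggregate([
        {"$group": {"_id": {"hour": {"$dateToString": {"format": "%Y-%m-%d %H:00",
                                                       "date": "$ts"}},
                            "event": "$event"},
                    "count": {"$sum": 1}}},
        {"$sort": {"_id.hour": 1}},
    ])
    for row in per_hour:
        print(row)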

Problems of Applying Information Technologies in Public Governance

  • Goshovska, Valentyna;Danylenko, Lydiia;Hachkov, Andrii;Paladiiichuk, Sergii;Dzeha, Volodymyr
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.8
    • /
    • pp.71-78
    • /
    • 2021
  • The relevance of this research stems from the need to identify the basic problems in the relationship between public governance and information technology, since understanding these interconnections can indicate the consequences of developing and spreading information technologies. The purpose of the research is to outline the issues of applying information technologies in the public governance sphere. 500 civil servants took part in the survey (Ukraine). A two-stage study was conducted to obtain practical results. The first stage involved collecting and analyzing the responses of civil servants on the Mentimeter online platform; in the second stage, the administrator applied a SWOT analysis. The tendencies in using information technologies were determined as follows: development of institutional support; creation of analytical portals to ensure public control; the accountability, transparency, and activity of civil servants; implementation of e-government projects; and a changing philosophy of electronic service development. Considering the threats and risks that applying information technologies poses to the public governance system, the following aspects generated by societal requirements were identified: creation of a digital bureaucracy; preservation of information and digital inequality; an insufficient level of knowledge and skills in digital technologies; and reduced publicity of the state and municipal governance system. The weaknesses of modern public governance in the context of IT implementation include: "digitization for digitalization's sake"; a lack of the necessary legal regulation; inefficient electronic document management (caused by imperfect interfaces for interactive reporting forms, frequent changes in the composition of reporting indicators, and higher authorities' desire to solve the problem merely by introducing the forms); a lack of data-analysis infrastructure (due to poorly organized interaction between departments, the limited capacity of information resources, and the absence of analytical databases); and a lack of the necessary digital competencies among civil servants. Based on the results of the SWOT analysis, the strengths identified were the possibility of continuous communication and constant self-learning; the weaknesses were age restrictions among civil servants and insufficient acquisition of knowledge; the threats were system errors in the provision of services through automation; and the opportunities for introducing IT in the public governance system were broad global trends and facilitation of the document management system. The practical significance of the research lies in providing recommendations for eliminating the problems of IT implementation in the public governance sphere outlined by civil servants.