A Novel encrypted XML streaming technique for indexing data on multiple channels

Vinay K. Ahlawat;Gaurav Agarwal;Vikas Goel;Kueh Lee Hui;Mangal Sain;

doi:10.3837/tiis.2024.07.007

KSII Transactions on Internet and Information Systems (TIIS)

Volume 18 Issue 7 / Pages 1840-1867 / 2024 / 1976-7277 (pISSN) / 1976-7277 (eISSN)

Korean Society for Internet Information (한국인터넷정보학회)

Vinay K. Ahlawat (Department of CS&E, Invertis University) ;
Gaurav Agarwal (Department of CS&E, Invertis University) ;
Vikas Goel (Department of IT, KIET Group of Institutions) ;
Kueh Lee Hui (Department of Electrical Engineering, Dong-A University) ;
Mangal Sain (Division of Computer Engineering, Dongseo University)

Received : 2023.06.27
Accepted : 2024.06.12
Published : 2024.07.31

https://doi.org/10.3837/tiis.2024.07.007

Abstract

This study addresses the indexing of XML data in wireless networks, with an emphasis on data confidentiality. We present a novel indexing method for broadcasting encrypted XML data over wireless networks. The proposed technique uses two channels: one for the index and another for the actual XML data. The method secures the data by encrypting the XML stream, so that mobile devices can access only the parts authorized by their access permissions. Despite an increase in data access time and device tuning time, the study concludes that the proposed indexing technique significantly enhances the security of transmitting XML data over mobile wireless networks.

1. Introduction

As wireless technologies expand, wireless information systems are gaining prominence. Technology development increases the ability to share information with more individuals. A variety of mobile devices, such as laptops, tablets, and smartphones, access information while on the move. In an emergency such as a flood or an earthquake, for instance, information must reach people as fast as possible [6].

In the process of indexing, data arrival information is added to the broadcast so that a client can determine when the required data will be available on the channel. Indexing approaches are influenced by three elements: the broadcast environment (single channel or multiple channels), the data type (simple or XML), and the broadcast schedule. A single-channel environment sends both data items and indexes on one channel, whereas a multichannel environment transmits data across multiple channels. Depending on the data type, the data items in a group may all be the same size or may vary in size. Broadcast scheduling determines how data is distributed across broadcast channels [3,4].

XML, or Extensible Markup Language, is a widely used format for storing and transmitting structured data, including documents, databases, and web services. One of the key challenges in broadcasting XML data over mobile wireless networks is the limited bandwidth and varying quality of service that these networks can provide. To address this challenge, indexing techniques can be used to reduce the amount of data that needs to be transmitted and improve the speed and efficiency of data retrieval.

There are several approaches to indexing XML data for broadcasting in mobile wireless networks. One approach is content-based indexing, which analyzes the content of XML documents to identify relevant keywords and metadata. Another approach is structure-based indexing, which analyzes the structure of XML documents to identify relationships between elements and attributes. Because XML data is kept in plain text, it is becoming a standard for online data delivery [1,2]. This provides a method of data storage that is independent of hardware and software. To make the data accessible, a tree structure is first created from the XML data; it is then converted into a stream and transmitted. Access time and tuning time are calculated from the reception time of data at the client.

In various research works, index management is done with a B+ tree, the traditional and common balanced index type [22]. Managing encoded content in XML is challenging because the hierarchical tree structure leads to varying levels of confidentiality and integrity for different parts of the content. To address issues of efficiency and scalability in dissemination, a tailored approach for XML data is needed. The signature scheme proposed in [23] aims to achieve secure and selective distribution of XML content while maintaining the overall security and privacy of the data.

However, with advanced data structures like the Bpoint tree, constraints such as space efficiency and query response time no longer exist [21]. Nevertheless, these may always remain performance measures of any indexing scheme. Access time and tuning time must be minimized to conserve battery power. Therefore, power consumption and access efficiency are the two key considerations. Access time and tuning time are defined as follows:

• Access time is the total time that elapses from the moment a client sends a query until it receives the response.

• Tuning time is the length of time a client stays tuned in to a channel.
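The difference between the two measures can be illustrated with a minimal simulation. The sketch below is not taken from the paper: it assumes a hypothetical single-channel broadcast cycle in which `"IDX"` marks an index bucket, every bucket carries a pointer to the next index bucket, and time is counted in bucket slots. The function name `access_and_tuning_time` is likewise an illustrative choice.

```python
def access_and_tuning_time(cycle, start, wanted):
    """Return (access_time, tuning_time) in bucket slots for one client.

    cycle  -- bucket labels broadcast repeatedly; "IDX" marks an index bucket
    start  -- slot at which the client issues its query
    wanted -- label of the data bucket the client needs
    """
    if "IDX" not in cycle or wanted not in cycle:
        raise ValueError("cycle must contain an index bucket and the wanted bucket")
    n = len(cycle)
    listened = set()

    # Initial probe: the client listens to the current bucket, which
    # (by assumption) tells it when the next index bucket arrives.
    t = start
    listened.add(t)

    # The client dozes until the next index bucket and listens to it.
    while cycle[t % n] != "IDX":
        t += 1
    listened.add(t)

    # The index gives the arrival slot of the wanted bucket; the client
    # dozes again and wakes up only to download that bucket.
    while cycle[t % n] != wanted:
        t += 1
    listened.add(t)

    access_time = t - start + 1   # every slot from query to response
    tuning_time = len(listened)   # only the slots spent actively listening
    return access_time, tuning_time


cycle = ["IDX", "d1", "d2", "d3", "IDX", "d4", "d5", "d6"]
print(access_and_tuning_time(cycle, start=1, wanted="d5"))
```

In this toy cycle the client waits through six slots but listens during only three of them, which is why tuning time, rather than access time, is the usual proxy for energy consumption.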

Data security on the wireless channel is a significant issue that needs to be addressed. When data is transmitted over wireless networks, it is vulnerable to interception and manipulation by unauthorized parties. Several methods can be used to improve data security on wireless channels. One common approach is to use encryption. Another approach is to use authentication and authorization mechanisms. These controls aid in making sure that only permitted users may access the wireless network and the data being transmitted over it. Overall, ensuring data security on wireless channels is essential for protecting sensitive information and preventing unauthorized access [12-16, 25, 26].

The following is a summary of this paper's significant contributions:

1. We are proposing an encrypted XML stream scheme for broadcasting XML data on wireless channels. First, we encrypt the XML data to be broadcast and then distribute the data across multiple wireless channels. Our proposed approach improves the security of the data transmission and reduces the risk of interception or eavesdropping. Additionally, by indexing the XML nodes over multiple wireless broadcast channels, our proposed approach helps to accelerate XML searching and retrieval time.

2. We provide several algorithms: Algorithm 1 creates a global index for XML documents, Algorithm 2 encrypts and decrypts XML documents, Algorithm 3 constructs the tree structure for XML documents, Algorithm 4 creates an index segment for XML documents, and Algorithm 5 broadcasts the data segment for XML documents. Using the indexes in the generated XML data stream, users can rapidly retrieve the location of XML query results on the wireless broadcast channels.

3. By conducting experiments with various XML data sets and different queries, we assess the effectiveness of the proposed encrypted XML data stream technique over multiple wireless broadcast channels. For processing XML queries, our proposed scheme evaluates the two parameters: access time and tuning time.

4. We have compared our proposed indexing scheme to existing XML data indexing systems, which often rely on centralized servers or databases to manage and index the data. The proposed approach performs better in efficiency parameters: access time and tuning time. Our proposed indexing approach encrypts the XML data and distributes the indexing across multiple wireless broadcast channels.

The rest of this paper is organized as follows. A sample XML data model and a sample XML query language are presented in Section 2. Related work is presented in Section 3, along with a thorough comparison of the various XML indexing methods currently in use. The structure of the proposed XML streaming method, which explains how the stream is created for broadcast, is provided in Section 4. A comparative study of the parameters, together with an examination of the proposed indexing strategy, is presented in Section 5. Section 6 presents the conclusion reached through analysis and comparison.

2. Sample XML Data Model & Query Language

Typically, a tree structure is used to represent an XML document. The document's elements are represented as nodes in the tree structure, and the parent-child relationships between the elements are represented as edges. The paper uses a sample XML document from the ACM SIGMOD RECORD publications database [17] to demonstrate this relationship; Figs. 1 and 2 display the document and the matching tree structure. The tree structure is created by the Simple API for XML (SAX). Overall, modeling an XML document as a tree structure is a helpful strategy because it makes manipulating the document's elements and their relationships simple [20]. Disseminating XML data through wireless broadcast is challenging because certain query elements, such as "*" and "//", degrade performance.

Fig. 1. Sample XML document.

Fig. 2. XML Tree structure corresponding to the XML document in Fig. 1.

To address this issue, the paper [24] introduces a new indexing method called Deterministic Finite Automaton-based Index (DFAI) specifically designed for XPath queries.

In a sample XML document, the following XPath symbols are used for selecting XML nodes:

• "/": specifies a parent-child relationship between two XML nodes in the XML tree.

• "//": specifies an ancestor-descendant relationship between two XML nodes in the XML tree.

• "@": specifies an attribute of an XML node in the XML tree.

• "*": matches any XML node, whatever its node name, in the XML tree.

• "[]": provides a predicate condition in the XML query.
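The symbols above can be tried out with Python's standard `xml.etree.ElementTree` module, which supports a limited subset of XPath. The document below is a small hypothetical stand-in for the SIGMOD Record sample (the actual content of Fig. 1 is not reproduced here), and the variable names are illustrative. Note that ElementTree accepts "@" only inside a "[]" predicate, so attribute values are read with `get()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document, loosely modelled on a publications record.
SAMPLE = """<SigmodRecord>
  <issue volume="11" number="1">
    <articles>
      <article>
        <title>Paper A</title>
        <author position="00">Ann</author>
      </article>
      <article>
        <title>Paper B</title>
        <author position="00">Bob</author>
        <author position="01">Cy</author>
      </article>
    </articles>
  </issue>
</SigmodRecord>"""

root = ET.fromstring(SAMPLE)

# "/"  : parent-child steps, starting from the children of the root element
articles = root.findall("issue/articles/article")

# "//" : ancestor-descendant, any depth below the root
authors = root.findall(".//author")

# "*"  : any child element of <issue>, whatever its name
issue_children = root.findall("issue/*")

# "@" inside "[]" : predicate on an attribute value
second_authors = root.findall(".//author[@position='01']")

# "[]" : predicate on the text of a child element
paper_b = root.findall(".//article[title='Paper B']")

print(len(articles), [a.text for a in authors])
print([e.tag for e in issue_children])
print(second_authors[0].text, paper_b[0].find("title").text)
print(root.find("issue").get("volume"))  # attribute access via get()
```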

3. Related Work

In [7], the authors provided three indexing techniques by using simple path XML queries:

One Sibling Address (OSA) is a technique used in XPath to select nodes in an XML document based on their position relative to their parent node. In an XML document, nodes are organized hierarchically, with each node having a parent node and zero or more child nodes. OSA allows us to select a specific child node based on its position relative to its siblings.

The syntax for OSA in XPath is: parent::node/*[position]

The position value in OSA starts from 1 for the first child node, and increments by 1 for each subsequent child node. If the position value is out of range (e.g., greater than the number of child nodes), then no node will be selected.

Two Sibling Address (TSA) is a technique used in XPath to select nodes in an XML document based on their position relative to their parent and grandparent nodes. In an XML document, nodes are organized hierarchically, with each node having a parent node and zero or more child nodes. TSA allows us to select a specific child node based on its position relative to its siblings and its grandparent node.

The syntax for TSA in XPath is: grandparent::parent/*[position]

TSA is useful when there are multiple levels of hierarchy in an XML document, and when we need to specify the exact location of a node relative to its siblings and grandparent node.

SPA (Same-Path-Address): This allows each S-Node to have a second address. This address is associated with the nearest node (cousin or sibling) with the same path address. By building a chain of S-Nodes with the same path address, this indexing technique adds a level of organization to the XML stream and can locate the necessary XML nodes more effectively. This can make it easier and more efficient to retrieve specific nodes from the stream, as the path address provides a quick and easy way to locate the desired node [7].

Park et al. [8] proposed an XML streaming structure for push-based broadcasting of XML data that utilizes the path summary approach. The basic idea is to use a technique called path summary, which pre-computes a summary of the structure of the XML data. The summary includes the number of nodes and their paths, and it is sent to the clients. This allows the clients to efficiently navigate and extract data from the XML structure as it is being streamed to them in a push-based manner, without requiring the server to recompute the structure for each client [9].

The XML streaming structure itself can be thought of as a hierarchical tree, where each node represents a portion of the XML document. The root of this hierarchical tree represents the entire document, and each subsequent level represents a subset of the document, with each node containing summary information about its children. As the XML data is being streamed to the clients, the server sends updates to the path summary information, indicating which nodes have been added or removed. The clients use this information to efficiently navigate the XML structure and extract the data they need [9].
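As a rough illustration of the path summary idea, the following sketch collects every distinct root-to-node path in a document together with the number of nodes that share it. It is a simplified reading of the description above, not the structure defined in [8]; the function name `path_summary` and the sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def path_summary(root):
    """Map each distinct root-to-node path to the number of nodes on it."""
    counts = Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return counts


doc = ET.fromstring("<a><b><c/><c/></b><b><c/></b><d/></a>")
for path, count in path_summary(doc).items():
    print(path, count)
```

A client holding such a summary can tell, before tuning in to the data, whether a simple path query has any matching node and how many nodes to expect.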

Park et al. [9] devised a distributed indexing approach called DIX. DIX is a distributed indexing approach to efficiently index and query XML data in a distributed environment. In DIX, each XML node is associated with several types of indexing information, including Foreign Node Link (FNL), two indexes called Content Index (CL) and Foreign Node Link Index (FL), and Location Path Information (LPI) for the FL index. The FNL is used to link related nodes across different documents or nodes distributed across different nodes in a cluster. The CL index is a traditional inverted index that indexes the content of each XML node, allowing for efficient keyword-based searches. The FL index is used for indexing the FNLs of nodes. It allows for efficient navigation and retrieval of related nodes. Finally, the LPI provides additional information such as the location path of the nodes in the XML document for the FL index.

By using multiple types of indexing information and distributing the index across multiple nodes, DIX can efficiently index and query XML data in a distributed environment. Additionally, by using the FNL to link related nodes, DIX can handle XML data that is distributed across multiple nodes or documents, which is a common scenario in many real-world applications [10].

The MD5 hash algorithm is used to compress the XML stream, which helps to reduce the stream's size and makes it more manageable. This compression process converts a variable-size LPI (location path index) into a fixed size of 16 bytes. This fixed size allows for more efficient indexing and retrieval of data from the stream. The clustering-based indexing technique C-DIX groups XML nodes with the same depth, which helps to organize the data in the stream and makes it easier to search for specific nodes. By clustering nodes together based on their depth, mobile clients can search a subset of the XML stream rather than having to search the entire stream. This can help to reduce processing time and improve the performance of XML queries [11].

One limitation is that these methods do not handle very complex XML queries, that is, queries involving wildcards, descendant axes, and predicate conditions in different location path steps. This is because the fixed-size LPIs generated by the MD5 hash function cannot capture the full complexity of the queries. Another limitation is that the XML stream's size may still be too large, even after compression and clustering. Each XML node still has a 16-byte LPI and an XML tag name, which can add up to a significant amount of data in large XML streams [11].

A unique indexing system for managing twig pattern XML searches across the XML data stream was presented in [10]. The indexing mechanism is designed to efficiently handle queries that involve complex twig patterns, which are commonly used in applications such as XML data integration and web search engines. The mechanism is based on twiglets: a twiglet is a small twig pattern that captures the structural relationships between a subset of nodes in an XML document. Indexing the XML data stream by twiglets allows for efficient retrieval of data that matches the twig pattern.
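The fixed 16-byte size follows directly from MD5's 128-bit digest. The snippet below shows this with Python's standard `hashlib`; the helper name `lpi_digest` and the example location paths are illustrative assumptions, and the exact byte encoding of an LPI in the cited work is not specified here.

```python
import hashlib


def lpi_digest(location_path: str) -> bytes:
    """Compress a variable-length location path into a 16-byte MD5 digest."""
    return hashlib.md5(location_path.encode("utf-8")).digest()


short_path = "/SigmodRecord/issue"
long_path = "/SigmodRecord/issue/articles/article/authors/author"

for path in (short_path, long_path):
    digest = lpi_digest(path)
    print(len(path), "characters ->", len(digest), "bytes:", digest.hex())
```

Because the digest is one-way, a client can only test a fully specified path for equality against it, which is consistent with the difficulty of supporting wildcards and descendant axes on hashed paths.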

The indexing mechanism involves several steps. First, the XML data stream is preprocessed to identify and extract all twiglets that occur in the data. Next, each twiglet is indexed using a compact data structure that captures its structural relationships. Finally, queries that involve twig patterns are processed by using the indexed twiglets to efficiently locate the nodes that match the query. One benefit of this indexing mechanism is that it is designed to handle twig patterns that involve multiple levels of nesting and complex structural relationships, which can be challenging to handle using traditional indexing techniques [10].

In [11], the authors presented a novel XML stream structure called PS+Pre/Post, which is designed to handle various sorts of simple and complex pattern XML queries over an XML stream. The structure includes multiple indexing algorithms to support efficient query processing. The PS+Pre/Post structure consists of two main components: the Path Summary (PS) and the Pre/Post indexes. The PS is a compact summary of the structure of the XML data stream, including information about the nodes and their relationships. The Pre/Post indexes are two complementary indexing algorithms that are used to index the PS and support efficient querying.

The Pre-index is used to index the paths in the PS that correspond to the starting points of a query. This allows the system to efficiently locate the nodes that match the starting points of a query. The Post index, on the other hand, is used to index the paths in the PS that correspond to the ending points of a query. This allows the system to efficiently locate the nodes that match the ending points of a query. By using both the Pre and Post indexes in combination, the PS+Pre/Post structure can efficiently manage a wide variety of simple and complex pattern XML queries over an XML stream. Additionally, the compact size of the PS and the efficient indexing algorithms used by the Pre/Post indexes allow the system to handle large XML streams with low memory overhead [11-18].

The authors described a new structure for processing XML streams that incorporates the ideas of path summary and pre/post-labeling methods. The structure consists of a data segment and an index segment, which are created from the XML stream structure described in [11]. The index segment provides information on the root-to-node paths in the XML stream, which can be used to quickly locate the XML nodes that have a given root-to-node path. The data segment, on the other hand, provides information about the XML nodes that share a root-to-node path. Nevertheless, some information about XML nodes in the XML tree, such as text content and attributes, is not summarized by the new structure. As a result, the size of the XML stream is not reduced by this indexing technique. Several methods for effectively broadcasting and replicating XML data via wireless mobile networks, based on XML stream structures, were described in previous publications [18,19].
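Pre/post labeling in its standard form assigns each node its preorder and postorder rank, after which ancestor-descendant tests reduce to two integer comparisons. The sketch below illustrates that general technique only; it is not claimed to reproduce the exact labels or stream layout of the structure in [11], and the function names are assumptions.

```python
import xml.etree.ElementTree as ET


def pre_post_labels(root):
    """Assign (preorder, postorder) ranks to every element of the tree."""
    labels = {}
    counters = {"pre": 0, "post": 0}

    def visit(node):
        pre = counters["pre"]          # rank when the node is first entered
        counters["pre"] += 1
        for child in node:
            visit(child)
        labels[node] = (pre, counters["post"])  # rank when the node is left
        counters["post"] += 1

    visit(root)
    return labels


def is_ancestor(labels, a, d):
    """a is an ancestor of d iff a starts before d and finishes after d."""
    (pre_a, post_a), (pre_d, post_d) = labels[a], labels[d]
    return pre_a < pre_d and post_a > post_d


doc = ET.fromstring("<a><b><c/></b><d/></a>")
labels = pre_post_labels(doc)
b, d = doc.find("b"), doc.find("d")
c = b.find("c")
print({node.tag: label for node, label in labels.items()})
print(is_ancestor(labels, doc, c), is_ancestor(labels, b, d))
```

With such labels attached to the nodes of a stream, a "//" step can be checked without walking the tree between the two nodes.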

Some of the strategies mentioned in the literature use a single broadcast channel [12-14] to copy and disseminate the XML data, while others use multiple channels [15]. Using multiple channels allows for the duplication and dispersion of XML data, which can help to reduce the size of the XML stream in each channel. By lowering access time, this size reduction helps accelerate the execution of XML queries.

The authors suggested that there are multiple approaches to efficiently disseminating XML data via wireless networks. The choice of strategy depends on various factors, such as the number of available broadcast channels and the processing speed required for XML queries. The choice of strategy may also depend on the size and complexity of the XML stream being disseminated, as well as the capabilities of the mobile devices receiving the XML data.

4. Comparative Study

The following list of variables should be taken into account while comparing the various indexing techniques for XML data broadcast:

• Method Type: Is the suggested approach a replication/distribution method or an indexing method?

• Size of XML Queries: This indicates if the proposed indexing technique can handle various XML queries with basic paths and twig patterns.

• XML Stream Size: How much may the XML stream be compressed using the specified indexing strategy?

• The kind of improvement in XML query processing: Whether the proposed technique can shorten the access time and tuning time needed to process an XML query.

• Traversal method used for data access

• Relationships such as parent-child and child-child between nodes in the XML tree.

• Type of indexing i.e. kind of indexing tree generation.

Various indexing techniques along with their parameters of concern are summarized in Table 1 for XML data broadcast in wireless channels.

Table 1. Listing of current XML indexing techniques with parameters to compare

For the indexing schemes in [8–14, 16], access time increases as the size of the broadcast data increases. The cause of this increase is sequential data access: the indexing may differ, but the data placement is sequential. In [15], an indexing scheme with multiple broadcast channels is proposed that improves the access time significantly by increasing the number of broadcast channels. However, this multichannel indexing scheme raises a security concern: it provides no authentication for the broadcast data, so anyone on the channel may access the data. Table 1 summarizes all the indexing schemes discussed, with their key factors.

5. The proposed Encrypted XML Stream Scheme

In XML (eXtensible Markup Language) databases or systems, querying for specific information involves navigating through XML nodes, which are hierarchical structures that organize the data. XML encryption can be used to ensure the confidentiality of transmitted messages. A message can be encrypted completely or only in part. However, employing XML encryption (either apart from or together with XML digital signatures) may have security repercussions.

A method for preserving the integrity and secrecy of data stored in XML documents is XML encryption. It has advantages in terms of security, but it also raises some difficulties and problems. Here are a few examples of the issues that using XML encryption may result in:

1. Increased Complexity: The building of applications and the processing of data can both become much more complex when using XML encryption. Data encryption and decryption require additional code, and this complexity might make the XML structure more challenging to manage and comprehend. Additionally, the system becomes more sophisticated due to the use of cryptographic libraries and encryption techniques, which could result in higher development and maintenance expenses.

2. Key Management: Encrypted data requires securely managed keys. The secure distribution, storage, and revocation of encryption keys are prerequisites for XML encryption. It can be difficult to manage keys for several encrypted XML documents while maintaining their security. Ineffective management of encryption keys can result in data loss and breaches.

3. Searchability and Indexing: It is difficult to conduct searches, queries, and indexing operations on the material contained in XML documents when data is encrypted using XML encryption. Without initially decrypting the full document, it is impossible to look for certain values or elements within encrypted data because it appears as random cipher text. This may make it more difficult for programs to retrieve data effectively.

4. Data Size Increase: The size of the XML documents typically increases with XML encryption. Larger file sizes are the result of the encryption process, which adds metadata, padding, and other information to the contents. This may affect the amount of storage space and network bandwidth needed. The performance of programs can also be impacted by the growth in data size, particularly when moving data across networks or storing it in databases.

5. Interoperability: When attempting to interact with systems or programs that do not support XML encryption standards or make use of other encryption methods, XML encryption may encounter difficulties. This may cause integration issues and obstruct the efficient transfer of data across various systems.

6. Performance Overhead: Computing resources are used in the encryption and decryption of XML data, which might result in performance overhead. This is especially important for systems that frequently need to encrypt and decrypt huge amounts of XML data.

When determining whether to use XML encryption in your systems, it's crucial to give these concerns significant thought. The advantages of data security may outweigh the difficulties presented by XML encryption, depending on your particular use case and circumstances. To ensure the secure and effective usage of XML encryption in our proposed indexing XML technique, it is essential to plan for key management, data indexing, and other potential concerns.

Because the receiver key is public when using public key cryptography, XML encryption protects message confidentiality but not message integrity. The following are XML encryption best practices:

• Verify the sender's identity by using digital signatures and XML encryption.

• Data elements within a message can be encrypted to prevent brute force attacks by employing a powerful encryption cipher with a long enough key length.

• Encrypt messages that include sensitive data with a powerful encryption cipher (with transport encryption, message encryption, or both).

• Use strong data encryption (not transport encryption) to encrypt messages that include sensitive data that must stay encrypted at rest once the message is received.

• Super-encryption is the procedure of encrypting data that is already encrypted. An XML document may contain zero or more EncryptedData elements. An EncryptedData element cannot be the parent or child of another EncryptedData element. However, the data being encrypted can be anything, including EncryptedData and EncryptedKey elements. During super-encryption, the entire EncryptedData or EncryptedKey element must be encrypted.

• Data or encrypted keys can utilize a security token to identify the encryption key that corresponds to the key needed for decryption. The decryption key can be identified without specifying a trust path or the precise contents of the certificate.

Our proposed scheme is done in two phases: (a) constructing a secured stream structure for the XML document; and (b) running XML queries using this proposed encrypted XML stream structure. This allows the XML data to be distributed across numerous wireless broadcast channels.

5.1 The Proposed Technique for Indexing the XML Data

Compared with earlier approaches, the proposed encrypted XML streaming approach shortens access and tuning times. Its effectiveness is increased by using multiple channels.

The index segment and data segment in the proposed encrypted XML data indexing method refer to how the XML data stream is divided and stored. The index segment is a section of the data stream that contains metadata and information about the structure of the XML document. The names of the elements, attributes, and their values are all part of this information. The index segment serves as a roadmap to help locate specific elements or attributes in XML documents. The data segment is the portion of the data stream that holds the actual data of the XML document. This contains the attributes' values as well as the text contained in the elements. The data segment is where the actual content of the XML document is stored.

The proposed encrypted XML data placement method uses these two segments to enable more efficient and secure storage and retrieval of XML data. By encrypting the index and data segments separately, and then storing them in different locations or using different encryption methods, the security of the XML data can be enhanced. Additionally, by separating the index and data segments, the retrieval and storage of XML data can be optimized, as only the necessary segment needs to be accessed at any given time.

In our proposed encrypted XML data indexing scheme, the index part is always broadcast on the first channel: the index channel. The data part is always transmitted on the other channel: the data channel. Both the index part and the data part of the XML data indexing scheme are encrypted for authentication. Mobile clients use the index information to determine the channel number and the arrival time of XML data on the wireless channels. The index and data segments are defined as follows:

• Index Segment: Information on the related XML nodes, their offspring, and siblings, is provided in this segment.

• Data Segment: The associated XML nodes' structural information is contained in the data segment. It should be noted that the information in this segment can be used to access the text and attributes in the text & attribute segment.

The proposed encrypted XML data indexing algorithm consists of the following four steps, which create the global index for an XML document:

Step 1: A document in XML format is used to create the global index.

Step 2: The XML document is divided into manageable pieces to establish the data buckets. These data buckets are then encrypted for broadcast security.

Step 3: The index segment is sent to the first broadcast channel: the index channel.

Step 4: The data segments are sent to the second broadcast channel: the data channel.
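The four steps can be pictured with the following minimal sketch. It is an illustration under explicit assumptions, not the paper's algorithm: the bucket size, the index entry fields (`bucket`, `channel`, `slot`), and the function names are hypothetical, and `toy_cipher` is a placeholder XOR keystream that stands in for whatever cipher the scheme uses; it must not be treated as secure.

```python
import hashlib
import json


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Placeholder only, NOT secure.

    Applying the function twice with the same key restores the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def build_channels(xml_text: str, key: bytes, bucket_size: int = 64):
    """Return (index_channel, data_channel) for one XML document."""
    raw = xml_text.encode("utf-8")  # Step 1: start from the XML document

    # Step 2: split into buckets, then encrypt each bucket separately.
    buckets = [raw[i:i + bucket_size] for i in range(0, len(raw), bucket_size)]
    data_channel = [
        toy_cipher(bucket, key + f"data{i}".encode())
        for i, bucket in enumerate(buckets)
    ]

    # Step 3: the index lists where each bucket arrives; it is encrypted too
    # and goes on the first channel (the index channel).
    entries = [{"bucket": i, "channel": 2, "slot": i} for i in range(len(buckets))]
    index_channel = toy_cipher(json.dumps(entries).encode(), key + b"index")

    # Step 4: the encrypted buckets go on the second channel (the data channel).
    return index_channel, data_channel


def client_read(index_channel: bytes, data_channel, key: bytes) -> str:
    """Decrypt the index first, then fetch and decrypt the listed buckets."""
    entries = json.loads(toy_cipher(index_channel, key + b"index"))
    parts = [
        toy_cipher(data_channel[e["slot"]], key + f"data{e['bucket']}".encode())
        for e in entries
    ]
    return b"".join(parts).decode("utf-8")


document = "<a>" + "<b>x</b>" * 30 + "</a>"
index_channel, data_channel = build_channels(document, b"shared-key")
print(len(data_channel), client_read(index_channel, data_channel, b"shared-key") == document)
```

A client without the key sees only ciphertext on both channels, while an authorized client reads the index once and then tunes in to the data channel only at the listed slots.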

Algorithm 1: Create a Global Index for XML Document