A Novel Cryptosystem Based on Steganography and Automata Technique for Searchable Encryption

  • Truong, Nguyen Huy (School of Applied Mathematics and Informatics, Hanoi University of Science and Technology)
  • Received : 2019.07.05
  • Accepted : 2020.04.01
  • Published : 2020.05.31

Abstract

In this paper, we first propose a new cryptosystem with high security based on our data hiding scheme (2,9,8) introduced in 2019, in which encrypting and hiding are done at once and, unlike in existing hybrid techniques of cryptography and steganography, the ciphertext does not depend on the input image size. We then exploit our automata approach presented in 2019 to design two algorithms for exact and approximate pattern matching on secret data encrypted by our cryptosystem. Theoretical analyses show that both algorithms have O(n) time complexity in the worst case, where the approximate algorithm is assumed to use ⌈(1−ε)m⌉ processors, and ε, m and n are the error of our string similarity measure, the length of the pattern and the length of the secret data, respectively. In searchable encryption, our cryptosystem is used by users and our pattern matching algorithms are performed by cloud providers.

Keywords

1. Introduction

1.1 Background

Nowadays, with the rapid development of applications based on Internet infrastructure, cloud computing has become one of the hottest topics in information technology. It is an Internet-based computing system that provides on-demand services ranging from application and system software to storage and data processing. For example, when cloud users use the storage service, they can upload information to the servers and then access it online. Meanwhile, enterprises no longer need to spend large sums on owning and maintaining their own hardware and software systems. Although cloud computing brings many benefits to individuals and organizations, cloud security is still an open problem, since cloud providers can abuse users’ information and cloud users lose control of it. Thus, guaranteeing the privacy of tenants’ information without negating the benefits of cloud computing seems necessary [8, 11, 12, 13, 16, 32, 33].

In order to protect cloud users’ privacy, sensitive data need to be encrypted before being outsourced to servers. Unfortunately, encryption makes searching on ciphertext much more difficult for the servers than searching on plaintext. To solve this problem, many searchable encryption (SE) techniques have been presented since 2000. SE not only stores users’ encrypted data securely but also allows information search over ciphertext [7, 8, 9, 11, 12, 16, 19, 22, 32].

In cryptography, SE is a cryptosystem such that search can be done on encrypted data directly. SE can be either searchable symmetric encryption (SSE) or searchable asymmetric encryption (SAE). In SSE, only private key holders can create encrypted data and produce trapdoors for search. In SAE, users who have the public key can make ciphertexts but only private key holders can generate trapdoors [7, 8, 12, 16, 32].

Many SE methods address the problem of searching for pre-chosen keywords in ciphertext. For this problem, suppose that each data item (document) contains a set of keywords. Then there are two approaches to SE. The first is to create an index which maps each document to its keywords (forward index) or each keyword to the corresponding documents (inverted index). The second is to do a sequential search without an index. Recently, to make search more flexible and to avoid wrong or empty matching results, new methods supporting approximate (fuzzy) keyword search have also been studied, in addition to traditional solutions that only provide exact keyword search [7, 8, 9, 11, 12, 13, 16, 19, 22, 25, 32].

However, keyword-based SE faces a problem. The keywords must be determined in advance and encrypted in some form before all encrypted files are uploaded to the cloud. Searching for new keywords can then return false results: a keyword that occurs in the user data but was not included in the set of defined keywords cannot be found. Furthermore, for index-based SE, very large indexes make keyword search inefficient [7, 12, 13, 19].

To deal with the above problems, several SE techniques have been proposed, such as supporting file update functionality [9, 12, 16, 32], keeping the index file small [19], and providing pattern matching in which the search pattern is only given at search time [7, 13, 25].

Pattern matching is applied every day to search for information and analyze data, for example in find and replace in text editing systems, in the Google search engine, in database queries and in searching on genomic data [7, 26, 28].

Here, our work takes an interest in the problem of pattern matching on encrypted data, which is an important research direction in SE.

Despite its importance, this problem has not received adequate attention. To the best of our knowledge, only a few SE methods exist for exact pattern matching, and none for approximate pattern matching. Haynberg et al. [13] introduced SSE for exact pattern matching by using directed acyclic word graphs in the encryption algorithm (for more details about this data structure, see [3]). However, their technique requires partial decryption of the ciphertext, so the plaintext may be leaked to attackers. Further, the search is performed by the user. Strizhov et al. [25] allowed pattern search on ciphertexts using the position heap tree data structure (see [10] for more details about the position heap). In this method, the server does not search the encrypted data directly but only an index constructed from the secret data. Desmoulins et al. [7] proposed SE for exact pattern matching, where the search phase Test is a pattern matching algorithm whose time complexity is O(mn) in the worst case, where m and n are the lengths of the pattern and the secret data.

The goal of this paper is to propose a novel symmetric cryptosystem that is used on the user side, and algorithms for exact and approximate pattern matching on ciphertexts that are used on the cloud server side. These are essential components of SSE.

Cryptography and data hiding are two branches of information security. Cryptography is used to distort data so that it cannot be understood by attackers; it includes symmetric and asymmetric cryptography. Data hiding, in contrast, is used to hide data in digital media such as image, audio and video files. It can be classified into steganography, which protects secret data by hiding their existence, and watermarking, which protects digital media by embedding watermarks in them [6, 27, 34].

Although cryptography and steganography are each capable of protecting secret data separately, different combinations of them are being developed to create systems with better security. The well-studied hybrid technique of cryptography and steganography is to encrypt the secret data using cryptography and then embed the ciphertext using steganography [2, 5, 30]. For gray images, Song et al. [23] introduced the first method which encrypts and embeds at once. However, since these methods must all guarantee acceptable imperceptibility of the digital media, the amount of secret data that can be hidden is limited by the size of the media. Among digital media formats, images are the most popular carriers for steganography because digital images are often transmitted on the Internet and have a high degree of redundancy. Furthermore, image steganography is mainly performed in the spatial domain [4, 14, 31]. So, to address the limitation of the existing hybrid methods, we propose a novel approach to construct a new cryptosystem based on spatial domain image steganography.

In our approach, we use the data hiding scheme (2,9,8), a block-based method in the spatial domain, where 9 is the number of pixels in any image block and 8 is the number of secret bits that can be embedded in a block by changing the colors of at most 2 pixels in the block. This scheme is near optimal for gray and palette images, with high efficiency in embedding capacity, speed, security and visual quality, which are the main properties of data hiding schemes [27]. Since our cryptosystem is designed to solve the problem of pattern matching on encrypted data, and under the assumption that the secret data is a string over an alphabet of size 256, the cryptosystem encrypts the letters of the secret data one by one.

For a given letter in the alphabet, corresponding to an 8-bit string, we use the embedding function of the data hiding scheme (2,9,8) to compute the information (called the flip information) needed to change the input image block. This flip information consists of the positions of pixels in the block and the way the colors of these pixels are changed. The code word of the letter is a binary string representing the flip information. The security analyses show that our cryptosystem provides high security with a key space of \(2^{20} 3^{9} 9!\, 2^{90} 2^{8}!\) for gray images and \(2^{20} 3^{9} 9!\, 2^{18+9t} 2^{8}!\) for palette images, where t is the number of bits representing color indexes.

We now return to the remaining main objective of our work, the problem of pattern matching on encrypted data on the cloud provider side. In our earlier results [28, 29], the automata technique was applied to the problem of exact pattern matching and the longest common subsequence problem on plaintexts, with the aim of designing algorithms that are effective in practice. In this paper, we apply those algorithms to construct exact and approximate pattern matching algorithms on ciphertexts, performed by servers. Our main idea is to encrypt the automaton corresponding to a given pattern and to consider this encrypted automaton as a part of the trapdoor. Theoretical analyses show that our algorithms all have O(n) time complexity in the worst case, where the approximate algorithm is assumed to use ⌈(1−ε)m⌉ processors, and ε, m and n are the error of the string similarity measure, the length of the pattern and the length of the secret data, respectively.

1.2 Contributions

Our contributions in this paper can be summarized as follows.

1. We propose a novel approach to construct a cryptosystem based on steganography. The outstanding advantages of our cryptosystem are that encrypting and hiding are done at once and that, unlike in existing hybrid techniques of cryptography and steganography, the ciphertexts do not depend on the input image size. In particular, this cryptosystem can be applied in SSE to encrypt and decrypt secret data by users.

2. We propose two algorithms, one sequential and one parallel, for exact and fuzzy pattern matching on secret data encrypted by our cryptosystem. These algorithms can be used by servers in SSE to perform pattern search. Their outstanding feature is that they can be applied to any cryptosystem that supports encrypting the letters of the secret data one by one.

1.3 Organization

We organize the rest of this paper as follows. In Section 2, we recall some terminologies, definitions and results from [24, 27, 28, 29]. Section 3 proposes the algorithms used by users and servers in SSE. In Subsection 3.1, we construct a novel cryptosystem based on the data hiding scheme in [27], apply this cryptosystem to the process of encrypting and decrypting a secret data sequence, and discuss some security analyses. In Subsection 3.2, we use the automata approach of our previous work [28, 29] to design exact and approximate pattern matching algorithms on secret data encrypted by our cryptosystem. Finally, Section 4 provides our conclusions.

2. Preliminaries

In this section, we will attempt to recall terminologies, definitions and results in [24, 27, 28, 29], which are really needed in order to present our new results clearly and logically, as well as help readers follow our paper’s content easily.

Now, we start with our near optimal data hiding scheme (2, 9, 8) proposed in [27], one of our data hiding schemes based on the Galois field \(G F\left(2^{2}\right)\) constructed from the polynomial ring \(Z_{2}[x]\) [24]. This scheme is the core material for constructing our new cryptosystem.

The data hiding scheme (2, 9, 8) is a five tuple \((\mathrm{I}, \mathrm{M}, \mathrm{K}, E m, E x)\), where \(\mathrm{I}\) is a set of all image blocks of 9 pixels with the same image format, \(\mathrm{M}=G F^{4}\left(2^{2}\right)\), \(\mathrm{K}\) is a finite set of secret keys of 9 elements of \(G F\left(2^{2}\right)\), and Em and Ex are designed as follows [27].

Without loss of generality, we assume that \(I \in \mathrm{I}\) and \(K \in \mathrm{K}\) are given by

\(I=\left\{I_{1}, I_{2}, \ldots, I_{9}\right\}\),

where \(I_{i}\) is a color index in the palette for palette images or a color value for gray images of the \(i^{t h}\) pixel in I, \(\forall i=\overline{1,9}\).

\(K=\left\{K_{1}, K_{2}, \ldots, K_{9}\right\}\)

with \(K_{i} \in G F\left(2^{2}\right)\), \(\forall i=\overline{1,9}\).

Given a secret element \(M \in \mathrm{M}\), an image block \(I \in \mathrm{I}\) and a key \(K \in \mathrm{K}\), let G be the flip graph for gray and palette images and \(A(I, M, K)\) be the automaton, both constructed as in [27]. Here \(q_{0}\) is the initial state and δ is the state transition function of \(A(I, M, K)\), \(\text { Adjacent }\left(I_{i_{t}}, a_{t}\right)\) is an adjacent vertex of the vertex \(I_{i_{t}}\) in G and \(a_{t}\) is the weight of the arc \(\left(I_{i_{t}}, \text { Adjacent }\left(I_{i_{t}}, a_{t}\right)\right)\). Then we have [27]

The embedding function Em embedding M in I:

\(q=q_{0};\)       (2.1)

\(\text { For } i=1 \text { to } 9 \text { Do } q=\delta\left(q, I_{i}\right);\)       (2.2)

\(q=\delta(q, M);\)       (2.3)

\(\text { For each }\left(i_{t}, a_{t}\right) \text { in } q \text { Do } I_{i_{t}}=\operatorname{Adjacent}\left(I_{i_{t}}, a_{t}\right);\)       (2.4)

\(I^{\prime}=I;\)       (2.5)

The extracting function Ex extracting M from I′ :

\(q=q_{0};\)       (2.6)

\(\text { For } i=1 \text { to } 9 \text { Do } q=\delta\left(q, I_{i}^{\prime}\right);\)       (2.7)

\(M=q ;\)       (2.9)

Remember that the data hiding scheme (2,9,8) means we can hide a secret string of length 8 bits in an image block of 9 pixels with at most 2 pixels modified.

By our construction of the data hiding scheme (2,9,8), and assuming that we publish the parameters Em, Ex, the vector space \(G F^{4}\left(2^{2}\right)\) and the flip graph G of this scheme, the security of the data hiding scheme (2,9,8) is given by the following formula [27]

\(c 3^{9} 9 ! 2^{18} 2^{8} !, \text { where } c \approx 2^{20}\).       (2.10)

We then recall the components and properties of a cryptosystem in [24].

Definition 2.1 ([24]). A five tuple \((\mathcal{P}, \mathcal{C}, \mathcal{K}, \mathcal{E}, \mathcal{D})\) is called a cryptosystem if the following properties hold.

1. \(\mathcal{P}\) is a finite set of plaintexts.

2. \(\mathcal{C}\) is a finite set of ciphertexts.

3. \(\mathcal{K}\) is a finite set of secret keys.

4. For \(\forall k \in \mathcal{K}\), there exist an encrypting function \(e_{k} \in \mathcal{E}\) and a corresponding decrypting function \(d_{k} \in \mathcal{D}\), where \(e_{k}: \mathcal{P} \rightarrow \mathcal{C}\) and \(d_{k}: \mathcal{C} \rightarrow \mathcal{P}\) satisfy \(d_{k}\left(e_{k}(x)\right)=x\) for \(\forall x \in \mathcal{P}\).

Next, we present main terminologies and facts in [28] to design the exact pattern matching on encrypted data.

We call a finite set \(\Sigma\) an alphabet and denote the size of \(\Sigma\) by \(|\Sigma|\). Any element of \(\Sigma\) is called a letter. A string over \(\Sigma\) is a finite sequence of letters of \(\Sigma\). Denote the set of all strings over \(\Sigma\) by \(\Sigma^{*}\). The empty string is denoted by \(\varepsilon\). The length of the string x, denoted by |x|, is the number of letters of x. Any string x of length n can be written as

\(x=x[1] x[2] \ldots x[n], x[i] \in \Sigma, 1 \leq i \leq n\),

where n is a positive integer.

Denote the concatenation operator of the two strings u1 and u2 by u1u2.

A string p is called a substring of the string x if \(x=u_{1} p u_{2}\) for some strings \(u_{1}\) and \(u_{2}\). In case \(u_{1}=\varepsilon\) (resp. \(u_{2}=\varepsilon\)), the string p is called a prefix (resp. suffix) of x. If p ≠ x, the prefix (resp. suffix) p is called proper.

We denote the ith element of x by x[i] and i is called a position in x, the substring x[i]x[i+1]..x[j] of x by x[i..j] for ∀1 ≤ i ≤ j ≤ n. Let p be a substring of length m of x, where m is a positive integer, then there exists i for 1 ≤ i ≤ n − m + 1 such that p = x[i..i + m − 1]. We say that i is an occurrence of p in x or p occurs in x at position i.

Definition 2.2 ([28]). Given a string p and a letter a of p, let i be a position in p, 1 ≤ i ≤ |p|. Then i is called the last position of appearance of a in p, denoted by \(\operatorname{Pos}_{p}(a)\), if a = p[i] and a ≠ p[j] for ∀j, i < j ≤ |p|.

Definition 2.3 ([28]). Let p be a pattern of length m over the alphabet Σ. Then \(\operatorname{Next}_{p}\) is the function \(\operatorname{Next}_{p}:\{1, \ldots, m\} \rightarrow\{0, \ldots, m-1\}\) defined by \(\operatorname{Next}_{p}(l)=\max \{|s| : s \text { is both a suffix and a proper prefix of } p[1..l]\}\) for l ∈ {1,...,m}.

Lemma 2.4 ([28]). Let p be a pattern and x be a text over the alphabet Σ, and suppose that the degree of appearance of p in x at the position i is equal to l, 0 ≤ l ≤ |p|. Then the degree of appearance l’ at the position i+1 in x is given by the formula \(l^{\prime}=\operatorname{Appearance}_{p}(l, a)\), where a = x[i+1] and the function \(\operatorname{Appearance}_{p}\) corresponding to p is determined by

\(\operatorname{Appearance}_{p}(l, a)=\left\{\begin{array}{ll} 0 & l=0, a \neq p[1] ; \text { or } a \notin p, \\ 1 & l=0, a=p[1], \\ l+1 & 0<l<|p|, a=p[l+1], \\ \operatorname{Appearance}_{p}\left(\operatorname{Next}_{p}(l), a\right) & 0<l<|p|, a \neq p[l+1] ; \text { or } l=|p| . \end{array}\right.\)

Theorem 2.5 ([28]). Let p be a pattern of length m and \(A_{p}=\left(\Sigma, Q_{p}, q_{0}, \delta_{p}, F_{p}\right)\) corresponding to p be an automaton over the same Σ, where

• Qp = {0,1,...,m} is a set of states,

• q0 = 0 is the initial state,

• Fp = {m} is a set of final states,

• δp is the transition function satisfying δp : Qp × Σ→ Qp and δp(q,a) = Appearancep(q,a), where the function Appearancep corresponding to p as given in Lemma 2.4,

• To accept an input string, we extend the transition function to \(\delta_{p}: Q_{p} \times \Sigma^{*} \rightarrow Q_{p}\) such that \(\forall q \in Q_{p}, \delta_{p}(q, \varepsilon)=q\), and \(\forall s \in \Sigma^{*}, \forall a \in \Sigma, \delta_{p}(q, a s)=\delta_{p}\left(\delta_{p}(q, a), s\right)\).

Then the pattern p is accepted by the automaton Ap.
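As an illustration of Lemma 2.4 and Theorem 2.5, the following Python sketch computes \(\operatorname{Next}_{p}\) and \(\operatorname{Appearance}_{p}\) and runs the automaton over a plaintext. The function names `next_table`, `appearance` and `find_occurrences` are ours and do not come from [28]. The sketch assumes that the state reached after reading x[1..i] is the degree of appearance at position i, so that reaching the final state m signals an occurrence of p ending at i.

```python
def next_table(p):
    """nxt[l] = Next_p(l): length of the longest proper prefix of p[1..l]
    that is also a suffix of p[1..l] (index 0 is unused)."""
    m = len(p)
    nxt = [0] * (m + 1)
    k = 0
    for l in range(2, m + 1):
        while k > 0 and p[l - 1] != p[k]:
            k = nxt[k]
        if p[l - 1] == p[k]:
            k += 1
        nxt[l] = k
    return nxt


def appearance(p, nxt, l, a):
    """Appearance_p(l, a) from Lemma 2.4, with the recursion unrolled."""
    m = len(p)
    while True:
        if l < m and p[l] == a:   # a = p[l+1] in the paper's 1-based indexing
            return l + 1
        if l == 0:                # no prefix of p can be extended by a
            return 0
        l = nxt[l]                # fall back to Next_p(l), also when l = |p|


def find_occurrences(p, x):
    """Return the 1-based positions i such that p occurs in x at position i."""
    nxt = next_table(p)
    state, hits = 0, []
    for i, a in enumerate(x, start=1):
        state = appearance(p, nxt, state, a)
        if state == len(p):       # final state m reached
            hits.append(i - len(p) + 1)
    return hits
```

For example, `find_occurrences("aba", "ababa")` reports the overlapping occurrences at positions 1 and 3. Each letter of x is consumed once, and the fallback through `nxt` is amortized over the scan in the same way as in classical prefix-function matching.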

Finally, we recall important definitions in [29] to construct the approximate pattern matching on encrypted data.

Let LCS(p, x) be a longest common subsequence of p and x. Denote |LCS(p, x)| by lcs(p, x). We let the lcs(p, x) equal 0, if there does not exist any longest common subsequences of strings p and x (for more details about the concept “longest common subsequence of two strings,” see [29]).

A subsequence u of p has at least one location in p: if \(u=p\left[j_{1}\right] p\left[j_{2}\right] \ldots p\left[j_{l}\right]\) is a subsequence of p, then the vector \(\left(j_{1}, j_{2}, \ldots, j_{l}\right)\) is a location of u in p. We sort all the different locations of u in dictionary order and call the least element the leftmost location of u, denoted by LeftID(u). The last component of LeftID(u) is denoted by \(R m_{p}(u)\) [29].

The symbol Config(p) denotes the set of all configurations of p. If C ∈ Config(p), then C can be the empty set, denoted by \(C_{0}\), or C can be an ordered set \(\left\{u_{1}, u_{2}, \ldots, u_{l}\right\}\) with 1 ≤ l ≤ |p|, where \(u_{i}\) is a subsequence of p, 1 ≤ i ≤ l (for more details, see [29]).

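Definition 2.6 ([29]). Let p be a string of length m and C ∈ Config(p). Then the weight of C is an ordered set, denoted by \(W_{p}(C)\), given by

(a) \(W_{p}(C)\), denoted by \(W_{0}\), is the empty set if \(C=C_{0}\).

(b) \(W_{p}(C)=\left\{W_{p}\left(u_{1}\right), W_{p}\left(u_{2}\right), \ldots, W_{p}\left(u_{l}\right)\right\}\) if \(C=\left\{u_{1}, u_{2}, \ldots, u_{l}\right\}\) for 1 ≤ l ≤ m, where \(W_{p}\left(u_{i}\right)\) denotes the weight of \(u_{i}\) in p and \(W_{p}\left(u_{i}\right)=|p|+1-R m_{p}\left(u_{i}\right)\) for 1 ≤ i ≤ l.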

Denote the set of all the weights of all the configurations of p by WConfig(p).

Definition 2.7 ([29]). Let p be a string of length m on the alphabet Σ and \(\Sigma_{p}\) be the set of all the letters of p. Then \(\operatorname{Ref}_{p}\) is the function \(\operatorname{Ref}_{p}:\{1, \ldots, m\} \times \Sigma_{p} \rightarrow\{0, \ldots, m-1\}\) determined by

\(\operatorname{Ref}_{p}(i, a)=\left\{\begin{array}{ll} 0 & i=1, \\ \max \left\{W_{p}^{j}(a) \mid W_{p}^{j}(a)<i \text { for } m+1-i<j \leq m\right\} & 2 \leq i \leq m, \end{array}\right.\)

where \(a \in \Sigma_{p}\) and \(W_{p}^{j}(a)=m+1-j\) is the weight of the letter a at the location j in p.

Here we assume that p contains a. For 1 ≤ i ≤ |p|, if a ≠ p[i], we let \(W_{p}^{i}(a)=0\). Thus the letter a has a weight at every location. Denote the heaviest weight of a in p by \(W m_{p}(a)\) [29].

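Definition 2.8 ([29]). Let p be a string of length m over the alphabet Σ, W be a weight of a configuration of p and a ∈ Σ. Then the function \(\delta_{p}: \operatorname{WConfig}(p) \times \Sigma \rightarrow \operatorname{WConfig}(p)\) is given as follows.

1. If a ∉ p, then \(\delta_{p}(W, a)=W\).

2. If a ∈ p, then \(\delta_{p}\left(W_{0}, a\right)=\left\{W m_{p}(a)\right\}\).

3. Assume that a ∈ p and \(W=\left\{w_{1}, w_{2}, \ldots, w_{l}\right\}\) for 1 ≤ l ≤ m. Put \(W^{\prime}=\delta_{p}(W, a)\). Then W’ is computed by the following parallel algorithm:

(i) Put W’ = W;

Perform the following block of commands in parallel:

(ii) \(w_{l+1}^{\prime}=\operatorname{Ref}_{p}\left(w_{l}, a\right)\) if \(\operatorname{Ref}_{p}\left(w_{l}, a\right) \neq 0\);

(iii) The following commands are executed in parallel: for ∀i ∈ {1, 2,..., l − 1}, \(w_{i+1}^{\prime}=\operatorname{Ref}_{p}\left(w_{i}, a\right)\) if \(\operatorname{Ref}_{p}\left(w_{i}, a\right)>w_{i+1}\);

(iv) \(w_{1}^{\prime}=W m_{p}(a)\) if \(W m_{p}(a)>w_{1}\);

4. To accept an input string, we extend the function to \(\delta_{p}: \operatorname{WConfig}(p) \times \Sigma^{*} \rightarrow \operatorname{WConfig}(p)\) such that \(\forall W \in \operatorname{WConfig}(p), \delta_{p}(W, \varepsilon)=W\), and \(\forall u \in \Sigma^{*}, \forall a \in \Sigma, \delta_{p}(W, a u)=\delta_{p}\left(\delta_{p}(W, a), u\right)\).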
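The transition of Definition 2.8 can be sketched sequentially in Python as follows. The helper names (`letter_weight`, `heaviest`, `ref`, `step`, `final_weights`) are ours, and the parallel commands (ii) to (iv) are simulated by reading only the old weights W while writing the new ones. We also assume, as the construction in [29] suggests, that the number of weights left after reading x equals lcs(p, x); the classical examples in the usage note agree with this.

```python
def letter_weight(p, j, a):
    """W_p^j(a): weight m + 1 - j of the letter a at location j, or 0 if p[j] != a."""
    return len(p) + 1 - j if p[j - 1] == a else 0


def heaviest(p, a):
    """Wm_p(a): the heaviest weight of a in p (a is assumed to occur in p)."""
    return max(letter_weight(p, j, a) for j in range(1, len(p) + 1))


def ref(p, i, a):
    """Ref_p(i, a) from Definition 2.7."""
    m = len(p)
    if i == 1:
        return 0
    return max(letter_weight(p, j, a) for j in range(m + 2 - i, m + 1))


def step(p, W, a):
    """One transition delta_p(W, a); W is a list of decreasing weights."""
    if a not in p:
        return list(W)
    if not W:
        return [heaviest(p, a)]
    new = list(W)                      # (i) W' = W
    r = ref(p, W[-1], a)               # (ii) possibly extend by one weight
    if r != 0:
        new.append(r)
    for i in range(len(W) - 1):        # (iii) uses old W only, as in parallel
        r = ref(p, W[i], a)
        if r > W[i + 1]:
            new[i + 1] = r
    h = heaviest(p, a)                 # (iv) improve the first weight
    if h > W[0]:
        new[0] = h
    return new


def final_weights(p, x):
    """delta_p(W_0, x): apply step letter by letter starting from the empty weight."""
    W = []
    for a in x:
        W = step(p, W, a)
    return W
```

For p = "abcbdab" and x = "bdcaba", `final_weights` returns four weights, matching the well-known longest common subsequence length 4 for this pair. Because every command in (ii) to (iv) reads only the old weights, each can be assigned to its own processor.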

3. Main Results

In Subsection 3.1, we propose a novel cryptosystem based on our data hiding scheme (2,9,8) recalled in Section 2 (Theorem 3.2 and Security analyses (3.3), (3.4)) and apply this cryptosystem to the process of encrypting and decrypting a secret data sequence (Proposition 3.4 and Security analyses (3.9), (3.10)). In Subsection 3.2, we use our automata approach recalled in Section 2 to design two algorithms for exact and approximate pattern matching on secret data encrypted by our cryptosystem proposed in Subsection 3.1 (Theorems 3.12 and 3.17).

3.1 A Novel Cryptosystem

Let Em’ be the function derived from the function Em by removing the two Statements (2.4) and (2.5). As in [27], the state q in Statement (2.3) is computed by \(q=\delta(q, M)=\delta_{2}(q, M)\), where \(q, M \in G F^{4}\left(2^{2}\right)\) and

\(\delta_{2}(q, M)=\left\{\begin{array}{ll} \varnothing & \text { if } M=q, \\ \left\{\left(i_{t}, a_{t}\right) \mid 1 \leq i_{t} \leq 9,\ t=\overline{1, k^{\prime}},\ k^{\prime} \leq 2,\ v_{i_{t}} \in S,\ a_{t} \in G F\left(2^{2}\right) \backslash\{0\},\ M+(-q)=\sum_{t=1}^{k^{\prime}} a_{t} v_{i_{t}} \text { on } G F^{4}\left(2^{2}\right)\right\} & \text { otherwise, } \end{array}\right.\)

where \(S=\left\{v_{1}, v_{2}, \ldots, v_{9}\right\}\) is a 2-Generators for \(G F^{4}\left(2^{2}\right)\). Note that the number of such sets S is given by [27]
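To make the definition of \(\delta_{2}\) concrete, the following Python sketch finds the flip information by exhaustive search over at most two elements of S. It is only an illustration under several assumptions of ours: elements of \(GF(2^{2})\) are written as 0, 1, 2, 3 with addition given by bitwise XOR and multiplication modulo \(x^{2}+x+1\), the empty set is returned exactly when M = q, and `TOY_S` is an arbitrary list of 9 vectors that has not been checked to be a 2-Generators. The real set S is a secret parameter built as in [27].

```python
from itertools import combinations, product

# Multiplication table of GF(4) with 2 = x and 3 = x + 1, reduced modulo x^2 + x + 1.
GF4_MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]


def vec_add(x, y):
    """Vector addition on GF^4(4); in characteristic 2 this is also subtraction."""
    return tuple(a ^ b for a, b in zip(x, y))


def scal_mul(a, x):
    """Scalar multiplication a * x on GF^4(4)."""
    return tuple(GF4_MUL[a][c] for c in x)


def delta2(q, M, S):
    """Return a set of at most two pairs (i, a) with M + (-q) = sum of a * S[i-1],
    or None if the hypothetical S cannot represent the difference."""
    target = vec_add(M, q)
    if not any(target):
        return frozenset()
    for i, v in enumerate(S, start=1):
        for a in (1, 2, 3):
            if scal_mul(a, v) == target:
                return frozenset({(i, a)})
    for (i, v), (j, w) in combinations(enumerate(S, start=1), 2):
        for a, b in product((1, 2, 3), repeat=2):
            if vec_add(scal_mul(a, v), scal_mul(b, w)) == target:
                return frozenset({(i, a), (j, b)})
    return None


# Toy stand-in for S (assumption: NOT verified to be a 2-Generators).
TOY_S = [
    (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1),
    (1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1), (1, 0, 0, 1), (1, 1, 1, 1),
]
```

With this toy set, `delta2((0, 0, 0, 0), (1, 2, 0, 0), TOY_S)` yields the two pairs (1, 1) and (2, 2), that is, flip pixel 1 with coefficient 1 and pixel 2 with coefficient 2. For a genuine 2-Generators the search never returns `None`.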

\(c 3^{9} 9 ! \text { , where } c \approx 2^{20}\)       (3.1)

Then it is easy to check that the function Em’ satisfies \(E m^{\prime}: \mathrm{I} \times \mathrm{M} \times \mathrm{K} \rightarrow 2^{\{1,2, \ldots, 9\} \times\left(G F\left(2^{2}\right) \backslash\{0\}\right)}\). Ex’ is the function obtained from Ex by replacing the image blocks \(I_{i_{t}}\) with the image blocks \(I_{i_{t}}^{\prime}\) in Statement (2.4) and then inserting the two Statements (2.5) and (2.4) before Statement (2.6) in Ex, so that the function Ex’ satisfies \(E x^{\prime}: 2^{\{1,2, \ldots, 9\} \times\left(G F\left(2^{2}\right) \backslash\{0\}\right)} \times \mathrm{I} \times \mathrm{K} \rightarrow \mathrm{M}\). Since we have [27]

\(\forall(I, M, K) \in \mathrm{I} \times \mathrm{M} \times \mathrm{K}, E x(\operatorname{Em}(I, M, K), K)=M\)

and by our construction of the two functions Em’ and Ex’, we similarly have

\(\forall(I, M, K) \in \mathrm{I} \times \mathrm{M} \times \mathrm{K}, E x^{\prime}\left(E m^{\prime}(I, M, K), I, K\right)=M\).       (3.2)

Remark 3.1. By the definition of the two functions Em’ and Ex’ above, none of the image blocks I used is changed.

Consider Σ to be an alphabet of size 256. Set \(\mathcal{P}=\Sigma\).

In [27], \(\left(G F^{4}\left(2^{2}\right),+, \cdot\right)\) is considered a vector space over the field \(G F\left(2^{2}\right)\), where \(G F^{4}\left(2^{2}\right)=\left\{\left(x_{1}, x_{2}, x_{3}, x_{4}\right) \mid x_{i} \in G F\left(2^{2}\right), \forall i=\overline{1,4}\right\}\) with the vector addition and scalar multiplication given as follows.

\(x+y=\left(x_{1}+y_{1}, x_{2}+y_{2}, x_{3}+y_{3}, x_{4}+y_{4}\right)\),

\(a x=\left(a x_{1}, a x_{2}, a x_{3}, a x_{4}\right), a \in G F\left(2^{2}\right)\),

where \(x, y \in G F^{4}\left(2^{2}\right)\), \(x=\left(x_{1}, x_{2}, x_{3}, x_{4}\right)\) and \(y=\left(y_{1}, y_{2}, y_{3}, y_{4}\right)\). In addition, by the decimal representation of the vector space \(G F^{4}\left(2^{2}\right)\) over the field \(G F\left(2^{2}\right)\), we have \(|\mathcal{P}|=\left|G F^{4}\left(2^{2}\right)\right|=256\); hence there exists a bijective function f from \(\mathcal{P}\) to \(G F^{4}\left(2^{2}\right)\). Denote the inverse function of f by \(f^{-1}\) and let \(\mathcal{F}\) be the set of all such f.
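As a small illustration, one possible bijection f reads the four base-4 digits of a byte as a vector of \(G F^{4}\left(2^{2}\right)\). This particular choice is only an assumption made for the example: in the cryptosystem f is part of the secret key and may be any of the 256! bijections. The names `f` and `f_inv` mirror the notation above.

```python
def f(x):
    """Map a letter 0..255 to (x1, x2, x3, x4), each coordinate in {0, 1, 2, 3}."""
    if not 0 <= x <= 255:
        raise ValueError("letter must be in 0..255")
    return tuple((x >> shift) & 0b11 for shift in (6, 4, 2, 0))


def f_inv(v):
    """Inverse of f: pack four base-4 digits back into one byte."""
    x = 0
    for digit in v:
        x = (x << 2) | digit
    return x
```

For instance, the byte 27 (binary 00 01 10 11) is mapped to (0, 1, 2, 3), and `f_inv` recovers 27 from that vector.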

From the function δ, the state q of the automaton A(I,M,K) computed by Statement (2.3) is a set. The state q may be one of the following sets: ∅, {(i, a)} for i ∈ {1,2,...,9}, \(a \in G F\left(2^{2}\right) \backslash\{0\}\), or {(i, a),(j, b)} for i, j ∈ {1,2,...,9}, \(a, b \in G F\left(2^{2}\right) \backslash\{0\}\).

The index i ∈ {1,2,...,9} and the coefficient \(a \in G F\left(2^{2}\right) \backslash\{0\}=\{1,2,3\}\) can be represented by binary strings of lengths 4 and 2, respectively. Hence, we can use 12 bits to represent a state q. Suppose B is a binary string of length 12 representing an arbitrary state q, \(B=B_{12} \ldots B_{2} B_{1}\). Then the storage structure of q in B is given as follows.

1. If q = ∅, then the value of any bit in B equals 0.

2. If q = {(i, a)}, then the values of the 6 bits \(B_{7}, B_{8}, \ldots, B_{12}\) are 0 and the 6 remaining bits represent (i, a), where the 2-bit string \(B_{2} B_{1}\) represents a and the 4-bit string \(B_{6} B_{5} B_{4} B_{3}\) represents i.

3. If q = {(i, a),(j, b)}, then the 6-bit string \(B_{12} B_{11} \ldots B_{7}\) represents (i, a) and the remaining 6-bit string \(B_{6} B_{5} \ldots B_{1}\) represents (j, b) in the way described above.

Put Q to be a set of all possible states q, \(\mathcal{C}\) is a set of all 12-bit strings B presenting q, q ∈ Q. Consider a function h, \(h: Q \rightarrow \mathcal{C}, h(q)=B\), where q is presented by B. Obviously, h is a bijective function. Denote the inverse function of h by h−1.
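The storage structure above can be sketched in Python as follows. The function names `encode_state` and `decode_state` are ours and stand for h and \(h^{-1}\). We assume that a state is passed as an ordered list of at most two pairs (i, a), so that the pair stored in the upper 6 bits is unambiguous.

```python
def encode_state(q):
    """h: map a state (list of at most two (i, a) pairs) to a 12-bit string B12...B1."""
    if len(q) > 2:
        raise ValueError("a state holds at most two pairs")
    bits = 0
    for i, a in q:
        if not (1 <= i <= 9 and 1 <= a <= 3):
            raise ValueError("expected 1 <= i <= 9 and a in {1, 2, 3}")
        # Shift the earlier pair into B12..B7; store i in 4 bits and a in 2 bits.
        bits = (bits << 6) | (i << 2) | a
    return format(bits, "012b")


def decode_state(B):
    """h^{-1}: recover the list of (i, a) pairs from a 12-bit string."""
    bits = int(B, 2)
    pairs = []
    for chunk in (bits >> 6, bits & 0b111111):
        if chunk:                      # an all-zero 6-bit field means "no pair"
            pairs.append((chunk >> 2, chunk & 0b11))
    return pairs
```

For example, the state {(9, 3)} is stored as 000000100111 and the state {(1, 2), (5, 1)} as 000110010101. An all-zero field can safely mean "no pair" because every valid pair has i ≥ 1.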

Let \(\mathcal{K}=\{(f, K, I) \mid f \in \mathcal{F}, K \in \mathrm{K}, I \in \mathrm{I}\}\) be a finite set of secret keys. For \(k \in \mathcal{K}\), k = (f, K, I), we define \(e_{k}\) and \(d_{k}\) as follows.

1. \(e_{k}: \mathcal{P} \rightarrow \mathcal{C}, e_{k}(x)=h\left(\operatorname{Em}^{\prime}(I, f(x), K)\right) \text { for } x \in \mathcal{P}\).

2. \(d_{k}: \mathcal{C} \rightarrow \mathcal{P}, d_{k}(y)=f^{-1}\left(E x^{\prime}\left(h^{-1}(y), I, K\right)\right) \text { for } y \in \mathcal{C}\).

Set \(\mathcal{E}=\left\{e_{k} \mid k \in \mathcal{K}\right\}\) and \(\mathcal{D}=\left\{d_{k} \mid k \in \mathcal{K}\right\}\). By Definition 2.1, the correctness of the cryptosystem \((\mathcal{P}, \mathcal{C}, \mathcal{K}, \mathcal{E}, \mathcal{D})\) is guaranteed by the following theorem.

Theorem 3.2. Let \(x \in \mathcal{P}\), \(k \in \mathcal{K}\), \(e_{k} \in \mathcal{E}\) and \(d_{k} \in \mathcal{D}\). Then \(d_{k}\left(e_{k}(x)\right)=x\).

Proof. Set M = f(x), q = Em’(I, M, K) and B = h(q); then \(e_{k}(x)=B=y\). We have \(h^{-1}(y)=h^{-1}(B)=q\), Ex’(q, I, K) = M by Formula (3.2) and \(f^{-1}(M)=x\), hence \(d_{k}(y)=x\). \(\square\)

Security analysis of the cryptosystem \((\mathcal{P}, \mathcal{C}, \mathcal{K}, \mathcal{E}, \mathcal{D})\): Assume that we publish the parameters G (the flip graph), Em’, Ex’, \(G F^{4}\left(2^{2}\right)\) and h of the cryptosystem \((\mathcal{P}, \mathcal{C}, \mathcal{K}, \mathcal{E}, \mathcal{D})\). The plaintext x is obtained from y by the formula

\(x=d_{k}(y)=f^{-1}\left(E x^{\prime}\left(h^{-1}(y), I, K\right)\right)\).

So, to recover x exactly, we need to know S and k = (f, K, I). The number of choices for the image block I is \(256^{9}\) for gray images and \(2^{9 t}\) for palette images, where t is the number of bits used to represent color indexes. Furthermore, by Formula (2.10), the security of the cryptosystem \((\mathcal{P}, \mathcal{C}, \mathcal{K}, \mathcal{E}, \mathcal{D})\) is given by the following formulas

\(c 3^{9} 9 ! 2^{18} 2^{8} ! 256^{9}=c 3^{9} 9 ! 2^{90} 2^{8} ! \) for gray images,       (3.3)

\(c 3^{9} 9 ! 2^{18} 2^{9 t} 2^{8} !=c 3^{9} 9 ! 2^{18+9 t} 2^{8} !\) for palette images.       (3.4)
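To get a feeling for the size of these key spaces, Formulas (3.3) and (3.4) can be evaluated in bits. The sketch below takes c to be exactly \(2^{20}\), which is only the approximation stated above, and the function names are ours.

```python
import math


def keyspace_bits(image_bits, c_bits=20):
    """log2 of c * 3^9 * 9! * 2^(18 + image_bits) * (2^8)!, assuming c = 2^c_bits."""
    return (
        c_bits
        + 9 * math.log2(3)
        + math.log2(math.factorial(9))
        + (18 + image_bits)
        + math.log2(math.factorial(256))
    )


def keyspace_bits_gray():
    """Formula (3.3): a gray block contributes 256^9 = 2^72 choices."""
    return keyspace_bits(72)


def keyspace_bits_palette(t):
    """Formula (3.4): a palette block contributes 2^(9t) choices."""
    return keyspace_bits(9 * t)
```

The dominant term is \(\log_{2}(2^{8}!)\), the number of bijections f, so the choice of image format changes the total only slightly. For t = 8 the palette value coincides with the gray value, since 18 + 72 = 90.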

Remark 3.3. By Remark 3.1, no pair of functions \(\left(e_{k}, d_{k}\right)\) in the cryptosystem \((\mathcal{P}, \mathcal{C}, \mathcal{K}, \mathcal{E}, \mathcal{D})\) changes the image block I, for \(\forall k \in \mathcal{K}\), \(k=(f, K, I)\). In addition, we can see that encrypting and hiding are done at the same time.

Consider an arbitrary subset of image blocks F as an input image, \(F \subset \mathrm{I}\), \(F=\left\{F_{1}, F_{2}, \ldots, F_{t_{2}}\right\}\), where \(t_{2}\) is the number of image blocks. Next, we show how to apply the cryptosystem \((\mathcal{P}, \mathcal{C}, \mathcal{K}, \mathcal{E}, \mathcal{D})\) to the process of encrypting and decrypting secret data over an insecure channel. By Remark 3.3, we can use a subset H of secret keys instead of one secret key k,

\(H=\{(f, K, I) \mid K \in \mathrm{K}, I \in F\} \subset \mathcal{K}\)

for \(f \in \mathcal{F}, \mathrm{K}=\left\{K^{1}, K^{2}, \ldots, K^{t_{1}}\right\}\).

Suppose that the secret data is a string \(x=x_{1} x_{2} \ldots x_{t_{3}}\) for \(x_{i} \in \mathcal{P}\), \(\forall i=\overline{1, t_{3}}\), \(t_{3} \geq 1\). The encrypting algorithm \(e_{H}\) used to encrypt x is given as follows.

\(i_{K}=1;\ i_{F}=1;\)

For \(i=1\) to \(t_{3}\) Do

{

\(\qquad k_{i}=\left(f, K^{i_{K}}, F_{i_{F}}\right);\)       (3.5)

\(\qquad y_{i}=e_{k_{i}}\left(x_{i}\right);\)       (3.6)

\(\qquad \begin{array}{l} i_{K}=i_{K} \bmod t_{1}+1 ; \\ i_{F}=i_{F} \bmod t_{2}+1 ; \end{array}\)

}

\(y=y_{1} y_{2} \ldots y_{t_{3}};\)

The decrypting algorithm \(d_{H}\) used to decrypt y is given as follows.

\(i_{K}=1;\ i_{F}=1;\)

For \(i=1\) to \(t_{3}\) Do

{

\(\qquad k_{i}=\left(f, K^{i_{K}}, F_{i_{F}}\right);\)       (3.7)

\(\qquad x_{i}^{\prime}=d_{k_{i}}\left(y_{i}\right);\)       (3.8)

\(\qquad \begin{array}{l} i_{K}=i_{K} \bmod t_{1}+1 ; \\ i_{F}=i_{F} \bmod t_{2}+1 ; \end{array}\)

}

\(x^{\prime}=x_{1}^{\prime} x_{2}^{\prime} \ldots x_{t_{3}}^{\prime};\)

Proposition 3.4. Let F, x, H and the cryptosystem \((\mathcal{P}, \mathcal{C}, \mathcal{K}, \mathcal{E}, \mathcal{D})\) based on the data hiding scheme (2,9,8) be as above. Then \(d_{H}\left(e_{H}(x)\right)=x\).

Proof. Clearly, for \(\forall i=\overline{1, t_{3}}\), the key \(k_{i}\) determined in Statement (3.5) is the same as in Statement (3.7). In addition, by Theorem 3.2, \(x_{i}\) is encrypted by Statement (3.6) and recovered by Statement (3.8) such that \(x_{i}^{\prime}=x_{i}\). Then x = x’, hence \(d_{H}\left(e_{H}(x)\right)=x\). \(\square\)
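The key schedule shared by the two algorithms can be sketched in Python as follows. We assume that the indices are meant to cycle through the \(t_{1}\) keys and the \(t_{2}\) image blocks. The functions `toy_e` and `toy_d` are hypothetical stand-ins for \(e_{k}\) and \(d_{k}\) (a simple modular shift, not the steganographic construction of Subsection 3.1); they only serve to show that using the same schedule on both sides gives the round trip of Proposition 3.4.

```python
def make_schedule(n_letters, t1, t2):
    """Return the 1-based index pairs (i_K, i_F) used for letters 1..n_letters."""
    i_k, i_f = 1, 1
    schedule = []
    for _ in range(n_letters):
        schedule.append((i_k, i_f))
        i_k = i_k % t1 + 1            # cycle 1, 2, ..., t1, 1, ...
        i_f = i_f % t2 + 1            # cycle 1, 2, ..., t2, 1, ...
    return schedule


def toy_e(key, block, x):
    """Hypothetical letter cipher standing in for e_k (NOT the paper's e_k)."""
    return (x + key + block) % 256


def toy_d(key, block, y):
    """Inverse of toy_e, standing in for d_k."""
    return (y - key - block) % 256


def encrypt_sequence(x, keys, blocks):
    """e_H: encrypt the letters of x one by one with the cycling key schedule."""
    schedule = make_schedule(len(x), len(keys), len(blocks))
    return [toy_e(keys[i_k - 1], blocks[i_f - 1], xi)
            for xi, (i_k, i_f) in zip(x, schedule)]


def decrypt_sequence(y, keys, blocks):
    """d_H: decrypt the letters of y one by one with the same schedule."""
    schedule = make_schedule(len(y), len(keys), len(blocks))
    return [toy_d(keys[i_k - 1], blocks[i_f - 1], yi)
            for yi, (i_k, i_f) in zip(y, schedule)]
```

With two keys and three blocks, the first five letters use the pairs (1,1), (2,2), (1,3), (2,1), (1,2). The schedule depends only on the position of the letter, which is why the sender and the receiver derive identical \(k_{i}\).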

Security analysis of the process of encrypting and decrypting the secret data x using the two algorithms \(e_{H}\) and \(d_{H}\): Assume that we publish the same parameters as in the cryptosystem \((\mathcal{P}, \mathcal{C}, \mathcal{K}, \mathcal{E}, \mathcal{D})\). Hence, to restore x exactly, we need to know S and H. The number of choices for S is \(c 3^{9} 9 !\) by Formula (3.1). The number of choices for H is \(2^{8} !\, 2^{18 t_{1}} 256^{9 t_{2}}\) for gray images and \(2^{8} !\, 2^{18 t_{1}} 2^{9 t t_{2}}\) for palette images, where t is the number of bits used to represent color indexes. Then, for a brute force attack, the number of all possible combinations of S and H used in the two algorithms \(e_{H}\) and \(d_{H}\) is

\(c 3^{9} 9 !\, 2^{8} !\, 2^{18 t_{1}} 256^{9 t_{2}}=c 3^{9} 9 !\, 2^{8} !\, 2^{18 t_{1}+72 t_{2}}\) for gray images,       (3.9)

\(c 3^{9} 9 !\, 2^{8} !\, 2^{18 t_{1}} 2^{9 t t_{2}}=c 3^{9} 9 !\, 2^{8} !\, 2^{18 t_{1}+9 t t_{2}}\) for palette images.       (3.10)

Remark 3.5. In the two algorithms \(e_{H}\) and \(d_{H}\) given above, an arbitrary image block I of the input image F can be used many times in the process of encrypting and decrypting the secret data. So, for a given input image F, the amount of secret data that can be encrypted is not limited by the size of F.

3.2 Automata Technique for Pattern Matching on Encrypted Data

Suppose that Alice has secret data and prefers to outsource it to a cloud provider Bob. As the server is semi-trusted, Alice needs to encrypt her plaintext and wishes to store only the ciphertext in the cloud. Assume that Alice uses the cryptosystem \((\mathcal{P}, \mathcal{C}, \mathcal{K}, \mathcal{E}, \mathcal{D})\) proposed in Subsection 3.1 to encrypt the data with a pair of secret parameters (S, k), where S is a 2-Generators for \(GF^{n}(p^{m})\), |S| = 9 and \(k=(f, K, I) \in \mathcal{K}\).

Because of limited storage space and computing ability, instead of downloading ciphertext, decrypting it and searching locally, Alice may ask Bob to perform pattern matching tasks on the ciphertext directly with a trapdoor of the pattern received from her.

To support pattern matching on the server side without leaking information about the plaintext, below we construct pattern matching algorithms that can search for any pattern directly in the ciphertext.

Consider Σ to be an alphabet of size 256. Suppose that the secret data is a string over Σ

x = x1x2..xt3

for \(x_{i} \in \mathcal{P}\), \(\forall i=\overline{1, t_{3}}\), \(t_{3} \geq 1\), where \(\mathcal{P}=\Sigma\) and t3 is often a large natural number.

Before uploading the secret data x to Bob, Alice uses the encrypting function \(e_{k} \in \mathcal{E}\) to encrypt each xi. That is, Alice computes yi = ek(xi), \(\forall i=\overline{1, t_{3}}\), and the encrypted secret data is a string over Σ’

y = y1y2..yt3

which is sent to Bob, where Σ’ is an alphabet

Σ’ = {a’ | a’ = ek(a),a ∈ Σ}.

In general, let x be any string over the alphabet Σ and let the string y be obtained from x in the above way. Then we write y = ek(x) for short, and y is a string over the alphabet Σ’.

Remark 3.6. Since only one pair of secret parameters (S, k) is used, the security of the process of encrypting and decrypting the secret data x is similar to that given by Formula (3.3) (for gray images) or (3.4) (for palette images).

Suppose that Bob needs to perform exact or approximate pattern matching of an arbitrary pattern p on the encrypted data y. Based on our results previously introduced in [28,29], we continue to use the automata technique to meet these requirements.

We first introduce some theoretical results for exact pattern matching.

Proposition 3.7. Let p be a pattern over the alphabet Σ. Then Posp’(a’) = Posp(a) for ∀a’ ∈ Σ’, a = dk(a’), where p’ = ek(p).

Proof. Set i = Posp(a); then \(a=p_{i}\), hence \(a^{\prime}=p_{i}^{\prime}\). Without loss of generality, suppose Posp’(a’) > i. Then \(\exists i^{\prime}>i\), \(p_{i^{\prime}}^{\prime}=a^{\prime}\) by Definition 2.2, so \(a=d_{k}\left(p_{i^{\prime}}^{\prime}\right)=p_{i^{\prime}}\). Then i < Posp(a), a contradiction. So, we complete the proof. \(\square\)

Proposition 3.8. Let a pattern p and a text x be two strings over the same alphabet Σ, let p’ = ek(p), and let the function Sign be given by

\(\forall a^{\prime} \in \Sigma^{\prime}, \operatorname{Sign}\left(a^{\prime}\right)=\left\{\begin{array}{ll} 1 & \text { If } a \in p \text {, where } a=d_{k}\left(a^{\prime}\right), \\ 0 & \text { Otherwise.} \end{array}\right.\)

Then ∀a’ ∈ Σ’ , a’ ∈ p’ if and only if Sign(a’) = 1.

Proof. For ∀a’ ∈ Σ’ with a = dk(a’), a’ ∈ p’ if and only if ∃i, 1 ≤ i ≤ |p’|, a’ = p’i, if and only if a = pi, if and only if Sign(a’) = 1. \(\square\)

Proposition 3.9. Let a pattern p and a text x be two strings over the same alphabet Σ. Then p occurs at a position i in x if and only if p’ occurs at the position i in y, where p’ = ek(p) and y = ek(x).

Proof. The pattern p occurs at a position i in x if and only if p = xixi+1..xi+|p|−1, if and only if yiyi+1..yi+|p|−1 = p’, if and only if p’ occurs at the position i in y. \(\square\)
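Propositions 3.7, 3.8 and 3.9 can be checked numerically on a toy example. In the sketch below, the secret permutation of byte values is only a hypothetical stand-in for ek and dk (the cryptosystem of Subsection 3.1 is not reproduced here), and Posp(a) is assumed to be the rightmost position of a in p, with 0 when a does not occur in p; this reading is suggested by the proof of Proposition 3.7, but Definition 2.2 remains the authoritative source.

```python
import random


def make_toy_cipher(seed=0):
    """Hypothetical stand-in for e_k / d_k: a secret permutation of 0..255."""
    rng = random.Random(seed)
    table = list(range(256))
    rng.shuffle(table)
    inverse = [0] * 256
    for a, a_enc in enumerate(table):
        inverse[a_enc] = a
    return table, inverse


def encrypt(table, x):
    """Apply the per-symbol map so that y_i = e_k(x_i)."""
    return bytes(table[a] for a in x)


def pos(p, a):
    """Assumed Pos_p(a): rightmost 1-based position of a in p, 0 if absent."""
    return p.rfind(bytes([a])) + 1


def occurrences(p, x):
    """All 1-based positions at which p occurs in x (naive scan)."""
    m = len(p)
    return [i + 1 for i in range(len(x) - m + 1) if x[i:i + m] == p]


table, inverse = make_toy_cipher(seed=2020)
x, p = b"abracadabra abracadabra", b"abra"
y, p_enc = encrypt(table, x), encrypt(table, p)

# Proposition 3.9: the occurrence positions agree.
print(occurrences(p, x) == occurrences(p_enc, y))
# Proposition 3.7: Pos is preserved for every symbol.
print(all(pos(p_enc, table[a]) == pos(p, a) for a in range(256)))
```

Because the stand-in is a bijection on symbols, membership in the pattern (the function Sign) is preserved as well, which is what Bob relies on when he only sees y and p’.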

Proposition 3.10. Let p be a pattern over the alphabet Σ. Then ∀l,1 ≤ l ≤ |p|, Nextp’(l) = Nextp(l), where p’ = ek(p).

Proof. Without loss of generality, suppose that \(l_{m}\) = Nextp(l) < Nextp’(l) for some l, 1 ≤ l ≤ |p|. Since p’i = ek(pi), ∀i = 1..|p|, then \(p_{1}^{\prime} p_{2}^{\prime} .. p_{l_{m}+1}^{\prime}\) is both a proper suffix and prefix of p’[1..l] by Definition 2.3. Hence, \(p_{1} p_{2} .. p_{l_{m}+1}\) is also both a proper suffix and prefix of p[1..l] by Definition 2.3. Then Nextp(l) > \(l_{m}\). This is a contradiction to our supposition. So, the proof is complete. \(\square\)

Proposition 3.11. Let p be a pattern over the alphabet Σ. Then for ∀l,0 ≤ l ≤ |p| and ∀a’ ∈ Σ’, a = dk(a’), Appearancep’(l,a’) = Appearancep(l,a), where p’ = ek(p).

Proof. Clearly, |p| = |p’| and for ∀i,1 ≤ i ≤ |p’|,∀a’, a’ ∈ Σ’, a’ = p’i if and only if a = pi. By Lemma 2.4 and Proposition 3.10, Appearancep’(l,a’) = Appearancep(l,a). \(\square\)

Theorem 3.12. Let p be a pattern over the alphabet Σ. Let two automata Ap = (Σ, Qp, q0, δp, Fp) and Ap’ = (Σ’, Qp’, q0, δp’, Fp’) be determined as in Theorem 2.5. Then Qp’ = Qp, Fp’ = Fp, ∀q ∈ Qp’, ∀a’ ∈ Σ’, a = dk(a’), δp’(q,a’) = δp(q,a), where p’ = ek(p).

Proof. It is easy to verify that |p| = |p’|. In addition, by Theorem 2.5 and Proposition 3.11, then Qp’ = Qp, Fp’ = Fp and δp’(q,a’) = δp(q,a). \(\square\)

Remark 3.13. The meaning of Theorem 3.12 in practice is to compute δp’ from δp.
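The relabelling described in Theorem 3.12 and Remark 3.13 can be illustrated as follows. This is a sketch under stated assumptions: the automaton of Theorem 2.5 is not reproduced in this section, so a textbook string-matching automaton (state q is the length of the longest prefix of p that is a suffix of the input read so far) is used in its place, and a random byte permutation again plays the role of ek. The names `automaton_by_definition` and `relabel` are hypothetical.

```python
import random


def automaton_by_definition(p):
    """Transition table of a textbook string-matching automaton for p.

    delta[q][a] is the length of the longest prefix of p that is a suffix
    of p[:q] + a. Only symbols occurring in p are stored; every other
    symbol implicitly leads to state 0.
    """
    m = len(p)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in set(p):
            s = p[:q] + bytes([a])
            k = min(m, len(s))
            while k > 0 and s[len(s) - k:] != p[:k]:
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


def relabel(delta_p, table):
    """Derive delta_{p'} from delta_p via delta_{p'}(q, e_k(a)) = delta_p(q, a)."""
    return [{table[a]: nxt for a, nxt in row.items()} for row in delta_p]


rng = random.Random(2020)
table = list(range(256))          # toy stand-in for e_k
rng.shuffle(table)

p = b"abcabd"
p_enc = bytes(table[a] for a in p)

# The relabelled automaton coincides with the one built directly from p'.
print(relabel(automaton_by_definition(p), table) == automaton_by_definition(p_enc))
```

The states and the final state are untouched by the relabelling; only the symbols on the transitions change, which mirrors Qp’ = Qp and Fp’ = Fp in the theorem.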

Let a pattern p and a text (secret data) x be two strings over the same alphabet Σ and assume |p| ≤ |x|. Assume that we have only the encrypted secret data y, which is not decrypted to the secret data x. From Propositions 3.7, 3.8 and 3.9 and Theorem 3.12, based on the MRc algorithm for c = 1 with the type a breaking point and the concept of Pos in [28], and by using the automaton Ap’ given as in Theorem 2.5, we immediately obtain the following exact pattern matching algorithm, which finds all occurrences of the pattern p in x. Note that the trapdoor for the search pattern p is computed from p and consists of the length of p, the functions Sign and Posp’, and the automaton Ap’.

\(\text{jump}=|p|;\\ \text{While }(\text{jump} \leq|y|)\\ \{\\ \qquad \text{If }\left(\operatorname{Sign}\left(y_{\text{jump}}\right)==1\right)\\ \qquad \{\\ \qquad \qquad q=0;\\ \qquad \qquad i=\text{jump}-\operatorname{Pos}_{p'}\left(y_{\text{jump}}\right)+1;\\ \qquad \qquad \text{Do}\\ \qquad \qquad \{\\ \qquad \qquad \qquad q=\delta_{p'}\left(q, y_{i}\right);\\ \qquad \qquad \qquad \text{If }(q==|p|) \text{ Mark an occurrence of } p \text{ at } i-|p|+1 \text{ in } x;\\ \qquad \qquad \qquad i++;\\ \qquad \qquad \} \text{ While }(q \neq 0 \text{ and } i \leq|y|);\\ \qquad \qquad \text{jump}=i-1;\\ \qquad \}\\ \qquad \text{jump}=\text{jump}+|p|;\\ \}\)

Remark 3.14. The above algorithm performs the same steps as our MR1 algorithm [28], whose time complexity is O(n) in the worst case, where n = |y|. Hence, in the worst case, the time complexity of our new algorithm is also O(n).
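The exact matching algorithm above can be turned into a short runnable sketch. The assumptions are as follows: a random byte permutation stands in for ek, a textbook KMP-style automaton stands in for Ap’ of Theorem 2.5, Posp’ is taken to be the rightmost position of a symbol in p’, and Sign is realised as membership in the Pos table. The names `build_trapdoor` and `search_encrypted` are hypothetical.

```python
import random


def kmp_automaton(p):
    """Textbook KMP automaton: delta[q][a] = next state, missing symbols -> 0."""
    m = len(p)
    delta = [dict() for _ in range(m + 1)]
    fallback = 0          # state reached after reading p[1:q]
    for q in range(m + 1):
        for a in set(p):
            if q < m and p[q] == a:
                delta[q][a] = q + 1
            else:
                delta[q][a] = delta[fallback][a] if q > 0 else 0
        if 0 < q < m:
            fallback = delta[fallback][p[q]]
    return delta


def build_trapdoor(p, table):
    """Alice's side: |p|, the Pos table of p' (also used as Sign), and A_{p'}."""
    p_enc = bytes(table[a] for a in p)
    pos = {a: i + 1 for i, a in enumerate(p_enc)}   # rightmost 1-based position
    return len(p_enc), pos, kmp_automaton(p_enc)


def search_encrypted(y, trapdoor):
    """Bob's side: 1-based start positions of p in x, computed from y only."""
    m, pos, delta = trapdoor
    found = []
    jump = m
    while jump <= len(y):
        if y[jump - 1] in pos:                      # Sign(y_jump) == 1
            q = 0
            i = jump - pos[y[jump - 1]] + 1
            while True:                             # the Do ... While loop
                q = delta[q].get(y[i - 1], 0)
                if q == m:
                    found.append(i - m + 1)
                i += 1
                if q == 0 or i > len(y):
                    break
            jump = i - 1
        jump += m
    return found


rng = random.Random(2020)
table = list(range(256))                            # toy stand-in for e_k
rng.shuffle(table)
x = b"abracadabra abracadabra"
y = bytes(table[a] for a in x)
print(search_encrypted(y, build_trapdoor(b"abra", table)))
```

In this sketch Bob never sees x, p or the permutation; he only receives y and the trapdoor, and the positions he reports are positions in x by Proposition 3.9.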

Next, theoretical results for approximate pattern matching are shown as follows.

Proposition 3.15. Let p be a pattern over the alphabet Σ. Then WConfig(p’) = WConfig(p), where p’ = ek(p).

Proof. Obviously, W0 ∈ WConfig(p’) and W0 ∈ WConfig(p). Consider ∀W’ ∈ WConfig(p’)\{W0}, then we can set \(\mathrm{W}^{\prime}=\left\{\mathrm{w}_{1}^{\prime}, \mathrm{w}_{2}^{\prime}, \ldots, \mathrm{w}_{l}^{\prime}\right\}\) for 1 ≤ l ≤ |p’|. Then ∃C’ = {u’1,u’2,...,u’l} ∈ Config(p’) by Definition 2.6, where Wp’(u’i) = w’i for 1 ≤ i ≤ l. Then ∃!C = {u1,u2,...,ul} ∈ Config(p), ui = dk(u’i) for 1 ≤ i ≤ l. Set W = Wp(C), then W ∈ WConfig(p) by Definition 2.6. It is easy to verify that Rmp’(u’i) = Rmp(ui) for 1 ≤ i ≤ l, then Wp’(u’i) = Wp(ui) for 1 ≤ i ≤ l by Definition 2.6. Hence, W’ = W, then WConfig(p’) ⊂ WConfig(p). Similarly, we have WConfig(p) ⊂ WConfig(p’). So, the proof is complete. \(\square\)

Proposition 3.16. Let p be a pattern over the alphabet Σ. Then Refp’(i,a’) = Refp(i,a) for ∀i, 0 ≤ i ≤ |p’| and ∀a’ ∈ Σ’, a = dk(a’), where p’ = ek(p).

Proof. Clearly, \(W_{p'}^{i}\left(a^{\prime}\right)=W_{p}^{i}(a)\) by Definition 2.7. So, Refp’(i,a’) = Refp(i,a) by Definition 2.7. Hence, we complete the proof. \(\square\)

Theorem 3.17. Given a pattern p on Σ and a positive integer constant c with 1 ≤ c ≤ |p|. Let two automata \(A_{p}^{P c}=\left(\Sigma, Q_{p}, q_{0}, \delta_{p}, F_{p}\right)\) and \(A_{p^{\prime}}^{P c}=\left(\Sigma^{\prime}, Q_{p^{\prime}}, q_{0}, \delta_{p'}, F_{p'}\right)\) be determined as in Theorem 39 [29]. Then Qp’ = Qp, Fp’ = Fp, ∀q ∈ Qp’, ∀a’ ∈ Σ’, a = dk(a’), δp’(q,a’) = δp(q,a), where p’ = ek(p).

Proof. By Proposition 3.15, Qp’ = Qp. Evidently, ∀a’ ∈ Σ’, a = dk(a’), a’ ∈ p’ if and only if a ∈ p. Furthermore, by Definition 2.8 and Proposition 3.16, δp’(W,a’) = δp(W,a). Then Fp’ = Fp. So, the proof is complete. \(\square\)

Remark 3.18. The meaning of Theorem 3.17 in practice is to compute δp’ from δp.

Based on the approximate pattern matching problems considered in [17, 18, 20], we introduce a new concept of the appearance of the pattern p in x with a given error. This is a basis for giving requirements for the approximate pattern matching algorithm.

Definition 3.19. Given two strings p and x over Σ, and a string similarity measure d. Let ε ∈ R, ε > 0, be an error. Then p appears in x with the error ε if there exists a substring u of x such that d(p, u) ≤ ε.

To construct the approximate pattern matching algorithm, we need a function to measure the string similarity. The most commonly used similarities are recalled in [19, 20, 21]. Bakkelund [1] proposed a well-known string similarity measure based on the longest common subsequence. Similarly, here we define a new measure of similarity between two strings

\(d(p, u)=1-\frac{\operatorname{lcs}(p, u)}{\min \{|p|,|u|\}}\),       (3.11)

where p is a pattern and u is a substring of x. Clearly, d given above is non-negative and symmetric, and d(p, u) = 0 whenever one of the two strings is a subsequence of the other.

Proposition 3.20. Let p and x be two strings over Σ. Then for every substring u’ of y, d(p’, u’) = d(p, u), where p’ = ek(p), y = ek(x), u = dk(u’).

Proof. Clearly, |p’| = |p|, |u’| = |u| and lcs(p,u) = lcs(p’,u’). By Formula (3.11), d(p’ , u’) = d(p, u). So, we complete the proof. \(\square\)
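Formula (3.11) and Proposition 3.20 can be made concrete with a small sketch. It computes lcs with the classic dynamic programming recurrence rather than the automata technique of [29], and it uses a random byte permutation as a hypothetical stand-in for ek; the function names are illustrative only.

```python
import random


def lcs_length(p, u):
    """Length of a longest common subsequence, by the classic row-wise DP."""
    prev = [0] * (len(u) + 1)
    for a in p:
        cur = [0]
        for j, b in enumerate(u, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity_error(p, u):
    """d(p, u) = 1 - lcs(p, u) / min(|p|, |u|), for non-empty p and u."""
    return 1 - lcs_length(p, u) / min(len(p), len(u))


rng = random.Random(2020)
table = list(range(256))          # toy stand-in for e_k
rng.shuffle(table)


def enc(s):
    return bytes(table[a] for a in s)


p, u = b"abcd", b"abxd"
print(similarity_error(p, u))                   # lcs is "abd", so 1 - 3/4
print(similarity_error(enc(p), enc(u)))         # same value on the ciphertext
```

Since the per-symbol map is a bijection, two plaintext symbols are equal exactly when their images are equal, so every common subsequence of p and u corresponds to one of p’ and u’ of the same length.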

By using the string similarity measure given in Formula (3.11), the automata technique for computing lcs(p’, u’) [29] will make an approximate pattern matching algorithm fast, and especially efficient for one pattern and a set of a large number of encrypted texts.

Given a pattern p and a text (secret data) x over the same alphabet Σ, and an arbitrary substring u of x. Let ε, 0 < ε < 1 and d(p, u) be given as in Formula (3.11) such that d(p, u) ≤ ε. Then by Proposition 3.20, d(p’, u’) ≤ ε. By Formula (3.11), we have

\(\operatorname{lcs}\left(p^{\prime}, u^{\prime}\right) \geq(1-\varepsilon) \min \left\{\left|p^{\prime}\right|,\left|u^{\prime}\right|\right\}\).       (3.12)

If there is a substring u’ of y such that lcs(p’, u’) ≥ (1−ε)|p|, then, since |p| = |p’| ≥ min{|p’|, |u’|}, Formula (3.12) holds, which means d(p’, u’) ≤ ε. Hence, ∃u, u is a substring of x, d(p, u) ≤ ε. So, the constant c in Theorem 39 [29] is determined by \(c=\lceil(1-\varepsilon)|p|\rceil\).

Without decrypting y, based on Theorem 3.17, Definition 3.19 and Formula (3.11), and using the automaton \(A_{p^{\prime}}^{P c}\) given as in Theorem 39 [29], we immediately have the following approximate pattern matching algorithm, which determines whether or not p appears in x with the error ε. Here, the trapdoor corresponding to the pattern p is determined from p and ε, and consists of the constant c and the automaton \(A_{p^{\prime}}^{P c}\).

\(\text{app}=0;\\ q=W_{0}; \ / / \text{ The automaton } A_{p^{\prime}}^{P c} \text{ starts from the initial state } W_{0}\\ \text{For } i=1 \text{ to }|y| \text{ Do}\\ \{\\ \qquad q=\delta_{p'}\left(q, y_{i}\right);\\ \qquad \text{If }(|q|==c)\ \{\text{app}=1; \text{ Break;}\}\\ \}\\ \text{If }(\text{app}==1) \text{ Announce the appearance of the pattern } p \text{ in } x \text{ with the error } \varepsilon;\\ \text{Else Announce that } p \text{ does not appear in } x \text{ with the error } \varepsilon.\)

Remark 3.21. Since we can compute δp’ from δp, our proposed algorithm is similar to Algorithm 2 (the parallel algorithm) in [29]. In addition, according to Theorem 39 [29], δp is computed in a parallel way and Algorithm 2 has worst-case time complexity O(n) under the supposition that it uses k processors, where k is an upper estimate of lcs(p,x). As an immediate consequence, the above algorithm has O(n) worst-case time complexity when it uses \(\lceil(1-\varepsilon)|p|\rceil\) processors.
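The control flow of the approximate algorithm can be sketched as below. This is not the automaton of Theorem 39 [29], which is not reproduced in this section: as a stand-in, the state q is replaced by one row of the classic lcs dynamic programming table, so that the test |q| = c becomes "the lcs of p’ and the prefix of y read so far has reached c". The row update is sequential here, whereas [29] computes the transition in parallel. The names `threshold` and `appears_with_error` and the toy permutation cipher are hypothetical.

```python
import math
import random


def threshold(eps, m):
    """c = ceil((1 - eps) * |p|); floating-point eps may need care in general."""
    return math.ceil((1 - eps) * m)


def appears_with_error(y, p_enc, eps):
    """Scan y once and stop as soon as lcs(p', y[:i]) reaches c."""
    c = threshold(eps, len(p_enc))
    # row[j] = lcs(p'[:j], y[:i]); it plays the role of the state q
    row = [0] * (len(p_enc) + 1)
    for b in y:
        new = [0]
        for j, a in enumerate(p_enc, start=1):
            new.append(row[j - 1] + 1 if a == b else max(row[j], new[j - 1]))
        row = new
        if row[-1] >= c:          # corresponds to the test |q| == c
            return True
    return False


rng = random.Random(2020)
table = list(range(256))          # toy stand-in for e_k
rng.shuffle(table)


def enc(s):
    return bytes(table[a] for a in s)


p = b"pattern!"                   # |p| = 8
x = b"xx pa-tt-ern yy"            # contains "pattern" as a subsequence
print(threshold(0.25, len(p)), appears_with_error(enc(x), enc(p), 0.25))
```

With |p| = 8, an error of 0.25 gives c = 6, so seven matching symbols are enough, while a much smaller error pushes c to 8 and the same text is rejected.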

4. Conclusions

Building on our results in the steganography and pattern matching areas and on the future work suggested in [27, 28, 29], this paper completes some parts of that work. Based on the data hiding scheme (2, 9, 8) in [27], we construct a novel cryptosystem with high security. This method allows encrypting and hiding to be done at once, and the ciphertext does not depend on the input image size as in existing hybrid techniques of cryptography and steganography. Next, we use this cryptosystem to encrypt secret data on the user side. For the ciphertext, we design two pattern matching algorithms that search for any pattern in it directly on the cloud server side. The idea of the design is to apply our automata approach to the exact pattern matching and the longest common subsequence problems in [28,29]. Under the assumption that the approximate algorithm uses \(\lceil(1-\varepsilon) m\rceil\) processors, the time complexities of both algorithms are O(n) in the worst case, where ε, m and n are the error of our measure of similarity between two strings, the length of the pattern and the length of the secret data, respectively.

With our automata approach to pattern matching, the automata are constructed from the search patterns only. Hence the algorithms have clear advantages when one given pattern is searched for in a very large set of ciphertexts stored in the cloud. So, in the future, we will continue studying this technique and its application to SE.

Acknowledgements

The author is truly grateful to Phan Trung Huy, Phan Thi Ha Duong and Vu Thanh Nam for their valuable suggestions and help. This work was partially funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under the grant number 101.99-2016.16.

References

  1. D. Bakkelund, "An LCS-based String Metric," University of Oslo (Norway), September 23, 2009.
  2. P. Bharti, R. Soni, "A New Approach of Data Hiding in Images Using Cryptography and Steganography," International Journal of Computer Applications, 58(18), pp. 1-5, 2012. https://doi.org/10.5120/9379-3716
  3. A. Blumer, J. Blumer, D. Haussler, A. Ehrenfeucht, M. T. Chen, J. Seiferas, "The Smallest Automaton Recognizing The Subwords of A Text," Theoretical Computer Science, Volume 40, pp. 31-55, 1985.
  4. S. Chakraborty, S. K. Bandyopadhyay, "Steganography Method Based on Data Embedding by Sudoku Solution Matrix," International Journal of Engineering Science Invention, 2(7), pp. 36-42, 2013.
  5. A. Chatterjee, A.K. Das, "Secret Communication Combining Cryptography and Steganography," Progress in Advanced Computing and Intelligent Engineering, Vol. 563, pp. 281-291, 2018. https://doi.org/10.1007/978-981-10-6872-0_26
  6. G. Chugh, "Information Hiding - Steganography & Watermarking: A Comparative Study," International Journal of Advanced Research in Computer Science, 4(4), pp. 165-171, 2013.
  7. N. Desmoulins, P. A. Fouque, C. Onete, O. Sanders, "Pattern Matching on Encrypted Streams," Advances in Cryptology - ASIACRYPT 2018, pp. 121-148, 2018.
  8. Q. Dong, Z. Guan, L. Wu, Z. Chen, "Fuzzy Keyword Search over Encrypted Data in The Public Key Setting," Web-Age Information Management, pp. 729-740, 2013.
  9. R. Dowsley, A. Michalas, M. Nagel, N. Paladi, "A Survey on Design and Implementation of Protected Searchable Data in The Cloud," Computer Science Review, Volume 26, pp. 17-30, 2017. https://doi.org/10.1016/j.cosrev.2017.08.001
  10. A. Ehrenfeucht, R. M. McConnell, N. Osheim, S. W. Woo, "Position Heaps: A Simple and Dynamic Text Indexing Data Structure," Journal of Discrete Algorithms, Vol. 9, pp. 100-121, 2011. https://doi.org/10.1016/j.jda.2010.12.001
  11. Y. K. Gedam, J.N. Varshapriya, "Fuzzy Keyword Search over Encrypted Data in Cloud Computing," Journal of Engineering Research and Applications, 4(7), pp. 197-202, 2014.
  12. F. Han, J. Qin, J. Hu, "Secure Searches in The Cloud: A Survey," Future Generation Computer Systems, Vol. 62, pp. 66-75, 2016. https://doi.org/10.1016/j.future.2016.01.007
  13. R. Haynberg, J. Rill, D. Achenbach, J. Muller-Quade, "Symmetric Searchable Encryption for Exact Pattern Matching Using Directed Acyclic Word Graphs," in Proc. of 2013 International Conference on Security and Cryptography (SECRYPT), pp. 403-410, 2013.
  14. M. Jain, S. K. Lenka, "A Review of Digital Image Steganography Using LSB and LSB Array," International Journal of Applied Engineering Research, 11(3), pp. 1820-1824, 2016.
  15. N. S. Jho, D. Hong, "Symmetric Searchable Encryption with Efficient Conjunctive Keyword Search," KSII Transactions on Internet and Information Systems, 7(5), pp. 1328-1342, 2013. https://doi.org/10.3837/tiis.2013.05.022
  16. M. S. John, P. SumaLatha, M. Joshuva, "A Comparative Study of Index-Based Searchable Encryption Techniques," International Journal of Advanced Research in Computer Science, 6(3), pp. 13-15, 2015.
  17. G. M. Landau, U. Vishkin, "Efficient String Matching with k Mismatches," Theoretical Computer Science, Vol. 43, pp. 239-249, 1986. https://doi.org/10.1016/0304-3975(86)90178-7
  18. J. V. Leeuwen, "Handbook of Theoretical Computer Science," Elsevier MIT Press, Vol. A, pp. 290-300, 1990.
  19. Z. Mei, B. Wu, S. Tian, Y. Ruan, Z. Cui, "Fuzzy Keyword Search Method over Ciphertexts Supporting Access Control," KSII Transactions on Internet and Information Systems, 11(11), pp. 5671-5693, 2017. https://doi.org/10.3837/tiis.2017.11.027
  20. G. Navarro, "A Guided Tour to Approximate String Matching," ACM Computing Surveys, 33(1), pp. 31-88, 2001. https://doi.org/10.1145/375360.375365
  21. P. H. Paris, N. Abadie, C. Brando, "Linking Spatial Named Entities to The Web of Data for Geographical Analysis of Historical Texts," Journal of Map & Geography Libraries, 13(1), pp. 82-110, 2017. https://doi.org/10.1080/15420353.2017.1307306
  22. D. X. Song, D. Wagner, A. Perrig, "Practical Techniques for Searches on Encrypted Data," in Proc. of 2000 IEEE Symposium on Security and Privacy, pp. 44, 2000.
  23. S. Song, J. Zhang, X. Liao, J. Du, Q. Wen, "A Novel Secure Communication Protocol Combining Steganography and Cryptography," Procedia Engineering, Vol. 15, pp. 2767-2772, 2011. https://doi.org/10.1016/j.proeng.2011.08.521
  24. D. R. Stinson, "Cryptography: Theory and Practice (CRC Press Series on Discrete Mathematics and Its Application)," CRC Press, pp. 1-20, 180-184, 1995.
  25. M. Strizhov, Z. Osman, I. Ray, "Substring Position Search over Encrypted Cloud Data Supporting Efficient Multi-User Setup," Future Internet, 8(3), 28, 2016. https://doi.org/10.3390/fi8030028
  26. D. M. Sunday, "A Very Fast Substring Search Algorithm," Communications of The ACM, 33(8), pp. 132-142, 1990. https://doi.org/10.1145/79173.79184
  27. N. H. Truong, "A New Digital Image Steganography Approach Based on The Galois Field \(GF(p^{m})\) Using Graph and Automata," KSII Transactions on Internet and Information Systems, 13(9), pp. 4788-4813, 2019. https://doi.org/10.3837/tiis.2019.09.025
  28. N. H. Truong, "A New Approach to Exact Pattern Matching," Journal of Computer Science and Cybernetics, 35(3), pp. 197-216, 2019. https://doi.org/10.15625/1813-9663/35/3/13620
  29. N. H. Truong, "Automata Technique for The LCS Problem," Journal of Computer Science and Cybernetics, 35(1), pp. 21-37, 2019. https://doi.org/10.15625/1813-9663/35/1/13293
  30. Varsha, R. S. Chhillar, "Data Hiding Using Steganography and Cryptography," International Journal of Computer Science and Mobile Computing, 4(4), pp. 802-805, 2015.
  31. R.M. Yadav, D. S. Tomar, R. K. Baghel, "A Study on Image Steganography Approaches in Digital Images," Engineering Universe for Scientific Research and Management, 6(5), pp. 1-6, 2014.
  32. W. Yunling, W. Jianfeng, C. Xiaofeng, "Secure Searchable Encryption: A Survey," Journal of Communications and Information Networks, 1(4), pp. 52-65, 2016. https://doi.org/10.1007/BF03391580
  33. L. Wei, H. Zhu, Z. Cao, X. Dong, W. Jia, Y. Chen, A. Vasilakos, "Security and Privacy for Storage and Computation in Cloud Computing," Information Sciences, Vol. 258, pp. 371-386, 2014. https://doi.org/10.1016/j.ins.2013.04.028
  34. B.B. Zaidan, A. A. Zaidan, A. K. Al-Frajat, H. A. Jalab, "On The Differences between Hiding Information and Cryptography Techniques: An Overview," Journal of Applied Science, 10(15), pp. 1650-1655, 2010. https://doi.org/10.3923/jas.2010.1650.1655