Enabling Dynamic Multi-Client and Boolean Query in Searchable Symmetric Encryption Scheme for Cloud Storage System

  • Xu, Wanshan (Faculty of Information Technology, Beijing University of Technology) ;
  • Zhang, Jianbiao (Faculty of Information Technology, Beijing University of Technology) ;
  • Yuan, Yilin (Faculty of Information Technology, Beijing University of Technology)
  • Received : 2021.07.12
  • Accepted : 2022.03.18
  • Published : 2022.04.30

Abstract

Searchable symmetric encryption (SSE) provides a safe and effective solution for retrieving encrypted data on cloud servers. However, existing SSE schemes mainly focus on single-keyword search in a single-client setting, which is inefficient for multiple keywords and cannot meet the needs of multiple clients. To address these drawbacks, we propose a scheme enabling dynamic multi-client and Boolean query in searchable symmetric encryption for cloud storage systems (DMC-SSE). DMC-SSE realizes fine-grained access control for multiple clients in SSE by means of attribute-based encryption (ABE) and a novel access control list (ACL), and supports Boolean queries over multiple keywords. In addition, DMC-SSE realizes fully dynamic update of clients and files. Compared with existing multi-client schemes, our scheme has the following advantages: 1) Dynamic. DMC-SSE not only supports the dynamic addition or deletion of multiple clients, but also realizes the dynamic update of files. 2) Non-interactivity. After being authorized, the client can query keywords without the help of the data owner, and the data owner can dynamically update a client's permissions without requiring the client to stay online. Finally, the security analysis and experimental results demonstrate that our scheme is secure and efficient.

Keywords

1. Introduction

The development of cloud computing has brought great convenience to the public, and more and more users outsource their data to the cloud. The advantages of cloud storage, such as mobile access, stability and reliability, allow users to access data anytime and anywhere, which greatly improves work efficiency and enables resource sharing while ensuring data security.

To ensure the confidentiality and integrity of cloud storage, data is encrypted before it is uploaded to the cloud. Unfortunately, performing keyword search on ciphertext is a difficult task for the user. Without a dedicated protocol, the user must download the ciphertext and decrypt it before searching, which is extremely inefficient and impractical when the data is very large. Therefore, searchable encryption (SE) ([1], [2], [3], [4]) came into being.

SE allows the user to search for keywords on the ciphertext without revealing their privacy. Because SE performs queries on the ciphertext, the content of the files being searched stays hidden from the server, which helps to preserve the integrity and confidentiality of the data on the cloud and, furthermore, protects user privacy.

Searchable symmetric encryption (SSE) ([5], [6], [7], [8]) is an efficient and secure form of SE. With the help of an inverted index and symmetric encryption primitives, SSE achieves efficient ciphertext retrieval in sublinear time. Although SSE is an efficient means of ciphertext retrieval, most existing SSE schemes focus on single-keyword search in a single-client setting, which limits the adoption of SSE in practice. The multi-client SSE scheme was first proposed by Curtmola [9] in 2006, and further multi-client schemes have been proposed since. However, the existing multi-client SSE schemes are mostly interactive (the client interacts with the data owner when performing keyword retrieval), and do not support the dynamic update of clients (dynamic addition and deletion of clients) or the dynamic update of files.

Related works. Searchable encryption was proposed by Song [1] in 2000 as a full-text search scheme whose search cost grows linearly with the size of the database. To improve query efficiency, Curtmola [9] proposed a searchable symmetric encryption scheme with an inverted index to achieve optimal search time. Chase and Kamara [10] proposed a similar scheme at a higher storage cost. In addition to search efficiency, much work has been done to improve query expressiveness ([13], [14], [15], [16], [17]) and to provide advanced security ([20], [24], [25], [26], [27], [28]).

The original SSE schemes mainly targeted single-keyword search. To enrich the search functionality, some research has focused on multi-keyword SSE schemes. Golle [13] and Ballard [14] proposed efficient conjunctive keyword searches over encrypted data; these two schemes realize multi-keyword queries, but the communication cost is linear in the number of documents. To provide a truly practical search capability, Cash [15] proposed a highly scalable searchable symmetric encryption scheme with support for Boolean queries, which constructs the OXT protocol to achieve Boolean query. Based on the OXT protocol, Lai [16] proposed a result-pattern-hiding SSE scheme supporting conjunctive queries. Xu [17] proposed EGRQ, a range query scheme that achieves secure and efficient queries on encrypted spatial data. The above SSE schemes support multi-keyword queries, but only in single-client scenarios. The concept of multi-client symmetric searchable encryption (MSSE) was first proposed by Curtmola [9], who uses broadcast encryption on top of a single-client scheme. Raykova [18] improves the efficiency of Curtmola's scheme by employing deterministic encryption and achieves a linear search time. These two schemes are interactive and have a large communication cost. Jarecki [19] extends the OXT scheme proposed by Cash [15] to the multi-client setting by using homomorphic signatures and oblivious pseudorandom functions (PRFs), and realizes Boolean query in the multi-client setting. Faber [21] extends the query types of OXT, supporting range, substring, wildcard, and phrase queries. However, both of these schemes are interactive. Du [23] presented a multi-client SSE scheme that supports Boolean queries, which incorporates a client’s authorization information into search tokens and indexes. The scheme proposed by Du supports dynamic update of client permissions; however, the data owner regenerates the search index every time a new client joins, which becomes very inefficient when the index is large.

Contributions. We propose a dynamic multi-client searchable symmetric encryption scheme supporting Boolean query, which extends SSE from the single-client setting to multiple clients, while realizing dynamic update of clients and fine-grained access control. In addition, our scheme supports Boolean query with multiple keywords. The main contributions are summarized as follows:

1. We use attribute-based encryption (ABE) to extend SSE from a single client to multiple clients, and implement Boolean queries over multiple keywords. We construct a hybrid encryption in which symmetric encryption is used to encrypt files and ABE is used to encrypt the symmetric key. Only a client that satisfies the access policy can decrypt the key and thus decrypt the files, which yields an efficient SSE scheme in the multi-client setting.

2. We implement efficient dynamic update (add/delete) of clients, and the time cost is O(1) for N clients. By constructing the access control list (ACL), our scheme only allows authorized clients to access keywords, so as to prevent malicious clients from illegal access.

3. We have realized the dynamic update of files, and the data owner can update files independently without affecting other clients. Furthermore, in our scheme, the deleted files can be filtered by judging the operation (add/delete) when the server performs a search, and only valid files are sent back to the client, which improves the search efficiency and reduces the communication load.

2. Preliminaries

2.1 System model

The system model of DMC-SSE is shown in Fig. 1. There are three entities in the system: the data owner D, the client Ci and the cloud server. The server provides cloud storage services and is honest-but-curious, so it is not trusted. The data owner encrypts the plaintext DB, a database consisting of d identifier-keyword pairs \((id_i, W_i)_{i=1}^{d}\), into the ciphertext EDB and sends it to the cloud. To perform a query Q with keywords \(\begin{aligned}\bar{W}=\left\{w_{1}, w_{2}, \ldots, w_{n}\right\}\end{aligned}\) on the server, the client Ci first registers with the data owner D, and D returns an authorization certificate Ωi to the client, with which Ci generates a search token TKi,Q and sends it to the server. On receiving TKi,Q, the server searches the EDB and returns the results R that satisfy the query to the client Ci. Finally, Ci decrypts the files in R locally.

Fig. 1. System model

2.2 Attribute-based encryption (ABE)

Attribute-based encryption (ABE) is developed from fuzzy identity-based encryption and can be divided into two types: key-policy ABE (KP-ABE) and ciphertext-policy ABE (CP-ABE). In KP-ABE, the private key corresponds to an access structure and the ciphertext corresponds to an attribute set; in CP-ABE, on the contrary, the private key corresponds to an attribute set and the ciphertext corresponds to an access structure. In both KP-ABE and CP-ABE, the ciphertext can be decrypted only when the attribute set satisfies the access policy. ABE is very effective for encrypted data sharing, since it realizes data access control while encrypting data. In this paper, we use a CP-ABE scheme, which consists of the following four algorithms:

• ABE.Setup(λ) : takes the security parameter λ as the input and outputs the system parameter mpk and master key msk.

• ABE.KeyGen(msk, mpk, A) : takes the system parameter mpk, master key msk and the attribute set A as the input and outputs a private key skiA.

• ABE.Enc(mpk, msg, U) : takes the message msg, system parameter mpk and the access structure U as the input and outputs the ciphertext msg*.

• ABE.Dec(skiA, msg*) : takes the ciphertext msg* and the private key skiA as the input, where msg* contains an access structure U and skiA is associated with an attribute set A. The algorithm outputs the decrypted message msg if A satisfies U.

3. Overview

The multi-client SSE scheme in our system combines attribute-based encryption with searchable symmetric encryption. First, files in DB are encrypted by the symmetric key Kc, and then Kc is encrypted by attribute-based encryption. Only clients who meet the attribute policy can decrypt the symmetric key and decrypt the ciphertext.

3.1 Access Control List (ACL)

The access control list (ACL) is maintained by the data owner to control the permissions of clients. When a client Ci joins the system, it registers with the data owner D. To identify the client Ci, D generates a tag αi ∈ Z*p for Ci. For the legitimate keywords (w1, w2, ..., wn) that Ci may access, D computes cj ← αi·wj, j = 1, 2, …, n, and adds each cj to the ACL. Finally, D sends the updated ACL to the cloud server.

Assume that there are three clients (C1, C2, C3) with the tags (α1, α2, α3), and that their authorized keywords are (w1, w2), (w2, w3) and (w1, w3), respectively. The structure of the ACL is shown in Fig. 2.

Fig. 2. The structure of ACL

When the client performs a query, it calculates ci as above, and the server checks whether ci is in the ACL. If so, the query is legal and continues; otherwise the query is illegal and is stopped.

\(\begin{aligned}\mathrm{c}_{i}\left\{\begin{array}{l}\mathrm{c}_{i} \in A C L, \text { query is legal,continue } \\ \mathrm{c}_{i} \notin A C L, \text { query is illegal,stop }\end{array}\right.\end{aligned}\)

The ACL is structured as a one-way list and is updated dynamically as the clients' search permissions change. When a new client joins, the values cj of that client are appended at the tail of the ACL; when a client is revoked, its values are removed.
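The ACL bookkeeping described in this subsection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the modulus `P`, the helper `keyword_to_zp` (which hashes a keyword into Z*p) and the class and method names are all hypothetical, since the paper only specifies cj ← αi·wj and the membership test on the server.

```python
import hashlib
import secrets

# Assumption: a fixed prime modulus chosen for illustration (2^127 - 1 is a Mersenne prime).
P = 2**127 - 1

def keyword_to_zp(w: str) -> int:
    """Hypothetical encoding: hash a keyword into Z_p^* (the paper does not fix one)."""
    h = int.from_bytes(hashlib.sha256(w.encode()).digest(), "big")
    return h % (P - 1) + 1

def new_client_tag() -> int:
    """Pick a random tag alpha_i in Z_p^*."""
    return secrets.randbelow(P - 1) + 1

def blind(alpha: int, w: str) -> int:
    """Blind factor c = alpha * w mod p."""
    return (alpha * keyword_to_zp(w)) % P

class ACL:
    """One-way list of blind factors, held by the server and updated by the data owner."""

    def __init__(self):
        self.entries = []

    def grant(self, alpha: int, keywords):
        # New entries are appended at the tail of the list.
        for w in keywords:
            self.entries.append(blind(alpha, w))

    def revoke(self, alpha: int, keywords):
        # Revocation removes the client's blind factors; nothing else is touched.
        drop = {blind(alpha, w) for w in keywords}
        self.entries = [c for c in self.entries if c not in drop]

    def is_legal(self, c: int) -> bool:
        # Server-side check: the query continues only if c is in the ACL.
        return c in self.entries
```

With three clients authorized for (w1, w2), (w2, w3) and (w1, w3) as in Fig. 2, the list would hold six blind factors, and a query by C1 on w3 would be rejected because α1·w3 was never added.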

3.2 Scheme Definition

Our scheme mainly includes the following algorithms :

• KeyGen(1^λ): takes the security parameter λ as input, and outputs the system master key msk and public key mpk. It is performed by the data owner D.

• EDBSetup(DB, msk, mpk, U): takes the database DB, system master key msk, master public key mpk and an attribute universe U as input, and outputs the encrypted database EDB.

• ClientAuth(msk, mpk, Ai, ACL): the client Ci submits its attributes Ai to the data owner D, and D generates the private key skiA of Ci from msk, mpk and Ai. Meanwhile, D assigns the client Ci an identity αi, which is encrypted as enri ← ABE.Enc(αi). Both skiA and enri are sent back to the client. In addition, D calculates cj ← αi·wj for each legal keyword wj of the client and adds cj to the ACL. Finally, D sends the updated ACL to the server.

• TokenGen(skiA, enri , W): the client Ci takes private key skiA , encrypted identity enri and keywords W to query as input, and generates the token TKi,Q as output.

• Search(TKi,Q, EDB): the server takes the token TKi,Q as input, and outputs the results R that satisfy the query to the client.

• Retrieve(skiA, R): the client Ci obtains the identifiers of the documents by decrypting the returned results R with its private key skiA. With the identifiers and the decryption key, the client retrieves the original documents.

3.3 Security Definition

In our scheme, we consider security against an adversarial server. The design goal of our scheme is to reveal as little information as possible to the server during a query. The less information is leaked to the server, the more difficult it is for the server to guess information about the token or the files, which better protects the privacy of users.

Security against adversarial server. A leakage function L is used to represent the information leaked to the adversary during a query. Let ∏ = {KeyGen, EDBSetup, ClientAuth, TokenGen, Search} be our DMC-SSE scheme. We define the security of ∏ by two experiments, RealA(λ) and IdealA,S(λ):

RealA(λ) : Adversary A chooses a database DB, and the experiment runs KeyGen and EDBSetup and returns EDB to A. Then A adaptively and repeatedly chooses queries; for each query the experiment runs ClientAuth, TokenGen and Search and returns the transcript to A. Finally, A returns a bit b, which is the output of the experiment.

IdealA,S(λ) : Adversary A chooses a database DB and a series of queries Q, the experiment runs S(L(DB,Q)), and A outputs a bit b.

Definition 1. We say that ∏ is L-semantically-secure if for any probabilistic polynomial-time (PPT) adversary A, there exists a PPT simulator S such that:

| Pr[RealA(λ)=1]-Pr[IdealA,S(λ)=1]| ≤ negl(λ)

Now we describe the leakage function L of our scheme. As in [23], for simplicity of analysis, we consider the simple setting in which all queries are conjunctive queries. We use Q = (s, x) to denote a series of conjunctive queries, where Q[k] = (s[k], x[k, 1], x[k, 2], …, x[k, n]) is an individual query, and s[k] and x[k,·] denote the sterm (the least frequent keyword among all keywords in a query) and the xterms (the other keywords in the query), respectively. The leakage function L(DB, Q) is defined as follows:

• \(N=\sum_{i=1}^{d}|W_i|\), the number of keyword-identifier pairs (wi, idi).

• \(\begin{aligned}\overline{\mathrm{S}} \in \mathbf{N}^{T}\end{aligned}\), the equality pattern of the sterms s, indicating which queries have the same sterm.

• SN, the number of files matching the sterm, i.e., SN[k] = |DB(s[k])|.

• AN, the number of files matching the entire conjunctive query, \(AN[k] = |DB(s[k]) \cap DB(x[k,1]) \cap \cdots \cap DB(x[k,n])|\).

• IP is the conditional intersection pattern, which is formally defined by

\(\begin{aligned}\operatorname{IP}\left[k_{1}, k_{2}, \alpha, \beta\right]=\left\{\begin{array}{ll}\mathrm{DB}\left(\mathrm{s}\left[k_{1}\right]\right) \cap \mathrm{DB}\left(\mathrm{s}\left[k_{2}\right]\right) & \text { if } k_{1} \neq k_{2} \text { and } \mathrm{x}\left[k_{1}, \alpha\right]=\mathrm{x}\left[k_{2}, \beta\right] \\ \phi & \text { otherwise }\end{array}\right.\end{aligned}\)

• DBT is the search result pattern of the sterm in the k-th query, DBT[k] = DB[s[k]]

• XN is the number of xterms in the k-th query.
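To make the leakage components concrete, the following sketch computes N, the sterm equality pattern, SN, AN and XN for a toy database. It is only an illustration: the function name `leakage`, the input encoding (a dict from identifier to keyword set, and queries as `(sterm, xterms)` pairs) and the toy data are assumptions, and IP and DBT are left out to keep the example short.

```python
def leakage(db, queries):
    """Compute part of L(DB, Q) for a toy plaintext database.

    db:      dict mapping a file identifier to its set of keywords
    queries: list of (sterm, [xterm, ...]) conjunctive queries
    """
    # Inverted index: keyword -> set of identifiers, i.e. DB(w).
    inverted = {}
    for doc_id, words in db.items():
        for w in words:
            inverted.setdefault(w, set()).add(doc_id)

    # N: total number of keyword-identifier pairs.
    N = sum(len(words) for words in db.values())

    # Equality pattern: each sterm is replaced by the index of its first appearance.
    first_seen = {}
    s_bar = []
    for s, _ in queries:
        first_seen.setdefault(s, len(first_seen) + 1)
        s_bar.append(first_seen[s])

    # SN[k] = |DB(s[k])|
    SN = [len(inverted.get(s, set())) for s, _ in queries]

    # AN[k] = size of the intersection over the sterm and all xterms.
    AN = []
    for s, xs in queries:
        matched = set(inverted.get(s, set()))
        for x in xs:
            matched &= inverted.get(x, set())
        AN.append(len(matched))

    # XN[k] = number of xterms in the k-th query.
    XN = [len(xs) for _, xs in queries]
    return {"N": N, "s_bar": s_bar, "SN": SN, "AN": AN, "XN": XN}


toy_db = {1: {"a", "b"}, 2: {"a", "c"}, 3: {"a", "b", "c"}}
toy_queries = [("b", ["a"]), ("c", ["a", "b"]), ("b", ["c"])]
print(leakage(toy_db, toy_queries))
```

In this toy run the first and third queries share the sterm "b", so the equality pattern reveals that they are related even though the keyword itself is never shown to the server.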

4. Dynamic multi-client SSE

In this section, we give our multi-client searchable symmetric encryption scheme ∏ = {KeyGen, EDBSetup, ClientAuth, TokenGen, Search}. Let \(H_i : \{0,1\}^{*} \rightarrow \{0,1\}^{\lambda}\) be hash functions, and let \(F : \{0,1\}^{\lambda} \times \{0,1\}^{\lambda} \rightarrow \{0,1\}^{\lambda}\) be a PRF.

4.1 Our construction

KeyGen(1^λ): with the security parameter λ, the data owner generates the master key msk and public key mpk, where (mpk, msk) ← ABE.Setup(1^λ).

EDBSetup(DB, mpk, U): As shown in Algorithm 1, the data owner takes the database \(DB = (id_i, W_i)_{i=1}^{d}\), the public key mpk and an attribute set U as input, and outputs the encrypted database EDB = (T, X). It chooses large primes p and q, random keys KI, Kz and Kx for a PRF Fp and Kw for a PRF F, and samples \(g \stackrel{\$}{\leftarrow} G\). To improve the efficiency of DB encryption and decryption, symmetric encryption primitives are used in our scheme. To share the symmetric key Kid with legitimate users, D encrypts Kid under the attribute set U with ABE. The value op represents the operation (add/delete) on a file.

EDB consists of a TSet T and an XSet X. Both sets are stored in a dictionary structure, and EDB uses an inverted index to store the identifiers of all documents. As in most other MSSE schemes, the identifiers of the files in DB are encrypted and stored in T.

Algorithm 1 EDBSetup

Input: DB, mpk, U

Output: EDB

function EDBSetup(DB, mpk, U)
    T ← {} ; X ← {}
    for w ∈ W do
        cnt ← 1 ; stagw ← F(Kw, w)
        for id ∈ DB[w] do
            l ← H(stagw || cnt) ; u ← ABE.Enc(mpk, id || Kid, U)
            eid ← Fp(KI, id) ; z ← Fp(Kz, w || cnt)
            v ← eid · z^(-1) ; o ← op ⊕ l
            x ← g^(Fp(Kx, w) · eid) ; X ← X ∪ {x}
            T[l] ← (u, v, o) ; cnt ← cnt + 1
        end for
    end for
    EDB ← {T, X}
end function

Algorithm 2 ClientAuth

Input: msk, mpk, Ai, ACL

Output: Ωi, ACL'

function ClientAuth(msk, mpk, Ai, ACL)
    skiA ← ABE.KeyGen(msk, mpk, Ai)
    ri ← {0,1}^λ ; αi ← F(Kk, ri)
    enr ← ABE.Enc(αi)
    for w ∈ W̄ do
        ci ← αi · w ; ACL ← ACL ∪ {ci}
    end for
    ACL' ← ACL ; Ωi ← {skiA, Kw, Kz, Kx, mpk, enr}
    return Ωi
end function

ClientAuth(msk, mpk, Ai, ACL) : When a client with attribute set Ai performs a query on the encrypted database for the first time, it needs to authenticate with the data owner. The data owner D generates the corresponding private key skiA from the attributes Ai of the client Ci, where skiA ← ABE.KeyGen(msk, mpk, Ai), and sends skiA to the client. To ensure that legitimate clients can only access their authorized keywords, the data owner D first generates an identity αi for the client Ci, and then uses αi and each legal keyword w to generate a blind factor ci, which is added to the ACL. Only keywords whose blind factor is in the ACL can be accessed by the client. Finally, D sends Ωi ← (skiA, Kw, Kz, Kx, mpk, enr) back to the client Ci, where enr ← ABE.Enc(αi), and sends the updated access control list ACL' to the server.

Algorithm 3 TokenGen

Input: Ωi, Q = {w1, w2, …, wn}

Output: TKi,Q

function TokenGen(Ωi, Q)
    acfw ← F(Kw, w1) ; αi ← ABE.Dec(skiA, enr)
    for c = 1, 2, … until the server stops do
        for j = 2, …, n do
            xtoken[c, j] ← g^(Fp(Kz, w1 || c) · Fp(Kx, wj)) ; ctl[j] ← αi · wj
        end for
        TKi,Q ← {acfw, xtoken[c, 2], …, xtoken[c, n], ctl[2], …, ctl[n]}
    end for
    return TKi,Q
end function

TokenGen(Ωi, Q) : When the client Ci wants to perform a Boolean search on the EDB with a set of keywords \(\begin{aligned}\bar{w}=\left\{w_{1}, w_{2}, \ldots, w_{n}\right\}\end{aligned}\), it first chooses the sterm, which is the lowest-frequency keyword in \(\begin{aligned}\bar{w}\end{aligned}\). For simplicity, we assume that w1 is the sterm and that the query is the conjunctive query Q = (w1 ∧ w2 ∧ … ∧ wn). The client Ci generates a blind factor ctl[j] ← αi·wj for each xterm wj (j = 2, …, n), where αi ← ABE.Dec(skiA, enr), and the token is generated by Algorithm 3.

Algorithm 4 Search

Input: TKi,Q, EDB, ACL

Output: R

function Search(TKi,Q, EDB, ACL)
    R ← {} ; c ← 1
    while true do
        l ← H(acfw || c)
        if T[l] = null then
            return R
        else
            (u, v, o) ← T[l] ; op ← o ⊕ l
            if op = "add" then
                if xtoken[c, i]^v ∈ X and ctl[i] ∈ ACL for all i = 2, …, n then
                    R ← R ∪ {u}
                end if
            else if op = "del" then
                R ← R − {u}
            end if
            c ← c + 1
        end if
    end while
end function

Search(TKi,Q, EDB, ACL): On receiving the search token TKi,Q sent by the client Ci, the server performs a search in EDB to find the files matching TKi,Q, and returns the file set R to Ci, as shown in Algorithm 4.

To ensure that a query Q is legitimate, the blind factors ctl[i] of the client are checked during the search, and only keywords whose blind factor is in the ACL may be accessed. Furthermore, only the data owner D can set op (add or delete) in the function EDBSetup; other clients can only query keywords. If op = "add", the file has been added and the corresponding id is valid; if op = "del", the file has been deleted and the corresponding id is invalid.

After the client Ci retrieves the ciphertexts in R, it recovers the identifier and the symmetric key with id || Kid ← ABE.Dec(skiA, u) for each u in R. Because Kid is encrypted under an attribute policy, only clients that satisfy the policy can decrypt it. With the symmetric key Kid, Ci can decrypt the files efficiently.
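The algebra behind Algorithms 1, 3 and 4 can be checked with a small end-to-end toy. The sketch below is an illustration under several assumptions and is not the authors' code: HMAC-SHA256 stands in for the PRFs F and Fp, the group G is a prime-order subgroup of Z_Q^* found at import time (far too small for real security), the counter is bounded by a hypothetical `max_cnt` instead of being streamed "until the server stops", and the ABE ciphertext u, the ACL check and the op field are left out so that only the TSet/XSet test xtoken^v ∈ X remains. It also assumes z = Fp(Kz, w || cnt), which is what makes the exponents cancel.

```python
import hashlib
import hmac

P = 2**61 - 1  # prime order of the toy group (a Mersenne prime)

def is_prime(n: int) -> bool:
    """Fixed-base Miller-Rabin, adequate for the toy modulus searched below."""
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    if n < 2:
        return False
    for b in bases:
        if n % b == 0:
            return n == b
    d, r = n - 1, 0
    while d % 2 == 0:
        d, r = d // 2, r + 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

# Find a prime Q = K*P + 1 and an element G of order P in Z_Q^*.
K = next(k for k in range(2, 100000, 2) if is_prime(k * P + 1))
Q = K * P + 1
G = next(g for g in (pow(h, K, Q) for h in range(2, 100)) if g != 1)

def prf(key: bytes, msg: str) -> bytes:
    """Stand-in for F."""
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def prf_p(key: bytes, msg: str) -> int:
    """Stand-in for Fp, mapped into Z_P^* so that inverses always exist."""
    return int.from_bytes(prf(key, msg), "big") % (P - 1) + 1

def label(stag: bytes, cnt: int) -> bytes:
    """l = H(stag || cnt)."""
    return hashlib.sha256(stag + cnt.to_bytes(4, "big")).digest()

def edb_setup(db, keys):
    """db maps each keyword to its list of file identifiers. Returns (T, X)."""
    T, X = {}, set()
    for w, ids in db.items():
        stag = prf(keys["w"], w)
        for cnt, doc_id in enumerate(ids, start=1):
            eid = prf_p(keys["I"], doc_id)
            z = prf_p(keys["z"], f"{w}|{cnt}")
            v = eid * pow(z, -1, P) % P               # v = eid * z^(-1)
            X.add(pow(G, prf_p(keys["x"], w) * eid % P, Q))
            T[label(stag, cnt)] = (doc_id, v)         # doc_id stands in for u
    return T, X

def token_gen(keys, query, max_cnt):
    """query[0] is the sterm; the remaining keywords are xterms."""
    sterm, xterms = query[0], query[1:]
    xtoken = {}
    for c in range(1, max_cnt + 1):
        zc = prf_p(keys["z"], f"{sterm}|{c}")
        xtoken[c] = [pow(G, zc * prf_p(keys["x"], x) % P, Q) for x in xterms]
    return prf(keys["w"], sterm), xtoken

def search(token, T, X):
    stag, xtoken = token
    R, c = [], 1
    while c in xtoken and label(stag, c) in T:
        u, v = T[label(stag, c)]
        # xtoken^v = g^(Fp(Kx, wj) * eid), which is in X iff the file contains wj.
        if all(pow(xt, v, Q) in X for xt in xtoken[c]):
            R.append(u)
        c += 1
    return R

keys = {"w": b"kw", "I": b"ki", "z": b"kz", "x": b"kx"}
db = {"w1": ["d1", "d2", "d3"], "w2": ["d2", "d3"], "w3": ["d3", "d4"]}
T, X = edb_setup(db, keys)
print(search(token_gen(keys, ["w1", "w2"], 5), T, X))
```

Barring a false positive in the XSet test, the query w1 ∧ w2 should return exactly the identifiers stored under both keywords. If `max_cnt` is smaller than |DB(w1)|, the result is truncated, which is why the paper lets the client keep producing xtokens until the server stops.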

4.2 Dynamic Update of Clients

In a multi-client SSE scheme, the dynamic update of clients is worth considering, since in practice new clients may join the system at any time and existing clients may be revoked at any time. In the scheme proposed by Du [23], when a new client joins, the data owner D not only needs to update the ACL, but also needs to update the encrypted database and regenerate the encrypted index according to the pk of the client. The time cost is O(n) for n clients, which is obviously inefficient when clients are updated frequently. In our scheme, by contrast, the use of ABE means that only a client whose attributes satisfy the access control policy can decrypt the ciphertext. The access rights to a file therefore depend on the attributes of the client and have nothing to do with the pk of the client, so the search index does not need to be regenerated no matter how many clients are added. The only thing that D has to do is update the ACL, so the time cost is O(1) for n clients. The process of client revocation is similar; the difference is that the ACL update changes from adding to deleting. Therefore, our scheme is more efficient than Du [23] when clients are updated dynamically.

4.3 Dynamic Update of Files

In addition to the dynamic update of clients, another problem worth considering is the dynamic update of files. Because the files stored in the cloud are not immutable, the data owner may add new files or delete expired files, so the dynamic update of files is also necessary. We construct a novel dynamic operator op that denotes the “add” or “delete” operation on a file, and op is stored in masked form together with the encrypted identifier when the data owner D updates the file. For a query on the EDB, the server checks op and filters out the documents whose op = "del", so that only valid files are retained. A general SSE scheme returns all the matching files it finds, even though some of them are invalid because they are expired or marked for deletion. In our scheme, these invalid files are filtered out through the dynamic update of files, so that only valid files are returned, which reduces the communication load and improves the communication efficiency.
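The filtering step can be illustrated in isolation. The sketch below assumes, following Algorithm 1, that op is masked as o ← op ⊕ l; the one-byte encoding of op, the use of the first byte of the label as the mask, and the names `mask_op`, `unmask_op` and `replay` are hypothetical choices made only for this example.

```python
import hashlib

OPS = {"add": 0x01, "del": 0x02}          # hypothetical one-byte encoding of op
NAMES = {code: name for name, code in OPS.items()}

def mask_op(op: str, label: bytes) -> int:
    """o <- op XOR l, using the first byte of the label as the mask (an assumption)."""
    return OPS[op] ^ label[0]

def unmask_op(o: int, label: bytes) -> str:
    """op <- o XOR l, as done by the server during Search."""
    return NAMES[o ^ label[0]]

def replay(entries):
    """Walk the entries of one keyword in counter order and keep only valid files.

    entries: list of (label, u, o) tuples as stored in T.
    """
    R = []
    for label, u, o in entries:
        op = unmask_op(o, label)
        if op == "add":
            R.append(u)
        elif op == "del" and u in R:
            R.remove(u)
    return R

# The data owner adds f1 and f2, deletes f1, then adds f3.
log = [("add", "f1"), ("add", "f2"), ("del", "f1"), ("add", "f3")]
entries = []
for cnt, (op, u) in enumerate(log, start=1):
    l = hashlib.sha256(f"stag|{cnt}".encode()).digest()
    entries.append((l, u, mask_op(op, l)))
print(replay(entries))
```

Only the files that are still valid reach the client, so the deleted file costs no bandwidth. Note that in the real scheme u is an ABE ciphertext, so matching a "del" entry against an earlier "add" entry needs a comparison rule that this sketch does not model.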

4.4 Supporting for Boolean queries

Given a query Q with keywords (w1, w2, …, wn), to support Boolean queries we use a Boolean formula φ to construct the searchable form w1 ∧ φ(w2, …, wn). To perform a Boolean query, the client Ci sends the token TKi,Q to the server along with the Boolean formula φ. The search is the same as Algorithm 4 except that, for a tuple (u, v, o), instead of requiring xtoken[i]^v ∈ X for every i, the server derives a binary value bvi for each keyword in (w2, w3, …, wn), where bvi = 1 if xtoken[i]^v ∈ X for the corresponding keyword wi, and bvi = 0 otherwise. After obtaining all the binary values, the server evaluates φ on bv2, bv3, …, bvn and adds u to R if the result is true.
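The per-entry evaluation of φ might look as follows. The paper does not say how φ is encoded when it is sent to the server, so the nested-tuple representation, the function names and the integer stand-ins for group elements below are assumptions made for illustration.

```python
def evaluate(phi, bv):
    """Evaluate a Boolean formula given as nested tuples over the values bv[i].

    Supported forms (a hypothetical encoding):
      ("var", i), ("not", f), ("and", f1, f2, ...), ("or", f1, f2, ...)
    """
    kind = phi[0]
    if kind == "var":
        return bv[phi[1]]
    if kind == "not":
        return not evaluate(phi[1], bv)
    if kind == "and":
        return all(evaluate(f, bv) for f in phi[1:])
    if kind == "or":
        return any(evaluate(f, bv) for f in phi[1:])
    raise ValueError(f"unknown operator: {kind}")

def entry_matches(tested, X, phi):
    """tested maps the xterm index i to the value xtoken[i]^v for one tuple (u, v, o)."""
    bv = {i: (value in X) for i, value in tested.items()}
    return evaluate(phi, bv)

# Toy XSet and tested values; plain integers stand in for group elements.
X = {10, 20}
tested = {2: 10, 3: 99}   # w2 is present in the file, w3 is not
print(entry_matches(tested, X, ("and", ("var", 2), ("not", ("var", 3)))))
```

Because the sterm w1 is always handled through T, the server only learns one bit per xterm and entry, and φ can combine these bits with any mix of AND, OR and NOT.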

4.5 Security Analysis

Theorem 1. Our scheme ∏ is L-semantically secure against adaptive attacks, where L is the leakage function defined in Section 3.3, assuming that the DDH (Decisional Diffie-Hellman) assumption holds in G, that F and Fp are secure PRFs, and that ∑ = (Enc, Dec) is an IND-CPA secure scheme.

Algorithm 5 G0, G1

Proof: The proof is conducted by constructing a sequence of games. Among these games, G0 is designed to have the same distribution as RealA(λ), and the last game G8 is designed to be easy for the simulator S to simulate. The indistinguishability of the distributions of consecutive games shows that the simulator S satisfies Definition 1, which completes the proof of the theorem.

Game G0. As is shown in algorithm 5, G0 is the real game with minor modifications for easy analysis. It takes ( ACL, DB, s, x ) as input to simulate EDBSetup in algorithm 1 by using function INITIALIZE. INITIALIZE is identical to EDBSetup except that X is separated as a subfunction, XSetup.

G0 generates the transcript by using the function TransGen. Before that, G0 generates the secret key Ωi by running the function ClientAuth, which simulates the ClientAuth algorithm defined in Algorithm 2; specifically, \(\begin{aligned}\overline{\mathrm{W}}\end{aligned}\) is the set of authorized keywords, and the order of the keywords is recorded in WPerms. For k ∈ [T], G0 runs TransGen(EDB, Ωi, s[k], x[k,·], c[k,·]) to output the transcript t[k]. The transcript is generated as in the real game except for ResInds, which is obtained by calculating DB(s[k]) ∩ DB(x[k,1]) ∩ DB(c[k,1]) ∩ … ∩ DB(x[k,n]) ∩ DB(c[k,n]). Assuming that no false positives occur, G0 has the same distribution as RealA(λ), so it is easy to get:

Pr[G0 = 1] - Pr[RealA(λ) = 1] ≤ negl(λ)

Game G1 . G1 is identical to G0 except the calculation of stag and α, the difference between G1 and G0 is shown in the boxed codes in algorithm 5. The values of stag and α will be recorded after being computed for the first time, and will be directly looked up instead of being computed again when used later. So we can get:

Pr[G1 = 1 ] = Pr[G0 = 1]

Algorithm 6 G2, G3

Game G2. The difference between G2 and G1 is that G2 uses random functions instead of the PRFs F and Fp; the details are shown in Algorithm 6. Since F(Kw, ·) and F(Kk, ·) are each evaluated only once on the same input, we can replace their outputs with random strings. Fp(KI, ·), Fp(Kz, ·) and Fp(Kx, ·) are replaced by the random functions fI, fz and fx, respectively. Note that TransGen takes A as input, so that CLIENTAUTH can be omitted. It follows that there exist adversaries B1,1 and B1,2 such that:

\(\Pr[G_2 = 1] - \Pr[G_1 = 1] \le 2\,\mathrm{Adv}^{\mathrm{prf}}_{F, B_{1,1}}(\lambda) + 3\,\mathrm{Adv}^{\mathrm{prf}}_{F_p, B_{1,2}}(\lambda)\)

Game G3. G3 is the same as G2 except for the code in the box; the details are shown in Algorithm 6. G3 uses an encryption of the constant string \(0^{\lambda}\) to replace the encryption of file identifiers. Since the encryption operation is executed m times, there exists an adversary B2 which satisfies:

\(\Pr[G_3 = 1] - \Pr[G_2 = 1] \le m \cdot \mathrm{Adv}^{\mathrm{ind\text{-}cpa}}_{\Sigma, B_2}(\lambda)\)

Game G4. As shown in Algorithm 7, G4 is the same as G3 except for the way X and xtoken are generated. Unlike G3, in G4 the elements \(X\_Elem = g^{f_x(w) \cdot f_I(id)}\) of X are precomputed and recorded in the array H[id, w], indexed by the keyword w and the corresponding id. In G4, the elements of X are generated as follows: for a given w ∈ W and id ∈ DB(w), G4 adds the value H[id, w] from the array H to the set X. Recall that the value added to X is calculated as \(g^{f_x(w) \cdot f_I(id)}\), which is the same as in G3.

As for the values in xtoken, in G3 the xtoken is computed as \(g^{f_z(s_k \| cnt) \cdot f_x(x[k,\beta])}\). In G4, TransGen looks up \((id_1, \ldots, id_{T_s}) \leftarrow DB(s_k)\) and \(\sigma \leftarrow WPerms[s_k]\), computes \(y \leftarrow f_I(id_{\sigma[cnt]}) \cdot f_z(s_k \| cnt)^{-1}\), and sets \(xtoken[\beta] = H[id_{\sigma[cnt]}, x[k,\beta]]^{1/y} = g^{f_x(x[k,\beta]) \cdot f_z(s_k \| cnt)}\), which is the same as in G3. It is easy to see that:

Pr[G4= 1] = Pr[G3 = 1]

Game G5. G5 and G4 are almost the same except for the code in the box in Algorithm 7. Simply put, G5 selects v from Z*p at random instead of computing it. It is easy to see that:

Pr[G5 = 1] = Pr[G4 = 1]

Game G6. G6 is almost identical to G5. The difference is that, instead of computing the values of H and Y as in the previous game G5, G6 selects them from G at random; the details are shown in Algorithm 7 in the double-boxed code. Under the DDH assumption, there exists an efficient adversary B5 such that:

\(\Pr[G_6 = 1] - \Pr[G_5 = 1] \le \mathrm{Adv}^{\mathrm{DDH}}_{G, B_5}(\lambda)\)

Intuitively, in G5 the value of XTemp[w] is \(g^{f_x(w)}\), which has the form \(g^{a}\) if we replace \(f_x(w)\) with a. An element of H is \(XTemp[w]^{eid} = g^{f_x(w) \cdot eid}\), which has the form \(g^{ab}\) if we replace eid with b. With this replacement, the distribution of H is indistinguishable from that of a random element of G under the DDH assumption. In the same way, Y is indistinguishable from a random element of G.
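For reference, the DDH assumption invoked in this step can be written as below. This is the standard formulation rather than a statement taken from the paper, and the symbol \(\approx_{c}\) (computational indistinguishability) and the fresh exponent c are notation added for this note:

\(\begin{aligned}\left(g, g^{a}, g^{b}, g^{a b}\right) \approx_{c}\left(g, g^{a}, g^{b}, g^{c}\right), \quad a, b, c \stackrel{\$}{\leftarrow} \mathbb{Z}_{p}\end{aligned}\)

Reading a as the exponent of XTemp[w] and b as eid, an entry of H of the form \(g^{ab}\) can be swapped for \(g^{c}\) with an independent random c, and any adversary that notices the swap would yield a DDH distinguisher, which is where the term \(\mathrm{Adv}^{\mathrm{DDH}}\) in the bound comes from.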

Algorithm 7 G4, G5, G6

Game G7. G7 is almost the same as G6 except that it changes the way X is generated; the details are shown in Algorithm 8. In G7, only the elements of H that are accessed multiple times are added to X; otherwise, a random element of G is added to X. Furthermore, after H is generated, XSETUP accesses each element of H only once, and only the function TransGen accesses elements of H again. The elements accessed by TransGen satisfy id ∈ DB[sk] and w = x[k,β]. All other elements are indistinguishable from random selections. Therefore, the distribution of G7 is the same as that of G6, so we get:

Pr[G7 = 1] = Pr[G6 = 1]

Game G8. G8 is almost the same as G7 except for the way H is accessed in the function TransGen, as shown in Algorithm 8. To detect a possible repeated access to an element of H, it is necessary to check whether XSETUP will access the index or whether TransGen will read it again. XSETUP only accesses an index if id ∈ DB(sk) and x[k,β] = w in G7, which is what the first “if” in G8 tests. However, a repeated access is also possible within TransGen when there are two different queries k and k'. For this situation, it must hold that id ∈ DB[sk] ∩ DB[sk'] and w = x[k,β] ∈ xk', which is what the “else if” statement in G8 tests. Obviously,

Pr[G8 =1] = Pr[G7 =1]

Simulator: Simulator S takes L(DB,s,x) = (N,\(\begin{aligned}\bar{S}\end{aligned}\),SN,AN,IP,XN) as input and outputs a simulated EDB = (T, X) and a transcript array t. We prove that the simulator S and G8 are indistinguishable; Theorem 1 then follows, because by the transitivity of indistinguishability across the sequence of games the simulator S satisfies Definition 1.

Algorithm 8 G7, G8

Firstly, S computes \(\begin{aligned}\bar{x}\end{aligned}\), which is a restricted equality pattern of x, it denotes the server knows which xtrems are equal. With the elements in X, it is possible for the server to infer some certain xterms are equal because if there is a id that satisfies id ∈ DB[s[k1]]∩DB[s[k2]], in which k1 and k2 are two different queries, then the server can infer x[k1,β] is equal to x[k2,δ] due to the repeating values of elements in X that xtemp[x[k1,β], id] and xtemp[x[k2,δ], id]. This can be formulated equivalently in terms of the leakage IP by \(\begin{aligned}\overline{\mathrm{x}}[k, \beta]\end{aligned}\) such that \(\begin{aligned}\overline{\mathrm{x}}\left[k_{1}, \beta\right]=\overline{\mathrm{x}}\left[k_{2}, \delta\right]\end{aligned}\) iff IP[k1, k2]≠ 𝜙 . Particularly, we have: \(\begin{aligned}\overline{\mathrm{x}}\left[k_{1}, \beta\right]=\overline{\mathrm{x}}\left[k_{2}, \delta\right] \quad \Rightarrow \quad \mathrm{x}\left[t_{1}, \beta\right]=\mathrm{x}\left[t_{2}, \delta\right]\end{aligned}\) and \(\begin{aligned}\left(\mathrm{x}\left[k_{1}, \beta\right]=\mathrm{x}\left[k_{2}, \delta\right]\right) \wedge\left(\mathrm{DB}\left[\mathrm{s}\left[k_{1}\right]\right] \cap \mathrm{DB}\left[\mathrm{s}\left[k_{2}\right]\right] \neq \phi\right) \Rightarrow \overline{\mathrm{x}}\left[k_{1}, \beta\right]=\overline{\mathrm{x}}\left[k_{2}, \delta\right]\end{aligned}\).

To show that the output of S is distributed as in G8, we show that EDB, X and the xtokens are each distributed as in G8. The generation of EDB in S is detailed in Algorithm 9. In S, EDB is generated in the same way as in G8 for every \(\begin{aligned}w \in \overline{\mathrm{s}}\end{aligned}\). Since \(\begin{aligned}|\bar{s}|<N\end{aligned}\), which follows from the definition of \(\begin{aligned}\overline{\mathrm{s}}\end{aligned}\), S fills the remaining positions of EDB with random elements. In both G8 and S, the elements of EDB are computed in the same way, so EDB is identically distributed in S and G8.

The X in the simulator S is generated by Algorithm 10. In both the simulator and G8, the elements of X are chosen at random from the group G. Since Σw∈W |DB[w]| = N, X contains N elements. In G8, an element is added to X for each w ∈ W and each id ∈ DB(w). In S, this is done by keeping track of each addition with a counter k2 and adding the remaining (N − k2) elements at the end. The distribution of X is argued together with that of the xtokens.

Algorithm 9 Generation of EDB in Simulator

[Image: JAKO202213841060560_algor 9.png]

Algorithm 10 Generation of X in Simulator

[Image: JAKO202213841060560_algor 10.png]

Algorithm 11 Generation of t in Simulator

[Image: JAKO202213841060560_algor 11.png]
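The counting-and-padding step described for X can be sketched as follows. This is only an illustration under stated assumptions: random byte strings stand in for elements of the group G, and the function name `simulate_x`, its parameters and the input `linked_pairs` (the pairs that the leakage lets the simulator tie to entries of X) are hypothetical. The actual procedure is Algorithm 10.

```python
import secrets


def simulate_x(linked_pairs, n, elem_bytes=28):
    """Build a simulated X of n elements.

    linked_pairs: iterable of hashable (id, pattern entry) pairs.
    Random byte strings are a stand-in for random group elements.
    """
    table = {}     # lazily sampled random value per distinct pair
    x_set = set()
    k2 = 0         # counter of elements added so far
    for pair in linked_pairs:
        if pair not in table:
            table[pair] = secrets.token_bytes(elem_bytes)
            x_set.add(table[pair])
            k2 += 1
    if k2 > n:
        raise ValueError("more linked pairs than the total size N")
    # Pad with (n - k2) fresh random elements so that X has n elements.
    for _ in range(n - k2):
        x_set.add(secrets.token_bytes(elem_bytes))
    return x_set, k2


x_set, k2 = simulate_x([(1, "a"), (2, "a"), (1, "a")], n=5)
print(k2, len(x_set))
```

A repeated pair reuses its stored value and is not counted twice, so the padding always brings the total to N.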

The transcript t, including the xtokens, is generated in the simulator S by Algorithm 11. The values y and σ are uniformly random and hence identically distributed in G8 and S. The reuse of σ also follows the same pattern in both: in G8, σ is reused when an sterm is repeated, while in S, σ is reused when the corresponding entry of \(\begin{aligned}\overline{\mathrm{s}}\end{aligned}\) repeats.

Next, we observe that when the xtokens are calculated in G8, H is accessed either when the id satisfies a conjunctive query, i.e., id ∈ DB[sk] ∩ DB[x[k,β]], or when the id appears in another query with the same xterm. The simulator S follows the same logic by reading the σ(cnt)-th identifier in R, which covers both conditions.

Finally, we show that H is reused in the same way in G8 and S when it is accessed multiple times. Consider two elements of H, (id1, x[k1,β]) and (id2, x[k2,β]), that are read in different queries in G8. Then id1 either satisfies the conjunctive query or appears in another query with the same xterm, and likewise for id2. In S, the simulator reads values from RP or IP at the indices that correspond to (id1, \(\begin{aligned}\overline{\mathrm{x}}\left[k_{1}, \beta\right]\end{aligned}\)) and (id2, \(\begin{aligned}\overline{\mathrm{x}}\left[k_{2}, \beta\right]\end{aligned}\)) in H. We claim that:

\(\begin{aligned}\left(\mathrm{id}_{1}, \mathrm{x}\left[k_{1}, \beta\right]\right)=\left(\mathrm{id}_{2}, \mathrm{x}\left[k_{2}, \beta\right]\right) \Leftrightarrow\left(\mathrm{id}_{1}, \overline{\mathrm{x}}\left[k_{1}, \beta\right]\right)=\left(\mathrm{id}_{2}, \overline{\mathrm{x}}\left[k_{2}, \beta\right]\right)\end{aligned}\)       (1)

The direction ⇐ of (1) is immediate, since \(\begin{aligned}\bar{x}\end{aligned}\) satisfies

\(\begin{aligned}\overline{\mathrm{x}}\left[k_{1}, \beta\right]=\overline{\mathrm{x}}\left[k_{2}, \delta\right] \Rightarrow \mathrm{x}\left[k_{1}, \beta\right]=\mathrm{x}\left[k_{2}, \delta\right]\end{aligned}\),

and \(\begin{aligned}\bar{x}\end{aligned}\) has the further property that \(\begin{aligned}\left(\mathrm{x}\left[k_{1}, \beta\right]=\mathrm{x}\left[k_{2}, \delta\right]\right) \wedge\left(\mathrm{DB}\left[\mathrm{s}\left[k_{1}\right]\right] \cap \mathrm{DB}\left[\mathrm{s}\left[k_{2}\right]\right] \neq \phi\right) \Rightarrow \overline{\mathrm{x}}\left[k_{1}, \beta\right]=\overline{\mathrm{x}}\left[k_{2}, \delta\right]\end{aligned}\). Hence the direction ⇒ follows once DB[s[k1]]∩DB[s[k2]]≠𝜙 is shown. Suppose that (id1,x[k1,β]) = (id2,x[k2,β]). Then x[k1,β] = x[k2,β] and id1=id2, which means that this id lies in DB[s[k1]]∩DB[s[k2]], so the intersection is not empty, and thus we have \(\begin{aligned}\overline{\mathrm{x}}\left[k_{1}, \beta\right]=\overline{\mathrm{x}}\left[k_{2}, \beta\right]\end{aligned}\).

5. Performance Analysis

To evaluate the performance of our DMC-SSE scheme, we conduct experiments on a real data set and compare it, in terms of both functionality and performance, with the multi-client schemes of Du [23] and Jarecki [19], which offer similar functions.

5.1 Functional Analysis

First, we conduct a functional comparison, as shown in Table 1. Although all three schemes implement multi-client SSE and support Boolean queries, Du [23] and our scheme handle dynamic client updates better. For a dynamic client update, the scheme proposed by Du must update both the ACL and the index, whereas our scheme updates only the ACL. Finally, our scheme supports dynamic updates of files, which the other two schemes do not.

Table 1. Comparison of functionality features

[Image: E1KOBZ_2022_v16n4_1286_t0001.png]

As for the computing overhead, only the most time-consuming operations are considered: exponentiation (denoted by ep) and bilinear pairing (denoted by bp). Therefore, only the computation costs of the EDBSetup, TokenGen and Search algorithms are compared; algorithms that use fewer of these operations, e.g., ClientAuth, are omitted. The comparison results are shown in Table 2, in which Nc denotes the number of clients, Nf denotes the number of files corresponding to the keyword w, and Np is the total number of keyword-identifier pairs, Np = |DB(w)|. In our scheme, EDBSetup needs to compute 1 exponentiation to realize the attribute-based encryption of the symmetric key and the file identifier, so there are 2 ep in total for one keyword-identifier pair.

Table 2. Comparison of computing cost

[Image: E1KOBZ_2022_v16n4_1286_t0002.png]

Since the communication cost mainly depends on the size of the data transmitted over the network, that size can be used to evaluate the communication cost. In our scheme, the transmitted data is mainly the encrypted database and the search token generated during a keyword query, so we use their sizes to evaluate the communication cost, as in Du [23]. The comparison result is shown in Table 3, in which |·| denotes the size of a set or group and Nq is the number of keywords to query.

Table 3. Comparison of communication cost

[Image: E1KOBZ_2022_v16n4_1286_t0003.png]

5.2 Performance Analysis

We run our experiments on a local machine with Ubuntu 18.04, an Intel(R) Core(TM) i7-8550U CPU and 8 GB of RAM. The programs are written in Python 3.6 and developed in PyCharm 2020.2. We use the Charm-Crypto library to implement the cryptographic and group operations. For PRFs and hash functions, we use AES-128 and HMAC-MD5, respectively, and the NIST 224p elliptic curve is used for group operations.
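Of the primitives named in this setup, HMAC-MD5 is available in the Python standard library, so its use as a keyed hash can be sketched without extra dependencies. This is an illustrative sketch only: the function name `keyed_hash`, the keys and the messages are hypothetical, and the AES-128 and elliptic-curve operations, which need a third-party library, are not shown.

```python
import hashlib
import hmac


def keyed_hash(key: bytes, message: bytes) -> bytes:
    """Keyed hash of message under key, instantiated with HMAC-MD5."""
    return hmac.new(key, message, hashlib.md5).digest()


key = b"\x01" * 16              # placeholder 128-bit key
tag = keyed_hash(key, b"budget")

# HMAC-MD5 always yields a 16-byte digest and is deterministic per key.
print(len(tag))
print(tag == keyed_hash(key, b"budget"))
```

The same keyword under a different key gives an unrelated digest, which is the property a keyed hash contributes when keywords are mapped to index positions.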

For the dataset, we adopt the Enron Email Dataset, which contains about 517401 email files from about 150 users. Keywords are extracted with the jieba library in Python, yielding about 1672878 keywords, from which we generate an inverted index. Our experiments are mainly based on this index.
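The inverted index described above maps each keyword to the set of files that contain it. The sketch below shows the idea on three toy documents. It is an assumption-laden illustration: a simple regular-expression tokenizer stands in for the jieba-based extraction, and the function names and sample texts are hypothetical.

```python
import re
from collections import defaultdict


def extract_keywords(text):
    """Stand-in tokenizer: lowercase alphanumeric words (not jieba)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_inverted_index(files):
    """Map each keyword to the set of file identifiers containing it."""
    index = defaultdict(set)
    for file_id, text in files.items():
        for keyword in extract_keywords(text):
            index[keyword].add(file_id)
    return dict(index)


files = {
    1: "Meeting about the budget",
    2: "Budget approved",
    3: "Lunch meeting",
}
index = build_inverted_index(files)

# Number of keyword-identifier pairs, the size parameter used in Section 5.
num_pairs = sum(len(ids) for ids in index.values())
print(num_pairs)

# A conjunctive query is the intersection of the posting sets.
print(index["budget"] & index["meeting"])
```

The total number of keyword-identifier pairs is the sum of the posting-set sizes, and a conjunction of two keywords corresponds to intersecting their posting sets.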

To evaluate the performance of our scheme, we analyze the experimental results of the three computationally expensive algorithms: EDBSetup, TokenGen, and Search.

Fig. 3 shows the comparison results of the time cost in EDBSetup. In Fig. 3 (a), the time cost of the scheme proposed by Du [23] grows linearly with the number of clients, whereas the number of clients has almost no impact on our scheme and the scheme proposed by Jarecki [19]. The reason is that Du [23] performs a bilinear map operation for each client, which is not needed in our scheme. In Fig. 3 (b), the time cost of all three schemes grows linearly with the number of keyword-identifier pairs, but our scheme takes less time than Jarecki [19].

[Image: E1KOBZ_2022_v16n4_1286_f0003.png]

Fig. 3. The time cost in EDBSetup. (a) The number of keyword-identifier pairs is fixed at 6000 and the number of clients varies. (b) The number of clients is fixed at 10 and the number of keyword-identifier pairs varies.

Fig. 4 shows the comparison results of the time cost in TokenGen. Our scheme and Jarecki [19] cost more time than Du [23], because the use of bilinear mapping in Du [23] makes the number of tokens depend only on the number of keywords. The time cost of our scheme is close to that of Jarecki [19], but slightly higher because our scheme needs to calculate the blind factor ctl[j] ← αi · wj when generating the token.

[Image: E1KOBZ_2022_v16n4_1286_f0004.png]

Fig. 4. The time cost in TokenGen. (a) The number of keyword-identifier pairs is fixed at 200 and the number of keywords varies. (b) The number of keywords is fixed at 100 and the number of keyword-identifier pairs varies.

Fig. 5 shows the time cost of a Boolean query in the three schemes, which grows linearly with the number of keywords and files in all three. The scheme of Du [23] has the highest time cost due to the expensive bilinear pairing operation. Because our scheme performs file filtering operations, it takes more time than Jarecki [19].

[Image: E1KOBZ_2022_v16n4_1286_f0005.png]

Fig. 5. The time cost in Search. (a) The number of keyword-identifier pairs is fixed at 1000 and the number of keywords varies. (b) The number of keywords is fixed at 20 and the number of keyword-identifier pairs varies.

6. Conclusion

In this paper, we propose DMC-SSE, a searchable symmetric encryption scheme for multiple clients that supports Boolean queries. It realizes multi-keyword search in the multi-client scenario and supports fine-grained access control of clients. In addition, our scheme realizes fully dynamic updates of clients and files. Experimental results and security analysis show that our scheme is correct, efficient and secure.

References

  1. D. X. Song, D. Wagner, and A. Perrig, "Practical techniques for searches on encrypted data," in Proc. of IEEE Symp. Secur. Privacy, pp. 44-55, 2000.
  2. D. Boneh, G. D. Crescenzo, R. Ostrovsky, and G. Persiano, "Public key encryption with keyword search," in Proc. of Int. Conf. Theory Appl. Cryptographic Techn., pp. 506-522, 2004.
  3. M. Naveed, M. Prabhakaran, and C. A. Gunter, "Dynamic searchable encryption via blind storage," in Proc. of IEEE Symp. Secur. Privacy, pp. 639-654, 2014.
  4. S. Kamara, C. Papamanthou, and T. Roeder, "Dynamic searchable symmetric encryption," in Proc. of ACM Conf. Comput. Commun. Secur., pp. 965-976, 2012.
  5. K. S. Kim, M. Kim, D. Lee, J. H. Park, and W. Kim, "Forward secure dynamic searchable symmetric encryption with efficient updates," in Proc. of ACM Conf. Comput. Commun. Secur., pp. 1449-1463, 2017.
  6. J. Li, Y. Huang, Y. Wei, Z. L. Liu, C. Y. Dong, W. J. Lou, "Searchable Symmetric Encryption with Forward Search Privacy," IEEE Trans. Depend. Secure Comput., vol. 18, no. 1, pp. 460-474, Jan.-Feb. 2021. https://doi.org/10.1109/TDSC.2019.2894411
  7. S. Tahir, S. Ruj, Y. Rahulamathavan, M. Rajarajan and C. Glackin, "A New Secure and Lightweight Searchable Encryption Scheme over Encrypted Cloud Data," IEEE Trans. Emerging Topics in Computing, vol. 7, no. 4, pp. 530-544, Oct.-Dec. 2019. https://doi.org/10.1109/tetc.2017.2737789
  8. H. Li, Y. Yang, Y. Dai, S. Yu and Y. Xiang, "Achieving Secure and Efficient Dynamic Searchable Symmetric Encryption over Medical Cloud Data," IEEE Trans. Cloud Comput., vol. 8, no. 2, pp. 484-494, Apr.-June 2020. https://doi.org/10.1109/TCC.2017.2769645
  9. R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, "Searchable symmetric encryption: improved definitions and efficient constructions," Journal of Computer Security, vol. 19, no. 5, pp. 895-934, 2011. https://doi.org/10.3233/JCS-2011-0426
  10. M. Chase and S. Kamara, "Structured encryption and controlled disclosure," in Proc. of Advances in Cryptology - ASIACRYPT 2010, pp. 577-594, 2010.
  11. S. Kamara, T. Moataz, "Boolean searchable symmetric encryption with worst-case sub-linear complexity," in Proc. of Advances in Cryptology - EUROCRYPT 2017, pp. 94-124, 2017.
  12. B. Fuhry, R. Bahmani, F. Brasser, F. Hahn, F. Kerschbaum, A.-R. Sadeghi, "HardIDX: Practical and Secure Index with SGX," in Proc. of IFIP Annual Conference on Data and Applications Security and Privacy XXXI, pp. 386-408, 2017.
  13. P. Golle, J. Staddon, and B. R. Waters, "Secure conjunctive keyword search over encrypted data," in Proc. of International Conference on Applied Cryptography and Network Security, pp. 31-45, 2004.
  14. L. Ballard, S. Kamara, F. Monrose, "Achieving efficient conjunctive keyword searches over encrypted data," in Proc. of 7th. Int. Conf. Info. Commu. Secur, pp. 414-426, 2005.
  15. D. Cash, S. Jarecki, C. S. Jutla, H. Krawczyk, M. Rosu, and M. Steiner, "Highly-scalable searchable symmetric encryption with support for boolean queries," Advances in Cryptology-CRYPTO, pp. 353-373, 2013.
  16. S. Lai, S. Patranabis, A. Sakzad, J. Liu, D. Mukhopadhyay, R. Steinfeld, S. Sun, D. Liu, and C. Zuo, "Result pattern hiding searchable encryption for conjunctive queries," in Proc. of ACM Conf. Comput. Commun. Secur., pp. 745-762, 2018.
  17. G. Xu, H. W. Li, Y. S. Dai, K. Yang, X. D. Lin, "Enabling efficient and geometric range query with access control over encrypted spatial data," IEEE Trans. Inf. Forensics Security, vol. 14, no. 4, pp. 870-885, Apr. 2019. https://doi.org/10.1109/TIFS.2018.2868162
  18. M. Raykova, B. Vo, S. M. Bellovin, and T. Malkin, "Secure anonymous database search," in Proc. of the 2009 ACM workshop on Cloud computing security, pp. 115-126, 2009.
  19. S. Jarecki, C. S. Jutla, H. Krawczyk, M. Rosu, and M. Steiner, "Outsourced symmetric private information retrieval," in Proc. of ACM Conf. Comput. Commun. Secur., pp. 875-888, 2013.
  20. B.A. Fisch, B. Vo, F. Krell, A. Kumarasubramanian, V. Kolesnikov, T. Malkin, S.M. Bellovin, "Malicious-client security in blind seer: a scalable private DBMS," in Proc. of IEEE Symp. Secur. Privacy, pp. 395-410, 2015.
  21. S. Faber, S. Jarecki, H. Krawczyk, Q. Nguyen, M. Rosu, M. Steiner, "Rich queries on encrypted data: beyond exact matches," in Proc. of ESORICS, Vienna, Austria, pp. 123-145, 2015.
  22. S. F. Sun, C. Zuo, J. K. Liu, A. Sakzad, R. Steinfeld, T. H. Yuen, D. Gu, "Non-Interactive Multi-Client Searchable Encryption: Realization and Implementation," IEEE Trans. Depend. Secure Comput., vol. 19, no. 1, pp. 452-467, 2022. https://doi.org/10.1109/TDSC.2020.2973633
  23. L. Du, K. Li, Q. Liu, Z. Wu, S. Zhang, "Dynamic multi-client searchable symmetric encryption with support for boolean queries," Inf. Sci., vol. 506, pp. 234-257, Jan. 2020. https://doi.org/10.1016/j.ins.2019.08.014
  24. D. Cash, P. Grubbs, J. Perry, and T. Ristenpart, "Leakage-abuse attacks against searchable encryption," in Proc. of 22nd ACM SIGSAC Conf. Comput. Commun. Secur., pp. 668-679, 2015.
  25. Y. Zhang, J. Katz, and C. Papamanthou, "All your queries are belong to us: The power of file-injection attacks on searchable encryption," in Proc. of IEEE Symp. Secur. Privacy, pp. 707-720, 2016.
  26. Y. Wei, S. Lv, X. Guo, Z. Liu, Y. Huang, and B. Li, "FSSE: Forward secure searchable encryption with keyed-block chains," Inf. Sci., vol. 500, pp. 113-126, Oct. 2019. https://doi.org/10.1016/j.ins.2019.05.059
  27. J. G. Chamani, D. Papadopoulos, C. Papamanthou, and R. Jalili, "New constructions for forward and backward private symmetric searchable encryption," in Proc. of ACM Conf Computer Commun Secur., pp. 1038-1055, 2018.
  28. X. Song, C. Dong, D. Yuan, Q. L. Xu, M. H. Zhao, "Forward Private Searchable Symmetric Encryption with Optimized I/O Efficiency," IEEE Trans. Depend. Secure Comput., vol. 17, no. 5, pp. 912-927, Sept.-Oct. 2020. https://doi.org/10.1109/tdsc.2018.2822294
  29. H. Li, Y. Yang, Y. Dai, S. Yu, Y. Xiang, "Achieving Secure and Efficient Dynamic Searchable Symmetric Encryption over Medical Cloud Data," IEEE Trans. Cloud Comput., vol. 8, no. 2, pp. 484-494, Apr.-June 2020. https://doi.org/10.1109/TCC.2017.2769645
  30. X. Liu, G. Yang, Y. Mu, H. Deng, "Multi-user Verifiable Searchable Symmetric Encryption for Cloud Storage," IEEE Trans. Depend. Secure Comput., vol. 17, no. 6, pp. 1322-1332, Nov.-Dec. 2020. https://doi.org/10.1109/TDSC.2018.2876831