Skip to content

Create an implementation of an SQL encrypted query on a clear database using TFHE-rs #94

@zaccherinij

Description

@zaccherinij

Overview

TFHE-rs is a pure Rust implementation of TFHE for Boolean and integer arithmetics over encrypted data.

With this bounty, we are asking users to implement an SQL encrypted query on a clear database.

Here's a simplified overview of how FHE can be used to implement an SQL encrypted query on a clear database:

1. Query Encryption
The first step involves the user encrypting their SQL query using their FHE secret key before sending it to the server hosting the database. The encryption ensures that the server cannot see the actual query, only its encrypted form.

2. Processing Encrypted Query on Clear Database
The server then processes this encrypted query on the clear database. FHE computations mixing clear and encrypted data will return encrypted data, it means the server can perform the necessary SQL operations (like SELECT, JOIN, etc.) using the clear data as well as the encrypted query. The result of this operation is an encrypted version of the query result.

3. Decrypting the Result
Finally, the encrypted result is sent back to the user, who can decrypt it using their private key to obtain the clear text result of the SQL query.

We ask that you, the hunters, implement a database that can manipulate signed and unsigned integers (8, 16, 32, 64 bits), booleans and short strings (32 ASCII characters) as possible data types for a column.

As we are not looking at a classical database use case, we don’t expect the database to be resilient to crashes, be highly available etc. it is merely a datastore addressed in a similar way to a database via an encrypted SQL query.

How to participate?

1️⃣ Register here.
2️⃣ When ready, submit your code here.
🗓️ Submission deadline: May 12th, 2024.

Description

Expected Work

The implementation should live in its own crate (and not in tfhe-rs/tfhe/example) and depend on either tfhe-rs 0.5.x or the main branch of tfhe-rs.

We ask that your crate passes the clippy checks when treating warnings as errors.

cargo clippy -- --no-deps -D warnings

We ask you to provide both an executable and an API.

Executable

For the clear database storage you can use existing databases or tools but we require that you manage to load a database from a directory with the following structure:

db_dir

  • table_1.csv
  • table_2.csv

example of a table_content.csv:

column_1_name:uint32,colum_2_name:bool,column_3_name:string
0,true,"first line"
123,false,"some other line"

The executable takes as input a database to load with the format discussed earlier and a file containing the SQL query in plaintext to avoid dealing with escaping quotes on the command line, we require the following format for the output:

$ cargo run --release -- --input-db /path/to/db_dir --query-file query.txt
Runtime: 42.24 s
Clear DB query result: (some result)
Encrypted DB query result: (some result)
Results match: YES

erroneous results will automatically disqualify a submission.

API

You are free to use the High Level API with CPU or GPU or the integer API with CPU or GPU.

We only expect you to implement ONE API, implementing more APIs won’t improve you submission’s classification, correctness is mandatory, incorrect implementations will be disqualified, performance will be the main differentiator for submissions.

For the encryption we ask that you use a Select query from the sqlparser crate as input. You may choose another type if you explain how it's a better choice. You must return your own EncryptedQuery representation that can be used on the device of your choice.

There are 3 possible APIs to implement

  • CPU Integer API (uses tfhe::integer::ServerKey, an EncryptedQuery, a Table struct representing the clear database),
  • GPU Integer API (uses tfhe::integer::CudaServerKey and a CudaEncryptedQuery, a Table struct representing the clear database)
  • High-Level API CPU/GPU (uses tfhe::ServerKey, EncryptedQuery built on HL API types, a Table struct representing the clear database)

The APIs are specified in detail below, choose the one(s) you wish to implement (only implement methods of the API you chose)

CPU Integer API

fn default_cpu_parameters() -> PBSParameters;

fn encrypt_query(query: sqlparser::ast::Select) -> EncryptedQuery;

/// Loads a directory with a structure described above in a format
/// that works for your implementation of the encryted query
fn load_tables(path) -> Tables

/// # Inputs:
/// - sks: The server key to use
/// - input: your EncryptedQuery
/// - tables: the plain data you run the query on
///
/// # Output
/// - EncryptedResult
fn run_fhe_query(
    sks: &tfhe::integer::ServerKey, 
    input: &EncryptedQuery,
    data: &Tables,
) -> EncrypedResult;

/// The output of this function should be a string using the CSV format
/// You should provide a way to compare this string with the output of
/// the clear DB system you use for comparison
fn decrypt_result(clientk_key: &ClientKey, result: &EncryptedResult) -> String

GPU Integer API

fn default_gpu_parameters() -> PBSParameters;

fn encrypt_query_gpu(query: sqlparser::ast::Select) -> CudaEncryptedQuery;

/// Loads a directory with a structure described above in a format
/// that works for your implementation of the encryted query
fn load_tables(path) -> Tables

/// # Inputs:
/// - sks: The server key to use
/// - input: your CudaEncryptedQuery
/// - tables: the plain data you run the query on
///
/// # Output
/// - EncryptedResult
fn run_fhe_query_gpu(
    sks: &tfhe::integer::ServerKey, 
    input: &CudaEncryptedQuery,
    data: &Tables,
) -> EncrypedResult;

/// The output of this function should be a string using the CSV format
/// You should provide a way to compare this string with the output of
/// the clear DB system you use for comparison
fn decrypt_result_gpu(clientk_key: &ClientKey, result: &CudaEncryptedResult) -> String

High-Level API

/// Returns parameters and the device (CudaGpu/Cpu) on which the function /// should be ran
fn default_parameters() -> (PBSParameters, tfhe::Device);

fn load_tables(path) -> Tables

fn encrypt_query(query: String) -> EncryptedQuery;

// The server key is provided to allow setting up threads if needed
fn run_fhe_query(
    sks: &tfhe::integer::ServerKey, 
    input: &EncryptedQuery,
    data: &Tables,
) -> EncrypedResult;

fn decrypt_result(clientk_key: &ClientKey, result: &EncrypedResult) -> String

SQL support

We ask to be able to run the following encrypted queries:

Requested:
SQL Select: https://www.w3schools.com/sql/sql_select.asp
SQL Select Distinct: https://www.w3schools.com/sql/sql_distinct.asp
SQL Where: https://www.w3schools.com/sql/sql_where.asp
SQL And: https://www.w3schools.com/sql/sql_and.asp
SQL Or: https://www.w3schools.com/sql/sql_or.asp
SQL Not: https://www.w3schools.com/sql/sql_not.asp
SQL In: https://www.w3schools.com/sql/sql_in.asp
SQL Between: https://www.w3schools.com/sql/sql_between.asp

For integer values we expect you to manage the <, <=, >, >=, = operators
For strings we expect you to manage the = operator only

Note that the != (not equal) operator is built via the NOT operator of SQL and an = operator.

Note that some operators may re-use other implementations, for example the IN operator is indicated to be equivalent to multiple OR operators.

Bonus API:

SQL Joins: https://www.w3schools.com/sql/sql_join.asp

The format of the encrypted query and result are free, the evaluation of this bounty will be on speed and correctness on the Requested APIs, bonus APIs are NOT required and would only be used as a last resort to differentiate two submissions.

Other Details

Benchmarking

Implementations will be benchmarked on the following AWS hardware:
CPU: hpc7a.96xlarge (192 vCPU, 768 GB Memory)
GPU: p3.2xlarge (note that TFHE-rs only supports single GPU execution for now) (1 Tesla V100, 16 GB GPU Memory)

Parameters

The allowed parameters are:

PARAM_MESSAGE_2_CARRY_2_KS_PBS,
PARAM_MULTI_BIT_MESSAGE_2_CARRY_2_GROUP_2_KS_PBS,
PARAM_MULTI_BIT_MESSAGE_2_CARRY_2_GROUP_3_KS_PBS

Your solution must be able to run with the multi bit and non multi bit parameter sets, here it should be straightforward as the message and carry moduli are the same.

Note that the multi bit parameters should be faster on GPU, namely the group 3 should be the fastest. It could be the case on CPU as well but it can saturate even the largest CPU machines with threads and performance suffers as a consequence.

Turtle shell queries

As Mario Kart has its turtle shells, you will be able, if you want to, to provide one example of a small DB (with the directory and csv structure discussed earlier) and a “turtle shell” query you feel is very hard/has corner cases alongside your submission, we will run the DB and the query on the submissions we gather, corner cases beware! The performance and correctness on these hunter provided use cases will be used for the evaluation of the bounty, in addition to cases we’ll choose ourselves, we ask that the query runs in under 5 minutes on a laptop/commodity hardware, though our benchmarking machine are larger we want to keep runtimes reasonable when possible.

Good luck!

Reward

🥇Best submission: up to €5,000

To be considered best submission, a solution must be efficient, effective and demonstrate a deep understanding of the core problem. Alongside the technical correctness, it should also be submitted with a clean code, clear explanations and a complete documentation.

🥈Second-best submission: up to €3,000

For a solution to be considered the second best submission, it should be both efficient and effective. The code should be neat and readable, while its documentation might not be as exhaustive as the best submission, it should cover the key aspects of the solution.

🥉Third-best submission: up to €2,000

The third best submission is one that presents a solution that effectively tackles the challenge at hand, even if it may have certain areas of improvement in terms of efficiency or depth of understanding. Documentation should be present, covering the essential components of the solution.

Reward amounts are decided based on code quality and speed performance on a m6i.metal AWS server.

Related links and references

Questions?

Do you have a specific question about this bounty? Simply comment this issue and our team will get back to you!

Metadata

Metadata

Assignees

Labels

🎯 BountyThis bounty is currently open📁 TFHE-rslibrary targeted: TFHE-rs

Type

No type

Projects

Status

Awarded Contributions

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions