posted on 2017-03-02, 01:46authored byTeo, Sin Gee
Rapid advances in automated data collection tools and data storage technology has
led to the wide availability of huge amount of distributed data owned by different parties.
Data mining can use the distributed data to discover rules, patterns or knowledge that
are normally not discovered data owned by a single party. Thus, data mining on the
distributed data can lead to new insights and economic advantages. However, in recent
years, privacy laws have been enacted to protect any individual sensitive information from
privacy violation and misuse. To address the issue, many have proposed privacy-preserving
data mining (PPDM) based on secure multi-party computation (SMC) that can mine the
distributed data with privacy preservation (i.e., privacy protection). However, most SMC-based
solutions are ad-hoc. They are proposed for specific applications, and thus cannot
be applied to other applications directly. Another limitation of current PPDM is with only
a limited set of operators such as +,−,× and log (logarithm). In data mining primitives,
some functions can involve operators such as / and √ . The above issues have motivated
us to investigate a general SMC-based solution to solve the current limitations of PPDM.
In this thesis, we propose a general model for privacy-preserving data mining, namely
as DAG. We apply a hybrid model that combines the homomorphic encryption protocol
and the circuit approach in DAG model. The hybrid model has been proven efficient in
computation and effective in protecting data privacy via the theoretical and experimental
proofs. Specifically, our proposed research objectives are as follows:
(i) We want to propose a general model of privacy-preserving data mining (i.e., DAG) that consists of a set of secure operators. The secure operators can support many
mining primitives. The two-party model which is the efficient and effective model
is applied to develop secure protocols in DAG. Our secure operators can provide a complete privacy under the semi-honest model. Moreover, the secure operators are
efficient in computation.
(ii) We will integrate DAG model into various classification problems by proposing new
privacy-preserving classification algorithms.
(iii) To make our DAG model that can support wider applications, we will integrate DAG
into other application domains. We will integrate DAG into ant colony optimization
(ACO) to solve the traveling salesman problem (TSP) by proposing a new privacypreserving
traveling salesman problem (PPTSP).
In this report, we present most results of the objectives mentioned above. The DAG
model is general – its operators, if pipelined together, can implement various functions.
It is also extendable – new secure operators can be defined to expand the functions the
model supports. All secure operators of DAG are strictly proven secure via simulation
paradigm (Goldreich, 2004). In addition, the error bounds and the complexities of the
secure operators are derived so as to investigate accuracy and computation performance
of our DAG model. We apply our DAG model into various application domains. We first
apply DAG into data mining classification algorithms such as support vector machine,
kernel regression, and Na¨ıve Bayes. Experiment results show that DAG generates outputs
that are almost the same as those by non-private setting, where multiple parties simply
disclose their data. DAG is also efficient in computation of data mining tasks. For example,
in kernel regression, when training data size is 683,093, one prediction in non-private
setting takes 5.93 sec, and that by our DAG model takes 12.38 sec. In the experiment of
PPTSP, a salesman can find the approximate optimal traveled distance without disclosing
any city locations in TSP. Various domain applications studies show that our DAG is the
general model yet efficient for secure multi-party computation.
History
Campus location
Australia
Principal supervisor
Cheng-Siong Lee
Year of Award
2016
Department, School or Centre
Information Technology (Monash University Clayton)