Data Traceability


Compliance and explainability share a common requirement: the ability to trace data from its point of production to its point of consumption. This traceability must be supported along a data-processing pipeline made up of heterogeneous computational components and devices. Information flow control offers a starting point: data is tagged with labels, and these labels follow the data as it propagates and is transformed throughout the computation. These principles need to be extended to cope with the heterogeneity of today’s data-processing pipelines. We study the following problems related to this challenge:

  • How can data traceability be supported during model training and invocation?
  • How can traceability be bootstrapped on computationally constrained IoT devices?
  • How can we gain assurance that the proposed traceability tool is suitable for enforcing specific GDPR requirements?
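
To make the label-propagation idea from information flow control concrete, the following is a minimal sketch in Python, not the project's actual tool. All names here (`Labeled`, `source`, `apply`, `may_release`) and the label strings are hypothetical and chosen only for illustration. The sketch assumes that labels are plain strings, that a derived value carries the union of its inputs' labels, and that a consumer may receive a value only if it is cleared for every label on it.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet


@dataclass(frozen=True)
class Labeled:
    """A value together with the provenance labels attached to it."""
    value: Any
    labels: FrozenSet[str] = frozenset()


def source(value: Any, *labels: str) -> Labeled:
    """Tag a value with labels at its point of production."""
    return Labeled(value, frozenset(labels))


def apply(fn: Callable[..., Any], *inputs: Labeled) -> Labeled:
    """Run fn on the raw values; the result carries the union of all input labels."""
    result = fn(*(x.value for x in inputs))
    merged = frozenset().union(*(x.labels for x in inputs))
    return Labeled(result, merged)


def may_release(data: Labeled, allowed: FrozenSet[str]) -> bool:
    """A consumer may receive data only if it is cleared for every label on it."""
    return data.labels <= allowed


# Two readings produced by different (hypothetical) devices.
temperature = source(21.5, "sensor:kitchen", "personal")
humidity = source(40.0, "sensor:hall")

# A transformation step: the derived value inherits labels from both inputs.
average = apply(lambda a, b: (a + b) / 2, temperature, humidity)

# At the point of consumption, the labels determine whether release is permitted.
public_sink = frozenset({"sensor:kitchen", "sensor:hall"})
trusted_sink = public_sink | {"personal"}
```

In this sketch, `average` ends up carrying all three labels, so `may_release(average, public_sink)` is false while `may_release(average, trusted_sink)` is true. Taking the union of labels is one conservative choice; a real pipeline spanning trained models and IoT devices would need a richer label model and a way to carry labels across component boundaries, which is the extension the questions above are concerned with.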