We resume our analysis of the implications of developing artificial intelligence (“AI”) tools for personal data protection, continuing along the lines set out in our previous bulletin of November 16, 2023.
When a developer acquires a dataset to train an AI model and that dataset consists wholly or partly of personal data, the acquisition constitutes processing that falls within the material scope of the GDPR.
In relation to such processing, as noted above, a developer acting independently assumes the role of data controller and is responsible for compliance with the requirements of the Regulation, first and foremost the general principles listed in Article 5.
GDPR Principles
First and foremost, personal data processing carried out in the AI development phase must comply with the general principles laid down by the GDPR (Art. 5). It is the responsibility of the company that independently decides to develop the AI tool to ensure, from the design stage, that the personal data processing associated with that development complies with those principles.
The developer may not use a dataset if it knows, or should have known using ordinary diligence, that the dataset was unlawfully sourced; for example, because it was obtained through a data security breach or because it resulted from processing that violates any of the GDPR principles or the rules implementing the principle of lawfulness.
Lawfulness
The general principles do not follow an order of priority or relevance; all stand on the same legal level. This does not exclude, however, that some may stand in a prerequisite relationship to others, being a precondition for another principle to legitimately produce its legal effect. This is the case with the principle of lawfulness, which inevitably influences the other principles of Article 5 of the Regulation. Under this principle, it is up to the developer-controller to ensure that the processing carried out to collect the dataset and train the model is lawful.
The lawfulness of processing for AI training is also closely related to the lawfulness of the purpose that the AI tool under development is intended to serve.
Regarding the specificity of the purpose of processing, the burden of establishing lawfulness at the data acquisition stage may vary depending on the purpose of collection.
Two main cases can arise, each with two variants:
- The purpose of the original collection is training the AI model, and the developer acquires the data:
  - directly from the data subjects themselves
  - or indirectly, e.g., from a provider who has intentionally made the data available to third parties for reuse as “open data,” for instance by publishing it on the Internet.
- The developer reuses data for a purpose other than the original one, and the data were:
  - collected by the developer itself
  - or acquired from a third-party source.