Legal Risks and Regulatory Frameworks for Synthetic Data in AI Large-Model Training
Abstract
Synthetic data is playing an increasingly important role in artificial intelligence (AI) large-model training. However, its generation and application involve complex legal risks, such as systemic risks arising from quality defects, limitations in privacy protection, the reinforcement and amplification of biases, and the potential for misuse.
To address these risks, a multidimensional regulatory framework is necessary, encompassing quality standards, algorithmic transparency,
traceability mechanisms, proactive safety protection, and accountability, aiming to strike a dynamic balance between technological innovation and risk mitigation.
DOI: http://dx.doi.org/10.70711/aitr.v3i3.8037