Aussie AI

AI Safety Research

  • Last Updated 12 December, 2024
  • by David Spuler, Ph.D.

Safe and responsible use of AI is an important and all-encompassing goal. Multiple concerns arise from the use of modern AI capabilities, and more will arise in the future with more advanced AI systems. This article surveys the research papers on different AI safety issues.

Types of AI Safety Issues

There are a variety of distinct issues in the appropriate use of AI. Some of the categories include:

  • Bias and fairness
  • Inaccurate results
  • Imaginary results ("hallucinations")
  • Inappropriate responses

There are some issues that get quite close to being philosophy rather than technology:

  • Alignment (ensuring AI engines are "aligned" with human goals)
  • Overridability/interruptibility
  • Obedience vs autonomy

There are also some overarching issues for governments and the broader community:

  • Ethics
  • Governance
  • Regulation
  • Auditing and Enforcement
  • Risk Mitigation

And since we may rely on AI models in various real-world situations, including dangerous real-time situations such as driving a car, there are some practical technological issues in ensuring that AI engines operate safely and reliably within their basic operational scope (a small error-handling sketch follows this list):

  • Testing and Debugging (simply avoiding coding "bugs" in complex AI engines)
  • Real-time performance profiling ("de-slugging")
  • Error Handling (tolerance of internal or external errors)
  • Code Resilience (handling unexpected inputs or situations reasonably)
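
As a simple illustration of the error handling and resilience points above, the sketch below wraps a hypothetical LLM inference call with retries, a back-off delay, and a safe fallback response. The call_llm() function, the exception types, and the retry parameters are assumptions for illustration, not any particular library's API.

    # Minimal sketch of defensive error handling around an LLM inference call.
    # call_llm() is a hypothetical placeholder for a real inference API.
    import time

    class LLMTimeoutError(Exception):
        pass

    def call_llm(prompt: str, timeout: float) -> str:
        """Hypothetical inference call; replace with a real client library."""
        raise NotImplementedError

    def safe_generate(prompt: str, retries: int = 3, timeout: float = 10.0) -> str:
        """Return the model's answer, or a safe fallback if inference keeps failing."""
        for attempt in range(retries):
            try:
                return call_llm(prompt, timeout=timeout)
            except (LLMTimeoutError, ConnectionError):
                time.sleep(2 ** attempt)   # Transient failure: back off and retry.
            except Exception:
                break                      # Unexpected internal error: fail closed.
        return "Sorry, the AI service is temporarily unavailable."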

Overviews, Surveys, and Reviews

Various authors have reviewed the areas of safety and ethics:

Hallucinations

Hallucinations are plausible-sounding answers that are not correct and are not based on any facts. It appears as if the LLM is lying or faking an answer, but the model doesn't actually know this. Rather, it is probabilistically trying to produce the best answer, and when it doesn't have a factual one, it can simply fill in the blanks.
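
One mitigation approach discussed in the research literature is self-consistency checking: sample several answers to the same question and treat disagreement between samples as a sign of a possible hallucination. The sketch below is a minimal illustration of that idea, assuming a hypothetical non-deterministic generate() call; it is not a complete hallucination detector.

    # Minimal sketch of a self-consistency check for possible hallucinations.
    # generate() is a hypothetical sampling call to an LLM, not a real API.
    from collections import Counter

    def generate(prompt: str, temperature: float = 0.8) -> str:
        """Hypothetical non-deterministic LLM call; replace with a real client."""
        raise NotImplementedError

    def looks_consistent(prompt: str, samples: int = 5, threshold: float = 0.6) -> bool:
        """Return True if most sampled answers agree, suggesting lower hallucination risk."""
        answers = [generate(prompt).strip().lower() for _ in range(samples)]
        top_count = Counter(answers).most_common(1)[0][1]
        return (top_count / samples) >= threshold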

Security of AI

Research on security issues involving AI and LLMs:

Safety Monitor

A safety monitor is a component that can be added to an LLM deployment to observe and check the model's inputs and outputs at inference time.
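
As a rough sketch of the idea, the code below wraps a hypothetical query_llm() call in a monitoring layer that logs and blocks responses matching a simple policy list. The function names, the blocked-topic list, and the substring check are illustrative assumptions; production monitors typically use trained classifiers and richer policies.

    # Minimal sketch of a safety monitor wrapped around an LLM call.
    # All names here (query_llm, BLOCKED_TOPICS) are illustrative assumptions.
    import logging

    BLOCKED_TOPICS = ["weapons", "self-harm"]   # Example policy list only.

    def query_llm(prompt: str) -> str:
        """Hypothetical placeholder for a real inference API."""
        raise NotImplementedError

    def monitored_query(prompt: str) -> str:
        """Run the LLM, but log and block any response that trips the policy check."""
        response = query_llm(prompt)
        for topic in BLOCKED_TOPICS:
            if topic in response.lower():
                logging.warning("Safety monitor blocked a response (topic: %s)", topic)
                return "I can't help with that request."
        return response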

General Thoughts on AI Safety

High-level debate and discussions of AI safety issues:

Government Policy and Regulation

Various governments have examined issues around regulation, and there has also been much debate:

Auditing and Enforcement

Papers on auditing or enforcement of AI policy:

  • J. Mökander and L. Floridi. 2022, Operationalising AI governance through ethics-based auditing: An industry case study. AI and Ethics, pages 1–18, https://link.springer.com/article/10.1007/s43681-022-00171-7
  • J. Mökander, J. Schuett, H. R. Kirk, and L. Floridi. June 2023. Auditing large language models: A three-layered approach. arXiv preprint arXiv:2302.08500. https://arxiv.org/abs/2302.08500
  • J. Mökander, J. Morley, M. Taddeo, and L. Floridi. Ethics-based auditing of automated decision-making systems: Nature, scope, and limitations. Science and Engineering Ethics, 27(44), 2021. https://arxiv.org/abs/2110.10980

Bias and Fairness

AI engines have shown bias in various ways. The goal is to have them show "fairness" in their results:

Ethics of Responsible AI Research

Ethical issues in AI research and related publication of results:

AI Alignment Research

Alignment is the study of how to ensure that AI engines are "aligned" with the goals and intent of humans.

AI Industry Safety Practices

Various papers discuss the practices of the major AI players in the industry, along with issues such as self-governance.

Technical Verification and Testing of AI Safety

Testing and evaluation of AI safety issues:

  • Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. May 2017. Safety verification of deep neural networks. In Computer Aided Verification, pages 3–29, https://arxiv.org/abs/1610.06940
  • D. Ganguli, L. Lovitt, J. Kernion, A. Askell, Y. Bai, S. Kadavath, B. Mann, E. Perez, N. Schiefer, K. Ndousse, A. Jones, S. Bowman, A. Chen, T. Conerly, N. DasSarma, D. Drain, N. Elhage, S. El-Showk, S. Fort, Z. Hatfield-Dodds, T. Henighan, D. Hernandez, T. Hume, J. Jacobson, S. Johnston, S. Kravec, C. Olsson, S. Ringer, E. Tran-Johnson, D. Amodei, T. Brown, N. Joseph, S. McCandlish, C. Olah, J. Kaplan, and J. Clark. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned. arXiv preprint arXiv:2209.07858, 2022 https://arxiv.org/abs/2209.07858
  • K Ramesh, A Chavan, S Pandit, 2023, A Comparative Study on the Impact of Model Compression Techniques on Fairness in Language Models, Microsoft Research, https://aclanthology.org/2023.acl-long.878.pdf, https://www.microsoft.com/en-us/research/uploads/prod/2023/07/3687_Paper.pdf (Rather than testing full models, this analysis examines optimized models due to quantization, pruning or distillation.)
  • T. Shevlane. Structured access: An emerging paradigm for safe AI deployment. In The Oxford Handbook of AI Governance, 2022, https://arxiv.org/abs/2201.05159
  • E. Perez, S. Huang, F. Song, T. Cai, R. Ring, J. Aslanides, A. Glaese, N. McAleese, and G. Irving. 2022, Red teaming language models with language models. arXiv preprint arXiv:2202.03286, https://arxiv.org/abs/2202.03286
  • OpenAI. 2023. Safety best practices. https://platform.openai.com/docs/guides/safety-best-practices
  • William Saunders, Girish Sastry, Andreas Stuhlmueller, and Owain Evans. Trial without error: Towards safe reinforcement learning via human intervention. arXiv preprint arXiv:1707.05173, 2017. https://arxiv.org/abs/1707.05173
  • Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed, Oct 2023, Mistral 7B, https://arxiv.org/abs/2310.06825, Code: https://mistral.ai/news/announcing-mistral-7b/ (Examines guardrails and testing of the safety of the model against harmful inputs.)

AI Factual Inaccuracy

Research papers on accuracy of AI results include:

AI Safety Incidents

Various incidents and accidents related to AI safety issues:

Incident Databases: There are various databases that collect information about AI safety incidents.

Medical Ethics and AI

The use of AI in medicine creates some additional ethical issues:

  • Vollmer S., Mateen B.A., Bohner G., Király F.J., Ghani R., Jonsson P., et al. Machine learning and AI research for patient benefit: 20 critical questions on transparency, replicability, ethics and effectiveness. BMJ. 2018;(368):1–12. https://pubmed.ncbi.nlm.nih.gov/32198138/
  • Cockerill RG., 2020, Ethics Implications of the Use of Artificial Intelligence in Violence Risk Assessment. J Am Acad Psychiatry Law. 2020 Sep;48(3):345-349. doi: 10.29158/JAAPL.003940-20. Epub 2020 May 14. PMID: 32409300, https://pubmed.ncbi.nlm.nih.gov/32409300/
  • Barron DS. 2021, Commentary: the ethical challenges of machine learning in psychiatry: a focus on data, diagnosis, and treatment. Psychol Med. 2021 Nov;51(15):2522-2524. doi: 10.1017/S0033291721001008. Epub 2021 May 12. PMID: 33975655, https://pubmed.ncbi.nlm.nih.gov/33975655/
  • O'Reilly-Shah VN, Gentry KR, Walters AM, Zivot J, Anderson CT, Tighe PJ. 2020, Bias and ethical considerations in machine learning and the automation of perioperative risk assessment. Br J Anaesth. 2020 Dec;125(6):843-846. doi: 10.1016/j.bja.2020.07.040. Epub 2020 Aug 21. PMID: 32838979, https://pubmed.ncbi.nlm.nih.gov/32838979/
  • Buchlak QD, Esmaili N, Leveque JC, Bennett C, Piccardi M, Farrokhi F., 2020, Ethical thinking machines in surgery and the requirement for clinical leadership. Am J Surg. 2020 Nov;220(5):1372-1374. doi: 10.1016/j.amjsurg.2020.06.073. Epub 2020 Jul 8. PMID: 32723487, https://pubmed.ncbi.nlm.nih.gov/32723487/
  • Starke G, De Clercq E, Borgwardt S, Elger BS., 2020, Computing schizophrenia: ethical challenges for machine learning in psychiatry. Psychol Med. 2021 Nov;51(15):2515-2521. doi: 10.1017/S0033291720001683. Epub 2020 Jun 15. PMID: 32536358, https://pubmed.ncbi.nlm.nih.gov/32536358/
  • Jacobson NC, Bentley KH, Walton A, Wang SB, Fortgang RG, Millner AJ, Coombs G 3rd, Rodman AM, Coppersmith DDL., 2020, Ethical dilemmas posed by mobile health and machine learning in psychiatry research. Bull World Health Organ. 2020 Apr 1;98(4):270-276. doi: 10.2471/BLT.19.237107. Epub 2020 Feb 25. PMID: 32284651, https://pubmed.ncbi.nlm.nih.gov/32284651/
  • Johnson SLJ., 2019, AI, Machine Learning, and Ethics in Health Care. J Leg Med. 2019 Oct-Dec;39(4):427-441. doi: 10.1080/01947648.2019.1690604. PMID: 31940250 https://pubmed.ncbi.nlm.nih.gov/31940250/
  • Vayena E, Blasimme A, Cohen IG., 2018, Machine learning in medicine: Addressing ethical challenges. PLoS Med. 2018 Nov 6;15(11):e1002689. doi: 10.1371/journal.pmed.1002689. eCollection 2018 Nov. PMID: 30399149, https://pubmed.ncbi.nlm.nih.gov/30399149/
  • Nabi J., 2018, How Bioethics Can Shape Artificial Intelligence and Machine Learning. Hastings Cent Rep. 2018 Sep;48(5):10-13. doi: 10.1002/hast.895. PMID: 30311202, https://pubmed.ncbi.nlm.nih.gov/30311202/
  • Char DS, Shah NH, Magnus D., 2018, Implementing Machine Learning in Health Care - Addressing Ethical Challenges. N Engl J Med. 2018 Mar 15;378(11):981-983. doi: 10.1056/NEJMp1714229. PMID: 29539284, https://pubmed.ncbi.nlm.nih.gov/29539284/
  • Fiske A, Henningsen P, Buyx A., 2019, Your Robot Therapist Will See You Now: Ethical Implications of Embodied Artificial Intelligence in Psychiatry, Psychology, and Psychotherapy. J Med Internet Res. 2019 May 9;21(5):e13216. doi: 10.2196/13216. PMID: 31094356, https://pubmed.ncbi.nlm.nih.gov/31094356/
  • Beil Michael, Proft Ingo, van Heerden Daniel, Sviri Sigal, van Heerden Peter Vernon. 2019, Ethical considerations about artificial intelligence for prognostication in intensive care. Intensive Care Medicine Experimental. 2019;7:70. http://www.ncbi.nlm.nih.gov/pmc/articles/pmc6904702/, https://pubmed.ncbi.nlm.nih.gov/31823128/
  • Lasse Benzinger, Frank Ursin, Wolf-Tilo Balke, Tim Kacprowski & Sabine Salloch, 2023, Should Artificial Intelligence be used to support clinical ethical decision-making? A systematic review of reasons BMC Medical Ethics volume 24, Article number: 48 (2023), https://doi.org/10.1186/s12910-023-00929-6
  • Rachel Dlugatch, Antoniya Georgieva & Angeliki Kerasidou, 2023, Trustworthy artificial intelligence and ethical design: public perceptions of trustworthiness of an AI-based decision-support tool in the context of intrapartum care, BMC Medical Ethics Open Access 20 June 2023, https://doi.org/10.1186/s12910-023-00917-w
  • Dzobo K, Adotey S, Thomford NE, Dzobo W. Integrating Artificial and Human Intelligence: A Partnership for Responsible Innovation in Biomedical Engineering and Medicine. OMICS. 2020 May;24(5):247-263. doi: 10.1089/omi.2019.0038. Epub 2019 Jul 16. PMID: 31313972, https://pubmed.ncbi.nlm.nih.gov/31313972/
  • McCradden MD, Joshi S, Mazwi M, Anderson JA., 2020, Ethical limitations of algorithmic fairness solutions in health care machine learning. Lancet Digit Health. 2020 May;2(5):e221-e223. doi: 10.1016/S2589-7500(20)30065-0. PMID: 33328054, https://pubmed.ncbi.nlm.nih.gov/33328054/
  • Kulikowski CA., 2019, Beginnings of Artificial Intelligence in Medicine (AIM): Computational Artifice Assisting Scientific Inquiry and Clinical Art - with Reflections on Present AIM Challenges. Yearb Med Inform. 2019 Aug;28(1):249-256. doi: 10.1055/s-0039-1677895. Epub 2019 Apr 25. PMID: 31022744, https://pubmed.ncbi.nlm.nih.gov/31022744/
  • Park S.H., Kim Y.H., Lee J.Y., Yoo S., Kim C.J. Ethical challenges regarding artificial intelligence in medicine from the perspective of scientific editing and peer review. Science Editing. 2019;6:91–98. https://www.semanticscholar.org/paper/Ethical-challenges-regarding-artificial-in-medicine-Park-Kim/7a5b3c84c6f5d16e68eaf17989b0debfd4ba57d0

Data Leakage

Data leakage refers to AI usage accidentally disclosing data that you would prefer to keep confidential. The "leak" can be caused by the LLM or by the user, depending on the context. There are several ways this can occur:

  • Uploading confidential data in AI queries (User data leakage)
  • Training or fine-tuning data containing proprietary information (Training data leakage)
  • RAG datastore documents containing proprietary information (RAG data leakage)

In the case of the LLM's output leaking, internal company IP is accidentally "leaked" to the public because the AI was trained on documents containing internal information. The AI is not smart enough to notice when it shouldn't be reading a document, and anything that goes into the training dataset, or into the RAG datastore, can be shown to users.

User data leakage occurs when company users send proprietary information to a third-party AI engine. In theory, this data is protected by the confidentiality practices of the LLM company. The issue is similar to company staff typing confidential information into their Google queries, but it is more problematic because AI queries can upload entire documents for analysis by the LLM, such as when using an LLM for grammar checking.
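
As a rough illustration of mitigating user data leakage, the sketch below applies a simple redaction filter before a document is uploaded to a third-party LLM. The regular expressions and the send_to_llm() call are assumptions for illustration only; real deployments generally rely on dedicated data loss prevention tooling.

    # Minimal sketch of redacting likely-confidential text before it leaves the company.
    # The regex patterns and send_to_llm() are illustrative assumptions only.
    import re

    REDACTION_PATTERNS = [
        (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),            # Email addresses.
        (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),                    # US SSN-style numbers.
        (re.compile(r"(?i)\b(confidential|internal only)\b"), "[REDACTED]"),
    ]

    def redact(text: str) -> str:
        """Replace likely-sensitive substrings before the text is uploaded."""
        for pattern, replacement in REDACTION_PATTERNS:
            text = pattern.sub(replacement, text)
        return text

    def send_to_llm(prompt: str) -> str:
        """Hypothetical third-party LLM call; replace with a real client library."""
        raise NotImplementedError

    def safe_submit(document: str) -> str:
        return send_to_llm(redact(document))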

Research papers on data leakage:

Refusal

Refusal refers to the way that an LLM will politely decline to answer an inappropriate question. There are many types of questions that we don't want an LLM to respond to, and achieving this behavior requires training.
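
Refusal behavior is mainly trained into the model itself, but a deployment can also add an application-level pre-check in front of the LLM. The sketch below is a minimal, assumption-laden illustration of that outer layer; is_disallowed() and its keyword list stand in for a real moderation classifier.

    # Minimal sketch of an application-level refusal check in front of an LLM.
    # is_disallowed() stands in for a trained moderation classifier; the
    # keyword list is purely illustrative.
    DISALLOWED_KEYWORDS = ["build a bomb", "synthesize a virus"]

    def is_disallowed(prompt: str) -> bool:
        """Crude stand-in for a moderation model deciding whether to refuse."""
        lowered = prompt.lower()
        return any(keyword in lowered for keyword in DISALLOWED_KEYWORDS)

    def answer(prompt: str, llm) -> str:
        """Refuse politely if the prompt is disallowed, otherwise query the LLM."""
        if is_disallowed(prompt):
            return "I'm sorry, but I can't help with that request."
        return llm(prompt)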

Guardrails

Guardrails are checks and filters placed around an LLM's inputs and outputs to keep its behavior within safe limits. Research papers on guardrails:

Jailbreak

Jailbreaking is the trick of using plain English prompts to "break into" an AI system. It is not so much a violation of the server itself; rather, it refers to ways of getting the LLM to answer questions that its developers probably don't want it to answer. In other words, it's a trick to bypass the "refusal" behavior of an LLM.
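
Alongside safety training, one simple line of defense is to screen incoming prompts for known jailbreak phrasings before they reach the model. The heuristic sketch below is illustrative only; the phrase list is an assumption, and real systems combine such filters with trained classifiers.

    # Minimal heuristic sketch for flagging likely jailbreak attempts in a prompt.
    # The phrase list is illustrative only; real filters also use trained classifiers.
    JAILBREAK_PHRASES = [
        "ignore previous instructions",
        "pretend you have no restrictions",
        "act as an ai without rules",
    ]

    def looks_like_jailbreak(prompt: str) -> bool:
        """Return True if the prompt matches a known jailbreak phrasing."""
        lowered = prompt.lower()
        return any(phrase in lowered for phrase in JAILBREAK_PHRASES)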

Privacy

Research on privacy-related risks or concerns:

More Research on AI Safety

Research papers that cover various other AI safety issues:

More AI Research

Read more about: