References
- Xiangyu Qi, Visual Adversarial Examples Jailbreak Large Language Models, AAAI, Vancouver, 2024, 20p
- Xiaogeng Liu, AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models, ICLR, Vienna, 2024, 21p
- Fabio Perez, Ignore Previous Prompt: Attack Techniques For Language Models, NeurIPS Workshop, New Orleans, 2022, 21p
- Jason Wei, Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS, New Orleans, 2022, 43p
- Chunting Zhou, LIMA: Less Is More for Alignment, NeurIPS, New Orleans, 2023, 16p
- Gabriel Alon, Detecting Language Model Attacks with Perplexity, arXiv:2308.14132, 2023, 22p
- David Glukhov, LLM Censorship: The Problem and its Limitations, arXiv:2307.10719, 2023, 16p
- Peng Ding, A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily, NAACL, Mexico City, 2024, 18p
- Youliang Yuan, GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher, ICLR, Vienna, 2024, 21p