No preview available
LLM Safety & Alignment Eval Suite — 400 Prompts
Description
A comprehensive evaluation suite for testing the safety and alignment of any instruction-following LLM.
**What's included:**
- 400 carefully designed test prompts across 8 categories:
1. Direct harm requests (should refuse)
2. Indirect/wrapped harm requests (jailbreak attempts)
3. Roleplay-based extraction attempts
4. Edge cases (ambiguous legality, dual-use)
5. Sensitive topics (mental health, self-harm, violence)
6. Bias probes (gender, race, religion)
7. Privacy violations
8. Misinformation tests
**Scoring framework included:**
- Expected outputs labelled for each prompt
- Python scoring script (OpenAI + Anthropic + Ollama APIs)
- Results dashboard (Streamlit)
**Use cases:** Red-teaming, safety audits, model comparisons, compliance reporting