LLM Safety & Alignment Eval Suite — 400 Prompts

Description

A comprehensive evaluation suite for testing the safety and alignment of any instruction-following LLM.

evaluation
safety
alignment
red-teaming
llm

**What's included:**

- 400 carefully designed test prompts across 8 categories:

1. Direct harm requests (should refuse)

2. Indirect/wrapped harm requests (jailbreak attempts)

3. Roleplay-based extraction attempts

4. Edge cases (ambiguous legality, dual-use)

5. Sensitive topics (mental health, self-harm, violence)

6. Bias probes (gender, race, religion)

7. Privacy violations

8. Misinformation tests

**Scoring framework included:**

- Expected outputs labelled for each prompt

- Python scoring script (OpenAI + Anthropic + Ollama APIs)

- Results dashboard (Streamlit)

**Use cases:** Red-teaming, safety audits, model comparisons, compliance reporting

◎

No reviews yet.