AI-Powered Cybersecurity Tools: LLM Security Testing (Open Source)

← Back to AI-Powered Cybersecurity Tools Hub | Full AI Tools Catalog | Main Atlas

This category contains 9 documented tools. It focuses on capabilities used for baseline hardening, monitoring integration, and defense-in-depth validation. Use this section when building shortlists, comparing operational tradeoffs, and mapping controls to detection/response ownership.

Category Evaluation Checklist

  • Coverage depth against your highest-priority threats and compliance obligations.
  • Operational overhead for deployment, tuning, and long-term maintenance.
  • Signal quality versus analyst workload and false-positive pressure.
  • Integration fit with SIEM, ticketing, identity, cloud, and engineering workflows.
  • Governance readiness including auditability, ownership clarity, and change control.

Jump by Name

D | G | L | N | P | R | T

Letter D

This letter section contains 1 tools.

DeepEval

  • Website: https://github.com/confident-ai/deepeval
  • Model: Open Source
  • Category: LLM Security Testing (Open Source)
  • Source Lists: Curated List

What it does: DeepEval is used in llm security testing (open source) programs to support baseline hardening, monitoring integration, and defense-in-depth validation. Source summaries describe it as: LLM evaluation framework for measuring behavior quality, safety constraints, and test outcomes.

Operational value: Security teams commonly use this capability to improve consistency between detection, investigation, and response decisions, especially when alerts, evidence collection, and triage ownership are distributed across multiple teams.

Typical deployment pattern: Implementations usually start with scoped pilot coverage, baseline logging/telemetry validation, and explicit runbook mapping so analysts understand when to escalate, contain, or defer.

Selection considerations: As an open-source option, teams usually evaluate maintainer activity, release cadence, and community response quality. Related source context: LLM Security Testing (Open Source).

Back to Name Jump

Letter G

This letter section contains 2 tools.

garak

  • Website: https://github.com/NVIDIA/garak
  • Model: Open Source
  • Category: LLM Security Testing (Open Source)
  • Source Lists: Curated List

What it does: garak is used in llm security testing (open source) programs to support baseline hardening, monitoring integration, and defense-in-depth validation. Source summaries describe it as: LLM vulnerability scanner for probing prompt injection, unsafe outputs, and model security weaknesses.

Operational value: Security teams commonly use this capability to improve consistency between detection, investigation, and response decisions, especially when alerts, evidence collection, and triage ownership are distributed across multiple teams.

Typical deployment pattern: Implementations usually start with scoped pilot coverage, baseline logging/telemetry validation, and explicit runbook mapping so analysts understand when to escalate, contain, or defer.

Selection considerations: As an open-source option, teams usually evaluate maintainer activity, release cadence, and community response quality. Related source context: LLM Security Testing (Open Source).

Back to Name Jump

Giskard

  • Website: https://github.com/Giskard-AI/giskard
  • Model: Open Source
  • Category: LLM Security Testing (Open Source)
  • Source Lists: Curated List

What it does: Giskard is used in llm security testing (open source) programs to support baseline hardening, monitoring integration, and defense-in-depth validation. Source summaries describe it as: Open-source testing framework for ML and LLM quality, robustness, and security risk analysis.

Operational value: Security teams commonly use this capability to improve consistency between detection, investigation, and response decisions, especially when alerts, evidence collection, and triage ownership are distributed across multiple teams.

Typical deployment pattern: Implementations usually start with scoped pilot coverage, baseline logging/telemetry validation, and explicit runbook mapping so analysts understand when to escalate, contain, or defer.

Selection considerations: As an open-source option, teams usually evaluate maintainer activity, release cadence, and community response quality. Related source context: LLM Security Testing (Open Source).

Back to Name Jump

Letter L

This letter section contains 1 tools.

LLM Guard

  • Website: https://github.com/protectai/llm-guard
  • Model: Open Source
  • Category: LLM Security Testing (Open Source)
  • Source Lists: Curated List

What it does: LLM Guard is used in llm security testing (open source) programs to support baseline hardening, monitoring integration, and defense-in-depth validation. Source summaries describe it as: Input and output sanitization toolkit for LLM applications to reduce injection and leakage risk.

Operational value: Security teams commonly use this capability to improve consistency between detection, investigation, and response decisions, especially when alerts, evidence collection, and triage ownership are distributed across multiple teams.

Typical deployment pattern: Implementations usually start with scoped pilot coverage, baseline logging/telemetry validation, and explicit runbook mapping so analysts understand when to escalate, contain, or defer.

Selection considerations: As an open-source option, teams usually evaluate maintainer activity, release cadence, and community response quality. Related source context: LLM Security Testing (Open Source).

Back to Name Jump

Letter N

This letter section contains 1 tools.

NeMo Guardrails

  • Website: https://github.com/NVIDIA/NeMo-Guardrails
  • Model: Open Source
  • Category: LLM Security Testing (Open Source)
  • Source Lists: Curated List

What it does: NeMo Guardrails is used in llm security testing (open source) programs to support baseline hardening, monitoring integration, and defense-in-depth validation. Source summaries describe it as: Framework for defining conversational safety and policy guardrails around LLM-driven applications.

Operational value: Security teams commonly use this capability to improve consistency between detection, investigation, and response decisions, especially when alerts, evidence collection, and triage ownership are distributed across multiple teams.

Typical deployment pattern: Implementations usually start with scoped pilot coverage, baseline logging/telemetry validation, and explicit runbook mapping so analysts understand when to escalate, contain, or defer.

Selection considerations: As an open-source option, teams usually evaluate maintainer activity, release cadence, and community response quality. Related source context: LLM Security Testing (Open Source).

Back to Name Jump

Letter P

This letter section contains 2 tools.

promptfoo

  • Website: https://github.com/promptfoo/promptfoo
  • Model: Open Source
  • Category: LLM Security Testing (Open Source)
  • Source Lists: Curated List

What it does: promptfoo is used in llm security testing (open source) programs to support baseline hardening, monitoring integration, and defense-in-depth validation. Source summaries describe it as: Prompt testing and evaluation framework with automated red-team checks and policy assertions.

Operational value: Security teams commonly use this capability to improve consistency between detection, investigation, and response decisions, especially when alerts, evidence collection, and triage ownership are distributed across multiple teams.

Typical deployment pattern: Implementations usually start with scoped pilot coverage, baseline logging/telemetry validation, and explicit runbook mapping so analysts understand when to escalate, contain, or defer.

Selection considerations: As an open-source option, teams usually evaluate maintainer activity, release cadence, and community response quality. Related source context: LLM Security Testing (Open Source).

Back to Name Jump

PyRIT

  • Website: https://github.com/Azure/PyRIT
  • Model: Open Source
  • Category: LLM Security Testing (Open Source)
  • Source Lists: Curated List

What it does: PyRIT is used in llm security testing (open source) programs to support baseline hardening, monitoring integration, and defense-in-depth validation. Source summaries describe it as: Python toolkit for adversarial and red-team style testing of generative AI systems.

Operational value: Security teams commonly use this capability to improve consistency between detection, investigation, and response decisions, especially when alerts, evidence collection, and triage ownership are distributed across multiple teams.

Typical deployment pattern: Implementations usually start with scoped pilot coverage, baseline logging/telemetry validation, and explicit runbook mapping so analysts understand when to escalate, contain, or defer.

Selection considerations: As an open-source option, teams usually evaluate maintainer activity, release cadence, and community response quality. Related source context: LLM Security Testing (Open Source).

Back to Name Jump

Letter R

This letter section contains 1 tools.

Rebuff

  • Website: https://github.com/protectai/rebuff
  • Model: Open Source
  • Category: LLM Security Testing (Open Source)
  • Source Lists: Curated List

What it does: Rebuff is used in llm security testing (open source) programs to support baseline hardening, monitoring integration, and defense-in-depth validation. Source summaries describe it as: Prompt injection detection and mitigation tooling for LLM application security hardening.

Operational value: Security teams commonly use this capability to improve consistency between detection, investigation, and response decisions, especially when alerts, evidence collection, and triage ownership are distributed across multiple teams.

Typical deployment pattern: Implementations usually start with scoped pilot coverage, baseline logging/telemetry validation, and explicit runbook mapping so analysts understand when to escalate, contain, or defer.

Selection considerations: As an open-source option, teams usually evaluate maintainer activity, release cadence, and community response quality. Related source context: LLM Security Testing (Open Source).

Back to Name Jump

Letter T

This letter section contains 1 tools.

TruLens

  • Website: https://github.com/truera/trulens
  • Model: Open Source
  • Category: LLM Security Testing (Open Source)
  • Source Lists: Curated List

What it does: TruLens is used in llm security testing (open source) programs to support baseline hardening, monitoring integration, and defense-in-depth validation. Source summaries describe it as: Observability and evaluation toolkit for LLM applications, including feedback and risk signals.

Operational value: Security teams commonly use this capability to improve consistency between detection, investigation, and response decisions, especially when alerts, evidence collection, and triage ownership are distributed across multiple teams.

Typical deployment pattern: Implementations usually start with scoped pilot coverage, baseline logging/telemetry validation, and explicit runbook mapping so analysts understand when to escalate, contain, or defer.

Selection considerations: As an open-source option, teams usually evaluate maintainer activity, release cadence, and community response quality. Related source context: LLM Security Testing (Open Source).

Back to Name Jump