EthicsComplianceVerifier

Overview

EthicsComplianceVerifier.circom is a zk-SNARK circuit that proves an AI agent’s training and evaluation meet predefined ethical requirements (bias limits, fairness scores, and safety thresholds) without revealing any private metrics, data, or intermediate test results. It is designed to plug directly into the ZKAgentVerificationOrchestrator pipeline but can also operate as an independent verifier in any zero-knowledge AI-audit workflow.


Objectives

The circuit enables a prover to demonstrate that:

  1. Bias: No bias-test score exceeds max_bias_threshold (a minimal comparator sketch follows this list).

  2. Fairness: Every fairness-metric score is at least min_fairness_score.

  3. Safety: The total number of harmful-content detections does not exceed max_harmful_rate.

  4. All three conditions are cryptographically linked to a prior ethics commitment and the agent identifier.
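
As a minimal illustration of how a single threshold condition like (1) can be enforced in circom 2.x, the sketch below checks one bias score against the public bound using circomlib's LessEqThan. The template name SingleBiasCheck and the 10-bit comparator width (sufficient for scores in 0-1000) are illustrative assumptions, not part of the actual circuit.

pragma circom 2.0.0;

include "circomlib/circuits/comparators.circom";

// Hypothetical stand-alone check: ok = 1 iff bias_score <= max_bias_threshold.
template SingleBiasCheck() {
    signal input bias_score;          // private score in 0-1000
    signal input max_bias_threshold;  // public bound
    signal output ok;

    // LessEqThan(10) compares two values below 2^10 = 1024;
    // a real circuit must also range-check its inputs to 10 bits.
    component le = LessEqThan(10);
    le.in[0] <== bias_score;
    le.in[1] <== max_bias_threshold;
    ok <== le.out;
}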


Inputs

Private Inputs

| Group | Name | Description |
| --- | --- | --- |
| Bias tests | bias_test_results[n_bias_checks] | Integer scores 0-1000 per bias check |
| Fairness tests | fairness_scores[n_ethics_tests] | Integer scores 0-1000 per fairness evaluation |
| Safety flags | harmful_content_flags[n_ethics_tests] | 0 = safe, 1 = harmful detected |
| Dataset hash | ethics_training_data_hash | Poseidon hash of the ethics-specific training set |
| Red-team data | red_team_results[n_ethics_tests] | Placeholder, currently unused in constraints |

Public Inputs

| Name | Description |
| --- | --- |
| max_bias_threshold | Upper bound for any bias score |
| min_fairness_score | Lower bound for any fairness score |
| max_harmful_rate | Maximum allowed total harmful detections |
| ethics_commitment_hash | Commitment published before training/validation |
| agent_id | Unique identifier for the agent |

Public Outputs

| Output | Meaning |
| --- | --- |
| ethics_verified | 1 if all ethics criteria pass; otherwise 0 |
| bias_compliance | 1 if every bias test satisfies the threshold; otherwise 0 |
| fairness_compliance | 1 if every fairness test satisfies the threshold; otherwise 0 |
| safety_compliance | 1 if total harmful detections ≤ max_harmful_rate; otherwise 0 |
| ethics_proof_hash | Poseidon hash binding the compliance results to the commitments |
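
Taken together, the tables above imply the following template interface. The sketch below is a reconstruction under an assumed circom 2.x parameterization (n_bias_checks, n_ethics_tests); the signal names follow the tables, but the ordering and pragma version are assumptions, not the circuit's verbatim source.

pragma circom 2.0.0;

// Interface sketch implied by the input/output tables (assumed, not verbatim).
template EthicsComplianceVerifier(n_bias_checks, n_ethics_tests) {
    // Private inputs
    signal input bias_test_results[n_bias_checks];
    signal input fairness_scores[n_ethics_tests];
    signal input harmful_content_flags[n_ethics_tests];
    signal input ethics_training_data_hash;
    signal input red_team_results[n_ethics_tests];  // placeholder, currently unconstrained

    // Public inputs (marked public on the main component, see Compilation)
    signal input max_bias_threshold;
    signal input min_fairness_score;
    signal input max_harmful_rate;
    signal input ethics_commitment_hash;
    signal input agent_id;

    // Public outputs
    signal output ethics_verified;
    signal output bias_compliance;
    signal output fairness_compliance;
    signal output safety_compliance;
    signal output ethics_proof_hash;

    // ... constraints (see Circuit Logic) ...
}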


Circuit Logic

  1. Bias Verification: For each bias score, LessEqThan enforces bias_test_results[i] ≤ max_bias_threshold. The product of all comparator outputs yields bias_compliance.

  2. Fairness Verification: For each fairness score, GreaterEqThan enforces fairness_scores[i] ≥ min_fairness_score. The product of all comparator outputs yields fairness_compliance.

  3. Safety Verification: HarmfulContentCounter sums the Boolean flags harmful_content_flags[i]. LessEqThan then ensures the total ≤ max_harmful_rate, producing safety_compliance.

  4. Aggregate Result

    ethics_verified = bias_compliance · fairness_compliance · safety_compliance

    Implemented as two quadratic constraints to remain R1CS-valid.

  5. Proof Hash

    ethics_proof_hash = Poseidon(
        agent_id,
        bias_compliance,
        safety_compliance,
        ethics_commitment_hash,
        ethics_training_data_hash
    )

    Used by higher-level contracts to reference this proof succinctly. A constraint-level sketch of steps 1-5 follows this list.
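
The fragment below sketches how steps 1-5 could be written with circomlib's comparators and Poseidon inside the template body from the interface sketch above. It is an illustrative reconstruction, not the circuit's source: the 10-bit comparator width, the accumulator signal names, and the inlined running sum standing in for HarmfulContentCounter are all assumptions.

// At the top of the file:
include "circomlib/circuits/comparators.circom";
include "circomlib/circuits/poseidon.circom";

// --- inside the EthicsComplianceVerifier(n_bias_checks, n_ethics_tests) body (sketch) ---

// 1. Bias: each score must satisfy bias_test_results[i] <= max_bias_threshold;
//    multiplying the 0/1 comparator outputs gives bias_compliance.
component bias_le[n_bias_checks];
signal bias_acc[n_bias_checks + 1];
bias_acc[0] <== 1;
for (var i = 0; i < n_bias_checks; i++) {
    bias_le[i] = LessEqThan(10);               // 10 bits cover the 0-1000 range (assumption)
    bias_le[i].in[0] <== bias_test_results[i];
    bias_le[i].in[1] <== max_bias_threshold;
    bias_acc[i + 1] <== bias_acc[i] * bias_le[i].out;
}
bias_compliance <== bias_acc[n_bias_checks];

// 2. Fairness: each score must satisfy fairness_scores[i] >= min_fairness_score.
component fair_ge[n_ethics_tests];
signal fair_acc[n_ethics_tests + 1];
fair_acc[0] <== 1;
for (var i = 0; i < n_ethics_tests; i++) {
    fair_ge[i] = GreaterEqThan(10);
    fair_ge[i].in[0] <== fairness_scores[i];
    fair_ge[i].in[1] <== min_fairness_score;
    fair_acc[i + 1] <== fair_acc[i] * fair_ge[i].out;
}
fairness_compliance <== fair_acc[n_ethics_tests];

// 3. Safety: sum the 0/1 flags and compare the total with max_harmful_rate.
signal harm_sum[n_ethics_tests + 1];
harm_sum[0] <== 0;
for (var i = 0; i < n_ethics_tests; i++) {
    harm_sum[i + 1] <== harm_sum[i] + harmful_content_flags[i];
}
component harm_le = LessEqThan(10);
harm_le.in[0] <== harm_sum[n_ethics_tests];
harm_le.in[1] <== max_harmful_rate;
safety_compliance <== harm_le.out;

// 4. Aggregate: product of the three flags, split into two quadratic constraints.
signal bias_and_fair;
bias_and_fair <== bias_compliance * fairness_compliance;
ethics_verified <== bias_and_fair * safety_compliance;

// 5. Proof hash binding the result to the agent and the prior commitment.
component ph = Poseidon(5);
ph.inputs[0] <== agent_id;
ph.inputs[1] <== bias_compliance;
ph.inputs[2] <== safety_compliance;
ph.inputs[3] <== ethics_commitment_hash;
ph.inputs[4] <== ethics_training_data_hash;
ethics_proof_hash <== ph.out;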


Compilation

# prerequisites: circom 2.x, snarkjs, pot16_final.ptau
circom EthicsComplianceVerifier.circom \
      --r1cs --wasm --sym --c

snarkjs groth16 setup EthicsComplianceVerifier.r1cs \
                 pot16_final.ptau \
                 ecv_final.zkey

snarkjs zkey export verificationkey \
                 ecv_final.zkey \
                 ecv_verification_key.json
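
Compiling with circom 2.x also requires a main component declaration that instantiates the template and lists which inputs are public. A minimal example is shown below; the array sizes 8 and 16 are placeholder values for n_bias_checks and n_ethics_tests, not values taken from the project.

component main { public [ max_bias_threshold,
                          min_fairness_score,
                          max_harmful_rate,
                          ethics_commitment_hash,
                          agent_id ] } = EthicsComplianceVerifier(8, 16);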

Proof Generation Example

node EthicsComplianceVerifier_js/generate_witness.js \
     EthicsComplianceVerifier_js/EthicsComplianceVerifier.wasm \
     input.json \
     witness.wtns

snarkjs groth16 prove ecv_final.zkey \
                     witness.wtns \
                     proof.json public.json

snarkjs groth16 verify ecv_verification_key.json \
                     public.json proof.json

public.json contains the circuit’s public signals (the five outputs listed above, followed by the public inputs), ready for on-chain submission.


Security

  • Private metrics remain local; only binary pass/fail and hashes are public.

  • Poseidon is a SNARK-friendly hash, keeping the constraint count low over BN128.

  • If a single test fails, ethics_verified collapses to 0, preventing partial disclosure attacks.

  • The Groth16 trusted setup must be performed securely, ideally as a multi-party (MPC) ceremony.


Status

  • Stable for integer bias, fairness, and safety metrics.

  • Planned extensions: incorporate red_team_results constraints; allow dynamic weighting of fairness scores; add differential-privacy proofs for training data.

Last updated