Illinois Lawmakers Just Passed America’s Strongest AI Safety Bill

Learn to build an AI safety monitoring system that tracks performance metrics and generates compliance reports, similar to what Illinois's new AI safety bill requires of major AI companies.

Introduction

In response to growing concerns about AI safety and accountability, Illinois has passed what experts are calling America's strongest AI safety bill. The legislation requires major AI companies to undergo third-party safety audits. In this tutorial, you'll use Python to build a basic AI safety monitoring system that tracks model performance metrics and generates safety reports, which could serve as a foundation for demonstrating compliance with such regulations.

Prerequisites

Before beginning this tutorial, you should have:

Basic Python programming knowledge
Python 3.7 or higher installed
Familiarity with machine learning concepts
Understanding of data analysis and reporting

You'll also need to install the following Python packages:

pip install pandas scikit-learn numpy

Step-by-Step Instructions

1. Create the AI Safety Monitor Class

The first step is to create a foundational class that will monitor AI model performance. This class will track key safety metrics that regulators might require.

import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score
import json


class AISafetyMonitor:
    def __init__(self, model_name):
        self.model_name = model_name
        self.metrics_history = []
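        self.safety_thresholds = {
            'accuracy': 0.8,    # minimum acceptable value
            'precision': 0.7,   # minimum acceptable value
            'recall': 0.7,      # minimum acceptable value
            'bias_score': 0.1   # maximum acceptable value (lower is better)
        }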

    def calculate_metrics(self, y_true, y_pred):
        """Calculate key safety metrics for AI model performance"""
        accuracy = accuracy_score(y_true, y_pred)
        precision = precision_score(y_true, y_pred, average='weighted', zero_division=0)
        recall = recall_score(y_true, y_pred, average='weighted', zero_division=0)
        
        # Simple bias score calculation
        bias_score = self._calculate_bias_score(y_true, y_pred)
        
        return {
            'accuracy': accuracy,
            'precision': precision,
            'recall': recall,
            'bias_score': bias_score,
            'timestamp': pd.Timestamp.now()
        }

    def _calculate_bias_score(self, y_true, y_pred):
        """Gap between the best and worst per-class recall (0 means no gap)"""
        # Simplified proxy: real bias detection would compare outcomes across
        # protected groups, which this label-only signature cannot see.
        per_class_recall = recall_score(y_true, y_pred, average=None, zero_division=0)
        return float(per_class_recall.max() - per_class_recall.min())

    def add_metrics(self, y_true, y_pred):
        """Add new metrics to the history"""
        metrics = self.calculate_metrics(y_true, y_pred)
        self.metrics_history.append(metrics)
        return metrics

Why this step matters: This creates the core monitoring structure that tracks performance indicators over time. The metrics here (accuracy, precision, recall, and a bias score) are illustrative choices; an actual audit would define its own required indicators and thresholds.
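The bias score in the class above only sees labels and predictions. One commonly used group-aware measure is the demographic parity difference: the largest gap in positive-prediction rate between groups. The sketch below is a minimal standard-library version; the `groups` argument (a per-sample sensitive attribute) and the `positive_label` parameter are assumptions for illustration and are not part of AISafetyMonitor.

```python
from collections import defaultdict


def demographic_parity_difference(y_pred, groups, positive_label=1):
    """Largest gap in positive-prediction rate between any two groups.

    `groups` is a hypothetical per-sample sensitive attribute.
    """
    if len(y_pred) != len(groups):
        raise ValueError("y_pred and groups must have the same length")
    if len(y_pred) == 0:
        raise ValueError("at least one sample is required")

    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        if pred == positive_label:
            positives[group] += 1

    # Positive-prediction rate per group, then the spread between extremes
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


# Made-up example: group "a" receives positives 3 times in 4, group "b" 1 in 4
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_difference(y_pred, groups)
print(gap)
```

A value of 0 means every group receives positive predictions at the same rate; whether any particular gap is acceptable is a policy decision, not something this function can settle.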

2. Implement Compliance Reporting

Next, add the following methods to the AISafetyMonitor class. They generate the kind of compliance report a third-party audit might call for.

    def generate_compliance_report(self):
        """Generate a compliance report for safety audit"""
        if not self.metrics_history:
            return "No metrics recorded yet."
        
        df = pd.DataFrame(self.metrics_history)
        
        # Average only the numeric metrics; the timestamp column is excluded
        avg_metrics = df.mean(numeric_only=True)
        
        # Check compliance
        compliance_status = self._check_compliance(avg_metrics)
        
        report = {
            'model_name': self.model_name,
            'report_date': pd.Timestamp.now().strftime('%Y-%m-%d'),
            'metrics_averages': avg_metrics.to_dict(),
            'compliance_status': compliance_status,
            'recommendations': self._generate_recommendations(avg_metrics)
        }
        
        return report

    def _check_compliance(self, metrics):
        """Check if metrics meet safety thresholds"""
        # bias_score is an upper limit; every other threshold is a minimum
        lower_is_better = {'bias_score'}
        failed_metrics = []
        
        for metric, threshold in self.safety_thresholds.items():
            if metric not in metrics:
                continue
            if metric in lower_is_better:
                failed = metrics[metric] > threshold
            else:
                failed = metrics[metric] < threshold
            if failed:
                failed_metrics.append(metric)
        
        return {
            'status': 'compliant' if not failed_metrics else 'non-compliant',
            'failed_metrics': failed_metrics
        }

    def _generate_recommendations(self, metrics):
        """Generate recommendations based on current performance"""
        recommendations = []
        
        if metrics['accuracy'] < self.safety_thresholds['accuracy']:
            recommendations.append('Consider retraining the model to improve accuracy')
        
        if metrics['bias_score'] > self.safety_thresholds['bias_score']:
            recommendations.append('Investigate potential bias in training data')
            
        return recommendations

Why this step matters: This creates the reporting functionality you would need to show results to third-party auditors. The report structure is illustrative; an actual audit package would follow whatever format the auditor or regulator specifies.
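The thresholds in this tutorial mix two kinds of limits: accuracy, precision, and recall are minimums, while the bias score is a maximum. One way to make that explicit is to store the direction alongside each limit. The following is a sketch under that assumption; the `Threshold` class and the `failed_metrics` helper are hypothetical names, not part of the class above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A limit plus the direction in which a metric is considered healthy."""
    limit: float
    higher_is_better: bool = True

    def is_met(self, value):
        if self.higher_is_better:
            return value >= self.limit
        return value <= self.limit


# Same limits as the tutorial's safety_thresholds, with direction made explicit
THRESHOLDS = {
    "accuracy": Threshold(0.8),
    "precision": Threshold(0.7),
    "recall": Threshold(0.7),
    "bias_score": Threshold(0.1, higher_is_better=False),
}


def failed_metrics(metrics, thresholds=THRESHOLDS):
    """Return the names of metrics that are present and miss their threshold."""
    return [
        name
        for name, threshold in thresholds.items()
        if name in metrics and not threshold.is_met(metrics[name])
    ]


# Made-up averages for illustration
example = {"accuracy": 0.85, "precision": 0.65, "recall": 0.72, "bias_score": 0.25}
failures = failed_metrics(example)
print(failures)  # ['precision', 'bias_score']
```

Keeping the direction in the configuration means a new metric can be added without touching the comparison logic.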

3. Create Sample Data and Test the Monitor

Now let's test our monitor with sample data to see how it works in practice.

def main():
    # Create sample data
    np.random.seed(42)
    y_true = np.random.choice([0, 1, 2], size=1000)
    y_pred = np.random.choice([0, 1, 2], size=1000)
    
    # Initialize monitor
    monitor = AISafetyMonitor('Sample AI Model')
    
    # Add some metrics
    metrics = monitor.add_metrics(y_true, y_pred)
    print("Sample Metrics:", metrics)
    
    # Generate report
    report = monitor.generate_compliance_report()
    print("\nCompliance Report:")
    print(json.dumps(report, indent=2, default=str))

if __name__ == '__main__':
    main()

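Why this step matters: Running the monitor on synthetic data shows how it behaves end to end. Because the labels and predictions here are drawn independently at random from three classes, accuracy should land near one third and the report should come back non-compliant, which confirms that the threshold checks fire.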
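Random predictions only exercise the failing path. To see the passing path as well, it helps to generate predictions with a controlled error rate. The sketch below uses only the standard library; `simulate_predictions` and its `p_correct` parameter are hypothetical helpers for illustration, and the resulting lists could be passed to `monitor.add_metrics` in place of the random arrays.

```python
import random


def simulate_predictions(y_true, labels, p_correct, rng):
    """Return predictions that match the true label with probability p_correct."""
    preds = []
    for true_label in y_true:
        if rng.random() < p_correct:
            preds.append(true_label)
        else:
            # Pick uniformly among the wrong labels
            wrong = [label for label in labels if label != true_label]
            preds.append(rng.choice(wrong))
    return preds


def accuracy(y_true, y_pred):
    """Fraction of positions where prediction and label agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


rng = random.Random(42)
labels = [0, 1, 2]
y_true = [rng.choice(labels) for _ in range(1000)]

# Independent random guesses versus a simulated model that is right ~90% of the time
random_guess = [rng.choice(labels) for _ in range(1000)]
mostly_right = simulate_predictions(y_true, labels, p_correct=0.9, rng=rng)

acc_random = accuracy(y_true, random_guess)
acc_good = accuracy(y_true, mostly_right)
print(f"random guessing: {acc_random:.3f}, simulated model: {acc_good:.3f}")
```

With three balanced classes, random guessing should sit well below the 0.8 accuracy threshold, while the simulated model should clear it.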

4. Add Logging and Alerting

For ongoing monitoring, add two more methods to the AISafetyMonitor class: one checks the latest metrics against the thresholds, and the other appends any resulting alerts to a log file.

    def check_alerts(self):
        """Check if any safety thresholds have been crossed"""
        if not self.metrics_history:
            return []
        
        df = pd.DataFrame(self.metrics_history)
        alerts = []
        
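        # bias_score is an upper limit; every other threshold is a minimum
        lower_is_better = {'bias_score'}
        for metric, threshold in self.safety_thresholds.items():
            if metric not in df.columns:
                continue
            current_value = float(df[metric].iloc[-1])  # Latest value
            if metric in lower_is_better:
                breached = current_value > threshold
                severe = current_value > threshold * 1.2  # mirrors the 0.8 factor
            else:
                breached = current_value < threshold
                severe = current_value < threshold * 0.8
            if breached:
                alerts.append({
                    'metric': metric,
                    'threshold': threshold,
                    'current_value': current_value,
                    'severity': 'high' if severe else 'medium'
                })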
        
        return alerts

    def log_alerts(self):
        """Log any alerts to a file"""
        alerts = self.check_alerts()
        if alerts:
            with open(f'{self.model_name}_alerts.log', 'a') as f:
                for alert in alerts:
                    f.write(f"{pd.Timestamp.now()}: {alert}\n")
        return alerts

Why this step matters: Ongoing monitoring depends on catching threshold breaches as they happen. These methods only record alerts in a log file; notifying stakeholders would require connecting that log to an email, chat, or paging system.
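As an alternative to writing lines to a file by hand, alerts can be routed through Python's standard logging module, which handles timestamps, severity levels, and pluggable destinations. The sketch below assumes alert dictionaries shaped like the ones `check_alerts` returns; `build_alert_logger`, `emit_alerts`, and the severity-to-level mapping are hypothetical names chosen for illustration.

```python
import io
import logging

# Hypothetical mapping from the tutorial's severity strings to logging levels
SEVERITY_TO_LEVEL = {"medium": logging.WARNING, "high": logging.ERROR}


def build_alert_logger(name, handler):
    """Return a logger that sends timestamped alert lines to `handler`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.WARNING)
    logger.propagate = False  # keep alerts out of the root logger's output
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


def emit_alerts(logger, alerts):
    """Log each alert at the level matching its severity; return the count."""
    for alert in alerts:
        level = SEVERITY_TO_LEVEL.get(alert["severity"], logging.WARNING)
        logger.log(
            level,
            "metric=%s value=%.4f threshold=%.4f",
            alert["metric"],
            alert["current_value"],
            alert["threshold"],
        )
    return len(alerts)


# Demo: capture output in memory so the example leaves no files behind
stream = io.StringIO()
alert_logger = build_alert_logger("ai_safety_alerts", logging.StreamHandler(stream))
sample_alerts = [
    {"metric": "accuracy", "threshold": 0.8, "current_value": 0.55, "severity": "high"},
    {"metric": "recall", "threshold": 0.7, "current_value": 0.66, "severity": "medium"},
]
count = emit_alerts(alert_logger, sample_alerts)
log_output = stream.getvalue()
print(log_output)
```

Swapping the in-memory stream for `logging.FileHandler(path)` would write the same lines to disk, and other handlers from the standard library could forward them elsewhere.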

5. Export Report to PDF (Optional Enhancement)

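For regulatory submissions, reports often need to be exported in specific formats. The method below, another addition to the AISafetyMonitor class, writes the report to a PDF using the optional fpdf package.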

    def export_report_to_pdf(self, filename='ai_safety_report.pdf'):
        """Export compliance report to PDF"""
        try:
            from fpdf import FPDF
            
            pdf = FPDF()
            pdf.add_page()
            pdf.set_font('Arial', 'B', 16)
            
            report = self.generate_compliance_report()
            if isinstance(report, str):
                # No metrics recorded yet, so there is nothing to export
                return report
            
            pdf.cell(0, 10, f'AI Safety Compliance Report - {self.model_name}', 0, 1)
            pdf.cell(0, 10, f'Report Date: {report["report_date"]}', 0, 1)
            pdf.ln(10)
            
            pdf.set_font('Arial', 'B', 12)
            pdf.cell(0, 10, 'Metrics Averages:', 0, 1)
            pdf.set_font('Arial', '', 10)
            
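            for key, value in report['metrics_averages'].items():
                # Format numbers to four decimals; fall back to str otherwise
                text = f'{value:.4f}' if isinstance(value, (int, float)) else str(value)
                pdf.cell(0, 8, f'{key}: {text}', 0, 1)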
            
            pdf.ln(10)
            pdf.set_font('Arial', 'B', 12)
            pdf.cell(0, 10, 'Compliance Status:', 0, 1)
            pdf.set_font('Arial', '', 10)
            pdf.cell(0, 8, f'Status: {report["compliance_status"]["status"]}', 0, 1)
            
            if report['compliance_status']['failed_metrics']:
                pdf.cell(0, 8, 'Failed Metrics:', 0, 1)
                for metric in report['compliance_status']['failed_metrics']:
                    pdf.cell(0, 8, f'- {metric}', 0, 1)
            
            pdf.output(filename)
            return f"Report exported to {filename}"
        except ImportError:
            return "PDF export requires fpdf library. Install with: pip install fpdf"

Why this step matters: Regulatory compliance often requires formal documentation in specific formats. This enhancement shows how you could easily generate professional reports for auditors.
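Where a machine-readable format is acceptable, the same report can be exported as JSON with no extra dependencies. The following is a minimal sketch; `report_to_json` and `export_report_to_json` are hypothetical helpers, and the sample report uses made-up values shaped like the output of `generate_compliance_report`.

```python
import json
import os
import tempfile
from datetime import date


def report_to_json(report):
    """Serialize a report dict; non-JSON types such as dates become strings."""
    return json.dumps(report, indent=2, sort_keys=True, default=str)


def export_report_to_json(report, filename):
    """Write the serialized report to `filename` and return the path."""
    with open(filename, "w", encoding="utf-8") as f:
        f.write(report_to_json(report))
    return filename


# Made-up report for illustration only
sample_report = {
    "model_name": "Sample AI Model",
    "report_date": date(2025, 1, 1),  # placeholder date, serialized via default=str
    "metrics_averages": {"accuracy": 0.34, "precision": 0.34, "recall": 0.34},
    "compliance_status": {
        "status": "non-compliant",
        "failed_metrics": ["accuracy", "precision", "recall"],
    },
    "recommendations": ["Consider retraining the model to improve accuracy"],
}

# Write to a temporary directory, then read the file back to confirm it parses
with tempfile.TemporaryDirectory() as tmp_dir:
    path = export_report_to_json(
        sample_report, os.path.join(tmp_dir, "ai_safety_report.json")
    )
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded["compliance_status"]["status"])
```

Sorting the keys keeps successive reports easy to compare with a text diff.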

Summary

This tutorial demonstrated how to build a basic AI safety monitoring system that could help companies comply with legislation like Illinois's new AI safety bill. The system tracks key performance metrics, generates compliance reports, and provides alerting capabilities that would be essential for maintaining third-party audit readiness.

The components you've built include:

A monitoring class that tracks AI model performance
Compliance reporting functionality
Alerting mechanisms for safety threshold breaches
Report export capabilities

While this is a simplified implementation, it provides a foundation that could be expanded with more sophisticated bias detection algorithms, additional metrics, and integration with actual AI model outputs for real-world compliance monitoring.