Introduction
In response to growing concerns about AI safety, particularly regarding young users, OpenAI has open-sourced its teen safety policies. This tutorial will guide you through creating a basic safety monitoring system for AI applications, similar to what OpenAI has shared. You'll learn how to implement safeguards that help protect vulnerable users, especially teenagers, when they interact with AI systems.
This is a beginner-friendly tutorial that will teach you to build a simple safety filter that can detect potentially harmful content and trigger appropriate warnings or responses.
Prerequisites
To follow this tutorial, you'll need:
- A computer with internet access
- Basic understanding of Python programming
- Python 3.8 or higher installed on your system (recent NLTK releases require at least Python 3.8)
- Access to a Python IDE or text editor (like VS Code or PyCharm)
- Basic understanding of how AI models work (no advanced knowledge required)
Step-by-Step Instructions
1. Set Up Your Python Environment
First, we need to create a Python environment for our project. Open your terminal or command prompt and create a new directory for this project:
mkdir ai_safety_monitor
cd ai_safety_monitor
Next, create a virtual environment to keep our project dependencies isolated:
python -m venv safety_env
source safety_env/bin/activate # On Windows use: safety_env\Scripts\activate
Now, install the required packages:
pip install nltk
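To confirm the installation worked, you can run a quick check from the command line (the version number you see may differ):
python -c "import nltk; print(nltk.__version__)"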
2. Create the Safety Policy Framework
Open your code editor and create a new file called safety_monitor.py. This file will contain our core safety monitoring logic:
import re

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# Download the VADER lexicon used by the sentiment analyzer
nltk.download('vader_lexicon', quiet=True)

# Define safety policies
SAFETY_POLICIES = {
    'harmful_keywords': [
        'kill', 'die', 'suicide', 'hurt', 'harm', 'death', 'kill myself',
        'end my life', 'want to die', 'self-harm', 'hurt myself'
    ],
    'sensitive_topics': [
        'mental health', 'depression', 'anxiety', 'trauma', 'abuse'
    ],
    # Reserved for a future age-verification step (see Step 6)
    'age_restriction': 18
}

# Initialize sentiment analyzer
sia = SentimentIntensityAnalyzer()


class SafetyMonitor:
    def __init__(self):
        self.violations = []

    def check_text_safety(self, text):
        """Check if text violates safety policies; returns True when safe."""
        lowered = text.lower()

        # Check for harmful keywords. The word-boundary pattern (\b) avoids
        # false positives such as 'kill' matching 'skill' or 'die' matching
        # 'diet'. We stop at the first violation found.
        for keyword in SAFETY_POLICIES['harmful_keywords']:
            if re.search(r'\b' + re.escape(keyword) + r'\b', lowered):
                self.violations.append(f"Harmful keyword detected: {keyword}")
                return False

        # Check for sensitive topics
        for topic in SAFETY_POLICIES['sensitive_topics']:
            if topic in lowered:
                self.violations.append(f"Sensitive topic detected: {topic}")
                return False

        # Analyze sentiment; VADER's compound score ranges from -1 (most
        # negative) to +1 (most positive)
        sentiment = sia.polarity_scores(text)
        if sentiment['compound'] < -0.5:  # Strongly negative sentiment
            self.violations.append("Very negative sentiment detected")
            return False

        return True

    def get_violations(self):
        return self.violations

    def reset_violations(self):
        self.violations = []
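Before building the interactive loop in the next step, you can sanity-check the class directly. This quick snippet (run at the bottom of safety_monitor.py, or in a Python shell after importing the module) exercises one safe and one unsafe input:

monitor = SafetyMonitor()

print(monitor.check_text_safety("Hello, how are you?"))    # True: nothing triggered
print(monitor.check_text_safety("I want to hurt myself"))  # False: harmful keyword
print(monitor.get_violations())  # ['Harmful keyword detected: hurt']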
3. Implement User Input Handling
Now we need to create a simple way to test our safety monitor with user input:
def main():
    monitor = SafetyMonitor()
    print("AI Safety Monitor - Type 'quit' to exit")
    print("This system checks for potentially harmful content.")
    print("\n" + "="*50)

    while True:
        user_input = input("\nEnter text to check: ")
        if user_input.lower() == 'quit':
            break

        # Check safety
        is_safe = monitor.check_text_safety(user_input)

        if is_safe:
            print("✅ Content is safe for all users.")
        else:
            print("⚠️ Potential safety violation detected!")
            violations = monitor.get_violations()
            for violation in violations:
                print(f"  - {violation}")
            print("  Suggestion: Redirect to mental health resources or counselor.")

        # Reset violations for next check
        monitor.reset_violations()

if __name__ == "__main__":
    main()
4. Run the Safety Monitor
Save your safety_monitor.py file and run it in your terminal:
python safety_monitor.py
You should see the program start and prompt you for input. Try entering different types of text:
- Normal text like "Hello, how are you?"
- Potentially harmful text like "I want to die"
- Sensitive topics like "I'm feeling depressed"
5. Test the Safety Policies
Try running some tests to see how your safety monitor responds:
# Test 1: Normal text
Input: "How are you today?"
Output: ✅ Content is safe for all users.
# Test 2: Harmful keyword
Input: "I want to kill myself"
Output: ⚠️ Potential safety violation detected!
- Harmful keyword detected: kill
Suggestion: Redirect to mental health resources or counselor.
# Test 3: Sensitive topic
Input: "I'm having anxiety problems"
Output: ⚠️ Potential safety violation detected!
- Sensitive topic detected: anxiety
Suggestion: Redirect to mental health resources or counselor.
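Manual testing works for exploration, but automated tests make the policy checks repeatable. Here is a minimal sketch using Python's built-in unittest module (assuming it lives in a file named test_safety_monitor.py next to safety_monitor.py):

import unittest
from safety_monitor import SafetyMonitor

class TestSafetyMonitor(unittest.TestCase):
    def setUp(self):
        self.monitor = SafetyMonitor()

    def test_normal_text_is_safe(self):
        self.assertTrue(self.monitor.check_text_safety("How are you today?"))
        self.assertEqual(self.monitor.get_violations(), [])

    def test_harmful_keyword_is_flagged(self):
        self.assertFalse(self.monitor.check_text_safety("I want to kill myself"))
        self.assertIn("Harmful keyword detected: kill", self.monitor.get_violations())

    def test_sensitive_topic_is_flagged(self):
        self.assertFalse(self.monitor.check_text_safety("I'm having anxiety problems"))
        self.assertIn("Sensitive topic detected: anxiety", self.monitor.get_violations())

if __name__ == "__main__":
    unittest.main()

Run the tests with python test_safety_monitor.py; all three should pass.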
6. Enhance the Safety Monitor
For a more advanced version, you can enhance the safety monitor by adding:
- Integration with external APIs for mental health resources
- More sophisticated natural language processing
- User age verification systems
- Logging and reporting features (a minimal sketch follows below)
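For example, the logging enhancement can start as small as this, a sketch using Python's standard logging module (the log file name and format are arbitrary choices):

import logging

# Write safety events to a local file; in production you would likely
# send these to a centralized logging or alerting system instead.
logging.basicConfig(
    filename="safety_events.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

def log_violations(violations):
    """Record each detected violation for later review."""
    for violation in violations:
        logging.warning(violation)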
Here's an example of how to add a basic resource redirect. Add this method to the SafetyMonitor class:

    def suggest_resources(self):
        """Suggest mental health resources for users"""
        resources = [
            "988 Suicide & Crisis Lifeline: call or text 988",
            "Crisis Text Line: Text HOME to 741741",
            "Crisis Chat: https://www.crisischat.org"
        ]
        return resources
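With suggest_resources() in place, you can wire it into the else branch of main() so flagged users see the resources immediately (a sketch of the updated branch):

        else:
            print("⚠️ Potential safety violation detected!")
            for violation in monitor.get_violations():
                print(f"  - {violation}")
            print("  Suggestion: Redirect to mental health resources or counselor.")
            for resource in monitor.suggest_resources():
                print(f"    {resource}")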
Summary
In this tutorial, you've created a basic AI safety monitoring system that mimics some of the safety policies OpenAI has open-sourced. This system helps detect potentially harmful content and suggests appropriate actions when safety violations are found.
The key concepts covered include:
- Creating a safety policy framework with keyword detection
- Using sentiment analysis to identify negative content
- Building a user-friendly interface for testing
- Implementing basic safety violation handling
While this is a simplified version, it demonstrates the core principles behind AI safety monitoring. Real-world applications would require more sophisticated approaches, including integration with actual AI models, more comprehensive databases of harmful content, and proper user verification systems.
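As a sketch of that integration point, the monitor can act as a gate in front of a model call. Here generate_ai_response() is a hypothetical stand-in for whatever model API your application uses, and suggest_resources() is the method added in Step 6:

def safe_chat(monitor, user_message):
    """Screen a user's message before it reaches the AI model."""
    if not monitor.check_text_safety(user_message):
        monitor.reset_violations()
        # Route the user to support resources instead of the model
        resources = "\n".join(monitor.suggest_resources())
        return ("It sounds like you may be going through something difficult.\n"
                "Please consider reaching out to one of these resources:\n" + resources)
    return generate_ai_response(user_message)  # hypothetical model call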
This foundation can be expanded upon to build more robust safety systems that protect vulnerable users while maintaining the utility of AI applications.