Perplexity AI sued over alleged data sharing with Meta and Google
Tech · Tutorial · Intermediate


April 1, 2026 · 1 view · 6 min read

Learn to build a privacy-focused data handling system that monitors and logs data sharing activities while maintaining user privacy through encryption and access controls.

Introduction

In this tutorial, we'll explore how to build a privacy-focused data handling system that could help prevent the kind of data sharing issues that Perplexity AI is facing. We'll create a Python-based system that monitors and logs data sharing activities while maintaining user privacy through encryption and access controls. This system demonstrates how developers can implement robust privacy safeguards in AI applications.

Prerequisites

  • Python 3.8 or higher installed
  • Basic understanding of Python programming
  • Familiarity with AI/ML concepts and data handling
  • Knowledge of encryption and security principles
  • Installed package: pycryptodome (sqlite3 and json ship with the Python standard library)

Step-by-Step Instructions

Step 1: Set Up the Project Structure

First, we'll create the basic project structure for our privacy monitoring system. This will include directories for configuration, data storage, and logging.

1.1 Create Project Directory

mkdir privacy_monitor
mkdir privacy_monitor/src
mkdir privacy_monitor/data
mkdir privacy_monitor/logs

1.2 Initialize Python Package

touch privacy_monitor/__init__.py

Why: This creates a proper Python package structure that will help organize our code as it grows.

Step 2: Create Configuration Management

Next, we'll set up configuration management to handle different environments and security settings.

2.1 Create config.py

import os
from dataclasses import dataclass

@dataclass
class Config:
    # Security settings
    encryption_key: str = os.getenv('ENCRYPTION_KEY', 'default_key_for_dev')
    log_level: str = os.getenv('LOG_LEVEL', 'INFO')
    
    # Data sharing policies
    allow_third_party_sharing: bool = False
    share_analytics: bool = False
    
    # Database settings
    db_path: str = 'data/privacy_monitor.db'
    
    # Logging settings
    log_file: str = 'logs/privacy_monitor.log'

# Initialize configuration
config = Config()

2.2 Create environment file

echo 'ENCRYPTION_KEY=your_secure_encryption_key_here' > .env

Why: Separating configuration from code lets us manage different environments (development, production) without code changes, and keeps sensitive information out of version control (remember to add .env to .gitignore). Note that os.getenv only sees variables that have been exported into the process environment, so the .env file must be loaded before config.py is imported.
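Since Config reads values with os.getenv, the variables in .env must actually reach the process environment. The usual choice is the python-dotenv package; as a dependency-free sketch, a minimal loader (the load_env helper below is hypothetical, not one of the project files) could look like this:

```python
import os

def load_env(path: str = '.env') -> None:
    """Minimal .env loader: put KEY=VALUE lines into os.environ.

    Skips blank lines and comments, and never overrides variables
    that are already set, so real environment values win over the file.
    """
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#') or '=' not in line:
                continue
            key, _, value = line.partition('=')
            os.environ.setdefault(key.strip(), value.strip())
```

Call load_env() before config.py is imported: the dataclass field defaults call os.getenv at class-definition time, so values set afterwards are not picked up.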

Step 3: Implement Data Encryption Module

Now we'll create an encryption module to protect user data before any sharing occurs.

3.1 Create encryption.py

from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
from Crypto.Protocol.KDF import PBKDF2
import base64
import json

# AES encryption with key derivation

class DataEncryption:
    def __init__(self, password: str):
        self.password = password.encode('utf-8')
        
    def encrypt_data(self, data: dict) -> str:
        # Generate key from password
        salt = get_random_bytes(16)
        # Use an explicit iteration count; PyCryptodome's default (1000) is low
        key = PBKDF2(self.password, salt, dkLen=32, count=100_000)
        
        # Encrypt data
        cipher = AES.new(key, AES.MODE_EAX)
        ciphertext, tag = cipher.encrypt_and_digest(json.dumps(data).encode('utf-8'))
        
        # Combine salt, nonce, tag, and ciphertext
        encrypted_data = {
            'salt': base64.b64encode(salt).decode('utf-8'),
            'nonce': base64.b64encode(cipher.nonce).decode('utf-8'),
            'tag': base64.b64encode(tag).decode('utf-8'),
            'ciphertext': base64.b64encode(ciphertext).decode('utf-8')
        }
        
        return json.dumps(encrypted_data)
    
    def decrypt_data(self, encrypted_data: str) -> dict:
        data = json.loads(encrypted_data)
        
        # Decode base64 data
        salt = base64.b64decode(data['salt'])
        nonce = base64.b64decode(data['nonce'])
        tag = base64.b64decode(data['tag'])
        ciphertext = base64.b64decode(data['ciphertext'])
        
        # Derive the key with the same iteration count used for encryption
        key = PBKDF2(self.password, salt, dkLen=32, count=100_000)
        
        # Decrypt
        cipher = AES.new(key, AES.MODE_EAX, nonce=nonce)
        plaintext = cipher.decrypt_and_verify(ciphertext, tag)
        
        return json.loads(plaintext.decode('utf-8'))

Why: This encryption module ensures that even if data is intercepted, it remains unreadable without the proper key, protecting user privacy from unauthorized access.
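As a quick, dependency-free sanity check of the key-derivation step, the standard library's hashlib.pbkdf2_hmac does the same job as pycryptodome's PBKDF2. This sketch uses SHA-256 with an illustrative iteration count; the password is a placeholder:

```python
import hashlib
import os

# Derive a 32-byte AES key from a password and a random salt,
# mirroring the PBKDF2 call in encryption.py with stdlib tools.
password = b'correct horse battery staple'  # placeholder, not a real secret
salt = os.urandom(16)
key = hashlib.pbkdf2_hmac('sha256', password, salt, 100_000, dklen=32)

# Same password + same salt -> same key (deterministic),
# which is what lets decrypt_data re-derive the key from the stored salt.
key_again = hashlib.pbkdf2_hmac('sha256', password, salt, 100_000, dklen=32)
```

Storing the salt alongside the ciphertext, as the module above does, is what makes this round trip possible without ever persisting the key itself.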

Step 4: Build Data Sharing Monitor

Let's create the core monitoring system that tracks when data sharing occurs.

4.1 Create monitor.py

import sqlite3
import datetime

from config import config

class DataSharingMonitor:
    def __init__(self):
        self.db_path = config.db_path
        self._init_database()
        
    def _init_database(self):
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS sharing_logs (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                timestamp TEXT,
                target_service TEXT,
                data_type TEXT,
                user_id TEXT,
                is_encrypted BOOLEAN,
                sharing_policy TEXT
            )
        ''')
        
        conn.commit()
        conn.close()
        
    def log_sharing_activity(self, target_service: str, data_type: str, user_id: str, 
                           is_encrypted: bool = True, sharing_policy: str = 'default'):
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        
        cursor.execute('''
            INSERT INTO sharing_logs (timestamp, target_service, data_type, user_id, 
                                    is_encrypted, sharing_policy)
            VALUES (?, ?, ?, ?, ?, ?)
        ''', (
            datetime.datetime.now(datetime.timezone.utc).isoformat(),
            target_service,
            data_type,
            user_id,
            is_encrypted,
            sharing_policy
        ))
        
        conn.commit()
        conn.close()
        
        print(f"Sharing activity logged for {target_service}")
        
    def get_sharing_history(self, user_id: str = None) -> list:
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        
        if user_id:
            cursor.execute('''
                SELECT * FROM sharing_logs WHERE user_id = ?
                ORDER BY timestamp DESC
            ''', (user_id,))
        else:
            cursor.execute('''
                SELECT * FROM sharing_logs ORDER BY timestamp DESC
            ''')
            
        results = cursor.fetchall()
        conn.close()
        
        return results

Why: This monitor tracks all data sharing activities, creating an audit trail that can be used to verify compliance with privacy policies and detect unauthorized sharing attempts.
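The logging-and-query pattern can be exercised on its own with an in-memory database. This is a self-contained sketch of what DataSharingMonitor does; the service and user names are illustrative:

```python
import sqlite3
import datetime

# Same schema as monitor.py, but in-memory so nothing touches disk.
conn = sqlite3.connect(':memory:')
conn.execute('''
    CREATE TABLE sharing_logs (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        timestamp TEXT,
        target_service TEXT,
        data_type TEXT,
        user_id TEXT,
        is_encrypted BOOLEAN,
        sharing_policy TEXT
    )
''')

# Log one sharing event, as log_sharing_activity would.
conn.execute(
    'INSERT INTO sharing_logs (timestamp, target_service, data_type, '
    'user_id, is_encrypted, sharing_policy) VALUES (?, ?, ?, ?, ?, ?)',
    (datetime.datetime.now(datetime.timezone.utc).isoformat(),
     'analytics_provider', 'usage_stats', 'user-123', True, 'restricted'),
)
conn.commit()

# Query the audit trail for that user, newest first.
rows = conn.execute(
    'SELECT target_service, data_type, sharing_policy FROM sharing_logs '
    'WHERE user_id = ? ORDER BY timestamp DESC', ('user-123',)
).fetchall()
```

Selecting named columns instead of `SELECT *` also sidesteps the positional-index guessing that the test script in Step 6 has to do.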

Step 5: Create User Data Handler

Now we'll implement a system that manages user data with privacy controls.

5.1 Create user_handler.py

from encryption import DataEncryption
from monitor import DataSharingMonitor
from config import config
import uuid

class UserDataHandler:
    def __init__(self):
        self.encryption = DataEncryption(config.encryption_key)
        self.monitor = DataSharingMonitor()
        self.users = {}  # in-memory store of encrypted user profiles
        
    def create_user(self, user_data: dict) -> str:
        user_id = str(uuid.uuid4())
        # Store the profile encrypted; the plaintext is never kept
        self.users[user_id] = self.encryption.encrypt_data(user_data)
        
        # Log user creation
        self.monitor.log_sharing_activity(
            target_service='user_creation',
            data_type='user_profile',
            user_id=user_id,
            is_encrypted=True,
            sharing_policy='user_consent'
        )
        
        return user_id
        
    def share_data(self, user_id: str, target_service: str, data: dict, 
                  allow_sharing: bool = False) -> bool:
        # Check sharing policy
        if not allow_sharing and not config.allow_third_party_sharing:
            print(f"Sharing to {target_service} blocked by policy")
            return False
            
        # Encrypt data before sharing; in a real system the encrypted
        # payload would be transmitted to the target service here
        encrypted_data = self.encryption.encrypt_data(data)
        
        # Log the sharing activity
        self.monitor.log_sharing_activity(
            target_service=target_service,
            data_type='chat_data',
            user_id=user_id,
            is_encrypted=True,
            sharing_policy='third_party_consent' if allow_sharing else 'restricted'
        )
        
        print(f"Data shared with {target_service} (encrypted)")
        return True
        
    def get_user_history(self, user_id: str) -> list:
        return self.monitor.get_sharing_history(user_id)

Why: This handler ensures that all user data is properly encrypted before sharing and that sharing activities are logged, providing transparency and control over user data.

Step 6: Test the System

Finally, let's create a test script to verify our system works correctly.

6.1 Create test_system.py

from user_handler import UserDataHandler

# Initialize system
handler = UserDataHandler()

# Test user creation
user_data = {
    'name': 'John Doe',
    'email': '[email protected]',
    'preferences': {'notifications': True, 'analytics': False}
}

user_id = handler.create_user(user_data)
print(f"Created user with ID: {user_id}")

# Test data sharing with policy restrictions
chat_data = {
    'message': 'Hello, how are you?',
    'timestamp': '2025-02-01T10:00:00Z'
}

# This should be blocked by default
handler.share_data(user_id, 'Meta', chat_data, allow_sharing=False)

# This should be allowed
handler.share_data(user_id, 'Google', chat_data, allow_sharing=True)

# Check sharing history
history = handler.get_user_history(user_id)
print("\nSharing History:")
for record in history:
    # Columns: id, timestamp, target_service, data_type, user_id,
    # is_encrypted, sharing_policy
    print(f"{record[1]} - {record[2]} - {record[6]}")

Why: This test script validates that our privacy monitoring system correctly enforces policies, encrypts data, and logs all sharing activities.

Summary

In this tutorial, we've built a comprehensive privacy monitoring system that could help prevent the kind of data sharing issues that Perplexity AI is facing. The system includes encryption for user data, a monitoring module that logs all sharing activities, and policy enforcement mechanisms. By implementing these privacy controls, developers can ensure that user data is protected and that any data sharing activities are properly tracked and auditable. This approach demonstrates how to build robust privacy safeguards into AI applications, which is increasingly important as regulatory requirements around data protection continue to evolve.

Source: The Decoder
