Introduction
In the wake of the Mercor data breach, it's crucial for developers and security professionals to understand how to protect sensitive data and implement robust security measures. This tutorial will guide you through creating a secure data handling system using Python and encryption techniques that can help prevent the types of breaches that affected companies like Mercor.
By the end of this tutorial, you'll have built a secure data storage system that demonstrates key security concepts including encryption, secure key management, and data integrity verification.
Prerequisites
- Python 3.7 or higher installed on your system
- Basic understanding of Python programming and object-oriented concepts
- Understanding of cryptographic concepts (encryption, hashing, digital signatures)
- Required Python package: pycryptodome (hashlib, hmac, os, and json are part of the Python standard library and need no installation)
Step-by-Step Instructions
1. Install Required Dependencies
First, we need to install the cryptographic library that will handle our encryption operations. The pycryptodome library provides robust encryption algorithms.
pip install pycryptodome
Why this step: The pycryptodome library is a self-contained Python package of low-level cryptographic primitives that will allow us to implement secure encryption and decryption operations.
2. Create the Secure Data Handler Class
Let's create a foundational class that will handle our secure data operations:
import os
import json
import hashlib
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
from Crypto.Protocol.KDF import PBKDF2


class SecureDataHandler:
    def __init__(self, password: str):
        self.password = password
        self.salt = get_random_bytes(16)
        # Derive a 32-byte (AES-256) key from the password and a random salt
        self.key = PBKDF2(password, self.salt, dkLen=32)

    def encrypt_data(self, data: str) -> dict:
        # Generate a random initialization vector for every message
        iv = get_random_bytes(16)
        # Create cipher object
        cipher = AES.new(self.key, AES.MODE_CBC, iv)
        # Pad the data to a multiple of 16 bytes
        padded_data = self._pad(data.encode())
        # Encrypt the data
        encrypted_data = cipher.encrypt(padded_data)
        # Return encrypted data with the metadata needed to decrypt it
        return {
            'iv': iv.hex(),
            'salt': self.salt.hex(),
            'data': encrypted_data.hex(),
            'hash': hashlib.sha256(data.encode()).hexdigest()
        }

    def decrypt_data(self, encrypted_data: dict) -> str:
        # Convert hex back to bytes
        iv = bytes.fromhex(encrypted_data['iv'])
        salt = bytes.fromhex(encrypted_data['salt'])
        encrypted_bytes = bytes.fromhex(encrypted_data['data'])
        # Recreate key with stored salt
        key = PBKDF2(self.password, salt, dkLen=32)
        # Decrypt the data
        cipher = AES.new(key, AES.MODE_CBC, iv)
        decrypted_padded = cipher.decrypt(encrypted_bytes)
        # Remove padding
        decrypted_data = self._unpad(decrypted_padded).decode()
        return decrypted_data

    def _pad(self, data: bytes) -> bytes:
        # Append N bytes of value N so the length is a multiple of the block size
        block_size = 16
        padding_length = block_size - (len(data) % block_size)
        return data + bytes([padding_length] * padding_length)

    def _unpad(self, data: bytes) -> bytes:
        # The last byte states how many padding bytes to strip
        if not data:
            raise ValueError('Cannot unpad empty data')
        padding_length = data[-1]
        expected = bytes([padding_length] * padding_length)
        # Reject malformed padding (for example after decrypting with a wrong key)
        if not 1 <= padding_length <= 16 or data[-padding_length:] != expected:
            raise ValueError('Invalid padding')
        return data[:-padding_length]
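Why this step: This class encrypts data with AES-256 in CBC mode, using block padding and a fresh random initialization vector for every message so that identical plaintexts produce different ciphertexts. CBC on its own provides confidentiality but does not detect tampering, which is why the next step adds an integrity check.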
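The PBKDF2 call in __init__ relies on the library defaults, which in pycryptodome are 1,000 iterations of HMAC-SHA1. The sketch below shows the same password-plus-salt derivation with explicit parameters, using the standard library's hashlib.pbkdf2_hmac so it runs without extra packages. The derive_key name and the 600,000 iteration count are illustrative assumptions, not values taken from this tutorial.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte (AES-256) key with PBKDF2-HMAC-SHA256.

    The default iteration count is an assumption chosen for illustration.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)

# The same password and salt always reproduce the same key...
assert key == derive_key("correct horse battery staple", salt)
# ...while a different salt yields an unrelated key.
assert key != derive_key("correct horse battery staple", os.urandom(16))

print(len(key))  # 32
```

With pycryptodome, comparable settings should be reachable by passing count and hmac_hash_module (for example Crypto.Hash.SHA256) to PBKDF2; whichever values are chosen must be used identically for encryption and decryption.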
3. Implement Data Integrity Verification
Let's enhance our system with data integrity checks to ensure that data hasn't been tampered with:
import hmac
from datetime import datetime


class SecureDataHandler:
    # ... previous methods from step 2 (add the methods below to the same class) ...

    def verify_integrity(self, data: str, expected_hash: str) -> bool:
        # Recompute the digest and compare it in constant time
        actual_hash = hashlib.sha256(data.encode()).hexdigest()
        return hmac.compare_digest(actual_hash, expected_hash)

    def store_data(self, key: str, data: str) -> dict:
        # Encrypt the data
        encrypted = self.encrypt_data(data)
        # Wrap the encrypted payload with its lookup key and creation time
        return {
            'key': key,
            'data': encrypted,
            'timestamp': self._get_timestamp()
        }

    def retrieve_data(self, store_data: dict) -> str:
        # Decrypt the data
        decrypted = self.decrypt_data(store_data['data'])
        # Verify integrity
        if not self.verify_integrity(decrypted, store_data['data']['hash']):
            raise ValueError('Data integrity check failed')
        return decrypted

    def _get_timestamp(self) -> str:
        return datetime.now().isoformat()
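Why this step: Comparing a SHA-256 digest of the decrypted text against the digest stored at encryption time lets us detect corrupted or modified records and refuse to return them. Note that this digest is unkeyed, so it is a consistency check rather than a full message authentication code.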
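A keyed alternative to the plain SHA-256 digest is an HMAC computed over the IV and ciphertext, checked before decryption (often called encrypt-then-MAC). The following is a minimal sketch using only the standard library; compute_tag, verify_tag, and the separate mac_key are hypothetical names introduced for illustration, and the ciphertext is stand-in random bytes rather than real output of encrypt_data.

```python
import hashlib
import hmac
import os


def compute_tag(mac_key: bytes, iv: bytes, ciphertext: bytes) -> str:
    """Return an HMAC-SHA256 tag covering the IV and the ciphertext."""
    return hmac.new(mac_key, iv + ciphertext, hashlib.sha256).hexdigest()


def verify_tag(mac_key: bytes, iv: bytes, ciphertext: bytes, tag: str) -> bool:
    """Check the tag in constant time; call this before attempting decryption."""
    return hmac.compare_digest(compute_tag(mac_key, iv, ciphertext), tag)


mac_key = os.urandom(32)     # assumption: a key kept separate from the AES key
iv = os.urandom(16)
ciphertext = os.urandom(32)  # stand-in bytes, not real AES output

tag = compute_tag(mac_key, iv, ciphertext)
print(verify_tag(mac_key, iv, ciphertext, tag))  # True

# Flipping a single bit of the ciphertext invalidates the tag.
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]
print(verify_tag(mac_key, iv, tampered, tag))    # False
```

Because the tag depends on a secret key, someone who can edit the stored record cannot recompute a matching value, and the tag reveals nothing that would let a guess of the plaintext be confirmed.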
4. Create a Secure Data Storage System
Now let's build a complete system that can store and retrieve data securely:
import json
import os


class SecureDataStorage:
    def __init__(self, password: str, storage_file: str = 'secure_data.json'):
        self.handler = SecureDataHandler(password)
        self.storage_file = storage_file
        self.data = self._load_data()

    def _load_data(self) -> dict:
        # Load previously saved records, or start with an empty store
        if os.path.exists(self.storage_file):
            with open(self.storage_file, 'r') as f:
                return json.load(f)
        return {}

    def save_data(self, key: str, data: str):
        # Store the data
        stored_data = self.handler.store_data(key, data)
        # Add to local storage
        self.data[key] = stored_data
        # Save to file
        with open(self.storage_file, 'w') as f:
            json.dump(self.data, f, indent=2)

    def get_data(self, key: str) -> str:
        if key not in self.data:
            raise KeyError(f'Key {key} not found')
        return self.handler.retrieve_data(self.data[key])

    def list_keys(self) -> list:
        return list(self.data.keys())
Why this step: This creates a complete storage system that persists encrypted data to disk while maintaining the security properties we've implemented.
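One detail worth considering when persisting to disk is what happens if the process stops halfway through a write: opening the storage file with 'w' truncates it first, so an interruption can leave an empty or partial file. A possible hardening, sketched here under the assumption that the storage file lives on a local filesystem, is to write to a temporary file and rename it into place. The atomic_write_json helper is a hypothetical name, not part of the tutorial's classes.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, payload: dict) -> None:
    """Write payload to path so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates a file readable and writable only by the current user
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f, indent=2)
            f.flush()
            os.fsync(f.fileno())    # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        os.unlink(tmp_path)         # clean up the temporary file on failure
        raise


# Usage: write twice into a throwaway directory and read the result back.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "secure_data.json")
    atomic_write_json(target, {"example_key": {"data": "placeholder"}})
    atomic_write_json(target, {"example_key": {"data": "updated"}})
    with open(target) as f:
        print(json.load(f))
```

In save_data, the two lines that open the file and call json.dump could be swapped for a call to such a helper.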
5. Test the Secure System
Let's test our secure data storage system:
def main():
    # Create a secure storage instance
    storage = SecureDataStorage('my_secure_password_123', 'test_data.json')

    # Store some sensitive data
    sensitive_data = 'This is confidential information that needs protection'
    storage.save_data('user_credentials', sensitive_data)

    # Retrieve the data
    retrieved_data = storage.get_data('user_credentials')
    print(f'Retrieved data: {retrieved_data}')

    # List all stored keys
    print(f'Stored keys: {storage.list_keys()}')

    # Try to access non-existent key
    try:
        storage.get_data('non_existent_key')
    except KeyError as e:
        print(f'Error: {e}')


if __name__ == '__main__':
    main()
Why this step: Testing ensures our implementation works correctly and handles edge cases properly, including error conditions and data integrity checks.
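Edge cases in this design tend to cluster around padding, where messages whose length is an exact multiple of 16 bytes behave differently from the rest. The sketch below restates the padding scheme from step 2 as standalone functions (pad and unpad are names chosen here for illustration) and exercises the lengths around each block boundary, using no third-party packages.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pad(data: bytes) -> bytes:
    """Append N bytes of value N, where N is between 1 and 16."""
    n = BLOCK_SIZE - (len(data) % BLOCK_SIZE)
    return data + bytes([n] * n)


def unpad(data: bytes) -> bytes:
    """Strip the padding, rejecting malformed input instead of guessing."""
    if not data or len(data) % BLOCK_SIZE:
        raise ValueError("length is not a positive multiple of the block size")
    n = data[-1]
    if not 1 <= n <= BLOCK_SIZE or data[-n:] != bytes([n] * n):
        raise ValueError("invalid padding")
    return data[:-n]


# Lengths around the block boundary are the cases most likely to go wrong.
for length in (0, 1, 15, 16, 17, 31, 32):
    message = b"x" * length
    padded = pad(message)
    assert len(padded) % BLOCK_SIZE == 0
    assert unpad(padded) == message

# A message that already fills a block still gains a full block of padding.
print(len(pad(b"x" * 16)))  # 32
```

pycryptodome also ships pad and unpad helpers in Crypto.Util.Padding, which could replace hand-written versions.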
6. Add Additional Security Measures
For production use, we should add more security measures:
from datetime import datetime, timedelta


class EnhancedSecureDataHandler(SecureDataHandler):
    def __init__(self, password: str, max_age_hours: int = 24):
        super().__init__(password)
        self.max_age = max_age_hours

    def is_expired(self, timestamp: str) -> bool:
        # Compare the record's creation time against the allowed age
        stored_time = datetime.fromisoformat(timestamp)
        return datetime.now() > stored_time + timedelta(hours=self.max_age)

    def secure_delete(self, file_path: str):
        # Overwrite file with random data before deletion. This is best effort:
        # SSDs and journaling or copy-on-write filesystems may keep old copies.
        with open(file_path, 'r+b') as f:
            f.write(get_random_bytes(os.path.getsize(file_path)))
            f.flush()
            os.fsync(f.fileno())
        os.remove(file_path)

    def rotate_key(self, new_password: str):
        # Records encrypted under the old password cannot be read afterwards:
        # decrypt them before rotating and re-encrypt them once the new key is set.
        self.password = new_password
        self.salt = get_random_bytes(16)  # fresh salt for the new key
        self.key = PBKDF2(new_password, self.salt, dkLen=32)
Why this step: Adding key rotation, data expiration, and secure deletion helps protect against long-term data exposure and ensures that access keys can be updated when necessary.
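The expiration check compares naive local timestamps, which can shift when the system time zone or daylight saving time changes. A variant that stores and compares timezone-aware UTC timestamps might look like the sketch below. The utc_timestamp helper, the standalone is_expired function, and its now parameter are assumptions added for illustration, and treating legacy naive timestamps as UTC is a simplification.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def utc_timestamp() -> str:
    """Return the current time as an ISO 8601 string with an explicit UTC offset."""
    return datetime.now(timezone.utc).isoformat()


def is_expired(timestamp: str, max_age_hours: int = 24,
               now: Optional[datetime] = None) -> bool:
    """Return True when the record is older than max_age_hours."""
    stored = datetime.fromisoformat(timestamp)
    if stored.tzinfo is None:
        # Assumption: timestamps without an offset are interpreted as UTC
        stored = stored.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now > stored + timedelta(hours=max_age_hours)


fresh = utc_timestamp()
stale = (datetime.now(timezone.utc) - timedelta(hours=25)).isoformat()
print(is_expired(fresh))  # False
print(is_expired(stale))  # True
```

The optional now argument exists only to make the function easy to test with a fixed reference time.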
Summary
This tutorial demonstrated how to build a secure data handling system that protects against the types of vulnerabilities that led to breaches like Mercor's. We implemented:
- AES-256 encryption with proper padding and initialization vectors
- Data integrity verification using SHA-256 hashing
- Secure key derivation using PBKDF2
- Persistent storage with integrity checks
- Additional security measures like data expiration and key rotation
While this system provides a strong foundation for data security, remember that real-world applications require additional measures such as secure network communication (HTTPS), proper access controls, regular security audits, and compliance with relevant regulations like GDPR or CCPA. The key takeaway is that robust security requires multiple layers of protection, not just encryption alone.



