Introduction
In this tutorial, you'll build a simple data labeling system that mimics how gig workers on AI training platforms, such as DoorDash's Tasks app, contribute to AI training by creating labeled datasets. The simulation is a small-scale, practical demonstration of how AI training data is collected and processed, which is useful background for understanding AI-powered gig work.
Prerequisites
- Basic Python knowledge
- Understanding of machine learning concepts
- Python libraries: pandas, numpy, pillow, and requests
- Basic understanding of image processing and data labeling
Step-by-Step Instructions
1. Set Up Your Development Environment
First, create a virtual environment and install the required packages. The virtual environment keeps this project's packages isolated from your other Python projects and from the system installation.
python -m venv ai_tasks_env
source ai_tasks_env/bin/activate # On Windows: ai_tasks_env\Scripts\activate
pip install pandas numpy pillow requests
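To confirm the installation worked, a quick check like the one below can help. It is only a sketch: the helper name `missing_packages` and the mapping from pip names to import names are illustrative assumptions, not part of the tutorial's code. Note that the `pillow` package is imported as `PIL`.

```python
import importlib.util

# Map each pip package name to the module name used in `import` statements.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "pillow": "PIL",
    "requests": "requests",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Run it inside the activated environment; an empty result means every listed module can be located.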
2. Create the Data Labeling Structure
Next, we'll create a basic data structure to represent the labeling tasks that gig workers would encounter. It is a simplified stand-in for how an app such as DoorDash's Tasks might organize work, not a description of its actual internals.
import pandas as pd
import numpy as np
from PIL import Image
import os


class TaskData:
    """In-memory table of labeling tasks, one row per task."""

    def __init__(self):
        self.tasks = pd.DataFrame(columns=['task_id', 'image_path', 'label', 'status', 'worker_id'])

    def add_task(self, image_path, label, worker_id=None):
        # Task IDs are sequential and start at 1
        task_id = len(self.tasks) + 1
        new_task = {
            'task_id': task_id,
            'image_path': image_path,
            'label': label,
            'status': 'pending',
            'worker_id': worker_id
        }
        self.tasks = pd.concat([self.tasks, pd.DataFrame([new_task])], ignore_index=True)

    def get_pending_tasks(self):
        return self.tasks[self.tasks['status'] == 'pending']

    def mark_completed(self, task_id, worker_id, label):
        # Record the new status, the worker, and the label the worker submitted
        self.tasks.loc[self.tasks['task_id'] == task_id, 'status'] = 'completed'
        self.tasks.loc[self.tasks['task_id'] == task_id, 'worker_id'] = worker_id
        self.tasks.loc[self.tasks['task_id'] == task_id, 'label'] = label
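The three `.loc` assignments in `mark_completed` each rebuild the same row mask. One possible variant, shown here as a standalone sketch rather than a replacement for the class, computes the mask once, updates all three columns in a single assignment, and reports whether the task ID existed. The function name `mark_completed_once` and the sample rows are assumptions made for illustration.

```python
import pandas as pd


def mark_completed_once(tasks, task_id, worker_id, label):
    """Mark one task completed in place; return False if the ID is unknown."""
    mask = tasks["task_id"] == task_id
    if not mask.any():
        return False
    # One assignment updates all three columns of the matching row.
    tasks.loc[mask, ["status", "worker_id", "label"]] = ["completed", worker_id, label]
    return True


# Hypothetical sample data with the same columns as TaskData.tasks.
tasks = pd.DataFrame(
    [
        {"task_id": 1, "image_path": "a.jpg", "label": "laundry",
         "status": "pending", "worker_id": None},
        {"task_id": 2, "image_path": "b.jpg", "label": "park_walk",
         "status": "pending", "worker_id": None},
    ]
)

found = mark_completed_once(tasks, 2, worker_id=7, label="dog_walk")
missing = mark_completed_once(tasks, 99, worker_id=7, label="unknown")
```

Returning a boolean makes a mistyped task ID visible to the caller, whereas the original method silently does nothing in that case.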
3. Generate Sample Images for Training
We'll create placeholder images, one per label, that stand in for the kinds of tasks gig workers might encounter. Each image is a solid-color 224x224 square, which is enough to simulate the training data collection process without real photos.
def create_sample_images(directory='sample_images'):
    # Create the output directory if it does not exist yet
    os.makedirs(directory, exist_ok=True)
    # Create one solid-color placeholder image per label
    for i, label in enumerate(['laundry', 'scrambled_eggs', 'park_walk']):
        img = Image.new('RGB', (224, 224), color=(i*50, i*30, i*20))
        img.save(f'{directory}/{label}_{i}.jpg')
        print(f'Created {label}_{i}.jpg')


create_sample_images()
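Because the files are named `{label}_{i}.jpg`, the label and index can be recovered from a path later, for example when building the task list. The helper below is a small sketch of that idea; `parse_sample_name` is a hypothetical name, and it assumes the index is always the part after the last underscore.

```python
from pathlib import Path


def parse_sample_name(path):
    """Split 'dir/<label>_<index>.jpg' into (label, index)."""
    stem = Path(path).stem                  # e.g. 'scrambled_eggs_1'
    label, _, index = stem.rpartition("_")  # split on the LAST underscore
    return label, int(index)


label, index = parse_sample_name("sample_images/scrambled_eggs_1.jpg")
print(label, index)
```

Splitting on the last underscore matters here because labels such as `scrambled_eggs` contain underscores themselves.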
4. Implement Task Assignment Logic
This step simulates how tasks are assigned to gig workers. The assigner below uses the simplest possible rule: it gives the task to the first worker whose status is available. It does not model skill levels.
class TaskAssigner:
    def __init__(self, worker_pool):
        self.workers = worker_pool
        self.task_queue = []

    def assign_task(self, task_data, worker_id):
        # Simple assignment logic: assign to the first available worker.
        # The worker_id argument is accepted but not used by this strategy.
        for worker in self.workers:
            if worker['status'] == 'available':
                worker['status'] = 'busy'
                worker['current_task'] = task_data
                return worker['id']
        return None

    def complete_task(self, worker_id, label):
        # Free the worker and count the finished task; label is not stored here
        for worker in self.workers:
            if worker['id'] == worker_id:
                worker['status'] = 'available'
                worker['completed_tasks'] += 1
                break
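If skill levels were to be taken into account, one way to extend the first-available rule is to filter on a per-worker skill set. The sketch below is an assumption-laden illustration: the `skills` field and the `assign_by_skill` function do not exist in the tutorial's `TaskAssigner`, and real platforms may match workers very differently.

```python
def assign_by_skill(workers, required_skill):
    """Return the id of the first available worker with the skill, else None."""
    for worker in workers:
        if worker["status"] == "available" and required_skill in worker["skills"]:
            worker["status"] = "busy"  # the worker is now occupied
            return worker["id"]
    return None


# Hypothetical worker pool; 'skills' is an added field for this example.
workers = [
    {"id": 1, "status": "available", "skills": {"laundry"}},
    {"id": 2, "status": "available", "skills": {"cooking", "laundry"}},
]

first = assign_by_skill(workers, "cooking")   # only worker 2 qualifies
second = assign_by_skill(workers, "cooking")  # worker 2 is busy now
print(first, second)
```

The second call returns `None` because the only worker with the required skill is already busy, which is the case a real queue would need to handle by waiting or retrying.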
5. Simulate Worker Interaction
Now we'll simulate how a gig worker would interact with the tasks, including viewing, labeling, and submitting work.
def simulate_worker_interaction(task_data, assigner, worker_id):
    print(f'\nWorker {worker_id} starting task assignment')
    # Get pending tasks
    pending = task_data.get_pending_tasks()
    if len(pending) == 0:
        print('No pending tasks available')
        return
    # Assign first task
    first_task = pending.iloc[0]
    print(f'Assigning task {first_task["task_id"]} to worker {worker_id}')
    # Simulate worker labeling
    worker_label = input(f'Worker {worker_id}, please label task {first_task["task_id"]}: ')
    # Mark as completed
    task_data.mark_completed(first_task['task_id'], worker_id, worker_label)
    print(f'Task {first_task["task_id"]} completed with label: {worker_label}')
    # Update worker status
    assigner.complete_task(worker_id, worker_label)
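Because `simulate_worker_interaction` calls `input()` directly, it blocks until someone types a label, which makes automated runs awkward. One common workaround is to pass the prompt function in as a parameter. The sketch below shows the idea on its own; `label_task`, `scripted_labeler`, and the label normalization step are illustrative assumptions, not part of the tutorial's code.

```python
def label_task(task, labeler=input):
    """Ask `labeler` for a label and normalize it to snake_case."""
    raw = labeler(f"Please label task {task['task_id']}: ")
    return raw.strip().lower().replace(" ", "_")


def scripted_labeler(answers):
    """Build a stand-in for input() that replays canned answers in order."""
    remaining = iter(answers)
    return lambda prompt: next(remaining)


# Non-interactive run: the scripted labeler replaces keyboard input.
labeler = scripted_labeler(["  Scrambled Eggs ", "laundry"])
first_label = label_task({"task_id": 1}, labeler)
second_label = label_task({"task_id": 2}, labeler)
print(first_label, second_label)
```

Leaving `labeler` at its default of `input` keeps the interactive behavior, so the same function serves both manual and scripted runs.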
6. Build the Complete Simulation
Finally, we'll put everything together to create a complete simulation that demonstrates how the gig work system functions.
def main_simulation():
    # Initialize data and workers
    task_data = TaskData()
    workers = [
        {'id': 1, 'status': 'available', 'completed_tasks': 0},
        {'id': 2, 'status': 'available', 'completed_tasks': 0},
        {'id': 3, 'status': 'available', 'completed_tasks': 0}
    ]
    assigner = TaskAssigner(workers)
    # Add sample tasks
    sample_tasks = [
        ('sample_images/laundry_0.jpg', 'laundry'),
        ('sample_images/scrambled_eggs_1.jpg', 'scrambled_eggs'),
        ('sample_images/park_walk_2.jpg', 'park_walk')
    ]
    for img_path, label in sample_tasks:
        task_data.add_task(img_path, label)
    print('Starting AI Training Task Simulation')
    print(f'Total tasks: {len(task_data.tasks)}')
    # Simulate worker interactions
    for worker_id in [1, 2, 3]:
        simulate_worker_interaction(task_data, assigner, worker_id)
    print('\nFinal Task Status:')
    print(task_data.tasks)
    print('\nWorker Statistics:')
    for worker in workers:
        print(f'Worker {worker["id"]}: {worker["completed_tasks"]} tasks completed')


# Run the simulation
if __name__ == '__main__':
    main_simulation()
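One limitation worth noting: `mark_completed` overwrites the label a task was created with, so the final table cannot show whether a worker agreed with it. If the original and submitted labels were kept in separate fields, a simple agreement rate could be computed as sketched below. The `expected` and `submitted` field names and the `agreement_rate` function are hypothetical and chosen only for this illustration.

```python
def agreement_rate(records):
    """Fraction of completed tasks whose submitted label matches the expected one."""
    completed = [r for r in records if r["submitted"] is not None]
    if not completed:
        return 0.0  # nothing completed yet, so there is nothing to compare
    matches = sum(r["expected"] == r["submitted"] for r in completed)
    return matches / len(completed)


# Hypothetical records; task 3 has not been labeled yet.
records = [
    {"task_id": 1, "expected": "laundry", "submitted": "laundry"},
    {"task_id": 2, "expected": "scrambled_eggs", "submitted": "omelette"},
    {"task_id": 3, "expected": "park_walk", "submitted": None},
]

rate = agreement_rate(records)
print(f"Agreement on completed tasks: {rate:.0%}")
```

Here one of the two completed tasks matches, so the rate is 0.5; unlabeled tasks are excluded rather than counted as disagreements.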
Summary
This tutorial demonstrated how AI training platforms like DoorDash's Tasks app function by building a simplified simulation. You learned how tasks are structured, assigned to workers, and processed. The system mimics real-world gig work where workers contribute to AI training by labeling data. Understanding this process is crucial for grasping how AI development relies on human-in-the-loop systems and how gig economy platforms are evolving to support AI training infrastructure.
The key takeaway is that these systems represent a new form of work where human labor directly contributes to AI model development, creating both opportunities and challenges for gig workers in the AI era.



