Introduction
In this tutorial, we'll explore how to programmatically analyze laptop deals from Amazon Prime Day using Python. This intermediate-level tutorial will teach you how to scrape laptop data, process it, and create a structured analysis that helps identify the best deals. You'll learn web scraping techniques, data manipulation with pandas, and how to filter and rank products based on key criteria.
Prerequisites
- Basic Python knowledge (functions, loops, lists, dictionaries)
- Python installed on your system
- Required packages: requests, BeautifulSoup4, pandas, lxml
- Basic understanding of HTML structure and web scraping concepts
Step-by-step Instructions
1. Setting Up Your Environment
1.1 Install Required Packages
First, we need to install the necessary Python libraries for web scraping and data processing:
pip install requests beautifulsoup4 pandas lxml
This installs the essential tools: requests for HTTP requests, BeautifulSoup for HTML parsing, pandas for data manipulation, and lxml for efficient parsing.
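To confirm the installation worked, a quick check along these lines can help. This is a minimal sketch: `missing_packages` is a hypothetical helper name, and it only tests whether each module can be found, not whether its version is suitable. Note that the beautifulsoup4 package is imported under the name `bs4`.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    # beautifulsoup4 installs under the import name 'bs4'
    required = ['requests', 'bs4', 'pandas', 'lxml']
    missing = missing_packages(required)
    if missing:
        print('Missing modules:', ', '.join(missing))
    else:
        print('All required modules are available')
```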
1.2 Create Project Structure
Create a new directory for this project and set up the basic files:
mkdir prime_day_laptops
cd prime_day_laptops
touch laptop_scraper.py
touch laptop_analysis.py
2. Web Scraping Implementation
2.1 Basic Web Scraper Setup
Let's create a basic scraper that can fetch laptop data from Amazon:
import requests
from bs4 import BeautifulSoup

# Set headers to mimic a real browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
}

def scrape_laptop_data(query):
    # Pass the query as a parameter so requests URL-encodes spaces and symbols
    url = 'https://www.amazon.com/s'
    params = {'k': query, 'ref': 'nb_sb_noss_2'}
    response = requests.get(url, headers=headers, params=params, timeout=10)
    response.raise_for_status()  # Stop early on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(response.content, 'lxml')
    laptops = []
    # Find product containers
    products = soup.find_all('div', {'data-component-type': 's-search-result'})
    for product in products:
        try:
            title = product.h2.a.text.strip()
            price = product.find('span', {'class': 'a-price-whole'})
            if price:
                # Drop thousands separators and any trailing decimal point
                price = float(price.text.replace(',', '').strip().rstrip('.'))
            else:
                price = None  # Unknown price; dropped later during analysis
            laptops.append({
                'title': title,
                'price': price,
                'url': 'https://www.amazon.com' + product.h2.a['href']
            })
        except Exception as e:
            # Skip results whose markup does not match the expected layout
            print(f'Error parsing product: {e}')
            continue
    return laptops
This scraper sends browser-like headers to reduce the chance of being blocked, and extracts each search result's title, price, and URL.
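Price text on a results page can carry currency symbols, thousands separators, or a trailing decimal point, so it helps to isolate the conversion in one place. The following is a minimal sketch: `parse_price` is a hypothetical helper, and the sample strings are assumptions about what the price element might contain rather than captured page output.

```python
import re


def parse_price(text):
    """Convert price text such as '1,299.' or '$1,299.99' to a float.

    Returns None when no number can be found.
    """
    if text is None:
        return None
    # One digit, then digits or commas, then an optional decimal part
    match = re.search(r'\d[\d,]*(?:\.\d+)?', text)
    if match is None:
        return None
    return float(match.group(0).replace(',', ''))


if __name__ == '__main__':
    for sample in ['1,299.', '$1,299.99', ' 849 ', 'N/A']:
        print(repr(sample), '->', parse_price(sample))
```

Returning None rather than 0 keeps a missing price from looking like a free laptop once category averages are computed.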
2.2 Add Rate Limiting
To respect Amazon's servers and avoid getting blocked, implement rate limiting:
import time

def scrape_with_delay(query, delay=1):
    laptops = scrape_laptop_data(query)
    time.sleep(delay)  # Wait between requests
    return laptops
Adding delays between requests prevents overwhelming the server and reduces the chance of IP blocking.
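A fixed sleep after every request also pauses after the last one. One alternative is to enforce a minimum interval between requests and sleep only for the time still owed. The sketch below assumes a hypothetical `RateLimiter` class; the `clock` and `sleep` parameters are included only so the behavior can be checked without real waiting.

```python
import time


class RateLimiter:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # No request has been made yet

    def wait(self):
        """Sleep only as long as needed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        # Record when this call was allowed to proceed
        self._last_call = now
```

In use, a single limiter would be created once (for example `limiter = RateLimiter(2.0)`) and `limiter.wait()` called immediately before each request.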
3. Data Processing and Analysis
3.1 Create Data Analysis Functions
Now let's build functions to process and analyze the scraped data:
import pandas as pd

# Define laptop categories
LAPTOP_CATEGORIES = {
    'macbook': ['macbook', 'apple', 'm1', 'm2'],
    'gaming': ['gaming', 'rtx', 'geforce', 'nvidia'],
    'ultraportable': ['ultra', 'thin', 'lightweight', 'portable'],
    'workstation': ['workstation', 'professional', 'business', 'dual', 'quad']
}

def categorize_laptop(title):
    # Return the first category whose keywords appear in the title
    title_lower = title.lower()
    for category, keywords in LAPTOP_CATEGORIES.items():
        if any(keyword in title_lower for keyword in keywords):
            return category
    return 'other'

def analyze_laptops(laptops):
    # Name the columns explicitly so an empty scrape still yields them
    df = pd.DataFrame(laptops, columns=['title', 'price', 'url'])
    df['category'] = df['title'].apply(categorize_laptop)
    df['price'] = pd.to_numeric(df['price'], errors='coerce')
    # Remove any rows with invalid prices (missing, zero, or negative)
    df = df.dropna(subset=['price'])
    df = df[df['price'] > 0].copy()
    # Calculate price rankings within categories
    df['price_rank'] = df.groupby('category')['price'].rank(method='min', ascending=True)
    return df
These functions categorize laptops by keyword and rank them by price within each category (rank 1 is the cheapest), making it easier to identify the best deals.
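Plain substring matching can misfire: with the keyword lists above, 'thin' matches inside "ThinkPad", so a ThinkPad title is labeled ultraportable before the 'business' keyword is ever checked. A possible variant, sketched here under the assumption that whole-word matching is acceptable, uses regular-expression word boundaries. `CATEGORY_PATTERNS` and `categorize_laptop_strict` are hypothetical names.

```python
import re

# Same keyword lists as LAPTOP_CATEGORIES in the tutorial
LAPTOP_CATEGORIES = {
    'macbook': ['macbook', 'apple', 'm1', 'm2'],
    'gaming': ['gaming', 'rtx', 'geforce', 'nvidia'],
    'ultraportable': ['ultra', 'thin', 'lightweight', 'portable'],
    'workstation': ['workstation', 'professional', 'business', 'dual', 'quad']
}

# One precompiled pattern per category; \b requires whole-word matches
CATEGORY_PATTERNS = {
    category: re.compile(
        r'\b(?:' + '|'.join(map(re.escape, keywords)) + r')\b',
        re.IGNORECASE,
    )
    for category, keywords in LAPTOP_CATEGORIES.items()
}


def categorize_laptop_strict(title):
    """Return the first category with a whole-word keyword match."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(title):
            return category
    return 'other'
```

The trade-off is that compound words no longer match: a title containing only "Ultraportable" would not hit 'ultra' or 'portable', so such terms would need to be added to the keyword list explicitly.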
3.2 Add Deal Identification Logic
Let's add functionality to identify potential Prime Day deals:
def identify_deals(df):
    # Find laptops with low prices relative to their category average
    category_averages = df.groupby('category')['price'].mean()
    df['avg_price'] = df['category'].map(category_averages)
    df['price_difference'] = df['avg_price'] - df['price']
    df['deal_score'] = df['price_difference'] / df['avg_price'] * 100
    # Filter for potential deals (top 20% by deal score)
    threshold = df['deal_score'].quantile(0.8)
    deals = df[df['deal_score'] >= threshold]
    return deals.sort_values('deal_score', ascending=False)
This logic calculates a deal score as the percentage by which a laptop's price falls below the average price of its category in the scraped data, then keeps the top 20% of scores as potential deals.
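To make the arithmetic concrete, here is a small worked example in plain Python. The three prices are made-up illustrative values for a single category, and `deal_scores` is a hypothetical helper that mirrors the formula in `identify_deals`.

```python
from statistics import mean


def deal_scores(prices):
    """Return the percentage each price sits below the mean of the list."""
    avg = mean(prices)
    return [(avg - price) / avg * 100 for price in prices]


if __name__ == '__main__':
    # Illustrative prices for one category; their average is 1000
    gaming_prices = [800.0, 1000.0, 1200.0]
    for price, score in zip(gaming_prices, deal_scores(gaming_prices)):
        print(f'${price:.0f}: deal_score = {score:+.1f}')
```

With an average of 1000, the 800 laptop scores 20 (20% below average), the 1000 laptop scores 0, and the 1200 laptop scores -20.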
4. Complete Implementation
4.1 Main Execution Script
Now let's put everything together in a complete script:
def main():
    # Scrape different laptop types
    queries = ['macbook air', 'gaming laptop', 'ultraportable laptop', 'workstation laptop']
    all_laptops = []
    for query in queries:
        print(f'Scraping {query}...')
        laptops = scrape_with_delay(query, delay=2)
        all_laptops.extend(laptops)
        print(f'Found {len(laptops)} laptops for {query}')
    # Process and analyze
    df = analyze_laptops(all_laptops)
    deals = identify_deals(df)
    # Display results
    print('\nTop Prime Day Deals:')
    print(deals[['title', 'price', 'category', 'deal_score']].head(10))
    # Save to CSV
    df.to_csv('prime_day_laptops.csv', index=False)
    deals.to_csv('prime_day_deals.csv', index=False)
    print('\nData saved to CSV files')

if __name__ == '__main__':
    main()
This complete script scrapes multiple laptop categories, processes the data, identifies the best deals, and saves results to CSV files for further analysis.
5. Advanced Features
5.1 Add Sorting and Filtering Options
Enhance your analysis with additional sorting capabilities:
def get_top_deals(df, category=None, max_price=None, top_n=5):
    filtered_df = df.copy()
    # Compare against None so that a falsy value such as max_price=0 still applies
    if category is not None:
        filtered_df = filtered_df[filtered_df['category'] == category]
    if max_price is not None:
        filtered_df = filtered_df[filtered_df['price'] <= max_price]
    # Highest deal scores first
    return filtered_df.nlargest(top_n, 'deal_score')
This function allows you to filter deals by category, maximum price, or both, making it easier to focus on specific types of laptops. It expects a DataFrame that already contains the deal_score column added by identify_deals.
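The same filter-then-rank idea also works on plain dictionaries, for instance rows loaded from a CSV file without pandas. The sketch below is a standard-library analogue, not the tutorial's own function: `top_deals_from_rows` is a hypothetical name, and the sample rows are made-up values for illustration.

```python
import heapq


def top_deals_from_rows(rows, category=None, max_price=None, top_n=5):
    """Filter row dicts by category and price, then return the top_n by deal_score."""
    candidates = [
        row for row in rows
        if (category is None or row['category'] == category)
        and (max_price is None or row['price'] <= max_price)
    ]
    # nlargest returns the rows sorted from highest to lowest deal_score
    return heapq.nlargest(top_n, candidates, key=lambda row: row['deal_score'])


if __name__ == '__main__':
    # Illustrative rows only; real rows would come from the analysis step
    sample_rows = [
        {'title': 'Laptop A', 'category': 'gaming', 'price': 800.0, 'deal_score': 20.0},
        {'title': 'Laptop B', 'category': 'gaming', 'price': 1200.0, 'deal_score': -20.0},
        {'title': 'Laptop C', 'category': 'macbook', 'price': 950.0, 'deal_score': 5.0},
    ]
    for row in top_deals_from_rows(sample_rows, max_price=1000, top_n=2):
        print(row['title'], row['deal_score'])
```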
Summary
This tutorial demonstrated how to build a comprehensive laptop deal analysis system using Python. You learned to scrape Amazon for laptop data, process it with pandas, categorize products, and identify exceptional deals based on price comparisons. The system provides a structured approach to analyzing Prime Day deals, helping you make informed purchasing decisions. This framework can be extended to include additional metrics like reviews, specifications, or brand comparisons to further enhance deal identification accuracy.



