Introduction
In this tutorial, you'll learn how to scrape and analyze Samsung product listings from Amazon ahead of Prime Day using Python. This intermediate-level tutorial covers extracting titles, prices, ratings, and links from search results, then cleaning and comparing them with pandas. You'll build a tool that collects Samsung product prices and ranks them, and that you can extend to monitor prices and alert you to deals before Prime Day begins.
Prerequisites
- Python 3.7 or higher installed on your system
- Familiarity with Python programming concepts
- Basic understanding of web scraping and HTML structure
- Required Python libraries: requests, BeautifulSoup, pandas, and lxml
Step-by-step instructions
Step 1: Set up your development environment
Install required packages
First, you'll need to install the necessary Python libraries. Open your terminal or command prompt and run:
pip install requests beautifulsoup4 pandas lxml
This installs the libraries needed for web scraping and data manipulation. The requests library handles HTTP requests, BeautifulSoup parses HTML content, pandas manages tabular data, and lxml provides the fast HTML and XML parser that BeautifulSoup uses as its backend.
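Before moving on, it can help to confirm that the packages are importable. The sketch below is one way to check, using only the standard library; the helper name missing_packages is hypothetical. Note that one import name differs from its pip name: beautifulsoup4 is imported as bs4.

```python
from importlib.util import find_spec

# Import names for the packages installed above (beautifulsoup4 imports as bs4).
REQUIRED = ["requests", "bs4", "pandas", "lxml"]


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```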
Step 2: Create the main scraping class
Initialize the scraper
Create a new Python file called samsung_deal_scraper.py and start by importing the required modules:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
import random


class SamsungDealScraper:
    def __init__(self):
        self.base_url = "https://www.amazon.com"
        self.session = requests.Session()
        # Set a user agent to avoid being blocked
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        })
The class creates a session that sends a browser-like User-Agent header with every request. This makes requests less likely to be rejected outright, although Amazon's anti-bot measures may still block or challenge automated traffic.
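Because the session identifies itself with a User-Agent header, it is worth checking what a site's robots.txt permits for that agent before sending requests. The following is a minimal sketch using the standard library's robotparser on text that has already been downloaded. The rules shown are invented for illustration and are not Amazon's actual robots.txt; the helper names build_parser and is_allowed and the agent name my-scraper are also assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; they are NOT Amazon's real robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /gp/cart
Allow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
print(is_allowed(parser, "my-scraper", "https://www.example.com/s?k=samsung"))
print(is_allowed(parser, "my-scraper", "https://www.example.com/gp/cart/view"))
```

In a real run you would fetch the site's own robots.txt once, parse it the same way, and skip any URL for which is_allowed returns False.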
Step 3: Implement the search functionality
Search for Samsung products
Add the following method to the SamsungDealScraper class to search for Samsung products:
    def search_samsung_products(self, search_term, max_results=20):
        # Pass the query as params so requests URL-encodes spaces and symbols
        params = {
            'k': search_term,
            'i': 'electronics',
            'rh': 'n:283155,p_n_feature_browse-bin:12485791011',
        }
        try:
            # A timeout keeps the scraper from hanging on a stalled connection
            response = self.session.get(f"{self.base_url}/s", params=params, timeout=10)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'lxml')
            products = self.parse_search_results(soup)
            return products[:max_results]
        except requests.RequestException as e:
            print(f"Error searching for products: {e}")
            return []
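This method builds a search URL for the given term in the electronics department, fetches it, and parses the results using BeautifulSoup. If the request fails, it prints the error and returns an empty list. The rh filter values are specific to Amazon's category tree, so check them against a live search URL before relying on them.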
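Search terms such as "Samsung Galaxy S23" contain spaces, which need to be encoded before they go into a URL. Here is a minimal sketch of that encoding step using the standard library; build_search_url and its department parameter are illustrative names, not part of the scraper class.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.amazon.com"


def build_search_url(search_term, department="electronics"):
    """Build a search URL with the query string safely encoded."""
    query = urlencode({"k": search_term, "i": department})
    return f"{BASE_URL}/s?{query}"


print(build_search_url("Samsung Galaxy S23"))
```

Spaces become plus signs and reserved characters such as & are percent-encoded, so a search term can never break the query string.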
Step 4: Parse product details
Extract relevant product information
Implement the parsing method to extract product details:
    def parse_search_results(self, soup):
        products = []
        product_containers = soup.find_all('div', {'data-component-type': 's-search-result'})
        for container in product_containers:
            try:
                # Extract product title
                title_elem = container.find('h2', class_='a-size-mini')
                title = title_elem.get_text(strip=True) if title_elem else "N/A"
                # Extract price
                price_elem = container.find('span', class_='a-price-whole')
                price = price_elem.get_text(strip=True) if price_elem else "N/A"
                # Extract rating
                rating_elem = container.find('span', class_='a-icon-alt')
                rating = rating_elem.get_text(strip=True).split()[0] if rating_elem else "N/A"
                # Extract product link
                link_elem = container.find('a', class_='a-link-normal')
                link = f"{self.base_url}{link_elem['href']}" if link_elem else "N/A"
                products.append({
                    'title': title,
                    'price': price,
                    'rating': rating,
                    'link': link
                })
            except Exception as e:
                print(f"Error parsing product: {e}")
                continue
        return products
This method extracts each product's title, price, rating, and link, recording "N/A" for any field it cannot find. Amazon's markup changes over time, so the tag and class names used here may need updating.
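The rating and link fields can be made more robust with two small helpers. The sketch below assumes the rating text has the form "4.5 out of 5 stars", which is what taking the first whitespace-separated token implies, and that an href may be either relative or absolute. The function names parse_rating and absolute_link are hypothetical.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://www.amazon.com"


def parse_rating(text):
    """Extract the leading number from text like '4.5 out of 5 stars'."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", text or "")
    return float(match.group(1)) if match else None


def absolute_link(href):
    """Resolve a relative or absolute href against the site root."""
    return urljoin(BASE_URL, href) if href else None


print(parse_rating("4.5 out of 5 stars"))
print(absolute_link("/dp/B000TEST"))
```

Returning None instead of "N/A" keeps the rating numeric, and urljoin avoids doubling the domain when an href is already a full URL.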
Step 5: Analyze and filter deals
Identify the best value products
Add two methods to the class: one to analyze the scraped data, and one to run the searches and collect the results:
    def analyze_deals(self, products):
        # Nothing scraped: return an empty DataFrame so callers can check .empty
        if not products:
            return pd.DataFrame()
        # Convert to DataFrame for easier analysis
        df = pd.DataFrame(products)
        # Clean price data; unparseable values such as "N/A" become NaN
        cleaned = (df['price'].str.replace(',', '', regex=False)
                   .str.replace('$', '', regex=False)
                   .str.rstrip('.'))
        df['price_numeric'] = pd.to_numeric(cleaned, errors='coerce')
        df = df.dropna(subset=['price_numeric'])
        # Filter for Samsung products
        samsung_products = df[df['title'].str.contains('Samsung', case=False, na=False)]
        # Sort by price, lowest first
        best_deals = samsung_products.sort_values('price_numeric').head(10)
        return best_deals

    def get_prime_day_deals(self):
        # Search for specific Samsung products
        search_terms = ['Samsung Galaxy S23', 'Samsung Galaxy Tab', 'Samsung Smart TV']
        all_products = []
        for term in search_terms:
            print(f"Searching for: {term}")
            products = self.search_samsung_products(term, max_results=10)
            all_products.extend(products)
            # Be respectful to Amazon's servers
            time.sleep(random.uniform(1, 3))
        return self.analyze_deals(all_products)
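The analyze_deals method converts the scraped data into a pandas DataFrame, cleans the price data, filters for Samsung products, and returns the ten lowest-priced items. The get_prime_day_deals method runs each search with a random one- to three-second pause between requests and passes the combined results to analyze_deals. Sorting by price surfaces the cheapest listings, which are not necessarily the biggest discounts.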
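A ranking by percentage discount is one alternative to ranking by price. The sketch below assumes each product dict also carries a numeric list_price field, which the scraper in this tutorial does not collect. That field, the function names, and the sample data are all hypothetical.

```python
def discount_percent(list_price, deal_price):
    """Return the percentage saved relative to list_price, or None if unknown."""
    if list_price is None or deal_price is None or list_price <= 0:
        return None
    return round((list_price - deal_price) / list_price * 100, 1)


def rank_by_discount(products):
    """Sort products by discount, largest first, skipping those without one."""
    scored = []
    for product in products:
        pct = discount_percent(product.get("list_price"), product.get("price"))
        if pct is not None:
            scored.append({**product, "discount_pct": pct})
    return sorted(scored, key=lambda p: p["discount_pct"], reverse=True)


# Made-up sample data for illustration only.
sample = [
    {"title": "Example Tablet", "list_price": 200.0, "price": 180.0},
    {"title": "Example Phone", "list_price": 800.0, "price": 600.0},
    {"title": "Example TV", "list_price": None, "price": 450.0},
]
for item in rank_by_discount(sample):
    print(item["title"], item["discount_pct"])
```

Here the phone ranks first despite being the most expensive item, because it has the largest relative saving; the TV is skipped because its list price is unknown.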
Step 6: Run the scraper and display results
Execute the deal monitoring tool
Add the main execution block to run your scraper:
if __name__ == "__main__":
    scraper = SamsungDealScraper()
    print("Scraping Samsung Prime Day deals...")
    deals = scraper.get_prime_day_deals()
    if not deals.empty:
        print("\nTop Samsung Prime Day Deals:")
        print("=" * 50)
        for index, deal in deals.iterrows():
            print(f"Title: {deal['title']}")
            print(f"Price: ${deal['price_numeric']:.2f}")
            print(f"Rating: {deal['rating']}")
            print(f"Link: {deal['link']}")
            print("-" * 30)
    else:
        print("No deals found or error occurred.")
This final step runs the scraper and prints each deal's title, price, rating, and link in a readable layout.
Step 7: Enhance with additional features
Implement deal tracking and alerts
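For more advanced use, add the following two methods to the class. The first is a placeholder for tracking price changes over time, and the second saves the results to a CSV file: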
    def track_price_changes(self, product_link, days=7):
        # This would require storing historical data
        # For now, just return a placeholder
        print(f"Tracking price changes for: {product_link}")
        return "Price tracking functionality would be implemented here"

    def save_deals_to_csv(self, deals, filename="prime_day_deals.csv"):
        deals.to_csv(filename, index=False)
        print(f"Deals saved to {filename}")
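The save_deals_to_csv method writes the current results to disk. Once track_price_changes stores historical prices instead of returning a placeholder, it would let you monitor price changes over time and create alerts for significant drops.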
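One possible way to fill in the price-tracking placeholder is to append each observed price to a CSV file and compare the latest value with the first. The following is a minimal sketch under that assumption, using only the standard library. The file layout, the function names, and the 10% default threshold are illustrative choices and are not part of the scraper class.

```python
import csv
from datetime import date
from pathlib import Path

FIELDS = ["date", "link", "price"]


def record_price(path, link, price, day=None):
    """Append one price observation, writing a header if the file is new."""
    path = Path(path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        stamp = (day or date.today()).isoformat()
        writer.writerow({"date": stamp, "link": link, "price": price})


def price_history(path, link):
    """Return recorded prices for a link as floats, in the order written."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return [float(row["price"]) for row in csv.DictReader(f) if row["link"] == link]


def has_dropped(path, link, threshold_pct=10.0):
    """Return True if the latest price is at least threshold_pct below the first."""
    prices = price_history(path, link)
    if len(prices) < 2 or prices[0] <= 0:
        return False
    drop = (prices[0] - prices[-1]) / prices[0] * 100
    return drop >= threshold_pct
```

Calling record_price once per run for each product link builds up the history, and has_dropped can then decide whether to trigger an alert.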
Summary
In this tutorial, you've built a Samsung Prime Day deal scraper that searches for products, extracts pricing information, and ranks the results by price. You've learned how to use requests and BeautifulSoup for web scraping, how to clean and filter data with pandas, and how to reduce load on the server by pausing between requests. This tool can be extended with additional features like email alerts, database storage, or integration with price tracking services to help you maximize your savings during Prime Day. Remember to always follow ethical web scraping practices and respect website terms of service.



