Vertice buys Vendr to build what it calls the largest procurement dataset

Learn how to build and analyze a procurement dataset, covering purchasing patterns, supplier performance, and negotiation savings, using Python and data visualization techniques.

Introduction

In this tutorial, we'll explore how to build and analyze procurement datasets using Python and its data analysis libraries. The tutorial is inspired by Vertice's acquisition of Vendr, a deal Vertice says will let it build what it calls the largest procurement dataset. We'll generate, clean, and analyze a synthetic procurement dataset to extract the kinds of insights that procurement intelligence systems are built on.

By the end of this tutorial, you'll have built a working procurement data analysis pipeline that can help organizations understand their purchasing patterns, supplier negotiations, and cost optimization opportunities.

Prerequisites

To follow this tutorial, you should have:

Intermediate Python programming knowledge
Basic understanding of pandas and NumPy
Experience with data analysis and visualization libraries
Python 3.7+ installed
Required packages: pandas, numpy, matplotlib, seaborn, scikit-learn (optional; not used in the steps below)

Step-by-step Instructions

1. Set up your development environment

First, create a new Python environment and install the required packages:

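python -m venv procurement-env
source procurement-env/bin/activate  # On Windows: procurement-env\Scripts\activate
pip install pandas numpy matplotlib seaborn scikit-learn

This gives you an isolated environment for the procurement analysis, with all the libraries needed for data manipulation and visualization.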
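Before moving on, it can help to confirm that every package can be imported from the environment you just activated. The sketch below uses only the standard library's importlib.util.find_spec; the REQUIRED list and the missing_packages helper are our own names, and note that scikit-learn is imported as sklearn.

```python
import importlib.util

# Import names for the packages installed above (scikit-learn imports as "sklearn")
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn"]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

If anything is reported missing, re-run the pip install command with the environment activated.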

2. Create a sample procurement dataset

Let's generate a synthetic procurement dataset whose fields mirror those found in real purchasing records:

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import random

# Seed both random number generators so the sample data is reproducible
np.random.seed(42)
random.seed(42)

# Define categories and suppliers
categories = ['IT Equipment', 'Office Supplies', 'Software Licenses', 'Services', 'Hardware']
suppliers = ['TechCorp', 'OfficeMax', 'SoftSolutions', 'ServicePro', 'HardwareHub']

# Generate sample data
n_records = 1000

# Create DataFrame
data = {
    'purchase_id': range(1, n_records + 1),
    'date': [datetime(2023, 1, 1) + timedelta(days=random.randint(0, 364)) for _ in range(n_records)],  # randint is inclusive, so 0-364 keeps every date in 2023
    'category': [random.choice(categories) for _ in range(n_records)],
    'supplier': [random.choice(suppliers) for _ in range(n_records)],
    'amount': np.random.uniform(100, 10000, n_records),
    'quantity': np.random.randint(1, 100, n_records),
    'negotiation_savings': np.random.uniform(0, 0.3, n_records),  # Savings as a fraction (0.0 to 0.3), not a percentage
    'contract_type': [random.choice(['Annual', 'Monthly', 'One-time']) for _ in range(n_records)]
}

procurement_df = pd.DataFrame(data)
procurement_df.to_csv('procurement_data.csv', index=False)
print(procurement_df.head())

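This creates a synthetic procurement dataset with the fields a procurement intelligence system would analyze: purchase dates, categories, suppliers, amounts, quantities, contract types, and negotiation savings. Because every value is drawn uniformly at random, the data contains no real trends; it exists to exercise the pipeline.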
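Because the data is random, it is worth checking that it actually has the properties the later steps rely on. The helper below is a hypothetical sanity check, not part of any library; it assumes the column names used above and a 2023 date range.

```python
import pandas as pd


def check_synthetic_data(df, year=2023, max_savings=0.3):
    """Return a list of human-readable problems found in the data (empty if none)."""
    problems = []
    dates = pd.to_datetime(df['date'])
    # Every purchase should fall inside the intended calendar year
    if not (dates.dt.year == year).all():
        problems.append(f"some dates fall outside {year}")
    # Savings are generated as fractions between 0 and max_savings (inclusive)
    if not df['negotiation_savings'].between(0, max_savings).all():
        problems.append(f"negotiation_savings outside 0-{max_savings}")
    # A zero or negative quantity would break the per-unit price later on
    if (df['quantity'] <= 0).any():
        problems.append("non-positive quantity")
    if df['purchase_id'].duplicated().any():
        problems.append("duplicate purchase_id")
    return problems
```

Calling check_synthetic_data(procurement_df) right after the DataFrame is built should return an empty list; anything else points to a generation bug, such as dates spilling into the following year.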

3. Load and explore the dataset

Load the dataset and perform initial exploration:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
df = pd.read_csv('procurement_data.csv')

# Basic information about the dataset
print("Dataset shape:", df.shape)
print("\nDataset info:")
df.info()

# Summary statistics
print("\nSummary statistics:")
print(df.describe())

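This step helps us understand the data's structure and spot issues before analysis. For example, df.info() reports date as a text column rather than a datetime, because CSV files carry no type information; the cleaning step below converts it.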
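The date issue is easy to reproduce in isolation: read_csv does not parse dates unless asked to. A minimal illustration with an in-memory CSV (the two sample rows are made up), showing the default behaviour and the parse_dates argument of pandas.read_csv:

```python
import io

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Two made-up rows in the same shape as procurement_data.csv
csv_text = "purchase_id,date,amount\n1,2023-01-05,250.0\n2,2023-02-10,980.5\n"

# Default read: the date column is stored as text, so the .dt accessor is unavailable
raw = pd.read_csv(io.StringIO(csv_text))
print("date parsed by default:", is_datetime64_any_dtype(raw['date']))

# parse_dates converts the column to datetimes while reading
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])
print("date parsed with parse_dates:", is_datetime64_any_dtype(parsed['date']))
```

The tutorial converts the column with pd.to_datetime after loading; passing parse_dates=['date'] to read_csv is an equivalent alternative.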

4. Data cleaning and preprocessing

Prepare the data for analysis by checking for missing values, converting data types, and adding derived columns:

# Check for missing values
print("Missing values:")
print(df.isnull().sum())

# Convert date column to datetime
df['date'] = pd.to_datetime(df['date'])

# Create additional useful columns
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month

# Calculate effective price per unit
df['effective_price'] = df['amount'] / df['quantity']

# Check data types
print("\nData types:")
print(df.dtypes)

# Display cleaned dataset
print("\nCleaned dataset:")
print(df.head())

Careful cleaning keeps the analysis accurate. Converting date to a datetime is required before the .dt accessor can extract the year and month, and a derived column such as effective_price (amount divided by quantity) lets us compare unit costs across purchases of different sizes.
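The code above only reports missing values. The synthetic data has none, but real exports usually do. Here is a sketch of one possible policy, assuming the column names from step 2. The drop and fill rules are illustrative choices, not a standard. It also guards effective_price against a zero quantity, which would otherwise produce inf.

```python
import pandas as pd


def clean_purchases(df):
    """Drop unusable rows, fill missing savings, and compute a safe unit price.

    Hypothetical cleaning policy (adapt to your own data):
    - rows missing 'amount' or 'quantity' are dropped
    - a missing 'negotiation_savings' is treated as 0.0 (no savings recorded)
    - a quantity of zero or less gives NaN instead of an infinite or negative unit price
    """
    out = df.dropna(subset=['amount', 'quantity']).copy()
    out['negotiation_savings'] = out['negotiation_savings'].fillna(0.0)
    # Replace non-positive quantities with NaN before dividing
    positive_qty = out['quantity'].where(out['quantity'] > 0)
    out['effective_price'] = out['amount'] / positive_qty
    return out


sample = pd.DataFrame({
    'amount': [100.0, None, 300.0, 50.0],
    'quantity': [4, 2, 0, 5],
    'negotiation_savings': [0.1, 0.2, None, 0.05],
})
print(clean_purchases(sample))
```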

5. Analyze procurement patterns

Perform key analyses to understand procurement behavior:

# Analysis 1: Spending by category
plt.figure(figsize=(10, 6))
spending_by_category = df.groupby('category')['amount'].sum().sort_values(ascending=False)
spending_by_category.plot(kind='bar', color='skyblue')
plt.title('Total Spending by Procurement Category')
plt.xlabel('Category')
plt.ylabel('Total Amount ($)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('spending_by_category.png')
plt.show()

# Analysis 2: Savings distribution
plt.figure(figsize=(10, 6))
sns.histplot(df['negotiation_savings'], bins=30, kde=True)
plt.title('Distribution of Negotiation Savings')
plt.xlabel('Negotiation Savings (fraction)')
plt.ylabel('Frequency')
plt.savefig('savings_distribution.png')
plt.show()

# Analysis 3: Monthly spending trend
monthly_spending = df.groupby('month')['amount'].sum()
plt.figure(figsize=(10, 6))
monthly_spending.plot(kind='line', marker='o')
plt.title('Monthly Procurement Spending Trend')
plt.xlabel('Month')
plt.ylabel('Total Amount ($)')
plt.grid(True)
plt.savefig('monthly_trend.png')
plt.show()

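On real data, these three charts show where spend is concentrated, how large negotiated discounts typically are, and whether spending is seasonal, which is the starting point for finding cost-saving opportunities. On the synthetic data they will look roughly flat, because every value is drawn uniformly at random.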
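Grouping on the month column (1 to 12) adds together the same month from different years, which silently distorts the trend once the data spans more than one year. A sketch that keeps years apart with pandas' Series.dt.to_period('M'); the monthly_spend name is ours:

```python
import pandas as pd


def monthly_spend(df):
    """Total spend per calendar month, with each year's months kept separate."""
    return df.groupby(df['date'].dt.to_period('M'))['amount'].sum()


sample = pd.DataFrame({
    'date': pd.to_datetime(['2023-01-15', '2023-01-20', '2024-01-05']),
    'amount': [100.0, 200.0, 50.0],
})
print(monthly_spend(sample))  # January 2023 and January 2024 stay separate
```

The result is a Series indexed by monthly periods, so it can be plotted with .plot(kind='line', marker='o') just like the month-number version.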

6. Advanced analysis: Supplier performance

Examine supplier performance and negotiation effectiveness:

# Supplier analysis: named aggregation gives each output column its name directly
supplier_analysis = df.groupby('supplier').agg(
    Total_Spending=('amount', 'sum'),
    Average_Amount=('amount', 'mean'),
    Purchase_Count=('amount', 'count'),
    Avg_Savings=('negotiation_savings', 'mean'),
).round(2)

print("Supplier Performance Analysis:")
print(supplier_analysis.sort_values('Total_Spending', ascending=False))

# Create supplier performance visualization
plt.figure(figsize=(12, 8))

# Subplot 1: Total spending by supplier
plt.subplot(2, 2, 1)
spending_by_supplier = df.groupby('supplier')['amount'].sum().sort_values(ascending=False)
spending_by_supplier.plot(kind='bar', color='lightgreen')
plt.title('Total Spending by Supplier')
plt.ylabel('Total Amount ($)')
plt.xticks(rotation=45)

# Subplot 2: Average savings by supplier
plt.subplot(2, 2, 2)
avg_savings = df.groupby('supplier')['negotiation_savings'].mean().sort_values(ascending=False)
avg_savings.plot(kind='bar', color='orange')
plt.title('Average Negotiation Savings by Supplier')
plt.ylabel('Average Savings (fraction)')
plt.xticks(rotation=45)

# Subplot 3: Purchase frequency by supplier
plt.subplot(2, 2, 3)
purchase_frequency = df['supplier'].value_counts()
purchase_frequency.plot(kind='bar', color='purple')
plt.title('Purchase Frequency by Supplier')
plt.ylabel('Number of Purchases')
plt.xticks(rotation=45)

plt.tight_layout()
plt.savefig('supplier_analysis.png')
plt.show()

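This supplier analysis shows which vendors account for the most spend and where negotiated savings are lowest, which helps procurement teams decide where to focus renegotiation. Note that "performance" here covers spend and savings only; delivery times or quality would require additional data.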
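The Avg_Savings column is an unweighted mean, so a 30% saving on a $100 order counts as much as a 30% saving on a $9,000 order. To measure savings in proportion to spend, you can weight by amount. This sketch assumes negotiation_savings is the fraction saved relative to amount, which the synthetic data does not actually define; weighted_savings_rate is a hypothetical helper.

```python
import pandas as pd


def weighted_savings_rate(df):
    """Spend-weighted savings rate per supplier.

    Assumes negotiation_savings is a fraction of the row's 'amount', so the
    dollars saved on a purchase are amount * negotiation_savings.
    """
    saved = df['amount'] * df['negotiation_savings']
    totals = df.assign(saved=saved).groupby('supplier')[['amount', 'saved']].sum()
    return totals['saved'] / totals['amount']


sample = pd.DataFrame({
    'supplier': ['A', 'A', 'B'],
    'amount': [100.0, 900.0, 200.0],
    'negotiation_savings': [0.1, 0.3, 0.05],
})
print(weighted_savings_rate(sample))
# Supplier A: (10 + 270) / 1000 = 0.28, versus an unweighted mean of 0.20
```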

7. Create a procurement intelligence dashboard

Build a simple dashboard to visualize key procurement insights:

# Create comprehensive dashboard
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('Procurement Intelligence Dashboard', fontsize=16)

# 1. Spending by category
spending_by_category = df.groupby('category')['amount'].sum().sort_values(ascending=False)
spending_by_category.plot(kind='bar', ax=axes[0,0], color='skyblue')
axes[0,0].set_title('Spending by Category')
axes[0,0].set_ylabel('Amount ($)')
axes[0,0].tick_params(axis='x', rotation=45)

# 2. Savings by category
savings_by_category = df.groupby('category')['negotiation_savings'].mean().sort_values(ascending=False)
savings_by_category.plot(kind='bar', ax=axes[0,1], color='lightcoral')
axes[0,1].set_title('Average Negotiation Savings by Category')
axes[0,1].set_ylabel('Average Savings (%)')
axes[0,1].tick_params(axis='x', rotation=45)

# 3. Spending trend over time
monthly_spending = df.groupby('month')['amount'].sum()
monthly_spending.plot(kind='line', marker='o', ax=axes[1,0])
axes[1,0].set_title('Monthly Spending Trend')
axes[1,0].set_ylabel('Amount ($)')
axes[1,0].grid(True)

# 4. Spending by supplier
supplier_performance = df.groupby('supplier')['amount'].sum().sort_values(ascending=False)
supplier_performance.plot(kind='bar', ax=axes[1,1], color='lightgreen')
axes[1,1].set_title('Total Spending by Supplier')
axes[1,1].set_ylabel('Total Amount ($)')
axes[1,1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.savefig('procurement_dashboard.png')
plt.show()

print("Dashboard created successfully!")

This dashboard puts spend by category, savings by category, the monthly spending trend, and spend by supplier on a single figure, so decision-makers can review the key procurement metrics at a glance.
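Charts are easier to read next to a handful of headline numbers. Here is a minimal sketch of such a summary. headline_kpis is a hypothetical helper, and the estimated savings figure assumes negotiation_savings is a fraction of amount.

```python
import pandas as pd


def headline_kpis(df):
    """Compute a few summary numbers to display alongside the dashboard.

    Assumes negotiation_savings is a fraction of 'amount'.
    """
    total_spend = df['amount'].sum()
    est_savings = (df['amount'] * df['negotiation_savings']).sum()
    category_spend = df.groupby('category')['amount'].sum()
    return {
        'total_spend': total_spend,
        'estimated_savings': est_savings,
        'purchase_count': len(df),
        'top_category': category_spend.idxmax(),
        'top_category_share': category_spend.max() / total_spend,
    }


sample = pd.DataFrame({
    'category': ['Services', 'Services', 'Hardware'],
    'amount': [600.0, 200.0, 200.0],
    'negotiation_savings': [0.1, 0.0, 0.2],
})
print(headline_kpis(sample))
```

Calling headline_kpis(df) on the cleaned dataset returns the same kind of dictionary, which you could print above the dashboard or add to the figure with matplotlib's fig.text.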

Summary

In this tutorial, we've built a procurement data analysis pipeline that illustrates the kind of analysis a large procurement dataset, like the one Vertice aims to build by acquiring Vendr, can support. We've learned how to:

Generate a synthetic procurement dataset
Perform data cleaning and preprocessing
Analyze spending patterns by category and time
Evaluate supplier performance and negotiation effectiveness
Build visual dashboards for procurement intelligence

This pipeline provides a foundation for more sophisticated procurement intelligence work that helps organizations optimize purchasing decisions and identify cost-saving opportunities. The same cleaning, grouping, and plotting techniques apply unchanged to real purchase records with similar columns.