Introduction
In the enterprise software space, companies like Rivvun AI are pioneering AI-driven solutions to recover lost revenue and optimize spending. This tutorial will guide you through building a basic enterprise spend recovery system using Python and machine learning concepts. We'll focus on creating a prototype that can identify potential revenue losses in contract data, similar to what Rivvun AI is working on.
Prerequisites
- Basic Python knowledge and experience with pandas and scikit-learn
- Python 3.7 or higher installed
- Required packages: pandas, scikit-learn, numpy, matplotlib
- Sample contract data in CSV format
Step-by-Step Instructions
1. Set Up Your Development Environment
First, we need to create a virtual environment and install the necessary packages. This ensures we have a clean, isolated environment for our project.
python -m venv spend_recovery_env
source spend_recovery_env/bin/activate # On Windows: spend_recovery_env\Scripts\activate
pip install pandas scikit-learn numpy matplotlib
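To confirm the installation worked, you can query the installed version of each package. The following is a minimal sketch, assuming Python 3.8 or newer (where importlib.metadata is in the standard library); the helper name check_packages is illustrative and not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names exactly as passed to pip in the step above
REQUIRED = ["pandas", "scikit-learn", "numpy", "matplotlib"]

def check_packages(names):
    """Return a dict mapping each package name to its version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

if __name__ == "__main__":
    for name, ver in check_packages(REQUIRED).items():
        print(f"{name}: {ver if ver else 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be installed again with pip inside the activated environment.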
2. Create Sample Contract Data
Before we can analyze contracts, we need sample data. Create a CSV file named contracts.csv with the following structure:
contract_id,company_name,contract_value,contract_start_date,contract_end_date,service_type,terms_compliance
C001,Acme Corp,100000,2023-01-01,2023-12-31,Software,Compliant
C002,Global Inc,250000,2023-03-15,2024-03-15,Consulting,Non-Compliant
C003,Future Tech,150000,2023-06-01,2024-06-01,Hardware,Compliant
C004,Alpha Ltd,300000,2023-02-01,2024-02-01,Software,Compliant
C005,Beta Corp,200000,2023-08-01,2024-08-01,Consulting,Non-Compliant
This dataset represents basic contract information that we'll use to identify potential revenue losses.
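If you prefer not to create the file by hand, the same rows can be written from Python. This is a small sketch using only the standard library; the function name write_sample_contracts is an illustrative choice, and the rows are exactly the sample rows listed above.

```python
import csv
from pathlib import Path

HEADER = ["contract_id", "company_name", "contract_value", "contract_start_date",
          "contract_end_date", "service_type", "terms_compliance"]

# Sample rows copied from the tutorial's contracts.csv listing
ROWS = [
    ["C001", "Acme Corp", 100000, "2023-01-01", "2023-12-31", "Software", "Compliant"],
    ["C002", "Global Inc", 250000, "2023-03-15", "2024-03-15", "Consulting", "Non-Compliant"],
    ["C003", "Future Tech", 150000, "2023-06-01", "2024-06-01", "Hardware", "Compliant"],
    ["C004", "Alpha Ltd", 300000, "2023-02-01", "2024-02-01", "Software", "Compliant"],
    ["C005", "Beta Corp", 200000, "2023-08-01", "2024-08-01", "Consulting", "Non-Compliant"],
]

def write_sample_contracts(path):
    """Write the header and sample rows to a CSV file and return its path."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(ROWS)
    return Path(path)

if __name__ == "__main__":
    print(f"Wrote {write_sample_contracts('contracts.csv')}")
```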
3. Load and Explore the Data
Now we'll load the data and perform basic exploratory analysis to understand what we're working with:
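import pandas as pd
# Parse the date columns up front so contract durations can be computed later
df = pd.read_csv('contracts.csv', parse_dates=['contract_start_date', 'contract_end_date'])
print(df.head())
df.info()  # info() prints its summary directly and returns None
print(df.describe())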
This step is crucial because understanding your data structure helps you identify patterns and anomalies that might indicate revenue loss opportunities.
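Beyond the default summaries, a few targeted checks are often useful before writing any detection rules. The sketch below is one possible set of checks, not a required step: it embeds the sample rows as a string so it runs on its own, and the derived column name duration_days is an assumption introduced here.

```python
import io
import pandas as pd

# The tutorial's sample rows, embedded so the snippet is self-contained
SAMPLE_CSV = """contract_id,company_name,contract_value,contract_start_date,contract_end_date,service_type,terms_compliance
C001,Acme Corp,100000,2023-01-01,2023-12-31,Software,Compliant
C002,Global Inc,250000,2023-03-15,2024-03-15,Consulting,Non-Compliant
C003,Future Tech,150000,2023-06-01,2024-06-01,Hardware,Compliant
C004,Alpha Ltd,300000,2023-02-01,2024-02-01,Software,Compliant
C005,Beta Corp,200000,2023-08-01,2024-08-01,Consulting,Non-Compliant
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV),
                 parse_dates=["contract_start_date", "contract_end_date"])

# Count missing values per column; gaps here would distort later rules
missing = df.isna().sum()

# Contract length in days (hypothetical helper column)
df["duration_days"] = (df["contract_end_date"] - df["contract_start_date"]).dt.days

# How many contracts fall into each compliance status
compliance_counts = df["terms_compliance"].value_counts()

# Total contract value per service type
value_by_service = df.groupby("service_type")["contract_value"].sum()

print(missing)
print(df[["contract_id", "duration_days"]])
print(compliance_counts)
print(value_by_service)
```

Note that contracts spanning 29 February 2024 come out at 366 days, which matters for any rule that compares duration against a 365-day threshold.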
4. Create Revenue Loss Detection Logic
Next, we'll implement a basic algorithm to detect potential revenue losses based on contract compliance and service types:
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
import numpy as np
def detect_revenue_loss(df):
    # Create a new column to flag potential revenue loss
    df['potential_loss'] = 0
    # Flag contracts with non-compliant terms
    df.loc[df['terms_compliance'] == 'Non-Compliant', 'potential_loss'] = 1
    # Compute contract duration; to_datetime also handles dates loaded as strings
    duration_days = (pd.to_datetime(df['contract_end_date']) - pd.to_datetime(df['contract_start_date'])).dt.days
    # Flag contracts with high value but short duration
    # (each comparison is parenthesized because & binds tighter than < and >)
    df.loc[(df['contract_value'] > 200000) & (duration_days < 365), 'potential_loss'] = 1
    # Flag specific service types that often lead to losses
    high_risk_services = ['Consulting', 'Support']
    df.loc[df['service_type'].isin(high_risk_services), 'potential_loss'] = 1
    return df
# Apply the detection function
df = detect_revenue_loss(df)
print(df[['contract_id', 'company_name', 'potential_loss']])
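This logic flags a contract when any of three heuristics applies: non-compliant terms, a value above 200,000 combined with a duration under 365 days, or a service type assumed to be high-risk. The thresholds and the high-risk list are illustrative and should be tuned to your own contract data. The 'potential_loss' flag will help prioritize which contracts need further review.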
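When reviewing flagged contracts, it helps to know which rule fired. The following sketch evaluates each rule as its own boolean column and then combines them; the rule names and the triggered_rules column are hypothetical, and the sample rows are embedded so the snippet runs on its own.

```python
import io
import pandas as pd

# The tutorial's sample rows, embedded so the snippet is self-contained
SAMPLE_CSV = """contract_id,company_name,contract_value,contract_start_date,contract_end_date,service_type,terms_compliance
C001,Acme Corp,100000,2023-01-01,2023-12-31,Software,Compliant
C002,Global Inc,250000,2023-03-15,2024-03-15,Consulting,Non-Compliant
C003,Future Tech,150000,2023-06-01,2024-06-01,Hardware,Compliant
C004,Alpha Ltd,300000,2023-02-01,2024-02-01,Software,Compliant
C005,Beta Corp,200000,2023-08-01,2024-08-01,Consulting,Non-Compliant
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV),
                 parse_dates=["contract_start_date", "contract_end_date"])

duration_days = (df["contract_end_date"] - df["contract_start_date"]).dt.days

# One boolean column per rule (rule names are illustrative)
rules = pd.DataFrame({
    "non_compliant": df["terms_compliance"] == "Non-Compliant",
    "high_value_short": (df["contract_value"] > 200000) & (duration_days < 365),
    "high_risk_service": df["service_type"].isin(["Consulting", "Support"]),
})

# A contract is flagged if any rule fires
df["potential_loss"] = rules.any(axis=1).astype(int)

# Record the names of the rules that fired, for reviewers
df["triggered_rules"] = rules.apply(
    lambda row: ", ".join(name for name, hit in row.items() if hit), axis=1
)

print(df[["contract_id", "potential_loss", "triggered_rules"]])
```

On the sample data only C002 and C005 are flagged, both through the compliance and service-type rules. The high-value, short-duration rule never fires: C004 runs exactly 365 days, which the strict `< 365` comparison excludes.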
5. Implement Machine Learning Model for Predictive Analysis
To make our system more intelligent, we'll train a machine learning model to predict revenue loss probability:
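# Prepare data for ML model
# Use one encoder per column so each label mapping can be inverted later
service_encoder = LabelEncoder()
compliance_encoder = LabelEncoder()
df['service_type_encoded'] = service_encoder.fit_transform(df['service_type'])
df['terms_compliance_encoded'] = compliance_encoder.fit_transform(df['terms_compliance'])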
columns_to_use = ['contract_value', 'service_type_encoded', 'terms_compliance_encoded']
X = df[columns_to_use]
y = df['potential_loss']
# Split data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# Evaluate model
accuracy = model.score(X_test, y_test)
print(f'Model Accuracy: {accuracy:.2f}')
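Random Forest is a reasonable baseline for classification: averaging many decision trees reduces the overfitting that a single tree is prone to, and the model needs no feature scaling. Two caveats apply to this prototype. With only five sample rows, test_size=0.2 holds out a single contract, so the reported accuracy can only be 0.00 or 1.00. And because potential_loss was derived by rules from the same columns used as features, the model mostly learns to reproduce those rules; in a real application the labels should come from confirmed loss outcomes.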
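A single small hold-out split gives a noisy accuracy estimate. One common alternative is stratified k-fold cross-validation, sketched below on a larger table. The data here is randomly generated purely for illustration (it is not real contract data), and the labels are assumed to follow two of the tutorial's rules, so a high score is expected and says little about real-world performance.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for a larger contract table (randomly generated, not real data)
rng = np.random.default_rng(42)
n_contracts = 200
X = pd.DataFrame({
    "contract_value": rng.integers(50_000, 400_000, size=n_contracts),
    # LabelEncoder sorts classes alphabetically: 0=Consulting, 1=Hardware, 2=Software
    "service_type_encoded": rng.integers(0, 3, size=n_contracts),
    # 0=Compliant, 1=Non-Compliant
    "terms_compliance_encoded": rng.integers(0, 2, size=n_contracts),
})

# Assumed labels: non-compliant terms or a Consulting contract
y = ((X["terms_compliance_encoded"] == 1) | (X["service_type_encoded"] == 0)).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=42)

# Stratification keeps the class ratio similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"Accuracy per fold: {np.round(scores, 2)}")
print(f"Mean accuracy: {scores.mean():.2f} (std {scores.std():.2f})")
```

Reporting the mean and spread across folds gives a more honest picture than one split, especially when the dataset is small or the classes are imbalanced.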
6. Generate Spend Recovery Report
Finally, we'll create a comprehensive report that highlights potential revenue recovery opportunities:
def generate_recovery_report(df, model):
    # Predict potential losses using ML model
    X_pred = df[columns_to_use]
    df['loss_probability'] = model.predict_proba(X_pred)[:, 1]
    # Identify high-risk contracts
    high_risk = df[df['loss_probability'] > 0.5]
    # Create summary statistics
    total_value = df['contract_value'].sum()
    potential_loss_value = high_risk['contract_value'].sum()
    print('=== Enterprise Spend Recovery Report ===')
    print(f'Total Contract Value: ${total_value:,}')
    print(f'Potential Loss Value: ${potential_loss_value:,}')
    # Total minus the at-risk value is what remains unflagged, not a recovery estimate
    print(f'Contract Value Not Flagged: ${total_value - potential_loss_value:,}')
    print('\nHigh-Risk Contracts:')
    print(high_risk[['contract_id', 'company_name', 'contract_value', 'loss_probability']])
    return high_risk
# Generate the report
high_risk_contracts = generate_recovery_report(df, model)
This report provides actionable insights that enterprise decision-makers can use to prioritize recovery efforts.
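One possible way to prioritize the flagged contracts is to rank them by probability-weighted value, so that a large contract with a moderate risk can outrank a small contract with a high risk. The sketch below is an assumption layered on top of the report, not part of it: the function name rank_by_expected_loss and the expected_loss column are hypothetical, and the probabilities in the example are placeholders invented for illustration rather than model output.

```python
import pandas as pd

def rank_by_expected_loss(df, threshold=0.5):
    """Keep contracts above the probability threshold, ranked by value x probability."""
    ranked = df.copy()  # leave the caller's DataFrame untouched
    ranked["expected_loss"] = ranked["contract_value"] * ranked["loss_probability"]
    ranked = ranked[ranked["loss_probability"] > threshold]
    return ranked.sort_values("expected_loss", ascending=False).reset_index(drop=True)

example = pd.DataFrame({
    "contract_id": ["C001", "C002", "C003", "C004", "C005"],
    "contract_value": [100000, 250000, 150000, 300000, 200000],
    # Placeholder probabilities, invented for illustration only (not model output)
    "loss_probability": [0.10, 0.90, 0.20, 0.30, 0.80],
})

ranked = rank_by_expected_loss(example)
print(ranked[["contract_id", "contract_value", "loss_probability", "expected_loss"]])
```

In the tutorial's flow, the DataFrame passed in would be the one that generate_recovery_report has already extended with a loss_probability column.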
7. Visualize Results
Visualizing our findings helps stakeholders understand the data more effectively:
import matplotlib.pyplot as plt
# Plot contract values by risk level
plt.figure(figsize=(10, 6))
plt.hist([df[df['potential_loss']==1]['contract_value'], df[df['potential_loss']==0]['contract_value']],
         bins=10, label=['High Risk', 'Low Risk'], alpha=0.7)
plt.xlabel('Contract Value')
plt.ylabel('Frequency')
plt.title('Distribution of Contract Values by Risk Level')
plt.legend()
plt.show()
Charts and graphs make complex data more accessible and help communicate findings to non-technical stakeholders.
Summary
This tutorial demonstrated how to build a foundational spend recovery system that mimics the core functionality of enterprise AI solutions like Rivvun AI. We created a system that:
- Loads and explores enterprise contract data
- Identifies potential revenue losses through rule-based logic
- Implements machine learning for predictive analysis
- Generates comprehensive recovery reports
- Visualizes findings for stakeholder communication
While this is a simplified prototype, it demonstrates the core principles that enterprise AI systems use to identify and recover lost revenue. In real-world applications, such systems would integrate with enterprise resource planning (ERP) systems, process large volumes of data, and continuously learn from new information to improve accuracy over time.



