Icertis veterans raise $7.55 million to build the AI layer that recovers money enterprises don’t know they’re losing

Learn to build a basic enterprise spend recovery system that identifies potential revenue losses in contract data using Python and machine learning.

Introduction

In the enterprise software space, companies like Rivvun AI are pioneering AI-driven solutions to recover lost revenue and optimize spending. This tutorial will guide you through building a basic enterprise spend recovery system using Python and machine learning concepts. We'll focus on creating a prototype that can identify potential revenue losses in contract data, similar to what Rivvun AI is working on.

Prerequisites

Basic Python knowledge and experience with pandas and scikit-learn
Python 3.7 or higher installed
Required packages: pandas, scikit-learn, numpy, matplotlib
Sample contract data in CSV format

Step-by-Step Instructions

1. Set Up Your Development Environment

First, we need to create a virtual environment and install the necessary packages. This ensures we have a clean, isolated environment for our project.

python -m venv spend_recovery_env
source spend_recovery_env/bin/activate  # On Windows: spend_recovery_env\Scripts\activate
pip install pandas scikit-learn numpy matplotlib

2. Create Sample Contract Data

Before we can analyze contracts, we need sample data. Create a CSV file named contracts.csv with the following structure:

contract_id,company_name,contract_value,contract_start_date,contract_end_date,service_type,terms_compliance
C001,Acme Corp,100000,2023-01-01,2023-12-31,Software,Compliant
C002,Global Inc,250000,2023-03-15,2024-03-15,Consulting,Non-Compliant
C003,Future Tech,150000,2023-06-01,2024-06-01,Hardware,Compliant
C004,Alpha Ltd,300000,2023-02-01,2024-02-01,Software,Compliant
C005,Beta Corp,200000,2023-08-01,2024-08-01,Consulting,Non-Compliant

This dataset represents basic contract information that we'll use to identify potential revenue losses.
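If you would rather generate the file than type it by hand, the following is a minimal sketch using only the standard library. The rows are copied from the sample above; the helper name write_sample_contracts is an illustrative choice, not part of any library.

```python
import csv

# Column names and rows copied from the sample data in this step
FIELDNAMES = [
    "contract_id", "company_name", "contract_value",
    "contract_start_date", "contract_end_date",
    "service_type", "terms_compliance",
]
ROWS = [
    ("C001", "Acme Corp", 100000, "2023-01-01", "2023-12-31", "Software", "Compliant"),
    ("C002", "Global Inc", 250000, "2023-03-15", "2024-03-15", "Consulting", "Non-Compliant"),
    ("C003", "Future Tech", 150000, "2023-06-01", "2024-06-01", "Hardware", "Compliant"),
    ("C004", "Alpha Ltd", 300000, "2023-02-01", "2024-02-01", "Software", "Compliant"),
    ("C005", "Beta Corp", 200000, "2023-08-01", "2024-08-01", "Consulting", "Non-Compliant"),
]


def write_sample_contracts(path="contracts.csv"):
    """Write the five sample contracts to `path` and return the row count."""
    # newline="" prevents blank lines between rows on Windows
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDNAMES)
        writer.writerows(ROWS)
    return len(ROWS)


rows_written = write_sample_contracts()
print(f"Wrote {rows_written} contracts to contracts.csv")
```

Running the script from your project directory creates contracts.csv next to it, ready for the loading step.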

3. Load and Explore the Data

Now we'll load the data and perform basic exploratory analysis to understand what we're working with:

import pandas as pd

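# Parse the date columns so contract durations can be computed later
df = pd.read_csv('contracts.csv', parse_dates=['contract_start_date', 'contract_end_date'])
print(df.head())
df.info()  # info() prints its summary directly and returns None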
print(df.describe())

This step is crucial because understanding your data structure helps you identify patterns and anomalies that might indicate revenue loss opportunities.
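As a sketch of what looking for patterns and anomalies can mean in practice, the snippet below checks for missing values, derives each contract's duration, and totals contract value by compliance status. It embeds the sample rows inline so it runs on its own; the column name duration_days is an assumption introduced here for illustration.

```python
import io

import pandas as pd

# Inline copy of the sample data so this snippet is self-contained
SAMPLE_CSV = """contract_id,company_name,contract_value,contract_start_date,contract_end_date,service_type,terms_compliance
C001,Acme Corp,100000,2023-01-01,2023-12-31,Software,Compliant
C002,Global Inc,250000,2023-03-15,2024-03-15,Consulting,Non-Compliant
C003,Future Tech,150000,2023-06-01,2024-06-01,Hardware,Compliant
C004,Alpha Ltd,300000,2023-02-01,2024-02-01,Software,Compliant
C005,Beta Corp,200000,2023-08-01,2024-08-01,Consulting,Non-Compliant
"""

contracts = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    parse_dates=["contract_start_date", "contract_end_date"],
)

# Count missing values per column; gaps here would undermine any later rule
missing_per_column = contracts.isna().sum()

# Contract length in days (hypothetical helper column)
contracts["duration_days"] = (
    contracts["contract_end_date"] - contracts["contract_start_date"]
).dt.days

# Number of contracts and total value for each compliance status
by_compliance = contracts.groupby("terms_compliance")["contract_value"].agg(["count", "sum"])

print(missing_per_column)
print(contracts[["contract_id", "duration_days"]])
print(by_compliance)
```

On the sample data, the durations cluster around one year, which matters for any rule that uses a 365-day cutoff.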

4. Create Revenue Loss Detection Logic

Next, we'll implement a basic algorithm to detect potential revenue losses based on contract compliance and service types:

from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
import numpy as np

def detect_revenue_loss(df):
    # Create a new column to flag potential revenue loss
    df['potential_loss'] = 0
    
    # Flag contracts with non-compliant terms
    df.loc[df['terms_compliance'] == 'Non-Compliant', 'potential_loss'] = 1
    
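    # Flag high-value contracts that run for less than a year.
    # pd.to_datetime handles date columns that were loaded as plain strings.
    duration_days = (pd.to_datetime(df['contract_end_date']) - pd.to_datetime(df['contract_start_date'])).dt.days
    df.loc[(df['contract_value'] > 200000) & (duration_days < 365), 'potential_loss'] = 1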
    
    # Flag specific service types that often lead to losses
    high_risk_services = ['Consulting', 'Support']
    df.loc[df['service_type'].isin(high_risk_services), 'potential_loss'] = 1
    
    return df

# Apply the detection function
df = detect_revenue_loss(df)
print(df[['contract_id', 'company_name', 'potential_loss']])

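This logic flags potential revenue losses using three illustrative rules: non-compliant terms, high-value contracts shorter than a year, and service types assumed to be high-risk. The value threshold, duration cutoff, and service list are placeholders to tune against your own data. The 'potential_loss' flag helps prioritize which contracts need further review.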
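When reviewing flagged contracts, it can help to see which rule fired. The sketch below is one possible variant, not the method above: the function name explain_flags and its parameters are hypothetical, and it applies the same three rules to an inline copy of the sample data.

```python
import pandas as pd

# Inline copy of the relevant sample columns
contracts = pd.DataFrame({
    "contract_id": ["C001", "C002", "C003", "C004", "C005"],
    "contract_value": [100000, 250000, 150000, 300000, 200000],
    "contract_start_date": ["2023-01-01", "2023-03-15", "2023-06-01", "2023-02-01", "2023-08-01"],
    "contract_end_date": ["2023-12-31", "2024-03-15", "2024-06-01", "2024-02-01", "2024-08-01"],
    "service_type": ["Software", "Consulting", "Hardware", "Software", "Consulting"],
    "terms_compliance": ["Compliant", "Non-Compliant", "Compliant", "Compliant", "Non-Compliant"],
})


def explain_flags(df, value_threshold=200_000, max_days=365,
                  high_risk_services=("Consulting", "Support")):
    """Return one boolean column per rule plus the combined potential_loss flag."""
    duration_days = (
        pd.to_datetime(df["contract_end_date"]) - pd.to_datetime(df["contract_start_date"])
    ).dt.days
    rules = pd.DataFrame({
        "non_compliant": df["terms_compliance"] == "Non-Compliant",
        "high_value_short": (df["contract_value"] > value_threshold) & (duration_days < max_days),
        "high_risk_service": df["service_type"].isin(list(high_risk_services)),
    })
    # A contract is flagged when at least one rule fires
    rules["potential_loss"] = rules.any(axis=1).astype(int)
    rules.insert(0, "contract_id", df["contract_id"])
    return rules


flags = explain_flags(contracts)
print(flags)
```

On the sample data only C002 and C005 are flagged, each by both the compliance rule and the service-type rule. C004 exceeds the value threshold but runs exactly 365 days, so the strict less-than comparison leaves it unflagged.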

5. Implement Machine Learning Model for Predictive Analysis

To make our system more intelligent, we'll train a machine learning model to predict revenue loss probability:

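# Prepare data for ML model.
# Use one encoder per column so each keeps its own category mapping.
service_encoder = LabelEncoder()
compliance_encoder = LabelEncoder()
df['service_type_encoded'] = service_encoder.fit_transform(df['service_type'])
df['terms_compliance_encoded'] = compliance_encoder.fit_transform(df['terms_compliance'])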

columns_to_use = ['contract_value', 'service_type_encoded', 'terms_compliance_encoded']
X = df[columns_to_use]
y = df['potential_loss']

# Split data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate model
accuracy = model.score(X_test, y_test)
print(f'Model Accuracy: {accuracy:.2f}')

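Random Forest is a reasonable baseline for classification and is relatively resistant to overfitting compared with a single decision tree. Two limitations of this prototype are worth keeping in mind: the labels come from the rules in step 4, so the model mostly relearns those rules, and with only five sample rows the test set holds a single contract, so the accuracy figure is not meaningful. Use a larger, independently labeled dataset before drawing conclusions.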
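One alternative worth considering is one-hot encoding the categorical columns inside a scikit-learn Pipeline, since LabelEncoder is documented as a tool for encoding target labels rather than input features. The sketch below is an assumption about how that could look, not the approach used above; the training labels are the rule-based flags for the five sample contracts, and the new contract is made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Sample contracts with the rule-based labels from step 4
train = pd.DataFrame({
    "contract_value": [100000, 250000, 150000, 300000, 200000],
    "service_type": ["Software", "Consulting", "Hardware", "Software", "Consulting"],
    "terms_compliance": ["Compliant", "Non-Compliant", "Compliant", "Compliant", "Non-Compliant"],
    "potential_loss": [0, 1, 0, 0, 1],
})

categorical = ["service_type", "terms_compliance"]

# One-hot encode the categorical columns; pass contract_value through unchanged.
# handle_unknown="ignore" encodes categories unseen during fit as all zeros.
preprocess = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",
)

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=42)),
])

X = train.drop(columns="potential_loss")
y = train["potential_loss"]
pipeline.fit(X, y)

# Hypothetical new contract with a service type the model has not seen
new_contract = pd.DataFrame({
    "contract_value": [180000],
    "service_type": ["Support"],
    "terms_compliance": ["Non-Compliant"],
})
loss_probability = pipeline.predict_proba(new_contract)[0, 1]
print(f"Estimated loss probability: {loss_probability:.2f}")
```

Keeping the encoding inside the pipeline means the same transformation is applied at training and prediction time, and an unseen category does not raise an error.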

6. Generate Spend Recovery Report

Finally, we'll create a comprehensive report that highlights potential revenue recovery opportunities:

def generate_recovery_report(df, model):
    # Predict potential losses using ML model
    X_pred = df[columns_to_use]
    df['loss_probability'] = model.predict_proba(X_pred)[:, 1]
    
    # Identify high-risk contracts
    high_risk = df[df['loss_probability'] > 0.5]
    
    # Create summary statistics
    total_value = df['contract_value'].sum()
    potential_loss_value = high_risk['contract_value'].sum()
    
    print('=== Enterprise Spend Recovery Report ===')
    print(f'Total Contract Value: ${total_value:,}')
    print(f'Potential Loss Value: ${potential_loss_value:,}')
    # The flagged value is the amount to review, so report it as a share of the total
    print(f'Share of Contract Value at Risk: {potential_loss_value / total_value:.1%}')
    
    print('\nHigh-Risk Contracts:')
    print(high_risk[['contract_id', 'company_name', 'contract_value', 'loss_probability']])
    
    return high_risk

# Generate the report
high_risk_contracts = generate_recovery_report(df, model)

This report provides actionable insights that enterprise decision-makers can use to prioritize recovery efforts.
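One practical caveat: indexing predict_proba(...)[:, 1] assumes the model saw both classes during training. With very small datasets the training split can contain a single class, in which case predict_proba returns only one column. The helper below is a hedged sketch of a guard for that case; the name positive_class_probability and the toy arrays are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def positive_class_probability(model, X, positive_label=1):
    """Return P(label == positive_label) per row, or zeros if the model never saw that label."""
    classes = list(model.classes_)
    proba = model.predict_proba(X)
    if positive_label not in classes:
        # The model was trained without any positive examples
        return np.zeros(len(proba))
    return proba[:, classes.index(positive_label)]


# Toy case 1: every training label is 0, so predict_proba has a single column
X_single = np.array([[100000], [150000], [300000]])
single_class_model = RandomForestClassifier(n_estimators=10, random_state=0)
single_class_model.fit(X_single, [0, 0, 0])
single_result = positive_class_probability(single_class_model, X_single)

# Toy case 2: both labels are present, so the positive-class column is returned
X_both = np.array([[100000], [150000], [300000], [250000]])
two_class_model = RandomForestClassifier(n_estimators=10, random_state=0)
two_class_model.fit(X_both, [0, 0, 1, 1])
both_result = positive_class_probability(two_class_model, X_both)

print(single_result)
print(both_result)
```

Looking the column up through model.classes_ rather than assuming position 1 also keeps the report correct if the labels are ever something other than 0 and 1.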

7. Visualize Results

Visualizing our findings helps stakeholders understand the data more effectively:

import matplotlib.pyplot as plt

# Plot contract values by risk level
plt.figure(figsize=(10, 6))
plt.hist([df[df['potential_loss']==1]['contract_value'], df[df['potential_loss']==0]['contract_value']], 
         bins=10, label=['High Risk', 'Low Risk'], alpha=0.7)
plt.xlabel('Contract Value')
plt.ylabel('Frequency')
plt.title('Distribution of Contract Values by Risk Level')
plt.legend()
plt.show()

Charts and graphs make complex data more accessible and help communicate findings to non-technical stakeholders.
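With only a handful of contracts, a histogram can look sparse. As an alternative sketch, the snippet below totals contract value per risk level and saves a bar chart to a file. The file name value_by_risk.png is an arbitrary choice, and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove to display with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Sample contract values with the rule-based flags from step 4
flagged = pd.DataFrame({
    "contract_value": [100000, 250000, 150000, 300000, 200000],
    "potential_loss": [0, 1, 0, 0, 1],
})

# Total contract value per risk level, with readable labels
totals = (
    flagged.groupby("potential_loss")["contract_value"]
    .sum()
    .rename(index={0: "Low Risk", 1: "High Risk"})
)

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(list(totals.index), totals.values, color=["tab:green", "tab:red"])
ax.set_xlabel("Risk Level")
ax.set_ylabel("Total Contract Value ($)")
ax.set_title("Total Contract Value by Risk Level")
fig.tight_layout()
fig.savefig("value_by_risk.png")
plt.close(fig)

print(totals)
```

A saved image is also easier to attach to a report than an interactive window.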

Summary

This tutorial demonstrated how to build a foundational spend recovery prototype that illustrates, in simplified form, the kind of analysis that enterprise AI products such as Rivvun AI are working on. We created a system that:

Loads and explores enterprise contract data
Identifies potential revenue losses through rule-based logic
Implements machine learning for predictive analysis
Generates comprehensive recovery reports
Visualizes findings for stakeholder communication

While this is a simplified prototype, it demonstrates the core principles that enterprise AI systems use to identify and recover lost revenue. In real-world applications, such systems would integrate with enterprise resource planning (ERP) systems, process large volumes of data, and continuously learn from new information to improve accuracy over time.