Two ex-McKinsey founders raise $4.1M from Seedcamp to give boards an AI analyst that monitors corporate reputation in real time

April 2, 2026 · 6 min read

Learn to build an AI-powered reputation monitoring system that collects data from multiple sources, analyzes sentiment and topics, and generates executive summaries similar to Omniscient's platform.

Introduction

In this tutorial, we'll build a simplified version of an AI-powered reputation monitoring system similar to what Omniscient is developing. You'll learn how to ingest data from multiple sources, process it with natural language processing, and generate executive summaries. The system demonstrates the core concepts behind decision intelligence platforms that help boards monitor corporate reputation in real time.

Prerequisites

  • Python 3.8+
  • Basic understanding of APIs and web scraping
  • Knowledge of natural language processing concepts
  • Installed libraries: requests, newspaper3k, transformers, pandas, numpy

Step-by-step Instructions

1. Setting up the Project Structure

First, create a project directory and install the required dependencies:

mkdir omniscient_reputation_monitor
cd omniscient_reputation_monitor
pip install requests newspaper3k transformers pandas numpy

This creates our project workspace and installs essential libraries for data collection, NLP processing, and data manipulation.
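If you want reproducible installs, you could also pin the dependencies in a requirements.txt. The version floors below are illustrative assumptions, not tested minimums:

```
requests>=2.28
newspaper3k>=0.2.8
transformers>=4.30
pandas>=1.5
numpy>=1.24
```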

1.1 Create main Python files

Create the following files in your project directory:

  • data_ingestion.py
  • nlp_processor.py
  • briefing_generator.py
  • main.py

2. Implementing Data Ingestion

2.1 Create data ingestion module

In data_ingestion.py, we'll build a system that collects news articles, social media posts, and press releases:

import requests
import json
from datetime import datetime
from newspaper import Article


class DataIngestor:
    def __init__(self):
        self.sources = []
        
    def add_news_source(self, url, source_type):
        self.sources.append({
            'url': url,
            'type': source_type,
            'timestamp': datetime.now()
        })
        
    def fetch_articles(self):
        articles = []
        
        for source in self.sources:
            try:
                if source['type'] == 'news':
                    article = Article(source['url'])
                    article.download()
                    article.parse()
                    
                    articles.append({
                        'title': article.title,
                        'content': article.text,
                        'url': source['url'],
                        'source_type': source['type'],
                        'timestamp': source['timestamp']
                    })
                elif source['type'] == 'api':
                    # Set a timeout and fail loudly on HTTP errors instead of hanging
                    response = requests.get(source['url'], timeout=10)
                    response.raise_for_status()
                    data = response.json()
                    
                    for item in data.get('articles', []):
                        articles.append({
                            'title': item.get('title', ''),
                            'content': item.get('content', ''),
                            'url': item.get('url', ''),
                            'source_type': source['type'],
                            'timestamp': source['timestamp']
                        })
            except Exception as e:
                print(f"Error fetching from {source['url']}: {e}")
                
        return articles

# Example usage
if __name__ == '__main__':
    ingestor = DataIngestor()
    ingestor.add_news_source('https://www.reuters.com/technology/', 'news')
    articles = ingestor.fetch_articles()
    print(f"Fetched {len(articles)} articles")

This module demonstrates how to collect data from different kinds of sources: traditional news websites (via scraping) and API endpoints. Handling both in one ingestor mirrors, on a small scale, how Omniscient ingests content from 100,000+ sources.
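With many feeds configured, the same story often arrives more than once. As a small stdlib-only extension (not part of the original module), you could deduplicate the fetched list by URL before processing:

```python
def deduplicate_articles(articles):
    """Keep only the first article seen for each URL."""
    seen_urls = set()
    unique = []
    for article in articles:
        url = article.get('url')
        if url in seen_urls:
            continue  # already collected this story
        seen_urls.add(url)
        unique.append(article)
    return unique


batch = [
    {'title': 'Launch coverage', 'url': 'http://example.com/a'},
    {'title': 'Launch coverage (syndicated)', 'url': 'http://example.com/a'},
    {'title': 'Earnings report', 'url': 'http://example.com/b'},
]
print(len(deduplicate_articles(batch)))  # → 2
```

Calling this between fetch_articles() and process_article() keeps the sentiment statistics from being skewed by syndicated copies.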

3. Building the NLP Processor

3.1 Create NLP processor module

In nlp_processor.py, we'll implement text analysis capabilities:

import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
import pandas as pd
from collections import Counter


class NLPProcessor:
    def __init__(self):
        # Initialize sentiment analysis pipeline
        self.sentiment_pipeline = pipeline("sentiment-analysis", 
                                          model="cardiffnlp/twitter-roberta-base-sentiment-latest")
        
        # Initialize zero-shot classification for topic detection
        self.classifier = pipeline("zero-shot-classification", 
                                  model="facebook/bart-large-mnli")
        
    def analyze_sentiment(self, text):
        try:
            result = self.sentiment_pipeline(text[:512])  # crude truncation (characters, not tokens) to stay under the model limit
            return result[0]['label'], result[0]['score']
        except Exception as e:
            print(f"Sentiment analysis error: {e}")
            return 'neutral', 0.0
            
    def detect_topics(self, text, candidate_labels):
        try:
            result = self.classifier(text[:512], candidate_labels)
            return result['labels'][0], result['scores'][0]
        except Exception as e:
            print(f"Topic detection error: {e}")
            return 'unknown', 0.0
            
    def process_article(self, article):
        # Extract key information
        sentiment_label, sentiment_score = self.analyze_sentiment(article['content'])
        
        # Define potential topics
        topics = ['financial', 'legal', 'product', 'management', 'technology', 'regulatory']
        detected_topic, topic_score = self.detect_topics(article['content'], topics)
        
        return {
            'title': article['title'],
            'sentiment': sentiment_label,
            'sentiment_score': sentiment_score,
            'topic': detected_topic,
            'topic_score': topic_score,
            'source': article['url'],
            'timestamp': article['timestamp']
        }

# Example usage
if __name__ == '__main__':
    processor = NLPProcessor()
    sample_article = {
        'title': 'Company X Announces New Product Launch',
        'content': 'Company X has announced a revolutionary new product that will transform the industry. The launch has received positive feedback from analysts.',
        'url': 'http://example.com',
        'timestamp': '2023-01-01'
    }
    result = processor.process_article(sample_article)
    print(result)

This processor uses transformer models to analyze sentiment and detect topics in news content. It mimics how Omniscient processes and synthesizes information from multiple sources.
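The transformer pipelines download large model weights on first use. If you just want to smoke-test the rest of the workflow without them, a crude lexicon-based stand-in for analyze_sentiment can work. The word lists below are illustrative assumptions for this tutorial, not anything from Omniscient's stack:

```python
POSITIVE_WORDS = {'positive', 'growth', 'revolutionary', 'praise', 'strong'}
NEGATIVE_WORDS = {'lawsuit', 'decline', 'breach', 'scandal', 'weak'}

def lexicon_sentiment(text):
    """Toy sentiment scorer: count matches against small word lists."""
    words = [w.strip('.,!?').lower() for w in text.split()]
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    if pos > neg:
        return 'positive', pos / (pos + neg)
    if neg > pos:
        return 'negative', neg / (pos + neg)
    return 'neutral', 0.0


label, score = lexicon_sentiment('A revolutionary product launch with strong growth')
print(label)  # → positive
```

It returns the same (label, score) shape as analyze_sentiment, so you can swap it in while prototyping and switch back to the transformer pipeline later.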

4. Creating the Executive Briefing Generator

4.1 Implement briefing generator

In briefing_generator.py, we'll create a system that synthesizes information into executive summaries:

import pandas as pd
from collections import defaultdict
import numpy as np


class BriefingGenerator:
    def __init__(self):
        self.articles = []
        
    def add_article(self, article):
        self.articles.append(article)
        
    def generate_briefing(self):
        if not self.articles:
            return "No articles to process"
            
        # Convert to DataFrame for easier analysis
        df = pd.DataFrame(self.articles)
        
        # Analyze sentiment distribution
        sentiment_counts = df['sentiment'].value_counts()
        
        # Analyze topic distribution
        topic_counts = df['topic'].value_counts()
        
        # Calculate average sentiment score
        avg_sentiment = df['sentiment_score'].mean()
        
        # Generate summary
        summary = {
            'timestamp': pd.Timestamp.now(),
            'total_articles': len(self.articles),
            'sentiment_distribution': sentiment_counts.to_dict(),
            'topic_distribution': topic_counts.to_dict(),
            'average_sentiment_score': avg_sentiment,
            'key_topics': list(topic_counts.index[:3]),
            'key_sentiments': list(sentiment_counts.index[:2])
        }
        
        return self._format_briefing(summary)
        
    def _format_briefing(self, summary):
        briefing = f"\n=== Executive Briefing ===\n"
        briefing += f"Generated on: {summary['timestamp']}\n"
        briefing += f"Total Articles Processed: {summary['total_articles']}\n\n"
        
        briefing += "Sentiment Analysis:\n"
        for sentiment, count in summary['sentiment_distribution'].items():
            briefing += f"  - {sentiment}: {count} articles\n"
        
        briefing += "\nTopic Distribution:\n"
        for topic, count in summary['topic_distribution'].items():
            briefing += f"  - {topic}: {count} articles\n"
        
        briefing += f"\nAverage Sentiment Score: {summary['average_sentiment_score']:.2f}\n"
        briefing += f"Key Topics: {', '.join(summary['key_topics'])}\n"
        briefing += f"Key Sentiments: {', '.join(summary['key_sentiments'])}\n"
        
        return briefing

# Example usage
if __name__ == '__main__':
    generator = BriefingGenerator()
    # Add sample articles
    generator.add_article({
        'title': 'Article 1',
        'sentiment': 'positive',
        'sentiment_score': 0.8,
        'topic': 'product',
        'topic_score': 0.9,
        'url': 'http://example.com',
        'timestamp': '2023-01-01'
    })
    
    briefing = generator.generate_briefing()
    print(briefing)

This component demonstrates how to synthesize information into a concise executive briefing, similar to the two-minute summaries mentioned in the Omniscient case study.
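If a board wants a single headline number on top of the briefing, one option is a net sentiment index over the label counts. This is a convention we're introducing here for illustration, not something described in the article:

```python
def net_sentiment_index(sentiment_counts):
    """Map sentiment label counts to a single score in [-1, 1]."""
    positive = sentiment_counts.get('positive', 0)
    negative = sentiment_counts.get('negative', 0)
    total = sum(sentiment_counts.values())
    return (positive - negative) / total if total else 0.0


# 6 positive, 2 negative, 2 neutral articles → (6 - 2) / 10
print(net_sentiment_index({'positive': 6, 'negative': 2, 'neutral': 2}))  # → 0.4
```

The summary dict in generate_briefing already carries sentiment_distribution, so this index could be computed there and tracked across briefings to spot reputation trends.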

5. Integrating Everything Together

5.1 Create main execution file

In main.py, we'll tie everything together:

from data_ingestion import DataIngestor
from nlp_processor import NLPProcessor
from briefing_generator import BriefingGenerator


def main():
    print("Starting Omniscient-style Reputation Monitor")
    
    # Initialize components
    ingestor = DataIngestor()
    processor = NLPProcessor()
    generator = BriefingGenerator()
    
    # Add sample data sources
    ingestor.add_news_source('https://www.reuters.com/technology/', 'news')
    ingestor.add_news_source('https://api.example-news.com/articles?query=corporate', 'api')
    
    # Fetch articles
    print("Fetching articles...")
    articles = ingestor.fetch_articles()
    print(f"Fetched {len(articles)} articles")
    
    # Process articles
    print("Processing articles...")
    processed_articles = []
    
    for article in articles:
        processed = processor.process_article(article)
        processed_articles.append(processed)
        generator.add_article(processed)
        
    # Generate briefing
    print("Generating executive briefing...")
    briefing = generator.generate_briefing()
    print(briefing)
    
    # Save to file
    with open('executive_briefing.txt', 'w') as f:
        f.write(briefing)
    
    print("Briefing saved to executive_briefing.txt")

if __name__ == '__main__':
    main()

This main script orchestrates the entire workflow from data collection to briefing generation, demonstrating the end-to-end process of an AI-powered decision intelligence platform.
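A real monitor runs continuously rather than once. A minimal polling loop you could wrap around main() might look like this; the interval and stopping condition are illustrative, and production systems would use a proper scheduler:

```python
import time

def run_periodically(job, interval_seconds, max_runs=None):
    """Call `job` repeatedly, sleeping between runs; stop after max_runs if set."""
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        time.sleep(interval_seconds)
    return runs


# Example: run a stand-in job three times with no delay
calls = []
run_periodically(lambda: calls.append(1), interval_seconds=0, max_runs=3)
print(len(calls))  # → 3
```

In practice you'd pass main as the job with an interval of, say, 3600 seconds and max_runs=None to poll hourly until interrupted.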

6. Running the System

6.1 Execute the complete system

Run your complete system with:

python main.py

This will execute the entire workflow, showing how data flows from ingestion through processing to final briefing generation.

6.2 Test with real data

Modify the add_news_source calls in main.py to point at actual news sites or APIs. Adding more feeds shows how the same architecture scales toward the 100,000+ sources Omniscient ingests.
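When wiring up an API source, it helps to build the query URL programmatically. The endpoint below is the placeholder from main.py, and the parameter names mimic common news-API conventions; both are assumptions, so check your provider's documentation:

```python
from urllib.parse import urlencode

def build_query_url(base_url, company, from_date, api_key):
    """Assemble a query URL; parameter names follow common news-API conventions."""
    params = {'q': company, 'from': from_date, 'apiKey': api_key}
    return f"{base_url}?{urlencode(params)}"


url = build_query_url('https://api.example-news.com/articles',
                      'Acme Corp', '2026-04-01', 'YOUR_KEY')
print(url)
```

urlencode handles escaping for you (spaces, ampersands), which matters once company names contain punctuation.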

Summary

In this tutorial, you've built a simplified version of an AI-powered reputation monitoring system similar to Omniscient. You've learned how to:

  • Ingest data from multiple sources including news websites and APIs
  • Process text using transformer-based NLP models for sentiment and topic analysis
  • Aggregate and synthesize information into executive summaries
  • Build a modular system that can be extended with additional features

This implementation demonstrates the core concepts of decision intelligence platforms that help executives make informed decisions by monitoring corporate reputation in real time. While simplified, it shows the fundamental architecture that powers sophisticated systems like Omniscient.

Source: TNW Neural
