Introduction
In this tutorial, you'll learn how to set up and configure a robust monitoring system for Ubuntu infrastructure using Prometheus and Grafana. This is particularly important given recent outages like the one described in the Ars Technica article, where infrastructure downtime affected critical vulnerability communications. By implementing proper monitoring, you'll be able to detect issues before they escalate and maintain better visibility into your Ubuntu systems.
Prerequisites
- Ubuntu 20.04 or 22.04 server with sudo privileges
- Basic understanding of Linux command line and networking
- Docker and Docker Compose (installed in Step 1 if not already present)
- At least 4GB RAM available for containerized services
Step 1: Install Docker and Docker Compose
First, we need to set up the containerization environment that will run our monitoring stack. Docker allows us to easily deploy and manage our monitoring services without worrying about system dependencies.
Install Docker
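sudo apt update
sudo apt install -y docker.io
sudo systemctl enable --now docker
# Optional: allow running docker without sudo (log out and back in to apply)
sudo usermod -aG docker $USER
docker --version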
Install Docker Compose
sudo apt install -y docker-compose
Why: Docker provides isolated environments for our monitoring services, ensuring they don't conflict with existing system software and making deployment consistent across different environments.
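Before moving on, it can help to confirm that both tools are actually on the PATH. The following is a minimal sketch, not part of the original setup; the helper name `require_cmd` is hypothetical.

```shell
# Hypothetical helper: report whether a command is available on the PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Example usage (left commented so the snippet has no side effects):
# require_cmd docker && require_cmd docker-compose
```

If either check reports "missing", repeat the corresponding install command above before continuing.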
Step 2: Create Monitoring Directory Structure
We'll create a dedicated directory for our monitoring configuration files and data storage.
Create Project Directory
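mkdir -p ubuntu-monitoring/{prometheus,grafana,data/prometheus,data/grafana}
cd ubuntu-monitoring
# The containers run as non-root users: Prometheus as UID 65534, Grafana as UID 472
sudo chown 65534 data/prometheus && sudo chown 472 data/grafana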
Create Prometheus Configuration
cat > prometheus/prometheus.yml << 'EOF'
# Global config
global:
  scrape_interval: 15s
  evaluation_interval: 15s
# Alerting configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# Scrape configurations: Prometheus itself and the node exporter
scrape_configs:
  # The job name is added as a label `job` to any timeseries scraped from this config
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  # Prometheus runs in its own container, so the node exporter is reached
  # by its Compose service name rather than localhost
  - job_name: 'ubuntu-server'
    static_configs:
      - targets: ['node-exporter:9100']
EOF
Why: This configuration tells Prometheus where to scrape metrics from, including itself and the node exporter which will monitor system metrics on our Ubuntu server.
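To double-check which endpoints the file asks Prometheus to scrape, a small text filter is often enough. This is a rough sketch that assumes targets are written inline as `targets: ['host:port']`, as they are above. The function name `list_scrape_targets` is hypothetical, and the filter does not replace a real YAML parser.

```shell
# Hypothetical helper: print every host:port found on a "targets:" line.
list_scrape_targets() {
  grep -E '^[[:space:]]*-?[[:space:]]*targets:' "$1" \
    | grep -oE '[A-Za-z0-9_.-]+:[0-9]+'
}

# Example usage (commented out; requires the file created above):
# list_scrape_targets prometheus/prometheus.yml
```

For a full syntax check, the `promtool check config` command shipped in the Prometheus image is the more thorough option.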
Step 3: Set Up Docker Compose File
Now we'll define our monitoring stack using Docker Compose, which will orchestrate all our services together.
Create Docker Compose Configuration
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    volumes:
      # Mount the whole config directory so rule files added later are visible too
      - ./prometheus:/etc/prometheus
      - ./data/prometheus:/prometheus
    ports:
      - "9090:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=24h'
    restart: unless-stopped
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    ports:
      - "9100:9100"
    restart: unless-stopped
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - ./data/grafana:/var/lib/grafana
    restart: unless-stopped
EOF
Why: Docker Compose allows us to define and run multi-container Docker applications with a single command, making it easy to deploy our entire monitoring stack.
Step 4: Start the Monitoring Stack
With our configuration files in place, we can now start all our monitoring services.
Launch Services
docker-compose up -d
Verify Services Are Running
docker ps
# Expected output should show prometheus, node-exporter, and grafana containers running
Why: This command starts all services in detached mode, allowing them to run in the background while we continue working. The verification step ensures all containers are properly initialized.
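Containers listed by `docker ps` may still be starting up, so it can be useful to poll each service until it answers. The sketch below is one way to do that under stated assumptions: the `wait_for` helper and the `RETRY_DELAY` variable are hypothetical names, while `/-/healthy` (Prometheus) and `/api/health` (Grafana) are the health endpoints those services expose.

```shell
# Seconds to sleep between attempts (hypothetical knob, override as needed).
RETRY_DELAY="${RETRY_DELAY:-2}"

# Hypothetical helper: run a command up to N times until it succeeds.
# Usage: wait_for <attempts> <command> [args...]
wait_for() {
  attempts="$1"
  shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$RETRY_DELAY"
  done
  return 1
}

# Example usage (commented out; requires curl and the running stack):
# wait_for 15 curl -fsS http://localhost:9090/-/healthy && echo "prometheus up"
# wait_for 15 curl -fsS http://localhost:9100/metrics   && echo "node-exporter up"
# wait_for 15 curl -fsS http://localhost:3000/api/health && echo "grafana up"
```

If a service never becomes healthy, `docker logs <container_name>` usually shows the reason.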
Step 5: Configure Grafana Dashboard
Now we'll set up Grafana to visualize our Ubuntu system metrics.
Access Grafana Web Interface
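Open your browser and navigate to http://localhost:3000 (replace localhost with the server's address if you are browsing from another machine). The default login is admin/admin, and Grafana prompts you to set a new password on first login.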
Add Prometheus Data Source
- Click on the gear icon (Configuration) in the left sidebar
- Select "Data Sources"
- Click "Add data source"
- Select "Prometheus"
- Set URL to http://prometheus:9090
- Click "Save & Test"
Why: Grafana needs to know where to fetch metrics from, and Prometheus serves as our time-series database for system monitoring data.
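The same data source can also be registered without the UI through Grafana's HTTP API (`POST /api/datasources`). The sketch below assumes the default admin/admin credentials are still in place and uses a hypothetical helper, `datasource_payload`, to build the request body.

```shell
# Hypothetical helper: build the JSON body for a Prometheus data source.
# Usage: datasource_payload <name> <url>
datasource_payload() {
  printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$1" "$2"
}

# Example usage (commented out; requires curl and a running Grafana):
# curl -s -u admin:admin -H 'Content-Type: application/json' \
#   -X POST -d "$(datasource_payload Prometheus http://prometheus:9090)" \
#   http://localhost:3000/api/datasources
```

The URL uses the Compose service name `prometheus` because the request to Prometheus is made by the Grafana container, not by your browser.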
Step 6: Create Ubuntu System Dashboard
We'll create a custom dashboard to monitor key Ubuntu infrastructure metrics.
Create New Dashboard
- Click the "+" icon in the left sidebar
- Select "Dashboard"
- Click "Add new panel"
- Set Query to: sum by (mode) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))
- Change the panel type to "Graph" (named "Time series" in recent Grafana versions)
- Set Title to "CPU Usage by Mode"
Add Memory Usage Panel
- Click "Add panel"
- Set Query to: node_memory_MemAvailable_bytes
- Change the panel type to "Gauge"
- Set Title to "Available Memory"
Why: Creating custom dashboards allows you to focus on the most critical metrics for your Ubuntu infrastructure, making it easier to spot potential issues before they cause outages like the one mentioned in the news article.
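One detail worth keeping in mind when charting CPU: `node_cpu_seconds_total` is a cumulative counter, so usage over an interval comes from the difference between two samples rather than from the raw value. The arithmetic can be sketched as follows; the helper name `busy_fraction` and the sample numbers are purely illustrative.

```shell
# Hypothetical helper: fraction of CPU time spent busy between two samples.
# Usage: busy_fraction <idle_1> <total_1> <idle_2> <total_2>
busy_fraction() {
  awk -v i1="$1" -v t1="$2" -v i2="$3" -v t2="$4" 'BEGIN {
    if (t2 <= t1) exit 1            # no CPU time elapsed between the samples
    printf "%.2f\n", 1 - (i2 - i1) / (t2 - t1)
  }'
}

# Illustrative numbers: 100 CPU-seconds elapsed, 50 of them idle.
busy_fraction 100 200 150 300
```

This is roughly what PromQL's `rate()` does for you over a time window, which is why dashboards normally wrap counters in `rate()`.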
Step 7: Set Up Alerting Rules
To proactively detect issues, we'll configure alerting rules in Prometheus.
Update Prometheus Configuration
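# prometheus.yml already contains a rule_files key, and YAML does not allow it twice,
# so edit the existing key in place instead of appending a second one
sed -i 's|^rule_files:.*|rule_files:\n  - "alert.rules.yml"|' prometheus/prometheus.yml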
Create Alert Rules File
cat > prometheus/alert.rules.yml << 'EOF'
groups:
  - name: ubuntu-alerts
    rules:
      - alert: HighCPUUsage
        # Fraction of CPU time not spent idle, averaged across all cores
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.8
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage has been above 80% for more than 2 minutes"
      - alert: LowMemory
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low memory warning"
          description: "Available memory is below 10% for more than 5 minutes"
EOF
Restart Prometheus to Apply Rules
docker-compose restart prometheus
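Why: Alerting rules flag automatically when system metrics cross predefined thresholds, helping you identify and respond to issues before they cause infrastructure outages. Firing alerts appear on the Alerts page of the Prometheus web interface; delivering notifications by email or chat additionally requires an Alertmanager, which is left commented out in this configuration.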
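To get a feel for the LowMemory threshold, the same ratio can be computed locally from `/proc/meminfo`. This is a minimal sketch for illustration only. The helper name `is_low_memory` is hypothetical, and Prometheus evaluates the real rule from node exporter metrics, not from this script.

```shell
# Hypothetical helper: succeed when available/total is below the threshold.
# Usage: is_low_memory <available> <total> [threshold, default 0.1]
is_low_memory() {
  awk -v a="$1" -v t="$2" -v th="${3:-0.1}" 'BEGIN { exit !(a / t < th) }'
}

# Example usage on the server itself (commented out; Linux only).
# Both values are in kB, which is fine because only the ratio matters.
# avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
# total=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
# is_low_memory "$avail" "$total" && echo "LowMemory condition holds"
```

The `for: 5m` clause in the rule means the condition must hold continuously for five minutes before the alert fires, so a single low reading like the one checked here would not trigger it.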
Summary
In this tutorial, you've set up a comprehensive monitoring solution for Ubuntu infrastructure using Prometheus and Grafana. You've learned how to:
- Install and configure Docker and Docker Compose
- Create a multi-container monitoring stack
- Configure Prometheus to scrape system metrics
- Set up Grafana for visualization
- Create custom alerting rules for critical infrastructure metrics
This monitoring setup will help prevent the kind of extended outages that affected Ubuntu infrastructure recently. By proactively monitoring CPU usage, memory consumption, and other key metrics, you'll be able to detect and address issues before they escalate into major problems that impact critical vulnerability communications and system availability.



