Network monitoring transforms IT operations from reactive firefighting — discovering outages when users call — to proactive management where issues are detected and resolved before users notice. Without monitoring, IT teams are flying blind: unable to answer basic questions like "which link is saturated?", "when did the server go down?", or "how close are we to capacity?"
This guide walks through building a production monitoring platform from scratch — selecting the right tools for your scale, configuring SNMP on every device, deploying PRTG or Zabbix for polling, building Grafana dashboards for visualization, and designing an alerting system that pages the right person at the right time without alert fatigue.
1 Monitoring Platform Selection
No single monitoring tool is best for every organization. The right choice depends on your environment size, budget, technical depth, and what you primarily need to monitor — network devices, servers, applications, or all three.
Platform Comparison
| Platform | Best For | Strengths | Limitations | Cost |
| --- | --- | --- | --- | --- |
| PRTG Network Monitor | SMB to mid-market — Windows shops | Auto-discovery, 250+ sensor types, easy setup, great UI, built-in maps | License per sensor, expensive at scale, Windows-only server | Free (100 sensors) / ₹45,000+/year |
| Zabbix | Mid-market to enterprise — Linux shops | Free/open-source, highly scalable, powerful templates, active community | Steep learning curve, complex initial setup, less polished UI | 100% free (open-source) / paid support available |
| LibreNMS | Network-focused monitoring | Auto-discovery, excellent vendor support (MikroTik, FortiGate, Cisco), free | Primarily network devices — limited server/app monitoring | 100% free (open-source) |
| Grafana + Prometheus | Metrics visualization layer | Best-in-class dashboards, works with any data source, alerting engine | Not a standalone NMS — requires a backend data source (Zabbix, InfluxDB) | Free (OSS) / Grafana Cloud from $0 |
| Nagios / Icinga2 | Legacy environments | Mature, highly configurable, large plugin ecosystem | Configuration-file-based — complex to manage at scale | Free (core) / Nagios XI paid |
| SolarWinds NPM | Large enterprise | Comprehensive, excellent maps, deep Cisco/Juniper integration | Very expensive, complex, SolarWinds supply-chain breach history | ₹500,000+/year |
Recommended Stack by Organization Size
- 1–50 devices: PRTG free tier (100 sensors) or LibreNMS — zero cost, quick setup, covers all basics for small networks
- 50–500 devices: Zabbix (monitoring backend) + Grafana (dashboards) — scales well, free, powerful enough for complex environments
- 500+ devices: Zabbix with distributed proxies + Grafana Enterprise OR PRTG paid tier — enterprise scale, distributed collection, high availability
- Network-heavy (ISPs, datacenters): LibreNMS for network devices + Zabbix for servers — best of both specialized tools
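The sizing guidance above can be captured as a small lookup for inventory scripts. A Python sketch (the function and its return strings are illustrative, not from any of the listed tools):

```python
# Map a device count to the recommended stack from the list above.
# Thresholds mirror the guidance in this guide, not a product rule.
def recommend_stack(device_count: int, network_heavy: bool = False) -> str:
    if network_heavy:
        return "LibreNMS (network devices) + Zabbix (servers)"
    if device_count <= 50:
        return "PRTG free tier or LibreNMS"
    if device_count <= 500:
        return "Zabbix + Grafana"
    return "Zabbix with distributed proxies + Grafana Enterprise, or PRTG paid tier"

print(recommend_stack(30))   # PRTG free tier or LibreNMS
print(recommend_stack(200))  # Zabbix + Grafana
```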
2 SNMP Configuration on Devices
SNMP (Simple Network Management Protocol) is the universal language of network monitoring — it allows monitoring platforms to query device metrics (CPU, memory, interface traffic, errors) without installing agents. Configuring SNMP correctly and securely on every device is the foundation of the entire monitoring platform.
SNMP Version Comparison
| Version | Authentication | Encryption | Use |
| --- | --- | --- | --- |
| SNMPv1 | Community string only | None | Never use — completely insecure |
| SNMPv2c | Community string only | None | Acceptable on an isolated management VLAN only |
| SNMPv3 | Username + auth password | AES-128/256 | Always use in production |
FortiGate SNMP Configuration
# FortiGate — SNMPv3 Configuration
config system snmp sysinfo
set status enable
set description "FortiGate-HQ-Firewall"
set contact "noc@enterweb.in"
set location "Server Room - Rack 2"
end
config system snmp community
# SNMPv2c — management VLAN only (if SNMPv3 not supported by tool)
edit 1
set name "EnterWeb-NMS-Readonly"
config hosts
edit 1
set ip 10.10.50.10 255.255.255.255 # Monitoring server IP only
next
end
set query-v1-status disable
set query-v2c-status enable
set trap-v1-status disable
set trap-v2c-status enable
set trap-v2c-lport 162
next
end
config system snmp user
# SNMPv3 user
edit "nms-readonly"
set queries enable
set query-port 161
set auth-proto sha256
set auth-pwd "StrongAuthPassword123!"
set priv-proto aes256
set priv-pwd "StrongPrivPassword456!"
set security-level auth-priv
set notify-hosts 10.10.50.10 # SNMP trap destination
next
end
MikroTik SNMP Configuration
# MikroTik RouterOS — SNMP Setup
/snmp
set enabled=yes \
contact="noc@enterweb.in" \
location="Branch-Office-Router" \
trap-version=2 \
trap-community="EnterWeb-NMS-Readonly"
/snmp community
set [ find default=yes ] name="public" disabled=yes # Disable default "public"
add name="EnterWeb-NMS-Readonly" \
    addresses=10.10.50.10/32 \
    read-access=yes \
    write-access=no \
    security=private \
    authentication-protocol=SHA1 \
    encryption-protocol=AES \
    authentication-password="StrongAuthPass!" \
    encryption-password="StrongEncPass!"
# security=private enforces SNMPv3 auth + encryption for this community
# Verify SNMP is responding
# From monitoring server: snmpwalk -v2c -c EnterWeb-NMS-Readonly 10.10.1.1
Ubuntu Server SNMP Agent
# Install SNMP daemon and client tools
sudo apt install snmpd snmp -y
# Create the SNMPv3 user FIRST — snmpd must be stopped while
# net-snmp-create-v3-user writes the user into its persistent config
sudo systemctl stop snmpd
sudo net-snmp-create-v3-user -ro -A "StrongAuthPass!" -a SHA \
  -X "StrongPrivPass!" -x AES nms-readonly
# Then edit /etc/snmp/snmpd.conf and replace the default content with:
# ── Access Control ──────────────────────────────────
agentaddress udp:161
# SNMPv3 only — deliberately no rocommunity line, since defining one
# is what enables insecure v1/v2c access
rouser nms-readonly priv
# System info
sysLocation "Datacenter-Rack3-Ubuntu-Server"
sysContact "noc@enterweb.in"
# Extend with additional metrics
extend .1.3.6.1.4.1.2021.100 distro /usr/bin/distro
# Enable and start the daemon
sudo systemctl enable --now snmpd
# Allow SNMP through UFW from the monitoring server only
sudo ufw allow from 10.10.50.10 to any port 161 proto udp
⚠️ Warning: Never use the community string "public" or "private" in production — these are the defaults that every network scanner and attacker tries first. Change community strings to random 20+ character strings and restrict SNMP access to only the monitoring server's IP address in the SNMP ACL. Place SNMP traffic on a dedicated management VLAN so it is never routed over untrusted network segments.
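To generate the random 20+ character community strings the warning calls for, Python's standard `secrets` module works well. A minimal sketch (the helper name and 24-character default are our choices):

```python
import secrets
import string

# Generate a random alphanumeric SNMP community string (24 chars by default).
def random_community(length: int = 24) -> str:
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(random_community())  # e.g. 'R7kQ2nVx...' — never reuse across sites
```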
3 PRTG Setup & Auto-Discovery
PRTG Network Monitor provides the fastest path to a working monitoring platform — auto-discovery scans your network and creates devices and sensors automatically. The free tier supports 100 sensors, sufficient for monitoring 15–20 devices comprehensively.
PRTG Initial Setup Steps
- Download PRTG from paessler.com — install on Windows Server 2019/2022 (minimum 4 vCPU, 8GB RAM, 100GB storage)
- Access the web UI at https://[server-ip]:443 — default credentials are prtgadmin / prtgadmin — change immediately
- Navigate to Setup → System Administration → Core & Probes — verify probe is connected
- Add SNMP credentials: Setup → System Administration → SNMP Compatibility Options
- Run auto-discovery: Devices → Add Device → Auto-Discovery — enter your management subnet (e.g., 10.10.0.0/24)
- PRTG will scan the subnet and create device groups with pre-configured sensors automatically
- Review discovered devices — remove duplicates, assign correct device types and icons
- Configure notification contacts: Setup → Account Settings → Notifications
Key PRTG Sensors to Deploy per Device Type
| Device Type | Essential Sensors | Sensor Count |
| --- | --- | --- |
| Firewall (FortiGate) | Ping, SNMP CPU, SNMP Memory, SNMP Traffic (WAN + LAN), SNMP Sessions, HTTPS uptime | 6–8 sensors |
| Router (MikroTik) | Ping, SNMP CPU, SNMP Memory, SNMP Traffic (all active interfaces), BGP peer state | 5–10 sensors |
| Switch (managed) | Ping, SNMP Traffic (uplinks), SNMP Port Errors, SNMP STP state | 4–6 sensors |
| Windows Server | Ping, WMI CPU, WMI Memory, WMI Disk space (all volumes), WMI Services (critical), WMI Event Log | 8–12 sensors |
| Linux Server | Ping, SSH CPU, SSH Memory, SSH Disk, SSH Process count, SNMP Load average | 6–8 sensors |
| Internet link (ILL) | Ping (external target), HTTP(S) check, SNMP Traffic on WAN interface, latency probe | 4 sensors |
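To check whether a planned deployment fits within PRTG's 100-sensor free tier, the sensor counts above can be turned into a rough budget. A minimal Python sketch (the per-type midpoints and the sample inventory are illustrative assumptions):

```python
# Midpoint sensor counts per device type, taken from the table above.
SENSORS_PER_TYPE = {
    "firewall": 7, "router": 7, "switch": 5,
    "windows_server": 10, "linux_server": 7, "internet_link": 4,
}

def total_sensors(inventory: dict) -> int:
    """Estimate total PRTG sensors for a device inventory."""
    return sum(SENSORS_PER_TYPE[t] * n for t, n in inventory.items())

inventory = {"firewall": 1, "router": 2, "switch": 4, "windows_server": 3,
             "linux_server": 4, "internet_link": 2}
needed = total_sensors(inventory)
print(needed)          # 107
print(needed <= 100)   # False — this inventory slightly exceeds the free tier
```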
✅ Pro Tip: In PRTG, use Device Templates to standardize sensor sets across device types — create one template for "MikroTik Router", one for "Windows Server", one for "FortiGate Firewall." When you add a new device, apply the matching template to automatically create all required sensors in seconds. This eliminates manual sensor creation and ensures consistent monitoring coverage across all devices of the same type.
4 Zabbix Deployment & Templates
Zabbix is the most powerful free network monitoring platform — highly scalable, template-driven, and capable of monitoring tens of thousands of devices with a single installation. The initial setup is more involved than PRTG but the depth of capability and zero licensing cost make it the preferred choice for growing environments.
Zabbix Server Installation (Ubuntu 22.04)
# Install Zabbix 7.x on Ubuntu 22.04
# Step 1: Add Zabbix repository
wget https://repo.zabbix.com/zabbix/7.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_7.0-1+ubuntu22.04_all.deb
sudo dpkg -i zabbix-release_7.0-1+ubuntu22.04_all.deb
sudo apt update
# Step 2: Install Zabbix server, frontend, agent
sudo apt install zabbix-server-mysql zabbix-frontend-php \
zabbix-apache-conf zabbix-sql-scripts zabbix-agent2 -y
# Step 3: Configure MySQL database
sudo mysql -uroot -p
CREATE DATABASE zabbix CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
CREATE USER 'zabbix'@'localhost' IDENTIFIED BY 'StrongDBPassword123!';
GRANT ALL PRIVILEGES ON zabbix.* TO 'zabbix'@'localhost';
SET GLOBAL log_bin_trust_function_creators = 1;
FLUSH PRIVILEGES; EXIT;
# Step 4: Import initial schema
zcat /usr/share/zabbix-sql-scripts/mysql/server.sql.gz | \
mysql --default-character-set=utf8mb4 -uzabbix -p zabbix
# Step 5: Configure Zabbix server (/etc/zabbix/zabbix_server.conf)
DBHost=localhost
DBName=zabbix
DBUser=zabbix
DBPassword=StrongDBPassword123!
StartPollers=20
StartPollersUnreachable=5
StartTrappers=10
StartPingers=10
CacheSize=128M
HistoryCacheSize=64M
TrendCacheSize=32M
# Step 6: Start services
sudo systemctl enable --now zabbix-server zabbix-agent2 apache2
sudo systemctl status zabbix-server
# Step 7: Access web UI
# http://[server-ip]/zabbix
# Default login: Admin / zabbix — CHANGE IMMEDIATELY
Zabbix Templates for Network Devices
# Zabbix ships with pre-built templates — assign them via:
# Data collection → Templates → search for your device vendor
# (older Zabbix releases used the "Configuration → Templates" menu and
#  names like "Template Net Cisco IOS SNMPv2")
# Key built-in templates to activate (Zabbix 6.x/7.x naming):
"Cisco IOS by SNMP" → Cisco routers/switches
"FortiGate by SNMP" → FortiGate firewalls
"MikroTik by SNMP" → MikroTik RouterOS (plus model-specific variants)
"Linux by Zabbix agent" → Linux servers
"Windows by Zabbix agent" → Windows servers
"Apache by HTTP" → Web servers
"MySQL by Zabbix agent" → MySQL/MariaDB
# Assign a template to a host:
# Data collection → Hosts → [select host] → Templates field
# → Start typing the template name → Select → Update
# Custom MikroTik template items (if the built-in template is missing metrics):
# Data collection → Templates → [select template] → Items → Create item
Name: WAN Interface Traffic In
Type: SNMP agent
OID: .1.3.6.1.2.1.31.1.1.1.6.1 (ifHCInOctets for interface index 1)
Key: net.if.in[wan1]
Type of info: Numeric (unsigned)
Units: bps
Preprocessing: Change per second → Multiplier (8) [bytes to bits]
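The preprocessing chain above is easy to misread, so here is what "Change per second" plus a ×8 multiplier actually computes, sketched in Python: the delta of the octet counter across the poll interval, converted from bytes/s to bits/s (the modulo handles a 64-bit counter wrap):

```python
COUNTER64_MAX = 2**64  # ifHCInOctets is a 64-bit counter

def bits_per_second(prev_octets: int, curr_octets: int, interval_s: int) -> float:
    """Delta of an SNMP octet counter over the poll interval, in bits/s."""
    delta = (curr_octets - prev_octets) % COUNTER64_MAX  # survives a wrap
    return delta / interval_s * 8                        # bytes/s -> bits/s

# 7,500,000 bytes received in a 60 s interval ≈ 1 Mbps
print(bits_per_second(1_000_000, 8_500_000, 60))  # 1000000.0
```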
✅ Pro Tip: Use Zabbix Proxies for monitoring remote sites — deploy a lightweight Zabbix Proxy at each branch office. The proxy collects data locally and forwards it to the central Zabbix server, reducing WAN bandwidth consumption by 90% compared to the server polling remote devices directly. Proxies also continue collecting data during WAN outages and sync when connectivity is restored — ensuring no monitoring gaps during the exact events you most need data about.
5 Grafana Dashboard Setup
Grafana transforms raw monitoring data into beautiful, interactive dashboards that operations teams and management can actually read. It connects to Zabbix, InfluxDB, Prometheus, and dozens of other data sources — acting as a unified visualization layer across your entire monitoring stack.
Grafana Installation (Ubuntu)
# Install Grafana OSS
sudo apt install -y apt-transport-https software-properties-common
# (apt-key is deprecated on Ubuntu 22.04 — use a signed-by keyring instead)
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | \
  gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | \
  sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update && sudo apt install grafana -y
sudo systemctl enable --now grafana-server
# Access: http://[server-ip]:3000
# Default: admin / admin — change on first login
# Install Zabbix data source plugin
sudo grafana-cli plugins install alexanderzobnin-zabbix-app
sudo systemctl restart grafana-server
# Enable Zabbix plugin:
# Plugins → alexanderzobnin-zabbix-app → Enable
# Configuration → Data Sources → Add → Zabbix
# URL: http://localhost/zabbix/api_jsonrpc.php
# Username: Admin / [your zabbix password]
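Behind that data-source form, the plugin speaks JSON-RPC 2.0 to api_jsonrpc.php. A minimal sketch of the login request body it sends (Zabbix 6.4+/7.x expects "username"; older releases used "user" — verify against your version):

```python
import json

# Build the JSON-RPC 2.0 body for a Zabbix API user.login call.
def zabbix_login_payload(username: str, password: str, request_id: int = 1) -> str:
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "user.login",
        "params": {"username": username, "password": password},
        "id": request_id,
    })

payload = zabbix_login_payload("Admin", "zabbix")
print(json.loads(payload)["method"])  # user.login
```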
Essential Grafana Dashboards to Build
- Network Operations Center (NOC) Overview: Full-screen TV dashboard — all devices listed with green/red status indicators, current WAN bandwidth utilization, active alerts count, and uptime percentage for the day
- WAN Bandwidth Dashboard: Time-series graphs for each internet link — inbound/outbound Mbps, utilization %, peak traffic times, and 30-day trend comparison
- Server Health Dashboard: CPU, memory, and disk utilization for all servers in a grid — color-coded thresholds (green <70%, yellow 70–85%, red >85%)
- Top Talkers Dashboard: Which IP addresses or interfaces are consuming the most bandwidth — updated every 5 minutes, sortable table
- Monthly Executive Report Dashboard: Uptime SLA percentage, average response times, total alerts fired, top 5 recurring issues — export as PDF for management
- VPN Tunnel Status: All site-to-site VPN tunnels listed with up/down status, tunnel uptime, bytes transferred — instant visibility into branch connectivity
# Sample Grafana panel query (Zabbix datasource — WAN bandwidth)
# Panel type: Time series
# Data source: Zabbix
Group: Network Devices
Host: FortiGate-HQ
Application: Network Interfaces
Item: WAN1: Bits received per second
# Add second query for outbound:
Item: WAN1: Bits sent per second
# Panel display settings:
Unit: bits/sec (auto-scale to Mbps/Gbps)
Fill opacity: 10
Line width: 2
Thresholds:
70% of link capacity → Yellow
90% of link capacity → Red
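Grafana thresholds take absolute values, so the 70%/90% guidance has to be converted per link. A small helper (the function is our own sketch, not a Grafana API):

```python
# Convert the percentage thresholds above into absolute bps for one link,
# ready to paste into the panel's threshold fields. Integer math keeps
# the results exact.
def thresholds_bps(link_capacity_mbps: int) -> dict:
    cap_bps = link_capacity_mbps * 1_000_000
    return {"yellow": cap_bps * 70 // 100, "red": cap_bps * 90 // 100}

print(thresholds_bps(100))  # {'yellow': 70000000, 'red': 90000000}
```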
6 Syslog & Log Aggregation
SNMP polling tells you metrics — syslog tells you events. When a firewall blocks a connection, a VPN tunnel drops, an interface flaps, or a login fails, the device sends a syslog message. Collecting and centralizing these logs is essential for troubleshooting and security monitoring.
Syslog Server Setup (rsyslog on Ubuntu)
# Install and configure rsyslog as central syslog server
sudo apt install rsyslog -y
# Edit /etc/rsyslog.conf — enable UDP and TCP syslog reception
# Uncomment these lines:
module(load="imudp")
input(type="imudp" port="514")
module(load="imtcp")
input(type="imtcp" port="514")
# Route logs by source IP to separate files
# Add to /etc/rsyslog.d/10-network-devices.conf:
if $fromhost-ip == '10.10.1.1' then /var/log/network/fortigate-hq.log
if $fromhost-ip == '10.10.1.2' then /var/log/network/mikrotik-core.log
if $fromhost-ip startswith '10.10.' then /var/log/network/network-devices.log
& stop
# Create log directory and set permissions
sudo mkdir -p /var/log/network
sudo chown syslog:adm /var/log/network
# Configure log rotation (/etc/logrotate.d/network-devices)
/var/log/network/*.log {
daily
rotate 90
compress
delaycompress
missingok
notifempty
sharedscripts
postrotate
/usr/bin/systemctl reload rsyslog > /dev/null 2>&1 || true
endscript
}
sudo systemctl restart rsyslog
# Allow syslog through firewall (from network devices only)
sudo ufw allow from 10.10.0.0/16 to any port 514 proto udp
sudo ufw allow from 10.10.0.0/16 to any port 514 proto tcp
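Because each rsyslog `if` above is evaluated independently, a FortiGate message matches both its per-device rule and the 10.10. catch-all and is written to both files. A Python mirror of that routing logic, handy for sanity-checking the rules before deploying them (paths and IPs taken from the example config):

```python
# Return every log file a message from this source IP would land in,
# mirroring the rsyslog rules above (all matching actions fire).
def route_log(fromhost_ip: str) -> list[str]:
    targets = []
    if fromhost_ip == "10.10.1.1":
        targets.append("/var/log/network/fortigate-hq.log")
    if fromhost_ip == "10.10.1.2":
        targets.append("/var/log/network/mikrotik-core.log")
    if fromhost_ip.startswith("10.10."):
        targets.append("/var/log/network/network-devices.log")
    return targets

print(route_log("10.10.1.1"))  # per-device file AND the catch-all
print(route_log("10.10.7.9"))  # catch-all only
```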
Configure FortiGate Syslog Forwarding
# FortiGate — Send all logs to central syslog server
config log syslogd setting
set status enable
set server 10.10.50.10 # Syslog server IP
set mode udp
set port 514
set facility local7
set source-ip 10.10.1.1 # FortiGate management IP
set format rfc5424 # Standard syslog format
end
config log syslogd filter
set severity information # Send info level and above
set forward-traffic enable
set local-traffic enable
set sniffer-traffic disable
set anomaly enable
set voip disable
set gtp disable
set filter-type include
end
# Verify logs are being sent
diagnose log test
✅ Pro Tip: For organizations that need to search and analyze logs at scale — install the ELK Stack (Elasticsearch, Logstash, Kibana) or the lighter Graylog on top of your syslog collection. These platforms parse structured syslog data, enable full-text search across millions of log entries in seconds, and let you build dashboards showing security events, top blocked IPs, VPN tunnel events, and authentication failures — turning raw syslog into actionable intelligence.
7 Alert Design & Escalation
Alert fatigue is the most common failure mode of monitoring deployments — too many low-priority alerts train teams to ignore them, including the critical ones. Good alert design means alerting only on actionable conditions, with the right severity, to the right person, at the right time.
Alert Threshold Reference
| Metric | Warning Threshold | Critical Threshold | Check Interval |
| --- | --- | --- | --- |
| Device ping (packet loss) | > 5% for 3 min | > 30% for 1 min / down 2 min | 60 sec |
| WAN bandwidth utilization | > 70% sustained 5 min | > 90% sustained 2 min | 60 sec |
| CPU utilization (router/firewall) | > 75% for 5 min | > 90% for 3 min | 60 sec |
| Server CPU utilization | > 80% for 10 min | > 95% for 5 min | 60 sec |
| Server memory utilization | > 85% for 5 min | > 95% for 2 min | 60 sec |
| Disk space used | > 75% capacity | > 90% capacity | 5 min |
| VPN tunnel status | N/A | Tunnel down > 2 min | 30 sec |
| Interface error rate | > 0.1% error rate | > 1% error rate | 60 sec |
| SSL certificate expiry | 30 days remaining | 7 days remaining | Daily |
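The "sustained for N minutes" conditions in the table prevent one-poll spikes from paging anyone. A minimal sketch of that evaluation (our own illustration, not PRTG or Zabbix internals):

```python
# Fire only when EVERY sample in the trailing window exceeds the threshold,
# which suppresses single-poll spikes.
def sustained_breach(samples: list, threshold: float, window: int) -> bool:
    """samples: oldest-first values, one per poll; window: sample count."""
    if len(samples) < window:
        return False
    return all(v > threshold for v in samples[-window:])

cpu = [60, 72, 91, 93, 92, 94, 95]     # one sample per minute
print(sustained_breach(cpu, 90, 3))     # True — last 3 min all above 90%
print(sustained_breach(cpu, 90, 6))     # False — 72% six minutes ago
```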
Escalation Matrix
# Alert Escalation Policy — define in PRTG/Zabbix notification settings
Level 1 — Warning (Yellow):
Notify: NOC email group (noc@enterweb.in)
Method: Email
Delay: Immediate (on first trigger)
Repeat: Every 30 minutes while active
Level 2 — Critical (Red):
Notify: On-call engineer
Method: Email + SMS/WhatsApp via Twilio/WATI API
Delay: Immediate
Repeat: Every 15 minutes while active
Level 3 — Critical unacknowledged > 30 min:
Notify: IT Manager + On-call engineer
Method: Phone call (Twilio voice alert)
Delay: 30 minutes after Level 2 trigger
Repeat: Every 30 minutes
Level 4 — Major outage (core device down > 1 hour):
Notify: IT Director + All stakeholders
Method: Email + Phone + Incident ticket auto-created
Delay: 60 minutes after initial trigger
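The matrix above can be expressed as a small lookup, for example when wiring alerts through a custom webhook. This sketch collapses Level 4's "core device down > 1 hour" into elapsed unacknowledged time, a simplification of the policy:

```python
# Pick the escalation level from severity and how long a critical alert
# has gone unacknowledged (minutes). Levels match the matrix above.
def escalation_level(severity: str, unacked_minutes: int = 0) -> int:
    if severity == "warning":
        return 1              # NOC email group
    if severity == "critical":
        if unacked_minutes >= 60:
            return 4          # director + stakeholders, incident ticket
        if unacked_minutes >= 30:
            return 3          # IT manager + on-call, phone call
        return 2              # on-call engineer, email + SMS
    raise ValueError(f"unknown severity: {severity}")

print(escalation_level("warning"))        # 1
print(escalation_level("critical", 45))   # 3
```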
⚠️ Warning: Sending SNMP or syslog alerts via email only is insufficient for critical infrastructure — email delivery can be delayed 5–15 minutes by spam filtering and greylisting. For Critical-level alerts (device down, WAN outage, VPN failure), configure a secondary notification channel: SMS via Twilio, WhatsApp Business API (WATI), Telegram bot, or PagerDuty integration. The goal is to wake someone up at 3 AM within 2 minutes of a critical failure — email alone will not reliably achieve this.
8 Reporting & SLA Tracking
Monitoring data becomes a business asset when it is turned into regular reports — proving SLA compliance to clients, identifying recurring problems for proactive resolution, and demonstrating the value of IT investments to management.
Monthly Report Contents
- Uptime SLA report: Per-device availability percentage for the month — export from PRTG Availability Report or Zabbix Reports → Availability report. Target: 99.9% (≈8.76 hours of downtime per year)
- WAN utilization trends: Average and peak bandwidth per link, growth trend vs. last month, capacity planning recommendation if utilization consistently exceeds 60%
- Alert summary: Total alerts fired by severity, top 10 most-alerting devices, mean time to acknowledge (MTTA) and mean time to resolve (MTTR)
- Top issues: Recurring alerts — devices that triggered the same alert 3+ times indicate an underlying problem needing permanent resolution, not repeated acknowledgement
- Patch compliance: Devices with outdated firmware or OS patches — flagged for remediation in the following month
- Capacity forecast: Based on 3-month growth trend — predict when each WAN link, server disk, or device CPU will hit critical threshold
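The SLA percentage in the first bullet maps to a fixed downtime budget. A one-line helper makes the arithmetic explicit:

```python
# Allowed downtime per year for a given availability SLA percentage.
def downtime_budget_hours_per_year(sla_percent: float) -> float:
    return (100 - sla_percent) / 100 * 365 * 24

print(round(downtime_budget_hours_per_year(99.9), 2))   # 8.76
print(round(downtime_budget_hours_per_year(99.99), 2))  # 0.88
```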
Automated PRTG PDF Report
# PRTG — Schedule monthly PDF report
# Setup → Reports → Add Report
Report type: Custom Report
Schedule: Monthly (1st of each month, 08:00)
Output: PDF + Email to: management@enterweb.in
# Sections to include:
- Summary: Total sensors, down sensors, uptime %
- Top 10 sensors by downtime
- Bandwidth graphs: All WAN interfaces (last 30 days)
- Server resources: CPU/Memory/Disk (last 30 days)
- Alert log: All Critical alerts last 30 days
# PRTG API — pull uptime data programmatically
GET https://[prtg-server]/api/table.json?
content=sensors&
columns=device,sensor,status,uptime,downtime&
filter_status=5& # 5 = Down sensors
apitoken=[your-api-token]
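When scripting against that endpoint, building the query with urllib keeps the parameters safely encoded. A sketch (server name and token are placeholders):

```python
from urllib.parse import urlencode

# Build the PRTG table.json URL from the example above.
def prtg_down_sensors_url(server: str, apitoken: str) -> str:
    params = {
        "content": "sensors",
        "columns": "device,sensor,status,uptime,downtime",
        "filter_status": 5,   # 5 = Down sensors
        "apitoken": apitoken,
    }
    return f"https://{server}/api/table.json?{urlencode(params)}"

url = prtg_down_sensors_url("prtg.example.local", "TOKEN")
print("filter_status=5" in url)  # True
```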
✅ Pro Tip: Create a dedicated NOC Dashboard TV screen in your server room or IT office — a Grafana dashboard displayed on a wall-mounted monitor showing real-time device status, current WAN utilization graphs, active alert count, and the last 10 alert events. This passive visibility means your team instantly notices a spike or outage without needing to actively check the monitoring console — the screen catches issues in peripheral vision during normal work hours, dramatically reducing mean time to detect (MTTD).
Need Help Setting Up Network Monitoring?
EnterWeb IT Firm deploys and configures PRTG, Zabbix, Grafana, and LibreNMS monitoring platforms for organizations of all sizes. We design alert escalation workflows, build custom dashboards, and deliver monthly SLA reports so your IT team always has complete visibility into infrastructure health.