Detect Hanging Cron Job: Timeout & Execution Time Monitoring
Imagine your backup script starts at 3 AM and never finishes. No error, no log—just a process that blocks for hours. Learn how to detect cron job timeouts and stuck scripts before they cause real damage.
The Problem: Scripts That Hang Without Failing
A cron job can "run" but never complete. The process is alive, consuming resources, but stuck on a network call, a locked file, or a dead database connection. Exit code is never returned. No ping, no alert—until you notice something is wrong.
- Script blocks on I/O (API timeout, DB lock, NFS hang)
- Script runs 5 hours instead of 5 minutes and nobody notices
- Next scheduled run is skipped because the previous one is still "running"
- Resource exhaustion: too many stuck processes
A simple "ping when done" only tells you the job finished. It doesn't tell you it took 6 hours. You need to measure time between start and stop.
Solution: Start/Stop Tracking and Timeout Detection
Send a start signal when the job begins and a stop signal (or payload ping) when it ends. The monitor measures elapsed time. If no stop arrives within your configured timeout, you get an alert—hanging job detected.
- Alert if the job doesn't complete within expected duration (e.g. 30 minutes)
- See actual execution time in the dashboard for every run
- Detect gradual slowdown (job used to take 2 min, now 45 min)
Bash: Start at Beginning, Ping at End
#!/bin/bash
# 1. Signal "job started" (get run_id from response if you need it)
curl -s -X POST "https://deadmanping.com/api/ping/backup-daily/start" -o /tmp/run.json
# 2. Do the actual work
/path/to/backup.sh
# 3. Signal "job finished" - optional: send run_id so monitor can match start/stop
curl -X POST "https://deadmanping.com/api/ping/backup-daily"
# If backup.sh hangs, step 3 never runs -> timeout alertPython: Start/Stop with run_id
import requests
import subprocess
PING_SLUG = "data-sync-job"
BASE = "https://deadmanping.com/api/ping"
# Start tracking
r = requests.post(f"{BASE}/{PING_SLUG}/start")
run_id = r.json().get("run_id") # optional, for linking start/stop
# Your long-running work
subprocess.run(["/path/to/sync.sh"], check=True)
# Stop: job completed (optionally pass run_id)
requests.post(f"{BASE}/{PING_SLUG}", json={"run_id": run_id})
# If sync hangs, this never runs -> you get a timeout alertNode.js: Async Job with Start/Stop
const slug = 'report-generation';
const base = 'https://deadmanping.com/api/ping';
// Start
const startRes = await fetch(`${base}/${slug}/start`, { method: 'POST' });
const { run_id } = await startRes.json();
try {
await runReportJob(); // your async work
await fetch(`${base}/${slug}`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ run_id })
});
} catch (e) {
// Optionally ping with failure status so monitor knows it didn't hang, it failed
await fetch(`${base}/${slug}`, {
method: 'POST',
body: JSON.stringify({ run_id, status: 'failure' })
});
throw e;
}Configure Timeout in DeadManPing
When you use start/stop tracking, set an expected max duration (e.g. 30 minutes). If no completion ping arrives within that window, DeadManPing marks the run as timed out and sends an alert—so you know the job is stuck, not just late.
You keep your script and cron. Add two HTTP calls: one at start, one at end. No SDK, no agent—just curl or your language's HTTP client.
Catch Stuck Scripts Before They Cost You
DeadManPing start/stop tracking measures execution time and triggers timeout alerts when jobs hang. Set up in minutes—add a start ping and a completion ping to your existing cron job.
Related Articles
Learn more about cron job monitoring and troubleshooting:
Monitor Cron Jobs: One Curl Line, No Migration
Monitor cron jobs with one curl line. No SDK, no migration—get alerts when jobs fail.
Cron Job Failed? How to Detect & Fix in 5 Min
Step-by-step guide to detect why your cron failed, fix it, and get alerts so it never happens again.
Verify Cron Job Completed
How to verify that cron jobs completed successfully.
Cron Job Not Executing
How to detect when cron jobs are not executing.