Ansible Automation Platform Jobs Stuck in Pending: Root Cause and Fix


Today I ran into one of those issues that looks complex, but turns out to be beautifully simple.

Every job I launched in Ansible Automation Platform (Tower / Controller 4.5) just sat there.
No output. No errors. No movement. Just PENDING.

And sometimes, silence is the loudest signal.

What I Saw

Everything looked healthy on the surface:

  • Job templates launched successfully
  • Jobs stayed in PENDING forever
  • No logs, no failures, no hints
  • Execution Environments looked perfectly fine

At first, I suspected Ansible itself: playbooks, environments, something deep.

But this didn’t feel like a playbook problem.
This felt like something wasn’t even starting.

The Turning Point

When jobs don’t start at all, it’s usually not Ansible… it’s scheduling.

So I went one level lower: services.

That’s when I checked Receptor, the quiet engine behind job execution in AAP 4.x.

And there it was:

systemctl status receptor
● receptor.service - Receptor
   Loaded: loaded (/usr/lib/systemd/system/receptor.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/receptor.service.d
           └─override.conf
   Active: failed (Result: exit-code) since Tue 2026-03-31 12:49:06 CDT; 4s ago
  Process: 946196 ExecStart=/usr/bin/receptor -c /etc/receptor/receptor.conf (code=exited, status=1/FAILURE)
 Main PID: 946196 (code=exited, status=1/FAILURE)

Mar 31 12:49:06 xxxx systemd[1]: receptor.service: Service RestartSec=100ms expired, scheduling restart.
Mar 31 12:49:06 xxxx systemd[1]: receptor.service: Scheduled restart job, restart counter is at 5.
Mar 31 12:49:06 xxxx systemd[1]: Stopped Receptor.
Mar 31 12:49:06 xxxx systemd[1]: receptor.service: Start request repeated too quickly.
Mar 31 12:49:06 xxxx systemd[1]: receptor.service: Failed with result 'exit-code'.
Mar 31 12:49:06 xxxx systemd[1]: Failed to start Receptor.

Now we were getting somewhere.

The Real Error

Running Receptor manually revealed the truth:

error opening Unix socket: could not acquire lock on socket file: no such file or directory

That one line explained everything.
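The failure is generic, and worth internalizing: a process cannot create a socket (or the lock file guarding it) inside a parent directory that no longer exists. A quick illustration with a plain file in a throwaway temp directory (nothing here touches Receptor itself):

```shell
# Simulate Receptor's situation: the parent directory is gone.
demo=$(mktemp -d)

# Creating a file inside a missing directory fails the same way.
touch "$demo/receptor/receptor.sock" 2>/dev/null \
  && echo "created" || echo "failed: parent directory missing"

# Recreating the parent directory is the whole fix.
mkdir -p "$demo/receptor"
touch "$demo/receptor/receptor.sock" && echo "created after mkdir"
```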

What Actually Broke

Receptor relies on a Unix socket that lives under this directory:

/var/run/receptor

Here’s the subtle part:

  • /var/run (linked to /run) is temporary
  • It gets cleared on reboot or system cleanup
  • The directory /var/run/receptor was simply… gone

And Receptor?
It doesn’t recreate it.

So it failed silently.
And when Receptor is down:

  • No execution capacity
  • No scheduling
  • Jobs stay in PENDING forever

No errors in the UI, because nothing ever reached that layer.
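You can confirm the temporary nature of /var/run on any systemd-based host (exact output varies by distro):

```shell
# /var/run is just a symlink into /run on systemd distros...
ls -ld /var/run          # typically: lrwxrwxrwx ... /var/run -> /run

# ...and /run is a tmpfs (RAM-backed) mount, so its contents
# do not survive a reboot.
awk '$2 == "/run" { print $1, $2, $3 }' /proc/mounts
```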

The Fix

Sometimes the fix feels almost too simple.

I recreated the directory:

mkdir -p /var/run/receptor
chown receptor:receptor /var/run/receptor
chmod 755 /var/run/receptor

Then restarted services:

systemctl start receptor
automation-controller-service restart

And just like that—

PENDING → RUNNING

The system came back to life.
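If this bites you more than once, the fix is easy to wrap in a small guard function. This is my own sketch, not anything shipped with AAP; the default path and owner are the ones from this incident, so adjust them for your install:

```shell
# ensure_receptor_dir: recreate Receptor's socket directory if missing.
# Defaults ($dir, $owner) are assumptions from this incident.
ensure_receptor_dir() {
    dir="${1:-/var/run/receptor}"
    owner="${2:-receptor:receptor}"
    if [ ! -d "$dir" ]; then
        mkdir -p "$dir" || return 1
        chown "$owner" "$dir" 2>/dev/null || true  # needs root on a real box
        chmod 755 "$dir"
        echo "recreated $dir"
    else
        echo "$dir already present"
    fi
}
```

Run it (as root) before starting Receptor, or drop it into a pre-start hook; it is a no-op when the directory already exists.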

Making It Permanent

Because /var/run is temporary, this would happen again after reboot.

So I made it persistent using systemd:

Create:

/etc/tmpfiles.d/receptor.conf

Add:

d /var/run/receptor 0755 receptor receptor -

Then apply:

systemd-tmpfiles --create

Now the directory is recreated automatically every time.

What This Taught Me

When everything looks fine, but nothing moves:

  • Don’t start with playbooks
  • Don’t chase UI clues
  • Go deeper

Sometimes the failure isn’t in automation…
It’s in the foundation that enables it.

A missing directory.
A silent service.
A system waiting for something that no longer exists.

Final Thought

Not every problem announces itself.

Some just sit there quietly…
like a job stuck in PENDING,
waiting for you to look where no one else does.

Jay
