Teardown

Malicious-but-clean: the workflow attacks scanners can't see

2026 · ~6 min read

Most workflow scanners hunt for a flaw — an injection sink, a dangerous trigger, an unpinned action. But the most dangerous CI/CD changes don't have a flaw. They're syntactically perfect. The maliciousness is in the intent, not the syntax.

What "clean" looks like

Here's a workflow step that passes a linter, uses no deprecated features, has no injection vulnerability, and would survive most automated review:

- name: Publish build metrics
  run: |
    echo "$METRICS_PAYLOAD" | base64 -d | bash
  env:
    METRICS_PAYLOAD: ${{ secrets.METRICS }}

Nothing here is a "vulnerability." It's valid YAML, a normal run: step, a normal secret reference. A scanner looking for template-injection or a known-bad action sees nothing. But the step decodes and executes an opaque blob — and a reviewer skimming a 200-line diff for "the build" waves it through. The whole tj-actions and TeamPCP/Trivy class lives here: legitimate-looking steps doing illegitimate things.

Why pattern-matching can't win this

The open-source scanners (zizmor, octoscan, OpenSSF Scorecard) are genuinely good — at what they do, which is catching unintentionally vulnerable patterns: expression injection, dangerous $GITHUB_OUTPUT writes, missing permission scoping. They key on the shape of a known flaw.

A deliberately malicious workflow has no flaw to match. It uses standard syntax, references real actions, declares plausible permissions. There's no signature because the attacker isn't exploiting a bug — they're using the platform as designed, with bad intent. And the one contextual signal scanners do reach for — "is this author trustworthy?" — they answer with metadata (account age, org membership), which says nothing when the author is a real, trusted account that's been compromised or coerced.

You can't pattern-match intent. You have to reason about it.
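
To make that concrete, here's a toy signature check in Python. It's a sketch, not how zizmor, octoscan, or Scorecard are implemented: the regex and the function name find_expression_injection are assumptions made up for illustration. It flags an attacker-controllable expression expanded inside a run: script, which is the shape of a classic injection, and it has nothing to say about the step from earlier.

```python
import re

# Toy signature (an assumption for illustration, not a rule copied from any
# real scanner): an attacker-controllable GitHub context expanded directly
# inside shell text.
UNTRUSTED_EXPR = re.compile(
    r"\$\{\{\s*github\.(?:event\.[\w.]+|head_ref)\s*\}\}"
)


def find_expression_injection(run_script: str) -> list:
    """Return every untrusted ${{ ... }} expression found in a run: script."""
    return [m.group(0) for m in UNTRUSTED_EXPR.finditer(run_script)]


# A genuinely vulnerable step body: the PR title lands in the shell unescaped.
VULNERABLE_RUN = 'echo "Title: ${{ github.event.pull_request.title }}"'

# The "Publish build metrics" step body from earlier in this post.
CLEAN_LOOKING_RUN = 'echo "$METRICS_PAYLOAD" | base64 -d | bash'

print(find_expression_injection(VULNERABLE_RUN))
# ['${{ github.event.pull_request.title }}']
print(find_expression_injection(CLEAN_LOOKING_RUN))
# []
```

The second result is the problem in one line. You could add a rule for base64 piped into bash, but the attacker only has to rephrase the step; the signature chases the spelling, not the behavior.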

What detecting intent actually requires

The question isn't "does this match a bad pattern?" It's "is this change consistent with how this codebase and this person normally behave, and is there a plausible benign reason for it?" Answering that needs three things a regex doesn't have:

The semantics of the diff: what the change actually does, not what tokens it contains. Decoding and executing a blob, exfiltrating env to an external host, adding pull_request_target with a checkout of untrusted code and secret access. Spotting each one takes a reasoned judgment, not a string match.
The committer's posture: was this pushed during an anomalous session? With no linked PR? By an account that doesn't normally touch CI config? The same diff is benign from the maintainer at 2pm and a five-alarm fire from a web session in a new country at 3am.
A bias toward "explain it": a model that, given the diff and the context, has to articulate why a change is or isn't malicious, and that flags the changes it can't explain benignly.
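
Those three ingredients can be sketched as a review request. Everything named here is hypothetical: CommitterPosture, its fields, build_review_prompt, should_flag, and the prompt wording are assumptions for illustration, and no particular model or API is implied. The sketch only shows the diff and the posture signals going to the reviewer together, and the rule that a missing benign explanation gets flagged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommitterPosture:
    """Hypothetical identity signals; real field names will differ."""
    anomalous_session: bool
    has_linked_pr: bool
    usually_touches_ci: bool


def build_review_prompt(diff: str, posture: CommitterPosture) -> str:
    """Put the diff and who pushed it in front of the reviewer together."""
    return (
        "Describe what this workflow change does when it runs, "
        "then give the most plausible benign reason for it, or say NONE.\n"
        f"Anomalous session: {posture.anomalous_session}\n"
        f"Linked PR: {posture.has_linked_pr}\n"
        f"Author usually edits CI config: {posture.usually_touches_ci}\n"
        "Diff:\n"
        f"{diff}"
    )


def should_flag(benign_reason: Optional[str]) -> bool:
    """Flag any change the reviewer could not explain benignly."""
    return benign_reason is None or benign_reason.strip().upper() in {"", "NONE"}


posture = CommitterPosture(
    anomalous_session=True, has_linked_pr=False, usually_touches_ci=False
)
prompt = build_review_prompt(
    '+    echo "$METRICS_PAYLOAD" | base64 -d | bash', posture
)
print(prompt)
print(should_flag("NONE"))  # True
```

The design choice that matters is the default. A change isn't cleared because nothing matched; it's cleared because someone, human or model, could say what it's for.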

That's what an LLM is actually good at, and it's the gap the open-source tools leave open: they score the syntax, and nobody fuses the diff's intent with who pushed it.

Intent, fused with identity

Sentinel's intent layer reasons over each workflow diff (malicious vs. misconfigured vs. benign) in the context of the committer's identity posture, not in isolation. A clean-looking workflow pushed during an identity anomaly isn't "valid YAML." It's stage two of a kill chain. That fusion across domains is the whole point.
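
As a rough picture of what fusing a verdict with identity could look like, here's a minimal sketch. It is not Sentinel's implementation: the Verdict and Severity names, the fuse function, and the escalation rules are all assumptions chosen for illustration.

```python
from enum import Enum


class Verdict(Enum):
    """The three labels from the text above."""
    BENIGN = "benign"
    MISCONFIGURATION = "misconfiguration"
    MALICIOUS = "malicious"


class Severity(Enum):
    """Hypothetical response levels, lowest to highest."""
    IGNORE = 0
    TICKET = 1
    REVIEW = 2
    INCIDENT = 3


def fuse(verdict: Verdict, identity_anomaly: bool) -> Severity:
    """Combine the diff verdict with identity posture (assumed rules)."""
    if verdict is Verdict.MALICIOUS:
        return Severity.INCIDENT
    if verdict is Verdict.MISCONFIGURATION:
        return Severity.REVIEW if identity_anomaly else Severity.TICKET
    # Even a benign-looking diff gets human eyes when the session is off.
    return Severity.REVIEW if identity_anomaly else Severity.IGNORE


print(fuse(Verdict.BENIGN, identity_anomaly=False))  # Severity.IGNORE
print(fuse(Verdict.BENIGN, identity_anomaly=True))   # Severity.REVIEW
```

Same diff, same verdict, different outcome. The only input that changed is who pushed it and how.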
