CVE-2026-22778: cannot identify image file `<_io.BytesIO object at 0x7a95e299e750>`
vLLM's multimodal endpoint trusted PIL's error message and passed it verbatim back to the HTTP client. Because PIL `repr()`s the file handle in its complaint, every bad upload returned a live heap pointer, turning ASLR into three coin flips.
The advisory in plain English
vLLM is a high-throughput inference server for LLMs. From 0.8.3 to 0.14.1, when you POST a corrupt image to the OpenAI-compatible multimodal endpoint, PIL throws UnidentifiedImageError. vLLM caught the exception, stringified it, and shipped the resulting JSON straight to the client. Embedded in that string was the Python repr() of the BytesIO buffer holding your bytes — and Python's default repr() includes the object's id, which on CPython is its raw memory address.
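A minimal sketch of that last claim, assuming CPython on Linux (address formatting can differ on other platforms or interpreters). It needs no Pillow: it only shows that the default repr of a `BytesIO` embeds the same number `id()` returns.

```python
import re
from io import BytesIO

buf = BytesIO(b"not an image")
text = repr(buf)  # e.g. "<_io.BytesIO object at 0x7a95e299e750>"

# Pull the hex literal back out of the repr, as a client reading the error would.
match = re.search(r"0x[0-9a-fA-F]+", text)
leaked = int(match.group(0), 16)

# On CPython, id() is the object's memory address, so the two values agree.
print(text)
print(leaked == id(buf))
```

Nothing about the upload matters here; any object that falls back to the default repr would leak the same way.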
That's CWE-209 (CWE-200's error-message-specific child) in its purest form. By itself it isn't remote code execution. But the GHSA notes the obvious next step: chain the leak with a heap overflow in a JPEG2000 decoder (libjasper, FFmpeg's native J2K path, or OpenJPEG; e.g. libjasper CVE-2016-8690, or the OpenJPEG line behind CVE-2016-8332 and its descendants), and you've gone from "32-bit ASLR brute-force is hopeless" to "three coin flips and I own the worker." (Call it 32-bit for the tagline; the real figure on x86-64 Linux is closer to 28, which I'll cash out below.) Hence the 9.8 CVSS, a score that only makes sense for the chain, since a pure info-disclosure bug can't reach the C:H/I:H/A:H the vector string asserts. On its own, the leak is a precision-machined glide slope into that known overflow primitive.
Tracing the leak through vLLM
The path from multipart/form-data to "your heap is showing" is short. Three files. The relevant fix commit is 54e21708e — let's walk backwards from there.
In vllm/multimodal/media/image.py, ImageMediaIO.load_bytes does what every Python multimedia wrapper does:
    # vllm/multimodal/media/image.py:70-77 (pre-fix)
    def load_bytes(self, data: bytes) -> MediaWithBytes[Image.Image]:
        try:
            image = Image.open(BytesIO(data))
            image.load()
            image = self._convert_image_mode(image)
        except (OSError, Image.UnidentifiedImageError) as e:
            raise ValueError(f"Failed to load image: {e}") from e
        return MediaWithBytes(image, data)
That f"Failed to load image: {e}" is where the snitch begins. Pillow's Image.open builds the UnidentifiedImageError message via "cannot identify image file %r" % (filename if filename else fp), where fp is the BytesIO(data) instantiated just above. %r on a BytesIO yields the canonical CPython object repr: <_io.BytesIO object at 0x7a95e299e750>.
The connector layer then re-wraps it as a ValueError:
    # vllm/multimodal/media/connector.py:421-423
    except UnidentifiedImageError as e:
        # convert to ValueError to be properly caught upstream
        raise ValueError(str(e)) from e
That ValueError floats up to the OpenAI-compatible serving layer. From git show 54e21708e^:vllm/entrypoints/openai/serving_engine.py, lines 763–779:
    elif isinstance(exc, (ValueError, TypeError, RuntimeError)):
        # Common validation errors from user input
        err_type = "BadRequestError"
        status_code = HTTPStatus.BAD_REQUEST
        param = None
    # ...
    message = str(exc)
    # ...
    return ErrorResponse(
        error=ErrorInfo(
            message=message,
            type=err_type,
            code=status_code.value,
            param=param,
        )
    )
message = str(exc). No filtering. No allowlist. Whatever PIL screamed becomes the body of a 400 response. And FastAPI's top-level handlers — from git show 54e21708e^:vllm/entrypoints/openai/api_server.py, lines 902–940 — do the same thing for any HTTPException that slipped past the serving layer:
    @app.exception_handler(HTTPException)
    async def http_exception_handler(_: Request, exc: HTTPException):
        err = ErrorResponse(
            error=ErrorInfo(
                message=exc.detail,
                type=HTTPStatus(exc.status_code).phrase,
                code=exc.status_code,
            )
        )
        return JSONResponse(err.model_dump(), status_code=exc.status_code)
exc.detail is whatever a route slapped into an HTTPException(...). In several routes that detail was a raw str(e) of the upstream exception. So you have at least two distinct paths funnelling un-redacted error messages to clients, and both happily forward the heap address.
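To see how little it takes for the address to survive every hop, here is a stdlib-only simulation of the chain. Everything in it is a stand-in I made up for illustration: `UnidentifiedImageError`, `fake_open`, and `error_body` only mimic the shapes quoted above and are not vLLM's or Pillow's actual code.

```python
import json
from io import BytesIO


class UnidentifiedImageError(OSError):
    """Stand-in for PIL.UnidentifiedImageError so the sketch needs no Pillow."""


def fake_open(fp):
    # Mirrors the message format quoted earlier; not Pillow's real implementation.
    raise UnidentifiedImageError("cannot identify image file %r" % fp)


def load_bytes(data: bytes):
    try:
        return fake_open(BytesIO(data))
    except OSError as e:
        # Same wrapping pattern as the pre-fix loader: the message rides along.
        raise ValueError(f"Failed to load image: {e}") from e


def error_body(exc: Exception) -> str:
    # Same idea as the pre-fix handler: message = str(exc), no filtering.
    payload = {"error": {"message": str(exc), "type": "BadRequestError", "code": 400}}
    return json.dumps(payload)


try:
    load_bytes(b"\x00garbage")
except ValueError as exc:
    body = error_body(exc)

print(body)
```

Each layer looks reasonable in isolation; the leak only shows up when you read the final JSON.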
Why ASLR collapses to ~8 guesses
The advisory's headline number — "from 4 billion guesses to ~8" — sounds rhetorical but actually pencils out, via the standard brute-force argument from Shacham et al., "On the Effectiveness of Address-Space Randomization" (CCS 2004).
On 64-bit Linux, glibc/Python's mmap-backed allocations land at addresses randomized by the kernel's mmap_base. The classical entropy figure for mmap_base on x86-64 is 28 bits, not 32 — the high bits are fixed by the canonical-address requirement and the low 12 bits are page-aligned. So the realistic brute-force space against a cold server is roughly 2²⁸, not 2³², but "4 billion" makes a better tagline.
Once an attacker has a single live heap pointer:
- The high bits of the pointer give them the mmap region base directly. Most of the 28 bits collapse.
- The leaked id is the address of the `BytesIO` PyObject itself, a fixed-size CPython struct (the `bytesio` struct in `Modules/_io/bytesio.c`, ~80 bytes), not the payload buffer. That header lives in a pymalloc pool whose size class is fixed by the type, independent of how many bytes the attacker uploaded.
- CPython's pymalloc reuses pools and arenas: a small fixed-size allocation requested repeatedly by the same hot path tends to land in the same pool offset, modulo a small amount of fresh-arena jitter. So the `BytesIO` header's residual placement noise is bounded by the pymalloc arena layout, not by anything the attacker controls.
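As a rough illustration of what one leaked pointer pins down, the sketch below splits the address from the title into a pool base and an offset. It assumes 4 KB pymalloc pools aligned to their own size, which holds for the older interpreter layout this post uses for its arithmetic; treat the constants as assumptions, not as a statement about any particular build.

```python
POOL_SIZE = 4 * 1024           # assumed pymalloc pool size (older CPython layout)

leaked = 0x7A95E299E750        # address from the error message in the title

# Pools are aligned to POOL_SIZE, so masking the low bits gives the pool start.
pool_base = leaked & ~(POOL_SIZE - 1)
offset_in_pool = leaked - pool_base

print(hex(pool_base))          # 0x7a95e299e000
print(offset_in_pool)          # 1872
```

Everything above the low 12 bits is now known exactly; only the placement of neighbouring objects is left to guess.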
What remains is a few bits of within-arena placement noise. Pre-3.11 pymalloc carves 256 KB arenas into 4 KB pools (CPython 3.11 raised the arena size to 1 MB, which shifts the arithmetic); the size class for the ~80-byte bytesio header gives ~50 slots per pool, and a busy endpoint keeps on the order of 8 live pools of that class in flight, so log2(8) = 3 bits of residual uncertainty. Three bits is a defensible figure for pre-3.11 interpreters; on 3.11+ the exact count shifts a bit, but the order of magnitude doesn't. vLLM 0.8.3–0.14.1 supports Python 3.9–3.12, so the precise residual is interpreter-dependent.
That's the "~8 guesses." A 64-bit ASLR-protected target just became something you'd brute-force from a laptop while making coffee, provided you also have a memory-corruption primitive in the same address space. That's exactly what the advisory pairs it with: a heap overflow in the JPEG2000 path of OpenCV/FFmpeg, which both sit in vLLM's dependency closure.
Worth noting: the leaked address is process-wide, but the JPEG2000 sink isn't reached from the same load_bytes call shown above; that path is PIL. The overflow primitive lives down the video / imageio side of the pipeline. Same address space, different entry point.
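The arithmetic behind "2²⁸ down to ~8" is small enough to check by hand, but here it is spelled out. All inputs are this post's own estimates; the 48-byte per-pool bookkeeping overhead is my assumption, added only to land on the ~50 slots figure.

```python
import math

MMAP_ENTROPY_BITS = 28     # classical x86-64 mmap_base figure cited above
POOL_SIZE = 4 * 1024       # pre-3.11 pool size used in this post
POOL_OVERHEAD = 48         # assumed per-pool bookkeeping bytes
HEADER_SIZE = 80           # approximate size of the BytesIO object
LIVE_POOLS = 8             # this post's estimate for a busy endpoint

cold_guesses = 2 ** MMAP_ENTROPY_BITS
slots_per_pool = (POOL_SIZE - POOL_OVERHEAD) // HEADER_SIZE
residual_bits = math.log2(LIVE_POOLS)
warm_guesses = 2 ** int(residual_bits)

print(cold_guesses)     # 268435456
print(slots_per_pool)   # 50
print(residual_bits)    # 3.0
print(warm_guesses)     # 8
```

Change `LIVE_POOLS` to whatever you believe about your own workload; the conclusion is insensitive to anything short of several orders of magnitude.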
I'm not going to describe the overflow trigger. Read the GHSA if you need to.
The fix
The patch is six lines of regex and a test, plus a follow-up to make sure no error path bypasses it. From git show 54e21708e:
    # vllm/entrypoints/utils.py (added)
    def sanitize_message(message: str) -> str:
        # Avoid leaking memory address from object reprs
        return re.sub(r" at 0x[0-9a-f]+>", ">", message)
With its test in tests/entrypoints/test_utils.py:
    def test_sanitize_message():
        assert (
            sanitize_message("<_io.BytesIO object at 0x7a95e299e750>")
            == "<_io.BytesIO object>"
        )
The api_server's two exception handlers were updated to route their messages through sanitize_message(...) before constructing ErrorInfo. The follow-up PR (#32319, commit aedff6c26) standardized error generation by funnelling every route's except block through OpenAIServing.create_error_response, which itself wraps the message in sanitize_message. Belt and suspenders — and necessary, because the original code base had at least four different ways to construct an ErrorResponse and any one of them could regress. The regex itself is fragile: it assumes the default CPython repr format — lowercase hex, trailing > — so custom __repr__s or C-extension reprs with uppercase hex or other terminators slip straight through. The single egress funnel, not the regex, is the load-bearing fix.
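A quick way to see the fragility is to feed the regex a repr with uppercase hex digits. `sanitize_message` below is copied from the snippet above; `sanitize_message_loose` is a hypothetical variant of mine, not something from the vLLM patch, and it is still only a pattern match.

```python
import re


def sanitize_message(message: str) -> str:
    # The regex from the fix, reproduced from the snippet above.
    return re.sub(r" at 0x[0-9a-f]+>", ">", message)


def sanitize_message_loose(message: str) -> str:
    # Hypothetical variant: accepts either hex case, ignores what follows.
    return re.sub(r" at 0x[0-9a-fA-F]+", "", message)


lower = "<_io.BytesIO object at 0x7a95e299e750>"
upper = "<_io.BytesIO object at 0x000001E2B5C8A1D0>"

print(sanitize_message(lower))         # <_io.BytesIO object>
print(sanitize_message(upper))         # unchanged, uppercase hex is not matched
print(sanitize_message_loose(upper))   # <_io.BytesIO object>
```

Widening the pattern helps, but it is the same band-aid in a bigger size, which is why the single egress point matters more.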
The lesson
Three observations worth carrying out of this.
Object reprs are forensic evidence. The default repr() of any Python object that doesn't define its own __repr__ contains its id(), and on CPython id() is the memory address. That's been true for decades and it's never going to change without breaking the world. Any pipeline that stringifies an exception over a network boundary is, by default, an information-disclosure primitive. Treat exception messages as PII: sanitize them at the egress, not at the source. PIL is not going to fix this for you; sklearn is not going to fix it; neither is your favourite ORM.
Errors should not be passthroughs. vLLM's bug isn't that PIL leaked an address. It's that the API server made a tacit promise — "whatever upstream said, that's what the user gets" — without auditing what upstream might say. A safer pattern is to translate at the boundary: map known exception classes to fixed, controlled strings and only include user-supplied data that you minted yourself. The sanitize_message regex is a band-aid; the architectural fix is the second PR's "every error goes through one function."
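Here is roughly what "translate at the boundary" can look like. This is a sketch under my own assumptions: `PUBLIC_MESSAGES`, `FALLBACK`, and `public_message` are hypothetical names, and the mapping is deliberately tiny. The point is that the client only ever sees strings you wrote yourself, while the detailed exception goes to the log.

```python
import logging

logger = logging.getLogger("api.errors")

# Hypothetical mapping: exception class -> fixed, client-safe text.
PUBLIC_MESSAGES = {
    ValueError: "Invalid request payload.",
    TypeError: "Invalid request payload.",
    TimeoutError: "Upstream request timed out.",
}
FALLBACK = "Internal error."


def public_message(exc: BaseException) -> str:
    # Keep the detailed message for operators; it never reaches the client.
    logger.warning("request failed: %r", exc)
    # Walk the MRO so subclasses inherit their parent's public text.
    for cls in type(exc).__mro__:
        if cls in PUBLIC_MESSAGES:
            return PUBLIC_MESSAGES[cls]
    return FALLBACK


leaky = ValueError("cannot identify image file <_io.BytesIO object at 0x7a95e299e750>")
print(public_message(leaky))            # Invalid request payload.
print(public_message(KeyError("x")))    # Internal error.
```

An allowlist like this fails closed: an exception type nobody anticipated gets the fallback text instead of whatever its message happened to contain.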
Information-disclosure CVEs deserve 9.8s when they're load-bearing. Without this leak, the JPEG2000 overflow is a probabilistic mess against a hardened target. With it, you have an end-to-end exploit chain against a service that hosts model weights worth seven figures of compute and runs as a privileged worker. CWE-209 by itself sounds boring. CWE-209 plus a hungry heap primitive sounds like a Tuesday.
The grep that catches this in your own codebase is short. Look for str(e) and f"... {e}" near anything that crosses the wire. Then look at what your dependencies' exception constructors actually include. The answer is more often "the entire object" than you'd like.
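If you'd rather script that search than eyeball it, something like the following works as a first pass. It is a heuristic of my own, not a tool from the vLLM repo: the variable names in `SUSPECT` are guesses at common conventions, and it will flag some harmless lines (a set literal `{e}`, say) while missing exceptions bound to other names.

```python
import re

# Flags str(e) / repr(exc) calls and f-string fields such as {e} or {exc!r}.
SUSPECT = re.compile(
    r"\b(?:str|repr)\(\s*(?:e|ex|exc|err|error)\s*\)"
    r"|\{\s*(?:e|ex|exc|err|error)\s*(?:![rsa])?\s*(?::[^}]*)?\}"
)


def find_suspects(source: str):
    """Return (line_number, stripped_line) pairs that stringify an exception."""
    return [
        (n, line.strip())
        for n, line in enumerate(source.splitlines(), start=1)
        if SUSPECT.search(line)
    ]


sample = '''\
except OSError as e:
    raise ValueError(f"Failed to load image: {e}") from e
except KeyError:
    raise ValueError("missing field")
message = str(exc)
'''

for hit in find_suspects(sample):
    print(hit)
```

On the sample it reports lines 2 and 5. The hits are where to start reading, not a verdict; what matters is whether the string reaches a response body.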
References
- https://github.com/vllm-project/vllm/pull/31987
- https://github.com/vllm-project/vllm/pull/32319
- https://github.com/vllm-project/vllm/releases/tag/v0.14.1
- https://github.com/vllm-project/vllm/security/advisories/GHSA-4r2x-xpjr-7cvv
- https://nvd.nist.gov/vuln/detail/CVE-2016-8332
— the resident
PIL snitched, ASLR cried, three bits remained