Reverse Engineering with Codex
Sparks of AGI, revisited
Back in 2023, the Sparks of AGI paper approached reverse engineering through reverse prompting: GPT-4 instructed the user to run Unix tools like file, strings, a debugger, and a disassembler, and then reasoned from the outputs. The paper’s reverse engineering example is essentially a supervised, tool-mediated process where the model explains each step and the human executes it: the model moves from file format and strings to assembly, then uses a debugger to inspect execution and derive the password logic. The workflow is explicit and careful, and the human remains the hands.
In my own GPT-4 experiments at that time, I compiled small C snippets to assembly with “gcc -S” and asked the model to decompile them. It could recover plausible function names from the assembly, which suggested some semantic understanding. One frustrating limitation was the missing search engine integration: hallucinated tool invocations had to be caught and corrected by hand, which added manual labor.
For a recent interoperability project, I revisited the idea of using a coding agent as an agentic execution environment, moving from GPT-4 era reverse prompting toward autonomous, direct tool execution. I’d already been using OpenAI Codex outside programming tasks - for writing, including citation management - which made it natural to apply this coding-first environment to reverse engineering.
From reverse prompting to agentic execution
Reverse prompting is still a useful mode, but the shift to an agentic execution environment changed the cadence. Instead of the model asking the human to “run this, then paste output,” the agent runs the commands itself, adjusts based on the results (including errors), and keeps a consistent working memory across steps.
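To make the difference concrete, here is a minimal sketch of the "run, observe, adjust" loop. It is not how Codex is implemented; `StepResult`, `run_step`, and `run_plan` are hypothetical names, and the exit codes used for missing tools and timeouts are conventions chosen for this illustration.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class StepResult:
    """Everything the agent gets to see after one tool invocation."""
    command: List[str]
    returncode: int
    stdout: str
    stderr: str


def run_step(command: List[str], timeout: int = 60) -> StepResult:
    """Run one command and capture output and errors instead of raising."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
        return StepResult(command, proc.returncode, proc.stdout, proc.stderr)
    except FileNotFoundError as exc:
        # Tool is not installed: report it as a result the agent can react to.
        return StepResult(command, 127, "", str(exc))
    except subprocess.TimeoutExpired:
        return StepResult(command, 124, "", "timed out")


def run_plan(commands: List[List[str]]) -> List[StepResult]:
    """Run commands in order, keeping a history as working memory."""
    history: List[StepResult] = []
    for command in commands:
        result = run_step(command)
        history.append(result)
        if result.returncode != 0:
            break  # a real agent would re-plan here based on stderr
    return history
```

The point of the sketch is that errors become ordinary observations in the history, rather than something a human has to paste back into a chat.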
A process, not a recipe
Starting from an Android app packaged as an APKX, there was no repeatable general method to follow. Discovering the undocumented APIs required an ad-hoc, exploration-driven approach. The path taken by Codex looked roughly like this:
Decompile the app with jadx to get a searchable codebase.
Sweep for strings and endpoints using broad `rg` searches to locate base URLs, API paths, and header names.
Follow the serializers to recover request schemas and required fields.
Infer authentication flow from endpoint annotations and token model classes.
Build a PoC client and use live responses to correct wrong assumptions.
Notice the missing secret in the JVM layer and pivot to analyzing the native-code library included in the APKX.
Extract and map hidden strings by parsing ELF sections, disassembling JNI glue, and decoding embedded data.
Iterate on errors (auth header choice, HTTP status codes, UUID format) until the service accepted the payload.
Implement the final project based on the newly acquired background knowledge.
The steps were not known upfront, and each stage depended on the previous stage’s surprises.
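The string-and-endpoint sweep from the list above can be approximated in a few lines. This is a rough sketch, not the searches Codex actually ran: the regular expressions, the `sweep` function, and the file suffixes are assumptions chosen for illustration, and a real sweep with `rg` would be broader and faster.

```python
import re
from pathlib import Path

# Absolute URLs, e.g. base URLs embedded as string constants.
URL_RE = re.compile(r'https?://[^\s"\'<>)]+')
# Quoted strings that look like API paths, e.g. "/users/{id}".
PATH_RE = re.compile(r'"(/[A-Za-z0-9_\-{}./]+)"')


def sweep(root, suffixes=(".java", ".kt", ".xml")):
    """Collect URL and path-like string literals from a decompiled tree."""
    hits = {"urls": set(), "paths": set()}
    for file in Path(root).rglob("*"):
        if not file.is_file() or file.suffix not in suffixes:
            continue
        text = file.read_text(encoding="utf-8", errors="replace")
        hits["urls"].update(URL_RE.findall(text))
        hits["paths"].update(PATH_RE.findall(text))
    return hits
```

Pointing `sweep` at the jadx output directory would yield candidate base URLs and paths to follow up on; the results are leads, not a confirmed API surface.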
Data hiding and native detours
A good example of the power of the agentic harness was a secret API key that had been moved out of the Java or Kotlin code into a native library. The app did not store the key in obvious string resources. Instead, it used a JNI bridge: Java called a native method that returned a string based on an index, and the native code assembled the response from a hidden string table. The table itself wasn’t plain text - it was a base64-encoded CSV embedded in the ELF’s read-only data section. At runtime, the native function decoded the CSV and returned a specific entry.
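The lookup logic described here is simple once written down. The sketch below mimics it in Python under stated assumptions: the table contents are made up, the function names are hypothetical, and the CSV is assumed to be UTF-8 with entries indexed in reading order. The real native code may differ on all of these points.

```python
import base64
import csv
import io


def decode_string_table(blob: bytes) -> list:
    """Decode a base64-encoded CSV blob into a flat list of strings."""
    text = base64.b64decode(blob).decode("utf-8")
    rows = csv.reader(io.StringIO(text))
    return [cell for row in rows for cell in row]


def get_hidden_string(blob: bytes, index: int) -> str:
    """Mimic the native method: return one table entry by index."""
    return decode_string_table(blob)[index]


# Made-up demo table; the real blob would be read from the library.
DEMO_BLOB = base64.b64encode(b"User-Agent,X-Api-Key,not-a-real-secret")
```

With this demo table, `get_hidden_string(DEMO_BLOB, 2)` returns the third entry. Once the blob is located, the "hiding" reduces to one decode and one list index.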
This was an instance where tool choice became part of the process. Once it was clear that strings were pulled from a .so library via JNI, the plan shifted. The agent selected tools that fit the artifact - lief to parse ELF sections and symbols, and capstone to disassemble the JNI stub and map the native function. Those were installed on demand, then used to scan .rodata for base64 blobs and trace how an index returned a string.
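Scanning read-only data for base64 blobs can be sketched with the standard library alone. The function below takes raw bytes, which in practice would be the .rodata contents obtained from an ELF parser such as lief; the minimum run length, the printable-ratio threshold, and the name `find_base64_blobs` are assumptions for this illustration, not what the agent actually ran.

```python
import base64
import binascii
import re

# Runs of at least 16 base64 alphabet characters, with optional padding.
B64_RUN = re.compile(rb"[A-Za-z0-9+/]{16,}={0,2}")


def find_base64_blobs(data: bytes, min_printable: float = 0.9):
    """Return (offset, decoded_bytes) for runs that decode to mostly text."""
    results = []
    for match in B64_RUN.finditer(data):
        candidate = match.group()
        if len(candidate) % 4 != 0:
            continue  # not a well-formed base64 length
        try:
            decoded = base64.b64decode(candidate, validate=True)
        except binascii.Error:
            continue
        # Keep only blobs that decode to mostly printable ASCII.
        printable = sum(32 <= b < 127 or b in (9, 10, 13) for b in decoded)
        if decoded and printable / len(decoded) >= min_printable:
            results.append((match.start(), decoded))
    return results
```

The printable-ratio filter is a heuristic to drop symbol names and padding that happen to look like base64. It will miss blobs that decode to binary data, which is acceptable when the target is a text table.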
The practical effect is that simple obfuscation patterns are no longer a meaningful barrier against a tool-driven analysis process that can parse, disassemble, and decode in minutes.
Collaboration still mattered
Even though reverse engineering the Android app was largely automated, the larger project was not a hands-off miracle. Codex needed help with downloading the APKX, and with background such as protocol details and the deployment context for the new solution to be implemented. Codex made the process tighter, but it is still collaborative.
Takeaway
The paper “Sparks of AGI” from three years ago discussed a powerful but human-executed style of reverse engineering. The 2025 breakthrough in coding agents, where the model can run the tools itself, promises utility beyond just coding. That doesn’t make reverse engineering magical, but it does make it practical. The result is a tighter loop, fewer manual steps, and a higher tolerance for complexity - especially in the static analysis phases that used to soak up the most time.
