OpenAI Codex App: Agentic AI run locally
Despite the "for coding" pitch, Codex is useful beyond software engineering
OpenAI released the “Codex app”, a nice GUI re-imagination of Codex CLI. It is currently only available for macOS, with support for Linux and Windows coming “in a week or two”. While it, like the other Codex products, continues to be positioned as centered around coding, its suitability for non-coding tasks was confirmed in the Codex AMA on X: possible tasks include testing, deploying, and interacting with Figma. Or video editing: finding interesting moments in a video.
The Codex App can be thought of as ChatGPT with access to local files and other Agentic AI features such as Skills or Automations. (The question of why there is a separation between ChatGPT and the Codex App was posed in the AMA, but not answered due to technical difficulties; Skills are textual instructions on how to perform certain tasks.)
Codex App introduces the new term “thread”. This is essentially the same as a conversation in ChatGPT or a session (created with “/new”, restored with “/history” or “codex resume”) in Codex CLI. One project (essentially a folder/directory in the filesystem being operated on) can have many such threads, and several threads can run an agentic processing plan in parallel. Caution: while you may switch between threads to work on or simply inspect, pressing the blue Update button in the upper left proved perilous: a drafted thread of mine got discarded.
When kicking off a prompt, Codex App shows the agentic work steps it is performing below the prompt (orange arrow). This looks more high-level and more polished than in Codex CLI, but can be expanded to the actual (Unix) command representation. As part of its processing, Codex creates an execution plan and ticks it off step by step (green arrow).
On the upper right, the GUI offers shortcuts to open the project folder in, e.g., Visual Studio Code, Finder, a Terminal session, or Xcode (blue arrow). This is especially handy when creating initial input files. Output files are typically linked in the response returned by the model.
By default, executions are sandboxed - and thus isolated - from the system. This mode of operation already allowed me to have a German Excel file translated - with the original formatting retained:
This took 6 prompts, but can likely be wrapped in a single skill. (Codex had produced a broken Excel file at some point, so I had to ask it to rebuild the file. It discovered by itself that it should use “a non-destructive approach (text-only replacement inside the original XML) so formatting and structure stay intact”.)
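To make the quoted “non-destructive approach” concrete, here is a minimal sketch of what text-only replacement inside an .xlsx archive might look like. An .xlsx file is a zip archive, and shared cell text lives in xl/sharedStrings.xml; everything else can be copied byte-for-byte so formatting stays intact. This is my own illustration (using only the standard library), not the code Codex actually generated:

```python
import io
import zipfile

def translate_xlsx_strings(src_bytes: bytes, replacements: dict[str, str]) -> bytes:
    """Copy an .xlsx (zip) archive, replacing cell text only inside
    xl/sharedStrings.xml; all other parts (styles, formatting, layout)
    pass through unchanged, so the original formatting is retained."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src_bytes)) as zin, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "xl/sharedStrings.xml":
                text = data.decode("utf-8")
                for german, english in replacements.items():
                    text = text.replace(german, english)
                data = text.encode("utf-8")
            zout.writestr(item, data)
    return out.getvalue()

# Demo with a tiny stand-in archive (a real .xlsx contains many more parts):
src = io.BytesIO()
with zipfile.ZipFile(src, "w") as z:
    z.writestr("xl/sharedStrings.xml", "<sst><si><t>Umsatz</t></si></sst>")
    z.writestr("xl/styles.xml", "<styleSheet/>")

result = translate_xlsx_strings(src.getvalue(), {"Umsatz": "Revenue"})
with zipfile.ZipFile(io.BytesIO(result)) as z:
    print(z.read("xl/sharedStrings.xml").decode())  # translated text
    print(z.read("xl/styles.xml").decode())         # untouched formatting
```

A real translation pass would of course need to respect XML escaping and cell boundaries; the point is merely that only one XML part is rewritten.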
For more involved processing where additional Python packages need to be installed, the sandbox is not sufficient: network access is restricted, so “pip” cannot install the additional packages. (However, during the Codex AMA it was hinted that the sandbox(es) can be configured, so this may be a problem only with the default sandbox.) One such missing package in my tests was BeautifulSoup for HTML processing - so Excel worksheet translation did work, but converting HTML to Markdown did not… a “jagged frontier” of a different kind.
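When pip is blocked, one workaround is to fall back to the standard library, which needs no network access. As an illustration (my own sketch, not something Codex produced), a bare-bones HTML-to-Markdown converter can be built on html.parser instead of BeautifulSoup:

```python
from html.parser import HTMLParser

class MarkdownConverter(HTMLParser):
    """Minimal HTML-to-Markdown converter using only the standard
    library -- a possible fallback when pip (and thus BeautifulSoup)
    is unavailable inside the sandbox. Handles only a few tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._href = ""

    def handle_starttag(self, tag, attrs):
        if tag in ("strong", "b"):
            self.parts.append("**")
        elif tag in ("em", "i"):
            self.parts.append("*")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "p":
            self.parts.append("\n\n")
        elif len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "a":
            self._href = dict(attrs).get("href", "")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in ("strong", "b"):
            self.parts.append("**")
        elif tag in ("em", "i"):
            self.parts.append("*")
        elif tag == "a":
            self.parts.append(f"]({self._href})")

    def handle_data(self, data):
        self.parts.append(data)

def html_to_markdown(html: str) -> str:
    conv = MarkdownConverter()
    conv.feed(html)
    return "".join(conv.parts).strip()

print(html_to_markdown("<h2>Codex</h2><p>It is <b>fast</b>.</p>"))
# → ## Codex
#
#   It is **fast**.
```

A library like BeautifulSoup would handle malformed HTML far more robustly; this merely shows that the default sandbox does not have to be a dead end for such tasks.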
Yet another - successful - test of mine was recording myself scrolling through a chat window and having Codex App create a Markdown “transcript” of the chat from that screen recording. It did, flawlessly as far as I can tell. Codex App achieved this by running ffmpeg to extract individual frames from the video. However, it needed reassurance that its “vision capabilities” were “state-of-the-art” and “superior to tesseract” before it used its own vision-language model (VLM) capabilities. This means that Codex adds video understanding to the OpenAI suite, something previously only easily had with Gemini.
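The frame-extraction step can be sketched as follows. The exact flags Codex chose are not visible in the UI; sampling one frame per second and the file names here are my assumptions, and actually running the command requires ffmpeg on the PATH:

```python
import subprocess  # only needed for the (commented-out) actual run

def frame_extraction_cmd(video: str, out_dir: str, fps: int = 1) -> list[str]:
    """Build the argv for extracting `fps` frames per second as PNGs."""
    return [
        "ffmpeg",
        "-i", video,                  # input screen recording
        "-vf", f"fps={fps}",          # sampling rate: frames per second
        f"{out_dir}/frame_%04d.png",  # numbered output frames
    ]

cmd = frame_extraction_cmd("chat-scroll.mov", "frames")
print(" ".join(cmd))
# To actually run it (ffmpeg must be installed):
# subprocess.run(cmd, check=True)
```

The resulting PNGs are what the model then reads with its vision capabilities instead of handing them to tesseract.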
While Codex App is already very useful, I have experienced some minor annoyances:
it’s slow: I haven’t measured execution times, but it seemed as if its requests to the server-side LLM were given lower priority than those from Codex CLI
no minimal reasoning mode: there are only “medium”, “high”, and “extra high” reasoning efforts to select from. The Visual Studio Code extension for Codex used to also allow “low” and “minimal” for speedy responses
frequent connection resets: Codex CLI suffers from the same when operated on the German ISP Congstar’s “Homespot” offering, which uses 4G - perhaps its carrier-grade NAT kills connections that sit idle for a short time while the reasoning model is thinking. (The issue has been reported several times, with reports being closed without a remedy.) I have no such issues on Telekom 5G, and it also seems to depend on the time of day.
Update 2025-02-08: Codex App can now ask for approval to reach outside the sandbox, e.g. for network access:



