Aileen 3: Finding What’s New in Talks, Podcasts, and Conference Sessions
A personalized AI agent that analyzes conference talks and podcasts against your priors to surface only what’s new
I have published Aileen 3, a generative-AI system that helps cut down the steady information (over-)flow from podcasts, conference sessions, and the like. Its predecessor, Aileen 2, built personalized summaries of German parliament proceedings. Aileen 3 broadens the scope and narrows the output by moving to the concept of expectation-driven information foraging.
Expectation-driven processing
Since conference talks like to rehash at length what has already been discussed elsewhere, the key approach of Aileen 3 is a personalized, contrastive approach to AI-driven summarization. Grounded in cognitive science, perceived novelty is treated as surprise.
Information is surprises. You learn something when things don’t turn out the way you expected — Roger Schank
Such surprises in a talk are modelled as prediction errors against the user's baseline expectations.
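The filtering idea can be sketched in a few lines. This is not the actual Aileen 3 code: the real system would use an LLM to judge whether a claim is predicted by the priors, and plain token overlap merely stands in for that comparison here.

```python
# Minimal sketch of prediction-error filtering: a claim from a talk
# counts as a "surprise" when it overlaps little with the user's stated
# priors. Token overlap is a stand-in for an LLM-based comparison.

def tokenize(text: str) -> set[str]:
    return {w.strip(".,").lower() for w in text.split()}

def is_surprise(claim: str, priors: list[str], threshold: float = 0.5) -> bool:
    """A claim is surprising if no prior explains most of its tokens."""
    claim_tokens = tokenize(claim)
    for prior in priors:
        overlap = len(claim_tokens & tokenize(prior)) / len(claim_tokens)
        if overlap >= threshold:
            return False  # the prior already "predicted" this claim
    return True

priors = ["RAG pipelines combine retrieval with LLM generation"]
claims = [
    "RAG combine retrieval with generation",                  # expected -> filtered
    "Speculative decoding doubled our inference throughput",  # novel -> kept
]
surprises = [c for c in claims if is_surprise(c, priors)]
```

Only the second claim survives: everything the priors already cover is dropped, which is what makes the final report dense rather than a full recap.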
At a high level, the Aileen 3 system - the "Aileen 3 Agent" - collects what the user already knows, as well as expects from a talk, as the so-called "priors". The goal of the system is to compile a dense report against these priors. The main user interaction - both entering the priors and receiving the output report - takes place in a Gradio web frontend.
Implementation
The backend behind this frontend is implemented using the Google Agent Development Kit (ADK). It runs several (sub-)agents - most crucially the Assistant Agent - which ultimately sends the report back to the frontend. This agent uses two tools: a Google Vertex AI Memory Bank and the Aileen 3 Core MCP server.
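The agent-plus-tools structure might look roughly like the sketch below. This is deliberately NOT the ADK API: in the real system the LLM decides which tool to call, while here a trivial dispatcher stands in for that loop, and the names (`memory_bank`, `aileen_core`, `assistant_agent`) are hypothetical.

```python
# Illustrative structure only -- not the Google ADK API. Two stand-in
# tools feed the Assistant Agent, which combines their results into a
# report, mirroring the described architecture.
from typing import Callable

def memory_bank(query: str) -> str:
    """Stand-in for the Vertex AI Memory Bank tool (retrieves priors)."""
    return f"priors related to: {query}"

def aileen_core(url: str) -> str:
    """Stand-in for the Aileen 3 Core MCP tool (media -> transcript)."""
    return f"transcript of: {url}"

TOOLS: dict[str, Callable[[str], str]] = {
    "memory_bank": memory_bank,
    "aileen_core": aileen_core,
}

def assistant_agent(talk_url: str) -> str:
    """Compile a report by calling both tools; the real agent triggers
    these calls via LLM-driven tool selection instead."""
    priors = TOOLS["memory_bank"](talk_url)
    transcript = TOOLS["aileen_core"](talk_url)
    return f"report contrasting [{transcript}] with [{priors}]"
```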
Memory Bank
The Memory Bank is the agent's persistent memory: a helper script sets it up and ingests, e.g., LinkedIn posts in order to form additional priors. Due to lack of time, this was not tested much, but the Memory Bank is queried as part of the agent execution. Note that Memory Bank is now a paid service within Google Vertex AI.
Aileen 3 Core
This encapsulates working with media: downloading, e.g., from YouTube, extracting (and translating) slides as still images, and transcribing the audio track. To curb Automatic Speech Recognition (ASR) hallucinations, the priors are used to steer and ground the ASR model (currently Gemini 3).
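One assumed shape for that steering (not Aileen 3's actual prompt) is to inject domain terms from the priors into the transcription instruction, so the ASR model prefers real terminology over phonetically similar hallucinations:

```python
# Sketch of prior-grounded ASR prompting. Capitalized words are used as
# a crude proxy for domain terms; build_asr_prompt is a hypothetical
# helper, not part of the Aileen 3 Core API.

def build_asr_prompt(priors: list[str]) -> str:
    vocab = sorted({w for p in priors for w in p.split() if w[0].isupper()})
    return (
        "Transcribe the audio verbatim. The talk likely uses these "
        "domain terms, so prefer them over similar-sounding words: "
        + ", ".join(vocab)
    )

prompt = build_asr_prompt(
    ["Kubernetes operators manage CRDs", "Istio handles mTLS"]
)
```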
Aileen 3 Core is a separate project, as it was submitted to the 1st MCP birthday competition hosted by Anthropic and Gradio. The demo space explains some of the challenges addressed; a demo video of this is on YouTube.
Because the Google ADK did not properly support agentic tool calling - there was a breaking change in Gemini 3 with regards to the LLM/RLM reasoning traces - the Aileen 3 Agent project uses a fork of ADK.
References
GitHub Links
Aileen 3 Agent
Multi-Agent system with user frontend
Aileen 3 Core
MCP server for agentic media access
Demo videos
focused on Aileen 3 Agent
including which tools were used in the course of the Kaggle Hackathon submission
focused on Aileen 3 Core MCP
Social media post on LinkedIn
Limitations
Aileen 3 is still an early system and has limitations. These include:
Slide translation is not prior-aware
No dedicated safety guardrails yet
There are no explicit guardrails for style/tone control or prompt-injection resistance at this stage.
No built-in cost controls
As with any generative AI system, hallucinations remain possible, so outputs should be verified before high-stakes use.