March 16, 2026

How We Built FreeTitle with Google Gemini and Google Cloud

A Creative Director for AI Video Production

This post was created for the purposes of entering the Gemini Live Agent Challenge.

The Gemini Live Agent Challenge asks builders to go beyond the traditional text box and create next-generation multimodal agents with Google AI models on Google Cloud. For the Creative Storyteller track, that means building an agent that can think and create like a creative director, weaving together text, images, audio, and video into one cohesive experience.

That vision felt deeply aligned with what we have been building.

We created FreeTitle as a creative director for AI video production: an AI system that helps turn vision into launch-ready cinematic videos. Instead of forcing creators to piece together disconnected tools for writing, image generation, storyboarding, sound, and video, FreeTitle brings the process into one agent-native studio where an AI creative director can interpret intent, shape style, generate assets, and guide a project from concept to final output.

Why we built it

Having worked as an AI Engineer in Hollywood and as an independent filmmaker, I have seen a real tension around AI in the creative field. That tension is understandable. Filmmaking is not just output. It is taste, emotion, authorship, and craft. Many people resist AI because they are protecting something genuinely human and valuable.

At the same time, I have repeatedly seen a quieter tragedy: talented filmmakers, creators, and brands with strong ideas simply cannot bring their vision to life because production is too expensive, too slow, and too operationally complex. The distance between imagining a film and actually shipping it is still brutal.

Generative AI changes that equation. It lowers barriers and opens up entirely new possibilities for storytelling. But most creative AI workflows still feel fragmented and hostile to the process. They ask users to become prompt technicians, workflow engineers, and tool wranglers. Instead of supporting vision, they often bury it.

That is why FreeTitle exists.

We believe AI should not replace human creativity. It should function as a creative intelligence layer that carries the operational burden of production while preserving human taste, direction, and authorship. In fully autonomous mode, it can help drive the workflow end to end. In collaborative mode, it works like a creative teammate inside the user's studio.

Our goal is not to make an "AI slop" machine. It is to build a real creative director for AI video production: a system with the judgment, production skills, and multimodal fluency to help more people turn vision into cinematic work.

Why this challenge felt like the right place

What makes the Gemini Live Agent Challenge exciting is that it is not asking for another generic chatbot. It is asking builders to create software that feels more immersive, multimodal, and alive.

That matters because video production is inherently multimodal. A creative director does not work in isolated steps like "first text, then image, then video." Real creative work moves fluidly across narrative, visuals, sound, pacing, and revision. It requires a system that can hold all of those moving parts together.

The Creative Storyteller track felt like the right fit because it emphasizes interleaved output: experiences where narration, images, explanations, storyboards, audio, and motion are woven into one flow rather than generated as disconnected pieces. That is exactly how we think cinematic creation should feel.

For us, Gemini made it possible to build toward that kind of experience.

What FreeTitle does

FreeTitle helps creators go from idea to cinematic output inside a professional AI studio.

A project can begin from a brief, a mood, a visual reference, a brand concept, or a rough story idea. From there, FreeTitle can help develop the creative direction, scripts, characters, visual worlds, storyboard frames, shots, audio, and video assets needed to move the project forward.

Just as importantly, it supports both autonomy and collaboration.

If a creator wants speed, the system can take on large parts of the workflow autonomously. If they want tighter creative control, they can step in at any point to redirect, refine, and co-create. That balance is central to our vision. The future of creative AI should not be a black-box generator that erases authorship. It should be a system that expands what creators can do.

The product decision that changed everything

One of the most important choices behind FreeTitle was deciding that AI video production should not live only in a prompt box or inside a maze of nodes.

A finished video is linear, even when the creative process behind it is not. Scenes, shots, revisions, and final assembly all benefit from a clean production flow. So instead of building around fragmented chat prompts or graph-heavy canvases, we built FreeTitle AI Studio around an agent-native timeline workspace.

That matters for both people and AI.

For creators, the timeline makes the work easier to direct, review, and iterate. For the AI creative director, the workspace provides a much clearer operating environment: what scene is being discussed, what shot needs revision, what visual references belong where, and how the project is evolving as a whole.

The result is that the AI is not hidden behind a backend pipeline. It works alongside the user in the same studio. Users can watch it create in real time, interrupt it, leave feedback, and stay hands-on while the system continues guiding the project.
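
To make the idea of a shared operating environment concrete, here is a minimal Python sketch of a timeline that both a user and an agent could read from. It is an illustration only, not FreeTitle's actual data model: the class names (Timeline, Scene, Shot) and the agent_context helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    shot_id: str
    description: str
    reference_images: list[str] = field(default_factory=list)
    feedback: list[str] = field(default_factory=list)
    needs_revision: bool = False


@dataclass
class Scene:
    scene_id: str
    title: str
    shots: list[Shot] = field(default_factory=list)


@dataclass
class Timeline:
    """Hypothetical shared workspace: an ordered list of scenes and shots."""

    scenes: list[Scene] = field(default_factory=list)

    def find_shot(self, shot_id: str) -> tuple[Scene, Shot]:
        # Walk scenes in timeline order and return the owning scene too.
        for scene in self.scenes:
            for shot in scene.shots:
                if shot.shot_id == shot_id:
                    return scene, shot
        raise KeyError(shot_id)

    def leave_feedback(self, shot_id: str, note: str) -> None:
        # User feedback is stored on the shot and flags it for revision.
        _, shot = self.find_shot(shot_id)
        shot.feedback.append(note)
        shot.needs_revision = True

    def agent_context(self, shot_id: str) -> dict:
        # Everything an agent would need to know about one shot:
        # which scene it is in, its references, and what is still pending.
        scene, shot = self.find_shot(shot_id)
        return {
            "scene": scene.title,
            "shot": shot.description,
            "references": list(shot.reference_images),
            "feedback": list(shot.feedback),
            "pending_revisions": [
                s.shot_id
                for sc in self.scenes
                for s in sc.shots
                if s.needs_revision
            ],
        }
```

The design point is that feedback and references live on the timeline itself, so the question "which shot are we talking about?" has one answer for the person and the agent alike.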

How we built it with Google AI models

At the heart of FreeTitle is a flexible Gemini-powered creative system built for multimodal production.

We used Gemini as the reasoning engine for creative direction, planning, tool use, multimodal inspection, and contextual decision-making across the workflow. This is what allows the system to behave less like a fixed automation script and more like a creative director that can hold context, interpret intent, and adapt as the project evolves.
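
The difference between a fixed script and a reasoning-driven loop can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not our production code: the tool names are hypothetical, and scripted_model is a stand-in for a real Gemini call so the example runs without credentials.

```python
from typing import Callable

# Hypothetical tools. Real tools would call generation services.
def write_script(brief: str) -> str:
    return f"SCRIPT for: {brief}"


def generate_storyboard(script: str) -> str:
    return f"STORYBOARD from: {script}"


TOOLS: dict[str, Callable[[str], str]] = {
    "write_script": write_script,
    "generate_storyboard": generate_storyboard,
}


def run_director(model: Callable[[str, list[dict]], dict],
                 brief: str,
                 max_steps: int = 8) -> list[dict]:
    """Ask the model for the next step, run that tool, and repeat.

    The model sees the brief plus the full history, so it can adapt
    instead of following a hard-coded sequence.
    """
    history: list[dict] = []
    for _ in range(max_steps):
        step = model(brief, history)
        if step.get("done"):
            break
        output = TOOLS[step["tool"]](step["input"])
        history.append(
            {"tool": step["tool"], "input": step["input"], "output": output}
        )
    return history


def scripted_model(brief: str, history: list[dict]) -> dict:
    """Stand-in for a model call (assumption: a real system would ask
    Gemini to choose the next tool based on the same inputs)."""
    if not history:
        return {"tool": "write_script", "input": brief}
    if history[-1]["tool"] == "write_script":
        return {"tool": "generate_storyboard", "input": history[-1]["output"]}
    return {"done": True}
```

Calling run_director(scripted_model, "a lighthouse at dusk") returns a two-entry history: the script step, then the storyboard step built from its output. Swapping scripted_model for a function that calls a real model leaves the loop unchanged.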

We used Google AI models across the rest of the production stack as well.

Most importantly for this challenge, we designed FreeTitle around interleaved multimodal output. Instead of treating writing, images, sound, and video as separate systems glued together after the fact, FreeTitle is built to connect them into one evolving creative flow. A brief can lead to narrative direction, which leads to visuals, which informs motion, which feeds back into the overall artistic identity of the project.

That interleaving is not a side feature for us. It is the core creative experience.
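
One way to picture interleaved output is as a single ordered stream of typed parts rather than four separate result sets. The Python sketch below is a hypothetical illustration, not FreeTitle's implementation: the Part type, the "beat" grouping, and the ordering of media within a beat are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterator

# Order in which media kinds are emitted within one story beat (assumed).
KIND_ORDER = ("text", "image", "audio", "video")


@dataclass(frozen=True)
class Part:
    kind: str      # one of KIND_ORDER
    beat: int      # index of the story beat this part belongs to
    payload: str   # inline text, or a URI / placeholder for media


def interleave(beats: list[dict[str, str]]) -> Iterator[Part]:
    """Yield parts beat by beat so narration and media arrive together.

    Each beat is a dict mapping a media kind to its payload. Kinds a
    beat does not contain are simply skipped.
    """
    for index, beat in enumerate(beats):
        for kind in KIND_ORDER:
            if kind in beat:
                yield Part(kind=kind, beat=index, payload=beat[kind])
```

Because the stream is ordered by beat first and media kind second, a client can render narration next to its image or clip as parts arrive, instead of waiting for every asset of one type to finish.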

How we built it on Google Cloud

We also wanted FreeTitle to feel like a real deployed product, not just a local prototype.

FreeTitle is hosted on Google Cloud, with the backend deployed on Cloud Run and generated media plus traceable production artifacts stored in Google Cloud Storage. That setup gave us a practical and developer-friendly environment for shipping a working multimodal product quickly while still keeping the system production-minded.
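
Two small conventions do a lot of work in a setup like this, and they can be sketched with the Python standard library alone. The code is an illustration under assumptions, not our deployment code: the object naming scheme and both function names are hypothetical. The one platform fact it relies on is that Cloud Run tells the container which port to listen on through the PORT environment variable, with 8080 as the default.

```python
import os
from datetime import datetime, timezone
from typing import Mapping, Optional


def listen_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the port the service should bind to.

    Cloud Run injects PORT. Falling back to 8080 keeps local runs working.
    """
    env = os.environ if env is None else env
    return int(env.get("PORT", "8080"))


def artifact_object_name(project_id: str, shot_id: str, kind: str,
                         extension: str, created_at: datetime) -> str:
    """Build a traceable object name for a generated asset.

    Hypothetical layout: project, then shot, then media kind, then a
    UTC timestamp, so every file can be traced back to where it belongs.
    created_at should be timezone-aware.
    """
    stamp = created_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"projects/{project_id}/shots/{shot_id}/{kind}/{stamp}.{extension}"
```

With the google-cloud-storage client library, a name produced this way could be passed to bucket.blob(name) and uploaded with upload_from_filename. That step is left out here so the sketch runs without credentials.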

Google Cloud was a strong fit for this kind of project for a few reasons.

First, it made it straightforward to deploy and expose a real product experience rather than just a demo script. Second, it pairs naturally with Gemini-powered workflows, which matters when your application is deeply multimodal. Third, it supports the kind of iterative development independent builders need: fast deployment, manageable infrastructure, and room to grow as the product becomes more ambitious.

For us, Google Cloud was not just a hosting requirement. It was the environment that let us turn a creative systems idea into an accessible studio product.

A few design principles behind the system

We were careful not to build FreeTitle as a rigid production pipeline.

Real creative work rarely follows one fixed sequence. Sometimes you start with a script. Sometimes with a visual. Sometimes with a moodboard, a reference image, or even a single shot. Good creative systems need to support revision, reordering, experimentation, and iteration.

That is why FreeTitle is designed to be adaptive. The creative director can work across the project context, evaluate outputs multimodally, and guide the next step without forcing creators into one brittle workflow.

We also built a strong art-direction layer into the product through Stylization Boards. Instead of relying on vague prompting and hoping for a good aesthetic result, users can collaborate with the creative director to explore aesthetics, compare references, and lock in a moodboard for the whole project. That visual grounding helps the final output feel more coherent, more intentional, and more cinematic.

Finally, we cared a lot about observability and trust. Even though this is a creative product, it still has to behave like real software. That meant keeping generated media and production artifacts traceable, so the system stays dependable as it scales.

What we learned

One of the biggest lessons from building FreeTitle is that multimodal AI becomes much more powerful when it is treated as a creative system rather than a generator.

What makes a creative director valuable is not just that it can produce assets. It is that it can hold context, compare options, interpret taste, maintain style, and keep different media aligned toward one vision. That is where multimodal reasoning starts to feel genuinely useful for creative work.

This is also why the Gemini Live Agent Challenge felt so relevant. It encourages builders to create software that breaks the text-box paradigm and treats multimodal interaction as the center of the experience, not an add-on. That is the direction we believe in too.

Closing

We built FreeTitle because we believe creative AI should expand authorship, not flatten it.

The future we want is one where more creators, filmmakers, and brands can make ambitious cinematic work without needing a full studio budget or a patchwork of disconnected tools. Google Gemini gave us the multimodal intelligence to prototype that vision, and Google Cloud gave us the environment to turn it into a working product experience.

FreeTitle is still early, but the direction is clear: a creative director for AI video production, built with Google AI models and Google Cloud, designed to help more people bring vision to life.