I’m more and more convinced of one thing: AI video is heating up again not because models suddenly learned how to make films, but because the input layer changed.

For a while, everyone treated prompt writing like a contest. You had to know shot language, pacing, style cues, and how to avoid broken hands, broken faces, and broken subtitles. That works for power users. For everyone else, it gets old fast. People are not here to take a prompt exam. They are here to ship content.

The signals in today’s tech radar point in the same direction: AI video tools are moving from “help me describe better” to “help me describe less.” That is a big shift.

This wave is about intake, not just generation

If you only look at the surface, it’s easy to think the heat is back because the models improved again. Sure, they did. But I care more about a different change: tools are finally learning how to absorb natural human inputs.

You speak, they turn it into a video. You record your screen, they add subtitles, pacing, and packaging. You drop in raw material, they assemble something publishable.

That is much closer to real creative work than “give me a perfect prompt.”

Most creators don’t start with polished cinematic language. They start with a voice note, a screen demo, a product point, a rough idea. If the tool still expects people to talk like directors, the barrier is still too high.

The Product Hunt and GitHub signals line up nicely

A few names from today’s charts tell the story well.

Gemini Omni: content from any input, starting with video

Gemini Omni shows up with a very direct promise: from any input to content.

I like that direction because it stops assuming the user begins with a prompt. You can start from video, voice, assets, or a rough thought, and the system moves you toward a finished piece. That is a much healthier product assumption than “write something clever and try your luck.”

Useful tools do not force you to relearn how to express yourself. They continue from the input habits you already have.

Velo 2.0: voice and screen are already the most natural inputs

Velo 2.0 feels right to me for a simple reason: the most natural way humans communicate is not by writing a perfect prompt. It is by speaking, showing, and recording.

If you want to explain a feature, you talk. If you want to build a tutorial, you record the screen. If you want to present a product, you narrate while using it.

That is much more real than trying to describe a cinematic 16:9 shot with blue edge lighting and a slow push-in.

So for me, the real competition in AI video is not prompt skill. It is the ability to turn voice, screen captures, assets, and scripts into a finished video automatically.

Open-Generative-AI, Pixelle-Video, and hyperframes: open source is moving toward pipelines too

The GitHub side says the same thing.

Open-Generative-AI looks like an open content factory, bringing many image and video models into one studio. Pixelle-Video pushes hard toward a fully automated short-video engine. hyperframes goes even further with a blunt message: “Write HTML. Render video. Built for agents.”

The shared idea is obvious: video is no longer just about generating clips. It is becoming something you can orchestrate, reuse, and automate.

I especially like the hyperframes direction. HTML is structured by nature. It already works well for layout, data binding, templates, and batch reuse. Turning video into a structured asset is much more valuable long term than treating it as a one-off spectacle.
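
To make the “structured asset” point concrete, here is a minimal sketch of batch reuse with a plain HTML template. It uses only Python’s standard library and is not hyperframes’ actual API; the template markup, the field names (title, tagline, accent), and the render_scenes helper are all hypothetical.

```python
from html import escape
from string import Template

# Hypothetical scene template: one HTML frame with named slots.
SCENE_TEMPLATE = Template(
    '<section class="scene" style="--accent: $accent">'
    "<h1>$title</h1><p>$tagline</p>"
    "</section>"
)


def render_scenes(rows):
    """Fill the same template once per data row, escaping user text."""
    return [
        SCENE_TEMPLATE.substitute({key: escape(value) for key, value in row.items()})
        for row in rows
    ]


# Illustrative data only; a real pipeline might read this from a sheet or CMS.
rows = [
    {"title": "Export in one click", "tagline": "No timeline needed", "accent": "#2563eb"},
    {"title": "Captions & subtitles", "tagline": "Generated from your voice", "accent": "#16a34a"},
]

scenes = render_scenes(rows)
for html_scene in scenes:
    print(html_scene)
```

The template is written once and the data changes per video, which is what makes this kind of asset easy to reuse in batches. Turning the resulting HTML into frames is a separate rendering step that this sketch does not cover.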

Why I think “inputs” matter more than prompts now

Because prompts still force the user to translate.

You have an idea in your head, and then you have to translate it into model-friendly language. Translate well, and the result is decent. Translate badly, and the output drifts.

That’s fine for people who enjoy tinkering. It is too much friction for most content teams. Marketers, knowledge creators, small business owners, training accounts, and product managers do not want to become prompt specialists. They want to get the content out the door faster.

So I’m much more bullish on a different stack:

  • Make input as natural as possible. If speaking works, don’t make people type too much.
  • Automate the process. If it can be orchestrated, don’t make users hand-build it.
  • Keep output editable. Don’t freeze the result.
  • Keep templates stable. Don’t start from scratch every time.

Put those together, and you get something that actually behaves like a production tool.

The real competition is shifting from generation to orchestration

The old question was: can AI generate video?

That question is too small now. The better question is: can AI take care of the full video production chain?

A video is not just visual output. It includes:

  • topic and angle
  • script structure
  • asset collection
  • screen recording or generation
  • subtitles and voiceover
  • pacing and transitions
  • cover and title
  • format adaptation across platforms
  • post-publish review

Single-shot generation only solves a small part of that. Workflow tools need to solve the whole chain.
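
One way to picture “the whole chain” is as an ordered list of stages that each read from and add to a shared project record. The sketch below is a toy under stated assumptions: the stage names, the project fields, and the run_chain helper are hypothetical, and every stage body is a placeholder for real work.

```python
from typing import Callable

# A project is a plain dict that every stage reads from and adds to.
Project = dict
Stage = Callable[[Project], Project]


def write_script(project: Project) -> Project:
    # Placeholder: a real stage would call a model or a human editor.
    return {**project, "script": f"Three points about {project['topic']}"}


def add_subtitles(project: Project) -> Project:
    # Placeholder: treat the whole script as a single subtitle line.
    return {**project, "subtitles": [project["script"]]}


def adapt_formats(project: Project) -> Project:
    # Placeholder: one export file name per target platform.
    return {**project, "exports": [f"{p}.mp4" for p in project["platforms"]]}


def run_chain(project: Project, stages: list[Stage]) -> Project:
    """Run the stages in order, recording which ones have completed."""
    for stage in stages:
        project = stage(project)
        project = {**project, "done": project.get("done", []) + [stage.__name__]}
    return project


result = run_chain(
    {"topic": "screen recording", "platforms": ["shorts", "feed"]},
    [write_script, add_subtitles, adapt_formats],
)
print(result["done"])
print(result["exports"])
```

Because each stage only takes and returns the project record, a stage can be swapped, skipped, or rerun without touching the others, which is the difference between a workflow and a single generation call.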

That is why I’m both cautious and optimistic about the idea that people can make videos without writing prompts. Cautious, because I don’t buy the “one click, high-end masterpiece” fantasy. That sounds too much like course-selling fluff. Optimistic, because ordinary creators really are getting a way out of a lot of unnecessary technical overhead.

What I’m betting on

I’m most bullish on the combination of agent + template + multimodal input + editable output.

The agent understands the task and breaks it into steps. Templates keep style and structure consistent. Multimodal input lowers the expression cost. Editable output absorbs the revision work.

That stack feels like a production system, not a toy.

The best video tools in the next wave probably won’t open with a long prompt box. They’ll ask more human questions:

  • Which platform is this for?
  • Who is the audience?
  • What do you want people to do after watching?
  • Do you already have assets?
  • Should we reuse last time’s style?

Then, behind the scenes, they fill in the script, subtitles, shots, structure, and export settings for you.
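
As a rough sketch of that “fill in the rest” step, the questions above can be captured as a small brief and expanded into settings. Everything here is an assumption for illustration: the Brief fields, the preset names and values, and the plan_from_brief helper do not come from any of the tools mentioned.

```python
from dataclasses import dataclass

# Illustrative presets only; real platform requirements vary and change.
PLATFORM_PRESETS = {
    "vertical_short": {"aspect_ratio": "9:16", "max_seconds": 60},
    "landscape_long": {"aspect_ratio": "16:9", "max_seconds": 600},
}


@dataclass
class Brief:
    platform: str
    audience: str
    call_to_action: str
    has_assets: bool
    reuse_last_style: bool


def plan_from_brief(brief: Brief, last_style: str = "default") -> dict:
    """Expand a few human answers into settings a user would otherwise set by hand."""
    preset = PLATFORM_PRESETS[brief.platform]
    return {
        **preset,
        "style": last_style if brief.reuse_last_style else "default",
        "footage": "user_assets" if brief.has_assets else "generated",
        "outline": [
            f"Hook for {brief.audience}",
            "Main demo",
            f"Close: {brief.call_to_action}",
        ],
    }


plan = plan_from_brief(
    Brief("vertical_short", "new users", "start a free trial", True, True),
    last_style="clean-blue",
)
print(plan)
```

The user answers five plain questions, and the plan stays an ordinary dictionary, so every derived setting remains editable afterward.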

It sounds less flashy, but I think it is more honest. The things that change daily creative work are usually not the most dramatic features. They are the most effortless workflows.

Final thought

AI video is heating up again, but the center of gravity has changed.

The old game was about who could generate better. The new game is about who can absorb inputs better and turn them into a clean, reusable production line.

Put simply, the thing worth paying for is no longer a magic prompt. It is a system that can ship, reuse, and iterate.

That is the direction I like. Less theatrical, more useful.
