AI technology continues to evolve rapidly, and one of its most exciting and creative new capabilities is the ability to generate videos from text. If you haven’t tried this yet, it’s surprisingly fun and powerful. But it does take a little practice and know-how to get good results. After reading this, you’re going to be ready to create your first video masterpiece.
If you haven’t seen this funny (and wild) video, have a quick watch. You can now generate AI videos like this, complete with corresponding audio, for just $20/mo.
It’s a remarkable breakthrough. Let’s dive in and explore.
A brief history of text-to-video tech
The modern boom in text-to-video generation didn’t really begin until late 2022, when Meta’s Make-A-Video preview proved that diffusion models could turn ordinary prompts into coherent five-second clips. A few months later (March 2023), Runway’s Gen-2 became the first accessible product for everyday creators, and by that fall, YouTube’s Dream Screen embedded the technology directly into Shorts.
Recently, though, capabilities leaped forward with OpenAI’s Sora and with Google incorporating this technology directly into Gemini. The race is on. OpenAI’s Sora preview (Feb 2024) and Google DeepMind’s Veo (May 2024) extended clip lengths and added convincing physics and, in Veo’s case, native audio. While we were all still absorbing the advances in text-to-image generation, AI video generation has been evolving at a breakneck pace.
And now we have sound! Veo3 videos include sound by default, which is a first as far as I’ve seen. This makes sense, as Google has access to all of YouTube’s content for training its models, and I would imagine that is a remarkable and unique advantage in this field.
Meet the tools
OpenAI’s Sora and Google’s Veo3 are available through their AI chatbots with the respective $20/mo paid plans for ChatGPT and Gemini. To start exploring, simply head to the chatbot interface and click the option shown below.

Key Differences
Right off the bat, if you’re following along, you’ll immediately see that there are differences in the options available in each:
Sora’s format is different: it takes you out of ChatGPT into a dedicated space for Sora generation that also shows what others are creating in real time. In the prompt box there are various options you can choose:
Video or Photo
Aspect ratio (to create video for YouTube vs Mobile Phone use, for instance)
Resolution
Video duration (longer durations are available with the more expensive Pro plan)
Number of video variations to generate (so you can pick the best result)
Veo3 has fewer options, but it is integrated directly into Gemini, which makes using it a breeze. You do have an option to upload a photo and make a video out of it (something to try!). As noted above, sound is included in your result by default.
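If you do try the photo-to-video option, a simple starting prompt can be something like this (an illustrative example, not an official template):
📝 Example Prompt: Animate this photo: have the clouds drift slowly and the water ripple, and add soft ambient wind sound.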
Other Tools
For the purposes of this post, we’re going to stick with Sora and Veo3, as they are the most accessible for the many people who are already using an AI service, but there are plenty of other tools to explore. If interested, some are highlighted in the fantastic “GenAI steak off” post below.

Getting started
To give it a try, head to Sora or Veo3 and generate a short clip. The beauty of it is that it can be about just about anything. Let’s start with a simple idea and prompt, and compare what the results look like in our two tools.
📝 Example Prompt: Glass jar on a table, inside tiny storm clouds form and rain, then a rainbow.
This was my first result from the same prompt in Sora and Veo3. As you can see, the styles are a bit different, but overall they are very similar, with the most obvious difference being the sound included in the Veo version.
Project: Dad Joke Theater
Let’s take on a slightly more ambitious project and build out a series of videos spliced together. I actually haven’t done this before, so I’m starting from scratch and documenting the process below if you want to follow along.
For something like this, I’m going to start brainstorming with AI to help me plan my approach and figure out tools. I used GPT to brainstorm content for a fun one-minute video, and after considering a few alternatives, I decided to build out a collection of dad joke videos and splice them together using Microsoft Clipchamp (which is included in Windows 11). Let’s give it a go.
I went ahead and used Veo3 for everything because of its audio capabilities. I brainstormed with GPT on a bunch of dad jokes, chose a few, and then asked it for 8-second video prompts (in a Canvas for easy editing). I had to go back and forth a bit, but overall it worked well, and it provided a nice summary of prompts.
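For illustration, one of those 8-second prompts looked roughly like this (a reconstructed example rather than a verbatim prompt, with a placeholder joke):
📝 Example Prompt: 8-second video. A dad in a cardigan stands in a bright kitchen, looks straight at the camera, and delivers the punchline with a proud grin. VO AUDIO: “I used to hate facial hair, but then it grew on me.” Warm lighting, light sitcom feel, end on a slow zoom to his face.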
With prompts in hand, it was time to create the videos, so I headed to Gemini, clicked the video button, and started pasting.
It all worked well, but with some caveats. For instance, Veo3 would only generate the joke audio some of the time, but if I replaced “VO AUDIO” with “VO AUDIO WORDS” in the prompt, it seemed to do just fine. That one took some trial and error.
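To make that concrete with the illustrative prompt above, the voiceover line changes from VO AUDIO: “I used to hate facial hair, but then it grew on me.” to VO AUDIO WORDS: “I used to hate facial hair, but then it grew on me.” Same joke, different label, and that small tweak seemed to make the audio show up consistently.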
Also, you’ll likely be cut off at some point. I created so many videos that Gemini started limiting me to three videos per day (annoying!). This took a little longer than I was anticipating, but I finally got there, three videos a day at a time.
Next, time to pull it all together using Clipchamp. I asked GPT for a tutorial, but it’s a pretty intuitive tool. I just started playing around with it and splicing it all together.
The result is… 🥁 drumroll please 🥁… a fun little dad joke original.
I’d say for a first-timer, pretty good!
An overview of my process
Came up with the idea to create a short movie (but had no clue where to start).
Chatted with GPT to brainstorm, storyboard, and gather prompt ideas.
Used GPT to generate a list of dad jokes, chose a few favorites, and created corresponding prompts (using Canvas for clarity).
Opted to use Veo3 for video creation due to the built-in audio feature.
Discovered some quirks (like Veo3 often ignored voiceover unless I included the phrase “VO AUDIO WORDS” in the prompt).
Hit a daily limit: after generating too many, Veo3 restricted me to three videos per day—frustrating!
Downloaded all clips and stitched them together using Microsoft Clipchamp.
Uploaded the final product to YouTube. Voila!
Key takeaways
Time-consuming: The process took longer than expected due to platform limits.
Lack of clarity at times: Google didn’t give me a sense of how many videos I could produce, then cut me off for five days, then limited me to three videos a day. Is that standard? Unclear.
Prompt precision matters: I find prompts matter less with chatbots, but they matter a whole lot with video generation. Specifying details makes a big difference (see the example after this list).
Trial and error pays off: Sometimes it takes multiple attempts with the same prompt to get something close to what you’re looking for.
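To show what I mean by precision, here’s a hypothetical before-and-after (not from this project, just to illustrate the difference specificity makes):
📝 Vague: A dog at the beach.
📝 Detailed: 8-second video. A golden retriever sprints along a wet beach at sunset, kicking up sand, with the camera tracking low beside it. Waves roll in, warm orange light, gentle ocean sound.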
I also recommend this excellent WSJ article, which outlines the journalists’ process and experience. It’s fascinating to see what people who really know how to use these tools can do: This Film Was Made Entirely With Runway AI and Google’s Veo. It Nearly Broke Us. - WSJ
Conclusion
AI video generation is a fun and accessible tool for anyone. Whether you’re making serious content or something like my Dad Joke Theater project, there’s never been a better time to experiment with this technology.
What can we expect to see in the future? Looking ahead, I would expect to see general improvements in video and sound, longer scenes, increased frame-to-frame consistency, and finer control over the content and feel of the videos. As always, these tools today will be the worst they will ever be.
Give it a try, create something, and share it below. If anything, just have some fun with it.