
Dream Popper - Hazy Memory Music Video
Ever tried to recall a distant memory, and it comes through in hazy fragments?
This music video was created using techniques similar to my earlier AI music video workflow, but I’ve since moved on to an improved pipeline. It still needs work, and I think I’m just going to have to get down and dirty in the code. Workflows are available below.
This time, instead of just Text-to-Video, I focused on using Image-to-Video to give me better control over the visuals. That way I could use LoRAs for specific styles. For consistent characters I just reused the same character prompts; no IPAdapter or character LoRA. I did train them, I just didn’t have time to get results.
To work in bulk, and because video generation took far longer than image generation, I wanted to bulk-run images first, pick my favourite outputs, and then convert only those to video using the same original prompt. Doing this with existing nodes was messy, which is why I’ll possibly look into making my own for how I want to work. Overall, the process was:
- Figure out a theme for the song and a style (I went for watercolor to add to the romantic and hazy memory theme)
- Generate scene ideas myself and then with ChatGPT as a “Wan2.1 prompting expert”
- Convert the descriptions into JSON
- Bulk generate images, at least 4 per prompt
- Save the prompt metadata to the image
- Pick my favourite images
- Load the images and get their original prompt from the metadata
- Run the I2V workflow (about 10 minutes per render at 832x480, 49 frames, 16 fps, 4 seconds), ×4 per prompt
- Pick my favourite video outputs
- Run the videos through ComfyUI Ultimate SD Upscaler, 2-4 frames at a time, with face detailer
- This took about 20-30 minutes per 4-second clip; it definitely needs improving
- Put the video together in CapCut (DaVinci Resolve is still dead on my computer and I can’t get it going)
- Trying to make the cuts work with the theme and the beats of the song
Overall this was a great project for taking AI video making to the next level for myself, though many issues remain. The amount of time spent generating images and then videos was insane. I’m sure there’s a way to improve this process and tweak the workflow to be more efficient.
The main upcoming improvement for the next video is consistent character generation. I lost a lot of time on this without getting the results I wanted, so I’ll leave it for another project. I also want better resource management by getting usable results sooner: in raw compute, each second of the final video probably took almost two hours of rendering, throwing stuff away, upscaling, and so on.
Links & Resources
The workflows are ugly, my apologies.
- 🧑‍🎨 Bulk Image Workflow
- 🎥 Bulk Video Workflow
- 📈 Upscale Workflow
- 🎙️ Suno - AI Music Generation
- 👷‍♂️ ComfyUI - Node-based AI pipeline
- 📺 Wan2.1 - I2V 1.3B
- 🎨 Watercolor Lora
Lyrics
It was 2012 in the summer
I was seventeen and you were eighteen
You said “We’re two lost souls
But we’re together”
And I swear
I swear
I fell in love right there
And it was the best summer of my life
You were the best thing I ever found
I was just too blind to see
That you were everything I need
And you were right there
It was 2012 in the summer
I was a stoner and you were a lover
I said “We can’t be friends
But I can’t be your girl”
And I swear
I swear
You fell out of love right there
But it was the best summer of my life
You were the best thing I ever found
I was just too blind to see
That you were everything I need
And you were right there (×6)