How hard would it be for you to generate a Dockerfile that merges videos and overlays audio with ffmpeg?
Are you into building smaller solutions like that? I have a ton of people who would love a simple API that they could deploy.
They get blocked by all the API costs.
Assuming it would take a list of videos and merge them, you could also add audio, and it would call a webhook when it was done.
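A rough sketch of the core job in Python. It assumes a few things:

- ffmpeg is on the PATH.
- All input clips share the same codec, resolution, and frame rate, which the concat demuxer needs for a lossless `-c copy` merge.
- "Add audio" means replacing the merged video's soundtrack with one supplied track.

The function names (`build_concat_list`, `build_ffmpeg_args`, `notify_webhook`, `merge_job`) and the JSON webhook payload are hypothetical, not an existing API.

```python
import json
import subprocess
import tempfile
import urllib.request
from pathlib import Path


def build_concat_list(video_paths):
    """Return the text of an ffmpeg concat-demuxer list file."""
    lines = []
    for p in video_paths:
        # Each entry is: file '<path>'; a literal single quote is written as '\''
        escaped = str(p).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def build_ffmpeg_args(list_file, output, audio=None):
    """Build the ffmpeg argv: concat the listed videos, optionally swap in new audio."""
    # -safe 0 lets the list contain absolute paths
    args = ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(list_file)]
    if audio is None:
        # Plain merge: copy all streams without re-encoding
        args += ["-c", "copy"]
    else:
        # Take video from input 0 and audio from input 1 (assumed: replace, not mix);
        # -shortest stops at the end of whichever stream ends first
        args += ["-i", str(audio), "-map", "0:v:0", "-map", "1:a:0",
                 "-c:v", "copy", "-c:a", "aac", "-shortest"]
    args.append(str(output))
    return args


def notify_webhook(url, payload):
    """POST a JSON payload to the caller's webhook (hypothetical payload shape)."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def merge_job(video_paths, output, webhook_url, audio=None):
    """Run one merge with ffmpeg, then report success or failure to the webhook."""
    with tempfile.TemporaryDirectory() as tmp:
        list_file = Path(tmp) / "inputs.txt"
        # Relative paths in the list resolve against the list file's folder, so use absolute ones
        resolved = [Path(p).resolve() for p in video_paths]
        list_file.write_text(build_concat_list(resolved))
        result = subprocess.run(
            build_ffmpeg_args(list_file, output, audio), capture_output=True, text=True
        )
    payload = {"status": "done" if result.returncode == 0 else "failed", "output": str(output)}
    if result.returncode != 0:
        # Send the tail of ffmpeg's stderr so the caller can see what went wrong
        payload["error"] = result.stderr[-2000:]
    notify_webhook(webhook_url, payload)
    return payload
```

To mix the new audio with the original soundtrack instead of replacing it, you would need an audio filter such as `amix` rather than plain stream mapping. In a container, this could run on a Debian-based Python image with the `ffmpeg` package installed through apt, behind whatever small HTTP endpoint accepts the video list and webhook URL.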