How do you implement scheduled tasks in your SaaS?

SteveMcLeod · March 29, 2017, 8:44am

When running multiple servers in a high availability setup, how did you implement the running of daily scheduled tasks such as sending summary emails or billing?

I’m running Feature Upvote on two identical AWS servers for high availability. I now want to add daily scheduled tasks (the kind typically done with cron or, in my case, with the Java ecosystem’s Quartz Scheduler). However, I don’t want BOTH AWS servers to run daily tasks such as sending summary emails, because customers would then get duplicates. Is using a third “worker” server the best solution?

ezekg · March 29, 2017, 2:14pm

Yeah, I’ve always used a separate worker server for background and recurring jobs.

maximus · March 29, 2017, 3:11pm

We run schedulers on all symmetrical nodes behind a load balancer in a cluster. Only one scheduler is active at any given time. The schedulers report their status to Redis and check “who is active” every 30 seconds. If a node goes down, the next available scheduler activates itself. There is a locking mechanism (based on Redis) in place to make sure only one scheduler is active at a time. Redis is shared between all nodes (for HA we use a master-slave deployment).

By the way, in our case the schedulers don’t actually execute the tasks (sending emails, in your case). They asynchronously call a REST endpoint on the load balancer, which routes the request to any node in the cluster.

SteveMcLeod · March 29, 2017, 3:40pm

Can you please explain how you did this (or tell me a good keyword to search for; my own attempts at finding a good explanation for doing this are failing)? Is it something built into Redis?

maximus · March 29, 2017, 4:01pm

Each node is trying to say “I’m active” by using
https://redis.io/commands/setnx

If setnx returns 0, the slot is already taken by another node.

maximus · March 29, 2017, 4:06pm

If a node misses its next health check, it is considered inactive, so the key can be deleted.
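To make the idea concrete, here is a minimal sketch of that heartbeat, not the actual implementation. It swaps in a key expiry (TTL) for the explicit "delete the key after a missed health check" step, and it uses a small in-memory class, `FakeRedis`, so it runs without a server. All names (`FakeRedis`, `heartbeat`, `scheduler:active`, the 60-second TTL) are made up for illustration. With the redis-py client, the acquire step would be roughly `r.set(key, node_id, nx=True, ex=ttl)`; the ownership-checked refresh would need to be made atomic against a real Redis, which this sketch does not show.

```python
class FakeRedis:
    """In-memory stand-in for the two operations the sketch needs (assumption)."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp in seconds)

    def set_nx_ex(self, key, value, ttl, now):
        """Emulates SET key value NX EX ttl: succeeds only if the key is absent or expired."""
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False
        self._data[key] = (value, now + ttl)
        return True

    def refresh(self, key, value, ttl, now):
        """Extends the TTL, but only if the caller still owns an unexpired key."""
        current = self._data.get(key)
        if current is not None and current[1] > now and current[0] == value:
            self._data[key] = (value, now + ttl)
            return True
        return False


def heartbeat(store, node_id, now, key="scheduler:active", ttl=60):
    """Called periodically on every node; returns True if this node is the active scheduler."""
    if store.refresh(key, node_id, ttl, now):
        return True  # still the active node, lease extended
    return store.set_nx_ex(key, node_id, ttl, now)  # try to take over a free slot


if __name__ == "__main__":
    store = FakeRedis()
    print(heartbeat(store, "node-a", now=0))   # node-a takes the free slot
    print(heartbeat(store, "node-b", now=30))  # slot is held by node-a
    print(heartbeat(store, "node-b", now=90))  # node-a's lease ran out at 60
```

The TTL has to be longer than the heartbeat interval, otherwise the active node loses the slot between two of its own check-ins.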

stympy · March 29, 2017, 4:18pm

We use consul-do for this at Honeybadger. It does the same thing as the redis shared lock already described, but uses consul for lock coordination. I wouldn’t set up consul just for this use case, but we were already using consul for service discovery, so it was a good fit.

Another option would be to use a scheduled lambda function to either do the work or trigger the task on a randomly-selected server.

maximus · March 29, 2017, 5:18pm

Agreed. For statically scheduled events I absolutely recommend a scheduled lambda function, which can call an API endpoint.
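As a rough sketch of what such a function could look like in Python: the handler below only builds a POST request and sends it to an endpoint behind the load balancer, so the actual work runs on whichever server receives the call. The URL, the JSON payload, and the `send` parameter are assumptions for illustration (`send` exists only so the handler can be exercised without a network). The schedule itself is configured outside the code, and a real endpoint would need some form of authentication, which is not shown.

```python
import json
import urllib.request

# Hypothetical endpoint behind the load balancer; not taken from this thread.
TASK_URL = "https://example.com/internal/tasks/daily-summary"


def build_request(url=TASK_URL, task="daily-summary"):
    """Builds the POST request that asks the application to run one task."""
    body = json.dumps({"task": task}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def handler(event, context, send=urllib.request.urlopen):
    """Entry point invoked on the schedule; `send` can be replaced in tests."""
    request = build_request()
    with send(request, timeout=10) as response:
        # Report the HTTP status so a failed trigger shows up in the function's result.
        return {"status": response.status}
```

Because the function only triggers the task, the servers stay symmetrical and no node needs to know whether it is "the" scheduler.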

courtz · March 29, 2017, 10:32pm

We use Hangfire, as we’re a .NET house.

rfctr · March 30, 2017, 2:36am

No, a third one would be a single point of failure (SPOF).

I once implemented similar functionality in a cluster (8 nodes). Every hour, all nodes tried to insert a record into a shared DB that had the current date + current hour as a unique key. Only one of the inserts could succeed (the rest got a constraint violation error). That node was the master for the hourly tasks for that hour, and the rest of the nodes went back to sleep.

Even if one or more nodes were down, one of the remaining nodes still won the insert, and at most one node ever processed the hourly tasks.

A nice side effect of this locking mechanism is that for every date and hour you can tell which node was the master, i.e. the lock record is an audit record, too.
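The scheme above can be sketched in a few lines. This is an illustration under assumptions, not the original code: it uses SQLite in place of the shared database, and the table name `hourly_lock`, the column names, and the `try_become_master` function are all invented here. The primary key on the date+hour slot is what makes every insert after the first one fail.

```python
import sqlite3
from datetime import datetime


def create_lock_table(conn):
    """Creates the lock table; the slot (date + hour) is the unique key."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hourly_lock ("
        " slot TEXT PRIMARY KEY,"
        " node TEXT NOT NULL)"
    )


def try_become_master(conn, node_id, now):
    """Returns True if this node won the insert for the current hour."""
    slot = now.strftime("%Y-%m-%dT%H")  # e.g. '2017-03-30T02'
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO hourly_lock (slot, node) VALUES (?, ?)",
                (slot, node_id),
            )
        return True
    except sqlite3.IntegrityError:
        # Another node already inserted this slot: go back to sleep.
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_lock_table(conn)
    now = datetime(2017, 3, 30, 2, 36)
    print(try_become_master(conn, "node-1", now))  # first insert for this hour
    print(try_become_master(conn, "node-2", now))  # same hour, key already exists
    # The lock rows double as the audit trail of who was master when.
    print(conn.execute("SELECT slot, node FROM hourly_lock").fetchall())
```

In a real cluster every node would open its own connection to the same shared database; the single in-memory connection here only stands in for that.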

rfctr · March 30, 2017, 2:38am

I see a new SaaS idea:

Online scheduling of tasks. At a schedule configured via the UI, the SaaS will ping your URL or call a web service. $5/mo.

SteveMcLeod · March 30, 2017, 11:29am

Thanks all. Your comments helped. It turns out that the Java product I’m using (Quartz Scheduler) does what I want. They call it clustered scheduling, and it was there all along but I didn’t understand the terminology.

maximus · March 30, 2017, 12:47pm

As far as I remember, clustered scheduling in Quartz requires a database, but I guess you have one anyway.

SteveMcLeod · March 30, 2017, 3:27pm

A service like this seems to exist already. Some say that the existence of competitors validates the idea…

rfctr · March 31, 2017, 1:25am

I intended that idea as a joke, but now I do not know what to think anymore.

SteveMcLeod · March 31, 2017, 2:41pm

@stympy and @maximus suggested scheduled AWS lambda functions. I investigated this as an alternative to Quartz Scheduler in clustered mode. Scheduled lambda wins easily. This solution is flexible, and it is logged and monitored by default. It is also really easy to schedule once-off triggers if our normal scheduled event fails.

Quartz is solid but really feels like Java code from 10 to 15 years ago, when everything needed a ton of configuration just to do the most basic thing. It also serializes the Java classes for jobs and gets mightily upset when you refactor between deploys.

Skiltz · April 2, 2017, 2:35am

To keep my sanity I use https://cronitor.io/ to ping every time a scheduled task is started/completed. Cronitor will notify me if something is down.

davidhemphill · April 2, 2017, 4:34am

Kind of funny, I just announced a few days ago that I’ve built a recurring task scheduling service called Crondog which does this. It will ping a URL I specify with data on a recurring schedule (yearly, monthly, daily at certain time, every minute, etc), which then allows me to handle the task on my server. I can set up these recurring tasks via API or through the interface.

SteveMcLeod · April 2, 2017, 8:45am

After my experience last week with this problem I’m quite certain that a market exists for this! Good luck.

davedevelopment · April 3, 2017, 11:07am

Just my two cents: I already had Jenkins in service, running deployments via Ansible, so I set up all my scheduled tasks there. Write-up here: https://davedevelopment.co.uk/2015/06/04/scheduled-tasks-with-jenkins.html

TLDR: I was already using Jenkins and Ansible. Running scheduled tasks this way gets me tasks that run at a scheduled time on one of the available servers. Console output is logged, success/failure is logged, and notifications can go anywhere Jenkins can send them (Slack for us).