Smart Scheduling with the OR Toolkit

Blog Single

Nonprofit staff are often deeply idealistic people who want to help others—teaching kids, running camps, coaching, creating art, and making the world better in concrete ways. Unfortunately, the administrative and bureaucratic demands of modern organizations often pull them away from that work and into spreadsheets and scheduling tools.

Empower SCI is an amazing non-profit that runs therapy camps for individuals with spinal cord injuries. Their clients travel from all over the country to a scenic locale, where they take part in a variety of healthy activities such as yoga, meditation, cooking lessons, and outdoor adventures that have been designed with their special needs in mind. They also receive several therapy sessions of different types, including physical, occupational, and massage.

Running these camps is a difficult organizational challenge for many reasons. One tricky issue is scheduling the different types of therapy. There are a variety of constraints that need to be satisfied when scheduling, for example:

  • - Each client must receive three sessions of physical therapy (PT) and occupational therapy (OT) during the week
  • - Each client must receive one session of rehabilitation counseling (RC) and massage therapy (MT), if that type of practitioner is present at the camp
  • - Ideally, the client should work with the same therapist for all their PT and OT sessions, and these sessions should be at the same time of day
  • - The therapy sessions should occur on Monday, Wednesday, and Friday, at 9am, 10am, or 11am, with noon reserved for lunch
  • - If necessary, a client can have their lunch break at 11am, and another therapy session at noon, but only if the therapy is less physically taxing (OT or RC)

What makes this scheduling problem tricky is that there is a limited number of therapists of each type. For MT and RC, there is typically one provider for the whole week. For PT and OT, there are multiple therapists, but these therapists might not be available on every day of the camp.

In prior years, the Empower admin staff attempted to solve this problem by manual data manipulation in Excel. This was a highly tedious and time-consuming task: scheduling under complex constraints is the kind of work the human brain is not particularly good at.

To solve this scheduling problem, we built a standard WWIO app to hold all the data about the camp: how many clients, how many therapists of each type, what time a certain client visits a therapist, and so on. The app allows the admins to manage the data, as well as view the data in various ways (calendar view, etc).

In addition to the standard WWIO app, we built a special backend integration that uses Google’s OR-Tools library, an industrial-strength optimization toolkit widely used in logistics, manufacturing, and workforce scheduling. To attack this problem, we wrote a Python script that pulls data in from the WWIO SQLite databases, sets up the necessary constraints based on the data, and then invokes the OR Toolkit solver.

Here's an idea of how the optimization problem is setup. First, we read therapist and camper ID list using standard SQL queries (not shown). Then we start creating OR variables that using the NewBoolVar function. These are the quantities that the OR solver will operate on to perform the optimization.

        for cid in camperlist:
            for tid in theraplist:
                for offset in self.loader.get_therapy_off_list():
                    for hour in UTIL.HOUR_LIST:
                        appkey = Appoint(camper_id=cid, therap_id=tid, off_set=offset, hour=hour)
                        thevar = self.model.NewBoolVar(f"appointment__{len(self.app_info):05d}")
                        self.app_info[appkey] = thevar
Each of these boolean Appointment variables is indexed by a therapist ID, a camper ID, a date offset (an integer representing the date), and an hour (9, 10, ...). After defining these basic variables, we can set up derived variables which can express the goals of the optimization. One goal is that each patient is consistently paired with the same therapist, for each type of therapy. So if participant Bob works with physical therapist Steve on Monday, we'd like to ensure that Bob also works with Steve throughout the program, not some other PT. To do this, we define a ct_link variable:
        for cid in camperlist:
            for tid in theraplist:

                self.ct_link[(cid, tid)] = self.model.NewBoolVar(f"camper_therap_link_{cid}_{tid}")

                allctappt = [v for app, v in self.app_info.items() if app.camper_id == cid and app.therap_id == tid]

                self.model.AddMaxEquality(self.ct_link[(cid, tid)], allctappt)
The ct_link variable is defined as being true if there is any meeting between a participant and a therapist. We implement this definition by using the AddMaxEquality to set a given link variable to be the disjunction of a list of all possible appointments between a therapist and a participant. Finally, to produce the desired behavior, we tell the optimizer to minize the number of these variables that is active:
        self.model.Minimize(
            sum(self.ct_link.values())
            + ... (other optimization values)
        )
To see a more complex example, consider another rule about avoiding "busy lunch". There are four hours blocked off for therapy, 9 to 12. It's preferred that one of the hours 11 or 12 is free for lunch. In some cases, due to other pressures on the schedule, it is acceptable for the participant to have lunch during therapy, but we would like to avoid this "busy lunch" scenario. To do this, we define new variables for each participant + date pair, which represents if the person has a busy lunch that day:
for cid in self.loader.get_camper_ids():
    for offset in self.loader.get_therapy_off_list():

        self.camper_busy_lunch[(cid, offset)] = self.model.NewBoolVar(f"camper_busy_lunch{cid}_{offset}")

        lunchappt = [
            v for k, v in self.app_info.items()
            if k.camper_id == cid and k.therap_id != -1 and k.off_set == offset and k.hour in UTIL.LUNCH_HOURS
        ]

        self.model.Add(sum(lunchappt) == len(UTIL.LUNCH_HOURS)).OnlyEnforceIf(self.camper_busy_lunch[(cid, offset)])
        self.model.Add(sum(lunchappt)  < len(UTIL.LUNCH_HOURS)).OnlyEnforceIf(self.camper_busy_lunch[(cid, offset)].Not())
The somewhat verbose construction involving an equality and an inequality, which are only enforced if the busy lunch variable is active, is how we specify the IFF (IF and Only iF) relationship to the OR engine. So if the busy lunch is active, there must be as many lunch appointments as lunch hours; otherwise there must be fewer. Finally, we ask the model to minimize the number of active busy lunch variables:
    self.model.Minimize(
        sum(self.ct_link.values())
        + sum(self.camper_busy_lunch.values())
        + ...
    )

In principle, we might want to add weights to the link variable sums and the busy lunch sums, to represent the tradeoff of how relatively important it is to avoid busy lunch compared to mixing up the therapist / participant pairings. In practice, the optimization seems to work well enough that it hasn't been necessary to do this.

It's worth making one last observation about the number of variables involved. Keen observers may have noticed that the original definition of the Appointment variables involved 4 nested for-loops: participant, therapist, date, and hour. For the Empower camps, there are typically about 10 participants and therapists, just 3 therapy days, and 4 potential therapy hours, for a total of about 1200 variables. So the number of variables is not excessive. But as computer scientists know, these kinds of optimizations can be very difficult in principle - it is closely related to the Boolean Satisfiability (SAT) problem, one of the foundational problems in computer science. Fortunately, these problems are often tractable in practice. The field of Operational Research (OR) has made a lot of progress in finding heuristics and inference techniques that allow a "typical" SAT problem, such as the one we deal with in this scheduling task, to be solved easily.