[Originally published 09/27/2018]
I wanted to write a quick article on distributing workloads as a sort of “white paper” if you will to illustrate the overall power of distributed approaches and as an easy indicator on the classes of problems that lend themselves well to distribution (and which ones may not be as amenable to the approach).
Also I wanted to tackle the idea that no matter how efficient or powerful systems might be there is always waste and that the waste costs can become significant even when other aspects are considered cost free.
I will illustrate a simple example and the complexities associated with scaling (complexity in terms of computational complexity). For clarity, I will assume that analogous operations take roughly the same amount of time, meaning that I will assume that simple operations (*, /, +) take roughly the same amount of computational effort. This assumption does not hold in reality (but more on this below), but for the purposes of our examples, it should suffice. Throughout this analysis, I will refer to the time necessary for a mathematical computation to be 1▲ (read as: one-delta).
What is a delta?
Delta is a very popular cross-discipline notation which corresponds to a change in time between an initial state and a new state at a later point in time. For readers familiar with physics and calculus this notation will not be a new concept as it occurs in the all important difference quotient equation:
In computational engineering 1 ▲ is utilized to indicate a generic unit of time (kind of a generic clock pulse) it takes to complete a required calculation. From a theoretical perspective all basic computations can be completed in 2▲ with proper hardware. For the sake of utility, I will simply refer to it as 1▲ in the examples I present (you can imagine it as 2▲ if you wish, but it has no real impact on the complexity).
Example 1: Adding Numbers
Adding numbers seems simple enough so it lends itself well to our analysis. As simple as it may seem, it easy to make it more difficult, adding 10 numbers together (we’ll assume that the numbers are greater than 5 and no more than 8 digits in length) should not take the average person all that long. Assuming that 1▲ in this case is roughly 10 seconds (generous), you could imagine that a person could complete this task in roughly 1.5 minutes. Feel free to try it if you like, I have provided a simple python example to generate a list for you.
>>> import random
>>> def number_example(count, min, max):
... return [ random.randint(min, max) for x in range(count) ]
>>> number_example(10, 10000, 99999999)
[97201715, 84920685, 32364643, 59542870, 21273660, 73307523, 27820221, 89226944, 91377399, 39764286]
Now you may find that if you perform your additions sequentially that the order of magnitude of the numbers increases quickly. This may make the later operations take longer than the earlier ones (this is not the case with computers running our theoretically perfect hardware, so we can discount for this). You may also notice that you can likely increase the speed of the tasks by dividing it into two sets of five, adding them first, and then adding the two results into the final sum. Or likewise you can add pairs to generate a set of five and then add pairs again (you will have a number hanging off the end at this point), finally you can add the leftover number into the running sum to calculate the answer.
This particular problem is classified as “Embarrassingly Parallel”. It should be obvious that there are many ways to attack the solution even when using a single agent for the following reasons:
- There is no communication (or information) required to be shared by the individual elements during processing
- There is no partitioning scheme that would impact the overall answer
- The operation of addition is both associative and commutative which means that the order in which the operations are performed or the order of the operands have no impact on the solution
The reader is cautioned to not assume that all summation based operations are necessarily as well behaved. For instance taking an average of a large group of numbers is not always straightforward in this sense (the partitioning matters). Try subdividing a group of numbers into a smaller workload to take their average and you will find that unless you can equally partition the set (and resultant subsets) into equal sized sets you will not be able to do it. Try it with 11 values (or any prime number of values) if you don’t believe me.
Back to our initial example. If we cannot up the difficulty to make it intractable for humans to perform, we can certainly up the magnitude to an unpleasant level. If I asked you to do the same with 1,000 numbers it would certainly be more of a challenge. If 10 numbers took 10▲ then 1,000 should take 1,000▲. While this might bore you, it is not impossible (it would probably take you about 3 hours).
But as we can see all I need to do is up an order of magnitude and the operations (even though intrinsically simple) become onerous. For every 1,000 elements basically add another 3 hours to your timecard, so 10,000 numbers would take 30 hours (this assumes that you will also not make a single mistake, sleep, eat, and so on).
Let’s Bring Our Friends to This Party
Now let’s assume that you have some friends who are good at addition (easy enough) and love to be bored (or paid to do boring things as a thought exercise). If 1,000 numbers take 1,000▲ then more hands should certainly make lighter work, but how much lighter? For simplicity’s sake we will assume that we have 10 people (yourself and 9 friends) and that these people are working at the same normalized speed of 1▲ per operation.
These easiest partitioning scheme would be to divide the full set into 10 equal sets of 100 elements each. Have each person add their elements which should take 100▲ (remember they are adding in parallel and at the same rate per operation). After they are done you should be left with 10 numbers which will require an additional 10▲ to sum for the final result. So in contrast to our initial workload of one person doing everything, dividing the load among 10 people will yield faster results:
- 1,000 numbers added sequentially -> 1,000▲
- 1,000 numbers added in parallel by ten people -> 100▲ + 10▲ -> 110▲
Well that seems much better, however we must keep in mind that we are discounting several important factors here:
- Your friends are not going to work for free (they will likely expect a meal for their trouble as computers expect power)
- Even in embarrassingly parallel situations there will be a transmission/operational cost (in this case the cost is in the partitioning of the initial set and in the transmission of each workers answers back to the partitioner for final aggregation)
- Waste (in this example for instance nine people sit idle while the final summation is done implying a waste of 90▲ of calculation time)
We can increase our number of “friends” to make our calculations faster but we need to also take a look at other factors. If we instead invited 99 friends to the party (I throw great parties by the way), we could trim the overall time down significantly. We will still have to aggregate the workers results at the end so there will be a set cost to pay for that item which we will never be able to truly escape, but let us take a deeper look at the other factors as well. For ease of illustration (also the reason I’m keeping everything to powers of 10) let’s do this in multiple passes and see the results after each pass.
- Partition dataset into 100 subsets of 10 and distribute to yourself and 99 friends (Assume that this is no cost and no waste operation for simplicity)
- You and each friend spends 10▲ aggregating
- You have a set of 100 numbers to add after this pass
- Partition dataset into 10 subsets of 10 and distribute to yourself and 9 of your 99 friends (Same assumptions as above)
- You and 9 friends spends 10▲ aggregating
- You have a set of 10 numbers to add after this pass
- You spend 10▲ aggregating
- You have your aggregation complete in 30▲
Well that is certainly impressive. More hands make for less work, but do they make for less waste? Let us take a look at the cost before we get too excited about the speed increase. Even with the generous discounts we give ourselves in these examples we still cannot escape waste. We have shown above that dividing this problem among 10 people gives us about 90▲ worth of waste although it gives us an order of magnitude increase in the speed at which we can reach our goal. What is the impact of scaling up to 100 people on waste?
We can see that our theoretically perfect first pass generated no waste in ▲. But our second and final passes are not so economical. During Pass 2 for instance 90 people are waiting while 10 people are working leading to a waste of 900▲ (after the second pass we have already outpaced the waste that was generated by just 10 people earlier by a factor of 10). By the final pass, 99 people are waiting leading to an addition waste of 990▲. This implies that this scenario, while the overall answer is attained faster, the waste to achieve that answer is 1,890▲ (21 times the waste of using 10 people).
Is that amount of waste worth it for the nominal uptick in speed? The answer to that question is highly dependent on the value of what you are actually calculating to your business. But waste should always be taken into consideration in distribution of workloads.
So for reference here is the quick rundown on workers, time to completion and associated waste for the examples above:
- 1 Person | time: 1,000▲ | waste: none
- 10 People | time: 110▲ | waste: 90▲
- 100 People | time: 30▲ |waste: 1890▲
Thanks for reading
- Embarrassingly Parallel – https://en.wikipedia.org/wiki/Embarrassingly_parallel
- Intractability – https://en.wikipedia.org/wiki/Computational_complexity_theory#Intractability