Let’s say you commute to the office by a flexible means of transport that lets you choose your route each day. The simplest example is driving your own car. Assume there are three routes you can take:
- Route A has a fixed travel time of 1 hour.
- Route B has a variable travel time. 90% of the time the route is clear, and you arrive in 40 minutes. But 10% of the time the route is congested, and it takes 90 minutes. The expected travel time is 0.9 × 40 + 0.1 × 90 = 45 minutes.
- Route C has an unknown travel time.
Which route do you take?
Well, it depends on the situation. If you simply want to minimize your expected travel time, pick route B. On the other hand, if you have a meeting starting in 75 minutes and must arrive on time, pick route A: it always gets you there in 60 minutes, whereas route B would make you late 10% of the time. This shows that we may not always want to maximize expected gain (or in this case, minimize expected cost) as measured directly by the variable under consideration, here travel time.
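To make the two criteria concrete, here is a minimal Python sketch that compares routes A and B by expected travel time and by the probability of arriving within a deadline. The `routes` table and the helper names `expected_time` and `on_time_probability` are illustrative choices, not part of any library, and route C is left out because nothing is known about it.

```python
# Travel-time distributions taken from the example above,
# written as (minutes, probability) pairs. Route C is omitted: no data.
routes = {
    "A": [(60, 1.0)],
    "B": [(40, 0.9), (90, 0.1)],
}

def expected_time(outcomes):
    """Probability-weighted average travel time in minutes."""
    return sum(minutes * p for minutes, p in outcomes)

def on_time_probability(outcomes, deadline):
    """Probability that the trip takes no longer than `deadline` minutes."""
    return sum(p for minutes, p in outcomes if minutes <= deadline)

# Criterion 1: minimize expected travel time.
fastest_on_average = min(routes, key=lambda r: expected_time(routes[r]))

# Criterion 2: maximize the chance of making a meeting 75 minutes away.
safest_for_meeting = max(routes, key=lambda r: on_time_probability(routes[r], 75))

print(fastest_on_average)  # B
print(safest_for_meeting)  # A
```

The same data gives different answers because the two objectives differ: one averages over outcomes, the other only cares whether the deadline is met.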
What about route C? We don’t know anything about it, so why pick it? The idea is that once in a while, you should try out new routes. Route C might turn out to be better than A, B, or both. But you can’t risk route C when you have a meeting. At other times, with a small probability (say 5%), you should pick route C (or any route you haven’t tried before), and the rest of the time (95%), you should pick the best route you know. This is a good example of the explore-exploit tradeoff.
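The rule described above is close to what is often called an epsilon-greedy strategy. The sketch below is one possible way to write it down, under some assumptions: the function name `choose_route`, its parameters, and the idea of passing the reliable route in as `meeting_route` are all hypothetical, and "best route you know" is taken to mean the lowest average travel time observed so far.

```python
import random

def choose_route(known_avg, untried, meeting_route=None, epsilon=0.05, rng=random):
    """Pick a route for today's commute.

    known_avg     -- dict mapping route name to its average travel time (minutes)
    untried       -- list of routes with no travel-time data yet
    meeting_route -- if set, today has a deadline: always take this reliable route
    epsilon       -- probability of exploring an untried route on a normal day
    rng           -- source of randomness (the random module or a random.Random)
    """
    # With a meeting, never gamble: take the route chosen for reliability.
    if meeting_route is not None:
        return meeting_route

    # Explore: occasionally try a route we know nothing about.
    if untried and rng.random() < epsilon:
        return rng.choice(untried)

    # Exploit: otherwise take the best route known so far.
    return min(known_avg, key=known_avg.get)
```

For the example in this section, a normal day would be `choose_route({"A": 60, "B": 45}, ["C"])`, which returns B about 95% of the time and C about 5% of the time, while a meeting day would be `choose_route({"A": 60, "B": 45}, ["C"], meeting_route="A")`, which always returns A.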