Monte Carlo simulations

Traditionally, a business case includes a calculation of the costs and benefits based on best guesses. Sometimes, three calculations are made: the worst, the best, and the most likely scenarios. The problem with both these approaches is that neither hints at how probable any one scenario is compared to another.

The solution is to use the Monte Carlo method[Wikipedia] to simulate a large number of scenarios based on the same model, and exclude, for example, the 10% worst and 10% best outcomes. This leaves you with an 80% probability that the actual outcome falls between the worst and the best of the remaining scenarios.
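The trimming step can be sketched in a few lines of Python. This is only an illustration: the model function, its input ranges, and the names simulate_net_benefit and central_interval are made up for the example and are not taken from any particular tool.

```python
import random


def simulate_net_benefit(rng):
    """One scenario of a hypothetical business case (all ranges are invented)."""
    units_sold = rng.uniform(800, 1200)
    price = rng.uniform(10, 20)
    cost = rng.uniform(8000, 12000)
    return units_sold * price - cost


def central_interval(outcomes, tail=0.10):
    """Drop the worst and best `tail` share of outcomes, return the remaining bounds."""
    ordered = sorted(outcomes)
    cut = int(len(ordered) * tail)
    kept = ordered[cut:len(ordered) - cut]
    return kept[0], kept[-1]


rng = random.Random(42)  # fixed seed so the run is repeatable
outcomes = [simulate_net_benefit(rng) for _ in range(10_000)]
low, high = central_interval(outcomes)
print(f"80% of simulated outcomes fall between {low:.0f} and {high:.0f}")
```

With 10,000 scenarios, the 1,000 lowest and 1,000 highest results are discarded, and the printed bounds are the worst and best of the 8,000 that remain.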

The book How to Measure Anything: Finding the Value of Intangibles in Business[Amazon] by Douglas W. Hubbard describes the method in detail.

It’s possible to use a spreadsheet to build a calculation model, and most spreadsheet apps even have built-in functions that generate random values from a distribution. But spreadsheets are difficult to maintain. Once you’ve constructed the model and started simulating, it’s easy to break formulas when you want to adjust the model.

What you need is a user-friendly web-based app like sim that helps you gradually build the model as your understanding of the costs and benefits deepens, and that takes care of the complex probability functions based on statistical distribution math.

Just as in the spreadsheet example above, the simulation model is composed of values that represent costs and benefits. Values can be constants, ranges, or formulas. When it comes to ranges, statistical distribution functions are applied to random numbers, generating spreads of values that match different curves:

The normal (or ‘bell’) distribution is the most commonly used. If you consider a likely sales price range between 10 and 20 USD, a normal distribution will generate most random values in the +/- 35% range around the mean value 15, which is (10 + 20) / 2. Far fewer values will fall outside this range.
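As a rough sketch of how a 10 to 20 USD range could be turned into normally distributed values: the example below assumes the range is meant as a 90% confidence interval, which is one common convention and not necessarily how sim interprets ranges. The function name sample_normal_range and the z parameter are hypothetical.

```python
import random
import statistics


def sample_normal_range(low, high, rng, z=1.645):
    """Draw one value, treating [low, high] as a central 90% interval.

    Assumption: z = 1.645 is the standard normal quantile that leaves
    5% in each tail, so about 90% of draws land inside the range.
    """
    mean = (low + high) / 2
    sigma = (high - low) / (2 * z)
    return rng.gauss(mean, sigma)


rng = random.Random(1)
samples = [sample_normal_range(10, 20, rng) for _ in range(10_000)]
share_inside = sum(10 <= s <= 20 for s in samples) / len(samples)
print(f"mean={statistics.mean(samples):.2f}, share inside range={share_inside:.2f}")
```

The mean lands close to 15, and roughly nine out of ten values fall inside the stated range, with the rest spread thinly on either side.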

This distribution can be used if you feel that the minimum value is the most likely, but want to take into account that there is also a possibility that the maximum value may occur. This could be used for a parameter that represents an interest rate which is already low and has very little risk of increasing.

Use this distribution if you consider the maximum value to be the most likely, but still want to allow for less likely lower values. Again, consider an interest rate that is currently very low, but which you expect to increase. This distribution will then generate few values near the defined minimum, and increasingly more values towards the defined maximum.

This distribution generates most values around the lower half of the range. Let’s say you’re considering spending 100 to 200 hours on building something, but want to spread the generated simulation scenarios as far as 400 hours.

Use this distribution for ranges where values in the upper half of the range are considered most likely. An example could be that you expect the number of new users to be in the 300-400 range, but want to take into account the off-chance that it turns out to be as low as 100.
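The four skewed shapes described above can be approximated with a triangular distribution, where a "mode" parameter marks the most likely value. This is a stand-in chosen for simplicity, not necessarily the distribution functions sim uses, and the concrete numbers for the interest rate are invented for the example.

```python
import random

rng = random.Random(7)
N = 10_000

# Minimum most likely: put the mode at the lower bound (interest rate, percent).
rate_min_likely = [rng.triangular(1.0, 3.0, 1.0) for _ in range(N)]

# Maximum most likely: put the mode at the upper bound.
rate_max_likely = [rng.triangular(1.0, 3.0, 3.0) for _ in range(N)]

# Lower half most likely: 100 to 200 hours expected, but allow up to 400.
hours = [rng.triangular(100, 400, 150) for _ in range(N)]

# Upper half most likely: 300 to 400 new users expected, but allow down to 100.
users = [rng.triangular(100, 400, 350) for _ in range(N)]


def share_below(values, threshold):
    """Fraction of values that fall below the threshold."""
    return sum(v < threshold for v in values) / len(values)


print("rates below midpoint (min likely):", share_below(rate_min_likely, 2.0))
print("rates below midpoint (max likely):", share_below(rate_max_likely, 2.0))
print("hours below midpoint:", share_below(hours, 250))
print("users below midpoint:", share_below(users, 250))
```

In each case the share of values below the midpoint of the range shows which way the distribution leans: well above one half when low values are favored, well below one half when high values are favored.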

Designing a webapp to build simulations requires a structure that supports several needs:

  • Users need to manage a whole catalog of models.
  • Users must be able to organize their models in a folder hierarchy.
  • Data structures must support the user interface components required to display and edit them.

To meet the design needs, only three entity types are defined in the database (as tables). The entire hierarchical folder structure is serialized into a single JSON string in a field in the user entity class. Similarly, each simulation model is serialized into a JSON string in a field in the model entity. This is a viable optimization because models are small in memory size, and are always edited in their entirety.
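A minimal sketch of this storage approach, assuming Python dataclasses in place of real database entities. The class names, field names, and JSON layout are hypothetical, and only the two entity types named above are shown.

```python
import json
from dataclasses import dataclass


@dataclass
class UserEntity:
    user_id: int
    folders_json: str  # the whole folder hierarchy as one JSON string


@dataclass
class ModelEntity:
    model_id: int
    model_json: str  # the whole simulation model as one JSON string


# Hypothetical folder tree: folders nest, and each folder lists model ids.
folder_tree = {
    "name": "root",
    "models": [],
    "children": [{"name": "Business cases", "models": [1], "children": []}],
}
user = UserEntity(user_id=1, folders_json=json.dumps(folder_tree))

model = ModelEntity(
    model_id=1,
    model_json=json.dumps({
        "name": "New product",
        "values": [{"label": "Price", "kind": "range", "min": 10, "max": 20}],
    }),
)

# Editing always loads the whole structure, changes it, and saves it back.
tree = json.loads(user.folders_json)
tree["children"].append({"name": "Archive", "models": [], "children": []})
user.folders_json = json.dumps(tree)
```

Because every edit reads and writes the complete string, there is no need for separate folder or value tables, at the cost of not being able to query individual folders or values in the database.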
