M O B J E C T I V I S T: Monte Carlo of Dispersive Discovery/Oil Shock model

To continue to foster openness¹ I decided to run through several Monte Carlo simulations of the Dispersive Discovery and Oil Shock model. I usually work the models out analytically because that gives the most probable outcome, but the Monte Carlo approach gives you some extra insight into how the statistics play out. In other words, each set of Monte Carlo runs gives a possible "alternate history" (AH)² describing the passage of the oil age.

Before I get into the description, the fact that we can have significantly different alternate histories has to do with the fat-tails of the reservoir sizing curve. The rank histogram of the world's large reservoirs suggests that we will likely find at most a couple super-giants nearing 100 GB in URR. Since these occur sporadically (as Taleb's "gray swans") yet have a significant impact on oil production, the Monte Carlo simulations should reflect the possibilities of super-giants occurring. So up to this point, we only have one alternate history to contend with, but the out years will likely show a variation from the expected analytical result due to the odd super-giant potentially still lurking out there.

The Simulation Code

Even though it computes much more slowly, I coded the Monte Carlo simulation using Python to take advantage of the direct connection to the Sage plotting library functions. The simulation code itself comes out very concisely as the random variates for dispersive aggregation (field sizes), dispersive discovery (date of discovery), and oil shock phases (latencies of development) don't require much more than a random number lookup and a simple transformation.

The model used corresponds closely to the analytical Dispersive Discovery/Oil Shock model I posted to TheOilDrum in late 2007. I did not retain the oil shock perturbations in the MC as the effects of the noise fluctuations in reservoir sizing can blur the distinction. So instead of the shocked curve to the right, we can analytically generate the unshocked curve below.

Figure 1 : Analytical Result of Dispersive Discovery plus Shock Model with a stable extraction rate (i.e. no shocks)

I can understand how some believe that that these models contain some mysterious complexity. Well, you can look at the following code and judge for yourself. I would only add that perhaps the complexity exists only in arriving at the right derivation; after that the math falls out very straightforwardly. Essentially I rely on maximum entropy principles to do the heavy lifting, and the complexity disappears as Gell-Mann had suggested (see last month''s Crude Complexity post on TOD).

Using the Sage software package, the following code gets invoked as:

env SEED=1 sage "filename".py

Changing the SEED parameter gives a set of different outcomes.


from math import exp, pow, log
from random import seed, random
import os

# Constant Parameters
C = 2.0                         # Characterstic size of reservoir(MB)
N = 6                           # Power-law exponent for dispersive discovery
K = 110.0                       # Power-law growth factor
F = 6.0                         # Mean Fallow time (years)
B = 8.0                         # Mean Build time (years)
M = 10.0                        # Mean Maturation time (years)
D = 0.04                        # Depletion rate from mature reserves (per year)
URR = 2800000.0                 # Ultimately Recoverable Reserves (MB)
MRS = 250000.0                  # Max reservoir size (MB)
NumR = 3                        # Average number rigs per reservoir
Total_Years = 250
Start_Year = 1858
End_Year = 2100
 
# Variables
R = 0.0                         # random number between 0 and 1
Year = 0.0                      # year of discovery variate
Size = 0.0                      # reservoir size variate
Cumulative = 0.0                # accumulated reservoir size
Total_N = 0                     # accumulated number of reservoirs

L = 10000                       # arrays for holding generated values
Discovery = [0.0]*L
Fallow = [0.0]*L
Build = [0.0]*L
Maturation = [0.0]*L
Production = [0.0]*L
 
Seed = int(os.getenv('SEED',1))  # Change random seed as environment variable
seed(Seed)
 
# MonteCarlo of Dispersive Discovery into Shock Model
#
while( Cumulative < URR ) :           # Generate variates until URR is reached
  R = random()
  Year = int(K * pow(R/(1.0-R),1/N))     # Sample dispersivediscovery year
  R = random()
    Size = C * R/(1.0+C/MRS-R)           # Sample dispersive aggregation reservoir size
    Cumulative += Size
    Discovery[Year] += Size              # Accumulate discoveries for that year
    Total_N += 1
    for I in range(1,NumR+1)             # Generate discrete random times for phases
         Y = float(Year)
         Y += -F*log(random())           # Shock model Fallow time
         Fallow[int(Y)] += Size/NumR
         Y += -B*log(random())           # Shock model Build time
         Build[int(Y)] += Size/NumR
         Y += -M*log(random())           # Shock model Maturation time
         Maturation[int(Y)] += Size/NumR
 
for I in range(0,Total_Years):           # Unshocked continuous extraction rate
    for J in range(I,Total_Years):       # ... inner convolution gives Production
         Production[J] += Maturation[I]*D*exp(-D*(J-I))
 
# Plotting the results using Sage software http://www.sagemath.org/
 
Prod = lambda t : Production[int(t)-Start_Year]
Disc = lambda t : Discovery[int(t)-Start_Year]
 
title = text("C=%g, G=(t/%g)^%d, D=%g\n#=%d,URR=%g" % \
                 (C,K,N,D,Total_N,URR),((End_Year+Start_Year)/2,MRS))
G = plot(Disc, (Start_Year,End_Year), thickness=1,rgbcolor=(0.0, 1.0, 0.0)) + \
       plot(Prod, (Start_Year,End_Year), thickness=1, rgbcolor=(1.0,0.0, 0.0) ) + title
G.show(xmin=Start_Year,xmax=End_Year,ymin=0,ymax=MRS,axes_labels=('Year', 'MB/Year'))
G.save('MC'+repr(Seed)+'.png')

The one caveat I have to this simulation has to do with the amount of smearing that goes on during the stages of the oil shock model. I chose to simulate each phase as a set of three segments, say 3 rigs per field or reservoir; so that 1/3 of each phase gets drawn as a random variate latency. This particular choice does have the effect of increasing the noise, but it doesn't really change the essential profile too much (for example, increasing this number to 20 reduced the standard deviation by two years).

The noisy curves in GREEN below represent discoveries per year, while the RED curves indicate production for that year. The noise in discoveries reflects the reality of the situation; compare the curves to the data points collected by Laherrere in the figure to the right (here 100 MB/day=36,500 MB/year).

The most frequently occurring of the curves show a peak after the year 2000 but significant numbers occur in the adjacent decades. So even with average sample sizes of over 100,000 reservoirs, a few super-giants strategically placed earlier in the timeline can actually shift the peak quite a bit. Chart #5 place peak closer to 1980, largely due to a set of 3 super-giants (all bigger than Ghawar) occurring before 1950. By comparison, Chart #7 has the super-giants occurring right before 1980 which pushes the peak to the latter part of the next decade.

Yet, even with all this variability, if you look at the production level in the out-year of 2100, given a constant extraction rate, all the charts show approximately the same production level at that future point in time. In understanding what causes this common quasi-asymptotic behavior, just remember that all the fluctuations have settled out due to the cumulative number of reservoirs contributing to the signal. As professional oil men Rockman and WestTexas at TOD have said, many of the oil reservoirs now in play will continue to produce, just at reduced levels.

1

Overall the size of the fluctuations agree with that observed for sporadic nature of the super-giants. If the assumption of around 130,000 reservoirs (averaging 20 MB each) holds over time, then we can expect at most one or two more super-giants left. Although impossible to verify, the current cumulative count of reservoirs world-wide stands at likely over 50,000. As a sanity check, I use the formula I originally derived here to estimate the URR from the dispersive aggregation parameters. This should match approximately 2,800,000 MB.

URR ~ MaxRank * C * (ln (L/C) - 1) = 130,000 * 2 * (ln(250,000/2)-1) = 2,791,000

I would consider all of these simulations conservative in the sense that the URR remains on the high side of projections. Add to this the fact that size of discovery has no dependence on time (i.e. we remove the bias of anticipating finding the largest reservoirs first). So even with the large amount of potential outcomes, the number of reservoirs discovered and produced so far point to a relatively small spread in peak dates. The figure below shows the results of running the Monte Carlo simulation 10,000 times and plotting a histogram of the peak dates. The standard deviation in this profile is 8.5 years and the mean/mode/median peak date of 2004 agrees with that of the analytical result shown in Figure 1, as that result gives 2004 as the precited peak date as well.

Even though a substantial spread exists in potential outcomes, we have to consider that most of these have occurred in the past and we should discount negative results in any future projection. In other words, since nearly half of those that show large variance in peak date have occurred in
the past, we can eliminate the possibility that an alternate history will put the actual peak much beyond the next decade. One can justify this argument by simply considering adding Bayesian priors and running the Monte Carlo from the current date. I believe that this spread in outcomes has probably contributed (along with unanticipated reserve growth) to the usual problem of jumping the gun at predicting a peak date.

To gain an appreciation of the number of reservoirs that played in the simulation, I Monte Carlo generated a few rank histograms shown below. According to the dispersive aggregation algorithm, one can see only a handful of super-giants occur out of the tens of thousands of natural reservoirs.

This next histogram resulted from approximately a 30% reduction in the number of reservoirs drawn. This better matches the current number of reservoirs at least 1000 MB in size (~130 according to Wikipedia's List of Oil Fields) and also the tally of known reservoirs (under 50,000 according to various sources). I expect the number of reservoirs to grow as the last geographic areas get exploited. So the rank histogram above gives an optimistic scenario for URR and the one below borders on the side of a pessimistic projection.

Click on the following thumbnails if you want to see other variations.

For comparison, the following source code generates the analytical result. Ignoring the perturbation makes it a little more consice than the full Oil Shock model. I coded this in Ruby.

# dd.rb -- Dispersive Discovery model

def dd(a,b,n)
 last = 0
 urr = a.to_f
 length = n.to_f
 k = b.to_f
 for i in 0..length do
   output = urr /(1+(110.0/i)**6.0)
   puts output-last
   last = output
 end
end

dd(ARGV[0].to_f, ARGV[1].to_f, ARGV[2].to_f)


# exp.rb -- exponential spread for generating oil shock convolutions

def exp(a, b)
 length = a.length
 temp = 0.0  
 for i in 0..length do
   output = (a[i].to_f + temp) * b *(1-b/2)
   temp = (a[i].to_f + temp) * (1.0 - b*(1-b/2))
   puts output
 end
end

exp(STDIN.readlines, ARGV[0].to_f)

I use a pipe approach to doing the Oil Shock Model convolution. Invoke this from a command line prompt.

ruby dd.rb 2800000 110 250 | ruby exp.rb 0.1666 | ruby exp.rb 0.125 |\
ruby exp.rb 0.1 | ruby exp.rb 0.04 > production.dat

footnotes

¹Climate science haters, pay attention! I don't get paid for doing this and you have no right to my work, yet here you have simulations served up on a silver platter. Knock yourselves out.

²This term comes from a popular genre of science fiction of which I have run into quite often.

3 Comments:

Professor Remigius said...: Just FYI with respect to speed, if you do this in a Sage notebook and put %cython at the top of the cell and then change some things (esp. loops) to take advantage of using C directly via Cython (this sounds paradoxical but isn't), you might find your speed woes disappear. And you still get pretty graphs. Also check out the interact functionality for seeing how parameters change things in (almost) real time.; 1:30 PM
Professor @whut said...: Thanks, I have an alternate version in a compiled language that I use to verify against. I used that version to do the 10,000 iterations. I can see what you mean, with the Python interpreter that would have taken many days to execute. Fortunately there isn't much code to manage so having alternate versions isn't a big deal.; 2:28 PM
Professor The Energy Standard team said...: Thank you very much both for your comments and the excellent article to begin with.

I have just DLed python after almost 10 years of no programming.

Time to get started again...; 3:30 AM