
Saturday, January 07, 2006

Monte Carlo Discoveries

I learned many moons ago in engineering school that you should not fear noise. Noise can actually tell you a lot about the underlying physical character of a system under study. I started thinking about this again because of the historically noisy oil discovery curves that get published. This chart of global discoveries appears unfiltered:

This chart I used in the past has a 3-year moving average:

Some say that the discovery curves approximate a bell curve underneath the noise but I would differ with that assertion as the noise still exists even with a moving average applied. From the Schopper's article above:
"Pearson's r" test found no correlation between oil discoveries from one year to the next, i.e. discoveries appear to be random.
The fluctuations become very apparent because of the limited number of discoveries we have had in a finite amount of time. Laherrere estimates that worldwide we have had on the order of 10,000 crude oil discoveries. Pepper this over a range of 100 years and you get a relatively small sample size to deal with per year. This small number over a span of fewer than 100 years essentially gives rise to the relatively big yearly fluctuations. Making it even worse, we still have to consider the distribution of reservoir sizes; anything that shows a large variance in sizes (i.e. a spread) will lead to larger fluctuations.
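
To put rough numbers on that argument, here is a minimal sketch that assumes each year's discoveries are independent draws from one log-normal size distribution. Under that assumption the relative scatter (standard deviation over mean) of a yearly total is the single-field value, sqrt(exp(sigma^2) - 1), divided by the square root of the number of strikes that year. The function name and the trial values of the log standard deviation are illustrative choices, not figures from any of the sources quoted here.

```python
import math

def relative_scatter(strikes_per_year, log_sigma):
    """Std dev / mean of a yearly total of independent log-normal discoveries.

    A single log-normal draw has a coefficient of variation of
    sqrt(exp(sigma^2) - 1); summing N independent draws divides that by sqrt(N).
    """
    cv_single = math.sqrt(math.exp(log_sigma ** 2) - 1.0)
    return cv_single / math.sqrt(strikes_per_year)

if __name__ == "__main__":
    # About 10,000 discoveries spread over 100 years is roughly 100 per year.
    for log_sigma in (0.5, 1.5, 2.5):  # hypothetical spreads, for comparison only
        print(log_sigma, round(relative_scatter(100, log_sigma), 3))
```

At 100 strikes per year the scatter comes out at roughly 5% of the mean for a log standard deviation of 0.5, but well over 200% for a log standard deviation of 2.5, so both the small yearly sample and the wide spread in sizes matter.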

The reservoir size distribution seems to follow a log-normal function, which has the nice property of preventing negative sizes by transforming the independent variable by its logarithm (i.e. logs of the values follow a normal distribution). This pattern also seems to work for natural gas reservoirs:
Lognormal distributions -- a method based on the observation that, in large well-explored basins, field-size distributions approximate to a lognormal distribution. The method is most reliable with large datasets, i.e., in mature basins, and may be unreliable with small datasets.
As the variance tightens around the mean, the shape of the curve peaks away from zero. But importantly, a large variance allows the larger-than-average sizes (the "super-giants") to appear.
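
A small sketch of how the spread changes the shape, using the standard log-normal results that the mode sits at exp(mu - sigma^2), the median at exp(mu), and the mean at exp(mu + sigma^2 / 2). The log mean of 16 comes from the simulation described further down in this post; the two values of sigma and the helper names are assumptions picked only to contrast a tight spread with a wide one.

```python
import math
import random

def lognormal_landmarks(mu, sigma):
    """Return (mode, median, mean) of a log-normal with log mean mu and log std sigma."""
    mode = math.exp(mu - sigma ** 2)
    median = math.exp(mu)
    mean = math.exp(mu + sigma ** 2 / 2.0)
    return mode, median, mean

def largest_over_median(mu, sigma, n=10000, seed=1):
    """Draw n field sizes and report how far the largest sits above the median size."""
    rng = random.Random(seed)
    sizes = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    return max(sizes) / math.exp(mu)

if __name__ == "__main__":
    for sigma in (0.5, 2.5):  # hypothetical tight and wide spreads
        mode, median, mean = lognormal_landmarks(16.0, sigma)
        print(f"sigma={sigma}: mode={mode:.3g} median={median:.3g} mean={mean:.3g} "
              f"largest/median={largest_over_median(16.0, sigma):.3g}")
```

With the tight spread the mode stays close to the median, which is the "peak away from zero" behavior. With the wide spread the mode drops far below the median while the largest draw in 10,000 lands orders of magnitude above it, which is where the super-giants come from.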

The physical basis for a peaked distribution (away from truly small sizes) likely has to do with coalescence and agglomeration of deposits. Much like cloud droplet and aerosol particulate distributions (which also show a definite peak in average size due to coalescence), oil deposits have a similarity in structure if not scale that we can likely trace to fundamental processes.

With that in mind and spurred on by comments in a recent TOD post:
The area under the complete discoveries curve must equal the area under the eventually completed global production curve, whatever it's math description - oil discovered must equal oil produced. The discovery process is controlled heavily by the math laws of probability with the bigger, easy-to-find pools of oil found first. Resource discoveries fall on bell curves too. Deffeyes makes the point that even with changing technology, this is the way discoveries play out. The global discoveries curve peaked in the mid 60s and, despite the immense tech revolution since then, the charted yearly discoveries have formed a pretty nice fit to a Gaussian bell curve, particularly if they are "smoothed" by grouping into 5 year bars in a bar graph.
I decided to take a shot at running a Monte Carlo analysis that shows the natural statistical fluctuations which occur in yearly discoveries. This data comes from several Monte Carlo trial runs of 10,000 samples with a log mean of 16 (corresponding to a 9 million barrel discovery), and a log standard deviation (corresponding to 0.73 million barrels on the low side and 108 million on the high side). I then put a "gold rush" mentality on the frequency of discovery strikes; this essentially started at 8 strikes per year and rose to 280 strikes per year at the peak.

The first chart shows a typical sample generation, and the rest generate the discovery curves via the application of a steadily rising and then falling yearly accumulation factor; i.e. without the noise it would look like a triangular curve.
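
For anyone who wants to try this at home, here is a minimal sketch of that kind of run. It is not the code behind the charts. The log mean of 16 and the 8-to-280 strikes-per-year ramp come from the description above. The log standard deviation of 2.5 is an inference from the quoted one-sigma sizes (exp(13.5) is about 0.73 million and exp(18.5) is about 108 million barrels), and the 35-year rising and falling legs are assumptions chosen only so that the total lands near 10,000 strikes.

```python
import math
import random

MU = 16.0            # log mean from the post; exp(16) is roughly 9 million barrels
SIGMA = 2.5          # assumption: inferred from the 0.73 and 108 million barrel bounds
START_STRIKES = 8    # strikes per year at the start (from the post)
PEAK_STRIKES = 280   # strikes per year at the peak (from the post)
YEARS_UP = 35        # assumption: length of the rising leg
YEARS_DOWN = 35      # assumption: length of the falling leg

def strike_schedule():
    """Triangular 'gold rush' schedule: linear rise to the peak, then linear fall."""
    span = PEAK_STRIKES - START_STRIKES
    rising = [round(START_STRIKES + span * i / YEARS_UP) for i in range(YEARS_UP + 1)]
    falling = [round(PEAK_STRIKES - span * i / YEARS_DOWN) for i in range(1, YEARS_DOWN + 1)]
    return rising + falling

def simulate_yearly_discoveries(seed=0):
    """One Monte Carlo trial: total barrels discovered in each year of the schedule."""
    rng = random.Random(seed)
    return [sum(rng.lognormvariate(MU, SIGMA) for _ in range(strikes))
            for strikes in strike_schedule()]

def noise_free_curve():
    """Expected yearly total: strikes times the log-normal mean size (a triangle)."""
    mean_size = math.exp(MU + SIGMA ** 2 / 2.0)
    return [strikes * mean_size for strikes in strike_schedule()]

if __name__ == "__main__":
    rows = zip(strike_schedule(), simulate_yearly_discoveries(seed=0), noise_free_curve())
    for year, (strikes, noisy, smooth) in enumerate(rows):
        # Volumes printed in billions of barrels.
        print(f"{year:3d} {strikes:4d} {noisy / 1e9:8.2f} {smooth / 1e9:8.2f}")
```

Changing the seed gives a different trial; comparing the noisy column against the smooth triangle shows how large the excursions get when one or two very large draws fall in the same year.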

The main thing to note relates to the essential noise characteristic in the system. The fluctuation excursions fairly well match that of the real data (see the first diagram at the top of this post), with the occasional Ghawar super-giant showing up in the simulations, at about the rate expected for a log-normal distribution. But the truly significant observation relates to the disappearance of the noise on the downslope; in particular, look at the noise after about 1980.

Remember what I said initially about noise telling us something? The fact that the noise starts to disappear should make us worry. That noise-free downslope tells us that we have pretty effectively mined the giants and super-giants out there and that oil exploration has resorted to back-up strategies and a second pass over already explored territory or to more difficult regions that have a tighter distribution of field sizes.

Contrary to the TOD commenter, I wouldn't quite say that the biggest fields get discovered first (Schoppers also sees no correlation), only that they have a higher probability cross-section which overcomes their naturally lower frequency of occurrence. The big ones may actually be found later because, over time, more resources get applied to exploration (increase in number of darts thrown at the dart board). And then eventually the resources get applied to more difficult exploration avenues as that dries up. That basically accounts for the noisy rise and noisy fall, until the noise disappears.
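
The dart board picture can be sketched as size-biased sampling without replacement. This is only an illustration under stated assumptions: each field's chance of being hit next is taken to be proportional to its size raised to a hypothetical exponent (0.5 here, with 0 meaning size plays no role), sizes are measured in units of the median field, and the log standard deviation of 2.5 is inferred rather than quoted. Giving each field an exponential waiting time with rate equal to its weight and sorting by waiting time is a standard way to get that kind of ordering.

```python
import random

def discovery_order(sizes, exponent, rng):
    """Return field indices in order of discovery, with weight = size ** exponent.

    Each field waits an exponential time with rate equal to its weight; sorting
    by waiting time samples fields without replacement in proportion to weight.
    """
    waits = [(rng.expovariate(size ** exponent), index) for index, size in enumerate(sizes)]
    return [index for _, index in sorted(waits)]

def mean_rank_of_largest(n_fields=2000, top=20, exponent=0.5, seed=0):
    """Average position in the discovery sequence of the `top` largest fields."""
    rng = random.Random(seed)
    sizes = [rng.lognormvariate(0.0, 2.5) for _ in range(n_fields)]  # units of median size
    order = discovery_order(sizes, exponent, rng)
    rank = {field: position for position, field in enumerate(order)}
    biggest = sorted(range(n_fields), key=lambda i: sizes[i], reverse=True)[:top]
    return sum(rank[i] for i in biggest) / top

if __name__ == "__main__":
    for exponent in (0.0, 0.5):
        print(exponent, mean_rank_of_largest(exponent=exponent))
```

With a positive exponent the largest fields tend to turn up well before the halfway point of the sequence, but nothing forces them to come first. Layering a rising strike rate on top of this ordering is what can push some of the big finds toward the middle of the calendar rather than the start.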

--

So we can largely account for the noise. The real smoothing process comes about when we apply extraction to these discoveries, essentially dragging the data through several transfer functions that each extract at a rate proportional to what is left. This does not result in a logistic curve, but something more closely resembling the convolution of the discovery data with a gamma distribution. Which leads us full circle to the basis for my oil shock model.
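
A rough sketch of that smoothing step, not the oil shock model itself. Draining a pool at a rate proportional to what is left gives an exponential response, and chaining several such stages with the same rate gives an overall response close to a gamma (Erlang) shape. The three stages, the rate of 0.1 per year, the kernel length, and the toy triangular discovery curve are all hypothetical values picked for illustration.

```python
import math

def exponential_kernel(rate, length):
    """Discrete response of a stage that drains in proportion to what is left."""
    raw = [math.exp(-rate * t) for t in range(length)]
    total = sum(raw)
    return [value / total for value in raw]  # normalized so volume is conserved

def convolve(signal, kernel):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def production_from_discovery(discovery, rates, kernel_length=200):
    """Pass yearly discoveries through one exponential stage per entry in rates."""
    flow = list(discovery)
    for rate in rates:
        flow = convolve(flow, exponential_kernel(rate, kernel_length))
    return flow

if __name__ == "__main__":
    # Toy noise-free triangular discovery curve, arbitrary units.
    discovery = [float(min(t, 70 - t)) for t in range(71)]
    production = production_from_discovery(discovery, rates=[0.1, 0.1, 0.1])
    print("discovery peak year:", discovery.index(max(discovery)))
    print("production peak year:", production.index(max(production)))
```

Because each kernel sums to one, the area under the production curve matches the area under the discovery curve, while the peak moves later and the year-to-year spikes get smeared out.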

Update: A thread over at PeakOil.com has the optimistic yet open-minded RockDoc commenting:
Remember that the Megagiant field size sits on the 99 percentile of world field size distribution….meaning that the chance of finding another is pretty slim.
As to the likelihood of finding more of them diminishing every day….that is only true if exploration efforts in areas where they are most likely to be found has been aggressive.
The RockDoc, an industry insider, promises to provide some data after I maniacally ask "I propose that rockdoc volunteer what he thinks is the global log-normal distribution of discovery sizes".
I think I can do that...I have a nifty program that will plot out field size on log normal probability paper. May take some time to dig up the field sizes though (I think I have it up to 2003 but may not have the 2004 data yet)....I'll first check to see what is in print already. There are some problems in doing this though...as an example Ghawar is often treated as being one big accumulation when in fact there are several distinct pools...hopefully that will be lost in the noise.
We'll see what he comes up with. He better get it out quick before every single internet transmission gets filtered by corporate legal departments courtesy of BushCo.

Thanks to Aldert Van Der Ziel, "Professor der Noise", who I had the privilege to study under in grad school.