
Sunday, June 29, 2008

Numbers Don't Lie, Liars Don't Count

Nate Hagens posted the Dispersive Discovery derivation of the Logistic curve on TheOilDrum.com last week. Lots of good comments posted but one kind of got my goat because it seemed quite regressive.
"WHT analysis would have been important in the 1920-1930's but its 2008 now and they seems to have done quite well without the dispersive model."
This happens quite often when a new idea comes across the bow. The idea tends to get marginalized by trivializing the context. Consider that the Big Bang Theory seems a popular topic among scientific cosmologists and the Stephen Hawking readers. But some genius comes along and says "Why is the Big Bang important? That happened billions of years ago, and I can't buy gas for less than $4 a gallon today."

So the idea and its underlying worth becomes a matter of perspective. I figure that we still want to understand how we got ourselves into this mess.

Remember the George Monbiot quote: "Tell people something they know already and they will thank you for it. Tell them something new and they will hate you for it."

You would almost think that we have concern trolls that battle ideas via pure rhetoric without ever thinking to lift a pen and try to contribute some sort of mathematical analysis (I know, how naive of me). Worse still, one of the most disturbing posts I have read recently came from Carl Pope, 'Let Them Hate, So Long as They Fear', about the anti-scientific views of the Bush administration. And now, all these scientific neotards have started to come out of the woodwork with the high gas prices. I recall doing a post several years ago where I monitored the Powerline blog, a notorious right-wing puppet site, for historical references to any kind of post relating to real oil depletion issues. Looking back at the reference table I generated, they did make 3 references to opening up the Arctic National Wildlife Refuge. So I now can see how the roots of the right-wing talking points develop. Of course, now the Powerline guys have become experts on the current situation, filling their blog with their "take" on the current oil situation. But we all know the PowerLiars couldn't count their way out of a paper sack.

Fellow Minnesotans, please vote for Al Franken. Likely the only senatorial candidate ever to achieve a perfect score (800) on the mathematics portion of the SAT, he understands this stuff and knows even more about the lying techniques of the 'minionist right. He basically wrote the book on it.

Sunday, June 08, 2008

Double Dispersive Discovery leads to the Sigmoid and Logistic

I believe I made a significant finding in regards to the Dispersive Discovery model. In its general form, keeping search growth constant, the dispersive part of the model produces a cumulative function that looks like this:
D(x) = x * (1-exp(-k/x))
The instantaneous curve generated by the derivative looks like
dD(x)/dx = 1 - exp(-k/x)*(1+k/x)
Add a growth term for x and we get a family of curves for the derivative:
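As a quick sanity check, the following minimal sketch compares the analytic derivative of D(x) against a central finite difference, with unit scaling (no extra constant prefactor). The function names and the value k = 1 are hypothetical choices made only for this illustration.

```python
import math

def cumulative_discovery(x, k):
    """Cumulative dispersive discovery: D(x) = x * (1 - exp(-k/x))."""
    return x * (1.0 - math.exp(-k / x))

def discovery_rate(x, k):
    """Analytic derivative: dD/dx = 1 - exp(-k/x) * (1 + k/x)."""
    return 1.0 - math.exp(-k / x) * (1.0 + k / x)

def central_difference(f, x, h=1e-5):
    """Numerical derivative of f at x using a symmetric difference."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

k = 1.0  # hypothetical value, for illustration only
for x in (0.1, 0.5, 1.0, 5.0):
    numeric = central_difference(lambda s: cumulative_discovery(s, k), x)
    print(x, discovery_rate(x, k), numeric)
```

The two printed columns should agree to several decimal places at each x.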

I generated this set of curves simply by applying growth terms of various powers, such as quadratic, cubic, etc, to replace x. No bones about it, I could have just as easily applied a positive exponential growth term here, and the characteristic peaked curve would result, with the strength of the peak directly related to the acceleration of the exponential growth. I noted that in an earlier post:
As far as other criticisms, I suppose one could question the actual relevance of a power-law growth as a driving function. In fact the formulation described here supports other growth laws, including monotonically increasing exponential growth.
Overall, the curves have some similarity to the Logistic sigmoid curve and its derivative, traditionally used to model the Hubbert peak. Yet it doesn't match the sigmoid because the equations obviously don't match -- not surprising since my model differs in its details from the Logistic heuristics.
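The peaked family described above can be reproduced with a short sketch. It substitutes a power-law growth term x(t) = t**n into the dispersive derivative and applies the chain rule, dD/dt = dD/dx * dx/dt. The value k = 1, the powers tried, and the grid search for the peak are all assumptions made for illustration, not settings taken from the original plots.

```python
import math

def discovery_rate(x, k):
    """dD/dx for the dispersive discovery curve."""
    return 1.0 - math.exp(-k / x) * (1.0 + k / x)

def rate_in_time(t, k, power):
    """dD/dt via the chain rule with power-law growth x(t) = t**power."""
    x = t ** power
    dx_dt = power * t ** (power - 1)
    return discovery_rate(x, k) * dx_dt

def peak_time(k, power, t_max=10.0, steps=10000):
    """Locate the time of maximum dD/dt by a simple grid search on (0, t_max]."""
    times = [t_max * (i + 1) / steps for i in range(steps)]
    return max(times, key=lambda t: rate_in_time(t, k, power))

k = 1.0  # hypothetical value
for power in (2, 3, 4):
    t_peak = peak_time(k, power)
    print(power, t_peak, rate_in_time(t_peak, k, power))
```

For powers above 1 the rate starts near zero, rises to an interior maximum, and then decays, which is the characteristic peaked shape.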

However, and it starts to get really interesting now, I can add another level of dispersion to my model and see what happens to the result. I originally intended for the dispersion to only apply to the variable search rates occurring over different geographic areas of the world. But I hinted that we could extend it to other stochastic variables:
We have much greater uncertainties in the stochastic variables in the oil discovery problem, ranging from the uncertainty in the spread of search volumes to the spread in the amount of people/corporations involved in the search itself.
So I started with a spread in search rates given as an uncertainty in the searched volume swept, and locked down the total volume as the constant k=L0. Look at the following figure, which shows several parts of the integration, and you can see that the uncertainties only reflect in the growth rates and not in the sub-volumes, which shows up as a clamped asymptote below the cumulative asymptote:

I figured that adding uncertainty to this term would make the result more messy than I would like to see at this expository level. But in retrospect, I should have taken the extra step as it does give a very surprising result.

That extra step involves a simple integration of the constant k=L0 term as a stochastic variable over a damped exponential probability density function (PDF) given by p(L)=exp(-L/L0)/L0. This adds stochastic uncertainty to the total volume searched, or more precisely, uncertainty to the fixed sub-volumes searched, that when aggregated provide the total volume.
D(x) = Integral [ x * (1-exp(-L/x))*exp(-L/L0)/L0 dL ]
This turns into a trivial analytical integration from L=0 to L=infinity. The result becomes the simple relation:
D(x) = 1/(1/L0 + 1/x)
Note that the exponential term from the original dispersive discovery function disappears. This occurs because of dimensional analysis: the dispersed rate stochastic variable in the denominator has an exponential PDF and the dispersed volume in the numerator has an exponential PDF; these essentially cancel each other after each gets integrated over the stochastic range.
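The integration can also be checked numerically. This sketch evaluates the integral over L with a plain trapezoid rule and compares it with the closed form 1/(1/L0 + 1/x). The truncation of the upper limit at 50*L0, the step count, and the sample values of x and L0 are assumptions chosen only to keep the example small.

```python
import math

def integrand(L, x, L0):
    """x * (1 - exp(-L/x)) weighted by the exponential PDF exp(-L/L0)/L0."""
    return x * (1.0 - math.exp(-L / x)) * math.exp(-L / L0) / L0

def integrate_trapezoid(x, L0, upper_multiple=50.0, steps=100000):
    """Trapezoid rule on [0, upper_multiple * L0]; the tail beyond is negligible."""
    upper = upper_multiple * L0
    h = upper / steps
    total = 0.5 * (integrand(0.0, x, L0) + integrand(upper, x, L0))
    for i in range(1, steps):
        total += integrand(i * h, x, L0)
    return total * h

def closed_form(x, L0):
    """Analytic result of the integration: D(x) = 1/(1/L0 + 1/x)."""
    return 1.0 / (1.0 / L0 + 1.0 / x)

L0 = 3.0  # hypothetical mean volume
for x in (0.5, 2.0, 10.0):
    print(x, integrate_trapezoid(x, L0), closed_form(x, L0))
```

The numerical and analytic columns should match closely for each x.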

In any case, the simple relationship that this gives, when x is replaced by an exponential growth term such as A*exp(B*t), results in the logistic sigmoid function:
D(t) = 1 / (1/L0 + 1/(A*exp(B*t)))
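To see that this really is the familiar logistic, multiply numerator and denominator by L0 to get D(t) = L0 / (1 + (L0/A)*exp(-B*t)). The sketch below checks the two forms against each other and locates the midpoint where D = L0/2, at t = ln(L0/A)/B. The parameter values are hypothetical and chosen only for illustration.

```python
import math

def dispersive_logistic(t, L0, A, B):
    """D(t) = 1 / (1/L0 + 1/(A*exp(B*t))), the form derived from dispersion."""
    return 1.0 / (1.0 / L0 + 1.0 / (A * math.exp(B * t)))

def textbook_logistic(t, L0, A, B):
    """Same curve in the usual form L0 / (1 + (L0/A) * exp(-B*t))."""
    return L0 / (1.0 + (L0 / A) * math.exp(-B * t))

def midpoint_time(L0, A, B):
    """Time at which D reaches L0/2, i.e. where (L0/A)*exp(-B*t) = 1."""
    return math.log(L0 / A) / B

L0, A, B = 2000.0, 1.0, 0.1  # hypothetical values
for t in (0.0, 40.0, midpoint_time(L0, A, B), 120.0):
    print(t, dispersive_logistic(t, L0, A, B), textbook_logistic(t, L0, A, B))
```

At the midpoint time both forms should return L0/2, and that is also where the discovery rate dD/dt peaks.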
I will make the next statement in as passive a voice as possible. This is the Holy Grail derivation of the Logistic curve.

Seriously, I don't think anyone has ever figured out how to derive the Logistic in such fundamental terms until now. The logistic has now transformed from a cheap heuristic into a model result. The fact that it builds on the Dispersive Discovery model gives us a deeper understanding of its origins. So whenever we see the logistic sigmoid used in a fit of the Hubbert curve we know that several preconditional premises must exist:
  1. It models a discovery profile.
  2. The search rates are dispersed via an exponential PDF.
  3. The searched volume is dispersed via an exponential PDF.
  4. The growth rate follows a positive exponential.
This finding now precludes other meaningless explanations for the Logistic curve's origin, including birth-death models, predator-prey models, and other ad-hoc carrying capacity derivations that other fields of scientific study have traditionally incorporated into their temporal dynamics theory. None of that matters, as the Logistic -- in terms of oil discovery -- simply models the stochastic effects of randomly searching an uncertain volume given an exponentially increasing average search rate. In the end, intuition has always told me this, and the math has served as a formal verification of my understanding. You have to shoot holes in the probability theory to counter the argument, which any good debunking needs to do.
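One way to probe the probability argument is a small Monte Carlo run. The sketch below rests on an assumption that is not spelled out in the text: that each trial discovers the smaller of a randomly drawn search depth (exponential, mean x) and a randomly drawn sub-volume (exponential, mean L0). Under that reading, the average discovered amount should approach 1/(1/L0 + 1/x). The function names, trial count, seed, and parameter values are all hypothetical.

```python
import random

def simulate_double_dispersion(mean_depth, mean_volume, trials=200000, seed=1):
    """Average of min(depth, volume) with both drawn from exponential PDFs.

    Assumption of this sketch: discovery in one trial is capped by whichever
    of the searched depth or the available sub-volume is smaller.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        depth = rng.expovariate(1.0 / mean_depth)    # dispersed search
        volume = rng.expovariate(1.0 / mean_volume)  # dispersed sub-volume
        total += min(depth, volume)
    return total / trials

def closed_form(x, L0):
    """Double-dispersion result: D(x) = 1/(1/L0 + 1/x)."""
    return 1.0 / (1.0 / L0 + 1.0 / x)

x, L0 = 2.0, 3.0  # hypothetical mean depth and mean volume
print(simulate_double_dispersion(x, L0), closed_form(x, L0))
```

If the sampled average drifted away from the closed form as the trial count grew, that would point to a hole in the probability argument, at least under this reading of it.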

As a very intriguing corollary to this finding, the fact that we can use a Logistic to model discovery means that we cannot use a Logistic to model production. I have no qualms with this turn of events as production comes about as a result of applying the Oil Shock model to discoveries, and this essentially shifts the discovery curve to the right in the timeline while maintaining most of its basic shape. (And as another bit of insight, consider the application of multiple Logistic curves to model complicated scenarios. The fact that I just integrated multiple stochastic volumes over a search space to derive the logistic raises questions about the validity of such an approach. This really needs a fundamental analysis as it would necessarily duplicate the integration I have already accomplished. Unfortunately, such misuse happens when a curve gets used as a heuristic, separated from its first principles derivation.)

In spite of such a surprising revelation, we can continue to use the Dispersive Discovery in its more general form to understand a variety of parametric models, which means that we should remember that the Logistic manifests itself from a specific instantiation of dispersive discovery. Good to have this chapter closed, as the origin of the Logistic had become a nagging obsession of mine over the past few years. I can basically put it to rest, which will maintain my sanity for a while.

As a corollary, given the result:
D(x) = 1/(1/L0 + 1/x)
we can verify another type of Hubbert Linearization. Consider that the parameter x describes a constant growth situation. If we can plot cumulative discovered volume (D) against cumulative discoveries or depth (x), we should confirm the creaming curve heuristic. In other words, the factor L0 should remain invariant, allowing us to linearly regress a good estimate of ultimate volume:
L0 = 1/(1/D - 1/x)
It looks like this might arguably fit some curves better than previously shown.
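A minimal sketch of this linearization, using synthetic data only: since 1/D = 1/L0 + 1/x, each (x, D) pair gives its own estimate of L0, and a straight-line fit of 1/D against 1/x should have slope 1 and intercept 1/L0. The value L0 = 100, the sample x values, and the helper names are assumptions made for illustration; no real discovery data is used.

```python
def model_discovery(x, L0):
    """D(x) = 1/(1/L0 + 1/x)."""
    return 1.0 / (1.0 / L0 + 1.0 / x)

def estimate_L0(D, x):
    """Pointwise estimate L0 = 1/(1/D - 1/x)."""
    return 1.0 / (1.0 / D - 1.0 / x)

def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

true_L0 = 100.0  # hypothetical ultimate volume
xs = [5.0, 10.0, 20.0, 50.0, 100.0, 300.0]
Ds = [model_discovery(x, true_L0) for x in xs]

# Each point should return the same L0 if the model holds.
print([estimate_L0(D, x) for D, x in zip(Ds, xs)])

# Regress 1/D on 1/x; the intercept is 1/L0.
slope, intercept = fit_line([1.0 / x for x in xs], [1.0 / D for D in Ds])
print(slope, 1.0 / intercept)
```

With real data the pointwise estimates would scatter, and how far they drift from a constant gives a rough sense of how well the double-dispersion form fits.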