Monday, September 06, 2010

Hydrogeology for Dummies

A running theme of this blog involves the reduction of seemingly complex behaviors into simple mathematical formulations. It remains a bit of a mystery to me why, in many situations, no one has either (a) done this work on their own or (b) uncovered the work of someone else who did the simplifying analysis years ago.

The majority of scientists practicing mainstream research have furthered the cause by following the lead of others who go down blind alleys and over-complicate the analysis. I suspect that a few complicate matters intentionally, as it demonstrates to other scientists their intellectual prowess. In certain cases, creating a private world of intricate analysis acts as a kind of moat around which they can fortify their specialty discipline.

Of course, this doesn't happen universally. Certainly we run across many scientific and engineering subdisciplines that have gone through years of scrubbing. In these cases, the most salient and simple analyses have emerged and stood the test of time. They often share the same traits of elegance and crystalline transparency so that we can use their patterns to understand the world without a lot of extra effort. To me, that seems a reasonable goal to strive for.

In this post, I will go through the derivation of what I consider a very overlooked and simple argument having to do with the transport of materials in porous media -- much as what you would find in tracing a contaminant through a groundwater basin. Or what may happen if you frac for natural gas and open up new pathways to a drinking water aquifer. Or how oil will migrate to a reservoir over time, feeding the production output of a stripper well for years. Or what happens if you spill oil in a waterway.

Unfortunately, when you pose this kind of problem to a research geologist or hydrologist, you will have to prepare for an onslaught of ornate misdirection. They will either derive some hideous numerical model or possibly run a piece of commercial software. Apparently, they will never resort to plain logic and elementary first-principles considerations.

The Problem

1. Consider a contaminant that enters an aquifer in a single dose
2. Predict how long it will take to pass by a downstream location
3. How do you solve this problem?

A large scale experiment typically looks like this scenario:


And you get a result that looks like the following figure. Intuitively, one would expect that the concentrated dose will disperse as it travels downstream and that the original concentration will spread out in time. The red curve that goes through the data gives you a feel for what I will derive via a simple model.

As a main premise, I assume that disorder plays a big role in providing a variety of pathways from source to sink. One can imagine that some paths might occur on the main waterway, providing a maximum speed or path of least resistance. Other paths may follow obstructions or diversions which will either slow down or speed up the flow from the main path.

The main path has a mean velocity v0 and the other paths have probabilities that range below this, with some mean deviation vm from v0. A distribution that maximizes entropy while holding to these two minimal constraints looks like the following graph.

Figure 1: MaxEnt velocity distribution for absolute mean deviation

This illustrates simple dispersion. For this post we won't even consider diffusion, which although important may in fact act as only a second-order effect depending on the speed of the main flow.

The calculation of downstream concentration, n(x,t), drops out of the Fokker-Planck equation if we ignore diffusion. Note the delta function, δ(x-vt), which describes a traveling pulse for each velocity component.
n(x,t) = ∫ p(v) δ(x-vt) dv
Next we apply the Maximum Entropy Principle to generate a velocity distribution as shown in Figure 1:
p(v) = 1/vm exp(-|v-v0|/vm)
No other distribution has a higher entropy given that mean and an absolute deviation from the mean, so it ranks as the least biased estimator for that set of constraints. (Note that this does not describe the normal or Gaussian distribution, as that requires a second-moment, i.e. variance, constraint. It turns out that the mean deviation distribution, also known as the Laplace, is actually a smeared Gaussian where we have MaxEnt uncertainty in σ-squared. So for the same mean absolute deviation, the Laplace entropy is higher than the Gaussian entropy.)
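The entropy claim can be checked with closed-form expressions. The sketch below assumes both densities are matched on the same mean absolute deviation b (the constraint used above), and uses the standard differential entropy formulas for the Laplace and Gaussian densities. The function names and the value of b are illustrative only.

```python
import math

def laplace_entropy(b):
    """Differential entropy (nats) of a Laplace density with scale b.
    For the Laplace density the mean absolute deviation equals b."""
    return 1.0 + math.log(2.0 * b)

def gaussian_entropy_same_mad(b):
    """Entropy (nats) of a Gaussian whose mean absolute deviation equals b."""
    # For a Gaussian, E|v - v0| = sigma * sqrt(2/pi), so sigma = b * sqrt(pi/2).
    sigma = b * math.sqrt(math.pi / 2.0)
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

b = 0.18  # illustrative scale; the gap below does not depend on it
gap = laplace_entropy(b) - gaussian_entropy_same_mad(b)
print(f"Laplace minus Gaussian entropy at equal mean deviation: {gap:.4f} nats")
```

The gap works out to 1/2 + ln 2 - ln π, a small positive number, so the Laplace density is the higher-entropy choice under this particular constraint.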

We can trivially solve the integral to generate a concentration at some downstream location x (forget about adding extra dimensions as a one-dimensional result should suffice).
n(x,t) = 1/(vmt) exp(-|x/(vmt)-v0/vm|)
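As a quick illustration, the closed form can be evaluated directly. This is a minimal sketch that uses the prefactor exactly as written above; a unit-normalized Laplace density would carry 1/(2 vm) instead of 1/vm, which only rescales the curve. The values of x and v0 are arbitrary, and vm/v0 = 0.18 is borrowed from the fit discussed later in the post.

```python
import math

def concentration(x, t, v0, vm):
    """n(x,t) = 1/(vm*t) * exp(-|x/(vm*t) - v0/vm|), as written in the post."""
    return math.exp(-abs(x / (vm * t) - v0 / vm)) / (vm * t)

x, v0 = 1.0, 1.0        # arbitrary units (assumption for illustration)
vm = 0.18 * v0          # ratio taken from the fit quoted in the post

# The pulse carried by the main path arrives at t = x / v0.
t_main = x / v0
for t in (0.5, 0.8, 1.0, 1.5, 3.0, 10.0):
    print(f"t = {t:5.1f}   n = {concentration(x, t, v0, vm):.5f}")
```

The curve rises steeply before t = x/v0, peaks there with value 1/(vm t), and then decays slowly.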
Let's see how this works in practice.

I pulled data from a pair of papers from 2008, "Non-Fickian dispersion in porous media", T Le Borgne, P Gouze, et al. The scientists created a carefully controlled experiment, which relied on a customized apparatus for making precise measurements of the contaminant, a fluorescent dye called uranine. The value of this particular experiment lies in the large dynamic range of the resultant data. The concentration runs over four orders of magnitude and the time scale over two. Their own model, although generating a good fit to the data, needed a numerical calculation to solve, violating my assertion that we can model via simpler mechanisms.

The following figure allows for the wide dynamic range by plotting the concentration (also known as a breakthrough curve) on a log-log scale. The red triangles fit the Maximum Entropy dispersion model, n(x,t), for a fixed value of x and a value of vm/v0 = 0.18. By inverting the concentration we can get the probability distribution of velocities in the bottom figure; on a semi-log plot a symmetric two-sided exponential looks like a perfect isosceles triangle. Based on the outstanding fit and symmetric distribution I find it blatantly obvious that entropic mechanisms generate the dispersion observed. You won't get this parsimonious a fit from such a simple model -- with essentially a single parameter vm/v0 -- unless it has some real merit.

Figure 2: Breakthrough curve (top) and
measured velocity distribution (bottom)
for fluorescent dye tracer experiment.

I would suggest that any further modeling of these kinds of porous structures makes little sense, since we have essentially shown that the multitude of pathways maximizes entropy and thus the disorder of the system. In other words, you could not model a more complex system given those constraints if you tried. Nature will always win out with entropy in its back pocket.

The simplicity of the model also points out how readily fat-tail effects emerge from entropic disorder. The power law drop-off obeys a 1/time behavior that certainly has consequences in terms of how long a contaminant will remain in a groundwater basin. Velocity dispersion with a mean MaxEnt constraint will always lead to a power-law drop-off in time (see more here).
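The 1/time drop-off can be read off numerically as a log-log slope. The sketch below re-uses the closed form for n(x,t) from this post and estimates the slope with a small finite difference. The parameter values are illustrative assumptions, not fitted values.

```python
import math

def concentration(x, t, v0, vm):
    """n(x,t) = 1/(vm*t) * exp(-|x/(vm*t) - v0/vm|), as written in the post."""
    return math.exp(-abs(x / (vm * t) - v0 / vm)) / (vm * t)

def loglog_slope(x, t, v0, vm, factor=1.01):
    """Finite-difference estimate of d(log n)/d(log t) at time t."""
    n1 = concentration(x, t, v0, vm)
    n2 = concentration(x, t * factor, v0, vm)
    return (math.log(n2) - math.log(n1)) / math.log(factor)

x, v0, vm = 1.0, 1.0, 0.18   # illustrative values
for t in (2.0, 10.0, 100.0, 1000.0):
    print(f"t = {t:7.1f}   log-log slope = {loglog_slope(x, t, v0, vm):.4f}")
```

Past the peak the exact slope is -1 - x/(vm t), so it starts steeper than -1 and flattens toward -1 as t grows, which is the fat tail.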

See also these posts:
  1. http://mobjectivist.blogspot.com/2009/06/dispersive-transport.html
  2. http://mobjectivist.blogspot.com/2010/05/characterizing-mobility-in-disordered.html
  3. http://mobjectivist.blogspot.com/2010/05/fokker-planck-for-disordered-systems.html
The hydrologists and geologists who ignore entropy in favor of some other fancy model do so based on their own stubbornness or ignorance. I have observed that the practice of making things too complicated runs rampant among geologists, and it really strikes me as kind of sad. We have hydrogeologist hacks like Steven Gorelick writing cornucopian books diminishing the significance of peak oil, when they can't even do the science of their own discipline correctly.

Friday, August 20, 2010

Tasseography

Oil Watch Monthly

Because of the magnified nature of the production scale I find it interesting to place the data on the real scale, which shows the zeros and the full temporal range. See the short black segment in the following figure, which signifies the range reported on TOD.

I don't really understand this infatuation with what I consider noise riding on top of the more important overall scaled profile. Readers must feel a need to see this magnified view which I don't quite grasp.

Is it because people have become accustomed to using the information for futures trading or anticipating the stock market? I presume that every little glitch provides a chance to make some money.

Or do we suffer from climate change envy where temperature trends get studied to death? That works in a different context because temperatures normally occupy a narrow range and the important signal can get buried in the measurement noise.

Or do people want to anticipate seeing that sudden, precipitous drop that will signal us going over the cliff?

More likely the answer is that we continue to plot the magnified view because we can and it gives us a strawman to argue back and forth over. The term tasseography describes this behavior.

Noise can tell us something, but to first order it really only tells us what we already know. The fewer the number of independent measurements or actors in the market, the greater the noise and fluctuations.

Thursday, July 01, 2010

GOM Maximum Production Rate and Macondo

I did some analysis based on Berman's post from a few days ago:
(Estimated Oil Flow Rates From the BP Mississippi Canyon Block 252 “Macondo” Well)

I think he messed up the statistics because of his use of a truncated data set from the MMS and the log-normal distribution he used.

I wasn't sure exactly how he got his data but I essentially had to screen scrape the data off of about 18 PDF files giving the Maximum Production Rate (MPR) going back to 1975: http://www.gomr.mms.gov/homepg/pubinfo/repcat/product/MPR.html

I plotted the results histogram against a model of dispersive aggregation for reservoir sizes. The maximum rate is then a simple proportional draw-down from the reservoir size. Bigger reservoirs have a higher rate and smaller reservoirs have a smaller rate -- nothing to argue about here as it is a pretty safe approximation. The way you read this histogram is that the flat regions have the highest frequency.


The integrated area underneath the two curves is equal, at about 16.5 million barrels per day peak. Don't confuse this with any rate attainable from the GOM; it is high because it sums up the peaks from a span of years. The median value is 200 barrels per day.

The interesting point in the curve is that the model predicts a higher peak rate for the largest reservoirs; the curve goes off the graph to above 400,000 barrels per day. Now, I would think that the operators would never try to have that throughput from a single well. So what do they do? Of course they split it into several wells to extract the maximum amount from that reservoir and essentially throttle the output from an individual well.

Since the total amount is conserved between the two curves, the bulge that you see in the data is the extra wells drilled to make up for the excess. My model is totally based on the principle of Maximum Entropy applied to reservoir sizing, and the reordering of the rank histogram is caused by artificial constraints set by human intervention. Notice that all the small reservoirs effectively require no throttling.

The point of this comment is that working wells are likely throttled but the Macondo could conceivably be higher than the maximum of 50,000 barrels per day that Berman suggested. The operators have no way of throttling it until the relief wells are put in place. Of course this kind of throughput is very rare, as at the most a couple of dozen out of 10,000 reservoirs will get this big and generate this potential, but this is the way that nature operates, a big fat-tail effect.

Saturday, June 19, 2010

Petroleum Engineering

With all the discussion on the Gulf Oil disaster going on, lots of petroleum engineers and others from the oil industry have pitched in with their opinions, so we can see exactly what they think of their profession.

One commenter, apparently an authority on reservoir engineering, had this to say about Peak Oil:
We understand how our business works, certainly. Guys like us, (those IN THE KNOW) have been declaring the end of oil since at least 1886. In Pittsburgh to be specific. Can't say we didn't give the rest of you noobs plenty of warning.
So let me understand this statement. Oil industry types apparently have always known that the end of oil would occur since day one. I wonder why no one thought to just ask them? How did we miss that one?

This same fellow has huge problems with my analysis, because he thinks that what I do amounts to "curve fitting".
I mean seriously, who else would confuse curve fitting with knowledge?
In truth, most of the forecasters who point to continually increasing oil production well into the future base their projections on very little real knowledge. They actually practice curve fitting, i.e. fitting a curve to the production level that we need, because they have no other justification for a realistic outlook.

Bayesian analysis works by using past knowledge to predict future outcomes. We have so much knowledge about previous discoveries, reserve growth mechanisms, and extraction rates that our ability to predict should work very effectively ... if we would just start universally using this kind of approach. The other benefit is that the analysis keeps on getting better and better with time due to the Bayesian updating process. The mathematician Laplace first applied this powerful mode of probabilistic reasoning in the late 1700's to real problems, but we still have holdouts in various disciplines. To top it off, if you have a real model underneath the knowledge, it makes the forecasting that much more powerful.
Let them get through diffy-q, I suppose the only other gang besides engineers forced through that one are the more mathematically inclined....and they are mostly jealous because their theoretical skills don't translate into income very well.
Common knowledge in college that students that went into geology, civil, and petroleum engineering didn't want to get stick in a desk job. Lots of them could not imagine being sedentary for 8 hours a day.

Wednesday, June 16, 2010

Hubbert peak in Five Easy Pieces

Based on the increase in spill rate from the leaking Gulf of Mexico oil well, HO at TheOilDrum.com suggested a potential explanation. His post essentially argued that sand particles, acting as a strong abrasive driven along by the already high-velocity stream of escaping oil, lead to increased channeling and thus an even faster leak rate.

HO described a process known as CHOPS (Cold Heavy Oil Production with Sand) which can enlarge a well's streaming throughput by promoting the formation of heavily eroded channels. The TOD post provided the following picture of the possible outcome of the behavior.
Note that the lower curve shows the typical output from a throttled flow. Above that curve, the modulated line shows the results of an accelerated extraction -- note that a peak actually appears which pinpoints the maximum flow rate. In terms of the oil spill, we don't want this behavior because it gives us less time to fix or relieve the problem well. Yet, ordinarily we want this same behavior -- that of fast extraction -- in practical situations because we want and need the oil right now! (so that oil companies can make money, of course)

Which leads me to formulating the following very simple but physically correct model of Hubbert's Peak. You won't find this anywhere else, because this derivation does not jibe with how geologists think about oil extraction. They get many of the pieces but they never put them all together.

I will offer up a derivation for this behavior leading to a Hubbert Peak in 5 easy pieces.

Piece 1. The standard assumption of draw-down from a reservoir results in an exponential decline over time. You can consider that the exponential shape results from a law of diminishing returns; in that a constant amount proportional to the remainder draws down per unit time. Or you can say that a maximum entropy range of extraction rates gets applied to the volume. A proportional extraction rate that we call R defines the mean and U0 is the reservoir size. U(t) gives us the cumulative reserve.
U(t) = U0*exp(-R*t)


Piece 2. Next, we realize that we have uncertainty over the size of the reservoir; the U0 we have defined actually only serves as an estimate of the size. This means we have an uncertainty over the rate of proportional extraction as well. This turns into a form of hyperbolic discounting and the cumulative draw-down actually looks like this.
U(t) = U0 / (1+R*t)


Note the fat-tail.
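One way to see how the hyperbolic form arises: average the exponential decline of piece 1 over a spread of rates. The sketch below assumes the rates follow an exponential (maximum entropy) density with mean R. That specific density is an assumption made here for illustration, and the integral is done with a plain midpoint rule.

```python
import math

def averaged_decline(t, R, n_steps=20000, r_max_factor=50.0):
    """Average exp(-r*t) over rates r with density (1/R)*exp(-r/R).

    Midpoint-rule quadrature on [0, r_max_factor * R]; the neglected
    tail beyond that is of order exp(-r_max_factor).
    """
    r_max = r_max_factor * R
    dr = r_max / n_steps
    total = 0.0
    for i in range(n_steps):
        r = (i + 0.5) * dr
        total += math.exp(-r / R) / R * math.exp(-r * t) * dr
    return total

R = 0.5  # illustrative mean rate
for t in (0.0, 1.0, 2.0, 10.0):
    print(f"t = {t:5.1f}   averaged = {averaged_decline(t, R):.6f}"
          f"   1/(1+R*t) = {1.0 / (1.0 + R * t):.6f}")
```

The two columns agree to within the quadrature error, so under this assumption U(t)/U0 = 1/(1+R*t).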

Piece 3. Next we assert that the constant but uncertain proportional extraction rate undergoes an acceleration starting from the original value, R(t) = R0 + k*t. This acceleration equates to Newton's law, first-order with time. Then the instantaneous absolute rate of extraction from the remaining reservoir looks like:
RateOfExtraction(t) = -dU(t)/dt = U0*(R0 + k*t)/(1 + R0*t + k*t^2/2)^2
For R0=0.5 and k=2, it results in this shape
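The shape can be tabulated directly from the piece 3 formula. This is a minimal sketch with U0 set to 1 (an arbitrary normalization), and the peak is located by a simple grid search rather than analytically.

```python
def extraction_rate(t, U0=1.0, R0=0.5, k=2.0):
    """-dU/dt for U(t) = U0 / (1 + R0*t + k*t^2/2)."""
    denom = 1.0 + R0 * t + 0.5 * k * t * t
    return U0 * (R0 + k * t) / denom ** 2

# Grid search for the peak on 0 <= t <= 5 with step 0.001.
times = [i * 0.001 for i in range(5001)]
rates = [extraction_rate(t) for t in times]
peak_rate = max(rates)
t_peak = times[rates.index(peak_rate)]

print(f"rate at t=0: {extraction_rate(0.0):.4f}")
print(f"peak near t = {t_peak:.3f}, rate = {peak_rate:.4f}")
```

The rate starts at U0*R0, climbs to a modest peak shortly after t = 0.3, and then falls off with a long tail.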



This curve we can scale and overlay on top of the CHOPS curve to validate our thought process.



Piece 4. Over a larger set of reservoirs that experience a technical improvement over time, we can assume that the proportional extraction rate can accelerate even more strongly over time, R(t)=C*exp(k*t). This gives us a Moore's law form of acceleration, doubling every set number of years. Then
RateOfExtraction(t) = -dU(t)/dt = U0 * R(t) / (1 + integral(R(t)dt))^2

= U0*C*exp(k*t)/(1 + C/k*(exp(k*t)-1))^2
For a small starting rate, the acceleration further accentuates the subtle peak that we observe in piece 3 and it turns into a full-fledged symmetric peak as shown in the next figure:
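The symmetry of the peak can be checked numerically. In the sketch below the values of C and k are illustrative assumptions. The peak time ln((1-a)/a)/k with a = C/k comes from setting the derivative of the piece 4 formula to zero; it is worked out here and is not stated in the post.

```python
import math

def extraction_rate(t, U0=1.0, C=0.001, k=0.1):
    """-dU/dt for R(t) = C*exp(k*t), following the piece 4 formula."""
    growth = math.exp(k * t)
    denom = 1.0 + (C / k) * (growth - 1.0)
    return U0 * C * growth / denom ** 2

def peak_time(C=0.001, k=0.1):
    """Time of maximum rate, from d(rate)/dt = 0 (requires C/k < 1)."""
    a = C / k
    return math.log((1.0 - a) / a) / k

tp = peak_time()
print(f"peak time = {tp:.2f}, peak rate = {extraction_rate(tp):.5f}")
for d in (5.0, 10.0, 20.0):
    print(f"offset {d:4.1f}: before = {extraction_rate(tp - d):.6f}"
          f"   after = {extraction_rate(tp + d):.6f}")
```

The rates at equal offsets before and after the peak match, which is the bell shape of the logistic derivative.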



Piece 5. Congratulations. You haven't broken any rules and you have just derived the famed Hubbert Peak, also known as the Logistic Sigmoid function.


Some Backstory
An alternate derivation exists for the corresponding discovery peak, which I call Dispersive Discovery. There, the uncertainty involves how much volume gets explored and at what rate, otherwise the math turns out exactly the same. Both derivations result from an assumed finite constraint but uncertainty in both rates and subvolumes. The only problem with using the Hubbert peak derivation for extraction is that it premises that each extraction rate started at the same time (globally this would be 1858). We know that this has not happened for global production, as extraction can only start after a discovery, and then some variable hold time. By using dispersive discovery, we get a larger spread in start years, and then The Oil Shock model generates the extraction curve. In general, if the discovery peak precedes the oil production peak by a number of years, I would use Dispersive Discovery, but if the two coincide, then extraction tracks discovery and it doesn't really matter how you interpret the rates. This explains why this particular derivation works well for more localized production areas that have seen significant technology changes. In contrast, the technology of discovery has undergone tremendous technology changes over the years, so that dispersive discovery works very well in terms of global modeling. This is actually not much of a caveat, as the more ways that you can find the same result, the more confidence you have that you have remained on the right track.

The current derivation also points out the huge hole in the technique known as Hubbert Linearization (HL). As defined, HL derives from the observation that
dU(t)/dt = U(t)*(U0-U(t))
Yet this only works for the one case where we can define R(t) as an exponential function, that of piece 4. The formula does not work for either piece 1, 2, or 3. Therefore, HL only serves as a curious mathematical identity for that one exponential case, which we know does not always occur.
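A rough numerical check of this point: if the HL relation held, the ratio of the extraction rate to the product (amount produced) * (amount remaining) would stay constant in time. The sketch below compares that ratio for pieces 3 and 4. The parameter values and function names are illustrative assumptions.

```python
import math

U0 = 1.0  # arbitrary normalization

def piece3(t, R0=0.5, k=2.0):
    """Remaining reserve and extraction rate for R(t) = R0 + k*t."""
    denom = 1.0 + R0 * t + 0.5 * k * t * t
    return U0 / denom, U0 * (R0 + k * t) / denom ** 2

def piece4(t, C=0.001, k=0.1):
    """Remaining reserve and extraction rate for R(t) = C*exp(k*t)."""
    denom = 1.0 + (C / k) * (math.exp(k * t) - 1.0)
    return U0 / denom, U0 * C * math.exp(k * t) / denom ** 2

def hl_ratio(remaining, rate):
    """rate / (produced * remaining); constant in t if HL holds."""
    produced = U0 - remaining
    return rate / (produced * remaining)

for t in (1.0, 3.0, 10.0):
    print(f"piece 3, t = {t:5.1f}: ratio = {hl_ratio(*piece3(t)):.4f}")
for t in (60.0, 100.0, 200.0):
    print(f"piece 4, t = {t:5.1f}: ratio = {hl_ratio(*piece4(t)):.4f}")
```

For the exponential case the ratio settles at k/U0 once exp(k*t) is much larger than 1, while for the linear acceleration it keeps drifting downward.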

The actual "WebHub" Linearization takes the following form:
dU(t)/dt = -U0 * R(t) / (1 + integral(R(t)dt))^2
This may not prove as handy as HL perhaps, but it has the benefit of correctness, and it works well for certain cases.

Like me, Robert Rapier has railed against the inadequacy of HL and this may take up the slack.

Monday, June 14, 2010

GOM Reservoir Size Distributions

Question:


I have heard many unofficial estimates of the magnitude of oil in this formation... 2nd largest in America, 2nd largest in the world...

Does anyone have a credible estimate on the formation reserves?

Some historical data available from the MMS.
http://www.gomr.mms.gov/PDFs/2009/2009-064.pdf
On the basis of proved oil, for 8,014 proved undersaturated oil reservoirs, the median is 0.3 MMbbl, the mean is 1.8 MMbbl.

Peak Oil theory (Entropic Dispersive Aggregation) says the cumulative size distribution of reservoirs (ranked small to large) goes as P(Size)=1/(1+0.3/Size) if we assume a median of 0.3. It doesn't quite follow this exactly because infinite sized reservoirs can not exist.
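The cumulative form is easy to invert, which gives quantiles directly. A minimal sketch, with sizes in MMbbl and the median of 0.3 taken from the MMS figure quoted above; the function names are hypothetical.

```python
def size_cdf(size, median=0.3):
    """Fraction of reservoirs no larger than `size`: 1 / (1 + median/size)."""
    return 1.0 / (1.0 + median / size)

def size_quantile(p, median=0.3):
    """Inverse of size_cdf: the size below which a fraction p of reservoirs fall."""
    return median * p / (1.0 - p)

for p in (0.10, 0.50, 0.90, 0.99, 0.999):
    print(f"{p:6.1%} of reservoirs are below {size_quantile(p):9.3f} MMbbl")

# Fraction of reservoirs larger than the reported mean of 1.8 MMbbl.
print(f"fraction above 1.8 MMbbl: {1.0 - size_cdf(1.8):.3f}")
```

The upper quantiles grow without bound as p approaches 1, which is the fat tail; a real basin has to cut off somewhere, as noted above.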

If you want the raw data it is here:
file:///G:/RE/Shared/EOGR%20Report/2008-034%20Estimated%20Oil%20and%20Ga...

Sorry, that was a joke, the MMS puts the information on a public web server, and the data is retrieved as a local filesystem URL?
http://www.gomr.mms.gov/homepg/pubinfo/freeasci/geologic/estimated2006.html

I placed whatever data I could get into Google Docs, and placed theory next to it.




The MMS is to be split into 3 agencies apparently. Throughout their history, they failed in doing any kind of useful depletion analysis in the GOM. Anyone can collect data; interpreting it is the challenging part.

Saturday, June 12, 2010

The Mentaculus

I saw the Coen brothers movie "A Serious Man" a few months ago. A definite period piece from the 1960's, it contrasted two scientists, one an academic and one a hapless amateur. The main protagonist, Larry Gopnik, a physics professor at what looks like a small liberal arts school in the Twin Cities (Macalester, Hamline maybe?), spends time teaching his students what look like elaborate mathematical derivations on a huge chalkboard. He has trouble dealing with some of his students on occasion:
Clive Park: Yes, but this is not just. I was unaware to be examined on the mathematics.
Larry Gopnik: Well, you can't do physics without mathematics, really, can you?
Clive Park: If I receive failing grade I lose my scholarship, and feel shame. I understand the physics. I understand the dead cat.
Larry Gopnik: You understand the dead cat? But... you... you can't really understand the physics without understanding the math. The math tells how it really works. That's the real thing; the stories I give you in class are just illustrative; they're like, fables, say, to help give you a picture. An imperfect model. I mean - even I don't understand the dead cat. The math is how it really works.
His academic colleagues want Professor Gopnik to publish articles at some point (with the implicit threat of not getting tenure). Gopnik's main problem lies in his rationality:
But his rigid framing of a cause-and-effect universe makes him indignant about lack of apparent cause ...
Gopnik's brother, the minor character of Uncle Arthur, takes the role of an almost savant numerologist, busy at work on a treatise called The Mentaculus. Filled with dense illustrations and symbology, it apparently functions as a "probability map" in what appears to spell out a Theory of Everything. It also apparently works to some extent:
We might guess that it makes no sense, but Arthur's "system" apparently "works" as intended, and he applies it to winning at back room card games.
Based on the events that eventually transpire, the theme of the movie essentially says that if you seek rationality, you will ultimately only land on random chance.

I consider myself a "serious man" as well. But do I have a variation of The Mentaculus buried in the contents of this blog?

I tried to make a probability map of all the applications and blog links that I have worked on relating to what I call entropic dispersion in the following table [full HTML]:



The math is how it really works. Perhaps I should publish. Yet blogging is too much fun. Perhaps I need to take a canoe trip.



Good reads describing The Mentaculus of probability and statistics
  1. "Dawning of the Age of Stochasticity", David Mumford
    From its shady beginnings devising gambling strategies and counting corpses in medieval London, probability theory and statistical inference now emerge as better foundations for scientific models, especially those of the process of thinking and as essential ingredients of theoretical mathematics, even the foundations of mathematics itself.
  2. "Probability Theory: The Logic of Science", Edwin T. Jaynes

    Our theme is simply: probability theory as extended logic. The ‘new’ perception amounts to the recognition that the mathematical rules of probability theory are not merely rules for calculating frequencies of ‘random variables’; they are also the unique consistent rules for conducting inference (i.e. plausible reasoning) of any kind, and we shall apply them in full generality to that end.

  3. "On Thinking Probabilistically", M.E. McIntyre
  4. "The Black Swan" and "Fooled by Randomness", N.N. Taleb