Sunday, June 7, 2009

More Raw Data--Indiana This Time.

I thought we would look at a problem seen between two Indiana towns, Columbus and Seymour. This data is from Dave's favorite site -- he seems to have disappeared after I started this sequence of comparisons showing how abysmal the state of the raw data is -- and this data has been partially corrected. The data is from here

First, let's look at the full record. You can see that the two towns differ in temperature by half a degree a year, with first one town being the warmer and then the other. These towns are separated by only 21 miles.

But one can also see that in 1992 and 1993 something strange happened to the temperature. For two years it was hotter, almost every day, in Columbus. Was Columbus a living hell? Did the devil and his minions bring fire and brimstone to Columbus? No, it is just the normal crap of the raw data. But let's look at it in detail.

The red line is a 30-day running average. It varies throughout the two years, but it is almost always hotter in Columbus. Somebody was doing something to one of those temperature stations to make it read hotter or cooler, and they did it for two years. No one knows what it was. But this is the quality of the data that everyone is basing their hysteria about global warming upon.
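To make the smoothing concrete, here is a minimal sketch of that 30-day running average. The numbers are entirely made up (a seasonal cycle plus noise, with one station reading about 2 degrees hotter); the real values would come from the raw station records.

```python
import numpy as np

# Hypothetical two years of daily readings for two stations (deg F);
# the second station is built to read about 2 deg cooler.
rng = np.random.default_rng(0)
days = 730
columbus = 55 + 20 * np.sin(np.linspace(0, 4 * np.pi, days)) + rng.normal(0, 3, days)
seymour = columbus - 2 + rng.normal(0, 3, days)

diff = columbus - seymour  # daily difference between the stations

# 30-day running average -- the same smoothing as the red line
window = 30
running = np.convolve(diff, np.ones(window) / window, mode="valid")
```

Even with 3 degrees of daily noise, the smoothed difference sits near the 2-degree offset on almost every day, which is how a persistent station bias shows up in a plot like this.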

Let's look at those two years in more detail. Here is the histogram of temperature differences for those two years. You can see it is anything BUT Gaussian, meaning that normal probabilistic statistics won't help here.
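The non-Gaussian shape is easy to reproduce with synthetic numbers: mix mostly-zero differences with a block shifted by 2 degrees (standing in for 1992-93) and the distribution picks up a clear positive skew, something a Gaussian never has.

```python
import numpy as np

# Invented differences: a long near-zero stretch plus a shifted block
# mimicking the two anomalous years.
rng = np.random.default_rng(1)
diff = np.concatenate([rng.normal(0, 1, 5000), rng.normal(2, 1, 700)])

counts, edges = np.histogram(diff, bins=40)

# A Gaussian has zero skewness; the shifted block makes it positive.
z = (diff - diff.mean()) / diff.std()
skewness = (z ** 3).mean()
```

A nonzero sample skewness like this is one quick way to see that the differences are not draws from a single normal distribution.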

It is interesting that we US citizens love to think that we are the most advanced society on earth, yet we can't measure temperature better than this.


  1. Interesting, as usual. When I took the data sets and plotted the histogram of the difference between the two stations I saw a narrow distribution (albeit with "heavy tails"), but the median difference was 0 degrees (the mean was a 0.06 degree difference).

    And then when I did the histogram of the subset in the latter part I noted it had a median of 2 degrees. Regardless of it being non-normal, I can still measure the median.
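On synthetic numbers the same median comparison looks like this (all values invented for illustration; the point is only that the median needs no normality assumption):

```python
import numpy as np

# A long record of near-zero differences plus a 700-day block shifted
# by 2 deg, mimicking the anomalous stretch.
rng = np.random.default_rng(2)
normal_part = rng.normal(0, 1, 20000)
anomaly = rng.normal(2, 1, 700)
full = np.concatenate([normal_part, anomaly])

full_median = np.median(full)        # stays near 0
anomaly_median = np.median(anomaly)  # near 2
```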

    The 0 degrees difference between the two stations doesn't look too bad. The 2 degree difference over the course of a couple years is bothersome, but is it really all that bad?

    Measurements always have errors. Always have noise.

    One other thing, from what I've read, I don't think anyone is basing global climate change hypotheses on this data. Climatologists make models that don't use this data. Once the model is made they "hindcast" with continental-scale gridded averages of the data. That way you kind of even out the noise and the outliers.

    The present example is a good one. If you were to take just Seymour and then try to draw significant results from just the raw data you might not be able to see a signal. But if you average over a larger scale and you measure the _trends_ as opposed to just the raw data, then real trends may become visible.
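A toy version of that averaging, with invented numbers: give many stations the same small trend buried under independent station noise, and the trend that is invisible in one station's fit comes through cleanly in the regional average.

```python
import numpy as np

# 50 hypothetical stations sharing a 0.01 deg/yr trend under 2-deg noise.
rng = np.random.default_rng(3)
years = np.arange(100.0)
stations = 0.01 * years + rng.normal(0, 2, (50, years.size))

single_slope = np.polyfit(years, stations[0], 1)[0]              # noisy
regional_slope = np.polyfit(years, stations.mean(axis=0), 1)[0]  # close to 0.01
```

Averaging 50 independent stations cuts the noise by about a factor of seven, which is why the regional fit recovers the trend so much better than any single record.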

    Some of these stations have been around since 1901 or earlier. We've been using meteorological data for a very long time. Long before anyone even thought about global climate change. This data, like all data, has problems. The more data you get the better the image is.

  2. Hi Hagiograph, I am enjoying your comments; you think about the data.

    You ask "The 0 degrees difference between the two stations doesn't look too bad. The 2 degree difference over the course of a couple years is bothersome, but is it really all that bad?"

    If you run an experiment and, for whatever reason, you have to move the raw answers by 2 of anything, that 2 becomes the error bar for your experiment. Measuring global warming is basically a grand experiment in which some of the data has to be moved not by 2 but by 10 degrees, but even if we ignore those other cities, we are still left with a 2 degree error in the raw data set.

    Now, what does that mean? It means that we have a case where the global temperature can't be measured to the precision they say it is being measured to. Let's look at an article from the American Meteorological Society journal.

    "Furthermore, Gallo (2005) found microclimate-related differences exceeding 0.5 [deg] C in pairs of stations, differences that could not be explained by either latitude, elevation, instrumentation, observing practices, or quality of the siting." Thomas C. Peterson, "Examination of Potential Biases in Air Temperature Caused by Poor Station Locations," American Meteorological Society, Aug 2006, p. 1074-1075

    Note that they know about this, and that makes the error bar on the average temperature exceed 0.5 deg C. This isn't a Gaussian error in the cases I have been showing; it is systematic bias. When you 'correct' a systematic bias, you have to make a judgment that you are correcting it correctly, meaning that you know the cause of the error. As noted above, they don't know the cause, and thus can't know the magnitude of the error.

    All this boils down to a case where we have the IPCC saying that the temperature has risen over the past 100 years by 0.74 deg C. (see p. 30)

    If so, and if we are moving the temperature by as much as 2-3 degrees, we have the statistically stupid situation where one must conclude that the world has warmed by 0.74 +/- 2 degrees, meaning it might have cooled by 1.26 degrees or warmed by 2.74 degrees. In other words, we can't really know if it has warmed or cooled.
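The arithmetic behind that claim is just interval addition; this sketch treats the 2-degree adjustment as the error bar, which is my framing of the argument, not an IPCC calculation.

```python
# IPCC central estimate and the adjustment magnitude used as an error bar
trend = 0.74   # deg C warming over ~100 years
error = 2.0    # deg C

low, high = trend - error, trend + error
# The interval runs from -1.26 to 2.74 and straddles zero, which is the
# point: the sign of the change is not resolved at this error level.
```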

    Now you might say that the bias is towards warming, and that is true. But in the 1990s, many rural stations were dropped out of the system, and they were stations which showed cooling trends. If you remove the stations showing cooling and then average the rest, you will see an acceleration of the warming. That is what is happening statistically.