More stupid craziness

I fixed the problems with the zero-crossing angles from the previous diary. It took an entire day, not just to fix the actual problems, but also to double-check all the other data to ensure that it was not also compromised. I’m pretty sure that I’ve corrected all those problems. So I went back to work on the correlation issue described in this diary. To refresh your memory, I was looking for correlations in the numbers of meteors seen by the four Aria cameras. My reasoning was that if all four cameras report a large number of meteors at the same time, there must be more to it than mere randomness. I devised a rough-and-ready figure of merit for the correlation, and compared my results with results produced by randomly-generated data sets of Leonids that matched the overall characteristics of the actual data set. To my shock, I discovered that the actual data showed __less__ variation than the randomly-generated data – that is, the Leonid counts for each camera during different times are __less__ correlated than random data. This implies that the different cameras were looking at very different phenomena, which makes no sense whatever. The only reasonable conclusion is that I screwed up somewhere.

This time I decided to carry out a proper calculation of the correlation coefficient between each pair of cameras. I wasn’t confident that this would produce a final answer, but I hoped that it would shed some light on the problem. But the results I got only support the hypothesis that I don’t know what the hell I’m doing.

To recap, here’s the basic idea of the analysis: I select a period of time during which all four cameras are reporting: 1:40 AM to 2:56 AM. This period encompasses the peak of the shower and a large portion of all the meteors recorded. There were two small glitches in this data set: camera AR50R went offline for one minute at 2:09 AM, and camera AR50F went offline for one minute at 2:36 AM. These interruptions were not large enough, I believe, to affect my results significantly.

First, I broke that 86-minute period up into 5,160 bins, each one second long. Then I went through those bins, counting how many Leonids each of the four cameras reported for each bin. For each pair of cameras, I calculated the correlation coefficient between their bin counts and saved that value. Then I repeated the procedure, but this time I broke the 86-minute period into 2,580 bins, each 2 seconds long. Then I used bins 3 seconds long, bins 4 seconds long, and so forth right up to bins 600 seconds long -- that’s ten minutes. For these longest bins, there were only 8 bins for each camera. Here is the graph of the correlation coefficient for the pairing AL50R and AR50R, the cameras facing rear on both sides of the aircraft:
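The procedure can be sketched in a few lines. This is an illustrative reconstruction, not the actual program (which was written in Java): the function names are my own inventions, and timestamps are assumed to be seconds measured from the start of the 86-minute window.

```python
import math

def bin_counts(timestamps, bin_seconds, total_seconds=5160):
    """Count how many events fall into each consecutive bin."""
    n_bins = total_seconds // bin_seconds
    counts = [0] * n_bins
    for t in timestamps:
        i = int(t // bin_seconds)
        if i < n_bins:              # events past the last full bin are dropped
            counts[i] += 1
    return counts

def pearson(xs, ys):
    """Ordinary Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_vs_binsize(times_a, times_b, max_bin=600):
    """One correlation coefficient per bin size, from 1 s up to 600 s."""
    return [pearson(bin_counts(times_a, b), bin_counts(times_b, b))
            for b in range(1, max_bin + 1)]
```

Calling `correlation_vs_binsize` on the timestamp lists for two cameras would produce one point per bin size, i.e. one curve like Figure 1. Bin sizes that don’t divide 5,160 evenly drop a partial bin at the end, which is why 600-second bins yield only 8 bins.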

Figure 1: correlation coefficient between AR50R and AL50R, versus bin size

If the Leonids are randomly distributed – each camera independently sampling the same underlying activity curve – then the bin counts for every camera should simply trace that shared curve, and we’d expect a correlation coefficient close to 1.00. As you can see, we get something very close to 1.00 throughout most of the range of bin sizes. But what about the left side of the graph, with small bin sizes? That shows very low correlation, which might suggest nonrandomness at short time scales. Is this evidence of nonrandomness?

No, because at these small bin sizes, small-number effects come into play. Consider the situation with bin sizes of 1 second. The rate of Leonids per camera, averaged over the 86-minute period, was roughly 0.4 Leonids per second. That means that, for any given bin, there was roughly a 40% chance that it would have 1 Leonid and a 60% chance that it would have no Leonids. (The real situation is a bit more complex, but I won’t go into it. Just say “Poisson” and forget it.) The problem here is that 1 correlates with 0 very badly: in relative terms, the difference between 1 Leonid per bin and 0 Leonids per bin is huge. Thus, even if the situation were truly random, we’d still get a low correlation coefficient between cameras; a non-random situation would push it lower still. So I repeated the whole process with randomly-generated data and got these results:
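Here’s a way to see both ends of Figure 1 at once. Treat the shower’s activity curve as a signal shared by the cameras: the covariance between two cameras’ bin counts comes entirely from that shared curve, while each camera’s own variance also includes Poisson counting noise equal to the mean count per bin. At 1-second bins the noise term (about 0.4) swamps the signal, so the correlation is low even for perfectly random data; at 10-minute bins the signal dominates and the correlation approaches 1.00. A toy simulation illustrates this; the bell-shaped rate curve here is invented for the illustration and is not the real Leonid profile:

```python
import math
import random

random.seed(1)
TOTAL = 5160                          # seconds in the observation window

def rate(t):
    """Invented bell-shaped activity curve, roughly 0.4 Leonids/s on average."""
    return 0.8 * math.exp(-((t - TOTAL / 2) / 1500) ** 2)

def poisson(lam):
    """Knuth's method for Poisson samples; fine for the small rates used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def rebin(counts, size):
    """Merge 1-second counts into bins of `size` seconds (partial tail dropped)."""
    return [sum(counts[i:i + size]) for i in range(0, len(counts) - size + 1, size)]

def pearson(xs, ys):
    """Pearson correlation coefficient (repeated so this sketch runs alone)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# two cameras independently sampling the same activity curve
cam_a = [poisson(rate(t)) for t in range(TOTAL)]
cam_b = [poisson(rate(t)) for t in range(TOTAL)]

r_1s = pearson(cam_a, cam_b)                            # 1-second bins
r_600s = pearson(rebin(cam_a, 600), rebin(cam_b, 600))  # 10-minute bins
```

Running this, the 1-second correlation should come out low while the 10-minute correlation comes out close to 1, mirroring the shape of Figure 1 even though the simulated cameras are completely independent.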

Figure 2: mean of 1000 random data sets (blue) with actual data (green)

The blue circles represent the mean of the correlation coefficient for 1000 random trials (modern computers are really fast!). What’s astounding is the size of the error bars. I kept getting mathematical errors in the analysis of the standard deviations of the correlation coefficients in the random trials. It turned out that the standard deviations were ridiculously small: so small that they ran afoul of the round-off error of the double-precision floating-point numbers used by Java. These numbers are stored in 8 bytes, giving 52 bits of binary precision in the mantissa. Even that much precision couldn’t cope with values this tiny. When I added a kluge to work around the problem, I discovered that most of the standard deviations were on the order of 10**-7, or one part in ten million. Let’s give it an order of magnitude leeway and assume that the average circle here has an error of 10**-6. The error bars on the blue circles in this graph would then be one ten-thousandth of a pixel high.
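The diary doesn’t say what the kluge was, but a classic source of exactly this failure is computing variance with the shortcut formula mean(x**2) - mean(x)**2: when the values cluster near 1.0 with a spread of 10**-7 or less, the two terms agree to about 14 digits and the subtraction cancels nearly everything, sometimes leaving a zero or negative variance whose square root then blows up. Here is a small demonstration with synthetic data (not the actual correlation coefficients), using Welford’s one-pass algorithm as the stable alternative:

```python
def naive_var(xs):
    """Textbook shortcut E[x^2] - (E[x])^2: cancels catastrophically
    when the spread is tiny compared to the values themselves."""
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2

def welford_var(xs):
    """Welford's one-pass algorithm: numerically stable for tiny spreads."""
    mean, m2 = 0.0, 0.0
    for k, x in enumerate(xs, 1):
        d = x - mean
        mean += d / k
        m2 += d * (x - mean)
    return m2 / len(xs)

# synthetic stand-in for correlation coefficients clustered near 1.0
# with a spread of 1e-8; the true variance is exactly 2.5e-17
data = [1.0 + (i % 2) * 1e-8 for i in range(1000)]
# naive_var(data) can only land on a multiple of the double-precision
# spacing near 1.0 (about 2.2e-16), so it can come out as 0.0 or even
# negative; taking its square root would then raise an error, which is
# the kind of mathematical error described above
```

Whether this matches the actual kluge is anyone’s guess, but it reproduces the symptoms: a variance of 2.5 × 10**-17 is far below what the shortcut formula can resolve in double precision, while Welford’s update recovers it almost exactly.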

This leads to the conclusion that the cameras unquestionably showed correlation. This looks like more proof that Leonids are non-randomly distributed. But I’m suspicious of this conclusion. In the first place, the data is __too__ good: the actual data points (green circles) are something like a million standard deviations away from the random value. That’s far too big to be believable. In the second place, the effect seems to be uniform across all bin sizes. If Leonids were non-randomly distributed, we’d expect to see some sort of pattern in their appearances. No such pattern has shown up in the data.