Informally, the data generating process is the truth about the deck of cards and how they are dealt. If we are rolling dice, it is the manner in which we roll them and whether they are weighted. For complicated events in the wild, we don’t even know what the deck of cards is, how many cards it holds, or how they are shuffled.
More formally, the data generating process is the unknown model that actually generates the data we observe. Strictly speaking, it is the entire universe, and whatever lies beyond it if anything does. In practice, however, we invent models of the data generating process and use the invented model to calculate probabilities. We can then tune the model against observed data.
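As a toy illustration of tuning an invented model against observed data, here is a minimal sketch in Python. The function name `fit_die`, the smoothing prior, and the list of rolls are all made up for this example; the point is only that the model's parameters move toward what was observed.

```python
from collections import Counter

def fit_die(rolls, sides=6, prior=1.0):
    """Estimate face probabilities from observed rolls.

    Adds `prior` pseudo-counts to every face so that a face we have not
    seen yet still gets a small, non-zero probability.
    """
    counts = Counter(rolls)
    total = len(rolls) + prior * sides
    return {face: (counts.get(face, 0) + prior) / total
            for face in range(1, sides + 1)}

# Hypothetical observations, invented for illustration only.
observed = [6, 6, 3, 6, 1, 6, 2, 6, 5, 6]
model = fit_die(observed)

for face, p in model.items():
    print(face, round(p, 4))
```

With these invented rolls the fitted model puts far more weight on six than a fair die would, which is what we would suspect of a weighted die. With real events we rarely have this many repeated observations, so the invented prior carries much more of the answer.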
For MH17 and MH370 we don’t know how to calculate the probabilities because we don’t know what deck of cards to use. What card deck has these events in it? So we make one up. We are guided, however, by experience with probability and stochastic process modeling.
For MH17, shot down over Ukraine, we can start with some simple models.
Given the total number of airplanes passing over that area per day, what was the chance that:
- A civilian airliner would be shot down?
- Any plane would be shot down?
- 2 or more planes?
- A Boeing 777?
- A Malaysian airliner?
- A plane that matches MH370, taking that disappearance as already having happened?
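One simple model for the list above is a Poisson count of shootdowns per day. The sketch below is only that, a sketch: every number in it (flights per day, per-flight risk, the share of civilian flights, the share of 777s) is a hypothetical placeholder, and it assumes each flight faces the same small, independent risk, which is itself a strong modeling choice.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def prob_at_least(k, lam):
    """Probability of k or more events when the expected count is lam."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))

# Hypothetical parameters, invented for illustration only.
flights_per_day = 300            # planes passing over the area per day
p_shootdown_per_flight = 1e-6    # assumed per-flight risk
lam = flights_per_day * p_shootdown_per_flight

p_any = prob_at_least(1, lam)          # any plane shot down that day
p_two_or_more = prob_at_least(2, lam)  # 2 or more planes that day

# Narrower events: keep only a share of the flights (also hypothetical).
share_civilian = 0.9   # assumed share of flights that are civilian airliners
share_777 = 0.05       # assumed share of civilian flights that are Boeing 777s
p_civilian = prob_at_least(1, lam * share_civilian)
p_777 = prob_at_least(1, lam * share_civilian * share_777)

print(p_any, p_two_or_more, p_civilian, p_777)
```

Each extra condition on the event shrinks the probability, so the number we report depends as much on which line of the list we pick as on the model itself.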
For MH370, we have to think about what the event is.
- Is the event a plane disappearing?
- Is it a plane disappearing so completely that it can’t be found?
- Is it a passenger jet that matches an airline flying over Ukraine?
- Is it a passenger jet departing from an airport that is the destination of a flight from Europe over Ukraine?
How we define the event changes the probability, given whatever model of probability we have made up. Even for Ukraine, we are defining the event by the list above.
So we have to invent what the event is. Then we have to invent a probability model that will generate events like that. Then we need parameters. Then we calculate the event probability.
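The four steps can be walked through with dice, where everything is small enough to compute exactly. This is a minimal sketch: the event (two sixes in two rolls), the helper `event_probability`, and the weights of the loaded die are all invented for the example.

```python
from itertools import product

# Step 1: invent the event. Here: both of two rolls come up six.
def double_six(rolls):
    return all(face == 6 for face in rolls)

# Step 2: invent the model. Here: independent rolls of one die whose
# face probabilities are given by `face_probs`.
def event_probability(face_probs, n_rolls, is_event):
    """Sum the probability of every outcome sequence that counts as the event."""
    total = 0.0
    for outcome in product(face_probs, repeat=n_rolls):
        p = 1.0
        for face in outcome:
            p *= face_probs[face]
        if is_event(outcome):
            total += p
    return total

# Step 3: choose parameters (the weights below are hypothetical).
fair = {face: 1 / 6 for face in range(1, 7)}
weighted = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}

# Step 4: calculate the event probability under each choice.
p_fair = event_probability(fair, 2, double_six)
p_weighted = event_probability(weighted, 2, double_six)

print(p_fair, p_weighted)
```

The same event gets a very different probability under the two sets of parameters. Since every step is our own invention, the final number is only as good as those inventions.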
For MH17 and MH370, we have to decide what the joint event is and what the joint model is, and then calculate the joint probability. We can break that up into two steps.
- Calculate the probability of one of them independently of the other.
- Calculate the probability of the other one conditional on the first.
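These two steps are the chain rule of probability. Writing A for the first event and B for the second (labels assumed here just for illustration):

```latex
P(A \cap B) = P(A)\, P(B \mid A)
```

If the two events were independent, P(B | A) would reduce to P(B). Any link between them shows up as a difference between those two numbers.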
We also have to decide if the event should include more of what is happening in the world. Does it include that MH370 disappeared while Russia was already invading Crimea? How long was it planned for? If it took a year to plan this, then we have to think about events that happened during that year. Who would be planning such a thing through 2013 and the start of 2014? Who was on the move?
Who was linked to high-tech terrorist acts that turned out to be hard to trace? Who has been painting pictures on our televisions from 2013 to now, in one event after another? Who is doing one Moscow Snowden airport show after another? Who tricked the US into forcing down a diplomatic plane, thinking Snowden was on it? Who has airport and airplane events going from 2013 to now?
Who would have satellite experience to know what Inmarsat could do with the Doppler shift and a wobbly satellite? Who had to do position and velocity calculations in the early days of space flight with tools of that sophistication?
If we frame it this way, we end up with Russia. But that is only one way to frame it, i.e. to define the event space. We then need a model that generates a sequence of events like that, including the one we have actually observed.
Who covered up the MH17 crash site? Who had a conspiracy theory ready that linked MH370 and MH17 and framed the CIA and Diego Garcia? What country or terrorist group acts like that?
If MH370 was the work of a terrorist group acting independently, which group disappears a plane into the deep ocean and doesn’t take any credit for it? If it was a terrorist group like Hamas, we have no idea who did it.
This is a draft and preliminary. The above are hypotheses and speculation. Comments and corrections are welcome. Please restate as questions. All other disclaimers apply.