Journey analysis, Part 2: Whens and How Manys

In Part 1, I introduced Journey to the End of the Night and did some basic analysis of who signs up. Here, I’m going to look at the pattern of signups over time. That pattern is interesting because it can give us some insight into *why* people signed up, and it gives us a way to estimate what our final registration numbers will be. Here’s the raw plot of cumulative registrations over time:

A few things to notice here. First, the jump in registrations on 10/10: that’s right after we sent out the announcement for Journey to the journey-sf mailing list. Second, registrations really take off on 11/3, six days before the event itself; presumably many people wait until the week of the event to make their plans. (This might equally be explained by being listed on funcheapsf.com. Unfortunately, I didn’t instrument for referral source. Bad analyst. Next time. Definitely next time.)
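For anyone who wants to build the same kind of curve, here is a minimal Python sketch of turning raw signup timestamps into a cumulative count by day. The timestamps, the timestamp format, and the function name `cumulative_by_day` are all made up for illustration; this is not the real Journey data or the code behind the plot.

```python
from collections import Counter
from datetime import datetime
from itertools import accumulate

# Hypothetical sample timestamps, NOT the real registration data.
signups = [
    "2013-10-09 18:05", "2013-10-10 09:12", "2013-10-10 09:40",
    "2013-10-10 13:27", "2013-10-12 21:03", "2013-11-03 08:15",
]

def cumulative_by_day(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Return [(date, running_total)] sorted by date."""
    # Count signups on each calendar day.
    per_day = Counter(datetime.strptime(t, fmt).date() for t in timestamps)
    days = sorted(per_day)
    # Running sum of the daily counts gives the cumulative curve.
    totals = accumulate(per_day[d] for d in days)
    return list(zip(days, totals))

curve = cumulative_by_day(signups)
for day, total in curve:
    print(day, total)
```

Days with no signups are simply absent from the output, which is fine for a step plot but worth filling in if you want per-day rates.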

We can also see in the final signups a daily pattern — fairly consistent registrations throughout the day, flattening out during the night, when people are asleep. I’ll be honest, I’m not sure what action to take based on that (send email blasts before people go to bed?), but it’s interesting to see.
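One way to check that daily pattern is to bucket signups by hour of day. This is a small sketch with invented timestamps and a hypothetical helper name (`signups_by_hour`); the split between "night" and "day" at 6 am is also just an assumption for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps, NOT the real registration data.
stamps = [
    "2013-11-05 09:10", "2013-11-05 14:45", "2013-11-05 23:30",
    "2013-11-06 09:55", "2013-11-06 03:20", "2013-11-06 14:05",
]

def signups_by_hour(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Return a 24-element list: number of signups in each hour of the day."""
    counts = Counter(datetime.strptime(t, fmt).hour for t in timestamps)
    return [counts.get(h, 0) for h in range(24)]

by_hour = signups_by_hour(stamps)
night = sum(by_hour[0:6])    # midnight up to 6 am (assumed cutoff)
day = sum(by_hour[6:24])     # 6 am up to midnight
print("night:", night, "day:", day)
```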

Predicting Total Registrations

Now, the real question: can we project our ultimate number of registrations, given the registrations we’ve seen so far?

Let’s assume there’s a point, somewhere close to the event, that starts an increase in the rate of registrations. I’ll call this the “what am I doing this weekend?” point. For this Journey, it looks like this happens about six days before the event.

Fitting an adaptive spline model (using the earth package) recovers the splits we noticed before: an early section, a broad middle, and an increase towards the end. It also separates out the period two weeks before the event, where we do see an increase in registration rate.
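As a much cruder stand-in for the spline fit, you can fix the breakpoints by hand and compute the average registration rate in each section. To be clear, this is not what earth does (it chooses its own breakpoints); the function name `segment_rates`, the breakpoints, and the toy numbers below are all assumptions made up for this sketch.

```python
def segment_rates(days, totals, breakpoints):
    """Average registrations per day between consecutive breakpoints.

    days:        sorted day numbers (0 = first day of signups)
    totals:      cumulative registrations at the start of each day
    breakpoints: sorted day numbers, each of which must appear in `days`
    """
    at = dict(zip(days, totals))
    rates = []
    for start, end in zip(breakpoints, breakpoints[1:]):
        rates.append((at[end] - at[start]) / (end - start))
    return rates

# Toy cumulative curve with an invented shape: event on day 30,
# 2/day early on, 4/day from two weeks out, 20/day in the last six days.
days = list(range(31))
totals = []
count = 0
for d in days:
    totals.append(count)
    count += 2 if d < 16 else 4 if d < 24 else 20

rates = segment_rates(days, totals, breakpoints=[0, 16, 24, 30])
print(rates)
```

The ratios between neighboring rates are the multiplicative changes a projection would have to assume.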

Really, the most important thing this tells us (unfortunately) is that we get a huge proportion of our signups in the week before the event. And since we only have the one event, we don’t really know if that signup rate is predictable from the earlier rate. It gives us a rough theory to be able to predict an increased rate near the game, but I wouldn’t have much confidence in that ratio holding for future games. Clearly, the answer is to increase the sample size: let’s gather data from more games and see what general predictive behavior we can find. (So yes, this means that if you’re running a game, I am interested in running signups for you and getting your data.)

One hopeful note is that slight increase in registration rate two weeks before the event. So, *if* we were going to use this as a model for predicting registrations, we’d take the rates at each of those sections, assume (big assumption!) either an additive or multiplicative change two weeks and one week before the event, and plan out the registrations we’d expect to see. We’d know a lot more in the week or two before the event, but we’ve got to order ribbons before then. One nice thing is that maps can be printed very close to game day, so one plan that might work is to order cheap ribbon with some extra, then scale the map printing to a closer estimate.
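To make the multiplicative version of that concrete, here is a sketch of the arithmetic. The function name `project_total` and both multipliers are hypothetical placeholders, not values fitted from the Journey data, and `base_rate` is assumed to be the rate from the broad middle period no matter how close to the event we are.

```python
def project_total(current_total, base_rate, days_left,
                  mult_two_weeks=2.0, mult_final_week=10.0):
    """Project final registrations from today's count.

    base_rate:       registrations/day in the broad middle period
    mult_two_weeks:  assumed rate multiplier from 14 to 7 days out
    mult_final_week: assumed rate multiplier in the last 7 days
    (Both multipliers are made-up defaults for illustration.)
    """
    # Split the remaining days into the three sections.
    final_week_days = min(days_left, 7)
    two_week_days = min(max(days_left - 7, 0), 7)
    early_days = max(days_left - 14, 0)
    return (current_total
            + early_days * base_rate
            + two_week_days * base_rate * mult_two_weeks
            + final_week_days * base_rate * mult_final_week)

# 40 signups so far, 2/day, 20 days to go:
# 40 + 6*2 + 7*(2*2) + 7*(2*10)
estimate = project_total(40, 2.0, 20)
print(estimate)
```

With these placeholder multipliers, most of the projected total comes from the final week, which is exactly why the estimate is so sensitive to that one assumption.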

We’d love to be able to do some Monte Carlo simulation of possible worlds, to get some idea of the variance in actual outcome — so that instead of just predicting a certain number of registrations, we could also give a prediction interval for the high and low possibilities. But that’s a pretty hard thing to figure out how to sample from (it’s not clear how much to oversample or undersample the registrations, and that assumption has a big impact). But if we were doing a running estimate of ultimate registrations using this segmented model, the result might look something like this, giving us our current estimate (in green) for the number of registrations as we get closer to the event:

That’s not too bad: the line is relatively flat, so it looks like we may be able to provide some reasonable predictions of our eventual registrations (though again with the caveat that we’re making a pretty strong and brittle assumption about how much registrations increase two weeks before and one week before the event).
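For the Monte Carlo idea, one very simplified sketch is to treat each remaining day’s signups as a Poisson draw around that day’s assumed rate and look at the spread of simulated totals. This is only an assumption for illustration: it captures day-to-day noise but not the uncertainty in the rate multipliers themselves, so the interval it gives is almost certainly too narrow. The rates, starting count, and function names are all invented.

```python
import math
import random

def poisson(lam, rng):
    """Draw one Poisson(lam) sample (Knuth's method; fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_totals(current_total, daily_rates, runs=2000, seed=0):
    """Simulate final totals; returns them sorted ascending."""
    rng = random.Random(seed)
    return sorted(
        current_total + sum(poisson(r, rng) for r in daily_rates)
        for _ in range(runs)
    )

def interval(sorted_totals, coverage=0.9):
    """Central interval read off the sorted simulated totals."""
    tail = (1 - coverage) / 2
    n = len(sorted_totals)
    lo = sorted_totals[int(tail * n)]
    hi = sorted_totals[min(n - 1, int((1 - tail) * n))]
    return lo, hi

# Hypothetical remaining days: 6 at 2/day, 7 at 4/day, 7 at 20/day.
daily_rates = [2.0] * 6 + [4.0] * 7 + [20.0] * 7
sim_totals = simulate_totals(40, daily_rates)
low, high = interval(sim_totals)
print("90% interval (day-to-day noise only):", low, high)
```

A more honest version would also sample the multipliers, which is the hard part described above.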

The End

You can download the cleaned registrations data and code for this post here. Join us next time for Part 3, where we ask, “Just how flaky is the Bay Area, really?”

This entry was posted on Monday, May 12th, 2014 at 10:41 pm and is filed under Analytics.


Thomas Lotze