Journey analysis, Part 1: Who signs up?

Introduction

I’ve been involved in running an event called Journey to the End of the Night (or, in DC, “SurviveDC”) for the last several years. It’s a street game in which players run through a city seeking checkpoints while trying to avoid chasers. It’s a great game. It’s also completely free. I want to use analytics to better understand how people play the game, who plays it, and how to run it better. I really like tracking game behavior and using it to understand and improve the experience; it’s probably time to think about the next iteration of Journey log for this purpose.

In this series, I’ll be looking at the registration data from the 2013 San Francisco Journey to the End of the Night. Because this game involves running your real-life body through the streets of a real-life city, with real-life dangers, we make everyone sign a waiver to indicate they understand those dangers. This year, we made everyone register and print those waivers beforehand.

As it happens, we can also use this registration data to understand some things about who plays Journey. We’ll explore what kinds of things you might want to look for in registration data (not just for Journey, and not just for free street game experiences), the methods you can use to look for them, and how you might want to improve things based on what you find. For those who want to dig in a little deeper, I’ll go a bit into the statistical analysis, with some digressions into interesting data analysis and visualization along the way. I’ll also post the code I used, which you could use to run the same analysis on other datasets.

Let’s get started.

Age distribution

Demographics, demographics, demographics…the median age for Journey registrants is 25, with 50% of registrants between 22 and 29 and 95% between 17 and 40. And if you played Journey and are younger than 15 or older than 44, you really are the one percent. The histogram can be seen here:

[Figure: histogram of registrant ages]
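For anyone who wants to reproduce these summaries on their own registration data, here is a minimal Python sketch (the original analysis was done in R, and this is not that code). It assumes the ages are available as a plain list of integers; the helper names `percentile`, `age_summary`, and `share_outside` are hypothetical. The "middle 50%" is the 25th-to-75th percentile range and the "middle 95%" is the 2.5th-to-97.5th percentile range.

```python
import math
import statistics
from typing import Dict, List, Sequence


def percentile(sorted_vals: Sequence[float], p: float) -> float:
    """Linear-interpolation percentile of pre-sorted data (R's default, type 7)."""
    h = (len(sorted_vals) - 1) * p
    lo, hi = math.floor(h), math.ceil(h)
    # Interpolate between the two order statistics that bracket position h.
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def age_summary(ages: List[int]) -> Dict[str, object]:
    """Median plus the ranges covering the middle 50% and middle 95% of ages."""
    s = sorted(ages)
    return {
        "median": statistics.median(s),
        "middle_50": (percentile(s, 0.25), percentile(s, 0.75)),
        "middle_95": (percentile(s, 0.025), percentile(s, 0.975)),
    }


def share_outside(ages: List[int], low: int, high: int) -> float:
    """Fraction of ages strictly below `low` or strictly above `high`."""
    return sum(1 for a in ages if a < low or a > high) / len(ages)
```

The interpolation rule matches R's default `quantile()` (type 7), so on the same data the numbers should line up with R's output. Something like `share_outside(ages, 15, 44)` is the kind of check behind the "one percent" remark.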

I don’t know much about marketing, but I do know that different channels work better for different age ranges. Who comes out for Journey is certainly shaped by where the game has historically been announced, but I suspect the effect of people self-selecting is much stronger. So if you want to publicize your Journey, channels aimed at the 22-29 age bracket are good places to start. Similarly, if we ever made Journey into a commercial event, those are the people we would target.

Gender distribution

Now, we didn’t ask for gender explicitly on the signup form (because, honestly, we don’t care what you identify as). But I was still curious how it breaks down, so I turned to the very interesting ‘gender’ package, which attempts to map names to gender likelihoods based on US birth records. I used it to assign each registrant a likely birth gender, based on name and birth year. By that estimate, Journey registration is about 40% female, which is reputedly not far from the overall San Francisco ratio for this age range (though I can’t find a source I consider reliable; the best I can do is Trulia or Half Sigma). A better comparison might be female participation in the Olympics, which Journey trails slightly: we’re at 40%, they’re at 44%.

I can hardly believe I’m making this into a plot, but:

[Figure: share of registrants by inferred gender]
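The ‘gender’ package's own interface isn't reproduced here. Instead, this is a hypothetical Python sketch of the general idea. For each registrant, look up how many babies with that name were recorded as female versus male in the years around their birth year, and call the name female if at least half were. The lookup table, the two-year window, and the 0.5 threshold are all assumptions made for illustration, and the counts in any example table would be invented.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical lookup: (lowercase name, birth year) -> (female births, male births).
# Real counts would come from US birth-name records; none are included here.
Counts = Dict[Tuple[str, int], Tuple[int, int]]


def prob_female(name: str, birth_year: int, table: Counts,
                window: int = 2) -> Optional[float]:
    """Share of births with this name recorded female, pooled over birth_year +/- window."""
    female = male = 0
    for year in range(birth_year - window, birth_year + window + 1):
        f, m = table.get((name.lower(), year), (0, 0))
        female += f
        male += m
    total = female + male
    return female / total if total else None  # None: name not in the table


def female_share(registrants: Iterable[Tuple[str, int]],
                 table: Counts) -> Optional[float]:
    """Fraction of classifiable registrants whose name is at least as likely female as male."""
    labels = []
    for name, year in registrants:
        p = prob_female(name, year, table)
        if p is not None:  # skip names we have no data for
            labels.append(p >= 0.5)
    return sum(labels) / len(labels) if labels else None
```

Names missing from the table are skipped rather than guessed, so the share covers only registrants who could be classified. A softer alternative is to average the probabilities instead of thresholding them.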

We can also look at a breakdown by age and gender:

[Figure: bar plot of registrant ages by inferred gender]

It looks like male names skew slightly older. We’re not going to worry too much about it, but if we wanted to check, we could run a two-sample Kolmogorov-Smirnov test of whether the male and female age distributions are the same.
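To show what that test involves, here is a small, self-contained Python version of the two-sample Kolmogorov-Smirnov test. In practice, R's `ks.test` or SciPy's `scipy.stats.ks_2samp` would do this for you. The statistic D is the largest vertical gap between the two empirical CDFs. The p-value uses the large-sample Kolmogorov approximation with a common small-sample correction, so treat it as approximate. The function name `ks_two_sample` is hypothetical, and both samples are assumed non-empty.

```python
import bisect
import math
from typing import Sequence, Tuple


def ks_two_sample(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sample Kolmogorov-Smirnov test; returns (D, approximate p-value)."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking every
    # pooled value is enough to find the maximum gap.
    for x in set(a) | set(b):
        fa = bisect.bisect_right(a, x) / n  # fraction of a that is <= x
        fb = bisect.bisect_right(b, x) / m  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    # Asymptotic Kolmogorov distribution, with a small-sample correction to lambda.
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return d, 1.0  # series converges slowly here; the true p-value is ~1
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2)
                for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)
```

Usage would look like `d, p = ks_two_sample(male_ages, female_ages)`, where the two lists are hypothetical age lists split by inferred gender. One caveat for this data is that ages are whole numbers, so there are many ties. The KS p-value is only approximate for discrete data, and it tends to be conservative.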

Email

So, gone are the days when you could figure out much about someone from their email domain. Of the people registering for Journey, almost 75% used a gmail.com address, another 9% used yahoo.com, and 4% used hotmail.com. The next highest contender is berkeley.edu, with only 1.72%, and it just gets smaller from there. This was going to be a table, but everyone likes graphs, right?

[Figure: bar chart of the most common email domains]
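Tallying domains is just a matter of splitting each address at the last `@` and counting. The following is a minimal Python sketch, assuming the addresses are available as a list of strings. The function name `domain_shares` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def domain_shares(emails: Iterable[str], top: int = 5) -> List[Tuple[str, float]]:
    """Return the `top` most common email domains and their share of all valid addresses."""
    domains = Counter(
        e.rsplit("@", 1)[1].strip().lower()  # text after the last '@', normalized
        for e in emails
        if "@" in e  # ignore malformed entries with no '@'
    )
    total = sum(domains.values())
    # most_common keeps first-seen order for domains with equal counts.
    return [(d, c / total) for d, c in domains.most_common(top)]
```

The same counting works for any categorical field on the registration form, such as the authentication provider.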

Takeaway? Make sure your Journey emails look good in Gmail; if they don’t, nothing else matters much. It’s worth checking them in Yahoo too, since that’s nearly 10% of registrants. This was actually something we considered for the email confirmations: because we include an image of the QR code we later scan to confirm registration, we had to make sure it would print from Gmail (for which I recommend the roadie gem).

Authentication

What authentication method are people using? OMG, it’s mostly Facebook. Given the choice of authenticating with Facebook, Google, Twitter, or another OpenAuth provider, 57% chose Facebook, 38% chose Google, and 5% chose Twitter. I was surprised by this one, especially because almost everyone had a gmail.com email address. But in terms of easy identity registration, Facebook and Google authentication both worked great. I feel like we need to keep OpenAuth for the open/hacker ethos…but I don’t think Twitter is worth it.

[Figure: bar chart of authentication methods]

The End

You can download the cleaned registration data and R code for part 1. Come back next time for part 2, where I look at our signup rate over time.
