Transforming Event Planning: Predicting Attendance
By Media CTO Team
Call the caterer, secure a keynote speaker, create an interactive event experience, reserve event space, invite the guests, send out a schedule, assemble an event team, organize mixers, get a discount from partner hotels – events, even in the early planning stages, require a lot of work. It’s all worth it once the attendees walk into the welcome session on the first day, right? But here’s the rub: how do you know that will happen?
As long as people show up and attendance numbers hit expectations, events are worth it. 68% of B2B marketers say that live events generate the most leads, but what if people don’t show up as expected? Meeting rooms feel sparse, discussion sessions aren’t as productive as planned, and the impact of all that effort falls short of expectations. If you don’t get your attendance predictions right, even the best event will struggle to shine.
For smaller event organisers or in-house event teams, it can feel impossible to keep up. But here is a secret: every event can have the star power of events from major organisers, so long as data models are used correctly. By predicting attendance, you can personalise breakout sessions and customise every attendee’s experience – that’s available to you, too.
We created an event data model that will give event experts a new level of confidence: The Digital Intelligence Generator. We refer to it as “The DiG.” Our tool uses the Google Cloud ecosystem and a sophisticated machine learning model to predict attendance on an individual level, giving you the chance to tailor your event to exactly the right audience.
The DiG will help you move away from placing bets on hopeful attendance numbers and, instead, harness the power of data science to get an accurate read on who will walk through those doors on Day 1. And the best part? Anyone can use it – in this article, we will share how we built the first model over two years ago. If you have the capability in-house, your data and insights team can get started in a similar way.
How the Model Behind The DiG Works
It may sound complex and technical at first, but understanding how The DiG works is less about the technology behind it and more about understanding how events work. But don't worry, we'll explain exactly how the model is built, what parameters it uses, and even how you can easily deploy it.
The DiG, in essence, is a live dashboard with an advanced attendance predictor built on a logistic regression model. Logistic regression was the preferred approach because it gives us a model output of either “True” or “False.” When predicting attendance, an organiser’s biggest fears are whether attendees will turn up and how they will interact with the event. Half a participant can’t show up, so we need a “yes” or “no” output – AKA “true” or “false.”
The best part of working on The DiG is that it was not built for technically advanced users; it was built for people who don’t know how to code but still rely on data-driven insights to do their jobs. The no-code market, which is expected to be worth $65 billion by 2027, is growing in popularity. No-code solutions are becoming widely used in many settings and are creating an accessible world of technology for even the most non-technical people.
Defining The DiG’s Prediction Variables
Behind every data model is a set of real-world variables. In this case, the variables are characteristics of event registrations. We can teach the model to assess each variable, assign a certain score to it, and output a “True” or “False” result regarding whether that person will actually show up. Let’s break down each variable that’s part of our model:
keyChannel
Key channels combine multiple characteristics, like UTM parameters, tracking codes, and more. At the simplest level, “keyChannel” could be an organic channel or email, but there are more complex channels, too.
"hostedAttendee," is one example. The people in this channel are hosted, meaning given some special treatment and are important. In our model, the fact that a Participant is hosted is more important to determining their attendance than how they registered for the event. Other "keyChannel" variables could be organic social, referral, youTube or direct as examples.
tierOne
Delegates in the Tier One category are critical attendees. By combining factors like company classification and seniority, we identify all attendees in this category, allowing you to monitor them more closely and weight their attendance results differently.
tierTwo
Very similar to Tier One, the Tier Two category identifies attendees with slightly less weight but who are still important and anticipated at the event.
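As a rough illustration (not The DiG’s exact logic), both tier flags can be derived from attendee attributes with simple CASE expressions. The companyClass and seniority columns and their values are assumptions:
-- Hypothetical tier flags; companyClass and seniority are assumed columns,
-- and the values used here are examples only.
SELECT
  ticketID,
  CASE WHEN companyClass = 'target account' AND seniority = 'C-level'
       THEN 1 ELSE 0 END AS tierOne,
  CASE WHEN companyClass = 'target account' AND seniority IN ('VP', 'Director')
       THEN 1 ELSE 0 END AS tierTwo
FROM `your_project.your_dataset.the_name_of_your_registration_table`;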
travelBarrier
Measuring the travel distance required by each participant can become a sophisticated endeavour. If you think about it, the received wisdom is that someone who has to show up in LA from Miami will be less likely to attend than someone driving up from San Diego. While the actual measurement is important, what is more significant is the ease of travel. A three-hour drive or a three-hour flight may not be that different.
When building your early model, we recommend setting this to “domestic,” “short,” or “long.” The classification of each category is up to you, depending on the range of your invite list. If your event is international, you may want to include how far an attendee’s country of origin is from the event country. In some cases, we also include how hard it is to get entry approval.
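Here is a minimal sketch of that classification, assuming you already have a rough door-to-door travel time (travelHours) and a visa flag (needsVisa) for each attendee; the 0/1/2 mapping mirrors the “flightCategory” values used later in this article, and the thresholds are examples only:
-- Hypothetical mapping to flightCategory (0 = domestic, 1 = short, 2 = long);
-- travelHours and needsVisa are assumed columns, thresholds are illustrative.
SELECT
  ticketID,
  CASE
    WHEN needsVisa OR travelHours > 6 THEN 2
    WHEN travelHours > 2 THEN 1
    ELSE 0
  END AS flightCategory
FROM `your_project.your_dataset.the_name_of_your_registration_table`;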
ticketType
This is helpful in indicating if a participant is coming with a group, if they received a special promotion, or if they have restricted access to the event. Separating VIP from general tickets or expo-only tickets happens in this variable.
paid
Whether or not a participant has paid in advance will have a major impact on whether or not they show up. No one wants to waste hundreds of dollars they already spent by not attending the event they paid for.
weeksOut
This is the number of weeks between the registration date and the event itself. How far out someone registers impacts the model’s calculations, and it is an effective way of normalising registration data across events. Week-level granularity has been enough in our experience.
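If your registration data carries a registration date, “weeksOut” can be calculated directly. This is a sketch under the assumption of a registrationDate column and a hard-coded event start date:
-- Hypothetical weeksOut calculation; registrationDate is an assumed column
-- and the event start date is hard-coded for illustration.
SELECT
  ticketID,
  CAST(FLOOR(DATE_DIFF(DATE '2025-10-01', registrationDate, DAY) / 7) AS INT64) AS weeksOut
FROM `your_project.your_dataset.the_name_of_your_registration_table`;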
Now that you understand all the variables at hand, you are ready to start training the model. Preparing the training data is easy; it can be a simple, flat table. Simply work out the potential responses for each variable column and build a data set from your historical registrations. The important column is the historical information showing whether the individual attended or not.
Please note a few important items about the training data set (a minimal schema sketch follows this list):
“keyChannel” is a text value
“tierOne” returns a 0 or 1
“tierTwo” returns a 0 or 1
“flightCategory” is synonymous with “travelBarrier” and returns 0, 1, or 2
“ticketType” is a text value
“paid” is Boolean, meaning “true” or “false”
“weeksOut” is a number
“Attend” is “true” or “false” as well
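Putting those notes together, the training table could be declared like this; the exact column types are our reading of the notes above, and the table name matches the placeholder used in the next section:
-- A possible schema for the training table, matching the notes above.
CREATE TABLE IF NOT EXISTS `your_project.your_dataset.the_name_of_your_training_data` (
  keyChannel STRING,
  tierOne INT64,
  tierTwo INT64,
  flightCategory INT64,
  ticketType STRING,
  paid BOOL,
  weeksOut INT64,
  Attend BOOL
);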
Building the Model
Once the training data is in place, putting together a basic model is simple. Here is how you would do it in BigQuery.
CREATE OR REPLACE MODEL `your_project.your_dataset.the_name_of_your_model`
OPTIONS(model_type='logistic_reg') AS
SELECT
keyChannel,
tierOne,
tierTwo,
flightCategory,
ticketType,
paid,
weeksOut,
Attend AS label
FROM
`your_project.your_dataset.the_name_of_your_training_data`;
Going through the query line by line makes it clear how it works.
The first line defines the model’s name and location
`your_project.your_dataset.the_name_of_your_model`
In OPTIONS, you tell BigQuery which type of model to use. In this case, it’s ‘logistic_reg’, because we want a yes/no answer.
OPTIONS(model_type='logistic_reg')
The lines under “SELECT” are the relevant data points from your training data set.
keyChannel,
tierOne,
tierTwo,
flightCategory,
ticketType,
paid,
weeksOut,
The final line tells the model that “Attend” should be the outcome or final label of the model. You want it to tell you whether or not a person will attend, right?
Attend AS label
In machine learning, “features” are the input variables or characteristics that are used to predict the outcome, and the “label” is the result that the model is trying to predict.
After this step, BigQuery ML, which powers The DiG, will comb through all the training data, learn how to assess each input and improve its prediction accuracy before you deploy it for an event.
You can do this for yourself if you are reasonably tech-savvy, and if you have done it right, you will have successfully created your own machine-learning model in BigQuery. Congratulations! That wasn’t so hard, was it?
You’ll know it was successful when the new model appears under your dataset in the BigQuery console.
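If you want to sanity-check the model before using it, BigQuery ML’s ML.EVALUATE returns standard classification metrics such as precision, recall, and accuracy. This step is optional and not part of The DiG’s core queries; ideally you would point it at a held-out slice of historical registrations rather than the full training set:
-- Optional sanity check of the trained model. The feature columns are aliased
-- the same way as during training, with Attend mapped to "label".
SELECT *
FROM ML.EVALUATE(
  MODEL `your_project.your_dataset.the_name_of_your_model`,
  (SELECT keyChannel, tierOne, tierTwo, flightCategory, ticketType, paid, weeksOut,
          Attend AS label
   FROM `your_project.your_dataset.the_name_of_your_training_data`)
);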
Putting the Model to Work
The hard part is over; now it’s time to deploy your model. Essentially, all you’ll need to do is feed it data sets with the variables above and run it. It will output attendance predictions for each data set. In practice, every event’s attendee list will be its own data set for the model to run through.
Every time you run the model with a new data set, you can save the results in a temporary prediction table that will have a unique ticketID with the prediction results. Look at the query below and let’s break it down one section at a time to solidify your understanding of how it works.
CREATE OR REPLACE TABLE `your_project.your_dataset.the_name_of_your_temp_prediction_table` AS
-- This part selects the information from your registration table
WITH prep AS (
SELECT
ticketID,
keyChannel,
flightCategory,
ticketType,
paid,
weeksOut,
tierOne,
tierTwo
FROM `your_project.your_dataset.the_name_of_your_registration_table`
),
-- This part calls or invokes the model you created in the last step
attendPredictor AS (
SELECT
ticketID,
predicted_label AS attendModelValue
FROM ML.PREDICT(MODEL `your_project.your_dataset.the_name_of_your_model`, TABLE prep)
)
SELECT
*
FROM
attendPredictor
The first part pulls all the characteristics needed to determine the attendance outcome. It uses a “SELECT” statement to grab the following information from your registration table:
ticketID,
keyChannel,
flightCategory,
ticketType,
paid,
weeksOut,
tierOne,
tierTwo
We give this data set a temporary name by using the “WITH prep AS (” clause above “SELECT” in this section.
The second part of the query sends the information to the model and brings back the “ticketID” and the attendance prediction line-by-line.
attendPredictor AS (
SELECT
ticketID,
predicted_label AS attendModelValue
The “attendModelValue” will return a “True” if the guest is predicted to attend, or a “False” if the model has determined it is unlikely for the guest to show up.
The last part of the query is how your data pull communicates with the model. It tells BigQuery which model to use and which table of registrations to run through it. But we can break down the command into even smaller parts.
ML.PREDICT tells BigQuery to run the data through a prediction model.
`your_project.your_dataset.the_name_of_your_model` tells BigQuery where the model you created earlier is located.
TABLE prep shoots off every row of the preparation table to the model. The model then runs through each ticketID and assesses all the characteristics associated with that attendee.
ticketID,
keyChannel,
flightCategory,
ticketType,
paid,
weeksOut,
tierOne,
tierTwo
Finally, the model will return the “attendModelValue” results. Once this is done, the very first line item in the query “CREATE OR REPLACE TABLE” will either create or fully replace the temporary table storing your predictions.
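One optional refinement: for classification models, ML.PREDICT also returns a “predicted_label_probs” column containing the probability behind each prediction. A hedged variation of the prediction step that keeps that confidence value alongside the True/False result could look like this:
-- Optional variation: keep the raw label probabilities next to the prediction.
-- predicted_label_probs is an array of (label, prob) pairs returned by ML.PREDICT.
SELECT
  ticketID,
  predicted_label AS attendModelValue,
  predicted_label_probs AS attendModelProbs
FROM ML.PREDICT(
  MODEL `your_project.your_dataset.the_name_of_your_model`,
  (SELECT ticketID, keyChannel, flightCategory, ticketType, paid, weeksOut, tierOne, tierTwo
   FROM `your_project.your_dataset.the_name_of_your_registration_table`)
);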
Updating Your Registration Table
After BigQuery has pulled all the data in your preparation table, analysed the attendee list in question, and spit out attendance predictions, you’ll need to update your registration table.
The piece of code that will do that for you is:
UPDATE `your_project.your_dataset.the_name_of_your_main_registration_table` target
SET target.attendPredict = source.attendModelValue
FROM `your_project.your_dataset.the_name_of_your_temp_prediction_table` AS source
WHERE target.ticketID = source.ticketID;
“UPDATE” means that information is not being created or replaced; it is merely being updated in the main registration table.
The next part tells BigQuery to update the field “attendPredict” with the value from “attendModelValue.”
The source, the table the update pulls its data from, is defined as the table created in the previous step:
FROM `your_project.your_dataset.the_name_of_your_temp_prediction_table` AS source
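One assumption behind that UPDATE is that the “attendPredict” column already exists in your main registration table. If it does not, a one-time statement like this would add it (BOOL is our assumption, matching the model’s True/False output):
-- One-time setup, assuming attendPredict does not exist yet in the
-- registration table; BOOL matches the model's True/False output.
ALTER TABLE `your_project.your_dataset.the_name_of_your_main_registration_table`
ADD COLUMN IF NOT EXISTS attendPredict BOOL;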
Model Benefits
Now that you have an understanding of how the model works and exactly what its outputs mean, you can see how The DiG’s advanced insights and reporting capabilities allow your event leaders to make data-driven decisions with ease. In The DiG, the prediction model powers score cards that summarise expected attendance at a glance.
You will also be able to click through to an overview table to assess the model’s predictions in a detailed view for each of your acquisition channels.
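As a rough sketch of the kind of aggregation that sits behind such a view (not The DiG’s exact query), predicted attendance can be rolled up by acquisition channel straight from the updated registration table:
-- Hypothetical roll-up of predicted attendance by channel, once attendPredict
-- has been written back to the registration table.
SELECT
  keyChannel,
  COUNT(*) AS registrations,
  COUNTIF(attendPredict) AS predictedAttendees,
  ROUND(COUNTIF(attendPredict) / COUNT(*), 2) AS predictedShowRate
FROM `your_project.your_dataset.the_name_of_your_main_registration_table`
GROUP BY keyChannel
ORDER BY predictedAttendees DESC;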
Bet on Yourself; Bet on The DiG and Data Science
Looking at an event after the fact and figuring out what went wrong used to be how the events industry operated. But now, it's important to get ahead of the event; organisers need to anticipate audience dynamics and prevent things from going wrong before they actually do. The DiG enables this shift by allowing organisers to make data-driven decisions when curating in-person events. Analytical insights aren't just for major production companies with huge technology budgets; they're more accessible today than ever.
Data will never replace the human element of events. The reason attendees love going to them in the first place is because of the creativity and thoughtfulness that event organisers bring to the table. Data science, machine learning, and AI will never replace that, but rather, they'll act as a tool to allow talented event professionals to focus on what they're best at – events!
Whether you decide to build your own model in BigQuery using our early approach shared in this article or use what we've created in The DiG, our team wants to help you create events that leave a legacy.