An Introduction to Podsights Analytics

April 05, 2019

Overview

Podcast advertising is unique. Unlike digital advertising, there are no cookies or unique identifiers to match a user to an action. Unlike radio advertising, we do get the listener’s IP and User Agent on every download. It sits somewhere between knowing nothing about the user and knowing everything, which is comforting in the time of privacy invasion, but unsatisfying to marketers looking to make purchasing decisions.

Podsights’ goal is to squeeze as much information as possible out of this data to give you, the marketer, more clarity.

This post draws on data from eight campaigns. We are going to use some of the data anonymously, but reference In Defense of Paper’s data by name. In Defense of Paper (IDoP) is a small journal company we run as an original brand, which lets us test the effectiveness of ads in a real-life setting.

We are going to break down how Podsights works at each step of the process and go into the insights we learned along the way. This post is divided into six parts: Current State, How Podsights Works, The Data, Reporting, Attribution, and Retargeting.

Current State

If you are new to podcasting, you need to understand this isn’t like Facebook or Google marketing. Downloads are reported via spreadsheets, admin screenshots, and email. You can spend a million dollars on a campaign, and afterward the publisher will send you an email saying, yep, you got 30 million downloads, and that’s it.

Downloads are also counted differently between hosting providers. The IAB v2.0 download spec aims to solve this, but we are still seeing podcasters selling on 30,000 downloads an episode and delivering only 15,000. They aren’t trying to do this; their hosting provider is just giving them bad data.

Between the lack of data and the bad data that is reported, podcast advertising has high churn. 85% of podcast advertisers are new to the medium, and 70% of them will churn after making their first buy.

Podsights sets out to solve these two main problems by centralizing reporting into one dashboard and measuring ROI. We give data-centric marketers the tools to invest in podcast advertising.

How Podsights Works

Podsights is remarkably easy to integrate into your current and future podcast campaigns. There are two pieces: Analytics Prefix and Pixel.

Analytics Prefix

The publisher installs the Analytics Prefix through their hosting provider. Every time a user downloads an episode, the request goes to Podsights first, then to the audio file. This allows us to report real-time download numbers and is the basis of attribution.
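As a sketch, the mechanics of a prefix reduce to two URL transformations: the feed rewrites each enclosure URL to route through the prefix, and the prefix recovers the original URL to redirect the listener. The hostnames and helper names below are hypothetical placeholders, not Podsights’ actual endpoints.

```javascript
// Hypothetical sketch of prefix URL handling; prefix.example.com and
// host.example.com are placeholder hostnames, not real endpoints.

// Rewrite an enclosure URL in the RSS feed to route through the prefix.
function prefixEnclosure(prefixHost, audioUrl) {
  return `https://${prefixHost}/${audioUrl.replace(/^https?:\/\//, "")}`;
}

// On each download request, the prefix logs IP and User Agent (not shown),
// then recovers the original audio URL to 302-redirect the listener.
function redirectTarget(requestPath) {
  return `https://${requestPath.replace(/^\//, "")}`;
}
```

A real prefix also preserves query strings and handles HTTP-only hosts; this only shows the routing idea.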

Analytics prefix

If Megaphone or Art19 hosts the podcast, Podsights can also use their third-party pixels instead of an Analytics Prefix. In this case, the user downloads the episode directly from the host, and the host sends Podsights the IP and User Agent when it dynamically inserts the ad.

Analytics pixel

Podsights can work, or has worked, with: Art19, Megaphone, Libsyn, Simplecast, SoundCloud, Transistor, and Squarespace.

Pixel

The pixel is a small piece of JavaScript included on the brand’s site. By default, the pixel collects IP, User Agent, and page views. The brand can include optional events such as product view, add to cart, checkout, purchase, and lead to enhance their dashboard.

We are sensitive to page loads, so our pixel does not block the page from loading.
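One common way pixels achieve this, shown here as an illustration of the general pattern rather than Podsights’ actual loader, is a command queue: event calls made before the async script arrives are buffered inline, then replayed once the script loads.

```javascript
// Illustrative command-queue stub; a non-blocking pixel loader would
// define something like this inline, then load the real script async.
const pdst = function (...args) {
  (pdst.q = pdst.q || []).push(args); // buffer calls until the script loads
};

// The page can fire events immediately without blocking rendering.
pdst("view");
pdst("purchase", { value: 10.0, currency: "USD" });
// pdst.q now holds both calls for the real pixel to replay.
```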

For more legal information around our pixel, please see our privacy policy and Opt Out tool.

Data

Podsights works primarily from IPs and User Agents. It’s useful for us to explain both, as we will reference them extensively below.

IPs

An Internet Protocol (IP) address is a numerical label attached to the network your device is connected to. It looks like 172.16.254.1 or 2001:db8:0:1234:0:567:8:1. IPs are the backbone of how traffic is routed around the web.

IPs are not unique to a user. As you move from your house to the train and then to work, your phone is given different IPs depending on what network you are connected to. Users are also pooled through the same IP address, so at home, you, your partner, your kids, and your grandma are all using the same IP.

IPs can also be “noisy”. Noisy means that many users are pooled through the same IP address. AT&T is especially bad about putting many unrelated users into the same IP. If you are part of a large organization, your work IP is also noisy.

How do we figure out if an IP is noisy? Good question: we use web traffic to observe how users move across networks, and flag an IP as noisy when we see too many users coming through the same channels.

IPs are tied to a location, and ISPs will report approximate locations back to IANA and others. You can buy databases from many providers to go from IP to location. This is how geo-blocking works. It’s not particularly accurate; the location can be anywhere within a radius of up to 500 miles.

Because IPs are so pervasive, a number of providers have sprung up in the last decade offering cross-device graphs. These are massive graphs of every household IP mapped to the identifiers they have seen at that IP. Tapad and Drawbridge are two of the largest.

We use cross-device graphs to understand how users move across networks. If we see a download at one IP address and a site visit at another, and the two are linked via the cross-device graph, there is a strong probability they are one and the same user.

Cross-device graphs tend to throw out noisy IPs because their primary use case is personalization. If there are too many identifiers at a location, personalization falls apart.

For every 100 IPs that we send to a cross-device graph, we will get between 35 and 50 matches back. We use these matches for cross-device attribution, retargeting, and frequency measurements.

User Agents

The last piece of the puzzle we will use to identify a device is its user agent. Every time a podcast player downloads a podcast, it sends along a User Agent identifying some device characteristics. This is a user agent from Apple’s Podcast app:

AppleCoreMedia/1.0.0.16D57 (iPhone; U; CPU OS 12_1_4 like Mac OS X; en_us)

It tells us it’s an iPhone running iOS 12.1.4. Other user agents look like:

okhttp/3.12.1

Really not helpful. We know this is most likely the Podcast Addict app on Android because of the traffic pattern, but we can do better, everyone.

Podsights uses User Agents to include and rule out traffic. If we see a download at an IP and then a site visit at the same IP, we check to make sure the User Agents make sense. If the device that made the site visit is running a lower version of iOS than the one that made the download, we can rule that visit out as noise.
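That version check can be sketched as follows; the parsing regex and helper names are our own illustration, not Podsights’ internal code.

```javascript
// Parse the OS version out of an Apple user agent, e.g.
// "AppleCoreMedia/1.0.0.16D57 (iPhone; U; CPU OS 12_1_4 like Mac OS X; en_us)".
function iosVersion(userAgent) {
  const m = userAgent.match(/CPU (?:iPhone )?OS (\d+)_(\d+)(?:_(\d+))?/);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3] || 0)] : null;
}

// Compare [major, minor, patch] triples; negative means a < b.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// A site visit from a lower iOS version than the download cannot be the
// same device, so it can be ruled out as noise.
function plausiblySameDevice(downloadUA, visitUA) {
  const d = iosVersion(downloadUA);
  const v = iosVersion(visitUA);
  if (!d || !v) return true; // can't tell; don't rule out on UA alone
  return compareVersions(v, d) >= 0;
}
```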

Reporting

Attribution is nothing without the download data attached. We need to understand, down to the creative level, what shows convert for your brand and at what price.

Downloads

Podsights does a lot of work to measure downloads accurately in real time, but the integration is as simple as possible. The podcaster inserts the Podsights Analytics Prefix in front of each Enclosure URL in their RSS feed, so every listener request for the podcast audio file hits Podsights’ server first, before being redirected to the podcast host.

This is a common industry practice, and Podsights uses a redundant cloud architecture to ensure fast, highly-available response times that are invisible to the listener. Many podcast hosts have direct support for adding an analytics prefix, and for other hosts, it’s as simple as sending one email. Podsights also works with newer podcast hosts like Megaphone.fm and Art19 to measure downloads via a tracking webhook, instead of the analytics prefix.

When Podsights receives a download request, the metadata is immediately pushed into our real-time downloads pipeline, as well as being saved in a permanent storage backup in case of later reprocessing. In accordance with the IAB Podcast Measurement Technical Guidelines 2.0, each request is evaluated to see if it represents a listener with intent to consume the audio. In fact, the majority of traffic does not qualify and is mostly made up of HTTP HEAD requests (does this file exist?), bot requests (Pandora crawls podcasts constantly), and streaming check requests (Apple Podcasts always checks to make sure each episode can be streamed). Each podcast app has its own pattern of requests before it actually starts playing the audio for the listener, and it’s important that Podsights counts those downloads accurately.

Within the real-time downloads pipeline, valid download requests are immediately pushed and made available in our historical and real-time dashboards. Valid downloads are also published in real-time via webhooks to the appropriate customers, allowing further integration with other marketing tools.

Invalid downloads are kept on hand and further requests are monitored, waiting to see whether a later request completes a valid sequence. If a sequence of download requests from a client never becomes valid within the 24-hour IAB 2.0 window, Podsights reports these invalid downloads separately in our dashboard.
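A highly simplified sketch of that validation: count at most one valid download per IP/User Agent/episode per 24-hour window, skipping HEAD and zero-byte requests. The real IAB 2.0 rules are more involved (minimum audio thresholds, bot filtering, and so on); the field names here are illustrative.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// requests: array of { ip, ua, episode, method, bytes, ts }, sorted by ts.
function countValidDownloads(requests) {
  const lastCounted = new Map();
  let valid = 0;
  for (const r of requests) {
    // HEAD and empty responses show no intent to consume the audio.
    if (r.method === "HEAD" || r.bytes < 1) continue;
    const key = `${r.ip}|${r.ua}|${r.episode}`;
    const prev = lastCounted.get(key);
    // Count one download per client per episode per 24-hour window.
    if (prev === undefined || r.ts - prev >= DAY_MS) {
      valid++;
      lastCounted.set(key, r.ts);
    }
  }
  return valid;
}
```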

Downloads are not measured consistently across hosting providers. For the shows our Analytics Prefix is on, we compare numbers from the hosting provider, other prefixes, and our own data. Some discrepancies are just egregious. The following compares Podsights’ numbers to those of a leading prefix over 10 episodes:

Downloads

For this show, the leading prefix’s numbers are 85% higher than Podsights’. Brands are only getting about 55% of the downloads they are buying, and their effective CPM goes from $30 to $55.
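The arithmetic is worth spelling out: if reported downloads run 85% above measured downloads, the buyer receives roughly 1/1.85 ≈ 54% of what was purchased, and the quoted CPM scales up by the same factor.

```javascript
// If a host over-reports by some factor, the effective CPM rises by
// that same factor: you pay the quoted rate for fewer real downloads.
function effectiveCpm(quotedCpm, overreportFactor) {
  return quotedCpm * overreportFactor;
}

const deliveredShare = 1 / 1.85;        // ≈ 0.54 of the downloads paid for
const trueCpm = effectiveCpm(30, 1.85); // ≈ $55.50 per thousand real downloads
```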

No matter the underlying hosting provider, Podsights provides consistent, reliable numbers.

Frequency

Frequency measures how often we are exposing a user to a brand. 30 times is likely too many; once is likely too few. If you are buying one show on one network on a single hosting provider, then the hosting provider can give you a frequency number. What happens when you are buying across multiple shows, networks, and hosting providers? Podsights can help.

The most basic measure of frequency is downloads divided by reach, where reach is the number of unique User-Agent/IP pairs. We have found this under-represents frequency. There is a set of people who will always download a podcast on a mobile network. They will rarely download the same podcast on the same IP, so they will always look unique, but it’s the same person. In a three-episode buy, such a user will look like three separate users and reduce the overall frequency.
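As a sketch, the baseline calculation (frequency = downloads / reach, with reach as unique User-Agent/IP pairs) looks like this:

```javascript
// Baseline frequency: downloads over unique User-Agent/IP pairs.
// This is the naive figure before noisy-IP and cross-device adjustments.
function baselineFrequency(downloads) {
  // downloads: array of { ip, ua }
  const reach = new Set(downloads.map((d) => `${d.ip}|${d.ua}`)).size;
  return { reach, frequency: downloads.length / reach };
}
```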

In a campaign we recently ran across six shows, the basic frequency would have been reported as 1.53. When we remove the noisy IPs, this number goes to 1.8, and if we use cross-device data, it jumps to 2.26.

True frequency was 48% higher than basic frequency measurements.

Using this data, we can revise the overall reach downward to match the corrected frequency. In this case, the reach went from ~260,000 users to ~180,000. This, in turn, increases the conversion rate.

Overlap

Another question we get a lot is around overlap. Of the shows I’m buying, where is the greatest and the least overlap? Again, this is easy if all shows use the same hosting provider, but measuring across multiple providers is more difficult.

Podsights uses the same methodology for overlap as with frequency. We want to understand overlap without the noise.

For two shows in the campaign, we found that the basic overlap was 7.9%. By removing the noisy IPs overlap rose to 11.56%. When we include only IPs with MAIDs, it jumps again to 17.03%.
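A basic overlap measure can be sketched as the shared audience divided by the smaller show’s audience; both this particular ratio and the use of raw IPs are our simplifications for illustration, not Podsights’ exact methodology.

```javascript
// Simplified overlap: the share of the smaller show's unique IPs that
// also appear in the other show's audience.
function audienceOverlap(ipsA, ipsB) {
  const a = new Set(ipsA);
  const b = new Set(ipsB);
  let shared = 0;
  for (const ip of a) if (b.has(ip)) shared++;
  return shared / Math.min(a.size, b.size);
}
```

Removing noisy IPs shrinks both sets (and their intersection grows proportionally), which is why the measured overlap rises as the noise is filtered out.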

We have found overlap to be a net positive: users exposed to a brand across podcasts were ~15% more likely to convert. We still need to rerun these numbers, and will update this post once we have better data.

Attribution

We get asked about match rate a lot. Of the people who come to the site from a podcast, how many will you catch? The answer is definitively not 100%. There is inherently too much noise in using just User Agent and IP to get to that number.

Podsights works at a household level, not a user level. We treat any action at a household to be attributable to the podcast. If you download a podcast and your partner is the one that visited the brand site, this is an attributable action. In all probability, however, this is infrequent, and it was you that heard the ad and went to the brand site.

Podsights takes a waterfall approach to attribution that works from certainty to uncertainty. We put all the downloads and users in one long list and start pulling attributable downloads and actions out of that list.
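The waterfall can be sketched as a pipeline of stages ordered from most to least certain, where each stage claims the actions it can attribute and passes the remainder along. The stage objects here are hypothetical placeholders; the real stages are described in the sections that follow.

```javascript
// Run attribution stages in order of certainty; later, noisier stages
// only see the visits earlier stages could not claim.
function attributeWaterfall(downloads, visits, stages) {
  const attributed = [];
  let remaining = visits;
  for (const stage of stages) {
    const { matched, unmatched } = stage.match(downloads, remaining);
    attributed.push(...matched.map((m) => ({ ...m, stage: stage.name })));
    remaining = unmatched;
  }
  return { attributed, unattributed: remaining };
}
```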

Direct

We start with direct, non-noisy IPs. Direct is the simplest and most accurate case. These are residential IPs with a limited number of identifiers and cookies that are likely single-family households. In this case, it’s your podcast player downloading the podcast on your home wifi, you listening to it anywhere, and then you visiting the brand site after you get home, connected to that same wifi.

Direct traffic is the most reliable and forms the basis of a lot of the attribution that happens next. We look to see how those users entered the site, what they did, and where they exited. We can create a model of what other listeners look like from these actions.

Direct also gives us a sense of what the conversion rate should be down the line. If the conversion rate of pure direct traffic is 4%, but the rest is 0.5%, we know we have many more actions to attribute.

Cross

Next Podsights moves on to Cross IP traffic. We use a third-party data provider that helps us understand how users move across networks. You as a human have a routine. You wake up in the morning, download a podcast for your drive to work, listen to it in the car/train and go to work. The whole time your devices, with cookies from many different services, are riding along with you. As you pop up on different networks, graph companies build profiles of these patterns and try to match traffic back to a home network for better personalization in and around the web.

Podsights uses this data to attribute actions across networks. If you download a podcast at home and visit the brand site at work, we should try and connect those two actions.

Because it’s less certain, Podsights puts additional restrictions on this traffic. We make sure it looks like podcast traffic and not random web traffic, i.e., the visit came through search or direct and happened a reasonable time after the download.
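Those restrictions can be sketched as a predicate on each candidate download/visit pair; the channel list and the seven-day window here are assumptions for illustration, not Podsights’ actual thresholds.

```javascript
// Assumed checks on cross-IP matches: podcast-like entry channel only,
// and the visit must follow the download within a window.
const CROSS_WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // assumed 7-day window

function qualifiesAsCross(download, visit) {
  // No paid or referral traffic: listeners arrive via search or direct.
  const podcastLike = ["search", "direct"].includes(visit.channel);
  const withinWindow =
    visit.ts > download.ts && visit.ts - download.ts <= CROSS_WINDOW_MS;
  return podcastLike && withinWindow;
}
```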

Noisy

Noisy is the last step in the process and where you can get in the most trouble in IP-based attribution. In an early test for a large e-commerce site that gets hundreds of thousands of visits a day, we found that if we took noisy traffic at face value, over 30% of podcast listeners visited the brand site. Mobile networks, in particular AT&T, pool thousands of users through the same IP addresses. Without windowing and modeling from the previous steps, a podcast can look great, but in reality, it’s just noise.

There are real, attributable actions coming from these IPs. One potential use case: you stream a podcast on your mobile device while on a train, hear an ad, and immediately go to the site since you’re already on your phone anyway. Podsights makes sure it’s the same device, that the interaction fits the model from the previous steps, and that it happens in a short window after the download.

Of the 30% above, the actual conversion rate was 4.5%. Still great, and believable.

Custom URLs

All podcast ads should have custom URLs. They are simple to set up, and it’s an easy way to survey podcast listeners. If I type in a custom URL, it’s me telling the brand that I came from a podcast.

More importantly, it’s an easy way for podcast hosts to boost attribution back to their show. Podcasters should use the custom URL in the show notes and place it on their podcast’s website.

In Defense of Paper ran on the Erasable podcast and used both discount codes and custom URLs. Erasable linked their custom URL in the show notes and on their site.

50% of the traffic to that custom URL was referred from their website erasable.us. If we measured purely by URL usage, they doubled their performance just by properly linking to In Defense of Paper.

In Defense of Paper also ran on the “New Mindset, Who Dis” podcast. We were not linked in the show notes, or on-site, and as a result, we saw zero usage of the URL.

If you measure purely by Custom URL usage, you are really measuring who is properly linking to your brand, rather than actual visits.

Podsights takes Custom URLs into account in its waterfall. They can be misleading when taken alone, but as part of the whole, they are powerful.

Discount Codes

Discount codes are the gold standard in podcast measurement. Discount codes leak, for sure, but they are a dependable way to learn which shows are converting. Podsights takes codes into its attribution model. You can send us this data via the purchase event like so:

pdst("purchase", {
  value: 10.00,
  currency: "USD",
  discount_code: "PODCAST_CODE",
  order_id: "12322323232"
});

Podsights will then back that user out to a session, events, and downloads giving you a full picture of how podcast traffic compares.

A note here on measuring by discount codes alone. We hear a lot of high conversion-rate numbers from people who measure purely by codes, claims like 20% or 30% of listeners visiting the site, and I’m here to tell you this is false. We haven’t found data to support anything this high.

The way they get this number is by reversing their typical conversion rate and multiplying by code usage. If I’m an e-commerce company, I might have a 2% conversion rate, meaning that for every 100 visitors I get, two will purchase. Run in reverse, if I know I got 20 purchases, that must mean 1,000 people came to the site.

Say we advertise on a podcast with a reach of 10,000 users and we get 50 purchases. If we reverse that with the typical conversion rate, we conclude that 2,500 visitors came from the podcast, or 25% of the audience.

The truth is that podcast traffic just converts better. If I have a discount code in hand and I’ve just been told by a friend I listen to weekly to go buy something, I’m a much more motivated buyer.

Conversion rate varies from podcast to podcast, site to site. In Defense of Paper found that podcast traffic converted five times better than web traffic.

Going back to the example: if the e-commerce site got 50 purchases and podcast traffic converted five times better (a 10% conversion rate), that would mean 500 visitors, or 5% of the podcast’s reach.
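The example above, worked in code:

```javascript
// Reversing a conversion rate: purchases / rate = implied visitors.
function impliedVisitors(purchases, conversionRate) {
  return purchases / conversionRate;
}

const reach = 10000;
// Naive: assume the site-wide 2% rate applies to podcast traffic.
const naiveShare = impliedVisitors(50, 0.02) / reach;    // 2,500 visitors, 25%
// Better: podcast traffic converts ~5x better, so use a 10% rate.
const adjustedShare = impliedVisitors(50, 0.10) / reach; // 500 visitors, 5%
```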

Deterministic

Lastly, after all that work is done and all of these signals are taken in, we use a deterministic model. People do weird things that direct, noisy, cross, URLs, and discount codes can’t catch. They get on planes, they visit friends, they use all their mobile data on podcasts. We will never catch everyone with those steps. Instead, we use the data given to us to create a model and validate that model against web traffic.

The simplest explanation of this is the conversion rate. If we remove all the noisy IPs and add the cross-device data and we get a conversion rate of 4%, we expect that conversion rate to hold across all traffic given a large enough sample size. If the overall is 2%, then we know we are missing people.

We then expand the filters to include some traffic we initially ruled out, finding other users who look like podcast traffic and have a high probability of coming from a podcast. However, if the traffic doesn’t support this, we know that the direct traffic simply performed better than the rest.

Retargeting

Podcast advertising is effective, but not actionable. 74% of users can recall hearing a brand in a podcast, but only a fraction will end up visiting the site. To bridge the gap between audio and digital, retargeting allows marketers to reach those listeners across multiple channels.

Podsights uses third-party data providers to enhance our understanding of IPs, networks, and users. Those third parties hand us back Mobile Ad Identifiers (MAIDs) associated with those IPs. A MAID is an identifier tied to your mobile device, and it looks like this: A70B9877-E94A-45EE-9E83-1139EC215947.

MAIDs are the backbone of a lot of mobile advertising networks. If I want to target users across mobile apps, MAIDs are the way to go.

Now, these are household-level MAIDs, in that we will get identifiers not only for the downloader but for other people in the household as well. Sometimes we get 25 MAIDs at an IP, sometimes 2.

Podsights allows you to use these MAIDs to retarget users across networks.

We ran a test using MAIDs for In Defense of Paper, using identifiers from the campaigns we ran. One ad set targeted these MAIDs; the other used interest-based targeting. While this was a limited test, we found that although interest-based targeting produced more clicks, its click-to-conversion rate was 0.2%. For the MAID ad set, it was 33%.

Conclusion

The goal here is to show you that we have thought a lot about podcast attribution, that we understand the space and its complexities, and that we know its limits.

Podsights’ approach of combining downloads, site traffic, and cross-device data gives you, the marketer, unprecedented information about the success of your campaigns.