Data Science for Targeted Advertising: How to Display Relevant Ads by Leveraging Past User Behavior

The online advertising industry is bigger than you think

The past decade has seen huge growth in online advertising. It is an enormous industry, with brands expected to spend over 30 billion dollars in 2014. Online advertising provides companies with instant feedback and publishers with more knowledge about their users. Advertisers are very interested in precisely targeted ads: they want to spend as little money as possible and get the maximum increase in profit. Targeted advertising addresses this goal.

The problem involves determining where, when, and to whom to display particular advertisements on the Internet. Advertising systems deliver ads based on demographic, contextual, or behavioral attributes. Sponsored search is one example. It is the most profitable business model on the web and accounts for a large share of the income of the top search engines Google, Yahoo, and Bing. It generates at least 25 billion dollars per year.

There are several common approaches to targeted advertising:

  • Demographic Targeting – this approach defines the target audience by gender, age, income, location, etc. It is an old and efficient approach because it is easy to project behavior for product categories. Demographic targeting is popular since it’s easy to understand and implement. It gives the advertiser transparency and control over the audience selected for targeting.
  • Property Targeting – a simple and popular targeting mechanism. The advertiser specifies the set of pages where the ad should be shown. For example, a company that sells trucks could show its advertisements on a website about vehicles.
  • Behavioral Targeting – serves ads to users by leveraging their past behavior (searches, site visits, purchases). The most valuable resource for behavioral targeting is the network traffic of a particular user. The more such data you have, the better the targeting results you will achieve. Thus, even local ISPs can potentially serve more accurate ads to consumers than Google or Yahoo.

Real-time bidding exchanges – the de facto standard for targeted advertising

The online advertising industry has grown significantly during the past few years, with extensive usage of real-time bidding (RTB) exchanges. These auction platforms allow advertisers to bid in real time on the opportunity to place online display ads. Advertisers integrate with an exchange via an API and collect a variety of data to decide whether or not to bid and at what price. This has created a simple and efficient method for companies to target advertisements to particular users.

In industry terminology, showing a display ad to a consumer is called an “impression”. An auction is triggered the instant a user navigates to a web page and runs in real time, while the page is still loading in the user’s browser. During the auction, information about the location of the potential advertisement, along with user information, is passed to bidders in the form of bid requests. This data is often combined with information the advertiser has previously collected about the user.

When an auction starts, a potential advertiser decides whether to bid on the impression, at what price, and which advertisement to show if it wins the auction. There are billions of such real-time transactions each day, and advertisers require large-scale solutions to handle such auctions in milliseconds.
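
The three-part decision (whether to bid, at what price, which ad) can be sketched as a small function. This is only an illustration under a stated assumption: the advertiser bids the expected value of the impression, that is, the predicted conversion probability times the value of one conversion. The `BidRequest` fields, `decide_bid`, the `min_bid` floor, and all numbers are hypothetical and are not taken from any real exchange API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BidRequest:
    """Hypothetical, simplified bid request; real exchanges send many more fields."""
    user_id: str
    page_url: str
    ad_slot: str


@dataclass
class Bid:
    price: float      # amount offered for this single impression
    creative_id: str  # which advertisement to show if the auction is won


def decide_bid(request: BidRequest,
               conversion_probability: float,
               conversion_value: float,
               creative_id: str,
               min_bid: float = 0.0001) -> Optional[Bid]:
    """Return a Bid, or None when the impression is not worth bidding on.

    Assumption: bid = P(conversion) * value of one conversion.
    `request` is unused here; a real model would derive the probability from it.
    """
    expected_value = conversion_probability * conversion_value
    if expected_value < min_bid:
        return None  # skip this auction
    return Bid(price=expected_value, creative_id=creative_id)


request = BidRequest(user_id="u-123", page_url="example.com/cars", ad_slot="top-banner")
bid = decide_bid(request, conversion_probability=0.001,
                 conversion_value=50.0, creative_id="summer-sale")
no_bid = decide_bid(request, conversion_probability=0.0000001,
                    conversion_value=50.0, creative_id="summer-sale")
```

In this toy setup the first call produces a bid of about 0.05, while the second falls below the floor and returns `None`. In a production system the probability would come from a trained model, and the whole function would have to run within the millisecond budget of the auction.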

Such a complicated ecosystem is a perfect opportunity for applying machine learning, which plays a key role in optimizing ad bidding, increasing targeting accuracy, and reaching the ultimate goal from the marketer’s perspective: “Address the right browsers with the right message at the right moment and preferably at the right price”.

Improving ad relevance by applying Machine Learning techniques

The main task of a machine learning system is to identify prospective customers – online users who have a higher propensity to purchase a specific product in the near future after being shown the advertisement. The ultimate goal is to build a system that automatically learns a predictive model for each ad campaign. One of the challenges of building such systems is that different ad campaigns can have different performance measures.

However, each of these criteria may be approximately represented as a ranking of potential purchasers by purchase propensity. A primary source of input features for behavioral targeting is the user’s browsing history, recorded as a set of web pages visited in the past. The target labels can be specific to each campaign and based on actual purchases of the specific product. At a high level, this looks like a straightforward predictive modeling problem. But on closer inspection, it appears impossible to obtain the necessary amount of training data directly for this problem.

First, the probability of a purchase in the 7 days after seeing an ad is very low, ranging from 0.0000001 to 0.001 depending on the advertising campaign. Second, the input feature vector includes more than one million features, even in the simplest case (where the user’s browsing history is encoded as a set of hashed URLs). These properties of the dataset complicate training. However, there are efficient approaches designed to predict consumer purchase propensity even in such difficult circumstances.
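
A quick calculation with the purchase rates quoted above shows why positive examples are so scarce. The sketch below is plain arithmetic; the target of 1,000 positive examples is an arbitrary figure chosen for illustration, and `impressions_needed` is a hypothetical helper name.

```python
def impressions_needed(purchase_rate: float, positives_wanted: int) -> int:
    """Expected number of impressions required to observe `positives_wanted` purchases."""
    return round(positives_wanted / purchase_rate)


# The two ends of the purchase-rate range mentioned in the text.
for rate in (0.001, 0.0000001):
    print(f"rate={rate}: {impressions_needed(rate, 1000):,} impressions")
```

At the high end of the range, 1,000 purchases require about one million impressions; at the low end, about ten billion.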

Site visits as better purchase predictors than click-through rate (CTR)

We know that a purchase after seeing an advertisement is a rare event. As a result, models must be trained on a highly imbalanced class distribution (skewed classes). The simplest and most widely used approach is to train models on a proxy for the purchase. Currently, the most common proxy is a click on the advertisement. The efficiency of campaigns is often evaluated by “click-through rate” (CTR), so campaigns are optimized toward increased CTR.

In this approach, clicks on advertisements are treated as positive samples. Hence the model is trained on clicks instead of conversions, but the test set is still labeled by conversions. In a recent study [1], this approach was tested on 10 different ad campaigns. The results imply that targeting based on clicks does not necessarily maximize conversions.
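
The train-on-proxy, test-on-conversions setup can be illustrated with a deliberately tiny model. Everything in this sketch is an assumption made for illustration: the per-URL smoothed rate "model", the function names, and the four-user toy dataset are invented and do not reproduce the method or the results of [1].

```python
from collections import defaultdict
from typing import Dict, List, Set


def train_url_rates(histories: List[Set[str]], labels: List[int],
                    smoothing: float = 1.0) -> Dict[str, float]:
    """Per-URL smoothed positive rate: a tiny stand-in for a real model."""
    seen = defaultdict(int)
    positive = defaultdict(int)
    for urls, label in zip(histories, labels):
        for url in urls:
            seen[url] += 1
            positive[url] += label
    return {url: (positive[url] + smoothing) / (seen[url] + 2 * smoothing)
            for url in seen}


def score(model: Dict[str, float], urls: Set[str]) -> float:
    """Average the learned rates of the visited URLs (0.5 for unknown URLs)."""
    if not urls:
        return 0.5
    return sum(model.get(u, 0.5) for u in urls) / len(urls)


# Toy data (invented): one browsing history and two kinds of labels per user.
histories = [{"news", "cars"}, {"cars", "dealer"}, {"games"}, {"games", "news"}]
clicks = [0, 0, 1, 1]        # proxy labels
conversions = [0, 1, 0, 0]   # true labels

click_model = train_url_rates(histories, clicks)
conversion_model = train_url_rates(histories, conversions)

# Both models are judged on conversions, whichever labels they were trained on.
buyer, non_buyer = histories[1], histories[2]
print(score(click_model, buyer), score(click_model, non_buyer))
print(score(conversion_model, buyer), score(conversion_model, non_buyer))
```

In this contrived dataset the clickers and the buyer visit different pages, so the click-trained model scores the non-buyer above the buyer, while the conversion-trained model ranks them correctly. Real data is far noisier, but the example shows why the choice of training label matters.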

Figure 1. Improvement in prediction accuracy by using conversions for training instead of clicks. Testing is done using conversions in both cases.

Are there other good proxy candidates for evaluating and optimizing advertising campaigns? Recent research [2] answers this question. In contrast to clicks, site visits turned out to be generally good proxies for purchases. Specifically, site visits do remarkably well as the basis for building models to target browsers who will purchase after being shown the ad. In some cases, models trained on site visits even produce better results than those trained on conversions.

Figure 2. AUC performance distribution, with respect to purchase prediction, of the models trained on clicks, site visits, and purchases, respectively.

The results show that site visitors are more likely than ad clickers to become purchasers.
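
Figure 2 reports AUC, which measures how well a model ranks purchasers above non-purchasers. A minimal sketch of the metric is given below, using its pairwise definition: the probability that a randomly chosen positive is scored higher than a randomly chosen negative. The scores and labels are invented, and the quadratic loop is meant for clarity, not for production-scale data.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [1, 0, 1, 0, 0]             # 1 = purchased, 0 = did not purchase
scores = [0.9, 0.8, 0.4, 0.3, 0.1]   # model's purchase-propensity scores
print(auc(scores, labels))           # 5 of the 6 pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking. Because the metric depends only on the order of the scores, it suits the ranking view of purchase propensity described earlier.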

Dimensionality reduction techniques improve model accuracy

As mentioned earlier, another difficulty in predicting purchases is the huge input feature space, which typically requires dimensionality reduction. In most cases, an ad targeting system tracks over 100 million unique URLs, and any of them could be used in a predictive model. It’s very expensive to build and store such high-dimensional models. A number of dimensionality reduction techniques are available, but not all of them are well suited to ad targeting problems.

The simplest method for massive binary feature space reduction is feature hashing. It transforms a bag of words into a bag of hashed IDs. Given a set of tokens and a hash function h(), we apply the hash function to each token, and the new feature space is simply the set of hashed values. In other words, the hash function generates a column index for a given token. The output range of the hash function should be big enough to avoid collisions even with a million unique tokens. The pseudocode is as follows:

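function hash_vectorizer(features : string array, N : integer):
    x := new vector[N]          // N counters, all initialized to zero
    for f in features:
        h := hash(f)            // hash the token to an integer
        x[h mod N] += 1         // tokens that collide share a counter
    return x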

Dimensionality reduction results from hash collisions. For example, if a set contains three URLs, the first two of which hash to 6 and the third to 8, then in the new space the hashed URL set has values only for features 6 and 8. Hash functions typically produce 32-bit or 64-bit values, so to project into an arbitrary k-dimensional feature space, we compute h() mod k.
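
A runnable version of the hashing vectorizer might look like the Python sketch below. Two choices here are assumptions rather than part of the original pseudocode: the hash is built from MD5 via `hashlib` (Python's built-in `hash()` for strings changes between interpreter runs, which would make the feature indices unstable), and the example URLs are placeholders.

```python
import hashlib
from typing import Iterable, List


def stable_hash(token: str) -> int:
    """Deterministic 64-bit hash taken from the first 8 bytes of an MD5 digest."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hash_vectorizer(features: Iterable[str], n: int) -> List[int]:
    """Map a bag of tokens to an n-dimensional count vector via h(token) mod n."""
    x = [0] * n
    for f in features:
        x[stable_hash(f) % n] += 1   # colliding tokens share a counter
    return x


urls = ["example.com/a", "example.com/b", "example.org/c"]  # placeholder URLs
vector = hash_vectorizer(urls, n=16)
print(vector)
```

The vector always has exactly `n` entries, however many distinct URLs exist, and its entries sum to the number of tokens. A smaller `n` means more collisions and therefore stronger, lossier reduction.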

Another approach is Contextual Categories. The web has a number of sources, both proprietary and free, that categorize specific web pages by their content. These categories serve as content-based groupings that can be used to reduce the dimensionality of the data. With category data, the original feature space of URLs becomes a feature space of categories.
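
As a rough illustration of the category approach, the sketch below replaces each visited URL with its category and counts visits per category. The `URL_CATEGORIES` lookup table, the category names, and the URLs are all hypothetical; in practice the mapping would come from one of the categorization sources mentioned above.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical URL-to-category lookup, invented for this example.
URL_CATEGORIES: Dict[str, str] = {
    "example.com/suv-review": "autos",
    "example.com/truck-prices": "autos",
    "example.org/match-report": "sports",
}


def category_features(urls: Iterable[str],
                      lookup: Dict[str, str],
                      unknown: str = "uncategorized") -> Counter:
    """Replace each URL with its category and count visits per category."""
    return Counter(lookup.get(url, unknown) for url in urls)


history = ["example.com/suv-review", "example.com/truck-prices",
           "example.org/match-report", "example.net/unlisted"]
features = category_features(history, URL_CATEGORIES)
print(features)
```

Four distinct URLs collapse into three features here. Unlike hashed indices, the resulting features are human-readable, at the cost of depending on the coverage and quality of the category source.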

There are many other techniques for dimensionality reduction, including Singular Value Decomposition (SVD) and Principal Component Analysis (PCA), which can be good alternatives for reducing the huge URL feature space.


This article gave a brief overview of the targeted advertising business, a multi-billion-dollar industry that is growing rapidly. Most of the big players in the online advertising market work with real-time bidding (RTB) systems, which connect advertisers and publishers. RTB acts as an online auction that allows advertisers to bid in real time on the opportunity to place online display ads for a particular user.

Right now, the industry’s key metric for measuring the success of ad campaigns is click-through rate (CTR). However, recent studies have shown that site visits are a better conversion predictor than clicks. At first sight, applying data science and machine learning to targeted advertising seems to be a trivial task. But on closer examination, one notices underlying difficulties, including rare conversions, a lack of training data, and a high-dimensional input feature space.

However, a number of studies have identified efficient ways to overcome these difficulties and build good models for predicting future conversion events. If you need expert support in implementing data science to boost your sales and customer engagement, we are here to help. Get in touch with us.
