The past decade has seen huge growth in online advertising. It is an enormous industry, with brands expected to spend over 30 billion dollars on it in 2014. Online advertising provides companies with instant feedback and publishers with more knowledge about their users. Advertisers are very interested in precisely targeted ads: they want to spend the smallest amount of money for the maximum increase in profit. Targeted advertising addresses this goal.
The problem involves determining where, when, and to whom to display particular advertisements on the Internet. Advertising systems deliver ads based on demographic, contextual, or behavioral attributes. Sponsored search is one example. It is the most profitable business model on the web and accounts for a large share of the income of the top search engines Google, Yahoo, and Bing, generating at least 25 billion dollars per year.
There are a couple of usable methods to do targeted advertising:
The online advertising industry has grown significantly during the past few years, with extensive usage of real-time bidding (RTB) exchanges. Such an exchange is an auction platform that allows advertisers to bid in real time for the opportunity to place online display ads. Advertisers integrate with the exchange via an API and collect a variety of data to decide whether to bid and at what price. This has created a simple and efficient method for companies to target advertisements to particular users.
In industry terminology, showing a display ad to a consumer is called an “impression”. Auctions run in real time: each one is triggered the moment a user navigates to a web page and takes place while the page is loading in the user’s browser. During the auction, information about the location of the potential advertisement, along with user information, is passed to bidders in the form of bid requests. This data is often supplemented with information the advertiser has previously collected about the user.
When an auction starts, a potential advertiser decides whether to bid on the impression, at what price, and which advertisement to show if it wins the auction. There are billions of such real-time transactions each day, and advertisers require large-scale solutions to handle these auctions in milliseconds.
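The three-part decision described above (whether to bid, at what price, and which ad to show) can be sketched as follows. This is a minimal illustration, not a description of any real exchange API: the `BidRequest` and `Campaign` fields, the `decide_bid` function, and the rule of bidding the expected value of an impression (predicted purchase probability times the value of one conversion) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class BidRequest:
    # Hypothetical fields; real exchanges define their own request schema.
    user_id: str
    page_url: str
    floor_price: float  # minimum acceptable bid, in dollars per impression


@dataclass
class Campaign:
    # Hypothetical campaign description used only in this sketch.
    ad_id: str
    value_per_conversion: float  # what one purchase is worth to the advertiser
    propensity: Callable[[BidRequest], float]  # model: P(purchase | impression)


def decide_bid(request: BidRequest,
               campaigns: Sequence[Campaign]) -> Optional[Tuple[str, float]]:
    """Return (ad_id, bid_price), or None to skip the auction."""
    best: Optional[Tuple[str, float]] = None
    for campaign in campaigns:
        # Assumed rule: bid the expected value of showing this ad.
        expected_value = campaign.propensity(request) * campaign.value_per_conversion
        if best is None or expected_value > best[1]:
            best = (campaign.ad_id, expected_value)
    # Skip the auction when no ad is worth at least the floor price.
    if best is None or best[1] < request.floor_price:
        return None
    return best


request = BidRequest(user_id="u1", page_url="nytimes.com/tech", floor_price=0.001)
campaigns = [
    Campaign("shoes", 50.0, lambda r: 0.0001),
    Campaign("laptop", 900.0, lambda r: 0.00001),
]
decision = decide_bid(request, campaigns)
```

In a production system the propensity function would be the learned model discussed below, and the whole decision would have to fit within the millisecond budget of the auction.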
Such a complicated ecosystem is a perfect opportunity for applying machine learning, which plays a key role in optimizing ad bidding, increasing targeting accuracy, and reaching the ultimate goal from the marketer’s perspective: “Address the right browsers with the right message at the right moment and preferably at the right price”.
The main task of a machine learning system is to identify prospective customers: online users who have a higher propensity to purchase a specific product in the near future after being shown the advertisement. The ultimate goal is to build a system that automatically learns predictive models for each ad campaign. One of the challenges of building such systems is that different ad campaigns may have different performance measures.
However, each of these criteria can be approximately represented as a ranking of potential purchasers by purchase propensity. A primary source of input features for behavioral targeting is the user’s browsing history, recorded as a set of web pages visited in the past. The target labels can be specific to each campaign and based on actual purchases of the specific product. At a high level, this looks like a straightforward predictive modeling problem. On closer inspection, however, it turns out to be impossible to obtain the necessary amount of training data directly.
First, the probability of a purchase within 7 days of seeing the ad is very low, ranging from 0.0000001 to 0.001 depending on the advertising campaign. Second, the input feature vector includes more than one million features even in the simplest case (where the user’s browsing history is encoded as a set of hashed URLs). These properties of the dataset make training difficult. However, there are efficient approaches designed to predict consumer purchase propensity in such circumstances.
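To see why such low purchase rates make positive training examples scarce, the expected number of purchases can be computed directly. The sketch below uses the two probabilities quoted above; the figure of 10 million impressions and the target of 1000 purchases are arbitrary assumptions chosen for illustration.

```python
def expected_positives(impressions: int, purchase_probability: float) -> float:
    """Expected number of purchases among the given impressions."""
    return impressions * purchase_probability


def impressions_needed(target_positives: int, purchase_probability: float) -> float:
    """Impressions required, on average, to observe the target number of purchases."""
    return target_positives / purchase_probability


IMPRESSIONS = 10_000_000  # assumed campaign size, for illustration only

for p in (0.0000001, 0.001):
    print(f"p={p:g}: about {expected_positives(IMPRESSIONS, p):g} purchases "
          f"in {IMPRESSIONS:,} impressions; "
          f"{impressions_needed(1000, p):.3g} impressions for 1000 purchases")
```

At the low end of the range, 10 million impressions yield roughly one purchase on average, which is far too few positive examples to train a model with a million features.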
We know that a purchase after seeing an advertisement is a rare event. As a result, models must be trained on a highly imbalanced class distribution (skewed classes). The simplest and most widely used approach is to train models on a proxy for the purchase. Currently, the most common proxy is a click on the advertisement. The efficiency of campaigns is often evaluated by “click-through rate” (CTR), so campaigns are optimized toward increased CTR.
In this approach, clicks on advertisements are treated as positive samples. Hence the model is trained using clicks instead of conversions, but the test set is still labeled by conversions. In a recent study [1], this approach was tested on 10 different ad campaigns. The results imply that targeting based on clicks does not necessarily maximize conversions.
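The evaluation setup described above, training on a proxy label and testing against conversions, can be sketched as follows. The toy data, the URL-frequency scorer, and the function names are all assumptions made for illustration; they are not the procedure used in study [1], and the numbers carry no empirical meaning. The area under the ROC curve (AUC) is used here as one common way to score a ranking of users.

```python
from collections import Counter
from typing import Dict, Sequence, Set


def train_url_scores(histories: Sequence[Set[str]],
                     proxy_labels: Sequence[int]) -> Dict[str, float]:
    """Score each URL by the smoothed rate of proxy positives among its visitors."""
    seen: Counter = Counter()
    positive: Counter = Counter()
    for urls, label in zip(histories, proxy_labels):
        for url in urls:
            seen[url] += 1
            positive[url] += label
    # Add-one smoothing keeps rarely seen URLs from getting extreme scores.
    return {url: (positive[url] + 1) / (seen[url] + 2) for url in seen}


def score_user(urls: Set[str], url_scores: Dict[str, float]) -> float:
    """Average the scores of known URLs; a user with no known URLs gets 0.5."""
    known = [url_scores[u] for u in urls if u in url_scores]
    return sum(known) / len(known) if known else 0.5


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outranks a random negative (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy training set: labels come from the proxy event (a click), not a purchase.
train_histories = [{"a.com", "b.com"}, {"a.com"}, {"c.com"}, {"b.com", "c.com"}]
train_clicks = [1, 1, 0, 0]

# Toy test set: labels are actual conversions.
test_histories = [{"a.com"}, {"c.com"}, {"a.com", "b.com"}, {"b.com", "c.com"}]
test_conversions = [1, 0, 0, 0]

url_scores = train_url_scores(train_histories, train_clicks)
test_scores = [score_user(h, url_scores) for h in test_histories]
print("AUC against conversions:", auc(test_scores, test_conversions))
```

Swapping `train_clicks` for a different proxy label, such as site visits, changes only the training input; the test labels stay tied to conversions, which is what makes the comparison between proxies meaningful.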
Are there other good proxy candidates for evaluating and optimizing advertising campaigns? Recent research [2] answers this question. In contrast to clicks, site visits turned out to be generally good proxies for purchases. Specifically, site visits do remarkably well as the basis for building models to target browsers who will purchase after being shown the ad. In some cases, models trained on site visits even produce better results than those trained on conversions.
The results show that site visitors are more likely than ad clickers to be purchasers.
As mentioned earlier, another difficulty in predicting purchases is the huge input feature space, which typically requires dimensionality reduction. In most cases, an ad targeting system tracks over 100 million unique URLs, any of which could be used in a predictive model. It is very expensive to build and store such high-dimensional models. A number of dimensionality reduction techniques are available, but not all of them are well suited to ad targeting problems.
The simplest method for reducing a massive binary feature space is feature hashing. It transforms a bag of words into a bag of hashed IDs. Given a set of tokens and a hash function h(), we apply the hash function to each token, and the new feature space is simply the set of hashed values. In other words, the hash function generates a column index for each token. The output range of the hash function should be large enough to keep collisions rare even with a million unique tokens. The pseudocode is as follows:
function hash_vectorizer(features : string array, N : integer):
    x := new vector[N]
    for f in features:
        h := hash(f)
        x[h mod N] += 1
    return x
Dimensionality reduction results from hash collisions. For example, if a set of URLs contains {intel.com, nytimes.com, nyu.edu}, and we have h(“intel.com”) = 6, h(“nyu.edu”) = 6, and h(“nytimes.com”) = 8, then in the new space the hashed URL set has values for features 6 and 8. Hash functions typically produce 32-bit or 64-bit values, and to project into an arbitrary k-dimensional feature space, we compute h() mod k.
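A runnable counterpart to the pseudocode might look like this in Python. It is a sketch under assumptions: MD5 from the standard library stands in for h() only because it returns the same value on every run (Python’s built-in hash() is randomized per process for strings), and the choice of 16 buckets is arbitrary. The bucket indices it produces will not match the illustrative values 6 and 8 used in the example above.

```python
import hashlib
from typing import Iterable, List


def stable_hash(token: str) -> int:
    """Deterministic 64-bit hash of a token (first 8 bytes of its MD5 digest)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hash_vectorizer(features: Iterable[str], n: int) -> List[int]:
    """Map tokens to a length-n count vector; colliding tokens share a bucket."""
    x = [0] * n
    for f in features:
        x[stable_hash(f) % n] += 1
    return x


urls = ["intel.com", "nytimes.com", "nyu.edu"]
vector = hash_vectorizer(urls, 16)
print(vector)  # three counts spread over at most three of the 16 buckets
```

Whatever the number of distinct URLs, the vector always has exactly n entries, which is what bounds the size of the model built on top of it.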
Another approach is Contextual Categories. The web has a number of sources, both proprietary and free, that categorize specific web pages by their content. These categories serve as content-based groupings that can be used to reduce the dimensionality of the data. With category data, the original feature space of URLs becomes a feature space of categories.
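As a rough illustration, category-based reduction amounts to a lookup from URL to category followed by counting. The `URL_CATEGORIES` table below is a made-up stand-in for a real categorization source, and the category names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical lookup table; a real system would query a categorization source.
URL_CATEGORIES: Dict[str, str] = {
    "intel.com": "technology",
    "nytimes.com": "news",
    "nyu.edu": "education",
    "mit.edu": "education",
}


def category_features(urls: Iterable[str],
                      categories: Dict[str, str] = URL_CATEGORIES) -> Counter:
    """Replace each visited URL with its category and count the occurrences."""
    # URLs missing from the table fall into a single "unknown" bucket.
    return Counter(categories.get(url, "unknown") for url in urls)


history = ["intel.com", "nyu.edu", "mit.edu", "example.org"]
features = category_features(history)
```

Here the feature space shrinks from one dimension per URL to one per category, at the cost of treating all pages in a category as equivalent.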
There are many other techniques for dimensionality reduction, including Singular Value Decomposition (SVD) and Principal Component Analysis (PCA), which are proven to be good alternatives for reducing the huge URL feature space.
This article gave a brief overview of the targeted advertising business, a multi-billion-dollar industry that is growing dramatically. Most of the big players in the online advertising market work with real-time bidding (RTB) systems, which connect advertisers and publishers. RTB acts as an online auction that allows advertisers to bid in real time for the opportunity to place online display ads for a particular user.
Right now, the key industry metric for measuring the success of an ad campaign is click-through rate (CTR). However, recent studies have shown that site visits are a better predictor of conversion than clicks. At first sight, applying data science and machine learning to targeted advertising seems to be a trivial task. But a closer look at the problem reveals underlying difficulties, including rare conversions, a lack of training data, and a high-dimensional input feature space.
However, a number of studies have identified efficient solutions to these difficulties that produce good models for predicting future conversion events.