Did Facebook Just Beat Google At Its Own Game? (via allfacebook)

Ken Deeter, Facebook Engineer

Keywords used to be the exclusive province of Google, and one of the things that might keep Google from being completely overshadowed by Facebook. But then Twitter’s trending topics began to eat into that monopoly.

Now Facebook may show you stories from multiple people about the same topic. How does the social network do it? How do they prevent you from seeing a bunch of stories about dogs or bananas?

It turns out that one of Facebook’s engineers revealed the secret sauce on Quora. But he did so in quite opaque, academic language, so I’m going to break it down into real-people-speak (for myself and you).

Ken Deeter explains:

I was the lead engineer on this project so I’ll give this a shot. Without going into too much secret sauce…

1. We build language models based on publicly available corpora for our entity extraction. Based on this data we can extract topics at various levels of confidence. To answer your question, yes, it can figure out terms like “arrested development” out of normal text. It can also disambiguate between words like “Apple” the fruit, and “Apple” the computer company.

2. We have a second level of infrastructure that tries to use other data to increase accuracy. Generally you can think of this adding more context into the equation, whereas the first level only takes into account the text of a message.

3. We have some heuristics to decide to show a particular cluster. Generally this is a combination of trying to filter out noise from the extraction system, and deciding when something is newsworthy enough to show. Two of your friends talking about bananas, for example, is pretty uninteresting.

Like I said, we’re going to have to break that down.

Facebook’s Language Models

Facebook probably also uses lists of the names of pages with many likes (including place and community pages).

Perhaps the company dips into other publicly available lists of hot topics like Google and Twitter trends or the Yahoo Buzz Index.

The social network, however, has all kinds of data on what people on Facebook are sharing, what pages they’re commenting on and so on.

So even if the company uses only internal resources, there’s a huge amount of data on what the most popular topics are at any one time.

Context Improves Keyword Groupings

What this reminds me of is Google’s related keywords. One of the things that goes into Google’s rankings is whether you use ancillary words and phrases surrounding the main keyword.

For example, in deciding whether you should rank for “camping gear,” Google might look at whether you also talk about things like tents, boots, hiking, fires, food, and water purification. It could work like that on Facebook, which might also use a social context.
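To make the related-words idea concrete, here is a rough Python sketch. It is only an illustration: the term list, the function name context_score and the simple overlap formula are my own assumptions, not anything Facebook or Google has described.

```python
# Hypothetical set of terms one might expect around the topic "camping gear".
# The list is taken from the example in the text; real systems would learn it.
RELATED_TERMS = {"tents", "boots", "hiking", "fires", "food", "water"}


def context_score(text, related_terms=RELATED_TERMS):
    """Return the fraction of related terms that appear in the text."""
    # Lowercase each word and strip common punctuation before comparing.
    words = {w.strip(".,!?;:\"'()").lower() for w in text.split()}
    return len(words & related_terms) / len(related_terms)


post = "We packed tents and boots for a weekend of hiking."
print(context_score(post))  # 3 of the 6 related terms appear
```

A higher score would suggest the post really is about the topic, which is the kind of extra context the second level of infrastructure might add.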

I suspect from Facebook advertising’s topic targeting that the company has quantified the affinities between various precise interests. In other words, Facebook knows that if you like the band Coldplay, there’s a 35 percent chance you also like Death Cab For Cutie.

This is just an example, and probably the wrong value. But if Facebook wonders whether you’re writing about a politician and you have many politically-oriented likes in your profile, that would be a context that would increase confidence that you’re talking about that keyword.
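One simple way to picture such an affinity is as a conditional share: of the people who like A, what fraction also like B? The sketch below assumes a made-up dictionary of users and their likes; the data, the names and the formula are illustrative guesses, not Facebook’s actual method.

```python
def affinity(likes_by_user, a, b):
    """Estimate the share of users who like `a` that also like `b`."""
    fans_of_a = [likes for likes in likes_by_user.values() if a in likes]
    if not fans_of_a:
        return 0.0  # no one likes `a`, so there is nothing to estimate
    both = sum(1 for likes in fans_of_a if b in likes)
    return both / len(fans_of_a)


# Toy data, invented purely for illustration.
likes_by_user = {
    "user1": {"Coldplay", "Death Cab For Cutie"},
    "user2": {"Coldplay"},
    "user3": {"Coldplay"},
    "user4": {"Coldplay"},
    "user5": {"Death Cab For Cutie"},
}

print(affinity(likes_by_user, "Coldplay", "Death Cab For Cutie"))
```

With real data, numbers like this could feed into how confident the system is that an ambiguous word in your post refers to something you already care about.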

Heuristics, Important Topics and Salience

I just love to use the word salience whenever I can. I once studied attention deficit disorder, and in this mental condition the brain has trouble determining what is most salient (important or high priority).

If you have a low signal-to-noise ratio, cognitively, you can’t focus on one thing (signal) and ignore all the other stuff going on at the time (noise).

So Facebook is using some rules of thumb (heuristics) to arrive at whether a topic is important enough and talked about enough to show in the news feed.

The example he gives is something mundane (bananas — who cares?) and a small amount of conversation (two people).

However, we must assume that if 10,000 people talk about bananas and Google News is carrying a story about a problem with bananas, it’s an important topic and we should show posts around that keyword.
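A rule of thumb like that might look something like the following Python sketch. The function name, the inputs and the thresholds are all hypothetical; Deeter did not share the real heuristics, and treating either wide discussion or news coverage as sufficient is my own simplification.

```python
def should_show_cluster(friend_mentions, global_mentions, in_the_news,
                        min_friends=2, min_global=10_000):
    """Guess whether a topic cluster is newsworthy enough to show.

    All thresholds here are assumptions chosen to match the examples
    in the text, not real values.
    """
    # A cluster needs at least a couple of friends' stories to group.
    if friend_mentions < min_friends:
        return False
    # Show it only if the topic is widely discussed or covered in the news.
    return global_mentions >= min_global or in_the_news


# Two friends chatting about bananas: not interesting.
print(should_show_cluster(2, 50, False))
# Bananas everywhere, plus a news story: worth showing.
print(should_show_cluster(2, 10_000, True))
```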

Will Facebook Kill Google With This?

The fact that Facebook has developed algorithms around keywords is a big problem for Google. As soon as Facebook includes keywords as an option in Facebook advertising, Google AdWords (Google’s primary revenue source) becomes much less important.

AdWords may always have a leg up since it analyzes keywords for all websites, but why shouldn’t Facebook move in this direction? Why shouldn’t the social network make its search functionality as good as Google’s?

Google has not proven it can successfully imitate Facebook’s strengths, but Facebook may be showing it can duplicate Google’s.

Brian Carter is the author of The Like Economy: How Businesses Make Money on Facebook. He’s also speaking at Socialize West this Thursday.