There's a lot of talk about mining social media data but not much action, according to Matthew Russell, author of Mining the Social Web, Second Edition, a guide to social data collection and analysis. The perceived difficulty of data mining is a main barrier for those interested in getting involved. It's a false barrier, Russell said, because mining Twitter with well-known programming languages (Python in particular) doesn't require advanced developer or data scientist skills.
Mining social media data helps businesses gather crucial information. Once developers know the basics, such as making API requests and analyzing sales trends or code, they can use the insights to fuel innovation, according to Russell, CTO of cloud services provider Digital Reasoning Systems Inc. This article presents his advice for developers who, in his words, want to get their hands dirty with data.
The ultimate data mining platform
Every social network presents a value proposition for data mining, but Russell sees no better starting point than Twitter. The platform's simplicity and asymmetric following model, together with its 232 million monthly active users, make it particularly suited for data mining. Russell likened the app to a busy public street. "You have a lot of chatter and amongst all that chatter there's a signal that can be teased out."
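One common first step in teasing a signal out of that chatter is simple frequency analysis. The sketch below counts hashtags across a batch of tweet texts; the `top_hashtags` function and the sample tweets are hypothetical, and it assumes the texts have already been collected.

```python
import re
from collections import Counter

def top_hashtags(tweets, n=3):
    """Count hashtags across tweet texts and return the n most common."""
    counts = Counter()
    for text in tweets:
        # Lowercase tags so #Python and #python are tallied together.
        counts.update(tag.lower() for tag in re.findall(r"#(\w+)", text))
    return counts.most_common(n)

# Hypothetical sample of collected tweet texts.
sample = [
    "Loving the new release #python #datascience",
    "Anyone at the meetup tonight? #Python",
    "Lunch break",
    "Graph analysis is fun #datascience #python",
]

print(top_hashtags(sample, 2))  # [('python', 3), ('datascience', 2)]
```

Individual tweets say little on their own; the signal only appears once many of them are counted together.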
From a developer's perspective, Twitter is particularly suited for data mining because of three key attributes:
- Twitter's API is well designed and easy to access.
- Twitter data is in a convenient format for analysis.
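Part of that convenience is that tweets arrive as JSON, which Python parses with its standard library. The sketch below assumes a heavily abbreviated, hypothetical payload whose field names (`text`, `user.screen_name`, `entities.hashtags`) follow the shape of tweet objects from Twitter's v1.1 REST API; other API versions structure their responses differently, so treat the field names as an assumption.

```python
import json

# Hypothetical, abbreviated tweet payload modeled on v1.1 tweet objects.
# Real responses carry many more fields.
raw = """{
  "id_str": "123",
  "text": "Mining the social web with #python",
  "user": {"screen_name": "example_user", "followers_count": 42},
  "entities": {"hashtags": [{"text": "python", "indices": [27, 34]}]}
}"""

def summarize(tweet_json):
    """Parse one tweet's JSON and pull out a few fields useful for analysis."""
    tweet = json.loads(tweet_json)
    # Hashtags are pre-extracted into entities, so no text parsing is needed.
    hashtags = [h["text"] for h in tweet.get("entities", {}).get("hashtags", [])]
    return {
        "author": tweet["user"]["screen_name"],
        "text": tweet["text"],
        "hashtags": hashtags,
    }

print(summarize(raw))
```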
"I think the simplicity of [Twitter, aggregated] with millions of active accounts, creates a lot of value," Russell said. This potential value is largely untapped, however, and Russell believes executives and developers are missing opportunities to uncover important social trends.
Twitter's data is used almost exclusively for reputation management, branding and sentiment analysis; in other words, for advertising. "When you have 232 million active monthly accounts with a fairly high percentage of those active daily, that's where there are some other unique opportunities, as far as social research goes," Russell said.
He described Twitter as an interest graph, an online portrayal of an individual's or a group's interests. On a small scale, an interest graph serves to predict purchase behavior. On a larger scale, it can be used to analyze societal trends. "If you think of a following relationship as an 'interested in' relationship, which it really is, you have some pretty powerful data in aggregate," he said. When an interest graph operates on a massive scale, its potential for valuable insights begins to stretch beyond advertising. "There's already a body of data so you want to tap into that, not so much to result in an action like selling something, but really to understand what's going on in a market or in a niche area." One example is a hedge fund whose entire trading model was built on interest analysis of Twitter data, which it then used to make smarter investments.
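Reading each follow as an "interested in" edge turns a set of follow lists into something that can be aggregated. The sketch below uses hypothetical, hard-coded follow data (in practice it would come from the API) and two hypothetical helpers: one tallies interest per followed account across a group, and one finds the interests two users share.

```python
from collections import Counter

# Hypothetical follow data: each user maps to the accounts they follow.
follows = {
    "alice": {"nasa", "python", "nytimes"},
    "bob": {"python", "espn"},
    "carol": {"nasa", "python"},
    "dave": {"espn", "nytimes"},
}

def interest_counts(follows):
    """Treat each follow edge as 'user is interested in account' and
    count how many users in the group are interested in each account."""
    counts = Counter()
    for followed in follows.values():
        counts.update(followed)
    return counts

def shared_interests(follows, a, b):
    """Accounts that both users follow (set intersection)."""
    return follows[a] & follows[b]

print(interest_counts(follows).most_common(1))       # [('python', 3)]
print(sorted(shared_interests(follows, "alice", "carol")))  # ['nasa', 'python']
```

Run over the followers of a single market or niche account, the same tally hints at what else that audience cares about, which is the kind of aggregate view Russell describes.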
The value of Twitter's API should not be underestimated, in Russell's opinion. The API is an entry point through which third parties can innovate on the Twitter platform. "There are an awful lot of creative human beings out in the world who will probably have more good ideas than [Twitter] could ever come up with internally," he said. Twitter's API is an enormous and, as yet, underused resource. "Anyone can harness that energy and tap into this third-party commodity in order to innovate, from a tiny little startup of one person with a good idea to a much larger corporate entity with dozens of software developers."
Twitter's self-organizing, ever-growing pool of data offers direct insight into trends and interests on both a personal and a collective scale, but it has yet to fully capture developers' imagination. And in terms of the value that could come from mining social media data, Twitter is just the tip of the iceberg. Russell predicts, and hopes, that businesses will begin to treat ads as a means to an end and will find the true value in innovating with social media data.