I have been researching big data analytics for social networks for some time now. I would like to share what I have seen and learned with the larger audience at VMware, and maybe start some discussion on it.
Measuring IT efficiency works quite differently in big data analytics, because data capture, data processing and storage are viewed differently than in traditional systems. Dell’s Hadoop infrastructure solutions have some answers.
Big data is all the voluminous, unstructured data that comes from a wide range of sources: clickstream data from websites, social media data such as ‘Likes’, tweets and blog posts, and video entertainment as well. Just to give you an idea, Google reportedly processes about 24 petabytes of data a day, and not all of it fits in rows and columns. Consumers as well as working professionals in organizations have begun to realize the potential value and intelligence that can be derived from the vast amount of data generated through social media conversations.
Big data technologies are defined by their ability to handle large amounts of unstructured data. The server infrastructure, in turn, has to keep up with the geometric growth of social networks, where data is generated all the time and in real time.
The challenges in mining such voluminous unstructured data are of two kinds. Firstly, it requires emerging technology such as data mining grids and MapReduce infrastructures such as Hadoop, along with a non-linear and non-deterministic software architecture. This actually changes the way we think about data capture and processing.
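To make the MapReduce idea a little more concrete, here is a minimal single-machine sketch in plain Python that counts hashtags across a handful of posts. It does not use Hadoop or any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `count_hashtags`) and the sample posts are made up for illustration. On a real cluster the framework would run the map and reduce steps in parallel across many machines and handle the grouping in between.

```python
from collections import defaultdict


def map_phase(post):
    """Map step: emit a (hashtag, 1) pair for every hashtag in one post."""
    for word in post.split():
        if word.startswith("#") and len(word) > 1:
            yield word.lower(), 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a total."""
    return key, sum(values)


def count_hashtags(posts):
    """Run map, shuffle and reduce in sequence on a list of posts."""
    pairs = (pair for post in posts for pair in map_phase(post))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample posts, not real data.
    sample = ["Loving #BigData and #hadoop", "#bigdata is everywhere"]
    print(count_hashtags(sample))
```

The point of the split is that each post can be mapped independently of every other post, which is what lets this style of processing scale out over unstructured text.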
Secondly, it is a known fact that ‘what we measure is what we manage’. We need to know ‘what we are looking for’, and the timing, ‘when to ask the question’, is important. ‘Spotting trends’ is one emerging area in social media analytics. Still, the question lingers on: do you know what you are looking for?
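As a rough illustration of why both the ‘what’ and the ‘when’ matter, here is a small Python sketch that flags terms appearing much more often in a recent window of posts than in an earlier one. It is only one simple way to think about spotting trends, not an established method from any product; the function name `rising_terms`, the `min_ratio` threshold and the add-one baseline are my own assumptions.

```python
from collections import Counter


def rising_terms(earlier_posts, recent_posts, min_ratio=2.0):
    """Return terms whose recent count is at least min_ratio times the earlier baseline."""
    earlier = Counter(w.lower() for post in earlier_posts for w in post.split())
    recent = Counter(w.lower() for post in recent_posts for w in post.split())
    rising = {}
    for term, count in recent.items():
        # Add one to the earlier count so brand-new terms do not divide by zero.
        baseline = earlier.get(term, 0) + 1
        ratio = count / baseline
        if ratio >= min_ratio:
            rising[term] = ratio
    return rising


if __name__ == "__main__":
    # Hypothetical posts, split into two time windows by the caller.
    earlier = ["hadoop is nice", "hadoop again"]
    recent = ["outage outage outage reported", "hadoop is fine"]
    print(rising_terms(earlier, recent))
```

Where the caller draws the line between the two windows is the ‘when to ask’ decision, and the threshold is the ‘what we are looking for’ decision; change either and a different set of terms comes back.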
Cheers.