An architect's guide: How to use big data
A comprehensive collection of articles, videos and more, hand-picked by our editors
Left unmanaged, big data becomes dead weight. Luckily, there are insights to be gained and repeatable patterns to be identified for those businesses willing to capture, analyze and store the massive amounts of information streaming through their networks each day. While big data tools are still in their infancy, the industry is expected to grow tremendously in the next few years, and companies will gain long-term value from investing in their own big data before it gets too big to manage.
Below are answers to frequently asked questions on what big data is, why it can be a problem and how it can be a resource.
What is big data?
Big data is the structured and unstructured information that a company accumulates in high volumes and at high velocity. Depending on how it's managed, big data can be a valuable resource or an expensive problem. Used for analysis, it provides insight into business trends and patterns. Left unattended, however, it becomes costly to store, and locating specific information, which may be critical in the event of an audit, becomes difficult.
What are the primary challenges associated with big data?
In her article about big data problems and solutions, SearchSOA Site Editor Maxine Giza referred to the primary challenges as the three Vs: volume, velocity and variety. Volume is the sheer amount of information that needs to be managed, analyzed, stored and secured. Velocity is the speed at which data streams must be captured and processed. Variety refers to the many different types of data that add further complexity to an already daunting task.
How can I mitigate some of these challenges?
Big data tools such as Hadoop are one way to manage large quantities of data. Hadoop is an open source programming framework that supports massive data sets and large-scale data transfers. Its streaming technology can capture and store information as quickly as it flows in. Design patterns are another way to reduce some of the complexity associated with big data: they provide template solutions to recurring problems in big data management. Using a variety of purpose-driven design patterns, developers can mash up semi-structured data, find event sequence signals, respond to signal patterns in real time and match up cloud-based data services.
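At the heart of Hadoop is the MapReduce pattern: a mapper emits key-value pairs, the framework shuffles them by key, and a reducer aggregates each group. The following is a minimal, self-contained sketch of that pattern using the classic word-count job; the `shuffle` function here is a stand-in for the distributed shuffle phase Hadoop performs across a cluster, and the sample input is purely illustrative.

```python
# Minimal sketch of the MapReduce pattern behind Hadoop (word count).
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for each word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group mapper output by key, as Hadoop's shuffle phase would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Sum the counts emitted for a single word."""
    return key, sum(values)


def word_count(lines):
    """Run the full map -> shuffle -> reduce pipeline over some lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big tools", "data streams"])
print(counts["big"])   # 2
print(counts["data"])  # 2
```

In a real Hadoop job the mapper and reducer run as separate tasks over file splits in HDFS, but the contract they implement is exactly this one.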
How will big data impact the industry?
According to Gartner Inc., big data technology will grow radically over the next few years. Gartner research predicts that by 2015, 4.4 million IT jobs will have been created to staff big data initiatives, and a Gartner survey of almost 500 IT executives found that 40% have invested or plan to invest in big data technology.
Should I use SOA or REST when developing big data applications?
This is an important consideration for architects looking to build big data applications. Big data tools usually expose a combination of Representational State Transfer (RESTful) and SOA application programming interfaces (APIs), which makes it harder to determine which one best fits the development process. In his article on big data applications, Tom Nolle advises architects to use SOA in scenarios where the applications are intended to use big data for specific analysis. For applications that use big data as a resource set but aren't intended to consume high-level services, RESTful interfaces are more appropriate. Security is also a consideration: SOA security can be integrated into the application's access controls and user directories, whereas with REST, security would need to be handled externally.
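The "big data as a resource set" case Nolle describes maps naturally onto REST: each record becomes an addressable resource retrieved with a plain HTTP GET. The sketch below is a hypothetical illustration using only Python's standard library; the `/records/<id>` path and the sample record are assumptions made for the example, not any particular big data tool's API.

```python
# Hypothetical sketch: exposing data records as RESTful resources.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Stand-in for a big data resource set (assumed sample data).
RECORDS = {"1": {"id": "1", "source": "clickstream", "events": 42}}


class RecordHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Treat the last path segment as the resource identifier.
        record = RECORDS.get(self.path.rsplit("/", 1)[-1])
        if record is None:
            self.send_error(404)
            return
        body = json.dumps(record).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example's output quiet


# Serve on an ephemeral port in a background thread.
server = HTTPServer(("127.0.0.1", 0), RecordHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/records/1"
record = json.loads(urlopen(url).read())
print(record["source"])  # clickstream
server.shutdown()
```

Note what is absent: no service contract, no operation names, just a uniform interface over resources. An SOA-style API for the same data would instead define analysis operations, with security woven into the service's own access controls.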
Is there a certain skill set that is necessary for managing big data?
Data scientist is an emerging role that will likely be increasingly in demand as big data continues to gain momentum. It is a difficult skill set to find because it requires knowledge of newer technologies such as Hadoop and Cassandra, familiarity with the domain where the data resides, and creative analytical and problem-solving abilities.