Big Data, Data Scientists, and You
Since the early 2000s, Big Data (as both a concept and an industry) has been on the rise, and with it has come the creation of many new jobs and positions. Many businesses are eager to hop aboard the hype train and get in on all the benefits Big Data has to offer, but often don’t know what it is or how to take advantage of it.
In short, Big Data is usually defined in terms of three aspects of the data that floods a business on a day-to-day basis, a framing introduced by industry analyst Doug Laney in 2001. These aspects, known as the Three Vs, are Volume, Velocity, and Variety.
- Volume refers to the sheer amount of data that a company takes in on a daily basis, through events such as business transactions or social media interactions.
- Velocity deals with how fast data arrives. Technologies like RFID tags, sensors, and smart metering produce large, continuous data flows that have to be handled and organized as they come in.
- Variety refers to all the formats that data comes in, whether it’s structured numerical data in a database or free-floating emails, text documents, or financial transactions.
It’s a lot to sift through, and most executives have no idea how to manage it all, never mind extract useful information out of it. That’s where the data scientists come in.
So what exactly is a data scientist?
According to IBM, a data scientist is similar to a business or data analyst in that they use data to uncover information about specific topics, and then interpret that information and present it in the form of graphs and charts. Where they differ is that they have a much stronger grasp of the business side, and that they’re able to communicate their findings clearly to both business and IT professionals. Their job is to take all the data gathered by a company, analyze it, and come up with solutions to problems based on the results.
Perhaps the best way to describe them is as individuals who are experts in statistics, machine learning, and software engineering. Retailers, social media sites, and online services all use algorithms designed by data scientists that analyze user behavior in order to make recommendations and improve the user’s experience.
A Venn diagram of the skills required to make a data scientist. Image courtesy of drewconway.com
Additionally, data scientists have been involved in projects in sectors you might not think of. For example, data scientists from the Data Science for Social Good fellowship program were brought in to help reduce Chicago’s bus crowding issues. Many basketball teams, such as the Toronto Raptors, have also started installing sports cameras to analyze players’ styles and movement patterns, which helps identify in-game trends and improve coaching. Physicists apply data science methods to help organize and make sense of the vast amounts of data collected by the Sloan Digital Sky Survey.
Does my company need a data scientist?
The above examples are high-profile situations that most companies aren’t exactly concerned about. Many data scientists who are hired by the average business work with data that answers questions like “what will next month’s revenue be?” or “which offer should I present to a user?” In all likelihood, most small businesses aren’t going to need a data scientist on their payroll, simply because they’re not producing the amount of data that needs the attention of a specialized worker. A lot of the problems that data scientists solve can be handled heuristically on a small scale without cause for concern. For projects that may require the touch of a data scientist, outsourcing to technical solution companies (like DCD) will be far more cost-effective than hiring for a permanent position.
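To give a sense of what handling a question “heuristically” can look like, here is a minimal sketch in Python that estimates next month’s revenue as the average of the last few months. The function name, the three-month window, and the revenue figures are all made up for illustration; a real business would pick its own rule of thumb, and this is not a substitute for a proper forecasting model.

```python
from statistics import mean


def forecast_next_month(monthly_revenue, window=3):
    """Estimate next month's revenue as the average of the last `window` months.

    This is a simple moving-average rule of thumb, not a statistical model.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(monthly_revenue) < window:
        raise ValueError("not enough months of history for this window")
    # Average only the most recent `window` months.
    return mean(monthly_revenue[-window:])


# Hypothetical revenue figures, oldest month first (illustration only).
revenue = [10_000, 12_000, 11_000, 13_000]
print(forecast_next_month(revenue))
```

If a rule of thumb like this answers the question well enough for day-to-day planning, that is a sign the problem may not yet call for a dedicated data scientist.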
KDNuggets suggests considering five questions before looking into hiring a data scientist.
- Do I know what a data scientist does?
- Do I have enough data available?
- Do I have a specific problem to solve?
- Can I get away with heuristics, intuition, and/or manual processes?
- Am I committed to being data driven?
A graph of data scientists employed, by industry. Image courtesy of KDNuggets.com
If you’re looking to hire a data scientist, it’s arguably most important to be able to answer questions 1 and 3 with a firm, solid yes. Hiring a data scientist without understanding who they are or what they do means there is a chance they will be unproductive and become a drain on company resources. It’s easy for a data scientist to generate reports based on data gathered by the company, but without a specific goal in mind, these reports don’t accomplish much.
Overall, Big Data and Data Science are still fairly new concepts in the IT industry, and they’re still finding their footing. While taking advantage of this new aspect of technology is an exciting prospect, small companies are simply better off waiting until the field irons out its kinks before adding data science to their business model.
Still confused about what a data scientist does? If you’re a data scientist, is there any input you would like to share? Does your company employ one? Why or why not? Comment and share!