Being hired as the resident IoT data scientist can come with a lot of pressure. Often the only person on the team able to turn data into business intelligence, the data scientist is responsible for making key IoT decisions, setting plans, ensuring execution, and meeting deliverables. On top of this, a number of stumbling blocks at the outset can make it hard to reach goals. Being aware of these challenges not only puts a data scientist on the shortest route to success, it also makes it easier to identify where and when more help will be needed.
Here are some of the toughest challenges data scientists face when starting an IoT project:
1. Poor data quality
Regardless of the business type or size, a data scientist is bound to find messy data. Organizing it can take significant time and effort.
It’s imperative to avoid manual data entry where possible. Application integration tools are one way to automate data entry and reduce the proliferation of typographical errors, alternate spellings, and individual idiosyncrasies in the data. Careful data preparation is also key to good data quality. This involves clear communication and documentation of placeholder values, calculation and association logic, and cross-dataset keys. It should also include using well-defined industry standards, continuous anomaly detection, and statistical validation techniques (such as tracking frequency and distribution characteristics on incoming and historical data).
Data scientists should make clear to program stakeholders what this step involves and why thoroughness matters; otherwise, the quality of the result can be jeopardized. Time spent ‘readying data’ also helps prevent rework down the line.
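As a rough illustration of the statistical validation idea, the sketch below compares an incoming batch of sensor readings against historical readings. The function name `batch_drift_report`, the `-999` placeholder, and the three-standard-deviation threshold are all assumptions made for the example, not a prescribed method.

```python
from statistics import mean, stdev


def batch_drift_report(historical, incoming, z_threshold=3.0, placeholder=None):
    """Compare an incoming batch of readings against historical readings.

    Hypothetical helper: the name, threshold, and placeholder handling are
    illustrative assumptions.
    """
    # Drop the documented placeholder value before computing statistics.
    clean = [x for x in incoming if x != placeholder]
    if not clean:
        raise ValueError("incoming batch contains no usable readings")
    placeholder_share = 1 - len(clean) / len(incoming)

    hist_mean = mean(historical)
    hist_std = stdev(historical)

    # z-score of the batch mean, using the standard error of a mean
    # drawn from the historical distribution.
    std_error = hist_std / len(clean) ** 0.5
    mean_z = (mean(clean) - hist_mean) / std_error

    # Individual readings far outside the historical spread.
    outliers = [x for x in clean if abs(x - hist_mean) > z_threshold * hist_std]

    return {
        "placeholder_share": placeholder_share,
        "mean_z": mean_z,
        "outliers": outliers,
    }


historical = [20.0, 21.0, 19.0, 20.5, 19.5, 20.0, 21.0, 19.0]
incoming = [20.0, -999, 35.0, 19.5]
report = batch_drift_report(historical, incoming, placeholder=-999)
```

A report like this can be tracked per batch, so that a rising share of placeholder values or a shifting mean is noticed before it reaches a model.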
Once data is uniform and consistent, you are ready to start weeding out the data you don’t need. This is an essential step in ensuring data quality, which brings us to challenge number two.
2. Too much data
Despite the current hype around ‘big data,’ an overabundance of data can actually cause a host of problems that prevent meaningful progress. In these instances, reducing the number of features through dimensionality reduction and feature selection techniques (such as PCA and penalization methods) can help eliminate the noise and cut through to what matters most.
One common misstep when performing predictive analytics, for example, is collecting too much data that is unrelated to the goal. If the data set becomes too large, you may fall into the trap of developing predictive models that look excellent but don’t deliver results, due to a combination of high-variance fields and an inability to generalize well. Conversely, if you track too many occurrences without robust validation procedures and statistical tests in place, rare events may seem more frequent than they actually are. In either circumstance, validation and testing routines are paramount.
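To make the PCA idea concrete, here is a minimal sketch that finds the first principal component of a small table using power iteration, written with the standard library only. The function name and the toy readings are assumptions for illustration; in practice a numerical library would normally handle this.

```python
def first_principal_component(rows, iterations=200):
    """Return (direction, explained_share) for the top principal component.

    Illustrative sketch: assumes the starting vector of ones is not
    orthogonal to the top component and that the data has some variance.
    """
    n, d = len(rows), len(rows[0])

    # Center each feature (column) on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

    # Power iteration converges toward the eigenvector with the
    # largest eigenvalue of the covariance matrix.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]

    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance


# Toy data: the second feature is exactly twice the first,
# and the third is small jitter around zero.
rows = [
    [1.0, 2.0, 0.1],
    [2.0, 4.0, -0.1],
    [3.0, 6.0, 0.1],
    [4.0, 8.0, -0.1],
    [5.0, 10.0, 0.0],
]
direction, explained_share = first_principal_component(rows)
```

In this toy table nearly all of the variance lies along a single direction, which suggests that three columns could be summarized by one without losing much.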
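One simple validation routine is a holdout split: fit on part of the data and measure error on the rest. The sketch below uses a deliberately high-variance "memorizing" model (a one-nearest-neighbor lookup) on made-up data to show how a perfect score on training data can hide poor generalization. All names and values here are hypothetical.

```python
import random


def holdout_split(records, validation_share=0.25, seed=0):
    """Shuffle records and split them into (train, validation) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_share))
    return shuffled[:cut], shuffled[cut:]


def nearest_neighbor_predict(train, x):
    """Memorizing model: return the target of the closest training point."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def mean_abs_error(train, records):
    """Average absolute error of the memorizing model on the given records."""
    errors = [abs(nearest_neighbor_predict(train, x) - y) for x, y in records]
    return sum(errors) / len(errors)


# Made-up data: a linear trend plus a small deterministic wobble,
# built so that every target value is distinct.
data = [(i, 2 * i + (2 * i) % 5 - 2) for i in range(20)]

train, valid = holdout_split(data)
train_error = mean_abs_error(train, train)  # the model has seen these points
valid_error = mean_abs_error(train, valid)  # the model has not seen these
```

The training error is zero because the model simply looks up what it has already seen; only the validation error says anything about how it would behave on new readings.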
Completing this step might promote the illusion that anything is possible, including the coveted ability to predict critical business events. This couldn’t be further from the truth, as you’ll see in challenge three.
3. Predicting events is not easy
Predictive analytics is an exciting capability made possible by IoT. Because of its perceived value to business, it can quickly become the priority of IoT stakeholders. But predictive analytics is not possible or valuable in all instances. It’s essential to first establish a clear objective for your analytics program and follow that with research to ensure its viability and value upon completion.
For example, an oil and gas business might want to predict failure of oil pumps. The next step is syncing with subject matter experts to determine what predictions will aid in achieving the goal. Next, you’ll need to be sure you have all of the data required to make the prediction. In some cases, plans can be made to obtain data you may be lacking. At other times, you may need to re-set goals.
Before starting any IoT program, data scientists should consider seeking the help of an outsourced data scientist or data science services. Doing so can help avoid mistakes, conserve internal resources, and reduce time to value. (In “4 ways data science services is helping businesses reach IoT goals faster,” I discuss the growing trend of using outside data science services to review more data faster and run multiple programs in parallel instead of sequentially.)
Another hurdle data science services help organizations overcome is the educational gap faced by data scientists applying their skills to a particular vertical market, such as manufacturing or transportation. Working with a team familiar with your industry, a data scientist can quickly learn and apply best practices to their program for optimized results.
Data scientists who prepare for common data science challenges, plan proactively to overcome them, and lean on other experts for help where needed will reach success much faster, driving up the value of their contributions and their influence on the business.
This article is published as part of the IDG Contributor Network.