Comprehensive Data Solutions

Covering Your Entire Data Infrastructure

Data is changing the world more than ever, and companies need data engineers who understand the whole picture - engineers who can provide comprehensive solutions. See how our turnkey solutions can transform your company's data infrastructure.

Collection

From the outset, data must be gathered correctly and accurately.

Analysis

The power of data comes from the insights and the story that analysis reveals.

Storage

Collecting data is not enough. It must be stored in a way that allows it to be retrieved efficiently.

Visualization

Even with good analysis, we need to visualize the story of the data.

Data Collection

The first step in any data solution is the actual collection of data. There are three common ways of collecting data:

  • Human Entry
  • IoT Devices
  • ETL Processes

We are most familiar with human entry. This is as it sounds - a person enters data, usually through some kind of UI (User Interface). Examples include entering data as another person gives their name, phone number and so forth. Or a biologist working in the field, entering soil sample information. Or perhaps a loan officer entering suggested values for determining a loan applicant's viability.

IoT (Internet of Things) refers to the world of electronic sensors and other devices collecting data. These are now ubiquitous in much of the world, collecting data on almost everything imaginable, from cars passing on a highway, to temperature readings every ten seconds, to a heart-rate monitor, to video surveillance. And much, much more. This way of collecting data has expanded the amount of data being collected by orders of magnitude. I have worked on systems that log millions of rows of data every hour. And this data translates into money saved, efficiency increases, and other significant benefits.

Finally, ETL processes also collect data, although technically they are really just moving or transforming data. This may include internal flows, where data in one store is copied or aggregated, and then inserted elsewhere. It may also include pulling data from third-party sources, such as the National Weather Service.
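As an illustration of the internal flow described above, here is a minimal sketch of an extract-transform-load step using Python's built-in sqlite3 module. The table names (raw_readings, hourly_avg), the column names, and the sample rows are all hypothetical, chosen only for the example. A real pipeline would differ in its sources, scheduling, and error handling.

```python
import sqlite3


def run_etl(conn: sqlite3.Connection) -> int:
    """Aggregate raw readings into hourly averages; return the rows loaded."""
    # Extract: read raw rows, truncating the ISO timestamp to the hour.
    rows = conn.execute(
        "SELECT sensor_id, substr(recorded_at, 1, 13), value FROM raw_readings"
    ).fetchall()

    # Transform: accumulate a running sum and count per (sensor, hour).
    totals = {}
    for sensor_id, hour, value in rows:
        running_sum, count = totals.get((sensor_id, hour), (0.0, 0))
        totals[(sensor_id, hour)] = (running_sum + value, count + 1)
    summary = [
        (sensor_id, hour, running_sum / count)
        for (sensor_id, hour), (running_sum, count) in sorted(totals.items())
    ]

    # Load: insert the aggregated rows into the target table.
    conn.executemany(
        "INSERT INTO hourly_avg (sensor_id, hour, avg_value) VALUES (?, ?, ?)",
        summary,
    )
    conn.commit()
    return len(summary)


def demo_etl():
    """Build a tiny in-memory database (hypothetical data) and run the ETL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_readings (sensor_id TEXT, recorded_at TEXT, value REAL)"
    )
    conn.execute(
        "CREATE TABLE hourly_avg (sensor_id TEXT, hour TEXT, avg_value REAL)"
    )
    conn.executemany(
        "INSERT INTO raw_readings VALUES (?, ?, ?)",
        [
            ("s1", "2024-01-01T10:05:00", 20.0),
            ("s1", "2024-01-01T10:35:00", 22.0),
            ("s1", "2024-01-01T11:05:00", 30.0),
        ],
    )
    loaded = run_etl(conn)
    result = conn.execute(
        "SELECT sensor_id, hour, avg_value FROM hourly_avg ORDER BY hour"
    ).fetchall()
    return loaded, result
```

The aggregation is done in Python here to keep the three stages visible. In practice the same transform could often be pushed into a single SQL statement.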

Whichever method is used, collection must be done effectively, or the resulting data set is less than optimal, and in some cases may be unreliable or even useless. The key is to collect the right data, in the right format, at a reliable rate, so that it can be used later for analysis and visualization - so that the data can in fact tell its story. If this first step in the data solution is flawed, all subsequent steps are likewise damaged.
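One way to enforce "the right data, in the right format" is to validate each record at the point of collection. The sketch below assumes a hypothetical temperature reading with the fields sensor_id, recorded_at, and temperature_c. The field names and the plausible range are illustrative assumptions, not a standard.

```python
from datetime import datetime

# Hypothetical required fields for one incoming reading.
REQUIRED_FIELDS = ("sensor_id", "recorded_at", "temperature_c")


def validate_reading(record: dict) -> list:
    """Return a list of problems; an empty list means the record is accepted."""
    problems = [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if field not in record
    ]
    if problems:
        return problems

    # Format check: the timestamp must parse as an ISO 8601 string.
    try:
        datetime.fromisoformat(record["recorded_at"])
    except (TypeError, ValueError):
        problems.append("recorded_at is not an ISO 8601 timestamp")

    # Type and range check: the bounds below are an assumed plausible range.
    temp = record["temperature_c"]
    if isinstance(temp, bool) or not isinstance(temp, (int, float)):
        problems.append("temperature_c is not a number")
    elif not -90.0 <= temp <= 60.0:
        problems.append("temperature_c is outside the plausible range")

    return problems
```

Rejecting or quarantining a bad record at this stage is usually far cheaper than discovering it later during analysis.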

Data Storage

Data storage is the process of taking actual bytes of data and storing them in some sort of data store, where they can be retrieved quickly and efficiently later. This process is much more difficult than many realize, and it is an area where many teams go wrong in their data solution.

Data storage is a science in and of itself, and differs drastically from other, more common software work. One could write the most amazing software algorithm, one that could finally and definitively determine if Captain Kirk is in fact "better" than Captain Picard, and yet that same developer may not have a clue as to how to store large amounts of data so that exactly the subset that is needed can be retrieved quickly.

What also makes data storage complicated is how sharply performance can degrade as the amount of data grows. If you have one hundred rows of data in a database, you can retrieve them quickly no matter how you stored them. Now ramp that up to one thousand, one million, how about one billion rows of data? And we're just getting started. If you have stored ten billion rows of data, can you retrieve the subset you need in less than one second? What if you need it in one tenth of a second? Many systems break down, even to the point of taking down the server itself, when too much data is inserted into a store that was not designed for it.
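To make the point concrete, here is a small sketch using Python's built-in sqlite3 module. It asks SQLite how it plans to run the same query before and after an index is added. The table name readings, the index name idx_readings_sensor, and the generated rows are hypothetical. The exact wording of the plan text varies between SQLite versions, so the sketch only returns it rather than assuming a particular string.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]


def compare_plans():
    """Show how the plan for one query changes once an index exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor_id INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        ((i % 100, float(i)) for i in range(10_000)),
    )
    sql = "SELECT value FROM readings WHERE sensor_id = ?"

    # Without an index, SQLite has to scan every row in the table.
    before = query_plan(conn, sql, (42,))

    # With an index on the filtered column, it can search the index instead.
    conn.execute("CREATE INDEX idx_readings_sensor ON readings (sensor_id)")
    after = query_plan(conn, sql, (42,))
    return before, after
```

On ten thousand rows the difference in run time is barely noticeable. On billions of rows, a full scan versus an index search can be the difference between a usable system and an unusable one. Indexing is only one of several storage design decisions, shown here because it is the simplest to demonstrate.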

As would be expected, if you store data, but cannot retrieve it, the data solution becomes useless, and the last two steps cannot be performed effectively.

Data Analysis

We are now at a very fun step, data analysis. This is the process of determining what the data is actually saying. A billion rows of data, sitting in a database, is not saying much at all. What is the story? What can be gleaned? What useful, summary information can we learn from all this data?
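As a very small example of turning rows into summary information, the sketch below groups readings by sensor and reports a count, minimum, maximum, and mean for each. The (sensor_id, value) input shape is an assumption made for illustration.

```python
from statistics import mean


def summarize(readings):
    """Group (sensor_id, value) pairs and compute simple summary statistics."""
    by_sensor = {}
    for sensor_id, value in readings:
        by_sensor.setdefault(sensor_id, []).append(value)

    return {
        sensor_id: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for sensor_id, values in by_sensor.items()
    }
```

Real analysis goes well beyond this, but even a summary this simple turns a long list of rows into something a person can read.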

In some industries, data is used for marketing and sales. Determining what sells, and why. In some industries, data increases profit. Monitoring the stock market. Following the trends in the price of gold. In some industries, data tries to predict the future. Based on the thermal sensors, this machine is about to go down. The engine on this plane is vibrating more than normal, and should be shut down. The cases of COVID-19 in this area suggest that a major outbreak may be coming.
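The "vibrating more than normal" case can be sketched with a simple statistical threshold: compare the latest reading with a baseline of known-good readings. The function name, the choice of three standard deviations, and the input shape are all assumptions made for illustration. Production systems typically use more sophisticated models.

```python
from statistics import mean, stdev


def is_abnormal(baseline, latest, k=3.0):
    """Flag a reading that exceeds the baseline mean by more than k standard deviations.

    baseline: known-good readings (at least two values are required by stdev).
    latest:   the newest reading to check.
    k:        how many standard deviations count as abnormal (assumed default).
    """
    threshold = mean(baseline) + k * stdev(baseline)
    return latest > threshold
```

A check like this only looks at one reading at a time. Trends over time, seasonality, and sensor faults all need additional handling.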

In most cases, data is used to increase profitability in one way or another. It can also be used to protect health and safety. It can be used in scientific studies to discover new things or confirm current theories. But in the end, it is the story that the huge mounds of data tell that makes the whole data solution worth its keep.

Data Visualization

Okay, so we have collected, stored and analyzed data. A lot of it. Good job! There does, however, remain one final, key step in the whole solution. We need to take the summations, metrics, and whatever analyses we have run, and present these to the end users - and these end users are almost certainly a mixed group of people. Some may have statistical and data knowledge. Most will not. Some may understand logarithmic graphs, histograms, and box-and-whisker charts, but for many these widgets may only confuse the story. Choosing the right tools and visualizations is key to completing the data solution and presenting an understandable, lucid story to the interested parties.

Sometimes the data is clear enough to the end users that it simply needs to be presented in the right set of widgets. In a manufacturing context, the shop floor supervisor needs to see that the machinery is operating within acceptable tolerances. The safety person needs to see that proper protocols are being observed, and that hazards are being avoided or tackled immediately. The inventory people need to see that the right amount of raw material is coming in and getting stored in the right places, and that the right amount of product is headed out the door.
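For the shop floor example, even a plain text status board can carry the message. The sketch below is a minimal illustration. The machine names, the readings, and the tolerance limits are hypothetical, and a real dashboard would use a proper charting or dashboard tool.

```python
def status_lines(readings, limits):
    """Format one status line per machine, flagging out-of-tolerance values.

    readings: mapping of machine name to its latest measured value.
    limits:   mapping of machine name to a (low, high) tolerance pair.
    """
    lines = []
    for machine, value in sorted(readings.items()):
        low, high = limits[machine]
        status = "OK" if low <= value <= high else "OUT OF TOLERANCE"
        lines.append(f"{machine:<10} {value:>8.2f}  {status}")
    return lines


if __name__ == "__main__":
    # Hypothetical readings and limits, for demonstration only.
    for line in status_lines(
        {"press": 5.0, "lathe": 12.5},
        {"press": (4.0, 6.0), "lathe": (8.0, 12.0)},
    ):
        print(line)
```

The design choice is to put the conclusion (OK or not) next to the number, so the supervisor does not need to remember the limits to read the board.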

The list goes on and on, but key to the data solution is providing the right data to the right people in the right context, so that the message gets through and the right actions can be taken in response. In many cases, this data story is far more than a nice convenience - it may be the factor that determines whether a person is injured, or whether a colossal amount of money is lost. Data, properly collected, stored, analyzed and presented, must be recognized as a game changer in a company's effectiveness, and a differentiator in competing successfully in today's world.