As discussed above, enterprises are creating, collecting, and replicating brain-numbing amounts of data. The volume and diversity of that data mean that most organizations run several analytics platforms to derive insight from multiple data hubs. These platforms and hubs lead to myriad data models and toolsets, and ultimately to complexity that makes data harder to keep usable, accessible, and secure. And if the C-suite needs rapid analysis of real-time data? That’s going to be a very tall order, indeed.
As an example, let’s say a large enterprise wants a holistic view of its customers: are they getting full value from the company’s solutions, and how satisfied are they? The company will need to pull data from its SaaS-based CRM, data warehouses, net promoter score (NPS) systems, support ticketing platforms, and likely many other sources. These data workflows will need to be orchestrated, and the data integrated and transformed into a central repository, ideally cloud-based, for analysis.
“When you have data sources spread across different areas of your organization, you have different permissions on each of those solutions,” said Niamh O’Brien, manager of solution architecture at Fivetran, a Google Cloud technology partner. “Different people have different access levels, so it’s highly inefficient if you’re actually trying to do analytics across a number of business units. So having the ability to centralize those sources into the cloud, there’s enough infrastructure there that you’re actually able to analyze across different business units. [This] will remove a lot of the bottlenecks and friction points that exist within your organization, not just from a data perspective, but also from a people and a collaboration perspective.”
Manually creating data pipelines using a traditional extract, transform, and load (ETL) approach consumes a great deal of resources, because a traditional enterprise data warehouse doesn’t accommodate new data sources easily. Integrating them is a time-consuming, cumbersome process. Even worse, many such warehouses don’t easily support machine learning or predictive analytics, both of which are rapidly becoming table stakes in a modern marketplace. This is time a company cannot afford to burn. In fact, building the pipelines often takes so long that organizations end up making business decisions on raw data at the ingestion stage, because they simply can’t wait for the data to be ready to analyze.
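To make that maintenance burden concrete, here is a minimal sketch of what a single hand-coded extract-and-load step might look like. The API endpoint, token, and dataset/table names are hypothetical, and a real pipeline would still need paging, retries, incremental loading, and monitoring on top of this, repeated for every source.

```python
# Hypothetical hand-coded extract-and-load step for one source.
# Endpoint, token, and destination names are illustrative only.
import requests
from google.cloud import bigquery


def sync_support_tickets():
    # Extract: pull raw records from a (hypothetical) ticketing API.
    resp = requests.get(
        "https://api.example-ticketing.com/v1/tickets",
        headers={"Authorization": "Bearer <token>"},
        timeout=30,
    )
    resp.raise_for_status()
    tickets = resp.json()["tickets"]

    # Transform: hand-pick the fields the warehouse table expects.
    rows = [
        {
            "ticket_id": t["id"],
            "customer_id": t["customer_id"],
            "status": t["status"],
            "opened_at": t["created_at"],
        }
        for t in tickets
    ]

    # Load: append into a BigQuery table (dataset/table names assumed).
    client = bigquery.Client()
    errors = client.insert_rows_json("analytics.support_tickets", rows)
    if errors:
        raise RuntimeError(f"Load failed: {errors}")


if __name__ == "__main__":
    sync_support_tickets()  # Every source needs its own version of this.
```

Multiply that script by every CRM object, NPS export, and ticket queue in the earlier example, and the resource drain becomes clear.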
What’s more, even after an engineer creates the pipelines, they remain fragile. The system is complex, and manual methods cannot keep pace with change. API updates, revised business processes, new tables and fields, new custom objects: any of these can break the workflow. An engineer then has to troubleshoot the problem and adjust the pipeline before data flows properly again. Just as problematic, the organization becomes dependent on the engineer or engineers who originally built the pipeline. Without their specific expertise and familiarity with the system, anyone who needs to fix and maintain it must first learn its inner workings, which again delays time to insight.
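As a small illustration of that fragility, consider the hard-coded field mapping from the earlier sketch; the field names are again hypothetical. A renamed or restructured field in the source API silently stops the flow of data until someone notices and patches the code.

```python
# Illustrative only: suppose a new version of the source API renames
# "created_at" to "opened_at". The hard-coded mapping below now raises
# KeyError, and the pipeline stops delivering data until it is patched.
def to_row(ticket: dict) -> dict:
    return {
        "ticket_id": ticket["id"],
        "customer_id": ticket["customer_id"],
        "status": ticket["status"],
        "opened_at": ticket["created_at"],  # breaks after the API change
    }
```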
Finally, these data pipelines must be secure. A constant stream of manual changes raises the likelihood of human error, creating vulnerabilities that malicious actors can exploit.
To prevent these problems, organizations should automate the creation of data pipelines with an intelligent, self-healing data integration solution that delivers reliable data to a cloud-based data warehouse. Modern solutions come with hundreds of prebuilt, automated data pipelines that cut creation time from months to minutes. In the background, these platforms are supported by skilled engineers who continuously update the pipelines to accommodate new API and database versions, ensuring that organizations make decisions based on the most current data.
Fivetran provides just such a data integration solution, and it integrates directly with Google Cloud BigQuery, a fully managed, serverless data warehouse. Fivetran is secure and self-healing, so data pipelines are not just fast to set up but also reliable, even in the face of constant change. Together with Google Cloud, Fivetran lets organizations dramatically accelerate the ingestion and transformation of data, shortening time to insight.
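Once the pipelines land data in BigQuery, analysts can query the centralized sources directly. The sketch below uses the standard google-cloud-bigquery Python client and assumed dataset and table names to join CRM accounts with NPS responses and support tickets, approximating the holistic customer view described earlier.

```python
# Query centralized customer data in BigQuery (dataset/table names assumed).
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT
  a.account_id,
  a.account_name,
  AVG(n.score)       AS avg_nps,
  COUNT(t.ticket_id) AS open_tickets
FROM `analytics.crm_accounts` AS a
LEFT JOIN `analytics.nps_responses` AS n
       ON n.account_id = a.account_id
LEFT JOIN `analytics.support_tickets` AS t
       ON t.customer_id = a.account_id AND t.status = 'open'
GROUP BY a.account_id, a.account_name
ORDER BY avg_nps
"""

# Run the query and print one summary line per account.
for row in client.query(sql).result():
    print(row.account_name, row.avg_nps, row.open_tickets)
```

Because the heavy lifting of extraction and loading is automated upstream, the analytics team spends its time on queries like this rather than on pipeline maintenance.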