December 7, 2022


Data is flowing faster. As we have noted here recently, modern business is now increasingly running on data streaming technologies designed to channel a flood (in a positive sense) of real-time data into and out of applications, across analytics engines and through database structures.

Some of that data flow will now reside and be processed in well-known enterprise databases from the data vendors that even average non-technical laypersons will have heard of. Other elements of that streamed data flow need to be churned and wrangled through the new and more powerful services offered by the major ‘hyperscaler’ Cloud Service Providers (CSPs).

Getting data from one (often legacy) database into a hyperscaler data service involves more than investing in a new cable or clicking a button.

Stream on Striim

Logically named to convey a sense of data flow from the start, Striim, Inc. (pronounced stream, as in river) not only creates and builds the data pipelines that move data from traditional databases to new cloud services, it also filters, transforms, enriches and correlates that data on its journey.

The company’s Striim for BigQuery is a cloud-based streaming service that uses Change Data Capture (CDC) technologies (a database process designed to track, pinpoint and subsequently work on the changed data in any given set of information) to integrate and replicate data from enterprise-grade databases such as Oracle, MS-SQL, PostgreSQL, MySQL and others to the Google Cloud BigQuery enterprise data warehouse.

In short, it moves enterprise data into the Google BigQuery cloud data service for business intelligence.
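Change Data Capture is easier to picture with a small example. The sketch below is purely illustrative and is not Striim’s implementation: it assumes a hypothetical stream of row-level change events (inserts, updates and deletes) read in commit order from a source database’s log and replayed against a downstream target.

```python
# Illustrative only: a hypothetical CDC consumer, not Striim's actual code.
# Each event describes one row-level change captured from a source database log.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChangeEvent:
    operation: str           # "INSERT", "UPDATE" or "DELETE"
    table: str               # source table name
    key: dict                # primary key of the affected row
    row: Optional[dict]      # new column values (None for deletes)

def apply_change(event: ChangeEvent, target) -> None:
    """Replay one captured change against a target store (hypothetical target API)."""
    if event.operation == "INSERT":
        target.insert(event.table, event.row)
    elif event.operation == "UPDATE":
        target.update(event.table, event.key, event.row)
    elif event.operation == "DELETE":
        target.delete(event.table, event.key)

# A pipeline would read events continuously and replay them in order, e.g.:
# for event in read_change_log(source):   # hypothetical source reader
#     apply_change(event, target)
```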

To explain the technology in full, Google BigQuery is a fully managed (cloud-based platform-as-a-service) serverless (a virtualized server technique to deliver server resource requirements more precisely at the actual point of use) data warehouse (a data management technique created by bringing together information from more than one source) that enables scalable analysis over petabytes (1024 terabytes) of data with built-in machine learning capabilities.
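For readers who want a feel for BigQuery itself, the official google-cloud-bigquery Python client lets an analyst run a SQL query against a warehouse table in a few lines. The project, dataset and column names below are placeholders for illustration; the client is assumed to be authenticated against a Google Cloud project.

```python
# Query a BigQuery table with the official Python client (google-cloud-bigquery).
# Project, dataset and table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()  # uses the environment's default credentials

sql = """
    SELECT customer_id, SUM(order_total) AS lifetime_value
    FROM `my_project.sales.orders`
    GROUP BY customer_id
    ORDER BY lifetime_value DESC
    LIMIT 10
"""

for row in client.query(sql).result():  # blocks until the query job completes
    print(row.customer_id, row.lifetime_value)
```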

Organizations using this technology can now build a new data pipeline to stream transactional data from hundreds or thousands of tables to Google BigQuery with sub-second end-to-end latencies. This is the kind of intelligence needed if we are going to enable real-time analytics and address time-sensitive operational issues.
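To give a rough sense of what “streaming into BigQuery” means at the API level, BigQuery supports row-level streaming inserts, which the same Python client exposes as insert_rows_json. The table reference below is a placeholder; a dedicated pipeline such as Striim’s would layer batching, ordering, schema handling and retries on top of this kind of call.

```python
# Stream individual rows into a BigQuery table as they arrive (placeholder table id).
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my_project.analytics.orders_stream"  # placeholder

rows = [
    {"order_id": 1001, "customer_id": 42, "order_total": 99.50},
    {"order_id": 1002, "customer_id": 7,  "order_total": 12.00},
]

errors = client.insert_rows_json(table_id, rows)  # returns [] when all rows land
if errors:
    print("Rows failed to insert:", errors)
```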

“Enterprises are increasingly seeking solutions that help bring critical data stored in databases into Google BigQuery with speed and reliability,” said Sudhir Hasbe, senior director of product management, Google Cloud.

Water-based data flow analogies

If it feels like we’ll never conceivably run out of water-based data flow analogies, we probably won’t. This is a zone of technology where organizations need to replicate data from multiple databases (that they have previously been operating, many of them before the so-called digital transformation era) and get that data to cloud data warehouses, data lakes and data lakehouses.


Why would companies need to do this and get data flowing in this direction? To enable their data science and analytics teams to optimize their decision-making and business workflows. But there are traditionally two problems: a) legacy data warehouses are not easily scalable or performant enough to deliver real-time analysis capabilities; and b) cloud-based data ingestion platforms often require significant effort to set up.

Striim for BigQuery offers a user interface that allows users to configure and observe the ongoing and historical health and performance of their data pipelines, reconfigure their data pipelines to add or remove tables on the fly, and repair their pipelines in case of failures.
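Striim exposes all of this through its own interface, so the snippet below is only a hypothetical sketch of what a pipeline definition could look like as data, to illustrate the “add or remove tables on the fly” idea; none of these field names come from Striim’s actual product.

```python
# Hypothetical pipeline definition, for illustration only (not Striim's real configuration format).
pipeline = {
    "name": "oracle_to_bigquery",
    "source": {"type": "oracle-cdc", "host": "db.internal", "schema": "SALES"},
    "target": {"type": "bigquery", "dataset": "analytics"},
    "tables": ["ORDERS", "CUSTOMERS", "PAYMENTS"],
}

# "Reconfiguring on the fly" amounts to editing the table list while the pipeline runs:
pipeline["tables"].append("RETURNS")    # start capturing a new table
pipeline["tables"].remove("PAYMENTS")   # stop replicating one that is no longer needed
```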

Fresh data, come & get it

Alok Pareek is executive VP of engineering and products at Striim. He points to the need for what he calls ‘fresh data’ (i.e. streamed real-time data that works at the speed of modern life and business, where user mobile device ubiquity and new smart machines create their own always-on information channels) to get business decisions right.

“Our customers are increasingly using BigQuery for their data analytics needs. We have designed Striim for BigQuery for operational ease, simplicity and resiliency so that users can quickly and easily extract business value from their data. We have automated schema management, snapshot functionality [a means of saving the current state of a data stream to start a new version or for backup & recovery purposes], CDC coordination [see above definition] and failure handling in the data pipelines to deliver a delightful user experience,” said Pareek.

There is automation happening here too. Striim for BigQuery continuously monitors and reports pipeline health and performance. When it detects tables that cannot be synced to BigQuery, it automatically quarantines the errant tables and keeps the rest of the pipeline operational, preventing what could be hours of pipeline downtime.
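The quarantine behaviour is a familiar resilience pattern. A rough sketch of the idea follows, in hypothetical terms rather than Striim’s actual code: tables that repeatedly fail to sync are set aside for repair while the rest of the pipeline keeps flowing.

```python
# Hypothetical sketch of the table-quarantine pattern (not Striim's implementation).
def run_sync_cycle(tables, sync_table, failure_counts, quarantined, max_failures=3):
    """One sync cycle: tables that keep failing are quarantined so the rest stay live."""
    for table in tables:
        if table in quarantined:
            continue  # already set aside for manual repair
        try:
            sync_table(table)
            failure_counts[table] = 0  # healthy again, reset its counter
        except Exception:
            failure_counts[table] = failure_counts.get(table, 0) + 1
            if failure_counts[table] >= max_failures:
                quarantined.add(table)  # isolate the errant table, keep the rest flowing
```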

Striim for BigQuery works to continuously ingest, process and deliver high volumes of real-time data from diverse sources (both on-premises and in the cloud) to support multi- and hybrid cloud infrastructures. It collects data in real time from enterprise databases (using non-intrusive change data capture), log files, messaging systems and sensors, and delivers it to virtually any target on-premises or in the cloud with sub-second latency, enabling real-time operations and analytics.

Hyperscaler indifference?

All of which is great stuff, then, i.e. we can get data from Oracle and the other above-noted databases to hyperscaler Cloud Service Provider (CSP) clouds from Google, AWS and Microsoft better, faster, more easily and at a more cost-effective price point. We can even do so with a greater degree of additional (cleansing, filtering etc.) services.

Why, then, don’t the major cloud players offer this kind of technology?

In truth they do – remember when we said that cloud-based data ingestion platforms often require significant effort to set up? Many of these functions are possible with the hyperscalers and it’s not hard to find reams of documentation across the web from all three big clouds detailing the internal mechanics of snapshots, streaming and schema management. It’s just more expensive and usually not as dedicated a service (they do have the planet’s biggest clouds to run, after all) and typically without all the kinds of add-ons discussed here.

The water-based data flow analogies will continue – coming next: the data jet wash, probably.

