GCP (Google Cloud Platform) is a suite of public cloud computing services that provides IaaS (Infrastructure as a Service), PaaS (Platform as a Service), SaaS (Software as a Service) and serverless computing environments.
The platform offers services for compute, storage, networking, big data, machine learning and the internet of things (IoT), as well as cloud management, security and developer tools.
For our client, a leading automotive company, we've developed a Big Data platform based on GCP that processes and stores the company's industrial data from factories, test vehicles, and packaging. The platform is the main pillar of the company's Industry 4.0 strategy.
GCP tools that we are using in our current projects:
- BigQuery - a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data. In our use case we've employed BigQuery as the main storage for the industrial platform's analytics data.
- Cloud Dataflow - Google’s big data solution for stream & batch processing, based on the Apache Beam library. We’ve used it to process large volumes of automotive industry data in both streaming and batch mode.
- Cloud Dataproc - a managed Spark and Hadoop service on GCP. We’ve used it for stream & batch processing flows. Cloud Dataproc was our initial solution, but we’ve since migrated those flows to Cloud Dataflow.
- Cloud Datalab - we’ve used it to explore, visualize, analyze, and transform data in various studies, looking for data patterns in the industrial platforms and for ways to improve existing services.
- Cloud Pub/Sub - an asynchronous messaging service that decouples services that produce events from services that process them. We’re using Cloud Pub/Sub for event ingestion and delivery in streaming analytics pipelines.
- Cloud Functions - a serverless execution environment for building and connecting cloud services. We are using Cloud Functions for events where scaling with Dataflow is not necessary and for short-lived processing logic that completes in seconds or minutes.
- Google Kubernetes Engine (GKE) - a managed environment for deploying, managing, and scaling containerized applications using Google infrastructure. A GKE environment consists of multiple machines (specifically, Compute Engine instances) grouped together to form a cluster. We are using GKE to deploy applications built with Java and Spring that are accessed by people across the client company around the world.
- Cloud Storage - a service for storing objects in Google Cloud. An object is an immutable piece of data consisting of a file of any format. Objects are stored in containers called buckets. We’ve used Cloud Storage for file transfer, for persisting configuration, and as staging & temporary locations.
- Cloud SQL - a fully managed service that makes it easy to set up, manage, and administer relational databases: PostgreSQL, MySQL, and SQL Server. We’ve used it for storing web application state.
- Cloud Memorystore - Memorystore for Redis is a fully managed Redis service for Google Cloud. Applications running on Google Cloud can get high performance from a scalable, available, and secure Redis service without the burden of managing complex Redis deployments. We’ve used Memorystore for sharing state between Dataflow jobs, where we needed fast and reliable access to data stored as key-value pairs.
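
To illustrate the kind of short-lived event handling described in the list above, here is a minimal Python sketch of a handler for a Pub/Sub-triggered function. It relies on the background-function convention in which the message body arrives base64-encoded under the `data` key of the event. Everything else is an assumption made for illustration: the function names, the payload fields (`sensor_id`, `value`, `ts`), and the `source` attribute are hypothetical and do not come from the platform itself. The write to BigQuery is left as a comment, since it would need the client library and project credentials.

```python
import base64
import json
from datetime import datetime, timezone


def pubsub_event_to_row(event: dict) -> dict:
    """Decode a Pub/Sub-style event and shape it as a flat, warehouse-ready row.

    The payload schema (sensor_id, value, ts) is hypothetical.
    """
    # Pub/Sub delivers the message body base64-encoded under "data".
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    # Attributes are optional key/value metadata attached to the message.
    attributes = event.get("attributes") or {}
    return {
        "source": attributes.get("source", "unknown"),
        "sensor_id": payload["sensor_id"],
        "value": float(payload["value"]),
        # Convert the epoch-seconds timestamp to an ISO 8601 string in UTC.
        "event_time": datetime.fromtimestamp(
            payload["ts"], tz=timezone.utc
        ).isoformat(),
    }


def handle_event(event: dict, context=None) -> dict:
    """Entry point in the style of a Pub/Sub-triggered background function."""
    row = pubsub_event_to_row(event)
    # In a real deployment the row would be written to the warehouse here,
    # for example with the BigQuery client library (omitted in this sketch).
    return row


if __name__ == "__main__":
    # Build a sample event the same way a publisher would encode a message.
    body = json.dumps({"sensor_id": "press-07", "value": "41.5", "ts": 0})
    sample_event = {
        "data": base64.b64encode(body.encode("utf-8")).decode("ascii"),
        "attributes": {"source": "factory"},
    }
    print(handle_event(sample_event))
```

Keeping the decoding and row-shaping logic in a plain function, separate from the entry point, makes it possible to unit test the handler locally without any cloud services.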