Lead Data Engineer GCP

Roles and Responsibilities


Data Architecture Design 

  • As a Lead GCP Data Engineer, your primary responsibility is to design the data architecture that supports efficient data processing and analysis on the Google Cloud Platform. This involves understanding the organization's data requirements and working closely with data scientists, business analysts, and other stakeholders to design effective data models and structures. You will choose the appropriate GCP services and technologies to build a scalable, robust data architecture that aligns with the organization's goals. 

Data Pipeline Development

  • Developing data pipelines is a key responsibility of a Lead GCP Data Engineer. These pipelines move data smoothly from its various sources to the desired destinations while ensuring data quality, reliability, and governance. You will work with GCP services such as Google Cloud Storage, BigQuery, Dataflow, and Pub/Sub to build data ingestion, transformation, and processing pipelines. This involves coding, scripting, and configuring these services so that data is processed and transformed efficiently; the sketch below shows one such pipeline. 
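
A minimal illustration using Apache Beam's Python SDK: a streaming pipeline that reads messages from a Pub/Sub subscription, applies a simple quality gate, and appends rows to a BigQuery table. The project, subscription, table, and schema names are hypothetical placeholders.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    # streaming=True because the source is an unbounded Pub/Sub subscription.
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            # Read raw messages from a Pub/Sub subscription (placeholder name).
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/my-project/subscriptions/events-sub")
            # Decode each payload and parse it as JSON.
            | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            # Simple data-quality gate: keep only records with an event_id.
            | "FilterValid" >> beam.Filter(lambda rec: "event_id" in rec)
            # Project each record onto the target schema before loading.
            | "ToRow" >> beam.Map(lambda rec: {
                "event_id": rec["event_id"],
                "event_ts": rec.get("event_ts"),
                "payload": json.dumps(rec),
            })
            # Append rows to a BigQuery table (placeholder table and schema).
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                table="my-project:analytics.events",
                schema="event_id:STRING,event_ts:TIMESTAMP,payload:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        )


if __name__ == "__main__":
    run()
```

Run locally with the default DirectRunner for testing, or pass --runner=DataflowRunner (plus project, region, and staging options) to execute the same code as a managed Dataflow job.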

Data Transformation and Integration 

  • Lead GCP Data Engineers are proficient in data transformation techniques and tools. You will leverage technologies like Apache Beam, Apache Spark, and Cloud Dataproc to clean, transform, and integrate data from diverse sources. This involves data cleansing, aggregation, enrichment, and normalization to ensure data consistency, accuracy, and usability for downstream applications and analytics (see the sketch below). 
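
A minimal PySpark sketch of such a step, as it might run on a Dataproc cluster; the GCS paths and column names are assumptions for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-cleanup").getOrCreate()

# Read raw order records from a GCS landing zone (placeholder path).
orders = spark.read.parquet("gs://my-bucket/raw/orders/")

cleaned = (
    orders
    .dropDuplicates(["order_id"])                       # cleansing: dedupe
    .filter(F.col("amount").isNotNull())                # cleansing: drop bad rows
    .withColumn("country", F.upper(F.col("country")))   # normalization
)

# Aggregate into a per-country daily summary for downstream analytics.
daily = (
    cleaned
    .groupBy("country", F.to_date("order_ts").alias("order_date"))
    .agg(
        F.sum("amount").alias("total_amount"),
        F.countDistinct("order_id").alias("order_count"),
    )
)

# Write the curated output back to GCS (placeholder path).
daily.write.mode("overwrite").partitionBy("order_date").parquet(
    "gs://my-bucket/curated/daily_orders/")
```

Submitted via gcloud dataproc jobs submit pyspark, the same script scales across the cluster without code changes.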

Performance Optimization 

  • Lead GCP Data Engineers are responsible for optimizing the performance of data processing workflows. You will monitor data pipelines, identify bottlenecks, and fine-tune the pipelines accordingly. This may involve optimizing data transformations, improving data partitioning and sharding (see the sketch below), and leveraging GCP’s autoscaling and load-balancing capabilities. Your goal is to use resources efficiently, reduce processing time, and keep data processing and analysis tasks performing well. 
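
One common tuning example is partitioning and clustering a BigQuery table so queries scan only the data they need. The sketch below uses the google-cloud-bigquery client library; the project, dataset, table, and field names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

schema = [
    bigquery.SchemaField("event_id", "STRING"),
    bigquery.SchemaField("event_ts", "TIMESTAMP"),
    bigquery.SchemaField("country", "STRING"),
    bigquery.SchemaField("amount", "NUMERIC"),
]

table = bigquery.Table("my-project.analytics.events_optimized", schema=schema)
# Partition by day on the event timestamp so queries scan only relevant days.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="event_ts")
# Cluster by country so filters on that column prune storage blocks.
table.clustering_fields = ["country"]

client.create_table(table, exists_ok=True)
```

Queries that filter on event_ts and country then prune partitions and storage blocks instead of scanning the full table, cutting both cost and latency.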

Improving Skills Continuously 

  • To excel as a Lead GCP Data Engineer, continuous learning is crucial: staying current with the latest advancements in data engineering and cloud technologies. You will actively explore new GCP features and services and identify innovative solutions to improve data engineering processes. This means attending training sessions, pursuing relevant certifications, participating in industry events and forums, and staying connected with the data engineering community. Keeping up with the latest trends lets you apply new technologies and techniques to enhance data processing, analysis, and insights. 

Conduct Research 

  • Lead GCP Data Engineers need to stay informed about the latest industry trends, emerging technologies, and best practices in data engineering. Researching and evaluating new tools, frameworks, and methodologies helps you identify opportunities for innovation and improvement within your organization. By conducting research, attending conferences, and staying connected with the data engineering community, you can bring fresh ideas and insights that drive continuous improvement in data engineering processes. 

Automate Tasks 

  • As a Lead GCP Data Engineer, you will be responsible for automating data engineering tasks to improve efficiency and productivity. This involves developing scripts and workflows, or using tools such as Cloud Composer or Cloud Functions, to automate repetitive or time-consuming data processes (a sample DAG follows below). By automating tasks such as data ingestion, transformation, or monitoring, you can reduce manual effort, minimize errors, and streamline data workflows.
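
As one illustration, a daily workflow could be defined as an Airflow DAG and deployed to Cloud Composer. This is a minimal sketch; the DAG id, project, region, cluster name, and script path are placeholder assumptions.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocSubmitJobOperator,
)

with DAG(
    dag_id="daily_orders_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # run once per day
    catchup=False,               # do not backfill past runs
) as dag:
    # Submit a PySpark cleansing job to an existing Dataproc cluster.
    clean_orders = DataprocSubmitJobOperator(
        task_id="clean_orders",
        project_id="my-project",
        region="us-central1",
        job={
            "placement": {"cluster_name": "etl-cluster"},
            "pyspark_job": {
                "main_python_file_uri": "gs://my-bucket/jobs/clean_orders.py"
            },
        },
    )
```

Uploading the file to the Composer environment's dags/ bucket is enough to schedule it; retries, alerting, and downstream tasks can be layered onto the same DAG.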

Experience and Skill Fitment

  • BE/BTech | MTech | MCA.
  • 10+ years of work experience in development/migration projects, including 5+ years handling data engineering/data science projects and 3+ years of GCP experience.
  • Experience with GCP-based Airflow/Kubeflow development and deployments using Google Cloud Storage, Pub/Sub, Dataflow, and Dataproc.
  • Excellent problem-solving and analytical skills.
  • Strong communication and collaboration skills.
  • Ability to work independently and as part of a global team.
  • Passionate about data solutions.
  • Self-motivated and able to work in a fast-paced environment.
  • Detail-oriented and committed to delivering high-quality work.
  • Strong coding skills in Python and PySpark, with hands-on Airflow/Kubeflow development experience.
  • Hands-on experience with GCP data services such as Dataflow, Pub/Sub, Dataproc, and Cloud Storage.
  • Experience using Databricks (Data Engineering and Delta Lake components).
  • Experience with source control tools such as GitHub and the related development process.
  • Experience with workflow scheduling tools such as Airflow.
  • Strong understanding of data structures and algorithms.
  • Experience building data lake solutions leveraging Google data products (e.g., Dataproc, AI Building Blocks, Looker, Cloud Data Fusion, Dataprep), Hive, and Spark.
  • Experience with relational SQL and NoSQL databases.

Benefits:

  • Kloud9 provides a robust compensation package and a forward-looking opportunity for growth in emerging fields.

Equal Opportunity Employer:

  • Kloud9 is an equal opportunity employer and will not discriminate against any employee or applicant on the basis of age, color, disability, gender, national origin, race, religion, sexual orientation, veteran status, or any classification protected by federal, state, or local law.

Resumes to be sent to: recruitment@kloud9.nyc

