You will play a key role in designing and developing robust, high-performance ETL pipelines and managing large-scale datasets to support critical business functions. This role requires deep technical expertise, strong problem-solving skills, and the ability to thrive in a fast-paced, evolving environment.
Key Responsibilities
- 1) Design, develop, and maintain scalable and reliable ETL/ELT pipelines for processing large volumes of data (terabytes and beyond).
- 2) Model and structure data for performance, scalability, and usability.
- 3) Work with cloud infrastructure (preferably Azure) to build and optimize data workflows.
- 4) Build and manage data lake/lakehouse architectures in alignment with best practices.
- 5) Optimize ETL performance and manage cost-effective data operations.
- 6) Collaborate closely with cross-functional teams including data science, analytics, and software engineering.
- 7) Ensure data quality, integrity, and security across all stages of the data lifecycle.
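The kind of ETL/ELT pipeline described above can be sketched minimally in Python with Pandas. This is an illustration only, not part of the role's actual codebase; the column names (`user_id`, `amount`) and in-memory source are hypothetical stand-ins for a real data lake source and warehouse sink.

```python
import pandas as pd

# Minimal ETL sketch: extract raw records, transform them, load the result.
# Column names and data are illustrative, not taken from the posting.

def extract() -> pd.DataFrame:
    # In production this would read from a source system (e.g. a data lake);
    # here we use an in-memory stand-in.
    return pd.DataFrame({
        "user_id": [1, 1, 2, 2, 3],
        "amount": [10.0, None, 5.0, 7.5, 2.5],
    })

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Enforce data quality (drop incomplete rows), then aggregate per user.
    clean = df.dropna(subset=["amount"])
    return clean.groupby("user_id", as_index=False)["amount"].sum()

def load(df: pd.DataFrame) -> pd.DataFrame:
    # In production this would write to a warehouse or lakehouse table;
    # here we simply return the frame.
    return df

result = load(transform(extract()))
print(result)
```

At production scale (terabytes, as the posting notes) the same extract/transform/load structure would typically run on a distributed engine such as Apache Spark rather than in-memory Pandas.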
Required Skills & Qualifications
- 1) 6 to 7 years of relevant experience in data engineering.
- 2) Advanced proficiency in Python, including libraries such as Pandas and NumPy.
- 3) Strong skills in SQL for complex data manipulation and analysis.
- 4) Hands-on experience with Apache Spark, Hadoop, or similar distributed systems.
- 5) Proven track record of handling large-scale datasets (TBs) in production environments.
- 6) Cloud development experience with Azure (preferred), AWS, or GCP.
- 7) Solid understanding of data lake and data lakehouse architectures.
- 8) Expertise in ETL performance tuning and cost optimization techniques.
- 9) Knowledge of data structures, algorithms, and modern software engineering practices.
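As a rough gauge of the SQL proficiency listed above, a candidate should be comfortable with constructs like window functions. A small self-contained sketch using Python's built-in sqlite3 module (the `orders` table and its columns are hypothetical, for illustration only):

```python
import sqlite3

# Hypothetical "orders" table used to illustrate window-function SQL:
# rank each customer's orders by amount, highest first.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 30.0), ("a", 10.0), ("b", 20.0)],
)

rows = conn.execute(
    """
    SELECT customer, amount,
           RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS rnk
    FROM orders
    ORDER BY customer, rnk
    """
).fetchall()
print(rows)
```

The same pattern (partitioned window functions over large tables) carries over directly to warehouse engines and Spark SQL.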
Soft Skills
- 1) Strong communication skills with the ability to explain complex technical concepts clearly and concisely.
- 2) Self-starter who learns quickly and takes ownership.
- 3) High attention to detail with a strong sense of data quality and reliability.
- 4) Comfortable working in an agile, fast-changing environment with incomplete requirements.
Preferred Qualifications
- 1) Experience with tools such as Azure Data Factory or similar.
- 2) Familiarity with CI/CD and DevOps in the context of data engineering.
- 3) Knowledge of data governance, cataloging, and access control principles.