At TechBiz Global, we provide recruitment services to the top clients in our portfolio.
We are currently looking for a dedicated Senior AI Data Engineer to join one of our clients' teams. If you're looking for an exciting opportunity to grow in an innovative environment, this could be the perfect fit for you.
Key Responsibilities:
- Design, build, and scale robust ETL/ELT pipelines optimized for AI workloads, including RAG, fine-tuning, and batch inference.
- Transform unstructured data sources such as PDFs, logs, and transcripts into structured and vectorized formats suitable for LLM consumption.
- Maintain and automate the data-to-model lifecycle, ensuring AI knowledge bases remain synchronized with changing business data.
- Develop and maintain real-time feature pipelines that support low-latency AI and machine learning applications.
- Integrate data platforms with Kafka and other event-driven systems to enable real-time processing and AI-driven responses.
- Manage and optimize Feature Stores to ensure consistency between model training and production environments.
- Implement automated data quality controls and validation processes to ensure the reliability and accuracy of AI training and inference data.
- Establish and maintain data lineage frameworks to provide traceability, auditability, and regulatory compliance across data workflows.
- Enforce data security, privacy, and governance standards, including PII protection and compliance with industry regulations.
- Manage data movement and synchronization across on-premises systems, cloud platforms, and data warehouses.
- Optimize data storage and retrieval strategies for Vector Databases to support high-performance RAG and AI search workloads.
- Collaborate with Data Scientists, ML Engineers, Software Engineers, and business stakeholders to deliver scalable AI data solutions.
Requirements:
- 10+ years of experience in Data Engineering or Backend Engineering, with a strong focus on data platforms and pipelines.
- 2+ years of hands-on experience supporting AI/ML data pipelines, including data preparation for machine learning and generative AI applications.
- Expert-level proficiency in Python and SQL; experience with Java or Scala is an advantage.
- Strong experience building and maintaining real-time data streaming solutions using Apache Kafka, Flink, or Spark Streaming.
- Hands-on experience with modern data orchestration and transformation tools such as Airflow, dbt, and Prefect.
- Experience working with Vector Databases and Feature Stores to support AI and machine learning workloads.
- Strong knowledge of cloud-based data services on AWS, Azure, or GCP, such as AWS Glue, Amazon Kinesis, Azure Data Factory, or Google Cloud Dataflow.
- Experience deploying and managing data workloads in Kubernetes (K8s) environments.
- Proven experience handling sensitive data within regulated industries such as Fintech, Healthcare, or other compliance-driven environments.
- Strong understanding of data quality, governance, security, and privacy best practices.
- Bachelor's degree in Computer Science, Software Engineering, Information Systems, or a related technical field. Equivalent practical experience will also be considered.
- Excellent problem-solving skills and the ability to collaborate effectively with cross-functional engineering, data, and AI teams.