Data Engineer
Direct employer: Valletta Software Development (vallettasoftware.com)
Experience: more than 5 years
Salary: from $4,500 to $5,000
Short description and industry: AI
Tech stack: Ruby on Rails, React, PostgreSQL, AWS
Existing team: FE, BE, PM, BA, UX/UI on the partner's side
Context: An analytical platform for multimodal data: documents (PDF, images, video, audio) and structured data (CSV, JSON). The work includes preparing data for search and RAG, supporting the entity graph, and meeting security and audit requirements. Deployment is in AWS, without using public LLM APIs.
🛠 Key Responsibilities:
- Design and implementation of ETL/ELT pipelines (ingest → storage → processing → publishing)
- Building a Data Lake/Lakehouse with raw/clean/curated zones and schema management
- Integration of multimodal pipelines (OCR, HTR, ASR) into the overall processing flow
- Preparing data for search and RAG (chunking, embeddings, indexing, updates)
- Development of batch and streaming processes, DAG orchestration
- Ensuring data quality and observability (tests, metrics, alerts, SLA)
- Implementation of security practices (RBAC/ABAC, PII masking, encryption, audit trail)
- Collaboration with ML and backend teams (feature preparation, Feature Store support)
- Infrastructure operations (CI/CD, IaC, cost optimization)
- Documentation of pipelines and data lifecycle management
Requirements:
Storage and formats
● AWS S3, Parquet/ORC
● Lakehouse technologies (Iceberg, Delta Lake, or Hudi)
● AWS Glue, data lineage practices
Orchestration and processing
● Apache Airflow (DAG orchestration)
● Apache Spark or equivalent (Glue, EMR)
● Kafka or AWS Kinesis, AWS SQS
Search and RAG
● Basic understanding of vector search (FAISS, pgvector, OpenSearch Vector)
● Experience in preparing data for indexes and updating them
Quality and observability
● Great Expectations or Deequ
● Pipeline monitoring and alerting
Security
● IAM, KMS, VPC
● Data encryption at rest and in transit
● PII masking and action auditing
Infrastructure
● Docker, Kubernetes/EKS, ECR
● Terraform or Pulumi
● CloudWatch, CloudTrail
● CI/CD (GitHub Actions, GitLab CI or Jenkins)
Desirable
● Athena, Redshift, Trino or Presto
● OpenSearch, pgvector
● OpenLineage or Marquez
● Prometheus, Grafana
● Experience with Feature Store (Feast, Tecton or custom)
● CDC experience (Debezium, DMS), Schema Registry
● Knowledge of GovCloud limitations, experience working without public LLM APIs
● Experience with OCR/ASR metrics (CER, WER), control of indexing completeness
Joining Valletta Software Development means:
🌍 A Global, Thriving Team
Join 100+ specialists from 20+ countries, united by a passion for outstanding IT solutions.
🚀 Diverse projects: Fintech, MedTech, AI/ML, e-commerce, and more. Switch teams or industries to broaden your skills.
💡 Support at Every Step
✔️ Client interview prep: we train you to succeed and give actionable feedback.
✔️ Strategic stability: Well-structured processes, strong management, and long-term vision.
✔️ Core values: Honesty, flexibility, innovation, and a people-first approach.
💸 Regular salary reviews based on your personal results.
✨ Paid days off and sick leave.
P.S. When applying, please include the following in your cover letter:
Your Telegram username
Your current location
Your salary expectations
Thank you!