Data Engineer

Direct employer: Vallettasoftware Software Development (vallettasoftware.com)
Senior
Analytics, Data Science, Big Data • Engineer • Custom development
September 9
Remote work
More than 5 years of experience
$4,500 to $5,000
Employer: Vallettasoftware Software Development
Job description

Short description and industry: AI

Tech stack: Ruby on Rails, React, PostgreSQL, AWS

Existing team: FE, BE, PM, BA, UX/UI on the partner’s side

Context: The project is an analytical platform for multimodal data: documents (PDF, images, video, audio) and structured data (CSV, JSON). The work covers preparing data for search and RAG and supporting the entity graph, under security and audit requirements. Deployment is on AWS, without public LLM APIs.

 

🛠 Key Responsibilities:

  • Design and implementation of ETL/ELT pipelines (ingest → storage → processing → publishing)
  • Building a Data Lake/Lakehouse with raw/clean/curated zones and schema management
  • Integration of multimodal pipelines (OCR, HTR, ASR) into the overall processing flow
  • Preparing data for search and RAG (chunking, embeddings, indexing, updates)
  • Development of batch and streaming processes, DAG orchestration
  • Ensuring data quality and observability (tests, metrics, alerts, SLA)
  • Implementation of security practices (RBAC/ABAC, PII masking, encryption, audit trail)
  • Collaboration with ML and backend teams (feature preparation, Feature Store support)
  • Infrastructure operations (CI/CD, IaC, cost optimization)
  • Documentation of pipelines and data lifecycle management

Requirements:

Storage and formats

● AWS S3, Parquet/ORC

● Lakehouse technologies (Iceberg or Delta Lake or Hudi)

● AWS Glue, data lineage practices

Orchestration and processing

● Apache Airflow (DAG orchestration)

● Apache Spark or equivalent (Glue, EMR)

● Kafka or AWS Kinesis, AWS SQS

Search and RAG

● Basic understanding of vector search (FAISS, pgvector, OpenSearch Vector)

● Experience in preparing data for indexes and updating them

Quality and observability

● Great Expectations or Deequ

● Pipeline monitoring and alerting

Security

● IAM, KMS, VPC

● Data encryption at rest and in transit

● PII masking and auditing of user actions

Infrastructure

● Docker, Kubernetes/EKS, ECR

● Terraform or Pulumi

● CloudWatch, CloudTrail

● CI/CD (GitHub Actions, GitLab CI or Jenkins)

Desirable

● Athena, Redshift, Trino or Presto

● OpenSearch, pgvector

● OpenLineage or Marquez

● Prometheus, Grafana

● Experience with Feature Store (Feast, Tecton or custom)

● CDC experience (Debezium, DMS), Schema Registry

● Knowledge of GovCloud limitations; experience working without public LLM APIs

● Experience with OCR/ASR metrics (CER, WER) and monitoring of indexing completeness

 

Joining Valletta Software Development means:

🌍 A Global, Thriving Team
Join 100+ specialists from 20+ countries, united by a passion for outstanding IT solutions.
🚀 Diverse projects: Fintech, MedTech, AI/ML, e-commerce, and more. Switch teams or industries to broaden your skills.
💡 Support at Every Step
Client interview prep: we train you to succeed and give actionable feedback.
✔️ Strategic stability: Well-structured processes, strong management, and long-term vision.
✔️ Core values: Honesty, flexibility, innovation, and a people-first approach.
💸 Regular salary reviews based on your personal results.
Paid rest days and sick leave.

 

P.S. In your cover letter, please provide the following information:

Your Telegram username

Your current location

Your salary expectations

Thank you!

 


Specialization
Analytics, Data Science, Big Data • Engineer
Industry and field of application
Custom development
Position level
Senior