Data Engineer/Machine Learning

We are seeking a skilled MLOps Engineer with expertise in building and optimizing data pipelines, deploying models, processing large datasets, and working with large-scale text data, as well as experience preparing and curating high-quality datasets.

About Akvelon

🌎 Akvelon is an American company with offices in Seattle, Mexico, Ukraine, Poland, Serbia, and other locations. Our company is an official vendor of Microsoft and Google. Our clients also include Amazon, Evernote, Intel, HP, Reddit, Pinterest, AT&T, T-Mobile, Starbucks, and LinkedIn. Working with Akvelon means being connected with the best and brightest engineering teams from around the globe and working with a modern technology stack to build Enterprise, CRM, LOB, Cloud, AI and Machine Learning, Cross-Platform, Mobile, and other types of applications customized to clients' needs and processes.

Responsibilities:

• Design and Build ML Pipelines: Develop, maintain, and optimize scalable and reliable data pipelines to ingest, process (including running model inference), and store large volumes of structured and unstructured data, with a focus on text data.

• Google Cloud Platform: Leverage GCP services such as BigQuery, Dataflow, Vertex AI, Kubeflow, and Cloud Storage to build and manage data pipelines and storage solutions. Experience with Cloud Run and Cloud Composer is a plus.

• Real-Time Inference and Batch Processing: Serve deep learning models for both real-time inference and batch processing, maintaining familiarity with the relevant deployment environments.

• Data Processing and Transformation: Implement ETL processes to clean, transform, and enrich data from various sources, ensuring data quality and integrity for downstream applications.

• Text Data Management: Work extensively with text data, including preprocessing, sanitization, chunking, and document segmentation. Develop pipelines that facilitate the processing and analysis of text data at scale.

• High-Quality Dataset Preparation and Curation: Collaborate with the machine learning modeling team to understand data needs. Design processes for sourcing, cleaning, and curating high-quality datasets, ensuring they are well-documented, consistent, and ready for use in machine learning and analytical applications.

• Database and Data Warehouse Management: Design and maintain databases and data warehouses, ensuring efficient storage, retrieval, and management of large datasets, including structured, semi-structured, and unstructured data.

• Collaboration with Machine Learning Team: Work closely with data scientists, machine learning engineers, and analysts to understand data needs and provide scalable solutions that support advanced analytics and machine learning models.

• Performance Optimization: Continuously monitor and improve the performance of data pipelines and storage solutions to handle increasing data volumes and complexity.

• Automation and Monitoring: Implement automation and monitoring tools to ensure the reliability, availability, and efficiency of data pipelines and infrastructure.

• Documentation and Best Practices: Document processes, data flows, and pipeline architectures, and promote engineering best practices.

Experience:

  • 3+ years of experience in data engineering, with a strong focus on building and managing data pipelines
  • Experience with machine learning model deployment and pipelines
  • Experience in dataset preparation, curation, and quality management
  • Proven experience working with large-scale text data
  • Prior experience working with modeling teams
  • Hands-on experience with Google Cloud Platform (GCP) services, including Vertex AI, BigQuery, Kubeflow (Vertex AI Pipelines), Dataflow, Cloud Storage, and other GCP data tools
  • Proficiency in Python and SQL
  • Experience with cloud AI platforms (e.g., Vertex AI)
  • Experience with big data frameworks such as Dataflow
  • Strong knowledge of ETL and data integration platforms
  • Strong experience with workflow orchestration tools (e.g., Kubeflow, Airflow, Cloud Composer)
  • Familiarity with databases (e.g., MySQL, MongoDB)

Nice to have:

  • Knowledge of text processing libraries and NLP frameworks (e.g., NLTK, spaCy, regex)
  • Familiarity with concepts, developments, and tools in the LLM ecosystem (e.g., Instructor, Pydantic, LangChain, embeddings, vector databases)

Overlap Time Requirements:
12 PM Eastern Standard Time (US)

Benefits:

  • Paid vacation and sick leave (no sick note required)
  • Official state holidays: 11 days considered public holidays
  • Professional growth through challenging projects, including opportunities to switch roles and master new technologies and skills with company support
  • Flexible working schedule: 8 hours per day, 40 hours per week
  • Personal Career Development Plan (CDP)
  • Employee support program (discount, care, health, and legal compensation)
  • Paid external training, conferences, and professional certification that meets the company’s business goals
  • Internal workshops & seminars
  • Corporate library (Paper/E-books) and internal English classes
