Data Engineer

Required Skills

Python
Pandas
SQL
ETL
AI/ML

Job Description

Job Title: Data Engineer


Job Type: Full-time


Location: Remote


Job Summary

We are seeking a skilled Data Engineer with strong analytical thinking and a passion for solving research-driven data challenges. This exciting role involves building data pipelines, performing exploratory data analysis (EDA), and working with both structured and unstructured data. Your exposure to AI/ML techniques will be advantageous as you collaborate with data scientists and researchers to derive insights and support model development.


Key Responsibilities

- Design, develop, and maintain robust and scalable ETL pipelines for ingesting and transforming raw data from diverse sources.

- Conduct exploratory data analysis (EDA) to identify patterns, anomalies, and valuable insights.

- Collaborate with researchers and data scientists to prepare datasets for AI/ML modeling and experimentation.

- Develop and manage data models, schemas, and databases for efficient storage and querying of large datasets.

- Write optimized SQL queries and scripts for data extraction and aggregation.

- Ensure data quality, integrity, and security across all pipelines and storage systems.

- Automate data validation and reporting workflows to facilitate ongoing research tasks.


Required Skills and Qualifications

- Proficiency in Python and SQL.

- Experience in building and managing ETL pipelines.

- Expertise in using Pandas and NumPy for data manipulation.

- Familiarity with AI/ML techniques and tools such as scikit-learn, Hugging Face Transformers, and the OpenAI API.

- Strong written and verbal communication skills.

- Experience with relational databases such as PostgreSQL and MySQL.

- Familiarity with tools for exploratory data analysis, such as Jupyter Notebooks, VS Code, or PyCharm.


Preferred Qualifications

- Experience with vector search tools like Qdrant, FAISS, or Pinecone.

- Familiarity with data visualization tools such as Matplotlib, Seaborn, or Plotly.

About micro1

micro1 is a data engine that helps AI labs train foundation models and enterprises build AI agents. We provide frontier evaluations and reinforcement learning environments used to improve LLM capabilities, as well as contextual evaluations used to monitor and improve AI agents in enterprise settings. Our data engine includes an AI recruiter agent that sources and vets domain experts, a data platform that enables rapid production of high-quality training data, and a pipeline performance system that ensures both quality and velocity.
Our goal is to have 1 billion people doing meaningful work by contributing their expertise to the development of frontier AI models. We’ve raised $40M+ in funding, and our AI recruiter has powered more than 1 million AI-led interviews as our global network of experts expands to form the human intelligence layer for AGI.

Please note that after completing the interview process, you’ll be added to our talent pool and considered for this and other roles that match your skills.