Senior Data Engineer, GenAI

MongoDB

Data Science
New York, NY, USA
Posted on Thursday, May 9, 2024

The worldwide data management software market is massive. According to IDC, the worldwide database software market, which it refers to as the database management systems software market, was forecast to be approximately $82 billion in 2023, growing to approximately $137 billion in 2027, a 14% compound annual growth rate. At MongoDB, we are transforming industries and empowering developers to build amazing apps that people use every day. We are the leading developer data platform and the first database provider to IPO in over 20 years. Join our team and be at the forefront of innovation and creativity.

The Data Pipelines Engineering team is responsible for building ETL pipelines that populate the Internal Data Platform, which drives analytics that help the company run more efficiently. Our team builds highly performant and scalable processes that extract massive datasets and make them available for efficient querying. We are also building a Generative AI framework that will help teams within the company use the data we store in their Retrieval-Augmented Generation (RAG) applications.

We are looking to speak to candidates who are based in New York City for our hybrid working model.

What you’ll do:

  • Build ETL pipelines using technologies such as Python and Spark
  • Implement new ETL pipelines on top of a variety of architectures (e.g. file-based, streaming)
  • Determine best strategies for building AI tools, including how best to chunk and retrieve RAG-based data and which LLMs are most appropriate to support use cases
  • Stay abreast of industry trends in the AI space, and evaluate and incorporate new concepts/tools into MongoDB’s internal AI architecture
  • Make architectural decisions relating to storing large datasets using a variety of file formats (e.g. Parquet, JSON) and table types (e.g. Iceberg, Hive)
  • Work with Security and Compliance teams to ensure that datasets have appropriate permissions and comply with applicable regulations
  • Work with Data Analysts and Data Scientists to understand and make available the data that is important for their analysis
  • Work with our Data Platform, Architecture, and Governance sibling teams to make data scalable, consumable, and discoverable

We’re looking for someone with:

  • 5+ years of building ETL pipelines for a Data Lake/Warehouse
  • 1+ year building AI and RAG-based applications
  • 5+ years Python experience
  • 5+ years Spark experience
  • Experience with Hive, Iceberg, Glue, or other technologies that expose big data as tables
  • Familiarity with different big data file types such as Parquet, Avro, and JSON

Success Measures:

  • In 3 months, you'll have a thorough understanding of the architecture of MongoDB’s internal Data Lake and AI ecosystem
  • In 6 months, you'll have owned the delivery of a large project from start (scoping, design) to finish (delivery)
  • In 12 months, you'll have designed new features, led development work, and become a go-to expert on parts of the system

To drive the personal growth and business impact of our employees, we’re committed to developing a supportive and enriching culture for everyone. From employee affinity groups to fertility assistance and a generous parental leave policy, we value our employees’ wellbeing and want to support them along every step of their professional and personal journeys. Learn more about what it’s like to work at MongoDB, and help us make an impact on the world!

MongoDB is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process. To request an accommodation due to a disability, please inform your recruiter.

MongoDB, Inc. provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type and makes all hiring decisions without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

MongoDB’s base salary range for this role is posted below. Compensation at the time of offer is unique to each candidate and based on a variety of factors such as skill set, experience, qualifications, and work location. Salary is one part of MongoDB’s total compensation and benefits package. Other benefits for eligible employees may include: equity, participation in the employee stock purchase program, flexible paid time off, 20 weeks fully-paid gender-neutral parental leave, fertility and adoption assistance, 401(k) plan, mental health counseling, access to transgender-inclusive health insurance coverage, and health benefits offerings. Please note, the base salary range listed below and the benefits in this paragraph are only applicable to U.S.-based candidates.

MongoDB’s base salary range for this role in the U.S. is:
$118,000 - $231,000 USD