Software Architect, Agent Evaluation & Core Framework

Remote, USA | Full-time
About Datagrid

Forget everything you know about AI assistants. At Datagrid, we’re building AI agents that actually do the work. We’re a team of passionate, hard-working builders, thinkers, and problem-solvers who are genuinely excited about what we do. Our mission is to supercharge the workday by turning complex data and tedious workflows into simple, automated actions. It’s an incredibly exciting time to join us: we’re growing fast, expanding our platform’s capabilities, and partnering with enterprise customers who want to 10x their teams’ output. We thrive on collaboration and are looking for people who are ready to make a tangible impact. If you want to be part of a team that’s not just talking about the future of AI but actively creating it, you’ve come to the right place.

Our Values

At Datagrid, our values guide how we work, build, and grow together.

• Act with Purpose: Everything we do is tied to our mission. You’ll see the impact of your work as we move quickly to solve meaningful problems for our customers.
• Own the Outcome: We believe in true ownership. You’ll take responsibility for your projects and see them through to success, empowered to make decisions that drive real results.
• Clarity without Ego: We value honesty, transparency, and trust. You can expect and provide direct feedback in an environment where candor sharpens our ideas and strengthens our team.
• Creativity with Purpose: Innovation is central to our culture. Your creative thinking will be valued and directed toward solving real-world challenges and creating lasting impact.

About the role

Datagrid Agents operate where our customers work: across Teams, Slack, and even SMS. Agents make multistep plans, leverage vectorized data from 100+ sources, use tools like Docusign, and manipulate the Datagrid app. The Software Architect, Agent Evaluation & Core Framework role is crucial because we cannot manually test the vast array of agent interactions and capabilities.

You will own and drive the extension of our evaluation harness to provide actionable reports on agent regressions and improvements, directly impacting strategic direction and customer experience. A key part of this will be incorporating the best open-source benchmarks into our evaluation set and figuring out how to agentically generate evaluations that are representative of customer use cases. As you become established, you will also have the opportunity to make fundamental changes to the Core Framework to improve the way Agents reason, use tools, and collaborate with humans.

What you'll do:

• Work closely with an ex-Googler who built Gemini evals to create a harness for evaluating Agent performance, make that harness available for both local development and the CI/CD pipeline, and set up alerts for when Agents misbehave.
• Influence and contribute to the extension of Datagrid’s Agentic capabilities.
• Choose the best open- or closed-source components to build out the testing infrastructure.
• Integrate publicly available benchmarks such as RAGBench into the testing system.
• Grant subject matter experts the ability to add to the test library using customer queries, manually authored cases, and synthetically generated questions.
• Expose evaluation performance via alerts and dashboards.

What we're looking for:

• Proven track record of building test harnesses for Chat Agents from 0 to 1.
• 10+ years of B2B software engineering experience.
• Ability to write effective LLM prompts without assistance.
• Proficiency with Node.js and server-side frameworks such as NestJS or Next.js.
• Familiarity with JavaScript frameworks such as React or AngularJS.
• Experience with databases such as Weaviate and BigQuery.
• Experience working with GCP or similar cloud providers.

Who we're looking for:

• Experience with any LLM evaluation platform (Galileo, Arize, LangSmith, Orq)
• Background in B2B SaaS automation tools
• Contributions to open-source AI projects or published research
• Familiarity with prompt engineering or model evaluation

Pay Range and Benefits

• Salary Range: $200,000 - $240,000
• Generous equity compensation
• Flexible vacation/time-off policy
• All U.S. federal holidays observed, plus an additional company-wide Week of Rest in December
• Competitive benefits package: 100% premium coverage for employees and generous coverage for dependents
• Work-from-home stipend to support your ideal setup
• 401(k) plan

The base pay range target for the role seniority described in this job description is between $200,000 and $240,000. Final offer amounts depend on multiple factors such as candidate experience and expertise, geographic location, total compensation, and market data. In addition to cash pay, full-time regular positions are eligible for equity, 401(k), health benefits, and other benefits; some of these benefits may be available for part-time or temporary positions.