AI/LLM Evaluation & Alignment Software Engineer

Remote, USA Full-time

At Leo Tech, we are passionate about building software that solves real-world problems in the Public Safety sector. Our software has been used to help fight continuing criminal enterprises and drug trafficking organizations, identify financial fraud, disrupt sex and human trafficking rings, and focus on mental health matters, to name a few.

Role
• This is a remote, WFH role.
• As an AI/LLM Evaluation & Alignment Engineer on our Data Science team, you will play a critical role in ensuring that our Large Language Model (LLM) and Agentic AI solutions are accurate, safe, and aligned with the unique requirements of public safety and law enforcement workflows. You will design and implement evaluation frameworks, guardrails, and bias-mitigation strategies that give our customers confidence in the reliability and ethical use of our AI systems. This is an individual contributor (IC) role that combines hands-on technical engineering with a focus on responsible AI deployment. You will work closely with AI engineers, product managers, and DevOps teams to establish standards for evaluation, design test harnesses for generative models, and operationalize quality assurance processes across our AI stack.

Core Responsibilities
• Build and maintain evaluation frameworks for LLMs and generative AI systems tailored to public safety and intelligence use cases.
• Design guardrails and alignment strategies to minimize bias, toxicity, hallucinations, and other ethical risks in production workflows.
• Partner with AI engineers and data scientists to define online and offline evaluation metrics (e.g., model drift, data drift, factual accuracy, consistency, safety, interpretability).
• Implement continuous evaluation pipelines for AI models, integrated into CI/CD and production monitoring systems.
• Collaborate with stakeholders to stress test models against edge cases, adversarial prompts, and sensitive data scenarios.
• Research and integrate third-party evaluation frameworks and solutions; adapt them to our regulated, high-stakes environment.
• Work with product and customer-facing teams to ensure explainability, transparency, and auditability of AI outputs.
• Provide technical leadership in responsible AI practices, influencing standards across the organization.
• Contribute to DevOps/MLOps workflows for deployment, monitoring, and scaling of AI evaluation and guardrail systems (experience with Kubernetes is a plus).
• Document best practices and findings, and share knowledge across teams to foster a culture of responsible AI innovation.

What We Value
• Bachelor's or Master's in Computer Science, Artificial Intelligence, Data Science, or a related field.
• 3–5+ years of hands-on experience in ML/AI engineering, with at least 2 years working directly on LLM evaluation, QA, or safety.
• Strong familiarity with evaluation techniques for generative AI: human-in-the-loop evaluation, automated metrics, adversarial testing, red-teaming.
• Experience with bias detection, fairness approaches, and responsible AI design.
• Knowledge of LLM observability, monitoring, and guardrail frameworks (e.g., Langfuse, LangSmith).
• Proficiency with Python and modern AI/ML/LLM/Agentic AI libraries (LangGraph, Strands Agents, Pydantic AI, LangChain, Hugging Face, PyTorch, LlamaIndex).
• Experience integrating evaluations into DevOps/MLOps pipelines, preferably with Kubernetes, Terraform, ArgoCD, or GitHub Actions.
• Understanding of cloud AI platforms (AWS, Azure) and deployment best practices.
• Strong problem-solving skills, with the ability to design practical evaluation systems for real-world, high-stakes scenarios.
• Excellent communication skills to translate technical risks and evaluation results into insights for both technical and non-technical stakeholders.

Technologies We Use
• Cloud & Infrastructure: AWS (Bedrock, SageMaker, Lambda), Azure AI, Kubernetes (EKS), Terraform, ArgoCD.
• LLMs & Evaluation: Hugging Face, OpenAI API, Anthropic, LangChain, LlamaIndex, Ragas, DeepEval, OpenAI Evals.
• Observability & Guardrails: Langfuse, Guardrails AI.
• Backend & Data: Python (primary), Elasticsearch, Kafka, Airflow.
• DevOps & Automation: GitHub Actions, CodePipeline.

What You Can Expect
• Wor…


