Applied Scientist I (m/f/x)
Design LLM evaluation pipelines that measure reasoning and factual accuracy. A PhD in CS or AI is required. The role offers flexible hybrid work and comprehensive benefit plans.
Requirements
- PhD in Computer Science, Artificial Intelligence, Machine Learning, or related field
- Research or hands-on experience with large language models, NLP evaluation, or agent-based AI systems
- Strong understanding of LLM performance measurement, prompt evaluation, and reliability testing
- Proficiency in Python and familiarity with ML libraries such as PyTorch, Transformers, and LangChain
- Comfort with experimental design, data analysis, and communicating technical findings clearly
- Experience with LLM evaluation frameworks (e.g., OpenAI Evals, HELM, LM Evaluation Harness, or custom auto-eval tools)
- Familiarity with retrieval-augmented generation (RAG), tool-using agents, or agentic evaluation methodologies
- Experience in cloud-based ML development (AWS, Azure, or GCP)
- Record of publications or preprints in top-tier venues (e.g., NeurIPS, ACL, EMNLP, ICLR) or equivalent research contributions
- Interest in Responsible AI, fairness, and interpretability research
Tasks
- Design and execute evaluation pipelines for LLMs and agentic systems
- Assess reasoning, factual accuracy, and alignment
- Build tools for automatic evaluation and synthetic dataset creation
- Implement LLM-as-a-judge workflows and continuous benchmarking systems
- Collaborate with applied scientists, ML engineers, and product managers
- Translate evaluation results into model improvements and product insights
- Prototype new evaluation metrics and contribute to internal reports
- Support publications and presentations on evaluation methods
- Promote reproducibility, transparency, and ethical AI evaluation
Work Experience
- Approx. 1-4 years
Education
- Master's degree
Languages
- English – Business Fluent
Tools & Technologies
- Python
- PyTorch
- Transformers
- LangChain
- OpenAI Evals
- HELM
- LM Evaluation Harness
- AWS
- Azure
- GCP
Benefits
Flexible Working
- Flexible hybrid working environment
- Flexible work arrangements
Other Benefits
- Supportive workplace policies
- Comprehensive benefit plans
More Vacation Days
- Flexible vacation
- Paid volunteer days off
Mental Health Support
- Mental health days off
- Access to Headspace app
- Resources for mental wellbeing
Retirement Plans
- Retirement savings
Additional Allowances
- Tuition reimbursement
- Resources for financial wellbeing
Bonuses & Incentives
- Employee incentive programs
Healthcare & Fitness
- Resources for physical wellbeing
Social Impact
- Pro-bono consulting opportunities
Sustainability Focus
- Environmental, Social, and Governance initiatives
About the Company
Thomson Reuters
Industry
Media
Description
The company provides trusted content and technology for professionals in legal, tax, accounting, compliance, government, and media.