Applied Scientist I (m/f/x)
Designing LLM evaluation pipelines for reasoning and factual accuracy. PhD in CS or AI required. Flexible hybrid work, comprehensive benefit plans.
Requirements
- PhD in Computer Science, Artificial Intelligence, Machine Learning, or related field
- Research or hands-on experience with large language models, NLP evaluation, or agent-based AI systems
- Strong understanding of LLM performance measurement, prompt evaluation, and reliability testing
- Proficiency in Python and familiarity with ML libraries such as PyTorch, Transformers, and LangChain
- Comfort with experimental design, data analysis, and communicating technical findings clearly
- Experience with LLM evaluation frameworks (e.g., OpenAI Evals, HELM, LM Harness, or custom auto-eval tools)
- Familiarity with retrieval-augmented generation (RAG), tool-using agents, or agentic evaluation methodologies
- Experience in cloud-based ML development (AWS, Azure, or GCP)
- Record of publications or preprints in top-tier venues (e.g., NeurIPS, ACL, EMNLP, ICLR) or equivalent research contributions
- Interest in Responsible AI, fairness, and interpretability research
Tasks
- Design and execute evaluation pipelines for LLMs and agentic systems
- Assess reasoning, factual accuracy, and alignment
- Build tools for automatic evaluation and synthetic dataset creation
- Implement LLM-as-a-judge workflows and continuous benchmarking systems
- Collaborate with applied scientists, ML engineers, and product managers
- Translate evaluation results into model improvements and product insights
- Prototype new evaluation metrics and contribute to internal reports
- Support publications and presentations on evaluation methods
- Promote reproducibility, transparency, and ethical AI evaluation
Work Experience
- approx. 1 - 4 years
Education
- Master's degree
Languages
- English – Business Fluent
Tools & Technologies
- Python
- PyTorch
- Transformers
- LangChain
- OpenAI Evals
- HELM
- LM Harness
- AWS
- Azure
- GCP
Benefits
Flexible Working
- Flexible hybrid working environment
- Flexible work arrangements
Other Benefits
- Supportive workplace policies
- Comprehensive benefit plans
More Vacation Days
- Flexible vacation
- Paid volunteer days off
Mental Health Support
- Mental Health Days off
- Access to Headspace app
- Resources for mental wellbeing
Retirement Plans
- Retirement savings
Additional Allowances
- Tuition reimbursement
- Resources for financial wellbeing
Bonuses & Incentives
- Employee incentive programs
Healthcare & Fitness
- Resources for physical wellbeing
Social Impact
- Pro-bono consulting opportunities
Sustainability Focus
- Environmental, Social, and Governance initiatives
About the Company
Thomson Reuters
Industry
Media
Description
The company provides trusted content and technology for professionals in legal, tax, accounting, compliance, government, and media.
Not a perfect match?
- Thomson Reuters
  Lead Applied Scientist I (m/f/x)
  Full-time, with home office, Senior, Zug
- Thomson Reuters
  Lead Applied Scientist - Legal Tech (m/f/x)
  Full-time, with home office, Senior, Zug
- Thomson Reuters
  Senior Applied Scientist, Knowledge Graphs and ML (m/f/x)
  Full-time, with home office, Senior, Zug
- Thomson Reuters
  Applied Scientist Intern (m/f/x)
  Full-time, Internship, with home office, Zug
- Thomson Reuters Enterprise Centre GmbH
  Lead Applied Scientist, NLP/GenAI (m/f/x)
  Full-time, with home office, Senior, Zug