Applied Scientist I (m/f/x)
Designing LLM evaluation pipelines for reasoning and factual accuracy. PhD in CS or AI required. Flexible hybrid work, comprehensive benefit plans.
Requirements
- PhD in Computer Science, Artificial Intelligence, Machine Learning, or related field
- Research or hands-on experience with large language models, NLP evaluation, or agent-based AI systems
- Strong understanding of LLM performance measurement, prompt evaluation, and reliability testing
- Proficiency in Python and familiarity with ML libraries such as PyTorch, Transformers, and LangChain
- Comfort with experimental design, data analysis, and communicating technical findings clearly
- Experience with LLM evaluation frameworks (e.g., OpenAI Evals, HELM, LM Harness, or custom auto-eval tools)
- Familiarity with retrieval-augmented generation (RAG), tool-using agents, or agentic evaluation methodologies
- Experience in cloud-based ML development (AWS, Azure, or GCP)
- Record of publications or preprints in top-tier venues (e.g., NeurIPS, ACL, EMNLP, ICLR) or equivalent research contributions
- Interest in Responsible AI, fairness, and interpretability research
Responsibilities
- Design and execute evaluation pipelines for LLMs and agentic systems
- Assess reasoning, factual accuracy, and alignment
- Build tools for automatic evaluation and synthetic dataset creation
- Implement LLM-as-a-judge workflows and continuous benchmarking systems
- Collaborate with applied scientists, ML engineers, and product managers
- Translate evaluation results into model improvements and product insights
- Prototype new evaluation metrics and contribute to internal reports
- Support publications and presentations on evaluation methods
- Promote reproducibility, transparency, and ethical AI evaluation
Professional experience
- Approx. 1-4 years
Education
- Master's degree
Languages
- English (business fluent)
Tools & Technologies
- Python
- PyTorch
- Transformers
- LangChain
- OpenAI Evals
- HELM
- LM Harness
- AWS
- Azure
- GCP
Benefits
Flexible work
- Flexible hybrid working environment
- Flexible work arrangements
Other perks
- Supportive workplace policies
- Comprehensive benefit plans
Additional time off
- Flexible vacation
- Paid volunteer days off
Mental health support
- Mental health days off
- Access to Headspace app
- Resources for mental wellbeing
Company retirement plan
- Retirement savings
Other allowances
- Tuition reimbursement
- Resources for financial wellbeing
Bonuses & incentives
- Employee incentive programs
Health & fitness offerings
- Resources for physical wellbeing
Community engagement
- Pro-bono consulting opportunities
Focus on sustainability
- Environmental, Social, and Governance initiatives
About the company
Thomson Reuters
Industry
Media
Description
The company provides trusted content and technology for professionals in legal, tax, accounting, compliance, government, and media.