Applied Scientist, NLP/GenAI (m/f/x)
Developing AI pipelines for legal document understanding, knowledge extraction, and synthetic data generation at a legal, tax, and media content provider. 3+ years building/deploying deep learning/LLM-based document understanding systems required. Work from anywhere for up to 8 weeks per year.
Requirements
- PhD in Computer Science, AI, NLP, or related field, or Master's with equivalent research/industry experience
- 3+ years of experience building/deploying document understanding, information extraction, or knowledge graph systems (deep learning, LLMs, NLP)
- Ability to translate complex document understanding problems into innovative AI applications
- Professional experience scaling and leading in applied research
- Strong programming skills (Python) and experience with modern deep learning frameworks
- Publications at relevant venues (ACL, EMNLP, ICLR, NeurIPS, SIGIR, KDD)
- Deep understanding of document understanding fundamentals (layout analysis, semantic chunking, classification, taxonomies, multi-label, domain schemas)
- Expertise in knowledge extraction and knowledge graph construction (entity recognition, relation extraction, citation parsing, graph representations)
- Expertise in LLM-based information extraction, few-shot/multi-task learning, post-training, knowledge distillation
- Solid understanding of synthetic data generation techniques for NLP (query-answer generation, data augmentation)
- Solid understanding of efficiency optimization (knowledge distillation, model compression, SLM-based solutions)
- Solid understanding of DL/ML approaches for NLP tasks
- Experience designing annotation workflows, creating labeled datasets, and developing evaluation frameworks
- Prior work on legal document understanding, information extraction, knowledge representation (legal citations, domain concepts), or legal AI applications
- Prior work handling complex legal document structures (non-uniform formatting, nested hierarchies, cross-references, embedded elements)
- Experience building systems for analysis, question answering, or retrieval across large document collections
- Experience with knowledge graph frameworks/methodologies for legal or enterprise applications
- Understanding of RAG and agentic workflows for enterprise knowledge
- Experience with AzureML or AWS SageMaker
Tasks
- Develop and deploy AI solutions for legal document understanding.
- Develop advanced models for semantic chunking of lengthy legal documents.
- Build document enrichment systems for classification and rich metadata extraction.
- Create LLM-based pipelines for extracting and linking legal knowledge.
- Develop scalable synthetic data generation systems.
- Support model training with synthetic data.
- Simulate complex legal research queries.
- Generate hallucination-free answers.
- Collaborate with engineering for software delivery and reliability.
- Develop comprehensive data and evaluation strategies.
- Leverage human annotation and synthetic data for evaluation.
- Apply robust training and evaluation methodologies.
- Balance model performance with latency requirements for SLM solutions.
- Apply knowledge distillation to compress models into efficient SLMs.
- Determine appropriate architectures for challenging document understanding.
- Develop semantic chunking strategies for diverse documents.
- Design document classification approaches for legal taxonomies.
- Implement LLM-based knowledge extraction methods.
- Build multi-document reasoning architectures.
- Balance accuracy, efficiency, and scalability for real-world challenges.
- Partner with Engineering and Product to translate legal challenges.
- Engage stakeholders to understand use case requirements.
- Align document understanding capabilities with business needs.
- Maintain scientific and technical expertise in relevant areas.
Benefits
Flexible Working
- Flexible hybrid work environment
- Flex My Way policies
- Flexible work arrangements
Workation & Sabbatical
- Work from anywhere for up to 8 weeks per year
Family Support
- Work-life balance
Learning & Development
- Culture of continuous learning
- Skill development
- Grow My Way programming
Other Benefits
- Skills-first approach
More Vacation Days
- Flexible vacation
Mental Health Support
- Two company-wide Mental Health Days off
- Access to Headspace app
- Resources for mental wellbeing
Retirement Plans
- Retirement savings
Additional Allowances
- Tuition reimbursement
- Resources for financial wellbeing
Bonuses & Incentives
- Employee incentive programs
Healthcare & Fitness
- Resources for physical wellbeing
Social Impact
- Two paid volunteer days off annually
- Pro-bono consulting project opportunities
Sustainability Focus
- ESG initiative involvement opportunities
- Thomson Reuters: Full-time, Home office, Senior, Zug
- Thomson Reuters Enterprise Centre GmbH: Lead Applied Scientist, NLP/GenAI (m/f/x); Full-time, Home office, Senior, Zug
- Thomson Reuters: Lead Applied Scientist - Legal Tech (m/f/x); Full-time, Home office, Senior, Zug
- Thomson Reuters: Senior Applied Scientist, Knowledge Graphs and ML (m/f/x); Full-time, Home office, Senior, Zug
- Thomson Reuters: Applied Scientist Intern (m/f/x); Full-time, Internship, Home office, Zug
About the Company
Thomson Reuters
Industry
Legal
Description
The company provides trusted content and technology for professionals in legal, tax, accounting, compliance, government, and media.