Applied Scientist, NLP/GenAI (m/f/x)
Developing AI pipelines for legal document understanding, knowledge extraction, and synthetic data generation at a legal, tax, and media content provider. 3+ years building/deploying deep learning/LLM-based document understanding systems required. Work from anywhere for up to 8 weeks per year.
Requirements
- PhD in Computer Science, AI, NLP, or related field, or Master's with equivalent research/industry experience
- 3+ years experience building/deploying document understanding, information extraction, or knowledge graph systems (deep learning, LLMs, NLP)
- Ability to translate complex document understanding problems into innovative AI applications
- Professional experience scaling and leading in applied research
- Strong programming skills (Python) and modern deep learning frameworks experience
- Publications at relevant venues (ACL, EMNLP, ICLR, NeurIPS, SIGIR, KDD)
- Deep understanding of document understanding fundamentals (layout analysis, semantic chunking, classification, taxonomies, multi-label, domain schemas)
- Expertise in knowledge extraction and knowledge graph construction (entity recognition, relation extraction, citation parsing, graph representations)
- Expertise in LLM-based information extraction, few-shot/multi-task learning, post-training, knowledge distillation
- Solid understanding of synthetic data generation techniques for NLP (query-answer generation, data augmentation)
- Solid understanding of efficiency optimization (knowledge distillation, model compression, SLM-based solutions)
- Solid understanding of DL/ML approaches for NLP tasks
- Experience designing annotation workflows, creating labeled datasets, and developing evaluation frameworks
- Prior work on legal document understanding, information extraction, knowledge representation (legal citations, domain concepts), or legal AI applications
- Prior work handling complex legal document structures (non-uniform formatting, nested hierarchies, cross-references, embedded elements)
- Experience building systems for analysis, question answering, or retrieval across large document collections
- Experience with knowledge graph frameworks/methodologies for legal or enterprise applications
- Understanding of RAG and agentic workflows for enterprise knowledge
- Experience with AzureML or AWS SageMaker
Tasks
- Develop and deploy AI solutions for legal document understanding.
- Develop advanced models for semantic chunking of lengthy legal documents.
- Build document enrichment systems for classification and rich metadata extraction.
- Create LLM-based pipelines for extracting and linking legal knowledge.
- Develop scalable synthetic data generation systems.
- Support model training with synthetic data.
- Simulate complex legal research queries.
- Generate hallucination-free answers.
- Collaborate with engineering for software delivery and reliability.
- Develop comprehensive data and evaluation strategies.
- Leverage human annotation and synthetic data for evaluation.
- Apply robust training and evaluation methodologies.
- Balance model performance with latency requirements for SLM solutions.
- Apply knowledge distillation to compress models into efficient SLMs.
- Determine appropriate architectures for challenging document understanding.
- Develop semantic chunking strategies for diverse documents.
- Design document classification approaches for legal taxonomies.
- Implement LLM-based knowledge extraction methods.
- Build multi-document reasoning architectures.
- Balance accuracy, efficiency, and scalability for real-world challenges.
- Partner with Engineering and Product to translate legal challenges into AI applications.
- Engage stakeholders to understand use case requirements.
- Align document understanding capabilities with business needs.
- Maintain scientific and technical expertise in relevant areas.
Work Experience
- 3+ years
Education
- Master's degree
Languages
- English – Business Fluent
Tools & Technologies
- Python
- PyTorch
- Hugging Face Transformers
- DeepSpeed
- LLMs
- SLM
- RAG
- AzureML
- AWS SageMaker
Benefits
Flexible Working
- Flexible hybrid work environment
- Flex My Way policies
- Flexible work arrangements
Workation & Sabbatical
- Work from anywhere for up to 8 weeks per year
Family Support
- Work-life balance
Learning & Development
- Culture of continuous learning
- Skill development
- Grow My Way programming
Other Benefits
- Skills-first approach
More Vacation Days
- Flexible vacation
Mental Health Support
- Two company-wide Mental Health Days off
- Access to Headspace app
- Resources for mental wellbeing
Retirement Plans
- Retirement savings
Additional Allowances
- Tuition reimbursement
- Resources for financial wellbeing
Bonuses & Incentives
- Employee incentive programs
Healthcare & Fitness
- Resources for physical wellbeing
Social Impact
- Two paid volunteer days off annually
- Pro-bono consulting project opportunities
Sustainability Focus
- ESG initiative involvement opportunities
About the Company
Thomson Reuters
Industry
Legal
Description
The company provides trusted content and technology for professionals in legal, tax, accounting, compliance, government, and media.