Research Officer (Large Language Models & Data Engineering), A*STAR BII


A*STAR RESEARCH ENTITIES
Posted date: 16 days ago
Minimum level: N/A
Job category: IT
We are a project team of researchers from SingHealth, A*STAR, and Synapxe developing a product to improve the management and treatment of patients with lower respiratory tract infections (LRTIs). LRTIs account for nearly 489 million cases globally and are a major contributor to inappropriate antibiotic use in Singapore.

Over 40% of antibiotic prescriptions for LRTIs are unnecessary, contributing to antimicrobial resistance (AMR), a top global health threat. Traditional stewardship programs are expert-dependent and resource-intensive, which limits their scalability.

To overcome these challenges, we are building an AI-powered Large Language Model platform that uses routinely collected clinical data to help clinicians identify cases that do not require antibiotics, thereby reducing unnecessary prescriptions.

Role:

As a Research Officer, you will play a central role in integrating Large Language Models (LLMs) for intelligent processing of unstructured clinical text, such as physician notes, discharge summaries, and radiology reports. Your work will help deliver explainable, scalable, real-time AI recommendations that assist clinicians at the point of care for lower respiratory tract infections.

Key Responsibilities:
  • Contribute to LLM development, focusing on enhancing unstructured data processing for clinical and biomedical applications.
  • Fine-tune and train LLMs (e.g., LLaMA, Mistral, Phi, and the GPT family) using supervised and instruction-based datasets.
  • Design and implement pipelines for data cleaning, preprocessing, and tokenisation of large-scale text corpora.
  • Integrate retrieval-augmented generation (RAG) and knowledge graph components for domain adaptation.
  • Evaluate model performance using BLEU, ROUGE, BERTScore, and factual consistency metrics.
  • Develop optimised PEFT/LoRA/QLoRA fine-tuning frameworks for efficiency on GPU clusters.
  • Collaborate with researchers to design experiments, interpret results, and publish findings.
  • Maintain reproducible codebases, documentation, and experiment logs.

Requirements:
  • Master's or Bachelor's degree in computer science, data science, artificial intelligence, computational linguistics, or a related discipline, with at least 5 years of relevant experience.
  • Strong experience in Natural Language Processing (NLP), transformer-based models, and text generation.
  • Proficiency in Python, PyTorch, Hugging Face Transformers, and LLM fine-tuning libraries (e.g., PEFT, DeepSpeed, bitsandbytes).
  • Experience with text and data processing, including annotation, tokenisation, and augmentation.
  • Familiarity with vector databases (FAISS, Qdrant) and RAG pipelines.
  • Understanding of GPU-based training, distributed model optimisation, and experiment tracking (e.g., MLflow, W&B).
  • Strong analytical, communication, and collaborative skills.

Preferred Experience:
  • Research experience with open-weight LLMs or domain-specific adaptation (e.g., biomedical).
  • Experience with multi-agent frameworks, prompt engineering, or LLM safety evaluation.
  • Familiarity with cloud computing (AWS, Azure) or on-prem GPU clusters.

What We Offer:
  • Opportunity to work on cutting-edge LLM research with measurable real-world impact.
  • Access to high-performance GPU infrastructure and interdisciplinary collaborations.
  • Mentorship and opportunities for publication, conference presentation, and project leadership.

The above eligibility criteria are not exhaustive. A*STAR may include additional selection criteria based on its prevailing recruitment policies. These policies may be amended from time to time without notice. We regret that only shortlisted candidates will be notified.
JOB SUMMARY
Singapore
Contract / Freelance / Self-employed