Past projects

Radar plot showing STEMerald subject-level accuracy across STEM domains

LLM fine-tuning, alignment, evaluation

STEMerald: A Gemma-Based Course Assistant

A compact Gemma-based STEM assistant can gain useful domain competence through alignment, while remaining small enough to run as a quantized 2GB model.

Reached 75% accuracy on university-level STEM multiple-choice QA and compressed the model to a 2GB 4-bit variant.

Wordle reinforcement learning results comparing training methods and dataset quality

Algorithm analysis, environment design, model evaluation

Offline Reinforcement Learning with LLMs

Reward-weighted behavioral cloning offers a simple way to bridge imitation learning and reward filtering in short-horizon language environments.

Studied Reward-Weighted Behavioral Cloning as a practical bridge between Behavioral Cloning (BC) and Filtered-BC.
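To make the "bridge" idea concrete, here is a minimal sketch of how the three methods can be viewed as one weighted imitation loss that differs only in the per-trajectory weights. The function names, the exponential weighting, and the `beta` and `threshold` parameters are illustrative assumptions, not the project's actual implementation.

```python
import math

def trajectory_weights(rewards, mode="reward_weighted", beta=1.0, threshold=1.0):
    """Return normalized per-trajectory weights for a weighted BC loss.

    Hypothetical helper for illustration; the exponential form and the
    beta/threshold parameters are assumptions, not taken from the project.
    """
    if mode == "bc":
        # Plain behavioral cloning: every trajectory counts equally.
        raw = [1.0 for _ in rewards]
    elif mode == "filtered":
        # Filtered-BC: keep only trajectories at or above a reward threshold.
        raw = [1.0 if r >= threshold else 0.0 for r in rewards]
    elif mode == "reward_weighted":
        # Reward-weighted BC: soft weighting by exponentiated reward.
        # Subtracting the max keeps the exponentials numerically stable.
        best = max(rewards)
        raw = [math.exp((r - best) / beta) for r in rewards]
    else:
        raise ValueError(f"unknown mode: {mode}")
    total = sum(raw)
    if total == 0:
        raise ValueError("no trajectory passes the filter")
    return [w / total for w in raw]

def weighted_bc_loss(log_probs, weights):
    """Weighted negative log-likelihood over trajectories."""
    return -sum(w * lp for w, lp in zip(weights, log_probs))
```

Under these assumptions, a very large `beta` makes the reward-weighted weights nearly uniform (recovering plain BC), while a very small `beta` concentrates almost all weight on the highest-reward trajectories, which matches Filtered-BC when the threshold is set at the top reward.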

Confusion matrix for the best stance detection model on SemEval

LLM fine-tuning, LoRA experiments, domain generalization analysis

LLMs for Argument Stance Detection

Careful LoRA fine-tuning can make open LLMs reliable stance classifiers, with Mistral-7B giving the strongest cross-dataset generalization.

Mistral-7B achieved strong cross-dataset results, including 0.76 F1 on SemEval and 0.92 F1 on IBM-Debater in the combined setting.