Google Tunix Hack - Train a model to show its work
I post-trained Gemma 3 1B with Tunix (JAX) using GRPO so that its outputs reliably follow a structured format. The model was trained on math datasets (GSM8K, SVAMP, MultiArith) and QA datasets (SQuAD v1, Natural Questions), and both format compliance and answer correctness improved.
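To make the GRPO setup concrete, here is a minimal sketch of the two pieces such a run typically needs: a reward that scores format compliance and answer correctness, and the group-relative advantage that GRPO computes over several sampled completions for one prompt. This is not the Tunix API and not necessarily the reward used in this project. The `<reasoning>`/`<answer>` tag names, the reward weights, and all function names are hypothetical and chosen only for illustration.

```python
import re
from statistics import mean, pstdev

# Hypothetical tag names: the exact output format is an assumption.
FORMAT_RE = re.compile(
    r"\A\s*<reasoning>(.+?)</reasoning>\s*<answer>(.+?)</answer>\s*\Z",
    re.DOTALL,
)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion matches the assumed structure, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion) else 0.0


def answer_reward(completion: str, gold: str) -> float:
    """Return 1.0 if the text inside <answer> equals the gold answer exactly."""
    match = FORMAT_RE.match(completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(2).strip() == gold.strip() else 0.0


def total_reward(completion: str, gold: str,
                 w_format: float = 0.5, w_answer: float = 1.0) -> float:
    """Weighted sum of both rewards; the weights are illustrative assumptions."""
    return (w_format * format_reward(completion)
            + w_answer * answer_reward(completion, gold))


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one prompt's group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt ("What is 3 + 4?"), three sampled completions.
group = [
    "<reasoning>3 + 4 = 7</reasoning><answer>7</answer>",  # right format, right answer
    "<reasoning>3 + 4 = 8</reasoning><answer>8</answer>",  # right format, wrong answer
    "The answer is 7.",                                    # no structure at all
]
rewards = [total_reward(c, gold="7") for c in group]
advantages = group_advantages(rewards)
```

Under these assumptions, the well-formatted correct completion gets a positive advantage and the unstructured one the most negative, so the policy update pushes toward outputs that are both structured and correct. Because advantages are computed relative to the group, no separate value model is needed.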