We are pleased to announce that our article "MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training" has been accepted for publication in the journal Transactions on Machine Learning Research (TMLR; 06/2025).
In this work, we present Math Mutator (MAMUT), a novel method for automatically enriching mathematical datasets. Starting from a given LaTeX formula (e.g., (a+b)^2 = a^2 + 2ab + b^2), MAMUT generates new, mathematically equivalent variants with altered notation and variables (e.g., c^2+a^2+2*a*c=(a+c)^2). In addition, MAMUT can generate deliberately incorrect variants by subtly modifying individual terms (e.g., replacing b^2 with 2^b), yielding a similar-looking but mathematically non-equivalent formula. Our experiments show that language models trained on data generated by MAMUT develop a stronger mathematical understanding.
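To give a feel for the two transformation types, here is a minimal Python sketch (not the actual MAMUT implementation; all function names are hypothetical). It renames variables consistently to produce an equivalent variant, swaps a base and exponent to produce a false variant, and checks equivalence numerically on random inputs:

```python
import random

def is_equivalent(lhs, rhs, variables, trials=20, seed=1):
    """Numerically test whether two expressions agree on random inputs.
    Expressions are written in Python syntax ('**' for powers)."""
    rng = random.Random(seed)
    for _ in range(trials):
        env = {v: rng.uniform(0.5, 3.0) for v in variables}
        if abs(eval(lhs, {}, env) - eval(rhs, {}, env)) > 1e-9:
            return False
    return True

def rename_variables(expr, mapping):
    """Consistently rename variables, e.g. {'a': 'c', 'b': 'a'}."""
    return "".join(mapping.get(ch, ch) for ch in expr)

# Running example from the text: (a+b)^2 = a^2 + 2ab + b^2,
# written in Python syntax.
lhs, rhs = "(a+b)**2", "a**2 + 2*a*b + b**2"

# Equivalent variant: rename a -> c, b -> a.
var_map = {"a": "c", "b": "a"}
new_lhs = rename_variables(lhs, var_map)   # (c+a)**2
new_rhs = rename_variables(rhs, var_map)   # c**2 + 2*c*a + a**2

# False variant: replace b**2 with 2**b (a subtle base/exponent swap).
false_rhs = rhs.replace("b**2", "2**b")
```

The numeric check is a stand-in for symbolic verification: the renamed variant still passes it, while the falsified right-hand side almost surely fails for random inputs.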
The full article is available here: openreview.net/forum