OPTIMIZING TRANSFORMER ARCHITECTURES FOR NATURAL LANGUAGE PROCESSING