Since it is machine learning, there is always a chance that it learns something weird under the hood that hurts its efficiency. It also won't learn something perfectly the first time it tries it (you may have to correct it on what "pink" means over several rounds).
It identifies beakers as left-most and right-most. It doesn't have vocabulary for that, but that is the mechanism it selects by, so knowing it may help you describe what you want it to do.
Also, in principle, you can have it practice commands. It only moves you ahead when the puzzle is complete, but you can give it any arbitrary command and tell it "good job" for getting it right (e.g. the puzzle suggests turning the beaker pink, but you tell it to turn it yellow instead, then praise it when it does).
Beyond that, using fewer characters helps improve loading times, and consistency is PARAMOUNT. Even capitalized vs. uncapitalized text looks different to it.
Good luck!