Yeah, it's a browser limitation, I'm afraid. To start an audio context playing, you must kick it off as a direct result of a user input action. If the audio subsystem hasn't been set up yet, Octo will try to start it in response to keyboard or touch input.
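For anyone curious what that looks like in practice, here is a rough sketch of the general pattern. This is not Octo's actual code: `unlockAudioOnInput`, `ResumableAudio`, and `InputSource` are names made up for illustration, and the list of events is an assumption. The only real API it leans on is that a Web Audio `AudioContext` exposes a `state` property and a `resume()` method that returns a promise.

```typescript
// Minimal shapes, so the sketch does not depend on DOM typings.
// A real AudioContext satisfies ResumableAudio; window satisfies InputSource.
interface ResumableAudio {
  state: string;
  resume(): Promise<void>;
}

interface InputSource {
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

// Hypothetical helper: resume the audio context on the first user input,
// then stop listening. The event names are an assumed default.
function unlockAudioOnInput(
  audio: ResumableAudio,
  source: InputSource,
  events: string[] = ["keydown", "touchstart", "pointerdown"]
): void {
  const unlock = () => {
    // resume() is called from inside the input handler, which is what
    // browsers require before they allow sound to start.
    if (audio.state !== "running") {
      void audio.resume();
    }
    // One attempt is enough for this sketch, so detach every listener.
    for (const e of events) {
      source.removeEventListener(e, unlock);
    }
  };
  for (const e of events) {
    source.addEventListener(e, unlock);
  }
}
```

In a page you would call something like `unlockAudioOnInput(new AudioContext(), window)` once at startup. Until the first key press or touch, the context stays suspended and nothing is audible.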
Yes, I could. I thought about adding another "Press any key" prompt to the first screen (which has no sound, and currently just transitions after a set time). But then I'd have to redesign that screen to make room for the extra text. Ah well, you have to know when to stop sometimes ;) This is not an issue with Octo or my program, just a minor annoyance that comes with running it in a browser. That's okay with me.