Thank you for the encouraging words - you are far too kind!
Regarding the implementation: depending on how complex your scene is (in terms of Actors, animating parts, tile count, etc.), either of the methods you mention could work. In my case (quite a complex scene with lots of animating parts), I found that "spawning" the notes via a script worked best. The sound effect is actually very simple: I detect when a note collides with an invisible collider just below the buttons, and if a collision happens, I play the sound effect.
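To make the collider idea a bit more concrete, here is a rough, engine-agnostic sketch in Python. It is not the actual project code: `Rect`, `Game`, `spawn_note`, `play_sound`, the sizes, and the fall speed are all made-up names and values for illustration, and `play_sound` only counts calls where a real engine would trigger the effect.

```python
from dataclasses import dataclass, field


@dataclass
class Rect:
    """Axis-aligned box; x/y is the top-left corner, y grows downward."""
    x: float
    y: float
    w: float
    h: float

    def overlaps(self, other: "Rect") -> bool:
        # Standard AABB test: the boxes overlap on both axes.
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)


@dataclass
class Game:
    collider: Rect               # invisible box placed just below the buttons
    fall_speed: float = 4.0      # pixels per update (assumed value)
    notes: list = field(default_factory=list)
    sounds_played: int = 0

    def spawn_note(self, lane_x: float) -> None:
        # "Spawn" a note from script at the top of the given lane.
        self.notes.append(Rect(lane_x, 0.0, 8.0, 8.0))

    def play_sound(self) -> None:
        # Stand-in for the engine's real sound-effect call.
        self.sounds_played += 1

    def update(self) -> None:
        # Move every note down; a note that touches the collider
        # triggers the sound and is removed from the scene.
        remaining = []
        for note in self.notes:
            note.y += self.fall_speed
            if note.overlaps(self.collider):
                self.play_sound()
            else:
                remaining.append(note)
        self.notes = remaining


game = Game(collider=Rect(0, 100, 64, 8))
game.spawn_note(16)
for _ in range(24):
    game.update()
print(game.sounds_played)
```

Removing the note on contact means the sound fires only once per note; if the note should stay visible after passing the buttons, a per-note "already triggered" flag would do the same job.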
I hope this helps!