From a statistical perspective, I totally agree with you. Most cases can probably be reduced to something very space-efficient, since most songs in this representation are essentially sparse matrices.
What makes the password approach less appealing to me personally is that you could still build pathological examples where such a compression scheme saves very little. In other words, passwords would have variable length and, in the worst case, would probably be as long as the full tune (modulo some base64 or similar encoding).
It also becomes less practical if I implement the option to compose longer tunes.
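
To make the worst case concrete, here is a rough sketch. It assumes a hypothetical 16x16 on/off grid and uses zlib plus URL-safe base64 as a stand-in for whatever compression scheme would actually be chosen; the grid size and the function names are made up for illustration and are not taken from the real implementation.

```python
import base64
import random
import zlib

STEPS, NOTES = 16, 16  # hypothetical grid size, not the app's real dimensions


def pack(grid):
    """Pack a grid of 0/1 cells into bytes, 8 cells per byte (row-major)."""
    bits = [cell for row in grid for cell in row]
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def raw_code(grid):
    """Uncompressed baseline: the packed tune, base64-encoded."""
    return base64.urlsafe_b64encode(pack(grid)).decode("ascii")


def password(grid):
    """Compressed variant: zlib first, then base64 to keep it typeable."""
    return base64.urlsafe_b64encode(zlib.compress(pack(grid), 9)).decode("ascii")


# A typical sparse tune: one note on every fourth step.
sparse = [[0] * NOTES for _ in range(STEPS)]
for step in range(0, STEPS, 4):
    sparse[step][0] = 1

# A pathological tune: every cell chosen at random (fixed seed for repeatability).
rng = random.Random(0)
dense = [[rng.randint(0, 1) for _ in range(NOTES)] for _ in range(STEPS)]

for name, grid in (("sparse", sparse), ("dense", dense)):
    print(name, "raw:", len(raw_code(grid)), "compressed:", len(password(grid)))
```

In this sketch the sparse tune yields a noticeably shorter password than its raw encoding, while the random tune does not shrink at all: the compressed form ends up at least as long as the raw one because of the compressor's own framing overhead. Doubling `STEPS` to model a longer tune doubles the raw size, and the pathological case grows right along with it.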