Might work in some situations, but it gets pretty brutal.
Assuming a buffer with data packed in an appropriate representation, we can do a fast 64x32 pixel (lores) fullscreen draw with a series of eight 16x16 sprite draws in 21 cycles:
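A minimal sketch of such a draw, not necessarily the original listing. It assumes a hypothetical 256-byte region labeled `buffer` holding eight 16x16 sprites (32 bytes each) stored left to right, top row first, and it assumes the screen should be wiped first because sprites XOR onto the display. The exact cycle count depends on how much of the setup you can share with surrounding code.

```
: draw-buffer
	clear                   # sprites XOR, so start from a blank screen
	i  := buffer            # hypothetical label: 8 sprites * 32 bytes
	v0 := 0                 # v0-v3 double as x and y coordinates
	v1 := 16
	v2 := 32                # also the size of one sprite in bytes
	v3 := 48
	sprite v0 v0 0  i += v2 # top row: x = 0, 16, 32, 48 at y = 0
	sprite v1 v0 0  i += v2
	sprite v2 v0 0  i += v2
	sprite v3 v0 0  i += v2
	sprite v0 v1 0  i += v2 # bottom row: same x positions at y = 16
	sprite v1 v1 0  i += v2
	sprite v2 v1 0  i += v2
	sprite v3 v1 0
;
```

Reusing the coordinate registers as the index stride (v2 is both x = 32 and the 32-byte step) keeps the setup short.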
If you fill the v-registers with zeroes and take advantage of chip-8's auto-incrementing behavior on save, you can clear the buffer in 21 cycles:
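Something along these lines, as a sketch rather than the original code. `zeroes` and `buffer` are hypothetical labels; `zeroes` is 16 zero bytes used to fill the registers in one `load`, and the routine relies on `save` advancing `i`, so it will not work if the load/store quirk is enabled.

```
: zeroes
	0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

: clear-buffer
	i := zeroes  load vf    # v0-vf are now all zero
	i := buffer             # hypothetical 256-byte buffer
	save vf  save vf  save vf  save vf   # each save writes 16 bytes
	save vf  save vf  save vf  save vf   # and advances i by 16
	save vf  save vf  save vf  save vf
	save vf  save vf  save vf  save vf
;
```

Sixteen saves of sixteen registers cover all 256 bytes without ever touching `i` again.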
Bulk operations have to touch every byte, so the best case for inverting the pixels stored in the buffer is around 296 cycles, with aggressive loop unrolling:
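For illustration, here is a compact looped version rather than the unrolled one, so it costs noticeably more cycles than the figure above. It is only a sketch: `buffer` is a hypothetical 256-byte label, and the chunk size of 8 is chosen so the 8-bit offset wraps back to zero exactly when the buffer is done. Since `load` advances `i`, the index has to be rebuilt before each `save`.

```
: invert-buffer
	v8 := 0                 # byte offset into the buffer
	v9 := 0xFF              # xor mask, kept out of vf on purpose
	loop
		i := buffer  i += v8  load v7
		v0 ^= v9  v1 ^= v9  v2 ^= v9  v3 ^= v9
		v4 ^= v9  v5 ^= v9  v6 ^= v9  v7 ^= v9
		i := buffer  i += v8  save v7
		v8 += 8             # wraps to 0 after 32 chunks
		if v8 != 0 then
	again
;
```

Unrolling trades the offset register and the loop overhead for hard-coded addresses per chunk, which is where the savings come from.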
Merging together multiple buffers is even nastier, since you'll need to manually ping-pong the index register from buffer to buffer. Making use of the flag registers (xo-chip has 16 of 'em) as a lookaside buffer could potentially help.
The main thing I've papered over here is the byte layout. Plotting a pixel at a given x/y position in the buffer is fairly complex for the above routines, since 16x16 sprites alternate columns each byte and the sprites themselves are laid out in another zigzag pattern. There are many arrangements possible that would help or hinder different parts of a sprite drawing routine, but I think the result will be rather complicated any way you slice it. Using 8x15 sprites instead of 16x16 could help, but then drawing will be somewhat slower. Everything's a tradeoff.
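To give a feel for the cost, here is a sketch of a plot routine under one assumed layout, which may differ from the one in the linked examples: eight 16x16 sprites stored left to right, top row first, each sprite holding 16 rows of two bytes (left half, then right half). The labels `buffer` and `bit-masks` are hypothetical. The byte offset works out to (y/16)*128 + (x/16)*32 + (y%16)*2 + (x/8)%2.

```
: bit-masks
	0x80 0x40 0x20 0x10 0x08 0x04 0x02 0x01

# set the pixel at v0 = x (0..63), v1 = y (0..31).
# clobbers v0-v3 and vf.
: plot
	v2 := 16    v2 &= v1                 # sprite row (y / 16)...
	v2 <<= v2   v2 <<= v2   v2 <<= v2    # ...times 128
	v3 := 0x30  v3 &= v0    v3 <<= v3    # sprite column (x / 16) times 32
	v2 += v3
	v3 := 15    v3 &= v1    v3 <<= v3    # row within the sprite, 2 bytes each
	v2 += v3
	v3 := 8     v3 &= v0
	v3 >>= v3   v3 >>= v3   v3 >>= v3    # 0 = left byte, 1 = right byte
	v2 += v3                             # v2 = byte offset into buffer
	v3 := 7     v3 &= v0                 # bit position within that byte
	i := bit-masks  i += v3  load v0
	v3 := v0                             # v3 = single-bit mask
	i := buffer     i += v2  load v0
	v0 |= v3
	i := buffer     i += v2  save v0
;
```

That is around thirty instructions for one pixel, which is the price of a layout tuned for fast fullscreen draws.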
The complete examples are here: http://johnearnest.github.io/Octo/index.html?key=A6NUST1P