Imagine writing a book that mixed first-person and third-person narration. No one would know what was going on. You MUST be consistent: make a choice and stick to it.
If you choose first-person, that doesn't mean every room description has to start with 'I see'. For the scene in your screen grab, it might say 'A path winds through the woodland...' This is where the storyteller's art comes into play.
However, the description must be consistent with the image. The description says the path goes west, but in the image it clearly goes east, or perhaps north-east. So where's the west path? The line of lanterns follows only one path, presumably the north-east one. This is terribly confusing for the player.
All your images should face north. If one doesn't, the description needs to say so. Attention to detail is paramount.