
Hi Superfury,

it's good to see that you're still maintaining the project, but the issues I had six months ago still seem unsolved. In the meantime I bought a PSP-2000, so it's definitely no longer a matter of memory limitations. 

I did a test using the same approach as last time - using files that are proven to work with the PC version of UniPCemu. This time, it was the PC-XT BIOS 3.1 linked from the Itch page, with CGA mode forced (so that no additional files would be required). I changed no other settings, other than enabling the clock speed display.

Again, this approach works brilliantly on PC - and does nothing on PSP. The CPU clock speed drops to 0% and the screen stays black - even after minutes of waiting.

I just ran the current commit version, which has some optimizations and speed improvements (up to 75% more speed, purely from optimizing DMA clocking and non-rendering audio channels, mostly the MIDI synth's 24 idle note channels).

When in cycle-accurate CPU mode, it's indeed very slow (a constant 0%, although running). It has a green outline, so it's actually running (but at less than 1% speed).

In IPS clocking mode, it's more than twice as fast, as measured on an i7-4790K: ~35% in cycle-accurate mode versus ~80% in IPS clocking mode (at 315 KIPS, its default setting).

In IPS clocking mode, I do see the speed percentage going from 0% to 1% and back every few seconds. So it's definitely running, but very slowly.

Can confirm, even with the release version I can see the CPU speed hovering at 0-1% when set to IPS clocking mode. So it is running. After leaving the PSP alone in a corner for several minutes, it does display the beginnings of a launching BIOS.

Other than switching to IPS, are there any settings that I can change to get it somewhere close to being usable?

Well, you can lower the cycles setting to make it run closer to realtime (fewer cycles = closer to realtime).

But that of course comes at the cost of the emulated software slowing down, as the number of instructions executed each second (in emulated time, not realtime) goes down.
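To put rough numbers on that trade-off, here is a small sketch in C. It is a simplified model and not UniPCemu code: the function name is made up, and it assumes the host can emulate a fixed number of instructions per real second, ignoring the hardware overhead that scales with emulated time.

```c
#include <stdint.h>

/* Hypothetical model, not taken from UniPCemu.
   host_ips:    instructions the host manages to emulate per real second.
   ips_setting: instructions per emulated second requested by the cycles setting.
   Returns the realtime percentage, capped at 100. */
unsigned realtime_percent(uint64_t host_ips, uint64_t ips_setting)
{
    if (ips_setting == 0)
        return 0; /* avoid dividing by zero on a nonsensical setting */

    uint64_t pct = host_ips * 100u / ips_setting;
    return pct > 100u ? 100u : (unsigned)pct;
}
```

In this model a host that only manages 3150 instructions per real second shows 1% at the 315 KIPS default and 10% at 31.5 KIPS, but the emulated machine is then also ten times slower for the software running on it.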

The main issue is that, according to most of the profiling I did on Windows builds, it's spending most of its time on memory accesses alone (RAM and ROM memory). I'm trying hard to optimize it to get faster, but that has proven difficult because no dynarec of any sort is implemented. The split is about ~20% CPU emulation (with the memory accesses themselves being the main culprit there), ~20% video card emulation and ~60% other hardware and timing overhead.

It's running better on Android (5%) and on the i7-4790K under Windows (~20%), but I have no idea what to change at the moment to make it faster. Most memory accesses are already reduced to single 16/32-bit accesses where possible, but most of them seem to be prefetching only (filling up the Prefetch Input Queue so that it contains the maximum instruction length for each instruction). That overhead is smaller in cycle-accurate mode (since it doesn't load unneeded bytes from memory most of the time), but that mode has its own optimization difficulties, because the hardware and CPU are ticked multiple times during an instruction (once for each CPU cycle the instruction takes).
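A minimal sketch of why topping up the prefetch queue before every instruction costs so many memory reads. This is not UniPCemu's BIU code: the struct, the function names and the 16-byte capacity are assumptions for illustration, and the 15-byte fill target used in the example below is the x86 maximum instruction length on 386 and later.

```c
#include <stdint.h>
#include <stddef.h>

#define PIQ_SIZE 16u /* assumed capacity; real CPUs differ per model */

typedef struct {
    uint8_t data[PIQ_SIZE];
    unsigned head;           /* index of the next byte to consume */
    unsigned count;          /* bytes currently queued */
    uint32_t fetch_addr;     /* next memory address to prefetch from */
    unsigned long mem_reads; /* byte reads issued to memory so far */
} Piq;

/* Empty the queue and restart prefetching at target (reset or jump). */
void piq_flush(Piq *q, uint32_t target)
{
    q->head = 0;
    q->count = 0;
    q->fetch_addr = target;
}

/* Top the queue up to want bytes; mem_size must be nonzero. */
void piq_fill(Piq *q, const uint8_t *mem, size_t mem_size, unsigned want)
{
    if (want > PIQ_SIZE)
        want = PIQ_SIZE;
    while (q->count < want) {
        unsigned tail = (q->head + q->count) % PIQ_SIZE;
        q->data[tail] = mem[q->fetch_addr % mem_size];
        q->fetch_addr++;
        q->count++;
        q->mem_reads++;
    }
}

/* Consume one instruction byte; the caller must ensure count > 0. */
uint8_t piq_pop(Piq *q)
{
    uint8_t b = q->data[q->head];
    q->head = (q->head + 1) % PIQ_SIZE;
    q->count--;
    return b;
}
```

With a 15-byte fill target, a one-byte instruction followed by a jump costs 15 reads for the first fill and another 15 after the flush, although only a couple of those bytes are ever decoded.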

Disabling some of the emulated hardware should theoretically free up some speed for the CPU to run faster (if it's enough), but the CPU is mostly ticking way more than the hardware anyway. Everything except the VGA and the CPU always ticks at 14.31818MHz base intervals. The VGA's rate depends on its state: 25/28MHz on VGA, the 14.31818MHz clock crystal on CGA, EGA and MDA, and even higher on SVGA when set up (up to Dosbox's ratings of the Tseng chips). The CPU acts as the base timing for all hardware: its elapsed time, which depends on its speed setting (the IPS cycle setting), is divided up into video card ticks and 14MHz ticks (with the Sound Blaster at 1MHz, derived by dividing the 14MHz base clock).
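One common way to derive device ticks from the CPU's elapsed time is an accumulator that carries the fractional tick over to the next call. The sketch below shows that general technique only; the type and function names are made up, and it is an assumption that UniPCemu's timing works along these lines.

```c
#include <stdint.h>

typedef struct {
    uint64_t hz;        /* device clock in Hz, e.g. 14318180 for 14.31818MHz */
    uint64_t remainder; /* leftover in units of (ns * Hz), always below 1e9 */
} ClockDivider;

/* Turn elapsed emulated nanoseconds into whole device ticks and keep the
   fraction for the next call. Hypothetical helper, not UniPCemu's API. */
uint64_t clock_advance(ClockDivider *c, uint64_t elapsed_ns)
{
    uint64_t total = c->remainder + elapsed_ns * c->hz;
    c->remainder = total % 1000000000ull;
    return total / 1000000000ull;
}
```

Every emulated instruction then calls something like this once per device clock, which is part of why the timing overhead grows with the number of instructions and clocks in play.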

Dosbox PSP does have a dynarec - code is here if you want to have a look.

Well, the issue with a dynarec is that it completely destroys any cycle-accuracy. I want to keep the emulator capable of that (and Dosbox doesn't have that issue, as it runs instructions in blocks anyway).


Currently the main thing limiting speed is the sheer number of memory and ROM read/write operations. That means that the BIU, and mostly instruction fetching (which is reading memory most of the time), is the main bottleneck. The memory reads themselves are already optimized to be only a single call when the address is byte/word/dword aligned: it then uses direct memory access to read/write data to/from the CPU cache or the write buffer (for writes only) using a simple pointer dereference. Unaligned dword data is handled as words if possible, bytes otherwise.
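The aligned/unaligned split described above could look roughly like this. It is a sketch under assumptions and not the emulator's actual code: the function names are invented, guest memory is a flat byte array, the host is little-endian (as x86 and the PSP are), and the caller guarantees that addr + 3 is in range.

```c
#include <stdint.h>
#include <string.h>

/* 16-bit read in a single access; addr must be even. */
uint16_t read16_aligned(const uint8_t *mem, uint32_t addr)
{
    uint16_t v;
    memcpy(&v, mem + addr, sizeof v); /* compiles to a plain load */
    return v;
}

/* 32-bit read: one access if dword aligned, two word accesses if only
   word aligned, four byte accesses otherwise. */
uint32_t read32(const uint8_t *mem, uint32_t addr)
{
    if ((addr & 3u) == 0) {
        uint32_t v;
        memcpy(&v, mem + addr, sizeof v);
        return v;
    }
    if ((addr & 1u) == 0) {
        return (uint32_t)read16_aligned(mem, addr)
             | ((uint32_t)read16_aligned(mem, addr + 2) << 16);
    }
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}
```

Even in the best case this is still one call and one load per access, so the cost is dominated by how often it is called and not by what each call does.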

But even with all that, the profiler still reports the memory accesses themselves being the hot path.