We've realised that the biggest constraint is really the size of the pads. You're going to want a QSPI interface to some RAM and flash, so that's 7 or preferably 12 IOs. And then 4 or 8 IOs for actually doing something. By the time you've made that pad ring you might as well put at least enough RAM in it for registers.
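For what it's worth, here is one way the 7 and 12 figures could be tallied. This is a rough sketch: the signal breakdown (1 clock, 4 data lines, 1 chip select per device) is my assumption about what is being counted, and the function names are made up for illustration.

```python
# Assumed QSPI signal breakdown; the post only gives the totals 7 and 12.
QSPI_CLK = 1
QSPI_DATA = 4
QSPI_CS = 1  # one chip select per attached device


def shared_bus_pads(devices: int) -> int:
    """One QSPI bus shared by all devices, each with its own chip select."""
    return QSPI_CLK + QSPI_DATA + devices * QSPI_CS


def separate_bus_pads(devices: int) -> int:
    """A full, independent QSPI bus per device."""
    return devices * (QSPI_CLK + QSPI_DATA + QSPI_CS)


if __name__ == "__main__":
    # Two devices: one RAM and one flash.
    for name, mem in (("shared", shared_bus_pads(2)),
                      ("separate", separate_bus_pads(2))):
        for gpio in (4, 8):
            print(f"{name} bus: {mem} memory pads + {gpio} GPIO = {mem + gpio}")
```

Power, ground, clock and reset pads are not counted here, so the real pad ring would be larger still.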
But yes, I believe SERV can be configured to store registers in external RAM, though that will be extremely slow!
I think going for RV32E makes sense: it has only 16 registers (x0 to x15), and x0 is hardwired to zero, so only 15 need storage. Also, if you support the compressed instructions (which is good for saving flash bandwidth if you have no/minimal icache), most of them can only address x8 to x15, which are all in RV32E anyway.
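To illustrate the compressed-register point: the compressed formats with 3-bit register fields encode x8 to x15, so they never reach outside the RV32E register file. A small sketch follows, using the CL format (as used by `c.lw`) as the example. The helper names are made up, and the test word `0x4188` is hand-assembled by me as `c.lw x10, 0(x11)`, so treat it as illustrative.

```python
RV32E_REGS = range(16)  # x0..x15


def compressed_reg(field3: int) -> int:
    """Map a 3-bit compressed register field to its full register number."""
    if not 0 <= field3 <= 7:
        raise ValueError("compressed register field must fit in 3 bits")
    return 8 + field3  # 0b000 -> x8, ..., 0b111 -> x15


def cl_format_regs(insn: int) -> tuple[int, int]:
    """Return (rd, rs1) for a 16-bit CL-format instruction such as c.lw.

    In the CL format, rd' sits in bits 4:2 and rs1' in bits 9:7.
    """
    rd = compressed_reg((insn >> 2) & 0b111)
    rs1 = compressed_reg((insn >> 7) & 0b111)
    return rd, rs1


if __name__ == "__main__":
    # Every 3-bit field value lands inside the RV32E register file.
    assert all(compressed_reg(f) in RV32E_REGS for f in range(8))
    rd, rs1 = cl_format_regs(0x4188)  # hand-assembled c.lw x10, 0(x11)
    print(f"rd = x{rd}, rs1 = x{rs1}")
```

Formats with full 5-bit register fields (e.g. `c.mv`, `c.add`, `c.lwsp`) can still name any register, which is why this is "most" rather than "all".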