74HC595PW Shift Registers for Large LED Arrays

Hey everyone,
I’m working on a project where I’ll drive a moderately large LED array (say 8×16 or more) using multiple shift registers, and I’m considering the 74HC595PW-112. It’s a classic part, but real-world issues tend to show up once you start chaining them. I’ve used 595s before on small projects, but this time I want to drive more LEDs using multiplexing (row scanning), and I’m a little uneasy about chain delay, noise, and timing.
A few things I’d love to get your feedback on:
How many chained 74HC595s do you consider practical before timing becomes a pain (especially at higher refresh rates)?
When multiplexing, do you add latching delays or buffer stages to avoid ghosting? Any tricks you swear by?
Level shifting & voltage tolerance: if the 595s are driven from 3.3V logic but the LEDs are powered from a higher voltage (5V), are there any gotchas with HC series parts?
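For context, here is the back-of-envelope timing budget I've been working from. It's only a rough sketch: the helper name row_budget and the example numbers (2 chained registers, 8 scanned rows, 100 Hz refresh, 1 MHz shift clock) are my own assumptions for illustration, not measurements. It also ignores latch pulses, MCU overhead, and signal-integrity limits on long chains.

```python
def row_budget(num_registers, rows, refresh_hz, clock_hz):
    """Estimate the timing for one scanned row of a 74HC595 chain.

    Returns (shift_time_s, row_slot_s, fraction), where fraction is the
    share of the row's time slot spent clocking data into the chain.
    """
    bits_per_row = 8 * num_registers       # each 74HC595 holds 8 bits
    shift_time = bits_per_row / clock_hz   # one bit per shift-clock cycle
    row_slot = 1.0 / (rows * refresh_hz)   # time each row gets per frame
    return shift_time, row_slot, shift_time / row_slot


# Assumed example: 8x16 array, 16 columns on 2 registers, 8 rows scanned.
shift, slot, frac = row_budget(num_registers=2, rows=8,
                               refresh_hz=100, clock_hz=1_000_000)
print(f"shift time per row: {shift * 1e6:.1f} us")
print(f"row time slot:      {slot * 1e6:.1f} us")
print(f"slot used shifting: {frac * 100:.2f} %")
```

With those assumed numbers, shifting takes about 16 us out of a 1250 us row slot, so the raw clocking time looks small on paper. Since the 595 has a separate storage register, the next row can be shifted in while the current one is still displayed, so as far as I can tell the real constraint is just that the shift time fits inside the slot. What I can't estimate this way is how clock edges and noise behave as the chain gets longer.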
Thanks! It's always fun to pick the brains of folks who’ve been there.