this drops a 100k loop of copies from ~2700ms to ~37ms and a 100k loop
of copy-n-find from ~3900ms to ~40ms.
the cache addition is predicated on the idea that lookups of the same files will be frequent enough to warrant the small amount of extra memory the cache uses. if that does not hold in practice (though it appears to), the cache will bring no gain and will in fact be a small net loss in memory footprint.
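a minimal sketch of the lookup-cache trade-off described above, under stated assumptions: PathCache, SlowLookup and find() are hypothetical names invented for illustration and are not the API this commit touches. the slow search runs only on a miss, and every distinct key costs one map entry.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// hypothetical stand-in for the expensive file search; not the real Package API
using SlowLookup = std::function<std::string(const std::string &)>;

class PathCache
{
public:
    explicit PathCache(SlowLookup lookup) : m_lookup(std::move(lookup)) {}

    // returns the resolved path, running the slow lookup only on a cache miss
    std::string find(const std::string &key)
    {
        auto it = m_cache.find(key);
        if (it != m_cache.end()) {
            return it->second; // hit: no search performed
        }
        std::string path = m_lookup(key);
        m_cache.emplace(key, path); // miss: pay one map entry of memory per distinct key
        return path;
    }

    // number of cached entries, i.e. the memory side of the trade-off
    std::size_t size() const { return m_cache.size(); }

private:
    SlowLookup m_lookup;
    std::unordered_map<std::string, std::string> m_cache;
};
```

if the same key is rarely requested twice, the map only grows and never pays for itself, which is the small net loss mentioned above.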
the dptr change is predicated on the assumption that copies of Package will be frequent, and so must be low cost. this is known to be true. even if it were not, this change has no downside. the upsides are significant time and memory savings.
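one way to make copies cheap is a shared d-pointer with copy-on-write. the sketch below illustrates that general technique with std::shared_ptr; it is an assumption about the approach, not the code in this commit. PackagePrivate's contents, setPath() and sharesDataWith() are hypothetical, and the use_count() check assumes single-threaded use.

```cpp
#include <memory>
#include <string>

// hypothetical private data; the real private class is not shown in this message
struct PackagePrivate {
    std::string path;
};

class Package
{
public:
    Package() : d(std::make_shared<PackagePrivate>()) {}

    // the implicit copy constructor and assignment copy only the pointer,
    // so a copy costs one reference-count increment

    const std::string &path() const { return d->path; }

    void setPath(const std::string &path)
    {
        detach(); // deep-copy the data only when it is about to be modified
        d->path = path;
    }

    // exposed only so the sharing can be observed in this sketch
    bool sharesDataWith(const Package &other) const { return d == other.d; }

private:
    void detach()
    {
        if (d.use_count() > 1) {
            d = std::make_shared<PackagePrivate>(*d);
        }
    }

    std::shared_ptr<PackagePrivate> d;
};
```

copies that are only read never duplicate the private data, which is where both the time and the memory savings would come from.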