performance on large caches

I have an app that does a large import, 25,000-record CSV files or so,
at least once a day per customer and often more than that. Most of the
contents of the files are the same from day to day; perhaps 10% of the
records are new or modified. It occurred to me that I might be able to
reduce the database hit on the import process by putting a cache in
front of it. I wrote a task that takes the parsed contents of each
record, computes an SHA-1 hash of it, and sticks the hash into the
ColdBox cache, using it as both the contents and the key name. During
the import, I first check the cache. If the hash is there, I know I
don't have to check the db. If not, I need to check whether the record
exists in the db and then act accordingly.
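
For reference, the per-record check looks roughly like this (a
simplified sketch; "cache" is the ColdBox cache manager from
getColdboxOCM(), and recordContents is the parsed row flattened to a
string):

function recordIsCached( cache, recordContents ) {
    var key = hash( arguments.recordContents, "SHA-1" );
    if ( arguments.cache.lookup( key ) ) {
        return true; // seen this exact record before, skip the db
    }
    // not cached yet: store the hash now so tomorrow's import finds it;
    // timeouts are in minutes, so 2880 is roughly two days
    arguments.cache.set( key, key, 2880, 2880 );
    return false; // caller still has to check the db this once
}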

That's great, but in testing I'm finding myself with a cache of
100,000+ objects, and I have to set expiration and last-access timeouts
of several days on my objects to make sure they are still there the
next time I want them. The objects themselves are small, but the memory
footprint is piling up and will only get worse in production as we add
customers. I'm also unsure about the performance of doing cache lookups
on big object pools.
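
Back-of-envelope: each entry is a 40-character hex string stored as
both key and value, plus the cache's per-entry metadata (timestamps,
hit counts), so figure a few hundred bytes apiece. 100,000 entries is
maybe 30-50 MB, which isn't terrible on its own, but it grows linearly
with customers and with however many days of records I keep cached.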

This makes me wonder whether this was as good an idea as it seemed at
the time. Should I be using the ColdBox cache for this sort of task? If
not, what about ehCache? Or should I revert to a query-based approach,
make sure my indexes are good, and cache the queries?
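
For what it's worth, by a query-based approach I mean something like
the following (just a sketch; the table and column names are made up):

<!--- one query per import pulls every known hash for the customer --->
<cfquery name="qHashes" datasource="#dsn#">
    SELECT record_hash
    FROM   import_record
    WHERE  customer_id = <cfqueryparam value="#customerId#"
                                       cfsqltype="cf_sql_integer">
</cfquery>

<!--- index the hashes in a struct so each record check is O(1) --->
<cfset seen = structNew()>
<cfloop query="qHashes">
    <cfset seen[qHashes.record_hash] = true>
</cfloop>

<!--- per record: only touch the db when the hash isn't already known --->
<cfif NOT structKeyExists( seen, recordHash )>
    <!--- insert or update the record, then cache its hash in "seen" --->
</cfif>

With an index on (customer_id, record_hash) that's one cheap query per
file instead of a cache lookup per record.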

Thoughts?

Thanks,
Judah