I’ve been working on the flat file record store. The idea is we’ll have a bunch of JSON records that will be indexed with Solr. Those files sit in groups of up to a thousand in up to a thousand directories, two levels deep. In more graphical terms:
records/000/000/000 <-- record with id# 000000000 /000/000/001 <-- record with id# 000000001 ... /001/204/586 <-- record with id# 001204586
I really have no idea how well this is optimized for file IO, but I appreciate the simplicity of it. I’m excited to see the performance with a few million records. Comments about alternatives or potential pitfalls are welcome.