A record store

25 January 2008 | Sean | Coding | 4 Comments

I’ve been working on the flat file record store. The idea is we’ll have a bunch of JSON records that will be indexed with Solr. The files sit in groups of up to a thousand per directory, across up to a thousand directories, two levels deep. In more graphical terms:

records/000/000/000 <-- record with id# 000000000
       /000/000/001 <-- record with id# 000000001
       /001/204/586 <-- record with id# 001204586

I really have no idea how well this is optimized for file IO, but I appreciate the simplicity of it. I’m excited to see the performance with a few million records. Comments about alternatives or potential pitfalls are welcome.
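The id-to-path mapping above can be sketched in a few lines. (The helper name `record_path` is mine, just for illustration — it’s not part of the actual store.)

```python
def record_path(record_id, root='records'):
    # Zero-pad the id to nine digits, then split it into three
    # 3-digit chunks: 1204586 -> records/001/204/586
    padded = '%09d' % record_id
    return '/'.join([root, padded[:3], padded[3:6], padded[6:]])

print(record_path(0))        # records/000/000/000
print(record_path(1204586))  # records/001/204/586
```

Each leaf directory holds at most a thousand files, and each level has at most a thousand directories, which is where the three-digit chunks come from.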


  1. Hoss said on 26 Jan 2008 at 3:07 am:

    common thing people anticipate when doing flat file data storage: filesystems tend to do poorly with a huge number of inodes in a single directory — i’ve heard “keep it below 4000” is a good rule of thumb.

    so people do sub-dir hashing of their resource filenames — really easy to do when they’re numeric, just make every 3 digits a directory — except that you also want to reverse the order of the digits. you want the digits that increment the fastest to be in the top level directory names so you get a better distribution across your directories (hashing doesn’t do you much good if every sequence of 999 records you add is contesting for the same directory)
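The reversed-digit scheme described here can be sketched as follows (the function name is illustrative, not from any real implementation):

```python
def reversed_path(record_id):
    # Reverse the padded id so the fastest-incrementing digits
    # become the top-level directory name.
    padded = ('%09d' % record_id)[::-1]
    return '/'.join([padded[:3], padded[3:6], padded[6:]])

# Sequential ids now land in different top-level directories:
print(reversed_path(1000))  # 000/100/000
print(reversed_path(1001))  # 100/100/000
print(reversed_path(1002))  # 200/100/000
```

Without the reversal, ids 1000 through 1002 would all share the top-level directory 000; with it, consecutive inserts spread across ten top-level directories.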

  2. Sean said on 27 Jan 2008 at 5:38 pm:

    Thanks for the comment. Much appreciated. I’ll remember the <4000 rule.

    I decided to test out the “reverse the digits” rule. I wrote the following code:

    import os
    text = 'some text in a file.'
    def create_file(file_id, record_dir):
        # record_dir itself must already exist
        file_path_list = [record_dir, file_id[:3], file_id[3:6], file_id[6:]]
        file_path = '/'.join(file_path_list)
        print 'Creating file %s ...' % file_path,
        try:
            file_handle = open(file_path, 'w')
        except IOError:
            # parent directories don't exist yet -- create them, then retry
            try:
                os.mkdir('/'.join([record_dir, file_id[:3], file_id[3:6]]))
            except OSError:
                os.mkdir('/'.join([record_dir, file_id[:3]]))
                os.mkdir('/'.join([record_dir, file_id[:3], file_id[3:6]]))
            file_handle = open(file_path, 'w')
        file_handle.write(text)
        file_handle.close()
        print 'done.'
    def increment_bottom(record_dir):
        for number in xrange(10000000):
            file_id = '%09d' % number
            create_file(file_id, record_dir)
    def increment_top(record_dir):
        for number in xrange(10000000):
            file_id = '%09d' % number
            file_id_list = list(file_id)
            file_id_list.reverse()
            reversed_file_id = ''.join(file_id_list)
            create_file(reversed_file_id, record_dir)

    and with an increment_bottom('records0') and an increment_top('records1'), I created twenty million small files: ten million with the digits reversed, ten million not.

    (to be continued)

  3. Sean said on 28 Jan 2008 at 2:06 pm:

    … and still to be continued! I’m not sure why yet, but the creation of those ten million files with the digits reversed is taking a long time. More in a bit.

  4. Sean said on 5 Feb 2008 at 5:10 pm:

    So the big lesson here is there are some things about disk IO I still don’t understand. I quit after creating 5 million records with the digits reversed, and even with those files merely sitting on my filesystem I’m seeing iowait (under “wa” in the top display) bouncing around from 60-97%. A simple “du -h” on either records0 or records1 takes up to an hour. So there’s some more work to be done there.

    In the meantime, I ran the following:

    >>> import timeit
    >>> t = timeit.Timer("f = open('records0/003/384/792'); f.read(); f.close()")
    >>> t.repeat(3, 200000)
    [3.7020750045776367, 3.6237351894378662, 3.6314709186553955]
    >>> t = timeit.Timer("f = open('records1/297/483/300'); f.read(); f.close()")
    >>> t.repeat(3, 200000)
    [3.5828580856323242, 3.5122859477996826, 3.517305850982666]

    As you can see, the access times are slightly lower for records1 (with the digits reversed), but not enough of a difference to make the extra complexity worthwhile. I didn’t do any testing on the creation of files, but it took much longer to create the records1 directory with its 5 million files than the records0 directory with its 10 million. This might have simply been because I created the records0 first, however.
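Creation time could be measured the same way as the reads above. Here is a sketch of how that might look (the helper name and the scratch directory are mine — this is not what was actually run for the post):

```python
import os
import tempfile
import timeit

def time_creation(n=1000):
    # Create n small files in a fresh scratch directory and
    # return the total elapsed seconds.
    tmp = tempfile.mkdtemp()
    counter = [0]
    def create_one():
        path = os.path.join(tmp, '%09d' % counter[0])
        counter[0] += 1
        f = open(path, 'w')
        f.write('some text in a file.')
        f.close()
    return timeit.timeit(create_one, number=n)

print('seconds to create 1000 files: %f' % time_creation())
```

Running it once for each layout (flat ids versus reversed ids) would give a fairer comparison than relying on the order the directories were created in.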
