Introduction

The Interchange File Format (IFF) is a simple structured binary file format consisting of sized and typed chunks of data, selectively readable without having to know the format of each chunk. This functionality is similar to what XML provides for text documents, and IFF can indeed be viewed as a sort of binary XML. IFF's extensibility keeps old applications working when the file format changes, making it an excellent choice for your next application's data files. IFF is also the simplest and smallest such data format, ensuring that your files consist of real data rather than overhead and that your code spends more time on real work than on parsing the data file. This library defines the IFF header structures and provides simple algorithms for directly writing many of your objects as chunks and containers.

Installation

Building this library is a fairly standard procedure. Download the package from https://github.com/msharov/iff/releases/latest. Make sure you have the necessary dependencies: a C++ compiler, such as gcc 4.6 or later, and the uSTL library, version 1.6 or higher.

First, unpack and install uSTL, as described in its documentation. Then unpack libiff and run ./configure && make install, which will install the library to /usr/local/lib and the headers to /usr/local/include. ./configure --help lists the available configuration options, in the usual autoconf fashion. The one thing to be aware of is that by default the library does not completely conform to the EA85 specification. Why that is so, and why you should take the default options anyway, is discussed in detail in the Format description section. If you really want to use the original EA85 format, you can pass --with-bigendian --with-2grain to configure.

Usage

If you are using C++, chances are you already have an object-oriented design of some kind. You have a collection of objects, related to each other in some way, and you want to write them all to a file. It is, of course, possible to just write them all to the file, one after the other, but that approach makes things difficult if you ever decide to change the structure of those objects, write more or fewer of them, or explain to other people how to read your format. Hence, it is desirable to create some kind of structure in the file, to be able to determine where each object begins and ends, and what kind of object is where. When using an IFF format, you'll make simple objects into chunks, and objects containing other objects into FORMs, LISTs, or CATs.

The first task is to make each of your objects readable and writable through uSTL streams. To do that you'll need to define three methods, read, write, and stream_size, and create flow operator overloads with the STD_STREAMABLE macro. Here is a typical example:

    #include <iff.h>	// iff header includes ustl.h, but doesn't use the namespace.
    using namespace ustl;	// it is recommended to leave iff:: namespace on.

    // Since the object is simple, and contains no other objects, we'll
    // make it a simple chunk. The chunk format is a 4 byte integer. In a
    // hex editor you'll see PLYR at the beginning of the object making it
    // easy to find when you want to hack something in it.
    enum { fmt_PlayerStats = IFF_FMT('P','L','Y','R') };     

    /// Stores player's vital statistics.
    class CPlayerStats {
    public:
	void        read (istream& is)
			{ is >> _hp >> _maxhp >> _mana >> _maxmana; }
	void        write (ostream& os) const
			{ os << _hp << _maxhp << _mana << _maxmana; }
	size_t      stream_size (void) const
			{ return (stream_size_of (_hp) +	// This evaluates at compile time to 8,
				  stream_size_of (_maxhp) +	// so making this function inline is
				  stream_size_of (_mana) +	// usually a good idea.
				  stream_size_of (_maxmana)); }
    private:
	uint16_t    _hp;
	uint16_t    _maxhp;
	uint16_t    _mana;
	uint16_t    _maxmana;
    };

    STD_STREAMABLE(CPlayerStats)

This needs to happen in all your objects. Make them streamable and define a format id. Then, to save everything, use iff::Read/Write functions from iff/utils.h like this:

    void GameObject::SavePlayer (const string& filename) const
    {
	memblock buf (iff::form_size_of (_player));	// A buffer to store the file
	ostream os (buf);					// and a stream to write the player object to.
	//
	// This writes it as a FORM, which just happens to be one of the three
	// allowable top-level chunk types. When a file starts with FORM, LIST,
	// or CAT, programs know it is an IFF file. The top level chunk should
	// contain the entire file, and nothing should follow it.
	//
	iff::WriteFORM (os, _player, fmt_MyGamePlayer);
	buf.write_file (filename.c_str());
    }

    // The following will be in the player class

    /// Returns the size of the player object when written to a stream
    size_t CPlayer::stream_size (void) const
    {
	// Player is a compound object, a FORM, so it can only contain other
	// chunks and FORMs, not simple values.
	return (iff::chunk_size_of (_name) +	// _name is a string
		iff::chunk_size_of (_stats) +	// CPlayerStats object
		iff::vector_size_of (_inventory) + // a vector, saved as a chunk
		// Quests is a special manager object that writes each quest
		// as a chunk or a form with some common settings.
		iff::list_size_of (_quests));
    }

    /// Writes the player object to stream \p os
    void CPlayer::write (ostream& os) const
    {
	iff::WriteChunk (os, _name, fmt_PlayerName);
	iff::WriteChunk (os, _stats, fmt_PlayerStats);
	iff::WriteVector (os, _inventory, fmt_Inventory);
	iff::WriteLIST (os, _quests, fmt_Quests);
    }

    /// Reads the player object from stream \p is
    void CPlayer::read (istream& is)
    {
	iff::ReadChunk (is, _name, fmt_PlayerName);
	iff::ReadChunk (is, _stats, fmt_PlayerStats);
	iff::ReadVector (is, _inventory, fmt_Inventory);
	iff::ReadLIST (is, _quests, fmt_Quests);
    }

The above is enough to get you a structured file with format checking on read. If you try reading something that is not a player savegame file, or if the file is somehow corrupted, you'll get an informative exception. The read method above is the simplest implementation, to be used if you really do not expect things to change at this level. A player will always have a name, some stats, and stuff in his pockets. But what if you decided to add, say, a known spell list? The read function will then break. Sure, you can hack up some conversion routine for yourself, but what about your users who already have an old version of your game? Are you going to tell them that those fifty-four hours of gameplay they saved will have to be replayed just because you added one lousy feature? Well, some companies are like that. But with IFF, you have the formats and sizes in each header, so you can make a more resilient read that will support new and old formats alike:

    /// Reads the player object from stream \p is
    void CPlayer::read (istream& is)
    {
	while (is.remaining()) {
	    //
	    // The Peek function will get the format without moving the stream
	    // pointer, handling also the case of compound chunks.
	    //
	    switch (iff::PeekChunkOrGroupFormat (is)) {
		case fmt_PlayerName:  iff::ReadChunk (is, _name,      fmt_PlayerName);  break;
		case fmt_PlayerStats: iff::ReadChunk (is, _stats,     fmt_PlayerStats); break;
		case fmt_Inventory:   iff::ReadVector(is, _inventory, fmt_Inventory);   break;
		case fmt_Quests:      iff::ReadLIST  (is, _quests,    fmt_Quests);      break;
		default:              iff::SkipChunk (is);                               break;
	    }
	}
    }

This way missing chunks will not cause a problem, resulting in defaults being used for the corresponding object, and neither will new ones, which will be skipped. Once you have this set up, changing the file format becomes more or less easy and painless, both for you and your users. All this you get for the very small cost of changing simple flow writes to WriteChunk and the like. As you can see above, the code remains readable, maintainable, and compact. Finally, if a user sends you his savefile that reproduces some bug, and you need to find out whether he was low on mana, you'll be able to find the right number at a glance in a hex editor; it's bytes 8 and 9 after PLYR.
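
To see why skipping an unknown chunk is cheap, it can help to look at the raw bytes. The following is a minimal sketch in plain standard C++, not the libiff API: read_le32, find_chunk, and NOT_FOUND are hypothetical names. It assumes the library's default layout of little-endian headers and chunks padded to 4 bytes, and it assumes, as in EA85, that the size field counts only the data and not the padding. It walks a buffer of sibling chunks and steps over every chunk it was not asked for, using nothing but the size in each header.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>
#include <vector>

// Hypothetical helpers for illustration; these are not part of libiff.
const size_t NOT_FOUND = static_cast<size_t>(-1);

// Decodes a little-endian 32-bit integer from four bytes.
inline uint32_t read_le32 (const uint8_t* p)
{
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
           (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Returns the offset of the data of the first chunk of type `wanted`,
// or NOT_FOUND. Other chunks are skipped without being interpreted.
size_t find_chunk (const std::vector<uint8_t>& buf, uint32_t wanted)
{
    size_t pos = 0;
    while (pos + 8 <= buf.size()) {
        assert (pos % 4 == 0);          // every header starts on a 4-byte boundary
        const uint32_t fmt  = read_le32 (&buf[pos]);
        const uint32_t size = read_le32 (&buf[pos + 4]);
        if (fmt == wanted)
            return pos + 8;             // data starts right after the header
        const size_t padded = (size_t(size) + 3) & ~size_t(3);
        pos += 8 + padded;              // header + data + padding
    }
    return NOT_FOUND;
}
```

Given a buffer holding some unknown chunk followed by the PLYR chunk from the earlier example, find_chunk returns the offset of _hp, and _mana sits 4 bytes past that offset.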

Format description

The full format description is available in the original EA85 IFF standard document, and you are invited to read it. This section provides only a short and practical summary.

Every IFF file consists of a hierarchy of chunks, each chunk being an opaque block of data with an 8-byte header specifying the chunk's type and size. The type field is a 32-bit integer with such a value that it looks like text when viewed in a hex editor. Typical values are "BODY", "FORM", or "LIST". The size field is a 32-bit integer specifying the size of the chunk's data, not including the 8-byte header. For example, a color palette chunk would start with 4 byte type "CMAP", followed by the size of the color data, 256*3=768, and followed by 768 bytes of RGB 3-byte values, for a total of 776 bytes in the chunk with header.
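
As a rough illustration of this layout, the sketch below assembles the color palette chunk byte by byte. It is plain standard C++ and not the libiff API; append_le32 and make_chunk are hypothetical helpers. The size is written little-endian, as in this library's default configuration (EA85 would use big-endian), and alignment padding is left out for brevity.

```cpp
#include <cassert>
#include <cstring>
#include <stdint.h>
#include <vector>

// Hypothetical helper: appends a 32-bit integer in little-endian byte order.
void append_le32 (std::vector<uint8_t>& out, uint32_t v)
{
    for (int i = 0; i < 4; ++i)
        out.push_back (uint8_t(v >> (8 * i)));
}

// Hypothetical helper: builds a chunk as type + size + data.
// The size field counts only the data, not the 8-byte header.
std::vector<uint8_t> make_chunk (const char* type, const std::vector<uint8_t>& data)
{
    assert (std::strlen (type) == 4 && "chunk type must be four characters");
    std::vector<uint8_t> out (type, type + 4);  // the type reads as text in a hex editor
    append_le32 (out, uint32_t(data.size()));
    out.insert (out.end(), data.begin(), data.end());
    return out;
}
```

Calling make_chunk ("CMAP", palette) with a 768-byte palette produces 776 bytes: the four characters CMAP, the size 768, and the color data.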

Some chunks may contain other chunks. These compound chunks have format fields FORM, LIST, and CAT, and append a 4-byte contents type to the chunk header before the child chunks. For example, a FORM chunk containing two other chunks will start with the format field FORM, followed by the size of the chunk data, which is the number of bytes between the end of this size field and the end of the last child chunk, followed by the 4-byte contents type, followed by the two child chunks in normal format.
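
The size bookkeeping for a compound chunk can be sketched as follows. This is plain standard C++ rather than the libiff API; bytes_t, append_tag, append_le32, and make_form are hypothetical names, the contents type GAME used below is made up, and the size is written little-endian as in the library's default configuration. The point to notice is that the contents type is counted as part of the FORM's data.

```cpp
#include <cassert>
#include <cstring>
#include <stdint.h>
#include <vector>

typedef std::vector<uint8_t> bytes_t;

// Hypothetical helper: appends a four-character tag.
void append_tag (bytes_t& out, const char* tag)
{
    assert (std::strlen (tag) == 4 && "tags are exactly four characters");
    out.insert (out.end(), tag, tag + 4);
}

// Hypothetical helper: appends a 32-bit integer in little-endian byte order.
void append_le32 (bytes_t& out, uint32_t v)
{
    for (int i = 0; i < 4; ++i)
        out.push_back (uint8_t(v >> (8 * i)));
}

// Wraps already serialized child chunks in a FORM of the given contents type.
bytes_t make_form (const char* contents, const std::vector<bytes_t>& children)
{
    uint32_t size = 4;                  // the contents type counts as data
    for (const bytes_t& c : children)
        size += uint32_t(c.size());     // each child includes its own header
    bytes_t out;
    append_tag (out, "FORM");
    append_le32 (out, size);
    append_tag (out, contents);
    for (const bytes_t& c : children)
        out.insert (out.end(), c.begin(), c.end());
    return out;
}
```

With a 12-byte child and a 16-byte child, the size field holds 4 + 12 + 16 = 32, and the whole FORM occupies 40 bytes including its own 8-byte header.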

The difference between FORM, LIST, and CAT is in what they may contain. FORMs can contain ordinary data chunks as well as other containers. CATs contain only other containers and are thought of as a simple aggregation of them without regard for order. LISTs are like CATs, but may also contain PROP chunks, which are scoped named property values.

The above ideas are the ones present in all the IFF-based formats, which, believe it or not, still exist. However, it is not very likely that you will encounter any, and when you do, there is a good chance that what you find will not be standard-compliant. The problematic issues are byte ordering and chunk alignment. The EA85 specification states that header integers are to be in big-endian byte order and that chunks are to be aligned on an even grain, but every IFF-based format makes up its own mind about byte order and alignment. This library is no exception, having chosen little-endian byte order and 4-grain alignment.

Why be incompatible? Because there really aren't all that many IFF files out there. Aside from Windows .wav files in RIFF format, you are unlikely to ever encounter one. It is far more likely that you are using this library to write your own data files, which are not expected to be read by any old IFF-based software, all of which has been obsolete for decades. Therefore, it is much better to modify the format a little to fit modern hardware rather than the old Motorola 68000 for which the EA85 IFF standard was designed.

Why use little-endian byte order? Because pretty much every computer in the world uses little-endian integers today. Back in 1985 there were quite a few big-endian machines, including the Motorola 68000, and many people thought that big-endian byte order would eventually become the standard as Intel heretics converted to the true faith. This was particularly true in academic circles, which were aggressively lobbied by Sun and obtained its hardware with substantial educational institution discounts. Over time, the exact opposite has happened, and the only big-endian machines remaining are gathering dust in server rooms and antique university computer labs. The rest of us use Intel or ARM processors, both of which are little-endian.

As a result, it makes far more sense to use little-endian integers everywhere, furthering the entrenchment of this now almost universal standard. Many IFF formats, including the most common variation, the Microsoft RIFF format used for .wav files, already use little-endian integers. In a few more years big-endian format should disappear completely and all that extra byteswapping work will no longer be necessary. Until then, it's your choice: will you stick to an old format for the sake of conformity to an obsolete standard, or embrace the future and set a new standard that best fits modern hardware? Both options are supported by the library, via configure.
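
To make the difference concrete, the sketch below decodes the same four header bytes under both conventions. It is plain standard C++, not library code, and read_le32 and read_be32 are hypothetical names. A reader that guesses the byte order wrong gets a wildly different chunk size, which is why the choice has to be fixed when the library is configured.

```cpp
#include <cassert>
#include <stdint.h>

// Decodes four bytes as a little-endian integer (this library's default).
inline uint32_t read_le32 (const uint8_t* p)
{
    assert (p && "null buffer");
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
           (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Decodes four bytes as a big-endian integer (the EA85 convention).
inline uint32_t read_be32 (const uint8_t* p)
{
    assert (p && "null buffer");
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}
```

For the bytes 00 00 03 00, read_be32 yields 768, the palette size from the CMAP example, while read_le32 yields 196608.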

The second contentious issue is chunk alignment. The EA85 standard specifies 2-grain alignment because the 68000 processor could not access a dword at an odd address. This restriction has not gone away, and some modern CPUs also crash when you try to do that. Intel architecture CPUs do not crash, but they slow down considerably. In addition, unlike the 68000 processor, modern CPUs require 4-grain alignment for 32-bit values. This is the reason for choosing 4-grain as the default alignment in this library.
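
The arithmetic behind a grain is simple enough to show in a few lines. The sketch below is plain standard C++ with hypothetical helper names, align_up and padding_for, and is not taken from the library. It assumes, as in EA85, that padding bytes follow the chunk data and are not counted in the size field.

```cpp
#include <cassert>
#include <stddef.h>

// Rounds `size` up to the next multiple of `grain`, which must be a power of two.
inline size_t align_up (size_t size, size_t grain)
{
    assert (grain != 0 && (grain & (grain - 1)) == 0 && "grain must be a power of two");
    return (size + grain - 1) & ~(grain - 1);
}

// Number of padding bytes needed after `size` bytes of chunk data.
inline size_t padding_for (size_t size, size_t grain)
{
    return align_up (size, grain) - size;
}
```

A 5-byte chunk needs one padding byte with 2-grain alignment and three with 4-grain alignment, while the 768-byte palette from the CMAP example needs none under either setting.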

Why read the integers in this way? Yes, a memcpy or a direct file read to an address will not suffer from alignment limitations, but it creates considerably more code. Using direct cast-to-dword-and-dereference method is the fastest and the most compact way to read stuff into memory. Each such read compiles to about five instructions, and it is possible that future compiler improvements will be able to reduce it to three.
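
For comparison, the memcpy alternative mentioned above might look like the sketch below. It is plain standard C++, load_u32 is a hypothetical name, and it is shown only to illustrate the approach this library chose not to depend on. The copy works at any offset, aligned or not, and returns the value in host byte order.

```cpp
#include <cassert>
#include <cstring>
#include <stddef.h>
#include <stdint.h>

// Copies a 32-bit value out of a byte buffer at any offset.
// No alignment is required; the result is in host byte order.
inline uint32_t load_u32 (const uint8_t* buf, size_t buf_size, size_t offset)
{
    assert (offset + sizeof(uint32_t) <= buf_size && "read past end of buffer");
    uint32_t v;
    std::memcpy (&v, buf + offset, sizeof(v));
    return v;
}
```

With properly aligned chunks the cast-and-dereference read described above is equally safe on the hardware, which is the trade the default 4-grain alignment is making.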

Using a larger alignment than required will not even make all that much difference in most cases, since most data you are likely to be working with is already sized as a multiple of 4. Microsoft's RIFF format is the one example known to exist and to already be 4-grain aligned. So even though most of the files you are likely to find (if you can find any...) do not align the chunks at all, your chances of being able to read them with this library compiled with default settings are reasonably good.

Support

Problems should be reported using the GitHub project tracker, reachable from the library project page.