Things of interest.

 
 
July 21, 2021

Basis for Base64


Reductionist SQLite storage.

I’ve lately been storing generated binaries and small files in SQLite via base64 text and BLOB fields.

I note the claim from Django folks that storing files in the database is ‘not good practice’ (a claim that appears quite contrary to formal experimental verification, and is possibly untrue unless you’re network-bound).

Arguably they have a point with BLOBs. But I don’t agree with the arguments against base64 storage: hey, it’s just text. And I like it. It reduces filesystem management, and I get a relational framework to manage metadata rather than having to rely on the OS’s filesystem, which often feels slow (especially on Windows).
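For what it’s worth, here is a minimal sketch of what I mean, using Python’s built-in sqlite3 module. The table name, column names and the in-memory database are all made up for illustration; this is not a recommendation of one column type over the other, just the same bytes stored both ways.

```python
import base64
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (name TEXT PRIMARY KEY, as_text TEXT, as_blob BLOB)"
)

payload = bytes(range(256))  # stand-in for a generated binary

# Store the same bytes twice: once as base64 text, once as a raw BLOB.
conn.execute(
    "INSERT INTO files (name, as_text, as_blob) VALUES (?, ?, ?)",
    ("demo.bin", base64.b64encode(payload).decode("ascii"), payload),
)

as_text, as_blob = conn.execute(
    "SELECT as_text, as_blob FROM files WHERE name = ?", ("demo.bin",)
).fetchone()

# Both columns give back the original bytes.
restored_from_text = base64.b64decode(as_text)
restored_from_blob = bytes(as_blob)
```

The text column costs about a third more space than the BLOB, which is the trade-off for being able to read, diff and export it like any other string.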

This was also an interesting read on the history of Base64:

Your first mistake is thinking that ASCII encoding and Base64 encoding are interchangeable. They are not. They are used for different purposes.

  • When you encode text in ASCII, you start with a text string and convert it to a sequence of bytes.

  • When you encode data in Base64, you start with a sequence of bytes and convert it to a text string.
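The two directions in the bullets above can be sketched in a few lines of Python. The sample values are arbitrary and only there to show which side is text and which side is bytes.

```python
import base64

# ASCII encoding: text in, bytes out.
text = "Hi!"
ascii_bytes = text.encode("ascii")

# Base64 encoding: arbitrary bytes in, text out.
raw = bytes([0x00, 0xFF, 0x10])
b64_text = base64.b64encode(raw).decode("ascii")

# The reverse is not interchangeable: these raw bytes are not valid ASCII.
try:
    raw.decode("ascii")
    raw_is_ascii = True
except UnicodeDecodeError:
    raw_is_ascii = False
```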

To understand why Base64 was necessary in the first place we need a little history of computing. …

Originally a lot of different encodings were created (e.g. Baudot code) which used a different number of bits per character, until eventually ASCII became a standard with 7 bits per character. However, most computers store binary data in bytes consisting of 8 bits each, so ASCII is unsuitable for transferring this type of data. Some systems would even wipe the most significant bit. Furthermore, the difference in line ending encodings across systems means that the ASCII characters 10 and 13 were also sometimes modified.

To solve these problems Base64 encoding was introduced. This allows you to encode arbitrary bytes to bytes which are known to be safe to send without getting corrupted (ASCII alphanumeric characters and a couple of symbols). The disadvantage is that encoding the message using Base64 increases its length - every 3 bytes of data is encoded to 4 ASCII characters.
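The 3-to-4 ratio is easy to check. A small sketch, assuming standard padded Base64 with no line breaks inserted (the helper name b64_length is my own, not part of any library):

```python
import base64

def b64_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input."""
    # Each group of up to 3 input bytes becomes exactly 4 output characters.
    return 4 * ((n_bytes + 2) // 3)

# Compare the formula against the real encoder for a few input sizes.
sizes = (0, 1, 2, 3, 4, 1000)
actual = {n: len(base64.b64encode(bytes(n))) for n in sizes}
predicted = {n: b64_length(n) for n in sizes}
```

So 1000 bytes of data come out as 1336 characters, roughly a 33% overhead.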

To send text reliably you can first encode to bytes using a text encoding of your choice (for example UTF-8) and then afterwards Base64 encode the resulting binary data into a text string that is safe to send encoded as ASCII. The receiver will have to reverse this process to recover the original message. This of course requires that the receiver knows which encodings were used, and this information often needs to be sent separately.
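As a rough sketch of that round trip in Python: the pack and unpack helpers below are hypothetical names of my own, and the sample message is arbitrary. The last line shows what happens when the receiver guesses the wrong text encoding.

```python
import base64

def pack(message: str, charset: str = "utf-8") -> str:
    """Text -> bytes (via charset) -> Base64 text that is pure ASCII."""
    return base64.b64encode(message.encode(charset)).decode("ascii")

def unpack(wire: str, charset: str = "utf-8") -> str:
    """Reverse of pack: Base64 text -> bytes -> text (via charset)."""
    return base64.b64decode(wire).decode(charset)

message = "naïve café ☕"
wire = pack(message)        # safe to send as ASCII
received = unpack(wire)     # receiver knows it was UTF-8

# A receiver assuming the wrong charset gets garbage, not an error.
wrong = unpack(wire, charset="latin-1")
```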


 
