Checksums
Posted: Wed Apr 23, 2008 3:56 pm
Introduction:
We've had some trouble in the past -- and even recently -- with incomplete and corrupted uploads. And when things go wrong with the blog, it'd be nice to have some way to say "at least the files are OK".
Enter checksums. By keeping a list of the required files and their MD5 (or other algorithm) checksums, we can check at any time whether our files are intact.
Enter FTP, opposing. Since most of our files are ASCII text, an FTP client in ASCII transfer mode can change them by translating the newlines to the receiving machine's convention. (DOS uses CR LF, Unix uses LF, and classic Mac OS uses CR.) A plain checksum would report this perfectly normal and valid change as corruption.
Enter SuperJude! I've worked around this problem by reading the file into a buffer, changing the various newline variations to spaces, and checksumming that instead. Now FTP variations don't modify the checksum, and we can detect important modifications to the file.
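To make the idea concrete, here is a minimal sketch of a newline-insensitive checksum. It is written in Python purely for illustration; the actual serendipity_FTPChecksum is PHP and may differ in detail. The name ftp_checksum is hypothetical, and the sketch assumes that a CR LF pair is collapsed to a single space, so that all three newline styles produce the same buffer.

```python
import hashlib
import re

# Match CR LF first so a DOS newline becomes ONE space, not two
# (assumption: the real implementation collapses it the same way).
_NEWLINE = re.compile(rb"\r\n|\r|\n")


def ftp_checksum(path):
    """Return the MD5 hex digest of a file with every newline replaced by a space."""
    with open(path, "rb") as fh:
        data = fh.read()
    normalized = _NEWLINE.sub(b" ", data)
    return hashlib.md5(normalized).hexdigest()
```

With this, a file uploaded with DOS newlines and the same file with Unix newlines give the same digest, while any other change to the content still changes it.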
I'm working on this in trunk/, and I've got it mostly working on one of my sandbox installations. It's surprisingly fast: generating checksums takes a few seconds, and verifying them takes two seconds or less.
The Questions:
There are lots of ways to work this. I've chosen to add serendipity_FTPChecksum (which calculates the FTP-impervious MD5 checksum for a single file) and serendipity_verifyFTPChecksums (which returns a list of files with incorrect checksums) in functions_installer.inc.php. I also updated the upgrader and the installer to call serendipity_verifyFTPChecksums when they're run. Finally, I'm modifying serendipity_admin.php to provide checksum validation from a button and a special URL. The checksums themselves are in the root directory, as checksums.inc.php.
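The verification step described above could look roughly like the following sketch. Again this is Python for illustration only, not the PHP in functions_installer.inc.php. The names ftp_checksum and verify_ftp_checksums are hypothetical, and the sketch assumes the checksum list is available as a mapping from relative file path to expected digest; the real format of checksums.inc.php is not shown here. It also assumes a missing file should be reported, since incomplete uploads are part of the problem.

```python
import hashlib
import os
import re

_NEWLINE = re.compile(rb"\r\n|\r|\n")


def ftp_checksum(path):
    """MD5 of the file contents with every newline replaced by a space."""
    with open(path, "rb") as fh:
        return hashlib.md5(_NEWLINE.sub(b" ", fh.read())).hexdigest()


def verify_ftp_checksums(root, expected):
    """Return the relative paths that are missing or whose checksum differs.

    expected: dict of relative path -> expected hex digest (assumed format).
    """
    bad = []
    for rel_path, want in sorted(expected.items()):
        full = os.path.join(root, rel_path)
        if not os.path.isfile(full) or ftp_checksum(full) != want:
            bad.append(rel_path)
    return bad
```

An empty list means every listed file is present and intact; anything else is the list to show in the installer, the upgrader, or the admin page.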
But for generating the list of checksums, I've provided a serendipity_generateFTPChecksums script in deployment/. I'm not sure this is the best way to do it. What I'd really like is to have checksums automatically generated along with the nightly builds, and included in the build archive. How do I do that?
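For reference, the generation step amounts to something like the sketch below: walk the tree, checksum each file, and collect the results. It is Python for illustration, not the PHP script in deployment/. The names are hypothetical, and the sketch assumes the checksum list itself has to be left out of the list, since its contents change every time it is regenerated. How the result is written into checksums.inc.php, and how the nightly builds are produced, are not covered here.

```python
import hashlib
import os
import re

_NEWLINE = re.compile(rb"\r\n|\r|\n")


def ftp_checksum(path):
    """MD5 of the file contents with every newline replaced by a space."""
    with open(path, "rb") as fh:
        return hashlib.md5(_NEWLINE.sub(b" ", fh.read())).hexdigest()


def generate_ftp_checksums(root, skip=("checksums.inc.php",)):
    """Return a dict of relative path -> checksum for every file under root.

    skip: relative paths to leave out (assumption: the checksum list itself).
    """
    sums = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            # Use forward slashes so the list is the same on every platform.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if rel in skip:
                continue
            sums[rel] = ftp_checksum(full)
    return sums
```

If the build process has a hook that runs after the tree is exported and before the archive is created, a generator like this would fit there, so the list always matches the files shipped with it.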
Finally, when I actually get this working, is there a chance that it'll get into the distribution, or is the whole thing just too radical?