Methods of proving data integrity
(c) GNU Free Documentation
License 2001 Horst Herb, hherb@gnumed.net
In order to prove data integrity we have to check
- that the document has not been accidentally corrupted
- that the document has not been illegitimately manipulated
Protection against accidental corruption
This is the easy part. The most primitive approach would be to store multiple
copies. The likelihood that all of them get corrupted is slim. However, with
only a few copies you might end up in a situation where most of them do not
match one another. How would you then decide which copy is the true original?
There are mathematical one-way functions, so-called "hashes". These functions
calculate a large, presumably unique number for any given document. We
will call this number a message digest. How likely it is that this number is
really unique depends, of course, on the choice of hashing algorithm.
Cryptography is an exact science, a branch of mathematics. There is a
lot of money in cryptography, so people working professionally in it show
similar enthusiasm for their work as we do in medicine. As in any other exact
science, good and proven methods are fully disclosed, the rest is likely
to be snake oil.
An excellent choice of hashing algorithm to digest the type of documents
we are likely to encounter in paperless medicine is RIPEMD-160. It is well
worth reading up on. It is reasonably collision proof (there is no known way
of altering a document while preserving its message digest), and its
performance is still good enough on low-end computers for daily use.
To generate a RIPEMD-160 message digest ("digital fingerprint") of your
document, you can download the "GNU Privacy Guard". After installing it,
enter at your command prompt:
gpg --print-md ripemd160 <your file name>
Wildcards like "*" are allowed; in that case all files in the current
working directory will be digested. You can capture the screen output
in a file:
gpg --print-md ripemd160 * >> allfiles.rmd
would create a text file "allfiles.rmd" (if it does not exist yet) and
append a list of all file names followed by their message digests. You can
make a backup of this list, and use it to compare it to another list generated
in the future. A digest mismatch for any file will prove that that file has
been altered in some way.
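The comparison of an old digest list with a fresh one can also be automated. The following is a minimal sketch, not part of GnuPG: the function names are hypothetical, and it assumes that Python's hashlib offers RIPEMD-160 (this depends on the underlying OpenSSL build), falling back to SHA-256 as a stand-in where it does not.

```python
import hashlib
from pathlib import Path


def new_hasher():
    """Return a RIPEMD-160 hasher, or SHA-256 if this Python build lacks it."""
    try:
        return hashlib.new("ripemd160")
    except ValueError:
        # Assumption: SHA-256 is an acceptable stand-in for this illustration.
        return hashlib.sha256()


def file_digest(path):
    """Digest a file in chunks so that large records need not fit in memory."""
    hasher = new_hasher()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_manifest(directory):
    """Map the name of each regular file in `directory` to its digest."""
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    return {p.name: file_digest(p) for p in files}


def compare_manifests(old, new):
    """Return (altered, missing, added) file names between two manifests."""
    altered = sorted(n for n in old if n in new and old[n] != new[n])
    missing = sorted(n for n in old if n not in new)
    added = sorted(n for n in new if n not in old)
    return altered, missing, added
```

Calling build_manifest on the same directory today and again in a year, then passing both results to compare_manifests, lists every file whose digest no longer matches. The old manifest has to be stored somewhere safe, exactly like the backup of the digest list described above.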
In order to understand the power of such a function, I would suggest the
following experiment:
Get the largest text document you can find on your computer (or download
a large text file like the more than 1000 pages long classic
Anomalies and Curiosities of Medicine by George Milbry Gould and Walter Lytle
Pyle). Calculate the message digest as outlined above. Now open the
document in your favourite text editor, alter one single character anywhere
in the file (like changing a period into a comma), save it and calculate
the message digest again. Impressed?
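The same experiment can be tried without a text editor. This is a small sketch under assumptions: the sample text is invented, the helper names are hypothetical, and SHA-256 is used as a stand-in wherever the local Python build does not provide RIPEMD-160.

```python
import hashlib


def digest(data):
    """Hex digest of `data`: RIPEMD-160 if available, otherwise SHA-256."""
    try:
        return hashlib.new("ripemd160", data).hexdigest()
    except ValueError:
        # Assumption: SHA-256 as stand-in when RIPEMD-160 is unavailable.
        return hashlib.sha256(data).hexdigest()


def differing_bits(hex_a, hex_b):
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Invented sample document; any large text would do.
original = b"The patient was seen on 12 March. " * 1000
# Change the final period into a comma, leaving everything else untouched.
altered = original[:-2] + b", "

d1 = digest(original)
d2 = digest(altered)
print(d1)
print(d2)
print(differing_bits(d1, d2), "of", len(d1) * 4, "bits differ")
```

With a good hash function, roughly half of the digest bits should change even though only one character of the document did.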
Now you know how to track down data corruption in an easy and practical way.
Protection against illegitimate manipulation
In order to prove that a given document has not been altered, we need
two things:
- a message digest of that document as outlined in the previous paragraph
- some sort of proof
  - when this digest has been generated
  - who generated this digest
Proving who generated the digest can be done by digitally signing the digest.
Theoretically, the "standard" algorithm for digital signatures, "DSA", could
be used both for creating the message digest and the signature. Part of the
DSA algorithm is generating a message digest with the "SHA" algorithm.
The signature can be done again with GPG:
gpg --armor --detach-sign <message digest file name>
You can check the signature with
gpg --verify <message digest file name>.asc
Documents in medical health records can be quite large. The reliability of
a hash function declines with the size and complexity of a message.
RIPEMD-160 seems to have a definite advantage. You may read details about the
vulnerability of SHA in "Differential Collisions in SHA-0," Advances in
Cryptology - Crypto'98, LNCS 1462, H. Krawczyk, Ed., Springer-Verlag, 1998,
pp. 56-71.
Although it is extremely unlikely that a SHA-digested document can be altered
in a way that it would still produce the same SHA digest, there is no reason
not to use RIPEMD-160 to digest large files as it is apparently more reliable,
free and has no other known disadvantages.
A digital signature proves that you have signed a particular document as
long as your private key has not been compromised. As with the loss of credit
cards, you will be liable for the loss of your private keys if you don't notify
the key certifying authority (which might be yourself or your Division) that
your key has been compromised. You would have a hard time in court proving
that a document bearing your digital signature was not signed by you.
Now comes the tricky bit: proving when the document has been signed.
A timestamp is embedded in the signature. The only way the signature-generating
software can tell the time is by querying your computer clock, which you can
adjust at will, at any time.
The solution is to deposit your signature with a trusted 3rd
party (trusted by both you and a judge in a potential court case). Almost
as good a proof is to have the trusted 3rd party countersign your signature:
as a timestamp is embedded in both signatures, and you can't forge
the signature of the signing 3rd party (if it is a trusted one), you can't
manipulate the timestamp provided by the 3rd party.
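The countersigning idea can be illustrated with a toy model. This is only a sketch under loudly stated assumptions: the key, the record layout and the function names are hypothetical, and an HMAC stands in for the 3rd party's real public-key signature (which in practice would be made with GnuPG). Unlike a real signature, an HMAC can only be verified by someone holding the same secret key.

```python
import hashlib
import hmac
import time

# Hypothetical secret held only by the 3rd party. A real service would use
# a public-key signature (for example a GnuPG key), not an HMAC.
THIRD_PARTY_KEY = b"demo-key-known-only-to-the-third-party"


def countersign(signature_bytes, now=None):
    """3rd party: bind its own clock reading to the submitted signature."""
    timestamp = int(time.time() if now is None else now)
    payload = hashlib.sha256(signature_bytes).hexdigest() + "|" + str(timestamp)
    tag = hmac.new(THIRD_PARTY_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_countersignature(signature_bytes, record):
    """Check that the record is untampered and belongs to this signature."""
    expected = hmac.new(
        THIRD_PARTY_KEY, record["payload"].encode(), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, record["tag"]):
        return False
    signed_digest = record["payload"].split("|")[0]
    return signed_digest == hashlib.sha256(signature_bytes).hexdigest()
```

The point of the sketch is that the timestamp comes from the 3rd party's clock and is covered by the 3rd party's tag, so changing it afterwards makes verification fail. The 3rd party only ever sees your signature, never the document itself.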
You can do this even without Internet access: simply print the signature
and the countersignature onto a piece of paper, and deposit it with a trustworthy
3rd party. To verify the signature, you would have to type it in again.
The more extreme the demands on proving authenticity are, the more 3rd parties
you have to involve. I believe that two independent non-profit organisations
(like the Divisions of General Practice) would be trustworthy enough, but
you might choose to involve a justice of the peace, a notary or a similar
institution.
The benefit of this method is that you don't have to disclose any confidential
data to the signing 3rd party, and that the volume of data to sign
is so small that it is practical to distribute the signatures to many different
servers through the Internet. Done properly, there is no need for expensive
PKI infrastructure or key certification authorities (which are rather worthless
anyway; see the article on this subject written by one of the world's
foremost "crypto gurus", Bruce Schneier).
There is one issue of the utmost importance: always bear in
mind that you might need to prove the authenticity of a given document many
years, even decades, in the future. Therefore you must not use any software that
depends on a particular platform, for example Microsoft Windows.
You cannot expect that particular software will survive and be maintained
for decades. The same applies to commercial key authorities. They
are unlikely to exist forever. Do not make yourself dependent on them!
Dependable, quality software suitable for our purposes will run on virtually
any platform, and source code will always be provided. The source code is
written in a portable way that will make it easy to maintain on future
platforms. Even if you are not able to use the source yourself, there will
always be someone whom you can pay for such a service. Without the source,
you are lost.
Therefore, at present I cannot recommend any other product than the
GNU Privacy Guard for our purposes. It might be a little bit more difficult
to use than other products, but it is at least future proof.