Document management best practices

Maintaining non-HTML documents on a website presents several unique problems. The most common is that out-of-date documents remain available to the public through a Google search, which causes confusion.

A hypothetical:

  1. A document titled "School Rule FAQ.10.1.22.pdf" is uploaded to the server and linked from a page.
  2. Google indexes the link to that file the next time it crawls the site and stores the link to the document in the index.
  3. A new file titled "School Rule FAQ.11.1.22.pdf" (note the slightly different file name, and therefore a different link) is uploaded to the server and linked from the same page.
  4. Google indexes the new file the next time it crawls the site and stores the link to the document in the index.
  5. Google now has both "School Rule FAQ.10.1.22.pdf" AND "School Rule FAQ.11.1.22.pdf" in its index. If the old file is still on the server, Google will continue to list it, and people will still be able to view it. Google will only remove the old file from the index once a request for it has returned a "404 File not found" error from the server.
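
The sequence above can be sketched as a toy model. This is a deliberate simplification for illustration, not a description of how Google works internally; the function name crawl and the use of plain sets of file names are assumptions made for this example.

```python
# Toy model of a search index. Everything here is hypothetical and simplified.
def crawl(index, server_files, linked_files):
    """Return the contents of the index after one crawl.

    index:        file names already in the index
    server_files: file names that currently exist on the server
    linked_files: file names linked from the pages being crawled
    """
    # Newly discovered links are indexed if the file exists on the server.
    discovered = {name for name in linked_files if name in server_files}
    # Previously indexed files stay listed unless the server now answers 404.
    still_live = {name for name in index if name in server_files}
    return discovered | still_live


old = "School Rule FAQ.10.1.22.pdf"
new = "School Rule FAQ.11.1.22.pdf"

index = crawl(set(), {old}, {old})        # steps 1-2: old file uploaded and indexed
index = crawl(index, {old, new}, {new})   # steps 3-4: new file linked, old file left on server
both_listed = index == {old, new}         # step 5: both versions are now searchable

# Only once the old file is deleted (so it returns 404) does it drop out.
index_after_delete = crawl(index, {new}, {new})
```

In this model the old file stays in the index for as long as it exists on the server, even though no page links to it any more.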

To avoid these problems, the following is best practice:

  • When titling a document, use a name that readers will understand, e.g. Waiver Request Form.pdf rather than Waiver-Form-10-16-14.pdf
  • Don't include version numbers or dates in the title of a document. Include versioning and date information in the document itself.
  • Whenever possible, "replace" the existing document rather than uploading a new version under a different name, so that the link to it stays the same.
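
As a rough aid for the naming rules above, a small script can suggest a stable file name by stripping a trailing date or version stamp. This is a minimal sketch: the function name stable_name and the stamp patterns are assumptions chosen to match the examples in this document, and would need adjusting for other naming habits.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern for a trailing stamp such as ".10.1.22", "-10-16-14",
# "_v2" or " v1.3". It only looks at the end of the name, before the extension.
STAMP = re.compile(
    r"""[\s._-]+                              # separator before the stamp
        (?:
            v\d+(?:\.\d+)*                    # version number: v2, v1.3
          | \d{1,4}[._-]\d{1,2}[._-]\d{2,4}   # date: 10.1.22, 10-16-14
        )$""",
    re.IGNORECASE | re.VERBOSE,
)


def stable_name(filename: str) -> str:
    """Suggest a file name with any trailing date or version stamp removed."""
    path = PurePosixPath(filename)
    # Strip the stamp from the stem, then put the extension back.
    return STAMP.sub("", path.stem) + path.suffix


suggested = stable_name("School Rule FAQ.10.1.22.pdf")
```

Uploading each revision under the suggested name overwrites the previous file, so the link that Google has indexed always points at the current version. The date or version itself belongs inside the document.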