Ångströ.net

From CommerceNet Wiki

Why

While we expect an atom-oriented storage system to hold information sensitive enough that users will want to download and run local copies under their full control, it is equally valid to insist that the TPd code scale well enough to serve multiple users across multiple servers.

Storage

Persistent storage of multiple users' data requires effective access control and garbage collection.

This also presumes that there are workable identifiers for retrieving values over time.

"Files"

Sometimes you want TPd to serve up straight files, like a logo or documentation. In that case, how can TPd trust content uploaded over the Internet? A sysadmin-like account with Basic password authentication over SSL is a plausible solution, but this presumes that TPd has an SSL server certificate. Even when self-signed, that certificate needs a DNS name, which is not always available to a runtime or installation script.

A simpler (non-)solution: TPd reads an identity from tpd.key on startup. If it can't find tpd.key, it collects some entropy and creates a key. When posting content, the client sends an additional header (x-tpd-content-auth?) containing a hash of the content (in data: URL form, with all percent-escapes decoded) prefixed with the contents of tpd.key. For example:

tpd.key: sdaGDSA

content-type: text/html; charset=utf-8

<h1>huge!</h1>

Unescaped data: URL:

data:text/html; charset=utf-8,<h1>huge!</h1>

Unescaped data: URL prefixed with the contents of tpd.key:

sdaGDSAdata:text/html; charset=utf-8,<h1>huge!</h1>

So, using MD5:

x-tpd-content-auth: md5:c698b698046ac221e13dea2b37e70741

Or, using SHA-1:

x-tpd-content-auth: sha:dc7e1326ffce90d402c31f5bc3831f3a0e5086bd

(Or perhaps that should use urn:sha1:...; there don't seem to be specs for this yet.)
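The key handling and header computation described above can be sketched in Python. This is only an illustration under stated assumptions: the function names load_or_create_key and content_auth are hypothetical, the 16-byte key length is arbitrary, and the data: URL is assumed to be the content type, a comma, and the body, as in the example.

```python
import hashlib
import secrets
from pathlib import Path


def load_or_create_key(path):
    """Return the key stored at `path`, creating a random one if it is missing."""
    key_file = Path(path)
    if not key_file.exists():
        # secrets draws from the OS entropy source; 16 bytes is an arbitrary choice.
        key_file.write_text(secrets.token_urlsafe(16), encoding="utf-8")
    return key_file.read_text(encoding="utf-8").strip()


def content_auth(key, content_type, body, algorithm="sha1"):
    """Build an x-tpd-content-auth value from the key and the unescaped data: URL."""
    data_url = f"data:{content_type},{body}"
    digest = hashlib.new(algorithm, (key + data_url).encode("utf-8")).hexdigest()
    # The page labels SHA-1 digests "sha:" and MD5 digests "md5:".
    label = "sha" if algorithm == "sha1" else algorithm
    return f"{label}:{digest}"
```

A client would call content_auth(load_or_create_key("tpd.key"), "text/html; charset=utf-8", "<h1>huge!</h1>") and send the result as the header value. Note that prefixing a secret to the hashed content is weaker than a keyed construction such as HMAC; the sketch follows the scheme as written rather than substituting one.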

Content posted without the correct x-tpd-content-auth will be treated as "untrusted", and made available only in escaped and filtered forms. For instance, (X)HTML might be stripped of unrecognized and unsafe elements, attributes, and namespaces, and might have markup added to produce a conformant, parseable document.
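On the server side, the trusted/untrusted split might look like the sketch below. The names expected_auth, is_trusted, and render are hypothetical, only the SHA-1 form of the header is handled, and plain HTML escaping stands in for the richer element and attribute filtering mentioned above.

```python
import hashlib
import hmac
import html


def expected_auth(key, content_type, body):
    """SHA-1 form of the x-tpd-content-auth value for this content."""
    data_url = f"data:{content_type},{body}"
    return "sha:" + hashlib.sha1((key + data_url).encode("utf-8")).hexdigest()


def is_trusted(key, content_type, body, header_value):
    """True only if the supplied header matches the expected value."""
    if not header_value:
        return False
    expected = expected_auth(key, content_type, body)
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(
        expected.encode("ascii"),
        header_value.encode("ascii", "replace"),
    )


def render(key, content_type, body, header_value=None):
    """Return trusted content verbatim; escape everything else."""
    if is_trusted(key, content_type, body, header_value):
        return body
    return html.escape(body)
```

With a missing or wrong header, render returns the body with its markup escaped, so `<h1>huge!</h1>` comes back as `&lt;h1&gt;huge!&lt;/h1&gt;`.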

Security

  • message validation -- that the integrity of the entire ANVL is assured
  • liability -- that the A of the ANVL actually certifies that "A believes N == V @ L"

Bugs

GET must be kept idempotent. Content must be treated as untrusted, and sanitized, until proven otherwise.

Needed Before We Can Deploy

  • archiving policy
  • stable state and identifiers
    • offer an optional redirect to the new ANVL in the response to a POST
    • allow syncing across tpds
  • don't reflect GETs in ANVL-space
    • require POST for verb upgrading
  • don't reflect untrusted content in (ab)usable form
    • we need a static content reflector or an auth mechanism to allow apps to be hosted on tpd
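One way to read "require POST for verb upgrading" is sketched below. It assumes that verb upgrading means a request asking to be handled as a different HTTP method (for instance through a hypothetical query parameter), which the page does not spell out; effective_method and SAFE_METHODS are illustrative names, not part of TPd.

```python
SAFE_METHODS = {"GET", "HEAD"}


def effective_method(http_method, requested_verb=None):
    """Resolve the verb to act on; only a POST may be upgraded."""
    method = http_method.upper()
    if requested_verb is None:
        return method
    verb = requested_verb.upper()
    if method == "POST":
        # A POST is already allowed to change state, so honor the upgrade.
        return verb
    if verb in SAFE_METHODS:
        # Asking for another safe verb is harmless; keep the real method.
        return method
    # A GET (or other non-POST) must never turn into a state-changing verb.
    raise PermissionError(f"{method} cannot be upgraded to {verb}")
```

Under this reading, effective_method("POST", "DELETE") yields "DELETE", while effective_method("GET", "DELETE") is refused, so following a link can never change state.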

Priority list of to-do-next items

XML Database

Integrate with a real XML database, such as Sleepycat Berkeley DB XML. Not sure what priority this gets, but it's a big, separable project.

Atom-Normal Form

Convert the codebase to use Atom as the preferred format for pickle/unpickle and for input/output (with pretty HTML autogenerated by MiniML + CSS + DHTML banner templates).

  • Should we also use it for input (bursting)?
  • Should we also force UTF-8 (or some other encoding) throughout the project?