Wowbar notes

From CommerceNet Wiki

Wowbar Spec Outline (Wiki version)

Notation: square brackets [] mark a premature commitment, i.e., a tentative choice that is not yet settled.

Basic idea

WOWbar is a browser sidebar that displays annotations for the page being viewed. WOW-AA is the Annotation Architecture that WOWbar relies on.

Naïve User model

WOWbar shows me things I need to know about this web page, displayed alongside each page I look at. These things come from WOW-AA compatible servers that I like.

Geek User model

WOWbar is a platform for web-page annotation that presents its results in a browser sidebar. Annotations are what individuals (me, for instance), communities, organizations and their software agents have to say about a web page; they are displayed alongside the page and arranged/prioritized according to my preferences. WOW-AA is the collection of protocols and conventions by which otherwise uncoordinated parties can make their annotations available to others as a service on the web; when I find out about such services, I can plug them into my WOWbar.

Architectural sketch

  1. The sidebar displays multiple annotation items which are associated with the current page's url.
  2. Items come from a source and are chunks of item-format [a format isomorphic to HTML pages] [displayed in frames].
  3. Sources are uniquely named by uri. Strictly speaking, sources are "item-list sources", since they output item-lists.
  4. Item-lists are ordered sets of items, but the order is only a hint.
  5. Users have preferences, which include a source-list (a list of item-sources) and a mixer.
  6. Mixers notionally map sets of item-lists to item-lists (in a bounded amount of time), ordering and eliding items according to their own purposes.
  7. Mixers are more than that, though, since they include a list of sources. Hence they are themselves sources.
  8. Mixers return their results in an appropriate (bounded) amount of time.
  9. Mixers are uniquely named by uri, and conceivably described by a uri.
  10. Each user has an associated wow-wiki, which provides the page viewed in his Wowbar.
  11. This page is constructed by the displayer which takes mixer output and converts it to display format.
  12. The user's wow-wiki allows him to store annotations per page and the Wowbar UI makes this convenient.
  13. The user's wow-wiki (his annotation store) is a source with a name he can provide to others. It would be nice if these names were nonces, and hence revocable. The same goes for his mixer.
  14. Sources communicate via HTTP, so each source can demand any HTTP authentication mechanism it wishes. This doesn't help all that much; see below.
  15. The wow-wiki keeps the user's preferences.
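Items 4, 6 and 8 (plus the common mixer behaviors listed under Implementation Ideas) can be sketched in Python roughly as follows. This assumes sources are plain callables from a target URL to an item-list; the `Item` fields, `mix` and `budget_s` are hypothetical names, not part of WOW-AA. Dropping sources that miss the time budget or fail is just one way to honor the bounded-time requirement.

```python
import concurrent.futures as cf
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Item:
    source: str     # uri of the source that produced the item (provenance)
    link: str       # hypothetical identity key used for duplicate elimination
    date: datetime  # simple relevance measure used for ordering
    body: str       # item-format content, e.g. an HTML fragment


# A source maps a target page URL to an item-list whose order is only a hint.
Source = Callable[[str], List[Item]]


def mix(target_url: str, sources: Iterable[Source], budget_s: float = 2.0) -> List[Item]:
    """Hypothetical mixer: query all sources in parallel, keep whatever
    arrives within budget_s seconds, drop duplicates, newest first."""
    pool = cf.ThreadPoolExecutor(max_workers=8)
    futures = [pool.submit(src, target_url) for src in sources]
    done, _ = cf.wait(futures, timeout=budget_s)
    # Don't block on stragglers; they are simply left out of this result.
    pool.shutdown(wait=False, cancel_futures=True)

    seen = set()
    merged: List[Item] = []
    for fut in futures:  # source-list order: the earlier source's duplicate wins
        if fut not in done or fut.exception() is not None:
            continue  # late or failed source
        for item in fut.result():
            if item.link not in seen:
                seen.add(item.link)
                merged.append(item)
    # Stable sort, so items with equal dates keep their source-list order.
    merged.sort(key=lambda it: it.date, reverse=True)
    return merged
```

Since the mixer itself has the same shape as a source (URL in, item-list out), it could be wrapped and listed in another mixer's source-list, matching item 7. `cancel_futures` needs Python 3.9 or later.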

Special Problems

  1. Unless there is a means for passing authorization/authentication down the mixer pipeline, the usual HTTP authentication mechanisms don't work end-to-end.
  2. Click-stream leakage is a massive privacy problem.
  3. URLs are a shaky foundation for annotation (better would be what is connoted, not what connotes) because of aliasing and non-RESTful URLs.
    • Context sensitivity, even apart from state, is a real problem: cookie-based login and personalization are quite common.

Implementation Ideas

  1. Use the [Wikalong] Mozilla plugin as the sidebar. The sidebar plugin has two functions: an Ajax-style editor for the user's own annotations and a displayer for the associated mixer's output.
  2. Item-sources are [Opensearch services;] HTTP services that take (target page) URLs as inputs and return item-lists [represented as an RSS feed] as outputs.
  3. Item-format is [HTML].
  4. Mixer behaviors are specified by scripts and source-lists, which are supplied through the Wowbar UI. It may be convenient to encode this as a URL.
  5. Sources have Bloom filters associated with them so you can screen out some URLs as not worth asking about. These filters can only be cached, though.
  6. It would be nice to have metadata about sources and items in order to determine provenance, i.e., named contributors (people or services) identified by URI.
  7. We get nonces for URLs by using some sort of mutable and user-controllable short-name service.
  8. A common thing mixers should do is eliminate duplicates and sort by simple relevance measures (e.g., date).
  9. A common annotation action which should be easy to do is tagging.
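Item 5 (Bloom filters) might look like the sketch below, assuming a source builds the filter over the URLs it holds annotations for and the client keeps a cached copy. The class name, the bit-array size `m_bits`, the hash count `k`, and deriving the bit positions from a single SHA-256 digest are all assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over URL strings (a sketch, not a WOW-AA format)."""

    def __init__(self, m_bits: int = 8192, k: int = 4) -> None:
        if not 1 <= k <= 8:
            raise ValueError("k must be 1..8 (one 32-byte digest gives 8 positions)")
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, url: str):
        # Derive k bit positions from one SHA-256 digest, 4 bytes per position.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m

    def add(self, url: str) -> None:
        # Called by the source for every URL it has annotations for.
        for p in self._positions(url):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, url: str) -> bool:
        # False: the source definitely has nothing, so skip the query.
        # True: possibly a false positive, so the source still has to be asked.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(url))
```

A negative answer saves a round trip and also keeps that URL out of the source's view, which is why the same idea reappears among the clickstream countermeasures.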

Excessive Detail

Big Ideas We Won't Do Soon

  1. URL canonicalization would be good (addresses aliasing)
  2. Support feedback into sources (sources have state?)
  3. Relevance feedback (mixer has click-stream, remember).
  4. Allow queries that specify a page by something other than its URL
    • Annotations can be on a class of pages (pages that are "about" the same thing): Target Generalization
      • Figure this out by text analysis
      • Figure this out by collaborative filtering
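For item 1, a very partial URL canonicalizer could look like this. It only collapses a few purely syntactic aliases (case of scheme and host, default port, trailing empty path, fragment, query-parameter order); the function name is hypothetical, and sorting query parameters assumes the server ignores their order, which is not guaranteed. Server-side aliasing (mirrors, redirects, session IDs in paths) is untouched.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    """Collapse a few common syntactic aliases of a URL onto one spelling."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""   # already lowercased; drops any user:pass@
    if ":" in host:               # IPv6 literal: restore the brackets
        host = f"[{host}]"
    port = parts.port
    netloc = host if port is None or port == DEFAULT_PORTS.get(scheme) else f"{host}:{port}"
    path = parts.path or "/"      # "http://x.com" and "http://x.com/" name the same page
    # Sort query parameters (assumes the server ignores their order).
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((scheme, netloc, path, query, ""))  # fragment dropped
```

If both the annotator and the viewer pass URLs through the same function before storing or querying, annotations made on one alias would be found from the others.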

Clickstream Leakage Countermeasures

It is desirable to keep the full clickstream from leaking to sources. Some ways to prevent this for certain kinds of sources:

  1. Use Bloom filters to only hit a source when it has data to provide
  2. Obfuscate URLs (maybe your Schwab account # is encoded in the URL)
    • Send only a short hash of the URL (perhaps 16-32 bits) to the source; allow the source to return metadata for many different URLs in response.
    • Elide some regex patterns (e.g., "*id*=*")
  3. Don't perform queries on sensitive URLs (so, must determine which URLs are sensitive).
    • Don't do https: URLs, for starters.
  4. Don't use sources you don't "trust" (so, must determine which sources can be trusted).
    • Sources can publish policies à la P3P.
    • They give you a public key signed by some authority, à la code signing (hiss, boo).
  5. Anonymize origin by intermediate mixers (using either the crowds or the onion technique).
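Countermeasures 2 and 3 can be combined on the client side roughly as follows. This is a sketch under several assumptions: the "https: means sensitive" rule is only the starter rule from item 3, the regex that drops any query parameter whose name contains "id" is a hypothetical stand-in for the "*id*=*" pattern (it over-elides, e.g. "width"), and 24 bits is just one point in the suggested 16-32 bit range. All function names are invented for illustration.

```python
import hashlib
import re
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical stand-in for the "*id*=*" elision pattern.
ELIDE_KEY = re.compile(r"id", re.IGNORECASE)


def is_sensitive(url: str) -> bool:
    """Starter rule from item 3: never query sources about https: pages."""
    return urlsplit(url).scheme.lower() == "https"


def elide(url: str) -> str:
    """Remove identifier-like query parameters before the URL leaves the client."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not ELIDE_KEY.search(k)]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def hash_prefix(url: str, bits: int = 24) -> int:
    """Top `bits` bits of SHA-256(url); many different URLs share each prefix."""
    if not 1 <= bits <= 256:
        raise ValueError("bits must be in 1..256")
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def query_key(url: str, bits: int = 24) -> Optional[int]:
    """What the client would send to a source; None means don't ask at all."""
    if is_sensitive(url):
        return None
    return hash_prefix(elide(url), bits)
```

The source would answer a prefix with metadata for every URL it holds under that prefix, and the client would pick out the exact match locally, so the source learns only which bucket was viewed. This assumes the source hashes its own URLs the same way (including the same elision).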

Ideas for Sources

A source can be any HTTP service that notionally takes a URL as input and returns one or more relevant things as a response. One would have to wrap that service to make it Opensearch-compliant.

  1. The result of a Technorati search. E.g., commentary on this page in blogs.
  2. The result of a del.icio.us search. E.g. the people who have bookmarked (and commented on) this page there.
  3. The collected annotations of a group of people.
  4. Metadata, statistics & text analysis about the page.
  5. Results of a related page search (links to related pages).
  6. Other query results regarding the page:
    • (Dictionary and encyclopedia) definitions of unusual or important words and phrases
    • Corporate info for top-level pages (do Whois and then Hoover's or Yahoo Financials)
    • Search query (e.g., Google) of who names the page (more general than Technorati)
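The output side of such a wrapper might look like the sketch below: it takes results already fetched from some existing service and emits an item-list as an RSS 2.0 feed, following the bracketed (tentative) RSS suggestion under Implementation Ideas. The function name and the result keys `title`, `link` and `html` are hypothetical, fetching from the wrapped service is left out, and OpenSearch-specific response elements are not included.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def to_item_list(source_uri: str, target_url: str, results: List[Dict[str, str]]) -> str:
    """Render wrapped-service results as an RSS 2.0 item-list for target_url."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = f"Annotations for {target_url}"
    ET.SubElement(channel, "link").text = source_uri
    ET.SubElement(channel, "description").text = f"Items from {source_uri}"
    for r in results:  # order is preserved, but per the spec it is only a hint
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = r["title"]
        ET.SubElement(item, "link").text = r["link"]
        # Item-format HTML goes in the description; ElementTree escapes it.
        ET.SubElement(item, "description").text = r["html"]
    return ET.tostring(rss, encoding="unicode")
```

For example, a del.icio.us wrapper would map each bookmark of the target page to one result dict before calling this.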

Notes

  1. Mixers are named, and are completely specified by text containing their parameters, such as source list and script (they don't have state); they need to be instantiated somewhere. So, confusing the map with the territory, the (URL) name is an instantiated mixer. Notionally, an instantiated mixer is named by some encoding of: {interpreter}{source-list}{script}{latebound params}.
  2. Where do we carry the metadata for an item?
  3. Sources can be thought of as "agents" that activate on the pattern: guy is viewing page.
  4. Alternative UIs:
    1. (tiger) Dashboard for typed annotations.
    2. sources as lights at bottom of page; get your attention if they have something to say.
    3. popups or stickies
  5. WOW-AA is a fine platform for doing Greasemonkey things that present alternate views of a page, such as Bookburro and GreaseMaps. However, it's not convenient for multiple items on a page.
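Note 1's naming scheme can be sketched as a plain query-string encoding, which shows that a stateless mixer's whole specification fits in its URL. The endpoint `http://mixer.example.org/mix`, the parameter names `interp`, `src` and `script`, and the sample script text are invented for illustration; nothing here reflects an agreed WOW-AA format.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical host that instantiates mixers from their textual specification.
MIXER_ENDPOINT = "http://mixer.example.org/mix"
RESERVED = {"interp", "src", "script"}


def mixer_url(interpreter: str, sources: List[str], script: str, **late: str) -> str:
    """Encode {interpreter}{source-list}{script}{latebound params} as one URL."""
    if late.keys() & RESERVED:
        raise ValueError("late-bound parameter names clash with reserved keys")
    params = [("interp", interpreter)]
    params += [("src", s) for s in sources]  # repeated key preserves list order
    params.append(("script", script))
    params += sorted(late.items())           # late-bound params, e.g. the target page
    return MIXER_ENDPOINT + "?" + urlencode(params)


def parse_mixer_url(url: str) -> Tuple[str, List[str], str, Dict[str, str]]:
    """Recover the full specification; no server-side state is consulted."""
    q = parse_qs(urlsplit(url).query, keep_blank_values=True)
    late = {k: v[0] for k, v in q.items() if k not in RESERVED}
    return q["interp"][0], q.get("src", []), q["script"][0], late
```

Because the specification is entirely in the URL, any host running the named interpreter could instantiate the mixer from it. Revocable nonce names (item 13 of the architectural sketch) would still need a separate short-name service that maps a nonce onto such a URL.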