Wowbar notes
From CommerceNet Wiki
Wowbar Spec Outline (Wiki version)
Notation: text in [brackets] marks a premature commitment.
Basic idea
WOWbar is a browser sidebar that displays annotations for the page being viewed. WOW-AA is the Annotation Architecture that WOWbar relies on.
Naïve User model
WOWbar shows me things I need to know about this web page, displayed alongside each page I look at. These things come from WOW-AA compatible servers that I like.
Geek User model
WOWbar is a platform for webpage annotation that presents its results in a browser sidebar. Annotations are what individuals (me, for instance), communities, organizations and their software agents have to say about a web page, displayed alongside the page and arranged and prioritized according to my preferences. WOW-AA is the collection of protocols and conventions by which otherwise uncoordinated parties can make their annotations available to others as a service on the web; when I find out about such services, I can plug them into my WOWbar.
Architectural sketch
- The sidebar displays multiple annotation items, which are associated with the current page's URL.
- Items come from a source and are chunks of item-format [a format isomorphic to HTML pages] [displayed in frames].
- Sources are uniquely named by URI. Strictly speaking, sources are "item-list sources", since they output item-lists.
- Item-lists are ordered sets of items, but the order is only a hint.
- Users have preferences, which include a source-list (a list of item-sources) and a mixer.
- Mixers notionally map sets of item-lists to item-lists (in a bounded amount of time), ordering and eliding items according to their own purposes.
- Mixers are more than that, though, since they include a list of sources. Hence they are themselves sources.
- Mixers return their results in an appropriate (bounded) amount of time.
- Mixers are uniquely named by URI, and conceivably described by a URI.
- Each user has an associated wow-wiki, which provides the page shown in his Wowbar.
- This page is constructed by the displayer, which takes mixer output and converts it to display format.
- The user's wow-wiki allows him to store annotations per page, and the Wowbar UI makes this convenient.
- The user's wow-wiki (his annotation store) is a source with a name he can provide to others. It would be nice if these names were nonces, and hence revocable. The same goes for his mixer.
- Sources communicate via HTTP, so a source can demand any HTTP authentication mechanism it wishes. This doesn't help all that much; see Special Problems.
- The wow-wiki keeps the user's preferences.
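A minimal sketch of the source/mixer relationship described above, in Python. Everything named here is an assumption made for illustration: the `Item` fields, modelling a source as a plain callable, identifying items by `link`, and the 0.5 second budget. The notes only say that a mixer maps item-lists to an item-list in bounded time, ordering and eliding as it sees fit.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Item:
    source: str  # URI naming the source that produced the item
    link: str    # hypothetical identity key, used to elide duplicates
    body: str    # chunk of item-format, e.g. an HTML fragment
    date: str    # ISO 8601 date; sorts correctly as plain text


# A source takes the target page URL and returns an item-list.
Source = Callable[[str], List[Item]]


def mix(sources: List[Source], page_url: str, budget_s: float = 0.5) -> List[Item]:
    """Query every source in parallel and merge whatever arrives in time."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = [pool.submit(src, page_url) for src in sources]
    done, _late = wait(futures, timeout=budget_s)
    pool.shutdown(wait=False)  # do not block on slow sources

    seen, merged = set(), []
    for fut in futures:  # walk in source-list order so it acts as a tie-break
        if fut in done and fut.exception() is None:
            for item in fut.result():
                if item.link not in seen:  # elide duplicates
                    seen.add(item.link)
                    merged.append(item)
    # Newest first; the sort is stable, so equal dates keep source-list order.
    return sorted(merged, key=lambda i: i.date, reverse=True)
```

Because `mix` itself has the shape "page URL in, item-list out" once its source-list is fixed, it can be wrapped as a source for another mixer, which is the sense in which mixers are themselves sources. Slow or failing sources are simply left out of the result in this sketch.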
Special Problems
- Unless there is a means for passing authorization/authentication down the mixer pipeline, the usual HTTP authentication mechanisms don't work end-to-end.
- Click-stream leakage is a massive privacy problem.
- URLs are a shaky foundation for annotation (better would be what is connoted, not what connotes) because of aliasing and non-RESTful URLs.
- Context sensitivity, even apart from state, is a real problem: cookie-based login and personalization are quite common.
Implementation Ideas
- Use the [Wikalong] Mozilla plugin as the sidebar. The sidebar plugin has two functions: an Ajax-style editor for the user's own annotations, and a displayer for the associated mixer.
- Item-sources are [OpenSearch services;] HTTP services that take (target page) URLs as inputs and return item-lists [represented as an RSS feed] as outputs.
- Item-format is [HTML].
- Mixer behaviors are specified by scripts and source-lists, which are supplied through the Wowbar UI. It may be convenient to encode this as a URL.
- Sources have Bloom filters associated with them so you can screen out some URLs as not worth asking about. These filters can only be cached, though.
- It would be nice to have metadata about sources and items in order to be able to determine provenance, i.e., named contributors (people or services), identified by URI.
- We get nonces for URLs by using some sort of mutable and user-controllable short-name service.
- A common thing mixers should do is eliminate duplicates and sort by simple relevance measures (e.g., date).
- A common annotation action which should be easy to do is tagging.
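The Bloom-filter screening idea above could look roughly like the following. This is a sketch under assumptions: the filter size, the number of hash functions, and deriving positions from salted SHA-256 digests are illustrative choices, and the notes do not say how a source would publish its filter.

```python
import hashlib


class BloomFilter:
    """Set summary with no false negatives and occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, url: str):
        # Derive num_hashes bit positions by salting the URL with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{url}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, url: str) -> bool:
        # False means the source certainly has nothing for this URL.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(url))
```

A sidebar holding a copy of a source's filter would only contact that source when `might_contain(page_url)` is true. A false positive costs one wasted request; a stale copy can miss annotations added since it was fetched.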
Excessive Detail
Big Ideas We Won't Do Soon
- URL canonicalization would be good (addresses aliasing)
- Support feedback into sources (sources have state?)
- Relevance feedback (mixer has click-stream, remember).
- Allow queries which are non-URL specifications of a page
- Annotations can be on a class of pages (pages that are "about" the same thing): Target Generalization
  - Figure this out by text analysis
  - Figure this out by collaborative filtering
Clickstream Leakage Countermeasures
It is desirable to keep the full clickstream from leaking to sources. Some ways to prevent this for certain kinds of sources:
- Use Bloom filters to only hit a source when it has data to provide
- Obfuscate URLs (maybe your Schwab account # is encoded in the URL)
- Send only a short hash of the URL (perhaps 16-32 bits) to the source; allow the source to return metadata for many different URLs in response.
- Elide some regex patterns (e.g., "*id*=*")
- Don't perform queries on sensitive URLs (so, must determine which URLs are sensitive).
  - Don't do https: URLs, for starters.
- Don't use sources you don't "trust" (so, must determine which sources can be trusted).
  - Sources can publish policies à la P3P
  - They give you a public key signed by some authority à la code signing (hiss, boo).
- Anonymize origin by intermediate mixers (using either the crowds or the onion technique).
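Three of the countermeasures above (skipping https: URLs, eliding URLs that match a pattern, and sending only a short hash) can be combined on the client side roughly as follows. This is a sketch with assumed details: SHA-256 as the hash, a regex standing in for the `*id*=*` glob, and the helper names `query_key` and `pick_mine` are all hypothetical.

```python
import hashlib
import re
from typing import Dict, List, Optional
from urllib.parse import urlsplit

# Rough regex analogue of the glob "*id*=*" from the list above.
ELIDE_PATTERNS = [re.compile(r"id.*=", re.IGNORECASE)]


def short_hash(url: str, bits: int = 16) -> int:
    """Top `bits` bits (1 to 32) of a SHA-256 digest of the URL."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - bits)


def query_key(url: str, bits: int = 16) -> Optional[int]:
    """Return the short hash to send to a source, or None to stay silent."""
    if urlsplit(url).scheme.lower() == "https":
        return None  # treat https: pages as sensitive, for starters
    if any(p.search(url) for p in ELIDE_PATTERNS):
        return None  # URL looks like it carries an identifier
    return short_hash(url, bits)


def pick_mine(url: str, bucket: Dict[str, List[str]]) -> List[str]:
    """Select this page's entries from a reply covering many URLs.

    `bucket` is the source's reply for one short hash: a mapping from
    full URL to that URL's annotations.
    """
    return bucket.get(url, [])
```

With a 16-bit key the source learns only that the page is one of the many URLs sharing that hash, at the cost of a larger reply that the client filters locally with `pick_mine`.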
Ideas for Sources
A source can be any HTTP service that notionally takes a URL as input and returns one or more relevant things as a response. One would have to wrap that service to make it OpenSearch compliant.
- The result of a Technorati search. E.g., commentary on this page in blogs.
- The result of a del.icio.us search. E.g. the people who have bookmarked (and commented on) this page there.
- The collected annotations of a group of people.
- Metadata, statistics & text analysis about the page.
- Results of a related page search (links to related pages).
- Other query results regarding the page:
  - (Dictionary and encyclopedia) definitions of unusual or important words and phrases
  - Corporate info for top-level pages (do Whois and then Hoover's or Yahoo Financials)
  - Search query (e.g., Google) of who names the page (more general than Technorati)
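Wrapping any of the services above as a source mostly means re-serializing its results as an item-list. The sketch below emits a bare RSS 2.0 feed using only the Python standard library. It is an assumption-laden illustration: the `(title, link, html)` tuple shape and the channel text are made up here, and it does not add the OpenSearch response elements a compliant wrapper would need.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def item_list_to_rss(target_url: str, items: List[Tuple[str, str, str]]) -> str:
    """Serialize (title, link, html) items about target_url as RSS 2.0."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = f"Annotations for {target_url}"
    ET.SubElement(channel, "link").text = target_url
    ET.SubElement(channel, "description").text = "Item-list from a hypothetical source"
    for title, link, html in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        # The HTML fragment is carried as escaped text in <description>.
        ET.SubElement(item, "description").text = html
    return ET.tostring(rss, encoding="unicode")
```

A wrapper around, say, a blog search would call the underlying service with the target URL, map each hit to a tuple, and return this string as the HTTP response body.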
Notes
- Mixers are named, and are completely specified by text containing their parameters, such as source list and script (they don't have state); they need to be instantiated somewhere. So, confusing the map with the territory -- the (URL) name is an instantiated mixer. Notionally, an instantiated mixer is named as some encoding of: {interpreter}{source-list}{script}{latebound params}.
- Where do we carry the metadata for an item?
- Sources can be thought of as "agents" that activate on the pattern: guy is viewing page.
- Alternative UIs:
  - (tiger) Dashboard for typed annotations.
  - sources as lights at bottom of page; get your attention if they have something to say.
  - popups or stickies
- WOW-AA is a fine platform for doing Greasemonkey things that provide an alternate view of a page -- such as Bookburro and GreaseMaps. However, it's not convenient for multiple items on a page.
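The first note says a stateless mixer is fully specified by its parameters, so its name can be an encoding of {interpreter}{source-list}{script}{latebound params}. One possible encoding, sketched in Python: the JSON layout, the `spec` query parameter, and both function names are assumptions for illustration, not a proposed wire format.

```python
import base64
import json
from typing import Any, Dict, List


def encode_mixer_name(base: str, interpreter: str, sources: List[str],
                      script: str, params: Dict[str, Any]) -> str:
    """Pack a mixer's full specification into a URL under `base`."""
    spec = {
        "interpreter": interpreter,
        "source-list": sources,
        "script": script,
        "latebound": params,
    }
    # Canonical JSON so that equal specifications yield equal names.
    raw = json.dumps(spec, sort_keys=True, separators=(",", ":")).encode("utf-8")
    token = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
    return f"{base}?spec={token}"


def decode_mixer_name(name: str) -> Dict[str, Any]:
    """Recover the specification from a name made by encode_mixer_name."""
    token = name.split("?spec=", 1)[1]
    padded = token + "=" * (-len(token) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))
```

Since the whole specification travels in the name, any host running the named interpreter could instantiate the mixer. Names built this way are not revocable on their own, which is where the short-name service from Implementation Ideas would come in.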
