MFML

From CommerceNet Wiki

Pronounced 'miffle' :) Not to be confused with MiniML.

Goals

Goals for a pidgin XML representation of microformatted data:

  • make it easy for traditional XML tools to present, store, & transform microformatted data recovered from XHTML
    • secondarily, make it easier to apply SemWeb tools.
  • make it easier to design new microformats.

Starting points (assumptions):

  • the only normative form for data recovered from a microformat is a DOM tree. (Dicts, hashmaps, lists, and combinations of the above all fail when presented with semi-ugly XHTML input. Property lists might be an even more accurate term for the s-expy crowd...)
  • each microformat has a doppelganger XML tag set
    • more "semantic" constraints on valid data recovered from microformats should be specifiable in schema languages (RELAX NG, Schematron, DTD, etc.). E.g., if there can only be one instance of family-name, don't expect a base microformat parser to enforce that.
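
As a rough illustration of keeping such a constraint out of the base parser, here is a minimal Python sketch that checks an "at most one" rule after parsing. It is a stand-in for a real schema language, not a replacement for one; the function name count_violations and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def count_violations(root, max_once):
    """Return (parent tag, child tag, count) for children that repeat too often."""
    violations = []
    for parent in root.iter():
        for name in max_once:
            n = len(parent.findall(name))  # direct children only
            if n > 1:
                violations.append((parent.tag, name, n))
    return violations


# Hypothetical parser output with a duplicated Family-Name.
doc = ET.fromstring(
    "<vcard><n>"
    "<Family-Name>Khare</Family-Name>"
    "<Family-Name>Sittler</Family-Name>"
    "</n></vcard>"
)
print(count_violations(doc, ["Family-Name"]))  # [('n', 'Family-Name', 2)]
```

The parser stays permissive and emits both elements; whether that is an error is decided by a separate validation pass.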

Obstacles

For an XML-savvy programmer, there are several obstacles to working with semi-structured data "hidden" in XHTML:

  1. the class attribute is hard to process in XML tools
  2. microformats can appear on a very wide range of HTML elements
    1. those elements need not be co-extensive -- a single element can be microformatted multiple times, or the 'members' of a microformat structure can be scattered around a page
  3. microformats require special-case handling for several constructs
    1. abbr[@title]
    2. link[@href]
    3. a[@href] and the content of a
    4. img[@src] and img[@alt]
  4. html authors require roundtripping without data loss
    1. html attribute information needs to be preserved
    2. html element names need to be preserved
  5. the element subsumption hierarchy cannot be inferred from XMDP alone
  6. misspellings and extensions -- there are no validity constraints
  7. microformats can embed other microformats -- including those your code predates
  8. xhtml fragments must be evaluated with respect to several non-local parameters
    1. base[@href]
    2. head[@profile] (XMDP)
    3. XHTML namespaces and/or DOCTYPEs
  9. other problematic parts
    1. [@id] and [@name] uniqueness
    2. non-microformat classnames
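
The special cases in item 3 and the base[@href] dependency in item 8 can be sketched together in Python. This is only one possible reading: the want_uri flag, the function names, and the example.org base are assumptions made for illustration, and which properties count as URI-valued would have to come from each microformat's definition.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def local_name(elem):
    """Strip a '{namespace}' prefix so XHTML and plain names compare equal."""
    return elem.tag.rsplit("}", 1)[-1]


def property_value(elem, base_href="", want_uri=False):
    """Pick the value of one microformat property from its XHTML element."""
    name = local_name(elem)
    if name == "abbr" and elem.get("title") is not None:
        return elem.get("title")  # abbr[@title] replaces the content
    if name == "img":
        if want_uri:
            return urljoin(base_href, elem.get("src", ""))
        return elem.get("alt", "")
    if name in ("a", "link") and want_uri:
        return urljoin(base_href, elem.get("href", ""))
    return "".join(elem.itertext()).strip()  # default: the text content


# One element carrying two properties: url wants the href, fn wants the text.
a = ET.fromstring('<a class="url fn" href="/home">Gerard Braad Jr.</a>')
print(property_value(a, "http://example.org/x/", want_uri=True))
print(property_value(a))
```

Relative URIs are resolved against the base passed in by the caller, which is why a fragment cannot be evaluated without knowing the page's base[@href].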

Architecture

µformatted XHTML → (XMDP+template-driven) → MiniML XHTML → (MiniML) → MiniML XML → (XMDP+template-driven) → "clean" XML for each µformat

An example:

  1. start with an Excel table of names and phone numbers, export to XHTML
  2. microformat the XHTML using hCard
  3. miracle...
  4. print out the phone numbers, sorted by last name using XML tools

Excel table as input:

First Name  Last Name  Phone
Ben         Sittler    555-1212
Rohit       Khare      867-5309

The example then moves through four stages: XHTML, µformatted XHTML, miracle... and "clean" XML.

XHTML:

<table xmlns="http://www.w3.org/1999/xhtml">
 <tr >
 <th>First Name</th>
 <th>Last Name</th>
 <th>Phone
 </th></tr>
 <tr >
 <td>Ben</td>
 <td>Sittler</td>
 <td>555-1212
 </td></tr>
 <tr >
 <td>Rohit</td>
 <td>Khare</td>
 <td>867-5309
 </td></tr></table>

µformatted XHTML:

<table xmlns="http://www.w3.org/1999/xhtml">
 <tr >
 <th>First Name</th>
 <th>Last Name</th>
 <th>Phone
 </th></tr>
 <tr class="vcard">
 <td class="Given-Name">Ben</td>
 <td class="Family-Name">Sittler</td>
 <td class="iridium tel">555-1212
 </td></tr>
 <tr class="vcard">
 <td class="Given-Name">Rohit</td>
 <td class="Family-Name">Khare</td>
 <td class="tel work mobile">867-5309
 </td></tr></table>

miracle... (MiniML), giving "clean" XML:

 <vcard>
  <n>
   <Family-Name>Sittler</Family-Name>
   <Given-Name>Ben</Given-Name>
  </n>
  <tel>
   555-1212
  </tel>
 </vcard>
 <vcard>
  <n>
   <Family-Name>Khare</Family-Name>
   <Given-Name>Rohit</Given-Name>
  </n>
  <tel>
   867-5309
   <mobile/>
   <work/>
  </tel>
 </vcard>
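
Step 4 of the example (print the phone numbers, sorted by last name) becomes routine once the "clean" XML exists. A minimal Python sketch using the standard library follows; the vcards wrapper element is an assumption added only so that the two records form one well-formed document.

```python
import xml.etree.ElementTree as ET

# The "clean" XML from above, wrapped in a hypothetical <vcards> root.
CLEAN = """<vcards>
 <vcard>
  <n><Family-Name>Sittler</Family-Name><Given-Name>Ben</Given-Name></n>
  <tel>
   555-1212
  </tel>
 </vcard>
 <vcard>
  <n><Family-Name>Khare</Family-Name><Given-Name>Rohit</Given-Name></n>
  <tel>
   867-5309
   <mobile/>
   <work/>
  </tel>
 </vcard>
</vcards>"""


def phones_by_last_name(xml_text):
    """Return (family name, phone number) pairs sorted by family name."""
    root = ET.fromstring(xml_text)
    rows = []
    for card in root.findall("vcard"):
        last = card.findtext("n/Family-Name", default="").strip()
        for tel in card.findall("tel"):
            # tel.text is the text before the first child tag, if any
            rows.append((last, (tel.text or "").strip()))
    return sorted(rows)


for last, number in phones_by_last_name(CLEAN):
    print(last, number)
```

No knowledge of class attributes or HTML element names is needed at this point, which is the motivation for the doppelganger tag set.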

miracle...

We are told that

  1. vcard is the root element
    1. n is a child
      1. Given-Name is a child
      2. Family-Name is a child
      3. (... there are others in the spec)
    2. tel is a child
      1. it can contain zero or more tags from: work, home, tel, fax, pref etc. (... as enumerated in the spec and at IANA)
    3. (...there are others in the spec, such as adr)

Find µformat Classes

First, collect the list of "symbols", or unique classnames that can occur in this µformat. Put "--" in front of each one.

<table xmlns="http://www.w3.org/1999/xhtml">
 <tr >
 <th>First Name</th>
 <th>Last Name</th>
 <th>Phone
 </th></tr>
 <tr class="--vcard">
 <td class="--Given-Name">Ben</td>
 <td class="--Family-Name">Sittler</td>
 <td class="iridium --tel">555-1212
 </td></tr>
 <tr class="--vcard">
 <td class="--Given-Name">Rohit</td>
 <td class="--Family-Name">Khare</td>
 <td class="--mobile --tel --work">867-5309
 </td></tr></table>
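
The marking step can be sketched as a small Python function over one class attribute value. The symbol set below is an abbreviated assumption covering only the tokens used in this example; a real list would come from the µformat's profile.

```python
# Abbreviated, assumed symbol list for this example only.
HCARD_SYMBOLS = {"vcard", "n", "Given-Name", "Family-Name", "tel", "work", "mobile"}


def mark_classes(class_attr, symbols):
    """Prefix every known µformat classname with '--'; leave the rest alone."""
    return " ".join(
        "--" + token if token in symbols else token
        for token in class_attr.split()
    )


print(mark_classes("iridium tel", HCARD_SYMBOLS))  # iridium --tel
```

This sketch keeps the tokens in their original order, whereas the listing above shows them reordered; class order carries no meaning in XHTML, so either is acceptable.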

Disambiguate µformat Hierarchy (mftidy?)

Note that multiple --name classes are not allowed in MiniML (since an element has only one name), so we need to apply each microformat's rules to break ties and insert missing levels of hierarchy.

  1. µformats on leaf XHTML elements (img, and potentially also hr, br, isindex, area, param, col, frame, iframe, input, select, option (arguable), meta, link, base, basefont) need to be enclosed in a new container element, e.g. <img class="--photo" src="..."/> becomes <div class="--photo"><img src="..."/></div>
  2. "tag" classes (e.g. class="work pref mobile" inside .vcard and .tel) need to become empty child elements
  3. unambiguous inheritance (e.g. class="vcard n")
  4. omitted intermediate levels (e.g. class="vcard Given-Name") -- adjacent nodes missing the same parent share one
  5. shared content, e.g. within a single microformat (class="vcard n fn")

Expand the element hierarchy to reflect each microformat's constraints.

<table xmlns="http://www.w3.org/1999/xhtml">
 <tr >
 <th>First Name</th>
 <th>Last Name</th>
 <th>Phone
 </th></tr>
 <tr class="--vcard">
 <td class="--Given-Name">Ben</td>
 <td class="--Family-Name">Sittler</td>
 <td class="iridium --tel">555-1212
 </td></tr>
 <tr class="--vcard">
 <td class="--Given-Name">Rohit</td>
 <td class="--Family-Name">Khare</td>
 <td class="--tel">867-5309
 <span class="--mobile"/><span class="--work"/></td></tr></table>
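
Rule 2 (tag classes become empty child elements) is the only rule the listing above actually exercises. A minimal Python sketch of it follows; the TAG_CLASSES set is an assumed, partial list, and the function expects to be handed an element already known to be a --tel.

```python
import xml.etree.ElementTree as ET

# Assumed, partial list of marked "tag" classes that are valid inside --tel.
TAG_CLASSES = {"--work", "--mobile", "--home", "--pref", "--fax"}


def split_tag_classes(elem):
    """Move tag classes off elem's class attribute into empty child spans."""
    tokens = elem.get("class", "").split()
    tags = sorted(t for t in tokens if t in TAG_CLASSES)
    if not tags:
        return
    keep = [t for t in tokens if t not in TAG_CLASSES]
    elem.set("class", " ".join(keep))
    for tag in tags:
        child = ET.SubElement(elem, "span")  # appended after existing content
        child.set("class", tag)


td = ET.fromstring('<td class="--mobile --tel --work">867-5309</td>')
split_tag_classes(td)
print(ET.tostring(td, encoding="unicode"))
```

After the call the element carries a single --name class, which is the precondition for the conversion to MiniML XML.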

Convert to MiniML XML

Next, convert the above to MiniML XML: (the following example is controversial in that it does not preserve the original content of @class)

<xh:table
   xmlns:miniml="..."
   xmlns:xh="http://www.w3.org/1999/xhtml">
 <xh:tr >
 <xh:th>First Name</xh:th>
 <xh:th>Last Name</xh:th>
 <xh:th>Phone
 </xh:th></xh:tr>
 <vcard miniml:element="xh:tr">
  <Given-Name miniml:element="xh:td">Ben</Given-Name>
  <Family-Name miniml:element="xh:td">Sittler</Family-Name>
  <tel miniml:element="xh:td">
   <miniml:attr miniml:name="xh:class">iridium</miniml:attr>
   555-1212
  </tel>
 </vcard>
 <vcard miniml:element="xh:tr">
  <Given-Name miniml:element="xh:td">Rohit</Given-Name>
  <Family-Name miniml:element="xh:td">Khare</Family-Name>
  <tel miniml:element="xh:td">
   <miniml:attr miniml:name="xh:class">work mobile</miniml:attr>
   867-5309
  </tel>
 </vcard>
 </xh:table>
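
The per-element renaming used in this conversion can be sketched in Python as a pure function. It returns the pieces of the new element rather than building XML, since the miniml namespace URI is not given here; the function name and the tuple layout are hypothetical.

```python
def miniml_name(html_tag, class_attr):
    """Map one disambiguated XHTML element to
    (new tag name, value for miniml:element or None, leftover classes)."""
    tokens = class_attr.split()
    marked = [t[2:] for t in tokens if t.startswith("--")]
    if len(marked) > 1:
        raise ValueError("run the disambiguation step first: " + class_attr)
    leftover = " ".join(t for t in tokens if not t.startswith("--"))
    if not marked:
        # Not part of a µformat: keep the XHTML element, now prefixed.
        return ("xh:" + html_tag, None, leftover)
    return (marked[0], "xh:" + html_tag, leftover)


print(miniml_name("td", "iridium --tel"))  # ('tel', 'xh:td', 'iridium')
print(miniml_name("tr", "--vcard"))        # ('vcard', 'xh:tr', '')
print(miniml_name("th", ""))               # ('xh:th', None, '')
```

Recording the original element name and the leftover classes is what keeps the roundtrip back to XHTML possible.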

MiniML XHTML generated from the "clean" XML

 <div class="--vcard" xmlns="http://www.w3.org/1999/xhtml">
  <div class="--n">
   <span class="--Family-Name">Sittler</span>
   <span class="--Given-Name">Ben</span>
  </div>
  <span class="--tel">
   555-1212
  </span>
 </div>
 <div class="--vcard">
  <div class="--n">
   <span class="--Family-Name">Khare</span>
   <span class="--Given-Name">Rohit</span>
  </div>
  <div class="--tel">
   867-5309
   <span class="--mobile"/>
   <span class="--work"/>
  </div>
 </div>

"Rules"

  • microformat tokens become tagnames: values that occur in an XMDP profile (all valid classnames, rel/rev properties, and misspellings of same) become the names of tags in MFML.
  • the highlander rule: if there can only be one of something, should it become an attribute?
  • the abbr rule: the TITLE of an ABBR element is substituted for the entire list of childNodes. [moral: if you want the original pretty-printed HTML, look elsewhere]
  • the scoping rule: since microformats can occur within other microformats, returning a flattened list of all microformat data found in a page discards information (e.g. "was that a relTag of 'cool' on the hCalendar entry, or only on the hCard of the organizer?"). How should we indicate the relative tree occurrence order?
  • the repeat-yourself rule: should a second occurrence of a token force the creation of a copy of the parent node ("page break")?
  • the subsumption hierarchy: a topological sort of valid tokens in a microformat must be provided as an exogenous input to Miffy. With that, the occurrence of any tag forces the creation of intermediate parent tags.
    • thus
 <em class="locality vcard">Galway</em> 
 

becomes

 <mfml>
     <vcard>
         <adr>
             <locality> Galway </locality>
         </adr>
     </vcard>
 </mfml>
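
The Galway example can be reproduced with a short Python sketch. The PARENT table stands in for the exogenous subsumption input and lists only a few hCard tokens; it and the function names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed, partial child -> parent table for hCard tokens.
PARENT = {
    "adr": "vcard", "locality": "adr",
    "n": "vcard", "Given-Name": "n", "Family-Name": "n",
    "tel": "vcard",
}


def path_to(token):
    """Return the chain of tags from the root token down to this token."""
    path = [token]
    while path[0] in PARENT:
        path.insert(0, PARENT[path[0]])
    return path


def expand(class_attr, text):
    """Build the nested MFML tree implied by one element's class tokens."""
    tokens = class_attr.split()
    deepest = max(tokens, key=lambda t: len(path_to(t)))
    path = path_to(deepest)
    stray = [t for t in tokens if t not in path]
    if stray:
        raise ValueError("tokens on different branches: " + " ".join(stray))
    root = ET.Element("mfml")
    node = root
    for name in path:
        node = ET.SubElement(node, name)  # creates missing intermediate parents
    node.text = text
    return root


tree = expand("locality vcard", "Galway")
print(ET.tostring(tree, encoding="unicode"))
# <mfml><vcard><adr><locality>Galway</locality></adr></vcard></mfml>
```

Tokens that sit on different branches (shared content such as class="vcard n fn") are rejected here rather than resolved; that case needs the tie-breaking rules discussed above.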
 

Open questions:

  • abbr
  • base urls / scoping
  • XSD-like data typing
    • text
    • XHTML marked up text
    • ISO8601 date/interval
    • number (float, int, ...?)
    • enumerations ??
  • hierarchy
  • mispelin's
  • are link microformats special? (rel/rev)
  • support for "x-" classnames?
  • IMG / A / LINK
  • support "dict list as hash" pattern?
    • e.g. a credits listing for a movie using an open-ended role vocabulary might be a DL of hCard DDs with role-types as DTs

Examples

(From http://gbraad.survion.com/site/?p=profile )

 <div class="vcard">
 	<img style="float:right; margin:4px" src="http://gbraad.survion.com/photos/profile/0.jpg" alt="Profile photo" class="photo"/> 
 	<a class="url fn" href="http://gbraad.survion.com/" title="Full name">Gerard Braad Jr.</a>
        <span class="bday" title="Date of Birth">1981-02-22</span>
 	<div class="org" title="Organisation"><a class="url work" href="http://www.survion.com/">Sur-V-ioN</a></div>
 	<span class="role" title="Role">(Freelance) Software Developer</span>
 	<div class="adr">
 		<div class="street-address" title="Street">Rustenburgstraat 224</div>
 		<span class="postal-code" title="Postal code">7311JC</span>
 		<span class="locality" title="City">Apeldoorn</span>
 		<span class="country" title="Country">The Netherlands</span>
 	</div>
 	<div class="tel">
 		<span class="pref work voice" title="Work phonenumber">+31 (0)87 1901 799</span>
 		<span class="home voice" title="Home phonenumber">+31 (0)55 521 2488</span>
 		<span class="cell voice" title="Cell phonenumber">+31 (0)6 4256 7996</span>
 	</div>
 	<div class="email">
 		<span class="pref internet" title="Primary email">g_braad@survion.com</span>
 		<span class="internet" title="Alternate email">g_braad@spotsnel.nl</span>
 	</div>
 </div>
 

could become

 <vcard>
     <photo> http://gbraad.survion.com/photos/profile/0.jpg </photo>
     <fn> Gerard Braad Jr. </fn>
     <url> http://gbraad.survion.com/ </url>
     <bday> 1981-02-22 </bday>
     <org> Sur-V-ioN </org>
     <url>
         http://www.survion.com/
         <work />
     </url>
     <role> (Freelance) Software Developer </role>
     <adr>
         <street-address> Rustenburgstraat 224 </street-address>
         <postal-code> 7311JC </postal-code>
         <locality> Apeldoorn </locality>
         <country> The Netherlands </country>
     </adr>
     <tel>
         +31 (0)87 1901 799
         <pref />
         <work />
         <voice />
     </tel>
     <tel>
         +31 (0)55 521 2488
         <home />
         <voice />
     </tel>
     <tel>
         +31 (0)6 4256 7996
         <cell />
         <voice />
     </tel>
     <email>
         g_braad@survion.com
         <pref />
         <internet />
     </email>
     <email>
         g_braad@spotsnel.nl
         <internet />
     </email>
 </vcard>
 

alternative design decisions might include:

 <vcard>
     <photo href="http://gbraad.survion.com/photos/profile/0.jpg"/>
 </vcard>

OR

 <vcard>
     <photo>
         <content> http://gbraad.survion.com/photos/profile/0.jpg </content>
     </photo>
 </vcard>
 

BTW, prior art includes http://www.w3.org/TR/vcard-rdf and http://www.imc.org/rfc2426 and http://www.jabber.org/jeps/jep-0054.html; see http://xml.coverpages.org/vcard.html for a comprehensive discussion.