LexTagAnalysis:index
Lexical Analysis of Tag Collections for the Improvement of Keyword Auto-Generation
By Kevin Hughes, Webmaster, CommerceNet
Originally presented at TagCamp, Palo Alto, CA, October 30, 2005
Later revised for the TagCamp followup at CommerceNet, November 3, 2005
Abstract: This presentation explores the results of a lexical analysis of various tag collections and of ordinary text. What can we learn from human-generated metadata that would help make automatically generated metadata more usable, correct, efficient, and, most importantly, humane?
The source files and code related to this presentation can be found here:
This file contains maketags.php, which generates tags from a given body of text; wordinfo.php, which produces a number of statistics for a body of text; tagstats.xls, which holds the numerical results from the experiments; and a directory of the sources I used (text and tag collections from around the Web). The two PHP programs require WordNet to run. This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
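For readers who want a feel for what tools of this kind do, here is a minimal sketch in Python. It is not a port of maketags.php or wordinfo.php, whose internals are not described on this page. It assumes the simplest possible approach: count word frequencies, drop a small hypothetical stopword list, and take the most frequent remaining words as tags. It does not use WordNet, and the names `tokenize`, `word_stats`, `make_tags`, and `STOPWORDS` are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this
# sketch); a practical list would be much longer.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
    "for", "on", "with", "that", "this", "it", "as", "by", "be", "from",
}


def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters as words."""
    return re.findall(r"[a-z]+", text.lower())


def word_stats(text):
    """Return a few basic statistics for a body of text."""
    words = tokenize(text)
    total = len(words)
    return {
        "total_words": total,
        "unique_words": len(set(words)),
        "average_word_length": sum(len(w) for w in words) / total if total else 0.0,
    }


def make_tags(text, max_tags=5):
    """Return up to max_tags candidate tags, most frequent first.

    Stopwords and words of one or two letters are skipped. Ties in
    frequency are broken alphabetically so the result is deterministic.
    """
    counts = Counter(
        w for w in tokenize(text) if w not in STOPWORDS and len(w) > 2
    )
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:max_tags]]


if __name__ == "__main__":
    sample = "Tags describe content. Tags and keywords help people find content."
    print(word_stats(sample))
    print(make_tags(sample, max_tags=2))
```

Raw frequency is only a baseline. A lexical database such as WordNet could, for example, help merge word forms or filter by part of speech, but how the original programs use it is not stated here.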
- Those interested in this topic are also encouraged to explore TagCloud, which uses Yahoo's Content Analysis Web Service.
- For other tag collection analysis, see The Structure of Collaborative Tagging Systems, by Scott Golder and Bernardo Huberman at HP Labs.
