Robert A. Uhl

Hyperspatial text classification

Friday, 04 May 2007 by Robert A. Uhl in tech machine learning

While reading the docs for CRM114 (a text classification engine; text classification can be used to determine whether an email is spam, whether a log entry is important, or whether a newspaper article is worth reading) I discovered that it supports a hyperspatial classifier. It’s a pretty neat idea: a document is broken into its component features (e.g. phrases and individual words; this step is pretty standard for classifiers); each feature is then hashed to a 32-bit integer value; the document is then considered to be a point in a 2^32-dimensional space. If a feature is present once, then the value of that dimension is one; if twice, then two; and so forth.
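To make the idea concrete, here is a minimal Python sketch of turning a document into such a point. It is not CRM114's actual code: the feature extraction (lowercased words plus adjacent word pairs), the use of CRC-32 as the 32-bit hash, and the function names `features`, `hash32` and `to_point` are all assumptions made for illustration. Since almost every one of the 2^32 dimensions is zero for any real document, the point is stored sparsely as a map from dimension to count.

```python
import zlib
from collections import Counter


def features(text):
    """Split text into illustrative features: single words and adjacent word pairs."""
    words = text.lower().split()
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def hash32(feature):
    """Map a feature to a 32-bit integer (CRC-32 is a stand-in hash, chosen for illustration)."""
    return zlib.crc32(feature.encode("utf-8")) & 0xFFFFFFFF


def to_point(text):
    """Return the document as a sparse point: {dimension: count}.

    Dimensions that are absent from the map have the value zero.
    """
    return Counter(hash32(f) for f in features(text))


point = to_point("spam spam eggs")
# The word "spam" occurs twice, so its dimension has a value of at least two.
print(point[hash32("spam")])
```

Two distinct features can hash to the same dimension, in which case their counts are simply added together; with 2^32 dimensions that should be rare for ordinary documents.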
