February 07, 2003

Latent Semantic Analysis (LSA)

A very interesting technique for clustering, classifying and indexing documents. Using a bunch of large, sparse term-document matrices and some linear algebra (singular value decomposition and the like), the algorithm can cluster resources into their most relevant topics/categories, or simply be used to assign keywords.
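The core of LSA is a truncated singular value decomposition of a term-document matrix: keeping only the k largest singular values folds words that occur in similar contexts onto shared latent dimensions. The sketch below is a minimal pure-Python illustration, not production LSA. It assumes whitespace-tokenised toy documents and raw term counts (real systems usually add tf-idf or log-entropy weighting and use a sparse SVD library), and all function names are hypothetical. Instead of a full SVD it finds the top-k singular vectors by power iteration on AᵀA.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def term_document_matrix(docs):
    """Raw term counts: one row per vocabulary word, one column per document."""
    vocab = sorted({w for doc in docs for w in doc.split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for w in doc.split():
            matrix[index[w]][j] += 1.0
    return vocab, matrix


def lsa_document_vectors(matrix, k, iterations=200, seed=0):
    """Map each document to k latent dimensions (a truncated SVD of `matrix`).

    The right singular vectors v_i of A are the eigenvectors of G = A^T A,
    with singular values sigma_i = sqrt(eigenvalue). They are found one at a
    time by power iteration, orthogonalising against the ones already found.
    """
    n = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n)] for i in range(n)]

    def apply(v):
        return [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]

    rng = random.Random(seed)
    found = []  # (sigma_i, v_i) pairs, largest sigma first
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iterations):
            w = apply(v)
            for _, u in found:  # Gram-Schmidt: remove directions already found
                proj = sum(a * b for a, b in zip(w, u))
                w = [a - proj * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # fewer than k non-zero singular values
                v = [0.0] * n
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, apply(v)))
        found.append((math.sqrt(max(eigenvalue, 0.0)), v))
    # Document j in the latent space: (sigma_1 * v_1[j], ..., sigma_k * v_k[j]).
    return [[sigma * v[j] for sigma, v in found] for j in range(n)]


docs = ["car engine", "automobile engine", "apple fruit", "banana fruit"]
vocab, A = term_document_matrix(docs)
raw = [[A[t][j] for t in range(len(vocab))] for j in range(len(docs))]
latent = lsa_document_vectors(A, k=2)

print(round(cosine(raw[0], raw[1]), 2))             # 0.5
print(round(cosine(latent[0], latent[1]), 2))       # 1.0
print(round(abs(cosine(latent[0], latent[2])), 2))  # 0.0
```

In this toy corpus "car engine" and "automobile engine" only half-overlap as raw term vectors, but they become indistinguishable in the two-dimensional latent space, while the car and fruit documents stay orthogonal.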
A recent article in the Guardian goes a step further, discussing Latent Semantic Indexing (LSI), which applies LSA to search engines: a nice quick summary of LSA and of how it could be used for automatic metadata extraction and the like. See also the very nice paper on Patterns in Unstructured Data, which covers a lot of ground and comes close to proposing LSA/LSI for the Semantic Web :-)
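For the search-engine use (LSI), one common way to match a query, given here as an illustration rather than as the method described in the article, is to project the query's term vector q into the same k-dimensional space as the documents and rank documents by cosine similarity. With the truncated decomposition of the term-document matrix A, and a_j the j-th column (document) of A:

```latex
\begin{aligned}
A &\approx A_k = U_k \Sigma_k V_k^{\top} \\
\hat{d}_j &= U_k^{\top} a_j = \Sigma_k \,(V_k^{\top})_{:,j} \\
\hat{q} &= U_k^{\top} q \\
\operatorname{score}(q, d_j) &= \frac{\hat{q}^{\top} \hat{d}_j}{\lVert \hat{q} \rVert \, \lVert \hat{d}_j \rVert}
\end{aligned}
```

A variant also found in the literature folds the query in as \(\Sigma_k^{-1} U_k^{\top} q\) and compares it with the rows of \(V_k\) instead.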

Jean Paul-Jeral from JRC/WT has been working on LSA for a while; see also his exploratory research proposal.

Any comment...JPJ? :-)

Posted by alberto at 02:48 PM

February 06, 2003

Semantic bloggers

Here is a fairly complete report from the SWAD-e project about combining weblog technologies with RDF to generate and extract metadata from existing blogs.

They even walk through some use cases and scenarios showing how this could work out, and they mention Stefano Mazzocchi's Agora project and some nice RDF utilities for MovableType bloggers.
This is definitely in line with the previous article I posted about leveraging TrackBack/PingBack to generate data automatically, or, as Stefano put it so well:

"I'm more and more heading myself into the concept of 'data emergence' where you don't go around bothering people to markup their data as *you* like it, but *you* make an effort to collect their data and make a sense out of it."

Is this one of the ways the SW will finally take off?
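The idea of generating metadata from what blogs already publish can be illustrated with a minimal sketch that turns one post's fields into RDF N-Triples using Dublin Core properties. The dict layout, the `pinged` field and the example.org URLs are hypothetical placeholders, and recording TrackBack/PingBack targets as `dcterms:references` is an assumption made for illustration, not something taken from the SWAD-e report.

```python
# Dublin Core namespaces (real vocabularies); everything else here is illustrative.
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"


def literal(text):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return f'"{escaped}"'


def post_to_ntriples(post):
    """Turn a blog-post record (hypothetical dict layout) into N-Triples lines."""
    subject = f"<{post['url']}>"
    triples = [f"{subject} <{DC}{field}> {literal(post[field])} ."
               for field in ("title", "creator", "date") if field in post]
    # Assumed mapping: every URL the post sent a TrackBack/PingBack ping to
    # is recorded as a resource the post references.
    triples += [f"{subject} <{DCTERMS}references> <{url}> ."
                for url in post.get("pinged", [])]
    return "\n".join(triples)


post = {
    "url": "http://example.org/blog/semantic-bloggers",  # placeholder permalink
    "title": "Semantic bloggers",
    "creator": "alberto",
    "date": "2003-02-06",
    "pinged": ["http://example.org/blog/earlier-post"],  # placeholder
}
print(post_to_ntriples(post))
```

A real harvester could read the same fields from a blog's existing feed and TrackBack data instead of a hand-written dict, which is the "collect their data and make sense of it" approach from the quote.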

Posted by alberto at 01:07 AM

February 03, 2003

Text language identification tool

An interesting system that can identify up to 69 different languages, including Arabic. There is also a good list of similar language analysis technologies.
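A common way to build such a guesser (not necessarily what this particular system does) is to compare character n-gram statistics of the input against a profile for each language, in the spirit of Cavnar and Trenkle's n-gram text categorisation, though this sketch uses cosine similarity rather than their rank-order distance. It assumes two tiny hand-written training sentences; the sample texts and function names are hypothetical, and a real tool would build each profile from far more text.

```python
import math
import re
from collections import Counter


def trigrams(text):
    """Count character trigrams of each word, padded with spaces."""
    counts = Counter()
    for word in re.findall(r"[^\W\d_]+", text.lower()):  # letters only
        padded = f" {word} "
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two trigram Counters."""
    dot = sum(count * b[gram] for gram, count in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical toy training data: one sentence per language.
SAMPLES = {
    "en": "the cat is on the table and the dog is in the garden with the children",
    "it": "il gatto è sul tavolo e il cane è nel giardino con i bambini",
}
PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}


def guess_language(text):
    """Return the language whose profile is most similar, or None for no letters."""
    query = trigrams(text)
    if not query:
        return None
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))


print(guess_language("the dog and the cat"))  # en
print(guess_language("il cane e il gatto"))   # it
```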

Maybe some day we will be able to tag any given text with its language automagically? :-)

Posted by alberto at 12:10 PM