A very interesting technique for clustering, classifying and indexing documents. Using a bunch of sparse multi-dimensional matrices and some linear algebra (singular value decomposition and the like), the algorithm can group resources into the most relevant topics/categories, or simply be used to assign keywords.
A recent article in the Guardian goes a step further, discussing Latent Semantic Indexing (LSI), which applies LSA to search engines: a nice quick summary of LSA and how it could be used for automatic metadata extraction and the like. See also the very nice paper "Patterns in Unstructured Data", which covers a lot of ground and comes close to using LSA/LSI for the Semantic Web :-)
Jean Paul-Jeral from JRC/WT has been working on LSA for a while; see also his exploratory research proposal.
Any comment...JPJ? :-)
Posted by alberto at February 7, 2003 02:48 PM
I wanted to get a detailed picture of LSA.
Posted by: prem rai on December 15, 2003 01:36 PM
LSA is a powerful possibility, absolutely.
Posted by: Onlinekredit on December 18, 2003 10:34 PM
LSA extracts the semantic similarity between words, and between documents, from the contextual usage of words in documents. That usage is represented by a word-by-document matrix whose entries start out as raw frequency counts. The counts are then scaled and normalized, and the matrix is passed through singular value decomposition (SVD) for dimensionality reduction, keeping only the k largest singular values and their singular vectors. This reduced k-dimensional space gives a vector representation of every word and every document, so the semantic similarity between any two words or documents can be computed as the cosine of the angle between their vectors.
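The pipeline just described can be sketched in a few dozen lines of plain Python. This is a minimal illustration under stated assumptions, not anyone's published implementation: tokenization is plain whitespace splitting, log-entropy weighting stands in for the "scaled and normalized" step (one common choice among several), and the truncated SVD is computed by power iteration with deflation on the small matrix A^T A instead of a proper numerical routine, only to keep the example dependency-free. All function names (term_document_matrix, log_entropy, top_eigenpairs, lsa, cosine) are hypothetical.

```python
import math
import random


def term_document_matrix(docs):
    """Word-by-document matrix of raw counts: rows are words, columns are documents."""
    tokenized = [d.lower().split() for d in docs]  # assumption: whitespace tokens
    vocab = sorted({w for words in tokenized for w in words})
    row_of = {w: i for i, w in enumerate(vocab)}
    counts = [[0.0] * len(docs) for _ in vocab]
    for j, words in enumerate(tokenized):
        for w in words:
            counts[row_of[w]][j] += 1.0
    return vocab, counts


def log_entropy(counts):
    """Assumed scaling: log(1 + tf) per cell, times (1 - normalized entropy) per word."""
    n_docs = len(counts[0])
    weighted = []
    for row in counts:
        total = sum(row)
        entropy = 0.0
        if n_docs > 1:
            for tf in row:
                if tf > 0:
                    p = tf / total
                    entropy += p * math.log(p) / math.log(n_docs)
        weight = 1.0 + entropy  # 1 if the word is in one document, 0 if spread evenly
        weighted.append([weight * math.log(1.0 + tf) for tf in row])
    return weighted


def top_eigenpairs(m, k, iters=300, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    rng = random.Random(seed)
    n = len(m)
    m = [r[:] for r in m]
    pairs = []
    for _ in range(k):
        v = [rng.uniform(0.1, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((max(lam, 0.0), v))
        # Subtract the found component so the next pass finds the next-largest one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsa(docs, k):
    """Return word vectors (rows of U_k S_k, keyed by word) and document vectors (rows of V_k S_k)."""
    vocab, counts = term_document_matrix(docs)
    a = log_entropy(counts)
    n_words, n_docs = len(a), len(a[0])
    # Eigenvectors of A^T A (n_docs x n_docs) are the right singular vectors V;
    # its eigenvalues are the squared singular values.
    ata = [[sum(a[t][i] * a[t][j] for t in range(n_words)) for j in range(n_docs)]
           for i in range(n_docs)]
    pairs = top_eigenpairs(ata, k)
    doc_vecs = [[math.sqrt(lam) * v[j] for lam, v in pairs] for j in range(n_docs)]
    # Since A V_k = U_k S_k, word vectors need no division by singular values.
    word_vecs = {vocab[t]: [sum(a[t][j] * v[j] for j in range(n_docs)) for _, v in pairs]
                 for t in range(n_words)}
    return word_vecs, doc_vecs


def cosine(x, y):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(p * q for p, q in zip(x, y))
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    return dot / (nx * ny) if nx and ny else 0.0


if __name__ == "__main__":
    docs = ["car engine wheel", "car wheel driver", "recipe oven flour", "oven flour sugar"]
    words, doc_vecs = lsa(docs, k=2)
    print(cosine(doc_vecs[0], doc_vecs[1]))  # two car documents: near 1
    print(cosine(doc_vecs[0], doc_vecs[2]))  # car vs baking document: near 0
    print(cosine(words["engine"], words["driver"]))  # never co-occur, yet near 1
```

In this toy corpus "engine" and "driver" never appear in the same document, yet their reduced vectors come out nearly parallel because both co-occur with "car" and "wheel"; that indirect association through shared context is what the reduced space buys over raw counts. For real corpora, a library SVD such as numpy.linalg.svd or a sparse solver such as scipy.sparse.linalg.svds would replace the power-iteration loop.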
LSA has been used primarily in the information retrieval community, but it has two other successful application areas. The first is cognitive modeling in psychology and intelligent tutoring systems, where its semantic similarity judgements are comparable to those of human beings. The second is statistical language modeling for speech recognition, where it is used to capture the semantic fabric of a spoken document and thus reduce the word error rate.
Posted by: Dharmendra Kanejiya on February 10, 2004 12:22 PM
Hi. I'm Geomar Lubaton from the University of the Philippines and I am also interested in LSA. Do you happen to have source code for LSA? I would use it in my project on automated essay grading. Thanks! :D
Posted by: im_geo on November 29, 2005 09:06 PM
Hi,
I am Ankit Singh, a student at IIIT Hyderabad. Could you provide me with source code? I need it to understand how LSA works.
Hi,
I've been reading up on LSA because I want to do a final-year project on automated essay analysis. Could you provide me with some source code so I can understand it better?
Can you please send me the source code for LSA? I need to know how it works. Please send it if possible.
Posted by: Md. Farhad Shahid on April 9, 2007 08:35 PM
Can you please share the source code for LSA? I need to understand how it works.
Thanks in advance.