LSA Lab Assignment
LSA, short for "Latent Semantic Analysis," is a method used to derive the semantic similarity of different words based on the texts in which the words occur. The idea is simple: The more frequently two words occur in similar contexts across different texts (i.e., in similar patterns with similar words), the more similar those two words are to each other in meaning. Conversely, if two words never occur in similar contexts with similar words, they probably have very different meanings.
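To make the idea concrete, here is a minimal sketch in Python (using NumPy). It assumes the standard formulation of LSA as a truncated singular value decomposition of a word-by-text count matrix. The five-sentence corpus, the function names, and the choice of k=2 dimensions are all invented for illustration; real LSA spaces are built from very large text collections, typically weight the counts before the decomposition, and keep a few hundred dimensions.

```python
import numpy as np

# Toy corpus, invented for illustration (not taken from any LSA topic space).
docs = [
    "the dog chased the cat",
    "the cat chased the mouse",
    "the dog bit the mouse",
    "stocks fell as the market dropped",
    "the market rallied and stocks rose",
]

def build_term_doc_matrix(docs):
    """Count how often each word occurs in each text (rows = words, columns = texts)."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            counts[index[w], j] += 1
    return index, counts

def lsa_word_vectors(counts, k=2):
    """Keep only the k strongest dimensions of the singular value decomposition."""
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    return U[:, :k] * s[:k]

def cosine(a, b):
    """Cosine similarity: values near 1 mean the vectors point in the same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

index, counts = build_term_doc_matrix(docs)
vecs = lsa_word_vectors(counts, k=2)

sim_dog_cat = cosine(vecs[index["dog"]], vecs[index["cat"]])
sim_dog_stocks = cosine(vecs[index["dog"]], vecs[index["stocks"]])
print("dog vs. cat:   ", round(sim_dog_cat, 3))
print("dog vs. stocks:", round(sim_dog_stocks, 3))
```

In this toy space, "dog" comes out far more similar to "cat" than to "stocks", because "dog" and "cat" appear in texts that share words such as "chased" and "mouse", while "stocks" appears in a different set of texts.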
LSA is very good at predicting human participants' word associations and priming data. As a method, LSA is used widely throughout the cognitive and linguistic sciences. It is also used in many Natural Language Processing applications (any time you use the internet, chances are a method related to LSA is being used to evaluate your searches, posts, etc.).
In this assignment, we're going to have a quick look at LSA. The goal is to get a feel for how a large-scale semantic model works, and the kinds of comparisons that you can make.
1. Go to: http://lsa.colorado.edu/
2. Click on: "Near Neighbors"
This application allows you to type in a word and then to find its "nearest neighbors".
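Conceptually, a nearest-neighbor lookup can be sketched as ranking every other word by its cosine similarity to the word you typed. The short Python example below illustrates that idea only; it is not the website's actual code. The three-dimensional vectors and the function name nearest_neighbors are hypothetical, and leaving the query word out of its own neighbor list is an assumption made here for readability.

```python
import numpy as np

# Hypothetical word vectors, invented purely for illustration.
vectors = {
    "book":    np.array([0.9, 0.1, 0.0]),
    "novel":   np.array([0.8, 0.2, 0.1]),
    "library": np.array([0.7, 0.3, 0.0]),
    "stone":   np.array([0.0, 0.9, 0.2]),
    "rock":    np.array([0.1, 0.8, 0.3]),
    "bread":   np.array([0.0, 0.1, 0.9]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_neighbors(word, vectors, n=3):
    """Return the n words most similar to `word`, most similar first."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word  # assumption: skip the query word itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

neighbors = nearest_neighbors("book", vectors, n=2)
for other, score in neighbors:
    print(other, round(score, 3))
```

With these made-up vectors, the two nearest neighbors of "book" are "novel" and "library", in that order. In a real topic space the ranking depends entirely on the texts the space was built from.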
Go through a few examples. Think of some words you know, and for each one, think about which words you would expect to be similar to it. Try some concrete words like "book", "stone", "house". What about food-related words? What about more abstract words? See whether you can predict what LSA will come up with. Try doing this with the person seated next to you. Do you have different associations?
Question 1: Briefly describe (2-3 sentences) some of your initial explorations.
Now, try the same words with different corpora (the plural of "corpus", as mentioned in lecture). What happens if you change the topic space, say, from 3rd grade to 1st year college? Remember from lecture that a topic space is the massive text set that was used to build the LSA model. Why would the results differ, and is the way they differ cognitively interesting?
Question 2: Briefly explain (2-3 sentences) what differences you found, and why results might differ.
Now, think about what would happen if you looked for words similar to "dog" within a psychology textbook. Are there salient experiments or examples involving dogs that might influence the word co-occurrences? Feel free to discuss with the person seated next to you. Then, try "dog" with "Psychology Myers 5th edition" as the topic space vs. the "1st year college" space. Why are the results different?
Question 3: Briefly explain (2-3 sentences) why you get differences in the nearest words for these two types of spaces for "dog."
Question 4: Suppose you are designing an educational technology that needs to be sensitive to the meaning of words in order to interact with a student. Briefly explain (2-3 sentences) why the topic space (as described here) would be relevant to the age of your student.
That's it! Thanks so much for participating in this little exercise! Please submit brief answers to these 4 questions on CROPS.