Here are the issues I’m facing in developing my vocabulary database:
- There is no definitive source for the vocabulary frequency lists that I want to work with. The various versions that exist around the internet more or less overlap, but I’m going to make a judgement call and go with the ones posted on Paul Nation’s Victoria University website, available in the zip file here. My second authority is going to be the online version used by Tom Cobb on the Vocab Profiler website. His version isn’t a perfect match for Nation’s, but it’s pretty close. Unfortunately, I think the list versions I’ve used over the years are probably not the best ones.
- Also, the 1K and 2K GSL lists from the Victoria Uni website don’t actually contain exactly 1000 words each. Unless I made an error doing the copy and paste, each list comes in at just under 1000 words. I’m not sure this is a problem, as I can add a few words to each where appropriate.
- I’m working on an online spreadsheet master list of all words. Hopefully, I can just set this up and hand the work over to someone else.
- I think I need to tweak the headwords from the Waikato list in some cases. For example, I might change a word like “absence”, which appears in the 2K list, to “absent”, which feels more usable and higher frequency, although I have no proof of that.
- I want an activity that displays the first four (or so) letters of each word for the user to complete. However, some words only have three or four letters, so for these shorter words the activity should display only two or three letters respectively.
- I’m currently looking at using definitions from the Simple English Wiktionary. However, I’m going to have to modify each definition to ensure that the explanation excludes the word under scrutiny. While it’s great to have definitions that use the word in a sentence, I can’t use them for matching activities as it would give the game away.
- Probably, what I need to do is just use the Wiktionary definitions as a jumping-off point and simplify as much as possible. The fewer words the better…
- When I can’t find or use the example sentence from the Wiktionary, I go here and use the concordance that’s connected to the Vocab Profiler word lists.
- I’m past worrying about whether my definitions are dictionary perfect or whether my example sentences are “authentic”. Pragmatism rules the day here.
- I need to ensure that any definitions I use or modify are not circular, in the sense of using other words derived from the word being learned, and are not obscure in any way.
- If I don’t like the entry in the Simple English Wiktionary or the normal Wiktionary, I go to my old favourite, the Longman Dictionary of Contemporary English (LDOCE) online.
- I think I need to stick with the GSL frequency lists because there is a wealth of other resources online and in print that people, especially those with an ESOL background, will be familiar with.
- Sometimes the existing definitions are not that helpful and for my purposes it can be better to go with a synonym for the sake of brevity.
- Another complication is that the Academic Word List (AWL) contains words arranged by frequency into 10 sublists, so I’ll have to decide whether this ordering is worth adhering to when it comes time for the software to decide which words are allocated first. Sticking to the sublist order is probably a good idea.
- Another problem is that words often have more than one meaning, and one word can belong to multiple parts of speech. Probably what I’ll do here is just take whatever appears to be the most frequently used sense of the word. This is usually the meaning that appears first in the dictionary entry for that word. If in doubt, I’ll turn to the LDOCE and see what comes first there.
- For some words in the first thousand (1K) list, it will be pretty much impossible to provide a definition. Here I’m talking about function words rather than content words. For example, how do you define “the” or “an” in a way that avoids total confusion? The answer here might be to either exclude these words from the list at this point, or give a definition in terms of grammar and parts of speech.
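Two of the points above, the letter cues and the check that a definition doesn’t give the game away, are easy to sketch in code. What follows is only a rough Python sketch under my own assumptions, not a finished implementation: the function names `letter_cue` and `gives_away`, the underscore blanks, and the hand-supplied set of word forms are all hypothetical choices made for illustration.

```python
import re


def letter_cue(word, max_shown=4):
    """Show the first few letters of a word and blank out the rest.

    At most `max_shown` letters are shown, and at least one letter is
    always hidden, so a three-letter word shows two letters and a
    four-letter word shows three.
    """
    shown = max(min(max_shown, len(word) - 1), 0)
    return word[:shown] + "_" * (len(word) - shown)


def gives_away(definition, forms):
    """Return the words in `definition` that match the target word.

    `forms` is a hand-made set of the headword and its related forms
    (an assumption: nothing here works out related forms automatically).
    An empty list means the definition is safe for a matching activity.
    """
    tokens = set(re.findall(r"[a-z']+", definition.lower()))
    return sorted(tokens & {form.lower() for form in forms})


if __name__ == "__main__":
    for word in ["absent", "able", "the"]:
        print(word, "->", letter_cue(word))

    family = {"absent", "absence", "absently"}
    print(gives_away("Not present; showing absence.", family))
    print(gives_away("Not in a place; away.", family))
```

The give-away check only does exact matching against forms I list myself, so it will miss anything I forget to include. That seems an acceptable trade-off for a first pass, given that each definition gets edited by hand anyway.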
If you have any comments or helpful suggestions, let me know below.