Using the online vocabulary profiler – Part 2


I’m using the online vocabulary profiler to create a new vocabulary diagnostic assessment to use as a model in my course.

This is a bit technical. So if it’s not your thing, just watch some cat videos.

The screenshot above shows the output from the vocabulary profiler, or at least the first screen of it, after I pasted in about 30,000 words from the Learning Progressions for Adult Literacy. I downloaded the document as a PDF, then copied and pasted the entire text into the submission box on the main Vocab Profiler landing page.

Of course, this is a lot of words. It works equally well with much smaller texts.

What I’ve got is a good indication of how many words fit into each of the following categories:

  • K1 Words (First thousand words of English): 72.05%
  • K2 Words (Second thousand words of English): 5.32%
  • AWL (Academic Wordlist): 12.16%
  • Off-list words: 10.48%
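If you're curious what sits behind percentages like these, here's a minimal Python sketch of the basic idea: look up each token in the K1, K2 and AWL lists, count anything not found as off-list, then report each category's share of the total. The tiny word lists and the `classify` and `profile` functions are hypothetical stand-ins. The real profiler's lists are far larger and (as far as I know) group related forms into word families, which this sketch doesn't attempt.

```python
import re
from collections import Counter

# Hypothetical, tiny stand-ins for the real K1, K2 and AWL lists.
K1 = {"the", "of", "and", "to", "is", "learn", "read", "write", "word", "words"}
K2 = {"adult", "progress"}
AWL = {"assess", "assessment", "diagnostic", "indicate"}

def classify(token):
    """Return the frequency level of one lower-case token (exact match only)."""
    if token in K1:
        return "K1"
    if token in K2:
        return "K2"
    if token in AWL:
        return "AWL"
    return "Off-list"  # anything not found in the three lists

def profile(text):
    """Return the percentage of tokens at each level, rounded to two decimals."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(classify(t) for t in tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {level: round(100 * counts[level] / total, 2)
            for level in ("K1", "K2", "AWL", "Off-list")}

print(profile("Adult learners read and write words; the assessment is diagnostic."))
# {'K1': 60.0, 'K2': 10.0, 'AWL': 20.0, 'Off-list': 10.0}
```

Here "learners" ends up off-list only because the sketch matches exact forms; a word-family approach would count it along with "learn".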

What I really want is the word lists, which are much further down the screen. But this data is interesting because it tells me that more than 20% of the words in the text are at the AWL level or Off-list (12.16% + 10.48% = 22.64%).

The Off-list words are going to be interesting, because they are likely to be the technical words from the text.

You can also see the start of the colour-coded text on the right-hand side. The colours show how words from the different frequency levels are distributed through the text:

  • Red = Off-list
  • Yellow = AWL
  • Green = K2
  • Blue = K1
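To show how that kind of colour coding can work, here's a small Python sketch that wraps each word in an HTML `<span>` coloured by its level, using the colours listed above. The `LEVEL` lookup is a hypothetical toy, and this isn't the profiler's own code; anything missing from the lookup is treated as off-list and comes out red, which is also why names and words from other languages turn red.

```python
import html

# Colours as listed above; the LEVEL lookup is a hypothetical toy.
COLOURS = {"K1": "blue", "K2": "green", "AWL": "yellow", "Off-list": "red"}
LEVEL = {"the": "K1", "read": "K1", "adult": "K2", "assess": "AWL"}

def colour_code(text):
    """Wrap each whitespace-separated word in a <span> coloured by its level."""
    spans = []
    for word in text.split():
        # Ignore case and trailing punctuation when looking the word up.
        level = LEVEL.get(word.lower().strip(".,;:"), "Off-list")
        spans.append(f'<span style="color:{COLOURS[level]}">{html.escape(word)}</span>')
    return " ".join(spans)

print(colour_code("The adult can assess."))
```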

Some of it gets a bit scrambled, and things like names and words from other languages show up in red as well, but with a bit of filtering you can come up with some really usable word lists.

If I had more time and a smaller text, I would edit the plain text document to either remove or clean up things that are getting scrambled. For example, because I pasted from a PDF, a lot of words have been joined together, creating nonsense words.

This inflates my figure for Off-list words above. But it doesn’t matter too much for what I’m going to use the words for, as I can filter these out manually.
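One way to speed up that manual filtering is to flag off-list tokens that split cleanly into two known words, which is typical of words run together by a PDF copy-and-paste. Here's a minimal Python sketch of that idea. The `split_joined` helper and the `KNOWN` set are hypothetical (not a profiler feature); in practice `KNOWN` would be the combined K1, K2 and AWL lists.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for the combined K1, K2 and AWL lists.
KNOWN = {"the", "learner", "can", "read", "adult", "literacy"}

def split_joined(token, known=KNOWN) -> Optional[Tuple[str, str]]:
    """Return the first (left, right) split where both halves are known words, else None."""
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        if left in known and right in known:
            return left, right
    return None

# Off-list tokens as they might come out of a PDF copy-and-paste.
for token in ["thelearner", "adultliteracy", "phonological", "canread"]:
    parts = split_joined(token)
    if parts:
        print(f"{token}: probably joined ({parts[0]} + {parts[1]})")
    else:
        print(f"{token}: keep for manual review")
```

A genuine compound word could also be split this way, so treat the output as a shortlist to check rather than a final answer.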

If I scroll down far enough, I’ll eventually come to the lists of words that fit into each word frequency level. This is what I’m going to build my vocabulary diagnostic from.

My plan is to create a 30-item assessment with 10 words from each of the 2K, AWL and Off-list levels.
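If you'd rather not pick the words by hand, a plan like this can be sketched in Python by drawing 10 words at random from each list. The `lists` dictionary and `build_item_bank` function are hypothetical placeholders; you'd paste in the filtered words from the profiler output. A fixed seed means the same bank can be regenerated later.

```python
import random

# Hypothetical placeholder lists: replace with filtered words from the profiler output.
lists = {
    "2K": [f"k2_word_{i}" for i in range(25)],
    "AWL": [f"awl_word_{i}" for i in range(40)],
    "Off-list": [f"off_word_{i}" for i in range(30)],
}

def build_item_bank(word_lists, per_level=10, seed=42):
    """Return (level, word) pairs: per_level unique words drawn from each list."""
    rng = random.Random(seed)  # seeded so the same bank can be regenerated
    bank = []
    for level, words in word_lists.items():
        if len(words) < per_level:
            raise ValueError(f"{level} has only {len(words)} words")
        bank.extend((level, w) for w in rng.sample(words, per_level))
    return bank

items = build_item_bank(lists)
print(len(items))  # 30
```

Random sampling is only one option; hand-picking the words most relevant to the course is just as reasonable.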

With lower-level learners I would also use the 1K list, and perhaps even narrow that down to just the content words from the 1K, which is a much smaller list again.
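Narrowing the 1K list to content words can be approximated by removing function words (articles, prepositions, pronouns, auxiliaries and so on). Here's a minimal Python sketch; the `FUNCTION_WORDS` set is a hypothetical, deliberately incomplete example rather than a standard list, so expect to extend it.

```python
# Hypothetical, deliberately incomplete set of English function words.
FUNCTION_WORDS = {
    "the", "a", "an", "and", "or", "but", "of", "to", "in", "on", "at",
    "is", "are", "was", "be", "it", "they", "this", "that", "with", "for",
}

def content_words(k1_words):
    """Keep 1K words that are not function words, preserving their order."""
    return [w for w in k1_words if w.lower() not in FUNCTION_WORDS]

k1_sample = ["the", "read", "of", "family", "write", "and", "work", "they"]
print(content_words(k1_sample))  # ['read', 'family', 'write', 'work']
```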

Next stop: creating my own bank of words to use as the items for assessment…

Here’s a screenshot that shows a portion of the wordlists that the Profiler has generated for each of 1K (Blue), 2K (Green) and AWL (Yellow). The numbers in square brackets tell me how many times the word from that frequency level appears in my text.
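The `word [count]` format in those lists can be reproduced with a simple count per level. This Python sketch uses a hypothetical `LEVEL` lookup and a toy token list rather than the profiler's own code, and it sorts words alphabetically, which may not match the profiler's ordering.

```python
from collections import Counter, defaultdict

# Hypothetical level assignments (stand-in for the real frequency lists).
LEVEL = {"read": "1K", "write": "1K", "adult": "2K", "assess": "AWL"}

def level_lists(tokens):
    """Group tokens by level and return {level: ['word [count]', ...]}, sorted by word."""
    by_level = defaultdict(Counter)
    for t in tokens:
        by_level[LEVEL.get(t, "Off-list")][t] += 1
    return {level: [f"{w} [{n}]" for w, n in sorted(c.items())]
            for level, c in by_level.items()}

tokens = ["read", "adult", "read", "assess", "write", "read", "adult"]
for level, entries in level_lists(tokens).items():
    print(level, entries)
```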




Author: Graeme Smith

Education, technology, design. Also making cool stuff...
