
Text2Label: A Quick and Dirty Client-Side Text "Classifier"

By @colarusso


This is not a true text classifier, hence the quotes. It is a JavaScript-only hack that mostly does the job. I needed a solution that would run on something like GitHub Pages (where you can only do client-side scripting). Anyhow, you start with a two-column CSV file. The first column contains example texts, and the second column contains a label for each text. Below, I've used single words, but it can handle sentences too. From these we build a vector for each text and associate it with its label. Instead of using this as training data for an honest-to-goodness classifier, we just compare the vector of a novel text against the existing examples, looking for the closest match.

Build Vectors

Did you notice how this page took a while to load? Well that's because it loaded a bunch of word vectors (used to vectorize your text). You may be able to get away with fewer words. So feel free to play around with the selection below. If you change this selection, however, you'll need to rebuild your vectors to see the results. Also, this page loaded almost 115 MB of word vectors. You'll be using only a subset of these regardless of what option you choose, no more than about 60%. Note: I originally got these vectors from word2vecjson.
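
The page doesn't spell out how the individual word vectors get combined into a single vector for a text. One common approach, shown here purely as an assumption and not as what "text2label.js" actually does, is to average the vectors of the words that are found. The names `averageVector` and `wordVectors` are hypothetical, and unknown words are simply skipped to keep the sketch short.

```javascript
// Hypothetical sketch: average the word vectors of a text.
// `wordVectors` is assumed to be a plain object mapping word -> array of numbers.
function averageVector(text, wordVectors, dims) {
  // Lowercase and split on anything that isn't a letter, digit, or apostrophe.
  const words = text.toLowerCase().split(/[^a-z0-9']+/).filter(Boolean);
  const sum = new Array(dims).fill(0);
  let found = 0;
  for (const word of words) {
    // Own-property check so words like "constructor" don't hit the prototype.
    if (!Object.prototype.hasOwnProperty.call(wordVectors, word)) continue;
    const vec = wordVectors[word];
    for (let i = 0; i < dims; i++) sum[i] += vec[i];
    found++;
  }
  // With no known words, return the all-zero vector rather than dividing by zero.
  return found === 0 ? sum : sum.map((x) => x / found);
}
```

With a toy two-dimensional vocabulary, `averageVector("Cat dog", { cat: [1, 0], dog: [0, 1] }, 2)` gives the midpoint of the two word vectors.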

That being said, if you have words that aren't found in the list of word vectors, we'll make a placeholder vector for them. That is, we'll take the 300 dimensions of a word vector and set all of them to zero. The first unknown word will have the first dimension set to one, the second will have the second dimension set to one, and so on through 300. We do the inverse for the next 300 unknown words, with all but a single dimension being one. This doesn't capture semantic info about these words, but it makes sure they are considered during matching. We added this feature mostly to catch terms of art and proper names.
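
The placeholder scheme described above can be sketched roughly as follows. The function name `placeholderVector` is hypothetical, and what happens after the 600th unknown word isn't stated on this page, so this sketch just throws in that case.

```javascript
// Hypothetical sketch of the placeholder scheme; not the code from the download.
const DIMENSIONS = 300;

// `unknownIndex` counts unknown words from 0, in the order they are first seen.
function placeholderVector(unknownIndex, dims = DIMENSIONS) {
  if (unknownIndex < 0 || unknownIndex >= 2 * dims) {
    // Assumption: behaviour past 2 * dims unknown words is not described.
    throw new RangeError("only " + 2 * dims + " placeholder vectors are available");
  }
  const inverted = unknownIndex >= dims; // the second batch of unknown words
  const position = unknownIndex % dims;  // which dimension is singled out
  // First batch: all zeros with a single one.
  // Second batch: all ones with a single zero.
  const vector = new Array(dims).fill(inverted ? 1 : 0);
  vector[position] = inverted ? 0 : 1;
  return vector;
}
```

So `placeholderVector(0)` is one in the first dimension and zero elsewhere, while `placeholderVector(300)` is the inverse: zero in the first dimension and one elsewhere.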

You can type your "training data" into the textarea below, or click "Choose File" to upload a CSV. After that, click "Build Vectors," and it will construct a set of vectors based on your texts and associate them with their labels. If you change this content, you'll need to rebuild your vectors to see the results.
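
For reference, here is a minimal sketch of reading that two-column format into text/label pairs. It is not the parser this page uses. The names `parseCsvLine` and `parseTrainingCsv` are hypothetical, and the sketch assumes there is no header row and no line breaks inside quoted fields.

```javascript
// Hypothetical sketch: split one CSV line into fields, honouring double quotes.
function parseCsvLine(line) {
  const fields = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"'; // a doubled quote inside quotes is a literal quote
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(field);
      field = "";
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

// Turn the whole CSV text into [{ text, label }, ...], skipping blank lines.
function parseTrainingCsv(csvText) {
  return csvText
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const [text, label] = parseCsvLine(line);
      return { text: text.trim(), label: (label || "").trim() };
    });
}
```

Quoting matters once your examples are sentences, since a sentence containing a comma would otherwise be split across the two columns.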

Data Collection (Optional)

It might be nice to see what text people are trying to classify, along with the answers they are getting, but since we're trying to run this on a server without server-side scripting or a database, we need to get creative. So I added this AirTable hack. Basically, you create an AirTable and use their API to add a row for every user text. Note: your table must contain two columns, one named "text" and one named "label". If you leave any of the following three fields blank, this functionality will not be added to your code. If you want to use AirTable in this way, just add the relevant info. Keep in mind that all three values will be visible to your end users should they inspect the code. So don't use credentials that you don't want out in public.
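
To give a feel for what adding a row involves, here is a rough sketch of a create-record request against AirTable's REST API. This is an assumption about the shape of the call, not the `write_to_table` code you download from this page (which relies on axios). The names `buildAirtableRequest` and `sendToAirtable` are hypothetical.

```javascript
// Hypothetical sketch: describe a request that adds one row with "text" and "label".
// Returns null if any of the three settings is blank, mirroring the rule above.
function buildAirtableRequest(apiKey, baseId, tableName, text, label) {
  if (!apiKey || !baseId || !tableName) return null;
  return {
    url: "https://api.airtable.com/v0/" + baseId + "/" + encodeURIComponent(tableName),
    options: {
      method: "POST",
      headers: {
        // This credential ships to every visitor, so treat it as public.
        "Authorization": "Bearer " + apiKey,
        "Content-Type": "application/json"
      },
      // The field names must match the table's column names exactly.
      body: JSON.stringify({ fields: { text: text, label: label } })
    }
  };
}

// Send it with the browser's built-in fetch; returns a Promise.
function sendToAirtable(request) {
  return fetch(request.url, request.options);
}
```

Building the request separately from sending it makes it easy to inspect exactly what would leave the browser before you wire it up.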

API Key:    Base ID:    Table Name:

Test "Understanding"

Test things out and see if you need more examples. Once you're happy, download the two JavaScript files below, and use their functions to find the best match. Again, to find this match, the files vectorize a string of text and look for the most similar item in the list of vectors you built above. They measure similarity using cosine similarity.
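
If cosine similarity is new to you, the matching step boils down to something like the sketch below. The names `cosineSimilarity` and `closestExamples` are hypothetical stand-ins, not the functions in the downloaded files, and the `{ label, vector }` shape of the examples is an assumption.

```javascript
// Cosine similarity: the dot product divided by the product of the vector lengths.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // An all-zero vector has no direction; treat it as matching nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: return the n most similar examples as [label, score] pairs.
// `examples` is assumed to look like [{ label: "animal", vector: [...] }, ...].
function closestExamples(n, queryVector, examples) {
  return examples
    .map((example) => [example.label, cosineSimilarity(queryVector, example.vector)])
    .sort((x, y) => y[1] - x[1]) // highest similarity first
    .slice(0, n);
}
```

Because cosine similarity only looks at direction, a long text and a short one can still match closely if their vectors point the same way.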

Download Your Custom Code

Note: Not all browsers support the download feature. You may want to use Chrome. This is only an issue for the download feature on this page, not the code you are downloading. It should work across browsers.

Demo Code

To use your vectors and run a match like that above, you'll need both the "word2vec.js" and "text2label.js" files. For an example of how to use these, look at the code below. Then check it out in action. FYI, the demo uses vectors built on the default Animal, Vegetable, or Mineral data you saw when this page first loaded. Also, you may want to move the src calls to the end of the page so they don't hold up loading.

<html>
  <head>
    <!-- Include the following line only if you're writing data to AirTable -->
    <script src="https://cdnjs.cloudflare.com/ajax/libs/axios/0.16.2/axios.min.js"></script> 
    <script src="js/word2vec.js"></script>
    <script src="js/text2label.js"></script>
    <script>
      function test_understanding(string) {
          // This function makes use of text2label to find the best answer.
          // getNClosestAnswer can return multiple labels; here we limit it to one.
          const answers = getNClosestAnswer(1, vectorize(string))
          // To allow multiple instances of the same label, a #n is appended to
          // each label. Looking the answer up in QLabels removes that suffix,
          // so labels are displayed consistently.
          const label = QLabels[answers[0][0]]
          document.getElementById('answer').innerHTML = label
          // Include the following line only if you're writing data to AirTable
          write_to_table(string, label)
      }
    </script>
  </head>
  <body>
    <input id="test" style="width:300px;" onkeypress="if (event.keyCode==13){test_understanding(this.value)}"/>
    <input type="button" value='Get label' onclick="test_understanding(document.getElementById('test').value)"/>
    <p id="answer"></p>
  </body>
</html>

GitHub Repo