<html> <meta http-equiv="Content-Type" content="text/html" charset="UTF-8" > <link rel="Stylesheet" type="text/css" media="all" href="main.css" > <head> <title>fold, spindle, and mutilate | why?</title> </head> <body>

Why Fold, Spindle, and Mutilate?

If you are wondering about the name, it was inspired by tollbooth tickets, which often say DO NOT FOLD, SPINDLE, OR MUTILATE, a statement that always seemed rather oxymoronic to me. And who spindles anything anymore anyway? How many people even know what a spindle is? I used to own one, just for nostalgia. For those of you who don't know, it is a pointed metal spike on which you skewer papers to keep them in place.

Why

I have been working on various text-based web art projects for a while, all of them exploring the boundaries of chance and randomness, often using dictionaries and word lists. As my experiments progressed they increasingly pointed to the importance of structure and syntax in language, and so I started investigating various approaches. In my exploration I came across <a href="http://www.itri.bton.ac.uk/~Adam.Kilgarriff/bnc-readme.html" target="_new">this</a>, a list of over six thousand three hundred common English words (British, not American English), each listed with its frequency of use in written and spoken language and its part of speech. I quickly realized that this could be prime material for some of the projects I was imagining, but only if I could access the content, and so I turned it into a database (MySQL). This database, or "dictionary", is the backbone of Fold, Spindle, and Mutilate.

How

The process is really quite simple: each word of the original text (words being things separated by spaces in a sentence) is looked up in the dictionary. If the word is found, its part of speech can be determined and a new word is chosen, at random, from the dictionary based on the same part of speech (a noun for a noun, a verb for a verb, etc.). This way the sentence structure of the original text is preserved. Because the replacement words are chosen totally at random, with no regard given to the preceding or following word, many of the combinations created, although syntactically correct (e.g. an adverb with a verb), are not ones that one would find in a 'normal' sentence. But of course the whole idea is to use chance operations to create intriguing combinations of words one wouldn't otherwise use.
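
As a rough illustration of the lookup-and-replace step described above, here is a minimal Python sketch. It is not the code behind the site: the tiny DICTIONARY, its part-of-speech labels, and the function names are all made up for the example, and an in-memory table stands in for the MySQL database.

```python
import random

# Hypothetical miniature dictionary: word -> part of speech.
# The real project keeps several thousand words in MySQL instead.
DICTIONARY = {
    "the": "det",
    "a": "det",
    "cat": "noun",
    "river": "noun",
    "car": "noun",
    "eat": "verb",
    "find": "verb",
    "quickly": "adv",
}


def words_by_pos(dictionary):
    """Group the dictionary's words by part of speech."""
    groups = {}
    for word, pos in dictionary.items():
        groups.setdefault(pos, []).append(word)
    return groups


def mutilate(text, dictionary, rng=random):
    """Swap each known word for a random word with the same part of speech."""
    groups = words_by_pos(dictionary)
    result = []
    for word in text.split(" "):
        pos = dictionary.get(word.lower())
        if pos is None:
            result.append(word)  # unknown words pass through unchanged
        else:
            result.append(rng.choice(groups[pos]))
    return " ".join(result)
```

Calling mutilate("the cat eat quickly", DICTIONARY) returns a sentence with the same determiner, noun, verb, adverb pattern, but with each slot refilled by chance.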

Snares and pitfalls

As I worked on this project I came across some interesting problems. The first was preserving punctuation: sentences begin with a capital letter and end with a period, and contain commas, etc., so I needed to come up with a mechanism that interpreted this so that the reconstructed text 'looks' like the original. <p /> The second problem was plural words. The dictionary logically contains only singular words, e.g. the word 'car' but not the word 'cars'. The first step I took was to take any word that ends in 's', remove the 's', and then check it against the dictionary. Thus 'cars' becomes 'car', which is in the dictionary, and so the part of speech can be determined. But this method quickly breaks down: the word 'pass' also ends with an 's' but is not a plural version of a word. Applying the same method and removing the last 's' gives 'pas', which is not found when checked against the dictionary, and so the system <i>assumes</i> that it is not a plural version of a word. But this is an imperfect solution, as there are many words that break this rule, e.g. 'does' would become 'doe', which would be found in the dictionary but would then be identified as a noun. (Actually both these examples are a bit misleading, as they are actual words in the dictionary. Only words which are not found AND end with an 's' are stripped of their last 's' and re-checked for a singular form. Still, there are other exceptions.)
There are also other plural forms of words which end in 'es' and in 'ies', so methods had to be developed for these as well (for the latter the 'ies' is replaced with a 'y' and the 'y' version of the word is re-checked against the dictionary).
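
The plural lookup described above might be sketched in Python roughly as follows. The WORDS table and the function names are hypothetical, and the 'es' rule (simply dropping the 'es') is my assumption for the example, since only the 's' and 'ies' rules are spelled out above.

```python
# Hypothetical stand-in for the dictionary: word -> part of speech.
WORDS = {
    "car": "noun",
    "city": "noun",
    "box": "noun",
    "house": "noun",
    "pass": "verb",
    "does": "verb",
    "doe": "noun",
}


def singular_candidates(word):
    """Yield possible singular forms, trying the longest suffix first."""
    if word.endswith("ies"):
        yield word[:-3] + "y"  # 'cities' -> 'city'
    if word.endswith("es"):
        yield word[:-2]        # 'boxes' -> 'box' (assumed rule)
    if word.endswith("s"):
        yield word[:-1]        # 'cars' -> 'car'


def lookup(word, dictionary):
    """Return (base_word, part_of_speech, is_plural), or None if unknown."""
    # The word itself is always checked first, so 'pass' and 'does'
    # are found as-is and never stripped.
    if word in dictionary:
        return word, dictionary[word], False
    for candidate in singular_candidates(word):
        if candidate in dictionary:
            return candidate, dictionary[candidate], True
    return None
```

Because the candidates are tried in order, 'houses' first becomes 'hous' (not found) and then 'house' (found), while a word with no match at all falls through and is left alone.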

Similar to the problem of plural words is the issue of past tense. Again, the dictionary does not contain past tense versions of words, and so methods similar to the one described above are employed to attempt to handle past tense words in the original text.<br /> It should also be noted that in the case of both past tense words and plural words the replacement words in the newly generated text are accordingly converted to past tense or plural, which in some cases leads to incorrect grammar but seemed necessary in order to preserve the 'flavor' of the original text.
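
To show how re-applying the original word's form to its replacement can go wrong, here is a small Python sketch. The spelling rules in it are deliberately naive and are my own assumptions for illustration, not the rules the site actually uses.

```python
def pluralize(word):
    """Naive plural: 'city' -> 'cities', 'box' -> 'boxes', 'car' -> 'cars'."""
    if word.endswith("y") and len(word) > 1 and word[-2] not in "aeiou":
        return word[:-1] + "ies"
    if word.endswith(("s", "x", "ch", "sh")):
        return word + "es"
    return word + "s"


def past_tense(word):
    """Naive past tense: 'like' -> 'liked', 'walk' -> 'walked'."""
    if word.endswith("e"):
        return word + "d"
    return word + "ed"


def reinflect(replacement, is_plural=False, is_past=False):
    """Give the replacement word the same form the original word had."""
    if is_plural:
        return pluralize(replacement)
    if is_past:
        return past_tense(replacement)
    return replacement
```

With rules this simple, an irregular verb such as 'find' comes out as 'finded', which is exactly the kind of incorrect grammar mentioned above.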

Still another problem is contractions in the original text. Words which contain an apostrophe are examined and correspondingly converted to an expanded version, e.g. "can't" is changed to "can not". This is done by splitting the original word at the apostrophe. Characters preceding the apostrophe are assumed to be the word to be replaced, and letters following the apostrophe are compared against rules of contraction, e.g. "n't" means "not", and the appropriate suffix is added to the new word.
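
The splitting step for words like "can't" (where the joining character is an apostrophe) can be sketched in Python as below. The SUFFIX_RULES table is hypothetical: only the "not" rule comes from the description above, the other entries are assumptions added to round out the example, and keying the rules on the letters after the apostrophe is my own simplification.

```python
# Hypothetical rules: letters after the apostrophe -> expanded word.
SUFFIX_RULES = {
    "t": "not",    # can't -> can not (the one rule described in the text)
    "ll": "will",  # assumed
    "re": "are",   # assumed
    "ve": "have",  # assumed
}


def split_contraction(word):
    """Return (head, expansion); expansion is None when no rule applies."""
    # Treat the typographic apostrophe like the plain ASCII one.
    normalized = word.replace("\u2019", "'")
    head, mark, tail = normalized.partition("'")
    if not mark:
        return word, None  # no apostrophe: not a contraction
    return head, SUFFIX_RULES.get(tail.lower())
```

Here "can't" splits into "can" plus "not", so "can" is the word that gets replaced. The sketch shares the weakness of any split-at-the-apostrophe approach: "don't" yields "don", which would not be found in the dictionary.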