P-8-20-20-1

William Dean Howells
Collection
Cautionaries are edits made to the original content to improve the usability and clarity of the informatic design. Edits should focus on identifying the framework of the original content in its entirety, including redundant messages of cultural or legal significance. The following edits were made to the content to improve the framework:
  1. Words were stemmed.
  2. Stop words were removed.
  • The Stop Word List: 'a', 'about', 'above', 'across', 'after', 'afterwards', 'again', 'against', 'all', 'almost', 'alone', 'along', 'already', 'also', 'although', 'always', 'am', 'among', 'amongst', 'amoungst', 'amount', 'an', 'and', 'another', 'any', 'anyhow', 'anyone', 'anything', 'anyway', 'anywhere', 'are', 'around', 'as', 'at', 'back', 'be', 'became', 'because', 'become', 'becomes', 'becoming', 'been', 'before', 'beforehand', 'behind', 'being', 'below', 'beside', 'besides', 'between', 'beyond', 'bill', 'both', 'bottom', 'but', 'by', 'call', 'can', 'cannot', 'cant', 'co', 'con', 'could', 'couldnt', 'cry', 'de', 'describe', 'detail', 'do', 'done', 'down', 'due', 'during', 'each', 'eg', 'eight', 'either', 'eleven', 'else', 'elsewhere', 'empty', 'enough', 'etc', 'even', 'ever', 'every', 'everyone', 'everything', 'everywhere', 'except', 'few', 'fifteen', 'fify', 'fill', 'find', 'fire', 'first', 'five', 'for', 'former', 'formerly', 'forty', 'found', 'four', 'from', 'front', 'full', 'further', 'get', 'give', 'go', 'had', 'has', 'hasnt', 'have', 'he', 'hence', 'her', 'here', 'hereafter', 'hereby', 'herein', 'hereupon', 'hers', 'herself', 'him', 'himself', 'his', 'how', 'however', 'hundred', 'ie', 'if', 'in', 'inc', 'indeed', 'interest', 'into', 'is', 'it', 'its', 'itself', 'keep', 'last', 'latter', 'latterly', 'least', 'less', 'ltd', 'made', 'many', 'may', 'me', 'meanwhile', 'might', 'mill', 'mine', 'more', 'moreover', 'most', 'mostly', 'move', 'much', 'must', 'my', 'myself', 'name', 'namely', 'neither', 'never', 'nevertheless', 'next', 'nine', 'no', 'nobody', 'none', 'noone', 'nor', 'not', 'nothing', 'now', 'nowhere', 'of', 'off', 'often', 'on', 'once', 'one', 'only', 'onto', 'or', 'other', 'others', 'otherwise', 'our', 'ours', 'ourselves', 'out', 'over', 'own', 'part', 'per', 'perhaps', 'please', 'put', 'rather', 're', 'same', 'see', 'seem', 'seemed', 'seeming', 'seems', 'serious', 'several', 'she', 'should', 'show', 'side', 'since', 'sincere', 'six', 'sixty', 'so', 'some', 'somehow', 'someone', 'something', 'sometime', 'sometimes', 'somewhere', 'still', 'such', 'system', 'take', 'ten', 'than', 'that', 'the', 'their', 'them', 'themselves', 'then', 'thence', 'there', 'thereafter', 'thereby', 'therefore', 'therein', 'thereupon', 'these', 'they', 'thick', 'thin', 'third', 'this', 'those', 'though', 'three', 'through', 'throughout', 'thru', 'thus', 'to', 'together', 'too', 'top', 'toward', 'towards', 'twelve', 'twenty', 'two', 'un', 'under', 'until', 'up', 'upon', 'us', 'very', 'via', 'was', 'we', 'well', 'were', 'what', 'whatever', 'when', 'whence', 'whenever', 'where', 'whereafter', 'whereas', 'whereby', 'wherein', 'whereupon', 'wherever', 'whether', 'which', 'while', 'whither', 'who', 'whoever', 'whole', 'whom', 'whose', 'why', 'will', 'with', 'within', 'without', 'would', 'yet', 'you', 'your', 'yours', 'yourself', 'yourselves'.

  • The Reasoning Behind the Selection - These words occur with high frequency and carry little distinctive meaning. They are removed during the analytic stage of modeling so that the more distinctive terminology of the content stands out. Other words could reasonably be included or excluded, since the removal method is not intended to be exact. However, the listed terms should be non-unique, high-frequency, and fully disclosed to users of the informatic model. After the analytic stage, these terms are returned to the informatic model for developing its networks, layering, directionality, and detailing.
  • Implications of Selection - The methodology generalizes the unstructured information. A stop word list may include some unique terms, and it may or may not meet a particular standard asserted as ideal. Regardless of such nuances, the methodology returns the removed words to the corpus for the informatic modeling, so the generalized form of significant associations is consistently accounted for, even if some significantly associated words were initially treated as stop words. In short, there is no perfect stop word list, and lists will vary, but the informatic methodology absorbs these variations and yields a consistent outcome, provided that most non-unique terminology is removed.
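The two edits and the remove-then-return approach described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure actually used for this collection. The tokenizer, the stop word excerpt, and the `toy_stem` suffix stripper are hypothetical stand-ins; a real pipeline would more likely use an established stemmer such as Porter's. What the sketch shows is that stop words are filtered only from an analytic view, while the full token list is kept so the words can be returned for later stages.

```python
import re

# Excerpt of the stop word list above, for illustration only; in practice the
# full disclosed list would be loaded.
STOP_WORDS = frozenset({"a", "about", "and", "he", "his", "in", "of", "the", "to", "was", "with"})

# Hypothetical, deliberately crude suffix stripper standing in for a real
# stemmer such as Porter's. It is not the stemmer used for this collection.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text and split it into runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def analytic_view(tokens):
    """Return (position, stem) pairs for tokens that are not stop words.

    Positions point back into the full token list, so the removed stop words
    can be restored for the later network, layering, and detailing stages.
    """
    return [(i, toy_stem(t)) for i, t in enumerate(tokens) if t not in STOP_WORDS]


full_tokens = tokenize("He walked to the offices of the publishers with his editors.")
analytic = analytic_view(full_tokens)
print(analytic)  # [(1, 'walk'), (4, 'offic'), (7, 'publisher'), (10, 'editor')]
print(full_tokens)  # full token list, stop words included, kept unchanged
```

Recording each stem's position is one way, assumed here, to make the later return step straightforward. Any stage that needs the stop words can look them up in `full_tokens` by index.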


Specific Cautionaries

The following cautionaries are specific to the Howells Collection.
  • There was a large variety of numbers and letter-number combinations that marked news sections. All numbers and letter-number combinations not constituting words or abbreviations were removed after the analytic modeling stage. Some low-frequency tokens in which numbers were fused with words were removed as well. All such combinations were removed to improve the usability and clarity of the content being modeled informatically.
  • No words were removed other than those on the stop word list, and those were removed only for the framing and analytic stages. They are returned during the network, layering, and detailing stages of modeling.
  • Errors in the content, such as word-conversion errors, are not edited and remain visible to viewers of the model. The focus is on building trust through process and procedure, not through easily manipulated means such as finely threaded performances of perfection and cosmetic appeal. Exceptions will be listed in the Specific Edits section.
  • Any split words that are merged back together will be listed in Specific Edits.
  • The usability standard is applied moderately. That is, terms such as "ebook", proper nouns such as publisher names, and other terms reflecting the publication as a whole will likely be included in the modeling process. The models are designed to account for terms that operate in different contexts, such as publication terms, which are presented alongside the design of the written work itself, with the ideas of the given author left intact.
  • This methodology is designed to manage the unstructured informational environment. It produces a sound and consistent overall design from categorical arrangements that are inconsistent and imperfect, much like a hairstyle. Individual terms, like individual hairs, will change, but the overall styling, the informatic model, remains largely the same: a consistent arrangement of major nodes. In this way, the unstructured informational environment differs from the structured informational environment.
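The number-removal cautionary can be sketched in the same spirit. Several details here are assumptions for illustration: the tokenization rule (split on whitespace, trim surrounding punctuation), the invented sample line, and the hypothetical `keep` allowlist for tokens judged to be words or abbreviations. The actual rule applied to the Howells Collection is not specified beyond the description above. Because this tokenizer trims underscores, it would not reproduce every token in the Specific Edits listing exactly; for example, it reduces `13_` to `13`.

```python
import re
import string

HAS_DIGIT = re.compile(r"\d")


def tokenize(text):
    """Split on whitespace and trim leading/trailing punctuation (an assumed rule)."""
    tokens = (t.strip(string.punctuation) for t in text.split())
    return [t for t in tokens if t]


def strip_numeric_tokens(tokens, keep=frozenset()):
    """Drop tokens containing a digit unless their lowercase form is in `keep`.

    `keep` is a hypothetical allowlist for letter-number tokens judged to be
    words or abbreviations. Removed tokens are returned sorted as strings so
    they can be disclosed as a list of edits.
    """
    kept, removed = [], []
    for token in tokens:
        if HAS_DIGIT.search(token) and token.lower() not in keep:
            removed.append(token)
        else:
            kept.append(token)
    return kept, sorted(removed)


sample = "Sample Title. 12mo, $1.50. Another Title. 8vo, 2 vols."  # invented input
kept, removed = strip_numeric_tokens(tokenize(sample))
print(kept)     # ['Sample', 'Title', 'Another', 'Title', 'vols']
print(removed)  # ['1.50', '12mo', '2', '8vo']
```

Returning the removed tokens, rather than discarding them silently, matches the document's emphasis on disclosing every edit to users of the model.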

Specific Edits

0 0 0 0 0 0 0 00 00 00 00 01 01 01 01 01 01 01 019l 02 02 03 030 036r 04 04 041 047 049 05 05 053 059l 06 067 07 072 08 087 09 093 097l 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.25 1.25 1.25 1.25 1.50 1.50 1.50 1.50 1.50 1.50 1.75 1.75 10 10 10 10 10,000 100,000 101 103 103 105 106 107 11 11 11 11 11 110 114 116 117 12 12 12 12 121 122 122 125 125 126 1277 128 12m 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 13 13 13 13 13_ 130 131 1328 133 1334 135 138 139 14 140 141 141 14276 143 143 149 15 15 15 154 154 1565 159 15th 16 1647 165 16mo 16mo 16mo 16mo 16mo 16mo 16mo 16mo 17 170 1707 1764 1774 1781 1788 1788 1791 1797 1799 17th 18 18 18 1802 1803 1814 1817 182 1822 1825 1829 183 1831 1835 1837 1838 1844 1850 1853 1855 1855 18565 18565 1859 1859 186 1860 1860 18605 18605 1863 1864 1864 1864 1866 1867 1869 1870 1870 1872 1872 1874 1875 1879 1879 1879 1879 188 1885 1885 1885 1887 1889 1890 1890 1891 1891 1892 1892 1892 1893 1893 1893 1894 1894 1895 1897 1898 1898 1899 1899 19 190 1900 1900 1900 1900 1900 1901 1901 1902 1902 1902 1902 1904 1904 1904 1905 1906 1906 1908 1908 1909 1909 1909 191 1911 1913 1915 1916 1916 1916 1916 1917 1917 1921 199 1996 19th 1â 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2.00 2.00 2.00 2.00 2.50 20 20 2001 2002 2002 2003 2003 2004 2004 2006 2006 2006 2007 2008 2008 201 2013 2016 2016 2018 202 205 209 21 21 211l 21381 214 219 21st 22 22 22 22 223 223l 22519 22519 228 23 23 23 232 2334 234r 236 237 237 237 24 242r 24689 25 25 25 25 25 2506 25383 25383 255 255 257 258r 259l 26 26 262r 268r 269l 27 27 272 275 278 28 28 28 280 280r 284l 286 28th 29 29 293 3 3 3 3 3 3 3 3 3 3 3 3 3 30 30,000_l_ 300 3051 31 31 31 31st 32 32 32_ 321 3237 32mo 32mo 32mo 32mo 32mo 3363 3364 3365 3374 3377 3377 3378 3378 3380 3380 3383 3383 3384 3384 3385 3385 3386 3386 3387 3387 3389 3389 3389 3389 3390 3391 3392 3393 3394 3395 3396 3398 3398 3398 3399 34 34 
3401 3402 3403 3404 35 36 36 37 38 39 4 4 4 4 4 4 4 4 4 4 4 4 4 40 41 41 41 42 42 4270 43 43 43469 44 44 45 45 46 4600 4600 4600 4645 47 48 48 48 4to 5 5 5 5 5 5 5 50 50 50 50 50 50 50 50 50 50 50 50 50,000 51 52 52 52_ 53 56 0 0 0 00 00 00 00647020 01 01 01 02 05 07 08 1 1 1 1 1 1 1.00 1.00 1.00 1.00 1.25 1.50 1.50 1.75 10 10 10 100,000 101 103 105 107 11 11 115th 1162 117 1177 11th 12 122 124 129 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12mo 12th 13 13 13_ 130 1325 1359 1370 14 141 14276 1429 143 143 1469 1484 15 1516 1532 154 159 165 1673 16mo 16mo 1703 1774 1774 1778 1782 1787 1792 18 1810 1812 1812 1817 183 1831 1835 1849 1849 1850 1852 1855 18565 1859 1860 1860 1860 1860 18605 1862 1863 1864 1870 1871 1872 1879 1879 1884 1885 1890 1892 1893 1893 1893 1894 1894 1895 1899 1899 19 190 1900 1901 1902 1902 1903 1905 1906 1906 1908 1909 1916 1917 192 1920 199 1996 19th 2 2 2 2 2 2 2.00 2.00 2_ 2002 2002 2004 2006 2007 2008 2013 2016 2018 21381 219 22 22519 22519 22519 228 23 2334 234r 236 237 237 237 24 24 24th 25 25 25 2506 25383 255 255 257 258r 26 268r 269l 275 28th 29 293 2mo 3 3 3 3 30 30 300 300 302 3051 31 32 3237 32mo 32mo 32mo 32mo 3363 3364 3365 3374 3377 3377 3378 3378 3380 3380 3383 3383 3384 3384 3385 3385 3386 3386 3387 3387 3389 3390 3391 3392 3393 3394 3395 3396 3398 3398 3398 3398 3398 34 34 3401 3402 3403 3404 34th 35 38 38 380 3d 4 4 4 4 40 41 41 42 4270 43 43469 44 44 45 45 45 4600 4600 4600 4645 48 48 48 48 48 4th 4to 5 5 50 50 50 50 50 50 50,000 51 51st 52 53 53 56 56970 57 6 60 61 6221541 65 7 7 7 7.30 70 7130 723 724 724 724 724 726 728 7364 7422 7430 75 75 77 7797 7839 8 8 8 8 8 80 82 84 8449 85 85th 89 89 8th 8vo 8vo 8vo 8vo 8vo 8vo 8vo 8vo 8vo 8vo 8vo 8vo 8vo 8vo 8vo 8vo 9 9 90 90 900 926 96 96th 99 999th 56970 56970 57 57 59 6 6 6 6 6 60 60 61 6221541 65 67 675 69 7 7 7 7 7 7 7 7 7083 7083 7130 723 724 724 724 726 728 7364 7422 7430 75 75 75 77 7797 
8 8 8 8 8 8 8 8 8 8 8 80â 82 83 83 84 85 89 89 89 8vo 8vo 8vo 8vo 8vo 8vo 8vo 8vo 8vo 8vo 9 9 9 91 92 92 92 93 93 94 95 96 97 98 99 99 999th