P-4-25-20-1

Samuel Butler
Collection

Cautionaries are edits to the original content, made to improve the usability and clarity of the informatic design. Edits should focus on identifying the framework of the original content in its entirety, including redundant messages of cultural or legal significance. The following edits were made to the content to improve the framework:
  1. Words were stemmed.
  2. Stop words were removed for the framing and analytic stages.
  • The stop word list: 'a', 'about', 'above', 'across', 'after', 'afterwards', 'again', 'against', 'all', 'almost', 'alone', 'along', 'already', 'also', 'although', 'always', 'am', 'among', 'amongst', 'amoungst', 'amount', 'an', 'and', 'another', 'any', 'anyhow', 'anyone', 'anything', 'anyway', 'anywhere', 'are', 'around', 'as', 'at', 'back', 'be', 'became', 'because', 'become', 'becomes', 'becoming', 'been', 'before', 'beforehand', 'behind', 'being', 'below', 'beside', 'besides', 'between', 'beyond', 'bill', 'both', 'bottom', 'but', 'by', 'call', 'can', 'cannot', 'cant', 'co', 'con', 'could', 'couldnt', 'cry', 'de', 'describe', 'detail', 'do', 'done', 'down', 'due', 'during', 'each', 'eg', 'eight', 'either', 'eleven', 'else', 'elsewhere', 'empty', 'enough', 'etc', 'even', 'ever', 'every', 'everyone', 'everything', 'everywhere', 'except', 'few', 'fifteen', 'fify', 'fill', 'find', 'fire', 'first', 'five', 'for', 'former', 'formerly', 'forty', 'found', 'four', 'from', 'front', 'full', 'further', 'get', 'give', 'go', 'had', 'has', 'hasnt', 'have', 'he', 'hence', 'her', 'here', 'hereafter', 'hereby', 'herein', 'hereupon', 'hers', 'herself', 'him', 'himself', 'his', 'how', 'however', 'hundred', 'ie', 'if', 'in', 'inc', 'indeed', 'interest', 'into', 'is', 'it', 'its', 'itself', 'keep', 'last', 'latter', 'latterly', 'least', 'less', 'ltd', 'made', 'many', 'may', 'me', 'meanwhile', 'might', 'mill', 'mine', 'more', 'moreover', 'most', 'mostly', 'move', 'much', 'must', 'my', 'myself', 'name', 'namely', 'neither', 'never', 'nevertheless', 'next', 'nine', 'no', 'nobody', 'none', 'noone', 'nor', 'not', 'nothing', 'now', 'nowhere', 'of', 'off', 'often', 'on', 'once', 'one', 'only', 'onto', 'or', 'other', 'others', 'otherwise', 'our', 'ours', 'ourselves', 'out', 'over', 'own', 'part', 'per', 'perhaps', 'please', 'put', 'rather', 're', 'same', 'see', 'seem', 'seemed', 'seeming', 'seems', 'serious', 'several', 'she', 'should', 'show', 'side', 'since', 'sincere', 'six', 'sixty', 'so', 'some', 'somehow', 'someone', 'something', 'sometime', 'sometimes', 'somewhere', 'still', 'such', 'system', 'take', 'ten', 'than', 'that', 'the', 'their', 'them', 'themselves', 'then', 'thence', 'there', 'thereafter', 'thereby', 'therefore', 'therein', 'thereupon', 'these', 'they', 'thick', 'thin', 'third', 'this', 'those', 'though', 'three', 'through', 'throughout', 'thru', 'thus', 'to', 'together', 'too', 'top', 'toward', 'towards', 'twelve', 'twenty', 'two', 'un', 'under', 'until', 'up', 'upon', 'us', 'very', 'via', 'was', 'we', 'well', 'were', 'what', 'whatever', 'when', 'whence', 'whenever', 'where', 'whereafter', 'whereas', 'whereby', 'wherein', 'whereupon', 'wherever', 'whether', 'which', 'while', 'whither', 'who', 'whoever', 'whole', 'whom', 'whose', 'why', 'will', 'with', 'within', 'without', 'would', 'yet', 'you', 'your', 'yours', 'yourself', 'yourselves'.

  • The Reasoning Behind the Selection - These words are frequent and generic rather than distinctive. They are removed during the analytic stage of modeling so that the more distinctive terminology of the content stands out. Other words could be added to or dropped from the list, since the removal is not intended to be exact. However, the listed terms should be non-unique, high in frequency, and fully disclosed to users of the informatic model. After the analytic stage, these terms are returned to the informatic model for developing its networks, layering, directionality, and detailing.
  • Implications of Selection - The methodology generalizes the unstructured information. Stop word lists differ in their details: a given list may include some distinctive terms, and it may or may not meet a standard that someone considers ideal. Because the methodology returns these words to the corpus for informatic modeling, the generalized form of significant associations is still accounted for consistently, even if some significant words were initially treated as stop words. In short, there is no perfect stop word list, and lists will vary, but the informatic methodology absorbs these variations and produces a consistent outcome, as long as most non-unique terminology is removed.
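
The two steps listed above, stemming and stop word removal, together with the point that stop words are set aside only for the analytic stage, can be sketched in Python. This is a minimal sketch under stated assumptions, not the pipeline actually used for this collection: the stop word set is a small excerpt of the list above, the tokenizer is a simple regular expression, and `crude_stem` is a hypothetical toy suffix stripper standing in for whatever stemmer was actually applied (an established stemmer, such as NLTK's `PorterStemmer`, would normally be used instead). Each kept term records its original position, and the full token list is never modified, so the removed words remain available for the network, layering, and detailing stages.

```python
import re
from collections import Counter

# Assumption: a small excerpt of the stop word list above; in practice the
# full list would be loaded here.
STOP_WORDS = frozenset({"a", "about", "and", "in", "is", "it", "of", "the", "to", "was"})


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def crude_stem(word: str) -> str:
    """Hypothetical toy stemmer: strip one common suffix if at least 3 letters remain.

    A stand-in for a real stemmer (e.g. Porter); not the one used for this collection.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analytic_terms(tokens: list[str], stop_words: frozenset[str] = STOP_WORDS) -> list[tuple[int, str]]:
    """Analytic stage: drop stop words and stem the rest, keeping each token's position.

    The input list is not modified, so stop words stay available for later stages.
    """
    return [(i, crude_stem(tok)) for i, tok in enumerate(tokens) if tok not in stop_words]


tokens = tokenize("The hunting of the snark is about walking")
terms = analytic_terms(tokens)
print(terms)                               # [(1, 'hunt'), (4, 'snark'), (7, 'walk')]
print(Counter(stem for _, stem in terms))  # analytic-stage term frequencies
print(" ".join(tokens))                    # full text, stop words included, for later stages
```

Keeping positions rather than building a new, shorter text is one simple way to make the removal reversible: the analytic stage works on `terms`, while later stages can still read `tokens` in full.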


Specific Cautionaries

The following cautionaries are specific to the S. Butler - Collection:
  • The content contained a wide variety of numbers and number-letter combinations that marked news sections. All numbers and letter-number combinations that do not constitute words or abbreviations were removed after the analytic modeling stage. A small number of low-frequency cases in which numbers were merged with words were removed as well. These combinations were removed to improve the usability and clarity of the content being modeled informatically.
  • No words were removed other than those on the stop word list, and those were removed only for the framing and analytic stages. They are returned during the network, layering, and detailing stages of modeling.
  • Errors in the content, such as word-conversion errors, are not edited and remain visible to viewers of the model. The focus is on building trust through process and procedure, not through easily manipulated avenues such as a finely threaded performance of perfection and cosmetic appeal. Exceptions are listed in the "Specific Edits" section.
  • Split words that were merged back together, if any, are listed in the "Specific Edits" section.
  • The usability standard is applied moderately. That is, terms such as "ebook", proper nouns such as publisher names, and other terms that reflect the publication as a whole will likely be included in the modeling process. The models are designed to handle terms that operate in different contexts, such as publication terms, which are presented alongside the design of the actual written work, with the author's ideas left intact.
  • This methodology is designed to manage the unstructured informational environment, producing a sound and consistent overall design from categorical arrangements that are inconsistent and imperfect, much like a hairstyle. Individual terms, like individual hairs, will change, but the overall styling (the informatic model) remains largely the same, with a consistent arrangement of major nodes. In this respect, the unstructured informational environment differs from the structured one.
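
The removal of numbers and letter-number combinations described above can be sketched as a simple token filter. This is a hedged illustration, not the exact rule applied to this collection: it assumes that any token containing a digit (such as "1727", "10,000", "6d", "10s", or "15th") counts as such a combination, whereas the actual process may have allowed further exceptions for abbreviations. The removed tokens are returned sorted as text rather than numerically, which appears to match the ordering of the runs in the "Specific Edits" list (for example, "10,000" sorts before "101", and "109" before "10d").

```python
import re

# Assumption: any token containing at least one digit is treated as a number
# or letter-number combination (e.g. "1727", "10,000", "6d", "10s", "15th").
HAS_DIGIT = re.compile(r"\d")


def strip_numeric_tokens(tokens: list[str]) -> tuple[list[str], list[str]]:
    """Split tokens into (kept, removed).

    Kept tokens preserve their original order; removed tokens are sorted as
    text (not numerically), so "10,000" comes before "101".
    """
    kept: list[str] = []
    removed: list[str] = []
    for tok in tokens:
        if HAS_DIGIT.search(tok):
            removed.append(tok)
        else:
            kept.append(tok)
    return kept, sorted(removed)


kept, removed = strip_numeric_tokens(["price", "6d", "in", "1727", "and", "10,000", "copies", "101"])
print(kept)     # ['price', 'in', 'and', 'copies']
print(removed)  # ['10,000', '101', '1727', '6d']
```

Under this rule a word fused with a number is removed whole, which is consistent with the note that some numbers meshing with words were removed as well; a stricter rule would be needed to keep legitimate abbreviations that contain digits.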

Specific Edits

0 0 0d 1 1 1 1 1 10 10 10,000 101 103 104 104 105 105 105 107 108 109 109 10d 10s 11 11 113 113 114 114 115 117 119 119 12 12 12 12 120 121 122 123 124 125 126 126 126 127 129 13 13 13 130 131 133 133 134 135 135 136 137 139 140 140 141 142 144 145 145 146 147 147 148 149 149 15 15 15,000 15,000 151 152 156 156 157 157 158 158 15th 16 160 163 163 166 166 167 168 16th 171 172 1727 1727 173 174 177 179 181 181 1811 1812 182 183 1831 1835 184 184 1841 1849 185 1850 1850 1851 1851 186 1860 1861 1863 1865 187 1871 1872 1872 1877 189 189 19 1900 1900 1901 1901 192 192 193 193 194 194 195 195 197 197 198 199 199 199 1s 2 2 2 20 20 20 200 200 200 0 1 10 10,000 100 100 100,000 1000 102 104 105 106 107 108 10s 10s 11 11 110 112 114 116 118 12 120 121 122 124 126 128 129 130 132 134 135 136 138 14 140 142 144 145 146 148 149 15 15,000 150 1500 152 154 156 158 16 160 162 164 166 168 17,500 170 172 174 176 178 179 18 180 182 1831 1835 184 1846 1849 185 1851 1858 186 1870 1872 188 189 19 190 1901 192 194 196 197 198 2 2 20 20 20,000 20,000 200 200 202 204 206 208 210 211 212 214 216 218 22 220 222 223 224 225 226 228 23 230 231 2315 232 234 235 236 237 238 239 24 240 242 243 244 246 247 248 25 25 25 250 250 2500 252 254 256 258 26 260 262 263 264 266 268 270 271 272 273 274 275 276 278 28 280 281 282 284 286 288 29 290 291 292 294 296 297 298 299 3 30 30,000 300 300 3000 302 304 306 308 31 32 32 34 3500 36 38 39 4 40 400 4000 41 42 43 44 46 48 5 5 50 50 500 500 500,000 5000 52 53 54 55 56 57 58 6 6 60 60,000 600 62 63 64 66 68 69 6d 7 70 70,000 700 71 72 74 75 75 76 78 79 80 80,000 81 82 84 86 88 8th 90 900 91 92 93 93 94 96 98 99 201 205 207 207 209 21 210 211 213 215 216 216 217 218 219 22 22 220 221 222 222 223 225 226 229 23 23 230 231 2315 232 233 235 235 237 239 24 243 243 244 244 245 245 246 246 247 248 25 25 25 25 250 250 251 252 254 255 255 257 258 259 26 260 261 262 262 263 263 267 268 269 27 27 270 271 271 271 273 274 275 275 276 277 277 278 278 28 280 281 282 282 285 
286 286 287 288 288 289 29 291 292 293 293 294 294 296 296 297 298 299 2s 3 3 3 30 300 300 305 306 307 308 31 311 312 315 317 318 319 32 320 322 324 325 327 33 330 331 332 333 335 336 34 342 343 344 345 346 347 348 359 360 362 367 37 37 373 375 378 379 385 39 39 394 395 396 4 4 4 400 404 41 41 416 419 42 421 422 423 426 428 43 438 443 445 447 45 451 453 454 458 46 460 461 462 467 472 478 48 481 482 484 485 49 49 490 491 496 498 5 5 50 500 5000 5000 5000 501 506 51 51 512 513 518 52 520 522 523 527 528 53 530 538 539 54 540 542 549 55 55 551 553 554 556 56 560 561 564 566 57 572 574 576 579 58 580 585 589 59 590 591 592 6 6 6 601 603 604 607 608 609 61 612 613 618 619 62 62 623 625 627 63 630 633 637 638 639 64 641 643 645 646 649 65 656 658 660 661 662 663 667 67 671 673 674 676 678 679 68 681 682 684 688 689 69 693 694 696 6d 6d 6d 6d 6d 7 7 7 70 701 704 706 708 709 71 71 710 713 718 72 720 724 727 728 73 730 733 735 75 75 76 77 78 79 79 8 80 80 80 81 81 82 83 85 86 87 88 9 9 9 91 91 92 92 93 94 96 97 99