P-10-25-20-2

Saki
Collection

Cautionaries are edits made to the original content to improve the usability and clarity of the informatic design.  Edits should focus on identifying the framework of the original content in its entirety, including redundant messages of cultural or legal significance.  The following edits were made to the content to improve the framework:
  1. Words were stemmed.
  2. Stop words were removed for the framing and analytic stages.
  • The Stop Word List: 'a', 'about', 'above', 'above', 'across', 'after', 'afterwards', 'again', 'against', 'all', 'almost', 'alone', 'along', 'already', 'also','although','always','am','among', 'amongst', 'amoungst', 'amount',  'an', 'and', 'another', 'any','anyhow','anyone','anything','anyway', 'anywhere', 'are', 'around', 'as',  'at', 'back','be','became', 'because','become','becomes', 'becoming', 'been', 'before', 'beforehand', 'behind', 'being', 'below', 'beside', 'besides', 'between', 'beyond', 'bill', 'both', 'bottom','but', 'by', 'call', 'can', 'cannot', 'cant', 'co', 'con', 'could', 'couldnt', 'cry', 'de', 'describe', 'detail', 'do', 'done', 'down', 'due', 'during', 'each', 'eg', 'eight', 'either', 'eleven','else', 'elsewhere', 'empty', 'enough', 'etc', 'even', 'ever', 'every', 'everyone', 'everything', 'everywhere', 'except', 'few', 'fifteen', 'fify', 'fill', 'find', 'fire', 'first', 'five', 'for', 'former', 'formerly', 'forty', 'found', 'four', 'from', 'front', 'full', 'further', 'get', 'give', 'go', 'had', 'has', 'hasnt', 'have', 'he', 'hence', 'her', 'here', 'hereafter', 'hereby', 'herein', 'hereupon', 'hers', 'herself', 'him', 'himself', 'his', 'how', 'however', 'hundred', 'ie', 'if', 'in', 'inc', 'indeed', 'interest', 'into', 'is', 'it', 'its', 'itself', 'keep', 'last', 'latter', 'latterly', 'least', 'less', 'ltd', 'made', 'many', 'may', 'me', 'meanwhile', 'might', 'mill', 'mine', 'more', 'moreover', 'most', 'mostly', 'move', 'much', 'must', 'my', 'myself', 'name', 'namely', 'neither', 'never', 'nevertheless', 'next', 'nine', 'no', 'nobody', 'none', 'noone', 'nor', 'not', 'nothing', 'now', 'nowhere', 'of', 'off', 'often', 'on', 'once', 'one', 'only', 'onto', 'or', 'other', 'others', 'otherwise', 'our', 'ours', 'ourselves', 'out', 'over', 'own','part', 'per', 'perhaps', 'please', 'put', 'rather', 're', 'same', 'see', 'seem', 'seemed', 'seeming', 'seems', 'serious', 'several', 'she', 'should', 'show', 'side', 'since', 'sincere', 'six', 'sixty', 'so', 
'some', 'somehow', 'someone', 'something', 'sometime', 'sometimes', 'somewhere', 'still', 'such', 'system', 'take', 'ten', 'than', 'that', 'the', 'their', 'them', 'themselves', 'then', 'thence', 'there', 'thereafter', 'thereby', 'therefore', 'therein', 'thereupon', 'these', 'they', 'thick', 'thin', 'third', 'this', 'those', 'though', 'three', 'through', 'throughout', 'thru', 'thus', 'to', 'together', 'too', 'top', 'toward', 'towards', 'twelve', 'twenty', 'two', 'un', 'under', 'until', 'up', 'upon', 'us', 'very', 'via', 'was', 'we', 'well', 'were', 'what', 'whatever', 'when', 'whence', 'whenever', 'where', 'whereafter', 'whereas', 'whereby', 'wherein', 'whereupon', 'wherever', 'whether', 'which', 'while', 'whither', 'who', 'whoever', 'whole', 'whom', 'whose', 'why', 'will', 'with', 'within', 'without', 'would', 'yet', 'you', 'your', 'yours', 'yourself', 'yourselves', 'the'.

  • The Reasoning Behind the Selection - These words are high-frequency, general, non-unique terms.  They are removed during the analytic stage of modeling so that the more distinctive terminology in the content stands out.  Other words could be included or excluded, as the method of removal isn’t intended to be exact.  However, the terms should be non-unique, of high frequency, and fully disclosed to users of the informatic model.  After the analytic stage, these terms are returned to the informatic model for developing the networks, layering, directionality, and detailing of the model.
  • Implications of Selection - The methodology generalizes the unstructured information.  A stop word list may vary in nuanced ways: it may or may not include some unique terms, and it may or may not meet a particular standard asserted as ideal.  Regardless, the methodology returns these words to the corpus for the informatic modeling, so the generalized form of significant associations is consistently accounted for, even if some words of significant association were initially treated as stop words.  That is, there isn't a perfect stop word list, and lists will vary, but the informatic methodology manages these variations for a consistent outcome, so long as most non-unique terminology is removed.
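
The removal-and-return of stop words described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for the collection: the function names, the whitespace tokenization, the sample sentence, and the short excerpt of the stop word list are all assumptions made for the example.

```python
# Hypothetical sketch: STOP_WORDS is a short excerpt of the full list above,
# and the function names are illustrative, not part of the original method.
STOP_WORDS = {"a", "about", "the", "of", "and", "was", "in", "to"}


def analytic_view(tokens, stop_words=STOP_WORDS):
    """Return the tokens with stop words removed (framing and analytic stages)."""
    return [t for t in tokens if t.lower() not in stop_words]


def split_for_stages(text, stop_words=STOP_WORDS):
    """Keep two views of the same text.

    "analytic": stop words removed, used only for framing and analysis.
    "full":     every token retained, so stop words can be returned for the
                network, layering, and detailing stages.
    """
    tokens = text.split()  # assumption: simple whitespace tokenization
    return {"analytic": analytic_view(tokens, stop_words), "full": tokens}


text = "The cat was in the garden of the house"
stages = split_for_stages(text)
print(stages["analytic"])  # ['cat', 'garden', 'house']
print(stages["full"])      # all nine original tokens

# A slightly different stop word list changes only the analytic view;
# the full view that later stages draw on is unaffected.
variant = split_for_stages(text, STOP_WORDS - {"of"})
print(variant["analytic"])  # ['cat', 'garden', 'of', 'house']
```

Keeping the full token sequence alongside the filtered one is one simple way to make the removal reversible, which is the property the text relies on when it says variations in the stop word list are managed by the later stages.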


Specific Cautionaries

The following cautionaries are more specific to the Saki - Collection:
  • A large variety of numbers and number-letter combinations marked news sections.  All numbers and letter-number combinations not constituting words or abbreviations were removed after the analytic modeling stage.  A small number of low-frequency tokens in which numbers were meshed with words were removed as well.  All such combinations were removed to improve the usability and clarity of the content being modeled informatically.
  • No words were removed other than those on the Stop Word List.  These words were removed only for the framing and analytic stages and are returned during the network, layering, and detailing stages of modeling.
  • Errors in the content, such as word conversion errors, are not edited and will remain visible to viewers of the model.  The focus is on developing trust through process and procedure, not through avenues easily manipulated, such as finely-threaded performances of perfection and cosmetic appeal.  Exceptions will be listed in the "Specific Edits" section.
  • Split words that are merged back together, if any, will be listed in the "Specific Edits" section.
  • The usability standard is applied moderately.  That is, terms like "ebook", proper nouns such as publisher names, and any other terms reflective of the overall publication will likely be included in the modeling process.  The models are designed to account for terms that work in different contexts, such as publication terms, which will be presented alongside the design of the actual written work, with the ideas of the given author intact.
  • This methodology is designed to manage the unstructured informational environment, in which a sound and consistent overall design emerges from categorical arrangements that are themselves inconsistent and imperfect, much like a hairstyle.  Even though the terms (the individual hairs) will change, the overall styling (the informatic model) will remain largely the same, with a consistent arrangement of major nodes.  In this way, the unstructured informational environment differs from the structured informational environment.
  • To improve the readability of models, non-alphanumeric symbols are likely to be removed.
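
The number and symbol handling described in the cautionaries above could look roughly like the sketch below. It is a simplification under stated assumptions: it removes every token that contains a digit and does not attempt to recognize words or abbreviations that legitimately include numbers. The function names and sample tokens are hypothetical, and sorting the removal log as plain strings is an assumption based on the order in which removed tokens are listed under Specific Edits.

```python
import re

# Assumption: a token counts as a number or letter-number combination
# if it contains at least one digit.
DIGIT = re.compile(r"\d")


def remove_numeric_tokens(tokens):
    """Split tokens into (kept, removed); any token containing a digit is removed.

    The removed tokens are returned sorted as plain strings so they can be
    disclosed as a log.
    """
    kept, removed = [], []
    for token in tokens:
        if DIGIT.search(token):
            removed.append(token)
        else:
            kept.append(token)
    return kept, sorted(removed)


def strip_symbols(token):
    """Drop non-alphanumeric characters from a token (letters and digits stay)."""
    return "".join(ch for ch in token if ch.isalnum())


# Illustrative tokens only, not a quotation from the collection.
sample = ["story", "14th", "100,000", "tea-time", "5hort", "p.", "12"]
kept, removed = remove_numeric_tokens(sample)
cleaned = [strip_symbols(t) for t in kept]

print(removed)  # ['100,000', '12', '14th', '5hort']
print(cleaned)  # ['story', 'teatime', 'p']
```

Returning the removed tokens as a separate list, rather than discarding them, keeps the removal fully disclosed, in line with the emphasis on process and procedure in the cautionaries.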

Specific Edits

0 0 04 04 0f 0h 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 10 10 10 10 100 100,000 101 101 1015 1016 103 104 105 105 1054 109 11 111 1113 114 115 115 1157 116 117 117 118 118 119 12 12 12 12 12 120 120 1202 1205 121 121 1212 1219 122 122 1228 123 124 124 124 1240 125 1258 1264 1265 1268 127 127 1270 128 128 128 128 129 1294 1296 13 13 130 131 131 1316 132 132 132 1324 1328 133 134 1341 135 135 136 136 137 137 1377 138 138 1384 1386 139 139 1399 13t3 14 14 14 14 14 140 141 141 142 1425 1434 1447 14540 14540 14540 1456 1459 146 1460 1467 147 147 1471 1472 1477 148 148 1485 1487 149 149 149 1490 1490 1492 1499 14th 15 15 15 15 15 15 15 15 15.1 150 150 150 150,000 1505 1508 151 151 1514 1517 152 1520 153 153 1533 1534 1537 1538 154 1548 1553 1554 1557 156 1561 1563 1565 1568 1569 157 1570 1573 1575 1576 1579 1580 1583 1585 1590 1591 1593 1595 16 16 16 16 1604 1604 1605 1607 1609 161 1610 1611 1612 1614 1614 1618 162 163 164 164 164 165 166 167 167 167 168 168 169 169 169 17 17 17 173 173 174 174 174 175 176 176 177 178 17th 18 180 184 184 185 185 185 187 187 1870 18th 19 19 19 19 1900 1901 1901 1902 1903 191 191 191 1911 1913 1914 1915 1915 1916 1916 1916 1918_ 193 193 195 196 196 198 19m 2 2 2 2 2 2 2 2,000 20 20 20 20 20 20 20 20,000 2000 2001 2003 2004 2009 201 201 201 202 205 207 208 20v 21 212 2120 213 22 220 221 222 222 223 229 229 23 230 231 232 233 233 235 236 239 239 24 24 24 243 244 246 247 248 24s 25 25 25 250 251 253 258 258 26 26 263 264 264 265 266 267 269 269 269 27 273 273 274 275 276 278 278 279 27th 28 28 28 282 283 2830 285 289 29 291 293 294 295 296 297 298 298 299 29th 2g2 2i6 2i8 2ig 2ii 2iy 2s 2y 3 3 3 3 3 3 3 3 3.12 30 30 30 30 30 30 300 300 301 304 305 306 306 307 31 31 311 312 312 312 313 313 314 318 318 32 321 322 322 324 326 329 33 33 331 332 333 334 335 336 337 339 341 342 343 345 346 347 348 348 348 349 34a 35 350 351 352 353 356 357 357 359 360 361 362 363 365 366 366 367 3688 3688 369 37 37 370 373 374 
375 376 377 379 38 38 381 383 383 384 385 386 387 389 39 391 393 398 399 3i 3i6 3oorf 4 4 4 4 4 4 4 4 4 4,7 40 40,000 400 401 402 407 409 40s 41 41 41 41 412 417 419 41s 42 42 421 422 428 429 43 43 43 430 431 433 434 434 435 437 439 443 447 449 45 450 451 453 454 455 456 457 458 459 46 461 463 464 469 47 47 47 473 477 478 479 483 484 486 487 488 489 49 490 492 493 493 494 495 497 499 49i 4e 4i8 4th 4u 4x6 5 5 5 5 5 5 5 5 5 5 5 5 5 50 50 503 50a 510 517 5178 51s 524 527 527 529 53 53 532 533 534 535 536 537 538 539 53i 54 54 540 541 543 544 545 547 549 55 55 555 555 556 56 562 563 564 567 569 573 574 577 57i 58 582 583 586 59 59 591 592 593 5hort 5i6 5s9 5x8 6 6 6 605 61 614 62 62 629 630 630 633 63a 641 645 646 647 649 65 653 66 66o 67 67 670 671 67a 684 688 68i 68o 69 69 69 692 693 6d 6d.รข 6l 6l6 6lk 6oi 6s6 7 7 7 7 7 7 7 7 7 70 70 709 70u 71 71 710 713 717 72 72 720 73 73 74 74 74 74 75 76 76 76 77 78 78 79 8 8 8 8 80 83 843 86 87 88 89 8l 9 9 9 9 9 90 91 912 92 92 93 93 93 94 94 94 95 9503 96 96 97 97 977 98