P-12-3-20-1

Edith Wharton
Collection

Cautionaries are edits made to the original content to improve the usability and clarity of the informatic design.  Edits should focus on identifying the framework of the original content in its entirety, including redundant messages of cultural or legal significance.  The following edits were made to the content to improve the framework:
  1. Words were stemmed.
  2. Stop Words were removed for the framing and analytic stages.
  • The Stop Word List: 'a', 'about', 'above', 'above', 'across', 'after', 'afterwards', 'again', 'against', 'all', 'almost', 'alone', 'along', 'already', 'also','although','always','am','among', 'amongst', 'amoungst', 'amount',  'an', 'and', 'another', 'any','anyhow','anyone','anything','anyway', 'anywhere', 'are', 'around', 'as',  'at', 'back','be','became', 'because','become','becomes', 'becoming', 'been', 'before', 'beforehand', 'behind', 'being', 'below', 'beside', 'besides', 'between', 'beyond', 'bill', 'both', 'bottom','but', 'by', 'call', 'can', 'cannot', 'cant', 'co', 'con', 'could', 'couldnt', 'cry', 'de', 'describe', 'detail', 'do', 'done', 'down', 'due', 'during', 'each', 'eg', 'eight', 'either', 'eleven','else', 'elsewhere', 'empty', 'enough', 'etc', 'even', 'ever', 'every', 'everyone', 'everything', 'everywhere', 'except', 'few', 'fifteen', 'fify', 'fill', 'find', 'fire', 'first', 'five', 'for', 'former', 'formerly', 'forty', 'found', 'four', 'from', 'front', 'full', 'further', 'get', 'give', 'go', 'had', 'has', 'hasnt', 'have', 'he', 'hence', 'her', 'here', 'hereafter', 'hereby', 'herein', 'hereupon', 'hers', 'herself', 'him', 'himself', 'his', 'how', 'however', 'hundred', 'ie', 'if', 'in', 'inc', 'indeed', 'interest', 'into', 'is', 'it', 'its', 'itself', 'keep', 'last', 'latter', 'latterly', 'least', 'less', 'ltd', 'made', 'many', 'may', 'me', 'meanwhile', 'might', 'mill', 'mine', 'more', 'moreover', 'most', 'mostly', 'move', 'much', 'must', 'my', 'myself', 'name', 'namely', 'neither', 'never', 'nevertheless', 'next', 'nine', 'no', 'nobody', 'none', 'noone', 'nor', 'not', 'nothing', 'now', 'nowhere', 'of', 'off', 'often', 'on', 'once', 'one', 'only', 'onto', 'or', 'other', 'others', 'otherwise', 'our', 'ours', 'ourselves', 'out', 'over', 'own','part', 'per', 'perhaps', 'please', 'put', 'rather', 're', 'same', 'see', 'seem', 'seemed', 'seeming', 'seems', 'serious', 'several', 'she', 'should', 'show', 'side', 'since', 'sincere', 'six', 'sixty', 'so', 
'some', 'somehow', 'someone', 'something', 'sometime', 'sometimes', 'somewhere', 'still', 'such', 'system', 'take', 'ten', 'than', 'that', 'the', 'their', 'them', 'themselves', 'then', 'thence', 'there', 'thereafter', 'thereby', 'therefore', 'therein', 'thereupon', 'these', 'they', 'thick', 'thin', 'third', 'this', 'those', 'though', 'three', 'through', 'throughout', 'thru', 'thus', 'to', 'together', 'too', 'top', 'toward', 'towards', 'twelve', 'twenty', 'two', 'un', 'under', 'until', 'up', 'upon', 'us', 'very', 'via', 'was', 'we', 'well', 'were', 'what', 'whatever', 'when', 'whence', 'whenever', 'where', 'whereafter', 'whereas', 'whereby', 'wherein', 'whereupon', 'wherever', 'whether', 'which', 'while', 'whither', 'who', 'whoever', 'whole', 'whom', 'whose', 'why', 'will', 'with', 'within', 'without', 'would', 'yet', 'you', 'your', 'yours', 'yourself', 'yourselves', 'the'.

  • The Reasoning Behind the Selection - These words are high-frequency, general terms that are not unique to the content.  They are removed during the analytic stage of modeling so that the more distinctive terminology stands out.  Other words could be included or excluded, as the method of removal isn't intended to be exact.  However, the removed terms should be non-unique, of high frequency, and fully disclosed to users of the informatic model.  After the analytic stage, these terms are returned to the informatic model for developing its networks, layering, directionality, and detailing.
  • Implications of Selection - The methodology generalizes the unstructured information.  A stop word list may or may not include some unique terms, and may or may not meet a particular standard asserted as ideal.  Regardless of such variations, the methodology returns these words to the corpus for the informatic modelling, so the generalized form of significant associations is consistently accounted for, even if some words of significant association were initially treated as stop words.  That is, there isn't a perfect stop word list, and lists will vary, but the informatic methodology manages these variations for a consistent outcome, so long as most non-unique terminology is removed.
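
The two edits above (stemming and stop word removal) can be illustrated with a minimal Python sketch.  It is not the pipeline used for this collection: the stop word set is only a short excerpt of the disclosed list, `crude_stem` is a hypothetical stand-in for whatever stemmer was actually applied, and the function names are assumptions made for illustration.  The point it shows is that the full token list is kept untouched, so the removed words can be returned for the later modeling stages.

```python
import re

# Short excerpt of the disclosed Stop Word List (illustrative, not the full list).
STOP_WORDS = {"a", "about", "and", "in", "of", "the", "to", "was", "with"}

def tokenize(text):
    """Lowercase the text and split it into letter/digit tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def analytic_view(tokens, stop_words=STOP_WORDS):
    """Tokens for the framing and analytic stages: stop words dropped, the rest stemmed."""
    return [crude_stem(t) for t in tokens if t not in stop_words]

# The full token list is preserved, so stop words remain available
# for the network, layering, and detailing stages.
tokens = tokenize("The house of mirth was crowded with whispering guests")
analytic = analytic_view(tokens)
print(analytic)  # ['house', 'mirth', 'crowd', 'whisper', 'guest']
```

Because `analytic_view` builds a new list instead of modifying `tokens`, swapping in a different stop word list changes only the analytic view and leaves the underlying corpus as it was.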


Specific Cautionaries

The following cautionaries are more specific to the Wharton Collection:
  • There was a large variety of numbers and number-letter combinations that marked news sections.  All numbers and letter-number combinations not constituting words or abbreviations were removed after the analytic modeling stage.  A small number of low-frequency tokens that meshed numbers with words were removed as well.  All of these combinations were removed to improve the usability and clarity of the content being modeled informatically.
  • No words were removed other than those on the Stop Word List.  These words were removed only for the framing and analytic stages and are returned during the network, layering, and detailing stages of modeling.
  • Errors in the content, such as word conversion errors, are not edited and will remain visible to viewers of the model.  The focus is on developing trust through process and procedure, not through avenues easily manipulated, such as finely-threaded performances of perfection and cosmetic appeal.  Exceptions will be listed in the "Specific Edits" section.
  • Split words that are merged back together, if any, will be listed in specific edits.
  • The usability standard is applied moderately.  That is, terms like "ebook", proper nouns such as publisher names, and other terms reflective of the overall publication will likely be included in the modeling process.  The models are designed to account for terms that work in different contexts, such as publication terms, which will be presented alongside the design of the actual written work, with the ideas of the given author intact.
  • This methodology is designed to manage the unstructured informational environment, in which a sound and consistent overall design emerges from categorical arrangements that are inconsistent and imperfect, much like a hairstyle.  Even though the individual terms (the hairs) will change, the overall styling (the informatic model) will remain largely the same, with a consistent arrangement of major nodes.  In this way, the unstructured informational environment differs from the structured informational environment.
  • To improve the readability of models, non-alphanumeric symbols are likely to be removed.
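
The number and symbol handling described above might look something like the following Python sketch.  It rests on a simplifying assumption that is not stated in the source: any token containing a digit is treated as removable, whereas the actual edits exempted combinations that form words or abbreviations.  The function names are hypothetical.  Removed tokens are collected so they can be disclosed, in the spirit of the Specific Edits section.

```python
def contains_digit(token):
    """True if any character in the token is a digit."""
    return any(ch.isdigit() for ch in token)

def strip_symbols(token):
    """Drop characters that are neither letters nor digits."""
    return "".join(ch for ch in token if ch.isalnum())

def clean_tokens(tokens):
    """Split tokens into (kept, removed).

    Assumption: every token containing a digit is removed. The source
    exempts words and abbreviations, which this sketch does not detect.
    """
    kept, removed = [], []
    for token in tokens:
        if contains_digit(token):
            removed.append(token)      # logged for disclosure
            continue
        cleaned = strip_symbols(token)
        if cleaned:                    # skip tokens that were only symbols
            kept.append(cleaned)
    return kept, removed

kept, removed = clean_tokens(["chapter", "1.5", "14th_", "22,600", "ebook", "mirth;"])
print(kept)     # ['chapter', 'ebook', 'mirth']
print(removed)  # ['1.5', '14th_', '22,600']
```

Returning the removed tokens alongside the kept ones keeps the edit auditable, which matches the emphasis on process and disclosure over cosmetic cleanup.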

Specific Edits

0 0 0 0 0 00 01 01 02 03 04 05 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1.1 1.2 1.3 1.4 1.5 1.50 1.6 1.8 1.9 10 10 10 10 100 100 100 100 102 104 106 106 108 108 11 11 11 110 110 110 11052 112 113 113 116 116 116 117 119 12 12 12 12 120 121 122 1248 1263 128 129 13 13 13 130 139 13th 14 14 140 143 144 144 1444 148 1481 1487 14th 14th_ 15 15 15 150 1508 151 151 1514 1516 1537 1541 1546 1548 1550 1564 1578 1580 159 15th 16 16 16 16 160 1604 1604 1604 1614 1634 166 1672 1673 1682 1683 1693 16th 17 17 170 1714 1720 173 1735 175 175 1752 176 1778 179 18 18 1803 181 1818 182 1830 1831 1832 184 186 187 187 187 187 1876 1876 1877 1877 1878 1878 1878 188 1880 1880 189 189 1891 1892 1893 1893 1893 1894 1894 19 19 19 19 1900 1901 1901 1902 1903 1903 1904 1904 1904 1905 1905_ 1907 1908 1908 1908 1908 1909 1909 191 1910 1911 1912 1912 1912 1913 1914 1914 1914 1914 1915 1915 1915 1916 1916 1916 1916 1916 1916 1916 1916 1916 1917 1917 1917 1918 1918 1918â 1919 1919 1919 1919 192 1920 1920 1920 1922 1922 1922 1923 1923 1924 194 196 197 198 19th 1st 2 2 2 2 2 2 2 2 2 2 2.1 2.11 2.13 2.14 2.15 2.2 2.6 2.7 2.8 20 200 2002 2003 2003 2008 2009 2017 202 2020 2020 204 205 207 208 209 20th 20th 20th 21 210 212 212 218 22,600 222 222 224 225 226 226 227 229 22nd 23 23d 23rd 240 240 241 24131 24132 24133 24133 24133 24348 24349 24350 244 245 246 24689 247 249 24th 25 25 250 250 254 256 256 258 258 259 25th 26 263 268 268 269 27 274 276 283 284 28th 29 29 295 3 3 3 3 3 3 3 3 3 3.3 3.4 3.5 3.6 3.7 30 30 306 31 311 32 33 33 33 330 39 39 4 4 4 4 4 4 4 4 4.1 4.10 4.11 4.2 4.5 4.6 4.8 4.9 40 41,985 41855 43 4327 439 45 45 4517 4518 4519 4519 4519 4533 4549 4550 47 48 48 49 5 5 5 5 5 5 5 5 5 5 5 5 50 5009 51 52 52 53 53 53495 54 541 56 56 57 0 0 0 08 09 1 1 1.3 1.4 1.5 1.6 10 100 100 102 103 106 108 11 11 110 110 11052 113 113 113 117 119 12 122 125 1263 13 13 130 139 14 14 140 140 140 143 144 148 14th 15 150 150 151 151 1514 1516 1537 1550 1564 1564 1578 1580 1581 159 16 160 1604 1614 1622 
166 1682 1701 1714 1720 1735 175 1751 176 1765 1774 1778 1789 179 18 181 1818 1830 186 187 187 187 1876 1877 1878 188 1880 1892 1893 1893 1894 19 1900 1903 1905 1908 1908 1909 191 1914 1914 1914 1914 1914 1916 1916 1917 1917 1918 1918â 1919 1919 1920 196 198 2 2 2 2 2 2.13 2.14 2.15 2.6 20 200 2002 2008 2009 2017 2020 207 208 209 20th 21 210 210 211 211 212 212 212 213 218 219 219 22,600 222 223 224 225 225 226 226 226 227 229 23rd 24 241 24131 24132 24133 24348 24349 24350 244 245 249 25 250 250 256 258 258 259 26 269 27 276 276 283 284 28th 29 295 3 3 3 3 3 3.7 306 311 32 32 33 33 330 362 39 4 4 4 4.2 4.5 4.6 41,985 41855 4327 4517 4518 4519 4519 4519 4519 4533 4549 4550 46 47 48 50 53 53 53495 54 541 56 57 57 57786 6 6 6 60 60 60 61290 61321 62 7 70 747 75 7516 76 77 78 8 80 824 83 83 84 87 87 89 9 90 90 90 9190 98 57 57 575 57786 58 58 6 6 6 6 6 6 60 60 61290 61290 61321 61321 66 67 7 7 7 7 7 70 70 72 747 7516 76 77 79 7th 8 8 8 8 8 8 8 8 80 824 83 83 84 86 87 87 88 89 89 8th 9 9 9 9 91 91 92 92 93 93 93 94 94 94 95 95 96 96 97 97 98 98 99