P-7-17-20-1

Herman Melville
Collection
Cautionaries are edits made to the original content to improve the usability and clarity of the informatic design.  Edits should focus on identifying the framework of the original content in its entirety, including redundant messages of cultural or legal significance.  The following edits were made to the content to improve the framework:
  1. Words were stemmed.
  2. Stop words were removed for the framing and analytic stages.
  • The Stop Word List: 'a', 'about', 'above', 'above', 'across', 'after', 'afterwards', 'again', 'against', 'all', 'almost', 'alone', 'along', 'already', 'also','although','always','am','among', 'amongst', 'amoungst', 'amount',  'an', 'and', 'another', 'any','anyhow','anyone','anything','anyway', 'anywhere', 'are', 'around', 'as',  'at', 'back','be','became', 'because','become','becomes', 'becoming', 'been', 'before', 'beforehand', 'behind', 'being', 'below', 'beside', 'besides', 'between', 'beyond', 'bill', 'both', 'bottom','but', 'by', 'call', 'can', 'cannot', 'cant', 'co', 'con', 'could', 'couldnt', 'cry', 'de', 'describe', 'detail', 'do', 'done', 'down', 'due', 'during', 'each', 'eg', 'eight', 'either', 'eleven','else', 'elsewhere', 'empty', 'enough', 'etc', 'even', 'ever', 'every', 'everyone', 'everything', 'everywhere', 'except', 'few', 'fifteen', 'fify', 'fill', 'find', 'fire', 'first', 'five', 'for', 'former', 'formerly', 'forty', 'found', 'four', 'from', 'front', 'full', 'further', 'get', 'give', 'go', 'had', 'has', 'hasnt', 'have', 'he', 'hence', 'her', 'here', 'hereafter', 'hereby', 'herein', 'hereupon', 'hers', 'herself', 'him', 'himself', 'his', 'how', 'however', 'hundred', 'ie', 'if', 'in', 'inc', 'indeed', 'interest', 'into', 'is', 'it', 'its', 'itself', 'keep', 'last', 'latter', 'latterly', 'least', 'less', 'ltd', 'made', 'many', 'may', 'me', 'meanwhile', 'might', 'mill', 'mine', 'more', 'moreover', 'most', 'mostly', 'move', 'much', 'must', 'my', 'myself', 'name', 'namely', 'neither', 'never', 'nevertheless', 'next', 'nine', 'no', 'nobody', 'none', 'noone', 'nor', 'not', 'nothing', 'now', 'nowhere', 'of', 'off', 'often', 'on', 'once', 'one', 'only', 'onto', 'or', 'other', 'others', 'otherwise', 'our', 'ours', 'ourselves', 'out', 'over', 'own','part', 'per', 'perhaps', 'please', 'put', 'rather', 're', 'same', 'see', 'seem', 'seemed', 'seeming', 'seems', 'serious', 'several', 'she', 'should', 'show', 'side', 'since', 'sincere', 'six', 'sixty', 'so', 
'some', 'somehow', 'someone', 'something', 'sometime', 'sometimes', 'somewhere', 'still', 'such', 'system', 'take', 'ten', 'than', 'that', 'the', 'their', 'them', 'themselves', 'then', 'thence', 'there', 'thereafter', 'thereby', 'therefore', 'therein', 'thereupon', 'these', 'they', 'thick', 'thin', 'third', 'this', 'those', 'though', 'three', 'through', 'throughout', 'thru', 'thus', 'to', 'together', 'too', 'top', 'toward', 'towards', 'twelve', 'twenty', 'two', 'un', 'under', 'until', 'up', 'upon', 'us', 'very', 'via', 'was', 'we', 'well', 'were', 'what', 'whatever', 'when', 'whence', 'whenever', 'where', 'whereafter', 'whereas', 'whereby', 'wherein', 'whereupon', 'wherever', 'whether', 'which', 'while', 'whither', 'who', 'whoever', 'whole', 'whom', 'whose', 'why', 'will', 'with', 'within', 'without', 'would', 'yet', 'you', 'your', 'yours', 'yourself', 'yourselves', 'the'.

  • The Reasoning Behind the Selection - These words are high-frequency, general, non-unique terms.  They are removed during the analytic stage of modeling so that the more distinctive terminology of the content stands out.  Other words could be included or excluded, as the method of removal isn't intended to be exact.  However, the terms should be non-unique, of high frequency, and fully disclosed to users of the informatic model.  After the analytic stage, these terms are returned to the informatic model for developing the networks, layering, directionality, and detailing of the model.
  • Implications of Selection - The methodology generalizes the unstructured information.  A stop word list may vary in nuanced ways: it may or may not include some unique terms, and it may or may not meet a particular standard asserted as ideal.  Regardless, the methodology returns these words to the corpus for the informatic modeling, and the generalized form of significant associations is consistently accounted for, even if some words of significant association were initially treated as stop words.  That is, there isn't a perfect stop word list, and lists will vary, but the informatic methodology manages these variations for a consistent outcome, so long as most non-unique terminology is removed.
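
The remove-then-return handling of stop words described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop word set is a small subset of the disclosed list, the sample sentence is invented, and `naive_stem` is a hypothetical stand-in, since the stemmer actually used is not named here.

```python
import re

# Small illustrative subset of the disclosed stop word list (assumption:
# the full list would be loaded in practice).
STOP_WORDS = {"a", "the", "of", "and", "to", "in", "was", "his", "it", "that"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token):
    """Hypothetical stand-in stemmer: strips a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analytic_tokens(tokens):
    """Framing and analytic stages: stop words dropped, remaining words stemmed."""
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def modeling_tokens(tokens):
    """Network, layering, and detailing stages: every word is returned."""
    return list(tokens)


# Invented sample sentence, used only to exercise the functions.
text = "The whale was the terror of the seas and the whalers feared it."
tokens = tokenize(text)
analytic = analytic_tokens(tokens)
full = modeling_tokens(tokens)
```

The original token list is never modified, so the later stages see every word, including any that a particular stop word list happened to filter out of the analytic view.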


Specific Cautionaries

The following cautionaries are more specific to the Melville Collection:
  • There was a large variety of numbers and number-letter combinations that marked news sections.  All numbers and letter-number combinations not constituting words or abbreviations were removed after the analytic modeling stage.  A small number of low-frequency tokens in which numbers were merged with words were removed as well.  All such combinations were removed to improve the usability and clarity of the content being modeled informatically.
  • No words were removed other than those on the Stop Word List.  These words were removed only for the framing and analytic stages and are returned during the network, layering, and detailing stages of modeling.
  • Errors in the content, such as conversion errors in words, are not edited and will remain visible to viewers of the model.  The focus is on developing trust through process and procedure, not through avenues that are easily manipulated, such as finely threaded performances of perfection and cosmetic appeal.  Exceptions will be listed in the "Specific Edits" section.
  • Split words that are merged back together, if any, will be listed in the "Specific Edits" section.
  • The usability standard is applied moderately.  That is, terms like "ebook", proper nouns such as publisher names, and any other terms reflective of the overall publication will likely be included in the modeling process.  The models are designed to account for terms that work in different contexts, such as publication terms, which will be presented alongside the design of the actual written work, with the ideas of the given author intact.
  • This methodology is designed to manage the unstructured informational environment, in which a sound and consistent overall design emerges from categorical arrangements that are inconsistent and imperfect, much like a hairstyle.  Even though the terms (the individual hairs) will change, the overall styling (the informatic model) will remain largely the same, with a consistent arrangement of major nodes.  In this way, the unstructured informational environment differs from the structured informational environment.
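
The removal of numbers and number-letter combinations noted in the first cautionary can be approximated as follows. This is a minimal sketch, not the procedure actually used: it assumes that any token containing a digit is a candidate for removal, whereas the cautionary also exempts combinations that constitute words or abbreviations, a judgment this rule does not capture. The function name and sample tokens are hypothetical.

```python
import re

# Assumption: a token is treated as numeric if it contains at least one digit.
HAS_DIGIT = re.compile(r"\d")


def split_numeric_tokens(tokens):
    """Separate tokens into those kept and those removed for containing digits."""
    kept, removed = [], []
    for token in tokens:
        if HAS_DIGIT.search(token):
            removed.append(token)
        else:
            kept.append(token)
    return kept, removed


# Hypothetical sample tokens, in the style of those logged under Specific Edits.
sample = ["chapter", "1851", "whale", "1_st_", "10,000,000", "ebook", "20th", "4.1"]
kept, removed = split_numeric_tokens(sample)
```

Returning the removed tokens alongside the kept ones allows every removal to be disclosed in full, in keeping with the emphasis on process and procedure.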

Specific Edits

0 0 0 0 0 00 01 02 03 04 05 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1,1819 1_st_ 10 10 10,000,000 100 100 101 102 103 104 105 106 107 10712 10712 108 109 10â 11 11 11 110 111 112 11231 113 114 115 116 117 118 118,952 119 11th 12 12 12 120 121 122 123 12384 124 125 125,000 127 128 12841 129 13 13 13 13 130 131 132 133 134 135 13720 139 14 14 14 14 147 15 15 15 15 150 15422 15859 15859 16 16 16 16 1613 1652 1668 167 1671 168 1681_ 1683 1684 17 17 17 17 1729 173 1772 1775 1775 1776 1778 1793 1798 1798 18 18 18 18 18 1803 1803 1808 1813 1820 1821 1825 1828 1828 1833 1836 1839 1839 1839 184 1840 1841 1846 1847 1849 1849 1849 1850 1850 1851 1851 1851 1852 1852 1856 1857 1857 1860 1860 1861 1861 1861 1861 1861 1862 1862 1862 1862 1863 1863 1863 1863 1864 1864 1865 1865 1865 1865 1865_ 1866 1876 1876 1886 1891 1892 1892 19 19 19 1900 1900 194 1999 1st 1st 2 2 2 2 2 2 2 2 2 20 20 20 20,000 20,000,000 200,000 2003 2004 2005 2007 2008 2008 2010 2011 2015 2017 20â 20th 20th 21 21 21 21 211 2151 21816 21816 22 22 22 22 224 23 23 23 23 233 24 24 24 246 247 25 25 257 26 266 268 2701 2701 2701 271 277 28 28 28 29 29 294 2nd 3 3 3 3 3 3 3 3_d_ 30 30 301 31 31 31_st_ 312 317 32 32 329 32â 33 33 331 333 34 34 34970 34970 35 350 351 36 36 360 37 37 374 38 38 384 38â 39 39 4 4 4 4 4 4 4 4 4,000,000 4.1 40 40 40 402 4045 41 41 418 418 42 42 43 43 439 44 44 45 450 46 46 47 47 478 48 48 49 49 4th 5 5 5 5 5 5 5 5 50 50 500 500,000 500,000 5000 51 52 53 54 54 55 55 56 56 57 58 59 59 6 6 6 60 60 61 61 62 62 63 63 64 64 64 65 0 0 0 1 1 1 1 1 1 1 10,000,000 10,440 100 100 103 106 10712 10712 108 109 10â 11 11,000,000 110 11231 116 12 12 12 12 121 121 122 12384 125,000 12841 13 13720 139 14 14 14 140 147 15 15422 15859 15th 167 1671 1681_ 1684 1688 173 1768 1776 1776 1777 1782 1793 1798 1799 18 180 1807 1812 1813 1813 1825 1840 1841 1849 1849 1850 1851 1852 1856 1862 1864 1864 1865 1865_ 1866 1872 1876 1886 1892 19 1900 1900 194 1st 1st 1st 1st 1st 2 2 2 20 20 200,000 2000 2004 2005 2007 2008 
2010 2011 2015 2017 21st 22 224 23 233 24 24 247 24th 25 25,000 257 26 268 2701 275th 277 28 28 29 294 2nd 3 3 3 3 3 3 30 300 300,000 31 31 312 32 32 321 32â 33 331 333 34970 35 350 360 37 374 38 384 4,000 4.1 40 40 40 402 4045 41 418 42 43 439 45 450 46 47 475 49 4th 5 5 5 5,400 50 50 500 500 500,000 500,000 5000 55 550 56 56 57 58 58 59 5th 6 6 60 61 62 63 64 64 65 67 68 7 7 7 73 74 75 78 78 8 8 8 8 83 84 87 88 89 89 9 9 90 90 90 92 93 96 97 98 98 99 65 66 66 67 67 68 68 69 69 7 7 7 7 7 70 70 71 72 72 73 73 74 74 74 75 75 750 76 76 77 77 78 78 79 79 8 8 8 8 8 8 8 8 80 80 81 82 83 83 84 85 86 86 87 87 88 88 89 89 890 9 9 9 9 9 90 90 91 91 92 92 92 93 93 93 94 94 94 95 95 95 96 96 965,000 97 97 97 98 98 98 98 99 99 99 99 99