|
1 | 1 | ## Dataset Summary
|
2 | 2 |
|
3 |
| -The USNM Bumblebee Dataset is a natural history dataset containing, for each of 62,602 Bumblebee specimens in the genus Bombus, a single image in lateral view and a tab-separated value file with occurrence data. Occurrence data includes the species classification, the date/time and site/location of collection, and other metadata conforming to the Darwin Core data standard (https://dwc.tdwg.org). 10,764 specimens are not identified to species and these specimens are included as ‘Bombus sp.’ The collecting sites/locations of the majority of specimens (41,008), have been georeferenced. The dataset is worldwide in scope, but is limited to the specimens available in the Smithsonian USNM collection. |
| 3 | +The USNM Bumblebee Dataset is a natural history dataset containing, for each of 73,497 bumblebee specimens in the family Apidae, a single image in lateral or dorsal view and a tab-separated values file with occurrence data. Occurrence data include the species classification, the date and site/location of collection, and other metadata conforming to the Darwin Core data standard (https://dwc.tdwg.org). 11,421 specimens are not identified to species; these are included as 'Bombus sp.' or 'Xylocopa sp.' The collecting sites/locations of the majority of specimens (55,301) have been georeferenced. The dataset is worldwide in scope, but is limited to the specimens available in the Smithsonian USNM collection. |
4 | 4 |
|
5 | 5 | ## Languages
|
6 | 6 |
|
7 | 7 | English
|
8 | 8 |
|
9 | 9 | ## Data Instances
|
| 10 | + |
| 11 | +A typical data point comprises the specimen metadata and image information for a single bumblebee specimen. |
| 12 | + |
| 13 | +An example from the dataset looks as follows: |
| 14 | + |
10 | 15 | ```json
|
11 |
| -[ |
12 |
| - { |
13 |
| - "gbifID": "2900487313", |
14 |
| - "abstract": "", |
15 |
| - "accessRights": "", |
16 |
| - "accrualMethod": "", |
17 |
| - "accrualPeriodicity": "", |
18 |
| - "accrualPolicy": "", |
19 |
| - "alternative": "", |
20 |
| - "audience": "", |
21 |
| - "available": "", |
22 |
| - "bibliographicCitation": "", |
23 |
| - "conformsTo": "", |
24 |
| - "contributor": "", |
25 |
| - "coverage": "", |
26 |
| - "created": "", |
27 |
| - "creator": "", |
28 |
| - "date": "", |
29 |
| - "dateAccepted": "", |
30 |
| - "dateCopyrighted": "", |
31 |
| - "dateSubmitted": "", |
32 |
| - "description": "", |
33 |
| - "educationLevel": "", |
34 |
| - "extent": "", |
35 |
| - "format": "", |
36 |
| - "hasFormat": "", |
37 |
| - "hasPart": "", |
38 |
| - "hasVersion": "", |
39 |
| - "identifier": "http://n2t.net/ark:/65665/3fd7061b8-48b0-4041-a9ea-a5b9aac0e767", |
40 |
| - "instructionalMethod": "", |
41 |
| - "isFormatOf": "", |
42 |
| - "isPartOf": "", |
43 |
| - "isReferencedBy": "", |
44 |
| - "isReplacedBy": "", |
45 |
| - "isRequiredBy": "", |
46 |
| - "isVersionOf": "", |
47 |
| - "issued": "", |
48 |
| - "language": "", |
49 |
| - "license": "CC0_1_0", |
50 |
| - "mediator": "", |
51 |
| - "medium": "", |
52 |
| - "modified": "2020-09-24T19:56:00Z", |
53 |
| - "provenance": "", |
54 |
| - "publisher": "", |
55 |
| - "references": "", |
56 |
| - "relation": "", |
57 |
| - "replaces": "", |
58 |
| - "requires": "", |
59 |
| - "rights": "", |
60 |
| - "rightsHolder": "", |
61 |
| - "source": "", |
62 |
| - "spatial": "", |
63 |
| - "subject": "", |
64 |
| - "tableOfContents": "", |
65 |
| - "temporal": "", |
66 |
| - "title": "", |
67 |
| - "type": "PhysicalObject", |
68 |
| - "valid": "", |
69 |
| - "institutionID": "urn:lsid:biocol.org:col:34871", |
70 |
| - "collectionID": "urn:uuid:18e3cd08-a962-4f0a-b72c-9a0b3600c5ad", |
71 |
| - "datasetID": "", |
72 |
| - "institutionCode": "USNM", |
73 |
| - "collectionCode": "ENT", |
74 |
| - "datasetName": "NMNH Extant Biology", |
75 |
| - "ownerInstitutionCode": "", |
76 |
| - "basisOfRecord": "PRESERVED_SPECIMEN", |
77 |
| - "informationWithheld": "", |
78 |
| - "dataGeneralizations": "", |
79 |
| - "dynamicProperties": "", |
80 |
| - "occurrenceID": "http://n2t.net/ark:/65665/3fd7061b8-48b0-4041-a9ea-a5b9aac0e767", |
81 |
| - "catalogNumber": "USNMENT741814", |
82 |
| - "recordNumber": "", |
83 |
| - "recordedBy": "", |
84 |
| - "recordedByID": "", |
85 |
| - "individualCount": "1", |
86 |
| - "organismQuantity": "", |
87 |
| - "organismQuantityType": "", |
88 |
| - "sex": "", |
89 |
| - "lifeStage": "Adult", |
90 |
| - "reproductiveCondition": "", |
91 |
| - "behavior": "", |
92 |
| - "establishmentMeans": "", |
93 |
| - "degreeOfEstablishment": "", |
94 |
| - "pathway": "", |
95 |
| - "georeferenceVerificationStatus": "", |
96 |
| - "occurrenceStatus": "PRESENT", |
97 |
| - "preparations": "Pinned", |
98 |
| - "disposition": "", |
99 |
| - "associatedOccurrences": "", |
100 |
| - "associatedReferences": "", |
101 |
| - "associatedSequences": "", |
102 |
| - "associatedTaxa": "", |
103 |
| - "otherCatalogNumbers": "", |
104 |
| - "occurrenceRemarks": "EMu record was created as part of the Smithsonian Institution Digitization Program Office (SI DPO) mass digitization project.", |
105 |
| - "organismID": "", |
106 |
| - "organismName": "", |
107 |
| - "organismScope": "", |
108 |
| - "associatedOrganisms": "", |
109 |
| - "previousIdentifications": "", |
110 |
| - "organismRemarks": "", |
111 |
| - "materialSampleID": "", |
112 |
| - "eventID": "", |
113 |
| - "parentEventID": "", |
114 |
| - "fieldNumber": "", |
115 |
| - "eventDate": "", |
116 |
| - "eventTime": "", |
117 |
| - "startDayOfYear": "", |
118 |
| - "endDayOfYear": "", |
119 |
| - "year": "", |
120 |
| - "month": "", |
121 |
| - "day": "", |
122 |
| - "verbatimEventDate": "", |
123 |
| - "habitat": "", |
124 |
| - "samplingProtocol": "", |
125 |
| - "sampleSizeValue": "", |
126 |
| - "sampleSizeUnit": "", |
127 |
| - "samplingEffort": "", |
128 |
| - "fieldNotes": "", |
129 |
| - "eventRemarks": "", |
130 |
| - "locationID": "", |
131 |
| - "higherGeographyID": "", |
132 |
| - "higherGeography": "", |
133 |
| - "continent": "", |
134 |
| - "waterBody": "", |
135 |
| - "islandGroup": "", |
136 |
| - "island": "", |
137 |
| - "countryCode": "", |
138 |
| - "stateProvince": "", |
139 |
| - "county": "", |
140 |
| - "municipality": "", |
141 |
| - "locality": "", |
142 |
| - "verbatimLocality": "", |
143 |
| - "verbatimElevation": "", |
144 |
| - "verticalDatum": "", |
145 |
| - "verbatimDepth": "", |
146 |
| - "minimumDistanceAboveSurfaceInMeters": "", |
147 |
| - "maximumDistanceAboveSurfaceInMeters": "", |
148 |
| - "locationAccordingTo": "", |
149 |
| - "locationRemarks": "", |
150 |
| - "decimalLatitude": "", |
151 |
| - "decimalLongitude": "", |
152 |
| - "coordinateUncertaintyInMeters": "", |
153 |
| - "coordinatePrecision": "", |
154 |
| - "pointRadiusSpatialFit": "", |
155 |
| - "verbatimCoordinateSystem": "", |
156 |
| - "verbatimSRS": "", |
157 |
| - "footprintWKT": "", |
158 |
| - "footprintSRS": "", |
159 |
| - "footprintSpatialFit": "", |
160 |
| - "georeferencedBy": "", |
161 |
| - "georeferencedDate": "", |
162 |
| - "georeferenceProtocol": "", |
163 |
| - "georeferenceSources": "", |
164 |
| - "georeferenceRemarks": "", |
165 |
| - "geologicalContextID": "", |
166 |
| - "earliestEonOrLowestEonothem": "", |
167 |
| - "latestEonOrHighestEonothem": "", |
168 |
| - "earliestEraOrLowestErathem": "", |
169 |
| - "latestEraOrHighestErathem": "", |
170 |
| - "earliestPeriodOrLowestSystem": "", |
171 |
| - "latestPeriodOrHighestSystem": "", |
172 |
| - "earliestEpochOrLowestSeries": "", |
173 |
| - "latestEpochOrHighestSeries": "", |
174 |
| - "earliestAgeOrLowestStage": "", |
175 |
| - "latestAgeOrHighestStage": "", |
176 |
| - "lowestBiostratigraphicZone": "", |
177 |
| - "highestBiostratigraphicZone": "", |
178 |
| - "lithostratigraphicTerms": "", |
179 |
| - "group": "", |
180 |
| - "formation": "", |
181 |
| - "member": "", |
182 |
| - "bed": "", |
183 |
| - "identificationID": "", |
184 |
| - "verbatimIdentification": "", |
185 |
| - "identificationQualifier": "", |
186 |
| - "typeStatus": "", |
187 |
| - "identifiedBy": "", |
188 |
| - "identifiedByID": "", |
189 |
| - "dateIdentified": "", |
190 |
| - "identificationReferences": "", |
191 |
| - "identificationVerificationStatus": "", |
192 |
| - "identificationRemarks": "", |
193 |
| - "taxonID": "", |
194 |
| - "scientificNameID": "", |
195 |
| - "acceptedNameUsageID": "", |
196 |
| - "parentNameUsageID": "", |
197 |
| - "originalNameUsageID": "", |
198 |
| - "nameAccordingToID": "", |
199 |
| - "namePublishedInID": "", |
200 |
| - "taxonConceptID": "", |
201 |
| - "scientificName": "Bombus vosnesenskii Radoszkowski, 1862", |
202 |
| - "acceptedNameUsage": "", |
203 |
| - "parentNameUsage": "", |
204 |
| - "originalNameUsage": "", |
205 |
| - "nameAccordingTo": "", |
206 |
| - "namePublishedIn": "", |
207 |
| - "namePublishedInYear": "", |
208 |
| - "higherClassification": "Animalia, Arthropoda, Insecta, Hymenoptera, Apidae, Apinae", |
209 |
| - "kingdom": "Animalia", |
210 |
| - "phylum": "Arthropoda", |
211 |
| - "class": "Insecta", |
212 |
| - "order": "Hymenoptera", |
213 |
| - "family": "Apidae", |
214 |
| - "subfamily": "", |
215 |
| - "genus": "Bombus", |
216 |
| - "genericName": "Bombus", |
217 |
| - "subgenus": "", |
218 |
| - "infragenericEpithet": "", |
219 |
| - "specificEpithet": "vosnesenskii", |
220 |
| - "infraspecificEpithet": "", |
221 |
| - "cultivarEpithet": "", |
222 |
| - "taxonRank": "SPECIES", |
223 |
| - "verbatimTaxonRank": "", |
224 |
| - "vernacularName": "", |
225 |
| - "nomenclaturalCode": "", |
226 |
| - "taxonomicStatus": "ACCEPTED", |
227 |
| - "nomenclaturalStatus": "", |
228 |
| - "taxonRemarks": "", |
229 |
| - "datasetKey": "821cc27a-e3bb-4bc5-ac34-89ada245069d", |
230 |
| - "publishingCountry": "US", |
231 |
| - "lastInterpreted": "2022-07-15T15:28:18.011Z", |
232 |
| - "elevation": "", |
233 |
| - "elevationAccuracy": "", |
234 |
| - "depth": "", |
235 |
| - "depthAccuracy": "", |
236 |
| - "distanceAboveSurface": "", |
237 |
| - "distanceAboveSurfaceAccuracy": "", |
238 |
| - "issue": "", |
239 |
| - "mediaType": "StillImage", |
240 |
| - "hasCoordinate": "false", |
241 |
| - "hasGeospatialIssues": "false", |
242 |
| - "taxonKey": "1340436", |
243 |
| - "acceptedTaxonKey": "1340436", |
244 |
| - "kingdomKey": "1", |
245 |
| - "phylumKey": "54", |
246 |
| - "classKey": "216", |
247 |
| - "orderKey": "1457", |
248 |
| - "familyKey": "4334", |
249 |
| - "genusKey": "1340278", |
250 |
| - "subgenusKey": "", |
251 |
| - "speciesKey": "1340436", |
252 |
| - "species": "Bombus vosnesenskii", |
253 |
| - "acceptedScientificName": "Bombus vosnesenskii Radoszkowski, 1862", |
254 |
| - "verbatimScientificName": "Bombus (Pyrobombus) vosnesenskii", |
255 |
| - "typifiedName": "", |
256 |
| - "protocol": "DWC_ARCHIVE", |
257 |
| - "lastParsed": "2022-07-15T15:28:18.011Z", |
258 |
| - "lastCrawled": "2022-07-15T14:04:11.391Z", |
259 |
| - "repatriated": "", |
260 |
| - "relativeOrganismQuantity": "", |
261 |
| - "level0Gid": "", |
262 |
| - "level0Name": "", |
263 |
| - "level1Gid": "", |
264 |
| - "level1Name": "", |
265 |
| - "level2Gid": "", |
266 |
| - "level2Name": "", |
267 |
| - "level3Gid": "", |
268 |
| - "level3Name": "", |
269 |
| - "iucnRedListCategory": "LC" |
270 |
| - } |
271 |
| -] |
| 16 | +{ |
| 17 | + "occurrenceID": "http://n2t.net/ark:/65665/30042e2d8-669d-4520-b456-e3c64203eff8", |
| 18 | + "catalogNumber": "USNMENT01732649", |
| 19 | + "recordedBy": "R. Craig", |
| 20 | + "year": "1949", |
| 21 | + "month": "4", |
| 22 | + "day": "13", |
| 23 | + "country": "United States", |
| 24 | + "stateProvince": "California", |
| 25 | + "county": "Fresno", |
| 26 | + "locality": "Auberry", |
| 27 | + "decimalLatitude": "37.0808", |
| 28 | + "decimalLongitude": "-119.485", |
| 29 | + "identifiedBy": "O'Brien, L. R.", |
| 30 | + "scientificName": "Xylocopa (Notoxylocopa) tabaniformis orpifex", |
| 31 | + "genus": "Xylocopa", |
| 32 | + "subgenus": "Notoxylocopa", |
| 33 | + "specificEpithet": "tabaniformis", |
| 34 | + "infraspecificEpithet": "orpifex", |
| 35 | + "scientificNameAuthorship": "Smith", |
| 36 | + "accessURI": "https://ids.si.edu/ids/deliveryService?id=NMNH-USNMENT01732649", |
| 37 | + "PixelXDimension": 2000, |
| 38 | + "PixelYDimension": 1212 |
| 39 | +} |
272 | 40 | ```
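
As a rough illustration of how a record like the one above might be consumed, the following Python sketch converts the string-valued date and coordinate fields into typed values and assembles a taxon name. It assumes that metadata values arrive as strings and that missing values are empty strings; neither is confirmed by the dataset card. The helper names `collection_date`, `coordinates`, and `taxon_name` are hypothetical and not part of any dataset tooling.

```python
from datetime import date

# A subset of the fields from the example record above.
record = {
    "catalogNumber": "USNMENT01732649",
    "year": "1949",
    "month": "4",
    "day": "13",
    "decimalLatitude": "37.0808",
    "decimalLongitude": "-119.485",
    "genus": "Xylocopa",
    "specificEpithet": "tabaniformis",
    "infraspecificEpithet": "orpifex",
}


def collection_date(rec):
    """Return a date when year, month, and day are all usable, else None."""
    try:
        return date(int(rec["year"]), int(rec["month"]), int(rec["day"]))
    except (KeyError, ValueError):
        # Missing key, empty string, or an impossible calendar date.
        return None


def coordinates(rec):
    """Return (latitude, longitude) as floats, or None if not georeferenced."""
    try:
        return float(rec["decimalLatitude"]), float(rec["decimalLongitude"])
    except (KeyError, ValueError):
        return None


def taxon_name(rec):
    """Join genus and epithets, skipping parts that are absent or empty."""
    keys = ("genus", "specificEpithet", "infraspecificEpithet")
    return " ".join(rec[k] for k in keys if rec.get(k))


print(collection_date(record), coordinates(record), taxon_name(record))
```

Returning `None` instead of raising keeps the helpers usable on records that lack a collection date or coordinates, which the summary indicates is the case for a sizeable share of specimens.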
|
273 | 41 |
|
274 | 42 | ## Data Fields
|
275 | 43 |
|
276 |
| -Fields conform to the Darwin Core data standard and are detailed here: https://dwc.tdwg.org. |
| 44 | +Specimen metadata fields conform to the Darwin Core data standard and are detailed here: https://dwc.tdwg.org. Image metadata fields conform to the Audiovisual Core data standard and are detailed here: https://ac.tdwg.org/. |
277 | 45 |
|
278 | 46 | ## Curation Rationale
|
279 | 47 |
|
280 | 48 | The dataset represents a portion of the U. S. National Entomological Collection. The U.S. National Entomological Collection (USNM) traces its origins in part to the acquisition of the U.S. Department of Agriculture Collection of 138,000 specimens donated in 1885. These specimens became the foundation of one of the world’s largest and most important accessible entomological collections, with over 33 million specimens taken care of by the combined staff of three government agencies: the Smithsonian Institution; the Systematic Entomology Laboratory (Agricultural Research Service, United States Department of Agriculture); and the Walter Reed Biosystematics Unit (Walter Reed Army Institute of Research). The specimens were imaged in a mass-digitization project in collaboration with the Digitization Program Office. The goal was to digitize every Bombus specimen in the collection.
|
281 | 49 |
|
282 | 50 | ## Initial Data Collection and Normalization
|
283 | 51 |
|
284 |
| -Bumblebee specimens were collected over a period of 150 years (earliest specimen dates from 1861, most recent specimen dates from 2011). The specimens were collected by and identified by many different individual researchers over this time. The initial images of 48,000 specimens were taken in a rapid capture project by a dedicated team in 2014 with additional specimen images (14,000) taken in 2018. The labels containing the information on site/location, date of collection, collector, and identifier were removed from the insect pin. The occurrence data were transcribed from the labels by online volunteers and a professional transcription service into Darwin Core fields. Following quality control of the transcribed data by NMNH staff, they were imported into the institutional database (EMu). |
| 52 | +Bumblebee specimens were collected over a period of more than 200 years (earliest specimen dates from 1807, most recent specimen dates from 2020). The specimens were collected and identified by many different individual researchers over this time. The initial images of about 49,000 specimens were taken in a rapid capture project by a dedicated team in 2014, with additional specimen images (about 25,000) taken in 2018. The labels containing the information on site/location, date of collection, collector, and identifier were removed from the insect pin. The occurrence data were transcribed from the labels by online volunteers and a professional transcription service into Darwin Core fields. Following quality control of the transcribed data by NMNH staff, they were imported into the institutional database (EMu). |
285 | 53 |
|
286 | 54 | NMNH specimen data get exported to the Global Biodiversity Information Facility (GBIF) on a weekly basis through an installation of an Integrated Publishing Toolkit (IPT, https://collections.nmnh.si.edu/ipt/). Some data transformation takes place within EMu and GBIF likewise normalizes the data to meet their standards.
|
287 | 55 |
|
@@ -319,7 +87,7 @@ Some site/location names could cause harm as they are insensitive or racist towa
|
319 | 87 |
|
320 | 88 | Estimates of species geographic ranges based on these data may not be complete. There are many reasons collectors may collect more frequently from some areas rather than others, including their own taxonomic interests, proximity to collections institutions, accessibility via roads, ability to acquire permits for a specific area, or for geopolitical reasons.
|
321 | 89 |
|
322 |
| -The majority of specimens in this dataset originate from North America and no specimens from Australia and Africa are available. |
| 90 | +The majority of specimens in this dataset originate from North America. |
323 | 91 |
|
324 | 92 | Most specimens are expected to be female, because bumblebees are social insects and it is more common to find female bees.
|
325 | 93 |
|
@@ -351,11 +119,9 @@ Public domain, Creative Commons CC0.
|
351 | 119 |
|
352 | 120 | ## Citation Information
|
353 | 121 |
|
354 |
| -GBIF.org (26 October 2022) GBIF Occurrence Download https://doi.org/10.15468/dl.48yf72 |
| 122 | +Orrell T, Informatics Office (2023). NMNH Extant Specimen Records (USNM, US). Version 1.72. National Museum of Natural History, Smithsonian Institution. Occurrence dataset. https://collections.nmnh.si.edu/ipt/resource?r=nmnh_extant_dwc-a&v=1.72 |
355 | 123 |
|
356 | 124 |
|
357 | 125 | ## Contributions
|
358 | 126 |
|
359 | 127 | Thanks to NMNH for adding this dataset.
|
360 |
| - |
361 |
| - |