Commit 3a89b59

Merge pull request #1 from MikeTrizna/main
Updated bumblebees dataset card
2 parents 5c0801d + ada5549 commit 3a89b59

File tree

1 file changed

+34
-268
lines changed

NMNH-Bumblebees.md

@@ -1,287 +1,55 @@
11
## Dataset Summary
22

3-
The USNM Bumblebee Dataset is a natural history dataset containing, for each of 62,602 Bumblebee specimens in the genus Bombus, a single image in lateral view and a tab-separated value file with occurrence data. Occurrence data includes the species classification, the date/time and site/location of collection, and other metadata conforming to the Darwin Core data standard (https://dwc.tdwg.org). 10,764 specimens are not identified to species and these specimens are included as Bombus sp.The collecting sites/locations of the majority of specimens (41,008), have been georeferenced. The dataset is worldwide in scope, but is limited to the specimens available in the Smithsonian USNM collection.
3+
The USNM Bumblebee Dataset is a natural history dataset containing, for each of 73,497 bumblebee specimens in the family Apidae, a single image in lateral or dorsal view and a tab-separated values file with occurrence data. Occurrence data includes the species classification, the date and site/location of collection, and other metadata conforming to the Darwin Core data standard (https://dwc.tdwg.org). 11,421 specimens are not identified to species; these specimens are included as 'Bombus sp.' or 'Xylocopa sp.' The collecting sites/locations of the majority of specimens (55,301) have been georeferenced. The dataset is worldwide in scope, but is limited to the specimens available in the Smithsonian USNM collection.
44

55
## Languages
66

77
English
88

99
## Data Instances
10+
11+
A typical data point comprises the specimen metadata and image information for a single bumblebee specimen.
12+
13+
An example from the dataset looks as follows:
14+
1015
```json
11-
[
12-
{
13-
"gbifID": "2900487313",
14-
"abstract": "",
15-
"accessRights": "",
16-
"accrualMethod": "",
17-
"accrualPeriodicity": "",
18-
"accrualPolicy": "",
19-
"alternative": "",
20-
"audience": "",
21-
"available": "",
22-
"bibliographicCitation": "",
23-
"conformsTo": "",
24-
"contributor": "",
25-
"coverage": "",
26-
"created": "",
27-
"creator": "",
28-
"date": "",
29-
"dateAccepted": "",
30-
"dateCopyrighted": "",
31-
"dateSubmitted": "",
32-
"description": "",
33-
"educationLevel": "",
34-
"extent": "",
35-
"format": "",
36-
"hasFormat": "",
37-
"hasPart": "",
38-
"hasVersion": "",
39-
"identifier": "http://n2t.net/ark:/65665/3fd7061b8-48b0-4041-a9ea-a5b9aac0e767",
40-
"instructionalMethod": "",
41-
"isFormatOf": "",
42-
"isPartOf": "",
43-
"isReferencedBy": "",
44-
"isReplacedBy": "",
45-
"isRequiredBy": "",
46-
"isVersionOf": "",
47-
"issued": "",
48-
"language": "",
49-
"license": "CC0_1_0",
50-
"mediator": "",
51-
"medium": "",
52-
"modified": "2020-09-24T19:56:00Z",
53-
"provenance": "",
54-
"publisher": "",
55-
"references": "",
56-
"relation": "",
57-
"replaces": "",
58-
"requires": "",
59-
"rights": "",
60-
"rightsHolder": "",
61-
"source": "",
62-
"spatial": "",
63-
"subject": "",
64-
"tableOfContents": "",
65-
"temporal": "",
66-
"title": "",
67-
"type": "PhysicalObject",
68-
"valid": "",
69-
"institutionID": "urn:lsid:biocol.org:col:34871",
70-
"collectionID": "urn:uuid:18e3cd08-a962-4f0a-b72c-9a0b3600c5ad",
71-
"datasetID": "",
72-
"institutionCode": "USNM",
73-
"collectionCode": "ENT",
74-
"datasetName": "NMNH Extant Biology",
75-
"ownerInstitutionCode": "",
76-
"basisOfRecord": "PRESERVED_SPECIMEN",
77-
"informationWithheld": "",
78-
"dataGeneralizations": "",
79-
"dynamicProperties": "",
80-
"occurrenceID": "http://n2t.net/ark:/65665/3fd7061b8-48b0-4041-a9ea-a5b9aac0e767",
81-
"catalogNumber": "USNMENT741814",
82-
"recordNumber": "",
83-
"recordedBy": "",
84-
"recordedByID": "",
85-
"individualCount": "1",
86-
"organismQuantity": "",
87-
"organismQuantityType": "",
88-
"sex": "",
89-
"lifeStage": "Adult",
90-
"reproductiveCondition": "",
91-
"behavior": "",
92-
"establishmentMeans": "",
93-
"degreeOfEstablishment": "",
94-
"pathway": "",
95-
"georeferenceVerificationStatus": "",
96-
"occurrenceStatus": "PRESENT",
97-
"preparations": "Pinned",
98-
"disposition": "",
99-
"associatedOccurrences": "",
100-
"associatedReferences": "",
101-
"associatedSequences": "",
102-
"associatedTaxa": "",
103-
"otherCatalogNumbers": "",
104-
"occurrenceRemarks": "EMu record was created as part of the Smithsonian Institution Digitization Program Office (SI DPO) mass digitization project.",
105-
"organismID": "",
106-
"organismName": "",
107-
"organismScope": "",
108-
"associatedOrganisms": "",
109-
"previousIdentifications": "",
110-
"organismRemarks": "",
111-
"materialSampleID": "",
112-
"eventID": "",
113-
"parentEventID": "",
114-
"fieldNumber": "",
115-
"eventDate": "",
116-
"eventTime": "",
117-
"startDayOfYear": "",
118-
"endDayOfYear": "",
119-
"year": "",
120-
"month": "",
121-
"day": "",
122-
"verbatimEventDate": "",
123-
"habitat": "",
124-
"samplingProtocol": "",
125-
"sampleSizeValue": "",
126-
"sampleSizeUnit": "",
127-
"samplingEffort": "",
128-
"fieldNotes": "",
129-
"eventRemarks": "",
130-
"locationID": "",
131-
"higherGeographyID": "",
132-
"higherGeography": "",
133-
"continent": "",
134-
"waterBody": "",
135-
"islandGroup": "",
136-
"island": "",
137-
"countryCode": "",
138-
"stateProvince": "",
139-
"county": "",
140-
"municipality": "",
141-
"locality": "",
142-
"verbatimLocality": "",
143-
"verbatimElevation": "",
144-
"verticalDatum": "",
145-
"verbatimDepth": "",
146-
"minimumDistanceAboveSurfaceInMeters": "",
147-
"maximumDistanceAboveSurfaceInMeters": "",
148-
"locationAccordingTo": "",
149-
"locationRemarks": "",
150-
"decimalLatitude": "",
151-
"decimalLongitude": "",
152-
"coordinateUncertaintyInMeters": "",
153-
"coordinatePrecision": "",
154-
"pointRadiusSpatialFit": "",
155-
"verbatimCoordinateSystem": "",
156-
"verbatimSRS": "",
157-
"footprintWKT": "",
158-
"footprintSRS": "",
159-
"footprintSpatialFit": "",
160-
"georeferencedBy": "",
161-
"georeferencedDate": "",
162-
"georeferenceProtocol": "",
163-
"georeferenceSources": "",
164-
"georeferenceRemarks": "",
165-
"geologicalContextID": "",
166-
"earliestEonOrLowestEonothem": "",
167-
"latestEonOrHighestEonothem": "",
168-
"earliestEraOrLowestErathem": "",
169-
"latestEraOrHighestErathem": "",
170-
"earliestPeriodOrLowestSystem": "",
171-
"latestPeriodOrHighestSystem": "",
172-
"earliestEpochOrLowestSeries": "",
173-
"latestEpochOrHighestSeries": "",
174-
"earliestAgeOrLowestStage": "",
175-
"latestAgeOrHighestStage": "",
176-
"lowestBiostratigraphicZone": "",
177-
"highestBiostratigraphicZone": "",
178-
"lithostratigraphicTerms": "",
179-
"group": "",
180-
"formation": "",
181-
"member": "",
182-
"bed": "",
183-
"identificationID": "",
184-
"verbatimIdentification": "",
185-
"identificationQualifier": "",
186-
"typeStatus": "",
187-
"identifiedBy": "",
188-
"identifiedByID": "",
189-
"dateIdentified": "",
190-
"identificationReferences": "",
191-
"identificationVerificationStatus": "",
192-
"identificationRemarks": "",
193-
"taxonID": "",
194-
"scientificNameID": "",
195-
"acceptedNameUsageID": "",
196-
"parentNameUsageID": "",
197-
"originalNameUsageID": "",
198-
"nameAccordingToID": "",
199-
"namePublishedInID": "",
200-
"taxonConceptID": "",
201-
"scientificName": "Bombus vosnesenskii Radoszkowski, 1862",
202-
"acceptedNameUsage": "",
203-
"parentNameUsage": "",
204-
"originalNameUsage": "",
205-
"nameAccordingTo": "",
206-
"namePublishedIn": "",
207-
"namePublishedInYear": "",
208-
"higherClassification": "Animalia, Arthropoda, Insecta, Hymenoptera, Apidae, Apinae",
209-
"kingdom": "Animalia",
210-
"phylum": "Arthropoda",
211-
"class": "Insecta",
212-
"order": "Hymenoptera",
213-
"family": "Apidae",
214-
"subfamily": "",
215-
"genus": "Bombus",
216-
"genericName": "Bombus",
217-
"subgenus": "",
218-
"infragenericEpithet": "",
219-
"specificEpithet": "vosnesenskii",
220-
"infraspecificEpithet": "",
221-
"cultivarEpithet": "",
222-
"taxonRank": "SPECIES",
223-
"verbatimTaxonRank": "",
224-
"vernacularName": "",
225-
"nomenclaturalCode": "",
226-
"taxonomicStatus": "ACCEPTED",
227-
"nomenclaturalStatus": "",
228-
"taxonRemarks": "",
229-
"datasetKey": "821cc27a-e3bb-4bc5-ac34-89ada245069d",
230-
"publishingCountry": "US",
231-
"lastInterpreted": "2022-07-15T15:28:18.011Z",
232-
"elevation": "",
233-
"elevationAccuracy": "",
234-
"depth": "",
235-
"depthAccuracy": "",
236-
"distanceAboveSurface": "",
237-
"distanceAboveSurfaceAccuracy": "",
238-
"issue": "",
239-
"mediaType": "StillImage",
240-
"hasCoordinate": "false",
241-
"hasGeospatialIssues": "false",
242-
"taxonKey": "1340436",
243-
"acceptedTaxonKey": "1340436",
244-
"kingdomKey": "1",
245-
"phylumKey": "54",
246-
"classKey": "216",
247-
"orderKey": "1457",
248-
"familyKey": "4334",
249-
"genusKey": "1340278",
250-
"subgenusKey": "",
251-
"speciesKey": "1340436",
252-
"species": "Bombus vosnesenskii",
253-
"acceptedScientificName": "Bombus vosnesenskii Radoszkowski, 1862",
254-
"verbatimScientificName": "Bombus (Pyrobombus) vosnesenskii",
255-
"typifiedName": "",
256-
"protocol": "DWC_ARCHIVE",
257-
"lastParsed": "2022-07-15T15:28:18.011Z",
258-
"lastCrawled": "2022-07-15T14:04:11.391Z",
259-
"repatriated": "",
260-
"relativeOrganismQuantity": "",
261-
"level0Gid": "",
262-
"level0Name": "",
263-
"level1Gid": "",
264-
"level1Name": "",
265-
"level2Gid": "",
266-
"level2Name": "",
267-
"level3Gid": "",
268-
"level3Name": "",
269-
"iucnRedListCategory": "LC"
270-
}
271-
]
16+
{
17+
'occurrenceID': 'http://n2t.net/ark:/65665/30042e2d8-669d-4520-b456-e3c64203eff8',
18+
'catalogNumber': 'USNMENT01732649',
19+
'recordedBy': 'R. Craig',
20+
'year': '1949',
21+
'month': '4',
22+
'day': '13',
23+
'country': 'United States',
24+
'stateProvince': 'California',
25+
'county': 'Fresno',
26+
'locality': 'Auberry',
27+
'decimalLatitude': '37.0808',
28+
'decimalLongitude': '-119.485',
29+
'identifiedBy': "O'Brien, L. R.",
30+
'scientificName': 'Xylocopa (Notoxylocopa) tabaniformis orpifex',
31+
'genus': 'Xylocopa',
32+
'subgenus': 'Notoxylocopa',
33+
'specificEpithet': 'tabaniformis',
34+
'infraspecificEpithet': 'orpifex',
35+
'scientificNameAuthorship': 'Smith',
36+
'accessURI': 'https://ids.si.edu/ids/deliveryService?id=NMNH-USNMENT01732649',
37+
'PixelXDimension': 2000,
38+
'PixelYDimension': 1212
39+
}
27240
```
27341

27442
## Data Fields
27543

276-
Fields conform to the Darwin Core data standard and are detailed here: https://dwc.tdwg.org.
44+
Specimen metadata fields conform to the Darwin Core data standard and are detailed here: https://dwc.tdwg.org. Image metadata fields conform to the Audiovisual Core data standard and are detailed here: https://ac.tdwg.org/.
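
As an illustration of how these fields might be consumed, the following minimal Python sketch derives a collection date, a coordinate pair, and a short taxon name from a record shaped like the example above. It assumes every metadata value arrives as a string and that a missing value is either absent or an empty string; the helper names `collection_date`, `coordinates`, and `taxon_name` are hypothetical and not part of the dataset or any library.

```python
from datetime import date

# Subset of the example record shown in the Data Instances section.
record = {
    "catalogNumber": "USNMENT01732649",
    "year": "1949",
    "month": "4",
    "day": "13",
    "decimalLatitude": "37.0808",
    "decimalLongitude": "-119.485",
    "genus": "Xylocopa",
    "specificEpithet": "tabaniformis",
    "infraspecificEpithet": "orpifex",
}


def collection_date(rec):
    """Return a date built from year/month/day, or None if any part is missing or invalid."""
    try:
        return date(int(rec["year"]), int(rec["month"]), int(rec["day"]))
    except (KeyError, ValueError):
        return None


def coordinates(rec):
    """Return (latitude, longitude) as floats, or None if the record is not georeferenced."""
    try:
        return float(rec["decimalLatitude"]), float(rec["decimalLongitude"])
    except (KeyError, ValueError):
        return None


def taxon_name(rec):
    """Join the non-empty genus, species, and infraspecific epithets with spaces."""
    parts = (rec.get("genus"), rec.get("specificEpithet"), rec.get("infraspecificEpithet"))
    return " ".join(p for p in parts if p)


print(collection_date(record), coordinates(record), taxon_name(record))
```

Returning `None` rather than raising keeps records without a full date or without coordinates usable, which matters here because only part of the collection has been georeferenced.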
27745

27846
## Curation Rationale
27947

28048
The dataset represents a portion of the U.S. National Entomological Collection. The U.S. National Entomological Collection (USNM) traces its origins in part to the acquisition of the U.S. Department of Agriculture Collection of 138,000 specimens donated in 1885. These specimens became the foundation of one of the world’s largest and most important accessible entomological collections, with over 33 million specimens cared for by the combined staff of three government agencies: the Smithsonian Institution; the Systematic Entomology Laboratory (Agricultural Research Service, United States Department of Agriculture); and the Walter Reed Biosystematics Unit (Walter Reed Army Institute of Research). The specimens were imaged in a mass-digitization project in collaboration with the Digitization Program Office. The goal was to digitize every Bombus specimen in the collection.
28149

28250
## Initial Data Collection and Normalization
28351

284-
Bumblebee specimens were collected over a period of 150 years (earliest specimen dates from 1861, most recent specimen dates from 2011). The specimens were collected by and identified by many different individual researchers over this time. The initial images of 48,000 specimens were taken in a rapid capture project by a dedicated team in 2014 with additional specimen images (14,000) taken in 2018. The labels containing the information on site/location, date of collection, collector, and identifier were removed from the insect pin. The occurrence data were transcribed from the labels by online volunteers and a professional transcription service into Darwin Core fields. Following quality control of the transcribed data by NMNH staff, they were imported into the institutional database (EMu).
52+
Bumblebee specimens were collected over a period of more than 200 years (earliest specimen dates from 1807, most recent specimen dates from 2020). The specimens were collected and identified by many different individual researchers over this time. The initial images of about 49,000 specimens were taken in a rapid capture project by a dedicated team in 2014, with additional specimen images (about 25,000) taken in 2018. The labels containing the information on site/location, date of collection, collector, and identifier were removed from the insect pin. The occurrence data were transcribed from the labels by online volunteers and a professional transcription service into Darwin Core fields. Following quality control of the transcribed data by NMNH staff, they were imported into the institutional database (EMu).
28553

28654
NMNH specimen data are exported to the Global Biodiversity Information Facility (GBIF) on a weekly basis through an installation of an Integrated Publishing Toolkit (IPT, https://collections.nmnh.si.edu/ipt/). Some data transformation takes place within EMu, and GBIF likewise normalizes the data to meet its standards.
28755

@@ -319,7 +87,7 @@ Some site/location names could cause harm as they are insensitive or racist towa
31987

32088
Estimates of species geographic ranges based on these data may not be complete. There are many reasons collectors may collect more frequently from some areas rather than others, including their own taxonomic interests, proximity to collections institutions, accessibility via roads, ability to acquire permits for a specific area, or for geopolitical reasons.
32189

322-
The majority of specimens in this dataset originate from North America and no specimens from Australia and Africa are available.
90+
The majority of specimens in this dataset originate from North America.
32391

32492
Most specimens are expected to be female, because bumblebees are social insects and it is more common to find female bees.
32593

@@ -351,11 +119,9 @@ Public domain, Creative Commons CC0.
351119

352120
## Citation Information
353121

354-
GBIF.org (26 October 2022) GBIF Occurrence Download https://doi.org/10.15468/dl.48yf72
122+
Orrell T, Informatics Office (2023). NMNH Extant Specimen Records (USNM, US). Version 1.72. National Museum of Natural History, Smithsonian Institution. Occurrence dataset. https://collections.nmnh.si.edu/ipt/resource?r=nmnh_extant_dwc-a&v=1.72
355123

356124

357125
## Contributions
358126

359127
Thanks to NMNH for adding this dataset.
360-
361-
