The dataset consists of direct transcriptions of the joke cards in Phyllis Diller’s gag file, which is part of the collections of the National Museum of American History (NMAH). Diller organized her material in the gag file, a steel cabinet with 48 drawers (plus a 3-drawer expansion) containing over 52,000 3-by-5-inch index cards, each holding a typewritten joke or gag. The index cards are organized alphabetically by subject, ranging from accessories to world affairs and covering almost everything in between. Each joke card generally features a category, author, date, and joke. All text is in English. The transcription process recorded the full contents of Diller’s gag file in a searchable dataset and was completed by Smithsonian Digital Volunteers in March 2017. Project page: https://transcription.si.edu/project/8927
American English (en-US).
The dataset is available for download in CSV, TXT, or XML format. The JSON example below is converted from the XML export.
{
"transcriptionProject": {
"project": {
"projectMetadata": "edanmdm:nmah_1218385",
"projectImage": "NMAH-AHB2016q100006",
"projectImportQuery": "/Public_Sets/NMAH/NMAH-CIS/2003_0289_01_01/Drawer01/01",
"projectDeletedBy": 0,
"isProjectDeleted": 0,
"projectAdminNotes": "",
"isMediaTranscription": 0,
"assets": {
"asset": [
{
"assetImage": "NMAH-AHB2016q100001",
"transcription": {
"tl1_text": "ACCIDENT\nPhyllis Gag\nJUL/1964\nMother's driving -- Model T up the hill on 3 tires -- backing in filling station -- Hoover in car -- hit from back -- Dad in ditch passing on wrong side."
},
"assetName": "NMAH-AHB2016q100001",
"templateId": 1,
"assetStatus": 10
},
{
"assetImage": "NMAH-AHB2016q100002",
"transcription": {
"tl1_text": "ACCIDENT\nPhyllis Gag\nJUL/1964\nDangerous living is not new to me, you know -- I drive."
},
"assetName": "NMAH-AHB2016q100002",
"templateId": 1,
"assetStatus": 10
},
{
"assetImage": "NMAH-AHB2016q100003",
"transcription": {
"tl1_text": "Accidents\nBill Guschl\n19/DEC/1964\nFang came home late the other night and ran into the garage - Lucky he didn't have the car."
},
"assetName": "NMAH-AHB2016q100003",
"templateId": 1,
"assetStatus": 10
}
]
}
}
}
}
assetImage: Text; Describes the unique digital asset number of the joke card image to be transcribed; Identical to assetName
tl1_text: Input field; Text; Direct transcription of the joke category, author, date, and joke text, in that order and on separate lines
assetName: Text; Describes the NMAH unique digital asset number of the joke card image to be transcribed; Identical to assetImage
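Because tl1_text packs four pieces of information into one newline-separated string, it can help to split it into named fields before analysis. The following is a minimal sketch, not part of the dataset's tooling: the names JokeCard and parse_tl1_text are hypothetical, and it assumes the first three lines are always category, author, and date, which the dataset description says is only generally true.

```python
from typing import NamedTuple


class JokeCard(NamedTuple):
    """Hypothetical container for the four fields of one transcription."""
    category: str
    author: str
    date: str
    joke: str


def parse_tl1_text(tl1_text: str) -> JokeCard:
    """Split a transcription into category, author, date, and joke text.

    Assumption: the first three lines are category, author, and date, and
    everything after them is the joke (which may span several lines).
    Missing trailing fields come back as empty strings.
    """
    # Split on at most three newlines so a multi-line joke stays in one piece.
    parts = tl1_text.split("\n", 3)
    # Pad short transcriptions so the tuple always has four fields.
    parts += [""] * (4 - len(parts))
    return JokeCard(*(part.strip() for part in parts))


card = parse_tl1_text(
    "ACCIDENT\nPhyllis Gag\nJUL/1964\n"
    "Dangerous living is not new to me, you know -- I drive."
)
print(card.category, "|", card.author, "|", card.date)
```

Cards that omit a field (for example, an unknown author) will shift the remaining lines up, so results from a splitter like this should be spot-checked against the card images.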
External funding, intended for the digitization and subsequent transcription of all of the cards in the Phyllis Diller gag file, allowed for a deep dive into the joke cards, which had not been fully explored before. We were interested in capturing the specific information on each card (category, author, date, and joke) rather than the formatting of each card. We asked that the four pieces of information (joke category, author, date, and joke text) be transcribed in that order and on separate lines. We were interested in documenting Diller’s joke-writing process, so if text had been edited to change a joke, we asked that the edit be recorded, but we were not interested in whether text was underlined, handwritten, or stamped. Even though the cards were not all alike, we attempted to keep the formatting of each transcription similar so that the data could be sorted in multiple ways for later uses. The specific instructions for the gag file transcriptions can be found here: https://transcription.si.edu/phyllis-diller-cards
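Sorting by date requires turning the transcribed date line into something comparable. The sketch below is illustrative only: parse_card_date is a hypothetical helper, and it assumes dates look like MON/YYYY or DD/MON/YYYY with three-letter English month abbreviations. Other date formats that may appear in the full dataset return None rather than being guessed at.

```python
from datetime import date
from typing import Optional

# Three-letter month abbreviations; an assumption about how dates were typed.
MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def parse_card_date(raw: str) -> Optional[date]:
    """Parse 'MON/YYYY' or 'DD/MON/YYYY'; return None for anything else.

    When no day is given, the first of the month is used as a placeholder.
    """
    parts = raw.strip().upper().split("/")
    try:
        if len(parts) == 3:
            day, month, year = int(parts[0]), MONTHS[parts[1]], int(parts[2])
        elif len(parts) == 2:
            day, month, year = 1, MONTHS[parts[0]], int(parts[1])
        else:
            return None
        return date(year, month, day)
    except (KeyError, ValueError):
        # Unknown month name, non-numeric day or year, or an impossible date.
        return None


raw_dates = ["19/DEC/1964", "undated", "JUL/1964"]
# Unparseable dates sort last by substituting the latest representable date.
ordered = sorted(raw_dates, key=lambda r: parse_card_date(r) or date.max)
print(ordered)
```

Placing unparseable dates last keeps them visible for manual review instead of silently dropping them.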
The text on the joke cards in Phyllis Diller’s gag file, transcribed by virtual volunteers.
Each joke card was transcribed by a virtual volunteer using these instructions: https://transcription.si.edu/phyllis-diller-cards. Each transcription was then approved by an NMAH staff member to ensure accuracy and adherence to the requested formatting.
The data were created by humans who volunteered with SI’s Transcription Center. Two days into the project, there were contributors from 65 countries, including Australia, Germany, Argentina, Latvia, Japan, US, UK, Canada, and New Zealand. Overall, 1,263 unique volunteers contributed to the project (one of which is “anonymous,” which covers anyone who did not create an account). Through this project, NMAH gained over 1,140 new volunteers, meaning only about 120 of the Diller volunteers were already volunteers with the SI Transcription Center.
There are dozens or even hundreds of people represented in the data. These people include the authors of the jokes as well as the subjects of the jokes. The subjects are generally celebrities or politicians popular from the 1960s to the 1990s.
Authors of the jokes, if known, are listed with first and last names. Some author addresses are listed on cards, but those cards are not included in the publicly available dataset. Celebrities, politicians, and public figures are also represented in the joke cards, often by first and last name. There are many identity categories mentioned in the data, but none come directly from the person to whom they refer. These categories are attributed by Phyllis Diller or the joke’s author as a means of comedy.
The dataset of Phyllis Diller’s gag file spans the mid-1960s to the mid-1990s and captures what was at the front of Americans’ minds during that time. The jokes feature U.S. presidents and politicians in addition to celebrities and other public figures. Diller’s jokes cover topics such as inflation, the Vietnam War, and feminism, and they show how these topics were addressed at different periods. Because the data date from the 1960s to the 1990s, some content includes outdated and often culturally insensitive or offensive views. We have chosen to leave the content as it was created for the benefit of research from the primary source.
The dataset is of material originally performed by (and mostly written by) a wealthy white woman during the second half of the 20th century. This leads to many biases. We believe leaving the material as originally written provides a useful research opportunity to engage with primary source material, despite the potentially culturally insensitive or offensive views it may include.
The dataset stems from sources originally written or purchased by Phyllis Diller and then performed by Phyllis Diller. The dataset was created by transcribing this material exactly. These transcriptions were created by virtual volunteers of the SI Transcription Center. All transcriptions were then edited and approved by NMAH Project Assistant Hanna BredenbeckCorp. Funding for the original digitization of the Phyllis Diller gag file was provided by Mike Wilkins and Sheila Duignan.
These data are provided under the Smithsonian Terms of Use (https://www.si.edu/termsofuse).
Thanks to NMAH for adding this dataset.