One of the most powerful things about building custom GPTs is that you can upload knowledge for the GPT to reference, such as training manuals, product manuals and specifications, internal policies, frequently asked questions (FAQs), industry research and reports, customer service scripts, company history and culture documents, technical whitepapers, business plans, marketing strategies, financial reports and budgets, and sales scripts and techniques.
While this is incredible, we had questions:
- What kind of files can I upload?
- How many files can I upload?
- What are the limits on these files?
- And the #1 question... Does OpenAI use the files I upload to train their models?
Here are some answers I was able to gather for us:
File Information:
- You can upload text documents, spreadsheets, and presentations—pretty much any common file type.
- Up to 20 files can be uploaded for each custom GPT.
- Documents can be up to 512MB each. For spreadsheets, there's no limit on how much information they contain, but the file size still applies.
- Images within documents need to be under 20MB.
- Each end-user is capped at 10GB of uploaded files in total. An error will be displayed if a user hits that cap.
- All text and document files uploaded to a GPT are capped at 2 million tokens per file. This limitation does not apply to spreadsheets.
What does 2 million tokens mean? On average, one token is roughly 4 characters, or about ¾ of a word, in English. This is a very rough approximation, as the exact number will vary based on the language and the specific content of your document. Using the average case, 2 million tokens works out to around 1.5 million words (2,000,000 × ¾). Again, this is a broad estimate; the actual word count could be higher or lower depending on how tokenization splits your specific text.
- Images in Documents: Right now, you can upload documents that contain images, but GPTs can't process those images (e.g., an image on a slide in a slide deck). OpenAI is working on adding this feature.
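If you want a quick sanity check before uploading, the token math and file limits listed above can be sketched in a few lines of Python. This is only a rough sketch: the function names (`estimate_tokens`, `tokens_to_words`, `check_text_file`) are made up for illustration, the 4-characters-per-token rule is just the average mentioned above and not a real tokenizer, and treating 512MB as 512 × 1024 × 1024 bytes is an assumption.

```python
# Limits as quoted in the list above; they may change, so treat them as assumptions.
MAX_FILE_BYTES = 512 * 1024 * 1024  # 512MB per file (assuming 1MB = 1024 * 1024 bytes)
MAX_TOKENS_PER_FILE = 2_000_000     # applies to text and document files, not spreadsheets
CHARS_PER_TOKEN = 4                 # rough English average
WORDS_PER_TOKEN = 0.75              # rough English average


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (not a real tokenizer)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def tokens_to_words(tokens: int) -> int:
    """Rough word count for a given number of tokens."""
    return int(tokens * WORDS_PER_TOKEN)


def check_text_file(text: str, size_bytes: int) -> list[str]:
    """Return a list of likely limit problems; an empty list means none were found."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 512MB")
    tokens = estimate_tokens(text)
    if tokens > MAX_TOKENS_PER_FILE:
        problems.append(f"about {tokens:,} tokens, over the 2 million cap")
    return problems


if __name__ == "__main__":
    print(tokens_to_words(MAX_TOKENS_PER_FILE))  # 1500000
    sample = "Our refund policy lasts 30 days. " * 1000
    print(estimate_tokens(sample), check_text_file(sample, len(sample.encode("utf-8"))))
```

Because the estimate is so rough, a file that lands anywhere near the 2 million mark is worth splitting into smaller files rather than trusting the number.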
Now for the most important question...
Will OpenAI use files uploaded to train its models?
- "The answer depends on the service you are using. As explained in this article, we may use content submitted to ChatGPT, DALL·E, and our other services for individuals to improve model performance. Content may include files that are uploaded. Please refer to this article to understand how content may be used to improve model performance and the choices that users have." - OpenAI
But now for the most important step for business owners or organizations that want to make sure their proprietary data isn't being utilized...
- You can opt out of training through OpenAI's privacy portal by clicking on “do not train on my content,” or by turning off training for your ChatGPT conversations (you only get this option once you upload a file). Once you opt out, new conversations will not be used to train their models!