Files and Images

Language models work with text, but users want to share documents and images. The SDK bridges this gap by automatically extracting text from files before sending them as context.

Document Processing

The SDK extracts text from PDF (all pages), Word documents (raw text), Excel spreadsheets (structured JSON with sheet names), and ZIP archives (recursively processing files inside). Processing happens automatically when you attach files to a message. The extracted text is sent as context to the model, while original metadata is preserved for your UI.
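As a rough mental model of the per-type handling described above, extraction can be thought of as dispatching on file extension. This is an illustrative sketch only; the helper name and return values are hypothetical and not the SDK's internal API:

```typescript
// Hypothetical dispatcher mirroring the supported document types:
// PDF, Word, Excel, and ZIP archives; everything else falls through.
type DocKind = "pdf" | "word" | "excel" | "zip" | "other";

function routeByExtension(name: string): DocKind {
  const ext = name.split(".").pop()?.toLowerCase() ?? "";
  switch (ext) {
    case "pdf":
      return "pdf"; // all pages extracted
    case "doc":
    case "docx":
      return "word"; // raw text
    case "xls":
    case "xlsx":
      return "excel"; // structured JSON with sheet names
    case "zip":
      return "zip"; // recursively process contained files
    default:
      return "other";
  }
}
```

In the real SDK this routing happens for you when files are attached; the sketch just makes the supported formats concrete.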

For more control over individual file types, see usePdf and useOCR. To manage file attachments directly, use useFiles.

Images

Images are sent directly to vision models without text extraction. Models can identify objects, read text in images, understand charts, and answer questions about visual content. If you specifically need text extracted from an image, use the useOCR utility separately.
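Since images bypass extraction, a vision request pairs the user's text with the raw image data. The content-part shape below is a common pattern for vision APIs and is shown only as an assumption; the SDK's actual wire format may differ:

```typescript
// Illustrative content-part message shape for a vision request.
// Field names here are assumptions, not the SDK's internal format.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; dataUrl: string };

function buildVisionMessage(
  prompt: string,
  imageBase64: string,
  mime = "image/png",
): ContentPart[] {
  return [
    { type: "text", text: prompt },
    // The image travels as-is (base64 data URL); no text extraction step.
    { type: "image", dataUrl: `data:${mime};base64,${imageBase64}` },
  ];
}
```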

With Chat

useChatStorage handles file processing automatically when you send messages with attachments. You can configure processing behavior:

const { sendMessage } = useChatStorage({
  database,
  getToken,
  fileProcessingOptions: {
    maxFileSizeBytes: 10 * 1024 * 1024, // 10MB
    keepOriginalFiles: true,
    onProgress: (current, total) => setProgress(current / total),
  },
});
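To make the `maxFileSizeBytes` and `onProgress` options concrete, here is a minimal sketch of how sequential attachment processing with progress reporting might behave. The function and its shapes are hypothetical, not SDK internals:

```typescript
// Hypothetical sketch: filter attachments by size and report progress
// with the same (current, total) callback shape used by the SDK option.
interface Attachment {
  name: string;
  sizeBytes: number;
}

interface ProcessingOptions {
  maxFileSizeBytes: number;
  onProgress?: (current: number, total: number) => void;
}

function processAttachments(
  files: Attachment[],
  opts: ProcessingOptions,
): string[] {
  const accepted: string[] = [];
  files.forEach((file, i) => {
    if (file.sizeBytes <= opts.maxFileSizeBytes) {
      accepted.push(file.name); // within the configured limit
    }
    opts.onProgress?.(i + 1, files.length); // fires once per file
  });
  return accepted;
}
```

A UI would typically feed `current / total` straight into a progress bar, as the `setProgress` call in the example above does.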

Generated Content

When models generate images through the image generation tool, the SDK downloads them automatically, stores them encrypted locally, and persists them in conversation history. Temporary API URLs are replaced with permanent local storage, so images remain available even after the original URLs expire.
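The URL swap described above can be sketched as a pure transformation: once the image has been downloaded and stored, the expiring API URL is replaced by a permanent local reference. The type and function names here are hypothetical illustrations, not the SDK's actual storage API:

```typescript
// Illustrative sketch of the temporary-to-permanent URL swap.
// Assumes the download and encrypted write have already completed.
interface GeneratedImage {
  url: string; // what the conversation history points at
  sourceUrl?: string; // the original (expiring) API URL, kept as provenance
}

function persistImageRef(image: GeneratedImage, localPath: string): GeneratedImage {
  return {
    url: localPath, // history now references permanent local storage
    sourceUrl: image.url, // original URL retained only as metadata
  };
}
```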
