doc_curation.pdf

Curate and process pdf files.

doc_curation.pdf.ocr(pdf_path, google_key='/home/vvasuki/gitland/vvasuki-git/sysconf/kunchikA/google/sanskritnlp/service_account_key.json')[source]

OCR some pdf with google drive. Automatically splits into 25 page bits and ocrs them individually.

Parameters

pdf_path

Returns

doc_curation.pdf.split_into_small_pdfs(pdf_path, output_directory=None, start_page=1, end_page=None, small_pdf_pages=25)[source]