In document management and processing, the textual accuracy of PDF content matters. This guide walks you through a Python script that converts PDF pages to images and uses OpenAI’s models to detect spelling, grammar, and vocabulary errors in those images.
Before you begin, make sure you have the following dependencies installed:
- PyMuPDF (imported as fitz)
- OpenAI’s Python client library
You can install these packages using pip:
pip install pymupdf openai
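As a quick sanity check after installation, the following sketch reports which of the two packages Python cannot find. It assumes the import names fitz (PyMuPDF) and openai; the helper name missing_packages is illustrative and not part of either library.

```python
import importlib.util


def missing_packages(names):
    """Return the import names that Python cannot locate (nothing is imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the two dependencies used in this guide.
missing = missing_packages(["fitz", "openai"])
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If either name is reported as missing, rerun the pip command above in the same Python environment that runs the script.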
Below is the complete Python script for this task. It reads a PDF file, converts each page into an image, and uses OpenAI’s API to check for textual errors in the images.
import fitz  # PyMuPDF
import base64
from openai import OpenAI

# Set the API key
client = OpenAI(api_key="your-api-key-here")
MODEL = "gpt-4o"


def encode_image(image_path):
    # Read the image file and return its contents as a base64 string
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def pdf_to_images(pdf_path):
    # Render every page of the PDF to page_<n>.png in the working directory
    doc = fitz.open(pdf_path)
    images = []
    for page_num in range(len(doc)):
        page = doc.load_page(page_num)
        pix = page.get_pixmap()
        image_path = f"page_{page_num + 1}.png"
        pix.save(image_path)
        images.append(image_path)
    doc.close()
    return images


def process_image(image_path):
    # Send one page image to the model and return its review of the text
    base64_image = encode_image(image_path)
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You are an assistant designed to detect spelling, grammar, and vocabulary errors in images."},
            {"role": "user", "content": [
                {"type": "text", "text": "Please review the following image for any text errors."},
                {"type": "image_url", "image_url": {
                    "url": f"data:image/png;base64,{base64_image}"}
                }
            ]}
        ],
        temperature=0.0,
    )
    reply = response.choices[0].message.content
    print(reply)
    return reply


def main(pdf_path):
    # Convert the PDF to images, then collect one response per page
    images = pdf_to_images(pdf_path)
    responses = []
    for image in images:
        response = process_image(image)
        responses.append(response)
    return responses


# Example usage
PDF_PATH = r"C:\Users\TheKo\Downloads\test.pdf"
responses = main(PDF_PATH)
for page_num, response in enumerate(responses, start=1):
    print(f"Response for page {page_num}:\n{response}\n")
- Set Up OpenAI and Import Libraries: Import the required libraries (fitz for PyMuPDF, base64 for encoding images, and OpenAI for API interaction).
- API Key: Set your OpenAI API key using the OpenAI client.
- Encode Image: The encode_image function reads an image file and encodes it to a base64 string.
- Convert PDF to Images: The pdf_to_images function opens a PDF file and converts each page into an image, saving each image with a sequential filename.
- Process Image: The process_image function sends the base64-encoded image to the OpenAI API and requests a check for spelling, grammar, or vocabulary errors. It prints and returns the response from OpenAI.
- Main Function: The main function coordinates the conversion of PDF pages to images and processes each image through OpenAI’s API. It collects the responses for each page, which the example usage then prints.
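The encode and convert steps above write each page to disk and then read it back. As a variation, a page can be rendered and encoded in memory. The sketch below assumes PyMuPDF's Page.get_pixmap(dpi=...) keyword (present in recent PyMuPDF releases) and Pixmap.tobytes("png"); the function names and the 150 DPI default are illustrative choices, not part of the original script.

```python
import base64


def png_bytes_to_data_url(png_bytes):
    """Wrap raw PNG bytes in the data URL format used by the image_url field."""
    encoded = base64.b64encode(png_bytes).decode("utf-8")
    return f"data:image/png;base64,{encoded}"


def page_to_data_url(doc, page_num, dpi=150):
    """Render one page of an open document to a PNG data URL without touching disk.

    `doc` is expected to behave like a fitz.Document.
    """
    page = doc.load_page(page_num)
    # The default rendering is 72 DPI; a higher value may keep small text legible.
    pix = page.get_pixmap(dpi=dpi)
    return png_bytes_to_data_url(pix.tobytes("png"))


def pdf_page_to_data_url(pdf_path, page_num, dpi=150):
    """Open a PDF from disk and render a single page (requires PyMuPDF)."""
    import fitz  # imported lazily so the helpers above work without PyMuPDF

    doc = fitz.open(pdf_path)
    try:
        return page_to_data_url(doc, page_num, dpi=dpi)
    finally:
        doc.close()
```

The returned string can be placed directly in the "url" field of the image_url message part, which removes the need for the intermediate page_<n>.png files.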
Replace "your-api-key-here" with your actual OpenAI API key and update PDF_PATH with the path to your PDF file. Run the script, and it will output any detected errors for each page of the PDF.
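Hardcoding the key in the script makes it easy to leak through version control. One common alternative is to read it from an environment variable. The sketch below assumes the variable is named OPENAI_API_KEY, which is also the name the OpenAI client looks for by default; the helper load_api_key is hypothetical.

```python
import os


def load_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error if unset."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the script."
        )
    return key
```

The client could then be created with OpenAI(api_key=load_api_key()) instead of a literal string.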
To make this functionality accessible through a web browser, we can deploy the solution using Flask. Below is the complete code to set up a Flask web application for uploading PDFs, processing them, and displaying the results.
from flask import Flask, request, render_template, redirect, url_for, jsonify
import os
import fitz  # PyMuPDF
import base64
from openai import OpenAI
from werkzeug.utils import secure_filename

# Initialize the OpenAI client
MODEL = "gpt-4o"
client = OpenAI(api_key="your-api-key-here")  # Replace with your actual OpenAI API key

# Initialize Flask app
app = Flask(__name__)
app.config["UPLOAD_FOLDER"] = 'uploads'
app.config["ALLOWED_EXTENSIONS"] = {'pdf'}
os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)


def allowed_file(filename):
    # Accept only filenames whose extension is in ALLOWED_EXTENSIONS
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in app.config['ALLOWED_EXTENSIONS']


def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def pdf_to_images(pdf_path):
    # Render every page of the PDF to a PNG inside the upload folder
    doc = fitz.open(pdf_path)
    images = []
    for page_num in range(len(doc)):
        page = doc.load_page(page_num)
        pix = page.get_pixmap()
        image_path = os.path.join(app.config['UPLOAD_FOLDER'], f"page_{page_num + 1}.png")
        pix.save(image_path)
        images.append(image_path)
    doc.close()
    return images


def process_image(image_path):
    base64_image = encode_image(image_path)
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You are an assistant designed to detect spelling, grammar, and vocabulary errors in images."},
            {"role": "user", "content": [
                {"type": "text", "text": "Please review the following image for any text errors."},
                {"type": "image_url", "image_url": {
                    "url": f"data:image/png;base64,{base64_image}"}
                }
            ]}
        ],
        temperature=0.0,
    )
    reply = response.choices[0].message.content
    return reply


@app.route('/', methods=['GET', 'POST'])
def upload_file():
    if request.method == 'POST':
        if 'file' not in request.files:
            return redirect(request.url)
        file = request.files['file']
        if file.filename == '':
            return redirect(request.url)
        if file and allowed_file(file.filename):
            filename = secure_filename(file.filename)
            file_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
            file.save(file_path)
            return redirect(url_for('process_pdf', filename=filename))
    return render_template('upload.html')


@app.route('/process/<filename>', methods=['GET'])
def process_pdf(filename):
    return render_template('results.html', filename=filename)


@app.route('/process_page/<filename>/<int:page_num>', methods=['GET'])
def process_page(filename, page_num):
    # page_num is zero-based; note that the whole PDF is re-rendered on every request
    file_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
    images = pdf_to_images(file_path)
    if page_num < len(images):
        result = process_image(images[page_num])
        next_page = page_num + 1 if page_num + 1 < len(images) else None
        return jsonify(success=True, result=result, next_page=next_page)
    else:
        return jsonify(success=False)


if __name__ == '__main__':
    app.run(debug=True)
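The /process_page endpoint implements a simple paging protocol: the browser asks for one zero-based page index, and the response names the next index, or None once the last page has been reached. Pulled out as a pure function, that logic might look like the sketch below. The name page_response is illustrative, and analyze stands in for process_image so the logic can be exercised without calling the API.

```python
def page_response(page_num, images, analyze):
    """Build the JSON-ready payload for one zero-based page index."""
    if not 0 <= page_num < len(images):
        # Out-of-range request: mirror the endpoint's failure payload.
        return {"success": False}
    # Point the client at the following page, or None after the last one.
    next_page = page_num + 1 if page_num + 1 < len(images) else None
    return {
        "success": True,
        "result": analyze(images[page_num]),
        "next_page": next_page,
    }
```

Keeping this logic separate from the route makes the stopping condition easy to test: the client keeps requesting pages until next_page comes back as None.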
upload.html
<!doctype html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>Upload PDF</title>
</head>
<body>
    <h1>Upload PDF</h1>
    <form method="post" enctype="multipart/form-data">
        <input type="file" name="file">
        <input type="submit" value="Upload">
    </form>
</body>
</html>
results.html
<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Processing Results</title>
    <style>
        .spinner {
            margin: 100px auto;
            width: 40px;
            height: 40px;
            position: relative;
            text-align: center;
            animation: rotate 2.0s infinite linear;
        }
        .dot1, .dot2 {
            width: 60%;
            height: 60%;
            display: inline-block;
            position: absolute;
            top: 0;
            background-color: #333;
            border-radius: 100%;
            animation: bounce 2.0s infinite ease-in-out;
        }
        .dot2 {
            top: auto;
            bottom: 0;
            animation-delay: -1.0s;
        }
        @keyframes rotate { 100% { transform: rotate(360deg); } }
        @keyframes bounce {
            0%, 100% { transform: scale(0.0) }
            50% { transform: scale(1.0) }
        }
    </style>
    <script>
        document.addEventListener('DOMContentLoaded', (event) => {
            const filename = "{{ filename }}";
            fetchResults(filename, 0);
        });
        function fetchResults(filename, pageNum) {
            showLoading();
            fetch(`/process_page/${filename}/${pageNum}`)
                .then(response => response.json())
                .then(data => {
                    hideLoading();
                    if (data.success) {
                        document.getElementById('results').innerHTML += `<p><strong>Page ${pageNum + 1}:</strong> ${data.result}</p>`;
                        if (data.next_page !== null) {
                            fetchResults(filename, data.next_page);
                        }
                    } else {
                        console.error('Error processing the PDF.');
                    }
                })
                .catch(error => {
                    hideLoading();
                    console.error('Error:', error);
                });
        }
        function showLoading() {
            document.getElementById('loading').style.display = 'block';
        }
        function hideLoading() {
            document.getElementById('loading').style.display = 'none';
        }
    </script>
</head>
<body>
    <h1>Processing Results</h1>
    <div id="loading" class="spinner" style="display:none;">
        <div class="dot1"></div>
        <div class="dot2"></div>
    </div>
    <div id="results"></div>
</body>
</html>
With this setup, you can upload a PDF document through a web interface, process each page to detect textual errors, and view the results page by page in the browser.
Combining PyMuPDF, OpenAI’s vision-capable models, and Flask in this way gives you an automated first pass at catching spelling, grammar, and vocabulary errors, which helps improve the accuracy and reliability of your documents.