In document management and processing, ensuring the textual accuracy of PDF content is paramount. This guide walks you through a Python script that converts PDF pages to images and uses OpenAI's models to detect spelling, grammar, and vocabulary errors within those images.
Before you begin, ensure you have the following dependencies installed:
- PyMuPDF (fitz)
- OpenAI's Python client library
You can install these packages using pip:
pip install pymupdf openai
Below is the complete Python script designed for this task. It reads a PDF file, converts each page into an image, and uses OpenAI's API to check for textual errors in the images.
import fitz  # PyMuPDF
import base64
from openai import OpenAI

# Set the API key
client = OpenAI(api_key="your-api-key-here")
MODEL = "gpt-4o"


def encode_image(image_path):
    # Read the image file and return its contents as a base64 string
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def pdf_to_images(pdf_path):
    # Render every page of the PDF to page_<n>.png and return the image paths
    doc = fitz.open(pdf_path)
    images = []
    for page_num in range(len(doc)):
        page = doc.load_page(page_num)
        pix = page.get_pixmap()
        image_path = f"page_{page_num + 1}.png"
        pix.save(image_path)
        images.append(image_path)
    doc.close()
    return images


def process_image(image_path):
    # Send one page image to the model and return its review of the text
    base64_image = encode_image(image_path)
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You are an assistant designed to detect spelling, grammar, and vocabulary errors in images."},
            {"role": "user", "content": [
                {"type": "text", "text": "Please review the following image for any text errors."},
                {"type": "image_url", "image_url": {
                    "url": f"data:image/png;base64,{base64_image}"}
                }
            ]}
        ],
        temperature=0.0,
    )
    reply = response.choices[0].message.content
    print(reply)
    return reply


def main(pdf_path):
    # Convert the PDF to images, then review each page in order
    images = pdf_to_images(pdf_path)
    responses = []
    for image in images:
        response = process_image(image)
        responses.append(response)
    return responses


# Example usage
PDF_PATH = r"C:\Users\TheKo\Downloads\test.pdf"
responses = main(PDF_PATH)
for page_num, response in enumerate(responses, start=1):
    print(f"Response for page {page_num}:\n{response}\n")
- Set Up OpenAI and Import Libraries: Import the required libraries (fitz for PyMuPDF, base64 for encoding images, and OpenAI for API interaction).
- API Key: Set your OpenAI API key using the OpenAI client.
- Encode Image: The encode_image function reads an image file and encodes it to a base64 string.
- Convert PDF to Images: The pdf_to_images function opens a PDF file and converts each page into an image, saving each image with a sequential filename.
- Process Image: The process_image function sends the base64-encoded image to the OpenAI API and requests a check for spelling, grammar, or vocabulary errors. It prints and returns the response from OpenAI.
- Main Function: The main function coordinates the conversion of PDF pages to images and processes each image through OpenAI's API. It collects the response for each page and returns the list, which the example usage then prints.
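The encode and convert steps above go through PNG files on disk. As a minimal sketch of an alternative, the hypothetical helper below renders a page in memory and builds the data URL directly. It assumes a reasonably recent PyMuPDF in which get_pixmap accepts a dpi argument, and the default of 150 is an illustrative choice rather than a tuned value.

```python
import base64


def page_to_data_url(page, dpi=150):
    """Render one PyMuPDF page to PNG in memory and wrap it as a data URL.

    `page` is expected to be a PyMuPDF page, e.g. doc.load_page(0).
    """
    pix = page.get_pixmap(dpi=dpi)   # rasterize at the requested resolution
    png_bytes = pix.tobytes("png")   # encode as PNG without touching the disk
    encoded = base64.b64encode(png_bytes).decode("utf-8")
    return f"data:image/png;base64,{encoded}"


# Usage sketch (requires PyMuPDF and a real PDF):
#   import fitz
#   doc = fitz.open("example.pdf")
#   url = page_to_data_url(doc.load_page(0))
```

The returned string can be passed as the "url" value of the image_url message part, so no intermediate page_N.png files are left behind. A higher dpi may make small print easier for the model to read, at the cost of larger requests.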
Replace "your-api-key-here" with your actual OpenAI API key and update PDF_PATH with the path to your PDF file. Run the script, and it will output any detected errors for each page of the PDF.
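Hardcoding the key in the script makes it easy to leak by accident. One common alternative, sketched here as an assumption rather than as part of the original script, is to read the key from an environment variable. The helper name load_api_key is hypothetical; OPENAI_API_KEY is the variable the OpenAI client library looks for by default.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    Raises RuntimeError with a readable message when the variable is
    missing or empty, so the script fails before any API call is made.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage sketch:
#   client = OpenAI(api_key=load_api_key())
```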
To make this functionality accessible through a web browser, we'll deploy the solution using Flask. Below is the complete code to set up a Flask web application for uploading PDFs, processing them, and displaying the results.
from flask import Flask, request, render_template, redirect, url_for, jsonify
import os
import fitz  # PyMuPDF
import base64
from openai import OpenAI
from werkzeug.utils import secure_filename

# Initialize the OpenAI client
MODEL = "gpt-4o"
client = OpenAI(api_key="your-api-key-here")  # Replace with your actual OpenAI API key

# Initialize Flask app
app = Flask(__name__)
app.config["UPLOAD_FOLDER"] = 'uploads'
app.config["ALLOWED_EXTENSIONS"] = {'pdf'}
os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)


def allowed_file(filename):
    # Accept only filenames whose extension is in ALLOWED_EXTENSIONS
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in app.config['ALLOWED_EXTENSIONS']


def encode_image(image_path):
    # Read the image file and return its contents as a base64 string
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def pdf_to_images(pdf_path):
    # Render every page of the PDF to a PNG in the upload folder
    doc = fitz.open(pdf_path)
    images = []
    for page_num in range(len(doc)):
        page = doc.load_page(page_num)
        pix = page.get_pixmap()
        image_path = os.path.join(app.config['UPLOAD_FOLDER'], f"page_{page_num + 1}.png")
        pix.save(image_path)
        images.append(image_path)
    doc.close()
    return images


def process_image(image_path):
    # Send one page image to the model and return its review of the text
    base64_image = encode_image(image_path)
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You are an assistant designed to detect spelling, grammar, and vocabulary errors in images."},
            {"role": "user", "content": [
                {"type": "text", "text": "Please review the following image for any text errors."},
                {"type": "image_url", "image_url": {
                    "url": f"data:image/png;base64,{base64_image}"}
                }
            ]}
        ],
        temperature=0.0,
    )
    reply = response.choices[0].message.content
    return reply


@app.route('/', methods=['GET', 'POST'])
def upload_file():
    if request.method == 'POST':
        if 'file' not in request.files:
            return redirect(request.url)
        file = request.files['file']
        if file.filename == '':
            return redirect(request.url)
        if file and allowed_file(file.filename):
            filename = secure_filename(file.filename)
            file_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
            file.save(file_path)
            return redirect(url_for('process_pdf', filename=filename))
    return render_template('upload.html')


@app.route('/process/<filename>', methods=['GET'])
def process_pdf(filename):
    return render_template('results.html', filename=filename)


@app.route('/process_page/<filename>/<int:page_num>', methods=['GET'])
def process_page(filename, page_num):
    # Sanitize the name again because it arrives from the URL
    file_path = os.path.join(app.config['UPLOAD_FOLDER'], secure_filename(filename))
    # Note: this re-renders the whole PDF on every page request
    images = pdf_to_images(file_path)
    if page_num < len(images):
        result = process_image(images[page_num])
        next_page = page_num + 1 if page_num + 1 < len(images) else None
        return jsonify(success=True, result=result, next_page=next_page)
    else:
        return jsonify(success=False)


if __name__ == '__main__':
    app.run(debug=True)
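In the app above, pdf_to_images always writes page_1.png, page_2.png, and so on into the shared uploads folder, so two PDFs processed at the same time could overwrite each other's page images. A minimal sketch of one way around this, assuming a hypothetical helper named page_image_path that prefixes each image with the PDF's base name:

```python
import os


def page_image_path(upload_folder, pdf_filename, page_num):
    """Build a per-PDF image path such as uploads/report_page_1.png.

    page_num is zero-based, matching the loop index in pdf_to_images.
    """
    # Drop any directory part and the final extension from the PDF name
    stem = os.path.splitext(os.path.basename(pdf_filename))[0]
    return os.path.join(upload_folder, f"{stem}_page_{page_num + 1}.png")
```

Inside pdf_to_images, the image_path line could call this helper with the upload folder, pdf_path, and page_num. Two uploads that share the same filename would still collide, so a unique identifier per upload would be a stronger choice.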
upload.html
<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Upload PDF</title>
</head>
<body>
  <h1>Upload PDF</h1>
  <form method="post" enctype="multipart/form-data">
    <input type="file" name="file">
    <input type="submit" value="Upload">
  </form>
</body>
</html>
results.html
<!doctype html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Processing Results</title>
  <style>
    .spinner {
      margin: 100px auto;
      width: 40px;
      height: 40px;
      position: relative;
      text-align: center;
      animation: rotate 2.0s infinite linear;
    }
    .dot1, .dot2 {
      width: 60%;
      height: 60%;
      display: inline-block;
      position: absolute;
      top: 0;
      background-color: #333;
      border-radius: 100%;
      animation: bounce 2.0s infinite ease-in-out;
    }
    .dot2 {
      top: auto;
      bottom: 0;
      animation-delay: -1.0s;
    }
    @keyframes rotate { 100% { transform: rotate(360deg); } }
    @keyframes bounce {
      0%, 100% { transform: scale(0.0) }
      50% { transform: scale(1.0) }
    }
  </style>
  <script>
    document.addEventListener('DOMContentLoaded', (event) => {
      const filename = "{{ filename }}";
      fetchResults(filename, 0);
    });
    // Request one page at a time; the server returns the next page index or null
    function fetchResults(filename, pageNum) {
      showLoading();
      fetch(`/process_page/${filename}/${pageNum}`)
        .then(response => response.json())
        .then(data => {
          hideLoading();
          if (data.success) {
            document.getElementById('results').innerHTML += `<p><strong>Page ${pageNum + 1}:</strong> ${data.result}</p>`;
            if (data.next_page !== null) {
              fetchResults(filename, data.next_page);
            }
          } else {
            console.error('Error processing the PDF.');
          }
        })
        .catch(error => {
          hideLoading();
          console.error('Error:', error);
        });
    }
    function showLoading() {
      document.getElementById('loading').style.display = 'block';
    }
    function hideLoading() {
      document.getElementById('loading').style.display = 'none';
    }
  </script>
</head>
<body>
  <h1>Processing Results</h1>
  <div id="loading" class="spinner" style="display:none;">
    <div class="dot1"></div>
    <div class="dot2"></div>
  </div>
  <div id="results"></div>
</body>
</html>
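The results page adds each response to the document through innerHTML, so any markup that appears in the model's reply would be interpreted by the browser. One possible safeguard, sketched here as an assumption and not part of the original app, is to escape the text on the server before returning it. The helper name escape_for_html is hypothetical, and turning newlines into line breaks is an extra choice made for readability.

```python
import html


def escape_for_html(text):
    """Escape a plain-text reply so it can be inserted into a page safely."""
    # Replace &, <, > and quote characters with HTML entities
    escaped = html.escape(text, quote=True)
    # Keep the reply's line structure when it is rendered in the browser
    return escaped.replace("\n", "<br>")
```

In process_page, the JSON response could then carry escape_for_html applied to the model's reply instead of the raw text.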
With this setup, you can easily upload a PDF document through a web interface, process each page to detect textual errors, and view the results in a user-friendly format.
By combining PyMuPDF, OpenAI's vision-capable models, and Flask, you get page-by-page error detection that helps improve the overall accuracy and reliability of your documents.