diff --git a/en/python-net/advanced-operations/working-with-documents/working-with-layers/_index.md b/en/python-net/advanced-operations/working-with-documents/working-with-layers/_index.md index e54be37dac..a29ec70165 100644 --- a/en/python-net/advanced-operations/working-with-documents/working-with-layers/_index.md +++ b/en/python-net/advanced-operations/working-with-documents/working-with-layers/_index.md @@ -3,7 +3,7 @@ title: Work with PDF layers using Python linktitle: Work with PDF layers type: docs weight: 50 -url: /net/working-with-pdf-layers/ +url: /python-net/working-with-pdf-layers/ description: The next task explains how to lock a PDF layer, extract PDF layer elements, flatten a layered PDF, and merge all layers inside PDF into one. lastmod: "2025-09-17" sitemap: diff --git a/en/python-net/advanced-operations/working-with-images/_index.md b/en/python-net/advanced-operations/working-with-images/_index.md index 168c987377..a1170df205 100644 --- a/en/python-net/advanced-operations/working-with-images/_index.md +++ b/en/python-net/advanced-operations/working-with-images/_index.md @@ -5,7 +5,7 @@ type: docs weight: 40 url: /python-net/working-with-images/ description: This section describes the features of working with images in a PDF file using Python library. -lastmod: "2025-02-27" +lastmod: "2025-09-27" sitemap: changefreq: "monthly" priority: 0.7 @@ -23,4 +23,6 @@ You are able to do the following: - [Add Image to Existing PDF File](/pdf/python-net/add-image-to-existing-pdf-file/) - add images and references of a single image in PDF document, after that control quality. - [Delete Images from PDF File](/pdf/python-net/delete-images-from-pdf-file/) - check code snippet for deleting images from PDF file. - [Extract Images from PDF File](/pdf/python-net/extract-images-from-pdf-file/) - the next article shows how to extract images from PDF file using Python library. 
+- [Search and Get Images from PDF Document](/pdf/python-net/search-and-get-images-from-pdf-document/) - you can get an image from an individual page and search among images on all pages with Python. +- [Replace Image in Existing PDF File](/pdf/python-net/replace-image-in-existing-pdf-file/) - check our code snippet that shows how to replace an image in a PDF file. diff --git a/en/python-net/advanced-operations/working-with-images/add-image-to-existing-pdf-file/_index.md b/en/python-net/advanced-operations/working-with-images/add-image-to-existing-pdf-file/_index.md index 9e1a1e4f49..b25d107ef1 100644 --- a/en/python-net/advanced-operations/working-with-images/add-image-to-existing-pdf-file/_index.md +++ b/en/python-net/advanced-operations/working-with-images/add-image-to-existing-pdf-file/_index.md @@ -5,57 +5,158 @@ type: docs weight: 10 url: /python-net/add-image-to-existing-pdf-file/ description: This section describes how to add image to existing PDF file using Python library. -lastmod: "2025-02-27" +lastmod: "2025-09-27" TechArticle: true -AlternativeHeadline: How to add images into PDf using Python +AlternativeHeadline: How to add images into PDF using Python Abstract: This article provides guidance on adding images to existing PDF files using Python with the Aspose.PDF library. Two methods are outlined for achieving this. The first method involves using the `Document` class from Aspose.PDF, where the user loads the PDF, specifies the page number, and uses the `add_image` method of the `Page` class to position the image. The document is then saved using the `save()` method. The second method utilizes the `PdfFileMend` class from the Aspose.PDF.Facades namespace, which offers a simpler interface. Here, the `add_image()` method is invoked to add the image to the specified page and coordinates, followed by saving the updated PDF and closing the `PdfFileMend` object. Code snippets are provided for both methods to demonstrate the process.
--- ## Add Image in an Existing PDF File -The following code snippet shows how to add image in the PDF file. - -1. Load the input PDF file. -1. Specify the page number on which the picture will be placed. -1. To define the position of the image on the page call the [Page](https://reference.aspose.com/pdf/python-net/aspose.pdf/page/) class [add_image](https://reference.aspose.com/pdf/python-net/aspose.pdf/page/#methods) method. -1. Call the [Document](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/) class [save()](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/#methods) method. +This example demonstrates how to insert an image at a specific position on a PDF page using Aspose.PDF for Python via .NET. +1. Load the PDF document with 'ap.Document'. +1. Select the target page ('document.pages[1]' - the first page). +1. Call 'page.add_image()' to place the image, passing: + - The file path of the image. + - A 'Rectangle' object defining the image’s coordinates (left=20, bottom=730, right=120, top=830). +1. Save the updated PDF. ```python import aspose.pdf as ap + from os import path + + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) + + document = ap.Document(path_infile) + # Pages are 1-based: pages[1] is the first page + page = document.pages[1] + page.add_image( + path.join(self.data_dir, image_file), + ap.Rectangle(20, 730, 120, 830, True), + ) + document.save(path_outfile) +``` - # Open document - document = ap.Document(input_file) - - document.pages[1].add_image(image_file, ap.Rectangle(20, 730, 120, 830, True)) +## Add an Image Using Operators - document.save(output_pdf) -``` +The next code snippet shows a low-level approach to adding an image to a PDF page: instead of a high-level helper method, it writes the PDF content-stream operators directly. -## Add Image in an Existing PDF File (Facades) +Steps: -There is also an alternative, easier way to add a Image to a PDF file.
You can use [AddImage](https://reference.aspose.com/pdf/python-net/aspose.pdf.facades/pdffilemend/methods/addimage/index) method of the [PdfFileMend](https://reference.aspose.com/pdf/python-net/aspose.pdf.facades/pdffilemend/) class. The [add_image()](https://reference.aspose.com/pdf/python-net/aspose.pdf.facades/pdffilemend/#methods) method requires the image to be added, the page number at which the image needs to be added and the coordinate information. After that, save the updated PDF file, and close the PdfFileMend object using [close()](https://reference.aspose.com/pdf/python-net/aspose.pdf.facades/pdffilemend/#methods) method. The following code snippet shows you how to add image in an existing PDF file. +1. Create a new blank 'Document'. +1. Add a page and set its size (842 × 595 - landscape A4). +1. Access the page’s image resources ('page.resources.images'). +1. Load the image file into a stream and add it to the resources: + - The method returns an 'image_id', which the 'Do' operator uses later. + - The newly added image object is retrieved from the resources. +1. Define a rectangle that spans the full page width and keeps the image's aspect ratio. +1. Build an operator sequence: + - 'GSave()' - save the current graphics state. + - 'ConcatenateMatrix(matrix)' - apply the transformation that scales the image to the rectangle and centers it vertically on the page. + - 'Do(image_id)' - render the image. + - 'GRestore()' - restore the graphics state. +1. Add the operator sequence to 'page.contents'. +1. Save the resulting PDF.
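The rectangle and matrix described in the steps above can be checked without the library. The following plain-Python sketch is illustrative only: the helper name 'fit_width_matrix' and the 1600 × 900 px image size are hypothetical. It relies on the PDF rule that an image is painted into a unit square, so the 'a' and 'd' matrix entries are the target width and height in points, and 'f' shifts the image up to center it vertically.

```python
def fit_width_matrix(page_width, page_height, image_width, image_height):
    """Return an (a, b, c, d, e, f) matrix that scales an image to the full
    page width, keeps its aspect ratio, and centers it vertically.

    Hypothetical helper for illustration; it does not call Aspose.PDF.
    """
    # The unit image square is stretched to page_width x scaled_height points.
    scaled_height = page_width * image_height / image_width
    # Move the image up by half of the unused vertical space.
    offset_y = (page_height - scaled_height) / 2
    return (page_width, 0.0, 0.0, scaled_height, 0.0, offset_y)


# Landscape A4 page (842 x 595 pt) and a hypothetical 1600 x 900 px image
print(fit_width_matrix(842, 595, 1600, 900))
```

If the scaled height exceeds the page height, 'offset_y' becomes negative and part of the image falls outside the page; a fit-to-page variant would compare both the width and the height ratios.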
```python import aspose.pdf as ap + from io import FileIO + from os import path + + path_image_file = path.join(self.data_dir, image_file) + path_outfile = path.join(self.data_dir, "python", outfile) + + document = ap.Document() + page = document.pages.add() + page.set_page_size(842, 595) + + # Get page resources + resources_images = page.resources.images + + # Add image to resources; the stream is kept open until the document has been saved + image_stream = FileIO(path_image_file, "rb") + image_id = resources_images.add(image_stream) - # Open document - mender = ap.facades.PdfFileMend() + x_image = list(resources_images)[-1] - # Create PdfFileMend object to add text - mender.bind_pdf(input_file) + rectangle = ap.Rectangle( + 0, + 0, + page.media_box.width, + (page.media_box.width * x_image.height) / x_image.width, + True, + ) - # Add image in the PDF file - mender.add_image(image_file, 1, 100.0, 600.0, 200.0, 700.0) + # Create operator sequence for adding image + operators = [] - # Save changes - mender.save(output_pdf) + # Save graphics state + operators.append(ap.operators.GSave()) - # Close PdfFileMend object - mender.close() + # Set transformation matrix: scale the unit image square to the rectangle + # and move it up so that it is centered vertically on the page + matrix = ap.Matrix( + rectangle.urx - rectangle.llx, + 0, + 0, + rectangle.ury - rectangle.lly, + rectangle.llx, + rectangle.lly + (page.media_box.height - rectangle.height) / 2, + ) + operators.append(ap.operators.ConcatenateMatrix(matrix)) + # Draw the image + operators.append(ap.operators.Do(image_id)) + + # Restore graphics state + operators.append(ap.operators.GRestore()) + + # Add operators to page contents + page.contents.add(operators) + + document.save(path_outfile) + image_stream.close() ``` +## Add Image with Alternative Text + +This example shows how to add an image to a PDF page and assign alternative text (alt text) for accessibility compliance (such as PDF/UA). +1. Create a new 'Document' and add a page (842 × 595, landscape A4). +1. Place the image on the page using 'page.add_image()' with a rectangle that spans the full page. +1.
Access the page’s image resources ('page.resources.images'). +1. Define an alternative text string (e.g., 'Alternative text for image'). +1. Retrieve the first image object from the resources ('x_image = resources_images[1]'; the collection is 1-based). +1. Use 'try_set_alternative_text(alt_text, page)' to assign alt text to the image. +1. Save the resulting PDF. + +```python + + import aspose.pdf as ap + from os import path + + path_image_file = path.join(self.data_dir, image_file) + path_outfile = path.join(self.data_dir, "python", outfile) + + document = ap.Document() + page = document.pages.add() + page.set_page_size(842, 595) + + page.add_image( + path_image_file, + ap.Rectangle(0, 0, 842, 595, True), + ) + + resources_images = page.resources.images + alt_text = "Alternative text for image" + x_image = resources_images[1] + result = x_image.try_set_alternative_text(alt_text, page) + + # Report whether the alternative text was assigned + if result: + print("Alternative text has been added successfully") + document.save(path_outfile) +``` \ No newline at end of file diff --git a/en/python-net/advanced-operations/working-with-images/delete-images-from-pdf-file/_index.md b/en/python-net/advanced-operations/working-with-images/delete-images-from-pdf-file/_index.md index a75b3cb984..6dd9c4aa2d 100644 --- a/en/python-net/advanced-operations/working-with-images/delete-images-from-pdf-file/_index.md +++ b/en/python-net/advanced-operations/working-with-images/delete-images-from-pdf-file/_index.md @@ -5,18 +5,16 @@ type: docs weight: 20 url: /python-net/delete-images-from-pdf-file/ description: This section explain how to delete Images from PDF File using Aspose.PDF for Python via .NET.
-lastmod: "2025-02-27" +lastmod: "2025-09-27" TechArticle: true AlternativeHeadline: How to remove images from PDF using Python Abstract: The article discusses the various reasons for removing images from PDF files, such as protecting privacy, preventing unauthorized access to sensitive information, reducing file size for easier sharing and storage, and preparing the document for compression or text extraction. It introduces **Aspose.PDF for Python via .NET** as a tool to accomplish this task. The article provides step-by-step instructions and code snippets for deleting specific images or all images from a PDF file using Aspose.PDF. The process involves opening an existing PDF document, deleting images either individually or in bulk, and saving the updated file. The provided Python code demonstrates how to remove images by accessing the document's resources and modifying the desired pages. --- There are many reasons for removing all or specific images from PDFs. - Sometimes a PDF file may contain important images that need to be removed to protect privacy or prevent unauthorized access to certain information. Removing unwanted or redundant images can help reduce file size, making it easier to share or store PDFs. - If necessary, you can reduce the number of pages by removing all images from the document. Also, deleting images from the document will help prepare the PDF for compression or extraction of text information. @@ -35,33 +33,12 @@ The following code snippet shows how to delete an image from a PDF file. 
```python import aspose.pdf as ap + from os import path - # Open document - document = ap.Document(input_file) - - # Delete particular image - document.pages[2].resources.images.delete(1) - - # Save updated PDF file - document.save(output_pdf) -``` - -## Delete all images from input PDF - -```python - - import aspose.pdf as ap - - # Open document - document = ap.Document(input_file) - - # Delete all images on all pages - for i in range(len(document.pages)): - while len(document.pages[i + 1].resources.images) != 0: - document.pages[i + 1].resources.images.delete(1) - - # Save updated PDF file - document.save(output_file) -``` - + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, outfile) + document = ap.Document(path_infile) + document.pages[1].resources.images.delete(1) + document.save(path_outfile) +``` \ No newline at end of file diff --git a/en/python-net/advanced-operations/working-with-images/extract-images-from-pdf-file/_index.md b/en/python-net/advanced-operations/working-with-images/extract-images-from-pdf-file/_index.md index 5855343bac..f5e5d53eb1 100644 --- a/en/python-net/advanced-operations/working-with-images/extract-images-from-pdf-file/_index.md +++ b/en/python-net/advanced-operations/working-with-images/extract-images-from-pdf-file/_index.md @@ -5,31 +5,146 @@ type: docs weight: 30 url: /python-net/extract-images-from-pdf-file/ description: This section shows how to extract images from PDF file using Python library. -lastmod: "2025-02-27" +lastmod: "2025-09-27" TechArticle: true -AlternativeHeadline: Get images from PDF with Python +AlternativeHeadline: Extract images from PDF with Python Abstract: This article discusses the process of extracting images from PDF files using Aspose.PDF for Python. It highlights the utility of separating images for purposes such as management, archiving, analysis, or sharing. 
The article explains that images within a PDF are stored in each page's resources collection, specifically within the XImage collection. To extract an image, users can access a particular page and retrieve the image using its index from the Images collection. The XImage object returned by the index provides a `save()` method to save the extracted image. A code snippet is provided to demonstrate the steps required to open a PDF document, extract a specific image from the second page using its index, and save it to a file. --- Do you need to separate images from your PDF files? For simplified management, archiving, analysis, or sharing images of your documents, use **Aspose.PDF for Python** and extract images from PDF files. -Images are held in each page's [resources](https://reference.aspose.com/pdf/python-net/aspose.pdf/page/#properties) collection's [XImage](https://reference.aspose.com/pdf/python-net/aspose.pdf/ximagecollection/) collection. To extract a particular page, then get the image from the Images collection using the particular index of the image. +1. Load the PDF document with 'ap.Document()'. +1. Access the desired page of the document (document.pages[1]). +1. Select the image from the page resources (for example, resources.images[1]). +1. Create an output stream (FileIO) for the target file. +1. Save the extracted image using 'xImage.save(output_image)'. -The image's index returns an [XImage](https://reference.aspose.com/pdf/python-net/aspose.pdf/ximage/) object. This object provides a [save()](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/#methods) method which can be used to save the extracted image. The following code snippet shows how to extract images from a PDF file. 
+```python + + import aspose.pdf as ap + from io import FileIO + from os import path + + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, outfile) + + document = ap.Document(path_infile) + xImage = document.pages[1].resources.images[1] + with FileIO(path_outfile, "w") as output_image: + xImage.save(output_image) +``` + +## Extract Images from Specific Region in PDF + +This example extracts images located within a specified rectangular region on a PDF page and saves them as separate files. + +1. Load the PDF document using 'ap.Document'. +1. Create an 'ImagePlacementAbsorber' to collect all images on the first page. +1. Call 'document.pages[1].accept(absorber)' to analyze image placements. +1. Iterate through all images in 'absorber.image_placements': + - Get the image bounding box (llx, lly, urx, ury). + - Check if both corners of the image rectangle fall inside the target rectangle (rectangle.contains()). + - If true, save the image to a file using FileIO, replacing 'index' in the filename with a sequential number. +1. Increment the index for each saved image. 
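The corner test in the steps above comes down to four comparisons. A minimal plain-Python sketch, assuming boxes are given as (llx, lly, urx, ury) tuples in points and that touching the boundary counts as inside; the helper name 'box_inside' and the sample boxes are hypothetical and do not come from Aspose.PDF:

```python
def box_inside(region, box):
    """Return True if box lies completely inside region.

    Both arguments are (llx, lly, urx, ury) tuples in PDF points; edges
    that touch the region boundary count as inside.
    """
    r_llx, r_lly, r_urx, r_ury = region
    llx, lly, urx, ury = box
    # The lower-left corner (llx, lly) must not be left of or below the region,
    # and the upper-right corner (urx, ury) must not be right of or above it.
    return r_llx <= llx and r_lly <= lly and urx <= r_urx and ury <= r_ury


region = (0, 0, 590, 590)
print(box_inside(region, (10, 10, 200, 300)))  # True
print(box_inside(region, (10, 10, 200, 600)))  # False: the top edge is outside
```

The second corner must be (urx, ury). Testing (urx, urx) instead would accept the second box, because the point (200, 200) lies inside the region even though the box's top edge does not.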
```python import aspose.pdf as ap + from io import FileIO + from os import path - # Open document - document = ap.Document(input_file) + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, outfile) - # Extract a particular image - xImage = document.pages[2].resources.images[1] - outputImage = io.FileIO(output_image, "w") + rectangle = ap.Rectangle(0, 0, 590, 590, True) - # Save output image - xImage.save(outputImage) - outputImage.close() + document = ap.Document(path_infile) + absorber = ap.ImagePlacementAbsorber() + document.pages[1].accept(absorber) + index = 1 + for image_placement in absorber.image_placements: + # Lower-left and upper-right corners of the image's bounding box + point1 = ap.Point( + image_placement.rectangle.llx, image_placement.rectangle.lly + ) + point2 = ap.Point( + image_placement.rectangle.urx, image_placement.rectangle.ury + ) + if rectangle.contains(point1, True) and rectangle.contains(point2, True): + with FileIO(path_outfile.replace("index", str(index)), "w") as output_image: + image_placement.image.save(output_image) + index = index + 1 ``` +## Extract Image Information from PDF + +The example below demonstrates how to analyze images embedded in a PDF page and calculate their effective resolution. + +1. Open the PDF with 'ap.Document'. +1. Track the graphics state (a stack of transformation matrices) while reading the page content. +1. Handle operators: + - 'GSave'/'GRestore' - push/pop the current matrix. + - 'ConcatenateMatrix' - update the current transformation. + - 'Do' - if it draws an image, get the image's size and apply the current transformation. +1. Calculate the effective DPI. +1. Print the image name, scaled size, and DPI.
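The matrix bookkeeping in these steps can be illustrated without Aspose.PDF. In a PDF content stream, 'GSave', 'GRestore', 'ConcatenateMatrix', and 'Do' correspond to the 'q', 'Q', 'cm', and 'Do' operators, and 'cm' replaces the current matrix with the operand multiplied by it. The sketch below is hypothetical: it replays a hand-written list of (operator, operand) pairs, with matrices written as (a, b, c, d, e, f) tuples, instead of reading a real page.

```python
import math


def concat(m, ctm):
    """Return m x ctm for PDF matrices written as (a, b, c, d, e, f)."""
    a, b, c, d, e, f = m
    A, B, C, D, E, F = ctm
    return (
        a * A + b * C,
        a * B + b * D,
        c * A + d * C,
        c * B + d * D,
        e * A + f * C + E,
        e * B + f * D + F,
    )


def scaled_image_sizes(operators):
    """Replay (operator, operand) pairs and return (name, width_pt, height_pt)
    for every image drawn with Do."""
    stack = [(1.0, 0.0, 0.0, 1.0, 0.0, 0.0)]  # start from the identity matrix
    sizes = []
    for op, operand in operators:
        if op == "q":  # GSave: push a copy of the current matrix
            stack.append(stack[-1])
        elif op == "Q":  # GRestore: return to the matrix saved by the matching q
            stack.pop()
        elif op == "cm":  # ConcatenateMatrix: new matrix = operand x current
            stack[-1] = concat(operand, stack[-1])
        elif op == "Do":  # the unit image square is mapped by the current matrix
            a, b, c, d, _, _ = stack[-1]
            sizes.append((operand, math.hypot(a, b), math.hypot(c, d)))
    return sizes


ops = [
    ("q", None),
    ("cm", (2, 0, 0, 2, 0, 0)),
    ("cm", (100, 0, 0, 50, 10, 10)),
    ("Do", "Im1"),
    ("Q", None),
    ("Do", "Im2"),
]
print(scaled_image_sizes(ops))
```

Because an image is painted into the unit square, the lengths of the vectors (a, b) and (c, d) of the current matrix give the drawn width and height in points; this is the same calculation as the 'math.sqrt(... ** 2 + ... ** 2)' lines in the example below.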
+ +```python + + import math + import aspose.pdf as ap + import aspose.pydrawing as drawing + from aspose.pycore import cast, is_assignable + from os import path + + path_infile = path.join(self.data_dir, infile) + + document = ap.Document(path_infile) + + default_resolution = 72 + # Stack of current transformation matrices (CTM), starting with the identity matrix + graphics_state = [] + + image_names = list(document.pages[1].resources.images.names) + + graphics_state.append(drawing.drawing2d.Matrix(float(1), float(0), float(0), float(1), float(0), float(0))) + + for op in document.pages[1].contents: + if is_assignable(op, ap.operators.GSave): + graphics_state.append(cast(drawing.drawing2d.Matrix, graphics_state[-1]).clone()) + + elif is_assignable(op, ap.operators.GRestore): + graphics_state.pop() + + elif is_assignable(op, ap.operators.ConcatenateMatrix): + opCM = cast(ap.operators.ConcatenateMatrix, op) + cm = drawing.drawing2d.Matrix( + float(opCM.matrix.a), + float(opCM.matrix.b), + float(opCM.matrix.c), + float(opCM.matrix.d), + float(opCM.matrix.e), + float(opCM.matrix.f), + ) + + graphics_state[-1].multiply(cm) + continue + + elif is_assignable(op, ap.operators.Do): + opDo = cast(ap.operators.Do, op) + if opDo.name in image_names: + last_ctm = cast(drawing.drawing2d.Matrix, graphics_state[-1]) + # The image collection is 1-based + index = image_names.index(opDo.name) + 1 + image = document.pages[1].resources.images[index] + + scaled_width = math.sqrt(last_ctm.elements[0] ** 2 + last_ctm.elements[1] ** 2) + scaled_height = math.sqrt(last_ctm.elements[2] ** 2 + last_ctm.elements[3] ** 2) + + original_width = image.width + original_height = image.height + + res_horizontal = original_width * default_resolution / scaled_width + res_vertical = original_height * default_resolution / scaled_height + + print( + f"image {opDo.name} " + f"({scaled_width:.2f}:{scaled_height:.2f}): " + f"res {res_horizontal:.2f} x {res_vertical:.2f}" + ) +``` \ No newline at end of file diff --git a/en/python-net/advanced-operations/working-with-images/replace-image-in-existing-pdf-file/_index.md
b/en/python-net/advanced-operations/working-with-images/replace-image-in-existing-pdf-file/_index.md new file mode 100644 index 0000000000..3b203307ed --- /dev/null +++ b/en/python-net/advanced-operations/working-with-images/replace-image-in-existing-pdf-file/_index.md @@ -0,0 +1,83 @@ +--- +title: Replace Image in Existing PDF File using Python +linktitle: Replace Image +type: docs +weight: 70 +url: /python-net/replace-image-in-existing-pdf-file/ +description: This section describes how to replace an image in an existing PDF file using Python library. +lastmod: "2025-09-17" +TechArticle: true +AlternativeHeadline: Replace an Image in PDF +Abstract: The Aspose.PDF for Python via .NET documentation provides a comprehensive guide on replacing images within existing PDF files. This functionality is essential for tasks such as updating logos, graphics, or other visual elements in a PDF document without altering its textual content. +--- + +## Replace an Image in PDF + +This example shows how to replace an existing image on a PDF page with a new image using Aspose.PDF for Python via .NET. + +1. Import the necessary modules (aspose.pdf, os.path, FileIO). +1. Define paths for: + - Input PDF (infile) + - New image file (image_file) + - Output PDF (outfile) +1. Load the PDF document using 'apdf.Document(path_infile)'. +1. Open the new image file in binary read mode. +1. Replace the first image on the first page: + - 'document.pages[1].resources.images.replace(1, image_stream)' +1. Save the updated PDF to 'path_outfile'.
+ +```python + + import aspose.pdf as apdf + from io import FileIO + from os import path + + path_infile = path.join(self.data_dir, infile) + path_image_file = path.join(self.data_dir, image_file) + path_outfile = path.join(self.data_dir, outfile) + + document = apdf.Document(path_infile) + + with FileIO(path_image_file, "rb") as image_stream: + # The image collection is 1-based: index 1 is the first image + document.pages[1].resources.images.replace(1, image_stream) + + document.save(path_outfile) +``` + +## Replace a Specific Image + +This example demonstrates how to replace a specific image on a PDF page by locating it via image placement detection. + +1. Load the PDF using 'apdf.Document()'. +1. Create an 'ImagePlacementAbsorber' to collect all image placements on the page. +1. Accept the absorber on the first page ('document.pages[1].accept(absorber)'). +1. Check if any image placements exist on the page. +1. Select the first image placement (absorber.image_placements[1]) and replace it. +1. Save the modified PDF to 'path_outfile'. + +```python + + import aspose.pdf as apdf + from io import FileIO + from os import path + + path_infile = path.join(self.data_dir, infile) + path_image_file = path.join(self.data_dir, image_file) + path_outfile = path.join(self.data_dir, outfile) + + document = apdf.Document(path_infile) + + # Create ImagePlacementAbsorber to find image placements + absorber = apdf.ImagePlacementAbsorber() + + # Accept the absorber for the first page + document.pages[1].accept(absorber) + + # Replace the first image placement found + if len(absorber.image_placements) > 0: + image_placement = absorber.image_placements[1] + with FileIO(path_image_file, "rb") as image_stream: + image_placement.replace(image_stream) + + document.save(path_outfile) +``` \ No newline at end of file diff --git a/en/python-net/advanced-operations/working-with-images/search-and-get-images-from-pdf-document/_index.md b/en/python-net/advanced-operations/working-with-images/search-and-get-images-from-pdf-document/_index.md new file mode
100644 index 0000000000..4810048afb --- /dev/null +++ b/en/python-net/advanced-operations/working-with-images/search-and-get-images-from-pdf-document/_index.md @@ -0,0 +1,203 @@ +--- +title: Get and Search Images in PDF +linktitle: Get and Search Images +type: docs +weight: 40 +url: /python-net/search-and-get-images-from-pdf-document/ +description: Learn how to search and get images from the PDF document in Python using Aspose.PDF. +lastmod: "2025-09-17" +TechArticle: true +AlternativeHeadline: Searching and Extracting Images from PDF +Abstract: The Aspose.PDF for Python via .NET library offers robust capabilities for searching and extracting images from PDF documents. Utilizing the 'ImagePlacementAbsorber' class, developers can efficiently locate and access images embedded across all pages of a PDF. +--- + +## Inspect Image Placement Properties in a PDF Page + +This example demonstrates how to analyze and display properties of all images on a specific PDF page using Aspose.PDF for Python via .NET. + +1. Create an 'ImagePlacementAbsorber' to collect all images on the page. +1. Call 'document.pages[1].accept(absorber)' to analyze image placements on the first page. +1. Iterate through 'absorber.image_placements' and display key properties of each image: + - Width and Height (points). + - Lower-left X (LLX) and Lower-left Y (LLY) coordinates. + - Horizontal (X) and Vertical (Y) resolution (DPI). 
+ +```python + + import aspose.pdf as ap + from os import path + + path_infile = path.join(self.data_dir, infile) + + document = ap.Document(path_infile) + absorber = ap.ImagePlacementAbsorber() + document.pages[1].accept(absorber) + + for image_placement in absorber.image_placements: + # Display image placement properties for all placements + print("image width: " + str(image_placement.rectangle.width)) + print("image height: " + str(image_placement.rectangle.height)) + print("image LLX: " + str(image_placement.rectangle.llx)) + print("image LLY: " + str(image_placement.rectangle.lly)) + print("image horizontal resolution: " + str(image_placement.resolution.x)) + print("image vertical resolution: " + str(image_placement.resolution.y)) +``` + +## Extract and Count Image Types in a PDF + +This example analyzes all images on the first page of a PDF and counts how many of them are grayscale and how many are RGB. + +1. Create an 'ImagePlacementAbsorber' to collect all images on the page. +1. Initialize counters for grayscale and RGB images. +1. Call 'document.pages[1].accept(absorber)' to analyze image placements. +1. Print the total number of images found. +1. Iterate through each image in 'absorber.image_placements': + - Get the image color type using 'image_placement.image.get_color_type()'. + - Increment the corresponding counter (grayscaled or rgb). + - Print a message for each image indicating whether it is grayscale or RGB.
+ +```python + + import aspose.pdf as ap + from os import path + + path_infile = path.join(self.data_dir, infile) + + document = ap.Document(path_infile) + absorber = ap.ImagePlacementAbsorber() + + # Counters for grayscale and RGB images + grayscaled = 0 + rgb = 0 + + document.pages[1].accept(absorber) + + print("--------------------------------") + print("Total Images = " + str(len(absorber.image_placements))) + + image_counter = 1 + + for image_placement in absorber.image_placements: + # Determine the color type of the image + color_type = image_placement.image.get_color_type() + if color_type == ap.ColorType.GRAYSCALE: + grayscaled += 1 + print(f"Image {image_counter} is Grayscale...") + elif color_type == ap.ColorType.RGB: + rgb += 1 + print(f"Image {image_counter} is RGB...") + image_counter += 1 + + print(f"Grayscale images: {grayscaled}, RGB images: {rgb}") +``` + +## Extract Detailed Image Information from a PDF + +This example analyzes all images on the first page of a PDF and calculates their scaled dimensions and effective resolution based on the page’s graphics transformations. + +1. Load the PDF and initialize variables. +1. Collect the names of the page's image resources. +1. Process page content operators: + - 'GSave' - push the current CTM onto the stack. + - 'GRestore' - pop the last CTM from the stack. + - 'ConcatenateMatrix' - update the current CTM by multiplying it with the operator’s matrix. + - 'Do' - if the operator draws an image, compute its scaled size from the current CTM and its effective resolution. +1. Print image name, scaled dimensions, and calculated resolution.
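The resolution calculation in the last step is plain arithmetic: PDF user space has 72 points per inch, so an image that is N pixels wide and drawn across W points has an effective horizontal resolution of N × 72 / W DPI. A minimal sketch (the function name and the sample sizes are hypothetical):

```python
def effective_resolution(pixel_width, pixel_height, scaled_width, scaled_height,
                         default_resolution=72):
    """Return (horizontal_dpi, vertical_dpi) for an image of the given pixel
    size drawn at scaled_width x scaled_height PDF points (72 points = 1 inch)."""
    res_horizontal = pixel_width * default_resolution / scaled_width
    res_vertical = pixel_height * default_resolution / scaled_height
    return res_horizontal, res_vertical


# Hypothetical 600 x 300 px image drawn at 200 x 100 pt
print(effective_resolution(600, 300, 200, 100))  # (216.0, 216.0)
```

Drawing the same image at 600 × 300 pt would give exactly 72 DPI; shrinking it on the page raises its effective resolution.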
+ +```python + + import math + import aspose.pdf as ap + import aspose.pydrawing as drawing + from aspose.pycore import cast, is_assignable + from os import path + + path_infile = path.join(self.data_dir, infile) + + document = ap.Document(path_infile) + + default_resolution = 72 + # Stack of current transformation matrices (CTM), starting with the identity matrix + graphics_state = [] + + image_names = list(document.pages[1].resources.images.names) + + graphics_state.append(drawing.drawing2d.Matrix(float(1), float(0), float(0), float(1), float(0), float(0))) + + for op in document.pages[1].contents: + if is_assignable(op, ap.operators.GSave): + graphics_state.append(cast(drawing.drawing2d.Matrix, graphics_state[-1]).clone()) + + elif is_assignable(op, ap.operators.GRestore): + graphics_state.pop() + + elif is_assignable(op, ap.operators.ConcatenateMatrix): + opCM = cast(ap.operators.ConcatenateMatrix, op) + cm = drawing.drawing2d.Matrix( + float(opCM.matrix.a), + float(opCM.matrix.b), + float(opCM.matrix.c), + float(opCM.matrix.d), + float(opCM.matrix.e), + float(opCM.matrix.f), + ) + + graphics_state[-1].multiply(cm) + continue + + elif is_assignable(op, ap.operators.Do): + opDo = cast(ap.operators.Do, op) + if opDo.name in image_names: + last_ctm = cast(drawing.drawing2d.Matrix, graphics_state[-1]) + # The image collection is 1-based + index = image_names.index(opDo.name) + 1 + image = document.pages[1].resources.images[index] + + scaled_width = math.sqrt(last_ctm.elements[0] ** 2 + last_ctm.elements[1] ** 2) + scaled_height = math.sqrt(last_ctm.elements[2] ** 2 + last_ctm.elements[3] ** 2) + + original_width = image.width + original_height = image.height + + res_horizontal = original_width * default_resolution / scaled_width + res_vertical = original_height * default_resolution / scaled_height + + print( + f"image {opDo.name} " + f"({scaled_width:.2f}:{scaled_height:.2f}): " + f"res {res_horizontal:.2f} x {res_vertical:.2f}" + ) +``` + +## Extract Alternative Text from Images in a PDF + +This example retrieves alternative text (alt text) from all images on the first page of a PDF, which is useful for
accessibility and PDF/UA compliance checks. + +1. Load the PDF document using 'ap.Document()'. +1. Create an 'ImagePlacementAbsorber' to collect all images on the page. +1. Accept the absorber on the first page (page.accept(absorber)). +1. Iterate through each image in 'absorber.image_placements': + - Print the name of the image in the page’s resource collection (get_name_in_collection()). + - Retrieve the alternative text using 'get_alternative_text(page)'. + - Print the first line of the alt text, if there is any. + +```python + + import aspose.pdf as ap + from os import path + + path_infile = path.join(self.data_dir, infile) + + document = ap.Document(path_infile) + absorber = ap.ImagePlacementAbsorber() + page = document.pages[1] + page.accept(absorber) + + for image_placement in absorber.image_placements: + print( + "Name in collection: " + + str(image_placement.image.get_name_in_collection()) + ) + lines = image_placement.image.get_alternative_text(page) + # Images without alt text return no lines + if lines: + print("Alt Text: " + lines[0]) +``` \ No newline at end of file diff --git a/en/python-net/converting/_index.md b/en/python-net/converting/_index.md index 6aa69434e4..749ba66096 100644 --- a/en/python-net/converting/_index.md +++ b/en/python-net/converting/_index.md @@ -64,6 +64,10 @@ If honestly, externally, it is very difficult to determine if it is PDF or PDF/A - [Convert other file formats to PDF](/pdf/python-net/convert-other-files-to-pdf/) - this topic describes conversion with various formats like EPUB, XPS, Postscript, text and others. +- [Convert PDF/x to PDF](/pdf/python-net/convert-pdf_x-to-pdf/) - this topic describes converting PDF/UA and PDF/A to PDF. + +- [Convert PDF to PDF/x](/pdf/python-net/convert-pdf-to-pdf_x/) - this topic describes converting PDF to PDF/A, PDF/E and PDF/X formats.
+ ## Try to convert PDF files online {{% alert color="success" %}} diff --git a/en/python-net/converting/convert-html-to-pdf/_index.md b/en/python-net/converting/convert-html-to-pdf/_index.md index 1ed2451aa1..dd1ca64442 100644 --- a/en/python-net/converting/convert-html-to-pdf/_index.md +++ b/en/python-net/converting/convert-html-to-pdf/_index.md @@ -14,17 +14,6 @@ AlternativeHeadline: How to convert HTML to PDF using Aspose.PDF for Python Abstract: Aspose.PDF for Python via .NET offers a robust solution for creating PDF files from web pages and raw HTML code within applications. This article provides a guide on converting HTML to PDF using Python, outlining the use of Aspose.PDF for Python, a PDF manipulation API that enables seamless conversion of HTML documents to PDF format. The conversion process can be customized as needed. The article includes a Python code sample demonstrating the conversion process, which involves creating an instance of the HtmlLoadOptions class, initializing a Document object, and saving the output PDF document using the Document.Save() method. Additionally, Aspose offers an online tool for converting HTML to PDF, allowing users to explore the functionality and quality of the conversion process. --- -## Overview - -Aspose.PDF for Python via .NET is a professional solution that allows you to create PDF files from web pages and raw HTML code in your applications. - -This article explains how to **convert HTML to PDF using Python**. It covers the following topics. - -_Format_: **HTML** -- [Python HTML to PDF](#python-html-to-pdf) -- [Python Convert HTML to PDF](#python-html-to-pdf) -- [Python How to convert HTML to PDF](#python-html-to-pdf) - ## Python HTML to PDF Conversion **Aspose.PDF for Python** is a PDF manipulation API that lets you convert any existing HTML documents to PDF seamlessly. The process of converting HTML to PDF can be flexibly customized. 
@@ -62,11 +51,13 @@ Aspose presents you online free application ["HTML to PDF"](https://products.asp [![Aspose.PDF Convertion HTML to PDF using Free App](html.png)](https://products.aspose.app/html/en/conversion/html-to-pdf) {{% /alert %}} -## Convert HTML to PDF media type +## Convert HTML to PDF using media type -1. Create an instance of the [HtmlLoadOptions()](https://reference.aspose.com/pdf/python-net/aspose.pdf/htmlloadoptions/) class. 'html_media_type' applies CSS rules intended for on-screen display. -1. Load and convert HTML. -1. Save output PDF document by calling **document.save()** method. +This example shows how to convert an HTML file to PDF using Aspose.PDF for Python via .NET with specific rendering options. + +1. Create an instance of the [HtmlLoadOptions()](https://reference.aspose.com/pdf/python-net/aspose.pdf/htmlloadoptions/) class. The 'html_media_type' property selects which CSS media rules are applied: set it to HtmlMediaType.SCREEN to apply rules intended for on-screen display, or to HtmlMediaType.PRINT to apply rules intended for printing. +1. Load the HTML into an ap.Document using the load options. +1. Save the document as a PDF. ```python @@ -87,9 +78,12 @@ Aspose presents you online free application ["HTML to PDF"](https://products.asp ## Convert HTML to PDF priority CSS Page Rule +Some documents define layout settings with the [@page rule](https://developer.mozilla.org/en-US/docs/Web/CSS/@page), which can create ambiguity when the layout is generated. The 'is_priority_css_page_rule' property controls this priority: if it is set to 'True', the @page rule is applied first. + 1. Create an instance of the [HtmlLoadOptions](https://reference.aspose.com/pdf/python-net/aspose.pdf/htmlloadoptions/) class. -1. Optionally sets how CSS is applied. The commented line 'is_priority_css_page_rule' can override how CSS rules affect page layout. -1. Converts and saves it as a PDF. +1.
Set 'is_priority_css_page_rule = False' to disable prioritizing @page CSS rules, allowing other styles to take precedence. +1. Load the HTML into an ap.Document with the configured options. +1. Save the document as a PDF. ```python @@ -108,11 +102,14 @@ Aspose presents you online free application ["HTML to PDF"](https://products.asp print(infile + " converted into " + outfile) ``` -## Convert HTML to PDF Embed Fonts +## Convert HTML to PDF with Embedded Fonts -1. Loads an HTML file. -1. Embeds fonts into the resulting PDF to preserve appearance across devices. -1. Saves the PDF to a specified output location. +This example shows how to convert an HTML file to PDF while embedding fonts. If you need a PDF document with embedded fonts, set 'is_embed_fonts' to True. + +1. Create 'HtmlLoadOptions()' to configure HTML to PDF conversion. +1. Set 'is_embed_fonts = True' to ensure that all fonts used in the HTML are embedded directly into the PDF, preserving visual fidelity. +1. Load the HTML into an ap.Document with these options. +1. Save the document as a PDF. ```python @@ -131,16 +128,17 @@ Aspose presents you online free application ["HTML to PDF"](https://products.asp print(infile + " converted into " + outfile) ``` -## Render content on single page during HTML to PDF conversion +## Render content on a single page during HTML to PDF conversion -Aspose.PDF for Python via .NET to convert an HTML file into a single-page PDF. +This example demonstrates how to convert an HTML file into a single-page PDF using Aspose.PDF for Python via .NET. +You can display all content on one page using the 'is_render_to_single_page' property. -1. Loads an HTML file. -1. Configures Aspose to render everything on a single PDF page. -1. Saves the PDF to a specified location. +1. Create an instance of 'HtmlLoadOptions()' to configure the conversion process. +1. Enable 'is_render_to_single_page' to render the entire HTML content onto a single continuous PDF page. +1.
Load the document with the configured options into an 'ap.Document'. +1. Save the result as a PDF file. ```python - from os import path import aspose.pdf as ap import requests @@ -158,12 +156,12 @@ Aspose.PDF for Python via .NET to convert an HTML file into a single-page PDF. ## Convert MHTML to PDF -Loads an MHT (MHTML) file from disk and converts it into a PDF using Aspose.PDF for Pyhton via .NET. Allows specifying custom page dimensions to ensure a consistent layout. With this operation, you can support embedded HTML, CSS, and images in MHT files, as well as Configurable page width and height, and save the converted document to a specified output folder. +This example shows how to convert an MHT (MHTML) file into a PDF document using Aspose.PDF for Python with specific page dimensions. -1. Loads an MHT (MHTML) file. -1. Sets the page size for the converted document. -1. Converts the content into PDF (or another format) using Aspose.PDF. -1. Saves the file to a specific folder and prints a confirmation. +1. Create an instance of ap.MhtLoadOptions() to configure MHT file processing. +1. Set various parameters, such as page size. +1. Initialize the document with the input file and configured loading options. +1. Save the resulting document as a PDF. ```python diff --git a/en/python-net/converting/convert-images-format-to-pdf/_index.md b/en/python-net/converting/convert-images-format-to-pdf/_index.md index a425e0e7d3..453a3fec7a 100644 --- a/en/python-net/converting/convert-images-format-to-pdf/_index.md +++ b/en/python-net/converting/convert-images-format-to-pdf/_index.md @@ -11,53 +11,9 @@ sitemap: priority: 0.5 TechArticle: true AlternativeHeadline: How to Convert Images to PDF in Python -Abstract: This article provides a comprehensive guide on converting various image formats to PDF using Python, specifically leveraging the Aspose.PDF library for Python via .NET. The article covers a range of image formats including BMP, CGM, DICOM, EMF, GIF, PNG, SVG, and TIFF. 
Each section details the steps required to perform the conversion, providing code snippets to illustrate the process. For example, converting BMP to PDF involves creating a new PDF document, defining image placement, inserting the image, and saving the document. Similarly, for formats like CGM, DICOM, and others, specific load options and processing steps are outlined. The article also highlights the advantages of using Aspose.PDF for such tasks, such as its support for different encoding methods and the ability to process both single-frame and multi-frame images. Additionally, it references online tools provided by Aspose for users to try out these conversions without code. Each format is discussed with its unique characteristics and requirements for conversion, providing a +Abstract: This article provides a comprehensive guide on converting various image formats to PDF using Python, specifically leveraging the Aspose.PDF library for Python via .NET. The article covers a range of image formats including BMP, CGM, DICOM, EMF, GIF, PNG, SVG, and TIFF. Each section details the steps required to perform the conversion, providing code snippets to illustrate the process. For example, converting BMP to PDF involves creating a new PDF document, defining image placement, inserting the image, and saving the document. Similarly, for formats like CGM, DICOM, and others, specific load options and processing steps are outlined. The article also highlights the advantages of using Aspose.PDF for such tasks, such as its support for different encoding methods and the ability to process both single-frame and multi-frame images. --- -## Overview - -This article explains how to convert various Images formats to PDF using Python. It covers these topics. 
- -_Format_: **BMP** -- [Python BMP to PDF](#python-bmp-to-pdf) -- [Python Convert BMP to PDF](#python-bmp-to-pdf) -- [Python How to convert BMP image to PDF](#python-bmp-to-pdf) - -_Format_: **CGM** -- [Python CGM to PDF](#python-cgm-to-pdf) -- [Python Convert CGM to PDF](#python-cgm-to-pdf) -- [Python How to convert CGM image to PDF](#python-cgm-to-pdf) - -_Format_: **DICOM** -- [Python DICOM to PDF](#python-dicom-to-pdf) -- [Python Convert DICOM to PDF](#python-dicom-to-pdf) -- [Python How to convert DICOM image to PDF](#python-dicom-to-pdf) - -_Format_: **EMF** -- [Python EMF to PDF](#python-emf-to-pdf) -- [Python Convert EMF to PDF](#python-emf-to-pdf) -- [Python How to convert EMF image to PDF](#python-emf-to-pdf) - -_Format_: **GIF** -- [Python GIF to PDF](#python-gif-to-pdf) -- [Python Convert GIF to PDF](#python-gif-to-pdf) -- [Python How to convert GIF image to PDF](#python-gif-to-pdf) - -_Format_: **PNG** -- [Python PNG to PDF](#python-png-to-pdf) -- [Python Convert PNG to PDF](#python-png-to-pdf) -- [Python How to convert PNG image to PDF](#python-png-to-pdf) - -_Format_: **SVG** -- [Python SVG to PDF](#python-svg-to-pdf) -- [Python Convert SVG to PDF](#python-svg-to-pdf) -- [Python How to convert SVG image to PDF](#python-svg-to-pdf) - -_Format_: **TIFF** -- [Python TIFF to PDF](#python-tiff-to-pdf) -- [Python Convert TIFF to PDF](#python-tiff-to-pdf) -- [Python How to convert TIFF image to PDF](#python-tiff-to-pdf) - ## Python Images to PDF Conversions **Aspose.PDF for Python via .NET** allows you to convert different formats of images to PDF files. Our library demonstrates code snippets for converting the most popular image formats, such as - BMP, CGM, DICOM, EMF, JPG, PNG, SVG and TIFF formats. @@ -72,10 +28,10 @@ You can convert BMP to PDF files with Aspose.PDF for Python via .NET API. Theref Steps to Convert BMP to PDF in Python: -1. Creates an empty PDF document. -1. Adds a new A4-sized page. +1. Create an empty PDF document. +1. 
Add a page of the size you need; this example uses A4, but you can specify your own format. 1. Places the image (from infile) inside the page using the defined rectangle. -1. Saves the document as PDF. +1. Save the document as PDF. So the following code snippet follows these steps and shows how to convert BMP to PDF using Python: @@ -109,7 +65,7 @@ Aspose presents you online free application ["BMP to PDF"](https://products.aspo ## Convert CGM to PDF -Converts a CGM (Computer Graphics Metafile) into PDF (or another supported format) using Aspose.PDF for Python via .NET. +Convert a CGM (Computer Graphics Metafile) into PDF (or another supported format) using Aspose.PDF for Python via .NET. CGM is a file extension for a Computer Graphics Metafile format commonly used in CAD (computer-aided design) and presentation graphics applications. CGM is a vector graphics format that supports three different encoding methods: binary (best for program read speed), character-based (produces the smallest file size and allows for faster data transfers) or cleartext encoding (allows users to read and modify the file with a text editor). @@ -148,7 +104,12 @@ Steps to Convert CGM to PDF in Python: **Aspose.PDF for Python** allows you to convert DICOM and SVG images, but for technical reasons to add images you need to specify the type of file to be added to PDF. -The following code snippet shows how to convert DICOM files to PDF format with Aspose.PDF. You should load DICOM image, place the image on a page in a PDF file and save the output as PDF. +The following code snippet shows how to convert DICOM files to PDF format with Aspose.PDF. Load the DICOM image, place it on a page in a PDF file, and save the output as PDF. The example uses the additional pydicom library to read the image dimensions so that the page size matches the image. If you position the image on the page yourself, you can skip that part of the snippet. + +1. Initialize a new 'ap.Document()' and add a page. +1. Insert the DICOM image.
Create an apdf.Image(), set its type to DICOM, and assign the file path. +1. Adjust the page size. Match the PDF page dimensions to the DICOM image size and remove the margins. +1. Add the image to the page and save the document to the output file. 1. Load the DICOM file. 1. Extract image dimensions. @@ -297,10 +258,10 @@ Aspose presents you online free application ["GIF to PDF"](https://products.aspo You can convert PNG to PDF image using the below steps: -1. Create a New PDF Document -1. Define Image Placement -1. Save the PDF -1. Print Conversion Message +1. Create a new PDF document. +1. Define the image placement. +1. Save the PDF. +1. Print a conversion message. Moreover, the code snippet below shows how to convert PNG to PDF with Python: @@ -336,11 +297,11 @@ Aspose presents you online free application ["PNG to PDF"](https://products.aspo ## Convert SVG to PDF -**Aspose.PDF for Python via .NET** explains how to convert SVG images to PDF format and how to get dimensions of the source SVG file. +This section explains how to convert SVG images to PDF format with **Aspose.PDF for Python via .NET** and how to get the dimensions of the source SVG file. Scalable Vector Graphics (SVG) is a family of specifications of an XML-based file format for two-dimensional vector graphics, both static and dynamic (interactive or animated). The SVG specification is an open standard that has been under development by the World Wide Web Consortium (W3C) since 1999. -SVG images and their behaviors are defined in XML text files. This means that they can be searched, indexed, scripted, and if required, compressed. As XML files, SVG images can be created and edited with any text editor, but it is often more convenient to create them with drawing programs such as Inkscape. +SVG images and their behaviors are defined in XML text files. This means that they can be searched, indexed, scripted, and if required, compressed.
As XML files, SVG images can be created and edited with any text editor, but it is often more convenient to create them with drawing programs such as Inkscape. {{% alert color="success" %}} **Try to convert SVG format to PDF online** @@ -374,7 +335,7 @@ The following code snippet shows the process of converting SVG file into PDF for ## Convert TIFF to PDF -**Aspose.PDF** file format supported, be it a single frame or multi-frame TIFF image. It means that you can convert the TIFF image to PDF. +**Aspose.PDF** supports TIFF images, whether single-frame or multi-frame, so you can convert a TIFF image to PDF. TIFF or TIF, Tagged Image File Format, represents raster images that are meant for usage on a variety of devices that comply with this file format standard. TIFF image can contain several frames with different images. Aspose.PDF file format is also supported, be it a single frame or multi-frame TIFF image. @@ -404,11 +365,11 @@ You can convert TIFF to PDF in the same manner as the rest raster file formats g ## Convert CDR to PDF -Next code snippet shows how to load a CorelDRAW (CDR) file and save it as a PDF using 'CdrLoadOptions' in Aspose.PDF for Python via .NET. +The following code snippet shows how to load a CorelDRAW (CDR) file and save it as a PDF using 'CdrLoadOptions' in Aspose.PDF for Python via .NET. -1. Initialize load options for CDR format. -1. Load CDR file into Document object. -1. Save the document in PDF format. +1. Create 'CdrLoadOptions()' to configure how the CDR file should be loaded. +1. Initialize a Document object with the CDR file and load options. +1. Save the document as a PDF. ```python @@ -432,7 +393,7 @@ Next code snippet shows how to load a CorelDRAW (CDR) file and save it as a PDF ## Convert JPEG to PDF -This example shows how to create a new PDF document, add a blank A4-sized page, and insert an image into it using Aspose.PDF for Python via .NET.
+This example shows how to convert a JPEG image to a PDF file using Aspose.PDF for Python via .NET. 1. Create a new PDF document. 1. Add a new page. @@ -461,47 +422,3 @@ This example shows how to create a new PDF document, add a blank A4-sized page, document.save(path_outfile) print(infile + " converted into " + outfile) ``` - -## See Also - -This article also covers these topics. The codes are same as above. - -_Format_: **BMP** -- [Python BMP to PDF Code](#python-bmp-to-pdf) -- [Python BMP to PDF API](#python-bmp-to-pdf) -- [Python BMP to PDF Programmatically](#python-bmp-to-pdf) -- [Python BMP to PDF Library](#python-bmp-to-pdf) -- [Python Save BMP as PDF](#python-bmp-to-pdf) -- [Python Generate PDF from BMP](#python-bmp-to-pdf) -- [Python Create PDF from BMP](#python-bmp-to-pdf) -- [Python BMP to PDF Converter](#python-bmp-to-pdf) - -_Format_: **CGM** -- [Python CGM to PDF Code](#python-cgm-to-pdf) -- [Python CGM to PDF API](#python-cgm-to-pdf) -- [Python CGM to PDF Programmatically](#python-cgm-to-pdf) -- [Python CGM to PDF Library](#python-cgm-to-pdf) -- [Python Save CGM as PDF](#python-cgm-to-pdf) -- [Python Generate PDF from CGM](#python-cgm-to-pdf) -- [Python Create PDF from CGM](#python-cgm-to-pdf) -- [Python CGM to PDF Converter](#python-cgm-to-pdf) - -_Format_: **DICOM** -- [Python DICOM to PDF Code](#python-dicom-to-pdf) -- [Python DICOM to PDF API](#python-dicom-to-pdf) -- [Python DICOM to PDF Programmatically](#python-dicom-to-pdf) -- [Python DICOM to PDF Library](#python-dicom-to-pdf) -- [Python Save DICOM as PDF](#python-dicom-to-pdf) -- [Python Generate PDF from DICOM](#python-dicom-to-pdf) -- [Python Create PDF from DICOM](#python-dicom-to-pdf) -- [Python DICOM to PDF Converter](#python-dicom-to-pdf) - -_Format_: **EMF** -- [Python EMF to PDF Code](#python-emf-to-pdf) -- [Python EMF to PDF API](#python-emf-to-pdf) -- [Python EMF to PDF Programmatically](#python-emf-to-pdf) -- [Python EMF to PDF Library](#python-emf-to-pdf) -- [Python Save EMF as
PDF](#python-emf-to-pdf) -- [Python Generate PDF from EMF](#python-emf-to-pdf) -- [Python Create PDF from EMF](#python-emf-to-pdf) -- [Python EMF to PDF Converter](#python-emf-to-pdf) diff --git a/en/python-net/converting/convert-other-files-to-pdf/_index.md b/en/python-net/converting/convert-other-files-to-pdf/_index.md index d44bb5e5bd..35eea7eab6 100644 --- a/en/python-net/converting/convert-other-files-to-pdf/_index.md +++ b/en/python-net/converting/convert-other-files-to-pdf/_index.md @@ -14,50 +14,67 @@ AlternativeHeadline: How to Convert other file formats to PDF in Python Abstract: This article provides a comprehensive guide on converting various file formats to PDF using Python, leveraging the capabilities of Aspose.PDF for Python via .NET. The document outlines conversion processes for several formats, including EPUB, Markdown, PCL, Text, XPS, PostScript, XML, XSL-FO, and LaTeX/TeX. Each section provides specific code snippets and instructions for implementing these conversions. The article emphasizes the utility of Aspose.PDF's features, such as load options tailored for each file type, to ensure accurate and efficient conversion. Additionally, it highlights the availability of free online conversion applications for users to explore the functionality firsthand. The guide serves as a practical resource for developers seeking to integrate PDF conversion capabilities into their Python applications. --- -## Overview - This article explains how to **convert various other types of file formats to PDF using Python**. It covers the following topics. 
-_Format_: **OFD** -- [Python OFD to PDF](#python-convert-ofd-to-pdf) -- [Python Convert OFD to PDF](#python-convert-ofd-to-pdf) -- [Python How to convert OFD file to PDF](#python-convert-ofd-to-pdf) - -_Format_: **EPUB** -- [Python EPUB to PDF](#python-convert-epub-to-pdf) -- [Python Convert EPUB to PDF](#python-convert-epub-to-pdf) -- [Python How to convert EPUB file to PDF](#python-convert-epub-to-pdf) - -_Format_: **Markdown** -- [Python Markdown to PDF](#python-convert-markdown-to-pdf) -- [Python Convert Markdown to PDF](#python-convert-markdown-to-pdf) -- [Python How to convert Markdown file to PDF](#python-convert-markdown-to-pdf) - -_Format_: **MD** -- [Python MD to PDF](#python-convert-md-to-pdf) -- [Python Convert MD to PDF](#python-convert-md-to-pdf) -- [Python How to convert MD file to PDF](#python-convert-md-to-pdf) - -_Format_: **PCL** -- [Python PCL to PDF](#python-convert-pcl-to-pdf) -- [Python Convert PCL to PDF](#python-convert-pcl-to-pdf) -- [Python How to convert PCL file to PDF](#python-convert-pcl-to-pdf) - -_Format_: **Text** -- [Python Text to PDF](#python-convert-text-to-pdf) -- [Python Convert Text to PDF](#python-convert-text-to-pdf) -- [Python How to convert Text file to PDF](#python-convert-text-to-pdf) - -_Format_: **TXT** -- [Python TXT to PDF](#python-convert-txt-to-pdf) -- [Python Convert TXT to PDF](#python-convert-txt-to-pdf) -- [Python How to convert TXT file to PDF](#python-convert-txt-to-pdf) - -_Format_: **XPS** -- [Python XPS to PDF](#python-convert-xps-to-pdf) -- [Python Convert XPS to PDF](#python-convert-xps-to-pdf) -- [Python How to convert XPS file to PDF](#python-convert-xps-to-pdf) +## Convert OFD to PDF + +OFD stands for Open Fixed-layout Document (also called Open Fixed Document format). It is a Chinese national standard (GB/T 33190-2016) for electronic documents, introduced as an alternative to PDF. + +Steps to convert OFD to PDF in Python: + +1. Set up OFD load options using OfdLoadOptions(). +1.
Load the OFD document. +1. Save as PDF. + +```python + + from os import path + import aspose.pdf as ap + + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) + + load_options = ap.OfdLoadOptions() + document = ap.Document(path_infile, load_options) + document.save(path_outfile) + + print(infile + " converted into " + outfile) +``` + +## Convert LaTeX/TeX to PDF + +The LaTeX file format is a text file format with markup in LaTeX, a derivative of the TeX family of languages. LaTeX (/ˈleɪtɛk/, lay-tek or lah-tek) is a document preparation system and document markup language. It is widely used for the communication and publication of scientific documents in many fields, including mathematics, physics, and computer science. It also plays a key role in the preparation and publication of books and articles containing complex multilingual material, such as Korean, Japanese, Chinese, and Arabic text, including special editions. + +LaTeX uses the TeX typesetting program for formatting its output, and is itself written in the TeX macro language. + +{{% alert color="success" %}} +**Try to convert LaTeX/TeX to PDF online** + +Aspose.PDF for Python via .NET offers a free online application ["LaTeX to PDF"](https://products.aspose.app/pdf/conversion/tex-to-pdf), where you can explore the functionality and quality of the conversion. + +[![Aspose.PDF Conversion LaTeX/TeX to PDF with Free App](latex.png)](https://products.aspose.app/pdf/conversion/tex-to-pdf) +{{% /alert %}} + +Steps to convert LaTeX/TeX to PDF in Python: + +1. Set up LaTeX load options using LatexLoadOptions(). +1. Load the LaTeX document. +1. Save as PDF.
+ +```python + + from os import path + import aspose.pdf as ap + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) + + load_options = ap.LatexLoadOptions() + document = ap.Document(path_infile, load_options) + document.save(path_outfile) + + print(infile + " converted into " + outfile) +``` ## Convert OFD to PDF OFD stands for Open Fixed-layout Document (sometimes called Open Fixed Document format). It is a Chinese national standard (GB/T 33190-2016) for electronic documents, introduced as an alternative to PDF. @@ -120,9 +137,9 @@ Steps Convert TEX to PDF in Python: **Aspose.PDF for Python via .NET** allows you simply convert EPUB files to PDF format. -EPUB (short for electronic publication) is a free and open e-book standard from the International Digital Publishing Forum (IDPF). Files have the extension .epub. EPUB is designed for reflowable content, meaning that an EPUB reader can optimize text for a particular display device. +EPUB (short for electronic publication) is a free and open e-book standard from the International Digital Publishing Forum (IDPF). Files have the extension .epub. EPUB is designed for reflowable content, meaning that an EPUB reader can optimize text for a particular display device. -EPUB also supports fixed-layout content. The format is intended as a single format that publishers and conversion houses can use in-house, as well as for distribution and sale. It supersedes the Open eBook standard.The version EPUB 3 is also endorsed by the Book Industry Study Group (BISG), a leading book trade association for standardized best practices, research, information and events, for packaging of content. +EPUB also supports fixed-layout content. The format is intended as a single format that publishers and conversion houses can use in-house, as well as for distribution and sale. 
It supersedes the Open eBook standard. EPUB 3 is also endorsed by the Book Industry Study Group (BISG), a leading book trade association for standardized best practices, research, information and events, for packaging of content. {{% alert color="success" %}} **Try to convert EPUB to PDF online** @@ -224,7 +241,7 @@ Steps Convert PCL to PDF in Python: print(infile + " converted into " + outfile) ``` -## Convert Text to PDF +## Convert Preformatted Text to PDF **Aspose.PDF for Python via .NET** support the feature converting plain text and pre-formatted text file to PDF format. @@ -240,12 +257,12 @@ Aspose.PDF for Python via .NET presents you online free application ["Text to PD Steps Convert TEXT to PDF in Python: -1. Read the text file. -1. Set up font. -1. Create PDF and first page. -1. Page formatting. -1. Loop through lines. -1. Save PDF. +1. Read the input text file line by line. +1. Set up a monospaced font (Courier New) for consistent text alignment. +1. Create a new PDF Document and add the first page with custom margins and font settings. +1. Iterate through the lines of the text file. To simulate a typewriter, the example uses the 'monospace_font' font at size 12. +1. Limit page creation to 4 pages. +1. Save the final PDF to the specified path. ```python @@ -288,6 +305,30 @@ Steps Convert TEXT to PDF in Python: print(infile + " converted into " + outfile) ``` +## Convert PostScript to PDF + +This example demonstrates how to convert a PostScript file into a PDF document using Aspose.PDF for Python via .NET. + +1. Create an instance of 'PsLoadOptions' to correctly interpret the PS file. +1. Load the PostScript file into a Document object using the load options. +1. Save the document in PDF format to the desired output path.
+ +```python + + from os import path + import aspose.pdf as ap + + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) + + load_options = ap.PsLoadOptions() + + document = ap.Document(path_infile, load_options) + document.save(path_outfile) + + print(infile + " converted into " + outfile) +``` + ## Convert XPS to PDF **Aspose.PDF for Python via .NET** support feature converting XPS files to PDF format. Check this article to resolve your tasks. @@ -341,3 +382,55 @@ Following code snippet can be used to convert a XSLFO to PDF format with Aspose. print(xmlfile + " converted into " + outfile) ``` + +## Convert XML with XSLT to PDF + +This example demonstrates how to convert an XML file into a PDF by first transforming it into HTML using an XSLT template and then loading the HTML into Aspose.PDF. + +1. Create an instance of 'HtmlLoadOptions' to configure HTML-to-PDF conversion. +1. Transform the XML file into a temporary HTML file with the XSLT template (the example uses the lxml library). +1. Load the transformed HTML file into an Aspose.PDF Document object. +1. Save the document as a PDF at the specified output path. +1. Remove the temporary HTML file after successful conversion.
+ +```python + + import os + from os import path + import aspose.pdf as ap + + def transform_xml_to_html(xml_file, xslt_file, html_file): + """ + Transform XML to HTML using XSLT and write the result to html_file + """ + from lxml import etree + + # Parse XML document + xml_doc = etree.parse(xml_file) + + # Parse XSLT stylesheet + xslt_doc = etree.parse(xslt_file) + transform = etree.XSLT(xslt_doc) + + # Apply transformation + result = transform(xml_doc) + + # Save result to HTML file + with open(html_file, 'w', encoding='utf-8') as f: + f.write(str(result)) + + + def convert_XML_to_PDF(template, infile, outfile): + # data_dir is the base folder for input and output files, defined elsewhere + path_infile = path.join(data_dir, infile) + path_outfile = path.join(data_dir, "python", outfile) + path_template = path.join(data_dir, template) + path_temp_file = path.join(data_dir, "temp.html") + + load_options = ap.HtmlLoadOptions() + transform_xml_to_html(path_infile, path_template, path_temp_file) + + document = ap.Document(path_temp_file, load_options) + document.save(path_outfile) + + # Clean up the intermediate HTML file + if path.exists(path_temp_file): + os.remove(path_temp_file) + + print(infile + " converted into " + outfile) +``` diff --git a/en/python-net/converting/convert-pdf-to-excel/_index.md b/en/python-net/converting/convert-pdf-to-excel/_index.md index 6432a3c42b..b8b6abbb57 100644 --- a/en/python-net/converting/convert-pdf-to-excel/_index.md +++ b/en/python-net/converting/convert-pdf-to-excel/_index.md @@ -14,40 +14,6 @@ AlternativeHeadline: How to Convert PDF to Excel in Python Abstract: This article provides a comprehensive guide on converting PDF files to various Excel formats using Python, specifically with the Aspose.PDF for Python via .NET library. It details the conversion processes for XLS, XLSX, CSV, and ODS formats. The document explains the steps needed to convert PDF to XLS and XLSX, highlighting the creation of Document and ExcelSaveOptions instances, and the use of the Document.Save() method to specify output formats.
The article also discusses features such as controlling the insertion of blank columns and minimizing worksheet numbers during conversion. Additionally, it provides examples of converting PDFs to single Excel worksheets and other formats like CSV and ODS, emphasizing the flexibility and functionality of Aspose.PDF. An online tool for PDF to XLSX conversion is also mentioned, allowing users to explore the conversion quality. The article concludes with a list of related topics and code snippets to further aid in understanding and implementing these conversions programmatically. --- -## Overview - -This article explains how to **convert PDF to Excel formats using Python**. It covers the following topics. - -_Format_: **XLS** - -- [Python PDF to XLS](#python-pdf-to-xls) -- [Python Convert PDF to XLS](#python-pdf-to-xls) -- [Python How to convert PDF file to XLS](#python-pdf-to-xls) - -_Format_: **XLSX** - -- [Python PDF to XLSX](#python-pdf-to-xlsx) -- [Python Convert PDF to XLSX](#python-pdf-to-xlsx) -- [Python How to convert PDF file to XLSX](#python-pdf-to-xlsx) - -_Format_: **Excel** - -- [Python PDF to Excel](#python-pdf-to-xlsx) -- [Python PDF to Excel XLS](#python-pdf-to-xls) -- [Python PDF to Excel XLSX](#python-pdf-to-xlsx) - -_Format_: **CSV** - -- [Python PDF to CSV](#python-pdf-to-csv) -- [Python Convert PDF to CSV](#python-pdf-to-csv) -- [Python How to convert PDF file to CSV](#python-pdf-to-csv) - -_Format_: **ODS** - -- [Python PDF to ODS](#python-pdf-to-ods) -- [Python Convert PDF to ODS](#python-pdf-to-ods) -- [Python How to convert PDF file to ODS](#python-pdf-to-ods) - ## PDF to EXCEL conversion via Python **Aspose.PDF for Python via .NET** support the feature of converting PDF files to Excel, and CSV formats. @@ -64,7 +30,7 @@ Aspose.PDF presents you online free application ["PDF to XLSX"](https://products The following code snippet shows the process for converting PDF file into XLS or XLSX format with Aspose.PDF for Python via .NET. 
-Steps: Convert a PDF file to an Excel (XML Spreadsheet 2003) format +Steps: Convert a PDF file to an Excel (XML Spreadsheet 2003) format 1. Load the PDF document. 1. Set up Excel save options using [ExcelSaveOptions](https://reference.aspose.com/pdf/python-net/aspose.pdf/excelsaveoptions/). @@ -86,7 +52,7 @@ The following code snippet shows the process for converting PDF file into XLS or print(infile + " converted into " + outfile) ``` -Steps: Convert a PDF file to an XLSX format (Excel 2007+) +Steps: Convert a PDF file to an XLSX format (Excel 2007+) 1. Load the PDF document. 1. Set up Excel save options using [ExcelSaveOptions](https://reference.aspose.com/pdf/python-net/aspose.pdf/excelsaveoptions/). @@ -133,7 +99,7 @@ When converting a PDF to XLS format, a blank column is added to the output file Aspose.PDF for Python via .NET shows how to convert a PDF to an Excel (.xlsx) file, with the 'minimize_the_number_of_worksheets' option enabled. -Steps: Convert PDF to XLS or XLSX Single Worksheet in Python +Steps: Convert PDF to XLS or XLSX Single Worksheet in Python 1. Load the PDF document. 1. Set up Excel save options using [ExcelSaveOptions](https://reference.aspose.com/pdf/python-net/aspose.pdf/excelsaveoptions/). @@ -173,7 +139,6 @@ This Python example shows how to convert a PDF file into an Excel file in XLSM f save_options = apdf.ExcelSaveOptions() save_options.format = apdf.ExcelSaveOptions.ExcelFormat.XLSM document.save(path_outfile, save_options) - print(infile + " converted into " + outfile) ``` @@ -183,7 +148,7 @@ This Python example shows how to convert a PDF file into an Excel file in XLSM f The 'convert_pdf_to_excel_2007_csv' function performs the same operation as before, but this time the target format is CSV (Comma-Separated Values) instead of XLSM. -Steps: Convert PDF to CSV in Python +Steps: Convert PDF to CSV in Python 1. 
Create an instance of [Document](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/) object with the source PDF document. 1. Create an instance of [ExcelSaveOptions](https://reference.aspose.com/pdf/python-net/aspose.pdf/excelsaveoptions/) with **ExcelSaveOptions.ExcelFormat.CSV** @@ -191,11 +156,12 @@ The 'convert_pdf_to_excel_2007_csv' function performs the same operation as befo ```python - from os import path - import aspose.pdf as apdf +from os import path +import aspose.pdf as apdf - path_infile = path.join(self.data_dir, infile) - path_outfile = path.join(self.data_dir, "python", outfile) +def convert_pdf_to_excel_2007_csv(infile, outfile): + path_infile = path.join(data_dir, infile) + path_outfile = path.join(data_dir, "python", outfile) document = apdf.Document(path_infile) save_options = apdf.ExcelSaveOptions() @@ -207,7 +173,7 @@ The 'convert_pdf_to_excel_2007_csv' function performs the same operation as befo ### Convert to ODS -Steps: Convert PDF to ODS in Python +Steps: Convert PDF to ODS in Python 1. Create an instance of [Document](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/) object with the source PDF document. 1. Create an instance of [ExcelSaveOptions](https://reference.aspose.com/pdf/python-net/aspose.pdf/excelsaveoptions/) with **ExcelSaveOptions.ExcelFormat.ODS** @@ -230,57 +196,3 @@ Conversion to ODS format performs in the same way as all other formats. print(infile + " converted into " + outfile) ``` - -## See Also - -This article also covers these topics. The codes are same as above. 
- -_Format_: **Excel** -- [Python PDF to Excel Code](#python-pdf-to-xlsx) -- [Python PDF to Excel API](#python-pdf-to-xlsx) -- [Python PDF to Excel Programmatically](#python-pdf-to-xlsx) -- [Python PDF to Excel Library](#python-pdf-to-xlsx) -- [Python Save PDF as Excel](#python-pdf-to-xlsx) -- [Python Generate Excel from PDF](#python-pdf-to-xlsx) -- [Python Create Excel from PDF](#python-pdf-to-xlsx) -- [Python PDF to Excel Converter](#python-pdf-to-xlsx) - -_Format_: **XLS** -- [Python PDF to XLS Code](#python-pdf-to-xls) -- [Python PDF to XLS API](#python-pdf-to-xls) -- [Python PDF to XLS Programmatically](#python-pdf-to-xls) -- [Python PDF to XLS Library](#python-pdf-to-xls) -- [Python Save PDF as XLS](#python-pdf-to-xls) -- [Python Generate XLS from PDF](#python-pdf-to-xls) -- [Python Create XLS from PDF](#python-pdf-to-xls) -- [Python PDF to XLS Converter](#python-pdf-to-xls) - -_Format_: **XLSX** -- [Python PDF to XLSX Code](#python-pdf-to-xlsx) -- [Python PDF to XLSX API](#python-pdf-to-xlsx) -- [Python PDF to XLSX Programmatically](#python-pdf-to-xlsx) -- [Python PDF to XLSX Library](#python-pdf-to-xlsx) -- [Python Save PDF as XLSX](#python-pdf-to-xlsx) -- [Python Generate XLSX from PDF](#python-pdf-to-xlsx) -- [Python Create XLSX from PDF](#python-pdf-to-xlsx) -- [Python PDF to XLSX Converter](#python-pdf-to-xlsx) - -_Format_: **CSV** -- [Python PDF to CSV Code](#python-pdf-to-csv) -- [Python PDF to CSV API](#python-pdf-to-csv) -- [Python PDF to CSV Programmatically](#python-pdf-to-csv) -- [Python PDF to CSV Library](#python-pdf-to-csv) -- [Python Save PDF as CSV](#python-pdf-to-csv) -- [Python Generate CSV from PDF](#python-pdf-to-csv) -- [Python Create CSV from PDF](#python-pdf-to-csv) -- [Python PDF to CSV Converter](#python-pdf-to-csv) - -_Format_: **ODS** -- [Python PDF to ODS Code](#python-pdf-to-ods) -- [Python PDF to ODS API](#python-pdf-to-ods) -- [Python PDF to ODS Programmatically](#python-pdf-to-ods) -- [Python PDF to ODS 
Library](#python-pdf-to-ods) -- [Python Save PDF as ODS](#python-pdf-to-ods) -- [Python Generate ODS from PDF](#python-pdf-to-ods) -- [Python Create ODS from PDF](#python-pdf-to-ods) -- [Python PDF to ODS Converter](#python-pdf-to-ods) diff --git a/en/python-net/converting/convert-pdf-to-html/_index.md b/en/python-net/converting/convert-pdf-to-html/_index.md index df877a0bc9..4dc13aa31c 100644 --- a/en/python-net/converting/convert-pdf-to-html/_index.md +++ b/en/python-net/converting/convert-pdf-to-html/_index.md @@ -14,15 +14,6 @@ AlternativeHeadline: How to Convert PDF to HTML in Python Abstract: This article provides a comprehensive guide on converting PDF files to HTML using Python, specifically through the Aspose.PDF for Python via .NET library. It outlines the necessary steps to achieve this conversion programmatically, highlighting the creation of a `Document` object from the source PDF and utilizing the `HtmlSaveOptions` for saving the document in HTML format. The article includes a concise Python code snippet demonstrating the conversion process. Additionally, it introduces an online tool, Aspose.PDF's "PDF to HTML" application, for users to explore the functionality and quality of the conversion. The article is structured to cater to various related topics, ensuring a thorough understanding of using Python for PDF to HTML conversion. --- -## Overview - -This article explains how to **convert PDF to HTML using Python**. It covers these topics. - -_Format_: **HTML** -- [Python PDF to HTML](#python-pdf-to-html) -- [Python Convert PDF to HTML](#python-pdf-to-html) -- [Python How to convert PDF file to HTML](#python-pdf-to-html) - ## Convert PDF to HTML **Aspose.PDF for Python via .NET** provides many features for converting various file formats into PDF documents and converting PDF files into various output formats. This article discusses how to convert a PDF file into HTML. You can use just a couple of lines of code Python for converting PDF To HTML. 
You may need to convert PDF to HTML if you want to create a website or add content to an online forum. One way to convert PDF to HTML is to programmatically use Python. @@ -35,10 +26,10 @@ Aspose.PDF for Python presents you online free application ["PDF to HTML"](https [![Aspose.PDF Convertion PDF to HTML with Free App](pdf_to_html.png)](https://products.aspose.app/pdf/conversion/pdf-to-html) {{% /alert %}} -Steps: Convert PDF to HTML in Python +Steps: Convert PDF to HTML in Python 1. Create an instance of [Document](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/) object with the source PDF document. -2. Save it to [HtmlSaveOptions](https://reference.aspose.com/pdf/python-net/aspose.pdf/htmlsaveoptions/) by calling [save()](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/#methods) method. +1. Save it to [HtmlSaveOptions](https://reference.aspose.com/pdf/python-net/aspose.pdf/htmlsaveoptions/) by calling [save()](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/#methods) method. ```python @@ -80,7 +71,7 @@ This function converts a PDF file into HTML format using Aspose.PDF for Python v This function converts a PDF file into multi-page HTML, where each PDF page is exported as a separate HTML file. This makes the output easier to navigate and reduces loading time for large PDFs. -1. Load the source PDF using 'apdf.Document'. +1. Load the source PDF using 'ap.Document'. 1. Create 'HtmlSaveOptions' and 'set split_into_pages'. 1. Save the document as HTML with pages split into separate files. 1. Print a confirmation message. @@ -104,7 +95,7 @@ This function converts a PDF file into multi-page HTML, where each PDF page is e This function converts a PDF into HTML format while storing all images as SVG files in a specified folder, instead of embedding them directly in the HTML. -1. Load the source PDF using 'apdf.Document'. +1. Load the source PDF using 'ap.Document'. 1. 
Create 'HtmlSaveOptions' and 'set special_folder_for_svg_images' to the target folder. 1. Save the document as HTML with external SVG images. 1. Print a confirmation message. @@ -128,7 +119,7 @@ This function converts a PDF into HTML format while storing all images as SVG fi This snippet converts a PDF into HTML format, storing all images as SVG files in a specified folder and compressing them to reduce file size. -1. Load the PDF document using 'apdf.Document'. +1. Load the PDF document using 'ap.Document'. 1. Create 'HtmlSaveOptions' and: - Set 'special_folder_for_svg_images' to store SVG images externally. - Enable 'compress_svg_graphics_if_any' to compress SVG images. @@ -155,7 +146,7 @@ This snippet converts a PDF into HTML format, storing all images as SVG files in This snippet converts a PDF into HTML format, embedding raster images as PNG page backgrounds. This approach preserves image quality and page layout within the HTML. -1. Load the PDF document using 'apdf.Document'. +1. Load the PDF document using 'ap.Document'. 1. Create 'HtmlSaveOptions' and 'set raster_images_saving_mode' to 'AS_EMBEDDED_PARTS_OF_PNG_PAGE_BACKGROUND'. 1. Save the document as HTML with embedded raster images. 1. Print a confirmation message. @@ -179,7 +170,7 @@ This snippet converts a PDF into HTML format, embedding raster images as PNG pag This function converts a PDF into HTML format, generating 'body-only' content without extra 'html' or 'head' tags, and splits the output into separate pages. -1. Load the PDF document using 'apdf.Document'. +1. Load the PDF document using 'ap.Document'. 1. Create 'HtmlSaveOptions' and configure: - 'html_markup_generation_mode = WRITE_ONLY_BODY_CONTENT' to generate only the 'body' content. - 'split_into_pages' to create separate HTML files for each PDF page. 
@@ -188,7 +179,7 @@ This function converts a PDF into HTML format, generating 'body-only' content wi ```python - from os import path +from os import path import aspose.pdf as apdf path_infile = path.join(self.data_dir, infile) @@ -206,7 +197,7 @@ This function converts a PDF into HTML format, generating 'body-only' content wi This function converts a PDF into HTML format, rendering all text as transparent, including shadowed texts, which preserves visual fidelity while allowing flexible styling in the output HTML. -1. Load the PDF document using 'apdf.Document'. +1. Load the PDF document using 'ap.Document'. 1. Create 'HtmlSaveOptions' and configure: - 'save_transparent_texts' to render normal text as transparent. - 'save_shadowed_texts_as_transparent_texts' to render shadowed text as transparent. @@ -233,7 +224,7 @@ This function converts a PDF into HTML format, rendering all text as transparent This function converts a PDF into HTML format, preserving document layers by converting marked content into separate layers in the output HTML. This allows layered elements (like annotations, backgrounds, and overlays) to be rendered accurately. -1. Load the PDF document using 'apdf.Document'. +1. Load the PDF document using 'ap.Document'. 1. Create 'HtmlSaveOptions' and enable 'convert_marked_content_to_layers' to preserve layers. 1. Save the document as HTML with layered content. 1. Print a confirmation message. @@ -253,16 +244,3 @@ This function converts a PDF into HTML format, preserving document layers by con print(infile + " converted into " + outfile) ``` -## See Also - -This article also covers these topics. The codes are same as above. 
- -_Format_: **HTML** -- [Python PDF to HTML Code](#python-pdf-to-html) -- [Python PDF to HTML API](#python-pdf-to-html) -- [Python PDF to HTML Programmatically](#python-pdf-to-html) -- [Python PDF to HTML Library](#python-pdf-to-html) -- [Python Save PDF as HTML](#python-pdf-to-html) -- [Python Generate HTML from PDF](#python-pdf-to-html) -- [Python Create HTML from PDF](#python-pdf-to-html) -- [Python PDF to HTML Converter](#python-pdf-to-html) diff --git a/en/python-net/converting/convert-pdf-to-images-format/_index.md b/en/python-net/converting/convert-pdf-to-images-format/_index.md index d23758d9cd..848cc7a8f4 100644 --- a/en/python-net/converting/convert-pdf-to-images-format/_index.md +++ b/en/python-net/converting/convert-pdf-to-images-format/_index.md @@ -14,45 +14,6 @@ AlternativeHeadline: How to Convert PDF to Image Formats in Python Abstract: This article provides a comprehensive guide on converting PDF files into various image formats using Python, specifically leveraging the Aspose.PDF for Python library. The document outlines methods for converting PDFs to image formats including TIFF, BMP, EMF, JPG, PNG, GIF, and SVG. Two primary conversion approaches are discussed - using the Device approach and SaveOption. The Device approach involves utilizing classes like `DocumentDevice` and `ImageDevice` for whole document or page-specific conversions. Detailed steps and Python code examples are provided for converting PDF pages to different formats such as TIFF using `TiffDevice`, and BMP, EMF, JPEG, PNG, and GIF using respective device classes (`BmpDevice`, `EmfDevice`, `JpegDevice`, `PngDevice`, `GifDevice`). For SVG conversion, the `SvgSaveOptions` class is introduced. The article also highlights online tools for trying these conversions. --- -## Overview - -This article explains how to convert PDF to different image formats using Python. It covers the following topics. 
- -_Image Format_: **TIFF** -- [Python PDF to TIFF](#python-pdf-to-tiff) -- [Python Convert PDF to TIFF](#python-pdf-to-tiff) -- [Python Convert Single or Particular Pages of PDF to TIFF](#python-pdf-to-tiff-pages) - -_Image Format_: **BMP** -- [Python PDF to BMP](#python-pdf-to-bmp) -- [Python Convert PDF to BMP](#python-pdf-to-bmp) -- [Python PDF to BMP Converter](#python-pdf-to-bmp) - -_Image Format_: **EMF** -- [Python PDF to EMF](#python-pdf-to-emf) -- [Python Convert PDF to EMF](#python-pdf-to-emf) -- [Python PDF to EMF Converter](#python-pdf-to-emf) - -_Image Format_: **JPG** -- [Python PDF to JPG](#python-pdf-to-jpg) -- [Python Convert PDF to JPG](#python-pdf-to-jpg) -- [Python PDF to JPG Converter](#python-pdf-to-jpg) - -_Image Format_: **PNG** -- [Python PDF to PNG](#python-pdf-to-png) -- [Python Convert PDF to PNG](#python-pdf-to-png) -- [Python PDF to PNG Converter](#python-pdf-to-png) - -_Image Format_: **GIF** -- [Python PDF to GIF](#python-pdf-to-gif) -- [Python Convert PDF to GIF](#python-pdf-to-gif) -- [Python PDF to GIF Converter](#python-pdf-to-gif) - -_Image Format_: **SVG** -- [Python PDF to SVG](#python-pdf-to-svg) -- [Python Convert PDF to SVG](#python-pdf-to-svg) -- [Python PDF to SVG Converter](#python-pdf-to-svg) - ## Python Convert PDF to Image **Aspose.PDF for Python** uses several approaches to convert PDF to image. Generally speaking, we use two approaches: conversion using the Device approach and conversion using SaveOption. This section will show you how to convert PDF documents to image formats such as BMP, JPEG, GIF, PNG, EMF, TIFF, and SVG formats using one of those approaches. @@ -77,12 +38,12 @@ Aspose.PDF for Python via .NET presents you online free application ["PDF to TIF Aspose.PDF for Python explain how to convert all pages in a PDF file to a single TIFF image: -Steps: Convert PDF to TIFF in Python +Steps: Convert PDF to TIFF in Python 1. 
Create an object of the [Document](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/) class. -2. Create [TiffSettings](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/tiffsettings/) and [TiffDevice](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/tiffdevice/) objects -3. Call the [process](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/tiffdevice/#methods) method to convert the PDF document to TIFF. -4. To set the output file's properties, use the [TiffSettings](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/tiffsettings/) class. +1. Create [TiffSettings](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/tiffsettings/) and [TiffDevice](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/tiffdevice/) objects +1. Call the [process](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/tiffdevice/#methods) method to convert the PDF document to TIFF. +1. To set the output file's properties, use the [TiffSettings](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/tiffsettings/) class. The following code snippet shows how to convert all the PDF pages to a single TIFF image. @@ -123,12 +84,6 @@ Let's take a look at how to convert a PDF page to an image. [BmpDevice](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/bmpdevice/) class provides a method named [process](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/bmpdevice/#methods) which allows you to convert a particular page of the PDF file to BMP image format. The other classes have the same method. So, if we need to convert a PDF page to an image, we just instantiate the required class. 
- - - - - - The following steps and code snippet in Python shows this possibility: - [Convert PDF to BMP in Python](#python-pdf-to-image) @@ -137,16 +92,16 @@ The following steps and code snippet in Python shows this possibility: - [Convert PDF to PNG in Python](#python-pdf-to-image) - [Convert PDF to GIF in Python](#python-pdf-to-image) -Steps: PDF to Image (BMP, EMF, JPG, PNG, GIF) in Python +Steps: PDF to Image (BMP, EMF, JPG, PNG, GIF) in Python 1. Load the PDF file using [Document](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/) class. -2. Create an instance of subclass of [ImageDevice](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/imagedevice/) i.e. +1. Create an instance of subclass of [ImageDevice](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/imagedevice/) i.e. * [BmpDevice](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/bmpdevice/) (to convert PDF to BMP) * [EmfDevice](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/emfdevice/) (to convert PDF to Emf) * [JpegDevice](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/jpegdevice/) (to convert PDF to JPG) * [PngDevice](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/pngdevice/) (to convert PDF to PNG) * [GifDevice](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/gifdevice/) (to convert PDF to GIF) -3. Call the [ImageDevice.process()](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/imagedevice/#methods) method to perform PDF to Image conversion. +1. Call the [ImageDevice.process()](https://reference.aspose.com/pdf/python-net/aspose.pdf.devices/imagedevice/#methods) method to perform PDF to Image conversion. 
### Convert PDF to BMP @@ -219,7 +174,8 @@ The following steps and code snippet in Python shows this possibility: page_count = page_count + 1 print(infile + " converted into " + outfile) -``` +``` + ### Convert PDF to PNG @@ -244,7 +200,38 @@ The following steps and code snippet in Python shows this possibility: page_count = page_count + 1 print(infile + " converted into " + outfile) -``` +``` + +### Convert PDF to PNG with default font + +```python + + from os import path + import aspose.pdf as ap + from io import FileIO + + + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) + + document = ap.Document(path_infile) + resolution = ap.devices.Resolution(300) + + rendering_options = ap.RenderingOptions() + rendering_options.default_font_name = "Arial" + + device = ap.devices.PngDevice(resolution) + device.rendering_options = rendering_options + + page_count = 1 + while page_count <= len(document.pages): + image_stream = FileIO(path_outfile + str(page_count) + "_out.png", "w") + device.process(document.pages[page_count], image_stream) + image_stream.close() + page_count = page_count + 1 + + print(infile + " converted into " + outfile) +``` ### Convert PDF to GIF @@ -268,7 +255,7 @@ The following steps and code snippet in Python shows this possibility: page_count = page_count + 1 print(infile + " converted into " + outfile) -``` +``` {{% alert color="success" %}} **Try to convert PDF to PNG online** @@ -300,11 +287,11 @@ Aspose.PDF for Python supports the feature to convert SVG image to PDF format an The following code snippet shows the steps for converting a PDF file to SVG format with Python. -Steps: Convert PDF to SVG in Python +Steps: Convert PDF to SVG in Python 1. Create an object of the [Document](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/) class. -2. Create [SvgSaveOptions](https://reference.aspose.com/pdf/python-net/aspose.pdf/svgsaveoptions/) object with needed settings. -3. 
Call the [document.save()](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/#methods) method and pass it [SvgSaveOptions](https://reference.aspose.com/pdf/python-net/aspose.pdf/svgsaveoptions/) object convert the PDF document to SVG. +1. Create an [SvgSaveOptions](https://reference.aspose.com/pdf/python-net/aspose.pdf/svgsaveoptions/) object with the required settings. +1. Call the [document.save()](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/#methods) method, passing the [SvgSaveOptions](https://reference.aspose.com/pdf/python-net/aspose.pdf/svgsaveoptions/) object, to convert the PDF document to SVG. ### Convert PDF to SVG diff --git a/en/python-net/converting/convert-pdf-to-other-files/_index.md b/en/python-net/converting/convert-pdf-to-other-files/_index.md index e625462a92..cc1d6e9815 100644 --- a/en/python-net/converting/convert-pdf-to-other-files/_index.md +++ b/en/python-net/converting/convert-pdf-to-other-files/_index.md @@ -4,7 +4,7 @@ linktitle: Convert PDF to other formats type: docs weight: 90 url: /python-net/convert-pdf-to-other-files/ -lastmod: "2025-02-27" +lastmod: "2025-09-27" description: This topic shows you how to convert PDF file to other file formats like EPUB, LaTeX, Text, XPS etc using Python. sitemap: changefreq: "monthly" @@ -24,25 +24,25 @@ Aspose.PDF for Python presents you online free application ["PDF to EPUB"](https [![Aspose.PDF Convertion PDF to EPUB with Free App](pdf_to_epub.png)](https://products.aspose.app/pdf/conversion/pdf-to-epub) {{% /alert %}} -**EPUB** is a free and open e-book standard from the International Digital Publishing Forum (IDPF). Files have the extension .epub. +EPUB is a free and open e-book standard from the International Digital Publishing Forum (IDPF). Files have the extension .epub. EPUB is designed for reflowable content, meaning that an EPUB reader can optimize text for a particular display device. EPUB also supports fixed-layout content.
The format is intended as a single format that publishers and conversion houses can use in-house, as well as for distribution and sale. It supersedes the Open eBook standard. -Aspose.PDF for Python also supports the feature to convert PDF documents to EPUB format. Aspose.PDF for Python has a class named 'EpubSaveOptions' which can be used as the second argument to [save()](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/#methods) method, to generate an EPUB file. +Aspose.PDF for Python also supports the feature to convert PDF documents to EPUB format. Aspose.PDF for Python has a class named 'EpubSaveOptions' which can be used as the second argument to [document.save()](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/#methods) method, to generate an EPUB file. Please try using the following code snippet to accomplish this requirement with Python. ```python - import aspose.pdf as apdf - from io import FileIO from os import path - import pydicom + import aspose.pdf as ap - path_infile = path.join(self.dataDir, infile) - path_outfile = path.join(self.dataDir, "python", outfile) + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) - document = apdf.Document(path_infile) - save_options = apdf.EpubSaveOptions() - save_options.content_recognition_mode = apdf.EpubSaveOptions.RecognitionMode.FLOW + document = ap.Document(path_infile) + save_options = ap.EpubSaveOptions() + save_options.content_recognition_mode = ( + ap.EpubSaveOptions.RecognitionMode.FLOW + ) document.save(path_outfile, save_options) print(infile + " converted into " + outfile) @@ -67,25 +67,14 @@ The following code snippet shows the process of converting PDF files into the TE ```python - import aspose.pdf as apdf - from io import FileIO from os import path - import pydicom + import aspose.pdf as ap - path_infile = path.join(self.dataDir, infile) - path_outfile = path.join(self.dataDir, "python", outfile) + path_infile = 
path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) - # Open PDF document - - document = apdf.Document(path_infile) - - # Instantiate an object of SvgSaveOptions - - save_options = apdf.SvgSaveOptions() - - # Instantiate an object of LaTeXSaveOptions - - save_options = apdf.LaTeXSaveOptions() + document = ap.Document(path_infile) + save_options = ap.LaTeXSaveOptions() document.save(path_outfile, save_options) print(infile + " converted into " + outfile) @@ -93,26 +82,18 @@ The following code snippet shows the process of converting PDF files into the TE ## Convert PDF to Text -**Aspose.PDF for Python** support converting whole PDF document and single page to a Text file. - -### Convert PDF document to Text file - -You can convert PDF document to TXT file using 'TextDevice' class. - -The following code snippet explains how to extract the texts from the all pages. +**Aspose.PDF for Python** supports converting a whole PDF document or a single page to a text file with the 'TextDevice' class. The following code snippet extracts the text of the first page ('document.pages[1]') into a TXT file. ```python - import aspose.pdf as apdf - from io import FileIO from os import path - import pydicom + import aspose.pdf as ap - path_infile = path.join(self.dataDir, infile) - path_outfile = path.join(self.dataDir, "python", outfile) + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) - document = apdf.Document(path_infile) - device = apdf.devices.TextDevice() + document = ap.Document(path_infile) + device = ap.devices.TextDevice() device.process(document.pages[1], path_outfile) print(infile + " converted into " + outfile) @@ -128,7 +109,7 @@ Aspose.PDF for Python presents you online free application ["PDF to Text"](https ## Convert PDF to XPS -**Aspose.PDF for Python** gives a possibility to convert PDF files to XPS format.
Let try to use the presented code snippet for converting PDF files to XPS format with Python. +**Aspose.PDF for Python** can convert PDF files to XPS format. Try the following code snippet to convert a PDF file to XPS with Python. {{% alert color="success" %}} **Try to convert PDF to XPS online** @@ -140,54 +121,74 @@ Aspose.PDF for Python presents you online free application ["PDF to XPS"](https: The XPS file type is primarily associated with the XML Paper Specification by Microsoft Corporation. The XML Paper Specification (XPS), formerly codenamed Metro and subsuming the Next Generation Print Path (NGPP) marketing concept, is Microsoft's initiative to integrate document creation and viewing into the Windows operating system. -To convert PDF files to XPS, Aspose.PDF has the class [XpsSaveOptions](https://reference.aspose.com/pdf/python-net/aspose.pdf/xpssaveoptions/) that is used as the second argument to the [save()](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/#methods) method to generate the XPS file. +To convert PDF files to XPS, Aspose.PDF provides the [XpsSaveOptions](https://reference.aspose.com/pdf/python-net/aspose.pdf/xpssaveoptions/) class, which is passed as the second argument to the [document.save()](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/#methods) method to generate the XPS file. The following code snippet shows the process of converting PDF file into XPS format.
```python - import aspose.pdf as apdf - from io import FileIO from os import path - import pydicom + import aspose.pdf as ap + + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) - path_infile = path.join(self.dataDir, infile) - path_outfile = path.join(self.dataDir, "python", outfile) - document = apdf.Document(path_infile) - save_options = apdf.XpsSaveOptions() + document = ap.Document(path_infile) + save_options = ap.XpsSaveOptions() + save_options.use_new_imaging_engine = True document.save(path_outfile, save_options) print(infile + " converted into " + outfile) ``` -## Convert PDF to XML +## Convert PDF to MD -{{% alert color="success" %}} -**Try to convert PDF to XML online** +Aspose.PDF provides the 'MarkdownSaveOptions' class, which you pass to 'document.save()' to convert a PDF document into Markdown (MD) format while preserving its images and resources. -Aspose.PDF for Python presents you online free application ["PDF to XML"](https://products.aspose.app/pdf/conversion/pdf-to-xml), where you may try to investigate the functionality and quality it works. +1. Load the source PDF using 'ap.Document'. +1. Create an instance of 'MarkdownSaveOptions'. +1. Set 'resources_directory_name' to 'images' – extracted images will be stored in this folder. +1. Save the converted Markdown document using the configured options. +1. Print a confirmation message after conversion. -[![Aspose.PDF Convertion PDF to XML with Free App](pdf_to_xml.png)](https://products.aspose.app/pdf/conversion/pdf-to-xml) -{{% /alert %}} +```python -XML is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. + from os import path + import aspose.pdf as ap -Aspose.PDF for Python also supports the feature to convert PDF documents to XML format.
Aspose.PDF for Python has a class named 'XmlSaveOptions' which can be used as the second argument to [save()](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/#methods) method, to generate an XML file. -Please try using the following code snippet to accomplish this requirement with Python. + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) + + document = ap.Document(path_infile) + save_options = ap.MarkdownSaveOptions() + # save_options.extract_vector_graphics = True + save_options.resources_directory_name = "images" + save_options.use_image_html_tag = True + document.save(path_outfile, save_options) + + print(infile + " converted into " + outfile) +``` + +The result is a Markdown file whose images are saved to the specified 'images' folder and linked from the text. + +## Convert PDF to MobiXML + +This method converts a PDF document into the MOBI (MobiXML) format, which is commonly used for eBooks on Kindle devices. + +1. Load the source PDF document using 'ap.Document'. +1. Save the document with the format 'ap.SaveFormat.MOBI_XML'. +1. Print a confirmation message once the conversion is complete.
```python - import aspose.pdf as apdf - from io import FileIO from os import path - import pydicom + import aspose.pdf as ap - path_infile = path.join(self.dataDir, infile) - path_outfile = path.join(self.dataDir, "python", outfile) + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) - document = apdf.Document(path_infile) - save_options = apdf.PdfXmlSaveOptions() - document.save(path_outfile, save_options) + document = ap.Document(path_infile) + document.save(path_outfile, ap.SaveFormat.MOBI_XML) print(infile + " converted into " + outfile) -``` +``` \ No newline at end of file diff --git a/en/python-net/converting/convert-pdf-to-other-files/pdf_to_xml.png b/en/python-net/converting/convert-pdf-to-other-files/pdf_to_xml.png deleted file mode 100644 index 5f2bc80689..0000000000 Binary files a/en/python-net/converting/convert-pdf-to-other-files/pdf_to_xml.png and /dev/null differ diff --git a/en/python-net/converting/convert-pdf-to-pdfa/_index.md b/en/python-net/converting/convert-pdf-to-pdfa/_index.md deleted file mode 100644 index 61aaf0af88..0000000000 --- a/en/python-net/converting/convert-pdf-to-pdfa/_index.md +++ /dev/null @@ -1,60 +0,0 @@ ---- -title: Convert PDF to PDF/A formats in Python -linktitle: Convert PDF to PDF/A formats -type: docs -weight: 100 -url: /python-net/convert-pdf-to-pdfa/ -lastmod: "2025-02-27" -description: Learn how to convert PDF files to PDF/A format for compliance with archiving standards using Aspose.PDF in Python via .NET. -sitemap: - changefreq: "monthly" - priority: 0.8 -TechArticle: true -AlternativeHeadline: How to Convert PDF to PDF/A formats in Python -Abstract: This article outlines the process of converting a PDF file to a PDF/A compliant format using Aspose.PDF for Python. The procedure includes validating the original PDF file according to Adobe Preflight standards, as various tools interpret PDF/A conformance differently. 
Once validated, the PDF is converted using the `Document` class's `Convert` method, with the validation results stored in an XML file. The conversion process allows for handling unconvertible elements via the `ConvertErrorAction` enumeration. Additionally, Aspose.PDF offers an online tool for converting PDFs to PDF/A-1A, enabling users to test the functionality and quality of the conversion process. A Python code snippet is provided to demonstrate converting a PDF to PDF/A-1b, illustrating the necessary steps and functions involved in the conversion. ---- - -**Aspose.PDF for Python** allows you to convert a PDF file to a PDF/A compliant PDF file. Before doing so, the file must be validated. This topic explains how. - -{{% alert color="primary" %}} - -Please note we follow Adobe Preflight for validating PDF/A conformance. All tools on the market have their own “representation” of PDF/A conformance. Please check this article on PDF/A validation tools for reference. We chose Adobe products for verifying how Aspose.PDF produces PDF files because Adobe is at the center of everything connected to PDF. - -{{% /alert %}} - -Convert the file using the Document class Convert method. Before converting the PDF to PDF/A compliant file, validate the PDF using the Validate method. The validation result is stored in an XML file and then this result is also passed to the Convert method. You can also specify the action for the elements which cannot be converted using the ConvertErrorAction enumeration. - -{{% alert color="success" %}} -**Try to convert PDF to PDF/A online** - -Aspose.PDF for Python presents you online free application ["PDF to PDF/A-1A"](https://products.aspose.app/pdf/conversion/pdf-to-pdfa1a), where you may try to investigate the functionality and quality it works. 
- -[![Aspose.PDF Convertion PDF to PDF/A with Free App](pdf_to_pdfa.png)](https://products.aspose.app/pdf/conversion/pdf-to-pdfa1a) -{{% /alert %}} - - -## Convert PDF file to PDF/A-1b - -The following code snippet shows how to convert PDF files to PDF/A-1b compliant PDF. - -```python - - import aspose.pdf as apdf - from io import FileIO - from os import path - import pydicom - - path_infile = path.join(self.dataDir, infile) - path_outfile = path.join(self.dataDir, "python", outfile) - - document = apdf.Document(path_infile) - document.convert( - self.dataDir + "pdf_pdfa.log", - apdf.PdfFormat.PDF_A_1B, - apdf.ConvertErrorAction.DELETE, - ) - document.save(path_outfile) - - print(infile + " converted into " + outfile) -``` - diff --git a/en/python-net/converting/convert-pdf-to-pdfx/_index.md b/en/python-net/converting/convert-pdf-to-pdfx/_index.md new file mode 100644 index 0000000000..0842496813 --- /dev/null +++ b/en/python-net/converting/convert-pdf-to-pdfx/_index.md @@ -0,0 +1,307 @@ +--- +title: Convert PDF to PDF/x formats in Python +linktitle: Convert PDF to PDF/x formats +type: docs +weight: 120 +url: /python-net/convert-pdf-to-pdf_x/ +lastmod: "2025-09-27" +description: This topic shows you how to convert PDF to PDF/x formats using Aspose.PDF for Python via .NET. +sitemap: + changefreq: "monthly" + priority: 0.8 +TechArticle: true +AlternativeHeadline: How to convert PDF to PDF/x formats +Abstract: The article provides a comprehensive guide on converting PDF to PDF/A, PDF/E, and PDF/X formats using Aspose.PDF for Python. +--- + +**Aspose.PDF for Python via .NET can convert PDF documents to the ISO-standardized PDF/A, PDF/E, and PDF/X formats.** + +## Convert PDF to PDF/A + +**Aspose.PDF for Python** allows you to convert a PDF file to a PDF/A-compliant PDF file. Before converting, you can validate the file against the target PDF/A level. This topic explains how to do both. + +{{% alert color="primary" %}} + +Please note we follow Adobe Preflight for validating PDF/A conformance.
All tools on the market have their own “representation” of PDF/A conformance. Please check this article on PDF/A validation tools for reference. We chose Adobe products for verifying how Aspose.PDF produces PDF files because Adobe is at the center of everything connected to PDF. + +{{% /alert %}} + +Convert the file with the 'convert' method of the 'Document' class. Before converting the PDF to a PDF/A-compliant file, you can check it with the 'validate' method. Both methods write their results to the log file whose path you pass in. You can also specify what happens to elements that cannot be converted by using the 'ConvertErrorAction' enumeration. + +{{% alert color="success" %}} +**Try to convert PDF to PDF/A online** + +Aspose.PDF for Python presents a free online application ["PDF to PDF/A-1A"](https://products.aspose.app/pdf/conversion/pdf-to-pdfa1a), where you can try out the conversion and evaluate its quality. + +[![Aspose.PDF Conversion PDF to PDF/A with Free App](pdf_to_pdfa.png)](https://products.aspose.app/pdf/conversion/pdf-to-pdfa1a) +{{% /alert %}} + +The 'document.validate()' method checks whether a PDF file conforms to a given standard; the example below checks PDF/A-1B (an ISO-standardized version of PDF designed for long-term archiving). The validation results are saved in a log file. + +1. Load the PDF document using 'ap.Document'. +1. Call 'validate' with the log file path and the target compliance level ('ap.PdfFormat.PDF_A_1B'). +1. The results of the validation are written into the specified log file. + +```python + + from os import path + import aspose.pdf as ap + + path_infile = path.join(self.data_dir, infile) + path_logfile = path.join(self.data_dir, "python", outfile) + + document = ap.Document(path_infile) + document.validate(path_logfile, ap.PdfFormat.PDF_A_1B) +``` + +### Convert PDF to PDF/A-1B + +The following code snippet shows how to convert PDF files to PDF/A-1B format: + +1. Load the PDF document using 'ap.Document'. +1.
Call 'convert' with the following parameters: + - Log file path - stores the details of the conversion process and compliance checks. + - Target format - 'ap.PdfFormat.PDF_A_1B' (archival standard). + - Error action - 'ap.ConvertErrorAction.DELETE', which automatically removes elements that prevent compliance. +1. Save the converted PDF/A-compliant file to the output path. + +```python + + from os import path + import aspose.pdf as ap + + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) + + document = ap.Document(path_infile) + document.convert( + path.join(self.data_dir, "pdf_pdfa.log"), + ap.PdfFormat.PDF_A_1B, + ap.ConvertErrorAction.DELETE, + ) + document.save(path_outfile) + + print(infile + " converted into " + outfile) +``` + +### Convert PDF to PDF 2.0 and PDF/A-4 + +This example demonstrates how to convert a PDF document into newer standardized formats: PDF 2.0 and PDF/A-4. +Both conversions help ensure compliance with modern specifications and archival requirements. + +1. Load the input document using 'ap.Document'. +1. Perform the first conversion to PDF 2.0 by calling 'document.convert' with: + - Log file path for conversion details. + - Target format - 'ap.PdfFormat.V_2_0'. + - Error action - 'ap.ConvertErrorAction.DELETE' to remove non-compliant elements. +1. Perform a second conversion to PDF/A-4 using the same method, so that the file also meets the archival requirements of PDF/A-4. +1. Save the resulting document in the specified output path.
+ +```python + + from os import path + import aspose.pdf as ap + + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) + path_logfile = path_outfile.replace(".pdf","_log.xml") + + document = ap.Document(path_infile) + document.convert(path_logfile, ap.PdfFormat.V_2_0, ap.ConvertErrorAction.DELETE) + + document.convert(path_logfile, ap.PdfFormat.PDF_A_4, ap.ConvertErrorAction.DELETE) + document.save(path_outfile) + + print(infile + " converted into " + outfile) +``` + +### Convert PDF to PDF/A-3A with Embedded Files + +The next code snippet demonstrates how to embed external files into a PDF and then convert the PDF into PDF/A-3A format, which permits embedded files and is suitable for long-term archiving of documents with attachments. + +1. Load the input PDF using 'ap.Document'. +1. Create a 'FileSpecification' object pointing to the file to embed (e.g., "aspose-logo.jpg") with a description. +1. Add the file specification to the PDF’s 'embedded_files' collection. +1. Convert the document to PDF/A-3A using 'document.convert', specifying: + - Log file path. + - Target format - 'ap.PdfFormat.PDF_A_3A'. + - Error action - 'ap.ConvertErrorAction.DELETE' to remove non-compliant elements. +1. Save the converted PDF to the output path. +1. Print a confirmation message.
+ +```python + + from os import path + import aspose.pdf as ap + + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) + path_logfile = path_outfile.replace(".pdf","_log.xml") + + document = ap.Document(path_infile) + + # Attach an external file to the document + file_specification = ap.FileSpecification(path.join(self.data_dir, "aspose-logo.jpg"), "Large Image file") + document.embedded_files.add(file_specification) + document.convert(path_logfile, ap.PdfFormat.PDF_A_3A, ap.ConvertErrorAction.DELETE) + document.save(path_outfile) + print(infile + " converted into " + outfile) +``` + +### Convert PDF to PDF/A-1B with Font Substitution + +This function converts a PDF into PDF/A-1B format while handling a missing font by substituting an available one. This helps keep the converted PDF visually consistent and compliant with archival standards. + +1. Check whether the 'AgencyFB' font is available by calling 'ap.text.FontRepository.find_font'. If it is not found, register a 'SimpleFontSubstitution' that replaces 'AgencyFB' with 'Arial'. +1. Load the PDF using 'ap.Document'. +1. Convert the PDF to PDF/A-1B using 'document.convert', specifying: + - Log file path. + - Target format - 'ap.PdfFormat.PDF_A_1B'. + - Error action - 'ap.ConvertErrorAction.DELETE' to remove non-compliant elements. +1. Save the converted PDF to the output path. +1. Print a confirmation message.
+ +```python + + from os import path + import aspose.pdf as ap + + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) + path_logfile = path_outfile.replace(".pdf","_log.xml") + + # Substitute Arial for AgencyFB if AgencyFB is not available + try: + ap.text.FontRepository.find_font("AgencyFB") + + except ap.FontNotFoundException: + font_substitution = ap.text.SimpleFontSubstitution("AgencyFB", "Arial") + ap.text.FontRepository.substitutions.append(font_substitution) + + document = ap.Document(path_infile) + document.convert(path_logfile, ap.PdfFormat.PDF_A_1B, ap.ConvertErrorAction.DELETE) + document.save(path_outfile) + print(infile + " converted into " + outfile) +``` + +### Convert PDF to PDF/A-1B with Automatic Tagging + +This function converts a PDF document into PDF/A-1B format while automatically tagging the content for accessibility and structural consistency. Automatic tagging makes the document easier to use with screen readers and gives it a semantic structure. + +1. Load the PDF using 'ap.Document'. +1. Create 'PdfFormatConversionOptions' specifying: + - Log file path. + - Target format - 'ap.PdfFormat.PDF_A_1B'. + - Error action - 'ap.ConvertErrorAction.DELETE' to remove non-compliant elements. +1. Configure 'AutoTaggingSettings': + - Set 'enable_auto_tagging' to 'True'. + - Set 'heading_recognition_strategy' to 'ap.HeadingRecognitionStrategy.AUTO' to detect headings automatically. +1. Assign the auto-tagging settings to the conversion options. +1. Convert the PDF using 'document.convert(options)'. +1. Save the converted PDF to the output path. +1. Print a confirmation message.
+ +```python + + from os import path + import aspose.pdf as ap + + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) + path_logfile = path_outfile.replace(".pdf","_log.xml") + + document = ap.Document(path_infile) + options = ap.PdfFormatConversionOptions(path_logfile, ap.PdfFormat.PDF_A_1B, ap.ConvertErrorAction.DELETE) + + auto_tagging_settings = ap.AutoTaggingSettings() + auto_tagging_settings.enable_auto_tagging = True + + auto_tagging_settings.heading_recognition_strategy = ap.HeadingRecognitionStrategy.AUTO + + options.auto_tagging_settings = auto_tagging_settings + document.convert(options) + document.save(path_outfile) + print(infile + " converted into " + outfile) +``` + +## Convert PDF to PDF/E + +This snippet validates whether a PDF document conforms to the PDF/E-1 standard, which is an ISO standard tailored for engineering and technical documents. The validation results are saved to a log file. + +1. Load the PDF document using 'ap.Document'. +1. Call 'validate', specifying: + - Log file path to store validation results. + - Target format - 'ap.PdfFormat.PDF_E_1'. +1. The validation results are saved in the log file for review. + +```python + + from os import path + import aspose.pdf as ap + + path_infile = path.join(self.data_dir, infile) + path_logfile = path.join(self.data_dir, "python", outfile) + + document = ap.Document(path_infile) + document.validate(path_logfile, ap.PdfFormat.PDF_E_1) +``` + +The next example demonstrates how to convert a PDF into PDF/E-1 format. This format preserves the precise layout, graphics, and metadata required for engineering workflows. + +1. Load the source PDF using 'ap.Document'. +1. Create 'PdfFormatConversionOptions' specifying: + - Log file path for tracking conversion issues. + - Target format - 'ap.PdfFormat.PDF_E_1'. + - Error action - 'ap.ConvertErrorAction.DELETE' to remove non-compliant elements. +1.
Convert the PDF using 'document.convert(options)'. +1. Save the converted PDF to the specified output path. +1. Print a confirmation message. + +```python + + from os import path + import aspose.pdf as ap + + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) + path_logfile = path_outfile.replace(".pdf","_log.xml") + + document = ap.Document(path_infile) + options = ap.PdfFormatConversionOptions(path_logfile, ap.PdfFormat.PDF_E_1, ap.ConvertErrorAction.DELETE) + + document.convert(options) + + # Save PDF document + document.save(path_outfile) + print(infile + " converted into " + outfile) +``` + +## Convert PDF to PDF/X + +The next code snippet converts a PDF document into PDF/X-4 format, an ISO standard commonly used in the printing and publishing industry. PDF/X-4 supports live transparency and uses an ICC-based output intent for consistent color reproduction across devices. + +1. Load the source PDF using 'ap.Document'. +1. Create 'PdfFormatConversionOptions' specifying: + - Log file path. + - Target format - 'ap.PdfFormat.PDF_X_4'. + - Error action - 'ap.ConvertErrorAction.DELETE' to remove non-compliant elements. +1. Provide the **ICC profile file** for color management via 'icc_profile_file_name'. +1. Specify an **OutputIntent** with a condition identifier (e.g., "FOGRA39") for printing requirements. +1. Convert the PDF using 'document.convert(options)'. +1. Save the converted PDF to the specified output path. +1. Print a confirmation message.
+ +```python + + from os import path + import aspose.pdf as ap + + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) + path_logfile = path_outfile.replace(".pdf","_log.xml") + + document = ap.Document(path_infile) + options = ap.PdfFormatConversionOptions(path_logfile, ap.PdfFormat.PDF_X_4, ap.ConvertErrorAction.DELETE) + + # Provide the name of the external ICC profile file (optional) + options.icc_profile_file_name = path.join(self.data_dir,"ISOcoated_v2_eci.icc") + # Provide an output condition identifier and other necessary OutputIntent properties (optional) + options.output_intent = ap.OutputIntent("FOGRA39") + + document.convert(options) + + # Save PDF document + document.save(path_outfile) + print(infile + " converted into " + outfile) +``` \ No newline at end of file diff --git a/en/python-net/converting/convert-pdf-to-pdfa/pdf_to_pdfa.png b/en/python-net/converting/convert-pdf-to-pdfx/pdf_to_pdfa.png similarity index 100% rename from en/python-net/converting/convert-pdf-to-pdfa/pdf_to_pdfa.png rename to en/python-net/converting/convert-pdf-to-pdfx/pdf_to_pdfa.png diff --git a/en/python-net/converting/convert-pdf-to-powerpoint/_index.md b/en/python-net/converting/convert-pdf-to-powerpoint/_index.md index 26a0041f9f..197a8e078b 100644 --- a/en/python-net/converting/convert-pdf-to-powerpoint/_index.md +++ b/en/python-net/converting/convert-pdf-to-powerpoint/_index.md @@ -14,34 +14,19 @@ AlternativeHeadline: How to Convert PDF to PowerPoint in Python Abstract: This article provides a comprehensive guide on converting PDF files into PowerPoint presentations using Python, specifically focusing on the PPTX format. It introduces the use of Aspose.PDF for Python via .NET, which facilitates the conversion process by allowing PDF pages to be transformed into individual slides in a PPTX file. 
The article outlines the necessary steps for conversion, including creating instances of the Document and PptxSaveOptions classes and utilizing the Save method. Additionally, it highlights a feature to convert PDFs to PPTX with slides as images by setting a specific property in the PptxSaveOptions. Code snippets are provided to illustrate the conversion process. The article also references an online application for testing the PDF to PPTX conversion feature, offering users a hands-on experience. Furthermore, it lists various related topics and functionalities available within this context, emphasizing the versatility and programmatic approach to handling PDF to PowerPoint conversions using Python. --- -## Overview - -Is it possible to convert a PDF file into a PowerPoint? Yes, you can! And it's easy! -This article explains how to **convert PDF to PowerPoint using Python**. It covers these topics. - -_Format_: **PPTX** -- [Python PDF to PPTX](#python-pdf-to-pptx) -- [Python Convert PDF to PPTX](#python-pdf-to-pptx) -- [Python How to convert PDF file to PPTX](#python-pdf-to-pptx) - -_Format_: **PowerPoint** -- [Python PDF to PowerPoint](#python-pdf-to-powerpoint) -- [Python Convert PDF to PowerPoint](#python-pdf-to-powerpoint) -- [Python How to convert PDF file to PowerPoint](#python-pdf-to-powerpoint) - ## Python PDF to PowerPoint and PPTX Conversion **Aspose.PDF for Python via .NET** lets you track the progress of PDF to PPTX conversion. -We have an API named Aspose.Slides which offers the feature to create as well as manipulate PPT/PPTX presentations. This API also provides the feature to convert PPT/PPTX files to PDF format. During this conversion, the individual pages of the PDF file are converted to separate slides in the PPTX file. +We have an API named Aspose.Slides which offers the feature to create as well as manipulate PPT/PPTX presentations. This API also provides the feature to convert PPTX files to PDF format. 
During this conversion, the individual pages of the PDF file are converted to separate slides in the PPTX file. -During PDF to PPTX conversion, the text is rendered as Text where you can select/update it. Please note that in order to convert PDF files to PPTX format, Aspose.PDF provides a class named [PptxSaveOptions](https://reference.aspose.com/pdf/python-net/aspose.pdf/pptxsaveoptions/). An object of the PptxSaveOptions class is passed as a second argument to the [save()](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/#methods). The following code snippet shows the process for converting PDF files into PPTX format. +During PDF to PPTX conversion, the text is rendered as Text where you can select/update it. Please note that in order to convert PDF files to PPTX format, Aspose.PDF provides a class named [PptxSaveOptions](https://reference.aspose.com/pdf/python-net/aspose.pdf/pptxsaveoptions/). An object of the PptxSaveOptions class is passed as a second argument to the [save()](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/#methods). The following code snippet shows the process for converting PDF files into PPTX format. ## Simple conversion PDF to PowerPoint using Python and Aspose.PDF for Python via .NET In order to convert PDF to PPTX, Aspose.PDF for Python advice to use the following code steps. -Steps: Convert PDF to PowerPoint in Python | Steps: Convert PDF to PPTX in Python +Steps: Convert PDF to PowerPoint in Python 1. Create an instance of [Document](https://reference.aspose.com/pdf/python-net/aspose.pdf/document/) class. 1. Create an instance of [PptxSaveOptions](https://reference.aspose.com/pdf/python-net/aspose.pdf/pptxsaveoptions/) class. 
@@ -67,7 +52,8 @@ In order to convert PDF to PPTX, Aspose.PDF for Python advice to use the followi {{% alert color="success" %}} **Try to convert PDF to PowerPoint online** -Aspose.PDF for Python via .NET presents you online free application ["PDF to PPTX"](https://products.aspose.app/pdf/conversion/pdf-to-pptx), where you may try to investigate the functionality and quality it works. +Aspose.PDF presents a free online application ["PDF to PPTX"](https://products.aspose.app/pdf/conversion/pdf-to-pptx), where you can try out the conversion and evaluate its quality. + [![Aspose.PDF Convertion PDF to PPTX with Free App](pdf_to_pptx.png)](https://products.aspose.app/pdf/conversion/pdf-to-pptx) {{% /alert %}} @@ -95,7 +81,7 @@ In case if you need to convert a searchable PDF to PPTX as images instead of sel This method converts a PDF document into a PowerPoint (PPTX) file while setting a custom image resolution (300 DPI) for improved quality. -1. Load the PDF into an 'apdf.Document' object. +1. Load the PDF into an 'ap.Document' object. 1. Create a 'PptxSaveOptions' instance. 1. Set the 'image_resolution' property to 300 DPI for high-quality rendering. 1. Save the PDF as a PPTX file using the defined save options. @@ -116,27 +102,3 @@ This method converts a PDF document into a PowerPoint (PPTX) file while setting print(infile + " converted into " + outfile) ``` - -## See Also - -This article also covers these topics. The codes are same as above.
- -_Format_: **PowerPoint** -- [Python PDF to PowerPoint Code](#python-pdf-to-powerpoint) -- [Python PDF to PowerPoint API](#python-pdf-to-powerpoint) -- [Python PDF to PowerPoint Programmatically](#python-pdf-to-powerpoint) -- [Python PDF to PowerPoint Library](#python-pdf-to-powerpoint) -- [Python Save PDF as PowerPoint](#python-pdf-to-powerpoint) -- [Python Generate PowerPoint from PDF](#python-pdf-to-powerpoint) -- [Python Create PowerPoint from PDF](#python-pdf-to-powerpoint) -- [Python PDF to PowerPoint Converter](#python-pdf-to-powerpoint) - -_Format_: **PPTX** -- [Python PDF to PPTX Code](#python-pdf-to-pptx) -- [Python PDF to PPTX API](#python-pdf-to-pptx) -- [Python PDF to PPTX Programmatically](#python-pdf-to-pptx) -- [Python PDF to PPTX Library](#python-pdf-to-pptx) -- [Python Save PDF as PPTX](#python-pdf-to-pptx) -- [Python Generate PPTX from PDF](#python-pdf-to-pptx) -- [Python Create PPTX from PDF](#python-pdf-to-pptx) -- [Python PDF to PPTX Converter](#python-pdf-to-pptx) diff --git a/en/python-net/converting/convert-pdf-to-word/_index.md b/en/python-net/converting/convert-pdf-to-word/_index.md index daef56eb41..688de97861 100644 --- a/en/python-net/converting/convert-pdf-to-word/_index.md +++ b/en/python-net/converting/convert-pdf-to-word/_index.md @@ -1,6 +1,6 @@ --- title: Convert PDF to Microsoft Word Documents in Python -linktitle: Convert PDF to Word 2003/2019 +linktitle: Convert PDF to Word type: docs weight: 10 url: /python-net/convert-pdf-to-word/ @@ -14,34 +14,15 @@ AlternativeHeadline: How to Convert PDF to Word in Python Abstract: This article provides a comprehensive guide on converting PDF files to Microsoft Word formats (DOC and DOCX) using Python, specifically utilizing the Aspose.PDF library. It outlines the advantages of converting PDFs to editable Word documents, enabling easier content manipulation such as text, tables, and images. 
The article details the process of converting PDF to DOC (Word 97-2003 format) and DOCX, with code snippets demonstrating these conversions through Python. The process involves creating a `Document` object from the PDF and saving it in the desired format using the `save()` method and the `SaveFormat` enumeration. Additionally, it introduces the `DocSaveOptions` class, which allows further customization of the conversion process, such as specifying recognition modes. The article also highlights online applications provided by Aspose.PDF for testing the conversion quality and functionality. The content includes a structured overview and links to corresponding sections for each format. --- -## Overview - -This article explains how to **convert PDF to Microsoft Word Documents using Python**. It covers these topics. - -_Format_: **DOC** -- [Python PDF to DOC](#python-pdf-to-doc) -- [Python Convert PDF to DOC](#python-pdf-to-doc) -- [Python How to convert PDF file to DOC](#python-pdf-to-doc) - -_Format_: **DOCX** -- [Python PDF to DOCX](#python-pdf-to-docx) -- [Python Convert PDF to DOCX](#python-pdf-to-docx) -- [Python How to convert PDF file to DOCX](#python-pdf-to-docx) - -_Format_: **Word** -- [Python PDF to Word](#python-pdf-to-docx) -- [Python Convert PDF to Word](#python-pdf-to-doc) -- [Python How to convert PDF file to Word](#python-pdf-to-docx) - ## Convert PDF to DOC One of the most popular features is the PDF to Microsoft Word DOC conversion, which makes content management easier. **Aspose.PDF for Python via .NET** allows you to convert PDF files not only to DOC but also to DOCX format, easily and efficiently. The [DocSaveOptions](https://reference.aspose.com/pdf/python-net/aspose.pdf/docsaveoptions/) class provides numerous properties that improve the process of converting PDF files to DOC format. Among these properties, Mode enables you to specify the recognition mode for PDF content. 
You can specify any value from the RecognitionMode enumeration for this property. Each of these values has specific benefits and limitations: -Steps: Convert PDF to DOC in Python +Steps: Convert PDF to DOC in Python -1. Load the PDF into an 'apdf.Document' object. +1. Load the PDF into an 'ap.Document' object. 1. Create a 'DocSaveOptions' instance. 1. Set the format property to 'DocFormat.DOC' to ensure the output is in .doc format (older Word format). 1. Save the PDF as a Word document using the specified save options. @@ -77,9 +58,9 @@ Aspose.PDF for Python API lets you read and convert PDF documents to DOCX using The following Python code snippet shows the process of converting a PDF file into DOCX format. -Steps: Convert PDF to DOCX in Python +Steps: Convert PDF to DOCX in Python -1. Load the source PDF using 'apdf.Document'. +1. Load the source PDF using 'ap.Document'. 1. Create an instance of 'DocSaveOptions'. 1. Set the format property to 'DocFormat.DOC_X' to generate a .docx file (modern Word format). 1. Save the PDF as a DOCX file with the configured save options. @@ -111,37 +92,3 @@ Aspose.PDF for Python presents you online free application ["PDF to Word"](https [![Aspose.PDF Convertion PDF to Word Free App](/pdf/net/images/pdf_to_word.png)](https://products.aspose.app/pdf/conversion/pdf-to-docx) {{% /alert %}} - -## See Also - -This article also covers these topics. The codes are same as above. 
- -_Format_: **Word** -- [Python PDF to Word Code](#python-pdf-to-docx) -- [Python PDF to Word API](#python-pdf-to-docx) -- [Python PDF to Word Programmatically](#python-pdf-to-docx) -- [Python PDF to Word Library](#python-pdf-to-docx) -- [Python Save PDF as Word](#python-pdf-to-docx) -- [Python Generate Word from PDF](#python-pdf-to-docx) -- [Python Create Word from PDF](#python-pdf-to-docx) -- [Python PDF to Word Converter](#python-pdf-to-docx) - -_Format_: **DOC** -- [Python PDF to DOC Code](#python-pdf-to-doc) -- [Python PDF to DOC API](#python-pdf-to-doc) -- [Python PDF to DOC Programmatically](#python-pdf-to-doc) -- [Python PDF to DOC Library](#python-pdf-to-doc) -- [Python Save PDF as DOC](#python-pdf-to-doc) -- [Python Generate DOC from PDF](#python-pdf-to-doc) -- [Python Create DOC from PDF](#python-pdf-to-doc) -- [Python PDF to DOC Converter](#python-pdf-to-doc) - -_Format_: **DOCX** -- [Python PDF to DOCX Code](#python-pdf-to-docx) -- [Python PDF to DOCX API](#python-pdf-to-docx) -- [Python PDF to DOCX Programmatically](#python-pdf-to-docx) -- [Python PDF to DOCX Library](#python-pdf-to-docx) -- [Python Save PDF as DOCX](#python-pdf-to-docx) -- [Python Generate DOCX from PDF](#python-pdf-to-docx) -- [Python Create DOCX from PDF](#python-pdf-to-docx) -- [Python PDF to DOCX Converter](#python-pdf-to-docx) diff --git a/en/python-net/converting/convert-pdfx-to-pdf/_index.md b/en/python-net/converting/convert-pdfx-to-pdf/_index.md new file mode 100644 index 0000000000..fba3351222 --- /dev/null +++ b/en/python-net/converting/convert-pdfx-to-pdf/_index.md @@ -0,0 +1,61 @@ +--- +title: Convert PDF/x to PDF formats in Python +linktitle: Convert PDF/x to PDF formats +type: docs +weight: 120 +url: /python-net/convert-pdf_x-to-pdf/ +lastmod: "2025-09-27" +description: This topic shows you how to convert PDF/x to PDF formats using Aspose.PDF for Python via .NET. 
+sitemap: + changefreq: "monthly" + priority: 0.8 +TechArticle: true +AlternativeHeadline: How to convert PDF/x to PDF formats +Abstract: The article provides a comprehensive guide on converting PDF/UA and PDF/A documents to regular PDF files using Aspose.PDF for Python. +--- + +**Converting PDF/x to PDF means removing PDF/A or PDF/UA compliance from a document so that it is saved as a regular PDF file.** + +## Convert PDF/A to PDF + +1. Load the PDF document using 'ap.Document'. +1. Call 'remove_pdfa_compliance()' to strip all PDF/A-related compliance settings and metadata. +1. Save the resulting PDF to the output path. + +```python + + from os import path + import aspose.pdf as ap + + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) + + document = ap.Document(path_infile) + document.remove_pdfa_compliance() + document.save(path_outfile) + + print(infile + " converted into " + outfile) +``` + +## Convert PDF/UA to PDF + +This snippet removes PDF/UA (Universal Accessibility) compliance from a document so that it is saved as a regular PDF file. + +1. Load the PDF document using 'ap.Document'. +1. Call 'document.remove_pdf_ua_compliance()' to remove the PDF/UA compliance from the document. +1. Save the modified PDF to 'path_outfile'. + +```python + + from os import path + import aspose.pdf as ap + + path_infile = path.join(self.data_dir, infile) + path_outfile = path.join(self.data_dir, "python", outfile) + + document = ap.Document(path_infile) + document.remove_pdf_ua_compliance() + document.save(path_outfile) + + print(infile + " converted into " + outfile) +``` \ No newline at end of file
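Several conversion snippets in this change build the log file path with `path_outfile.replace(".pdf","_log.xml")`. Because `str.replace` rewrites every occurrence of `.pdf`, a directory name such as `archive.pdf.d` would be changed as well. The following is a minimal standard-library sketch that replaces only the final extension; `make_log_path` is a hypothetical helper for illustration, not part of Aspose.PDF.

```python
from os import path


def make_log_path(output_path: str, suffix: str = "_log.xml") -> str:
    """Derive a conversion-log path next to the output file (hypothetical helper).

    Only the final extension is replaced, so dots elsewhere in the
    path (for example, in directory names) are left untouched.
    """
    root, _ext = path.splitext(output_path)
    return root + suffix


# Compare with str.replace, which rewrites every ".pdf" in the string.
example = "archive.pdf.d/report.pdf"
print(example.replace(".pdf", "_log.xml"))  # archive_log.xml.d/report_log.xml
print(make_log_path(example))               # archive.pdf.d/report_log.xml
```

In the snippets, `path_logfile = make_log_path(path_outfile)` could stand in for the `replace` call without changing how `document.convert` is used.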