Exporting Richly Formatted Text In Python

Today, I wondered whether I could automatically save an image of colored text from the Python console.  I was looking for a way to display very long strings that automatically wrapped to its container, so I avoided the dreaded run-on string that never ends. Also, could I save the image elegantly with high resolution? In this article, I will discuss potential approaches for getting a file with nice-looking colored text that can be programmatically generated.

Option #1: Colored Text to Image

My first thought was to save colored text from the terminal as an image. However, this does not seem possible. All images that I found online were screenshots of the terminal, not programmatically images. This was disappointing because it is pretty straightforward to produce nice-looking text that emphasizes selected words with bright colors in the console using the colorama or termcolor packages as I’ve shown in a previous post

The major drawback of this approach is that I have to manually screenshot all of my outputs or I would need a program outside of Python to capture the outputs, which was a no-go for me.

Option #2: HTML to Image

Another option was to generate formatted text and save it as an html file then automatically convert the content into an image. It is fairly straightforward to write python code that will output a HTML file, if you know a bit of CSS and HTML. However, how would I automatically extract the images? I found a library called imgkit that supposedly converts HTML to an image format, but it did not work, so I used Selenium instead to capture screenshots of webpages. 

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
 
#--| Setup
options = Options()
options.add_argument("--headless")
options.add_argument("window-size=1024,960")
browser = webdriver.Chrome(executable_path=r'C:\Users\...\chromedriver_win32\chromedriver.exe', options=options)
#--| Parse or automation
browser.get('https://www.google.com/')
browser.save_screenshot("screenshot.png")
browser.close()
Screenshot image of Google’s homepage generated by code above

However, I would need to spend time tediously formatting text as HTML and tweak the image size settings to crop the output image, so next!

Option #3: Text Drawing using Pillow

We could generate a customized image with a selected background and font type of text with the Python library Pillow. The parameters optimize and quality can be used to control the size of the output image. Check this site for details on how to automatically wrap text to the size of your rectangular canvas. Now, nice-looking images can be created on the fly, but this seems like overkill for my intended purpose.

Option #4: Save Text as a PDF or MS Word File

Another option was to save my text as a PDF and to highlight/color selected words in the file. This could work well because PDFs can fit lots of text on one page and are pretty popular. There should be sufficient documentation and libraries in Python on writing to PDFs, which is a plus. However, PDFs are not editable, a feature that should facilitate fast and easy feedback from my end users. You can use reportlab or pyfpdf libraries to create PDFs in Python. ReportLab is a good solution for creating full-featured PDFs. It also has a commercial version called ReportLab-Plus that companies can utilize for complex solutions. PyFPDF is simple, compact to use for creating PDFs from pure PHP (see http://www.fpdf.org/ for more info). PyFPDF has more extensive documentation and seems more popular. 

With a Microsoft Word file, the format of the text can be easily changed – programmatically or by the end user.  Python-docx can be used to read and write MS docx files in Python.  There are many capabilities such as adding paragraphs, images, page breaks and headers as well as styling (bold, italic, underline and emboss).

Option #5: Save Text as a MS Excel File

This option is similar to the MS Word option, except it gives the end user more information in a tabular format that provides increased flexibility for searching and filtering. From the workbook, we can format the strings in a cell to have various colors, be bold/italic or be center-aligned using xlsxwriter library. Pandas uses xlsxwriter by default to write excel files, which makes formatting strings easy to integrate into our workflow. You can use the write_rich_string method developed by John NcNamara here. Below is some example code to illustrate the formatting capability of xlsxwriter:

import re
import xlsxwriter

workbook = xlsxwriter.Workbook('test.xlsx')
worksheet = workbook.add_worksheet()
bold = workbook.add_format({'bold': True, 'color': 'red'})
ital = workbook.add_format({'italic': True,'color': 'blue'})
green = workbook.add_format({"font_name":'Times New Roman','font_size':20,'color': 'green'})
cell_formats = [bold,ital, green]

# Make the column wider for clarity.
worksheet.set_column('A:A', 35)

# Some sample test data.
strings = ["datathrillz.com is the best!",
           "datathrillz.com is the best!",
           "datathrillz.com is the best!"]

# Iterate over the strings.
for row_num, string in enumerate(strings):
    # Split the sentence on the word "best", on word boundaries.
    segments = re.split(r'(\bdatathrillz.com\b)', string)

    if len(segments) == 1:
        # String doesn't have a match, write as a normal string.
        worksheet.write(row_num, 0, string)
    else:
        # Iternate through the segments and add a format before the matches.
        tmp_array = []
        for segment in segments:
            if segment == 'datathrillz.com':
                # Add a bold format before the matched word.
                tmp_array.append(cell_formats[row_num])
                tmp_array.append(segment)
            elif not segment:
                # Ignore blank segments.
                pass
            else:
                tmp_array.append(segment)

        worksheet.write_rich_string(row_num, 0, *tmp_array)
workbook.close()
Result in Excel generated from above code

For guidance on writing Excel files based on dynamically valued strings, check out this Stackoverflow thread.

I decided to save the formatted text as an Excel file in order to permit users to search and filter by columns. I had hundreds of snippets of text to be formatted for various groups, so exporting as a table makes more sense than exporting the snippets individually. Also, because Excel is a pretty popular tool that non-technical users don’t fear using, I expect increased engagement with the data which will hopefully facilitate insights. 

We have discussed 5 approaches for automatically saving formatted text using Python. With several choices, we considered the pros and cons of each approach before selecting the best one for our use case.  

Do you have a better way to generate and capture colored text? 

Exporting Richly Formatted Text In Python

Leave a Reply

Your email address will not be published. Required fields are marked *

Scroll to top