Open Source Wikipedia To Markdown Generator

Open Source Wikipedia To Markdown Generator by Eric David Smith

If you want to convert a Wikipedia article to Markdown, you can use the open source package I wrote to do it in seconds.

I made it because I wanted to convert some Wikipedia articles to Markdown for my personal notes and some AI / ML projects. I couldn't find a simple script to do this, so I wrote one myself. I hope you find it useful.

This is a simple script to convert a Wikipedia article to Markdown and optionally download the images too.


Requirements

  • Python 3


Installation

git clone
cd wikipedia-markdown-generator
pip3 install -r requirements.txt


Usage

python3 <topic_name>


The output is a Markdown file named after the topic, written to a newly created md_output directory. If you want to download images too, use the image-downloading script instead; the images will be placed inside md_output/images/.
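
As a rough illustration of the naming rule described above, here is a minimal sketch that mirrors the filename logic used in the scripts further down. The helper name output_path is hypothetical and is not part of the package.

```python
import os


def output_path(topic, directory="md_output"):
    """Build the Markdown path for a topic (hypothetical helper).

    Spaces in the topic become underscores, as in the scripts below.
    """
    return os.path.join(directory, f"{topic.replace(' ', '_')}.md")


# A topic with a space ends up as Alan_Turing.md inside md_output
print(output_path("Alan Turing"))
```

Keeping the path logic in one small function like this would make it easy to reuse if the two scripts are ever merged.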

Note: eventually, the two scripts will be combined into one script with a flag to download images or not.
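
One way the combined script could expose that choice is an optional argparse switch. This is only a sketch under my own assumptions: the flag name --images and the helper build_parser are hypothetical and do not exist in the current scripts.

```python
import argparse


def build_parser():
    """Build a parser with the existing topic argument plus an image switch."""
    parser = argparse.ArgumentParser(
        description="Generate a markdown file for a provided topic."
    )
    parser.add_argument(
        "topic",
        type=str,
        help="The topic to generate a markdown file for.",
    )
    # Hypothetical flag: off by default, so behavior matches the no-image script
    parser.add_argument(
        "--images",
        action="store_true",
        help="Also download the article's images.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration
args = build_parser().parse_args(["Alan Turing", "--images"])
print(args.topic, args.images)
```

With action="store_true" the flag defaults to False, so leaving it out would keep the current no-image behavior.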



Is This Open Source?

Yes, I wouldn't have it any other way. I hope you find it useful.


There are two scripts, one that downloads images and one that doesn't. I'll show you both.

Without Images

Here's the file:

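import os
import wikipedia
import argparse
import re

def generate_markdown(topic):
    # Fetch the page; give up if the topic is ambiguous or does not exist
    try:
        page = wikipedia.page(topic)
    except wikipedia.exceptions.DisambiguationError as e:
        # Show the candidate titles so the user can pick a more specific topic
        print(e.options)
        return None
    except wikipedia.exceptions.PageError:
        print(f"Page not found for the topic: {topic}")
        return None

    markdown_text = f"# {topic}\n\n"

    # Convert wiki-style headings to Markdown headings (level 3 first, then level 2)
    page_content = re.sub(r"=== ([^=]+) ===", r"### \1", page.content)
    page_content = re.sub(r"== ([^=]+) ==", r"## \1", page_content)

    # Split on level-2 headings; the capture group keeps each heading in the list
    sections = re.split(r"\n(## .*)\n", page_content)
    for i in range(0, len(sections), 2):
        # Append each chunk of text followed by the heading that comes after it
        if i + 1 < len(sections) and any(
            line.strip() for line in sections[i + 1].split("\n")
        ):
            markdown_text += f"{sections[i]}\n{sections[i+1]}\n\n"

    # Create a directory for markdown files
    directory = "md_output"
    os.makedirs(directory, exist_ok=True)

    filename = os.path.join(directory, f"{topic.replace(' ', '_')}.md")

    # Write as UTF-8 so non-ASCII article text is saved correctly on any platform
    with open(filename, "w", encoding="utf-8") as md_file:
        md_file.write(markdown_text)

    print(f"Markdown file created: {filename}")
    return filename

parser = argparse.ArgumentParser(
    description="Generate a markdown file for a provided topic."
)
parser.add_argument(
    "topic",
    type=str,
    help="The topic to generate a markdown file for.",
)

args = parser.parse_args()

topic = f"{args.topic}"
generate_markdown(topic)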
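
To make the heading conversion easier to follow (and to try without a network call), here is a small standalone sketch of the same two regex substitutions and the split on level-2 headings. It is a variation, not the script's exact loop: it pairs each heading with the text that follows it and skips sections whose body is empty. The function name split_sections and the sample text are made up for illustration.

```python
import re


def split_sections(content):
    """Return (intro, [(heading, body), ...]) from wiki-style plain text."""
    # Same substitutions as the script: level 3 first, then level 2
    content = re.sub(r"=== ([^=]+) ===", r"### \1", content)
    content = re.sub(r"== ([^=]+) ==", r"## \1", content)

    # The capture group makes re.split keep the headings:
    # [intro, heading, body, heading, body, ...]
    parts = re.split(r"\n(## .*)\n", content)
    intro, rest = parts[0].strip(), parts[1:]

    # Pair each heading with its own body and drop empty sections
    pairs = zip(rest[0::2], rest[1::2])
    return intro, [(h, b.strip()) for h, b in pairs if b.strip()]


sample = (
    "Intro text.\n\n\n"
    "== History ==\nEarly days.\n\n\n"
    "=== Origins ===\nDetails.\n\n\n"
    "== See also ==\n"
)
intro, sections = split_sections(sample)
print(intro)
for heading, body in sections:
    print(heading)
    print(body)
```

Because the split pattern only matches "## " headings, level-3 headings stay inside the body of their parent section.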


With Images

Here's the file (in case you want to scrape images too):

import os
import wikipedia
import argparse
import re
import requests
import urllib.parse

def generate_markdown(topic):
    # Fetch the page; give up if the topic is ambiguous or does not exist
    try:
        page = wikipedia.page(topic)
    except wikipedia.exceptions.DisambiguationError as e:
        # Show the candidate titles so the user can pick a more specific topic
        print(e.options)
        return None
    except wikipedia.exceptions.PageError:
        print(f"Page not found for the topic: {topic}")
        return None

    markdown_text = f"# {topic}\n\n"

    # Convert wiki-style headings to Markdown headings (level 3 first, then level 2)
    page_content = re.sub(r"=== ([^=]+) ===", r"### \1", page.content)
    page_content = re.sub(r"== ([^=]+) ==", r"## \1", page_content)

    # Split on level-2 headings; the capture group keeps each heading in the list
    sections = re.split(r"\n(## .*)\n", page_content)
    for i in range(0, len(sections), 2):
        # Append each chunk of text followed by the heading that comes after it
        if i + 1 < len(sections) and any(
            line.strip() for line in sections[i + 1].split("\n")
        ):
            markdown_text += f"{sections[i]}\n{sections[i+1]}\n\n"

    # Create a directory for markdown files
    output_directory = "md_output"
    os.makedirs(output_directory, exist_ok=True)

    # Create a directory for image files
    image_directory = os.path.join(output_directory, "images")
    os.makedirs(image_directory, exist_ok=True)

    # Download every image on the page and link it at the end of the Markdown
    for image_url in page.images:
        image_filename = urllib.parse.unquote(os.path.basename(image_url))
        image_path = os.path.join(image_directory, image_filename)
        image_data = requests.get(image_url).content
        with open(image_path, "wb") as image_file:
            image_file.write(image_data)
        markdown_text += f"![{image_filename}](./images/{image_filename})\n"

    filename = os.path.join(output_directory, f'{topic.replace(" ", "_")}.md')

    # Write as UTF-8 so non-ASCII article text is saved correctly on any platform
    with open(filename, "w", encoding="utf-8") as md_file:
        md_file.write(markdown_text)

    print(f"Markdown file created: {filename}")
    return filename

parser = argparse.ArgumentParser(
    description="Generate a markdown file for a provided topic."
)
parser.add_argument(
    "topic",
    type=str,
    help="The topic to generate a markdown file for.",
)

args = parser.parse_args()

topic = f"{args.topic}"
generate_markdown(topic)
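
The image step boils down to turning a URL into a local filename and a Markdown image link. The sketch below isolates that part so it can be run without downloading anything. It also makes one change that is my own assumption rather than the script's behavior: the link target is percent-encoded again, because a decoded filename containing a space would otherwise not form a valid Markdown link destination. The helper name image_markdown and the example URL are hypothetical.

```python
import os
import urllib.parse


def image_markdown(image_url, image_dir="./images"):
    """Return (local filename, Markdown image link) for an image URL."""
    # Decode percent-escapes so the file on disk has a readable name
    filename = urllib.parse.unquote(os.path.basename(image_url))
    # Re-encode only for the link target, so spaces do not break the link
    target = f"{image_dir}/{urllib.parse.quote(filename)}"
    return filename, f"![{filename}]({target})"


name, link = image_markdown("https://example.org/media/Red%20panda.jpg")
print(name)
print(link)
```

The decoded name is what would be written to md_output/images/, while the encoded form is used only inside the Markdown link.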



This project is licensed under the MIT License - see the LICENSE file for details.


If you find this useful as is, please let me know. If you find any bugs, please feel free to submit a pull request or open an issue. If you have any questions, you can contact me.

Supporting My Work

Please consider Buying Me A Coffee. I work hard to bring you my best content and any support would be greatly appreciated. Thank you for your support!

Eric David Smith
Father / Software Engineer / Musician / Entrepreneur
