Remove HTML Tags in Javascript with Regex

I am trying to remove all the HTML tags from a string in JavaScript. Here's what I have, but I can't figure out why it's not working. Does anyone know what I'm doing wrong?

<script type="text/javascript">

var regex = "/<(.|\n)*?>/";
var body = "<p>test</p>";
var result = body.replace(regex, "");
alert(result);

</script>

Thanks a lot!

Answers:

Answer

Try this, noting that the grammar of HTML is too complex for regular expressions to be correct 100% of the time:

var regex = /(<([^>]+)>)/ig
,   body = "<p>test</p>"
,   result = body.replace(regex, "");

console.log(result);
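The key difference from the question's code is that `regex` here is a regex literal rather than a string. When `String.prototype.replace` receives a string pattern, it searches for that exact text (slashes included) instead of compiling a regular expression, so nothing is found and the input comes back unchanged. A small comparison using the question's own input and pattern:

```javascript
// Pattern passed as a string: replace() looks for the literal text
// "/<(.|\n)*?>/", which never occurs in the input, so nothing changes.
var asString = "<p>test</p>".replace("/<(.|\n)*?>/", "");

// Regex literal without the g flag: only the first match is replaced.
var firstOnly = "<p>test</p>".replace(/<(.|\n)*?>/, "");

// Regex literal with the g flag: every tag is removed.
var asRegex = "<p>test</p>".replace(/<(.|\n)*?>/g, "");

console.log(asString);  // <p>test</p>
console.log(firstOnly); // test</p>
console.log(asRegex);   // test
```

The answer's `[^>]+` form also avoids the `(.|\n)` alternation, since a negated character class already matches newlines.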

If you're willing to use a library such as jQuery, you could simply do this:

console.log($('<p>test</p>').text());
Answer

Here is how TextAngular (a WYSIWYG editor) does it. I have also found this to be the most consistent approach, and it uses no regex at all.

/*
 * @license textAngular
 * Author : Austin Anderson
 * License : 2013 MIT
 * Version 1.5.16
 */
// turn html into pure text that shows visibility
function stripHtmlToText(html)
{
    var tmp = document.createElement("DIV");
    tmp.innerHTML = html;
    var res = tmp.textContent || tmp.innerText || '';
    // replace() returns a new string, so assign it back; /g removes every zero-width space
    res = res.replace(/\u200B/g, '');
    res = res.trim();
    return res;
}
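One detail worth knowing when adapting this function: strings are immutable, so `String.prototype.replace` returns a new string and its result has to be assigned back (as in `res = res.replace(...)`), and a string pattern only replaces the first occurrence. A quick comparison on a made-up string containing two zero-width spaces:

```javascript
var text = "a\u200Bb\u200Bc"; // two zero-width spaces, length 5

// Calling replace() without keeping the result leaves `text` unchanged.
text.replace("\u200B", "");

// A string pattern replaces only the first occurrence.
var oneRemoved = text.replace("\u200B", "");

// A global regex removes every occurrence.
var allRemoved = text.replace(/\u200B/g, "");

console.log(text.length);       // 5
console.log(oneRemoved.length); // 4
console.log(allRemoved);        // abc
```

In engines that support ES2021, `text.replaceAll("\u200B", "")` is an alternative to the global regex.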
Answer

You can use underscore.string.js, a powerful string-manipulation library:

_('a <a href="#">link</a>').stripTags()

=> 'a link'

_('a <a href="#">link</a><script>alert("hello world!")</script>').stripTags()

=> 'a linkalert("hello world!")'

Don't forget to include the library like this:

        <script src="underscore.js" type="text/javascript"></script>
        <script src="underscore.string.js" type="text/javascript"></script>
        <script type="text/javascript"> _.mixin(_.str.exports())</script>
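As the second example shows, `stripTags()` keeps the body of a `<script>` element as plain text. If you also want script and style contents gone, here is a regex-only sketch; `stripTagsAndScripts` is a hypothetical helper, not part of underscore.string, and it shares the usual caveats about regexes and malformed HTML:

```javascript
// Hypothetical helper: drop <script> and <style> elements together with
// their contents, then remove any remaining opening or closing tags.
function stripTagsAndScripts(html) {
  return String(html)
    // \1 refers back to "script" or "style" so the matching end tag is used.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    .replace(/<\/?[^>]+>/g, "");
}

console.log(stripTagsAndScripts('a <a href="#">link</a><script>alert("hello world!")</script>'));
// a link
```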
Answer

My simple JavaScript library, FuncJS, has a function called "strip_tags()" that does this for you, without requiring you to write any regular expressions.

For example, to remove the tags from a sentence, you can simply write:

strip_tags("This string <em>contains</em> <strong>a lot</strong> of tags!");

This will produce "This string contains a lot of tags!".

For details, please read the FuncJS documentation on GitHub.

Additionally, if you'd like, please provide some feedback through the form. It would be very helpful to me!

Answer

For a proper HTML sanitizer in JS, see http://code.google.com/p/google-caja/wiki/JsHtmlSanitizer

Answer

The way I do it is practically a one-liner.

The function creates a Range object and then creates a DocumentFragment in the Range with the string as the child content.

Then it grabs the text of the fragment, removes any "invisible"/zero-width characters, and trims it of any leading/trailing white space.

I realize this question is old, but I thought my solution was unique and wanted to share it. :)

function getTextFromString(htmlString) {
    return document
        .createRange()
        // Creates a fragment and turns the supplied string into HTML nodes
        .createContextualFragment(htmlString)
        // Gets the text from the fragment
        .textContent
        // Removes the Zero-Width Space, Zero-Width Joiner, Zero-Width No-Break Space, Left-To-Right Mark, and Right-To-Left Mark characters
        .replace(/[\u200B-\u200D\uFEFF\u200E\u200F]/g, '')
        // Trims off any extra space on either end of the string
        .trim();
}

var cleanString = getTextFromString('<p>Hello world! I <em>love</em> <strong>JavaScript</strong>!!!</p>');

alert(cleanString);
Answer

This is an old question, but I stumbled across it and thought I'd share the method I used:

var body = '<div id="anid">some <a href="link">text</a></div> and some more text';
var temp = document.createElement("div");
temp.innerHTML = body;
var sanitized = temp.textContent || temp.innerText;

sanitized will now contain: "some text and some more text"

Simple, no jQuery needed, and it shouldn't let you down even in more complex cases :)

James

Answer

This worked for me.

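var tt = "<p>Hello&nbsp;world</p>"; // sample input; replace with your own string
var regex = /(&nbsp;|<([^>]+)>)/ig
  , body = tt
  , result = body.replace(regex, "");
alert(result); // "Helloworld" (&nbsp; is removed, not turned into a space)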
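Removing `&nbsp;` outright, as this pattern does, glues neighboring words together. If you would rather keep the text readable, one option is to strip the tags first and then decode a few common entities in a single pass. A minimal sketch, assuming only the entities in the hypothetical `ENTITIES` table need handling:

```javascript
// Hypothetical lookup table: extend it with whatever entities your input uses.
// &nbsp; is mapped to a regular space here rather than U+00A0.
var ENTITIES = {
  nbsp: " ",
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  "#39": "'"
};

function htmlToPlainText(html) {
  return html
    // 1. Remove tags first, so decoded "&lt;" text is not mistaken for a tag.
    .replace(/<[^>]*>/g, "")
    // 2. Decode known entities in one pass; unknown ones are left untouched.
    .replace(/&(nbsp|amp|lt|gt|quot|#39);/g, function (match, name) {
      return ENTITIES[name];
    });
}

console.log(htmlToPlainText("<p>Fish&nbsp;&amp;&nbsp;chips &lt;3</p>"));
// Fish & chips <3
```

Decoding in a single pass also means double-escaped text such as `&amp;lt;` becomes `&lt;` rather than `<`, which is usually what you want.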
Answer

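This handles HTML tags as well as entities such as &nbsp;. You can add or remove alternatives in the pattern to match your input, and replace the matches with any string you like (an empty string simply removes them).

function convertHtmlToText(passHtmlBlock)
{
   var str = passHtmlBlock.toString();
   // Remove tags (including an unterminated one at the very end) and a few entities.
   // Change the replacement string if you want something other than removal.
   return str.replace(/<[^>]*(>|$)|&nbsp;|&zwnj;|&raquo;|&laquo;|&gt;/g, '');
}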
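The `(>|$)` alternative in that pattern is what handles an unterminated tag at the very end of the input, for example HTML that was cut off mid-tag. A small comparison against the plain `<[^>]+>` pattern, using a made-up truncated string:

```javascript
var truncated = "a <b>bold</b> <i";

// Requires a closing ">", so the trailing "<i" survives.
var plain = truncated.replace(/<[^>]+>/g, "");

// Also accepts end-of-input in place of ">", so "<i" is removed too.
var withEnd = truncated.replace(/<[^>]*(>|$)/g, "");

console.log(JSON.stringify(plain));   // "a bold <i"
console.log(JSON.stringify(withEnd)); // "a bold "
```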
Answer
<html>
<head>
<script type="text/javascript">
function striptag(){
  var html = /(<([^>]+)>)/gi;
  // Strip tags from the value of every form field passed in.
  for (var i = 0; i < arguments.length; i++)
    arguments[i].value = arguments[i].value.replace(html, "");
}
</script>
</head> 
<body>
       <form name="myform">
<textarea class="comment" title="comment" name=comment rows=4 cols=40></textarea><br>
<input type="button" value="Remove HTML Tags" onClick="striptag(this.form.comment)">
</form>
</body>
</html>
Answer

The selected answer doesn't always ensure that HTML is stripped, as it may still be possible to get markup back out of the stripping step by crafting a string like the following.

  "<<h1>h1>foo<<//</h1>h1/>"

The idea is that removing the inner tags glues the leftover fragments back together into something resembling:

  "<h1>foo</h1>"

(the exact output depends on which stripping method you use).

Additionally, jQuery's text() function will strip text that isn't surrounded by tags.

Here's a function that uses jQuery but should be more robust against both of these cases:

var stripHTML = function(s) {
    var lastString;

    do {            
        s = $('<div>').html(lastString = s).text();
    } while(lastString !== s) 

    return s;
};
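The same repeat-until-nothing-changes idea can be sketched without jQuery, for example in Node where there is no DOM. This version is an assumption rather than a drop-in equivalent: it alternates regex tag stripping with decoding of three basic entities, so markup hidden behind `&lt;`/`&gt;` is exposed and then stripped on the next pass. Every pass that changes the string also makes it shorter, so the loop always terminates.

```javascript
// Decode only &lt;, &gt; and &amp; (a deliberately small, hypothetical subset).
function decodeBasicEntities(s) {
  return s.replace(/&(lt|gt|amp);/g, function (match, name) {
    return { lt: "<", gt: ">", amp: "&" }[name];
  });
}

// Strip tags, decode, and repeat until the string stops changing.
function stripHTMLNoDom(s) {
  var lastString;
  do {
    lastString = s;
    s = decodeBasicEntities(s.replace(/<[^>]*>/g, ""));
  } while (lastString !== s);
  return s;
}

console.log(stripHTMLNoDom("&lt;b&gt;bold&lt;/b&gt; text")); // bold text
```

Unlike an HTML parser, the regex treats any `<...>` span as a tag, so text such as `1 &lt; 2 and 3 &gt; 1` loses its middle once decoded (it becomes `1  1`). The jQuery version avoids that particular problem because the browser's HTML parser does not treat `< 2` as the start of a tag.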
Answer

As others have stated, regex will not work. Take a moment to read my article about why you cannot and should not try to parse HTML with regex, which is what you're doing when you attempt to strip HTML from your source string.
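One concrete way a tag-stripping regex goes wrong: a `>` is legal inside a quoted attribute value or a comment, but a pattern like `<[^>]+>` stops at the first `>` it sees. Using the regex from the first answer on two made-up inputs:

```javascript
var tagRegex = /(<([^>]+)>)/ig;

// A ">" inside a quoted attribute value ends the "tag" early.
var attr = '<a title="x>y">link</a>'.replace(tagRegex, "");

// A ">" inside a comment does the same.
var comment = "<!-- a > b -->text".replace(tagRegex, "");

console.log(attr);    // prints: y">link
console.log(comment); // prints:  b -->text (with a leading space)
```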

©2020 All rights reserved.