Regular expressions describe regular languages, and HTML is not a regular language. So while you can do some limited extraction from HTML with a regexp, regexps are not the right tool for the job. Instead, I’d suggest using the Dom.Document and Dom.XmlNode classes.
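Here is a rough sketch of what that could look like. It collects the `href` of every `<a>` element by walking the parsed tree with `Dom.XmlNode.getChildElements()`. The class and method names (`HtmlLinkExtractor`, `extractLinks`, `collect`) are hypothetical. One important assumption: `Dom.Document` is an XML parser, so the input has to be well-formed XHTML. Typical real-world HTML, with things like unclosed `<br>` tags or `&nbsp;` entities, will make `load()` throw an exception.

```apex
// Hypothetical helper class, for illustration only.
public class HtmlLinkExtractor {

    // Returns the href of every <a> element in a well-formed (X)HTML string,
    // in document order.
    public static List<String> extractLinks(String xhtml) {
        Dom.Document doc = new Dom.Document();
        // Throws System.XmlException if the markup is not well-formed XML.
        doc.load(xhtml);

        List<String> links = new List<String>();
        collect(doc.getRootElement(), links);
        return links;
    }

    // Depth-first walk over element nodes (text and comment nodes are skipped
    // because getChildElements() returns elements only).
    private static void collect(Dom.XmlNode node, List<String> links) {
        if (node.getName() == 'a') {
            // Unprefixed attributes have no namespace, so pass null.
            String href = node.getAttribute('href', null);
            if (href != null) {
                links.add(href);
            }
        }
        for (Dom.XmlNode child : node.getChildElements()) {
            collect(child, links);
        }
    }
}
```

Usage, for example from Execute Anonymous:

```apex
List<String> links = HtmlLinkExtractor.extractLinks(
    '<html><body><a href="https://example.com/a">A</a>'
    + '<p><a href="https://example.com/b">B</a></p></body></html>');
System.debug(links);
```

Walking the tree explicitly makes nesting depth irrelevant, which is exactly the part a single regexp cannot handle reliably.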