HTML Content Extraction: Articles, Tables, and Lists