Programmer's Wiki

Read HTML


Reading HTML programmatically is commonly known as "screen scraping". The idea is to automate the task of reading web pages and extracting the data in them so that it can be analysed. The main problem is that HTML on the web doesn't always follow the standards, so you need code that tolerates or cleans up malformed markup.
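
As a rough illustration of reading data out of imperfect HTML, the sketch below uses Python's standard-library html.parser module to collect link targets. The class name LinkExtractor, the helper extract_links and the sample markup are made up for this example; the parser is lenient about unclosed tags, mixed-case tag names and unquoted attribute values, which is usually enough for simple pages.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Deliberately sloppy markup: unclosed <p> and <a>, upper-case tag,
# unquoted attribute value, missing </html>.
messy = '<html><body><p>Unclosed paragraph<a href="/a">A<li><A HREF=/b>B</a></body>'
print(extract_links(messy))
```

For heavily broken pages a dedicated clean-up library may be a better fit; this sketch only shows the basic idea of walking the tags and pulling out the parts you care about.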

Robots.txt

Webmasters can create a file called robots.txt at the root of their site telling bots, including screen scrapers, which pages, if any, may be read. Compliance is voluntary, but it is a good idea to respect it.
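
One possible way to honour these rules before fetching a page is Python's standard-library urllib.robotparser. In the sketch below the function name allowed, the bot name "ExampleBot", the example.com URLs and the sample rules are all assumptions for illustration; a real scraper would download the site's own robots.txt rather than use a hard-coded string.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's contents as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample rules (made up): every bot is asked to stay out of /private/.
rules = """User-agent: *
Disallow: /private/
"""

print(allowed(rules, "ExampleBot", "https://example.com/index.html"))
print(allowed(rules, "ExampleBot", "https://example.com/private/report.html"))
```

Checking each URL with a helper like this before requesting it keeps the scraper within the limits the webmaster has published.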

See Also
