! Note !
A few more notes and links may be added,
if/when I revisit this page in the future.
In 2014, I found that I was getting more and more frustrated with encountering web pages that took minutes --- not seconds, MINUTES --- to load into my web browser.
I noticed that at the bottom of my web browser, there is a 'status line' which shows the web hostnames that are being accessed (connected to) --- as the web page is being processed by my web browser.
I could tell which web hosts were causing a page to load slowly --- because the web host name would be 'frozen' in the status line for many seconds --- sometimes a minute or more.
(I like to use the Mozilla 'Seamonkey' web browser --- because it has a robust 'Bookmarks Manager' --- to deal with the thousands of bookmarks that I have collected over the years. The Seamonkey web browser is similar to the Mozilla Firefox browser in many ways.)
(I often browse the Internet on a little netbook computer, while watching TV --- to have something useful to do while the cable TV channels --- FOR WHICH I PAY A VERY HEALTHY MONTHLY FEE --- bombard me with a half-dozen to a dozen ads at a time --- many times per hour. The old Intel Atom N450 chip on that netbook computer is rated less than half as fast as higher level Intel chips --- such as 'i3' or 'i5' or 'i7' --- but it still should browse most sites quickly. The pages of my site --- this site --- come up quickly on that netbook computer --- as does the simple Google search page.)
Often a web page would appear to be completely loaded, but my cursor would be 'spinning' --- showing that the web browser was still 'busy' --- and a hostname would be 'frozen' in the status line while my web browser program was apparently trying to connect to that host --- apparently with little success. (OR that Internet host was trying to collect a heck of a lot of information from my computer --- OR that host's processing was caught in a loop of its own twisted making.)
Some web hostnames would flash by so fast I could not read them. But some would stay in the 'status line' for many seconds --- hostnames like those marked with an exclamation point in the image at the top of this page:
These are all Google related. (Google bought DoubleClick in 2008.) Some other web hostnames would also appear 'frozen' in my web browser status line. Names like
These are all '3rd party' hosts that are getting between me (the 'first party') and the web site (the '2nd party') that I am trying to visit. Typically I arrive at a web page because a web search has led me to a page at the web site --- web pages at sites like
The occurrences of 'brown-outs' and 'freezes' like these were getting so annoying (and were becoming such a major time-waster) that I decided to look for a solution.
I found that I mainly needed to avoid connecting to certain hostnames that were not providing any service to me --- but presumably some service to the maintainers of the web site --- such as gathering my personal information or popping up 'monetizing' ads in my face. F**k that sh*t.
Host Block Lists
When I did a web search on words like 'google-analytics slow web browsing', I found many pages with titles like 'Tired of waiting for www.google-analytics.com' and 'Why I removed Google Analytics from my website'.
I immediately found that some people were blocking connection to such hostnames via lines like
added to a 'local hosts file'. On Linux operating systems, the fully-qualified name of that file is '/etc/hosts'.
I found that when I did a web search on words like 'hosts block list', the first hits that came up were at winhelp2002.mvps.org/hosts.htm --- titled 'Blocking Unwanted Connections with a Hosts File' --- and someonewhocares.org/hosts/ --- titled 'Using a Hosts File To Make The Internet Not Suck (as much)'.
These are sites that offer thousands of lines that you can add to your 'hosts file' to block connections to offending hosts.
I immediately had several questions:
It was not easy to find answers to these questions via web searches.
Question-1: '0.0.0.0' or '127.0.0.1' ?
There has been some controversy over whether '0.0.0.0' or '127.0.0.1' results in faster blocking.
As far as Linux goes (in 2015 --- or, more specifically, with Ubuntu 9.10), there appears to be no significant difference. I found that when I made an '/etc/hosts' file containing lines like
0.0.0.0 www.google-analytics.com
and when I tested by doing a 'ping' of such a host, I got the following type of result:
PING www.google-analytics.com (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.040 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.049 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.040 ms
64 bytes from localhost (127.0.0.1): icmp_seq=4 ttl=64 time=0.039 ms
Note that although I had '0.0.0.0' in the /etc/hosts file, the 'ping' was indicating that the hostname was being 'automatically' translated to '127.0.0.1'.
Furthermore, note that the 'pings' were being resolved in hundredths of a millisecond rather than the typical tens or hundreds of milliseconds. Example:
PING www.google.com (188.8.131.52) 56(84) bytes of data.
64 bytes from yh-in-f104.1e100.net (184.108.40.206): icmp_seq=1 ttl=46 time=30.1 ms
64 bytes from yh-in-f104.1e100.net (220.127.116.11): icmp_seq=2 ttl=46 time=29.7 ms
64 bytes from yh-in-f104.1e100.net (18.104.22.168): icmp_seq=3 ttl=46 time=30.6 ms
So the '0.0.0.0' statements in the /etc/hosts file are doing their job of blocking the connection to an external host --- by fooling the web browser (and 'ping') into trying a local connection.
At one time, the 'winhelp2002.mvps.org' site said that there was no difference in using '0.0.0.0' or '127.0.0.1'. However, in 2015 Jan, I see the page winhelp2002.mvps.org/hosts.htm contains the following sentence.
The HOSTS file now contains a change in the prefix in the HOSTS entries to "0.0.0.0" instead of the usual "127.0.0.1". This was done to resolve a slowdown issue that occurs with the change Microsoft made in the "TCP loopback interface" in Win8.1.
So, on some operating systems, you may find a difference between using '0.0.0.0' and '127.0.0.1'.
I am using '0.0.0.0' on my Ubuntu 9.10 (2009 October, Karmic Koala) version of Linux.
By the way, the following image indicates how I edit (add hostname lines to) my /etc/hosts file.
I backed up my original /etc/hosts file to /etc/hosts_ORIG by using a 'sudo cp' command.
Nowadays I go into edit mode on the 'hosts' file using a command like 'sudo gedit hosts'. I enter (or paste) new hostname lines into the editor window. I simply use the 'Save' option when done.
I also use 'Save As...' to make a copy to a backup file with a name like 'hosts_2015jan30'.
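Pasting lines by hand works fine, but the same step can be scripted. Below is a minimal Python sketch, assuming the hosts file has already been read into a string. The function name 'add_block_lines' and the sample text are my own illustrations, not part of any tool. It appends a '0.0.0.0' line for each hostname that does not already have an active entry, and it only returns text; writing the result back to /etc/hosts (as root, after a backup) is left to you.

```python
def add_block_lines(hosts_text, hostnames, sink="0.0.0.0"):
    """Return hosts_text with a block line appended for each new hostname."""
    # Collect every hostname that already has an active (uncommented) entry.
    known = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        known.update(f.lower() for f in fields[1:])  # fields[0] is the IP address

    lines = hosts_text.splitlines()
    for name in hostnames:
        name = name.strip().lower()
        if name and name not in known:
            lines.append(f"{sink} {name}")
            known.add(name)
    return "\n".join(lines) + "\n"


# Hypothetical sample content, for illustration only.
sample = "127.0.0.1 localhost\n0.0.0.0 www.google-analytics.com\n"
print(add_block_lines(sample, ["ad.doubleclick.net", "www.google-analytics.com"]))
```

Hostnames are compared in lower case, because hostname lookups are not case-sensitive.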
Unfortunately, the 'hosts.deny' and 'hosts.allow' files are
not typically used to block communication to certain hosts
from web browsers. Those two files belong to the 'TCP Wrappers'
system, which is oriented toward filtering *incoming*
connections to local services (such as ssh logins). The 'hosts.deny' and
'hosts.allow' files 'apparently' cannot be used to block *outgoing*
TCP communications --- from apps such as web browsers
using the HTTP protocol.
These two files would otherwise have been a great option, because they allow for blocking (or allowing) entire ranges of IP addresses with a single control statement. (See 'iptables' notes below.)
Question-2: Huge Hosts File Affects Performance ?
In doing web searches on this question, I found a few people who claimed that you can have many thousands of entries in the 'hosts file' without impacting performance in the 'host lookups'.
However, I finally found a web page (a 2009 posting at 'discussions.apple.com') that indicated that a really big 'hosts file' (16,000-plus lines) can have a really bad impact on computer performance (using 100% of all 8 cores of an 8-core CPU --- for 'Directory Services').
I prefer to assemble my own /etc/hosts file --- adding hosts as I encounter them.
And if I find that I am accumulating many thousands of hostnames, I may comment out many of the ones that seem to 'resolve' very quickly.
The hostnames that actually show up in the 'status line' of my web browser for at least a second or two, I will enter (or leave uncommented) in my /etc/hosts file.
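Commenting a block line out and back in can also be scripted. Here is a small Python sketch under the same assumption as before, namely that the hosts file text is held in a string. The name 'set_blocked' and the 'SINKS' set are hypothetical helpers of my own. Only lines that start with '0.0.0.0' or '127.0.0.1' are treated as block lines, so ordinary comment lines are left alone.

```python
SINKS = {"0.0.0.0", "127.0.0.1"}  # addresses conventionally used for block lines


def set_blocked(hosts_text, hostname, blocked):
    """Un-comment (blocked=True) or comment out (blocked=False) the block line for hostname."""
    wanted = hostname.lower()
    out = []
    for line in hosts_text.splitlines():
        body = line.lstrip("# \t")              # the line without any leading comment marker
        fields = body.split("#", 1)[0].split()  # ignore a trailing '# note'
        is_block_line = len(fields) >= 2 and fields[0] in SINKS
        if is_block_line and wanted in (f.lower() for f in fields[1:]):
            line = body if blocked else "# " + body
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical sample content, for illustration only.
hosts = "127.0.0.1 localhost\n0.0.0.0 www.google-analytics.com\n"
print(set_blocked(hosts, "www.google-analytics.com", False))
```

Note that a '127.0.0.1 localhost' line also looks like a block line to this sketch, so do not pass it 'localhost'.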
Question-3: How to capture offensive hostnames from a web page ?
The status line ? (NO)
Since many of the offensive hostnames 'zipped by' on the 'status line' of my web browser, I needed to find a different way to gather the offensive hostnames from a web page.
The web page source ? (NO)
Browsing the source of a web page would be very tedious and time-consuming. So using the 'View > Page Source' option of my web browser was not going to cut it --- even if I saved the page-source to a file and applied a script to that file to extract the pertinent hostnames. (It would be a challenge to write such a script.)
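For what it is worth, a rough first cut of such a script is not very long in Python. The sketch below is only that, a sketch. It assumes the page source has been saved and read into a string, and it looks only at 'src' and 'href' attributes of script, img, iframe and link tags. The class and function names, the sample page, and the file names in it (ga.js, pixel.gif, and so on) are all made up for illustration. It will miss any hosts that scripts contact after they start running, which is one reason a browser's own listing is more complete.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HostCollector(HTMLParser):
    """Collect hostnames from tags whose src/href the browser typically fetches without a click."""

    FETCHED_TAGS = {"script", "img", "iframe", "link"}

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in self.FETCHED_TAGS:
            return  # skip <a> anchors: they are only followed when clicked
        for name, value in attrs:
            if name in ("src", "href") and value:
                host = urlparse(value).hostname  # None for relative URLs
                if host:
                    self.hosts.add(host)


def third_party_hosts(page_source, site_host):
    """Return the sorted hostnames referenced by the page, other than site_host itself."""
    collector = HostCollector()
    collector.feed(page_source)
    return sorted(h for h in collector.hosts if h != site_host)


# Hypothetical page source, for illustration only.
page = (
    '<html><head><script src="http://www.google-analytics.com/ga.js"></script>'
    '<link rel="stylesheet" href="/style.css"></head>'
    '<body><a href="http://elsewhere.example/">a link</a>'
    '<img src="//ad.doubleclick.net/pixel.gif">'
    '<img src="http://www.example.com/logo.png"></body></html>'
)
for host in third_party_hosts(page, "www.example.com"):
    print("0.0.0.0", host)
```

The output is in hosts-file form, so it can be reviewed by eye before any of it goes into /etc/hosts.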
The 'tcpdump' command ? (NO)
Another solution I considered was using the 'tcpdump' command --- just before loading a web page --- to capture the hostnames from SYN or ACK packets.
However, I found that rather than capturing names like 'www.google.com', the 'tcpdump' command was showing names like 'yh-in-f104.1e100.net' --- the names that 'tcpdump' gets from a reverse lookup of the IP address, not the hostname that the web browser originally asked for. This was probably not going to work well --- to block the connection to the original hostname.
The 'Page Info' option of Seamonkey ? (YES)
In searching the interface of my Seamonkey web browser, I tried out the various icons on the periphery of the web browser window.
On the lower-right of the browser window, there was a pad-lock icon as seen in the following image. It referred to 'security information'.
When I clicked on the pad-lock, the following 'Page Info' panel appeared.
When there are icons on an application window, typically the icon function is also available via a drop-down menu from the top 'toolbar' of the application window. So I started looking for such a drop-down menu.
I found the 'View > Page Info' option in the Seamonkey drop-down menu --- as seen in the following image.
When I clicked on this 'Page Info' option, the 'Page Info' panel appeared --- but it looked a little different from the 'Page Info' panel that I got from clicking on the pad-lock icon.
The following image indicates that the text in the 'General' tab is shown --- whereas the text in the 'Security' tab is shown when you click on the pad-lock icon.
Clicking on the 'Links' tab revealed a listing of 'Scripts' --- and 'Related Items' and 'Stylesheets' --- similar to the 'Page Info' list above.
And scrolling to the bottom of the 'Links' window revealed the same 'Script' lines as in the 'Page Info' list above.
And I can see the 'PageInfo > Links' window for any web page by any of 3 methods:
Question-4: Block by IP-address-ranges rather than individual hostnames ?
I have seen that some of these 'offensive' 3rd-party hosts refer to themselves as providers of 'SaaS' (Software as a Service).
I look at them as providers of 'SaaD' (Software as a Dis-service).
Unfortunately, there are unscrupulous 'SaaD' people out there who are purposely making it hard for people to block their hosts. They do this by generating essentially hundreds of possible hostnames in their 'domain'. Examples:
Example-1 : (intellitxt)
'intellitxt' adds customer/company names to some hostnames that they use to provide their 'dis-service'. Example:
They have other 'fixed' hostnames, such as 'images.intellitxt.com' --- which I block.
I am reluctant to add, to my 'hosts' file, the hostnames with customer-ID's in the hostname. I may end up with hundreds of such hostnames in my 'hosts' file.
Example-2 : (cloudfront)
In January 2015, it appeared that 'cloudfront' added today's date --- in the form yyyy-mm-dd --- to some of the hostnames that they were using to provide their 'dis-service'. Example:
(That fourth set of digits is probably an hour-of-the-day.)
In February 2015, I found that 'cloudfront' was apparently generating 'randomized' hostnames, about 13 or 14 characters long, such as the following.
d3v27wwd40f0xu.cloudfront.net # at tigerdirect.com (2015feb17)
dnn506yrbagrg.cloudfront.net # at linuxjournal.com (2015feb17)
When I pinged them 4 days later, they were still ping-able --- at the following IP addresses.
d3v27wwd40f0xu.cloudfront.net # at 54.230.18.168 = server-54-230-18-168.iad12.r.cloudfront.net
dnn506yrbagrg.cloudfront.net # at 54.240.160.26 = server-54-240-160-26.iad12.r.cloudfront.net
It is not clear to me, yet, for how long these 13/14-character 'cloudfront.net' hostnames will be 'defined'. But 'cloudfront' may use some method of changing the hostnames that they use, over time.
(In any case, their hostnames will probably still be associated with IP addresses of their 'more permanent' hosts --- which may continue to use 'server-*-cloudfront.net' hostnames.)
Note that 'cloudfront' is an Amazon entity. Amazon started Cloudfront in 2008.
Amazon has other such companies, such as Goodreads, started in 2007, which has hostnames such as 's.gr-assets.com' --- which I block because it was causing some serious slowdowns.
It is somewhat ironic (and very disgusting) that two of the most useful websites on the Internet --- www.google.com and www.amazon.com --- are also responsible for a large part of the shenanigans (and slowdowns) that are going on 'underneath the covers' via organizations like 'doubleclick' and 'cloudfront' and 'goodreads'.
It appears that these 'cloudfront' and 'intellitxt' A-holes are going to lead us 'end-users' to look up the IP-address ranges that have been assigned to them, and then block the entire IP-address ranges.
There are plenty of web pages on the Internet showing how to block incoming IP-address ranges (for example, from China and Korea) by using the 'iptables' command on Linux operating systems. Example:
iptables -A INPUT -s 188.8.131.52/12 -j DROP
iptables -A INPUT -s 184.108.40.206/12 -j DROP
iptables -A INPUT -s 220.127.116.11/11 -j DROP
iptables -A INPUT -s 18.104.22.168/11 -j DROP
iptables -A INPUT -s 22.214.171.124/18 -j DROP
However, we will need to block the outgoing connection attempts from our web browser. Presumably, we will be able to do that with commands like
iptables -A OUTPUT -p tcp -d 126.96.36.199/19 -j DROP
I would probably drop the '-p tcp' parameter, because I would want to block ALL output to the IP address range --- whether the protocol was TCP, UDP, or whatever.
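To avoid typing one 'iptables' line per range, the command lines can be generated. The following Python sketch only builds and prints the command strings. It does not run them, since 'iptables' needs root and I would want to read the lines before applying them. The function name is hypothetical, and the two ranges are reserved documentation addresses used as placeholders, not ranges belonging to any of the companies mentioned here. The only 'iptables' options used are the ones shown above (-A OUTPUT, -p, -d, -j DROP).

```python
import ipaddress


def output_drop_commands(cidr_ranges, protocol=None):
    """Build one 'iptables -A OUTPUT ... -j DROP' command string per CIDR range."""
    commands = []
    for cidr in cidr_ranges:
        # strict=False normalises a range such as 198.51.100.77/22 to 198.51.100.0/22.
        network = ipaddress.ip_network(cidr, strict=False)
        proto = f"-p {protocol} " if protocol else ""  # omit '-p' to cover all protocols
        commands.append(f"iptables -A OUTPUT {proto}-d {network} -j DROP")
    return commands


# Placeholder ranges (reserved for documentation), for illustration only.
for command in output_drop_commands(["192.0.2.0/24", "198.51.100.77/22"]):
    print(command)
```

Passing the ranges through the 'ipaddress' module also catches typing mistakes, because a malformed range raises a ValueError instead of producing a bad command.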
I will add more information here after I accumulate more experience with using 'iptables' to block these 'SaaD' people from degrading and destroying my Internet browsing experience.
As a preliminary example, here is how I may proceed to
write a script to block lots of 'cloudfront.net' addresses:
You can verify that '188.8.131.52' and '184.108.40.206' are in the IP address list given by CIDR (Classless Inter-Domain Routing) notation '220.127.116.11/22' by using this CIDR-to-IP-list Converter at magic-cookie.co.uk.
The IP address list can be rather long. To get a range (simply min-address and max-address), you can try this CIDR-to-IP-range Converter at tools.tracemyip.org.
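If you would rather not depend on a web-based converter, the same two checks can be done locally with Python's standard 'ipaddress' module. This is a minimal sketch. The address 54.230.18.168 is read off the 'server-54-230-18-168' hostname listed earlier, but the range '54.230.16.0/22' is purely my assumption for illustration and is not a published 'cloudfront' allocation. The function names are my own.

```python
import ipaddress


def cidr_range(cidr):
    """Return (min-address, max-address) of a CIDR block as strings."""
    network = ipaddress.ip_network(cidr, strict=False)
    return str(network[0]), str(network[-1])


def in_cidr(address, cidr):
    """Return True if the IP address falls inside the CIDR block."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr, strict=False)


# The /22 block below is an assumed example, not a known allocation.
print(cidr_range("54.230.16.0/22"))
print(in_cidr("54.230.18.168", "54.230.16.0/22"))
```

A /22 block covers 1024 addresses, so the min/max pair is usually easier to read than the full address list.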
Whenever I want to block more IP address ranges, I would add OUTPUT-DROP statements to this script --- then simply rerun the script. (And I would probably un-comment and use the INPUT-DROP statements as well.)
If you wanted to block entire countries, you could try a
script like one at
howtoforge.com. In case this link goes dead,
here is the sample script.
A simple, small addition for your 'hosts' file :
You can probably get a much faster, smoother Internet browsing experience by simply adding a set of about 15 statements to the bottom of your 'hosts' file --- like these:
As I pointed out above, the web pages of
In 2015 September, I had about 300 '3rd party' hosts in my hosts file --- with very pleasing results. I do not provide my file here, because some of my '3rd party' host blocks cause some web pages to display in an almost unreadable format. And some '2nd party' web sites I block completely, because their web pages are such extreme processing hogs.
I am willing to tolerate those kinds of inconveniences, but others would probably prefer to see pages that I have blocked or 'partially blocked' via my hosts file. I suggest that people collect their own entries for their hosts file --- tailored to the sites that they have visited.
Some effects of host blocking on web pages :
Blocking some of these 'internet ruining' hosts can have some effects that you will notice in web pages --- or in their popups.
For example, when you went to 'phoronix.com' (in early 2015), you were always greeted with a popup advertisement --- in a window like the following.
In this case, you can see the hostname 'ad.doubleclick.net'. Blocking this hostname did not stop the popup, but it stopped the processing that would have been incurred by connecting to 'ad.doubleclick.net' and fetching the ad and placing it in the popup window.
In some cases, you can see this 'Failed to Connect' image in places on web pages where images (fetched from a '3rd party' host) were supposed to be displayed.
Whenever I visited 'target.com' web pages, I noticed that the hostname 'Img1.targetimg1.com' was appearing for seconds at a time in the 'status line' of my web browser --- and the 'target.com' web pages were very slow to load on my old netbook. I blocked that hostname for a while, but I found that the 'target.com' web pages were coming up with no images. That affected the readability of the web page. The text was scattered around the web browser window --- making it hard to decipher the intended web page content.
So I commented out the '0.0.0.0 Img1.targetimg1.com' line in my '/etc/hosts' file. But, in the future, if I find 'Img1.targetimg1.com' is slowing down the showing of 'target.com' web pages to an annoying degree, I will un-comment that block-line and simply try to read the 'target.com' web pages without the images --- AND visit 'target.com' less frequently --- AND maybe block 'www.target.com' entirely. (I do not want to accidentally go there via a web search and have my computer lock up.)
I had a similar experience with the hostname 'www18.officedepot.com'. Images were being fetched --- slowly --- from this hostname, for web pages at 'officedepot.com'. I blocked the 'www18.officedepot.com' hostname for a while, but I commented out the '0.0.0.0 www18.officedepot.com' line when it was making it hard to read the 'officedepot.com' web pages.
However, as with 'target.com', if this images-host slows down my viewing of 'officedepot.com' web pages to an annoying degree, I will un-comment that block-line and simply try to read the 'officedepot.com' web pages without the images --- AND visit 'officedepot.com' less frequently --- AND maybe block 'www.officedepot.com' entirely with a '0.0.0.0 www.officedepot.com' line in my 'hosts' file.
DISGUSTING PEOPLE :
There are a lot of disgusting classes of people in this world:
People with corrupted-minds like these are scattered throughout local, state, and federal government agencies such as the 'Bureau of Land mis-Management', the EPA - blocking clean water protections, the FDA - blocking food poisoning protections, the SEC - shielding their buddies at Goldman Sachs and elsewhere, etc. etc.
And right up there, in the middle of these horrible-mind-people --- say, between 'predatory priests' and 'spouse beaters' --- we have the 'ruiners and burglars of the internet'. Below is a list of many of the companies (their web domain names) that are the '3rd parties' that have either ruined my web browsing experience or offer nothing that I need and nothing that I want.
I do not want these 'remote hosts' interfering with my / our viewing the content of a web page --- by bombarding me / us with
which, by the way, are driving up the cellphone bills of millions, if not billions, of people.
I will not go into the personal-data-gathering (PDG) that many of them do --- and how well/poorly they protect that data --- and what they do with it. That is a huge area for discussion. I will just say that it is not their right to do that PDG without our consent.
Some '3rd-party' ('internet-ruiner') domain/company names:
'google*.com' (in its various forms, see above), 'doubleclick.net' (a Google company, as noted above), 'cloudfront.net' (an Amazon company, as noted above), 'goodreads (gr-assets.com)' (an Amazon company, as noted above), and many more :
This list is just 'the tip of the iceberg'. This is just a sampling of less-than-useless-to-me 3rd-parties that I have discovered so far in web pages I have visited. And there is another one of these companies born (funded by venture capitalists) every day.
Note that the list of 'big dogs' acquiring these 'internet ruiners' is quite disgusting --- in alphabetical order: Adobe, Alphabet/Google/YouTube, Amazon, Oracle, Verizon, and more.
There is no doubt in my mind that Intel, HP, Dell, Lenovo, Cisco, Netgear, Motorola, AT&T, Sprint, Apple, Microsoft, semi-conductor manufacturers, and others are more or less directly involved in 'CDN' development --- or at least approve of it (and/or don't want to expose it 'to the light of day') because it is good for their business --- for example, good for selling more powerful computers (and/or routers) to people who think that having a faster processor and more memory is going to somehow deal with what is mostly a network latency (waiting for a response) issue.
The 'SaaD' 'CDN' people of the world rank somewhere within the above list of classes of people on the 'disgusting and do-not-deserve-to-live-on-this-planet' ranking scale.
The CEO's of these 'CDN' companies --- and the venture capitalists who support them while these CEO's salaries continue to outpace their revenues --- should get a soul and find another line of work. They are ruining the Internet. If they can't find gainful employment in a more socially-friendly job, they should do everyone a favor and simply die.
These 'monetizing' people can't seem to understand that it is way more than 'impolite' to shove ads and other crap in people's faces --- and to bog down their computing devices and networks --- without even offering their 'monetizing targets' an option (like a link on a web page with a brief explanation of what is being offered, and an option to opt-out).
Their techniques border on criminality. They steal peoples' time and their computing resources --- and run-up their cell phone bills. The victims' time is money --- and their cell phone bills are certainly money. The victims should have a legal basis to sue to get their time-equals-money and phone-bill-money back.
There are strong similarities of 'SaaD' people with the horribly anti-social classes of people listed above --- especially with 'investment scammers' --- as these 'SaaD' 'CDN' people desperately seek ways to 'monetize' the Internet --- somewhat like 'investment scammers' seek ways to 'super-monetize' the investment business, which is already heavily monetized. (The investment business is money looking for more money.)
I should point out that the 'CDN' companies are the enablers (or sources) of most of this network latency --- but it is really the customers of the CDN companies (and the web page designers of those customers) that are responsible for a major part of the problem. CDN-customers --- such as owners of news sites, shopping sites, and tech-info sites --- are designing their pages such that they do not give the web site visitors any options to avoid the horrible slow-downs.
Suggested laws : (and punishments)
There should be no 'you-have-to-Opt-Out' nonsense that is hidden in small print someplace. We get more than enough of that 'OptOut' nonsense. Namely, we are initially opted-in --- unbeknownst to us --- by our banks, money lenders (usurers), investment companies, credit card companies, insurance companies, telephone companies, Google, Facebook, and the like.
There is to be no 'settlements without admission of guilt' nonsense. Penalty for first offense (omission of an Opt-In button on a page) is the CEO or CEO's of the offending company or companies serving a week in jail. (No finger pointing. Send them all to jail.)
Second offense: one year in jail. Third offense: Ten years in jail. Fourth offense: life in prison, surfing the internet --- enduring thousands and thousands of unwanted, useless, inapplicable (does-not-apply-to-me/you), product-never-to-be-used-by-me/you ads --- and unrequested popups and unrequested video/audio.
For some opinions in the same vein as I have expressed on this page, see this text version of a 2015 October talk on Web Obesity, by Maciej Ceglowski. He injects some humor along with much disgust over how web sites are composed these days (circa 2015). An excerpt that pretty much sums up:
"Here is the web pyramid as we observe it in the wild:
Page was created 2015 Jan 30.
Following are three images of the 'Links' panel of the
'Page Info' window of the Seamonkey web browser --- showing
the top, middle, and bottom of a 'duckduckgo.com'
web-search-results web page. See comments below each image.
but, at least, they are all fetched from the same host that I am visiting,
NOT from some '3rd party' hostnames. It is unlikely that 'duckduckgo.com' is going
to slow down their search results web pages by allowing a '3rd party' to
possibly 'hang-up' the 'duckduckgo' search results display.
Note that, here in the middle of this extract from the HTML code of
the web-search-results page, there are lots of 'Anchor' statements,
'inline' code, being executed from 'this page' at 'this site'.
but, at least, it is fetched from the same host that I am visiting,
NOT from some '3rd party' hostname. Like with the scripts at the top
of the HTML code, it is unlikely that 'duckduckgo.com' is going to
slow down their search results web pages by allowing a '3rd party' to
possibly 'hang-up' the 'duckduckgo' search results display.
This is not to say that 'duckduckgo' is not doing some extra processing ---
such as collecting information on your web searches (for internal use, say)
--- but, at least, if they are doing 'extra' processing, it is
probably all 'localized' and relatively fast.
By the way, if you look at the 'Links' list of this 'host blocking' web page
(the page you are reading right now), you will see that there is essentially
nothing but 'Anchor' in the 'Type' column --- essentially no 'Script' lines
--- and, essentially, not even a 'Stylesheet' or 'Related Item' or 'Form Submission'.
'Anchor' statements are 'nice' in the sense that the user has to click on
a 'link' in the web page before the web browser will take you to the 'location'
that you see in the 'Address' column. IOW, the user has the power to opt-in to an anchor
--- like the 'IOW' anchor at the beginning of this sentence.
'Anchor' statements are also 'nice' in the sense that you can see the hostname
of the 'link'. When I move the mouse cursor over the 'link', I can see the
hostname and web-page filename that the link will take me to --- in the
'status-line' at the bottom of my browser window.
home (computer) unbeknownst to you and rifling through your belongings.