From prashant on Thu, 11 Feb 1999
I am using Red Hat Linux, and I want to install a proxy server. I have a modem and can configure PPP over it.
But I want that proxy to perform the following functions:
I don't know how I am going to do this; please help me.
Your list mixes needs with conclusions. I don't recommend that when doing "requirements analysis" as you'll probably end up with some inappropriate constraints.
If I understand it correctly you want to "optimize" your PPP connection in the sense that you want to minimize the traffic flowing over it, and the latency between requests and responses.
I'm not familiar with a package named "webproxy-1.3" --- but any caching proxy will tend to lessen the traffic, depending on your usage patterns and the co-operation of the sites that you access over these protocols. Squid is probably the most advanced caching proxy available --- and it's designed to peer with other ICP (Internet Caching Protocol) servers (potentially minimizing traffic over other links, further out on the Internet, beyond your PPP link, while also minimizing latency).
I don't understand item three at all. What doesn't support many protocols? Squid supports a number of protocols (all those that are amenable to caching, that I can think of). Also, the conclusion "So I want a router linked it" is completely bogus. A router does routing; a proxy does proxying and caching. These functions operate at different (though sometimes blurred) levels in the OSI reference model.
If you use your Linux system as a "gateway" to the Internet for any systems other than itself (if it has an ethernet and a PPP link or any other combination of two or more non-loopback interfaces) then it probably is acting as a router.
So, let's step back from the constraints implied by these extraneous comments and focus on what you want.
You could do some protocol analysis on your PPP link to determine what protocols are consuming which percentages of the bandwidth; and to determine the average latency among various protocols. This would help you focus on which protocols are likely to benefit the most from caching. It's also possible you might find other ways to help improve your utilization.
Without going into the gory details of using 'tcpdump' and performing data analysis on its output, we can suggest that you start with the basics.
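As a minimal sketch of that analysis (assuming your PPP interface is named ppp0 --- check with 'ifconfig' first), you could capture a sample and get a rough per-destination breakdown:

```shell
# Capture a sample of traffic on the PPP link to a file (needs root;
# interrupt with Ctrl-C when you've gathered enough).
tcpdump -i ppp0 -w /tmp/ppp-sample.pcap

# Later: read the capture back and count the destination host:port
# fields to see, informally, which services dominate your link.
tcpdump -nr /tmp/ppp-sample.pcap | awk '{print $5}' \
    | sort | uniq -c | sort -rn | head
```

This is only a back-of-the-envelope view --- but it's usually enough to tell whether web, mail, news, or FTP traffic is the thing worth optimizing.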
Run a caching nameserver on your PPP/router. This should immediately improve response time and reduce bandwidth utilization by obviating the need to forward/route DNS queries across the link. Make sure to configure the /etc/resolv.conf (or its equivalent on your non-Unix systems) to actually use your caching nameserver. That includes the resolv.conf on the router/gateway itself!
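A caching-only nameserver needs almost no configuration; the part people forget is pointing every resolver at it. A sketch of /etc/resolv.conf (the search domain is a placeholder for your own):

```shell
# /etc/resolv.conf on the gateway itself, pointing at its own
# caching named. On LAN clients, replace 127.0.0.1 with the
# gateway's ethernet address. "example.com" is a placeholder.
search example.com
nameserver 127.0.0.1
```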
Install Squid and configure your web browsers and any gopher, WAIS, or other supported clients to use it. That should help with those web sites that don't egregiously prevent caching. Note that some sites use HTTP headers (Pragmas) to eliminate or minimize caching of their pages. This is often done by "advertising" supported sites as part of their "imprint" accounting and to support their high traffic claims (to their customers). That is BAD for the Internet as a whole (since it forces every link between those sites and all of their clients to carry redundant traffic). Oh well! There goes the neighborhood!
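As a sketch, the interesting squid.conf directives look something like the following (sizes and paths here are examples --- the squid.conf shipped with Squid is heavily commented and should be your reference):

```shell
# Fragments of squid.conf: listen port, RAM for hot objects, and a
# 500MB on-disk cache with 16 first-level and 256 second-level dirs.
http_port 3128
cache_mem 16 MB
cache_dir /var/spool/squid 500 16 256

# For command-line clients; graphical browsers have an equivalent
# "HTTP proxy" setting in their preferences dialogs.
export http_proxy="http://localhost:3128/"
```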
After you've taken these two steps (and provided your caching proxy/router with LOTS of disk space and memory) you should monitor the line performance (informally) to see if that meets your needs. You've probably gained 80-90% of the potential efficiency gains already --- so additional work will have diminishing returns.
You can install DeleGate for FTP proxying. (I don't know how to make "normal" FTP clients talk to Squid's FTP proxying --- but they can be configured to use DeleGate as you'd use any SOCKS proxy, and you can "manually" traverse a DeleGate FTP or telnet proxy in a way that's conceptually similar to the old TIS FWTK, though completely different, and much cleaner, in syntax.)
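For example (invocation style per the DeleGate documentation; the port number is arbitrary):

```shell
# Run DeleGate as an FTP proxy listening on port 8021; FTP clients
# on your LAN then connect to gateway:8021 instead of the remote site.
delegated -P8021 SERVER=ftp
```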
That's probably about as far as you can go with simple proxying. From there you'll have to change the mixture of protocols you run, and/or optimize the way you work. For example if you have e-mail flowing over that PPP link you might reconfigure that to "Hold" (as "expensive") and queue it for delivery during off peak hours.
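With sendmail, one way to sketch the "hold and queue" approach (flag and option names from the sendmail documentation --- verify against your version's config before relying on this):

```shell
# sendmail.cf sketch: add the 'e' ("expensive") flag to the smtp
# mailer's F= flags, and enable HoldExpensive so mail for expensive
# mailers is queued rather than delivered immediately:
#   Msmtp, P=[IPC], F=mDFMuXe, ...
#   O HoldExpensive=True
# Then drain the queue off-peak from root's crontab (2:00am here,
# chosen arbitrarily):
0 2 * * * /usr/sbin/sendmail -q
```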
You might even reconfigure your e-mail and any netnews traffic (both outgoing and incoming) to go through UUCP. UUCP allows you to "grade" your traffic, and to schedule the delivery and receipt. This can include file transfers as well as mail and news. Naturally you'd have to arrange for some ISP to provide your UUCP batching for you. There are still some ISPs that specialize in this, and there are still some co-operative arrangements available in some localities.
These techniques have a very steep learning curve. No one has been providing friendly new front ends to make the configuration of UUCP links as easy as common PPP scenarios are today. Also, there are very few ISPs with the expertise and interest to provide these services. In addition, the entire discussion is moot if you aren't carrying netnews, email, or file-transfer traffic over your link (if you don't read netnews, if you've arranged ISP POP accounts on the other side of your link, and if your file transfers can't be scheduled and automated with UUCP).
Another option is to look at your work and access patterns. If you know that you're going to want to read "Linux Weekly News" every Thursday morning when you come in, create a cron job to 'wget' or do a 'lynx -traversal' of http://www.lwn.net every Thursday morning at 3:00am (before you come in, but still in the "dead of night"). The LWN crew seems to consistently have that up by about midnight (U.S. Mountain time). You could have similar daily jobs for your "Dilbert" fix (http://www.unitedmedia.com/dilbert) etc.
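A crontab sketch of that pre-fetch (this assumes Squid is listening on localhost:3128, so the fetch also warms the cache; the target directory is an example):

```shell
# Thursdays (day-of-week 4) at 3:00am: quietly mirror LWN one level
# deep through the local Squid, priming the cache before anyone
# comes in to read it.
0 3 * * 4 http_proxy=http://localhost:3128/ wget -q -r -l1 -P /var/prefetch http://www.lwn.net/
```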
There are some tricks you can do to minimize the amount of your bandwidth you devote to downloading advertising and graphics. One method is to use Lynx (which doesn't download any graphics by default, and therefore filters out most banner ads). Another is to create your own "localhost" aliases for some sites like "click.net" --- sites which are used exclusively to serve banner ads that are embedded in the HTML of the sites you visit. Of course, the advertisers, web site maintainers (like Yahoo!) and click.net itself might complain that you are "depriving" them of revenue by viewing these advertiser-supported pages while filtering out the advertising.
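For example (the hostnames here are illustrative --- check which ad servers actually appear in the pages you read):

```shell
# /etc/hosts entries that alias ad servers to the loopback address,
# so those image fetches fail fast without ever crossing the PPP link:
127.0.0.1   click.net
127.0.0.1   ad.doubleclick.net
```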
If a statistically significant number of users employ these strategies then we'll see a resulting "arms race" to force the advertisements down your throat. They'll increasingly "mix" the advertising and content as inextricably as possible --- meaning that text browsers and search engines will become useless.
It's a pity that more of us don't consider the implications of advertiser-supported media on our lives. Your broadcast news, TV, radio, newspapers and other periodical publications are all completely funded by advertising and therefore fundamentally suspect in regards to content and focus. It's not a "conspiracy theory" --- merely an economic fact. You get what was paid for. Since you didn't "pay for" the content that you're receiving through traditional media (and increasingly for Internet "content") --- you have little or no say in what's provided over them.
You have only obscure, indirect effects through your selection of products and services, and somewhat more through complaint (to government and regulatory bodies and to sponsors). It's all very "negative" (in a philosophical sense). It's a pity we haven't come up with a better way to do things --- though the Internet's netnews, mailing lists, and the personally and activist-run web sites continue to be a "ray of hope."
In any event: That's about all there is to caching and proxying for small sites over PPP and other low-bandwidth links. Larger internetwork sites might benefit from more elaborate ICP arrangements (peering among departmental Squid servers and creating a whole caching hierarchy).
Remember that this is not a magic bullet.
It's possible that your usage patterns actually won't benefit from caching or proxying. If everyone on your network is always visiting different sites, and they only visit sites that change frequently --- then the cache will be a waste of your system's memory and disk space.
Best of luck!