The standard cache architecture of a typical web proxy server has several important components. To implement a fully functional Web proxy cache, the architecture requires:
- A storage mechanism for storing the cache data.
- A mapping mechanism that establishes the relationship between URLs and their respective cached copies.
- A format for the cached object content and its metadata.
These components may vary from implementation to implementation, and certain architectures can do away with some components.

Storage

The main Web cache storage type is persistent disk storage. However, it is common to have a combination of disk and in-memory caches, so that frequently accessed documents remain in the main memory of the proxy server and don’t have to be constantly reread from the disk.
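The combination of memory and disk described above can be sketched roughly as follows. This is a minimal illustration, not how any particular proxy does it: the class name `TwoTierCache`, the one-file-per-URL layout, the SHA-256 file naming, and the tiny memory capacity are all assumptions made for the example.

```python
import hashlib
from collections import OrderedDict
from pathlib import Path


class TwoTierCache:
    """Small in-memory LRU cache in front of a persistent on-disk store."""

    def __init__(self, disk_dir, memory_capacity=2):
        self.disk_dir = Path(disk_dir)
        self.disk_dir.mkdir(parents=True, exist_ok=True)
        self.memory_capacity = memory_capacity
        self.memory = OrderedDict()  # url -> bytes, least recently used first

    def _disk_path(self, url):
        # Assumed naming scheme: one file per URL, named by its SHA-256 digest.
        return self.disk_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()

    def _remember(self, url, body):
        self.memory[url] = body
        self.memory.move_to_end(url)
        while len(self.memory) > self.memory_capacity:
            self.memory.popitem(last=False)  # evict the least recently used entry

    def put(self, url, body):
        self._disk_path(url).write_bytes(body)  # disk holds the persistent copy
        self._remember(url, body)

    def get(self, url):
        """Return (body, source), where source is 'memory', 'disk' or 'miss'."""
        if url in self.memory:
            self.memory.move_to_end(url)  # mark as recently used
            return self.memory[url], "memory"
        path = self._disk_path(url)
        if path.exists():
            body = path.read_bytes()
            self._remember(url, body)  # promote to memory for the next request
            return body, "disk"
        return None, "miss"
```

A document evicted from memory is still served from disk and is promoted back into memory on access, so popular documents tend to stay in the fast tier.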
The disk storage may be deployed in different ways:
- The disk may be used as a raw partition, with the proxy performing all space management, data addressing, and lookup-related tasks.
- The cache may be in a single file, or a few large files, which contain an internal structure capable of storing any number of cached documents. The proxy deals with the issues of space management and addressing.
- The filesystem provided by the operating system may be used to create a hierarchical structure (a directory tree); data is then stored in filesystem files and addressed by filesystem paths. The operating system will do the work of locating the file(s).
- An object database may be used. Again, the database may internally use the disk as a raw partition and perform all space management tasks, or it may create a single file, or a set of files, and create its own “filesystem” within those files.

Mapping

In order to cache a document, a mapping has to be established such that, given the URL, the cached document can be looked up fast. The mapping may be a straightforward mapping to a filesystem path, although this can be stored internally as a static route.
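As an illustration of a straightforward URL-to-path mapping, here is a minimal sketch. The function name `url_to_cache_path`, the cache root `/var/cache/proxy`, the `scheme/host/path` layout, and the `_index` placeholder are all assumptions chosen for the example, not a layout taken from any specific proxy.

```python
from pathlib import PurePosixPath
from urllib.parse import quote, urlsplit


def url_to_cache_path(url, cache_root="/var/cache/proxy"):
    """Map a URL to a path under cache_root (hypothetical layout)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()  # host names are case-insensitive
    # Percent-encode every segment so it is safe to use as a single file name.
    segments = [quote(s, safe="") for s in parts.path.split("/") if s]
    if not segments or parts.path.endswith("/"):
        segments.append("_index")  # placeholder name for directory-style URLs
    if parts.query:
        # Keep the query string so different queries get different cache files.
        segments[-1] += quote("?" + parts.query, safe="")
    return PurePosixPath(cache_root, parts.scheme, host, *segments)


print(url_to_cache_path("http://Example.com/news/today.html"))
# prints /var/cache/proxy/http/example.com/news/today.html
```

Because the path is computed from the URL alone, a lookup needs no separate index: the proxy derives the path and asks the operating system whether the file exists.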
Typically, a proxy stores any resource that is accessed frequently. For example, on many UK proxies the BBC website is extremely popular, so it is essential that it is cached. Even for satellite offices, this allows people to access the BBC through the company’s internal network. This is because the page is requested and cached by the proxy based in the UK, so instead of being blocked outside the UK, the BBC is still accessible.
Indeed, many large multinational corporations inadvertently offer these facilities. Employees who have the technical know-how can connect their remote access clients to specific servers in order to reach normally blocked resources. They might connect through the British proxy to access the BBC, and then switch to a French proxy to access a media site like M6 Replay, which only allows French IP addresses.
It is also important to remember that direct mappings are normally reversible: if you have the correct cache file name, you can use it to reproduce the unique URL for that document. Many applications can make use of this property, and many include some sort of mapping function based on hashes.
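The difference between a reversible direct mapping and a hash-based one can be sketched as follows. The function names are hypothetical, and percent-encoding and SHA-256 are simply convenient choices for the illustration. Note that a hash on its own cannot be turned back into a URL, so a cache using one would have to record the URL elsewhere, for example in the object's metadata.

```python
import hashlib
from urllib.parse import quote, unquote


def url_to_name(url):
    """Direct mapping: percent-encode the whole URL into one file name."""
    return quote(url, safe="")


def name_to_url(name):
    """Inverse of url_to_name: recover the original URL from the file name."""
    return unquote(name)


def url_to_hashed_name(url):
    """Hash-based mapping: fixed-length name that cannot be reversed."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


url = "http://www.example.com/a/b.html?x=1&y=2"
name = url_to_name(url)
print(name)  # prints http%3A%2F%2Fwww.example.com%2Fa%2Fb.html%3Fx%3D1%26y%3D2
print(name_to_url(name) == url)  # prints True
print(len(url_to_hashed_name(url)))  # prints 64
```

The direct name grows with the URL and can exceed filesystem name limits for very long URLs, whereas the hashed name always has the same length. That trade-off is one reason an implementation might prefer hashes despite losing reversibility.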