Components of a Web Proxy Cache

There are several important components to the standard cache architecture of your typical web proxy server. In order to implement a fully functional Web proxy cache, a cache architecture requires several components:

  • A storage mechanism for storing the cache data.
  • A mapping mechanism to the establish relationship between the URLs to their respective cached copies.
  • Format of the cached object content and its metadata.

These components may vary from implementation to implementation, and certain architectures can do away with some components. Storage The main Web cache storage type is persistent disk storage. However, it is common to have a combination of disk and in-memory caches, so that frequently accessed documents remain in the main memory of the proxy server and don’t have to be constantly reread from the disk.

The disk storage may be deployed in different ways:

  • The disk maybe used as a raw partition and the proxy performs all space management, data addressing, and lookup-related tasks.
  • The cache may be in a single or a few large files which contain an internal structure capable of storing any number of cached documents.

The proxy deals with the issues of space management and addressing. ‘ The filesystem provided by the operating system may be used to create a hierarchical structure (a directory tree); data is then stored in filesystem files and addressed by filesystem paths. The operating system will do the work of locating the file(s). ° An object database may be used.

Again, the database may internally use the disk as a raw partition and perform all space manage- ment tasks, or it may create a single file, or a set of files, and create its own “filesystem” within those files. Mapping In order to cache the document, a mapping has to be established such that, given the URL, the cached document can be looked up Fast. The mapping may be a straight-forward mapping to a file system path, although this can be stored internally as a static route.

Typically a proxy would store any resource that is accessed frequently. For example in many UK proxies, the BBC website is extremely popular and it’s essential that this is cached. even for satellite offices it allows people to access BBC VPN through the companies internal network. This is because the page is requested and cached by the proxy which is based in the UK, so instead of the BBC being blocked outside the UK it is still accessible.

Indeed many large multinational corporations sometimes inadvertently offer these facilities. Employees who have the technical know how can connect their remote access clients to specific servers in order to obtain access to normally blocked resources. So they would connect through the British proxy to access the BBC and then switch to a French proxy in order to access a media site like M6 Replay which only allows French IP addresses.

It is also important to remember that direct mappings are normally reversible, that is if you have the correct cache file name then you can use it to produce the unique URL for that document. There are lots of applications which can make use of this function and include some sort of mapping function based on hashes.

Programming Terms: Garbage Collection

There are lots of IT terms thrown about which can be quite confusing for even the experienced IT worker. Particularly in the world of network programming and proxies, sometimes similar words have completely different meanings depending on where you are in the world.

Let’s step back for a minute, and look at what garbage collection means in the programming language world. Though not strictly relevant to the subject of this blog, it is a good way to illustrate the benefits and draw- backs of garbage collection type memory management, whether on disk or in memory. Compiled programming languages, such as C or Pascal, typically do not have run-time garbage collection type memory management.  Which is why they are often not suitable for heavy duty network resources such as the BBC servers which supply millions  streaming live TV such as Match of the Day from VPNs and home connections anywhere.

Instead, those languages require the program authors to explicitly manage the dynamic memory: memory is allocated by a call to malloc ( ) , and the allocated memory must be freed by a call to free ( ) once the memory is no longer needed. Otherwise, the memory space will get cluttered and may run out. Other programming languages, such as Lisp, use an easier [1] memory management style: dynamic memory that gets allocated does not have to be explicitly freed. Instead, the run-time system will periodically inspect its dynamic memory pool and figure out which chunks of memory are still used, and which are no longer needed and can be marked free.

Usually programming languages that are interpreted or object oriented (Lisp, ]ava, Smalltalk) use garbage collection techniques for their dynamic memory management. The determination of what is still used is done by determining whether the memory area is still referenced somewhere-—that is, if there is still a pointer pointing to that area. If all references are lost for example if it has been thrown away by the program—-the memory could no longer be accessed and therefore could be freed.

There are several different approaches to doing this reference detection. One approach is to make each memory block contain an explicit reference counter which gets incremented when a new reference is created and decremented when the reference is deleted or changed to point somewhere else. This requires more work from the run-time system when managing memory references. Another approach is simply to use brute force periodically and traverse the entire memory arena of the program looking for memory references and determine which chunks still get referenced.

This makes it easier and faster to manage memory references as reference counters don’t have to be updated constantly. However, at the same time it introduces a rather heavyweight operation of having to traverse the entire memory scanning for references.

John Williams

From – Web Site

No Comments Networks, Protocols

Subroutine – Passing Parameters

Passing parameters into Subroutines, following examples are from Perl scripts.

Parameters are passed into subroutines in a list with a special name — it’s called @_ and it doesn’t conform to the usual rules of variable naming. This name isn’t descriptive, so it’s usual to copy the incoming variables into other variables within the subroutine.

Here’s what we did at the start of the getplayer subroutine: $angle = $_[O]; If multiple parameters are going to be passed, you’ll write something like: ($angle,$units) = @_; Or if a list is passed to a subroutine: @pqr = @_; In each of these examples, you’ve taken a copy of each of the incoming parameters; this means that if you alter the value held in the variable, that will not alter the value of any variable in the calling code.

This copying is a wise thing to do; later on, when other people use your subroutines, they may get a little annoyed if you change the value of an incoming variable!   Although this method can also be used to hack into websites or divert video streams to bypass geo-blocking for example to watch BBC News outside the UK  like this.

Returning values Our first example concludes the subroutine with a return statement: return ($response); which very clearly states that the value of $response is to returned as the result of running the subroutine. Note that if you execute a return statement earlier in your subroutine, the rest of the code in the subroutine will be skipped over.

For example: sub flines { $fnrd = $_[0]; open (FH,$fnrd) or return (—1); @tda = ; close PH; return (scalar (@tda)); l will return a -1 value if the file requested couldn’t be opened.

Writing subroutines in a separate file
Subroutines are often reused between programs. You really won’t want to rewrite the same code many times, and you’ll
certainly not want to have to maintain the same thing many times over Here’s a simple technique and checklist that you can use in your own programs. This is from a Perl coding lesson, but can be used in any high level programming language
which supports subroutines.

Plan of action:
a) Place the subroutines in a separate file, using a file extension .pm
b) Add a use statement at the top of your main program, calling
in that tile of subroutines
c) Add a 1; at the end oi the file of subroutines. This is necessary since use executes any code that’s not included in subroutine blocks as the tile is loaded, and that code must return a true value — a safety feature to prevent people using TV channels and files that weren’t designed to be used.

No Comments News, Protocols, VPN

Network Programming : What are Subroutines?

What are subroutines and why would you use them?The limitations of “single block code” You won’t be the first person in the world to want to :

  • be able to read options from the command line
  • interpret form input in a CGI script –
  • pluralize words in English

But it doesn’t stop there, lets choose a few other seemingly simple but useful tasks that your code may need to accomplish.  You won’t be the first person in your organisation to want to

  • output your organisation’s copyright statement
  • validate an employee code
  • automatically contact a resource on your web site

These are the sort of tasks that may need to happen again and again, both in the same piece of codes or perhaps across different programs. You may need to handle the same data in several programs, or to handle in your programs the same data that your colleagues handle in theirs. And you may want to perform the same series of instructions at several places within the same program. Almost all programming languages, at least the high level ones can handle these operation including things like Perl. Even the beginners who start off with all your code has been in a single file and indeed has “flowed” from top to bottom.

You can use these subroutines to perform tasks that need to be repeated over and over again. In the context of network programming you could use a specific subroutine to assign a British IP address to a client or hardware device,
You have not been able to call the same code in two different places ‘ You have not been able to share code between programs — copying is not normally an option as it gives maintenance problems ~ You have not used your colleague’s code, nor code that’s available for everyone on the CPAN, nor additional code that’s so often needed that it’s shipped with the Perl distribution. First use of subroutines The first computer programs were written rather like the ones that we’ve written so far.

Each one for its own specific task. In time, programmers (said to be naturally lazy people) noticed that they could save effort by placing commonly used sections of code into separate blocks which could be called whenever and wherever they were needed. Such separate blocks were variously known as functions, procedures or subroutines.

We’ll use the word “subroutine” because Perl does! Structured programming The subroutine approach was then taken to extreme so that all the code was put into separate blocks, each of which could be described as performing a single task. For example, the program I run might be described as performing the task of “reporting on all towns with names matching a pattern”.  You could then split that task into multiple tasks for example creating multiple network connections to different servers.  On a multimedia server you could call the relevant subroutines depending on which channel was to be displayed e’g one for English channel, one for commercial ITV channel abroad  and another for a French variant.  All of these could be separate subroutines called from within the main code when the user presses a button.

No Comments News, Protocols

Intrusion Detection – Post Attack Phase

If you’re protecting any network then understanding the options and various phases of an attack can be crucial.  When you detect an intrusion, it’s important to quickly assess what stage the attack is at and what possible developments are likely.  Whether it’s a skilled attacker of some opportunist kid with some technical skill makes a huge difference in the possible outcomes.

Even regular, normal traffic in suspicious or unusual situations can indicate a possible intrusion. If you suddenly notice TCP three-way handshakes completing on TCP ports 20 and 21 on a home Web server, but you know that you do not run an FTP server at home, it is safe to assume that something suspicious is going on. Post—Attack Phase After an attacker has successfully penetrated a host on your network, the further actions he will take for the most part follow no predictable pattern.   Obviously the danger is much greater if the attacker is both skilled and has plans to further exploit your network while many will simply deface a few pages or use it as  a VPN to watch US or UK TV channels abroad.

This phase is where the attacker carries out his plan and makes use of any information resources as he sees fit. Some of the different options available to the attacker at this point include the following:

  • Covering tracks
  • Penetrating deeper into network infrastructure
  • Using the host to attack other networks
  • Gathering, manipulating, or destroying data
  • Handing over the host to a friend or hacker group
  • Walking or running away

If the attacker is even somewhat skilled, he is likely to attempt to cover his tracks. There are several methods; most involve the removal of evidence and the replacement of system files with modified versions.The replaced versions of system files are designed to hide the presence of the intruder. On a Linux box, netstat would be modified to hide a Trojan listening on a particular port. Hackers can also cover their tracks by destroying system or security log files that would alert an administrator to their presence. Removing logs can also disable an HIDS that relies on them to detect malicious activity. There are automated scripts available that can perform all these actions with a single command. These scripts are commonly referred to as root/ens.

Externally facing servers in large network topologies usually contain very little in terms of useful data for the attacker. Application logic and data is usually stored in subsequent tiers separated by firewalls.The attacker may use the compromised host to cycle through the first three attack phases to penetrate deeper into the system infrastructure. Another possibility for the black hat is to make use of the host as an attack or scanning box.When skilled hackers want to penetrate a high—profile network, they often compromise a chain of hosts to hide their tracks.   It’s not unusual for the attackers to relay their connections through multiple servers, bouncing from remote sites such as Russian, Czech and a German proxy for example before attacking the network.

The most obvious possibilities for the attacker are to gather, manipulate, or destroy data. The attacker may steal credit card numbers and then format the server. The cracker could subtract monies from a transactional database.The possibilities are endless. Sometimes the attackers motivation is solely to intrude into vulnerable hosts to see whether he can. Skilled hackers take pride in pulling off complicated hacks and do not desire to cause damage. He may turn the compromised system over to a friend to play with or to a hacker group he belongs to. The cracker may realize that he has gotten in over his head and attacked a highly visible host, such as the military’s or major financial institutions host, and want to walk away from it praying he isn’t later discovered.

Cryptographic Methods and Authentication

It used to be the domain of mathematicians and spies but know cryptography plays an important part in all our lives. It is important if we want to continue to use the internet for commerce and any sort of financial transactions. All our basic web traffic exists in the clear and is transported via a myriad of shared network equipment. Which means basically anything can be intercepted and read unless we protect it in some way – the most accessible option is to use encryption.

Cryptographic methods are utilized by software to maintain computing and data resources safe-,effectively shielding them with secret code or their,’key.   It’s not always necessary of course, the requirements are heavily dependent on what the connection is being used for.  For example there’s little point encrypting compressed streams like audio and video in normal circumstance, no-one is at risk from intercepting you streaming UK TV abroad from your computer.The key holder is the only individual who has access to the secure information. That individual might share the key with others, permitting them to also get into the information. In a digital world, and especially from the envisaged world of electronic commerce, the demand for safety which is backed by cryptographic systems is paramount. At the future, a person’s initial approach to most electronic devices, and especially to networked electronic devices, will demand cryptography working from the background. Whenever security is necessary, the first point from the human-to machine interface is that of authentication.

The electronic system should know with whom it’s dealing. But just how is this done?  Strong authentication is based on three characteristics which a user needs to have:

  • What the user knows.
  • What the user has.
  • Who the user is.

Today, a typical authentication routine will be to present what you’ve, a token like an identification card, then to uncover what you know, a pin number or password. In a very brief time in the future, the ,who you are kind of identification would be common, first on computers, and after that on an entire selection of merchandise, progressively phasing out the need for us to memorize contact numbers and passwords.  Indeed many entertainment websites are looking at developments in this field with a view to incorporating identity checks in a seamless way.  For example to allow access to UK TV license fee payers who want to watch the BBC from Ireland for example.

But where does the cryptography come to the equation? . In the easiest level, you might offer a system. Like a pc terminal, a password. The system checks your password. You can be logged on to the system. In this example of quite weak authentication, cryptographic methods are utilized to encrypt your password stored inside the system. If your password was held in clear text, rather than cipher text, then a person with an aptitude for programming could soon find the password inside the system and start to usurp and obtaining access to all of the information and system resources you’re permitted to use.

Cryptography does its best to defend the secret, which is your password. Now consider a system that requires stronger authentication. The automatic teller machine is a good example. To perform transactions in an Automated teller machine terminal, you want an ATM Card and a pin number. Inside the terminal, information is encrypted. The information the terminal transmits to the bank is also encrypted. Security is better, but not perfect, since the system will authenticate an individual who isn’t the owner of the card / pin number. The person might be a relative utilizing your card by permission, or he can be a burglar who has just relieved you of your pocket and is about to save you of your life savings. Time, you could think, for stronger authentication. Systems currently in field tests require an additional attribute based on your identity to strengthen the authentication procedure.

TCP/UDP Port Numbers

Both TCP and UDP require port numbers in order to communicate with the upper layers.  These port numbers are used to keep track of varying conversations which criss-cross the network simultaneously. The origin port numbers are dynamically assigned by the source host, most of them will be  at some number above 1024.   All the numbers below 1024 are reserved for specific services as defined in RFC 1700 – they are known as well known port numbers.

Any virtual circuit which is not assigned with a specified service will always be assigned a random port number from this range above 1024.    The port numbers will identify the source and destination in the TCP segment.    Here’s some common port numbers that are associated with well known services:

  • FTP – 21
  • Telnet -23
  • DNS – 53
  • TFTP – 69
  • POP3 – 110
  • News – 144

As you can see all the port numbers assigned are under 1023, whereas above 1024 and above are assigned by the upper layers to set up connections with other hosts.

The internet layer exists for two main reasons, routing and providing a specific network interface to the upper layers. As regards to routing none of the upper or lower layer protocols have any specific functions. Al the routing functionality is primarily the job of the internet layer. As well as routing the internet layer has a second function – to provide a single network interface and gateway to the upper layer protocols.
Application programmers, use this layer to to access the functionality into their application for network access. It is important as it ensures that there is a standardization to access the network layer. Therefore the same functions apply whether you’re on a ethernet or Token ring network.

IP provides a single network interface to access all of these upper layer protocols. The following protocols specifically work at the internet layer:

  • Internet Protocol (IP)
  • Internet Protocol (ICMP)
  • Address Resolution Protocol (ARP)
  • Reverse Address Resolution Protocol (RARP)

The internet protocol is essentially the Internet layer, all the other protocols merely support this functionality. So if for instance you buy UK proxy connections then IP would look at each packet’s address. Then using a routing table, the protocol would decide where the packet should be routed next. The other protocols, the network access layer ones at the bottom of the OSI model are not able to see the entire network topology as they only have connections to the physical addresses.

In order to decide on the specific route, the IP layer needs to answer two specific questions,. The first is which network is the destination host on and the second is what is the ID on that network.   these can be determined and allocated as the logical and hardware address.  The logical address is better known as the IP address and is a unique identifier on any network of the location of a specific host.  These are allocated by specific location and are used by websites to determine resources, so for example to watch BBC iPlayer in Ireland you’d need to route through a UK IP address and not your assigned Irish address.


Data Encapsulation and the OSI Model

When a client needs to transmit data across the network to another device an important process happens.  This process is called encapsulation and involved adding protocol information from each layer of the OSI model.   Every layer in the model only communicates with it’s peer layer on the receiving device.

In order to communicate and exchange information, each layer uses something called PDU which are Protocol Data Units.   These are extremely important and contain the control information attached to the data at each layer of the model.  It’s normally attached to the header of the data field however it can also be attached to the trailer at the end of the data.

The encapsulation process is how the PDU is attached to the data at each layer of the OSI model.  Every PDU has a specific name which is dependent on the information contained in each header.   The PDU is only read by the peer layer on the receiving device at which point it is stripped off and the data handed to the next layer.

Upper layer information only is passed onto the next level and then transmitted onto the network.    After this process the data is converted and handed down to the Transport layer this is done by setting up a virtual circuit to the receiving device by sending a synch packet.   In most cases the data needs to be broken up into smaller segments then a Transport layer PDU attached to the header of the field.

Network addressing and routing through the internetwork happens at the network layer and each data segment.    Logical addressing for example IP is used to transport every data segment to it’s destination network.  When the Network layer protocol adds the control header from the data received from the transport layer it is then described as packet or datagram.  This addressing information is essential to ensure the data reaches it’s destination.  It can allow data to traverse all sorts of networks and devices with the right delivery information added to subsequent PDUs on it’s journey.

One aspect that often causes confusion is the later where packets are taken from the network layer and placed in the actual delivery medium (e.g. cable or wireless for example). This can be even more confusing when other complications such as VPNs are included which involve routing the data through a specified path.   For example people route through a VPN server in order to access BBC iPlayer abroad like this post which will add additional PDUs to the data.   This stage is covered by the Data Link layer which encapsulates all the data into a frame and adds to the header the hardware address of both the source and the destination.

Remember for this data to be transmitted over a physical network it must be converted into a digital signal.  A frame is therefore simply a logical group of binary digits – 1 and 0s which is read only by devices on the local networks.   Receiving devices will synchronize the digital signal and extract all the 1s and 0s.  Here the devices build the frames and run a CRC (Cyclic Redundancy Check) in order to ensure it matches with the transmitted frame.

Additional Information 

No Comments Networks, Protocols, VPN

Network Topology: Ethernet at Physical Layer

Ethernet is commonly implemented in a shared hub/switch environment where if one station broadcasts a frame then all devices must synchronize to the digital signal to extract the data from the physical wire.  The connection is between physical medium, and all the devices that share this need to listen to each frame as they are considered to be on the same collision domain.  The downside of this is that only one device can transmit at each time plus all devices need to synchronize and extract all the data.

If two devices try to transmit at the same time, and this is very possible – the a collision will occur.  Many years ago, in 1984 to be precise, the IEEE Ethernet Committee released a method of dealing with this situation.  It’s a protocol called the Carrier Sense Multiple Access with Collision Detect protocol or CSMA/CD for short.  The function of this protocol is to tell all stations to listen for devices trying to transmit and to stop and wait if they detect any activity.  The length of the wait is predetermined by the protocol and will vary randomly, the idea is that when the collision is detected it won’t be repeated.

It’s important to remember that Ethernet, uses a bus topology.   This means that whenever a device transmits then the signal must run from one end of the segment to the other.   It also defines that a baseband technology should be used which means that when a station does transmit it is allowed to use all potential bandwidth on the wire.  There is no allowance for other devices to utilise the potential available bandwidth.

Over the years the original IEEE 802.3 standards have been updated but here are the initial settings:

  • 10Base2: 10 Mbps, baseband technology up to 185 meters in cable length.  Also known as thinnet capable of supporting up to 30 workstations in one segment.  Not often seen now.
  • 10base5: 10 Mbps, baseband technology allows up to 500 meters in length. Known as thicknet.
  • 10BaseT: 10Mbps using category 3 twisted pair cables. Here every device must connect directly into a network hub or switch.   This also means that there can only be one device per network segment.

Both the speeds and topologies have changed greatly over the years, and of course 10Mbps is no longer adequate for most applications.  In fact most networks will run on gigabit switches in order to facilitate the increasing demands of network enabled applications.    Remember allowing access to the internet means that bandwidth requirements will rocket even if you allow for places like the BBC blocking VPN access (article here).

Each of the 802.3 standards defines an Attachment Unit Interface (AUI) that allows one bit at a time transfer using the data link media access method to the Physical layer.  This means that the physical layer becomes adaptable and can support any emerging or newer technologies which operate in a different way.  There is one exception though and it is a notable one, the AUI interface cannot support 100Mbs Ethernet for one specific reason – it cannot cope with the high frequencies involved.   Obviously this is the case for even faster technologies too like Gigabit Ethernet.

John Smith

Author and Network VPN Blogger.

No Comments Networks, Protocols, VPN

Using SSL for Email and Internet Protocols

If you want to increase the security attached to your email messaging then there’s several routes you can take.  First of all, you should look at digitally signing and encrypting all your email messages.  There are several applications that can do this, or you could switch your emails to the cloud and look at a server based email system.    Most of the major suppliers of web based secure mail are extremely secure with regards to interception and end point security, however obviously you have to trust your email with a third party.

Many companies won’t be happy with outsourcing their messaging like this as it’s often the most crucial part of a companies digital communications.   However what are the options if you want to operate a secure and digitally advanced email messaging service within your corporation?  Well the first place to investigate is increasing the security of authentication and data transmission.   There are plenty of RFCs (Request for Comments) on these subjects particularly related to emails and their related protocols.

Here’s a few of the RFC based protocols related to Email  :

  • Post Office Protocol 3 (POP3) – the simple but effective protocol used to retrieve email messages from an inbox on a dedicated email server.
  • Internet Message Access Protocol 4 (IMAP4) – this is usually used to retrieve any messages stored on an email server. It includes those stored in inboxes, and other types of message boxes such as drafts, sent items and public folders.
  • Simple Mail Transfer Protocol (SMTP) – very popular and ubiquitous email protocol, generally just used to send email messages to recipients.
  • Network News Transfer Protocol (NNTP) – Not specifically an email protocol, however can be used as such if required! It’s normally used to post and download newsgroup messages from news servers.  Perhaps slightly outdated now, but a seriously efficient protocol that can be used for distributing emails.

The big security issue with all these protocols however is that the majority in default mode send their messages in plain text. You can obviously counteract this by encrypting on a client level, the easiest method is by simply using a VPN. Many people already use VPN to access things like various media channels – read this post about BBC iPlayer VPN which is not exclusively about security more about bypassing region blocks.

However remember when an email message is transmitted in clear text it can be intercepted at various levels. Anyone with a decent network sniffer and access to the data could read the information and message content. The solution is in some ways obvious and implied in the title of this post – implement SSL. Using this extra security layer you can protect all the simple RFC based email protocols, and better still they can slot simply to interact with standard email systems like Exchange.

It works and is easy to implement and also when SSL is implemented the server will accept connections on the SSL sport and not the standard oirt that the email protocol normally uses. If you have only one or two users who need a high level of email security then using a virtual private network might be sufficient. There are many sophisticated services that come with support – for instance this BBC Live VPN is based in Prague and has some high level security experts who work in support.

No Comments News, Proxies, VPN