Monday, February 28, 2011

All About Linux Hosting & Windows Hosting

All About Linux Hosting

A Linux hosting choice often comes down to cost, stability, and security. The open source Linux hosting alternative allows websites to be built on the Linux operating system. By using this platform for your website development, you gain access to effective and efficient open source technologies, including PHP, MySQL, Python, and XML.

The definitions of these are as follows:

MySQL - A popular open source database

Read more about MySQL Hosting

PHP - A server-side scripting language. PHP commands are embedded in a web page's HTML. PHP applications are normally found on Linux servers along with MySQL databases. This gives Linux hosting capabilities comparable to those of Windows hosting.

Read more about PHP Hosting

Python - A portable, open source programming language. Although it is copyrighted, its license allows it to be used and redistributed freely, including as part of products that are sold.

XML - Short for Extensible Markup Language. It is designed for web documents and lets web designers define their own custom tags to describe their data.

What about the Linux operating system itself? Simply put, it is easy to get. You can download it from the Internet, buy it on an inexpensive CD, or purchase it with full manuals and support for less than $150.

The significant difference between Windows and Linux is that Linux includes its source code. Source code is a program written in a programming language, which is translated automatically into machine code that a computer understands. The advantage is that, since you have the source, you can alter it to do whatever you need your Linux web hosting setup to accomplish. Programmers are constantly working to improve Linux, so problems are found and corrected quickly. Linux hosting providers also tend to offer more reliable uptime and fewer crashes, which is great for your business. Upgrades, security, and software issues are all handled with minimal or no disruption to your service.

While most software made for Windows will not run efficiently on Linux, many Linux vendors make versions of the programs that you are accustomed to using with Windows. Linux systems also have tools and web browsers that are similar to those used on Windows. Linux runs on a wide range of computers, processors, and disk sizes.

Windows Hosting

Windows Web Hosting Explained

There is a common misconception that just because you use a Windows operating system on your desktop, you must use a Windows hosting provider for your web presence. The two are unrelated. Just as your personal computer requires an operating system, which is more than likely a version of Microsoft Windows, so too does a web server. Microsoft just so happens to produce a server operating system as well, and that is what Windows web hosting runs on.

Though the OS on your desktop is not a determining factor, there are many instances where a Windows web hosting company is necessary. If, for example, you use .NET to build your website, you will need a Windows hosting provider. If you use MS SQL or Microsoft Access databases, you will need a Windows hosting provider. If you use ASP or ASP.NET scripts for server-side programming, you'll want a Windows web hosting provider. If you use PHP scripts and MySQL databases, however, you may want a Linux host.

Windows hosting offers ASP, ASP.NET, and Access database support, the MS SQL Server database, an IIS server, and is PHP and MySQL compatible. Most people using a Windows web hosting server are running Active Server Pages (ASP) technology. Linux, however, offers greater support for Perl scripts and CGI programs.

If you use FrontPage to design and publish your website, you'll find better support on a Windows hosting provider than on a Linux or Unix based provider. The same applies if you plan to have Windows Streaming Media on your site or use Visual InterDev. If, on the other hand, you expect to use remote interactive access via SSH or telnet, you'll find support only from Linux servers.

Linux is also open source, rather than proprietary like Microsoft's Windows products, which gives it a versatility and malleability that Windows hosting lacks. If you enjoy programming custom applications, or scouring the net for innovative, free web tools, Linux may be just your thing. But if you'd rather rely on the expertise and experience of a stalwart team in a highly competitive industry, if you'd prefer neatly interlocking, integrated modules with point-and-click simplicity and minimal programming knowledge on the part of the user, then Windows web hosting is probably for you.

Although a Windows server can be quite expensive, it has one of the simplest user interfaces available, and is therefore a server of choice for new and inexperienced administrators. Windows hosting is also convenient in that it is compatible with most other Microsoft products.

Providers with the newest Windows hosting version offer their clients a collection of third-party applications called Web Site Starters, custom designed for a Windows environment.

If, heaven forbid, your database technology and web-building tools are from different platforms (one Linux-based, the other Windows-based) then your first step is probably to determine which you prefer - Linux or Windows web hosting - and switch over any incompatible web software to the winning platform. Generic, cross-platform technologies offer you the most versatility, but often with limited functionality. It's worth doing a little extra research to find which software systems you prefer, because you'll get more for your money with a fully integrated web-presence package than doing it piecemeal. And you'll have a far easier time with the inevitable, constant updating involved in administering a website.

Types of Web Hosting

Whenever you visit a website, what you see in your web browser is essentially a web page downloaded from the web server. In general, a website is made up of many web pages, and a web page is basically composed of text and images. All these web pages need to be stored on web servers so that online users can visit your website. Therefore, if you plan to own a new website, you will need to host it on a web server.

When your website goes live on the web server, online users can browse it on the Internet. A company that provides the web servers to host your website is called a web hosting provider.

There are different kinds of Windows web hosting companies out there, each with its own characteristics. The main types of web hosts can be organized into the following categories:

a) Shared Hosting: In shared hosting (also known as virtual web hosting), many websites share the space on the same physical web server. Depending on the web host, a physical web server can host from a few hundred to even thousands of different websites at one time. But when the web server is congested and exceeds the reasonable number of websites that it can support, you will begin to experience slower responses from the web server.

b) Dedicated Hosting: In contrast to shared hosting, dedicated hosting assigns a specific web server to be used by only one customer. Since a dedicated web server is allocated to a single customer, the customer has the option to host one or multiple websites, modify the software configuration, handle greater site traffic, and scale the bandwidth as necessary. Dedicated hosting therefore commands a higher premium: it typically starts at $50 per month and can range up to $200 - $500 per month. As a result, dedicated hosting is regularly used by high-traffic and highly important websites.

c) Co-location hosting: In dedicated hosting, the web server belongs to the web hosting provider and customers only rent the web server during the hosting period. In co-location hosting, by contrast, the customer owns the web server hardware and only houses it within the web hosting provider’s secure data center. In this way, the customer has full control over the web server and at the same time benefits from the 24/7 server monitoring and maintenance provided by the secure data center. Depending on the monthly bandwidth and rack space required, co-location hosting typically ranges from $500 - $1000 per month.

d) Reseller hosting: In Windows reseller hosting, a web hosting provider offers web server storage to a third party (i.e. a reseller) at a discount price, who then resells the web server storage to their own customers. Typically, resellers are web consultants, including web designers, web developers, or system integration companies, who resell web hosting as an add-on service to complement their other services. Commonly, resellers can receive up to a 50 percent discount on the price of a hosting account from the web hosting provider. Resellers are also allowed to decide their own pricing structure and even set up their own branding.


Tomcat Hosting - How To Handle Tomcat Java With Your Web Hosting Company

If you are launching a site that needs to use Tomcat, make sure to check with your web host. I recently launched a site that used Tomcat Java, and we had a very hard time finding a host that would get it to work. I will share here what I learned from finally getting this project launched.

Tomcat Java is really Apache Tomcat; it also goes by the names Jakarta Tomcat or just plain old Tomcat. It was created by the Apache Software Foundation and is an open source servlet container. Without getting too technical, it implements the Java Servlet and JavaServer Pages (JSP) specifications from Sun, and provides a pure Java server environment for Java code to run in.

Don't confuse Tomcat with the Apache web server; they are completely different. The Apache web server is a C implementation of an HTTP server. Apache Tomcat includes its own tools for configuration and management. Another nice thing is that Tomcat can also be configured by editing XML files.
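To give a feel for what XML-based configuration looks like, here is a minimal Python sketch that reads the connector port out of a Tomcat-style server.xml fragment. The fragment is a trimmed, illustrative example rather than a complete configuration, and the helper name connector_ports is made up for this post.

```python
import xml.etree.ElementTree as ET

# Illustrative, trimmed-down server.xml fragment (not a complete Tomcat config).
SERVER_XML = """
<Server port="8005" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" />
  </Service>
</Server>
"""

def connector_ports(xml_text):
    """Return the port of every <Connector> element as an integer."""
    root = ET.fromstring(xml_text)
    return [int(connector.get("port")) for connector in root.iter("Connector")]

print(connector_ports(SERVER_XML))
```

A check like this can be handy when asking a host which port Tomcat was enabled on.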

After extensive research we settled on HostGator, which will install Tomcat free on Windows dedicated or Linux VPS/dedicated plans that use cPanel for a control panel. In both instances Tomcat is installed as an add-on in cPanel or Plesk, depending on the server operating system. You have to contact the support team to enable it.

If you are launching a website that needs to utilize Tomcat hosting, there are options available. Many web hosts do not turn this plugin on by default and many charge an upgrade fee if they do. For us, we just had to ask for it to be turned on.


Slowly Changing Dimensions SCD in Dimensional Modeling

Slowly Changing Dimensions

Entities change over time. Changes in customer demographics, product characteristics, classification rules, customer status, and so on lead to changes in the attributes of dimensions. In a transaction system, the change is often simply overwritten and the history of the change is lost.

For example, a source system may hold only the latest customer PIN code, since that is what is needed to send marketing and billing statements. A data warehouse, however, needs to maintain all the previous PIN codes as well, because we need to track how many customers move to new locations and how often.

A key benefit of a data warehouse is that it provides historical information, which is typically overwritten (and thus lost) in transaction systems. How slowly changing dimensions are handled in a dimensional model is a key determinant of that benefit.

There are three ways to handle them:

Slowly Changing Dimension Method 1 (SCD 1)

This is the way most source systems handle it: overwrite the attribute value. For example, if a customer’s marital status has moved from 'Unmarried' to 'Married', we overwrite 'Unmarried' with 'Married'. Similarly, if an insurance policy status has moved from 'Lapsed' to 'Re-instated', the new status is written over the old status. This is appropriate only when we do not need to analyze the historical information.
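As a rough illustration of method 1, the Python sketch below overwrites an attribute in place. The dictionary layout and the apply_scd1 helper are hypothetical and are not taken from any particular ETL tool.

```python
def apply_scd1(dimension, customer_id, attribute, new_value):
    """SCD 1: overwrite the attribute in place; the previous value is not kept."""
    dimension[customer_id][attribute] = new_value

# Hypothetical customer dimension keyed by the production customer ID.
customer_dim = {110002: {"marital_status": "Unmarried"}}

apply_scd1(customer_dim, 110002, "marital_status", "Married")
print(customer_dim[110002])
```

After the call there is no trace that the customer was ever 'Unmarried', which is exactly the trade-off of this method.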

Slowly Changing Dimension Method 2 (SCD 2)

This is the true-blue technique for delivering precise historical analysis. It is used when an entity's attributes can change more than once and we need to track the date of each change.

In this method, a new record is added whereby the new record is given a separate identifier as the primary key. We cannot use the production key as the primary key here as it has not changed (Customer ID has remained the same, while the value of its attribute 'marital status' has changed). This new identifier is called the surrogate key.

Apart from adding a new record and providing a new primary (surrogate) key, the validity period for this new record is also added.

For example, suppose you have a dimension table containing customer_ID '110002' with marital status 'Single'. Over time, the customer gets married and also moves to a new location. The customer dimension records will be:

Surrogate Key  Customer ID  Date Valid     Marital Status  Date of Birth  City
1100021        110002       Sept 23, 2004  Single          Jan 8, 1982    Palo Alto
1100022        110002       Oct 25, 2005   Married         Jan 8, 1982    Palo Alto
1100023        110002       Nov 23, 2005   Married         Jan 8, 1982    San Francisco
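
The same three records can be produced with a small Python sketch. The apply_scd2 helper and the key scheme (customer ID followed by a version digit, chosen only to mirror the example keys) are assumptions for illustration; a real warehouse would more likely draw surrogate keys from a sequence.

```python
from datetime import date

def apply_scd2(rows, customer_id, valid_from, **changes):
    """SCD 2: append a new version of the customer's row with a new surrogate key."""
    versions = [r for r in rows if r["customer_id"] == customer_id]
    latest = max(versions, key=lambda r: r["date_valid"])
    new_row = {**latest, **changes}      # copy the latest version, apply the changes
    new_row["date_valid"] = valid_from   # start of validity for the new version
    # Hypothetical key scheme: customer ID followed by a version digit.
    new_row["surrogate_key"] = customer_id * 10 + len(versions) + 1
    rows.append(new_row)
    return new_row

customer_dim = [{
    "surrogate_key": 1100021, "customer_id": 110002,
    "date_valid": date(2004, 9, 23), "marital_status": "Single",
    "date_of_birth": date(1982, 1, 8), "city": "Palo Alto",
}]

apply_scd2(customer_dim, 110002, date(2005, 10, 25), marital_status="Married")
apply_scd2(customer_dim, 110002, date(2005, 11, 23), city="San Francisco")

for row in customer_dim:
    print(row["surrogate_key"], row["date_valid"], row["marital_status"], row["city"])
```

The production key (customer_id) stays the same in every version, while each version gets its own primary key and validity date.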

Slowly Changing Dimension Method 3 (SCD 3)

This is a midway point between method 1 and method 2. Here we don’t add an additional record, but add a new field, 'old attribute value'. However, this has limitations. First, the design has to know from the beginning which attributes will change, because a new field has to be added for every attribute that can change. Second, an attribute can change at most once in the lifetime of the entity, or at least in the lifetime of the data warehouse.

Surrogate Key  Customer ID  Marital Status  Date of Birth  City           Marital Status Old  City Old
1100021        110002       Married         Jan 8, 1982    San Francisco  Single              Palo Alto
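
A minimal Python sketch of method 3 follows, assuming a hypothetical apply_scd3 helper and an "_old" column naming convention. It also shows the second limitation: a further change to the same attribute pushes out the original value.

```python
def apply_scd3(row, attribute, new_value):
    """SCD 3: keep the previous value in a pre-designed '<attribute>_old' column."""
    old_column = attribute + "_old"
    if old_column not in row:
        # The design must have anticipated that this attribute can change.
        raise KeyError("no '%s' column was designed for '%s'" % (old_column, attribute))
    row[old_column] = row[attribute]
    row[attribute] = new_value

customer = {
    "surrogate_key": 1100021, "customer_id": 110002,
    "marital_status": "Single", "marital_status_old": None,
    "city": "Palo Alto", "city_old": None,
}

apply_scd3(customer, "marital_status", "Married")
apply_scd3(customer, "city", "San Francisco")
print(customer)

# A second move overwrites city_old, so 'Palo Alto' is lost.
apply_scd3(customer, "city", "Oakland")
print(customer["city"], customer["city_old"])
```

The city 'Oakland' is made up for the demonstration; it does not appear in the example table.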

NOTE – The term 'slowly changing dimension' is used because it is the universally acknowledged term. However, the same methods apply to fast-changing dimensions as well.

Surrogate Keys as Primary Keys of Dimension Tables

It is a best practice in dimensional model design not to use the production primary key as the primary key of the dimension table. This goes against conventional logic, but there is a reason for it.

A data warehouse has a core need to maintain historical information about how an entity has moved and changed shape over time. Source systems typically have much less need for this kind of information. Where source systems do track history, they have the luxury of using a multiple-field primary key (the key identifier of the entity plus a date stamp). For example, if an insurance policy lapses and is reinstated two months later, the 'policy history table' can use a combination of policy number + date/time + status as its primary key. However, data warehouse practice does not recommend using a multiple-field primary key in a dimension table.

Therefore, the concept of a 'surrogate key' comes into play, where the primary key is not the production key but a key generated by the system. The production key is still kept as an attribute within the same dimension table.

The situations in which a surrogate key is used:

* 'Slowly changing dimensions'
* When the primary key itself is repeated.
* When there is a multiple-field primary key. A dimensional model typically does not use a multiple-field primary key to link to the fact table.

Therefore it is always recommended to use surrogate keys. It is difficult to find an organization that will not face the situations highlighted above. Any that do not could just as well manage their needs using Excel and pivot tables.
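
As a rough sketch of the idea, the Python class below hands out a single-field, system-generated key for each multiple-field production key. The SurrogateKeyGenerator name, the policy number, and the dates are all invented for illustration.

```python
from itertools import count

class SurrogateKeyGenerator:
    """Map each (possibly multiple-field) production key to one generated integer."""

    def __init__(self):
        self._next = count(1)   # system-generated sequence: 1, 2, 3, ...
        self._assigned = {}     # production key -> surrogate key

    def key_for(self, production_key):
        if production_key not in self._assigned:
            self._assigned[production_key] = next(self._next)
        return self._assigned[production_key]

keys = SurrogateKeyGenerator()

# Hypothetical policy history: policy number + date + status as the production key.
lapsed = keys.key_for(("POL-1001", "2010-01-15", "Lapsed"))
reinstated = keys.key_for(("POL-1001", "2010-03-15", "Re-instated"))
print(lapsed, reinstated)
```

The fact table can then link to the dimension through one integer column, while the production key fields remain available as ordinary attributes.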


Slowly changing dimensions modeling tips

Modeling slowly changing dimensions

Oracle is only a computer program, and therefore does exactly what it is coded to do (most of the time). Quantum mechanics really does not seem to have any relevance in the Oracle world because it is coded with specific outcomes to every true-or-false question. Logic gates are applied to bits, and the programmed outcome takes place. Non-programmed outcomes result in errors (ORA-600, ORA-7445).

However, things get more complex when we are modeling new environments because we end up dealing with conceptual outcomes with multiple possibilities. This is especially true in dimensional modeling, because dimensions are supposed to hold factual lookup data. How can we record a fact when important data regarding that fact may change?

For example, imagine a tax company that keeps track of deductions found in their various offices and wishes to form a star schema that will allow past and present analysis.

Let CLIENT_DIM record = E. Schrödinger, 3 dependents
Let TIME_DIM record = 2006 tax year
Let LOCATION_DIM record = Virginia Beach, VA

The data will come together in our fact table to show that he had $25,000 in deductions for this combination of dimensional data. This “fact” is currently safe. During the next year, E. Schrödinger and his wife have a lovely baby boy (at least in this universe!).

Let CLIENT_DIM record = E. Schrödinger, 4 dependents
Let TIME_DIM record = 2007 tax year
Let LOCATION_DIM record = Virginia Beach, VA

The data will once again come together in our fact table, this time reporting $28,000 in deductions (I’m not a CPA, don’t get me on tax code) due to the extra dependent. This “fact” is currently safe, just as the last one was.

However, we now have a paradox in our data. In 2007, we report $28,000 in deductions, which was based on the fact that E. Schrödinger had four dependents. In 2006, we reported $25,000 in deductions due to E. Schrödinger’s 3 dependents. When we run our analytic reports for 2007, we will get great results; we will be able to break down the deductions and the number of dependents will play a proper role in these calculations.

But when we run our reports against 2006, the deduction calculations will not compute properly. The CLIENT_DIM record will show 4 dependents, but the deduction amount for the 2006 year will have been based upon 3 dependents. Our dimensional data (CLIENT_DIM record) changed over time.

This is known as a slowly changing dimension (or SCD). Though we may not realize it, almost every dimension is in fact slowly changing; stores may move, clients may die (especially if they hang out with sadistic quantum physicists with ready supplies of hydrocyanic acid), and even our human definition of time can change over time (consider the changes to daylight savings this year).

However, we don’t have to worry about all of these possible changes; we only have to worry about the ones pertaining to the facts on which we are attempting to report. Our business needs, in the end, determine which dimensions must be slowly changing.
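
The 2006 reporting problem described above can be reproduced in a few lines of Python. This is only a sketch: the table layouts and the client_key column are assumptions, and the dimension record is simply overwritten when the new dependent arrives.

```python
# One row per client in the dimension; facts point at it through client_key.
client_dim = {1: {"name": "E. Schrödinger", "dependents": 3}}
fact = [{"client_key": 1, "tax_year": 2006, "deductions": 25000}]

# During 2007 the dimension record is overwritten in place.
client_dim[1]["dependents"] = 4
fact.append({"client_key": 1, "tax_year": 2007, "deductions": 28000})

def report(year):
    """Join facts for one tax year to the current state of CLIENT_DIM."""
    return [
        (f["tax_year"], client_dim[f["client_key"]]["dependents"], f["deductions"])
        for f in fact
        if f["tax_year"] == year
    ]

print(report(2007))  # [(2007, 4, 28000)]
print(report(2006))  # [(2006, 4, 25000)]: 4 dependents against a 3-dependent deduction
```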

There are three types of slowly changing dimensions: Type 1, Type 2, and Type 3. Each of these types tries to help the designer of the star schema eliminate paradox from their dimensional model (just as the three interpretations of the Schrödinger’s Cat thought experiment try to eliminate the paradox of the living dead).

  • Type 1 slowly changing dimension: Overwrite the old value with the new value and call it a day. This is very useful when dealing with issues such as typos on the client’s name. We don’t care about the history in this case because it was incorrect anyways.
  • Type 2 slowly changing dimension: Create a new record in the dimension with a new primary key. In the example we’ve given, there would be two records in CLIENT_DIM for E. Schrödinger, one in which he has 3 dependents and one in which he has 4. Though he is one person from the business point of view, he is two people from a dimensional point of view.
  • Type 3 slowly changing dimension: Overwrite the old value with the new value, and add additional data to the table such as the effective date of the change. This type of slowly changing dimension resolution would be beneficial if there is a change that can happen once and only once (such as death).

These three types of slowly changing dimension resolution usually help in resolving changes to “factual” lookup data. However, we can see clear correlations between these three types of resolutions and the three interpretations of the Schrödinger’s Cat thought experiment!

Slowly Changing Dimension Management Type 1

This clearly matches up with the Copenhagen Interpretation of the Schrödinger’s Cat thought experiment. In the Copenhagen Interpretation, the state of the cat changes and all other states are discarded as the waveform collapses. Criticism of this interpretation applies as well to Type 1 slowly changing dimension resolution. The Copenhagen Interpretation ignores the possibility of reconstruction; in quantum mechanics, it must be possible to return to any original state before the measurement took place. In our star schema, Type 1 resolution also ignores the possibility of reconstruction; we will not be able to return to the original state or even acknowledge that a previous state existed for the purpose of analytics.

Slowly Changing Dimension Management Type 2

This matches with the Many Worlds Interpretation of the Schrödinger’s Cat thought experiment. Instead of completely destroying the other possible waveform, we simply maintain that the two possibilities decohere and form their own universes that will no longer share any correlation. This interpretation shares a similar problem with Type 2 slowly changing dimension resolution. By spawning new records (universes) as the outcome of a changing event, we create multiple possibilities that no longer share any correlation.

For instance, if E. Schrödinger has a new dependent, there will be two resulting rows: one in which he has 3 dependents and one in which he has 4. If he then legally changes his last name to Schroedinger, we will have to record that change since it is important for tax records. Because of this, we will have three total records for this one client (E. Schrödinger with 3 dependents, E. Schrödinger with 4 dependents, and E. Schroedinger with 4 dependents). These three records will not have any correlation unless we create some sort of superkey that properly identifies a single person and their many instances. This will be important if we will be doing mining that incorporates multiple times, clients, and locations in our analysis.
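
One way to picture such a superkey is a column shared by every version of the same person. In the Python sketch below, the client_id column and its value are hypothetical; only the three versions of the client come from the example.

```python
from collections import defaultdict

# Three Type 2 versions of one client; client_id is the hypothetical superkey.
client_dim = [
    {"client_key": 1, "client_id": "C-001", "name": "E. Schrödinger", "dependents": 3},
    {"client_key": 2, "client_id": "C-001", "name": "E. Schrödinger", "dependents": 4},
    {"client_key": 3, "client_id": "C-001", "name": "E. Schroedinger", "dependents": 4},
]

def versions_by_person(rows):
    """Group dimension keys by the superkey so all versions of a person correlate."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["client_id"]].append(row["client_key"])
    return dict(grouped)

print(versions_by_person(client_dim))  # {'C-001': [1, 2, 3]}
```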

Slowly Changing Dimension Management Type 3

This matches with the Many Histories Interpretation of the Schrödinger’s Cat thought experiment. When an observed outcome occurs (like the birth of a new child), the old record is changed to reflect the new “real” outcome; in this case, the addition of a single dependent. However, the change is noted by the addition of a column such as an effective date, to show that this is not the only outcome that has ever existed, but it is THE outcome that does exist at this time.

In effect, the old outcome (3 dependents) still exists, but is discarded, ignored, and irretrievable now that the outcomes are decoherent. In our star schema, this type of resolution will only provide us with a confusing result of “This is the case now, but it was not always so. It changed on ….” For some situations, as with Type 1 resolution, this will suffice, such as the death of a client. We only need to record that death once. However, if any new data enters our model after the death of the client (post-mortem taxes?) or if the status changes again (miraculous recovery!), we may have unreliable report output.


In quantum mechanics, we are calculating multiple events that happen at the same exact point in time (or over a period of unknown time); whereas in a data warehouse we are dealing with history and fully acknowledge the changes time may have on our data. However, one cannot help but notice the correlation between the two forms of paradox and their resolutions.

In fact, in data warehousing the so-called Schrödinger’s Cat paradox becomes even more problematic because we are forced to not only predict future outcomes based on dimensional data, but to also report on past/present information based on the same data. Physicists attempting to provide interpretations of this thought experiment only have to worry about future conditions; the past and present are unmeasured, and therefore have no relevance on the problem except that they are in a quantum state.

They seek to explain the future of the cat once observation/decoherence have taken place. If Schrödinger’s Cat were in a data warehouse, we would have to analyze the entire life of the cat, the cat inside that horrible box of doom, and try to figure out whether the cat will want dry food, wet food, or a good burial place after the experiment is finished.


Slowly Changing Dimension Transformation

The Slowly Changing Dimension transformation coordinates the updating and inserting of records in data warehouse dimension tables. For example, you can use this transformation to configure the transformation outputs that insert and update records in the DimProduct table of the AdventureWorksDW2008R2 OLAP database with data from the Production.Products table in the AdventureWorks2008R2 OLTP database.

Important

The Slowly Changing Dimension Wizard only supports connections to SQL Server.

The Slowly Changing Dimension transformation provides the following functionality for managing slowly changing dimensions:

  • Matching incoming rows with rows in the lookup table to identify new and existing rows.

  • Identifying incoming rows that contain changes when changes are not permitted.

  • Identifying inferred member records that require updating.

  • Identifying incoming rows that contain historical changes that require insertion of new records and the updating of expired records.

  • Detecting incoming rows that contain changes that require the updating of existing records, including expired ones.

The Slowly Changing Dimension transformation supports four types of changes: changing attribute, historical attribute, fixed attribute, and inferred member.

  • Changing attribute changes overwrite existing records. This kind of change is equivalent to a Type 1 change. The Slowly Changing Dimension transformation directs these rows to an output named Changing Attributes Updates Output.

  • Historical attribute changes create new records instead of updating existing ones. The only change that is permitted in an existing record is an update to a column that indicates whether the record is current or expired. This kind of change is equivalent to a Type 2 change. The Slowly Changing Dimension transformation directs these rows to two outputs: Historical Attributes Inserts Output and New Output.

  • Fixed attribute changes indicate the column value must not change. The Slowly Changing Dimension transformation detects changes and can direct the rows with changes to an output named Fixed Attribute Output.

  • Inferred member indicates that the row is an inferred member record in the dimension table. An inferred member exists when a fact table references a dimension member that is not yet loaded. A minimal inferred-member record is created in anticipation of relevant dimension data, which is provided in a subsequent loading of the dimension data. The Slowly Changing Dimension transformation directs these rows to an output named Inferred Member Updates. When data for the inferred member is loaded, you can update the existing record rather than create a new one.
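
The inferred member idea can be sketched in Python as follows. This is not how the transformation is implemented; ensure_member, load_member, the "inferred" flag, and the sample product are hypothetical names used only to show a placeholder row being created early and updated in place later.

```python
def ensure_member(dimension, business_key):
    """Called during a fact load: create a minimal placeholder if the member is missing."""
    if business_key not in dimension:
        dimension[business_key] = {"name": None, "inferred": True}
    return dimension[business_key]

def load_member(dimension, business_key, name):
    """Called during a later dimension load."""
    row = dimension.get(business_key)
    if row is None:
        dimension[business_key] = {"name": name, "inferred": False}
    elif row["inferred"]:
        # Update the placeholder in place rather than creating a new record.
        row.update(name=name, inferred=False)

product_dim = {}
ensure_member(product_dim, "P-42")          # the fact row arrives first
load_member(product_dim, "P-42", "Widget")  # the real dimension data arrives later
print(product_dim)
```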


The Slowly Changing Dimension transformation does not support Type 3 changes, which require changes to the dimension table. By identifying columns with the fixed attribute update type, you can capture the data values that are candidates for Type 3 changes.

At run time, the Slowly Changing Dimension transformation first tries to match the incoming row to a record in the lookup table. If no match is found, the incoming row is a new record; therefore, the Slowly Changing Dimension transformation performs no additional work, and directs the row to New Output.

If a match is found, the Slowly Changing Dimension transformation detects whether the row contains changes. If the row contains changes, the Slowly Changing Dimension transformation identifies the update type for each column and directs the row to the Changing Attributes Updates Output, Fixed Attribute Output, Historical Attributes Inserts Output, or Inferred Member Updates Output. If the row is unchanged, the Slowly Changing Dimension transformation directs the row to the Unchanged Output.
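
The run-time decision described above can be sketched in plain Python. This only illustrates the matching logic and is not the SSIS API: the route function, the column-type mapping, and the sample product data are assumptions. In particular, checking the inferred flag before comparing columns, and returning every output triggered by the changed columns, are simplifications made for this sketch.

```python
UPDATE_TYPE_OUTPUT = {
    "changing": "Changing Attributes Updates Output",
    "historical": "Historical Attributes Inserts Output",
    "fixed": "Fixed Attribute Output",
}

def route(incoming, lookup, business_key, column_types):
    """Return the name(s) of the output(s) an incoming row would be directed to."""
    existing = lookup.get(incoming[business_key])
    if existing is None:
        return ["New Output"]                      # no match in the lookup table
    if existing.get("inferred"):
        return ["Inferred Member Updates Output"]  # assumption: checked first
    changed = [c for c in column_types if incoming.get(c) != existing.get(c)]
    if not changed:
        return ["Unchanged Output"]
    return sorted({UPDATE_TYPE_OUTPUT[column_types[c]] for c in changed})

lookup = {"P-1": {"name": "Bike", "color": "Red", "list_price": 100}}
column_types = {"name": "fixed", "color": "changing", "list_price": "historical"}

same = {"product_id": "P-1", "name": "Bike", "color": "Red", "list_price": 100}
print(route(same, lookup, "product_id", column_types))
print(route({**same, "color": "Blue"}, lookup, "product_id", column_types))
print(route({**same, "list_price": 120}, lookup, "product_id", column_types))
print(route({**same, "product_id": "P-2"}, lookup, "product_id", column_types))
```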

The Slowly Changing Dimension transformation has one input and up to six outputs. An output directs a row to the subset of the data flow that corresponds to the update and the insert requirements of the row. This transformation does not support an error output.

The following table describes the transformation outputs and the requirements of their subsequent data flows. The requirements describe the data flow that the Slowly Changing Dimension Wizard creates.



Output: Changing Attributes Updates Output
Description: The record in the lookup table is updated. This output is used for changing attribute rows.
Data flow requirements: An OLE DB Command transformation updates the record using an UPDATE statement.

Output: Fixed Attribute Output
Description: The values in rows that must not change do not match values in the lookup table. This output is used for fixed attribute rows.
Data flow requirements: No default data flow is created. If the transformation is configured to continue after it encounters changes to fixed attribute columns, you should create a data flow that captures these rows.

Output: Historical Attributes Inserts Output
Description: The lookup table contains at least one matching row. The row marked as “current” must now be marked as "expired". This output is used for historical attribute rows.
Data flow requirements: Derived Column transformations create columns for the expired row and the current row indicators. An OLE DB Command transformation updates the record that must now be marked as "expired". The row with the new column values is directed to the New Output, where the row is inserted and marked as "current".

Output: Inferred Member Updates Output
Description: Rows for inferred dimension members are inserted. This output is used for inferred member rows.
Data flow requirements: An OLE DB Command transformation updates the record using an SQL UPDATE statement.

Output: New Output
Description: The lookup table contains no matching rows. The row is added to the dimension table. This output is used for new rows and changes to historical attributes rows.
Data flow requirements: A Derived Column transformation sets the current row indicator, and an OLE DB destination inserts the row.

Output: Unchanged Output
Description: The values in the lookup table match the row values. This output is used for unchanged rows.
Data flow requirements: No default data flow is created because the Slowly Changing Dimension transformation performs no work. If you want to capture these rows, you should create a data flow for this output.

The Slowly Changing Dimension transformation requires at least one business key column.

The Slowly Changing Dimension transformation does not support null business keys. If the data include rows in which the business key column is null, those rows should be removed from the data flow. You can use the Conditional Split transformation to filter rows whose business key columns contain null values. For more information, see Conditional Split Transformation.
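
The effect of such a filter can be sketched in Python. This is a conceptual stand-in, not a Conditional Split expression; the split_on_business_key helper and the sample rows are hypothetical.

```python
def split_on_business_key(rows, key_column):
    """Separate rows with a usable business key from rows whose key is null."""
    keep, rejected = [], []
    for row in rows:
        if row.get(key_column) is None:
            rejected.append(row)   # would be removed from the data flow
        else:
            keep.append(row)
    return keep, rejected

rows = [
    {"product_id": "P-1", "name": "Bike"},
    {"product_id": None, "name": "Unknown"},
]
keep, rejected = split_on_business_key(rows, "product_id")
print(len(keep), len(rejected))  # 1 1
```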

For suggestions on how to improve the performance of the Slowly Changing Dimension Transformation, see Improving the Performance of the Data Flow.

You can log the calls that the Slowly Changing Dimension transformation makes to external data providers. You can use this logging capability to troubleshoot the connections, commands, and queries to external data sources that the Slowly Changing Dimension transformation performs. To log the calls that the Slowly Changing Dimension transformation makes to external data providers, enable package logging and select the Diagnostic event at the package level. For more information, see Troubleshooting Package Execution.

You can set properties through SSIS Designer or programmatically.

For more information about the properties that you can set in the Advanced Editor dialog box or programmatically, click one of the following topics:

For more information about how to set properties, see How to: Set the Properties of a Data Flow Component.

Coordinating the update and insertion of records in dimension tables can be a complex task, especially if both Type 1 and Type 2 changes are used. SSIS Designer provides two ways to configure support for slowly changing dimensions:

  • The Advanced Editor dialog box, in which you select a connection, set common and custom component properties, choose input columns, and set column properties on the six outputs. To complete the task of configuring support for a slowly changing dimension, you must manually create the data flow for the outputs that the Slowly Changing Dimension transformation uses. For more information, see Designing Package Data Flow.

  • The Load Dimension Wizard, which guides you through the steps to configure the Slowly Changing Dimension transformation and build the data flow for transformation outputs. To change the configuration for slowly changing dimensions, rerun the Load Dimension Wizard. For more information, see Configuring Outputs Using the Slowly Changing Dimension Wizard.


Sunday, February 20, 2011

Basic Java terminology

The purpose of this page is to help the beginner with the basic Java terminology. Advanced programmers and teachers often use words that are very common in Java and computers without explaining them because they are so accustomed to using them. Therefore, the beginner is often unable to understand what is being taught, because he is lacking in the basic vocabulary. The terminology below is meant to be easily understood, and is therefore not given a technical definition.

  • processor: The processor is inside the brain of the computer, and it is its job to read, or process, the instructions. It takes each instruction (after it has been broken down into basic computer language by the compiler) and runs it, turning on and off switches, causing the program to execute.
  • CPU: The CPU or Central Processing Unit, is the brain of the computer. It holds inside it the processor, the Arithmetic Logical Unit (ALU: the computer's math machine), and controls and tells the rest of the computer what to do.
  • code: Code just means the instructions that the programmer writes. The code of a program means the part that the programmer wrote up, and will give to the computer to run.
  • compiler: The compiler "translates" the code that you have written in the language that you understand (like Java) into a language that the computer understands, either assembly language, or pure computer language (ones and zeros).
  • keywords: Keywords are words that have a very specific meaning to the compiler. Whenever the compiler sees one, it knows what it is, and translates it as such. For example, the word "int" means a whole number. Whenever the programmer writes "int" in the program, it has only that meaning, a whole number. The programmer can never use it for something else. Another example is the word "if". "If" means: check whether what I am saying is true. If it is, then do the next line; if not, skip it. The word "if" can only mean that, and can never change. There are approximately 50 keywords.
  • control statements: Whenever the computer runs a Java program, it goes straight from the first line of code to the last. However, let's say you only want to run some code on a condition. For example, you are writing an adventure game, and your player is hit. Do you want him to die or not? Well, it depends on how many "hits" he is allowed to have before he dies. So you want a control statement that says "if he is down to his last hit, then run the dying code, else subtract one from the number of hits that he is allowed, and continue". The if else is one kind of control statement, and it changes control to read from a different line of code, instead of the next. For more see the Control statements section.
  • variables: Variables are words in your code that have different meanings at different times in the program. Why would I want that? The reason is that what makes a computer so powerful is the ability to be given a set of instructions and act differently based on the circumstances. For example, in the adventure game, your player can be hit 5 times before he dies. That number is going to change as he runs through the game. Sometimes when he gets to the tower (for example) he will have 4 "hits" remaining, sometimes he will be down to his last one, and when the bad guy comes and hits him he will die. How is the computer able to keep track and know this? With variables. There will be one variable that is called "life", and you will start it off with 5. Every time that the player gets hit, you will tell life to subtract one from itself. You will also check: if life equals zero, then run the code to die. Since "life" is a variable and not a keyword, you create this word. I called it life, but you can call it whatever you like. Also, when you go online, a website asks you to tell it your name. You type in "Shlomo" (assuming that that is your name). Then for the rest of the time that you are on that site they always call you Shlomo. You go onto a new page, and they say "Shlomo, what do you want to buy?". They don't have a hundred premade web pages for each name. Instead, they collect your name into a variable, let's call it firstName, and have one web page that says "print out 'firstName, what do you want to buy'".
  • operations and operands: Operations are symbols that have a specific meaning. Operands are the words that use the operations. For example the line " salary + bonus " means take the two operands of salary and bonus, and use the operation of "+" to add them. In Java the operation of "+" means to add, the operation of "=" means to get, and the operation of "==" means equal. "int payment = salary + bonus " means create a number variable called "payment" and let it get the variable of salary added to the variable of bonus.
  • int : In Java a whole number is represented by the word int. However, this is only one representation. For a number with a fractional part, like 4.5, you will need a different representation, like "double" or "float". The word "int" is a keyword that means whole number. If you write "int payment" you are saying that you are making a variable called payment that is a number. If you write "int payment = 5" you are saying that you are making a variable called payment that is a number and giving it the value of 5.
  • String : A String is a variable that holds a word, or a few words, or a mixture of characters, that do not mean anything besides being a quote. Meaning: when I go online and they ask me for my name, I tell them "Shlomo". They save that quote of "Shlomo" in a String variable. The actual word "Shlomo" doesn't mean anything to the compiler. However, whenever the program wants to call my name it says "print out that String that we called firstName" and whatever is in that String is printed. For example, if I were to write the code String message = "How are you today?"; I would be telling the computer that there is a variable called message, and whenever I refer to that variable, I am referring to the String of "How are you today?".
  • Primitive : Almost everything in Java is an Object or Class, meaning that it has inside it operations, behaviors and methods (meaning stuff). (See the Object and Class section). Primitives are the exception. A primitive is only what it is, and nothing more; it cannot do anything but hold that one piece of information that it is supposed to hold. For example, an int is a primitive. It can only hold a number. A String is an Object, so it can do things. Meaning, I can ask the String of "hello" for a capitalized version of itself, and it will give back "HELLO". (The original String is not altered; a Java String cannot be changed once it is created, so the result is a new String.) A primitive cannot do anything besides hold itself.
  • Debugging : debugging means finding the errors and fixing them. See the debugging and fixing errors section.
  • Objects, classes, class interfaces, methods, argument lists, parameters, inheritance : check out the section on Object and Class.
  • interfaces, user interfaces, and GUI : An interface simply means the way that two entities communicate with each other. For example, if two people write two separate programs that interact with each other, the programs communicate through an interface. If you write a program that works on a real system, like an ATM, then in order to communicate with the money dispenser, you will need an interface in between. An interface lets you say: I don't know how you do what you are doing; all I need is a common way to communicate, to get the information from you that I need to use, and to give you only what you need to use. (In Java there is a special thing called an interface, check out the above note). A user interface is the way that a user interacts with a machine. The user doesn't need to know how the machine works, or what is going on inside; the user just needs to be able to interact with it. The command console below is a user interface. A GUI or Graphical User Interface is a user interface that talks to the user using graphics. For example, Windows is a GUI. Instead of typing in the word "exit", "OK", or whatever you will need to type, in a GUI you can press a button, use a scrollbar, etc.
A User Interface: The ability to talk to the computer. You need a new folder, you type md; you need the directory, you type dir; etc.
A Graphical User Interface (GUI): The ability to talk to the computer with graphics. You need a new folder, you right click; you need the directory, you click the folder; etc.
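
Several of the terms above can be seen working together in a short sketch. The class name TerminologyDemo and the method names takeHit and greet are made up for illustration; only the variable names life, firstName and message come from the text.

```java
// A minimal sketch tying together variables, int, String, operations,
// and an if/else control statement. All class and method names are illustrative.
class TerminologyDemo {

    // Subtracts one hit from "life" and reports what the game should do next.
    static String takeHit(int life) {
        life = life - 1;               // "-" and "=" are operations; life and 1 are operands
        if (life == 0) {               // "==" asks whether the two sides are equal
            return "player dies";
        } else {
            return "player continues with " + life + " hits left";
        }
    }

    // Builds the greeting from a String variable instead of a premade page per name.
    static String greet(String firstName) {
        return firstName + ", what do you want to buy?";
    }

    public static void main(String[] args) {
        int life = 5;                          // an int variable that starts at 5
        System.out.println(takeHit(life));     // player continues with 4 hits left
        System.out.println(takeHit(1));        // player dies

        String firstName = "Shlomo";
        System.out.println(greet(firstName));

        // A String is an object: toUpperCase() returns a new String
        // and leaves the original unchanged.
        String message = "hello";
        String shouted = message.toUpperCase();
        System.out.println(message + " -> " + shouted);   // hello -> HELLO
    }
}
```

Note that takeHit receives a copy of the number, so the variable life in main still holds 5 afterwards; a real game would store the result back into its own variable.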

What is Java, its history & Where is Java being Used?

What is Java, its history?

Java is a high-level, object-oriented programming language developed by Sun Microsystems. Though it is associated with the World Wide Web, its origins predate the Web's widespread adoption. It was originally developed with consumer electronics and communication equipment in mind. It came to prominence in the 1990s as a platform-independent programming language for web applications and web services.

Earlier, C++ was widely used to write object-oriented programs; however, it was not platform independent and needed to be recompiled for each different CPU. A team at Sun Microsystems including Patrick Naughton and Mike Sheridan, under the guidance of James Gosling, decided to develop an advanced programming language for consumer electronic devices. They wanted to make new software, based on the power of networks, that could run in different application areas, such as computers and electronic devices. In 1991 they created a platform-independent language and named it Oak. Later, due to a trademark conflict, it was renamed Java, and in 1995 Java was publicly announced to the world.

Java is influenced by C, C++ and Smalltalk, and borrowed some advanced features from other languages. The company promoted it with the slogan "Write Once, Run Anywhere", meaning that a program can be developed once and run on any device equipped with a Java Virtual Machine (JVM). The language is available on all major operating systems, including Linux, Windows, Solaris and HP-UX.

Where is Java being Used?

The programming language Java was released by Sun Microsystems in 1995. At first it was intended only for designing and programming small computing devices, but it was later adopted as a platform-independent programming language. The most important feature of Java is its byte code, which can be interpreted on any platform, including Windows, Linux etc. One can also download it freely from the official website of Sun.

As mentioned above, the Java programming language was originally developed for small devices, but now it can be found in a variety of places like cell phones, e-commerce applications, PCs and almost all network or computing devices.

Java is available in different forms:
JSP – Like PHP and ASP, Java Server Pages mix code with normal HTML tags, which helps in creating dynamic web pages.

Java Applets – This is another type of Java program that is used within a web page to add new features to a web browser. Applets are small programs used for instant messaging, chat services, solving complex calculations and many other purposes.

J2EE – Java 2 Enterprise Edition is used by various companies to build enterprise applications, including ones that transfer data between one another as XML structured documents.

JavaBeans – A JavaBean is a reusable software component, something like a Visual Basic component, that can be easily assembled with others to create new and advanced applications.

As far as syntax is concerned, Java is similar to the C programming language but has a distinct style of coding. It has all the general programming features like loops, data types, conditions, curly braces, semi-colons etc. It is a fully featured Object Oriented Programming (OOP) language, as it supports all OOP features including classes, encapsulation, inheritance, polymorphism etc.

Mobile Java - Besides the above technologies, Java is also used in various entertainment devices, especially mobile phones. The Mobile Information Device Profile (MIDP) uses the Java run time environment in cell phones, mobile tracking systems and other traditional PDA devices. Java-enabled applications are key to the games and services available in the mobile world. Java also plays an important role in the field of telemedicine, in products such as PulseMeter. As far as mobile technology is concerned, it offers offline operation, so that users can get service even if they lose their connection. Today, all leading mobile companies like Nokia, Siemens and Vodafone are using Java technology. The Sun Java Wireless Toolkit offers complete support for developing different MIDP applications.

Java technology supports a healthy content ecosystem by offering a sound development and deployment environment that protects users and operators from down time and viruses. The increasing volume of users is now encouraging manufacturers and developers to apply Java technology in numerous other productive and functional ways, including MP3 players, digital TV, video, 3D, simplifying games, etc.

How to set up Java on my computer

The very first thing to do is install the Java Developer's Kit (JDK) onto your computer. Sometimes if you buy a Java book, the JDK comes on a CD with the book. Usually, however, you have to go to Sun's Java site in order to download it. For simplicity, you can go to Downloading compilers, which explains how to download the Java Developer's Kit and which kind to choose. After you have downloaded the JDK, follow its simple installation instructions. There are two ways to run a program. The basic way is to click the Start button at the bottom of the Windows desktop, click "Run" and type in "CMD". If you did not set the path (see the next paragraph for how), you will then have to find the bin directory. On my computer I typed "cd.." until I reached C:\>, found my Java directory (on my computer it was j2sdk1.4.1, but it may be different on yours), and then the bin directory. If you have set the path, then you can run Java in any folder that you want. I then typed "edit". "Edit" opens a simple editor (writing program), and the name of my program is called "". Every Java source file has the .java extension. I wrote some simple code that displays "welcome to programming" on the screen. I saved, exited, and then typed "javac". Javac stands for Java compiler, and the computer then compiles my program (meaning it converts what you wrote into a language that the computer can understand). If I were to look at my folder, I would find a new file called Hello.class. When I compiled my "", the compiler created a "Hello.class", which is in computer understood language. If you try to read it, it will look like garbage, because it is in a language that only the computer can read. I then type "java Hello", which tells the computer to run the class called "Hello". (Note that in Java capitalization makes a difference. If your class is called Hello, and you type java hello, it will not work.)
Once you are familiar with programming, a better choice would be to write your program using an editor or an IDE, which makes writing code easier. Go to Downloading an editor or an IDE to figure out which one you want, and how to get one.
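
As a rough sketch, the program described above might look like the following. The class has to be named Hello for the compiler to produce Hello.class; the source file name Hello.java and the two commands in the comment are assumptions inferred from that, and the greeting() helper is added here only so the message is easy to check.

```java
// Hello.java (assumed file name).
// Assumed commands: "javac Hello.java" to compile, then "java Hello" to run.
class Hello {

    // Returns the message the program displays.
    static String greeting() {
        return "welcome to programming";
    }

    // The JVM starts running the program here.
    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```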

Setting the path. The commands "javac" and "java" (amongst others) are not system commands, they are Java commands. Meaning that if you are in the Java bin directory, then the computer understands what these commands mean. If you are outside that directory, then the computer doesn't understand what these commands mean. In short, you do not need to set the path, but if you don't, all of your programs will have to be in the bin directory. If you want to create a folder on the C drive, call it javawork, and keep all of your code in that folder, then you need to set the path. How? The following is taken from Deitel's Java How To Program

Update the PATH variable in Windows

Next, you will set your system's PATH variable to conveniently run the Java 2 SDK executables (javac.exe, java.exe, javadoc.exe, etc.) from any directory without having to type the full path of the command. If you don't set the PATH variable, you need to specify the full path to the executable every time you run it, such as:

C:> \j2sdk1.4.1\bin\javac

While you do not need to set your system's PATH variable to run Java, other software you will install later expects that your PATH variable will already have been set.

It's useful to set the PATH permanently so it will persist after rebooting.

How do I set the PATH permanently?

To set the PATH permanently, add the full path of the j2sdk1.4.1\bin directory to the PATH variable. Typically this full path looks something like C:\j2sdk1.4.1\bin. Set the PATH as follows, depending on the version of Windows you are using:

Windows NT, Windows 2000, and Windows XP - To set the PATH permanently:

1. Choose Start, Settings, Control Panel, and double-click System. On Windows NT, select the Environment tab; on Windows 2000 select the Advanced tab and then Environment Variables. Look for "Path" in the User Variables and System Variables. If you're not sure where to add the path, add it to the right end of the "Path" in the User Variables. A typical value for PATH is:


Capitalization doesn't matter. Click "Set", "OK" or "Apply".

The PATH can be a series of directories separated by semi-colons (;). Microsoft Windows looks for programs in the PATH directories in order, from left to right. You should only have one bin directory for a Java SDK in the path at a time (those following the first are ignored), so if one is already present, you can update it to j2sdk1.4.1.

2. The new path takes effect in each new Command Prompt window you open after setting the PATH variable.

Windows 98, Windows 95 - To set the PATH permanently, open the AUTOEXEC.BAT file and add or change the PATH statement as follows:

1. Start the system editor. Choose "Start", "Run" and enter sysedit, then click OK. The system editor starts up with several windows showing. Go to the window that is displaying AUTOEXEC.BAT

2. Look for the PATH statement. (If you don't have one, add one.) If you're not sure where to add the path, add it to the right end of the PATH. For example, in the following PATH statement, we have added the bin directory at the right end:


Capitalization doesn't matter. The PATH can be a series of directories separated by semi-colons (;). Microsoft Windows searches for programs in the PATH directories in order, from left to right. You should only have one bin directory for a Java SDK in the path at a time (those following the first are ignored), so if one is already present, you can update it to j2sdk1.4.1.

3. To make the path take effect in the current Command Prompt window, execute the following:

C:> c:\autoexec.bat

To find out the current value of your PATH, to see if it took effect, at the command prompt, type:

C:> path

Windows ME - To set the PATH permanently:

From the start menu, choose Programs --> Accessories --> System Tools --> System Information. This brings up a window titled "Microsoft Help and Support". From here, choose Tools --> System Configuration Utility. Click the Environment tab, select PATH, and click Edit. Now add the SDK to your path as described in step B above. After you've added the location of the SDK to your PATH, save the changes and reboot your machine when prompted.
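
To see how the left-to-right PATH search described above behaves, the following sketch splits a PATH value on the platform's separator and reports the first directory that contains a given program. The class PathLookup and its two methods are hypothetical helpers written for this illustration, not part of the JDK; System.getenv and File.pathSeparator are standard.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper that mimics how the operating system searches the PATH.
class PathLookup {

    // Splits a PATH-style string into its directories, skipping empty entries.
    static List<String> splitPath(String path, String separator) {
        List<String> dirs = new ArrayList<>();
        for (String dir : path.split(Pattern.quote(separator))) {
            if (!dir.trim().isEmpty()) {
                dirs.add(dir.trim());
            }
        }
        return dirs;
    }

    // Returns the first directory, left to right, that contains the named file, or null.
    static String firstDirectoryContaining(List<String> dirs, String fileName) {
        for (String dir : dirs) {
            if (new File(dir, fileName).isFile()) {
                return dir;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH");
        if (path == null) {
            path = "";
        }
        List<String> dirs = splitPath(path, File.pathSeparator);
        // On Windows the separator is ";" and the compiler is javac.exe.
        String compiler = File.pathSeparator.equals(";") ? "javac.exe" : "javac";
        System.out.println("javac found in: " + firstDirectoryContaining(dirs, compiler));
    }
}
```

If this prints null, the JDK's bin directory is not on the PATH, which is the situation the steps above are meant to fix.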

What is programming?

A computer program is a set of instructions that guide a computer to execute a particular task. It is like a recipe that a cook follows to make a particular dish. The recipe contains a list of ingredients, called the data or variables, and a list of steps that tell the computer what to do with the data. So programming is the technique of making a computer perform something you want it to do.

Programs are written, or coded, in a programming language. We know the computer understands only binary, made up of the digits 1 and 0. Binary is difficult for humans to understand, so we generally write in a higher-level language instead. A program in a high-level language is then translated into the bytes that the computer understands. So a programmer writes source code and uses a tool, such as a compiler or an interpreter, that allows the computer to read, translate and execute the program to perform a function.

Today there are different user friendly and easily understandable languages supporting different styles of programming. Some of these computer languages are FORTRAN (formula translation), C, C++, Pascal, BASIC, Java, C sharp (C#) and many other high-level languages. Further, for some advanced languages like Java and the .NET languages, an interpretable p-code or byte code is generated.

These languages enable one to create and run various kinds of applications. However, in the whole process of programming it is important to understand that a program written in any of the high-level languages needs to be converted to machine language to run on a computer. This conversion is done with a compiler or an interpreter. Most programming languages have a compiler, an interpreter, or both available. The basic difference between the two is that a compiler converts the entire program into machine code before it runs. An interpreter, on the other hand, converts one statement at a time to machine language and executes it.

The most important aspect of programming is to analyze the problem and adopt a specific solution to it. This needs a programming approach that defines the modularity of the program that one writes and how it is related to others in an application. Basically there are two different programming approaches: procedure oriented and object oriented. The procedure oriented programming (POP) approach focuses on creating and ordering procedures, or blocks of code, each meant to accomplish a specific job. The key features of this kind of approach are: use of procedures, sequencing of procedures and sharing global data.

However, in the case of the object oriented programming (OOP) approach the focus is on identifying objects or data rather than on procedures. Unlike the procedure oriented approach, where a piece of code uses data, in the object-oriented approach the data uses a piece of code to execute tasks. Principal features of the object-oriented approach are: data classification into classes and objects, data encapsulation, abstraction, inheritance, and polymorphism. This kind of approach provides a realistic representation, flexibility to change and data security. Nowadays most of the high level programming languages such as Java, C#, C++, and Visual Basic are based on the object oriented approach.
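
The contrast between the two approaches can be sketched in Java. The bank-balance example and every name below are invented for illustration: the first class mimics the procedure-oriented style with shared global data, the second encapsulates the data in an object, and the third shows inheritance and polymorphism.

```java
// Procedure-oriented style: procedures operate on shared global data.
class ProceduralBank {
    static int balance = 0;                 // global data visible to every procedure

    static void deposit(int amount) {
        balance = balance + amount;
    }
}

// Object-oriented style: the data is hidden inside the object (encapsulation)
// and can only be changed through the object's own methods.
class Account {
    private int balance = 0;

    void deposit(int amount) {
        if (amount <= 0) {
            throw new IllegalArgumentException("deposit must be positive");
        }
        balance = balance + amount;
    }

    int getBalance() {
        return balance;
    }
}

// Inheritance and polymorphism: a subclass reuses Account and overrides one behavior.
class BonusAccount extends Account {
    @Override
    void deposit(int amount) {
        super.deposit(amount + 1);          // adds a bonus of 1 to every deposit
    }
}

class ApproachDemo {
    public static void main(String[] args) {
        ProceduralBank.deposit(10);
        System.out.println(ProceduralBank.balance);   // 10

        Account account = new BonusAccount();          // used through the Account type
        account.deposit(10);
        System.out.println(account.getBalance());      // 11
    }
}
```

In the procedural version any code can overwrite balance directly, while in the object version the only way in is through deposit, which is the "data security" mentioned above.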

Free Java tutorials & programming source code

Learn the fundamentals of Java programming language through a variety of online tutorials. These tutorials teach the essential concepts behind building applications using various programming concepts and modules. This site can be used as a practical, example based guide for beginning programmers or those without much Object Oriented programming experience.

Free Java Guide: This site lists general Java tutorials and specific Java programming topics for serious programming. In the SQL tutorial, the syntax of each command is first presented and explained, followed by an example. This site aims to teach beginners the building blocks of SQL. It is a well organized, easy to understand SQL guide with lots of examples that helps you get started using SQL. If you are also looking for a PL/SQL tutorial, this is the site. Our PL/SQL tutorial provides the help you need to get started using SQL and PL/SQL.

Master Java, find popular listings for various Java technologies ranging from Core Java, PL/SQL, HTML, XML and SQL


Learn the Core Java basics. This topic is for those learning Java programming or having general Java programming questions. It is a fundamental guide, aimed at beginners to java programming.


If you are looking to learn PL/SQL, this is the site. It provides the help you need to get started using SQL and PL/SQL. It gives an introduction to Procedural Structured Query Language (pl/sql). PL/SQL help, examples and references needed to start programming in plsql.


This covers the basics of SQL Language. This lists the commonly used SQL commands, and is divided into the various sections organized by sql topics. By the end of this sql guide, you should have a good general understanding of the SQL syntax. In addition, you should be able to write SQL queries using the correct syntax.

ORACLE Question Bank

It contains more than 500 Questions (in 10 pages) of oracle and SQL + PL/SQL which can be used for facing interviews and for personal evaluation of oracle knowledge organized by topic. This question bank helps by asking you questions and explaining which answer is correct and why.

HTML Tutorial for beginners HTML Tutorial - advanced

This is a terrific resource for beginners and students. It includes many copy & paste HTML scripts with detailed explanations that you can put right into an existing web page. It's also a good reference to find that tag that you just can't remember but need for your web page.

The following material is a part of 'IBM's resource for developers' website.

1. SCJP, Part 1

This SCJP guide is to help you become a Sun certified Java programmer. It is organized in the same way as the Sun Certified Java Programmer (SCJP) 1.4 exam and provides a detailed overview of all of the exam's main objectives. Throughout the java pdf, simple examples are provided to illustrate the important concepts covered in the exam.

2. Introduction to Core java I/O

This java I/O pdf is an overview of Java I/O and all the classes in the package. This guide assumes you have a basic knowledge of I/O, including Input Stream and Output Stream.

3. Enterprise Beans Fundamentals

This ejb pdf provides an introduction to Enterprise JavaBeans technology with particular attention to the role of Enterprise JavaBean components in distributed computing scenarios, the architecture, the extension APIs, and the fundamentals of working with EJB technologies

4. The Class Loader

The Java ClassLoader is a crucial, but often overlooked, component of the Java runtime system. It is the class responsible for finding and loading class files at run time. Creating your own ClassLoader lets you customize the JVM in useful and interesting ways, allowing you to completely redefine how class files are brought into the system.

5. Design Patterns 101

This lesson is for Java programmers who want to learn about java design patterns as a means of improving their object oriented design and development skills. After reading this pdf you will:
* Understand what design patterns are and how they are described and categorized in several well known catalogs
* Be able to use design patterns as a vocabulary for understanding and discussing object oriented software design
* Understand a few of the most common design patterns and know when and how they should be used

6. Introduction to Threads

This tutorial explores the basics of threads -- what they are, why they are useful, and how to get started writing simple programs that use them. It also explains the basic building blocks of more sophisticated threading applications, how to exchange data between threads, how to control threads, and how threads can communicate with each other.

XML Tutorials

Introduction: Learn what XML is all about and discover how it differs from HTML. Explore XML syntax rules, learn how to write well formed XML documents, adjust XML attributes, validate XML documents and do XML programming with Java. In these tutorials you will learn what XML is about. You'll understand the basic XML syntax and know what's needed to make XML usable along with Java programming. You'll be able to understand XML documents and most XML DTDs.

7. XML programming in Java technology, Part 1

This XML tutorial covers the basics of manipulating XML documents using Java technology. It looks at the common APIs for XML and discusses how to parse, create, manipulate, and transform XML documents.

8. XML programming in Java technology, Part 2

This looks at working with namespaces, validating XML documents, building XML structures without a typical XML document, converting between one API and another, and manipulating tree structures.

9. XML programming in Java technology, Part 3

Covers more sophisticated topics for manipulating XML documents with Java technology. It shows you how to do tasks such as generate XML data structures, manipulate those structures, and interface XML parsers with non XML data sources.

10. Understanding DOM

This is designed for developers who understand the basic concept of XML and are ready to move on to coding applications to manipulate XML using the Document Object Model (DOM). It assumes that you are familiar with concepts such as well-formedness and the tag-like nature of an XML document.


Thursday, February 17, 2011

Java 6.0 Features Part - 2 : Pluggable Annotation Processing API

1) Introduction

The first part of this article listed the major new features of Java 6 (Mustang) in areas like Common Annotations (JSR 250), Scripting Language for the Java Platform (JSR 223) and JDBC 4.0. This article assumes that Readers have a fair bit of knowledge of the various concepts of Java 5.0. First-time Readers of Java 6 are strongly encouraged to read the first part of this article titled "Introduction to Java 6.0 New Features, Part–I". This article covers the features left over from Part-I. More specifically, it will cover the Pluggable Annotation Processing API (JSR 269), Java API for XML Binding (JSR 222) and Streaming API for XML (JSR 173).
2) Pluggable Annotation Processing API
2.1) Introduction to Annotation

Annotations have been part of the Java World since Java 5.0. Java Annotations are a result of JSR 175, which aimed at providing a Meta-Data Facility to the Java Programming Language. They can be used by Build-time Tools and Run-time Environments to do a bunch of useful tasks like Code Generation, Validation and other valuable things. Java 6 has introduced a new JSR called JSR 269, which is the Pluggable Annotation Processing API. With this API, it is now possible for Application Developers to write a Customized Annotation Processor which can be plugged in to operate on the set of Annotations that appear in a Source File.

Let us see in the subsequent sections how to write a Java File which will make use of Custom Annotations along with a Custom Annotation Processor to process them.
2.2) Writing Custom Annotations

This section provides two custom annotations that will be used by a sample Java file and a custom annotation processor. One is a class-level annotation and the other is a method-level annotation. Following is the listing for both annotation declarations. See how the targets for the annotations are set to ElementType.TYPE and ElementType.METHOD respectively.

package net.javabeat.articles.java6.newfeatures.customannotations;

import java.lang.annotation.*;

// May only be placed on a type (class, interface, enum, or annotation).
@Target(value = {ElementType.TYPE})
public @interface ClassLevelAnnotation {
}

package net.javabeat.articles.java6.newfeatures.customannotations;

import java.lang.annotation.*;

// May only be placed on a method.
@Target(value = {ElementType.METHOD})
public @interface MethodLevelAnnotation {
}

package net.javabeat.articles.java6.newfeatures.customannotations;

@ClassLevelAnnotation
public class AnnotatedJavaFile {

    @MethodLevelAnnotation
    public void annotatedMethod() {
    }
}

The above is a Sample Java File that makes use of the Class Level and the Method Level Annotations. Note that @ClassLevelAnnotation is applied at the Class Level and the @MethodLevelAnnotation is applied at the method Level. This is because both the Annotation Types have been defined to be tagged to these respective Elements only with the help of @Target Annotation.
2.3) Writing a Simple Custom Annotation Processor

package net.javabeat.articles.java6.newfeatures.customannotations;

import java.util.*;
import javax.annotation.processing.*;
import javax.lang.model.*;
import javax.lang.model.element.*;

@SupportedAnnotationTypes(value = {"*"})
@SupportedSourceVersion(SourceVersion.RELEASE_6)
public class TestAnnotationProcessor extends AbstractProcessor {

    @Override
    public boolean process(
            Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {

        // List every annotation type found in the source files.
        for (TypeElement element : annotations) {
            System.out.println(element.getQualifiedName());
        }
        return true;
    }
}

Let us discuss the core points in writing a custom annotation processor in Java 6. The first notable thing is that the TestAnnotationProcessor class extends AbstractProcessor, an abstract base class for annotation processors. We have to declare which annotation types TestAnnotationProcessor supports. This is done through the class-level annotation @SupportedAnnotationTypes. A value of "*" indicates that all types of annotations will be processed by this annotation processor. The source version that the processor supports is declared through the @SupportedSourceVersion annotation.
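A processor of this shape can also be exercised directly from Java code through the javax.tools compiler API, which is handy for experimenting. The following is only a minimal sketch under stated assumptions: ProcessorDemo, ListingProcessor, StringSource and the Sample class are hypothetical names invented for this illustration, and the code needs a full JDK because it asks for the system compiler.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class ProcessorDemo {

    /** Hypothetical processor that records each annotation type it is offered. */
    @SupportedAnnotationTypes("*")
    static final class ListingProcessor extends AbstractProcessor {
        final Set<String> seen = new TreeSet<String>();

        @Override
        public SourceVersion getSupportedSourceVersion() {
            return SourceVersion.latestSupported();
        }

        @Override
        public boolean process(Set<? extends TypeElement> annotations,
                               RoundEnvironment roundEnv) {
            for (TypeElement annotation : annotations) {
                seen.add(annotation.getQualifiedName().toString());
            }
            return false; // leave the annotations available to other processors
        }
    }

    /** Keeps source text in memory so the sketch needs no files on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Runs the processor over one in-memory class and returns what it saw. */
    static Set<String> annotationsIn(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("A JDK is required, not just a JRE");
        }
        ListingProcessor processor = new ListingProcessor();
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, null,
                Arrays.asList("-proc:only"), // run processors, write no class files
                null,
                Arrays.asList(new StringSource(className, source)));
        task.setProcessors(Arrays.asList(processor));
        task.call();
        return processor.seen;
    }

    public static void main(String[] args) {
        String source = "@Deprecated public class Sample {"
                + " @Override public String toString() { return \"\"; } }";
        System.out.println(annotationsIn("Sample", source));
    }
}
```

Returning false from process() in this sketch, unlike the listing above, means the annotations are not claimed, so other processors would still see them.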

The javac compiler of Mustang has an option called '-processor' where we can specify the Name of the Annotation Processor along with a Set of Java Source Files containing the Annotations. For example, in our case, the command syntax would be something like the following,

javac -processor

The above command says that the name of the annotation processor is net.javabeat.articles.java6.newfeatures.customannotations.TestAnnotationProcessor and that it is going to process the sample Java file. As soon as this command is issued in the console, the TestAnnotationProcessor.process() method is called with the set of annotations found in the source files, along with the annotation processing information represented by RoundEnvironment. This TestAnnotationProcessor just lists the annotations present in the sample Java file by iterating over the set.

Following is the output of the above program



The Java XPath API

If you send someone out to purchase a gallon of milk, what would you rather tell that person? "Please go buy a gallon of milk." Or, "Exit the house through the front door. Turn left at the sidewalk. Walk three blocks. Turn right. Walk one half block. Turn right and enter the store. Go to aisle four. Walk five meters down the aisle. Turn left. Pick up a gallon jug of milk. Bring it to the checkout counter. Pay for it. Then retrace your steps home." That's ridiculous. Most adults are intelligent enough to procure the milk on their own with little more instruction than "Please go buy a gallon of milk."

Query languages and computer search are similar. It's easier to say, "Find a copy of Cryptonomicon" than it is to write the detailed logic for searching some database. Because search operations have very similar logic, you can invent general languages that allow you to make statements like "Find all the books by Neal Stephenson," and then write an engine that processes those queries against certain data stores.


Among the many query languages, Structured Query Language (SQL) is a language designed and optimized for querying certain kinds of relational databases. Other less familiar query languages include Object Query Language (OQL) and XQuery. However, the subject of this article is XPath, a query language designed for querying XML documents. For example, a simple XPath query that finds the titles of all the books in a document whose author is Neal Stephenson might look like this:

//book[author="Neal Stephenson"]/title

By contrast, a pure DOM search for that same information would look something like Listing 1:

Listing 1. DOM code to find all the title elements of books by Neal Stephenson

ArrayList result = new ArrayList();
NodeList books = doc.getElementsByTagName("book");
for (int i = 0; i < books.getLength(); i++) {
    Element book = (Element) books.item(i);
    NodeList authors = book.getElementsByTagName("author");
    boolean stephenson = false;
    for (int j = 0; j < authors.getLength(); j++) {
        Element author = (Element) authors.item(j);
        NodeList children = author.getChildNodes();
        StringBuffer sb = new StringBuffer();
        for (int k = 0; k < children.getLength(); k++) {
            Node child = children.item(k);
            // really should do this recursively
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        if (sb.toString().equals("Neal Stephenson")) {
            stephenson = true;
            break;
        }
    }

    if (stephenson) {
        NodeList titles = book.getElementsByTagName("title");
        for (int j = 0; j < titles.getLength(); j++) {
            result.add(titles.item(j));
        }
    }
}


Believe it or not, the DOM code in Listing 1 still isn't as generic or robust as the simple XPath expression. Which would you rather write, debug, and maintain? I think the answer is obvious.

However, expressive as it is, XPath is not the Java language -- in fact, XPath is not a complete programming language. There are many things you can't say in XPath, even queries you can't make. For example, XPath can't find all the books whose International Standard Book Number (ISBN) check digit doesn't match or all the authors for whom the external accounts database shows a royalty payment is due. Fortunately, it is possible to integrate XPath into Java programs so that you get the best of both worlds: Java for what Java is good for and XPath for what XPath is good for.

Until recently, the exact application program interface (API) by which Java programs made XPath queries varied with the XPath engine. Xalan had one API, Saxon had another, and other engines had other APIs. This meant your code tended to lock you into one product. Ideally, you'd like to be able to experiment with different engines that have different performance characteristics without undue hassle or rewriting of code.

For this reason, Java 5 introduced the javax.xml.xpath package to provide an engine and object-model independent XPath library. This package is also available in Java 1.3 and later if you install Java API for XML Processing (JAXP) 1.3 separately. Among other products, Xalan 2.7 and Saxon 8 include an implementation of this library.

A simple example

I'll begin with a demonstration of how this actually works in practice. Then I'll delve into some of the details. Suppose you want to query a list of books to find those written by Neal Stephenson. In particular, assume the list is in the form shown in Listing 2:

Listing 2. XML document containing book information

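<inventory>
    <book>
        <title>Snow Crash</title>
        <author>Neal Stephenson</author>
    </book>

    <book>
        <title>Burning Tower</title>
        <author>Larry Niven</author>
        <author>Jerry Pournelle</author>
    </book>

    <book>
        <author>Neal Stephenson</author>
    </book>
</inventory>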

Abstract factories

The XPathFactory is an abstract factory. The abstract factory design pattern enables this one API to support different object models such as DOM, JDOM, and XOM. To choose a different model, you pass a Uniform Resource Identifier (URI) identifying the object model to the XPathFactory.newInstance() method. For example, a URI that identifies XOM might select XOM. However, in practice, DOM is the only object model this API supports so far.
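As a rough illustration of selecting a factory by object model, the sketch below asks for the DOM model using the JDK's own constant, XPathFactory.DEFAULT_OBJECT_MODEL_URI. The class name FactoryLookup and the helper forModel are hypothetical, and turning the checked exception into an unchecked one is just one possible choice.

```java
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;

final class FactoryLookup {

    /** Asks JAXP for a factory bound to the given object model URI. */
    static XPathFactory forModel(String objectModelUri) {
        try {
            return XPathFactory.newInstance(objectModelUri);
        } catch (XPathFactoryConfigurationException e) {
            // No installed XPath engine supports this object model.
            throw new IllegalArgumentException(
                    "Unsupported object model: " + objectModelUri, e);
        }
    }

    public static void main(String[] args) {
        // DEFAULT_OBJECT_MODEL_URI identifies the W3C DOM.
        XPathFactory dom = forModel(XPathFactory.DEFAULT_OBJECT_MODEL_URI);
        System.out.println(
                dom.isObjectModelSupported(XPathFactory.DEFAULT_OBJECT_MODEL_URI));
    }
}
```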

The XPath query that finds all the books is simple enough: //book[author="Neal Stephenson"]. To find the titles of those books, simply add one more step so the expression becomes //book[author="Neal Stephenson"]/title. Finally, what you really want are the text node children of the title element. This requires one more step so the full expression is //book[author="Neal Stephenson"]/title/text().
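To see what each added step selects, the three expressions can be run against a small in-memory document. This is a sketch only: the class StepByStep, the helper describe, and the sample XML (including its inventory root element) are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class StepByStep {

    // Hypothetical sample data in the book/title/author shape used above.
    static final String XML =
          "<inventory>"
        + "<book><title>Snow Crash</title><author>Neal Stephenson</author></book>"
        + "<book><title>Burning Tower</title><author>Larry Niven</author>"
        +       "<author>Jerry Pournelle</author></book>"
        + "</inventory>";

    /** Evaluates the expression against XML and names each selected node. */
    static String describe(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, doc, XPathConstants.NODESET);
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getNodeName() is the tag name for elements, "#text" for text nodes.
                out.append(nodes.item(i).getNodeName()).append(' ');
            }
            return out.toString().trim();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("//book[author='Neal Stephenson']"));              // book
        System.out.println(describe("//book[author='Neal Stephenson']/title"));        // title
        System.out.println(describe("//book[author='Neal Stephenson']/title/text()")); // #text
    }
}
```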

Now I'll produce a simple program that executes this search from the Java language and then prints out the titles of all the books it finds. First you need to load the document into a DOM Document object. For simplicity, I'll assume the document is in the books.xml file in the current working directory. Here's a simple code fragment that parses the document and constructs the corresponding Document object:

Listing 3. Parsing a document with JAXP

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true); // never forget this!
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("books.xml");

So far, this is just standard JAXP and DOM, nothing really new.

Next you create an XPathFactory:

XPathFactory factory = XPathFactory.newInstance();

You then use this factory to create an XPath object:

XPath xpath = factory.newXPath();

The XPath object compiles the XPath expression:

XPathExpression expr = xpath.compile("//book[author='Neal Stephenson']/title/text()");

Immediate evaluation: If you only use the XPath expression once, you might want to skip the compilation step and call the evaluate() method on the XPath object instead. However, if you reuse the same expression many times, compilation is likely faster.
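The two styles can be compared side by side. In this sketch, which assumes hypothetical sample documents and the invented names ImmediateEvaluation, titleOnce and titlesCompiled, the one-off call passes the expression text straight to XPath.evaluate(), while the reusable path compiles once and evaluates the same XPathExpression against several documents.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ImmediateEvaluation {

    static final String QUERY = "//book[author='Neal Stephenson']/title";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not parseable as XML", e);
        }
    }

    /** One-off query: XPath.evaluate() takes the expression text directly. */
    static String titleOnce(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(QUERY, doc); // string value of the first match
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Repeated query: compile once, then reuse the expression per document. */
    static List<String> titlesCompiled(List<Document> docs) {
        try {
            XPathExpression expr =
                    XPathFactory.newInstance().newXPath().compile(QUERY);
            List<String> titles = new ArrayList<String>();
            for (Document doc : docs) {
                titles.add(expr.evaluate(doc)); // empty string when nothing matches
            }
            return titles;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample documents.
        Document a = parse("<inventory><book><title>Snow Crash</title>"
                + "<author>Neal Stephenson</author></book></inventory>");
        Document b = parse("<inventory><book><title>Burning Tower</title>"
                + "<author>Larry Niven</author></book></inventory>");
        System.out.println(titleOnce(a));
        System.out.println(titlesCompiled(Arrays.asList(a, b)));
    }
}
```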

Finally, you evaluate the XPath expression to get the result. The expression is evaluated with respect to a certain context node, which in this case is the entire document. It's also necessary to specify the return type. Here I ask for a node-set back:

Object result = expr.evaluate(doc, XPathConstants.NODESET);

You can then cast the result to a DOM NodeList and iterate through that to find all the titles:

NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue());
}

Listing 4 puts this all together into a single program. Notice also that these methods can throw several checked exceptions that I must declare in a throws clause, though I glossed over them above:

Listing 4. A complete program to query an XML document with a fixed XPath expression

import java.io.IOException;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import javax.xml.parsers.*;
import javax.xml.xpath.*;

public class XPathExample {

    public static void main(String[] args)
            throws ParserConfigurationException, SAXException,
                   IOException, XPathExpressionException {

        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true); // never forget this!
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        Document doc = builder.parse("books.xml");

        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();
        XPathExpression expr
                = xpath.compile("//book[author='Neal Stephenson']/title/text()");

        Object result = expr.evaluate(doc, XPathConstants.NODESET);
        NodeList nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }
    }
}



The XPath data model

Whenever you mix two different languages such as XPath and Java, expect some noticeable seams where you've glued the two together. Not everything fits just right. XPath and Java language do not have identical type systems. XPath 1.0 has only four basic data types:

* node-set
* number
* boolean
* string

The Java language, of course, has many more, including user-defined object types.

Most XPath expressions, especially location paths, return node-sets. However, there are other possibilities. For example, the XPath expression count(//book) returns the number of books in the document. The XPath expression count(//book[author="Neal Stephenson"]) > 10 returns a boolean: true if there are more than ten books by Neal Stephenson in the document, false if there are ten or fewer.

The evaluate() method is declared to return Object. What it actually does return depends on the result of the XPath expression, as well as the type you ask for. Generally speaking, an XPath

* number maps to a java.lang.Double
* string maps to a java.lang.String
* boolean maps to a java.lang.Boolean
* node-set maps to an org.w3c.dom.NodeList

XPath 2

So far I assumed that you're working with XPath 1.0. XPath 2 significantly expands and revises the type system. The main change needed in the Java XPath API to support XPath 2 is additional constants for returning the new XPath 2 types.

When you evaluate an XPath expression in Java, the second argument specifies the return type you want. There are five possibilities, all named constants in the javax.xml.xpath.XPathConstants class:

* XPathConstants.NODESET
* XPathConstants.BOOLEAN
* XPathConstants.NUMBER
* XPathConstants.STRING
* XPathConstants.NODE

The last one, XPathConstants.NODE, doesn't actually match an XPath type. You use it when you know the XPath expression will only return a single node or you don't want more than one node. If the XPath expression does return more than one node and you've specified XPathConstants.NODE, then evaluate() returns the first node in document order. If the XPath expression selects an empty set and you've specified XPathConstants.NODE, then evaluate() returns null.

If the requested conversion can't be made, then evaluate() throws an XPathException.
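The mapping between requested constants and Java types can be sketched as follows. The class ReturnTypes, the helper eval, and the two-book sample document are hypothetical; the casts simply follow the type mapping listed above.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class ReturnTypes {

    // Hypothetical sample data with two books.
    static final String XML =
          "<inventory>"
        + "<book><title>Snow Crash</title><author>Neal Stephenson</author></book>"
        + "<book><title>Burning Tower</title><author>Larry Niven</author></book>"
        + "</inventory>";

    /** Evaluates the expression against XML, asking for the given return type. */
    static Object eval(String expression, QName returnType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Double count = (Double) eval("count(//book)", XPathConstants.NUMBER);
        Boolean prolific = (Boolean) eval(
                "count(//book[author='Neal Stephenson']) > 10", XPathConstants.BOOLEAN);
        String title = (String) eval("//book/title", XPathConstants.STRING);
        Node first = (Node) eval("//book/title", XPathConstants.NODE);
        Node none = (Node) eval("//magazine", XPathConstants.NODE);

        System.out.println(count);                  // 2.0
        System.out.println(prolific);               // false
        System.out.println(title);                  // Snow Crash
        System.out.println(first.getTextContent()); // Snow Crash (first node only)
        System.out.println(none);                   // null (empty node-set)
    }
}
```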

Namespace contexts

If the elements in the XML document are in a namespace, then the XPath expression for querying that document must use the same namespace. The XPath expression does not need to use the same prefixes, only the same namespace URIs. Indeed, when the XML document uses the default namespace, the XPath expression must use a prefix even though the target document does not.

However, Java programs are not XML documents, so normal namespace resolution does not apply. Instead you provide an object that maps the prefixes to the namespace URIs. This object is an instance of the javax.xml.namespace.NamespaceContext interface. For example, suppose the books document is placed in a namespace, as in Listing 5:

Listing 5. XML document using the default namespace

Snow Crash
Neal Stephenson

The XPath expression that finds the titles of all of Neal Stephenson's books now becomes something like //pre:book[pre:author="Neal Stephenson"]/pre:title/text(). However, you have to map the prefix pre to the namespace URI. It's a little silly that the NamespaceContext interface doesn't have a default implementation in the Java software development kit (JDK) or JAXP, but it doesn't. However, it's not hard to implement yourself. Listing 6 demonstrates a simple implementation just for this one namespace. You should map the xml prefix as well.

Listing 6. A simple context for binding a single namespace plus the default

import java.util.Iterator;
import javax.xml.*;
import javax.xml.namespace.NamespaceContext;

public class PersonalNamespaceContext implements NamespaceContext {

    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new NullPointerException("Null prefix");
        else if ("pre".equals(prefix)) return "";
        else if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
        return XMLConstants.NULL_NS_URI;
    }

    // This method isn't necessary for XPath processing.
    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }

    // This method isn't necessary for XPath processing either.
    public Iterator getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }

}


It's not hard to use a map to store the bindings and add setter methods that allow for a more reusable namespace context.
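One way such a reusable context might look is sketched below. The class name MapNamespaceContext, its bind and query helpers, and the namespace URI http://example.org/books are all assumptions for illustration. Unlike Listing 6, this sketch throws IllegalArgumentException for a null prefix, which is what the NamespaceContext documentation specifies.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class MapNamespaceContext implements NamespaceContext {

    private final Map<String, String> uriByPrefix = new HashMap<>();

    MapNamespaceContext() {
        // The xml prefix is always bound to the same URI.
        uriByPrefix.put(XMLConstants.XML_NS_PREFIX, XMLConstants.XML_NS_URI);
    }

    /** Adds a binding and returns this context so calls can be chained. */
    MapNamespaceContext bind(String prefix, String uri) {
        uriByPrefix.put(prefix, uri);
        return this;
    }

    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("Null prefix");
        String uri = uriByPrefix.get(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    public String getPrefix(String uri) {
        Iterator<String> prefixes = getPrefixes(uri);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    public Iterator<String> getPrefixes(String uri) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : uriByPrefix.entrySet()) {
            if (entry.getValue().equals(uri)) matches.add(entry.getKey());
        }
        return matches.iterator();
    }

    /** Demo helper: evaluates an expression against namespaced XML. */
    static String query(String xml, String expression, NamespaceContext context) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(context);
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical namespace URI, used as the document's default namespace.
        String ns = "http://example.org/books";
        String xml = "<inventory xmlns=\"" + ns + "\">"
                + "<book><title>Snow Crash</title></book></inventory>";
        NamespaceContext context = new MapNamespaceContext().bind("pre", ns);
        System.out.println(query(xml, "//pre:book/pre:title", context));
    }
}
```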

After you create a NamespaceContext object, install it on the XPath object before you compile the expression. From that point forward, you can query using those prefixes as before. For example:

Listing 7. XPath query that uses namespaces

XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
xpath.setNamespaceContext(new PersonalNamespaceContext());
XPathExpression expr
        = xpath.compile("//pre:book[pre:author='Neal Stephenson']/pre:title/text()");

Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue());
}

Function resolvers

On occasion, it's useful to define extension functions in the Java language for use within XPath expressions. These functions perform tasks that are difficult or impossible to perform with pure XPath. However, they should be true functions, not simply arbitrary methods. That is, they should have no side effects. (XPath functions can be evaluated in any order and any number of times.)

Extension functions accessed through the Java XPath API must implement the javax.xml.xpath.XPathFunction interface. This interface declares a single method, evaluate:

public Object evaluate(List args) throws XPathFunctionException

This method should return one of the five types that the Java language can convert to XPath:

* String
* Double
* Boolean
* NodeList
* Node

For example, Listing 8 shows an extension function that verifies the checksum in an ISBN and returns a Boolean. The basic rule for this checksum is that each of the first nine digits is multiplied by its position (that is, the first digit times one, the second digit times two, and so on). These values are added, and the remainder after the division by eleven is taken. If the remainder is ten, then the last digit is X.
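Before the full extension function, the arithmetic of the rule can be worked through in isolation. The sketch below computes the expected check character from the first nine digits; the class IsbnCheckDigit, the method checkCharacter, and the two sample numbers are chosen for illustration and do not come from the listing that follows.

```java
final class IsbnCheckDigit {

    /** Computes the ISBN-10 check character from the first nine digits. */
    static char checkCharacter(String firstNine) {
        if (firstNine.length() != 9) {
            throw new IllegalArgumentException("Expected nine digits");
        }
        int sum = 0;
        for (int i = 0; i < 9; i++) {
            int digit = Character.digit(firstNine.charAt(i), 10);
            if (digit < 0) {
                throw new IllegalArgumentException("Not a digit: " + firstNine.charAt(i));
            }
            sum += (i + 1) * digit; // position times digit, positions start at 1
        }
        int remainder = sum % 11;
        return remainder == 10 ? 'X' : (char) ('0' + remainder);
    }

    public static void main(String[] args) {
        // 0*1 + 3*2 + 0*3 + 6*4 + 4*5 + 0*6 + 6*7 + 1*8 + 5*9 = 145; 145 % 11 = 2
        System.out.println(checkCharacter("030640615")); // 2
        // 0*1 + 8*2 + 0*3 + 4*4 + 4*5 + 2*6 + 9*7 + 5*8 + 7*9 = 230; 230 % 11 = 10
        System.out.println(checkCharacter("080442957")); // X
    }
}
```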

Listing 8. An XPath extension function for checking ISBNs

import java.util.List;
import javax.xml.xpath.*;
import org.w3c.dom.*;

public class ISBNValidator implements XPathFunction {

    // This class could easily be implemented as a Singleton.

    public Object evaluate(List args) throws XPathFunctionException {

        if (args.size() != 1) {
            throw new XPathFunctionException("Wrong number of arguments to valid-isbn()");
        }

        String isbn;
        Object o = args.get(0);

        // perform conversions
        if (o instanceof String) isbn = (String) args.get(0);
        else if (o instanceof Boolean) isbn = o.toString();
        else if (o instanceof Double) isbn = o.toString();
        else if (o instanceof NodeList) {
            NodeList list = (NodeList) o;
            Node node = list.item(0);
            // getTextContent is available in Java 5 and DOM 3.
            // In Java 1.4 and DOM 2, you'd need to recursively
            // accumulate the content.
            isbn = node.getTextContent();
        }
        else {
            throw new XPathFunctionException("Could not convert argument type");
        }

        char[] data = isbn.toCharArray();
        if (data.length != 10) return Boolean.FALSE;
        int checksum = 0;
        for (int i = 0; i < 9; i++) {
            checksum += (i+1) * (data[i]-'0');
        }
        int checkdigit = checksum % 11;

        if (checkdigit + '0' == data[9] || (data[9] == 'X' && checkdigit == 10)) {
            return Boolean.TRUE;
        }
        return Boolean.FALSE;

    }

}



The next step is to make the extension function available to the Java program. To do this, you install a javax.xml.xpath.XPathFunctionResolver in the XPath object before compiling the expression. The function resolver maps an XPath name and namespace URI for the function to the Java class that implements the function. Listing 9 is a simple function resolver that maps the extension function valid-isbn, qualified by its namespace, to the class in Listing 8. For example, the XPath expression //book[not(pre:valid-isbn(isbn))] finds all the books whose ISBN checksum doesn't match.

Listing 9. A function context that recognizes the valid-isbn extension function

import javax.xml.namespace.QName;
import javax.xml.xpath.*;

public class ISBNFunctionContext implements XPathFunctionResolver {

    private static final QName name
            = new QName("", "valid-isbn");

    public XPathFunction resolveFunction(QName name, int arity) {
        // The parameter shadows the field, so the field is qualified explicitly.
        if (name.equals(ISBNFunctionContext.name) && arity == 1) {
            return new ISBNValidator();
        }
        return null;
    }

}


Because extension functions must be in namespaces, you must use a NamespaceContext when evaluating an expression containing extension functions, even if the document being queried doesn't use namespaces at all. Because XPathFunctionResolver, XPathFunction, and NamespaceContext are interfaces, you can even put them all in the same class, if that's convenient.
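Putting all three interfaces in one class might look like the sketch below, which also shows the resolver and namespace context being installed on the XPath object. Everything named here is an assumption for illustration: the class IsbnExtension, the helper badTitles, the namespace URI http://example.org/isbn, and the sample book data. It relies on the default, non-secure configuration of the JDK's built-in XPath engine, under which extension functions are allowed.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.List;
import javax.xml.namespace.NamespaceContext;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFunction;
import javax.xml.xpath.XPathFunctionException;
import javax.xml.xpath.XPathFunctionResolver;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** One class playing all three roles: namespace context, resolver, function. */
final class IsbnExtension
        implements NamespaceContext, XPathFunctionResolver, XPathFunction {

    // Hypothetical namespace URI for the extension function.
    static final String NS = "http://example.org/isbn";
    static final QName VALID_ISBN = new QName(NS, "valid-isbn");

    public String getNamespaceURI(String prefix) {
        return "pre".equals(prefix) ? NS : "";
    }

    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }

    public Iterator<String> getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }

    public XPathFunction resolveFunction(QName name, int arity) {
        return VALID_ISBN.equals(name) && arity == 1 ? this : null;
    }

    // Raw List keeps the signature compatible with both older and newer JDKs.
    public Object evaluate(List args) throws XPathFunctionException {
        if (args.size() != 1) {
            throw new XPathFunctionException("valid-isbn() takes one argument");
        }
        Object arg = args.get(0);
        String isbn;
        if (arg instanceof NodeList) {
            Node first = ((NodeList) arg).item(0);
            isbn = first == null ? "" : first.getTextContent();
        } else {
            isbn = String.valueOf(arg);
        }
        return isValid(isbn);
    }

    /** Applies the position-weighted checksum rule described earlier. */
    static boolean isValid(String isbn) {
        if (isbn.length() != 10) return false;
        int sum = 0;
        for (int i = 0; i < 9; i++) {
            int digit = Character.digit(isbn.charAt(i), 10);
            if (digit < 0) return false;
            sum += (i + 1) * digit;
        }
        int remainder = sum % 11;
        char expected = remainder == 10 ? 'X' : (char) ('0' + remainder);
        return isbn.charAt(9) == expected;
    }

    /** Returns the titles of books whose isbn child fails the checksum. */
    static String badTitles(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            IsbnExtension extension = new IsbnExtension();
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(extension);      // resolves the pre prefix
            xpath.setXPathFunctionResolver(extension); // resolves pre:valid-isbn
            NodeList titles = (NodeList) xpath.evaluate(
                    "//book[not(pre:valid-isbn(isbn))]/title",
                    doc, XPathConstants.NODESET);
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < titles.getLength(); i++) {
                if (i > 0) out.append(", ");
                out.append(titles.item(i).getTextContent());
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical data: the second ISBN has a wrong check digit.
        System.out.println(badTitles("<inventory>"
                + "<book><title>A</title><isbn>0306406152</isbn></book>"
                + "<book><title>B</title><isbn>0306406153</isbn></book>"
                + "</inventory>"));
    }
}
```

Note that the queried document uses no namespace at all here; only the function name is namespace-qualified.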