Magazines, Books and Articles

Saturday, May 3, 2014

The rise and rise of data

Even a cursory study of the history of the internet, especially after the advent of the World Wide Web, indicates the power of the medium. In terms of users, it grew 588.4% between December 2000 and June 2013, from around 369 million to 2.4 billion, though even by the end of this period only 34.3% of the world's population had access to the internet [source]. Penetration is expected to reach 45% by 2016, bringing the digital world to almost 3.4 billion users.

Businesses and governments have taken advantage of the internet, especially of the WWW, to create applications that many of us can't do without. Shopping for most of your needs, reserving airline, train and bus tickets, planning and managing your holiday tours, most banking activities, paying your bills, buying and renewing insurance, filing your tax returns - all of these can be done over the internet. Streaming music and video keeps us entertained, social networking applications and blogs fulfil our need to share, and Skype and WhatsApp help us connect. Google Drive, OneDrive and Dropbox store our important documents and allow us to share them selectively. Almost every sphere of activity has applications dedicated to it - photography, ornithology, skill enhancement, stocks, funding ideas through to production, you name it - all allowing us to generate and use content.

All of these generate data - huge volumes of it. The infographic in this article gives an indication of how much user-driven data is generated every minute. Cisco estimates that total global IP traffic will reach 120,643 PB per month by 2017, up from 43,570 PB per month in 2012. (What is a petabyte (PB)? Also see here.)
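To get a feel for those units and numbers, here is a back-of-envelope sketch in Python (my own illustration; decimal units, with 1 PB = 10^15 bytes, as network vendors usually count, and the figures are the ones quoted above):

MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15   # decimal storage units, in bytes

print(PB // TB)               # 1000: a petabyte is a thousand terabytes
print(120_643 * PB / 10**18)  # ~120.6 exabytes of IP traffic per month by 2017
print(120_643 / 43_570)       # ~2.8x growth over the 2012 figure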

Not all data is user-generated.

An example is India's 'Aadhaar' initiative, implemented by the Unique Identification Authority of India [UIDAI]. Briefly, it aims to provide a 12-digit unique identification number to every Indian (that's over a billion people), starting August 2009. Its mandate is to provide these numbers to 600 million (60 crore) citizens by 2014. It enrolls a citizen by collecting his/her iris and fingerprint scans and demographic data. Each enrollment pack is 5 MB of data. So far it has generated 1500 PB of data [source].
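As a quick sanity check on the scale (my own back-of-envelope sketch, assuming one 5 MB pack per enrollee; the pack size and the 600 million target are the figures quoted above):

enrollments = 600_000_000                       # mandated target by 2014
pack_size_mb = 5                                # one enrollment pack
total_pb = enrollments * pack_size_mb / 10**9   # 10**9 MB in one (decimal) PB
print(total_pb)                                 # 3.0 - about 3 PB for the final packs alone

The much larger total quoted above would presumably include raw biometric captures, replication and operational data, not just the final packs.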

Every government department generates a humongous amount of data. Analysis of that data creates still more data, usually in the form of reports. Government rules and regulations mandate that businesses generate and publish their Annual Reports in the public domain. Organisations like LexisNexis collate business data from the public domain all over the world and build tools such as their Dossier Suite.

Look around you: everyone is creating data, and very large volumes of it - the United Nations, stock exchanges, financial and credit-rating organisations, researchers and scientists, even machines and processes whose sensors report working parameters at regular intervals.

The question then is: how do we persist and maintain such large volumes of data? And how do we use them? We'll explore these questions in subsequent posts.

Tuesday, April 8, 2014

Commodity hardware

One of the USPs of NoSQL databases is their ability to run on 'commodity hardware/machines/servers/clusters'. So what is commodity hardware?

My desktop PC is powered by a 4th-gen Intel i5 processor, has 16 GB of RAM (initially 8, upgraded later to 16) and a 1 TB HDD. The hardware is reliable and affordable, even though it was built by an assembler. It is easy to upgrade when required - as I did with the RAM. There is no vendor lock-in: my first preference for an upgrade would be the current vendor, but I am not obliged to buy from him. I am also not too concerned about the make of the components - after all, the industry has matured to the point where there is not much to distinguish one make from another.

The same would be the case with a branded desktop you may have bought. You may have some vendor lock-in because of the warranty, but nobody can really stop you from adding memory, storage or a graphics card, or even replacing the motherboard and CPU.

This, then, is essentially a commodity machine:
Affordable, reliable and upgradeable
No (or limited) vendor lock-in

A commodity cluster is made up of commodity machines working in parallel to increase computing power. Serious supercomputers have been built this way, and the nitty-gritty of building and maintaining clusters is well understood. In the NoSQL context, clusters increase performance and provide scalability: add machines to the cluster when demand increases, perhaps remove a machine when demand falls, and do both without disrupting services. The sketch below shows one way this works.
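Here is a minimal Python sketch of consistent hashing, one common technique behind such elastic clusters (my own illustration, not any particular product's implementation; the Ring class, node names and vnode count are all made up):

import bisect
import hashlib

def _hash(key):
    # Map any string to a position on a fixed hash ring
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes=(), vnodes=100):
        self._keys = []       # sorted ring positions
        self._owners = {}     # ring position -> node name
        self.vnodes = vnodes  # virtual nodes smooth the load distribution
        for n in nodes:
            self.add(n)

    def add(self, node):
        for i in range(self.vnodes):
            h = _hash(f"{node}#{i}")
            bisect.insort(self._keys, h)
            self._owners[h] = node

    def remove(self, node):
        for i in range(self.vnodes):
            h = _hash(f"{node}#{i}")
            self._keys.remove(h)
            del self._owners[h]

    def node_for(self, key):
        # A key belongs to the first node clockwise from its hash
        h = _hash(key)
        idx = bisect.bisect(self._keys, h) % len(self._keys)
        return self._owners[self._keys[idx]]

ring = Ring(["node1", "node2", "node3"])
before = {k: ring.node_for(k) for k in map(str, range(10000))}
ring.add("node4")   # scale out: one more commodity box joins
after = {k: ring.node_for(k) for k in before}
moved = sum(before[k] != after[k] for k in before)
print(f"{moved / len(before):.0%} of keys moved")

When the fourth node joins, only roughly a quarter of the keys change owner; the rest stay where they are, which is why such a cluster can grow or shrink without taking services down.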

A start-up from Bangalore has built a data center using commodity hardware. A case study worth reading.

Wednesday, April 2, 2014

Creating complex business solutions

Although I haven't posted on this blog for two to two-and-a-half years, it has averaged 100 page views a month. I take this to mean the content has been helpful to some. Inspiration enough to start again.
Technology has made giant strides in these few years, making it possible to create very complex business solutions. Two areas stand out:
a. how we maintain large volumes of structured and unstructured data, and query and analyse them
b. how we build and deploy applications in terms of performance, scalability and mobility.
This is a time when developers are under extreme pressure to learn and understand a lot of new technologies (in addition to the traditional ones) in a very short time - and deliver. .NET developers, IMHO, are at a greater disadvantage, because the solutions lie mostly in the Java/open-source space and require more effort on their part.
In subsequent posts we will discuss the topics below; they address some of the areas related to 'a' above.
1. NoSQL databases: the need, different types, performance comparisons, applications
2. Search: the need, different approaches, performance comparisons, applications

Saturday, October 8, 2011

IE's legacy browsers - will they ever go away?

“W3C publishes documents that define Web technologies. These documents follow a process designed to promote consensus, fairness, public accountability, and quality. At the end of this process, W3C publishes Recommendations, which are considered Web standards.” from Standards FAQ.
The recommendations are not binding, but as the W3C says: “W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.”
And they would make those of us developing web applications more productive.
Microsoft, in spite of its large representation in the W3C Working Groups, has been a laggard at proactively adopting these recommendations in its browsers. It has thus gifted us a bunch of what it now calls its 'legacy browsers' - IE6, IE7 and IE8 - which, whether we like it or not, still have to be supported if our web applications and sites are to reach a wider audience.
A case in point is the CSS Color Module Level 3, one of the modules of CSS3, which was endorsed as a W3C Recommendation on 7 June 2011. The first W3C Working Draft was published on 22 June 1999; it became a W3C Candidate Recommendation on 14 May 2003, went back to a W3C Working Draft on 21 July 2008, and became a W3C Proposed Recommendation on 28 October 2010.
Tantek Çelik [Microsoft’s representative till 2003] and Brad Pettit [Microsoft’s representative] have been associated with this recommendation right from the first working draft to its endorsement as a recommendation.
In this period, IE went from IE6.0 to IE9.0, Firefox to version 5.0 and Chrome to 12.0. The table below indicates how Firefox and Chrome were early adopters of CSS3, whereas Microsoft waited until IE9.
Table 1: CSS3 Color Module Level 3 support across IE, Firefox and Chrome versions
How does Microsoft’s apathy affect us in development? Several ways:
1. IE is the most used web browser.
Figure 1: Web browser usage by country in September 2011 [Source: http://en.wikipedia.org/wiki/Usage_share_of_web_browsers]
2. StatCounter estimates the usage share of web browser for September 2011 as below.
Figure 2 [Source of data: StatCounter]
3. The usage share of various versions of IE for September 2011 is shown below (source: StatCounter)
Figure 3 [Source of data: StatCounter]
Figure 3 shows that the combined usage share of Microsoft's legacy browsers is 32.72%. That is a large population, and it cannot be ignored.
So why do we have such a large population using IE's legacy browsers? Two reasons, in my opinion.
One: Microsoft bundles IE with its OS. So IE is kind of available out of the box. The fact remains that the majority of Internet users are not techies or geeks like you or me. The browser is just another tool to get their work done. And if it is available out of the box, they’ll use it. They don't care if the IE version they are using is W3C compliant or otherwise, as long as it can get their work done.
IE is also tightly integrated with the OS. So while you can install Firefox 7 on Windows XP or Windows 7, IE9 runs only on Windows Vista SP2 and later (though the legacy browsers run quite happily on Windows XP). The high usage of IE8 is because it was released for Windows XP, Windows Server 2003 and Windows Vista, and is the default browser on Windows 7 and Windows Server 2008 R2.
Windows XP and Windows 7 more or less share the honours in StatCounter's data for September 2011, which delays the adoption of IE9 as the primary browser.
Figure 4 [Source of data: StatCounter]
Two: Figures 1, 2 and 3 indicate internet usage. In my experience, the usage of IE's legacy browsers is even greater on corporate, business and government intranets - which follows from the browser being bundled with the OS. These organisations have large user bases, and the cost and time of upgrading the OS or the browser are huge.
So, IE's legacy browsers aren't going away soon, and we will continue to fret for a while longer about the pain they give us. And because we create applications for the masses, we need to write code that supports both these and the modern browsers - something like the 'best practice' CSS below, even though it will never validate.
.fiftyPercentOpaque   /* renamed: a CSS class selector cannot begin with a digit */
{
    opacity: 0.5;              /* W3C CSS Color Module Level 3 compliant browsers */
    filter: alpha(opacity=50); /* IE legacy browsers; proprietary, hence the failed validation */
}

Compliant browsers ignore the proprietary filter property and the legacy browsers ignore opacity, so each family picks up only the declaration it understands.

Tuesday, September 20, 2011

Why I did not buy a Samsung tab...or an iPad

Last Saturday, I walked into an electronics store to purchase a tablet - a 'tab'. My preference was for the Samsung Galaxy Tab 10.1; in the store I also came across the iPad 2. Also on display in this section was an array of laptops.
Why do I need a tab? My usual practice is to surf the net after dinner, possibly read an e-book, catch up with people on Facebook or LinkedIn. I do it on a laptop connected to the net through Wi-Fi, sitting at a table in the living room. And I doze off most of the time. It would be so much more comfortable if I could do all of these propped up in bed, and doze off without a care.
The Galaxy Tab 10.1 on display was the 3G + Wi-Fi 16 GB version, priced a little over Rs 33,000. Its cover, a necessary accessory, was an extra Rs 3,500 or so, which meant the tab would cost me Rs 36-37K. The 3G + Wi-Fi 16 GB version of the iPad 2, with cover, was Rs 37,000; the one with 32 GB was around Rs 41,000.
At this point I began to wonder whether, at this price, it made any sense to buy the device. For Rs 40-45K one could buy a much, much more powerful laptop with oodles more hard drive space, and do more on it than is currently possible on a tab. [Don't believe me? Check out the Dell Vostro 3750, which sells for about Rs 41,000 all inclusive.] And of course, any worthwhile laptop can connect to the net over Wi-Fi, through a wireless device from your favourite mobile service provider, or tethered to a 3G phone. So what would I miss? The touch screen, for one; the light weight; and certainly the convenience of surfing or reading propped up in bed.
Didn’t I know all this before I visited the store? I guess I did, but the heart went like “Let’s get one, let’s get one”; and it was only after we actually fiddled around with the tab that the head asserted “Come on. This gizmo is not worth 37K. A 40K laptop is more value for money any day. Get a hold of yourself. You want to spend 37K so you can surf lying down, and push things around the screen with your finger? Shame on you!” The heart didn’t have an answer.
So I am back at my laptop. The heart is disappointed. And so is the head. This innovation did not match up to expectations; or perhaps it wasn’t created for me.

Sunday, September 11, 2011

Weird Architecture

Some time back, the technical consultant of a customer accused me of having created a 'weird architecture' for her project - and this after we had completed an iteration that met the agreed goals, work the customer had commended as 'brilliant'.
This was a Web 2.0 application, built on the ASP.NET framework, written in C# and targeted the .NET Framework 3.5, with a SQL Server 2005 database, and hosted on IIS 6.0.
So what was ‘weird’?
The project consisted of several web applications and was deployed on IIS in the manner shown in Figure 1: a web site was created, and the web applications were deployed as virtual directories under that web site.
Figure 1
A separate Application Pool was created for this web site, as indicated in Figure 2.
Figure 2
In this arrangement, Application1 could be accessed at the URL http://www.myapplication.com/Application1, and so on.
In the past few years, I have come across several applications which have been or will be deployed in this manner. What was/is the motivation for deploying the applications in this manner?
We analyse this through a couple of case studies. Read the analysis here.

Friday, July 15, 2011

Great People, Great Teams

Bill Taylor’s post Great People Are Overrated on the HBR Blog Network has generated a lot of heat. The post was triggered by a statement Facebook CEO Mark Zuckerberg made in an interview: “Someone who is exceptional in their role is not just a little better than someone who is pretty good,” he said. “They are 100 times better.” It emphasised what Marc Andreessen had told him: “Five great programmers can completely outperform 1,000 mediocre programmers.”
Bill Taylor does not agree with them. He goes on to make a case that a team of average performing individuals is better than high performing superstars, and concludes: “Most of business life isn't really a choice between one great person and 100 pretty good people, but if that is the choice, I'm not sure I'd make the same choice as Mark Zuckerberg — especially if those 100 pretty good people work great as a team.”
The last I saw, his post had received 272 comments, mostly from software practitioners, and the majority pilloried him.