Let's start this article the way a lot of articles started this month: Apple introduced the new iPhone 4S. The main features of this phone are a faster processor, a higher-resolution camera and a speech-recognition assistant called Siri. Siri is available in English, German and French and acts as an intelligent assistant for daily life. It can not only transcribe text for SMS but also answer questions like "How will the weather be tomorrow?" or "How do I get from A to B?". For this, Siri needs a constant internet connection: the recognition happens on a server, and the answers are collected from various databases such as Wolfram Alpha. The service works best in English, because the databases do not always contain information for other languages yet, but they will be filled over time, and further languages will be added in the near future.
So far it sounds very promising. Instead of searching for information across various websites, I can simply ask a question and, with a bit of luck, get the correct answer, possibly even spoken. So far so good, as long as I communicate with my phone in English. With German or French, life gets a bit harder because not all information is available. The real question is which languages will be supported in the future. Probably the top 10, maybe the top 25. But as we have just seen, it is not enough for Siri to support a language; the underlying databases must also contain information in that language, which is certainly not a simple task. Now think a few years ahead: what will happen when the technology matures and devices come to market whose only input is the owner's voice? Will these devices support all languages, or will they concentrate on the couple of markets where the most money can be earned? Is it too much effort to teach a device a language like Estonian, with about 800,000 native speakers?
Let's take a look at Wikipedia. More than 100,000 articles are available in 39 languages (2.5 million articles in English, more than 1 million each in German and French), more than 10,000 articles in 64 languages, more than 1,000 articles in 109 languages, and more than 100 articles in 95 languages. That's impressive, but ethnologists estimate that about 6,500 languages are spoken today, and about two thirds of them are endangered.
Look at what happened with OCR software. ABBYY, the company with one of the best products on the market, supports 189 languages, 45 of them with dictionary support. I could not find statistics on how many languages have a written form, but it is clearly far more than 189. Documents written in the unsupported languages simply cannot be OCRed, which is annoying at the very least. The situation with Wikipedia is better than with OCR software, probably because it is quite simple to start entering articles in a new language; no technical knowledge is required (the operating system has to support the characters used by the language, but with Unicode support, which covers more than 65,000 symbols, this is less of an issue).
Speech recognition requires much more linguistic and technical knowledge than OCR, and feeding databases with the information required for a meaningful, useful conversation with a machine is even harder. It is therefore quite certain that only a few languages will be supported with the full range of information. What does that mean for people with a different mother tongue? If they want to use the new devices, they will have to use a language other than their own. So, dear Estonian and dear Swiss German speakers, be prepared to learn English, German or Russian if you want one of these shiny new gadgets.
Tuesday, October 11, 2011
Monday, August 15, 2011
What are the next steps for augmented reality?
Now that Web 2.0 and the mobile internet are widely accepted, the question is what will come next. I think the answer is augmented reality.
Currently, augmented reality is experienced through a mobile device with a display, GPS and a camera: the user points the camera at some point of interest, and the software overlays the camera picture on the display with information from Wikipedia or other sources.
Now imagine a device that looks like a pair of glasses. The lenses are transparent, and next to them sits a small camera. The glasses have a wired (or Bluetooth) connection to a mobile device, which connects to the internet over LTE. The user looks through the glasses, and information about what he sees is transparently overlaid. This is how our real world gets covered by the digital world. Several questions arise.
What is the technology behind the glasses? Transparent LED screens have been in development for at least ten years; I saw some at CeBIT when I was a student roughly ten years ago, and some high-end BMW cars already project information for the driver onto the windscreen. So it should be possible to pack this technology into glasses that are not too heavy. The remaining electronics can be integrated into a separate device very similar to today's mobile phones.
What does the input look like? Simple actions should be doable by closing and opening the eyelids and by moving the eyes in different directions, which means a second camera is needed to watch the user's eyes. More complex actions, like dialing a number, should be possible by pointing with the fingers at a virtual keyboard that appears in the glasses; Microsoft uses similar technology in its Kinect products. It is probably neither possible nor desirable to type longer texts this way, but as with every other input device it will not replace the existing ones, so touchscreens or even small keyboards will still be used.
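Just to make the eye-based input idea a bit more concrete, here is a minimal sketch of dwell-based selection: an on-screen item counts as "clicked" once the estimated gaze rests on it long enough. Everything in it (the `GazeSample` shape, the 800 ms threshold) is an assumption for illustration, not a description of any existing product.

```typescript
// Hypothetical sketch of dwell-based selection from an eye-tracking camera.
interface GazeSample { itemId: string | null; timestampMs: number } // which UI item the gaze falls on

const DWELL_MS = 800; // assumed dwell time needed to "click" an item

function detectSelection(samples: GazeSample[]): string | null {
  let currentItem: string | null = null;
  let dwellStart = 0;
  for (const s of samples) {
    if (s.itemId !== currentItem) {        // gaze moved to a different item: restart the dwell timer
      currentItem = s.itemId;
      dwellStart = s.timestampMs;
    } else if (currentItem && s.timestampMs - dwellStart >= DWELL_MS) {
      return currentItem;                  // gaze rested long enough: select it
    }
  }
  return null; // no selection in this window of samples
}
```

A real system would add filtering against blinks and gaze jitter, but the basic idea of trading dwell time against accidental selections stays the same.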
The resolution of GPS alone is not good enough; even in combination with GLONASS or Galileo the accuracy is at best in the range of centimeters, and position alone cannot tell what the user is actually looking at. The user may also want information about non-stationary objects, for which the position has no value at all. So pattern recognition is essential: GPS can give a rough estimate of the user's current surroundings, and imagery from services such as Google Street View can then be used to match the exact point the user is looking at. One open question is how much preprocessing of the image happens on the device and how much is offloaded to a server; this determines the required speed of the internet connection and the processing power of the mobile device.
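As a rough illustration of how coarse position plus compass heading could already narrow down the point of interest before any image matching is done, here is a small sketch that picks the best candidate from a local POI list. The field-of-view limit and the angle/distance weighting are made-up assumptions.

```typescript
// Hypothetical sketch: guess the point of interest from GPS position and compass heading.
interface Poi { name: string; lat: number; lon: number }

// Approximate bearing and distance on a small local patch (flat-earth approximation).
function bearingDeg(fromLat: number, fromLon: number, to: Poi): number {
  const dLat = to.lat - fromLat;
  const dLon = (to.lon - fromLon) * Math.cos((fromLat * Math.PI) / 180);
  return ((Math.atan2(dLon, dLat) * 180) / Math.PI + 360) % 360;
}
function distanceM(fromLat: number, fromLon: number, to: Poi): number {
  const dLat = (to.lat - fromLat) * 111_320; // meters per degree of latitude (approx.)
  const dLon = (to.lon - fromLon) * 111_320 * Math.cos((fromLat * Math.PI) / 180);
  return Math.hypot(dLat, dLon);
}

// Pick the POI whose bearing best matches the user's heading, preferring nearby ones.
function guessTarget(lat: number, lon: number, headingDeg: number, pois: Poi[]): Poi | null {
  let best: Poi | null = null;
  let bestScore = Infinity;
  for (const poi of pois) {
    const diff = Math.abs(((bearingDeg(lat, lon, poi) - headingDeg + 540) % 360) - 180);
    if (diff > 30) continue;                            // outside the assumed field of view
    const score = diff + distanceM(lat, lon, poi) / 50; // made-up weighting of angle vs. distance
    if (score < bestScore) { bestScore = score; best = poi; }
  }
  return best;
}
```

The image-matching step would then only have to decide between the few candidates this kind of pre-selection leaves over, which is exactly where the on-device vs. server trade-off comes in.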
But what kind of information can a user get? It starts with information about buildings: their current value, when they were built, who the architect was, how they looked in the past; continues with public signs and arrival times of public transport; and extends to botanical information about trees and flowers, zoological names of animals, car models and so on. So expect huge databases, filled by volunteers just like OpenStreetMap. Of course there will also be navigation software. Now to the more sensitive or commercially interesting information. In front of a store you see the current offers; in front of a restaurant you see the lunch menu, opening times and ratings from other users. Many people will tag their homes, their windows, their cars, just to earn a badge from Foursquare or whatever such a service will be called then. And now to the most sensitive information of all. If you know everything about the buildings, the animals, the trees and the cars, the only blind spots that remain are the people you meet on your way. Do you really think it will stay that way? It will not take long before the software can automatically recognize faces and show the profiles of the people around you. I don't think this is avoidable, even if a person objects: it is hardly possible to control all the photos on the internet, many of them are tagged, so the databases contain enough information to calibrate recognition for almost every face. The recognition will probably never be 100% correct, and there will be countermeasures such as big sunglasses, heavy make-up and all the other tricks of celebrities, but the success rate will be pretty high.
Currently it is a horror for most people to imagine being recognized on the street by complete strangers, but I expect that to change. People will have to live with it, so they will adapt their behaviour and moral norms will shift. There are only very few internet trends that society has not accepted, such as sharing of child pornography, and that was illegal before the internet as well; other forbidden trends, like unlicensed copying of software and media sharing, were declared illegal by the industry. Face recognition, and augmented reality in general, will be a multibillion-dollar market, so I cannot imagine any industry seeing its business disturbed by it. There will always be enough people who have nothing against being recognized in public, so the initial resistance won't last long.
So what is needed for this vision to become real? The glasses and the mobile device are probably the main technological problems, but they should be solvable within the next five years, maybe sooner. LTE should provide enough bandwidth for transferring the pattern-recognition data and the information about the recognized objects. Massive databases are required to provide information about every object, along with an army of volunteers to feed them. A new operating system with new input and output capabilities for the glasses and an effective system for creating the data in those databases are also necessary. But none of these issues looks unsolvable, so I expect this to happen within the next couple of years.
Wednesday, June 29, 2011
Call for content creation OSes
This article is about me and people like me, so let me introduce myself: I'm an engineer at an EDA company, where I design various chips. A couple of years ago I had a Unix workstation at work; now I have a Windows laptop connected to a powerful server grid (call it a cloud if you want) over a Citrix connection, for the reasons described here. In my office people call me progressive because I use the Java Desktop System for my work instead of CDE like most of my colleagues. For us engineers the computer is a tool to get the job done: it must be fast, stable and able to run software that can cost up to US $1,000,000 per seat per year.
At home I have been a Mac guy since Mac OS X 10.1. There was a time when I really wished that EDA software would eventually support Mac OS X; that was the time when Maya was ported to OS X, when AutoCAD for Mac appeared and when other engineering tools were being ported as well. Apple positioned Mac OS X as a UNIX OS with a supported X Window System port, an open-source Darwin core, a FreeBSD userland, OpenGL, a first-class Java VM and so on. Now, at the latest with the introduction of Lion, there is absolutely no point in porting professional software to Mac OS X. From a content creation OS it is turning into a content consumption OS. As I wrote a year ago, there should be a differentiation between content creation OSes and content consumption OSes. A content consumption OS should run on a variety of fast-booting, network-centric, often mobile devices; it must offer easy access to different media, free or commercial, provide unified messaging, interface with different social networks, sync with other devices and be dead simple to use. Software that runs on content consumption OSes should be flexible enough to run on a variety of devices with different screen sizes and input methods. These devices are often connected to a cloud where media and personal data are stored. One of the main applications is a powerful browser supporting the latest standards like HTML5 and WebGL. Applications and media are provided through a store.
With the introduction of Lion and Windows 8, Apple and Microsoft are heading exactly in the direction of a consumer OS. It's an understandable business decision: perhaps 90% of users consume media and only 10% create it, and application stores, media stores and cloud offerings bring revenue after the sale of the OS. On top of that it's fashionable: pads and smartphones are selling like crazy, social networking and digital media distribution are all the rage, and a lot of office applications are moving to the web, so some office workers no longer need a powerful PC and are happy with devices running consumer OSes.
But what about content creators? The people who used to have workstations: MCAD users, DTP professionals, 3D content creators, architects, geologists, biologists and other scientists? Do they really need an application store? Their software has a completely different sales model than an application store. How do they update their Macs at work to Lion? Will every employee need an Apple ID and put the $29 Lion update on private expenses? Have you read what Apple says about the new UNIX features in Lion? How will professional programs benefit from iCloud or from Auto Save (saving a huge database might take several seconds and block the user from working)? We don't know much about Windows 8 yet, but certainly none of the professional CAD applications will use JavaScript and HTML5, Silverlight or XNA to build their UI. All these techniques make sense for porting applications to pads and smartphones more easily, but does anyone need Catia on a pad?
So after the differentiation between server and desktop OSes, it is now time for a differentiation between content creation and content consumption OSes. The aims of these OSes and their user groups are too different for one OS to fit all needs. The Linux world shows how it should be: while Ubuntu targets consumers, Red Hat's enterprise distributions head toward the professional user. The same should happen with Windows and Mac OS. I don't see many chances for Mac OS X, since Apple stopped caring about professional users a couple of years ago, but Microsoft really should consider establishing a separate Windows for Professionals line, one that is not intended to merge with pads and smartphones but remains a powerful, stable OS for power users, without a lot of experiments on the UI and programming models.
Project Natal and its impact on the future of gaming
At the current gaming trade show E3 in Los Angeles, Microsoft presented Project Natal, which allows completely new forms of interaction with a computer or console. In this article I would like to analyze which technologies are combined in this project and what I would like to see next from the gaming industry.
First, let's take a look at the hardware used for the project. Besides the Xbox 360 console, the most interesting part is a camera that can capture the third dimension of the person or object in front of it. The technology reminds me of the ZCam developed by the Israel-based start-up 3DV. Microsoft bought this company a few months ago; Microsoft denies that the technology for its camera comes from 3DV and says the company was bought for its patents, but the technology must be quite similar anyway. The ZCam from 3DV uses infrared light to measure the distance to the object, so it works like a kind of radar with a resolution of 1-2 cm, which should be sufficient for most needs. The camera also has a microphone, which is needed for the voice-recognition capabilities.
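For a feeling of why infrared ranging can reach centimetre resolution, here is a tiny sketch of the underlying time-of-flight arithmetic: the camera measures how long the emitted light needs to come back and halves the round trip. The numbers are purely illustrative, not specifications of the ZCam or of Microsoft's camera.

```typescript
// Illustrative time-of-flight calculation (not the actual ZCam implementation).
const SPEED_OF_LIGHT_M_PER_S = 299_792_458;

function depthFromRoundTripSeconds(roundTrip: number): number {
  return (SPEED_OF_LIGHT_M_PER_S * roundTrip) / 2; // light travels to the object and back
}

// A 1 cm depth difference corresponds to a round-trip difference of only ~67 picoseconds,
// which is why such cameras need very fast sensors or modulated-light phase measurement.
console.log(depthFromRoundTripSeconds(1e-9).toFixed(3)); // ~0.150 m for a 1 ns round trip
```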
But the most interesting part is the software. Microsoft Research has integrated so many goodies that complement each other that it is hard to separate them and look at each one independently.
Let's start with face recognition. The software seems to recognize not only the user of the console but also his facial expressions, and makes assumptions about his current mood. This requires very advanced pattern recognition. The number of users of a home console is probably not that large, so identifying the user should be fairly easy, but reading a person's feelings from the expression on his face isn't that simple. It would be interesting to know whether the software needs any calibration in advance. As can be seen in the video, facial expressions are used for gameplay (firing balls out of a monster's mouth), and advanced AI can adapt the game, so dialogues can become more personal, music can be played, or films from the online video store can be suggested based on the user's mood.
Scanning objects that can then be used in the game is another great improvement in gameplay. In the demonstration of a new game that uses the capabilities of Natal, Peter Molyneux showed how the camera scanned a drawing on paper, which was then used to continue the story. Again, very advanced pattern recognition is needed for this. I think the first games will use this feature just to capture the pattern without recognizing it, so that things like skateboards or clothes can be customized. Of course this technique could also deliver something Second Life promised but never delivered: the possibility to become one with your avatar, so that it becomes a true virtual mapping of yourself. This is actually a dream for fashion companies, because the customer could try on all the clothes (even combined with clothes he already owns) before ordering them. The resolution is probably still not high enough for tailoring clothes that fit exactly, but it will improve over time, so at some point it will become possible to send orders for individually manufactured clothes.
Voice recognition is a game-control element that hasn't been used much, because controlling a game with words is slow and the variety of expressions is so large that the AI must be very advanced to handle it. But there is one type of game perfectly suited to voice recognition: quiz shows. The computer only has to recognize the right answer, which is much less complicated than handling free speech. The demonstration of Milo, however, showed that Milo seemed to understand what the person was saying. The answers reminded me of ELIZA, but recognizing free speech and converting it into a format an ELIZA-like system can use would be a big achievement, if it really works as promised.
Recognition of the player's movements. Sony's EyeToy could detect movement, but without measuring the third dimension the recognition was quite inaccurate. Now Microsoft promises that the whole body will be tracked with much higher resolution and without any controller. Peter Molyneux is absolutely right when he says that controllers with more and more buttons prevent natural interaction with the console, and Nintendo's success with the Wii only proves it. Now not even a Nunchuk is required, which should grow the community of console players even further because the entry barrier is so low: just stand in front of the TV and start playing. One often-criticized point is that driving a car with an imaginary wheel is unrealistic, but a player can use any object as a wheel if he wants something in his hands.
So what is still missing for a perfect gaming experience? The input is excellent, but the output still lacks some important features for fully diving into the virtual environment. Visual output has improved a lot with the introduction of HDTV, but the format of the screen is not optimal. Of course a VR cave would be the perfect solution, but it will remain expensive and consumes too much space, so it is impractical. All VR helmets have failed so far, and they do not allow social interaction with other people in the room. So a solution would be a screen the height of a human body: a fight is much more realistic if the opponent is roughly the same size as the player, and the same goes for sports competitions. For games that need a wide viewing angle, like flight simulators, the screen should be rotatable.
3D cinemas are experiencing a revival thanks to the new digital systems, so this technology should become affordable for home users as well. Either the user wears shutter glasses, or special monitors can show 3D even without them. A console must then generate twice as many frames, but I don't think that is a big problem.
The biggest issue is force feedback. A boxing match without physical contact with the opponent is not realistic. The gaming industry offers vibrating controllers and special seats, but these solutions do not work if there are no controllers and the gamer is moving around the room. Fresh ideas are needed here.
In conclusion, Project Natal is revolutionary for the gaming industry: it combines several very advanced technologies into a solution that makes a lot of sense and is very intuitive for the consumer. It will be interesting to see what kinds of games use these technologies and how much effort it takes to create them. Microsoft has promised to deliver the camera in 2010, so it remains to be seen whether all the promises can be fulfilled. And there are still plenty of open wishes that could make gaming even more realistic.
SoC
SoC (System on a Chip) is the name for a class of microelectronic designs that consist of several parts which used to sit on separate chips in older technologies but are now integrated on one silicon die. A SoC is the heart of basically every modern electronic device: TV sets, set-top boxes, smartphones, tablets and others. Integrating several chips into one is nothing new; older readers will remember how Intel integrated the mathematical coprocessor into the main processor and called the resulting family i486. But nowadays competitive designs come with new requirements that make designing a SoC a real challenge. The advantages of a SoC compared to several separate chips are higher communication speed between the parts, less board space, lower energy consumption and hence less heat to dissipate. Less wiring is required on the PCB, and fewer components mean lower manufacturing costs for the system.
A modern SoC consists of one or several microprocessor cores (most likely of the ARM architecture); several interfaces such as a DDR memory controller, USB 2.0, HDMI, I²C, or CAN for car entertainment systems; a hardware media decoder; a 3D graphics accelerator; analog-to-digital converters; and even physical sensors for acceleration and the like. A single company can hardly develop all these parts alone, so it needs to buy IP from other companies. There are hard and soft IP macros: soft IP means just an RTL (e.g. Verilog) description of a block (think of an ARM core, which can still be optimized for timing), while hard IP means a complete layout for a specific technology (think of a fully laid-out USB controller). Since all parts sit on the same die and are manufactured at once, all IPs must be available in the same technology of a given foundry such as TSMC. The many blocks create additional difficulties for the developers of the analog parts of the chip, because analog parts are much more sensitive to manufacturing variation, and irregular analog structures cause more problems for lithography than the regular digital structures in standard cells. There are solutions such as SiP (System in Package) or 3D chips, where the analog dies are stacked vertically or placed next to the digital die, but this means higher cost, since essentially two chips must be manufactured and connected in one tight package. Another problem with mixing analog and digital parts is that they can disturb each other by injecting noise into the substrate, and by dissipating more heat and drawing more power in ever smaller areas. If the power grid inside the design is not calculated carefully, one part can draw too much current and leave other parts underpowered.
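To make the last point a bit more concrete, here is a minimal sketch of the kind of early sanity check a designer might do: add up the assumed peak currents of the blocks on a shared supply rail and compare the sum with what the rail can deliver. The block names and figures are invented; real power-grid (IR-drop) analysis is far more detailed and is done with dedicated EDA tools.

```typescript
// Illustrative power-budget check for blocks sharing one supply rail (invented numbers).
interface Block { name: string; peakCurrentA: number }

const railVoltageV = 1.0;    // assumed core supply voltage
const railLimitA   = 2.5;    // assumed maximum current the regulator/grid can deliver

const blocks: Block[] = [
  { name: "cpu cluster",   peakCurrentA: 1.1 },
  { name: "gpu",           peakCurrentA: 0.8 },
  { name: "media decoder", peakCurrentA: 0.4 },
  { name: "modem",         peakCurrentA: 0.5 },
];

const totalA = blocks.reduce((sum, b) => sum + b.peakCurrentA, 0);
console.log(`Total peak draw: ${totalA.toFixed(2)} A (${(totalA * railVoltageV).toFixed(2)} W)`);
if (totalA > railLimitA) {
  // If every block hits its peak at once, some blocks would be starved of current.
  console.warn(`Rail overloaded by ${(totalA - railLimitA).toFixed(2)} A: re-budget or add power domains`);
}
```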
But the real challenges for a SoC designer arise from new requirements, which follow from the fact that a modern SoC has to be sold several million times in order to become profitable:
1. New interfaces - A SoC must be able to handle input from multi-touch displays, GPS satellites, Hall sensors (magnetic compass), accelerometers, light sensors, high-resolution cameras, several wireless standards, and other input devices. It must drive high-resolution displays (possibly 3D), output hi-fi-quality audio, and even give physical feedback using actuators.
2. New connectivity - A modern TV connects to the internet wirelessly and also needs video input from several devices such as a Blu-ray player, a set-top box, or a game console. It has FireWire and USB interfaces for external hard disks and several slots for memory cards. The SoC must handle connections to all these device types.
3. New programmability - Since the iPhone and its very successful App Store concept, everybody has been talking about the app economy, i.e. generating revenue after the sale of the product. Every smartphone line has its own app store, in the foreseeable future TV and set-top box producers will have their own, and app stores from car manufacturers for their entertainment systems are expected as well. What does this mean for a SoC? A freely programmable SoC must be tested more carefully, because it is not known in advance which software will run on the system. Moreover, since the introduction of Windows for ARM, several operating systems must be able to run on a SoC and support all its interfaces.
4. Low power - Mobile systems need to run as long as possible on a single battery charge, and even stationary devices should not consume too much power, for environmental reasons. That means parts of the device which are not needed at the moment must be switched off, then wake up as soon as they are required and resume communication with the active parts.
Combining these requirements with advanced technology nodes, short time-to-market windows, and the pressure to sell several million units, it becomes clear that new approaches are needed instead of the old way of writing the software and developing the hardware independently and bringing both together after the design has been manufactured. The amount of verification and testing across different configurations is basically exploding. To fulfill the requirements listed above, software and hardware development must be tightly coupled: the software must be able to run on a model of the hardware design that resembles the functionality of the real hardware as closely as possible. Here comes the necessary trade-off: it is possible to simulate the behavior of a single transistor including parasitic effects, but simulating several million transistors at that level is simply not feasible. So a higher abstraction level is necessary, one that is still accurate enough that the results do not differ too much from the produced silicon. Since a SoC consists of several IPs, one requirement for modern IP is to come with different models that can be used in simulation and verification of the whole system. If simulation is still too complex to be handled by software running on regular workstations, there are special hardware solutions based on FPGAs or dedicated processor arrays onto which the model of the system can be loaded and emulated.
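As a toy illustration of what "running software against a model of the hardware" means at a high abstraction level, here is a sketch of a register-level model of a made-up timer peripheral together with driver code that exercises it. The peripheral, its register map and the driver are entirely hypothetical; real flows use standards such as SystemC/TLM and vendor tools rather than hand-written models like this.

```typescript
// Hypothetical register-level model of a tiny timer peripheral, plus driver code that uses it.
class TimerModel {
  private regs = new Map<number, number>([[0x00, 0], [0x04, 0]]); // 0x00: CTRL, 0x04: COUNT
  write(addr: number, value: number): void { this.regs.set(addr, value >>> 0); }
  read(addr: number): number { return this.regs.get(addr) ?? 0; }
  tick(): void {                                   // advance the model by one "clock"
    if (this.read(0x00) & 0x1) {                   // CTRL bit 0 = enable
      this.write(0x04, this.read(0x04) + 1);
    }
  }
}

// "Driver" code written against the register map; the same logic could later target real hardware.
function driverEnableAndWait(timer: TimerModel, ticks: number): number {
  timer.write(0x00, 0x1);                          // enable the timer
  for (let i = 0; i < ticks; i++) timer.tick();    // in a real co-simulation the clock comes from the model
  return timer.read(0x04);                         // read back the count
}

console.log(driverEnableAndWait(new TimerModel(), 5)); // 5
```

The point is that the driver only sees addresses and values, so the model behind them can be swapped for a more accurate one (or for silicon) without changing the software.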
All the big EDA companies have started preparing for these new requirements. Synopsys bought two verification companies and is the biggest IP provider. Cadence started the EDA360 initiative, under which it develops IP with simulation models and builds partnerships with other IP companies like ARM. Mentor is becoming active in the software business: it bought several Linux-oriented companies, and the promise here is tightly coupled software and hardware. Cadence and Mentor are also partnering to define standards for SoC verification, and both have powerful hardware-based emulation solutions.
Due to the rising complexity and high manufacturing costs of designs in advanced technologies, the main focus of the chip industry is no longer developing individual blocks but integrating many parts into one design. Only a verified design with optimized drivers, low power consumption and broad functionality is competitive on today's market.
The insane world of programming for mobile devices
Yet another IT revolution is happening right now: smartphones are becoming more and more popular, and I guess nobody could have predicted the popularity of apps (well, except Steve Jobs maybe). One measure of how popular a platform is, is the number of available apps; another is how easy it is to create apps for it. Just three years back, the mobile world offered the following platforms:
PalmOS: Palm transformed its handhelds into smartphones, but all applications written for PalmOS could still be used.
Windows CE: There were several versions of Microsoft's mobile OS; apps for different versions were not always compatible, but could be adapted.
Symbian: Coming from Psion devices, the platform was highly optimised for mobile usage, though not easy to code for.
JavaME: A stripped-down Java version included on a lot of feature phones. It was good for games and simple apps where the UI could be completely customised, but a horror to test on different devices and to get the code certified. The expectation was that once devices became more powerful, JavaSE and JavaME could be merged at some point; JavaFX was also a hot candidate to replace JavaME, but it seems Oracle has not been very successful in convincing platform creators to include JavaFX in their environments.
With the exception of JavaME, all platforms could be coded in C or C++. For JavaME there was a NetBeans plugin from Sun with all the required emulators and debugging tools, and Windows CE code could be written in Visual Studio.
Then the iPhone appeared on stage and nothing was the same as before.
The iPhone did not run JavaME, it did not run Flash, and it brought a completely new environment for mobile programmers, who had to learn Objective-C and get used to Xcode. Nevertheless, they followed Apple and created a stunning 300,000 apps that can be downloaded from the App Store. The hype around the iPhone, and the lesson about what a phone must offer to become successful, was learned quite quickly by the other platform creators, so they started developing their own coding environments in the hope that app programmers would use them and create a similar number of apps for their platforms.
The situation now is that every modern mobile platform asks for a different language, a different API, and a different coding environment:
Crosscoding between platforms is quite difficult, since not even MVC paradigm can help here, all parts of the code must be transformed in a different language, which is just as much effort as starting programming from scratch.
There are several attempts how to create an app, which works across several platforms:
- HTML5, JavaScript, PhoneGap - All platforms have powerful browsers, which understand a subset of upcoming HTML5 standard and JavaScript. So the idea is to have either an app which consists just of a web-view and a hard-coded internet address. The problem with this approach is that for using this app, the user must be online. Even if all the code is stored offline the second problem is that not all features of the device can be accessed by JavaScript, this is where PhoneGap and similar software step into the ring. They provide a JavaScript API which allows access to device features with JavaScript. Since the API is the same for all supported devices, apps which have been created using PhoneGap can run on different platforms.
- Flash - Adobe is working (marketing) hard to position Flash as a replacement for JavaME, i.e.. a platform which is available for a variety of devices with least common denominator (which is of course much higher than it was for JavaME). So far Flash is available for Android, Symbian, Meego, Blackberries new OS, but there is a compiler for iOS, which transforms Flash into a C Objective app. Probably Flash will be used for the same kind of apps as JavaME, for apps, which do not have to look like native apps e.g. games or fun apps.
But not only programming is different for each plattform, business models are also different, e.g. iPhone user are glad to pay small amount of money for an app, the Android apps should be better financed through ads. The procedures of app-signing, of reviews by AppStore owners, of becoming a publisher in an AppStore, of the policies for an app are all different. All this means that creating apps for different plattform is each time a business decission, which must be reviewed carefully, if it makes sense to support this or that platform. It is not just about going mobile, but going mobile on which platform.
So far there are two clear winners in the app race, this is iOS and Android, with 300000 resp 100000 available apps. But nothing is as fluid as mobile app market. Nokia is still the number one smartphone seller, after bringing Qt on their platform it is possible, that lot of Linux-savvy programmers will be attracted by Meego. Never underestimate the marketing power of Microsoft and Ballmers call for developers. The company behind Android, Google is having hard time from being sued by Oracle for violating patents Oracle has purchased with Sun IP properties. So in half a year the numbers might look completely different and a new competitor can suddenly arise from nowhere. This means that the developer must be prepared to be forced to learn new language and new API. The best thing for them would be if a consolidation would take place to 3-4 platforms and powerful JavaScript/HTML5 API which allows cross-platform programming in one language.
PalmOS: Palm transformed its handhelds into smartphones, but all applications written for PalmOS could still be used.
WIndows CE: There were several versions of the mobile Microsoft OS, the apps for different versions were not always compatible, but could be adapted
Symbian: Coming from PSION devices the platform was highly optimised for mobile usage, though not easy to code for.
JavaME: Stripped down Java version was included on lot of featured mobile phones. Was good for games and simple apps, where the UI could be completely customised, but a horror to test on different devices and to certify the code. The expectation was that when the devices become more powerful JavaSE and JavaME could be merged as some point, JavaFX was also a hot candidate for the JavaME replacement, but it seems Oracle is not very successful in convincing the platform creators to include JavaFX in their environment.
With exception of JavaME all platform could be coded in C or C++. For JavaME there was a NetBeans plugin from Sun with all required emulators and debugging tools, Windows CE code could be written in Visual Studio.
Then the iPhone appeared on stage and nothing was the same as before.
The iPhone did not run JavaME, the iPhone did not run Flash; it came with a completely new environment for mobile programmers, who had to learn Objective-C and get to grips with Xcode. Nevertheless they followed Apple and created a stunning 300,000 apps which can be downloaded from the App Store. The hype around the iPhone, and the lesson about what a phone must have to become successful, was learned quite quickly by the other platform creators, so they started developing their own coding environments in the hope that app programmers would use them and create a similar number of apps for their platforms.
Now the situation is that every modern mobile platform demands a different language, a different API and a different coding environment:
| Platform | Language | API | Coding Environment |
| --- | --- | --- | --- |
| iOS | Objective-C | Cocoa Touch | Xcode |
| Symbian | C++ | Qt + Symbian | Qt Creator |
| Meego | C++ | Qt + Linux | Qt Creator |
| Android | Java | Android API | Eclipse plugin |
| Blackberry Classic | Java | BlackBerry OS API | Eclipse plugin |
| Blackberry Playbook | Flash, JavaScript, HTML5 | | Flash Creator |
| Windows Phone 7 | C# | Silverlight | Visual Studio |
| WebOS | JavaScript, HTML5 | webOS API | |
Porting code between platforms is quite difficult; not even the MVC paradigm helps here, since every part of the code must be rewritten in a different language, which takes roughly as much effort as starting the programming from scratch.
There are several approaches to creating an app that works across several platforms:
- HTML5, JavaScript, PhoneGap - All platforms have powerful browsers which understand a subset of the upcoming HTML5 standard plus JavaScript. The idea is either to ship an app that is nothing more than a web view with a hard-coded internet address, which means the user must be online to use it, or to store all the code offline on the device. Either way a second problem remains: not all features of the device can be accessed from JavaScript. This is where PhoneGap and similar frameworks step into the ring: they provide a JavaScript API which gives access to device features. Since this API is the same on all supported devices, apps created with PhoneGap can run on different platforms.
- Flash - Adobe is working (and marketing) hard to position Flash as a replacement for JavaME, i.e. a platform which is available on a variety of devices as a least common denominator (which is of course much higher than it was for JavaME). So far Flash is available for Android, Symbian, Meego and Blackberry's new OS, and for iOS there is a compiler which transforms a Flash app into an Objective-C app. Flash will probably be used for the same kind of apps as JavaME: apps which do not have to look native, e.g. games or fun apps.
But it is not only the programming that differs between platforms; the business models differ as well. iPhone users, for example, are happy to pay a small amount of money for an app, whereas Android apps are better financed through ads. The procedures for app signing, for reviews by the AppStore owners, for becoming a publisher in an AppStore and the policies an app has to follow are all different. All this means that creating an app for yet another platform is each time a business decision which must be reviewed carefully: does it make sense to support this or that platform? It is not just about going mobile, but about going mobile on which platform.
So far there are two clear winners in the app race, iOS and Android, with roughly 300,000 and 100,000 available apps respectively. But nothing is as fluid as the mobile app market. Nokia is still the number one smartphone seller, and after it brought Qt to its platforms it is possible that many Linux-savvy programmers will be attracted to Meego. Never underestimate the marketing power of Microsoft and Ballmer's call for developers. Google, the company behind Android, is having a hard time being sued by Oracle over patents Oracle acquired together with Sun's IP. So in half a year the numbers might look completely different, and a new competitor could suddenly appear out of nowhere. This means developers must be prepared to be forced to learn a new language and a new API. The best thing for them would be a consolidation down to 3-4 platforms and a powerful JavaScript/HTML5 API which allows cross-platform programming in one language.
Don't Believe the Hype
Everyone and his dog is talking about cloud computing. It is supposed to be the future of computing: nobody will run their own server any more, all data will be sent to a hosting provider who offers unlimited scalability for the application, the only limit being the depth of the customer's pockets. And since the resources are shared among many users, processing time, bandwidth and storage are much cheaper than running your own server. No configuration and no administration are required, and self-healing services keep the applications available 24/7.
Well, this is what we thought when we started to develop our application. We are a small startup in Germany, and our idea is to provide a new location-based service for mobile devices plus a portal. When you have a startup you never have enough resources: neither time, nor staff, nor money, nor much experience. Therefore the idea was to use cloud computing as the back-end solution. We did not want to run our own server, buy expensive upload bandwidth, configure it, administer it, secure it and back it up; we just wanted to start programming our application. We looked at several alternatives, but in the end we decided to go with Google App Engine. GAE was very intriguing for us because it offered a Java environment with Google's datastore, fast search algorithms and low costs, and, well, this is Google, so what could go wrong? Unfortunately quite a lot.
After installing the Eclipse GAE plugin and reading some documentation I was soon able to create a first example, test it on localhost and upload it to the GAE server, and voilà, it worked. Encouraged by this success I decided to develop our whole application on GAE.
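For reference, such a first example is little more than a plain Java servlet mapped to a URL in web.xml; a minimal sketch (the class name and output are illustrative, not taken from our application):

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Minimal servlet of the kind the GAE Eclipse plugin generates as a starting point.
// It runs unchanged on the local development server and, once uploaded, on GAE.
public class HelloServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("text/plain");
        resp.getWriter().println("Hello from App Engine");
    }
}
```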
The first difficulties started when I tried to create references from one object to another and save them in the database. The web is full of happy coders praising Google for how easy it is to create a 1:1 reference, or even 1:n, but I was not able to create references that worked flawlessly. The documentation at least states that m:n references are only possible by manually managing the keys of the referenced objects, so we ended up managing the keys of all objects by hand; back to the roots. By the way, what was cascading again?
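To show what "manual managing of the keys" means in practice, here is a rough sketch of the pattern we ended up with, assuming the JDO annotations and the datastore Key class from the GAE SDK; the entity and field names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;
import com.google.appengine.api.datastore.Key;

// Instead of a real m:n mapping, each side stores the datastore keys of the
// related entities, and the application keeps both lists consistent by hand.
// There is no cascading: deleting a Place does not touch the referenced users.
@PersistenceCapable
public class Place {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key key;

    // Keys of related user entities (hypothetical), managed manually.
    @Persistent
    private List<Key> visitorKeys = new ArrayList<Key>();

    public Key getKey() { return key; }

    public void addVisitorKey(Key userKey) {
        if (!visitorKeys.contains(userKey)) {
            visitorKeys.add(userKey);
        }
    }
}
```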
After a while the GAE O/R mapper refused to enhance the POJO entities. As its O/R mapper GAE uses DataNucleus; why they do not use the widely accepted and proven Hibernate is beyond me. DataNucleus is nowhere near as battle-tested as Hibernate, and while some coders might read their replies as English humour, to me the support answers from the DataNucleus people came across as pure arrogance. After wild configuration orgies we finally gave up on starting the application from Eclipse and switched to an ant script, which worked quite nicely as long as the application only had to run on localhost.
After our application became more complex and we deployed it to the GAE server more often, so that more people could have a look at the progress, we realised that the start-up times of the application were horrible: it took about 15 seconds until the start page appeared in the browser. The last time a web page took 15 seconds to appear in my browser was in the nineties, when I was surfing weird Japanese websites for overclocking tips with my 56k modem. After searching some forums I realised I was not the only one affected by the issue. It seems that if the application is not in use it is dropped from the main memory of the GAE server, and it takes a long time until it is reloaded (maybe even recompiled) and ready to serve again. On the forums people were discussing how often a client should send a request to the GAE server so that their own application does not disappear from memory! It is turning into a cat-and-mouse game between Google and the GAE users, because the intervals between the pings are getting shorter and shorter.
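The workaround discussed in those forums boils down to pinging the application's start page at a fixed interval. A minimal external pinger, sketched in plain Java (the appspot URL and the interval are placeholders, not a recommendation):

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Requests the start page periodically in the hope that the GAE instance
// stays in memory and the next real visitor does not hit the slow cold start.
public class KeepAlive {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try {
                    HttpURLConnection conn = (HttpURLConnection)
                            new URL("http://yourapp.appspot.com/").openConnection();
                    conn.setRequestMethod("GET");
                    System.out.println("ping -> HTTP " + conn.getResponseCode());
                    conn.disconnect();
                } catch (Exception e) {
                    System.out.println("ping failed: " + e.getMessage());
                }
            }
        }, 0, 5, TimeUnit.MINUTES);
    }
}
```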
After testing on the GAE server we started to realise that the application behaves differently on the server than in the test environment. The application crashed in situations where it ran flawlessly on localhost, and the log file stayed clean, so there was no indication of what might be wrong. Complete outages were not uncommon either, where not a single user from any location could access the application. It took me quite a long time to find out how to restart the application after a total crash ("appname".appspot.com?restartApplication).
Another problem arose when our web designer started his work. It is not possible to upload single files; the complete war directory has to be uploaded. So even if only a picture has changed, the whole application must be redeployed. And why should an external web designer have access to the whole application code in the first place?
Database administration is very basic. It is neither possible to dump the whole database nor to delete it completely and start with a fresh one. At least a database viewer is available, so it was possible to see which references had once again either not been created or been created wrongly.
But the biggest issue hit us two weeks ago: for some reason it was no longer possible to deploy the application to the GAE server. Without any reasonable explanation the upload stopped at 99% and rolled back to the old version. If such a thing happened in production it would be absolutely unacceptable. Two weeks later, deployment is still not possible.
This was the last straw, so say hello to MySQL, hello to Hibernate, hello to Tomcat, hello to BlackHats.
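For contrast, a sketch of how the same m:n relation looks with JPA/Hibernate annotations, where the mapper creates the join table and takes care of cascading; the entity names are again invented for illustration:

```java
import java.util.HashSet;
import java.util.Set;
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToMany;

// The m:n relation is declared once; Hibernate creates the join table and
// cascades persist operations to the referenced entities.
@Entity
public class Place {
    @Id
    @GeneratedValue
    private Long id;

    @ManyToMany(cascade = CascadeType.PERSIST)
    private Set<AppUser> visitors = new HashSet<AppUser>();

    public void addVisitor(AppUser user) {
        visitors.add(user);
    }
}

// Named AppUser rather than User to avoid a reserved table name in some databases.
@Entity
class AppUser {
    @Id
    @GeneratedValue
    private Long id;
}
```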
The idea of cloud computing is great: just upload your code and the provider takes care of the rest. But practice paints a completely different picture. Cloud computing has to be taken with a very large grain of salt and is currently only suitable for testing; GAE at least still has a long way to go before it is a viable alternative to running your own server.