Archive for March, 2017

Deploying a node.js app in GCP

March 30, 2017

Google has been a pioneer in the technology field, from popularizing AJAX to cloud to Big Data, you name it. However, Amazon is the one that has been successful in monetizing these technologies. IMHO, the reason is twofold: Google was late to the market, and its documentation didn't help. Being late to join the cloud party was bad enough; writing documentation like an IEEE journal made it worse. I have been playing with AWS since 2008, but when I looked at GCP (back when it was mostly just GCE) a couple of years ago at one of their events, I left pretty depressed, unable to accomplish the simple task of creating a server and hosting a site!

With this background, I started my journey a couple of days back to deploy a node.js website in GCP. The webapp is a simple SPA, developed with AngularJS 2.0 on the front end and node.js on the back end. One small twist was that this app accesses AWS CloudSearch to perform the search and return the results.

One thing I should admit is that their documentation has come a long way (I was successful in hosting the site :-)). I started with https://cloud.google.com/nodejs/getting-started/hello-world. Google is also offering a generous $300 credit for the first year, and I was able to follow their documentation and host the hello-world sample app. Having said that, I looked at the documentation for achieving this in three other places: Heroku, Azure, and AWS. I must say GCP is the simplest. However, it was not without its quirks. I had my webapp developed using node.js, Express, and AngularJS, and here is what you can do to get your webapp ready for deploying in GCP.

  1. Copy the app.yaml from the hello-world sample that you downloaded from GitHub, and make sure the skip rule for ^node_modules is not there (a minimal sketch follows this list).
  2. Go to the command prompt and follow the documentation.
  3. Run gcloud app deploy.
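For reference, the app.yaml I used was essentially the hello-world sample's. The sketch below is an approximation of that sample for the App Engine flexible environment, not my exact file, so treat it as a starting point.

    # app.yaml sketch, modeled on the hello-world sample (App Engine flexible
    # environment); per step 1, no skip rule for ^node_modules
    runtime: nodejs
    env: flex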

The deploy took a long time and errored out saying “Error loading module (‘aws-sdk’)”. I searched around, and after a couple of hours of trial and error found that aws-sdk was not listed under dependencies in the package.json file.
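For anyone hitting the same error, the fix was simply to declare the module as a dependency (or run npm install --save aws-sdk). The snippet below shows the relevant part of package.json; the version numbers are illustrative, not the exact ones I used.

    {
      "dependencies": {
        "express": "^4.15.0",
        "aws-sdk": "^2.7.0"
      }
    }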

After adding the dependency, the app finally deployed successfully, though it again took a very long time. I was thrilled, but my euphoria was short-lived: the app would not load in the browser and kept throwing a 502 error, asking me to try again in 30 seconds.

I started researching, but there was no solution in sight. Everyone was pointing me to the logs, and being new to GCP, I moped around until I figured out where the logs are and started going through them, which is like finding a needle in a haystack. Some errors pointed to an upstream connectivity failure on nginx, and I noticed a load balancer error about being unable to connect back to the server. Having set up load balancers in both Azure and AWS, I suspected that the load balancer was not able to direct traffic to the actual web server hosting the site (though I didn't ask for a load balancer, GCP by default deploys the site on at least two servers to make it HA). I found an article (https://cloud.google.com/compute/docs/load-balancing/http/) explaining that the load balancer uses port 80 or 8080 for HTTP traffic and 443 for HTTPS. My server.js file was listening on 8000. I changed it to 8080 (the relevant bit is sketched below) and, voila, it worked. I was ecstatic; how cool is that?
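The change itself was a one-liner. Here is a sketch of the listening part of server.js, assuming a standard Express setup (not my full file); reading the PORT environment variable, which App Engine sets for you, also works and is the safer habit.

    // Sketch of the listening bit of server.js (standard Express setup assumed)
    var express = require('express');
    var app = express();

    // App Engine routes HTTP traffic to 8080 and sets process.env.PORT;
    // listening on 8000 was what caused the 502s.
    var PORT = process.env.PORT || 8080;

    app.listen(PORT, function () {
      console.log('App listening on port ' + PORT);
    });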

I built and hosted a site on Google Firebase + Angular earlier, and now this one; my perception of GCP is changing. Google is becoming a player as well.

AWS CloudSearch & node.js app in GCP

March 27, 2017

In my earlier post, I talked about CloudSearch, and now I'm following up with the approach I took to make the data searchable. There are multiple components. At the core is AWS CloudSearch, which gets populated with data in JSON format, indexes that data, and makes it ready for search. Now we have searchable data, but we need a mechanism for a user to actually search it.

I chose node.js, Express, and AngularJS to create a web-based app that offers a UI where users can type in search terms and choose criteria to filter the data. The app converts the search terms into a structured or simple query (similar to SQL for an RDBMS), calls the AWS CloudSearch APIs to perform the search, and displays the results in a Bootstrap grid. Right up to this point, my thought process was straightforward. Then I threw in a spin: how about we host this web app on Google Cloud Platform?

I had no idea about GCP, but with prior knowledge of AWS and Azure and a liberal $300 credit from Google, I thought it shouldn't be a problem. I could have hosted the app on Heroku, AWS, or Azure. However, after reviewing the documentation, I felt confident that I could make it work on GCP. Indeed I was right, but only after those frustrating moments one has when attempting a new technology. I will write a separate blog entry on that; I don't want it to cloud out AWS CloudSearch here.

[Figure: AWS_CloudSearch]

I created two IAM users, one with authorization only to search and the other with authorization to upload and index documents in CloudSearch. One good thing about CloudSearch is that it can convert CSV to JSON automatically. The source was data from MS SQL Server, extracted as CSV; a separate script uploaded the data to CloudSearch (in 5 MB chunks) and indexed it. I was able to upload data with different formats (different numbers of fields). The webapp then builds the search using the "structured" syntax if it is a compound query checking against multiple fields, or a "simple" query if it is just text to be searched across all text fields. The call to the CloudSearch APIs through the AWS SDK returns the search results, which are displayed in a Bootstrap grid; a rough sketch of that call follows.
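Here is what that lookup can look like in node.js with the AWS SDK (v2). This is a hedged sketch rather than my production code: the search endpoint and the field names (title, category) are placeholders, and the read-only IAM user's credentials are assumed to come from the environment or a shared credentials file, as usual with the SDK.

    // Sketch: querying a CloudSearch domain from node.js (aws-sdk v2).
    // Endpoint and field names are placeholders.
    var AWS = require('aws-sdk');

    var csd = new AWS.CloudSearchDomain({
      endpoint: 'search-mydomain-xxxxxxxxxx.us-east-1.cloudsearch.amazonaws.com',
      region: 'us-east-1'
    });

    // Compound query against specific fields -> "structured" parser;
    // plain text across all text fields -> "simple" parser.
    function buildSearchParams(text, filters) {
      if (!filters) {
        return { query: text, queryParser: 'simple', size: 20 };
      }
      var clauses = Object.keys(filters).map(function (field) {
        return "(term field=" + field + " '" + filters[field] + "')";
      });
      clauses.push("(term field=title '" + text + "')");
      return {
        query: '(and ' + clauses.join(' ') + ')',
        queryParser: 'structured',
        size: 20
      };
    }

    csd.search(buildSearchParams('cloud', { category: 'aws' }), function (err, data) {
      if (err) { return console.error(err); }
      // data.hits.found is the total count; data.hits.hit holds the documents
      // that the front end renders in the Bootstrap grid.
      console.log('Found %d matches', data.hits.found);
    });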

The full-text search on the relational data (MS SQL Server) took up to 30 seconds, while AWS CloudSearch returned in less than a second! For people who are wondering why CloudSearch: under a second versus 30 seconds is a hard argument to ignore.

If you have data in CSV format, I can show you a searchable demo pretty quickly; ping me.

This was a great experience getting my hands dirty, and there is nothing like it when it works, right?

AWS CloudSearch

March 25, 2017

The cloud is filled with wide-ranging options to store and retrieve data, and so is on-premises. Every cloud provider has its own cloud search solution, from Amazon to Azure to Google. In addition to these proprietary solutions, there are open-source platforms such as Elasticsearch and Apache Solr.

Here is a wonderful blog post comparing the three products: http://harish11g.blogspot.in/2015/07/amazon-cloudsearch-vs-elasticsearch-vs-Apache-Solr-comparison-report.html

In short, all of them offer similar features with little difference. I would say there are two big differences between AWS CloudSearch and the other two.

  1. Data import is a batch process in AWS CloudSearch. If you have streaming data or need immediate data updates, then go for Elasticsearch or Solr.
  2. If you don't want to worry about infrastructure, backups, or patches, then go with AWS CloudSearch. Out of the box, it comes across as a true cloud product.

Both Elastic.co and AWS offer Elasticsearch as a service, where they have simplified the infrastructure part; Elastic.co, in fact, offers it as a service on the AWS cloud. However, Elasticsearch and Solr are more popular than CloudSearch, so it is easier to find resources online for those two than for AWS CloudSearch.

Thus I embarked on a journey to take up AWS CloudSearch, and you know what, it is not that difficult (though I went through those gnawing issues and had my own share of frustrating moments). To begin with, I took the manual route: I extracted the data out of my RDBMS (SQL Server), uploaded it to CloudSearch, indexed it, and used the rudimentary UI provided by AWS; I was searching within an hour. The biggest advantage I see with AWS CloudSearch data upload is that it takes a CSV file and converts it to JSON by itself. You can write a batch program to upload in chunks of 5 MB (a sketch of such a program follows). In addition to CSV, it supports multiple other types such as PDF, Excel, PPTX, DOCX, etc.
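As a rough illustration of that batch program, here is a hedged sketch in node.js with the AWS SDK (v2); the document endpoint, IDs, and field names are placeholders, not from my actual script.

    // Sketch: pushing a batch of documents to a CloudSearch domain (aws-sdk v2).
    // Endpoint, ids, and fields below are placeholders.
    var AWS = require('aws-sdk');

    var csd = new AWS.CloudSearchDomain({
      endpoint: 'doc-mydomain-xxxxxxxxxx.us-east-1.cloudsearch.amazonaws.com',
      region: 'us-east-1'
    });

    // CloudSearch expects a batch of add/delete operations; each request
    // must stay under the 5 MB limit, so larger data sets get split.
    var batch = [
      { type: 'add', id: 'row-1', fields: { title: 'First record', category: 'demo' } },
      { type: 'add', id: 'row-2', fields: { title: 'Second record', category: 'demo' } }
    ];

    csd.uploadDocuments({
      contentType: 'application/json',
      documents: JSON.stringify(batch)
    }, function (err, data) {
      if (err) { return console.error('Upload failed:', err); }
      console.log('Status: %s, documents added: %d', data.status, data.adds);
    });

A real script would read the CSV export, turn each row into this add-document format, and split the output into batches under 5 MB; the CSV-to-JSON conversion mentioned above otherwise happens when you upload through the AWS console.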

With both Solr and Elasticsearch, you need to provision a Linux server, then install and configure them like any other software you download. Even if you take the service route, you still need to worry about backups, upgrades, applying patches, etc. One big advantage of these two is that you can run them on-premises as well, while AWS CloudSearch is available only on the AWS cloud. Beyond that, Elastic also has a data visualization tool, Kibana, and comes as a suite (ELK: Elasticsearch, Logstash, Kibana). AWS CloudSearch offers only indexing and search, with no visualization; that is a separate product, QuickSight (I haven't looked at it yet, but I plan to).

I will write more about the programmatic approach in my next entry. Please drop me a line if you can't wait and wish to see it in action!