Tuesday, September 27, 2016

CS 6712 GRID AND CLOUD COMPUTING LAB MANUAL

Anna University 2013 Regulation


CS 6712 Grid and Cloud Computing Lab Manual can be downloaded from the links below:


Lab Manual 1
https://drive.google.com/open?id=0ByI3h-WZRk-ndS1ZbUpmUkM2OVU

Lab Manual 2
https://drive.google.com/open?id=0ByI3h-WZRk-nTlQwSk4zb04yRnM

Lab Manual 3
https://drive.google.com/open?id=0ByI3h-WZRk-nX3N2ZFNJVDdrWEU

Lab Manual 4

Lab Manual 5
https://drive.google.com/open?id=0ByI3h-WZRk-nSXB2YnlYaUh2VVU

Saturday, September 24, 2016

CS 6712 GRID LAB PREREQUISITES



GRID LAB Exercises - Prerequisites

1. Install Java (default-jdk pulls in the distribution's OpenJDK; any JDK will do)
 $ sudo apt-get install default-jdk
2. Install GCC
 $ sudo add-apt-repository ppa:ubuntu-toolchain-r/test
 $ sudo apt-get update
 $ sudo apt-get install gcc-4.9
3. Install Perl
 $ sudo apt-get install perl

4. Install the Grid essentials (Globus Toolkit)

Download globus-toolkit-repo_latest_all.deb from the Globus Toolkit site, then:

 $ sudo dpkg -i globus-toolkit-repo_latest_all.deb

If the following installs fail with package errors, run $ sudo apt-get update and retry.

 $ sudo apt-get install globus-data-management-client
 $ sudo apt-get install globus-gridftp
 $ sudo apt-get install globus-gram5
 $ sudo apt-get install globus-gsi
 $ sudo apt-get install globus-data-management-server
 $ sudo apt-get install globus-data-management-sdk
 $ sudo apt-get install globus-resource-management-server
 $ sudo apt-get install globus-resource-management-client
 $ sudo apt-get install globus-resource-management-sdk
 $ sudo apt-get install myproxy
 $ sudo apt-get install gsi-openssh
 $ sudo apt-get install globus-gridftp globus-gram5 globus-gsi myproxy myproxy-server myproxy-admin

5. Install Eclipse or NetBeans
 $ chmod +x netbeans-8.1-javaee-linux.sh
 $ ./netbeans-8.1-javaee-linux.sh

6. Install Apache Axis

In Eclipse: Window --> Preferences --> Web Services, add the downloaded Axis runtime, then Apply --> OK.

Download Apache Tomcat, install it, and start the service:
 in a terminal, change to the Tomcat folder and run $ bin/startup.sh
 then open localhost:8080 in a web browser to confirm Tomcat is running.
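
With Axis running on Tomcat, lab Exercise 1 (a calculator web service) can be prototyped as an Axis 1.x drop-in JWS file. This is a minimal sketch, assuming Axis 1.x deployed under Tomcat; the class name, file name, and set of operations are illustrative choices, not prescribed by the manual:

// Calculator.jws -- copy into <tomcat>/webapps/axis/
// Axis 1.x compiles a .jws file on first request and exposes
// every public method as a web service operation.
public class Calculator {
    public int add(int a, int b)      { return a + b; }
    public int subtract(int a, int b) { return a - b; }
    public int multiply(int a, int b) { return a * b; }
    public int divide(int a, int b)   { return a / b; }
}

If the axis webapp is reachable, the generated WSDL should then appear at http://localhost:8080/axis/Calculator.jws?wsdl.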

Thursday, September 22, 2016

CS6703 GRID AND CLOUD COMPUTING SYLLABUS



CS6703                       GRID AND CLOUD COMPUTING                      L T P C               3 0 0 3

OBJECTIVES:

The student should be made to:
·         Understand how Grid computing helps in solving large scale scientific problems.
·         Gain knowledge on the concept of virtualization that is fundamental to cloud computing.
·         Learn how to program the grid and the cloud.
·         Understand the security issues in the grid and the cloud environment.

UNIT I                                                                   INTRODUCTION                                             9
Evolution of Distributed computing: Scalable computing over the Internet – Technologies for network based systems – clusters of cooperative computers – Grid computing Infrastructures – cloud computing – service oriented architecture – Introduction to Grid Architecture and standards –
Elements of Grid – Overview of Grid Architecture.

UNIT II                                                                 GRID SERVICES                                               9
Introduction to Open Grid Services Architecture (OGSA) – Motivation – Functionality Requirements – Practical & Detailed view of OGSA/OGSI – Data intensive grid service models – OGSA services.

UNIT III                                                            VIRTUALIZATION                                               9
Cloud deployment models: public, private, hybrid, community – Categories of cloud computing: Everything as a service: Infrastructure, platform, software – Pros and Cons of cloud computing – Implementation levels of virtualization – virtualization structure – virtualization of CPU, Memory and I/O devices – virtual clusters and Resource Management – Virtualization for data center automation.

UNIT IV                                                    PROGRAMMING MODEL                                          9
Open source grid middleware packages – Globus Toolkit (GT4) Architecture , Configuration – Usage of Globus – Main components and Programming model – Introduction to Hadoop Framework – Mapreduce, Input splitting, map and reduce functions, specifying input and output parameters,
configuring and running a job – Design of Hadoop file system, HDFS concepts, command line and java interface, dataflow of File read & File write.

UNIT V                                                                    SECURITY                                                       9
Trust models for Grid security environment – Authentication and Authorization methods – Grid security infrastructure – Cloud Infrastructure security: network, host and application level – aspects of data security, provider data and its security, Identity and access management architecture, IAM practices in the cloud, SaaS, PaaS, IaaS availability in the cloud, Key privacy issues in the cloud.
                                                                                                                           TOTAL: 45 PERIODS

OUTCOMES:

At the end of the course, the student should be able to:
 Apply grid computing techniques to solve large scale scientific problems.
 Apply the concept of virtualization.
 Use the grid and cloud tool kits.
 Apply the security models in the grid and the cloud environment.

TEXT BOOK:
1. Kai Hwang, Geoffrey C. Fox and Jack J. Dongarra, “Distributed and Cloud Computing: Clusters, Grids, Clouds and the Future of Internet”, First Edition, Morgan Kaufmann, an Imprint of Elsevier, 2012.

REFERENCES:
1. Jason Venner, “Pro Hadoop: Build Scalable, Distributed Applications in the Cloud”, Apress, 2009.
2. Tom White, “Hadoop: The Definitive Guide”, First Edition, O’Reilly, 2009.
3. Bart Jacob (Editor), “Introduction to Grid Computing”, IBM Red Books, Vervante, 2005
4. Ian Foster, Carl Kesselman, “The Grid: Blueprint for a New Computing Infrastructure”, 2nd Edition, Morgan Kaufmann.
5. Frederic Magoules and Jie Pan, “Introduction to Grid Computing” CRC Press, 2009.
6. Daniel Minoli, “A Networking Approach to Grid Computing”, John Wiley Publication, 2005.
7. Barry Wilkinson, “Grid Computing: Techniques and Applications”, Chapman and Hall, CRC, Taylor and Francis Group, 2010.

CS6712 GRID AND CLOUD COMPUTING LAB SYLLABUS



CS6712          GRID AND CLOUD COMPUTING LABORATORY          L T P C           0 0 3 2

OBJECTIVES:

The student should be made to:
 Be exposed to tool kits for grid and cloud environment.
 Be familiar with developing web services/Applications in grid framework
 Learn to run virtual machines of different configuration.
 Learn to use Hadoop

LIST OF EXPERIMENTS:

GRID COMPUTING LAB:
Use Globus Toolkit or equivalent and do the following:
1. Develop a new Web Service for Calculator.
2. Develop new OGSA-compliant Web Service.
3. Using Apache Axis develop a Grid Service.
4. Develop applications using Java or C/C++ Grid APIs
5. Develop secured applications using basic security mechanisms available in Globus Toolkit.
6. Develop a Grid portal, where user can submit a job and get the result. Implement it with and without GRAM concept.

CLOUD COMPUTING LAB:
Use Eucalyptus or Open Nebula or equivalent to set up the cloud and demonstrate:
1. Find procedure to run the virtual machine of different configuration. Check how many virtual machines can be utilized at particular time.
2. Find procedure to attach virtual block to the virtual machine and check whether it holds the data even after the release of the virtual machine.
3. Install a C compiler in the virtual machine and execute a sample program.
4. Show the virtual machine migration based on the certain condition from one node to the other.
5. Find procedure to install storage controller and interact with it.
6. Find procedure to set up the one node Hadoop cluster.
7. Mount the one node Hadoop cluster using FUSE.
8. Write a program to use the APIs of Hadoop to interact with it.
9. Write a wordcount program to demonstrate the use of Map and Reduce tasks
                                                                                                                       TOTAL: 45 PERIODS

OUTCOMES:
At the end of the course, the student should be able to:
 Use the grid and cloud tool kits.
 Design and implement applications on the Grid.
 Design and Implement applications on the Cloud.

LIST OF EQUIPMENT FOR A BATCH OF 30 STUDENTS:

SOFTWARE:
Globus Toolkit or equivalent; Eucalyptus or Open Nebula or equivalent

HARDWARE:
Standalone desktops 30 Nos



APACHE HADOOP ECOSYSTEM



[Figure: the Apache Hadoop ecosystem with YARN – https://opensource.com/sites/default/files/resize/styles/image-full-size/public/images/life-uploads/hadoop-EcoSys_yarn-640x418.PNG]


Hadoop Distributed File System: HDFS, the storage layer of Hadoop, is a distributed, scalable, Java-based file system adept at storing large volumes of unstructured data.
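
As a taste of what the lab's "use the APIs of Hadoop" exercise involves, here is a minimal sketch that writes and reads a file through the HDFS Java API; the namenode URI hdfs://localhost:9000 and the path /demo/hello.txt are assumptions for a default single-node setup:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed single-node namenode address
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);

        Path file = new Path("/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(file, true)) { // overwrite if present
            out.writeUTF("hello hdfs");
        }
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF()); // prints: hello hdfs
        }
        fs.close();
    }
}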
 
MapReduce: MapReduce is a software framework that serves as the compute layer of Hadoop. MapReduce jobs are divided into two (obviously named) parts. The “Map” function divides a query into multiple parts and processes data at the node level. The “Reduce” function aggregates the results of the “Map” function to determine the “answer” to the query.
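
The canonical illustration of this split (and lab Exercise 9) is wordcount: the map phase emits a (word, 1) pair for every token, and the reduce phase sums the counts per word. A minimal sketch against the org.apache.hadoop.mapreduce API; class and path names are illustrative:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map: emit (word, 1) for every token in the input line
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String tok : value.toString().split("\\s+")) {
                if (!tok.isEmpty()) { word.set(tok); ctx.write(word, ONE); }
            }
        }
    }
    // Reduce: sum the counts emitted for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar, it runs with hadoop jar wordcount.jar WordCount <input-dir> <output-dir>; the output directory must not already exist.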

Hive: Hive is a Hadoop-based data-warehousing framework originally developed by Facebook. It allows users to write queries in a SQL-like language called HiveQL, which are then converted into MapReduce jobs. This lets SQL programmers with no MapReduce experience use the warehouse, and makes it easier to integrate with business intelligence and visualization tools such as MicroStrategy, Tableau, and Revolution Analytics.
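
Because HiveServer2 exposes a standard JDBC interface, a Java client can issue HiveQL directly. A minimal sketch, assuming HiveServer2 is listening on localhost:10000 and a table named words already exists (both are assumptions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver (ships with hive-jdbc)
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = con.createStatement();
             // HiveQL: compiled into MapReduce jobs behind the scenes
             ResultSet rs = stmt.executeQuery(
                 "SELECT word, COUNT(*) FROM words GROUP BY word")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}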

Pig: Pig is a high-level scripting platform for Apache Hadoop, originally developed at Yahoo and relatively easy to learn. Its language, Pig Latin, lets data workers write complex data transformations without knowing Java, and its simple SQL-like syntax appeals to developers already familiar with scripting languages and SQL.

HBase: HBase is a non-relational database that allows for low-latency, quick lookups in Hadoop. It adds transactional capabilities to Hadoop, allowing users to conduct updates, inserts and deletes. eBay and Facebook use HBase heavily.
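
For a flavor of those operations, here is a minimal put/get sketch against the HBase Java client API; the table demo and column family cf are assumptions and must be created beforehand (e.g., create 'demo', 'cf' in the HBase shell):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("demo"))) {
            // Insert one cell: row "row1", family "cf", qualifier "greeting"
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("greeting"),
                          Bytes.toBytes("hello"));
            table.put(put);
            // Low-latency point lookup by row key
            Result r = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(
                r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("greeting"))));
        }
    }
}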

Flume: Apache Flume is a data-ingestion tool/service for collecting, aggregating, and transporting large amounts of streaming data (log files, events, etc.) from various web servers to a centralized data store. It is a highly reliable, distributed, and configurable tool, principally designed to transfer streaming data from various sources into HDFS.

Oozie: Oozie is a workflow processing system that lets users define a series of jobs written in multiple frameworks – such as MapReduce, Pig, and Hive – and then intelligently link them to one another. Oozie allows users to specify, for example, that a particular query is only to be initiated after the specified previous jobs on which it relies for data have completed.

Ambari: Ambari is a web-based set of tools for deploying, administering, and monitoring Apache Hadoop clusters. Its development is led by engineers from Hortonworks, which includes Ambari in its Hortonworks Data Platform.

Avro: Avro is a data serialization system that allows for encoding the schema of Hadoop files. It is adept at parsing data and performing remote procedure calls.

Mahout: Apache Mahout is an Apache Software Foundation project to produce free implementations of distributed or otherwise scalable machine learning algorithms, focused primarily on collaborative filtering, clustering, and classification. Many of the implementations use the Apache Hadoop platform.

Sqoop: Sqoop is a connectivity tool for moving data from non-Hadoop data stores – such as relational databases and data warehouses – into Hadoop. It lets users specify the target location inside Hadoop and instruct Sqoop to move data from Oracle, Teradata, or other relational databases to that target.

HCatalog: HCatalog is a centralized metadata management and sharing service for Apache Hadoop. It allows for a unified view of all data in Hadoop clusters and allows diverse tools, including Pig and Hive, to process any data elements without needing to know physically where in the cluster the data is stored.

BigTop: BigTop is an effort to create a more formal process and framework for packaging and interoperability testing of Hadoop's sub-projects and related components, with the goal of improving the Hadoop platform as a whole.


R: R is a programming language and software environment for statistical analysis, graphics representation, and reporting. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team. R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems such as Linux, Windows, and Mac. The language was named R after the first letter of the first names of its two authors, and partly as a play on the name of the Bell Labs language S.



YARN: Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management technology. YARN is one of the key features of Hadoop 2, the second generation of the Apache Software Foundation's open source distributed processing framework. Originally described by Apache as a redesigned resource manager, YARN is now characterized as a large-scale, distributed operating system for big data applications.