Saturday, September 24, 2016
CS6712 Grid Lab Prerequisites
GRID LAB Exercises - Prerequisites
1. Install Java
2. Install GCC
$ sudo add-apt-repository ppa:ubuntu-toolchain-r/test
$ sudo apt-get update
$ sudo apt-get install gcc-4.9
3. Install Perl
$ sudo apt-get install perl
4. Install the Grid essentials (Globus Toolkit packages)
$ sudo dpkg -i globus-toolkit-repo_latest_all.deb
If an error occurs, run: $ sudo apt-get update
$ sudo apt-get install globus-data-management-client
$ sudo apt-get install globus-gridftp
$ sudo apt-get install globus-gram5
$ sudo apt-get install globus-gsi
$ sudo apt-get install globus-data-management-server
$ sudo apt-get install globus-data-management-sdk
$ sudo apt-get install globus-resource-management-server
$ sudo apt-get install globus-resource-management-client
$ sudo apt-get install globus-resource-management-sdk
$ sudo apt-get install myproxy
$ sudo apt-get install gsi-openssh
$ sudo apt-get install globus-gridftp globus-gram5 globus-gsi myproxy myproxy-server myproxy-admin
5. Install Eclipse or NetBeans
$ chmod +x netbeans-8.1-javaee-linux.sh
$ ./netbeans-8.1-javaee-linux.sh
6. Install Apache Axis
In Eclipse: Window --> Preferences --> add the Axis file --> Apply --> OK
Download Tomcat, install it, and start the service.
In a terminal, go to the Tomcat folder and run: $ bin/startup.sh
In a web browser, open localhost:8080
Thursday, September 22, 2016
CS6703 GRID AND CLOUD COMPUTING SYLLABUS
CS6703 GRID AND CLOUD COMPUTING    L T P C 3 0 0 3
OBJECTIVES:
The student should be made to:
· Understand how Grid computing helps in solving large scale scientific problems.
· Gain knowledge on the concept of virtualization that is fundamental to cloud computing.
· Learn how to program the grid and the cloud.
· Understand the security issues in the grid and the cloud environment.
UNIT I INTRODUCTION 9
Evolution of Distributed computing: Scalable computing over the Internet – Technologies for network based systems – clusters of cooperative computers – Grid computing Infrastructures – cloud computing – service oriented architecture – Introduction to Grid Architecture and standards – Elements of Grid – Overview of Grid Architecture.
UNIT II GRID SERVICES 9
Introduction to Open Grid Services Architecture (OGSA) – Motivation – Functionality Requirements – Practical & Detailed view of OGSA/OGSI – Data intensive grid service models – OGSA services.
UNIT III VIRTUALIZATION 9
Cloud deployment models: public, private, hybrid, community – Categories of cloud computing: Everything as a service: Infrastructure, platform, software – Pros and Cons of cloud computing – Implementation levels of virtualization – virtualization structure – virtualization of CPU, Memory and I/O devices – virtual clusters and Resource Management – Virtualization for data center automation.
UNIT IV PROGRAMMING MODEL 9
Open source grid middleware packages – Globus Toolkit (GT4) Architecture, Configuration – Usage of Globus – Main components and Programming model – Introduction to Hadoop Framework – MapReduce, Input splitting, map and reduce functions, specifying input and output parameters, configuring and running a job – Design of Hadoop file system, HDFS concepts, command line and Java interface, dataflow of File read & File write.
UNIT V SECURITY 9
Trust models for Grid security environment – Authentication and Authorization methods – Grid security infrastructure – Cloud Infrastructure security: network, host and application level – aspects of data security, provider data and its security, Identity and access management architecture, IAM practices in the cloud, SaaS, PaaS, IaaS availability in the cloud, Key privacy issues in the cloud.
TOTAL: 45 PERIODS
OUTCOMES:
At the end of the course, the student should be able to:
Apply grid computing techniques to solve large scale scientific problems.
Apply the concept of virtualization.
Use the grid and cloud tool kits.
Apply the security models in the grid and the cloud environment.
TEXT BOOK:
1. Kai Hwang, Geoffrey C. Fox and Jack J. Dongarra, “Distributed and Cloud Computing: Clusters, Grids, Clouds and the Future of Internet”, First Edition, Morgan Kaufmann, an Imprint of Elsevier, 2012.
REFERENCES:
1. Jason Venner, “Pro Hadoop: Build Scalable, Distributed Applications in the Cloud”, Apress, 2009.
2. Tom White, “Hadoop The Definitive Guide”, First Edition. O’Reilly, 2009.
3. Bart Jacob (Editor), “Introduction to Grid Computing”, IBM Red Books, Vervante, 2005
4. Ian Foster, Carl Kesselman, “The Grid: Blueprint for a New Computing Infrastructure”, 2nd Edition, Morgan Kaufmann.
5. Frederic Magoules and Jie Pan, “Introduction to Grid Computing” CRC Press, 2009.
6. Daniel Minoli, “A Networking Approach to Grid Computing”, John Wiley Publication, 2005.
7. Barry Wilkinson, “Grid Computing: Techniques and Applications”, Chapman and Hall, CRC, Taylor and Francis Group, 2010.
CS6712 GRID AND CLOUD COMPUTING LAB SYLLABUS
CS6712 GRID AND CLOUD COMPUTING LABORATORY    L T P C 0 0 3 2
OBJECTIVES:
The student should be made to:
Be exposed to tool kits for grid and cloud environment.
Be familiar with developing web services/Applications in grid framework
Learn to run virtual machines of different configuration.
Learn to use Hadoop
LIST OF EXPERIMENTS:
GRID COMPUTING LAB:
Use Globus Toolkit or equivalent and do the following:
1. Develop a new Web Service for Calculator (a minimal sketch follows this list).
2. Develop new OGSA-compliant Web Service.
3. Using Apache Axis develop a Grid Service.
4. Develop applications using Java or C/C++ Grid APIs
5. Develop secured applications using basic security mechanisms available in Globus Toolkit.
6. Develop a Grid portal, where user can submit a job and get the result. Implement it with and without GRAM concept.
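For exercise 1 above, a minimal sketch of a Calculator service class is shown below. It assumes the Apache Axis 1.x JWS deployment style on the Tomcat setup from the prerequisites post; the file name, webapp name and port are the usual defaults and are not specified by this syllabus.

// Calculator.java - a plain Java class that Axis can expose as a web service.
// With Axis 1.x, copying this file as Calculator.jws into Tomcat's webapps/axis
// directory (assumed default layout) publishes it; the generated WSDL is then
// available at http://localhost:8080/axis/Calculator.jws?wsdl
public class Calculator {
    public int add(int a, int b)      { return a + b; }
    public int subtract(int a, int b) { return a - b; }
    public int multiply(int a, int b) { return a * b; }
    public int divide(int a, int b)   { return a / b; } // caller must avoid b == 0
}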
CLOUD COMPUTING LAB:
Use Eucalyptus or Open Nebula or equivalent to set up the cloud and demonstrate:
1. Find procedure to run the virtual machine of different configuration. Check how many virtual machines can be utilized at a particular time.
2. Find procedure to attach a virtual block to the virtual machine and check whether it holds the data even after the release of the virtual machine.
3. Install a C compiler in the virtual machine and execute a sample program.
4. Show the virtual machine migration based on a certain condition from one node to the other.
5. Find procedure to install storage controller and interact with it.
6. Find procedure to set up the one-node Hadoop cluster.
7. Mount the one node Hadoop cluster using FUSE.
8. Write a program to use the APIs of Hadoop to interact with it (a minimal HDFS sketch follows this list).
9. Write a wordcount program to demonstrate the use of Map and Reduce tasks.
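For exercise 8 above, a minimal sketch of talking to HDFS through the Hadoop FileSystem Java API is shown below. The file path and the single-node cluster address are illustrative assumptions; the real values come from your core-site.xml.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath
        // (e.g. hdfs://localhost:9000 on a one-node cluster).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/hduser/sample.txt"); // hypothetical path

        // Write a small file into HDFS (overwrite if it already exists).
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeBytes("hello hdfs\n");
        }

        // Read it back and print the contents.
        try (BufferedReader in =
                new BufferedReader(new InputStreamReader(fs.open(file)))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}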
TOTAL: 45 PERIODS
OUTCOMES:
At the end of the course, the student should be able to:
Use the grid and cloud tool kits.
Design and implement applications on the Grid.
Design and Implement applications on the Cloud.
LIST OF EQUIPMENT FOR A BATCH OF 30 STUDENTS:
SOFTWARE:
Globus Toolkit or equivalent; Eucalyptus or Open Nebula or equivalent
HARDWARE:
Standalone desktops 30 Nos
APACHE HADOOP ECOSYSTEM
Hadoop Distributed File System: HDFS, the storage layer of Hadoop, is a distributed,
scalable, Java-based file system adept at storing large volumes of unstructured
data.
MapReduce: MapReduce is a software framework that serves as the
compute layer of Hadoop. MapReduce jobs are divided into two (obviously named)
parts. The “Map” function divides a query into multiple parts and processes
data at the node level. The “Reduce” function aggregates the results of the
“Map” function to determine the “answer” to the query.
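To make the split between the two functions concrete, below is a sketch of the classic WordCount job using the org.apache.hadoop.mapreduce API; this is the standard textbook example (and the kind of program asked for in lab exercise 9), not code taken from this post.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: split each input line into words and emit (word, 1) pairs.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reduce: sum the counts emitted for each word to produce the final answer.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged as a jar, it would typically be submitted as: hadoop jar wordcount.jar WordCount <input dir> <output dir> (the jar name and paths are examples).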
Hive: Hive is a Hadoop-based data warehousing-like framework
originally developed by Facebook. It allows users to write queries in a
SQL-like language called HiveQL, which are then converted to MapReduce. This
allows SQL programmers with no MapReduce experience to use the warehouse and
makes it easier to integrate with business intelligence and visualization tools
such as MicroStrategy, Tableau, Revolution Analytics, etc.
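A minimal sketch of running a HiveQL query from Java over JDBC is shown below; it assumes a HiveServer2 instance on the default port 10000 and a hypothetical table named words, neither of which comes from this post.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; the URL assumes a local server on the default port.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = con.createStatement();
             // Looks like SQL, but Hive compiles it down to MapReduce work.
             ResultSet rs = stmt.executeQuery(
                     "SELECT word, COUNT(*) FROM words GROUP BY word")) { // 'words' is hypothetical
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}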
Pig: Pig is a high-level scripting platform for Apache Hadoop, originally developed by Yahoo, and is relatively easy to learn. It enables data workers to write complex data transformations without knowing Java. Pig's simple SQL-like scripting language is called Pig Latin and appeals to developers already familiar with scripting languages and SQL.
HBase: HBase is a non-relational database that allows for
low-latency, quick lookups in Hadoop. It adds transactional capabilities to
Hadoop, allowing users to conduct updates, inserts and deletes. eBay and
Facebook use HBase heavily.
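As a sketch of such a low-latency lookup, the snippet below uses the HBase Java client API (1.x style) to put and get a single cell. It assumes a table named student with a column family info has already been created (for example from the HBase shell); those names are illustrative, not taken from this post.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseQuickLookup {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("student"))) {
            // Insert (or update) one row.
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("alice"));
            table.put(put);

            // Low-latency point lookup by row key.
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }
}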
Flume: Apache Flume is a tool/service/data-ingestion mechanism for collecting, aggregating, and transporting large amounts of streaming data, such as log data and events, from various web servers to a centralized data store. It is a highly reliable, distributed, and configurable tool that is principally designed to transfer streaming data from various sources to HDFS.
Oozie: Oozie is a workflow processing system that lets users
define a series of jobs written in multiple languages, such as MapReduce, Pig
and Hive, and then intelligently link them to one another. Oozie allows users to
specify, for example, that a particular query is only to be initiated after
specified previous jobs on which it relies for data are completed.
Ambari: Ambari is a web-based set of tools for deploying,
administering and monitoring Apache Hadoop clusters. Its development is being
led by engineers from Hortonworks, which includes Ambari in its Hortonworks
Data Platform.
Avro: Avro is a data serialization system that allows for
encoding the schema of Hadoop files. It is adept at parsing data and performing
remote procedure calls.
Mahout: Apache Mahout is a project of the Apache Software
Foundation to produce free implementations of distributed or otherwise scalable
machine learning algorithms, focused primarily on the areas of collaborative
filtering, clustering and classification. Many of the implementations use the
Apache Hadoop platform.
Sqoop: Sqoop is a connectivity tool for moving data from
non-Hadoop data stores – such as relational databases and data warehouses –
into Hadoop. It allows users to specify the target location inside of Hadoop
and instruct Sqoop to move data from Oracle, Teradata or other relational
databases to the target.
HCatalog: HCatalog is a centralized metadata management and sharing
service for Apache Hadoop. It allows for a unified view of all data in Hadoop
clusters and allows diverse tools, including Pig and Hive, to process any data
elements without needing to know physically where in the cluster the data is stored.
BigTop: BigTop is an effort to create a more formal process or
framework for packaging and interoperability testing of Hadoop's sub-projects
and related components, with the goal of improving the Hadoop platform as a whole.
R is a programming language and software environment for statistical
analysis, graphics representation and reporting. R was created by Ross
Ihaka and Robert Gentleman at the University of Auckland, New Zealand,
and is currently developed by the R Development Core Team. R is freely available under the GNU General Public License, and
pre-compiled binary versions are provided for various operating systems
like Linux, Windows and Mac. The language was named R partly after the first letters of its authors' first names (Robert Gentleman and Ross Ihaka) and partly as a play on the name of the Bell Labs language S.
Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management technology.
YARN is one of the key features in the second-generation Hadoop 2
version of the Apache Software Foundation's open source distributed
processing framework. Originally described by Apache as a redesigned
resource manager, YARN is now characterized as a large-scale,
distributed operating system for big data applications.