 

Storage Clouds: Private versus Public, Security and Policies


 


Attendance = ~25 People.

 

 

Private versus Public storage – How defined

Early clouds even before Amazon S3.

Offered as a service, but customers were concerned about security.

Customers wanted software, but it was a service, not shrink-wrapped software.

Quick overview of ParaScale from Cameron Bahar, and why it was created (demand from Scale8 customers who wanted to install it inside the firewall).

ParaScale is about shrink-wrapped software

Software on commodity Linux servers, including your existing servers

Federated servers

ParaScale was built with several design goals in mind:

Transparency of data: we manage the data so the user doesn’t have to.

Ability to create tiers within clouds for cost or data locality operations

Self-* (self-star) support

·         Self-managing = cloud handles most operations (replication, capacity balancing, etc.)

·         Self-discovery = no need to administer node-by-node. New nodes are added automatically and configuration is centralized

·         Self-healing = Failures are transparent to applications and users. IP addresses move to surviving nodes, data is re-replicated to satisfy policies, etc.

Economies of scale based on Moore’s law. Leveraging commodity hardware?
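A minimal sketch of the self-healing idea above (re-replicating data after a node failure so the copy-count policy is met again). This is not ParaScale's implementation. The function name, the chunk-to-node map and the "least-loaded node first" rule are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def plan_rereplication(
    placement: Dict[str, Set[str]],
    live_nodes: Set[str],
    copies: int,
) -> List[Tuple[str, str, str]]:
    """Return (chunk, source_node, target_node) copy tasks that restore `copies` replicas."""
    # Count chunks held by each live node so new copies go to the emptiest node first.
    load = {node: 0 for node in live_nodes}
    for holders in placement.values():
        for node in holders & live_nodes:
            load[node] += 1

    tasks = []
    for chunk in sorted(placement):
        survivors = placement[chunk] & live_nodes
        if not survivors:
            continue  # no surviving replica, so there is nothing to copy from
        source = min(survivors)
        holders = set(survivors)
        while len(holders) < copies:
            # Hypothetical rule: pick the least-loaded live node that lacks this chunk.
            candidates = sorted(live_nodes - holders, key=lambda n: (load[n], n))
            if not candidates:
                break  # not enough live nodes to satisfy the policy
            target = candidates[0]
            tasks.append((chunk, source, target))
            holders.add(target)
            load[target] += 1
    return tasks


# Node n2 has failed and n4 has just been added.
placement = {"c1": {"n1", "n2"}, "c2": {"n2", "n3"}}
tasks = plan_rereplication(placement, live_nodes={"n1", "n3", "n4"}, copies=2)
```

Here the plan copies c1 from n1 to the empty node n4 and c2 from n3 to n1, so each chunk is back to two copies without the application being involved.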

A number of questions about ParaScale from the group:

Do you replicate files or blocks?

ParaScale replicates chunks.

Why address the problem from a file perspective rather than a block one (Amazon EBS block store, for example)?

Growth is in unstructured data and hence files

Doing blocks is simple and punts on the issues of shared storage for lots of users, security, etc. Block access = single client access at a time

General requirements are different

 

How much CPU is used?

80% idle for archival, less for streaming.

Questions about ParaScale metadata loss impact

Control nodes are HA pairs and metadata can be recreated from the storage nodes if both control nodes fail.

How does a user get access?

We present virtual file systems that are accessible via standard NFS (no custom clients), HTTP or FTP.

 

 

Security Discussion between Private and Public clouds:

Public = Challenging to have different security models from S3, Nirvanix

What are the use cases we are talking about (regarding security)?

One example given: ISP with millions of customers each with their own virtual instance.

How does a public cloud deal with user security mgmt. It is unlikely they will plug into existing enterprise options (AD / LDAP)?

No real answers from the audience. There was a hope that a standard API or the ability to leverage existing methods would emerge.

What about Private clouds?

Private clouds plug into existing models.

The discussion moved to other topics including:

What about the defensible space on every node (physical access)?

Tough for both as a rogue admin can screw you.

“With a private cloud you can find out who it was and kill them?”

“Adding nodes requires access”

“Things that fail go into a big shredder”

“For private clouds data is chunked so stealing a single server requires putting the pieces together.”

“How about public, things can be subpoenaed and you may not even know it”

“If they know they have it, if not it could be a good place to hide your data.”

“Even more challenging with DOD letters as the provider must find the data or give everything up.”

“Least identity the better”, “What about going with two vendors – one for block and one for file system.”

“You can always make it more complicated (again hiding the data).”

 

 

Policy for data classification discussion:

How do you like policies: simple templates (number of copies, by file type, aging of files), i.e. the meat axe, or fine-tuning with lots of arguments, i.e. the scalpel?

Meat axe is more popular. Provide a CLI for scalpel operations.

“I suggest using client apps to help classify data. Tagging … "Top secret" … translates into policy”
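A rough sketch of what a "meat axe" template could look like: a short ordered list of rules keyed on file type and file age, each giving a number of copies. The `PolicyRule` and `copies_for` names, the first-match-wins ordering and the default of two copies are assumptions for illustration, not a description of any product's policy engine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PolicyRule:
    copies: int
    suffix: Optional[str] = None        # match by file type; None matches any type
    min_age_days: Optional[int] = None  # match files at least this old; None matches any age


def copies_for(name: str, age_days: int, rules: List[PolicyRule], default: int = 2) -> int:
    """Return the copy count from the first rule that matches, else the default."""
    for rule in rules:
        if rule.suffix is not None and not name.lower().endswith(rule.suffix):
            continue
        if rule.min_age_days is not None and age_days < rule.min_age_days:
            continue
        return rule.copies
    return default


# Hypothetical template: old files drop to one copy, database files get three.
RULES = [
    PolicyRule(copies=1, min_age_days=365),
    PolicyRule(copies=3, suffix=".db"),
]
```

With these rules a ten-day-old `orders.db` gets three copies, the same file at 400 days drops to one because the aging rule comes first, and anything unmatched keeps the default. The "scalpel" would be per-file overrides layered on top, for example through a CLI.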

 

 

Somebody asked about ParaScale metadata scalability.

Based on usage. Cloud is a good fit for lots of large files.

How do you deal with billions of thumbnails?

Cloud offerings don’t. Most use a custom block device that is costly.

Others change the application to map characteristics to the system (bundle files for streaming I/O versus lots of random I/O).

“Hadoop has this functionality for dealing with small files (tunable block sizes as well)”
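One way to picture the "bundle files" approach mentioned above is to pack many small objects into a single archive so the storage system sees one large sequential object instead of many tiny ones. The sketch below uses Python's standard `tarfile` module. The `bundle` and `unbundle` helpers are hypothetical names, and this is not how any particular cloud or Hadoop does it.

```python
import io
import tarfile
from typing import Dict


def bundle(files: Dict[str, bytes]) -> bytes:
    """Pack many small files into one uncompressed tar archive held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


def unbundle(blob: bytes) -> Dict[str, bytes]:
    """Read every regular file back out of a bundle produced by bundle()."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar.getmembers():
            extracted = tar.extractfile(member)
            if extracted is not None:
                out[member.name] = extracted.read()
    return out


thumbs = {"a.jpg": b"1", "b.jpg": b"22"}
blob = bundle(thumbs)
```

The trade-off is that reading one thumbnail now means locating it inside the bundle, so the application needs its own index or has to accept reading the whole bundle.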

 

 

Data Protection

Where do you handle transactions (is memory the new disk)?

We have a number of enterprise legacy architects and we don’t trust memory. We ensure data is on the disk. We can’t acknowledge a write until it’s on disk.
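As a small illustration of "don't acknowledge a write until it's on disk", the sketch below only returns after `os.fsync` has asked the operating system to flush the file to the storage device. The `durable_write` name is hypothetical, and this is a single-node illustration rather than ParaScale's write path.

```python
import os


def durable_write(path: str, data: bytes) -> int:
    """Write data to path and return (acknowledge) only after fsync completes."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            # os.write may write fewer bytes than requested, so loop until done.
            written += os.write(fd, data[written:])
        os.fsync(fd)  # flush OS buffers for this file to the device before acknowledging
    finally:
        os.close(fd)
    return written
```

For a newly created file, a stricter version would also fsync the containing directory so the directory entry itself is durable. That step is left out here to keep the sketch short.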

“What about Tangosol or GigaSpaces?” – “Disk is the archival system for the memory”.

“Locality of chunks is important for availability - Being able to place copies in different locations / racks.”

With 10GbE = send out 5 copies

How big should the storage cloud be? Fault domains of a single cluster: build three 300-node clusters or a single 1000-node cluster?

“A few 300-node clouds are better than one 1000-node cloud”

“Can you test on a "super cloud" to scale test and for future deployments?” No.

What is the ROI of building inside versus Amazon numbers, given you can scale CPU separately from storage?
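The point about placing copies in different locations or racks can be sketched as a round-robin pick across racks, so that losing one rack takes out as few copies as possible. The `place_copies` function and the rack map are assumptions for illustration only.

```python
from typing import Dict, List


def place_copies(nodes_by_rack: Dict[str, List[str]], copies: int) -> List[str]:
    """Choose nodes round-robin across racks so copies span as many racks as possible."""
    racks = sorted(nodes_by_rack)
    total = sum(len(nodes) for nodes in nodes_by_rack.values())
    wanted = min(copies, total)  # cannot place more copies than there are nodes
    chosen: List[str] = []
    depth = 0
    while len(chosen) < wanted:
        # Take the depth-th node from each rack before going deeper in any rack.
        for rack in racks:
            nodes = nodes_by_rack[rack]
            if depth < len(nodes) and len(chosen) < wanted:
                chosen.append(nodes[depth])
        depth += 1
    return chosen


racks = {"r1": ["a", "b"], "r2": ["c"], "r3": ["d"]}
three = place_copies(racks, 3)
```

With three racks and three copies, each copy lands in a different rack. Asking for five copies on four nodes returns all four nodes, with the extra copy doubling up in the largest rack.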


Out of Time


Comments (1)

Coby Royer said

at 11:22 am on Jul 31, 2009

RE "How does a public cloud deal with user security mgmt. It is unlikely they will plug into existing enterprise options (AD / LDAP)?"

--This is exactly the approach taken by Symplified. We recognize that for Enterprises to adopt Cloud Computing they want to leverage existing Directories. Check out http://www.symplified.com. I've also blogged on this topic: http://blog.symplified.com/blog/bid/23768/Avoiding-Silos-in-the-Cloud .
