Forum: R in Action

Thread: Chapter 1 comments

Replies: 2 - Last Post: Jan 24, 2010 5:47 PM by: robert.kabacoff
esawdust

Posts: 19
From: Colorado, USA
Registered: 8/22/09
Chapter 1 comments
Posted: Aug 22, 2009 2:18 PM

Overall impression: I thought Chapter 1 did a pretty good job of introducing R in a way that was informative without being overwhelming. As a software engineer, I look for two things when I first open a book: 1) does it address my questions (or at least show some promise of that), and 2) how quickly can I get my hands on the code (both installing it and looking at the software)?

In both cases, I think chapter 1 met those expectations.

I like the quick move towards "tips" in the form of the Common Mistakes. That's a great thread to carry through the entire book, and those callouts can be really useful time-savers.

Section 1.3.4 talks about using R interactively vs. from a script file, but the batch (CLI) form of running a script isn't presented until 1.5. It would be good to show a quick one-line CLI form of running the script in 1.3.4 and go into more detail later.

Questions I had about Chapter 1 that I think would be good to address:

1) What kind of data scale is R capable of? (Megabyte, Gigabyte, Terabyte? - just trying to get a basic sense of how far it can go.) Rationale - why would I want to invest my time into R if it will fall over after a threshold I know I'll hit?

I come at it from the standpoint that I have 5 years of vehicle location data (latitude, longitude, speed, altitude, direction, and other sensor data), with data points collected every 3 seconds, stuffed in a database. I want to analyze things like proximity to various points of interest (fire stations, etc.), speed at various times of the year, and so on.

I also have years' worth of wind speed and direction data, collected every few seconds from instruments I made and installed at my home.

Will R be able to handle that much data? If not, why bother learning R? Answering this question of scale in the first chapter would be critical to hooking me as a buyer of the book.

2) Conversely, what limitations does it have compared to other common stats packages, commercial or open source? Why would I be better off buying Matlab instead of investing time into R? Perhaps some hierarchical diagram showing where R sits compared to the big gorillas in the space would be beneficial.

3) Why would I not want to use R? I know this seems an odd question, but it also helps bring out the reasons to use it. If the reason not to use R is a problem I don't have, then I can more easily say "Ya, R is right for me."

Anyway, those are some comments and questions I had after an initial reading of Chapter 1.

Thanks,

Landon

robert.kabacoff

Posts: 74
Registered: 8/3/09
Re: Chapter 1 comments
Posted: Aug 22, 2009 4:01 PM   in response to: esawdust

Hi Landon,

These are great questions!

Let's talk about data size first. Out of the box, R keeps all objects in memory, so you are limited by how much RAM you have. I typically use R on datasets that have approximately 500,000 records and about 500 variables (on a 3-year-old Windows XP machine with 2 GB of RAM) and have not had difficulties.
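
To put "limited by memory" in rough numbers, here is a back-of-the-envelope sketch (in Python, purely for the arithmetic) that sizes the vehicle data described in the question. The single vehicle, the five numeric columns, and the 365-day year are assumptions for illustration only; 8 bytes per value corresponds to R's double-precision numeric type, and per-object overhead is ignored.

```python
# Back-of-the-envelope size of a numeric table held fully in memory.
# Assumptions (not from the thread): one vehicle, 365-day years,
# 5 numeric columns stored as 8-byte doubles, no per-object overhead.

SECONDS_PER_DAY = 24 * 60 * 60


def row_count(years, sample_interval_s, days_per_year=365):
    """Number of samples taken at a fixed interval over the given span."""
    return years * days_per_year * SECONDS_PER_DAY // sample_interval_s


def in_memory_bytes(rows, columns, bytes_per_value=8):
    """Raw size of a rows x columns table of fixed-width values."""
    return rows * columns * bytes_per_value


rows = row_count(years=5, sample_interval_s=3)
size = in_memory_bytes(rows, columns=5)
print(f"{rows:,} rows, about {size / 2**30:.2f} GiB")
```

Under those assumptions a single vehicle comes to 52,560,000 rows and roughly 1.96 GiB of raw values. The figure scales linearly with the number of vehicles and columns, so a whole fleet would likely need either more RAM or one of the out-of-memory approaches.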

For really large problems, with terabytes of data, you would want to use one of the contributed packages aimed at large datasets. The CRAN page http://cran.r-project.org/web/views/HighPerformanceComputing.html provides numerous pointers. In particular, the packages ff, bigmemory, and biglm allow you to analyze huge problems. However, I have limited experience analyzing such large problems myself.

How about R's limitations? There are two that stand out.

The first is that R is really an interactive programming language. To get the most out of it, you have to be willing to write code. There are GUIs available (which I talk about in an appendix), but they will not give you the FULL power of the language.

The second limitation is one that I see in the business world. R does not create pretty tables in RTF or DOC format (although there are several packages that can help you output your work as HTML files). I have to create pretty reports for clients in MS Word on a weekly basis, and find that I have to import the graphics and reformat the tables to make them acceptable to our clients. Programs like SAS and SPSS aren't great, but they are better than R at this.

So why would you want to use R?

At the risk of hyperbole, it is the best graphics program around, and no one beats it for cutting-edge statistical methods. It is probably the de facto international standard in academic circles. And it is free and extensible, with a very active user support group.

Hope this helps.

Rob Kabacoff

robert.kabacoff

Posts: 74
Registered: 8/3/09
Re: Chapter 1 comments
Posted: Jan 24, 2010 5:47 PM   in response to: esawdust

Hi Landon,

I just wanted to let you know that I am adding an Appendix specifically on analyzing large (gigabyte and terabyte) data problems.

Thanks for the question!

Sincerely,

Rob
