Home » Main » Manning Forums » 2011 » Machine Learning in Action

Thread: Errors and Corrections

Replies: 82 - Last Post: Apr 7, 2014 1:38 AM by: Borz
jrgauthier

Posts: 1
Registered: 12/13/11
Re: Errors and Corrections
Posted: Dec 13, 2011 12:51 PM   in response to: mato

On page 13, "Mathmatica" should be Mathematica.

peter.harrington

Posts: 80
Registered: 2/13/11
Re: Errors and Corrections
Posted: Dec 13, 2011 12:56 PM   in response to: jrgauthier

Thanks, it's fixed!

initionSteve

Posts: 2
From: London, UK
Registered: 12/17/11
Re: Errors and Corrections
Posted: Dec 17, 2011 5:31 AM   in response to: peter.harrington

On page 27 you say "We have to explicitly tell the interpreter that we would like the integer version of the last item in the list or it will give us the string version. Usually, you would have to do this but NumPy takes care of those details for you."

I'm not sure which details you're talking about. The logical conclusion would be the cast to int, but that's something that NumPy *isn't* doing for me, right?

Also, the way that you are using "from numpy import *" violates the Python guidelines. I realise that you are more interested in teaching machine learning than Python, but it's still bad practice.

If you used a simple "import numpy" you would be both helping people with Python and highlighting in the code where you are using NumPy. This would help people understand exactly how useful NumPy is!

peter.harrington

Posts: 80
Registered: 2/13/11
Re: Errors and Corrections
Posted: Dec 19, 2011 8:00 PM   in response to: initionSteve

Hi Steve, thanks for your input.
About the first item: I think my wording is unclear, and I will try to fix that.

About the second suggestion: you are right, I am more interested in teaching machine learning than in following the Python guidelines. Other people have pointed out that I use camelCase rather than snake_case; I have my reasons for doing that. Another thing I have to do as an author is fit the code on one page. If I followed the guidelines, the code would look like this:
import numpy
jj = numpy.eye(3)
numpy.shape(jj)

I would have to type an extra "numpy." on every NumPy function call. Now consider the same code with a wildcard import:
from numpy import *
jj = eye(3)
shape(jj)
That's six fewer characters per NumPy call. I have seen people do the import as:
import numpy as np
With that alias, only three extra characters ("np.") are needed per NumPy call.
When I write "production" code in Python (if there is such a thing), I import only the functions I need. In the trivial example above I would use:
from numpy import eye, shape
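
To make the trade-off concrete, here is a small check that the import styles discussed above all reach the same NumPy functions; only the way the name is looked up differs. It is a sketch for illustration, not code from the book.

```python
import numpy
import numpy as np
from numpy import eye, shape

# All three spellings refer to the very same function object.
same_eye = numpy.eye is np.eye is eye

jj = eye(3)          # 3x3 identity matrix
dims = shape(jj)     # dimensions of jj as a tuple

print(same_eye, dims)
```

The explicit forms cost a few characters per call but show at a glance which names come from NumPy.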

initionSteve

Posts: 2
From: London, UK
Registered: 12/17/11
Re: Errors and Corrections
Posted: Dec 20, 2011 10:37 AM   in response to: peter.harrington

No worries - I thought that might be the reason.

Also, on page 67:
We have an equation for the probability of a piece of data belonging to Class 1 (the circles): p1(x, y), and we have an equation for the belonging to Class 2 (the triangles)

peter.harrington

Posts: 80
Registered: 2/13/11
Re: Errors and Corrections
Posted: Dec 20, 2011 12:05 PM   in response to: initionSteve

Thanks for catching that page 67 error; the technical proofer didn't even flag it.

Nabatzis

Posts: 2
From: United States
Registered: 1/22/12
Re: Errors and Corrections
Posted: Jan 22, 2012 11:48 AM   in response to: mato

I recently (a few days ago) started going through MLiA and would like to offer a couple of suggestions and also point out a possible error. Perhaps the suggestions could be better placed in another thread, but since they are 'errors and corrections' related, I thought to place them here...

The possible error: on pg. 27 you prompt us to run the file2Matrix method with 'datingTestSet.txt' as the argument. No such file is present in the code zip that I downloaded today (2012-01-22); there was only a 'datingTestSet2.txt' (notice the 2).

When I run the method, the output I get for the datingLabels variable does not match the one presented a few lines further down. Mine looks like [3, 2, 1, 1, 1, 1, 3, 3, 1, 3, 1, 1, 2, 1, 1, 1, 1, 1, 2, 3]. From that point on I cannot follow, and I am only on page 30 :-(

Now for my suggestion. You mention on page 2 that the manuscript is version 7, which is fine but not very informative. It would be more helpful if you mentioned the date it was put together or, more importantly, the date of the last 'errors and corrections' it addresses. Such a date "stamp" would be very much appreciated for the code as well. If I missed any such information, I humbly apologize.

Thank you very much for your efforts and happy new year !!!

V/R
Nikolaos

P.S. I just noticed that the said file (datingTestSet2.txt) has four (4) entries, not three (3) as explained in the text.
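
One possible explanation for the mismatch described above is that one data file stores the last column as an integer code while the book's printed output uses text labels. That is an assumption, not something confirmed in this thread. A loader can accept either form; the sketch below uses a hypothetical helper parse_label, a hypothetical label-to-code mapping, and made-up sample rows.

```python
# Hypothetical mapping from text labels to integer codes (an assumption).
LABEL_CODES = {"didntLike": 1, "smallDoses": 2, "largeDoses": 3}

def parse_label(field):
    """Return an integer class label from either '3' or a text label."""
    field = field.strip()
    if field.isdigit():
        return int(field)          # explicit cast: the file gives us a string
    return LABEL_CODES[field]      # text label: look up its code

# Made-up tab-separated rows; the last field is the class label.
rows = [
    "1000\t2.5\t0.7\t3",
    "2000\t7.1\t1.6\tsmallDoses",
]
labels = [parse_label(row.split("\t")[-1]) for row in rows]
print(labels)
```

Either way the cast to int has to be written explicitly; reading a text file yields strings.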

Message was edited by:
Nabatzis

Nabatzis

Posts: 2
From: United States
Registered: 1/22/12
Re: Errors and Corrections
Posted: Jan 24, 2012 9:49 AM   in response to: peter.harrington in response to: peter.harrington
  Click to reply to this thread Reply

P.S. I just noticed that the said file (datingTestSet2.txt) has four (4) entries, not three (3) as explained in the text. On GitHub both files are present, although they both have the same contents!
Has anybody been able to go through the examples in the book??? So far it is very disheartening.

Message was edited by:
Nabatzis

petomhay

Posts: 3
From: Christchurch
Registered: 3/1/12
Re: Errors and Corrections
Posted: Mar 1, 2012 3:24 AM   in response to: mato

A few points from reading the first two chapters of the production version.
I'm using Python 2.7.2 rather than 3 since it fits the examples best.

p22: Importing kNN does execute the module, including its ‘from numpy import *’ line, but the names that line brings in are bound inside the kNN module, not in the interactive session. That line will still need to be typed manually at the prompt before you can proceed.

p27: The data file in Ch2, given in the archive, is called ‘datingTestSet2.txt’, not ‘datingTestSet.txt’. The given file has the label item as an integer, rather than text. A line of your text indicates that the data should be integer - ‘You have to explicitly tell the interpreter that you’d like the integer version of the last item in the list, or it will give you the string version.’ However, the example shows text values, like ‘didntLike’, ‘smallDoses’, etc.

p29-30: Figure 2.4 shows a plot of columns 1 vs 2, as does Figure 2.3 (which lacks colour). However, the text preceding Figure 2.4 says that Figure 2.4 is a plot of columns 0 vs 1 and shows three coloured regions. I think the text should say what Figure 2.4 is, and then say ‘Figure 2.5 is a plot of columns 0 vs 1' at this point.

Minor points, but important to get it right at the beginning or it confuses people.
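
The p22 point about imports can be checked without any files. The sketch below builds a hypothetical stand-in module (knn_demo, with math.sqrt standing in for the NumPy wildcard import) and simulates the interactive session with a plain dictionary; none of these names come from the book.

```python
import sys
import types

# Source of a hypothetical stand-in for kNN.py.
module_source = """
from math import sqrt   # stands in for 'from numpy import *'

def distance(a, b):
    return sqrt((a - b) ** 2)
"""

# Build the module in memory and register it so 'import knn_demo' works.
demo = types.ModuleType("knn_demo")
exec(module_source, demo.__dict__)
sys.modules["knn_demo"] = demo

# Simulate an interactive session that only runs 'import knn_demo'.
session = {}
exec("import knn_demo", session)

module_has_sqrt = hasattr(demo, "sqrt")    # name exists inside the module
session_has_sqrt = "sqrt" in session       # but not in the session itself
print(module_has_sqrt, session_has_sqrt)
```

The module's own functions can use sqrt, yet typing sqrt at the simulated prompt would fail until the session imports it too.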

petomhay

Posts: 3
From: Christchurch
Registered: 3/1/12
Re: Errors and Corrections
Posted: Mar 1, 2012 12:07 PM   in response to: petomhay

A couple more from Chapter 2.

p32: Listing 2.3 has a typo. In the line
normDataSet = zeros(shape(dataSet))
the argument to zeros needs to be a tuple, so add brackets:
normDataSet = zeros( (shape(dataSet)) )

p32: The comment at the bottom of the page is a bit confusing. We are not doing matrix division, so you just need to say that / does element-wise division.
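
Both p32 points are easy to test in isolation. As far as NumPy's documented behavior goes, shape() already returns a tuple, so the two zeros calls quoted above should be equivalent, and / on arrays divides element by element. The sketch below uses a made-up three-row data set and snake_case names that are not from the book.

```python
import numpy as np

# Made-up data: three samples, two features.
data_set = np.array([[10.0, 2.0],
                     [20.0, 4.0],
                     [30.0, 8.0]])

dims = np.shape(data_set)                      # already a tuple
plain = np.zeros(np.shape(data_set))
bracketed = np.zeros((np.shape(data_set)))     # extra parentheses only group

# Min-max normalisation: / divides element-wise, broadcasting over rows.
min_vals = data_set.min(axis=0)
ranges = data_set.max(axis=0) - min_vals
norm_data_set = (data_set - min_vals) / ranges

print(dims, plain.shape, bracketed.shape)
print(norm_data_set)
```

If zeros(shape(dataSet)) raises an error in a given session, the cause is likely elsewhere, for example a missing NumPy import.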

petomhay

Posts: 3
From: Christchurch
Registered: 3/1/12
Re: Errors and Corrections
Posted: Mar 13, 2012 4:09 AM   in response to: petomhay

A few more comments:

p49: Fig 3.2. The ‘No’ outcome of ‘No Surfacing?’ is labelled with a subtable that has headings of ‘No surfacing?’ and ‘Fish?’ I think the ‘No surfacing?’ heading should be ‘Flippers?’

p56: Output from treePlotter.retrieveTree(1) contains the key 'surfacing' which is not in the data defined in the function. Perhaps the function has changed since typing the example output?

p56: The para before listing 3.7 says you may have a version of treePlotter(). It would have been clearer to say a version of treePlotter.py.

p84: localWords has input parameters feed1 and feed0, but the output probabilities are in the reverse order p0V and p1V. Also, Listing 4.6 does not print the 'classification error', as shown in the code file and in the text on pages 81-82.

p94: Why do you have to set X0? Why is it set at 1.0?
What is the 0th feature? Sounds like it should be called the 0/1 feature.

p95: I appreciate the comments about interpreting the compact python operations. This is where some notation is needed and I suspect most readers will be unfamiliar with it. I'll just take it on trust.

p95: Must include the line
from numpy import *
either in the code or in the interactive session

p96: The value of 'weights' is unknown when plotBestFit is called, because the interactive session on p95 does not assign 'weights'. If 'weights' is assigned and used in the call to plotBestFit, an error is raised on the line
weights = wei.getA()
saying that the 'ndarray' object has no attribute 'getA'.
If I change the code to
weights = wei
it runs OK.
Looks as if wei cannot provide an address when used as a function parameter.

p97: Listing 5.3. Cannot use 'array' here without first having imported numpy to the interactive shell.

p100: The constant of 0.01 in listing 5.4 is actually 0.001 in the code file for chapter 5. Results do not look as good as for logistic regression.

Chapter 6 comes as a bit of a shock for me. Terminology and ideas are strange. I don't think I'm prepared for it but we'll see how it goes.
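
On the p96 error: getA() is a method of NumPy's matrix type, and a plain ndarray does not have it, which matches the message quoted above. A minimal sketch of a conversion that accepts either type follows; as_plain_array is a hypothetical helper and the weight values are made up.

```python
import numpy as np

def as_plain_array(wei):
    """Return the weights as a plain ndarray, whatever array type came in."""
    return np.asarray(wei)

wei_matrix = np.matrix([[4.0], [0.5], [-0.6]])   # made-up weights as a matrix
wei_array = np.array([4.0, 0.5, -0.6])           # same values as an ndarray

matrix_has_getA = hasattr(wei_matrix, "getA")    # matrix objects provide getA
array_has_getA = hasattr(wei_array, "getA")      # plain ndarrays do not

from_matrix = as_plain_array(wei_matrix)         # ndarray of shape (3, 1)
from_array = as_plain_array(wei_array)           # ndarray of shape (3,)
print(matrix_has_getA, array_has_getA, from_matrix.shape, from_array.shape)
```

Which type a given training function returns is not stated in this thread, so converting at the point of use is one way to stay independent of it.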

peter.harrington

Posts: 80
Registered: 2/13/11
Re: Errors and Corrections
Posted: Mar 20, 2012 6:54 PM   in response to: Nabatzis

Hi Nabatzis,
Someone has gone through all the code in the book and I have made changes so that it all should be working. You should be getting the final version of the e-book today or tomorrow. All the examples there should work. If they are not working, please post here the specific error you are getting and we can try to debug it.

Frederic

Posts: 8
From: Tokyo
Registered: 4/9/12
Re: Errors and Corrections
Posted: Apr 9, 2012 6:07 AM   in response to: mato

In Chapter 4, Naive Bayes classifier: the code trainNB0 handles both the set-of-words and bag-of-words models in the same way, but in the set-of-words case the probability P(W|C) seems wrong. If we are interested in whether a word W can appear in a document in class C, the probability should be the count of documents containing W in class C, divided by the count of documents in class C. The code uses the number of words in the documents in class C as the denominator instead. Is this a mistake or a different distribution?

The trick to prevent probabilities of zero for rare words is similar to additive smoothing (http://en.wikipedia.org/wiki/Additive_smoothing), but the latter would add numWords to the denominators instead of just 2. Is this an error, or a different method?

In Chapter 5, Logistic Regression: in loadDataSet(), when building dataMat, why is there an X0? On page 91, the book states that X0 has value zero, while it is 1.
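
The three estimates being compared in the two paragraphs above can be put side by side on a toy example. This is a sketch under stated assumptions: the document vectors are made up, and the "as described" variant assumes word counts start at 1 with 2 added to the denominator, which is how the post characterizes the code, not a verified copy of trainNB0.

```python
import numpy as np

# Made-up set-of-words vectors for three documents of one class
# (rows are documents, columns are vocabulary words).
docs = np.array([[1, 0, 1, 0],
                 [1, 1, 0, 0],
                 [1, 0, 0, 0]])
num_docs, num_words = docs.shape

word_counts = docs.sum(axis=0)    # per-word counts: [3, 1, 1, 0]
total_words = docs.sum()          # all word occurrences in the class: 5

# Variant as described in the post: word totals in the denominator, plus 2.
p_as_described = (word_counts + 1) / (total_words + 2)

# Document-frequency estimate: documents containing the word over
# documents in the class; +2 covers the two outcomes present/absent.
p_doc_frequency = (word_counts + 1) / (num_docs + 2)

# Additive smoothing over words: add the vocabulary size to the denominator.
p_additive = (word_counts + 1) / (total_words + num_words)

print(p_as_described.sum(), p_additive.sum())
print(p_doc_frequency)
```

With additive smoothing the per-word probabilities sum to 1 over the vocabulary, while the "+2" variant sums to more than 1 here. The document-frequency estimate is a per-word presence probability, so it is not expected to sum to 1.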

Thanks, I really enjoy reading this book.

Frederic

Frederic

Posts: 8
From: Tokyo
Registered: 4/9/12
Re: Errors and Corrections
Posted: Apr 9, 2012 6:58 PM   in response to: Frederic

Hi,

In Chapter 5: OK, I think I see why there's an X0: it is there to compute the constant for the separating hyperplane.

But the separating hyperplane concept comes in Chapter 6, so perhaps it would be good to introduce the idea in Chapter 5 instead?

Thanks,
Frederic

peter.harrington

Posts: 80
Registered: 2/13/11
Re: Errors and Corrections
Posted: Apr 9, 2012 7:47 PM   in response to: Frederic

Hi Frederic,
Yes, the X0 term is used to compute the constant value.
I'm glad you enjoyed reading the book.
Peter
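
The role of X0 discussed above can be shown numerically: with X0 fixed at 1.0, the first weight acts as the constant term, so one dot product covers both the constant and the feature weights. The weights and feature values below are made up for illustration.

```python
import numpy as np

weights = np.array([4.0, 0.5, -0.6])   # made-up w0, w1, w2
x1, x2 = 1.2, 3.0                      # made-up feature values

# With X0 = 1.0 prepended, one dot product includes the constant w0.
x = np.array([1.0, x1, x2])
with_x0 = weights @ x

# The same quantity written out with an explicit constant term.
explicit = weights[0] + weights[1] * x1 + weights[2] * x2

# Decision boundary: points where w0 + w1*x1 + w2*x2 = 0, solved for x2.
x2_on_boundary = -(weights[0] + weights[1] * x1) / weights[2]
boundary_value = weights @ np.array([1.0, x1, x2_on_boundary])

print(with_x0, explicit, boundary_value)
```

Setting X0 to 0 instead would drop w0 from the sum and force the boundary through the origin.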
