Algorithms of the Intelligent Web

ISBN: 1933988665, 978-1-933988-66-5

Size: 8 MB (8033829 bytes)

Pages: 369

Authors: Haralambos Marmanis, Dmitry Babenko


Table of contents:
Front Cover……Page 1
brief contents……Page 6
contents……Page 8
preface……Page 14
H. Marmanis……Page 17
D. Babenko……Page 18
about this book……Page 19
Roadmap……Page 20
Code Conventions……Page 22
About the cover illustration……Page 23
What is the intelligent web?……Page 24
1.1 Examples of intelligent web applications……Page 26
1.2 Basic elements of intelligent applications……Page 27
1.3.1 Social networking sites……Page 29
1.3.2 Mashups……Page 30
1.3.3 Portals……Page 31
1.3.5 Media-sharing sites……Page 32
1.3.6 Online gaming……Page 33
1.4.1 Examine your functionality and your data……Page 34
1.4.2 Get more data from the web……Page 35
1.5 Machine learning, data mining, and all that……Page 38
1.6 Eight fallacies of intelligent applications……Page 39
1.6.1 Fallacy #1: Your data is reliable……Page 40
1.6.5 Fallacy #5: Apply the same good library everywhere……Page 41
1.7 Summary……Page 42
1.8 References……Page 43
Searching……Page 44
2.1 Searching with Lucene……Page 45
2.1.1 Understanding the Lucene code……Page 47
2.1.2 Understanding the basic stages of search……Page 52
2.2 Why search beyond indexing?……Page 55
2.3 Improving search results based on link analysis……Page 56
2.3.1 An introduction to PageRank……Page 57
2.3.2 Calculating the PageRank vector……Page 58
2.3.4 Understanding the power method……Page 61
2.3.5 Combining the index scores and the PageRank scores……Page 66
2.4 Improving search results based on user clicks……Page 68
2.4.1 A first look at user clicks……Page 69
2.4.2 Using the NaiveBayes classifier……Page 71
2.4.3 Combining Lucene indexing, PageRank, and user clicks……Page 74
2.5.1 An introduction to DocRank……Page 78
2.5.2 The inner workings of DocRank……Page 80
2.6 Large-scale implementation issues……Page 84
2.7 Is what you got what you want? Precision and recall……Page 87
2.8 Summary……Page 88
2.9 To do……Page 89
2.10 References……Page 91
Creating suggestions and recommendations……Page 92
3.1 An online music store: the basic concepts……Page 93
3.1.1 The concepts of distance and similarity……Page 94
3.1.2 A closer look at the calculation of similarity……Page 99
3.1.3 Which is the best similarity formula?……Page 102
3.2.1 Recommendations based on similar users……Page 103
3.2.2 Recommendations based on similar items……Page 112
3.2.3 Recommendations based on content……Page 115
3.3.1 Introducing MyDiggSpace.com……Page 122
3.3.2 Finding friends……Page 123
3.3.3 The inner workings of DiggDelphi……Page 125
3.4.1 An introduction of movie datasets and recommenders……Page 130
3.4.2 Data normalization and correlation coefficients……Page 133
3.5 Large-scale implementation and evaluation issues……Page 138
3.7 To Do……Page 140
3.8 References……Page 142
Clustering: grouping things together……Page 144
4.1 The need for clustering……Page 145
4.1.1 User groups on a website: a case study……Page 146
4.1.2 Finding groups with a SQL order by clause……Page 147
4.1.3 Finding groups with array sorting……Page 148
4.2 An overview of clustering algorithms……Page 151
4.2.1 Clustering algorithms based on cluster structure……Page 152
4.2.2 Clustering algorithms based on data type and structure……Page 153
4.2.3 Clustering algorithms based on data size……Page 154
4.3.1 The dendrogram: a basic clustering data structure……Page 155
4.3.2 A first look at link-based algorithms……Page 157
4.3.3 The single-link algorithm……Page 158
4.3.4 The average-link algorithm……Page 160
4.3.5 The minimum-spanning-tree algorithm……Page 162
4.4.1 A first look at the k-means algorithm……Page 165
4.4.2 The inner workings of k-means……Page 166
4.5.1 Introducing ROCK……Page 169
4.5.2 Why does ROCK rock?……Page 170
4.6.1 A first look at density-based algorithms……Page 174
4.6.2 The inner workings of DBSCAN……Page 176
4.7.1 Computational complexity……Page 180
4.7.2 High dimensionality……Page 181
4.8 Summary……Page 183
4.9 To Do……Page 184
4.10 References……Page 185
Classification: placing things where they belong……Page 187
5.1 The need for classification……Page 188
5.2 An overview of classifiers……Page 192
5.2.1 Structural classification algorithms……Page 193
5.2.2 Statistical classification algorithms……Page 195
5.2.3 The lifecycle of a classifier……Page 196
5.3 Automatic categorization of emails and spam filtering……Page 197
5.3.1 NaïveBayes classification……Page 198
5.3.2 Rule-based classification……Page 211
5.4.1 A use case of fraud detection in transactional data……Page 222
5.4.2 Neural networks overview……Page 224
5.4.3 A neural network fraud detector at work……Page 226
5.4.4 The anatomy of the fraud detector neural network……Page 231
5.4.5 A base class for building general neural networks……Page 237
5.5 Are your results credible?……Page 242
5.6 Classification with very large datasets……Page 246
5.7 Summary……Page 248
5.8 To do……Page 249
Books and articles……Page 253
Combining classifiers……Page 255
6.1 Credit worthiness: a case study for combining classifiers……Page 257
6.1.1 A brief description of the data……Page 258
6.1.2 Generating artificial data for real problems……Page 262
6.2.1 The naïve Bayes baseline……Page 266
6.2.2 The decision tree baseline……Page 268
6.2.3 The neural network baseline……Page 270
6.3 Comparing multiple classifiers on the same data……Page 273
6.3.1 McNemar’s test……Page 274
6.3.2 The difference of proportions test……Page 276
6.3.3 Cochran’s Q test and the F test……Page 278
6.4 Bagging: bootstrap aggregating……Page 280
6.4.1 The bagging classifier at work……Page 281
6.4.2 A look under the hood of the bagging classifier……Page 283
6.4.3 Classifier ensembles……Page 286
6.5 Boosting: an iterative improvement approach……Page 288
6.5.1 The boosting classifier at work……Page 289
6.5.2 A look under the hood of the boosting classifier……Page 291
6.6 Summary……Page 295
6.7 To Do……Page 296
6.8 References……Page 300
Putting it all together: an intelligent news portal……Page 301
7.1 An overview of the functionality……Page 303
7.2.1 Get set. Get ready. Crawl the Web!……Page 304
7.2.2 Review of the search prerequisites……Page 305
7.2.3 A default set of retrieved and processed news stories……Page 307
7.3 Searching for news stories……Page 309
7.4 Assigning news categories……Page 311
7.4.1 Order matters!……Page 312
7.4.2 Classifying with the NewsProcessor class……Page 317
7.4.3 Meet the classifier……Page 318
7.4.4 Classification strategy: going beyond low-level assignments……Page 320
7.5 Building news groups with the NewsProcessor class……Page 323
7.5.1 Clustering general news stories……Page 324
7.5.2 Clustering news stories within a news category……Page 328
7.6 Dynamic content based on the user’s ratings……Page 331
7.7 Summary……Page 334
7.8 To do……Page 335
7.9 References……Page 339
A.1 What is BeanShell?……Page 340
A.4 References……Page 341
B.1 An overview of crawler components……Page 342
B.1.2 Our simple crawler……Page 343
B.1.3 Open source web crawlers……Page 344
B.2 References……Page 345
C.1 Vectors and matrices……Page 346
C.2 Measuring distances……Page 347
C.4 References……Page 349
appendix D: Natural language processing……Page 350
D.1 References……Page 352
appendix E: Neural networks……Page 353
E.1 References……Page 354
index……Page 356
Back Cover……Page 369
