Clustering of Justice

A couple of my friends were discussing the ideological leanings of the Supreme Court in light of Neil Gorsuch’s confirmation hearing, and they argued that despite the court having 4 career liberals and 4 career conservatives, it actually leaned left. Their reasoning was that both Roberts and Kennedy are so-called RINOs (Republicans in Name Only).

Kennedy was appointed during the Reagan administration while there was a Democratic Senate, so Reagan was arguably forced to pick a more moderate Republican as a compromise candidate. Chief Justice Roberts and Kennedy, meanwhile, have each provided the crucial swing vote on landmark cases: Kennedy sided with the liberal justices in US vs. Windsor (the DOMA ruling), Roberts did the same in NFIB vs. Sebelius (the ACA ruling), and both joined the conservative majority in Citizens United vs. FEC (Super PACs). In each of those cases, the other eight justices split along party lines.

While researching my friends’ claims, I stumbled upon a pretty interesting dataset on a Wikipedia page detailing SCOTUS justice leanings across history. I thought it would be interesting to apply a more rigorous test to my friends’ assessment of the court’s positions with a little machine learning.

Ideological Leanings of U.S. Supreme Court Justices

The table contains features for some of the most important functions of the court. Taken from the Wikipedia Page:

  • Criminal Procedure = A higher number means pro-defendant votes in cases involving the rights of persons accused of crime, except for the due process rights of prisoners.
  • Civil Rights = A higher number means more votes permitting intervention in non-First Amendment freedom cases which pertain to classifications based on race (including Native Americans), age, indigence, voting, residence, military, or handicapped status, sex, or alienage.
  • First Amendment = A higher number reflects votes that advocate individual freedoms with regards to speech.
  • Union = A higher number means pro-union votes in cases involving labor activity.
  • Economic = A higher number means more votes against commercial business activity, plus litigation involving injured persons or things, employee actions concerning employers, zoning regulations, and governmental regulation of corruption other than that involving campaign spending.
  • Federalism = A higher number means votes for a larger, more empowered government in conflicts between the federal and state governments, excluding those between state and federal courts, and those involving the priority of federal fiscal claims.
  • Federal Taxes = A higher number means more votes widening the government’s ability to define and enforce tax concepts and policies in cases involving the Internal Revenue Code and related statutes.

However, as Thomas Sowell famously explained in Judge Robert Bork’s confirmation hearing, slapping numbers on sometimes subjective criteria like whether a decision is pro-union or pro-defendant does not make the underlying issue any more objective. So baked into these numbers is perhaps some implicit bias, but these definitions are fairly rigid and definitely make for a good starting point.

We treat each justice as a sample (a row) and each ruling issue as a feature (a column). Since each cell in the matrix represents a percentage of votes, the data is already normalized and requires little preprocessing. This is an unsupervised problem since the dataset doesn’t label party affiliation, so K-means with k=2 is a good candidate for deducing political party. I held out the current sitting justices as a test set and trained on the historical justices to see if we could discern their leanings without biasing the training with their contributions.
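
Here is a minimal sketch of that setup with scikit-learn, assuming the Wikipedia table has been exported to a CSV with one row per justice; the file name, the "current" flag, and the column names are hypothetical stand-ins for however the data actually gets saved:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical export of the Wikipedia table: one row per justice,
# seven issue-area columns holding the percentage scores described above.
ISSUES = ["criminal_procedure", "civil_rights", "first_amendment",
          "union", "economic", "federalism", "federal_taxes"]

df = pd.read_csv("scotus_leanings.csv")        # assumed file name
historical = df[df["current"] == 0]            # training set
current = df[df["current"] == 1]               # held-out sitting justices

# Two clusters as a stand-in for the two parties.
kmeans = KMeans(n_clusters=2, random_state=0).fit(historical[ISSUES])

# Predict cluster membership for the held-out justices.
for name, label in zip(current["justice"], kmeans.predict(current[ISSUES])):
    print(name, label)
```

K-means only returns anonymous cluster ids, so which cluster corresponds to "liberal" and which to "conservative" has to be read off from the cluster members. These are the resulting predictions: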

Justice Predicted Ideology
Fred M. Vinson Conservative
Tom C. Clark Conservative
Sherman Minton Conservative
Earl Warren Liberal
John Marshall Harlan II Conservative
William J. Brennan, Jr. Liberal
Charles Evans Whittaker Conservative
Potter Stewart Conservative
Byron White Conservative
Arthur Goldberg Liberal
Abe Fortas Liberal
Thurgood Marshall Liberal
Warren E. Burger Conservative
Harry Blackmun Conservative
Lewis F. Powell, Jr. Conservative
William Rehnquist Conservative
John Paul Stevens Liberal
Sandra Day O’Connor Conservative
William Rehnquist Conservative
Antonin Scalia Conservative
David Souter Liberal
Anthony Kennedy Conservative
Clarence Thomas Conservative
Ruth Bader Ginsburg Liberal
Stephen Breyer Liberal
John Roberts Conservative
Samuel Alito Conservative
Sonia Sotomayor Liberal
Elena Kagan Liberal

Even with a pretty scarce amount of data, we are able to predict all 8 of the current justices’ affiliations correctly, and the predictions for the justices we trained on also seem to confirm my intuitions.

Here is a clearer visualization of the two clusters in the dataset:

[Figure: justices projected onto the first two principal components, colored by K-means cluster, with the cluster boundary shaded]

I reduced the dimensionality of the dataset using PCA for plotting purposes, but these two components explain 90% of the variance in the data set. I also evaluated the model over a meshgrid to illustrate where the cluster boundary lies in this space.
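
For reference, here is a rough sketch of the plotting step, continuing from the clustering sketch above (df, ISSUES, and KMeans are assumed from there); one simple way to draw a boundary is to re-fit K-means on the projected points:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Project the seven issue-area features down to two principal components.
pca = PCA(n_components=2)
points = pca.fit_transform(df[ISSUES])
print(pca.explained_variance_ratio_.sum())     # reported above as roughly 0.9

# Re-fit K-means in the reduced space so the boundary can be drawn in 2-D.
km2d = KMeans(n_clusters=2, random_state=0).fit(points)

# Evaluate the model over a meshgrid to shade each cluster's region.
xx, yy = np.meshgrid(
    np.linspace(points[:, 0].min() - 5, points[:, 0].max() + 5, 200),
    np.linspace(points[:, 1].min() - 5, points[:, 1].max() + 5, 200))
grid = km2d.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, grid, alpha=0.2)
plt.scatter(points[:, 0], points[:, 1], c=km2d.labels_)
plt.show()
```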

The observation that immediately strikes me is that both Roberts and Kennedy are fairly distant from the hyperplane that separates these clusters. In fact, many former conservative judges are closer to the frontier of the cluster than they are, and interestingly, even the current liberal judges are closer to “moderate” than they are. Looking purely at the judges’ track records in this dataset, it seems Roberts and Kennedy are firmly in red territory despite occasional dissent with their peers.

I suspect that while Roberts has ruled with the liberals on a handful of high-profile cases, he votes in line with his party more often on the cases that don’t get as much attention. Since these are all given equal weight in the dataset, he is not highly distinguishable from his conservative colleagues.

As a textualist, Gorsuch will probably land fairly close to Scalia and Thomas on the plot, pulling the court back to the right, roughly to where it was before Scalia’s death. But I do see the point my friends make, because the high-profile cases do matter, and even with Scalia on the bench, judicial activism was often the prevailing legal theory in the cases that made the news. The dynamics of the bench are certainly up in the air these next 4 years, and it’s entirely possible that Trump may get to choose two more justices, with Ginsburg and Kennedy likely reaching the end of their careers.

Ideas for follow up analysis and future projects regarding SCOTUS:

  • Biplot for feature contribution to PCA
  • Either manually label the dataset or label it using the clustering output and construct an SVM classifier. Distance from the separating hyperplane can be a proxy score for party affiliation (see the sketch after this list).
    • Can measure the bench’s average party affiliation over time
  • Bench ruling coocurrence matrix
  • Semantic analysis on opinions
    • Topic discovery via LDA
    • Use this output to label opinions, amicus briefs, etc. and classify documents
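
As a rough sketch of the SVM idea above, reusing df, ISSUES, historical, and the fitted K-means model from the earlier snippets (so the "labels" are just the cluster assignments, not hand-checked party affiliations):

```python
from sklearn.svm import SVC

# Label the historical justices with their K-means cluster assignments,
# then fit a linear SVM so there is an explicit separating hyperplane.
svm = SVC(kernel="linear").fit(historical[ISSUES], kmeans.labels_)

# Signed distance from the hyperplane as a rough party-affiliation score:
# large magnitudes are strongly partisan, values near zero are comparatively
# moderate. Averaging the scores of the justices sitting in a given term
# would give the bench's average leaning over time.
for name, score in zip(df["justice"], svm.decision_function(df[ISSUES])):
    print(f"{name}: {score:+.2f}")
```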

A Survey of Recommender Systems

[Edit: WordPress doesn’t import images very nicely, I’ll update with the original images as soon as possible]

I had the pleasure of interning with the data science team at RetailMeNot this past summer and at the end of the summer I wrote up a blog post that explains some of the high level concepts that surround my project.

Recommender systems are an integral part of the user experience of major online services. The front pages of Amazon and Netflix alone contain numerous recommendations of products and movies tailored to the interests of the user. Of course, curating these types of suggestions at scale requires leveraging users’ history to gain some insight into what might be of interest to them. There are several different approaches to suggesting content to users, and given the immense business value of making proper recommendations, it’s worth understanding a variety of techniques employed by major applications like RetailMeNot.

Collaborative Filtering

Perhaps the most well-known use of the collaborative filtering method was in the competition for the Netflix Prize, a million dollar challenge that provided contestants nearly 100 million ratings that users gave to Netflix movies in an attempt to find the algorithm that could best predict the ratings in a test set of nearly 3 million ratings. With data that was limited to just triples formed by the user ID, movie ID, and the rating, it’s not possible to explicitly utilize any characteristics about the user like their age, income, or gender to infer what a good recommendation might be. With limited information, collaborative filtering leverages the assumption that similar users should have similar preferences.

Measuring similarity between users requires some method of representing the tastes of a user that can directly be compared.

Representing a user as a vector of their ratings for every movie allows the recommender system to leverage some useful matrix properties for collaborative filtering.  First, it gives the recommender some intuitive functions to measure user to user similarity. The following are all metrics for the similarity between two vectors:

Cosine Similarity

\[ \text{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert} \]

Pearson Correlation Coefficient

\[ r_{u,v} = \frac{\sum_i (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\sum_i (u_i - \bar{u})^2} \, \sqrt{\sum_i (v_i - \bar{v})^2}} \]

Jaccard Similarity (for bit vectors/binary attributes)

\[ J(u, v) = \frac{|u \cap v|}{|u \cup v|} \]

Of these methods, cosine similarity is the metric that performs best across cross-validation tests on several data sets. With this notion of similarity, the concept of finding two similar users from a pool of users can be extended to a more systematic approach using matrix operations. Start with a user-item matrix A whose entries are ratings or numerical expressions of the preference that user u has for item i. After adjusting A so that each of its rows (the user vectors) has norm 1, computing AAᵀ returns a |U| x |U| user similarity matrix, where each entry represents the cosine similarity between user r and user c. From here, it is a pretty natural and cheap operation to find a user quite similar to user r by sorting that user’s row of the matrix and then finding a movie that user c has rated favorably but user r has not yet rated.

This is a fairly naïve approach to a recommender system, and there have been several variations of algorithms that manipulate the user-item matrix to generate recommendations. For example, the previous method is traditionally known as User-User collaborative filtering; in a similar manner, Item-Item collaborative filtering computes an item-item matrix by normalizing the columns of the user-item matrix A to unit vectors and calculating AᵀA. This |I| x |I| matrix conveys the similarity between item r and item c in each entry. Using a user’s historical interactions with certain items, this matrix can easily be sorted to find similar undiscovered items that the user may rate favorably.

Which implementation of collaborative filtering makes sense depends on the circumstances in which the recommender will be used and the costs and benefits each presents. For example, an advantage of item-item collaborative filtering is that, given enough training data, the item-item similarity model does not have to be recomputed often. This is because new users being added to the platform shouldn’t substantially affect how conceptually similar two items are. Thus, new users can be given recommendations without having to reconstruct the user-item matrix, since the model can already provide items similar to those the user has interacted with. The same is not true of user-user collaborative filtering, since a new user would need to be incorporated into the user-item matrix to find another similar user for recommendations.
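
As a small illustration of both variants, assuming a dense NumPy user-item matrix (a real system would use sparse matrices):

```python
import numpy as np

def similarity_matrices(A):
    """A: |U| x |I| user-item matrix with users as rows and items as columns."""
    # User-user: normalize each user row to unit length; A_u @ A_u.T then holds
    # the cosine similarity between every pair of users.
    row_norms = np.linalg.norm(A, axis=1, keepdims=True)
    A_u = A / np.where(row_norms == 0, 1, row_norms)
    user_sim = A_u @ A_u.T                     # |U| x |U|

    # Item-item: normalize each item column to unit length; A_i.T @ A_i then
    # holds the cosine similarity between every pair of items.
    col_norms = np.linalg.norm(A, axis=0, keepdims=True)
    A_i = A / np.where(col_norms == 0, 1, col_norms)
    item_sim = A_i.T @ A_i                     # |I| x |I|
    return user_sim, item_sim

# Toy example: 3 users rating 4 items (0 = no rating).
A = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [0., 1., 5., 4.]])
user_sim, item_sim = similarity_matrices(A)
print(user_sim.round(2))
print(item_sim.round(2))
```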

Graph Based Collaborative Filtering

Part of the 2016 Summer Intern program at RetailMeNot was an exploration of graph based approaches to recommendation problems. The graph representation of the traditional collaborative filtering setup is a sparse bipartite graph that treats users and items as distinct types of nodes and draws edges between these nodes based on the type of interaction the user had with a particular item. For rating based algorithms, these edges may be weighted by the rating, but other algorithms may choose to leave these edges unweighted.

[Figure: bipartite user-item interaction graph]

An advantage of using graphs for collaborative filtering is that they can give an intuitive interpretation of what a similar user or item looks like based on the degree of separation between them and the number of paths between a pair of users/items.

[Figure: user A's neighborhood, where items two degrees of separation away are recommendation candidates]

For example, it’s pretty easy to see that items 3-7 are candidates for recommendations for user A because they are items used by the users who share items with user A. In other words, they are two degrees of separation removed from user A. As more information, such as the ratings and general popularity of these relationships, is taken into account, the ranking of these recommendations can be refined further.

Matrix Completion

One important consideration that has been glossed over so far is the tendency of the user-item matrix to have missing entries where a user has simply not interacted with a certain item. In fact, if there were no missing entries in this matrix, there would be no need for any type of recommender system at all. But unsurprisingly, the user-item matrix is quite sparse, since most users won’t have any rating for the vast majority of items that a platform offers. Deciding how to deal with these missing entries is a central parameter in many collaborative filtering algorithms and can make a substantial difference in the performance of one recommender system compared to another.

Matrix completion algorithms are methods of computing an approximation of the user-item matrix often through some factorization of the matrix that uses some assumption about the missing entries. For instance, we can either assume that the missing entries are zero or that they are the average rating for that item:

\[ Q_{ui} = \begin{cases} r_{ui} & \text{if user } u \text{ rated item } i \\ 0 & \text{otherwise} \end{cases} \qquad \text{or} \qquad Q_{ui} = \begin{cases} r_{ui} & \text{if user } u \text{ rated item } i \\ \bar{r}_i & \text{otherwise} \end{cases} \]

The next step is to factorize the matrix Q into two or more matrices whose product is an approximation of Q. This approximation is then treated as the completed rating matrix that can be sorted to make recommendations.

One common approach to matrix completion is known as Alternating Least Squares. It approximates the m x n matrix Q by factorizing it into an m x k matrix and a k x n matrix, where k is the number of latent factors that ALS tries to capture.

Like gradient descent methods, ALS iteratively minimizes a cost function, in this case the difference between the approximated matrix and Q. But rather than taking gradient steps, ALS gets its name from alternating between the two factor matrices, solving a least-squares problem for one while the other is held fixed, using the following cost functions:

\[ J(x_u) = (q_u - Y x_u)^T W_u (q_u - Y x_u) + \lambda \lVert x_u \rVert^2 \]

\[ J(y_i) = (q_i - X y_i)^T W_i (q_i - X y_i) + \lambda \lVert y_i \rVert^2 \]

Each cost function above is with respect to a single user or item, where x_u is that user’s factor vector, y_i is that item’s factor vector, and X and Y are the full user and item factor matrices. The first term in each cost function measures the factorization’s distance from Q. The weight matrix is a diagonal matrix with 0 or 1 entries on the diagonal depending on whether there was a rating for that user-item pair (1 if there was). So W_u may represent an |I| x |I| weight matrix for user Bob with a 1 on every diagonal entry corresponding to an item that Bob rated. This emphasizes fitting the approximation to the ratings that are in the training set rather than the ones that are subject to prediction. The second term in each cost function is a regularization term that prevents overfitting to Q so that the recommendations are more meaningful than the original sparse matrix Q.
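
A compact (and deliberately unoptimized) sketch of those alternating updates, where W is the 0/1 indicator matrix of observed ratings and lam is the regularization strength:

```python
import numpy as np

def als(Q, W, k=10, lam=0.1, iters=15):
    """Weighted ALS. Q: m x n ratings matrix, W: m x n with 1 where a rating exists."""
    m, n = Q.shape
    X = np.random.rand(m, k)                   # user factor matrix
    Y = np.random.rand(n, k)                   # item factor matrix
    for _ in range(iters):
        # Solve for each user vector with the item factors held fixed.
        for u in range(m):
            Wu = np.diag(W[u])
            X[u] = np.linalg.solve(Y.T @ Wu @ Y + lam * np.eye(k),
                                   Y.T @ Wu @ Q[u])
        # Solve for each item vector with the user factors held fixed.
        for i in range(n):
            Wi = np.diag(W[:, i])
            Y[i] = np.linalg.solve(X.T @ Wi @ X + lam * np.eye(k),
                                   X.T @ Wi @ Q[:, i])
    return X @ Y.T                             # completed approximation of Q

# Usage: treat unobserved entries of Q as 0 and let W mark the observed ones.
# Q_hat = als(Q, (Q > 0).astype(float), k=2)
```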

Finally, one other way to complete the matrix without factorization is to use something akin to a k-nearest neighbors algorithm to fill in a user’s missing ratings. For instance, the similarity of two users can be defined as 1 minus the mean squared difference between the entries that both users have rated. From there, each missing rating for user u can be filled in with the average of the ratings made for that item by the k nearest neighbors of user u.
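
And a sketch of that k-nearest-neighbors style completion, using 1 minus the mean squared difference over co-rated items as the similarity:

```python
import numpy as np

def knn_complete(R, k=3):
    """R: user-item matrix with np.nan where a user has not rated an item."""
    n_users = R.shape[0]
    sim = np.full((n_users, n_users), -np.inf)
    for u in range(n_users):
        for v in range(n_users):
            both = ~np.isnan(R[u]) & ~np.isnan(R[v])
            if u != v and both.any():
                # Similarity = 1 - mean squared difference on co-rated items.
                sim[u, v] = 1 - np.mean((R[u, both] - R[v, both]) ** 2)

    filled = R.copy()
    for u in range(n_users):
        neighbors = np.argsort(sim[u])[::-1][:k]       # k most similar users
        for i in np.where(np.isnan(R[u]))[0]:
            ratings = R[neighbors, i]
            if (~np.isnan(ratings)).any():             # neighbors who rated item i
                filled[u, i] = np.nanmean(ratings)
    return filled
```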

Content-Based Filtering

Unlike collaborative filtering, content-based filtering does try to infer something about the user’s interests and the item’s properties, and doesn’t rely on the tastes of other users to make recommendations. For example, associating tags with movies and using the user’s historical interactions and ratings with those movies to generate recommendations is a form of content-based filtering. Of course, leveraging content-based filtering requires more information than just user and item IDs, so a couple of important questions have to be answered. First, a meaningful way to describe items is necessary for this type of analysis. One idea is to use tags or categories that determine which buckets certain items belong to. Then a user can be recommended other items with similar tags based on their interactions with items on the site. Of course, the difficulty here is determining what meaningful tags would be for each item that a site offers and scaling up the tagging process for a service with many offerings.

Depending on what type of data an item is associated with, there are different ways of relating tags to items. Social tagging is the process by which items are assigned crowd or user sourced tags. Tweets and YouTube videos go through this process. If this process thoroughly tags items, then using these item-tag associations is a natural way to do content-based filtering.

Otherwise, if items have a title or description associated with them, it is possible to decompose the title or description into the more relevant words in those phrases. This process is subject to more noise and needs to be accompanied by a few extra steps. First, it helps to filter out common stop words that aren’t relevant in describing a product, like "the" or "a", or any other words that might be common in these descriptions despite not adding any valuable insight into what the product is. Additionally, item descriptions can often come from various sources, contain irregular characters, and include useless information like serial numbers, so it can be helpful to use a regex filter to remove non-alphabetical or non-alphanumeric characters. Finally, several words that share a root contribute the same meaning to an item description but would be registered as different words because they are spelled differently (examples: "women" and "woman", "phone" and "phones"). At a high level, lemmatization is the process of reducing a word to its base form, and the nltk module in Python has a good implementation of this approach, which can help convert similar tags into identical tags.
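
A small sketch of that cleanup pipeline with nltk (the stopword and WordNet corpora need to be downloaded once with nltk.download):

```python
import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# One-time setup: nltk.download("stopwords"); nltk.download("wordnet")
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def description_to_tags(description):
    # Keep only alphabetic characters, lowercase, and split into tokens.
    tokens = re.sub(r"[^a-zA-Z ]", " ", description).lower().split()
    # Drop stop words and reduce plurals/variants to a common base form.
    return [lemmatizer.lemmatize(t) for t in tokens if t not in stop_words]

print(description_to_tags("Phones and cases for women - SKU 12345"))
# ['phone', 'case', 'woman', 'sku']
```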

With item to tag associations constructed, it’s not too difficult to create the item-tag matrix to represent this data. In much the same way that the item similarity matrix was constructed in the collaborative filtering method, it’s possible to construct an item similarity matrix with the item-tag matrix as well. The difference here is that instead of relating items through their common users, this algorithm relates items through their tags.

Content-based filtering, however, differs from collaborative filtering methods in that there are some information retrieval and natural language processing tools that can be used to sharpen the notion of item similarity by inferring something about the context in which these words appear.

The first common method of contextualizing terms in documents is known as Latent Semantic Analysis. LSA leverages the property that all matrices have a Singular Value Decomposition to construct comparable vectors out of terms and the “documents” that they appear in. LSA is often used in information retrieval, in which the literature often refers to terms and documents. For the purposes of recommender systems, tags represent terms and items represent documents. The SVD is a factorization of an m x n matrix A into 3 new matrices:

\[ A = U \Sigma V^T \]

  • U is an m x m matrix composed of m orthogonal unit eigenvectors of the term-term matrix (AAᵀ). These are also known as the left singular vectors of A.
  • Σ is an m x n matrix with k non-negative diagonal entries, where k is the rank of the matrix. These are known as the singular values of A, which are also the square roots of the eigenvalues of (AAᵀ) or (AᵀA).
  • Vᵀ is an n x n matrix composed of n orthogonal unit eigenvectors of the document-document matrix (AᵀA). These are also known as the right singular vectors of A.

The SVD can be truncated to form a low rank approximation of the original matrix and reduce the number of dimensions that represent each document and term. The significance of this truncation is that the reduced term and document vectors capture information about terms that commonly co-occur with other terms.

The last component involves formulating a way to represent a query. Since a query is just a collection of words, a query vector can be represented as the centroid of the word vectors that make up that query. Because documents are now mapped into the same space as terms and queries thanks to the truncated SVD, it is possible to assess the similarity between a query and a document based on their vectors. The cosine similarity metric from earlier is a good candidate here as well, and with a single matrix multiplication of the query vector against the document vectors it’s easy to find the document most relevant to the query.

A user interacts with different items on the site that all have these tags and over time these tags can be accumulated into a query. This approach works well for users that have interacted more thoroughly with the site, but there is often not enough context for this algorithm to work well with new users.
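
Here is a toy sketch of that pipeline; in practice A would be the (much larger) tag-item matrix built above, with tags as rows and items as columns, and the truncation rank k would be tuned:

```python
import numpy as np

# Toy tag-item matrix: 4 tags (rows) x 3 items (columns).
A = np.array([[2., 0., 1.],
              [0., 3., 1.],
              [1., 1., 0.],
              [0., 2., 2.]])

k = 2                                          # number of latent concepts kept
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

term_vecs = U_k @ S_k                          # tags in concept space
doc_vecs = (S_k @ Vt_k).T                      # items in concept space

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# A "query" built from the tags a user has accumulated (here tags 0 and 2),
# represented as the centroid of those term vectors.
query = term_vecs[[0, 2]].mean(axis=0)
scores = [cosine(query, d) for d in doc_vecs]
print(np.argsort(scores)[::-1])                # items ranked by relevance
```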

Graph Based Content Filtering

Most approaches to graph based content filtering involve representing tags and items as nodes with edges drawn between item nodes and the relevant tag nodes that describe them. There is a many to many relationship between tag nodes and item nodes. For example, there may be edges between the item “laptop” and the tags “electronics” and “portable”; likewise, “electronics” may be adjacent to the nodes “TV” and “computer” as well. Most graph based approaches to this problem differ in the ways that they use edge weights to curate recommendations.

While it’s possible to represent the LSA algorithm as a graph traversal problem, there are other interesting ways to use a graph for content based filtering as well. One interesting approach to this utilizes the famous Word2Vec algorithm alongside a graph to generate recommendations.

Word2Vec is also an algorithm that tries to learn the context in which certain keywords appear, but unlike LSA, it uses a shallow neural network to either predict the words around a term or predict a term based on the words around it. This algorithm is actually one that works better for item titles and descriptions than it would for social tagging because it relies on words having context. The nature of tags dictates that they are unordered and independent of each other making them a poor candidate for this model.

An important feature of Word2Vec is its ability to map terms to an n-dimensional vector using gradient descent. Unlike LSA, which transforms documents to lower dimensional vectors based on term co-occurrence information, this model treats both documents and queries as the centroid of the term vectors associated with the document or query.

Edges can be drawn between category nodes and items where the edge weights correspond to the cosine similarity between the item/document vector and the category/query vector. From here, it is possible to traverse the graph to find the longest path given a path length constraint and the initial category nodes that correspond to the user’s interests.
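
A hedged sketch of that idea with gensim (toy corpus here; in practice the sentences would be tokenized item titles and descriptions, and the query tokens would come from the category nodes tied to the user's interests):

```python
import numpy as np
from gensim.models import Word2Vec

# Train on tokenized item titles/descriptions (toy corpus).
corpus = [["wireless", "bluetooth", "headphones"],
          ["bluetooth", "portable", "speaker"],
          ["running", "shoes", "women"],
          ["trail", "running", "shoes"]]
# Note: older gensim versions call vector_size "size" and epochs "iter".
model = Word2Vec(corpus, vector_size=32, window=3, min_count=1, epochs=50)

def centroid(tokens):
    """Represent an item description or a category query as the mean word vector."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0)

item_vec = centroid(["trail", "running", "shoes"])     # an item node
query_vec = centroid(["running"])                      # a category node
cos = item_vec @ query_vec / (np.linalg.norm(item_vec) * np.linalg.norm(query_vec))
print(cos)                                             # candidate edge weight
```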

Hybrid Filtering

As the name suggests, hybrid filtering adopts principles from both collaborative and content-based filtering. This method uses the information in the user-tag and item-tag matrices to augment the user-item matrix. Like the PageRank algorithm, hybrid filtering uses a graph to model a stochastic random walk as a Markov chain.

Given these three matrices, formulated in the same way as in the two prior methods, it’s possible to compute the similarity between each pair of items or users in the following manner:

 

 

Then initialize the user-item matrix to the sparse matrix:

 

These similarity matrices can be used as state transition probability matrices. For example, to simulate users taking a random walk on the item graph, we use the user-item matrix and the item similarity matrix:

 

Conversely, it’s possible to simulate an item taking a random walk on the user graph by using the user-item matrix and the user similarity matrix:

 

Combining these results upon convergence yields a new dense, complete user-item matrix:

 

The new user-item matrix can be used for inferences about recommendations by sorting and filtering out the new user preference vectors just like in ALS. While the result of ALS and the random walk method are both a complete user-item matrix, the random walk approach is not directly trying to minimize a cost function, and it incorporates information pertaining to more than just one matrix rather than trying to factorize just the user-item matrix.
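
The exact update rules come from the paper linked in the resources below; as one possible (and simplified) reading of the random-walk idea, not the paper's exact formulation, the similarity matrices can be row-normalized into transition matrices and the sparse user-item matrix propagated until it settles:

```python
import numpy as np

def row_normalize(M):
    s = M.sum(axis=1, keepdims=True)
    return M / np.where(s == 0, 1, s)

def random_walk_complete(R, user_sim, item_sim, alpha=0.8, iters=20):
    """R: sparse |U| x |I| user-item matrix; user_sim, item_sim as built above."""
    P_items = row_normalize(item_sim)          # item -> item transition matrix
    P_users = row_normalize(user_sim)          # user -> user transition matrix
    R_hat = R.astype(float).copy()
    for _ in range(iters):
        walk_items = R_hat @ P_items           # users walking over the item graph
        walk_users = P_users @ R_hat           # items walking over the user graph
        # Blend the two walks while keeping some weight on the observed data.
        R_hat = alpha * 0.5 * (walk_items + walk_users) + (1 - alpha) * R
    return R_hat                               # dense, completed user-item matrix
```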

 

Recommender Challenges/Concerns

TF-IDF

The goal of a recommender system is ideally to recommend items that a user would otherwise not interact with. A common problem recommender systems face is that they often recommend content that is already popular. To account for this bias, we can borrow a technique used in text mining to determine the value of a word to a document: inverse document frequency.

Textual mining often deals with documents and terms where documents contain terms. Since certain words like “the” and “a” appear often across many documents, these terms are not as significant to the meaning of the document as other more unique terms like “basketball” or “water” that may appear less frequently between documents. For short documents, a good way of ranking these words is often through the inverse document frequency metric:

\[ \text{idf}(t) = \log \frac{N}{n_t} \]

Here N is the number of documents in the set and n_t is the number of documents that contain the term in question. So how does this relate to recommendation? The analogy of documents and terms can be translated to users and items respectively. Now N represents the number of users and n_t represents the number of users that have interacted with the item in question. From here, one approach may be to multiply the scalar IDF score of each item with the item vector associated with it before proceeding with any recommendation algorithm, so that the original ratings are adjusted. Another approach may weight the final recommendation list that the algorithm produces against the IDF scores to penalize items that are already popular and boost those that aren’t as well known.
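
A small sketch of the first approach (scaling each item's column by its IDF before running the recommender); note that with the plain log formulation the single most popular item can get a weight of exactly zero, so a smoothed variant such as log(1 + N/n_t) is often preferable:

```python
import numpy as np

def idf_scores(R):
    """R: |U| x |I| matrix where a nonzero entry means the user interacted with the item."""
    N = R.shape[0]                             # number of users
    n_t = np.maximum((R > 0).sum(axis=0), 1)   # users who interacted with each item
    return np.log(N / n_t)

R = np.array([[1, 1, 0, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0]])
idf = idf_scores(R)
R_adjusted = R * idf                           # down-weight already-popular items
print(idf.round(2))                            # item 0 is most popular, so its weight is 0
```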

 

Validation

Binary Classification

Depending on how important ranking is to your use case, it may be okay to approach recommendation with a testing mindset similar to that of classification. For example, on mobile, RetailMeNot recommends 6 stores you may like in a 3 by 2 grid, and arguably the order in which they are presented in that grid is not of principal concern; what is presented matters more. Just like with the MPR measure described below, this validation technique holds out a set of user-item interactions for training the recommender and marks the rest of the items that a user interacts with as positive test examples. For user-item pairs that have ratings, like Netflix movies, there may need to be additional criteria to determine whether the user had a positive interaction with an item. In this case, an assumption could be that a user had a positive experience if they interacted with a movie and gave it a rating greater than 4 stars.

The recommender system can then return a ranked list of recommendations once it is trained for a user. To validate this list, the first n recommendations can be treated as the positive predictions of the recommendation classifier. This assigns equal importance to all of the top n rankings, but gives access to the vast suite of validation tools available to binary classifiers via the confusion matrix, such as precision, recall, ROC curves, the F-measure, etc.

 

MPR

Mean Percentile Rank is a weighted form of recall that validates ranked lists based on the following formula:

\[ \text{MPR} = \frac{\sum_{u,i} r_{ui} \, \text{rank}_{ui}}{\sum_{u,i} r_{ui}} \]

Here r_ui denotes the held-out user-item ratings, and rank_ui is a value between 0 and 1 based on the rank predicted by the recommender. The function that translates the raw ranking into rank_ui is:

\[ \text{rank}_{ui} = \frac{\text{position}_{ui} - 1}{L - 1} \]

where position_ui is item i’s place in user u’s ranked list (1 being the top recommendation) and L is the length of that list, so the top of the list maps to 0 and the bottom maps to 1.

An MPR closer to 0 indicates a higher performing recommender system, while a score of 0.5 indicates essentially random recommendations. One concern with MPR is that the metric can punish a recommender simply for ranking over a very long candidate list, but this effect diminishes as the test set expands.
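
A toy implementation of the metric as described, assuming the recommender returns a full ranked candidate list per user:

```python
def mean_percentile_rank(held_out, ranked_lists):
    """held_out: {user: {item: rating}} test ratings.
    ranked_lists: {user: [items ordered best-first by the recommender]}."""
    num, den = 0.0, 0.0
    for u, ratings in held_out.items():
        items = ranked_lists[u]
        L = len(items)
        for i, r_ui in ratings.items():
            # Map the raw position to a percentile rank in [0, 1].
            rank_ui = items.index(i) / (L - 1) if L > 1 else 0.0
            num += r_ui * rank_ui
            den += r_ui
    return num / den

held_out = {"alice": {"item3": 5, "item9": 3}}
ranked = {"alice": ["item3", "item1", "item9", "item7"]}
print(mean_percentile_rank(held_out, ranked))  # 0.25; closer to 0 is better
```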

Resources

Alternating Least Squares:

http://stanford.edu/~rezab/nips2014workshop/submits/logmat.pdf

http://bugra.github.io/work/notes/2014-04-19/alternating-least-squares-method-for-collaborative-filtering/

 

Latent Semantic Analysis:

http://lsa.colorado.edu/papers/dp1.LSAintro.pdf

 

Word2Vec:

http://www-personal.umich.edu/~ronxin/pdf/w2vexp.pdf

 

Hybrid Filtering:

https://arxiv.org/ftp/arxiv/papers/1310/1310.7957.pdf

 

k-NN recommendation and others:

http://www.sciencedirect.com/science/article/pii/S0950705113001044

NBA Most improved Player

I was watching a Warriors game last week and commentator Jim Barnett said something to the effect of “Stephen Curry deserves to win the most improved player award as well as MVP.” While it’s clear that Steph has made pretty large improvements, especially in scoring per game, I think it’s pretty tough for such a high-performing player to improve more than the rest of the league.

I decided to take a quantitative approach to this nomination by comparing player stats across points, assists, rebounds, blocks, and steals per game between the 2015 and 2016 season for each player.

The approach I took was pretty simple. I calculated the percent change in each category for every player, then averaged those changes with equal weights and sorted on that average. Results for the top 100 most improved players as of 1/25/16 are listed below.

Interestingly, Stephen Curry barely made the top 100, primarily because his blocking has decreased. This result pointed out a flaw in my methodology. Categories like blocks generally have much lower values than categories like points per game, so BLKPG has a very wide spread of percent changes relative to the others. For instance, going from 1 block per game to 2 is a 100% increase, while going from 25 points per game to 30 is only a 20% increase. So an update to the model that I’ll consider is weighting the average percent change by the average value of each category so that small absolute changes don’t overpower the categories with generally larger values.
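
The real computation lives in the source code linked below; here is a rough sketch of both the equal-weight average and the proposed reweighting, assuming per-season CSVs with hypothetical column names:

```python
import pandas as pd

CATS = ["RPG", "PPG", "APG", "BLKPG", "STPG"]

# Hypothetical frames indexed by player name with per-game stats for each season.
s2015 = pd.read_csv("stats_2015.csv", index_col="PLAYER")
s2016 = pd.read_csv("stats_2016.csv", index_col="PLAYER")

pct_change = (s2016[CATS] - s2015[CATS]) / s2015[CATS] * 100
improvement = pct_change.mean(axis=1)          # equally weighted average change

# Proposed fix: weight each category by its league-wide average value so that
# low-volume stats like blocks don't dominate the ranking.
weights = s2015[CATS].mean()
weighted = (pct_change * weights).sum(axis=1) / weights.sum()

print(improvement.sort_values(ascending=False).head(10))
print(weighted.sort_values(ascending=False).head(10))
```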

Data Source

Source Code

RANK PLAYER RPG PTS APG BLKPG STPG AVERAGE
1 C.J. McCollum 140.00% 205.88% 330.00% 53.85% 68.12% 159.57%
2 Jon Leuer 72.73% 100.00% 85.71% 214.29% 118.52% 118.25%
3 Tony Parker 36.84% -11.81% 4.08% 500.00% 38.46% 113.52%
4 Matthew Dellavedova 26.32% 70.83% 60.00% 300.00% 102.78% 111.99%
5 Mike Scott -24.14% -11.54% -9.09% 550.00% -16.22% 97.80%
6 Otto Porter Jr. 86.67% 108.33% 133.33% -14.63% 174.58% 97.66%
7 J.J. Barea 11.76% 24.00% 17.65% 400.00% -11.63% 88.36%
8 Nik Stauskas 91.67% 59.09% 100.00% 65.22% 118.52% 86.90%
9 Kent Bazemore 46.67% 150.00% 140.00% 2.27% 85.51% 84.89%
10 Ian Mahinmi 22.41% 95.35% 160.00% 37.33% 87.76% 80.57%
11 Jae Crowder 41.67% 89.61% 72.73% 76.67% 109.09% 77.95%
12 Will Barton 117.86% 126.47% 71.43% 24.24% 7.23% 69.45%
13 Omri Casspi 64.10% 39.33% 0.00% 100.00% 78.26% 56.34%
14 Dennis Schroder 28.57% 8.00% 14.63% 140.00% 70.31% 52.30%
15 Mason Plumlee 25.81% 6.90% 211.11% 18.18% -1.27% 52.15%
16 Kyle Lowry 6.38% 16.29% -5.88% 189.47% 42.95% 49.84%
17 Rajon Rondo 18.18% 31.46% 46.84% 100.00% 32.84% 45.86%
18 Evan Fournier 11.54% 14.17% 23.81% 66.67% 97.10% 42.66%
19 Derrick Williams 33.33% 10.84% 14.29% 160.00% -8.70% 41.95%
20 Danilo Gallinari 54.05% 55.65% 78.57% 23.53% -15.00% 39.36%
21 Al-Farouq Aminu 41.30% 87.50% 87.50% -16.67% -10.53% 37.82%
22 Brook Lopez 12.16% 15.12% 128.57% 8.00% 18.33% 36.44%
23 Jerami Grant 50.00% 41.27% 8.33% 50.48% 16.13% 33.24%
24 Marvin Williams 36.73% 28.38% 7.69% 128.26% -35.23% 33.17%
25 Isaiah Thomas 30.43% 32.32% 57.14% 0.00% 38.82% 31.74%
26 Jarrett Jack 38.71% 6.67% 57.45% 37.50% 12.77% 30.62%
27 Louis Williams 42.11% 0.00% 28.57% 92.31% -10.91% 30.42%
28 Darrell Arthur 41.38% 12.12% 20.00% 62.22% 6.02% 28.35%
29 Andre Drummond 14.07% 25.36% 28.57% -20.86% 94.38% 28.31%
30 Kentavious Caldwell-Pope 16.13% 16.54% 46.15% 22.73% 38.94% 28.10%
31 Marcus Morris 10.42% 34.62% 31.25% 50.00% 12.82% 27.82%
32 Ramon Sessions 8.70% 60.32% 7.14% 62.22% 27.68%
33 Cory Joseph 4.17% 22.06% 25.00% 36.36% 48.28% 27.17%
34 Ersan Ilyasova 14.58% -2.61% 20.00% 78.79% 20.97% 26.35%
35 Draymond Green 15.85% 25.64% 97.30% 4.00% -12.18% 26.12%
36 Arron Afflalo 15.63% 4.51% 5.88% 133.33% -39.62% 23.95%
37 Tony Snell 45.83% -3.33% 11.11% 93.33% -27.27% 23.93%
38 Shane Larkin -8.70% 9.68% 23.33% 83.33% 5.74% 22.68%
39 Victor Oladipo 14.29% -22.35% -4.88% 142.31% -22.75% 21.32%
40 Khris Middleton -20.45% 32.09% 73.91% 42.86% -27.92% 20.10%
41 Gerald Green 4.00% -10.08% -25.00% 131.25% -3.51% 19.33%
42 Ryan Anderson 25.00% 22.63% 11.11% 24.24% 12.96% 19.19%
43 Brandon Knight -5.13% 15.88% -1.92% 93.75% -9.79% 18.56%
44 Jameer Nelson 34.78% -2.41% 27.50% 40.00% -8.00% 18.37%
45 DeMar DeRozan -2.17% 15.42% 17.14% 77.78% -18.03% 18.03%
46 Avery Bradley -16.13% 7.19% 16.67% 31.58% 49.06% 17.67%
47 Kemba Walker 20.00% 18.50% -1.96% 28.00% 21.53% 17.21%
48 Kosta Koufos 13.21% 40.38% -40.00% 19.23% 52.78% 17.12%
49 Paul Millsap 12.82% 10.18% 12.90% 43.16% 6.18% 17.05%
50 Pau Gasol -8.47% -10.81% 14.81% 14.36% 75.00% 16.98%
51 Aron Baynes -2.22% -18.18% 40.00% 64.52% 0.00% 16.82%
52 Zaza Pachulia 57.35% 25.30% -20.83% 27.59% -7.27% 16.43%
53 Monta Ellis 16.67% -26.46% 21.95% 77.42% -8.11% 16.29%
54 Greg Monroe -3.92% 0.63% 9.52% 85.71% -11.50% 16.09%
55 Andre Roberson -13.16% 44.12% -20.00% 55.81% 12.66% 15.89%
56 Dirk Nowitzki 13.56% 2.31% -5.26% 44.19% 21.57% 15.27%
57 Richard Jefferson -44.00% -1.72% 0.00% 106.67% 13.95% 14.98%
58 Marcin Gortat 13.79% 11.48% 33.33% -2.24% 18.33% 14.94%
59 Matt Barnes 27.50% -9.90% 13.33% 7.58% 32.95% 14.29%
60 Trevor Booker 34.00% -20.83% -9.09% -5.77% 72.22% 14.11%
61 Robert Covington 24.44% -16.30% 6.67% 38.64% 15.83% 13.86%
62 Anthony Tolliver 6.45% -15.87% 12.50% 33.33% 31.25% 13.53%
63 Nikola Mirotic 18.37% 2.94% 16.67% -1.52% 30.30% 13.35%
64 Leandro Barbosa 0.00% -7.04% 0.00% 33.33% 40.32% 13.32%
65 Andre Iguodala 30.30% -2.56% 13.33% 12.50% 9.48% 12.61%
66 C.J. Miles 3.23% -6.67% 0.00% 37.84% 27.91% 12.46%
67 Charlie Villanueva 13.04% -9.52% 33.33% -14.71% 39.13% 12.26%
68 Nikola Vucevic -21.10% -12.95% 35.00% 43.84% 16.44% 12.24%
69 Amir Johnson 3.28% -13.98% 6.25% 48.10% 16.95% 12.12%
70 Jimmy Butler -8.62% 12.50% 27.27% 30.91% -1.71% 12.07%
71 Luis Scola -13.85% 5.32% -38.46% 95.45% 10.34% 11.76%
72 Brandon Bass -14.29% -40.57% -15.38% 128.21% 0.00% 11.59%
73 Courtney Lee 13.04% -0.99% -30.00% 62.50% 12.37% 11.38%
74 Thaddeus Young 66.67% 7.80% -30.43% 21.21% -8.59% 11.33%
75 Lavoy Allen 5.88% 10.00% -16.67% -20.90% 75.00% 10.66%
76 Giannis Antetokounmpo 7.46% 22.05% 7.69% 10.48% 5.56% 10.65%
77 Mike Conley -3.33% -5.06% 7.41% 55.00% -0.79% 10.64%
78 John Wall -8.70% 13.07% -3.00% 28.07% 22.86% 10.46%
79 Nicolas Batum 6.78% 60.64% 12.50% -12.50% -16.36% 10.21%
80 Kevin Love 12.37% -4.88% 9.09% 0.00% 32.35% 9.79%
81 Wayne Ellington -34.38% -37.00% -43.75% 133.33% 25.49% 8.74%
82 Tobias Harris 14.29% -19.88% 22.22% 20.75% 0.99% 7.67%
83 Russell Westbrook -1.37% -14.95% 12.79% 23.81% 17.70% 7.60%
84 Alex Len -10.61% 6.35% 80.00% -42.11% 2.04% 7.14%
85 Spencer Hawes 20.00% -3.45% 66.67% -60.27% 12.12% 7.01%
86 Jared Dudley 22.58% 19.44% 5.56% -13.33% -2.00% 6.45%
87 Jordan Clarkson 18.75% 26.89% -31.43% -20.00% 37.21% 6.28%
88 Kawhi Leonard -4.17% 21.21% 4.00% 24.00% -17.75% 5.46%
89 Cody Zeller 1.72% 18.42% -43.75% -11.39% 60.00% 5.00%
90 Marco Belinelli -28.00% 17.39% 33.33% 0.00% 2.00% 4.94%
91 Roy Hibbert -21.13% -36.79% 27.27% -0.61% 54.17% 4.58%
92 Bojan Bogdanovic 29.63% 4.44% 22.22% -41.67% 6.82% 4.29%
93 Damian Lillard -4.35% 17.14% 11.29% 19.23% -22.03% 4.26%
94 Michael Carter-Williams -5.66% -18.49% -16.42% 62.22% -2.98% 3.73%
95 Robin Lopez -10.45% -8.33% 66.67% -3.52% -25.93% 3.69%
96 Mario Chalmers -3.85% 0.98% -2.63% 30.77% -11.11% 2.83%
97 Jeremy Lin 23.08% 11.61% -32.61% 48.84% -37.84% 2.61%
98 Stephen Curry 25.58% 26.47% -14.29% -30.00% 1.47% 1.85%
99 Terrence Ross -17.86% -12.24% -30.00% 40.00% 29.23% 1.83%
100 Bismack Biyombo 31.25% 14.58% 0.00% -1.94% -35.71% 1.64%

 

Computer Architecture

I probably won’t be posting as much this week because of exams, but I did want to just make an update to the blog.

My computer architecture professor gave us an essay prompt asking us to take a stance on whether computer architecture has seen any major advances in the last  decade, which I thought would make for a good post. By the way, this was the first essay I’ve had to write as a CS student and it made me realize that my formal writing has gotten a lot worse, but regardless here’s the essay:

Computer architecture has experienced several revolutionary advances in the past decade that have increased performance for users in a number of ways.

One important feature of modern architectures that greatly improves performance for intensive tasks such as gaming or video rendering is the combination of multicore processors and hyperthreading. First, multicore processors place multiple computing “cores” on a single chip, which greatly increases the throughput of programs by allowing instructions to be executed simultaneously. While this type of architecture requires extra handling from the operating system and software, it can substantially enhance performance if exploited properly. In fact, “a dual-core chip running multiple applications is about 1.5 times faster than a chip with just one comparable core (Geer 1).”

Unfortunately, adding the hardware for multiple cores can be costly, so enhancing the performance of a single core can be quite valuable. Hyperthreaded processors queue up instructions from different threads so that the processor is never left idle while the next instruction is fetched. Hyperthreading was introduced by Intel with their Pentium 4 line of processors and has allowed processors to “[show] speedups of individual benchmarks of up to 30 to 40%… compared to sequential execution (Bulpin 9).” While hyperthreading provides a sense of parallel computing with just a single core, it seems that experimentally, “throughput is always higher with multicore processing than with Hyperthreading (Bulpin 9).” So while having multiple cores is preferable, combining multiple cores with hyperthreading has greatly boosted performance on intensive processes and has drastically increased computing power over the past decade.

At the external storage level, solid state drives have also increased the speed of file IO and disk operations. Traditional hard disks are bulky, require defragmentation, and are far slower at reading and writing than modern solid state drives because they rely on moving components. A principal advantage of solid state drives over hard disk drives is uniform random access: with hard disks, the data locality of the previous access affects the speed of the next one. This is confirmed by experiments comparing SSDs to HDDs, which concluded that “all… SSDs significantly outperform the HDD (0.54MB/sec) by over 31 times higher bandwidths (Chen 4).” These experiments also found that SSD read and write latencies are mostly uniform. Unfortunately, SSDs are more expensive per gigabyte and usually don’t offer as much space as a single HDD, although the limited space on SSDs is increasingly used to store system files and applications rather than photos, music, and other user files. This makes general operations on the computer load and execute up to 4 times faster.

Finally, over the past decade the inclusion of graphics processing units alongside CPUs has become more prominent. GPUs offer a number of advantages over CPUs: while CPUs usually have 2-6 cores, GPUs have hundreds of cores that can process thousands of threads, and they have sped up software nearly 100-fold in several instances. GPUs are best suited for users who plan to do lots of gaming or video rendering. However, a GPU is not a replacement for a CPU, because each individual GPU core is slower than a CPU core. GPUs are therefore good at intensive, repetitive calculations such as rendering, where the same process is applied to each image in a series, while the more varied calculations of everyday computing are not repetitive in this way and are better served by the speed of a CPU’s cores.

These are only a few of the major advancements that computer architecture has experienced in the past decade, but it’s clear that speed, efficiency, and computing power have massively increased due to these innovations.

Citations

Bulpin, James. “Multiprogramming Performance of the Pentium 4 with Hyper-Threading.” (n.d.): n. pag. Web. http://pharm.ece.wisc.edu/wddd/2004/06_bulpin.pdf

Chen, Feng. “Understanding Intrinsic Characteristics and System Implications of Flash Memory Based Solid State Drives.” N.p., n.d. Web. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.151.8491&rep=rep1&type=pdf

Geer, David. “Chip Makers Turn to Multicore Processors.” N.p., n.d. Web. http://www.itu.dk/~jhh/thesis/references/36%20-%20Chip%20Makers%20Turn%20to%20Multicore%20Processors.pdf

First PC Build!

I’ve decided to spend some of the money that I’m making at my internship on my first PC rig.

PC Components

I’m still debating the components, but these are the parts I’m interested in for now:

I’ll probably have it dual boot Linux and Windows. I’m still deciding on the Linux distro. Since I’m building the computer and aiming for customizability, I think I should use Arch Linux. It’ll probably have a learning curve, but I think I can get it to a pretty sweet level of customization.

I’m not really a gamer (although I think this build will handle games fairly well), so I didn’t spring for a discrete GPU or a really fancy graphics card, but I think the sizable investment in an i7 will pay off since Intel has some awesome hyperthreading features. I am still looking into some AMD processors, though, because they seem to be a lot cheaper and some of them even offer more cores, but I’ll have to do some more research here as well.

I don’t plan on starting the build until I move into my new place for the summer, but once I do I’ll make a post about the actual build process.

Side Note: I took an old rig that my family had lying around to my dorm after spring break this year and booted up Debian Wheezy on it. I tested it at home, and despite its pretty terrible specs, I was excited to have it in Austin. For some reason, though, the computer refuses to detect any network adapter (I’ve tried like 5 now) and will only work with an ethernet connection. Unfortunately, while I had an ethernet port at home, my terrible dorm (side side note: if you attend UT, you should avoid University Towers like the plague) has no ethernet ports, so I’ve basically had a fairly bulky and useless computer taking up space on my desk.

Not that that’s the reason I want to spend $800 on a new rig, but I figure it’s a good investment to make as a CS major. For one thing, I think very few programmers want to code on their laptop for their whole career, but I also think that building my own PC will be a really fun and rewarding learning experience.

Lastly, as general advice that extends beyond just building a PC: if you ever are in a position where you think some tool or piece of equipment is expensive, ask yourself whether the purchase is a good investment in yourself. By that, I mean that if the product of interest will help you improve your craft, then in my opinion, 9 times out of 10, it’s worth the buy. Of course, “your craft” should not be something that you’ve been trying for like 2 days, and you should try to get the best bang for your buck (hence why I did not go with this $25,000 PC build… also because I can only wish I had that kind of money).

But generally, investing in your productivity is a good idea, and I think that if I didn’t feel like I was using this computer to a level that justifies the price tag, I’d feel pretty guilty. So in that way, I’d be more motivated to work on more projects and generally use it to code as much as I can. Besides, I primarily use my internship money to buy food.

Quantum Computing

Quantum computing is a topic that’s both frightening and extremely interesting for computer science majors. It’s an exciting field because it’s a completely new paradigm that could help us find solutions to problems that are too difficult to solve with modern computers. However, it’s scary for programmers because it has the potential to put all of us out of work and render our skills useless.

Quantum computing is a very complicated topic, but at its core it leverages quantum mechanical properties to change a fundamental convention of modern computers: the bit. In a traditional computer, we define the state of a program through digital information encoded in binary, a series of 1s and 0s that describes at a very low level everything from what our screen looks like to the words that appear on it.

The quantum computer’s counterpart to a bit is a qubit. A qubit challenges the idea that a bit must represent either 1 or 0 and uses a principle of quantum mechanics known as superposition to let the value be both 1 and 0 at the same time. This curious property helps quantum computers perform certain calculations faster by providing quicker access to answers amongst billions of possibilities. A bit gets its value by detecting a voltage, but a qubit’s value is determined, in some implementations, by the spin of an electron in a phosphorus atom. This video fleshes out the concept pretty well:

So while I joked earlier that quantum computers may put computer scientists out of work, that’s definitely not true in the near future. The reason quantum computing is so attractive right now, and the reason companies like Google and IBM are willing to spend millions researching it, is that it is faster for certain applications such as factoring large numbers (which would make our current RSA encryption systems fairly obsolete) and creating and operating on enormous data sets (probably a huge goal of projects like Google’s work with D-Wave).

Here’s one more video that does a really awesome job explaining some of the more intricate details of quantum computing (with cartoons!). This YouTube channel has a lot of really cool animated videos that I’ll probably use in later posts too.

I had been planning to write about quantum computing for my first real post for a few days now, but it seems like I got the timing down really well, because just today IBM scientists announced that they’ve made some notable advances in quantum computing. The nature of quantum computing makes debugging very difficult: we can’t observe the intermediate steps of a quantum process, because for some very interesting reason our mere observation of the system would collapse it (something I might explain in more depth in a later post). However, scientists at IBM have come up with a way to deal with the problem of detecting two types of errors simultaneously, a step toward having the computer correct its own errors.

http://phys.org/news/2015-04-scientists-critical-quantum.html

So this is a really novel field of research, and while the technology is rapidly advancing and related work has even won a Nobel Prize in Physics, it’s unlikely that quantum computers will become as ubiquitous as PCs are today within our lifetime. However, that’s all the more reason to work harder at breaking expectations and making revolutionary innovations!