import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import gensim
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer,TfidfTransformer
import sklearn.feature_extraction.text as text
from sklearn.linear_model import LogisticRegression
from sklearn import decomposition
from IPython.display import Image
from IPython.display import HTML
from IPython.core import display
%matplotlib inline
from pylab import rcParams
rcParams['figure.figsize'] = 15, 10
plt.style.use('fivethirtyeight')
# This line will hide code by default when the notebook is exported as HTML
display.display_html('''<script>jQuery(function() {if (jQuery("body.notebook_app").length == 0)
{ jQuery(".input_area").toggle(); jQuery(".prompt").toggle();}});</script>''', raw=True)
def output_columns(df):
    '''
    Rename columns from "python_naming" to "Output Names" for display.
    '''
    # Keep as a DataFrame to perform vectorized string operations
    names = pd.DataFrame(df.columns)
    names.columns = ['temp']
    names = pd.DataFrame(names.temp.str.replace("_", " "))
    df.columns = list(names.temp.str.title().str.replace('usv', 'USV', case=False, regex=True))
    return df
posts = pd.read_csv('data/usv_posts_cleaned.csv',encoding='utf8',parse_dates=True)
posters = pd.read_csv("data/usv_posters_cleaned.csv",encoding="utf8")
posts.date_created = pd.to_datetime(posts.date_created)
posters['relation_to_USV'] = np.where((posters.ever_usver)&(posters.is_usver==False),'Former USVer','Civilian')
posters.loc[posters.is_usver, 'relation_to_USV'] = "Current USVer"
This is an analysis of USV.com post data from inception to Feb 2015. For a few years, USV.com was my favorite place to hang out on the internet. There was once a section on the website called "Conversation". It looked like this:
Any user could submit links to articles. The community would comment on and upvote them. The links were ranked by comments, upvotes, and time submitted. This started as a Hack Day project to understand who was contributing value to the community and which topics the community liked to discuss.
There are three sections:
Poster Segments: Identify poster segments, who is in them, and how they compare against each other.
Post Patterns: Words and topics that are popular with the community, and trends in posting time.
Poster Profiles: Words and topics that specific posters prefer.
The data set of posts includes each post's title, description, poster, date created, comment count, upvotes, and which users upvoted it.
This is all publicly scrapable, but was given to me during a Hack Day.
Below is an example of the post data set, sorted by comment count:
output_columns(posts.sort_values('comment_count',ascending=False)).head()
poster_metrics = ['poster','post_count','mean_comments','conversations_sparked',
"percent_of_posts_with_comments",'relation_to_USV']
output_columns(posters.sort_values('conversations_sparked',ascending=False)[poster_metrics].head(10).round(2))
All of the data on posters is calculated from the posts above. I also manually added whether each poster works at USV, or ever worked there, as of Feb 2015, which is when I did this. The only change since then is that Brittany worked at USV in Feb 2015 and, as of May 2016, works at Lattice.vc.
# Highest ranked non-USVer posters
output_columns(posters.loc[(posters.ever_usver == False) &
                           (posters.conversations_sparked > 1)]
               .sort_values('conversations_sparked', ascending=False)).to_csv('Top Posters.csv', index=False)
Here are the stats for all current USVers and alumni:
output_columns(posters.loc[posters.ever_usver, poster_metrics].sort_values(['relation_to_USV', 'post_count']))
# Create a csv of all USVers
output_columns(posters.loc[posters.ever_usver==True].sort_values('conversations_sparked',ascending=False)).to_csv("USVers by the Numbers.csv",index=False)
This is a segment of posters that post infrequently but have high per-post engagement. This is a group that the community wants to hear more from. The cutoffs are at least 5 and at most 15 posts, with an average of more than 2 comments per post.
These cutoffs, like all cutoffs, are somewhat arbitrary. They felt directionally correct to me.
output_columns(posters.loc[(posters.mean_comments > 2) &
                           (posters.post_count >= 5) &
                           (posters.post_count <= 15), poster_metrics].sort_values('mean_comments'))
# Uncomment and move up to output as a CSV
# .to_csv('Occasional, but Valuable Posters.csv',index=False)
posts = posts.merge(posters[['poster','is_usver','ever_usver']],on='poster',how='left')
This is the average number of conversations sparked by posters based on their relationship to USV:
sns.barplot(x='relation_to_USV', y='conversations_sparked', data=posters, palette="Blues_d", estimator=np.mean, ci=None)
plt.title("Current and Former USVers vs Civilians")
plt.xlabel("Relationship to USV")
plt.ylabel("Average Conversations Sparked")
plt.ylim(0, 35)
plt.savefig("USVers vs Non-USVers (Top Posts).png", dpi=100)
Current USVers tend to spark more conversations. Could that be because the civilian category is weighed down by inactive accounts?
Here is the number of current USVers, former USVers, and Civilians who have posted.
posters.groupby('relation_to_USV').count()[['poster']].rename(columns={'poster':'Number of Unique Posters'})
There are a lot more Civilians. Perhaps it's not surprising that their average was low. Below is the same plot, except it shows the maximum conversations sparked in each group, instead of the average.
sns.barplot(x='relation_to_USV', y='conversations_sparked', data=posters, palette="Blues_d", estimator=np.max, ci=None)
plt.title("Current and Former USVers vs Civilians")
plt.xlabel("Relationship to USV")
plt.ylabel("Max Conversations Sparked in Each Group")
plt.ylim(0, 35)
plt.savefig("USVers vs Non-USVers (Max Conversations).png", dpi=100)
As you can see, at least one Civilian has sparked a comparable number of discussions to the USVers. The average USVer does spark more conversations than the average Civilian, but the most active Civilians are comparable to USVers.
Does it matter what time of day a post is made?
posts.groupby(posts.date_created.dt.hour)[['upvotes', 'comment_count']].mean().plot(legend=True)
plt.ylim(0,5)
plt.title('USV.com -- Average Upvotes and Comments by Hour of the Day')
plt.xlabel('Hour of the Day')
plt.ylabel('Average Count')
There is not a strong pattern between when something is posted and the number of upvotes and comments it receives.
This is what the same graph looks like for Product Hunt (data pulled from their API):
Image(filename='images/Counts-and-Upvotes-by-Hour-Product-Hunt.png')
Product Hunt has a strong pattern based on time of day. This is because Product Hunt resets every day; USV.com does not.
This could indicate an opportunity. If USV.com built in a predictable time cycle for new posts, it could lead to habitual visits, posts, and discussions. Reddit and Hacker News show you don't need a daily leaderboard that restarts at midnight. However, people should expect to see new content at some regular interval. AVC does this very well.
# posts.set_index('date_created')['comment_count'].resample('M').count()[:-1].plot()
# That is the evolution of monthly posts over time. USV.com was primarily a blog for occasional
# posts from 2006 - 2013. You can see when USV.com allowed anyone to submit links. By number of
# posts, this peaked in 2013. However, I think number of posts is the wrong metric.
First, we must define "likes to discuss". One way to define it is by the number of comments. Here is the distribution of posts by the number of comments received:
posts.comment_count.loc[posts.comment_count>=0].hist(bins=250)
plt.xlabel('Number of Comments')
plt.ylabel('Number of Posts')
plt.title('Distribution of Comments')
plt.xlim(0, 50)
plt.savefig('Distribution of Comments.png')
print "Comment Count Skew: " + str(posts.comment_count.skew().round(2))
As can be seen above, most posts get zero or very few comments. The skew is very positive (23+). Let's define a popular post as one that sparks a discussion by getting at least 5 comments.
# posts.comment_count.value_counts(normalize=True).round(4).sort_index()*100
# That is the distribution of comments. As you can see, 73% of posts got no comments. 90% got 3 or fewer comments.
top5_cutoff = posts.comment_count.quantile(.95)
# print "The top 5%% of posts got at least %s comments so that will be the cutoff." %int(top5_cutoff)
posts['sparked_conversation'] = posts.comment_count>top5_cutoff
posts['got_comments'] = posts.comment_count>0
posts[['title','body_text']] = posts[['title','body_text']].fillna('')
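For reference, here is a quick check (using only the columns just defined) of where the cutoff landed and what share of posts cleared it:
# The 95th-percentile cutoff and the share of posts that cleared it.
print("Cutoff: more than %d comments" % int(top5_cutoff))
print("Share of posts that sparked a conversation: %.1f%%" % (100 * posts.sparked_conversation.mean()))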
Optional Explanation of Math
Now, I will use the TF-IDF weighting of one- and two-word phrases that appear in at least 20 post titles. TF-IDF is just the word counts in each title, weighted by how rare those words are across all titles.
vec = CountVectorizer(ngram_range=(1,2),min_df=20,stop_words='english')
X_words = vec.fit_transform(posts.title)
transformer = TfidfTransformer()
tfidf = transformer.fit_transform(X_words)
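To make the weighting concrete, here is a small inspection sketch (not part of the analysis itself) of the inverse document frequencies the transformer learned; rarer phrases get larger weights:
# Peek at the learned IDF weights: higher IDF means the phrase appears in
# fewer titles, so its counts get boosted relative to common phrases.
idf_weights = pd.Series(transformer.idf_, index=vec.get_feature_names())
print("Rarest phrases (highest IDF):")
print(idf_weights.sort_values(ascending=False).head(5))
print("Most common phrases (lowest IDF):")
print(idf_weights.sort_values().head(5))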
Next, I fit a logistic regression model with regularization (an L2 penalty). It uses the TF-IDF weighted words to predict whether or not a post will spark a conversation. Logistic regression is not the most predictive model, but it is interpretable, which is the goal here.
Note that the data set is small relative to the number of features, which makes it hard to do proper cross validation. Instead, I am using regularization and requiring that a phrase appear in 20+ titles to reduce overfitting. Even so, I wouldn't try to make predictions with this model.
y = posts.sparked_conversation
model = LogisticRegression(penalty='l2')
model.fit(tfidf,y)
vocab = list(zip(vec.get_feature_names(),
                 model.coef_[0]))
df_vocab = pd.DataFrame(vocab)
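As a rough sanity check on the "don't make predictions with this" caveat, here is a sketch (using only the objects defined above) comparing in-sample accuracy to the base rate; it shows how little lift there is:
# In-sample accuracy vs. simply predicting "no conversation" for every post.
# In-sample numbers flatter the model, so treat this only as a sanity check.
print("In-sample accuracy: %.3f" % model.score(tfidf, y))
print("Base rate (always predict False): %.3f" % (1 - y.mean()))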
Popular Title Words
The higher the coefficient, the more likely a word is to spark a conversation. Here are the top 20 words:
df_vocab.columns = ['word','coef']
df_vocab.sort_values('coef',ascending=False).head(20)
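Because this is a logistic regression, exponentiating a coefficient gives a rough odds multiplier; this small sketch reuses df_vocab from above purely for interpretation:
# exp(coef) approximates the multiplier on the odds of sparking a conversation
# for a one-unit increase in that phrase's tf-idf weight.
df_vocab['odds_multiplier'] = np.exp(df_vocab.coef)
df_vocab.sort_values('coef', ascending=False).head(10)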
Above we looked at the titles. Now, let's check out the post descriptions. Using non-negative matrix factorization, I tried to pull out topics.
The post descriptions have too few words for perfectly reliable topic models. Many topics were sensitive to hyperparameters. Even with all of the noise, there were still some topics that stood out.
Below are some topics and their top 10 words. I did not use tags to find these. This came solely from which words appear together in descriptions.
posts['body_text_raw'] = posts.body_text
# Light clean-up of the description text before vectorizing
posts['body_text_clean'] = posts.body_text.str.replace(r'\[a-z][a-z][1-9]\[a-z][a-z][a-z]', '', case=False, regex=True)
posts['body_text_clean'] = posts.body_text_clean.str.replace('\'', '', case=False, regex=True)
posts['body_text_clean'] = posts.body_text_clean.str.replace('[a-z]/[a-z]', '', case=False, regex=True)
custom_stopwords= ['looks','look','read','great','good','dont','really','done','kik','lets',
'http','let','just','that','thats','like','lot','interesting','think','im',
'thought','thoughts','id','love','twitter']
my_stop_words = text.ENGLISH_STOP_WORDS.union(custom_stopwords)
# This step performs the vectorization, tf-idf weighting,
# stop word removal, and normalization.
# It operates on posts.body_text_clean, a Series with one description per post.
cv = TfidfVectorizer(ngram_range=(1, 1), max_df=0.6, min_df=4, stop_words=my_stop_words)
doc_term_matrix = cv.fit_transform(posts.body_text_clean)
# The tokens can be extracted as:
vocab = cv.get_feature_names()
#trial and error got me to 45
num_topics = 45
#doctopic is the W matrix
decomp = decomposition.NMF(n_components = num_topics, random_state=50,init = 'nndsvda')
doctopic = decomp.fit_transform(doc_term_matrix)
n_top_words = 10
topic_words = []
for topic in decomp.components_:
    idx = np.argsort(topic)[::-1][0:n_top_words]
    topic_words.append([vocab[i] for i in idx])
topic_names = [
"Web Services", "Bitcoin and Blockchain","AVC or Continuations","Customer Success",
"Mobile","USV Community","Startup Ecosystems","Data Privacy and Security",0,"Net Neutrality",1,"Long Read",2,
"Venture Capital",3,"Tech Job Market","HTML Tags","Internt Access",4,"Markets","Test Post",
"Big 4 Tech Co's","Linux & Cloud","Business Models","App Store",5,6,7,8,9,10,11,"iOS vs Android",12,
"AVC Posts",13,14,"Community Feedback","Technology and Patents","Video","Startup Building",15,
"Open Source","Product Development",16
]
# Output all named topics and their top 10 words
for count, i in enumerate(topic_words):
    # Skip the unnamed (integer placeholder) topics
    if isinstance(topic_names[count], int):
        continue
    print("Topic: %s" % topic_names[count])
    print("Top words: " + ", ".join(i))
    print()
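As a rough gauge of how prevalent each named topic is, here is an illustrative sketch that counts each post only by its single strongest topic:
# Count posts by their single strongest topic, keeping only the named topics.
dominant_topic = pd.Series(doctopic.argmax(axis=1)).map(lambda i: topic_names[i])
dominant_topic[dominant_topic.map(lambda x: isinstance(x, str))].value_counts().head(15)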
# voted_users holds a stringified list of usernames; strip the brackets, quotes,
# and unicode prefixes, then one-hot encode into one column per upvoter.
posts['clean_upvotes'] = posts.voted_users.str.replace('^u|\'|\[|\]', '', regex=True)
users_votes = posts.clean_upvotes.str.get_dummies(",")
users_votes.columns = users_votes.columns.to_series().str.replace('^u', '', regex=True)
posts = pd.concat([posts, users_votes], axis=1)
model = LogisticRegression(penalty='l2')
y = posts.sparked_conversation
model.fit(doctopic, y)
This is how well each topic predicts whether the community will discuss a post. This is a gross simplification: topic popularity probably changes over time, and different user segments probably have different preferences. Even so, it's quite interesting.
community_topics = pd.DataFrame([pd.Series(topic_names),model.coef_[0]]).T.sort_values(1,ascending=False)
community_topics.columns = ["Topic", "Coefficient"]
community_topics[community_topics.Topic.map(lambda x: isinstance(x, str))]
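To illustrate the "changes over time" caveat, here is a rough sketch (using the doctopic weights computed above, with the Bitcoin topic as an example) of one topic's average weight by year:
# Average weight of the "Bitcoin and Blockchain" topic by year posted.
# A rising line means descriptions lean more heavily on that topic over time.
bitcoin_idx = topic_names.index("Bitcoin and Blockchain")
bitcoin_by_year = pd.Series(doctopic[:, bitcoin_idx], index=posts.date_created.dt.year)
bitcoin_by_year.groupby(level=0).mean().plot()
plt.title('Average "Bitcoin and Blockchain" Topic Weight by Year')
plt.xlabel('Year')
plt.ylabel('Average Topic Weight')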
We can use these topic models to find posts on a topic. For example, "Bitcoin and Blockchain" is a clear topic. These are the "most Bitcoin" posts, sorted by how strongly each description is associated with the Bitcoin topic. This could be used for a recommendation engine. However, not all topics came through as cleanly as Bitcoin.
doctopic_df = pd.DataFrame(doctopic)
doctopic_df.columns = topic_names
posts_topics = pd.concat([posts,doctopic_df],axis=1)
topic = "Bitcoin and Blockchain"
posts_topics.sort_values( topic,ascending=False)[['poster','title',"body_text",topic]].head(20)
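As a sketch of the recommendation idea (nothing here is tuned; cosine similarity between rows of the topic matrix is just one simple way to do it), here are the posts whose topic mix is most similar to an arbitrary example post:
from sklearn.metrics.pairwise import cosine_similarity

# Find the posts whose topic mix is most similar to a given post.
# Index 0 is just an arbitrary example post.
similarities = cosine_similarity(doctopic[0:1], doctopic)[0]
most_similar = np.argsort(similarities)[::-1][1:6]  # skip the post itself
posts.iloc[most_similar][['poster', 'title']]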
Here I use the title and topics to predict if a poster will upvote or share something.
That creates a profile of topics and title words a poster likely finds interesting.
Here are some examples:
# Remake the tfidf word matrix with a threshold of 10 instead of 20 counts.
vec_user_profiles = TfidfVectorizer(ngram_range=(1,1),min_df=10,stop_words='english')
X_words = vec_user_profiles.fit_transform(posts.title)
def get_user_profile(user):
    # Topic model: which topics predict this user's upvotes
    model_topics = LogisticRegression(penalty='l2')
    y = users_votes[user]
    model_topics.fit(doctopic, y)
    vocab_topics = list(zip(topic_names, model_topics.coef_[0]))
    df_vocab_topics = pd.DataFrame(vocab_topics)
    df_vocab_topics.columns = ['topic', 'coef']
    df_vocab_topics = df_vocab_topics[(df_vocab_topics.coef > 0) &
                                      (df_vocab_topics.topic.map(lambda x: isinstance(x, str)))]
    # Title-word model: which title words predict this user's upvotes
    model_words = LogisticRegression(penalty='l2')
    y = users_votes[user]
    model_words.fit(X_words, y)
    vocab = list(zip(vec_user_profiles.get_feature_names(), model_words.coef_[0]))
    df_vocab = pd.DataFrame(vocab)
    df_vocab.columns = ['word', 'coef']
    df_vocab = df_vocab[df_vocab.coef > 0]
    print(user)
    print()
    print("Favorite Topics: " +
          ", ".join(df_vocab_topics.sort_values('coef', ascending=False).head(10)['topic']))
    print()
    print("Favorite Words: " +
          ", ".join(df_vocab.sort_values('coef', ascending=False).head(25)['word']))
    return None
get_user_profile('nickgrossman')
get_user_profile('albertwenger')
get_user_profile('BenedictEvans')
get_user_profile('fredwilson')
get_user_profile('aweissman')
get_user_profile('jmonegro')
print()
print("The words worked much better than the topics for Joel")
get_user_profile('pointsnfigures')
get_user_profile('kidmercury')