ML Exploration: Titanic Dataset

In summer 2019 I blogged about how I was taking a couple months to work on Machine Learning.

Since then I’ve mostly focused on software for the Mac and server-side development. My hands-on ML knowledge was getting a bit rusty… Plus, things have evolved a bit: new technologies, new approaches, new concepts… Perfect timing, as the new edition of the Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow book was recently released.

I’ll be re-reading it and redoing all the exercises. Below you’ll find the first major exercise, the Titanic dataset, which I completed yesterday. Today I started on a spam filtering model. It is an excellent book.



import pandas as pd
import matplotlib.pyplot as plt

titanic_train_data = pd.read_csv('titanicData/train.csv')

X_train = titanic_train_data.drop(labels='Survived', axis=1).copy()
y_train = titanic_train_data[['Survived']].copy()



#We have a total of 891 entries. Some columns have missing values:
# -Cabin is known for only 204 entries.
# -Age is known for only 714 entries.
# -Embarked is known for only 889 entries.

#Key insights:
# – People are quite young, with the median age at 28 and the mean at 29.
# – Most people were in 2nd or 3rd class.
# – Most people did not travel with siblings or spouses (SibSp); same for parents or children (Parch).
# – Fare varies significantly and could be an indication of the quality of the room.
#3 class types.

#From 1 to 8.

#S, C, Q or nan.

#More male than female, 577 male vs 314 female.

#A lot more 3rd class than 1st; funnily enough, more 1st than 2nd.
#Plotting split between classes
class_counts = X_train['Pclass'].value_counts()
plt.pie(x=class_counts, labels=class_counts.index, autopct='%1.0f%%')
plt.show()

#Plotting where people boarded the Titanic (S, C or Q)
embarked_counts = X_train['Embarked'].value_counts()
plt.bar(x=embarked_counts.index, height=embarked_counts)
plt.show()


#Feature engineering, combine Siblings and Spouses together with Children and Parents
#X_train['Siblings'] = X_train['SibSp'] + X_train['Parch']

#Remove data we won't be using
#X_train = X_train.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin', 'SibSp', 'Parch'])

#Test that it worked correctly
from sklearn.base import BaseEstimator, TransformerMixin

class PrepareData(BaseEstimator, TransformerMixin):
    'Feature engineering, all custom changes are done in this class'
    def fit(self, X, y=None):
        #Nothing to learn from the data, so fit is a no-op
        return self
    def transform(self, X):
        X = X.copy() #Work on a copy so the caller's DataFrame is not modified
        print(f'About to process {len(list(X))} items -> {list(X)}')
        X['Siblings'] = X['SibSp'] + X['Parch']
        print(f'Having {len(list(X))} items -> {list(X)}')
        X = X.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin', 'SibSp', 'Parch'])
        print(f'Returning {len(list(X))} items -> {list(X)}')
        return X

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

num_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('std_scaler', StandardScaler())
])

#Get the headers
X_train_num_cols = ['Age', 'Siblings', 'Fare', 'Pclass']
X_train_cat_cols = ['Sex', 'Embarked']
#Get numerical values and non-numerical values
ext_pipeline = ColumnTransformer([
    ('num', num_pipeline, X_train_num_cols),
    ('cat', OneHotEncoder(handle_unknown='ignore'), X_train_cat_cols)
])

full_pipeline = Pipeline([
    ('custPrep', PrepareData()),
    ('ext_pipe', ext_pipeline)
])

X_train_prepared = full_pipeline.fit_transform(X_train)

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

neigh_clf = KNeighborsClassifier(n_neighbors=3, n_jobs=-1)
score = cross_val_score(neigh_clf, X_train_prepared, y=y_train.values.ravel(), cv=5)
score.mean() #80% is not bad considering 60% died and 40% survived
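To put that 80% in context, it helps to compare against a baseline that always predicts the most frequent class. A minimal sketch, assuming synthetic labels with the rough 60/40 split mentioned above (not the real Titanic data). X_demo, y_demo and baseline_clf are names made up for this illustration:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Synthetic labels with roughly the class balance mentioned above
# (about 60% died, 40% survived). These are NOT the real Titanic labels.
y_demo = np.array([0] * 60 + [1] * 40)
# A majority-class baseline ignores the features, so zeros are enough.
X_demo = np.zeros((len(y_demo), 1))

# Always predicts the most frequent class seen during fit.
baseline_clf = DummyClassifier(strategy='most_frequent')
baseline_scores = cross_val_score(baseline_clf, X_demo, y_demo, cv=5)
baseline_accuracy = baseline_scores.mean()
print(f'Baseline accuracy: {baseline_accuracy:.2f}')
```

Any real model should beat this number comfortably. If it does not, the features are not adding much.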

#Death rate
from sklearn.model_selection import GridSearchCV

param_grid = [{
    'n_neighbors': [3, 15, 30, 40, 50],
    'leaf_size': [15, 20, 30, 35, 45],
    'weights': ['uniform', 'distance']
}]

neigh_clf = KNeighborsClassifier()
grid_search = GridSearchCV(neigh_clf, param_grid, cv=3, return_train_score=True)
grid_search.fit(X_train_prepared, y_train.values.ravel())
grid_search.best_params_
#{'leaf_size': 15, 'n_neighbors': 30, 'weights': 'uniform'}
neigh_clf = grid_search.best_estimator_
neigh_clf.fit(X_train_prepared, y_train.values.ravel())

X_test = pd.read_csv('titanicData/test.csv')
#y_test_withId = pd.read_csv('titanicData/gender_submission.csv')
#y_test = y_test_withId.drop(columns=['PassengerId'])

X_test_prepared = full_pipeline.transform(X_test)

from sklearn.metrics import accuracy_score
y_test_pred = neigh_clf.predict(X_test_prepared)
#accuracy_score(y_test, y_test_pred) can't be used as the y_test data is fake. Need to submit to Kaggle to get the real score


from sklearn import svm

svm_clf = svm.SVC(kernel='poly')
svm_clf.fit(X_train_prepared, y_train.values.ravel())
y_test_pred = svm_clf.predict(X_test_prepared)
#accuracy_score(y_test, y_test_pred)

#Let's try with a linear kernel
svm_clf = svm.SVC(kernel='linear')
svm_clf.fit(X_train_prepared, y_train.values.ravel())
y_test_pred = svm_clf.predict(X_test_prepared)
#accuracy_score(y_test, y_test_pred)
#With a linear kernel we can also read the feature coefficients from svm_clf.coef_

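#And confusion matrix (commented out, as it needs real y_test labels)
#ConfusionMatrixDisplay replaces plot_confusion_matrix, which was removed in scikit-learn 1.2
from sklearn.metrics import ConfusionMatrixDisplay
#ConfusionMatrixDisplay.from_estimator(svm_clf, X_test_prepared, y_test.values.ravel())

#And directly calculating the numbers and graphing them in a different way
from sklearn.metrics import confusion_matrix
#conf_mx = confusion_matrix(y_test, y_test_pred)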

y_test_withId = pd.read_csv('titanicData/gender_submission.csv')
y_test_withId['Survived'] = y_test_pred
y_test_withId.to_csv('submission.csv', index=False)
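One variation worth trying: in the walkthrough above the data is prepared once and then cross-validated, so the imputer and scaler see every fold. Putting the classifier inside the pipeline makes each fold re-fit the preprocessing on its own training split. Below is a minimal sketch on a small synthetic DataFrame, so it runs without the Kaggle files. The data, the made-up target and the names demo, preprocess and model are all assumptions for illustration, not part of the exercise:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Small synthetic frame that only mimics a few of the Titanic columns.
rng = np.random.default_rng(42)
n_rows = 60
demo = pd.DataFrame({
    'Age': rng.integers(1, 80, size=n_rows).astype(float),
    'Fare': rng.uniform(5, 100, size=n_rows),
    'Sex': ['male', 'female'] * (n_rows // 2),
    'Embarked': ['S', 'C', 'Q'] * (n_rows // 3),
})
demo.loc[::7, 'Age'] = np.nan  # simulate missing ages
y_demo = (demo['Sex'] == 'female').astype(int)  # made-up target for the demo

preprocess = ColumnTransformer([
    ('num', Pipeline([
        ('imputer', SimpleImputer(strategy='median')),
        ('std_scaler', StandardScaler()),
    ]), ['Age', 'Fare']),
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['Sex', 'Embarked']),
])

# Preprocessing and classifier in one estimator: cross_val_score clones and
# re-fits the whole chain on the training part of every fold.
model = Pipeline([
    ('preprocess', preprocess),
    ('knn', KNeighborsClassifier(n_neighbors=3)),
])

scores = cross_val_score(model, demo, y_demo, cv=5)
print(scores.mean())
```

The same pattern should work with GridSearchCV, where the parameter names then take a step prefix such as knn__n_neighbors.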

NewsWave 2021.5 for Mac & iOS

I’m happy to report that NewsWave 2021.5 for Mac & iOS has been submitted to the App Store. This is a minor update for both apps, focusing on improving stability and addressing minor edge-case bugs.

This includes better handling of posts that return ‘NULL’ as the summary, and edge-case handling for certain websites, like ‘Engadget’, that return an HTML-escaped apostrophe instead of the apostrophe itself.
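For the curious, this kind of clean-up can be sketched in a few lines of Python. This is not NewsWave’s actual code, which is not shown here. clean_summary is a hypothetical helper, and it assumes the stray characters are HTML character references such as &#39;:

```python
import html
from typing import Optional


def clean_summary(raw_summary: Optional[str]) -> str:
    """Hypothetical helper: normalise a feed summary before display."""
    # Treat a missing summary, or the literal text 'NULL', as "no summary".
    if raw_summary is None or raw_summary.strip().upper() == 'NULL':
        return ''
    # Decode HTML character references (e.g. &#39; or &amp;) into plain text.
    return html.unescape(raw_summary).strip()


print(clean_summary('NULL'))
print(clean_summary('Apple&#39;s new chip'))
```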

If you have any comments or feedback do reach me @MarcMasVi on Twitter or

Hope you enjoy the update,


Inspecting mac pkg installers

Have you ever wondered what an installer package is up to? Why is the developer not just providing a dmg with an app? Well, wonder no more! Behold… ‘Suspicious Package’


Yes, the app name is ‘Suspicious Package’ and it’s awesome. Just drop a package on it and it will tell you exactly what the package is up to.

I wish I had known about this app sooner. Also, it’s totally free and you can get it from here. Kudos to the indie developer behind it for the great work.


NewsWave for Mac 2021.01

NewsWave 2021.01 for Mac is live in the App Store. This update is all about Big Sur and Apple Silicon. If you’re on the latest OS, it will improve the app big time. 

Key changes include:

  • NewsWave is now a Universal Binary for Apple Silicon & Intel.
  • Fully compatible with macOS Big Sur: table navigation, cell selection, icon appearance…
  • Several minor improvements & bug fixes, especially around synchronization.
  • Privacy labels so that users know exactly what information is used and for what. 


I hope you like it! As always, if you have any feedback please do get in touch. 


2020 in review

One of the things I really enjoy about the Christmas break is how much it helps to put things in perspective. It may be the copious amounts of food, the change in schedule, the time to think…

Whatever the reason, it really helps to assess how things have gone and where to go next. In this post I’ll be focusing on the former.

Looking back at the roadmap for 2020 I published last year, there were 3 major milestones I was planning for 2020. Here’s the end of the year summary:

  • NewsWave onboarding redesign: improving the onboarding experience to make it seamless without losing any user features.
    • In January, the 2020.1 update completely overhauled the onboarding user experience. It effectively made it frictionless through a combination of server-side and app changes. 
  • NewsWave for Mac: releasing a fully featured Mac-native version of NewsWave. 
    • After several months of development, NewsWave for Mac was launched on May 26th. Since then, it has had multiple updates to improve the experience and further refine it based on customer feedback.
  • Excelling: update its codebase to leverage newer technologies introduced since its launch. 
    • Shortly after the release of the macOS version of NewsWave I decided against rewriting Excelling in SwiftUI & Swift for now. The app is still performing correctly and I do not feel the improvements from a rewrite justify the opportunity costs.

In addition to the above, I started spending more time on non-Apple technologies such as Python, LAMP systems and ML. This will allow me to create better full-stack, multi-OS applications and services in the future.

I’m quite pleased with the progress in 2020 and I’m really excited about all the 2021 possibilities. I’ll be focusing on that part in an upcoming post.

Side note, if you’re interested in Python, or you’d like to refresh your knowledge I strongly recommend this free course from dabeaz -> 

Comments/feedback? Do reach me @MarcMasVi on Twitter or


NewsWave 2020.4 for Mac

Today I submitted to the App Store what will likely be the final 2020 update, NewsWave 2020.4 for Mac. This is a minor bug fix update to improve unit testing, address bugs & improve UX. 

Key 2020.4 changes include:

– Fixed a bug that could prevent reading position from syncing correctly. 

– Improved UX in several areas, including when a feed has no posts to show. 

– Improved debugging & unit testing. 

Provided all goes well with App Review, it should become available for download in the next couple of days. 

If you have any comments or feedback do reach me @MarcMasVi on Twitter or

Hope you enjoy the update. Until next time, 


NewsWave 2020.3.1 for Mac

Today I submitted NewsWave for Mac 2020.3.1, a minor bug fix update.

I would typically wait a few more weeks to combine more enhancements & fixes, but this update addresses an especially elusive and annoying bug.

If you opened NewsWave for Mac from scratch and immediately moved it to the background (i.e. doing something else while the app fetched new content), the app would -sometimes- not scroll correctly to the latest article you had read.

As with most complex bugs, this seemed to happen at random, making it very ‘fun’ to track down. On top of the conditions above, the bug would only trigger if the user had used another device -i.e. an iPhone- and read newer content. 

In addition to the ‘fun’ bug, this release adds a couple of other minor improvements for users that like the ‘Directly opens web page’ setting. Provided there are no surprises with App Review, it should be available in a day or two.

If you have any comments or feedback do reach me @MarcMasVi on Twitter or

Hope you enjoy the update. Until next time, 


NewsWave 2020.3 for Mac

I’m happy to report that NewsWave 2020.3 for Mac has been submitted to the App Store.

This update improves the app based on the feedback received since launch. In addition to bug fixes and UX improvements, I’ve taken the opportunity to expand the number of unit tests that verify each app change. I’ve also tweaked the App Store name from ‘NewsWave Reader’ to ‘NewsWave – News Reader+’ to improve discoverability.

Provided all goes well with App Review it should become available for download in the next couple of days. 

Key 2020.3 changes include:

-Fixed a bug that could result in the setting “Show images in Feed” being ignored.
-Search text will now be correctly reset if user clicks on its sidebar icon.
-When removing an article from the bookmarks section using the key shortcut, the next article now becomes selected.
-Improved wording on helper messages explaining how to add more devices to the user subscription.
-Fixed bug that would show an incorrect dark-mode background color when a search for feeds returned no results.
-Fixed bug that could trigger a message suggesting to add feeds when the right conditions were not met.
-The app may trigger a one-time rating request if the user has read all articles and has been using the app for quite some time.
-Fixed bug that allowed selection of multiple cells if the spacebar was pressed.

Comments / questions?  You can reach me @MarcMasVi on Twitter or

Hope you enjoy the update, please let me know if you have any feedback. Until next time, 


A different approach to email with ‘Hey’

Email is one of the most widespread technologies of all time, and it has enabled so much… At the same time, it was designed a long time ago, when the internet was very different.

When I read about ‘Hey’, Basecamp’s attempt at improving email and addressing many of its current shortcomings, I was intrigued. I spend quite a bit of time on emails after all…


After a few weeks I have to say it’s a very interesting concept. I’ll keep my current setup for now, but I found it compelling enough to subscribe for one year. I will try using the address for all development engagement with my customers; its features will come in quite handy.

Even if you’re not interested in switching, their approach is well worth a read. There’s also a video from the CEO where he walks through the features. 

Comments / questions?  You can reach me @MarcMasVi on Twitter or