How To Clear Jupyter Notebook
How to Un-Delete Your Jupyter Notebooks
The metadata science of hacking Jupyter notebooks with SQL and command line fu
Who doesn't love Jupyter notebooks? They're interactive, giving you the instant gratification of immediate feedback. They're extensible — you can even deploy them as websites. Most importantly for data scientists and machine learning engineers, they're expressive — they span the space between the scientists and engineers who manipulate data and the lay audience that consumes and wants to understand the information that data represents.
But Jupyter notebooks have their drawbacks. They're big JSON files that store the code, markdown, input, output, and metadata of every cell that you run. To understand what I mean, here's a short notebook I wrote to define and test the sigmoid function.
And here's what (part of) it looks like when IPython isn't rendering it (I've abridged all but the first actual input cell, because even for a short notebook it's long and ugly):
{
  "nbformat": 4,
  "nbformat_minor": 2,
  "metadata": {
    "language_info": {
      "name": "python",
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "version": "3.6.8-final"
    },
    "orig_nbformat": 2,
    "file_extension": ".py",
    "mimetype": "text/x-python",
    "name": "python",
    "npconvert_exporter": "python",
    "pygments_lexer": "ipython3",
    "version": 3,
    "kernelspec": {
      "name": "python36864bitvenvscivenv55fc700d3ea9447888c06400e9b2b088",
      "display_name": "Python 3.6.8 64-bit ('venv-sci': venv)"
    }
  },
  "cells": [
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "import random"
      ]
    },
    ...
  ]
}
It's kind of hard to version control them as a result. It also means there's a lot of junk data in there you usually don't care very much about saving, like the cell execution count and outputs.
The next time you wonder why your Jupyter notebook is running so slowly, open it in a plain-text editor and see how many massive dataframes are just hanging out in your notebook's cell outputs.
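If you want to see how much of that junk you can shed, here's a minimal sketch that clears outputs and execution counts from a notebook's JSON using only the standard library. The function name strip_outputs and the tiny example notebook are my own stand-ins, modeled on the excerpt above; a real notebook may carry other fields you want to keep.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with outputs and execution counts cleared."""
    # Round-tripping through JSON is a cheap deep copy for JSON-shaped data.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny stand-in for a real .ipynb file (hypothetical content).
example = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 2,
            "metadata": {},
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hi\n"]}
            ],
            "source": ["print('hi')"],
        }
    ],
}

cleaned = strip_outputs(example)
print(json.dumps(cleaned["cells"][0], indent=1))
```

For a file on disk, you would json.load it, pass the result through strip_outputs, and json.dump it to a new filename so the original stays untouched.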
Under the right circumstances, though, all that junk can look like precious gems. Those circumstances usually involve:
- Accidentally closing a notebook without saving it
- Hitting the wrong keyboard shortcut and deleting important cells
- Opening the same notebook in multiple browser windows and overwriting your own work
Anyone who's ever strayed into the dangerous territory of doing their development in an IPython notebook has done these things at least once, probably at the same time.
If this is you, and you're here because you Googled "recover deleted jupyter notebook refreshed browser window", don't panic. First I'm going to tell you how to fix it.* Then I'm going to tell you how to prevent it from happening again.
*Yes, you can fix it. (Probably.)
Requirements:
- A Python virtual environment with Jupyter, IPython (with nbformat and nbconvert) and jupytext installed (preferably a fresh venv, so you can test things out)
- Working installation of sqlite3 (optional but recommended: a database browser like SQLiteStudio)
- Hope and determination
Scenario: You accidentally deleted a couple of cells in an active notebook
(Unscientific) estimated probability of recovery: 90%
Relative difficulty: Easier
In the least-worst case, you've hit x on a cell you didn't actually want to delete, and now you want to get back its code or data.
Method 1: In and Out
When you use a notebook, the IPython kernel runs your code. The IPython kernel is a process that is separate from your Python interpreter. (That's also why you need to link a new kernel to a new virtual environment. The two are not automatically connected.) It sends and receives messages, like your code cells, using JSON.
When you run a cell or hit "save", the notebook server sends your code as JSON to a notebook file on your computer that stores your input and output. So the little words In and Out next to your cell aren't just words, they're containers that hold your session history: In is a list of your inputs, and Out is a dictionary of your outputs keyed by execution number. You can print them out and index into them.
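To make that concrete, here's a small sketch of walking those two containers. It uses stand-in values (fake_in and fake_out are made up) shaped the way IPython shapes them, a list of input strings with an empty first slot and a dictionary of results, so it runs anywhere. The helper name pair_history is mine, not part of IPython.

```python
def pair_history(inputs, outputs):
    """Pair each non-empty input with its output, or None if it had no output."""
    return [
        (number, source, outputs.get(number))
        for number, source in enumerate(inputs)
        if source.strip()
    ]


# Stand-ins shaped like IPython's In (a list) and Out (a dict).
fake_in = ["", "import numpy as np", "1 + 1", "x = 3"]
fake_out = {2: 2}

for number, source, result in pair_history(fake_in, fake_out):
    print(f"In [{number}]: {source}")
    if result is not None:
        print(f"Out[{number}]: {result!r}")
```

In a live notebook you would call pair_history(In, Out) instead, and the source of your deleted cell should be sitting in the list as long as you ran it at least once in this session.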
Method 2: IPython Magic
Use the %history line magic to print your input history (oldest first). This powerful command grants you access to your current and past sessions by absolute or relative number.
If the current IPython process is still connected, and you've installed nbformat in your virtual environment, execute this code in a cell to recover your notebook:
>>> %notebook your_notebook_filename_backup.ipynb
This magic renders the entire current session history as a new Jupyter notebook.
Well, that wasn't so bad.
This won't always work, and you might need to expend a bit of effort weeding extraneous cells from the output.
There are lots more things you can do with the history magic. Here are a few recipes I find useful:
- %history -l [LIMIT]: get the last n inputs
- %history -g -f FILENAME: writes your entire saved history to a file
- %history -n -g [PATTERN]: search your history with a glob pattern and print the session and line numbers
- %history -u: get only the unique history from the current session
- %history [RANGE] -t: get the native history, a.k.a. the IPython-generated source code (good for debugging)
- %history [SESSION]/[RANGE] -p -o: print input and output with the >>> prompt (nice for readmes and documentation)
If you've been working in a really big data science notebook for a long time, the %notebook magic strategy might produce a lot of noise that you don't want. Use the other parameters to whittle down the output of %history -g, then use jupytext (explained below) to convert the results.
Scenario: You closed an unsaved notebook
(Unscientific) estimated probability of recovery: 70–85%
Relative difficulty: Harder
Remember how we said version control is hard with notebooks? A kernel can connect to more than one frontend at the same time. Which means those two browser tabs with the same notebook open can access the same variables. Which is how you overwrote your code in the first place.
IPython stores your session history in a database. By default, you can find it under your home directory in a folder called .ipython/profile_default.
$ ls ~/.ipython/profile_default
db history.sqlite log pid security startup
Back up history.sqlite to a copy.
$ cp history.sqlite history-bak.sqlite
Open the backup, either in a database browser or via the sqlite3 command line interface. It has three tables: history, output_history, and sessions. Depending on what you want to recover, you may need to join all three, so brush up on your SQL.
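If you don't have a database browser handy, Python's built-in sqlite3 module can give you a quick overview of those tables. The sketch below lists each session with its start time, end time, and number of saved inputs. The column names in SCHEMA are my assumption about how the three tables are laid out, so check yours with .schema in the sqlite3 shell first. The in-memory database is a stand-in so the example runs without touching your real history, and list_sessions is a name I made up.

```python
import sqlite3

# Assumed layout of the three tables; verify against your own file.
SCHEMA = """
CREATE TABLE sessions (session INTEGER PRIMARY KEY, start TIMESTAMP,
                       "end" TIMESTAMP, num_cmds INTEGER, remark TEXT);
CREATE TABLE history (session INTEGER, line INTEGER, source TEXT,
                      source_raw TEXT, PRIMARY KEY (session, line));
CREATE TABLE output_history (session INTEGER, line INTEGER, output TEXT,
                             PRIMARY KEY (session, line));
"""


def list_sessions(conn):
    """Return (session, start, end, saved_inputs) rows, newest session first."""
    query = """
        SELECT s.session, s.start, s."end", COUNT(h.line)
        FROM sessions AS s
        LEFT JOIN history AS h ON h.session = s.session
        GROUP BY s.session
        ORDER BY s.session DESC
    """
    return conn.execute(query).fetchall()


# Stand-in data so the sketch runs on its own (hypothetical sessions).
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2020-03-01 09:00:00", None, None, ""),
        (2, "2020-03-02 10:00:00", "2020-03-02 11:00:00", 1, ""),
    ],
)
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?)",
    [
        (1, 1, "import numpy as np", "import numpy as np"),
        (1, 2, "x = np.arange(3)", "x = np.arange(3)"),
        (2, 1, "print('hi')", "print('hi')"),
    ],
)

sessions = list_sessions(conn)
for row in sessions:
    print(row)
```

Against the real thing, you would swap the in-memory connection for sqlite3.connect pointed at your history-bak.sqlite copy and skip the setup. A session with no end time is often one that died without shutting down cleanly, which may be exactly the one you're hunting for.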
Eyeball it
If you can tell from the database browser GUI which session number in the history table has your code, then your life is a bit simpler.
Either execute the SQL command in the browser or on the command line:
sqlite3 ~/.ipython/profile_default/history-bak.sqlite \
"select source || char(10) from history where session = 1;" > recovered.py
All this does is specify the session number and the filename to write to (in the example given, 1 and recovered.py) and select your source code from the database, separating each block with a newline character (which in ASCII is 10).
If you wanted to select the line number as a Python comment, you could do so with a query like:
"select '# Line:' || line || char(10) || source || char(10) from history where session = 1;"
Once you have an executable Python file, you can pretty much breathe easy. You can also turn it back into a notebook with jupytext, a miraculous tool that can convert plaintext formats to Jupyter notebooks.
jupytext --to notebook recovered.py
Not too terrible!
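If you'd rather stay in Python than shell out to sqlite3, the same export can be sketched with the standard library. This is a rough equivalent of the two queries above, not a replacement for them: export_session is a name I made up, the in-memory table is a trimmed-down stand-in for the history table, and the ORDER BY is my addition so the blocks come back in the order they ran.

```python
import sqlite3


def export_session(conn, session, with_line_numbers=False):
    """Return one session's inputs as Python source text, in execution order."""
    rows = conn.execute(
        "SELECT line, source FROM history WHERE session = ? ORDER BY line",
        (session,),
    )
    blocks = []
    for line, source in rows:
        # Optionally tag each block with its line number as a comment.
        header = f"# Line: {line}\n" if with_line_numbers else ""
        blocks.append(f"{header}{source}\n")
    return "\n".join(blocks)


# Trimmed-down stand-in for the history table (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (session INTEGER, line INTEGER, source TEXT)")
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [
        (1, 2, "x = np.arange(3)"),
        (1, 1, "import numpy as np"),
        (2, 1, "print('other session')"),
    ],
)

recovered = export_session(conn, 1, with_line_numbers=True)
print(recovered)
```

To save the result, write the returned string to recovered.py (for example with pathlib's Path.write_text) and hand that file to jupytext as shown above.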
Scenario: You opened your notebook in multiple tabs, reloaded an old version, erased all your work, and killed your kernel
(Unscientific) estimated probability of recovery: 50–75%
Relative difficulty: Hardest
None of the above worked, but you're not ready to give up yet.
Hard mode
Go back to whatever tool you're using to navigate your history-bak.sqlite database. The queries you write will require creative search techniques that make the most of the information you have:
- Timestamps for session starts (always non-null)
- Timestamps for session ends (sometimes null in useful ways)
- Output history
- Number of commands executed per session
- Your input (code and markdown)
- IPython's rendered source code
For example, you could find everything you wrote involving pytest this year with a query like:
select line, source from history join sessions on sessions.session = history.session where sessions.start > '2020-01-01 00:00:00.000000' and history.source like '%pytest%';
Once you've shaped your view to the rows you want, you can export it to an executable .py file as before, and convert it back to .ipynb with jupytext.
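If you find yourself running variations of that search over and over, it may be easier to wrap it in a small Python function with parameters. The sketch below mirrors the pytest query above using the standard library. search_history is a hypothetical helper, and the two in-memory tables are cut-down stand-ins for sessions and history that only carry the columns the query touches.

```python
import sqlite3


def search_history(conn, text, since):
    """Find inputs containing `text` from sessions started after `since`."""
    query = """
        SELECT h.session, h.line, h.source
        FROM history AS h
        JOIN sessions AS s ON s.session = h.session
        WHERE s.start > ? AND h.source LIKE ?
        ORDER BY h.session, h.line
    """
    # Placeholders keep quoting problems out of the SQL string.
    return conn.execute(query, (since, f"%{text}%")).fetchall()


# Cut-down stand-ins for the real tables (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (session INTEGER PRIMARY KEY, start TIMESTAMP)")
conn.execute("CREATE TABLE history (session INTEGER, line INTEGER, source TEXT)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [(1, "2019-12-31 23:00:00.000000"), (2, "2020-02-01 10:00:00.000000")],
)
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [
        (1, 1, "import pytest"),
        (2, 1, "import pytest"),
        (2, 2, "x = 1"),
    ],
)

matches = search_history(conn, "pytest", "2020-01-01 00:00:00.000000")
for session, line, source in matches:
    print(session, line, source)
```

The date filter works here because the timestamps are stored as text in year-first order, so a plain string comparison sorts them chronologically. That's an assumption worth checking against your own database before you trust the results.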
How to avoid needing this article
As you savor your relief at not having to rewrite your notebook from scratch, take a moment to ponder a few measures to guard against future expeditions into history.sqlite:
- Don't connect identical frontends to the same kernel. In other words, don't keep the same notebook open in multiple browser tabs. Using Visual Studio Code as your Jupyter IDE largely eliminates this risk.
- Back up your IPython history database file regularly just in case.
- Convert your notebooks to plaintext whenever possible — at least for backup. Jupytext makes this almost trivial.
- Use IPython's %store magic to store variables, macros, and aliases in the IPython database. All you need to do is find your ipython_config.py file in profile_default (or run ipython profile create if you don't have one), and add this line: c.StoreMagics.autorestore = True. You can store, alias, and access anything from environment variables to small machine learning models if you want to. Here's the full documentation.
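The "back up your history database" advice in the list above is easy to automate. Here's one possible sketch using the backup method on Python's sqlite3 connections (available since Python 3.7), which copies a database safely even if something else has it open. backup_history is a name I made up, the timestamped filename pattern is just one option, and the demo runs against a throwaway database in a temp folder rather than your real history.sqlite.

```python
import sqlite3
import tempfile
from datetime import datetime
from pathlib import Path


def backup_history(db_path, backup_dir):
    """Copy a SQLite file into backup_dir under a timestamped name."""
    db_path = Path(db_path)
    if not db_path.is_file():
        # sqlite3.connect would silently create an empty file otherwise.
        raise FileNotFoundError(db_path)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{db_path.stem}-{stamp}{db_path.suffix}"
    source = sqlite3.connect(db_path)
    dest = sqlite3.connect(target)
    try:
        source.backup(dest)
    finally:
        dest.close()
        source.close()
    return target


# Demo against a throwaway database (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    demo_db = Path(tmp) / "history.sqlite"
    conn = sqlite3.connect(demo_db)
    conn.execute("CREATE TABLE history (session INTEGER, line INTEGER, source TEXT)")
    conn.execute("INSERT INTO history VALUES (1, 1, 'x = 1')")
    conn.commit()
    conn.close()

    copy = backup_history(demo_db, Path(tmp) / "backups")
    check = sqlite3.connect(copy)
    recovered_rows = check.execute("SELECT source FROM history").fetchall()
    check.close()
    print(copy.name, recovered_rows)
```

For real use you would point db_path at the history.sqlite file in your profile_default folder and run the function on whatever schedule suits you.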
What are some of your biggest challenges with Jupyter notebooks? Drop a comment on topics you'd like to tackle in future posts.
Resources
- Power up your Python Projects with Visual Studio Code
- Advanced Google Skills for Data Science
- Jupytext
- IPython storemagic
- IPython history magic
- SQLiteStudio
Source: https://towardsdatascience.com/how-to-un-delete-your-jupyter-notebooks-1289e741705f
Posted by: jacksontopoicusin1947.blogspot.com
