Start your notebook with Heading level 1 and give it a title. Follow with a narrative explaining what the notebook aims to do, where the data is sourced, and what the user can expect. Break down your notebook into smaller parts and use Heading levels 2, 3, 4, and so on for the hierarchy of topics and sub-topics. Markdown cells can be selected in Jupyter Notebook by using the drop-down or also by the keyboard shortcut 'm/M' immediately after inserting a new cell. Headings The Headings starts with '#,' i.e., hash symbol followed by the space, and there are six Headings with the largest heading only using one hash symbol and the smallest titles using six. See full list on tutorialspoint.com. Guide to the Install Jupyter Notebook. Here we discuss Installation Jupyter Notebook in two Different ways, and its Features. Jupyter notebooks is an open-source web-based Python editor which runs in your browser. It allows a combination of text written in a html-like format known as 'markdown', such as the block of text you're reading right now, and inline code, tools and outputs such as this one.
Markdown cell displays text which can be formatted using markdown language. In order to enter a text which should not be treated as code by Notebook server, it must be first converted as markdown cell either from cell menu or by using keyboard shortcut M while in command mode. The In[] prompt before cell disappears.
A markdown cell can display header text of 6 sizes, similar to HTML headers. Start the text in markdown cell by # symbol. Use as many # symbols corresponding to level of header you want. It means single # will render biggest header line, and six # symbols renders header of smallest font size. The rendering will take place when you run the cell either from cell menu or run button of toolbar.
Following screenshot shows markdown cells in edit mode with headers of three different levels.
When cells are run, the output is as follows −
Note that Jupyter notebook markdown doesn’t support WYSWYG feature. The effect of formatting will be rendered only after the markdown cell is run.
To render a numbered list as is done by <ol> tag of HTML, the First item in the list should be numbered as 1. Subsequent items may be given any number. It will be rendered serially when the markdown cell is run. To show an indented list, press tab key and start first item in each sublist with 1.
If you give the following data for markdown −
It will display the following list −
Each item in the list will display a solid circle if it starts with – symbol where as solid square symbol will be displayed if list starts with * symbol. The following example explains this feature −
The rendered markdown shows up as below −
Markdown text starting with http or https automatically renders hyperlink. To attach link to text, place text in square brackets [] and link in parentheses () optionally including hovering text. Following screenshot will explain this.
The rendered markdown appears as shown below −
To show a text in bold face, put it in between double underscores or two asterisks. To show in italics, put it between single underscores or single asterisks.
The result is as shown below −
To display image in a markdown cell, choose ‘Insert image’ option from Edit menu and browse to desired image file. The markdown cell shows its syntax as follows − Block app for mac.
Image will be rendered on the notebook as shown below −
In a markdown cell, a table can be constructed using | (pipe symbol) and – (dash) to mark columns and rows. Note that the symbols need not be exactly aligned while typing. It should only take respective place of column borders and row border. Notebook will automatically resize according to content. A table is constructed as shown below −
The output table will be rendered as shown below −
For this tutorial I am going to assume that you have some idea about using either Jupyter notebook or Python in general. I also assume that you have Anaconda installed, or know how to install packages into Python. If you do not, then I would first suggest putting a few minutes aside for installing Anaconda and taking a crash course in Jupyter.
I am breaking down the data that I’m going to work with because the things I’m going to talk in this post can be applied to any other data which looks similar – That is, a simple two column data, which when plotted will form a 2D line plot with an x and y-axis.
In my lab we use a spectrometer to collect data. For the uninitiated, a spectrometer is basically a fancy prism with a camera at the rainbow end to take a black and white picture (intensity) of the rainbow. The data in this case is formed by spatially dispersing an input light into its constituent colors (wavelengths of that color). The intensity for each color is recorded using a camera. That is two columns of data – Wavelength is the first column, in nanometers and Intensity is the second column (photon counts, let’s say). The data file, of a near-infrared spectrum around 900 nm, if opened in a text editor, would look as follows.
If you would like to use the same data file I am using, you can download it from here.
Now, note that ASCII files like these are easier to handle for us starters and should show good numbers when opened using notepad or Microsoft excel.
Other proprietary formats such as the ones that directly come out of our spectrometer, like .SPE formats (for Princeton instruments cameras) is a binary format. These will give you garbage if you try to open these with notepad or excel. There is a way to open these using Python, but you need to have a detailed information about the format and construct the code accordingly to read them properly. In this post I’m going to deal with simpler ASCII (text, CSV files etc) files only.
If you are dealing with SPE or other such difficult file formats, I would suggest using a file conversion software that usually comes with the equipment to export the binary file to txt or csv format.
Once you have the exported txt or csv, there’s a couple of ways to proceed. From my tests on time taken to read really large data files and for versatility (as you will see in the bonus tips), I have now settled on using pandas to read my files. This is a module that comes installed with Anaconda. Fire up your Jupyter notebook. To import this module you need to type in the following:
This imports the module pandas and all of the useful functions inside of it can be used using the “pd.” prefix. Similarly, the other scientific computing favorite, “numpy” is usually imported as “np” and you do it exactly what you did with pandas:
Pandas is a great library for handling big datasets but we will use only a small part of it to read our data file. Numpy is a library that makes life easier when handling arrays, hence we import that too. We will see in the following code how these will be used. Running your code at this point will do nothing that you will see. You will certainly see errors if the modules were not installed properly.
The most important thing you need to read a data file is its path – where is it in your computer? Say your file is in the same folder as your Jupyter notebook and is called “datafile.txt”. In this case you will first start by storing your file name in an arbitary variable, let’s say “filename”, like this:
This stores the name of your file in the variable “filename”. And the type of this variable is string type – because it is text. This is the reason you need to put the quotations around the string when assigning it to the variable in the code.
But since files are not always conveniently inside the same folder as your code, you can also have the full path of the file stored in this string. This is always better. So now, say your file is in a folder ‘D:/data/datafile.txt’, then you store your full path+filename inside the filename variable as follows:
To read the file using pandas now all you have to do is use the “read_csv” function from pandas as follows:
As long as you have a file with the column like data (shown previously) in it, you will immediately get a table as the output which for the type of data I showed above, would look like this:
But this clearly is not desirable. Pandas read it all into a single column. To take care of that, we need to specify the delimiter to the read_csv function. The delimiter tells the function what separates your second column from your first column. In this case the delimiter is “t” which is a string we use for tab in Python code. When it is separated by a space ” ” is the delimiter and when separated by commas, “,” is the delimiter and so on. The delimiter is specified using sep=”” in the read_csv function as follows:
This gives an output which looks like this:
Which is also not very desirable because the two bold numbers on the top tell me that it read the first row of my data as a header, or the title of my columns. Pandas is a bit too smart to do that, because unlike usual data files we do not have column names in the first row of our file. It’s all data. We need to tell Pandas that by adding this specification. Here’s how you do it.
And that gives us a nice output which has several horizontal rows (only 8 rows shown here) and 2 vertical columns. Note that the header is simply indices now. The data is read inside the table.
I have figured that for simple data with two column of numbers, using pandas beyond this step is an overkill and requires you to learn several other smart quirks it has. Therefore I prefer storing all of the data retrieved by Pandas, in the form of bare-bone array of numbers. To do this you simply add “.values” to the end of your pandas read_csv function and store all of the numbers inside a variable called “data”. Here’s how:
TLDR so far: In 3 lines of code and a few milliseconds of run time you get all of the data from your file! (we are not using the numpy import yet so that can be skipped). The final code we have now is:
The “data” variable now contains all the data from the filename. It is very convenient to access any part of the data or do math on it now. Native instruments for mac os.
Let’s say you need to get the x-axis column, i.e. the full first column of our file, into a variable called “x”. And let’s also do y-axis into a variable called “y” while we are at it. This can be done as follows:
Here the square brackets are used to access parts of an array. If we are dealing with a 2D array, like in the case of the “data” variable, we will need the row number and the column number of the data we want to access. These are put inside the square brackets where the first number represents the index of the row and second number is the index of the column, separated by a comma. In this case, since we want all the rows, the colon “:” gets all of the rows. Since x is the 0th index column (see the header in the data that was printed above), we tell the program to get all rows of 1st column (index 0) from the variable data by writing data[: , 0]. Similarly, all rows of 2nd column (index 1), by writing data[: , 1].
Fancier things can be done too. Like cutting and getting only a part of the data. Say you want all the x and y data from row 100 to row 200 (assuming you have that many rows), you can do this by writing:
Plotting is most popularly and conveniently done using a library called matplotlib. Since we already have x and y arrays, both of same sizes taken out from our data file, this becomes very easy. First we import the plotting function we will need using the following:
plt is a popular shortcut people use everywhere and that’s how I picked it up too. You may also use something of your own but it gets confusing fast. So I would suggest doing the same thing as everyone is doing. Same as pd for Pandas and np for numpy.
Now to plot we type in the following and run the cell.
The output for my data file shows as follows:
plt.plot(x,y) draws your line plot by taking the equal sized x and y arrays that we defined earlier. plt.scatter(x,y) can be used to make a scatter plot instead. While Python draws it in the background, it does not show it till you tell it to show what it drew. For that, we write the plt.show() part.
The color of the line, thickness, type of line, axis labels, text sizes etc can all be easily changed as per your liking. Refer to this documentation to do all of those things. And try making a green dashed line for your plot using plt.plot(x,y). Experiment by adding x and y labels to too.
TLDR: So now, the full code to import and plot in one place is as follows: Driver intel hd for mac.
Now let’s say you want to integrate the data from row 100 to row 200 (find area under the curve). For this the numpy we imported previously as np becomes useful. First we will define the part of the spectrum (x1 and y1) we want to integrate, if you haven’t done it previously.
yint variable now contains is the integrated area under the part of the spectrum and is calculated using the trapezoidal method under numpy. More about it here. Printing out of the yint value gives you the integrated number value as the output.
To show the part of the spectrum we are integrating we can use axvspan and show it as follows. We first plot the full x and y columns using plt.plot(x,y), then use axvspan to show the range from x[100] to x[200], color it red and also make it half transparent by adding alpha = 0.5.
TLDR: So now, our full code to read the data, plot it, cut it, integrate and show the cut part of plot becomes:
My output then looks like:
Putting a magic command, “%%time” at the starting of the cell allows you to calculate the time taken to run the cell. In this case, all of this was done in 180 ms on my machine.
Not specifying a delimiter by saying sep=”none” and adding engine=’python’ in the pd.read_csv as follows, allows you to not specify any delimiter and read all ASCII files with ease. Pandas is smart enough to identify it on its own. I have tested this both with CSV (comma delimiter), space delimiter and tab delimiters. So now, the same line of code works with most of my data files!
In case your file has header text in the first few lines, by saying header = None, you will be trying to read it in as data. You will end up getting garbage inside the variable data. To tackle this problem I would suggest adding skiprows=1, if you have one line of header text. Similarly, you would use skiprows=10 if you have 10 lines of header. In that case your code for pd.read_csv becomes: