Recently I needed to install Drupal 7 non-interactively on a Linux server running a typical LAMP software stack. There is a lot of information out there already for this type of installation, but nothing concise for what I needed to do. Hopefully adding this one to the pile will help someone else going through a similar exercise. If you find it useful, please let me know!
What you get after installation with this custom install profile:
- Drupal 7 core
- Core + selected module(s) installed and enabled
- A custom user role and user name passed in on the command-line
Server Setup:
For a testing sandbox it will be easiest to use a Linux box or VM that has the following installed:
- Apache or lighttpd w/ php and sqlite
- php w/ sqlite
- Drush v5.x
Use your OS’s packaging system to install the webserver with php and database support.
Make sure your drush is version 5.x. To install it using php pear:
pear channel-discover pear.drush.org
pear install drush/drush
Or see http://drupal.org/project/drush
Drush Make
The command “drush make” will read a makefile that has some directives to tell drush where to fetch the core, modules and themes. Here is the drush makefile that we will use that installs the core plus the Role Delegation module.
; Core Drupal
; -------------
core = 7.x
api = 2
projects[drupal][version] = 7

; Modules
; -------------
projects[role_delegation][subdir] = contrib
You can tell drush where to fetch the drupal core and the module but in this case it will get it from drupal.org. There are more powerful things you can do with the makefile, see the docs for more info.
To use this makefile and Drush to install Drupal into a web directory:
drush make /path/to/makefile /path/to/install/dir
This will download and unpack the Drupal core and the module into a directory, which then allows you to run the install wizard by pointing a browser at it. What you most likely want, however, is to install Drupal automatically and get everything set up in one shot. If all you want to do is install the core and an admin user then you can stop reading here; use the drush site-install command, which has command-line arguments for the basic options.
Read on if you want to learn how to make additional customizations.
Drupal Install Profiles
The standard install profile is the default one that is used for a vanilla Drupal core installation. Using that profile as a starting point you can start making customizations like adding modules to the default list and you can add your own install steps. This example will create an additional user and permission role called “Content Creator.” This user will have the abilities to create new article content and some admin abilities like changing themes.
Copy the standard profile into a new one called my_profile
cd /path/to/install/dir
cp -r profiles/standard profiles/my_profile
In the “my_profile” directory, rename all references to the name “standard” to “my_profile.”
my_profile.info
The info file has key=value pairs that tell the installer which modules will be activated. Add the “role_delegation” module to the list; this is necessary to activate the module on install.
...
dependencies[] = role_delegation
my_profile.profile
This file can be customized to add additional install steps. We are going to add a step to ask for a username and password for an additional user who will be given the custom “content creator” user role. Add the following php code to the end of my_profile.profile.
...
/**
 * Implements hook_install_tasks().
 */
function my_profile_install_tasks() {
  $tasks = array();
  // Add a page allowing the user to specify a "content creator" user
  $tasks['my_profile_cc_form'] = array(
    'display_name' => st('Content creator username'),
    'type' => 'form',
  );
  return $tasks;
}

/**
 * Task callback: returns the form allowing the user to add
 * a "content creator" user
 */
function my_profile_cc_form() {
  drupal_set_title(st('Content Creator Username'));
  $form['cc_uid'] = array(
    '#type' => 'textfield',
    '#title' => st('Username for Content Creator:'),
    '#description' => st('Enter the content creator userid'),
  );
  $form['cc_email'] = array(
    '#type' => 'textfield',
    '#title' => st('Email for Content Creator:'),
    '#description' => st('Enter the content creator email'),
  );
  $form['cc_pass'] = array(
    '#type' => 'textfield',
    '#title' => st('Password for Content Creator:'),
    '#description' => st('Enter the content creator password in both fields'),
  );
  $form['actions'] = array('#type' => 'actions');
  $form['actions']['submit'] = array(
    '#type' => 'submit',
    '#value' => st('Create content creator role and user'),
    '#weight' => 15,
  );
  return $form;
}

/**
 * Submit callback: creates the "content creator" role and user
 */
function my_profile_cc_form_submit(&$form, &$form_state) {
  $uid = $form_state['values']['cc_uid'];
  $email = $form_state['values']['cc_email'];
  $pass = $form_state['values']['cc_pass'];

  // Create a role for "content managers"
  $c_role = new stdClass();
  $c_role->name = 'content manager';
  user_role_save($c_role);

  // additional permissions beyond what the authenticated
  // user receives
  user_role_grant_permissions($c_role->rid, array(
    'assign content manager role',
    'create article content',
    'edit own article content',
    'delete own article content',
    'create page content',
    'edit own page content',
    'delete own page content',
    'administer themes',
  ));

  // The username comes from the cc_uid field, not the password field
  $cc_user = array(
    'name' => $uid,
    'pass' => $pass,
    'roles' => array($c_role->rid => $c_role->rid),
    'mail' => $email,
    'status' => 1, // status: active
  );
  $user = user_save(NULL, $cc_user);
}
- my_profile_install_tasks() – adds the additional install task
- my_profile_cc_form() – specifies the custom form to get user input
- my_profile_cc_form_submit() – runs when the form is submitted, creates a custom role with limited permissions and creates the user with this role assigned.
my_profile.install
This file has php code that will run on installation; it sets up the basic views, sets the theme, etc. This can be changed or extended in whatever way you want, but for this example we will stick with the standard setup.
Once this profile is created it will be available as a new option on the install wizard:
And the new custom screen to create the “content creator” user:
There are many different form elements and attributes that you can read about in the Drupal form api documentation. For this example it would probably be better to use a password_confirm text input for the password but since the goal is to automate it from the commandline it doesn’t really matter.
Automating everything from the command-line
Now that the profile is customized, it would be nice to input everything from the command-line instead of running through the wizard. To do this, drush site-install gives you the option of passing form parameters on the command line.
Here is what you would do to automate everything from the command-line where shell variables correspond to the “CC” (content creator) user information and admin account info.
(create the makefile)
drush make /path/to/makefile /path/to/install/dir
(create the new profile)
cd /path/to/install/dir
drush -y site-install --clean-url=0 \
  --db-url=sqlite:sites/default/files/db.sqlite \
  --account-name=$ADMIN_USER --account-pass=$ADMIN_PASS \
  --account-mail=$ADMIN_MAIL --site-mail=$SITE_MAIL \
  my_profile \
  my_profile_cc_form.cc_uid=$CC_USER \
  my_profile_cc_form.cc_email=$CC_MAIL \
  my_profile_cc_form.cc_pass=$CC_PASS
“my_profile_cc_form” is the name of the form for the custom “Content Creator” install step, and cc_uid, cc_email, and cc_pass are the parameters that are entered. When that command completes you will have a fully functional site with a custom user and role.
If something isn’t working for you, I have put all of the above into a standalone shell script. Simply install drush and change the variable assignments at the top to suit your needs.
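If you would rather drive this from a script, here is a minimal sketch in Python that only assembles the argument list for the drush site-install call shown above; it does not run drush. The function name build_site_install_args and all of the sample values are hypothetical. The flags and the form_name.field=value keys are the ones used in this post.

```python
import shlex


def build_site_install_args(profile, db_url, admin, form_values):
    """Build the argument list for `drush site-install` (does not run it)."""
    args = [
        "drush", "-y", "site-install",
        "--clean-url=0",
        "--db-url=" + db_url,
        "--account-name=" + admin["name"],
        "--account-pass=" + admin["pass"],
        "--account-mail=" + admin["mail"],
        "--site-mail=" + admin["site_mail"],
        profile,
    ]
    # Values for custom install-task forms are passed as form_name.field=value
    for form_name, fields in form_values.items():
        for field, value in fields.items():
            args.append("{}.{}={}".format(form_name, field, value))
    return args


# Hypothetical sample values, for illustration only
args = build_site_install_args(
    profile="my_profile",
    db_url="sqlite:sites/default/files/db.sqlite",
    admin={"name": "admin", "pass": "secret", "mail": "admin@example.com",
           "site_mail": "site@example.com"},
    form_values={"my_profile_cc_form": {
        "cc_uid": "editor",
        "cc_email": "editor@example.com",
        "cc_pass": "secret2",
    }},
)
print(shlex.join(args))
```

Passing the values as a list (for example to subprocess.run with cwd set to the install directory) avoids shell quoting problems if a password contains spaces or special characters.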
Filed under: Uncategorized | Leave a Comment
Transcripts of the Sitting Justices
This post takes a look at Supreme Court transcripts as a corpus for natural language processing. Lately I’ve been playing around with the nltk python module and I thought this might be an interesting data set (given that I’m also an avid follower of SCOTUS).
Supreme Court transcripts are made available by supremecourt.gov and can be downloaded in PDF form. Extracting data from the PDFs is not an exact science since the format varies a bit. I can get a majority of the cases broken up by speaker for what is available for download. The script does its best to compensate for transcription errors, typos, etc. Nothing is perfect though so it’s only useful to look at this data in the aggregate.
The transcripts are available organized by speaker name or by case.
Stats from all available transcript data
| | statements | sentences | words | stopwords | unique words |
|---|---|---|---|---|---|
| JUSTICE_ROBERTS | 10338 | 21925 | 127500 | 167153 | 8683 |
| JUSTICE_ALITO | 3042 | 6706 | 51472 | 70438 | 5897 |
| JUSTICE_SCALIA | 11870 | 27347 | 149102 | 220758 | 9239 |
| JUSTICE_THOMAS | 8 | 19 | 114 | 169 | 86 |
| JUSTICE_KENNEDY | 6081 | 12686 | 76322 | 113082 | 7125 |
| JUSTICE_BREYER | 9105 | 31739 | 181260 | 276786 | 9202 |
| JUSTICE_GINSBURG | 7336 | 17898 | 118725 | 169126 | 8740 |
| JUSTICE_KAGAN | 912 | 2222 | 17048 | 22474 | 3026 |
| JUSTICE_SOTOMAYOR | 3263 | 7145 | 43937 | 63213 | 5277 |
| other speakers | 59531 | 185848 | 1552665 | 1952296 | 21500 |
“Other speakers” are petitioners and respondents (not justices)
“stopwords” are high frequency words like the, to, also, etc.
“words”, “unique words” do not include “stopwords”
Stats from 500 randomly selected statements
| | sentences | words | stopwords | unique words |
|---|---|---|---|---|
| JUSTICE_ROBERTS | 1061 | 6971 | 7900 | 1952 |
| JUSTICE_ALITO | 1086 | 9563 | 11618 | 2443 |
| JUSTICE_SCALIA | 1146 | 7435 | 9354 | 1989 |
| JUSTICE_KENNEDY | 1045 | 7508 | 9239 | 2121 |
| JUSTICE_BREYER | 1729 | 11315 | 14582 | 2329 |
| JUSTICE_GINSBURG | 1206 | 9298 | 11493 | 2483 |
| JUSTICE_KAGAN | 1245 | 10867 | 12727 | 2250 |
| JUSTICE_SOTOMAYOR | 1150 | 8189 | 10051 | 2170 |
| other speakers | 1484 | 13256 | 15047 | 3178 |
Justice Thomas is not included in the above data set because there are only 8 statements from him in the generated corpus
“Other speakers” are petitioners and respondents (not justices)
“stopwords” are high frequency words like the, to, also, etc.
“words”, “unique words” do not include “stopwords”
[easychart type="horizbar" height="90" title="num words / sentences from 500 random transcript statements" groupcolors="2E2EFE" groupnames= "Supreme Court Justices" valuenames="Roberts, Alito, Scalia, Kennedy, Breyer, Ginsburg, Kagan, Sotomayor, Other Speakers" group1values="6.57,8.81,6.49,7.18,6.54,7.71,8.73,7.12,8.93" ]
Linguistic diversity is a coarse measure of a varied vocabulary. The chart below displays the total number of unique words divided by the total number of words.
[easychart type="horizbar" height="90" title="Linguistic diversity from 500 random transcript statements" groupcolors="2E2EFE" groupnames="Supreme Court Justices" valuenames="Roberts, Alito, Scalia, Kennedy, Breyer, Ginsburg, Kagan, Sotomayor, Other Speakers" group1values="0.28,0.26,0.27,0.28,0.21,0.27,0.21,0.26,0.24" ]
Nothing too interesting or surprising here. Justices may use fewer words in their sentences for a variety of reasons. This data isn’t normalized to factor out introductions or interruptions. Nevertheless the trend appears to be that Justice Kagan’s and Justice Alito’s sentences are longer than the others’ and about equal in length to those of the petitioners and respondents.
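For reference, the two charts are simple ratios of columns in the "500 randomly selected statements" table. A small sketch in Python, using three rows copied from that table (the function names are mine, not from the original scripts):

```python
# speaker -> (sentences, words, unique words), copied from the table above
STATS = {
    "Roberts": (1061, 6971, 1952),
    "Breyer": (1729, 11315, 2329),
    "Kagan": (1245, 10867, 2250),
}


def words_per_sentence(sentences, words):
    """Average number of (non-stopword) words per sentence."""
    return words / sentences


def linguistic_diversity(words, unique_words):
    """Unique words divided by total words; higher means a more varied vocabulary."""
    return unique_words / words


for speaker, (sentences, words, unique) in STATS.items():
    print("{:10s} {:5.2f} {:5.2f}".format(
        speaker,
        words_per_sentence(sentences, words),
        linguistic_diversity(words, unique),
    ))
```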
Long words in the oral transcripts
18 or 19 letters is the typical length for the longest words (that are in the dictionary) used by the various speakers. This includes non-sitting justices where the data was available.
- Justice Alito – (17) misrepresentation
- Justice Thomas – (16) unconstitutional
- Justice Kagan – (18) misrepresentations
- Justice Rehnquist – (18) telecommunications
- Justice Sotomayor – (18) unconstitutionally
- Justice Ginsburg – (18) misrepresentations / telecommunications / disproportionately
- Justice Scalia – (19) unconstitutionality
- Justice Breyer – (18) representativeness / telecommunications / disproportionately / unconstitutionally
- Other speakers – (19) counterintelligence / unconstitutionality / extraterritoriality
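Finding these is a matter of filtering each speaker's words against a dictionary and keeping the longest ones. A rough sketch in Python; the tiny dictionary and sample sentence below are made up for illustration, and a real run would use a full word list:

```python
def longest_words(words, dictionary):
    """Return (length, sorted words) for the longest words found in the dictionary."""
    known = {w.lower() for w in words if w.lower() in dictionary}
    if not known:
        return 0, []
    longest = max(len(w) for w in known)
    return longest, sorted(w for w in known if len(w) == longest)


# Stand-in dictionary and transcript text, for illustration only
dictionary = {"the", "statute", "is", "unconstitutional",
              "telecommunications", "disproportionately"}
spoken = ("The statute is unconstitutional and disproportionately "
          "affects telecommunications xyzzyplughfoobarbazqux").split()

length, words = longest_words(spoken, dictionary)
print(length, words)
```

The dictionary check is what keeps transcription garbage (like the made-up last token) out of the results.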
Sentiment Analysis
Sentiment analysis can yield interesting results for corpus data, though in this case there is not very good training material. One of the standard data sets used for this is movie reviews, which are widely available and carry clear negative and positive labels. For more information about sentiment analysis there is some good information here and in these two articles. Applying this to oral arguments? Well, let’s leave it as just one way to look at this data.
[easychart type="horizbar" height="100" title="Sentiment analysis from 500 random statements" groupnames="pos, neg" groupcolors="0000FF,FE2E2E" valuenames="Roberts, Alito, Scalia, Kennedy, Breyer, Ginsburg, Kagan, Sotomayor, Other Speakers" group1values="230,248,213,229,208,269,248,254,324" group2values="270,252,287,271,292,231,252,246,176" ]
Justice Thomas is not included in the above data set because there are only 8 statements from him in the corpus
“Other speakers” are petitioners and respondents (not justices)
Once the sentiment engine is trained, each statement registers as either positive or negative based on the words that match closest to the language in positive and negative movie reviews.
I’m sure the margin of error is high, and without training data for similar text I would hesitate to draw any conclusions. One that I might make from this is that petitioners and respondents tend to use more positive language than the judges on the bench. If that is a valid hypothesis, this data certainly seems to support it.
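To make the idea concrete, here is a toy word-count Naive Bayes classifier in plain Python. It is only an illustration of the general technique, not the engine used for the chart above, and the four "reviews" are made up; a real run would train on a full movie review corpus.

```python
import math
from collections import Counter


def train(labeled_texts):
    """Count word frequencies per label from (text, label) pairs."""
    counts = {"pos": Counter(), "neg": Counter()}
    docs = Counter()
    for text, label in labeled_texts:
        counts[label].update(text.lower().split())
        docs[label] += 1
    return counts, docs


def classify(text, counts, docs):
    """Return the label with the higher Naive Bayes log-probability."""
    vocab = set(counts["pos"]) | set(counts["neg"])
    scores = {}
    for label in ("pos", "neg"):
        total = sum(counts[label].values())
        score = math.log(docs[label] / sum(docs.values()))
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score
            score += math.log((counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up "reviews" standing in for a real movie review corpus
reviews = [
    ("a wonderful and moving film", "pos"),
    ("great acting and a good story", "pos"),
    ("a terrible and boring film", "neg"),
    ("bad acting and a wrong turn in the story", "neg"),
]
counts, docs = train(reviews)
print(classify("that is a good point", counts, docs))
print(classify("i think that is wrong", counts, docs))
```

Even in this toy version you can see the weakness: a courtroom "that is wrong" is scored with vocabulary learned from film criticism.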
Laugh lines in the oral transcripts
Note: there are venturesome academic papers like this one from the Communication Law Review that address laughter in the SCOTUS courtroom. I make no attempt to go into that depth here though my results more or less agree with previous studies on this topic.
It’s not uncommon to get laughter after a statement from the bench; this is denoted in the transcripts as either [laughter] or (laughter). This chart displays the total number of laughter lines by sitting Justice.
[easychart type="horizbar" height="100" title="Number of Laughter Lines in the Oral Transcripts" groupnames="Supreme Court Justices" valuenames="Roberts, Alito, Scalia, Kennedy, Breyer, Ginsburg, Kagan, Sotomayor" group1values="162,24,490,84,319,21,11,5"]
As usual Thomas is excluded for generally not speaking when he sits on the bench.
Here is the same data but instead of the total number of [Laughter] lines it divides it by the number of statements for each justice.
[easychart type="horizbar" height="100" title="Laughter Lines / Number of Spoken Lines" groupnames="Supreme Court Justices" valuenames="Roberts, Alito, Scalia, Kennedy, Breyer, Ginsburg, Kagan, Sotomayor" group1values=".016,.008,.041,.014,.045,.002,.002,.002"]
Justice Breyer appears to be funnier by this measure.
Additional Notes
Unfortunately it’s impossible to cleanly extract the argument data from PDFs. Older transcripts have the Justice’s remarks labeled as “QUESTION”; without specific name references the data had to be discarded. Some transcripts have the Justice’s name spelled incorrectly, for example: JUST SCALIA or JUDGE SCALIA
Here is an example of where there is wrong attribution:
MR. LANDAU: JUSTICE O'CONNOR: Your Honor, that is not -And I think it is conceivable that the Florida court was correct that you could draw the line some way and say contracts that are void should be handled differently.
For this reason don’t take these results too seriously though hopefully the errors are down in the noise (I have no desire to go through and correct them).
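As an illustration of the kind of cleanup involved, here is a small Python sketch that normalizes the misspelled speaker labels mentioned above and skips lines that carry no name. The regular expression and the function name are my own for this example and are not taken from the actual parsing scripts.

```python
import re

# Variants seen in the transcripts: "JUSTICE SCALIA", "JUST SCALIA", "JUDGE SCALIA"
SPEAKER_RE = re.compile(
    r"^(?:CHIEF\s+)?(?:JUSTICE|JUST|JUDGE)\s+([A-Z']+):\s*(.*)$"
)


def split_statement(line):
    """Return (speaker, text) for a justice's line, or None if it has no name."""
    match = SPEAKER_RE.match(line.strip())
    if match is None:
        return None  # e.g. older transcripts labeled only "QUESTION"
    return "JUSTICE_" + match.group(1), match.group(2)


for line in ("JUST SCALIA: Why is that?",
             "CHIEF JUSTICE ROBERTS: Thank you, counsel.",
             "QUESTION: Is that so?"):
    print(split_statement(line))
```

A pattern like this cannot fix the interleaved attribution shown in the example above; those statements simply end up credited to the wrong speaker.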
The pdfs were parsed and the data was generated with two crufty python scripts. They are on my github if you want to look at this data yourself. If you make any improvements please let me know!
Popular names from ssh break in attempts
Lately I noticed that there were more than the usual amount of ssh invalid logins on a machine I manage. For kicks here is data from a script that extracts some name statistics from auth logs. These stats are compared to the male and female names contained in one of the nltk corpora.
Below is data collected over the month of January, 2012 using authlog as my input. Over the course of a month there were over 125,000 invalid ssh attempts.
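The extraction step looks roughly like the Python sketch below. It assumes sshd log lines of the form "Invalid user NAME from ADDRESS"; the sample lines and the tiny name set are made up for illustration, and the real comparison used the names corpus that ships with nltk.

```python
import re
from collections import Counter

# Assumed sshd message format: "Invalid user <name> from <address>"
INVALID_USER_RE = re.compile(r"Invalid user (\S+) from ")


def count_invalid_logins(lines):
    """Count attempted login names from auth log lines."""
    counts = Counter()
    for line in lines:
        match = INVALID_USER_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


# Made-up sample log lines and a stand-in for the nltk male names list
sample_log = [
    "Jan  3 04:12:01 host sshd[1234]: Invalid user billy from 203.0.113.5",
    "Jan  3 04:12:07 host sshd[1238]: Invalid user test from 203.0.113.5",
    "Jan  3 04:12:09 host sshd[1240]: Invalid user Billy from 198.51.100.7",
    "Jan  3 04:13:00 host sshd[1250]: Accepted publickey for me from 192.0.2.9",
]
male_names = {"billy", "alex", "john"}

counts = count_invalid_logins(sample_log)
name_counts = {name: n for name, n in counts.items() if name in male_names}
print(counts.most_common())
print(name_counts)
```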
- Total number of male name login attempts: 17,804
- Unique male names: 1,565
- Longest male names: christopher, bartholomew, constantine
[easychart type="horizbar" height="120" title="Male names by occurrence (top 1%)" groupcolors="2E2EFE" groupnames="male names - percentage of total" valuenames="michael, victor, shell, angel, adrian, adam, david, dan, robert, cyrus, john, web, alex, temp, billy" group1values="0.44, 0.47, 0.48, 0.48, 0.48, 0.49, 0.54, 0.58, 0.59, 0.62, 0.63, 0.7, 0.99, 1.21, 1.67"]
- Total number of female name login attempts: 22,483
- Unique female names: 2,217
- Longest female names: alexandrina, constantine
[easychart type="horizbar" height="130" title="Female names by occurrence (top 1%)" groupcolors="FE2EC8" groupnames="female names - percentage of total" valuenames="susan, caroline, ann, cecilia, clara, anna, chris, denise, claudia, sharon, daniel, frank, diane, kim, victoria, sarah, shell, angel, adrian, amanda, alex, billy" group1values="0.26, 0.26, 0.26, 0.26, 0.27, 0.27, 0.27, 0.28, 0.28, 0.28, 0.28, 0.29, 0.29, 0.33, 0.34, 0.34, 0.38, 0.38, 0.38, 0.56, 0.79, 1.33"]
“billy” wins the prize; the corpus for “names” has it as both male and female. Also the corpus has “temp” as a male name which appears in the top 1% for obvious reasons.
What about all logins? For these we will just look at the top 0.1%.
- Total number of login attempts: 125,588
- Unique logins: 35,627
- Longest login name: fidelu142muiesteaua8642jet184
[easychart type="horizbar" title="All logins by occurrence (top 0.1%)" groupnames="all logins - percentage of total" valuenames="administrator, postmaster, students, test2, postfix, smtp, alex, www, webmaster, r00t, toor, test1, operator, temp, info, student, apache, geronimo, germany, testing, italy, billy, ts, tester, guest, server, testuser, adm, user, ftp, postgres, nagios, oracle, admin, test" group1values="0.11, 0.11, 0.12, 0.13, 0.13, 0.13, 0.14, 0.14, 0.15, 0.15, 0.16, 0.16, 0.16, 0.17, 0.18, 0.22, 0.22, 0.23, 0.23, 0.23, 0.23, 0.24, 0.27, 0.27, 0.33, 0.34, 0.34, 0.34, 0.38, 0.4, 0.41, 0.45, 0.45, 0.71, 1.01"]
Conclusion: if you name your kid billy or test make sure he uses certificate authentication.
Amazing Musical Giftbox
I received mail last week from Piotr Zimnowlodzki, who showed me this cool hack using the PlayTune library and my midi converter script to play “Can you feel the love tonight” on an attiny. Much better than the cheap greeting card hack, amazing craftsmanship!
Musical Ms. Pacman Candy Tin Hack
Here is another weekend hack that plays around with my midi to AVR conversion script and library. With xmas fast approaching I thought it would be fun to convert a pacman candy tin to an xmas ornament and have it play music. Below is the result: pressing a button on the tin will cycle through three Ms. Pacman songs converted from midi files found online.
Construction
Once I had the circuit working on a breadboard it was just a matter of finding a prototype board that fit and some ugly soldering to glue it all together. In my case I had these parts lying around (including the tin), but if you wanted to buy everything it would cost between $5 and $10.
- PacMan ghost candy tin
- Prototype Board
- A couple stand-offs and screws
- Two 1k potentiometers
- Piezo Speaker
- Battery holder
- Coin cell battery
- DIP socket
- ATTINY85 Microcontroller (8k of flash, internal clock @ 8MHz)
- Push-button switch
Other materials:
- AVR SPI programmer or an Arduino to program the attiny
- A drill with a decent bit to cut through the tin
- Musescore sequencing software (free)
- Soldering iron, wires, a free afternoon, etc.
The circuit is only slightly more complicated than the musical greeting card. Two 1K potentiometers are used to mix the two square waves into one speaker. The push switch is connected to the external interrupt pin, which is pulled low when pressed. On the prototype board the switch is wired on the opposite side of the circuit so that the speaker faces down when placed in the candy tin. Drilling holes in the back piece then makes it louder (see below). Standoffs are used to prop up the non-speaker side of the circuit.
Nothing is needed to hold the circuit board in the tin since the button keeps it in place and there isn’t a lot of extra room when it is put together.
Software
(for more on the PlayTune library see my earlier post on using the PlayTune library with an Arduino)
Like the musical greeting card we will use the PlayTune library to play the melody and the xml2h.py to handle the musical conversion. The conversion takes a single track and converts it into two byte arrays of pitches and delays. The pitch values are a function of clock frequency and the prescaler.
The total size of the program ends up being around 2k so there is plenty of room to add more songs if you are inclined.
I used three midi files for the songs and loaded them into musescore. In the application it was necessary to transpose each one an octave; other than that there wasn’t much else to do, since these songs already have two tracks, which is exactly what we want for the attiny.
After saving the midi file as a MusicXML file a header file is created for each song. These header files are what the PlayTune library uses for tone and delay values.
The prescale values divide the frequency of the clock by a power of 2. Ideally you want the lowest value given in the list, though TIMER0 only supports values of 1024, 256, 64 and 8, so for the first part “64” is chosen.
For TIMER1 (part 2) the lowest number can be selected to give the highest timer resolution. This is important because the square waves generated on the two microcontroller pins only approximate the note frequency. Higher timer frequency == greater timer resolution == better pitch accuracy.
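To see why the prescaler matters, here is a back-of-the-envelope sketch in Python. It assumes the square wave is made by toggling a pin every N timer ticks, so the output frequency is F_CPU / (2 * prescaler * N); that is the usual AVR relationship, but I am not claiming it is exactly how PlayTune or xml2h.py compute their values. The function names and the sample notes are made up for this example.

```python
def timer_count(f_cpu, prescaler, note_hz):
    """Timer ticks per half period of the square wave, rounded to an integer."""
    return round(f_cpu / (2 * prescaler * note_hz))


def actual_frequency(f_cpu, prescaler, count):
    """Frequency actually produced when toggling the pin every `count` ticks."""
    return f_cpu / (2 * prescaler * count)


def lowest_usable_prescaler(f_cpu, lowest_note_hz, prescalers, max_count=256):
    """Smallest prescaler whose count for the lowest note fits an 8-bit timer."""
    for prescaler in sorted(prescalers):
        if timer_count(f_cpu, prescaler, lowest_note_hz) <= max_count:
            return prescaler
    return None


F_CPU = 8000000   # internal clock @ 8MHz
NOTE = 523.25     # roughly the C above middle C, chosen for illustration

for prescaler in (32, 64):
    count = timer_count(F_CPU, prescaler, NOTE)
    produced = actual_frequency(F_CPU, prescaler, count)
    print(prescaler, count, round(produced, 2), round(abs(produced - NOTE), 2))

# For a song whose lowest note is around middle C (about 262 Hz):
print(lowest_usable_prescaler(F_CPU, 262, [8, 64, 256, 1024]))
```

The smaller prescaler gives more ticks per half period, so rounding to a whole tick throws away less of the pitch. The catch is that the count for the lowest note in the song still has to fit in the timer register, which is why the smallest value is not always usable.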
The AVR is immediately put into power-down sleep mode. When an external level change on the INT0 pin is detected (button press) the ISR will run and play one of the three tunes.
#include <avr/io.h>
#include <avr/pgmspace.h>
#include <avr/interrupt.h>
#include <avr/sleep.h>
#include <util/delay.h>
#include "playtune.h"
#include "songs/mspacman-acti-they-meet-attiny.h"
#include "songs/mspacman-game-start-attiny.h"
#include "songs/mspacman-actii-the-chase-attiny.h"

int main(void)
{
    // setup interrupt
    GIMSK |= (1<<INT0); // INT0 enabled for interrupts
    sei();              // enable interrupts globally so INT0 can wake the chip

    while(1) {
        // sleep until the button pulls INT0 low
        set_sleep_mode(SLEEP_MODE_PWR_DOWN);
        sleep_mode();
    }
    return(0);
}

// which tune to play on the next button press
volatile uint8_t tune = 0;

ISR (INT0_vect)
{
    // one PlayTune object per part (0 and 1) of each song
    PlayTune theymeet0(0,MSPACMAN_ACTI_THEY_MEET0);
    PlayTune theymeet1(1,MSPACMAN_ACTI_THEY_MEET1);
    PlayTune gamestart0(0,MSPACMAN_GAME_START0);
    PlayTune gamestart1(1,MSPACMAN_GAME_START1);
    PlayTune thechase0(0,MSPACMAN_ACTII_THE_CHASE0);
    PlayTune thechase1(1,MSPACMAN_ACTII_THE_CHASE1);

    switch(tune) {
        case 1:
            while ( theymeet0.isPlaying() || theymeet1.isPlaying() ) {
                theymeet0.playNote();
                theymeet1.playNote();
                _delay_ms(65);
            }
            break;
        case 2:
            while ( gamestart0.isPlaying() || gamestart1.isPlaying() ) {
                gamestart0.playNote();
                gamestart1.playNote();
                _delay_ms(15);
            }
            break;
        case 3:
            while ( thechase0.isPlaying() || thechase1.isPlaying() ) {
                thechase0.playNote();
                thechase1.playNote();
                _delay_ms(65);
            }
            break;
    }

    // advance to the next tune, wrapping from 3 back to 1
    if (tune == 3) {
        tune = 1;
    } else {
        tune++;
    }
}