Tuesday, April 22, 2014

Learn regular expressions to more effectively search through code and the shell

http://www.linuxuser.co.uk/tutorials/regular-expressions-guide

We’re always searching for something – the file where we wrote that recipe (Python or baking); the comment in 100,000 lines of code that points to an unfinished module; the log entry about an iffy connection. Regular expressions (abbreviated as regexps hereafter, but you’ll also see regex and re) are a codified method of searching whose notation, to the unenlightened, looks like line noise. Yet, with a history that stretches back to Ken Thompson’s 1968 QED editor, they’re still a powerful tool today, thanks to grep (‘global regular expression print’). Plain grep understands only the limited Basic Regular Expressions (BRE); grep -E (or egrep) gives Extended Regular Expressions (ERE). Most other languages adopt the syntax of PCRE (Perl Compatible Regular Expressions), a library developed by Philip Hazel in 1997. PCRE-style syntax is understood by many languages, though not always implemented in exactly the same way. We’ll use grep -P when we need these. Emacs has its own regexp style, but its M-x grep command runs the external grep, so grep -P is available there too.
This introduction is mostly aimed at searching from the shell, but you should easily be able to adapt it to standalone Perl scripts, and other languages which use PCRE.
Even the simplest regexp can make you more productive at the command line

Resources

Your favourite editor
Perl 5.10 (or later)

Step-by-step

Step 01 Word up!
You’re probably used to searching a text file for occurrences of a word with grep – in that case, the word is the regular expression. More complicated regexps are simply concise ways of searching for parts of words, or character strings, in particular positions.
Step 02 Reserved character
Some characters mean special things in regexp pattern matching: . * [ ] ^ $ \ in Basic Regular Expressions. The ‘.’ matches any single character, so searching for it, as above, finds more than just the full stop. To match only a literal full stop, escape it as \. or use grep’s -F option, which makes the whole string literal.
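Here is a minimal sketch of the difference. It assumes GNU grep, and the two sample lines are made up for illustration:

```shell
# Two made-up sample lines: only the first contains a real full stop.
lines='version 1.2
version 1x2'

# '.' is a metacharacter matching any single character: both lines match.
printf '%s\n' "$lines" | grep '1.2'

# Escaping it with a backslash makes just that '.' literal: first line only.
printf '%s\n' "$lines" | grep '1\.2'

# -F treats the whole pattern as a fixed string: first line only.
printf '%s\n' "$lines" | grep -F '1.2'
```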
Step 03 Atlantic crossing
Extended Regular Expressions add ? + | { } ( ) to the metacharacters. You can use them with grep -E or egrep, as above, where ‘standardise|standardize’ matches both the British and American (and ‘Oxford’) spellings of ‘standardise’.
Step 04 Colourful?
Inside parentheses, ‘|’ can offer a choice between just the two letters that differ, as in standardi(s|z)e, which saves unnecessary typing. Another way to find both British and American spellings is ‘?’. It matches zero or one of the preceding element, such as the u in colou?r.
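You can try the alternation and optional-element patterns from the last two steps on a made-up word list. This assumes GNU grep, where -E selects ERE:

```shell
# Made-up word list covering the spellings discussed above.
words='standardise
standardize
standardese
colour
color
colr'

# ERE alternation of two whole words (needs -E, or egrep).
printf '%s\n' "$words" | grep -E 'standardise|standardize'

# The same choice narrowed to the one letter that differs.
printf '%s\n' "$words" | grep -E 'standardi(s|z)e'

# '?' makes the preceding u optional: colour and color, but not colr.
printf '%s\n' "$words" | grep -E 'colou?r'
```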
Step 05 Mmmmm, cooooool
The other quantifiers are + for one or more of the preceding element (‘_+’ finds lines with at least one underscore) and * for zero or more. So coo*l matches col, cool and coooooooool, but not cl, which is useful for different spellings of mmmmmmmmm or zzzzzzzzzz.
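Here is a quick check of the quantifiers on some made-up lines, again assuming GNU grep. Note that + needs -E, because it isn't a BRE metacharacter:

```shell
# Made-up sample lines.
words='cl
col
cool
coooooool
snake_case
plain'

# '*' = zero or more of the preceding o, so coo*l needs at least one o.
printf '%s\n' "$words" | grep 'coo*l'

# '+' (ERE) = one or more: lines containing at least one underscore.
printf '%s\n' "$words" | grep -E '_+'

# co+l is ERE shorthand for coo*l and matches the same lines.
printf '%s\n' "$words" | grep -E 'co+l'
```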
Step 06 No number
Feeling confident? Good, time for more goodies. [0-9] is short for [0123456789] and matches any one of the characters in the square brackets. A ^ at the start of the brackets negates the set, so [^0-9] matches any character that isn’t a digit. But the other ^? …
Step 07 Start to finish
Outside brackets, ^ anchors the expression to the beginning of the line and $ anchors it to the end. Now you can tell document.text apart from text.doc (^doc versus doc$), and find lines beginning with # or ending in a punctuation mark other than a period.
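This minimal sketch puts bracket expressions and anchors together on made-up lines (GNU grep assumed):

```shell
# Made-up sample lines.
lines='abc
a1c
123
document.text
text.doc
# a comment
Is this a question?'

# Bracket expression: lines containing at least one digit (a1c, 123).
printf '%s\n' "$lines" | grep '[0-9]'

# Negated set: lines containing at least one NON-digit, so only 123 is left out.
printf '%s\n' "$lines" | grep '[^0-9]'

# 'doc' alone matches both file names; the anchors tell them apart.
printf '%s\n' "$lines" | grep '^doc'    # document.text
printf '%s\n' "$lines" | grep 'doc$'    # text.doc

# Lines starting with #, and lines ending in ? or ! rather than a period.
printf '%s\n' "$lines" | grep '^#'
printf '%s\n' "$lines" | grep '[?!]$'
```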
Step 08 A to Z Guide
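The brackets can hold any characters, singly or as ranges. In PCRE (grep -P or Perl), [ \t\r\n\v\f] matches the whitespace characters (space, tab, newline et al). Plain grep treats a backslash inside brackets as a literal backslash, so use [[:space:]] there instead. [^bd]oom$ matches lines ending in ‘oom’ where the preceding character is anything but b or d: room and zoom match, but boom and doom don’t.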
Step 09 POSIX classes
The POSIX character classes, used inside brackets as in [[:alnum:]], save typing out ranges such as [A-Za-z0-9]. Perhaps most useful, though, is the non-POSIX addition [:word:], which matches [A-Za-z0-9_]. It is a Perl/PCRE extension, so use grep -P. The extra underscore helps match identifiers in many programming languages.
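This minimal sketch compares the PCRE-only [:word:] class with a portable POSIX equivalent. It assumes GNU grep built with PCRE support for -P, and the identifiers are made up:

```shell
# Made-up candidate identifiers.
ids='my_var
my-var
_private
x2'

# [[:word:]] is a Perl/PCRE extension, so it needs grep -P.
# -x requires the whole line to match; LC_ALL=C keeps the classes ASCII-only.
printf '%s\n' "$ids" | LC_ALL=C grep -xP '[[:word:]]+'

# Portable equivalent: a real POSIX class plus the underscore.
printf '%s\n' "$ids" | LC_ALL=C grep -xE '[[:alnum:]_]+'
```

Both commands print my_var, _private and x2; only my-var is rejected, because of its hyphen.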
Step 10 ASCII style
Where character classes aren’t implemented, knowledge of ASCII’s underpinnings can save you time. [ -~] (space to tilde) covers all printable ASCII characters, codes 32-126. Its inverse [^ -~] matches everything else, including control characters and anything outside ASCII.
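This sketch hunts for stray control or non-ASCII characters with [^ -~]. It assumes GNU grep, and LC_ALL=C forces byte-wise ranges so the result doesn't depend on your locale. The input, including the hypothetical helper function sample, is made up:

```shell
# Made-up input: plain ASCII, a line with a BEL control character (octal 007)
# and a line with a UTF-8 encoded e-acute (octal bytes 303 251).
sample() { printf 'plain text\nbell\007here\ncaf\303\251\n'; }

# LC_ALL=C makes [ -~] mean exactly the bytes 32 (space) to 126 (~).
# Lines containing anything outside that range, with line numbers (2 and 3):
sample | LC_ALL=C grep -n '[^ -~]'

# Lines made up entirely of printable ASCII (just "plain text"):
sample | LC_ALL=C grep '^[ -~]*$'
```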
Step 11 Beyond grep
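Both find (with its -regex test) and locate (with --regex) work well with regexps. In The Linux Command Line (reviewed in LUD 111), William Shotts gave the great example of find . -regex '.*[^-_./0-9a-zA-Z].*' to find filenames with embedded spaces and other nasties. Note that find’s -regex must match the whole path, hence the .* at each end.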
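Here Shotts' find example is wrapped in a small function and run in a throwaway directory. The function name find_nasty_names and the file names are hypothetical, and GNU find is assumed:

```shell
# Hypothetical helper: list paths under a directory whose names contain
# anything other than letters, digits, '-', '_', '.' or '/'.
find_nasty_names() {
    # -regex must match the WHOLE path (e.g. "./bad name.txt"), hence the .*
    # at each end; the '-' right after ^ in the brackets is a literal hyphen.
    (cd "$1" && LC_ALL=C find . -regex '.*[^-_./0-9a-zA-Z].*')
}

# Demonstration in a throwaway directory.
demo=$(mktemp -d)
touch "$demo/good_name.txt" "$demo/bad name.txt" "$demo/semi;colon"
find_nasty_names "$demo" | sort    # ./bad name.txt and ./semi;colon
rm -r "$demo"
```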
Step 12 Nice one Cyril
Speaking of non-standard characters, while [:alpha:] depends on your locale settings, and may only find ASCII text, you can still search for characters of other alphabets – from accented French and Welsh letters to the Greek or Russian alphabet.
Step 13 Ranging repeat
{4} matches the preceding element exactly four times, while two numbers give a range. So [0-9]{1,3} in the above screenshot finds one-, two- or three-digit numbers. That makes it a quick find for dotted quads (IPv4 addresses), although it won’t filter out the invalid values 256-999.
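This sketch shows the quick dotted-quad search, plus a stricter version that spells out 0-255 for each octet. GNU grep is assumed, and the addresses are made up:

```shell
# Made-up candidate addresses.
addrs='192.168.0.1
10.0.0.256
999.1.1.1
not.an.ip.addr'

# Quick find: four groups of 1-3 digits; -x makes the whole line match.
# It happily accepts 10.0.0.256 and 999.1.1.1 as well.
printf '%s\n' "$addrs" | grep -xE '([0-9]{1,3}\.){3}[0-9]{1,3}'

# Stricter: each octet is 250-255, 200-249, 100-199 or 0-99.
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'
printf '%s\n' "$addrs" | grep -xE "($octet\.){3}$octet"    # 192.168.0.1 only
```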
Step 14 Bye bye, IPv4
FOSDEM was all IPv6 this year, so let’s not waste any more time on IPv4 validation, as the future may actually be here. As can be seen in this glimpse of IPv6 validators, despite some Perl ‘line noise’, it boils down to checking appropriate amounts of hex.
Step 15 Validation
By now regexps should be looking a lot less like line noise, so it’s time to put together a longer one, just building from some of the simpler parts. A common programming task, particularly with web forms, is validating that input is in the correct format, such as a date.
In this case we’re validating dates, e.g. a date of birth (future dates could then be filtered out by comparing against the current date). Note that (0[1-9]|[12][0-9]|3[01]) accepts day numbers 01-31 but won’t reject 31st February.
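Here is a minimal sketch of a complete date check. It assumes ISO-style YYYY-MM-DD input and years 1900-2099; the screenshot's exact pattern isn't reproduced here. The last part assumes GNU date, which rejects the impossible dates that a regexp lets through:

```shell
# Assumed format: YYYY-MM-DD with years 1900-2099; adjust to taste.
date_re='^(19|20)[0-9]{2}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$'

# Prints 1990-07-15 and 2014-02-31: month and day ranges are checked,
# but the regexp alone can't rule out 31st February.
printf '%s\n' 1990-07-15 2014-13-01 2014-00-10 2014-02-31 14-02-03 |
    grep -E "$date_re"

# One way to catch impossible days (GNU date assumed): let date parse them.
for d in 1990-07-15 2014-02-31; do
    if date -d "$d" >/dev/null 2>&1; then echo "$d ok"; else echo "$d invalid"; fi
done
```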
Step 16 Back to basics
Now that we have the basics and can string them together, don’t neglect grep’s own options. Here we’re counting how many unauthorised SSH access attempts were made in a given period, with grep -c replacing an unnecessary pipe.
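This sketch applies the counting idea to three made-up sshd log lines. The log format and the file paths in the comment are assumptions that vary between distributions:

```shell
# Three made-up sshd log lines in the usual syslog layout. On a real system
# you would read the log file itself, e.g. /var/log/auth.log (Debian/Ubuntu)
# or /var/log/secure (Red Hat/CentOS).
log='Apr 22 10:01:02 myhost sshd[1234]: Failed password for root from 203.0.113.5 port 4242 ssh2
Apr 22 10:01:09 myhost sshd[1240]: Accepted publickey for mjones from 198.51.100.7 port 5151 ssh2
Apr 22 11:15:40 myhost sshd[1301]: Failed password for invalid user admin from 203.0.113.5 port 4300 ssh2'

# An unnecessary extra process: grep | wc -l ...
printf '%s\n' "$log" | grep 'Failed password' | wc -l

# ... when grep can count matching lines itself.
printf '%s\n' "$log" | grep -c 'Failed password'                          # 2

# Narrow it to a period, here 10:00-10:59 on 22 April.
printf '%s\n' "$log" | grep -c '^Apr 22 10:.*sshd.*Failed password'       # 1
```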
Step 17 Why vi?
Whatever your position in the venerable and affectionate vi/Emacs war, there will be times and servers where vi is your only tool, so grab yourself a cheat-sheet. Vi and vim mostly follow BRE. Here we see one of the \< \> word boundaries.
Step 18 Boundary guard
As well as ^ and $ for line ends, word boundaries can be matched in regexps with \b, so you can match, say, ‘hat’ without also matching ‘chatter’. The escape character, \, introduces a number of extra elements, such as \d for any digit (a PCRE feature, so use grep -P).
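Here are word boundaries and \d in action. This assumes GNU grep, which accepts both \b and the vi-style \< \>, with PCRE support for -P. The sample lines are made up:

```shell
# Made-up sample lines.
lines='hat
chatter
that hat'

# No boundaries: 'hat' is also found inside chatter and that (3 lines).
printf '%s\n' "$lines" | grep -c 'hat'

# \b marks a word boundary: prints "hat" and "that hat".
printf '%s\n' "$lines" | grep '\bhat\b'

# vi-style \< and \> mark the start and end of a word; GNU grep accepts
# them too. In vi or vim the equivalent search is /\<hat\>
printf '%s\n' "$lines" | grep '\<hat\>'

# \d (any digit) is PCRE, so it needs grep -P.
printf '%s\n' 'room 101' 'no digits' | grep -P '\d'
```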
Step 19 Literally meta
Speaking of boundaries, placing \Q and \E around part of a regexp (in PCRE, so grep -P) makes everything between them literal rather than metacharacters. That means you can quote just that part of the regexp, unlike grep -F, where the whole pattern becomes literal.
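This sketch contrasts partial quoting with \Q...\E against grep -F. It assumes GNU grep with -P support, and the lines are made up to contain metacharacters:

```shell
# Made-up sample lines.
lines='a.b*c42
axbbbc42
a.b*cXY'

# Unquoted, . and * are metacharacters: only axbbbc42 matches.
printf '%s\n' "$lines" | grep -P '^a.b*c\d+$'

# \Q...\E quotes just the a.b*c part; \d+ and the anchors stay active.
printf '%s\n' "$lines" | grep -P '^\Qa.b*c\E\d+$'     # a.b*c42

# grep -F makes the whole pattern literal, so there is no room for \d+.
printf '%s\n' "$lines" | grep -F 'a.b*c'              # a.b*c42 and a.b*cXY
```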
Step 20 Lazy = good
Time to think about good practice. * is a greedy operator: in something like <.*>, the .* grabs as much as it can, running to the last closing > on the line and taking any further tags in between. <.*?> is non-greedy (lazy), stopping at the first closing >. Lazy quantifiers come from Perl, so use grep -P for them.
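Here is greedy versus lazy matching on a made-up HTML fragment, assuming GNU grep (-o prints only the matched part). A negated bracket expression is also shown as an ERE alternative to laziness:

```shell
# Made-up HTML fragment.
html='<b>bold</b> and <i>italic</i>'

# Greedy: .* runs to the LAST > on the line, so the whole line comes back.
printf '%s\n' "$html" | grep -oE '<.*>'

# Lazy (PCRE only): .*? stops at the FIRST >, giving each tag separately.
printf '%s\n' "$html" | grep -oP '<.*?>'

# The same four tags without a lazy quantifier: "< then any non-> characters".
printf '%s\n' "$html" | grep -oE '<[^>]*>'
```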
Step 21 Perl -pie
Aside from grep, Perl remains the most comfortable fit for regexps, as it is far more powerful. With perl -pi -e on the command line (‘perl pie’; keep -e separate, since -pie would make e the backup-file suffix for -i), you can perform anything from simple substitutions on one or more files, to…
Step 22 Perl one-liner
…counting the empty lines in a text file (this from Krumins’ Perl One-Liners, see next month’s book reviews). /^$/ matches an empty line; note Perl’s use of // to delimit a regexp. If / is one of the literals you need, prefix the pattern with m and pick another delimiter, as in m,,.
Step 23 A regexp too far
Now you know the basics, you can build slightly more complicated regexps – but, as Jeff Atwood said: “Regular expressions are like a particularly spicy hot sauce – to be used in moderation and with restraint, only when appropriate.”
Step 24 Tagged offender
Finally, know the limitations of regexps. Don’t use them to parse HTML: they can’t cope with arbitrarily nested structures such as HTML’s tags. Here the legendary StackOverflow reply by Bob Ince to a query on their use with HTML expresses the passion this question engenders.

Unix: More ways to spin the top command

http://www.itworld.com/operating-systems/414414/unix-more-ways-spin-top-command

The top command is one of the most useful commands for getting a quick glimpse into how your Unix server is performing, but stopping there might mean that you're missing out on a lot of interesting options.

The top command provides a quick glimpse into how a Unix system is performing. It highlights the processes that are using most of your CPU cycles, gives you an idea of how much memory is in use, and even provides some data that can tell you whether performance is getting better or worse. Still, there are a number of options that you may never have tried that can help you find the answers you are looking for more quickly.
One option is to use the top command to display tasks for just a single user. To do this, just follow the top command with the -u option and the username of the particular user. This will let you focus on what that user is doing on the system.
$ top -u mjones
top - 12:35:45 up 86 days,  1:30,  1 user,  load average: 3.06, 3.03, 3.01
Tasks: 192 total,   5 running, 187 sleeping,   0 stopped,   0 zombie
Cpu(s): 36.3%us, 38.8%sy,  0.0%ni, 24.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2074932k total,  2024796k used,    50136k free,   391756k buffers
Swap:  4192956k total,  1426488k used,  2766468k free,   605736k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7996 mjones    25   0 2052m 697m 1084 R  63.0 34.4  653:47  bash
 8564 mjones    16   0  4784  392  384 S  0.0  0.0   0:00.00 bash
 8566 mjones    19   0  2444  988  760 S  0.0  0.0 215:26.19 top
You will see only the processes (and likely all of the processes) being run by that user.
You can also use top to look at a single process and nothing else.
$ top -p 22526
top - 13:00:56 up 86 days,  1:55,  1 user,  load average: 3.00, 3.00, 3.00
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s): 37.3%us, 37.7%sy,  0.0%ni, 25.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2074932k total,  2025044k used,    49888k free,   392164k buffers
Swap:  4192956k total,  1426488k used,  2766468k free,   605736k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
22526 shs       15   0  4784 1476 1204 S  0.0  0.1   0:00.05 bash
While top's output is normally sorted on the %CPU column, you can sort it on some other column instead. To sort based on memory usage, for example, start top and then type M (a capital M). Typing a lowercase m toggles the display of the memory statistics that appear at the top of your top output.
top - 12:34:56 up 86 days,  1:29,  1 user,  load average: 3.14, 3.04, 3.01
Tasks: 192 total,   5 running, 187 sleeping,   0 stopped,   0 zombie
Cpu(s): 36.3%us, 38.8%sy,  0.0%ni, 24.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2074932k total,  2024672k used,    50260k free,   391736k buffers
Swap:  4192956k total,  1426488k used,  2766468k free,   605736k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7996 mjones    25   0 2052m 697m 1084 R  63.0 34.4  46852:58 bash
 1927 root      10 -10 22524  21m 1740 S  0.0  1.1   0:00.02 iscmd12
 1233 root      18   0 27052  12m 7440 S  0.0  0.6   0:00.43 httpd
18857 apache    17   0 27076 7184 2140 S  0.0  0.3   0:00.00 httpd
You can also pick the column you would like to sort your top output on from a list. To do this, once you've started top, press a capital O and you will see a list of options like the one shown below.
Current Sort Field:  K  for window 1:Def
Select sort field via field letter, type any other key to return

  a: PID        = Process Id                        the TTY & WCHAN fields will violate
  b: PPID       = Parent Process Pid                strict ASCII collating sequence.
  c: RUSER      = Real user name                    (shame on you if WCHAN is chosen)
  d: UID        = User Id
  e: USER       = User Name
  f: GROUP      = Group Name
  g: TTY        = Controlling Tty
  h: PR         = Priority
  i: NI         = Nice value
  j: P          = Last used cpu (SMP)
* K: %CPU       = CPU usage
  l: TIME       = CPU Time
  m: TIME+      = CPU Time, hundredths
  n: %MEM       = Memory usage (RES)
  o: VIRT       = Virtual Image (kb)
  p: SWAP       = Non-resident size (kb)
  q: RES        = Resident size (kb)
  r: CODE       = Code size (kb)
  s: DATA       = Data+Stack size (kb)
  t: SHR        = Shared Mem size (kb)
  u: nFLT       = Page Fault count
  v: nDRT       = Dirty Pages count
  w: S          = Process Status
  x: COMMAND    = Command name/line
  y: WCHAN      = Sleeping in Function
  z: Flags      = Task Flags 
Notice the * to the left of K: %CPU. This marks the column the output is currently sorted on. Press another letter from the list and you will see the * move to a different line in your display. Then press return to see the data sorted on that column.
If you are sufficiently empowered, you can also kill processes from top without exiting top. Just press a lowercase k and you will be prompted first for the process you want to kill and then for the signal you want to use to kill it (the default is 15, SIGTERM). You will see an "Operation not permitted" error if you don't have sufficient rights to kill the process that you've selected.
Similarly, you can renice a process (i.e., change its nice value) by typing a lowercase r. You will then be prompted for the ID of the process you want to renice, and then for the nice value you want to use instead.
PID to renice: 22720
and then ...
Renice PID 22720 to value: 10
If the system you are working on has more than one CPU, your top default display will combine the information on all CPUs into one line. To break this down by CPU instead, press a 1 while in top and your display will change to something like this:
top - 13:12:18 up 86 days,  2:07,  1 user,  load average: 3.06, 3.09, 3.05
Tasks: 192 total,   5 running, 187 sleeping,   0 stopped,   0 zombie
Cpu0  : 37.3%us, 62.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 13.3%us, 86.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2074932k total,  2025292k used,    49640k free,   392424k buffers
Swap:  4192956k total,  1426488k used,  2766468k free,   605740k cached
Typing c while running top toggles the COMMAND column between just the command name and the full command line, including the path and arguments:
 7072 root      25   0  4216  984  812 R 100.0  0.0   9813:10 /usr/bin/whois 134.11.72.123
The top command normally runs continuously, updating its display every few seconds. If you would prefer it to update less frequently, type a lowercase d and then, when prompted, tell top how often (in seconds) you want to see updates.
Change delay from 3.0 to: 10
If you want top to run through a limited set of iterations, you can provide this number when you start top. For example, if you want to see only two iterations, type top -n 2.
$ top -n 2
You can also type a lowercase h to get a little help while running top and, of course, q to quit.

Tuesday, April 8, 2014

Check hardware information on Linux with hwinfo command

http://www.binarytides.com/linux-hwinfo-command

Hwinfo

The hwinfo command is a very handy command line tool that can be used to probe for details about hardware components. It reports information about most hardware units, such as the CPU, hard disk controllers, USB controllers, network cards, graphics cards, multimedia devices, printers, etc.



Hwinfo uses the libhd library to gather hardware information, and libhd in turn depends on libhal.
Hwinfo is available in the repositories of Ubuntu and Debian.
# ubuntu, debian
$ sudo apt-get install hwinfo
To install hwinfo on Fedora or CentOS, follow this post:
How to install hwinfo on Fedora 19/20 and CentOS 5/6

Using hwinfo

The help information explains how to use it
$ hwinfo --help
Usage: hwinfo [options]
Probe for hardware.
  --short        just a short listing
  --log logfile  write info to logfile
  --debug level  set debuglevel
  --version      show libhd version
  --dump-db n    dump hardware data base, 0: external, 1: internal
  --hw_item      probe for hw_item
  hw_item is one of:
   all, bios, block, bluetooth, braille, bridge, camera, cdrom, chipcard,
   cpu, disk, dsl, dvb, fingerprint, floppy, framebuffer, gfxcard, hub,
   ide, isapnp, isdn, joystick, keyboard, memory, modem, monitor, mouse,
   netcard, network, partition, pci, pcmcia, pcmcia-ctrl, pppoe, printer,
   scanner, scsi, smp, sound, storage-ctrl, sys, tape, tv, usb, usb-ctrl,
   vbe, wlan, zip

  Note: debug info is shown only in the log file. (If you specify a
  log file the debug level is implicitly set to a reasonable value.)
The options are few: just name the hardware item you would like information about, and hwinfo will display only that.

1. Display all information

Running hwinfo without any options displays detailed information about all hardware units:
$ hwinfo

2. Display brief information

The "--short" option will display brief information about the hardware and not the details
$ hwinfo --short
Here is the output from my system
cpu:
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2000 MHz
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2000 MHz
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2666 MHz
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2666 MHz
keyboard:
  /dev/input/event2    AT Translated Set 2 keyboard
mouse:
  /dev/input/mice      Microsoft Basic Optical Mouse v2.0
graphics card:
                       Intel 965G-1
                       Intel 82G35 Express Integrated Graphics Controller
sound:
                       Intel 82801H (ICH8 Family) HD Audio Controller
storage:
                       Intel 82801H (ICH8 Family) 4 port SATA IDE Controller
                       Intel 82801H (ICH8 Family) 2 port SATA IDE Controller
                       JMicron JMB368 IDE controller
network:
  eth0                 Intel 82566DC Gigabit Network Connection
network interface:
  eth0                 Ethernet network interface
  lo                   Loopback network interface
disk:
  /dev/sda             ST3500418AS
partition:
  /dev/sda1            Partition
  /dev/sda2            Partition
  /dev/sda5            Partition
  /dev/sda6            Partition
  /dev/sda7            Partition
  /dev/sda8            Partition
cdrom:
  /dev/sr0             SONY DVD RW DRU-190A
usb controller:
                       Intel 82801H (ICH8 Family) USB UHCI Controller #4
                       Intel 82801H (ICH8 Family) USB UHCI Controller #5
                       Intel 82801H (ICH8 Family) USB2 EHCI Controller #2
                       Intel 82801H (ICH8 Family) USB UHCI Controller #1
                       Intel 82801H (ICH8 Family) USB UHCI Controller #2
                       Intel 82801H (ICH8 Family) USB UHCI Controller #3
                       Intel 82801H (ICH8 Family) USB2 EHCI Controller #1
bios:
                       BIOS
bridge:
                       Intel 82G35 Express DRAM Controller
                       Intel 82801H (ICH8 Family) PCI Express Port 1
                       Intel 82801H (ICH8 Family) PCI Express Port 2
                       Intel 82801H (ICH8 Family) PCI Express Port 3
                       Intel 82801 PCI Bridge
                       Intel 82801HB/HR (ICH8/R) LPC Interface Controller
hub:
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic ehci_hcd EHCI Host Controller
                       Linux 3.11.0-12-generic ehci_hcd EHCI Host Controller
memory:
                       Main Memory
firewire controller:
                       Agere FW323
unknown:
                       FPU
                       DMA controller
                       PIC
                       Timer
                       Keyboard controller
                       Intel 82801H (ICH8 Family) SMBus Controller
                       Serial controller



Save it to a file
$ hwinfo --short > hardware_brief.txt

3. View CPU details

With the "--cpu" option, hwinfo displays only CPU information:
$ hwinfo --short --cpu
cpu:                                                            
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2000 MHz
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2000 MHz
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2666 MHz
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2000 MHz
Drop the --short option to display detailed information about the CPU.

4. Display network card information

$ sudo hwinfo --short --netcard
network:                                                        
  eth0                 Intel 82566DC Gigabit Network Connection

5. Storage devices and partitions

$ sudo hwinfo --short --block
disk:
  /dev/sda             ST3500418AS
partition:
  /dev/sda1            Partition
  /dev/sda2            Partition
  /dev/sda5            Partition
  /dev/sda6            Partition
  /dev/sda7            Partition
  /dev/sda8            Partition
cdrom:
  /dev/sr0             SONY DVD RW DRU-190A

6. Hard drive controllers

$ sudo hwinfo --short --storage
storage:                                                        
                       Intel 82801H (ICH8 Family) 4 port SATA IDE Controller
                       Intel 82801H (ICH8 Family) 2 port SATA IDE Controller
                       JMicron JMB368 IDE controller

7. USB devices and controllers

$ sudo hwinfo --short --usb
mouse:                                                          
  /dev/input/mice      Microsoft Basic Optical Mouse v2.0
hub:
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic ehci_hcd EHCI Host Controller
                       Linux 3.11.0-12-generic ehci_hcd EHCI Host Controller

8. Display multiple devices together

To display multiple hardware units together, just add all the options
$ sudo hwinfo --short --usb --cpu --block
cpu:                                                            
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2000 MHz
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2666 MHz
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2666 MHz
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2000 MHz
mouse:
  /dev/input/mice      Microsoft Basic Optical Mouse v2.0
disk:
  /dev/sda             ST3500418AS
partition:
  /dev/sda1            Partition
  /dev/sda2            Partition
  /dev/sda5            Partition
  /dev/sda6            Partition
  /dev/sda7            Partition
  /dev/sda8            Partition
cdrom:
  /dev/sr0             SONY DVD RW DRU-190A
hub:
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic ehci_hcd EHCI Host Controller
                       Linux 3.11.0-12-generic ehci_hcd EHCI Host Controller

9. Log information to a file

hwinfo has an option to log all data to a file. The following command logs detailed information about all hardware units to a text file:
$ hwinfo --all --log hardware_info.txt
To log the short listing in addition to the detailed information, add the --short option too, though I'm not sure it is supposed to work like that.
$ hwinfo --all --short --log hardware_info.txt