Unified diff format explained


If you have heavily changed files, patch may give up trying to find where the changes fit. It does provide options (with the requisite warnings in the documentation) for turning up the matching fuzziness, but those are beyond the scope of this article.

As an example, consider a run of patch against an interest.go file that has several other changes, using the diff between sources-orig/officespace/interest.go and sources-fixed/officespace/interest.go (the patch begins with the line `diff -Naur sources-orig/officespace/interest.go sources-fixed/officespace/interest.go'). In this case, patch warns that the changes did not apply at the original location in the file, but were offset by 15 lines.

Sometimes you might want to know which part of the files each change falls in. In the unified format, a chunk continues with lines starting with ' ' (common line), '-' (only in old file), or '+' (only in new file). In Git's patch format, /dev/null is used to signal created or deleted files.

diff can also produce a side by side difference listing of two files. Side by side output lines are normally 130 columns wide.

If you think of the life of a software project as a set of actions along a timeline, you might visualize changes to the software (such as adding a feature or a function to a source code file, or fixing a bug) appearing at different points on the timeline, with each discrete point representing the state of all the source code files at that time.

This model of sharing patch files is how the Linux kernel community operates regarding proposed changes today. As long as the changes do not conflict directly (for example, by changing the exact same lines), the patch tool should be able to work out where to merge the changes in. Before moving on, we can't ignore the most popular service in which patches and diffs are relevant: GitHub.

diff has several output formats. The "normal" diff output format shows each hunk of differences without any surrounding context; it is kept for compatibility with older versions of diff and the POSIX standard. The context output format shows several lines of context around the lines that differ. In a line format, ordinary characters represent themselves, while conversion specifications start with `%'. You can also use diff to merge two files of C source code.

In Git, any diff-generating command can take the -c or --cc option to produce a combined diff when showing a merge. In Git's raw output format, the sha1 for "src" is 0{40} if the entry is a creation or is unmerged.
What exactly are these patches and diffs that developers talk about?

The GNU diff manual illustrates its output formats with two sample files, `lao' and `tzu' (see section Two Sample Input Files, for the complete contents of the two files). In that example, the first hunk contains just the first two lines of `lao', and the last hunk contains just the last three lines of `tzu'. The context format is produced by `diff -c lao tzu', and a side by side listing by `diff -y -W 72 lao tzu'.

The `-p' and `--show-c-function' options show which C function each change is in; for other kinds of files, use the `-F regexp' or `--show-function-line=regexp' option. Line formats control how each line taken from an input file is output as part of a line group in if-then-else format, which is useful for applications that allow if-then-else input, including programming languages.

The RCS output format is like the forward ed format (see section Forward ed Scripts), but it can represent arbitrary changes to the contents of a file.

In Git's combined diff format, the extended header lines are followed by a two-line from-file/to-file header. Note that combined diff lists only files which were modified from all parents.
If you've ever worked on a large codebase with a distributed development model, you've probably heard people say things like "Sue just sent a patch," or "Rajiv is checking out the diff." Maybe those terms were new to you and you wondered what they meant.

We will call these points of change commits, using the same nomenclature that today's most popular source code control tool, Git, uses. One way to provide local changes to others is to create a diff of your local tree's changes and send this patch to others who are working on the same source code. This lets others patch their tree and see the source code tree with your changes applied. Git offers much of this functionality, so you can use its built-in capabilities for working on a shared source tree, including merging and pulling other developers' changes.

The normal output format consists of one or more hunks of differences; each hunk shows one area where the files differ. In unified format hunks, the lines common to both files begin with a space character. The ed format cannot represent an incomplete line.

In Git's output, the dissimilarity index is the percentage of changed lines, and pathnames with unusual characters are quoted as explained for the configuration variable core.quotePath.
A patch refers to a specific collection of differences between files, produced by the Unix diff utility, that can be applied to a source code tree using the patch utility. One similar capability is to use git diff to provide the unified diff output in your local tree or between any two references (a commit identifier, the name of a tag or branch, and so on).

In GNU diff, the default line format is `%l' followed by a newline character. If the second file ends in a changed incomplete line, then the output also ends in an incomplete line.

When reading a Git combined diff, it is incorrect to apply each change to each file sequentially.

The original source code is located in sources-orig and our second, modified codebase is located in a directory named sources-fixed.

The unified output format is a variation on the context format that is more compact because it omits redundant context lines. To select the context output format, use the `-C lines', `--context[=lines]', or `-c' option. The argument lines is the number of lines of context to show. When it is not given, it defaults to three.

Git's combined diff format compares two or more files file1, file2, ... with one file X. Instead of the usual two-line from-file/to-file header, you get an N+1 line from-file/to-file header when the --combined-all-paths option is used, where N is the number of parents in the merge commit.
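To make the unified format and its default of three context lines concrete, here is a minimal sketch that uses Python's standard difflib module rather than the diff command itself. The file names old.txt and new.txt and the sample lines are made up for illustration.

```python
import difflib

# Two small, made-up versions of a file (each entry keeps its newline).
old = ["alpha\n", "beta\n", "gamma\n", "delta\n"]
new = ["alpha\n", "BETA\n", "gamma\n", "delta\n", "epsilon\n"]

# n=3 asks for three lines of context, matching the default described above.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt", n=3)
)

# Prints the two-line header, one "@@ -1,4 +1,5 @@" hunk header, and the
# hunk body with ' ', '-' and '+' prefixes.
print("".join(diff_lines), end="")
```

Because the two versions are so short, all changes fall within three lines of each other and difflib emits a single hunk.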

In the GitHub world, users tend to use the web-based interface to review the diffs or patches that comprise a pull request, but you can still access the raw patch files and use them at the command line with the patch utility. If you're using someone else's patch, you have to make sure you are patching the correct version of the file.

In the normal and ed output formats, the output consists of one or more hunks of differences; each hunk shows one area where the files differ. In the side by side format, the files are listed in two columns with a gutter between them. It generates much wider output than usual, and truncates lines that are too long to fit.

In the unified format (for example, the output of the command `diff -u lao tzu'), the lines that differ between the two files start with one of the indicator characters in the left column.

In Git's output, pathnames with "unusual" characters are quoted as explained for the configuration variable core.quotePath. In the combined diff format, the chunk header format is modified to prevent people from accidentally feeding it to patch -p1.

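For example, GNU diff's line group formats can compare the TeX files `old' and `new', and output a merged file in which old regions are surrounded by `\begin{em}'-`\end{em}' lines and new regions are surrounded by `\begin{bf}'-`\end{bf}' lines. In Git's combined diff format, `++' means one line that was added and does not appear in either file1 or file2.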

diff has several mutually exclusive options for output format. Several output modes produce command scripts for editing from-file to produce to-file. The ed format was once used to distribute changes that could be applied automatically; today, with patch, it is almost obsolete. It is no longer widely used for sending out patches; for that purpose, the context format (see section Context Format) and the unified format (see section Unified Format) are superior.

In the side by side format, the gutter marker is `<' for deleted lines, `>' for added lines, `|' for changed lines, and a space for unchanged lines.

When merging C source with if-then-else output, the result contains the directives #ifndef name, #else, and #endif. You should carefully check the diff output for proper nesting.

In a unified chunk header, the chunk size and the comma are omitted if the chunk size is 1.

A Git combined diff starts with a "diff --combined" header line when the -c option is used, or a "diff --cc" header line when the --cc option is used. It is followed by one or more extended header lines. The raw output for merges differs from the regular raw format in the following way: there are more "src" modes and "src" sha1 values, and the status is the concatenated status characters for each parent.

If you are developing software using this same source code control tool, Git, you may have changes in your local system that you want to provide for others to potentially add as commits to their own tree. You can even create a patch file that someone not using Git might find useful by simply piping the git diff output to a file, given that it uses the exact format of the diff command that patch can consume.

For proper operation, patch typically needs at least two lines of context.

In Git, the --summary option describes newly added, deleted, renamed and copied files. You can customize the creation of patch text via environment variables (see git[1]) and the diff attribute (see gitattributes[5]). In the combined diff format, a - character in column N means that the line appears in fileN but does not appear in the result.

In GNU diff's side by side format, the `--left-column' option prints only the left column of two common lines.
You can see from the "Office Space" movie reference function that we've corrected (by removing three lines) the greed of one of our software developers, who added a bit to the rounded-out interest calculation along with a comment to our function.

Given its name, you can probably guess that GitHub is based on Git, but it offers a web- and API-based workflow around the Git tool for distributed open source project development.

I haven't found a satisfactory specification of the unified diff format. GNU diff provides two output formats that show context around the differing lines: the context format (see section Context Format) and the unified format. The `-p' and `-F' options can be used together, if you wish. In a line format, conversion specifications start with `%'. In an ed script, the changes closest to the ends of the files come first, so that commands that change the number of lines do not affect how ed interprets line numbers in succeeding commands. Use the `--forward-ed' option to select the forward ed format.

In Git, the raw output formats from "git-diff-index", "git-diff-tree", "git-diff-files" and "git diff --raw" are very similar. Among the status letters, X means an "unknown" change type (most probably a bug, please report it). You can customize the creation of patch text via environment variables and the diff attribute; see "Defining a custom hunk-header" in gitattributes[5].
You've learned what a diff and a patch are, as well as the common Unix/Linux command line tools that interact with them. What if this developer had made changes to interest.go separately?

In GNU diff, the lines shown around the differing lines are called the context. Use the `-e' or `--ed' option to select the ed output format. To replace the file names shown in the header, use the GNU diff `--label=label' option; see section Showing Alternate File Names. A merged file produced from imperfect diffs might contain duplicate or otherwise incorrect code.

In Git, the similarity index line reports the percentage of similarity between the source and target of a move or copy.

GitHub is used by many active and popular open source projects today, such as Kubernetes, Docker, the Container Network Interface (CNI), Istio, and many others. Now that you have a basic understanding of patches and diffs, let's explore how software developers use these tools.

If you are distributing any form of diff output, you should use one of the output formats that show context, such as the unified format (see section Unified Format). If you do not specify lines, it defaults to three. The `-F regexp' option considers lines that match the argument regexp to be the beginning of a section of the file, and diff shows the nearest section heading line that precedes the differing lines. The `--label' option can be given twice: the first time, its argument replaces the name and date of the first file in the header; the second time, its argument replaces the name and date of the second file.

In a unified chunk header, the first number of each pair is the start line of the chunk in the old or new file. Line and line group formats can be used to get fine control over diff's output.
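The chunk header layout described in this article (a start line, then an optional comma and chunk size that are omitted when the size is 1, for each file) can be sketched as a small parser. The function name parse_hunk_header and the dictionary keys are hypothetical, chosen only for this illustration; the regular expression covers the common `@@ -start,size +start,size @@` form and is not a full specification.

```python
import re

# "@@ -old_start[,old_len] +new_start[,new_len] @@ [section heading]"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@(?: (.*))?$")


def parse_hunk_header(line):
    """Split a unified diff chunk header into its numeric fields."""
    match = HUNK_RE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a unified diff chunk header: {line!r}")
    old_start, old_len, new_start, new_len, section = match.groups()
    return {
        "old_start": int(old_start),
        # The size and its comma are omitted when the chunk size is 1.
        "old_len": int(old_len) if old_len is not None else 1,
        "new_start": int(new_start),
        "new_len": int(new_len) if new_len is not None else 1,
        # Optional function or section heading after the closing "@@".
        "section": section or "",
    }


header = parse_hunk_header("@@ -1,4 +1,5 @@")
short_header = parse_hunk_header("@@ -3 +3 @@ def f():")
```

Treating a missing size as 1 keeps callers from having to special-case single-line chunks.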

In Git's patch output, all the file1 files refer to files before the commit, and all the file2 files refer to files after the commit. The header is followed by one or more extended header lines. File modes are printed as 6-digit octal numbers including the file type and file permission bits. Path names in extended headers do not include the a/ and b/ prefixes. The similarity index is the percentage of unchanged lines, and the dissimilarity index is the percentage of changed lines.

The --numstat option gives the diffstat(1) information but is designed for easier machine consumption, whereas the --stat and --summary output can be combined with -p and are meant for human consumption. When the -z output option is in effect, path names are output verbatim and the line is terminated by a NUL byte. The extra NUL before the preimage path in the renamed case is to allow scripts that read the output to tell if the current record being read is a single-path record or a rename/copy record without reading ahead.

In GNU diff, the lines that actually differ between the two files have one of the following indicator characters in the left column. The unified format is selected with the `-U lines', `--unified[=lines]', or `-u' option. diff can also produce commands that direct the ed text editor to change the first file into the second file. Because ed uses a single period on a line to end input, GNU diff protects lines of changes that contain a single period on a line by writing two periods instead, then writing a subsequent ed command to change the two periods into one. When merging with if-then-else output, take care if the differing lines contain any of the C preprocessor directives `#ifdef', `#ifndef', `#else', `#elif', or `#endif'. See section Interactive Merging with sdiff, for more information on merging files.
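As a rough illustration of the kind of machine-readable summary that --numstat provides, the following sketch counts added and removed lines in a unified diff. It is not Git's implementation: the function name count_changes is hypothetical, the sample diff is made up, and the code assumes a diff of a single file, so the `---' and `+++' header lines appear only before the first chunk.

```python
def count_changes(diff_text):
    """Return (added, removed) line counts for a single-file unified diff."""
    added = removed = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            in_hunk = True  # chunk header: the body lines follow
        elif not in_hunk:
            continue  # skip the ---/+++ file header before the first chunk
        elif line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed


# Made-up sample diff used only for this illustration.
sample = (
    "--- old.txt\n"
    "+++ new.txt\n"
    "@@ -1,4 +1,5 @@\n"
    " alpha\n"
    "-beta\n"
    "+BETA\n"
    " gamma\n"
    " delta\n"
    "+epsilon\n"
)
added, removed = count_changes(sample)
```

A multi-file patch would need the counter to be reset at each new file header, which this sketch deliberately leaves out.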
First, for the sake of this article, let's assume that these two terms reference one and the same thing. When you commit changes on your own copy of a source code tree, you can share those changes by creating a pull request against a commonly shared repository for that software project. But enough history trivia.

Sometimes you might want to know which part of the files each change falls in. If the files are source code, this could mean which function was changed. GNU diff can optionally show in which function or section of the file the differing lines are found; the heading is appended to the line of asterisks in the context format, or to the `@@' line in the unified format. In a unified chunk header, the second number is the chunk size in that file. Use the `--rcs' option to select the RCS output format. Taken together, the line and line group formats let you specify many different formats.

In Git's raw output, the status letters include M (modification of the contents or mode of a file), T (change in the type of the file: regular file, symbolic link or submodule), and U (file is unmerged; you must complete the merge before it can be committed). The sha1 for "dst" is 0{40} if the entry is a creation, is unmerged, or means "look at work tree". See gitattributes[5] for details of how to tailor hunk headers to specific languages.
If you look at the archives for any of the popular Linux kernel mailing lists (LKML is the primary one, but others include linux-containers, fs-devel, and Netdev, to name a few), you'll find many developers posting patches that they wish to have others review, test, and possibly bring into the official Linux kernel Git tree at some point.

When you want to see the difference between the source code before and after a certain commit, or between many commits, you can use a tool to show diffs, or differences.

In GNU diff's normal format, notice that the output shows only the lines that are different between the two files. Forward ed format is not very useful, because neither ed nor patch can apply diffs in this format. It exists mainly for compatibility with older versions of diff. The `-F regexp' option treats lines that match regexp as the beginning of a section of the file. If either file contains incomplete lines, see section Incomplete Lines.

In Git's patch output, the index line includes the blob object names before and after the change, and hunk headers mention the name of the function to which the hunk applies. The combined diff format is the default format when showing merges with git-diff[1] or git-show[1]. In it, a - character in column N means that the line appears in fileN but does not appear in the result.